WO2021169637A1 - Image recognition method and apparatus, computer device, and storage medium - Google Patents

Image recognition method and apparatus, computer device, and storage medium Download PDF

Info

Publication number
WO2021169637A1
WO2021169637A1 PCT/CN2021/071172
Authority
WO
WIPO (PCT)
Prior art keywords
image
predicted
module
facial feature
human eye
Prior art date
Application number
PCT/CN2021/071172
Other languages
English (en)
French (fr)
Inventor
胡艺飞
徐国强
Original Assignee
深圳壹账通智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司 filed Critical 深圳壹账通智能科技有限公司
Publication of WO2021169637A1 publication Critical patent/WO2021169637A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/165Detection; Localisation; Normalisation using facial parts and geometric relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18Eye characteristics, e.g. of the iris
    • G06V40/19Sensors therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18Eye characteristics, e.g. of the iris
    • G06V40/193Preprocessing; Feature extraction

Definitions

  • This application relates to the field of face recognition, and in particular to an image recognition method and apparatus, a computer device, and a storage medium.
  • Sentiment analysis has developed rapidly with the rise of online social media (such as comments, forums, blogs, and microblogs).
  • Because sentiment analysis is usually carried out in non-contact scenes that are easy to collect and process, and face recognition technology continues to develop, using image recognition to analyze an evaluated person's emotional changes has become increasingly popular with the public.
  • existing image recognition systems mainly fall into two classes: one uses an infrared camera to collect images for image recognition, for example the Tobii Eye Tracking system of Alienware computers; the other uses a monocular camera.
  • for the infrared-camera class, the equipment is expensive and must be calibrated in advance for each user, so it cannot be used for gaze analysis of unspecified users in scenarios such as bank branches.
  • the monocular-camera method of image recognition is: detect the face, estimate the rotation angle of the head, recognize 68 facial key points to obtain a picture of the eye region, and recognize the gaze direction.
  • the inventor realized that this method has drawbacks: the image recognition model construction process is complicated, and using the models consumes excessive computational resources and time.
  • four models are needed to recognize the gaze in a single picture; the models occupy a large amount of storage space and are difficult to deploy on mobile phones.
  • many of the computations performed by the facial key-point recognition model are irrelevant to gaze judgment, and the existing methods' computation takes so long that real-time analysis is impossible.
  • in summary, existing image recognition methods have high cost, low efficiency, large storage footprints, and limited application scenarios.
  • This application provides an image recognition method, including:
  • acquiring an image to be detected; performing face detection on the image to be detected to obtain a face image and positioning data of the face image; correcting the face image based on the positioning data to obtain an image to be predicted; and recognizing the image to be predicted using a human eye gaze prediction neural network model to determine the gaze direction.
  • This application also provides an image recognition device, including:
  • a receiving unit, configured to acquire an image to be detected;
  • a detection unit, configured to perform face detection on the image to be detected and obtain a face image and positioning data of the face image;
  • a correction unit, configured to correct the face image based on the positioning data to obtain an image to be predicted;
  • a recognition unit, configured to recognize the image to be predicted using a human eye gaze prediction neural network model and determine the gaze direction.
  • the present application also provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the computer program, it implements the above image recognition method, which includes the following steps:
  • acquiring an image to be detected; performing face detection on the image to be detected to obtain a face image and positioning data of the face image; correcting the face image based on the positioning data to obtain an image to be predicted; and recognizing the image to be predicted using the human eye gaze prediction neural network model to determine the gaze direction.
  • the present application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the above image recognition method is implemented, which includes the following steps:
  • acquiring an image to be detected; performing face detection on the image to be detected to obtain a face image and positioning data of the face image; correcting the face image based on the positioning data to obtain an image to be predicted; and recognizing the image to be predicted using the human eye gaze prediction neural network model to determine the gaze direction.
  • This application corrects the face image according to the positioning data to obtain the image to be predicted for image recognition, which reduces the amount of computation, and uses a human eye gaze prediction neural network model to recognize the image to be predicted and thereby determine the gaze direction; recognition is fast and takes little time, and the adopted gaze prediction neural network model occupies little memory and computes quickly.
  • FIG. 1 is a flowchart of an embodiment of the image recognition method described in this application.
  • FIG. 2 is a flowchart of an embodiment of recognizing the image to be predicted using the human eye gaze prediction neural network model in this application;
  • FIG. 3 is a block diagram of an embodiment of the image recognition device described in this application.
  • FIG. 4 is a hardware architecture diagram of an embodiment of the computer device of this application.
  • the technical solution of this application can be applied to the fields of artificial intelligence, smart city, blockchain and/or big data technology to realize intelligent image recognition.
  • the data involved in this application, such as the image to be detected, the positioning data, and/or the image to be predicted, can be stored in a database or in a blockchain, for example through distributed blockchain storage; this application does not limit this.
  • the image recognition method and apparatus, computer device, and storage medium provided in this application can be applied to business fields such as banking and insurance.
  • This application performs face detection on the acquired image to be detected to obtain a face image and its positioning data, and corrects the face image according to the positioning data to obtain the image to be predicted for image recognition, which reduces the amount of computation.
  • The human eye gaze prediction neural network model is then used to recognize the image to be predicted and determine the gaze direction.
  • Recognition is fast and takes little time.
  • The human eye gaze prediction neural network model occupies little memory and computes quickly.
  • An image recognition method of this embodiment includes the following steps:
  • the positioning data may include: the coordinates of the center points of the two eye ellipses, the coordinates of the nose tip, and the coordinates of the two ends of the corners of the mouth.
  • the positioning data in this embodiment thus consists of 5 key-point coordinates: the two eye-ellipse center points, the nose tip, and the two ends of the corners of the mouth.
  • in the prior art, 68 key points must be acquired to predict the line of sight.
  • using 5 key points instead greatly reduces the amount of computation and speeds up processing.
  • the image recognition method can be widely used in a variety of application scenarios, such as bank branches, mobile terminals (such as mobile phones), billboards and other scenarios. It should be noted that the face detection network of this embodiment can detect multiple faces at a time, and obtain each face image and corresponding positioning data at the same time.
  • step S2 performs face detection on the image to be detected, and acquiring the face image and the positioning data of the face image includes:
  • a multi-task convolutional neural network (Multi-task Cascaded Convolutional Networks, referred to as MTCNN) is used to perform face detection on the image to be detected to obtain a face image and positioning data of the face image.
  • the multi-task convolutional neural network uses a three-layer cascade architecture combined with convolutional neural network algorithms to detect faces and locate the key points (the coordinates of the centers of the two eye ellipses, the nose tip, and the two ends of the corners of the mouth).
  • the multi-task convolutional neural network consists of three networks: P-Net (Proposal Network), R-Net (Refine Network), and O-Net (Output Network).
  • the fully convolutional P-Net processes the image to be detected to obtain first candidate windows (windows marking face positions in the image) and bounding-box regression vectors; the offset of each first candidate window is computed from the regression vectors to determine the bounding windows, the first candidate windows are calibrated accordingly, and Non-Maximum Suppression (NMS) removes overlapping windows to obtain the second candidate windows.
  • since P-Net's detection is relatively coarse, R-Net further refines the second candidate windows. R-Net is similar to P-Net: the second candidate windows are input into R-Net for identification, and false windows are filtered out to locate the face region more precisely, generating the third candidate windows.
  • O-Net, which has one more convolutional layer than R-Net, supervises the third candidate windows and removes overlapping windows to confirm the face region; the position coordinates of the five facial key points can then be located from the confirmed face region, as the sketch below illustrates.
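  • As an illustration of this detection step, the following is a minimal sketch using the open-source mtcnn package together with OpenCV (an assumed tooling choice; the patent does not name a specific implementation), which returns exactly the bounding box and five key points described above.

```python
# Minimal face-detection sketch; assumes `pip install mtcnn opencv-python`.
import cv2
from mtcnn import MTCNN

detector = MTCNN()
image = cv2.cvtColor(cv2.imread("frame.jpg"), cv2.COLOR_BGR2RGB)

# detect_faces handles multiple faces at once, returning a bounding box
# and the five key points (eye centers, nose tip, mouth corners) per face.
for face in detector.detect_faces(image):
    x, y, w, h = face["box"]
    points = face["keypoints"]  # left_eye, right_eye, nose, mouth_left, mouth_right
    face_image = image[y:y + h, x:x + w]
```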
  • in order to facilitate the subsequent gaze recognition on the image to be predicted (step S4), the face image needs to be converted into an easily recognized image to be predicted with the head straightened (for example, eyes facing straight ahead), so as to improve the accuracy of gaze recognition.
  • step S3 correcting the face image based on the positioning data, and obtaining the image to be predicted includes:
  • the positioning data is compared with the standard coordinate data, and the face image is similarly transformed according to the comparison result to generate an image to be predicted.
  • the standard coordinate data is the pre-stored standard coordinates of the 5 key points.
  • the standard coordinates of the five key points include: the labeled coordinates of the two eye-ellipse center points, the nose tip, and the two ends of the corners of the mouth.
  • the positioning data is compared with the standard coordinate data to obtain the relative change, and similarity transformations such as rotation, translation, and scaling are applied to the face image based on that change, converting the face image into the image to be predicted so that it meets the requirements of gaze recognition.
  • compared with existing correction methods, which require a deep neural network model to compute the head rotation angle, the correction method adopted in this embodiment effectively reduces the amount of computation; since no head-rotation-angle estimation model needs to be trained, the computation cost is also greatly reduced (see the sketch below).
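  • A minimal correction sketch, assuming OpenCV: a similarity transform (rotation, translation, uniform scale, no shear) is estimated from the five detected key points to pre-stored standard coordinates, then applied to the face image. The standard coordinates below are illustrative placeholders, not values from the patent.

```python
import numpy as np
import cv2

# Illustrative standard coordinates of the 5 key points for a 112x112 output
# (eye centers, nose tip, mouth corners); the patent's stored values may differ.
STD_POINTS = np.float32([
    [38.3, 51.7], [73.5, 51.5],
    [56.0, 71.7],
    [41.5, 92.4], [70.7, 92.2],
])

def correct_face(face_image, detected_points):
    """detected_points: (5, 2) array of key points from the detection step."""
    src = np.float32(detected_points)
    # estimateAffinePartial2D fits a 4-DOF similarity transform.
    matrix, _ = cv2.estimateAffinePartial2D(src, STD_POINTS)
    return cv2.warpAffine(face_image, matrix, (112, 112))
```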
  • the human eye gaze prediction neural network model includes: a separable convolution module, an attention mechanism module, and a classification module;
  • step S4, recognizing the image to be predicted using the human eye gaze prediction neural network model to determine the gaze direction, may include: S41, extracting a first facial feature from the image to be predicted through the separable convolution module; S42, adjusting the weights of the first facial feature through the attention mechanism module to obtain feature weights that enhance the eye features; and S43, combining the first facial feature with the feature weights to generate a second facial feature and processing the second facial feature through the classification module to obtain the gaze direction.
  • in step S41, using separable convolutions in place of the kernels of a standard convolutional network greatly reduces the amount of computation and its complexity. Taking an input image to be predicted of size d×c×m, an output first facial feature of size d×c×n, and a k×k convolutional layer as an example:
  • the computation of a standard convolution kernel is d×c×m×n×k×k;
  • the computation of a separable convolution kernel is d×c×m×(n+k×k);
  • where d is the width of the image to be predicted, c is its height, m and n are the numbers of input and output channels, and k is the size of the convolution kernel;
  • it follows that separable convolution reduces both the parameter count of the model and the computation of the convolution process, as the sketch below illustrates.
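  • A sketch of such a separable convolution in PyTorch (an assumed framework choice), with the cost comparison above evaluated for illustrative sizes:

```python
import torch.nn as nn

class SeparableConv(nn.Module):
    def __init__(self, m, n, k=3):
        super().__init__()
        # Depthwise: one k x k kernel per input channel (groups=m).
        self.depthwise = nn.Conv2d(m, m, k, padding=k // 2, groups=m, bias=False)
        # Pointwise: 1 x 1 convolution recombining the m channels into n.
        self.pointwise = nn.Conv2d(m, n, 1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Multiply counts for the formulas above with illustrative sizes:
d, c, m, n, k = 112, 112, 32, 64, 3
standard = d * c * m * n * k * k     # 231,211,008
separable = d * c * m * (n + k * k)  # 29,302,784, roughly 8x fewer
```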
  • in step S41, the separable convolution module may be combined with a forward residual module to extract the first facial feature from the image to be predicted.
  • the forward residual module adds the features obtained by the separable convolution module to the initial features at the same positions, so that the network learns high-level features without forgetting useful low-level features.
  • alternatively, in step S41 the separable convolution module may be combined with an inverted residual module to extract the first facial feature from the image to be predicted.
  • the depthwise convolution of the separable convolution module convolves each input channel of the image to be predicted with a single convolution kernel to obtain first feature maps; pointwise convolution then applies a 1×1 convolution to combine the first feature maps of the previous step with weights along the depth direction, obtaining more features.
  • combining the inverted residual module with the separable convolution module, a 1×1 cross-channel convolution is added between every two pointwise convolution channels for inter-channel information fusion, ensuring that more effective second feature maps are extracted; all the second feature maps are concatenated to obtain the first facial feature.
  • the inverted residual module lets the neural network learn high-order features without forgetting useful low-order features; at the same time, it has fewer parameters and computes faster than the forward residual module, greatly reducing the memory footprint (see the sketch below).
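  • The description matches the inverted residual design popularized by MobileNetV2; the following is a sketch under that assumption, where the expansion factor and activation are illustrative choices rather than values from the patent:

```python
import torch.nn as nn

class InvertedResidual(nn.Module):
    def __init__(self, channels, expand=4, k=3):
        super().__init__()
        hidden = channels * expand
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),  # 1x1 expand, cross-channel fusion
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, k, padding=k // 2,
                      groups=hidden, bias=False),         # depthwise convolution
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, channels, 1, bias=False),   # 1x1 project back down
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        # The residual add lets high-order features accumulate without
        # forgetting the useful low-order features carried by x.
        return x + self.block(x)
```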
  • the attention mechanism module uses a self-attention mechanism.
  • self-attention is a mechanism that, when computing a representation of a sequence, relates the weights to positions within the same sequence; it has proven very effective in machine reading comprehension, abstractive summarization, and image caption generation.
  • multiple attention mechanism modules may be included, each corresponding to a convolutional layer of the separable convolution module and located after that layer.
  • the attention mechanism modules are used to extract the convolutional features around the eyes; the output of each attention mechanism module serves as the input of the next, and the convolutional features refined by the last attention mechanism module are used as the feature weights (that is, feature weights that enhance the eye features). Adjusting the weights through the attention mechanism strengthens the extraction of features around the eyes on top of the first facial feature; eye features are then generated from the eyeball and eye-muscle features, yielding feature weights that can enhance the eye features.
  • the classification module uses a fully connected layer. The first facial feature is multiplied by the feature weights to generate the second facial feature, which is input into the fully connected layer.
  • the fully connected layer integrates the second facial feature through a weight matrix and computes offset probability information from the integrated neurons; each piece of offset probability information corresponds to a vertical offset and a horizontal offset of the line of sight, and the gaze direction is obtained from these vertical and horizontal offsets (a sketch follows below).
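  • A sketch of this final stage under stated assumptions: the first facial feature is multiplied by the attention-derived feature weights, flattened, and passed through a fully connected layer whose outputs are read as probabilities over discretized vertical and horizontal gaze offsets. The bin counts and tensor shapes are illustrative, not from the patent.

```python
import torch.nn as nn

class GazeClassifier(nn.Module):
    def __init__(self, channels, spatial, v_bins=9, h_bins=9):
        super().__init__()
        self.v_bins = v_bins
        self.fc = nn.Linear(channels * spatial * spatial, v_bins + h_bins)

    def forward(self, first_facial_feature, feature_weights):
        # Second facial feature = first facial feature x feature weights.
        second = first_facial_feature * feature_weights
        logits = self.fc(second.flatten(1))
        vertical = logits[:, :self.v_bins].softmax(dim=1)    # up/down offset probabilities
        horizontal = logits[:, self.v_bins:].softmax(dim=1)  # left/right offset probabilities
        return vertical, horizontal  # the gaze direction follows from the two offsets
```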
  • in step S4, the input to the human eye gaze prediction neural network model is the entire face, which has two main advantages over the prior art. First, for prediction accuracy, changes in the muscles around the eyes help judge the gaze direction, whereas existing methods input only an eye picture and cannot use the surrounding information.
  • second, to obtain the eye picture, existing methods must build a 68-key-point face detection model to obtain the coordinates of the eye frame, which is computationally expensive and costly; this technical solution requires little computation, occupies little storage space, and has low cost.
  • in this embodiment, the image recognition method performs face detection on the acquired image to be detected to obtain the face image and its positioning data, and corrects the face image according to the positioning data to obtain the image to be predicted for image recognition, reducing the amount of computation; the human eye gaze prediction neural network model then recognizes the image to be predicted to determine the gaze direction. Recognition is fast and takes little time, and the model occupies little memory and computes quickly.
  • in practical applications, compared with eye recognition systems that use infrared cameras, the image recognition method needs only a monocular camera to complete image acquisition, reducing equipment cost; at the same time, it requires no manual calibration, so it can be widely used in various scenarios such as bank branches and personal mobile phones.
  • compared with other eye recognition systems that use monocular cameras, the image recognition method requires only two models, and the gaze prediction neural network model has fewer parameters than existing eye recognition models, which greatly speeds up a single gaze-recognition computation; it can run analysis in real time on an NVIDIA 1080 GPU. The gaze prediction neural network model occupies less than 8 MB of memory, whereas existing eye recognition models usually occupy more than 100 MB.
  • the image recognition method in this embodiment can be applied to emotion analysis: for example, erratic eye movement when a person is nervous or lying can serve as a feature for anti-fraud judgment; it can also be used to analyze areas of customer interest, such as billboards; and it can be applied in mini games for eye recognition or game interaction.
  • the present application also provides an image recognition device 1, including: a receiving unit 11, a detection unit 12, a correction unit 13, and a recognition unit 14, wherein:
  • the receiving unit 11 is used to obtain the image to be detected
  • the detection unit 12 is configured to perform face detection on the image to be detected, and obtain the face image and the positioning data of the face image;
  • the positioning data may include: the coordinates of the center points of the two eye ellipses, the coordinates of the nose tip, and the coordinates of the two ends of the corners of the mouth.
  • the positioning data in this embodiment thus consists of 5 key-point coordinates: the two eye-ellipse center points, the nose tip, and the two ends of the corners of the mouth.
  • in the prior art, 68 key points must be acquired to predict the line of sight.
  • using 5 key points instead greatly reduces the amount of computation and speeds up processing.
  • the image recognition method can be widely used in a variety of application scenarios, such as bank branches, mobile terminals (such as mobile phones), billboards and other scenarios. It should be noted that the face detection network of this embodiment can detect multiple faces at a time, and obtain each face image and corresponding positioning data at the same time.
  • the detection unit 12 may use a multi-task convolutional neural network (Multi-task Cascaded Convolutional Networks, referred to as MTCNN) to perform face detection on the image to be detected, and obtain the face image and the positioning data of the face image.
  • the multi-task convolutional neural network uses a three-layer cascade architecture combined with convolutional neural network algorithms to detect faces and locate the key points (the coordinates of the centers of the two eye ellipses, the nose tip, and the two ends of the corners of the mouth).
  • the multi-task convolutional neural network consists of three networks: P-Net (Proposal Network), R-Net (Refine Network), and O-Net (Output Network).
  • the fully convolutional P-Net processes the image to be detected to obtain first candidate windows (windows marking face positions in the image) and bounding-box regression vectors; the offset of each first candidate window is computed from the regression vectors to determine the bounding windows, the first candidate windows are calibrated accordingly, and Non-Maximum Suppression (NMS) removes overlapping windows to obtain the second candidate windows.
  • since P-Net's detection is relatively coarse, R-Net further refines the second candidate windows. R-Net is similar to P-Net: the second candidate windows are input into R-Net for identification, and false windows are filtered out to locate the face region more precisely, generating the third candidate windows.
  • O-Net, which has one more convolutional layer than R-Net, supervises the third candidate windows and removes overlapping windows to confirm the face region; the position coordinates of the five facial key points can then be located from the confirmed face region.
  • the correction unit 13 is configured to correct the face image based on the positioning data to obtain an image to be predicted;
  • the correction unit 13 compares the positioning data with the standard coordinate data and performs a similarity transformation on the face image according to the comparison result, generating the image to be predicted.
  • the standard coordinate data is the pre-stored standard coordinates of the 5 key points.
  • the standard coordinates of the five key points include: the labeled coordinates of the two eye-ellipse center points, the nose tip, and the two ends of the corners of the mouth.
  • the positioning data is compared with the standard coordinate data to obtain the relative change, and similarity transformations such as rotation, translation, and scaling are applied to the face image based on that change, converting the face image into the image to be predicted so that it meets the requirements of gaze recognition.
  • compared with existing correction methods, which require a deep neural network model to compute the head rotation angle, the correction method adopted in this embodiment effectively reduces the amount of computation; since no head-rotation-angle estimation model needs to be trained, the computation cost is also greatly reduced.
  • the recognition unit 14 is configured to recognize the image to be predicted using the human eye gaze prediction neural network model and determine the gaze direction.
  • it should be noted that the human eye gaze prediction neural network model includes: a separable convolution module, an attention mechanism module, and a classification module;
  • the recognition unit 14 extracts the first facial feature from the image to be predicted through the separable convolution module; the separable convolution module may be combined with a forward residual module for this extraction.
  • the forward residual module adds the features obtained by the separable convolution module to the initial features at the same positions, so that the network learns high-level features without forgetting useful low-level features.
  • alternatively, the separable convolution module may be combined with an inverted residual module to extract the first facial feature from the image to be predicted.
  • the depthwise convolution of the separable convolution module convolves each input channel of the image to be predicted with a single convolution kernel to obtain first feature maps; pointwise convolution then applies a 1×1 convolution to combine the first feature maps of the previous step with weights along the depth direction, obtaining more features.
  • combining the inverted residual module with the separable convolution module, a 1×1 cross-channel convolution is added between every two pointwise convolution channels for inter-channel information fusion, ensuring that more effective second feature maps are extracted; all the second feature maps are concatenated to obtain the first facial feature.
  • the inverted residual module lets the neural network learn high-order features without forgetting useful low-order features; at the same time, it has fewer parameters and computes faster than the forward residual module, greatly reducing the memory footprint.
  • the recognition unit 14 adjusts the weights of the first facial feature through the attention mechanism module to obtain feature weights that enhance the eye features; the attention mechanism module adopts a self-attention mechanism.
  • self-attention is a mechanism that, when computing a representation of a sequence, relates the weights to positions within the same sequence; it has proven very effective in machine reading comprehension, abstractive summarization, and image caption generation.
  • multiple attention mechanism modules may be included, each corresponding to a convolutional layer of the separable convolution module and located after that layer.
  • the attention mechanism modules extract the convolutional features around the eyes; the output of each attention mechanism module serves as the input of the next, and the convolutional features refined by the last attention mechanism module are used as the feature weights (that is, feature weights that enhance the eye features).
  • the recognition unit 14 combines the first facial feature with the feature weights to generate a second facial feature, and processes the second facial feature through the classification module to obtain the gaze direction.
  • the classification module uses a fully connected layer. The first facial feature is multiplied by the feature weights to generate the second facial feature, which is input into the fully connected layer.
  • the fully connected layer integrates the second facial feature through a weight matrix and computes offset probability information from the integrated neurons; each piece of offset probability information corresponds to a vertical offset and a horizontal offset of the line of sight, and the gaze direction is obtained from these offsets.
  • the input to the human eye gaze prediction neural network model is the entire face, which has two main advantages over the prior art. First, changes in the muscles around the eyes help judge the gaze direction, whereas existing methods input only an eye picture and cannot use the surrounding information.
  • second, to obtain the eye picture, existing methods must build a 68-key-point face detection model to obtain the coordinates of the eye frame, which is computationally expensive and costly; this technical solution requires little computation, occupies little storage space, and has low cost. An end-to-end sketch of the device's data flow follows below.
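  • Putting the four units together, the following is a minimal sketch of how the device's data could flow; all class and function names here are illustrative assumptions, reusing the detection and correction sketches above rather than describing the patent's actual implementation.

```python
class ImageRecognitionDevice:
    def __init__(self, detector, gaze_model):
        self.detector = detector      # detection unit, e.g. an MTCNN wrapper
        self.gaze_model = gaze_model  # recognition unit, the gaze prediction network

    def recognize(self, image):
        """Receiving unit: takes the image to be detected; returns gaze per face."""
        results = []
        for box, keypoints in self.detector.detect(image):
            x, y, w, h = box
            face = image[y:y + h, x:x + w]
            # Correction unit: similarity-transform alignment (see correct_face above).
            to_predict = correct_face(face, keypoints)
            # Recognition unit: vertical/horizontal gaze offsets for this face.
            results.append(self.gaze_model.predict(to_predict))
        return results
```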
  • in this embodiment, the image recognition device 1 performs face detection on the acquired image to be detected to obtain the face image and its positioning data, and corrects the face image according to the positioning data to obtain the image to be predicted for image recognition, reducing the amount of computation; the human eye gaze prediction neural network model then recognizes the image to be predicted to determine the gaze direction. Recognition is fast and takes little time, and the model occupies little memory and computes quickly.
  • in practical applications, compared with eye recognition systems that use infrared cameras, the method needs only a monocular camera to complete image acquisition, reducing equipment cost; at the same time, it requires no manual calibration, so it can be widely used in various scenarios such as bank branches and personal mobile phones.
  • compared with other eye recognition systems that use monocular cameras, the method requires only two models, and the gaze prediction neural network model has fewer parameters than existing eye recognition models, which greatly speeds up a single gaze-recognition computation; it can run analysis in real time on an NVIDIA 1080 GPU. The gaze prediction neural network model occupies less than 8 MB of memory, whereas existing eye recognition models usually occupy more than 100 MB.
  • the image recognition device 1 in this embodiment can be applied to emotion analysis: for example, erratic eye movement when a person is nervous or lying can serve as a feature for anti-fraud judgment; it can also be used to analyze areas of customer interest, such as billboards; and it can be applied in mini games for eye recognition or game interaction.
  • the present application also provides a computer device 2, which may comprise multiple computer devices 2; the components of the image recognition device 1 of the second embodiment can be distributed among different computer devices 2.
  • the computer device 2 may be a smartphone, a tablet, a laptop, a desktop computer, a rack server, a blade server, a tower server, or a cabinet server (either an independent server or a server cluster composed of multiple servers) that executes programs.
  • the computer device 2 of this embodiment includes at least, but is not limited to: a memory and a processor.
  • optionally, the computer device 2 may also include a network interface and/or the image recognition device.
  • for example, the computer device 2 may include a memory 21, a processor 23, a network interface 22, and the image recognition device 1, which may be communicatively connected to one another through a system bus (refer to FIG. 4).
  • it should be pointed out that FIG. 4 only shows a computer device 2 with some of the components, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead.
  • the memory 21 includes at least one type of computer-readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access Memory (RAM), Static Random Access Memory (SRAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Programmable Read Only Memory (PROM), Magnetic Memory, Magnetic Disk, Optical Disk, etc.
  • the memory 21 may be an internal storage unit of the computer device 2, for example, a hard disk or a memory of the computer device 2.
  • the memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the computer device 2.
  • the memory 21 may also include both the internal storage unit of the computer device 2 and its external storage device.
  • the memory 21 is generally used to store the operating system and various application software installed in the computer device 2, such as the program code of the image recognition method in the first embodiment.
  • the memory 21 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 23 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips.
  • the processor 23 is generally used to control the overall operation of the computer device 2, for example, performing control and processing related to data interaction or communication with the computer device 2.
  • in this embodiment, the processor 23 is used to run program code stored in the memory 21 or to process data, for example, to run the image recognition device 1.
  • the network interface 22 may include a wireless network interface or a wired network interface, and the network interface 22 is generally used to establish a communication connection between the computer device 2 and other computer devices 2.
  • the network interface 22 is used to connect the computer device 2 with an external terminal through a network, and establish a data transmission channel and a communication connection between the computer device 2 and the external terminal.
  • the network may be an intranet, the Internet, a Global System for Mobile Communications (GSM) network, Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, Wi-Fi, or another wireless or wired network.
  • it should be pointed out that FIG. 4 only shows the computer device 2 with components 21-23, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead.
  • the image recognition device 1 stored in the memory 21 may also be divided into one or more program modules.
  • the one or more program modules are stored in the memory 21 and executed by one or more processors (the processor 23 in this embodiment) to complete this application.
  • the present application also provides a computer-readable storage medium, which includes multiple storage media, such as flash memory, hard disks, multimedia cards, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, servers, and app stores;
  • a computer program is stored thereon, and when the program is executed by the processor 23 it realizes the corresponding functions.
  • the computer-readable storage medium of this embodiment is used to store the image recognition device 1, and when executed by the processor 23, implements the image recognition method of the first embodiment.
  • the storage medium involved in this application, such as the computer-readable storage medium, may be non-volatile or volatile.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Ophthalmology & Optometry (AREA)
  • Geometry (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

An image recognition method and apparatus, a computer device, and a storage medium, belonging to the field of face recognition. Face detection is performed on an acquired image to be detected to obtain a face image and positioning data of the face image (S2); the face image is corrected according to the positioning data to obtain an image to be predicted for image recognition (S3), reducing the amount of computation; a human eye gaze prediction neural network model is used to recognize the image to be predicted, thereby determining the gaze direction (S4). Recognition is fast and takes little time, and the adopted gaze prediction neural network model occupies little memory and computes quickly.

Description

Image recognition method and apparatus, computer device, and storage medium
This application claims priority to the Chinese patent application filed with the China Patent Office on February 28, 2020, with application number 202010127177.3 and entitled "Image recognition method and apparatus, computer device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of face recognition, and in particular to an image recognition method and apparatus, a computer device, and a storage medium.
Background
Sentiment analysis has developed rapidly with the rise of online social media (such as comments, forums, blogs, and microblogs); by analyzing a person's emotions, the opinions, feelings, evaluations, attitudes, moods, and tendencies they express can be inferred. When a person undergoes psychological changes, certain physiological parameters (such as skin conductance, heartbeat, blood pressure, respiration, brain waves, voice, and gaze) change, so these changes can be detected to assess the emotional changes of the person being analyzed. Since sentiment analysis is usually carried out in non-contact scenes that are easy to collect and process, with the continuous development of face recognition technology, using image recognition to analyze an evaluated person's emotional changes has become increasingly popular with the public.
The inventor found that existing image recognition systems mainly fall into two classes. One class uses an infrared camera to collect images for image recognition, for example the Tobii Eye Tracking system of Alienware computers; the other uses a monocular camera. The main drawbacks of the infrared-camera class are that the equipment is expensive and each user must be calibrated in advance, so it cannot be used for gaze analysis of unspecified users in scenarios such as bank branches. The monocular-camera method of image recognition is: detect the face, estimate the rotation angle of the head, recognize 68 facial key points to obtain a picture of the eye region, and recognize the gaze direction. However, the inventor realized that this method has drawbacks: the image recognition model construction process is complicated, and using the models consumes excessive computational resources and time. Recognizing the gaze in one picture requires four models, which occupy a large amount of storage space and are difficult to deploy on mobile phones; many of the computations performed by the facial key-point recognition model are irrelevant to gaze judgment, and the existing method's computation takes so long that real-time analysis is impossible.
In summary, existing image recognition methods have high cost, low efficiency, large storage footprints, and limited application scenarios.
Summary
In view of the low efficiency and large storage footprint of existing image recognition methods, this application provides an image recognition method and apparatus, a computer device, and a storage medium that aim to improve recognition efficiency while occupying little storage space.
This application provides an image recognition method, including:
acquiring an image to be detected;
performing face detection on the image to be detected to obtain a face image and positioning data of the face image;
correcting the face image based on the positioning data to obtain an image to be predicted;
recognizing the image to be predicted using a human eye gaze prediction neural network model to determine the gaze direction.
This application also provides an image recognition apparatus, including:
a receiving unit, configured to acquire an image to be detected;
a detection unit, configured to perform face detection on the image to be detected and obtain a face image and positioning data of the face image;
a correction unit, configured to correct the face image based on the positioning data to obtain an image to be predicted;
a recognition unit, configured to recognize the image to be predicted using a human eye gaze prediction neural network model and determine the gaze direction.
This application also provides a computer device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the computer program, the above image recognition method is implemented, the method including the following steps:
acquiring an image to be detected;
performing face detection on the image to be detected to obtain a face image and positioning data of the face image;
correcting the face image based on the positioning data to obtain an image to be predicted;
recognizing the image to be predicted using a human eye gaze prediction neural network model to determine the gaze direction.
This application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the above image recognition method is implemented, the method including the following steps:
acquiring an image to be detected;
performing face detection on the image to be detected to obtain a face image and positioning data of the face image;
correcting the face image based on the positioning data to obtain an image to be predicted;
recognizing the image to be predicted using a human eye gaze prediction neural network model to determine the gaze direction.
This application corrects the face image according to the positioning data to obtain the image to be predicted for image recognition, which reduces the amount of computation, and uses a human eye gaze prediction neural network model to recognize the image to be predicted and thereby determine the gaze direction; recognition is fast and takes little time, and the adopted gaze prediction neural network model occupies little memory and computes quickly.
Brief Description of the Drawings
FIG. 1 is a flowchart of an embodiment of the image recognition method of this application;
FIG. 2 is a flowchart of an embodiment of recognizing the image to be predicted using the human eye gaze prediction neural network model in this application;
FIG. 3 is a block diagram of an embodiment of the image recognition apparatus of this application;
FIG. 4 is a hardware architecture diagram of an embodiment of the computer device of this application.
Detailed Description of the Embodiments
In order to make the objectives, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain this application, not to limit it. Based on the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the scope of protection of this application.
It should be noted that, where there is no conflict, the embodiments in this application and the features in the embodiments may be combined with one another.
The technical solution of this application can be applied to the fields of artificial intelligence, smart city, blockchain, and/or big data technology to realize intelligent image recognition. Optionally, the data involved in this application, such as the image to be detected, the positioning data, and/or the image to be predicted, can be stored in a database or in a blockchain, for example through distributed blockchain storage; this application does not limit this.
The image recognition method and apparatus, computer device, and storage medium provided in this application can be applied to business fields such as banking and insurance. This application performs face detection on the acquired image to be detected to obtain a face image and its positioning data, and corrects the face image according to the positioning data to obtain the image to be predicted for image recognition, which reduces the amount of computation; a human eye gaze prediction neural network model is used to recognize the image to be predicted and thereby determine the gaze direction. Recognition is fast and takes little time, and the adopted model occupies little memory and computes quickly.
Embodiment 1
Referring to FIG. 1, the image recognition method of this embodiment includes the following steps:
S1. Acquire an image to be detected;
In this embodiment, there is no strict requirement on the device that collects the images; a monocular camera can be used to collect the image to be detected. The low requirement on the acquisition device effectively reduces equipment cost.
S2. Perform face detection on the image to be detected to obtain a face image and positioning data of the face image;
The positioning data may include: the coordinates of the center points of the two eye ellipses, the coordinates of the nose tip, and the coordinates of the two ends of the corners of the mouth.
The positioning data in this embodiment includes 5 key-point coordinates: the two eye-ellipse center points, the nose tip, and the two ends of the corners of the mouth. Compared with the prior art, which must acquire 68 key points to predict the line of sight, this greatly reduces the amount of computation and speeds up processing. The image recognition method can be widely used in many application scenarios, such as bank branches, mobile terminals (e.g., mobile phones), and billboards. It should be noted that the face detection network of this embodiment can detect multiple faces at a time and simultaneously obtain each face image and its corresponding positioning data.
Further, step S2 of performing face detection on the image to be detected to obtain the face image and the positioning data of the face image includes:
using a multi-task convolutional neural network (Multi-task Cascaded Convolutional Networks, MTCNN) to perform face detection on the image to be detected and obtain the face image and the positioning data of the face image.
The multi-task convolutional neural network uses a three-layer cascade architecture combined with convolutional neural network algorithms to detect faces and locate the key points (the coordinates of the centers of the two eye ellipses, the nose tip, and the two ends of the corners of the mouth). It consists of three networks: P-Net (Proposal Network), R-Net (Refine Network), and O-Net (Output Network). The fully convolutional P-Net identifies the image to be detected and obtains first candidate windows (windows marking face positions in the image to be detected) and bounding-box regression vectors; the offset of each first candidate window is computed from the regression vectors to determine the bounding windows, the first candidate windows are calibrated against these bounding windows, and Non-Maximum Suppression (NMS) removes overlapping windows to obtain second candidate windows. Since P-Net's detection is relatively coarse, R-Net is used to further refine the second candidate windows; R-Net is similar to P-Net, and the second candidate windows are input into R-Net for identification, filtering out false windows to locate the face region more precisely and generate third candidate windows. O-Net, which has one more convolutional layer than R-Net, supervises the third candidate windows and removes overlapping windows to confirm the face region; the position coordinates of the five facial key points can then be located from the confirmed face region.
S3. Correct the face image based on the positioning data to obtain an image to be predicted;
In this embodiment, in order to facilitate the subsequent (step S4) gaze recognition on the image to be predicted, the face image needs to be converted into an easily recognized image to be predicted with the head straightened (for example, eyes facing straight ahead), so as to improve the accuracy of gaze recognition.
Further, step S3 of correcting the face image based on the positioning data to obtain the image to be predicted includes:
comparing the positioning data with the standard coordinate data, and performing a similarity transformation on the face image according to the comparison result to generate the image to be predicted.
It should be noted that the standard coordinate data is the pre-stored standard coordinates of the 5 key points, including: the labeled coordinates of the two eye-ellipse center points, the nose tip, and the two ends of the corners of the mouth.
In this embodiment, the positioning data is compared with the standard coordinate data to obtain the relative change, and similarity transformations such as rotation, translation, and scaling are applied to the face image based on that change, converting the face image into the image to be predicted so that it meets the requirements of gaze recognition. Compared with existing correction methods, which require a deep neural network model to compute the head rotation angle, the correction method adopted in this embodiment effectively reduces the amount of computation; since no head-rotation-angle estimation model needs to be trained, the computation cost is also greatly reduced.
S4. Recognize the image to be predicted using the human eye gaze prediction neural network model to determine the gaze direction.
It should be noted that the human eye gaze prediction neural network model includes: a separable convolution module, an attention mechanism module, and a classification module;
As shown in FIG. 2, step S4 of recognizing the image to be predicted using the human eye gaze prediction neural network model to determine the gaze direction may include:
S41. Extract a first facial feature from the image to be predicted through the separable convolution module;
In this step, using separable convolutions in place of the kernels of a standard convolutional network greatly reduces the amount of computation and its complexity. Taking an input image to be predicted of size d×c×m, an output first facial feature of size d×c×n, and a k×k convolutional layer as an example:
the computation of a standard convolution kernel is d×c×m×n×k×k;
the computation of a separable convolution kernel is d×c×m×(n+k×k);
where d is the width of the image to be predicted, c is its height, m and n are the numbers of channels, and k is the size of the convolution kernel;
it follows that separable convolution reduces the parameter count of the model and the computation of the convolution process.
In step S41, the separable convolution module may be combined with a forward residual module to extract the first facial feature from the image to be predicted.
The forward residual module adds the features obtained by the separable convolution module to the initial features at the same positions, so that the network learns high-level features without forgetting useful low-level features.
In step S41, the separable convolution module may instead be combined with an inverted residual module to extract the first facial feature from the image to be predicted.
The depthwise convolution of the separable convolution module convolves each input channel of the image to be predicted with a single convolution kernel to obtain first feature maps; pointwise convolution then applies a 1×1 convolution to combine the first feature maps of the previous step with weights along the depth direction, obtaining more features. Combining the inverted residual module with the separable convolution module, a 1×1 cross-channel convolution is added between every two pointwise convolution channels for inter-channel information fusion, ensuring that more effective second feature maps are extracted; all the second feature maps are concatenated to obtain the first facial feature. In this embodiment, the inverted residual module lets the neural network learn high-order features without forgetting useful low-order features; at the same time, it has fewer parameters and computes faster than the forward residual module, greatly reducing the memory footprint.
S42. Adjust the weights of the first facial feature through the attention mechanism module to obtain feature weights that enhance the eye features;
In this step, the attention mechanism module adopts a self-attention mechanism. Self-attention is a mechanism that, when computing a representation of a sequence, relates the weights to positions within the same sequence; it has proven very effective in machine reading comprehension, abstractive summarization, and image caption generation.
This embodiment may include multiple attention mechanism modules, each corresponding to a convolutional layer of the separable convolution module and located after that layer. The attention mechanism modules extract the convolutional features around the eyes; the output of each attention mechanism module serves as the input of the next, and the convolutional features refined by the last attention mechanism module are used as the feature weights (that is, feature weights that enhance the eye features). Adjusting the weights through the attention mechanism strengthens the extraction of features around the eyes on top of the first facial feature; eye features are then generated from the eyeball and eye-muscle features, yielding feature weights that can enhance the eye features.
S43. Combine the first facial feature with the feature weights to generate a second facial feature, and process the second facial feature through the classification module to obtain the gaze direction.
In this step, the classification module uses a fully connected layer. The first facial feature is multiplied by the feature weights to generate the second facial feature, which is input into the fully connected layer; the fully connected layer integrates the second facial feature through a weight matrix and computes offset probability information from the integrated neurons. Each piece of offset probability information corresponds to a vertical offset and a horizontal offset of the line of sight, and the gaze direction is obtained from these offsets.
In step S4, the input to the human eye gaze prediction neural network model is the entire face, which has two main advantages over the prior art. First, for prediction accuracy, changes in the muscles around the eyes help judge the gaze direction, whereas existing methods input only an eye picture and cannot use the surrounding information. Second, to obtain the eye picture, existing methods must build a 68-key-point face detection model to obtain the coordinates of the eye frame, which is computationally expensive and costly; this technical solution requires little computation, occupies little storage space, and has low cost.
In this embodiment, the image recognition method performs face detection on the acquired image to be detected to obtain the face image and its positioning data, and corrects the face image according to the positioning data to obtain the image to be predicted for image recognition, reducing the amount of computation; the human eye gaze prediction neural network model then recognizes the image to be predicted to determine the gaze direction. Recognition is fast and takes little time, and the adopted model occupies little memory and computes quickly.
In practical applications, compared with eye recognition systems that use infrared cameras, the image recognition method needs only a monocular camera to complete image acquisition, reducing equipment cost; at the same time, it requires no manual calibration, so it can be widely used in various scenarios such as bank branches and personal mobile phones. Compared with other eye recognition systems that use monocular cameras, the image recognition method requires only two models, and the gaze prediction neural network model has fewer parameters than existing eye recognition models, which greatly speeds up a single gaze-recognition computation; it can run analysis in real time on an NVIDIA 1080 GPU. The gaze prediction neural network model occupies less than 8 MB of memory, whereas existing eye recognition models usually occupy more than 100 MB.
The image recognition method in this embodiment can be applied to emotion analysis: for example, erratic eye movement when a person is nervous or lying can serve as a feature for anti-fraud judgment; it can also be used to analyze areas of customer interest, such as billboards; and it can be applied in mini games for eye recognition or game interaction.
Embodiment 2
As shown in FIG. 3, this application also provides an image recognition device 1, including: a receiving unit 11, a detection unit 12, a correction unit 13, and a recognition unit 14, wherein:
the receiving unit 11 is configured to acquire an image to be detected;
In this embodiment, there is no strict requirement on the device that collects the images; a monocular camera can be used to collect the image to be detected. The low requirement on the acquisition device effectively reduces equipment cost.
the detection unit 12 is configured to perform face detection on the image to be detected and obtain a face image and positioning data of the face image;
The positioning data may include: the coordinates of the center points of the two eye ellipses, the coordinates of the nose tip, and the coordinates of the two ends of the corners of the mouth.
The positioning data in this embodiment includes 5 key-point coordinates: the two eye-ellipse center points, the nose tip, and the two ends of the corners of the mouth. Compared with the prior art, which must acquire 68 key points to predict the line of sight, this greatly reduces the amount of computation and speeds up processing. The image recognition method can be widely used in many application scenarios, such as bank branches, mobile terminals (e.g., mobile phones), and billboards. It should be noted that the face detection network of this embodiment can detect multiple faces at a time and simultaneously obtain each face image and its corresponding positioning data.
Specifically, the detection unit 12 may use a multi-task convolutional neural network (Multi-task Cascaded Convolutional Networks, MTCNN) to perform face detection on the image to be detected and obtain the face image and the positioning data of the face image.
The multi-task convolutional neural network uses a three-layer cascade architecture combined with convolutional neural network algorithms to detect faces and locate the key points (the coordinates of the centers of the two eye ellipses, the nose tip, and the two ends of the corners of the mouth). It consists of three networks: P-Net (Proposal Network), R-Net (Refine Network), and O-Net (Output Network). The fully convolutional P-Net identifies the image to be detected and obtains first candidate windows (windows marking face positions in the image to be detected) and bounding-box regression vectors; the offset of each first candidate window is computed from the regression vectors to determine the bounding windows, the first candidate windows are calibrated against these bounding windows, and Non-Maximum Suppression (NMS) removes overlapping windows to obtain second candidate windows. Since P-Net's detection is relatively coarse, R-Net is used to further refine the second candidate windows; R-Net is similar to P-Net, and the second candidate windows are input into R-Net for identification, filtering out false windows to locate the face region more precisely and generate third candidate windows. O-Net, which has one more convolutional layer than R-Net, supervises the third candidate windows and removes overlapping windows to confirm the face region; the position coordinates of the five facial key points can then be located from the confirmed face region.
the correction unit 13 is configured to correct the face image based on the positioning data to obtain an image to be predicted;
In this embodiment, in order to facilitate the subsequent gaze recognition on the image to be predicted, the face image needs to be converted into an easily recognized image to be predicted with the head straightened (for example, eyes facing straight ahead), so as to improve the accuracy of gaze recognition.
The correction unit 13 compares the positioning data with the standard coordinate data and performs a similarity transformation on the face image according to the comparison result to generate the image to be predicted.
It should be noted that the standard coordinate data is the pre-stored standard coordinates of the 5 key points, including: the labeled coordinates of the two eye-ellipse center points, the nose tip, and the two ends of the corners of the mouth.
In this embodiment, the positioning data is compared with the standard coordinate data to obtain the relative change, and similarity transformations such as rotation, translation, and scaling are applied to the face image based on that change, converting the face image into the image to be predicted so that it meets the requirements of gaze recognition. Compared with existing correction methods, which require a deep neural network model to compute the head rotation angle, the correction method adopted in this embodiment effectively reduces the amount of computation; since no head-rotation-angle estimation model needs to be trained, the computation cost is also greatly reduced.
the recognition unit 14 is configured to recognize the image to be predicted using the human eye gaze prediction neural network model and determine the gaze direction.
It should be noted that the human eye gaze prediction neural network model includes: a separable convolution module, an attention mechanism module, and a classification module;
The recognition unit 14 extracts a first facial feature from the image to be predicted through the separable convolution module; the separable convolution module may be combined with a forward residual module for this extraction. The forward residual module adds the features obtained by the separable convolution module to the initial features at the same positions, so that the network learns high-level features without forgetting useful low-level features.
Alternatively, the separable convolution module may be combined with an inverted residual module to extract the first facial feature. The depthwise convolution of the separable convolution module convolves each input channel of the image to be predicted with a single convolution kernel to obtain first feature maps; pointwise convolution then applies a 1×1 convolution to combine the first feature maps of the previous step with weights along the depth direction, obtaining more features. Combining the inverted residual module with the separable convolution module, a 1×1 cross-channel convolution is added between every two pointwise convolution channels for inter-channel information fusion, ensuring that more effective second feature maps are extracted; all the second feature maps are concatenated to obtain the first facial feature. In this embodiment, the inverted residual module lets the neural network learn high-order features without forgetting useful low-order features; at the same time, it has fewer parameters and computes faster than the forward residual module, greatly reducing the memory footprint.
The recognition unit 14 adjusts the weights of the first facial feature through the attention mechanism module to obtain feature weights that enhance the eye features; the attention mechanism module adopts a self-attention mechanism. Self-attention is a mechanism that, when computing a representation of a sequence, relates the weights to positions within the same sequence; it has proven very effective in machine reading comprehension, abstractive summarization, and image caption generation.
This embodiment may include multiple attention mechanism modules, each corresponding to a convolutional layer of the separable convolution module and located after that layer. The attention mechanism modules extract the convolutional features around the eyes; the output of each attention mechanism module serves as the input of the next, and the convolutional features refined by the last attention mechanism module are used as the feature weights (that is, feature weights that enhance the eye features). Adjusting the weights through the attention mechanism strengthens the extraction of features around the eyes on top of the first facial feature; eye features are then generated from the eyeball and eye-muscle features, yielding feature weights that can enhance the eye features.
The recognition unit 14 combines the first facial feature with the feature weights to generate a second facial feature and processes the second facial feature through the classification module to obtain the gaze direction.
The classification module uses a fully connected layer. The first facial feature is multiplied by the feature weights to generate the second facial feature, which is input into the fully connected layer; the fully connected layer integrates the second facial feature through a weight matrix and computes offset probability information from the integrated neurons. Each piece of offset probability information corresponds to a vertical offset and a horizontal offset of the line of sight, and the gaze direction is obtained from these offsets.
The input to the human eye gaze prediction neural network model is the entire face, which has two main advantages over the prior art. First, for prediction accuracy, changes in the muscles around the eyes help judge the gaze direction, whereas existing methods input only an eye picture and cannot use the surrounding information. Second, to obtain the eye picture, existing methods must build a 68-key-point face detection model to obtain the coordinates of the eye frame, which is computationally expensive and costly; this technical solution requires little computation, occupies little storage space, and has low cost.
In this embodiment, the image recognition device 1 performs face detection on the acquired image to be detected to obtain the face image and its positioning data, and corrects the face image according to the positioning data to obtain the image to be predicted for image recognition, reducing the amount of computation; the human eye gaze prediction neural network model then recognizes the image to be predicted to determine the gaze direction. Recognition is fast and takes little time, and the adopted model occupies little memory and computes quickly.
In practical applications, compared with eye recognition systems that use infrared cameras, the method needs only a monocular camera to complete image acquisition, reducing equipment cost; at the same time, it requires no manual calibration, so it can be widely used in various scenarios such as bank branches and personal mobile phones. Compared with other eye recognition systems that use monocular cameras, the method requires only two models, and the gaze prediction neural network model has fewer parameters than existing eye recognition models, which greatly speeds up a single gaze-recognition computation; it can run analysis in real time on an NVIDIA 1080 GPU. The gaze prediction neural network model occupies less than 8 MB of memory, whereas existing eye recognition models usually occupy more than 100 MB.
The image recognition device 1 in this embodiment can be applied to emotion analysis: for example, erratic eye movement when a person is nervous or lying can serve as a feature for anti-fraud judgment; it can also be used to analyze areas of customer interest, such as billboards; and it can be applied in mini games for eye recognition or game interaction.
Embodiment 3
To achieve the above objectives, this application also provides a computer device 2, which may comprise multiple computer devices 2; the components of the image recognition device 1 of Embodiment 2 can be distributed among different computer devices 2. The computer device 2 may be a smartphone, a tablet, a laptop, a desktop computer, a rack server, a blade server, a tower server, or a cabinet server (either an independent server or a server cluster composed of multiple servers) that executes programs. The computer device 2 of this embodiment includes at least, but is not limited to: a memory and a processor. Optionally, the computer device 2 may also include a network interface and/or the image recognition device. For example, the computer device 2 may include a memory 21, a processor 23, a network interface 22, and the image recognition device 1, which may be communicatively connected to one another through a system bus (refer to FIG. 4). It should be pointed out that FIG. 4 only shows a computer device 2 with some of the components, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead.
In this embodiment, the memory 21 includes at least one type of computer-readable storage medium, including flash memory, hard disks, multimedia cards, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, and the like. In some embodiments, the memory 21 may be an internal storage unit of the computer device 2, such as its hard disk or internal memory. In other embodiments, the memory 21 may be an external storage device of the computer device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the computer device 2. Of course, the memory 21 may also include both the internal storage unit of the computer device 2 and its external storage device. In this embodiment, the memory 21 is generally used to store the operating system and various application software installed on the computer device 2, such as the program code of the image recognition method of Embodiment 1. In addition, the memory 21 may also be used to temporarily store various types of data that have been output or will be output.
In some embodiments, the processor 23 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 23 is generally used to control the overall operation of the computer device 2, for example, performing control and processing related to data interaction or communication with the computer device 2. In this embodiment, the processor 23 is used to run program code stored in the memory 21 or to process data, for example, to run the image recognition device 1.
The network interface 22 may include a wireless network interface or a wired network interface, and is generally used to establish communication connections between the computer device 2 and other computer devices 2. For example, the network interface 22 is used to connect the computer device 2 with an external terminal through a network, establishing data transmission channels and communication connections between the computer device 2 and the external terminal. The network may be an intranet, the Internet, a Global System for Mobile Communications (GSM) network, Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, Wi-Fi, or another wireless or wired network.
It should be pointed out that FIG. 4 only shows the computer device 2 with components 21-23, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead.
In this embodiment, the image recognition device 1 stored in the memory 21 may also be divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (the processor 23 in this embodiment) to complete this application.
Embodiment 4
To achieve the above objectives, this application also provides a computer-readable storage medium, which includes multiple storage media, such as flash memory, hard disks, multimedia cards, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, servers, and app stores, on which a computer program is stored that realizes the corresponding functions when executed by the processor 23. The computer-readable storage medium of this embodiment is used to store the image recognition device 1, and when executed by the processor 23 it implements the image recognition method of Embodiment 1.
Optionally, the storage medium involved in this application, such as the computer-readable storage medium, may be non-volatile or volatile.
The above serial numbers of the embodiments of this application are for description only and do not represent the relative merits of the embodiments.
Through the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation.
The above are only preferred embodiments of this application and do not thereby limit its patent scope; any equivalent structural or process transformation made using the contents of the specification and drawings of this application, whether applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of this application.

Claims (20)

  1. An image recognition method, comprising:
    acquiring an image to be detected;
    performing face detection on the image to be detected to obtain a face image and positioning data of the face image;
    correcting the face image based on the positioning data to obtain an image to be predicted;
    recognizing the image to be predicted using a human eye gaze prediction neural network model to determine the gaze direction.
  2. The image recognition method according to claim 1, wherein performing face detection on the image to be detected to obtain the face image and the positioning data of the face image comprises:
    using a multi-task convolutional neural network to perform face detection on the image to be detected to obtain the face image and the positioning data of the face image.
  3. The image recognition method according to claim 1 or 2, wherein the positioning data comprises: the coordinates of the center points of the two eye ellipses, the coordinates of the nose tip, and the coordinates of the two ends of the corners of the mouth.
  4. The image recognition method according to claim 1, wherein correcting the face image based on the positioning data to obtain the image to be predicted comprises:
    comparing the positioning data with standard coordinate data, and performing a similarity transformation on the face image according to the comparison result to generate the image to be predicted.
  5. The image recognition method according to claim 1, wherein the human eye gaze prediction neural network model comprises: a separable convolution module, an attention mechanism module, and a classification module;
    recognizing the image to be predicted using the human eye gaze prediction neural network model to determine the gaze direction comprises:
    extracting a first facial feature from the image to be predicted through the separable convolution module;
    adjusting the weights of the first facial feature through the attention mechanism module to obtain feature weights that enhance the eye feature weights;
    combining the first facial feature with the feature weights to generate a second facial feature, and processing the second facial feature through the classification module to obtain the gaze direction.
  6. The image recognition method according to claim 5, wherein the separable convolution module is combined with a forward residual module to perform the first facial feature extraction on the image to be predicted; the forward residual module is used to add the features obtained by the separable convolution module to the initial features at the same positions to obtain the first facial feature.
  7. The image recognition method according to claim 5, wherein the separable convolution module is combined with an inverted residual module to perform the first facial feature extraction on the image to be predicted; the inverted residual module is combined with the separable convolution module, and a 1×1 cross-channel convolution is added between every two pointwise convolution channels for inter-channel information fusion to obtain the first facial feature.
  8. An image recognition apparatus, comprising:
    a receiving unit, configured to acquire an image to be detected;
    a detection unit, configured to perform face detection on the image to be detected and obtain a face image and positioning data of the face image;
    a correction unit, configured to correct the face image based on the positioning data to obtain an image to be predicted;
    a recognition unit, configured to recognize the image to be predicted using a human eye gaze prediction neural network model and determine the gaze direction.
  9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein when the processor executes the computer program an image recognition method is implemented, the image recognition method comprising the following steps:
    acquiring an image to be detected;
    performing face detection on the image to be detected to obtain a face image and positioning data of the face image;
    correcting the face image based on the positioning data to obtain an image to be predicted;
    recognizing the image to be predicted using a human eye gaze prediction neural network model to determine the gaze direction.
  10. The computer device according to claim 9, wherein performing face detection on the image to be detected to obtain the face image and the positioning data of the face image comprises:
    using a multi-task convolutional neural network to perform face detection on the image to be detected to obtain the face image and the positioning data of the face image.
  11. The computer device according to claim 9, wherein correcting the face image based on the positioning data to obtain the image to be predicted comprises:
    comparing the positioning data with standard coordinate data, and performing a similarity transformation on the face image according to the comparison result to generate the image to be predicted.
  12. The computer device according to claim 9, wherein the human eye gaze prediction neural network model comprises: a separable convolution module, an attention mechanism module, and a classification module;
    recognizing the image to be predicted using the human eye gaze prediction neural network model to determine the gaze direction comprises:
    extracting a first facial feature from the image to be predicted through the separable convolution module;
    adjusting the weights of the first facial feature through the attention mechanism module to obtain feature weights that enhance the eye feature weights;
    combining the first facial feature with the feature weights to generate a second facial feature, and processing the second facial feature through the classification module to obtain the gaze direction.
  13. The computer device according to claim 12, wherein extracting the first facial feature from the image to be predicted through the separable convolution module comprises:
    combining the separable convolution module with a forward residual module to perform the first facial feature extraction on the image to be predicted, including using the forward residual module to add the features obtained by the separable convolution module to the initial features at the same positions to obtain the first facial feature.
  14. The computer device according to claim 12, wherein extracting the first facial feature from the image to be predicted through the separable convolution module comprises:
    combining the separable convolution module with an inverted residual module to perform the first facial feature extraction on the image to be predicted, including combining the inverted residual module with the separable convolution module and adding a 1×1 cross-channel convolution between every two pointwise convolution channels for inter-channel information fusion to obtain the first facial feature.
  15. A computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed by a processor an image recognition method is implemented, the image recognition method comprising the following steps:
    acquiring an image to be detected;
    performing face detection on the image to be detected to obtain a face image and positioning data of the face image;
    correcting the face image based on the positioning data to obtain an image to be predicted;
    recognizing the image to be predicted using a human eye gaze prediction neural network model to determine the gaze direction.
  16. The computer-readable storage medium according to claim 15, wherein performing face detection on the image to be detected to obtain the face image and the positioning data of the face image comprises:
    using a multi-task convolutional neural network to perform face detection on the image to be detected to obtain the face image and the positioning data of the face image.
  17. The computer-readable storage medium according to claim 15, wherein correcting the face image based on the positioning data to obtain the image to be predicted comprises:
    comparing the positioning data with standard coordinate data, and performing a similarity transformation on the face image according to the comparison result to generate the image to be predicted.
  18. The computer-readable storage medium according to claim 15, wherein the human eye gaze prediction neural network model comprises: a separable convolution module, an attention mechanism module, and a classification module;
    recognizing the image to be predicted using the human eye gaze prediction neural network model to determine the gaze direction comprises:
    extracting a first facial feature from the image to be predicted through the separable convolution module;
    adjusting the weights of the first facial feature through the attention mechanism module to obtain feature weights that enhance the eye feature weights;
    combining the first facial feature with the feature weights to generate a second facial feature, and processing the second facial feature through the classification module to obtain the gaze direction.
  19. The computer-readable storage medium according to claim 18, wherein extracting the first facial feature from the image to be predicted through the separable convolution module comprises:
    combining the separable convolution module with a forward residual module to perform the first facial feature extraction on the image to be predicted, including using the forward residual module to add the features obtained by the separable convolution module to the initial features at the same positions to obtain the first facial feature.
  20. The computer-readable storage medium according to claim 18, wherein extracting the first facial feature from the image to be predicted through the separable convolution module comprises:
    combining the separable convolution module with an inverted residual module to perform the first facial feature extraction on the image to be predicted, including combining the inverted residual module with the separable convolution module and adding a 1×1 cross-channel convolution between every two pointwise convolution channels for inter-channel information fusion to obtain the first facial feature.
PCT/CN2021/071172 2020-02-28 2021-01-12 Image recognition method and apparatus, computer device, and storage medium WO2021169637A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010127177.3A CN111310705A (zh) 2020-02-28 2020-02-28 Image recognition method and apparatus, computer device, and storage medium
CN202010127177.3 2020-02-28

Publications (1)

Publication Number Publication Date
WO2021169637A1 true WO2021169637A1 (zh) 2021-09-02

Family

ID=71149407

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/071172 WO2021169637A1 (zh) Image recognition method and apparatus, computer device, and storage medium 2020-02-28 2021-01-12

Country Status (2)

Country Link
CN (1) CN111310705A (zh)
WO (1) WO2021169637A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114115535A (zh) * 2021-11-12 2022-03-01 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Eye tracking and recognition method and system based on the Yinhe Ruihua mobile operating system
CN114360042A (zh) * 2022-01-07 2022-04-15 桂林电子科技大学 Human eye gaze direction prediction method and system
CN116912924A (zh) * 2023-09-12 2023-10-20 深圳须弥云图空间科技有限公司 Target image recognition method and apparatus

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111310705A (zh) 2020-02-28 2020-06-19 深圳壹账通智能科技有限公司 Image recognition method and apparatus, computer device, and storage medium
CN111639537A (zh) 2020-04-29 2020-09-08 深圳壹账通智能科技有限公司 Facial action unit recognition method and apparatus, electronic device, and storage medium
CN111767846B (zh) 2020-06-29 2024-08-23 北京百度网讯科技有限公司 Image recognition method, apparatus, device, and computer storage medium
CN111710109A (zh) 2020-07-01 2020-09-25 中国银行股份有限公司 Cash withdrawal control method, apparatus, and system
CN112464793B (zh) 2020-11-25 2024-09-06 东软教育科技集团有限公司 Online exam cheating behavior detection method, system, and storage medium
CN112749655B (zh) 2021-01-05 2024-08-02 风变科技(深圳)有限公司 Gaze tracking method and apparatus, computer device, and storage medium
CN113111745B (zh) 2021-03-30 2023-04-07 四川大学 Eye-movement recognition method for product attention based on OpenPose
CN112801069B (zh) 2021-04-14 2021-06-29 四川翼飞视科技有限公司 Facial key feature point detection apparatus, method, and storage medium
CN114706484B (zh) 2022-04-18 2024-08-09 Oppo广东移动通信有限公司 Gaze coordinate determination method and apparatus, computer-readable medium, and electronic device
CN114898447B (zh) 2022-07-13 2022-10-11 北京科技大学 Personalized gaze point detection method and apparatus based on a self-attention mechanism
CN117132869A (zh) 2023-08-28 2023-11-28 广州视景医疗软件有限公司 Training of a gaze deviation estimation model and method and apparatus for correcting gaze deviation values

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102930278A (zh) * 2012-10-16 2013-02-13 天津大学 Human eye gaze estimation method and apparatus
CN104978548A (zh) * 2014-04-02 2015-10-14 汉王科技股份有限公司 Gaze estimation method and apparatus based on a three-dimensional active shape model
CN109492514A (zh) * 2018-08-28 2019-03-19 初速度(苏州)科技有限公司 Method and system for capturing human eye gaze direction with a single camera
US20190110003A1 (en) * 2017-10-11 2019-04-11 Wistron Corporation Image processing method and system for eye-gaze correction
CN110678873A (zh) * 2019-07-30 2020-01-10 珠海全志科技股份有限公司 Attention detection method based on a cascaded neural network, computer apparatus, and computer-readable storage medium
CN111310705A (zh) * 2020-02-28 2020-06-19 深圳壹账通智能科技有限公司 Image recognition method and apparatus, computer device, and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107748858A (zh) * 2017-06-15 2018-03-02 华南理工大学 Multi-pose eye localization method based on cascaded convolutional neural networks
CN109740491B (zh) * 2018-12-27 2021-04-09 北京旷视科技有限公司 Human eye gaze recognition method, apparatus, system, and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102930278A (zh) * 2012-10-16 2013-02-13 天津大学 Human eye gaze estimation method and apparatus
CN104978548A (zh) * 2014-04-02 2015-10-14 汉王科技股份有限公司 Gaze estimation method and apparatus based on a three-dimensional active shape model
US20190110003A1 (en) * 2017-10-11 2019-04-11 Wistron Corporation Image processing method and system for eye-gaze correction
CN109492514A (zh) * 2018-08-28 2019-03-19 初速度(苏州)科技有限公司 Method and system for capturing human eye gaze direction with a single camera
CN110678873A (zh) * 2019-07-30 2020-01-10 珠海全志科技股份有限公司 Attention detection method based on a cascaded neural network, computer apparatus, and computer-readable storage medium
CN111310705A (zh) * 2020-02-28 2020-06-19 深圳壹账通智能科技有限公司 Image recognition method and apparatus, computer device, and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114115535A (zh) * 2021-11-12 2022-03-01 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Eye tracking and recognition method and system based on the Yinhe Ruihua mobile operating system
CN114360042A (zh) * 2022-01-07 2022-04-15 桂林电子科技大学 Human eye gaze direction prediction method and system
CN116912924A (zh) * 2023-09-12 2023-10-20 深圳须弥云图空间科技有限公司 Target image recognition method and apparatus
CN116912924B (zh) * 2023-09-12 2024-01-05 深圳须弥云图空间科技有限公司 Target image recognition method and apparatus

Also Published As

Publication number Publication date
CN111310705A (zh) 2020-06-19

Similar Documents

Publication Publication Date Title
WO2021169637A1 (zh) Image recognition method and apparatus, computer device, and storage medium
CN109558832B (zh) Human posture detection method, apparatus, device, and storage medium
US10504268B1 (en) Systems and methods for generating facial expressions in a user interface
WO2021078157A1 (zh) Image processing method and apparatus, electronic device, and storage medium
Yang et al. Benchmarking commercial emotion detection systems using realistic distortions of facial image datasets
US10318797B2 (en) Image processing apparatus and image processing method
US20230021661A1 (en) Forgery detection of face image
US20180204094A1 (en) Image recognition method and apparatus
US20230049533A1 (en) Image gaze correction method, apparatus, electronic device, computer-readable storage medium, and computer program product
CN111553267B (zh) Image processing method, image processing model training method, and device
CN112395979B (zh) Image-based health state recognition method, apparatus, device, and storage medium
US20230081982A1 (en) Image processing method and apparatus, computer device, storage medium, and computer program product
WO2023098128A1 (zh) Liveness detection method and apparatus, and training method and apparatus for a liveness detection system
Yang et al. PipeNet: Selective modal pipeline of fusion network for multi-modal face anti-spoofing
Cai et al. Semi-supervised natural face de-occlusion
WO2021051547A1 (zh) Violent behavior detection method and system
US20230095182A1 (en) Method and apparatus for extracting biological features, device, medium, and program product
CN111680550B (zh) Emotion information recognition method and apparatus, storage medium, and computer device
CN111108508B (zh) Facial emotion recognition method, intelligent device, and computer-readable storage medium
CN111902821A (zh) Detecting actions to prevent recognition
CN115115552B (zh) Image correction model training and image correction method, apparatus, and computer device
CN112419326A (zh) Image segmentation data processing method, apparatus, device, and storage medium
CN111444928A (zh) Key point detection method and apparatus, electronic device, and storage medium
KR102594093B1 (ko) Dermatological procedure recommendation system and method using deep learning technology
CN112580395A (zh) 3D face liveness recognition method, system, device, and medium based on depth information

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21759909

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12.01.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21759909

Country of ref document: EP

Kind code of ref document: A1