WO2020233333A1 - Image processing method and device - Google Patents

Image processing method and device

Info

Publication number
WO2020233333A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
key point
coordinates
image
face image
Prior art date
Application number
PCT/CN2020/086304
Other languages
English (en)
French (fr)
Inventor
杨赟
李松江
遇冰
冯柏岚
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to EP20808836.9A (publication EP3965003A4)
Publication of WO2020233333A1
Priority to US17/530,688 (publication US20220076000A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211Selection of the most significant subset of features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/02Affine transformations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/59Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V20/597Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Definitions

  • the embodiments of the present application relate to image processing technology, and in particular to an image processing method and device.
  • Face key point detection, also called face alignment, refers to taking a face image as input and obtaining the coordinates of pre-defined key points through computer vision algorithms, such as the corners of the eyes, the corners of the mouth, the tip of the nose, and the contour of the face.
  • In other words, the face image is processed to predict the positions of key points such as the eyes, the nose, and the mouth.
  • the embodiments of the present application provide an image processing method and device to improve the accuracy of positioning key points of a human face.
  • an embodiment of the present application provides an image processing method.
  • The method may include: acquiring a face image; acquiring a left face image and a right face image respectively according to the face image, where the sizes of the left face image and the right face image are the same as the size of the face image; inputting the left face image into the first target key point convolutional neural network model and outputting the coordinates of the first left face key point, where the first target key point convolutional neural network model is obtained by training a key point convolutional neural network model using left face images with key point information; inputting the right face image into the second target key point convolutional neural network model and outputting the coordinates of the first right face key point, where the second target key point convolutional neural network model is obtained by training a key point convolutional neural network model using right face images with key point information; and obtaining the coordinates of the face key points of the face image according to the coordinates of the first left face key point and the coordinates of the first right face key point.
  • The first target key point convolutional neural network model is used to process the left face image, and the second target key point convolutional neural network model is used to process the right face image. Because each model handles only half of the face, the half-face positioning accuracy is high, and the structural features of the face can be exploited to improve the positioning accuracy of the face key points; a minimal sketch of this split-and-merge pipeline is given below.
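As a concrete illustration of the split-predict-merge flow just described, here is a minimal Python sketch. The `left_model`, `right_model`, `left_idx`, and `right_idx` names are hypothetical stand-ins (each model is assumed to return an (L, 2) coordinate array for a half-face image resized to the full face-image size), and splitting at the vertical midline is a simplification; none of these details are specified by the patent.

```python
import cv2
import numpy as np

def detect_keypoints(face_img, left_model, right_model, left_idx, right_idx):
    """Split a face image into left/right halves, run the two half-face models,
    and merge their predictions into full-face key point coordinates."""
    h, w = face_img.shape[:2]

    # Crop each half and resize it back to the face-image size, matching the
    # statement that the half images have the same size as the face image.
    left_half = cv2.resize(face_img[:, : w // 2], (w, h))
    right_half = cv2.resize(face_img[:, w // 2 :], (w, h))

    left_kpts = np.asarray(left_model(left_half), dtype=np.float64)     # (L, 2)
    right_kpts = np.asarray(right_model(right_half), dtype=np.float64)  # (L, 2)

    # Map half-image coordinates back to full-image coordinates.
    left_kpts[:, 0] *= 0.5
    right_kpts[:, 0] = right_kpts[:, 0] * 0.5 + w / 2.0

    # Keep left-side landmarks from the left model and right-side landmarks
    # from the right model (index sets are illustrative, not from the patent).
    merged = np.empty_like(left_kpts)
    merged[left_idx] = left_kpts[left_idx]
    merged[right_idx] = right_kpts[right_idx]
    return merged
```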
  • The left face images with key point information and the right face images with key point information are obtained from face images with different posture information, and the face images with different posture information have corresponding key point information.
  • The face images with different posture information include a face image with first posture information, a face image with second posture information, and a face image with third posture information; the first posture information indicates that the face is deflected to the left, the second posture information indicates that the face is facing forward, and the third posture information indicates that the face is deflected to the right.
  • the first target key point convolutional neural network model and the second target key point convolutional neural network model are obtained after training the key point convolutional neural network model using face images with different pose information. In this way, the influence of face images with different posture information on the optimization of the key point convolutional neural network model can be balanced, and the accuracy of the key point positioning of the face can be effectively improved.
  • Obtaining the coordinates of the face key points of the face image according to the coordinates of the first left face key point and the coordinates of the first right face key point may include: determining the first affine transformation matrix according to the coordinates of the first left face key point; obtaining the corrected left face image according to the first affine transformation matrix and the left face image; inputting the corrected left face image into the third target key point convolutional neural network model and outputting the coordinates of the corrected first left face key point; obtaining the coordinates of the second left face key point according to the coordinates of the corrected first left face key point and the inverse transformation of the first affine transformation matrix; and obtaining the coordinates of the face key points of the face image according to the coordinates of the second left face key point and the coordinates of the first right face key point.
  • The first target key point convolutional neural network model is used to process the left face image, the left face image is corrected according to the output result of the first target key point convolutional neural network model, and the third target key point convolutional neural network model is used to process the corrected left face image. This can improve the positioning accuracy of the left face key points and thereby the accuracy of face key point positioning; a minimal sketch of this correct-and-refine step is given below.
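Below is a minimal sketch of the correct-and-refine step for the left half, assuming OpenCV affine helpers, a hypothetical `third_model` callable, and a `canonical_kpts` mean key point layout used to estimate the first affine transformation matrix; these names and the use of `estimateAffinePartial2D` are illustrative choices, not details from the patent.

```python
import cv2
import numpy as np

def refine_left_keypoints(left_img, first_left_kpts, canonical_kpts, third_model):
    """Correct the left face image with an affine transform estimated from the
    first left face key points, re-predict, and map back via the inverse."""
    h, w = left_img.shape[:2]

    # First affine transformation matrix: first left key points -> canonical layout.
    M, _ = cv2.estimateAffinePartial2D(
        first_left_kpts.astype(np.float32), canonical_kpts.astype(np.float32))

    # Corrected left face image.
    corrected = cv2.warpAffine(left_img, M, (w, h))

    # Coordinates of the corrected first left face key points.
    corrected_kpts = np.asarray(third_model(corrected), dtype=np.float64)  # (L, 2)

    # The inverse transform maps the refined points back to the original image,
    # giving the second left face key points.
    M_inv = cv2.invertAffineTransform(M)
    ones = np.ones((len(corrected_kpts), 1))
    second_left_kpts = np.hstack([corrected_kpts, ones]) @ M_inv.T
    return second_left_kpts
```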
  • Obtaining the coordinates of the face key points of the face image may include: determining the second affine transformation matrix according to the coordinates of the first right face key point; obtaining the corrected right face image according to the second affine transformation matrix and the right face image; inputting the corrected right face image into the fourth target key point convolutional neural network model and outputting the coordinates of the corrected first right face key point; obtaining the coordinates of the second right face key point according to the coordinates of the corrected first right face key point and the inverse transformation of the second affine transformation matrix; and obtaining the coordinates of the face key points of the face image according to the coordinates of the second right face key point and the coordinates of the first left face key point.
  • The right face image is processed using the second target key point convolutional neural network model, the right face image is corrected according to the output result of the second target key point convolutional neural network model, and the fourth target key point convolutional neural network model is used to process the corrected right face image, which can improve the positioning accuracy of the right face key points and thereby the accuracy of face key point positioning.
  • Obtaining the coordinates of the face key points of the face image may include: determining the first affine transformation matrix according to the coordinates of the first left face key point, and determining the second affine transformation matrix according to the coordinates of the first right face key point; obtaining the corrected left face image according to the first affine transformation matrix and the left face image, and obtaining the corrected right face image according to the second affine transformation matrix and the right face image; inputting the corrected left face image into the third target key point convolutional neural network model and outputting the coordinates of the corrected first left face key point, and inputting the corrected right face image into the fourth target key point convolutional neural network model and outputting the coordinates of the corrected first right face key point; obtaining the coordinates of the second left face key point according to the coordinates of the corrected first left face key point and the inverse transformation of the first affine transformation matrix, and obtaining the coordinates of the second right face key point according to the coordinates of the corrected first right face key point and the inverse transformation of the second affine transformation matrix; and obtaining the coordinates of the face key points of the face image according to the coordinates of the second left face key point and the coordinates of the second right face key point.
  • The left face image is processed using the first target key point convolutional neural network model, the left face image is corrected according to the output result of the first target key point convolutional neural network model, and the third target key point convolutional neural network model is used to process the corrected left face image, which can improve the positioning accuracy of the left face key points; similarly, the right face image is corrected according to the output result of the second target key point convolutional neural network model, and the fourth target key point convolutional neural network model is used to process the corrected right face image, which can improve the positioning accuracy of the right face key points, thereby improving the accuracy of face key point positioning.
  • The method may further include: classifying a plurality of training samples based on pose information to obtain s training sample sets, where the training samples include face images with key point information; selecting multiple training samples from at least three of the s training sample sets as training data; and using the training data to train two key point convolutional neural network models to obtain the first target key point convolutional neural network model and the second target key point convolutional neural network model, where s is any integer greater than or equal to 3.
  • the selection of training data can increase the convergence speed of the model and the training speed of the model.
  • Selecting training data based on posture information enables the training data to balance the impact of faces at various angles on model optimization and improves the positioning accuracy of face key points; for example, the positioning accuracy of key points in a face image with a large deflection angle can be improved.
  • Acquiring a face image may include: acquiring a to-be-processed image through the photographing function or the video recording function of the terminal, and cropping the face image from the to-be-processed image.
  • the method may further include: determining the driver's behavior according to the coordinates of the key points of the human face, and determining whether to issue an alarm signal according to the driver's behavior.
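The patent does not prescribe how driver behavior is derived from the key point coordinates; as one hedged illustration, the sketch below applies the common eye-aspect-ratio drowsiness heuristic, with assumed landmark orderings, index sets, and threshold.

```python
import numpy as np

def eye_aspect_ratio(eye_pts):
    """eye_pts: (6, 2) eye landmarks ordered corner, upper x2, corner, lower x2."""
    vertical = (np.linalg.norm(eye_pts[1] - eye_pts[5]) +
                np.linalg.norm(eye_pts[2] - eye_pts[4]))
    horizontal = 2.0 * np.linalg.norm(eye_pts[0] - eye_pts[3])
    return vertical / horizontal

def should_alarm(face_kpts, left_eye_idx, right_eye_idx, threshold=0.2):
    """Return True when both eyes appear closed (illustrative rule only)."""
    ear = 0.5 * (eye_aspect_ratio(face_kpts[left_eye_idx]) +
                 eye_aspect_ratio(face_kpts[right_eye_idx]))
    return ear < threshold
```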
  • the method may further include: adjusting the image to be processed according to the coordinates of the key points of the face and the beauty effect parameters, and displaying the adjusted image to be processed on the image preview interface;
  • the beauty effect parameters include at least one or a combination of virtual decoration parameters, face-lift parameters, eye size adjustment parameters, dermabrasion and acne removal parameters, skin whitening parameters, teeth whitening parameters, and blush parameters.
  • Before displaying the adjusted image to be processed, the method may further include: obtaining a key point face image according to the coordinates of the face key points and the face image, where the face key points are marked on the key point face image;
  • the key point face image is displayed on the image preview interface;
  • a key point adjustment instruction input by the user is received, and the key point adjustment instruction is used to indicate the adjusted face key points;
  • adjusting the image to be processed according to the coordinates of the face key points and the beauty effect parameters may include: adjusting the image to be processed according to the adjusted face key points and the beauty effect parameters.
  • the method may further include: performing face recognition according to the coordinates of the key points of the face.
  • performing face recognition according to the coordinates of the key points of the face may include: extracting features of the face image according to the coordinates of the key points of the face to obtain the features of the face image; Matching the features of the face image with the feature template in the database, and outputting the recognition result.
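A minimal sketch of this extract-and-match step, assuming a hypothetical `extract_features` embedding function (which would internally use the key point coordinates to align or crop the face) and a dictionary of stored feature templates; cosine similarity and the acceptance threshold are illustrative choices.

```python
import numpy as np

def recognize(face_img, face_kpts, extract_features, templates, threshold=0.6):
    """templates: dict mapping identity -> stored feature vector."""
    feat = extract_features(face_img, face_kpts)   # align/crop using key points, then embed
    feat = feat / np.linalg.norm(feat)

    best_id, best_score = None, -1.0
    for identity, tmpl in templates.items():
        score = float(np.dot(feat, tmpl / np.linalg.norm(tmpl)))  # cosine similarity
        if score > best_score:
            best_id, best_score = identity, score

    # Output the recognition result, or None when no template matches well enough.
    return best_id if best_score >= threshold else None
```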
  • an embodiment of the present application provides an image processing method.
  • The method may include: acquiring left face images with key point information and right face images with key point information according to face images with different posture information, where each face image with posture information has corresponding key point information; using the left face images with key point information to train a key point convolutional neural network model to obtain the first target key point convolutional neural network model, where the first target key point convolutional neural network model is used to process an input left face image and output the coordinates of the left face key points; and using the right face images with key point information to train a key point convolutional neural network model to obtain the second target key point convolutional neural network model, where the second target key point convolutional neural network model is used to process an input right face image and output the coordinates of the right face key points; the posture information is used to reflect the deflection angle of the face.
  • The face images with different posture information include a face image with first posture information, a face image with second posture information, and a face image with third posture information; the first posture information indicates that the face is deflected to the left, the second posture information indicates that the face is facing forward, and the third posture information indicates that the face is deflected to the right.
  • The method may further include: classifying a plurality of training samples based on posture information to obtain s training sample sets, where the training samples include face images with key point information; and selecting a plurality of training samples from at least three of the s training sample sets as the face images with different posture information, where s is any integer greater than or equal to 3.
  • an image processing device may be a terminal device or a chip in the terminal device.
  • The device has the function of implementing the terminal device functions involved in the above embodiments. This function can be implemented by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more units corresponding to the above-mentioned functions.
  • When the device is a terminal device, the device may include an acquisition module and a processing module; the acquisition module and the processing module may be, for example, a processor, and the acquisition module may be connected to a transceiver module.
  • the transceiver module may be, for example, a transceiver, and the transceiver may include a radio frequency circuit and a baseband circuit.
  • the device may further include a storage unit, and the storage unit may be a memory, for example.
  • the storage unit is used to store computer execution instructions
  • The acquisition module and the processing module are connected to the storage unit and execute the computer execution instructions stored in the storage unit, so that the terminal device executes the above-mentioned image processing methods involving terminal device functions.
  • When the device is a chip in a terminal device, the chip includes: a processing module and a transceiver module.
  • the processing module may be a processor, for example, and the transceiver module may be on the chip.
  • the device may further include a storage unit, and the processing module can execute computer-executable instructions stored in the storage unit, so that the chip in the terminal device executes any one of the above-mentioned image processing methods involving terminal device functions.
  • the storage unit is a storage unit in the chip, such as a register, a cache, etc.
  • The storage unit may also be a storage unit located outside the chip in the terminal device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM), etc.
  • The processor mentioned in any of the above can be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits used to control the execution of the programs of the above-mentioned image processing methods.
  • the present application provides an image processing device.
  • the device may be a training device or a chip in the training device.
  • the device has the function of realizing the various embodiments of the training equipment related to the above aspects. This function can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more units corresponding to the above-mentioned functions.
  • When the device is a training device, the device may include a processing module and a transceiver module; the processing module may be, for example, a processor, the transceiver module may be, for example, a transceiver, and the transceiver includes a radio frequency circuit.
  • the device further includes a storage unit.
  • the storage unit may be a memory, for example.
  • When the device is a chip in a training device, the chip includes: a processing module and a transceiver module.
  • the processing module may be a processor, for example, and the transceiver module may be on the chip.
  • The processing module can execute the computer-executable instructions stored in the storage unit, so that the chip in the training device executes the above-mentioned image processing methods related to the training device.
  • The storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip in the training device, such as a ROM or another type of static storage device that can store static information and instructions, a RAM, etc.
  • the processor mentioned in any one of the foregoing may be a CPU, a microprocessor, an ASIC, or one or more integrated circuits used to control the execution of the program of the foregoing image processing method.
  • A computer storage medium is provided, in which program code is stored, and the program code is used to instruct execution of the method in any one of the first aspect to the second aspect or any possible implementation manner thereof.
  • a processor configured to be coupled with a memory, and configured to execute any one of the foregoing first to second aspects or a method in any possible implementation manner thereof.
  • a computer program product containing instructions which when running on a computer, causes the computer to execute any one of the first to second aspects or the method in any possible implementation manner thereof.
  • The image processing method and device in the embodiments of the present application acquire a face image, input the face image into the target key point convolutional neural network model, and output the coordinates of the face key points, where the target key point convolutional neural network model is obtained after training the key point convolutional neural network model using face images with different posture information.
  • the posture information is used to reflect the deflection angle of the face, so that the accuracy of the key point positioning of the face can be improved.
  • FIG. 1 is a schematic diagram of key points of a human face according to an embodiment of the application
  • Fig. 2 is a schematic diagram of key points of two-dimensional and three-dimensional faces according to an embodiment of the application;
  • FIG. 3A is a schematic diagram of a network architecture of an image processing method according to an embodiment of the application.
  • FIG. 3B is a schematic diagram of another network architecture of the image processing method according to an embodiment of the application.
  • FIG. 4A is a schematic diagram of the training data construction process of the image processing method according to an embodiment of the application.
  • FIG. 4B is a schematic diagram of training data construction of the image processing method according to an embodiment of the application.
  • FIG. 5A is a schematic diagram of the distribution of key points of the face without using the GPA algorithm to process training samples according to an embodiment of the application;
  • FIG. 5B is a schematic diagram of the distribution of key points on the face after using the GPA algorithm in the image processing method of the embodiment of the application;
  • FIG. 6 is a schematic diagram of the distribution of the training sample set of the image processing method according to the embodiment of the application.
  • FIG. 7A is a schematic diagram of a key point convolutional neural network model of an embodiment of the application.
  • FIG. 7B is a flowchart of a method for training a key point convolutional neural network model according to an embodiment of the application
  • FIGS. 7C and 7D are schematic diagrams of the network structure of ResNet50 according to an embodiment of the application.
  • FIG. 8 is a schematic diagram of key point convolutional neural network model training according to an embodiment of the application.
  • FIG. 9 is a schematic diagram of key point convolutional neural network model training according to an embodiment of the application.
  • FIG. 10 is a flowchart of an image processing method according to an embodiment of the application.
  • FIG. 11A is a flowchart of another image processing method according to an embodiment of the application.
  • FIG. 11B is a schematic diagram of another image processing method according to an embodiment of the application.
  • FIG. 12A is a flowchart of another image processing method according to an embodiment of the application.
  • FIG. 12B is a schematic diagram of another image processing method according to an embodiment of the application.
  • FIG. 13 is a schematic diagram of an application scenario of the image processing method according to an embodiment of the application.
  • FIGS. 14A to 14C are schematic diagrams of interfaces of an application scenario of the image processing method according to an embodiment of the application.
  • FIG. 15 is a schematic structural diagram of a terminal device according to an embodiment of the application.
  • FIG. 16 is a schematic structural diagram of another terminal device according to an embodiment of the application.
  • FIG. 17 is a schematic structural diagram of another terminal device according to an embodiment of the application.
  • FIG. 18 is a schematic structural diagram of another terminal device according to an embodiment of this application.
  • FIG. 19 is a schematic structural diagram of another terminal device according to an embodiment of the application.
  • FIG. 20 is a structural block diagram when the terminal device of an embodiment of the application is a mobile phone
  • FIG. 21 is a schematic structural diagram of a training device according to an embodiment of the application.
  • FIG. 22 is a schematic structural diagram of another training device according to an embodiment of the application.
  • FIG. 23 is a schematic structural diagram of a chip according to an embodiment of the application.
  • Face key points: points used to locate the key areas of the face in a face image.
  • the key areas include eyebrows, eyes, nose, mouth, facial contours and other areas.
  • As shown in FIG. 1, the face key points are the points marked in the figure.
  • Face key point detection: also known as face key point positioning or face alignment, it refers to processing an input face image to determine the face key points described above.
  • the way to determine the key points of the face may be to process the input face image through a data mining model to determine the key points of the face.
  • The data mining model may be a neural network model, for example, a convolutional neural network model. Taking the convolutional neural network model as an example, as shown in FIG. 1, the face image is input into the convolutional neural network model, and the convolutional neural network model outputs the coordinates of the face key points, that is, the coordinates of each point marked in FIG. 1.
  • the location of key points on the face can be divided into two-dimensional perspective and three-dimensional perspective.
  • As shown in FIG. 2, the circular marks represent the positions of the face key points from the two-dimensional perspective; that is, when the face is at a large angle, some key points are not visible. For face key points in the two-dimensional perspective, only the labels at positions visible in the image are considered.
  • Face key points in the three-dimensional perspective are also shown in FIG. 2: the square marks represent the positions of the key points from the three-dimensional perspective. For a large-angle face, even when a key point is not visible, its real coordinates should still be estimated.
  • the key points of the human face involved in the following embodiments of the present application refer to key points in a two-dimensional viewing angle.
  • Convolutional neural network (CNN) model: a feed-forward neural network model. The artificial neurons of the neural network model respond to a portion of the surrounding units within the coverage area, which makes CNNs suitable for image processing.
  • the convolutional neural network model can be composed of one or more convolutional layers and a fully connected layer at the top, and can also include associated weights and a pooling layer. Compared with other deep learning structures, the convolutional neural network model can give better results in image and speech recognition. This model can also be trained using backpropagation algorithms. Compared with other deep, feed-forward neural networks, the convolutional neural network model requires fewer parameters to estimate.
  • the embodiment of this application takes the above-mentioned convolutional neural network model as an example for illustration, and this application is not limited thereto.
  • the parameters of the convolutional neural network model (for example, weight parameters and bias parameters) are used to represent the convolutional neural network model. Different convolutional neural network models have different parameters, and their processing performance is also different.
  • the training device can use the training data to train the convolutional neural network model.
  • the training device consists of a processor, a hard disk, a memory, and a system bus.
  • the convolutional neural network model involved in this application specifically refers to a convolutional neural network model for realizing the location of key points of a human face, and may be called a key-point convolutional neural network model.
  • the training process of the key point convolutional neural network model specifically refers to adjusting the parameters of the model through the learning of training data, so that the output of the convolutional neural network model is as close to the target value as possible, for example, the target value is the correct person The coordinates of the key points of the face.
  • the network structure of the key point convolutional neural network model of this application can be referred to the prior art convolutional neural network model.
  • The difference between this application and the prior art is that the key point convolutional neural network model is trained as described in the following embodiments to obtain the target key point convolutional neural network model.
  • the “adjusting the key point convolutional neural network model” in the embodiment of the present application refers to adjusting the parameters involved in the network model, for example, the weight parameter and the bias parameter.
  • the training sample includes a face image with key point information.
  • A face image with key point information may include the face image and the coordinates of the corresponding face key points; in the embodiments of this application, the key point information is expressed as the coordinates of the face key points, which correspond to the face image and are used to identify the key points in the face image.
  • The embodiments of this application classify multiple training samples according to the deflection angle of the face. For example, the face images deflected to the left form one training sample set, the face images deflected to the right form another training sample set, and the frontal face images form a third training sample set, that is, three training sample sets in total.
  • the number of training sample sets can be flexibly set according to needs to classify the training samples and obtain a corresponding number of training sample sets.
  • the training data includes one or more training samples, and the one or more training samples may come from the same or different training sample sets.
  • the training data is used to train the key point convolutional neural network model.
  • the training data of the embodiment of this application can also be referred to as minibatch data.
  • The embodiments of this application construct the training data based on the posture information of the face images; for the specific construction process of the training data, refer to the specific descriptions in the following embodiments.
  • Posture information is used to reflect the deflection angle of the human face, for example, the human face is deflected 15 degrees to the left, 25 degrees to the right, and so on.
  • The terminal can be a wireless terminal or a wired terminal.
  • a wireless terminal can be a device that provides voice and/or other service data connectivity to users, a handheld device with wireless connection function, or other processing devices connected to a wireless modem.
  • a wireless terminal can communicate with one or more core networks via a radio access network (RAN).
  • The wireless terminal can be a mobile terminal, such as a mobile phone (or "cellular" phone) or a computer with a mobile terminal; for example, it can be a portable, pocket-sized, handheld, computer-built-in, or vehicle-mounted mobile device, which exchanges voice and/or data with the radio access network.
  • Wireless terminals can also be called systems, subscriber units (Subscriber Unit), subscriber stations (Subscriber Station), mobile stations (Mobile Station), mobile stations (Mobile), remote stations (Remote Station), remote terminals (Remote Terminal), The access terminal (Access Terminal), user terminal (User Terminal), user agent (User Agent), and user equipment (User Device or User Equipment) are not limited here.
  • plural means two or more.
  • “And/or” describes the association relationship of the associated objects, indicating that there can be three types of relationships, for example, A and/or B, which can mean: A alone exists, A and B exist at the same time, and B exists alone.
  • the character “/” generally indicates that the associated objects are in an "or” relationship.
  • FIG. 3A is a schematic diagram of a network architecture of the image processing method according to an embodiment of the application.
  • the network architecture includes training equipment and model application equipment.
  • the training device uses face images with different pose information to train the key point convolutional neural network model to obtain the target key point convolutional neural network model.
  • the model application device refers to a device that uses the target key point convolutional neural network model of the embodiment of the present application to perform image processing, and the model application device may be any specific form of the aforementioned terminal.
  • The image processing method of the present application may include: in the training stage, the training device uses the face images with different pose information and the coordinates of the corresponding key points to train the key point convolutional neural network model to obtain the target key point convolutional neural network model.
  • In the model use stage, in one achievable manner, the target key point convolutional neural network model is stored in the model application device.
  • the model application device collects images, and uses the target key point convolutional neural network model to process the collected images.
  • the coordinates of the key points of the face are output, so that the model application device performs subsequent processing on the collected images according to the coordinates of the key points of the face.
  • the subsequent processing may be face matching processing (applied to face recognition).
  • Figure 3B is a schematic diagram of another network architecture of the image processing method according to an embodiment of the application.
  • the network architecture includes a training device, an application server, and a model application device.
  • The training device uses the face images with different posture information and the coordinates of the corresponding face key points to train the key point convolutional neural network model to obtain the target key point convolutional neural network model.
  • the target key point convolutional neural network model is stored in the application server.
  • In one achievable manner, the application server sends the target key point convolutional neural network model to the model application device; the model application device collects images, uses the target key point convolutional neural network model to process the collected images, outputs the coordinates of the face key points, and then performs subsequent processing on the collected images according to the coordinates of the face key points; for example, the subsequent processing may be face matching processing (applied to face recognition).
  • the model application device collects images and sends the collected images to the application server, and the application server uses the target key point convolutional neural network model to process the image and output the coordinates of the key points of the face.
  • the application server performs subsequent processing on the collected image according to the coordinates of the key points of the face.
  • the subsequent processing may be face matching processing (applied to face recognition), and the processing result is sent to the model application device.
  • The aforementioned training device and model application device can be two separate devices or one device, such as a terminal in any specific form described above; the aforementioned training device and application server can also be two separate devices or one device, such as a server; this is not limited in this application.
  • The embodiments of the application construct training data based on the posture information of the face image of each training sample and use the training data to train the key point convolutional neural network, where the training data includes training samples with different posture information, that is, face images with different posture information and the coordinates of the corresponding face key points.
  • The model is trained using training samples with different pose information, which can balance the influence of face images with different pose information on the optimization of the key point convolutional neural network model and improve the face key point positioning accuracy of the target convolutional neural network model obtained by training.
  • the gradient descent direction during model training can be made more accurate.
  • The face images with different posture information may include a face image with first posture information, a face image with second posture information, and a face image with third posture information; the first posture information indicates that the face is deflected to the left, the second posture information indicates that the face is facing forward, and the third posture information indicates that the face is deflected to the right.
  • The face images with the first posture information may include face images deflected to the left by different degrees, for example, a face image deflected 10 degrees to the left, a face image deflected 20 degrees to the left, and so on; examples are not enumerated here one by one.
  • Similarly, the face images with the third posture information may include face images deflected to the right by different degrees, for example, a face image deflected 10 degrees to the right, a face image deflected 20 degrees to the right, and so on.
  • FIG. 4A is a schematic diagram of the training data construction process of the image processing method according to an embodiment of the application
  • FIG. 4B is a schematic diagram of training data construction of the image processing method according to an embodiment of the application.
  • the training data construction may include:
  • Step 101 Classify training samples based on the posture information of the face images of each training sample, and obtain s training sample sets.
  • the training samples include the face image and the coordinates of the corresponding key points of the face.
  • the training samples are collected from various complex scenes, which can be classified based on the posture information.
  • the posture information is used to reflect information about the deflection angle of the face.
  • the posture information is p, and the value range of p is [-100,100], a negative value indicates a leftward deflection, and a positive value indicates a rightward deflection.
  • An achievable way to obtain the pose information of the face image in each training sample is to input the coordinates of the key points of each face image into the generalized Procrustes analysis (GPA) algorithm and output the adjusted coordinates of the face key points, and then input the adjusted coordinates of the face key points into the principal component analysis (PCA) algorithm.
  • The PCA algorithm performs a dimensionality-reduction operation on the adjusted coordinates of the face key points and outputs the p value of each training sample; the p value represents the posture information.
  • The GPA algorithm aligns the face images of all training samples to an average face image (for example, a standard frontal face image): after processing, each face image lies near the average face image, and the mean square error (MSE) between it and the average face image is minimized.
  • FIG. 5A is a schematic diagram of the distribution of face key points when the training samples are not processed by the GPA algorithm, and FIG. 5B is a schematic diagram of the distribution of face key points after the GPA algorithm is used in the image processing method of this embodiment.
  • Taking 68 face key points per face image as an example, as shown in FIG. 5A, the face key points of multiple training samples that have not been processed by the GPA algorithm are distributed around the face key points of the average face image in a relatively messy way; it can be seen that the postures of the face images of the training samples differ, and the gap is large. As shown in FIG. 5B, the face key points of multiple training samples processed by the GPA algorithm are distributed near the face key points of the average face image and present a roughly elliptical, more aggregated distribution. It can be seen that the GPA algorithm minimizes the MSE between each training-sample face image and the average face image, which improves the accuracy of subsequent face key point processing.
  • Next, the processing of the adjusted face key point coordinates by the PCA algorithm is described: the PCA algorithm performs a dimensionality-reduction operation on the adjusted coordinates of the face key points and outputs the p value of each training sample.
  • the embodiment of the present application adopts a way that the principal component is one-dimensional, and it is understandable that it can also be multi-dimensional, and the embodiment of the present application is not limited to one dimension.
  • In this way, the 2L-dimensional coordinate vector of the key points of each face image (L being the number of key points) is converted into a single number p; a minimal sketch of this pose-scoring step is given below.
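The sketch below illustrates this pose-scoring step. It uses a simplified, non-iterative similarity alignment to the mean shape in place of full GPA, and scikit-learn's PCA to reduce each 2L-dimensional coordinate vector to a single value p; any rescaling of p to the [-100, 100] range mentioned earlier is omitted. These implementation choices are assumptions, not the patent's prescribed procedure.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes
from sklearn.decomposition import PCA

def align_shapes(shapes):
    """shapes: (N, L, 2) key point coordinates. Align every shape to the mean
    shape with a similarity transform (simplified, non-iterative Procrustes)."""
    centered = shapes - shapes.mean(axis=1, keepdims=True)                     # remove translation
    normed = centered / np.linalg.norm(centered, axis=(1, 2), keepdims=True)   # remove scale
    mean_shape = normed.mean(axis=0)

    aligned = np.empty_like(normed)
    for i, s in enumerate(normed):
        R, scale = orthogonal_procrustes(s, mean_shape)  # rotation minimizing MSE to the mean
        aligned[i] = scale * (s @ R)
    return aligned

def pose_values(shapes):
    """Project each aligned, flattened 2L-vector onto the first principal
    component, giving one scalar pose value p per training sample."""
    aligned = align_shapes(np.asarray(shapes, dtype=np.float64))
    flat = aligned.reshape(len(aligned), -1)                # (N, 2L)
    return PCA(n_components=1).fit_transform(flat).ravel()  # (N,)
```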
  • the training samples of the embodiment of the present application are divided into s small data sets, that is, s training sample sets T1 to Ts, and each small data set contains a certain number of training samples.
  • Each training sample set represents a face image that meets certain angle conditions.
  • For example, when the value of s is 3, a plurality of training samples are classified based on the pose information, and 3 training sample sets are obtained: one training sample set includes the face images with the first pose information and the corresponding face key points, one training sample set includes the face images with the second pose information and the corresponding face key points, and the other training sample set includes the face images with the third pose information and the corresponding face key points.
  • Multiple training samples can be selected from the 3 training sample sets as training data to train the key point convolutional neural network model.
  • When the value of s is 5, multiple training samples are classified based on the pose information, and 5 training sample sets are obtained: the first and second training sample sets include the face images with the first pose information and the corresponding face key points, the third training sample set includes the face images with the second pose information and the corresponding face key points, and the fourth and fifth training sample sets include the face images with the third pose information and the corresponding face key points.
  • For example, the p value of the face images in the first training sample set is less than -50, and the p value of the face images in the second training sample set is greater than or equal to -50; the p value of the face images in the fifth training sample set is greater than 50, and the p value of the face images in the fourth training sample set is less than or equal to 50.
  • FIG. 4B takes s equal to 9 as an example; 9 training sample sets as shown in FIG. 4B can be obtained. From left to right the p value gradually increases, and the face angle gradually changes from tilted to the left to tilted to the right. It can be seen that the p value reflects the posture information of the face image.
  • Figure 6 is a schematic diagram of the distribution of the training sample set of the image processing method according to the embodiment of the application. As shown in Figure 6, the horizontal axis is the p axis and the vertical axis is the number axis.
  • The training samples approximately follow a normal distribution over p; that is, compared with the numbers of left and right side-face photos, the number of frontal photos among all training samples is relatively large. Therefore, in this embodiment of the present application, the following steps are used to select training samples to train the key point convolutional neural network model, so as to improve the accuracy of the trained target key point convolutional neural network model in locating the key points of face images with different pose information.
  • 9 training sample sets as shown in FIG. 4B can be obtained, and then the following step 102 is used to select training samples to construct training data.
  • Step 102 Select multiple training samples from at least three of the s training sample sets as the training data.
  • The training data may include N training samples, where N is an integer greater than or equal to 3; that is, N training samples are selected from the s training sample sets as the training data.
  • Each iteration inputs N training samples, computes the loss between the model output and the coordinates of the training samples' face key points, back-propagates the gradient to obtain the parameter update of the model, and repeats iteratively to obtain a usable target key point convolutional neural network model.
  • At least three training samples are selected from the s training sample sets according to the classification result and a sample selection strategy.
  • The sample selection strategy can be based on a preset ratio; for example, the proportions of face images with the first, second, and third pose information in the training data are 30, 40, and 30, respectively.
  • The sample selection strategy can be, for example, one of the following three strategies:
  • The average sampling strategy refers to selecting N/s training samples from each training sample set Ti (i from 1 to s) to form the training data of the batch, so that faces at each deflection angle occupy a balanced proportion of the training data, which ensures the accuracy of the gradient direction. In this strategy, the proportions of the face images with the first, second, and third posture information are the same, that is, the proportions of face images deflected to the left, facing forward, and deflected to the right are the same.
  • The left-face enhanced sampling strategy refers to taking more samples from the training sample sets whose faces are deflected to the left, fewer from the sets whose faces are deflected to the right, and fewer from the near-frontal sets. For example, a ratio of 6:6:3:3:3:3:4:4 can be selected according to this strategy, that is, 6 samples are selected from each of the two left-face training sample sets, 4 from each of the two right-face training sample sets, and 3 from each of the four near-frontal training sample sets to form the training data of one training iteration; this increases the proportion of side faces in the training data and emphasizes the left face among the side faces, ensuring that the model has a better positioning effect on the left face.
  • The right-face enhanced sampling strategy refers to taking more samples from the training sample sets whose faces are deflected to the right, fewer from the sets whose faces are deflected to the left, and fewer from the near-frontal sets. For example, when N is 32, a ratio of 4:4:3:3:3:3:6:6 can be selected, that is, 4 samples are selected from each of the two left-face training sample sets, 6 from each of the two right-face training sample sets, and 3 from each of the four near-frontal training sample sets.
  • The sample selection strategy in the embodiments of this application may also include other strategies; the embodiments of this application are not limited to the above three strategies. A minimal sketch of minibatch construction under such strategies is given below.
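The sketch below shows one way such minibatches could be assembled, assuming the `pose_values` helper from the earlier sketch; bucketing the sorted p values into equal-sized sets is a simplification of the threshold-based classification in the text, and the quota lists simply mirror the illustrative ratios above.

```python
import numpy as np

def split_into_sets(p_values, s):
    """Bucket training-sample indices into s sets T1..Ts by ascending pose value p."""
    order = np.argsort(p_values)
    return np.array_split(order, s)              # list of s index arrays

def sample_minibatch(sets, quota, rng=np.random.default_rng()):
    """Draw one iteration's training data according to a per-set quota, e.g. the
    average strategy uses [N // s] * s, and a left-face enhanced strategy could
    use [6, 6, 3, 3, 3, 3, 4, 4] (illustrative numbers from the text)."""
    picks = [rng.choice(t, size=q, replace=len(t) < q) for t, q in zip(sets, quota)]
    return np.concatenate(picks)

# Example (assumed p_values array, e.g. from the pose_values sketch above):
# sets = split_into_sets(p_values, s=8)
# batch_idx = sample_minibatch(sets, quota=[6, 6, 3, 3, 3, 3, 4, 4])
```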
  • training samples are classified based on posture information, s training sample sets are obtained, training samples are selected from s training sample sets, and training data is constructed.
  • The selection of training data can increase the convergence speed of the model and improve the model's training speed, and selecting training data based on posture information enables the training data to balance the influence of faces at various angles on model optimization and improves the positioning accuracy of face key points; for example, the positioning accuracy of key points in a face image with a large deflection angle can be improved.
  • FIG. 7A is a schematic diagram of training the key point convolutional neural network model according to an embodiment of the application.
  • the training data of the foregoing embodiment is input to the key point convolutional neural network model.
  • the convolutional neural network model learns from training data to adjust the network model.
  • The key point convolutional neural network model processes the face images of the training samples and outputs the coordinates of the face key points; the network model is then optimized and adjusted according to the coordinates of the face key points output by the model and the coordinates of the actual face key points of the training samples.
  • the key point convolutional neural network model of the embodiment of the present application may be a residual network (ResNet), such as ResNet50.
  • Figures 7C and 7D are schematic diagrams of the network structure of ResNet50 according to an embodiment of the application.
• ResNet50 is composed of a large number of small network blocks (also called layers), and the structure of each small network block is as shown in the figures.
  • the idea of residuals is to remove the same main part to highlight small changes. By introducing the idea of residuals, the number of network layers can be deepened and the expressive ability of the network model can be enhanced.
• ResNet50 includes 49 convolutional layers, 2 pooling layers, and 1 fully connected layer. Each convolutional layer is followed by a normalization layer and a linear rectification function layer, which constrain the output of the convolutional layer so that the network model can be designed deeper and has stronger expressive ability.
• The convolutional layers are responsible for extracting high-level feature expressions of the image and extract abstract feature expressions of the input image by fusing the information of different channels; the pooling layers are used to compress the size of the output matrix to increase the receptive field of the image, thereby ensuring that the features are highly compact; the fully connected layer is used to linearly integrate the feature maps and adapt to the pre-defined output dimension related to the problem to be solved.
• In the embodiment of this application, the output dimension is L*2 (for example, 68*2), where L is the number of face key points in the face image, and every two values represent the coordinates of one key point, namely the x coordinate and the y coordinate of that key point.
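• As an illustration only, the following sketch assumes a PyTorch environment and uses torchvision's generic ResNet50 as a stand-in for the network described above, replacing its final fully connected layer so that the output dimension is L*2.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

L = 68  # number of face key points (illustrative)

model = resnet50()                                   # generic ResNet50 backbone
model.fc = nn.Linear(model.fc.in_features, L * 2)    # output layer of dimension L*2

x = torch.randn(1, 3, 224, 224)                      # one face image (batch, channels, H, W)
coords = model(x).view(-1, L, 2)                     # every two values are the (x, y) of a key point
print(coords.shape)                                  # torch.Size([1, 68, 2])
```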
  • FIG. 7B is a flowchart of a method for training a key point convolutional neural network model according to an embodiment of the application. As shown in FIG. 7B, the method for training a key point convolutional neural network model of the present application may include:
  • Step 201 Initialize the key point convolutional neural network model.
  • Step 202 Input the training data into the initialized key point convolutional neural network model as shown in FIG. 7A, and obtain the target key point convolutional neural network model after loop iteration.
• Specifically, the input face image is processed and the coordinates of the face key points are output; the output coordinates of the face key points are compared with the coordinates of the face key points of the training sample, for example, a corresponding operation is performed to obtain a loss cost result, and the initialized key point convolutional neural network model is adjusted according to the loss cost result.
• For example, a preset condition that the loss cost result should meet can be set. If the condition is not satisfied, the parameters of the key point convolutional neural network model are adjusted, the adjusted key point convolutional neural network model is used to process the face images of the training data, and a new loss cost result is calculated to judge whether the new loss cost result meets the preset condition. This is iterated repeatedly until the new loss cost result meets the preset condition, and the target key point convolutional neural network model is obtained. In the model application stage, the target key point convolutional neural network model is used.
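• A minimal sketch of this loop iteration, assuming PyTorch; the mean-squared-error loss, the Adam optimizer, and the threshold-based preset condition are illustrative choices, since the text above only requires "a corresponding operation" to obtain a loss cost result and a preset condition.

```python
import torch
import torch.nn as nn

def train(model, data_loader, max_epochs=100, loss_threshold=1e-3, lr=1e-4):
    """Iterate until the loss cost result meets the preset condition."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()   # compares predicted and labelled key point coordinates
    for epoch in range(max_epochs):
        for images, gt_coords in data_loader:
            pred = model(images)                                   # (batch, L*2)
            loss = criterion(pred, gt_coords.view(pred.shape[0], -1))
            optimizer.zero_grad()
            loss.backward()                                        # adjust the model from the loss cost result
            optimizer.step()
            if loss.item() < loss_threshold:                       # preset condition met
                return model                                       # target key point CNN model
    return model
```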
• In this embodiment, the key point convolutional neural network model is initialized, the training data is input to the initialized key point convolutional neural network model as shown in FIG. 7A, and the target key point convolutional neural network model is obtained after loop iteration. Training the model with training samples of different posture information balances the influence of face images with different posture information on the optimization of the key point convolutional neural network model, and improves the face key point positioning accuracy of the target key point convolutional neural network model obtained by training.
  • the gradient descent direction during model training can be made more accurate.
• In an implementation manner, the key point convolutional neural network models to be trained in the embodiment of this application include a first key point convolutional neural network model and a second key point convolutional neural network model.
• FIG. 8 is a schematic diagram of key point convolutional neural network model training according to an embodiment of this application. As shown in FIG. 8, the training data of the above embodiment is input to the first key point convolutional neural network model and the second key point convolutional neural network model, where the first key point convolutional neural network model is used to learn the left face image and the second key point convolutional neural network model is used to learn the right face image.
  • the first key point convolutional neural network model and the second key point convolutional neural network model of the embodiment of the application are network structure models based on half-face regression.
• The image segmentation module is connected to the first key point convolutional neural network model and the second key point convolutional neural network model respectively, and the first key point convolutional neural network model and the second key point convolutional neural network model are respectively connected to the summary output module.
  • the face image of the training data is input to the image segmentation module, and the image segmentation module is used to segment the face image to output the left face image and the right face image.
  • the left face image is input to the first keypoint convolutional neural network model
  • the right face image is input to the second keypoint convolutional neural network model.
• After loop iteration, the target key point convolutional neural network models are obtained, namely the first target key point convolutional neural network model and the second target key point convolutional neural network model.
• Any one of the first key point convolutional neural network model and the second key point convolutional neural network model of the embodiment of the present application may be a residual network (ResNet), such as ResNet50; for a specific explanation, please refer to the description of FIG. 7C and FIG. 7D, which will not be repeated here.
  • the flow of the training method of the key point convolutional neural network model of this embodiment can be seen in Figure 7B, that is, the key point convolutional neural network model is initialized first, and then the network model is iteratively optimized.
• The key point convolutional neural network models of the embodiment of this application include a first key point convolutional neural network model and a second key point convolutional neural network model, that is, two network models are initialized. The training data is input to the initialized first key point convolutional neural network model and second key point convolutional neural network model as shown in FIG. 8, and after loop iteration, the first target key point convolutional neural network model and the second target key point convolutional neural network model are obtained.
• In this embodiment, the first key point convolutional neural network model and the second key point convolutional neural network model are initialized, the training data is input to the initialized first key point convolutional neural network model and second key point convolutional neural network model as shown in FIG. 8, and after loop iterations the first target key point convolutional neural network model and the second target key point convolutional neural network model are obtained. Training the models with training samples of different posture information balances the influence of face images with different posture information on the optimization of the key point convolutional neural network models, and improves the face key point positioning accuracy of the target convolutional neural network models obtained by training.
  • the gradient descent direction during model training can be made more accurate.
• In addition, the first target key point convolutional neural network model and the second target key point convolutional neural network model are half-face regression models. The network models are relatively simple, optimization is more accurate, and the structural features of the face can be used to improve the face key point positioning accuracy of the models.
• In another implementation manner, the key point convolutional neural network models to be trained in the embodiment of this application include a first key point convolutional neural network model, a second key point convolutional neural network model, a third key point convolutional neural network model, and a fourth key point convolutional neural network model.
  • FIG. 9 is a schematic diagram of the key point convolutional neural network model training of an embodiment of the application.
• As shown in FIG. 9, the training data of the above embodiment is input to the first key point convolutional neural network model, the second key point convolutional neural network model, the third key point convolutional neural network model, and the fourth key point convolutional neural network model, where the first key point convolutional neural network model and the third key point convolutional neural network model are used to learn the left face image, and the second key point convolutional neural network model and the fourth key point convolutional neural network model are used to learn the right face image.
• The first, second, third, and fourth key point convolutional neural network models of the embodiments of the present application form a two-stage network structure model based on half-face regression. As shown in FIG. 9, the image segmentation module is connected to the first key point convolutional neural network model and the second key point convolutional neural network model respectively. The first key point convolutional neural network model is cascaded with the third key point convolutional neural network model through the first affine transformation module and is then connected to the inverse transformation module of the first affine transformation; the second key point convolutional neural network model is cascaded with the fourth key point convolutional neural network model through the second affine transformation module and is then connected to the inverse transformation module of the second affine transformation.
  • the inverse transformation module of the first affine transformation and the inverse transformation module of the second affine transformation are respectively connected to the summary output module.
  • the face image of the training data is input to the image segmentation module, and the image segmentation module is used to segment the face image to output the left face image and the right face image.
  • the left face image is input to the first keypoint convolutional neural network model
  • the right face image is input to the second keypoint convolutional neural network model.
• Through the first branch, the coordinates of the face key points of the left face image can be obtained; through the second branch, the coordinates of the face key points of the right face image can be obtained; the structured features of the face are then used to summarize and output the coordinates of the face key points of the face image.
• After loop iteration, the target key point convolutional neural network models are obtained, namely the first target key point convolutional neural network model, the second target key point convolutional neural network model, the third target key point convolutional neural network model, and the fourth target key point convolutional neural network model.
  • any of the first keypoint convolutional neural network model, the second keypoint convolutional neural network model, the third keypoint convolutional neural network model, and the fourth keypoint convolutional neural network model in the embodiments of the present application can be a residual network (ResNet), such as ResNet50.
  • the flow of the method for training the key point convolutional neural network model of this embodiment can be seen in FIG. 7B, that is, the key point convolutional neural network model is initialized first, and then the network model is iteratively optimized.
• The key point convolutional neural network models of the embodiment of this application include a first key point convolutional neural network model, a second key point convolutional neural network model, a third key point convolutional neural network model, and a fourth key point convolutional neural network model, that is, four network models are initialized. The training data is input into the initialized models as shown in FIG. 9, and after loop iteration, the first target key point convolutional neural network model, the second target key point convolutional neural network model, the third target key point convolutional neural network model, and the fourth target key point convolutional neural network model are obtained. Training the models with training samples of different posture information balances the influence of face images with different posture information on the optimization of the key point convolutional neural network models, and improves the face key point positioning accuracy of the target convolutional neural network models obtained by training.
  • the gradient descent direction during model training can be made more accurate.
• The first target key point convolutional neural network model, the second target key point convolutional neural network model, the third target key point convolutional neural network model, and the fourth target key point convolutional neural network model form a two-stage half-face regression model. The network models are relatively simple, optimization is more accurate, and the structural features of the face can be used to improve the face key point positioning accuracy of the models.
  • the foregoing embodiment introduces the construction of training data and the training of the model using the training data.
  • the following embodiment explains the use of the trained model to locate key points of the face.
  • FIG. 10 is a flowchart of an image processing method according to an embodiment of this application.
  • the execution subject of this embodiment may be the above-mentioned model application device or application server, or its internal chip.
• The image processing method of this application may include:
  • Step 301 Obtain a face image.
• The face image is a to-be-processed image or an image obtained through an interception operation on the to-be-processed image.
• The to-be-processed image may be collected by any terminal with a photographing function or a video recording function, for example, an image collected by a smart phone.
  • Step 302 Input the face image to the target key point convolutional neural network model, and output the coordinates of the face key point.
  • the target key point convolutional neural network model is obtained after training the key point convolutional neural network model using face images with different posture information, and the posture information is used to reflect the deflection angle of the face.
• The target key point convolutional neural network model may be a target key point convolutional neural network model obtained by training using the training process shown in FIG. 7A.
  • a face image is acquired, the face image is input to the target key point convolutional neural network model, and the coordinates of the face key point are output.
• This embodiment uses the target key point convolutional neural network model to process the face image. Because the target key point convolutional neural network model is trained with training samples of different posture information, it balances the influence of face images with different posture information on the optimization of the key point convolutional neural network model and effectively improves the accuracy of face key point positioning.
  • FIG. 11A is a flowchart of another image processing method according to an embodiment of this application
• FIG. 11B is a schematic diagram of another image processing method according to an embodiment of this application. As shown in FIG. 11A, different from the embodiment shown in FIG. 10, this embodiment further performs segmentation processing on the face image and uses the target key point convolutional neural network models obtained by the training process shown in FIG. 8 for processing.
  • the image processing method of this application may include:
  • Step 401 Obtain a face image.
• For step 401, please refer to the explanation of step 301 in the embodiment shown in FIG. 10, which will not be repeated here.
  • Step 402 Obtain a left face image and a right face image respectively according to the face image.
  • An achievable way is to perform segmentation processing and filling processing on the face image to obtain a left face image and a right face image respectively, and the size of the left face image and the right face image are the same as the size of the face image.
• For example, the face image is divided into four equal parts in the vertical direction, the left three parts are taken, and a black background image is added to the leftmost side of the left three parts; the size of the black background image is the same as the size of one equal part, and the left face image is obtained, whose size is the same as the size of the face image. Similarly, the right three parts are taken and a black background image is added to the rightmost side; the size of the black background image is the same as the size of one equal part, and the right face image is obtained, whose size is the same as the size of the face image.
  • This segmentation method can ensure that in the left face image and the right face image, the left and right half face regions are respectively located in the center of the image.
• The above segmentation method takes four equal parts as an example for illustration; the number of equal parts may also be another integer value such as six, seven, or eight, which is not exemplified one by one in the embodiments of the present application.
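• A minimal sketch of this segmentation and filling, assuming the face image is an H x W x 3 NumPy array with W divisible by four; the side on which the right face image is padded is assumed by symmetry with the left face image.

```python
import numpy as np

def split_face(face_img):
    """Return (left_face_img, right_face_img), both with the original image size."""
    h, w, c = face_img.shape
    part = w // 4                                    # width of one of the four equal parts
    black = np.zeros((h, part, c), dtype=face_img.dtype)
    # Left face image: black background on the left + left three parts.
    left_img = np.concatenate([black, face_img[:, :3 * part]], axis=1)
    # Right face image: right three parts + black background on the right (assumed).
    right_img = np.concatenate([face_img[:, w - 3 * part:], black], axis=1)
    return left_img, right_img
```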
  • the acquired left face image and right face image are input to the first target keypoint convolutional neural network model and the second target keypoint convolutional neural network model, respectively.
  • Step 4031 Input the left face image to the first target key point convolutional neural network model, and output the coordinates of the first left face key point.
  • the first target key point convolutional neural network model may be obtained by using the training process shown in FIG. 8.
• The first target key point convolutional neural network model processes the left face image and outputs the coordinates of the first left face key points, as shown in FIG. 11B.
  • Step 4032 Input the right face image to the second target key point convolutional neural network model, and output the coordinates of the first right face key point.
  • the second target key point convolutional neural network model can be obtained by using the training process shown in FIG. 8.
• The second target key point convolutional neural network model processes the right face image and outputs the coordinates of the first right face key points, as shown in FIG. 11B.
  • Step 404 Obtain the coordinates of the face key points of the face image according to the coordinates of the first left face key point and the coordinates of the first right face key point.
• Specifically, the coordinates of the first left face key points and the coordinates of the first right face key points are summarized. For example, the number of first left face key points is 39 and the number of first right face key points is 39; their coordinates are summarized according to the structured information of the face, and for the overlapping middle area of the face, an average calculation can be used to obtain the face key points of the middle area.
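• A minimal sketch of this summary step, assuming 39 key points per half face of which a hypothetical set of middle-area indices is shared by both halves; shared points are averaged and the remaining points are concatenated. The exact index layout is not specified above and is illustrative only.

```python
import numpy as np

# Hypothetical indices (within each 39-point half) of key points lying in the
# shared middle area of the face; 39 + 39 - 10 shared = 68 output points here.
MIDDLE_IDX = set(range(29, 39))

def merge_keypoints(left_pts, right_pts, middle_idx=MIDDLE_IDX):
    """left_pts, right_pts: (39, 2) arrays in the full face-image coordinate frame."""
    merged = []
    for i in range(left_pts.shape[0]):
        if i in middle_idx:
            merged.append((left_pts[i] + right_pts[i]) / 2.0)   # average in the middle area
        else:
            merged.append(left_pts[i])                          # left-only key point
    for i in range(right_pts.shape[0]):
        if i not in middle_idx:
            merged.append(right_pts[i])                         # right-only key point
    return np.asarray(merged)                                   # (68, 2) in this example
```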
• In this embodiment, a face image is acquired; a left face image and a right face image are respectively acquired according to the face image; the left face image is input to the first target key point convolutional neural network model and the coordinates of the first left face key points are output; the right face image is input to the second target key point convolutional neural network model and the coordinates of the first right face key points are output; and the coordinates of the face key points of the face image are obtained according to the coordinates of the first left face key points and the coordinates of the first right face key points. The first target key point convolutional neural network model is used to process the left face image, and the second target key point convolutional neural network model is used to process the right face image. Because the first target key point convolutional neural network model and the second target key point convolutional neural network model are trained with training samples of different posture information, the influence of face images with different posture information on the optimization of the key point convolutional neural network models is balanced, which effectively improves the accuracy of face key point positioning.
• In addition, the first target key point convolutional neural network model and the second target key point convolutional neural network model are half-face regression models. The network models are relatively simple, the half-face positioning accuracy is high, and the structural features of the face can be used to further improve the face key point positioning accuracy of the models.
  • FIG. 12A is a flowchart of another image processing method according to an embodiment of this application
• FIG. 12B is a schematic diagram of another image processing method according to an embodiment of this application. As shown in FIG. 12A, different from the embodiment shown in FIG. 11A, this embodiment further uses a third target key point convolutional neural network model and a fourth target key point convolutional neural network model to improve the positioning accuracy of the face key points.
  • the image processing method of the present application may include:
  • Step 501 Obtain a face image.
• For step 501, please refer to the explanation of step 301 in the embodiment shown in FIG. 10, which will not be repeated here.
  • Step 502 Acquire a left face image and a right face image respectively according to the face image, and the size of the left face image and the right face image are the same as the size of the face image.
• For step 502, please refer to the explanation of step 402 in the embodiment shown in FIG. 11A, which will not be repeated here.
  • Step 5031 Input the left face image to the first target key point convolutional neural network model, output the coordinates of the first left face key point, and determine the first affine transformation matrix according to the coordinates of the first left face key point.
• The first affine transformation matrix and the left face image are used to obtain the corrected left face image; the corrected left face image is input to the third target key point convolutional neural network model, and the coordinates of the corrected first left face key points are output; the coordinates of the second left face key points are obtained according to the corrected coordinates of the first left face key points and the inverse transformation of the first affine transformation matrix.
• Specifically, as shown in FIG. 12B, the first target key point convolutional neural network model is used to process the left face image and output the coordinates of the first left face key points of the left face image. The first affine transformation matrix, for example a 3*3 matrix, is determined according to the coordinates of the first left face key points and the coordinates of the key points of the above average image, such that the two-norm difference between the first affine transformation matrix multiplied by the transpose of the coordinates of the first left face key points and the coordinates of the key points of the average image is minimized. The classical least squares method can be used to solve the first affine transformation matrix T_L, and the left face image is aligned to the average image using the first affine transformation matrix to obtain the corrected left face image, as shown in FIG. 12B. The corrected left face image is input to the third target key point convolutional neural network model, and the coordinates of the corrected first left face key points are output; the coordinates of the second left face key points are obtained according to the inverse transformation of the first affine transformation matrix, thereby obtaining the key point output of this half face.
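• A minimal sketch of this first-stage correction and inverse transformation, assuming NumPy and OpenCV; `avg_pts` holds the average-image key point coordinates and `third_model` stands in for the third target key point convolutional neural network model, both hypothetical names.

```python
import numpy as np
import cv2

def solve_affine(src_pts, dst_pts):
    """Least-squares 3x3 matrix T such that T @ [x, y, 1]^T approximates the destination points."""
    src_h = np.hstack([src_pts, np.ones((src_pts.shape[0], 1))])   # (n, 3)
    dst_h = np.hstack([dst_pts, np.ones((dst_pts.shape[0], 1))])   # (n, 3)
    X, _, _, _ = np.linalg.lstsq(src_h, dst_h, rcond=None)         # minimizes the two-norm gap
    return X.T                                                     # (3, 3), so T @ p_h works

def correct_and_restore(left_img, first_pts, avg_pts, third_model):
    T_L = solve_affine(first_pts, avg_pts)
    h, w = left_img.shape[:2]
    corrected = cv2.warpAffine(left_img, T_L[:2], (w, h))          # align to the average image
    corrected_pts = third_model(corrected)                          # (n, 2) refined key points
    # Inverse transformation back to the original left-face image frame.
    pts_h = np.hstack([corrected_pts, np.ones((corrected_pts.shape[0], 1))])
    second_pts = (np.linalg.inv(T_L) @ pts_h.T).T[:, :2]
    return second_pts
```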
  • Step 5032 Input the right face image to the second target key point convolutional neural network model, output the coordinates of the first right face key point, and determine the second affine transformation matrix according to the coordinates of the first right face key point.
• The second affine transformation matrix and the right face image are used to obtain the corrected right face image; the corrected right face image is input to the fourth target key point convolutional neural network model, and the coordinates of the corrected first right face key points are output; the coordinates of the second right face key points are obtained according to the corrected coordinates of the first right face key points and the inverse transformation of the second affine transformation matrix.
• Specifically, as shown in FIG. 12B, the second target key point convolutional neural network model is used to process the right face image and output the coordinates of the first right face key points of the right face image. The second affine transformation matrix, for example a 3*3 matrix, is determined according to the coordinates of the first right face key points and the coordinates of the key points of the above average image, such that the two-norm difference between the second affine transformation matrix multiplied by the transpose of the coordinates of the first right face key points and the coordinates of the key points of the average image is minimized. The classical least squares method can be used to solve the second affine transformation matrix T_R, and the right face image is aligned to the average image using the second affine transformation matrix to obtain the corrected right face image, as shown in FIG. 12B. The corrected right face image is input to the fourth target key point convolutional neural network model, and the coordinates of the corrected first right face key points are output; the coordinates of the second right face key points are obtained according to the inverse transformation of the second affine transformation matrix, thereby obtaining the key point output of this half face.
  • Step 504 Obtain the coordinates of the face key points of the face image according to the coordinates of the second left face key point and the coordinates of the second right face key point.
• Specifically, the coordinates of the second left face key points and the coordinates of the second right face key points are summarized. Compared with the coordinates of the first left face key points and the coordinates of the first right face key points in the embodiment shown in FIG. 11A, the coordinates of the second left face key points and the coordinates of the second right face key points have higher accuracy.
• In this embodiment, the first target key point convolutional neural network model is used to process the left face image, the left face image is corrected according to the output result of the first target key point convolutional neural network model, and the third target key point convolutional neural network model is used to process the corrected left face image, which can improve the positioning accuracy of the left face key points. Similarly, the second target key point convolutional neural network model is used to process the right face image, the right face image is corrected according to the output result of the second target key point convolutional neural network model, and the fourth target key point convolutional neural network model is used to process the corrected right face image, which can improve the positioning accuracy of the right face key points.
• Because the first target key point convolutional neural network model, the second target key point convolutional neural network model, the third target key point convolutional neural network model, and the fourth target key point convolutional neural network model are trained with training samples of different pose information, the influence of face images with different pose information on the optimization of the key point convolutional neural network models is balanced, which effectively improves the accuracy of face key point positioning.
• In addition, the first target key point convolutional neural network model, the second target key point convolutional neural network model, the third target key point convolutional neural network model, and the fourth target key point convolutional neural network model form a two-stage half-face regression model, which has high half-face positioning accuracy and can use the structural features of the face to further improve the face key point positioning accuracy of the model.
• The image processing method of the foregoing embodiments of the present application can locate face key points, and the image processing method can be applied to different scenarios such as face recognition, face pose estimation, face image quality evaluation, video interaction, and liveness verification.
  • the following uses several specific application scenarios to illustrate.
• FIG. 13 is a schematic diagram of an application scenario of the image processing method of an embodiment of this application. As shown in FIG. 13, any image processing method of the foregoing embodiments of this application can be applied to the model application device shown in FIG. 13.
  • the model application device is provided with a camera, the camera faces the driver, the camera can be fixed above the vehicle operating platform or other positions, and the model application device stores the target key point convolutional neural network model of the embodiment of the present application.
• The camera of the model application device can collect a photo of the driver's face or record a video of the driver, and the image processing method of this application is used to process the photo or each frame of the video obtained by the camera to locate the driver's face key points, and then determine whether to send an alarm signal according to the face key points.
• An implementation of determining whether to issue an alarm signal according to the face key points may be: determining the driver's behavior according to the face key points, and judging whether the driver's behavior satisfies a preset condition; the preset condition may include, for example, that the driver frequently lowers his or her head, or that the duration for which the eyes remain closed exceeds a preset duration.
  • the alarm signal can trigger the speaker to play a prompt tone, or trigger the steering wheel to vibrate.
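• Purely as an illustration (the text above does not fix a specific rule), one possible implementation judges closed eyes from an eye aspect ratio computed on eye key points and raises the alarm when the eyes stay closed longer than a preset duration; the indices, threshold, and duration below are hypothetical.

```python
import numpy as np

EAR_THRESHOLD = 0.2          # below this, the eye is treated as closed (illustrative)
MAX_CLOSED_SECONDS = 2.0     # preset duration (illustrative)

def eye_aspect_ratio(eye_pts):
    """eye_pts: (6, 2) array of one eye's key points p1..p6."""
    v1 = np.linalg.norm(eye_pts[1] - eye_pts[5])
    v2 = np.linalg.norm(eye_pts[2] - eye_pts[4])
    h = np.linalg.norm(eye_pts[0] - eye_pts[3])
    return (v1 + v2) / (2.0 * h)

def should_alarm(eye_pts_per_frame, fps):
    """eye_pts_per_frame: list of (6, 2) arrays, one per video frame."""
    closed_frames = 0
    for pts in eye_pts_per_frame:
        closed_frames = closed_frames + 1 if eye_aspect_ratio(pts) < EAR_THRESHOLD else 0
        if closed_frames / fps > MAX_CLOSED_SECONDS:
            return True      # trigger the speaker prompt tone or steering-wheel vibration
    return False
```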
  • FIGS. 14A to 14C are schematic diagrams of the interface of an application scenario of the image processing method of the embodiment of the application.
• The above image processing method may be applied to a model application device, which may be any of the above terminals.
  • the model application device is provided with a client (for example, APP)
• The client collects a face image through the camera of the model application device, determines the face key points through the image processing method in the embodiment of this application, and then realizes interactive operations such as virtual makeup and wearing decorations according to the face key points.
  • the model application device displays the image preview interface of the client.
  • the image preview interface can be any one of the left interfaces in Figures 14A to 14C.
• The client uses the camera of the model application device to collect face images and determines the face key points through the image processing method of the embodiment of this application.
• The client adjusts the image to be processed according to the coordinates of the face key points and the beauty effect parameters, and displays the adjusted image to be processed on the image preview interface; the adjusted image to be processed can be any right interface shown in FIG. 14A to FIG. 14C.
• The beauty effect parameters include at least one of, or a combination of, virtual decoration parameters, face-lifting parameters, eye size adjustment parameters, dermabrasion and acne removal parameters, skin whitening parameters, teeth whitening parameters, and blush parameters.
• The beauty effect parameters are determined according to a trigger instruction input by the user. As shown in any left interface of FIG. 14A to FIG. 14C, the image preview interface includes multiple graphic components, and each graphic component is used to trigger one beauty effect; for example, the first image component is used to trigger the addition of virtual decoration 1, the second image component is used to trigger the addition of virtual decoration 2, and the third image component is used to trigger the addition of virtual decoration 3.
• For example, when the addition of virtual decoration 1 is triggered, the image preview interface switches to the right interface shown in FIG. 14A, that is, a rabbit-ear virtual decoration is added to the forehead of the face image, and a rabbit-nose virtual decoration is added to the nose part of the face image.
• When the addition of virtual decoration 2 is triggered, the image preview interface switches to the right interface shown in FIG. 14B, that is, a glasses virtual decoration is added to the eyes of the face image, and a mathematical-symbol virtual decoration is added to the background area of the image.
• When the addition of virtual decoration 3 is triggered, the image preview interface switches to the right interface shown in FIG. 14C, that is, a crown virtual decoration is added to the forehead of the face image.
• Optionally, a key point face image can be obtained according to the coordinates of the face key points and the face image, with the face key points marked in the key point face image; the key point face image is displayed on the image preview interface, and a key point adjustment instruction input by the user is received, where the key point adjustment instruction is used to indicate the adjusted face key points; the image to be processed is then adjusted according to the adjusted face key points and the beauty effect parameters.
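• An illustrative sketch of anchoring a virtual decoration to the located key points via alpha blending; the forehead anchor index and the decoration image are hypothetical.

```python
import numpy as np

def overlay_decoration(image, decoration_rgba, anchor_xy):
    """Paste an RGBA decoration so that its bottom-center sits on anchor_xy."""
    dh, dw = decoration_rgba.shape[:2]
    x0 = int(anchor_xy[0] - dw / 2)
    y0 = int(anchor_xy[1] - dh)                       # placed above the anchor (e.g. forehead)
    x1, y1 = x0 + dw, y0 + dh
    if x0 < 0 or y0 < 0 or x1 > image.shape[1] or y1 > image.shape[0]:
        return image                                  # skip if the decoration falls outside the frame
    alpha = decoration_rgba[:, :, 3:4] / 255.0
    roi = image[y0:y1, x0:x1].astype(np.float32)
    blended = alpha * decoration_rgba[:, :, :3] + (1 - alpha) * roi
    image[y0:y1, x0:x1] = blended.astype(image.dtype)
    return image

# Example usage (index 27 as a forehead anchor is hypothetical):
# preview = overlay_decoration(frame, rabbit_ears_rgba, keypoints[27])
```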
• Because the image processing method of the foregoing embodiments of the present application can accurately locate the face key points, the beauty effect can be improved.
• A person's identity can usually be determined based on the person's face. However, in practice people's posture angles vary. The image processing method of the above embodiments of this application can accurately determine the face key points in such cases, thereby greatly reducing the difficulty of the face recognition algorithm and improving the algorithm's recognition ability.
  • the camera in the video surveillance system can collect the image to be processed, obtain the face image, process the face image through the image processing method of the embodiment of the application, and output the coordinates of the face key points of the face image.
  • the face recognition module can extract features of the face image according to the coordinates of the key points of the face, obtain the features of the face image, match the features of the face image with the feature template in the database, and output the recognition result.
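• A minimal sketch of the matching step, assuming features have already been extracted from the key-point-aligned face image; cosine similarity against database feature templates is one common choice, and the threshold is illustrative.

```python
import numpy as np

def recognize(face_feature, templates, threshold=0.6):
    """face_feature: (d,) vector; templates: dict mapping identity -> (d,) template vector."""
    best_id, best_score = None, -1.0
    f = face_feature / np.linalg.norm(face_feature)
    for identity, tmpl in templates.items():
        score = float(f @ (tmpl / np.linalg.norm(tmpl)))   # cosine similarity
        if score > best_score:
            best_id, best_score = identity, score
    # Output the recognition result only when the best match exceeds the threshold.
    return (best_id, best_score) if best_score >= threshold else (None, best_score)
```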
  • the image processing method provided in this application is not only applicable to application scenarios where the terminal device uses a front camera sensor to shoot, but also applies to application scenarios where the terminal device uses a rear camera sensor to shoot.
  • the method of the present application is also applicable to application scenarios where the terminal device uses dual camera sensors to shoot.
  • the terminal device can process the image output by the camera sensor by using the method steps of step 301 to step 302, or step 401 to step 404, or step 501 to step 504.
  • the methods or steps implemented by the terminal device may also be implemented by a chip inside the terminal device.
  • FIG. 15 is a schematic structural diagram of a terminal device according to an embodiment of the application. As shown in Figure 15, the aforementioned terminal device may include:
  • the obtaining module 101 is used to obtain a face image.
  • the processing module 102 is configured to obtain a left face image and a right face image respectively according to the face image, and the size of the left face image and the right face image are the same as the size of the face image;
• The processing module 102 is further configured to input the left face image into a first target key point convolutional neural network model and output the coordinates of the first left face key points, where the first target key point convolutional neural network model is obtained after training a key point convolutional neural network model using left face images with key point information;
• The processing module 102 is further configured to input the right face image into a second target key point convolutional neural network model and output the coordinates of the first right face key points, where the second target key point convolutional neural network model is obtained after training a key point convolutional neural network model using right face images with key point information;
  • the processing module 102 is further configured to obtain the coordinates of the face key points of the face image according to the coordinates of the first left face key point and the coordinates of the first right face key point.
• The left face images with key point information and the right face images with key point information are obtained according to face images with different posture information, and the face images with different posture information have corresponding key point information. The face images with different posture information include a face image of the first posture information, a face image of the second posture information, and a face image of the third posture information, where the first posture information is used to indicate that the direction of the deflection angle of the face is to the left, the second posture information is used to indicate that the deflection angle of the face is frontal, and the third posture information is used to indicate that the direction of the deflection angle of the face is to the right.
• Optionally, the processing module 102 is configured to: determine a first affine transformation matrix according to the coordinates of the first left face key point; obtain the corrected left face image according to the first affine transformation matrix and the left face image; input the corrected left face image to a third target key point convolutional neural network model, and output the coordinates of the corrected first left face key point; obtain the coordinates of the second left face key point according to the coordinates of the corrected first left face key point and the inverse transformation of the first affine transformation matrix; and obtain the coordinates of the face key points of the face image according to the coordinates of the second left face key point and the coordinates of the first right face key point.
• Optionally, the processing module 102 is configured to: determine a second affine transformation matrix according to the coordinates of the first right face key point; obtain the corrected right face image according to the second affine transformation matrix and the right face image; input the corrected right face image to a fourth target key point convolutional neural network model, and output the coordinates of the corrected first right face key point; obtain the coordinates of the second right face key point according to the coordinates of the corrected first right face key point and the inverse transformation of the second affine transformation matrix; and obtain the coordinates of the face key points of the face image according to the coordinates of the second right face key point and the coordinates of the first left face key point.
• Optionally, the processing module 102 is configured to: determine a first affine transformation matrix according to the coordinates of the first left face key point, and obtain the corrected left face image according to the first affine transformation matrix and the left face image; determine a second affine transformation matrix according to the coordinates of the first right face key point, and obtain the corrected right face image according to the second affine transformation matrix and the right face image; input the corrected left face image to the third target key point convolutional neural network model, output the coordinates of the corrected first left face key point, and obtain the coordinates of the second left face key point according to the coordinates of the corrected first left face key point and the inverse transformation of the first affine transformation matrix; input the corrected right face image to the fourth target key point convolutional neural network model, output the coordinates of the corrected first right face key point, and obtain the coordinates of the second right face key point according to the coordinates of the corrected first right face key point and the inverse transformation of the second affine transformation matrix; and obtain the coordinates of the face key points of the face image according to the coordinates of the second left face key point and the coordinates of the second right face key point.
  • the acquisition module 101 is further configured to: classify multiple training samples based on posture information, and obtain s training sample sets, where the training samples include face images with key point information; from the s Multiple training samples are selected from at least three sets in the training sample set as training data; using the training data to train two key point convolutional neural network models to obtain the first target key point convolutional neural network model And the second target key point convolutional neural network model; where s is any integer greater than or equal to 3.
• Optionally, the acquisition module 101 is further configured to: collect the image to be processed through the photographing function or video recording function of the terminal; and intercept the face image from the image to be processed.
  • the terminal device provided in the present application can execute the foregoing method embodiments, and its implementation principles and technical effects are similar, and will not be repeated here.
  • FIG. 16 is a schematic structural diagram of another terminal device according to an embodiment of this application.
• The terminal device may further include: a driving warning module 103, configured to determine driver behavior according to the coordinates of the face key points, and to determine whether to send an alarm signal according to the driver behavior.
  • FIG. 17 is a schematic structural diagram of another terminal device according to an embodiment of this application.
• The terminal device may further include an adjustment module 104, configured to adjust the image to be processed according to the coordinates of the face key points and the beauty effect parameters, and to display the adjusted image to be processed on the image preview interface.
• The beauty effect parameters include at least one of, or a combination of, virtual decoration parameters, face-lifting parameters, eye size adjustment parameters, dermabrasion and acne removal parameters, skin whitening parameters, teeth whitening parameters, and blush parameters.
• Optionally, the adjustment module 104 is further configured to: obtain a key point face image according to the coordinates of the face key points and the face image, where the face key points are marked in the key point face image; display the key point face image on the image preview interface; receive a key point adjustment instruction input by the user, where the key point adjustment instruction is used to indicate the adjusted face key points; and adjust the image to be processed according to the adjusted face key points and the beauty effect parameters.
  • FIG. 18 is a schematic structural diagram of another terminal device according to an embodiment of this application. As shown in FIG. 18, based on the block diagram shown in FIG. 15, the terminal device may further include: a face recognition module 105, configured to perform face recognition according to the coordinates of the key points of the face.
• Optionally, the face recognition module 105 is configured to: perform feature extraction on the face image according to the coordinates of the face key points to obtain face image features; and match the face image features with feature templates in a database and output a recognition result.
  • the terminal device provided in the present application can execute the foregoing method embodiments, and its implementation principles and technical effects are similar, and will not be repeated here.
  • FIG. 19 is a schematic structural diagram of another terminal device according to an embodiment of the application.
• The terminal device may include: a processor 21 (such as a CPU) and a memory 22. The memory 22 may include a high-speed RAM memory, and may also include a non-volatile memory (NVM), such as at least one disk memory. Various instructions can be stored in the memory 22 to complete various processing functions and implement the method steps of the present application.
  • the terminal device involved in the present application may further include: a receiver 23, a transmitter 24, a power supply 25, a communication bus 26, and a communication port 27.
  • the receiver 23 and the transmitter 24 may be integrated in the transceiver of the terminal device, or may be independent transceiver antennas on the terminal device.
  • the communication bus 26 is used to implement communication connections between components.
  • the aforementioned communication port 27 is used to realize connection and communication between the terminal device and other peripherals.
  • the above-mentioned memory 22 is used to store computer executable program code, and the program code includes instructions; when the processor 21 executes the instructions, the instructions cause the terminal device to execute the above-mentioned method embodiments.
• The implementation principles and technical effects are similar and will not be repeated here.
  • FIG. 20 is a structural block diagram when the terminal device in an embodiment of the application is a mobile phone.
• The mobile phone may include: a radio frequency (RF) circuit 1110, a memory 1120, an input unit 1130, a display unit 1140, a sensor 1150, an audio circuit 1160, a wireless fidelity (WiFi) module 1170, a processor 1180, and a power supply 1190.
  • the structure of the mobile phone shown in FIG. 20 does not constitute a limitation on the mobile phone, and may include more or less components than those shown in the figure, or a combination of certain components, or different component arrangements.
  • the RF circuit 1110 can be used for receiving and sending signals during information transmission or communication. For example, after receiving the downlink information of the base station, it is processed by the processor 1180; in addition, the uplink data is sent to the base station.
  • the RF circuit includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like.
  • the RF circuit 1110 can also communicate with the network and other devices through wireless communication.
• The above wireless communication can use any communication standard or protocol, including but not limited to Global System of Mobile Communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Messaging Service (SMS), etc.
  • the memory 1120 can be used to store software programs and modules.
  • the processor 1180 executes various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 1120.
  • the memory 1120 may mainly include a program storage area and a data storage area.
• The program storage area may store an operating system, an application program required by at least one function (such as a sound playback function, an image playback function, etc.), and the like; the data storage area may store data (such as audio data, a phone book, etc.) created according to the use of the mobile phone.
  • the memory 1120 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.
  • the input unit 1130 can be used to receive input digital or character information, and generate key signal input related to user settings and function control of the mobile phone.
  • the input unit 1130 may include a touch panel 1131 and other input devices 1132.
• The touch panel 1131, also called a touch screen, can collect the user's touch operations on or near it (for example, operations performed by the user on or near the touch panel 1131 with a finger, a stylus, or any other suitable object or accessory), and drive the corresponding connection device according to a preset program.
  • the touch panel 1131 may include two parts: a touch detection device and a touch controller.
  • the touch detection device detects the user's touch position, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, and then sends it To the processor 1180, and can receive and execute commands sent by the processor 1180.
  • the touch panel 1131 can be implemented in multiple types such as resistive, capacitive, infrared, and surface acoustic wave.
  • the input unit 1130 may also include other input devices 1132.
  • other input devices 1132 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control buttons, switch buttons, etc.), trackball, mouse, and joystick.
  • the display unit 1140 may be used to display information input by the user or information provided to the user and various menus of the mobile phone.
  • the display unit 1140 may include a display panel 1141.
  • the display panel 1141 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), etc.
• The touch panel 1131 can cover the display panel 1141. When the touch panel 1131 detects a touch operation on or near it, it transmits the operation to the processor 1180 to determine the type of the touch event, and then the processor 1180 provides corresponding visual output on the display panel 1141 according to the type of the touch event.
• In FIG. 20, the touch panel 1131 and the display panel 1141 are used as two independent components to realize the input and output functions of the mobile phone, but in some embodiments, the touch panel 1131 and the display panel 1141 can be integrated to realize the input and output functions of the mobile phone.
  • the mobile phone may also include at least one sensor 1150, such as a light sensor, a motion sensor, and other sensors.
  • the light sensor may include an ambient light sensor and a proximity sensor.
• The ambient light sensor can adjust the brightness of the display panel 1141 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 1141 and/or the backlight when the mobile phone is moved to the ear.
  • the acceleration sensor can detect the magnitude of acceleration in various directions (usually three-axis), and can detect the magnitude and direction of gravity when it is stationary.
  • the audio circuit 1160, the speaker 1161, and the microphone 1162 can provide an audio interface between the user and the mobile phone.
• The audio circuit 1160 can transmit the electrical signal converted from the received audio data to the speaker 1161, and the speaker 1161 converts it into a sound signal for output; on the other hand, the microphone 1162 converts the collected sound signal into an electrical signal, which is received by the audio circuit 1160 and converted into audio data; the audio data is processed by the audio data output processor 1180 and then sent to another mobile phone via the RF circuit 1110, or output to the memory 1120 for further processing.
  • WiFi is a short-distance wireless transmission technology.
  • the mobile phone can help users send and receive e-mails, browse web pages, and access streaming media through the WiFi module 1170. It provides users with wireless broadband Internet access.
• Although FIG. 20 shows the WiFi module 1170, it is understandable that it is not a necessary component of the mobile phone and can be omitted as needed without changing the essence of the application.
• The processor 1180 is the control center of the mobile phone. It uses various interfaces and lines to connect the various parts of the entire mobile phone, and executes various functions of the mobile phone and processes data by running or executing software programs and/or modules stored in the memory 1120 and calling data stored in the memory 1120, so as to monitor the mobile phone as a whole.
  • the processor 1180 may include one or more processing units; for example, the processor 1180 may integrate an application processor and a modem processor, where the application processor mainly processes an operating system, a user interface, and application programs, etc.
  • the modem processor mainly deals with wireless communication. It can be understood that the foregoing modem processor may not be integrated into the processor 1180.
  • the mobile phone also includes a power supply 1190 (such as a battery) for supplying power to various components.
  • the power supply can be logically connected to the processor 1180 through a power management system, so that functions such as charging, discharging, and power consumption management can be managed through the power management system.
  • the mobile phone may also include a camera 1200, which may be a front camera or a rear camera. Although not shown, the mobile phone may also include a Bluetooth module, a GPS module, etc., which will not be repeated here.
  • the processor 1180 included in the mobile phone may be used to execute the foregoing image processing method embodiment, and its implementation principles and technical effects are similar, and will not be repeated here.
  • FIG. 21 is a schematic structural diagram of a training device according to an embodiment of this application.
  • the training device of this embodiment may include: an image acquisition module 201, configured to obtain, from face images with different posture information, a left face image with key point information and a right face image with key point information, where the face images with different posture information have corresponding key point information; and a training module 202, configured to train a key point convolutional neural network model by using the left face image with key point information, to obtain a first target key point convolutional neural network model.
  • the first target key point convolutional neural network model is used to process an input left face image and output the coordinates of left face key points; the training module 202 is further configured to train a key point convolutional neural network model by using the right face image with key point information, to obtain a second target key point convolutional neural network model, which is used to process an input right face image and output the coordinates of right face key points; the posture information is used to reflect the deflection angle of the face.
  • the face images with different posture information include a face image with first posture information, a face image with second posture information, and a face image with third posture information.
  • the first posture information indicates that the direction of the deflection angle of the face is to the left, the second posture information indicates that the direction of the deflection angle of the face is frontal, and the third posture information indicates that the direction of the deflection angle of the face is to the right.
  • the image acquisition module 201 is further configured to classify multiple training samples based on posture information to obtain s training sample sets, where the training samples include face images with key point information, and to select multiple training samples from at least three of the s training sample sets as the face images with different posture information.
  • the training device described above in this embodiment can be used to execute the technical solutions executed by the training device/training device chip or the application server/application server chip in the foregoing embodiment.
  • the implementation principles and technical effects are similar; for the function of each module, refer to the corresponding description in the method embodiments, which is not repeated here.
  • FIG. 22 is a schematic structural diagram of another training device according to an embodiment of this application. As shown in FIG. 22, the training device in this embodiment includes a transceiver 211 and a processor 212.
  • the transceiver 211 may include necessary radio frequency communication devices such as a mixer.
  • the processor 212 may include at least one of a CPU, DSP, MCU, ASIC, or FPGA.
  • the training device of this embodiment may further include a memory 213, where the memory 213 is used to store program instructions, and the transceiver 211 is used to call the program instructions in the memory 213 to execute the above solution.
  • the training device described above in this embodiment can be used to execute the technical solutions executed by the training device/training device chip or the application server/application server chip in the foregoing method embodiments.
  • the implementation principles and technical effects are similar.
  • the function of each device can refer to the corresponding description in the method embodiment, which will not be repeated here.
  • FIG. 23 is a schematic structural diagram of a chip of an embodiment of the application. As shown in FIG. 23, the chip of this embodiment may be used as a chip of a training device or a chip of an application server.
  • the chip of this embodiment may include: a memory 221 and a processor 222.
  • the memory 221 is in communication connection with the processor 222.
  • the processor 222 may include at least one of a CPU, a DSP, an MCU, an ASIC, or an FPGA, for example.
  • in hardware implementation, each of the above functional modules may be embedded in hardware form in the processor 222 of the chip, or may be independent of the chip.
  • the memory 221 is used to store program instructions, and the processor 222 is used to call the program instructions in the memory 221 to execute the above solution.
  • the program instructions can be implemented in the form of a software functional unit and can be sold or used as an independent product, and the memory can be a computer-readable storage medium in any form. Based on this understanding, all or part of the technical solution of the present application can be embodied in the form of a software product that includes several instructions to enable a computer device, which may specifically be the processor 222, to execute all or part of the steps of the network device in each embodiment of the present application.
  • the aforementioned computer-readable storage medium includes: U disk, mobile hard disk, ROM, RAM, magnetic disk, or optical disk and other media that can store program codes.
  • the chip described above in this embodiment can be used to implement the technical solutions of the training device or its internal chip in the foregoing method embodiments of this application.
  • the implementation principles and technical effects are similar.
  • for the functions of each module, refer to the corresponding description in the method embodiments, which is not repeated here.
  • the division of modules in the embodiments of the present application is illustrative and is only a logical function division; there may be other division manners in actual implementation.
  • the functional modules in the embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or software functional modules.
  • in the above embodiments, the implementation may be in whole or in part by software, hardware, firmware, or any combination thereof.
  • when software is used, the implementation may be in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • when the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are generated in whole or in part.
  • the computer can be a general-purpose computer, a dedicated computer, a computer network, or other programmable devices.
  • Computer instructions can be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • computer instructions can be transmitted from a website, computer, server, or data center to another website, computer, server, or data center through wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave).
  • a computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Quality & Reliability (AREA)
  • Radiology & Medical Imaging (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

An image processing method and apparatus. The image processing method includes: obtaining a face image; obtaining a left face image and a right face image from the face image; inputting the left face image into a first target key point convolutional neural network model and outputting coordinates of first left face key points, where the first target key point convolutional neural network model is obtained by training a key point convolutional neural network model with left face images having key point information; inputting the right face image into a second target key point convolutional neural network model and outputting coordinates of first right face key points, where the second target key point convolutional neural network model is obtained by training a key point convolutional neural network model with right face images having key point information; and obtaining coordinates of face key points of the face image according to the coordinates of the first left face key points and the coordinates of the first right face key points. The image processing method can improve the accuracy of face key point localization.

Description

图像处理方法和装置
本申请要求于2019年05月21日提交中国国家知识产权局、申请号为201910421550.3、申请名称为“图像处理方法和装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请实施例涉及图像处理技术,尤其涉及一种图像处理方法和装置。
背景技术
随着科技的发展,近年来关于人脸的相关应用层出不穷,例如,人脸识别、三维人脸重建、活体检测、人脸美颜以及情感估计等。各种人脸相关的应用的基础是人脸关键点检测。人脸关键点检测(也可以称之为人脸对齐)指的是输入一张人脸图像,通过计算机视觉算法得到预先定义的关键点坐标,比如眼角、嘴角、鼻尖、脸部轮廓等,即对该人脸图像进行处理以预测出眼睛,鼻子,嘴巴等一些关键点的位置。
从人脸图像得到关键点坐标可以使用各式各样的算法,例如,基于回归的方法、基于神经网络的方法等。其中,通过卷积神经网络实现上述过程得到广泛应用。
然而,在实际场景中,所获取的人脸图像中大多数是姿态幅度很大或者有一定程度的遮挡,通过卷积神经网络对该人脸姿态较大或有一定遮挡的人脸图像进行人脸关键点检测存在不准确的问题,即无法准确确定人脸关键点的坐标。
发明内容
本申请实施例提供一种图像处理方法和装置,以提升人脸关键点定位的精度。
第一方面,本申请实施例提供一种图像处理方法,该方法可以包括:获取人脸图像;根据人脸图像分别获取左脸图像和右脸图像,左脸图像和右脸图像的尺寸与人脸图像的尺寸相同;将左脸图像输入至第一目标关键点卷积神经网络模型,输出第一左脸关键点的坐标,第一目标关键点卷积神经网络模型为使用具有关键点信息的左脸图像对关键点卷积神经网络模型进行训练后获取的;将右脸图像输入至第二目标关键点卷积神经网络模型,输出第一右脸关键点的坐标,第二目标关键点卷积神经网络模型为使用具有关键点信息的右脸图像对关键点卷积神经网络模型进行训练后获取的;根据第一左脸关键点的坐标和第一右脸关键点的坐标,获取人脸图像的人脸关键点的坐标。
本实现方式,利用第一目标关键点卷积神经网络模型对左脸图像进行处理,利用第二目标关键点卷积神经网络模型对右脸图像进行处理,半脸定位精度高,并且其可以利用人脸的结构化特征,提升人脸关键点的定位精度。
在一种可能的设计中,具有关键点信息的左脸图像和具有关键点信息的右脸图像为根据不同姿态信息的人脸图像获取的,不同姿态信息的人脸图像具有对应的关键点信息,不 同姿态信息的人脸图像包括第一姿态信息的人脸图像、第二姿态信息的人脸图像和第三姿态信息的人脸图像,所述第一姿态信息用于表示人脸的偏转角度的方向为向左的姿态信息,所述第二姿态信息用于表示人脸的偏转角度的方向为正向的姿态信息,所述第三姿态信息用于表示人脸的偏转角度的方向为右向的姿态信息。
本实现方式中,由于第一目标关键点卷积神经网络模型和第二目标关键点卷积神经网络模型是使用不同姿态信息的人脸图像对关键点卷积神经网络模型进行训练后获取的,从而可以平衡不同姿态信息的人脸图像对于关键点卷积神经网络模型优化的影响,有效地提升人脸关键点定位的精度。
在一种可能的设计中,根据所述第一左脸关键点的坐标和所述第一右脸关键点的坐标,获取人脸图像的人脸关键点的坐标,可以包括:根据第一左脸关键点的坐标确定第一仿射变换矩阵;根据第一仿射变换矩阵和所述左脸图像获取矫正后的左脸图像;将矫正后的左脸图像输入至第三目标关键点卷积神经网络模型,输出矫正后的第一左脸关键点的坐标;根据矫正后的第一左脸关键点的坐标和第一仿射变换矩阵的逆变换,获取第二左脸关键点的坐标;根据第二左脸关键点的坐标和第一右脸关键点的坐标,获取人脸图像的人脸关键点的坐标。
本实现方式中,利用第一目标关键点卷积神经网络模型对左脸图像进行处理,根据第一目标关键点卷积神经网络模型的输出结果对左脸图像进行矫正,利用第三目标关键点卷积神经网络模型对矫正后的左脸图像进行处理,可以提升左脸关键点的定位精度,进而提升人脸关键点定位的精度。
在一种可能的设计中,根据第一左脸关键点的坐标和第一右脸关键点的坐标,获取人脸图像的人脸关键点的坐标,可以包括:根据第一右脸关键点的坐标确定第二仿射变换矩阵;根据第二仿射变换矩阵和右脸图像获取矫正后的右脸图像;将矫正后的右脸图像输入至第四目标关键点卷积神经网络模型,输出矫正后的第一右脸关键点的坐标;根据矫正后的第一右脸关键点的坐标和所述第二仿射变换矩阵的逆变换,获取第二右脸关键点的坐标;根据第二右脸关键点的坐标和第一左脸关键点的坐标,获取人脸图像的人脸关键点的坐标。
本实现方式中,利用第二目标关键点卷积神经网络模型对右脸图像进行处理,根据第二目标关键点卷积神经网络模型的输出结果对右脸图像进行矫正,利用第四目标关键点卷积神经网络模型对矫正后的右脸图像进行处理,可以提升右脸关键点的定位精度,进而提升人脸关键点定位的精度。
一种可能的设计中,根据第一左脸关键点的坐标和第一右脸关键点的坐标,获取人脸图像的人脸关键点的坐标,可以包括:根据第一左脸关键点的坐标确定第一仿射变换矩阵,根据所述第一右脸关键点的坐标确定第二仿射变换矩阵;根据第一仿射变换矩阵和左脸图像获取矫正后的左脸图像,根据第二仿射变换矩阵和所述右脸图像获取矫正后的右脸图像;将矫正后的左脸图像输入至第三目标关键点卷积神经网络模型,输出矫正后的第一左脸关键点的坐标,将矫正后的右脸图像输入至第四目标关键点卷积神经网络模型,输出矫正后的第一右脸关键点的坐标;根据矫正后的第一左脸关键点的坐标和第一仿射变换矩阵的逆变换,获取第二左脸关键点的坐标,根据矫正后的第一右脸关键点的坐标和所述第二仿射变换矩阵的逆变换,获取第二右脸关键点的坐标;根据第二左脸关键点的坐标和第二右脸 关键点的坐标,获取人脸图像的人脸关键点的坐标。
本实现方式,利用第一目标关键点卷积神经网络模型对左脸图像进行处理,根据第一目标关键点卷积神经网络模型的输出结果对左脸图像进行矫正,利用第三目标关键点卷积神经网络模型对矫正后的左脸图像进行处理,可以提升左脸关键点的定位精度,利用第二目标关键点卷积神经网络模型对右脸图像进行处理,根据第二目标关键点卷积神经网络模型的输出结果对右脸图像进行矫正,利用第四目标关键点卷积神经网络模型对矫正后的右脸图像进行处理,可以提升右脸关键点的定位精度,进而提升人脸关键点定位的精度,从而提升人脸关键点定位的精度。
在一种可能的设计中,该方法还可以包括:基于姿态信息对多个训练样本进行分类,获取s个训练样本集合,训练样本包括具有关键点信息的人脸图像;从s个训练样本集合中至少三个集合中选取多个训练样本,作为训练数据;使用训练数据对两个关键点卷积神经网络模型进行训练,获取第一目标关键点卷积神经网络模型和第二目标关键点卷积神经网络模型;其中,s为大于等于3的任意整数。
本实现方式,训练数据的选取可以提升模型的收敛速度,提升模型的训练速度,基于姿态信息的训练数据的选取使得训练数据可以平衡各个角度的人脸对于模型优化的影响,提升人脸关键点的定位精度。例如,可以提升对偏转角度大的人脸图像的关键点的定位精度。
在一种可能的设计中,获取人脸图像,可以包括:通过终端的拍照功能或拍摄功能采集待处理图像;在所述待处理图像中截取所述人脸图像。
在一种可能的设计中,该方法还可以包括:根据所述人脸关键点的坐标确定驾驶员行为,根据所述驾驶员行为确定是否发出告警信号。
在一种可能的设计中,该方法还可以包括:根据所述人脸关键点的坐标和美颜效果参数对所述待处理图像进行调整,在图像预览界面显示调整后的待处理图像;所述美颜效果参数包括虚拟装饰参数、瘦脸参数、眼睛大小调整参数、磨皮去痘参数、皮肤美白参数、牙齿美白参数和腮红参数中至少一项或其组合。
在一种可能的设计中,在显示调整后的待处理图像之前,该方法还可以包括:根据人脸关键点的坐标和所述人脸图像,获取关键点人脸图像,所述关键点人脸图像中标记有所述人脸关键点;在图像预览界面显示关键点人脸图像;接收用户输入的关键点调整指令,所述关键点调整指令用于指示调整后的人脸关键点;根据所述人脸关键点的坐标和美颜效果参数对所述待处理图像进行调整,可以包括:根据所述调整后的人脸关键点和美颜效果参数对所述待处理图像进行调整。
在一种可能的设计中,该方法还可以包括:根据人脸关键点的坐标进行人脸识别。
在一种可能的设计中,根据所述人脸关键点的坐标进行人脸识别,可以包括:根据所述人脸关键点的坐标对所述人脸图像进行特征提取,获取人脸图像特征;将所述人脸图像特征与数据库中的特征模板进行匹配,输出识别结果。
第二方面,本申请实施例提供一种图像处理方法,该方法可以包括:根据不同姿态信息的人脸图像获取具有关键点信息的左脸图像和具有关键点信息的右脸图像,所述不同姿态信息的人脸图像具有对应的关键点信息;使用所述具有关键点信息的左脸图像对关键点 卷积神经网络模型进行训练,获取第一目标关键点卷积神经网络模型,所述第一目标关键点卷积神经网络模型用于对输入的左脸图像进行处理,输出左脸关键点的坐标;使用所述具有关键点信息的右脸图像对关键点卷积神经网络模型进行训练,获取第二目标关键点卷积神经网络模型,所述第二目标关键点卷积神经网络模型用于对输入的右脸图像进行处理,输出右脸关键点的坐标;其中,所述姿态信息用于反映人脸的偏转角度。
在一种可能的设计中,所述不同姿态信息的人脸图像包括第一姿态信息的人脸图像、第二姿态信息的人脸图像和第三姿态信息的人脸图像,所述第一姿态信息用于表示人脸的偏转角度的方向为向左的姿态信息,所述第二姿态信息用于表示人脸的偏转角度的方向为正向的姿态信息,所述第三姿态信息用于表示人脸的偏转角度的方向为右向的姿态信息。
在一种可能的设计中,该方法还可以包括:基于姿态信息对多个训练样本进行分类,获取s个训练样本集合,所述训练样本包括具有关键点信息的人脸图像;从所述s个训练样本集合中至少三个集合中选取多个训练样本,作为所述不同姿态信息的人脸图像;其中,s为大于等于3的任意整数。
第三方面,提供了一种图像处理装置,该装置可以是终端设备,也可以是终端设备内的芯片。该装置具有实现上述各实施例涉及终端设备的功能。该功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。该硬件或软件包括一个或多个与上述功能相对应的单元。
在一种可能的设计中,当该装置为终端设备时,该装置可以包括:获取模块和处理模块,所述获取模块和处理模块例如可以是处理器,所述获取模块可以与收发模块连接,该收发模块例如可以是收发器,所述收发器可以包括射频电路和基带电路。
可选地,所述装置还可以包括存储单元,该存储单元例如可以是存储器。当该装置包括存储单元时,该存储单元用于存储计算机执行指令,该获取模块和处理模块与该存储单元连接,该获取模块和处理模块执行该存储单元存储的计算机执行指令,以使该终端设备执行上述涉及终端设备功能的图像处理方法。
在另一种可能的设计中,当该装置为终端设备内的芯片时,该芯片包括:处理模块和收发模块,所述处理模块例如可以是处理器,所述收发模块例如可以是该芯片上的输入/输出接口、管脚或电路等。可选的,该装置还可以包括存储单元,该处理模块可执行存储单元存储的计算机执行指令,以使该终端设备内的芯片执行上述任一方面涉及终端设备功能的图像处理方法。
可选地,所述存储单元为所述芯片内的存储单元,如寄存器、缓存等,所述存储单元还可以是终端设备内的位于所述芯片外部的存储单元,如只读存储器(read-only memory,简称ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,简称RAM)等。
其中,上述任一处提到的处理器,可以是一个通用中央处理器(Central Processing Unit,简称CPU),微处理器,特定应用集成电路(application-specific integrated circuit,简称ASIC),或一个或多个用于控制上述各方面图像处理方法的程序执行的集成电路。
第四方面,本申请提供一种图像处理装置,该装置可以是训练设备,也可以是训练设备内的芯片。该装置具有实现上述各方面涉及训练设备的各实施例的功能。该功 能可以通过硬件实现,也可以通过硬件执行相应的软件实现。该硬件或软件包括一个或多个与上述功能相对应的单元。
在一种可能的设计中,当该装置为训练设备时,该装置可以包括:处理模块和收发模块,所述处理模块例如可以是处理器,所述收发模块例如可以是收发器,所述收发器包括射频电路,可选地,所述装置还包括存储单元,该存储单元例如可以是存储器。当装置包括存储单元时,该存储单元用于存储计算机执行指令,该处理模块与该存储单元连接,该处理模块执行该存储单元存储的计算机执行指令,以使该装置执行上述任意一方面涉及训练设备的图像处理方法。
在另一种可能的设计中,当该装置为训练设备内的芯片时,该芯片包括:处理模块和收发模块,所述处理模块例如可以是处理器,所述收发模块例如可以是该芯片上的输入/输出接口、管脚或电路等。该处理模块可执行存储单元存储的计算机执行指令,以使该接入点内的芯片执行上述各方面涉及训练设备的图像处理方法。可选地,所述存储单元为所述芯片内的存储单元,如寄存器、缓存等,所述存储单元还可以是所述训练设备内的位于所述芯片外部的存储单元,如ROM或可存储静态信息和指令的其他类型的静态存储设备,RAM等。
其中,上述任一处提到的处理器,可以是一个CPU,微处理器,ASIC,或一个或多个用于控制上述图像处理方法的程序执行的集成电路。
第五方面,提供了一种计算机存储介质,该计算机存储介质中存储有程序代码,该程序代码用于指示执行上述第一方面至第二方面中的任一方面或其任意可能的实现方式中的方法的指令。
第六方面,提供了一种处理器,用于与存储器耦合,用于执行上述第一方面至第二方面中的任一方面或其任意可能的实现方式中的方法。
第七方面,提供了一种包含指令的计算机程序产品,其在计算机上运行时,使得计算机执行上述第一方面至第二方面中的任一方面或其任意可能的实现方式中的方法。
本申请实施例图像处理方法和装置,通过获取人脸图像,将所述人脸图像输入至目标关键点卷积神经网络模型,输出人脸关键点的坐标,其中,所述目标关键点卷积神经网络模型为使用不同姿态信息的人脸图像对关键点卷积神经网络模型进行训练后获取的,所述姿态信息用于反映人脸的偏转角度,从而可以提升人脸关键点定位的精度。
附图说明
下面将对实施例或现有技术描述中所需要使用的附图作一简单地介绍。
图1为本申请实施例的人脸关键点的示意图;
图2为本申请实施例的二维和三维人脸关键点的示意图;
图3A为本申请实施例的图像处理方法的一种网络架构的示意图;
图3B为本申请实施例的图像处理方法的另一种网络架构的示意图;
图4A为本申请实施例的图像处理方法的训练数据构建流程的示意图;
图4B为本申请实施例的图像处理方法的训练数据构建的示意图;
图5A为本申请实施例的未使用GPA算法对训练样本进行处理的人脸关键点的分布示意图;
图5B为本申请实施例的图像处理方法中使用GPA算法后的人脸关键点的分布示意图;
图6为本申请实施例的图像处理方式的训练样本集合的分布示意图;
图7A为本申请实施例的关键点卷积神经网络模型的示意图;
图7B为本申请实施例的关键点卷积神经网络模型的训练方法的流程图;
图7C和图7D为本申请实施例的ResNet50的网络结构示意图;
图8为本申请实施例的关键点卷积神经网络模型训练的示意图;
图9为本申请实施例的关键点卷积神经网络模型训练的示意图;
图10为本申请实施例的图像处理方法的流程图;
图11A为本申请实施例的另一种图像处理方法的流程图;
图11B为本申请实施例的另一种图像处理方法的示意图;
图12A为本申请实施例的另一种图像处理方法的流程图;
图12B为本申请实施例的另一种图像处理方法的示意图;
图13为本申请实施例的图像处理方法的一种应用场景的示意图;
图14A至图14C为本申请实施例的图像处理方法的一种应用场景的界面示意图;
图15为本申请实施例的一种终端设备的结构示意图;
图16为本申请实施例的又一种终端设备的结构示意图;
图17为本申请实施例的又一种终端设备的结构示意图;
图18为本申请实施例的又一种终端设备的结构示意图;
图19为本申请实施例的又一种终端设备的结构示意图;
图20为本申请实施例的终端设备为手机时的结构框图;
图21为本申请实施例的一种训练设备的结构示意图;
图22为本申请实施例的另一种训练设备的结构示意图;
图23为本申请实施例的一种芯片的结构示意图。
具体实施方式
以下,对本申请中的部分用语进行解释说明,以便于本领域技术人员理解:
人脸关键点:用于在人脸图像中定位出人脸面部的关键区域位置的点,关键区域包括眉毛、眼睛、鼻子、嘴巴、脸部轮廓等区域。例如,如图1所示,人脸关键点为图中标注出的各个点。
人脸关键点检测:也称为人脸关键点定位或者人脸对齐,指对输入的人脸图像进行处理,确定出如上所述的人脸关键点。确定人脸关键点的方式可以为通过数据挖掘模型对输入的人脸图像进行处理,以确定人脸关键点。其中,该数据挖掘模型可以是神经网络模型,例如,卷积神经网络模型等。以卷积神经网络模型为例,如图1所示,将人脸图像输入至卷积神经网络模型,该卷积神经网络模型输出人脸关键点的坐标,即如图1中标注出的各个点的坐标。
人脸关键点定位可以分为二位视角和三维视角,如图2所示,圆形的标注点代表二维 视角下的人脸关键点位置,即在人脸有大角度的情况下存在一些关键点不可见。对于二维视角的人脸关键点定位只考虑在图像中可见的位置上进行标注。与二维视角的人脸关键点定位不同的是,三维视角的人脸关键点如图2所示的方形的标注代表三维视角下的人脸关键点位置,对于大角度人脸,不可见的关键点也要预估其真实坐标。本申请下述实施例所涉及的人脸关键点指二维视角下的关键点。
卷积神经网络(Convolutional Neural Network,CNN)模型:一种前馈神经网络模型,神经网络模型的人工神经元可以响应一部分覆盖范围内的周围单元,可以使用CNN进行图像处理。卷积神经网络模型可以由一个或多个卷积层和顶端的全连通层组成,还可以包括关联权重和池化层(pooling layer)。与其他深度学习结构相比,卷积神经网络模型在图像和语音识别方面能够给出更优的结果。这一模型也可以使用反向传播算法进行训练。相比较其他深度、前馈神经网络,卷积神经网络模型需要估计的参数更少。本申请实施例以上述卷积神经网络模型为例做举例说明,本申请不以此作为限制。该卷积神经网络模型的参数(例如,权值参数和偏置参数)用于表示卷积神经网络模型。不同的卷积神经网络模型的参数不同,其处理性能也不尽相同。
训练设备,可以使用训练数据对卷积神经网络模型进行训练,该训练设备的构成包括处理器、硬盘、内存、系统总线等。本申请所涉及的卷积神经网络模型具体指用于实现人脸关键点定位的卷积神经网络模型,可称之为关键点卷积神经网络模型。
关键点卷积神经网络模型的训练过程,具体指通过对训练数据的学习,调整该模型的参数,使得该卷积神经网络模型的输出尽可能的接近目标值,例如该目标值为正确的人脸关键点的坐标。
需要说明的是,本申请的关键点卷积神经网络模型的网络结构可以参见现有技术的卷积神经网络模型,本申请与现有技术不同之处在于对关键点卷积神经网络模型进行如下述实施例的训练,得到目标关键点卷积神经网络模型。本申请实施例的“调整关键点卷积神经网络模型”即指调整网络模型所涉及的参数,例如,权值参数和偏置参数。
训练样本,包括具有关键点信息的人脸图像,例如,具有关键点信息的人脸图像可以包括人脸图像和对应的人脸关键点的坐标,即本申请实施例的关键点信息的一种表现形式可以为人脸关键点的坐标,该人脸关键点的坐标与人脸图像对应,用于标识该人脸图像中的关键点。本申请实施例根据人脸的偏转角度对多个训练样本进行分类,例如,将向左偏转的人脸图像分为一类,形成一个训练样本集合,将向右偏转的人脸图像分为一类,形成一个训练样本集合,将正脸的人脸图像分为一类,形成一个训练样本集合,即三个训练样本集合。当然可以理解的,训练样本集合的个数可以根据需要进行灵活设置,以对训练样本进行分类,获取相应个数的训练样本集合。
训练数据,包括一个或多个训练样本,该一个或多个训练样本可以来自相同或不同的训练样本集合。训练数据用于训练上述关键点卷积神经网络模型。本申请实施例的训练数据也可以称之为小批处理(minibatch)数据,本申请实施例基于人脸图像的姿态信息构建该训练数据,其中,训练数据的具体构建过程可以参见下述实施例的具体说明。
姿态信息,用于反映人脸的偏转角度的信息,例如,人脸向左偏转15度、向右偏转25度等。
终端:可以是无线终端也可以是有线终端,无线终端可以是指向用户提供语音和/或其他业务数据连通性的设备,具有无线连接功能的手持式设备、或连接到无线调制解调器的其他处理设备。无线终端可以经无线接入网(Radio Access Network,RAN)与一个或多个核心网进行通信,无线终端可以是移动终端,如移动电话(或称为“蜂窝”电话)和具有移动终端的计算机,例如,可以是便携式、袖珍式、手持式、计算机内置的或者车载的移动装置,它们与无线接入网交换语言和/或数据。例如,个人通信业务(Personal Communication Service,PCS)电话、无绳电话、会话发起协议(Session Initiation Protocol,SIP)话机、无线本地环路(Wireless Local Loop,WLL)站、个人数字助理(Personal Digital Assistant,PDA)等设备。无线终端也可以称为系统、订户单元(Subscriber Unit)、订户站(Subscriber Station),移动站(Mobile Station)、移动台(Mobile)、远程站(Remote Station)、远程终端(Remote Terminal)、接入终端(Access Terminal)、用户终端(User Terminal)、用户代理(User Agent)、用户设备(User Device or User Equipment),在此不作限定。
本申请中,“多个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。字符“/”一般表示前后关联对象是一种“或”的关系。
图3A为本申请实施例的图像处理方法的一种网络架构的示意图,如图3A所示,该网络架构包括训练设备和模型应用设备。该训练设备使用不同姿态信息的人脸图像对关键点卷积神经网络模型进行训练,获取目标关键点卷积神经网络模型。该模型应用设备指使用本申请实施例的目标关键点卷积神经网络模型进行图像处理的设备,该模型应用设备可以是上述终端的任一种具体形式。本申请的图像处理方法可以包括:训练阶段,训练设备将使用不同姿态信息的人脸图像和相应的关键点的坐标对关键点卷积神经网络模型进行训练,得到目标关键点卷积神经网络模型。模型使用阶段,一种可实现方式,将该目标关键点卷积神经网络模型存储于模型应用设备,模型应用设备采集图像,使用该目标关键点卷积神经网络模型对采集到的图像进行处理,输出人脸关键点的坐标,以便模型应用设备根据该人脸关键点的坐标对采集到的图像进行后续处理过程,例如,该后续处理过程可以为人脸匹配处理(应用于人脸识别)。
图3B为本申请实施例的图像处理方法的另一种网络架构的示意图,如图3B所示,该网络架构包括训练设备、应用服务器和模型应用设备,训练阶段,训练设备使用不同姿态信息的人脸图像和对应的人脸关键点的坐标对关键点卷积神经网络模型进行训练,得到目标关键点卷积神经网络模型。模型使用阶段,将该目标关键点卷积神经网络模型存储于应用服务器中,一种可实现方式,应用服务器可以将该目标关键点卷积神经网络模型发送给模型应用设备,模型应用设备采集图像,使用该目标关键点卷积神经网络模型对采集到的图像进行处理,输出人脸关键点的坐标,模型应用设备根据该人脸关键点的坐标对采集到的图像进行后续处理过程,例如,该后续处理过程可以为人脸匹配处理(应用于人脸识别)。另一种可实现方式,模型应用设备采集图像,将采集到的图像发送给应用服务器,由该应用服务器使用该目标关键点卷积神经网络模型对该图像进行处理输出人脸关键点的坐标,应用服务器根据该人脸关键点的坐标对采集到的图像进行后续处理过程,例如, 该后续处理过程可以为人脸匹配处理(应用于人脸识别),并将处理结果发送给模型应用设备。
可以理解,上述的训练设备和模型应用设备可以是两个分离的设备,也可以是一个设备,例如一个如上述任一具体形式的终端,上述的训练设备和应用服务器可以是两个分离的设备,也可以是一个设备,例如一个服务器,本申请对此不作限制。
下面,对训练数据构建和训练模型的具体实施过程进行解释说明。
(1)训练数据构建
本申请实施例基于各个训练样本的人脸图像的姿态信息构建训练数据,使用该训练数据对关键点卷积神经网络进行训练,其中,该训练数据包括不同姿态信息的训练样本,即不同姿态信息的人脸图像和相应的人脸关键点的坐标。通过不同姿态信息的训练样本对模型进行训练,从而可以平衡不同姿态信息的人脸图像对于关键点卷积神经网络模型优化的影响,并且可以提升训练所获取的目标卷积神经网络模型的人脸关键点的定位精度。另外,由于训练数据的选取,可以使得模型训练过程中梯度下降方向更准确。
一种可实现方式,该不同姿态信息的人脸图像可以包括第一姿态信息的人脸图像、第二姿态信息的人脸图像和第三姿态信息的人脸图像,该第一姿态信息用于表示人脸的偏转角度的方向为向左的姿态信息,该第二姿态信息用于表示人脸的偏转角度的方向为正向的姿态信息,该第三姿态信息用于表示人脸的偏转角度的方向为右向的姿态信息。该第一姿态信息的人脸图像可以包括向左偏转程度不同的人脸图像,例如,向左偏转10度的人脸图像、向左偏转20度的人脸图像等,此处不一一举例说明。该第三姿态信息的人脸图像可以包括向右偏转程度不同的人脸图像,例如,向右偏转10度的人脸图像、向右偏转20度的人脸图像等,此处不一一举例说明。
下面采用一个具体的实施例对训练数据的构建过程进行解释说明。
图4A为本申请实施例的图像处理方法的训练数据构建流程的示意图,图4B为本申请实施例的图像处理方法的训练数据构建的示意图,如图4A所示,训练数据构建可以包括:
步骤101、基于各个训练样本的人脸图像的姿态信息对训练样本进行分类,获取s个训练样本集合。
如上所述训练样本包括人脸图像和对应的人脸关键点的坐标,对于大量的训练样本,训练样本采集自各种复杂的场景,可以基于姿态信息对其进行分类。该姿态信息用于反映人脸的偏转角度的信息,例如,该姿态信息为p,p的取值范围为[-100,100],负值表示向左偏转,正值表示向右偏转。
获取每个训练样本中人脸图像的姿态信息的一种可实现方式,将各个人脸图像的人脸关键点的坐标输入至广义普氏分析(Generalized Procrustes Analysis,GPA)算法,输出调整后的人脸关键点的坐标,将调整后的人脸关键点的坐标输入至主成分分析(Principal Component Analysis,PCA)算法,由PCA算法对调整后的人脸关键点的坐标进行降维操作,输出各个训练样本的p值。该p值用于表示姿态信息。
对GPA算法对人脸关键点的坐标的调整进行说明,GPA算法可以将所有训练样本的 人脸图像对齐到一个平均人脸图像(例如,正脸标准图),例如,对每一个人脸图像经过旋转、平移和缩放中至少一项处理,使得处理后的人脸图像位于该平均人脸图像附近位置,与该平均人脸图像的均方误差(Mean Square Error,MSE)最小。
对GPA算法的处理效果的说明,图5A为本申请实施例的未使用GPA算法对训练样本进行处理的人脸关键点的分布示意图,图5B为本申请实施例的图像处理方法中使用GPA算法后的人脸关键点的分布示意图,以每个人脸图像有68个人脸关键点为例进行举例说明,如图5A所示,没有经过GPA算法处理的多个训练样本的人脸关键点分布于平均人脸图像的人脸关键点附近,且分布较为杂乱,由此可见,训练样本的人脸图像的姿态各异,且差距较大。而如图5B所示,经过GPA算法处理的多个训练样本的人脸关键点的分布于平均人脸图像的人脸关键点附近,且分布呈现一定的椭圆形,有一定的聚集效果。由此可见,经过GPA算法,可以使得训练样本的人脸图像与该平均人脸图像的MSE最小,从而可以提升后续人脸关键点的处理的准确性。
对PCA算法对调整后的人脸关键点的坐标的处理进行说明,使用PCA算法对调整后的人脸关键点的坐标进行降维操作,输出各个训练样本的p值。需要说明的是,本申请实施例采用主成分为一维的方式,可以理解的,其也可以采用多维,本申请实施例不以一维作为限定。经过PCA算法后,每一个人脸图像的人脸关键点的坐标长度(2L,L为关键点的个数)转换为一个数字p。
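The pose value p described above can be illustrated with standard tools: every sample's key points are aligned to a mean shape in the style of GPA (here a single-pass Procrustes alignment rather than the full iterative procedure), and the aligned coordinates are projected onto one principal component. The following Python sketch is only an illustration under those assumptions (array shapes, use of scikit-learn) and is not the implementation of this application.

    import numpy as np
    from sklearn.decomposition import PCA

    def procrustes_align(shape, mean_shape):
        # Align one (L, 2) key-point shape to the mean shape by removing
        # translation and scale and solving the optimal rotation (least squares).
        x = shape - shape.mean(axis=0)
        y = mean_shape - mean_shape.mean(axis=0)
        x = x / np.linalg.norm(x)
        y = y / np.linalg.norm(y)
        u, _, vt = np.linalg.svd(x.T @ y)
        return x @ (u @ vt)                          # aligned, normalized shape

    def pose_values(all_shapes):
        # all_shapes: (N, L, 2) key-point coordinates of the N training samples.
        mean_shape = all_shapes.mean(axis=0)
        aligned = np.stack([procrustes_align(s, mean_shape) for s in all_shapes])
        flat = aligned.reshape(len(aligned), -1)     # (N, 2L) vectors
        pca = PCA(n_components=1)                    # keep one principal component
        return pca.fit_transform(flat).ravel()       # one scalar p per sample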
根据p值,将本申请实施例的训练样本划分为s个小数据集合,即s个训练样本集合T1~Ts,每一个小数据集合内包含一定数量的训练样本。每一个训练样本集合代表了满足一定角度条件的人脸图像。
举例而言,s取值为3,基于姿态信息对多个训练样本进行分类,获取3个训练样本集合,其中,一个训练样本集合包括第一姿态信息的人脸图像和对应的人脸关键点,一个训练样本集合包括第二姿态信息的人脸图像和对应的人脸关键点,另一个训练样本集合包括第三姿态信息的人脸图像和对应的人脸关键点。可以从该3个训练样本集合中选取多个训练样本,作为训练数据,对关键点卷积神经网络模型进行训练。
另一个举例,s取值为5,基于姿态信息对多个训练样本进行分类,获取5个训练样本集合,其中,第一和第二个训练样本集合包括第一姿态信息的人脸图像和对应的人脸关键点,第三个训练样本集合包括第二姿态信息的人脸图像和对应的人脸关键点,第四和第五个训练样本集合包括第三姿态信息的人脸图像和对应的人脸关键点。其中,第一个训练样本集合中的人脸图像的p值小于-50,第二个训练样本集合中的人脸图像的p值大于等于-50,第五个训练样本集合中的人脸图像的p值大于50度,第四个训练样本集合中的人脸图像的p值小于等于50度。
另一个举例,如图4B所示,以s等于9进行举例说明,经过分类后可以获取如图4B所示的9个训练样本集合,从左至右p值逐渐变大,人脸的角度从向左侧偏头逐渐的向右侧偏头,由此可见,p值可以反映人脸图像的姿态信息。图6为本申请实施例的图像处理方式的训练样本集合的分布示意图,如图6所示,其横轴为p轴,纵轴为个数轴,所有训练样本满足正态分布,即左右侧脸的图片数量相对于正脸照片的数量少些,所有训练样本中正脸照片的数量较多,由此,本申请实施例通过下述步骤以选取训练样本对关键点卷积 神经网络模型进行训练,以提升训练后的目标关键点卷积神经网络模型对于不同姿态信息的人脸图像的关键点的定位的准确度。
示例性的,经过上述步骤处理后可以得到如图4B所示9个训练样本集合,进而通过下述步骤102以选取训练样本,构建训练数据。
当然可以理解的,其还可以是其他个数的训练样本集合,此处不一一举例说明。
步骤102、从s个训练样本集合中至少三个集合中选取多个训练样本,作为训练数据。
其中,训练数据可以包括N个训练样本,N为大于等于3的整数,即从s个训练样本集合中选取N个训练样本作为训练数据。使用训练数据对关键点卷积神经网络模型进行训练,训练过程中,每一个迭代输入N个训练样本,通过计算模型输出值与训练样本的人脸关键点的坐标的损失,通过梯度反向计算每一次模型的参数更新值,经过反复迭代以得到可使用的目标关键点卷积神经网络模型。
在一些实施例中,在从s个训练样本集合中至少三个集合中选取多个训练样本作为训练数据的一种可实现方式,根据分类结果按照样本选取策略从s个训练样本集合中至少三个集合中选取多个训练样本构建训练数据。该样本选取策略可以是按照预设比例,例如,训练数据中第一姿态信息的人脸图像占比30,第二姿态信息的人脸图像占比40,第三姿态信息的人脸图像占比30。
示例性的,对于样本选取策略可以是如下三种策略:
策略一,平均采样策略。平均采样策略指从每一个训练样本集合Ti(i取1至s)中选取N/s个训练样本,构成该批次的训练数据,即训练数据中各个偏转角度的人脸占据了均衡的比例,以保证梯度方向的准确性。例如,训练数据中第一姿态信息的人脸图像、第二姿态信息的人脸图像和第三姿态信息的人脸图像的比例相同,即向左偏转、正向、向右偏转的人脸图像的比例相同。
策略二,左脸加强采样策略。左脸加强采样策略指的是从s个训练样本集合中,针对人脸偏向左侧的集合多取一些,针对人脸偏向右侧的集合少取一些,针对正脸图片取更少的人脸图像。比如一个N=32的情况,按照该策略可以选取66333344的比例,即从左侧脸的两个训练样本集合中每个集合选取6张,从右侧脸的两个训练样本集合中每个集合选取4张,从近似正脸的四个训练样本集合中每个选取3张,构成整个训练迭代的训练数据,从而可以增加训练数据中侧脸的比例,并且在侧脸中强调了左脸的比例,保证了模型对于左侧脸有更好的定位效果。
策略三,右脸加强采样策略。右脸加强采样策略指的是s个训练样本集合中,针对人脸偏向右侧的集合多取一些,针对人脸偏向左侧的集合少取一些,针对正脸图片取更少的图片。比如一个N=32的情况,可以选取44333366的比例,即从左侧脸的两个训练样本集合中每个集合选取4张,从右侧脸的两个训练样本集合中每个集合选取6张,从近似正脸的四个训练样本集合中每个选取3张,构成整个训练迭代的训练数据,从而可以增加训练数据中侧脸的比例,并且在侧脸中强调了右脸的比例,保证了模型对于右侧脸有更好的定位效果。
需要说明的是,本申请实施例的样本选取策略还可以包括其他策略,本申请实施例不以上述三种策略作为限制。
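As a concrete reading of the sampling strategies above, the sketch below builds one minibatch by drawing a fixed number of samples from each of the s pose-sorted sets; the counts [6, 6, 3, 3, 3, 3, 4, 4] reproduce the left-face-strengthened example for N = 32 and s = 8. This is an assumed illustration, not a prescribed implementation.

    import random

    def build_minibatch(sample_sets, counts):
        # sample_sets: the s training sample sets, ordered from left-facing to
        # right-facing poses; counts: how many samples to draw from each set,
        # e.g. [6, 6, 3, 3, 3, 3, 4, 4] for the left-face-strengthened strategy.
        batch = []
        for samples, k in zip(sample_sets, counts):
            batch.extend(random.sample(samples, k))
        random.shuffle(batch)
        return batch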
本申请实施例,基于姿态信息对训练样本进行分类,获取s个训练样本集合,从s个训练样本集合中选取训练样本,构建训练数据,训练数据的选取可以提升模型的收敛速度,提升模型的训练速度,基于姿态信息的训练数据的选取使得训练数据可以平衡各个角度的人脸对于模型优化的影响,提升人脸关键点的定位精度。例如,可以提升对偏转角度大的人脸图像的关键点的定位精度。
(2)训练模型
一种可实现方式,图7A为本申请实施例的关键点卷积神经网络模型训练的示意图,如图7A所示,将上述实施例的训练数据输入至关键点卷积神经网络模型,关键点卷积神经网络模型对训练数据进行学习,以调整网络模型。
其中,关键点卷积神经网络模型可以对训练样本的人脸图像进行处理,输出人脸关键点的坐标,根据模型输出的人脸关键点的坐标与训练样本的实际的人脸关键点的坐标对网络模型进行优化调整。
示例性的,本申请实施例的关键点卷积神经网络模型可以是残差网络(ResNet),例如ResNet50。图7C和图7D为本申请实施例的ResNet50的网络结构示意图,如图7C和图7D所示,ResNet50由很多个小网络块(也可称为层)组成,每个小网络块的构成如图7D所示,通过加入一个恒等映射连接,使得网络模型对于细微的变动更加敏感,比如,把5映射到5.1,那么引入残差前是要学习一个映射使得F'(5)=5.1,引入残差后是H(5)=5.1,H(5)=F(5)+5,F(5)=0.1。这里的F'和F都表示网络参数映射,引入残差后的映射对输出的变化更敏感。比如s输出从5.1变到5.2,映射F'的输出增加了1/51=2%,而对于残差结构输出从5.1到5.2,映射F是从0.1到0.2,增加了100%。明显后者输出变化对权重的调整作用更大,所以效果更好。残差的思想都是去掉相同的主体部分,从而突出微小的变化。通过引入残差的思想,使得网络层数可以加深,增强网络模型的表达能力。
如图7C所示ResNet50共包括49个卷积层,2个池化层,以及1个全连接层,其中每一个卷积层之后便会跟着一个归一化层和一个线性整流函数层来对卷积层的输出进行约束,从而使得网络模型可以设计得更深,具有更强的表达能力。其中卷积层负责提取图像的高层次特征表达,通过对于不同通道的信息进行融合,提取出输入图像抽象的特征表达;池化层用于对输出矩阵的大小进行压缩,增大图像的感受野,从而保证特征的高度紧凑;全连接层用于对特征图进行线性整合,适配每一个与要解决的问题相关的事先定义好的输出维数,比如,在申请实施例中输出维数为L*2(例如68*2)维,L为人脸图像的人脸关键点的个数,每两个数值表示一个关键点的坐标值,例如一个关键点的x坐标和y坐标。
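A key point regression network of the kind described above can be sketched by taking a standard ResNet-50 backbone and replacing its fully connected layer with an L*2-dimensional output (136 values for 68 key points). The torchvision-based code below is a minimal sketch under that assumption, not the exact network of this application.

    import torch
    import torch.nn as nn
    from torchvision import models

    def keypoint_resnet50(num_keypoints=68):
        # ResNet-50 backbone whose final fully connected layer regresses
        # num_keypoints * 2 values (an x and a y coordinate per key point).
        model = models.resnet50(weights=None)
        model.fc = nn.Linear(model.fc.in_features, num_keypoints * 2)
        return model

    # Example: one 224x224 RGB face crop yields a 68x2 coordinate array.
    net = keypoint_resnet50(68)
    coords = net(torch.randn(1, 3, 224, 224)).view(1, 68, 2)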
图7B为本申请实施例的关键点卷积神经网络模型的训练方法的流程图,如图7B所示,本申请的关键点卷积神经网络模型的训练方法可以包括:
步骤201、初始化关键点卷积神经网络模型。
初始化上述图7A所示的关键点卷积神经网络模型,即将关键点卷积神经网络模型的参数赋值为初始化的参数。
步骤202、将训练数据输入至初始化的如图7A所示关键点卷积神经网络模型,经过 循环迭代,获取目标关键点卷积神经网络模型。
具体实现方式可以为,参见图7A所示,输入的人脸图像,经过处理后,输出人脸关键点的坐标,在本步骤中将输出的人脸关键点的坐标与训练样本的人脸关键点的坐标进行比对,例如,进行相应运算,得到一个损失代价结果,根据损失代价结果对初始化的关键点卷积神经网络模型进行调整,例如可以设置一个损失代价结果满足的预设条件,如果不满足,则可以调整关键点卷积神经网络模型的参数,以调整后的关键点卷积神经网络模型对训练数据的人脸图像进行处理,进而计算一个新的损失代价结果,判断该新的损失代价结果是否满足预设条件,如此反复迭代,直至新的损失代价结果满足预设条件,得到目标关键点卷积神经网络模型。在使用模型阶段,使用该目标关键点卷积神经网络模型。
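The iterative optimization described above (compute the loss between the model output and the labeled key point coordinates, then update the parameters by gradient back-propagation until the loss meets the preset condition) can be sketched as follows. The MSE loss and the Adam optimizer are assumptions; the application does not fix a particular loss or optimizer.

    import torch
    import torch.nn as nn

    def train(model, minibatches, epochs=10, lr=1e-4):
        # minibatches: a list of (images, keypoints) pairs built as described above;
        # images: (N, 3, H, W) tensors, keypoints: (N, L, 2) labeled coordinates.
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        criterion = nn.MSELoss()               # loss between output and labels
        for _ in range(epochs):
            for images, keypoints in minibatches:
                pred = model(images).view(keypoints.shape)
                loss = criterion(pred, keypoints)
                optimizer.zero_grad()
                loss.backward()                # gradients drive the parameter update
                optimizer.step()
        return model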
本实施例,初始化关键点卷积神经网络模型,将训练数据输入至初始化的如图7A所示关键点卷积神经网络模型,经过循环迭代,获取目标关键点卷积神经网络模型,通过不同姿态信息的训练样本对模型进行训练,从而可以平衡不同姿态信息的人脸图像对于关键点卷积神经网络模型优化的影响,并且可以提升训练所获取的目标卷积神经网络模型的人脸关键点的定位精度。另外,由于训练数据的选取,可以使得模型训练过程中梯度下降方向更准确。
另一种可实现方式,与图7A所示实施例不同,本申请实施例进行训练的关键点卷积神经网络模型包括第一关键点卷积神经网络模型和第二关键点卷积神经网络模型。
图8为本申请实施例的关键点卷积神经网络模型训练的示意图,如图8所示,将上述实施例的训练数据输入至第一关键点卷积神经网络模型和第二关键点卷积神经网络模型,其中,第一关键点卷积神经网络模型用于学习左脸图像,第二关键点卷积神经网络模型用于学习右脸图像。
本申请实施例的第一关键点卷积神经网络模型和第二关键点卷积神经网络模型为基于半脸回归的网络结构模型,图像切分模块与第一关键点卷积神经网络模型和第二关键点卷积神经网络模型分别连接,第一关键点卷积神经网络模型和第二关键点卷积神经网络模型分别与汇总输出模块连接。
具体的,将训练数据的人脸图像输入至图像切分模块,图像切分模块用于对人脸图像进行切分,以输出左脸图像和右脸图像。将左脸图像输入至第一关键点卷积神经网络模型,将右脸图像输入至第二关键点卷积神经网络模型。通过第一支路可以得到左脸图像的人脸关键点的坐标,通过第二支路可以得到右脸图像的人脸关键点的坐标,再通过汇总输出模块利用人脸的结构化特征汇总输出人脸图像的人脸关键点的坐标。将输出的人脸关键点的坐标与训练样本的人脸关键点的坐标进行比对,以对第一关键点卷积神经网络模型和第二关键点卷积神经网络模型进行优化,以得到目标关键点卷积神经网络模型,即第一目标关键点卷积神经网络模型和第二目标关键点卷积神经网络模型。
示例性的,本申请实施例的第一关键点卷积神经网络模型和第二关键点卷积神经网络模型中任一模型均可以是残差网络(ResNet),例如ResNet50,其具体解释说明可以参见图7C和图7D的说明,此处不再赘述。
本实施例的关键点卷积神经网络模型的训练方法的流程可以参见图7B所示,即先初 始化关键点卷积神经网络模型,再迭代优化网络模型。其中,不同之处在于,本申请实施例的关键点卷积神经网络模型包括第一关键点卷积神经网络模型和第二关键点卷积神经网络模型,即初始化两个网络模型,将训练数据输入至初始化的如图8所示的第一关键点卷积神经网络模型和第二关键点神经网络模型,经过循环迭代,获取第一目标关键点卷积神经网络模型和第二目标关键点卷积神经网络模型。
本实施例,初始化第一关键点卷积神经网络模型和第二关键点卷积神经网络模型,将训练数据输入至初始化的如图8所示第一关键点卷积神经网络模型和第二关键点卷积神经网络模型,经过循环迭代,获取第一目标关键点卷积神经网络模型和第二目标关键点卷积神经网络模型,通过不同姿态信息的训练样本对模型进行训练,从而可以平衡不同姿态信息的人脸图像对于关键点卷积神经网络模型优化的影响,并且可以提升训练所获取的目标卷积神经网络模型的人脸关键点的定位精度。另外,由于训练数据的选取,可以使得模型训练过程中梯度下降方向更准确。
并且,第一目标关键点卷积神经网络模型和第二目标关键点卷积神经网络模型为半脸回归模型,其网络模型较为简单,优化更为准确,并且其可以利用人脸的结构化特征,提升模型的人脸关键点的定位精度。
再一种可实现方式,与图8所示实施例不同,本申请实施例进行训练的关键点卷积神经网络模型包括第一关键点卷积神经网络模型、第二关键点卷积神经网络模型、第三关键点卷积神经网络模型和第四关键点卷积神经网络模型。
图9为本申请实施例的关键点卷积神经网络模型训练的示意图,如图9所示,将上述实施例的训练数据输入至第一关键点卷积神经网络模型、第二关键点卷积神经网络模型、第三关键点卷积神经网络模型和第四关键点卷积神经网络模型,其中,第一关键点卷积神经网络模型和第三关键点卷积神经网络模型用于学习左脸图像,第二关键点卷积神经网络模型和第四关键点卷积神经网络模型用于学习右脸图像。
本申请实施例的第一关键点卷积神经网络模型、第二关键点卷积神经网络模型、第三关键点卷积神经网络模型和第四关键点卷积神经网络模型为基于半脸回归的两阶段网络结构模型,如图9所示,图像切分模块与第一关键点卷积神经网络模型和第二关键点卷积神经网络模型分别连接,第一关键点卷积神经网络模型和第三关键点卷积神经网络模型通过第一仿射变换模块级联,之后连接第一仿射变换的逆变换模块,第二关键点卷积神经网络模型和第四关键点卷积神经网络模型通过第二仿射变换模块级联,之后连接第二仿射变换的逆变换模块。第一仿射变换的逆变换模块和第二仿射变换的逆变换模块分别连接汇总输出模块。
具体的,将训练数据的人脸图像输入至图像切分模块,图像切分模块用于对人脸图像进行切分,以输出左脸图像和右脸图像。将左脸图像输入至第一关键点卷积神经网络模型,将右脸图像输入至第二关键点卷积神经网络模型。通过第一支路可以得到左脸图像的人脸关键点的坐标,通过第二支路可以得到右脸图像的人脸关键点的坐标,再利用人脸的结构化特征汇总输出人脸图像的人脸关键点的坐标。将输出的人脸关键点的坐标与训练样本的人脸关键点的坐标进行比对,以对各个关键点卷积神经网络模型进行优化,以得到目标关 键点卷积神经网络模型,即第一目标关键点卷积神经网络模型、第二目标关键点卷积神经网络模型、第三目标关键点卷积神经网络模型和第四目标关键点卷积神经网络模型。
示例性的,本申请实施例的第一关键点卷积神经网络模型、第二关键点卷积神经网络模型、第三关键点卷积神经网络模型和第四关键点卷积神经网络模型中任一模型均可以是残差网络(ResNet),例如ResNet50,其具体解释说明可以参见图7C和图7D的说明,此处不再赘述。
本实施例的关键点卷积神经网络模型的训练方法的流程可以参见图7B所示,即先初始化关键点卷积神经网络模型,再迭代优化网络模型。其中,不同之处在于,本申请实施例的关键点卷积神经网络模型包括第一关键点卷积神经网络模型、第二关键点卷积神经网络模型、第三关键点卷积神经网络模型和第四关键点卷积神经网络模型,即初始化四个网络模型,将训练数据输入至初始化的如图9所示的第一关键点卷积神经网络模型和第二关键点神经网络模型,经过循环迭代,获取第一目标关键点卷积神经网络模型、第二目标关键点卷积神经网络模型、第三目标关键点卷积神经网络模型和第四目标关键点卷积神经网络模型。
本实施例,初始化第一关键点卷积神经网络模型、第二关键点卷积神经网络模型、第三关键点卷积神经网络模型和第四关键点卷积神经网络模型,将训练数据输入至初始化的如图9所示第一关键点卷积神经网络模型、第二关键点卷积神经网络模型、第三关键点卷积神经网络模型和第四关键点卷积神经网络模型,经过循环迭代,获取第一目标关键点卷积神经网络模型、第二目标关键点卷积神经网络模型、第三关键点卷积神经网络模型和第四关键点卷积神经网络模型,通过不同姿态信息的训练样本对模型进行训练,从而可以平衡不同姿态信息的人脸图像对于关键点卷积神经网络模型优化的影响,并且可以提升训练所获取的目标卷积神经网络模型的人脸关键点的定位精度。另外,由于训练数据的选取,可以使得模型训练过程中梯度下降方向更准确。
并且,第一目标关键点卷积神经网络模型、第二目标关键点卷积神经网络模型、第三关键点卷积神经网络模型和第四关键点卷积神经网络模型为两阶段半脸回归模型,其网络模型较为简单,优化更为准确,并且其可以利用人脸的结构化特征,提升模型的人脸关键点的定位精度。
上述实施例介绍了训练数据构建和使用训练数据对模型的训练,下述实施例解释说明使用训练后的模型进行人脸关键点定位。
(3)使用模型
图10为本申请实施例的一种图像处理方法的流程图,如图10所示,本实施例的执行主体可以是上述模型应用设备或应用服务器,或其内部芯片,本申请的图像处理方法可以包括:
步骤301、获取人脸图像。
该人脸图像为待处理图像或对待处理图像进行截取操作获取的图像,该待处理图像可以是任意具有拍照功能或摄像功能的终端采集的,例如,智能手机采集的图像。
步骤302、将人脸图像输入至目标关键点卷积神经网络模型,输出人脸关键点的坐标。
其中,该目标关键点卷积神经网络模型为使用不同姿态信息的人脸图像对关键点卷积神经网络模型进行训练后获取的,该姿态信息用于反映人脸的偏转角度。
该目标关键点卷积神经网络模型可以是采用如图7A所示的训练过程进行训练得到的目标目标关键点卷积神经网络模型。
本实施例,获取人脸图像,将人脸图像输入至目标关键点卷积神经网络模型,输出人脸关键点的坐标。本实施例利用目标关键点卷积神经网络模型对人脸图像进行处理,由于目标关键点卷积神经网络模型是使用不同姿态信息的训练样本对模型进行训练,从而可以平衡不同姿态信息的人脸图像对于关键点卷积神经网络模型优化的影响,有效地提升人脸关键点定位的精度。
图11A为本申请实施例的另一种图像处理方法的流程图,图11B为本申请实施例的另一种图像处理方法的示意图,如图11A所示,与图10所示实施例不同,本实施例还可以对人脸图像进行切分处理,使用如图8所示的训练过程获取的目标关键点卷积神经网络模型进行处理,本申请的图像处理方法可以包括:
步骤401、获取人脸图像。
其中,步骤401的具体解释说明可以参见图10所示实施例的步骤301的解释说明,此处不再赘述。
步骤402、根据人脸图像分别获取左脸图像和右脸图像。
一种可实现方式,对人脸图像进行切分处理和填充处理,分别获取左脸图像和右脸图像,该左脸图像和该右脸图像的尺寸与所述人脸图像的尺寸相同。
示例性的一种切分方式,对人脸图像沿竖直方向切分为四等份,取其中左侧三份,并在该左侧三份的最左侧补入黑色背景图,该黑色背景图的尺寸与一等份尺寸相同,获取左脸图像,该左脸图像的尺寸与人脸图像的尺寸大小相同。取四等份中右侧三份,并在该右侧三份的最右侧步入黑色背景图,该黑色背景图的尺寸与一等份尺寸相同,获取右脸图像,该右脸图像的尺寸与人脸图像的尺寸大小相同。
该切分方式可以保证在左脸图像和右脸图像中,左右半脸区域分别位于图像的中心。
需要说明的是,上述切分方式以四等份为例进行举例说明,其也可以是六、七、八等整数值等份,本申请实施例不一一举例说明。
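The four-part split with black padding described above can be written directly; the sketch below assumes an (H, W, 3) image whose width is divisible by four, which keeps the left and right face images the same size as the original.

    import numpy as np

    def split_halves(face):
        # face: (H, W, 3) face image; the width is assumed to be divisible by 4.
        h, w, c = face.shape
        q = w // 4
        pad = np.zeros((h, q, c), dtype=face.dtype)            # black strip, one quarter wide
        left = np.concatenate([pad, face[:, :3 * q]], axis=1)  # left three quarters, padded on the left
        right = np.concatenate([face[:, q:], pad], axis=1)     # right three quarters, padded on the right
        return left, right        # both are the same size as the input image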
参见图11B,通过步骤402处理后,获取的左脸图像和右脸图像分别输入至第一目标关键点卷积神经网络模型和第二目标关键点卷积神经网络模型。
步骤4031、将左脸图像输入至第一目标关键点卷积神经网络模型,输出第一左脸关键点的坐标。
该第一目标关键点卷积神经网络模型可以是采用如图8所示的训练过程获取的,该第一目标关键点卷积神经网络模型对左脸图像进行处理,输入如图11B所示的第一左脸关键点的坐标。
步骤4032、将右脸图像输入至第二目标关键点卷积神经网络模型,输出第一右脸关键点的坐标。
该第二目标关键点卷积神经网络模型可以是采用如图8所示的训练过程获取的,该第 二目标关键点卷积神经网络模型对右脸图像进行处理,输入如图11B所示的第二左脸关键点的坐标。
步骤404、根据第一左脸关键点的坐标和第一右脸关键点的坐标,获取人脸图像的人脸关键点的坐标。
对第一左脸关键点的坐标和第一右脸关键点的坐标进行汇总,例如,第一左脸关键点个数为39,第一右脸关键点的个数为39,根据人脸的结构化信息将第一左脸关键点的坐标和第一右脸关键点的坐标进行汇总,中间区域存在10个点是重复的,中间区域可以使用平均值的计算方式获取中间区域的人脸关键点的坐标,最终得到68个人脸关键点的坐标。
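The merging step can be illustrated as below: each half-face branch contributes 39 points, and the 10 points of the middle region appear in both branches and are averaged. The index maps that place each half-face point into the 68-point layout are assumptions here, since the exact indexing is not spelled out.

    import numpy as np

    def merge_keypoints(left_pts, right_pts, left_idx, right_idx, total=68):
        # left_pts, right_pts: (39, 2) coordinates from the two half-face branches.
        # left_idx, right_idx: assumed index maps (length 39) into the 68-point
        # layout; the 10 indices they share form the middle region, and together
        # they are assumed to cover all 68 positions.
        merged = np.zeros((total, 2))
        counts = np.zeros(total)
        for pts, idx in ((left_pts, left_idx), (right_pts, right_idx)):
            for p, i in zip(pts, idx):
                merged[i] += p
                counts[i] += 1
        return merged / counts[:, None]    # shared middle points become the mean of both branches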
本实施例,获取人脸图像,根据人脸图像分别获取左脸图像和右脸图像,将左脸图像输入至第一目标关键点卷积神经网络模型,输出第一左脸关键点的坐标,将右脸图像输入至第二目标关键点卷积神经网络模型,输出第一右脸关键点的坐标,根据第一左脸关键点的坐标和第一右脸关键点的坐标,获取人脸图像的人脸关键点的坐标。本实施例利用第一目标关键点卷积神经网络模型对左脸图像进行处理,利用第二目标关键点卷积神经网络模型对右脸图像进行处理,由于第一目标关键点卷积神经网络模型和第二目标关键点卷积神经网络模型是使用不同姿态信息的训练样本对模型进行训练,从而可以平衡不同姿态信息的人脸图像对于关键点卷积神经网络模型优化的影响,有效地提升人脸关键点定位的精度。
并且,第一目标关键点卷积神经网络模型和第二目标关键点卷积神经网络模型为半脸回归模型,其网络模型较为简单,半脸定位精度高,并且其可以利用人脸的结构化特征,进一步提升模型的人脸关键点的定位精度。
图12A为本申请实施例的另一种图像处理方法的流程图,图12B为本申请实施例的另一种图像处理方法的示意图,如图12A所示,与图11A所示实施例不同,本实施例还可以通过第三目标关键点卷积神经网络模型和第四目标关键点卷积神经网络模型提升人脸关键点的定位精度,本申请的图像处理方法可以包括:
步骤501、获取人脸图像。
其中,步骤501的具体解释说明可以参见图10所示实施例的步骤301的解释说明,此处不再赘述。
步骤502、根据人脸图像分别获取左脸图像和右脸图像,该左脸图像和该右脸图像的尺寸与所述人脸图像的尺寸相同。
其中,步骤502的具体解释说明可以参见图10所示实施例的步骤402的解释说明,此处不再赘述。
步骤5031、将左脸图像输入至第一目标关键点卷积神经网络模型,输出第一左脸关键点的坐标,根据第一左脸关键点的坐标确定第一仿射变换矩阵,根据第一仿射变换矩阵和左脸图像获取矫正后的左脸图像,将矫正后的左脸图像输入至第三目标关键点卷积神经网络模型,输出矫正后的第一左脸关键点的坐标,根据矫正后的第一左脸关键点的坐标和第一仿射变换矩阵的逆变换,获取第二左脸关键点的坐标。
该第一目标关键点卷积神经网络模型用于对左脸图像进行处理,以输出该左脸图像的 第一左脸关键点的坐标,根据第一左脸关键点的坐标和上述平均图像的关键点的坐标确定第一仿射变换矩阵,例如,一个3*3的矩阵,该第一仿射变换矩阵使得第一仿射变换矩阵乘以第一左脸关键点的坐标的转置与平均图像的关键点的坐标之间的二范数差距最小,其中可以使用经典的最小二乘法来求解第一仿射变换矩阵T L,使用该第一仿射变换矩阵将左脸图像对齐到平均图像,获取矫正后的左脸图像,该矫正后的左脸图像如图12B所示。为了进一步提高人脸关键点定位的准确性,将矫正后的左脸图像输入至第三目标关键点卷积神经网络模型,输出矫正后的第一左脸关键点的坐标,根据第一仿射变换矩阵的逆变换获取第二左脸关键点的坐标,从而得到半脸的关键点输出。
步骤5032、将右脸图像输入至第二目标关键点卷积神经网络模型,输出第一右脸关键点的坐标,根据第一右脸关键点的坐标确定第二仿射变换矩阵,根据第二仿射变换矩阵和右脸图像获取矫正后的右脸图像,将矫正后的右脸图像输入至第四目标关键点卷积神经网络模型,输出矫正后的第一右脸关键点的坐标,根据矫正后的第一右脸关键点的坐标和第二仿射变换矩阵的逆变换,获取第二右脸关键点的坐标。
该第二目标关键点卷积神经网络模型用于对右脸图像进行处理,以输出该右脸图像的第一右脸关键点的坐标,根据第一右脸关键点的坐标和上述平均图像的关键点的坐标确定第二仿射变换矩阵,例如,一个3*3的矩阵,该第二仿射变换矩阵使得第二仿射变换矩阵乘以第一右脸关键点的坐标的转置与平均图像的关键点的坐标之间的二范数差距最小,其中可以使用经典的最小二乘法来求解第一仿射变换矩阵T R,使用该第二仿射变换矩阵将右脸图像对齐到平均图像,获取矫正后的右脸图像,该矫正后的右脸图像如图12B所示。为了进一步提高人脸关键点定位的准确性,将矫正后的右脸图像输入至第四目标关键点卷积神经网络模型,输出矫正后的第一右脸关键点坐标,根据第二仿射变换矩阵的逆变换获取第二右脸关键点的坐标,从而得到半脸的关键点输出。
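The two-stage correction of both branches follows the same pattern: fit an affine transform from the first-stage key points to the mean-image key points by least squares, warp the half-face image with it, run the second-stage model, and map the refined points back through the inverse transform. The sketch below assumes NumPy/OpenCV and a callable second-stage model; it is an illustration, not the solver of this application.

    import numpy as np
    import cv2

    def fit_affine(src_pts, mean_pts):
        # Least-squares affine transform (3x3) mapping src_pts (L, 2) onto
        # mean_pts (L, 2), minimizing the two-norm of the residual.
        ones = np.ones((len(src_pts), 1))
        a = np.hstack([src_pts, ones])                           # homogeneous coordinates
        params, *_ = np.linalg.lstsq(a, mean_pts, rcond=None)    # (3, 2) solution
        t = np.eye(3)
        t[:2, :] = params.T                                      # top 2x3 block is the affine part
        return t

    def correct_and_restore(half_face, first_pts, mean_pts, second_stage_model):
        t = fit_affine(first_pts, mean_pts)
        h, w = half_face.shape[:2]
        corrected = cv2.warpAffine(half_face, t[:2], (w, h))     # corrected half-face image
        refined = second_stage_model(corrected)                  # (L, 2) corrected key points
        ones = np.ones((len(refined), 1))
        back = np.hstack([refined, ones]) @ np.linalg.inv(t).T   # inverse affine transform
        return back[:, :2]                                       # key points in the original frame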
步骤504、根据第二左脸关键点的坐标和第二右脸关键点的坐标,获取人脸图像的人脸关键点的坐标。
对第二左脸关键点的坐标和第二右脸关键点的坐标进行汇总,该第二左脸关键点的坐标和第二右脸关键点的坐标相较于图11所示实施例的第一左脸关键点的坐标和第一右脸关键点的坐标,其精度更高。
本实施例,本实施例利用第一目标关键点卷积神经网络模型对左脸图像进行处理,根据第一目标关键点卷积神经网络模型的输出结果对左脸图像进行矫正,利用第三目标关键点卷积神经网络模型对矫正后的左脸图像进行处理,可以提升左脸关键点的定位精度,利用第二目标关键点卷积神经网络模型对右脸图像进行处理,根据第二目标关键点卷积神经网络模型的输出结果对右脸图像进行矫正,利用第四目标关键点卷积神经网络模型对矫正后的右脸图像进行处理,可以提升右脸关键点的定位精度,由于第一目标关键点卷积神经网络模型、第二目标关键点卷积神经网络模型、第三目标关键点卷积神经网络模型和第四目标关键点卷积神经网络模型是使用不同姿态信息的训练样本对模型进行训练,从而可以平衡不同姿态信息的人脸图像对于关键点卷积神经网络模型优化的影响,有效地提升人脸关键点定位的精度。
并且,第一目标关键点卷积神经网络模型、第二目标关键点卷积神经网络模型、第三 目标关键点卷积神经网络模型和第四目标关键点卷积神经网络模型为两阶段半脸回归模型,半脸定位精度高,并且其可以利用人脸的结构化特征,进一步提升模型的人脸关键点的定位精度。
(4)应用场景
本申请上述实施例的图像处理方法可以对人脸关键点进行定位,该图像处理方法可以应用于人脸识别、人脸姿态估计、人脸图像质量评价、视频交互、活体验证等不同场景。下面采用几个具体的应用场景进行举例说明。
场景一、驾驶员疲劳驾驶提醒系统
图13为本申请实施例的图像处理方法的一种应用场景的示意图,如图13所示,本申请上述实施例的任一种图像处理方法可以应用于图13所示的模型应用设备,该模型应用设备设置有摄像头,该摄像头朝向驾驶员,该摄像头可以固定在车辆操作平台的上方或者其他位置,该模型应用设备存储有本申请实施例的目标关键点卷积神经网络模型。
该模型应用设备的摄像头可以采集驾驶员脸部的照片或对驾驶员进行摄像,采用本申请的图像处理方法对该照片或者摄像获取的视频中每一帧图像进行处理,定位出驾驶员的人脸关键点,进而根据人脸关键点确定是否发出告警信号。
示例性的,根据人脸关键点确定是否发出告警信号的实现方式可以为:根据人脸关键点确定驾驶员行为,判断驾驶员行为是否满足预设条件,该预设条件可以包括驾驶员频繁栽头、闭眼时长超过预设时长等。其中,根据人脸关键点可以确定驾驶员是否有闭眼、栽头、打哈欠的行为,进而判断驾驶员是否处于疲劳驾驶状态,当驾驶员的疲劳状态对驾驶构成威胁时,对驾驶员提出警告。例如,该告警信号可以触发扬声器播放提示音,或者触发方向盘振动。
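One common way of turning the key points into an eye-closure signal, used here purely as an illustration and not specified by this application, is the eye aspect ratio: the ratio drops when the eyelids close, and a sustained low value can feed the fatigue warning logic described above.

    import numpy as np

    def eye_aspect_ratio(eye):
        # eye: (6, 2) key points of one eye, assumed to follow the common
        # 68-point ordering (two corner points and four eyelid points).
        v1 = np.linalg.norm(eye[1] - eye[5])
        v2 = np.linalg.norm(eye[2] - eye[4])
        h = np.linalg.norm(eye[0] - eye[3])
        return (v1 + v2) / (2.0 * h)

    def eyes_closed(landmarks, threshold=0.2):
        # landmarks: (68, 2); indices 36-41 and 42-47 are assumed to be the
        # two eyes, as in the common 68-point convention.
        left = eye_aspect_ratio(landmarks[36:42])
        right = eye_aspect_ratio(landmarks[42:48])
        return (left + right) / 2.0 < threshold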
场景二、视频应用交互系统
图14A至图14C为本申请实施例的图像处理方法的一种应用场景的界面示意图,本实施例中,上述图像处理方法可以应用于模型应用设备,该模型应用设备可以是如上所述的任一种终端,该模型应用设备上设置有客户端(例如APP),客户端通过模型应用设备的摄像头采集人脸图像,并通过本申请实施例的图像处理方法确定人脸关键点,进而根据人脸关键点实现虚拟化妆、佩带装饰等交互操作。示例性的,该模型应用设备显示该客户端的图像预览界面,该图像预览界面可以是如图14A至图14C任一左侧界面,客户端通过模型应用设备的摄像头采集人脸图像,并通过本申请实施例的图像处理方法确定人脸关键点,客户端根据人脸关键点的坐标和美颜效果参数对待处理图像进行调整,在图像预览界面显示调整后的待处理图像,该调整后的待处理图像可以是如图14A至图14C任一右侧界面,该美颜效果参数包括虚拟装饰参数、瘦脸参数、眼睛大小调整参数、磨皮去痘参数、皮肤美白参数、牙齿美白参数和腮红参数中至少一项或其组合。
其中,美颜效果参数为根据用户输入的触发指令确定的,如图14A至图14C任一左侧界面所示,该图像预览界面包括多个图形组件,每个图像组件用于触发一种美颜效果,例如,第一个图像组件用于触发增加虚拟装饰1,第二个图像组件用于触发增加虚拟装饰 2,第三个图像组件用于触发增加虚拟装饰3。当用户点击该第一个图像组件,响应该用户操作方式对应的触发指令,图像预览界面切换至如图14A的右侧界面,即在人脸图像的额头部位增加兔子耳朵的虚拟装饰,在人脸图像的鼻子部位增加兔子鼻子的虚拟装饰。当用户点击该第二个图像组件,响应该用户操作方式对应的触发指令,图像预览界面切换至如图14B的右侧界面,即在人脸图像的眼睛部位增加眼镜的虚拟装饰,在人脸图像的背景区域增加数学符号的虚拟装饰。当用户点击该第三个图像组件,响应该用户操作方式对应的触发指令,图像预览界面切换至如图14C的右侧界面,即在人脸图像的额头部位增加皇冠的虚拟装饰。
可选的,在显示调整后的待处理图像之前,还可以根据人脸关键点的坐标和人脸图像,获取关键点人脸图像,关键点人脸图像中标记有人脸关键点,在图像预览界面显示关键点人脸图像,接收用户输入的关键点调整指令,所述关键点调整指令用于指示调整后的人脸关键点。根据调整后的人脸关键点和美颜效果参数对待处理图像进行调整。
由于本申请上述实施例的图像处理方法可以准确定位人脸关键点,从而可以提升美颜效果。
可选的,在显示调整后的待处理图像之前,还可以根据人脸关键点的坐标和人脸图像,获取关键点人脸图像,关键点人脸图像中标记有人脸关键点,在图像预览界面显示关键点人脸图像,接收用户输入的关键点调整指令,所述关键点调整指令用于指示调整后的人脸关键点。根据调整后的人脸关键点和美颜效果参数对待处理图像进行调整。
场景三、视频监控中人脸识别
视频监控中通常可以根据人脸来确定一个人的身份,一般情况下人的姿态角度各异,对于人脸识别模块来说,进行提取特征较为困难,通过本申请上述实施例的图像处理方法可以准确确定出人脸关键点,从而大幅度降低人脸识别算法的难度,提升算法识别能力。
示例性的,视频监控系统中的摄像头可以采集待处理图像,获取人脸图像,通过本申请实施例的图像处理方法对人脸图像进行处理,输出人脸图像的人脸关键点的坐标,人脸识别模块可以根据人脸关键点的坐标对人脸图像进行特征提取,获取人脸图像特征,将人脸图像特征与数据库中的特征模板进行匹配,输出识别结果。
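The matching step of the face recognition module can be sketched as a nearest-template search over stored feature vectors; the cosine similarity and the 0.6 threshold are purely illustrative assumptions.

    import numpy as np

    def recognize(feature, templates, threshold=0.6):
        # feature: vector extracted from the key-point-aligned face image;
        # templates: dict mapping identity -> stored feature vector.
        def cos(a, b):
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
        best_id, best_score = None, -1.0
        for identity, tpl in templates.items():
            score = cos(feature, tpl)
            if score > best_score:
                best_id, best_score = identity, score
        return best_id if best_score >= threshold else None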
需要说明的是,本申请所提供的图像处理方法,不仅适用于终端设备采用前置摄像头传感器拍摄的应用场景,也适用于终端设备采用后置摄像头传感器拍摄的应用场景。同样的,本申请的方法还适用于终端设备采用双摄像头传感器拍摄的应用场景。在任一应用场景下,终端设备可以通过对摄像头传感器输出的图像采用步骤301-步骤302、或者步骤401-步骤404、或者步骤501-步骤504的方法步骤进行处理。
可以理解的是,上述各个实施例中,由终端设备实现的方法或步骤,也可以是由终端设备内部的芯片实现的。
图15为本申请实施例的一种终端设备的结构示意图。如图15所示,上述终端设备可以包括:
获取模块101,用于获取人脸图像。
处理模块102,用于根据所述人脸图像分别获取左脸图像和右脸图像,所述左脸图像和所述右脸图像的尺寸与所述人脸图像的尺寸相同;
处理模块102,还用于将所述左脸图像输入至第一目标关键点卷积神经网络模型,输出第一左脸关键点的坐标,所述第一目标关键点卷积神经网络模型为使用具有关键点信息的左脸图像对关键点卷积神经网络模型进行训练后获取的;
处理模块102,还用于将所述右脸图像输入至第二目标关键点卷积神经网络模型,输出第一右脸关键点的坐标,所述第二目标关键点卷积神经网络模型为使用具有关键点信息的右脸图像对关键点卷积神经网络模型进行训练后获取的;
处理模块102,还用于根据所述第一左脸关键点的坐标和所述第一右脸关键点的坐标,获取人脸图像的人脸关键点的坐标。
在一些实施例中,所述具有关键点信息的左脸图像和所述具有关键点信息的右脸图像为根据不同姿态信息的人脸图像获取的,所述不同姿态信息的人脸图像具有对应的关键点信息,不同姿态信息的人脸图像包括第一姿态信息的人脸图像、第二姿态信息的人脸图像和第三姿态信息的人脸图像,第一姿态信息用于表示人脸的偏转角度的方向为向左的姿态信息,第二姿态信息用于表示人脸的偏转角度的方向为正向的姿态信息,第三姿态信息用于表示人脸的偏转角度的方向为右向的姿态信息。
在一些实施例中,处理模块102用于:根据所述第一左脸关键点的坐标确定第一仿射变换矩阵;根据所述第一仿射变换矩阵和所述左脸图像获取矫正后的左脸图像;将所述矫正后的左脸图像输入至第三目标关键点卷积神经网络模型,输出矫正后的第一左脸关键点的坐标;根据所述矫正后的第一左脸关键点的坐标和所述第一仿射变换矩阵的逆变换,获取第二左脸关键点的坐标;根据所述第二左脸关键点的坐标和所述第一右脸关键点的坐标,获取人脸图像的人脸关键点的坐标。
在一些实施例中,处理模块102用于:根据所述第一右脸关键点的坐标确定第二仿射变换矩阵;根据所述第二仿射变换矩阵和所述右脸图像获取矫正后的右脸图像;将所述矫正后的右脸图像输入至第四目标关键点卷积神经网络模型,输出矫正后的第一右脸关键点的坐标;根据所述矫正后的第一右脸关键点的坐标和所述第二仿射变换矩阵的逆变换,获取第二右脸关键点的坐标;根据所述第二右脸关键点的坐标和所述第一左脸关键点的坐标,获取人脸图像的人脸关键点的坐标。
在一些实施例中,处理模块102用于:根据所述第一左脸关键点的坐标确定第一仿射变换矩阵,根据所述第一仿射变换矩阵和所述左脸图像获取矫正后的左脸图像;根据所述第一右脸关键点的坐标确定第二仿射变换矩阵,根据所述第二仿射变换矩阵和所述右脸图像获取矫正后的右脸图像;将所述矫正后的左脸图像输入至第三目标关键点卷积神经网络模型,输出矫正后的第一左脸关键点的坐标,根据所述矫正后的第一左脸关键点的坐标和所述第一仿射变换矩阵的逆变换,获取第二左脸关键点的坐标;将所述矫正后的右脸图像输入至第四目标关键点卷积神经网络模型,输出矫正后的第一右脸关键点的坐标,根据所述矫正后的第一右脸关键点的坐标和所述第二仿射变换矩阵的逆变换,获取第二右脸关键点的坐标;根据所述第二左脸关键点的坐标和所述第二右脸关键点的坐标,获取人脸图像的人脸关键点的坐标。
在一些实施例中,获取模块101还用于:基于姿态信息对多个训练样本进行分类,获取s个训练样本集合,所述训练样本包括具有关键点信息的人脸图像;从所述s个训练样本集合中至少三个集合中选取多个训练样本,作为训练数据;使用所述训练数据对两个关键点卷积神经网络模型进行训练,获取所述第一目标关键点卷积神经网络模型和所述第二目标关键点卷积神经网络模型;其中,s为大于等于3的任意整数。
在一些实施例中,获取模块101还用于:通过终端的拍照功能或拍摄功能采集待处理图像;在所述待处理图像中截取所述人脸图像。
本申请提供的终端设备,可以执行上述方法实施例,其实现原理和技术效果类似,在此不再赘述。
在一些实施例中,图16为本申请实施例的又一种终端设备的结构示意图。如图16所示,在上述图15所示框图的基础上,终端设备还可以包括:驾驶预警模块103,用于根据所述人脸关键点的坐标确定驾驶员行为,根据所述驾驶员行为确定是否发出告警信号。
在一些实施例中,图17为本申请实施例的又一种终端设备的结构示意图。如图17所示,在上述图15所示框图的基础上,终端设备还可以包括调整模块104,用于根据所述人脸关键点的坐标和美颜效果参数对所述待处理图像进行调整,在所述图像预览界面显示调整后的待处理图像;所述美颜效果参数包括虚拟装饰参数、瘦脸参数、眼睛大小调整参数、磨皮去痘参数、皮肤美白参数、牙齿美白参数和腮红参数中至少一项或其组合。
在一些实施例中,调整模块104还用于:根据所述人脸关键点的坐标和所述人脸图像,获取关键点人脸图像,所述关键点人脸图像中标记有所述人脸关键点;在所述图像预览界面显示关键点人脸图像;调整模块104还用于接收用户输入的关键点调整指令,所述关键点调整指令用于指示调整后的人脸关键点;根据所述调整后的人脸关键点和美颜效果参数对所述待处理图像进行调整。
在一些实施例中,图18为本申请实施例的又一种终端设备的结构示意图。如图18所示,在上述图15所示框图的基础上,终端设备还可以包括:人脸识别模块105,用于根据所述人脸关键点的坐标进行人脸识别。
在一些实施例中,人脸识别模块105用于:根据人脸关键点的坐标对所述人脸图像进行特征提取,获取人脸图像特征;将人脸图像特征与数据库中的特征模板进行匹配,输出识别结果。
本申请提供的终端设备,可以执行上述方法实施例,其实现原理和技术效果类似,在此不再赘述。
图19为本申请实施例的又一种终端设备的结构示意图。如图19所示,该终端设备可以包括:处理器21(例如CPU)和存储器22;存储器22可能包含高速RAM存储器,也可能还包括非易失性存储器NVM,例如至少一个磁盘存储器,存储器22中可以存储各种指令,以用于完成各种处理功能以及实现本申请的方法步骤。可选的,本申请涉及的终端设备还可以包括:接收器23、发送器24、电源25、通信总线26以及通信端口27。接收器23和发送器24可以集成在终端设备的收发信机中,也可以为终端设备上独立的收发天线。通信总线26用于实现元件之间的通信连接。上述通信端口27用于实现终端设备与其 他外设之间进行连接通信。
在本申请中,上述存储器22用于存储计算机可执行程序代码,程序代码包括指令;当处理器21执行指令时,指令使终端设备执行上述方法实施例,其实现原理和技术效果类似,在此不再赘述。
正如上述实施例,本申请涉及的终端设备可以是手机、平板电脑等无线终端,因此,以终端设备为手机为例:图20为本申请实施例的终端设备为手机时的结构框图。参考图20,该手机可以包括:射频(Radio Frequency,RF)电路1110、存储器1120、输入单元1130、显示单元1140、传感器1150、音频电路1160、无线保真(wireless fidelity,WiFi)模块1170、处理器1180、以及电源1190等部件。本领域技术人员可以理解,图20中示出的手机结构并不构成对手机的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。
下面结合图20对手机的各个构成部件进行具体的介绍:
RF电路1110可用于收发信息或通话过程中,信号的接收和发送,例如,将基站的下行信息接收后,给处理器1180处理;另外,将上行的数据发送给基站。通常,RF电路包括但不限于天线、至少一个放大器、收发信机、耦合器、低噪声放大器(Low Noise Amplifier,LNA)、双工器等。此外,RF电路1110还可以通过无线通信与网络和其他设备通信。上述无线通信可以使用任一通信标准或协议,包括但不限于全球移动通讯系统(Global System of Mobile communication,GSM)、通用分组无线服务(General Packet Radio Service,GPRS)、码分多址(Code Division Multiple Access,CDMA)、宽带码分多址(Wideband Code Division Multiple Access,WCDMA)、长期演进(Long Term Evolution,LTE))、电子邮件、短消息服务(Short Messaging Service,SMS)等。
存储器1120可用于存储软件程序以及模块,处理器1180通过运行存储在存储器1120的软件程序以及模块,从而执行手机的各种功能应用以及数据处理。存储器1120可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需的应用程序(比如声音播放功能、图像播放功能等)等;存储数据区可存储根据手机的使用所创建的数据(比如音频数据、电话本等)等。此外,存储器1120可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他易失性固态存储器件。
输入单元1130可用于接收输入的数字或字符信息,以及产生与手机的用户设置以及功能控制有关的键信号输入。具体地,输入单元1130可包括触控面板1131以及其他输入设备1132。触控面板1131,也称为触摸屏,可收集用户在其上或附近的触摸操作(比如用户使用手指、触笔等任何适合的物体或附件在触控面板1131上或在触控面板1131附近的操作),并根据预先设定的程式驱动相应的连接装置。可选的,触控面板1131可包括触摸检测装置和触摸控制器两个部分。其中,触摸检测装置检测用户的触摸方位,并检测触摸操作带来的信号,将信号传送给触摸控制器;触摸控制器从触摸检测装置上接收触摸信息,并将它转换成触点坐标,再送给处理器1180,并能接收处理器1180发来的命令并加以执行。此外,可以采用电阻式、电容式、红外线以及表面声波等多种类型实现触控面板1131。除了触控面板1131,输入单元1130还可以包括其他输入设备1132。具体地,其他 输入设备1132可以包括但不限于物理键盘、功能键(比如音量控制按键、开关按键等)、轨迹球、鼠标、操作杆等中的一种或多种。
显示单元1140可用于显示由用户输入的信息或提供给用户的信息以及手机的各种菜单。显示单元1140可包括显示面板1141,可选的,可以采用液晶显示器(Liquid Crystal Display,LCD)、有机发光二极管(Organic Light-Emitting Diode,OLED)等形式来配置显示面板1141。进一步的,触控面板1131可覆盖于显示面板1141之上,当触控面板1131检测到在其上或附近的触摸操作后,传送给处理器1180以确定触摸事件的类型,随后处理器1180根据触摸事件的类型在显示面板1141上提供相应的视觉输出。虽然在图10中,触控面板1131与显示面板1141是作为两个独立的部件来实现手机的输入和输入功能,但是在某些实施例中,可以将触控面板1131与显示面板1141集成而实现手机的输入和输出功能。
手机还可包括至少一种传感器1150,比如光传感器、运动传感器以及其他传感器。具体地,光传感器可包括环境光传感器及接近传感器,其中,环境光传感器可根据环境光线的明暗来调节显示面板1141的亮度,光传感器可在手机移动到耳边时,关闭显示面板1141和/或背光。作为运动传感器的一种,加速度传感器可检测各个方向上(一般为三轴)加速度的大小,静止时可检测出重力的大小及方向,可用于识别手机姿态的应用(比如横竖屏切换、相关游戏、磁力计姿态校准)、振动识别相关功能(比如计步器、敲击)等;至于手机还可配置的陀螺仪、气压计、湿度计、温度计、红外线传感器等其他传感器,在此不再赘述。
音频电路1160、扬声器1161以及传声器1162可提供用户与手机之间的音频接口。音频电路1160可将接收到的音频数据转换后的电信号,传输到扬声器1161,由扬声器1161转换为声音信号输出;另一方面,传声器1162将收集的声音信号转换为电信号,由音频电路1160接收后转换为音频数据,再将音频数据输出处理器1180处理后,经RF电路1110以发送给比如另一手机,或者将音频数据输出至存储器1120以便进一步处理。
WiFi属于短距离无线传输技术,手机通过WiFi模块1170可以帮助用户收发电子邮件、浏览网页和访问流式媒体等,它为用户提供了无线的宽带互联网访问。虽然图20示出了WiFi模块1170,但是可以理解的是,其并不属于手机的必须构成,完全可以根据需要在不改变本申请的本质的范围内而省略。
处理器1180是手机的控制中心,利用各种接口和线路连接整个手机的各个部分,通过运行或执行存储在存储器1120内的软件程序和/或模块,以及调用存储在存储器1120内的数据,执行手机的各种功能和处理数据,从而对手机进行整体监控。可选的,处理器1180可包括一个或多个处理单元;例如,处理器1180可集成应用处理器和调制解调处理器,其中,应用处理器主要处理操作系统、用户界面和应用程序等,调制解调处理器主要处理无线通信。可以理解的是,上述调制解调处理器也可以不集成到处理器1180中。
手机还包括给各个部件供电的电源1190(比如电池),可选的,电源可以通过电源管理系统与处理器1180逻辑相连,从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。
手机还可以包括摄像头1200,该摄像头可以为前置摄像头,也可以为后置摄像头。 尽管未示出,手机还可以包括蓝牙模块、GPS模块等,在此不再赘述。
在本申请中,该手机所包括的处理器1180可以用于执行上述图像处理方法实施例,其实现原理和技术效果类似,在此不再赘述。
图21为本申请实施例的一种训练设备的结构示意图,如图21所示,本实施例的训练设备可以包括:图像获取模块201,用于根据不同姿态信息的人脸图像获取具有关键点信息的左脸图像和具有关键点信息的右脸图像,所述不同姿态信息的人脸图像具有对应的关键点信息;训练模块202,用于使用所述具有关键点信息的左脸图像对关键点卷积神经网络模型进行训练,获取第一目标关键点卷积神经网络模型,所述第一目标关键点卷积神经网络模型用于对输入的左脸图像进行处理,输出左脸关键点的坐标;使用所述具有关键点信息的右脸图像对关键点卷积神经网络模型进行训练,获取第二目标关键点卷积神经网络模型,所述第二目标关键点卷积神经网络模型用于对输入的右脸图像进行处理,输出右脸关键点的坐标;其中,姿态信息用于反映人脸的偏转角度。
在一些实施例中。不同姿态信息的人脸图像包括第一姿态信息的人脸图像、第二姿态信息的人脸图像和第三姿态信息的人脸图像,第一姿态信息用于表示人脸的偏转角度的方向为向左的姿态信息,第二姿态信息用于表示人脸的偏转角度的方向为正向的姿态信息,第三姿态信息用于表示人脸的偏转角度的方向为右向的姿态信息。
在一些实施例中,图像获取模块201,还用于基于姿态信息对多个训练样本进行分类,获取s个训练样本集合,训练样本包括具有关键点信息的人脸图像;从s个训练样本集合中至少三个集合中选取多个训练样本,作为不同姿态信息的人脸图像。
本实施例以上所述的训练设备,可以用于执行上述实施例中训练设备/训练设备的芯片、或者应用服务器/应用服务器的芯片执行的技术方案,其实现原理和技术效果类似,其中各个模块的功能可以参考方法实施例中相应的描述,此处不再赘述。
图22为本申请实施例的另一种训练设备的结构示意图,如图22所示,本实施例的训练设备,包括:收发器211和处理器212。
收发器211可以包括混频器等必要的射频通信器件。处理器212可以包括CPU、DSP、MCU、ASIC或FPGA中的至少一个。
可选地,本实施例的训练设备还可以包括存储器213,存储器213用于存储程序指令,收发器211用于调用存储器213中的程序指令执行上述方案。
本实施例以上所述的训练设备,可以用于执行上述各方法实施例中训练设备/训练设备的芯片、或者应用服务器/应用服务器的芯片执行的技术方案,其实现原理和技术效果类似,其中各个器件的功能可以参考方法实施例中相应的描述,此处不再赘述。
图23为本申请实施例的一种芯片的结构示意图,如图23所示,本实施例的芯片可以作为训练设备的芯片、或者应用服务器的芯片,本实施例的芯片可以包括:存储器221和处理器222。存储器221与处理器222通信连接。所述处理器222例如可以包括CPU、DSP、MCU、ASIC或FPGA的至少一个。
在硬件实现上,以上各个功能模块可以以硬件形式内嵌于或独立于芯片的处理器222中。
其中,存储器221用于存储程序指令,处理器222用于调用存储器221中的程序指令执行上述方案。
所述程序指令可以以软件功能单元的形式实现并能够作为独立的产品销售或使用,所述存储器可以是任意形式的计算机可读取存储介质。基于这样的理解,本申请的技术方案的全部或部分可以以软件产品的形式体现出来,包括若干指令用以使得一台计算机设备,具体可以是处理器222,来执行本申请各个实施例中网络设备的全部或部分步骤。而前述的计算机可读存储介质包括:U盘、移动硬盘、ROM、RAM、磁碟或者光盘等各种可以存储程序代码的介质。
本实施例以上所述的芯片,可以用于执行本申请上述各方法实施例中训练设备或其内部芯片的技术方案,其实现原理和技术效果类似,其中各个模块的功能可以参考方法实施例中相应的描述,此处不再赘述。
需要说明的是,本申请实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。在本申请的实施例中的各功能模块可以集成在一个处理模块中,也可以是各个模块单独物理存在,也可以两个或两个以上模块集成在一个模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行计算机程序指令时,全部或部分地产生按照本申请实施例的流程或功能。计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘Solid State Disk(SSD))等。

Claims (33)

  1. 一种图像处理方法,其特征在于,包括:
    获取人脸图像;
    根据所述人脸图像分别获取左脸图像和右脸图像,所述左脸图像和所述右脸图像的尺寸与所述人脸图像的尺寸相同;
    将所述左脸图像输入至第一目标关键点卷积神经网络模型,输出第一左脸关键点的坐标,所述第一目标关键点卷积神经网络模型为使用具有关键点信息的左脸图像对关键点卷积神经网络模型进行训练后获取的;
    将所述右脸图像输入至第二目标关键点卷积神经网络模型,输出第一右脸关键点的坐标,所述第二目标关键点卷积神经网络模型为使用具有关键点信息的右脸图像对关键点卷积神经网络模型进行训练后获取的;
    根据所述第一左脸关键点的坐标和所述第一右脸关键点的坐标,获取人脸图像的人脸关键点的坐标。
  2. 根据权利要求1所述的方法,其特征在于,所述具有关键点信息的左脸图像和所述具有关键点信息的右脸图像为根据不同姿态信息的人脸图像获取的,所述不同姿态信息的人脸图像具有对应的关键点信息,所述不同姿态信息的人脸图像包括第一姿态信息的人脸图像、第二姿态信息的人脸图像和第三姿态信息的人脸图像,所述第一姿态信息用于表示人脸的偏转角度的方向为向左的姿态信息,所述第二姿态信息用于表示人脸的偏转角度的方向为正向的姿态信息,所述第三姿态信息用于表示人脸的偏转角度的方向为右向的姿态信息。
  3. 根据权利要求1或2所述的方法,其特征在于,所述根据所述第一左脸关键点的坐标和所述第一右脸关键点的坐标,获取人脸图像的人脸关键点的坐标,包括:
    根据所述第一左脸关键点的坐标确定第一仿射变换矩阵;
    根据所述第一仿射变换矩阵和所述左脸图像获取矫正后的左脸图像;
    将所述矫正后的左脸图像输入至第三目标关键点卷积神经网络模型,输出矫正后的第一左脸关键点的坐标;
    根据所述矫正后的第一左脸关键点的坐标和所述第一仿射变换矩阵的逆变换,获取第二左脸关键点的坐标;
    根据所述第二左脸关键点的坐标和所述第一右脸关键点的坐标,获取人脸图像的人脸关键点的坐标。
  4. 根据权利要求1或2所述的方法,其特征在于,所述根据所述第一左脸关键点的坐标和所述第一右脸关键点的坐标,获取人脸图像的人脸关键点的坐标,包括:
    根据所述第一右脸关键点的坐标确定第二仿射变换矩阵;
    根据所述第二仿射变换矩阵和所述右脸图像获取矫正后的右脸图像;
    将所述矫正后的右脸图像输入至第四目标关键点卷积神经网络模型,输出矫正后的第一右脸关键点的坐标;
    根据所述矫正后的第一右脸关键点的坐标和所述第二仿射变换矩阵的逆变换,获取第二右脸关键点的坐标;
    根据所述第二右脸关键点的坐标和所述第一左脸关键点的坐标,获取人脸图像的人脸关键点的坐标。
  5. 根据权利要求1或2所述的方法,其特征在于,所述根据所述第一左脸关键点的坐标和所述第一右脸关键点的坐标,获取人脸图像的人脸关键点的坐标,包括:
    根据所述第一左脸关键点的坐标确定第一仿射变换矩阵,根据所述第一右脸关键点的坐标确定第二仿射变换矩阵;
    根据所述第一仿射变换矩阵和所述左脸图像获取矫正后的左脸图像,根据所述第二仿射变换矩阵和所述右脸图像获取矫正后的右脸图像;
    将所述矫正后的左脸图像输入至第三目标关键点卷积神经网络模型,输出矫正后的第一左脸关键点的坐标,将所述矫正后的右脸图像输入至第四目标关键点卷积神经网络模型,输出矫正后的第一右脸关键点的坐标;
    根据所述矫正后的第一左脸关键点的坐标和所述第一仿射变换矩阵的逆变换,获取第二左脸关键点的坐标,根据所述矫正后的第一右脸关键点的坐标和所述第二仿射变换矩阵的逆变换,获取第二右脸关键点的坐标;
    根据所述第二左脸关键点的坐标和所述第二右脸关键点的坐标,获取人脸图像的人脸关键点的坐标。
  6. 根据权利要求2至5任一项所述的方法,其特征在于,所述方法还包括:
    基于姿态信息对多个训练样本进行分类,获取s个训练样本集合,所述训练样本包括具有关键点信息的人脸图像;
    从所述s个训练样本集合中至少三个集合中选取多个训练样本,作为训练数据;
    使用所述训练数据对两个关键点卷积神经网络模型进行训练,获取所述第一目标关键点卷积神经网络模型和所述第二目标关键点卷积神经网络模型;
    其中,s为大于等于3的任意整数。
  7. 根据权利要求1至6任一项所述的方法,其特征在于,所述获取人脸图像,包括:
    通过终端的拍照功能或拍摄功能采集待处理图像;
    在所述待处理图像中截取所述人脸图像。
  8. 根据权利要求7所述的方法,其特征在于,所述方法还包括:
    根据所述人脸关键点的坐标确定驾驶员行为,根据所述驾驶员行为确定是否发出告警信号。
  9. 根据权利要求7所述的方法,其特征在于,所述方法还包括:
    根据所述人脸关键点的坐标和美颜效果参数对所述待处理图像进行调整,在图像预览界面显示调整后的待处理图像;
    所述美颜效果参数包括虚拟装饰参数、瘦脸参数、眼睛大小调整参数、磨皮去痘参数、皮肤美白参数、牙齿美白参数和腮红参数中至少一项或其组合。
  10. 根据权利要求9所述的方法,其特征在于,在显示调整后的待处理图像之前,所述方法还包括:
    根据所述人脸关键点的坐标和所述人脸图像,获取关键点人脸图像,所述关键点人脸图像中标记有所述人脸关键点;
    在所述图像预览界面显示关键点人脸图像;
    接收用户输入的关键点调整指令,所述关键点调整指令用于指示调整后的人脸关键点;
    所述根据所述人脸关键点的坐标和美颜效果参数对所述待处理图像进行调整,包括:
    根据所述调整后的人脸关键点和美颜效果参数对所述待处理图像进行调整。
  11. 根据权利要求7所述的方法,其特征在于,所述方法还包括:
    根据所述人脸关键点的坐标进行人脸识别。
  12. 根据权利要求11所述的方法,其特征在于,所述根据所述人脸关键点的坐标进行人脸识别,包括:
    根据所述人脸关键点的坐标对所述人脸图像进行特征提取,获取人脸图像特征;
    将所述人脸图像特征与数据库中的特征模板进行匹配,输出识别结果。
  13. 一种图像处理方法,其特征在于,包括:
    根据不同姿态信息的人脸图像获取具有关键点信息的左脸图像和具有关键点信息的右脸图像,所述不同姿态信息的人脸图像具有对应的关键点信息;
    使用所述具有关键点信息的左脸图像对关键点卷积神经网络模型进行训练,获取第一目标关键点卷积神经网络模型,所述第一目标关键点卷积神经网络模型用于对输入的左脸图像进行处理,输出左脸关键点的坐标;
    使用所述具有关键点信息的右脸图像对关键点卷积神经网络模型进行训练,获取第二目标关键点卷积神经网络模型,所述第二目标关键点卷积神经网络模型用于对输入的右脸图像进行处理,输出右脸关键点的坐标;
    其中,所述姿态信息用于反映人脸的偏转角度。
  14. 根据权利要求13所述的方法,其特征在于,所述不同姿态信息的人脸图像包括第一姿态信息的人脸图像、第二姿态信息的人脸图像和第三姿态信息的人脸图像,所述第一姿态信息用于表示人脸的偏转角度的方向为向左的姿态信息,所述第二姿态信息用于表示人脸的偏转角度的方向为正向的姿态信息,所述第三姿态信息用于表示人脸的偏转角度的方向为右向的姿态信息。
  15. 根据权利要求14所述的方法,其特征在于,所述方法还包括:
    基于姿态信息对多个训练样本进行分类,获取s个训练样本集合,所述训练样本包括具有关键点信息的人脸图像;
    从所述s个训练样本集合中至少三个集合中选取多个训练样本,作为所述不同姿态信息的人脸图像;
    其中,s为大于等于3的任意整数。
  16. 一种图像处理装置,其特征在于,包括:
    获取模块,用于获取人脸图像;
    处理模块,根据所述人脸图像分别获取左脸图像和右脸图像,所述左脸图像和所述右脸图像的尺寸与所述人脸图像的尺寸相同;
    所述处理模块,还用于将所述左脸图像输入至第一目标关键点卷积神经网络模型,输出第一左脸关键点的坐标,所述第一目标关键点卷积神经网络模型为使用具有关键点信息的左脸图像对关键点卷积神经网络模型进行训练后获取的;
    所述处理模块,还用于将所述右脸图像输入至第二目标关键点卷积神经网络模型,输出第一右脸关键点的坐标,所述第二目标关键点卷积神经网络模型为使用具有关键点信息的右脸图像对关键点卷积神经网络模型进行训练后获取的;
    所述处理模块,还用于根据所述第一左脸关键点的坐标和所述第一右脸关键点的坐标,获取人脸图像的人脸关键点的坐标。
  17. 根据权利要求16所述的装置,其特征在于,所述具有关键点信息的左脸图像和所述具有关键点信息的右脸图像为根据不同姿态信息的人脸图像获取的,所述不同姿态信息的人脸图像具有对应的关键点信息,所述不同姿态信息的人脸图像包括第一姿态信息的人脸图像、第二姿态信息的人脸图像和第三姿态信息的人脸图像,所述第一姿态信息用于表示人脸的偏转角度的方向为向左的姿态信息,所述第二姿态信息用于表示人脸的偏转角度的方向为正向的姿态信息,所述第三姿态信息用于表示人脸的偏转角度的方向为右向的姿态信息。
  18. 根据权利要求16或17所述的装置,其特征在于,所述处理模块用于:
    根据所述第一左脸关键点的坐标确定第一仿射变换矩阵;根据所述第一仿射变换矩阵和所述左脸图像获取矫正后的左脸图像;
    将所述矫正后的左脸图像输入至第三目标关键点卷积神经网络模型,输出矫正后的第一左脸关键点的坐标;根据所述矫正后的第一左脸关键点的坐标和所述第一仿射变换矩阵的逆变换,获取第二左脸关键点的坐标;
    根据所述第二左脸关键点的坐标和所述第一右脸关键点的坐标,获取人脸图像的人脸关键点的坐标。
  19. 根据权利要求16或17所述的装置,其特征在于,所述处理模块用于:
    根据所述第一右脸关键点的坐标确定第二仿射变换矩阵;根据所述第二仿射变换矩阵和所述右脸图像获取矫正后的右脸图像;
    将所述矫正后的右脸图像输入至第四目标关键点卷积神经网络模型,输出矫正后的第一右脸关键点的坐标;根据所述矫正后的第一右脸关键点的坐标和所述第二仿射变换矩阵的逆变换,获取第二右脸关键点的坐标;
    根据所述第二右脸关键点的坐标和所述第一左脸关键点的坐标,获取人脸图像的人脸关键点的坐标。
  20. 根据权利要求16或17所述的装置,其特征在于,所述处理模块用于:
    根据所述第一左脸关键点的坐标确定第一仿射变换矩阵,根据所述第一仿射变换矩阵和所述左脸图像获取矫正后的左脸图像;
    根据所述第一右脸关键点的坐标确定第二仿射变换矩阵,根据所述第二仿射变换矩阵和所述右脸图像获取矫正后的右脸图像;
    将所述矫正后的左脸图像输入至第三目标关键点卷积神经网络模型,输出矫正后的第一左脸关键点的坐标,根据所述矫正后的第一左脸关键点的坐标和所述第一仿射变换矩阵的逆变换,获取第二左脸关键点的坐标;
    将所述矫正后的右脸图像输入至第四目标关键点卷积神经网络模型,输出矫正后的第一右脸关键点的坐标,根据所述矫正后的第一右脸关键点的坐标和所述第二仿射变换矩阵 的逆变换,获取第二右脸关键点的坐标;
    根据所述第二左脸关键点的坐标和所述第二右脸关键点的坐标,获取人脸图像的人脸关键点的坐标。
  21. 根据权利要求17至20任一项所述的装置,其特征在于,所述获取模块还用于:
    基于姿态信息对多个训练样本进行分类,获取s个训练样本集合,所述训练样本包括具有关键点信息的人脸图像;
    从所述s个训练样本集合中至少三个集合中选取多个训练样本,作为训练数据;
    使用所述训练数据对两个关键点卷积神经网络模型进行训练,获取所述第一目标关键点卷积神经网络模型和所述第二目标关键点卷积神经网络模型;
    其中,s为大于等于3的任意整数。
  22. 根据权利要求16至21任一项所述的装置,其特征在于,所述获取模块还用于:
    通过终端的拍照功能或拍摄功能采集待处理图像;
    在所述待处理图像中截取所述人脸图像。
  23. 根据权利要求22所述的装置,其特征在于,所述装置还包括:
    驾驶预警模块,用于根据所述人脸关键点的坐标确定驾驶员行为,根据所述驾驶员行为确定是否发出告警信号。
  24. 根据权利要求22所述的装置,其特征在于,所述装置还包括:
    调整模块,用于根据所述人脸关键点的坐标和美颜效果参数对所述待处理图像进行调整,在所述图像预览界面显示调整后的待处理图像;
    所述美颜效果参数包括虚拟装饰参数、瘦脸参数、眼睛大小调整参数、磨皮去痘参数、皮肤美白参数、牙齿美白参数和腮红参数中至少一项或其组合。
  25. 根据权利要求24所述的装置,其特征在于,所述调整模块还用于根据所述人脸关键点的坐标和所述人脸图像,获取关键点人脸图像,所述关键点人脸图像中标记有所述人脸关键点;在所述图像预览界面显示关键点人脸图像;
    所述调整模块还用于接收用户输入的关键点调整指令,所述关键点调整指令用于指示调整后的人脸关键点;
    根据所述调整后的人脸关键点和美颜效果参数对所述待处理图像进行调整。
  26. 根据权利要求22所述的装置,其特征在于,所述装置还包括:
    人脸识别模块,用于根据所述人脸关键点的坐标进行人脸识别。
  27. 根据权利要求26所述的装置,其特征在于,所述人脸识别模块用于:
    根据所述人脸关键点的坐标对所述人脸图像进行特征提取,获取人脸图像特征;
    将所述人脸图像特征与数据库中的特征模板进行匹配,输出识别结果。
  28. 一种图像处理装置,其特征在于,包括:
    图像获取模块,用于根据不同姿态信息的人脸图像获取具有关键点信息的左脸图像和具有关键点信息的右脸图像,所述不同姿态信息的人脸图像具有对应的关键点信息;
    训练模块,用于使用所述具有关键点信息的左脸图像对关键点卷积神经网络模型进行训练,获取第一目标关键点卷积神经网络模型,所述第一目标关键点卷积神经网络模型用于对输入的左脸图像进行处理,输出左脸关键点的坐标;使用所述具有关键点信息的右脸 图像对关键点卷积神经网络模型进行训练,获取第二目标关键点卷积神经网络模型,所述第二目标关键点卷积神经网络模型用于对输入的右脸图像进行处理,输出右脸关键点的坐标;
    其中,所述姿态信息用于反映人脸的偏转角度。
  29. 根据权利要求28所述的装置,其特征在于,所述不同姿态信息的人脸图像包括第一姿态信息的人脸图像、第二姿态信息的人脸图像和第三姿态信息的人脸图像,所述第一姿态信息用于表示人脸的偏转角度的方向为向左的姿态信息,所述第二姿态信息用于表示人脸的偏转角度的方向为正向的姿态信息,所述第三姿态信息用于表示人脸的偏转角度的方向为右向的姿态信息。
  30. 根据权利要求29所述的装置,其特征在于,所述图像获取模块,还用于基于姿态信息对多个训练样本进行分类,获取s个训练样本集合,所述训练样本包括具有关键点信息的人脸图像;从所述s个训练样本集合中至少三个集合中选取多个训练样本,作为所述不同姿态信息的人脸图像;
    其中,s为大于等于3的任意整数。
  31. 一种图像处理装置,其特征在于,所述图像处理装置包括:处理器、存储器;
    其中,所述存储器用于存储计算机可执行程序代码,所述程序代码包括指令;当所述处理器执行所述指令时,所述指令使所述图像处理装置执行如权利要求1-12任一项所述的图像处理方法,或者,执行如权利要求13-15任一项所述的图像处理方法。
  32. 根据权利要求31所述的装置,其特征在于,所述图像处理装置包括终端设备。
  33. 一种计算机存储介质,其上存储有计算机程序或指令,其特征在于,当所述计算机程序或指令被处理器或计算机执行时,实现如权利要求1至15任一项所述的图像处理方法。
PCT/CN2020/086304 2019-05-21 2020-04-23 图像处理方法和装置 WO2020233333A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP20808836.9A EP3965003A4 (en) 2019-05-21 2020-04-23 IMAGE PROCESSING METHOD AND DEVICE
US17/530,688 US20220076000A1 (en) 2019-05-21 2021-11-19 Image Processing Method And Apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910421550.3 2019-05-21
CN201910421550.3A CN111985265B (zh) 2019-05-21 2019-05-21 图像处理方法和装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/530,688 Continuation US20220076000A1 (en) 2019-05-21 2021-11-19 Image Processing Method And Apparatus

Publications (1)

Publication Number Publication Date
WO2020233333A1 true WO2020233333A1 (zh) 2020-11-26

Family

ID=73435796

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/086304 WO2020233333A1 (zh) 2019-05-21 2020-04-23 图像处理方法和装置

Country Status (4)

Country Link
US (1) US20220076000A1 (zh)
EP (1) EP3965003A4 (zh)
CN (1) CN111985265B (zh)
WO (1) WO2020233333A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112966592A (zh) * 2021-03-03 2021-06-15 北京百度网讯科技有限公司 手部关键点检测方法、装置、设备和介质
CN113361380A (zh) * 2021-06-03 2021-09-07 上海哔哩哔哩科技有限公司 人体关键点检测模型训练方法、检测方法及装置
US20210295483A1 (en) * 2019-02-26 2021-09-23 Tencent Technology (Shenzhen) Company Limited Image fusion method, model training method, and related apparatuses
CN113657357A (zh) * 2021-10-20 2021-11-16 北京市商汤科技开发有限公司 图像处理方法、装置、电子设备及存储介质
CN114049250A (zh) * 2022-01-13 2022-02-15 广州卓腾科技有限公司 一种证件照人脸姿态矫正方法、装置及介质
CN113240780B (zh) * 2021-05-14 2023-08-04 北京百度网讯科技有限公司 生成动画的方法和装置

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111028343B (zh) * 2019-12-16 2020-12-11 腾讯科技(深圳)有限公司 三维人脸模型的生成方法、装置、设备及介质
CN111695602B (zh) * 2020-05-18 2021-06-08 五邑大学 多维度任务人脸美丽预测方法、系统及存储介质
CN111709428B (zh) * 2020-05-29 2023-09-15 北京百度网讯科技有限公司 图像中关键点位置的识别方法、装置、电子设备及介质
US11978207B2 (en) * 2021-06-03 2024-05-07 The Procter & Gamble Company Oral care based digital imaging systems and methods for determining perceived attractiveness of a facial image portion
CN111832435A (zh) * 2020-06-24 2020-10-27 五邑大学 基于迁移与弱监督的美丽预测方法、装置及存储介质
CN112633084B (zh) * 2020-12-07 2024-06-11 深圳云天励飞技术股份有限公司 人脸框确定方法、装置、终端设备及存储介质
CN112651389B (zh) * 2021-01-20 2023-11-14 北京中科虹霸科技有限公司 非正视虹膜图像的矫正模型训练、矫正、识别方法及装置
CN115082978A (zh) * 2021-03-10 2022-09-20 佳能株式会社 面部姿态的检测装置、方法、图像处理系统及存储介质
CN113674230B (zh) * 2021-08-10 2023-12-19 深圳市捷顺科技实业股份有限公司 一种室内逆光人脸关键点的检测方法及装置
CN113674139B (zh) * 2021-08-17 2024-08-20 北京京东尚科信息技术有限公司 人脸图像的处理方法、装置、电子设备及存储介质
CN113705444A (zh) * 2021-08-27 2021-11-26 成都玻尔兹曼智贝科技有限公司 一种面部发育分析评估方法及系统
TWI803333B (zh) * 2022-05-31 2023-05-21 鴻海精密工業股份有限公司 圖像特徵匹配方法、電腦裝置及儲存介質
CN116740777B (zh) * 2022-09-28 2024-07-05 荣耀终端有限公司 人脸质量检测模型的训练方法及其相关设备
CN115546845B (zh) * 2022-11-24 2023-06-06 中国平安财产保险股份有限公司 一种多视角牛脸识别方法、装置、计算机设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104917963A (zh) * 2015-05-25 2015-09-16 深圳市金立通信设备有限公司 一种图像处理方法及终端
US20160371537A1 (en) * 2015-03-26 2016-12-22 Beijing Kuangshi Technology Co., Ltd. Method, system, and computer program product for recognizing face
CN107958439A (zh) * 2017-11-09 2018-04-24 北京小米移动软件有限公司 图像处理方法及装置
CN108062521A (zh) * 2017-12-12 2018-05-22 深圳大学 基于卷积神经网络的人脸检测方法、装置、终端及介质
CN108898087A (zh) * 2018-06-22 2018-11-27 腾讯科技(深圳)有限公司 人脸关键点定位模型的训练方法、装置、设备及存储介质

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100552709B1 (ko) * 2004-05-21 2006-02-20 삼성전자주식회사 눈검출 장치 및 방법
KR100664956B1 (ko) * 2004-11-24 2007-01-04 삼성전자주식회사 눈 검출 방법 및 장치
JP4788319B2 (ja) * 2005-12-05 2011-10-05 日産自動車株式会社 開閉眼判定装置及び方法
CN106295533B (zh) * 2016-08-01 2019-07-02 厦门美图之家科技有限公司 一种自拍图像的优化方法、装置和拍摄终端
CN109344711B (zh) * 2018-08-30 2020-10-30 中国地质大学(武汉) 一种基于睡意程度的服务机器人主动服务方法
CN109800643B (zh) * 2018-12-14 2023-03-31 天津大学 一种活体人脸多角度的身份识别方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160371537A1 (en) * 2015-03-26 2016-12-22 Beijing Kuangshi Technology Co., Ltd. Method, system, and computer program product for recognizing face
CN104917963A (zh) * 2015-05-25 2015-09-16 深圳市金立通信设备有限公司 一种图像处理方法及终端
CN107958439A (zh) * 2017-11-09 2018-04-24 北京小米移动软件有限公司 图像处理方法及装置
CN108062521A (zh) * 2017-12-12 2018-05-22 深圳大学 基于卷积神经网络的人脸检测方法、装置、终端及介质
CN108898087A (zh) * 2018-06-22 2018-11-27 腾讯科技(深圳)有限公司 人脸关键点定位模型的训练方法、装置、设备及存储介质

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210295483A1 (en) * 2019-02-26 2021-09-23 Tencent Technology (Shenzhen) Company Limited Image fusion method, model training method, and related apparatuses
US11776097B2 (en) * 2019-02-26 2023-10-03 Tencent Technology (Shenzhen) Company Limited Image fusion method, model training method, and related apparatuses
CN112966592A (zh) * 2021-03-03 2021-06-15 北京百度网讯科技有限公司 手部关键点检测方法、装置、设备和介质
CN113240780B (zh) * 2021-05-14 2023-08-04 北京百度网讯科技有限公司 生成动画的方法和装置
CN113361380A (zh) * 2021-06-03 2021-09-07 上海哔哩哔哩科技有限公司 人体关键点检测模型训练方法、检测方法及装置
CN113657357A (zh) * 2021-10-20 2021-11-16 北京市商汤科技开发有限公司 图像处理方法、装置、电子设备及存储介质
CN114049250A (zh) * 2022-01-13 2022-02-15 广州卓腾科技有限公司 一种证件照人脸姿态矫正方法、装置及介质
CN114049250B (zh) * 2022-01-13 2022-04-12 广州卓腾科技有限公司 一种证件照人脸姿态矫正方法、装置及介质

Also Published As

Publication number Publication date
CN111985265B (zh) 2024-04-12
EP3965003A4 (en) 2022-07-06
CN111985265A (zh) 2020-11-24
EP3965003A1 (en) 2022-03-09
US20220076000A1 (en) 2022-03-10

Similar Documents

Publication Publication Date Title
WO2020233333A1 (zh) 图像处理方法和装置
WO2020216054A1 (zh) 视线追踪模型训练的方法、视线追踪的方法及装置
US11776097B2 (en) Image fusion method, model training method, and related apparatuses
US11989350B2 (en) Hand key point recognition model training method, hand key point recognition method and device
US20220245961A1 (en) Training method for expression transfer model, expression transfer method and apparatus
US11715224B2 (en) Three-dimensional object reconstruction method and apparatus
WO2019105285A1 (zh) 人脸属性识别方法、电子设备及存储介质
WO2021135601A1 (zh) 辅助拍照方法、装置、终端设备及存储介质
CN108985220B (zh) 一种人脸图像处理方法、装置及存储介质
CN110647865A (zh) 人脸姿态的识别方法、装置、设备及存储介质
WO2020108404A1 (zh) 三维人脸模型的参数配置方法、装置、设备及存储介质
CN110443769B (zh) 图像处理方法、图像处理装置及终端设备
US11468544B2 (en) Eye texture inpainting
WO2019024717A1 (zh) 防伪处理方法及相关产品
WO2019233216A1 (zh) 一种手势动作的识别方法、装置以及设备
CN109272473B (zh) 一种图像处理方法及移动终端
CN111080747A (zh) 一种人脸图像处理方法及电子设备
WO2024055748A1 (zh) 一种头部姿态估计方法、装置、设备以及存储介质
WO2019071562A1 (zh) 一种数据处理方法及终端
CN115171196B (zh) 人脸图像处理方法、相关装置及存储介质
CN111797754B (zh) 图像检测的方法、装置、电子设备及介质
CN113676718A (zh) 图像处理方法、移动终端及可读存储介质
CN113849142B (zh) 图像展示方法、装置、电子设备及计算机可读存储介质
CN109584150B (zh) 一种图像处理方法及终端设备
CN111640204B (zh) 三维对象模型的构建方法、构建装置、电子设备及介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20808836

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020808836

Country of ref document: EP

Effective date: 20211129