WO2024109772A1 - Face posture estimation method and apparatus based on structured light system - Google Patents

Face posture estimation method and apparatus based on structured light system

Info

Publication number
WO2024109772A1
WO2024109772A1 (PCT/CN2023/133069)
Authority
WO
WIPO (PCT)
Prior art keywords
key points
real-time
standard
Prior art date
Application number
PCT/CN2023/133069
Other languages
French (fr)
Chinese (zh)
Inventor
宋展
汪奕鋆
叶于平
赵娟
Original Assignee
中国科学院深圳先进技术研究院 (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院
Publication of WO2024109772A1 publication Critical patent/WO2024109772A1/en


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/46: Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; salient regional features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions

Definitions

  • the present invention belongs to the technical field of computer vision, and in particular relates to a method and device for estimating human face posture based on a structured light system.
  • Face pose estimation is an important subfield of computer vision. Its main purpose is to obtain the orientation of the face, which can be represented by a rotation matrix, a rotation vector, a quaternion or Euler angles, and these representations can be converted into one another. Generally speaking, Euler angles are more intuitive and more commonly used. The Euler angles are pitch, yaw and roll, which generally correspond to raising the head, shaking the head and tilting the head.
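The conversion from a rotation matrix to the Euler angles mentioned above can be sketched as follows. This is an illustrative decomposition under an assumed ZYX (roll·yaw·pitch) convention, not code from the patent; axis conventions vary between systems.

```python
import math

def matrix_to_euler_zyx(R):
    """Decompose a 3x3 rotation matrix R = Rz(roll) @ Ry(yaw) @ Rx(pitch)
    into (pitch, yaw, roll) in degrees."""
    sy = math.sqrt(R[0][0] ** 2 + R[1][0] ** 2)
    if sy > 1e-6:
        pitch = math.atan2(R[2][1], R[2][2])  # rotation about x (raising the head)
        yaw = math.atan2(-R[2][0], sy)        # rotation about y (shaking the head)
        roll = math.atan2(R[1][0], R[0][0])   # rotation about z (tilting the head)
    else:  # gimbal lock: yaw near +/-90 degrees
        pitch = math.atan2(-R[1][2], R[1][1])
        yaw = math.atan2(-R[2][0], sy)
        roll = 0.0
    return tuple(math.degrees(a) for a in (pitch, yaw, roll))
```

For the identity matrix all three angles are zero, and a pure rotation about the y axis yields only a yaw angle.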
  • face pose estimation has been widely used in many fields of production and life, such as biometric recognition and human-computer interaction.
  • In face recognition, because people's facial postures are complex and variable during actual recognition, face recognition technology combined with face pose estimation can perform recognition after pose correction, greatly improving the accuracy of face recognition.
  • face pose estimation can obtain the driver's facial posture, that is, the face orientation, in real time. When the orientation exceeds the threshold, it can be considered as fatigue driving or encountering special circumstances, and an early warning can be issued in time.
  • face pose estimation can provide corresponding information for face reconstruction, which can widely improve people's experience in games, social networking, film and television, and other fields.
  • Face pose estimation is challenging due to the diversity of facial appearance, as well as changes in face angles, facial expressions, facial texture differences, uneven lighting, and face occlusions.
  • the purpose of the embodiments of this specification is to provide a method and device for estimating facial posture based on a structured light system.
  • the present application provides a method for estimating face posture based on a structured light system, the method comprising:
  • Obtain a frontal zero-pose face image of the object to be tested; use the three-dimensional point cloud of the frontal zero-pose face image as a standard model; select 2D key points of the frontal zero-pose face image to obtain standard 2D key points, and obtain standard 3D key points corresponding to the standard 2D key points according to the structured light system;
  • Collect the two-dimensional face image of the object to be tested in real time; select 2D key points of the two-dimensional face image to obtain real-time 2D key points, and obtain the real-time 3D key points corresponding to the real-time 2D key points according to the structured light system;
  • selecting 2D key points from a frontal zero-pose face image or a two-dimensional face image includes:
  • determining a 3D key point cloud based on standard 3D key points and real-time 3D key points includes:
  • determining a 3D key point cloud based on standard 3D key points, real-time 3D key points and initial poses includes:
  • the complete 3D point cloud corresponding to the 2D face image of the object to be tested is used as the search object.
  • the real-time 3D key point is used as the center, and a search is performed within a sphere with a preset radius, and the searched 3D face points are added to a candidate set, which includes the real-time key point and the searched 3D face points;
  • For each point in the candidate set, perform point cloud registration transformation based on the initial pose, and calculate the distance from each transformed point to the nearest point in the complete 3D point cloud of the standard model. When the distance is less than the threshold, add the corresponding point in the candidate set to the first consistent set, and add the corresponding nearest point in the complete 3D point cloud of the standard model to the second consistent set;
  • the first consistent set and the second consistent set constitute a 3D keypoint cloud.
  • determining the precise pose based on the 3D key point cloud includes:
  • the first consistent set and the second consistent set are precisely aligned to determine the precise pose.
  • the Trimmed ICP algorithm is used to perform precise alignment operations on the first consistent set and the second consistent set.
  • the structured light system includes an infrared camera, an infrared projector and a terminal device, and the infrared camera and the infrared projector are respectively connected to the terminal device.
  • the present application provides a face posture estimation device based on a structured light system, the device comprising:
  • the first selection module is used to obtain a frontal zero-pose face image of the object to be tested, use the three-dimensional point cloud of the frontal zero-pose face image as a standard model, select 2D key points of the frontal zero-pose face image to obtain standard 2D key points, and obtain standard 3D key points corresponding to the standard 2D key points according to the structured light system;
  • the second selection module is used to collect the two-dimensional face image of the object to be tested in real time, select 2D key points of the two-dimensional face image, obtain real-time 2D key points, and obtain real-time 3D key points corresponding to the real-time 2D key points according to the structured light system;
  • a first determination module is used to determine a 3D key point cloud according to standard 3D key points and real-time 3D key points;
  • the second determination module is used to determine the precise pose based on the 3D key point cloud.
  • the present application provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the program, the method for estimating facial posture based on a structured light system as in the first aspect is implemented.
  • the present application provides a readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the method for estimating facial posture based on a structured light system as in the first aspect.
  • the solution uses a structured light system to estimate facial posture. This method is fast, real-time, accurate and stable.
  • FIG1 is a schematic diagram of the structure of a structured light system
  • FIG2 is a schematic diagram of a coded pattern projected by a projector in a structured light system
  • FIG3 is a schematic diagram of a mathematical model of a structured light system
  • FIG4 is a specific schematic diagram of using a structured light system
  • FIG5 is a schematic diagram of a flow chart of a method for estimating facial posture based on a structured light system provided in the present application
  • FIG6 is a schematic diagram of the 68 facial feature point model detected by Dlib
  • FIG7 is a schematic diagram of the nose as a 2D key point
  • FIG8 is a schematic diagram of the structure of a face posture estimation device based on a structured light system provided by the present application.
  • FIG. 9 is a schematic diagram of the structure of an electronic device provided in this application.
  • Existing face pose estimation generally includes model-based methods, feature regression-based methods, classification-based methods, methods based on the geometric relationship of facial key points, and also includes emerging methods such as subspace learning-based methods.
  • the model-based method extracts 2D feature points from the face area in the two-dimensional image and establishes a corresponding relationship with the 3D feature points of the three-dimensional face model to estimate the face posture.
  • the face posture obtained by this method is a continuous value with high accuracy, so it has become a commonly used method.
  • However, the three-dimensional average face model commonly used in this type of method usually differs from the actual face in the two-dimensional image, so pose estimation accuracy is low and robustness poor for large-angle deviations and exaggerated expressions of the face.
  • The method based on feature regression obtains the mapping relationship from image space to posture space through machine learning, that is, by building a mathematical model (mainly a neural network) trained on a large number of face images with known postures, a mapping relationship is established to determine the face posture of a sample.
  • this correspondence requires a large number of data sets to verify, and interpolation is often required in the process of image processing, which requires a lot of calculations. It is also greatly affected by the face detection and positioning results and is not robust enough.
  • the classification-based method divides the facial posture into different categories within a certain range, and classifies the samples to be determined.
  • This type of method quantifies the head posture space into several discrete points, and prepares several templates for each posture, and then compares the samples to be determined with the templates one by one.
  • the head posture corresponding to the template with the highest matching score is the classification result.
  • the specific methods can be divided into shape template-based, detection-based or local constraint model-based methods.
  • the physical quantities that need to be compared with the templates are image texture, posture detector, and sub-organ detector set arranged in a certain topology.
  • The results obtained by this type of method are discrete values; it has high time complexity, low efficiency and large errors, and its real-time performance is difficult to guarantee.
  • The method based on the geometric relationship of facial key points first determines the locations of the facial key points, and then estimates the face posture from the relative positions of the key points under certain geometric constraints.
  • This type of method is relatively simple and has low time complexity, but it is very sensitive to occlusion and missing key points and has poor stability.
  • The method based on subspace learning regards the posture space of the face as a natural three-dimensional space, which can be viewed as a three-dimensional posture manifold embedded in the high-dimensional image space. This type of method is relatively new, but currently has high time complexity, low accuracy and poor practicality.
  • the present application proposes a face pose estimation method based on a structured light system, which can perform high-precision and robust estimation of face pose.
  • Structured light technology, as a mature active 3D information acquisition technology, has the advantages of non-contact operation, high precision, good real-time performance, low cost, a large field of view and strong anti-interference ability.
  • 3D reconstruction technology based on structured light system is an active 3D reconstruction method.
  • the structured light system replaces a camera in the traditional binocular vision method with a projector.
  • This application uses an infrared projector, an infrared camera (hereinafter simply the camera) and a terminal device (such as the computer in Figure 1) to build a dynamic structured light system.
  • The structured light system uses the template projected by the coded projector to solve the corresponding-point matching problem that is difficult to solve in binocular vision, and then uses the calibrated camera and projector to obtain the object's three-dimensional information via the triangulation principle.
  • the structured light system of this application is based on the principle of time coding, specifically a stripe binary coding method based on Gray code plus line shift coding.
  • the basic idea is to first make a series of coded patterns according to the coding principle, then use a projector to project the sequence pattern onto the target surface, and then decode the target image with the coded pattern.
  • the point cloud scanned and calculated using this method not only has a high spatial resolution, but also has a high reconstruction accuracy.
  • the Gray code plus line shift coding method adopted in this application achieves the coding purpose by continuously projecting multiple coding patterns to the target to be tested.
  • the general method for finding the coded value of each pixel is to first determine the edge position of the stripe in the image, then determine the stripe condition based on the pixel values on both sides of the stripe edge, and then determine the code value corresponding to each pixel based on the stripe condition.
  • the image stripe edge detection can use, for example, the zero crossing detection method, which can achieve sub-pixel detection accuracy.
  • If grayΔ < 0, the pixel value of the stripe on the left of edge L2 is smaller than that of the stripe on its right, so the left side of L2 corresponds to the black stripe and the right side to the white stripe. If grayΔ > 0, the stripes are reversed, and the stripe condition is determined accordingly.
  • P is the absolute decoding value
  • G is the Gray code decoding value
  • S is the line shift decoding value
  • n is the number of Gray code coding patterns
  • m is the number of line shifts.
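As an illustration of the decoding step, a Gray-code value can be converted to an ordinary binary ordinal with a simple XOR fold. The combination formula below (P = G·m + S) is an assumption for illustration: the patent defines P, G, S, n and m, but its exact combination formula is not reproduced here.

```python
def gray_to_binary(gray):
    """Convert a Gray-code integer to its binary (ordinal) value."""
    binary = gray
    gray >>= 1
    while gray:
        binary ^= gray
        gray >>= 1
    return binary

def decode_pixel(gray_bits, shift_index, m):
    """Hypothetical absolute decoding value: the Gray code gives the
    coarse stripe index G, refined by the line-shift index S among m
    sub-positions, i.e. P = G * m + S."""
    g = int("".join(str(b) for b in gray_bits), 2)
    return gray_to_binary(g) * m + shift_index
```

For example, the Gray code 0b110 decodes to the stripe index 4.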
  • the parameters with superscript c correspond to the parameters of the camera
  • the parameters with superscript p correspond to the parameters of the projector.
  • m c/p is the 2D image coordinate in the digital image coordinate system
  • Mc /p is the coordinate in the camera/projector coordinate system
  • M w is the coordinate in the world coordinate system
  • R c/p and T c/p are the rotation and translation matrices (external parameters) of the camera/projector coordinate system relative to the world coordinate system
  • s is the scale factor
  • fu, fv, u0, v0 and γ are the parameters of the intrinsic parameter matrix (internal parameters).
  • R and T are the rotation and translation matrices between the projector and camera coordinate systems. After calibrating the external parameters R c , R p , T c and T p , R and T can be obtained from the coordinate transformation relationship.
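The coordinate transformation relation mentioned above can be sketched numerically. Assuming the common convention M_cam = Rc·M_world + Tc (and likewise for the projector), the camera-to-projector pose follows from composing the two extrinsic transforms; this is a generic stereo identity, not text from the patent.

```python
import numpy as np

def stereo_extrinsics(Rc, Tc, Rp, Tp):
    """Relative pose (R, T) mapping camera coordinates to projector
    coordinates, given world-to-camera (Rc, Tc) and world-to-projector
    (Rp, Tp) extrinsics: M_proj = R @ M_cam + T."""
    R = Rp @ Rc.T
    T = Tp - R @ Tc
    return R, T
```

A quick consistency check: transforming any world point first into camera coordinates and then by (R, T) must land on its projector coordinates.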
  • The stripe edge point extracted on the camera image plane (a projection of M) and the corresponding point on the projector pattern plane are matching projections of the same scene point M.
  • The two-dimensional position is encoded along the x dimension (horizontally), and the correspondence between uc and up is determined by matching the encoded values.
  • O c O p is the epipolar line.
  • Kc and Kp are the intrinsic parameter matrices of the camera and projector. According to Formula 6, the corresponding relationship between vc and vp in the y dimension (vertical) can be determined, from which the complete coordinates of the corresponding matching points can be obtained.
  • Formula (3) can then be used to obtain the complete three-dimensional point Mc(xc, yc, zc), thereby achieving a one-to-one 2D-3D correspondence from two-dimensional pixel points to three-dimensional space points.
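With matched points in hand, one standard way to recover the 3D point is linear (DLT) triangulation from the two 3x4 projection matrices; this is a common textbook technique assumed here for illustration, since the patent only invokes the triangulation principle.

```python
import numpy as np

def triangulate(P_cam, P_proj, uv_cam, uv_proj):
    """Linear (DLT) triangulation of one scene point from its matched
    pixel in the camera image and point in the projector pattern.
    P_cam and P_proj are 3x4 projection matrices K[R|T]."""
    u, v = uv_cam
    up, vp = uv_proj
    A = np.stack([
        u * P_cam[2] - P_cam[0],
        v * P_cam[2] - P_cam[1],
        up * P_proj[2] - P_proj[0],
        vp * P_proj[2] - P_proj[1],
    ])
    _, _, Vt = np.linalg.svd(A)   # null vector of A is the homogeneous point
    X = Vt[-1]
    return X[:3] / X[3]
```

With two normalized cameras separated by a unit baseline, a point on the optical axis at depth 2 is recovered exactly.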
  • FIG. 4 is a specific schematic diagram of using a structured light system.
  • a DLP4500 infrared projector is used to continuously project 18 coded patterns onto a human face in real time, and an infrared camera is used to capture the facial data of the projected patterns in real time.
  • a high-precision real-time point cloud data sequence can be obtained, realizing a one-to-one correspondence between 2D-3D facial data (i.e., 2D facial data and 3D point cloud data).
  • the structured light system used in this application is based on the principle of time coding, specifically based on Gray code plus line shift coding method.
  • the structured light system can obtain high-precision, high-frame rate three-dimensional point cloud data that corresponds one-to-one to two-dimensional facial pixels for facial posture estimation.
  • FIG. 5 which shows a flow chart of a method for estimating a facial posture based on a structured light system applicable to an embodiment of the present application.
  • a method for estimating a face posture based on a structured light system may include:
  • the 3D point cloud is used as the standard model to select 2D key points of the frontal zero-pose face image to obtain standard 2D key points, and the standard 3D key points corresponding to the standard 2D key points are obtained according to the structured light system.
  • the object to be measured refers to an object whose facial posture is to be estimated, which may be a driver in a driving state, a player playing virtual reality, a person performing face recognition, etc.
  • the frontal zero-pose face image refers to the face pose image obtained when the face of the subject to be tested faces the camera.
  • Using the three-dimensional point cloud of the frontal zero-pose facial image of the object to be tested as a standard model can overcome the diversity of facial appearance and differences in facial texture to a certain extent compared to directly using the 3D average face as the standard model.
  • the two-dimensional face image refers to an image captured by a camera in real time.
  • the infrared structured light system of this application can obtain one-to-one corresponding 2D face data and 3D point cloud data.
  • The calculation over the entire 3D point cloud (the complete point cloud corresponding to the 2D face photo is called the 3D point cloud) is expensive and time-consuming, and is affected by factors such as changes in facial expression, so its stability and robustness are poor. Therefore, this application selects stable 2D key points and 3D key points for face pose estimation.
  • 2D key point selection is performed on a frontal zero-pose face image or a two-dimensional face image, including:
  • For facial feature point detection, when performing key point detection and selection on a frontal zero-pose face image or a two-dimensional face image, this application does not limit the specific algorithm.
  • Commonly used optimization-based methods (such as ASM and AAM), regression-based methods (such as cascaded pose regression and SDM) and deep-learning-based methods can all be used in the present invention to achieve facial feature point detection.
  • As an example, Dlib, based on the cascade regression method, achieves 68-point facial feature detection.
  • Dlib uses the classic histogram of oriented gradients (HOG) features combined with linear classifiers to achieve face detection, and uses ERT cascade regression, that is, a regression-tree method based on gradient boosting, to detect the 68 feature points of the face shown in Figure 6, including the eyebrows, eyes, nose, mouth and facial contour.
  • the coordinates of facial feature points can be obtained in the 2D image obtained by the system.
  • the eyebrows, eyes, mouth and chin have large movements, so the feature points of these parts are not selected.
  • the nose is relatively stable and is selected as the key point for subsequent estimation. Therefore, 9 points of the nose are selected as 2D key points, as shown in Figure 7. It can be understood that different stable areas can also be selected as key points according to actual conditions, such as using key point clouds of the nose, cheeks and corners of the eyes at the same time.
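For reference, in the standard Dlib 68-point annotation the nine nose points occupy indices 27-35 (0-based), so the selection can be sketched as below. The index range follows Dlib's public annotation scheme, not a numbering given in the patent.

```python
# Dlib 68-point scheme: indices 27-30 = nose bridge, 31-35 = lower nose line.
NOSE_IDX = list(range(27, 36))

def select_nose_keypoints(landmarks):
    """Pick the 9 nose points, the stable region used for pose
    estimation, from a full 68-point list of (x, y) tuples."""
    return [landmarks[i] for i in NOSE_IDX]
```

Other stable regions (cheeks, eye corners) could be added by extending `NOSE_IDX` with their indices.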
  • the 3D key point coordinates corresponding to the 2D key points can be obtained, specifically including standard 3D key points corresponding to standard 2D key points and real-time 3D key points corresponding to real-time 2D key points.
  • S530 determining a 3D key point cloud according to standard 3D key points and real-time 3D key points, may include:
  • the initial poses R 0 and t 0 between the object to be measured and the standard model can be calculated, and the initial poses can be used as initial values for subsequent precise registration and can also be used to obtain 3D key point clouds.
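The initial pose (R0, t0) between the matched standard and real-time 3D key points can be obtained in closed form with the SVD-based Kabsch method, a common choice sketched here; the patent does not name the specific solver.

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares rigid transform with dst ~ R0 @ src + t0, solved
    by the Kabsch/SVD method; src and dst are Nx3 matched key points."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)                # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflection
    R0 = Vt.T @ D @ U.T
    t0 = cd - R0 @ cs
    return R0, t0
```

Given exact correspondences, the method recovers the generating rotation and translation.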
  • On this basis, this application expands the selected 3D key points into a 3D key point cloud.
  • determining a 3D key point cloud according to standard 3D key points, real-time 3D key points and initial poses may include:
  • the complete 3D point cloud corresponding to the 2D face image of the object to be tested is used as the search object.
  • the search is performed within a sphere with a preset radius, with the acquired real-time 3D key points as the center.
  • The searched 3D face points are added to the candidate set, which includes the real-time key points and the searched 3D face points;
  • For each point in the candidate set perform point cloud registration transformation based on the initial pose, and calculate the distance from each point to the nearest point in the complete 3D point cloud of the standard model after the transformation. When the distance is less than the threshold, add the corresponding point in the candidate set to the first consistent set, and add the nearest point in the complete 3D point cloud of the corresponding standard model to the second consistent set;
  • the first consistent set and the second consistent set constitute a 3D keypoint cloud.
  • the candidate set C is constructed by taking the complete 3D point cloud corresponding to the 2D face photo (i.e., the 2D face image of the object to be measured) as the search object. With the acquired real-time 3D key points as the center, search within a sphere with a radius of r (i.e., the preset radius, which can be set according to actual needs), and add the searched face 3D points to the candidate set C. According to the above operation, the candidate set C of the face to be estimated can be obtained, and the candidate set C includes 3D key points and the newly searched face 3D points.
  • Consistent set S (i.e. 3D key point cloud) construction: Based on the candidate set C, the consistent set S is constructed based on the idea of RANSAC algorithm. Specifically, for each point p in the candidate set C, the point cloud registration transformation is performed based on the initial pose R 0 and T 0 , and after the transformation, the distance from each point to the nearest point in the complete 3D point cloud of the standard face model is calculated, and the point p with a distance less than the threshold ⁇ (which can be set according to actual needs) is added to the consistent set S 0 (i.e. the first consistent set), and the nearest point q corresponding to point p is added to the consistent set S 1 (i.e. the second consistent set). Consistent set S 0 and consistent set S 1 are the obtained 3D key point clouds.
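The two-stage construction described above can be sketched as follows. This is a brute-force illustration (a KD-tree would normally replace the linear nearest-neighbour scans), with `radius` and `tau` as stand-ins for r and the threshold ε.

```python
import numpy as np

def build_consistent_sets(full_cloud, keypoints, model_cloud, R0, t0,
                          radius=0.01, tau=0.005):
    """1) Radius search around each real-time 3D key point builds the
    candidate set C; 2) each candidate is moved by the initial pose
    (R0, t0) and kept only if its nearest neighbour in the standard
    model cloud lies closer than tau, yielding S0 (live) and S1 (model)."""
    full_cloud = np.asarray(full_cloud, float)
    model_cloud = np.asarray(model_cloud, float)
    keypoints = np.asarray(keypoints, float)
    C = [p for p in full_cloud
         if np.linalg.norm(keypoints - p, axis=1).min() <= radius]
    S0, S1 = [], []
    for p in C:
        d = np.linalg.norm(model_cloud - (R0 @ p + t0), axis=1)
        j = int(d.argmin())
        if d[j] < tau:               # RANSAC-style inlier test
            S0.append(p)
            S1.append(model_cloud[j])
    return np.array(S0), np.array(S1)
```

With the identity pose and the model equal to the live cloud, only the points inside the search radius survive into both consistent sets.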
  • the first consistent set and the second consistent set are precisely aligned to determine the precise pose.
  • the Trimmed ICP algorithm is used to perform precise alignment on the first consistent set and the second consistent set.
  • This application adopts the Trimmed ICP algorithm for precise alignment: using the LTS (least trimmed squares) method, the residuals of each group of matching points are sorted in ascending order, only a leading proportion of the corresponding points is kept to fit the error function, and R and T are solved by iteratively minimizing the error function.
  • Npo = ξ · Np (10)
  • where ξ is a preset non-negative number (the trimming ratio), Np is the number of matched point pairs, and Npo is the number of pairs retained after trimming
  • e is the trimmed MSE
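A minimal Trimmed ICP iteration consistent with the description above: nearest-neighbour matching, ascending sort of the residuals (the LTS step), retention of the best Npo = ξ·Np pairs, and a closed-form pose update. Brute-force matching is used for brevity, and the parameter names are illustrative rather than taken from the patent.

```python
import numpy as np

def trimmed_icp(S0, S1, R, t, xi=0.8, iters=20):
    """Refine (R, t) so that R @ S0 + t aligns with S1, keeping only
    the xi fraction of best-matching pairs at each iteration."""
    S0, S1 = np.asarray(S0, float), np.asarray(S1, float)
    n_keep = max(3, int(xi * len(S0)))
    for _ in range(iters):
        moved = S0 @ R.T + t
        nn = ((moved[:, None] - S1[None]) ** 2).sum(-1).argmin(axis=1)
        res = ((moved - S1[nn]) ** 2).sum(-1)
        keep = np.argsort(res)[:n_keep]          # LTS trimming
        src, dst = S0[keep], S1[nn[keep]]
        cs, cd = src.mean(axis=0), dst.mean(axis=0)
        U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T                       # closed-form pose update
        t = cd - R @ cs
    return R, t
```

Starting from the identity, a small rigid motion of a point set is recovered within a few iterations.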
  • the 3D key point clouds S 0 and S 1 are registered to obtain the optimal rotation matrix R, which is the final face pose matrix. It can be understood that if the result needs to be better visualized, R can be matrix decomposed to obtain the Euler angle representation of the face pose, namely the pitch angle (pitch), yaw angle (yaw) and roll angle (roll).
  • The face posture estimation method based on the structured light system provided in this embodiment uses the structured light system to estimate face posture.
  • the method has fast speed, strong real-time performance, high accuracy and strong stability.
  • The face posture estimation method based on the structured light system provided in the embodiment of the present application does not impose many restrictions on the acquisition conditions.
  • the present invention selects a stable area for face posture estimation based on the high-precision real-time reconstruction data of the face. Compared with most existing algorithms, it has higher accuracy and better robustness for problems such as changes in facial expressions and large angle changes.
  • FIG. 8 shows a schematic diagram of the structure of a face pose estimation device based on a structured light system according to an embodiment of the present application.
  • a face posture estimation device 800 based on a structured light system may include:
  • the first selection module 810 is used to obtain a frontal zero-pose face image of the object to be tested, use the three-dimensional point cloud of the frontal zero-pose face image as a standard model, select 2D key points of the frontal zero-pose face image to obtain standard 2D key points, and obtain standard 3D key points corresponding to the standard 2D key points according to the structured light system;
  • the second selection module 820 is used to collect a two-dimensional face image of the object to be tested in real time, select 2D key points of the two-dimensional face image to obtain real-time 2D key points, and obtain real-time 3D key points corresponding to the real-time 2D key points according to the structured light system;
  • a first determination module 830 is used to determine a 3D key point cloud according to standard 3D key points and real-time 3D key points;
  • the second determination module 840 is used to determine the precise pose according to the 3D key point cloud.
  • the first selection module 810 or the second selection module 820 is further used for:
  • the first determining module 830 is further configured to:
  • the first determining module 830 is further configured to:
  • the complete 3D point cloud corresponding to the 2D face image of the object to be tested is used as the search object, and the search is performed within a sphere of a preset radius with the acquired real-time 3D key points as the center, and the searched face 3D points are added to the candidate set, which includes the real-time key points and the searched face 3D points;
  • For each point in the candidate set, perform point cloud registration transformation based on the initial pose, and calculate the distance from each transformed point to the nearest point in the complete 3D point cloud of the standard model. When the distance is less than the threshold, add the corresponding point in the candidate set to the first consistent set, and add the corresponding nearest point in the complete 3D point cloud of the standard model to the second consistent set;
  • the first consistent set and the second consistent set constitute a 3D keypoint cloud.
  • the second determining module 840 is further configured to:
  • the first consistent set and the second consistent set are precisely aligned to determine the precise pose.
  • the Trimmed ICP algorithm is used to perform precise alignment operations on the first consistent set and the second consistent set.
  • the structured light system includes an infrared camera, an infrared projector and a terminal device, and the infrared camera and the infrared projector are respectively connected to the terminal device.
  • This embodiment provides a face posture estimation device based on a structured light system, which can execute the embodiment of the above method. Its implementation principle and technical effect are similar and will not be repeated here.
  • FIG. 9 is a schematic diagram of the structure of an electronic device provided by an embodiment of the present invention, showing an electronic device 900 suitable for implementing the embodiment of the present application.
  • the electronic device 900 includes a central processing unit (CPU) 901, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage part 908 into a random access memory (RAM) 903.
  • ROM read-only memory
  • RAM random access memory
  • The RAM 903 also stores various programs and data required for the operation of the device 900.
  • the CPU 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904.
  • An input/output (I/O) interface 905 is also connected to the bus 904.
  • the I/O interface 905 includes an input section 906 including a keyboard, a mouse, etc.; an output section 907 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and a speaker, etc.;
  • The I/O interface 905 also includes a storage section 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, and the like.
  • the communication section 909 performs communication processing via a network such as the Internet.
  • A drive 910 is also connected to the I/O interface 905 as needed.
  • a removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, and the like, is installed on the drive 910 as needed, so that a computer program read therefrom is installed into the storage section 908 as needed.
  • an embodiment of the present disclosure includes a computer program product, which includes a computer program tangibly contained on a machine-readable medium, and the computer program includes a program code for executing the above-mentioned method for estimating a face pose based on a structured light system.
  • the computer program can be downloaded and installed from a network through the communication part 909, and/or installed from a removable medium 911.
  • each box in the flow chart or block diagram can represent a module, a program segment or a part of the code, and the aforementioned module, program segment or a part of the code contains one or more executable instructions for realizing the specified logical function.
  • the functions marked in the box can also occur in a different order from the order marked in the accompanying drawings. For example, two boxes represented in succession can actually be executed substantially in parallel, and they can sometimes be executed in the opposite order, depending on the functions involved.
  • each box in the block diagram and/or flow chart, and the combination of the boxes in the block diagram and/or flow chart can be implemented with a dedicated hardware-based system that performs the specified function or operation, or can be implemented with a combination of dedicated hardware and computer instructions.
  • the units or modules involved in the embodiments described in the present application may be implemented by software or hardware.
  • the units or modules described may also be arranged in a processor.
  • the names of these units or modules do not constitute limitations on the units or modules themselves in certain circumstances.
  • a typical implementation device is a computer.
  • the computer may be, for example, a personal computer, a laptop computer, a mobile phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a gaming console, a tablet computer, a wearable device, or a combination of any of these devices.
  • the present application further provides a storage medium, which may be the storage medium included in the aforementioned device in the above embodiment; or it may be a storage medium that exists independently and is not assembled into the device.
  • the storage medium stores one or more programs, and the aforementioned programs are used by one or more processors to execute the face pose estimation method based on the structured light system described in the present application.
  • Storage media include permanent and non-permanent, removable and non-removable media implemented by any method or technology for storing information.
  • Information can be computer-readable instructions, data structures, program modules or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by a computing device.
  • computer-readable media does not include temporary computer-readable media (transitory media), such as modulated data signals and carrier waves.


Abstract

Provided in the present application are a face posture estimation method and apparatus based on a structured light system. The method comprises: acquiring a head-on zero-posture face image of a subject under test, using a three-dimensional point cloud of the head-on zero-posture face image as a standard model to perform 2D key point selection on the head-on zero-posture face image to obtain standard 2D key points, and acquiring standard 3D key points corresponding to the standard 2D key points according to the structured light system; collecting a two-dimensional face image of the subject under test in real time, performing 2D key point selection on the two-dimensional face image to obtain real-time 2D key points, and acquiring real-time 3D key points corresponding to the real-time 2D key points according to the structured light system; determining a 3D key point cloud according to the standard 3D key points and the real-time 3D key points; and determining a precise pose according to the 3D key point cloud. The solution has high speed, high real-time performance, high precision and high stability.

Description

A Method and Device for Estimating Face Posture Based on a Structured Light System

Technical Field

The present invention belongs to the technical field of computer vision, and in particular relates to a method and device for estimating human face posture based on a structured light system.

Background Art

With the continuous progress and development of artificial intelligence technology, artificial intelligence technology represented by computer vision has been widely applied in many fields of production and life, making people's lives increasingly automated, intelligent and convenient. Face pose estimation is an important subfield of computer vision whose main purpose is to obtain the orientation of the face. The orientation can be represented by a rotation matrix, a rotation vector, a quaternion or Euler angles, and these representations can be converted into one another. In general, Euler angles are the most intuitive and the most commonly used. The Euler angles are pitch, yaw and roll, which colloquially correspond to raising, shaking and turning the head.
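As an illustration of the interconversion mentioned above, the sketch below extracts the three Euler angles from a rotation matrix using the common Z-Y-X (roll-yaw-pitch) convention; the convention and the function name are assumptions for illustration, not fixed by this application:

```python
import math

def rotation_to_euler(Rm):
    """Extract (pitch, yaw, roll) in degrees from a 3x3 rotation matrix,
    assuming the Z-Y-X convention R = Rz(roll) @ Ry(yaw) @ Rx(pitch)."""
    yaw = math.asin(-Rm[2][0])
    pitch = math.atan2(Rm[2][1], Rm[2][2])
    roll = math.atan2(Rm[1][0], Rm[0][0])
    return tuple(math.degrees(a) for a in (pitch, yaw, roll))

# A 30-degree pure head turn (rotation about the vertical axis):
c, s = math.cos(math.radians(30)), math.sin(math.radians(30))
Ry = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
print(rotation_to_euler(Ry))  # approximately (0.0, 30.0, 0.0)
```

Other conventions order the three elementary rotations differently, so the same rotation matrix can yield different angle triples; what matters is that the conversion is invertible within a fixed convention.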
As a popular and practical field, face pose estimation has been widely used in many areas of production and life such as biometric recognition and human-computer interaction, for example face recognition, driving state detection, human assistance, virtual reality, and various entertainment devices. In face recognition, since facial poses vary greatly in practice, face recognition combined with face pose estimation can recognize the face after correction, greatly improving recognition accuracy. In driving state detection, face pose estimation can obtain the driver's facial orientation in real time; when the orientation exceeds a threshold, fatigue driving or a special situation can be assumed and a warning issued in time. In virtual reality, face pose estimation can provide the information needed for face reconstruction, improving user experience in games, social networking, film and television, and other fields.

Face pose estimation is challenging because of the diversity of facial appearance, as well as changes in face angle, facial expressions, differences in facial texture, uneven illumination and facial occlusion.

Existing methods estimate facial pose with low accuracy.
Summary of the Invention

The purpose of the embodiments of this specification is to provide a face pose estimation method and device based on a structured light system.

To solve the above technical problem, the embodiments of the present application are implemented as follows:

In a first aspect, the present application provides a face pose estimation method based on a structured light system, the method comprising:

acquiring a frontal zero-pose face image of the object to be tested, taking the three-dimensional point cloud of the frontal zero-pose face image as a standard model, selecting 2D key points from the frontal zero-pose face image to obtain standard 2D key points, and acquiring the standard 3D key points corresponding to the standard 2D key points through the structured light system;

acquiring a two-dimensional face image of the object to be tested in real time, selecting 2D key points from the two-dimensional face image to obtain real-time 2D key points, and acquiring the real-time 3D key points corresponding to the real-time 2D key points through the structured light system;

determining a 3D key point cloud according to the standard 3D key points and the real-time 3D key points;

determining the precise pose according to the 3D key point cloud.
In one embodiment, selecting 2D key points from the frontal zero-pose face image or the two-dimensional face image includes:

performing feature point detection on the frontal zero-pose face image or the two-dimensional face image to obtain face feature points;

selecting the 2D key points from the face feature points.
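As a concrete illustration of this key-point selection step, the sketch below assumes landmarks in the 68-point layout produced by detectors such as Dlib (see Fig. 6), where indices 27-35 cover the nose region (see Fig. 7); the function name and the index range are illustrative assumptions, not part of the claimed method:

```python
# Sketch: pick nose-region 2D key points out of a 68-point landmark set.
# The Dlib-style ordering (indices 27..35 = nose) is an assumption.

NOSE_INDICES = range(27, 36)

def select_nose_keypoints(landmarks):
    """landmarks: list of 68 (x, y) tuples; returns the nose key points."""
    if len(landmarks) != 68:
        raise ValueError("expected a 68-point landmark set")
    return [landmarks[i] for i in NOSE_INDICES]

# Usage with dummy landmarks standing in for a detector's output:
dummy = [(float(i), float(i)) for i in range(68)]
nose = select_nose_keypoints(dummy)
print(len(nose))  # 9 nose key points
```

The nose region is a common choice for pose key points because it is rigid and rarely occluded, unlike the mouth or eyes.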
In one embodiment, determining the 3D key point cloud according to the standard 3D key points and the real-time 3D key points includes:

determining an initial pose between the object to be tested and the standard model according to the standard 3D key points and the real-time 3D key points;

determining the 3D key point cloud according to the standard 3D key points, the real-time 3D key points and the initial pose.
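One standard way to estimate such an initial pose from two sets of corresponding 3D key points is rigid registration by SVD (the Kabsch algorithm); the sketch below is an illustrative assumption about how this step could be implemented, not the patent's prescribed procedure:

```python
import numpy as np

def initial_pose(real_pts, std_pts):
    """Estimate the rigid transform (R, t) mapping the real-time 3D key
    points onto the standard 3D key points via the Kabsch/SVD algorithm.
    real_pts, std_pts: (N, 3) arrays with row-wise correspondence."""
    p = np.asarray(real_pts, float)
    q = np.asarray(std_pts, float)
    cp, cq = p.mean(axis=0), q.mean(axis=0)
    H = (p - cp).T @ (q - cq)                  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t

# Usage: recover a known rotation about z plus a translation.
rng = np.random.default_rng(0)
pts = rng.normal(size=(6, 3))
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
moved = pts @ Rz.T + np.array([1.0, 2.0, 3.0])
R, t = initial_pose(pts, moved)
print(np.allclose(R, Rz), np.allclose(t, [1.0, 2.0, 3.0]))  # True True
```

Because only a handful of key points are needed, this closed-form step is fast and gives the later fine registration a good starting value.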
In one embodiment, determining the 3D key point cloud according to the standard 3D key points, the real-time 3D key points and the initial pose includes:

taking the complete 3D point cloud corresponding to the two-dimensional face image of the object to be tested as the search object, searching within a sphere of preset radius centered on each acquired real-time 3D key point, and adding the face 3D points found to a candidate set, the candidate set including the real-time key points and the face 3D points found;

for each point in the candidate set, performing the point cloud registration transformation given by the initial pose, and after the transformation computing the distance from the point to the nearest point in the complete 3D point cloud of the standard model; when this distance is less than a threshold, adding the corresponding point of the candidate set to a first consistent set, and adding the corresponding nearest point of the standard model's complete 3D point cloud to a second consistent set;

the first consistent set and the second consistent set constituting the 3D key point cloud.
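The candidate-set and consistent-set construction described above can be sketched as follows. The brute-force nearest-neighbour search and the function names are illustrative assumptions (a practical implementation would use a k-d tree for speed):

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def transform(R, t, p):
    """Apply the initial pose (R, t) to a 3D point p."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def build_consistent_sets(keypoints, face_cloud, std_cloud, R, t, radius, threshold):
    # Candidate set: the real-time 3D key points plus every face 3D point
    # found inside a sphere of the preset radius around each key point.
    candidates = set(keypoints)
    for k in keypoints:
        candidates.update(p for p in face_cloud if dist(p, k) <= radius)
    # Transform each candidate by the initial pose, find its nearest point in
    # the standard model's cloud, and keep pairs closer than the threshold.
    first, second = [], []
    for p in sorted(candidates):
        q = transform(R, t, p)
        nearest = min(std_cloud, key=lambda s: dist(q, s))
        if dist(q, nearest) < threshold:
            first.append(p)         # first consistent set (real-time side)
            second.append(nearest)  # second consistent set (standard side)
    return first, second

# Toy usage with an identity initial pose:
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 5.0)]
first, second = build_consistent_sets(
    [(0.0, 0.0, 0.0)], cloud, cloud, I3, (0.0, 0.0, 0.0), 0.5, 0.2)
print(first)  # [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)]
```

Growing the key points into small spherical neighbourhoods makes the later fine registration less sensitive to a single mislocated landmark.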
In one embodiment, determining the precise pose according to the 3D key point cloud includes:

taking the initial pose as the initial value, performing fine registration of the first consistent set and the second consistent set to determine the precise pose.

In one embodiment, the fine registration of the first consistent set and the second consistent set uses the Trimmed ICP algorithm.

In one embodiment, the structured light system includes an infrared camera, an infrared projector and a terminal device, the infrared camera and the infrared projector each being connected to the terminal device.
In a second aspect, the present application provides a face pose estimation device based on a structured light system, the device comprising:

a first selection module, configured to acquire a frontal zero-pose face image of the object to be tested, take the three-dimensional point cloud of the frontal zero-pose face image as a standard model, select 2D key points from the frontal zero-pose face image to obtain standard 2D key points, and acquire the standard 3D key points corresponding to the standard 2D key points through the structured light system;

a second selection module, configured to acquire a two-dimensional face image of the object to be tested in real time, select 2D key points from the two-dimensional face image to obtain real-time 2D key points, and acquire the real-time 3D key points corresponding to the real-time 2D key points through the structured light system;

a first determination module, configured to determine a 3D key point cloud according to the standard 3D key points and the real-time 3D key points;

a second determination module, configured to determine the precise pose according to the 3D key point cloud.

In a third aspect, the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the face pose estimation method based on a structured light system according to the first aspect.

In a fourth aspect, the present application provides a readable storage medium on which a computer program is stored, which, when executed by a processor, implements the face pose estimation method based on a structured light system according to the first aspect.

As can be seen from the technical solutions provided by the above embodiments of this specification, the solution estimates the facial pose using a structured light system; the method is fast, strongly real-time, highly accurate and highly stable.
Brief Description of the Drawings

To explain the embodiments of this specification or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some of the embodiments recorded in this specification; for a person of ordinary skill in the art, other drawings can be obtained from them without creative effort.

Fig. 1 is a schematic structural diagram of the structured light system;

Fig. 2 is a schematic diagram of the coded patterns projected by the projector in the structured light system;

Fig. 3 is a schematic diagram of the mathematical model of the structured light system;

Fig. 4 is a specific schematic diagram of the use of the structured light system;

Fig. 5 is a schematic flow chart of the face pose estimation method based on a structured light system provided by the present application;

Fig. 6 is a schematic diagram of the 68-point face feature model detected with Dlib;

Fig. 7 is a schematic diagram of the nose region used as 2D key points;

Fig. 8 is a schematic structural diagram of the face pose estimation device based on a structured light system provided by the present application;

Fig. 9 is a schematic structural diagram of the electronic device provided by the present application.
Detailed Description of the Embodiments

In order to enable those skilled in the art to better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification will be described clearly and completely below in conjunction with the drawings in the embodiments of this specification. Obviously, the described embodiments are only a part of the embodiments of this specification, not all of them. Based on the embodiments in this specification, all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the scope of protection of this specification.

In the following description, specific details such as particular system structures and technologies are set forth for the purpose of illustration rather than limitation, so as to provide a thorough understanding of the embodiments of the present application. However, it should be clear to those skilled in the art that the present application may also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits and methods are omitted so that unnecessary details do not obstruct the description of the present application.

It will be apparent to those skilled in the art that various modifications and variations can be made to the specific embodiments described herein without departing from the scope or spirit of the present application. Other embodiments obtained from this specification will be apparent to those skilled in the art. The specification and examples of the present application are merely exemplary.

The words "comprise", "include", "have", "contain" and the like used herein are open-ended terms, meaning including but not limited to.

Unless otherwise specified, "parts" in this application are calculated by mass.
Existing face pose estimation methods generally include model-based methods, feature-regression-based methods, classification-based methods and methods based on the geometric relationships of facial key points, as well as emerging methods such as those based on subspace learning.

Model-based methods extract 2D feature points from the face region of a two-dimensional image and establish correspondences with the 3D feature points of a three-dimensional face model to estimate the face pose. The face pose obtained in this way is continuous-valued and relatively accurate, so this has become a commonly used approach. However, the three-dimensional average face model typically used by such methods usually differs from the two-dimensional face image, and pose estimation for large-angle head deviations and exaggerated expressions has low accuracy and poor robustness.

Feature-regression-based methods learn a mapping from image space to pose space through machine learning, i.e., a mathematical model (mainly a neural network) is trained on a large number of face images with known poses to establish the mapping and determine the face pose of a sample. In practice, however, this correspondence requires large data sets for validation, interpolation is often needed during image processing, the computational cost is high, the result is strongly affected by the face detection and localization results, and robustness is insufficient.

Classification-based methods divide face poses into different categories within a certain range and classify the sample to be determined. Such methods quantize the head pose space into several discrete points and prepare several templates for each pose; the sample to be determined is then compared with the templates one by one, and the head pose corresponding to the template with the highest matching score is the classification result. Depending on the templates to be compared, the specific methods can be divided into those based on shape templates, on detection, or on locally constrained models, where the physical quantities compared against the templates are, respectively, image texture, pose detectors, and sets of sub-organ detectors arranged in a certain topology. The results obtained by such methods are discrete values, with high time complexity, low efficiency and large errors, and real-time performance is difficult to guarantee.

Methods based on the geometric relationships of facial key points first locate the facial key points and then estimate the face pose from the relative positions of these key points under certain geometric constraints. Such methods are relatively simple and have low time complexity, but they are very sensitive to occlusion and missing key points and have poor stability.

Subspace-learning-based methods regard the pose space of the face as a natural three-dimensional space, which can be viewed as a three-dimensional pose manifold embedded in the high-dimensional image space. Such methods are relatively new, but they currently have high time complexity, low accuracy and poor practicality.

In view of the above defects, the present application proposes a face pose estimation method based on a structured light system, which can estimate the face pose with high accuracy and robustness.
As a mature active three-dimensional information acquisition technology, structured light has the advantages of being non-contact, highly accurate, real-time, low-cost, wide-field and strongly interference-resistant. Three-dimensional reconstruction based on a structured light system (including face pose estimation) is an active three-dimensional reconstruction method.

A structured light system replaces one camera of the traditional binocular vision method with a projector. As shown in Fig. 1, this application uses an infrared projector, an infrared camera (hereinafter simply the camera) and a terminal device (such as the computer in Fig. 1) to build a dynamic structured light system. The structured light system uses the coded patterns projected by the projector to solve the corresponding-point matching problem that is difficult to solve in binocular vision; then, with the already calibrated camera and projector, the three-dimensional information of the object can be obtained by the triangulation principle.
The structured light system of this application is based on the time-coding principle, specifically a binary fringe coding method combining Gray code with line-shift coding. The basic idea is to first produce a sequence of coded patterns according to the coding principle, then project the pattern sequence onto the target surface with the projector, and finally decode the target images carrying the coded patterns. The point cloud computed by scanning with this method has both high spatial resolution and high reconstruction accuracy.

The Gray code plus line-shift coding method used in this application achieves coding by successively projecting multiple coded patterns onto the target to be measured. A total of 18 patterns are produced according to this method, as shown in Fig. 2: the first 2 are all-white and all-black patterns, used to extract valid pixels and obtain the current texture; the middle 8 are Gray code patterns; and the last 8 are line-shift patterns based on the Gray code patterns.

After the coded patterns have been projected onto the target, the code value of each pixel needs to be recovered from the image sequence captured by the camera. The usual approach is to first determine the positions of the fringe edges in the image, then determine the fringe states from the pixel values on both sides of each fringe edge, and finally determine the code value of each pixel from the fringe states. Fringe edge detection can use, for example, the zero-crossing detection method, which achieves sub-pixel detection accuracy.
Suppose three fringe edges detected in the coded image are, from left to right, L1, L2 and L3; denote the average grayscale of the pixels of a given row between edges L1 and L2 as grayL, and the average grayscale of the pixels of that row between edges L2 and L3 as grayR. The grayscale difference between the two sides of L2 is then

grayΔ = grayL − grayR       (1)

If grayΔ < 0, the fringe pixel values on the left of edge L2 are smaller than those on the right, so the left side of L2 corresponds to a black fringe and the right side to a white fringe. If grayΔ > 0, the fringes are the opposite. The fringe states are determined in this way.
After the fringe states have been determined, the code value of pixels on black fringes is set to 0 and that of pixels on white fringes is set to 1, and the Gray code images and the line-shift images are decoded separately. Subdividing the Gray code decoded value with each line-shift decoded value yields a unique absolute decoded value for each pixel position, as follows:

P = G + S
G ∈ {0, 1, 2, …, 2^n − 1}       (2)
S ∈ {0, 1, 2, …, m − 1}

where P is the absolute decoded value, G is the Gray code decoded value, S is the line-shift decoded value, n is the number of Gray code patterns, and m is the number of line shifts. By encoding and decoding the 18 patterns projected by the projector as above, the matching points of the projector and the camera can be put into one-to-one correspondence.
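As a sketch of the decoding step, the Gray-to-binary conversion below follows the standard Gray-code rule (b0 = g0, bi = b(i−1) XOR gi), and the combination mirrors formula (2); the function names are illustrative assumptions:

```python
def gray_to_binary(bits):
    """Decode a Gray-code bit sequence (MSB first) to its integer value."""
    value = 0
    prev = 0
    for g in bits:
        prev ^= g                  # b_i = b_(i-1) XOR g_i
        value = (value << 1) | prev
    return value

def absolute_code(gray_bits, line_shift):
    """Combine the Gray code decoded value G with the line-shift decoded
    value S into the absolute decoded value P = G + S, as in formula (2)."""
    return gray_to_binary(gray_bits) + line_shift

print(gray_to_binary([1, 1, 0]))    # Gray 110 -> binary 100 -> 4
print(absolute_code([1, 1, 0], 3))  # P = 4 + 3 = 7
```

Gray code is used because adjacent fringe columns differ in exactly one bit, so a single misread bit shifts the decoded column by at most one position.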
Based on the camera imaging model, we can quickly establish the relationship between a three-dimensional point in space and its corresponding image-plane coordinates. In structured light reconstruction, the projector is generally regarded as a camera with a reversed light path, so the camera model can also be used for it. The mathematical model is shown in Fig. 3, with the projector imaging model on the left and the camera imaging model on the right. M is a point on the object to be measured; its corresponding point in the projector image is m^p and in the camera image is m^c. From the camera imaging model we obtain:

s^c m^c = K^c [R^c | T^c] M_w       (3)
s^p m^p = K^p [R^p | T^p] M_w       (4)

with the intrinsic matrix K^(c/p) = [[f_u, γ, u_0], [0, f_v, v_0], [0, 0, 1]]. In these formulas, parameters with superscript c belong to the camera and those with superscript p to the projector; m^(c/p) are the 2D image coordinates in the digital image coordinate system, M^(c/p) the coordinates in the camera/projector coordinate system, M_w the coordinates in the world coordinate system, R^(c/p) and T^(c/p) the rotation and translation matrices of the camera/projector coordinate system relative to the world coordinate system (extrinsic parameters), s the scale factor, and f_u, f_v, u_0, v_0, γ the entries of the intrinsic parameter matrix (intrinsic parameters). Through calibration we can obtain the intrinsic and extrinsic parameters of the camera and the projector.
The pose relationship between the camera and the projector is given by formula (5):

R = R^p (R^c)^(−1),  T = T^p − R^p (R^c)^(−1) T^c       (5)

where R and T are the rotation and translation matrices between the projector and camera coordinate systems. After the extrinsic parameters R^c, R^p, T^c and T^p are obtained by calibration, R and T follow from this coordinate transformation.
Suppose the fringe edge point m^c = (u^c, v^c) extracted on the camera image plane and the point m^p = (u^p, v^p) on the projector pattern-generation plane are matching projections of the same scene point M. Since the patterns are coded along the x dimension (horizontally), the correspondence between u^c and u^p can be determined by matching the code values.

As shown in Fig. 3, O^c O^p defines the epipolar geometry; using the epipolar constraint we obtain

(m^p)^T (K^p)^(−T) [T]_× R (K^c)^(−1) m^c = 0       (6)

where K^c and K^p are the intrinsic matrices of the camera and the projector, and [T]_× is the skew-symmetric matrix of T. From formula (6), the correspondence between v^c and v^p in the y dimension (vertical) can be determined, and thus the complete coordinates of the matching points m^c and m^p.
Finally, using the principle of triangulation, we obtain the depth information z^c:
Using formula (3), the complete three-dimensional spatial point M^c(x^c, y^c, z^c) is obtained, thereby establishing a one-to-one 2D-3D correspondence from two-dimensional pixels to three-dimensional spatial points.
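The 2D-to-3D step can be illustrated with a standard linear (DLT) triangulation. The patent's own formulas (3) and (7) are not reproduced here; the intrinsics and the 10 cm baseline below are invented for the sketch:

```python
import numpy as np

def triangulate(Pc, Pp, mc, mp):
    """Least-squares 3D point from two 3x4 projection matrices and matched pixels."""
    A = np.stack([
        mc[0] * Pc[2] - Pc[0],
        mc[1] * Pc[2] - Pc[1],
        mp[0] * Pp[2] - Pp[0],
        mp[1] * Pp[2] - Pp[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    M = Vt[-1]                      # null vector = homogeneous 3D point
    return M[:3] / M[3]

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0,   1.0]])
Pc = K @ np.hstack([np.eye(3), np.zeros((3, 1))])                   # camera at origin
Pp = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])   # 10 cm baseline

M_true = np.array([0.2, 0.1, 2.0])
mc = Pc @ np.append(M_true, 1.0); mc = mc[:2] / mc[2]
mp = Pp @ np.append(M_true, 1.0); mp = mp[:2] / mp[2]
M_est = triangulate(Pc, Pp, mc, mp)
```

With noise-free matches the solve recovers the scene point exactly; in practice the matches come from the decoded stripe correspondences described above.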
Figure 4 is a schematic diagram of the structured light system in use. In the present application, a DLP4500 infrared projector continuously projects 18 coded patterns onto the face in real time, and an infrared camera captures the face illuminated by the patterns in real time. By encoding and decoding the patterns, a high-precision, real-time point cloud data sequence is obtained, establishing a one-to-one correspondence between the 2D face data and the 3D point cloud data.
The structured light system used in this application is based on the principle of temporal coding, specifically a Gray-code-plus-line-shift coding method. With this system, high-precision, high-frame-rate 3D point cloud data in one-to-one correspondence with the 2D face pixels can be obtained for face pose estimation.
The present invention is further described in detail below with reference to the accompanying drawings and embodiments.

Referring to FIG. 5, which shows a flow chart of the face pose estimation method based on a structured light system provided by an embodiment of the present application.

As shown in FIG. 5, a face pose estimation method based on a structured light system may include:
S510: obtain a frontal zero-pose face image of the object to be measured, use the three-dimensional point cloud of the frontal zero-pose face image as the standard model, select 2D key points from the frontal zero-pose face image to obtain standard 2D key points, and obtain, according to the structured light system, the standard 3D key points corresponding to the standard 2D key points.
Specifically, the object to be measured is the subject whose face pose is to be estimated, such as a driver while driving, a player using virtual reality, or a person undergoing face recognition.

The frontal zero-pose face image is the face pose image captured when the face of the object to be measured directly faces the camera.

Using the three-dimensional point cloud of the frontal zero-pose face image of the object to be measured as the standard model (also called the standard face, the standard face 3D model, or the standard face model), rather than directly using a 3D average face, mitigates to some extent the diversity of facial appearance and the differences in facial texture.
S520: collect a two-dimensional face image of the object to be measured in real time, select 2D key points from the two-dimensional face image, and obtain, according to the structured light system, the 3D key points corresponding to the 2D key points.

Specifically, the two-dimensional face image is an image captured by the camera in real time.

One-to-one corresponding 2D face data and 3D point cloud data can be obtained through the infrared structured light system of this application. For the face pose estimation task, solving with the entire 3D point cloud (the complete point cloud corresponding to a 2D face photo is called the 3D point cloud) is computationally heavy and very time-consuming; it is also affected by factors such as changes in facial expression, giving poor stability and robustness. This application therefore selects stable 2D key points and 3D key points for face pose estimation.
In one embodiment, selecting 2D key points from the frontal zero-pose face image or the two-dimensional face image includes:

performing feature point detection on the frontal zero-pose face image or the two-dimensional face image to obtain face feature points;

selecting 2D key points from the face feature points.
Specifically, for key point detection and selection on the frontal zero-pose face image or the two-dimensional face image, note that this application does not limit the specific face feature point detection algorithm: commonly used optimization-based methods (such as ASM and AAM), regression-based methods (cascaded pose regression and SDM) and deep-learning-based methods can all be used in the present invention. The following takes 68-point face feature detection with Dlib, which is based on cascaded regression, as an example.
Dlib uses the classic histogram of oriented gradients (HOG) features combined with a linear classifier for face detection, and uses ERT cascaded regression, i.e. a regression-tree method based on gradient boosting, to detect the 68 feature points of the face, which, as shown in Figure 6, cover the eyebrows, eyes, nose, mouth and facial contour.
Through Dlib, the coordinates of the face feature points can be obtained from the 2D image captured by the system. To avoid the influence of facial expression changes and to improve robustness, the feature points are filtered further. When expressions change, the eyebrows, eyes, mouth and chin move considerably, so feature points in these regions are not selected. The nose region is relatively stable and is chosen as the source of key points for the subsequent estimation. Nine points on the nose are therefore selected as 2D key points, as shown in Figure 7. Understandably, other stable regions can also be selected as key points according to the actual situation, for example using the key point clouds of the nose, both cheeks and the eye-corner regions together.
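The selection step can be sketched as below. It assumes landmarks in Dlib's standard 68-point numbering, where indices 27-30 trace the nose bridge and 31-35 the lower nose, which yields exactly the nine nose points used above:

```python
import numpy as np

# In Dlib's 68-point scheme, indices 27-35 are the nine nose landmarks.
NOSE_IDX = np.arange(27, 36)

def select_nose_keypoints(landmarks_68):
    """landmarks_68: (68, 2) array of (x, y) pixel coordinates."""
    assert landmarks_68.shape == (68, 2)
    return landmarks_68[NOSE_IDX]

demo = np.zeros((68, 2))
demo[NOSE_IDX] = 1.0              # mark the nose points for the demo
kp2d = select_nose_keypoints(demo)
```

The returned (9, 2) array is the stable 2D key-point set; swapping `NOSE_IDX` for other index lists covers the cheek or eye-corner variants mentioned above.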
According to the 2D key point coordinates obtained above and the 2D-3D geometric correspondence determined by the infrared structured light system, the 3D key point coordinates corresponding to the 2D key points can be obtained, specifically the standard 3D key points corresponding to the standard 2D key points and the real-time 3D key points corresponding to the real-time 2D key points.
S530: determining a 3D key point cloud according to the standard 3D key points and the real-time 3D key points may include:

determining the initial pose between the object to be measured and the standard model according to the standard 3D key points and the real-time 3D key points;

determining the 3D key point cloud according to the standard 3D key points, the real-time 3D key points and the initial pose.
Specifically, from the coordinates of the determined standard 3D key points and real-time 3D key points, the initial pose R0 and T0 between the object to be measured and the standard model can be computed; this initial pose serves as the initial value for the subsequent fine registration and is also used to obtain the 3D key point cloud.
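The application does not name the solver used for the initial pose, so the sketch below substitutes the common SVD-based rigid alignment (Kabsch/Umeyama) over the matched key-point pairs; the demo rotation and translation are invented:

```python
import numpy as np

def rigid_align(src, dst):
    """R, t minimizing sum ||R @ src_i + t - dst_i||^2 over matched point pairs."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])  # guard against reflections
    R = Vt.T @ D @ U.T
    t = c_dst - R @ c_src
    return R, t

rng = np.random.default_rng(0)
src = rng.normal(size=(9, 3))                 # nine key points, as selected above
a = np.radians(30.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
t_true = np.array([0.1, -0.2, 0.05])
dst = (R_true @ src.T).T + t_true
R0, t0 = rigid_align(src, dst)
```

With exact correspondences (the structured light system gives them directly), the closed-form solve recovers the pose in one step, which is why it suits the coarse initialization.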
To guard against large-angle changes of the face and to improve the stability and robustness of the pose estimation, the present application extends the already obtained 3D key points into a 3D key point cloud.
In one embodiment, determining the 3D key point cloud according to the standard 3D key points, the real-time 3D key points and the initial pose may include:

taking the complete 3D point cloud corresponding to the two-dimensional face image of the object to be measured as the search object, searching within a sphere of preset radius centered on each acquired real-time 3D key point, and adding the found face 3D points to a candidate set, the candidate set including the real-time key points and the found face 3D points;

for each point in the candidate set, performing the point cloud registration transformation based on the initial pose, and after the transformation computing the distance from each point to the nearest point in the complete 3D point cloud of the standard model; when the distance is smaller than a threshold, adding the corresponding candidate point to a first consistent set, and adding the corresponding nearest point of the standard model's complete 3D point cloud to a second consistent set;

the first consistent set and the second consistent set constituting the 3D key point cloud.
Specifically, construction of candidate set C: the complete 3D point cloud corresponding to the 2D face photo (i.e. the two-dimensional face image of the object to be measured) is taken as the search object. Centered on each acquired real-time 3D key point, a search is performed within a sphere of radius r (the preset radius, which can be set according to actual needs), and the found face 3D points are added to candidate set C. Following this procedure, the candidate set C of the face to be estimated is obtained; C contains the 3D key points and the newly found face 3D points.
Construction of consistent set S (i.e. the 3D key point cloud): on the basis of candidate set C, a consistent set S is built following an idea similar to the RANSAC algorithm. Specifically, for each point p in candidate set C, the point cloud registration transformation is performed with the initial pose R0 and T0; after the transformation, the distance from each point to the nearest point in the complete 3D point cloud of the standard face model is computed. Each point p whose distance is smaller than a threshold δ (settable according to actual needs) is added to consistent set S0 (the first consistent set), and its nearest point q is added to consistent set S1 (the second consistent set). Consistent sets S0 and S1 are the obtained 3D key point cloud.
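The candidate-set and consistent-set construction can be sketched with brute-force distance queries (a real implementation would use a KD-tree for the nearest-neighbor searches); the radius r and threshold δ values here are illustrative, as are the tiny demo clouds:

```python
import numpy as np

def build_consistent_sets(keypoints, face_cloud, model_cloud, R0, T0,
                          r=0.02, delta=0.005):
    # Candidate set C: face points within radius r of any real-time key point.
    d_kp = np.linalg.norm(face_cloud[:, None] - keypoints[None], axis=2)
    C = face_cloud[(d_kp <= r).any(axis=1)]
    # Transform C by the initial pose, then keep points whose nearest model
    # point lies closer than delta (the RANSAC-like consistency test).
    Ct = (R0 @ C.T).T + T0
    d_m = np.linalg.norm(Ct[:, None] - model_cloud[None], axis=2)
    nn = d_m.argmin(axis=1)
    keep = d_m[np.arange(len(C)), nn] < delta
    S0 = C[keep]                  # first consistent set (source)
    S1 = model_cloud[nn[keep]]    # second consistent set (target)
    return S0, S1

face = np.array([[0.0, 0.0, 0.0], [0.01, 0.0, 0.0], [1.0, 1.0, 1.0]])
model = face.copy()               # zero pose for the demo
S0, S1 = build_consistent_sets(face[:1], face, model, np.eye(3), np.zeros(3))
```

In the demo, only the two points near the single key point survive the radius search, and both pass the consistency test, so S0 and S1 come out as matched pairs.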
S540: determining the precise pose according to the 3D key point cloud includes:

taking the initial pose as the initial value, performing fine registration on the first consistent set and the second consistent set to determine the precise pose, where the Trimmed ICP algorithm is used for the fine registration of the first consistent set and the second consistent set.

Specifically, this application adopts the Trimmed ICP algorithm for the fine registration: with the LTS (least trimmed squares) method, the residuals of the matched point pairs are sorted in ascending order, only the leading fraction of correspondences is used to fit the error function, and R and T are solved by iteratively minimizing the error function.
The specific procedure is:

a) For each point of the first consistent set S0 (also called the source point cloud), find its matching point in the second consistent set S1 (also called the target point cloud), and compute the squared residual di(R,T)^2.
Here, suppose the source point cloud S0 contains Np points, i.e. S0 = {pi | i = 1, …, Np}. For each point pi in S0, its transform under R and T is pi(R,T) = R·pi + T; for every pi(R,T) a matching point is sought in the target point cloud S1, as in formula (8):

mcl(i,R,T) = arg min_{m∈S1} |m − pi(R,T)|        (8)
After the closest point has been found as the matching point, the residual di(R,T) of each matched pair is computed:

di(R,T) = |mcl(i,R,T) − pi(R,T)|        (9)
b) Sort the di(R,T)^2 in ascending order, select the squared residuals of the first Npo points, and sum them to obtain S′LTS, where Npo is the number of selected points, given by formula (10):

Npo = ξ·Np        (10)
where ξ is the minimum overlap ratio, which can be obtained adaptively by minimizing the objective function of formula (11):

ψ(ξ) = e(ξ)·ξ^(−(1+λ))        (11)
where λ is a preset non-negative number and e is the trimmed MSE, see formula (12):

e = S′LTS / Npo        (12)
c) If the termination condition is met, exit the iteration; otherwise start a new round: compute the optimal transformation R and T of the Npo selected points by minimizing S′LTS, transform the corresponding points with the obtained R and T, and return to a).

The termination condition is any one of: the set maximum number of iterations has been reached (the maximum number of iterations can be set according to actual needs); the trimmed MSE e = S′LTS/Npo is sufficiently small; or the relative change of the trimmed MSE, |e − e′|/e, is sufficiently small.
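Steps a) to c) can be condensed into the loop below. For brevity it uses a fixed overlap ratio ξ instead of the adaptive minimization of formula (11), a fixed iteration count in place of the termination tests, and an SVD-based (Kabsch) refit for the per-iteration pose update; these simplifications are ours:

```python
import numpy as np

def trimmed_icp(S0, S1, R, T, xi=0.8, iters=30):
    Npo = max(3, int(xi * len(S0)))
    for _ in range(iters):
        P = (R @ S0.T).T + T                       # step a): transform the source
        d = np.linalg.norm(P[:, None] - S1[None], axis=2)
        nn = d.argmin(axis=1)                      # closest-point matches, formula (8)
        res = d[np.arange(len(P)), nn]             # residuals, formula (9)
        sel = np.argsort(res)[:Npo]                # step b): keep the Npo smallest
        src, dst = S0[sel], S1[nn[sel]]            # step c): re-fit R, T on them
        cs, cd = src.mean(axis=0), dst.mean(axis=0)
        U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
        D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])
        R = Vt.T @ D @ U.T
        T = cd - R @ cs
    return R, T

rng = np.random.default_rng(1)
S0 = rng.random((50, 3))
a = 0.01                                           # small pose offset for the demo
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
T_true = np.array([0.005, -0.01, 0.002])
S1 = (R_true @ S0.T).T + T_true
R_est, T_est = trimmed_icp(S0, S1, np.eye(3), np.zeros(3))
```

Starting from the coarse pose of S530 rather than the identity makes the closest-point matches reliable from the first iteration, which is why the initial pose matters.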
Registering the 3D key point clouds S0 and S1 with the Trimmed ICP algorithm yields the optimal rotation matrix R, which is the finally determined face pose matrix. Understandably, if the result needs to be visualized more intuitively, R can be decomposed to obtain the Euler-angle representation of the face pose, namely the pitch, yaw and roll angles.
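The Euler-angle readout can be sketched as follows; the application does not fix a rotation convention, so the ZYX order R = Rz(roll)·Ry(yaw)·Rx(pitch) used here is an assumption:

```python
import numpy as np

def euler_from_rotation(R):
    """Pitch (x), yaw (y), roll (z) in degrees for R = Rz @ Ry @ Rx."""
    yaw = np.arcsin(-R[2, 0])
    pitch = np.arctan2(R[2, 1], R[2, 2])
    roll = np.arctan2(R[1, 0], R[0, 0])
    return np.degrees([pitch, yaw, roll])

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

p, y, r = np.radians([10.0, 20.0, 30.0])
R = rot_z(r) @ rot_y(y) @ rot_x(p)
angles = euler_from_rotation(R)    # recovers (10, 20, 30) degrees
```

The closed-form readout holds away from the gimbal-lock case yaw = ±90°, which is well outside the head-pose range of interest here.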
Understandably, in this embodiment, other registration methods may also be used to perform the fine registration of the first consistent set and the second consistent set.
The face pose estimation method based on a structured light system provided in the embodiments of the present application uses the structured light system for face pose estimation; the method is fast, strongly real-time, highly accurate and highly stable.

The face pose estimation method based on a structured light system provided in the embodiments of the present application does not require many restrictions. At the same time, the present invention selects stable regions for face pose estimation on the basis of high-precision, real-time face reconstruction data; compared with most existing algorithms, it is more accurate and is more robust to problems such as facial expression changes and large-angle changes.
Referring to FIG. 8, which shows a schematic structural diagram of a face pose estimation apparatus based on a structured light system according to an embodiment of the present application.

As shown in FIG. 8, the face pose estimation apparatus 800 based on a structured light system may include:

a first selection module 810, configured to obtain a frontal zero-pose face image of the object to be measured, use the three-dimensional point cloud of the frontal zero-pose face image as the standard model, select 2D key points from the frontal zero-pose face image to obtain standard 2D key points, and obtain, according to the structured light system, the standard 3D key points corresponding to the standard 2D key points;

a second selection module 820, configured to collect a two-dimensional face image of the object to be measured in real time, select 2D key points from the two-dimensional face image to obtain real-time 2D key points, and obtain, according to the structured light system, the real-time 3D key points corresponding to the real-time 2D key points;

a first determination module 830, configured to determine the 3D key point cloud according to the standard 3D key points and the real-time 3D key points;

a second determination module 840, configured to determine the precise pose according to the 3D key point cloud.
Optionally, the first selection module 810 or the second selection module 820 is further configured to:

perform feature point detection on the frontal zero-pose face image or the two-dimensional face image to obtain face feature points;

select 2D key points from the face feature points.
Optionally, the first determination module 830 is further configured to:

determine the initial pose between the object to be measured and the standard model according to the standard 3D key points and the real-time 3D key points;

determine the 3D key point cloud according to the standard 3D key points, the real-time 3D key points and the initial pose.
Optionally, the first determination module 830 is further configured to:

take the complete 3D point cloud corresponding to the two-dimensional face image of the object to be measured as the search object, search within a sphere of preset radius centered on each acquired real-time 3D key point, and add the found face 3D points to a candidate set, the candidate set including the real-time key points and the found face 3D points;

for each point in the candidate set, perform the point cloud registration transformation based on the initial pose, and after the transformation compute the distance from each point to the nearest point in the complete 3D point cloud of the standard model; when the distance is smaller than a threshold, add the corresponding candidate point to a first consistent set, and add the corresponding nearest point of the standard model's complete 3D point cloud to a second consistent set;

the first consistent set and the second consistent set constitute the 3D key point cloud.
Optionally, the second determination module 840 is further configured to:

take the initial pose as the initial value and perform fine registration on the first consistent set and the second consistent set to determine the precise pose.

Optionally, the Trimmed ICP algorithm is used for the fine registration of the first consistent set and the second consistent set.
Optionally, the structured light system includes an infrared camera, an infrared projector and a terminal device; the infrared camera and the infrared projector are each connected to the terminal device.

The face pose estimation apparatus based on a structured light system provided in this embodiment can execute the above method embodiments; its implementation principle and technical effects are similar and are not repeated here.
FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention. As shown in FIG. 9, it illustrates the structure of an electronic device 900 suitable for implementing the embodiments of the present application.
As shown in FIG. 9, the electronic device 900 includes a central processing unit (CPU) 901, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage section 908 into a random access memory (RAM) 903. The RAM 903 also stores the various programs and data required for the operation of the device 900. The CPU 901, the ROM 902 and the RAM 903 are connected to one another via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
The following components are connected to the I/O interface 905: an input section 906 including a keyboard, a mouse and the like; an output section 907 including a cathode ray tube (CRT), a liquid crystal display (LCD) and the like, as well as a speaker; a storage section 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card or a modem. The communication section 909 performs communication processing via a network such as the Internet. A drive 910 is also connected to the I/O interface 905 as needed. A removable medium 911, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 910 as needed, so that a computer program read from it is installed into the storage section 908 as needed.
In particular, according to the embodiments of the present disclosure, the process described above with reference to FIG. 1 can be implemented as a computer software program. For example, the embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for executing the above face pose estimation method based on a structured light system. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions and operations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units or modules involved in the embodiments described in the present application may be implemented by software or by hardware. The described units or modules may also be arranged in a processor, and in some cases the names of these units or modules do not constitute a limitation on the units or modules themselves.

The systems, apparatuses, modules or units set forth in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a mobile phone, a smartphone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
As another aspect, the present application further provides a storage medium, which may be the storage medium contained in the aforementioned apparatus of the above embodiments, or a storage medium that exists separately and is not assembled into the device. The storage medium stores one or more programs, which are used by one or more processors to execute the face pose estimation method based on a structured light system described in the present application.

Storage media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should be noted that the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device comprising that element.

The embodiments in this specification are described in a progressive manner; for identical or similar parts between the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiments are basically similar to the method embodiments, they are described relatively simply, and for relevant parts reference may be made to the description of the method embodiments.

Claims (10)

  1. A face pose estimation method based on a structured light system, characterized in that the method comprises:
    acquiring a frontal zero-pose face image of an object to be tested, taking the three-dimensional point cloud of the frontal zero-pose face image as a standard model, selecting 2D key points from the frontal zero-pose face image to obtain standard 2D key points, and obtaining standard 3D key points corresponding to the standard 2D key points by means of the structured light system;
    collecting a two-dimensional face image of the object to be tested in real time, selecting 2D key points from the two-dimensional face image to obtain real-time 2D key points, and obtaining real-time 3D key points corresponding to the real-time 2D key points by means of the structured light system;
    determining a 3D key point cloud according to the standard 3D key points and the real-time 3D key points;
    determining a precise pose according to the 3D key point cloud.
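As a rough illustration of the 2D-to-3D key point lookup in claim 1: the claim does not spell out how the structured light system yields the 3D points, so the sketch below assumes a pinhole camera model and a per-pixel depth map recovered by the system (function name and parameters are illustrative, not taken from the application).

```python
import numpy as np

def keypoints_2d_to_3d(kp_2d, depth_map, fx, fy, cx, cy):
    """Back-project 2D key points into 3D camera coordinates using a
    per-pixel depth map (e.g. one recovered by a structured light system).

    kp_2d     : (N, 2) array of pixel coordinates (u, v)
    depth_map : (H, W) array of depth values (same units as the output)
    fx, fy    : focal lengths in pixels; cx, cy: principal point
    """
    kp_2d = np.asarray(kp_2d)
    u, v = kp_2d[:, 0], kp_2d[:, 1]
    z = depth_map[v.astype(int), u.astype(int)]   # depth at each key point
    x = (u - cx) * z / fx                         # pinhole back-projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Toy example: a flat scene at depth z = 2.0
depth = np.full((480, 640), 2.0)
kp = np.array([[320, 240], [400, 240]])           # two pixel key points
pts3d = keypoints_2d_to_3d(kp, depth, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(pts3d)
```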
  2. The method according to claim 1, characterized in that selecting 2D key points from the frontal zero-pose face image or the two-dimensional face image comprises:
    performing feature point detection on the frontal zero-pose face image or the two-dimensional face image to obtain face feature points;
    selecting 2D key points from the face feature points.
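The selection step in claim 2 is often realized by detecting a dense landmark set and then keeping a sparse, comparatively rigid subset (eye corners, nose, mouth corners). A minimal sketch, assuming a dlib-style 68-point landmark numbering; the specific indices are illustrative, not taken from the application.

```python
import numpy as np

# Hypothetical indices into a 68-point landmark set (dlib-style numbering):
# eye corners (36, 39, 42, 45), nose bridge/tip (27, 30), nostril edges
# (31, 35) and mouth corners (48, 54) move little with expression, which
# makes them reasonable 2D key points for pose estimation.
RIGID_LANDMARK_IDS = [36, 39, 42, 45, 27, 30, 31, 35, 48, 54]

def select_keypoints(landmarks):
    """landmarks: (68, 2) array of detected face feature points."""
    landmarks = np.asarray(landmarks)
    return landmarks[RIGID_LANDMARK_IDS]

# Example with synthetic landmark coordinates
lm = np.stack([np.arange(68), np.arange(68)], axis=1).astype(float)
kp = select_keypoints(lm)
print(kp.shape)   # (10, 2)
```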
  3. The method according to claim 1, characterized in that determining a 3D key point cloud according to the standard 3D key points and the real-time 3D key points comprises:
    determining an initial pose between the object to be tested and the standard model according to the standard 3D key points and the real-time 3D key points;
    determining the 3D key point cloud according to the standard 3D key points, the real-time 3D key points and the initial pose.
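The initial pose of claim 3 can be estimated from the corresponding real-time/standard 3D key point pairs with the standard SVD-based (Kabsch) rigid alignment. This is a common choice for 3D-3D correspondences, not necessarily the one used in the application.

```python
import numpy as np

def initial_pose(src, dst):
    """Least-squares rigid transform (R, t) with dst ≈ R @ src + t,
    computed by the SVD-based Kabsch method on corresponding 3D key points."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)             # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # no reflection
    R = Vt.T @ D @ U.T
    t = mu_d - R @ mu_s
    return R, t

# Sanity check: recover a known rotation about z plus a translation
theta = np.deg2rad(30)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.2, 0.5])
src = np.random.default_rng(0).normal(size=(10, 3))
dst = src @ R_true.T + t_true
R, t = initial_pose(src, dst)
print(np.allclose(R, R_true), np.allclose(t, t_true))
```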
  4. The method according to claim 3, characterized in that determining a 3D key point cloud according to the standard 3D key points, the real-time 3D key points and the initial pose comprises:
    taking the complete 3D point cloud corresponding to the two-dimensional face image of the object to be tested as the search object, searching within a spherical region of a preset radius centered on each acquired real-time 3D key point, and adding the retrieved face 3D points to a candidate set, the candidate set comprising the real-time 3D key points and the retrieved face 3D points;
    for each point in the candidate set, performing a point cloud registration transformation based on the initial pose, and after the transformation computing the distance from each point to its nearest point in the complete 3D point cloud of the standard model; when the distance is less than a threshold, adding the corresponding point of the candidate set to a first consistent set and adding the corresponding nearest point in the complete 3D point cloud of the standard model to a second consistent set;
    the first consistent set and the second consistent set constituting the 3D key point cloud.
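The candidate-set and consistent-set construction of claim 4 can be sketched with a KD-tree. The radius and distance threshold below are placeholder values; the claim leaves them as presets.

```python
import numpy as np
from scipy.spatial import cKDTree

def build_consistent_sets(keypoints, face_cloud, model_cloud, R, t,
                          radius=0.02, dist_thresh=0.005):
    """Sketch of claim 4: radius search around each real-time 3D key point
    in the real-time face cloud, then keep only candidate/model pairs that
    agree under the initial pose (R, t)."""
    # Step 1: spherical radius search -> candidate set (key points included)
    face_tree = cKDTree(face_cloud)
    ball = face_tree.query_ball_point(keypoints, r=radius)
    idx = sorted({j for lst in ball for j in lst})
    candidates = np.vstack([keypoints, face_cloud[idx]])

    # Step 2: transform by the initial pose, keep pairs whose nearest
    # standard-model point lies within dist_thresh
    model_tree = cKDTree(model_cloud)
    transformed = candidates @ R.T + t
    dist, nn = model_tree.query(transformed)
    mask = dist < dist_thresh
    first_set = candidates[mask]           # points from the real-time cloud
    second_set = model_cloud[nn[mask]]     # their matches in the standard model
    return first_set, second_set

# Toy usage: clouds already aligned, so the identity is a perfect initial pose
rng = np.random.default_rng(1)
model = rng.uniform(size=(200, 3))
face = model.copy()
kps = model[:5]
A, B = build_consistent_sets(kps, face, model, np.eye(3), np.zeros(3),
                             radius=0.1, dist_thresh=1e-6)
print(len(A) == len(B), len(A) > 0)
```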
  5. The method according to claim 4, characterized in that determining a precise pose according to the 3D key point cloud comprises:
    performing fine registration on the first consistent set and the second consistent set with the initial pose as the initial value, so as to determine the precise pose.
  6. The method according to claim 5, characterized in that the fine registration of the first consistent set and the second consistent set is performed using the Trimmed ICP algorithm.
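The claims name Trimmed ICP but do not spell it out. A minimal NumPy/SciPy sketch of the idea, keeping only the best-matching fraction of correspondences at each iteration and re-estimating the pose from that trimmed subset, might look like this (the trim ratio and iteration count are assumptions):

```python
import numpy as np
from scipy.spatial import cKDTree

def trimmed_icp(src, dst, R, t, trim_ratio=0.8, iters=30):
    """Minimal Trimmed ICP sketch: each iteration keeps only the
    trim_ratio fraction of nearest-neighbor pairs with the smallest
    residuals, then refits (R, t) with a Kabsch step.
    src, dst: (N, 3)/(M, 3) point sets; (R, t) is the initial pose."""
    tree = cKDTree(dst)
    n_keep = max(3, int(trim_ratio * len(src)))
    for _ in range(iters):
        moved = src @ R.T + t
        dist, nn = tree.query(moved)
        keep = np.argsort(dist)[:n_keep]          # trimming step
        p, q = src[keep], dst[nn[keep]]
        mu_p, mu_q = p.mean(axis=0), q.mean(axis=0)
        U, _, Vt = np.linalg.svd((p - mu_p).T @ (q - mu_q))
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T                        # refit rotation ...
        t = mu_q - R @ mu_p                       # ... and translation
    return R, t

# Toy usage: refine from the identity toward a small known rotation
rng = np.random.default_rng(2)
src = rng.normal(size=(100, 3))
theta = np.deg2rad(5)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.01, 0.02, 0.0])
dst = src @ R_true.T + t_true
R, t = trimmed_icp(src, dst, np.eye(3), np.zeros(3))
print("mean residual:", np.mean(np.linalg.norm(src @ R.T + t - dst, axis=1)))
```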
  7. The method according to any one of claims 1-6, characterized in that the structured light system comprises an infrared camera, an infrared projector and a terminal device, the infrared camera and the infrared projector each being connected to the terminal device.
  8. A face pose estimation apparatus based on a structured light system, characterized in that the apparatus comprises:
    a first selection module, configured to acquire a frontal zero-pose face image of an object to be tested, take the three-dimensional point cloud of the frontal zero-pose face image as a standard model, select 2D key points from the frontal zero-pose face image to obtain standard 2D key points, and obtain standard 3D key points corresponding to the standard 2D key points by means of the structured light system;
    a second selection module, configured to collect a two-dimensional face image of the object to be tested in real time, select 2D key points from the two-dimensional face image to obtain real-time 2D key points, and obtain real-time 3D key points corresponding to the real-time 2D key points by means of the structured light system;
    a first determination module, configured to determine a 3D key point cloud according to the standard 3D key points and the real-time 3D key points;
    a second determination module, configured to determine a precise pose according to the 3D key point cloud.
  9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, characterized in that, when the processor executes the program, the face pose estimation method based on a structured light system according to any one of claims 1-7 is implemented.
  10. A readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the face pose estimation method based on a structured light system according to any one of claims 1-7 is implemented.
PCT/CN2023/133069 2022-11-23 2023-11-21 Face posture estimation method and apparatus based on structured light system WO2024109772A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211475210.7 2022-11-23
CN202211475210.7A CN115798000A (en) 2022-11-23 2022-11-23 Face pose estimation method and device based on structured light system

Publications (1)

Publication Number Publication Date
WO2024109772A1 true WO2024109772A1 (en) 2024-05-30

Family

ID=85440540

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/133069 WO2024109772A1 (en) 2022-11-23 2023-11-21 Face posture estimation method and apparatus based on structured light system

Country Status (2)

Country Link
CN (1) CN115798000A (en)
WO (1) WO2024109772A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115798000A (en) * 2022-11-23 2023-03-14 中国科学院深圳先进技术研究院 Face pose estimation method and device based on structured light system
CN116311540B (en) * 2023-05-19 2023-08-08 深圳市江元科技(集团)有限公司 Human body posture scanning method, system and medium based on 3D structured light

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113544744A (en) * 2021-06-01 2021-10-22 华为技术有限公司 Head posture measuring method and device
CN113850865A (en) * 2021-09-26 2021-12-28 北京欧比邻科技有限公司 Human body posture positioning method and system based on binocular vision and storage medium
WO2022027912A1 (en) * 2020-08-05 2022-02-10 深圳市优必选科技股份有限公司 Face pose recognition method and apparatus, terminal device, and storage medium.
CN114333034A (en) * 2022-01-04 2022-04-12 广州虎牙科技有限公司 Face pose estimation method and device, electronic equipment and readable storage medium
CN115798000A (en) * 2022-11-23 2023-03-14 中国科学院深圳先进技术研究院 Face pose estimation method and device based on structured light system

Also Published As

Publication number Publication date
CN115798000A (en) 2023-03-14
