CN113239850B - Three-dimensional human face sign acquisition system and method - Google Patents

Three-dimensional human face sign acquisition system and method

Info

Publication number
CN113239850B
CN113239850B (application CN202110582384.2A)
Authority
CN
China
Prior art keywords
image
dimensional
human face
support rod
camera
Prior art date
Legal status
Active
Application number
CN202110582384.2A
Other languages
Chinese (zh)
Other versions
CN113239850A (en)
Inventor
刘凯
李学川
聂丹
王强
毛再生
杨刚
黄海鹏
Current Assignee
Wuhan Tianyuanshi Technology Co ltd
Original Assignee
Wuhan Tianyuanshi Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Wuhan Tianyuanshi Technology Co ltd
Priority to CN202110582384.2A
Publication of CN113239850A
Application granted
Publication of CN113239850B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/10 Image acquisition
    • G06V10/12 Details of acquisition arrangements; Constructional details thereof
    • G06V10/14 Optical characteristics of the device performing the acquisition or on the illumination arrangements
    • G06V10/147 Details of sensors, e.g. sensor lenses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30244 Camera pose
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Vascular Medicine (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

The invention discloses a three-dimensional human face sign acquisition system and method, belonging to the technical fields of image processing and human face three-dimensional modeling and recognition. The system comprises a photographic field, which is semi-annular and is used for acquiring multi-view portrait images; a coordinate system with a real scale is established through a photographic field calibration technology, and front and side images, key facial features and a human face three-dimensional model are acquired. The photographic field comprises a lens support, used for collecting photos of a person's face and upper body from different angles, and a back plate, used for keeping the background uniform and for position calibration. The method comprises the following steps: S1: calibrating the photographic field; S2: extracting and measuring face key points; S3: generating front and side images; S4: three-dimensional modeling of the human face. The collected data are highly compatible: a human face three-dimensional model, an ID-comparison thumbnail, face key point characteristic values, front and side images, height and other information can all be collected.

Description

Three-dimensional human face sign acquisition system and method
Technical Field
The invention relates to the technical fields of image processing and human face three-dimensional modeling and recognition, in particular to a three-dimensional human face sign acquisition system and method.
Background
With the deepening of informatization in fields such as criminal investigation and security, individual identification technologies are increasingly abundant, and the need to apply new technologies in actual practice is increasingly urgent. Undeniably, the collected data is the basis for identification and comparison. Judging from existing standardized personnel-information acquisition systems, the information acquired for a portrait generally comprises the identity, portrait photo, height, and front and side images of the collected person. The process and information application of traditional acquisition equipment have certain limitations, generally the following problems:
(1) Compared with a three-dimensional portrait model, a two-dimensional image is only a projection of the three-dimensional model at a specific angle and lacks real texture and geometric information, which can hamper information comparison and face recognition in actual case investigation;
(2) The real scale of key facial features, such as eye spacing, nose height and mouth width, cannot be acquired automatically;
(3) The degree of automation is low and acquisition is slow; the portrait photo, height, and front and side images cannot be acquired in one integrated pass, so the collected person must cooperate multiple times;
(4) The acquisition process is complicated, the technical demands on the operator are high, and the success rate of a single acquisition is low.
Traditional three-dimensional portrait acquisition generally scans the face region for a period of time with a handheld three-dimensional scanner. The collected person must remain still during scanning, the constraints are strict, operation is complex, efficiency is low, and the acquired three-dimensional model cannot truly reflect actual geometric scale information.
Disclosure of Invention
The invention aims to provide a three-dimensional human face sign acquisition system and method that can acquire a three-dimensional human face model, front and side images, height, key facial feature measurements and a portrait of the collected person.
In order to achieve the purpose, the technical scheme of the invention is as follows:
a three-dimensional human face sign acquisition system comprises a photographic field, wherein the photographic field is semi-annular and is used for acquiring images of multi-view human figures, a coordinate system with a real scale is established through a photographic field calibration technology, and information of two positive sides, human face key features and a human face three-dimensional model is acquired; the camera shooting field comprises a lens support and a back plate, wherein the lens support is used for collecting photos of the face and the upper body of a person from different angles, and the back plate is used for keeping the background single and calibrating the position;
the camera comprises a lower lens support and an upper lens support which are horizontally arranged, cameras are mounted on the lens supports, the lower lens support is 155cm away from the floor, the upper lens support is 175cm away from the floor, the lower lens support and the upper lens support respectively comprise a left support rod, a middle support rod and a right support rod, the rod length of each left support rod is 50cm, the included angle between each left support rod and the corresponding middle support rod and the included angle between each right support rod and the corresponding middle support rod are 120 degrees, 5 cameras are horizontally and uniformly arranged on the left support rod, the middle support rod and the right support rod, the distance between every two adjacent cameras on each rod is 10cm, the cameras on the left support rod and the right support rod, which are close to the two ends of the middle support rod, are respectively 5cm away from the left end and the right end of the middle support rod, a vertical support rod is further arranged on a connecting line of the middle cameras on the middle support rods of the lower lens support and the middle support rod of the upper lens support, 1 camera is vertically arranged on the middle support rod, a middle camera collecting space of the middle camera is arranged on the vertical support rod, a camera collecting line is connected with a host computer, and a camera collecting central point is connected with the host computer through an API collecting interface;
a field-shaped frame is arranged on a floor vertically corresponding to a position 35cm away from a middle camera on the vertical rod horizontally, and the size of the field-shaped frame is 30cm multiplied by 30cm and is used for a collected person to stand for taking a picture;
the back plate is vertically arranged on the floor 150cm horizontally from the middle camera on the vertical support rod. The back plate is pure white, 200cm high and 120cm wide, and carries two horizontal rows of frame marks, 2 marks per row, the 2 marks of each row being 3cm from the left and right edges of the back plate respectively; the lower row of frame marks is 174cm from the floor, and the upper row is 5cm above the lower row and 3cm from the upper edge of the back plate. Each frame mark measures 8cm × 8cm.
A three-dimensional human face sign acquisition method, applied to the above three-dimensional human face sign acquisition system, comprises the following steps:
S1: calibrating the photographic field, and acquiring the interior and exterior orientation parameters of all cameras in the photographic field;
S2: extracting and measuring face key points to obtain three-dimensional coordinate values of face sign data;
S3: generating front and side images;
S4: three-dimensional modeling of the human face.
Further, the photographic field calibration of step S1 specifically comprises: establishing an absolute space coordinate system; then, from the frame marks distributed at known spatial positions in the photographic field, solving the interior and exterior orientation parameters of all cameras in the photographic field at real spatial scale by means of an algorithm that automatically identifies the frame marks in the images and a calibration algorithm, thereby providing the image positioning parameters for the subsequent three-dimensional reconstruction of the face model and facial feature measurement.
Further, the algorithm for automatically identifying the frame marks in the images is a deep-learning detection algorithm based on yolov3: hundreds of photos containing the frame marks are labelled and an algorithm model is trained; after the initial position of a frame mark is detected, it is precisely matched against the frame-mark template by image correlation coefficient, giving the relation between the pixel coordinates of the frame-mark center and its real three-dimensional coordinates in space;
the calibration algorithm is used to determine the absolute position of each frame-mark center in the coordinate system and the image-plane coordinates at which the frame mark is seen by each lens.
Further, the face key point extraction and measurement of step S2 specifically comprises the following steps:
S201: image-based face key point detection, extracting the facial features through a model based on a deep convolutional network;
S202: forward intersection calculation: for images with known orientation parameters and the image-point coordinates of corresponding points across the images, the three-dimensional space coordinates of the object point are calculated by the ray-intersection principle, as follows:
P = K[R T]

λ[x' y' 1]^T = PX

x' = (P_0 X) / (P_2 X)

y' = (P_1 X) / (P_2 X)

(x'P_2 - P_0)X = 0
(y'P_2 - P_1)X = 0

where (x', y') = (u, v) are the pixel coordinates of the image point, K is the camera parameter matrix, R and T are the rotation and translation parameters of the image, X is the homogeneous object-space coordinate of the point, and P_0, P_1, P_2 are the 1st, 2nd and 3rd rows of the projection matrix P.
Further, the specific steps of generating front and side images in step S3 are:
S301: image portrait segmentation: a convolutional neural network based on deep learning is adopted; pictures of different people taken in the photographic field are collected and labelled, then trained to obtain the parameters integrated into the semantic segmentation program; finally a fully convolutional network performs person segmentation;
S302: height calculation: from the face key point coordinates computed in step S2, obtain the heights of the key points above the floor and their height difference; compute the ratio of this difference to the corresponding difference of image-plane y coordinates; from the image segmented in step S301, obtain the pixel position of the top of the collected person's head, and compute the person's actual height from the ratio;
S303: photo synthesis: according to the height of the collected person, select a photo taken by the corresponding camera on the vertical support, crop the image to the standard, and draw a scale at top and bottom and write the name in proportion.
Further, the three-dimensional face modeling of step S4 comprises the following specific steps:
S401: feature-point-based sparse reconstruction, used to automatically solve the shooting poses and camera parameters of the input images;
S402: image dense matching: according to the shooting poses and camera parameters of the images, complete automatic partitioning and multi-view dense matching to generate the dense spatial point cloud of the three-dimensional scene;
S403: spatial point cloud networking, used to fuse the spatial point cloud across views and build a triangular mesh, generating an initial value of the three-dimensional scene surface model;
S404: overall optimization of the three-dimensional model: finely adjust the surface model according to image consistency and visibility constraints, perform texture mapping from the original images, and finally generate a three-dimensional model meeting the user's requirements.
Further, step S401 specifically includes two-dimensional image feature tracking, three-dimensional pose estimation, bundle adjustment and optimization, and three-dimensional recovery, wherein the two-dimensional image feature tracking includes image visibility analysis, feature extraction and feature matching, and the three-dimensional pose estimation includes projection matrix estimation and basic matrix estimation.
Further, the specific steps of step S402 are: select a set of neighbouring images for each image according to its pose; then, from the initially acquired three-dimensional portrait contour, re-project onto the original image through the imaging model to form an initial depth map; and, using a dense matching method with added imaging-geometry constraints, generate and finely adjust the initial depth map of each image by confidence propagation to obtain the final depth map and generate the spatial point cloud.
Further, the specific steps of step S403 are: perform Delaunay triangulation of the three-dimensional discrete spatial point cloud obtained in step S402 to construct spatial Delaunay tetrahedra; calculate the visibility of the tetrahedra along the sight lines of the image points corresponding to the cloud points; and finally divide the Delaunay tetrahedra into an outer body and an inner body by a graph-cut method, the surface where the inner and outer bodies meet being the final spatial triangular mesh.
The invention has the following beneficial effects:
The three-dimensional human face sign acquisition system and method have a simple acquisition flow and a high acquisition speed: acquiring a single person takes only about 10 s. The compatibility of the collected data is high; a human face three-dimensional model, an ID-comparison thumbnail, face key point characteristic values, front and side images, height and other information can all be collected. The acquisition success rate is ≥ 99%, the 3D face and facial sign measurement precision is ≤ 2mm, and the height precision is ≤ 1cm. The 3D face model has realistic texture and small error, the generated front and side images meet the ministry-level standard and are produced in real time, the degree of automation is high, the equipment suits people of different heights and builds, occupies little space, and is easy to design and produce industrially.
Drawings
Fig. 1 is a schematic top view of a three-dimensional human face sign acquisition system according to embodiment 1 of the present invention;
fig. 2 is a schematic structural side view and a schematic structural front view of a three-dimensional human face sign acquisition system in embodiment 1 of the present invention;
FIG. 3 is a schematic view of a back plate structure in embodiment 1 of the present invention;
fig. 4 is a schematic structural view of a standing position for photographing a human body in embodiment 1 of the present invention;
FIG. 5 is a schematic flow chart in example 2 of the present invention;
fig. 6 is a schematic structural diagram of a visual shell in embodiment 2 of the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings and examples.
Example 1
As shown in fig. 1-2, a three-dimensional human face sign acquisition system includes a photographic field, which is semi-annular and is used for acquiring multi-view portrait images, establishing a coordinate system with a real scale by a photographic field calibration technology, and acquiring front and side images, key facial features and a human face three-dimensional model; the photographic field comprises a lens support, used for collecting pictures of a person's face and upper body from different angles, and a back plate, used for keeping the background uniform and for position calibration.
The lens support comprises a horizontally arranged lower lens support and upper lens support, both fitted with cameras; the lower lens support is 155cm from the floor and the upper lens support 175cm. Each lens support consists of a left support rod, a middle support rod and a right support rod, each 50cm long, with the left and right rods meeting the middle rod at an included angle of 120 degrees. Five cameras are horizontally and evenly arranged on each of the left, middle and right support rods, with 10cm between adjacent cameras on a rod; the cameras on the left and right support rods nearest the middle rod are 5cm from the left and right ends of the middle rod respectively. A vertical support rod is further arranged on the line connecting the middle cameras of the lower and upper middle support rods, and a column of cameras is vertically arranged on it. The central point of the photographic field lies 40cm horizontally from the middle camera on the vertical support rod, and all cameras face this central point.
The cameras provide an HTTP API for custom development. They are connected to the acquisition host through network cables and a router; the acquisition host sends photographing and download commands, obtaining and processing photos of the collected person's face and upper body from different angles.
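A minimal sketch of how the acquisition host might drive the cameras over such an HTTP API follows; the endpoint paths and IP addressing are assumptions for illustration only, since the patent states just that the cameras expose an HTTP API and are reached through network cables and a router.

```python
import concurrent.futures
import requests

CAMERA_IPS = [f"192.168.1.{100 + i}" for i in range(33)]  # assumed addressing

def capture(ip: str) -> bytes:
    """Trigger one camera, then download the shot (hypothetical endpoints)."""
    requests.post(f"http://{ip}/api/shoot", timeout=5)
    return requests.get(f"http://{ip}/api/latest.jpg", timeout=10).content

# Fire all 33 cameras from a thread pool so the exposures are near-simultaneous.
with concurrent.futures.ThreadPoolExecutor(max_workers=33) as pool:
    images = list(pool.map(capture, CAMERA_IPS))
```

Triggering all cameras concurrently keeps the 33 exposures close to simultaneous, which matters since the subject is only asked to hold still for roughly the 10 s acquisition window.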
Meanwhile, as shown in fig. 3, the back plate is vertically arranged on the floor 150cm horizontally from the middle camera on the vertical support rod. The back plate is pure white, 200cm high and 120cm wide, and carries two horizontal rows of frame marks, 2 marks per row, the 2 marks of each row being 3cm from the left and right edges of the back plate respectively; the lower row of frame marks is 174cm from the floor, and the upper row is 5cm above the lower row and 3cm from the upper edge of the back plate. Each frame mark measures 8cm × 8cm.
To avoid the face falling outside the image range during shooting, as shown in fig. 4, a square '田'-shaped frame is marked on the floor, vertically below the point 35cm horizontally from the middle camera on the vertical support rod; the frame measures 30cm × 30cm and marks where the collected person stands for photographing.
In this embodiment, the lens array of the whole system is 100cm wide, spans heights of 145cm-185cm, and the distance between the back plate and the lenses is about 100cm. The lens parameters are: focal length 36mm, resolution 1080 (wide) × 1920 (high), pixel size 2.9um, power supply 200W. There are 33 cameras: two rows, upper and lower, of 15 lenses each, plus 3 additional lenses on the front vertical rod. Adjacent lenses on each rod are 10cm apart; all lenses are fixed, kept horizontal and pointed at the central point, which is 40cm horizontally from the front lenses. The entrance through which the photographed person passes is 100cm wide; the person stands 40cm from the front lenses, facing the column of lenses on the vertical rod, to be photographed.
The whole system suits people of different heights and builds, occupies little space, and is easy to design industrially and mass-produce. Its data acquisition is highly compatible and automated: shooting covers multiple angles, the captured portrait has rich texture, high resolution and clear detail, the images fully cover the face, and information such as a human face three-dimensional model, an ID-comparison thumbnail, face key point characteristic values, front and side images and height can all be collected.
Example 2
As shown in fig. 5, a three-dimensional human face sign acquisition method applied to the three-dimensional human face sign acquisition system in embodiment 1 includes the following steps:
S1: calibrating the photographic field, and acquiring the interior and exterior orientation parameters of all cameras in the photographic field;
S2: extracting and measuring face key points to obtain three-dimensional coordinate values of face sign data;
S3: generating front and side images;
S4: three-dimensional modeling of the human face.
Specifically, the photographic field calibration of step S1 proceeds as follows: establish an absolute space coordinate system; automatically identify, in the images, the frame marks distributed at known spatial positions in the photographic field; and solve, through the calibration algorithm, the interior and exterior orientation parameters of all cameras in the photographic field at real spatial scale, thereby providing the image positioning parameters for the subsequent three-dimensional reconstruction of the face model and facial feature measurement.
The algorithm for automatically identifying the frame marks in the images is a deep-learning detection algorithm based on yolov3: hundreds of photos containing the frame marks are labelled and an algorithm model is trained; after the initial position of a frame mark is detected, it is precisely matched against the frame-mark template by image correlation coefficient, giving the relation between the pixel coordinates of the frame-mark center and its real three-dimensional coordinates in space. This precise correlation-coefficient matching against the template determines the frame-mark center position more accurately.
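A sketch of that refinement step, under the assumption that the yolov3 detector returns a coarse bounding box and that the correlation matching is a normalized correlation coefficient over a small search window (OpenCV's TM_CCOEFF_NORMED); function and variable names are illustrative:

```python
import cv2

def refine_mark_center(image_gray, box, template_gray, pad=10):
    """box: (x, y, w, h) coarse frame-mark box from the yolov3 detector;
    template_gray: the frame-mark template image; returns the mark center."""
    x, y, w, h = box
    x0, y0 = max(x - pad, 0), max(y - pad, 0)
    roi = image_gray[y0:y + h + pad, x0:x + w + pad]   # small search window
    score = cv2.matchTemplate(roi, template_gray, cv2.TM_CCOEFF_NORMED)
    _, _, _, best = cv2.minMaxLoc(score)               # (x, y) of best match
    cx = x0 + best[0] + template_gray.shape[1] / 2.0
    cy = y0 + best[1] + template_gray.shape[0] / 2.0
    return cx, cy  # pixel coordinates of the frame-mark center
```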
The calibration algorithm is used to determine the absolute position of each frame-mark center in the coordinate system and the image-plane coordinates at which the frame mark is seen by each lens.
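Given the detected mark centers and their known absolute positions, one camera's exterior orientation can be solved as a standard PnP problem. The sketch below uses OpenCV's solvePnP as a stand-in for the patent's unspecified calibration algorithm:

```python
import cv2
import numpy as np

def calibrate_exterior(object_pts, image_pts, K):
    """object_pts: (N, 3) known frame-mark centers in the absolute field
    coordinate system; image_pts: (N, 2) detected centers in pixels;
    K: pre-calibrated 3x3 intrinsic matrix. Needs N >= 4 marks."""
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(object_pts, dtype=np.float64),
        np.asarray(image_pts, dtype=np.float64),
        np.asarray(K, dtype=np.float64),
        None,  # distortion coefficients, assumed corrected beforehand
    )
    R, _ = cv2.Rodrigues(rvec)      # 3x3 rotation (exterior orientation)
    P = K @ np.hstack([R, tvec])    # 3x4 projection matrix used below
    return R, tvec, P
```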
The specific steps of the face key point extraction and measurement in step S2 are:
S201: image-based face key point detection, extracting the facial features through a model based on a deep convolutional network;
S202: forward intersection calculation: for images with known orientation parameters and the image-point coordinates of corresponding points across the images, the three-dimensional space coordinates of the object point are calculated by the ray-intersection principle, as follows:
P = K[R T]

λ[x' y' 1]^T = PX

x' = (P_0 X) / (P_2 X)

y' = (P_1 X) / (P_2 X)

(x'P_2 - P_0)X = 0
(y'P_2 - P_1)X = 0

where (x', y') = (u, v) are the pixel coordinates of the image point, K is the camera parameter matrix, R and T are the rotation and translation parameters of the image, X is the homogeneous object-space coordinate of the point, and P_0, P_1, P_2 are the 1st, 2nd and 3rd rows of the projection matrix P.
For one object-space point, each image point observing it contributes these two equations; stacking the equations from several image points yields a system of the form AX = 0, from which the object-space coordinate X can be solved (for example, as the right singular vector of A associated with the smallest singular value).
The computer vision field prefers to formulate this with the cross product, and forward intersection can likewise be solved that way. Specifically, for multiple observations of the object point there are the following projection equations:
x_1 = P_1 X
x_2 = P_2 X
⋮
x_n = P_n X
before calculation, the coordinate of the camera coordinate system of each image point is normalized, so | x | 1 || 2 And =1. For each observation projection equation, it can be converted into the following form:
[x_1]_× P_1 X = 0
[x_1]_× [x_1]_× P_1 X = 0
(x_1 x_1^T - ||x_1||^2 I) P_1 X = 0
(P_1 - x_1 x_1^T P_1) X = 0
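Both formulations lead to a homogeneous linear system per point. A minimal sketch of the forward intersection, stacking the (x'P_2 - P_0) and (y'P_2 - P_1) rows from all views and taking the SVD null vector:

```python
import numpy as np

def triangulate(points_2d, projections):
    """points_2d: list of (x', y') image coordinates of one object point;
    projections: list of matching 3x4 projection matrices P = K[R T]."""
    rows = []
    for (x, y), P in zip(points_2d, projections):
        rows.append(x * P[2] - P[0])  # (x'P2 - P0) X = 0
        rows.append(y * P[2] - P[1])  # (y'P2 - P1) X = 0
    A = np.stack(rows)                # stacking all views gives A X = 0
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                        # null vector: smallest singular value
    return X[:3] / X[3]               # dehomogenize to object coordinates
```

With the calibrated projection matrices of the cameras, each detected face key point triangulated this way yields the 3D coordinate from which eye spacing, nose height, mouth width and similar measurements follow directly.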
the human face sign key point detection means that key region positions of a human face, including spatial positions of eyebrows, eyes, a nose, a mouth, two contours and the like, are positioned on a human face image, and the structure and the shape of each organ of the human face are different, so that the face of each person has different characteristics, and therefore, in the aspect of quantitative evaluation, the main measurement standard is the deviation between the key point position acquired by an algorithm and a real key point position; when evaluating the deviation, since the sizes of the faces of different images are different, a normalization strategy needs to be applied to the data for comparison under the same scale. In the embodiment, the model based on the deep convolutional network is adopted to extract the facial features of the human face, so that high accuracy can be realized, the accuracy of the test set LFW exceeds the accuracy of manual identification (the accuracy of human identification is 95%), and the latest research realizes the accuracy of 99.83% of the identification on the LFW. On a computer with conventional configuration, the model calculation speed can achieve the effect of quasi-real time.
The human face is essentially a three-dimensional model. Once the interior and exterior orientation parameters of all cameras in the photographic field are determined, the three-dimensional coordinate values of the facial sign data are obtained by computing corresponding face key points across multiple images and performing forward intersection, serving the subsequent measurements.
The specific steps of generating front and side images in step S3 are:
S301: image portrait segmentation: a convolutional neural network based on deep learning is adopted; pictures of different people taken in the photographic field are collected and labelled, then trained to obtain the parameters integrated into the semantic segmentation program; finally a fully convolutional network performs person segmentation;
S302: height calculation: from the face key point coordinates computed in step S2, obtain the heights of the key points above the floor and their height difference; compute the ratio of this difference to the corresponding difference of image-plane y coordinates; from the image segmented in step S301, obtain the pixel position of the top of the collected person's head, and compute the person's actual height from the ratio (a code sketch of this computation follows below);
S303: photo synthesis: according to the height of the collected person, select a photo taken by the corresponding camera on the vertical support, crop the image to the standard, and draw a scale at top and bottom and write the name in proportion.
The front and side image generation performs person segmentation with a fully convolutional network; the network structure is optimized so that processing is fast while precision stays high, and fine gaps between body parts can be segmented. The semantic segmentation program processes the received images as a background service, to balance program loading speed against the speed of the segmentation step. When synthesizing the photo, the picture taken by the appropriate camera on the vertical support is selected, avoiding an apparent looking-down or looking-up pose.
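The patent's own network and trained parameters are not published; as an illustration only, a minimal segmentation stage could be sketched with torchvision's FCN (a fully convolutional network of the same family), with weights assumed to come from training on labelled photos taken in the photographic field:

```python
import torch
from torchvision.models.segmentation import fcn_resnet50

# Two classes: background and person. weights=None because the patent's own
# trained parameters are not published.
model = fcn_resnet50(weights=None, num_classes=2)
model.eval()

def segment(image_tensor: torch.Tensor) -> torch.Tensor:
    """image_tensor: normalized (3, H, W) float tensor of one photo."""
    with torch.no_grad():
        logits = model(image_tensor.unsqueeze(0))["out"]  # (1, 2, H, W)
    return logits.argmax(dim=1)[0]  # (H, W) label mask, 1 = person
```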
It should be noted that the cropping standard and labelling proportions of the synthesized photo follow the document "Technical requirements and collection criteria for digital photos of criminal suspects".
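Returning to S302, a sketch of the height computation, assuming (as the fixed standing frame makes plausible) that the subject stands roughly fronto-parallel to the camera so a single cm-per-pixel ratio applies along the image y axis; names are illustrative:

```python
import numpy as np

def estimate_height(z_cm, y_px, mask):
    """z_cm: real heights above the floor (cm) of two face key points from S2;
    y_px: their image-plane y coordinates (px, y growing downwards);
    mask: (H, W) boolean person mask from the segmentation step."""
    cm_per_px = (z_cm[0] - z_cm[1]) / (y_px[1] - y_px[0])  # scale along y
    head_top = int(np.argmax(mask.any(axis=1)))  # first image row with person
    # extrapolate upwards from key point 0 to the head-top row
    return z_cm[0] + (y_px[0] - head_top) * cm_per_px
```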
The specific steps of the face three-dimensional modeling in step S4 are:
S401: feature-point-based sparse reconstruction, used to automatically solve the shooting poses and camera parameters of the input images;
S402: image dense matching: according to the shooting poses and camera parameters of the images, complete automatic partitioning and multi-view dense matching to generate the dense spatial point cloud of the three-dimensional scene;
S403: spatial point cloud networking, used to fuse the spatial point cloud across views and build a triangular mesh, generating an initial value of the three-dimensional scene surface model;
S404: overall optimization of the three-dimensional model: finely adjust the surface model according to image consistency and visibility constraints, perform texture mapping from the original images, and finally generate a three-dimensional model meeting the user's requirements.
Among these steps, step S401 specifically includes two-dimensional image feature tracking (2D feature tracking), three-dimensional pose estimation, bundle adjustment and optimization, and three-dimensional recovery. Two-dimensional image feature tracking comprises image visibility analysis, feature extraction and feature matching; it corresponds to the automatic point-transfer process in aerial triangulation, i.e., finding corresponding points between images by feature matching. Three-dimensional pose estimation comprises projection matrix estimation (corresponding to single-image spatial resection in photogrammetry) and basic matrix estimation (corresponding to relative orientation in photogrammetry), which provides initial values for the block adjustment in aerial triangulation. The bundle adjustment is mainly bundle adjustment with camera-parameter self-calibration under controlled conditions.
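For one image pair, the relative-orientation step can be sketched with OpenCV's essential-matrix routines; this is a generic stand-in consistent with the description above, not the patent's exact implementation:

```python
import cv2
import numpy as np

def relative_pose(pts1, pts2, K):
    """pts1, pts2: (N, 2) float arrays of matched pixel coordinates in two
    images; K: shared 3x3 intrinsic matrix."""
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t  # pose of camera 2 relative to camera 1; t is up to scale
```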
The specific process of step S402 is: select a set of neighbouring images for each image according to its pose; then, from the initially acquired three-dimensional portrait contour, re-project onto the original image through the imaging model to form an initial depth map; and, using a dense matching method with added imaging-geometry constraints, generate and finely adjust the initial depth map of each image by confidence propagation to obtain the final depth map and generate the spatial point cloud.
The multi-angle visibility conditions provided by the annular photographic field and the portrait contour information obtained by image segmentation are introduced into the overall optimization model of image dense matching, enabling three-dimensional portrait reconstruction from weak-texture images.
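The confidence-propagation matcher itself is not spelled out in the patent; as a classical stand-in, an initial depth map for a rectified pair from two neighbouring cameras can be sketched with semi-global matching:

```python
import cv2

# Semi-global matching over a rectified pair from two neighbouring cameras.
sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5)

def initial_depth(left_gray, right_gray, fx, baseline_cm):
    """left_gray/right_gray: rectified 8-bit images; fx: focal length in px;
    baseline_cm: distance between the two camera centers (10 cm on one rod)."""
    disp = sgbm.compute(left_gray, right_gray).astype("float32") / 16.0
    disp[disp <= 0] = float("nan")  # mark invalid matches
    return fx * baseline_cm / disp  # per-pixel depth in cm
```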
The specific process of step S403 is: perform Delaunay triangulation of the three-dimensional discrete spatial point cloud obtained in step S402 to construct spatial Delaunay tetrahedra; calculate the visibility of the tetrahedra along the sight lines of the image points corresponding to the cloud points; and finally divide the tetrahedra into an outer body and an inner body by a graph-cut method, the surface where the inner and outer bodies meet being the final spatial triangular mesh. The spatial point cloud is further cleaned by automatically detecting and deleting mismatches in occluded regions, according to visibility between image points and between spatial points and the multi-view images.
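A minimal sketch of the networking step with SciPy's Delaunay tetrahedralization; the visibility-driven graph cut is only indicated in comments, since it requires a max-flow solver and the ray-casting bookkeeping the patent summarizes:

```python
import numpy as np
from scipy.spatial import Delaunay

points = np.random.rand(1000, 3)  # stand-in for the dense spatial point cloud
tets = Delaunay(points)           # tets.simplices: (M, 4) tetrahedron indices

# Each sight ray from a camera center to a point it observed crosses
# tetrahedra that must be labelled "outside"; a minimum graph cut over the
# tetrahedron adjacency graph (e.g. with PyMaxflow) separates outside from
# inside, and the triangles shared by an inside and an outside tetrahedron
# form the output surface mesh.
```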
In the overall optimization of the three-dimensional model in step S404, a key step is discretizing the spatially continuous surface, so that adjusting the triangular mesh points in three-dimensional space becomes an accumulation of the optical-flow-field contributions of a series of stereo pairs. The energy functional for the overall optimization mainly consists of a data term built from image consistency and a smoothness term built from the curvature constraints of the three-dimensional surface. The energy functional can be optimized globally by a convex optimization method.
To overcome the influence of large weak-texture areas in the photographic images of the human body on matching, the camera positions are fixed and the calibrated interior and exterior orientation elements are reused; before image matching, a rough spatial three-dimensional surface model of the photographed target is obtained as the initial value of image matching, by segmenting the human contour and using a visual hull (Visual Hull) to obtain the three-dimensional surface model of the photographed person in space.
The visual hull is the convex hull of an object determined by all its known silhouette contours in space. As shown in fig. 6, when an object is observed by perspective projection from multiple viewpoints, each viewpoint yields a silhouette contour of the object; together with the corresponding center of perspective projection, this contour determines a cone in three-dimensional space within which the object must lie. The cones determined by all known silhouette contours and their projection centers intersect to form a convex hull containing the object: this is its visual hull. In most cases, the visual hull of a spatial object is a reasonable approximation of the object.
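A minimal voxel-carving sketch of the visual hull under the stated setup: a voxel is kept only if its projection lands on the silhouette in every view. Grid bounds and resolution are illustrative; the projection matrices P come from the photographic field calibration.

```python
import numpy as np

def visual_hull(silhouettes, projections, lo, hi, res=64):
    """silhouettes: list of (H, W) boolean person masks; projections: list of
    3x4 P matrices; lo, hi: 3-vectors bounding the standing volume."""
    axes = [np.linspace(lo[i], hi[i], res) for i in range(3)]
    X, Y, Z = np.meshgrid(*axes, indexing="ij")
    pts = np.stack([X, Y, Z, np.ones_like(X)]).reshape(4, -1)  # homogeneous
    inside = np.ones(pts.shape[1], dtype=bool)
    for mask, P in zip(silhouettes, projections):
        uvw = P @ pts
        u = np.round(uvw[0] / uvw[2]).astype(int)
        v = np.round(uvw[1] / uvw[2]).astype(int)
        h, w = mask.shape
        ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        ok[ok] &= mask[v[ok], u[ok]]        # must land on the silhouette
        inside &= ok                        # carve away voxels outside any cone
    return inside.reshape(res, res, res)    # boolean occupancy grid
```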
By this method, the acquisition success rate can be guaranteed to be ≥ 99%, the 3D face and facial sign measurement precision ≤ 2mm and the height precision ≤ 1cm; the 3D face model has realistic texture and small error; and the generated front and side images meet the ministry-level standard, are produced in real time, and are highly automated.

Claims (10)

1. A three-dimensional human face sign acquisition system comprising a photographic field, characterized in that the photographic field is semi-annular and is used for acquiring multi-view portrait images, establishing a coordinate system with a real scale through a photographic field calibration technology, and acquiring front and side face images, key facial features and human face three-dimensional model information; the photographic field comprises a lens support, used for cameras to collect photos of a person's face and upper body from different angles, and a back plate, used for keeping the background uniform and for position calibration;
the camera comprises a lower lens support and an upper lens support which are horizontally arranged, cameras are mounted on the lens supports, the lower lens support is 155cm away from the floor, the upper lens support is 175cm away from the floor, the lower lens support and the upper lens support respectively comprise a left support rod, a middle support rod and a right support rod, the rod length of each left support rod is 50cm, the included angle between each left support rod and the corresponding middle support rod and the included angle between each right support rod and the corresponding middle support rod are 120 degrees, 5 cameras are horizontally and uniformly arranged on the left support rod, the middle support rod and the right support rod, the distance between every two adjacent cameras on each rod is 10cm, the cameras on the left support rod and the right support rod, which are close to the two ends of the middle support rod, are respectively 5cm away from the left end and the right end of the middle support rod, a vertical support rod is further arranged on a connecting line of the middle cameras on the middle support rods of the lower lens support and the middle support rod of the upper lens support, 1 camera is vertically arranged on the middle support rod, a middle camera collecting space of the middle camera is arranged on the vertical support rod, a camera collecting line is connected with a host computer, and a camera collecting central point is connected with the host computer through an API collecting interface;
a square '田'-shaped frame is marked on the floor, vertically below the point 35cm horizontally from the middle camera on the vertical support rod; the frame measures 30cm × 30cm and marks where the collected person stands for photographing;
the back plate is vertically arranged on the floor 150cm horizontally from the middle camera on the vertical support rod. The back plate is pure white, 200cm high and 120cm wide, and carries two horizontal rows of frame marks, 2 marks per row, the 2 marks of each row being 3cm from the left and right edges of the back plate respectively; the lower row of frame marks is 174cm from the floor, and the upper row is 5cm above the lower row and 3cm from the upper edge of the back plate. Each frame mark measures 8cm × 8cm.
2. A three-dimensional human face sign acquisition method, applied to the three-dimensional human face sign acquisition system of claim 1, characterized by comprising the following steps:
S1: calibrating the photographic field, and acquiring the interior and exterior orientation parameters of all cameras in the photographic field;
S2: extracting and measuring face key points to obtain three-dimensional coordinate values of face sign data;
S3: generating front and side images;
S4: three-dimensional modeling of the human face.
3. The method for acquiring three-dimensional human face signs according to claim 2, wherein the photographic field calibration in step S1 specifically comprises: establishing an absolute space coordinate system; then, from the frame marks distributed at known spatial positions in the photographic field, solving the interior and exterior orientation parameters of all cameras in the photographic field at real spatial scale by means of an algorithm that automatically identifies the frame marks in the images and a calibration algorithm, thereby providing the image positioning parameters for the subsequent three-dimensional reconstruction of the face model and facial feature measurement.
4. The method for acquiring three-dimensional human face signs according to claim 3, wherein the algorithm for automatically identifying the frame marks in the images is a yolov3-based deep-learning detection algorithm: hundreds of photos containing the frame marks are labelled and an algorithm model is trained; after the initial position of a frame mark is detected, it is precisely matched against the frame-mark template by image correlation coefficient, giving the relation between the pixel coordinates of the frame-mark center and its real three-dimensional coordinates in space;
the calibration algorithm is used to determine the absolute position of each frame-mark center in the coordinate system and the image-plane coordinates at which the frame mark is seen by each lens.
5. The method for acquiring three-dimensional human face signs according to claim 2, wherein the face key point extraction and measurement in step S2 specifically comprises:
S201: image-based face key point detection, extracting the facial features through a model based on a deep convolutional network;
S202: forward intersection calculation: for images with known orientation parameters and the image-point coordinates of corresponding points across the images, the three-dimensional space coordinates of the object point are calculated by the ray-intersection principle, as follows:
P = K[R T]

λ[x' y' 1]^T = PX

x' = (P_0 X) / (P_2 X)

y' = (P_1 X) / (P_2 X)

(x'P_2 - P_0)X = 0
(y'P_2 - P_1)X = 0

where (x', y') = (u, v) are the pixel coordinates of the image point, K is the camera parameter matrix, R and T are the rotation and translation parameters of the image, X is the homogeneous object-space coordinate of the point, and P_0, P_1, P_2 are the 1st, 2nd and 3rd rows of the projection matrix P.
6. The three-dimensional human face sign acquisition method according to claim 2, wherein the specific steps of generating front and side images in step S3 are:
S301: image portrait segmentation: a convolutional neural network based on deep learning is adopted; pictures of different people taken in the photographic field are collected and labelled, then trained to obtain the parameters integrated into the semantic segmentation program; finally a fully convolutional network performs person segmentation;
S302: height calculation: from the face key point coordinates computed in step S2, obtain the heights of the key points above the floor and their height difference; compute the ratio of this difference to the corresponding difference of image-plane y coordinates; from the image segmented in step S301, obtain the pixel position of the top of the collected person's head, and compute the person's actual height from the ratio;
S303: photo synthesis: according to the height of the collected person, select a photo taken by the corresponding camera on the vertical support, crop the image to the standard, and draw a scale at top and bottom and write the name in proportion.
7. The method for acquiring three-dimensional human face signs according to claim 2, wherein the three-dimensional modeling of the human face in step S4 specifically comprises:
S401: feature-point-based sparse reconstruction, used to automatically solve the shooting poses and camera parameters of the input images;
S402: image dense matching: according to the shooting poses and camera parameters of the images, complete automatic partitioning and multi-view dense matching to generate the dense spatial point cloud of the three-dimensional scene;
S403: spatial point cloud networking, used to fuse the spatial point cloud across views and build a triangular mesh, generating an initial value of the three-dimensional scene surface model;
S404: overall optimization of the three-dimensional model: finely adjust the surface model according to image consistency and visibility constraints, perform texture mapping from the original images, and finally generate a three-dimensional model meeting the user's requirements.
8. The three-dimensional human face sign acquisition method according to claim 7, wherein step S401 specifically includes two-dimensional image feature tracking, three-dimensional pose estimation, bundle adjustment and optimization, and three-dimensional recovery, wherein the two-dimensional image feature tracking includes image visibility analysis, feature extraction and feature matching, and the three-dimensional pose estimation includes projection matrix estimation and basic matrix estimation.
9. The method for acquiring three-dimensional human face signs according to claim 7, wherein the specific steps of step S402 are: selecting a set of neighbouring images for each image according to its pose; then, from the initially acquired three-dimensional portrait contour, re-projecting onto the original image through the imaging model to form an initial depth map; and, using a dense matching method with added imaging-geometry constraints, generating and finely adjusting the initial depth map of each image by confidence propagation to obtain the final depth map and generate the spatial point cloud.
10. The method for acquiring three-dimensional human face signs according to claim 7, wherein the specific steps of step S403 are: performing Delaunay triangulation of the three-dimensional discrete spatial point cloud obtained in step S402 to construct spatial Delaunay tetrahedra; calculating the visibility of the tetrahedra along the sight lines of the image points corresponding to the cloud points; and finally dividing the tetrahedra into an outer body and an inner body by a graph-cut method, the surface where the inner and outer bodies meet being the final spatial triangular mesh.
CN202110582384.2A 2021-05-27 2021-05-27 Three-dimensional human face sign acquisition system and method Active CN113239850B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110582384.2A CN113239850B (en) 2021-05-27 2021-05-27 Three-dimensional human face sign acquisition system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110582384.2A CN113239850B (en) 2021-05-27 2021-05-27 Three-dimensional human face sign acquisition system and method

Publications (2)

Publication Number Publication Date
CN113239850A CN113239850A (en) 2021-08-10
CN113239850B (en) 2023-04-07

Family

ID=77139076

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110582384.2A Active CN113239850B (en) 2021-05-27 2021-05-27 Three-dimensional human face sign acquisition system and method

Country Status (1)

Country Link
CN (1) CN113239850B (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103530599B (en) * 2013-04-17 2017-10-24 Tcl集团股份有限公司 The detection method and system of a kind of real human face and picture face
CN108765537A (en) * 2018-06-04 2018-11-06 北京旷视科技有限公司 A kind of processing method of image, device, electronic equipment and computer-readable medium
CN108898665A (en) * 2018-06-15 2018-11-27 上饶市中科院云计算中心大数据研究院 Three-dimensional facial reconstruction method, device, equipment and computer readable storage medium
CN109285215B (en) * 2018-08-28 2021-01-08 腾讯科技(深圳)有限公司 Human body three-dimensional model reconstruction method and device and storage medium

Also Published As

Publication number Publication date
CN113239850A (en) 2021-08-10


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant