CN112562059B

CN112562059B - Automatic structured light pattern design method

Info

Publication number: CN112562059B
Application number: CN202011327373.1A
Authority: CN
Inventors: 杨涛; 彭磊; 刘超; 刘青峰; 李欢欢; 周翔
Original assignee: Gedian Technology Shenzhen Co ltd
Current assignee: Gedian Technology Shenzhen Co ltd
Priority date: 2020-11-24
Filing date: 2020-11-24
Publication date: 2023-12-08
Anticipated expiration: 2040-11-24
Also published as: CN112562059A

Abstract

The invention discloses an automatic structured light pattern design method, which uses artificial intelligence technology to automatically design an optimal structured light pattern by constructing a conductive numerical simulation method. Which comprises the following steps: determining a primary constraint of the system; determining a use scene of the system and collecting corresponding three-dimensional data; constructing a guided structured light three-dimensional imaging simulation environment; constructing a guided structured light generation and decoding algorithm; training to obtain an optimized structured light coding pattern; an optimized structured light encoding pattern is deployed. Compared with the traditional artificial structured light pattern, the designed structured light pattern is more optimized and has higher imaging precision; the invention can automatically complete design work by using a computer; the three-dimensional imaging scheme can be customized according to the use scene and the system constraint, and the system has better flexibility and expandability.

Description

Automatic structured light pattern design method

Technical field:

the invention relates to an automatic structured light pattern design method, which mainly uses differential geometry and machine learning technology and uses artificial intelligence to design a decoding algorithm of structured light coding so as to realize an optimized three-dimensional imaging scheme. The invention belongs to the field of optical three-dimensional measurement and artificial intelligence.

The background technology is as follows:

the structured light uses the principle of 'triangular imaging', the specially designed structured light pattern is projected to the surface of an object through a projection device, a camera shoots a deformed structured light image from another angle, and then three-dimensional information of a scene is reconstructed through a corresponding demodulation or matching algorithm. Structured light techniques are classified into point structured light, line structured light and surface structured light according to the form of the projected structured light. The advantage of the point and line structured light is high robustness, and the disadvantage is obvious: both methods are very inefficient. Correspondingly, they are thus often used for specific measurement scenarios, such as industry. In contrast, the high efficiency of surface structured light makes its use scenario broader. The common structured light can be classified into binary codes, gray codes and color codes according to the gray scale used for the projection pattern thereof, as shown in fig. 1. Binary coding uses black and white two-level gray scale to carry out graphic feature coding, and is characterized by high robustness but generally poor precision. One of the most popular structured light coding modes at present is binary pseudo-random lattice coding, and Microsoft Kinect V1, face ID of apple company, and the RealSense series of Intel all use the coding modes of the type. Gray scale is used as a characteristic for encoding, and compared with binary image characteristic encoding, the gray scale is used as a characteristic, so that the characteristic is denser, the imaging precision is greatly improved, and the defect that the gray scale encoding is difficult to realize single-frame three-dimensional imaging and is interfered by the reflectivity of an object to a certain extent is overcome. Among gray scale coding techniques, the sine and cosine phase coding is the most popular, and the developed fourier transform profilometry and phase shift profilometry are the most developed and used techniques in high-precision three-dimensional imaging. Color coding can use both graphics and gray information for multi-channel coding. The most densely coded features are most advantageous in view of the fact that they are the least applicable solution, due to the high demands on the projection device and the severe interference of the object texture colors. Of course, the other two schemes can well utilize infrared wave bands to perform noninductive imaging due to the fact that the wave bands are limited, so that the scheme is favored.

These conventional structured light codes described above all rely on manual design. Humans create graphic feature codes based on their own feature logic to assist in matching, such as pseudo-random lattice codes. Fringe projection structured light using sine and cosine features is also created, and such signals can be extracted more easily in the frequency domain. Structured light has evolved to the present day, all without departing from the limitations of human characteristic thinking, and these structured light codes, which were designed in the past, were not sufficiently "rational" and could be more optimized, which is a major problem addressed by the present invention.

In many fields of computer graphics, computer vision and machine learning, the computation of differentiation is increasingly important. In particular, there is an urgent need for machine learning algorithms that require back-propagation calculations, and for rendering algorithms that are minimally capable of arbitrary input parameters (e.g., camera position and orientation, scene geometry, lighting, materials, etc.). Previous conductive rendering studies have focused mainly on fast approximation solutions using simpler rendering models that deal only with the primary factors related to visibility, and ignore secondary factors such as shadows and indirect light. Solutions for diffusely reflecting surfaces are difficult to generalize to models of arbitrary materials. The Ramamoorthi et al work is again only minimally related to the coordinates of the rendered image (reference 1). Recently, methods of constructing a "micro-renderable layer" using deep learning techniques have begun to become popular, however they are only aimed at specific purposes and cannot deal with discontinuous geometries such as illumination and occlusion relationships (reference 2). Tzu-Mao Li et al developed a first physical engine-based micro-renderable framework (reference 3) that was micro-capable of arbitrary input parameters to the engine, pushing the differential geometry into a new phase, enabling differential geometry-based structured light design work.

In architecture, the conventional structured light cannot be improved by a learning method because it is forward-calculated, and cannot be optimized using a back-propagation algorithm, as shown in fig. 2. Recently developed structured light algorithm based on machine learning uses a neural network capable of back propagation in the structured light demodulation stage, so that the structured light algorithm can learn and promote from data through machine learning and is a current research hotspot. However, in both of these architectures, the coding scheme using artificial intelligence to design structured light cannot be implemented.

The invention mainly combines the latest differential geometry technology and machine learning technology to realize the three-dimensional imaging of the total-flow-path-guided probability structured light, namely, the three-dimensional imaging scheme of the structured light can be realized by utilizing the artificial intelligence technology by counter-propagating the structured light from the generation, the projection and the demodulation of the structured light.

The invention comprises the following steps:

the invention aims to provide an automatic structured light pattern design method to solve the problem that traditional structured light depends on manual design.

An automated structured light pattern design method comprising the steps of:

determining a primary constraint of the system;

secondly, determining a use scene of the system and collecting corresponding three-dimensional data;

thirdly, constructing a guided structured light three-dimensional imaging simulation environment;

(IV) constructing a guided structured light generation and decoding algorithm;

training to obtain an optimized structured light coding pattern;

and (six) deploying an optimized structured light coding pattern.

In the step (one), first, a main constraint of the three-dimensional imaging system is determined, wherein the main constraint includes, but is not limited to:

1) Dimension of the projection system. Such as one-dimensional encoding or two-dimensional feature encoding. The stripe structured light is a one-dimensional feature and the pseudo-random lattice is a two-dimensional feature.

2) Number of channels of the projection system. The infrared projection system is single-channel and the color projection system is three-channel.

3) Optical properties of the projection system. The optical system is low-pass and cannot project features with too high a spatial frequency.

4) Gray scale of projection system. Binary features tend to be more robust and gray scale features can carry denser coding information. Different gray scale intervals have different requirements on the gray scale resolution of the system.

5) Linear characteristics of the projection system. The linear characteristics of the projection system may produce additional modulation of the output structured light characteristics.

6) Precision requirements and speed requirements.

According to the above constraint, the coding scheme is preliminarily determined.

In the step (II), the "use scenario" includes but is not limited to

1) Shape characteristics of a three-dimensional imaged object. Whether the imaged scene is a fixed object or a plurality of objects, such as three-dimensional imaging of a human face is aimed at a fixed-shape scene; whether a continuous or discontinuous surface.

2) Texture features of a three-dimensional imaging object. Whether the texture is monochromatic or multi-colored, whether the reflective features are diffuse or specular, or a combination thereof.

The above features require the input of the corresponding network by using training data as a carrier. A large amount of three-dimensional data that can represent an actual usage scenario needs to be collected.

The "three-dimensional data" may be point cloud data, mesh data, or curved surface data, and its common format includes, but is not limited to obj, stl, ply, stp.

The source of the three-dimensional data can be obtained by scanning through a scanner, can be automatically or manually synthesized through a computer, can be modified on the basis of the data, and can be any form of three-dimensional data obtained by other ways.

The three-dimensional data can be in the form of a single file, or can be a combined scene of a plurality of files in different scales and position postures in a three-dimensional space.

The "three-dimensional data" may be static, may be dynamically changing in position and posture over time, may be changing in shape over time, or may be changing in position and posture over time.

The "three-dimensional data" uses data enhancement methods to expand the data set, including but not limited to: color, reflectance.

In the step (III):

the "training environment" is a conductive physical rendering simulation engine.

The training environment may contain one rendering scene or may contain multiple rendering scenes in parallel.

The rendered scene comprises at least one three-dimensional imaging system and one three-dimensional object, as in fig. 3.

The three-dimensional imaging system comprises at least one camera and one structured light projector, as shown in fig. 3. The camera is a virtual camera having similar functions and parameters as a physical world camera, except that the image taken by the camera can differentiate any parameter in the scene. The structured light projector is a virtual active light source that has similar functions and parameters as an object world projection device, except that the projected image is steerable for any parameter in the scene.

In the three-dimensional imaging system, a fixed or variable included angle and a fixed or variable relative pose are arranged between the virtual camera and the structured light projector.

The relative position between the three-dimensional imaging system and the three-dimensional object is fixed or varies with time.

The virtual camera needs to set resolution, gray scale output or color output, distortion, focal length, and format and bit number of data according to actual usage scenario. To ensure that the simulation process approaches the actual use scenario.

The structured light projector needs to set attributes such as dimension, gray scale, low pass, distortion and the like according to an actual use scene so as to ensure that the simulation process is close to the actual use scene.

The virtual camera has a fixed angle of view, or a time-varying angle of view.

The rendered scene contains different types and numbers of ambient light sources, the number and types of ambient light sources and parameters (intensity, hue, direction, etc.) being fixed or varying over time to simulate ambient light disturbances in an actual usage scene.

The step (IV) is as shown in FIG. 4, and comprises the following substeps:

1) Generating a guided structured light encoding pattern using a structured light generation algorithm

The guided structured light coding pattern is generated by a neural network to generate a series of structured light generation parameters, and then the structured light coding pattern is calculated and generated by using the parameters. The algorithm used in the computational generation process is straightforward. The structured light generating parameter may be frequency, phase, etc. of sine and cosine structured light, or may be parameters such as coding bit number of pseudo-random lattice and dot size.

In another embodiment of the invention. The guided structured light generation algorithm directly generates a structured light pattern. The guided algorithm module may be a neural network, or may be other forms of learning-based methods.

The structured light generation algorithm takes a fixed random vector as input to generate a fixed resolution structured light pattern. The generation module needs to perform the constraint of the structured light according to the system constraint in the step (one).

2) Projecting the generated structural pattern onto the surface of the three-dimensional object to be measured using a structured light projector

The structured light projector takes the generated structured light pattern as an input, and projects the pattern onto the surface of the object, and performs mixed rendering with the texture of the object, ambient light, and the like.

The structured-light pattern is an image or a set of images projected by a fixed projector.

In another embodiment of the invention, the structured-light pattern is an image or a set of images projected by a plurality of structured-light projectors of the same or different parameters.

In another embodiment of the invention, the structured-light pattern is a set of images, each projected by a different structured-light projector.

3) Collecting structured light information of an object surface using a camera

At least one camera is used to collect structured light information of the object surface. The camera outputs an acquired structured light image, and also outputs a scene depth map aligned with the structured light, referred to as a depth map truth value. The depth characterization information includes, but is not limited to: depth map, normal map, curvature map, curved surface expansion map, UV map. The depth map may be any one of the above-described forms, or may be any combination of the above-described forms.

In another embodiment of the invention, the depth characterizing information is output by another depth camera having the same spatial position, same pose, and same angle of view as the structured light collection camera.

4) Decoding structured light collected by a camera using a structured light decoding algorithm

The guided structured light decoding algorithm is a guided algorithm module which can be optimized by using a back propagation algorithm, and can be a guided form of a traditional structured light algorithm, such as a guided form of phase shift profilometry, fourier transform profilometry, a feature matching algorithm, and the like, a neural network, or other learning-based methods.

The structured light decoding algorithm takes the structured light information of the object surface collected in the previous step as input, and outputs a predicted depth representation map or a combination of depth representation maps.

The step (five) comprises the following substeps:

1) Loading three-dimensional data

Before each rendering of the structured light image, one three-dimensional data is reloaded, or the spatial pose or combination mode of the three-dimensional data is transformed, or the texture characteristics of the three-dimensional data are transformed, or the spatial pose of the three-dimensional imaging system is transformed, or the combination changes are carried out. So that the three-dimensional scene within the camera field of view changes from the last time each rendering.

2) Establishing a loss function and calculating a loss

The loss function is used to describe the difference between the predicted depth map and the truth of the map. The loss function types include, but are not limited to: l1 loss, L2 loss, and some combination thereof.

In another embodiment of the invention, the loss between the predicted depth map and the truth of the map is determined by a neural network comparison.

3) Counter-propagating, optimizing parameters

The gradient descent method is used to search for optimal parameters. The parameters are weights of the neural network, or corresponding parameters of other learning methods.

4) Obtaining the optimal projection structure light pattern

In the step (six):

in actual deployment, the designed structured light pattern, and corresponding algorithms, are used. The corresponding algorithm may or may not be conductive.

The positive effects of the invention

The invention relates to a method for designing a structured light pattern by using an artificial intelligence by using a method for simulating a derivative value. This approach allows a user to customize a set of three-dimensional imaging schemes based on structured light principles according to the use scenario. Compared with the traditional method, the method has the following positive effects:

1) Compared with the traditional artificial structured light pattern, the designed structured light pattern is more optimized and has higher imaging precision.

2) Compared with the traditional structured light algorithm, the invention needs to manually design the projection pattern, and can automatically complete the design work by using a computer.

3) Compared with the traditional structured light algorithm, the three-dimensional imaging method can customize a three-dimensional imaging scheme according to the use scene and system constraint, and has better flexibility and expandability.

Drawings

Fig. 1 illustrates a structured light encoding scheme.

FIG. 2 illustrates an example fringe projection measurement system. 1 an imaging system; 2 a structured light projection system; 3 the object to be measured.

FIG. 3 illustrates an example network architecture;

fig. 4 builds a guided structured light generation and decoding algorithm map.

Detailed Description

The invention aims to realize the automatic design of the structured light pattern by utilizing the technology of differential geometry of artificial intelligence. In order to achieve the purpose, the method provides the following example technical scheme:

determining major constraints of a system

The main constraint of the system is firstly determined, the projection system is a one-dimensional, gray-scale and single-channel structured light projection system, and other constraints are avoided.

(II) determining the use scene of the system and collecting corresponding three-dimensional data

The scene is used for three-dimensional imaging of the face. A three-dimensional model database of faces is collected, and the three-dimensional face model as used herein is a high-precision 3D face model from a scanner scan.

(III) constructing a conductive structured light three-dimensional imaging simulation environment

An active projection light source is constructed using components provided by the Redener engine, thereby constructing a guidable structural light simulation environment. The virtual camera and the structured light projector have the same resolution and angle of view. The camera and projector form a monocular parallel optical axis three-dimensional imaging system.

The center of the field of view of the three-dimensional imaging system coincides with the center of the three-dimensional model of the human face, so that the human face model is ensured to be always in the field of view of imaging.

(IV) building a guided structured light generation and decoding algorithm

Firstly, sampling a variable z with the dimension of (1, 128) from a random variable space, and obtaining a target variable phi after the variable z passes through a fully-connected neural network, wherein phi is a vector, and phi (A, B, f, phi) is a variable with the dimension of (1, 4).

The structured light pattern is generated by the following steerable process calculation:

I(x,y)＝A+Bcos(2πfx+φ)

secondly, the structured light image is projected to the three-dimensional face surface by using a structured light projector, and the image is acquired

Each structured light projection and collection process can be expressed as:

Im,Depth＝Rend _diff (pattern,3D_Scene(models,pose,texture,scale),Light)

in each differentiable rendering of Rend _diff All contain an actively projected structured Light pattern, and a three-dimensional scene consisting of a series of random textures (texture), random poses (pose), random scale 3D models (model), and ambient Light. The output is a steerable acquisition image Im and a predicted Depth characterization map Depth, which in this example is a Depth map.

The rendering model uses a classical Blinn-Phong illumination model, the mathematical expression of which is:

Im＝pattern _cam *albedo ^γ *(N·H) ^shininess

in pattern _cam Is a projection pattern at a camera viewing angle; albedo is albedo; gamma is used for adjusting the gamma coefficient of the albedo; n is a surface normal vector; h is an intermediate vector of the light incidence direction L and the viewpoint direction V, also referred to as a half angle vector; shinine is a high light coefficient, with closer to 0 indicating closer to an ideal diffuse reflection surface, further from 0, closer to specular reflection.

Finally, a deep convolutional neural network is constructed to recover the depth information from the collected structured light information. A depth convolution network uses a more common encoder-decoder structure (encoder-decoder) that takes as input an acquired structured light image and outputs a pixel-aligned depth map.

Training to obtain optimized structured light coding pattern

And loading the face model, randomly generating rendering parameters, rendering to obtain a structural light image and a depth map truth value. The depth map is predicted using the neural network described in the previous step.

The construction loss function is:

L ₁ ＝|GT-D|

wherein L is ₁ For L1 penalty, GT is the true value of Depth, and D is the Depth of network prediction.

And (3) iterating by using an Adam optimizer, optimizing parameters of the network, and repeatedly repeating the steps until the gradient of the network is not obviously reduced, and ending training.

(VI) deploying an optimized structured light coding pattern

And (3) carrying out three-dimensional imaging on the human face by using the structured light image obtained in the fifth step and a decoding network. In implementing deployment, the structured light acquisition image comes from a real world camera, which is distinguished from the training process.

Although specific embodiments have been described and illustrated in detail, the invention is not limited to the embodiments described and may be practiced otherwise than as specifically described and within the spirit and scope of the present invention as defined by the following claims. In particular, it is to be understood that other embodiments may be utilized and functional modifications may be made without departing from the scope of the present invention.

In the device claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims or in different embodiments does not indicate that a combination of these measures cannot be used to advantage.

It should be emphasized that the term "comprises/comprising" when used in this specification is taken to specify the presence of stated features, integers, steps or components but does not preclude the presence or addition of one or more other features, integers, steps, components or groups thereof.

The features of the methods described above and below may be implemented in software and may be executed on a data processing system or other processing tool by execution of computer-executable instructions. The instructions may be program code that loads memory (e.g., RAM) from a storage medium or from another computer via a computer network. Alternatively, the described features may be implemented by hardwired circuitry instead of software, or by a combination of hardwired circuitry and software.

Claims

1. An automated structured light pattern design method, comprising the steps of:

determining a primary constraint of the three-dimensional imaging system;

secondly, determining the use scene of the three-dimensional imaging system and collecting corresponding three-dimensional data;

(IV) constructing a guided structured light generation and decoding algorithm;

training to obtain an optimized structured light coding pattern;

(six) deploying an optimized structured light coding pattern;

in the step (one), first, main constraints of the three-dimensional imaging system are determined, wherein the main constraints are as follows:

1) Projecting dimensions of the three-dimensional imaging system;

2) Projecting the number of channels of the three-dimensional imaging system;

3) Optical characteristics of the projection three-dimensional imaging system;

4) Projecting the gray scale range of the three-dimensional imaging system;

5) Projecting linear features of the three-dimensional imaging system;

6) Precision requirements and speed requirements;

in the step (ii), the "usage scenario" is characterized in that:

1) A shape feature of the three-dimensionally imaged object;

2) Texture features of the three-dimensional imaging object;

in the step (two): the shape features and the texture features need to be input into a corresponding network by taking training data as a carrier; a large amount of three-dimensional data that can represent an actual usage scenario needs to be collected;

the three-dimensional data is point cloud data, grid data or curved surface data, and obj, stl, ply, stp is included;

the source of the three-dimensional data is obtained by scanning by a scanner, or is obtained by automatic or manual synthesis by a computer, or is obtained by modifying the data on the basis of the data, or is any form of three-dimensional data obtained by other ways;

the three-dimensional data is in the form of a single file or a combined scene of a plurality of files in different scales and position postures in a three-dimensional space;

the three-dimensional data is static, or the position and the gesture are dynamically changed along with time, or the shape is changed along with time, or the position and the gesture are changed along with time;

the three-dimensional data uses a data enhancement method to expand the data set, wherein the enhancement method comprises the changes of color and reflectivity;

in the step (III):

the 'simulation environment' is a conductive physical rendering simulation engine;

the 'simulation environment' comprises one rendering scene or a plurality of parallel rendering scenes;

the rendering scene comprises at least one three-dimensional imaging system and a three-dimensional object;

the three-dimensional imaging system at least comprises a camera and a structured light projector; the camera is a virtual camera and has functions and parameters similar to those of a physical world camera, except that an image shot by the camera can differentiate any parameter in a scene; the structured light projector is a virtual active light source that has similar functions and parameters as the object world projection device, except that the projected image is steerable to any parameter in the scene;

in the three-dimensional imaging system, a fixed or variable included angle and a fixed or variable relative pose are arranged between the virtual camera and the structured light projector;

the relative position between the three-dimensional imaging system and the three-dimensional object is fixed, or varies with time;

the virtual camera needs to set resolution, gray level output or color output, distortion, focal length, and format and bit number of data according to actual use scene; to ensure that the simulation process approaches the actual use scene;

the structured light projector needs to set dimension, gray scale, low pass and distortion attributes according to actual use scenes so as to ensure that the simulation process is close to the actual use scenes;

the virtual camera has a fixed angle of view, or a time-varying angle of view;

the rendering scene comprises different types and numbers of ambient light sources, and the numbers and types of the ambient light sources and parameters of the ambient light sources are fixed or changed along with time so as to simulate the ambient light interference in the actual use scene;

said step (four) comprises the sub-steps of:

1) Generating a guided structured light encoding pattern using a structured light generation algorithm;

the guided structured light coding pattern is generated by a neural network to generate a series of structured light generation parameters, and then the structured light coding pattern is obtained by calculating and generating the parameters; the algorithm used in the computational generation process is straightforward; the structural light generating parameter is the frequency and the phase of sine and cosine structural light or the coding bit number of a pseudo-random lattice and the size parameter of a point;

2) Projecting a structured light coding pattern to the surface of the three-dimensional object to be measured by using a structured light projector;

the structure light projector takes the structure light coding pattern as input, and projects the structure light coding pattern onto the surface of the object, so that the structure light coding pattern and the texture of the object and the ambient light are mixed and rendered;

the structured light encoding pattern is an image or a group of images projected by a fixed projector;

3) Collecting structured light information of the surface of the object by using a camera;

collecting structured light information of the surface of the object by using at least one camera; the camera outputs the collected structured light image, and simultaneously outputs a scene depth representation image aligned with the structured light, which is called a depth representation image truth value; the scene depth map comprises: depth map, normal map, curvature map, curved surface expansion map, UV map; the output scene depth representation map is in any one form or any combination form;

4) Decoding the structural light collected by the camera by using a structural light decoding algorithm;

the structured light decoding algorithm is a guided algorithm module which uses a back propagation algorithm to optimize, or is a guided form of a traditional structured light algorithm;

the structured light decoding algorithm takes the structured light information of the object surface collected in the previous step as input, and outputs a predicted depth representation map or a combination of the depth representation maps;

the step (five) comprises the following substeps:

1) Loading three-dimensional data

Before each time of rendering the structured light image, reloading three-dimensional data, or transforming the spatial pose or combination mode of the three-dimensional data, or transforming the texture characteristics of the three-dimensional data, or transforming the spatial pose of a three-dimensional imaging system, or carrying out combination change; so that the three-dimensional scene in the camera field of view has a change relative to the last time when each rendering;

2) Establishing a loss function and calculating a loss

The loss function is used to describe the difference between the predicted depth map and the truth of the map; the loss function types include: l1 loss, L2 loss, and some combination thereof;

3) Counter-propagating, optimizing parameters

Searching optimal parameters by using a gradient descent method; the parameters are weights of the neural network or corresponding parameters of other learning methods;

4) An optimal projected structured light pattern is obtained.

2. The structured light pattern design method according to claim 1, wherein the structured light pattern design method comprises the steps of:

determining the main constraint of the system:

firstly, determining main constraint of a system, wherein the projection system is a one-dimensional, gray-scale and single-channel structured light projection system, and no other constraint exists;

(II) determining a use scene of the system, and collecting corresponding three-dimensional data:

the scene is used for three-dimensional imaging of the face, a three-dimensional model database of the face is collected, and the three-dimensional model of the face used in the database is a high-precision 3D face model from scanner scanning;

and (III) constructing a conductive structured light three-dimensional imaging simulation environment:

constructing a guidable structured light simulation environment by using a Redener, wherein the virtual camera and the structured light projector have the same resolution and view angle; the camera and the projector form a monocular parallel optical axis three-dimensional imaging system;

the center of the view field of the three-dimensional imaging system is coincident with the center of the three-dimensional model of the human face, so that the three-dimensional model of the human face is ensured to be always in the imaged view field;

(IV) constructing a guided structured light generation and decoding algorithm:

firstly, sampling a variable z with the dimension of (1, 128) from a random variable space, and obtaining a target variable phi after the z passes through a fully connected neural network, wherein phi is a vector, and phi (A, B, f, phi) is a variable with the dimension of (1, 4);

I(x,y)＝A+Bcos(2πfx+φ)；

secondly, projecting the structured light image onto the surface of the three-dimensional model of the human face by using a structured light projector and collecting the image;

each structured light projection and collection process can be expressed as:

Im,Depth＝Rend _diff (pattern,3D_Scene(models,pose,texture,scale),Light)

in each differentiable rendering of Rend _diff All comprising an active projected structured Light pattern and a three-dimensional scene consisting of a series of random textures, random poses, random scale 3D models, and ambient Light; output as a guided acquired image Im and predictionDepth represents the Depth of the map, which is the Depth map;

Im＝pattern _cam *albedo ^γ *(N·H) ^shininess ；

in pattern _cam Is a projection pattern at a camera viewing angle; albedo is albedo; gamma is used for adjusting the gamma coefficient of the albedo; n is a surface normal vector; h is an intermediate vector of the light incidence direction L and the viewpoint direction V, also referred to as a half angle vector; shinine is a high light coefficient, the closer to 0, the closer to the ideal diffuse reflection surface, the farther from 0, the closer to specular reflection;

finally, constructing a depth convolution neural network to recover the depth information from the collected structured light information; the depth convolutional neural network uses a more common encoding-decoding structure; the network takes the collected structured light image as input and outputs a depth map with aligned pixels;

training to obtain an optimized structured light coding pattern;

loading a face model, randomly generating rendering parameters, rendering to obtain a structural light image and a depth map truth value; predicting a depth map by using the depth convolution neural network in the previous step;

the construction loss function is:

L ₁ ＝|GT-D|

wherein L is ₁ For L1 loss, GT is the true value of Depth, and D is the Depth of network prediction;

iteration is carried out by using an Adam optimizer, parameters of a network are optimized, and the process of the steps is repeated repeatedly until the gradient of the network is not obviously reduced, and training is finished;

(six) deploying an optimized structured light coding pattern;

and (3) performing three-dimensional imaging on the human face by using the structured light coding pattern obtained in the fifth step and a decoding network.