CN117152397B - Three-dimensional face imaging method and system based on thermal imaging projection - Google Patents

Three-dimensional face imaging method and system based on thermal imaging projection

Info

Publication number
CN117152397B
CN117152397B CN202311393671.4A
Authority
CN
China
Prior art keywords
thermal imaging
rgb
data set
depth
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311393671.4A
Other languages
Chinese (zh)
Other versions
CN117152397A (en)
Inventor
王永恒
张帅
路杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huiyigu Traditional Chinese Medicine Technology Tianjin Co ltd
Original Assignee
Huiyigu Traditional Chinese Medicine Technology Tianjin Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huiyigu Traditional Chinese Medicine Technology Tianjin Co ltd
Priority to CN202311393671.4A
Publication of CN117152397A
Application granted
Publication of CN117152397B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/20Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/10Image acquisition
    • G06V10/12Details of acquisition arrangements; Constructional details thereof
    • G06V10/14Optical characteristics of the device performing the acquisition or on the illumination arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/778Active pattern-learning, e.g. online learning of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • Architecture (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a three-dimensional face imaging method and system based on thermal imaging projection. The method comprises the following steps: acquiring a data set, marking the data set, preprocessing the data set, performing recognition with a deep learning network model, and judging whether the result is credible; if credible, texture filling is performed; if not, the images are re-acquired and the process is repeated. The acquired facial images are comprehensively recognized by combining a depth camera, an RGB camera and a thermal imaging camera, each with its corresponding deep learning network, so that thermal images reflected by mirror surfaces are prevented from corrupting the actual facial imaging result, improving the accuracy of facial imaging and the success rate of facial diagnosis.

Description

Three-dimensional face imaging method and system based on thermal imaging projection
Technical Field
The invention relates to the technical field of image processing, in particular to a three-dimensional face imaging method and system based on thermal imaging projection.
Background
When inspection is performed in traditional Chinese medicine diagnosis and treatment, facial color appears different under different lighting, so a standard light source must be arranged in the multidimensional inspection acquisition box to ensure that the quality of pictures shot by the internal camera is not affected by the external environment, that the light source leaves no dead angle inside the inspection acquisition box, and that no shadows or uneven reflections occur.
In existing traditional Chinese medicine inspection acquisition equipment, an RGB camera is arranged in the acquisition box to shoot a high-definition color two-dimensional plane image of the target tongue surface for clinical analysis of the target's texture characteristics; however, because the texture images shot by the RGB camera have poor depth-position accuracy, the three-dimensional characteristics of the target tongue surface and its temperature information cannot be intuitively reflected.
In the existing acquisition box, the RGB camera arranged in the box body shoots clear color images of the target to build a frontal base model; however, because the texture images shot by the RGB camera have poor depth-position accuracy, only an approximate outline can be obtained and local contours of the user cannot be clearly displayed, so the resulting three-dimensional imaging is inaccurate.
To solve this problem, the prior art includes, for example, patent application CN202010932310.2, entitled "3D camera and infrared scanning algorithm with point cloud aligned with color textures". That method is applicable to conventional three-dimensional imaging; however, when infrared imaging and color textures are aligned during inspection-image processing, the thermal image formed by a user inside the inspection box is a superposition of the user's own thermal image and its specular reflections inside the box, so it cannot be used directly for texture filling. Therefore, the inaccuracy of thermal imaging data inside the inspection box, and how to combine the mapping data of thermal imaging with color textures to form accurate three-dimensional face imaging, are the urgent problems to be solved by the present application.
Disclosure of Invention
Therefore, the invention aims to provide a three-dimensional face imaging method and system based on thermal imaging projection, which comprehensively recognizes the acquired face images by combining a depth camera, an RGB camera and a thermal imaging camera, each with a corresponding deep learning network, prevents mirror-reflected thermal images from corrupting the actual face imaging result, improves face imaging accuracy, and increases the success rate of facial diagnosis.
In order to achieve the above object, the present invention provides a three-dimensional face imaging method based on thermal imaging projection, comprising the following steps:
acquiring RGB images, depth images and thermal imaging images of multiple frames of the same human face in a uniform light supplementing environment;
marking the trusted region and the untrusted region of the acquired RGB image, depth image and thermal imaging image of the face to form an RGB mark data set, a depth mark data set and a thermal imaging mark data set;
preprocessing the marked data sets respectively;
extracting the face outline and the corresponding pixel position information in the trusted region from the preprocessed data set to serve as a forward learning data set; extracting the object outline and pixel position information of the unreliable region as a corresponding reverse learning data set;
constructing a deep learning network model, wherein the deep learning network model comprises three independent deep learning networks, namely an RGB network, a deep network and a thermal imaging network;
inputting an RGB image, an RGB mark data set, an RGB forward learning data set and an RGB reverse learning data set into an RGB network, performing depth training, and outputting RGB credibility;
the depth training process comprises the following steps: recognizing the position of the face region in the RGB image, performing a binary classification on the acquired RGB image according to what was learned from the RGB forward learning data set and the RGB reverse learning data set, and calculating the final RGB credibility using the classification result as a binary label;
according to the depth training process, the depth image, the depth marking data set, the depth forward learning data set and the depth reverse learning data set are input into a depth network, and the depth credibility is output;
inputting the thermal imaging image, the thermal imaging marking data set, the thermal imaging forward learning data set and the thermal imaging reverse learning data set into a thermal imaging network, and outputting thermal imaging credibility;
when all the obtained three credibility values meet the correspondingly set credibility threshold values, filling RGB features extracted by an RGB network and temperature feature data extracted by a thermal imaging network into a three-dimensional face model; the three-dimensional face model is composed of three-dimensional space features extracted by a depth network.
Further preferably, the marking of trusted regions and untrusted regions comprises:
marking real face areas in the RGB image, the depth image and the thermal imaging image as trusted areas; and marking the mirror image projection area of the human face in the same image as an unreliable area to obtain an RGB mark data set, a depth mark data set and a thermal imaging mark data set.
Further preferably, the marked data sets are preprocessed respectively by sequentially performing gray-scale processing, binarization, pixel enhancement, filtering and a morphological closing operation on the marked RGB mark data set, depth mark data set and thermal imaging mark data set, forming three preprocessed data sets.
Further preferably, the RGB network, the depth network and the thermal imaging network each comprise an input layer, an output layer, 8 convolution layers for feature extraction, 8 sampling layers for feature optimization selection, 2 full connection layers for feature representation, 1 excitation layer for fast convergence and 1 loss layer for calculating output and target loss values.
Further preferably, before the multi-frame face image is acquired in the uniform light supplementing environment, unified calibration is performed on all cameras, and the calibration process comprises the following steps:
all cameras shoot a metal cube with uniform temperature in the same acquisition space, and corner point calibration points are stuck on the metal cube;
the camera coordinates of all cameras are converted into world coordinates,
establishing a three-dimensional cube according to coordinates of corner calibration points in a depth image acquired in a depth camera;
and positioning and mapping the RGB image acquired by the RGB camera and the thermal imaging image acquired by the infrared camera on the established three-dimensional cube by taking the coordinates of the corner calibration points in the depth image as the reference.
The invention also provides a three-dimensional face imaging system based on thermal imaging projection, which comprises an inspection box, wherein a plurality of cameras are arranged in the inspection box, the plurality of cameras comprising an RGB camera, a depth camera and a thermal imaging camera; each camera is connected with the central processing unit;
the RGB camera is used for a depth camera and a thermal imaging camera and is respectively used for acquiring RGB images, depth images and thermal imaging images of multiple frames of the same human face in a uniform light supplementing environment;
the central processing unit comprises an image preprocessing module and a deep learning network model;
the image preprocessing module is used for marking the acquired RGB image, depth image and thermal imaging image of the face in a trusted area and an untrusted area to form an RGB mark data set, a depth mark data set and a thermal imaging mark data set; preprocessing the marked data sets respectively; extracting the face outline and the corresponding pixel position information in the trusted region from the preprocessed data set to serve as a forward learning data set; extracting the object outline and pixel position information of the unreliable region as a corresponding reverse learning data set;
the deep learning network model comprises three independent deep learning networks, namely an RGB network, a deep network and a thermal imaging network;
inputting an RGB image, an RGB mark data set, an RGB forward learning data set and an RGB reverse learning data set into an RGB network, performing depth training, and outputting RGB credibility;
inputting the depth image, the depth mark data set, the depth forward learning data set and the depth reverse learning data set into a depth network, performing depth training, and outputting depth credibility;
inputting the thermal imaging image, the thermal imaging marking data set, the thermal imaging forward learning data set and the thermal imaging reverse learning data set into a thermal imaging network for deep training, and outputting the thermal imaging credibility;
when all the obtained three credibility values meet the correspondingly set credibility threshold values, filling RGB features extracted by an RGB network and temperature feature data extracted by a thermal imaging network into a three-dimensional face model; the three-dimensional face model is composed of three-dimensional space features extracted by a depth network.
Further preferably, an arc-shaped panel is arranged in the inspection box, and a uniform light supplementing lamp is arranged behind the arc-shaped panel.
Further preferably, the RGB camera includes an RGB texture camera disposed at the center of the inspection box;
the depth camera comprises a left binocular camera and a right binocular camera which are arranged at the left side and the right side of the inspection box;
the thermal imaging cameras comprise left thermal imaging cameras and right thermal imaging cameras which are arranged on the left side and the right side of the inspection box;
wherein each side camera of the depth camera and the thermal imaging camera is equipped with a posture control mechanism.
The invention also provides electronic equipment, which comprises a processor and a memory, wherein the memory is used for storing computer program instructions, the processor is used for executing the computer program instructions stored in the memory, and when the program instructions are executed, the steps of the three-dimensional face imaging method based on thermal imaging projection are executed.
The present invention also provides a computer storage medium having stored thereon a computer program which, when executed, implements the steps of the three-dimensional face imaging method based on thermal imaging projection as described above.
According to the three-dimensional face imaging method and system based on thermal imaging projection, the acquired face images are comprehensively identified in a mode of combining a depth camera, an RGB camera and a thermal imaging camera with a corresponding deep learning network, actual face imaging results of the thermal imaging images reflected by the mirror surfaces are avoided, face imaging accuracy is improved, and success rate of face diagnosis is improved.
Drawings
Fig. 1 is a schematic flow chart of a three-dimensional face imaging method based on thermal imaging projection.
Fig. 2 is a schematic structural diagram of a three-dimensional face imaging system based on thermal imaging projection provided by the invention.
Fig. 3 is a network structure diagram of a depth recognition network provided by the present invention.
Fig. 4 is a schematic view of the external structure of the inspection box.
Fig. 5 is a schematic view of the internal structure of the inspection box.
In the figure: 1. inspection box; 2. an RGB camera; 5. a central processing unit; 301. a left binocular camera; 302. a right binocular camera; 401. a left thermal imaging camera; 402. a right thermal imaging camera.
Detailed Description
The invention is described in further detail below with reference to the drawings and the detailed description.
As shown in fig. 1-3, a three-dimensional face imaging method based on thermal imaging projection according to an embodiment of the present invention includes the following steps:
s1, acquiring a data set; acquiring RGB images, depth images and thermal imaging images of multiple frames of the same human face in a uniform light supplementing environment;
s2, marking a data set; marking the trusted region and the untrusted region of the acquired RGB image, depth image and thermal imaging image of the face to form an RGB mark data set, a depth mark data set and a thermal imaging mark data set;
Further preferably, the marking of trusted regions and untrusted regions comprises:
marking real face areas in the RGB image, the depth image and the thermal imaging image as trusted areas; and marking the mirror image projection area of the human face in the same image as an unreliable area to obtain an RGB mark data set, a depth mark data set and a thermal imaging mark data set.
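The marking of step S2 can be sketched as a per-pixel label mask. The rectangle coordinates, the label encoding (1 = trusted real face, 0 = untrusted mirror projection, -1 = unlabelled) and the function name below are illustrative assumptions of this sketch, not details published by the patent:

```python
import numpy as np

def mark_regions(shape, trusted_boxes, untrusted_boxes):
    """Build a per-pixel label mask for one acquired frame.

    shape           -- (height, width) of the image
    trusted_boxes   -- list of (y0, y1, x0, x1) real-face rectangles
    untrusted_boxes -- list of (y0, y1, x0, x1) mirror-projection rectangles
    Returns an int8 mask: 1 = trusted, 0 = untrusted, -1 = unlabelled.
    """
    mask = np.full(shape, -1, dtype=np.int8)
    for y0, y1, x0, x1 in untrusted_boxes:
        mask[y0:y1, x0:x1] = 0
    for y0, y1, x0, x1 in trusted_boxes:   # trusted labels take precedence
        mask[y0:y1, x0:x1] = 1
    return mask
```

The same function would be applied independently to the RGB, depth and thermal frames to obtain the three mark data sets.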
S3, preprocessing a data set; preprocessing the marked data sets respectively;
Further preferably, the marked data sets are preprocessed respectively by sequentially performing gray-scale processing, binarization, pixel enhancement, filtering and a morphological closing operation on the marked RGB mark data set, depth mark data set and thermal imaging mark data set, forming three preprocessed data sets.
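The core of the preprocessing chain can be sketched with plain NumPy: gray-scale conversion, binarization and a morphological closing (dilation followed by erosion). The pixel-enhancement and filtering steps are omitted for brevity, and the threshold value and 3×3 structuring element are assumptions of this sketch:

```python
import numpy as np

def to_gray(rgb):
    # ITU-R BT.601 luma weighting of the R, G, B channels
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

def binarize(gray, thresh=128):
    return (gray >= thresh).astype(np.uint8)

def _shift_combine(b, pad_value, combine):
    # 3x3 neighbourhood max (dilation) or min (erosion) via shifted views
    p = np.pad(b, 1, constant_values=pad_value)
    h, w = b.shape
    out = p[1:1 + h, 1:1 + w].copy()
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out = combine(out, p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w])
    return out

def dilate3(b):
    return _shift_combine(b, 0, np.maximum)

def erode3(b):
    return _shift_combine(b, 1, np.minimum)

def close3(b):
    # morphological closing = dilation followed by erosion; fills small holes
    return erode3(dilate3(b))

def preprocess(rgb, thresh=128):
    # gray -> binary -> closing (enhancement/filtering steps omitted here)
    return close3(binarize(to_gray(rgb), thresh))
```

In practice a library routine such as OpenCV's `morphologyEx` would replace the hand-rolled closing.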
Extracting the face outline and the corresponding pixel position information in the trusted region from the preprocessed data set to serve as a forward learning data set; extracting the object outline and pixel position information of the unreliable region as a corresponding reverse learning data set;
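Extracting contour pixel positions for the forward and reverse learning data sets can be sketched as boundary detection on the label mask from step S2. The boundary definition used here (region pixels with at least one neighbour, or the image border, outside the region) is an illustrative assumption:

```python
import numpy as np

def region_contour(mask, label):
    """Return (row, col) positions of the contour of the given label region.

    A contour pixel belongs to the region but has at least one 8-neighbour
    (or the image border) outside the region.
    """
    region = (mask == label).astype(np.uint8)
    h, w = region.shape
    p = np.pad(region, 1)              # pixels outside the image count as background
    interior = np.ones((h, w), np.uint8)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            interior &= p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    contour = region & (1 - interior)  # in the region but not fully interior
    return list(zip(*np.nonzero(contour)))

# forward set: contour of the trusted region (label 1)
# reverse set: contour of the untrusted region (label 0)
```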
s4, identifying by using the deep learning network model.
Constructing a deep learning network model, wherein the deep learning network model comprises three independent deep learning networks, namely an RGB network, a deep network and a thermal imaging network;
inputting an RGB image, an RGB mark data set, an RGB forward learning data set and an RGB reverse learning data set into an RGB network, performing depth training, and outputting RGB credibility;
inputting the depth image, the depth mark data set, the depth forward learning data set and the depth reverse learning data set into a depth network, performing depth training, and outputting depth credibility;
inputting the thermal imaging image, the thermal imaging marking data set, the thermal imaging forward learning data set and the thermal imaging reverse learning data set into a thermal imaging network for deep training, and outputting the thermal imaging credibility;
s5, judging whether the image is credible, if so, performing texture filling, and if not, re-acquiring the image and repeating the process.
When all the obtained three credibility values meet the set credibility threshold value, filling RGB features extracted by an RGB network and temperature feature data extracted by a thermal imaging network into a three-dimensional face model; the three-dimensional face model is composed of three-dimensional space features extracted by a depth network.
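The filling step can be sketched as attaching per-vertex attributes to the depth-derived model: each vertex of the three-dimensional face model receives a colour from the RGB network and a temperature from the thermal imaging network. The flat (N, 7) vertex/attribute layout is an illustrative assumption of this sketch:

```python
import numpy as np

def fill_textures(vertices, rgb_features, temp_features):
    """Attach RGB and temperature attributes to each 3-D vertex.

    vertices      -- (N, 3) points built from depth-network spatial features
    rgb_features  -- (N, 3) colour per vertex from the RGB network
    temp_features -- (N, 1) temperature per vertex from the thermal network
    Returns an (N, 7) array: x, y, z, r, g, b, temperature.
    """
    return np.hstack([vertices, rgb_features, temp_features])
```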
As shown in fig. 3, it is further preferred that the RGB network, the depth network, and the thermal imaging network each include an input layer, an output layer, 8 convolution layers for feature extraction, 8 sampling layers for feature optimization selection, 2 full connection layers for representing features, 1 excitation layer for rapid convergence, and 1 loss layer for calculating output and target loss values.
The excitation function is as follows, wherein z is an input parameter of the convolution layer; the excitation function indicates that when the parameters from the input layer are passed to the convolution layer, the convolution layer is activated and begins extracting the feature parameters.
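The layer inventory described above can be captured as a declarative sketch. Two points are assumptions of this sketch rather than confirmed details of the patent: the layer ordering (a sampling layer after every convolution, the excitation layer before the loss layer), and the use of ReLU, f(z) = max(0, z), as the excitation, which is consistent with the "activate when input arrives" description (the formula itself is published only as an image):

```python
def relu(z):
    # assumed excitation: passes positive inputs through, outputs 0 otherwise
    return z if z > 0 else 0.0

def build_layer_spec():
    """Declarative sketch of one recognition network (RGB, depth or thermal)."""
    layers = ["input"]
    for i in range(1, 9):
        layers.append(f"conv{i}")  # 8 convolution layers for feature extraction
        layers.append(f"pool{i}")  # 8 sampling layers for feature selection
    layers += ["fc1", "fc2"]       # 2 full connection layers for representation
    layers.append("excitation")    # 1 excitation layer for fast convergence
    layers.append("loss")          # 1 loss layer: output vs. target (CIOU / BCE)
    layers.append("output")
    return layers
```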
The present application can adopt the CIOU function as the loss function for identifying the position of the human face:

LOSS_CIOU = 1 − IOU + ρ²(b, b_gt)/c² + αv

wherein IOU is the IOU value, calculated as the area where the two boxes intersect divided by the area of their union; the two boxes are the true position box of the target (typically drawn in green) and the predicted position box (typically drawn in blue). ρ(b, b_gt) is the distance between the centers of the two boxes, where b is the center coordinate of the predicted box (the blue box center), b_gt is the center coordinate of the true target bounding box (the green box center), and c is the diagonal length of the smallest box enclosing both boxes.

α is a weighting coefficient and v measures the consistency of the aspect ratios of the predicted and true boxes, v = (4/π²)·(arctan(w_gt/h_gt) − arctan(w/h))², where w, h and w_gt, h_gt denote the width and height of the predicted box and of the true box, respectively.
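The CIOU computation above can be sketched directly; the (x1, y1, x2, y2) box encoding and the weighting α = v / (1 − IOU + v) follow the standard published CIoU definition and are assumptions of this sketch:

```python
import math

def ciou_loss(pred, truth):
    """CIoU loss for two boxes given as (x1, y1, x2, y2)."""
    # intersection and union areas
    ix = max(0.0, min(pred[2], truth[2]) - max(pred[0], truth[0]))
    iy = max(0.0, min(pred[3], truth[3]) - max(pred[1], truth[1]))
    inter = ix * iy
    wp, hp = pred[2] - pred[0], pred[3] - pred[1]
    wt, ht = truth[2] - truth[0], truth[3] - truth[1]
    union = wp * hp + wt * ht - inter
    iou = inter / union
    # squared centre distance rho^2 and enclosing-box diagonal c^2
    rho2 = ((pred[0] + pred[2]) / 2 - (truth[0] + truth[2]) / 2) ** 2 \
         + ((pred[1] + pred[3]) / 2 - (truth[1] + truth[3]) / 2) ** 2
    c2 = (max(pred[2], truth[2]) - min(pred[0], truth[0])) ** 2 \
       + (max(pred[3], truth[3]) - min(pred[1], truth[1])) ** 2
    # aspect-ratio consistency term v and its weight alpha
    v = (4 / math.pi ** 2) * (math.atan(wt / ht) - math.atan(wp / hp)) ** 2
    alpha = v / (1 - iou + v + 1e-9)
    return 1 - iou + rho2 / c2 + alpha * v
```

For identical boxes every term vanishes and the loss is 0; the loss grows as the boxes separate or their aspect ratios diverge.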
For example, inputting an RGB image, an RGB mark data set, an RGB forward learning data set and an RGB reverse learning data set into an RGB network, performing depth training, and outputting RGB credibility;
Using the CIOU function, the position of the face region in the RGB image is recognized; the deep learning network then performs a binary classification on the acquired RGB image according to what was learned from the RGB forward learning data set and the RGB reverse learning data set, that is, it judges whether the recognized face region is a real face or not: a recognized real face (trusted region) is labelled 1 and a non-real face is labelled 0, and the final RGB credibility is calculated using this binary label.
For example, the BCE loss function is used to calculate the credibility:

LOSS = −(1/N) Σ_i [ y_i · log P(x_i) + (1 − y_i) · log(1 − P(x_i)) ]

wherein y_i is the binary label (with value 0 or 1) and P(x_i) is the output probability that the sample belongs to label y_i. A lower LOSS value indicates a higher credibility of the image.
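The credibility computation can be sketched with the standard binary cross-entropy; mapping the loss to a score in (0, 1] via exp(−LOSS) is an assumption of this sketch, since the patent only states that lower loss means higher credibility:

```python
import math

def bce_loss(labels, probs, eps=1e-12):
    """Standard binary cross-entropy over per-region binary labels y_i."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)   # clamp to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(labels)

def credibility(labels, probs):
    # lower loss -> higher credibility; exp(-loss) maps the loss into (0, 1]
    return math.exp(-bce_loss(labels, probs))
```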
Similarly, the depth credibility and the thermal imaging credibility can be calculated by the same process. Finally, when all three credibility values meet their set thresholds, that is, when the RGB credibility meets the RGB credibility threshold, the depth credibility meets the depth credibility threshold and the thermal imaging credibility meets the thermal imaging credibility threshold, mapping can be performed: the RGB features extracted by the RGB network and the temperature feature data extracted by the thermal imaging network are filled into the three-dimensional face model.
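The three-way threshold check that gates texture filling can be sketched as below; the patent sets one threshold per modality but does not publish the values, so the 0.9 defaults are illustrative assumptions:

```python
def all_credible(rgb_c, depth_c, thermal_c,
                 rgb_t=0.9, depth_t=0.9, thermal_t=0.9):
    """True only when every modality meets its own credibility threshold;
    otherwise the frames are re-acquired and the pipeline repeats."""
    return rgb_c >= rgb_t and depth_c >= depth_t and thermal_c >= thermal_t
```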
It should be noted that, before the multi-frame face image is acquired in the uniform light supplementing environment, the method further comprises the step of uniformly calibrating all cameras, and the calibration process comprises the following steps:
all cameras shoot a metal cube with uniform temperature in the same acquisition space, and corner point calibration points are stuck on the metal cube;
converting camera coordinates of all cameras into world coordinates;
establishing a three-dimensional cube according to coordinates of corner calibration points in a depth image acquired in a depth camera;
and positioning and mapping the RGB image acquired by the RGB camera and the thermal imaging image acquired by the infrared camera on the established three-dimensional cube by taking the coordinates of the corner calibration points in the depth image as the reference.
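The calibration steps above can be sketched as a shared world frame: each camera's extrinsics (R, t) convert its camera coordinates into world coordinates, the calibration cube is reconstructed from the corner points seen by the depth camera, and the RGB and thermal images are then positioned relative to that cube. The matrix shapes and the axis-aligned cube reconstruction are illustrative assumptions:

```python
import numpy as np

def camera_to_world(points_cam, R, t):
    """Map Nx3 camera-frame points into the world frame: X_w = R @ X_c + t."""
    return points_cam @ np.asarray(R).T + np.asarray(t)

def cube_from_corners(corners_world):
    """Axis-aligned bounds of the calibration cube from its corner points."""
    c = np.asarray(corners_world, float)
    return c.min(axis=0), c.max(axis=0)
```

With all cameras expressed in the same world frame, a pixel from the RGB or thermal camera can be associated with the depth-derived cube by comparing world coordinates of the shared corner calibration points.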
The invention also provides a three-dimensional face imaging system based on thermal imaging projection, which comprises an inspection box 1; as shown in fig. 5, a plurality of cameras are arranged in the inspection box, the plurality of cameras comprising an RGB camera 2, a depth camera and a thermal imaging camera; as shown in fig. 2, each camera is connected to the central processing unit 5; the depth camera comprises a left binocular camera 301 and a right binocular camera 302 arranged at the left and right sides of the inspection box;
the thermal imaging cameras comprise a left thermal imaging camera 401 and a right thermal imaging camera 402 arranged at the left and right sides of the inspection box; wherein each side camera of the depth camera and the thermal imaging camera is equipped with a posture control mechanism.
the RGB camera, the depth camera and the thermal imaging camera are respectively used for acquiring RGB images, depth images and thermal imaging images of multiple frames of the same human face in a uniform light supplementing environment;
the central processing unit comprises an image preprocessing module and a deep learning network model;
the image preprocessing module is used for marking the acquired RGB image, depth image and thermal imaging image of the face in a trusted area and an untrusted area to form an RGB mark data set, a depth mark data set and a thermal imaging mark data set; preprocessing the marked data sets respectively; extracting the face outline and the corresponding pixel position information in the trusted region from the preprocessed data set to serve as a forward learning data set; extracting the object outline and pixel position information of the unreliable region as a corresponding reverse learning data set;
the deep learning network model comprises three independent deep learning networks, namely an RGB network, a deep network and a thermal imaging network;
inputting an RGB image, an RGB mark data set, an RGB forward learning data set and an RGB reverse learning data set into an RGB network, performing depth training, and outputting RGB credibility;
inputting the depth image, the depth mark data set, the depth forward learning data set and the depth reverse learning data set into a depth network, performing depth training, and outputting depth credibility;
inputting the thermal imaging image, the thermal imaging marking data set, the thermal imaging forward learning data set and the thermal imaging reverse learning data set into a thermal imaging network for deep training, and outputting the thermal imaging credibility;
when all the obtained three credibility values meet the set credibility threshold value, filling RGB features extracted by an RGB network and temperature feature data extracted by a thermal imaging network into a three-dimensional face model; the three-dimensional face model is composed of three-dimensional space features extracted by a depth network.
Further preferably, an arc-shaped panel is arranged in the inspection box, and a uniform light supplementing lamp is arranged behind the arc-shaped panel.
Further preferably, the RGB camera includes an RGB texture camera disposed at the center of the inspection box;
the invention also provides electronic equipment, which comprises a processor and a memory, wherein the memory is used for storing computer program instructions, the processor is used for executing the computer program instructions stored in the memory, and when the program instructions are executed, the steps of the three-dimensional face imaging method based on thermal imaging projection are executed.
The present invention also provides a computer storage medium having stored thereon a computer program which, when executed, implements the steps of the three-dimensional face imaging method based on thermal imaging projection as described above.
It is apparent that the above embodiments are given by way of illustration only and do not limit the invention. Other variations or modifications will be apparent to those of ordinary skill in the art from the above teachings; it is neither necessary nor possible to exhaustively list all embodiments here. Obvious variations or modifications derived therefrom remain within the protection scope of the invention.

Claims (9)

1. The three-dimensional face imaging method based on thermal imaging projection is characterized by comprising the following steps of:
acquiring RGB images, depth images and thermal imaging images of multiple frames of the same human face in a uniform light supplementing environment;
the method comprises the steps of marking an acquired RGB image, depth image and thermal imaging image of a face with a trusted area and an untrusted area, wherein the marking of a real face area in the RGB image, the depth image and the thermal imaging image as a trusted area; marking the mirror image projection area of the face in the same image as an unreliable area to form an RGB mark data set, a depth mark data set and a thermal imaging mark data set;
preprocessing the marked data sets respectively;
extracting the face outline and the corresponding pixel position information in the trusted region from the preprocessed data set to serve as a forward learning data set; extracting the object outline and pixel position information of the unreliable region as a corresponding reverse learning data set;
constructing a deep learning network model, wherein the deep learning network model comprises three independent deep learning networks, namely an RGB network, a depth network and a thermal imaging network;
inputting the RGB image, the RGB marked data set, the RGB forward learning data set and the RGB reverse learning data set into the RGB network, performing deep training, and outputting an RGB credibility;
inputting the depth image, the depth marked data set, the depth forward learning data set and the depth reverse learning data set into the depth network, performing deep training, and outputting a depth credibility;
inputting the thermal imaging image, the thermal imaging marked data set, the thermal imaging forward learning data set and the thermal imaging reverse learning data set into the thermal imaging network, performing deep training, and outputting a thermal imaging credibility; and
when all three obtained credibility values meet their correspondingly set credibility thresholds, filling the RGB features extracted by the RGB network and the temperature feature data extracted by the thermal imaging network into a three-dimensional face model, wherein the three-dimensional face model is composed of the three-dimensional spatial features extracted by the depth network.
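The final gating step of claim 1 can be sketched in a few lines: a frame contributes to the model only when all three per-modality credibility scores clear their thresholds. A minimal Python sketch, in which the threshold values, dictionary keys and function names are illustrative assumptions rather than details from the patent:

```python
# Sketch of the credibility-gated fusion in claim 1. The thresholds and
# data structures are illustrative assumptions; the patent states only
# that all three scores must meet their respective thresholds.

THRESHOLDS = {"rgb": 0.9, "depth": 0.85, "thermal": 0.9}  # assumed values

def all_credible(scores: dict, thresholds: dict = THRESHOLDS) -> bool:
    """True only when every modality's credibility meets its threshold."""
    return all(scores[k] >= thresholds[k] for k in thresholds)

def fuse(scores, rgb_features, thermal_features, spatial_features):
    """Fill RGB texture and temperature data into the 3-D model built
    from the depth network's spatial features, or reject the frame."""
    if not all_credible(scores):
        return None  # at least one network flagged a mirror projection
    model = {"geometry": spatial_features}   # base 3-D face model
    model["texture"] = rgb_features          # features from the RGB network
    model["temperature"] = thermal_features  # features from the thermal network
    return model
```

The all-or-nothing gate is what suppresses mirror projections: a reflection may fool one modality, but it must fool all three at once to be filled into the model.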
2. The three-dimensional face imaging method based on thermal imaging projection according to claim 1, wherein preprocessing the marked data sets respectively comprises sequentially performing grayscale conversion, binarization, pixel enhancement, filtering and a morphological closing operation on the RGB marked data set, the depth marked data set and the thermal imaging marked data set, thereby forming three preprocessed data sets.
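The preprocessing chain of claim 2 (grayscale conversion, binarization, morphological closing) can be sketched with NumPy alone. A hedged illustration, in which the BT.601 luma weights, the fixed threshold and the 3×3 structuring element are assumptions, and the pixel-enhancement and filtering steps are omitted for brevity:

```python
import numpy as np

def to_gray(rgb):
    # grayscale conversion with ITU-R BT.601 luma weights (assumed choice)
    return rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114

def binarize(gray, thresh=128):
    # fixed-threshold binarization (the threshold value is an assumption)
    return (gray >= thresh).astype(np.uint8)

def _neighborhood_max(img, pad_value):
    # maximum over each pixel's 3x3 neighborhood; pad_value fills the border
    p = np.pad(img, 1, constant_values=pad_value)
    h, w = img.shape
    shifts = [p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
              for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    return np.max(shifts, axis=0)

def close3(binary):
    # morphological closing = dilation followed by erosion (3x3 element);
    # erosion is computed as the complement of dilating the complement
    dilated = _neighborhood_max(binary, 0)
    return 1 - _neighborhood_max(1 - dilated, 0)
```

In a real pipeline these steps would more likely be delegated to an image library's morphology routines; the point here is only the order of operations named in the claim, with closing last to seal small holes in the binarized face mask.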
3. The three-dimensional face imaging method based on thermal imaging projection according to claim 1, wherein each of the RGB network, the depth network and the thermal imaging network comprises an input layer, an output layer, 8 convolution layers for feature extraction, 8 sampling layers for feature optimization and selection, 2 fully connected layers for feature representation, 1 excitation layer for rapid convergence, and 1 loss layer for calculating the loss value between the output and the target.
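Claim 3 specifies only layer counts and roles, not parameters. A declarative Python sketch of one such network's layer schedule, in which the interleaving order, kernel sizes and channel widths are illustrative assumptions:

```python
# Layer schedule per claim 3: 8 conv (feature extraction), 8 sampling
# (feature selection), 2 fully connected, 1 excitation, 1 loss layer.
# Kernel sizes, channel widths and the conv/sampling interleaving are
# illustrative assumptions; the claim fixes only the counts and roles.

def build_layer_spec():
    layers = [("input", {})]
    for i in range(8):
        layers.append(("conv", {"kernel": 3, "out_channels": 32 * 2 ** (i // 2)}))
        layers.append(("sampling", {"pool": 2}))  # one sampling layer per conv
    layers += [("fc", {"units": 512}), ("fc", {"units": 2})]  # binary credibility head
    layers.append(("excitation", {"fn": "relu"}))  # for rapid convergence
    layers.append(("loss", {"fn": "softmax_cross_entropy"}))
    layers.append(("output", {}))
    return layers
```

The two-unit final fully connected layer reflects the two-class (real face vs. mirror projection) output that the credibility score is derived from.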
4. The three-dimensional face imaging method based on thermal imaging projection according to claim 1, further comprising uniformly calibrating all cameras before acquiring the multi-frame face images in the uniform fill-light environment, wherein the calibration process comprises:
all cameras capturing, in the same acquisition space, a metal cube of uniform temperature to which corner calibration markers are affixed;
converting the camera coordinates of all cameras into world coordinates;
establishing a three-dimensional cube according to the coordinates of the corner calibration points in the depth image acquired by the depth camera;
and positioning and mapping the RGB image acquired by the RGB camera and the thermal imaging image acquired by the infrared camera onto the established three-dimensional cube, using the coordinates of the corner calibration points in the depth image as the reference.
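The coordinate unification at the heart of the calibration in claim 4 is a rigid transform per camera, estimated from the cube's corner markers. A NumPy sketch of the conversion itself; the rotation and translation values below are illustrative, standing in for parameters that calibration would actually produce:

```python
import numpy as np

def camera_to_world(points_cam, R, t):
    # rigid transform X_world = R @ X_cam + t, applied row-wise;
    # R and t would come from calibration against the cube's corner markers
    return points_cam @ R.T + t

# Illustrative extrinsics (assumed): a 90-degree rotation about z plus a shift
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([1.0, 0.0, 0.0])

corner_cam = np.array([[1.0, 0.0, 0.0]])   # a corner point in camera coordinates
corner_world = camera_to_world(corner_cam, R, t)  # -> [[1.0, 1.0, 0.0]]
```

Once every camera's points are expressed in the same world frame, the RGB and thermal images can be registered onto the cube defined by the depth camera's corner coordinates, as the claim describes.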
5. A three-dimensional face imaging system based on thermal imaging projection, characterized by comprising an inspection box, wherein a plurality of cameras are arranged in the inspection box, the plurality of cameras comprising an RGB camera, a depth camera and a thermal imaging camera, each camera being connected with a central processing unit;
the RGB camera, the depth camera and the thermal imaging camera are respectively used for acquiring multiple frames of RGB images, depth images and thermal imaging images of the same human face in a uniform fill-light environment;
the system further comprises an image preprocessing module and a deep learning network model;
the image preprocessing module is used for marking trusted and untrusted areas in the acquired RGB image, depth image and thermal imaging image of the face, marking the real face area in each of the RGB image, the depth image and the thermal imaging image as a trusted area and marking the mirror-projection area of the face in the same image as an untrusted area, thereby forming an RGB marked data set, a depth marked data set and a thermal imaging marked data set; preprocessing the marked data sets respectively; extracting, from the preprocessed data sets, the face outline and the corresponding pixel position information in the trusted region to serve as a forward learning data set; and extracting the object outline and pixel position information of the untrusted region as a corresponding reverse learning data set;
the deep learning network model comprises three independent deep learning networks, namely an RGB network, a depth network and a thermal imaging network;
the RGB image, the RGB marked data set, the RGB forward learning data set and the RGB reverse learning data set are input into the RGB network for deep training, and an RGB credibility is output;
the deep training process comprises: recognizing the position of the face region in the RGB image, performing binary classification on the acquired RGB image according to the learning of the RGB forward learning data set and the RGB reverse learning data set, and calculating the final RGB credibility as a binary label;
according to the same deep training process, the depth image, the depth marked data set, the depth forward learning data set and the depth reverse learning data set are input into the depth network, and a depth credibility is output;
the thermal imaging image, the thermal imaging marked data set, the thermal imaging forward learning data set and the thermal imaging reverse learning data set are input into the thermal imaging network, and a thermal imaging credibility is output; and
when all three obtained credibility values meet their correspondingly set credibility thresholds, the RGB features extracted by the RGB network and the temperature feature data extracted by the thermal imaging network are filled into a three-dimensional face model, wherein the three-dimensional face model is composed of the three-dimensional spatial features extracted by the depth network.
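The binary classification described for the deep training process yields a per-modality credibility score. One common way to obtain such a score, shown here as an assumption rather than the patent's stated formula, is the softmax probability of the "real face" class over the two class logits:

```python
import math

def credibility(logit_real, logit_mirror):
    """Softmax probability of the 'real face' class over the two logits
    (real face vs. mirror projection), used as a credibility score in [0, 1].
    The class ordering and the use of softmax are illustrative assumptions."""
    m = max(logit_real, logit_mirror)  # subtract the max for numerical stability
    e_real = math.exp(logit_real - m)
    e_mirror = math.exp(logit_mirror - m)
    return e_real / (e_real + e_mirror)
```

A score near 1 means the network is confident the region is a real face; a score near 0 means it looks like a mirror projection, so the frame would fail the threshold gate above.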
6. The three-dimensional face imaging system based on thermal imaging projection according to claim 5, wherein an arc-shaped panel is arranged in the inspection box, and a uniform fill light is arranged behind the arc-shaped panel.
7. The three-dimensional face imaging system based on thermal imaging projection of claim 5, wherein the RGB camera comprises an RGB texture camera centrally located in the inspection box;
the depth camera comprises a left binocular camera and a right binocular camera which are arranged at the left side and the right side of the inspection box;
the thermal imaging camera comprises a left thermal imaging camera and a right thermal imaging camera which are arranged on the left side and the right side of the inspection box;
wherein the depth camera and the thermal imaging camera on each side are provided with an attitude control mechanism.
8. An electronic device comprising a processor and a memory for storing computer program instructions, the processor for executing the computer program instructions stored in the memory, which when executed, perform the method steps of any of claims 1-4.
9. A computer storage medium, characterized in that it has stored thereon a computer program which, when executed, implements the method steps of any of claims 1 to 4.
CN202311393671.4A 2023-10-26 2023-10-26 Three-dimensional face imaging method and system based on thermal imaging projection Active CN117152397B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311393671.4A CN117152397B (en) 2023-10-26 2023-10-26 Three-dimensional face imaging method and system based on thermal imaging projection


Publications (2)

Publication Number Publication Date
CN117152397A (en) 2023-12-01
CN117152397B (en) 2024-01-26

Family

ID=88897098

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311393671.4A Active CN117152397B (en) 2023-10-26 2023-10-26 Three-dimensional face imaging method and system based on thermal imaging projection

Country Status (1)

Country Link
CN (1) CN117152397B (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017084428A1 (en) * 2015-11-17 2017-05-26 Nubia Technology Co., Ltd. Information processing method, electronic device and computer storage medium
CN107430500A (en) * 2015-03-20 2017-12-01 Microsoft Technology Licensing, LLC Content enhancement of electronic paper display devices
EP3363348A1 (en) * 2015-10-15 2018-08-22 Daikin Industries, Ltd. Physiological state determination device and physiological state determination method
CN108717696A (en) * 2018-05-16 2018-10-30 Shanghai Yingtong Medical Technology Co., Ltd. Macular image detection method and equipment
CN110197156A (en) * 2019-05-30 2019-09-03 Tsinghua University Deep-learning-based method and device for measuring hand motion and shape from a single image
CN111178130A (en) * 2019-11-25 2020-05-19 Chongqing Terminus Smart Technology Co., Ltd. Face recognition method, system and readable storage medium based on deep learning
CN111476899A (en) * 2020-03-24 2020-07-31 Tsinghua University Three-dimensional reconstruction method for dense texture coordinates of human hand based on single-viewpoint RGB camera
CN111951372A (en) * 2020-06-30 2020-11-17 Chongqing Lingling Huyu Technology Co., Ltd. Three-dimensional face model generation method and equipment
CN112215113A (en) * 2020-09-30 2021-01-12 Zhang Chenglin Face recognition method and device
WO2021113268A1 (en) * 2019-12-01 2021-06-10 Iven Connary Systems and methods for generating of 3d information on a user display from processing of sensor data
US11216663B1 (en) * 2020-12-01 2022-01-04 Pointivo, Inc. Systems and methods for generating of 3D information on a user display from processing of sensor data for objects, components or features of interest in a scene and user navigation thereon
CN113920097A (en) * 2021-10-14 2022-01-11 Electric Power Research Institute of State Grid Henan Electric Power Co. Power equipment state detection method and system based on multi-source image
WO2022089218A1 (en) * 2020-10-28 2022-05-05 Suzhou Qiliu Information Technology Co., Ltd. Machine learning model training method and apparatus, and prediction system
WO2022241874A1 (en) * 2021-05-18 2022-11-24 Yantai IRay Technology Co., Ltd. Infrared thermal imaging monocular vision ranging method and related assembly
CN116071292A (en) * 2022-10-08 2023-05-05 National University of Defense Technology Ophthalmoscope retina image blood vessel identification method based on contrastive generative learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111426393B (en) * 2020-04-07 2021-11-16 Beijing Megvii Technology Co., Ltd. Temperature correction method, device and system
US20210344852A1 (en) * 2020-05-04 2021-11-04 Rebellion Photonics, Inc. Apparatuses, systems, and methods for thermal imaging

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Thermal imaging and visual image face recognition using a vocabulary tree fused with SIFT; Zhang Guoping; Zhou Gaiyun; Ma Li; Video Engineering (Issue 23); full text *

Also Published As

Publication number Publication date
CN117152397A (en) 2023-12-01

Similar Documents

Publication Publication Date Title
KR102319177B1 (en) Method and apparatus, equipment, and storage medium for determining object pose in an image
JP6560480B2 (en) Image processing system, image processing method, and program
CN103927016B (en) Real-time three-dimensional double-hand gesture recognition method and system based on binocular vision
WO2020024684A1 (en) Method and device for modeling three-dimensional scene, electronic device, readable storage medium, and computer apparatus
CN111768452B (en) Non-contact automatic mapping method based on deep learning
WO2019096310A1 (en) Light field image rendering method and system for creating see-through effects
CN109118581A (en) Image processing method and device, electronic equipment, computer readable storage medium
CN111401266B (en) Method, equipment, computer equipment and readable storage medium for positioning picture corner points
WO2020119467A1 (en) High-precision dense depth image generation method and device
CN110400278A (en) A kind of full-automatic bearing calibration, device and the equipment of color of image and geometric distortion
US20120120196A1 (en) Image counting method and apparatus
WO2021017589A1 (en) Image fusion method based on gradient domain mapping
CN109064533B (en) 3D roaming method and system
US8498453B1 (en) Evaluating digital images using head points
CN113128428B (en) Depth map prediction-based in vivo detection method and related equipment
CN112991517B (en) Three-dimensional reconstruction method for texture image coding and decoding automatic matching
CN111079470A (en) Method and device for detecting living human face
CN106360941A (en) Method for locating fingernails and manicure equipment
CN117152397B (en) Three-dimensional face imaging method and system based on thermal imaging projection
CN112509110A (en) Automatic image data set acquisition and labeling framework for land confrontation intelligent agent
CN116612357A (en) Method, system and storage medium for constructing unsupervised RGBD multi-mode data set
US11699303B2 (en) System and method of acquiring coordinates of pupil center point
WO2023272495A1 (en) Badging method and apparatus, badge detection model update method and system, and storage medium
US10339702B2 (en) Method for improving occluded edge quality in augmented reality based on depth camera
CN116125489A (en) Indoor object three-dimensional detection method, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant