CN113570701B - Hair reconstruction method and device - Google Patents

Hair reconstruction method and device

Info

Publication number
CN113570701B
Authority
CN
China
Prior art keywords
pixel point
target
pixel points
hair
image
Prior art date
Legal status
Active
Application number
CN202110788408.XA
Other languages
Chinese (zh)
Other versions
CN113570701A (en)
Inventor
陈春朋
刘帅
许瀚誉
吴连朋
Current Assignee
Juhaokan Technology Co Ltd
Original Assignee
Juhaokan Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Juhaokan Technology Co Ltd filed Critical Juhaokan Technology Co Ltd
Priority to CN202110788408.XA priority Critical patent/CN113570701B/en
Publication of CN113570701A publication Critical patent/CN113570701A/en
Application granted granted Critical
Publication of CN113570701B publication Critical patent/CN113570701B/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/70 Denoising; Smoothing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10048 Infrared image

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the technical field of three-dimensional reconstruction, and provides a hair reconstruction method and device. The method comprises: obtaining an RGB image and a depth image of a target object; extracting second pixel points in the hair area of the target object from the RGB image, and obtaining, among the first pixel points of the depth image, a plurality of candidate pixel points corresponding to the second pixel points according to the mapping relation between the RGB image and the depth image; obtaining a target pixel point set according to a comparison result of the confidence of each of the plurality of candidate pixel points and a preset confidence threshold of the region where the corresponding candidate pixel point is located; and denoising the target pixel points in the target pixel point set to obtain a processed depth image, and reconstructing the hair of the target object according to the processed depth image and the RGB image. In this way, the sense of reality of the hair part of the human body model is improved, the calculation complexity is low, the reconstruction efficiency is high, and the requirement of real-time reconstruction is met.

Description

Hair reconstruction method and device
Technical Field
The application relates to the technical field of three-dimensional reconstruction, in particular to a hair reconstruction method and device.
Background
In the three-dimensional reconstruction process of the human body, firstly, acquisition data of the human body reconstruction are obtained from various sensors, then, the acquisition data are processed by using a three-dimensional reconstruction method, so that three-dimensional information of the human body is obtained, and a human body model is reconstructed. Wherein, the three-dimensional information of human body relates to shape, gesture, material data and the like. In recent years, with the continuous development of imaging technology, a visual three-dimensional reconstruction technology based on an RGBD camera has gradually become a research hotspot.
The reconstruction method can be classified into a real-time reconstruction method and an off-line reconstruction method. The off-line reconstruction method mostly adopts a mode of cooperatively collecting data by a birdcage type multi-camera, uses a multi-view three-dimensional matching or depth information fusion mode to reconstruct a human body model, and uses dozens or hundreds of RGB or RGBD cameras, so that the data collected by the multi-view can be fully fused and mutually complemented, and the point cloud data of a human body and hair with high precision can be obtained. The real-time reconstruction method can reconstruct a human body model at a speed of 20-30 fps under the capability of the existing network bandwidth, however, when the RGBD camera is used for collecting data, due to the influence of the material of the hair, enough real hair point cloud data cannot be obtained, and the reconstruction effect of the hair part is extremely poor.
At present, there are mainly two methods for reconstructing hair. In the first, a deep learning algorithm is used to perform three-dimensional geometric estimation of the human body from an acquired two-dimensional image; because hair information in the two-dimensional image is rich, the hair geometry estimated from the two-dimensional image has almost no obvious data loss and the human body model is highly realistic, but since the reconstruction is completed only from the two-dimensional image, the algorithm complexity is high and the execution efficiency is low, so this method is generally only suitable for off-line reconstruction. In the second, a hairstyle library is established in advance, the hair in the RGB image is identified, and the closest hairstyle is searched for in the established hairstyle library; the reconstruction efficiency is high, but the hairstyle library must be built in advance and the hairstyles in the library cannot completely match the hairstyle of the acquisition object, so the realism of the human body model is poor.
Disclosure of Invention
The embodiment of the application provides a hair reconstruction method and device, which are used for improving the sense of reality of hair parts in a human body model.
In a first aspect, an embodiment of the present application provides a hair reconstruction method, including:
acquiring an RGB image of a target object acquired by an RGB camera and a depth image of the target object, wherein the depth image is obtained from an infrared image acquired by an IR camera, and the confidence of each first pixel point in the depth image is obtained according to the energy integral value of the reflected light wave of the corresponding phase angle received after light waves emitted at a plurality of phase angles irradiate the target object;
Extracting second pixel points in the hair region of the target object from the RGB image, and obtaining a plurality of candidate pixel points corresponding to the second pixel points in each first pixel point according to the mapping relation between the RGB image and the depth image;
obtaining a target pixel point set according to a comparison result of the confidence coefficient of each of the plurality of candidate pixel points and a preset confidence coefficient threshold value of the region where the corresponding candidate pixel point is located;
denoising the target pixel points in the target pixel point set to obtain a processed depth image, and reconstructing the hair of the target object according to the processed depth image and the RGB image.
In a second aspect, an embodiment of the present application provides a reconstruction device, including a communication interface, a display, a memory, and a processor;
the communication interface is connected with the processor and is configured to receive an RGB image of a target object acquired by an RGB camera and a depth image obtained by receiving an infrared image of the target object acquired by an IR camera, wherein the confidence level of each first pixel point in the depth image is obtained according to the energy integral value of reflected light waves of corresponding phase angles received after light waves emitted in a plurality of phase angles irradiate the target object;
The display is connected with the processor and is configured to display the reconstructed hair;
the memory is connected with the processor and is configured to store computer program instructions;
the processor is configured to perform the following operations in accordance with the computer program instructions:
extracting second pixel points in the hair region of the target object from the RGB image, and obtaining a plurality of candidate pixel points corresponding to the second pixel points in each first pixel point according to the mapping relation between the RGB image and the depth image;
obtaining a target pixel point set according to a comparison result of the confidence coefficient of each of the plurality of candidate pixel points and a preset confidence coefficient threshold value of the region where the corresponding candidate pixel point is located;
denoising the target pixel points in the target pixel point set to obtain a processed depth image, and reconstructing the hair of the target object according to the processed depth image and the RGB image.
In a third aspect, embodiments of the present application provide a computer readable storage medium storing computer executable instructions for causing a computer to perform the hair reconstruction method provided by the embodiments of the present application.
In the embodiment of the application, the IR camera generates an infrared image from the light waves reflected after light waves emitted at a plurality of phase angles irradiate the target object, and the confidence of each pixel point in the infrared image is determined according to the energy integral value of the reflected light wave of the corresponding phase angle, so that a depth image is obtained. Second pixel points in the hair area of the target object are extracted from the obtained RGB image; according to the mapping relation between the RGB image and the depth image, a plurality of candidate pixel points corresponding to the second pixel points can be obtained among the first pixel points of the depth image, and target pixel points meeting the confidence requirement are screened out from the plurality of candidate pixel points to obtain a target pixel point set. Further, denoising processing is carried out on the target pixel points in the target pixel point set to obtain a processed depth image, and the hair of the target object is reconstructed based on the RGB image and the processed depth image.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the application, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 schematically illustrates a system architecture diagram provided by an embodiment of the present application;
fig. 2 schematically illustrates a method for measuring depth information according to an embodiment of the present application;
FIG. 3 is a flow chart illustrating a hair reconstruction method according to an embodiment of the present application;
FIG. 4a schematically illustrates an RGB image of the hair of a target object provided by an embodiment of the present application;
FIG. 4b schematically illustrates a depth image of hair of a target object provided by an embodiment of the present application;
fig. 4c illustrates a point cloud image corresponding to a depth image of the hair of a target object according to an embodiment of the present application;
FIG. 4d illustrates a point cloud image after denoising and completion according to an embodiment of the present application;
FIG. 5 schematically illustrates a hair region of a segmentation target object according to an embodiment of the application;
FIG. 6 schematically illustrates a hair data completion process provided by an embodiment of the present application;
FIG. 7 is a flowchart illustrating a complete hair reconstruction method according to an embodiment of the present application;
fig. 8 is a functional block diagram schematically illustrating a reconstruction device according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, embodiments and advantages of the present application more apparent, an exemplary embodiment of the present application will be described more fully hereinafter with reference to the accompanying drawings in which exemplary embodiments of the application are shown, it being understood that the exemplary embodiments described are merely some, but not all, of the examples of the application.
Based on the exemplary embodiments described herein, all other embodiments that may be obtained by one of ordinary skill in the art without making any inventive effort are within the scope of the appended claims. Furthermore, while the present disclosure has been described in terms of an exemplary embodiment or embodiments, it should be understood that each aspect of the disclosure can be practiced separately from the other aspects.
The terms "first," "second," "third" and the like in the description, in the claims and in the above drawings are used for distinguishing between similar objects or entities and not necessarily for describing a particular sequential or chronological order, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments of the application are, for example, capable of operation in sequences other than those illustrated or otherwise described herein.
Furthermore, the terms "comprise" and "have," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to those elements expressly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
In order to clearly describe the embodiments of the present application, explanation is given below for terms in the present application.
1) TOF (abbreviation for Time of Flight, chinese "Time of Flight") camera: the depth image is formed by an infrared light emitter and an infrared camera (hereinafter referred to as an IR camera), wherein the infrared light emitter is used for emitting infrared light waves to a surrounding scene, the IR camera is used for capturing the infrared light waves reflected by a human body, and the distance (namely depth information) from the human body to the camera is calculated according to the reflected light waves, so that the depth image is obtained. The depth imaging error of a ToF camera grows linearly with acquisition distance.
2) RGB camera: RGB information of the surrounding environment is captured, and an RGB image is obtained.
3) Depth (RGBD) camera: a depth measurement function is added to an ordinary RGB camera and is used for collecting RGB images and depth images of a human body, wherein the pixel values of the depth image represent the distance from the currently visible human body to the camera, and the RGB image and the depth image can be mapped to each other pixel by pixel. Because the depth information of the human body in the acquired depth image is relatively dense, the depth image can be conveniently used for reconstructing a dense geometrical model of the human body. According to the imaging principle, depth cameras are mainly divided into three categories: binocular stereo matching cameras, structured light cameras, and ToF cameras. The depth imaging precision of the ToF camera is the highest; the resolution of the binocular stereo matching camera is higher, but its depth precision is lower; the depth imaging precision of the structured light camera lies between the two, its imaging error and the depth value are in an approximately exponentially increasing relation, and its power consumption is low. The depth imaging accuracy of depth cameras generally decreases with increasing acquisition distance.
4) Flying spot noise: in the process of generating a depth image by a ToF camera, a large number of wrong depth measurement values often exist at the edge of a human body, and after a 3D point cloud image is generated, invalid data points flying in the air are visually represented and are called flying spot noise. The main cause of flying spot noise generation is: in the IR camera, each pixel point has a certain physical size, when the depth value of the edge of a human body is measured, a single pixel point can simultaneously receive light rays reflected by the foreground and the background, and the energy generated by the foreground and the background is superimposed, so that the original data acquired by a sensor in the IR camera contains a plurality of distance information. In addition, lens scattering and inter-pixel crosstalk can also cause flying spot noise.
The following describes the design concept of the embodiment of the present application.
Fig. 1 schematically illustrates an application scenario provided by an embodiment of the present application; as shown in fig. 1, when the TOF camera 100 is imaging, an infrared light emitter (not shown in fig. 1) inside the TOF camera emits an infrared modulated light wave of 850nm or 940nm to the surrounding environment, and after receiving the light, the measured human body reflects the light wave into a photosensitive chip of the IR camera to obtain an infrared image. The IR camera calculates the distance (depth information) between the measured human body and the camera according to the time difference between the emitted light wave and the received reflected light wave, i.e. d=time difference t×light speed c, and obtains a depth image. The RGB camera 200 is used for capturing RGB images of a human body under test and providing texture data for human body reconstruction.
Because of the high wave speed of the light waves, it is difficult to directly and accurately measure the time from transmission to reception of the light waves. Fig. 2 schematically illustrates a method for measuring depth information according to an embodiment of the present application; as shown in fig. 2, the time of emitting modulated light waves by the infrared light emitter and the exposure time of the photosensitive chip in the IR camera are calculated by a Timer (Timer), so as to indirectly calculate the propagation time of the light waves and the distance from the human body to the camera.
As shown in fig. 1, there is depth information in each pixel point in the infrared image generated by the photosensitive chip of the IR camera, but due to the influence of the material, color, etc. of the clothing of the human body, each pixel point generates different degrees of amplitude attenuation to the light wave, the computing unit inside the IR camera estimates the confidence level of the corresponding pixel point according to the amplitude attenuation degree, and eliminates the noise point according to the comparison of the confidence level and the confidence level threshold value.
In the conventional ToF camera, since the scene is not fixed, when the confidence threshold is designed, a global threshold is mostly used to filter possible noise points, so that data of some specific scenes (such as hair areas) are missing, and the reconstructed human body model lacks realism. In the process of three-dimensional reconstruction of human body, the sense of reality is the key for determining the quality of three-dimensional reconstruction of human body.
In the field of real-time three-dimensional reconstruction, a hair part of a reconstructed model often has large-area data loss, and the main reasons are as follows: firstly, the hair is made of black reflective materials, and has the light absorption characteristics of dark objects and the reflective characteristics of reflective objects; secondly, the existing TOF camera does not consider semantic information of an actual human body in imaging, a global confidence threshold is used for filtering when a camera chip senses light, and the confidence threshold is low due to light absorption and light reflection of hair parts, so that a large amount of data loss can occur. And the hair is positioned close to the face, so that the user can pay attention to the hair relative to other parts, and if the hair is absent, the aesthetic feeling and the authenticity of the human body reconstruction can be seriously affected.
Based on the above analysis, the embodiment of the application provides a hair reconstruction method and device. Because three-dimensional reconstruction of a human body is performed, the use scene of the ToF camera is determined; based on this prior condition, the hair region of the target object is identified from the RGB image of the target object acquired by the RGB camera, the mapping relation between the RGB image and the IR image obtained by calibrating the cameras is used to determine the hair pixel points corresponding to the hair region in the depth image, and confidence thresholds for the pixel points of different head regions are set in the ToF camera. A target pixel point set for reconstructing the hair of the target object is obtained according to the comparison of the confidence values of the hair pixel points in the depth image with these confidence thresholds, a denoised depth image is obtained by denoising the target pixel points in the target pixel point set, and the hair of the target object is reconstructed according to the RGB image and the denoised depth image. Further, in order to ensure the reconstruction quality of the hair, data completion is performed on the hair holes in the denoised depth image, so that the data integrity of the hair part is improved and the authenticity of the human body model is further improved.
FIG. 3 is a flow chart illustrating a hair reconstruction method according to an embodiment of the present application; as shown in fig. 3, the method is performed by the reconstruction device and mainly comprises the following steps:
s301: an RGB image of a target object and a depth image of the target object are acquired.
In this step, the reconstruction device acquires the RGB image of the target object acquired by the RGB camera, as shown in fig. 4a, where the RGB image contains abundant human semantic information and has little data loss, and may be used to obtain the hair region of the target object and provide texture data for model reconstruction.
In S301, the reconstruction device may further acquire a depth image, as shown in fig. 4b, where the depth image is obtained from an infrared image of the target object acquired by the IR camera, and the confidence level of each first pixel point in the depth image is obtained according to energy integration values of reflected light waves of corresponding phase angles received after light waves emitted at a plurality of phase angles are irradiated to the target object. The process of obtaining the depth image by using the infrared image may be performed by an IR camera or by a reconstruction device, and may be specifically set according to a processor configuration of an actual device.
In the implementation, a light source of an infrared light emitter in the TOF camera is modulated, so that the infrared light emitter emits light waves with the frequency f to a target object according to a plurality of phase angles, and after the light waves irradiate the target object, the target object reflects the reflected light waves with corresponding phase angles to a photosensitive chip of the IR camera according to the light reflection principle, so that an infrared image is generated. And, the IR camera integrates the reflected light waves of a plurality of phase angles respectively to obtain the energy integrated value of the reflected light waves of the corresponding phase angles. The IR camera determines a phase difference of the emitted light wave and the reflected light wave from energy integrated values of the reflected light wave at a plurality of phase angles. Optionally, the energy integral values of the reflected light waves with the multiple phase angles may be respectively denoted as r0, r1, r2, and r3, and the calculation formulas of the phase differences are as follows:
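A hedged reconstruction of the phase-difference expression, assuming the standard four-phase ToF sampling with the energy integral values r0, r1, r2, r3 defined above:

φ = arctan( (r3 − r1) / (r0 − r2) )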
Further, the IR camera determines the depth information of each pixel point in the infrared image according to the phase difference, the speed of light (c = 299792458 m/s) and the modulation frequency f; the calculation formula is as follows:
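Assuming the usual ToF range relation for a round trip at modulation frequency f, the omitted expression presumably takes the form:

d = c · φ / (4π · f)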
According to the energy integral values of the reflected light waves at the four phase angles, the confidence of each pixel point in the infrared image can also be determined; the calculation formula is as follows:
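Assuming the confidence corresponds to the amplitude of the reflected signal, as is standard for four-phase sampling, the omitted expression presumably takes the form:

A = √( (r3 − r1)² + (r0 − r2)² ) / 2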
the lower the reflectivity of the target object is, the smaller the A value is, namely, the lower the confidence is, which indicates that the degree of data deletion is more serious.
A depth image is generated based on the depth information and the confidence of each pixel point in the infrared image. Specifically, if the confidence of a pixel point is greater than or equal to a preset threshold value, the depth information of the pixel point is valid, and the pixel value of the pixel point is the depth value; if the confidence of a pixel point is smaller than the preset threshold value, the depth information of the pixel point is invalid, and the pixel point is marked as an invalid pixel point in the depth image. In this way the depth image corresponding to the infrared image is generated. Visually, the invalid pixel points appear as data holes in the depth image.
In the embodiment of the application, in order to improve the validity of the depth data, the infrared light emitter is set as a multi-frequency infrared emitter. The multi-frequency infrared emitter alternately emits sinusoidal light waves with a plurality of modulation frequencies to the target object at the four phase angles of 0 degrees, 90 degrees, 180 degrees and 270 degrees, and for the sinusoidal light waves of the plurality of modulation frequencies emitted at each phase angle, the reflected light waves of the plurality of modulation frequencies reflected by the target object are weighted according to preset frequency weights, so that the energy integral value obtained after weighting the reflected light waves of the plurality of modulation frequencies of the corresponding phase angle is obtained. The depth information and the confidence of each pixel point are then calculated from the four weighted energy integral values. Optionally, the modulation frequencies of the sine waves in the embodiment of the application may be set to 20 MHz, 60 MHz, 80 MHz and 100 MHz.
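Purely as an illustration of the multi-frequency weighting and the per-pixel depth/confidence computation described above, the following Python sketch can be considered; the array layout, the frequency weights, the effective-frequency simplification (per-frequency phase unwrapping is omitted) and all names are assumptions rather than details fixed by the application.

import numpy as np

C = 299792458.0  # speed of light in m/s, as given in the description

def depth_and_confidence(integrals, freqs, freq_weights):
    # integrals[f] = (r0, r1, r2, r3): per-pixel energy integral arrays for
    # modulation frequency f, sampled at phase angles 0, 90, 180, 270 degrees.
    shape = integrals[freqs[0]][0].shape
    r = [np.zeros(shape) for _ in range(4)]
    for f, w in zip(freqs, freq_weights):
        for k in range(4):
            r[k] = r[k] + w * integrals[f][k]          # weighted multi-frequency integrals
    phase = np.mod(np.arctan2(r[3] - r[1], r[0] - r[2]), 2 * np.pi)  # phase difference
    f_eff = float(np.average(freqs, weights=freq_weights))           # assumed effective frequency
    depth = C * phase / (4 * np.pi * f_eff)                          # per-pixel distance
    confidence = 0.5 * np.sqrt((r[3] - r[1]) ** 2 + (r[0] - r[2]) ** 2)  # amplitude A
    return depth, confidence

# Example use with the modulation frequencies mentioned above (equal weights assumed):
# depth, conf = depth_and_confidence(integrals, [20e6, 60e6, 80e6, 100e6], [0.25] * 4)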
S302: and extracting second pixel points in the hair region of the target object from the RGB image, and obtaining a plurality of candidate pixel points corresponding to the second pixel points in each first pixel point according to the mapping relation between the RGB image and the depth image.
In the step, real-time human semantic segmentation is carried out on the acquired RGB image, the hair region of the target object is identified, and the second pixel point in the hair region is extracted.
In the embodiment of the application, the algorithm for segmenting the hair region in real time based on the RGB image is not limited, and includes but is not limited to an HLNet algorithm and a hair-segment algorithm. In the hair-segment algorithm, the motion of a person in a video stream is considered to have space-time consistency, i.e. the hair positions in the previous frame and the current frame cannot be changed drastically.
In S302, according to the prior knowledge of the hair-segment algorithm, the reconstruction device fuses the result of the segmentation mask in the previous RGB image into the three color channels of the currently acquired RGB image to form four channel data, and based on the fused four channel data, segments the hair region of the target object by using the deep neural network algorithm, where the segmentation process is shown in fig. 5.
In S302, when the hair region is segmented based on the obtained RGB image, the mask information of the hair in the previous RGB image is included in addition to the RGB three-channel color information in the current RGB image, so that the mask position of the hair in the current RGB image can be rapidly predicted based on the mask information of the hair in the previous RGB image, thereby completing the real-time segmentation. Based on experimental data, the reconstruction device can realize hair segmentation of 30-100fps by using the method, and the real-time performance is high.
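As an illustration of the four-channel input described above, a minimal sketch is given below; the segmentation network itself (an HLNet- or hair-segmentation-style model) is treated as an assumed callable.

import numpy as np

def segment_hair(rgb, prev_mask, model):
    # Stack the three color channels of the current frame with the hair mask
    # of the previous frame to form the four-channel input (assumed HxWx4 layout).
    prev = prev_mask.astype(rgb.dtype)[..., None]
    four_channel = np.concatenate([rgb, prev], axis=-1)
    return model(four_channel)   # hair mask of the current RGB image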
The second pixel point in the RGB image cannot directly correspond to the first pixel point in the depth image because the IR camera and the RGB camera have different resolutions and different viewing-angle ranges. Therefore, in the embodiment of the application, the RGB camera and the IR camera are calibrated in advance to obtain the internal parameters of the IR camera, the internal parameters of the RGB camera, and the external-parameter conversion matrix between the IR camera coordinate system and the RGB camera coordinate system, and the mapping relation between the second pixel point in the RGB image and the first pixel point in the depth image is determined through these three parameters. Based on the mapping relation, the reconstruction device obtains, among the first pixel points, a plurality of candidate pixel points corresponding to the second pixel point.
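One possible realization of this mapping is sketched below: every valid first pixel point of the depth image is back-projected with the IR intrinsics, transformed with the extrinsic matrix, and projected into the RGB image with the RGB intrinsics, and the first pixel points that land on hair pixels are collected as candidates. The matrix names (K_ir, K_rgb, T_ir_to_rgb) and the hair_mask array are assumptions for illustration.

import numpy as np

def candidate_hair_pixels(depth, hair_mask, K_ir, K_rgb, T_ir_to_rgb):
    candidates = []
    h, w = depth.shape
    for v in range(h):
        for u in range(w):
            z = depth[v, u]
            if z <= 0:                       # invalid depth value / hole
                continue
            # Back-project the depth pixel to a 3D point in the IR camera frame.
            x = (u - K_ir[0, 2]) / K_ir[0, 0] * z
            y = (v - K_ir[1, 2]) / K_ir[1, 1] * z
            p = T_ir_to_rgb @ np.array([x, y, z, 1.0])
            # Project into the RGB image plane with the RGB intrinsics.
            u_rgb = int(round(K_rgb[0, 0] * p[0] / p[2] + K_rgb[0, 2]))
            v_rgb = int(round(K_rgb[1, 1] * p[1] / p[2] + K_rgb[1, 2]))
            if (0 <= v_rgb < hair_mask.shape[0] and 0 <= u_rgb < hair_mask.shape[1]
                    and hair_mask[v_rgb, u_rgb]):
                candidates.append((u, v))    # first pixel point mapped to a hair pixel
    return candidates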
S303: and obtaining a target pixel point set according to the comparison result of the confidence coefficient of each of the plurality of candidate pixel points and the preset confidence coefficient threshold value of the region where the corresponding candidate pixel point is located.
In the embodiment of the application, the confidence threshold value of the pixel points in different hair areas in the depth image is set according to the mapping relation between the second pixel points of the hair areas in the RGB image and the first pixel points in the depth image. Setting a first confidence threshold for the middle hair region and the hair region near the face to preserve rich hair data; and setting a second confidence threshold for the hair edge area, and filtering possible flying spot noise. The first confidence coefficient threshold value and the second confidence coefficient threshold value can be set according to actual requirements, and the first confidence coefficient threshold value is smaller than the second confidence coefficient threshold value.
In S303, for any one candidate pixel point i of the plurality of candidate pixel points, the reconstruction device determines a hair region where the candidate pixel point i is located according to coordinates of the candidate pixel point i, compares the confidence coefficient a of the candidate pixel point i with the confidence coefficient threshold dA of the hair region where the candidate pixel point i is located, if a is greater than dA, indicates that the candidate pixel point is valid hair data, determines the candidate pixel point i as a target pixel point until all the candidate pixel points are traversed, and then obtains a target pixel point set.
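A short sketch of this screening step; the region_of helper (which labels a candidate as belonging to, for example, the interior hair region, the near-face region or the hair edge) and the per-region thresholds are assumptions for illustration.

def select_target_pixels(candidates, confidence, region_of, thresholds):
    targets = []
    for (u, v) in candidates:
        region = region_of(u, v)              # e.g. 'interior', 'near_face', 'edge'
        if confidence[v, u] > thresholds[region]:
            targets.append((u, v))            # valid hair data is kept
    return targets

# Example thresholds: a lower value keeps rich hair data in the interior and near-face
# regions, a higher value filters possible flying spots at the hair edge (assumed values):
# thresholds = {'interior': 30, 'near_face': 30, 'edge': 80}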
The point cloud image corresponding to the depth image after the target pixel point set is screened is shown in fig. 4c, and as shown in fig. 4c, flying spot noise generated by invalid hair data exists in the depth image, and denoising processing is needed.
S304: denoising the target pixel points in the target pixel point set to obtain a processed depth image, and reconstructing the hair of the target object according to the processed depth image and the RGB image.
In this step, since the confidence threshold set for the hair area is low, enough valid hair data is retained, but more noise also appears in the hair area. In order to ensure the quality of the depth image, denoising is carried out on the target pixel points in the target pixel point set, so that the noise of the hair area is suppressed.
The noise in the hair area is mainly due to insufficient energy of reflected light wave received by the IR camera, so the noise in the hair area is basically flying spot noise, and methods for removing the flying spot noise include but are not limited to statistical filtering, gaussian filtering, bilateral filtering and the like. The algorithm idea of the statistical filtering is as follows: a statistical analysis is performed on the neighborhood of each point, and some points which do not meet certain standards are removed. The following describes the flying spot noise removal process by taking statistical filtering as an example.
In S304, it is assumed that the average distance between the target pixel and the pixels in the neighborhood is subject to gaussian distribution, and the shape is determined by the mean and standard deviation. For each target pixel point j in the target pixel point set, determining the average distance between the target pixel point j and the pixel points in the first preset adjacent area, judging whether the average distance corresponding to the target pixel point j is in the first preset distance interval, if not, indicating that the target pixel point j is likely to be flying spot noise, and eliminating the target pixel point j from the target pixel point set. And obtaining the depth image after denoising after traversing the target pixel point set.
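A minimal sketch of the statistical filtering just described, assuming the target pixel points have already been converted to a 3D point cloud; the neighborhood size k and the standard-deviation multiplier are illustrative parameters, not values specified by the application.

import numpy as np
from scipy.spatial import cKDTree

def remove_flying_points(points, k=8, std_ratio=1.0):
    # points: (N, 3) array of 3D points belonging to the target pixel point set.
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=k + 1)        # neighbour 0 is the point itself
    mean_dist = dists[:, 1:].mean(axis=1)         # average distance to the k neighbours
    mu, sigma = mean_dist.mean(), mean_dist.std()
    # Keep points whose average neighbour distance lies inside the preset interval.
    keep = (mean_dist >= mu - std_ratio * sigma) & (mean_dist <= mu + std_ratio * sigma)
    return points[keep]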
After the denoised depth image is obtained, in S304, reconstruction data such as hair surface geometry, hair surface direction, etc. are extracted based on the depth image, and texture data of hair including color values of pixel points in the hair region are extracted from the RGB image, and a hair model of the target object is reconstructed based on the extracted reconstruction data and texture data.
Considering that the material and color of the hair can affect the authenticity of the three-dimensionally reconstructed human body model, the human body semantic information in the RGB image is fully utilized: the human body is subjected to hair segmentation, a plurality of second pixel points in the hair region are extracted based on the segmentation result, a plurality of candidate pixel points corresponding to the second pixel points in the depth image are obtained based on the mapping relation between the RGB image and the depth image, the target pixel points are screened out from the plurality of candidate pixel points through the set confidence threshold of the hair region, and the flying spot noise among the target pixel points is removed, so that the quality of the depth image is improved and the authenticity of the hair reconstruction is further improved. The embodiment of the application uses lightweight image segmentation and filtering algorithms with a small amount of calculation; the single-frame algorithm takes less than 10 ms, which ensures that the reconstruction device can finish the reconstruction of the hair at a speed of 30 fps with strong real-time performance.
In some embodiments, after removing the flying spot noise from the target pixel point set, the depth image may still visually contain many data holes caused by missing hair data, so that the authenticity of the reconstructed hair is lower. Since the application mainly aims at hair reconstruction, what matters is whether the data of the hair part looks true and smooth; therefore, for the missing hair data, the data of the hole area can be inferred from the existing hair data so as to complete the hair data. Fig. 4d illustrates a point cloud image obtained by completing the hair data in the denoised depth image according to the embodiment of the present application; the hair data illustrated in fig. 4d is more complete than in the point cloud image illustrated in fig. 4c. Further, the completed hair data is smoothed, so that the reconstructed hair is complete, smooth and real.
In the implementation, after eliminating the target pixel points with average distance not in the first preset distance interval from the target pixel point set, determining the pixel points in the hair region missing in the target pixel point set according to the second pixel points and the target pixel points in the target pixel point set.
For example, assume that the first pixel points corresponding to the second pixel points are 100 points {z1', z2', z3', z4', ..., z100'}, while the target pixel points in the target pixel point set are 60 points {z1', z3', z6', z7', ..., z99'}; the missing pixel points are then the remaining 40 points {z2', z4', z5', ...}.
Further, the target pixel point set is complemented according to the pixel points in the second preset neighborhood of the missing pixel points. Specifically, for each missing pixel point, counting the number of pixel points with depth information in a second preset adjacent area of the pixel point, if the counted number is larger than a preset threshold value, carrying out weighted calculation on the pixel points with the depth information according to the set pixel weight to obtain the depth information of the missing pixel point, and storing the pixel points into a target pixel point set. The depth information of the missing pixel points is as follows:
d_fill = Σ_{i=1}^{n} w_i · d_i, wherein d_fill represents the depth information of the missing pixel point, n is the number of pixel points with depth information in the second preset neighborhood, w_i is the pixel weight corresponding to the i-th pixel point, and d_i is the depth information of the i-th pixel point.
For example, assume that the second preset neighborhood is the neighborhood of the 8 pixel points nearest to the missing pixel point k, and the 8 pixel points in the second preset neighborhood are respectively marked as k1-k8. If depth information exists in more than 2 of the 8 pixel points (for example k1, k3 and k6), the missing pixel point k can be completed based on the three pixel points k1, k3 and k6, and the depth information of the missing pixel point k after completion is d_k = w_k1·d_k1 + w_k3·d_k3 + w_k6·d_k6.
In the above embodiment of the present application, by using the mapping relationship between the RGB image and the depth image, which first pixels in the depth image are valid hair pixels and which are missing pixels (holes) of the hair area are determined by using the second pixels of the hair area in the RGB image, and the missing pixels in the hair area are complemented with the mapping relationship as a priori knowledge, so as to obtain complete hair data, thereby improving the integrity and the authenticity of the reconstructed hair.
It should be noted that, when the number of pixels with depth information in the second preset neighborhood of the missing pixel k is smaller than the preset threshold, the missing pixel k is stored into the to-be-complemented queue, and when the number of pixels with depth information in the second preset neighborhood meets the threshold requirement, the depth information is complemented.
As shown in fig. 6, the open circles indicate missing pixel points and the filled circles indicate pixel points having depth information (i.e., valid hair data). There are only two valid pixel points in the second preset neighborhood of the missing pixel point B (indicated by the thick solid line), which does not satisfy the completion condition, so the missing pixel point B is stored in the to-be-completed queue. There are three valid pixel points in the second preset neighborhood of the missing pixel point A (indicated by the thin solid line), which satisfies the completion condition, so the missing pixel point A can be completed according to the three valid pixel points. The completed pixel point A can then be used as a valid pixel point in the second preset neighborhood of pixel point B; at this time, the missing pixel point B satisfies the completion condition, so the missing pixel point B can be completed.
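A sketch of the queue-based completion illustrated by fig. 6: a missing pixel point that does not yet have enough valid neighbours waits in the to-be-completed queue until earlier completions make it fillable. The neighbors_of and weight_of helpers and the min_valid parameter are assumptions for illustration.

import numpy as np
from collections import deque

def complete_hair_holes(depth, missing, neighbors_of, weight_of, min_valid=3):
    queue = deque(missing)
    stalled = 0
    while queue and stalled < len(queue):
        u, v = queue.popleft()
        valid = [(uu, vv) for (uu, vv) in neighbors_of(u, v) if depth[vv, uu] > 0]
        if len(valid) >= min_valid:
            w = np.array([weight_of((u, v), n) for n in valid], dtype=float)
            w /= w.sum()                              # normalise the pixel weights
            depth[v, u] = float(sum(wi * depth[vv, uu] for wi, (uu, vv) in zip(w, valid)))
            stalled = 0                               # progress made, retry deferred pixels
        else:
            queue.append((u, v))                      # store in the to-be-completed queue
            stalled += 1
    return depth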
In the embodiment of the application, the setting modes of the pixel weights mainly comprise the following two modes:
Mode one
And determining the pixel weight corresponding to the pixel point with the depth information according to the first distance between the pixel point with the depth information in the second preset adjacent area and the missing pixel point, wherein the pixel weight is positively correlated with the first distance.
Still taking fig. 6 as an example, when the missing pixel point A is completed, the pixel point C in the second preset neighborhood is closest to the missing pixel point A and the pixel point E is farthest from the missing pixel point A, and the pixel weights corresponding to the pixel points C, D, E are set such that w_C > w_D > w_E.
Mode two
Corresponding pixel weights are set according to whether the pixel points with depth information in the second preset neighborhood are original target pixel points in the target pixel point set or completed target pixel points, wherein the pixel weight of an original target pixel point is larger than that of a completed target pixel point.
Still taking fig. 6 as an example, when the missing pixel point B is completed, the pixel points F and G in the second preset neighborhood are original valid data (i.e., original target pixel points) and the pixel point A is a completed target pixel point; the pixel weight of pixel point A is therefore set smaller than the pixel weights of pixel points F and G.
In some embodiments, after the completed target pixel set is obtained by completing all the missing pixels, since the depth information of the missing pixels is weighted by the depth information of the adjacent pixels, the depth information of the target pixels after partial completion is not smooth, and a gaussian smoothing algorithm can be used to smooth the target pixels after completion, so that the reconstructed hair is smoother and more real.
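A small sketch of this smoothing step, assuming the completed pixel positions are available as a boolean mask; the kernel size and sigma are illustrative values.

import numpy as np
import cv2

def smooth_completed_pixels(depth, completed_mask, ksize=5, sigma=1.0):
    blurred = cv2.GaussianBlur(depth.astype(np.float32), (ksize, ksize), sigma)
    out = depth.astype(np.float32).copy()
    out[completed_mask] = blurred[completed_mask]   # smooth only the completed target pixels
    return out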
In other embodiments, when filtering flying spot noise in S304, the euclidean distance between the surrounding adjacent points is taken as a basis for removing, which is a global noise filtering method, and there may be residual noise. In order to better remove noise, a clean depth image is obtained, and human body priori information can be added: the hair is not too far from the face.
In the implementation, when the second pixel points in the hair region are extracted from the RGB image, face pixel points can also be extracted from the RGB image, and the face pixel points corresponding to them in the depth image are determined according to the mapping relation between the RGB image and the depth image. Further, for the first pixel point corresponding to each second pixel point, a second distance between the first pixel point and the nearest face pixel point is determined; if the determined second distance is not within the second preset distance interval, the first pixel point is regarded as noise and is removed from the target pixel point set, thereby further suppressing the noise of the hair part.
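A sketch of this distance check; the backproject helper, which lifts a depth pixel to a 3D point in camera coordinates, and the interval bounds are assumptions for illustration.

import numpy as np
from scipy.spatial import cKDTree

def reject_far_from_face(hair_pixels, face_pixels, depth, backproject,
                         min_dist=0.0, max_dist=0.15):
    face_xyz = np.array([backproject(p, depth) for p in face_pixels])
    tree = cKDTree(face_xyz)
    kept = []
    for p in hair_pixels:
        d, _ = tree.query(backproject(p, depth), k=1)   # second distance to nearest face point
        if min_dist <= d <= max_dist:                   # second preset distance interval (assumed)
            kept.append(p)
    return kept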
Based on the target pixel point set after noise elimination, a clean depth image is obtained, based on the clean depth image, reconstruction data such as hair surface geometry, hair surface direction and the like are extracted, and the hair model of a target object is reconstructed by combining the texture data of the hair extracted from the RGB image, so that the authenticity of the hair is improved.
Fig. 7 shows a flowchart of a complete hair reconstruction method according to an embodiment of the present application; as shown in fig. 7, the method mainly includes the following steps:
s701: the RGB camera captures an RGB image of the target object.
In the step, the RGB image contains abundant human semantic information, the hair region of the target object can be obtained by segmentation based on the RGB image, and the RGB image also contains the texture data of the hair.
S702: the infrared light emitter alternately emits sinusoidal light waves of a plurality of modulation frequencies to the target object at four phase angles.
In this step, the magnitudes of the four phase angles and the magnitudes of the modulation frequency are referred to in the previous embodiment and are not repeated here.
S703: and the IR camera receives the reflected light waves of a plurality of modulation frequencies corresponding to the four phase angles reflected by the target object, and an infrared image is obtained.
S704: the IR camera weights the reflected light waves of a plurality of modulation frequencies according to preset frequency weights, and respectively obtains energy integral values of the reflected light waves of four phases.
A detailed description of this step is referred to S301 and is not repeated here.
S705: the IR camera determines depth information and confidence of each pixel point in the infrared image based on energy integral values of reflected light waves of four phases, and obtains a depth image.
In this step, the calculation of the depth information and the confidence of each pixel point is described in S301 and is not repeated here.
S706: the reconstruction device acquires an RGB image and a depth image.
S707: the reconstruction device extracts second pixel points in the hair region of the target object and face pixel points in the face region from the RGB image.
In this step, the division of the face area is similar to the division method of the hair area, see S302, and is not repeated here.
S708: the reconstruction device obtains an initial candidate pixel point set corresponding to the second pixel point in each first pixel point in the depth image and a reference pixel point set corresponding to the face pixel point in the RGB image in each first pixel point in the depth image according to the mapping relation between the RGB image and the depth image.
A detailed description of this step is referred to S302 and is not repeated here.
S709: the reconstruction device determines a second distance between an initial candidate pixel point in the initial candidate pixel point set and a nearest reference pixel point in the reference pixel point set.
In the step, according to the prior information of the human body: the hair is not far from the face, a second distance between the initial candidate pixel point and the reference pixel point can be determined, and invalid initial candidate pixel points are eliminated based on the second distance, namely invalid hair data are eliminated.
S710: the reconstruction device determines whether the second distance is within a second preset distance interval, if so, then S711 is performed, otherwise the initial candidate pixel point is culled.
In the step, whether the initial candidate pixel point is noise is determined according to whether the second distance is in a second preset distance range, if so, the hair data indicating that the initial candidate pixel point is effective should be reserved, and if not, the hair data indicating that the initial candidate pixel point is ineffective should be removed.
S711: the reconstruction device reserves the initial candidate pixel points to obtain a target candidate pixel point set.
S712: the reconstruction device determines whether the respective confidence levels of the target candidate pixels in the target candidate pixel set are greater than a preset confidence threshold value of the region where the corresponding target candidate pixel is located, and if so, executes S713, otherwise, eliminates the target candidate pixel.
In this step, according to the mapping relationship between the second pixel point of the hair region in the RGB image and the first pixel point in the depth image, the confidence threshold of the pixel points in different hair regions in the depth image is set, and the specific setting mode is referred to S303, and is not repeated here.
S713: the reconstruction device reserves target candidate pixel points to obtain a target pixel point set.
S714: the reconstruction device performs denoising processing on the target pixel points in the target pixel point set to obtain a denoised depth image.
A detailed description of this step is referred to S304 and is not repeated here.
S715: and the reconstruction equipment complements the pixel points with the missing hair part in the depth image according to the target pixel point set and the extracted second pixel points, and obtains the complemented depth image.
A detailed description of this step is referred to the previous embodiments and will not be repeated here.
S716: the reconstruction device extracts the reconstruction data of the hair from the complemented depth image, extracts the texture data of the hair from the RGB image, and reconstructs the hair of the target object according to the extracted reconstruction data and texture data.
A detailed description of this step is referred to S304 and is not repeated here.
It should be noted that, the reconstruction device in the embodiment of the present application includes, but is not limited to, a terminal with a man-machine interaction function, such as a smart television, a VR/AR device, a smart phone, a notebook computer, and the like.
Based on the same technical conception, the embodiment of the application provides a reconstruction device which can realize the hair reconstruction method in the previous embodiment and can achieve the same technical effects.
Referring to fig. 8, the reconstruction device comprises a communication interface 801, a display 802, a memory 803, and a processor 804, wherein the communication interface 801, the display 802, and the memory 803 are respectively connected with the processor 804 through a bus (indicated by a double-headed arrow in fig. 8), the communication interface 801 is configured to receive an RGB image of a target object acquired by an RGB camera, and a depth image obtained by receiving an infrared image of the target object acquired by an IR camera, wherein a confidence level of each first pixel point in the depth image is obtained according to an energy integral value of a reflected light wave of a corresponding phase angle received after light waves emitted at a plurality of phase angles are irradiated to the target object; the display 802 is configured to display reconstructed hair; the memory 803 is configured to store computer program instructions; the processor 804 is configured to perform the following operations in accordance with the computer program instructions:
Extracting second pixel points in the hair region of the target object from the RGB image, and obtaining a plurality of candidate pixel points corresponding to the second pixel points in each first pixel point according to the mapping relation between the RGB image and the depth image;
obtaining a target pixel point set according to the comparison result of the confidence coefficient of each of the plurality of candidate pixel points and the preset confidence coefficient threshold value of the region where the corresponding candidate pixel point is located;
denoising the target pixel points in the target pixel point set to obtain a processed depth image, and reconstructing the hair of the target object according to the processed depth image and the RGB image.
Optionally, when the light wave is a sinusoidal light wave alternately emitted by light waves with a plurality of modulation frequencies, the energy integral value of the reflected light wave with a corresponding phase angle is an energy integral value obtained by weighting the reflected light waves with a plurality of modulation frequencies reflected by the target object according to a preset frequency weight after the sinusoidal light waves with a plurality of modulation frequencies are alternately emitted to the target object.
Optionally, the processor 804 is specifically configured to:
for each target pixel point in the target pixel point set, determining the average distance between the target pixel point and the pixel point in the first preset adjacent area;
And if the average distance corresponding to the target pixel point is not in the first preset distance interval, eliminating the target pixel point from the target pixel set.
Optionally, the processor 804 is further configured to:
determining the pixel points in the hair region which are missing in the target pixel point set according to the second pixel points and the target pixel points in the target pixel point set;
and according to the pixel points in the second preset neighborhood of the missing pixel points, complementing the target pixel point set.
Optionally, the processor 804 is specifically configured to:
if the number of the pixels with depth information in the pixels in the second preset neighborhood is larger than a preset threshold, weighting calculation is carried out on the pixels with the depth information according to the set pixel weight, the depth information of the missing pixels is obtained, and the depth information is stored in the target pixel set.
Optionally, the processor 804 sets the pixel weights by:
determining a pixel weight corresponding to the pixel point with depth information according to a first distance between the pixel point with the depth information and the missing pixel point, wherein the pixel weight is positively correlated with the first distance; or alternatively
And setting corresponding pixel weights according to whether the pixel points with depth information are original target pixel points or completed target pixel points in the target pixel point set, wherein the pixel weights of the original target pixel points are larger than those of the completed target pixel points.
Optionally, the processor 804 is further configured to:
face pixel points are extracted from the RGB image, and corresponding face pixel points of the face pixel points in the depth image are determined according to the mapping relation between the RGB image and the depth image;
and determining a second distance between the first pixel point corresponding to the second pixel point and the nearest face pixel point, and eliminating the first pixel point of which the second distance is not in a second preset distance interval.
Embodiments of the present application also provide a computer readable storage medium storing instructions that, when executed, perform the method of the previous embodiments.
The embodiment of the application also provides a computer program product for storing a computer program for executing the method of the previous embodiment.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present application, not to limit it. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application.
The foregoing description has been presented, for purposes of explanation, in conjunction with specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and their practical application, thereby enabling others skilled in the art to best utilize the embodiments, with such modifications as are suited to the particular use contemplated.

Claims (8)

1. A hair reconstruction method, comprising:
acquiring an RGB image of a target object acquired by an RGB camera and a depth image of the target object, wherein the depth image is obtained from an infrared image acquired by an IR camera, and the confidence of each first pixel point in the depth image is obtained according to the energy integral values of the reflected light waves of corresponding phase angles received after light waves emitted at a plurality of phase angles are irradiated onto the target object; when the light waves are sinusoidal light waves alternately emitted at a plurality of modulation frequencies, the energy integral value of the reflected light wave of a corresponding phase angle is the energy integral value obtained by weighting, according to preset frequency weights, the reflected light waves of the plurality of modulation frequencies reflected by the target object after the sinusoidal light waves of the plurality of modulation frequencies are alternately emitted toward the target object;
extracting second pixel points in the hair region of the target object from the RGB image, and obtaining, from among the first pixel points, a plurality of candidate pixel points corresponding to the second pixel points according to the mapping relation between the RGB image and the depth image;
obtaining a target pixel point set according to a comparison between the confidence of each of the plurality of candidate pixel points and a preset confidence threshold of the region in which that candidate pixel point is located;
denoising the target pixel points in the target pixel point set to obtain a processed depth image, and reconstructing the hair of the target object according to the processed depth image and the RGB image.
2. The method of claim 1, wherein denoising the target pixel points in the target pixel point set comprises:
for each target pixel point in the target pixel point set, determining the average distance between the target pixel point and the pixel points in a first preset neighborhood;
and if the average distance corresponding to the target pixel point is not within a first preset distance interval, removing the target pixel point from the target pixel point set.
3. The method of claim 2, further comprising, after removing from the target pixel point set the target pixel points whose average distance is not within the first preset distance interval:
determining which pixel points of the hair region are missing from the target pixel point set according to the second pixel points and the target pixel points in the target pixel point set;
and completing the target pixel point set according to the pixel points within a second preset neighborhood of each missing pixel point.
4. The method of claim 3, wherein completing the target pixel point set according to the pixel points within the second preset neighborhood of each missing pixel point comprises:
if the number of pixel points carrying depth information within the second preset neighborhood is greater than a preset threshold, performing a weighted calculation on those pixel points according to the set pixel weights to obtain the depth information of the missing pixel point, and storing the depth information in the target pixel point set.
5. The method of claim 4, wherein the pixel weights are set by:
determining the pixel weight of a pixel point with depth information according to a first distance between that pixel point and the missing pixel point, the pixel weight being positively correlated with the first distance; or
setting the pixel weight according to whether the pixel point with depth information is an original target pixel point or a completed target pixel point in the target pixel point set, the pixel weight of an original target pixel point being greater than that of a completed target pixel point.
6. The method of claim 1, further comprising:
extracting face pixel points from the RGB image, and determining the corresponding face pixel points in the depth image according to the mapping relation between the RGB image and the depth image;
and determining a second distance between the first pixel point corresponding to each second pixel point and the nearest face pixel point, and removing the first pixel points whose second distance is not within a second preset distance interval.
7. A reconstruction device, comprising a communication interface, a display, a memory, and a processor;
the communication interface is connected with the processor and is configured to receive an RGB image of a target object acquired by an RGB camera and a depth image obtained from an infrared image of the target object acquired by an IR camera, wherein the confidence of each first pixel point in the depth image is obtained according to the energy integral values of the reflected light waves of corresponding phase angles received after light waves emitted at a plurality of phase angles are irradiated onto the target object; when the light waves are sinusoidal light waves alternately emitted at a plurality of modulation frequencies, the energy integral value of the reflected light wave of a corresponding phase angle is the energy integral value obtained by weighting, according to preset frequency weights, the reflected light waves of the plurality of modulation frequencies reflected by the target object after the sinusoidal light waves of the plurality of modulation frequencies are alternately emitted toward the target object;
the display is connected with the processor and is configured to display the reconstructed hair;
the memory is connected with the processor and is configured to store computer program instructions;
the processor is configured to perform the following operations in accordance with the computer program instructions:
extracting second pixel points in the hair region of the target object from the RGB image, and obtaining, from among the first pixel points, a plurality of candidate pixel points corresponding to the second pixel points according to the mapping relation between the RGB image and the depth image;
obtaining a target pixel point set according to a comparison between the confidence of each of the plurality of candidate pixel points and a preset confidence threshold of the region in which that candidate pixel point is located;
denoising the target pixel points in the target pixel point set to obtain a processed depth image, and reconstructing the hair of the target object according to the processed depth image and the RGB image.
8. The reconstruction device of claim 7, wherein denoising the target pixel points in the target pixel point set comprises:
for each target pixel point in the target pixel point set, determining the average distance between the target pixel point and the pixel points in a first preset neighborhood;
and if the average distance corresponding to the target pixel point is not within a first preset distance interval, removing the target pixel point from the target pixel point set.
CN202110788408.XA 2021-07-13 2021-07-13 Hair reconstruction method and device Active CN113570701B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110788408.XA CN113570701B (en) 2021-07-13 2021-07-13 Hair reconstruction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110788408.XA CN113570701B (en) 2021-07-13 2021-07-13 Hair reconstruction method and device

Publications (2)

Publication Number Publication Date
CN113570701A (en) 2021-10-29
CN113570701B (en) 2023-10-24

Family

ID=78164590

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110788408.XA Active CN113570701B (en) 2021-07-13 2021-07-13 Hair reconstruction method and device

Country Status (1)

Country Link
CN (1) CN113570701B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115953534A (en) * 2022-12-30 2023-04-11 先临三维科技股份有限公司 Three-dimensional reconstruction method and device, electronic equipment and computer-readable storage medium
CN116747018B (en) * 2023-06-28 2024-08-30 磅客策(上海)智能医疗科技有限公司 Planning method, system and storage medium for hair follicle extraction path
CN117075730B (en) * 2023-08-18 2024-04-30 广东早安文化发展有限公司 3D virtual exhibition hall control system based on image recognition technology

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101536016A (en) * 2006-09-18 2009-09-16 华盛顿大学 Focal plane tracking for optical microtomography
CN105069825A (en) * 2015-08-14 2015-11-18 厦门大学 Image super resolution reconstruction method based on deep belief network
CN108564536A (en) * 2017-12-22 2018-09-21 洛阳中科众创空间科技有限公司 A kind of global optimization method of depth map
CN111739146A (en) * 2019-03-25 2020-10-02 华为技术有限公司 Object three-dimensional model reconstruction method and device
CN110009672A (en) * 2019-03-29 2019-07-12 香港光云科技有限公司 Promote ToF depth image processing method, 3D rendering imaging method and electronic equipment
CN110223383A (en) * 2019-06-17 2019-09-10 重庆大学 A kind of plant three-dimensional reconstruction method and system based on depth map repairing
CN110400338A (en) * 2019-07-11 2019-11-01 Oppo广东移动通信有限公司 Depth map processing method, device and electronic equipment
CN110618424A (en) * 2019-09-27 2019-12-27 中科九度(北京)空间信息技术有限责任公司 Remote high-voltage line discovery method based on multi-sensor fusion
CN111369666A (en) * 2020-03-02 2020-07-03 中国电子科技集团公司第五十二研究所 Dynamic target reconstruction method and device based on multiple RGBD cameras

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Optimizing placement of commodity depth cameras for known 3D dynamic scene capture; Chabra R; 2017 IEEE Virtual Reality; pp. 157-166 *
Research on depth information estimation algorithms in computer vision (in Chinese); Ma Li; China Doctoral Dissertations Full-text Database, Information Science and Technology, No. 07; I138-44 *

Also Published As

Publication number Publication date
CN113570701A (en) 2021-10-29

Similar Documents

Publication Publication Date Title
CN113570701B (en) Hair reconstruction method and device
US11983893B2 (en) Systems and methods for hybrid depth regularization
Shen et al. Layer depth denoising and completion for structured-light rgb-d cameras
KR101862199B1 (en) Method and Fusion system of time-of-flight camera and stereo camera for reliable wide range depth acquisition
US9317970B2 (en) Coupled reconstruction of hair and skin
CN111899328B (en) Point cloud three-dimensional reconstruction method based on RGB data and generation countermeasure network
US20070183653A1 (en) 3D Face Reconstruction from 2D Images
KR20170008638A (en) Three dimensional content producing apparatus and three dimensional content producing method thereof
CN106530333B (en) Interest frequency solid matching method based on binding constraint
EP2064675A1 (en) Method for determining a depth map from images, device for determining a depth map
CN111915723A (en) Indoor three-dimensional panorama construction method and system
CN110381268A (en) method, device, storage medium and electronic equipment for generating video
Vu et al. Efficient hybrid tree-based stereo matching with applications to postcapture image refocusing
Shivakumar et al. Real time dense depth estimation by fusing stereo with sparse depth measurements
CN110992393B (en) Target motion tracking method based on vision
US10504235B2 (en) Method for generating three dimensional images
CN116863083A (en) Method and device for processing three-dimensional point cloud data of transformer substation
JP4014140B2 (en) Three-dimensional modeling apparatus and method and program thereof
CN115965749A (en) Three-dimensional reconstruction equipment based on radar vision fusion
Zhao Camera planning and fusion in a heterogeneous camera network
Song et al. Improved FCM algorithm for fisheye image cluster analysis for tree height calculation
CN113963052B (en) Large aerostat volume real-time monitoring method based on binocular vision
Li et al. Underwater high-precision panoramic 3D image generation
CN113643421B (en) Three-dimensional reconstruction method and three-dimensional reconstruction device for image
Shen et al. Missing depth data inpainting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant