CN113362450B - Three-dimensional reconstruction method, device and system - Google Patents

Three-dimensional reconstruction method, device and system

Info

Publication number
CN113362450B
CN113362450B (application CN202110612037.XA; also published as CN113362450A)
Authority
CN
China
Prior art keywords: three-dimensional reconstruction, rendering, reconstruction data
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110612037.XA
Other languages
Chinese (zh)
Other versions
CN113362450A (en)
Inventor
刘帅
任子健
吴连朋
Current Assignee
Juhaokan Technology Co Ltd
Original Assignee
Juhaokan Technology Co Ltd
Application filed by Juhaokan Technology Co Ltd
Priority to CN202110612037.XA
Publication of CN113362450A
Application granted
Publication of CN113362450B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013 Eye tracking input arrangements
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/006 Mixed reality
    • G06F2203/00 Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/01 Indexing scheme relating to G06F3/01
    • G06F2203/011 Emotion or mood input determined on the basis of sensed human body parameters such as pulse, heart rate or beat, temperature of skin, facial expressions, iris, voice pitch, brain activity patterns
    • G06F2203/012 Walk-in-place systems for allowing a user to walk in a virtual environment while constraining him to a given position in the physical environment

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application relates to remote three-dimensional communication technology and provides a three-dimensional reconstruction method, device, and system. Exploiting the characteristic that the human eye does not need a high-resolution view while in an abnormal motion state, a rendering display terminal acquires human eye visual parameters through an eyeball tracking device and detects the eye movement frequency in real time from those parameters. When the movement frequency is greater than a preset frequency threshold, the terminal adjusts the rendering parameters and sends a control instruction carrying them. The transmission terminal receives the control instruction and sends the corresponding three-dimensional reconstruction data according to the transmission control parameters matching the motion type indicated by the rendering parameters, so as to reduce the data volume required to reconstruct the three-dimensional model. The rendering display terminal then renders and displays the three-dimensional model of the current frame from the reduced-volume data. Reducing the data volume needed for reconstruction by adjusting the rendering parameters lowers the data transmission pressure and the transmission delay, and thereby improves the rendering efficiency of the model.

Description

Three-dimensional reconstruction method, device and system
Technical Field
The present application relates to the field of remote three-dimensional communication technologies, and in particular, to a three-dimensional reconstruction method, apparatus, and system.
Background
In a remote three-dimensional communication interaction system, three-dimensional reconstruction of a human body model first requires obtaining raw acquisition data from various sensors; the acquired data are then processed with a three-dimensional reconstruction method to reconstruct the three-dimensional human body model. Reconstruction of the model involves shape, pose, and material data.
Immersive rendering on terminals such as Virtual Reality (VR) and Augmented Reality (AR) devices often requires a high-precision three-dimensional model. At present, the higher-precision static three-dimensional reconstruction methods still rely on optical scanners (for example, structured-light or laser scanners). Such methods require the scanned subject to remain still for seconds or even minutes during the whole scan; high-precision scans from multiple angles are then stitched together to reconstruct a high-precision static three-dimensional human model. However, scanner-based methods have inherent drawbacks: reconstructing a dynamic human model is difficult (scanning is slow and the subject must remain still), operating the scanner requires professional expertise, and the scanners themselves are relatively expensive. In addition, a high-precision model means a larger data volume; under current network bandwidth it takes longer to transmit, which increases the rendering delay of VR and AR terminals.
With the continuous development of imaging technology, the advent of RGBD cameras and the proposal and optimization of binocular stereo matching algorithms have further improved the quality and efficiency of three-dimensional reconstruction and made dynamic reconstruction feasible. A single RGBD camera can only capture the color information (RGB image) and depth information of one viewpoint of the current scene, whereas a multi-view (multi-camera) acquisition system can capture the two-dimensional color or depth information of every viewpoint of an object and dynamically reconstruct a high-precision model using methods such as the Multi-View Stereo (MVS) algorithm or depth-information fusion. However, building a multi-view rig is complex: multi-camera calibration and data fusion are required, so it is difficult to implement.
Disclosure of Invention
The application provides a three-dimensional reconstruction method, a three-dimensional reconstruction device and a three-dimensional reconstruction system, which are used for reducing the data volume of three-dimensional reconstruction and improving the rendering efficiency of dynamic three-dimensional reconstruction.
In a first aspect, an embodiment of the present application provides a three-dimensional reconstruction method applied to a rendering display terminal, including:
acquiring human eye visual parameters, and detecting human eye movement frequency according to the human eye visual parameters;
if the motion frequency is larger than a preset frequency threshold, adjusting a rendering parameter and sending a control instruction carrying the rendering parameter, wherein the rendering parameter is used for indicating the motion type of the human eyes, and the control instruction is used for indicating that corresponding three-dimensional reconstruction data are sent according to a transmission control parameter corresponding to the motion type so as to reduce the data volume required by the reconstruction of the three-dimensional model; receiving the three-dimensional reconstruction data with the reduced data volume, and rendering and displaying the three-dimensional model of the current frame according to the three-dimensional reconstruction data with the reduced data volume;
and if the motion frequency is less than or equal to a preset frequency threshold, directly receiving the three-dimensional reconstruction data with the unreduced data volume, and rendering and displaying the three-dimensional model of the current frame according to the three-dimensional reconstruction data with the unreduced data volume.
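The threshold decision described in the first aspect can be sketched as follows. All names here (`MotionType`, `FREQ_THRESHOLD_HZ`, the dictionary layout of the control instruction) are illustrative assumptions, since the claims do not fix concrete values or data structures:

```python
# Hypothetical sketch of the rendering-terminal decision logic in the first
# aspect; names and the threshold value are illustrative, not from the patent.
from enum import Enum

class MotionType(Enum):
    NORMAL = "normal"    # fixation / tracking: full-precision data
    SACCADE = "saccade"  # rapid eye movement: reduced-precision data
    BLINK = "blink"      # eyes closed: minimal data

FREQ_THRESHOLD_HZ = 3.0  # illustrative preset frequency threshold

def choose_rendering_request(motion_freq_hz, motion_type):
    """Return (reduce_data, rendering_params) for the transmission end."""
    if motion_freq_hz > FREQ_THRESHOLD_HZ:
        # Abnormal motion: send a control instruction carrying the rendering
        # parameter (the detected motion type) so the transmission end can
        # pick matching transmission control parameters.
        return True, {"motion_type": motion_type.value}
    # Normal motion: receive full, unreduced three-dimensional reconstruction data.
    return False, None
```

The motion type, rather than the raw frequency, is what travels in the control instruction, so the transmission end only needs a lookup from motion type to transmission control parameters.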
In a second aspect, an embodiment of the present application provides a three-dimensional reconstruction method, including:
if the motion frequency is greater than a preset frequency threshold, receiving a control instruction, wherein the control instruction carries a rendering parameter, and the rendering parameter is used for indicating the motion type of human eyes; sending corresponding three-dimensional reconstruction data according to the transmission control parameters corresponding to the motion types to reduce the data volume required by the reconstruction of the three-dimensional model, so that the rendering display terminal reconstructs the three-dimensional model according to the three-dimensional reconstruction data with the reduced data volume and displays the three-dimensional model; the rendering display terminal detects that the human eye movement frequency is larger than a preset frequency threshold value according to the obtained human eye visual parameters and then adjusts the rendering parameters;
and if the motion frequency is less than or equal to the preset frequency threshold, directly transmitting the three-dimensional reconstruction data with the unreduced data volume, and rendering and displaying the three-dimensional model by the rendering and displaying terminal according to the three-dimensional reconstruction data with the unreduced data volume.
In a third aspect, an embodiment of the present application provides a rendering display terminal, including an eyeball tracking apparatus, a display, a memory, and a processor:
the eyeball tracking device, coupled to the processor, configured to acquire human eye visual parameters;
the display, coupled to the processor, configured to display a three-dimensional model;
the memory, coupled to the processor, configured to store computer program instructions;
the processor configured to perform the following operations in accordance with the computer program instructions:
acquiring human eye visual parameters, and detecting human eye movement frequency according to the human eye visual parameters;
if the motion frequency is larger than a preset frequency threshold, adjusting a rendering parameter, and sending a control instruction carrying the rendering parameter, wherein the rendering parameter is used for indicating the motion type of the human eyes, and the control instruction is used for indicating that corresponding three-dimensional reconstruction data are sent according to a transmission control parameter corresponding to the motion type so as to reduce the data volume required by three-dimensional model reconstruction; receiving the three-dimensional reconstruction data with the reduced data volume, and rendering and displaying the three-dimensional model of the current frame according to the three-dimensional reconstruction data with the reduced data volume;
and if the motion frequency is less than or equal to a preset frequency threshold, directly receiving the three-dimensional reconstruction data with unreduced data volume, rendering the three-dimensional model of the current frame according to the three-dimensional reconstruction data with unreduced data volume, and displaying the three-dimensional model.
In a fourth aspect, an embodiment of the present application provides a transmission terminal, including a memory and a processor:
the memory, coupled to the processor, configured to store computer program instructions;
the processor configured to perform the following operations in accordance with the computer program instructions:
if the motion frequency is larger than a preset frequency threshold value, receiving a control instruction, wherein the control instruction carries a rendering parameter, and the rendering parameter is used for indicating the motion type of human eyes; sending corresponding three-dimensional reconstruction data according to the transmission control parameters corresponding to the motion types to reduce the data volume required by the reconstruction of the three-dimensional model, so that the rendering display terminal reconstructs the three-dimensional model according to the three-dimensional reconstruction data with the reduced data volume and displays the three-dimensional model; the rendering display terminal detects that the human eye movement frequency is larger than a preset frequency threshold value according to the obtained human eye visual parameters and then adjusts the rendering parameters;
and if the motion frequency is less than or equal to a preset frequency threshold, directly transmitting the three-dimensional reconstruction data with unreduced data volume, and rendering and displaying the three-dimensional model by the rendering and displaying terminal according to the three-dimensional reconstruction data with unreduced data volume.
In a fifth aspect, the present application provides a three-dimensional reconstruction system, including an acquisition terminal, a transmission terminal, and a rendering display terminal;
the acquisition terminal is used for acquiring a depth image and a color image, extracting three-dimensional reconstruction data from the acquired depth image and the corresponding color image, reconstructing a three-dimensional model according to the extracted three-dimensional reconstruction data, and sending the three-dimensional reconstruction data corresponding to the three-dimensional model to the transmission terminal;
the transmission terminal is used for receiving the three-dimensional reconstruction data sent by the acquisition terminal; if the motion frequency is larger than a preset frequency threshold value, receiving a control instruction which is sent by the rendering display terminal and carries rendering parameters, wherein the rendering parameters are used for indicating the motion type of human eyes; sending corresponding three-dimensional reconstruction data according to the transmission control parameters corresponding to the motion types so as to reduce the data volume required by the reconstruction of the three-dimensional model; if the motion frequency is less than or equal to a preset frequency threshold, directly transmitting three-dimensional reconstruction data with unreduced data volume;
the rendering display terminal is used for acquiring human eye visual parameters and detecting human eye movement frequency according to the human eye visual parameters; if the motion frequency is larger than a preset frequency threshold, adjusting rendering parameters and sending a control instruction carrying the rendering parameters; receiving the three-dimensional reconstruction data with the reduced data volume, and rendering and displaying the three-dimensional model of the current frame according to the three-dimensional reconstruction data with the reduced data volume; and if the motion frequency is less than or equal to a preset frequency threshold, directly receiving the three-dimensional reconstruction data with the unreduced data volume, and rendering and displaying the three-dimensional model of the current frame according to the three-dimensional reconstruction data with the unreduced data volume.
In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium, where computer-executable instructions are stored in the computer-readable storage medium, and the computer-executable instructions are used to enable a computer to execute the three-dimensional reconstruction method provided in the embodiment of the present application.
In the embodiments of the application, the rendering display terminal detects the eye movement frequency in real time from the obtained human eye visual parameters. When the movement frequency is greater than the preset frequency threshold, it adjusts the rendering parameters and sends a control instruction carrying them; the transmission terminal receives the control instruction and sends reduced-volume three-dimensional reconstruction data to the rendering display terminal according to the transmission control parameters corresponding to the motion type indicated by the rendering parameters, and the rendering display terminal renders and displays the three-dimensional model from the reduced-volume data. When the movement frequency is less than or equal to the preset frequency threshold, the rendering display terminal directly renders and displays the three-dimensional model from the unreduced data transmitted by the transmission terminal.
On one hand, rendering parameters are adjusted in a self-adaptive mode by utilizing the characteristic that a high-resolution view is not needed in the abnormal motion state of human eyes, and a control instruction is sent to reduce the data volume required by the reconstruction of the three-dimensional model, so that the data transmission pressure is reduced, the transmission time delay is reduced, and the rendering efficiency of the model is improved; on the other hand, the accuracy of the rendered and displayed model is matched with the visual state of human eyes, the data volume required by three-dimensional reconstruction is reduced to render a low-accuracy three-dimensional model in the abnormal motion state of the human eyes, the rendering efficiency is improved, and the high-accuracy three-dimensional model is rendered according to the three-dimensional reconstruction data with the unreduced data volume in the normal motion state of the human eyes, so that the human eyes can see the clear three-dimensional model.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 schematically illustrates a three-dimensional reconstruction process in the related art provided by an embodiment of the present application;
fig. 2 schematically illustrates a three-dimensional reconstruction system architecture diagram provided by an embodiment of the present application;
fig. 3 is a schematic diagram illustrating an application scenario provided by an embodiment of the present application;
fig. 4a is a flowchart illustrating a three-dimensional reconstruction method at a rendering display terminal side according to an embodiment of the present application; fig. 4b is a flowchart illustrating a three-dimensional reconstruction method at the rendering display terminal side according to an embodiment of the present application;
fig. 5 is a schematic diagram illustrating detection, based on the eye aspect ratio (EAR), of whether the eye blinks according to an embodiment of the present application;
fig. 6 is a flowchart illustrating a three-dimensional reconstruction method at the transmission terminal side according to an embodiment of the present application;
fig. 7 is a schematic diagram illustrating a process of a three-dimensional reconstruction method provided by an embodiment of the present application;
fig. 8 is a flowchart illustrating a complete three-dimensional reconstruction method provided by an embodiment of the present application;
fig. 9 is a functional block diagram schematically illustrating a rendering display terminal according to an embodiment of the present disclosure;
fig. 10 is a functional block diagram of a transmission terminal according to an embodiment of the present disclosure.
Fig. 11 is a diagram illustrating an example of a hardware structure of a rendering display terminal according to an embodiment of the present application;
fig. 12 is a diagram illustrating an exemplary hardware structure of a transmission terminal according to an embodiment of the present application.
Detailed Description
To make the objects, embodiments, and advantages of the present application clearer, the exemplary embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. It should be understood that the described exemplary embodiments are only a part of the embodiments of the present application, not all of them.
All other embodiments obtained by a person skilled in the art from the exemplary embodiments described herein without inventive effort fall within the scope of the appended claims. In addition, while the disclosure herein is presented in terms of one or more exemplary examples, it should be appreciated that individual aspects of the disclosure may each be implemented independently as a complete embodiment.
It should be noted that the brief descriptions of the terms in the present application are only for the convenience of understanding the embodiments described below, and are not intended to limit the embodiments of the present application. These terms should be understood in their ordinary and customary meaning unless otherwise indicated.
Furthermore, the terms "comprises" and "comprising," as well as any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or device that comprises a list of elements is not necessarily limited to those elements explicitly listed, but may include other elements not expressly listed or inherent to such product or device.
The term "module" as used herein refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the functionality associated with that element.
The core technologies of a remote three-dimensional communication system include real-time three-dimensional reconstruction, three-dimensional data encoding, decoding, and transmission, and immersive VR/AR display. Referring to fig. 1, in the three-dimensional reconstruction scheme of a current remote three-dimensional communication system, the acquisition end collects model data comprising color images (RGB images) and depth images (RGBD images) and performs three-dimensional reconstruction from the acquired data; the three-dimensional reconstruction data of the reconstructed model are sent to the transmission end, which encodes the data and transmits them to the rendering display end; the rendering display end receives and decodes the three-dimensional reconstruction data, then renders and displays the person and object models in the three-dimensional scene according to the decoded data.
At present, the fundamental challenge of VR- and AR-based remote communication technologies is that presenting the reconstructed three-dimensional model at the high resolution required for strong immersion places heavy demands on the rendering engine and on data transmission. For the user, a good tele-immersive experience requires low-latency, high-frame-rate, high-image-quality rendering.
When VR or AR head-mounted devices are used for remote three-dimensional communication, the model precision of real-time human reconstruction determines the visual experience; yet the higher the precision, the larger the model's data volume, so the transmission technology strongly affects both the precision and the imaging of dynamic three-dimensional reconstruction.
For example, when transmitting 30 frames of model data per second over existing networks, a model with a resolution of 192 × 128 requires a transmission bitrate of 256 Mbps, while a model with a resolution of 384 × 384 requires 1120 Mbps. The larger the model's data volume, the longer the cloud transmission delay, so the rendering display end cannot update the three-dimensional model in real time and the user experience degrades.
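As a back-of-envelope check of the figures above (an illustrative calculation, not taken from the patent), the per-frame payload implied by those bitrates can be computed directly:

```python
# Per-frame data volume implied by the bitrates quoted for 30 model frames/second.
def per_frame_megabits(bitrate_mbps, fps=30):
    """Megabits occupied by each frame of model data at the given bitrate."""
    return bitrate_mbps / fps

low_res_frame = per_frame_megabits(256.0)    # 192 x 128 model: ~8.5 Mbit per frame
high_res_frame = per_frame_megabits(1120.0)  # 384 x 384 model: ~37.3 Mbit per frame
```

Even the lower-resolution model thus requires over a megabyte of data per frame, which is why reducing the transmitted data volume during abnormal eye motion pays off.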
For a rendering display terminal such as a VR or AR device, which is a near-eye display, visual characteristics such as eye structure and motion can be taken into account. In daily life, the eyeballs of humans and many animals rotate frequently to search for, fixate on, and track targets of interest, and the development of eyeball tracking devices has made interaction via the gaze point and eye movement patterns possible.
Eye movement comprises four systems: saccadic movement (eye jumps), smooth-pursuit movement, vergence movement, and rotational movement. Saccadic Movement refers to the rapid, conjugate movement of the eyeballs, at 500 degrees/second or more and up to a maximum of 900 degrees/second, that occurs when the line of sight shifts from one target in the visual field to another, so that the new target is quickly projected onto the fovea of the macula. Typically about 3 saccades are performed per second, each lasting roughly 20-200 milliseconds, which is longer than the time a VR or similar rendering display terminal needs to render one frame. This rapid reflexive eye movement cannot be altered by subjective will and can indirectly reflect muscle strength. Saccades are often described as "ballistic" movements, meaning that once "fired", their trajectory cannot be changed; in eye tracking, the eye is generally considered "effectively blind" during a saccade, with no need for a high-resolution view.
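To make the duration comparison concrete, the following sketch counts how many rendered frames fit inside one saccade, assuming a 90 Hz refresh rate (an illustrative value; the patent quotes no specific rate):

```python
# How many rendered frames elapse during a single saccade at a given refresh rate.
def frames_during_saccade(saccade_ms, refresh_hz=90.0):
    frame_ms = 1000.0 / refresh_hz  # ~11.1 ms per frame at 90 Hz
    return saccade_ms / frame_ms

shortest = frames_during_saccade(20.0)   # even the shortest saccade spans ~1.8 frames
longest = frames_during_saccade(200.0)   # a long saccade spans ~18 frames
```

Since several full frames are rendered while the eye is "effectively blind", those frames can safely be rendered from reduced data.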
Typically, when the human eye is not tracking a moving target or fixating on a point, it jumps rapidly from one fixation target to another, that is, saccadic motion occurs. Physically, at the moment the eyeball rotates rapidly, the image of the observed object sweeps quickly across the surface of the retina, so what the eye sees should in theory be a blurred picture; in reality, however, the object seen remains clear and recognizable. That is, the human brain ingeniously resolves this contradiction between visual physics and biology, a phenomenon called Saccadic Suppression.
Based on the above analysis, the embodiments of the present application provide a three-dimensional reconstruction method, apparatus, and system. Because the human eye does not need to receive high-resolution information in abnormal motion states such as blinks and saccades, the amount of transmitted model data can be adjusted according to the visual state of the eyes. Specifically, the eyeball tracking device in a rendering display terminal such as a VR or AR device obtains the human eye visual parameters. When the eye movement frequency is detected to be greater than the preset frequency threshold, the eye is in an abnormal motion state: the rendering parameters are adjusted in real time and a control instruction is sent according to the adjusted parameters. After receiving the instruction, the transmission end sends the corresponding three-dimensional reconstruction data according to the transmission control parameters matching the indicated motion type, reducing the data volume required for model reconstruction; this relieves cloud transmission pressure, lowers rendering delay, improves rendering efficiency, and improves user experience. When the eye movement frequency is detected to be less than or equal to the preset frequency threshold, the eye is in a normal motion state, and the rendering display terminal directly reconstructs and displays the three-dimensional model from the unreduced data transmitted by the transmission end, so that the eyes see a clear three-dimensional model.
In the embodiments of the present application, the visual state in which the human eye movement frequency is greater than the preset frequency threshold (e.g., saccades, blinks) is referred to as the abnormal motion state, and the state in which the frequency is less than or equal to the threshold (e.g., observing or tracking a target) is referred to as the normal motion state. In the normal motion state, the three-dimensional reconstruction data are transmitted in real time without reducing the data volume, in order to preserve model precision.
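Fig. 5 refers to detecting blinks via the eye aspect ratio (EAR). A minimal sketch of the standard EAR computation over six 2D eye landmarks p1..p6 follows; the landmark layout and the 0.2 threshold are common conventions assumed here for illustration, not values taken from the patent:

```python
# Eye aspect ratio (EAR) over six eye landmarks: p1/p4 are the horizontal eye
# corners, (p2, p6) and (p3, p5) are vertical pairs on the upper/lower eyelid.
import math

def eye_aspect_ratio(landmarks):
    """EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|); drops toward 0 as the eye closes."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    p1, p2, p3, p4, p5, p6 = landmarks
    return (dist(p2, p6) + dist(p3, p5)) / (2.0 * dist(p1, p4))

EAR_BLINK_THRESHOLD = 0.2  # illustrative: below this, count the frame as a blink

def is_blinking(landmarks):
    return eye_aspect_ratio(landmarks) < EAR_BLINK_THRESHOLD
```

A run of consecutive frames below the threshold would then be counted as one blink, and the blink rate compared against the preset frequency threshold.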
Embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 2 schematically illustrates an architecture diagram of a three-dimensional reconstruction system provided in an embodiment of the present application. As shown in fig. 2, the system includes an acquisition terminal 201, a transmission terminal 202, and a rendering display terminal 203.
The acquisition terminal 201 is configured to acquire image data including a depth image (RGBD image) and a color image (RGB image), extract three-dimensional reconstruction data from the acquired RGBD image and the corresponding RGB image, reconstruct a three-dimensional model according to the extracted three-dimensional reconstruction data, and send the three-dimensional reconstruction data corresponding to the three-dimensional model to the transmission terminal 202.
The transmission terminal 202 is configured to receive the three-dimensional reconstruction data sent by the acquisition terminal 201. If the motion frequency is greater than the preset frequency threshold, it receives a control instruction, sent by the rendering display terminal 203, carrying rendering parameters that indicate the motion type of the human eyes, and sends the corresponding three-dimensional reconstruction data to the rendering display terminal 203 according to the transmission control parameters corresponding to that motion type, so as to reduce the data volume required for reconstructing the three-dimensional model. If the motion frequency is less than or equal to the preset frequency threshold, the three-dimensional reconstruction data with unreduced data volume is transmitted directly to the rendering display terminal 203. The transmission terminal 202 may be a cloud server.
The rendering display terminal 203 is used for acquiring the human eye visual parameters and detecting the human eye movement frequency according to those parameters. If the motion frequency is greater than the preset frequency threshold, it adjusts the rendering parameters and sends a control instruction carrying the adjusted rendering parameters; it then receives the three-dimensional reconstruction data with reduced data volume and renders the three-dimensional model of the current frame from that data. If the motion frequency is less than or equal to the preset frequency threshold, it directly receives the three-dimensional reconstruction data with unreduced data volume and renders and displays the three-dimensional model of the current frame from that data. The rendering display terminal 203 may be a smart television, a smartphone, or a VR or AR head-mounted display device with the corresponding interactive functions.
It should be noted that, in some embodiments, the acquisition terminal may be responsible for acquiring only image data, and the model reconstruction process may be performed by the transmission terminal, so as to reduce the requirement on the calculation performance of the acquisition terminal.
In the embodiment of the application, the system architecture shown in fig. 2 may be deployed according to different usage scenarios. For example, in a live broadcast scenario, the anchor side is provided with the acquisition terminal of the system and the user side is provided with the rendering display terminal, so that the user may view a three-dimensional model through the rendering display terminal and experience the immersion of face-to-face interaction in the virtual world. As another example, in a conference scenario, the two conference rooms of a teleconference are each provided with both the acquisition terminal and the rendering display terminal of the system, enabling real-time remote three-dimensional communication between the two conference rooms.
Based on the system architecture shown in fig. 2, fig. 3 exemplarily shows an application scenario diagram provided by the embodiment of the present application. As shown in fig. 3, user terminals 1 to 4 perform real-time remote three-dimensional communication, and each is provided with an acquisition terminal and a rendering display terminal. The acquisition terminal comprises a depth camera (RGBD camera) and a host or workstation, and the rendering display terminal comprises all or some of a smart television, a smartphone, and a VR or AR head-mounted display. During remote three-dimensional communication, the three-dimensional reconstruction data of user terminal 1 can be uploaded to the cloud server; the rendering display terminals of user terminals 2 to 4 download the three-dimensional reconstruction data of user terminal 1 from the cloud server and synchronously display the three-dimensional human body model according to the downloaded data. Similarly, user terminals 1, 3, and 4 can synchronously display the three-dimensional human body model of user terminal 2, and so on.
It should be noted that fig. 3 is only an example of remote three-dimensional communication of multiple persons, and the number of user ends for remote three-dimensional communication is not limited in the embodiment of the present application.
Fig. 4a exemplarily shows a flowchart of a three-dimensional reconstruction method at a rendering display terminal side according to an embodiment of the present application, where the flowchart is mainly executed by the rendering display terminal in a three-dimensional reconstruction system, and mainly includes the following steps:
S401: Acquire the human eye visual parameters, and detect the human eye movement frequency according to the human eye visual parameters.
In the step, human eye visual parameters are obtained through an eyeball tracking device, and the human eye visual parameters comprise at least one item of eye aspect ratio and fixation point position information. The gaze point position information may be determined according to at least one of an eyeball coordinate, a gaze direction, and a gaze depth.
In S401, the human eye moves in various ways; for example, eye movements include saccades, blinks, and browsing, and the motion frequency of each kind of movement is monitored in a different way.
For example, when the motion frequency includes a blink frequency, the blink frequency is detected from the eye aspect ratio; when the motion frequency includes a saccade frequency, the saccade frequency of the human eye is detected based on the gaze point position information.
In an embodiment of the application, the eyeball tracking device detects the blink frequency of the human eye. The eyeball tracking device can directly report that no pupil is detected; when the pupil is not detected, it can be judged that the human eye is in a blinking or eye-closed state, and the corresponding state is output. In general, blink detection may be performed using the Eye Aspect Ratio (EAR): whether an eye is open or closed (i.e., whether it blinks) is determined by calculating the EAR value, and the blink frequency is then detected by counting the number of times the eye opens and closes within a preset time period. The calculation formula of the EAR is as follows:
EAR = (||p2 - p6|| + ||p3 - p5||) / (2 ||p1 - p4||)   Equation 1
where p1 to p6 are the eye feature points shown in fig. 5. The aspect ratio of the eye (indicated by the two solid lines with arrows in fig. 5) differs between the open and closed states during blinking; the numerator computes the distances between the eye feature points in the vertical direction, and the denominator computes the distance between the eye feature points in the horizontal direction. Since there is only one set of horizontal points but two sets of vertical points, the denominator is multiplied by 2 so that the two groups of feature points carry the same weight.
After the eye feature points are extracted, a machine learning algorithm, such as a Support Vector Machine (SVM), may be used to detect whether a blink occurs and to count the blink frequency.
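As a minimal illustration of the EAR-based blink detection described above, the following Python sketch computes the EAR from six eye feature points and counts open-to-closed transitions over a window of frames. The landmark coordinates, the 0.2 closed-eye threshold, and the function names are illustrative assumptions, not values from the source.

```python
import math


def ear(p):
    """Eye Aspect Ratio (Equation 1) from six eye feature points
    p[0]..p[5] = p1..p6: p1/p4 are the horizontal pair, (p2, p6) and
    (p3, p5) the two vertical pairs."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    return (dist(p[1], p[5]) + dist(p[2], p[4])) / (2.0 * dist(p[0], p[3]))


def blink_frequency(ear_series, fps, ear_threshold=0.2):
    """Count open->closed transitions in a sequence of per-frame EAR
    values and return blinks per second. The 0.2 threshold is a common
    heuristic, assumed here for illustration."""
    blinks, closed = 0, False
    for e in ear_series:
        if e < ear_threshold and not closed:
            blinks += 1
            closed = True
        elif e >= ear_threshold:
            closed = False
    return blinks * fps / max(len(ear_series), 1)
```

In practice the six feature points would come from a facial landmark detector, and the EAR series from successive camera frames; the same counting logic then yields the blink frequency compared against the preset threshold in S402.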
In an embodiment of the present application, the moving speed of the gaze point (i.e., the saccade frequency) is detected by the eyeball tracking device. In one embodiment, the eyeball coordinates are obtained from the eyeball tracking device, the moving speed of the gaze point is determined from the distance the eyeball coordinates move within a preset time period, and the saccade frequency is obtained and output. In another embodiment, the gaze direction is acquired from the eyeball tracking device, the moving speed of the gaze point is determined from the rotation angle of the gaze direction within a preset time period, and the saccade frequency is obtained and output.
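The first variant above (gaze-point speed from successive eyeball coordinates over a preset time period) can be sketched as follows; the coordinate units, the window length, and the speed threshold are device-specific assumptions, not values from the source.

```python
import math


def gaze_speed(points, window_s):
    """Average speed of the gaze point: total distance travelled by the
    sampled gaze coordinates within a window of `window_s` seconds."""
    travelled = sum(
        math.hypot(b[0] - a[0], b[1] - a[1])
        for a, b in zip(points, points[1:])
    )
    return travelled / window_s


def is_saccade(points, window_s, speed_threshold):
    """Saccade detection reduces to a threshold comparison on the speed;
    the threshold value itself is not given in the source."""
    return gaze_speed(points, window_s) > speed_threshold
```

The second variant would be identical in shape, with the angular rotation of the gaze direction substituted for the positional distance.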
S402: Determine whether the motion frequency is greater than the preset frequency threshold; if so, execute S403, otherwise execute S405.
In this step, after the motion frequency corresponding to the eye movement is determined, it is compared with the set frequency threshold. If the motion frequency is greater than the preset frequency threshold, the human eye may be in an abnormal motion state such as a saccade or a blink; a high-resolution model view is not needed and the data amount required for three-dimensional reconstruction can be reduced, so S403 is executed. If the motion frequency is less than or equal to the preset frequency threshold, the human eye may be in a normal motion state such as carefully observing or tracking a target; to ensure that the human eye sees a clear model, the data amount required for three-dimensional reconstruction should not be reduced, so S405 is executed.
It should be noted that the preset frequency thresholds corresponding to different eye movements differ. For example, saccade movement corresponds to a first frequency threshold and blink movement corresponds to a second frequency threshold, where the second frequency threshold is smaller than the first frequency threshold; the values of the first and second frequency thresholds may be set according to the actual situation.
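The per-motion-type threshold check of S402 can be expressed as a simple dispatch; the numeric threshold values below are placeholders (the source only states that the blink threshold is smaller than the saccade threshold).

```python
# Hypothetical per-type thresholds in events per second; only their
# ordering (blink < saccade) comes from the source.
FREQ_THRESHOLDS = {"saccade": 4.0, "blink": 0.5}


def is_abnormal(motion_type, frequency):
    """S402: compare the detected motion frequency against the preset
    threshold for that motion type; True selects the S403 branch
    (reduced data volume), False the S405 branch."""
    return frequency > FREQ_THRESHOLDS[motion_type]
```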
S403: Adjust the rendering parameters, and send a control instruction carrying the rendering parameters.
In this step, the human eye visual parameters are obtained through the eyeball tracking device, and the rendering parameters used to indicate the human eye motion type are adjusted according to the human eye visual parameters, the preset rendering parameters, and a preset adjustment threshold, where the preset rendering parameters are the rendering parameters in the normal visual state. The adjustment formula is as follows:
ρ_t(α, β) = T + K(α_t, β_t)   Equation 2
where ρ_t(α, β) is the rendering parameter after adjustment at frame t, T is the preset adjustment threshold, K is the preset rendering parameter, α_t is the eye aspect ratio of frame t, and β_t is the gaze point movement state of frame t.
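One possible reading of Equation 2 is sketched below. The exact form of the term K(α_t, β_t) is not specified in the source; a simple weighted sum of the two eye parameters is assumed purely for illustration, and all names are hypothetical.

```python
def adjust_rendering_parameter(alpha_t, beta_t, T, K):
    """Illustrative reading of Equation 2: adjusted parameter = preset
    adjustment threshold T plus preset rendering parameter K applied to
    the frame-t eye aspect ratio alpha_t and gaze-point motion beta_t.
    K(alpha_t, beta_t) is assumed here to be K * (alpha_t + beta_t)."""
    return T + K * (alpha_t + beta_t)
```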
After the rendering parameters are adjusted, a control instruction carrying the rendering parameters is sent. The control instruction is used to instruct that the corresponding three-dimensional reconstruction data be sent according to the transmission control parameters corresponding to the motion type, so as to reduce the data volume required for reconstructing the three-dimensional model, thereby reducing cloud transmission pressure and improving the rendering and display efficiency of the rendering display terminal.
The control instruction may be sent to the transmission terminal or to the acquisition terminal, as determined by the performance of the respective devices.
S404: Receive the three-dimensional reconstruction data with reduced data volume, and render and display the three-dimensional model of the current frame according to that data.
In this step, the three-dimensional reconstruction data includes geometric data (vertex coordinates, vertex normals, vertex indices, patch indices, and the like) and texture data (vertex color values), and is used to reconstruct the three-dimensional model. The rendering display terminal receives the three-dimensional reconstruction data with reduced data volume and renders and displays the three-dimensional model according to the received data.
S405: Directly receive the three-dimensional reconstruction data with unreduced data volume, and render and display the three-dimensional model of the current frame according to that data.
In this step, the motion frequency is less than or equal to the preset frequency threshold and the human eye is in a normal motion state. To improve user experience, the displayed three-dimensional model must have higher precision (i.e., higher resolution), so the data volume of the three-dimensional reconstruction data does not need to be reduced by adjusting the rendering parameters. The acquisition terminal directly extracts three-dimensional reconstruction data with unreduced data volume from the acquired images to reconstruct the model, and sends this data to the rendering display terminal through the cloud in real time. The rendering display terminal renders and displays the high-precision three-dimensional model according to the three-dimensional reconstruction data with unreduced data volume, improving the sense of immersion during user interaction.
Optionally, in some embodiments, when it is detected that the motion frequency is greater than the preset frequency threshold, the following steps may also be performed after the three-dimensional reconstruction data with reduced data volume is received, with reference to fig. 4b:
S4041: Predict the three-dimensional reconstruction data of the frame following the current frame according to the three-dimensional reconstruction data with reduced data volume and the three-dimensional reconstruction data of the N frames preceding the current frame.
In this step, N is an integer greater than or equal to 1. To further improve rendering efficiency, predictions may be made from the three-dimensional reconstruction data. During interaction, the vertex data of the human body model changes approximately linearly within a preset time, so the three-dimensional reconstruction data of the next frame can be predicted according to this linear change rule. The linear change rule can be determined from the vertex data in the three-dimensional reconstruction data of the current frame and of the adjacent previous N frames.
For example, the three-dimensional reconstruction data of frame t+1 is predicted from the three-dimensional reconstruction data of frame t and of the preceding frames t-3, t-2, and t-1, and so on; frame interpolation is then performed on the predicted three-dimensional reconstruction data to smooth the displayed image of the three-dimensional model.
It should be noted that the prediction and frame interpolation can be performed according to the generation rule and motion rule of the three-dimensional model.
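The linear prediction and frame interpolation described above can be sketched as follows for per-vertex positions; the data layout (a frame as a list of (x, y, z) vertices) and the use of only the last two frames are simplifying assumptions for illustration.

```python
def predict_next_frame(frames):
    """Predict frame t+1 vertex positions by linear extrapolation from
    the last two available frames (current frame plus preceding frames),
    assuming the near-linear vertex motion described in the text."""
    prev, cur = frames[-2], frames[-1]
    return [tuple(c + (c - p) for p, c in zip(pv, cv))
            for pv, cv in zip(prev, cur)]


def interpolate(frame_a, frame_b, t):
    """Frame interpolation between two frames at parameter t in [0, 1],
    used to smooth the displayed model between predicted frames."""
    return [tuple(a + (b - a) * t for a, b in zip(va, vb))
            for va, vb in zip(frame_a, frame_b)]
```

A full implementation would fit the linear change rule over all N previous frames rather than just two, and would handle changing vertex counts between frames.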
S4042: Render the three-dimensional model of the next frame according to the predicted three-dimensional reconstruction data.
In this step, after the three-dimensional reconstruction data of the next frame is obtained, the three-dimensional model is rendered directly from the predicted three-dimensional reconstruction data, without obtaining data from the transmission terminal, thereby reducing transmission delay and improving rendering efficiency.
In the embodiment of the application, during the prediction process the rendering display terminal detects the human eye movement frequency in real time according to the human eye visual parameters. If it detects that the motion frequency has fallen back to less than or equal to the preset frequency threshold, it receives the three-dimensional reconstruction data corresponding to the next frame from the transmission terminal and replaces the predicted three-dimensional reconstruction data with the received data, thereby ensuring the resolution of the model view.
It should be noted that, in the embodiment of the present application, the rendering display terminal may send the control instruction to the transmission terminal, where the transmission terminal controls transmission of the three-dimensional reconstruction data, and may also send the control instruction to the acquisition terminal, where the acquisition terminal controls transmission of the three-dimensional reconstruction data.
Taking the transmission terminal receiving a control instruction to reduce the data volume of the three-dimensional model as an example, fig. 6 exemplarily shows a flowchart of the three-dimensional reconstruction method at the transmission terminal side provided in the embodiment of the present application. The process mainly comprises the following steps:
S601: Receive a control instruction, where the control instruction carries rendering parameters used to indicate the motion type of the human eyes.
In this step, when the motion frequency is greater than the preset frequency threshold, the transmission terminal receives the control instruction sent by the rendering display terminal; the control instruction carries rendering parameters used to indicate the motion type of the human eyes. The rendering parameters are those adjusted after the rendering display terminal detects, from the obtained human eye visual parameters, that the human eye movement frequency is greater than the preset frequency threshold; for the specific process see S401-S403, which is not repeated here.
S602: Send the corresponding three-dimensional reconstruction data according to the transmission control parameters corresponding to the motion type, so as to reduce the data volume required for reconstructing the three-dimensional model, so that the rendering display terminal renders and displays the three-dimensional model according to the three-dimensional reconstruction data with reduced data volume.
In this step, the rendering parameter is used to indicate the motion type of the human eye, where motion types can be divided according to the eye movement speed. For example, motion types include fast saccade, regular saccade, fast blink, regular blink, and the like. There is a correspondence between the motion type and the transmission control parameters, as shown in Table 1.
TABLE 1 Correspondence between rendering parameters, abnormal eye movement types, and transmission control parameters

| Rendering parameter | Abnormal eye movement type | Transmission control parameter |
|---|---|---|
| ρ_t1 | A | SG_1 |
| ρ_t2 | B | SG_1 |
| ρ_t3 | C | SG_3 |
The corresponding relationship between the motion type and the transmission control parameter may be a linear relationship or a nonlinear relationship.
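A mapping in the spirit of Table 1 can be held as a simple lookup table. The motion-type names and the concrete parameter values below (stop sending vs. send at a given LOD level) are illustrative assumptions mirroring the examples later in the text, not values from the source.

```python
# Hypothetical transmission control parameters keyed by motion type.
TRANSMISSION_CONTROL = {
    "fast_blink":     {"action": "stop"},            # reuse previous frame or predict
    "fast_saccade":   {"action": "send", "lod": 2},  # lowest (first) resolution
    "regular_saccade": {"action": "send", "lod": 1}, # second resolution
    "regular_blink":  {"action": "send", "lod": 1},
}


def transmission_params(motion_type):
    """Look up the transmission control parameters for the motion type
    indicated by the rendering parameters in the control instruction."""
    return TRANSMISSION_CONTROL[motion_type]
```

A nonlinear correspondence, as the text allows, would simply replace the dictionary lookup with a function of the rendering parameter value.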
In S602, after receiving the control instruction, the transmission terminal sends the corresponding three-dimensional reconstruction data according to the transmission control parameter corresponding to the motion type indicated by the rendering parameters, so as to reduce the data amount required for reconstructing the three-dimensional model.
For example, when the motion type is fast blinking, the transmission terminal stops sending three-dimensional reconstruction data to the rendering display terminal according to the first transmission control parameter. In this case the rendering display terminal renders and displays the three-dimensional model of the current frame according to the three-dimensional reconstruction data of the previous frame, or predicts the three-dimensional reconstruction data of the current frame from the three-dimensional reconstruction data of the previous N frames and renders and displays the three-dimensional model of the current frame from the predicted data.
As another example, when the motion type is fast saccade, it corresponds to the second transmission control parameter, and the transmission terminal sends three-dimensional reconstruction data with a model resolution equal to the first resolution to the rendering display terminal according to the second transmission control parameter; the rendering display terminal then renders and displays the three-dimensional model from the three-dimensional reconstruction data at the first resolution. The first resolution is smaller than the preset model resolution, which may be the model resolution in the normal visual state; the smaller the resolution, the smaller the data volume of the three-dimensional reconstruction.
At present, the geometric representation of real-time dynamic three-dimensional human reconstruction models is mainly based on the Truncated Signed Distance Function (TSDF). In the TSDF representation, the three-dimensional surface of a real scene is usually represented as the iso-surface where the signed distance function is zero in three-dimensional space. The function value corresponding to free space outside the real surface is positive, with magnitude proportional to the distance from the function sampling point to the real surface, and the function value in the occupied space of the scene (the space enclosed by the real surface) is negative, with magnitude likewise proportional to the distance from the function sampling point to the real surface.
During model reconstruction, a cube is usually created in virtual space and uniformly divided along the X, Y, and Z axes. Each divided small cube is called a voxel; the number of voxels in each direction is the resolution of the voxel space in that direction, and the center point of each voxel serves as a function sampling point, i.e., a sparse sampling point of the continuous signed distance function in virtual space. Vertex data is extracted from the geometric model expressed by the TSDF using the Marching Cubes (MC) algorithm, and after the vertex data is extracted, Level of Detail (LOD) processing is performed on the geometric model.
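The TSDF sign convention and voxel sampling described above can be illustrated with a toy example that samples a sphere on a cubic voxel grid; the grid size, radius, and truncation distance are arbitrary illustrative values.

```python
def build_tsdf_sphere(resolution, radius, truncation):
    """Minimal TSDF illustration: sample a sphere of the given radius on
    a cubic voxel grid centred at the origin. Values are positive in free
    space outside the surface, negative in the occupied space inside it,
    zero on the iso-surface, and clamped to [-truncation, truncation]."""
    half = resolution / 2.0
    grid = {}
    for x in range(resolution):
        for y in range(resolution):
            for z in range(resolution):
                # the voxel centre is the function sampling point
                px, py, pz = x - half + 0.5, y - half + 0.5, z - half + 0.5
                d = (px * px + py * py + pz * pz) ** 0.5 - radius
                grid[(x, y, z)] = max(-truncation, min(truncation, d))
    return grid
```

Marching Cubes would then extract the triangle mesh along the zero crossing of this grid; in a real pipeline the distances come from fused depth images rather than an analytic sphere.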
LOD technology can be used to simplify the model: important details are drawn at high quality according to the importance of the model, while unimportant details are drawn at low quality, and the sharp features and geometric features of the model are fully preserved after simplification. By selecting an appropriate LOD pyramid, real-time drawing of the model can be accelerated without losing the sharp features of the graphics, improving the computational capacity of the system. Generally, evaluating the importance of a model involves criteria such as: distance (the model-to-observer distance), size (the size of the model), and culling (whether the model is visible).
In the embodiment of the application, the importance of the model can be determined according to the information fed back from the human eye visual state (the rendering parameters), and the model is simplified according to its importance to reduce the transmitted data volume. When the human eye movement frequency is detected to be greater than the preset frequency threshold, the model is simplified to a greater degree; when it is detected to be less than or equal to the preset frequency threshold, a less simplified or unsimplified model is used for reconstruction, thereby achieving rendering acceleration.
In S602, the model detail level corresponding to the transmission control parameter may be determined according to a pre-constructed LOD pyramid, the received three-dimensional reconstruction data may be down-sampled according to that detail level, and the down-sampled three-dimensional reconstruction data sent to the rendering display terminal. The higher the model detail level, the smaller the model resolution and the smaller the model's data volume.
The embodiment of the present application places no limiting requirement on the down-sampling method, which includes but is not limited to geometric element deletion, region merging, and vertex clustering.
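Vertex clustering, one of the down-sampling methods named above, can be sketched as follows: each vertex is snapped to a uniform grid cell and one averaged representative is kept per occupied cell, reducing the vertex count (and hence the transmitted data volume) at a coarser LOD level. The cell size is an illustrative parameter.

```python
def vertex_cluster_downsample(vertices, cell_size):
    """Cluster (x, y, z) vertices by uniform grid cell and replace each
    cluster with the centroid of its members, yielding a simplified
    vertex set for a lower level of detail."""
    cells = {}
    for v in vertices:
        key = tuple(int(c // cell_size) for c in v)
        cells.setdefault(key, []).append(v)
    return [
        tuple(sum(axis) / len(group) for axis in zip(*group))
        for group in (cells[k] for k in sorted(cells))
    ]
```

A production implementation would also rebuild the patch indices and texture coordinates after clustering; only the vertex positions are handled here.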
For example, when the motion type is regular saccade or regular blinking, it corresponds to the third transmission control parameter. The model detail level corresponding to the third transmission control parameter is determined to be 1 according to the pre-constructed LOD pyramid; the transmission terminal down-samples the three-dimensional reconstruction data sent in real time by the acquisition terminal to obtain three-dimensional reconstruction data at the second resolution and sends it to the rendering display terminal, which then renders and displays the three-dimensional model from the data at the second resolution. The second resolution is greater than the first resolution.
S603: Directly transmit the three-dimensional reconstruction data with unreduced data volume, so that the rendering display terminal renders and displays the three-dimensional model according to that data.
In this step, when the motion frequency is less than or equal to the preset frequency threshold, the displayed three-dimensional model must have higher precision (i.e., higher resolution) in order to improve user experience, and the data volume of the three-dimensional reconstruction data need not be reduced. The acquisition terminal extracts three-dimensional reconstruction data from the acquired images in real time to reconstruct the model and uploads the data corresponding to the three-dimensional model to the transmission terminal in real time. The transmission terminal forwards the three-dimensional reconstruction data with unreduced data volume, uploaded in real time by the acquisition terminal, to the rendering display terminal, which renders and displays the high-precision three-dimensional model according to that data, improving the sense of immersion during user interaction.
It should be noted that, when there are multiple three-dimensional models in a scene, the rendering display terminal may sequentially render according to the order of the models, or may render according to the priority of the models.
In the above embodiments of the application, on the one hand, exploiting the fact that the human eye does not need a high-resolution view in an abnormal motion state, the rendering display terminal detects the human eye movement frequency according to the human eye visual parameters obtained by the eyeball tracking device. When it detects that the motion frequency is greater than the preset threshold, indicating that the human eye is in an abnormal motion state, it adaptively adjusts the rendering parameters in real time and transmits the adjusted rendering parameters to the transmission terminal through a control instruction; the transmission terminal then reduces the data volume required for three-dimensional model reconstruction according to the transmission control parameters corresponding to the motion type indicated by the rendering parameters, thereby reducing cloud transmission pressure, reducing rendering delay, and improving the remote communication experience. On the other hand, the three-dimensional reconstruction data transmitted in the embodiments of the application is matched to the visual state of the human eye: in the normal motion state, three-dimensional reconstruction data with unreduced data volume is transmitted so that the human eye sees a clear three-dimensional model, while in the abnormal motion state the human eye does not need a high-resolution model view, so a three-dimensional model with reduced data volume is transmitted, improving rendering efficiency.
In the embodiment of the application, the three-dimensional reconstruction process of the rendering display end and the transmission end is shown in fig. 7. The acquisition end acquires RGBD and RGB images, performs three-dimensional reconstruction according to the three-dimensional reconstruction data extracted from the acquired images, and sends the three-dimensional reconstruction data to the transmission end in real time. The rendering display end obtains the human eye visual parameters through its installed eyeball tracking device, detects the human eye movement frequency, adjusts the rendering parameters in real time according to that frequency, and after adjustment sends the rendering parameters to the transmission end through a control instruction. The transmission end adaptively controls the transmission of the three-dimensional reconstruction data according to the transmission control parameters corresponding to the motion type indicated by the rendering parameters, encodes the three-dimensional reconstruction data, and transmits it to the rendering display end. The rendering display end renders and displays the three-dimensional model according to the received three-dimensional reconstruction data and the adjusted rendering parameters.
Taking the interaction process between a rendering display terminal and a transmission terminal as an example, fig. 8 exemplarily shows a flowchart of the three-dimensional reconstruction method in a complete remote interaction process provided by the embodiment of the present application. As shown in fig. 8, the method mainly includes the following steps:
S801: The acquisition terminal acquires the RGBD image and the RGB image in real time and extracts three-dimensional reconstruction data from them.
In this step, the three-dimensional reconstruction data includes vertex data, patch data, texture data, and the like. The acquisition terminal acquires the RGBD and RGB images in real time and performs denoising. The foreground of the RGBD image is segmented from the background to obtain a clean human body model, and the vertex data, patch data, and the like of the geometric model are extracted. Three-dimensional texture data is extracted from the RGB image.
S802: and the acquisition terminal reconstructs the three-dimensional model according to the extracted three-dimensional reconstruction data and sends the three-dimensional reconstruction data corresponding to the three-dimensional model to the cloud server.
In the step, the acquisition terminal matches the extracted vertex data with vertex data corresponding to the parameterized human body model to obtain a complete human body inner layer model, carries out texture mapping on the human body inner layer model according to the extracted texture data to obtain a human body surface dense model, and sends three-dimensional reconstruction data corresponding to the human body surface dense model to the cloud server.
S803: The rendering display terminal acquires the human eye visual parameters through the eyeball tracking device and detects the human eye movement frequency according to those parameters.
In this step, the human eye visual parameters include at least one of the eye aspect ratio and the gaze point position information. For the specific detection process see S401; it is not repeated here.
S804: the rendering display terminal determines whether the motion frequency is greater than a preset frequency threshold; if so, S805 is executed, otherwise S810 is executed.
In this step, the motion frequency is compared with the preset frequency threshold. If the motion frequency is greater than the threshold, the data volume required for three-dimensional reconstruction can be reduced by adjusting the rendering parameters, and S805 is executed. If the motion frequency is less than or equal to the threshold, the data volume is not reduced, so that the human eyes see a clear model, and S810 is executed.
S805: the rendering display terminal adjusts the rendering parameters and sends a control instruction carrying the rendering parameters to the cloud server.
In this step, the rendering parameters are used to indicate the motion type of the human eyes, and the correspondence between motion types and transmission control parameters is shown in table 1. For the adjustment process of the rendering parameters, refer to S403; it is not repeated here.
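Table 1 itself is not reproduced in this text; a hedged sketch of such a motion-type to transmission-control mapping (the keys, actions, and LOD levels below are hypothetical, not the patent's actual Table 1 entries) might be:

```python
# Hypothetical mapping; the real Table 1 entries are not given in this text.
MOTION_TO_CONTROL = {
    "blink":   {"action": "stop_sending"},           # eyes closed: skip frames
    "saccade": {"action": "downsample", "lod": 2},   # fast glance: coarse model
}

def transmission_control(motion_type):
    """Look up the transmission control parameter for an eye-motion type,
    falling back to full-quality transmission for unlisted types."""
    return MOTION_TO_CONTROL.get(motion_type, {"action": "full_quality"})
```

Both actions shown here do appear in the claims ("stopping sending the three-dimensional reconstruction data or reducing the resolution of voxels"); only the table layout and values are illustrative.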
S806: the cloud server receives the control instruction and sends the corresponding three-dimensional reconstruction data according to the transmission control parameter corresponding to the motion type indicated by the rendering parameters, so as to reduce the data volume required for reconstructing the three-dimensional model.
In this step, the three-dimensional reconstruction data is sent to the cloud server by the acquisition terminal in real time, and the cloud server adaptively reduces the transmission of the three-dimensional reconstruction data according to the rendering parameters fed back for the visual state. For a detailed description, refer to S602; it is not repeated here.
S807: the rendering display terminal receives the three-dimensional reconstruction data with the reduced data volume, and renders and displays the three-dimensional model of the current frame according to that data.
For a detailed description of this step, refer to S404; it is not repeated here.
S808: the rendering display terminal predicts the three-dimensional reconstruction data of the frame following the current frame according to the reduced-volume three-dimensional reconstruction data and the three-dimensional reconstruction data of the N frames preceding the current frame.
For a detailed description of this step, refer to S4041; it is not repeated here.
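S4041 is likewise not reproduced here. One simple realisation of predicting the next frame from the preceding frames (an assumption, since the patent does not give the predictor) is constant-velocity extrapolation of the vertex positions:

```python
import numpy as np

def predict_next_vertices(frames):
    """Linearly extrapolate per-vertex positions from the two most recent
    frames; `frames` is a list of (V, 3) arrays, oldest first, len >= 2."""
    prev, curr = frames[-2], frames[-1]
    return curr + (curr - prev)   # constant-velocity step

f0 = np.zeros((4, 3))
f1 = np.ones((4, 3))
pred = predict_next_vertices([f0, f1])   # every vertex advances by one unit
```

With N > 1 previous frames available, a higher-order fit over the frame history could be substituted without changing the surrounding flow.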
S809: the rendering display terminal renders the three-dimensional model of the next frame according to the predicted three-dimensional reconstruction data.
For a detailed description of this step, refer to S4042; it is not repeated here.
S810: the rendering display terminal receives the three-dimensional reconstruction data with unreduced data volume sent by the cloud server, and renders and displays the three-dimensional model of the current frame according to that data.
In this step, when the motion frequency is less than or equal to the preset frequency threshold, the displayed three-dimensional model must keep a high resolution so that the human eyes see a clear model and the user experience is preserved; the transmission terminal therefore does not reduce the data volume of the three-dimensional reconstruction data. The transmission terminal directly forwards the unreduced three-dimensional reconstruction data transmitted in real time by the acquisition terminal to the rendering display terminal, which reconstructs a high-resolution three-dimensional model from it. For a detailed description of this step, refer to S405; it is not repeated here.
Based on the same inventive concept, an embodiment of the present application further provides a rendering display terminal that can implement the method steps of fig. 4a and fig. 4b. Its problem-solving principle is similar to that of the method embodiments and it achieves the technical effects described above; repeated details are not described again.
Referring to fig. 9, the rendering display terminal includes a detection module 901, an adjustment module 902, a sending module 903, a receiving module 904, and a rendering display module 905:
the detection module 901 is configured to obtain human eye visual parameters and detect human eye movement frequency according to the human eye visual parameters;
an adjusting module 902, configured to adjust a rendering parameter if the motion frequency is greater than a preset frequency threshold, where the rendering parameter is used to indicate a motion type of human eyes;
a sending module 903, configured to send a control instruction carrying a rendering parameter, where the control instruction is used to instruct to send corresponding three-dimensional reconstruction data according to a transmission control parameter corresponding to a motion type, so as to reduce a data amount required for reconstructing a three-dimensional model;
a receiving module 904, configured to receive the three-dimensional reconstruction data with reduced data size; and if the motion frequency is less than or equal to the preset frequency threshold, directly receiving the three-dimensional reconstruction data with the unreduced data volume.
And a rendering and displaying module 905, configured to render and display the three-dimensional model of the current frame according to the three-dimensional reconstruction data with the reduced data size, or render and display the three-dimensional model of the current frame according to the three-dimensional reconstruction data with the unreduced data size.
Optionally, the motion frequency includes at least one of a blink frequency and a saccade frequency, and the detection module 901 is specifically configured to:
detect the blink frequency according to the eye aspect ratio when the human eye visual parameters include the eye aspect ratio; and/or
detect the saccade frequency according to the gaze point position information when the human eye visual parameters include the gaze point position information.
Optionally, the adjustment formula of the rendering parameter is:
ρ_t(α, β) = T + K(α_t, β_t)
where ρ_t(α, β) is the rendering parameter after adjustment at frame t, T is a preset adjustment threshold, K is a preset rendering parameter, α_t is the eye aspect ratio at frame t, and β_t is the gaze point movement state at frame t.
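The publication does not define how K combines α_t and β_t. Read as a preset scale applied to their sum (one possible interpretation only, with hypothetical T and K values), the adjustment could be sketched as:

```python
def adjust_rendering_parameter(alpha_t, beta_t, T=1.0, K=0.5):
    """One reading of rho_t(alpha, beta) = T + K(alpha_t, beta_t): K taken
    as a scalar applied to the sum of the two eye parameters. T and K are
    illustrative values, not from the patent."""
    return T + K * (alpha_t + beta_t)

rho = adjust_rendering_parameter(alpha_t=0.2, beta_t=0.4)
```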
Optionally, the rendering display terminal further includes a prediction module 906, configured to:
predicting three-dimensional reconstruction data of a next frame of the current frame according to the three-dimensional reconstruction data with the reduced data size and three-dimensional reconstruction data of a previous N frames adjacent to the current frame, wherein N is an integer greater than or equal to 1;
and rendering the three-dimensional model of the next frame according to the predicted three-dimensional reconstruction data.
Optionally, the predicting module 906 is further configured to:
and if the motion frequency is recovered to be less than or equal to the preset frequency threshold, receiving the three-dimensional reconstruction data corresponding to the next frame, and replacing the predicted three-dimensional reconstruction data with the received three-dimensional reconstruction data.
Based on the same inventive concept, an embodiment of the present application further provides a transmission terminal that can implement the method steps of fig. 6. Its problem-solving principle is similar to that of the method embodiments and it achieves the technical effects described above; repeated details are not described again.
Referring to fig. 10, the transmission terminal includes a receiving module 1001, a transmitting module 1002:
a receiving module 1001, configured to receive a control instruction if the motion frequency is greater than a preset frequency threshold, where the control instruction carries rendering parameters used to indicate the motion type of the human eyes; the rendering parameters are adjusted after the rendering display terminal detects, according to the acquired human eye visual parameters, that the human eye movement frequency is greater than the preset frequency threshold;
a sending module 1002, configured to send corresponding three-dimensional reconstruction data according to the transmission control parameter corresponding to the motion type to reduce a data amount required for reconstructing the three-dimensional model, so that the rendering display terminal reconstructs and displays the three-dimensional model according to the three-dimensional reconstruction data with the reduced data amount; or if the motion frequency is less than or equal to the preset frequency threshold, directly transmitting the three-dimensional reconstruction data with the unreduced data volume, and enabling the rendering display terminal to render and display the three-dimensional model according to the three-dimensional reconstruction data with the unreduced data volume.
Optionally, the sending module 1002 is specifically configured to:
determining a model detail level corresponding to the transmission control parameter according to a pre-constructed detail level LOD pyramid;
and according to the model detail level, performing down-sampling on the received three-dimensional reconstruction data, and sending the down-sampled three-dimensional reconstruction data to a rendering display terminal.
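Neither the structure of the LOD pyramid nor the down-sampling operator is detailed in this text. A hedged stride-decimation sketch (the per-level fractions are assumptions, and stride decimation stands in for whatever mesh simplification the pyramid actually uses) could look like:

```python
import numpy as np

# Hypothetical LOD pyramid: each level keeps a fraction of the vertices.
LOD_FRACTIONS = {0: 1.0, 1: 0.5, 2: 0.25, 3: 0.125}

def downsample_vertices(vertices, level):
    """Keep every k-th vertex so that roughly LOD_FRACTIONS[level] of the
    (V, 3) vertex array survives; a stand-in for true mesh simplification."""
    stride = max(int(round(1.0 / LOD_FRACTIONS[level])), 1)
    return vertices[::stride]

verts = np.arange(24, dtype=float).reshape(8, 3)
coarse = downsample_vertices(verts, level=2)   # stride 4: keeps 2 of 8 vertices
```

A production LOD pipeline would also decimate the patch and texture data consistently with the vertices; only the vertex path is sketched here.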
Based on the same inventive concept, an embodiment of the present application further provides a rendering display terminal that can implement the method steps of fig. 4a and fig. 4b. Its problem-solving principle is similar to that of the method embodiments and it achieves the technical effects described above; repeated details are not described again.
Referring to fig. 11, the rendering display terminal includes an eyeball tracking device 1101, a display 1102, a memory 1103, and a processor 1104. The eyeball tracking device 1101, the display 1102, and the memory 1103 are each connected to the processor 1104 by a bus (indicated by a thick solid line in fig. 11). The eyeball tracking device 1101 is configured to acquire the human eye visual parameters, the display 1102 is configured to display the three-dimensional model, the memory 1103 is configured to store computer program instructions, and the processor 1104 is configured to execute the three-dimensional reconstruction method on the rendering display terminal side according to the computer program instructions.
Based on the same inventive concept, an embodiment of the present application further provides a transmission terminal that can implement the method steps of fig. 6. Its problem-solving principle is similar to that of the method embodiments and it achieves the technical effects described above; repeated details are not described again.
Referring to fig. 12, the transmission terminal includes a memory 1201, a processor 1202, and the memory 1201 and the processor 1202 are connected by a bus (indicated by a thick solid line in fig. 12). The memory 1201 is configured to store computer program instructions, and the processor 1202 is configured to execute the three-dimensional reconstruction method at the transmission terminal side in the embodiment of the present application according to the computer program instructions.
An embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions, where the computer-executable instructions are used to cause a computer to execute the methods in the foregoing embodiments.
The present application is described above with reference to block diagrams and/or flowchart illustrations of methods, apparatus (systems) and/or computer program products according to embodiments of the application. It will be understood that one block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, and/or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks.
Accordingly, the subject application may also be embodied in hardware and/or in software (including firmware, resident software, micro-code, etc.). Furthermore, the present application may take the form of a computer program product on a computer-usable or computer-readable storage medium having computer-usable or computer-readable program code embodied in the medium for use by or in connection with an instruction execution system. In the context of this application, a computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A three-dimensional reconstruction method is applied to a rendering display terminal and comprises the following steps:
acquiring human eye visual parameters, and detecting human eye movement frequency according to the human eye visual parameters;
if the motion frequency is larger than a preset frequency threshold, adjusting a rendering parameter, and sending a control instruction carrying the rendering parameter to a transmission terminal, wherein the rendering parameter is used for indicating the motion type of the human eyes, and the control instruction is used for indicating the transmission terminal to send corresponding three-dimensional reconstruction data according to the transmission control parameter corresponding to the motion type so as to reduce the data volume required by the reconstruction of the three-dimensional model; receiving the three-dimensional reconstruction data with reduced data volume sent by the transmission terminal, and rendering and displaying the three-dimensional model of the current frame according to the three-dimensional reconstruction data with reduced data volume;
if the motion frequency is less than or equal to a preset frequency threshold, directly receiving three-dimensional reconstruction data with unreduced data volume sent by the transmission terminal, and rendering and displaying a three-dimensional model of the current frame according to the three-dimensional reconstruction data with unreduced data volume;
the three-dimensional reconstruction data are sent to the transmission terminal by the acquisition terminal, and the mode for reducing the data volume required by the three-dimensional model reconstruction comprises stopping sending the three-dimensional reconstruction data or reducing the resolution of voxels in the three-dimensional reconstruction.
2. The method of claim 1, wherein the motion frequency comprises at least one of a blink frequency and a saccade frequency, and wherein detecting the human eye motion frequency from the human eye vision parameters comprises:
the human eye vision parameters comprise eye aspect ratios, and blink frequency is detected according to the eye aspect ratios; and/or
The human eye vision parameters comprise fixation point position information, and human eye glancing frequency is detected according to the fixation point position information.
3. The method of claim 1, wherein the adjustment formula of the rendering parameters is:
ρ_t(α, β) = T + K(α_t, β_t)
where ρ_t(α, β) is the rendering parameter after adjustment at frame t, T is a preset adjustment threshold, K is a preset rendering parameter, α_t is the eye aspect ratio at frame t, and β_t is the gaze point movement state at frame t.
4. The method of claim 1, wherein rendering the three-dimensional model of the current frame from the reduced amount of data three-dimensional reconstructed data further comprises:
predicting three-dimensional reconstruction data of a next frame of the current frame according to the three-dimensional reconstruction data with reduced data size and three-dimensional reconstruction data of a previous N frames adjacent to the current frame, wherein N is an integer greater than or equal to 1;
and rendering the three-dimensional model of the next frame according to the predicted three-dimensional reconstruction data.
5. The method of claim 4, wherein the method further comprises:
and if the motion frequency is recovered to be less than or equal to a preset frequency threshold, receiving the three-dimensional reconstruction data corresponding to the next frame, and replacing the predicted three-dimensional reconstruction data with the received three-dimensional reconstruction data.
6. A three-dimensional reconstruction method is applied to a transmission terminal, and comprises the following steps:
if the motion frequency is larger than a preset frequency threshold, receiving a control instruction sent by a rendering display terminal, wherein the control instruction carries a rendering parameter, and the rendering parameter is used for indicating the motion type of human eyes; according to the transmission control parameters corresponding to the motion types, reducing the data volume of the three-dimensional reconstruction data sent by the acquisition terminal, and sending the data volume to the rendering display terminal, so that the rendering display terminal reconstructs and displays a three-dimensional model according to the three-dimensional reconstruction data with the reduced data volume; the rendering display terminal detects that the human eye movement frequency is larger than a preset frequency threshold value according to the obtained human eye visual parameters and then adjusts the rendering parameters;
if the motion frequency is less than or equal to a preset frequency threshold, directly transmitting the three-dimensional reconstruction data with unreduced data volume sent by the acquisition terminal, and enabling the rendering display terminal to render and display the three-dimensional model according to the three-dimensional reconstruction data with unreduced data volume;
the way of reducing the data amount required for the reconstruction of the three-dimensional model includes stopping sending the three-dimensional reconstruction data or reducing the resolution of the voxel in the three-dimensional reconstruction.
7. The method of claim 6, wherein the reducing the data volume of the three-dimensional reconstruction data according to the transmission control parameter corresponding to the motion type comprises:
determining a model detail level corresponding to the transmission control parameter according to a pre-constructed detail level LOD pyramid;
and according to the model detail level, performing down-sampling on the received three-dimensional reconstruction data, and sending the down-sampled three-dimensional reconstruction data to the rendering display terminal so as to reduce the data volume required by the reconstruction of the three-dimensional model.
8. A rendering display terminal, comprising an eye tracking device, a display, a memory, a processor:
the eyeball tracking device is connected with the processor and is configured to acquire human eye vision parameters;
the display, coupled to the processor, configured to display a three-dimensional model;
the memory, coupled to the processor, configured to store computer program instructions;
the processor configured to perform the following operations in accordance with the computer program instructions:
acquiring human eye visual parameters, and detecting human eye movement frequency according to the human eye visual parameters;
if the motion frequency is larger than a preset frequency threshold, adjusting a rendering parameter, and sending a control instruction carrying the rendering parameter to a transmission terminal, wherein the rendering parameter is used for indicating the motion type of the human eyes, and the control instruction is used for indicating the transmission terminal to send corresponding three-dimensional reconstruction data according to the transmission control parameter corresponding to the motion type so as to reduce the data volume required by three-dimensional model reconstruction; receiving the three-dimensional reconstruction data with the reduced data volume sent by the transmission terminal, and rendering and displaying the three-dimensional model of the current frame according to the three-dimensional reconstruction data with the reduced data volume;
if the motion frequency is less than or equal to a preset frequency threshold, directly receiving three-dimensional reconstruction data with unreduced data volume sent by the transmission terminal, and rendering and displaying a three-dimensional model of the current frame according to the three-dimensional reconstruction data with unreduced data volume;
the three-dimensional reconstruction data are sent to the transmission terminal by the acquisition terminal, and the mode for reducing the data volume required by the reconstruction of the three-dimensional model comprises the step of stopping sending the three-dimensional reconstruction data or reducing the resolution of voxels in the three-dimensional reconstruction.
9. A transmission terminal, comprising a memory and a processor:
the memory, coupled to the processor, configured to store computer program instructions;
the processor configured to perform the following operations in accordance with the computer program instructions:
if the motion frequency is larger than a preset frequency threshold, receiving a control instruction sent by a rendering display terminal, wherein the control instruction carries a rendering parameter, and the rendering parameter is used for indicating the motion type of human eyes; according to the transmission control parameters corresponding to the motion types, reducing the data volume of the three-dimensional reconstruction data sent by the acquisition terminal, and sending the data volume to the rendering display terminal, so that the rendering display terminal reconstructs and displays a three-dimensional model according to the three-dimensional reconstruction data with the reduced data volume; the rendering display terminal detects that the human eye movement frequency is larger than a preset frequency threshold value according to the obtained human eye visual parameters and then adjusts the rendering parameters;
if the motion frequency is less than or equal to a preset frequency threshold, directly transmitting the three-dimensional reconstruction data with unreduced data volume sent by the acquisition terminal, and enabling the rendering display terminal to render and display the three-dimensional model according to the three-dimensional reconstruction data with unreduced data volume;
wherein the ways of reducing the data volume required for three-dimensional model reconstruction include stopping sending the three-dimensional reconstruction data or reducing the resolution of voxels in the three-dimensional reconstruction.
10. A three-dimensional reconstruction system is characterized by comprising an acquisition terminal, a transmission terminal and a rendering display terminal;
the acquisition terminal is used for acquiring a depth image and a color image, extracting three-dimensional reconstruction data from the acquired depth image and the corresponding color image, reconstructing a three-dimensional model according to the extracted three-dimensional reconstruction data, and sending the three-dimensional reconstruction data corresponding to the three-dimensional model to the transmission terminal;
the transmission terminal is used for receiving the three-dimensional reconstruction data sent by the acquisition terminal; if the motion frequency is greater than a preset frequency threshold, receiving a control instruction which is sent by the rendering display terminal and carries a rendering parameter, wherein the rendering parameter is used for indicating the motion type of human eyes; reducing the data volume of the three-dimensional reconstruction data according to the transmission control parameters corresponding to the motion types, and sending the three-dimensional reconstruction data with the reduced data volume to the rendering display terminal; if the motion frequency is less than or equal to a preset frequency threshold, directly transmitting three-dimensional reconstruction data with unreduced data volume;
the rendering display terminal is used for acquiring human eye visual parameters and detecting human eye movement frequency according to the human eye visual parameters; if the motion frequency is larger than a preset frequency threshold value, adjusting a rendering parameter, and sending a control instruction carrying the rendering parameter to the transmission terminal; receiving the three-dimensional reconstruction data with reduced data volume sent by the transmission terminal, and rendering and displaying the three-dimensional model of the current frame according to the three-dimensional reconstruction data with reduced data volume; if the motion frequency is less than or equal to a preset frequency threshold, directly receiving three-dimensional reconstruction data with unreduced data volume, rendering a three-dimensional model of the current frame according to the three-dimensional reconstruction data with unreduced data volume and displaying the three-dimensional model;
the way of reducing the data amount required for the reconstruction of the three-dimensional model includes stopping sending the three-dimensional reconstruction data or reducing the resolution of the voxel in the three-dimensional reconstruction.
CN202110612037.XA 2021-06-02 2021-06-02 Three-dimensional reconstruction method, device and system Active CN113362450B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110612037.XA CN113362450B (en) 2021-06-02 2021-06-02 Three-dimensional reconstruction method, device and system

Publications (2)

Publication Number Publication Date
CN113362450A CN113362450A (en) 2021-09-07
CN113362450B true CN113362450B (en) 2023-01-03

Family

ID=77531073

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110612037.XA Active CN113362450B (en) 2021-06-02 2021-06-02 Three-dimensional reconstruction method, device and system

Country Status (1)

Country Link
CN (1) CN113362450B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114373056A (en) * 2021-12-17 2022-04-19 云南联合视觉科技有限公司 Three-dimensional reconstruction method and device, terminal equipment and storage medium
CN115311397A (en) * 2022-08-09 2022-11-08 北京字跳网络技术有限公司 Method, apparatus, device and storage medium for image rendering
CN115409952B (en) * 2022-11-01 2023-01-24 湖南马栏山视频先进技术研究院有限公司 Tuberculous meningoencephalitis reconstruction system, method and memory
CN117576294B (en) * 2024-01-16 2024-03-26 太原理工大学 Data transmission parameter optimization method of digital twin system of mining equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108919958A (en) * 2018-07-16 2018-11-30 北京七鑫易维信息技术有限公司 A kind of image transfer method, device, terminal device and storage medium
CN109886876A (en) * 2019-02-25 2019-06-14 昀光微电子(上海)有限公司 A kind of nearly eye display methods based on visual characteristics of human eyes
CN112419483A (en) * 2020-11-24 2021-02-26 中电科新型智慧城市研究院有限公司 Three-dimensional model data transmission method and server

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9857871B2 (en) * 2015-09-04 2018-01-02 Sony Interactive Entertainment Inc. Apparatus and method for dynamic graphics rendering based on saccade detection
US10229540B2 (en) * 2015-12-22 2019-03-12 Google Llc Adjusting video rendering rate of virtual reality content and processing of a stereoscopic image
CN109840565A (en) * 2019-01-31 2019-06-04 成都大学 A kind of blink detection method based on eye contour feature point aspect ratio
CN110162185A (en) * 2019-06-10 2019-08-23 京东方科技集团股份有限公司 A kind of intelligent display method and device
US10901502B2 (en) * 2019-06-27 2021-01-26 Facebook, Inc. Reducing head mounted display power consumption and heat generation through predictive rendering of content
CN110378914A (en) * 2019-07-22 2019-10-25 北京七鑫易维信息技术有限公司 Rendering method and device, system, display equipment based on blinkpunkt information
US11307655B2 (en) * 2019-09-19 2022-04-19 Ati Technologies Ulc Multi-stream foveal display transport


Also Published As

Publication number Publication date
CN113362450A (en) 2021-09-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant