CN113810736B - AI-driven real-time point cloud video transmission method and system - Google Patents

AI-driven real-time point cloud video transmission method and system

Info

Publication number
CN113810736B
CN113810736B (application CN202110985757.0A)
Authority
CN
China
Prior art keywords
point cloud
video data
features
cloud video
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110985757.0A
Other languages
Chinese (zh)
Other versions
CN113810736A (en)
Inventor
乔秀全
黄亚坤
朱原玮
陈俊亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202110985757.0A priority Critical patent/CN113810736B/en
Publication of CN113810736A publication Critical patent/CN113810736A/en
Application granted granted Critical
Publication of CN113810736B publication Critical patent/CN113810736B/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23418Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding

Abstract

The invention discloses an AI-driven real-time point cloud video transmission method and system. The method comprises the following steps: acquiring video data information with an AI (artificial intelligence) generation device and processing the video data information to obtain point cloud video data; extracting hierarchical features from the point cloud video data, determining key point cloud features in each point cloud video frame, and transmitting the key point cloud features; and receiving the key point cloud features, then expanding and reconstructing them to obtain point cloud information close to the original input point cloud, forming a point cloud video that is visually equivalent to the original. The invention extracts features from the original point cloud video stream to be transmitted, transmits only a small set of key point cloud features, and finally restores and reconstructs them at the receiving end, so that, visually, the original point cloud video is what gets delivered. This significantly reduces the transmission volume and energy consumption of the point cloud video stream, avoids the complex multi-stage processing of traditional transmission schemes, and greatly reduces the amount of data transmitted.

Description

AI-driven real-time point cloud video transmission method and system
Technical Field
The invention relates to the field of point cloud video streaming transmission, in particular to an AI-driven real-time point cloud video transmission method and system.
Background
A point cloud is a typical and popular data format for describing volumetric media and holographic video, and can be captured by RGB-D cameras with depth sensors. It allows the user to experience a scene with six degrees of freedom and to change position and orientation within it, unlike conventional Virtual Reality (VR) video, which offers only three degrees of freedom. Volumetric media provide 3D scenes from multiple angles and are widely used in fields including education, healthcare, and entertainment.
At present, transmitting dense point cloud streams is very challenging even in existing network environments, including 5G, for the following reasons: (1) transmitting large volumes of point cloud video requires extremely high bandwidth; even with traditional compression, the required bitrate can remain at the gigabit-per-second (Gbps) level, which exceeds the capability of current 5G networks; (2) the computational overhead of 3D volumetric media is large because only software codecs can be used, and inefficient codecs further slow down transmission; (3) conventional techniques such as the rate adaptation and buffer control of adaptive bitrate streaming (ABR) are not suitable for 3D volumetric media, so advanced techniques for delivering volumetric media need to be explored.
Most existing point cloud video transmission techniques rely on traditional compression (both lossy and lossless), which reduces the amount of data to be transmitted but still has many shortcomings. On the one hand, lossless compression of point cloud video is not yet sufficient to achieve efficient transmission and a good user experience. On the other hand, under constrained network conditions, lossy compression makes it difficult to guarantee that the recovered point cloud faithfully matches the original video. Other point cloud video transmission techniques, such as those that extend current VR video streaming, transmit at the data-block level; these methods incur high energy consumption on mobile devices, their processing delay on receiving devices is often unacceptable, and each transmitted block is susceptible to network fluctuations and various packet losses during reassembly.
In summary, the transmission capability of the conventional technology is far from meeting the bandwidth requirement of the real-time point cloud video stream. Therefore, there is a need to explore an advanced transmission scheme to ensure good service is provided under the existing network.
Disclosure of Invention
Aiming at the problems in the related art, the invention provides an AI-driven real-time point cloud video transmission method and system, which can significantly reduce the transmission volume and energy consumption of point cloud video streams. The system avoids the complex multi-stage processing of traditional transmission schemes, and designs and trains an end-to-end deep learning network covering everything from raw data acquisition to final rendering and playback.
The technical scheme of the invention is realized as follows:
according to one aspect of the invention, an AI-driven real-time point cloud video transmission method is provided.
The AI-driven real-time point cloud video transmission method comprises the following steps:
acquiring video data information with an AI (artificial intelligence) generation device, and processing the video data information to obtain point cloud video data;
extracting hierarchical features from the point cloud video data, determining key point cloud features in each point cloud video frame, and transmitting the key point cloud features;
and receiving the key point cloud features, expanding and reconstructing the received key point cloud features to obtain point cloud information close to the original input point cloud, and forming a point cloud video that is visually equivalent to the original.
Acquiring video data information with the AI generation device and processing it to obtain point cloud video data comprises: scanning the 3D model to be transmitted with a plurality of depth cameras at different angles and acquiring the point cloud stream of each depth camera; and stitching the multiple point clouds collected by the multi-view cameras into one complete point cloud to obtain the point cloud video data.
in addition, the AI-driven real-time point cloud video transmission method further comprises:
training a pre-built deep neural network online, using a training set composed of a large number of 3D models at various scales, to obtain multiple candidate neural network models;
splitting each trained neural network model into a hierarchical feature extraction module and a point cloud restoration and reconstruction module based on a generative adversarial network (GAN), which are deployed respectively on a high-performance edge server close to the input end and on the user terminal device;
and, while deploying the hierarchical feature extraction module and the GAN-based point cloud restoration and reconstruction module to the high-performance edge server and the user terminal device, deploying an adaptive matcher, so that the adaptive matcher adaptively matches, according to network bandwidth changes monitored in real time, a neural network model that satisfies the current network's requirements for real-time point cloud video frame feature extraction and reconstruction.
Furthermore, the hierarchical feature extraction module comprises three set abstraction layers connected in series, which perform hierarchical feature learning to capture the local structure of the original point cloud; each set abstraction layer consists of three basic layers: a sampling layer, a grouping layer, and a mini-PointNet layer.
The sampling layer selects a subset from the output of the previous layer using the farthest point sampling technique, each point in the subset representing the center of a local region; the grouping layer finds the n nearest neighbor points around each local region center and combines them into a local region set; the mini-PointNet layer converts each local region set into a feature vector using three two-dimensional convolution layers and one max pooling layer; and the feature vectors output by the mini-PointNet layer of the last set abstraction layer are the data to be transmitted.
Further, the point cloud restoration and reconstruction module comprises a point cloud feature expansion part and a final point set generation part. The point cloud feature expansion part unifies the feature dimensions with a multilayer perceptron when the transmitted key point cloud features are received, and then generates a larger and more diverse set of point features, in both point count and feature dimension, through an up-down-up expansion unit. The final point set generation part comprises two multilayer perceptron layers, through which the expanded point cloud features are reconstructed into three-dimensional coordinates.
In addition, extracting hierarchical features from the point cloud video data and determining the key point cloud features in each point cloud video frame comprises: performing hierarchical feature extraction on the point cloud video data with the hierarchical feature extraction module deployed on the high-performance edge server close to the input end, and determining the key point cloud features in the point cloud video frame.
In addition, expanding and reconstructing the received key point cloud features to obtain point cloud information close to the original input point cloud and form a point cloud video that is visually equivalent to the original comprises: expanding and reconstructing the key point cloud features with the GAN-based point cloud restoration and reconstruction module deployed on the user terminal device, thereby obtaining point cloud information close to the original input point cloud and forming a point cloud video that is visually equivalent to the original.
According to another aspect of the invention, an AI-driven real-time point cloud video transmission system is provided.
The AI-driven real-time point cloud video transmission system comprises:
a point cloud video data acquisition module, which acquires video data information with an AI (artificial intelligence) generation device and processes the video data information to obtain point cloud video data;
a key point cloud feature extraction module, which extracts hierarchical features from the point cloud video data, determines the key point cloud features in each point cloud video frame, and transmits the key point cloud features;
and a point cloud restoration and reconstruction module, which receives the key point cloud features, expands and reconstructs them to obtain point cloud information close to the original input point cloud, and forms a point cloud video that is visually equivalent to the original.
When the point cloud video data acquisition module acquires video data information with the AI generation device and processes it into point cloud video data, it scans the 3D model to be transmitted with a plurality of depth cameras at different angles, acquires the point cloud stream of each depth camera, and stitches the multiple point clouds collected by the multi-view cameras into one complete point cloud to obtain the point cloud video data.
In addition, the AI-driven real-time point cloud video transmission system further comprises:
a neural network training module, which trains a pre-built deep neural network online, using a training set composed of a large number of 3D models at various scales, to obtain multiple candidate neural network models;
a neural network deployment module, which splits each trained neural network model into a hierarchical feature extraction module and a GAN-based point cloud restoration and reconstruction module, deployed respectively on a high-performance edge server close to the input end and on the user terminal device;
and a neural network matching module, which deploys an adaptive matcher while the hierarchical feature extraction module and the GAN-based point cloud restoration and reconstruction module are deployed to the high-performance edge server and the user terminal device, so that the adaptive matcher adaptively matches, according to network bandwidth changes monitored in real time, a neural network model that satisfies the current network's requirements for real-time point cloud video frame feature extraction and reconstruction.
Furthermore, the hierarchical feature extraction module comprises three set abstraction layers connected in series, which perform hierarchical feature learning to capture the local structure of the original point cloud; each set abstraction layer consists of three basic layers: a sampling layer, a grouping layer, and a mini-PointNet layer.
The sampling layer selects a subset from the output of the previous layer using the farthest point sampling technique, each point in the subset representing the center of a local region; the grouping layer finds the n nearest neighbor points around each local region center and combines them into a local region set; the mini-PointNet layer converts each local region set into a feature vector using three two-dimensional convolution layers and one max pooling layer; and the feature vectors output by the mini-PointNet layer of the last set abstraction layer are the data to be transmitted.
Further, the point cloud restoration and reconstruction module comprises a point cloud feature expansion part and a final point set generation part. The point cloud feature expansion part unifies the feature dimensions with a multilayer perceptron when the transmitted key point cloud features are received, and then generates a larger and more diverse set of point features, in both point count and feature dimension, through an up-down-up expansion unit. The final point set generation part comprises two multilayer perceptron layers, through which the expanded point cloud features are reconstructed into three-dimensional coordinates.
In addition, when the key point cloud feature extraction module performs hierarchical feature extraction on the point cloud video data and determines the key point cloud features in each point cloud video frame, it does so through the hierarchical feature extraction module deployed on the high-performance edge server close to the input end.
In addition, when the point cloud restoration and reconstruction module expands and reconstructs the received key point cloud features to obtain point cloud information close to the original input point cloud and form a point cloud video that is visually equivalent to the original, it does so through the GAN-based point cloud restoration and reconstruction module deployed on the user terminal device.
Beneficial effects:
The invention extracts features from the original point cloud video stream to be transmitted, transmits only a small set of key point cloud features, and finally restores and reconstructs them at the receiving end, so that, visually, the original point cloud video is what gets delivered; this significantly reduces the transmission volume and energy consumption of the point cloud video stream. The complex multi-stage processing of traditional transmission schemes is avoided and the amount of transmitted data is greatly reduced, making the method better suited to existing network environments. The invention also takes the dynamic, unstable nature of the network environment into account, incorporates it into end-to-end network design and training, and provides an adaptive transmission control algorithm to balance transmission delay against reconstruction accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. The drawings in the following description are only some embodiments of the present invention; other drawings can be obtained from them by those skilled in the art without creative effort.
FIG. 1 is a schematic flow chart of AI-driven real-time point cloud video transmission according to an embodiment of the present invention;
FIG. 2 is a block diagram schematically illustrating a structure of an AI-driven real-time point cloud video transmission system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating an AI-driven real-time point cloud video transmission method according to an embodiment of the invention;
FIG. 4 is a schematic structural design diagram of a deep neural network model according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present invention.
According to the embodiment of the invention, an AI-driven real-time point cloud video transmission method is provided.
As shown in fig. 1, the AI-driven real-time point cloud video transmission method according to the embodiment of the present invention includes:
Step S101, acquiring video data information with an AI generation device, and processing the video data information to obtain point cloud video data;
Step S103, extracting hierarchical features from the point cloud video data, determining key point cloud features in each point cloud video frame, and transmitting the key point cloud features;
and Step S105, receiving the key point cloud features, expanding and reconstructing the received key point cloud features to obtain point cloud information close to the original input point cloud, and forming a point cloud video that is visually equivalent to the original.
Acquiring video data information with the AI generation device and processing it to obtain point cloud video data comprises: scanning the 3D model to be transmitted with a plurality of depth cameras at different angles and acquiring the point cloud stream of each depth camera; and stitching the multiple point clouds collected by the multi-view cameras into one complete point cloud to obtain the point cloud video data.
in addition, the AI-driven real-time point cloud video transmission method further comprises:
training a pre-built deep neural network online, using a training set composed of a large number of 3D models at various scales, to obtain multiple candidate neural network models;
splitting each trained neural network model into a hierarchical feature extraction module and a point cloud restoration and reconstruction module based on a generative adversarial network (GAN), which are deployed respectively on a high-performance edge server close to the input end and on the user terminal device;
and, while deploying the hierarchical feature extraction module and the GAN-based point cloud restoration and reconstruction module to the high-performance edge server and the user terminal device, deploying an adaptive matcher, so that the adaptive matcher adaptively matches, according to network bandwidth changes monitored in real time, a neural network model that satisfies the current network's requirements for real-time point cloud video frame feature extraction and reconstruction.
Furthermore, the hierarchical feature extraction module comprises three set abstraction layers connected in series, which perform hierarchical feature learning to capture the local structure of the original point cloud; each set abstraction layer consists of three basic layers: a sampling layer, a grouping layer, and a mini-PointNet layer.
The sampling layer selects a subset from the output of the previous layer using the farthest point sampling technique, each point in the subset representing the center of a local region; the grouping layer finds the n nearest neighbor points around each local region center and combines them into a local region set; the mini-PointNet layer converts each local region set into a feature vector using three two-dimensional convolution layers and one max pooling layer; and the feature vectors output by the mini-PointNet layer of the last set abstraction layer are the data to be transmitted.
Further, the point cloud restoration and reconstruction module comprises a point cloud feature expansion part and a final point set generation part. The point cloud feature expansion part unifies the feature dimensions with a multilayer perceptron when the transmitted key point cloud features are received, and then generates a larger and more diverse set of point features, in both point count and feature dimension, through an up-down-up expansion unit. The final point set generation part comprises two multilayer perceptron layers, through which the expanded point cloud features are reconstructed into three-dimensional coordinates.
In addition, extracting hierarchical features from the point cloud video data and determining the key point cloud features in each point cloud video frame comprises: performing hierarchical feature extraction on the point cloud video data with the hierarchical feature extraction module deployed on the high-performance edge server close to the input end, and determining the key point cloud features in the point cloud video frame.
In addition, expanding and reconstructing the received key point cloud features to obtain point cloud information close to the original input point cloud and form a point cloud video that is visually equivalent to the original comprises: expanding and reconstructing the key point cloud features with the GAN-based point cloud restoration and reconstruction module deployed on the user terminal device, thereby obtaining point cloud information close to the original input point cloud and forming a point cloud video that is visually equivalent to the original.
According to an embodiment of the invention, an AI-driven real-time point cloud video transmission system is provided.
As shown in fig. 2, the AI-driven real-time point cloud video transmission system according to the embodiment of the present invention includes:
a point cloud video data acquisition module 201, configured to acquire video data information with an AI generation device and process the video data information to obtain point cloud video data;
a key point cloud feature extraction module 203, configured to extract hierarchical features from the point cloud video data, determine the key point cloud features in each point cloud video frame, and transmit the key point cloud features;
and a point cloud restoration and reconstruction module 205, configured to receive the key point cloud features, expand and reconstruct them to obtain point cloud information close to the original input point cloud, and form a point cloud video that is visually equivalent to the original.
When the point cloud video data acquisition module 201 acquires video data information with the AI generation device and processes it into point cloud video data, it scans the 3D model to be transmitted with a plurality of depth cameras at different angles, acquires the point cloud stream of each depth camera, and stitches the multiple point clouds collected by the multi-view cameras into one complete point cloud to obtain the point cloud video data.
In addition, the AI-driven real-time point cloud video transmission system further comprises: a neural network training module (not shown in the figure), which trains a pre-built deep neural network online, using a training set composed of a large number of 3D models at various scales, to obtain multiple candidate neural network models; a neural network deployment module (not shown in the figure), which splits each trained neural network model into a hierarchical feature extraction module and a GAN-based point cloud restoration and reconstruction module, deployed respectively on a high-performance edge server close to the input end and on the user terminal device; and a neural network matching module (not shown in the figure), which deploys an adaptive matcher while the hierarchical feature extraction module and the GAN-based point cloud restoration and reconstruction module are deployed to the high-performance edge server and the user terminal device, so that the adaptive matcher adaptively matches, according to network bandwidth changes monitored in real time, a neural network model that satisfies the current network's requirements for real-time point cloud video frame feature extraction and reconstruction.
The hierarchical feature extraction module comprises three set abstraction layers connected in series, which perform hierarchical feature learning to capture the local structure of the original point cloud; each set abstraction layer consists of three basic layers: a sampling layer, a grouping layer, and a mini-PointNet layer.
The sampling layer selects a subset from the output of the previous layer using the farthest point sampling technique, each point in the subset representing the center of a local region; the grouping layer finds the n nearest neighbor points around each local region center and combines them into a local region set; the mini-PointNet layer converts each local region set into a feature vector using three two-dimensional convolution layers and one max pooling layer; and the feature vectors output by the mini-PointNet layer of the last set abstraction layer are the data to be transmitted.
The point cloud restoration and reconstruction module comprises a point cloud feature expansion part and a final point set generation part. The point cloud feature expansion part unifies the feature dimensions with a multilayer perceptron when the transmitted key point cloud features are received, and then generates a larger and more diverse set of point features, in both point count and feature dimension, through an up-down-up expansion unit. The final point set generation part comprises two multilayer perceptron layers, through which the expanded point cloud features are reconstructed into three-dimensional coordinates.
In addition, when the key point cloud feature extraction module 203 performs hierarchical feature extraction on the point cloud video data and determines the key point cloud features in each point cloud video frame, it does so through the hierarchical feature extraction module deployed on the high-performance edge server close to the input end.
In addition, when the point cloud restoration and reconstruction module 205 expands and reconstructs the received key point cloud features to obtain point cloud information close to the original input point cloud and form a point cloud video that is visually equivalent to the original, it does so through the GAN-based point cloud restoration and reconstruction module deployed on the user terminal device.
In order to make the technical solution of the present invention more clearly understood, the technical solution of the present invention is described in detail below from the viewpoint of the operation principle.
Fig. 3 is a schematic diagram of the principle of the present invention, and as can be seen from fig. 3, the method of the present invention is as follows:
(1) Multi-view cameras. The system captures the original point clouds with a plurality of depth cameras placed at different angles; the cameras are connected by USB, and the point cloud stream of each camera is pre-processed and synchronized to a high-performance edge server for stitching.
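A minimal Python sketch of the stitching step, for illustration only: it assumes each camera's extrinsic calibration (a rotation R and translation t mapping camera coordinates to a shared world frame) is already known, and simply transforms and concatenates the per-camera clouds; the function and variable names are not from the patent.

```python
import numpy as np

def stitch_point_clouds(per_camera_points, extrinsics):
    """Merge per-camera point clouds into one cloud in a common world frame.

    per_camera_points: list of (N_i, 3) arrays, one per depth camera.
    extrinsics: list of (R, t) pairs mapping camera coordinates to world
                coordinates (assumed known from calibration).
    """
    world_clouds = []
    for pts, (R, t) in zip(per_camera_points, extrinsics):
        world_clouds.append(pts @ R.T + t)        # rigid transform per camera
    return np.concatenate(world_clouds, axis=0)   # complete stitched cloud

# Example: two cameras with identity calibration, purely for illustration.
cams = [np.random.rand(1000, 3), np.random.rand(1200, 3)]
calib = [(np.eye(3), np.zeros(3)), (np.eye(3), np.zeros(3))]
full_cloud = stitch_point_clouds(cams, calib)     # shape (2200, 3)
```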
(2) Point cloud feature extraction and point cloud information restoration and reconstruction. Key point cloud feature extraction means extracting the key features of the stitched point cloud with the hierarchical feature extraction module provided by the invention; point cloud information restoration and reconstruction based on the generative adversarial network means restoring and reconstructing the received point cloud features with the point cloud restoration and reconstruction module provided by the invention.
(3) An adaptive matcher. It senses the network condition of the connected terminal and selects the optimal transmission and inference model, so as to keep communication running stably and improve the user experience.
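A minimal sketch of how such an adaptive matcher could pick one of the candidate models from the measured bandwidth. The candidate table, the bandwidth-threshold policy, and the byte-cost formula below are illustrative assumptions; the patent does not specify a particular selection rule.

```python
from dataclasses import dataclass

@dataclass
class CandidateModel:
    name: str       # identifier of the trained (extractor, reconstructor) pair
    n_points: int   # N: number of transmitted feature points per patch
    feat_dim: int   # M: feature dimension per point

# Candidate models trained with different (N, M) combinations (illustrative values).
CANDIDATES = [
    CandidateModel("tiny",   n_points=8,  feat_dim=4),
    CandidateModel("small",  n_points=16, feat_dim=8),
    CandidateModel("medium", n_points=32, feat_dim=16),
    CandidateModel("large",  n_points=64, feat_dim=32),
]

def select_model(bandwidth_mbps: float, frame_rate: float, patches_per_frame: int,
                 bytes_per_value: int = 4) -> CandidateModel:
    """Pick the highest-fidelity model whose per-second payload fits the link."""
    budget_bytes = bandwidth_mbps * 1e6 / 8          # bytes/s available
    best = CANDIDATES[0]                             # fall back to the smallest model
    for model in sorted(CANDIDATES, key=lambda m: m.n_points * m.feat_dim):
        payload = (model.n_points * model.feat_dim * bytes_per_value
                   * patches_per_frame * frame_rate)
        if payload <= budget_bytes:
            best = model                             # a larger model still fits
    return best

print(select_model(bandwidth_mbps=50, frame_rate=30, patches_per_frame=400).name)
```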
(4) Base stations. The system can provide real-time point cloud transmission in the current network environment; the key point cloud features are transmitted wirelessly to various terminals through existing base stations.
(5) User terminals. The system can serve a wide range of terminals and application scenarios. For example, real-time holographic communication can be implemented on a smartphone; for a more immersive experience, the point cloud can be rendered with AR glasses or an AR headset, letting the user interact with the point cloud video in person.
Fig. 4 is a schematic structural design diagram of a deep neural network model, and as can be seen from fig. 4, the deep neural network model structure includes a hierarchical feature extraction module and a point cloud restoration and reconstruction module, and specifically includes:
(1) A hierarchical feature extraction module. It learns from the input point cloud, extracts a subset of key point cloud features, and transmits them. Specifically, the module performs hierarchical feature learning over several set abstraction layers to capture the local structure of the original point cloud. Each set abstraction layer consists of three basic layers: a sampling layer, a grouping layer, and a mini-PointNet layer.
The sampling layer selects a subset from the output of the previous layer to represent the centers of local regions; the grouping layer finds the n nearest neighbors around each center to build a local region set; and the mini-PointNet converts each local region set into a feature vector using three two-dimensional convolution layers and one max pooling layer.
(2) A point cloud restoration and reconstruction module. It restores and reconstructs the key point cloud features received at the receiving end. In particular, the module uses only the generation part of a generative adversarial network, which has fewer parameters and less computation than the full adversarial network and is therefore easier to deploy on resource-constrained terminals. It comprises a point cloud feature expansion part and a final point set generation part.
The point cloud feature expansion part receives the transmitted point cloud feature matrix and unifies the feature dimensions through a multilayer perceptron layer; then, through an up-down-up expansion unit, it generates a larger and more diverse set of point features, in both point count and feature dimension.
The final point set generation part reconstructs the expanded features into three-dimensional coordinates through two multilayer perceptron layers.
In actual application, when the point cloud feature extraction is performed on the original data through a hierarchical feature extraction module in the deep neural network, the specific implementation scheme can be as follows:
(1) Training phase
a. On the surface of each point cloud model in the training set, 200 points are randomly selected as seed coordinates. With each seed coordinate as the center, the farthest point sampling technique is used to find 256 surrounding points, such that the region formed by these 256 points covers about 5% of the model surface; the seed coordinate together with the 256 points is defined as a patch, and the coordinates of the point set in the patch are normalized to a unit sphere. In this embodiment, each sample input to the neural network is one patch containing 256 points, each with three-dimensional coordinates, so the input can be represented as (256, 3).
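A minimal sketch of this patch extraction, under stated assumptions: nearest-neighbour gathering is used below as a stand-in for the FPS-based neighbourhood search described above, and the 5%-of-surface criterion is not enforced.

```python
import numpy as np

def normalize_to_unit_sphere(points):
    """Center the patch and scale it so all points lie inside a unit sphere."""
    centered = points - points.mean(axis=0)
    radius = np.linalg.norm(centered, axis=1).max()
    return centered / max(radius, 1e-8)

def extract_patches(cloud, num_seeds=200, patch_size=256):
    """Cut a point cloud into fixed-size patches around randomly chosen seeds."""
    seeds = cloud[np.random.choice(len(cloud), num_seeds, replace=False)]
    patches = []
    for seed in seeds:
        # Nearest-neighbour gathering stands in for the patent's FPS-based
        # neighbourhood search; both return 256 points surrounding the seed.
        dists = np.linalg.norm(cloud - seed, axis=1)
        neighbours = cloud[np.argsort(dists)[:patch_size]]
        patches.append(normalize_to_unit_sphere(neighbours))   # (256, 3) per patch
    return np.stack(patches)                                   # (num_seeds, 256, 3)
```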
b. The (256, 3) input sample is fed to the sampling layer, which uses the farthest point sampling technique to select 128 points, producing a sparse point set denoted (128, 3). Farthest point sampling is chosen because it covers the whole point set better than random sampling. The number of center points is manually specified; in this embodiment it is 128.
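The farthest point sampling technique referred to throughout can be sketched as follows (NumPy, unoptimized; the random choice of the starting point is an assumption):

```python
import numpy as np

def farthest_point_sampling(points, num_samples):
    """Iteratively pick the point farthest from all points chosen so far."""
    n = points.shape[0]
    selected = np.zeros(num_samples, dtype=np.int64)
    min_dist = np.full(n, np.inf)
    selected[0] = np.random.randint(n)            # arbitrary starting point
    for i in range(1, num_samples):
        diff = points - points[selected[i - 1]]
        min_dist = np.minimum(min_dist, np.einsum("ij,ij->i", diff, diff))
        selected[i] = int(np.argmax(min_dist))    # farthest from the current set
    return points[selected]                       # e.g. (128, 3) sparse point set

sparse = farthest_point_sampling(np.random.rand(256, 3), 128)   # (128, 3)
```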
c. The sparse point set (128, 3) is fed to the grouping layer. Specifically, with the 128 points as centers, the sphere query method generates 128 local regions, each containing 32 points within a sphere of radius 0.2, yielding grouped features denoted (128, 32, 3). The number of points per region and the sphere radius are manually specified, here as 32 and 0.2. This step can also be implemented with the k-nearest-neighbor method; the two have little influence on the result.
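A sketch of the sphere-query (ball query) grouping. Padding by repetition when a sphere contains fewer than 32 points, and expressing coordinates relative to the centre, are common conventions assumed here; the patent does not spell them out.

```python
import numpy as np

def ball_query(centers, points, radius=0.2, group_size=32):
    """Gather a fixed-size neighbourhood inside a sphere around each centre."""
    groups = []
    for c in centers:
        d = np.linalg.norm(points - c, axis=1)
        idx = np.where(d <= radius)[0]
        if idx.size == 0:                     # degenerate case: keep the nearest point
            idx = np.array([int(np.argmin(d))])
        # Pad by repeating indices so every group has exactly `group_size` points.
        idx = np.resize(idx, group_size)
        groups.append(points[idx] - c)        # coordinates relative to the centre
    return np.stack(groups)                   # (num_centres, 32, 3)

grouped = ball_query(np.random.rand(128, 3), np.random.rand(256, 3))   # (128, 32, 3)
```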
d. The grouped features (128, 32, 3) are fed to the mini-PointNet layer. Specifically, they pass through three two-dimensional convolution layers and one max pooling layer in sequence, outputting hierarchical feature information (128, 64). In this embodiment, the numbers of output channels of the three two-dimensional convolution layers are 64, 64, and 64; the convolution kernel size is 1 × 1, the stride is 1 × 1, and there is no padding.
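A sketch of the mini-PointNet layer. PyTorch and the ReLU activations are assumptions; the 1 × 1 convolutions and the max pooling over each local region follow the description above.

```python
import torch
import torch.nn as nn

class MiniPointNet(nn.Module):
    """Three 1x1 2D convolutions followed by max pooling over each local region."""

    def __init__(self, in_channels=3, channels=(64, 64, 64)):
        super().__init__()
        layers, prev = [], in_channels
        for c in channels:
            layers += [nn.Conv2d(prev, c, kernel_size=1, stride=1), nn.ReLU()]
            prev = c
        self.convs = nn.Sequential(*layers)

    def forward(self, grouped):
        # grouped: (B, C_in, num_regions, points_per_region), e.g. (B, 3, 128, 32)
        feats = self.convs(grouped)             # (B, 64, 128, 32)
        feats = feats.max(dim=-1).values        # max-pool over the 32 points per region
        return feats                            # (B, 64, 128): one vector per region

x = torch.rand(1, 3, 128, 32)
print(MiniPointNet()(x).shape)                  # torch.Size([1, 64, 128])
```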
e. The hierarchical feature information (128, 64) is fed to a sampling layer again; specifically, 64 points are selected with the farthest point sampling technique, outputting sparse point set features (64, 64). The number of center points is manually specified; in this embodiment it is 64.
f. The sparse point set features (64, 64) are fed to a grouping layer; specifically, with the 64 points as centers, the sphere query method generates 64 local regions, each containing 64 points within a sphere of radius 0.3, yielding grouped features (64, 64, 64). The number of points per region and the sphere radius are manually specified, here as 64 and 0.3. This step can also be implemented with the k-nearest-neighbor method.
g. The grouped features (64, 64, 64) are fed to the mini-PointNet layer; specifically, they pass through three two-dimensional convolution layers and one max pooling layer in sequence, outputting hierarchical feature information (64, 32). In this embodiment, the numbers of output channels of the three two-dimensional convolution layers are 64, 64, and 32; the convolution kernel size is 1 × 1, the stride is 1 × 1, and there is no padding.
h. The hierarchical feature information (64, 32) is fed to a sampling layer again; specifically, N points are selected with the farthest point sampling technique, yielding sparse point set features (N, 32). The number N of selected center points is a variable.
i. The sparse point set features (N, 32) are fed to a grouping layer; specifically, with the N points as centers, the sphere query method generates N local regions, each containing 64 points within a sphere of radius 0.4, yielding grouped features denoted (N, 64, 32). The number of points per region and the sphere radius are manually specified, here as 64 and 0.4; N is a variable. This step can also be implemented with the k-nearest-neighbor method.
j. The grouped features (N, 64, 32) are fed to the mini-PointNet layer; specifically, they pass through three two-dimensional convolution layers and one max pooling layer in sequence, yielding hierarchical point cloud feature information (N, M). In this embodiment, the output channels of the three two-dimensional convolution layers are 32, M; the convolution kernel size is 1 × 1, the stride is 1 × 1, and there is no padding.
For the hierarchical feature extraction module, a sampling layer, a grouping layer, and a mini-PointNet layer together form one set abstraction layer: steps b–d form set abstraction layer 1, steps e–g form set abstraction layer 2, and steps h–j form set abstraction layer 3. The number of set abstraction layers is manually specified, here as 3, and the output of the last abstraction layer is the point cloud feature (N, M). All training samples are fed into the deep neural network; the loss function is computed through the forward pass of steps b–j and back-propagation updates the network weights, training one neural network model. Setting N and M to different combinations trains multiple candidate models.
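Pulling steps b–j together, the following condensed PyTorch sketch stacks three set abstraction layers with the shapes described above. It is unbatched for brevity; the ReLU activations, relative-coordinate grouping, the sphere-query padding behaviour, the middle channel width of the third mini-PointNet (here 32), and the example values N = 16, M = 8 are all assumptions for illustration.

```python
import torch
import torch.nn as nn

def fps(x, m):
    """Farthest point sampling on an (N, D) set; returns m row indices."""
    idx = [torch.randint(len(x), (1,)).item()]
    dist = torch.full((len(x),), float("inf"))
    for _ in range(m - 1):
        dist = torch.minimum(dist, ((x - x[idx[-1]]) ** 2).sum(-1))
        idx.append(int(dist.argmax()))
    return torch.tensor(idx)

def ball_group(x, centers, radius, k):
    """k points inside a sphere around each centre (nearest points fill any shortfall)."""
    d = torch.cdist(centers, x)                          # (m, N) pairwise distances
    d = torch.where(d <= radius, d, d + 1e6)             # push far points to the back
    idx = d.topk(k, largest=False).indices               # (m, k)
    return x[idx] - centers.unsqueeze(1)                 # relative vectors, (m, k, D)

class SetAbstraction(nn.Module):
    """Sampling layer + grouping layer + mini-PointNet layer."""
    def __init__(self, m, radius, k, in_dim, channels):
        super().__init__()
        self.m, self.radius, self.k = m, radius, k
        mlp, prev = [], in_dim
        for c in channels:                               # three 1x1 conv layers
            mlp += [nn.Conv2d(prev, c, kernel_size=1, stride=1), nn.ReLU()]
            prev = c
        self.mlp = nn.Sequential(*mlp)

    def forward(self, x):                                # x: (N, D)
        centers = x[fps(x, self.m)]                      # (m, D) region centres
        groups = ball_group(x, centers, self.radius, self.k)   # (m, k, D)
        g = groups.permute(2, 0, 1).unsqueeze(0)         # (1, D, m, k)
        feats = self.mlp(g).max(dim=-1).values           # max pool over the k points
        return feats.squeeze(0).t()                      # (m, out_dim)

# Three stacked set abstraction layers matching the shapes of steps b-j.
N, M = 16, 8                                             # illustrative variable sizes
sa1 = SetAbstraction(128, 0.2, 32, 3,  (64, 64, 64))     # (256, 3)  -> (128, 64)
sa2 = SetAbstraction(64,  0.3, 64, 64, (64, 64, 32))     # (128, 64) -> (64, 32)
sa3 = SetAbstraction(N,   0.4, 64, 32, (32, 32, M))      # (64, 32)  -> (N, M)
patch = torch.rand(256, 3)
key_features = sa3(sa2(sa1(patch)))                      # (N, M) features to transmit
```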
(2) Inference phase
a. Several points on the surface of the target object in the video frame to be transmitted are randomly selected as seed coordinates. With each seed coordinate as the center, the farthest point sampling technique finds 256 surrounding points to form a patch, and the coordinates of the point set in the patch are normalized to a unit sphere. The number of seeds is manually specified; in the present invention it is the number of points in the target point cloud divided by 256.
b. The optimal inference model is selected according to the network fluctuation conditions.
c. The patches are fed, one patch at a time, into the optimal inference model, and forward inference produces the final point cloud features to be transmitted.
In actual application, when point cloud information is reconstructed from the point cloud features by the point cloud restoration and reconstruction module, a specific implementation can be as follows:
(1) Training phase
a. The point cloud features (N, M) pass through a two-dimensional convolution layer to obtain point cloud features of unified dimension (N, 128). The effect of this layer is that key point cloud features compressed by different inference models all come out with the same feature dimensionality. The number of output channels of this two-dimensional convolution layer is 128, the convolution kernel size is 1 × 1, the stride is 1 × 1, and there is no padding.
b. The unified-dimension point cloud features, denoted F with shape (N, 128), undergo an up-sampling operation that increases the number of point features, yielding up-sampled point cloud features F_up, denoted (256, 128). The up-sampling operation proceeds as follows:
the features (N, 128) are copied r times to obtain (rN, 128), where r is the up-sampling ratio and equals 256/N;
a 2D grid mechanism generates a unique two-dimensional vector for each copy, which is appended to the feature vector of each corresponding point in that copy, giving features of shape (rN, 128 + 2);
the up-sampled point cloud features F_up are then generated by a self-attention unit and two two-dimensional convolution layers. The output channels of the two convolution layers are 256 and 128, the convolution kernel size is 1 × 1, the stride is 1 × 1, and there is no padding.
c. The up-sampled point cloud features F_up (256, 128) undergo a down-sampling operation to obtain point cloud features F_down at the same scale as F, denoted (N, 128). The down-sampling operation proceeds as follows:
F_up is reshaped into an (N, r × 128) feature simply by rearranging its rows;
the point cloud features F_down are then generated by two two-dimensional convolution layers. The output channels of the two convolution layers are 256 and 128, the convolution kernel size is 1 × 1, the stride is 1 × 1, and there is no padding.
d. Subtracting F_down from F gives the residual point cloud features Δ, of dimension (N, 128).
e. Applying the same up-sampling operation to Δ (N, 128) gives the up-sampled residual features Δ_up, of dimension (256, 128).
f. Adding Δ_up to F_up gives the refined point cloud features F'_up, of dimension (256, 128).
g. The point cloud features F'_up (256, 128) undergo coordinate reconstruction through two two-dimensional convolution layers, producing the reconstructed point cloud of dimension (256, 3). The output channels of the two convolution layers are 64 and 3, the convolution kernel size is 1 × 1, the stride is 1 × 1, and there is no padding.
For the point cloud restoration and reconstruction module, steps c–f constitute the up-down-up expansion unit and step g is the final point set generation part. All point cloud feature samples obtained through transmission are fed into the deep neural network; the loss function is computed through the forward pass of steps a–g and back-propagation updates the network weights, training one neural network model.
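A condensed PyTorch sketch of the restoration path described in steps a–g, for illustration only: the self-attention unit of the up-sampling step is omitted, the 2D grid code is generated randomly, and N = 16, M = 8 are example values. None of these choices, nor the class and variable names, are prescribed by the patent.

```python
import torch
import torch.nn as nn

class UpUnit(nn.Module):
    """Duplicate features r times, attach a 2D grid code, then two 1x1 convs.
    (The self-attention unit used in the patent's up-sampling step is omitted.)"""
    def __init__(self, r, in_dim=128):
        super().__init__()
        self.r = r
        self.convs = nn.Sequential(nn.Conv2d(in_dim + 2, 256, 1), nn.ReLU(),
                                   nn.Conv2d(256, 128, 1), nn.ReLU())

    def forward(self, f):                                  # f: (N, 128)
        n = f.size(0)
        rep = f.repeat(self.r, 1)                          # (rN, 128)
        grid = torch.rand(self.r, 2).repeat_interleave(n, dim=0)   # 2D code per copy
        x = torch.cat([rep, grid], dim=1)                  # (rN, 130)
        x = x.t().unsqueeze(0).unsqueeze(-1)               # (1, 130, rN, 1)
        return self.convs(x).squeeze(-1).squeeze(0).t()    # (rN, 128)

class DownUnit(nn.Module):
    """Fold the r copies back into one row per original point, then two 1x1 convs."""
    def __init__(self, r):
        super().__init__()
        self.r = r
        self.convs = nn.Sequential(nn.Conv2d(r * 128, 256, 1), nn.ReLU(),
                                   nn.Conv2d(256, 128, 1), nn.ReLU())

    def forward(self, f_up):                               # (rN, 128)
        n = f_up.size(0) // self.r
        x = f_up.reshape(self.r, n, 128).permute(1, 0, 2).reshape(n, -1)  # (N, r*128)
        x = x.t().unsqueeze(0).unsqueeze(-1)               # (1, r*128, N, 1)
        return self.convs(x).squeeze(-1).squeeze(0).t()    # (N, 128)

class PointCloudReconstructor(nn.Module):
    """Dimension unification + up-down-up expansion + final point set generation."""
    def __init__(self, feat_dim, n_points, out_points=256):
        super().__init__()
        r = out_points // n_points
        self.unify = nn.Conv2d(feat_dim, 128, 1)           # (N, M) -> (N, 128)
        self.up1, self.down, self.up2 = UpUnit(r), DownUnit(r), UpUnit(r)
        self.coords = nn.Sequential(nn.Conv2d(128, 64, 1), nn.ReLU(),
                                    nn.Conv2d(64, 3, 1))   # final point set generation

    def forward(self, key_feats):                          # (N, M) transmitted features
        x = key_feats.t().unsqueeze(0).unsqueeze(-1)       # (1, M, N, 1)
        f = self.unify(x).squeeze(-1).squeeze(0).t()       # F: (N, 128)
        f_up = self.up1(f)                                 # F_up: (256, 128)
        delta = f - self.down(f_up)                        # residual at input scale
        f_up2 = f_up + self.up2(delta)                     # refined features F'_up
        y = f_up2.t().unsqueeze(0).unsqueeze(-1)           # (1, 128, 256, 1)
        return self.coords(y).squeeze(-1).squeeze(0).t()   # (256, 3) reconstructed patch

N, M = 16, 8                                               # illustrative (N, M)
patch_xyz = PointCloudReconstructor(M, N)(torch.rand(N, M))
print(patch_xyz.shape)                                     # torch.Size([256, 3])
```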
(2) Inference phase
a. The optimal inference model is selected according to the network fluctuation conditions.
b. All point cloud feature samples obtained through transmission are fed into the deep neural network, and forward inference through steps a–g produces the restored and reconstructed patch point cloud information.
c. All patch reconstructions are merged, and points equal in number to the original input point cloud are selected with the farthest point sampling technique to reconstruct the final point cloud.
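A minimal sketch of this final assembly step; it assumes the patch coordinates have already been mapped back from the unit sphere to the original frame.

```python
import numpy as np

def assemble_point_cloud(patch_clouds, target_size):
    """Merge all reconstructed patches and FPS-downsample to the original point count."""
    merged = np.concatenate(patch_clouds, axis=0)          # all (256, 3) patches stacked
    selected = np.zeros(target_size, dtype=np.int64)
    min_dist = np.full(len(merged), np.inf)
    selected[0] = 0                                        # arbitrary starting point
    for i in range(1, target_size):
        diff = merged - merged[selected[i - 1]]
        min_dist = np.minimum(min_dist, (diff ** 2).sum(axis=1))
        selected[i] = int(np.argmax(min_dist))             # farthest-point selection
    return merged[selected]                                # (target_size, 3) final cloud

final = assemble_point_cloud([np.random.rand(256, 3) for _ in range(4)], target_size=1000)
```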
By means of the above technical scheme, feature extraction is performed on the original point cloud video stream to be transmitted, only a subset of key point cloud features is transmitted, and restoration and reconstruction are finally carried out at the receiving end, achieving the effect that, visually, the original point cloud video is what gets transmitted.
In operation, the invention is entirely AI-driven: a deep neural network automatically performs feature extraction and target reconstruction, taking the raw point cloud coordinates directly as input. No additional operations are required, such as pre-processing the data, manually analyzing the geometric distribution and attribute importance of the point cloud, or selecting a codec; the deep neural network only needs to be trained and then deployed in practical applications, and it is split into the two processes of feature extraction and reconstruction.
The system is an end-to-end trained neural network covering both feature extraction and reconstruction: training proceeds as soon as data are fed in, the system is more intelligent, no attention needs to be paid to its internal operations, and a suitable inference model can be selected according to network conditions to give users a better experience.
Meanwhile, the invention reaches a compression ratio as high as 30.72× with acceptable precision loss, realizes real-time transmission in existing network environments below 5G, and greatly reduces the amount of transmitted data.
In conclusion, the invention can significantly reduce the transmission volume and energy consumption of point cloud video streams. The invention also takes the dynamic, unstable nature of the network environment into account, incorporates it into end-to-end network design and training, and provides an adaptive transmission control algorithm to balance transmission delay against reconstruction accuracy.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and should not be taken as limiting the scope of the present invention, which is intended to cover any modifications, equivalents, improvements, etc. within the spirit and scope of the present invention.

Claims (8)

1. An AI-driven real-time point cloud video transmission method is characterized by comprising the following steps:
acquiring video data information with an AI (artificial intelligence) generation device, and processing the video data information to obtain point cloud video data;
extracting hierarchical features from the point cloud video data, determining key point cloud features in each point cloud video frame, and transmitting the key point cloud features;
receiving the key point cloud features, expanding and reconstructing the received key point cloud features to obtain point cloud information close to the original input point cloud, and forming a point cloud video that is visually equivalent to the original;
training a pre-built deep neural network online, using a training set composed of a large number of 3D models at various scales, to obtain multiple candidate neural network models;
splitting each trained neural network model into a hierarchical feature extraction module and a point cloud restoration and reconstruction module based on a generative adversarial network (GAN), which are deployed respectively on a high-performance edge server close to the input end and on the user terminal device;
and, while deploying the hierarchical feature extraction module and the GAN-based point cloud restoration and reconstruction module to the high-performance edge server and the user terminal device, deploying an adaptive matcher, so that the adaptive matcher adaptively matches, according to network bandwidth changes monitored in real time, a neural network model that satisfies the current network's requirements for real-time point cloud video frame feature extraction and reconstruction.
2. The AI-driven real-time point cloud video transmission method according to claim 1, wherein the hierarchical feature extraction module comprises three set abstraction layers connected in series, which perform hierarchical feature learning to capture the local structure of the original point cloud; each set abstraction layer consists of three basic layers: a sampling layer, a grouping layer, and a mini-PointNet layer; wherein
the sampling layer selects a subset from the output of the previous layer using the farthest point sampling technique, each point in the subset representing the center of a local region;
the grouping layer finds the n nearest neighbor points around each local region center and combines them into a local region set;
the mini-PointNet layer converts each local region set into a feature vector using three two-dimensional convolution layers and one max pooling layer;
and the feature vectors output by the mini-PointNet layer of the last set abstraction layer are the data to be transmitted.
3. The AI-driven real-time point cloud video transmission method according to claim 2, wherein the point cloud restoration and reconstruction module comprises a point cloud feature expansion part and a final point set generation part, wherein
the point cloud feature expansion part unifies the feature dimensions with a multilayer perceptron when the transmitted key point cloud features are received, and generates a larger and more diverse set of point features, in both point count and feature dimension, through an up-down-up expansion unit;
and the final point set generation part comprises two multilayer perceptron layers, through which the expanded point cloud features are reconstructed into three-dimensional coordinates.
4. The AI-driven real-time point cloud video transmission method of claim 3,
the method comprises the following steps of acquiring video data information by using AI (Artificial intelligence) generating equipment, and processing the video data information to obtain point cloud video data:
scanning a 3D model to be transmitted by using a plurality of depth cameras with different angles, and acquiring a point cloud flow of each depth camera; splicing a plurality of point clouds collected by the multi-view camera into a complete point cloud to obtain point cloud video data;
extracting hierarchical features of the point cloud video data, and determining key point cloud features in the point cloud video frame comprises the following steps:
performing hierarchical feature extraction on point cloud video data through a hierarchical feature extraction module deployed to a high-performance edge server close to an input end to determine key point cloud features in a point cloud video frame;
expanding and reconstructing the received key point cloud characteristics to obtain point cloud information similar to the original input point cloud, and forming an original point cloud video with a visual effect comprises the following steps:
and expanding and reconstructing key point cloud characteristics through a point cloud recovery reconstruction module which is deployed on user terminal equipment close to the input end and is based on the generation countermeasure network, so that point cloud information similar to the original input point cloud is obtained, and an original point cloud video on the visual effect is formed.
5. An AI-driven real-time point cloud video transmission system, comprising:
a point cloud video data acquisition module, configured to acquire video data information with an AI (artificial intelligence) generation device and process the video data information to obtain point cloud video data;
a key point cloud feature extraction module, configured to extract hierarchical features from the point cloud video data, determine the key point cloud features in each point cloud video frame, and transmit the key point cloud features;
a point cloud restoration and reconstruction module, configured to receive the key point cloud features, expand and reconstruct the received key point cloud features to obtain point cloud information close to the original input point cloud, and form a point cloud video that is visually equivalent to the original;
a neural network training module, configured to train a pre-built deep neural network online, using a training set composed of a large number of 3D models at various scales, to obtain multiple candidate neural network models;
a neural network deployment module, configured to split each trained neural network model into a hierarchical feature extraction module and a point cloud restoration and reconstruction module based on a generative adversarial network (GAN), which are deployed respectively on a high-performance edge server close to the input end and on the user terminal device;
and a neural network matching module, configured to deploy an adaptive matcher while the hierarchical feature extraction module and the GAN-based point cloud restoration and reconstruction module are deployed to the high-performance edge server and the user terminal device, so that the adaptive matcher adaptively matches, according to network bandwidth changes monitored in real time, a neural network model that satisfies the current network's requirements for real-time point cloud video frame feature extraction and reconstruction.
6. The AI-driven real-time point cloud video transmission system of claim 5, wherein the hierarchical feature extraction module comprises three set abstraction layers connected in series for performing hierarchical feature learning to capture the local structure of the original point cloud; each set abstraction layer consists of three basic layers, namely a sampling layer, a grouping layer, and a mini PointNet layer; wherein,
the sampling layer is used for selecting a subset of points from the output of the previous layer using farthest point sampling, each point in the subset representing the center of a local region;
the grouping layer is used for finding the n nearest neighboring points around each local region center and combining them into a local region set;
the mini PointNet layer converts each local region set into a feature vector using 3 two-dimensional convolutional layers and 1 max pooling layer;
and the feature vector output by the mini PointNet layer of the last set abstraction layer is the data to be transmitted (an illustrative sketch of one set abstraction layer follows this claim).
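Below is a hedged PyTorch sketch of one set abstraction layer as described in claim 6 (farthest point sampling, n-nearest-neighbour grouping, and a mini PointNet of three two-dimensional convolutions followed by max pooling). The layer widths, sample counts, and class names are assumptions for illustration, not values from the patent.

```python
# Hedged sketch of a single set abstraction layer (PointNet++-style), assuming
# a single unbatched point cloud of shape (N, 3). Widths are illustrative.
import torch
import torch.nn as nn

def farthest_point_sampling(xyz, m):
    """xyz: (N, 3). Greedily pick m center indices that are maximally spread out."""
    N = xyz.shape[0]
    idx = torch.zeros(m, dtype=torch.long)          # idx[0] = 0: start from first point
    dist = torch.full((N,), float("inf"))
    for i in range(1, m):
        dist = torch.minimum(dist, ((xyz - xyz[idx[i - 1]]) ** 2).sum(-1))
        idx[i] = torch.argmax(dist)                 # farthest from all chosen centers
    return idx

class SetAbstraction(nn.Module):
    def __init__(self, m=512, n=32, channels=(64, 64, 128)):
        super().__init__()
        layers, c_in = [], 3
        for c_out in channels:                      # mini PointNet: 3 conv layers
            layers += [nn.Conv2d(c_in, c_out, 1), nn.ReLU()]
            c_in = c_out
        self.mlp = nn.Sequential(*layers)
        self.m, self.n = m, n

    def forward(self, xyz):                         # xyz: (N, 3)
        centers = xyz[farthest_point_sampling(xyz, self.m)]              # (m, 3)
        d = torch.cdist(centers, xyz)                                    # (m, N)
        groups = xyz[d.topk(self.n, largest=False).indices]              # (m, n, 3)
        groups = (groups - centers[:, None, :]).permute(2, 0, 1)[None]   # (1, 3, m, n)
        feats = self.mlp(groups).max(dim=-1).values                      # max pool over n
        return centers, feats.squeeze(0).transpose(0, 1)                 # (m, 3), (m, C)
```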
7. The AI-driven real-time point cloud video transmission system of claim 6, wherein the point cloud restoration and reconstruction module comprises a point cloud feature expansion part and a final point set generation part, wherein,
the point cloud feature expansion part is used for unifying the dimensions of the received key point cloud features through a multilayer perceptron, and for generating a more diverse number of points and feature dimensions through an up-down-up expansion unit;
and the final point set generation part comprises two multilayer perceptron layers and is used for reconstructing the expanded point cloud features into three-dimensional coordinate form through the two multilayer perceptron layers (an illustrative sketch follows this claim).
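Below is a hedged PyTorch sketch of the generator-side geometry of the restoration and reconstruction module in claim 7: a perceptron unifies the feature dimension, an up-down-up unit expands the point/feature set, and two perceptron layers map the result back to xyz coordinates. The adversarial (discriminator) half of the network is omitted, and the expansion ratio and layer widths are assumptions, not values from the patent.

```python
# Hedged sketch of feature expansion + final point set generation; shapes and
# widths are illustrative assumptions.
import torch
import torch.nn as nn

class UpDownUpExpansion(nn.Module):
    """Expand the point set r-fold, contract it back, then expand again,
    keeping the refined expanded output."""
    def __init__(self, dim=128, ratio=4):
        super().__init__()
        self.ratio = ratio
        self.up1 = nn.Linear(dim, dim * ratio)
        self.down = nn.Linear(dim, dim)
        self.up2 = nn.Linear(dim, dim * ratio)

    def forward(self, feats):                        # feats: (m, dim)
        m, dim = feats.shape
        up = torch.relu(self.up1(feats)).view(m * self.ratio, dim)
        down = torch.relu(self.down(up)).view(m, self.ratio, dim).mean(dim=1)
        return torch.relu(self.up2(down)).view(m * self.ratio, dim)

class PointCloudReconstructor(nn.Module):
    def __init__(self, in_dim=128, dim=128, ratio=4):
        super().__init__()
        self.unify = nn.Linear(in_dim, dim)          # unify feature dimensions
        self.expand = UpDownUpExpansion(dim, ratio)
        self.coords = nn.Sequential(                 # two perceptron layers -> xyz
            nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, key_feats):                    # key_feats: (m, in_dim)
        feats = torch.relu(self.unify(key_feats))
        return self.coords(self.expand(feats))       # (m * ratio, 3) reconstructed points
```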
8. The AI-driven real-time point cloud video transmission system of claim 7, wherein
the point cloud video data acquisition module, when acquiring video data information with an AI (artificial intelligence) generation device and processing the video data information to obtain point cloud video data, scans a 3D model to be transmitted with a plurality of depth cameras at different angles, acquires the point cloud stream of each depth camera, and stitches the plurality of point clouds collected by the multi-view cameras into one complete point cloud to obtain the point cloud video data;
the key point cloud feature extraction module, when extracting the hierarchical features of the point cloud video data and determining the key point cloud features in the point cloud video frame, performs hierarchical feature extraction on the point cloud video data through the hierarchical feature extraction module deployed on the high-performance edge server close to the input end;
and the point cloud restoration and reconstruction module, when expanding and reconstructing the key point cloud features to obtain point cloud information similar to the original input point cloud and form a point cloud video that is visually equivalent to the original, expands and reconstructs the key point cloud features through the generative-adversarial-network-based point cloud restoration and reconstruction module deployed on the user terminal equipment at the receiving end, so as to obtain point cloud information similar to the original input point cloud and form the visually equivalent point cloud video.
CN202110985757.0A 2021-08-26 2021-08-26 AI-driven real-time point cloud video transmission method and system Active CN113810736B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110985757.0A CN113810736B (en) 2021-08-26 2021-08-26 AI-driven real-time point cloud video transmission method and system

Publications (2)

Publication Number Publication Date
CN113810736A CN113810736A (en) 2021-12-17
CN113810736B true CN113810736B (en) 2022-11-01

Family

ID=78894093

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110985757.0A Active CN113810736B (en) 2021-08-26 2021-08-26 AI-driven real-time point cloud video transmission method and system

Country Status (1)

Country Link
CN (1) CN113810736B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110691243A (en) * 2019-10-10 2020-01-14 叠境数字科技(上海)有限公司 Point cloud geometric compression method based on deep convolutional network
CN113256640A (en) * 2021-05-31 2021-08-13 北京理工大学 Method and device for partitioning network point cloud and generating virtual environment based on PointNet

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
LU100465B1 (en) * 2017-10-05 2019-04-09 Applications Mobiles Overview Inc System and method for object recognition
CN110012279B (en) * 2018-01-05 2020-11-17 上海交通大学 3D point cloud data-based view-division compression and transmission method and system
CN108320330A (en) * 2018-01-23 2018-07-24 河北中科恒运软件科技股份有限公司 Real-time three-dimensional model reconstruction method and system based on deep video stream
US11113870B2 (en) * 2019-03-18 2021-09-07 Samsung Electronics Co., Ltd. Method and apparatus for accessing and transferring point cloud content in 360-degree video environment
KR20230152815A (en) * 2019-03-21 2023-11-03 엘지전자 주식회사 Method of encoding point cloud data, apparatus of encoding point cloud data, method of decoding point cloud data, and apparatus of decoding point cloud data
CN111901601B (en) * 2019-05-06 2023-03-31 上海交通大学 Code rate allocation method for unequal error protection in dynamic point cloud data transmission
US11210815B2 (en) * 2019-08-09 2021-12-28 Intel Corporation Point cloud playback mechanism
US11729243B2 (en) * 2019-09-20 2023-08-15 Intel Corporation Dash-based streaming of point cloud content based on recommended viewports
CN111783838A (en) * 2020-06-05 2020-10-16 东南大学 Point cloud characteristic space representation method for laser SLAM
CN112672168B (en) * 2020-12-14 2022-10-18 深圳大学 Point cloud compression method and device based on graph convolution
CN113141526B (en) * 2021-04-27 2022-06-07 合肥工业大学 Point cloud video self-adaptive transmission method for joint resource allocation under QoE (quality of experience) drive

Also Published As

Publication number Publication date
CN113810736A (en) 2021-12-17

Similar Documents

Publication Publication Date Title
CN109377530A (en) A kind of binocular depth estimation method based on deep neural network
CN110599395B (en) Target image generation method, device, server and storage medium
CN111512342A (en) Method and device for processing repeated points in point cloud compression
CN113315972B (en) Video semantic communication method and system based on hierarchical knowledge expression
CN109949222B (en) Image super-resolution reconstruction method based on semantic graph
CN110689599A (en) 3D visual saliency prediction method for generating countermeasure network based on non-local enhancement
CN116051740A (en) Outdoor unbounded scene three-dimensional reconstruction method and system based on nerve radiation field
Chen et al. Toward knowledge as a service over networks: A deep learning model communication paradigm
CN113792641A (en) High-resolution lightweight human body posture estimation method combined with multispectral attention mechanism
Xia et al. WiserVR: Semantic communication enabled wireless virtual reality delivery
Huang et al. Toward holographic video communications: A promising AI-driven solution
WO2022205755A1 (en) Texture generation method and apparatus, device, and storage medium
CN102075283B (en) Information steganography method and device
CN115131196A (en) Image processing method, system, storage medium and terminal equipment
Zhang et al. Semantic sensing and communications for ultimate extended reality
CN113810736B (en) AI-driven real-time point cloud video transmission method and system
CN110782503B (en) Face image synthesis method and device based on two-branch depth correlation network
CN104917532A (en) Face model compression method
Li et al. Towards communication-efficient digital twin via ai-powered transmission and reconstruction
EP4164221A1 (en) Processing image data
EP4224860A1 (en) Processing a time-varying signal using an artificial neural network for latency compensation
CN112541972A (en) Viewpoint image processing method and related equipment
CN114694065A (en) Video processing method, device, computer equipment and storage medium
CN116246010A (en) Human body three-dimensional reconstruction method based on image
CN115131228A (en) Image restoration method, system, storage medium and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant