CN113810736B - AI-driven real-time point cloud video transmission method and system - Google Patents

AI-driven real-time point cloud video transmission method and system

Info

Publication number
CN113810736B
CN113810736B (application CN202110985757.0A)
Authority
CN
China
Prior art keywords
point cloud
video data
features
cloud video
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110985757.0A
Other languages
Chinese (zh)
Other versions
CN113810736A (en)
Inventor
乔秀全
黄亚坤
朱原玮
陈俊亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202110985757.0A priority Critical patent/CN113810736B/en
Publication of CN113810736A publication Critical patent/CN113810736A/en
Application granted granted Critical
Publication of CN113810736B publication Critical patent/CN113810736B/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23418Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding

Abstract

The invention discloses an AI-driven real-time point cloud video transmission method and system. The method comprises the following steps: acquiring video data information with an AI (artificial intelligence) generation device and processing the video data information to obtain point cloud video data; extracting hierarchical features from the point cloud video data, determining key point cloud features in each point cloud video frame, and transmitting the key point cloud features; and receiving the key point cloud features, then expanding and reconstructing them to obtain point cloud information close to the original input point cloud, forming a point cloud video that is visually equivalent to the original. The invention extracts features from the original point cloud video stream to be transmitted, transmits only a small set of key point cloud features, and finally restores and reconstructs them at the receiving end, so that, visually, the original point cloud video is what gets delivered. This significantly reduces the transmission volume and energy consumption of the point cloud video stream, avoids the complex multi-stage processing of traditional transmission schemes, and greatly reduces the amount of data transmitted.

Description

AI-driven real-time point cloud video transmission method and system
Technical Field
The invention relates to the field of point cloud video streaming transmission, in particular to an AI-driven real-time point cloud video transmission method and system.
Background
A point cloud is a typical and popular data format for describing volumetric media and holographic video, and can be captured by RGB-D cameras with depth sensors. It allows the user to experience a scene with six degrees of freedom and to change position and orientation within it, unlike conventional Virtual Reality (VR) video, which offers only three degrees of freedom. Volumetric media provide 3D scenes from multiple angles and are widely used in fields including education, healthcare, and entertainment.
At present, transmitting dense point cloud streams is very challenging even in existing network environments, including 5G, for the following reasons: (1) transmitting large volumes of point cloud video requires extremely high bandwidth; even with traditional compression, the required bitrate can remain at the gigabit-per-second (Gbps) level, which exceeds the capability of current 5G networks; (2) the computational overhead of 3D volumetric media is large because only software codecs can be used, and inefficient codecs further slow down transmission; (3) conventional techniques such as the rate adaptation and buffer control of adaptive bitrate streaming (ABR) are not suitable for 3D volumetric media, so advanced techniques for delivering volumetric media need to be explored.
Most existing point cloud video transmission techniques rely on traditional compression (both lossy and lossless), which reduces the amount of data to be transmitted but still has many shortcomings. On the one hand, lossless compression of point cloud video is not yet sufficient to achieve efficient transmission and a good user experience. On the other hand, under constrained network conditions, lossy compression makes it difficult to guarantee that the recovered point cloud faithfully matches the original video. Other point cloud video transmission techniques, such as those that extend current VR video streaming, transmit at the data-block level; these methods incur high energy consumption on mobile devices, their processing delay on receiving devices is often unacceptable, and each transmitted block is susceptible to network fluctuations and various packet losses during reassembly.
In summary, the transmission capability of the conventional technology is far from meeting the bandwidth requirement of the real-time point cloud video stream. Therefore, there is a need to explore an advanced transmission scheme to ensure good service is provided under the existing network.
Disclosure of Invention
Aiming at the problems in the related art, the invention provides an AI-driven real-time point cloud video transmission method and system, which can significantly reduce the transmission volume and energy consumption of point cloud video streams. The system avoids the complex multi-stage processing of traditional transmission schemes, and designs and trains an end-to-end deep learning network covering everything from raw data acquisition to final rendering and playback.
The technical scheme of the invention is realized as follows:
according to one aspect of the invention, an AI-driven real-time point cloud video transmission method is provided.
The AI-driven real-time point cloud video transmission method comprises the following steps:
acquiring video data information with an AI (artificial intelligence) generation device, and processing the video data information to obtain point cloud video data;
extracting hierarchical features from the point cloud video data, determining key point cloud features in each point cloud video frame, and transmitting the key point cloud features;
and receiving the key point cloud features, expanding and reconstructing the received key point cloud features to obtain point cloud information close to the original input point cloud, and forming a point cloud video that is visually equivalent to the original.
Acquiring video data information with the AI generation device and processing it to obtain point cloud video data comprises: scanning the 3D model to be transmitted with a plurality of depth cameras at different angles and acquiring the point cloud stream of each depth camera; and stitching the multiple point clouds collected by the multi-view cameras into one complete point cloud to obtain the point cloud video data.
in addition, the AI-driven real-time point cloud video transmission method further comprises:
training a pre-built deep neural network online, using a training set composed of a large number of 3D models at various scales, to obtain multiple candidate neural network models;
splitting each trained neural network model into a hierarchical feature extraction module and a point cloud restoration and reconstruction module based on a generative adversarial network (GAN), which are deployed respectively on a high-performance edge server close to the input end and on the user terminal device;
and, while deploying the hierarchical feature extraction module and the GAN-based point cloud restoration and reconstruction module to the high-performance edge server and the user terminal device, deploying an adaptive matcher, so that the adaptive matcher adaptively matches, according to network bandwidth changes monitored in real time, a neural network model that satisfies the current network's requirements for real-time point cloud video frame feature extraction and reconstruction.
Furthermore, the hierarchical feature extraction module comprises three set abstraction layers connected in series, which perform hierarchical feature learning to capture the local structure of the original point cloud; each set abstraction layer consists of three basic layers: a sampling layer, a grouping layer, and a mini-PointNet layer.
The sampling layer selects a subset from the output of the previous layer using the farthest point sampling technique, each point in the subset representing the center of a local region; the grouping layer finds the n nearest neighbor points around each local region center and combines them into a local region set; the mini-PointNet layer converts each local region set into a feature vector using three two-dimensional convolution layers and one max pooling layer; and the feature vectors output by the mini-PointNet layer of the last set abstraction layer are the data to be transmitted.
Further, the point cloud restoration and reconstruction module comprises a point cloud feature expansion part and a final point set generation part. The point cloud feature expansion part unifies the feature dimensions with a multilayer perceptron when the transmitted key point cloud features are received, and then generates a larger and more diverse set of point features, in both point count and feature dimension, through an up-down-up expansion unit. The final point set generation part comprises two multilayer perceptron layers, through which the expanded point cloud features are reconstructed into three-dimensional coordinates.
In addition, extracting hierarchical features from the point cloud video data and determining the key point cloud features in each point cloud video frame comprises: performing hierarchical feature extraction on the point cloud video data with the hierarchical feature extraction module deployed on the high-performance edge server close to the input end, and determining the key point cloud features in the point cloud video frame.
In addition, expanding and reconstructing the received key point cloud features to obtain point cloud information close to the original input point cloud and form a point cloud video that is visually equivalent to the original comprises: expanding and reconstructing the key point cloud features with the GAN-based point cloud restoration and reconstruction module deployed on the user terminal device, thereby obtaining point cloud information close to the original input point cloud and forming a point cloud video that is visually equivalent to the original.
According to another aspect of the invention, an AI-driven real-time point cloud video transmission system is provided.
The AI-driven real-time point cloud video transmission system comprises:
a point cloud video data acquisition module, which acquires video data information with an AI (artificial intelligence) generation device and processes the video data information to obtain point cloud video data;
a key point cloud feature extraction module, which extracts hierarchical features from the point cloud video data, determines the key point cloud features in each point cloud video frame, and transmits the key point cloud features;
and a point cloud restoration and reconstruction module, which receives the key point cloud features, expands and reconstructs them to obtain point cloud information close to the original input point cloud, and forms a point cloud video that is visually equivalent to the original.
When the point cloud video data acquisition module acquires video data information with the AI generation device and processes it into point cloud video data, it scans the 3D model to be transmitted with a plurality of depth cameras at different angles, acquires the point cloud stream of each depth camera, and stitches the multiple point clouds collected by the multi-view cameras into one complete point cloud to obtain the point cloud video data.
In addition, the AI-driven real-time point cloud video transmission system further comprises:
a neural network training module, which trains a pre-built deep neural network online, using a training set composed of a large number of 3D models at various scales, to obtain multiple candidate neural network models;
a neural network deployment module, which splits each trained neural network model into a hierarchical feature extraction module and a GAN-based point cloud restoration and reconstruction module, deployed respectively on a high-performance edge server close to the input end and on the user terminal device;
and a neural network matching module, which deploys an adaptive matcher while the hierarchical feature extraction module and the GAN-based point cloud restoration and reconstruction module are deployed to the high-performance edge server and the user terminal device, so that the adaptive matcher adaptively matches, according to network bandwidth changes monitored in real time, a neural network model that satisfies the current network's requirements for real-time point cloud video frame feature extraction and reconstruction.
Furthermore, the hierarchical feature extraction module comprises three set abstraction layers connected in series, which perform hierarchical feature learning to capture the local structure of the original point cloud; each set abstraction layer consists of three basic layers: a sampling layer, a grouping layer, and a mini-PointNet layer.
The sampling layer selects a subset from the output of the previous layer using the farthest point sampling technique, each point in the subset representing the center of a local region; the grouping layer finds the n nearest neighbor points around each local region center and combines them into a local region set; the mini-PointNet layer converts each local region set into a feature vector using three two-dimensional convolution layers and one max pooling layer; and the feature vectors output by the mini-PointNet layer of the last set abstraction layer are the data to be transmitted.
Further, the point cloud restoration and reconstruction module comprises a point cloud feature expansion part and a final point set generation part. The point cloud feature expansion part unifies the feature dimensions with a multilayer perceptron when the transmitted key point cloud features are received, and then generates a larger and more diverse set of point features, in both point count and feature dimension, through an up-down-up expansion unit. The final point set generation part comprises two multilayer perceptron layers, through which the expanded point cloud features are reconstructed into three-dimensional coordinates.
In addition, when the key point cloud feature extraction module performs hierarchical feature extraction on the point cloud video data and determines the key point cloud features in each point cloud video frame, it does so through the hierarchical feature extraction module deployed on the high-performance edge server close to the input end.
In addition, when the point cloud restoration and reconstruction module expands and reconstructs the received key point cloud features to obtain point cloud information close to the original input point cloud and form a point cloud video that is visually equivalent to the original, it does so through the GAN-based point cloud restoration and reconstruction module deployed on the user terminal device.
Beneficial effects:
The invention extracts features from the original point cloud video stream to be transmitted, transmits only a small set of key point cloud features, and finally restores and reconstructs them at the receiving end, so that, visually, the original point cloud video is what gets delivered; this significantly reduces the transmission volume and energy consumption of the point cloud video stream. The complex multi-stage processing of traditional transmission schemes is avoided and the amount of transmitted data is greatly reduced, making the method better suited to existing network environments. The invention also takes the dynamic, unstable nature of the network environment into account, incorporates it into end-to-end network design and training, and provides an adaptive transmission control algorithm to balance transmission delay against reconstruction accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. The drawings in the following description are only some embodiments of the present invention; other drawings can be obtained from them by those skilled in the art without creative effort.
FIG. 1 is a schematic flow chart of AI-driven real-time point cloud video transmission according to an embodiment of the present invention;
FIG. 2 is a block diagram schematically illustrating a structure of an AI-driven real-time point cloud video transmission system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating an AI-driven real-time point cloud video transmission method according to an embodiment of the invention;
FIG. 4 is a schematic structural design diagram of a deep neural network model according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present invention.
According to the embodiment of the invention, an AI-driven real-time point cloud video transmission method is provided.
As shown in fig. 1, the AI-driven real-time point cloud video transmission method according to the embodiment of the present invention includes:
Step S101, acquiring video data information with an AI generation device, and processing the video data information to obtain point cloud video data;
Step S103, extracting hierarchical features from the point cloud video data, determining key point cloud features in each point cloud video frame, and transmitting the key point cloud features;
and Step S105, receiving the key point cloud features, expanding and reconstructing the received key point cloud features to obtain point cloud information close to the original input point cloud, and forming a point cloud video that is visually equivalent to the original.
Acquiring video data information with the AI generation device and processing it to obtain point cloud video data comprises: scanning the 3D model to be transmitted with a plurality of depth cameras at different angles and acquiring the point cloud stream of each depth camera; and stitching the multiple point clouds collected by the multi-view cameras into one complete point cloud to obtain the point cloud video data.
in addition, the AI-driven real-time point cloud video transmission method further comprises:
training a pre-built deep neural network online, using a training set composed of a large number of 3D models at various scales, to obtain multiple candidate neural network models;
splitting each trained neural network model into a hierarchical feature extraction module and a point cloud restoration and reconstruction module based on a generative adversarial network (GAN), which are deployed respectively on a high-performance edge server close to the input end and on the user terminal device;
and, while deploying the hierarchical feature extraction module and the GAN-based point cloud restoration and reconstruction module to the high-performance edge server and the user terminal device, deploying an adaptive matcher, so that the adaptive matcher adaptively matches, according to network bandwidth changes monitored in real time, a neural network model that satisfies the current network's requirements for real-time point cloud video frame feature extraction and reconstruction.
Furthermore, the hierarchical feature extraction module comprises three set abstraction layers connected in series, which perform hierarchical feature learning to capture the local structure of the original point cloud; each set abstraction layer consists of three basic layers: a sampling layer, a grouping layer, and a mini-PointNet layer.
The sampling layer selects a subset from the output of the previous layer using the farthest point sampling technique, each point in the subset representing the center of a local region; the grouping layer finds the n nearest neighbor points around each local region center and combines them into a local region set; the mini-PointNet layer converts each local region set into a feature vector using three two-dimensional convolution layers and one max pooling layer; and the feature vectors output by the mini-PointNet layer of the last set abstraction layer are the data to be transmitted.
Further, the point cloud restoration and reconstruction module comprises a point cloud feature expansion part and a final point set generation part. The point cloud feature expansion part unifies the feature dimensions with a multilayer perceptron when the transmitted key point cloud features are received, and then generates a larger and more diverse set of point features, in both point count and feature dimension, through an up-down-up expansion unit. The final point set generation part comprises two multilayer perceptron layers, through which the expanded point cloud features are reconstructed into three-dimensional coordinates.
In addition, extracting hierarchical features from the point cloud video data and determining the key point cloud features in each point cloud video frame comprises: performing hierarchical feature extraction on the point cloud video data with the hierarchical feature extraction module deployed on the high-performance edge server close to the input end, and determining the key point cloud features in the point cloud video frame.
In addition, expanding and reconstructing the received key point cloud features to obtain point cloud information close to the original input point cloud and form a point cloud video that is visually equivalent to the original comprises: expanding and reconstructing the key point cloud features with the GAN-based point cloud restoration and reconstruction module deployed on the user terminal device, thereby obtaining point cloud information close to the original input point cloud and forming a point cloud video that is visually equivalent to the original.
According to an embodiment of the invention, an AI-driven real-time point cloud video transmission system is provided.
As shown in fig. 2, the AI-driven real-time point cloud video transmission system according to the embodiment of the present invention includes:
a point cloud video data acquisition module 201, configured to acquire video data information with an AI generation device and process the video data information to obtain point cloud video data;
a key point cloud feature extraction module 203, configured to extract hierarchical features from the point cloud video data, determine the key point cloud features in each point cloud video frame, and transmit the key point cloud features;
and a point cloud restoration and reconstruction module 205, configured to receive the key point cloud features, expand and reconstruct them to obtain point cloud information close to the original input point cloud, and form a point cloud video that is visually equivalent to the original.
When the point cloud video data acquisition module 201 acquires video data information with the AI generation device and processes it into point cloud video data, it scans the 3D model to be transmitted with a plurality of depth cameras at different angles, acquires the point cloud stream of each depth camera, and stitches the multiple point clouds collected by the multi-view cameras into one complete point cloud to obtain the point cloud video data.
In addition, the AI-driven real-time point cloud video transmission system further comprises: a neural network training module (not shown in the figure), which trains a pre-built deep neural network online, using a training set composed of a large number of 3D models at various scales, to obtain multiple candidate neural network models; a neural network deployment module (not shown in the figure), which splits each trained neural network model into a hierarchical feature extraction module and a GAN-based point cloud restoration and reconstruction module, deployed respectively on a high-performance edge server close to the input end and on the user terminal device; and a neural network matching module (not shown in the figure), which deploys an adaptive matcher while the hierarchical feature extraction module and the GAN-based point cloud restoration and reconstruction module are deployed to the high-performance edge server and the user terminal device, so that the adaptive matcher adaptively matches, according to network bandwidth changes monitored in real time, a neural network model that satisfies the current network's requirements for real-time point cloud video frame feature extraction and reconstruction.
The hierarchical feature extraction module comprises three set abstraction layers connected in series, which perform hierarchical feature learning to capture the local structure of the original point cloud; each set abstraction layer consists of three basic layers: a sampling layer, a grouping layer, and a mini-PointNet layer.
The sampling layer selects a subset from the output of the previous layer using the farthest point sampling technique, each point in the subset representing the center of a local region; the grouping layer finds the n nearest neighbor points around each local region center and combines them into a local region set; the mini-PointNet layer converts each local region set into a feature vector using three two-dimensional convolution layers and one max pooling layer; and the feature vectors output by the mini-PointNet layer of the last set abstraction layer are the data to be transmitted.
The point cloud restoration and reconstruction module comprises a point cloud feature expansion part and a final point set generation part. The point cloud feature expansion part unifies the feature dimensions with a multilayer perceptron when the transmitted key point cloud features are received, and then generates a larger and more diverse set of point features, in both point count and feature dimension, through an up-down-up expansion unit. The final point set generation part comprises two multilayer perceptron layers, through which the expanded point cloud features are reconstructed into three-dimensional coordinates.
In addition, when the key point cloud feature extraction module 203 performs hierarchical feature extraction on the point cloud video data and determines the key point cloud features in each point cloud video frame, it does so through the hierarchical feature extraction module deployed on the high-performance edge server close to the input end.
In addition, when the point cloud restoration and reconstruction module 205 expands and reconstructs the received key point cloud features to obtain point cloud information close to the original input point cloud and form a point cloud video that is visually equivalent to the original, it does so through the GAN-based point cloud restoration and reconstruction module deployed on the user terminal device.
In order to make the technical solution of the present invention more clearly understood, the technical solution of the present invention is described in detail below from the viewpoint of the operation principle.
Fig. 3 is a schematic diagram of the principle of the present invention, and as can be seen from fig. 3, the method of the present invention is as follows:
(1) Multi-view cameras. The system captures the original point clouds with a plurality of depth cameras placed at different angles; the cameras are connected by USB, and the point cloud stream of each camera is pre-processed and synchronized to a high-performance edge server for stitching.
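A minimal Python sketch of the stitching step, for illustration only: it assumes each camera's extrinsic calibration (a rotation R and translation t mapping camera coordinates to a shared world frame) is already known, and simply transforms and concatenates the per-camera clouds; the function and variable names are not from the patent.

```python
import numpy as np

def stitch_point_clouds(per_camera_points, extrinsics):
    """Merge per-camera point clouds into one cloud in a common world frame.

    per_camera_points: list of (N_i, 3) arrays, one per depth camera.
    extrinsics: list of (R, t) pairs mapping camera coordinates to world
                coordinates (assumed known from calibration).
    """
    world_clouds = []
    for pts, (R, t) in zip(per_camera_points, extrinsics):
        world_clouds.append(pts @ R.T + t)        # rigid transform per camera
    return np.concatenate(world_clouds, axis=0)   # complete stitched cloud

# Example: two cameras with identity calibration, purely for illustration.
cams = [np.random.rand(1000, 3), np.random.rand(1200, 3)]
calib = [(np.eye(3), np.zeros(3)), (np.eye(3), np.zeros(3))]
full_cloud = stitch_point_clouds(cams, calib)     # shape (2200, 3)
```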
(2) Point cloud feature extraction and point cloud information restoration and reconstruction. Key point cloud feature extraction means extracting the key features of the stitched point cloud with the hierarchical feature extraction module provided by the invention; point cloud information restoration and reconstruction based on the generative adversarial network means restoring and reconstructing the received point cloud features with the point cloud restoration and reconstruction module provided by the invention.
(3) An adaptive matcher. It senses the network condition of the connected terminal and selects the optimal transmission and inference model, so as to keep communication running stably and improve the user experience.
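A minimal sketch of how such an adaptive matcher could pick one of the candidate models from the measured bandwidth. The candidate table, the bandwidth-threshold policy, and the byte-cost formula below are illustrative assumptions; the patent does not specify a particular selection rule.

```python
from dataclasses import dataclass

@dataclass
class CandidateModel:
    name: str       # identifier of the trained (extractor, reconstructor) pair
    n_points: int   # N: number of transmitted feature points per patch
    feat_dim: int   # M: feature dimension per point

# Candidate models trained with different (N, M) combinations (illustrative values).
CANDIDATES = [
    CandidateModel("tiny",   n_points=8,  feat_dim=4),
    CandidateModel("small",  n_points=16, feat_dim=8),
    CandidateModel("medium", n_points=32, feat_dim=16),
    CandidateModel("large",  n_points=64, feat_dim=32),
]

def select_model(bandwidth_mbps: float, frame_rate: float, patches_per_frame: int,
                 bytes_per_value: int = 4) -> CandidateModel:
    """Pick the highest-fidelity model whose per-second payload fits the link."""
    budget_bytes = bandwidth_mbps * 1e6 / 8          # bytes/s available
    best = CANDIDATES[0]                             # fall back to the smallest model
    for model in sorted(CANDIDATES, key=lambda m: m.n_points * m.feat_dim):
        payload = (model.n_points * model.feat_dim * bytes_per_value
                   * patches_per_frame * frame_rate)
        if payload <= budget_bytes:
            best = model                             # a larger model still fits
    return best

print(select_model(bandwidth_mbps=50, frame_rate=30, patches_per_frame=400).name)
```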
(4) Base stations. The system can provide real-time point cloud transmission in the current network environment; the key point cloud features are transmitted wirelessly to various terminals through existing base stations.
(5) User terminals. The system can serve a wide range of terminals and application scenarios. For example, real-time holographic communication can be implemented on a smartphone; for a more immersive experience, the point cloud can be rendered with AR glasses or an AR headset, letting the user interact with the point cloud video in person.
Fig. 4 is a schematic structural design diagram of a deep neural network model, and as can be seen from fig. 4, the deep neural network model structure includes a hierarchical feature extraction module and a point cloud restoration and reconstruction module, and specifically includes:
(1) A hierarchical feature extraction module. It learns from the input point cloud, extracts a subset of key point cloud features, and transmits them. Specifically, the module performs hierarchical feature learning over several set abstraction layers to capture the local structure of the original point cloud. Each set abstraction layer consists of three basic layers: a sampling layer, a grouping layer, and a mini-PointNet layer.
The sampling layer selects a subset from the output of the previous layer to represent the centers of local regions; the grouping layer finds the n nearest neighbors around each center to build a local region set; and the mini-PointNet converts each local region set into a feature vector using three two-dimensional convolution layers and one max pooling layer.
(2) A point cloud restoration and reconstruction module. It restores and reconstructs the key point cloud features received at the receiving end. In particular, the module uses only the generation part of a generative adversarial network, which has fewer parameters and less computation than the full adversarial network and is therefore easier to deploy on resource-constrained terminals. It comprises a point cloud feature expansion part and a final point set generation part.
The point cloud feature expansion part receives the transmitted point cloud feature matrix and unifies the feature dimensions through a multilayer perceptron layer; then, through an up-down-up expansion unit, it generates a larger and more diverse set of point features, in both point count and feature dimension.
The final point set generation part reconstructs the expanded features into three-dimensional coordinates through two multilayer perceptron layers.
In actual application, when the point cloud feature extraction is performed on the original data through a hierarchical feature extraction module in the deep neural network, the specific implementation scheme can be as follows:
(1) Training phase
a. On the surface of each point cloud model in the training set, 200 points are randomly selected as seed coordinates. With each seed coordinate as the center, the farthest point sampling technique is used to find 256 surrounding points, such that the region formed by these 256 points covers about 5% of the model surface; the seed coordinate together with the 256 points is defined as a patch, and the coordinates of the point set in the patch are normalized to a unit sphere. In this embodiment, each sample input to the neural network is one patch containing 256 points, each with three-dimensional coordinates, so the input can be represented as (256, 3).
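A minimal sketch of this patch extraction, under stated assumptions: nearest-neighbour gathering is used below as a stand-in for the FPS-based neighbourhood search described above, and the 5%-of-surface criterion is not enforced.

```python
import numpy as np

def normalize_to_unit_sphere(points):
    """Center the patch and scale it so all points lie inside a unit sphere."""
    centered = points - points.mean(axis=0)
    radius = np.linalg.norm(centered, axis=1).max()
    return centered / max(radius, 1e-8)

def extract_patches(cloud, num_seeds=200, patch_size=256):
    """Cut a point cloud into fixed-size patches around randomly chosen seeds."""
    seeds = cloud[np.random.choice(len(cloud), num_seeds, replace=False)]
    patches = []
    for seed in seeds:
        # Nearest-neighbour gathering stands in for the patent's FPS-based
        # neighbourhood search; both return 256 points surrounding the seed.
        dists = np.linalg.norm(cloud - seed, axis=1)
        neighbours = cloud[np.argsort(dists)[:patch_size]]
        patches.append(normalize_to_unit_sphere(neighbours))   # (256, 3) per patch
    return np.stack(patches)                                   # (num_seeds, 256, 3)
```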
b. The (256, 3) input sample is fed to the sampling layer, which uses the farthest point sampling technique to select 128 points, producing a sparse point set denoted (128, 3). Farthest point sampling is chosen because it covers the whole point set better than random sampling. The number of center points is manually specified; in this embodiment it is 128.
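The farthest point sampling technique referred to throughout can be sketched as follows (NumPy, unoptimized; the random choice of the starting point is an assumption):

```python
import numpy as np

def farthest_point_sampling(points, num_samples):
    """Iteratively pick the point farthest from all points chosen so far."""
    n = points.shape[0]
    selected = np.zeros(num_samples, dtype=np.int64)
    min_dist = np.full(n, np.inf)
    selected[0] = np.random.randint(n)            # arbitrary starting point
    for i in range(1, num_samples):
        diff = points - points[selected[i - 1]]
        min_dist = np.minimum(min_dist, np.einsum("ij,ij->i", diff, diff))
        selected[i] = int(np.argmax(min_dist))    # farthest from the current set
    return points[selected]                       # e.g. (128, 3) sparse point set

sparse = farthest_point_sampling(np.random.rand(256, 3), 128)   # (128, 3)
```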
c. The sparse point set (128, 3) is fed to the grouping layer. Specifically, with the 128 points as centers, the sphere query method generates 128 local regions, each containing 32 points within a sphere of radius 0.2, yielding grouped features denoted (128, 32, 3). The number of points per region and the sphere radius are manually specified, here as 32 and 0.2. This step can also be implemented with the k-nearest-neighbor method; the two have little influence on the result.
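A sketch of the sphere-query (ball query) grouping. Padding by repetition when a sphere contains fewer than 32 points, and expressing coordinates relative to the centre, are common conventions assumed here; the patent does not spell them out.

```python
import numpy as np

def ball_query(centers, points, radius=0.2, group_size=32):
    """Gather a fixed-size neighbourhood inside a sphere around each centre."""
    groups = []
    for c in centers:
        d = np.linalg.norm(points - c, axis=1)
        idx = np.where(d <= radius)[0]
        if idx.size == 0:                     # degenerate case: keep the nearest point
            idx = np.array([int(np.argmin(d))])
        # Pad by repeating indices so every group has exactly `group_size` points.
        idx = np.resize(idx, group_size)
        groups.append(points[idx] - c)        # coordinates relative to the centre
    return np.stack(groups)                   # (num_centres, 32, 3)

grouped = ball_query(np.random.rand(128, 3), np.random.rand(256, 3))   # (128, 32, 3)
```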
d. The grouped features (128, 32, 3) are fed to the mini-PointNet layer. Specifically, they pass through three two-dimensional convolution layers and one max pooling layer in sequence, outputting hierarchical feature information (128, 64). In this embodiment, the numbers of output channels of the three two-dimensional convolution layers are 64, 64, and 64; the convolution kernel size is 1 × 1, the stride is 1 × 1, and there is no padding.
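A sketch of the mini-PointNet layer. PyTorch and the ReLU activations are assumptions; the 1 × 1 convolutions and the max pooling over each local region follow the description above.

```python
import torch
import torch.nn as nn

class MiniPointNet(nn.Module):
    """Three 1x1 2D convolutions followed by max pooling over each local region."""

    def __init__(self, in_channels=3, channels=(64, 64, 64)):
        super().__init__()
        layers, prev = [], in_channels
        for c in channels:
            layers += [nn.Conv2d(prev, c, kernel_size=1, stride=1), nn.ReLU()]
            prev = c
        self.convs = nn.Sequential(*layers)

    def forward(self, grouped):
        # grouped: (B, C_in, num_regions, points_per_region), e.g. (B, 3, 128, 32)
        feats = self.convs(grouped)             # (B, 64, 128, 32)
        feats = feats.max(dim=-1).values        # max-pool over the 32 points per region
        return feats                            # (B, 64, 128): one vector per region

x = torch.rand(1, 3, 128, 32)
print(MiniPointNet()(x).shape)                  # torch.Size([1, 64, 128])
```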
e. The hierarchical feature information (128, 64) is fed to a sampling layer again; specifically, 64 points are selected with the farthest point sampling technique, outputting sparse point set features (64, 64). The number of center points is manually specified; in this embodiment it is 64.
f. The sparse point set features (64, 64) are fed to a grouping layer; specifically, with the 64 points as centers, the sphere query method generates 64 local regions, each containing 64 points within a sphere of radius 0.3, yielding grouped features (64, 64, 64). The number of points per region and the sphere radius are manually specified, here as 64 and 0.3. This step can also be implemented with the k-nearest-neighbor method.
g. The grouped features (64, 64, 64) are fed to the mini-PointNet layer; specifically, they pass through three two-dimensional convolution layers and one max pooling layer in sequence, outputting hierarchical feature information (64, 32). In this embodiment, the numbers of output channels of the three two-dimensional convolution layers are 64, 64, and 32; the convolution kernel size is 1 × 1, the stride is 1 × 1, and there is no padding.
h. The hierarchical feature information (64, 32) is fed to a sampling layer again; specifically, N points are selected with the farthest point sampling technique, yielding sparse point set features (N, 32). The number N of selected center points is a variable.
i. The sparse point set features (N, 32) are fed to a grouping layer; specifically, with the N points as centers, the sphere query method generates N local regions, each containing 64 points within a sphere of radius 0.4, yielding grouped features denoted (N, 64, 32). The number of points per region and the sphere radius are manually specified, here as 64 and 0.4; N is a variable. This step can also be implemented with the k-nearest-neighbor method.
j. The grouped features (N, 64, 32) are fed to the mini-PointNet layer; specifically, they pass through three two-dimensional convolution layers and one max pooling layer in sequence, yielding hierarchical point cloud feature information (N, M). In this embodiment, the output channels of the three two-dimensional convolution layers are 32, M; the convolution kernel size is 1 × 1, the stride is 1 × 1, and there is no padding.
For the hierarchical feature extraction module, a sampling layer, a grouping layer, and a mini-PointNet layer together form one set abstraction layer: steps b–d form set abstraction layer 1, steps e–g form set abstraction layer 2, and steps h–j form set abstraction layer 3. The number of set abstraction layers is manually specified, here as 3, and the output of the last abstraction layer is the point cloud feature (N, M). All training samples are fed into the deep neural network; the loss function is computed through the forward pass of steps b–j and back-propagation updates the network weights, training one neural network model. Setting N and M to different combinations trains multiple candidate models.
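Pulling steps b–j together, the following condensed PyTorch sketch stacks three set abstraction layers with the shapes described above. It is unbatched for brevity; the ReLU activations, relative-coordinate grouping, the sphere-query padding behaviour, the middle channel width of the third mini-PointNet (here 32), and the example values N = 16, M = 8 are all assumptions for illustration.

```python
import torch
import torch.nn as nn

def fps(x, m):
    """Farthest point sampling on an (N, D) set; returns m row indices."""
    idx = [torch.randint(len(x), (1,)).item()]
    dist = torch.full((len(x),), float("inf"))
    for _ in range(m - 1):
        dist = torch.minimum(dist, ((x - x[idx[-1]]) ** 2).sum(-1))
        idx.append(int(dist.argmax()))
    return torch.tensor(idx)

def ball_group(x, centers, radius, k):
    """k points inside a sphere around each centre (nearest points fill any shortfall)."""
    d = torch.cdist(centers, x)                          # (m, N) pairwise distances
    d = torch.where(d <= radius, d, d + 1e6)             # push far points to the back
    idx = d.topk(k, largest=False).indices               # (m, k)
    return x[idx] - centers.unsqueeze(1)                 # relative vectors, (m, k, D)

class SetAbstraction(nn.Module):
    """Sampling layer + grouping layer + mini-PointNet layer."""
    def __init__(self, m, radius, k, in_dim, channels):
        super().__init__()
        self.m, self.radius, self.k = m, radius, k
        mlp, prev = [], in_dim
        for c in channels:                               # three 1x1 conv layers
            mlp += [nn.Conv2d(prev, c, kernel_size=1, stride=1), nn.ReLU()]
            prev = c
        self.mlp = nn.Sequential(*mlp)

    def forward(self, x):                                # x: (N, D)
        centers = x[fps(x, self.m)]                      # (m, D) region centres
        groups = ball_group(x, centers, self.radius, self.k)   # (m, k, D)
        g = groups.permute(2, 0, 1).unsqueeze(0)         # (1, D, m, k)
        feats = self.mlp(g).max(dim=-1).values           # max pool over the k points
        return feats.squeeze(0).t()                      # (m, out_dim)

# Three stacked set abstraction layers matching the shapes of steps b-j.
N, M = 16, 8                                             # illustrative variable sizes
sa1 = SetAbstraction(128, 0.2, 32, 3,  (64, 64, 64))     # (256, 3)  -> (128, 64)
sa2 = SetAbstraction(64,  0.3, 64, 64, (64, 64, 32))     # (128, 64) -> (64, 32)
sa3 = SetAbstraction(N,   0.4, 64, 32, (32, 32, M))      # (64, 32)  -> (N, M)
patch = torch.rand(256, 3)
key_features = sa3(sa2(sa1(patch)))                      # (N, M) features to transmit
```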
(2) Inference phase
a. Several points on the surface of the target object in the video frame to be transmitted are randomly selected as seed coordinates. With each seed coordinate as the center, the farthest point sampling technique finds 256 surrounding points to form a patch, and the coordinates of the point set in the patch are normalized to a unit sphere. The number of seeds is manually specified; in the present invention it is the number of points in the target point cloud divided by 256.
b. The optimal inference model is selected according to the network fluctuation conditions.
c. The patches are fed, one patch at a time, into the optimal inference model, and forward inference produces the final point cloud features to be transmitted.
In actual application, when point cloud information is reconstructed from the point cloud features by the point cloud restoration and reconstruction module, a specific implementation can be as follows:
(1) Training phase
a. The point cloud features (N, M) pass through a two-dimensional convolution layer to obtain point cloud features of unified dimension (N, 128). The effect of this layer is that key point cloud features compressed by different inference models all come out with the same feature dimensionality. The number of output channels of this two-dimensional convolution layer is 128, the convolution kernel size is 1 × 1, the stride is 1 × 1, and there is no padding.
b. The unified-dimension point cloud features, denoted F with shape (N, 128), undergo an up-sampling operation that increases the number of point features, yielding up-sampled point cloud features F_up, denoted (256, 128). The up-sampling operation proceeds as follows:
the features (N, 128) are copied r times to obtain (rN, 128), where r is the up-sampling ratio and equals 256/N;
a 2D grid mechanism generates a unique two-dimensional vector for each copy, which is appended to the feature vector of each corresponding point in that copy, giving features of shape (rN, 128 + 2);
the up-sampled point cloud features F_up are then generated by a self-attention unit and two two-dimensional convolution layers. The output channels of the two convolution layers are 256 and 128, the convolution kernel size is 1 × 1, the stride is 1 × 1, and there is no padding.
c. The up-sampled point cloud features F_up (256, 128) undergo a down-sampling operation to obtain point cloud features F_down at the same scale as F, denoted (N, 128). The down-sampling operation proceeds as follows:
F_up is reshaped into an (N, r × 128) feature simply by rearranging its rows;
the point cloud features F_down are then generated by two two-dimensional convolution layers. The output channels of the two convolution layers are 256 and 128, the convolution kernel size is 1 × 1, the stride is 1 × 1, and there is no padding.
d. Subtracting F_down from F gives the residual point cloud features Δ, of dimension (N, 128).
e. Applying the same up-sampling operation to Δ (N, 128) gives the up-sampled residual features Δ_up, of dimension (256, 128).
f. Adding Δ_up to F_up gives the refined point cloud features F'_up, of dimension (256, 128).
g. The point cloud features F'_up (256, 128) undergo coordinate reconstruction through two two-dimensional convolution layers, producing the reconstructed point cloud of dimension (256, 3). The output channels of the two convolution layers are 64 and 3, the convolution kernel size is 1 × 1, the stride is 1 × 1, and there is no padding.
For the point cloud restoration and reconstruction module, steps c–f constitute the up-down-up expansion unit and step g is the final point set generation part. All point cloud feature samples obtained through transmission are fed into the deep neural network; the loss function is computed through the forward pass of steps a–g and back-propagation updates the network weights, training one neural network model.
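A condensed PyTorch sketch of the restoration path described in steps a–g, for illustration only: the self-attention unit of the up-sampling step is omitted, the 2D grid code is generated randomly, and N = 16, M = 8 are example values. None of these choices, nor the class and variable names, are prescribed by the patent.

```python
import torch
import torch.nn as nn

class UpUnit(nn.Module):
    """Duplicate features r times, attach a 2D grid code, then two 1x1 convs.
    (The self-attention unit used in the patent's up-sampling step is omitted.)"""
    def __init__(self, r, in_dim=128):
        super().__init__()
        self.r = r
        self.convs = nn.Sequential(nn.Conv2d(in_dim + 2, 256, 1), nn.ReLU(),
                                   nn.Conv2d(256, 128, 1), nn.ReLU())

    def forward(self, f):                                  # f: (N, 128)
        n = f.size(0)
        rep = f.repeat(self.r, 1)                          # (rN, 128)
        grid = torch.rand(self.r, 2).repeat_interleave(n, dim=0)   # 2D code per copy
        x = torch.cat([rep, grid], dim=1)                  # (rN, 130)
        x = x.t().unsqueeze(0).unsqueeze(-1)               # (1, 130, rN, 1)
        return self.convs(x).squeeze(-1).squeeze(0).t()    # (rN, 128)

class DownUnit(nn.Module):
    """Fold the r copies back into one row per original point, then two 1x1 convs."""
    def __init__(self, r):
        super().__init__()
        self.r = r
        self.convs = nn.Sequential(nn.Conv2d(r * 128, 256, 1), nn.ReLU(),
                                   nn.Conv2d(256, 128, 1), nn.ReLU())

    def forward(self, f_up):                               # (rN, 128)
        n = f_up.size(0) // self.r
        x = f_up.reshape(self.r, n, 128).permute(1, 0, 2).reshape(n, -1)  # (N, r*128)
        x = x.t().unsqueeze(0).unsqueeze(-1)               # (1, r*128, N, 1)
        return self.convs(x).squeeze(-1).squeeze(0).t()    # (N, 128)

class PointCloudReconstructor(nn.Module):
    """Dimension unification + up-down-up expansion + final point set generation."""
    def __init__(self, feat_dim, n_points, out_points=256):
        super().__init__()
        r = out_points // n_points
        self.unify = nn.Conv2d(feat_dim, 128, 1)           # (N, M) -> (N, 128)
        self.up1, self.down, self.up2 = UpUnit(r), DownUnit(r), UpUnit(r)
        self.coords = nn.Sequential(nn.Conv2d(128, 64, 1), nn.ReLU(),
                                    nn.Conv2d(64, 3, 1))   # final point set generation

    def forward(self, key_feats):                          # (N, M) transmitted features
        x = key_feats.t().unsqueeze(0).unsqueeze(-1)       # (1, M, N, 1)
        f = self.unify(x).squeeze(-1).squeeze(0).t()       # F: (N, 128)
        f_up = self.up1(f)                                 # F_up: (256, 128)
        delta = f - self.down(f_up)                        # residual at input scale
        f_up2 = f_up + self.up2(delta)                     # refined features F'_up
        y = f_up2.t().unsqueeze(0).unsqueeze(-1)           # (1, 128, 256, 1)
        return self.coords(y).squeeze(-1).squeeze(0).t()   # (256, 3) reconstructed patch

N, M = 16, 8                                               # illustrative (N, M)
patch_xyz = PointCloudReconstructor(M, N)(torch.rand(N, M))
print(patch_xyz.shape)                                     # torch.Size([256, 3])
```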
(2) Inference phase
a. The optimal inference model is selected according to the network fluctuation conditions.
b. All point cloud feature samples obtained through transmission are fed into the deep neural network, and forward inference through steps a–g produces the restored and reconstructed patch point cloud information.
c. All patch reconstructions are merged, and points equal in number to the original input point cloud are selected with the farthest point sampling technique to reconstruct the final point cloud.
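A minimal sketch of this final assembly step; it assumes the patch coordinates have already been mapped back from the unit sphere to the original frame.

```python
import numpy as np

def assemble_point_cloud(patch_clouds, target_size):
    """Merge all reconstructed patches and FPS-downsample to the original point count."""
    merged = np.concatenate(patch_clouds, axis=0)          # all (256, 3) patches stacked
    selected = np.zeros(target_size, dtype=np.int64)
    min_dist = np.full(len(merged), np.inf)
    selected[0] = 0                                        # arbitrary starting point
    for i in range(1, target_size):
        diff = merged - merged[selected[i - 1]]
        min_dist = np.minimum(min_dist, (diff ** 2).sum(axis=1))
        selected[i] = int(np.argmax(min_dist))             # farthest-point selection
    return merged[selected]                                # (target_size, 3) final cloud

final = assemble_point_cloud([np.random.rand(256, 3) for _ in range(4)], target_size=1000)
```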
By means of the above technical scheme, feature extraction is performed on the original point cloud video stream to be transmitted, only a subset of key point cloud features is transmitted, and restoration and reconstruction are finally carried out at the receiving end, achieving the effect that, visually, the original point cloud video is what gets transmitted.
In operation, the invention is entirely AI-driven: a deep neural network automatically performs feature extraction and target reconstruction, taking the raw point cloud coordinates directly as input. No additional operations are required, such as pre-processing the data, manually analyzing the geometric distribution and attribute importance of the point cloud, or selecting a codec; the deep neural network only needs to be trained and then deployed in practical applications, and it is split into the two processes of feature extraction and reconstruction.
The system is an end-to-end trained neural network covering both feature extraction and reconstruction: training proceeds as soon as data are fed in, the system is more intelligent, no attention needs to be paid to its internal operations, and a suitable inference model can be selected according to network conditions to give users a better experience.
Meanwhile, the invention reaches a compression ratio as high as 30.72× with acceptable precision loss, realizes real-time transmission in existing network environments below 5G, and greatly reduces the amount of transmitted data.
In conclusion, the invention can significantly reduce the transmission volume and energy consumption of point cloud video streams. The invention also takes the dynamic, unstable nature of the network environment into account, incorporates it into end-to-end network design and training, and provides an adaptive transmission control algorithm to balance transmission delay against reconstruction accuracy.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and should not be taken as limiting the scope of the present invention, which is intended to cover any modifications, equivalents, improvements, etc. within the spirit and scope of the present invention.

Claims (8)

1. An AI-driven real-time point cloud video transmission method is characterized by comprising the following steps:
acquiring video data information with an AI (artificial intelligence) generation device, and processing the video data information to obtain point cloud video data;
extracting hierarchical features from the point cloud video data, determining key point cloud features in each point cloud video frame, and transmitting the key point cloud features;
receiving the key point cloud features, expanding and reconstructing the received key point cloud features to obtain point cloud information close to the original input point cloud, and forming a point cloud video that is visually equivalent to the original;
training a pre-built deep neural network online, using a training set composed of a large number of 3D models at various scales, to obtain multiple candidate neural network models;
splitting each trained neural network model into a hierarchical feature extraction module and a point cloud restoration and reconstruction module based on a generative adversarial network (GAN), which are deployed respectively on a high-performance edge server close to the input end and on the user terminal device;
and, while deploying the hierarchical feature extraction module and the GAN-based point cloud restoration and reconstruction module to the high-performance edge server and the user terminal device, deploying an adaptive matcher, so that the adaptive matcher adaptively matches, according to network bandwidth changes monitored in real time, a neural network model that satisfies the current network's requirements for real-time point cloud video frame feature extraction and reconstruction.
2. The AI-driven real-time point cloud video transmission method according to claim 1, wherein the hierarchical feature extraction module comprises three set abstraction layers connected in series, which perform hierarchical feature learning to capture the local structure of the original point cloud; each set abstraction layer consists of three basic layers: a sampling layer, a grouping layer, and a mini-PointNet layer; wherein
the sampling layer selects a subset from the output of the previous layer using the farthest point sampling technique, each point in the subset representing the center of a local region;
the grouping layer finds the n nearest neighbor points around each local region center and combines them into a local region set;
the mini-PointNet layer converts each local region set into a feature vector using three two-dimensional convolution layers and one max pooling layer;
and the feature vectors output by the mini-PointNet layer of the last set abstraction layer are the data to be transmitted.
3. The AI-driven real-time point cloud video transmission method according to claim 2, wherein the point cloud restoration and reconstruction module comprises a point cloud feature expansion part and a final point set generation part, wherein
the point cloud feature expansion part unifies the feature dimensions with a multilayer perceptron when the transmitted key point cloud features are received, and generates a larger and more diverse set of point features, in both point count and feature dimension, through an up-down-up expansion unit;
and the final point set generation part comprises two multilayer perceptron layers, through which the expanded point cloud features are reconstructed into three-dimensional coordinates.
4. The AI-driven real-time point cloud video transmission method of claim 3,
the method comprises the following steps of acquiring video data information by using AI (Artificial intelligence) generating equipment, and processing the video data information to obtain point cloud video data:
scanning a 3D model to be transmitted by using a plurality of depth cameras with different angles, and acquiring a point cloud flow of each depth camera; splicing a plurality of point clouds collected by the multi-view camera into a complete point cloud to obtain point cloud video data;
extracting hierarchical features of the point cloud video data, and determining key point cloud features in the point cloud video frame comprises the following steps:
performing hierarchical feature extraction on point cloud video data through a hierarchical feature extraction module deployed to a high-performance edge server close to an input end to determine key point cloud features in a point cloud video frame;
expanding and reconstructing the received key point cloud characteristics to obtain point cloud information similar to the original input point cloud, and forming an original point cloud video with a visual effect comprises the following steps:
and expanding and reconstructing key point cloud characteristics through a point cloud recovery reconstruction module which is deployed on user terminal equipment close to the input end and is based on the generation countermeasure network, so that point cloud information similar to the original input point cloud is obtained, and an original point cloud video on the visual effect is formed.
5. An AI-driven real-time point cloud video transmission system, comprising:
a point cloud video data acquisition module, configured to acquire video data information with an AI (artificial intelligence) generation device and process the video data information to obtain point cloud video data;
a key point cloud feature extraction module, configured to extract hierarchical features from the point cloud video data, determine the key point cloud features in each point cloud video frame, and transmit the key point cloud features;
a point cloud restoration and reconstruction module, configured to receive the key point cloud features, expand and reconstruct the received key point cloud features to obtain point cloud information close to the original input point cloud, and form a point cloud video that is visually equivalent to the original;
a neural network training module, configured to train a pre-built deep neural network online, using a training set composed of a large number of 3D models at various scales, to obtain multiple candidate neural network models;
a neural network deployment module, configured to split each trained neural network model into a hierarchical feature extraction module and a point cloud restoration and reconstruction module based on a generative adversarial network (GAN), which are deployed respectively on a high-performance edge server close to the input end and on the user terminal device;
and a neural network matching module, configured to deploy an adaptive matcher while the hierarchical feature extraction module and the GAN-based point cloud restoration and reconstruction module are deployed to the high-performance edge server and the user terminal device, so that the adaptive matcher adaptively matches, according to network bandwidth changes monitored in real time, a neural network model that satisfies the current network's requirements for real-time point cloud video frame feature extraction and reconstruction.
6. The AI-driven real-time point cloud video transmission system of claim 5, wherein the hierarchical feature extraction module comprises three set abstraction layers connected in series for performing hierarchical feature learning to capture the local structure of the original point cloud; each set abstraction layer consists of three basic layers, namely a sampling layer, a grouping layer, and a mini PointNet layer; wherein,
the sampling layer is used for selecting a subset of points from the output of the previous layer using farthest point sampling, each point in the subset representing the center of a local region;
the grouping layer is used for finding the n nearest neighboring points around each local region center and combining them into a local region set;
the mini PointNet layer converts each local region set into a feature vector using 3 two-dimensional convolutional layers and 1 max pooling layer;
and the feature vector output by the mini PointNet layer of the last set abstraction layer is the data to be transmitted (an illustrative sketch of one set abstraction layer follows this claim).
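Below is a hedged PyTorch sketch of one set abstraction layer as described in claim 6 (farthest point sampling, n-nearest-neighbour grouping, and a mini PointNet of three two-dimensional convolutions followed by max pooling). The layer widths, sample counts, and class names are assumptions for illustration, not values from the patent.

```python
# Hedged sketch of a single set abstraction layer (PointNet++-style), assuming
# a single unbatched point cloud of shape (N, 3). Widths are illustrative.
import torch
import torch.nn as nn

def farthest_point_sampling(xyz, m):
    """xyz: (N, 3). Greedily pick m center indices that are maximally spread out."""
    N = xyz.shape[0]
    idx = torch.zeros(m, dtype=torch.long)          # idx[0] = 0: start from first point
    dist = torch.full((N,), float("inf"))
    for i in range(1, m):
        dist = torch.minimum(dist, ((xyz - xyz[idx[i - 1]]) ** 2).sum(-1))
        idx[i] = torch.argmax(dist)                 # farthest from all chosen centers
    return idx

class SetAbstraction(nn.Module):
    def __init__(self, m=512, n=32, channels=(64, 64, 128)):
        super().__init__()
        layers, c_in = [], 3
        for c_out in channels:                      # mini PointNet: 3 conv layers
            layers += [nn.Conv2d(c_in, c_out, 1), nn.ReLU()]
            c_in = c_out
        self.mlp = nn.Sequential(*layers)
        self.m, self.n = m, n

    def forward(self, xyz):                         # xyz: (N, 3)
        centers = xyz[farthest_point_sampling(xyz, self.m)]              # (m, 3)
        d = torch.cdist(centers, xyz)                                    # (m, N)
        groups = xyz[d.topk(self.n, largest=False).indices]              # (m, n, 3)
        groups = (groups - centers[:, None, :]).permute(2, 0, 1)[None]   # (1, 3, m, n)
        feats = self.mlp(groups).max(dim=-1).values                      # max pool over n
        return centers, feats.squeeze(0).transpose(0, 1)                 # (m, 3), (m, C)
```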
7. The AI-driven real-time point cloud video transmission system of claim 6, wherein the point cloud restoration and reconstruction module comprises a point cloud feature expansion part and a final point set generation part, wherein,
the point cloud feature expansion part is used for unifying the dimensions of the received key point cloud features through a multilayer perceptron, and for generating a more diverse number of points and feature dimensions through an up-down-up expansion unit;
and the final point set generation part comprises two multilayer perceptron layers and is used for reconstructing the expanded point cloud features into three-dimensional coordinate form through the two multilayer perceptron layers (an illustrative sketch follows this claim).
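Below is a hedged PyTorch sketch of the generator-side geometry of the restoration and reconstruction module in claim 7: a perceptron unifies the feature dimension, an up-down-up unit expands the point/feature set, and two perceptron layers map the result back to xyz coordinates. The adversarial (discriminator) half of the network is omitted, and the expansion ratio and layer widths are assumptions, not values from the patent.

```python
# Hedged sketch of feature expansion + final point set generation; shapes and
# widths are illustrative assumptions.
import torch
import torch.nn as nn

class UpDownUpExpansion(nn.Module):
    """Expand the point set r-fold, contract it back, then expand again,
    keeping the refined expanded output."""
    def __init__(self, dim=128, ratio=4):
        super().__init__()
        self.ratio = ratio
        self.up1 = nn.Linear(dim, dim * ratio)
        self.down = nn.Linear(dim, dim)
        self.up2 = nn.Linear(dim, dim * ratio)

    def forward(self, feats):                        # feats: (m, dim)
        m, dim = feats.shape
        up = torch.relu(self.up1(feats)).view(m * self.ratio, dim)
        down = torch.relu(self.down(up)).view(m, self.ratio, dim).mean(dim=1)
        return torch.relu(self.up2(down)).view(m * self.ratio, dim)

class PointCloudReconstructor(nn.Module):
    def __init__(self, in_dim=128, dim=128, ratio=4):
        super().__init__()
        self.unify = nn.Linear(in_dim, dim)          # unify feature dimensions
        self.expand = UpDownUpExpansion(dim, ratio)
        self.coords = nn.Sequential(                 # two perceptron layers -> xyz
            nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, key_feats):                    # key_feats: (m, in_dim)
        feats = torch.relu(self.unify(key_feats))
        return self.coords(self.expand(feats))       # (m * ratio, 3) reconstructed points
```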
8. The AI-driven real-time point cloud video transmission system of claim 7, wherein
the point cloud video data acquisition module, when acquiring video data information with an AI (artificial intelligence) generation device and processing the video data information to obtain point cloud video data, scans a 3D model to be transmitted with a plurality of depth cameras at different angles, acquires the point cloud stream of each depth camera, and stitches the plurality of point clouds collected by the multi-view cameras into one complete point cloud to obtain the point cloud video data;
the key point cloud feature extraction module, when extracting the hierarchical features of the point cloud video data and determining the key point cloud features in the point cloud video frame, performs hierarchical feature extraction on the point cloud video data through the hierarchical feature extraction module deployed on the high-performance edge server close to the input end;
and the point cloud restoration and reconstruction module, when expanding and reconstructing the key point cloud features to obtain point cloud information similar to the original input point cloud and form a point cloud video that is visually equivalent to the original, expands and reconstructs the key point cloud features through the generative-adversarial-network-based point cloud restoration and reconstruction module deployed on the user terminal equipment at the receiving end, so as to obtain point cloud information similar to the original input point cloud and form the visually equivalent point cloud video.
CN202110985757.0A 2021-08-26 2021-08-26 AI-driven real-time point cloud video transmission method and system Active CN113810736B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110985757.0A CN113810736B (en) 2021-08-26 2021-08-26 AI-driven real-time point cloud video transmission method and system

Publications (2)

Publication Number Publication Date
CN113810736A CN113810736A (en) 2021-12-17
CN113810736B true CN113810736B (en) 2022-11-01

Family

ID=78894093

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110985757.0A Active CN113810736B (en) 2021-08-26 2021-08-26 AI-driven real-time point cloud video transmission method and system

Country Status (1)

Country Link
CN (1) CN113810736B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110691243A (en) * 2019-10-10 2020-01-14 叠境数字科技(上海)有限公司 Point cloud geometric compression method based on deep convolutional network
CN113256640A (en) * 2021-05-31 2021-08-13 北京理工大学 Method and device for partitioning network point cloud and generating virtual environment based on PointNet

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
LU100465B1 (en) * 2017-10-05 2019-04-09 Applications Mobiles Overview Inc System and method for object recognition
CN110012279B (en) * 2018-01-05 2020-11-17 上海交通大学 3D point cloud data-based view-division compression and transmission method and system
CN108320330A (en) * 2018-01-23 2018-07-24 河北中科恒运软件科技股份有限公司 Real-time three-dimensional model reconstruction method and system based on deep video stream
US11113870B2 (en) * 2019-03-18 2021-09-07 Samsung Electronics Co., Ltd. Method and apparatus for accessing and transferring point cloud content in 360-degree video environment
KR20230152815A (en) * 2019-03-21 2023-11-03 엘지전자 주식회사 Method of encoding point cloud data, apparatus of encoding point cloud data, method of decoding point cloud data, and apparatus of decoding point cloud data
CN111901601B (en) * 2019-05-06 2023-03-31 上海交通大学 Code rate allocation method for unequal error protection in dynamic point cloud data transmission
US11210815B2 (en) * 2019-08-09 2021-12-28 Intel Corporation Point cloud playback mechanism
US11729243B2 (en) * 2019-09-20 2023-08-15 Intel Corporation Dash-based streaming of point cloud content based on recommended viewports
CN111783838A (en) * 2020-06-05 2020-10-16 东南大学 Point cloud characteristic space representation method for laser SLAM
CN112672168B (en) * 2020-12-14 2022-10-18 深圳大学 Point cloud compression method and device based on graph convolution
CN113141526B (en) * 2021-04-27 2022-06-07 合肥工业大学 Point cloud video self-adaptive transmission method for joint resource allocation under QoE (quality of experience) drive

Also Published As

Publication number Publication date
CN113810736A (en) 2021-12-17

Similar Documents

Publication Publication Date Title
CN109377530A (en) A kind of binocular depth estimation method based on deep neural network
CN110599395B (en) Target image generation method, device, server and storage medium
CN111512342A (en) Method and device for processing repeated points in point cloud compression
CN113315972B (en) Video semantic communication method and system based on hierarchical knowledge expression
CN109949222B (en) Image super-resolution reconstruction method based on semantic graph
CN110689599A (en) 3D visual saliency prediction method for generating countermeasure network based on non-local enhancement
CN116051740A (en) Outdoor unbounded scene three-dimensional reconstruction method and system based on nerve radiation field
Chen et al. Toward knowledge as a service over networks: A deep learning model communication paradigm
CN113792641A (en) High-resolution lightweight human body posture estimation method combined with multispectral attention mechanism
Xia et al. WiserVR: Semantic communication enabled wireless virtual reality delivery
Huang et al. Toward holographic video communications: A promising AI-driven solution
WO2022205755A1 (en) Texture generation method and apparatus, device, and storage medium
CN102075283B (en) Information steganography method and device
CN115131196A (en) Image processing method, system, storage medium and terminal equipment
Zhang et al. Semantic sensing and communications for ultimate extended reality
CN113810736B (en) AI-driven real-time point cloud video transmission method and system
CN110782503B (en) Face image synthesis method and device based on two-branch depth correlation network
CN104917532A (en) Face model compression method
Li et al. Towards communication-efficient digital twin via ai-powered transmission and reconstruction
EP4164221A1 (en) Processing image data
EP4224860A1 (en) Processing a time-varying signal using an artificial neural network for latency compensation
CN112541972A (en) Viewpoint image processing method and related equipment
CN114694065A (en) Video processing method, device, computer equipment and storage medium
CN116246010A (en) Human body three-dimensional reconstruction method based on image
CN115131228A (en) Image restoration method, system, storage medium and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant