WO2022129737A1 - Procédé et dispositif de compression de données représentatives d' une scène tridimensionnelle volumétrique en vue d'une décompression en temps réel pour un visionnage en ligne - Google Patents
- Publication number: WO2022129737A1 (PCT/FR2021/052252)
- Authority: WIPO (PCT)
- Prior art keywords: blocks, frames, texture, textures, meshes
- Prior art date
Classifications
- H04N13/161 — Encoding, multiplexing or demultiplexing different image signal components (stereoscopic and multi-view video systems)
- H04N19/597 — Predictive coding specially adapted for multi-view video sequence encoding
- H04N19/172 — Adaptive coding in which the coding unit is a picture, frame or field
- H04N19/176 — Adaptive coding in which the coding unit is a block, e.g. a macroblock
- H04N19/54 — Motion estimation other than block-based, using feature points or meshes
Definitions
- TITLE Method and device for compressing data representative of a volumetric three-dimensional scene for real-time decompression for online viewing.
- the invention relates to a method and a device for compressing data representative of a volumetric three-dimensional scene with a view to real-time decoding for online viewing of a volumetric video by an end user.
- the object of the invention is to compress volumetric data, which are representative of three-dimensional scenes, with a view to "online" viewing by an end user.
- the volumetric data are obtained from the capture of a three-dimensional scene by photogrammetry, that is to say by means of a set of cameras (106 cameras in this case), each capturing an image of the scene from a particular angle at a frequency of the order of 30 to 60 images per second.
- point clouds are representative of the surfaces, seen by the cameras, of the elements of the scene, and serve as a basis for the modeling of the scene by a mesh of triangles connected continuously to each other, onto which a texture is mapped, independently for each frame at first.
- a geometric tracking of the mesh is operated in time, consisting in approximating the meshes of the successive frames by deformation of a reference mesh belonging to a frame called a "keyframe", so that the meshes of the following frames have the same number of triangles and the same connectivity as this reference mesh.
- The frames whose mesh is defined on the basis of the mesh of a keyframe, located between two keyframes, are called "interframes".
- the compression of the data concerning the mesh benefits from the temporal redundancies between the neighboring frames whose mesh is based on that of the same key frame.
- a mesh of a keyframe is fully encoded, while only mesh variations are encoded for interframes, which is more economical in terms of resulting data volume and less computationally intensive to decode than the complete encoding and decoding of the meshes of each frame.
- each frame corresponds to a complete atlas of textures which must be encoded and then decoded entirely during compression and decompression, respectively, according to the principle of MPEG and H.264 compression standards.
- the video temporal compression of this type of process is based, in this order, on the definition of blocks by cutting out an image, the comparison between blocks of two adjacent frames, then the encoding/compression of a difference between two blocks; during decoding, this difference is applied individually to each pixel.
- the methods of this type are intended to provide the lowest possible transmission rate on a computer network, without consideration for the transfer of images between the processor and the graphics card of a computer processing these image data streams, or for the occupation of the memory of the graphics card.
- the invention aims to improve, on the one hand, the compression of the information relating to the meshes of the frames of a volumetric video stream, and on the other hand the compression of the information relating to the textures associated with these same frames.
- the invention relates more particularly to a computer-implemented method for compressing a volumetric video stream of a three-dimensional action scene represented by a plurality of frames, according to which a mesh and a texture are generated for each frame of the scene, and groups of frames are defined, each comprising a keyframe and inter-frames whose meshes are defined with respect to the mesh of the corresponding keyframe, the method comprising a step of compressing the information representative of the textures, this step comprising, for each group of frames, the steps of: compressing the information representative of the textures of each of the frames of the group according to a block compression method capable of forming blocks that can be directly exploited by a conventional graphics processing unit according to standard algorithms such as DXT, ASTC or PVRTC, thus defining blocks of pixels directly usable by a conventional graphics processing unit, and comprising blocks associated with the keyframe and blocks associated respectively with each of the inter-frames; and determining, from the blocks associated with the keyframe and the blocks associated with the inter-frames, on the one hand a set of first blocks capable of forming a composite texture of the keyframe, and on the other hand a set of second blocks able to modify, by iterative replacements, the composite texture of the keyframe so as to form approximate textures of the inter-frames.
- the compression of the information representative of the textures takes advantage in particular of the temporal correspondence which exists between the textures of the successive frames forming a volumetric video stream, avoiding the coding, the transmission, then the decoding of redundant information.
- the decompression of the video stream encoded by means of the compression method according to the invention has a low computational cost, and the level of compression can be easily chosen by means of a simple parameter, making it possible to adapt the compromise between the level of compression and the ease of decompression to the needs of the user, and therefore to obtain a volumetric video stream particularly suitable for online playback in real time.
- the steps of encoding the composite textures of the keyframes and the textures of the inter-frames can comprise compression by entropy coding;
- the step of determining the first blocks and the second blocks can comprise a step of evaluating quantified graphical differences between the blocks of a given position across a group of frames, this evaluation being based on calculations such as PSNR (Peak Signal to Noise Ratio) calculations between two blocks considered, resulting in numbers each representative of the graphic variations between these two blocks;
- the step of determining the first blocks and the second blocks can further comprise a step of constructing a graph comprising nodes interconnected along rows and columns, the quantified graphical differences being assigned as the cost of moving along a row between two nodes, and a cost being assigned to displacements along a column; and a step of determining a path of lowest cost in this graph, each start of path at a column being associated with one of the first blocks and each displacement along a column being associated with one of the second blocks, the lowest cost being equal to the sum of the costs of displacements each between two nodes on the same row and of displacements each between two nodes on the same column, the path being made up of such displacements;
- the path determination step can implement Dijkstra's algorithm;
- the method may further comprise a step of compressing the information representative of the meshes, comprising the steps of: compressing the information representative of the meshes by quantization; compressing the information representative of the quantized meshes according to a compression method which comprises the steps of compressing information representative of the triangles of the meshes of the keyframes according to the Edgebreaker algorithm, compressing information representative of the points of the meshes of the keyframes according to a prediction algorithm, compressing information representative of the texture coordinates of the keyframes according to a selective linear prediction algorithm, and compressing information representative of the points of the meshes of the inter-frames by differential coding; and compressing, by means of an entropy coding algorithm, the compressed information representative of said points of the meshes, said triangles and said texture coordinates of the keyframes, as well as the information representative of the points of the meshes of the inter-frames.
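As an illustration of the differential coding of the inter-frame mesh points, here is a hedged sketch (the function names and the quantization step are ours, not the patent's): since the inter-frame meshes share the keyframe's connectivity, only per-point displacements need to be stored.

```python
# Illustrative sketch of differential coding of inter-frame mesh points.
# The quantization step and names are assumptions, not the patent's values.
def encode_interframe_points(keyframe_points, interframe_points, step=0.001):
    """Quantize per-point displacements relative to the keyframe mesh."""
    return [tuple(round((b - a) / step) for a, b in zip(pk, pi))
            for pk, pi in zip(keyframe_points, interframe_points)]

def decode_interframe_points(keyframe_points, deltas, step=0.001):
    """Rebuild inter-frame points by applying the dequantized displacements."""
    return [tuple(a + d * step for a, d in zip(pk, dq))
            for pk, dq in zip(keyframe_points, deltas)]

kp = [(0.0, 0.0, 0.0)]
ip = [(0.01, 0.0, -0.002)]
deltas = encode_interframe_points(kp, ip)
print(deltas)  # [(10, 0, -2)]
```

The small integer deltas compress well under the entropy coding step mentioned above.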
- the invention may extend to:
- a computer program comprising instructions which, when the program is executed by a computer, lead it to implement the steps of the method
- a computer-readable medium comprising instructions which, when executed by a computer, lead it to implement the steps of the method.
- FIG. 1A illustrates a videogrammetry studio comprising cameras
- FIG. 1B is a diagram of the system for capturing images and processing data from the cameras of FIG. 1A
- FIG. 1C is a diagram of a process for producing volumetric videos of a scene
- FIG. 1D illustrates a cloud of points as defined in the method of FIG. 1C
- FIG. 1E illustrates a mesh corresponding to the point cloud of FIG. 1D
- FIG. 2A illustrates a succession of frames
- FIG. 2B illustrates a distribution of the frames of FIG. 2A into groups
- Figure 2C illustrates a texture
- FIG. 2D illustrates a complete texture for a keyframe
- FIG. 3A is a diagram of the process according to the invention
- FIG. 3B is a diagram of a particular step of the process according to the invention
- FIG. 4A is a table of error values
- FIG. 4B is a graph constructed from the table in FIG. 4A
- FIG. 5A is a particular case of the graph of FIG. 4B after processing according to the invention
- Figure 5B is a table summarizing the results of Figure 5A
- FIG. 5C illustrates the uncompressed and compressed textures of a succession of frames
- FIG. 6 is a diagram illustrating the decompression of a compressed volumetric video data stream according to the invention.
- a volumetric action scene, that is to say a scene taking place in time and in the three directions of space
- the scene is acted out by actors 105 in a videogrammetry studio 100 with a green screen.
- Such a studio is composed of a structure 110 surrounding a stage 120, and has the function of supporting a set of cameras 130 observing the scene 125 from a variety of viewpoints
- the cameras 130 are connected to a data storage and processing system 140, as well as to a user interface 150.
- a method of producing volumetric video using studio 100 includes the following steps in block diagram 155 of Figure 1C.
- the cameras capture the scene in a synchronized manner, for example at a frequency of between 30 and 60 images per second, each from its own point of view, which makes it possible to reconstruct the scene as a whole.
- the images captured by the cameras are optionally processed for calibration, to correct biases or other errors, and to subtract the background therefrom, according to conventional methods.
- these reprocessed images feed an algorithm known to those skilled in the art, implemented by the data processing system 140 so as to produce point clouds C as illustrated by FIG. 1D, representative of the objects constituting the scene, by establishing depth maps of the visible surfaces of these objects according to the principle of stereography, by comparing images of the same physical surface captured by different cameras according to different viewing angles.
- a mesh M as illustrated in FIG. 1E is generated from the point clouds obtained, along with a texture intended to be mapped onto it, according to conventional methods such as Poisson surface reconstruction and use of the UVAtlas source code (© Microsoft Corporation), for each frame independently.
- the scene is represented by a succession of frames F, each frame being associated on the one hand with information representative of a respective texture T, and on the other hand with information representative of a respective mesh M, independently from one frame to another.
- FIG. 2A illustrates 5 frames F1 to F5 in succession, each associated respectively with a mesh M1 to M5 and with a texture T1 to T5.
- Each mesh is made up of a set of points linked together by triangles, recreating the surface of the objects in the scene, and each triangle is associated with texture coordinates by means of a texture coordinate table, in order to properly map the texture onto the triangles.
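For illustration, a minimal mesh structure reflecting this description could look as follows (the field names are ours, not the patent's):

```python
# Illustrative mesh structure: points in 3-D space, triangles as index
# triples into the point list, and per-triangle texture coordinates
# mapping each vertex into the texture atlas. Names are assumptions.
from dataclasses import dataclass

@dataclass
class Mesh:
    points: list     # [(x, y, z), ...]
    triangles: list  # [(i, j, k), ...] indices into points
    uvs: list        # [((u, v), (u, v), (u, v)), ...] one triple per triangle

tri_mesh = Mesh(
    points=[(0, 0, 0), (1, 0, 0), (0, 1, 0)],
    triangles=[(0, 1, 2)],
    uvs=[((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))],
)
print(len(tri_mesh.triangles))  # prints 1
```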
- the triangles are used as elementary surfaces decomposing the surfaces of the objects as seen by the cameras.
- a step 168 of geometric and photometric tracking is implemented in order to distribute the frames F into groups of frames FGr each comprising a key frame KF and a plurality of inter-frames IF.
- FIG. 2B illustrates 18 frames F1 to F18 divided into three groups of frames FGr1 to FGr3, each comprising a keyframe KF and a variable number of inter-frames IF depending on the group; these frames are obtained in a stream 172 at the output of step 170.
- a reference mesh is associated with each keyframe
- the meshes of the inter-frames are defined by deformation of the reference meshes, that is to say by means of information representative of deformations of the meshes associated with the corresponding keyframes
- a texture is associated with each frame.
- Each texture can be represented by an image comprising patches on a uniform background, as illustrated by FIG. 2C.
- the information representative of the meshes consists, in the case of the keyframes, of information representative of the points of the mesh, i.e. the location data of these points in three-dimensional space; of information representative of the triangles of the mesh, i.e. surface data each bounded by three of the points of the mesh; and of information representative of the textures associated with the respective mesh triangles, i.e. texture coordinate data.
- each frame is associated with a texture specific to it within a stream 172 of video data comprising texture data T and mesh data M, each texture being a digital image.
- in a test step 301, it is determined whether the incoming data is mesh data M or texture data T.
- block compression is applied to each texture associated with a frame of the stream of frames, the textures being retrieved from the texture data T, so that the texture of each frame is associated with a set of blocks.
- Block compression is a conventional compression method that reduces the amount of memory required to store color data, in which blocks of pixels (such as 4-pixel by 4-pixel or 8-pixel by 8-pixel squares) are compressed on the assumption that the variations within the same block are very small, according to standard algorithms such as DXT, ASTC or PVRTC.
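As an illustration of the memory savings involved, the following sketch (ours, not taken from the patent) computes the footprint of a DXT1-style encoding, in which each 4×4 pixel block is stored in 8 bytes:

```python
# Illustrative sketch of DXT1-style block compression arithmetic: each
# 4x4 pixel block is stored as two 16-bit reference colors plus 2 bits
# of interpolation index per pixel, i.e. 8 bytes per block.
def dxt1_compressed_size(width, height):
    blocks = (width // 4) * (height // 4)
    return blocks * 8

def uncompressed_size(width, height, bytes_per_pixel=4):
    return width * height * bytes_per_pixel

w, h = 1024, 1024
ratio = uncompressed_size(w, h) / dxt1_compressed_size(w, h)
print(ratio)  # 8.0 : an 8:1 reduction over 32-bit RGBA
```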
- the term "block" refers to blocks obtained by an algorithm of the type mentioned in the previous paragraph, which are directly exploitable by a conventional graphics processing unit, or GPU, i.e. without requiring decompression or other processing.
- the "blocks" as considered in the method according to the invention are distinct from the "blocks" used in compression methods such as those of JPEG and MPEG coding of the AVC (Advanced Video Coding) type mentioned above in the "Prior Art" section.
- the blocks used in AVC-type processes result directly from cutting up an image and are therefore each a group of pixels.
- a characteristic of the blocks of AVC-type methods is that they are usually not considered to be directly exploitable by a graphics processing unit, unlike the blocks of the method according to the invention.
- block compression according to the invention results in data that can be decoded directly by conventional graphics processing units.
- this feature represents a decisive advantage by reducing the amount of data to be transferred and the computational cost of decompressing the transferred data.
- a strong advantage of the method according to the invention compared to conventional methods is thus to take advantage of the ability of GPUs to perform decompression operations at block level to facilitate the processing of the video streams obtained.
- block compression within the meaning of the invention is particularly suited to the compression of textures, in particular of the atlas type, and much more suitable than image compression as used in JPEG and MPEG coding.
- the blocks directly interpretable by a GPU as considered in this invention present a difficulty in their compression due to the fact that it is not possible to modify a block by a difference.
- a texture reduction method is applied during a step 304, according to an original method referred to below as reduction by partial texture.
- one chooses, among blocks associated with the key frame and blocks associated with the inter-frames of a group of frames, on the one hand a set (210) of first blocks capable of forming a composite texture of the keyframe, and on the other hand a set (220) of second blocks able to modify by iterative replacements the composite texture of the keyframe so as to form approximate textures of the inter-frames.
- the only encoded blocks are those which provide significantly new information.
- This feature allows a considerable reduction in the volume of texture data to be encoded during compression, then to be transmitted and decoded when viewing the video online.
- Blocks encoded for interframes are used to modify the composite texture of the keyframe by iteration with each new frame, when necessary to maintain acceptable visual quality.
- FIG. 2D illustrates this situation, with a complete texture of a keyframe, complete texture formed by a set 210 of 256 blocks, and a set 220 of 92 blocks forming a partial texture of an interframe corresponding to this keyframe, the missing blocks being considered sufficiently close to those of the keyframe not to have to be encoded again.
- blocks already encoded are used which are graphically close to these non-encoded blocks, as long as the degradation in terms of visual quality remains acceptable.
- the blocks already encoded are either those of the keyframe, or blocks of a partial texture that have already replaced those of the keyframe, so that textures approximating the textures of the inter-frames are obtained by modifying the texture of the keyframe by successive iterations.
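The iterative substitution described above can be sketched as follows on the decoding side; the container types and names are ours, and block contents are stood in for by placeholder strings:

```python
# Hedged sketch of the "reduction by partial texture" decoding side: the
# composite texture of the keyframe is patched block by block as the
# inter-frames arrive. Names and data layout are assumptions.
def decode_group(keyframe_blocks, interframe_updates):
    """keyframe_blocks: dict position -> block (the composite texture).
    interframe_updates: one dict per inter-frame, mapping a block
    position to its replacement (only the re-encoded blocks appear)."""
    composite = dict(keyframe_blocks)
    frames = [dict(composite)]           # texture of the keyframe
    for updates in interframe_updates:
        composite.update(updates)        # iterative block substitution
        frames.append(dict(composite))   # approximate inter-frame texture
    return frames

textures = decode_group(
    {0: "A0", 1: "B0"},
    [{1: "B1"},   # only block 1 changed enough to be re-encoded
     {}],         # no block re-encoded: previous texture reused as-is
)
print(textures)
# [{0: 'A0', 1: 'B0'}, {0: 'A0', 1: 'B1'}, {0: 'A0', 1: 'B1'}]
```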
- the frame stream data consists of groups of frames, each group comprising a keyframe followed by interframes, in temporal succession.
- the textures associated with the frames of the same group naturally resemble each other, as a consequence of the very definition of the groups by geometric and photometric tracking, and therefore have a large number of blocks in common that it is not necessary to encode multiple times, the initial encoding of a reconstituted texture for the keyframe and of certain chosen blocks for the inter-frames being sufficient.
- the aim of the reduction by partial texture is to determine which blocks are to be used for the keyframe texture and which blocks need updating within each group of frames, in such a way as to minimize the number of blocks to encode while maintaining acceptable visual quality of the final video.
- the graphic differences between the blocks of each frame are evaluated, either from block to block directly, or from a block to the region of an uncompressed texture corresponding to that block, as described below.
- Nf different blocks B1 to BNf follow one another at the same position respectively during the frames F1 to FNf and can be considered as the temporal variants of a given block during the Nf frames F1 to FNf considered.
- FIG. 5C represents a group of a sequence of Nf frames each associated with an uncompressed texture Tnc and with a compressed texture Tc, the blocks B1 to BNf corresponding to the same position Pos in the respective compressed textures Tc, and images (or sets of pixels) I1 to INf corresponding to this same position in the respective uncompressed textures Tnc.
- the blocks B1 to BNf and the images I1 to INf consist of sets of pixels of the same position, of the same geometry and of the same dimension.
- a PSNR is calculated between each of the blocks B1 to BNf of the respective frames F1 to FNf and each of the images I1 to INf of the same position in the uncompressed textures of the corresponding frames of the same group, which gives Nf×Nf PSNR calculations to perform.
- the PSNRs constitute an indication of the resemblance between two images, here a first image which is that of a block defined during the step of compression by block and a second image which is that constituted by a group of pixels corresponding to the block in an uncompressed image.
- the PSNR is expressed on a logarithmic scale and a large value of PSNR is indicative of a strong resemblance between two images.
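A minimal PSNR computation between two equally sized pixel sets, following the usual definition 10·log10(MAX²/MSE); the function name and the flat-list representation of the pixels are our own simplification:

```python
# Illustrative PSNR between two pixel sets of identical geometry, as used
# here as a resemblance score between a compressed block and the matching
# region of an uncompressed texture. Flat 8-bit pixel lists are assumed.
import math

def psnr(pixels_a, pixels_b, max_value=255):
    mse = sum((a - b) ** 2 for a, b in zip(pixels_a, pixels_b)) / len(pixels_a)
    if mse == 0:
        return float("inf")  # identical images: maximal resemblance
    return 10 * math.log10(max_value ** 2 / mse)
```

A large PSNR (logarithmic scale) indicates a strong resemblance; identical inputs give an infinite PSNR.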
- a step 304-3 consists in constructing, for each block position in a group of Nf frames, a square table 400 comprising NfxNf entries, each consisting of one of the error values ErrVal calculated in step 304-2.
- Each row of the table 400 is dedicated to the evaluations of the temporal variations of a given position block over the course of Nf frames F1 to FNf, referring here to the images I1 to INf corresponding to the blocks in the corresponding uncompressed textures.
- Each column of the table 400 is dedicated to a frame, the frames being classified in their order of appearance in the volumetric video stream and being identified F1 to FNf.
- the table entries are filled in by inserting therein the error values ErrVal calculated in step 304-2 in the following manner.
- An entry of coordinates (p;q), at the p-th row and the q-th column of the table 400, corresponds to an error value ErrVal(Bp/Iq) calculated between the p-th block Bp at a given position of the compressed texture of the p-th frame and the image Iq of the set of pixels at this given position of the uncompressed texture of the q-th frame Fq, and expresses the amplitude of the graphic difference between this p-th block and this set of pixels.
- the second entry in the first row of the table corresponds to an error value ErrVal(B1/I2) calculated between the first block B1 of the first frame F1 and the set of pixels I2 corresponding to this block in the uncompressed texture of the second frame F2 before block compression, and expresses the amplitude of the graphic difference between this first block B1 and this set of corresponding pixels I2.
- the diagonal entries (B1/F1) to (BNf/FNf) of the table express the amplitude of the graphic difference between a block and the corresponding group of pixels of the uncompressed texture of the same frame, indicating the degradation of video image quality introduced by the block compression step.
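The construction of the Nf×Nf table can be sketched as follows, assuming a caller-supplied error function `err` (for example one derived from the PSNR, a lower PSNR, i.e. less resemblance, giving a larger error value); all names are ours:

```python
# Illustrative sketch of the Nf x Nf error table of step 304-3.
# err(block, image) is a caller-supplied graphic-difference measure.
def build_error_table(blocks, images, err):
    """blocks[p]: block Bp at a fixed position in frame p's compressed texture.
    images[q]: pixels Iq at the same position in frame q's uncompressed texture.
    Entry (p, q) is ErrVal(Bp/Iq): the error of reusing Bp to render frame q."""
    nf = len(blocks)
    return [[err(blocks[p], images[q]) for q in range(nf)] for p in range(nf)]
```

With a toy scalar error such as `err = lambda b, i: abs(b - i)`, the diagonal entries measure the degradation introduced by block compression itself, as described above.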
- a possible approach defined by the inventors to minimize the quantity of data to be encoded and then decoded during the reading of a video stream consists in minimizing the number of blocks to be encoded by making a choice among the blocks by means of the table 400, which indicates the errors introduced by the block compression of step 302, not only within a frame but also between the frames of the same group for a given block position.
- the choice amounts to determining within the table a path having the lowest possible cost, going from any entry of the first column on the left of the table (corresponding to the first frame of the group considered) to any entry of the last column (corresponding to the last frame of the group considered), by moving either to the right or vertically (up or down), a cost being introduced for each horizontal displacement and for each vertical displacement, and the overall cost induced by the path followed being minimized.
- One way of optimizing the path is to assign a first cost to the degradation of visual quality induced by the reuse of the same block to encode the images of several frames, and a second cost to the computational intensity, corresponding to an increase in the amount of data to be encoded, then to determine the path minimizing the overall cost, the overall cost representing the sum of the first cost and the second cost.
- the optimal path can be determined by classical methods of graph theory, using for example the graph 450 of FIG. 4B constructed from the table 400, this graph being made up of nodes and of possible displacements between these nodes, the path being determined according to parameters, here the costs mentioned above, defined by the practitioner according to his objectives and priorities.
- the central nodes of the graph 450 are denoted Up,q and correspond to the entries of the p-th column and the q-th row of the table 400.
- a node Up,q of the graph corresponds to the use of a block of the block position considered in the texture of the p-th frame to encode the texture of the q-th frame.
- the nodes are arranged in the same way as the table entries, in Nf rows and Nf columns, a fictitious starting node Ud being connected to each of the nodes corresponding to the entries ErrVal(B1/F1) to ErrVal(BNf/F1) of the first column of the table, the number of frames Nf being equal to 4 in this example.
- each ellipse represents a node and each arrow represents a possible displacement between two nodes.
- the possible movements in the graph are horizontal, within the same row, or vertical, within the same column.
- the possible horizontal displacements are from a given node to the node immediately adjacent to its right, as represented by the solid arrows of the graph 450, for an increase of visual degradation taken to correspond to the value ErrVal of the table entry corresponding to the end node of the displacement considered.
- the possible vertical displacements are made between two nodes located one above the other, immediately adjacent or not, as represented by the arrows in dotted lines, for a computation cost Ccalc defined by the user.
- the optimal path sought can be determined by, among other conventional methods, the implementation of Dijkstra's algorithm, generally used to determine the shortest path between two points of a graph made up of a plurality of interconnected nodes, which is the case of the graph 450.
- the optimal path, i.e. the one having the lowest overall cost, is considered to be the one allowing a minimization of the volume of data to be encoded that is compatible with an acceptable video quality.
- the video quality is determined by the practitioner by choosing the cost Ccalc according to criteria depending on his priorities: a high cost Ccalc favors a high compression rate, a low cost Ccalc induces a high visual quality, and an intermediate cost Ccalc leads to a compromise between compression ratio and visual quality.
- the cost Ccalc is preferably of the order of 0.0001, for example between 0.00001 and 0.001, or could for example be initialized to an average value of the error value ErrVal.
- FIG. 5A illustrates an application of the graph 450 to the particular case of a group of 5 frames for a given block position, with an optimal path Popt passing through the nodes U2,1, U2,3, U4,3 and U4,5, thus including a change of row between the nodes U2,3 and U4,3.
- the block B2 of the second frame is encoded and used for the first and second frames F1 and F2 when encoding the video, i.e. for the composite texture of the keyframe and the first interframe.
- This block B2 does not have to be re-encoded for the second frame since it has already been encoded for the first.
- Block B4 of the fourth frame is encoded and replaces block B2 for frames 3 and following F3, F4 and F5.
- Block B4 is encoded only once but is used for 3 frames.
- Step 304 as described above only applies to a given block position within the textures of the interframes of each frame group.
- This step 304 is therefore repeated to be applied at each block position to determine which blocks to use to encode the textures as a whole, as indicated by the L loop in diagram 300.
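By way of illustration, the path search of steps 304 can be sketched as a shortest-path computation with Dijkstra's algorithm, as suggested above. The sketch below is a minimal, hypothetical rendering rather than the claimed implementation: the table layout `errval[i][j]` (the ErrVal entry for the block of frame i used to encode frame j, at one block position), the cost `ccalc` of a vertical displacement, and the function name are all assumptions for the example.

```python
import heapq
import itertools

def select_blocks(errval, ccalc):
    """Pick, for one block position, which frame's block encodes each frame
    of a group, by finding a minimum-cost path through the error table.

    errval[i][j]: graphic error when the block of frame i encodes frame j
    (stand-in for the ErrVal entries of table 400).
    ccalc: cost charged for each vertical displacement, i.e. for encoding
    a new block.
    Returns assign, with assign[j] = index of the frame whose block is
    used to encode frame j.
    """
    nf = len(errval)
    order = itertools.count()          # heap tie-breaker
    heap = []                          # (cost, tie, block i, frame j, parent)
    for i in range(nf):                # fictitious start node -> first column
        heapq.heappush(heap, (errval[i][0], next(order), i, 0, None))
    seen, prev, goal = set(), {}, None
    while heap:
        d, _, i, j, parent = heapq.heappop(heap)
        if (i, j) in seen:
            continue
        seen.add((i, j))
        prev[(i, j)] = parent
        if j == nf - 1:                # any node of the last column ends the path
            goal = (i, j)
            break
        # horizontal move: reuse block i for the next frame, pay its ErrVal
        heapq.heappush(heap, (d + errval[i][j + 1], next(order), i, j + 1, (i, j)))
        # vertical moves: switch to a newly encoded block k, pay ccalc
        for k in range(nf):
            if k != i:
                heapq.heappush(heap, (d + ccalc, next(order), k, j, (i, j)))
    assign, node = [None] * nf, goal
    while node is not None:            # walk back; the last block chosen in a
        i, j = node                    # column is the one encoding that frame
        if assign[j] is None:
            assign[j] = i
        node = prev[node]
    return assign
```

Run once per block position (the L loop of diagram 300); with suitable error values and cost, the output reproduces assignments of the kind shown in FIG. 5A, where one block serves frames F1-F2 and another serves F3-F5.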
- a frame is a KF key frame or an IF inter-frame.
- all of the blocks necessary for encoding the complete texture associated with a keyframe, as determined at step 304 for each block position, are retrieved so as to form a composite texture for this keyframe, and a conventional compression method is applied to this composite texture, such as an entropy compression method, for example Huffman coding.
- a complete composite texture is encoded using blocks of different textures associated with respective frames of the group of frames of the key frame considered, as described above.
- This point represents a first difference from conventional texture encoding methods, in which the texture associated with a keyframe is encoded exclusively with the blocks resulting from its own block compression, independently of the textures of the neighboring frames.
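The assembly of a composite keyframe texture followed by an entropy pass can be sketched as follows. This is a hypothetical illustration: the block layout (`frame_blocks[j][p]` as bytes), the `chosen_frame` selection table, and the use of `zlib` as a stand-in for the entropy coder (e.g. Huffman coding) are all assumptions, not the patent's actual formats.

```python
import zlib

def build_composite_texture(frame_blocks, chosen_frame, position_count):
    """Assemble a keyframe's composite texture, then entropy-compress it.

    frame_blocks[j][p]: compressed block (bytes) at position p in the
    texture of frame j of the group (hypothetical layout).
    chosen_frame[p]: index of the frame whose block was selected at
    position p by the path search of step 304.
    zlib stands in here for the entropy compression of the description.
    """
    composite = b"".join(frame_blocks[chosen_frame[p]][p]
                         for p in range(position_count))
    return zlib.compress(composite)
```

The composite texture thus mixes blocks originating from several frames of the group, which is precisely what distinguishes it from a conventionally encoded keyframe texture.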
- the blocks newly encoded for this inter-frame as determined at step 304 are retrieved for each inter-frame.
- each frame is associated with a mesh specific to it within a stream 172 of volumetric video data comprising texture data T and mesh data M, with a reference mesh associated with each keyframe and information relating to the modifications of the reference mesh for each inter-frame in a given group of frames.
- In a test step 301, it is determined whether the incoming data are mesh data M or texture data T.
- a conventional method of compression by quantization is applied to each mesh retrieved from the mesh data M, consisting in discretizing the coordinates of the points defining the mesh, as well as the associated texture coordinates, onto a subset of values, to reduce the memory occupied.
- a given quantized mesh is associated with a keyframe or with an interframe.
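The quantization step can be sketched generically as follows. The parameters (bit depth, bounding interval) and function names are illustrative assumptions, not values from the patent.

```python
def quantize(coords, bits, lo, hi):
    """Discretize coordinates onto 2**bits - 1 levels over [lo, hi],
    a generic sketch of compression by quantization."""
    levels = (1 << bits) - 1
    scale = levels / (hi - lo)
    return [round((c - lo) * scale) for c in coords]

def dequantize(qcoords, bits, lo, hi):
    """Inverse mapping, of the kind applied at decoding (step 628)."""
    step = (hi - lo) / ((1 << bits) - 1)
    return [lo + q * step for q in qcoords]
```

The reconstruction error is bounded by the level spacing, which is the memory/precision trade-off the step exploits.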
- a compression method is applied to the information representative of the meshes of the key frames, comprising the sub-steps 354-1, 354-2 and 354-3, each adapted to a particular type of data which, taken together, define the meshes of the keyframes.
- the information representative of the keyframe triangles is compressed using an algorithm known as "Edgebreaker", known for example from J. Rossignac, "Edgebreaker: Connectivity Compression for Triangle Meshes", IEEE Transactions on Visualization and Computer Graphics, 1999.
- the information representative of the points, or vertices, of the key frames is compressed using a prediction algorithm according to known methods, as explained for example in the reference C. Touma and C. Gotsman, Triangle Mesh Compression, Proceedings Graphics Interface 98, p. 26-34,1998.
- the information representative of the texture coordinates of the triangles of the keyframes is compressed by a selective linear prediction algorithm, as described for example by Isenburg, M. and Snoeyink, J. in "Compressing texture coordinates with selective linear predictions", Proceedings of Computer Graphics International 2003.
- This so-called selective linear prediction algorithm encodes the texture coordinates per point (vertex) whereas they are usually encoded per triangle, which reduces their number and allows better compression as well as optimization of calculations during decoding.
- a method of compression by differential coding is applied to the information representative of the points, or vertices, of the inter-frames.
- the triangles and texture coordinates of the interframes are the same as those of the corresponding keyframe; their differences are therefore zero within each group of frames, and this information does not have to be re-encoded.
- Differential coding, also called delta compression or delta encoding, is a lossless data compression technique consisting of transforming data into the series of differences between successive data, a technique that is particularly effective in the present case.
- In step 358, the information compressed during steps 354-1 to 354-3 and 356 is compressed again, this time by means of an entropy coding algorithm, for example FSE (Finite State Entropy).
- while steps 354-1 to 354-3, 356 and 358 are, taken individually, already known to those skilled in the art, their combination as described here is new and leads to results superior to known combinations of conventional methods with a view to obtaining a compressed video stream suitable for viewing online.
- the video data relating to the meshes can, when they are compressed according to the invention, easily be decoded in real time by conventional computing systems.
- step 360 the data streams resulting from steps 308, 310 and 358 are combined into a video file capable of being decoded in streaming for viewing via a computer network.
- the information representative of the frames is sent via a computer network and decoded sequentially, as it arrives at the computer system of the user in the form of a stream 602 of compressed data, according to the diagram in FIG. 6.
- the data representative of each frame arrive sequentially according to the order of their respective frames in the video stream and are decoded in this order, the texture data T and the mesh data M being separated.
- the frames include the keyframes and the interframes identified respectively by KF and IF in the figure and represent the pieces of information to be decoded, M and T for the information representative of the meshes and the textures, respectively.
- the data representative of the meshes and the textures are respectively identified to be decoded separately thereafter.
- In a step 610, the data representative of the textures T are subjected to entropy decoding.
- In step 612, the data decoded in step 610 and corresponding respectively to the key frames KF and to the inter-frames IF are identified.
- the data representative of the textures of the KF keyframes do not require additional processing because, although still in a compressed format, they result from the block compression of step 302, the products of which can be directly processed by conventional graphics processing units.
- the data representative of the textures of the inter-frames IF, which depend on the data of the other frames, are processed so as to take up the data of the previous frame and modify them by replacing, among the blocks, those for which a new block has been encoded following step 310 with the corresponding new block.
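The interframe texture update at decoding can be sketched as a simple block substitution; the container format (a dict of newly encoded blocks keyed by position) is an assumption for the example.

```python
def decode_interframe_texture(previous_blocks, new_blocks):
    """Rebuild an interframe texture from the previous frame's blocks.

    previous_blocks: list of the blocks of the previously decoded frame.
    new_blocks: {block position: newly encoded block} carried by the
    interframe (hypothetical container layout).
    Only the positions present in new_blocks are replaced; every other
    block is reused as-is from the previous frame.
    """
    blocks = list(previous_blocks)   # copy so the previous frame stays intact
    for position, block in new_blocks.items():
        blocks[position] = block
    return blocks
```

This is why the decoding remains light: most blocks of an interframe require no work at all, and the few replaced blocks are already in a GPU-ready compressed format.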
- the data representative of the meshes M are first subjected to a decoding of the entropic encoding by FSE, which is carried out according to conventional algorithms of very fast execution.
- the data corresponding respectively to the key frames KF and to the inter-frames IF are identified.
- the data M representative of the meshes of the key frames KF obtained following the steps 354-1, 354-2 and 354-3 are decoded independently of the data of the other frames according to conventional methods, fast with regard to the types of compression used (prediction or Edgebreaker) for this data.
- In step 626, the data representative of the meshes M of the interframes IF obtained following step 356, and which depend on the already decoded data of the key frames KF on which they respectively depend, are decoded according to conventional methods, very fast since it is only a delta compression on the positions of the vertices.
- In step 628, the de-quantization of the mesh data resulting from steps 624 and 626 is carried out, according to conventional methods.
- the data obtained following the operations 610, 614 and 628 are processed by a data processing unit and/or a graphics processing unit in a conventional manner in order to proceed with the display 630 of the video.
- the weight of the data as well as the speed of decoding have been significantly improved, the decoding being able to be implemented by conventional calculation units.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
- Processing Or Creating Images (AREA)
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE112021006478.9T DE112021006478T5 (de) | 2020-12-17 | 2021-12-09 | Verfahren und vorrichtung zum komprimieren von einer szene mit dreidimensionalem volumen repräsentierenden daten im hinblick auf eine echtzeit-dekompression für eine online-betrachtung |
GB2309035.0A GB2616566A (en) | 2020-12-17 | 2021-12-09 | Method and device for compressing data representative of a volumetric three-dimensional scene with a view to real-time decompression for online viewing |
JP2023537222A JP2024503787A (ja) | 2020-12-17 | 2021-12-09 | ボリュメトリック3次元シーンを表すデータをオンライン視聴のためにリアルタイム解凍するためのビューを用いて圧縮する方法及び装置 |
US18/267,894 US20240040101A1 (en) | 2020-12-17 | 2021-12-09 | Method and device for compressing data representative of a volumetric three-dimensional scene with a view to real-time decompression for online viewing |
KR1020237023968A KR20230119694A (ko) | 2020-12-17 | 2021-12-09 | 온라인 시청을 위한 실시간 압축해제를 위해 볼류메트릭3차원 장면을 나타내는 데이터를 압축하기 위한 방법 및 디바이스 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FRFR2013513 | 2020-12-17 | ||
FR2013513A FR3118379B1 (fr) | 2020-12-17 | 2020-12-17 | Procédé et dispositif de compression de données représentatives d'une scène tridimensionnelle volumétrique en vue d’une décompression en temps réel pour un visionnage en ligne. |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022129737A1 true WO2022129737A1 (fr) | 2022-06-23 |
Family
ID=74871569
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/FR2021/052252 WO2022129737A1 (fr) | 2020-12-17 | 2021-12-09 | Procédé et dispositif de compression de données représentatives d' une scène tridimensionnelle volumétrique en vue d'une décompression en temps réel pour un visionnage en ligne |
Country Status (7)
Country | Link |
---|---|
US (1) | US20240040101A1 (fr) |
JP (1) | JP2024503787A (fr) |
KR (1) | KR20230119694A (fr) |
DE (1) | DE112021006478T5 (fr) |
FR (1) | FR3118379B1 (fr) |
GB (1) | GB2616566A (fr) |
WO (1) | WO2022129737A1 (fr) |
2020
- 2020-12-17 FR FR2013513A patent/FR3118379B1/fr active Active
2021
- 2021-12-09 GB GB2309035.0A patent/GB2616566A/en active Pending
- 2021-12-09 WO PCT/FR2021/052252 patent/WO2022129737A1/fr active Application Filing
- 2021-12-09 JP JP2023537222A patent/JP2024503787A/ja active Pending
- 2021-12-09 DE DE112021006478.9T patent/DE112021006478T5/de active Pending
- 2021-12-09 KR KR1020237023968A patent/KR20230119694A/ko active Search and Examination
- 2021-12-09 US US18/267,894 patent/US20240040101A1/en active Pending
Non-Patent Citations (3)
Title |
---|
FARAMARZI ESMAEIL ET AL: "Mesh Coding Extensions to MPEG-I V-PCC", 21 September 2020 (2020-09-21), pages 1 - 5, XP055837185, Retrieved from the Internet <URL:https://ieeexplore.ieee.org/stampPDF/getPDF.jsp?tp=&arnumber=9287057&ref=aHR0cHM6Ly9zY2hvbGFyLmdvb2dsZS5jb20v> [retrieved on 20210902], DOI: 10.1109/MMSP48831.2020.9287057 * |
JEAN-EUDES MARVIE (INTERDIGITAL) ET AL: "[V-PCC][EE2.6-related] Proposition of an anchor and a test model for coding animated meshes", no. m55327, 5 October 2020 (2020-10-05), XP030292836, Retrieved from the Internet <URL:https://dms.mpeg.expert/doc_end_user/documents/132_OnLine/wg11/m55327-v1-m55327-v1.zip m55327_VideoCodingOfGenericAnimatedMeshes.docx> [retrieved on 20201005] * |
TANG DANHANG ET AL: "Real-time compression and streaming of 4D performances", ACM TRANSACTIONS ON GRAPHICS, ACM, NY, US, vol. 37, no. 6, 4 December 2018 (2018-12-04), pages 1 - 11, XP058464802, ISSN: 0730-0301, DOI: 10.1145/3272127.3275096 * |
Also Published As
Publication number | Publication date |
---|---|
US20240040101A1 (en) | 2024-02-01 |
FR3118379A1 (fr) | 2022-06-24 |
KR20230119694A (ko) | 2023-08-16 |
FR3118379B1 (fr) | 2024-03-15 |
DE112021006478T5 (de) | 2023-11-02 |
GB2616566A (en) | 2023-09-13 |
JP2024503787A (ja) | 2024-01-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21847996 Country of ref document: EP Kind code of ref document: A1 |
DPE1 | Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101) | ||
ENP | Entry into the national phase |
Ref document number: 202309035 Country of ref document: GB Kind code of ref document: A Free format text: PCT FILING DATE = 20211209 |
WWE | Wipo information: entry into national phase |
Ref document number: 18267894 Country of ref document: US Ref document number: 2023537222 Country of ref document: JP |
ENP | Entry into the national phase |
Ref document number: 20237023968 Country of ref document: KR Kind code of ref document: A |
WWE | Wipo information: entry into national phase |
Ref document number: 112021006478 Country of ref document: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 21847996 Country of ref document: EP Kind code of ref document: A1 |