US20220385941A1 - Volumetric video in web browser - Google Patents

Volumetric video in web browser

Info

Publication number
US20220385941A1
Authority
US
United States
Prior art keywords
faces
geometry
frames
face
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US17/824,045
Other languages
English (en)
Inventor
Ofer RUBINSTEIN
Yigal Eilam
Michael Birnboim
Vsevolod KAGARLITSKY
Gilad TALMON
Michael Tamir
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tetavi Ltd
Original Assignee
Tetavi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tetavi Ltd filed Critical Tetavi Ltd
Priority to US17/824,045 priority Critical patent/US20220385941A1/en
Assigned to TETAVI, LTD. reassignment TETAVI, LTD. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: RUBINSTEIN, Ofer, BIRNBOIM, MICHAEL, EILAM, YIGAL, KAGARLITSKY, VSEVOLOD, TALMON, GILAD, TAMIR, MICHAEL
Publication of US20220385941A1 publication Critical patent/US20220385941A1/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00: Image coding
    • G06T 9/001: Model-based coding, e.g. wire frame
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/597: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00: Image coding
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/169: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N 19/186: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being a colour or a chrominance component

Definitions

  • the present invention generally pertains to a system and method for ensuring a match between geometry and texture when playing volumetric videos in a web browser.
  • volumetric videos typically have three types of information, stored in separate files: audio information in an audio file, texture information in a compressed video file and geometric information in a mesh file.
  • if there is a mismatch between the frames being displayed, the quality of the replay is degraded. For example, if the texture frame number differs from the mesh frame number, the displayed image can look ragged. If the audio frame number does not match the mesh frame number, a person's lip movements will not be correlated with the sound. The latter mismatch will not be dealt with herein.
  • the present invention discloses methods of ensuring that the texture decoder frame number and the mesh decoder frame number will consistently match, ensuring a better-quality displayed image, even in the limited processing environment of a web browser.
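The cost of a mismatch described above can be made concrete with a small check; the function and field names below are illustrative, not taken from the patent.

```python
# Minimal sketch (invented names): flag texture/mesh and audio/mesh frame
# mismatches for a single display tick.
def frames_in_sync(texture_frame: int, mesh_frame: int, audio_frame: int) -> dict:
    return {
        # A texture/mesh mismatch makes the displayed image look ragged.
        "texture_matches_mesh": texture_frame == mesh_frame,
        # An audio/mesh mismatch breaks lip sync (not addressed by the patent).
        "audio_matches_mesh": audio_frame == mesh_frame,
    }
```

A player could run a check like this per rendered frame and, on a texture/mesh mismatch, re-request or skip geometry rather than display a ragged image.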
  • U.S. Pat. No. 8,284,204 discloses a device for rendering to multiple viewpoints of a stereoscopic display.
  • the device includes vertex shaders which receive vertices corresponding to primitives and process viewpoint dependent information.
  • the device also includes a primitive replication unit which replicates primitives according to a number of viewpoints supported by the stereoscopic display.
  • the primitive replication unit adds unique view tags to each of the primitives which identify the viewpoint that the respective primitive is destined for.
  • Each replicated primitive is processed by a rasterizer and converted into pixels.
  • the rasterizer adds a view tag to the rasterized pixels so that the pixels identify a respective primitive and identify a respective pixel buffer that the pixel is destined for.
  • the pixels can then be processed by a pixel processing unit and written to a pixel buffer corresponding to a respective viewpoint.
  • the pixels are subsequently output to the stereoscopic display.
  • U.S. Patent Application Publication No. US2017/0078703 discloses a method comprising: identifying at least one boundary in an image based on one or more signal characteristics; classifying a region of the image containing the boundary as a region containing an edge; determining context-based information about the region to be signaled in a bitstream of video data; partitioning the region at least in two along the edge; and applying a transform on the region.
  • FIG. 1 schematically illustrates a flow chart of the prior art.
  • FIG. 2 schematically illustrates a flow chart of an embodiment of the present invention.
  • FIG. 3 schematically illustrates a flow chart of an embodiment of the present invention.
  • FIGS. 4-5 schematically illustrate a flow chart of an embodiment of the present invention.
  • FIGS. 6-7 schematically illustrate a flow chart of an embodiment of the present invention.
  • data set hereinafter refers to a set of data which can be downloaded.
  • a data set contains at least one of texture data, geometry data, audio data and metadata.
  • a data set will contain both texture data and geometry data as well as audio data and can, but does not necessarily, contain metadata.
  • color data hereinafter refers to colors in a data set.
  • Each color datum is typically a pixel in a 2D mapping; typically, the 2D mapping comprises a texture atlas, however, the type of mapping is not relevant to the methods disclosed herein.
  • face or ‘geometry face’ hereinafter refers to the smallest useful area on a model.
  • a complete set of faces describes the geometry of the model. Faces are most commonly triangles, less commonly quadrilaterals, and can be any desired planar geometrical shape.
  • a face can comprise one or more colors.
  • Geometry data hereinafter refers to the geometric data in a data set.
  • the geometry data typically comprise the locations of vertices of faces, with a link between the 3D geometric location in a 3D model of the vertex and the 2D location of the vertex in a 2D atlas, where the atlas stores the color information.
  • Geometry data are typically stored as (location in 3D) and (location on a conformal or other 2D atlas).
  • One typical method of storing geometry data is XYZ UV, where (x,y,z) is the location of the datum in 3D space, while (u, v) is the location of the datum in the 2D atlas.
  • Each geometry datum can comprise a vertex number.
  • the vertex number can be stored instead of the (x,y,z) location, thereby reducing the size of the data set to be stored.
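The XYZ UV layout and the vertex-number substitution described above can be sketched as follows; all coordinates and names are invented for illustration.

```python
# Illustrative data: three vertices of one face, with their atlas (u, v) coordinates.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
uvs = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5)]

# Full XYZ UV form: each record stores the 3D position and the atlas location.
xyzuv = [(x, y, z, u, v) for (x, y, z), (u, v) in zip(positions, uvs)]

# Indexed form: a vertex number replaces (x, y, z); positions are stored once,
# so vertices reused by many faces shrink the data set.
indexed = [(i, u, v) for i, (u, v) in enumerate(uvs)]

def resolve(record):
    """Recover the full XYZ UV record from an indexed record."""
    i, u, v = record
    x, y, z = positions[i]
    return (x, y, z, u, v)
```

The indexed form trades a table lookup at decode time for a smaller download, which matters in the bandwidth-limited browser setting the patent targets.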
  • Geometry information hereinafter refers to the information describing the geometry of the model. Geometry information typically includes the locations on the model of the vertices of the faces, a mapping of the mesh vertices to the atlas pixels, and the faces to which each vertex belongs.
  • color information hereinafter refers to the color of a pixel in an atlas as applied on the model. From the geometry data, the color datum for each pixel in the atlas can be used to color a 3D model. For each pixel in the atlas (of constant size), the size and 3D shape on the 3D model of the area colored by the pixel is generated from the geometry information. No geometry information is stored in the atlas.
  • devices configured to display images have, in addition to a main processor for general computation, dedicated graphics processing.
  • the main processor is commonly referred to as a CPU, as will be used herein.
  • the graphics processing can be software-enabled or hardware-enabled.
  • the graphics processing whether hardware or software enabled, will be referred to as GPU.
  • the number of geometry faces to be rendered and their locations are specified in the CPU, although the locations can be edited in the GPU.
  • a vertex (x_m, y_m, z_m) can be mapped to more than one location in the atlas, as a non-limiting example, to both (u_j, v_j) and (u_k, v_k), for example, when the two faces that share the vertex (x_m, y_m, z_m) map to different areas in the atlas.
  • the location (x_m, y_m, z_m) will occur more than once in the list of vertices, once as (x_m, y_m, z_m, u_j, v_j) and once as (x_m, y_m, z_m, u_k, v_k).
  • the system of the present invention is not limited by the method of mapping locations (x_m, y_m, z_m) to atlas pixels (u_j, v_j)
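A minimal sketch of this duplication, with invented coordinates standing in for (x_m, y_m, z_m) and the two atlas locations:

```python
# One 3D vertex shared by two faces that map to different atlas areas:
# the same (x, y, z) appears twice in the vertex list, once per (u, v).
shared = (0.25, 0.75, 0.1)          # x_m, y_m, z_m (made up)
vertex_list = [
    shared + (0.10, 0.20),          # ..., u_j, v_j  (first face's atlas area)
    shared + (0.60, 0.80),          # ..., u_k, v_k  (second face's atlas area)
]
# Both records locate the same point on the model but different atlas pixels.
duplicates = [rec for rec in vertex_list if rec[:3] == shared]
```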
  • the volumetric video also comprises audio data.
  • Other information which may or may not be present, is typically referred to as metadata.
  • Metadata can comprise, but is not limited to, a date stamp, a time stamp, an authentication, a validity certificate, a passcode, a reference number, and any combination thereof. Except as detailed herein, the presence of metadata and the nature of the metadata, if present, do not limit the invention disclosed herein.
  • the steps involved in generating (1000) the textured volumetric video comprise:
  • in practice (step 3), the decoder very often reports the wrong frame number, or a referenced frame (either previous or subsequent) is corrupted or is not in the buffer.
  • color data are typically stored as color values
  • geometry data are typically stored, as described above, as vertex location and, for each face, the vertices that describe that face.
  • the present invention discloses two methods of ensuring synchronization of color frame with geometry frame. Both methods have more than one embodiment. In one method, geometry is stored as texture; in the other method, the frame number is stored, for each frame, in the color data.
  • the file comprises both geometry data and color data, with the geometry data encoded in a texture data format.
  • the color data and geometry data can then be compressed using a single compression format while keeping the file size small, thus improving download speeds and keeping a reasonable geometric quality.
  • H.264 and H.265 typically use a type of Fourier transform, such as, but not limited to, integer discrete cosine transform (DCT) to compress the color data.
  • the high-frequency components of the transform are removed or de-emphasized. This tends not to strongly affect texture quality, as the high-frequency color data are frequently noise.
  • the decoder back-transforms the data.
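The transform-and-truncate idea can be illustrated with a naive floating-point DCT-II and its inverse; real H.264/H.265 codecs use integer transforms and quantization matrices, so this is a conceptual sketch only, with made-up sample values.

```python
import math

def dct(x):
    """Naive DCT-II (unscaled), standing in for the codec's integer transform."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5) * k) for n in range(N))
            for k in range(N)]

def idct(X):
    """Inverse of dct() above (scaled DCT-III): the decoder's back-transform."""
    N = len(X)
    return [(X[0] / 2 + sum(X[k] * math.cos(math.pi / N * (n + 0.5) * k)
                            for k in range(1, N))) * 2 / N
            for n in range(N)]

# Lossy step: de-emphasize high-frequency components by zeroing the top half
# of the coefficients; the back-transformed signal is close to the original
# because high-frequency color data are frequently noise.
samples = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]   # made-up luma values
coeffs = dct(samples)
coeffs[len(coeffs) // 2:] = [0.0] * (len(coeffs) // 2)
approx = idct(coeffs)   # an approximation of `samples`, not an exact copy
```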
  • the color spaces (and respective color values) will be referred to as RGB and YUV hereinafter.
  • color values can be subsampled so that, for each location, only two of the YUV channels are stored, the two channels being, first, the Y channel (always stored) and, second, either the U channel or the V channel.
  • the subsampling scheme is typically expressed as a three-part ratio J:a:b (e.g. 4:2:2) that describes the number of luminance and chrominance samples in a conceptual region that is J pixels wide and 2 pixels high.
  • the parts are (in their respective order): J, the horizontal sampling reference (the width of the conceptual region, usually 4); a, the number of chrominance samples in the first row of J pixels; and b, the number of additional chrominance samples in the second row of J pixels.
  • b is either zero or equal to a, except for irregular cases like 4:4:1 and 4:2:1, which do not follow the convention.
  • the two values that are stored are Y and one of U or V, with U and V being stored in alternate color data, in the order UV in a first line and VU in a second line.
  • the “missing” value is generated by averaging the values from the adjacent color data.
  • the V value can be generated from an average of the V values of the color data surrounding the given color datum. Obviously, the same will hold for generating the U value of a color datum with stored V value.
  • Other typical subsampling schemes are 4:4:4 and 4:2:0. The subsampling scheme that is used depends on the amount of compression desired and the quality of the texture desired; greater compression leads to greater loss of detail in the rendered color.
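Recovering the missing chroma value by averaging adjacent stored values, as described above, might look like this; the one-line data layout and field names are an invented simplification of the alternating UV/VU storage.

```python
# Each stored datum keeps Y plus only one of U or V, alternating along a line.
stored = [
    {"Y": 100, "U": 10},   # V missing here
    {"Y": 102, "V": 20},   # U missing here
    {"Y": 101, "U": 30},
    {"Y": 99,  "V": 40},
]

def fill_missing(line, channel):
    """Fill each gap in `channel` by averaging the adjacent stored values."""
    out = []
    for i, px in enumerate(line):
        if channel in px:
            out.append(px[channel])
        else:
            neighbours = [p[channel]
                          for p in (line[max(i - 1, 0):i] + line[i + 1:i + 2])
                          if channel in p]
            out.append(sum(neighbours) / len(neighbours))
    return out
```

This is exactly why subsampling is safe for color (neighbouring pixels are correlated) and unsafe for geometry stored as texture, as the next definition explains.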
  • geometry data are stored in a texture format
  • subsampling can cause significant damage to the accuracy of the rendered image because there is no relationship between the information stored in adjacent locations: typically, geometry data are stored, as non-limiting examples, as face number, vertex location and vertex-to-vertex connections between faces, or as face number and vertex number plus the location of each vertex in the face.
  • LSB hereinafter refers to the least significant bit(s); MSB hereinafter refers to the most significant bit(s).
  • file data alternate between (or comprise separate blocks of) color data and geometry data stored in texture data format.
  • the MSB for all of the pixels comprise Y data.
  • the texture data comprise YUV channels, and, as above, for the geometry data, the geometric information is encoded in the Y channel only. Therefore, when lossy compression is used, discarding LSB data, the geometric information is not degraded.
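One way to realize "geometric information in the Y channel only" is to pack each quantized coordinate's bits into the luma of successive pixels; the 16-bit two-pixel split below is an assumed layout for illustration, not the patent's exact encoding.

```python
# Pack a 16-bit quantized coordinate into the Y (luma) channel of two pixels,
# MSB byte first, leaving U and V unused so chroma handling cannot corrupt it.
def coord_to_y_pair(value: int) -> tuple:
    assert 0 <= value <= 0xFFFF     # quantized coordinate range (assumed)
    return (value >> 8, value & 0xFF)

def y_pair_to_coord(msb: int, lsb: int) -> int:
    return (msb << 8) | lsb
```

Because the geometric bits live only in Y, a lossy pass that discards chroma detail leaves them intact; protecting the Y LSBs from quantization is a separate requirement.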
  • the geometry for each frame is embedded with the relevant color data. Extra lines of file data comprise the number and location of the geometry faces.
  • regions of interest are compressed less than other regions.
  • a region of interest can be the geometry; geometry data are compressed less than color data.
  • a region of interest is the bottom of the image; it is very common for an image to have the background in its upper part while the foreground is in the lower part. Therefore, it is common to compress the lower part of an image less than the upper part of the image.
  • system of the present invention is not limited by the above techniques for compression.
  • synchronization of geometry and texture is ensured by storing the geometry data and color data separately, but storing a frame number in the color data to ensure synchronization of color and geometry.
  • a frame number is encoded in the color data.
  • one or more of the first file locations of the color data can contain a frame number instead of a texture value.
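Embedding a frame number in the first color-data locations, as just described, can be sketched as follows; the 24-bit, three-pixel header layout is an assumption for illustration.

```python
# Reserve the first three "pixels" of a frame's color data for a 24-bit frame
# number (one byte per pixel); the remaining values are ordinary texture data.
def embed_frame_number(color_data, frame):
    header = [(frame >> 16) & 0xFF, (frame >> 8) & 0xFF, frame & 0xFF]
    return header + list(color_data)

def extract_frame_number(file_data):
    """Return (frame number, remaining color data) from decoded file data."""
    b0, b1, b2 = file_data[:3]
    return (b0 << 16) | (b1 << 8) | b2, file_data[3:]
```

The renderer then trusts the number decoded from the pixels themselves, not the frame number the video decoder reports, which is the whole point of this embodiment.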
  • the geometry is encoded for each frame inside the relevant texture. A few more lines of file data are added and the number and location of the geometry faces are stored in the extra lines of file data.
  • the first embodiment (1100) of the method comprises steps of:
  • the frame number is encoded in the color data.
  • a first variant and a second variant, the second with two sub-variants, are given.
  • steps 1 and 2 are the same as in the standard process.
  • the standard step 3 is then replaced, as shown below, by steps that result from using the frame number that was decoded from the data set.
  • the first variant of the second embodiment of the method (1200) comprises steps of:
  • a data set is streamed or downloaded to the browser (1205).
  • the relevant geometry data are downloaded and stored in accessible GPU memory by generating a degenerate pool of faces, as described in step 3 of the standard method; the actual geometry is then applied to a subset of the degenerate pool of faces by changing their locations to the actual locations, as described in step 4 of the standard method.
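The degenerate-pool technique can be sketched in a few lines; the pool size and face data are invented, and a real implementation would write into a GPU vertex buffer rather than a Python list.

```python
# Pre-allocate a pool of degenerate (zero-area) triangles, then write actual
# vertex positions into a subset when a frame's geometry arrives; untouched
# entries stay degenerate and render nothing.
POOL_SIZE = 6                       # illustrative; real pools are much larger
ZERO = (0.0, 0.0, 0.0)
pool = [(ZERO, ZERO, ZERO) for _ in range(POOL_SIZE)]

def apply_geometry(pool, faces):
    """Overwrite the first len(faces) pool entries with real face geometry."""
    for i, face in enumerate(faces):
        pool[i] = face
    return pool

frame_faces = [((0, 0, 0), (1, 0, 0), (0, 1, 0))]   # made-up single triangle
apply_geometry(pool, frame_faces)
```

Allocating the pool once avoids reallocating GPU buffers per frame, which is costly in the browser's limited processing environment.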
  • Two sub-variants are shown, which differ in the amount of geometry data stored.
  • in the first sub-variant, geometry data are stored for a large number of frames (“large memory”); in the second sub-variant, geometry data are stored for only a small number of frames (“small memory”), the number of frames being sufficient only to cover decoder frame inaccuracy.
  • the number of frames stored can be from 2 to 1000, from 2 to 100, from 2 to 10, from 2 to 4, or from 2 to 3.
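The "small memory" sub-variant amounts to a small cache of recent frames' geometry; the sketch below uses an assumed capacity and evicts the oldest frame, with all class and method names invented.

```python
from collections import OrderedDict

class GeometryCache:
    """Keep only the last few frames' geometry, enough to cover decoder
    frame inaccuracy (the 'small memory' sub-variant)."""

    def __init__(self, capacity=4):            # e.g. "from 2 to 4" frames
        self.capacity = capacity
        self.frames = OrderedDict()            # frame number -> geometry blob

    def store(self, frame_number, geometry):
        self.frames[frame_number] = geometry
        if len(self.frames) > self.capacity:   # evict the oldest frame
            self.frames.popitem(last=False)

    def lookup(self, frame_number):
        # None signals the decoder overshot the cached window; the modified
        # variant described next handles that by fetching the missing geometry.
        return self.frames.get(frame_number)

cache = GeometryCache(capacity=2)
for n in range(5):
    cache.store(n, f"geometry-{n}")
```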
  • the first sub-variant (1300) of the second variant of the method comprises steps of:
  • the second sub-variant (1400) of the second variant of the method comprises steps of:
  • if the decoder inaccuracies are larger than expected and, for a particular frame, the relevant geometry is not in the GPU-accessible memory, the second variant is modified by adding steps between step 6 and step 7:

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)
  • Image Generation (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Processing (AREA)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/824,045 US20220385941A1 (en) 2021-05-25 2022-05-25 Volumetric video in web browser

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163192643P 2021-05-25 2021-05-25
US17/824,045 US20220385941A1 (en) 2021-05-25 2022-05-25 Volumetric video in web browser

Publications (1)

Publication Number Publication Date
US20220385941A1 2022-12-01

Family

ID=84194533

Family Applications (1)

Application Number Title Priority Date Filing Date
US17/824,045 Pending US20220385941A1 (en) 2021-05-25 2022-05-25 Volumetric video in web browser

Country Status (5)

Country Link
US (1) US20220385941A1 (ja)
EP (1) EP4349019A1 (ja)
JP (1) JP2024520211A (ja)
CA (1) CA3220000A1 (ja)
WO (1) WO2022249183A1 (ja)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080007559A1 (en) * 2006-06-30 2008-01-10 Nokia Corporation Apparatus, method and a computer program product for providing a unified graphics pipeline for stereoscopic rendering
US20170078703A1 (en) * 2015-09-10 2017-03-16 Nokia Technologies Oy Apparatus, a method and a computer program for video coding and decoding
US20190080483A1 (en) * 2017-09-14 2019-03-14 Apple Inc. Point Cloud Compression
US10897614B2 (en) * 2016-12-09 2021-01-19 Nokia Technologies Oy Method and an apparatus and a computer program product for video encoding and decoding

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013116347A1 (en) * 2012-01-31 2013-08-08 Google Inc. Method for improving speed and visual fidelity of multi-pose 3d renderings
JP6820527B2 (ja) * 2015-06-25 2021-01-27 Panasonic IP Management Co., Ltd. Video synchronization device and video synchronization method


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Hugues Hoppe. 1997. View-dependent refinement of progressive meshes. In Proceedings of the 24th annual conference on Computer graphics and interactive techniques (SIGGRAPH '97). ACM Press/Addison-Wesley Publishing Co., USA, 189–198. https://doi.org/10.1145/258734.258843 (Year: 1997) *

Also Published As

Publication number Publication date
JP2024520211A (ja) 2024-05-22
WO2022249183A1 (en) 2022-12-01
CA3220000A1 (en) 2022-12-01
EP4349019A1 (en) 2024-04-10

Similar Documents

Publication Publication Date Title
US11234021B2 (en) Signal reshaping and coding for HDR and wide color gamut signals
US8736603B2 (en) Compression of texture rendered wire mesh models
US10679539B2 (en) Two-dimensional compositing
US20180255317A1 (en) Method for reconstructing video stream
JP7043164B2 (ja) Method and device for encoding both a high dynamic range frame and an imposed low dynamic range frame
CN113170140A (zh) Bit-plane encoding of data arrays
AU2018233015B2 (en) System and method for image processing
US20200236401A1 (en) Point cloud coding using homography transform
KR20220063254A (ko) 세계 시그널링 정보에 대한 비디오 기반 포인트 클라우드 압축 모델
JP2004104621A (ja) Image encoding device, image decoding device, and methods therefor
US11989919B2 (en) Method and apparatus for encoding and decoding volumetric video data
US20220385941A1 (en) 2022-12-01 Volumetric video in web browser
CN117280680A (zh) Parallel approach for dynamic mesh alignment
US11423580B2 (en) Decoding data arrays
US9924176B2 (en) Hybrid block based compression
US20230306641A1 (en) Mesh geometry coding
US20230306642A1 (en) Patch mesh connectivity coding
US11515961B2 (en) Encoding data arrays
US20230370635A1 (en) Encoding and decoding immersive video
US20240127489A1 (en) Efficient mapping coordinate creation and transmission
US20240177355A1 (en) Sub-mesh zippering
WO2023180839A1 (en) Mesh geometry coding
WO2023180840A1 (en) Patch mesh connectivity coding
WO2023174701A1 (en) V-pcc based dynamic textured mesh coding without occupancy maps
CN115761021A (zh) WebGL-based high-fidelity normal map compression method and system

Legal Events

Date Code Title Description
STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

AS Assignment

Owner name: TETAVI, LTD., ISRAEL

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:RUBINSTEIN, OFER;EILAM, YIGAL;BIRNBOIM, MICHAEL;AND OTHERS;SIGNING DATES FROM 20220731 TO 20220823;REEL/FRAME:060874/0622

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED