CN113497928B

CN113497928B - Data processing method for immersion media and related equipment

Info

Publication number: CN113497928B
Application number: CN202010202033.XA
Authority: CN
Inventors: 胡颖; 许晓中; 刘杉; 崔秉斗
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2020-03-20
Filing date: 2020-03-20
Publication date: 2022-07-12
Anticipated expiration: 2040-03-20
Also published as: CN113497928A

Abstract

The embodiments of the present application provide a data processing method for immersive media and related equipment. The method comprises: acquiring a region encapsulation data box of the immersive media, the region encapsulation data box comprising region shape information and region processing information, wherein, when the region processing information indicates that the i-th mapping region of the immersive media is to be subjected to first conversion processing, the region shape information indicates that the shape of the i-th mapping region is a target shape, i being a positive integer; and performing region encapsulation processing on the immersive media according to the region encapsulation data box. The method can reduce the data processing overhead of immersive media.

Description

Data processing method for immersion media and related equipment

Technical Field

The present application relates to the field of computer technologies, specifically to Virtual Reality (VR) technologies, and in particular to a data processing method for immersive media, a data processing apparatus for immersive media, an encoding device, and a decoding device.

Background

In video processing of immersive media, encoding the projected image after region encapsulation can greatly improve video coding efficiency, so region encapsulation technology is widely used in video processing of immersive media. In practice, however, the existing region encapsulation process has been found to incur large computation and storage overhead, which in turn reduces the video coding efficiency of immersive media.

Disclosure of Invention

The embodiments of the present application provide a data processing method for immersive media and related equipment, which can reduce the data processing overhead of immersive media.

In one aspect, an embodiment of the present application provides a data processing method for immersive media, including:

acquiring a region encapsulation data box of the immersive media, the region encapsulation data box comprising region shape information and region processing information, wherein, when the region processing information indicates that the i-th mapping region of the immersive media is to be subjected to first conversion processing, the region shape information indicates that the shape of the i-th mapping region is a target shape, i being a positive integer; and

performing region encapsulation processing on the immersive media according to the region encapsulation data box.

According to the embodiments of the present application, the region encapsulation data box of the immersive media specifies that only a mapping region whose shape is the target shape (such as a square) may be subjected to the first conversion processing (such as rotation). Because a mapping region of the target shape requires only a small computation matrix during region encapsulation, the storage and computation overhead of region encapsulation for immersive media can be greatly reduced.
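The text does not spell out why the target shape needs a smaller computation matrix. One plausible illustration, offered only as a minimal sketch and not as the method of the application: a 90-degree rotation of a square region can be performed in place, because the output has the same width and height as the input, whereas rotating a w × h rectangle yields an h × w block and therefore needs a separately allocated buffer. The function names are hypothetical, and regions are modeled as Python lists of rows.

```python
def rotate_square_ccw_in_place(m):
    """Rotate an n x n region 90 degrees counterclockwise without a second buffer."""
    n = len(m)
    if any(len(row) != n for row in m):
        raise ValueError("region must be square")
    # Transpose in place by swapping elements across the main diagonal.
    for r in range(n):
        for c in range(r + 1, n):
            m[r][c], m[c][r] = m[c][r], m[r][c]
    # Reversing the row order after a transpose gives a 90-degree CCW rotation.
    m.reverse()
    return m


def rotate_rect_ccw(m):
    """Rotate an h x w region 90 degrees counterclockwise into a new w x h buffer."""
    h, w = len(m), len(m[0])
    # Output row c collects column (w - 1 - c) of the input, top to bottom.
    return [[m[r][w - 1 - c] for r in range(h)] for c in range(w)]
```

For example, `rotate_square_ccw_in_place([[1, 2], [3, 4]])` returns `[[2, 4], [1, 3]]` by reusing the same storage, while `rotate_rect_ccw([[1, 2, 3], [4, 5, 6]])` must allocate a 3 × 2 result.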

In another aspect, an embodiment of the present application provides another data processing method for immersive media, including:

acquiring a region encapsulation data box of the immersive media, the region encapsulation data box comprising region shape information and region processing information, wherein, when the region processing information indicates that the i-th encapsulated region of the immersive media is to be subjected to first inverse conversion processing, the region shape information indicates that the shape of the i-th encapsulated region is a target shape, i being a positive integer; and

performing region decapsulation processing on the immersive media according to the region encapsulation data box.

According to the embodiments of the present application, the region encapsulation data box of the immersive media specifies that, during region decapsulation, only an encapsulated region whose shape is the target shape (such as a square) may be subjected to the first inverse conversion processing (such as rotation). Because an encapsulated region of the target shape requires only a small computation matrix during region decapsulation, the storage and computation overhead of region decapsulation for immersive media can be greatly reduced.

In another aspect, an embodiment of the present application provides another data processing method for immersive media, including:

acquiring a projected image of the immersive media and the region configuration of the projected image;

dividing the projected image into N mapping regions according to the region configuration, N being a positive integer;

generating a region encapsulation data box of the immersive media according to the region configuration and the N mapping regions, the region encapsulation data box comprising region shape information and region processing information, wherein, when the region processing information indicates that the i-th mapping region of the immersive media is to be subjected to the first conversion processing, the region shape information indicates that the shape of the i-th mapping region is the target shape, i being a positive integer with 1 ≤ i ≤ N;

performing region encapsulation processing on the projected image according to the region encapsulation data box to obtain an encapsulated image of the immersive media, the encapsulated image comprising N encapsulated regions, with each mapping region corresponding to one encapsulated region; and

encoding the encapsulated image to obtain an encoded file of the immersive media.

According to the embodiments of the present application, the region encapsulation data box of the immersive media specifies that only a mapping region whose shape is the target shape (such as a square) may be subjected to the first conversion processing (such as rotation). Because a mapping region of the target shape requires only a small computation matrix during region encapsulation, the storage and computation overhead of region encapsulation for immersive media can be greatly reduced, which helps improve the encoding efficiency of immersive media.
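A minimal sketch, under assumed field names, of how an encoder might enforce this rule when generating the region encapsulation data box: any region marked for the first conversion processing must be square. Which transforms count as the first conversion processing is an assumption here (90- and 270-degree rotations, which would otherwise swap a region's width and height), and the `MappingRegion` type and transform labels are hypothetical. The `split_into_squares` helper follows the idea of FIG. 6b, splitting a rectangular mapping region into square mapping regions before rotating them.

```python
from dataclasses import dataclass

# Hypothetical labels for the "first conversion processing"; assumed here to be
# the rotations that would otherwise swap a region's width and height.
FIRST_CONVERSION = {"rot90", "rot270"}


@dataclass
class MappingRegion:
    """Hypothetical per-region entry of a region encapsulation data box."""
    width: int
    height: int
    transform: str = "none"


def violating_regions(regions):
    """Return the 1-based indices i of regions that break the target-shape rule."""
    return [
        i
        for i, region in enumerate(regions, start=1)
        if region.transform in FIRST_CONVERSION and region.width != region.height
    ]


def split_into_squares(x, y, w, h):
    """Split a w x h rectangle at (x, y) into squares of side min(w, h).

    Assumes the longer side is an exact multiple of the shorter one.
    """
    side = min(w, h)
    if max(w, h) % side != 0:
        raise ValueError("longer side must be a multiple of the shorter side")
    if w >= h:
        return [(x + k * side, y, side, side) for k in range(w // side)]
    return [(x, y + k * side, side, side) for k in range(h // side)]
```

With regions `[MappingRegion(64, 64, "rot90"), MappingRegion(128, 64, "rot90")]`, only region 2 is flagged; splitting it with `split_into_squares(0, 0, 128, 64)` yields two 64 × 64 squares that may each be rotated.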

In another aspect, an embodiment of the present application provides another data processing method for immersive media, including:

decoding the encoded file of the immersive media to obtain an encapsulated image of the immersive media, the encapsulated image comprising N encapsulated regions, N being a positive integer;

performing region decapsulation processing on the encapsulated image according to the region encapsulation data box to obtain a projected image of the immersive media, the projected image comprising N mapping regions, with each encapsulated region corresponding to one mapping region; and

performing three-dimensional reconstruction processing on the projected image to obtain a three-dimensional image of the immersive media.

According to the embodiments of the present application, the region encapsulation data box of the immersive media specifies that, during region decapsulation, only an encapsulated region whose shape is the target shape (such as a square) may be subjected to the first inverse conversion processing (such as rotation). Because an encapsulated region of the target shape requires only a small computation matrix during region decapsulation, this greatly reduces the storage and computation overhead of region decapsulation for immersive media and helps improve the decoding efficiency of immersive media.
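The first inverse conversion processing must exactly undo the first conversion processing. Taking the rotation example given later in this description (a 90-degree counterclockwise rotation during encapsulation, undone by a 90-degree clockwise rotation during decapsulation), the following minimal sketch pairs each forward conversion with its inverse. The transform labels and the `FORWARD`/`INVERSE` tables are hypothetical, and regions are modeled as lists of rows.

```python
def rot90_ccw(m):
    """Rotate a region (list of rows) 90 degrees counterclockwise."""
    w = len(m[0])
    return [[row[w - 1 - c] for row in m] for c in range(w)]


def rot90_cw(m):
    """Rotate a region 90 degrees clockwise; this undoes rot90_ccw."""
    h = len(m)
    return [[m[h - 1 - r][c] for r in range(h)] for c in range(len(m[0]))]


def mirror_h(m):
    """Mirror a region horizontally; mirroring is its own inverse."""
    return [row[::-1] for row in m]


# Hypothetical pairing of each conversion with its inverse conversion.
FORWARD = {"rot90_ccw": rot90_ccw, "mirror": mirror_h}
INVERSE = {"rot90_ccw": rot90_cw, "mirror": mirror_h}


def decapsulate_region(packed_block, transform):
    """Undo the conversion recorded for one encapsulated region."""
    return INVERSE[transform](packed_block)
```

For any region `m` and label `t` in `FORWARD`, `decapsulate_region(FORWARD[t](m), t) == m` should hold; that round-trip property is what a decoder relies on.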

In one aspect, an embodiment of the present application provides a data processing apparatus for immersive media, including:

an acquisition unit configured to acquire a region encapsulation data box of the immersive media, the region encapsulation data box comprising region shape information and region processing information, wherein, when the region processing information indicates that the i-th mapping region of the immersive media is to be subjected to the first conversion processing, the region shape information indicates that the shape of the i-th mapping region is the target shape, i being a positive integer; and

a processing unit configured to perform region encapsulation processing on the immersive media according to the region encapsulation data box.

In one aspect, an embodiment of the present application provides another data processing apparatus for immersive media, including:

an acquisition unit configured to acquire a region encapsulation data box of the immersive media, the region encapsulation data box comprising region shape information and region processing information, wherein, when the region processing information indicates that the i-th encapsulated region of the immersive media is to be subjected to the first inverse conversion processing, the region shape information indicates that the shape of the i-th encapsulated region is the target shape, i being a positive integer; and

a processing unit configured to perform region decapsulation processing on the immersive media according to the region encapsulation data box.

In one aspect, an embodiment of the present application provides another data processing apparatus for immersive media, including:

an acquisition unit configured to acquire a projected image of the immersive media and the region configuration of the projected image; and

a processing unit configured to divide the projected image into N mapping regions according to the region configuration, N being a positive integer;

generate a region encapsulation data box of the immersive media according to the region configuration and the N mapping regions, the region encapsulation data box comprising region shape information and region processing information, wherein, when the region processing information indicates that the i-th mapping region of the immersive media is to be subjected to the first conversion processing, the region shape information indicates that the shape of the i-th mapping region is the target shape, i being a positive integer with 1 ≤ i ≤ N;

perform region encapsulation processing on the projected image according to the region encapsulation data box to obtain an encapsulated image of the immersive media, the encapsulated image comprising N encapsulated regions, with each mapping region corresponding to one encapsulated region; and

encode the encapsulated image to obtain an encoded file of the immersive media.

In one aspect, an embodiment of the present application provides another data processing apparatus for immersive media, including a processing unit configured to:

decode the encoded file of the immersive media to obtain an encapsulated image of the immersive media, the encapsulated image comprising N encapsulated regions, N being a positive integer;

perform region decapsulation processing on the encapsulated image according to the region encapsulation data box to obtain a projected image of the immersive media, the projected image comprising N mapping regions, with each encapsulated region corresponding to one mapping region; and

perform three-dimensional reconstruction processing on the projected image to obtain a three-dimensional image of the immersive media.

In one aspect, an embodiment of the present application provides an encoding device, including:

a processor adapted to implement one or more instructions; and

a memory storing one or more first instructions (or second instructions), the instructions being adapted to be loaded by the processor to perform the corresponding data processing method for immersive media described above.

In one aspect, an embodiment of the present application provides a decoding device, including:

a processor adapted to implement one or more instructions; and

a memory storing one or more instructions adapted to be loaded by the processor to perform the corresponding data processing method for immersive media described above.

According to the embodiments of the present application, the region encapsulation data box of the immersive media specifies that only an encapsulated region whose shape is the target shape (such as a square) may be subjected to the first inverse conversion processing (such as rotation). Because an encapsulated region of the target shape requires only a small computation matrix during region decapsulation, this greatly reduces the storage and computation overhead of region decapsulation for immersive media and helps improve the decoding efficiency of immersive media.

Drawings

To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 illustrates an architecture diagram of an immersive media system provided by an exemplary embodiment of the present application;

FIG. 2 illustrates a schematic diagram of 6DoF provided by an exemplary embodiment of the present application;

FIG. 3 illustrates a schematic diagram of 3DoF provided by an exemplary embodiment of the present application;

FIG. 4 illustrates a schematic diagram of 3DoF+ provided by an exemplary embodiment of the present application;

FIG. 5 illustrates a schematic diagram of a mapping region and an encapsulated region provided with guard bands, according to an exemplary embodiment of the present application;

FIG. 6a illustrates the effect of rotating a rectangular mapping region, according to an exemplary embodiment of the present application;

FIG. 6b illustrates the effect of splitting a rectangular mapping region into square mapping regions and then rotating them, according to an exemplary embodiment of the present application;

FIG. 7 illustrates a flowchart of a data processing method for immersive media provided by an exemplary embodiment of the present application;

FIG. 8 illustrates a flowchart of a data processing method for immersive media provided by an exemplary embodiment of the present application;

FIG. 9 illustrates a flowchart of a data processing method for immersive media provided by an exemplary embodiment of the present application;

FIG. 10 illustrates a flowchart of a data processing method for immersive media provided by an exemplary embodiment of the present application;

FIG. 11 illustrates a block diagram of a data processing apparatus for immersive media provided by an exemplary embodiment of the present application;

FIG. 12 illustrates a block diagram of a data processing apparatus for immersive media provided by an exemplary embodiment of the present application;

FIG. 13 illustrates a schematic structural diagram of an encoding device provided by an exemplary embodiment of the present application;

FIG. 14 illustrates a schematic structural diagram of a decoding device provided by an exemplary embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Embodiments of the present application relate to data processing technology for immersive media. Immersive media are media files that provide immersive media content, enabling a user immersed in that content to obtain visual, auditory, and other sensory experiences as in the real world. Immersive media content includes video content represented in various forms in three-dimensional (3-Dimension, 3D) space, such as three-dimensional video content represented in spherical form. Specifically, the immersive media content may be VR (Virtual Reality) video content, panoramic video content, spherical video content, or 360-degree video content; accordingly, immersive media may also be called VR video, panoramic video, spherical video, or 360-degree video. In addition, immersive media content includes audio content synchronized with the video content represented in three-dimensional space.

FIG. 1 illustrates an architecture diagram of an immersive media system provided by an exemplary embodiment of the present application. As shown in FIG. 1, the immersive media system includes an encoding device and a decoding device. The encoding device may be a computer device used by the provider of the immersive media, and may be a terminal (e.g., a PC (Personal Computer) or a smart mobile device such as a smartphone) or a server. The decoding device may be a computer device used by the user of the immersive media, and may be a terminal (e.g., a PC, a smart mobile device such as a smartphone, or a VR device such as a VR headset or VR glasses). The data processing of immersive media comprises data processing at the encoding device side and data processing at the decoding device side.

Data processing at the encoding device side mainly comprises: (1) acquisition and production of the media content of the immersive media; and (2) encoding and file encapsulation of the immersive media. Data processing at the decoding device side mainly comprises: (3) file decapsulation and decoding of the immersive media; and (4) rendering of the immersive media. In addition, transmission of the immersive media between the encoding device and the decoding device may be based on various transmission protocols, which may include, but are not limited to, DASH (Dynamic Adaptive Streaming over HTTP), HLS (HTTP Live Streaming), SMT (Smart Media Transport), and TCP (Transmission Control Protocol).

The processes involved in the data processing of immersive media are described in detail below with reference to FIG. 1.

First, data processing at the encoding device side:

(1) Acquisition and production of the media content of immersive media.

1) Acquisition of the media content of immersive media.

The media content of immersive media is obtained by capturing real-world audio-visual scenes with capture devices. In one implementation, the capture device may be a hardware component of the encoding device, for example, a microphone, camera, or sensor of a terminal. In another implementation, the capture device may be a hardware device connected to the encoding device, such as a camera connected to a server, which provides the encoding device with a service for acquiring the media content of the immersive media. Capture devices may include, but are not limited to, audio devices, camera devices, and sensing devices. Audio devices may include audio sensors, microphones, and the like; camera devices may include ordinary cameras, stereo cameras, light field cameras, and the like; sensing devices may include laser devices, radar devices, and the like. There may be multiple capture devices, deployed at specific positions in real space to capture audio content and video content simultaneously from different angles within the space; the captured audio and video content remains synchronized in both time and space. The media content captured by the capture devices is referred to as the raw data of the immersive media.

2) Production of the media content of immersive media.

The captured audio content is itself suitable for audio encoding of the immersive media. The captured video content becomes suitable for video encoding of the immersive media only after a series of production steps, including:

Splicing. The captured video content is shot by capture devices at different angles; splicing means stitching the video content shot at all the angles into one complete video that reflects a 360-degree visual panorama of the real space. That is, the spliced video is a panoramic video (or spherical video) represented in three-dimensional space.

Projection. Projection is the process of mapping the three-dimensional video formed by splicing onto a two-dimensional (2-Dimension, 2D) image; the 2D image formed by projection is called a projected image. Projection methods may include, but are not limited to, longitude-latitude map projection (equirectangular projection) and regular hexahedron projection (cube map projection).

Region encapsulation. The projected image may be encoded directly, or encoded after region encapsulation. In practice, it has been found that performing region encapsulation on the two-dimensional projected image before encoding can greatly improve the video coding efficiency of immersive media, so region encapsulation technology is widely used in video processing of immersive media. Region encapsulation is the process of performing conversion processing on the projected image region by region; it converts the projected image into an encapsulated image. Specifically, the projected image is divided into a plurality of mapping regions, conversion processing is performed on each mapping region to obtain a plurality of encapsulated regions, and the encapsulated regions are mapped into one 2D image to obtain the encapsulated image. A mapping region is a region obtained by dividing the projected image before region encapsulation; an encapsulated region is a region in the encapsulated image after region encapsulation. The conversion processing may include, but is not limited to, mirroring, rotation, rearrangement, upsampling, downsampling, changing the resolution of a region, and moving a region.
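The region encapsulation steps just described (divide, convert, place) can be sketched as follows. This is a minimal illustration under stated assumptions, not the normative process: images are lists of rows, each region is described by a hypothetical tuple of (projected rectangle, position in the encapsulated image, transform label), and only three conversions are modeled (none, horizontal mirroring, and 90-degree counterclockwise rotation). Resampling, guard bands, and stereo handling are omitted.

```python
def crop(img, x, y, w, h):
    """Cut the w x h rectangle at (x, y) out of an image given as a list of rows."""
    return [row[x:x + w] for row in img[y:y + h]]


def convert(block, transform):
    """Apply one (hypothetically labelled) conversion to a mapping region."""
    if transform == "none":
        return [row[:] for row in block]
    if transform == "mirror":  # horizontal mirroring
        return [row[::-1] for row in block]
    if transform == "rot90":  # 90 degrees counterclockwise
        w = len(block[0])
        return [[row[w - 1 - c] for row in block] for c in range(w)]
    raise ValueError(f"unsupported transform: {transform}")


def encapsulate(projected, regions, packed_w, packed_h):
    """Build the encapsulated image from ((x, y, w, h), (px, py), transform) entries."""
    packed = [[0] * packed_w for _ in range(packed_h)]
    for (x, y, w, h), (px, py), transform in regions:
        block = convert(crop(projected, x, y, w, h), transform)
        if py + len(block) > packed_h or px + len(block[0]) > packed_w:
            raise ValueError("encapsulated region does not fit in the encapsulated image")
        # Copy the converted region into place, row by row.
        for r, row in enumerate(block):
            packed[py + r][px:px + len(row)] = row
    return packed
```

For a 4 × 2 projected image `[[1, 2, 3, 4], [5, 6, 7, 8]]` split into two 2 × 2 mapping regions, leaving the left one unchanged and rotating the right one gives the encapsulated image `[[1, 2, 4, 8], [5, 6, 3, 7]]`.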

It should be noted that if the capture device captures only panoramic video, then after such video is processed by the encoding device and transmitted to the decoding device for the corresponding data processing, a user on the decoding device side can view 360-degree video information only through certain specific actions (e.g., head rotation), while other actions (e.g., head movement) produce no corresponding change in the video, resulting in a poor VR experience. Depth information matched to the panoramic video therefore needs to be provided additionally so that the user obtains better immersion and a better VR experience; this involves 6DoF (Six Degrees of Freedom) production techniques. When the user can move relatively freely in the simulated scene, this is called 6DoF. When 6DoF production techniques are used to produce the video content of immersive media, the capture devices generally include light field cameras, laser devices, radar devices, and the like, which capture point cloud data or light field data in space, and some specific processing is required during the production process described above, such as cutting and mapping of the point cloud data and calculation of depth information. FIG. 2 illustrates a schematic diagram of 6DoF provided by an exemplary embodiment of the present application. 6DoF is divided into window 6DoF, omnidirectional 6DoF, and 6DoF. Window 6DoF means that the user's rotational movement about the X and Y axes is limited and the user's translation along the Z axis is limited; for example, the user cannot see the scene outside the window frame and cannot pass through the window. Omnidirectional 6DoF means that the user's rotational movement about the X, Y, and Z axes is limited; for example, the user cannot freely pass through the three-dimensional 360-degree VR content within a limited movement area. 6DoF means that the user can translate freely along the X, Y, and Z axes; for example, the user can move freely around the three-dimensional 360-degree VR content.

Similar to 6DoF, there are also 3DoF and 3DoF+ production techniques. FIG. 3 illustrates a schematic diagram of 3DoF provided by an exemplary embodiment of the present application. As shown in FIG. 3, 3DoF means that the user is fixed at the center point of a three-dimensional space and the user's head rotates about the X, Y, and Z axes to view the picture provided by the media content. FIG. 4 illustrates a schematic diagram of 3DoF+ provided by an exemplary embodiment of the present application. As shown in FIG. 4, 3DoF+ means that, when the virtual scene provided by the immersive media has certain depth information, the user's head can, on the basis of 3DoF, also move within a limited space to view the picture provided by the media content.

(2) Encoding and file encapsulation of the immersive media.

The captured audio content can be audio-encoded directly to form the audio code stream of the immersive media. After the production steps have been carried out (splicing and projection, or splicing, projection, and region encapsulation), the projected image or the encapsulated image is video-encoded to obtain the video code stream of the immersive media. Note that if the 6DoF production technique is used, a specific encoding method (such as point cloud encoding) must be used during video encoding. The audio code stream and the video code stream are encapsulated in a file container according to the file format of the immersive media (such as ISOBMFF, the ISO Base Media File Format) to form the media file resources of the immersive media; a media file resource may be a media file, or media segments that together form a media file of the immersive media. As required by the file format of the immersive media, the metadata of the media file resources is recorded using a Media Presentation Description (MPD). Metadata is a general term for information related to the presentation of the immersive media and may include description information for the media content, description information for windows, signaling information related to the presentation of the media content, and so on. As shown in FIG. 1, the encoding device stores the media presentation description information and the media file resources formed by this data processing.

Second, data processing at the decoding device side:

(3) File decapsulation and decoding of the immersive media.

The decoding device may obtain the media file resources of the immersive media and the corresponding media presentation description information from the encoding device, either on the recommendation of the encoding device or adaptively and dynamically according to the needs of the user at the decoding device side. For example, the decoding device may determine the orientation and position of the user from head, eye, or body tracking information, and then dynamically request the corresponding media file resources from the encoding device based on the determined orientation and position. The media file resources and the media presentation description information are transmitted from the encoding device to the decoding device via a transmission mechanism (e.g., DASH or SMT). File decapsulation at the decoding device side is the inverse of file encapsulation at the encoding device side: the decoding device decapsulates the media file resources according to the file format requirements of the immersive media to obtain the audio code stream and the video code stream. Decoding at the decoding device side is likewise the inverse of encoding at the encoding device side: the decoding device performs audio decoding on the audio code stream to restore the audio content. In addition, decoding of the video code stream by the decoding device comprises the following steps:

The video code stream is decoded to obtain a planar image. Based on the metadata provided by the media presentation description information, if the metadata indicates that region encapsulation has been performed on the immersive media, the planar image is an encapsulated image; if the metadata indicates that region encapsulation has not been performed, the planar image is a projected image.

If the metadata indicates that region encapsulation has been performed on the immersive media, the decoding device performs region decapsulation on the encapsulated image to obtain the projected image. Region decapsulation is the inverse of region encapsulation: it performs inverse conversion processing on the encapsulated image region by region and thereby converts the encapsulated image into a projected image. Specifically, inverse conversion processing is performed on the plurality of encapsulated regions in the encapsulated image as indicated by the metadata to obtain a plurality of mapping regions, and the mapping regions are mapped into one 2D image to obtain the projected image. Inverse conversion processing is the processing that reverses the conversion processing; for example, if the conversion processing is a 90-degree counterclockwise rotation, the inverse conversion processing is a 90-degree clockwise rotation.

The projected image is reconstructed according to the media presentation description information to convert it into a 3D image; reconstruction is the process of re-projecting the two-dimensional projected image into 3D space.
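For the longitude-latitude map (equirectangular) projection, re-projection amounts to turning each sample position back into a direction on the sphere. The sketch below is a minimal illustration, assuming one common convention that this text does not itself specify: azimuth in [-180, 180] and elevation in [-90, 90] degrees, (0, 0) at the image center, azimuth increasing to the left, and 3D axes with x forward, y left, z up. The function names are hypothetical.

```python
import math


def erp_sphere_to_pixel(azimuth_deg, elevation_deg, width, height):
    """Projection direction: sphere angles (degrees) -> (u, v) on the projected image."""
    u = (0.5 - azimuth_deg / 360.0) * width
    v = (0.5 - elevation_deg / 180.0) * height
    return u, v


def erp_pixel_to_sphere(u, v, width, height):
    """Reconstruction direction: (u, v) -> (azimuth, elevation) in degrees."""
    return (0.5 - u / width) * 360.0, (0.5 - v / height) * 180.0


def sphere_to_unit_vector(azimuth_deg, elevation_deg):
    """Point on the unit sphere for the given angles (assumed axes: x forward, y left, z up)."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))
```

On a 1920 × 960 projected image, the center sample `(960, 480)` maps back to azimuth 0 and elevation 0, i.e. the forward direction `(1.0, 0.0, 0.0)`. The two mapping functions are exact inverses, so projection followed by reconstruction returns the original direction up to floating-point rounding.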

(4) Rendering of the immersive media.

The decoding device renders the audio content obtained by audio decoding and the 3D image obtained by video decoding according to the rendering- and window-related metadata in the media presentation description information, and plays and outputs the result once rendering is complete. In particular, if the 3DoF or 3DoF+ production techniques are used, the decoding device renders the 3D image mainly based on the current viewpoint, parallax, depth information, and the like; if the 6DoF production technique is used, the decoding device renders the 3D image within the window mainly based on the current viewpoint. The viewpoint is the user's viewing position; parallax is the difference in line of sight between the user's two eyes, or the difference in line of sight caused by movement; and the window is the viewing area.
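Rendering within the window requires deciding which sphere directions fall inside the current viewing area. The following is a deliberately simplified sketch, not the rendering method of the application: it assumes the window is described by a center (azimuth, elevation) and horizontal/vertical ranges in degrees (all hypothetical parameter names), and it compares azimuth and elevation offsets independently, which only approximates a true spherical viewport.

```python
def wrap_degrees(angle):
    """Wrap an angle in degrees into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def in_window(azimuth, elevation, center_azimuth, center_elevation,
              hor_range, ver_range):
    """Approximate test of whether a sphere direction lies inside the window.

    All angles are in degrees. The azimuth offset is wrapped so that windows
    straddling the +/-180 degree seam are handled.
    """
    d_az = wrap_degrees(azimuth - center_azimuth)
    d_el = elevation - center_elevation
    return abs(d_az) <= hor_range / 2 and abs(d_el) <= ver_range / 2
```

For a 90 × 90 degree window centered at azimuth -170, the direction at azimuth 170 is only 20 degrees away across the seam and is treated as visible.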

Immersive media systems support data boxes (Boxes). A data box is a data block or object that includes metadata; that is, a data box contains the metadata of the corresponding media content. Immersive media may include a plurality of data boxes, for example a sphere region zooming data box (SphereRegionZoomingBox) containing metadata that describes sphere region zooming information, a 2D region zooming data box (2DRegionZoomingBox) containing metadata that describes 2D region zooming information, and a region encapsulation data box (RegionWisePackingBox) containing metadata that describes the corresponding information in the region encapsulation process.
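In ISOBMFF-based files, every data box starts with the same small header: a 32-bit big-endian size that counts the header itself, followed by a four-character type code; a "full box" additionally carries an 8-bit version and 24-bit flags before its payload. A minimal sketch of writing and reading such headers follows. The four-character code `xmpl` is a placeholder, not the code of any box named above, and boxes that use the 64-bit `largesize` field are not handled.

```python
import struct


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Serialize a basic box: 32-bit big-endian size (header included) + 4-byte type."""
    if len(box_type) != 4:
        raise ValueError("box type must be a four-character code")
    return struct.pack(">I", 8 + len(payload)) + box_type + payload


def make_full_box(box_type: bytes, version: int, flags: int, payload: bytes) -> bytes:
    """Serialize a full box: the payload is preceded by 8-bit version and 24-bit flags."""
    version_and_flags = struct.pack(">I", (version << 24) | (flags & 0xFFFFFF))
    return make_box(box_type, version_and_flags + payload)


def read_box_header(data: bytes, offset: int = 0):
    """Return (size, type) of the box starting at offset (no largesize support)."""
    (size,) = struct.unpack_from(">I", data, offset)
    return size, data[offset + 4:offset + 8]
```

For example, `make_box(b"xmpl", b"\x01\x02")` produces a 10-byte box whose header reads back as `(10, b"xmpl")`; a reader can skip to the next box by advancing `offset` by the returned size.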

As the above description of the immersive media system shows, in the data processing of immersive media, performing region encapsulation on the two-dimensional projected image before encoding it can greatly improve video encoding efficiency, which is why region encapsulation is widely used in the video processing of immersive media. The region packing structure (RegionWisePackingStruct) specifies the mapping relationship between each encapsulation region and its corresponding mapping region; in addition, if a guard band is set for an encapsulation region during region encapsulation, the position and size of the guard band are also specified in the region packing structure. Guard bands are provided to avoid boundary errors.

Under existing coding standards for immersive media, such as AVS (Audio Video coding Standard), region encapsulation of immersive media currently supports only rectangular region packing structures (including both non-square rectangles and squares). Under the existing coding standard, the syntax of the region encapsulation data box of the immersion medium is shown in Table 1 below:

TABLE 1

The semantics of the syntax shown in Table 1 are as follows. RegionWisePackingStruct() represents the region packing structure. constituent_picture_matching_flag is the constituent picture matching flag. For a stereoscopic projected image (i.e., one having two image components, for the left and right eyes) whose two image components are frame-packed top-bottom or side-by-side: if constituent_picture_matching_flag is equal to 1, the mapping region information and encapsulation region information in the syntax apply separately to each of the two image components of the projected image, and the packed image and the projected image have the same stereoscopic packing format; if constituent_picture_matching_flag is equal to 0, the mapping region information and encapsulation region information apply to the entire projected image. When constituent_picture_matching_flag is 0, num_regions indicates the number of encapsulation regions, and the value 0 of num_regions is reserved. When constituent_picture_matching_flag is 1, the total number of encapsulation regions is 2 × num_regions, and each RectRegionPacking(i) describes the projected-image and packed-image information for one image component. proj_picture_width and proj_picture_height represent the width and height of the projected image; packed_picture_width and packed_picture_height represent the width and height of the packed image. guard_band_flag[i] is the guard band flag of the i-th encapsulation region: guard_band_flag[i] equal to 1 indicates that the i-th encapsulation region is provided with a guard band, and guard_band_flag[i] equal to 0 indicates that it is not.
packing_type[i] represents the region packing type of the i-th mapping region.
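As a small illustration of the num_regions semantics, the hypothetical helper below computes how many encapsulation regions one RegionWisePackingStruct() describes. It is a sketch based only on the semantics stated above, not parsing code from any standard.

```python
def total_packed_regions(num_regions: int, constituent_picture_matching_flag: int) -> int:
    """Number of encapsulation regions described by RegionWisePackingStruct()."""
    if num_regions == 0:
        # The value 0 of num_regions is reserved.
        raise ValueError("num_regions == 0 is reserved")
    if constituent_picture_matching_flag == 1:
        # Region information applies separately to each of the two
        # stereoscopic image components, so twice as many regions are described.
        return 2 * num_regions
    # Region information covers the entire projected image.
    return num_regions


print(total_packed_regions(3, 0))  # 3
print(total_packed_regions(3, 1))  # 6
```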

The syntax of the rectangular region packing structure (RectRegionPacking) referenced in Table 1 is shown in Table 2 below:

TABLE 2

The semantics of the syntax shown in Table 2 are as follows. RectRegionPacking(i) represents the region packing mapping relationship between the i-th mapping region and the corresponding i-th encapsulation region. proj_reg_width[i], proj_reg_height[i], proj_reg_top[i] and proj_reg_left[i] represent, respectively, the width, height, top offset and left offset of the i-th mapping region. transform_type[i] represents the transform type applied when the i-th encapsulation region is inversely converted to the i-th mapping region; when transform_type[i] includes both rotation and mirroring, rotation is performed first and mirroring second. transform_type[i] takes values in the range [0, 7], with the following meanings:

0: no conversion

1: horizontal mirroring

2: clockwise rotation by 180 degrees

3: clockwise rotation by 180 degrees, then horizontal mirroring

4: clockwise rotation by 90 degrees, then horizontal mirroring

5: clockwise rotation by 90 degrees

6: clockwise rotation by 270 degrees, then horizontal mirroring

7: clockwise rotation by 270 degrees

The width, height, top offset and left offset of the i-th encapsulation region are represented by packed_reg_width[i], packed_reg_height[i], packed_reg_top[i] and packed_reg_left[i], respectively. The values of proj_reg_width[i], proj_reg_height[i], packed_reg_width[i] and packed_reg_height[i] shall all be greater than 0.
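The transform_type table can be read as a lookup from each value to a clockwise rotation followed by an optional horizontal mirror. The Python sketch below applies that lookup to a region held as a list of rows, following the clockwise convention and the rotate-then-mirror order stated in this section; the table and function names are illustrative assumptions, not syntax elements of the standard.

```python
# transform_type[i] -> (clockwise rotation in degrees, horizontal mirror after rotation)
TRANSFORM_TYPES = {
    0: (0, False),
    1: (0, True),
    2: (180, False),
    3: (180, True),
    4: (90, True),
    5: (90, False),
    6: (270, True),
    7: (270, False),
}


def rotate_cw_90(region):
    """Rotate a 2D list of samples 90 degrees clockwise."""
    return [list(row) for row in zip(*region[::-1])]


def mirror_horizontal(region):
    """Flip each row left-to-right about the vertical center axis."""
    return [row[::-1] for row in region]


def apply_transform_type(packed_region, transform_type):
    """Inversely convert an encapsulation region to its mapping region."""
    if transform_type not in TRANSFORM_TYPES:
        raise ValueError("transform_type must be in [0, 7]")
    degrees, mirror = TRANSFORM_TYPES[transform_type]
    out = [list(row) for row in packed_region]
    for _ in range(degrees // 90):  # rotation first ...
        out = rotate_cw_90(out)
    if mirror:                      # ... then mirroring
        out = mirror_horizontal(out)
    return out


print(apply_transform_type([[1, 2], [3, 4]], 4))  # [[1, 3], [2, 4]]
```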

The syntax of the guard band structure (GuardBand) referenced in Table 1 is shown in Table 3 below:

TABLE 3

The semantics of the syntax shown in Table 3 are as follows. GuardBand(i) represents the guard band of the i-th encapsulation region. left_gb_width[i], right_gb_width[i], top_gb_height[i] and bottom_gb_height[i] represent, respectively, the sizes of the left, right, top and bottom guard bands on the boundary of the i-th encapsulation region. gb_not_used_for_pred_flag[i] indicates whether the guard bands of the i-th encapsulation region are excluded from use in prediction. gb_type[i][j] represents the type of the j-th guard band of the i-th encapsulation region.

FIG. 5 is a schematic diagram of the i-th mapping region and the i-th encapsulation region provided with guard bands according to an exemplary embodiment of the present application. In FIG. 5, 501 denotes the width of the projected image, proj_picture_width; 502 denotes the height of the projected image, proj_picture_height; 503 denotes the left offset of the i-th mapping region, proj_reg_left[i]; 504 denotes the top offset of the i-th mapping region, proj_reg_top[i]; 505 denotes the height of the i-th mapping region, proj_reg_height[i]; 506 denotes the width of the i-th mapping region, proj_reg_width[i]; 507 denotes the width of the packed image, packed_picture_width; 508 denotes the height of the packed image, packed_picture_height; 509 denotes the left offset of the i-th encapsulation region, packed_reg_left[i]; 510 denotes the top offset of the i-th encapsulation region, packed_reg_top[i]; 511 denotes the height of the i-th encapsulation region, packed_reg_height[i]; 512 denotes the width of the i-th encapsulation region, packed_reg_width[i]; 513 denotes the size of the top guard band on the boundary of the i-th encapsulation region, top_gb_height[i]; 514 denotes the size of the left guard band, left_gb_width[i]; 515 denotes the size of the right guard band, right_gb_width[i]; and 516 denotes the size of the bottom guard band, bottom_gb_height[i].
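The layout in FIG. 5 suggests a simple geometric check that an encoder could run on each encapsulation region. The Python sketch below assumes, as the figure depicts, that the four guard bands lie immediately outside the region boundary; the class and method names are hypothetical, and the check itself is an illustration rather than a constraint quoted from the standard.

```python
from dataclasses import dataclass


@dataclass
class PackedRegionWithGuardBand:
    # Field names follow Tables 2 and 3; all sizes are in samples.
    packed_reg_left: int
    packed_reg_top: int
    packed_reg_width: int
    packed_reg_height: int
    left_gb_width: int = 0
    right_gb_width: int = 0
    top_gb_height: int = 0
    bottom_gb_height: int = 0

    def outer_extent(self):
        """(left, top, width, height) of the region including its guard bands.

        Assumes each guard band sits just outside the matching region edge.
        """
        left = self.packed_reg_left - self.left_gb_width
        top = self.packed_reg_top - self.top_gb_height
        width = self.left_gb_width + self.packed_reg_width + self.right_gb_width
        height = self.top_gb_height + self.packed_reg_height + self.bottom_gb_height
        return left, top, width, height

    def fits_in(self, packed_picture_width, packed_picture_height):
        """True if the region plus its guard bands lies inside the packed image."""
        left, top, width, height = self.outer_extent()
        return (left >= 0 and top >= 0
                and left + width <= packed_picture_width
                and top + height <= packed_picture_height)


r = PackedRegionWithGuardBand(4, 4, 16, 8, 2, 2, 2, 2)
print(r.outer_extent())   # (2, 2, 20, 12)
print(r.fits_in(24, 16))  # True
```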

As the syntax and semantics of the region encapsulation data box show, in existing region encapsulation technology for immersive media both the mapping region and the encapsulation region are rectangular (including squares). The conversion from a mapping region to an encapsulation region may apply any of the following conversions to the mapping region: (1) no conversion; (2) horizontal mirroring; (3) counterclockwise rotation by 180 degrees; (4) counterclockwise rotation by 180 degrees followed by horizontal mirroring; (5) counterclockwise rotation by 90 degrees followed by horizontal mirroring; (6) counterclockwise rotation by 90 degrees; (7) counterclockwise rotation by 270 degrees followed by horizontal mirroring; (8) counterclockwise rotation by 270 degrees. Correspondingly, the conversion from an encapsulation region to a mapping region may apply the inverse of any of these 8 conversions to the encapsulation region.

In practice, the inventors of the present application found that region encapsulation of immersive media under the existing coding standards described above has a problem: when a rectangular (non-square) mapping region is rotated, the encoding device incurs large storage and computation overhead during region encapsulation; similarly, when a rectangular encapsulation region is rotated, the decoding device incurs large storage and computation overhead during region decapsulation. Based on this finding, and in order to reduce the storage and computation overhead of the immersion medium in region encapsulation and region decapsulation, the embodiments of the present application propose a data processing scheme for immersive media that makes the following two improvements to region encapsulation under the existing coding standard:

The first improvement: only square mapping regions may be rotated during region encapsulation; similarly, only square encapsulation regions may be rotated during region decapsulation.

Based on the first improvement, the embodiments of the present application modify the syntax and semantics of the region encapsulation data box of the immersion medium under the existing coding standard, specifically the syntax of the rectangular region packing structure (RectRegionPacking) shown in Table 2. The modified RectRegionPacking syntax is shown in Table 4 below:

TABLE 4

The semantics of the fields of Table 4 that are modified relative to Table 2 are as follows. horizontal_flip_flag[i] indicates whether horizontal mirroring is required when the i-th encapsulation region is inversely converted to the i-th mapping region: a value of 0 means horizontal mirroring is not required, and a value of 1 means it is required. Equivalently, horizontal_flip_flag[i] can be read as indicating whether horizontal mirroring is performed when the i-th mapping region is converted to the i-th encapsulation region, with the same meanings of 0 and 1.

rotation_flag[i] indicates whether rotation is required when the i-th encapsulation region is inversely converted to the i-th mapping region: 0 means rotation is not required, and 1 means rotation is required. When the width of the encapsulation region is not equal to its height, rotation_flag[i] shall be 0. Equivalently, rotation_flag[i] can be read as indicating whether the i-th mapping region is rotated when converted to the i-th encapsulation region, with the same meanings of 0 and 1; when the width of the mapping region is not equal to its height, rotation_flag[i] shall be 0.

rotation_degree[i] indicates the rotation angle applied when the i-th encapsulation region is inversely converted to the i-th mapping region, expressed in units of 90 degrees; its value shall not be 0. Equivalently, rotation_degree[i] can be read as the rotation angle applied when the i-th mapping region is converted to the i-th encapsulation region, with the same unit and constraint.
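The constraints introduced by Table 4 can be collected into a single validity check. The Python sketch below is a hypothetical in-memory form of the modified fields for one region (the top and left offsets are omitted for brevity). It assumes rotation_degree takes the values 1, 2 or 3, matching the 90, 180 and 270 degree examples given later in this description; the text itself only states that the value is non-zero.

```python
from dataclasses import dataclass


@dataclass
class ModifiedRectRegionPacking:
    """Hypothetical in-memory form of the Table 4 fields for region i."""
    proj_reg_width: int
    proj_reg_height: int
    packed_reg_width: int
    packed_reg_height: int
    horizontal_flip_flag: int = 0
    rotation_flag: int = 0
    rotation_degree: int = 0  # units of 90 degrees; meaningful only when rotation_flag == 1

    def validate(self):
        sizes = (self.proj_reg_width, self.proj_reg_height,
                 self.packed_reg_width, self.packed_reg_height)
        if any(v <= 0 for v in sizes):
            raise ValueError("region widths and heights must be greater than 0")
        if self.horizontal_flip_flag not in (0, 1) or self.rotation_flag not in (0, 1):
            raise ValueError("flags must be 0 or 1")
        if self.rotation_flag == 1:
            # First improvement: only square regions may be rotated.
            if (self.proj_reg_width != self.proj_reg_height
                    or self.packed_reg_width != self.packed_reg_height):
                raise ValueError("rotation_flag must be 0 for a non-square region")
            # Assumed range: 90, 180 or 270 degrees (never 0).
            if self.rotation_degree not in (1, 2, 3):
                raise ValueError("rotation_degree must be 1, 2 or 3 (x 90 degrees)")


ModifiedRectRegionPacking(64, 64, 64, 64, rotation_flag=1, rotation_degree=1).validate()
```

A rectangular region may still set horizontal_flip_flag to 1; only rotation is restricted to squares.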

The second improvement: a rectangular mapping region that needs to be rotated and satisfies a splitting condition may be split into a plurality of square mapping regions, which are then rotated; similarly, a rectangular encapsulation region that needs to be rotated and satisfies the splitting condition may be split into a plurality of square encapsulation regions, which are then rotated.

With these improvements, the data processing scheme for immersive media provided by the embodiments of the present application can greatly reduce the storage and computation overhead of the immersion medium in region encapsulation and region decapsulation. The following example compares the two approaches. FIG. 6a is a schematic diagram of the effect of rotating a rectangular mapping region according to an exemplary embodiment of the present application; FIG. 6b is a schematic diagram of the effect of splitting a rectangular mapping region into square mapping regions and then rotating them according to an exemplary embodiment of the present application. According to the computation requirements of region encapsulation, the calculation matrix required by a mapping region has size max(width, height) × max(width, height), where width and height are the width and height of the mapping region. That is, during region encapsulation, the size of the calculation matrix used to convert each pixel of a mapping region depends on the larger of its width and height. As shown in FIG. 6a, the rectangular mapping region has height H and width 4H, so the calculation matrix it requires during region encapsulation is 4H × 4H = 16H². As shown in FIG. 6b, under the improved scheme of the embodiments, the rectangular mapping region of FIG. 6a is split into 4 square mapping regions of side length H, each of which is rotated separately; each square mapping region requires an H × H calculation matrix, so the 4 square mapping regions together require 4 × H × H = 4H².
By comparison, splitting the rectangular mapping region that needs to be rotated into square mapping regions before rotating reduces the size of the calculation matrix required for rotation by 3/4 (from 16H² to 4H²), so the storage occupied by the calculation matrix also falls by 3/4, greatly reducing storage overhead. The smaller calculation matrix also greatly reduces computational complexity, and hence the computation overhead of region encapsulation, further improving the coding and decoding efficiency of the immersion medium.
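The arithmetic of FIG. 6a and FIG. 6b can be checked with a few lines of Python. H = 16 is an arbitrary example value, the helper names are hypothetical, and the helpers simply encode the max(width, height) × max(width, height) sizing rule stated above, assuming the longer side is an exact multiple of the shorter side.

```python
def calc_matrix_elements(width, height):
    """Elements in the max(width, height) x max(width, height) calculation matrix."""
    side = max(width, height)
    return side * side


def split_then_rotate_elements(width, height):
    """Total matrix elements when the region is split into squares of the shorter side.

    Assumes the longer side is an exact multiple of the shorter side.
    """
    short_side, long_side = min(width, height), max(width, height)
    if long_side % short_side != 0:
        raise ValueError("region cannot be split evenly into squares")
    num_squares = long_side // short_side
    return num_squares * calc_matrix_elements(short_side, short_side)


H = 16
whole = calc_matrix_elements(4 * H, H)        # 4H x 4H
split = split_then_rotate_elements(4 * H, H)  # 4 x (H x H)
print(whole, split, 1 - split / whole)        # 4096 1024 0.75
```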

FIG. 7 is a flowchart of a data processing method for immersive media according to an exemplary embodiment of the present application. The method is performed by an encoding device in an immersive media system and includes the following steps S701-S702:

S701: Acquire the region encapsulation data box of the immersion medium, which includes region shape information and region processing information; when the region processing information indicates that the i-th mapping region of the immersion medium needs to undergo the first conversion processing, the region shape information indicates that the shape of the i-th mapping region is the target shape, where i is a positive integer.

S702: Perform region encapsulation processing on the immersion medium according to the region encapsulation data box.

In steps S701 to S702, the syntax of the region encapsulation data box (Region Wise Packing Box) is given in Table 1, Table 4 and Table 3 above. The region shape information is information indicating the shape of a region (a mapping region and/or an encapsulation region); as Table 4 shows, it includes the width proj_reg_width[i] and the height proj_reg_height[i] of the i-th mapping region. When proj_reg_width[i] is equal to proj_reg_height[i], the region shape information indicates that the i-th mapping region has the target shape, which is a square. When proj_reg_width[i] is not equal to proj_reg_height[i], the region shape information indicates that the i-th mapping region is a rectangle.

The region processing information is information indicating what processing should be performed on a region (a mapping region and/or an encapsulation region). It includes a flag of the first conversion processing: when the flag is a valid value, the region processing information indicates that the i-th mapping region needs to undergo the first conversion processing; when the flag is an invalid value, it indicates that the i-th mapping region does not need to undergo the first conversion processing. The valid and invalid values are set according to the requirements of the coding standard; for example, under the existing AVS standard the valid value is 1 and the invalid value is 0. The region processing information further includes the amplitude of the first conversion processing; the amplitude is non-zero and is an integer multiple of the conversion step. When the flag of the first conversion processing is a valid value, step S702 specifically includes performing the first conversion processing on the i-th mapping region according to the amplitude.

In one embodiment, the first conversion processing is counterclockwise rotation. As Table 4 shows, the flag of the first conversion processing is the rotation flag rotation_flag[i], whose valid value is 1; that is, rotation_flag[i] equal to 1 indicates that the i-th mapping region needs to be rotated. In this embodiment, the amplitude of the first conversion processing is the rotation angle rotation_degree[i] and the conversion step is 90 degrees, so the mapping region is rotated counterclockwise in units of 90 degrees, for example by 90, 180 or 270 degrees. Step S702 then specifically includes rotating the i-th mapping region counterclockwise by the rotation angle indicated by rotation_degree[i].

In addition, the area processing information may further include a flag of the second conversion processing; when the flag of the second conversion process is a valid value, the area processing information is used to indicate that the ith mapping area needs to be subjected to the second conversion process. When the flag of the second conversion process is an invalid value, the area processing information is used to indicate that the i-th mapping area does not need to be subjected to the second conversion process.

In one embodiment, the second conversion processing is horizontal mirroring. As Table 4 shows, the flag of the second conversion processing is horizontal_flip_flag[i], whose valid value is 1; horizontal_flip_flag[i] equal to 1 indicates that horizontal mirroring is required. Step S702 then specifically includes performing horizontal mirroring on the i-th mapping region. Horizontal mirroring is a symmetric flip in the horizontal direction about the vertical axis through the region's center point, like a reflection in a mirror.
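The encoder-side behaviour of step S702 for one mapping region can be sketched in Python as follows. The function name is hypothetical, and the description does not state the order of mirroring and rotation for the modified syntax; this sketch assumes the encoder mirrors first and then rotates counterclockwise, so that a decoder that rotates clockwise first and mirrors second (the order Table 2 uses) exactly undoes it.

```python
def rotate_ccw_90(region):
    """Rotate a 2D list of samples 90 degrees counterclockwise."""
    return [list(row) for row in zip(*region)][::-1]


def mirror_horizontal(region):
    """Flip each row left-to-right about the vertical center axis."""
    return [row[::-1] for row in region]


def pack_mapping_region(region, rotation_flag, rotation_degree, horizontal_flip_flag):
    """Encoder side (S702): convert the i-th mapping region to its encapsulation region.

    Assumed order: horizontal mirroring first, then counterclockwise rotation.
    """
    height, width = len(region), len(region[0])
    out = [list(row) for row in region]
    if horizontal_flip_flag == 1:  # second conversion processing
        out = mirror_horizontal(out)
    if rotation_flag == 1:         # first conversion processing
        if width != height:
            raise ValueError("only a square mapping region may be rotated")
        if rotation_degree == 0:
            raise ValueError("rotation_degree must be non-zero")
        for _ in range(rotation_degree):  # rotation_degree is in units of 90 degrees
            out = rotate_ccw_90(out)
    return out


print(pack_mapping_region([[1, 2], [3, 4]], 1, 1, 0))  # [[2, 4], [1, 3]]
```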

In this embodiment, the region encapsulation data box of the immersion medium specifies that only a mapping region of the target shape (e.g., a square) may undergo the first conversion processing (e.g., rotation) during region encapsulation. Because a mapping region of the target shape requires a smaller calculation matrix during region encapsulation, this rule can greatly reduce the storage and computation overhead of the immersion medium in region encapsulation.

FIG. 8 is a flowchart of a data processing method for immersive media according to an exemplary embodiment of the present application. The method is performed by a decoding device in an immersive media system and includes the following steps S801-S802:

S801: Acquire the region encapsulation data box of the immersion medium, which includes region shape information and region processing information; when the region processing information indicates that the i-th encapsulation region of the immersion medium needs to undergo the first inverse conversion processing, the region shape information indicates that the shape of the i-th encapsulation region is the target shape, where i is a positive integer.

S802: Perform region decapsulation processing on the immersion medium according to the region encapsulation data box.

In steps S801 to S802, the syntax of the region encapsulation data box (Region Wise Packing Box) is given in Table 1, Table 4 and Table 3 above. The region shape information is information indicating the shape of a region (a mapping region and/or an encapsulation region); as Table 4 shows, it includes the width packed_reg_width[i] and the height packed_reg_height[i] of the i-th encapsulation region. When packed_reg_width[i] is equal to packed_reg_height[i], the region shape information indicates that the i-th encapsulation region has the target shape, which is a square. When packed_reg_width[i] is not equal to packed_reg_height[i], the region shape information indicates that the i-th encapsulation region is a rectangle.

The region processing information is information indicating what processing should be performed on a region (a mapping region and/or an encapsulation region). It includes a flag of the first conversion processing: when the flag is a valid value, the region processing information indicates that the i-th encapsulation region needs to undergo the first inverse conversion processing; when the flag is an invalid value, it indicates that the i-th encapsulation region does not need to undergo the first inverse conversion processing. The region processing information further includes the amplitude of the first conversion processing; the amplitude is non-zero and is an integer multiple of the conversion step. When the flag of the first conversion processing is a valid value, step S802 specifically includes performing the first inverse conversion processing on the i-th encapsulation region according to the amplitude.

In one embodiment, the first conversion processing is counterclockwise rotation, and the first inverse conversion processing, being the reverse of the first conversion processing, is clockwise rotation. As Table 4 shows, the flag of the first conversion processing is the rotation flag rotation_flag[i], whose valid value is 1; rotation_flag[i] equal to 1 indicates that the i-th encapsulation region needs to be rotated. In this embodiment, the amplitude is the rotation angle rotation_degree[i] and the conversion step is 90 degrees, so the encapsulation region is rotated clockwise in units of 90 degrees, for example by 90, 180 or 270 degrees. Step S802 then specifically includes rotating the i-th encapsulation region clockwise by the rotation angle indicated by rotation_degree[i].

In addition, the area processing information may further include a flag of the second conversion processing; when the flag of the second conversion process is a valid value, the area processing information is used to indicate that the ith encapsulation area needs to be subjected to the second inverse conversion process. When the flag of the second conversion process is an invalid value, the area processing information is used to indicate that the ith encapsulation area does not need to be subjected to the second inverse conversion process.

In one embodiment, the second conversion process is horizontal mirroring, and the second reverse conversion process is a process reverse to the second conversion process, and the second reverse conversion process is also horizontal mirroring. Referring to table 4, the flag of the second conversion process is horizontal _ flip _ flag [ i ], the effective value is 1, and the value of the horizontal _ flip _ flag [ i ] is 1, which indicates that horizontal mirroring is required; then step S802 specifically includes: horizontal mirroring is performed on the ith encapsulation area.
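The decoder-side counterpart, step S802 for one encapsulation region, can be sketched the same way. The function name is hypothetical; the sketch assumes the inverse conversion rotates clockwise first and mirrors second, following the rotate-then-mirror order that Table 2 specifies for transform_type, which the modified syntax does not state explicitly.

```python
def rotate_cw_90(region):
    """Rotate a 2D list of samples 90 degrees clockwise."""
    return [list(row) for row in zip(*region[::-1])]


def mirror_horizontal(region):
    """Flip each row left-to-right about the vertical center axis."""
    return [row[::-1] for row in region]


def unpack_encapsulation_region(region, rotation_flag, rotation_degree, horizontal_flip_flag):
    """Decoder side (S802): inversely convert the i-th encapsulation region.

    Assumed order: clockwise rotation first, then horizontal mirroring.
    """
    height, width = len(region), len(region[0])
    out = [list(row) for row in region]
    if rotation_flag == 1:         # first inverse conversion processing
        if width != height:
            raise ValueError("only a square encapsulation region may be rotated")
        if rotation_degree == 0:
            raise ValueError("rotation_degree must be non-zero")
        for _ in range(rotation_degree):  # rotation_degree is in units of 90 degrees
            out = rotate_cw_90(out)
    if horizontal_flip_flag == 1:  # second inverse conversion processing
        out = mirror_horizontal(out)
    return out


print(unpack_encapsulation_region([[1, 3], [2, 4]], 1, 1, 1))  # [[1, 2], [3, 4]]
```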

In this embodiment, the region encapsulation data box of the immersion medium specifies that only an encapsulation region of the target shape (e.g., a square) may undergo the first inverse conversion processing (e.g., rotation) during region decapsulation. Because an encapsulation region of the target shape requires a smaller calculation matrix during region decapsulation, this rule can greatly reduce the storage and computation overhead of the immersion medium in region decapsulation.

FIG. 9 is a flowchart of a data processing method for immersive media according to an exemplary embodiment of the present application. The method is performed by an encoding device in an immersive media system and includes the following steps S901-S905:

S901: Acquire a projected image of the immersion medium and the region configuration of the projected image.

The region configuration is configuration information that the encoding device sets for region encapsulation according to actual requirements, which may include, but are not limited to, coding efficiency requirements, coding computation requirements and the coding capability of the encoding device itself. The region configuration may include a partition configuration and a region processing configuration. The partition configuration describes the partition plan of the projected image, i.e., how the projected image is divided into a plurality of mapping regions. The region processing configuration describes the conversion processing plan of the mapping regions, i.e., how each mapping region in the projected image is to be converted.

S902: Divide N mapping regions from the projected image according to the region configuration, where N is a positive integer.

In one embodiment, step S902 may include the following steps S11-S12:

s11: Divide the projected image into a plurality of initial regions according to the partition configuration.

s12: Convert the plurality of initial regions into N mapping regions according to the region processing configuration.

In steps s11-s12, the initial regions obtained by dividing the projected image according to the partition configuration may be square or rectangular. Under the improved design of the embodiments of the present application, only square mapping regions can be rotated, so step s12 must determine, from the region processing configuration, whether an initial region needs to be rotated and, from the initial region's shape, whether it can be rotated as a mapping region. Let the target initial region denote any one of the plurality of initial regions; taking it as an example, step s12 converts the target initial region into mapping regions according to the region configuration as follows:

(1) if the shape of the target initial region is the target shape (i.e., square), the target initial region is directly set as the mapping region.

In this way, the target initial region can be directly used as the mapping region, and since the shape of the target initial region is a square, when the target initial region needs to be rotated, the target initial region is allowed to be rotated.

(2) If the target initial region has a non-target shape (i.e., a rectangle) and the region processing configuration indicates that it needs to undergo the first conversion processing, the target initial region is split into at least one mapping region of the target shape.

In this case, the first conversion processing may be counterclockwise rotation. Because the target initial region is rectangular, the improved rule of the embodiments does not allow it to be rotated directly, since doing so would incur large storage and computation overhead; it must instead be split into at least one square mapping region, which is then rotated. Before splitting, it may be determined whether the target initial region satisfies a splitting condition, i.e., a condition under which it can be split into square mapping regions. If it does, one or more square mapping regions are split from it using a suitable splitting manner. If it does not, i.e., no splitting manner can yield square mapping regions from it, the target initial region may be used directly as a mapping region, but that mapping region is marked as non-rotatable: in the syntax shown in Table 4, its rotation flag is set to an invalid value (for example, 0).

(3) If the shape of the target initial region is a non-target shape and the region processing configuration indicates that the target initial region does not need to be subjected to the first conversion processing, the target initial region is set as the mapping region.

In this case, the first conversion processing may be counterclockwise rotation. Although the target initial region is rectangular, it does not need to be rotated and therefore adds no rotation-related storage or computation overhead to region encapsulation, so it can be used directly as a mapping region; it may still undergo conversion processing other than rotation, such as horizontal mirroring.
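The three cases of step s12 can be sketched in Python for a single target initial region. The description does not define the splitting condition precisely, so this sketch assumes a deliberately simple one: the longer side must be an exact multiple of the shorter side, and the rectangle is cut along its longer side into squares of the shorter side. All names (MappingRegion, meets_split_condition, to_mapping_regions) are hypothetical.

```python
from typing import List, NamedTuple


class MappingRegion(NamedTuple):
    left: int
    top: int
    width: int
    height: int
    rotatable: bool  # whether rotation_flag may later be set to 1


def meets_split_condition(width: int, height: int) -> bool:
    """Hypothetical splitting condition: the longer side is a multiple of the shorter one."""
    return max(width, height) % min(width, height) == 0


def to_mapping_regions(left: int, top: int, width: int, height: int,
                       needs_rotation: bool) -> List[MappingRegion]:
    """Step s12 for one target initial region given in projected-image coordinates."""
    if width == height:
        # Case (1): square, used directly and allowed to rotate.
        return [MappingRegion(left, top, width, height, True)]
    if not needs_rotation or not meets_split_condition(width, height):
        # Case (3), or case (2) when the splitting condition fails:
        # keep the rectangle as one mapping region that must not be rotated.
        return [MappingRegion(left, top, width, height, False)]
    # Case (2): split the rectangle into squares along its longer side.
    side = min(width, height)
    count = max(width, height) // side
    if width > height:
        return [MappingRegion(left + k * side, top, side, side, True) for k in range(count)]
    return [MappingRegion(left, top + k * side, side, side, True) for k in range(count)]


for r in to_mapping_regions(0, 0, 64, 16, needs_rotation=True):
    print(r)
```

A real encoder could use a more general splitting manner (for example, peeling off squares of different sizes); the exact-multiple rule is chosen here only to keep the sketch short.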

S903, generating a region encapsulation data box of the immersion medium according to the region configuration and the N mapping regions. The region encapsulation data box includes region shape information and region processing information; when the region processing information indicates that the ith mapping region of the immersion medium needs to be subjected to the first conversion process, the region shape information indicates that the shape of the ith mapping region is the target shape, where i is a positive integer and 1 ≤ i ≤ N.

The region processing information indicates which conversion processes should be performed on each mapping region. The syntax of the region encapsulation data box (RegionWisePackingBox) is given in Tables 1, 3 and 4 above. Step S903 is the process in which the encoding device assigns values to the corresponding syntax fields of the region encapsulation data box according to actual requirements.

In one embodiment, step S903 specifically includes the following steps s21-s23:

s21, obtaining the width and height of the ith mapping region.

s22, adding the width and height of the ith mapping region to the region shape information in the region encapsulation data box; and

s23, comparing the width and height of the ith mapping region, and setting the region processing information in the region encapsulation data box according to the comparison result and the region configuration.

In steps s21-s22, once the N mapping regions have been divided from the projected image, the width and height of each mapping region are known. The width and height of the ith mapping region are obtained and added to the region shape information in the region encapsulation data box, that is, they are assigned to the two fields proj_reg_width[i] and proj_reg_height[i]: the width of the ith mapping region is assigned to proj_reg_width[i] and its height to proj_reg_height[i]. In step s23, the width and height of the ith mapping region are compared to determine its shape, and the corresponding fields of the region processing information in the region encapsulation data box are assigned accordingly.
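Steps s21-s22 and the start of step s23 amount to copying each region's dimensions into two per-region arrays and comparing them. The sketch below models only `proj_reg_width` and `proj_reg_height`; the `RegionShapeInfo` container is hypothetical and is not the actual box serialization, and indices are 0-based here, whereas the text counts i from 1.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RegionShapeInfo:
    """Hypothetical in-memory view of the per-region shape fields."""
    proj_reg_width: List[int] = field(default_factory=list)
    proj_reg_height: List[int] = field(default_factory=list)


def fill_region_shape_info(info: RegionShapeInfo,
                           regions: List[Tuple[int, int]]) -> None:
    """Steps s21-s22: copy each mapping region's (width, height) into the box."""
    for width, height in regions:
        info.proj_reg_width.append(width)
        info.proj_reg_height.append(height)


def is_target_shape(info: RegionShapeInfo, i: int) -> bool:
    """Start of step s23: region i (0-based) is square iff width == height."""
    return info.proj_reg_width[i] == info.proj_reg_height[i]
```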

In one embodiment, the region processing information includes a flag of the first conversion process, where the first conversion process may be rotation and its flag may be, for example, rotation_flag[i] shown in Table 4; step s23 then specifically includes the following steps s231-s233:

s231, when the width and height of the ith mapping region are not equal, that is, the ith mapping region is rectangular, setting the flag of the first conversion process in the region encapsulation data box to an invalid value, for example, setting rotation_flag[i] to 0 to indicate that the ith mapping region is not rotated.

s232, when the width and height of the ith mapping region are equal, that is, the ith mapping region is square, and the region configuration indicates that the ith mapping region needs to be subjected to the first conversion process, setting the flag of the first conversion process in the region encapsulation data box to a valid value, for example, setting rotation_flag[i] to 1 to indicate that the ith mapping region needs to be rotated.

s233, when the width and height of the ith mapping region are equal, that is, the ith mapping region is square, and the region configuration indicates that the ith mapping region does not need to be subjected to the first conversion process, setting the flag of the first conversion process in the region encapsulation data box to an invalid value, for example, setting rotation_flag[i] to 0 to indicate that the ith mapping region does not need to be rotated.
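The decision in steps s231-s233 reduces to a small function. This sketch assumes, as in the examples above, that 1 is the valid value and 0 the invalid value of `rotation_flag[i]`; the function name is illustrative.

```python
def rotation_flag(width: int, height: int, config_requests_rotation: bool) -> int:
    """Return the value to write to rotation_flag[i] (1 = valid, 0 = invalid)."""
    if width != height:
        # s231: rectangular region, rotation is never signalled.
        return 0
    # s232 / s233: square region, follow the region configuration.
    return 1 if config_requests_rotation else 0
```

Note that a rectangular region yields 0 even when the configuration asks for rotation; in that situation the region is expected to have been split into squares beforehand.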

In one embodiment, the region processing information includes a magnitude of the first conversion process, which may be, for example, a rotation angle. Step s23 then specifically includes the following step s234:

s234, when the width and height of the ith mapping region are equal, that is, the ith mapping region is square, and the region configuration indicates that the ith mapping region needs to be subjected to the first conversion process, setting the magnitude of the first conversion process in the region encapsulation data box as indicated by the region configuration. For example, if the region configuration indicates that the ith mapping region needs to be rotated by 90 degrees, rotation_degree[i] in the region encapsulation data box is set to 90 degrees.

In one embodiment, the region processing information includes a flag of the second conversion process, which may be horizontal mirroring; step s23 then specifically includes the following steps s235-s236:

s235, if the region configuration indicates that the ith mapping region needs to be subjected to the second conversion process, setting the flag of the second conversion process in the region encapsulation data box to a valid value, for example, setting horizontal_flip_flag[i] to 1.

s236, if the region configuration indicates that the ith mapping region does not need to be subjected to the second conversion process, setting the flag of the second conversion process in the region encapsulation data box to an invalid value, for example, setting horizontal_flip_flag[i] to 0.

S904, performing region encapsulation processing on the projected image according to the region encapsulation data box to obtain an encapsulated image of the immersion medium, where the encapsulated image includes N encapsulation regions and each mapping region corresponds to one encapsulation region.

S905, encoding the encapsulated image to obtain an encoded file of the immersion medium.

The encoded file is a video bitstream. As described for the embodiment of Fig. 1, after obtaining the video bitstream in step S905, the encoding device further encapsulates the video bitstream and the audio bitstream into a file according to the file format of the immersive media, forming a media file resource of the immersive media. The region encapsulation data box generated in step S903 is encapsulated into the media presentation description information of the media file resource, so that the decoding device can present the media file resource according to that media presentation description information.

In this embodiment of the application, the region encapsulation data box of the immersion medium specifies that only mapping regions of the target shape (such as a square) may be subjected to the first conversion process (such as rotation) during region encapsulation. Because a mapping region of the target shape requires only a small computation matrix during region encapsulation, this restriction can greatly reduce the storage and computation overhead of region encapsulation and helps improve the encoding efficiency of the immersion medium.
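This section does not spell out the matrix computation behind the overhead claim. One plausible illustration, offered as an assumption rather than as the method of the application, is buffer usage: a square block keeps its dimensions when rotated by 90 degrees and can be rotated in place, whereas a straightforward rotation of an h×w block produces a w×h block and is written here to a second buffer.

```python
from typing import List

Matrix = List[List[int]]


def rotate_ccw_square_in_place(m: Matrix) -> None:
    """Rotate an n x n block 90 degrees counterclockwise, reusing its own storage."""
    n = len(m)
    for r in range(n):  # transpose in place
        for c in range(r + 1, n):
            m[r][c], m[c][r] = m[c][r], m[r][c]
    m.reverse()  # reversing the row order turns the transpose into a CCW rotation


def rotate_ccw_rect(m: Matrix) -> Matrix:
    """Rotate an h x w block 90 degrees counterclockwise into a new w x h buffer."""
    h, w = len(m), len(m[0])
    return [[m[r][w - 1 - c] for r in range(h)] for c in range(w)]
```

In-place rotation of a non-square block is possible with cycle-following techniques, but it is considerably more involved than the square case shown here.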

FIG. 10 is a flowchart of a data processing method for immersive media according to an exemplary embodiment of the present application. The method is performed by a decoding device in an immersive media system and includes the following steps S1001-S1004.

S1001, decoding the encoded file of the immersion medium to obtain a packaged image of the immersion medium, wherein the packaged image comprises N packaged areas, and N is a positive integer.

The decoding device receives the media file resource and the media presentation description information of the immersion medium transmitted by the encoding device, and first decapsulates the media file resource to obtain the audio bitstream and the video bitstream of the immersion medium. This video bitstream is the encoded file of step S1001. In this embodiment, the encoded file is the video bitstream produced by the encoding device through the steps of the embodiment shown in Fig. 9, that is, the bitstream obtained by encoding the encapsulated image after region encapsulation. In this step, the decoding device decodes the video bitstream to obtain the encapsulated image of the immersion medium. It can be understood that if the video bitstream was obtained by directly encoding the projected image, decoding it yields the projected image.

S1002, acquiring a region packaging data box of the immersion medium, wherein the region packaging data box comprises region shape information and region processing information; when the region processing information indicates that an ith packaged region of the immersion medium needs to be subjected to the first inverse conversion process, the region shape information indicates that a shape of the ith packaged region is a target shape, where i is a positive integer.

S1003, performing region decapsulation processing on the encapsulated image according to the region encapsulation data box to obtain a projected image of the immersion medium, wherein the projected image comprises N mapping regions, and one encapsulation region corresponds to one mapping region.

In steps S1002-S1003, the region processing information indicates which inverse conversion processes should be performed on each encapsulated region. The syntax of the region encapsulation data box (RegionWisePackingBox) is given in Tables 1, 3 and 4 above. Region decapsulation is the inverse of region encapsulation: the encapsulated image is inverse-converted region by region, which turns it back into a projected image. Specifically, each encapsulated region in the encapsulated image is inverse-converted according to the corresponding indication in the region encapsulation data box to obtain a mapping region, and the resulting mapping regions are placed into a 2D image to obtain the projected image. An inverse conversion process undoes the corresponding conversion process; for example, if the conversion process is a 90-degree counterclockwise rotation, the inverse conversion process is a 90-degree clockwise rotation.
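The region decapsulation loop described above can be sketched as: crop each encapsulated region from the encapsulated image, undo its conversion, and paste the result at its position in the projected image. The field names `packed_reg_left`, `packed_reg_top`, `packed_reg_width`, `packed_reg_height`, `proj_reg_left` and `proj_reg_top` are assumptions modeled on the `proj_reg_width[i]` naming used in this document; the per-region inverse conversion is passed in as a callable, and pictures are plain lists of sample rows.

```python
from typing import Callable, Dict, List

Picture = List[List[int]]
Entry = Dict[str, int]  # hypothetical per-region fields of the box


def crop(pic: Picture, left: int, top: int, width: int, height: int) -> Picture:
    return [row[left:left + width] for row in pic[top:top + height]]


def paste(pic: Picture, block: Picture, left: int, top: int) -> None:
    for r, row in enumerate(block):
        pic[top + r][left:left + len(row)] = row


def unpack_regions(packed: Picture, proj_width: int, proj_height: int,
                   entries: List[Entry],
                   inverse: Callable[[Picture, Entry], Picture]) -> Picture:
    """Region decapsulation: undo each encapsulated region's conversion, then place it."""
    proj = [[0] * proj_width for _ in range(proj_height)]
    for e in entries:
        block = crop(packed, e["packed_reg_left"], e["packed_reg_top"],
                     e["packed_reg_width"], e["packed_reg_height"])
        block = inverse(block, e)  # e.g. clockwise rotation and/or mirroring
        paste(proj, block, e["proj_reg_left"], e["proj_reg_top"])
    return proj
```

For example, two 2×2 regions whose positions were swapped during encapsulation are moved back to their projected positions by calling `unpack_regions` with an identity `inverse`.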

The region processing information includes a flag of the first conversion process. When this flag is a valid value, the region processing information indicates that the ith encapsulated region needs to be subjected to the first inverse conversion process; when it is an invalid value, the region processing information indicates that the ith encapsulated region does not need to be subjected to the first inverse conversion process. When the flag of the first conversion process is a valid value, step S1003 specifically includes: performing the first inverse conversion process on the ith encapsulated region to obtain the ith mapping region of the projected image. The region processing information further includes a magnitude of the first conversion process; the magnitude is non-zero and is an integer multiple of the conversion step. When the flag of the first conversion process is a valid value, step S1003 specifically includes: performing the first inverse conversion process on the ith encapsulated region according to the magnitude to obtain the ith mapping region of the projected image.
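A decoder could check the constraint stated in step S1002 before applying any inverse conversion. The following is a hypothetical validation helper, not defined by the application; it assumes valid/invalid flag values of 1/0 and the 90-degree conversion step of the rotation embodiment.

```python
from typing import Optional

CONVERSION_STEP = 90  # degrees, per the rotation embodiment (assumption for this sketch)


def check_region_entry(width: int, height: int, rotation_flag: int,
                       rotation_degree: Optional[int]) -> None:
    """Raise ValueError if a region entry breaks the square-only rotation rule."""
    if rotation_flag not in (0, 1):
        raise ValueError("rotation_flag must be 0 or 1")
    if rotation_flag == 0:
        return  # no first inverse conversion for this region
    if width != height:
        raise ValueError("rotation signalled for a non-square region")
    if (rotation_degree is None or rotation_degree <= 0
            or rotation_degree % CONVERSION_STEP != 0):
        raise ValueError("rotation_degree must be a positive multiple of the step")
```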

In one embodiment, the first conversion process is counterclockwise rotation, and the first inverse conversion process, which undoes it, is clockwise rotation. Referring to Table 4, the flag of the first conversion process is the rotation flag rotation_flag[i], whose valid value is 1; rotation_flag[i] equal to 1 indicates that the ith encapsulated region needs to be rotated. In this embodiment, the magnitude is the rotation angle rotation_degree[i] and the conversion step is 90 degrees, so the encapsulated region is rotated clockwise by a multiple of 90 degrees, for example 90, 180 or 270 degrees. Step S1003 then specifically includes: rotating the ith encapsulated region clockwise by the angle indicated by rotation_degree[i] to obtain the ith mapping region of the projected image.
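The first inverse conversion of the rotation embodiment can be sketched as repeated 90-degree steps on a square block. The helper names are illustrative; the sketch rejects non-square input to mirror the square-only rule, and the test checks that a clockwise rotation by `rotation_degree[i]` undoes a counterclockwise rotation by the same angle.

```python
from typing import List

Block = List[List[int]]


def rotate_cw_90(m: Block) -> Block:
    """Rotate a square block 90 degrees clockwise: (r, c) -> (c, n - 1 - r)."""
    n = len(m)
    return [[m[n - 1 - r][c] for r in range(n)] for c in range(n)]


def rotate_ccw_90(m: Block) -> Block:
    """Rotate a square block 90 degrees counterclockwise: (r, c) -> (n - 1 - c, r)."""
    n = len(m)
    return [[m[r][n - 1 - c] for r in range(n)] for c in range(n)]


def rotate(m: Block, degree: int, clockwise: bool) -> Block:
    """Rotate a square block by a positive multiple of 90 degrees."""
    if any(len(row) != len(m) for row in m):
        raise ValueError("only square regions may be rotated")
    if degree <= 0 or degree % 90:
        raise ValueError("degree must be a positive multiple of 90")
    step = rotate_cw_90 if clockwise else rotate_ccw_90
    for _ in range(degree // 90 % 4):
        m = step(m)
    return m
```

Under these assumptions, an encapsulated region with rotation_flag[i] = 1 and rotation_degree[i] = 270 would be restored with `rotate(block, 270, clockwise=True)`.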

In addition, the region processing information may further include a flag of the second conversion process. When this flag is a valid value, the region processing information indicates that the ith encapsulated region needs to be subjected to the second inverse conversion process; when it is an invalid value, the region processing information indicates that the ith encapsulated region does not need to be subjected to the second inverse conversion process. When the flag of the second conversion process is a valid value, step S1003 specifically includes: performing the second inverse conversion process on the ith encapsulated region to obtain the ith mapping region of the projected image.

In one embodiment, the second conversion process is horizontal mirroring. The second inverse conversion process undoes it and is therefore also horizontal mirroring. Referring to Table 4, the flag of the second conversion process is horizontal_flip_flag[i], whose valid value is 1; horizontal_flip_flag[i] equal to 1 indicates that horizontal mirroring is required. Step S1003 then specifically includes: horizontally mirroring the ith encapsulated region to obtain the ith mapping region of the projected image.
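Horizontal mirroring is its own inverse, which is why the same operation serves as both the second conversion and the second inverse conversion. Unlike rotation, it does not change the block's dimensions, so the sketch below accepts rectangular regions too. The function name is illustrative, and the order in which rotation and mirroring are undone when both flags are set is not specified in this section, so it is not modeled here.

```python
from typing import List

Block = List[List[int]]


def mirror_horizontally(m: Block) -> Block:
    """Reverse each row: column c moves to column (width - 1 - c)."""
    return [row[::-1] for row in m]
```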

S1004, performing three-dimensional reconstruction on the projected image to obtain a three-dimensional image of the immersion medium.

After performing three-dimensional reconstruction on the projected image to obtain the 3D image, and as described for the embodiment shown in Fig. 1, the decoding device renders the audio content obtained by audio decoding and the 3D image obtained by video decoding according to the rendering- and viewport-related metadata in the media presentation description information; once rendering is complete, the 3D image is played and output.

In this embodiment of the application, the region encapsulation data box of the immersion medium specifies that only encapsulated regions of the target shape (such as a square) may be subjected to the first inverse conversion process (such as rotation) during region decapsulation. Because an encapsulated region of the target shape requires only a small computation matrix during region decapsulation, this restriction can greatly reduce the storage and computation overhead of region decapsulation and helps improve the decoding efficiency of the immersion medium.

FIG. 11 is a block diagram of a data processing apparatus for immersive media according to an exemplary embodiment of the present application. The data processing apparatus may be a computer program (including program code) running on the encoding device; for example, it may be application software on the encoding device. Referring to Fig. 11, the data processing apparatus for immersion media includes an acquisition unit 1101 and a processing unit 1102.

In one exemplary embodiment, the immersive media data processing apparatus may be configured to perform the corresponding steps of the method illustrated in FIG. 7; then:

an acquisition unit 1101 for acquiring a region encapsulation data box of the immersion medium, the region encapsulation data box including region shape information and region processing information; when the region processing information indicates that an ith mapping region of the immersion medium needs to be subjected to the first conversion processing, the region shape information indicates that a shape of the ith mapping region is a target shape, wherein i is a positive integer;

a processing unit 1102 for performing region encapsulation processing on the immersion medium according to the region encapsulation data box.

In one embodiment, the region shape information includes a width and a height of the ith mapping region;

the region shape information is used to indicate that the shape of the ith mapping region is the target shape when the width and height of the ith mapping region are equal.

In one embodiment, the region processing information includes a flag of the first conversion process;

when the flag of the first conversion process is a valid value, the area processing information is used to indicate that the ith mapping area needs to be subjected to the first conversion process.

In one embodiment, the region processing information includes a magnitude of the first conversion process; the magnitude is non-zero and is an integer multiple of the conversion step; the processing unit 1102 is specifically configured to perform the first conversion process on the ith mapping region according to the magnitude.

In one embodiment, the first conversion process is counterclockwise rotation; the magnitude is the rotation angle, and the conversion step is 90 degrees.

In one embodiment, the region processing information includes a flag of the second conversion process;

when the flag of the second conversion process is a valid value, the area processing information is used to indicate that the ith mapping area needs to be subjected to the second conversion process.

In one embodiment, the second conversion process is horizontal mirroring.

In another exemplary embodiment, the immersive media data processing apparatus may be used to perform the corresponding steps in the method shown in fig. 9; then:

an acquisition unit 1101 configured to acquire a projected image of the immersion medium and a region configuration of the projected image;

a processing unit 1102, configured to divide N mapping regions from the projected image according to the region configuration, where N is a positive integer; and

generate a region encapsulation data box of the immersion medium according to the region configuration and the N mapping regions, where the region encapsulation data box includes region shape information and region processing information, and when the region processing information indicates that the ith mapping region of the immersion medium needs to be subjected to the first conversion process, the region shape information indicates that the shape of the ith mapping region is the target shape, where i is a positive integer and 1 ≤ i ≤ N.

In one embodiment, the region configuration includes a partition configuration and a region processing configuration; the processing unit 1102 is specifically configured to:

divide the projected image according to the partition configuration to obtain a plurality of initial regions; and

convert the plurality of initial regions into N mapping regions according to the region processing configuration.

In one embodiment, the processing unit 1102 converts a target initial region into mapping regions according to the region processing configuration as follows:

if the shape of the target initial region is the target shape, set the target initial region as a mapping region;

if the shape of the target initial region is a non-target shape and the region processing configuration indicates that the target initial region needs to be subjected to the first conversion process, split the target initial region into at least one mapping region of the target shape;

if the shape of the target initial region is a non-target shape and the region processing configuration indicates that the target initial region does not need to be subjected to the first conversion process, set the target initial region as a mapping region;

where the target initial region is any one of the plurality of initial regions.

In one embodiment, the processing unit 1102 is specifically configured to:

obtain the width and height of the ith mapping region;

add the width and height of the ith mapping region to the region shape information in the region encapsulation data box; and

compare the width and height of the ith mapping region, and set the region processing information in the region encapsulation data box according to the comparison result and the region configuration.

In one embodiment, the region processing information includes a flag of the first conversion process; the processing unit 1102 is specifically configured to:

when the width and height of the ith mapping region are not equal, set the flag of the first conversion process in the region encapsulation data box to an invalid value;

when the width and height of the ith mapping region are equal and the region configuration indicates that the ith mapping region needs to be subjected to the first conversion process, set the flag of the first conversion process in the region encapsulation data box to a valid value;

when the width and height of the ith mapping region are equal and the region configuration indicates that the ith mapping region does not need to be subjected to the first conversion process, set the flag of the first conversion process in the region encapsulation data box to an invalid value.

In one embodiment, the region processing information includes a magnitude of the first conversion process;

when the width and height of the ith mapping region are equal, if the region configuration indicates that the ith mapping region needs to be subjected to the first conversion process, the magnitude of the first conversion process in the region encapsulation data box is set in accordance with the indication of the region configuration.

In one embodiment, the region processing information includes a flag of the second conversion process; the processing unit 1102 is further configured to:

if the region configuration indicates that the ith mapping region needs to be subjected to the second conversion process, set the flag of the second conversion process in the region encapsulation data box to a valid value;

if the region configuration indicates that the ith mapping region does not need to be subjected to the second conversion process, set the flag of the second conversion process in the region encapsulation data box to an invalid value.

According to an embodiment of the present invention, the units of the data processing apparatus for immersion media shown in Fig. 11 may be separately or wholly combined into one or several other units, or one or more of the units may be further split into multiple functionally smaller units; either arrangement achieves the same operations without affecting the technical effects of the embodiments of the present invention. The units are divided by logical function; in practical applications, the function of one unit may be implemented by multiple units, or the functions of multiple units may be implemented by one unit. In other embodiments of the present application, the data processing apparatus for immersion media may also include other units, and in practical applications these functions may be implemented with the assistance of other units or jointly by multiple units. According to another embodiment of the present application, the data processing apparatus for immersion media shown in Fig. 11 may be constructed, and the data processing method for immersion media of the embodiments of the present application implemented, by running a computer program (including program code) capable of executing the steps of the corresponding method shown in Fig. 7 or Fig. 9 on a general-purpose computing device, such as a computer, that includes processing and storage elements such as a central processing unit (CPU), random access memory (RAM) and read-only memory (ROM). The computer program may, for example, be recorded on a computer-readable recording medium and loaded into and executed on the above computing device via that medium.

In this embodiment of the application, the region encapsulation data box of the immersion medium specifies that only mapping regions of the target shape (such as a square) may be subjected to the first conversion process (such as rotation) during region encapsulation. Because a mapping region of the target shape requires only a small computation matrix during region encapsulation, this restriction can greatly reduce the storage and computation overhead of region encapsulation and helps improve the encoding efficiency of the immersion medium.

FIG. 12 is a block diagram of a data processing apparatus for immersive media according to an exemplary embodiment of the present application. The data processing apparatus may be a computer program (including program code) running on the decoding device; for example, it may be application software on the decoding device. Referring to Fig. 12, the data processing apparatus for immersion media includes an acquisition unit 1201 and a processing unit 1202.

In one exemplary embodiment, the immersive media data processing apparatus may be configured to perform the corresponding steps of the method illustrated in FIG. 8; then:

an acquisition unit 1201 for acquiring a region encapsulation data box of the immersion medium, the region encapsulation data box including region shape information and region processing information; when the region processing information indicates that an ith packaged region of the immersion medium needs to be subjected to the first inverse conversion processing, the region shape information indicates that a shape of the ith packaged region is a target shape, wherein i is a positive integer;

a processing unit 1202 for performing region decapsulation processing on the immersion medium according to the region encapsulation data box.

In one embodiment, the region shape information includes a width and a height of the ith encapsulation region;

the region shape information indicates that the shape of the ith packaging region is a target shape when the width and height of the ith packaging region are equal.

In one embodiment, the region processing information includes a flag of the first conversion process; when the flag of the first conversion process is a valid value, the region processing information indicates that the ith encapsulation region needs to be subjected to the first inverse conversion process.

In one embodiment, the region processing information includes a magnitude of the first conversion process; the magnitude is non-zero and is an integer multiple of the conversion step; the processing unit 1202 is specifically configured to perform the first inverse conversion process on the ith encapsulation region according to the magnitude.

In one embodiment, the first conversion process is counterclockwise rotation; the first inverse conversion process is clockwise rotation; the magnitude is the rotation angle, and the conversion step is 90 degrees.

In one embodiment, the region processing information includes a flag of the second conversion process; when the flag of the second conversion process is a valid value, the region processing information indicates that the ith encapsulation region needs to be subjected to the second inverse conversion process.

In one embodiment, the second inverse conversion process is horizontal mirroring.

In another exemplary embodiment, the immersive media data processing apparatus may be used to perform the corresponding steps in the method shown in fig. 10; then:

the processing unit 1202 is configured to decode the encoded file of the immersion medium to obtain an encapsulated image of the immersion medium, where the encapsulated image includes N encapsulation regions and N is a positive integer;

an acquisition unit 1201 for acquiring a region encapsulation data box of the immersion medium, the region encapsulation data box including region shape information and region processing information; when the region processing information indicates that the ith encapsulation region of the immersion medium needs to be subjected to the first inverse conversion process, the region shape information indicates that the shape of the ith encapsulation region is the target shape, where i is a positive integer;

the processing unit 1202 is further configured to perform region decapsulation processing on the encapsulated image according to the region encapsulation data box to obtain a projected image of the immersion medium, where the projected image includes N mapping regions and each encapsulation region corresponds to one mapping region; and to perform three-dimensional reconstruction on the projected image to obtain a three-dimensional image of the immersion medium.

In one embodiment, the region processing information includes a flag of the first conversion process; when the flag of the first conversion process is a valid value, the processing unit 1202 is specifically configured to:

perform the first inverse conversion process on the ith encapsulation region to obtain the ith mapping region of the projected image.

In one embodiment, the region processing information includes a magnitude of the first conversion process; the magnitude is non-zero and is an integer multiple of the conversion step; the processing unit 1202 is specifically configured to:

perform the first inverse conversion process on the ith encapsulation region according to the magnitude to obtain the ith mapping region of the projected image.

In one embodiment, the region processing information includes a flag of the second conversion process; when the flag of the second conversion process is a valid value, the processing unit 1202 is further configured to:

perform the second inverse conversion process on the ith encapsulation region to obtain the ith mapping region of the projected image.

According to an embodiment of the present invention, the units in the data processing device of the immersion medium shown in fig. 12 can be combined into one or several other units respectively or totally, or some unit(s) can be further split into a plurality of functionally smaller units to realize the same operation without affecting the realization of the technical effect of the embodiment of the present invention. The units are divided based on logic functions, and in practical application, the functions of one unit can be realized by a plurality of units, or the functions of a plurality of units can be realized by one unit. In other embodiments of the present application, the immersion medium data processing apparatus can also include other elements, and in practical applications, these functions can be facilitated by other elements, and can be achieved by multiple elements in cooperation. According to another embodiment of the present application, the data processing apparatus of the immersion medium shown in fig. 8 or 10 can be constructed by running a computer program (including program code) capable of executing the steps involved in the corresponding method shown in fig. 8 or 10 on a general-purpose computing device, such as a computer, including a Central Processing Unit (CPU), a random access storage medium (RAM), a read-only storage medium (ROM), etc., and a storage element, and the data processing method of the immersion medium of the embodiments of the present application can be implemented. The computer program may be recorded on a computer-readable recording medium, for example, and loaded and executed in the above-described computing apparatus via the computer-readable recording medium.

In this embodiment of the application, the region encapsulation data box of the immersive media provides that, during region decapsulation, the first inverse conversion process (e.g., rotation) may be performed only on encapsulation regions of the target shape (e.g., a square). Because the calculation matrix required during region decapsulation for an encapsulation region of the target shape is small, this provision can greatly reduce the storage and computation overhead of region decapsulation for the immersive media and helps improve its decoding efficiency.
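A geometric property related to this restriction is easy to check. Rotating a square region by 90° leaves its width and height unchanged, so the rotated samples occupy exactly the footprint they came from. Rotating a non-square region, by contrast, swaps its width and height. The Python sketch below only illustrates that property and does not reproduce the calculation-matrix argument above. It represents a region as a list of sample rows, which is an assumption made for illustration.

```python
def rotate_ccw_90(block):
    """Rotate a 2-D block (a list of equal-length rows) 90 degrees counterclockwise."""
    return [list(col) for col in zip(*block)][::-1]


def dims(block):
    """Return (width, height) of a block stored as a list of rows."""
    return (len(block[0]), len(block))


square = [[1, 2],
          [3, 4]]
wide = [[1, 2, 3],
        [4, 5, 6]]

print(dims(square), dims(rotate_ccw_90(square)))  # (2, 2) (2, 2)
print(dims(wide), dims(rotate_ccw_90(wide)))      # (3, 2) (2, 3)
```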

Fig. 13 is a schematic structural diagram of an encoding device according to an exemplary embodiment of the present application. The encoding device may be a computer device used by a provider of the immersive media, and that computer device may be a terminal (e.g., a PC or a smart mobile device such as a smartphone) or a server. As shown in fig. 13, the encoding device includes a capture device 1301, a processor 1302, a memory 1303, and a transmitter 1304, where:

The capture device 1301 is configured to capture a real-world audio-visual scene to obtain the raw data of the immersive media (including audio content and video content that are synchronized in time and space). The capture device 1301 may include, but is not limited to, an audio device, a camera device, and a sensing device. The audio device may include an audio sensor, a microphone, and the like. The camera device may include an ordinary camera, a stereo camera, a light field camera, and the like. The sensing device may include a laser device, a radar device, and the like.

The processor 1302 (or CPU) is the processing core of the encoding device. It is adapted to implement one or more program instructions, and specifically to load and execute them so as to implement the flow of the data processing method for immersive media shown in fig. 7 or fig. 9.

The memory 1303 is a storage device in the encoding device for storing programs and media resources. It is to be understood that the memory 1303 here may include a built-in storage medium of the encoding device and may also include an extended storage medium supported by the encoding device. The memory may be a high-speed RAM or a non-volatile memory, such as at least one disk memory; optionally, it may also be at least one memory located remotely from the processor. The memory provides storage space for the operating system of the encoding device. The storage space also stores a computer program comprising program instructions adapted to be called and executed by the processor to perform the steps of the data processing method for immersive media. In addition, the memory 1303 may be used to store the immersive media file formed after processing by the processor; this file includes media file resources and media presentation description information.

The transmitter 1304 enables the encoding device to exchange transmissions with other devices, in particular the transmission of immersive media between the encoding device and the decoding device. That is, the encoding device transmits the relevant media resources of the immersive media to the decoding device through the transmitter 1304.

Referring again to fig. 13, the processor 1302 may include a converter 1321, an encoder 1322, and an encapsulator 1323, where:

The converter 1321 is configured to perform a series of conversion processes on the captured video content so that it becomes suitable for video encoding of the immersive media. The conversion processes may include stitching and projection and, optionally, region encapsulation. The converter 1321 may convert the captured 3D video content into a 2D image and provide it to the encoder for video encoding.

The encoder 1322 is configured to perform audio encoding on the captured audio content to form an audio code stream of the immersive media, and to perform video encoding on the 2D image produced by the converter 1321 to obtain a video code stream.

The encapsulator 1323 is configured to encapsulate the audio code stream and the video code stream in a file container according to the file format of the immersive media (e.g., ISOBMFF) to form a media file resource of the immersive media. The media file resource may be a media file of the immersive media, or media segments that together form such a media file. The encapsulator 1323 also records the metadata of the media file resources using media presentation description information, as required by the file format of the immersive media. The encapsulated file of the immersive media produced by the encapsulator is stored in the memory and provided to the decoding device on demand for presentation of the immersive media.
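As a rough illustration of how per-region fields such as those in the region encapsulation data box could be serialized into an ISOBMFF-style container, the sketch below writes a plain box. The box consists of a 32-bit big-endian size that includes the 8-byte header, followed by a four-character type. The box type `demo` and the payload layout (a region count, then per region a width, a height, a flag byte, and a 16-bit amplitude) are hypothetical. They are not the syntax defined by this application or by any standard.

```python
import struct


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build an ISOBMFF-style box: 32-bit big-endian size (header included) + 4-byte type."""
    if len(box_type) != 4:
        raise ValueError("box type must be a four-character code")
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def region_payload(regions):
    """Hypothetical payload: 16-bit region count, then per region
    32-bit width, 32-bit height, 8-bit flags, 16-bit amplitude."""
    out = struct.pack(">H", len(regions))
    for width, height, rotate_flag, amplitude, mirror_flag in regions:
        # bit 0: first conversion flag, bit 1: second conversion flag (hypothetical)
        flags = (1 if rotate_flag else 0) | (2 if mirror_flag else 0)
        out += struct.pack(">IIBH", width, height, flags, amplitude)
    return out


box = make_box(b"demo", region_payload([(512, 512, True, 90, False)]))
print(len(box), box[4:8])  # 21 b'demo'
```

The `>` prefix in the format strings selects big-endian byte order with no padding, matching the byte order ISOBMFF uses for box headers.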

In an exemplary embodiment, the processor 1302 (specifically, the components it contains) performs the steps of the data processing method for immersive media shown in fig. 7 by calling one or more instructions in the memory. Specifically, the memory 1303 stores one or more first instructions adapted to be loaded by the processor 1302 to perform the following steps:

In one embodiment, the region processing information includes an amplitude of the first conversion process; the amplitude is a non-zero value and varies in multiples of the conversion step. When the one or more first instructions are loaded by the processor 1302 to perform the step of performing region encapsulation processing on the immersive media according to the region encapsulation data box, they specifically perform the following step: performing the first conversion process on the ith mapping region according to the amplitude.
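The encoder-side step can be sketched as follows, taking the first conversion to be counterclockwise rotation with a 90° conversion step, as stated in claim 5. Representing a mapping region as a list of sample rows and expressing the amplitude in degrees are assumptions made for illustration, and the function name `apply_first_conversion` is hypothetical.

```python
ROTATION_STEP_DEG = 90  # conversion step for the first conversion (claim 5)


def rotate_ccw_90(block):
    """Rotate a 2-D block (a list of equal-length rows) 90 degrees counterclockwise."""
    return [list(col) for col in zip(*block)][::-1]


def apply_first_conversion(region, amplitude_deg):
    """Rotate a square mapping region counterclockwise by amplitude_deg degrees."""
    if amplitude_deg == 0 or amplitude_deg % ROTATION_STEP_DEG != 0:
        raise ValueError("amplitude must be a non-zero multiple of the 90-degree step")
    # The first conversion is only permitted on target-shape (square) regions.
    if any(len(row) != len(region) for row in region):
        raise ValueError("the first conversion applies only to square regions")
    # Each step is one quarter turn; four quarter turns return to the start.
    for _ in range((amplitude_deg // ROTATION_STEP_DEG) % 4):
        region = rotate_ccw_90(region)
    return region
```

For example, `apply_first_conversion([[1, 2], [3, 4]], 90)` returns `[[2, 4], [1, 3]]`, while passing a 2 × 3 region raises `ValueError`.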

In another exemplary embodiment, the processor 1302 (specifically, the components it contains) performs the steps of the data processing method for immersive media shown in fig. 9 by calling one or more instructions in the memory 1303. Specifically, the memory stores one or more second instructions adapted to be loaded by the processor 1302 to perform the following steps:

dividing the projected image into N mapping regions according to the region configuration, where N is a positive integer;

In one embodiment, the region configuration includes a partition configuration and a region processing configuration. When the one or more second instructions are loaded and executed by the processor 1302 to divide the projected image into the N mapping regions according to the region configuration, they specifically perform the following steps:

dividing the projected image according to the partition configuration to obtain a plurality of initial regions;
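The application does not fix the form of the partition configuration. Purely as a hypothetical example, the sketch below treats it as a uniform grid of `cols` × `rows` cells and returns each initial region as an `(x, y, width, height)` tuple. With a 3840 × 1920 projected image and a 4 × 2 grid, every initial region is a 960 × 960 square.

```python
def partition_grid(img_w, img_h, cols, rows):
    """Split an img_w x img_h image into a cols x rows grid of (x, y, w, h) regions.

    Hypothetical partition configuration; integer division spreads any
    remainder so the cells exactly cover the image.
    """
    xs = [img_w * c // cols for c in range(cols + 1)]
    ys = [img_h * r // rows for r in range(rows + 1)]
    return [
        (xs[c], ys[r], xs[c + 1] - xs[c], ys[r + 1] - ys[r])
        for r in range(rows)
        for c in range(cols)
    ]


regions = partition_grid(3840, 1920, cols=4, rows=2)
print(len(regions), regions[0])  # 8 (0, 0, 960, 960)
```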

In one embodiment, a target initial region, which is any one of the plurality of initial regions, is converted into a mapping region according to the region processing configuration as follows:

if the shape of the target initial region is the target shape, setting the target initial region as a mapping region;

if the shape of the target initial region is a non-target shape and the region processing configuration indicates that the target initial region needs to be subjected to the first conversion process, the one or more second instructions are loaded and executed by the processor 1302 to split the target initial region into at least one mapping region of the target shape;

if the shape of the target initial region is a non-target shape and the region processing configuration indicates that the target initial region does not need to be subjected to the first conversion process, the one or more second instructions are loaded and executed by the processor 1302 to set the target initial region as a mapping region.
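The three cases above can be sketched in Python. The application only requires that a non-square initial region needing the first conversion be split into at least one square mapping region; it does not say how. The helper below assumes one possible strategy: a greedy, Euclidean-algorithm-style tiling that repeatedly cuts off the largest squares that fit. The names `Region`, `split_into_squares`, and `to_mapping_regions` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    x: int
    y: int
    width: int
    height: int

    def is_square(self) -> bool:
        # The target shape is taken to be a square (width == height).
        return self.width == self.height


def split_into_squares(r: Region) -> list:
    """Greedy (Euclidean-algorithm style) tiling of a rectangle with squares."""
    squares = []
    x, y, w, h = r.x, r.y, r.width, r.height
    while w > 0 and h > 0:
        side = min(w, h)
        if w >= h:
            # Cut as many side x side squares as fit along the width.
            count = w // side
            squares += [Region(x + k * side, y, side, side) for k in range(count)]
            x, w = x + count * side, w - count * side
        else:
            # Cut as many side x side squares as fit along the height.
            count = h // side
            squares += [Region(x, y + k * side, side, side) for k in range(count)]
            y, h = y + count * side, h - count * side
    return squares


def to_mapping_regions(initial: Region, needs_first_conversion: bool) -> list:
    if initial.is_square():
        return [initial]                     # already the target shape
    if needs_first_conversion:
        return split_into_squares(initial)   # non-square and rotation requested
    return [initial]                         # non-square, no rotation: keep whole
```

For example, a 6 × 4 region that needs rotation becomes one 4 × 4 square and two 2 × 2 squares, while the same region without rotation is kept whole.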

In one embodiment, when the one or more second instructions are loaded and executed by the processor 1302 to perform the step of generating the region encapsulation data box of the immersive media according to the region configuration and the N mapping regions, they specifically perform the following steps:

acquiring the width and height of the ith mapping region;

In one embodiment, the region processing information includes a flag of the first conversion process. When the one or more second instructions are loaded by the processor 1302 to perform the step of setting the region processing information in the region encapsulation data box according to the comparison result and the region configuration, they specifically perform the following steps:

when the width and height of the ith mapping region are equal, if the region configuration indicates that the ith mapping region needs to be subjected to the first conversion process, the one or more second instructions are loaded by the processor 1302 to set the amplitude of the first conversion process in the region encapsulation data box as indicated by the region configuration;

if the region configuration indicates that the ith mapping region needs to be subjected to the second conversion process, the one or more second instructions are loaded by the processor 1302 to set the flag of the second conversion process in the region encapsulation data box to a valid value;

if the region configuration indicates that the ith mapping region does not need to be subjected to the second conversion process, the one or more second instructions are loaded by the processor 1302 to set the flag of the second conversion process in the region encapsulation data box to an invalid value.
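The flag-setting rules above, together with those stated in claims 17 to 19, can be sketched as follows. Several choices here are hypothetical and made only for illustration: the field names of `RegionEntry` and `RegionConfig`, the use of `True`/`False` for valid/invalid flag values, and degrees as the amplitude unit.

```python
from dataclasses import dataclass


@dataclass
class RegionConfig:
    """Hypothetical per-region settings taken from the region configuration."""
    rotate: bool              # first conversion requested
    rotation_deg: int = 0     # requested amplitude of the first conversion
    mirror: bool = False      # second conversion requested


@dataclass
class RegionEntry:
    """Hypothetical per-region fields of the region encapsulation data box."""
    width: int
    height: int
    first_conversion_flag: bool = False
    first_conversion_amplitude: int = 0
    second_conversion_flag: bool = False


def make_entry(width, height, cfg):
    # Region shape information: record the mapping region's width and height.
    entry = RegionEntry(width, height)
    # The first conversion flag becomes valid only for a square region whose
    # configuration requests the first conversion; otherwise it stays invalid.
    if width == height and cfg.rotate:
        entry.first_conversion_flag = True
        entry.first_conversion_amplitude = cfg.rotation_deg
    # The second conversion flag simply follows the region configuration.
    entry.second_conversion_flag = cfg.mirror
    return entry
```

Keeping the width/height comparison inside `make_entry` means a non-square region can never be written with a valid first-conversion flag, which is the constraint the data box is meant to guarantee.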

Fig. 14 is a schematic structural diagram of a decoding device according to an exemplary embodiment of the present application. The decoding device may be a computer device used by a user of the immersive media, and that computer device may be a terminal (e.g., a PC, a smart mobile device such as a smartphone, or a VR device such as a VR headset or VR glasses). As shown in fig. 14, the decoding device includes a receiver 1401, a processor 1402, a memory 1403, and a display/playback device 1404, where:

The receiver 1401 enables the decoding device to exchange transmissions with other devices, in particular the transmission of immersive media between the encoding device and the decoding device. That is, the decoding device receives, via the receiver 1401, the relevant media resources of the immersive media transmitted by the encoding device.

The processor 1402 (or CPU) is the processing core of the decoding device. It is adapted to implement one or more program instructions, and specifically to load and execute them so as to implement the flow of the data processing method for immersive media shown in fig. 8 or fig. 10.

The memory 1403 is a storage device in the decoding device for storing programs and media resources. It is to be understood that the memory 1403 here may include a built-in storage medium of the decoding device and may also include an extended storage medium supported by the decoding device. The memory 1403 may be a high-speed RAM or a non-volatile memory, such as at least one disk memory; optionally, it may also be at least one memory located remotely from the processor. The memory 1403 provides storage space for the operating system of the decoding device. The storage space also stores a computer program comprising program instructions adapted to be called and executed by the processor to perform the steps of the data processing method for immersive media. In addition, the memory 1403 may be used to store the following: the three-dimensional image of the immersive media formed after processing by the processor, the audio content corresponding to that image, and the information required to render the three-dimensional image and the audio content.

The display/playback device 1404 is configured to output the rendered sound and the three-dimensional image.

Referring again to fig. 14, the processor 1402 may include a parser 1421, a decoder 1422, a converter 1423, and a renderer 1424; wherein:

The parser 1421 is configured to perform file decapsulation on the encapsulated file of the immersive media received from the encoding device. Specifically, it decapsulates the media file resources according to the file format requirements of the immersive media to obtain an audio code stream and a video code stream, and provides both code streams to the decoder 1422.

The decoder 1422 performs audio decoding on the audio code stream to obtain audio content and provides it to the renderer for audio rendering. In addition, the decoder 1422 decodes the video code stream to obtain a 2D image. The metadata provided by the media presentation description information determines how this image is referred to. If the metadata indicates that region encapsulation was performed on the immersive media, the 2D image is referred to as an encapsulated image. If the metadata indicates that region encapsulation was not performed, the 2D image is referred to as a projected image.

The converter 1423 is configured to convert the 2D image into a 3D image. If region encapsulation was performed on the immersive media, the converter 1423 first performs region decapsulation on the encapsulated image to obtain a projected image, and then reconstructs the projected image to obtain a 3D image. If region encapsulation was not performed, the converter 1423 reconstructs the projected image directly into a 3D image.

The renderer 1424 is configured to render the audio content and 3D images of the immersive media. Specifically, it renders the audio content and the 3D image according to the rendering-related and viewport-related metadata in the media presentation description information. Once rendering is complete, it delivers the result to the display/playback device for output.

In an exemplary embodiment, the processor 1402 (specifically, the components it contains) performs the steps of the data processing method for immersive media shown in fig. 8 by calling one or more instructions in the memory. Specifically, the memory stores one or more first instructions adapted to be loaded by the processor 1402 to perform the following steps:

acquiring a region encapsulation data box of the immersive media, the region encapsulation data box including region shape information and region processing information, where, when the region processing information indicates that the ith encapsulation region of the immersive media needs to be subjected to the first inverse conversion process, the region shape information indicates that the shape of the ith encapsulation region is the target shape, i being a positive integer;

the region shape information is used to indicate that the shape of the ith encapsulation region is the target shape when the width and the height of the ith encapsulation region are equal.

In one embodiment, the region processing information includes an amplitude of the first conversion process; the amplitude is a non-zero value and varies in multiples of the conversion step. When the one or more first instructions are loaded by the processor 1402 to perform the step of performing region decapsulation processing on the immersive media according to the region encapsulation data box, they specifically perform the following step: performing the first inverse conversion process on the ith encapsulation region according to the amplitude.
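On the decoder side, claim 12 states that the first inverse conversion is clockwise rotation and that the conversion step is 90°. A minimal sketch follows. It assumes, for illustration only, that an encapsulation region is a list of sample rows and that the amplitude is given in degrees; the function name `apply_first_inverse_conversion` is hypothetical.

```python
ROTATION_STEP_DEG = 90  # conversion step (claim 12)


def rotate_cw_90(block):
    """Rotate a 2-D block (a list of equal-length rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*block[::-1])]


def apply_first_inverse_conversion(region, amplitude_deg):
    """Undo a counterclockwise rotation by rotating clockwise by amplitude_deg degrees."""
    if amplitude_deg == 0 or amplitude_deg % ROTATION_STEP_DEG != 0:
        raise ValueError("amplitude must be a non-zero multiple of the 90-degree step")
    # Only target-shape (square) encapsulation regions may carry this conversion.
    if any(len(row) != len(region) for row in region):
        raise ValueError("the first inverse conversion applies only to square regions")
    for _ in range((amplitude_deg // ROTATION_STEP_DEG) % 4):
        region = rotate_cw_90(region)
    return region
```

For example, `apply_first_inverse_conversion([[2, 4], [1, 3]], 90)` returns `[[1, 2], [3, 4]]`, undoing a 90° counterclockwise rotation of that 2 × 2 region.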

In another exemplary embodiment, the processor 1402 (specifically, the components it contains) performs the steps of the data processing method for immersive media shown in fig. 10 by calling one or more instructions in the memory. Specifically, the memory 1403 stores one or more second instructions adapted to be loaded and executed by the processor 1402 to perform the following steps:

In one embodiment, the region processing information includes a flag of the first conversion process. When the flag of the first conversion process is a valid value, the one or more second instructions are loaded by the processor 1402 to perform the step of performing region decapsulation processing on the encapsulated image according to the region encapsulation data box to obtain the projected image of the immersive media. In doing so, they specifically perform the following steps:

In one embodiment, the region processing information includes an amplitude of the first conversion process; the amplitude is a non-zero value and varies in multiples of the conversion step. When the one or more second instructions are loaded by the processor 1402 to perform the step of performing region decapsulation processing on the encapsulated image according to the region encapsulation data box to obtain the projected image of the immersive media, they specifically perform the following steps:

In one embodiment, the region processing information includes a flag of the second conversion process. When the flag of the second conversion process is a valid value, the one or more second instructions are loaded by the processor 1402 to perform the step of performing region decapsulation processing on the encapsulated image according to the region encapsulation data box to obtain the projected image of the immersive media. In doing so, they specifically perform the following steps:

performing the second inverse conversion process on the ith encapsulation region to obtain the ith mapping region of the projected image.
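Claims 7 and 14 identify the second conversion, and its inverse, as horizontal mirroring, which undoes itself. When a region carries both conversions, the decoder has to undo them in the reverse of the encoder's order. The text here does not state that order, so the sketch below assumes the encoder mirrors first and then rotates counterclockwise in 90° steps. The functions `pack` and `unpack` and the `quarter_turns` parameter are hypothetical names, and regions are again represented as lists of sample rows.

```python
def mirror_horizontal(block):
    """Flip each row left-to-right (horizontal mirroring); applying it twice is a no-op."""
    return [row[::-1] for row in block]


def rotate_ccw_90(block):
    return [list(col) for col in zip(*block)][::-1]


def rotate_cw_90(block):
    return [list(row) for row in zip(*block[::-1])]


def pack(region, quarter_turns, mirror):
    # Assumed encoder order: second conversion (mirror), then first conversion (rotate CCW).
    if mirror:
        region = mirror_horizontal(region)
    for _ in range(quarter_turns % 4):
        region = rotate_ccw_90(region)
    return region


def unpack(region, quarter_turns, mirror):
    # Reverse order: first inverse conversion (rotate CW), then second inverse (mirror).
    for _ in range(quarter_turns % 4):
        region = rotate_cw_90(region)
    if mirror:
        region = mirror_horizontal(region)
    return region


original = [[1, 2, 3],
            [4, 5, 6],
            [7, 8, 9]]
packed = pack(original, quarter_turns=1, mirror=True)
restored = unpack(packed, quarter_turns=1, mirror=True)
print(restored == original)  # True
```

Under a different agreed order the two loops would simply swap places in both functions; what matters is that the decoder applies the inverses in exactly the reverse sequence.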

The above disclosure describes only preferred embodiments of the present application and is not intended to limit its scope of protection; equivalent variations made according to the claims of the present application therefore still fall within the scope of the present application.

Claims

1. A data processing method for immersive media, comprising:

acquiring a region encapsulation data box of the immersive media, wherein the region encapsulation data box comprises region shape information and region processing information; when the region processing information indicates that an ith mapping region of the immersive media needs to be subjected to a first conversion process, the region shape information indicates that a shape of the ith mapping region is a target shape, where i is a positive integer; the immersive media comprises N mapping regions, where N is a positive integer and 1 ≤ i ≤ N; the N mapping regions are obtained by converting a plurality of initial regions in a projected image of the immersive media according to a region processing configuration of the projected image; a target initial region is any one of the plurality of initial regions;

performing region encapsulation processing on the immersive media according to the region encapsulation data box;

wherein converting the target initial region into a mapping region according to the region processing configuration comprises: if the shape of the target initial region is the target shape, setting the target initial region as a mapping region; if the shape of the target initial region is a non-target shape and the region processing configuration indicates that the target initial region needs to be subjected to the first conversion process, splitting the target initial region into at least one mapping region of the target shape; and if the shape of the target initial region is a non-target shape and the region processing configuration indicates that the target initial region does not need to be subjected to the first conversion process, setting the target initial region as a mapping region.

2. The method of claim 1, wherein the region shape information includes a width and a height of the ith mapping region;

the region shape information is used to indicate that the shape of the ith mapping region is the target shape when the width and the height of the ith mapping region are equal.

3. The method according to claim 1, wherein the region processing information includes a flag of the first conversion process;

when the flag of the first conversion process is a valid value, the region processing information is used to indicate that the ith mapping region needs to be subjected to the first conversion process.

4. The method of claim 3, wherein the region processing information includes an amplitude of the first conversion process; the amplitude is a non-zero value and varies in multiples of the conversion step;

the performing region encapsulation processing on the immersive media according to the region encapsulation data box comprises: performing the first conversion process on the ith mapping region according to the amplitude.

5. The method of claim 4, wherein the first conversion process is rotation in a counterclockwise direction; the amplitude refers to a rotation angle, and the conversion step is 90 degrees.

6. The method of claim 1, wherein the region processing information includes a flag of the second conversion process;

when the flag of the second conversion process is a valid value, the region processing information is used to indicate that the ith mapping region needs to be subjected to the second conversion process.

7. The method of claim 6, wherein the second conversion process is horizontal mirroring.

8. A data processing method for immersive media, comprising:

acquiring a region encapsulation data box of the immersive media, the region encapsulation data box including region shape information and region processing information; when the region processing information indicates that an ith encapsulation region of the immersive media needs to be subjected to a first inverse conversion process, the region shape information indicates that a shape of the ith encapsulation region is a target shape, where i is a positive integer; the immersive media comprises N mapping regions, and the ith encapsulation region is obtained by converting the ith mapping region; N is a positive integer and 1 ≤ i ≤ N; the N mapping regions are obtained by converting a plurality of initial regions in a projected image of the immersive media according to a region processing configuration of the projected image; a target initial region is any one of the plurality of initial regions;

performing region decapsulation processing on the immersive media according to the region encapsulation data box;

9. The method of claim 8, wherein the region shape information includes a width and a height of the ith encapsulation region;

when the width and the height of the ith encapsulation region are equal, the region shape information is used to indicate that the shape of the ith encapsulation region is the target shape.

10. The method of claim 8, wherein the region processing information includes a flag of the first conversion process;

when the flag of the first conversion process is a valid value, the region processing information is used to indicate that the ith encapsulation region needs to be subjected to a first inverse conversion process.

11. The method of claim 10, wherein the region processing information includes an amplitude of the first conversion process; the amplitude is a non-zero value and the amplitude is varied by a multiple of the conversion step size;

the performing region decapsulation processing on the immersive media according to the region encapsulation data box includes: performing the first inverse conversion process on the ith encapsulation region according to the amplitude.

12. The method of claim 11, wherein the first conversion process is rotation in the counterclockwise direction, and the first inverse conversion process is rotation in the clockwise direction;

the amplitude refers to a rotation angle, and the conversion step is 90 degrees.

13. The method of claim 8, wherein the region processing information includes a flag of the second conversion process;

when the flag of the second conversion process is a valid value, the region processing information is used to indicate that the ith encapsulation region needs to be subjected to a second inverse conversion process.

14. The method of claim 13, wherein the second inverse conversion process is horizontal mirroring.

15. A data processing method for immersive media, comprising:

acquiring a projected image of the immersive media and a region configuration of the projected image, the region configuration comprising a partition configuration and a region processing configuration;

dividing the projected image according to the partition configuration to obtain a plurality of initial regions;

converting the plurality of initial regions into N mapping regions according to the region processing configuration, where N is a positive integer; a target initial region is any one of the plurality of initial regions;

generating a region encapsulation data box of the immersive media according to the region configuration and the N mapping regions, the region encapsulation data box including region shape information and region processing information, where, when the region processing information indicates that an ith mapping region of the immersive media needs to be subjected to a first conversion process, the region shape information indicates that a shape of the ith mapping region is a target shape, i being a positive integer and 1 ≤ i ≤ N;

performing region encapsulation processing on the projected image according to the region encapsulation data box to obtain an encapsulated image of the immersive media, wherein the encapsulated image comprises N encapsulation regions, each mapping region corresponding to one encapsulation region;

encoding the encapsulated image to obtain an encoded file of the immersive media;

wherein converting the target initial region into a mapping region according to the region processing configuration comprises: if the shape of the target initial region is the target shape, setting the target initial region as a mapping region; if the shape of the target initial region is a non-target shape and the region processing configuration indicates that the target initial region needs to be subjected to the first conversion process, splitting the target initial region into at least one mapping region of the target shape; and if the shape of the target initial region is a non-target shape and the region processing configuration indicates that the target initial region does not need to be subjected to the first conversion process, setting the target initial region as a mapping region.

16. The method of claim 15, wherein the generating a region encapsulation data box of the immersive media according to the region configuration and the N mapping regions comprises:

acquiring the width and height of the ith mapping region;

adding the width and height of the ith mapping region to the region shape information in the region encapsulation data box; and

comparing the width and the height of the ith mapping region, and setting the region processing information in the region encapsulation data box according to the comparison result and the region configuration.

17. The method of claim 16, wherein the region processing information includes a flag of the first conversion process;

the setting of the region processing information in the region encapsulation data box according to the comparison result and the region configuration comprises:

when the width and the height of the ith mapping region are not equal, setting the flag of the first conversion process in the region encapsulation data box to an invalid value;

when the width and the height of the ith mapping region are equal, if the region configuration indicates that the ith mapping region needs to be subjected to the first conversion process, setting the flag of the first conversion process in the region encapsulation data box to a valid value;

when the width and the height of the ith mapping region are equal, if the region configuration indicates that the ith mapping region does not need to be subjected to the first conversion process, setting the flag of the first conversion process in the region encapsulation data box to an invalid value.

18. The method of claim 17, wherein the region processing information includes an amplitude of the first conversion process;

when the width and the height of the ith mapping region are equal, if the region configuration indicates that the ith mapping region needs to be subjected to the first conversion processing, setting the amplitude of the first conversion processing in the region encapsulation data box according to the indication of the region configuration.

19. The method of claim 15, wherein the region processing information includes a flag of the second conversion process;

if the region configuration indicates that the ith mapping region needs to be subjected to the second conversion process, setting the flag of the second conversion process in the region encapsulation data box to a valid value;

if the region configuration indicates that the ith mapping region does not need to be subjected to the second conversion process, setting the flag of the second conversion process in the region encapsulation data box to an invalid value.

20. A data processing method for immersive media, comprising:

decoding an encoded file of the immersive media to obtain an encapsulated image of the immersive media, wherein the encapsulated image comprises N encapsulation regions and N is a positive integer;

acquiring a region encapsulation data box of the immersive media, the region encapsulation data box including region shape information and region processing information; when the region processing information indicates that an ith encapsulation region of the immersive media needs to be subjected to a first inverse conversion process, the region shape information indicates that a shape of the ith encapsulation region is a target shape, where i is a positive integer;

performing region decapsulation processing on the encapsulated image according to the region encapsulation data box to obtain a projected image of the immersive media, wherein the projected image comprises N mapping regions, each encapsulation region corresponding to one mapping region; the N mapping regions are obtained by converting a plurality of initial regions in the projected image according to a region processing configuration of the projected image; a target initial region is any one of the plurality of initial regions;

performing three-dimensional reconstruction processing on the projected image to obtain a three-dimensional image of the immersive media;

21. The method of claim 20, wherein the region processing information includes a flag of the first conversion process; when the mark of the first conversion processing is a valid numerical value, the region decapsulating processing is executed on the encapsulated image according to the region encapsulation data box, and a projected image of the immersion medium is obtained, including:

and executing first inverse conversion processing on the ith packaging area to obtain the ith mapping area of the projection image.

22. The method of claim 21, wherein the region processing information includes a magnitude of the first conversion process; the amplitude is a non-zero value and the amplitude is varied by a multiple of the conversion step size;

the performing, according to the region encapsulation data box, region decapsulation processing on the encapsulated image to obtain a projected image of the immersion medium specifically includes:

performing first inverse conversion processing on the ith packaged region according to the amplitude to obtain the ith mapping region of the projected image.
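One plausible reading of claim 22 is that the data box carries a non-zero integer k and the amplitude equals k times the conversion step size. The sketch below assumes exactly that. `encode_amplitude`, `decode_amplitude` and `inverse_rotate_point` are hypothetical, and interpreting the amplitude as a rotation angle in degrees is an assumption made only to show how an inverse conversion can be driven by the amplitude (by applying its negation).

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def encode_amplitude(amplitude: float, step: float) -> int:
    """Express a non-zero amplitude as an integer number of conversion steps."""
    k = round(amplitude / step)
    if k == 0 or not math.isclose(k * step, amplitude):
        raise ValueError("amplitude must be a non-zero multiple of the step size")
    return k


def decode_amplitude(k: int, step: float) -> float:
    """Recover the amplitude from the signalled step count k."""
    if k == 0:
        raise ValueError("a zero amplitude is not allowed")
    return k * step


def inverse_rotate_point(p: Point, center: Point, amplitude_deg: float) -> Point:
    """Assumed inverse conversion: rotate p about center by -amplitude degrees."""
    theta = math.radians(-amplitude_deg)
    x, y = p[0] - center[0], p[1] - center[1]
    return (center[0] + x * math.cos(theta) - y * math.sin(theta),
            center[1] + x * math.sin(theta) + y * math.cos(theta))
```

Signalling k instead of the raw amplitude keeps the field a small integer, and the non-zero check enforces the claim's requirement that the amplitude is never zero.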

23. The method of claim 20, wherein the region processing information includes a flag for the second conversion process, and, when the flag of the second conversion process is a valid value, performing region decapsulation processing on the encapsulated image according to the region encapsulation data box to obtain the projected image of the immersive media comprises:

24. A data processing apparatus for immersive media, comprising:

an acquisition unit configured to acquire a region encapsulation data box of the immersive media, the region encapsulation data box including region shape information and region processing information, wherein, when the region processing information indicates that an ith mapping region of the immersive media needs to be subjected to a first conversion process, the region shape information indicates that the shape of the ith mapping region is a target shape, where i is a positive integer; the immersive media comprises N mapping regions, where N is a positive integer and 1 ≤ i ≤ N; the N mapping regions are obtained by converting a plurality of initial regions in a projected image of the immersive media according to a region processing configuration of the projected image; the target initial region is any one of the plurality of initial regions; and

a processing unit configured to perform region encapsulation processing on the immersive media according to the region encapsulation data box.

25. A data processing apparatus for immersive media, comprising:

an acquisition unit configured to acquire a region encapsulation data box of the immersive media, the region encapsulation data box including region shape information and region processing information, wherein, when the region processing information indicates that an ith packaged region of the immersive media needs to be subjected to a first inverse conversion process, the region shape information indicates that the shape of the ith packaged region is a target shape, where i is a positive integer; the immersive media comprises N mapping regions, and the ith packaged region is obtained by converting the ith mapping region, where N is a positive integer and 1 ≤ i ≤ N; the N mapping regions are obtained by converting a plurality of initial regions in a projected image of the immersive media according to a region processing configuration of the projected image; the target initial region is any one of the plurality of initial regions; and

a processing unit configured to perform region decapsulation processing on the immersive media according to the region encapsulation data box.

26. A data processing apparatus for immersive media, comprising:

an acquisition unit configured to acquire a projected image of immersive media and a region configuration of the projected image, the region configuration comprising a partition configuration and a region processing configuration; and

a processing unit configured to divide the projected image according to the partition configuration to obtain a plurality of initial regions, and to convert the plurality of initial regions into N mapping regions according to the region processing configuration, where N is a positive integer and the target initial region is any one of the plurality of initial regions;

the processing unit is further configured to generate a region encapsulation data box of the immersive media according to the region configuration and the N mapping regions, the region encapsulation data box including region shape information and region processing information, wherein, when the region processing information indicates that an ith mapping region of the immersive media needs to be subjected to a first conversion process, the region shape information indicates that the shape of the ith mapping region is a target shape, where i is a positive integer and 1 ≤ i ≤ N;

the processing unit is further configured to perform region encapsulation processing on the projected image according to the region encapsulation data box to obtain an encapsulated image of the immersive media, the encapsulated image comprising N packaged regions, with each mapping region corresponding to one packaged region; and

the processing unit is further configured to encode the encapsulated image to obtain an encoded file of the immersive media.
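The encoder-side steps of claim 26 (partition, convert, generate the data box, encapsulate) can be sketched as follows. Everything concrete here is an assumption: the partition configuration is an even rows × cols grid, the mapping regions are taken to be identical to the initial regions, the first conversion is modelled as a 90-degree clockwise rotation, `"rectangle"` and `TARGET_SHAPE` are placeholder shape labels, and packaged regions are laid side by side, which requires equal heights after conversion. `RegionEntry`, `partition` and `encapsulate` are hypothetical names, and the final encoding into an encoded file is left to a video encoder and not shown.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Grid = List[List[int]]
TARGET_SHAPE = "target"  # hypothetical placeholder for the unnamed "target shape"


@dataclass
class RegionEntry:
    index: int              # i, 1-based
    shape: str              # region shape information
    first_conversion: bool  # region processing information
    packed_x: int           # x offset of the ith packaged region (assumed layout)


def partition(image: Grid, rows: int, cols: int) -> List[Grid]:
    """Assumed partition configuration: split the image into an even rows x cols grid."""
    h, w = len(image) // rows, len(image[0]) // cols
    return [[row[c * w:(c + 1) * w] for row in image[r * h:(r + 1) * h]]
            for r in range(rows) for c in range(cols)]


def rotate_cw(g: Grid) -> Grid:
    """Assumed first conversion: rotate a region 90 degrees clockwise."""
    return [list(row) for row in zip(*g[::-1])]


def encapsulate(image: Grid, rows: int, cols: int,
                needs_first_conversion: Set[int]) -> Tuple[Grid, List[RegionEntry]]:
    mapping = partition(image, rows, cols)  # assumed: mapping regions == initial regions
    box: List[RegionEntry] = []
    packed_regions: List[Grid] = []
    x = 0
    for i, region in enumerate(mapping, start=1):
        flagged = i in needs_first_conversion
        # Constraint from the claim: regions flagged for the first conversion carry the target shape.
        box.append(RegionEntry(i, TARGET_SHAPE if flagged else "rectangle", flagged, x))
        packed = rotate_cw(region) if flagged else region
        packed_regions.append(packed)
        x += len(packed[0])
    height = len(packed_regions[0])
    if any(len(p) != height for p in packed_regions):
        raise ValueError("side-by-side layout needs equal region heights")
    # Lay the N packaged regions side by side to form the encapsulated image.
    packed_image = [[v for p in packed_regions for v in p[r]] for r in range(height)]
    return packed_image, box
```

For a 2 × 4 image split into two 2 × 2 tiles with only region 2 flagged, the second tile is rotated before packing and its data box entry records the target shape and an x offset of 2.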

27. A data processing apparatus for immersive media, comprising:

an acquisition unit configured to acquire a region encapsulation data box of the immersive media, the region encapsulation data box including region shape information and region processing information, wherein, when the region processing information indicates that an ith packaged region of the immersive media needs to be subjected to a first inverse conversion process, the region shape information indicates that the shape of the ith packaged region is a target shape, where i is a positive integer; and

a processing unit configured to perform region decapsulation processing on the encapsulated image according to the region encapsulation data box to obtain a projected image of the immersive media, wherein the projected image comprises N mapping regions and each packaged region corresponds to one mapping region; the N mapping regions are obtained by converting a plurality of initial regions in the projected image of the immersive media according to a region processing configuration of the projected image; the target initial region is any one of the plurality of initial regions;

the processing unit is further configured to perform three-dimensional reconstruction processing on the projected image to obtain a three-dimensional image of the immersive media.

28. An encoding device, comprising:

a processor adapted to execute one or more instructions; and

a memory storing either one or more first instructions, adapted to be loaded by the processor to perform the data processing method for immersive media according to any one of claims 1 to 7, or one or more second instructions, adapted to be loaded by the processor to perform the data processing method for immersive media according to any one of claims 15 to 19.

29. A decoding device, comprising:

a memory storing either one or more first instructions, adapted to be loaded by the processor to perform the data processing method for immersive media according to any one of claims 8 to 14, or one or more second instructions, adapted to be loaded by the processor to perform the data processing method for immersive media according to any one of claims 20 to 23.