WO2021209044A1

WO2021209044A1 - Multimedia data transmission and reception methods, system, processor, and player

Info

Publication number: WO2021209044A1
Application number: PCT/CN2021/087805
Authority: WO
Inventors: 徐异凌; 王超斐
Original assignee: 上海交通大学
Priority date: 2020-04-16
Filing date: 2021-04-16
Publication date: 2021-10-21
Also published as: CN113542907A; CN113542907B

Abstract

Methods for transmitting and receiving multimedia data with multiple degrees of freedom, together with a corresponding multimedia data system, media processor, and player. By adding attribute descriptions for immersive multimedia, the data types of different media types and the distribution of media streams across tracks are determined. Association relationships between multiple pieces of data content in different media data are defined, and indices are provided. The system structure enables packaging and transmission of new media content and forms with multiple degrees of freedom, thereby providing a compatible and extensible framework for implementing subsequent related techniques and designs, and better adapting to the consumption and application of visual media under new degrees of freedom.

Description

Multimedia data transceiving method, system, processor and player

This application claims the priority of the Chinese patent application filed with the Chinese Patent Office on April 16, 2020 with the application number of 202010301699.0, and the entire content of the application is incorporated into this application by reference.

Technical field

This application belongs to the field of immersive multimedia, and specifically relates to a method for sending and receiving multimedia data with multiple degrees of freedom, a multimedia data system with multiple degrees of freedom, and a media processor and player.

Background technique

In recent years, with the development of virtual reality (VR) technology, media services have evolved from traditional flat two-dimensional TVs to head-mounted displays (HMD) offering panoramic and immersive content experiences. The immersive media produced by a VR system represents a virtual space in which users can interact as naturally as in the real world. Virtual reality renders the visual and auditory stimuli of the real world and presents them to users. The user looks around from a display area in three-dimensional space and at the same time receives the audio associated with the current viewport.

However, the performance of visual media hardware, especially media acquisition, media processing, and computing equipment, has improved, and traditional immersive media such as three degrees of freedom (3DoF) media has been applied and developed to maturity. As users' demands for immersive media continue to grow, 3DoF technology can no longer fully meet them, because it only supports a viewing mode in which the user rotates the head at a fixed point; 3DoF+ technology has therefore entered a stage of rapid development. Research and design in the visual media field are also gradually turning to media content with more degrees of freedom. Building on 3DoF, the 3DoF+ and 6DoF forms of media experience have emerged. In response, the visual media field has designed many kinds of media content that can realize 3DoF+ and 6DoF, and has proposed and refined the corresponding media technologies.

The traditional immersive media system design mainly targets omnidirectional video transmission under 3DoF and the degrees of freedom the content consumer has in that media experience. For example, a consumer experiencing 3DoF media content has exactly three degrees of freedom of head rotation, namely rotation about the three coordinate axes of a three-dimensional rectangular coordinate system whose origin is the consumer's head. The media used to realize this immersive experience is a set of technologies related to omnidirectional video, and the system is designed around the data being transmitted, that is, 2D image frames in the traditional video form. As a result, the media content that the system structure is oriented to is relatively uniform.

The 3DoF+ form of media experience adds limited head displacement to the three degrees of freedom of head rotation; that is, consumers of immersive media content can obtain different media content by moving within a certain limit. In other words, the device perceives the parallax generated by the displacement, and the system feeds back in real time the different media content that the parallax brings, matching the consumer's behavior. This requires adding to the original media content the media information that enables parallax interaction, so that the visual system perceives a more realistic scene. 3DoF+ video is made from content acquired by multiple cameras deployed in anticipation of user displacement. The depth image scene presented by 3DoF+ media is obtained by synthesizing 2D images, where each 2D image is composed of texture components and corresponding depth components. Depth information can be collected directly by camera equipment or obtained indirectly through algorithms; alternatively, a 3DoF+ view can be synthesized from a planar image of a background area and multiple (non-planar) foreground images.

It follows from the above requirements that processing only 2D image frames in the traditional video form cannot provide the parallax produced by limited displacement. New media information content and processing forms therefore need to be designed to suit the new data form.

In current media content processing and data forms, 3DoF+ is mainly realized with atlas-related technology, which the international standardization organization MPEG has already adopted. As shown by the atlas data content under 3+ degrees of freedom (3DoF+ video) in Figure 1, this type of solution uses texture components and the corresponding depth components to form an atlas for encapsulation and transmission. An atlas gathers rectangular blocks from one or more 2D images into an image pair, which contains a texture component image and a corresponding depth component image. At the encoding end, the images of different viewpoints captured by cameras at different angles are pruned to obtain a basic atlas containing basic image blocks and additional atlases containing supplementary image blocks. At the decoding end, according to the correspondence between the user's current viewpoint and the source cameras, the basic atlas and the supplementary atlas for the corresponding viewpoint are selected and combined to obtain the field-of-view image for that viewpoint. For example, in Figure 1, the basic atlas and additional atlas 1 generate field-of-view image 1, and the basic atlas and additional atlas 2 generate field-of-view image 2. The atlas approach reduces the amount of data to be transmitted to a certain extent while still realizing the corresponding media function, and gives a better reconstruction result on the user side.

In addition, 6DoF is a richer immersive media experience built on 3DoF and 3DoF+. On top of the three degrees of freedom of the head, it adds displacement along the three coordinate axes of a three-dimensional space whose origin is the user. To make the parallax and the presented content change with the consumer's head rotation and body displacement, processing only traditional video media content is no longer sufficient. At present, the media content and technology for 6DoF experiences are still being explored, mainly point clouds, light fields, and the like. Figure 2 shows point cloud data content as an example of 6DoF immersive media data (6DoF video): what is presented is the surface information of an object obtained by scanning, including three-dimensional coordinate data, depth information, color information, and so on, which forms a geometric skeleton from which the point cloud is then rendered. Different point cloud compression algorithms exist for static and dynamic point cloud data, and for different categories of point cloud data such as those intended for machine perception and for human visual perception. For example, for dynamic point cloud data intended for human visual perception, a typical approach converts the 3D point cloud data into 2D image data and then processes it; one such algorithm is video-based point cloud compression (VPCC). This method first projects the 3D point cloud onto a 2D plane to obtain occupancy map information, geometric information, attribute information, and auxiliary information; the attribute information usually includes texture information and color information. The compressed information is therefore usually transmitted as four categories of data: geometric information, attribute information, occupancy map information, and auxiliary information. Decoding the geometric information relies on the occupancy map information and the auxiliary information, and decoding the attribute information relies on the geometric information, the occupancy map information, and the auxiliary information. Point cloud media needs to process these different types of data simultaneously and, after integrating them, present media with rich spatial and texture characteristics to users. As the exploration of related technologies progresses, the system also needs to complete and update the corresponding content for 6DoF.
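The decoding dependencies stated above can be illustrated with a small sketch. It is not part of the application: the component names and the `decode_order` helper are hypothetical, and the sketch only derives one decoding order that respects the stated dependencies, using the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical component names. Each key maps to the components that,
# according to the text, must be decoded before it.
DECODE_DEPENDENCIES = {
    "occupancy_map": set(),
    "auxiliary": set(),
    "geometry": {"occupancy_map", "auxiliary"},
    "attribute": {"geometry", "occupancy_map", "auxiliary"},
}


def decode_order(dependencies):
    """Return one decoding order in which dependencies always come first."""
    return list(TopologicalSorter(dependencies).static_order())


order = decode_order(DECODE_DEPENDENCIES)
print(order)
```

Any order produced this way decodes the occupancy map and auxiliary information before the geometry, and the geometry before the attribute information.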

In summary, an immersive media experience with more degrees of freedom means more diverse information and data types. Whether the media is an atlas, a point cloud, or another form such as a light field, its information content is diverse. An immersive media system framework that was originally designed to support only a single content structure cannot effectively support the storage and transmission of new multi-degree-of-freedom media content, so new information and structures need to be designed for such media.

How to solve the problems of the existing system architecture, and how to design a system structure for packaging and transmitting new media content and forms with multiple degrees of freedom, so that it provides a compatible and extensible architecture for subsequent technologies and designs and adapts better to the consumption and application of visual media under the new degrees of freedom, is a key issue that urgently needs to be solved.

Summary of the invention

Aiming at the related technology and implementation of multi-degree-of-freedom immersive media content, this application proposes a multi-degree-of-freedom multimedia data sending method, a receiving method, a multi-degree-of-freedom multimedia data system, and a media processor and player.

This application provides a method for sending multimedia data with multiple degrees of freedom, including: encapsulating the multimedia data according to an encapsulation transmission protocol, and transmitting the encapsulated multimedia data. The encapsulation transmission protocol includes determining the attribute information of the multimedia data, which includes: determining the data types for the different media types of the multimedia data; determining and identifying the number and location information of the media streams of the tracks in which the multimedia data of each media type is located; determining the association relationships between multiple data contents in different media data; and determining the corresponding index method and index information for the attribute information.

Further, the data format of the multimedia data includes a 3DoF+ mode and/or a 6DoF mode; the encapsulation and transmission are suitable for MPEG Media Transport (MMT), Smart Media Transport (SMT), the ISO Base Media File Format (ISOBMFF), or an extension of the omnidirectional media format OMAF.

Further, the different media types of the multimedia data include any one or more of the following: traditional two-dimensional video, atlas video, dynamic point cloud, static point cloud, and light field.

Further, determining the data type of the multimedia data includes: when the media type is atlas video, the data types include texture data and depth data; when the media type is dynamic point cloud, the data types include texture, geometry, occupancy map, and additional information data; when the media type is static point cloud, the data types include texture, geometry, and additional information data; when the media type is light field, the data types include texture data and angle data.

Further, the determining the data type of the multimedia data further includes: determining the number of data groups of the corresponding data type for each data type.

Further, the corresponding relationship between the numbers of data groups of different data types includes: the same structure corresponds to the same texture; or the same structure corresponds to different textures that are mutually complementary.

Further, determining and identifying the number and location information of the media streams of the tracks in which the multimedia data of the media type is located includes: defining a track type that indicates whether the multimedia data of each media type is in one track or in at least two tracks. When there is one track: define the number of the media track in which the multimedia data is located, and define the specific position of each piece of data of the multimedia data in the track. When there are at least two tracks: define the number of the media track of each piece of data contained in the multimedia data, and define the specific position of each piece of data in its track.

Further, the determining the association relationship between multiple data contents in different media data, the association relationship includes: mutual dependence between data contents and/or single dependence between data contents and/or mutual replacement between data contents.

Further, the mutually dependent association relationship includes: the texture and depth data in the atlas depend on each other; the geometry, occupancy map, and additional information in the point cloud depend on each other and jointly construct the point cloud geometric skeleton. The single dependent association relationship includes: the texture data in the point cloud relies on the geometric skeleton jointly constructed by the geometry, occupancy map, and additional information; the additional atlas relies on the basic atlas. The mutual replacement relationship includes: for the same point cloud geometric skeleton, different texture data can be matched and substituted for one another.

Further, the index information includes a collection of the above-mentioned attribute information, and the attribute information is respectively placed at different levels of the encapsulation transmission protocol to describe, or defines an index that includes all attribute information of the media.

Further, the data stream of the multimedia data includes any one or more of the following: outer layer information, used to define the file type and content compatibility description of the multimedia data; description indication information, used to describe and indicate the multimedia data; and data content information, which carries the specific content of the multimedia data.
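The three kinds of information in the data stream can be pictured with the following minimal sketch. It is an illustration only: the class names, field names, and example values are hypothetical and do not reproduce any syntax defined by the application or by an existing file format.

```python
from dataclasses import dataclass, field


@dataclass
class OuterLayerInfo:
    """File type and content compatibility description (hypothetical fields)."""
    file_type: str
    compatible_with: list = field(default_factory=list)


@dataclass
class DescriptionInfo:
    """Description indication information for the media (hypothetical fields)."""
    media_type: str
    data_types: list = field(default_factory=list)


@dataclass
class MediaDataStream:
    """A data stream made of outer layer, description, and content parts."""
    outer: OuterLayerInfo
    description: DescriptionInfo
    content: bytes = b""


stream = MediaDataStream(
    outer=OuterLayerInfo(file_type="example", compatible_with=["example-v1"]),
    description=DescriptionInfo(media_type="atlas_video",
                                data_types=["texture", "depth"]),
    content=b"\x00\x01",
)
print(stream.description)
```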

In addition, this application also provides a method for receiving multimedia data with multiple degrees of freedom, including: receiving the encapsulated multimedia data, parsing the multimedia data according to the encapsulation transmission protocol in the reverse of the sending method, and processing the multimedia data according to the parsed content.

Further, the receiving method includes:

S1: Receive the media content data of the multimedia data, parse it according to the encapsulation transmission protocol, and obtain the description indication information of the multimedia data;

S2: Determine the media content type according to the description indication information;

S3: According to the media content type, parse and obtain the corresponding description information of the number of data groups and/or the description information of the media data types and/or the description information of the track type;

S4: Obtain the description information of the media data types, and parse and obtain the description information of the association relationships between the different data types;

S5: Based on the description information of the different media data types and the description information of the number of data groups, obtain in full the number corresponding to each data type;

S6: According to the number of data groups of the different data types, obtain in full the index information corresponding to each data type from the parsed information; then, using the track type description information obtained in S3, the association relationship description information obtained in S4, and the index information of each data type, obtain the required media content.
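Steps S2 to S6 can be sketched as follows, under loudly stated assumptions: the description indication information from S1 is represented as an already parsed dictionary, all field names are hypothetical, and each data group is assumed to carry exactly one item of every data type. The application does not prescribe this representation.

```python
def parse_description(description):
    """Walk through S2 to S6 on an already parsed description (a plain dict)."""
    media_type = description["media_type"]        # S2: media content type
    group_count = description["group_count"]      # S3: number of data groups
    data_types = description["data_types"]        # S3: media data types
    track_type = description["track_type"]        # S3: track type
    relations = description.get("relations", [])  # S4: association relationships

    # S5: assumption: every group carries one item of each data type.
    counts = {data_type: group_count for data_type in data_types}

    # S6: collect the index entries of each data type and check the counts.
    index = {}
    for data_type, expected in counts.items():
        entries = description["index"].get(data_type, [])
        if len(entries) != expected:
            raise ValueError(f"expected {expected} index entries for {data_type}")
        index[data_type] = entries

    return {
        "media_type": media_type,
        "track_type": track_type,
        "relations": relations,
        "counts": counts,
        "index": index,
    }


# Hypothetical description of an atlas video with two data groups.
example = {
    "media_type": "atlas_video",
    "group_count": 2,
    "data_types": ["texture", "depth"],
    "track_type": "single",
    "relations": [("texture", "depth", "mutual_dependence")],
    "index": {"texture": [0, 2], "depth": [1, 3]},
}
parsed = parse_description(example)
print(parsed["counts"], parsed["index"])
```

With the parsed result in hand, a player could fetch only the data items it needs by using the index entries together with the track type and the association relationships.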

In addition, this application also provides a media processor, including:

a storage module, a receiving module, a parsing module, and a data processing module, which receive multimedia data and parse and process it according to the encapsulation transmission protocol, wherein the encapsulation transmission protocol includes:

determining the attribute information of the multimedia data, which includes: determining the data types for the different media types of the multimedia data; determining and identifying the number and location information of the media streams of the tracks in which the multimedia data of each media type is located; determining the association relationships between multiple data contents in different media data; and determining the corresponding index mode and index information for the attribute information.

In addition, this application also provides a player, including:

Functions and effects of this application:

The multi-degree-of-freedom multimedia data sending method, receiving method, multimedia data system, media processor, and player provided by this application address the problem that existing protocols are designed mainly for traditional media and do not support new media, in particular their new attributes. The application provides a newly encapsulated and designed immersive media system framework for the new features and attributes of multi-degree-of-freedom media. By defining and describing the important features and attributes of the new media, it extends the existing protocols so that they can adapt to the diversification of media data types and of the relationships between data units, are better compatible with new multi-degree-of-freedom media content, and offer a certain degree of extensibility. The corresponding system framework design supports the storage and transmission of new media, enables devices and applications to support new media, and allows multi-degree-of-freedom media data streams to be used effectively.

Description of the drawings

Figure 1 is a schematic diagram of the comparison between traditional immersive media content and the realization of atlas technology;

Figure 2 is a block diagram of the data flow of point cloud technology;

Figure 3-1 is the frame diagram of the media system design in the traditional scheme;

Figure 3-2 is a framework diagram of the multi-degree-of-freedom immersive media system design in this application;

Figure 4-1 is an ISOBMFF-based data transmission single-track design diagram of the atlas in the embodiment;

Figure 4-2 is a schematic diagram of the data flow targeted under the single track of the atlas in Figure 4-1;

Figure 5-1 is an ISOBMFF-based data transmission single-track design diagram of the point cloud in the embodiment;

Figure 5-2 is a schematic diagram of the data flow targeted under the single track of the point cloud in Figure 5-1;

Figure 6-1 is an ISOBMFF-based data transmission multi-track design diagram of the atlas in the embodiment;

Figure 6-2 is a schematic diagram of the data flow targeted under the multi-track of the atlas in Figure 6-1;

Figure 7-1 is an ISOBMFF-based data transmission multi-track design diagram of the point cloud in the embodiment;

Figure 7-2 is a schematic diagram of the data flow targeted under the multi-track of the point cloud in Figure 7-1;

Figure 8 is a flow chart of multi-degree-of-freedom media data analysis;

Figure 9 is a flow chart of data analysis corresponding to specific media content;

Figure 10 is a schematic diagram of the functional module structure of the multi-degree-of-freedom immersive media system.

Detailed description

The application will be described in detail below in conjunction with specific embodiments. The following examples will help those skilled in the art to further understand the application, but do not limit the application in any form.

As a multi-degree-of-freedom immersive media, the multimedia data targeted by this application has the following characteristics:

(1) The data types are diversified.

The traditional video stream in Figure 3-1 is composed of continuous image frames, whereas immersive media under the new degrees of freedom has many kinds of elements. For example, the atlas newly introduced in the 3DoF+ immersive media content shown in Figure 1 contains texture and depth information; the 6DoF point cloud shown in Figure 2 contains texture map information, geometric map information, occupancy map information, and additional information. Figures 1 and 2 show that the multi-degree-of-freedom immersive media targeted by this application requires an effective combination of multiple types of data to be presented correctly, and that the original data encapsulation metadata cannot accurately describe the attributes of these different types of data.

(2) The relationship between different data units is diversified.

The traditional video units in Figure 3-1 are ordered along the timeline. By contrast, Figure 3-2, the framework diagram of the multi-degree-of-freedom immersive media system design in this application, shows that the different types of data of immersive media under the new degrees of freedom can form a variety of combined association relationships. For example, in the 3DoF+ atlas of Figure 1, one set of textures and depths forms a basic atlas and another set of textures and depths forms a supplementary atlas; the content of the basic atlas and the supplementary atlas can be combined to form a free-view video. As another example, the geometric structure of the 6DoF point cloud in Figure 2 can be restored using the geometric map information, occupancy map information, and additional information; in some specific cases, only the geometric structure is restored, without using texture information. Different geometric structures and different texture map information can also be combined, for instance to obtain point clouds with different textures on the same geometric structure. Functions such as re-skinning a character model can be realized by using the association relationships between the related content attributes. Therefore, the original encapsulation protocol metadata needs to be extended to support the description of complex relationships.

Because of these characteristics, when designing the framework of a new multi-degree-of-freedom immersive media system that supports multiple kinds of media content, the description structure of the media data to be packaged and transmitted must include a description of the content of the corresponding multi-degree-of-freedom immersive media data stream.

To achieve the above objective, this application provides a method for sending multimedia data with multiple degrees of freedom, including: encapsulating the multimedia data according to an encapsulation transmission protocol, and transmitting the encapsulated multimedia data. The encapsulation transmission protocol includes determining the attribute information of the multimedia data, which includes: determining the data types for the different media types of the multimedia data; determining and identifying the number and location information of the media streams of the tracks in which the multimedia data of each media type is located; determining the association relationships between multiple data contents in different media data; and determining the corresponding index method and index information for the attribute information.

As shown in Figure 3-2, the immersive media system design framework needs to add new descriptions of the multimedia data (also referred to as the multimedia data stream): 1. the media type; 2. the number of contents in the media stream; 3. the content types in the media stream and the corresponding number of contents; 4. the association relationships between the media contents; and 5. the content indexing method and index information. Specifically, these are described as follows:

1. Define the description of the media type under the new multiple degrees of freedom.

The media is described as a new media type supporting multiple degrees of freedom, so that protocols and devices can correctly identify and process it. Adding description information about the media type to the media data stream guides the design of the stream information, the various processing structures, and the processors.

Table 1 is the media type table of the multimedia data in this embodiment. For example, new video types are added in ISOBMFF (ISO Base Media File Format), such as traditional two-dimensional video, atlas, point cloud, and light field, with a reserved value for defining future new media types, together with descriptions of the various video types. Point clouds are further divided into dynamic point clouds, static point clouds, and so on.

Table 1

Serial number	Video type
1	Two-dimensional video (traditional video)
2	Atlas video
3	Dynamic point cloud
4	Static point cloud
5	Light field
6	Reserved (used to define new media types)
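The media type table can be mirrored in code as a simple enumeration. This is a sketch only: it treats the serial numbers of Table 1 as 1 through 6, with 6 reserved, and the enumeration and helper names are hypothetical rather than identifiers defined by ISOBMFF or by the application.

```python
from enum import IntEnum


class MediaType(IntEnum):
    """Video types of Table 1 (serial number 6 is reserved)."""
    TWO_DIMENSIONAL_VIDEO = 1
    ATLAS_VIDEO = 2
    DYNAMIC_POINT_CLOUD = 3
    STATIC_POINT_CLOUD = 4
    LIGHT_FIELD = 5


def media_type_from_serial(serial):
    """Map a serial number to a MediaType; return None for reserved values."""
    try:
        return MediaType(serial)
    except ValueError:
        return None  # reserved for media types defined in the future


print(media_type_from_serial(3).name)
```

Returning `None` for an unknown serial number lets a receiver skip media it does not yet understand instead of failing.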

2. Define the data type and the number of corresponding types in the media data stream under the new multi-degree-of-freedom.

Describe the types and numbers of different types of data contained in each new type of media, so that the protocol and equipment can correctly identify and process this type of new media.

Table 2 is a corresponding table of the number of data types and the number of data groups determined according to different media types in this embodiment. The corresponding table definition in Table 2 describes the new video type, and describes the attributes and quantity of data contained in each video type.

For example, define new video types in ISOBMFF, such as atlas, point clouds, light fields, etc., and describe the attributes and quantities of data contained in each video type:

As shown in Table 2: (2) the atlas video contains texture and depth data; (3) the dynamic point cloud contains texture, geometry, occupancy map, and additional information data; (4) the static point cloud contains texture, geometry, and additional information data; (5) in current technical solutions, the light field contains texture and angle data, which may be extended in the future as light field research progresses.

Further expandable, if each video type contains several sets of data, the number of data sets can also be defined. The atlas video can contain multiple atlases, the point cloud can contain multiple sets of point cloud data; the light field can contain multiple sets of texture and angle data.
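The data types listed above, together with an optional number of data groups, can be captured in a small lookup structure. The sketch below is illustrative: the key names, the `describe` helper, and the derived `component_count` field are hypothetical and are not defined by the application.

```python
# Data types per media type, as listed in the text (names are hypothetical).
DATA_TYPES = {
    "atlas_video": ("texture", "depth"),
    "dynamic_point_cloud": ("texture", "geometry", "occupancy_map",
                            "additional_info"),
    "static_point_cloud": ("texture", "geometry", "additional_info"),
    "light_field": ("texture", "angle"),
}


def describe(media_type, group_count=1):
    """Build a description of the data types and number of data groups."""
    if group_count < 1:
        raise ValueError("group_count must be at least 1")
    data_types = DATA_TYPES[media_type]
    return {
        "media_type": media_type,
        "data_types": list(data_types),
        "group_count": group_count,
        # Assumption: each group holds one item of every data type.
        "component_count": group_count * len(data_types),
    }


print(describe("atlas_video", group_count=2))
```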

It is worth noting that, in this application, the immersive media data stream under the new degrees of freedom is not limited to one form of data content. To realize a system structure design that handles multiple kinds of media data stream content, the immersive media system framework under the new degrees of freedom describes the types of content in the media data stream and the corresponding amount of content.

Table 2

3. Determine and identify the number and location information of the track media stream where the multimedia data of the media type is located.

Define whether each type of media is carried in one media stream or distributed across multiple media streams, that is, whether all data of each new type of media is stored and transmitted in a single media stream, and define the address or location of each piece of data.

Table 3 is a corresponding table of the track type of the media stream of the track where the multimedia data is located and the location of the data in this embodiment. For example, the track type is defined in ISOBMFF to describe whether each video is in one or at least two tracks.

Table 3
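The single-track and multi-track cases described above can be sketched as two layouts with a common lookup. The dictionary keys, track numbers, and positions below are hypothetical example values, not values taken from the application or from ISOBMFF.

```python
def locate(layout, data_type):
    """Return (track number, position in track) for one data type."""
    if layout["track_type"] == "single":
        # One track: a shared track number plus a position per data type.
        return layout["track_id"], layout["positions"][data_type]
    # At least two tracks: each data type has its own track and position.
    return layout["tracks"][data_type]


# Hypothetical atlas video stored in a single track.
single_track = {
    "track_type": "single",
    "track_id": 1,
    "positions": {"texture": 0, "depth": 1},
}

# Hypothetical atlas video spread over two tracks.
multi_track = {
    "track_type": "multi",
    "tracks": {"texture": (1, 0), "depth": (2, 0)},
}

print(locate(single_track, "depth"), locate(multi_track, "depth"))
```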

4. Define the relationship between multiple data contents in different media data.

When the media data stream contains multiple forms of data content, each data type may have multiple pieces of data, and complex relationships exist among them. To give system-level support across the whole chain, from encapsulation and transmission to decoding and presentation, the association relationships between the contents in the data stream must be described, so that the data stream can be designed, implemented, and used in a correct and feasible way.

Table 4 is a table of association relations between multiple media contents in multimedia data, which determines the association relations between multiple data contents in different media data: mutual dependence, single dependence, and mutual replacement.

For example, define the association relationship between the different data contained in each video type in ISOBMFF to describe:

1. Mutual dependence: the data depend on each other, and none can be omitted. For example, in item 2 of Table 4, the texture and depth data in the atlas depend on each other; in item 3 of Table 4, the geometry, occupancy map, and additional information in the dynamic point cloud depend on each other and jointly construct the geometric skeleton of the point cloud.

2. Single dependence: one piece of data depends on another and loses its meaning without it. For example, in item 3 of Table 4, the texture data in the dynamic point cloud relies on the geometric skeleton constructed from the geometry, occupancy map, and additional information; in item 2 of Table 4, the additional atlas depends on the basic atlas.

3. Replacement: pieces of data can be substituted for one another. For example, in item 3 of Table 4, the dynamic point cloud can pair the same point cloud geometric skeleton with different texture data, so as to show different "skins" on the same skeleton. The different texture data are then in a replacement relationship.

In summary, the mutually dependent relationship includes: the texture and depth data in the atlas depend on each other; the geometry, occupancy map, and additional information in the point cloud depend on each other and jointly construct the point cloud geometric skeleton. The single dependent relationship includes: the texture data in the point cloud relies on the geometric skeleton jointly constructed by the geometry, occupancy map, and additional information; the additional atlas depends on the basic atlas. The mutual replacement relationship includes: for the same point cloud geometric skeleton, different texture data can be substituted for one another.

The above analysis does not describe every media content of each data type in Table 4. Table 4 is only a preferred example and does not limit the application.

Table 4

5. Define the indexing method and index information of the media data stream under the new multiple degrees of freedom.

The above description shows that the new type of media data has complex types, quantities, and association relationships. For ease of description, the index information of the media data can be defined.

Table 5 lists, for each media type of the multimedia data, the corresponding indexing method and index information.

For example, ISOBMFF is extended to define the indexing method between the data contents of each video type and the corresponding index information. That is, the data composition and index information of the media are provided to help the device quickly parse the media type, composition, quantity, and access information, so that content can be acquired and processed effectively.

Table 5

In Table 5, taking item 2 (atlas video) as an example: when the media type is atlas video distributed on a single-track structure, the protocol is extended through the Sample Table Box to add index information, namely the sample type and sample index. This helps the device quickly parse the media type, composition, quantity, and access information, so that content can be acquired and processed effectively.

Continuing with item 2 (atlas video): when the media type is atlas video distributed on a multi-track structure, the protocol is extended through the Track Reference Box to add index information, namely the track type and track ID. This likewise helps the device quickly parse the media type, composition, quantity, and access information, so that content can be acquired and processed effectively.

The indexing methods and index information for the single-track and multi-track structures of the other media types in Table 5 can be inferred by analogy and are not repeated here.

By extension, the index information can serve as a collection of the newly defined attributes described above. This attribute information can be placed at different levels of the protocol file, or a single index can be defined that contains all relevant information about the media, so that the device can read and parse it quickly.
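As a minimal sketch of the two index forms described above, the following models single-track index information as a (sample type, sample index) pair and multi-track index information as a (track type, track ID) pair. The class names, field names, and the `make_index` helper are hypothetical; they do not correspond to actual ISOBMFF box syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleIndex:
    """Index information for a single-track structure."""
    sample_type: str
    sample_index: int


@dataclass(frozen=True)
class TrackIndex:
    """Index information for a multi-track structure."""
    track_type: str
    track_id: int


def make_index(track_structure, component, number):
    """Build the index entry that matches the track structure."""
    if track_structure == "single":
        return SampleIndex(sample_type=component, sample_index=number)
    if track_structure == "multi":
        return TrackIndex(track_type=component, track_id=number)
    raise ValueError(f"unknown track structure: {track_structure!r}")
```

For example, `make_index("single", "texture", 0)` yields a sample-level entry, while `make_index("multi", "depth", 2)` yields a track-level entry.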

In summary, to support new multi-degree-of-freedom immersive media, the immersive media system framework given in this application adds a description of the multimedia data stream to the protocol and processes the stream accordingly. Embodiments 1 to 4, described with reference to Figures 4-1 to 7-2, illustrate the methods for sending and receiving multimedia data under multiple degrees of freedom, the multimedia data system under multiple degrees of freedom, and the media processor and player, so that the consumer side can ultimately acquire the media content and obtain an immersive media experience under the new multiple degrees of freedom.

The following four embodiments are based on ISOBMFF: atlas single track, point cloud single track, atlas multi-track, and point cloud multi-track. They are optional solutions and do not limit the scope of this application.

[Embodiment One]

Figure 4-1 is an ISOBMFF-based data transmission single-track design diagram of the atlas in the embodiment. Fig. 4-2 is a schematic diagram of the data flow targeted under the single track of the atlas in Fig. 4-1.

For the single-track design of the atlas, as shown in Figure 4-1, the outer information can be represented by any field; here the field ftyp is used. ftyp is the outermost data box of the encapsulated file and defines the file type and content compatibility. The description indication information can be represented by any field; here the field moov is used. moov is the data box for the media content description information in the file and contains the various information describing the transmitted media content. The data content information can be represented by any field; here the field mdat is used. mdat carries the specific media data content. The content of moov describes and indicates the specific media data content in mdat. This application adds, in the moov structure, description information about the media data content contained in mdat.
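The ftyp / moov / mdat layout can be made concrete with a short sketch that walks the top-level boxes of a file. It relies only on the general ISOBMFF box header (a 32-bit big-endian size followed by a four-character type, with size 1 signalling a 64-bit size and size 0 meaning "to end of file"). The function names are illustrative, and the sketch does not parse the new description fields that this application adds inside moov.

```python
import struct


def iter_top_level_boxes(data: bytes):
    """Yield (box_type, payload) for each top-level box in `data`."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit "largesize" field follows the box type.
            size = struct.unpack_from(">Q", data, offset + 8)[0]
            header = 16
        elif size == 0:
            # The box extends to the end of the file.
            size = len(data) - offset
        if size < header:
            raise ValueError(f"invalid box size at offset {offset}")
        yield box_type.decode("ascii"), data[offset + header:offset + size]
        offset += size


def make_box(box_type: str, payload: bytes) -> bytes:
    """Build a box with a 32-bit size header (helper for the example)."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload
```

Walking a file built as `make_box("ftyp", ...) + make_box("moov", ...) + make_box("mdat", ...)` yields the three box types in that order, which is the outer structure assumed in Figures 4-1 to 7-1.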

The media data content format is shown in Figure 4-2, which indicates that the current data stream contains "n" atlases. Based on this data content format, new information is added to the moov data box shown in Figure 4-1: the media content type, the media track type, the number of media data groups, the media data types and their corresponding quantities, the association relations between different data types, and the index information.

Specifically, the description "miv" of the atlas media type is added in moov, indicating that the current media data stream is an atlas data stream (miv). The track type is indicated as single track, and the data types present in the current media data stream are indicated as texture and depth. A description of the data quantity is added, indicating that the current data stream contains "n" atlases, each of which contains one depth layer and one texture layer. The position of the corresponding data in each atlas is indicated: the position in the track of the depth layer "depth 0" of the first atlas, the position in the track of the texture layer "texture 0" of the first atlas, and so on, until the texture and depth position information of every atlas has been indicated. Information about the association relations between the data in the media data stream is also added. For example, atlas 0, which contains the basic view blocks, is necessary data; the atlases containing the other, supplementary view blocks are supplementary content that depends on atlas 0 and, together with atlas 0, restores the miv image of the corresponding viewpoint.
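The description added to moov in this embodiment might be modelled as follows. This is a sketch under stated assumptions: the dictionary keys are invented for illustration, and the interleaved sample order (depth i followed by texture i) is assumed, since the text specifies only that each layer's position in the track is indicated.

```python
def describe_atlas_single_track(n):
    """Build an illustrative description for n atlases on a single track."""
    if n < 1:
        raise ValueError("at least one atlas is required")
    samples = []
    for i in range(n):
        # Assumed layout: depth layer first, then texture layer, per atlas.
        samples.append({"name": f"depth {i}", "type": "depth",
                        "atlas": i, "position": 2 * i})
        samples.append({"name": f"texture {i}", "type": "texture",
                        "atlas": i, "position": 2 * i + 1})
    return {
        "media_type": "miv",
        "track_type": "single",
        "data_types": ["texture", "depth"],
        "atlas_count": n,
        "samples": samples,
        # Supplementary atlases depend on atlas 0 (the basic atlas).
        "depends_on": {i: 0 for i in range(1, n)},
    }
```

With `n = 3`, the result lists six samples and records that atlases 1 and 2 depend on atlas 0.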

[Embodiment Two]

Fig. 5-1 is a design diagram of a single track of ISOBMFF-based data transmission of the point cloud in the embodiment. Fig. 5-2 is a schematic diagram of the data flow targeted under the single track of the point cloud in Fig. 5-1.

For the single-track design of the point cloud, as shown in Figure 5-1, the outer information can be represented by any field; here the field ftyp is used. ftyp is the outermost data box of the encapsulated file and defines the file type and content compatibility. The description indication information can be represented by any field; here the field moov is used. moov is the data box for the media content description information in the file and contains the various information describing the transmitted media content. The data content information can be represented by any field; here the field mdat is used. mdat carries the specific media data content. The content of moov describes and indicates the specific media data content in mdat. This application adds, in the moov structure, description information about the media data content contained in mdat.

The media data content format is shown in Figure 5-2. In mdat, each of point cloud data group 0 to point cloud data group n contains two sets of textures (texture 01, texture 02), geometry, an occupancy map, and additional information.

Based on this data content format, new information is added to the moov data box shown in Figure 5-1: the media content type, the media track type, the number of media data groups, the media data types and their corresponding quantities, the association relations between different data types, and the index information.

Specifically, for the single-track design of the point cloud, as shown in Figure 5-1, the description "point cloud" of the point cloud media type is added to the moov structure, indicating that the current media data stream is a point cloud data stream (vpcc). The track type is indicated as single track, and the data types present in the current media data stream are indicated as texture, geometry, occupancy map, and additional information. A description of the data quantity is added, indicating that the current data stream contains "t" textures and "n" each of geometry, occupancy map, and additional information. The position of each item is indicated: the position of texture 1 in the track, the position of texture 2 in the track, the position of geometry 1 in the track, and so on, until the four types of data have all been indicated. Information about the association relations between the data in the media data stream is also added. For example, geometry 0, occupancy map 0, and additional information 0 of point cloud frame 0 depend on one another and together restore geometric structure 0 of that frame, and the restoration of texture 0 depends on the restoration of geometric structure 0; that is, texture 0 depends on geometry 0, occupancy map 0, and additional information 0.

It is worth noting that, in a normal use scenario of the present application, each geometric structure corresponds to a single texture; this is a modification of Embodiment Two in which the quantities of texture, geometry, occupancy map, and additional information are all n. In other, extended use scenarios, the same structure 0 can correspond to different textures, as in Embodiment Two above, where the number of textures is t and the numbers of geometry, occupancy map, and additional information are all n. Structure 0 can correspond to texture 00, texture 01, and texture 02. A typical application scenario is skinning of a point cloud character model. The different textures corresponding to the same geometric structure are therefore mutually replaceable. In Figure 5-2, each point cloud data group contains one or more sets of texture data, so the number of texture data t is at least the number n of each of the other data types (geometry, occupancy map, and additional information).
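The relation between t and n can be checked with a short counting sketch. The group representation (a dictionary with a `textures` list plus one entry per other component) is an assumption made for this example only.

```python
COMPONENT_KEYS = ("geometry", "occupancy_map", "additional_info")


def component_counts(groups):
    """Count textures (t) and the other components (n) over all groups."""
    counts = {"texture": 0}
    counts.update({key: 0 for key in COMPONENT_KEYS})
    for group in groups:
        counts["texture"] += len(group["textures"])
        for key in COMPONENT_KEYS:
            if key in group:
                counts[key] += 1
    return counts


# Three groups, each with two interchangeable textures, as in Figure 5-2.
example_groups = [
    {"textures": ["texture 01", "texture 02"],
     "geometry": f"geometry {i}",
     "occupancy_map": f"occupancy map {i}",
     "additional_info": f"additional information {i}"}
    for i in range(3)
]
```

For `example_groups`, the texture count is 6 while each of the other counts is 3, which matches the statement that t is no smaller than n when every group carries at least one texture.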

[Embodiment Three]

Fig. 6-1 is an ISOBMFF-based data transmission multi-track design diagram of the atlas in the embodiment. Fig. 6-2 is a schematic diagram of the data flow targeted under the multi-track of the atlas in Fig. 6-1.

For the multi-track design of the atlas, as shown in Figure 6-1, the outer information can be represented by any field; here the field ftyp is used. ftyp is the outermost data box of the encapsulated file and defines the file type and content compatibility. The description indication information can be represented by any field; here the field moov is used. moov is the data box for the media content description information in the file and contains the various information describing the transmitted media content. The data content information can be represented by any field; here the field mdat is used. mdat carries the specific media data content. The content of moov describes and indicates the specific media data content in mdat. This application adds, in the moov structure, description information about the media data content contained in mdat.

The media data content format is shown in Figure 6-2. Atlas data 0 to atlas data n are distributed on track 1 (Track-1) and track 2 (Track-2), and each atlas includes one geometry component (depth, in this embodiment) and one texture component. Based on this data content format, new information is added to the moov data box shown in Figure 6-1: the media content type, the media track type, the number of media data groups, the media data types and their corresponding quantities, the association relations between different data types, and the index information.

Specifically, as shown in Figure 6-1, the description "miv" of the atlas media type is added to the moov structure, indicating that the current media data stream is an atlas data stream (miv). The track type is indicated as multi-track, and the data types present in the current media data stream are indicated as texture and depth. A description of the data quantity is added, indicating that the current data stream contains "n" atlases, each of which contains one depth layer and one texture layer. For each atlas, the track carrying each data type and the position within that track are indicated: the depth layer "depth 0" of the first atlas is located in the track of type depth at the indicated position, the texture layer "texture 0" of the first atlas is located in the track of type texture at the indicated position, and so on, until the texture and depth position information of every atlas has been indicated. Information about the association relations between the data in the media data stream is also added. For example, atlas 0, which contains the basic view blocks, is necessary data; the atlases containing the other, supplementary view blocks are supplementary content that depends on atlas 0 and, together with atlas 0, restores the miv image of the corresponding viewpoint.

[Embodiment Four]

Fig. 7-1 is an ISOBMFF-based data transmission multi-track design diagram of the point cloud in the embodiment. Fig. 7-2 is a schematic diagram of the data flow under the multi-track point cloud in Fig. 7-1.

For the multi-track design of the point cloud, as shown in Figure 7-1, the outer information can be represented by any field; here the field ftyp is used. ftyp is the outermost data box of the encapsulated file and defines the file type and content compatibility. The description indication information can be represented by any field; here the field moov is used. moov is the data box for the media content description information in the file and contains the various information describing the transmitted media content. The data content information can be represented by any field; here the field mdat is used. mdat carries the specific media data content. The content of moov describes and indicates the specific media data content in mdat. This application adds, in the moov structure, description information about the media data content contained in mdat.

The format of the media data content is shown in Figure 7-2. Point cloud data 0 to point cloud data n are distributed on track 1 to track 5 (Track-1 to Track-5). The point cloud data contains t textures and n each of geometry, occupancy map, and additional information. The first group of textures is carried in Track-1, the second group of textures is carried in Track-2, and the geometry, occupancy map, and additional information are carried in Track-3 to Track-5, respectively. Based on this data content format, new information is added to the moov data box shown in Figure 7-1: the media content type, the media track type, the number of media data groups, the media data types and their corresponding quantities, the association relations between different data types, and the index information. Specifically:

As shown in Figure 7-1, the description "point cloud" of the point cloud media type is added to the moov structure, indicating that the current media data stream is a point cloud data stream (vpcc). The track type is indicated as multi-track, and the data types present in the current media data stream are indicated as texture, geometry, occupancy map, and additional information. A description of the data quantity is added, indicating that the current data stream contains "t" textures and "n" each of geometry, occupancy map, and additional information. The track type and the position within the track are indicated for each item: texture 0 is in track 1, whose type is texture, at its corresponding position; texture 1 is in track 1, whose type is texture, at its corresponding position; geometry 0 is in track 3, whose type is geometry, at its corresponding position; and so on, until the four types of data have all been indicated. Information about the association relations between the data in the media data stream is also added. For example, geometry 0, occupancy map 0, and additional information 0 of point cloud frame 0 depend on one another and together restore geometric structure 0 of that frame, and the restoration of texture 0 depends on the restoration of geometric structure 0; that is, texture 0 depends on geometry 0, occupancy map 0, and additional information 0.

Similar to the solution in [Embodiment Two] above, in a normal use scenario of this application each geometric structure corresponds to a single texture; this is a modification of Embodiment Four in which the quantities of texture, geometry, occupancy map, and additional information are all n. In other, extended use scenarios, the same structure 0 can correspond to different textures, as in Embodiment Four above, where the number of textures is t and the numbers of geometry, occupancy map, and additional information are all n. Structure 0 can correspond to texture 00, texture 01, and texture 02. A typical application scenario is skinning of a point cloud character model. The different textures corresponding to the same geometric structure are therefore mutually replaceable.

In Figure 7-2, each point cloud data group contains one or more sets of texture data, so the number of texture data t is at least the number n of each of the other data types (geometry, occupancy map, and additional information).
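The five-track layout of this embodiment can be sketched as a lookup from component type to track. The track assignment follows the description of Figure 7-2 (textures in Track-1 and Track-2, then geometry, occupancy map, and additional information in Track-3 to Track-5). The function names, the `variant` parameter, and the assumption that a component's position in its track equals its group number are illustrative only.

```python
# Track ID -> track type, following the layout described for Figure 7-2.
TRACK_LAYOUT = {
    1: "texture",
    2: "texture",
    3: "geometry",
    4: "occupancy_map",
    5: "additional_info",
}


def tracks_of_type(component):
    """Return the IDs of all tracks carrying the given component type."""
    return sorted(tid for tid, kind in TRACK_LAYOUT.items() if kind == component)


def locate(component, group, variant=0):
    """Return (track_id, position) for one component of one data group.

    `variant` selects among interchangeable textures; position is assumed
    to equal the group number within the track.
    """
    tracks = tracks_of_type(component)
    if not tracks:
        raise KeyError(f"no track of type {component!r}")
    if not 0 <= variant < len(tracks):
        raise IndexError(f"{component!r} has only {len(tracks)} track(s)")
    return tracks[variant], group
```

For example, `locate("texture", 2, variant=1)` points at Track-2, while `locate("geometry", 0)` points at Track-3.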

Fig. 8 is a flow chart of multi-degree-of-freedom media data parsing, which illustrates the method of receiving multimedia data under multiple degrees of freedom. As shown in FIG. 10, this application provides a multi-degree-of-freedom immersive media system, which includes a sender side and a server side. The server includes a receiving module, a parsing module, and a data processing module. After the sender finishes sending the encapsulated media file, the server receives the media file through the receiving module, first parses the encapsulation protocol of the media file, and then processes the media data content according to the parsed content. The specific steps, shown in Figure 8, are as follows:

S1: After the sender has modified the corresponding content in the data encapsulation and transmission protocol, the server receives the corresponding media file data through the receiving module and parses the protocol to obtain the description information of the media content data.

S2: The data processing module processes the media content data according to the description information parsed in S1. It first determines the media content type on the basis of the parsed media type description information.

S3: According to the media content type under the new multiple degrees of freedom determined in S2, the parsed data group quantity description information, media data type description information, and track type description information for that content are obtained.

S4: On the basis of obtaining the data type description information in S3, obtain the association relationship description information of different data types from the parsed information.

S5: Guided by the data type description information and the data group quantity description information, the quantity corresponding to each parsed data type is obtained in full.

S6: According to the quantities of the different data types obtained in S5, the index information corresponding to each data type is obtained in full from the parsed information. The required media content is then restored at the data processing end using the track type description information obtained in S3, the association relation description information obtained in S4, and the index information of each data type.
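Steps S2 to S6 can be sketched as reading an already-parsed description in a fixed order. The dictionary layout of `desc` (keys such as `media_type`, `group_count`, `counts`, `index`) is hypothetical, and the consistency check between quantities and index entries is an added safeguard for the example rather than a step stated in Figure 8.

```python
def parse_media_description(desc):
    """Read description fields in the order of steps S2 to S6."""
    media_type = desc["media_type"]                        # S2: media type
    group_count = desc["group_count"]                      # S3: group count
    data_types = list(desc["data_types"])                  # S3: data types
    track_type = desc["track_type"]                        # S3: track type
    relations = list(desc["relations"])                    # S4: associations
    counts = {t: desc["counts"][t] for t in data_types}    # S5: quantities
    index = {}
    for t in data_types:                                   # S6: index info
        entries = list(desc["index"][t])
        if len(entries) != counts[t]:
            raise ValueError(
                f"{t}: expected {counts[t]} index entries, got {len(entries)}")
        index[t] = entries
    return {
        "media_type": media_type,
        "group_count": group_count,
        "data_types": data_types,
        "track_type": track_type,
        "relations": relations,
        "counts": counts,
        "index": index,
    }
```

The returned dictionary gathers everything the data processing end would need before restoring content: the track type from S3, the relations from S4, and the per-type index from S6.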

Fig. 9 is a flow chart of data parsing for specific media contents: dynamic point cloud (a in Fig. 9), static point cloud (b in Fig. 9), atlas video (c in Fig. 9), and light field (d in Fig. 9). It includes the following steps:

In the first step T1, the media type is determined from the media type description information; the media type has been defined in the encapsulated content. If it is a traditional video media type, it is processed according to the existing immersive media processing flow. If it is an immersive media type under the new multiple degrees of freedom (dynamic point cloud, static point cloud, atlas video, or light field), it is processed with the media content processing flow corresponding to the parsed media type.

In the second step T2, after the media type has been determined, the processing flow and processor corresponding to that media type are started. At the same time, the number of media content data groups, the media content types corresponding to the media content, and the track type used during transmission are obtained. For dynamic point clouds, as shown in Figure 9(a), the corresponding media content types include texture, geometry, occupancy map, and additional information. For static point clouds, as shown in Figure 9(b), the corresponding media content types include texture, geometry, and additional information. For atlas videos, as shown in Figure 9(c), the corresponding media content types include texture and depth. For light fields, as shown in Figure 9(d), the corresponding media content types currently include texture and angle.

The third step T3: after the data types under the corresponding media type have been obtained, the number of each media data type is parsed in combination with the number of media data groups. The number of media data groups assists in obtaining the number of each media data type, and the number of each media data type guides the data parsing end in parsing the different types of data completely, which avoids content loss that would degrade the restored media video.

The fourth step T4: after the number of data groups and the number of each data type have been obtained, the index information and association relations of the corresponding data types are parsed and, combined with the earlier track type determination, the data are combined. The data combination methods are as follows:

T4.1: As shown in branch a of Figure 9, for dynamic point clouds, according to the association relations between data types, the geometry, occupancy map, and additional information of the same group of dynamic point cloud data depend on one another to restore the geometric shape of the dynamic point cloud, and the restoration of the texture depends on the restoration of the geometric shape. The same group of dynamic point cloud data can have multiple sets of corresponding texture information but only one set of geometry, occupancy map, and additional information. When the track type is single track, the geometry, occupancy map, and additional information of the same group are first located in the track according to the index information and the point cloud geometry is restored; the different texture data of the same group are then indexed as needed to find the required texture data, and the texture information is restored on the basis of the point cloud geometry, occupancy map, and additional information. When the track type is multi-track, the tracks carrying the geometry, occupancy map, additional information, and texture are first located through the track type index in the index information, and the data of each type are located in the corresponding track through the data type index. The geometry, occupancy map, and additional information belonging to the same group are found first in their respective tracks and the point cloud geometry is restored; the different texture data belonging to the same group are then indexed in the corresponding texture track as needed to find the required texture data, and the texture information is restored on the basis of the point cloud geometry, occupancy map, and additional information.

T4.2: As shown in branch b of Figure 9, for static point clouds, according to the association relations between data types, the geometry and additional information of the same group of static point cloud data depend on each other to restore the geometric shape of the static point cloud, and the restoration of the texture depends on the restoration of the geometric shape. When the track type is single track, the geometry and additional information of the same group are first located in the track according to the index information and the point cloud geometry is restored; the texture data of the same group are then indexed as needed to find the required texture data, and the texture information is restored on the basis of the point cloud geometry and additional information. When the track type is multi-track, the tracks carrying the geometry, additional information, and texture are first located through the track type index in the index information, and the data of each type are located in the corresponding track through the data type index. The geometry and additional information belonging to the same group are found first in their respective tracks and the point cloud geometry is restored; the texture data belonging to the same group are then indexed in the corresponding texture track as needed to find the required texture data, and the texture information is restored on the basis of the point cloud geometry and additional information.

T4.3: As shown in branch c of Figure 9, for atlas videos, according to the association relations between data types, the depth and texture of the same group of atlas data depend on each other and together restore the atlas video content. When the track type is single track, the texture and depth of the same group are located in the track according to the index information and combined to restore the image. When the track type is multi-track, the tracks carrying the texture and depth are first located through the track type index in the index information, and the data of each type are located in the corresponding track through the data type index. The texture and depth of the same group of atlas data then together restore the content of that atlas.

T4.4: As shown in branch d of Figure 9, for light fields, according to the association relations between data types, the angle, texture, and extension information of the same group of light field data depend on one another and together restore the light field content. When the track type is single track, the texture, angle, and extension information of the same group are located in the track according to the index information and combined to restore the image. When the track type is multi-track, the tracks carrying the texture, angle, and extension information are first located through the track type index in the index information, and the data of each type are located in the corresponding track through the data type index. The texture, angle, and extension information of the same group of light field data then together restore the content of that group.

The fifth step T5: according to the quantity of media data of each type and the number of media data types, the parsing and combination of all media data are completed in sequence, and the immersive media video content under the new multiple degrees of freedom is finally presented.
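Steps T1, T2, and the single-track case of T4.1 can be sketched as follows. The component lists follow step T2; everything else (the key names, the sample dictionaries, and the returned structure) is an assumption made for illustration, and actual decoding of the point cloud is outside the scope of the sketch.

```python
# Component types per media type, as listed in step T2 (names illustrative).
COMPONENTS = {
    "dynamic_point_cloud": ("texture", "geometry", "occupancy_map", "additional_info"),
    "static_point_cloud": ("texture", "geometry", "additional_info"),
    "atlas_video": ("texture", "depth"),
    "light_field": ("texture", "angle"),
}


def select_flow(media_type):
    """T1/T2: choose the processing flow and the expected component types."""
    if media_type in COMPONENTS:
        return "multi_dof", COMPONENTS[media_type]
    return "legacy", ()


def combine_dynamic_point_cloud(samples, group, wanted_textures=None):
    """T4.1, single track: gather the skeleton parts, then the textures."""
    by_type = {}
    for sample in samples:
        if sample["group"] == group:
            by_type.setdefault(sample["type"], []).append(sample)
    skeleton = []
    for kind in ("geometry", "occupancy_map", "additional_info"):
        found = by_type.get(kind, [])
        if len(found) != 1:
            # Exactly one of each is needed to restore the geometry.
            raise ValueError(f"group {group}: expected one {kind}, got {len(found)}")
        skeleton.append(found[0]["name"])
    textures = [s["name"] for s in by_type.get("texture", [])
                if wanted_textures is None or s["name"] in wanted_textures]
    return {"skeleton": tuple(skeleton), "textures": textures}
```

The ordering inside `combine_dynamic_point_cloud` mirrors the single dependence described in T4.1: the three skeleton components are resolved first, and textures are selected only afterwards, optionally restricted to the ones the player wants to show.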

The inventive concept, the described embodiments, and the scope of this application enable an immersive media system to provide system architecture support for the upcoming implementation of immersive media 3DoF+ and 6DoF experiences and technical applications.

It should be noted that although this embodiment uses encapsulation protocols such as ISOBMFF, together with atlas and point cloud technologies, as examples to illustrate the proposed immersive media 3DoF+ and 6DoF metadata and its structure, parameter content, and data, as well as their encapsulation and transmission methods, the immersive media data forms and contents under the new multiple degrees of freedom in this embodiment can also be encapsulated and transmitted in other formats, parameter expressions, and files, for example using MMT or SMT transmission, ISOBMFF encapsulation, or an extension based on OMAF (Omnidirectional Media Application Format, the application format for panoramic media), without affecting the expression of the core technology of this application.

As shown in FIG. 10, this application provides a multi-degree-of-freedom immersive media system, which includes a sender side and a server side. The server includes a receiving module, a parsing module, and a data processing module. After the sender finishes sending the encapsulated media file, the server receives the media file through the receiving module, first parses the encapsulation protocol of the media file, and then processes the media data content according to the parsed content.

As shown in FIG. 10, a processor and a memory coupled to the processor are provided. When executing the computer-readable program in the memory, the processor may be configured to execute the method and system for receiving multimedia data in multiple degrees of freedom described in conjunction with FIGS. 1-9.

Those skilled in the art know that, in addition to implementing the system and its devices, modules, and units provided by this application purely as computer-readable program code, the method steps can be logically programmed so that the system and its devices, modules, and units implement the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Therefore, the system and its devices, modules, and units provided in this application can be regarded as a hardware component, and the devices, modules, and units included in it for realizing various functions can be regarded as structures within the hardware component. The devices, modules, and units for implementing various functions can also be regarded both as software modules implementing the method and as structures within the hardware component.

Those skilled in the art will further appreciate that the various illustrative logic blocks, modules, circuits, and algorithm steps described in conjunction with the embodiments disclosed herein can be implemented as electronic hardware, computer software, or a combination of the two. In order to clearly explain the interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps are generally described above in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the specific application and the design constraints imposed on the overall system. Technicians can implement the described functionality in different ways for each specific application, but such implementation decisions should not be interpreted as causing a departure from the scope of this application.

The various illustrative logic modules and circuits described in conjunction with the embodiments disclosed herein can be implemented or executed with a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. The processor may also be implemented as a combination of computing devices, such as a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors cooperating with a DSP core, or any other such configuration.

The steps of the method or algorithm described in conjunction with the embodiments disclosed herein may be directly embodied in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from and write information to the storage medium. In the alternative, the storage medium may be integrated into the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.

In one or more exemplary embodiments, the described functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software as a computer program product, each function can be stored on, or transmitted over, a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media, including any medium that facilitates the transfer of a computer program from one place to another. The storage medium may be any available medium that can be accessed by a computer. By way of example and not limitation, such computer-readable media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Any connection is also properly called a computer-readable medium. For example, if the software is transmitted from a web site, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of the medium. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included in the scope of computer-readable media.

Claims

A method for sending multimedia data with multiple degrees of freedom, comprising:

Encapsulating the multimedia data according to an encapsulation transmission protocol, wherein the encapsulation transmission protocol comprises:

Determining attribute information of the multimedia data, including: determining the data type for each of the different media types of the multimedia data; determining and identifying the number and location information of the media stream tracks in which the multimedia data of the media type is located; and determining the association relationships between multiple data contents in different media data; and,

Determining a corresponding index mode and index information for each item of the attribute information; and

Transmitting the encapsulated multimedia data.
The method for sending multimedia data with multiple degrees of freedom according to claim 1, wherein the data form of the multimedia data includes a 3DoF+ mode and/or a 6DoF mode; and the encapsulation and transmission is applicable to MPEG Media Transport (MMT), Smart Media Transport (SMT), the ISO base media file format (ISOBMFF), or an extension of the Omnidirectional Media Format (OMAF).
The method for sending multimedia data with multiple degrees of freedom according to claim 1, wherein the different media types of the multimedia data include any one or more of the following: traditional two-dimensional video, atlas video, dynamic point cloud, static point cloud, and light field.
The method for sending multimedia data with multiple degrees of freedom according to claim 1, wherein determining the data type of the multimedia data comprises:

When the media type is atlas video, the data type includes texture data and depth data;

When the media type is dynamic point cloud, the data type includes texture, geometry, occupancy map and additional information data;

When the media type is static point cloud, the data type includes texture, geometry, and additional information data;

When the media type is light field, the data type includes texture data and angle data.
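The correspondence between media types and data types recited above can be pictured as a lookup table. The following Python sketch is purely illustrative: the names `MEDIA_DATA_TYPES` and `data_types_for` and the string identifiers are assumptions made for this example and are not defined by this application. Traditional two-dimensional video is omitted because the claim does not list data types for it.

```python
# Hypothetical lookup table; keys and values paraphrase the claim wording.
MEDIA_DATA_TYPES = {
    "atlas_video": ["texture", "depth"],
    "dynamic_point_cloud": ["texture", "geometry", "occupancy_map", "additional_info"],
    "static_point_cloud": ["texture", "geometry", "additional_info"],
    "light_field": ["texture", "angle"],
}


def data_types_for(media_type: str) -> list[str]:
    """Return the data types carried by a media type; reject unknown types."""
    try:
        # Return a copy so callers cannot modify the shared table.
        return list(MEDIA_DATA_TYPES[media_type])
    except KeyError:
        raise ValueError(f"unknown media type: {media_type!r}") from None
```

A sender could consult such a table when writing the data type description, and a receiver could use it to check that every expected data type is present.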
The method for sending multimedia data with multiple degrees of freedom according to claim 1, wherein determining the data type of the multimedia data further comprises:

Determining, for each data type, the number of data groups of that data type.
The method for sending multimedia data with multiple degrees of freedom according to claim 5, wherein the correspondence between the numbers of data groups of different data types comprises:

The same structure corresponds to the same texture; or,

The same structure corresponds to different textures that are substitutes for each other.
The method for sending multimedia data with multiple degrees of freedom according to claim 1, wherein determining and identifying the number and location information of the media stream tracks in which the multimedia data of the media type is located comprises:

Defining a track type that indicates whether the multimedia data of each media type is carried in a single track or in at least two tracks, wherein:

Single track: the number of the media track in which the multimedia data is located is defined, and the specific position within the track of each data item in the multimedia data is defined;

At least two tracks: the number of the media track carrying each data item contained in the multimedia data is defined, and the specific position within its track of each data item in the multimedia data is defined.
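The single-track and multi-track cases above can be sketched as a small data structure. The Python example below is an illustration under stated assumptions: `DataLocation`, its field names, and `track_type` are hypothetical, and the integer `position` merely stands in for whatever position description the encapsulation protocol actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataLocation:
    data_type: str  # e.g. "texture" or "geometry"
    track_id: int   # number of the media track carrying this data item
    position: int   # hypothetical position of the data item within the track


def track_type(locations: list[DataLocation]) -> str:
    """Classify a layout as 'single' or 'multi' by counting distinct tracks."""
    track_ids = {loc.track_id for loc in locations}
    if not track_ids:
        raise ValueError("no data locations given")
    return "single" if len(track_ids) == 1 else "multi"
```

In both cases every data item carries its own track number and position, so a receiver can locate the data the same way regardless of the track type.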
The method for sending multimedia data with multiple degrees of freedom according to claim 1, wherein the association relationship between multiple data contents in different media data is determined, and the association relationship includes:

The data contents are interdependent, and/or one data content is singly dependent on another, and/or the data contents are interchangeable.
The method for transmitting multimedia data in multiple degrees of freedom according to claim 8, wherein:

The interdependent relationship includes: the texture and depth data in the atlas are dependent on each other; the geometry, occupancy map, and additional information in the point cloud are dependent on each other to jointly construct the point cloud geometric skeleton;

The single dependent association relationship includes: the texture data in the point cloud depends on the geometric skeleton jointly constructed from the geometry, occupancy map, and additional information; an additional atlas depends on the basic atlas; and,

The mutual replacement relationship includes: for the same point cloud geometric skeleton, different texture data are used for replacement.
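The three kinds of association relationship can be sketched as tagged records that a receiver consults to find out which other data an item needs. The Python example below is a minimal illustration; `Relation`, `POINT_CLOUD_RELATIONS`, `required_for`, and the item names (including `texture_alt`) are assumptions made for this example and do not reproduce any syntax defined by this application.

```python
from enum import Enum


class Relation(Enum):
    INTERDEPENDENT = "interdependent"
    SINGLE_DEPENDENT = "single_dependent"
    INTERCHANGEABLE = "interchangeable"


# Hypothetical records of the form (relation, source items, target items).
POINT_CLOUD_RELATIONS = [
    (Relation.INTERDEPENDENT, ("geometry", "occupancy_map", "additional_info"), ()),
    (Relation.SINGLE_DEPENDENT, ("texture",), ("geometry", "occupancy_map", "additional_info")),
    (Relation.INTERCHANGEABLE, ("texture", "texture_alt"), ()),
]


def required_for(item: str, relations) -> set[str]:
    """Return the other items that must be available before `item` can be used."""
    needed = set()
    for relation, sources, targets in relations:
        if item not in sources:
            continue
        if relation is Relation.INTERDEPENDENT:
            # Every member of an interdependent group needs all the others.
            needed.update(s for s in sources if s != item)
        elif relation is Relation.SINGLE_DEPENDENT:
            # The source needs the targets, but not the other way round.
            needed.update(targets)
        # Interchangeable items are alternatives and add no requirement.
    return needed
```

For example, `required_for("texture", POINT_CLOUD_RELATIONS)` yields the three skeleton components, while the interchangeable alternative `texture_alt` imposes no requirement of its own in this sketch.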
The method for sending multimedia data with multiple degrees of freedom according to claim 1, wherein the index information includes a collection of the above-mentioned attribute information, and the attribute information is either described separately at different levels of the encapsulation transmission protocol, or an index containing all attribute information of the media is defined.
The method for sending multimedia data with multiple degrees of freedom according to claim 1, wherein the data stream of the multimedia data includes any one or more of the following: outer layer information, description indication information, and data content information;

Wherein the outer layer information is used to define the file type and content compatibility of the multimedia data; the description indication information is used to describe and indicate the multimedia data; and the data content information carries the specific content of the multimedia data.
A method for receiving multimedia data under multiple degrees of freedom, including:

Receiving the encapsulated multimedia data, parsing it according to the encapsulation transmission protocol in a manner inverse to claim 1, and processing the multimedia data correspondingly according to the parsed content.
The method for receiving multimedia data with multiple degrees of freedom according to claim 12, comprising:

S1: Receiving the media content data of the multimedia data, parsing it according to the encapsulation transmission protocol, and obtaining the description indication information of the multimedia data;

S2: Determining the media content type of the media content data according to the description indication information;

S3: According to the media content type, parsing and obtaining the data group quantity description information and/or media data type description information and/or track type description information under the corresponding media content type;

S4: Obtaining the media data type description information, and parsing and obtaining the description information about the association relationships between different data types;

S5: Based on the description information of the different media data types and the description information of the number of data groups, obtaining the complete number of data groups corresponding to each parsed data type;

S6: According to the number of data groups of each data type, obtaining the complete index information corresponding to each data type in the parsed information; and obtaining the required media content according to the track type description information obtained in S3, the association relationship description information between the data types obtained in S4, and the index information of each data type determined from the numbers obtained in S5.
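The receiving steps can be sketched as a walk over already-parsed description information. In the Python example below, step S1 (receiving and parsing the encapsulated data) is assumed to have produced a plain dictionary; the function `select_media` and every key name (`media_type`, `track_type`, `data`, `depends_on`, `track_id`, `position`) are hypothetical and chosen only for illustration.

```python
def select_media(description: dict, wanted_type: str) -> dict:
    """Follow steps S2-S6 over hypothetical, already-parsed description info."""
    # S2: determine the media content type from the description information.
    media_type = description["media_type"]
    # S3: read the data groups per data type and the track type.
    groups = description["data"]  # data type -> list of data groups
    track_type = description["track_type"]  # "single" or "multi"
    # S4: read the data types that the wanted data type depends on.
    depends_on = description.get("depends_on", {}).get(wanted_type, [])
    needed = [wanted_type, *depends_on]
    # S5: count the data groups of every data type that is needed.
    counts = {t: len(groups[t]) for t in needed}
    # S6: collect the index information (track number, position) of each group.
    index = [(g["track_id"], g["position"]) for t in needed for g in groups[t]]
    if track_type == "single" and len({track for track, _ in index}) > 1:
        raise ValueError("single-track layout references more than one track")
    return {"media_type": media_type, "counts": counts, "index": index}
```

A player would then fetch the samples at the returned track numbers and positions; how that fetch is performed depends on the underlying transport or file format and is outside this sketch.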
A media processor, comprising:

A storage module, a receiving module, an analysis module, and a data processing module, configured to receive multimedia data and to parse and process it according to an encapsulation transmission protocol, wherein the encapsulation transmission protocol comprises:

Determining attribute information of the multimedia data, including: determining the data type for each of the different media types of the multimedia data; determining and identifying the number and location information of the media stream tracks in which the multimedia data of the media type is located; and determining the association relationships between multiple data contents in different media data; and,

Determining a corresponding index mode and index information for each item of the attribute information.
A player, comprising:

A storage module, a receiving module, an analysis module, and a data processing module, configured to receive multimedia data and to parse and process it according to an encapsulation transmission protocol, wherein the encapsulation transmission protocol comprises:

Determining attribute information of the multimedia data, including: determining the data type for each of the different media types of the multimedia data; determining and identifying the number and location information of the media stream tracks in which the multimedia data of the media type is located; and determining the association relationships between multiple data contents in different media data; and,

Determining a corresponding index mode and index information for each item of the attribute information.