WO2019007096A1 - Method and apparatus for processing media information - Google Patents
Method and apparatus for processing media information
- Publication number
- WO2019007096A1 (PCT/CN2018/078540)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- information
- metadata
- media data
- media
- user
- Prior art date
Classifications
- H04N21/816 — Monomedia components thereof involving special video data, e.g. 3D video
- H04N21/21805 — Source of audio or video content enabling multiple viewpoints, e.g. using a plurality of cameras
- H04N21/2343 — Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N21/2353 — Processing of additional data specifically adapted to content descriptors, e.g. coding, compressing or processing of metadata
- H04N21/2362 — Generation or processing of Service Information [SI]
- H04N21/4345 — Extraction or processing of SI, e.g. extracting service information from an MPEG stream
- H04N21/4668 — Learning process for intelligent management for recommending content, e.g. movies
- H04N21/6587 — Control parameters, e.g. trick play commands, viewpoint selection
- H04N21/84 — Generation or processing of descriptive data, e.g. content descriptors
- H04N21/85406 — Content authoring involving a specific file format, e.g. MP4 format
- H04N21/8586 — Linking data to content by using a URL
- H04L65/40 — Support for services or applications
- H04L65/612 — Network streaming of media packets for supporting one-way streaming services, for unicast
- H04L65/613 — Network streaming of media packets for supporting one-way streaming services, for the control of the source by the destination
- H04L65/65 — Network streaming protocols, e.g. real-time transport protocol [RTP] or real-time control protocol [RTCP]
- H04L65/70 — Media network packetisation
- H04L65/75 — Media network packet handling
- H04L65/756 — Media network packet handling adapting media to device capabilities
- H04L65/762 — Media network packet handling at the source
- H04L67/02 — Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
- H04L67/55 — Push-based network services
- H04L67/56 — Provisioning of proxy services
- H04L67/561 — Adding application-functional data or data for application control, e.g. adding metadata
Definitions
- the present invention relates to the field of streaming media transmission technologies, and in particular, to a method and an apparatus for processing media information.
- the ISO/IEC 23090-2 standard specification is also known as the OMAF (omnidirectional media format) standard specification; it defines a media application format that enables the presentation of omnidirectional media in applications, where omnidirectional media refers to omnidirectional video (360° video) and the associated audio.
- the OMAF specification first specifies a list of projection methods that can be used to convert spherical video into two-dimensional video; it then specifies how to use the ISO base media file format (ISOBMFF) to store omnidirectional media data and the metadata associated with that media, and how to encapsulate omnidirectional media data and transmit it in a streaming media system, for example through Dynamic Adaptive Streaming over HTTP (DASH), based on the HyperText Transfer Protocol (HTTP), as specified in the ISO/IEC 23009-1 standard.
- the ISO base media file format consists of a series of boxes, and a box can contain other boxes.
- these boxes include a metadata box (moov box), which contains metadata, and a media data box (mdat box), which contains media data; the metadata box and the media data box may reside in the same file or in separate files.
- if timed metadata (metadata with a time attribute) is encapsulated in the ISO base media file format, the metadata box contains metadata describing the timed metadata, and the media data box contains the timed metadata itself.
- because the client cannot accurately identify the source of such data, it cannot fully satisfy the user's needs when selecting media data according to the metadata, and the user experience is poor.
- the embodiment of the invention provides a method and a device for processing media information, which can enable a client to select different processing modes according to the source of the metadata.
- a method for processing media information, comprising: obtaining metadata information of media data;
- the metadata information includes source information of the metadata;
- the source information is used to represent a recommender of the media data;
- the media data is omnidirectional media data.
- the omnidirectional media data of the embodiment of the present invention may be video data or audio data.
- for related examples of omnidirectional media, refer to the relevant provisions of the ISO/IEC 23090-2 standard specification.
- the metadata refers to attribute information of the video data, such as the duration, bit rate, and frame rate of the corresponding video data, its position in the spherical coordinate system, and the like.
- the area of the omnidirectional video refers to an area in the video space corresponding to the omnidirectional video.
- the source information of the metadata may indicate that the video data corresponding to the metadata is recommended by the author of the omnidirectional video, that it is recommended by a certain user of the omnidirectional video, or that it is recommended based on statistics over the viewing results of multiple users of the omnidirectional video.
- the client can refer to the information of the recommender of the media data when performing data processing, thereby enriching the user's selection and enhancing the user experience.
- the obtaining metadata information of the media data includes:
- the address information of the metadata track can be obtained through the media presentation description file, and then the information acquisition request is sent to the address, and the metadata track of the media data is received.
- the address information of the metadata track can be obtained through a separate file, and then an information acquisition request is sent to the address, and the metadata track of the media data is received.
- the server sends a metadata track of the media data to the client.
- a track refers to a series of samples with time attributes in accordance with the ISO base media file format (ISOBMFF) encapsulation method.
- for example, a video track is obtained by encapsulating, according to the ISOBMFF specification, the code stream that a video encoder generates after encoding each frame.
- for the specific definition of a track, see the relevant description in ISO/IEC 14496-12.
- for the relevant attributes and data structures of the media presentation description file, see the relevant description in ISO/IEC 23009-1.
- the source information of the metadata may be stored in a newly added box in the metadata track, and the source information of the metadata is obtained by parsing the data of the box.
- the source information of the metadata may be an attribute added in an existing box in the metadata track, and the source information of the metadata is obtained by parsing the attribute.
- the client can obtain the source information of the metadata when obtaining the metadata track, so that in subsequent operations it can comprehensively consider both the other attributes of the metadata and its source information when processing the related media data.
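- as an illustration of this option, the following minimal Python sketch walks the boxes of a metadata track and reads the source information from a newly added box. The four-character code 'srci' and the one-byte payload layout are illustrative assumptions only; the embodiment does not fix a concrete box name or syntax.

```python
import struct

# Hypothetical four-character code for the new source-information box; the
# embodiment does not fix a name, so 'srci' is purely illustrative.
SOURCE_BOX_TYPE = b"srci"

# Container boxes whose payload is itself a sequence of boxes.
CONTAINERS = {b"moov", b"trak", b"mdia", b"minf", b"stbl"}

def iter_boxes(data: bytes):
    """Walk ISOBMFF boxes: each box starts with a 32-bit big-endian size
    followed by a 4-byte type; the size includes the 8-byte header."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        if size < 8:  # size values 0 and 1 (to-end / 64-bit) omitted here
            break
        yield box_type, data[offset + 8:offset + size]
        offset += size

def find_source_type(data: bytes):
    """Return the assumed one-byte source_type from the 'srci' box, if present."""
    for box_type, payload in iter_boxes(data):
        if box_type == SOURCE_BOX_TYPE and payload:
            return payload[0]
        if box_type in CONTAINERS:  # recurse into container boxes
            found = find_source_type(payload)
            if found is not None:
                return found
    return None

# Usage: a bare 'srci' box carrying source_type 2 (e.g. "user statistics").
sample = struct.pack(">I4sB", 9, b"srci", 2)
print(find_source_type(sample))  # 2
```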
- the obtaining metadata information of the media data includes:
- the client can obtain the media presentation description file by sending an HTTP request to the server, or the server can directly push the media presentation description file to the client.
- the client can also obtain the media presentation description file by other possible means, for example by obtaining it from another client through client-side interaction.
- the related attributes and data structures of the media presentation description file can refer to the relevant description in ISO/IEC23009-1.
- the source information of the metadata may be information indicated in a descriptor, or it may be attribute information.
- the source information of the metadata may be in an adaptation set hierarchy in the media presentation description file or in a representation hierarchy.
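- as an illustration of carrying the source information in the media presentation description file, the sketch below scans an MPD for a descriptor at the adaptation set level (a representation-level lookup would be analogous). The schemeIdUri urn:example:metadata:source and the value strings are invented for this example; the embodiment does not fix a concrete scheme.

```python
import xml.etree.ElementTree as ET

# The embodiment leaves the descriptor scheme open; this URN is a made-up example.
SOURCE_SCHEME = "urn:example:metadata:source"
NS = "{urn:mpeg:dash:schema:mpd:2011}"

def source_info_from_mpd(mpd_xml: str) -> list:
    """Collect source-information values from SupplementalProperty descriptors
    found at the adaptation set hierarchy of the MPD."""
    root = ET.fromstring(mpd_xml)
    values = []
    for aset in root.iter(NS + "AdaptationSet"):
        for prop in aset.iter(NS + "SupplementalProperty"):
            if prop.get("schemeIdUri") == SOURCE_SCHEME:
                values.append(prop.get("value"))
    return values

mpd = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period><AdaptationSet>
    <SupplementalProperty schemeIdUri="urn:example:metadata:source"
                          value="content_author"/>
    <Representation id="recommended-viewport"/>
  </AdaptationSet></Period></MPD>"""
print(source_info_from_mpd(mpd))  # ['content_author']
```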
- the obtaining metadata information of the media data includes:
- obtaining a code stream that includes the media data, where the code stream further includes supplemental enhancement information (SEI), and the SEI includes the source information of the metadata.
- the client may send a media data acquisition request to the server, and then receive the media data sent by the server.
- the client can construct a Uniform Resource Locator (URL) through the related attribute and address information in the media presentation description file, and then send an HTTP request to the URL, and then receive the corresponding media data.
- the client may also receive the media data stream pushed by the server.
- the source information of the metadata is a source type identifier.
- different values of the source type identifier, or different source type identifiers, indicate the corresponding source type.
- a one-bit flag can be used to indicate the source type, or a field of more bits can be used to identify the source type.
- the client side stores a file recording the correspondence between source type identifiers and source types, and the client may determine the corresponding source type from the value of the source type identifier or from the particular source type identifier.
- one source type corresponds to one recommender; for example, the source type may be a recommendation by the author of the video, a recommendation by a certain user, or a recommendation derived from statistics over the viewing results of multiple users.
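- a minimal sketch of such a correspondence on the client side follows; the numeric values and names are illustrative only, since the embodiment fixes no concrete identifier values.

```python
from enum import IntEnum

class SourceType(IntEnum):
    """Illustrative numbering only; the embodiment fixes no concrete values."""
    CONTENT_AUTHOR = 0   # recommended by the author/director of the video
    SINGLE_USER = 1      # recommended by a particular user
    USER_STATISTICS = 2  # derived from statistics over many users' viewing

def resolve_source(source_type_id: int) -> SourceType:
    """Map a source type identifier parsed from the metadata to its source type."""
    return SourceType(source_type_id)

print(resolve_source(2).name)  # USER_STATISTICS
```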
- the source information of the metadata includes a semantic representation of a recommender of the media data.
- for example, codewords in ISO 639-2/T can be used to represent the various semantics.
- processing the media data corresponding to the metadata according to the source information of the metadata may include the following implementation manners:
- the client side may request corresponding media data from the server side or other terminal side according to the user's selection of the source information.
- the client side may present or transmit the media data according to the user's selection of the source information.
- the server may push the media data to the client according to the source information of the metadata.
- the server may determine the media data to be pushed according to the source information of multiple received metadata; for example, it may select among several recommendations according to a certain criterion and push media data according to the selection result, or combine several recommendations by calculation and push media data according to the calculated result.
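- a minimal sketch of this server-side selection step, assuming a simple preference order over recommenders; the embodiment only requires selection "according to a certain criterion", so the policy shown here is an arbitrary example.

```python
# Preference order over recommenders -- an arbitrary example policy.
PREFERENCE = ["content_author", "user_statistics", "single_user"]

def choose_recommendation(metadata_items):
    """metadata_items: e.g. [{"source": "single_user", "region": "A"}, ...];
    return the item whose source ranks highest in PREFERENCE."""
    for source in PREFERENCE:
        for item in metadata_items:
            if item["source"] == source:
                return item
    return None

items = [
    {"source": "single_user", "region": "A"},
    {"source": "user_statistics", "region": "E"},
]
print(choose_recommendation(items))  # picks the statistics-based recommendation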
- a second aspect of the present invention provides a device for processing media information.
- the device includes:
- an information obtaining module configured to obtain metadata information of the media data, where the metadata information includes source information of the metadata, the source information is used to represent a recommender of the media data, and the media data is omnidirectional media data.
- a processing module configured to process the media data according to source information of the metadata.
- the client can refer to the information of the recommender of the media data when performing data processing, thereby enriching the user's selection and enhancing the user experience.
- the information acquiring module is specifically configured to: obtain a metadata track of the media data, where the metadata track includes source information of the metadata.
- the information obtaining module is specifically configured to: obtain a media presentation description file of the media data, where the media presentation description file includes source information of the metadata.
- the information acquiring module is specifically configured to: obtain a code stream that includes the media data, where the code stream further includes supplemental enhancement information (SEI), and the source information of the metadata is included in the SEI.
- the source information of the metadata is a source type identifier.
- the source information of the metadata includes a semantic representation of a recommender of the media data.
- a third aspect of the present invention discloses a method for processing media information, where the method includes:
- statistical analysis can be performed on the viewing angles of multiple users watching the same video, providing an effective perspective recommendation for subsequent viewers of the video and thereby enhancing the user experience.
- the method is performed by the server side, for example, by a content preparation server, a content distribution network (CDN) or a proxy server.
- the information of the user view sent by the client may be sent through a separate file, or may be included in other data files sent by the client.
- the target viewing angle is determined according to the information of all the user perspectives; it may be selected according to a predetermined criterion on a statistical basis, or it may be calculated from the data of a plurality of viewing angles in a certain manner.
- the media data corresponding to the target perspective may be pushed directly to the client, or pushed to a distribution server, or fed back to the client upon receiving a client request for the omnidirectional media data.
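- a minimal sketch of the statistical step, assuming each client reports the sub-region it is viewing (e.g. one of the regions A to I of FIG. 2) and the server takes the most-viewed region as the target perspective; majority vote is just one possible predetermined criterion, averaging the reported view centers would be another.

```python
from collections import Counter

def target_perspective(reported_regions):
    """Pick the most-viewed region as the target perspective (majority vote)."""
    counts = Counter(reported_regions)
    region, _ = counts.most_common(1)[0]
    return region

# Six clients report which sub-region of the omnidirectional video they watch.
reports = ["E", "E", "D", "E", "B", "D"]
print(target_perspective(reports))  # 'E' -> push media data for region E
```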
- a fourth aspect of the present invention discloses a method for processing media information, where the method includes:
- as in the third aspect, statistical analysis can be performed on the viewing angles of multiple users watching the same video, thereby enhancing the user experience; the method may be performed on the server side, for example by a content preparation server, a content distribution network (CDN), or a proxy server.
- the information of the user perspectives sent by the clients may be sent in a separate file or included in other data files sent by the clients; the target viewing angle is determined according to the information of all the user perspectives, either selected according to a predetermined criterion on a statistical basis or calculated from the data of a plurality of viewing angles in a certain manner, and metadata information of the media data is then generated according to the target perspective.
- the fifth aspect of the present invention discloses a device for processing media information, where the device includes:
- a receiver configured to receive information of user perspectives sent by a plurality of clients, where the user perspective information indicates the viewing angle at which a user views the omnidirectional media data; a processor configured to determine a target perspective according to the information of all the user perspectives; and a transmitter configured to send the media data corresponding to the target perspective.
- a sixth aspect of the present invention discloses a device for processing media information, where the device includes:
- a receiver configured to receive information of user perspectives sent by a plurality of clients, where the user perspective information indicates the viewing angle at which a user views the omnidirectional media data; and a processor configured to determine a target perspective according to the information of all the user perspectives and to generate metadata information of the media data according to the target perspective.
- a seventh aspect of the present invention discloses a device for processing media information, where the device includes: one or more processors, and a memory.
- the memory is coupled to the one or more processors and is for storing computer program code comprising instructions; when the one or more processors execute the instructions, the processing device performs the method for processing media information according to the first aspect, the third aspect, or the fourth aspect, or according to any possible implementation manner of the foregoing aspects.
- an eighth aspect of the present invention discloses a computer readable storage medium that stores instructions which, when run on a device, cause the device to perform the method according to the foregoing aspects.
- FIG. 1 is a diagram showing an example of a view change in an omnidirectional video according to an embodiment of the present invention.
- FIG. 2 is a diagram showing an example of dividing a space corresponding to an omnidirectional video into a spatial object according to an embodiment of the present invention.
- FIG. 3 is a schematic diagram of relative positions of spatial objects in a space corresponding to omnidirectional video according to an embodiment of the present invention.
- FIG. 4 is an illustration of a coordinate system describing a spatial object in accordance with an embodiment of the present invention.
- FIG. 5 is another example of a coordinate system describing a spatial object according to an embodiment of the present invention.
- FIG. 6 is another example of a coordinate system describing a spatial object in accordance with an embodiment of the present invention.
- FIG. 7 is a schematic architecture diagram of a system for processing media information according to an embodiment of the present invention.
- FIG. 8 is a schematic flowchart diagram of a method for processing media information according to an embodiment of the present invention.
- FIG. 9 is a schematic structural diagram of a device for processing media information according to an embodiment of the present invention.
- FIG. 10 is a schematic diagram of specific hardware of a device for processing media information according to an embodiment of the present invention.
- FIG. 11 is a schematic diagram of a mapping relationship between a spatial object and video data according to an embodiment of the present invention.
- a track is a series of samples with time attributes encapsulated in accordance with the ISO base media file format (ISOBMFF).
- a track is defined in the standard ISO/IEC 14496-12 as a "timed sequence of related samples (q.v.) in an ISO base media file"; for media data, a track corresponds to a sequence of images or sampled audio, while for hint tracks a track corresponds to a streaming channel.
- An ISOBMFF file is composed of a plurality of boxes, wherein one box can include other boxes.
- supplemental enhancement information (SEI) is a type of network abstraction layer unit (NALU) defined in the video coding and decoding standards H.264 and H.265 issued by the International Telecommunication Union (ITU).
- the media presentation description (MPD) is a document specified in the standard ISO/IEC 23009-1 that contains the metadata from which the client constructs HTTP URLs. An MPD includes one or more period elements; each period element includes one or more adaptation sets; each adaptation set includes one or more representations; and each representation includes one or more segments. The client selects a representation based on the information in the MPD and builds the HTTP URL of a segment.
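- a toy illustration of this hierarchy and of expanding a segment template into an HTTP URL; real MPDs follow the template and timeline mechanisms of ISO/IEC 23009-1, and the data below is invented.

```python
# Invented MPD-like data mirroring the period -> adaptation set ->
# representation -> segment hierarchy described above.
mpd = {
    "periods": [{
        "adaptation_sets": [{
            "representations": [{
                "id": "video-1080p",
                "bandwidth": 5000000,
                "base_url": "http://example.com/vod/",
                "media_template": "video-1080p-$Number$.m4s",
            }]
        }]
    }]
}

def segment_url(representation, number):
    """Expand the $Number$ template into the HTTP URL of one media segment."""
    name = representation["media_template"].replace("$Number$", str(number))
    return representation["base_url"] + name

rep = mpd["periods"][0]["adaptation_sets"][0]["representations"][0]
print(segment_url(rep, 7))  # http://example.com/vod/video-1080p-7.m4s
```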
- the spatial region of a VR video (a spatial region may also be called a spatial object) is a 360-degree panoramic space (or omnidirectional space, or panoramic space object), which exceeds the normal visual range of the human eye; therefore, the user may change the viewing angle (i.e., the field of view, FOV) at any time while watching the video.
- FIG. 1 is a schematic diagram of a perspective corresponding to a change in viewing angle.
- Box 1 and Box 2 are two different perspectives of the user, respectively.
- the video image viewed when the user's perspective is box 1 is a video image presented by the one or more spatial objects corresponding to the perspective at the moment.
- the user's perspective is switched to box 2.
- the video image viewed by the user should also be switched to the video image presented by the space object corresponding to box 2 at that moment.
- the server may divide the panoramic space (or the panoramic spatial object) within the viewing-angle range corresponding to the omnidirectional video to obtain a plurality of spatial objects;
- each spatial object can correspond to a sub-view of the user, the stitching of the sub-views forms the complete human-eye viewing angle, and each spatial object corresponds to a sub-area of the panoramic space.
- the human-eye angle of view (hereinafter referred to as the angle of view) may correspond to one or more of the divided spatial objects; the spatial objects corresponding to a perspective are all the spatial objects covering the content within the current human-eye viewing range.
- the viewing angle of the human eye can change dynamically, but is typically about 120 degrees * 120 degrees; the spatial objects corresponding to the content within this 120 degrees * 120 degrees range may include one or more of the divided spatial objects, for example the viewing angle 1 corresponding to box 1 of FIG. 1 and the viewing angle 2 corresponding to box 2.
- the client may obtain, through the MPD, the spatial information of the video code stream that the server prepares for each spatial object, and may then request, according to the current requirement of the perspective, the video code stream segments corresponding to one or more spatial objects in a certain time period, and output the spatial objects corresponding to that perspective.
- the client outputs the video stream segment corresponding to all the spatial objects within the 360-degree viewing angle range in the same time period, and then displays the complete video image in the entire 360-degree panoramic space.
- the server may first map the spherical surface into a plane, and divide the spatial object on the plane. Specifically, the server may map the spherical surface into a latitude and longitude plan by using a latitude and longitude mapping manner.
- FIG. 2 is a schematic diagram of a spatial object according to an embodiment of the present invention. The server can map the spherical surface into a latitude and longitude plan, and divide the latitude and longitude plan into a plurality of spatial objects such as A to I.
- the server may also map the spherical surface into a cube, expand the plurality of faces of the cube to obtain a plan view, or map the spherical surface to other polyhedrons, and expand the plurality of faces of the polyhedron to obtain a plan view or the like.
- the server may also map the spherical surface to a plane using other mapping methods, determined by the requirements of the actual application scenario and not limited herein. The following description uses the latitude and longitude mapping of FIG. 2: after the server divides the spherical panoramic space into a plurality of spatial objects A to I, a set of video code streams can be prepared for each spatial object.
- each spatial object thus has a corresponding set of video code streams; when the user switches the viewing angle while watching, the client can obtain the code stream corresponding to the new spatial object according to the new perspective selected by the user, and present the video content of that code stream in the new perspective.
- when the video creator (hereinafter referred to as the author) produces a video, the author can design a main plot route for video playback according to the needs of the storyline.
- the user only needs to watch the video images corresponding to the main plot route to understand the storyline; the other video images are optional.
- the client can therefore selectively play the video images corresponding to the storyline and omit the other video images, which saves transmission and storage resources for the video data and improves the efficiency of video data processing.
- the video image to be presented to the user at each play time during playback can be set according to the main plot route described above, and stringing the video images of the successive play times together in time order yields the storyline of the main plot route. The video image to be presented to the user at each play time is the image presented on the spatial object corresponding to that play time, that is, the image that the spatial object is to present during that time period.
- the angle of view corresponding to the video image to be presented at each of the playing times may be set as the author's perspective
- the spatial object that presents the video image in the perspective of the author may be set as the author space object.
- the code stream corresponding to the author space object can be set as the author perspective code stream.
- the author perspective code stream includes the video stream data of a plurality of video frames (the encoded data of those frames); each video frame can be presented as one image, so the author perspective code stream corresponds to a plurality of images.
- the image presented by the author's perspective is only part of the panoramic image (or VR image or omnidirectional image) that the entire video is to present.
- the spatial information of the spatial objects associated with the image corresponding to the author's perspective may be different or the same.
- the region information corresponding to the perspective can be encapsulated into a metadata track.
- the client can request from the server the video code stream corresponding to the region carried in the metadata track, decode it, and then present to the user the story scene picture corresponding to the author's perspective.
- the server does not need to transmit to the client the code streams of perspectives other than the author's perspective (the non-author perspectives), which saves resources such as the transmission bandwidth of the video data.
- the author's perspective is the perspective of a preset spatial object set by the author according to the video storyline;
- the author's spatial objects at different playback moments may be different or the same, and thus the author's perspective is a perspective that changes with the playing time.
- the author space object is a dynamic space object with changing positions, that is, the position of the author space object corresponding to each play time is different in the panoramic space.
- Each of the spatial objects shown in FIG. 2 is a spatial object divided according to a preset rule, and is a spatial object with a relative position in the panoramic space.
- the author space object corresponding to any play time is not necessarily fixed as shown in FIG. 2 .
- the spatial information may include location information of the center point of the spatial object or location information of the upper-left point of the spatial object, and may further include the width and the height of the spatial object.
- when the coordinate system corresponding to the spatial information is an angular coordinate system, the spatial information may be described by yaw angles; when it is a pixel coordinate system, the spatial information may be described by a spatial position in the latitude-longitude map, or by other geometric solid figures, without limitation here.
- when described by yaw angles, the pitch angle θ (pitch), the yaw angle ψ (yaw), and the roll angle Φ (roll) are used, together with values indicating the width and the height of the angular range. FIG. 3 is a schematic diagram of the relative positions of the center points of spatial objects in the panoramic space.
- the point O is the center of the 360-degree VR panoramic video spherical image, which can be considered as the position of the human eye when viewing the VR panoramic image.
- Point A is the center point of the target space object; C and F are the boundary points of the target space object along the horizontal coordinate axis passing through point A; E and D are the boundary points of the target space object along the longitudinal coordinate axis passing through point A; B is the projection of point A onto the equator along the spherical meridian; and I is the starting coordinate point of the horizontal direction on the equator.
- Pitch angle: the deflection angle, in the vertical direction, of the point to which the center position of the image of the target space object is mapped on the panoramic spherical (i.e. global space) image, such as ∠AOB in FIG. 3;
- Yaw angle: the deflection angle, in the horizontal direction, of the point to which the center position of the image of the target space object is mapped on the panoramic spherical image, such as ∠IOB in FIG. 3;
- Roll angle: the rotation angle about the line connecting the sphere center and the point to which the center position of the image of the target space object is mapped on the panoramic spherical image, such as ∠DOB in FIG. 3;
- Height of the angular range (the height of the target space object in the angular coordinate system): the maximum vertical field-of-view angle of the image of the target space object on the panoramic spherical image, such as ∠DOE in FIG. 3;
- Width of the angular range (the width of the target space object in the angular coordinate system): the maximum horizontal field-of-view angle of the image of the target space object on the panoramic spherical image, such as ∠COF in FIG. 3.
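- the five quantities above can be collected into a simple data structure, as in the sketch below; the field names are ours, and the containment test is deliberately rough (it ignores roll and spherical distortion), serving only to show how the width and height bound the region around its center.

```python
from dataclasses import dataclass

@dataclass
class SphericalRegion:
    """Region on the sphere in the angular coordinate system of FIG. 3.
    All values are in degrees; the field names are ours."""
    center_yaw: float    # horizontal deflection of the region center (angle IOB)
    center_pitch: float  # vertical deflection of the region center (angle AOB)
    roll: float          # rotation about the center direction (angle DOB)
    width: float         # maximum horizontal field-of-view angle (angle COF)
    height: float        # maximum vertical field-of-view angle (angle DOE)

def contains_direction(region, yaw, pitch):
    """Rough containment test ignoring roll and spherical distortion."""
    return (abs(yaw - region.center_yaw) <= region.width / 2
            and abs(pitch - region.center_pitch) <= region.height / 2)

fov = SphericalRegion(center_yaw=30, center_pitch=0, roll=0, width=120, height=120)
print(contains_direction(fov, yaw=80, pitch=-40))  # True: inside the 120*120 view
```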
- the spatial information may include location information of an upper left point of the spatial object, and location information of a lower right point of the spatial object.
- when the spatial object is not a rectangle, the spatial information may include at least one of a shape type, a radius, and a perimeter of the spatial object.
- the spatial information can include spatial rotation information for the spatial object.
- the spatial information may be encapsulated in spatial information data or in a spatial information track; the spatial information data may be a code stream of the video data, metadata of the video data, or a file independent of the video data, and the spatial information track may be a track independent of the video data.
- the spatial information may also be encapsulated in the spatial metadata of the video itself, for example in a dedicated box such as a covi (coverage information) box.
- in one example, the coordinate system for describing the width and height of the target space object is as shown in FIG. 4: the shaded portion of the spherical surface is the target space object, and the vertices of its four corners are B, E, G, and I; O is the center of the sphere corresponding to the 360-degree VR panoramic video spherical image. The vertices B, E, G, and I lie on circles that pass through the sphere center O (each circle is centered on O, its radius is the radius of the sphere corresponding to the 360-degree VR panoramic video spherical image, and each passes through the z-axis; there are two such circles, one through the points B, A, I, and O and one through the points E, F, G, and O) and on circles parallel to the x-axis and the y-axis (these circles are not centered on O; there are two of them, parallel to each other, one through the points B, D, and E and one through the points I, H, and G). C is the center point of the target space object, the angle corresponding to the DH edge represents the height of the target space object, the angle corresponding to the AF edge represents its width, and the DH edge and the AF edge both pass through the point C.
- alternatively, the target space object may be obtained by intersecting two great circles through the sphere center with two parallel circles; or by intersecting two yaw-angle circles with two pitch-angle circles, where all points on a yaw-angle circle have the same yaw angle and all points on a pitch-angle circle have the same pitch angle; or by intersecting two longitude circles with two latitude circles.
- in another example, the coordinate system for describing the width and height of the target space object is as shown in FIG. 5: the shaded portion of the spherical surface is the target space object, and the vertices of its four corners are B, E, G, and I; O is the center of the sphere corresponding to the 360-degree VR panoramic video spherical image. Here the vertices B, E, G, and I lie on circles that pass through the z-axis (each circle is centered on the sphere center O, its radius is the radius of the sphere corresponding to the 360-degree VR panoramic video spherical image; there are two such circles, one through the points B, A, and I and one through the points E, F, and G) and on circles that pass through the y-axis (each centered on O with the same radius; there are two such circles, one through the points B, D, and E and one through the points I, H, and G). C is the center point of the target space object, the angle corresponding to the DH edge represents the height of the target space object, the angle corresponding to the AF edge represents its width, and the DH edge and the AF edge both pass through the point C. The vertex of the angle corresponding to the AF edge is the point O; the vertex of the angle corresponding to the BI edge is the point L, where L is the intersection of the z-axis with the circle that passes through the points B and I and is parallel to the x-axis and the y-axis; the vertex of the angle corresponding to the EG edge is, analogously, the intersection of the z-axis with the circle that passes through the points E and G and is parallel to the x-axis and the y-axis; and the vertex of the angle corresponding to the DH edge is also the point O.
- the target space object may also be obtained by intersecting two circles passing through the x-axis with two circles passing through the z-axis, or by intersecting two circles passing through the x-axis with two circles passing through the y-axis, or by intersecting four great circles through the sphere center.
- in a further example, the coordinate system for describing the width and height of the target space object is as shown in FIG. 6: the shaded portion of the spherical surface is the target space object, and the vertices of its four corners are B, E, G, and I; O is the center of the sphere corresponding to the 360-degree VR panoramic video spherical image. Here the vertices B, E, G, and I lie on circles parallel to the x-axis and the z-axis (these circles are not centered on the sphere center O; there are two of them, parallel to each other, one through the points B, A, and I and one through the points E, F, and G) and on circles parallel to the x-axis and the y-axis (also not centered on O; there are two of them, parallel to each other, one through the points B, D, and E and one through the points I, H, and G). C is the center point of the target space object, the angle corresponding to the DH edge represents the height of the target space object, the angle corresponding to the AF edge represents its width, and the DH edge and the AF edge both pass through the point C; the BI, EG, and DH edges correspond to the same angle, the BE, IG, and AF edges correspond to the same angle, and the vertices of the angles corresponding to all of these edges are the point O.
- the target space object may also be obtained by intersecting two circles that are parallel to the y-axis and the z-axis but do not pass through the sphere center with two circles that are parallel to the y-axis and the x-axis but do not pass through the sphere center; or by intersecting two circles that are parallel to the y-axis and the z-axis but do not pass through the sphere center with two circles that are parallel to the z-axis and the x-axis but do not pass through the sphere center.
- in FIG. 5, the vertex of the angle corresponding to the BE edge is the point J and the vertex of the angle corresponding to the BI edge is the point L, the J and L points playing the same role as the J point in FIG. 4; in FIG. 6, the vertices of the angles corresponding to both the BE edge and the BI edge are the point O.
- FIG. 11 is a schematic diagram of a mapping relationship between a spatial object and video data according to an embodiment of the present invention.
- FIG. 11(a) shows an omnidirectional video (large image on the left) and a sub-region of the omnidirectional video (panel on the right); FIG. 11(b) shows the video space (sphere) corresponding to the omnidirectional video and the spatial object corresponding to a sub-region of the omnidirectional video (the dark portion on the sphere).
- a timed metadata track for a region on a spherical surface is specified, in which the metadata box contains a box describing the spherical region and the data in the media data box includes the information of the spherical region; the metadata box also describes the intent of the timed metadata track, that is, what the spherical region is used for. Two timed metadata tracks are described in the standard: the recommended viewport timed metadata track and the initial viewpoint timed metadata track.
- the recommended viewport track describes the viewport region recommended for presentation on the terminal, and the initial viewpoint track describes the initial presentation direction for viewing the omnidirectional video.
- as shown in FIG. 7, the server side 701 includes a content preparation 7011 and a content service 7012.
- the content preparation 7011 may be a media data collection device or a transcoder of the media data, and is responsible for generating the streaming media content and the related metadata, for example the compression, encapsulation, and storage/transmission of the media files (video, audio, etc.).
- the content preparation 7011 can generate metadata information and a file in which the metadata source is located. Metadata can be encapsulated as a metadata track, and metadata can also be encapsulated in the SEI of the video data track.
- a sample in a metadata track is a partial region of the omnidirectional video specified by the content creator or director, and the metadata source information is encapsulated in the metadata track or carried in the MPD.
- the source information of the metadata can be carried in the SEI.
- the source information of the metadata may indicate that the metadata describes the viewing area recommended by the producer of the content or by the director.
- the content service 7012 can be a network node, such as a content distribution network (CDN) or a proxy server.
- the content service 7012 may obtain the stored or transmitted data from the content preparation 7011 and forward it to the terminal side 702; or it may obtain the region information fed back by the terminal from the terminal side 702, generate a region metadata track or region SEI information according to the feedback, and generate a file that carries the source of the region information.
- the generated region metadata track or region SEI information may be derived from the viewing information fed back for each region of the omnidirectional video: samples describing the users' region of interest are generated from one or more statistically selected regions, the samples are encapsulated in a metadata track or in the SEI, and the source information of the region metadata is encapsulated in the track, carried in the MPD, or carried in the SEI.
- in this case the source information indicates that the region metadata is derived from server statistics, i.e. that the region described in the metadata track is the region of interest to most users.
- the region information in the region metadata track or the region SEI may also be region information fed back by a particular user designated by the server; the region metadata track or region SEI is generated according to that feedback, and the source information of the region metadata is carried in the region metadata track, in the MPD, or in the SEI, describing that the region metadata comes from a certain user.
- content preparation 7011 and the content service 7012 can be on the same server hardware device or different hardware devices. Both content preparation 7011 and content service 7012 may include one or more hardware devices.
- the terminal side 702 obtains the media data and presents it; it also obtains the region information of the content that the user is viewing in the omnidirectional video and feeds that region information back to the server side 701. Alternatively, the terminal side 702 obtains the metadata of the media data together with the data carrying the metadata source information, analyzes the metadata source information, parses the corresponding metadata according to the metadata source selected by the end user, and obtains the region information for media presentation.
- the module processing related to the source information of the metadata track is as follows:
- the source information may indicate that the region related to the metadata is recommended by the producer of the content, recommended by the director, or recommended by a specified user, or that it is a region most users are interested in according to relevant statistics; the source information may likewise indicate a viewport of the omnidirectional video that the content creator or director recommends to the user, the most-interested region derived by the server, or a viewport recommended by a particular user (these source types are sketched in code below).
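- to make these source types concrete, they can be modeled as a small enumeration; a minimal Python sketch, where values 0 and 1 follow the source_type convention used in the embodiments below and the specific-user value is illustrative:

```python
from enum import IntEnum

class SourceType(IntEnum):
    """Recommender of a region of the omnidirectional media (sketch)."""
    PRODUCER_OR_DIRECTOR = 0   # region recommended by the content producer or director
    MOST_INTERESTED_STATS = 1  # most-interested region derived from server statistics
    SPECIFIC_USER = 2          # region recommended by a particular user (illustrative value)
```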
- the information of the region refers to metadata describing the region. It may indicate a region recommended by the creator or the director of the content, the region users are most interested in as obtained from statistics on user feedback, or the region in which an end user views the omnidirectional video.
- the region may be a two-dimensional planar region or a spherical region. Two-dimensional planar region information is represented by the coordinates of the top-left pixel of the region in the two-dimensional plane together with the width and height of the region in that plane; spherical region information is represented by the position of the center point of the region on the sphere together with the width and height coverage angles of the region on the sphere.
- for example, the regions may be represented in the manners shown in FIG. 1 to FIG. 6.
- the region may also be a direction on the sphere or a point on the sphere, in which case the representation of the region has no width and height information (see the data-structure sketch below).
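- the two region representations just described can be sketched as simple data structures; a minimal sketch in Python, with illustrative field names:

```python
from dataclasses import dataclass

@dataclass
class PlanarRegion:
    left: int    # x coordinate of the top-left pixel in the 2D plane
    top: int     # y coordinate of the top-left pixel in the 2D plane
    width: int   # width of the region in the 2D plane
    height: int  # height of the region in the 2D plane

@dataclass
class SphericalRegion:
    center_yaw: float       # center point of the region on the sphere (degrees)
    center_pitch: float
    azimuth_range: float    # width coverage angle on the sphere; 0 for a point or direction
    elevation_range: float  # height coverage angle on the sphere; 0 for a point or direction
```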
- the module encapsulates the metadata and its source information into a metadata track, or carries the source information in the generated MPD file; or the metadata and the source of the metadata are encapsulated in the SEI to generate a bitstream file.
- the file generated by the module can be stored locally or sent to the receiving end.
- the receiving end can be the terminal side or the content service side.
- the module for the source information of the metadata track may be the content preparation 7011 of FIG. 7, the content service 7012, or a separate sub-module on the terminal side 702, or the related functions may be integrated into the above devices.
- the technical solution of the embodiments of the present invention mainly applies to the content preparation side (transcoder), intelligent network nodes (CDN, proxy server), and the terminal player side.
- when the transcoding server, the network server, or the terminal generates the region metadata, the metadata is encapsulated into an independent track or encapsulated in the SEI, and the source information of the metadata is encapsulated in the metadata track, in the SEI, or in the MPD file.
- an embodiment of the present invention discloses a method S80 for processing media information, and the method S80 includes:
- S801: obtain metadata information of the media data, where the metadata information includes source information of the metadata, the source information is used to represent a recommender of the media data, and the media data is omnidirectional media data;
- S802: process the media data according to the source information of the metadata.
- an embodiment of the present invention discloses a media information processing apparatus 90.
- the apparatus 90 includes an information acquiring module 901 and a processing module 902.
- the information obtaining module 901 is configured to obtain metadata information of the media data, where the metadata information includes source information of the metadata, where the source information is used to represent a recommender of the media data, and the media data is omnidirectional media data.
- the processing module 902 is configured to process the media data according to the source information of the metadata (a minimal sketch of the two modules follows).
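- a minimal sketch of how the two modules could fit together; the dict-based input and the presentation labels are placeholders rather than the patent's implementation, and the 0/1 values follow the source_type convention used below:

```python
class InformationObtainingModule:
    """Sketch of module 901: returns metadata info including its source information.
    A real implementation would parse a metadata track, the MPD, or the SEI."""
    def obtain(self, container: dict) -> dict:
        return {"source_type": container["source_type"], "region": container["region"]}

class ProcessingModule:
    """Sketch of module 902: chooses a presentation according to the recommender."""
    def process(self, media_data, info: dict):
        if info["source_type"] == 0:
            return ("producer/director recommendation", info["region"], media_data)
        return ("statistically most-interested region", info["region"], media_data)

def apparatus_90(container: dict, media_data):
    info = InformationObtainingModule().obtain(container)
    return ProcessingModule().process(media_data, info)
```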
- in some embodiments, the source information of the metadata is carried in the metadata track.
- a box describing the source of the sample data is added to the metadata track, and the source of the track is described in this box.
- the format of the box added in this embodiment may be, for example, as follows:

  aligned(8) class SourceInformationBox extends Box('sinf') {
      unsigned int(8) source_type;
  }
- source_type describes the source information of the track in which the above box is located.
- when source_type is 0, the region information in the track is recommended by the producer or director of the content, and the terminal side can use the information in the track to present the producer-recommended media content to the user; when source_type is 1, the region information in the track is the region users are most interested in, obtained from statistics, and the terminal side can use the information in the track to present the region of the omnidirectional media that most users are interested in to the user.
- the processing flow on the terminal side for obtaining the information of the above metadata track is as follows (a code sketch follows the flow):
- the terminal acquires the metadata track, parses the metadata box (moov box) in the metadata track, and parses the boxes therein to obtain the sinf box;
- if source_type is 0, the metadata described by the source information is a recommendation from the creator of the omnidirectional video; other values may indicate a recommendation from a user viewing the omnidirectional video, or a recommendation derived from statistics on viewing-angle data.
- when receiving region metadata, the client can thus distinguish metadata from different sources; when multiple sets of region metadata are available, the user can select the recommended region to view according to individual needs.
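- a minimal sketch of this parsing flow in Python, assuming the box layout shown above (a single source_type byte in the 'sinf' box); the set of container boxes searched is an assumption:

```python
import struct

def iter_boxes(buf, offset=0, end=None):
    """Yield (type, payload_start, payload_end) for the ISO BMFF boxes in buf."""
    end = len(buf) if end is None else end
    while offset + 8 <= end:
        size, btype = struct.unpack_from(">I4s", buf, offset)
        header = 8
        if size == 1:    # 64-bit largesize follows the type field
            size = struct.unpack_from(">Q", buf, offset + 8)[0]
            header = 16
        elif size == 0:  # box extends to the end of the enclosing structure
            size = end - offset
        if size < header:
            break        # malformed box; stop scanning
        yield btype.decode("ascii"), offset + header, offset + size
        offset += size

CONTAINERS = {"moov", "trak", "mdia", "minf", "stbl", "schi"}  # assumed search path

def find_source_type(buf, offset=0, end=None):
    """Depth-first search for the 'sinf' box; return its source_type byte."""
    for btype, start, stop in iter_boxes(buf, offset, end):
        if btype == "sinf":
            return buf[start]  # unsigned int(8) source_type
        if btype in CONTAINERS:
            found = find_source_type(buf, start, stop)
            if found is not None:
                return found
    return None
```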
- in other embodiments, the source information of the metadata is carried in the MPD.
- a source information descriptor is added using the standard SupplementalProperty/EssentialProperty elements specified in ISO/IEC 23009-1.
- the descriptor describes the source of the information, and the value of the descriptor is defined as follows:
- the above descriptor can appear in the adaptationSet element of the MPD, or in a representation element of the MPD; in the example below, the descriptor is in the element of a representation.
- alternatively, an attribute describing the source of the representation may be added to the adaptationSet element or the representation element; for example, the attribute may be named sourceType.
- when sourceType is 0, the region information described by the representation is recommended by the producer of the content, and the terminal side can use it to present the producer-recommended media content to the user; when sourceType is 1, the region information is the region users are most interested in, obtained from statistics, and the terminal side can use it to present the region of the omnidirectional media that most users are interested in to the user.
- the descriptor and the attribute above can each be used to describe that the region information in the file metadata.mp4 referenced by the representation is a recommendation of the producer of the video.
- the processing flow on the terminal side for obtaining the above sample information is as follows:
- the terminal obtains the MPD file and parses it; if an adaptationSet or representation element contains a descriptor whose scheme is urn:mpeg:dash:purpose, the terminal parses the value of the descriptor (a parsing sketch follows).
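- a minimal sketch of this step; the MPD fragment is hypothetical, the scheme urn:mpeg:dash:purpose comes from the text, and mapping value "0" to a producer recommendation follows the convention above:

```python
import xml.etree.ElementTree as ET

MPD_NS = "{urn:mpeg:dash:schema:mpd:2011}"

SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period><AdaptationSet>
    <Representation id="meta" sourceType="0">
      <SupplementalProperty schemeIdUri="urn:mpeg:dash:purpose" value="0"/>
      <BaseURL>metadata.mp4</BaseURL>
    </Representation>
  </AdaptationSet></Period>
</MPD>"""

def purpose_descriptors(mpd_xml):
    """Return (representation id, descriptor value) pairs for urn:mpeg:dash:purpose."""
    root = ET.fromstring(mpd_xml)
    found = []
    for rep in root.iter(MPD_NS + "Representation"):
        for prop in rep.iter(MPD_NS + "SupplementalProperty"):
            if prop.get("schemeIdUri") == "urn:mpeg:dash:purpose":
                found.append((rep.get("id"), prop.get("value")))
    return found

print(purpose_descriptors(SAMPLE_MPD))  # [('meta', '0')] -> producer recommendation
```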
- in other embodiments, the source information of the metadata is carried in the SEI.
- when the payload type of an SEI message is SRC, the SEI payload carries the source information of the metadata, with the syntax described in the following table; here SRC represents a specific payload type value, such as 190, which is not limited herein.
- the source_type in the payload describes the source information of the region information described by the above SEI; for example, when source_type indicates server statistics, the terminal uses the region information to present to the user the region of the omnidirectional media that most users are interested in.
- the processing flow on the terminal side for the above video bitstream is as follows (a parsing sketch follows the flow):
- the terminal acquires the video bitstream and parses the NALU header information in the bitstream; if the parsed header type is the SEI type, the terminal parses the SEI NALU to obtain the payload type of the SEI;
- if the payload type is SRC, the source information is parsed to obtain source_type; the region information in the video bitstream is then parsed, and the media corresponding to the obtained region of the omnidirectional media is presented to the user.
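- a minimal sketch of this flow, assuming an H.264 Annex B bitstream (nal_unit_type 6 for SEI) and the example SRC payload type 190 from the text; payload-size parsing and emulation-prevention removal are omitted for brevity:

```python
SEI_NUT = 6             # H.264 SEI nal_unit_type (HEVC would use 39/40)
SRC_PAYLOAD_TYPE = 190  # example value from the text; not fixed by the embodiments

def iter_nalus(bitstream):
    """Yield NAL units from an Annex B byte stream (00 00 01 start codes)."""
    starts, i, n = [], 0, len(bitstream)
    while i < n - 3:
        if bitstream[i:i + 3] == b"\x00\x00\x01":
            starts.append(i + 3)
            i += 3
        else:
            i += 1
    for k, s in enumerate(starts):
        e = (starts[k + 1] - 3) if k + 1 < len(starts) else n
        yield bitstream[s:e]

def find_source_sei(bitstream):
    """Return the payload of the first SEI message whose payload type is SRC."""
    for nalu in iter_nalus(bitstream):
        if nalu[0] & 0x1F != SEI_NUT:
            continue
        pos, ptype = 1, 0
        while nalu[pos] == 0xFF:  # payload type coded as 0xFF bytes plus a final byte
            ptype += 255
            pos += 1
        ptype += nalu[pos]
        if ptype == SRC_PAYLOAD_TYPE:
            return nalu[pos + 1:]  # payload (size field not parsed in this sketch)
    return None
```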
- in some embodiments, the semantics of the source information may be extended, for example with a language-tagged description of the source:
- language: the language of the subsequent string, using the language codes in ISO 639-2/T to represent the various languages.
- sourceDescription: a string describing the content of the source metadata.
- the semantics of the source information may also be extended with a recommendation reason:
- Reason_description: a string describing the reason for recommending the metadata, or a description of the video content corresponding to the recommended metadata.
- the semantics of the source information may further be extended with information about the recommending users:
- Person_description: age information of the users recommending the metadata, or the age ranges counted, such as children, youth, or the elderly, or 0-10, 10-20, and so on.
- the SourceInformationBox may be included in the scheme information box.
- source_type takes an integer value indicating the source type of the metadata; the meanings of the different values are as described above.
- Date describes the time at which the metadata was generated/recommended.
- ID_lenght describes the length of the ID_description; its value is the length of the ID_description minus 1.
- ID_description describes the name of the recommender.
- Reason_lenght describes the length of the reason_description; its value is the length of the reason_description minus 1.
- reason_description describes the reason for recommending the metadata, or the description information of the video content corresponding to the recommended metadata (a parsing sketch of this extended box follows below).
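- a minimal Python sketch of parsing this extended payload; the field widths (a one-byte type, a 32-bit date, one-byte length fields) are assumptions, since the text names the fields but not their sizes:

```python
import struct

def parse_source_information(payload):
    """Parse the extended SourceInformationBox payload sketched above."""
    source_type = payload[0]
    date, = struct.unpack_from(">I", payload, 1)  # seconds since 1904-01-01 per the text
    pos = 5
    id_len = payload[pos] + 1                     # stored value is actual length minus 1
    pos += 1
    recommender = payload[pos:pos + id_len].rstrip(b"\x00").decode("utf-8")
    pos += id_len
    reason_len = payload[pos] + 1
    pos += 1
    reason = payload[pos:pos + reason_len].rstrip(b"\x00").decode("utf-8")
    return {"source_type": source_type, "date": date,
            "recommender": recommender, "reason": reason}
```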
- SourceInformationBox can also take other names, such as natrueInformationBox. The syntax of natrueInformationBox may be, for example, as follows:

  aligned(8) class natrueInformationBox extends Box('ninf') {
      unsigned int(8) natrue_type;
      unsigned int(32) date;
      unsigned int(8) ID_lenght;
      string ID_description;
      unsigned int(8) reason_lenght;
      string reason_description;
  }
- natrue_type takes an integer value indicating the source type of the metadata; the meanings of the different values are as described above.
- date describes the time at which the metadata is generated/recommended; it can be an integer time in seconds, or another representation of time.
- ID_lenght describes the length of the ID_description; its value is the length of the ID_description minus 1.
- ID_description describes the name of the recommender.
- Reason_lenght describes the length of the reason_description; its value is the length of the reason_description minus 1.
- the reason_description describes the reason for recommending the metadata, or the description information of the video content corresponding to the recommended metadata.
- natrue_type is an integer that indicates the type of the source. Values of natrue_type are specified such that, for example, one value indicates that the recommended viewport timed metadata track represents the recommendation of a particular person or user; other values of natrue_type are reserved.
- date is an integer that declares the recommended time of the metadata (in seconds since midnight, Jan. 1, 1904, in UTC time).
- ID_lenght indicates the length in bytes of the ID_description field minus one.
- ID_description specifies the name of the recommending person; it is a null-terminated string in UTF-8 characters.
- Reason_lenght indicates the length in bytes of the reason_description field minus one.
- reason_description specifies the recommendation reason or the description of the media content corresponding to the metadata; it is a null-terminated string in UTF-8 characters.
- a SourceInformationBox or a natrueInformationBox may be carried in a tref box of a media track; tref, described in the ISO/IEC 14496-12 standard, is the track reference box, which describes the tracks associated with the current media track. SourceInformationBox or natrueInformationBox can be an extension of the tref box: aligned(8) class SourceInformationBox extends tref('sinf', 0, 0).
- the information of the intent/source of the metadata may also be represented by a sample entry type.
- for example, the sample entry type for the region of interest of most users may be 'mroi'; for the recommendation of a particular user, 'proi'; and for a region recommended by the author or director, 'droi' (see the mapping sketched below).
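- a small illustrative mapping from these sample entry types to labels a player might show when listing the available recommendations; the four-character codes come from the text, the labels are assumptions:

```python
SAMPLE_ENTRY_SOURCES = {
    "mroi": "region most users are interested in (server statistics)",
    "proi": "region recommended by a particular user",
    "droi": "region recommended by the author or director",
}

def describe_track(sample_entry_type):
    return SAMPLE_ENTRY_SOURCES.get(sample_entry_type, "unknown recommendation source")
```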
- the terminal side presents to the user the description information of the metadata tracks that can be recommended, and the user selects the recommendation to view according to the description information.
- the terminal acquires the corresponding metadata track according to the user's selection, parses the obtained metadata track to obtain the region information in the track, and presents the omnidirectional media according to the region information.
- alternatively, the terminal feeds the user's selection back to the content server side; the content service side obtains the metadata track according to the fed-back user selection, parses the metadata track to obtain the region information, and processes the media data using that region information.
- the region information in the metadata track can also be used to create a motion viewing environment for the user: the viewing environment rotates according to the yaw angle, the pitch angle, and the roll angle in the region information.
- for example, the viewing environment can be a rotatable seat that moves left and right, pitches, or rotates according to the region information (a sketch of the server-side statistics mentioned above follows).
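- the server-side statistics mentioned above (determining a most-interested region from viewports fed back by many clients) can be sketched as follows; the (yaw, pitch) binning scheme is an assumption:

```python
from collections import Counter

def most_interested_region(viewports, bin_deg=10):
    """Aggregate (yaw, pitch) viewport centers fed back by clients and return
    the center of the most-viewed bin as the statistically most-interested region."""
    counts = Counter(
        (int(yaw // bin_deg), int(pitch // bin_deg)) for yaw, pitch in viewports
    )
    (ybin, pbin), _ = counts.most_common(1)[0]
    return (ybin * bin_deg + bin_deg / 2, pbin * bin_deg + bin_deg / 2)

# example: feedback from four clients clusters around yaw 13, pitch 4
print(most_interested_region([(12, 3), (14, 5), (80, -10), (13, 4)]))  # (15.0, 5.0)
```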
- FIG. 10 is a schematic diagram showing the hardware structure of a computer device 100 according to an embodiment of the present invention.
- the computer device 100 can be used as an implementation manner of a processing device for streaming media information, and can also be used as an implementation manner of a method for processing information of a streaming media.
- the computer device 100 includes a processor 101 and a memory 102.
- the computer device 100 further includes an input/output interface 103 and a bus 105, and may also include a communication interface 104.
- the processor 101, the memory 102, the input/output interface 103, and the communication interface 104 are communicatively connected to each other through the bus 105.
- the processor 101 can be a general-purpose central processing unit (CPU), a microprocessor, an application specific integrated circuit (ASIC), or one or more integrated circuits for executing related programs.
- the processor 101 may be an integrated circuit chip with signal processing capability. In an implementation process, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 101 or by instructions in the form of software.
- the processor 101 described above may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
- the methods, steps, and logical block diagrams disclosed in the embodiments of the present invention may be implemented or carried out by such a processor.
- the general-purpose processor may be a microprocessor, or any conventional processor or the like.
- the steps of the method disclosed in the embodiments of the present invention may be directly implemented by a hardware decoding processor, or may be performed by a combination of hardware and software modules in the decoding processor.
- the software modules can be located in a conventional storage medium such as random access memory, flash memory, read only memory, programmable read only memory or electrically erasable programmable memory, registers, and the like.
- the storage medium is located in the memory 102; the processor 101 reads the information in the memory 102 and, in combination with its hardware, completes the functions required by the modules included in the processing apparatus for streaming media information provided by the embodiments of the present invention, or performs the method for processing streaming media information provided by the method embodiments of the present invention.
- the memory 102 can be a read only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM).
- the memory 102 can store operating systems as well as other applications.
- when the program code for implementing the functions of the modules included in the processing apparatus for streaming media information provided by the embodiments of the present invention, or for performing the method for processing streaming media information provided by the method embodiments of the present invention, is stored in the memory 102, the processor 101 performs the operations performed by the modules of the processing apparatus, or performs the method provided by the method embodiments of the present invention.
- the input/output interface 103 is for receiving input data and information, and outputting data such as an operation result.
- the communication interface 104 enables communication between the computer device 100 and other devices or communication networks using a transceiver apparatus such as, but not limited to, a transceiver; it can serve as the obtaining module or the sending module in the processing apparatus.
- Bus 105 may include a path for communicating information between various components of computer device 100, such as processor 101, memory 102, input/output interface 103, and communication interface 104.
- although the computer device 100 shown in FIG. 10 only shows the processor 101, the memory 102, the input/output interface 103, the communication interface 104, and the bus 105, those skilled in the art will understand that, in a specific implementation, the computer device 100 also includes other devices necessary for normal operation, for example, a display for displaying the video data to be played. Depending on particular needs, the computer device 100 may also include hardware devices implementing other additional functions. Moreover, the computer device 100 may include only the components necessary to implement the embodiments of the present invention, and need not include all of the devices shown in FIG. 10.
Claims (16)
- 1. A media information processing method, wherein the method comprises: obtaining metadata information of media data, where the metadata information comprises source information of the metadata, the source information is used to indicate a recommender of the media data, and the media data is omnidirectional media data; and processing the media data according to the source information of the metadata.
- 2. The method according to claim 1, wherein the obtaining metadata information of media data comprises: obtaining a metadata track of the media data, where the metadata track comprises the source information of the metadata.
- 3. The method according to claim 1, wherein the obtaining metadata information of media data comprises: obtaining a media presentation description file of the media data, where the media presentation description file comprises the source information of the metadata.
- 4. The method according to claim 1, wherein the obtaining metadata information of media data comprises: obtaining a bitstream comprising the media data, where the bitstream further comprises supplementary enhancement information (SEI), and the SEI comprises the source information of the metadata.
- 5. The method according to any one of claims 1 to 4, wherein the source information of the metadata is a source type identifier.
- 6. The method according to any one of claims 1 to 4, wherein the source information of the metadata comprises a semantic representation of the recommender of the media data.
- 7. A media information processing apparatus, wherein the apparatus comprises: an information obtaining module, configured to obtain metadata information of media data, where the metadata information comprises source information of the metadata, the source information is used to indicate a recommender of the media data, and the media data is omnidirectional media data; and a processing module, configured to process the media data according to the source information of the metadata.
- 8. The apparatus according to claim 7, wherein the information obtaining module is specifically configured to obtain a metadata track of the media data, where the metadata track comprises the source information of the metadata.
- 9. The apparatus according to claim 7, wherein the information obtaining module is specifically configured to obtain a media presentation description file of the media data, where the media presentation description file comprises the source information of the metadata.
- 10. The apparatus according to claim 7, wherein the information obtaining module is specifically configured to obtain a bitstream comprising the media data, where the bitstream further comprises supplementary enhancement information (SEI), and the SEI comprises the source information of the metadata.
- 11. The apparatus according to any one of claims 7 to 10, wherein the source information of the metadata is a source type identifier.
- 12. The apparatus according to any one of claims 7 to 10, wherein the source information of the metadata comprises a semantic representation of the recommender of the media data.
- 13. A media information processing method, wherein the method comprises: receiving user viewport information sent by a plurality of clients, where the user viewport information is used to indicate a viewport at which a user watches omnidirectional media data; determining a target viewport according to all of the user viewport information; and sending media data corresponding to the target viewport.
- 14. A media information processing method, wherein the method comprises: receiving user viewport information sent by a plurality of clients, where the user viewport information is used to indicate a viewport at which a user watches omnidirectional media data; determining a target viewport according to all of the user viewport information; and generating metadata information of media data according to the target viewport.
- 15. A media information processing apparatus, wherein the apparatus comprises: a receiver, configured to receive user viewport information sent by a plurality of clients, where the user viewport information is used to indicate a viewport at which a user watches omnidirectional media data; a processor, configured to determine a target viewport according to all of the user viewport information; and a transmitter, configured to send media data corresponding to the target viewport.
- 16. A media information processing apparatus, wherein the apparatus comprises: a receiver, configured to receive user viewport information sent by a plurality of clients, where the user viewport information is used to indicate a viewport at which a user watches omnidirectional media data; and a processor, configured to determine a target viewport according to all of the user viewport information, and generate metadata information of media data according to the target viewport.
Priority Applications (10)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2020500115A | 2017-07-07 | 2018-03-09 | Media information processing method and apparatus |
| CA3069031A | 2017-07-07 | 2018-03-09 | Media information processing method and apparatus |
| SG11201913532YA | 2017-07-07 | 2018-03-09 | Media information processing method and apparatus |
| AU2018297439A | 2017-07-07 | 2018-03-09 | Method and apparatus for processing media information |
| BR112020000093-0A | 2017-07-07 | 2018-03-09 | Media information processing method and apparatus |
| RU2020104035A | 2017-07-07 | 2018-03-09 | Method and apparatus for processing media information |
| KR1020207002474A | 2017-07-07 | 2018-03-09 | Method and apparatus for processing media information |
| EP18829059.7A | 2017-07-07 | 2018-03-09 | Method and apparatus for processing multimedia information |
| PH12020500015A | 2017-07-07 | 2020-01-02 | Media information processing method and apparatus |
| US16/734,682 | 2017-07-07 | 2020-01-06 | Media information processing method and apparatus |
Applications Claiming Priority (2)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710551238.7 | 2017-07-07 | | |
| CN201710551238.7A | 2017-07-07 | 2017-07-07 | Media information processing method and apparatus |
Related Child Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US16/734,682 (Continuation, US20200145716A1) | Media information processing method and apparatus | 2017-07-07 | 2020-01-06 |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| WO2019007096A1 | 2019-01-10 |
Family
ID=64950588

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2018/078540 (WO2019007096A1) | Media information processing method and apparatus | 2017-07-07 | 2018-03-09 |
Country Status (12)

| Country | Link |
|---|---|
| US (1) | US20200145716A1 |
| EP (1) | EP3637722A4 |
| JP (1) | JP2020526969A |
| KR (1) | KR20200020913A |
| CN (1) | CN109218274A |
| AU (1) | AU2018297439A1 |
| BR (1) | BR112020000093A2 |
| CA (1) | CA3069031A1 |
| PH (1) | PH12020500015A1 |
| RU (1) | RU2020104035A |
| SG (1) | SG11201913532YA |
| WO (1) | WO2019007096A1 |
Also Published As

| Publication number | Publication date |
|---|---|
| RU2020104035A | 2021-08-09 |
| EP3637722A4 | 2020-07-15 |
| CA3069031A1 | 2019-01-10 |
| PH12020500015A1 | 2020-11-09 |
| KR20200020913A | 2020-02-26 |
| CN109218274A | 2019-01-15 |
| US20200145716A1 | 2020-05-07 |
| SG11201913532YA | 2020-01-30 |
| EP3637722A1 | 2020-04-15 |
| AU2018297439A1 | 2020-01-30 |
| JP2020526969A | 2020-08-31 |
| BR112020000093A2 | 2020-07-07 |