WO2019137313A1 - Method and apparatus for processing media information - Google Patents
Method and apparatus for processing media information
- Publication number: WO2019137313A1 (application PCT/CN2019/070480)
- Authority: WIPO (PCT)
Classifications
- H04N21/234 — Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/235 — Processing of additional data, e.g. scrambling of additional data or processing content descriptors
- H04N21/2362 — Generation or processing of Service Information [SI]
- H04N21/44016 — Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
- H04N21/816 — Monomedia components thereof involving special video data, e.g. 3D video
- H04N21/84 — Generation or processing of descriptive data, e.g. content descriptors
- H04N21/854 — Content authoring
- H04N21/8586 — Linking data to content by using a URL
- H04N21/21805 — Source of audio or video content enabling multiple viewpoints, e.g. using a plurality of cameras
Definitions
- the embodiments of the present invention relate to the field of streaming media transmission technologies, and in particular, to a method and an apparatus for processing media information.
- With the increasing popularity of virtual reality (VR) video viewing applications such as 360-degree video, more and more users are joining the ranks of large-field-of-view VR video viewing.
- This new video viewing application brings new video viewing modes and visual experiences to users while also bringing new technical challenges.
- the spatial area of a VR video is a 360-degree panoramic space, which exceeds the normal visual range of the human eye; the user can therefore change the viewing angle (or viewport) at any time while watching. Different viewing angles show different video images, so the content presented by the video needs to change as the user's viewport changes.
- the client does not need to display the entire image area; only a part of the full image needs to be decoded and rendered on the client.
- the client can splice the required substreams to obtain a standard code stream, and in order to be compatible with existing standard codecs, the substreams need to meet certain conditions before they can be spliced together.
- the existing omnidirectional media format (OMAF) standard uses an additional metadata track that records the track IDs and splicing method of the substreams that can be spliced; the client uses the information in this track to splice the corresponding substreams in the agreed arrangement. The spliced image code stream (i.e., a standard code stream) is then decoded by a standard decoder, and the decoded image is rendered for display.
- as an example, the video image is divided into 8 sub-images (i.e., 8 tiles), the client-requested viewing angle covers tiles 1, 2, 5, and 6 (four tiles), and t0 to tN indicate different times.
- the metadata track transmitted in the above method requires additional bandwidth; moreover, when the number of sub-picture stream tracks is large, there are many track combinations that can be spliced into a standard stream, each requiring a differently constructed metadata track, which makes stream management more complex.
- An embodiment of the present application provides a method and an apparatus for processing media information, in which information about the quantity of tile data included in a substream, or indication information about whether the substream can be used for splicing, is added to the substream data. Whether a substream can be used for splicing is then determined from this information, which avoids the additional metadata track needed in the prior art for code stream splicing, saves transmission bandwidth, and reduces the complexity of stream management.
- a first aspect provides a method for processing media information, the method comprising: acquiring substream data, where the substream data includes indication information, the indication information being used to indicate the quantity of tile data included in the substream data or whether the substream data is available for splicing; and processing the substream data according to the indication information.
- the executive body of the method embodiments of the present application may be a device having video or image decoding functions, such as a wearable device (e.g., an AR/VR helmet or AR/VR glasses), a smart terminal (e.g., a mobile phone or tablet computer), a television, or a set-top box.
- a media data acquisition request may be sent first, and the media data is then received, where the substream data is included in the media data.
- the terminal may construct a uniform resource locator (URL) from the related attribute and address information in the media presentation description file, send an HTTP request to the URL, and then receive the corresponding media data.
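- As an illustration only, the following is a minimal Python sketch of this step; the base URL and the DASH-style segment template are hypothetical placeholders (the real template and attribute names come from the MPD file):

```python
# Minimal sketch (not part of the patent text): fetching segment data over
# HTTP from a URL built out of MPD attributes.
import urllib.request

def build_segment_url(base_url, representation_id, segment_number):
    # DASH deployments often use a SegmentTemplate such as
    # "$RepresentationID$_$Number$.m4s"; the actual template is taken
    # from the MPD file, so this format string is only an example.
    return f"{base_url}/{representation_id}_{segment_number}.m4s"

def fetch_media_data(url):
    # Send an HTTP GET request to the constructed URL and return the body,
    # which would contain the encapsulated substream data.
    with urllib.request.urlopen(url) as resp:
        return resp.read()

segment = fetch_media_data(build_segment_url("http://example.com/vr", "tile1", 42))
```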
- the media data may be obtained by pushing, and the media data includes substream data.
- the media data in the embodiments of the present application is mainly data obtained by encoding and encapsulating a video or an image; it may also be data obtained by encoding and encapsulating audio.
- the video is made up of a series of images.
- for related examples of media data, refer to the relevant provisions on media data in the ISO/IEC 23090-2 standard specification.
- the ISO/IEC 23090-2 standard specification is also known as the OMAF (omnidirectional media format) standard specification, which defines a media application format that enables the presentation of omnidirectional media in applications; omnidirectional media refers to omnidirectional video (360-degree video) and the associated audio.
- the OMAF specification first specifies a list of projection methods that can be used to convert spherical video into two-dimensional video; it then specifies how to use the ISO base media file format (ISOBMFF) to store omnidirectional media and the metadata associated with the media, and how to encapsulate omnidirectional media data and transmit it in a streaming media system, for example through dynamic adaptive streaming over HTTP (hypertext transfer protocol), i.e., DASH, as specified in the ISO/IEC 23009-1 standard.
- the image in the embodiments of the present application may be a complete image collected by a capture device (such as a camera), or may be an image obtained by dividing a complete image. For example, if the resolution of the image acquired by the capture device is 1024×1024, the image in the embodiments of the present application may be the 1024×1024 image, or a 512×512 image, a 1024×512 image, or a 512×1024 image, which is not specifically limited in the present application.
- the image data (such as a video code stream or an original code stream) in the embodiments of the present application is data encoded by video coding technology, for example image data obtained by encoding an image with ITU H.264 or ITU H.265, or data encoded by other standard or proprietary technologies.
- the indication information of the substream data may be encapsulated in supplemental enhancement information (SEI).
- the indication information of the subcode stream data may be encapsulated in a box in a track.
- the indication information of the substream data may be encapsulated in a media presentation description (MPD) file.
- the media presentation description file includes some metadata of the image. Metadata refers to some attribute information, such as duration, code rate, frame rate, position in the spherical coordinate system, and so on.
- the media presentation description file may refer to the relevant provisions and examples in ISO/IEC 23009-1.
- the indication information of the sub-code stream data may be carried in a sample entry type of a track.
- when the sample entry type is 'onti', it indicates that the substream in the current track includes one tile data, thereby indicating that the substream is available for splicing.
- the indication information includes at least one identifier, where the identifier is used to indicate the quantity of tile data included in the substream data.
- the indication information may be a one-bit flag whose value indicates whether the number of tile data is 1, or it may be a flag of two or more bits whose value directly indicates the quantity of tile data.
- the substream data further includes video parameter set (VPS) information, and processing the substream data according to the indication information includes: processing the substream data according to the indication information and the VPS information.
- the substream data further includes sequence parameter set (SPS) information, and processing the substream data according to the indication information includes: processing the substream data according to the indication information and the SPS information.
- the substream data further includes picture parameter set (PPS) information, and processing the substream data according to the indication information may include: processing the substream data according to the indication information and the PPS information.
- the substream data further includes slice segment (SS) information, and processing the substream data according to the indication information includes: processing the substream data according to the indication information and the SS information.
- the substream data further includes resolution information, and when substreams of different resolutions exist, the substreams may be spliced in a form in which one tile includes multiple slices; that is, multiple low-resolution substreams may be combined, each as a slice, into one tile data, so that one tile data in the spliced code stream may include multiple slices. For example, if there are substreams with the same content at two resolutions, 1024×1024 and 512×512 respectively, two 512×512 substreams can be combined into one tile data.
- when it is determined that a substream cannot be used for splicing, the substream may be decoded directly, and the decoded image may be stitched with the image decoded from the spliced code stream to obtain the final display image; compared with discarding substreams that cannot be used for splicing, this improves the utilization of the substreams.
- in the embodiments of the present application, information about the number of tile data included in a substream, or indication information indicating whether the substream data is available for splicing, is added to each substream. According to this indication information, it can be determined whether each substream is available for splicing, and the substreams that are available are spliced together. Because the required code stream is spliced before decoding, only a single decoder is needed to decode multiple sub-picture sequences, and no additional track information needs to be transmitted, which saves bandwidth and simplifies stream management.
- a second aspect provides a method for processing media information, the method comprising: acquiring substream data of an original image; and determining indication information of the substream data according to the number of tile data included in the substream data, The indication information is used to indicate the number of tile data included in the substream data, or the indication information is used to indicate whether the substream data is available for splicing; the substream data is sent to the terminal, and the substream data includes indication information.
- the indication information of the sub-code stream may be encapsulated in supplementary enhancement information (SEI).
- the indication information of the sub-code stream may be encapsulated in a box of a track.
- the indication information of the sub-code stream may be encapsulated in a sample entry type of a track.
- the indication information of the sub-code stream may be encapsulated in a media presentation description (MPD) file.
- the substream data further includes video parameter set (VPS) information, and the substream data is processed according to the indication information and the VPS information.
- the substream data further includes sequence parameter set (SPS) information, and the substream data is processed according to the indication information and the SPS information.
- the substream data further includes picture parameter set (PPS) information, and the substream data is processed according to the indication information and the PPS information.
- the substream data further includes slice segment (SS) information, and the substream data is processed according to the indication information and the SS information.
- the substream data further includes resolution information. For example, when the two resolutions are 1024×1024 and 512×512 respectively, two 512×512 substreams can be combined, each as a slice, into one tile data.
- a third aspect provides an apparatus for processing media information, the apparatus including: an obtaining module, configured to acquire substream data, where the substream data includes indication information, the indication information being used to indicate the quantity of tile data included in the substream data or whether the substream data is available for splicing; and a processing module, configured to process the substream data according to the indication information.
- the indication information is carried in supplementary enhancement information (SEI).
- the indication information is carried in a box of a track.
- the indication information is carried in a sample entry type of a track.
- the indication information is carried in a media presentation description (MPD) file.
- the substream data further includes video parameter set (VPS) information, and the processing module is further configured to process the substream data according to the indication information and the VPS information.
- the substream data further includes sequence parameter set (SPS) information, and the processing module is further configured to process the substream data according to the indication information and the SPS information.
- the substream data further includes picture parameter set (PPS) information, and the processing module is further configured to process the substream data according to the indication information and the PPS information.
- the substream data further includes slice segment (SS) information, and the processing module is further configured to process the substream data according to the indication information and the SS information.
- the substream data further includes resolution information.
- a fourth aspect provides an apparatus for processing media information, the apparatus comprising: an obtaining module, configured to acquire substream data of an original image; a processing module, configured to determine the indication information of the substream data according to the quantity of tile data included in the substream data, where the indication information is used to indicate the number of tile data included in the substream data or whether the substream data is available for splicing; and a sending module, configured to send the substream data, which includes the indication information, to the terminal.
- the indication information is carried in supplementary enhancement information (SEI).
- the indication information is carried in a box of a track.
- the indication information is carried in a sample entry type of a track.
- the indication information is carried in a media presentation description (MPD) file.
- the substream data further includes video parameter set (VPS) information, and the processing module is further configured to process the substream data according to the indication information and the VPS information.
- the substream data further includes sequence parameter set (SPS) information, and the processing module is further configured to process the substream data according to the indication information and the SPS information.
- the substream data further includes picture parameter set (PPS) information, and the processing module is further configured to process the substream data according to the indication information and the PPS information.
- the substream data further includes slice segment (SS) information, and the processing module is further configured to process the substream data according to the indication information and the SS information.
- the substream data further includes resolution information.
- a fifth aspect provides an apparatus for processing media information, comprising one or more processors and a memory. The memory is coupled to the one or more processors and is configured to store computer program code comprising instructions; when the one or more processors execute the instructions, the apparatus performs the method of processing media information according to the first aspect or any possible implementation of the first aspect.
- a sixth aspect provides a processor configured to perform the method of processing media information according to the first aspect or any possible implementation of the first aspect, or the method of processing media information according to the second aspect or any possible implementation of the second aspect.
- a further aspect of the present application provides a computer-readable storage medium storing instructions that, when run on a device, cause the device to perform the method of processing media information according to the first aspect or any possible implementation of the first aspect.
- a still further aspect of the present application provides a computer-readable storage medium storing instructions that, when run on a device, cause the device to perform the method of processing media information according to the second aspect or any possible implementation of the second aspect.
- a further aspect of the present application provides a computer program product comprising instructions that, when run on a computer, cause the computer to perform the method of processing media information according to the first aspect or any possible implementation of the first aspect.
- a further aspect of the present application provides a computer program product comprising instructions that, when run on a computer, cause the computer to perform the method of processing media information according to the second aspect or any possible implementation of the second aspect.
- FIG. 1 is a schematic diagram of video codec transmission of a substream;
- FIG. 2 is a schematic structural diagram of a video codec and transmission system according to an embodiment of the present application
- FIG. 3 is a schematic flowchart of a method for processing media information according to an embodiment of the present disclosure
- FIG. 4 is a schematic structural diagram of a device for processing media information according to an embodiment of the present disclosure
- FIG. 5 is a schematic flowchart diagram of another method for processing media information according to an embodiment of the present disclosure.
- FIG. 6 is a schematic structural diagram of another apparatus for processing media information according to an embodiment of the present disclosure.
- FIG. 7 is a schematic diagram of a first code stream splicing provided by an embodiment of the present application.
- FIG. 8 is a schematic diagram of a layout of a slice in a code stream according to an embodiment of the present disclosure
- FIG. 9 is a schematic diagram of arrangement of slices in another code stream according to an embodiment of the present disclosure.
- FIG. 10 is a schematic diagram of a second code stream splicing provided by an embodiment of the present application.
- FIG. 11 is a schematic diagram of a third code stream splicing provided by an embodiment of the present application.
- FIG. 12 is a schematic diagram of a fourth code stream splicing provided by an embodiment of the present application.
- FIG. 13 is a schematic diagram of a fifth code stream splicing provided by an embodiment of the present application.
- FIG. 14 is a schematic diagram of a sixth code stream splicing provided by an embodiment of the present application.
- FIG. 15 is a schematic diagram of a seventh code stream splicing provided by an embodiment of the present application.
- FIG. 16 is a schematic diagram of an eighth code stream splicing provided by an embodiment of the present application.
- FIG. 17 is a schematic structural diagram of a computer device according to an embodiment of the present application.
- Video decoding refers to the process of restoring a code stream into a reconstructed image according to specific grammar rules and processing methods.
- Video encoding refers to the process of compressing a sequence of images into a stream of code.
- Video coding: a general term covering both video encoding and video decoding; in Chinese, the same term is used for video coding and video encoding.
- Panoramic video: also called virtual reality (VR) panoramic video, 360-degree panoramic video, or 360-degree video; a video shot with multiple cameras covering 360 degrees, in which the user can adjust the viewing direction up, down, left, and right while watching.
- Tile refers to the block-shaped coding area obtained by dividing the image to be encoded in the video coding standard HEVC.
- One frame of image can be divided into multiple tiles, and these tiles together constitute the frame image.
- Each tile can be encoded independently.
- Sub-picture: a part of the original image obtained by dividing the image, called a sub-image of the image.
- the shape of the sub-image is square.
- the sub image may be a partial image in one frame of the image.
- Motion-constrained tile sets (MCTS): an encoding technique for tiles that restricts motion vectors during encoding so that tiles at the same position across the image sequence do not reference image pixels outside their tile region in the time domain, and each tile can therefore be decoded independently in the time domain.
- Image sub-area For convenience of the description of the present application, an image sub-area is used as a general term for a tile or a sub-image. It can be understood that the sub-images in the present application may also include images divided according to tile coding.
- Track: defined in the ISO/IEC 14496-12 standard as a "timed sequence of related samples (q.v.) in an ISO base media file. NOTE: For media data, a track corresponds to a sequence of images or sampled audio; for hint tracks, a track corresponds to a streaming channel." In other words, a track is a series of samples with time attributes encapsulated according to the ISOBMFF specification; for example, in a video track, a video sample is generated by encapsulating the code stream produced by the video encoder for each frame according to the ISOBMFF specification.
- Box: defined in the ISO/IEC 14496-12 standard as an "object-oriented building block defined by a unique type identifier and length. NOTE: Called 'atom' in some specifications, including the first definition of MP4." The box is the basic unit that makes up an ISOBMFF file, and a box can contain other boxes.
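- For illustration, a minimal Python sketch of walking the top-level boxes of an ISOBMFF file follows; the size/type header layout is as specified in ISO/IEC 14496-12, while the file name is a placeholder:

```python
# Minimal sketch: walking the top-level boxes of an ISOBMFF file.
# Each box starts with a 4-byte big-endian size and a 4-byte type
# (size == 1 means a 64-bit largesize follows; size == 0 means the box
# extends to the end of the file), per ISO/IEC 14496-12.
import struct

def iter_boxes(data, offset=0, end=None):
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:  # 64-bit largesize follows the type field
            size, = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:  # box extends to end of file
            size = end - offset
        yield box_type.decode("ascii"), offset + header, offset + size
        offset += size

with open("media.mp4", "rb") as f:  # hypothetical file name
    data = f.read()
for box_type, body_start, box_end in iter_boxes(data):
    print(box_type, box_end - body_start)
```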
- Supplemental enhancement information (SEI): a type of network abstraction layer unit (NALU) defined in the video coding standards (H.264, H.265).
- Media presentation description (MPD): a document specified in the ISO/IEC 23009-1 standard, containing metadata from which the client constructs HTTP URLs. The MPD includes one or more period elements; each period element includes one or more adaptation sets; each adaptation set includes one or more representations; and each representation includes one or more segments. The client selects a representation based on the information in the MPD and builds the HTTP URL of a segment.
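- For illustration, a minimal Python sketch of walking this MPD hierarchy follows; the selection criterion (highest bandwidth) is only an example, not something mandated by the standard:

```python
# Minimal sketch: walking the MPD hierarchy (period -> adaptation set ->
# representation) with the standard library.
import xml.etree.ElementTree as ET

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

def pick_representation(mpd_text):
    root = ET.fromstring(mpd_text)
    for period in root.findall("mpd:Period", NS):
        for aset in period.findall("mpd:AdaptationSet", NS):
            reps = aset.findall("mpd:Representation", NS)
            # Example policy: choose the representation with the highest
            # bandwidth attribute in the first adaptation set.
            best = max(reps, key=lambda r: int(r.get("bandwidth", "0")))
            return best.get("id"), best.get("bandwidth")
```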
- a block diagram of substream-based video codec transmission is shown in FIG. 1.
- a video or image is obtained by the video capture device.
- the video capture device may be a device such as a camera that captures video or images, or a receiving device that receives video or image data from another device.
- the pre-encoding processor performs some processing before the video or image is encoded; this may include dividing the video or image into sub-regions. It should be understood that the pre-encoding processor may also be part of the video encoder, or the above functions of the pre-encoding processor may be performed by the video encoder.
- the video encoder is used to encode a video or an image according to a certain encoding rule.
- the encoding method specified in H.264, H.265 may be used, or the video or image may be encoded according to other private encoding techniques.
- the code stream encapsulating device may perform code stream encapsulation on the code stream according to a certain encapsulation format, for example, may be an encapsulation format of an MPEG-2 TS stream or other encapsulation manner.
- the encapsulated code stream is then transmitted by the transmission device to the terminal.
- the receiving device is configured to receive the code stream from the server side, and then decapsulate the code stream by the code stream decapsulation device, and then obtain a plurality of sub-code streams and send them to the video decoder.
- the subcode stream is decoded by the video decoder to generate a decoded video or image, and finally displayed by the display device.
- the server and the terminal shown in FIG. 2 are a representation of the sender and the receiver of the code stream.
- the server may be a device such as a smartphone or a tablet, and the terminal may likewise be a device such as a smartphone or a tablet, which is not specifically limited in this embodiment.
- the sub-code stream of the embodiment of the present application is relative to the spliced code stream.
- the acquired subcode stream may be a separately transmitted code stream.
- the sub-code stream in the embodiment of the present application may also be referred to as sub-code stream data, and the sub-code stream and the sub-code stream data may be replaced with each other.
- an embodiment of the present application provides a method for processing media information S30, where the method S30 includes:
- S301 The terminal acquires the sub-code stream data, where the sub-code stream data includes indication information, where the indication information is used to indicate the quantity of the tile data included in the sub-code stream data.
- S302 The terminal processes the subcode stream data according to the indication information.
- an embodiment of an aspect of the present application provides a media information processing apparatus 40.
- the apparatus 40 includes an obtaining module 401 and a processing module 402.
- the obtaining module 401 is configured to acquire sub-stream data, where the sub-code stream data includes indication information, where the indication information is used to indicate the quantity of tile data included in the sub-code stream data, and the processing module 402 is configured to The indication information processes the subcode stream data.
- an embodiment of an aspect of the present application provides another method for processing media information S50, where the method S50 includes:
- S501 The server acquires substream data of the original image.
- S502 The server determines indication information of the sub-code stream data according to the quantity of tile data included in the sub-code stream data.
- S503 The server sends the sub-code stream data to a terminal, where the sub-code stream data includes the indication information.
- an embodiment of an aspect of the present application provides a media information processing apparatus S60, where the apparatus S60 includes: an obtaining module 601, a processing module 602, and a sending module 603.
- the obtaining module 601 is configured to acquire the sub-code stream data of the original image
- the processing module 602 is configured to determine the indication information of the sub-code stream data according to the quantity of the tile data included in the sub-code stream data
- the sending module 603 And the method is configured to send the subcode stream data to a terminal, where the subcode stream data includes the indication information.
- the embodiment of the present application provides a method based on sub-stream stream splicing processing, and a corresponding encoding transmission and decoding presentation manner.
- the whole system processing procedure of the embodiment of the present application is shown in FIG. 7, and the implementation steps thereof are described in detail as follows:
- the input video image (which may be referred to as the original image) is divided into regions (each region may be referred to as a tile) and encoded in the form of MCTS to generate a standard-compliant video stream.
- MCTS encoding limits the motion vectors of inter-frame prediction in the time domain and the spatial domain so that predicted pixels do not exceed the image boundary.
- the settings when the server uses MCTS encoding may include turning off cross-tile deblocking filtering and SAO filtering to ensure that the tile data are decoded independently of one another. After that, the server splits the video code stream to obtain multiple substreams.
- the sub-code stream may include encoded data of one tile, and may also include encoded data of multiple tiles.
- the coded data of the tile is the coded data obtained by encoding the tile area in the original image. In the embodiment of the present application, the coded data of the tile may also be referred to as tile data.
- the server splitting the video code stream may include: the server detects the NALU start codes in the video code stream and divides it into different NALUs, where each NALU includes one or more tile data; the server determines the coding parameter set corresponding to the one or more tile data included in the different NALUs (i.e., determines the coding parameter set of each substream), copies the tile data of each NALU included in the video code stream, and adds NALU start codes to the copied tile data and the coding parameter set to form a substream.
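- For illustration, a minimal Python sketch of the start-code scan follows; it assumes an Annex-B byte stream with 0x000001/0x00000001 start codes and reads the HEVC nal_unit_type from the first NALU header byte:

```python
# Minimal sketch: locating NALU start codes in an Annex-B code stream.
# NALUs are separated by 0x000001 or 0x00000001 start codes; stripping
# trailing zero bytes here is a simplification that absorbs the leading
# zero of a 4-byte start code.
def split_nalus(stream: bytes):
    nalus, i, start = [], 0, None
    while i + 3 <= len(stream):
        if stream[i:i + 3] == b"\x00\x00\x01":
            if start is not None:
                nalus.append(stream[start:i].rstrip(b"\x00"))
            i += 3
            start = i
        else:
            i += 1
    if start is not None:
        nalus.append(stream[start:])
    return nalus

def nalu_type(nalu: bytes) -> int:
    # HEVC: nal_unit_type occupies bits 1..6 of the first NALU header byte.
    return (nalu[0] >> 1) & 0x3F
```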
- the type in each NALU header may be used to indicate the coding parameter set information of the video code stream and the tile data included in the NALU.
- the coding parameter set may include video parameter set (VPS) information, sequence parameter set (SPS) information, picture parameter set (PPS) information, slice segment (SS) information, etc.
- the server may reuse the coding parameter set of the video code stream and modify only some of its parameters, as described below.
- the server may modify the encoding specification level set in the VPS information, for example, changing the level from 5.1 to 4.1.
- the server can modify the parameters in the SPS information that indicate image width and height, namely pic_width_in_luma_samples and pic_height_in_luma_samples. For example, when the number of tile data included in the substream is 1 (i.e., the sub-image corresponding to the substream includes one tile), pic_width_in_luma_samples and pic_height_in_luma_samples are 640 and 640, respectively; when the number of tile data included in the substream is 2 (i.e., the sub-image corresponding to the substream includes two tiles), if the two tiles are horizontally adjacent, pic_width_in_luma_samples and pic_height_in_luma_samples are 1280 and 640, respectively, and if the two tiles are vertically adjacent, pic_width_in_luma_samples and pic_height_in_luma_samples are 640 and 1280, respectively.
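- For illustration, a minimal Python sketch of this width/height selection follows, assuming the 640×640 tiles of the example above:

```python
# Minimal sketch: choosing pic_width_in_luma_samples /
# pic_height_in_luma_samples for a substream, assuming 640x640 tiles.
TILE_W = TILE_H = 640

def sps_dimensions(num_tiles: int, horizontally_adjacent: bool = True):
    if num_tiles == 1:
        return TILE_W, TILE_H              # 640 x 640
    if num_tiles == 2 and horizontally_adjacent:
        return 2 * TILE_W, TILE_H          # 1280 x 640
    if num_tiles == 2:
        return TILE_W, 2 * TILE_H          # 640 x 1280
    raise ValueError("layout not covered by this sketch")
```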
- the server may set parameters related to tile partitioning in the PPS information, such as tiles_enabled_flag (tile enable flag), num_tile_columns_minus1 (tile column count identifier), num_tile_rows_minus1 (tile row count identifier), and uniform_spacing_flag (uniform spacing identifier), according to the number of tile data included in the substream (i.e., the number of tiles included in the sub-image corresponding to the substream). For example, when the sub-image corresponding to the substream includes one tile, tiles_enabled_flag is set to 0; when the sub-image corresponding to the substream includes two adjacent tiles of the same resolution, tiles_enabled_flag is set to 1, num_tile_columns_minus1 is set to 1, num_tile_rows_minus1 is set to 0, and uniform_spacing_flag is set to 1.
- the server can treat each tile data included in the substream as one slice, and set parameters in the SS header information such as first_slice_segment_in_pic_flag (first slice segment identifier) and slice_segment_address (slice segment address). When the substream contains one tile data, first_slice_segment_in_pic_flag may be set to 1 and slice_segment_address set to 0. When the substream contains two tile data, the first tile of each frame is used as the first slice, with first_slice_segment_in_pic_flag set to 1 and slice_segment_address set to 0, and the second tile of each frame is used as the second slice, with first_slice_segment_in_pic_flag set to 0 and slice_segment_address set to 10 (as shown in FIG. 8, the image is divided into CTUs (64×64) and scanned in full-picture raster order; the index of the first CTU of the current slice is the slice_segment_address value of that slice).
- the server may alternatively construct two or more tile data as one slice; the embodiments of the present application use one tile data per slice only as an example of the slice division manner, and the embodiments of the present application are not limited thereto.
- the server may write the quantity information of the tile data included in the subcode stream into the SEI message of the code stream.
- the quantity information of the tile data may be used to indicate whether the substream is available for splicing: when the number of tile data is 1, it indicates that the substream can be used for splicing; when the number of tile data is greater than 1, it indicates that the substream cannot be used for splicing.
- the SEI message can be represented by the following syntax elements:
- sub_picture_info_aggregate(payloadSize)
- sub_pic_str_only_one_tile: a value of 1 indicates that the substream includes no more than one tile data (i.e., the sub-image corresponding to the substream includes no more than one tile); a value of 0 indicates that the substream includes two or more tile data (i.e., the sub-image corresponding to the substream includes two or more tiles).
- tile_num_subpicture: the value indicates the number of tile data contained in the substream.
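- For illustration, a minimal Python sketch of interpreting this SEI payload follows; since the exact bit layout of the payload is not reproduced here, the sketch assumes one byte carrying sub_pic_str_only_one_tile followed, when that flag is 0, by one byte carrying tile_num_subpicture:

```python
# Minimal sketch under the assumptions stated above: deriving the
# splicing decision from the sub_picture_info_aggregate SEI payload.
def parse_sub_picture_info(payload: bytes):
    only_one_tile = payload[0] & 0x01          # assumed position of the flag
    tile_count = 1 if only_one_tile else payload[1]
    return {
        "sub_pic_str_only_one_tile": only_one_tile,
        "tile_num_subpicture": tile_count,
        # A substream is available for splicing when it holds one tile data.
        "available_for_splicing": only_one_tile == 1,
    }
```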
- a specific method of obtaining multiple substreams is as follows, but is not limited to this method: dividing the video stream into NALUs and parsing the coding parameter set of the video stream (including the VPS information, SPS information, PPS information, and SS information); and generating the coding parameter sets of the multiple substreams according to the tile data included in each NALU and the coding parameter set of the video stream;
- Each substream can then be encapsulated and stored on the server.
- the terminal requests the required code stream from the server and decapsulates the received code stream.
- the SEI message sub_picture_info_aggregate (payloadSize) of each sub-code stream is parsed, and according to the values of the syntax elements in Table 1.1 and Table 1.2 above, it is judged whether each sub-code stream is available for code stream splicing.
- when sub_pic_str_only_one_tile is 1, it indicates that the substream can be used for splicing; when sub_pic_str_only_one_tile is not 1, it indicates that the substream cannot be used for splicing, and the substream is then discarded and does not participate in subsequent processing.
- one of the substreams may be copied and filled, and the copied substream may not participate in decoding and rendering.
- in FIG. 7, sub0 to sub15 are 16 tiles of a video image; str0 is the substream obtained by encoding the four tiles sub0, sub1, sub4, and sub5; and str1 to str16 are the substreams obtained by encoding sub0 to sub15 respectively (i.e., each substream contains one tile).
- the only_one_tile_flag corresponding to substream str0 is equal to 0, and the only_one_tile_flag corresponding to substreams str1, str2, str3, str5, str6, and str7 is equal to 1, indicating that str0 cannot be used for splicing (indicated by × in FIG. 7) while str1, str2, str3, str5, str6, and str7 can be used for splicing (indicated by √ in FIG. 7).
- the terminal detects the NALU start code in the sub-code streams str1, str2, str3, str5, str6, and str7, and divides different NALUs; the coding parameter set and the encoded data of each sub-code stream can be determined by the type in the NALU header.
- the terminal may select one of the substreams str1, str2, str3, str5, str6, and str7 as the reference for constructing the coding parameter set; for example, it selects str1 as the construction reference and determines the coding parameter set of the new code stream (i.e., the standard code stream spliced from the substreams) according to the coding parameter set information of str1. The encoded data of the corresponding tiles is then copied from the substreams str1, str2, str3, str5, str6, and str7 as the encoded data at the corresponding tile positions of the new code stream, NALU start codes are added to the coding parameter set of the new code stream and its encoded data, and the result is spliced into a standard code stream in a certain order.
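- For illustration, a minimal Python sketch of this splicing flow follows; the inputs (rewritten parameter-set NALUs, per-substream slice NALUs ordered by tile position, and a slice-header rewriting callback) are hypothetical abstractions of the steps described above:

```python
# Minimal sketch of the splicing flow: emit the rewritten parameter sets
# once, then, frame by frame, emit each substream's slice NALU with its
# header patched for its tile position in the new picture.
ANNEXB = b"\x00\x00\x01"

def splice(parameter_set_nalus, per_stream_slices, rewrite_slice_header):
    """parameter_set_nalus: rewritten VPS/SPS/PPS NALUs for the new stream.
    per_stream_slices: one list of per-frame slice NALUs per substream,
    ordered by the tile position of each substream in the spliced picture.
    rewrite_slice_header: hypothetical callable that patches
    first_slice_segment_in_pic_flag and slice_segment_address for a
    given tile position."""
    out = bytearray()
    for nalu in parameter_set_nalus:
        out += ANNEXB + nalu
    for frame_slices in zip(*per_stream_slices):   # iterate frame by frame
        for pos, slice_nalu in enumerate(frame_slices):
            out += ANNEXB + rewrite_slice_header(slice_nalu, pos)
    return bytes(out)
```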
- when the terminal determines the coding parameter set of the new code stream (including, for example, the VPS information, SPS information, and PPS information), it can reuse the coding parameter set of str1 and modify only some of its parameters, as described below.
- the terminal may modify the coding level set in the VPS information, for example, changing the level from 4.1 to 5.1.
- the terminal can modify the parameters in the SPS information that indicate image width and height, namely pic_width_in_luma_samples and pic_height_in_luma_samples. For example, the image requested by the terminal includes 3×2 tiles (i.e., sub0 to sub2 and sub4 to sub6), so pic_width_in_luma_samples and pic_height_in_luma_samples are set to 1920 (i.e., 640×3) and 1280 (i.e., 640×2), respectively.
- the terminal may set the parameters related to tile partitioning in the PPS information, such as tiles_enabled_flag (tile enable flag), num_tile_columns_minus1 (tile column count identifier), num_tile_rows_minus1 (tile row count identifier), and uniform_spacing_flag (uniform spacing identifier), according to the number of tiles included in the spliced image. For example, in FIG. 7, the terminal may set tiles_enabled_flag to 1, num_tile_columns_minus1 to 2 (the image includes three tile columns, counting from 0), num_tile_rows_minus1 to 1 (the image includes two tile rows, counting from 0), and uniform_spacing_flag to 1 (i.e., the tiles are evenly distributed).
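- For illustration, a minimal Python sketch deriving these PPS tile parameters from the requested tile grid follows (3 columns × 2 rows in the FIG. 7 example):

```python
# Minimal sketch: deriving the PPS tile parameters named above from the
# requested tile grid, assuming evenly spaced tiles.
def pps_tile_params(cols: int, rows: int):
    return {
        "tiles_enabled_flag": 1 if cols * rows > 1 else 0,
        "num_tile_columns_minus1": cols - 1,   # 2 for three columns
        "num_tile_rows_minus1": rows - 1,      # 1 for two rows
        "uniform_spacing_flag": 1,             # tiles evenly distributed
    }

assert pps_tile_params(3, 2) == {"tiles_enabled_flag": 1,
                                 "num_tile_columns_minus1": 2,
                                 "num_tile_rows_minus1": 1,
                                 "uniform_spacing_flag": 1}
```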
- the terminal treats each tile data included in each substream as one slice, and sets parameters in the SS header information such as first_slice_segment_in_pic_flag (first slice segment identifier) and slice_segment_address (slice segment address). For example, for the arrangement in FIG. 7, first_slice_segment_in_pic_flag is set to 1 in the SS header information of str1 in each frame, and the remaining first_slice_segment_in_pic_flag values are set to 0; the slice_segment_address values corresponding to the encoded data of str1, str2, str3, str5, str6, and str7 in each frame can be determined by calculation and are set to 0, 10, 20, 300, 310, and 320, respectively (as shown in FIG. 9, the image is divided into CTUs (64×64) and scanned in full-picture raster order; the index of the first CTU of the current slice is the slice_segment_address value of that slice).
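- For illustration, the following minimal Python sketch reproduces the quoted slice_segment_address values for a 1920×1280 picture built from 640×640 tiles with 64×64 CTUs, using the full-picture raster scan described above:

```python
# Minimal sketch: slice_segment_address as the raster-scan index of the
# first CTU of each tile in the spliced picture.
CTU = 64

def slice_segment_address(pic_w, tile_w, tile_h, tile_col, tile_row):
    ctus_per_row = pic_w // CTU                   # 1920 / 64 = 30
    first_ctu_row = tile_row * (tile_h // CTU)    # 10 CTU rows per tile row
    first_ctu_col = tile_col * (tile_w // CTU)    # 10 CTU cols per tile col
    return first_ctu_row * ctus_per_row + first_ctu_col

addrs = [slice_segment_address(1920, 640, 640, c, r)
         for r in range(2) for c in range(3)]
assert addrs == [0, 10, 20, 300, 310, 320]
```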
- for example, the first substream is A and the second substream is B; substream A includes 4 frames of data (A0 to A3) and is encapsulated in track1, and substream B also includes 4 frames of data (B0 to B3) and is encapsulated in track2. The code stream structure after substream A and substream B are spliced can be as shown as code stream C in FIG. 10, where SSH represents the header information in the SS parameters.
- a specific method of splicing a plurality of substreams is as follows, but is not limited to the method:
- the sub_picture_info_aggregate SEI message of each substream is parsed, the number of tile data included in each substream is obtained, and the substreams whose number of tile data is 1 are determined to be substreams that can be used for splicing;
- the terminal decodes the spliced code stream and renders the presentation on the display device.
- the server may also add, to each substream, indication information of whether the substream can be used for splicing; correspondingly, after acquiring the substream, the terminal may directly determine from the indication information whether the substream can be used for splicing.
- whether the indication information indicates whether the substream is available for splicing or indicates the number of tile data included in the substream, in the specific implementation processes on the server side and the terminal side only the content of the indication information that is added and acquired differs, and the other corresponding operations are consistent; details are not described herein again.
- in this embodiment, the original image is tile-encoded to obtain a video code stream, multiple substreams are obtained from the video code stream, and information about the number of tile data included in the substream is added to each substream. This information enables the terminal to determine the substreams available for splicing and to splice multiple such substreams together. Because the required code stream is spliced before decoding, only a single decoder is needed to decode multiple sub-picture sequences without transmitting additional track information, which saves bandwidth and simplifies system-layer stream management.
- the embodiment of the present application provides another method based on sub-stream concatenation processing, and corresponding encoding transmission and decoding presentation manners.
- the entire system processing procedure of the embodiment of the present application is as shown in FIG. 11, and the implementation steps thereof are described in detail as follows:
- the input video image is divided into sub-images, and each sub-image is encoded to generate a plurality of substreams.
- the divided sub-images may have the same width and height (here meaning that the widths and heights of different sub-images after division are equal to one another, not that the width of a sub-image equals its height; the width and height of a single sub-image may be the same or different); that is, the original image is divided to obtain a plurality of square sub-images.
- during encoding, a prediction motion vector restriction may be applied to each sub-image so that the predicted pixels of the sub-image do not exceed one or some of its four boundaries.
- the quantity information of the tile data included in the sub-code stream can be written into the SEI message of the sub-code stream in the manner of the above embodiment.
- the syntax elements are the same as those in the foregoing embodiment. For details, refer to the description of the syntax elements in the foregoing embodiments, and the details are not described herein again.
- a sub_pic_str_only_one_tile value of 1 indicates that the substream includes no more than one tile data (i.e., the sub-image corresponding to the substream includes no more than one tile), and a value of 0 indicates that the substream includes two or more tile data (i.e., the sub-image corresponding to the substream includes two or more tiles).
- in FIG. 11, sub0 to sub15 are 16 tiles of a video image; str0 is the substream obtained by encoding a sub-image including the four tiles sub0, sub1, sub4, and sub5; and str1 to str16 are the substreams obtained by encoding the sub-images sub0 to sub15 respectively (that is, each sub-image contains one tile).
- the only_one_tile_flag corresponding to str0 is equal to 0, and the only_one_tile_flag corresponding to str1 to str16 is equal to 1; in FIG. 11, × indicates that a substream cannot be used for splicing and √ indicates that it can be used for splicing.
- Each substream is then encapsulated and stored on the server.
- in the foregoing embodiment, the server side performs tile encoding on the original image with only one encoder, obtaining a single video code stream from which the substreams then need to be split; in this embodiment, the server side divides the original image into sub-pictures, each sub-picture is encoded by a separate encoder, and the encoding directly produces multiple substreams.
- the manner in which the terminal side requests the code stream from the server, decapsulates the code stream, and splices the multiple substreams is consistent with the process in the foregoing embodiment.
- in this embodiment, the original image is divided into sub-images and each sub-image is encoded to obtain multiple substreams, and information about the number of tile data included in the substream is added to each substream, so that the terminal can determine the substreams that can be used for splicing and splice them together. Because the required code stream is spliced before decoding, only a single decoder is needed to decode multiple sub-picture sequences, and no additional track information needs to be transmitted, which saves bandwidth and simplifies stream management.
- In a possible implementation of this application, when sub-bitstreams of different resolutions exist on the server, the sub-bitstreams may also be spliced in a form in which one tile contains multiple slices. The server-side implementation process is consistent with that of the foregoing embodiment; when the terminal side obtains the sub-bitstreams available for splicing and performs the splicing, multiple low-resolution sub-bitstreams can be combined, as slices, into one tile data unit; that is, one tile data unit in the spliced bitstream may include multiple slices. For example, if the server holds sub-bitstreams of two resolutions with the same content, 1024×1024 and 512×512, the terminal can take two 512×512 sub-bitstreams as slices and combine them into one tile data unit.
- For example, as shown in FIG. 12, the 1024×1024 sub-bitstreams include b1 to b8 and the 512×512 sub-bitstreams include m1 to m8. Taking sub-bitstreams b2, m3, and m4 as an example, the process by which the terminal splices the sub-bitstreams is described in detail below.
- The terminal detects the NALU start codes in sub-bitstreams b2, m3, and m4 and separates out the different NALUs; the coding parameter set and coded data of each sub-bitstream can be determined from the type in the NALU header. The terminal may select one of b2, m3, and m4 as the construction baseline for the coding parameter set, for example b2, and determine the coding parameter set of the new bitstream (that is, the standard bitstream spliced from the sub-bitstreams) according to b2's coding parameter set information. Then the coded data of the corresponding tiles is copied from b2, m3, and m4 as the coded data of the corresponding tile positions of the new bitstream, NALU start codes are added for the new bitstream's coding parameter set and coded data, and everything is spliced into one standard bitstream in a certain order.
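The NALU scan and reassembly just described can be pictured with the sketch below. It relies only on the standard Annex-B start codes (0x000001 / 0x00000001); how NAL units are classified into parameter sets and coded tile data is left to the caller, and trailing zero bytes inside a NAL unit are not handled, so this is a simplified illustration rather than a complete splicer.

```python
def split_nalus(stream: bytes) -> list[bytes]:
    """Split an Annex-B byte stream into NAL units, start codes removed."""
    units, i, start = [], 0, None
    while i + 3 <= len(stream):
        if stream[i:i + 3] == b"\x00\x00\x01":
            if start is not None:
                # account for a 4-byte start code (extra leading zero byte)
                end = i - 1 if stream[i - 1:i] == b"\x00" else i
                units.append(stream[start:end])
            start = i + 3
            i = start
        else:
            i += 1
    if start is not None:
        units.append(stream[start:])
    return units

def reassemble(param_set_nalus, tile_nalus) -> bytes:
    """Emit parameter sets, then tile data, behind fresh 4-byte start codes."""
    out = bytearray()
    for nalu in list(param_set_nalus) + list(tile_nalus):
        out += b"\x00\x00\x00\x01" + nalu
    return bytes(out)

demo = b"\x00\x00\x00\x01" + b"\x42PS" + b"\x00\x00\x01" + b"\x65TILE"
print(split_nalus(demo))  # [b'BPS', b'eTILE']
```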
- When determining the coding parameter set of the new bitstream, the terminal may carry over b2's coding parameter set and modify only some of its parameters, as described below.
- For the VPS information: optionally, because the resolution of the new bitstream increases, the terminal may modify the coding level set in the VPS information, for example changing the level from 4.1 to 5.1.
- For the SPS information: the terminal can modify the parameters in the SPS that indicate picture width and height, namely pic_width_in_luma_samples and pic_height_in_luma_samples. Given the width and height of the decoded image in FIG. 12, pic_width_in_luma_samples and pic_height_in_luma_samples can be set to 1536 (that is, 1024+512) and 1024, respectively.
- For the PPS information: according to the number of tiles in the decoded image, the terminal may set the tile-partitioning parameters in the PPS, such as tiles_enabled_flag (tile enable flag), num_tile_columns_minus1 (tile column count identifier), num_tile_rows_minus1 (tile row count identifier), uniform_spacing_flag (uniform allocation flag), and column_width_minus1[i] (column width identifier, with i from 0 to num_tile_columns_minus1 - 1; when num_tile_columns_minus1 is 0, column_width_minus1[i] is absent). For the example in FIG. 12, the terminal may set tiles_enabled_flag to 1, num_tile_columns_minus1 to 1, num_tile_rows_minus1 to 0, uniform_spacing_flag to 0 (that is, the tiles are not uniformly allocated), and column_width_minus1[0] to 16.
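As a rough illustration, the sketch below recomputes the SPS and PPS values quoted above for the FIG. 12 layout (a 1024×1024 tile column next to a 512-wide column holding two 512×512 regions). The 64×64 CTB size, the helper name, and the plain dict standing in for real parameter-set syntax are assumptions made for the example.

```python
CTB = 64  # assumed CTB size

def composed_parameter_values(left_w=1024, left_h=1024, right_w=512):
    return {
        "pic_width_in_luma_samples": left_w + right_w,  # 1536 = 1024 + 512
        "pic_height_in_luma_samples": left_h,           # 1024
        "tiles_enabled_flag": 1,
        "num_tile_columns_minus1": 1,                   # two tile columns
        "num_tile_rows_minus1": 0,                      # one tile row
        "uniform_spacing_flag": 0,                      # unequal columns
        # first column is 16 CTBs wide (the value the text above assigns
        # to column_width_minus1[0]); the last column's width is inferred
        "tile_column_widths_in_ctbs": [left_w // CTB, right_w // CTB],
    }

print(composed_parameter_values())
```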
- For the SS information: the terminal can combine multiple low-resolution sub-bitstreams, as slices, into one tile data unit and splice it with the high-resolution tile data, with each tile data unit constructed by slice partitioning. The terminal sets the SS header parameters such as first_slice_segment_in_pic_flag (first slice segment flag) and slice_segment_address (slice segment address). For the arrangement in FIG. 12, first_slice_segment_in_pic_flag is set to 1 in the SS header of b2 of each frame and to 0 for the rest; the slice_segment_address values corresponding to the coded data of b2, m3, and m4 in the SS headers of each frame can be determined by calculation, and in FIG. 12 they are set to 0, 16, and 208, respectively.
- For example, as shown in FIG. 13, assume the 1024×1024 sub-bitstream is b2 and the two 512×512 sub-bitstreams are m3 and m4; sub-bitstream b2 includes 4 frames of data (A0 to A3) encapsulated in track b2, and sub-bitstreams m3 and m4 also include 4 frames of data (B0 to B3) encapsulated in track m3 and track m4. The structure of the bitstream spliced from b2, m3, and m4 can then be as shown as bitstream C in FIG. 13, where SSH denotes the header information in the SS parameters.
- In this embodiment of this application, by adding to a sub-bitstream the information on the number of tile data it contains, the terminal can determine the sub-bitstreams available for splicing; and when the sub-bitstreams available for splicing include multiple different resolutions, multiple low-resolution sub-bitstreams can be combined, as slices, into one tile data unit and spliced with the high-resolution tile data, so that multiple sub-bitstreams of different resolutions can be spliced together.
- Optionally, the terminal side may also request the server side to download all the bitstreams, select the required bitstreams on the terminal side according to user behavior, decapsulate them, and then perform sub-bitstream splicing according to the bitstream splicing method provided in the foregoing embodiments to obtain a standard bitstream.
- In another possible implementation of this application, this embodiment extends the foregoing embodiments, in which the information on the number of tile data included in a sub-bitstream is added to the SEI message, by making some changes to the handling of sub-bitstreams that cannot be used for splicing.
- Terminal side: according to user behavior, the terminal requests the required bitstreams from the server and decapsulates the received bitstreams.
- The SEI message sub_picture_info_aggregate(payloadSize) of each sub-bitstream is parsed, and whether each sub-bitstream can be used for bitstream splicing is judged according to the values of the syntax elements in Tables 1.1 and 1.2 above. When sub_pic_str_only_one_tile is 1, the sub-bitstream can be used for splicing; when sub_pic_str_only_one_tile is not 1, the sub-bitstream cannot be used for splicing, so that sub-bitstream is decoded directly, and the decoded image is stitched with the image decoded from the spliced bitstream to obtain the final displayed image.
- For example, as shown in FIG. 14, str0 is a sub-bitstream that cannot be used for splicing. The image the terminal obtains by decoding str0 includes sub0, sub1, sub4, and sub5; the image decoded from the spliced bitstream includes sub2 and sub6; and the display image obtained by stitching the two images includes sub0 to sub2 and sub4 to sub6.
- In this embodiment of this application, when it is determined that a sub-bitstream cannot be used for splicing, that sub-bitstream may be decoded directly, and the resulting image is stitched with the image decoded from the spliced bitstream to obtain the final display image; compared with discarding the sub-bitstreams that cannot be used for splicing, this improves the utilization of the sub-bitstreams.
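A minimal sketch of this fallback path follows, with NumPy arrays standing in for decoded pictures; the actual decoding step is omitted, and the side-by-side paste assuming equal heights is only one possible arrangement of the FIG. 14 regions.

```python
import numpy as np

def stitch(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Paste two decoded pictures of equal height side by side."""
    assert left.shape[0] == right.shape[0], "heights must match"
    return np.concatenate([left, right], axis=1)

str0_picture = np.zeros((1280, 1280, 3), np.uint8)    # sub0/sub1/sub4/sub5
spliced_picture = np.zeros((1280, 640, 3), np.uint8)  # sub2/sub6 column
print(stitch(str0_picture, spliced_picture).shape)    # (1280, 1920, 3)
```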
- In an optional manner of the embodiments of this application, this embodiment gives the syntax elements and semantics used in the OMAF file format to indicate the number of tile data included in a sub-bitstream. Specifically, the changes include the server-side generation of the information indicating the number of tile data included in a sub-bitstream, and the terminal-side parsing of that information.
- Server side: each sub-bitstream is encapsulated, and each sub-bitstream can be independently encapsulated in a track, such as a sub-picture track. Syntax description information on the number of tile data included in the sub-bitstream may be added to the sub-picture track, for example in the spco box, with semantics as follows:
- only_one_tile_presence_flag: a value of 1 indicates that the bitstream of the current track can participate in splicing; a value of 0 indicates that the bitstream of the current track cannot participate in splicing.
- Alternatively, another representation may be added in the spco box: tile_num_subpicture, whose value indicates the number of tiles in the track's bitstream.
- Terminal side: according to user behavior, the terminal requests the required bitstreams from the server and decapsulates them. During decapsulation, the SubPictureCompositionBox in the spco box is parsed to obtain the syntax information on the number of tile data included in the sub-bitstream, thereby obtaining the information on whether each sub-bitstream can be used for splicing, and the tracks whose only_one_tile_presence_flag equals 1 are selected for subsequent processing.
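As a sketch of the terminal-side parse, the following walks ISOBMFF boxes looking for 'spco' and reads the proposed flag. The flat (non-recursive) box walk, the absence of 64-bit box sizes, and the flag sitting in the first payload byte are all assumptions made for brevity; the text above does not fix an exact binary layout for only_one_tile_presence_flag.

```python
import struct

def iter_boxes(buf: bytes, offset: int = 0, end: int | None = None):
    """Yield (type, payload) for each top-level ISOBMFF box in buf."""
    end = len(buf) if end is None else end
    while offset + 8 <= end:
        size, = struct.unpack(">I", buf[offset:offset + 4])
        box_type = buf[offset + 4:offset + 8].decode("ascii")
        yield box_type, buf[offset + 8:offset + size]
        offset += size

def track_can_splice(track_payload: bytes) -> bool:
    for box_type, payload in iter_boxes(track_payload):
        if box_type == "spco":
            return payload[0] & 0x01 == 1  # only_one_tile_presence_flag
    return False

spco = struct.pack(">I4sB", 9, b"spco", 1)  # toy box carrying flag = 1
print(track_can_splice(spco))  # True
```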
- In a possible implementation of the embodiments of this application, this embodiment adds the information on the number of tile data included in a sub-bitstream to the file format specified by ISOBMFF. In the file format, a sample entry type 'onti' is added to the track for this purpose: when the sample entry name is 'onti', it indicates that the sub-bitstream in the current track includes one tile data, thereby indicating that the sub-bitstream is available for splicing.
- In an optional manner of the embodiments of this application, this embodiment describes the information on the number of tile data included in a sub-bitstream in the MPD. On the server side, a new EssentialProperty attribute onti@value is specified in the MPD file to express the information on the number of tile data included in the sub-bitstream; the onti@value attribute is described in Table 4. When the terminal side requests video content, it parses this element to obtain the number of tile data included in the sub-bitstream: when the number is 1, the sub-bitstream can participate in splicing; otherwise it cannot.
- Table 4 onti@value attribute description in "urn:mpeg:dash:mcsp:2014"

Mcsp@value | Description
---|---
tile_num_subpicture | specifies the number of tiles

- The syntax element semantics are as follows: tile_num_subpicture indicates the number of tiles contained in the track.
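A client-side reading of this property might look like the sketch below; the MPD snippet is invented for illustration, the namespace handling is trimmed, and only the schemeIdUri and value attributes of Table 4's scheme are taken from the text above.

```python
import xml.etree.ElementTree as ET

MPD = """<MPD><Period><AdaptationSet><Representation id="str1">
  <EssentialProperty schemeIdUri="urn:mpeg:dash:mcsp:2014" value="1"/>
</Representation></AdaptationSet></Period></MPD>"""

def spliceable_representations(mpd_xml: str) -> list[str]:
    """Return ids of representations whose reported tile count is 1."""
    root = ET.fromstring(mpd_xml)
    ids = []
    for rep in root.iter("Representation"):
        for prop in rep.iter("EssentialProperty"):
            if (prop.get("schemeIdUri") == "urn:mpeg:dash:mcsp:2014"
                    and int(prop.get("value", "0")) == 1):
                ids.append(rep.get("id"))
    return ids

print(spliceable_representations(MPD))  # ['str1']
```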
- In addition, the information on the number of tile data included in the sub-bitstream may also be added in the representation field of the MPD file.
- Alternatively, flag information may be used to identify whether the sub-bitstream can participate in bitstream splicing.
- In that case the onti@value attribute is described in Table 5, and when the terminal side requests video content, it learns whether the sub-bitstream can participate in bitstream splicing by parsing this element.
- Table 5 onti@value attribute description in "urn:mpeg:dash:mcsp:2014"
- only_one_tile_flag: a value of 1 indicates that the track can be used for bitstream splicing; a value of 0 indicates that the track cannot participate in bitstream splicing.
- Optionally, only_one_tile_flag in the embodiments of this application may be replaced by merge_enable_flag, where merge_enable_flag is used as indication information of whether the sub-bitstream is available for splicing; the embodiments of this application use only_one_tile_flag as an example for description.
- In addition, the information on the number of tile data included in the sub-bitstream may also be added in the representation field of the MPD file. In this embodiment, a codecs field value "mcc" is added; when the terminal side obtains this information, it indicates that the predicted motion vectors of the current sub-bitstream are restricted and that predicted pixels will not cross certain boundaries of the sub-image for reference.
- In another embodiment of this application: Figure 15 shows an example of how MCTS-based sub-picture tracks of the same resolution can be reconstructed to form an HEVC conforming bitstream. A 2x2 tile grid has been used in the tile sets, and each motion-constrained tile set sequence is included in one sub-picture track. Each tile set originating from a sub-picture bitstream is treated as a slice in the reconstructed bitstream, and the slice segment header of each slice in the reconstructed bitstream should be modified accordingly. (In one possible implementation, the tile boundaries and the slice boundaries in FIG. 15 coincide.)
- To initialize the HEVC decoder correctly, the corresponding initialization data, such as the sequence parameter set and picture parameter set, should be regenerated. Details of the slice segment header parameters and initialization parameters involved are as follows:
- 1. In the slice segment header of each slice in the reconstructed bitstream, first_slice_segment_in_pic_flag and slice_segment_address should be set accordingly.
- 2. In the video parameter set (VPS), profile_tier_level should be set accordingly.
- 3. In the sequence parameter set (SPS), pic_width_in_luma_samples and pic_height_in_luma_samples should be recalculated and set accordingly.
- 4. In the picture parameter set (PPS), as in the example shown in Figure 15: tiles_enabled_flag should be set to 1, num_tile_columns_minus1 and num_tile_rows_minus1 should be set to 1, uniform_spacing_flag should be set to 1, and loop_filter_across_tiles_enabled_flag and pps_loop_filter_across_slices_enabled_flag should be set to 0.
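The parameter settings in items 1 to 4 can be collected into a single overview, as in the sketch below for the 2×2 grid of Figure 15; the plain dict is a stand-in for real VPS/SPS/PPS/slice-header syntax, and the sub-picture dimensions are illustrative.

```python
def reconstruction_overrides(sub_pic_w, sub_pic_h, grid=(2, 2)):
    """Parameter rewrites for merging a grid of equal-size sub-pictures."""
    cols, rows = grid
    return {
        "SPS/pic_width_in_luma_samples": sub_pic_w * cols,
        "SPS/pic_height_in_luma_samples": sub_pic_h * rows,
        "PPS/tiles_enabled_flag": 1,
        "PPS/num_tile_columns_minus1": cols - 1,
        "PPS/num_tile_rows_minus1": rows - 1,
        "PPS/uniform_spacing_flag": 1,
        "PPS/loop_filter_across_tiles_enabled_flag": 0,
        "PPS/pps_loop_filter_across_slices_enabled_flag": 0,
        # VPS/profile_tier_level and the per-slice
        # first_slice_segment_in_pic_flag / slice_segment_address
        # are also rewritten, as noted in items 1 and 2 above.
    }

print(reconstruction_overrides(960, 960))
```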
- In another embodiment of this application: for sub-picture tracks with several resolutions, the reconstructed bitstream should comply with HEVC tile and slice syntax. Figure 16 gives an example of how to reconstruct from tracks with different resolutions. Notice that the bitstreams from sub-picture tracks 3 and 4 are reconstructed to form one tile with two slices.
- In some cases, bitstreams of different resolutions might not be able to form an HEVC conforming bitstream at all; in such cases, the OMAF player may choose to reconstruct from a single resolution and enable two HEVC decoders for the different resolutions.
- In another possible implementation of the embodiments of this application, a server apparatus is provided. This server generates the information described herein that indicates the number of tile data included in a sub-bitstream, or that indicates whether the sub-bitstream is available for splicing, and writes the information into the SEI of the bitstream, or writes the information into a file in the manner described herein. The server need not be responsible for encoding the original bitstream; it may be, for example, a transcoding server, or simply a server that generates such information, and it stores bitstreams or files carrying the information described herein.
- The solution of the embodiments of this application adds the information on the number of tile data included in a bitstream to the bitstream's SEI message or to the OMAF file format, making it possible to determine the sub-bitstreams available for splicing according to the number of tile data. With this bitstream splicing processing, multiple sub-bitstreams can be spliced and then decoded with a single decoder in a single decoding operation, without transmitting additional track information, which saves bandwidth and reduces the complexity of system-level stream management.
- FIG. 17 is a schematic diagram of the hardware structure of a computer device according to an embodiment of this application. As shown in FIG. 17, the computer device can serve as an implementation of the media information processing apparatus, or as an implementation of the media information processing method. The computer device includes a processor 171, a memory 172, an input/output interface 173, and a bus 175, and may further include a communication interface 174. The processor 171, the memory 172, the input/output interface 173, and the communication interface 174 are communicatively connected to one another through the bus 175.
- The processor 171 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for executing related programs, so as to implement the functions to be performed by the modules in the media information processing apparatus provided in the embodiments of this application, or to perform the media information processing method corresponding to the method embodiments of this application. The processor 171 may be an integrated circuit chip with signal processing capability; in an implementation process, the steps of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 171 or by instructions in the form of software.
- The processor 171 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, any conventional processor, or the like. The steps of the methods disclosed in the embodiments of this application may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 172; the processor 171 reads the information in the memory 172 and, in combination with its hardware, completes the functions to be performed by the modules included in the media information processing apparatus provided in the embodiments of this application, or performs the media information processing method provided in the method embodiments of this application.
- The memory 172 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM), and can store an operating system as well as other application programs. When the functions to be performed by the modules included in the media information processing apparatus provided in the embodiments of this application, or the media information processing method provided in the method embodiments of this application, are implemented through software or firmware, the program code for implementing the technical solutions provided in the embodiments of this application is stored in the memory 172, and the processor 171 performs the operations required by the modules included in the media information processing apparatus, or performs the media information processing method provided in the method embodiments of this application.
- The input/output interface 173 is configured to receive input data and information, and to output data such as operation results.
- The communication interface 174 uses a transceiver apparatus such as, but not limited to, a transceiver to implement communication between the computer device and other devices or communication networks, and can serve as the obtaining module or the sending module in the processing apparatus.
- The bus 175 may include a path for transferring information between the components of the computer device, such as the processor 171, the memory 172, the input/output interface 173, and the communication interface 174.
- It should be noted that although the computer device shown in FIG. 17 shows only the processor 171, the memory 172, the input/output interface 173, the communication interface 174, and the bus 175, a person skilled in the art will understand that, in a specific implementation, the computer device also includes other components necessary for normal operation; for example, it may further include a display for displaying the video data to be played. Depending on specific needs, the computer device may also include hardware components implementing other additional functions. Moreover, the computer device may include only the components necessary for implementing the embodiments of this application, and need not include all of the components shown in FIG. 17.
- It should also be noted that, for brevity of description, each of the foregoing method embodiments is expressed as a series of action combinations; however, a person skilled in the art should know that this application is not limited by the described order of actions, because according to this application some steps may be performed in another order or simultaneously. A person skilled in the art should also know that the embodiments described in this specification are all preferred embodiments, and the actions and modules involved are not necessarily required by this application.
- In another embodiment of this application, a readable storage medium is further provided. The readable storage medium stores computer-executable instructions, and when a device (which may be a single-chip microcomputer, a chip, or the like) or a processor executes them, some or all of the steps of the media information processing method provided in the foregoing method embodiments are performed. The readable storage medium may include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disc.
- In another embodiment of this application, a computer program product is further provided. The computer program product comprises computer-executable instructions stored in a computer-readable storage medium; at least one processor of a device can read the computer-executable instructions from the computer-readable storage medium, and execution of those instructions by the at least one processor causes the device to implement some or all of the steps of the media information processing method provided in the foregoing method embodiments.
- Finally, it should be noted that the foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto; any variation or replacement within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Computer Security & Cryptography (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
Embodiments of this application provide a media information processing method and apparatus, relate to the field of streaming media transmission technologies, and are used to save transmission bandwidth while reducing the complexity of stream management. The method includes: obtaining sub-bitstream data, where the sub-bitstream data includes indication information, and the indication information is used to indicate the number of tile data included in the sub-bitstream data, or the indication information is used to indicate whether the sub-bitstream data can be used for splicing; and processing the sub-bitstream data according to the indication information.
Description
This application claims priority to Chinese Patent Application No. 201810032638.1, filed with the Chinese Patent Office on January 12, 2018 and entitled "Media Information Processing Method and Apparatus", which is incorporated herein by reference in its entirety.

The embodiments of this application relate to the field of streaming media transmission technologies, and in particular, to a media information processing method and apparatus.

With the growing popularity of applications for viewing virtual reality (VR) video such as 360-degree video, more and more users are joining the experience of viewing VR video with a large field of view. While this new viewing application brings users new viewing modes and visual experiences, it also brings new technical challenges. When video with a large field of view such as 360 degrees is viewed, the spatial region of the VR video is a 360-degree panoramic space that exceeds the normal visual range of the human eye, so the user changes the viewing angle (or viewport) at any time while watching. The video images seen differ with the viewport, so the content presented needs to change as the user's viewport changes.

In current video application scenarios, especially current 360-degree panoramic video applications and multi-channel video applications, the user is sometimes interested in only part of the whole image. In this case, the client does not need to display all image regions; it only needs to obtain part of the full image and render and present it on the client. In such an application scenario, the client can splice the sub-bitstreams required for presentation into one standard bitstream, and, to be compatible with existing standard codecs, the sub-bitstreams must satisfy certain conditions before they can be spliced together.

As shown in FIG. 1, the existing omnidirectional media format (OMAF) standard transmits an additional metadata track that records the track IDs of the sub-bitstreams that can be spliced and the splicing method. The client uses the information in this track to guide the corresponding sub-bitstreams through the splicing process in the agreed arrangement; finally, a standard codec decodes the spliced image bitstream (that is, the standard bitstream), and the decoded images are rendered and presented. FIG. 1 is described using an example in which the video image is divided into 8 sub-images (that is, 8 tiles) and the viewport requested by the client covers tiles 1, 2, 5 and 6; t0 to tN denote different time instants.

However, the metadata track transmitted in the foregoing method brings an additional bandwidth requirement, and when the number of sub-image bitstream tracks is large, there are many combinations of tracks that can be spliced into a standard bitstream, so different metadata tracks need to be constructed, which makes stream management highly complex.
SUMMARY
The embodiments of this application provide a media information processing method and apparatus in which indication information, namely the number of tile data included in a bitstream or information on whether the bitstream can be used for splicing, is added to the sub-bitstream data, so that whether a bitstream can be used for splicing is determined according to the indication information. This resolves the prior-art problem that additional track information is needed for bitstream splicing, saves transmission bandwidth, and also reduces the complexity of stream management.

To achieve the foregoing objective, the embodiments of this application use the following technical solutions:
According to a first aspect, a media information processing method is provided. The method includes: obtaining sub-bitstream data, where the sub-bitstream data includes indication information, and the indication information is used to indicate the number of tile data included in the sub-bitstream data, or the indication information is used to indicate whether the sub-bitstream data can be used for splicing; and processing the sub-bitstream data according to the indication information.

The execution body of the method embodiments of this application may be a device with video or image decoding functions, such as a wearable device (for example, an AR/VR helmet or AR/VR glasses), a smart terminal (for example, a mobile phone or a tablet computer), a television, or a set-top box.

In a possible implementation of the first aspect, the media data, which includes the sub-bitstream data, may be obtained by sending a media data obtaining request and then receiving the media data. For example, the terminal may construct a uniform resource locator (URL) from the relevant attributes and address information in the media presentation description file, send an HTTP request to that URL, and then receive the corresponding media data.

In a possible implementation of the first aspect, the media data including the sub-bitstream data may be obtained by push. The media data in the embodiments of this application mainly refers to data obtained by encoding and encapsulating video or images; in some possible implementations it may also be data obtained by encoding and encapsulating audio. A video consists of a series of images.

In a possible implementation of the first aspect, for examples of media data, refer to the provisions on media data in the ISO/IEC 23090-2 standard specification.

The ISO/IEC 23090-2 standard specification is also known as the OMAF (omnidirectional media format) standard specification. It defines a media application format that enables the presentation of omnidirectional media in applications, where omnidirectional media mainly refers to omnidirectional video (360° video) and the associated audio. The OMAF specification first specifies a list of projection methods that can be used to convert spherical video into two-dimensional video; it then specifies how to use the ISO base media file format (ISOBMFF) to store omnidirectional media and the metadata associated with the media, and how to encapsulate and transmit omnidirectional media data in a streaming system, for example through dynamic adaptive streaming over HTTP (DASH) as specified in the ISO/IEC 23009-1 standard.

In a possible implementation of the first aspect, an image in the embodiments of this application may be a complete image captured by a capture device (such as a camera), or an image obtained by partitioning a complete image. For example, if the resolution of the captured image is 1024×1024, the image in the embodiments of this application may be 1024×1024, 512×512, 1024×512, 512×1024, or the like; this is not specifically limited in this application.

In a possible implementation of the first aspect, the image data in the embodiments of this application (for example, a video bitstream or an original bitstream) is data obtained by encoding images according to a video coding technology, for example image data obtained by encoding images with ITU H.264 or ITU H.265, or data obtained by encoding images with another standard or proprietary technology.

In possible implementations of the first aspect, the indication information of the sub-bitstream data may be encapsulated in supplemental enhancement information (SEI), or in a box in a track, or in a media presentation description (MPD) file. The media presentation description file contains some metadata of the images, where metadata refers to attribute information such as duration, bit rate, frame rate, or position in a spherical coordinate system; in an example, the media presentation description file may follow the relevant provisions and examples in ISO/IEC 23009-1.

In a possible implementation of the first aspect, the indication information of the sub-bitstream data may be carried in the sample entry type of a track. Optionally, for the information on the number of tile data included in the sub-bitstream, a sample entry type 'onti' is added to the track; when the sample entry name is 'onti', it indicates that the sub-bitstream in the current track includes one tile data, thereby indicating that the sub-bitstream can be used for splicing.

In a possible implementation of the first aspect, the indication information includes at least one identifier used to indicate the number of tile data included in the sub-bitstream data. For example, the indication information may be a one-bit flag whose value indicates whether the number of tile data is 1, or an indicator; alternatively, the indication information may be a flag of two or more bits whose value directly indicates the number of tile data.

In possible implementations of the first aspect, the sub-bitstream data further includes video parameter set (VPS) information, sequence parameter set (SPS) information, picture parameter set (PPS) information, or slice segment (SS) information, and processing the sub-bitstream data according to the indication information then includes: processing the sub-bitstream data according to the indication information and the VPS, SPS, PPS, or SS information, respectively.

In a possible implementation of the first aspect, the sub-bitstream data further includes resolution information. When sub-bitstreams of different resolutions exist, the sub-bitstreams may be spliced in a form in which one tile contains multiple slices. Optionally, multiple low-resolution sub-bitstreams may be combined, as slices, into one tile data unit; that is, one tile data unit in the spliced bitstream may include multiple slices. For example, if there are sub-bitstreams of two resolutions with the same content, 1024×1024 and 512×512, two 512×512 sub-bitstreams may be combined, as slices, into one tile data unit.

In a possible implementation of the first aspect, when it is determined that a sub-bitstream cannot be used for splicing, the sub-bitstream may be decoded directly, and the resulting image is stitched with the image decoded from the spliced bitstream to obtain the final display image; compared with discarding the sub-bitstreams that cannot be used for splicing, this improves the utilization of the sub-bitstreams.

In the media information processing method provided in the first aspect, by adding to each sub-bitstream the information on the number of tile data it contains, or indication information indicating whether the sub-bitstream data can be used for splicing, whether a sub-bitstream can be used for splicing can be determined according to the indication information, and multiple sub-bitstreams available for splicing can be spliced. After the required bitstreams are spliced and decoded, only a single decoder is needed to decode multiple sub-image sequences, no additional track information needs to be transmitted, bandwidth is saved, and the complexity of stream management is simplified.
According to a second aspect, a media information processing method is provided. The method includes: obtaining sub-bitstream data of an original image; determining indication information of the sub-bitstream data according to the number of tile data included in the sub-bitstream data, where the indication information is used to indicate the number of tile data included in the sub-bitstream data, or the indication information is used to indicate whether the sub-bitstream data can be used for splicing; and sending the sub-bitstream data, which includes the indication information, to a terminal.

In possible implementations of the second aspect, the indication information of the sub-bitstream may be encapsulated in supplemental enhancement information (SEI), in a box of a track, in the sample entry type of a track, or in a media presentation description (MPD) file.

In possible implementations of the second aspect, the sub-bitstream data further includes video parameter set (VPS) information, sequence parameter set (SPS) information, picture parameter set (PPS) information, or slice segment (SS) information, and processing the sub-bitstream data according to the indication information then includes: processing the sub-bitstream data according to the indication information and the VPS, SPS, PPS, or SS information, respectively.

In a possible implementation of the second aspect, the sub-bitstream data further includes resolution information. Optionally, sub-bitstreams of different resolutions exist; for example, if there are sub-bitstreams of two resolutions with the same content, 1024×1024 and 512×512, two 512×512 sub-bitstreams may be combined, as slices, into one tile data unit.
According to a third aspect, a media information processing apparatus is provided. The apparatus includes: an obtaining module, configured to obtain sub-bitstream data, where the sub-bitstream data includes indication information used to indicate the number of tile data included in the sub-bitstream data, or to indicate whether the sub-bitstream data can be used for splicing; and a processing module, configured to process the sub-bitstream data according to the indication information.

In possible implementations of the third aspect, the indication information is carried in supplemental enhancement information (SEI), in a box of a track, in the sample entry type of a track, or in a media presentation description (MPD) file.

In possible implementations of the third aspect, the sub-bitstream data further includes video parameter set (VPS) information, sequence parameter set (SPS) information, picture parameter set (PPS) information, or slice segment (SS) information, and the processing module is further configured to process the sub-bitstream data according to the indication information and the respective information; the sub-bitstream data may further include resolution information.
According to a fourth aspect, a media information processing apparatus is provided. The apparatus includes: an obtaining module, configured to obtain sub-bitstream data of an original image; a processing module, configured to determine indication information of the sub-bitstream data according to the number of tile data included in the sub-bitstream data, where the indication information is used to indicate the number of tile data included in the sub-bitstream data, or to indicate whether the sub-bitstream data can be used for splicing; and a sending module, configured to send the sub-bitstream data, which includes the indication information, to a terminal.

In possible implementations of the fourth aspect, the indication information is carried in supplemental enhancement information (SEI), in a box of a track, in the sample entry type of a track, or in a media presentation description (MPD) file.

In possible implementations of the fourth aspect, the sub-bitstream data further includes video parameter set (VPS) information, sequence parameter set (SPS) information, picture parameter set (PPS) information, or slice segment (SS) information, and the processing module is further configured to process the sub-bitstream data according to the indication information and the respective information; the sub-bitstream data may further include resolution information.

It should be noted that for specific examples and implementations of the method or apparatus embodiments of the second to fourth aspects of this application, refer to the related examples in the method embodiment of the first aspect; details are not described here again.
According to a fifth aspect, a media information processing apparatus is provided, including one or more processors and a memory. The memory is coupled to the one or more processors and is configured to store computer program code comprising instructions; when the one or more processors execute the instructions, the processing apparatus performs the media information processing method provided in the first aspect or any possible implementation of the first aspect, or performs the media information processing method provided in the second aspect or any possible implementation of the second aspect.

According to a sixth aspect, a processor is provided, configured to perform the media information processing method according to the first aspect or any possible implementation of the first aspect, or to perform the media information processing method according to the second aspect or any possible implementation of the second aspect.

According to further aspects of this application, computer-readable storage media are provided, each storing instructions that, when run on a device, cause the device to perform the media information processing method according to the first aspect (or any possible implementation thereof) or according to the second aspect (or any possible implementation thereof), respectively; and computer program products containing instructions are provided that, when run on a computer, cause the computer to perform the media information processing method according to the first aspect (or any possible implementation thereof) or according to the second aspect (or any possible implementation thereof), respectively.
FIG. 1 is a schematic diagram of video coding and transmission of sub-bitstreams;
FIG. 2 is a schematic structural diagram of a video coding and transmission system according to an embodiment of this application;
FIG. 3 is a schematic flowchart of a media information processing method according to an embodiment of this application;
FIG. 4 is a schematic structural diagram of a media information processing apparatus according to an embodiment of this application;
FIG. 5 is a schematic flowchart of another media information processing method according to an embodiment of this application;
FIG. 6 is a schematic structural diagram of another media information processing apparatus according to an embodiment of this application;
FIG. 7 is a schematic diagram of a first type of bitstream splicing according to an embodiment of this application;
FIG. 8 is a schematic diagram of a slice arrangement in a bitstream according to an embodiment of this application;
FIG. 9 is a schematic diagram of a slice arrangement in another bitstream according to an embodiment of this application;
FIG. 10 is a schematic diagram of a second type of bitstream splicing according to an embodiment of this application;
FIG. 11 is a schematic diagram of a third type of bitstream splicing according to an embodiment of this application;
FIG. 12 is a schematic diagram of a fourth type of bitstream splicing according to an embodiment of this application;
FIG. 13 is a schematic diagram of a fifth type of bitstream splicing according to an embodiment of this application;
FIG. 14 is a schematic diagram of a sixth type of bitstream splicing according to an embodiment of this application;
FIG. 15 is a schematic diagram of a seventh type of bitstream splicing according to an embodiment of this application;
FIG. 16 is a schematic diagram of an eighth type of bitstream splicing according to an embodiment of this application;
FIG. 17 is a schematic structural diagram of a computer device according to an embodiment of this application.
Before the embodiments of this application are described, the related technical terms involved in this application are explained first.

Video decoding: the process of restoring a bitstream into reconstructed images according to specific syntax rules and processing methods.

Video encoding: the process of compressing an image sequence into a bitstream.

Video coding: a collective term covering both video encoding and video decoding; its Chinese translation is the same as that of "video encoding".

Panoramic video: also called virtual reality (VR) panoramic video, 360-degree panoramic video, or 360-degree video; video shot in all directions at 360 degrees with multiple cameras, which the user can freely pan up, down, left, and right while viewing.

Tile: a rectangular coding region obtained by partitioning a picture to be encoded in the video coding standard HEVC. A frame can be partitioned into multiple tiles that together compose the frame, and each tile can be encoded independently.

Sub-picture: a part of an original picture obtained by partitioning the picture, called a sub-picture of that picture. In some embodiments, a sub-picture is rectangular in shape; a sub-picture may be a partial image of a frame.

Motion-constrained tile sets (MCTS): a tile-oriented coding technology that restricts the motion vectors inside a tile at encoding time, so that a tile at the same position in an image sequence does not reference, in the time domain, image pixels outside that tile's region; each tile can therefore be decoded independently in the time domain.

Image sub-region: for convenience of description in this application, "image sub-region" is used as a collective term for tiles and sub-pictures. It can be understood that sub-pictures in this application may also include pictures partitioned in the tile coding manner.

Track: defined in ISO/IEC 14496-12 as a "timed sequence of related samples (q.v.) in an ISO base media file. NOTE: For media data, a track corresponds to a sequence of images or sampled audio; for hint tracks, a track corresponds to a streaming channel." In other words, a track is a series of time-attributed samples encapsulated in the ISOBMFF manner; for example, in a video track, a video sample is the bitstream produced when the video encoder encodes each frame, and all the video samples are encapsulated according to the ISOBMFF specification to produce samples.

Box: defined in ISO/IEC 14496-12 as an "object-oriented building block defined by a unique type identifier and length. NOTE: Called 'atom' in some specifications, including the first definition of MP4." A box is the basic unit composing an ISOBMFF file, and a box can contain other boxes.

Supplemental enhancement information (SEI): a type of network abstraction layer unit (NALU) defined in the video coding standards (H.264, H.265).

Media presentation description (MPD): a document specified in the ISO/IEC 23009-1 standard that contains the metadata with which a client constructs HTTP URLs. The MPD contains one or more period elements; each period element contains one or more adaptation sets; each adaptation set contains one or more representations; and each representation contains one or more segments. The client selects a representation according to the information in the MPD and constructs the HTTP URLs of the segments.
The embodiments of this application are applied to video coding and transmission systems. In one embodiment, a block diagram of sub-bitstream-based video coding and transmission is shown in FIG. 2.

Referring to FIG. 2, on the server side, video or images are obtained by a video capture apparatus. The video capture apparatus may be a video or image acquisition device such as a camera, or a receiving apparatus that receives video or image data from another device. A pre-encoding processor performs some processing on the video or images before encoding, which may include sub-region partitioning of the video or images; it can be understood that the pre-encoding processor may also be part of the video encoder, or the video encoder may perform the foregoing functions of the pre-encoding processor. The video encoder encodes the video or images according to certain coding rules, for example the coding schemes specified in H.264 or H.265, or another proprietary coding technology. For the encoded bitstream, a bitstream encapsulation apparatus encapsulates it in a certain encapsulation format, for example the MPEG-2 TS encapsulation format or another format. A sending apparatus then sends the encapsulated bitstream to the terminal.

On the terminal side, a receiving apparatus receives the bitstream from the server side; after a bitstream decapsulation apparatus decapsulates it, multiple sub-bitstreams are obtained and fed to the video decoder. The video decoder decodes the sub-bitstreams to generate decoded video or images, which are finally displayed by a display apparatus.

It can be understood that the server and terminal shown in FIG. 2 represent the sender and receiver of the bitstream; in actual products, the server may be a device such as a smartphone or tablet computer, and the terminal may likewise be a smartphone, tablet computer, or similar device; this is not specifically limited in the embodiments of this application.

It can also be understood that a sub-bitstream in the embodiments of this application is defined relative to the spliced bitstream, and an obtained sub-bitstream may be a separately transmitted bitstream. A sub-bitstream in the embodiments of this application may also be called sub-bitstream data; the terms "sub-bitstream" and "sub-bitstream data" are interchangeable.
As shown in FIG. 3, an embodiment of one aspect of this application provides a media information processing method S30, which includes:

S301: A terminal obtains sub-bitstream data, where the sub-bitstream data includes indication information used to indicate the number of tile data included in the sub-bitstream data.

S302: The terminal processes the sub-bitstream data according to the indication information.

As shown in FIG. 4, an embodiment of one aspect of this application provides a media information processing apparatus 40, which includes an obtaining module 401 and a processing module 402. The obtaining module 401 is configured to obtain sub-bitstream data including indication information that indicates the number of tile data included in the sub-bitstream data, and the processing module 402 is configured to process the sub-bitstream data according to the indication information.

As shown in FIG. 5, an embodiment of one aspect of this application provides another media information processing method S50, which includes:

S501: A server obtains sub-bitstream data of an original image.

S502: The server determines indication information of the sub-bitstream data according to the number of tile data included in the sub-bitstream data.

S503: The server sends the sub-bitstream data, which includes the indication information, to a terminal.

As shown in FIG. 6, an embodiment of one aspect of this application provides a media information processing apparatus 60, which includes an obtaining module 601, a processing module 602, and a sending module 603. The obtaining module 601 is configured to obtain sub-bitstream data of an original image; the processing module 602 is configured to determine indication information of the sub-bitstream data according to the number of tile data included in the sub-bitstream data; and the sending module 603 is configured to send the sub-bitstream data, which includes the indication information, to a terminal.
In a possible implementation of this application, this embodiment provides a method based on sub-bitstream splicing processing, together with the corresponding encoding/transmission and decoding/presentation procedures. The entire system processing procedure of this embodiment is shown in FIG. 7, and its implementation steps are described in detail as follows:

Server side:

The input video image (which may be called the original image) is partitioned into regions (each region may be called a tile) and encoded in MCTS form to generate one standard-conforming video bitstream. Here, MCTS means restricting the inter- and intra-prediction motion vectors in the temporal and spatial domains so that predicted pixels do not exceed the image boundary; the server's settings when encoding in MCTS form may include disabling cross-tile deblocking filtering and SAO filtering, to guarantee decoding independence between tile data. The server then splits this video bitstream to obtain multiple sub-bitstreams, where one sub-bitstream may include the coded data of one tile or of multiple tiles. The coded data of a tile is the data obtained by encoding the tile region of the original image; in the embodiments of this application, the coded data of a tile is also called tile data.

Specifically, the process by which the server splits the video bitstream into multiple sub-bitstreams may include: the server detects the NALU start codes in the video bitstream and separates out the different NALUs, each of which contains one or more tile data; the server determines the coding parameter set corresponding to the one or more tile data contained in the different NALUs (that is, determines the coding parameter set of a sub-bitstream), copies the tile data included in each NALU from the video bitstream, and adds NALU start codes for the copied tile data and the coding parameter set, thereby forming a sub-bitstream.

The type in each NALU header can be used to indicate the coding parameter set information of the video bitstream and the tile data included in the NALU. The coding parameter set may include video parameter set (VPS) information, sequence parameter set (SPS) information, picture parameter set (PPS) information, slice segment (SS) information, and so on.
For ease of understanding, assume here that the resolution of the original image is 3840×1920 and each partitioned tile is 640×640, and take one NALU of the video bitstream (that is, one sub-bitstream) as an example for detailed description. When the server determines the coding parameter set of the sub-bitstream, it can carry over the coding parameter set of the video bitstream and modify only some of its parameters, as described below.

For the VPS information: optionally, because the resolution of the sub-bitstream decreases, the server can modify the coding level set in the VPS information, for example changing the level from 5.1 to 4.1.

For the SPS information: the server can modify the parameters in the SPS that indicate picture width and height, namely pic_width_in_luma_samples and pic_height_in_luma_samples. For example, when the number of tile data contained in the sub-bitstream is 1 (that is, the sub-image corresponding to the sub-bitstream includes one tile), pic_width_in_luma_samples and pic_height_in_luma_samples are 640 and 640, respectively; when the number of tile data contained in the sub-bitstream is 2 (that is, the corresponding sub-image includes two tiles), the values are 1280 and 640 if the two tiles are horizontally adjacent, and 640 and 1280 if they are vertically adjacent.

For the PPS information: the server can set the tile-partitioning parameters in the PPS, such as tiles_enabled_flag (tile enable flag), num_tile_columns_minus1 (tile column count identifier), num_tile_rows_minus1 (tile row count identifier), and uniform_spacing_flag (uniform allocation flag), according to the number of tile data contained in the sub-bitstream (that is, the number of tiles in the corresponding sub-image). For example, when the corresponding sub-image includes one tile, tiles_enabled_flag is set to 0; when the corresponding sub-image includes two horizontally adjacent tiles of the same resolution, tiles_enabled_flag is set to 1, num_tile_columns_minus1 may be set to 1, num_tile_rows_minus1 to 0, and uniform_spacing_flag to 1.

For the SS information: the server can construct the sub-bitstream so that each tile data it contains is treated as one slice, and set parameters such as first_slice_segment_in_pic_flag (first slice segment flag) and slice_segment_address (slice segment address) in the SS header information. For example, when the sub-bitstream contains one tile data, first_slice_segment_in_pic_flag may be set to 1 and slice_segment_address to 0; when the sub-bitstream contains two tile data, with the first tile of each frame as the first slice, the corresponding first_slice_segment_in_pic_flag is set to 1 and slice_segment_address to 0, and with the second tile of each frame as the second slice, the corresponding first_slice_segment_in_pic_flag is set to 0 and slice_segment_address to 10 (as shown in FIG. 8, the picture is divided into 64×64 CTUs scanned row by row in a Z pattern across the whole picture, and the index of the first CTU of the current slice is that slice's slice_segment_address value).

It should be noted that the server may also construct the partitioning with every two or more tile data as one slice; the embodiments of this application use one tile data per slice only as an example, which does not limit the embodiments of this application.
In the embodiments of this application, the server can write the information on the number of tile data contained in a sub-bitstream into the SEI message of the bitstream. The number of tile data can be used to indicate whether the sub-bitstream can be used for splicing: when the number of tile data is 1, it indicates that the sub-bitstream can be used for splicing, and when the number of tile data is greater than 1, it indicates that the sub-bitstream cannot be used for splicing. Specifically, the SEI message can be expressed with the following syntax elements:

Table 1.1 SEI syntax

Table 1.2 Sub-bitstream splicing SEI message syntax
sub_picture_info_aggregate(payloadSize){ | Descriptor |
sub_pic_str_only_one_tile | u(1) |
} |
In Table 1.1, a new type 156 is added for the SEI type to indicate whether the current sub-bitstream can be used for splicing, and the information sub_picture_info_aggregate(payloadSize) is added. The meanings of the syntax elements contained in sub_picture_stream_aggregate are as follows:

sub_pic_str_only_one_tile: a value of 1 indicates that the sub-bitstream includes no more than one tile data (that is, the sub-image corresponding to the sub-bitstream includes no more than one tile); a value of 0 indicates that the sub-bitstream includes two or more tile data (that is, the corresponding sub-image includes two or more tiles).
Table 1.3 SEI message syntax for the number of tile data included in a sub-bitstream
sub_picture_info_aggregate(payloadSize){ | Descriptor |
Tile_num_subpicture | ue(v) |
} |
Alternatively, the form in Table 1.3 may be used, where the value of Tile_num_subpicture indicates the number of tile data contained in the sub-bitstream.
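For the server-side direction, a small sketch of encoding the Table 1.3 payload follows; Tile_num_subpicture is ue(v) (unsigned Exp-Golomb) per the table's Descriptor column, while the byte alignment and RBSP trailing bits that a real SEI writer needs are omitted here.

```python
def ue(value: int) -> str:
    """Unsigned Exp-Golomb bit string, as used for ue(v) syntax elements."""
    bits = bin(value + 1)[2:]
    return "0" * (len(bits) - 1) + bits

def tile_count_payload_bits(tile_count: int) -> str:
    """Bit string of the Table 1.3 payload (Tile_num_subpicture only)."""
    return ue(tile_count)

for n in (1, 2, 4):
    print(n, tile_count_payload_bits(n))  # 1 -> 010, 2 -> 011, 4 -> 00101
```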
One specific method of obtaining multiple sub-bitstreams is as follows, although the method is not limited thereto:

1. Partition the original image into tile regions and encode the partitioned tiles to obtain a video bitstream.

2. Obtain each NALU in the video bitstream and parse the coding parameter set of the video bitstream (including the VPS, SPS, PPS, and SS information); produce the coding parameter sets of the multiple sub-bitstreams according to the tile data included in each NALU and the coding parameter set of the video bitstream.

3. Copy the coded data of each of the multiple sub-bitstreams from the video bitstream, and form the multiple sub-bitstreams according to their coding parameter sets.

4. Set the sub_picture_info_aggregate SEI message for each sub-bitstream.

Each sub-bitstream can then be encapsulated and stored on the server.
Terminal side:

According to user behavior, the terminal requests the required bitstreams from the server and decapsulates the received bitstreams.

The SEI message sub_picture_info_aggregate(payloadSize) of each sub-bitstream is parsed, and whether each sub-bitstream can be used for bitstream splicing is judged according to the values of the syntax elements in Tables 1.1 and 1.2 above. When sub_pic_str_only_one_tile is 1, the sub-bitstream can be used for splicing; when sub_pic_str_only_one_tile is not 1, the sub-bitstream cannot be used for splicing, and that sub-bitstream is discarded and does not take part in subsequent processing. The sub-bitstreams whose sub_pic_str_only_one_tile is 1 are spliced. Optionally, if the sub-bitstreams requested for download are not enough to be spliced into a complete rectangle, one of the sub-bitstreams can be copied as padding; the copied sub-bitstream need not take part in decoding or rendering.

For example, in FIG. 7, sub0 to sub15 are 16 tiles of a video image; str0 is the sub-bitstream corresponding to the encoding of the four tiles sub0, sub1, sub4, and sub5, and str1 to str16 are the sub-bitstreams corresponding to the encoding of sub0 to sub15, respectively (that is, each such sub-bitstream contains one tile). Accordingly, the only_one_tile_flag corresponding to str0 equals 0, and the only_one_tile_flag corresponding to each of str1, str2, str3, str5, str6, and str7 equals 1, which means that str0 cannot be used for splicing (marked × in FIG. 7) while str1, str2, str3, str5, str6, and str7 can be used for splicing (marked √ in FIG. 7). The process by which the terminal splices the sub-bitstreams is described in detail below.
The terminal detects the NALU start codes in sub-bitstreams str1, str2, str3, str5, str6, and str7 and separates out the different NALUs; the coding parameter set and coded data of each sub-bitstream can be determined from the type in the NALU header. The terminal can select one of these sub-bitstreams as the construction baseline for the coding parameter set, for example str1, and determine the coding parameter set of the new bitstream (that is, the standard bitstream spliced from the sub-bitstreams) according to str1's coding parameter set information. Then the coded data of the corresponding tiles is copied from str1, str2, str3, str5, str6, and str7 as the coded data of the corresponding tile positions of the new bitstream, NALU start codes are added for the new bitstream's coding parameter set and coded data, and everything is spliced into one standard bitstream in a certain order.

For ease of understanding, assume again that the resolution of the original image is 3840×1920 and each partitioned tile is 640×640. When the terminal determines the coding parameter set of the new bitstream (for example, including the VPS, SPS, PPS, and SS information), it can carry over str1's coding parameter set and modify only some of its parameters, as described below.

For the VPS information: optionally, because the resolution of the new bitstream increases, the terminal can modify the coding level set in the VPS information, for example changing the level from 4.1 to 5.1.

For the SPS information: the terminal can modify the parameters in the SPS that indicate picture width and height, namely pic_width_in_luma_samples and pic_height_in_luma_samples. In FIG. 7, the sub-image requested by the terminal includes 3×2 tiles (sub0 to sub2 and sub4 to sub6), so pic_width_in_luma_samples and pic_height_in_luma_samples are 1920 (640×3) and 1280 (640×2), respectively.

For the PPS information: the terminal can set the tile-partitioning parameters in the PPS, such as tiles_enabled_flag (tile enable flag), num_tile_columns_minus1 (tile column count identifier), num_tile_rows_minus1 (tile row count identifier), and uniform_spacing_flag (uniform allocation flag), according to the number of tiles in the requested sub-image. In FIG. 7, the requested sub-image includes 3×2 tiles, so the terminal can set tiles_enabled_flag to 1, num_tile_columns_minus1 to 2 (the sub-image has three columns, counted from 0), num_tile_rows_minus1 to 1 (the sub-image has two rows, counted from 0), and uniform_spacing_flag to 1 (that is, the tiles are uniformly allocated).

For the SS information: the terminal can construct each tile data of each sub-bitstream as one slice and set the SS header parameters first_slice_segment_in_pic_flag (first slice segment flag) and slice_segment_address (slice segment address). For the arrangement in FIG. 7, first_slice_segment_in_pic_flag is set to 1 in the SS header of str1 of each frame and to 0 for the rest; the slice_segment_address values corresponding to the coded data of str1, str2, str3, str5, str6, and str7 in the SS headers of each frame can be determined by calculation, and in FIG. 7 they are set to 0, 10, 20, 300, 310, and 320, respectively (as shown in FIG. 9, the picture is divided into 64×64 CTUs scanned row by row in a Z pattern across the whole picture, and the index of the first CTU of the current slice is that slice's slice_segment_address value).

For example, as shown in FIG. 10, assume the first sub-bitstream is A and the second is B; sub-bitstream A includes 4 frames of data (A0 to A3) encapsulated in track 1, and sub-bitstream B also includes 4 frames of data (B0 to B3) encapsulated in track 2. The structure of the bitstream spliced from sub-bitstreams A and B can then be as shown as bitstream C in FIG. 10, where SSH denotes the header information in the SS parameters.
One specific method of splicing multiple sub-bitstreams is as follows, although the method is not limited thereto:

1. Parse the sub_picture_info_aggregate SEI message of each sub-bitstream to obtain the number of tile data included in each sub-bitstream, and determine the sub-bitstreams whose tile data count is 1 as the sub-bitstreams available for splicing.

2. Parse the parameter sets of the individual sub-bitstreams (including the VPS, SPS, PPS, and SS information) and merge information such as the width and height of the decoded image to produce a new coding parameter set as the coding parameter set of the new (spliced) bitstream.

3. According to the positions of the tiles in the decoded image, copy the coded data of the corresponding tile from each sub-bitstream available for splicing, add NALU start codes for the copied coded data and the new coding parameter set, and splice them into one standard bitstream in a certain order, obtaining the spliced bitstream (a toy sketch of these three steps follows below).
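The following toy-model sketch strings the three steps together. It assumes each sub-bitstream has already been pre-parsed into a dict carrying its SEI tile count, its parameter-set NAL units, and its coded tile NAL units (real inputs would come from an Annex-B parser), and the parameter-set merge of step 2 is reduced to reusing the baseline stream's parameter sets.

```python
def splice_substreams(substreams: list[dict]) -> bytes:
    # Step 1: keep only sub-bitstreams whose SEI reports one tile.
    usable = [s for s in substreams if s["sei_tile_count"] == 1]
    # Step 2: reuse the first usable stream's parameter sets as the
    # construction baseline (width/height/tile fields would be rewritten
    # here as described above).
    new_params = usable[0]["param_sets"]
    # Step 3: copy tile data in output order behind fresh start codes.
    out = bytearray()
    for nalu in new_params:
        out += b"\x00\x00\x00\x01" + nalu
    for stream in usable:
        for nalu in stream["tile_data"]:
            out += b"\x00\x00\x00\x01" + nalu
    return bytes(out)

streams = [
    {"sei_tile_count": 1, "param_sets": [b"SPS"], "tile_data": [b"T1"]},
    {"sei_tile_count": 4, "param_sets": [b"SPS"], "tile_data": [b"T0"]},
]
print(splice_substreams(streams))  # parameter sets, then T1 only
```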
Finally, the terminal decodes the spliced bitstream and renders and presents it on the display device.
It should be noted that the server may instead add, in each sub-bitstream, indication information on whether that sub-bitstream can be used for splicing; accordingly, after obtaining a sub-bitstream, the terminal can determine directly from the indication information whether it can be used for splicing. Whether the indication information indicates the splicing availability of the sub-bitstream or the number of tile data included in the sub-bitstream, the specific server-side and terminal-side procedures differ only in the content of the indication information that is added and obtained; the other corresponding operations are identical and are not described again here.

In this embodiment of this application, the original image is tile-encoded to obtain a video bitstream, multiple sub-bitstreams are obtained from the video bitstream, and each sub-bitstream carries the number of tile data it contains or information on whether it can be used for splicing, so that the terminal can determine the sub-bitstreams available for splicing and splice multiple such sub-bitstreams. After the required bitstreams are spliced and decoded, only a single decoder is needed to decode multiple sub-image sequences, no additional track information needs to be transmitted, bandwidth is saved, and the complexity of system-level stream management is also reduced.
Claims (30)
- A media information processing method, characterized in that the method comprises: obtaining sub-bitstream data, wherein the sub-bitstream data comprises indication information, and the indication information is used to indicate the number of tile data comprised in the sub-bitstream data, or the indication information is used to indicate whether the sub-bitstream data can be used for splicing; and processing the sub-bitstream data according to the indication information.
- The method according to claim 1, characterized in that the indication information is carried in supplemental enhancement information (SEI).
- The method according to claim 1, characterized in that the indication information is carried in a box of a track.
- The method according to claim 1, characterized in that the indication information is carried in a sample entry type of a track.
- The method according to claim 1, characterized in that the indication information is carried in a media presentation description (MPD) file.
- The method according to any one of claims 1 to 5, characterized in that the sub-bitstream data further comprises video parameter set (VPS) information, and the processing the sub-bitstream data according to the indication information comprises: processing the sub-bitstream data according to the indication information and the VPS information.
- The method according to any one of claims 1 to 5, characterized in that the sub-bitstream data further comprises sequence parameter set (SPS) information, and the processing the sub-bitstream data according to the indication information comprises: processing the sub-bitstream data according to the indication information and the SPS information.
- The method according to any one of claims 1 to 7, characterized in that the sub-bitstream data further comprises picture parameter set (PPS) information, and the processing the sub-bitstream data according to the indication information comprises: processing the sub-bitstream data according to the indication information and the PPS information.
- The method according to any one of claims 1 to 8, characterized in that the sub-bitstream data further comprises slice segment (SS) information, and the processing the sub-bitstream data according to the indication information comprises: processing the sub-bitstream data according to the indication information and the SS information.
- The method according to any one of claims 1 to 9, characterized in that the sub-bitstream data further comprises resolution information.
- A media information processing method, characterized in that the method comprises: obtaining sub-bitstream data of an original image; determining indication information of the sub-bitstream data according to the number of tile data comprised in the sub-bitstream data, wherein the indication information is used to indicate the number of tile data comprised in the sub-bitstream data, or the indication information is used to indicate whether the sub-bitstream data can be used for splicing; and sending the sub-bitstream data to a terminal, wherein the sub-bitstream data comprises the indication information.
- The method according to claim 11, characterized in that the indication information is carried in supplemental enhancement information (SEI).
- The method according to claim 11, characterized in that the indication information is carried in a box of a track.
- The method according to claim 11, characterized in that the indication information is carried in a sample entry type of a track.
- The method according to claim 11, characterized in that the indication information is carried in a media presentation description (MPD) file.
- A media information processing apparatus, characterized in that the apparatus comprises: an obtaining module, configured to obtain sub-bitstream data, wherein the sub-bitstream data comprises indication information, and the indication information is used to indicate the number of tile data comprised in the sub-bitstream data, or the indication information is used to indicate whether the sub-bitstream data can be used for splicing; and a processing module, configured to process the sub-bitstream data according to the indication information.
- The apparatus according to claim 16, characterized in that the indication information is carried in supplemental enhancement information (SEI).
- The apparatus according to claim 16, characterized in that the indication information is carried in a box of a track.
- The apparatus according to claim 16, characterized in that the indication information is carried in a sample entry type of a track.
- The apparatus according to claim 16, characterized in that the indication information is carried in a media presentation description (MPD) file.
- The apparatus according to any one of claims 16 to 20, characterized in that the sub-bitstream data further comprises video parameter set (VPS) information, and the processing module is further configured to: process the sub-bitstream data according to the indication information and the VPS information.
- The apparatus according to any one of claims 16 to 20, characterized in that the sub-bitstream data further comprises sequence parameter set (SPS) information, and the processing module is further configured to: process the sub-bitstream data according to the indication information and the SPS information.
- The apparatus according to any one of claims 16 to 20, characterized in that the sub-bitstream data further comprises picture parameter set (PPS) information, and the processing module is further configured to: process the sub-bitstream data according to the indication information and the PPS information.
- The apparatus according to any one of claims 16 to 20, characterized in that the sub-bitstream data further comprises slice segment (SS) information, and the processing module is further configured to: process the sub-bitstream data according to the indication information and the SS information.
- The apparatus according to any one of claims 16 to 20, characterized in that the sub-bitstream data further comprises resolution information.
- A media information processing apparatus, characterized in that the apparatus comprises: an obtaining module, configured to obtain sub-bitstream data of an original image; a processing module, configured to determine indication information of the sub-bitstream data according to the number of tile data comprised in the sub-bitstream data, wherein the indication information is used to indicate the number of tile data comprised in the sub-bitstream data, or the indication information is used to indicate whether the sub-bitstream data can be used for splicing; and a sending module, configured to send the sub-bitstream data to a terminal, wherein the sub-bitstream data comprises the indication information.
- The apparatus according to claim 26, characterized in that the indication information is carried in supplemental enhancement information (SEI).
- The apparatus according to claim 26, characterized in that the indication information is carried in a box of a track.
- The apparatus according to claim 26, characterized in that the indication information is carried in a sample entry type of a track.
- The apparatus according to claim 26, characterized in that the indication information is carried in a media presentation description (MPD) file.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP19738221.1A EP3734983A4 (en) | 2018-01-12 | 2019-01-04 | MULTIMEDIA INFORMATION PROCESSING PROCESS AND APPARATUS |
US16/926,080 US11172239B2 (en) | 2018-01-12 | 2020-07-10 | Media information processing method and apparatus |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810032638.1A CN110035331B (zh) | 2018-01-12 | 2018-01-12 | 一种媒体信息的处理方法及装置 |
CN201810032638.1 | 2018-01-12 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/926,080 Continuation US11172239B2 (en) | 2018-01-12 | 2020-07-10 | Media information processing method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019137313A1 true WO2019137313A1 (zh) | 2019-07-18 |
Family
ID=67218893
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/070480 WO2019137313A1 (zh) | 2018-01-12 | 2019-01-04 | 一种媒体信息的处理方法及装置 |
Country Status (4)
Country | Link |
---|---|
US (1) | US11172239B2 (zh) |
EP (1) | EP3734983A4 (zh) |
CN (1) | CN110035331B (zh) |
WO (1) | WO2019137313A1 (zh) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102154407B1 (ko) * | 2018-11-15 | 2020-09-09 | 한국전자기술연구원 | 타일 기반 스트리밍을 위한 모션 제한 av1 영상 부호화 방법 및 장치 |
CN112236998A (zh) * | 2019-01-02 | 2021-01-15 | 株式会社 Xris | 用于对视频信号进行编码/解码的方法及其装置 |
US11172232B2 (en) * | 2019-09-06 | 2021-11-09 | Sharp Kabushiki Kaisha | Systems and methods for signaling level information in video coding |
GB2590632B (en) * | 2019-12-20 | 2023-07-26 | Canon Kk | Video coding and decoding |
KR20220143943A (ko) | 2020-02-28 | 2022-10-25 | 후아웨이 테크놀러지 컴퍼니 리미티드 | 슬라이스 헤더 신택스 엘리먼트의 시그널링을 단순화하는 인코더, 디코더, 및 대응하는 방법 |
CN115715466A (zh) | 2020-05-22 | 2023-02-24 | 抖音视界有限公司 | 子比特流提取处理中的填充符有效载荷 |
CN113660529A (zh) * | 2021-07-19 | 2021-11-16 | 镕铭微电子(济南)有限公司 | 基于Tile编码的视频拼接、编码、解码方法及装置 |
CN114268835B (zh) * | 2021-11-23 | 2022-11-01 | 北京航空航天大学 | 一种低传输流量的vr全景视频时空切片方法 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014047943A1 (zh) * | 2012-09-29 | 2014-04-03 | 华为技术有限公司 | 视频编码及解码方法、装置及系统 |
CN103780920A (zh) * | 2012-10-17 | 2014-05-07 | 华为技术有限公司 | 处理视频码流的方法及装置 |
US20170134795A1 (en) * | 2014-10-29 | 2017-05-11 | Spotify Ab | Method and an electronic device for playback of video |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9161039B2 (en) * | 2012-09-24 | 2015-10-13 | Qualcomm Incorporated | Bitstream properties in video coding |
KR20140112909A (ko) * | 2013-03-14 | 2014-09-24 | 삼성전자주식회사 | 파노라마 영상을 생성하는 전자 장치 및 방법 |
US9749627B2 (en) | 2013-04-08 | 2017-08-29 | Microsoft Technology Licensing, Llc | Control data for motion-constrained tile set |
GB2516424A (en) * | 2013-07-15 | 2015-01-28 | Nokia Corp | A method, an apparatus and a computer program product for video coding and decoding |
WO2015197815A1 (en) * | 2014-06-27 | 2015-12-30 | Koninklijke Kpn N.V. | Determining a region of interest on the basis of a hevc-tiled video stream |
GB2531271A (en) | 2014-10-14 | 2016-04-20 | Nokia Technologies Oy | An apparatus, a method and a computer program for image sequence coding and decoding |
EP3338454A1 (en) * | 2015-08-20 | 2018-06-27 | Koninklijke KPN N.V. | Forming one or more tile streams on the basis of one or more video streams |
CN107318008A (zh) * | 2016-04-27 | 2017-11-03 | 深圳看到科技有限公司 | 全景视频播放方法及播放装置 |
CN106331480B (zh) * | 2016-08-22 | 2020-01-10 | 北京交通大学 | 基于图像拼接的视频稳像方法 |
- 2018-01-12: CN application CN201810032638.1A, granted as patent CN110035331B (active)
- 2019-01-04: EP application EP19738221.1A, published as EP3734983A4 (not active, ceased)
- 2019-01-04: PCT application PCT/CN2019/070480, published as WO2019137313A1 (status unknown)
- 2020-07-10: US application US16/926,080, granted as patent US11172239B2 (active)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014047943A1 (zh) * | 2012-09-29 | 2014-04-03 | 华为技术有限公司 | 视频编码及解码方法、装置及系统 |
CN103780920A (zh) * | 2012-10-17 | 2014-05-07 | 华为技术有限公司 | 处理视频码流的方法及装置 |
US20170134795A1 (en) * | 2014-10-29 | 2017-05-11 | Spotify Ab | Method and an electronic device for playback of video |
Non-Patent Citations (1)
Title |
---|
See also references of EP3734983A4 * |
Also Published As
Publication number | Publication date |
---|---|
EP3734983A4 (en) | 2020-12-16 |
US20200344499A1 (en) | 2020-10-29 |
EP3734983A1 (en) | 2020-11-04 |
US11172239B2 (en) | 2021-11-09 |
CN110035331A (zh) | 2019-07-19 |
CN110035331B (zh) | 2021-02-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019137313A1 (zh) | 一种媒体信息的处理方法及装置 | |
TWI680673B (zh) | 視頻影像編解碼方法及設備 | |
US11876994B2 (en) | Description of image composition with HEVC still image file format | |
US10257247B2 (en) | Method, device, and computer program for encapsulating and parsing timed media data | |
KR102320455B1 (ko) | 미디어 콘텐트를 전송하는 방법, 디바이스, 및 컴퓨터 프로그램 | |
JP6150011B2 (ja) | インタラクティビティのための動き制約タイルセットseiメッセージの拡張 | |
US11638066B2 (en) | Method, device and computer program for encapsulating media data into a media file | |
CN109587478B (zh) | 一种媒体信息的处理方法及装置 | |
KR102334628B1 (ko) | 360도 비디오의 영역 정보 전달 방법 및 장치 | |
KR102336987B1 (ko) | 360도 비디오의 영역 기반 처리 방법 및 장치 | |
KR20210016530A (ko) | 미디어 콘텐츠 전송을 위한 방법, 디바이스, 및 컴퓨터 프로그램 | |
US11653054B2 (en) | Method and apparatus for late binding in media content | |
CN112602329B (zh) | 用于360度视频解码的块置乱 | |
WO2019128668A1 (zh) | 视频码流处理方法、装置、网络设备和可读存储介质 | |
US11146799B2 (en) | Method and apparatus for decoding video bitstream, method and apparatus for generating video bitstream, storage medium, and electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19738221 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2019738221 Country of ref document: EP Effective date: 20200729 |