WO2015074273A1 - Device and Method for Processing Video - Google Patents

Device and Method for Processing Video

Info

Publication number
WO2015074273A1
Authority
WO
WIPO (PCT)
Prior art keywords
sub
track
container
sample
video
Prior art date
Application number
PCT/CN2013/087773
Other languages
English (en)
French (fr)
Inventor
夏青
张园园
石腾
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to CN201810133819.3A (patent CN108184101B)
Priority to CN201380002598.1A (patent CN104919812B)
Priority to PCT/CN2013/087773 (WO2015074273A1)
Publication of WO2015074273A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/4341 Demultiplexing of audio and video streams
    • H04N21/4348 Demultiplexing of additional data and video streams
    • H04N21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/440245 Reformatting operations of video signals performed only on part of the stream, e.g. a region of the image or a time segment
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8451 Structuring of content using Advanced Video Coding [AVC]
    • H04N21/8456 Structuring of content by decomposing the content in the time domain, e.g. in time segments
    • H04N21/85406 Content authoring involving a specific file format, e.g. MP4 format
    • H04N21/45455 Input to filtering algorithms applied to a region of the image

Definitions

  • The present invention relates to the field of information technology and, in particular, to an apparatus and method for processing video. Background Art
  • FIG. 1 is a schematic diagram of a scene in which an area picture in a video needs to be extracted.
  • Suppose a European Cup match is shot with panoramic photography.
  • The resulting panoramic video has a resolution of 6Kx2K, which suits an ultra-high-resolution panoramic display; however, if the user wants to view the video on an ordinary screen, whose resolution is much smaller, the area picture must be extracted from the panoramic video and played on that screen.
  • In FIG. 1, the top is a panoramic screen and the bottom shows a phone screen and a computer screen: the panoramic screen can display the complete video picture, while the phone screen and the computer screen cannot display the complete panoramic video, so the extracted area picture must be played on the phone screen and the computer screen instead.
  • FIG. 2 is a schematic diagram of another scene in which an area picture in a video needs to be extracted.
  • In video surveillance, the pictures taken by multiple cameras can be stitched together to form a single surveillance video.
  • When the surveillance video is played back, if the user needs to view only the picture shot by one of the cameras, the corresponding area picture of the surveillance video must be extracted for playback.
  • In FIG. 2, the left side is a surveillance video in which each image contains pictures taken by several cameras; assume the area indicated by the dotted-line box is the picture, shot by one camera, that the user wants to play back. That area picture then needs to be extracted and played separately.
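Both scenarios reduce to the same geometric step: given a video picture covered by a grid of blocks (tiles), find the blocks whose area intersects the requested rectangle. The following sketch is illustrative only; the function name, the uniform-grid assumption, and the row-major tile numbering are not taken from the patent.

```python
# Hypothetical sketch: find the indices of the tiles of a uniform grid
# that intersect a target rectangle. Tiles are numbered row-major.
def tiles_for_region(pic_w, pic_h, tile_w, tile_h, region):
    """region = (x, y, w, h) in pixels; returns sorted tile indices."""
    x, y, w, h = region
    cols = pic_w // tile_w
    rows = pic_h // tile_h
    first_col = x // tile_w
    last_col = min((x + w - 1) // tile_w, cols - 1)
    first_row = y // tile_h
    last_row = min((y + h - 1) // tile_h, rows - 1)
    return [r * cols + c
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]

# A 6Kx2K panorama split into 1500x1000 tiles (4 columns x 2 rows);
# a 1280x720 window at (2000, 500) intersects tiles 1, 2, 5 and 6.
print(tiles_for_region(6000, 2000, 1500, 1000, (2000, 500, 1280, 720)))
```

Once the intersecting tiles are known, only the coded data covering them needs to be extracted and decoded, which is the problem the embodiments below address at the file-format level.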
  • Embodiments of the present invention provide an apparatus and method for processing a video, which can effectively implement extraction of a region picture in a video.
  • According to a first aspect, an apparatus for processing a video is provided.
  • the video track of the video is divided into at least one sub-track, each sub-track is defined by a sub-track data description container and a sub-track data definition container.
  • The device includes a receiving unit configured to receive a video file corresponding to the video, where the video file includes at least one sub-track data description container, at least one sub-track data definition container, and the samples constituting the video track.
  • The sub-track data description container includes area information of the sub-track described by that container; the area information is used to indicate the area corresponding to the sub-track in the picture of the video.
  • The sub-track data definition container is used to indicate, in the samples constituting the video track, the network abstraction layer (NAL) packets corresponding to the sub-track described by that container.
  • The device further includes a determining unit configured to: determine a target area that needs to be extracted in the picture of the video and a playing time period that needs to be extracted; determine, according to the video file received by the receiving unit, the samples corresponding to the playing time period among the samples constituting the video track; determine, among the at least one sub-track, the sub-track corresponding to the target area as the target sub-track according to the target area and the area information of the sub-tracks included in the sub-track data description containers; and determine, according to the sub-track data definition container corresponding to the target sub-track, the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period, so that the determined NAL packets are decoded to play the picture of the target area in the playing time period.
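The determining unit's flow can be sketched as follows. This is a minimal illustration under assumed data shapes: the dataclass fields, the containment test, and the sample-indexed NAL map are stand-ins for the containers the patent describes, not its actual syntax.

```python
# Hypothetical client-side flow: pick the sub-track whose declared area
# covers the target area, then gather the NAL packet numbers its data
# definition container maps to the samples in the playing time period.
from dataclasses import dataclass

@dataclass
class SubTrack:
    track_id: int
    area: tuple               # (x, y, w, h) from the data description container
    nal_ids_by_sample: dict   # sample index -> list of NAL packet numbers

def covers(area, target):
    ax, ay, aw, ah = area
    tx, ty, tw, th = target
    return ax <= tx and ay <= ty and ax + aw >= tx + tw and ay + ah >= ty + th

def extract_nal_packets(sub_tracks, target_area, sample_range):
    """sample_range: indices of the samples inside the playing time period."""
    target = next(s for s in sub_tracks if covers(s.area, target_area))
    return [(i, target.nal_ids_by_sample[i]) for i in sample_range]

# Two sub-tracks splitting a 1920x1080 picture into left and right halves;
# a 640x480 window in the right half selects sub-track 2's NAL packets.
tracks = [SubTrack(1, (0, 0, 960, 1080), {0: [1, 2], 1: [1, 2]}),
          SubTrack(2, (960, 0, 960, 1080), {0: [3, 4], 1: [3, 4]})]
print(extract_nal_packets(tracks, (960, 0, 640, 480), [0, 1]))
```

The selected NAL packets can then be handed to the decoder, which is exactly the "decode to play the picture of the target area" step above.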
  • In one implementation, the area corresponding to each sub-track is composed of at least one block.
  • The video file further includes a sample group description container, where the sample group description container includes the correspondence between each block in the video track and the NAL packets, together with an identifier of that correspondence; the sub-track data definition container corresponding to the target sub-track includes the identifier of the correspondence between each block of the target sub-track and the NAL packets.
  • Determining the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period is then specifically: determining those NAL packets according to the identifier of the correspondence between the blocks and the NAL packets.
  • In this case, for the samples constituting the video track, the same block identifier corresponds to NAL packets of the same number.
  • Alternatively, the sub-track data definition container corresponding to the target sub-track further includes sample information corresponding to the identifier of the correspondence between each block of the target sub-track and the NAL packets.
  • Determining the NAL packets corresponding to the target sub-track is then specifically: determining the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period according to the identifier of the correspondence between each block of the target sub-track and the NAL packets, the sample information corresponding to that identifier, and the sample group description container.
  • The sub-track data definition container may further include a group identifier; the determining unit is further configured to acquire, from the video file according to the group identifier, the sample group description container having that group identifier before determining the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period.
  • In another implementation, the area corresponding to each sub-track is composed of at least one block.
  • The video file further includes a sample group description container, where the sample group description container includes at least one mapping group, and each mapping group includes the correspondence between each block identifier in the video track and the NAL packets.
  • The video file further includes a sample and sample group mapping relationship container, which is configured to indicate the samples corresponding to each of the at least one mapping group.
  • The sub-track data definition container corresponding to the target sub-track includes the identifier of each block of the target sub-track.
  • Determining the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period is then specifically: determining those NAL packets according to the sample group description container, the sample and sample group mapping relationship container, and the identifier of each block of the target sub-track.
  • The sub-track data definition container may include a group identifier; the determining unit is further configured to acquire, from the video file according to the group identifier, the sample group description container having that group identifier and the sample and sample group mapping relationship container having that group identifier before determining the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period.
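The group-identifier lookup in the two implementations above can be sketched as a scan of the video file's containers. The container triples and payload shapes are assumptions for illustration; the four-character codes 'sgpd' (sample group description box) and 'sbgp' (sample-to-group box) are borrowed from the ISO base media file format for concreteness.

```python
# Hypothetical sketch: fetch the sample group description container (and,
# in the second implementation, the sample-to-sample-group mapping
# container) that carries the group identifier named in the sub-track
# data definition container.
def find_container(video_file, box_type, group_id):
    """video_file: list of (box_type, group_id, payload) triples."""
    for btype, gid, payload in video_file:
        if btype == box_type and gid == group_id:
            return payload
    raise KeyError(f"no {box_type} container with group id {group_id}")

video_file = [
    ("sgpd", 7, {"tile_to_nal": {10: [1], 11: [2]}}),   # sample group description
    ("sbgp", 7, {"group_to_samples": {0: [0, 1, 2]}}),  # sample-to-group mapping
]
sgpd = find_container(video_file, "sgpd", 7)
sbgp = find_container(video_file, "sbgp", 7)
```

Only after both containers with the matching group identifier are in hand can the NAL packets for the playing time period be resolved.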
  • According to a second aspect, an apparatus for processing a video is provided.
  • the video track of the video is divided into at least one sub-track, the video track being composed of samples.
  • The device includes a generating unit configured to: generate, for each sub-track of the at least one sub-track, a sub-track data description container and a sub-track data definition container, the sub-track data description container including area information of the sub-track described by that container, where the area information is used to indicate the area corresponding to the sub-track in the picture of the video, and the sub-track data definition container is used to indicate, in the samples constituting the video track, the NAL packets corresponding to the sub-track described by that container; and generate a video file of the video, the video file including the sub-track data description container and the sub-track data definition container generated for each sub-track, together with the samples constituting the video track.
  • a sending unit configured to: send the video file generated by the generating unit.
  • In one implementation, the area corresponding to each sub-track is composed of at least one block; the sub-track data definition container includes, for the samples constituting the video track, the identifier of the correspondence between each block of the sub-track described by that container and the NAL packets.
  • The generating unit is further configured to: before generating the video file of the video, generate a sample group description container, where the sample group description container includes the correspondence between each block in the video track and the NAL packets together with an identifier of that correspondence.
  • The video file further includes the sample group description container.
  • In another implementation, the area corresponding to each sub-track is composed of at least one block; the sub-track data definition container includes the identifier of each block of the sub-track described by that container.
  • The generating unit is further configured to: before generating the video file of the video, generate a sample group description container and a sample and sample group mapping relationship container, where the sample group description container includes at least one mapping group, each mapping group including the correspondence between each block identifier in the video track and the NAL packets, and the sample and sample group mapping relationship container is used to indicate the samples corresponding to each of the at least one mapping group.
  • The video file further includes the sample group description container and the sample and sample group mapping relationship container.
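The generation side of this aspect can be sketched as assembling the containers into one file structure. All dictionary keys and field names below are illustrative assumptions; only 'sgpd' echoes a real ISO BMFF box name, and the rest stand in for the patent's description, definition, and sample data.

```python
# Hypothetical generation-side sketch: for each sub-track emit an area
# description and a tile->correspondence-identifier definition, derive the
# sample group description container from the definitions, and bundle
# everything with the samples into one video-file structure.
def generate_video_file(sub_track_areas, tile_corr_ids, samples):
    """sub_track_areas: {track_id: (x, y, w, h)};
    tile_corr_ids: {track_id: {tile_id: correspondence_id}}."""
    description = [{"track_id": t, "area": a} for t, a in sub_track_areas.items()]
    definition = [{"track_id": t, "tile_correspondence": ids}
                  for t, ids in tile_corr_ids.items()]
    # sample group description: correspondence identifier -> tile id
    sgpd = {cid: tile
            for ids in tile_corr_ids.values() for tile, cid in ids.items()}
    return {"description": description, "definition": definition,
            "sgpd": sgpd, "samples": samples}

vf = generate_video_file({1: (0, 0, 960, 1080)}, {1: {10: 100}}, ["s0", "s1"])
```

A receiver following the first aspect would read back exactly these three kinds of containers plus the samples.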
  • According to a third aspect, a method of processing a video is provided.
  • the video track of the video is divided into at least one sub-track, each sub-track is defined by a sub-track data description container and a sub-track data definition container.
  • The method includes: receiving a video file corresponding to the video, the video file including at least one sub-track data description container, at least one sub-track data definition container, and the samples constituting the video track, where the sub-track data description container includes area information of the sub-track described by that container, the area information is used to indicate the area corresponding to the sub-track in the picture of the video, and the sub-track data definition container is used to indicate, in the samples constituting the video track, the network abstraction layer (NAL) packets corresponding to the sub-track described by that container; determining a target area that needs to be extracted in the picture of the video and a playing time period that needs to be extracted; determining, according to the video file, the samples corresponding to the playing time period among the samples constituting the video track; determining, among the at least one sub-track, the sub-track corresponding to the target area as the target sub-track according to the target area and the area information of the sub-tracks included in the sub-track data description containers; and determining, according to the sub-track data definition container corresponding to the target sub-track, the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period, and decoding the determined NAL packets to play the picture of the target area in the playing time period.
  • In one implementation, the area corresponding to each sub-track is composed of at least one block.
  • The video file further includes a sample group description container, where the sample group description container includes the correspondence between each block in the video track and the NAL packets together with an identifier of that correspondence; the sub-track data definition container corresponding to the target sub-track includes the identifier of the correspondence between each block of the target sub-track and the NAL packets.
  • Determining, according to the sub-track data definition container corresponding to the target sub-track, the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period then includes: determining those NAL packets according to the sample group description container and the identifier of the correspondence between each block of the target sub-track and the NAL packets in the samples constituting the video track.
  • In one case, for the samples constituting the video track, the same block identifier corresponds to NAL packets of the same number.
  • In another case, for at least two of the samples constituting the video track, the same block identifier in the area corresponding to the sub-track corresponds to NAL packets of different numbers.
  • In the latter case, the sub-track data definition container corresponding to the target sub-track further includes sample information corresponding to the identifier of the correspondence between each block of the target sub-track and the NAL packets; determining the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period then includes: determining those NAL packets according to the identifier of the correspondence between each block of the target sub-track and the NAL packets, the sample information corresponding to that identifier, and the sample group description container.
  • The sub-track data definition container may further include a group identifier; before the NAL packets are determined, the method further includes acquiring, from the video file according to the group identifier, the sample group description container having that group identifier.
  • In another implementation, the area corresponding to each sub-track is composed of at least one block.
  • The video file further includes a sample group description container, where the sample group description container includes at least one mapping group, and each mapping group includes the correspondence between each block identifier in the video track and the NAL packets.
  • The video file further includes a sample and sample group mapping relationship container, which is configured to indicate the samples corresponding to each of the at least one mapping group.
  • The sub-track data definition container corresponding to the target sub-track includes the identifier of each block of the target sub-track.
  • Determining the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period then includes: determining those NAL packets according to the sample group description container, the sample and sample group mapping relationship container, and the identifier of each block of the target sub-track.
  • The sub-track data definition container may include a group identifier; before the NAL packets corresponding to the target sub-track are determined according to the sample group description container, the sample and sample group mapping relationship container, and the identifier of each block of the target sub-track, the method further includes acquiring, from the video file according to the group identifier, the sample group description container having that group identifier and the sample and sample group mapping relationship container having that group identifier.
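The mapping-group variant above can be sketched as two dictionary lookups per sample. The data shapes here (mapping groups as tile-to-NAL dictionaries, a sample-to-group index) are illustrative assumptions, not the patent's container syntax.

```python
# Hypothetical sketch: resolve the NAL packets of the target sub-track for
# the samples in the playing time period, using the sample group description
# container (mapping groups), the sample-to-sample-group mapping container,
# and the target sub-track's block (tile) identifiers.
def resolve_nal(mapping_groups, sample_to_group, tile_ids, samples):
    """mapping_groups: {group_idx: {tile_id: [NAL numbers]}};
    sample_to_group: {sample_idx: group_idx};
    samples: indices of samples inside the playing time period."""
    out = {}
    for s in samples:
        group = mapping_groups[sample_to_group[s]]
        out[s] = [n for t in tile_ids for n in group[t]]
    return out

# Two mapping groups: samples 0-1 use group 0, sample 2 uses group 1,
# so the same tile id maps to different NAL numbers across samples.
groups = {0: {10: [1, 2], 11: [3]}, 1: {10: [2, 3], 11: [4]}}
print(resolve_nal(groups, {0: 0, 1: 0, 2: 1}, [10], [1, 2]))
```

This is why the second implementation needs the sample-to-group container: the tile-to-NAL correspondence is allowed to change from sample to sample.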
  • According to a fourth aspect of the embodiments of the present invention, a method of processing a video is provided.
  • the video track of the video is divided into at least one sub-track, the video track being composed of samples.
  • The method includes: generating, for each sub-track of the at least one sub-track, a sub-track data description container and a sub-track data definition container, where the sub-track data description container includes area information of the sub-track described by that container, the area information is used to indicate the area corresponding to the sub-track in the picture of the video, and the sub-track data definition container is used to indicate, in the samples constituting the video track, the NAL packets corresponding to the sub-track described by that container; and generating a video file of the video, the video file including the sub-track data description container and the sub-track data definition container generated for each sub-track, together with the samples constituting the video track.
  • In one implementation, the area corresponding to each sub-track is composed of at least one block; the sub-track data definition container includes, for the samples constituting the video track, the identifier of the correspondence between each block of the sub-track described by that container and the NAL packets.
  • Before the video file is generated, the method further includes generating a sample group description container, where the sample group description container includes the correspondence between each block in the video track and the NAL packets together with an identifier of that correspondence.
  • The video file further includes the sample group description container.
  • In another implementation, the area corresponding to each sub-track is composed of at least one block; the sub-track data definition container includes the identifier of each block of the sub-track described by that container.
  • Before the video file is generated, the method further includes generating a sample group description container and a sample and sample group mapping relationship container, where the sample group description container includes at least one mapping group, each mapping group including the correspondence between each block identifier in the video track and the NAL packets, and the sample and sample group mapping relationship container is used to indicate the samples corresponding to each of the at least one mapping group.
  • The video file further includes the sample group description container and the sample and sample group mapping relationship container.
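Generating those two extra containers can be sketched as deduplicating the per-sample tile-to-NAL correspondences into mapping groups. The function name and input shape are illustrative assumptions.

```python
# Hypothetical sketch: build the sample group description container (the
# list of distinct mapping groups) and the sample-to-sample-group mapping
# container from the per-sample tile->NAL correspondences.
def build_group_containers(per_sample_tile_nal):
    """per_sample_tile_nal: list, one {tile_id: [NAL numbers]} per sample.
    Identical per-sample mappings are deduplicated into one mapping group."""
    mapping_groups, sample_to_group = [], {}
    for idx, mapping in enumerate(per_sample_tile_nal):
        if mapping not in mapping_groups:
            mapping_groups.append(mapping)
        sample_to_group[idx] = mapping_groups.index(mapping)
    return mapping_groups, sample_to_group

# Samples 0 and 1 share a mapping, sample 2 uses a different one.
mg, s2g = build_group_containers([{10: [1]}, {10: [1]}, {10: [2]}])
```

Deduplication is the point of the mapping-group design: when many samples share one correspondence, the file stores it once and maps samples to it, instead of repeating it per sample.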
  • According to a fifth aspect, an apparatus for processing a video is provided.
  • The video track of the video is divided into at least one sub-track, and each sub-track is described by a sub-track data description container and a sub-track data definition container.
  • The device includes a memory, a processor, and a receiver. The receiver receives a video file corresponding to the video, the video file including at least one sub-track data description container, at least one sub-track data definition container, and the samples constituting the video track; the sub-track data description container includes area information of the sub-track described by that container.
  • the information is used to indicate an area corresponding to the sub-track in the picture of the video
  • the sub-track data definition container is used to indicate the network abstraction layer NAL packet corresponding to the sub-track described by the sub-track data definition container in the samples constituting the video track.
  • The memory is configured to store executable instructions, and the processor executes the executable instructions stored in the memory so as to: determine a target area that needs to be extracted in the picture of the video and a playing time period that needs to be extracted; determine, according to the video file received by the receiver, the samples corresponding to the playing time period among the samples constituting the video track; determine, among the at least one sub-track, the sub-track corresponding to the target area as the target sub-track according to the target area and the area information of the sub-tracks included in the sub-track data description containers; and determine, according to the sub-track data definition container corresponding to the target sub-track, the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period, so that the determined NAL packets are decoded to play the picture of the target area in the playing time period.
  • According to a sixth aspect, an apparatus for processing a video is provided.
  • the video track of the video is divided into at least one sub-track, and the video track is composed of samples.
  • the device includes: a memory, a processor, and a transmitter.
  • the memory is used to store executable instructions.
  • The processor executes the executable instructions stored in the memory so as to: generate, for each sub-track of the at least one sub-track, a sub-track data description container and a sub-track data definition container, where the sub-track data description container includes area information of the sub-track described by that container, the area information is used to indicate the area corresponding to the sub-track in the picture of the video, and the sub-track data definition container is used to indicate, in the samples constituting the video track, the NAL packets corresponding to the sub-track described by that container; and generate a video file of the video, the video file including the sub-track data description container and the sub-track data definition container generated for each sub-track and the samples constituting the video track. The transmitter sends the video file.
  • In the embodiments of the present invention, the sub-track corresponding to the target area is determined as the target sub-track among the at least one sub-track according to the target area and the area information of the sub-tracks described by the sub-track data description containers, and the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period are determined according to the sub-track data definition container corresponding to the target sub-track, so that those NAL packets can be decoded to play the picture of the target area in the playing time period, thereby effectively implementing extraction of a region picture in a video.
  • FIG. 1 is a schematic diagram of a scene in which an area picture in a video needs to be extracted.
  • FIG. 2 is a schematic diagram of another scene in which an area picture in a video needs to be extracted.
  • Figure 3a is a schematic block diagram of an apparatus for processing video in accordance with one embodiment of the present invention.
  • FIG. 3b is a schematic block diagram of an apparatus for processing a video according to another embodiment of the present invention.
  • FIG. 4a is a schematic block diagram of an apparatus for processing a video according to another embodiment of the present invention.
  • FIG. 4b is a schematic block diagram of an apparatus for processing a video according to another embodiment of the present invention.
  • Figure 5a is a schematic flow diagram of a method of processing a video in accordance with one embodiment of the present invention.
  • FIG. 5b is a schematic flowchart of a method of processing a video according to another embodiment of the present invention.
  • Figure 6a is a schematic illustration of an image frame in a scene to which an embodiment of the present invention may be applied.
  • Figure 6b is a schematic illustration of another image frame in a scene to which an embodiment of the present invention may be applied.
  • FIG. 7 is a schematic flow diagram of a process of a method of processing a video in accordance with one embodiment of the present invention.
  • Figure 8 is a schematic illustration of a block in accordance with one embodiment of the present invention.
  • Figure 9 is a schematic illustration of the correspondence between a partition and a NAL packet, in accordance with one embodiment of the present invention.
  • FIG. 10 is a schematic diagram of a correspondence relationship between a block and a NAL packet according to another embodiment of the present invention.
• FIG. 11 is a schematic diagram of a correspondence between a block and a NAL packet according to another embodiment of the present invention.
  • Figure 12 is a schematic illustration of the block shown in Figure 8 in a planar coordinate system.
  • Figure 13 is a schematic flow chart of a process of a method of processing a video corresponding to the process of Figure 7.
  • Figure 14 is a schematic illustration of a target sub-track corresponding to a target area, in accordance with one embodiment of the present invention.
  • Figure 15 is a diagram showing descriptive information of a sub-track according to an embodiment of the present invention.
  • Figure 16 is a diagram showing descriptive information of a sub-track according to another embodiment of the present invention.
• FIG. 17 is a schematic flow chart of a process of a method of processing a video according to another embodiment of the present invention.
  • Fig. 18 is a schematic flow chart showing the procedure of a method of processing a video corresponding to the process of Fig. 17.
• Figure 19 is a diagram showing descriptive information of a sub-track according to an embodiment of the present invention.

DETAILED DESCRIPTION
  • a video program can contain different types of media streams, and different types of media streams can be referred to as different tracks.
  • a video stream may be referred to as a video track
  • an audio stream may be referred to as an audio track
  • a subtitle stream may be referred to as a subtitle track.
  • Embodiments of the invention relate to processing for video tracks.
  • a video track may refer to a set of samples arranged in chronological order, such as a video stream for a period of time.
• a sample is media data of a single type corresponding to one time stamp.
• the Sub Track mechanism is a method for grouping samples in a video track defined in the ISO Base Media File Format (ISOBMFF) of the International Organization for Standardization.
• the sub-track mechanism can be used primarily for media selection or media switching. That is to say, multiple sub-tracks obtained using a grouping criterion can be substituted or switched for one another.
• the picture of a target area can be extracted from the picture of the video based on the sub-track mechanism.
  • the video may be encoded by the HEVC method.
  • Video encoded by the HEVC method can be stored as a video file in accordance with the framework defined by ISOBMFF.
  • the basic unit constituting the video file may be a container, and the video file may be composed of a group of containers.
• the container can contain a header and a payload.
• the payload is the data contained in the container, such as media data, metadata, or other containers.
• the header of the container can indicate the type and length of the container.
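As an illustrative sketch of how such a container header might be read, assuming the usual ISOBMFF layout of a 32-bit length followed by a four-character type (the function and variable names below are hypothetical, not part of the described embodiments):

```python
import struct

def read_box_header(data, offset=0):
    """Read an ISOBMFF container (box) header: 32-bit size + 4-char type.

    Returns (box_type, box_size, payload_offset). A size of 1 means a
    64-bit extended size follows; a size of 0 means the box runs to EOF.
    """
    size, = struct.unpack_from(">I", data, offset)
    box_type = data[offset + 4:offset + 8].decode("ascii")
    payload_offset = offset + 8
    if size == 1:  # 64-bit extended size follows the type field
        size, = struct.unpack_from(">Q", data, offset + 8)
        payload_offset = offset + 16
    return box_type, size, payload_offset

# Example: a minimal 16-byte "strk" (Sub Track Box) with an empty payload
box = struct.pack(">I4s", 16, b"strk") + b"\x00" * 8
print(read_box_header(box))  # ('strk', 16, 8)
```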
  • the video track of the video can be obtained.
  • the video track of the video may be divided into at least one video sub-track (abbreviated as a sub-track in the embodiment of the invention), and each sub-track may correspond to an area in the video picture.
  • the video track consists of a set of samples (ie consisting of at least two samples), and the picture presented by each sample is the video picture. It can therefore be understood that each sample can correspond to each of the sub-tracks of at least one of the sub-tracks described above.
• each sample is also composed of consecutive NAL packets. It can be understood that "consecutive" NAL packets in the embodiment of the present invention means that there are no other bytes between the NAL packets.
  • Each sample corresponds to each of the at least one sub-track described above, and it is understood that each sub-track may correspond to one or more consecutive NAL packets in a sample.
  • each sub-track can be described by a sub-track data description container (Sub Track Information Box) and a sub-track data definition container (Sub Track Definition Box).
  • Sub-track data description containers describing the same sub-track and sub-track data definition containers can be packaged in a sub-track container (Sub Track Box). That is, each sub-track can be described by a sub-track container, which can include a sub-track data description container and a sub-track data definition container describing the sub-track.
  • the sub-track data description container may include area information of the sub-track, and the area information of the sub-track may indicate the corresponding area of the sub-track in the video picture.
  • the subtrack data definition container can describe the data contained in the subtrack.
  • the sub-track data definition container may indicate a network abstraction layer (NAL) packet corresponding to the sub-track of the sub-track data definition container in each sample.
• NAL is short for Network Abstraction Layer.
  • the video file corresponding to the video may include at least one sub-track data description container and at least one sub-track data definition container and samples constituting the video track.
• Therefore, in order to extract the target area from the picture of the video and play the picture of the target area within a certain playing time period, it is necessary to acquire the NAL packets of the target area within the playing time period and decode the acquired NAL packets to play the picture of the target area within the playing time period.
  • each sub-track corresponds to an area in the video picture
• the sub-track corresponding to the target area can be determined according to the target area and the area information of the sub-tracks in the sub-track data description containers; in the embodiment of the present invention, this sub-track is referred to as the target sub-track.
  • the sample corresponding to the playing time period can be determined based on the playing time period that needs to be extracted.
• the sub-track data definition container corresponding to each sub-track may indicate the NAL packets corresponding to that sub-track in each sample. Therefore, after the samples corresponding to the playing time period are determined, the NAL packets corresponding to the target sub-track in those samples can be determined according to the sub-track data definition container corresponding to the target sub-track, for example, by determining the numbers of the NAL packets corresponding to the target sub-track. These NAL packets can then be obtained from the video file and decoded to play the picture of the target area within the playing time period.
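The lookup just described can be sketched in simplified form as follows; the dictionary layout is an illustrative stand-in for the containers, not the actual box syntax defined by ISOBMFF:

```python
def extract_nal_packets(sub_tracks, samples, target_area, play_period):
    """Return the NAL packets needed to play `target_area` during `play_period`.

    sub_tracks: list of dicts with 'region' (x, y, w, h) and 'nal_numbers'
                mapping sample index -> NAL packet numbers (the role of the
                sub-track data description / definition containers).
    samples:    list of dicts with 'time' and 'nal_packets' (number -> bytes).
    """
    # 1. Determine the target sub-track from the region information.
    target = next(t for t in sub_tracks if t["region"] == target_area)
    # 2. Determine the samples that fall inside the playing time period.
    start, end = play_period
    picked = [i for i, s in enumerate(samples) if start <= s["time"] < end]
    # 3. Use the sub-track data definition to pick the NAL packets per sample.
    out = []
    for i in picked:
        for num in target["nal_numbers"][i]:
            out.append(samples[i]["nal_packets"][num])
    return out
```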
• Figure 3a is a schematic block diagram of an apparatus for processing video in accordance with one embodiment of the present invention.
  • An example of the device 300a of Figure 3a may be a file parser, or a user device containing a file parser, and the like.
  • the device 300a includes a receiving unit 310a and a determining unit 320a.
  • the video track of the video is divided into at least one sub-track, each sub-track being defined by a sub-track data description container and a sub-track data definition container.
• the receiving unit 310a receives a video file corresponding to the video. The video file includes at least one sub-track data description container, at least one sub-track data definition container, and the samples constituting the video track. The sub-track data description container includes the area information of the sub-track described by the container, the area information of the sub-track is used to indicate the area corresponding to the sub-track in the picture of the video, and the sub-track data definition container is used to indicate, in the samples constituting the video track, the NAL packets corresponding to the sub-track described by the container.
  • the determining unit 320a determines a target area that needs to be extracted in the picture of the video and a playing time period that needs to be extracted.
  • the determining unit 320a further determines samples corresponding to the playing time period in the samples constituting the video track according to the video file received by the receiving unit 310a.
• the determining unit 320a further determines, according to the target area and the area information of the sub-tracks included in the sub-track data description containers, the sub-track corresponding to the target area as the target sub-track among the at least one sub-track.
• the determining unit 320a further determines, according to the sub-track data definition container corresponding to the target sub-track, the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period; the determined NAL packets are decoded to play the picture of the target area within the playing time period.
• In this way, the sub-track corresponding to the target area is determined as the target sub-track among the at least one sub-track according to the target area and the area information of the sub-tracks described by the sub-track data description containers, and the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period are determined according to the sub-track data definition container corresponding to the target sub-track, so that these NAL packets can be decoded to play the picture of the target area within the playing time period, thereby effectively implementing extraction of an area picture in a video.
  • the area corresponding to the sub track may be composed of at least one block.
  • the video file may also include a sample group description container, which may include a correspondence between each of the blocks in the video track and the NAL package and an identification of the correspondence between each of the blocks and the NAL package.
• the sub-track data definition container corresponding to the target sub-track may include identifiers of the correspondences between each block of the target sub-track and the NAL packets in the samples constituting the video track.
• the determining unit 320a determining, according to the sub-track data definition container corresponding to the target sub-track, the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period may specifically be: determining, according to the sample group description container and the identifiers of the correspondences between each block of the target sub-track and the NAL packets in the samples constituting the video track, the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period.
• in different samples, blocks with the same identifier may correspond to NAL packets with the same numbers.
• alternatively, for at least two samples among the samples constituting the video track, at least one block among the blocks with the same identifier may correspond to NAL packets with different numbers.
• the sub-track data definition container corresponding to the target sub-track may further include sample information corresponding to the identifiers of the correspondences between each block of the target sub-track and the NAL packets.
• the determining unit 320a determining, according to the sample group description container and the identifiers of the correspondences between each block of the target sub-track and the NAL packets in the samples constituting the video track, the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period may specifically be: determining the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period according to the sample information corresponding to the identifiers of the correspondences between each block of the target sub-track and the NAL packets, the identifiers of those correspondences, and the sample group description container.
  • the sub-track data definition container may further include a group identifier.
  • the determining unit 320a may further obtain a sample group description container having the group identifier from the video file according to the group identifier before determining the NAL packet corresponding to the target sub-track in the sample corresponding to the playing time period.
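A minimal sketch of this group-identifier lookup, assuming the containers in the video file are available as (grouping type, payload) pairs (the function name and data layout are illustrative only):

```python
def find_container_by_group_id(containers, group_id):
    """Return the sample group description container whose grouping type
    matches the group identifier carried in the sub-track data definition
    container. `containers` is a list of (grouping_type, payload) pairs."""
    for grouping_type, payload in containers:
        if grouping_type == group_id:
            return payload
    raise KeyError(f"no sample group description container for {group_id!r}")
```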
  • the area corresponding to the sub track may be composed of at least one block.
• the video file may further include a sample group description container, and the sample group description container may include at least one mapping group, where each mapping group of the at least one mapping group includes the correspondence between each block identifier in the video track and the NAL packets.
  • the video file may further include a sample and sample group mapping relationship container, and the sample and sample group mapping relationship container is used to indicate a sample corresponding to each mapping group in the at least one mapping group.
• the sub-track data definition container corresponding to the target sub-track includes the identifier of each block of the target sub-track.
• the determining unit 320a determining, according to the sub-track data definition container corresponding to the target sub-track, the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period is specifically: determining, according to the sample group description container, the sample and sample group mapping relationship container, and the identifier of each block of the target sub-track, the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period.
  • the sub-track data definition container may include a group identifier.
• the determining unit 320a may further obtain, before determining the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period, the sample group description container having the group identifier and the sample and sample group mapping relationship container having the group identifier from the video file according to the group identifier.
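The role of these two containers in resolving NAL packets can be sketched with simplified stand-in structures (the names below are illustrative, not the box syntax):

```python
def nal_for_sample(sample_index, sample_to_group, mapping_groups, tile_ids):
    """Resolve the NAL packet numbers of the given blocks for one sample.

    sample_to_group: list where entry i is the mapping-group index of sample i
                     (the role of the sample and sample group mapping container).
    mapping_groups:  list of dicts block_id -> NAL packet numbers
                     (the role of the sample group description container).
    tile_ids:        block identifiers of the target sub-track.
    """
    group = mapping_groups[sample_to_group[sample_index]]
    nals = []
    for tid in tile_ids:
        nals.extend(group[tid])
    return nals
```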
• FIG. 3b is a schematic block diagram of an apparatus for processing a video according to another embodiment of the present invention.
  • An example of the device 300b of Figure 3b may be a file parser, or a user device containing a file parser, and the like.
  • Device 300b includes a memory 310b, a processor 320b, and a receiver 330b.
  • Memory 310b may include random access memory, flash memory, read only memory, programmable read only memory, nonvolatile memory or registers, and the like.
  • the processor 320b may be a Central Processing Unit (CPU).
  • the memory 310b is for storing executable instructions.
  • Processor 320b can execute the executable instructions stored in memory 310b.
  • the video track of the video is divided into at least one sub-track, each sub-track is defined by a sub-track data description container and a sub-track data definition container.
• Receiver 330b receives a video file corresponding to the video. The video file includes at least one sub-track data description container, at least one sub-track data definition container, and the samples constituting the video track. The sub-track data description container includes the area information of the sub-track described by the container, the area information of the sub-track is used to indicate the area corresponding to the sub-track in the picture of the video, and the sub-track data definition container is used to indicate, in the samples constituting the video track, the NAL packets corresponding to the sub-track described by the container.
• the processor 320b executes executable instructions stored in the memory 310b, and is configured to: determine a target area to be extracted in the picture of the video and a playing time period to be extracted; determine, according to the video file received by the receiver, the samples corresponding to the playing time period among the samples constituting the video track; determine, according to the target area and the area information of the sub-tracks included in the sub-track data description containers, the sub-track corresponding to the target area as the target sub-track among the at least one sub-track; and determine, according to the sub-track data definition container corresponding to the target sub-track, the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period, where the determined NAL packets are decoded to play the picture of the target area within the playing time period.
• In this way, the sub-track corresponding to the target area is determined as the target sub-track among the at least one sub-track according to the target area and the area information of the sub-tracks described by the sub-track data description containers, and the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period are determined according to the sub-track data definition container corresponding to the target sub-track, so that these NAL packets can be decoded to play the picture of the target area within the playing time period, thereby effectively implementing extraction of an area picture in a video.
  • the device 300b can perform the process of the method performed by the file parser in Fig. 5a, Fig. 13, or Fig. 18 below. Therefore, the specific operations and functions of the device 300b will not be described herein.
• FIG. 4a is a schematic block diagram of an apparatus for processing a video according to another embodiment of the present invention.
  • An example of the device 400a of Fig. 4a may be a file generator, or a server including a file generator, and the like.
  • the device 400a includes a generating unit 410a and a transmitting unit 420a.
  • the video track of the video is divided into at least one sub-track, and the video track is composed of samples.
• the generating unit 410a generates, for each sub-track of the at least one sub-track, a sub-track data description container and a sub-track data definition container, where the sub-track data description container includes the area information of the sub-track described by the container, the area information of the sub-track is used to indicate the area corresponding to the sub-track in the picture of the video, and the sub-track data definition container is used to indicate, in the samples constituting the video track, the NAL packets corresponding to the sub-track described by the container.
  • the generating unit 410a also generates a video file of the video, the video file including a sub-track data description container and a sub-track data definition container generated for each sub-track and samples constituting the video track.
  • Sending unit 420a transmits the video file generated by the generating unit 410a.
• In this way, a sub-track data description container and a sub-track data definition container are generated for each sub-track of the at least one sub-track, where the sub-track data description container includes the area information of the sub-track described by the container, the area information of the sub-track is used to indicate the area corresponding to the sub-track in the picture of the video, and the sub-track data definition container indicates, in the samples constituting the video track, the NAL packets corresponding to the sub-track described by the container; and a video file including the sub-track data description container and the sub-track data definition container generated for each sub-track and the samples constituting the video track is generated, so that a file parser can determine the target sub-track corresponding to the target area according to the area information of the sub-tracks, and can determine, according to the sub-track data definition container, the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period, so as to play the picture of the target area within the playing time period, thereby effectively implementing extraction of an area picture in a video.
  • the area corresponding to the sub track may be composed of at least one block.
• the sub-track data definition container may include identifiers of the correspondences between each block of the sub-track described by the container and the NAL packets in the samples constituting the video track.
• the generating unit 410a may generate a sample group description container before generating the video file of the video, and the sample group description container may include the correspondence between each block in the video track and the NAL packets and an identifier of the correspondence between each block and the NAL packets.
  • the video file may further include the sample group description container.
• in different samples, blocks with the same identifier may correspond to NAL packets with the same numbers.
• the sub-track data definition container may further include sample information corresponding to the identifiers of the correspondences between each block of the sub-track described by the container and the NAL packets.
• the sub-track data definition container and the sample group description container may each include the same group identifier.
  • the area corresponding to the sub track may be composed of at least one block.
• the sub-track data definition container may include the identifier of each block of the sub-track described by the sub-track data definition container.
• the generating unit 410a may further generate a sample group description container and a sample and sample group mapping relationship container before generating the video file of the video, where the sample group description container includes at least one mapping group, each mapping group of the at least one mapping group includes the correspondence between each block identifier in the video track and the NAL packets, and the sample and sample group mapping relationship container is used to indicate the samples corresponding to each mapping group of the at least one mapping group.
• the video file may further include the sample group description container and the sample and sample group mapping relationship container.
  • the sub-track data definition container, the sample group description container, and the sample and sample group mapping relationship containers may respectively include the same group identifier.
  • the group identifier of the embodiment of the present invention may refer to a value of a grouping type field in a sub-track data definition container, a sample group description container, and a sample and sample group mapping relationship container.
• FIG. 4b is a schematic block diagram of an apparatus for processing a video according to another embodiment of the present invention.
  • An example of the device 400b of Fig. 4b may be a file generator, or a server including a file generator, and the like.
  • the device 400b includes a memory 410b, a processor 420b, and a transmitter 430b.
  • Memory 410b may include random access memory, flash memory, read only memory, programmable read only memory, nonvolatile memory or registers, and the like.
  • the processor 420b may be a central processing unit (CPU).
  • Memory 410b is used to store executable instructions.
  • Processor 420b can execute executable instructions stored in memory 410b.
  • the video track of the video is divided into at least one sub-track, and the video track is composed of samples.
• the processor 420b executes executable instructions stored in the memory 410b, and is configured to: generate, for each sub-track of the at least one sub-track, a sub-track data description container and a sub-track data definition container, where the sub-track data description container includes the area information of the sub-track described by the container, the area information of the sub-track is used to indicate the area corresponding to the sub-track in the picture of the video, and the sub-track data definition container is used to indicate, in the samples constituting the video track, the NAL packets corresponding to the sub-track described by the container; and generate a video file of the video, the video file including the sub-track data description container and the sub-track data definition container generated for each sub-track and the samples constituting the video track.
  • Transmitter 430b sends the video file.
• In this way, a sub-track data description container and a sub-track data definition container are generated for each sub-track of the at least one sub-track, where the sub-track data description container includes the area information of the sub-track described by the container, the area information of the sub-track is used to indicate the area corresponding to the sub-track in the picture of the video, and the sub-track data definition container includes, in the samples constituting the video track, the NAL packets corresponding to the sub-track described by the container; and a video file including the sub-track data description container and the sub-track data definition container generated for each sub-track and the samples constituting the video track is generated, so that the file parser can determine the target sub-track corresponding to the target area according to the area information of the sub-tracks, and determine, according to the sub-track data definition container, the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period, so as to play the picture of the target area within the playing time period, thereby effectively implementing extraction of an area picture in a video.
  • the device 400b can perform the processes of the method performed by the file generators in Figures 5b, 7 and 17 below, and therefore, the specific functions and operations of the device 400b will not be described herein.
  • Figure 5a is a schematic flow diagram of a method of processing a video in accordance with one embodiment of the present invention. The method of Figure 5a is performed by a file parser.
• the video track of the video may be divided into at least one sub-track, and each sub-track is defined by a sub-track data description container and a sub-track data definition container.
• A video file corresponding to the video is received, where the video file includes at least one sub-track data description container, at least one sub-track data definition container, and the samples constituting the video track; the sub-track data description container includes the area information of the sub-track described by the container; the area information of the sub-track is used to indicate the area corresponding to the sub-track in the picture of the video; and the sub-track data definition container is used to indicate, in the samples constituting the video track, the NAL packets corresponding to the sub-track described by the container.
  • a file parser can receive video files from a file generator.
• the m-th sub-track data description container among the at least one sub-track data description container included in the video file may include the area information of the m-th sub-track among the sub-tracks of the video track, the area information of the m-th sub-track is used to indicate the area corresponding to the m-th sub-track in the picture of the video, and the m-th sub-track data definition container may be used to indicate the NAL packets corresponding to the m-th sub-track in the samples constituting the video track, where m may be a positive integer ranging from 1 to M, and M may be the number of sub-tracks included in the video track.
• 520a. Determine a target area to be extracted in the picture of the video and a playing time period to be extracted.
• the target area may be specified by the user or the program provider in the picture of the video through a corresponding application, and the target area may be an area to be played separately.
• the playing time period may also be specified by the user. If the user does not specify a playing time period, a default may be used, such as the entire playing time period corresponding to the track.
• the video track can be composed of a set of samples arranged in chronological order. Therefore, the file parser can determine the samples corresponding to the playing time period based on the specified playing time period. Determining the samples corresponding to the playing time period based on the specified playing time period belongs to the prior art and is not detailed in the embodiment of the present invention.
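Since the samples are arranged in chronological order, the sample lookup might be sketched as a simple range search over the timestamps (illustrative only; real tracks derive timestamps from timing tables in the file):

```python
from bisect import bisect_left

def samples_in_period(sample_times, start, end):
    """Return indices of samples whose timestamps fall within [start, end).

    sample_times must be sorted, as the video track is a set of samples
    arranged in chronological order.
    """
    lo = bisect_left(sample_times, start)
    hi = bisect_left(sample_times, end)
    return list(range(lo, hi))

# Example: 25 fps samples, extract the period [0.04, 0.12)
print(samples_in_period([0.0, 0.04, 0.08, 0.12], 0.04, 0.12))  # [1, 2]
```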
• The sub-track corresponding to the target area is determined as the target sub-track among the at least one sub-track according to the target area and the area information of the sub-tracks included in the sub-track data description containers.
• The NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period are determined according to the sub-track data definition container corresponding to the target sub-track, and the determined NAL packets are decoded to play the picture of the target area within the playing time period.
  • the sub-track data definition container corresponding to each target sub-track can be used to indicate the NAL packet corresponding to the target sub-track in the samples constituting the video track described above. Therefore, after determining the samples corresponding to the playing time period, the file parser can determine the NAL packets corresponding to each of the target sub-tracks in the samples according to the sub-track data definition container. In this way, the decoder can decode the NAL packets determined by the file parser to play the picture in the target area during the playing time period.
• In this way, the sub-track corresponding to the target area is determined as the target sub-track among the at least one sub-track according to the target area and the area information of the sub-tracks described by the sub-track data description containers, and the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period are determined according to the sub-track data definition container corresponding to the target sub-track, so that these NAL packets can be decoded to play the picture of the target area within the playing time period, thereby effectively implementing extraction of an area picture in a video.
  • Since the sub-track mechanism is used for media selection and media switching, in a video file usually only one sub-track corresponds to one track; even when multiple sub-tracks correspond to one track, the number of sub-tracks is relatively small.
  • The sub-tracks may correspond to the sub-track data description container and the sub-track data definition container, so that the NAL packets corresponding to each target sub-track in the samples within the playback time period can be quickly determined according to the above two containers. Therefore, the processing time is relatively short and the user experience is better.
  • The area corresponding to each sub-track may be composed of at least one block; a block is obtained by dividing the picture.
  • A block is a rectangular area obtained by dividing the picture of the video with a grid, and each block can be independently decoded. It can be understood that a block is obtained by dividing the picture of the video, that is, by dividing the image frames of the video. The partitioning of each image frame is the same, so within a track the number of blocks and the positions of the blocks are the same for all samples.
  • the area corresponding to each sub-track may be composed of one block or a plurality of adjacent blocks, and the area formed by these blocks may be a rectangular area.
  • the area corresponding to one sub-track may be composed of a plurality of adjacent blocks, and these blocks may form a rectangular area.
  • the area corresponding to a sub-track consists of one block.
  • The picture of the video can be divided into multiple blocks, and a single block often contains only limited content, for example, only a part of a video object, where a video object may refer to a person or an object in the video picture.
  • the area information of each sub-track may include the size and location of the area corresponding to the sub-track. That is, the area information of the mth sub-track may include the size and position of the area corresponding to the m-th sub-track.
  • The size and position of the area corresponding to each sub-track can be described in pixels.
  • the width and height of the area can be described by pixels, and the position of the area can be represented by the horizontal offset and vertical offset of the area relative to the upper left pixel of the video picture.
  • The file parser may compare the region corresponding to each sub-track with the target region to determine whether they overlap; if there is overlap, it may determine that the sub-track corresponds to the target area.
  • the area corresponding to the sub track may be a rectangular area composed of at least one block.
  • the shape of the target area specified by the user or the program provider may be arbitrary, for example, may be a rectangle, a triangle, a circle, or the like.
  • the overlap is usually judged based on the rectangle.
  • the rectangle corresponding to the target area can be determined. If the shape of the target area itself is a rectangle, then the rectangle corresponding to the target area is also the target area itself.
  • If the shape of the target area itself is not a rectangle, then a rectangle containing the target area needs to be selected as the object of judgment.
  • For example, if the target area is a triangle, the rectangle corresponding to the target area may be the smallest rectangle containing the triangular area.
  • The file parser can determine the horizontal offset of the upper left corner of the rectangle corresponding to the target area relative to the upper left corner of the picture.
  • The sub-track data description container corresponding to the sub-track includes the area information of the sub-track, and the area information may indicate the size and position of the area corresponding to the sub-track. Therefore, the file parser can determine, according to the area information, the horizontal offset of the upper left corner of the area corresponding to the sub-track relative to the upper left corner of the picture, and determine the maximum of the two horizontal offsets; this maximum is called the maximum of the left boundaries of the two rectangles. It should be understood that the picture mentioned here can also be understood as an image frame of the video.
  • The file parser can determine the vertical offset of the upper left corner of the rectangle corresponding to the target area relative to the upper left corner of the picture.
  • The file parser may determine, according to the area information of the sub-track, the vertical offset of the upper left corner of the area corresponding to the sub-track relative to the upper left corner of the picture, and determine the maximum of the two vertical offsets; this maximum is called the maximum of the upper boundaries of the two rectangles.
  • The file parser may determine the sum of the horizontal offset of the upper left corner of the rectangle corresponding to the target area relative to the upper left corner of the picture and the width of the rectangle corresponding to the target area.
  • The file parser may determine, according to the area information of the sub-track, the sum of the horizontal offset of the area corresponding to the sub-track and the width of that area, and determine the minimum of the two sums; this minimum is called the minimum of the right boundaries of the two rectangles.
  • The file parser may determine the sum of the vertical offset of the upper left corner of the rectangle corresponding to the target area relative to the upper left corner of the picture and the height of the rectangle corresponding to the target area.
  • The file parser may determine, according to the area information of the sub-track, the sum of the vertical offset of the upper left corner of the area corresponding to the sub-track relative to the upper left corner of the picture and the height of that area, and determine the minimum of the two sums; this minimum is called the minimum of the lower boundaries of the two rectangles.
  • The file parser can determine that the two areas do not overlap when the maximum of the left boundaries of the two rectangles is greater than or equal to the minimum of the right boundaries, or the maximum of the upper boundaries is greater than or equal to the minimum of the lower boundaries; otherwise, the file parser can determine that the two areas overlap.
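The boundary comparisons above can be written as a minimal sketch in Python. The tuple layout (horizontal offset, vertical offset, width, height) and the function names are illustrative, not taken from the patent.

```python
def regions_overlap(target, region):
    """Overlap test following the boundary comparisons above.

    Each rectangle is (h_off, v_off, width, height): pixel offsets of its
    upper-left corner relative to the upper-left pixel of the picture,
    plus its size. The tuple layout is illustrative, not from the patent.
    """
    max_left = max(target[0], region[0])    # maximum of the left boundaries
    max_top = max(target[1], region[1])     # maximum of the upper boundaries
    min_right = min(target[0] + target[2], region[0] + region[2])
    min_bottom = min(target[1] + target[3], region[1] + region[3])
    # No overlap when max_left >= min_right or max_top >= min_bottom.
    return max_left < min_right and max_top < min_bottom


def bounding_rectangle(points):
    """Smallest rectangle containing a non-rectangular target area,
    given its vertices as (x, y) pixel coordinates."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))
```

A non-rectangular target area (such as the triangle mentioned above) would first be replaced by its bounding rectangle before the overlap test is applied.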
  • Each sub-track data description container may further include an information flag (Flag), where the information flag may indicate that the sub-track data description container includes the area information of the sub-track described by the container.
  • The area information of each sub-track may further include at least one of the following: identifier information indicating whether the area corresponding to the sub-track can be independently decoded, the block identifiers (IDs) of the blocks included in the area corresponding to the sub-track, and the identifier of the area corresponding to the sub-track.
  • the area corresponding to the sub track may be composed of at least one block.
  • The video file may also include a sample group description container, which may include the correspondence between each block in the video track and NAL packets, together with an identifier of each such correspondence.
  • The sub-track data definition container corresponding to the target sub-track may include, for the samples constituting the video track, the identifier of the correspondence between each block of the target sub-track and NAL packets.
  • The file parser may determine, according to the identifiers of the correspondences between each block of the target sub-track and NAL packets included in the container, the NAL packets corresponding to the target sub-track in the samples corresponding to the playback time period.
  • The area corresponding to each sub-track may be composed of at least one block, so the NAL packets corresponding to a sub-track can be understood as the NAL packets corresponding to each block in the sub-track.
  • Each sub-track data definition container may include the identifier of the correspondence between each block of the sub-track described by that container and NAL packets.
  • The identifier of the correspondence between a block and NAL packets may be a group description index, represented by the "group_description_index" field.
  • the sample group description container may include a correspondence relationship between each of the blocks in the video track and the NAL packet and an identifier of the corresponding relationship.
  • The identifier of the correspondence may be an index, and the index may indicate the position at which the correspondence is stored in the sample group description container.
  • the identifier of the correspondence may be an entry index, represented by an "Entry_Index" field.
  • Each correspondence may include the identifier of the block, the number of the starting NAL packet corresponding to the block, and the number of corresponding NAL packets; the identifier of the correspondence identifies the correspondence between a block of the target sub-track and NAL packets.
  • The file parser may obtain, from the sample group description container, the correspondences indicated by the identifiers of the correspondences between each block of the target sub-track and NAL packets, and determine the NAL packets corresponding to the target sub-track based on the obtained correspondences.
  • Specifically, the file parser may search the sample group description container, according to the identifiers of the correspondences between each block of the target sub-track and NAL packets in the samples constituting the video track, for the correspondences indicated by those identifiers, and then determine, based on the starting NAL packet number and the NAL packet count in each found correspondence, the NAL packets corresponding to each block of the target sub-track in the samples constituting the video track. Thereby, the NAL packets corresponding to each block of the target sub-track in the samples corresponding to the playing time period can be determined.
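The lookup described above can be sketched as follows. The entry layout (block_id, start_nal_number, nal_count) is a simplified stand-in for the sample group description container, and all names and example values are illustrative.

```python
def subtrack_nal_packets(correspondence_ids, sample_group_entries):
    """Resolve the NAL packet numbers for a target sub-track.

    correspondence_ids: the "group_description_index" values listed in the
        sub-track data definition container, one per block of the sub-track.
    sample_group_entries: simplified stand-in for the sample group
        description container, mapping an identifier to a correspondence
        (block_id, start_nal_number, nal_count).
    """
    numbers = []
    for cid in correspondence_ids:
        block_id, start_nal, nal_count = sample_group_entries[cid]
        # Each block contributes nal_count consecutive NAL packet numbers.
        numbers.extend(range(start_nal, start_nal + nal_count))
    return sorted(numbers)
```

With entries such as `{1: (0, 0, 2), 2: (1, 2, 3)}`, a sub-track covering both blocks resolves to NAL packets 0 through 4.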
  • For the samples constituting the video track, blocks with the same block identifier in the area corresponding to each sub-track may correspond to NAL packets with the same numbers.
  • In each sample, the i-th block may correspond to NAL packets with the same numbers, where i may be a positive integer from 1 to K, and K may be the total number of blocks in the area corresponding to one sub-track.
  • the blocks indicated by the same block identifier may correspond to the same numbered NAL packet.
  • The number of correspondences contained in the sample group description container is the same as the total number of blocks in the video track; that is, there are as many correspondences as there are blocks.
  • The blocks indicated by the same identifier may correspond to NAL packets with the same numbers. In that case, the sub-track data definition container corresponding to each sub-track need not include sample information for each sample, such as a sample identifier or a sample count.
  • For at least two of the samples constituting the video track, at least one block with the same block identifier may correspond to NAL packets with different numbers.
  • The sub-track data definition container corresponding to the target sub-track may further include sample information corresponding to the identifiers of the correspondences between each block of the target sub-track and NAL packets.
  • The file parser may determine the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period according to the identifiers of the correspondences between each block of the target sub-track and NAL packets, the sample information corresponding to those identifiers, and the sample group description container.
  • the blocks indicated by the same block identifier may correspond to different numbered NAL packets.
  • the i-th block may correspond to a different numbered NAL packet, i is a positive integer from 1 to K, and K is the total number of blocks in a region corresponding to one sub-track.
  • That is, the same block identifier may correspond to different starting NAL packet numbers or different NAL packet counts.
  • the sub-track data definition container may further include sample information, and the sample information may be used to indicate a sample corresponding to the identifier of the correspondence between each of the partitions and the NAL packet.
  • The sample information may include the number of consecutive samples.
  • the number of samples can be represented using the "sample_count" field.
  • The number of consecutive samples and the identifier of the correspondence may be in one-to-one correspondence.
  • The identifiers of the correspondences are arranged in the chronological order of the samples they correspond to in the video track. It can also be understood that the samples are grouped according to the correspondence between each block and NAL packets.
  • If the same blocks in two samples correspond to the same NAL packets, the two samples will correspond to the same correspondence identifier; if the same block corresponds to different NAL packets, the two samples will correspond to different correspondence identifiers.
  • Specifically, the samples corresponding to the playing time period may be determined according to the sample information, together with the identifiers of the correspondences between each block of the target sub-track and NAL packets corresponding to those samples; then, according to the determined identifiers, the correspondences indicated by those identifiers are obtained from the sample group description container, and the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period are determined.
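The run-length pairing of "sample_count" fields with correspondence identifiers can be sketched like this; the list-of-pairs layout is an illustrative reading of the container, not the exact box syntax.

```python
def correspondence_id_for_sample(runs, sample_number):
    """Map a 1-based sample number to its correspondence identifier.

    runs: list of (sample_count, correspondence_id) pairs arranged in the
    chronological order of the samples they cover -- an illustrative
    reading of the one-to-one pairing of "sample_count" values and
    correspondence identifiers described above.
    """
    covered = 0
    for sample_count, correspondence_id in runs:
        covered += sample_count
        if sample_number <= covered:
            return correspondence_id
    raise ValueError("sample number not covered by the runs")
```

For example, with runs `[(10, 1), (20, 2), (24, 1)]` covering 54 samples, sample 15 falls in the second run and maps to identifier 2.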
  • each sub-track data definition container may include a group identification.
  • the file parser may obtain a sample group description container having the group identifier from the video file according to the group identifier. That is, the sub-track data definition container includes the same group identifier as the sample group description container.
  • sample group description containers there may be multiple sample group description containers, and different sample group description containers may be used to describe the characteristics of samples grouped based on different criteria. For example, samples in a video track may be grouped based on a correspondence between a block and a NAL packet, and a sample group description container for such a grouping standard may be used to describe a correspondence between each block and a NAL packet. The grouping can be based on the temporal layer to which the sample belongs, and the sample group description container for such grouping criteria can be used to describe information about the temporal layer.
  • In the embodiments of the present invention, the file parser needs to obtain, from the video file, the sample group description container describing the correspondence between blocks and NAL packets. Therefore, the sub-track data definition container and the sample group description container may include group identifiers with the same value, so that the file parser may acquire the corresponding sample group description container based on the group identifier in the sub-track data definition container.
  • The group identifier in the sub-track data definition container and the group identifier in the sample group description container may both be a grouping type, represented by the "grouping_type" field.
  • the area corresponding to the sub track may be composed of at least one block.
  • The video file may further include a sample group description container including at least one mapping group, each of which includes the correspondence between each block identifier in the video track and NAL packets.
  • the video file may further include a sample and sample group mapping relationship container, and the sample and sample group mapping relationship container is used to indicate samples corresponding to each of the at least one mapping group.
  • The sub-track data definition container corresponding to the target sub-track may include the identifier of each block in the target sub-track.
  • The file parser may determine the NAL packets corresponding to the target sub-track in the samples corresponding to the playback time period according to the sample group description container, the sample-to-sample-group mapping relationship container, and the identifier of each block of the target sub-track.
  • The sample group description container may include at least one mapping group, and each mapping group may include the correspondence between each block identifier in the video track and NAL packets.
  • Each mapping group may have a corresponding identification.
  • the identity of the mapping group may be an entry index, represented by an "Entry_Index" field.
  • Each correspondence may include the identifier of a block in the video track and the number of the starting NAL packet corresponding to that block.
  • the sample group description container may include a mapping group, in which case, for samples constituting the video track, the blocks indicated by the same block identifier correspond to the same numbered NAL packets.
  • The sample group description container can include multiple mapping groups, each different from the others. In this case, for the samples constituting the video track, blocks indicated by at least one of the same block identifiers correspond to NAL packets with different numbers; that is, in any two mapping groups, the correspondence between at least one block and NAL packets is different.
  • the video file may further include a sample and sample group mapping relationship container, and the sample and sample group mapping relationship container may be used to indicate a sample corresponding to each mapping group.
  • The sample-to-sample-group mapping relationship container may include the identifier of each mapping group and the corresponding number of consecutive samples. The identifiers of the mapping groups are arranged in the chronological order of the samples in the video track. Based on the sample-to-sample-group mapping relationship, the correspondence between each block and NAL packets in each sample can be determined.
  • the file parser may determine the mapping group identifier corresponding to the sample corresponding to the playing time period according to the sample and sample group mapping relationship container. The mapping group indicated by the mapping group identifier can then be determined in the sample group description container according to the determined mapping group identifier.
  • The file parser may determine each block identifier in the target sub-track according to the sub-track data definition container corresponding to the target sub-track, and may then determine, in the mapping group determined above, the numbers of the NAL packets corresponding to each block identifier in the target sub-track.
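The two-step lookup of this second scheme can be sketched as follows; the container layouts here are simplified stand-ins and every name is illustrative.

```python
def nal_numbers_for_sample(sample_number, sbgp_runs, mapping_groups, block_ids):
    """Second scheme above: the sample-to-sample-group mapping relationship
    container picks the mapping group for a sample, and the sample group
    description container supplies each block's starting NAL packet number.

    sbgp_runs: list of (sample_count, mapping_group_id) pairs in sample order.
    mapping_groups: dict mapping_group_id -> {block_id: start_nal_number}.
    block_ids: the block identifiers listed in the sub-track data
        definition container of the target sub-track.
    """
    covered = 0
    for sample_count, group_id in sbgp_runs:
        covered += sample_count
        if sample_number <= covered:
            group = mapping_groups[group_id]
            # Return the starting NAL packet number for each block
            # of the target sub-track in this sample.
            return {bid: group[bid] for bid in block_ids}
    raise ValueError("sample not covered by the mapping relationship container")
```

For instance, if samples 1-2 use mapping group 1 and samples 3-5 use mapping group 2, the same block identifier can resolve to different starting NAL packet numbers depending on the sample.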
  • each sub-track data definition container may include a group identification.
  • the file parser may obtain, from the video file, a sample group description container having the group identifier and a sample and sample group mapping relationship container having the group identifier according to the group identifier.
  • sample group description containers there may be multiple sample group description containers, and different sample group description containers may be used to describe the characteristics of samples grouped based on different criteria. For example, samples in a video track may be grouped based on a correspondence between a block and a NAL packet, and a sample group description container for such a grouping standard may be used to describe a correspondence between each block and a NAL packet. The grouping can be based on the temporal layer to which the sample belongs, and the sample group description container for such grouping criteria can be used to describe information about the temporal layer.
  • sample and sample group mapping relationship containers there may be multiple sample and sample group mapping relationship containers, and different sample and sample group mapping relationship containers may be used to indicate individual sample groups based on different grouping criteria.
  • the samples in the video track may be grouped based on the correspondence between the block and the NAL packet, and the sample and sample group mapping relationship container for such a grouping standard may be used to indicate that each block and the NAL packet are based on Each sample group divided by the correspondence.
  • the grouping can be based on the temporal layer to which the sample belongs, and the sample and sample group mapping relationship container for such grouping criteria can be used to indicate individual sample groups based on temporal layer partitioning.
  • In the embodiments of the present invention, the file parser needs to obtain, from the video file, the sample group description container describing the correspondence between blocks and NAL packets, and to acquire the sample-to-sample-group mapping relationship container indicating the sample groups divided based on that correspondence. Therefore, the sub-track data definition container, the sample group description container, and the sample-to-sample-group mapping relationship container may include group identifiers with the same value, so that the file parser may obtain the corresponding sample group description container and sample-to-sample-group mapping relationship container based on the group identifier in the sub-track data definition container.
  • The group identifiers in the sub-track data definition container, the sample group description container, and the sample-to-sample-group mapping relationship container may each be a grouping type, represented by the "grouping_type" field.
  • the sub-track data definition container may not include the group identifier.
  • The value of the group identifier of the sub-track data definition container may be preset. In this way, the preset value of the group identifier may be obtained first, and then the corresponding sample group description container and sample-to-sample-group mapping relationship container are obtained according to that value.
  • FIG. 5b is a schematic flowchart of a method of processing a video according to another embodiment of the present invention.
  • the method of Figure 5b is performed by a media file generator.
  • the method of Fig. 5b corresponds to the method of Fig. 5a, and in Fig. 5b, the same description will be omitted as appropriate.
  • the video track of the video is divided into at least one sub-track, the video track consisting of samples.
  • The sub-track data description container includes the area information of the sub-track described by the container, where the area information is used to indicate the area corresponding to the sub-track in the picture of the video, and the sub-track data definition container indicates the NAL packets corresponding to the sub-track described by that container in the samples constituting the video track.
  • the video file includes a sub-track data description container and a sub-track data definition container generated for each sub-track and a sample constituting the video track.
  • the file generator can send a video file to a file parser.
  • A sub-track data description container and a sub-track data definition container are generated for each sub-track of the at least one sub-track. The sub-track data description container includes the area information of the sub-track described by the container, where the area information is used to indicate the area corresponding to the sub-track in the picture of the video, and the sub-track data definition container indicates the NAL packets corresponding to the sub-track described by that container in the samples constituting the video track. A video file is generated that includes the sub-track data description container and sub-track data definition container generated for each sub-track, together with the samples constituting the video track, so that the file parser can determine the target sub-track corresponding to the target area according to the area information of the sub-tracks, and can determine, according to the sub-track data definition container, the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period, so as to play the picture of the target area in the playing time period, thereby effectively extracting a region picture within the video.
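The per-sub-track container pair the generator emits can be modeled with a small data-structure sketch. All class and field names are illustrative (they do not reproduce the actual box syntax), and the "grouping_type" value used in the example is a placeholder.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubTrackDataDescription:
    """Area information for one sub-track (field names are illustrative)."""
    horizontal_offset: int  # pixels from the picture's upper-left corner
    vertical_offset: int
    width: int
    height: int


@dataclass
class SubTrackDataDefinition:
    """NAL mapping information for the same sub-track."""
    grouping_type: str               # value of the "grouping_type" field
    correspondence_ids: List[int]    # one identifier per block


@dataclass
class VideoFile:
    """One description/definition pair per sub-track, plus the samples
    of the video track (samples elided in this sketch)."""
    descriptions: List[SubTrackDataDescription] = field(default_factory=list)
    definitions: List[SubTrackDataDefinition] = field(default_factory=list)
```

A generator would append one description and one definition per sub-track before writing out the samples.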
  • the area corresponding to each sub track may be composed of at least one block.
  • The sub-track data definition container may include, for the samples constituting the video track, the identifier of the correspondence between each block of the sub-track described by that container and NAL packets.
  • The file generator may further generate a sample group description container including the correspondence between each block in the video track and NAL packets, together with an identifier of each such correspondence.
  • the video file may further include a sample group description container.
  • For the samples constituting the video track, blocks with the same identifier may correspond to NAL packets with the same numbers.
  • For at least two of the samples constituting the video track, at least one block with the same block identifier may correspond to NAL packets with different numbers.
  • The sub-track data definition container may further include sample information corresponding to the identifiers of the correspondences between each block of the sub-track described by that container and NAL packets.
  • each sub-track data definition container and the sample group description container respectively include the same group identification.
  • the area corresponding to each sub track may be composed of at least one block.
  • The sub-track data definition container may include the identifier of each block of the sub-track described by that container.
  • The file generator may generate a sample group description container and a sample-to-sample-group mapping relationship container, where the sample group description container includes at least one mapping group, each of which includes the correspondence between each block identifier in the video track and NAL packets, and the sample-to-sample-group mapping relationship container is used to indicate the samples corresponding to each mapping group in the at least one mapping group.
  • The video file may further include the sample group description container and the sample-to-sample-group mapping relationship container.
  • the sub-track data definition container, the sample group description container, and the sample and sample group mapping relationship containers respectively include the same group identifier.
  • Figure 6a is a schematic illustration of an image frame in a scene to which an embodiment of the present invention may be applied.
  • Figure 6b is a schematic illustration of another image frame in a scene to which an embodiment of the present invention may be applied.
  • Figures 6a and 6b may be two image frames of the same video during playback. As shown in Figures 6a and 6b, the middle rectangular area may be the target area in the video picture specified by the user through the terminal. According to the user's needs, the picture of the target area needs to be displayed separately within a certain time period.
  • FIG. 7 is a schematic flow diagram of a process of a method of processing a video in accordance with one embodiment of the present invention.
  • the method of Figure 7 is performed by a file generator.
  • the file generator determines a correspondence between the block and the NAL packet in the video track.
  • the video picture can be divided into a plurality of blocks, that is, the image frame of the video is divided into a plurality of blocks.
  • the number of blocks and the block position of all image frames of the video are the same, so the number of blocks and the position of the block are the same for all samples constituting the video track.
  • Figure 8 is a schematic illustration of blocks in accordance with one embodiment of the present invention. As shown in Figure 8, the image frame shown in Figure 6a may be divided into four blocks, namely block 0, block 1, block 2, and block 3.
  • the size of the four partitions can be the same, with block IDs of 0, 1, 2, and 3, respectively.
  • The block division in other image frames of this video is the same as that in Figure 8, and will not be described again.
  • the video track of the video can consist of 54 samples.
  • the division of the blocks in each image frame is the same as that shown in Fig. 8, that is, the division of the corresponding blocks for each sample is also the same as that shown in Fig. 8.
  • Each partition may correspond to one or more consecutive NAL packets.
  • the correspondence between the block and the NAL packet may include a block ID, a number of the start NAL packet corresponding to the block, and a number of NAL packets corresponding to the block.
  • the starting NAL packet corresponding to the blocking is the first NAL packet in the consecutive NAL packets corresponding to the blocking.
  • the block ID can be recorded as tileID.
  • The numbers of the NAL packets corresponding to a block can be determined from the number of the starting NAL packet corresponding to the block and the number of corresponding NAL packets. If the starting NAL packet numbers and NAL packet counts are the same for two samples, the samples belong to the same sample group; otherwise, they belong to different sample groups. Regarding the correspondence between blocks and NAL packets, there are two cases:
  • the total number of correspondences between the partition and the NAL packet may be the same as the total number of partitions.
  • Figure 9 is a schematic illustration of the correspondence between a partition and a NAL packet, in accordance with one embodiment of the present invention.
  • the NAL packets corresponding to each partition are separated by horizontal dashed lines.
  • Table 1 shows the correspondence between blocks and NAL packets in Fig. 9. Since, for all samples, blocks indicated by the same block ID correspond to NAL packets with the same numbers, there are in total four kinds of correspondences between blocks and NAL packets in the video track; that is, the total number of correspondences is the same as the number of blocks. For example, block 0 can correspond to 2 NAL packets, with the starting NAL packet numbered 0; block 1 can correspond to 3 NAL packets, with the starting NAL packet numbered 2; and so on.
  • FIG. 10 is a schematic diagram of a correspondence between a block and a NAL packet according to another embodiment of the present invention.
  • the image frame shown in Fig. 6a can be composed of block 0 to block 3, and the NAL packets in each block can be separated by horizontal dashed lines.
  • Table 2 shows the correspondence shown in Fig. 10. As shown in Table 2, block 1 can correspond to 1 NAL packet, with the starting NAL packet numbered 0; block 2 can correspond to 3 NAL packets, with the starting NAL packet numbered 2; and so on.
  • FIG. 11 is a schematic diagram of a correspondence between a block and a NAL packet according to another embodiment of the present invention.
  • the image frame shown in FIG. 6b may also be composed of block 0 to block 3, and the NAL packets in each block may be separated by a horizontal line.
  • the correspondence between each block and the NAL packets in Fig. 11 is different from the correspondence shown in Fig. 10. Table 3 shows the correspondence shown in Fig. 11.
  • block 1 may correspond to 3 NAL packets, and the starting NAL packet is numbered 0.
  • Block 2 can correspond to 3 NAL packets, and the starting NAL packet is numbered 3. And so on.
  • Tables 2 and 3 above show eight correspondences between blocks and NAL packets in total.
  • for each sample, the correspondence between its blocks and the NAL packets conforms to four of the above eight correspondences. Therefore, in the video track, the above eight correspondences between blocks and NAL packets are shared by all samples.
  • the file generator generates a sample group description container according to the correspondence between the block and the NAL package in step 701.
  • the identifier of the above correspondence may be an entry index.
  • the sample group description container may include an integer number of sub-sample and NAL packet mapping relationship entries (Sub Sample NALU Map Entry), and the specific number of entries is the same as the number of correspondences between blocks and NAL packets in the video track.
  • the mapping relationship entry of each sub-sample and the NAL packet may include an entry index, a blocking ID, a number of the starting NAL packet corresponding to the blocking, and a number of NAL packets corresponding to the blocking.
  • the mapping relationship entry of each subsample with the NAL packet may include the following fields: Entry_Index, tileID, NALU_start_number, and NALU_number.
  • the "Entry_Index" field can represent the index of the entry, that is, the identity of the correspondence between the block and the NAL package.
  • the "tilelD” field may indicate the block ID
  • the "NALU_start_number” field may identify the number of the starting NAL packet corresponding to the block
  • the "NALU_number” field may indicate the number of NAL packets corresponding to the block. The specific meaning of each field is shown in Table 4.
  • the sample set description container may also include the group identification mentioned in the embodiment of Figure 5a.
  • the group identifier may be a grouping type, and the grouping type may be represented by a "Grouping_type" field; the value of the field may indicate that the sample group description container is used to describe a sample grouping based on the correspondence between blocks and NAL packets.
  • the field can take the value "ssnm".
  • a data structure of the mapping relationship entry between the sub-sample and the NAL packet can be expressed with the fields tileID, NALU_start_number, and NALU_number, preceded by a size flag.
  • Table 4 shows the meaning of the fields in the above data structure: for the size flag, a value of 1 means the subsequent fields occupy 2 bytes each, and a value of 0 means they occupy 1 byte each; tileID is the block ID; NALU_start_number is the number of the starting NAL packet corresponding to the block; and NALU_number is the number of NAL packets corresponding to the block.
  • Table 5 shows the contents of the sample group description container for the case where the correspondence between the block and the NAL packet is the case (A).
  • Table 6 shows the contents of the sample group description container when the correspondence between the partition and the NAL packet is the case (B).
  • in Tables 5 and 6, each row is one mapping relationship entry record between a sub-sample and the NAL packets.
  • the "Entry_Index" field can indicate where the mapping relationship entry of each sub-sample and NAL packet is stored in the sample group description container, and the following three fields are the contents recorded in that entry.
  • the file generator divides the video track into sub-tracks based on the block.
  • Each sub-track can be composed of one or more blocks that form a rectangular area.
  • in this embodiment, each sub-track can be composed of one block, so the four blocks described above correspond to four sub-tracks, respectively.
  • For each sub-track, the file generator generates a sub-track data description container for describing the sub-track.
  • the sub-track data description container may include area information of the sub-tracks described by the container.
  • each sub-track data description container may further include a flag indicating that the sub-track data description container includes the area information of the sub-track described by the container.
  • the flag may be a "flag" field, and the "flag" field may be given a specific value indicating that the sub-track data description container includes the area information of the sub-track described by the container.
  • for example, when the value of the "flag" field is "1", it may indicate that the sub-track data description container includes the area information of the sub-track described by the container.
  • the area information of the sub-track may include the size and position of the area corresponding to the sub-track. Table 7 shows the attributes in the area information of the sub-track.
  • the size of the area corresponding to the sub-track can be represented by the width and height of the area, and the position of the area corresponding to the sub-track can be represented by the horizontal offset and vertical offset of the upper-left pixel of the area relative to the upper-left pixel of the image.
  • the area information of the sub-track described by the sub-track data description container may include the following attributes:

    unsigned int(32) horizontal_offset;  // horizontal offset
    unsigned int(32) vertical_offset;    // vertical offset
    unsigned int(32) region_width;       // area width
    unsigned int(32) region_height;      // area height
    unsigned int(32) tile_count;         // block number
  • Table 7 Attributes and corresponding meanings of the area information of the sub-track
  • Figure 12 is a schematic illustration of the block shown in Figure 8 in a planar coordinate system.
  • Table 8 shows the size and position of the area corresponding to each of the blocks shown in Fig. 12. As shown in Table 8, the size and position of the area corresponding to each block are represented by pixels.
  • Block ID | Width (pixels) | Height (pixels) | Horizontal coordinate (pixels) | Vertical coordinate (pixels)
  • For each sub-track, the file generator generates a sub-track data definition container for describing the sub-track.
  • the sub-track data definition container may include description information of the sub-track described by the container, and the description information of the sub-track may indicate the correspondence between each block in the sub-track and the NAL packets.
  • the sub-track data definition container may include a sub-track and a sample group mapping relationship container (Sub Track Sample Group Box), and the mapping relationship container of the sub-track and the sample group may include one or more pieces of description information of the sub-track.
  • the specific contents included in the description information of the sub-track can also be divided into two cases.
  • in case (A), the mapping relationship container of the sub-track and the sample group may include an integer number of pieces of description information of the sub-track. Each piece of description information may include a group description index, and the group description index may be represented by a "group_description_index" (group description index) field. The number of "group_description_index" fields is the same as the number of blocks corresponding to this sub-track.
  • the "group_description_index" field may be used to indicate the identifier of the correspondence between each block in the sub-track described by the sub-track data definition container and the NAL packets.
  • Each partition may correspond to one sample group, and the sample group may include one or more consecutive samples, and the sample group is divided based on the correspondence between the partition and the NAL packet.
  • the number of "group_description-index" fields may also be the same as the number of sample groups corresponding to the sub-track. Therefore, the number of pieces of description information of the sub track is the same as the number of blocks in the sub track, and the number of sample groups corresponding to the sub track is also the same.
  • the mapping relationship container of the sub-track and the sample group may further include a grouping type, and the grouping type may be represented by a "grouping_type" field. The "grouping_type" field may indicate that the sub-track data definition container describes sub-track information based on the correspondence between the block and the NAL packet. The value of the "grouping_type" field can also be "ssnm". It can be seen that the value of the "grouping_type" field in the sub-track data definition container is the same as the value of the "grouping_type" field in the sample group description container, so the sub-track data definition container corresponds to the above sample group description container.
  • a data structure of the mapping relationship container of the sub-track and the sample group can be expressed as follows:

    aligned(8) class SubTrackSampleGroupBox extends FullBox('stsg', 0, 1) {
        unsigned int(32) grouping_type;    // the value is "ssnm"
        unsigned int(16) item_count;
        for (i = 0; i < item_count; i++) {
            unsigned int(32) group_description_index;
        }
    }

  • "grouping_type" can indicate the grouping type, and "item_count" can represent the number of pieces of description information of the sub-track contained in the mapping relationship container of the sub-track and the sample group; each piece contains the above "group_description_index" field.
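A minimal sketch of writing and reading the body of this box in Python follows; the byte layout (size/type header, then version and flags) is assumed per the ISO base media file format, and the function names are illustrative, not part of the document:

```python
import struct

def pack_stsg(grouping_type: bytes, indices: list[int]) -> bytes:
    # Serialize a SubTrackSampleGroupBox ('stsg'), version 0, flags 1,
    # following the structure described above. A sketch, not a
    # spec-complete writer.
    payload = struct.pack(">B", 0) + (1).to_bytes(3, "big")   # version, flags
    payload += grouping_type                                  # e.g. b"ssnm"
    payload += struct.pack(">H", len(indices))                # item_count
    for index in indices:
        payload += struct.pack(">I", index)                   # group_description_index
    return struct.pack(">I4s", 8 + len(payload), b"stsg") + payload

def unpack_stsg(box: bytes):
    # Parse the box produced by pack_stsg back into its fields.
    size, box_type = struct.unpack_from(">I4s", box, 0)
    assert box_type == b"stsg" and size == len(box)
    grouping_type = box[12:16]                                # after version/flags
    (item_count,) = struct.unpack_from(">H", box, 16)
    indices = [struct.unpack_from(">I", box, 18 + 4 * i)[0]
               for i in range(item_count)]
    return grouping_type, indices
```

A round trip with `pack_stsg(b"ssnm", [2, 3])` recovers the grouping type "ssnm" and the index list unchanged.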
  • Each sub-track may correspond to one sub-track container, and the sub-track container may include a sub-track data description container corresponding to the sub-track and a sub-track data definition container corresponding to the sub-track.
  • Table 9 shows an example of a sub track box (Sub Track Box) of the first sub track in case (A).
  • a sub-track data description container and a sub-track data definition container are included in the sub-track container.
  • attribute information of the sub-track may be included in the sub-track data description container.
  • the attribute information of the sub track may include an ID, a horizontal offset, a vertical offset, a region width, a region height, a tile ID, and an independence field.
  • the ID in the sub-track data description container is also the ID of the sub-track container, and may represent the sub-track described by the sub-track container.
  • the horizontal offset, vertical offset, area width, and area height are used to indicate the size and position of the area corresponding to the sub track.
  • the sub-track data definition container may include a mapping relationship container of the sub-track and the sample group, and the mapping relationship container of the sub-track and the sample group includes description information of the sub-track.
  • the description information of the sub-track can be used to indicate the NAL packet corresponding to each block in the sub-track.
  • the description information of the sub track may include a group description index.
  • the sub-track data definition container may include a "grouping_type" field, and the value of the field is "ssnm", so the sub-track data definition container may be described with a sample group whose value is "singm” in the "grouping_type” field. The container corresponds.
  • the sub-track data definition container may correspond to the sample group description container shown in Table 5.
  • the mapping relationship container of the sub-track and the sample group may include description information of one sub-track.
  • the group description index has a value of "1", which can indicate that the block with the block ID "0" in the samples composing the video track corresponds to the correspondence between the block and the NAL packet indicated by the "Entry_Index" field with the value "1" in the sample group description container whose "grouping_type" field takes the value "ssnm".
  • if the area corresponding to a sub-track is composed of multiple blocks, description information of multiple pieces may be included in the mapping relationship container of the sub-track and the sample group, and the number of pieces of description information is the same as the number of blocks.
  • for example, if the area corresponding to the sub-track is composed of 3 blocks, the mapping relationship container of the sub-track and the sample group may include three pieces of description information of the sub-track.
  • in case (B), each piece of description information of a sub-track can include a "sample_count" field and a "group_description_index" field.
  • the "sample_count" field may indicate the number of consecutive samples that conform to one correspondence between the block and the NAL packets; that is, the consecutive samples indicated by the "sample_count" field constitute a sample group conforming to that correspondence.
  • the "group_description_index" field can be used to indicate the identifier of the correspondence between each block in the sample group and the NAL packets. It can be seen that the number of pieces of description information of the sub-track is the same as the number of sample groups.
  • the mapping relationship container of the sub-track and the sample group may also include a "grouping_type" (grouping type) field, which may indicate that the sub-track data definition container describes sub-track information based on the correspondence between the block and the NAL packet.
  • the value of the "grouping_type" field can also be "ssnm". It can be seen that the value of the "grouping_type" field in the sub-track data definition container is the same as the value of the "grouping_type" field in the sample group description container, so the sub-track data definition container corresponds to the above sample group description container.
  • the order of the pieces of description information of the sub-tracks is arranged in the order of the consecutive samples indicated by the "sample_count" field in the video track.
  • in case (B), a data structure of the sub-track and sample group mapping relationship container can be expressed as follows:

    aligned(8) class SubTrackSampleGroupBox extends FullBox('stsg', 0, 1) {
        unsigned int(32) grouping_type;    // the value is "ssnm"
        unsigned int(16) item_count;
        for (i = 0; i < item_count; i++) {
            unsigned int(32) sample_count;
            unsigned int(32) group_description_index;
        }
    }

  • "item_count" may represent the number of pieces of description information of the sub-track; each piece contains the "sample_count" field and the "group_description_index" field.
  • Each sub-track may correspond to one sub-track container, and the sub-track container may include a sub-track data description container corresponding to the sub-track and a sub-track data definition container corresponding to the sub-track.
  • Table 10 shows an example of the sub-track container corresponding to the first sub-track in the case (B).
  • the sub-track container may include a sub-track data description container and a sub-track data definition container.
  • the sub-track data description container may include attribute information of the sub-track, and the attribute information may include an ID, a horizontal offset, a vertical offset, a region width, a region height, a tile ID, and an independence field.
  • the sub-track data definition container may include a mapping relationship container of the sub-track and the sample group, and the mapping relationship container of the sub-track and the sample group may include description information of the sub-track.
  • the description information of the sub-track may be used to indicate the NAL packets corresponding to the respective blocks in the sub-track. Specifically, the description information of the sub-track may include a group description index and a sample count.
  • the video to which the image frames of Figures 6a and 6b belong may include 54 image frames, which may be single layer encoded video, then each image frame may correspond to one sample, for a total of 54 samples.
  • the sub-track data definition container may include a "grouping_type" field, and the value of the field is
  • the subtrack data definition container can correspond to the sample group description container whose value is "grouping_type” and also "ssnm".
  • the sub-track data definition container may correspond to the sample group description container shown in Table 6.
  • the area corresponding to the first sub-track consists of a block with a block ID of "0".
  • the "group_ description-index” field takes the value "1" and the “sample-count” field takes the value "10".
  • the block with the block ID of "0" among the 10 samples from 1st to 10th can correspond to the "Entry_Index" in the sample group description container of the "grouping_type” field and the value of "ssnm".
  • the field value is the correspondence between the block indicated by ⁇ and the NAL packet.
  • the "group_ description_index” field takes the value "5", "sample_count "The field value is "30", then it can be said that the block with the block ID "0" among the 30 samples from the 11th to the 40th can correspond to the value of the "Entry_Index” field in the sample group description container.
  • the value of the "group_ description-index” field is " ⁇ ,” and the "sample_count” field is taken.
  • the value is "8", which means that the block with the block ID "0" among the 8 samples from the 41st to the 48th can correspond to the value of "Entry_Index” in the sample group description container.
  • the value of the "group_ description-index” field is "5"
  • the value of the "sample_count” field is "6”, which can represent 6th to 49th.
  • the block with the block ID of "0" in the sample can correspond to the correspondence between the block indicated by ⁇ and the NAL packet in the "Entry_Index” field in the sample group description container.
  • in case (B), the samples can be grouped separately for the correspondence of each block with the NAL packets. For example, if the area corresponding to the sub-track consists of two blocks, the samples may be divided into four groups based on the correspondence between the first block and the NAL packets, and into three groups based on the correspondence between the second block and the NAL packets; then there can be 7 pieces of description information in the sub-track and sample group mapping relationship container.
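The grouping of consecutive samples that share the same block-to-NAL correspondence, described above, can be sketched as a run-length computation; this is a sketch assuming a per-sample list of correspondence identifiers for one block (the function name is illustrative):

```python
from itertools import groupby

def build_description_pieces(per_sample_entry_index):
    # per_sample_entry_index: for one block, the identifier of the
    # block-to-NAL correspondence that each sample conforms to, in
    # sample order. Collapse runs of consecutive equal identifiers into
    # (sample_count, group_description_index) pieces.
    return [(len(list(run)), idx)
            for idx, run in groupby(per_sample_entry_index)]
```

With the document's 54-sample example (identifiers 1 for 10 samples, 5 for 30, 1 for 8, then 5 for 6), this yields the four pieces (10, 1), (30, 5), (8, 1), (6, 5).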
  • the file generator generates a video file, where the video file includes the sample group description container, the sub-track data description container for describing each sub-track, the sub-track data definition container for describing each sub-track, and the samples constituting the video track.
  • the video file may include a sub-track container corresponding to each sub-track
  • the sub-track container may include a sub-track data description container and a sub-track data definition container corresponding to the sub-track.
  • the video file may include a sample group description container whose "grouping_type" field takes the value "ssnm" and four sub-track containers, and may include the samples constituting the video track.
  • the file generator sends a video file to the file parser.
  • in the technical solution above, one sub-track data description container and one sub-track data definition container are generated for each sub-track.
  • each sub-track data description container includes the area information of the sub-track, and each sub-track data definition container includes the description information of the sub-track.
  • the description information of the sub-track is used to indicate the NAL packets corresponding to each block in the sub-track, so that the file parser can determine the target sub-track corresponding to the target area according to the area information of the sub-tracks, and can determine, according to the description information of the target sub-track in the sub-track data definition container and the sample group description container, the NAL packets corresponding to the target sub-track in the samples within the playing time period, so as to play the picture of the target area in the playing time period, thereby effectively supporting playback of the target area.
  • Figure 13 is a schematic flow chart of a process of a method of processing a video corresponding to the process of Figure 7. The method of Figure 13 is performed by a file parser.
  • the file parser receives the video file from the file generator.
  • the video track of the video can be divided into at least one sub track.
  • the video file may include at least one sub-track data description container and at least one sub-track data definition container and samples constituting the video track.
  • Each subtrack can be defined by a subtrack data description container and a subtrack data definition container.
  • the file parser determines the size and location of the target area to be extracted in the video picture, and the playing time period that needs to be extracted.
  • the file parser may acquire the size and position of the rectangle corresponding to the target area to be extracted from the application, and the playing time period corresponding to the target area to be extracted determined by the user selection or application.
  • the shape of the target area specified by the user or the program provider may be arbitrary, and may be, for example, a rectangle, a triangle, a circle, or the like.
  • the overlap is usually judged based on the rectangle. Then, the rectangle corresponding to the target area can be determined. If the shape of the target area itself is a rectangle, then the rectangle corresponding to the target area is also the target area itself. If the shape of the target area itself is not a rectangle, then a rectangle containing the target area needs to be selected as the judgment object.
  • for example, if the shape of the target area is a triangle, the rectangle corresponding to the target area may be the smallest rectangle containing the triangular area.
  • the size of the rectangle corresponding to the target area can be represented by the width and height of the rectangle, and the position of the rectangle corresponding to the target area can be represented by the horizontal offset and the vertical offset of the upper-left corner of the rectangle relative to the upper-left corner of the picture.
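Selecting the smallest rectangle containing a non-rectangular target area, and representing it by its offsets, width and height, can be sketched as follows; the function name and the vertex-list input are assumptions for illustration:

```python
def bounding_rect(vertices):
    # vertices: (x, y) pixel coordinates of the target shape, e.g. the
    # three corners of a triangular target area.
    # Returns (horizontal_offset, vertical_offset, width, height) of the
    # smallest axis-aligned rectangle containing the shape.
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)
```

For a triangle with corners (0, 0), (4, 0) and (2, 3), the corresponding rectangle has offsets (0, 0), width 4 and height 3.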
  • the file parser determines a sample corresponding to the playing time period according to the video file.
  • the file parser may select one or more samples in the playback time period from the video track according to the playback time period that needs to be extracted.
  • the video includes 54 image frames
  • the playback time period may correspond to the 20th frame to the 54th frame.
  • the playing time period may correspond to the 20th sample to the 54th sample.
  • determining the sample corresponding to the playing time period is prior art and is not detailed in the embodiments of the present invention.
  • the file parser obtains all sub-track data description containers from the video file.
  • the sub-track data description container may include area information of the sub-tracks described by the sub-track data description container.
  • the area information of each sub track is used to indicate the area corresponding to the sub track.
  • the file parser determines the sub-track corresponding to the target area as the target sub-track according to the size and position of the rectangle corresponding to the target area and the area information of the sub-tracks in the sub-track data description container.
  • the sub-track corresponding to the target area is referred to as a target sub-track below.
  • the file parser may compare the area corresponding to each sub-track with the target area in the manner described in the embodiment of Fig. 3 and determine whether the area corresponding to the sub-track overlaps the target area; if there is overlap, it can be determined that the sub-track corresponds to the target area.
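The overlap check between the target rectangle and a sub-track's area can be sketched with both areas expressed as (horizontal_offset, vertical_offset, width, height); this is a sketch with an illustrative function name:

```python
def overlaps(a, b):
    # a, b: (horizontal_offset, vertical_offset, width, height) in pixels,
    # offsets measured from the upper-left corner of the picture.
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Rectangles overlap unless one lies entirely to the side of the other;
    # areas that merely touch at an edge are not counted as overlapping.
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah
```

Each sub-track whose area satisfies this test against the target rectangle is selected as a target sub-track.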
  • Figure 14 is a schematic illustration of a target sub-track corresponding to a target area, in accordance with one embodiment of the present invention.
  • the size and position of the target area are compared with the areas of the sub-tracks described by the sub-track data description containers in the four sub-track containers, and the target sub-tracks corresponding to the target area are determined as the second sub-track and the third sub-track. That is, the second sub-track and the third sub-track are target sub-tracks.
  • since the above target area corresponds to the second sub-track and the third sub-track, the corresponding data can be extracted from the video file.
  • the file parser determines, according to the playing time period and the sub-track data definition container corresponding to the target sub-track, the description information of the target sub-track in the samples corresponding to the playing time period.
  • specifically, according to the playing time period corresponding to the target area and the sub-track data definition containers corresponding to the second sub-track and the third sub-track respectively, the description information of the second sub-track and the description information of the third sub-track in the samples corresponding to the playing time period may be determined. As described in step 701 of Fig. 7, there may be two cases regarding the correspondence between the block and the NAL packet. Step 1307 will be described below for each of the two cases.
  • the blocks indicated by the same block ID correspond to the NAL packets of the same number.
  • in case (A), the file parser may directly obtain the description information of the target sub-track from the sub-track and sample group mapping relationship container in the sub-track data definition container corresponding to the target sub-track; that description information is the description information of the target sub-track in the samples corresponding to the playing time period.
  • Fig. 15 is a schematic diagram of description information of a sub-track according to an embodiment of the present invention, used to indicate that the blocks indicated by the same block ID in all samples in the video track correspond to the same-numbered NAL packets, that is, the correspondence between blocks and NAL packets is the same in each sample.
  • the file parser may obtain the description information of the second sub-track from the sub-track and sample group mapping relationship container in the sub-track data definition container corresponding to the second sub-track.
  • the "group_ description_index” field has different values.
  • the number of values of the "group_ description_index” field may be the same as the number of partitions corresponding to the subtrack.
  • the blocks indicated by the same block ID in the samples constituting the video track correspond to the same numbered NAL packets, the correspondence between the block and the NAL packet in each sample is the same. Therefore, for each sub-track, all samples can share the same description information, so the description information of the second sub-track is the description information of the second sub-track in the sample corresponding to the playback time period.
  • the second subtrack corresponds to the subtrack container with ID "2".
  • the value of the "group_ description_index" field in the description information of the second subtrack is "2".
  • the process corresponding to the third subtrack is similar to the second subtrack, and will not be described again.
  • the third subtrack corresponds to a subtrack container with ID "3".
  • the value of the "group_description_index” field in the description information of the third subtrack is "3".
  • the blocks indicated by the same block ID correspond to different numbered NAL packets.
  • the file parser may, from the sub-track and sample group mapping relationship container in the sub-track data definition container corresponding to the target sub-track, determine the description information corresponding to the samples in the playing time period according to the value of the "sample_count" field in each piece of description information of the target sub-track; this description information is the description information of the target sub-track in the samples corresponding to the playing time period.
  • the second sub-track will be described below with reference to Fig. 16 as an example.
  • 16 is a schematic diagram of description information of a sub-track according to another embodiment of the present invention, to indicate that in at least two samples of a video track, the blocks indicated by the same block ID correspond to different-numbered NAL packets.
  • the description information of the second sub-track can be obtained from the sub-track and sample group mapping relationship container in the sub-track data definition container corresponding to the second sub-track.
  • the "group_ description_index” field and the corresponding “sample_count” (sample number) field have different values.
  • Each description can contain the value of a “sample_count” field and the value of a "group_ description_index” field.
  • the "sample_count” field may indicate the number of consecutive samples that correspond to the correspondence between the partition and the NAL packet indicated by the corresponding "group_template_index” field.
  • the description information of the second sub-track in the sample corresponding to the playback time period can be determined.
  • the second subtrack corresponds to a subtrack container with ID "2".
  • the value of the "sample_count” field is "10”, which means that the first to tenth samples correspond to the description of the first one.
  • the value of the “sample_count” field is "30”, which means that the 11th to 40th samples correspond to the description of the second item.
  • the value of the "sample_count” field is "8", which means that the 41st to 48th samples correspond to the description of Article 3.
  • sample_count The value of the "sample_count” field is "6", which means that the 49th to 54th samples correspond to the description of Article 4. Assume above that the samples corresponding to the playback time period are the 20th to 54th samples.
  • the description information of the second sub-track in the samples corresponding to the playing time period is therefore the second, third, and fourth pieces of description information in the sub-track and sample group mapping relationship container corresponding to the second sub-track.
  • the process of determining the description information corresponding to the third subtrack in the sample corresponding to the playing time period is similar to the second subtrack, and will not be described again.
  • the third subtrack corresponds to a subtrack container with ID "3".
  • the description information of the third sub-track in the samples corresponding to the playing time period is the second, third and fourth pieces of description information in the sub-track and sample group mapping relationship container corresponding to the third sub-track.
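Selecting, for case (B), the pieces of description information that intersect the samples of the playing time period can be sketched via the cumulative "sample_count" values; a sketch with illustrative names:

```python
def pieces_for_sample_range(pieces, first, last):
    # pieces: ordered (sample_count, group_description_index) pairs that
    # together cover samples 1..N of the video track.
    # Returns the 1-based numbers of the pieces whose sample run intersects
    # the inclusive sample range [first, last].
    selected, start = [], 1
    for number, (count, _) in enumerate(pieces, 1):
        end = start + count - 1
        if start <= last and first <= end:
            selected.append(number)
        start = end + 1
    return selected
```

With the document's example pieces (10, 30, 8 and 6 samples) and the playing time period covering the 20th to 54th samples, the second, third and fourth pieces are selected.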
  • the file parser determines, according to the description information of the target sub-tracks and the sample group description container, for example according to the description information of the second sub-track, the description information of the third sub-track, and the sample group description container, the numbers of the NAL packets corresponding to the two target sub-tracks.
  • the two cases described with respect to step 701 of FIG. 7 will still be described.
  • the blocks indicated by the same block ID correspond to the NAL packets of the same number.
  • the file parser may determine that the "grouping_type” field in the sub-track corresponding to the target sub-track and the sample group mapping relationship container has a value of "ssnm", and the value may be used as the group identifier of the embodiment of the present invention. You can then get a sample group description container with a "grouping_type” field value of "ssnm” from the video file. The file parser can obtain from the sample group description container the block between the block and the NAL packet indicated by the "Entry_Index" field of the same value as the "group_ description_index” field. Corresponding relationship, determining the number of the NAL packet corresponding to the subtrack according to the correspondence between the obtained partition and the NAL packet.
  • the second sub-track will be described below with reference to Fig. 15 as an example.
  • The "group_description_index" field takes the value "2". Then the correspondence between the blocks indicated by the "Entry_Index" field with the value "2" and the NAL packets is obtained from the sample group description container. It can be seen that the NAL packets corresponding to the second sub-track are numbered 2, 3, and 4, respectively.
  • the process corresponding to the third subtrack is similar to the second subtrack, and will not be described again.
  • the NAL packets corresponding to the third subtrack are numbered 5, 6, and 7, respectively.
  • (2) For at least two samples, at least one block indicated by the same block ID corresponds to NAL packets with different numbers.
  • In this case, the file parser may determine that the "grouping_type" field in the mapping relationship container of the sub-track and the sample group corresponding to the target sub-track has the value "ssnm", and then obtain from the video file the sample group description container whose "grouping_type" field takes the value "ssnm".
  • The correspondence between the blocks indicated by the "Entry_Index" field whose value equals that of the "group_description_index" field and the NAL packets can then be obtained from the sample group description container, and the numbers of the NAL packets corresponding to the sub-track are determined according to the obtained correspondence.
  • the second sub-track will be described below with reference to Fig. 16 as an example.
  • the 20th sample will be described as an example.
  • In the 20th sample, the "group_description_index" field takes the value "6".
  • Then the correspondence between the blocks indicated by the "Entry_Index" field with the value "6" and the NAL packets is obtained from the sample group description container. It can be seen that, in the 20th sample, the NAL packets corresponding to the second sub-track are numbered 3, 4, and 5, respectively.
  • The process corresponding to the third sub-track is similar to that of the second sub-track, and will not be described again.
  • In the 20th sample, the NAL packets corresponding to the third sub-track are numbered 6 and 7, respectively.
  • For the other samples, the process of determining the numbers of the NAL packets is similar to the case of the 20th sample described above, and will not be described again.
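The per-sample lookup in case (2) can be sketched with two small tables: each sample's "group_description_index", and the sample group description container as a map from "Entry_Index" to each block's NAL numbers. This is an illustrative sketch, not the file format itself; the dict layout and the function name are assumptions, and the values for the 20th sample (tile "1" → 3, 4, 5; tile "2" → 6, 7) come from the walkthrough above, with tile "0" filled in hypothetically.

```python
# Hypothetical in-memory view of the containers for one sub-track:
# sample number -> "group_description_index" value for that sample.
group_description_index = {20: 6}
# "Entry_Index" -> {tileID: NAL packet numbers}; tile 0's numbers are illustrative.
entries = {6: {0: [1, 2], 1: [3, 4, 5], 2: [6, 7]}}

def nal_numbers(sample, tile_id):
    """NAL packet numbers of one block of a sub-track in one sample."""
    idx = group_description_index[sample]  # index stored for this sample
    return entries[idx][tile_id]           # correspondence under that "Entry_Index"

print(nal_numbers(20, 1))  # → [3, 4, 5]
print(nal_numbers(20, 2))  # → [6, 7]
```

The same two lookups are repeated for every sample in the playing time period, since in this case different samples may point at different entries.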
  • In step 1309, the file parser acquires, according to the numbers of the NAL packets determined in step 1308, the corresponding NAL packets from the video file, so that the decoder decodes the NAL packets to play the picture of the target area in the playing time period.
  • If the rectangular area corresponding to the NAL packets exceeds the target area,
  • the rectangular area may be cropped to play the picture of the target area.
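The crop mentioned above is a rectangle intersection: only the part of the decoded rectangle that falls inside the target area is displayed. A minimal sketch, with illustrative coordinates and function name (the patent does not prescribe a crop algorithm):

```python
def crop_rect(decoded, target):
    """Intersect the decoded rectangle with the target area.

    Rectangles are (x, y, width, height) in pixels; returns the region of the
    decoded picture that should actually be displayed, or None if no overlap.
    """
    dx, dy, dw, dh = decoded
    tx, ty, tw, th = target
    x0, y0 = max(dx, tx), max(dy, ty)
    x1, y1 = min(dx + dw, tx + tw), min(dy + dh, ty + th)
    if x1 <= x0 or y1 <= y0:
        return None  # decoded picture and target area do not overlap
    return (x0, y0, x1 - x0, y1 - y0)

print(crop_rect((0, 0, 1920, 1080), (640, 360, 640, 360)))  # → (640, 360, 640, 360)
```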
  • In the embodiment of the present invention, the sub-track corresponding to the target area is determined as the target sub-track according to the target area and the area information of the sub-tracks described by the sub-track data description containers, and the numbers of the NAL packets corresponding to the target sub-tracks in the samples corresponding to the playing time period are determined according to
  • the description information of the target sub-tracks in the sub-track data definition containers and the sample group description container, so that these NAL packets can be decoded to play the picture of the target area within the playing time period, thereby effectively realizing the extraction of the area picture in the video.
  • FIG. 17 is a schematic flow chart of a process of a method of processing a video according to another embodiment of the present invention.
  • the method of Figure 17 is performed by a file generator.
  • The file generator determines the correspondence between the blocks in the track of the video and the NAL packets.
  • The video picture can be divided into a plurality of blocks; that is, each image frame of the video is divided into a plurality of blocks.
  • The number of blocks and the block positions are the same for all image frames of the video, so for the samples of the track, the number of blocks and the positions of the blocks are also the same.
  • For example, each image frame can be divided into four blocks, namely block 0, block 1, block 2, and block 3.
  • Then the blocks corresponding to each sample are block 0, block 1, block 2, and block 3.
  • The correspondences between the blocks and the NAL packets can be grouped; each group is a mapping group described below.
  • In one case, for all samples, the blocks indicated by the same block identifier correspond to NAL packets with the same numbers;
  • in this case, there is one mapping group.
  • In another case, for at least two samples, at least one of the blocks indicated by the same block identifier corresponds to NAL packets with different numbers; in this case, there are a plurality of mapping groups.
  • Each mapping group has an identifier.
  • The identifier of the mapping group may be an entry index. For example, assume that for the image frame shown in Fig. 6a, the correspondence between the blocks and the NAL packets is as shown in Table 11.
  • An integer number of block-and-NAL-packet mapping relationship entries may be included, the specific number of which is the same as the number of the above mapping groups.
  • The mapping relationship entry of each block and NAL packet includes the correspondence between each block and the NAL packets.
  • The data structure of the mapping relationship entry of the block and the NAL packet can refer to the data structure described in step 702.
  • Among the fields, the "NALU_start_number" field indicates the starting NAL packet number of a block.
  • Table 13 shows the meaning of the fields in the above data structure. For the length field, a value of 1 means that 2 bytes are occupied, and a value of 0 means that 1 byte is occupied.
  • The "tileID" field represents the block ID. For example, Table 14 shows what the sample group description container contains. As shown in Table 14, the value of the "grouping_type" field is "tlnm". Table 14 includes two mapping groups, and each mapping group includes the correspondence between four blocks and the NAL packets.
  • the "Entry_Index” field is used to indicate where each mapping group is stored in the sample group description container.
  • The mapping relationship container of the sample and the sample group may include the correspondence between an integer number of samples and the mapping groups.
  • The correspondence between each run of samples and a mapping group can include a "sample_count" field and an "Index" field.
  • The "sample_count" field may indicate that there are "sample_count" consecutive samples that match the correspondence between the blocks and the NAL packets in the mapping group indicated by the corresponding "Index" field.
  • The correspondences between the samples and the mapping groups are arranged in the order in which the consecutive samples corresponding to the "sample_count" fields are arranged in the video track.
  • the mapping relationship between the sample and the sample group can also include the "grouping_type" field.
  • the value of this field may indicate that the sample group description container is used to describe a sample grouping based on the correspondence between the block and the NAL packet.
  • Table 15 shows the specific content contained in the mapping relationship between the sample and the sample group.
  • Table 15 shows that the value of the "grouping_type" field can be "tlnm".
  • The first "Index" field takes the value "1",
  • and the corresponding "sample_count" field takes the value "10", which can indicate that
  • the 1st to 10th samples can correspond to the mapping group whose "Entry_index" field has the value "1" in the sample group description container whose "grouping_type" value is "tlnm".
  • The 11th to 40th samples can correspond to the mapping group whose "Entry_index" field has the value "2" in the sample group description container.
  • The 41st to 48th samples can correspond to the mapping group whose "Entry_index" field has the value "1" in the sample group description container.
  • The 49th to 54th samples can correspond to the mapping group whose
  • "Entry_index" field has the value "2" in the sample group description container.
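The sample-to-mapping-group container above is another run-length table: each ("sample_count", "Index") pair covers the next "sample_count" samples. A sketch of looking up the mapping group of a single sample, assuming the values shown in Table 15 (the function name and list-of-pairs representation are illustrative):

```python
def mapping_group_for_sample(runs, sample):
    """Map a 1-based sample number to its mapping group's "Entry_index".

    runs is a list of (sample_count, index) pairs in track order, mirroring
    the "sample_count"/"Index" fields of the sample-and-sample-group container.
    """
    start = 1
    for count, index in runs:
        if start <= sample <= start + count - 1:
            return index
        start += count
    raise ValueError("sample beyond the described runs")

runs = [(10, 1), (30, 2), (8, 1), (6, 2)]  # values from Table 15
print(mapping_group_for_sample(runs, 20))  # → 2
print(mapping_group_for_sample(runs, 45))  # → 1
```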
  • the file generator divides the video track into sub-tracks based on the block.
  • Each sub-track can be composed of one or more blocks that can form a rectangular area.
  • For example, each sub-track can be composed of one block; the four blocks described above then correspond to four sub-tracks, respectively.
  • Step 1705 is similar to step 704 in FIG. 7, and will not be described again.
  • the sub-track data definition container may include description information of the sub-track, and the description information of the sub-track may indicate a correspondence between the partition and the NAL packet in the sub-track.
  • the sub-track data definition container may include a mapping relationship container of the sub-track and the sample group, and the mapping relationship container of the sub-track and the sample group may include description information of the sub-track.
  • The specific content of the mapping relationship container of the sub-track and the sample group can be divided into the following two cases. In one case, the mapping relationship container of the sub-track and the sample group includes
  • the "grouping_type" field; in the other case, the mapping relationship container of the sub-track and the sample group does not include the "grouping_type" field. The two cases are described below.
  • (1) The mapping relationship container of the sub-track and the sample group does not include the "grouping_type" field.
  • In this case, the value of the "grouping_type" field can be set in advance. This value can be the same as the value of the "grouping_type" field in the sample group description container and in the mapping relationship container of the sample and the sample group.
  • The mapping relationship container of the sub-track and the sample group may include the description information of the sub-track, and the description information of the sub-track may include a "tileID" (block ID) field. This field can represent the identifier of a block in the sub-track. Therefore, the number of values of the "tileID" field can be equal to the total number of blocks in the sub-track, and the number of pieces of description information of the sub-track is the same as the number of blocks in the sub-track.
  • In this case, a data structure of the mapping relationship container of the sub-track and the sample group can be expressed as follows:

        SubTrackSampleGroupBox extends FullBox('stsg', 0, 1) {
            Unsigned int(16) item_count;    // the number of pieces of description information
            for (i = 0; i < item_count; i++) {
                Unsigned int(32) tileID;
            }
        }

  • The "item_count" field can represent the number of pieces of description information of the sub-track. Each piece of description information of the sub-track may include the above "tileID" field.
  • Each sub-track may correspond to one sub-track container, and the sub-track container may include a sub-track data description container corresponding to the sub-track and a sub-track data definition container corresponding to the sub-track.
  • Table 16 shows an example of the sub-track container of the first sub-track, to illustrate a sub-track data definition container that does not include
  • the "grouping_type" field. As shown in Table 16, the sub-track container includes a sub-track data description container and a sub-track data definition container.
  • an ID, a horizontal offset, a vertical offset, a region width, a region height, and an independence field may be included.
  • the ID in the sub-track data description container is also the ID of the sub-track container, and may represent the sub-track described by the sub-track container.
  • The horizontal offset, the vertical offset, the area width, and the area height are used to indicate the size and position of the area corresponding to the sub-track. The independence field can be used to indicate whether the area corresponding to the sub-track can be independently decoded.
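The offset/width/height fields above are what lets a parser pick target sub-tracks: a sub-track is a candidate when its region rectangle overlaps the target area. A minimal sketch of that test; the region values and container-ID keys are illustrative, not taken from the tables:

```python
def covers(region, target):
    """True if the sub-track's region overlaps the target area.

    Both are (horizontal_offset, vertical_offset, width, height), mirroring
    the fields of the sub-track data description container.
    """
    rx, ry, rw, rh = region
    tx, ty, tw, th = target
    return rx < tx + tw and tx < rx + rw and ry < ty + th and ty < ry + rh

# Sub-track regions keyed by sub-track container ID (illustrative values).
regions = {1: (0, 0, 960, 540), 2: (960, 0, 960, 540), 3: (0, 540, 960, 540)}
target = (900, 0, 200, 200)
print(sorted(sid for sid, r in regions.items() if covers(r, target)))  # → [1, 2]
```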
  • the sub-track data definition container may include a mapping relationship container of the sub-track and the sample group, and the mapping relationship container of the sub-track and the sample group includes description information of the sub-track.
  • The description information of the sub-track may include the block IDs of the sub-track. Assume that the area corresponding to the first sub-track consists of the first block, that is, the block with the block ID "0". Then, as shown in Table 16, in the description information of the sub-track, the "tileID" field takes the value "0".
  • the mapping relationship container of the sub-track and the sample group may further include a "grouping_type" field.
  • the "grouping_type” field is used to indicate sub-track information of the correspondence between the partition and the NAL packet described by the sub-track data definition container.
  • In this case, the mapping relationship container of the sub-track and the sample group may include an integer number of pieces of description information of the sub-track, and each piece of description information of the sub-track may include the value of a "tileID" field. The number of pieces of description information of the sub-track is then still the same as the total number of blocks in the sub-track. That is, the mapping relationship container of the sub-track and the sample group can include an integer number of values of the "tileID" field.
  • In this case, a data structure of the mapping relationship container of the sub-track and the sample group can be expressed as follows:

        SubTrackSampleGroupBox extends FullBox('stsg', 0, 1) {
            Unsigned int(32) grouping_type;
            Unsigned int(16) item_count;    // the number of pieces of description information
            for (i = 0; i < item_count; i++) {
                Unsigned int(32) tileID;
            }
        }

  • The "item_count" field may indicate the number of pieces of description information of the sub-track.
  • Each piece of description information of the sub-track may include the above "tileID" field.
  • In addition, in this data structure, the above "grouping_type" field is defined.
  • Table 17 shows an example of the sub-track container of the first sub-track, to illustrate a sub-track data definition container that includes the "grouping_type" field.
  • As shown in Table 17, a sub-track data description container and a sub-track data definition container are included in the sub-track container.
  • The ID, horizontal offset, vertical offset, area width, area height, and independence fields are included in the sub-track data description container.
  • the sub-track data description container ID is also the ID of the sub-track container, and can represent the sub-track described by the sub-track container.
  • the horizontal offset, the vertical offset, the area width, and the area height are used to indicate the size and position of the area corresponding to the sub track.
  • the sub-track data definition container may include a mapping relationship container of the sub-track and the sample group, and the mapping relationship container of the sub-track and the sample group includes description information of the sub-track.
  • the area corresponding to the first sub-track consists of a block with a block ID of "0".
  • the mapping relationship container of the sub-track and the sample group may include description information of one sub-track.
  • In the description information of the sub-track, the "tileID" field takes the value "0".
  • The mapping relationship container of the sub-track and the sample group may further include the "grouping_type" field, and the "grouping_type" field may take the value "tlnm".
  • This is the same as the value "tlnm" of the "grouping_type" field in the sample group description container shown in Table 14 above.
  • The file generator generates a video file, where the video file includes the sample group description container, the sub-track data description container corresponding to each sub-track, the sub-track data definition container corresponding to each sub-track, and the samples constituting the video track.
  • Step 1707 is similar to step 706 of FIG. 7, and will not be described again.
  • the file generator sends a video file to the file parser.
  • In the embodiment of the present invention, one sub-track data description container and one sub-track data definition container are generated for each sub-track,
  • where each sub-track data description container includes the area information of the sub-track,
  • each sub-track data definition container includes the description information of the sub-track,
  • and the description information of the sub-track is used to indicate the NAL packets corresponding to each block in the sub-track.
  • This enables the file parser to determine the target sub-track corresponding to the target area according to the area information of the sub-tracks, and to determine, according to the description information of the target sub-track in the sub-track data definition container, the sample group description container, and the mapping relationship container of the sample and the sample group,
  • the NAL packets corresponding to each block of each target sub-track in the samples within the playing time period, so as to play the picture of the target area in the playing time period.
  • Fig. 18 is a schematic flow chart showing the procedure of a method of processing a video corresponding to the process of Fig. 17. The method of Figure 18 is performed by a file parser.
  • Steps 1801 to 1806 are similar to steps 1301 to 1306 of FIG. 13 and will not be described again. Further, in this embodiment, it is still assumed that the target area corresponds to the second sub-track and the third sub-track, that is, the target sub-track is the second sub-track and the third sub-track.
  • The file parser determines the description information of the target sub-track according to the sub-track data definition container corresponding to the target sub-track.
  • Specifically, the file parser can directly obtain the description information of the target sub-track from the sub-track data definition container corresponding to the target sub-track;
  • the description information of the target sub-track includes the block IDs of the blocks in the target sub-track.
  • Figure 19 is a diagram showing descriptive information of a sub-track according to an embodiment of the present invention.
  • the file parser may obtain the description information of the second sub-track from the sub-track and the sample group mapping relationship container in the sub-track data definition container corresponding to the second sub-track.
  • The file parser can determine the value of the "tileID" field in the description information of the second sub-track.
  • the second subtrack corresponds to a subtrack container with ID "2".
  • Assume that the second sub-track consists of the second block, that is, the block with the block ID "1". Therefore, in the sub-track data definition container corresponding to the second sub-track, the value of the "tileID" field in the description information of the second sub-track is "1".
  • The third sub-track corresponds to the sub-track container with the ID "3". Assume that the third sub-track consists of the third block, that is, the block with the block ID "2". Therefore, in the sub-track data definition container corresponding to the third sub-track, the value of the "tileID" field in the description information of the third sub-track is "2".
  • Step 1808 will be described for the two cases described in step 1706 of FIG. 17. (1)
  • In this case, the file parser may obtain the value of the "grouping_type" field set in advance.
  • For example, the value of the preset "grouping_type" field may be "tlnm"; that is, the value of the "grouping_type" field set in advance is the same as
  • the value of the "grouping_type" field in the sample group description container and in the mapping relationship container of the sample and the sample group.
  • the file parser can then obtain a mapping container between the sample and the sample group whose "grouping_type” field takes the value "tlnm” from the video file.
  • the file parser may obtain an "Entry_Index” field corresponding to the sample corresponding to the playing time period from the mapping relationship container of the sample and the sample group.
  • The file parser can then obtain, in the sample group description container whose "grouping_type" field has the value "tlnm", the mapping groups indicated by the "Entry_Index" fields corresponding to the samples, and can then determine, in the obtained mapping groups,
  • the NAL packet numbers corresponding to the block IDs included in the description information of the target sub-track, thereby determining the numbers of the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period.
  • The second sub-track will be described as an example with reference to FIG. 19.
  • the playback time period corresponds to the 20th to 54th samples.
  • For the 20th sample, the corresponding "Index" field has the value "2". The "Index" field in the mapping relationship container of the sample and the sample group has the same meaning as the "Entry_Index" field in the sample group description container: it indicates the mapping group.
  • Then, in the sample group description container, the file parser can determine the mapping group pointed to by the "Entry_Index" field with the value "2". As shown in FIG. 19, the 20th sample corresponds to the 2nd mapping group. In the description information of the second sub-track, the value of the "tileID" field is "1". Then, in the 20th sample, for the 2nd sub-track, in the mapping group pointed to by the "Entry_Index" field with the value "2", the starting NAL packet corresponding to the block with the block ID "1" is numbered 3.
  • Since the numbering of the NAL packets is contiguous, it can be seen in the mapping group that the number of the starting NAL packet corresponding to the block with the block ID "2" is 6. Then the numbers of the NAL packets corresponding to the block with the block ID "1" are 3, 4, and 5, respectively. That is to say, the NAL packets corresponding to the second sub-track are numbered 3, 4, and 5, respectively.
  • the NAL packets corresponding to the third subtrack in the 20th sample are numbered 6 and 7, respectively.
  • the specific process is similar to the second sub-track, and will not be described again.
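Because NAL packet numbers within a sample are contiguous, a block's packets run from its starting number up to one before the next block's starting number. A sketch of that derivation under the values of the example above (the function name and the total-packet parameter are assumptions):

```python
def nal_range(start_numbers, tile_id, total_nals=None):
    """NAL numbers of one block, given each block's starting NAL number.

    start_numbers maps tileID -> starting NAL number; a block's packets end
    where the next block's packets begin (the last block runs to total_nals).
    """
    ordered = sorted(start_numbers.items(), key=lambda kv: kv[1])
    for i, (tid, start) in enumerate(ordered):
        if tid == tile_id:
            end = ordered[i + 1][1] - 1 if i + 1 < len(ordered) else total_nals
            return list(range(start, end + 1))
    raise KeyError(tile_id)

# From the example: tile "1" starts at NAL 3, tile "2" at NAL 6 (tile "0" assumed at 1).
starts = {0: 1, 1: 3, 2: 6}
print(nal_range(starts, 1))     # → [3, 4, 5]
print(nal_range(starts, 2, 7))  # → [6, 7]
```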
  • (2) In this case, the value of the "grouping_type" field in the mapping relationship container of the sub-track and the sample group may be obtained, and this value may be used as the group
  • identifier of the embodiment of the present invention. For example, the value of the "grouping_type" field here can be "tlnm".
  • the file parser can obtain a mapping container between the sample and the sample group whose "grouping_type” field has the value "tlnm” from the video file.
  • the file parser can obtain the "Entry_Index" field corresponding to the sample corresponding to the playback time period from the mapping relationship container of the sample and the sample group.
  • The file parser can then obtain, in the sample group description container whose "grouping_type" field has the value "tlnm", the mapping groups indicated by the "Entry_Index" fields corresponding to the samples, and can then determine the NAL packet numbers in the obtained mapping groups.
  • the specific process of determining the NAL packet number is similar to the process of (1) in step 1808, and will not be described again.
  • Step 1809 is similar to step 1309 in FIG. 13 and will not be described again.
  • In the embodiment of the present invention, the sub-track corresponding to the target area is determined as the target sub-track according to the target area and the area information of the sub-tracks described by the sub-track data description containers, and the numbers of the NAL packets corresponding to the target sub-tracks in the samples corresponding to the playing time period are determined according to
  • the description information of the target sub-tracks in the sub-track data definition containers, the mapping groups in the sample group description container, and the mapping relationship container of the sample and the sample group, so that these NAL packets can be decoded to play the picture of the target area within the playing time period, thereby effectively realizing the extraction of the area picture in the video.
  • the disclosed systems, devices, and methods may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • For example, the division of the units is only a logical function division; in actual implementation,
  • there may be another division manner. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be electrical, mechanical or otherwise.
  • the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • The functions may be stored in a computer-readable storage medium if implemented in the form of a software functional unit and sold or used as a standalone product. Based on such an understanding, the technical solution of the present invention,
  • in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including a number of instructions for causing
  • a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present invention.
  • The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.


Abstract

Embodiments of the present invention provide a device and a method for processing a video. The device includes: a receiving unit, configured to receive a video file corresponding to the video; and a determining unit, configured to: determine a target area to be extracted from the picture of the video and a playing time period to be extracted; determine, according to the video file, the samples corresponding to the playing time period among the samples constituting the video track; determine, among at least one sub-track, the sub-track corresponding to the target area as the target sub-track according to the target area and the area information of the sub-tracks included in the sub-track data description containers; and determine, according to the sub-track data definition container corresponding to the target sub-track, the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period, where the determined NAL packets, after being decoded, are used to play the picture of the target area within the playing time period. The embodiments of the present invention can effectively realize the extraction of an area picture in a video.

Description

Device and Method for Processing a Video

Technical Field

The present invention relates to the field of information technology and, in particular, to a device and a method for processing a video.

Background

At present, a new generation of high efficiency video coding (HEVC) methods has emerged. For a video encoded with an HEVC method, there is often a need, during playback, to extract an area picture from the video. For example, FIG. 1 is a schematic diagram of a scenario in which an area picture needs to be extracted from a video. A European Cup football match is shot using panoramic shooting technology, and the resulting panoramic video has a resolution of 6Kx2K, which is suitable for playback on an ultra-high-resolution panoramic display. However, if a user wants to watch the panoramic video on an ordinary screen, because the resolution of the ordinary screen is smaller, the area picture in the panoramic video needs to be extracted and played on the ordinary screen. As shown in FIG. 1, the top is a panoramic screen, and the bottom shows a mobile phone screen and a computer screen. The panoramic screen can display the complete video picture, while the mobile phone screen and the computer screen cannot. Therefore, when playing on the mobile phone screen or the computer screen, the area picture marked by the dashed box needs to be extracted and then played on those screens.

As another example, FIG. 2 is a schematic diagram of another scenario in which an area picture needs to be extracted from a video. In video surveillance, the pictures shot by multiple cameras can be stitched together to form one surveillance video. When the surveillance video is played back, if the user needs to specify the picture shot by one of the cameras for playback, the area picture of the surveillance video needs to be extracted and played. As shown in FIG. 2, the left side is a surveillance video in which every image contains pictures shot by multiple cameras. Assume that the area marked by the dashed box is the picture, shot by a camera, that the user specifies for playback; then that area picture needs to be extracted and played separately.

However, for a video encoded with an HEVC method, there is currently no effective method for extracting an area picture from the video, for example, for extracting the area picture in the scenarios shown in FIG. 1 or FIG. 2 above.

Summary
Embodiments of the present invention provide a device and a method for processing a video, which can effectively realize the extraction of an area picture in a video. In a first aspect of the embodiments of the present invention, a device for processing a video is provided. The video track of the video is divided into at least one sub-track, and each sub-track is described by one sub-track data description container and one sub-track data definition container. The device includes: a receiving unit, configured to receive a video file corresponding to the video, where the video file includes at least one sub-track data description container, at least one sub-track data definition container, and the samples constituting the video track; the sub-track data description container includes area information of the sub-track described by the sub-track data description container, the area information of the sub-track being used to indicate the area corresponding to the sub-track in the picture of the video; and the sub-track data definition container is used to indicate the network abstraction layer (NAL) packets corresponding to the sub-track described by the sub-track data definition container in the samples constituting the video track; and a determining unit, configured to: determine a target area to be extracted from the picture of the video and a playing time period to be extracted; determine, according to the video file received by the receiving unit, the samples corresponding to the playing time period among the samples constituting the video track; determine, among the at least one sub-track, the sub-track corresponding to the target area as the target sub-track according to the target area and the area information of the sub-tracks included in the sub-track data description containers; and determine, according to the sub-track data definition container corresponding to the target sub-track, the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period, where the determined NAL packets, after being decoded, are used to play the picture of the target area within the playing time period.

With reference to the first aspect, in a first possible implementation, the area corresponding to the sub-track consists of at least one block; the video file further includes a sample group description container, where the sample group description container includes the correspondence between each block in the video track and the NAL packets and the identifier of the correspondence between each block and the NAL packets; the sub-track data definition container corresponding to the target sub-track includes the identifier of the correspondence between each block of the target sub-track and the NAL packets in the samples constituting the video track; and that the determining unit determines the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period is specifically: determining, according to the sample group description container and the identifier of the correspondence between each block of the target sub-track and the NAL packets, the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period.

With reference to the first possible implementation of the first aspect, in a second possible implementation, in the area corresponding to the sub-track, for the samples constituting the video track, blocks with the same identifier correspond to NAL packets with the same numbers.

With reference to the first possible implementation of the first aspect, in a third possible implementation, in the area corresponding to the sub-track, for at least two of the samples constituting the video track, at least one block with the same identifier corresponds to NAL packets with different numbers; the sub-track data definition container corresponding to the target sub-track further includes the sample information corresponding to the identifier of the correspondence between each block of the target sub-track and the NAL packets; and that the determining unit determines, according to the sample group description container and the identifier of the correspondence between each block of the target sub-track and the NAL packets in the samples constituting the video track, the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period is specifically: determining, according to the identifier of the correspondence between each block of the target sub-track and the NAL packets, the sample information corresponding to that identifier, and the sample group description container, the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period.

With reference to any one of the first to third possible implementations of the first aspect, in a fourth possible implementation, the sub-track data definition container further includes a group identifier; and the determining unit is further configured to: before determining the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period, obtain, from the video file according to the group identifier, the sample group description container having the group identifier.

With reference to the first aspect, in a fifth possible implementation, the area corresponding to the sub-track consists of at least one block; the video file further includes a sample group description container, where the sample group description container includes at least one mapping group, and each mapping group of the at least one mapping group includes the correspondence between each block identifier in the video track and the NAL packets; the video file further includes a mapping relationship container of the sample and the sample group, which is used to indicate the samples corresponding to each mapping group of the at least one mapping group; the sub-track data definition container corresponding to the target sub-track includes the identifier of each block of the target sub-track; and that the determining unit determines the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period is specifically: determining, according to the sample group description container, the mapping relationship container of the sample and the sample group, and the identifier of each block of the target sub-track, the NAL packets corresponding to the target sub-track in the samples corresponding to the playing time period.

With reference to the fifth possible implementation of the first aspect, in a sixth possible implementation, the sub-track data definition container includes a group identifier; and the determining unit is further configured to: before determining the NAL packets respectively corresponding to the target sub-tracks in the samples corresponding to the playing time period, obtain, from the video file according to the group identifier, the sample group description container having the group identifier and the mapping relationship container of the sample and the sample group having the group identifier.
In a second aspect of the embodiments of the present invention, a device for processing a video is provided. The video track of the video is divided into at least one sub-track, and the video track consists of samples. The device includes: a generating unit, configured to: generate, for each sub-track of the at least one sub-track, one sub-track data description container and one sub-track data definition container, where the sub-track data description container includes area information of the sub-track described by the sub-track data description container, the area information of the sub-track being used to indicate the area corresponding to the sub-track in the picture of the video, and the sub-track data definition container is used to indicate the network abstraction layer (NAL) packets corresponding to the sub-track described by the sub-track data definition container in the samples constituting the video track; and generate a video file of the video, where the video file includes the one sub-track data description container and the one sub-track data definition container generated for each sub-track and the samples constituting the video track; and a sending unit, configured to send the video file generated by the generating unit.

With reference to the second aspect, in a first possible implementation, the area corresponding to the sub-track consists of at least one block; the sub-track data definition container includes the identifier of the correspondence between each block of the sub-track described by the sub-track data definition container and the NAL packets in the samples constituting the video track; the generating unit is further configured to generate, before generating the video file of the video, a sample group description container, where the sample group description container includes the correspondence between each block in the video track and the NAL packets and the identifier of the correspondence between each block and the NAL packets; and the video file further includes the sample group description container.

With reference to the second aspect, in a second possible implementation, the area corresponding to the sub-track consists of at least one block; the sub-track data definition container includes the identifier of each block in the sub-track described by the sub-track data definition container; the generating unit is further configured to generate, before generating the video file of the video, a sample group description container and a mapping relationship container of the sample and the sample group, where the sample group description container includes at least one mapping group, each mapping group of the at least one mapping group includes the correspondence between each block identifier in the video track and the NAL packets, and the mapping relationship container of the sample and the sample group is used to indicate the samples corresponding to each mapping group of the at least one mapping group; and the video file further includes the sample group description container and the mapping relationship container of the sample and the sample group.
本发明实施例第三方面, 提供了一种处理视频的方法。 视频的视频轨道 被划分为至少一个子轨道,每个子轨道由一个子轨道数据描述容器和一个子 轨道数据定义容器描述。 所述方法包括: 接收所述视频对应的视频文件, 所 述视频文件包括至少一个子轨道数据描述容器、至少一个子轨道数据定义容 器以及组成所述视频轨道的样本, 所述子轨道数据描述容器包括所述子轨道 数据描述容器描述的子轨道的区域信息, 所述子轨道的区域信息用于指示在 所述视频的画面中所述子轨道对应的区域, 所述子轨道数据定义容器用于指 道对应的网络提取层 NAL包; 确定在所述视频的画面中需要提取的目标区 域以及需要提取的播放时间段; 根据所述视频文件, 在所述组成所述视频轨 道的样本中确定所述播放时间段对应的样本; 根据所述目标区域以及所述子 轨道数据描述容器包括的子轨道的区域信息,在所述至少一个子轨道中确定 与所述目标区域对应的子轨道作为目标子轨道; 根据所述目标子轨道对应的 子轨道数据定义容器,确定所述播放时间段对应的样本中所述目标子轨道对 应的 NAL包,所述确定的 NAL包被解码后用于播放所述目标区域在所述播 放时间段内的画面。
结合第三方面, 在第一种可能的实现方式中, 所述子轨道对应的区域由 至少一个分块组成; 所述视频文件还包括样本组描述容器, 所述样本组描述 容器包括所述视频轨道中各个分块与 NAL包之间的对应关系以及所述各个 分块与 NAL包之间的对应关系的标识; 所述目标子轨道对应的子轨道数据
NAL包之间的对应关系的标识;
所述根据目标子轨道对应的子轨道数据定义容器,确定所述播放时间段 对应的样本中所述目标子轨道对应的 NAL包, 包括: 根据所述样本组描述 容器和在所述组成视频轨道的样本中所述目标子轨道的每个分块与 NAL包 之间的对应关系的标识,确定所述播放时间段对应的样本中所述目标子轨道 对应的 NAL包。
结合第三方面的第一种可能的实现方式, 在第二种可能的实现方式中, 在所述子轨道对应的区域中, 对于所述组成视频轨道的样本, 标识相同的分 块对应于相同编号的 NAL包。
结合第三方面的第一种可能的实现方式, 在第三种可能的实现方式中, 在所述子轨道对应的区域中, 对于所述组成视频轨道的样本中的至少两个样本, 至少一个标识相同的分块对应于不同编号的 NAL包; 所述目标子轨道对应的子轨道数据定义容器还包括所述目标子轨道的每个分块与 NAL包之间的对应关系的标识所对应的样本信息; 所述根据所述样本组描述容器和在所述组成视频轨道的样本中所述目标子轨道的每个分块与 NAL包之间的对应关系的标识, 确定所述播放时间段对应的样本中所述目标子轨道对应的 NAL包, 包括: 根据所述目标子轨道的每个分块与 NAL包之间的对应关系的标识、 所述目标子轨道的每个分块与 NAL之间的对应关系的标识所对应的样本信息以及所述样本组描述容器, 确定所述播放时间段对应的样本中所述目标子轨道对应的 NAL包。
结合第三方面第一种可能的实现方式至第三种可能的实现方式,在第四 种可能的实现方式中, 所述子轨道数据定义容器还包括分组标识;
在所述根据所述样本组描述容器和在所述组成视频轨道的样本中所述 目标子轨道的每个分块与 NAL包之间的对应关系的标识, 确定所述播放时 间段对应的样本中所述目标子轨道对应的 NAL包之前, 还包括: 根据所述 分组标识, 从所述视频文件中获取具有所述分组标识的所述样本组描述容 器。
结合第三方面, 在第五种可能的实现方式中, 所述子轨道对应的区域由至少一个分块组成; 所述视频文件还包括样本组描述容器, 所述样本组描述容器包括至少一个映射组, 所述至少一个映射组中的每个映射组包括所述视频轨道中各个分块标识与 NAL包之间的对应关系; 所述视频文件还包括样本与样本组映射关系容器, 所述样本与样本组映射关系容器用于指示所述至少一个映射组中每个映射组对应的样本; 所述目标子轨道对应的子轨道数据定义容器包括所述目标子轨道的每个分块的标识; 所述根据所述目标子轨道对应的子轨道数据定义容器, 确定所述播放时间段对应的样本中所述目标子轨道对应的 NAL包, 包括: 根据所述样本组描述容器、所述样本与样本组映射关系容器和所述目标子轨道的每个分块的标识, 确定所述播放时间段对应的样本中所述目标子轨道对应的 NAL包。
结合第三方面的第五种可能的实现方式, 在第六种可能的实现方式中, 所述子轨道数据定义容器包括分组标识; 在所述根据所述样本组描述容器、所述样本与样本组映射关系容器和所 述目标子轨道的每个分块的标识,确定所述播放时间段对应的样本中所述目 标子轨道分别对应的 NAL包之前, 还包括: 根据所述分组标识, 从所述视 频文件中获取具有所述分组标识的所述样本组描述容器和具有所述分组标 识的所述样本与样本组映射关系容器。
本发明实施例的第四方面, 提供了一种处理视频的方法。 所述视频的视频轨道被划分为至少一个子轨道, 所述视频轨道由样本组成。 所述方法包括: 针对所述至少一个子轨道中的每个子轨道, 生成一个子轨道数据描述容器和一个子轨道数据定义容器, 所述子轨道数据描述容器包括所述子轨道数据描述容器描述的子轨道的区域信息, 所述子轨道的区域信息用于指示在所述视频的画面中所述子轨道对应的区域, 所述子轨道数据定义容器用于指示在组成所述视频轨道的样本中所述子轨道数据定义容器描述的子轨道对应的网络提取层 NAL包; 生成所述视频的视频文件, 所述视频文件包括针对所述每一个子轨道生成的所述一个子轨道数据描述容器和所述一个子轨道数据定义容器以及所述组成所述视频轨道的样本; 发送所述视频文件。
结合第四方面, 在第一种可能的实现方式中, 所述子轨道对应的区域由 至少一个分块组成; 所述子轨道数据定义容器包括在所述组成视频轨道的样 本中所述子轨道数据定义容器描述的子轨道的每个分块与 NAL包之间的对 应关系的标识;
在所述生成所述视频的视频文件之前, 所述方法还包括: 生成样本组描 述容器, 所述样本组描述容器包括所述视频轨道中各个分块与 NAL包之间 的对应关系以及所述各个分块与 NAL包之间的对应关系的标识;
所述视频文件进一步包括所述样本组描述容器。
结合第四方面的第一种可能的实现方式, 在第二种可能的实现方式中, 在所述子轨道对应的区域中, 对于所述组成所述视频轨道的样本, 标识相同 的分块对应于相同编号的 NAL包。
结合第四方面, 在第三种可能的实现方式中, 所述子轨道对应的区域由 至少一个分块组成; 所述子轨道数据定义容器包括所述子轨道数据定义容器 描述的子轨道的每个分块的标识;
在所述生成所述视频的视频文件之前, 还包括: 生成样本组描述容器以 及样本与样本组的映射关系容器, 所述样本组描述容器包括至少一个映射 组, 所述至少一个映射组中的每个映射组包括所述视频轨道中各个分块标识 与 NAL包之间的对应关系, 所述样本与样本组映射关系容器用于指示所述 至少一个映射组中每个映射组对应的样本;
所述视频文件进一步包括所述样本组描述容器和所述样本与样本组的 映射关系容器。
本发明实施例的第五方面, 提供了一种处理视频的设备。 视频的视频轨 道被划分为至少一个子轨道,每个子轨道由一个子轨道数据描述容器和一个 子轨道数据定义容器描述, 该设备包括: 存储器、 处理器和接收器; 接收器 接收视频对应的视频文件, 视频文件包括至少一个子轨道数据描述容器、 至 少一个子轨道数据定义容器以及组成视频轨道的样本, 子轨道数据描述容器 包括子轨道数据描述容器描述的子轨道的区域信息,子轨道的区域信息用于 指示在视频的画面中子轨道对应的区域,子轨道数据定义容器用于指示在组 成视频轨道的样本中子轨道数据定义容器描述的子轨道对应的网络提取层 NAL 包。 存储器用于存储可执行指令; 处理器执行存储器中存储的可执行 指令, 用于: 确定在视频的画面中需要提取的目标区域以及需要提取的播放 时间段; 根据接收单元接收的视频文件, 在组成视频轨道的样本中确定播放 时间段对应的样本; 根据目标区域以及子轨道数据描述容器包括的子轨道的 区域信息,在至少一个子轨道中确定与目标区域对应的子轨道作为目标子轨 道; 根据目标子轨道对应的子轨道数据定义容器, 确定播放时间段对应的样 本中目标子轨道对应的 NAL包,确定的 NAL包被解码后用于播放目标区域 在播放时间段内的画面。
本发明实施例的第六方面, 提供了一种处理视频的设备。 视频的视频轨 道被划分为至少一个子轨道, 视频轨道由样本组成。 该设备包括: 存储器、 处理器和发送器。 存储器用于存储可执行指令。 处理器执行存储器中存储的 可执行指令, 用于: 针对至少一个子轨道中的每个子轨道, 生成一个子轨道 数据描述容器和一个子轨道数据定义容器, 子轨道数据描述容器包括该子轨 道数据描述容器描述的子轨道的区域信息, 子轨道的区域信息用于指示在视 频的画面中该子轨道对应的区域, 子轨道数据定义容器用于指示在组成视频 轨道的样本中该子轨道数据定义容器描述的子轨道对应的 NAL包; 生成视 频的视频文件,视频文件包括针对每一个子轨道生成的一个子轨道数据描述 容器和一个子轨道数据定义容器以及组成视频轨道的样本。发送器发送视频 文件。
本发明实施例中,通过根据目标区域以及子轨道数据描述容器描述的子 轨道的区域信息,在至少一个子轨道中确定与目标区域对应的子轨道作为目 标子轨道, 并根据目标子轨道对应的子轨道数据定义容器确定播放时间段对 应的样本中目标子轨道对应的 NAL包,使得能够对这些 NAL包进行解码来 播放目标区域在该播放时间段内的画面,从而能够有效地实现视频中区域画 面的提取。 附图说明
为了更清楚地说明本发明实施例的技术方案, 下面将对本发明实施例中 所需要使用的附图作简单地介绍, 显而易见地, 下面所描述的附图仅仅是本 发明的一些实施例, 对于本领域普通技术人员来讲, 在不付出创造性劳动的 前提下, 还可以根据这些附图获得其他的附图。
图 1是需要提取视频中区域画面的一个场景的示意图。
图 2是需要提取视频中区域画面的另一场景的示意图。
图 3a是根据本发明一个实施例的处理视频的设备的示意性流程图。
图 3b是根据本发明另一实施例的处理视频的设备的示意性流程图。
图 4a是根据本发明另一实施例的处理视频的设备的示意性流程图。
图 4b是根据本发明另一实施例的处理视频的设备的示意性流程图。
图 5a是根据本发明一个实施例的处理视频的方法的示意性流程图。
图 5b是根据本发明另一实施例的处理视频的方法的示意性流程图。
图 6a是可应用本发明实施例的场景中的一个图像帧的示意图。
图 6b是可应用本发明实施例的场景中的另一图像帧的示意图。
图 7是根据本发明一个实施例的处理视频的方法的过程的示意性流程 图。
图 8是根据本发明一个实施例的分块的示意图。
图 9是根据本发明一个实施例的分块与 NAL包之间的对应关系的示意 图。
图 10是根据本发明另一实施例的分块与 NAL包之间的对应关系的示意 图。
图 11是根据本发明另一实施例的分块与 NAL包之间的对应关系的示意 图。
图 12是图 8所示的分块在平面坐标系中的示意图。
图 13是与图 7的过程相对应的处理视频的方法的过程的示意性流程图。
图 14是根据本发明一个实施例的目标区域对应的目标子轨道的示意图。
图 15是根据本发明一个实施例的子轨道的描述信息的示意图。
图 16是根据本发明另一实施例的子轨道的描述信息的示意图。
图 17是根据本发明另一实施例的处理视频的方法的过程的示意性流程 图。
图 18是与图 17 的过程相对应的处理视频的方法的过程的示意性流程 图。
图 19是根据本发明一个实施例的子轨道的描述信息的示意图。 具体实施方式
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行 清楚、 完整地描述, 显然, 所描述的实施例是本发明的一部分实施例, 而不 是全部实施例。 基于本发明中的实施例, 本领域普通技术人员在没有做出创 造性劳动的前提下所获得的所有其他实施例, 都应属于本发明保护的范围。
一个视频节目可以包含不同类型的媒体流, 而不同类型的媒体流可以被 称为不同的轨道(Track )。 如视频流可称为视频轨道, 音频流可称为音频轨 道, 字幕流可称为字幕轨道。 本发明实施例涉及针对视频轨道的处理。
视频轨道可以是指按照时间顺序排列的一组样本, 例如一段时间的视频流。 样本是一个时间戳对应的同一类型的媒体数据, 例如, 对于单视角的视频, 一个图像帧对应于一个样本; 对于多视角的视频, 同一时间点的多个图像帧对应于一个样本。 子轨道 ( Sub Track ) 机制是国际标准组织基于媒体文件格式 ( ISO ( International Organization for Standardization ) Base Media File Format, ISOBMFF ) 中定义的一种对一个视频轨道中的样本 ( Sample ) 进行分组的方法。 子轨道机制主要可以用于媒体选择或媒体切换。 也就是说, 采用一种分组标准得到的多个子轨道之间是互为替代或互为切换的关系。 对于从视频的画面中提取目标区域的画面而言, 也可以理解为对媒体进行选择, 因此, 在本发明实施例中, 可以基于子轨道机制从视频的画面中提取目标区域的画面。
本发明实施例中, 视频可以是通过 HEVC方法进行编码的。 通过 HEVC方法编码的视频可以按照 ISOBMFF定义的框架存储为视频文件。 组成视频文件的基本单元可以是容器 ( Box ), 一个视频文件可以由一组容器组成。 容器可以包含头 ( Header ) 和负载 ( Payload ) 两部分。 负载为容器中包含的数据, 例如可以是媒体数据、 元数据或其它容器。 容器中的头可以指示容器的类型和长度。
具体来说, 在对视频采用 HEVC方法进行编码后, 可以得到视频的视频轨道。 视频的视频轨道可以被划分为至少一个视频子轨道 (本发明实施例简称子轨道), 每个子轨道可以与视频画面中一个区域相对应。 此外, 视频轨道由一组样本组成 (即由至少两个样本组成), 每个样本展现的画面即为视频画面。 因此可以理解的是, 每个样本可以与上述至少一个子轨道的每一个子轨道对应。
由于编码后的视频可以由连续的网络提取层 ( Network Abstraction Layer, NAL ) 包组成, 因此每个样本也是由连续的 NAL包组成。 可以理解 的是,本发明实施例中所述连续的 NAL包指 NAL包之间没用多余的字节空 隙。 每个样本与上述至少一个子轨道中的每一个子轨道都对应, 那么可以理 解的是, 每个子轨道可以对应于一个样本中的一个或多个连续的 NAL包。
由上述可知, 可以通过视频文件中的一组容器描述编码后的视频数据。 本发明实施例中, 每个子轨道可以通过一个子轨道数据描述容器 ( Sub Track Information Box )和一个子轨道数据定义容器( Sub Track Definition Box )来 描述。描述同一个子轨道的子轨道数据描述容器和子轨道数据定义容器可以 被封装在一个子轨道容器( Sub Track Box ) 中。 也就是, 每个子轨道可以通 过一个子轨道容器来描述, 该子轨道容器可以包括描述该子轨道的子轨道数 据描述容器和子轨道数据定义容器。
子轨道数据描述容器可以包括子轨道的区域信息, 子轨道的区域信息可 以指示该子轨道在视频画面中对应的区域。子轨道数据定义容器可以描述子 轨道所包含的数据。 具体来说, 子轨道数据定义容器可以指示在各个样本中 该子轨道数据定义容器描述的子轨道所对应的网络提取层 ( Network Abstraction Layer, NAL ) 包。
因此, 该视频对应的视频文件可以包括至少一个子轨道数据描述容器和至少一个子轨道数据定义容器以及组成视频轨道的样本。 此外, 视频文件还可以包括其它容器。 为了实现对视频画面中的目标区域的提取, 并播放该目标区域在某个播放时间段内的画面, 就需要获取该目标区域在该播放时间段内的 NAL包, 对获取的 NAL包进行解码从而播放目标区域在该播放时间段内的画面。
进一步的, 由于每个子轨道对应于视频画面中一个区域, 那么可以根据 目标区域以及子轨道数据描述容器中的子轨道的区域信息,确定目标区域所 对应的子轨道, 即本发明实施例中所提到的目标子轨道。
此外, 由于视频轨道由按照时间顺序排列的一组样本组成, 因此, 可以 基于需要提取的播放时间段, 确定该播放时间段所对应的样本。
每个子轨道对应的子轨道数据定义容器可以指示在各个样本中该子轨 道对应的 NAL包。 因此, 在确定播放时间段对应的样本后, 就可以根据目 标子轨道对应的子轨道数据定义容器,确定播放时间段对应的样本中目标子 轨道对应的 NAL包。 例如, 确定目标子轨道对应的 NAL包的编号。 这样, 可以从视频文件中获取这些 NAL包, 从而对这些 NAL包进行解码, 以播放 目标区域在上述播放时间段内的画面。
下面将结合本发明实施例详细描述在视频画面中提取目标区域画面的 设备以及相应的过程。
图 3a是根据本发明一个实施例的处理视频的设备的示意性流程图。 图 3a的设备 300a的例子可以是文件解析器, 或者包含文件解析器的用户设备 等。 设备 300a包括接收单元 310a和确定单元 320a。
视频的视频轨道被划分为至少一个子轨道,每个子轨道由一个子轨道数 据描述容器和一个子轨道数据定义容器描述。
接收单元 310a接收视频对应的视频文件, 视频文件包括至少一个子轨 道数据描述容器、 至少一个子轨道数据定义容器以及组成视频轨道的样本, 子轨道数据描述容器包括该子轨道数据描述容器描述的子轨道的区域信息, 子轨道的区域信息用于指示在视频的画面中该子轨道对应的区域, 子轨道数 据定义容器用于指示在组成视频轨道的样本中该子轨道数据定义容器描述 的子轨道对应的 NAL包。确定单元 320a确定在视频的画面中需要提取的目 标区域以及需要提取的播放时间段。 确定单元 320a还根据接收单元 310a接 收的视频文件, 在组成视频轨道的样本中确定播放时间段对应的样本。 确定 单元 320a还根据目标区域以及子轨道数据描述容器包括的子轨道的区域信 息, 在至少一个子轨道中确定与目标区域对应的子轨道作为目标子轨道。 确 定单元 320a还根据目标子轨道对应的子轨道数据定义容器, 确定播放时间 段对应的样本中目标子轨道对应的 NAL包,上述确定的 NAL包被解码后用 于播放目标区域在播放时间段内的画面。
本发明实施例中,通过根据目标区域以及子轨道数据描述容器描述的子 轨道的区域信息,在至少一个子轨道中确定与目标区域对应的子轨道作为目 标子轨道, 并根据目标子轨道对应的子轨道数据定义容器确定播放时间段对 应的样本中目标子轨道对应的 NAL包,使得能够对这些 NAL包进行解码来 播放目标区域在该播放时间段内的画面,从而能够有效地实现视频中区域画 面的提取。
可选地,作为一个实施例,子轨道对应的区域可以由至少一个分块组成。 视频文件还可以包括样本组描述容器,样本组描述容器可以包括视频轨 道中各个分块与 NAL包之间的对应关系以及各个分块与 NAL包之间的对应 关系的标识。 目标子轨道对应的子轨道数据定义容器可以包括在组成视频轨 道的样本中该目标子轨道的每个分块与 NAL包之间的对应关系的标识。
确定单元 320a根据目标子轨道对应的子轨道数据定义容器确定播放时 间段对应的样本中目标子轨道对应的 NAL包可以具体为: 根据样本组描述 容器和在组成视频轨道的样本中目标子轨道的每个分块与 NAL包之间的对 应关系的标识 , 确定播放时间段对应的样本中目标子轨道对应的 NAL包。
可选地, 作为另一实施例, 在子轨道对应的区域中, 对于组成视频轨道 的样本, 标识相同的分块可以对应于相同编号的 NAL包。
可选地, 作为另一实施例, 在子轨道对应的区域中, 对于组成视频轨道 的样本中的至少两个样本, 至少一个标识相同的分块可以对应于不同编号的 NAL 包。 目标子轨道对应的子轨道数据定义容器还可以包括该目标子轨道 的每个分块与 NAL包之间的对应关系的标识所对应的样本信息。
确定单元 320a根据样本组描述容器和在组成视频轨道的样本中目标子 轨道的每个分块与 NAL包之间的对应关系的标识确定播放时间段对应的样 本中目标子轨道对应的 NAL包可以具体为: 根据目标子轨道的每个分块与 NAL包之间的对应关系的标识、 目标子轨道的每个分块与 NAL之间的对应 关系的标识所对应的样本信息以及样本组描述容器,确定播放时间段对应的 样本中目标子轨道对应的 NAL包。 可选地, 作为另一实施例, 子轨道数据定义容器还可以包括分组标识。 确定单元 320a还可以在确定播放时间段对应的样本中目标子轨道对应的 NAL 包之前, 根据该分组标识, 从视频文件中获取具有该分组标识的样本 组描述容器。
可选地,作为另一实施例,子轨道对应的区域可以由至少一个分块组成。 视频文件还可以包括样本组描述容器,样本组描述容器可以包括至少一 个映射组, 至少一个映射组中的每个映射组包括视频轨道中各个分块标识与 NAL包之间的对应关系。 视频文件还可以包括样本与样本组映射关系容器, 样本与样本组映射关系容器用于指示至少一个映射组中每个映射组对应的 样本。 目标子轨道对应的子轨道数据定义容器包括目标子轨道的每个分块的 标识。
确定单元 320a根据目标子轨道对应的子轨道数据定义容器确定播放时 间段对应的样本中目标子轨道对应的 NAL包具体为:根据样本组描述容器、 样本与样本组映射关系容器和目标子轨道的每个分块的标识,确定播放时间 段对应的样本中目标子轨道对应的 NAL包。
可选地, 作为另一实施例, 子轨道数据定义容器可以包括分组标识。 确定单元 320a还可以在确定播放时间段对应的样本中目标子轨道分别 对应的 NAL包之前, 根据分组标识, 从视频文件中获取具有该分组标识的 样本组描述容器和具有该分组标识的样本与样本组映射关系容器。
设备 300a的具体操作和功能可以参照下面图 5a、 图 13或图 18中文件解析器所执行的方法的过程, 为了避免重复, 此处不再赘述。
图 3b是根据本发明另一实施例的处理视频的设备的示意性流程图。 图 3b的设备 300b的例子可以是文件解析器, 或者包含文件解析器的用户设备 等。 设备 300b包括存储器 310b、 处理器 320b和接收器 330b。
存储器 310b可以包括随机存储器、 闪存、 只读存储器、 可编程只读存 储器、非易失性存储器或寄存器等。处理器 320b可以是中央处理器(Central Processing Unit, CPU )。
存储器 310b用于存储可执行指令。 处理器 320b可以执行存储器 310b 中存储的可执行指令。
视频的视频轨道被划分为至少一个子轨道,每个子轨道由一个子轨道数 据描述容器和一个子轨道数据定义容器描述。 接收器 330b接收视频对应的 视频文件, 视频文件包括至少一个子轨道数据描述容器、 至少一个子轨道数 据定义容器以及组成视频轨道的样本, 子轨道数据描述容器包括子轨道数据 描述容器描述的子轨道的区域信息, 子轨道的区域信息用于指示在视频的画 面中子轨道对应的区域, 子轨道数据定义容器用于指示在组成视频轨道的样 本中子轨道数据定义容器描述的子轨道对应的 NAL包。处理器 320b执行存 储器 310b 中存储的可执行指令, 用于: 确定在视频的画面中需要提取的目 标区域以及需要提取的播放时间段; 根据接收单元接收的视频文件, 在组成 视频轨道的样本中确定播放时间段对应的样本; 根据目标区域以及子轨道数 据描述容器包括的子轨道的区域信息,在至少一个子轨道中确定与目标区域 对应的子轨道作为目标子轨道; 根据目标子轨道对应的子轨道数据定义容 器, 确定播放时间段对应的样本中目标子轨道对应的 NAL包, 确定的 NAL 包被解码后用于播放目标区域在播放时间段内的画面。
本发明实施例中,通过根据目标区域以及子轨道数据描述容器描述的子 轨道的区域信息,在至少一个子轨道中确定与目标区域对应的子轨道作为目 标子轨道, 并根据目标子轨道对应的子轨道数据定义容器确定播放时间段对 应的样本中目标子轨道对应的 NAL包,使得能够对这些 NAL包进行解码来 播放目标区域在该播放时间段内的画面,从而能够有效地实现视频中区域画 面的提取。
设备 300b可以执行下面图 5a、 图 13或图 18中文件解析器所执行的方 法的过程。 因此, 设备 300b的具体操作和功能此处不再赘述。
图 4a是根据本发明另一实施例的处理视频的设备的示意性流程图。 图 4a的设备 400a的例子可以是文件生成器,或者包含文件生成器的服务器等。 设备 400a包括生成单元 410a和发送单元 420a。
视频的视频轨道被划分为至少一个子轨道, 视频轨道由样本组成。 生成 单元 410a针对至少一个子轨道中的每个子轨道, 生成一个子轨道数据描述 容器和一个子轨道数据定义容器, 子轨道数据描述容器包括该子轨道数据描 述容器描述的子轨道的区域信息, 子轨道的区域信息用于指示在视频的画面 中该子轨道对应的区域, 子轨道数据定义容器用于指示在组成视频轨道的样 本中该子轨道数据定义容器描述的子轨道对应的 NAL包。生成单元 410a还 生成视频的视频文件,视频文件包括针对每一个子轨道生成的一个子轨道数 据描述容器和一个子轨道数据定义容器以及组成视频轨道的样本。发送单元 420a发送生成单元 410a生成的视频文件。
本发明实施例中, 通过针对至少一个子轨道中的每个子轨道, 生成一个 子轨道数据描述容器和一个子轨道数据定义容器,子轨道数据描述容器包括 子轨道数据描述容器描述的子轨道的区域信息, 子轨道的区域信息用于指示 在视频的画面中子轨道对应的区域, 子轨道数据定义容器包括在组成视频轨 道的样本中子轨道数据定义容器描述的子轨道对应的 NAL包, 并生成包括 针对每个子轨道生成的子轨道数据描述容器和子轨道数据定义容器以及组 成视频轨道的样本的视频文件,使得文件解析器能够根据子轨道的区域信息 确定目标区域对应的目标子轨道, 并能够根据子轨道数据定义容器确定播放 时间段对应的样本中目标子轨道对应的 NAL包, 以播放目标区域在该播放 时间段内的画面, 从而能够有效地实现视频中区域画面的提取。
可选地,作为一个实施例,子轨道对应的区域可以由至少一个分块组成。 子轨道数据定义容器可以包括在组成视频轨道的样本中该子轨道数据定义 容器描述的子轨道的每个分块与 NAL包之间的对应关系的标识。
生成单元 410a可以在生成视频的视频文件之前, 生成样本组描述容器, 样本组描述容器可以包括视频轨道中各个分块与 NAL包之间的对应关系以 及各个分块与 NAL包之间的对应关系的标识。
视频文件可以进一步包括该样本组描述容器。
可选地, 作为另一实施例, 在子轨道对应的区域中, 对于组成视频轨道 的样本, 标识相同的分块可以对应于相同编号的 NAL包。
可选地, 作为另一实施例, 在子轨道对应的区域中, 对于组成视频轨道 的样本中的至少两个样本, 至少一个标识相同的分块可以对应于不同编号的 NAL 包。 子轨道数据定义容器还可以包括该子轨道数据定义容器描述的子 轨道的每个分块与 NAL包之间的对应关系的标识所对应的样本信息。
可选地, 作为另一实施例, 子轨道数据定义容器和样本组描述容器可以 分别包括相同的分组标识。
可选地,作为另一实施例,子轨道对应的区域可以由至少一个分块组成。 子轨道数据定义容器可以包括该子轨道数据定义容器描述的子轨道中 每个分块的标识。
生成单元 410a还可以在生成视频的视频文件之前, 生成样本组描述容 器以及样本与样本组的映射关系容器, 样本组描述容器包括至少一个映射 组, 至少一个映射组中的每个映射组包括视频轨道中各个分块标识与 NAL 包之间的对应关系,样本与样本组映射关系容器用于指示至少一个映射组中 每个映射组对应的样本。
视频文件可以进一步包括样本组描述容器和样本与样本组的映射关系容器。
可选地, 作为另一实施例, 子轨道数据定义容器、 样本组描述容器和样 本与样本组映射关系容器可以分别包括相同的分组标识。
本发明实施例的分组标识可以指在子轨道数据定义容器、 样本组描述容器和样本与样本组映射关系容器中, 分组类型 (grouping_type) 字段的取值。
设备 400a的其它功能和操作可以参照下面图 5b、图 7和图 17中文件生 成器所执行的方法的过程, 为了避免重复, 此处不再赘述。
图 4b是根据本发明另一实施例的处理视频的设备的示意性流程图。 图 4b的设备 400b的例子可以是文件生成器,或者包含文件生成器的服务器等。 设备 400b包括存储器 410b、 处理器 420b和发送器 430b。
存储器 410b可以包括随机存储器、 闪存、 只读存储器、 可编程只读存 储器、非易失性存储器或寄存器等。处理器 420b可以是中央处理器(Central Processing Unit, CPU )。
存储器 410b用于存储可执行指令。 处理器 420b可以执行存储器 410b 中存储的可执行指令。
视频的视频轨道被划分为至少一个子轨道, 视频轨道由样本组成。 处理 器 420b执行存储器 410b中存储的可执行指令, 用于: 针对至少一个子轨道 中的每个子轨道, 生成一个子轨道数据描述容器和一个子轨道数据定义容 器, 子轨道数据描述容器包括该子轨道数据描述容器描述的子轨道的区域信 息, 子轨道的区域信息用于指示在视频的画面中该子轨道对应的区域, 子轨 道数据定义容器用于指示在组成视频轨道的样本中该子轨道数据定义容器 描述的子轨道对应的 NAL包; 生成视频的视频文件, 视频文件包括针对每 一个子轨道生成的一个子轨道数据描述容器和一个子轨道数据定义容器以 及组成视频轨道的样本。
发送器 430b发送视频文件。
本发明实施例中, 通过针对至少一个子轨道中的每个子轨道, 生成一个 子轨道数据描述容器和一个子轨道数据定义容器,子轨道数据描述容器包括 子轨道数据描述容器描述的子轨道的区域信息, 子轨道的区域信息用于指示 在视频的画面中子轨道对应的区域, 子轨道数据定义容器包括在组成视频轨 道的样本中子轨道数据定义容器描述的子轨道对应的 NAL包, 并生成包括 针对每个子轨道生成的子轨道数据描述容器和子轨道数据定义容器以及组 成视频轨道的样本的视频文件,使得文件解析器能够根据子轨道的区域信息 确定目标区域对应的目标子轨道, 并能够根据子轨道数据定义容器确定播放 时间段对应的样本中目标子轨道对应的 NAL包, 以播放目标区域在该播放 时间段内的画面, 从而能够有效地实现视频中区域画面的提取。
设备 400b可以执行下面图 5b、图 7和图 17中文件生成器所执行的方法 的过程, 因此, 设备 400b的具体功能和操作此处不再赘述。
图 5a是根据本发明一个实施例的处理视频的方法的示意性流程图。 图 5a的方法由文件解析器执行。
本发明实施例中, 视频的视频轨道可以划分为至少一个子轨道, 每个子 轨道由一个子轨道数据描述容器和一个子轨道数据定义容器描述。 下面将详 细描述处理视频的方法的过程。
510a, 接收视频对应的视频文件, 视频文件包括至少一个子轨道数据描 述容器、 至少一个子轨道数据定义容器以及组成视频轨道的样本, 子轨道数 据描述容器包括该子轨道数据描述容器描述的子轨道的区域信息, 子轨道的 区域信息用于指示在视频的画面中该子轨道对应的区域, 子轨道数据定义容 器用于指示在组成视频轨道的样本中该子轨道数据定义容器描述的子轨道 对应的 NAL包。
例如, 文件解析器可以从文件生成器接收视频文件。 视频文件包含的至 少一个子轨道数据描述容器中第 m子轨道数据描述容器可以包括该视频轨 道的子轨道中的第 m子轨道的区域信息, 第 m子轨道的区域信息用于指示 在视频的画面中第 m子轨道对应的区域, 第 m子轨道数据定义容器可以用 于指示在组成视频轨道的样本中第 m子轨道对应的 NAL包, m可以为取值 从 1至 M的正整数, M可以为视频轨道包括的至少一个子轨道的数目。
520a, 确定在视频的画面中需要提取的目标区域以及需要提取的播放时 间段。
例如, 目标区域可以是用户或节目提供商通过相应的应用在视频的画面 中指定的, 目标区域可以是单独播放的区域。 播放时间段也可以是用户指定 的。 如果用户未指定播放时间段, 那么播放时间段也可以是默认的, 例如轨 道对应的整个播放时间段。
530a, 根据视频文件, 在组成视频轨道的样本中确定播放时间段对应的 样本。
如前面所述,视频轨道可以由按照时间顺序排列的一组样本组成。因此, 文件解析器可以基于指定的播放时间段, 确定播放时间段对应的样本。 具体 的, 基于指定的播放时间段, 确定播放时间段对应的样本属于现有技术, 本 发明实施例不再详述。
540a, 根据目标区域以及子轨道数据描述容器包括的子轨道的区域信 息, 在至少一个子轨道中确定与目标区域对应的子轨道作为目标子轨道。
550a, 根据目标子轨道对应的子轨道数据定义容器, 确定播放时间段对 应的样本中目标子轨道对应的 NAL包,该确定的 NAL包被解码后用于播放 目标区域在播放时间段内的画面。
每个目标子轨道对应的子轨道数据定义容器可以用于指示在上述组成 视频轨道的样本中该目标子轨道对应的 NAL包。 因此, 在确定播放时间段 对应的样本后, 文件解析器就可以根据子轨道数据定义容器确定这些样本中 每个目标子轨道对应的 NAL包。 这样, 解码器可以对文件解析器确定的这 些 NAL包进行解码, 从而对目标区域在播放时间段内的画面进行播放。
本发明实施例中,通过根据目标区域以及子轨道数据描述容器描述的子 轨道的区域信息,在至少一个子轨道中确定与目标区域对应的子轨道作为目 标子轨道, 并根据目标子轨道对应的子轨道数据定义容器确定播放时间段对 应的样本中目标子轨道对应的 NAL包,使得能够对这些 NAL包进行解码来 播放目标区域在该播放时间段内的画面,从而能够有效地实现视频中区域画 面的提取。
本发明实施例中, 由于子轨道机制用于媒体选择和媒体切换, 因此在视 频文件中往往只有一个子轨道对应于一个轨道, 即使有多个子轨道对应于一 个轨道, 其子轨道的数量也比较少。 而子轨道可以对应于子轨道数据描述容 器和子轨道数据定义容器, 因此能够根据上述两种容器快速地确定播放时间 段内对应的样本中每个目标子轨道分别对应的 NAL包。 因此, 处理时间相 对较少, 用户体验较好。
可选地, 作为一个实施例, 每个子轨道对应的区域可以由至少一个分块 组成, 分块是对画面划分得到的。
在 HEVC方法中, 引入了分块( Tile ) 的概念。 分块是利用井字格对视 频的画面划分得到的矩形区域, 每个分块可以被独立解码。 可以理解的是, 此处说分块是对视频的画面划分得到的,也就是分块是对视频的图像帧划分 得到的。 每个图像帧的分块划分方式都是相同的。 在轨道中, 对于所有样本 来说, 分块数目和分块位置均是相同的。
每个子轨道对应的区域可以由一个分块或多个相邻的分块组成, 这些分 块形成的区域可以为矩形区域。 为了减少子轨道的数量, 可以使得一个子轨 道对应的区域由多个相邻的分块组成, 这些分块可以形成矩形区域。 反之, 如果单个分块反映的内容较多时, 例如一个完整的视频对象, 那么一个子轨 道对应的区域由一个分块组成。 例如, 当视频为高分辨率视频时, 视频的画 面可以划分为多个分块, 单个分块反映的内容往往艮少, 例如只是一个视频 对象的一部分, 视频对象可以指视频画面中的人或物等对象。
可选地, 作为一个实施例, 每个子轨道的区域信息可以包括该子轨道对 应的区域的大小和位置。 也就是, 第 m子轨道的区域信息可以包括第 m子 轨道对应的区域的大小和位置。 例如, 可以通过像素来描述每个子轨道对应 的区域和位置。 比如, 可以通过像素来描述该区域的宽度和高度, 可以通过 该区域相对于视频画面的左上角像素的水平偏移以及垂直偏移来表示该区 域的位置。
在步骤 540a中, 文件解析器可以对每个子轨道对应的区域与目标区域进行比较, 确定子轨道对应的区域与目标区域是否存在交叠, 如果存在交叠, 则可以确定该子轨道对应于目标区域。
具体地, 可以按照下述方式判断一个子轨道对应的区域与目标区域是否存在交叠。 如上所述, 子轨道对应的区域可以为由至少一个分块组成的矩形区域。 而用户或节目提供商指定的目标区域的形状可以是任意的, 例如, 可以为矩形、 三角形或圆形等。 在判断子轨道对应的区域是否与目标区域存在交叠时, 通常基于矩形来判断交叠。 那么, 可以确定目标区域对应的矩形。 如果目标区域本身的形状为矩形, 那么目标区域对应的矩形也就是目标区域自身。 如果目标区域本身的形状不为矩形, 那么需要选择包含该目标区域的矩形来作为判断对象。 例如, 假设目标区域是三角区域, 那么目标区域对应的矩形可以是包含该三角区域的最小矩形。
A ) 文件解析器可以确定目标区域对应的矩形左上角相对于画面左上角的水平偏移。
该子轨道对应的子轨道数据描述容器所包括的该子轨道的区域信息, 区 域信息可以指示该子轨道对应的区域的大小和位置。 因此文件解析器可以根 据该子轨道的区域信息,确定该子轨道对应的区域的左上角相对于画面左上 角的水平偏移, 确定两个水平偏移之间的最大值, 此处将两个水平偏移之间 的最大值称为两个矩形左侧边界最大值。 应理解, 此处提到的画面, 也可以 理解为视频的图像帧。
B )文件解析器可以确定目标区域对应的矩形左上角相对于画面左上角 的垂直偏移。 文件解析器可以根据该子轨道的区域信息, 确定该子轨道对应 的区域的左上角相对于画面左上角的垂直偏移,确定两个垂直偏移之间的最 大值, 此处将两个垂直偏移之间的最大值称为两个矩形上侧边界最大值。
C ) 文件解析器可以确定目标区域对应的矩形左上角相对于画面左上角的水平偏移与目标区域对应的矩形的宽之和。 文件解析器可以根据该子轨道的区域信息, 确定该子轨道对应的区域的左上角相对于画面左上角的水平偏移与该子轨道对应的区域的宽之和, 确定两个宽之和之间的最小值, 此处将该两个宽之和之间的最小值称为两个矩形右侧边界最小值。
D )文件解析器可以确定目标区域对应的矩形左上角相对于画面左上角 的垂直偏移与目标区域画面对应的矩形的高之和。文件解析器可以根据该子 轨道的区域信息,确定该子轨道对应的区域的左上角相对于画面左上角的垂 直偏移与该子轨道对应的区域的高之和, 确定两个高之和之间的最小值, 此 处将两个高之和之间的最小值称为两个矩形下侧边界最小值。
E ) 当两个矩形左侧边界最大值大于或等于两个矩形右侧边界最小值, 或者两个矩形上侧边界最大值大于或等于两个矩形下侧边界最小值时, 文件 解析器可以确定两个区域没有交叠, 否则, 文件解析器可以确定两个区域存 在交叠。
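上述步骤 A ) 至 E ) 的判断逻辑可以用如下 Python 片段示意。 其中函数名 regions_overlap 以及用元组表示矩形的方式均为笔者为说明而假设的, 并非本发明实施例规定的实现:

```python
def regions_overlap(target, subtrack):
    """判断目标区域对应的矩形与子轨道对应的区域是否存在交叠。

    target/subtrack 均为 (水平偏移, 垂直偏移, 宽, 高) 形式的元组,
    偏移以画面左上角为原点, 单位为像素。
    """
    tx, ty, tw, th = target
    sx, sy, sw, sh = subtrack
    left_max = max(tx, sx)              # 步骤 A: 两个矩形左侧边界最大值
    top_max = max(ty, sy)               # 步骤 B: 两个矩形上侧边界最大值
    right_min = min(tx + tw, sx + sw)   # 步骤 C: 两个矩形右侧边界最小值
    bottom_min = min(ty + th, sy + sh)  # 步骤 D: 两个矩形下侧边界最小值
    # 步骤 E: 左侧最大值不小于右侧最小值, 或上侧最大值不小于下侧最小值, 则无交叠
    if left_max >= right_min or top_max >= bottom_min:
        return False
    return True

# 以图 12 的分块布局为例: 目标矩形 (100, 0, 200, 480) 与分块 0 (0, 0, 160, 480) 交叠,
# 与分块 2 (320, 0, 160, 480) 不交叠。
print(regions_overlap((100, 0, 200, 480), (0, 0, 160, 480)))    # True
print(regions_overlap((100, 0, 200, 480), (320, 0, 160, 480)))  # False
```

对每个子轨道依次调用该函数, 返回 True 的子轨道即可作为目标子轨道。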
可选地, 作为另一实施例, 每个子轨道数据描述容器还可以包括信息标 志 ( Flag ),该信息标志可以指示该子轨道数据描述容器中包括该子轨道数据 描述容器描述的子轨道的区域信息。
可选地, 作为另一实施例, 每个子轨道的区域信息还可以包括以下至少 一种信息: 用于指示该子轨道对应的区域能否独立解码的标识信息、 该子轨 道对应的区域所包含的分块标识 ( Identity, ID )以及该子轨道对应的区域的 标识等。
可选地,作为另一实施例,子轨道对应的区域可以由至少一个分块组成。 视频文件还可以包括样本组描述容器,样本组描述容器可以包括视频轨道中 各个分块与 NAL包之间的对应关系以及各个分块与 NAL包之间的对应关系 的标识。
目标子轨道对应的子轨道数据定义容器可以包括在上述组成视频轨道 的样本中该目标子轨道的每个分块与 NAL包之间的对应关系的标识。
在步骤 550a中, 文件解析器可以根据样本组描述容器和目标子轨道的 每个分块与 NAL包之间的对应关系的标识, 确定播放时间段对应的样本中 目标子轨道对应的 NAL包。
每个子轨道对应的区域可以由至少一个分块组成, 因此每个子轨道对应 的 NAL包可以理解为每个子轨道中各个分块对应的 NAL包。每个子轨道数 据定义容器可以包括该子轨道数据定义容器描述的子轨道中各个分块与 NAL包之间的对应关系的标识。 例如, 在下面图 7至图 16的实施例中, 在 子轨道数据定义容器中, 分块与 NAL包之间的对应关系的标识可以是组描 述索引, 使用 "group— description— index" (组描述索引)字段表示。
而样本组描述容器可以包括该视频轨道中各个分块与 NAL包之间的对应关系以及这些对应关系的标识。 例如, 对应关系的标识可以是索引, 索引可以指示对应关系在样本组描述容器的存储位置。 比如, 在下面图 7至图 16的实施例中, 在样本组描述容器中, 对应关系的标识可以是条目索引, 使用 "Entry_Index" (条目索引) 字段表示。 在每种对应关系中, 可以包括分块的标识以及该分块对应的起始 NAL包的编号以及对应的 NAL包的数目。 对于任意一个目标子轨道, 文件解析器可以根据该目标子轨道对应的子轨道数据定义容器, 获取该目标子轨道的各个分块与 NAL包之间的对应关系的标识。 然后, 文件解析器可以根据该目标子轨道的各个分块与 NAL包之间的对应关系的标识, 从样本组描述容器中获取该目标子轨道的各个分块与 NAL包之间的对应关系的标识所指示的对应关系, 基于获取的对应关系确定该目标子轨道对应的 NAL包。
例如, 对于其中任意的一个目标子轨道来说, 文件解析器可以根据在组成视频轨道的样本中该目标子轨道中各个分块与 NAL包之间的对应关系的标识, 在样本组描述容器中查找各个分块与 NAL包之间的对应关系的标识所指示的分块与 NAL包之间的对应关系, 然后可以基于这些查找到的对应关系中的起始 NAL包的编号以及 NAL包的数目, 确定在组成视频轨道的样本中该目标子轨道中各个分块对应的 NAL包, 从而可以确定播放时间段对应的样本中该目标子轨道中各个分块对应的 NAL包。
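上述查找过程可以用如下 Python 片段示意。 其中 sgpd_entries 字典与函数 nal_numbers_for_subtrack 均为笔者假设的简化表示, 条目数值仅与后文表 5 的形式一致, 容器的字节级解析从略:

```python
# 样本组描述容器中的条目: Entry_Index -> (tileID, 起始 NAL包编号, NAL包数目)
sgpd_entries = {
    1: (0, 0, 2),
    2: (1, 2, 3),
    3: (2, 5, 3),
    4: (3, 8, 2),
}

def nal_numbers_for_subtrack(group_description_indexes, entries):
    """情况 (A): 子轨道数据定义容器给出若干组描述索引,
    依次在样本组描述容器中查出每个分块对应的连续 NAL包编号。"""
    nals = []
    for idx in group_description_indexes:
        _tile_id, start, count = entries[idx]
        nals.extend(range(start, start + count))  # 起始编号起连续 count 个 NAL包
    return nals

# 目标子轨道由分块 0 和分块 1 组成 (对应组描述索引 1 和 2) 时:
print(nal_numbers_for_subtrack([1, 2], sgpd_entries))  # [0, 1, 2, 3, 4]
```

可见, 查出的 NAL包编号在样本内是连续的, 解码器据此即可只取目标子轨道对应的数据。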
可选地, 作为另一实施例, 在每个子轨道对应的区域中, 对于组成视频 轨道的样本, 标识相同的分块对应于相同编号的 NAL包。
例如, 对于组成视频轨道的样本, 第 i分块可以对应于相同编号的 NAL 包, i可以为取值从 1至 K的正整数, K可以为一个子轨道对应的区域中分 块的总数目。
具体地, 在组成视频轨道的样本中, 同一个分块标识所指示的分块可以 对应于相同编号的 NAL包。 这种情况下, 样本组描述容器中包含的对应关 系的条数与视频轨道中分块的总数目是相同的, 也就是说, 有多少个分块, 就有多少种对应关系。
这种情况下, 在组成视频轨道的样本中, 相同标识所指示的子轨道可以 对应于相同编号的 NAL包。 那么, 在每个子轨道对应的子轨道数据定义容 器中, 可以不用包含各个样本的样本信息, 比如样本标识或样本数目等。
可选地, 作为另一实施例, 在每个子轨道对应的区域中, 对于组成视频 轨道的样本中的至少两个样本, 至少一个标识相同的分块可以对应于不同编 号的 NAL包。
目标子轨道对应的子轨道数据定义容器还可以包括该目标子轨道中每 个分块与 NAL包之间的对应关系的标识所对应的样本信息。
在步骤 550a中, 文件解析器可以根据目标子轨道的每个分块与 NAL包 之间的对应关系的标识、 目标子轨道的每个分块与 NAL包之间的对应关系 的标识所对应的样本信息以及样本组描述容器,确定播放时间段对应的样本 中目标子轨道对应的 NAL包。
具体地, 在不同的样本中, 同一个分块标识所指示的分块可以对应于不 同编号的 NAL包。 例如, 在至少两个样本中, 第 i分块可以对应于不同编 号的 NAL包, i为取值从 1至 K的正整数, K为一个子轨道对应的区域中分 块的总数目。 这种情况下, 在样本组描述容器中, 相同的分块标识, 可以对应于不同 的起始 NAL包的编号或者 NAL包的数目。
因此, 子轨道数据定义容器还可以包括样本信息, 样本信息可以用于指示每个分块与 NAL包之间的对应关系的标识所对应的样本。 例如样本信息可以包括连续样本数目。 比如, 在下面图 7至图 16的实施例中, 样本数目可以使用 "sample_count" (样本数目) 字段表示。 连续样本数目与对应关系的标识可以是一一对应的。 对应关系的标识是按照对应的连续样本数目所指示的样本在视频轨道中的时间顺序排列的。 也可以理解为, 按照每个分块与 NAL包之间的对应关系对样本进行分组。 例如, 在两个样本中, 如果同一个分块对应于相同的 NAL包, 则这两个样本将对应于同一个对应关系标识, 如果同一个分块对应于不同的 NAL包, 则这两个样本将分别对应于不同的对应关系标识。 对于任意一个目标子轨道, 文件解析器可以根据该目标子轨道对应的子轨道数据定义容器, 获取该目标子轨道中各个分块与 NAL包之间的对应关系的标识以及各个分块与 NAL包之间的对应关系的标识对应的样本信息, 可以根据样本信息确定在播放时间段对应的样本中该目标子轨道中各个分块与 NAL包之间的对应关系的标识, 然后可以根据确定的对应关系的标识, 从样本组描述容器中获取所确定的对应关系的标识指示的对应关系, 从而确定在播放时间段对应的样本中该目标子轨道对应的 NAL包。
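按连续样本数目确定某个样本所对应的对应关系标识的过程, 可以用如下 Python 片段示意。 描述信息的数值取自后文表 10 的例子, 函数名与数据结构均为笔者假设的简化:

```python
# 情况 (B): 子轨道的描述信息是若干 (组描述索引, 连续样本数目) 对,
# 按样本在视频轨道中的时间顺序排列。
descriptions = [(1, 10), (5, 30), (1, 8), (5, 6)]  # (group_description_index, sample_count)

def entry_index_for_sample(sample_number, descriptions):
    """返回第 sample_number 个样本 (从 1 开始计数) 所对应的组描述索引。"""
    first = 1
    for index, count in descriptions:
        if first <= sample_number < first + count:
            return index
        first += count
    raise ValueError("样本编号超出描述信息覆盖的范围")

print(entry_index_for_sample(5, descriptions))   # 1  (第 1-10 个样本)
print(entry_index_for_sample(20, descriptions))  # 5  (第 11-40 个样本)
print(entry_index_for_sample(45, descriptions))  # 1  (第 41-48 个样本)
```

得到组描述索引之后, 再按情况 (A) 中的方式到样本组描述容器中查出对应的 NAL包编号即可。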
可选地,作为另一实施例,每个子轨道数据定义容器可以包括分组标识。 文件解析器可以根据该分组标识,从视频文件中获取具有该分组标识的样本 组描述容器。 也就是说, 子轨道数据定义容器包括的分组标识和样本组描述 容器包括的分组标识相同。
具体地, 在视频文件中, 可能存在多个样本组描述容器, 不同的样本组 描述容器可以用于描述基于不同标准分组的样本的特性。 例如, 可以基于分 块与 NAL包之间的对应关系对视频轨道中的样本进行分组, 针对这种分组 标准的样本组描述容器可以用于描述各个分块与 NAL包之间的对应关系。 可以基于样本所属的时间层进行分组,针对这种分组标准的样本组描述容器 可以用于描述时间层的相关信息。
因此, 为了获取每个目标子轨道中各个分块与 NAL包的对应关系, 文件解析器需要从视频文件中获取描述分块与 NAL包的对应关系的样本组描述容器。 因此, 子轨道数据定义容器和样本组描述容器可以包括取值相同的分组标识, 这样文件解析器可以基于子轨道数据定义容器中的分组标识获取相应的样本组描述容器。 例如, 在下面图 7至图 16的实施例中, 子轨道数据定义容器中的分组标识和样本组描述容器中的分组标识均可以是分组类型, 使用 "grouping_type" (分组类型) 字段表示。
可选地,作为另一实施例,子轨道对应的区域可以由至少一个分块组成。 视频文件还可以包括样本组描述容器, 样本组描述容器包括至少一个映射 组, 至少一个映射组中的每个映射组包括视频轨道中各个分块标识与 NAL 包之间的对应关系。
视频文件还可以包括样本与样本组映射关系容器,样本与样本组映射关 系容器用于指示至少一个映射组中每个映射组对应的样本。
目标子轨道对应的子轨道数据定义容器可以包括该目标子轨道的每个 分块的标识。
在步骤 550a中, 文件解析器可以根据样本组描述容器、 样本与样本组 映射关系容器和目标子轨道的每个分块的标识,确定播放时间段对应的样本 中目标子轨道对应的 NAL包。
具体地, 样本组描述容器可以包括至少一个映射组, 每个映射组可以包 括视频轨道中各个分块与 NAL包之间的对应关系。 每个映射组可以有相应 的标识, 例如, 在下面图 17至图 19的实施例中, 映射组的标识可以是条目 索引, 使用 "Entry— Index" (条目索引) 字段表示。 在每个映射组中, 可以 包括视频轨道中各个分块的标识以及该分块对应的起始 NAL包的编号。
例如, 样本组描述容器可以包括一个映射组, 这种情况下, 对于组成视 频轨道的样本来说, 同一分块标识所指示的分块对应于相同编号的 NAL包。
样本组描述容器可以包括多个映射组。 各个映射组之间是互不相同的。 这种情况下, 对于组成视频轨道的样本来说, 至少一个相同分块标识所指示 的分块对应于不同编号的 NAL包。 也就是说, 任意的两个映射组中, 至少 有一个分块与 NAL包之间的对应关系是不相同的。
这种情况下, 视频文件还可以包括样本与样本组映射关系容器, 样本与 样本组映射关系容器可以用于指示每个映射组对应的样本。 例如, 样本与样 本组映射关系容器可以包括每个映射组的标识以及对应的连续样本数目。 映 射组的标识是按照样本在视频轨道中的时间顺序排列的。从而可以根据样本 与样本组映射关系容器确定在各个样本中每个分块与 NAL包之间的对应关 系。
对于任意一个目标子轨道, 文件解析器可以根据样本与样本组映射关系 容器, 确定播放时间段对应的样本所对应的映射组标识。 然后可以根据确定 映射组标识,在样本组描述容器中确定该映射组标识所指示的映射组。同时, 文件解析器可以根据该目标子轨道对应的子轨道数据定义容器,确定该目标 子轨道中的各个分块标识。 文件解析器可以在上面确定的映射组中, 确定该 目标子轨道中的各个分块标识对应的 NAL包的编号。
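上述基于映射组的查找过程可以用如下 Python 片段示意。 其中 map_groups 与 sample_to_group 中的数值均为笔者假设的示意数据, 并非标准规定的存储格式:

```python
# 样本组描述容器中的映射组: 映射组标识 -> {分块 ID: 起始 NAL包编号}
map_groups = {
    1: {0: 0, 1: 2, 2: 5, 3: 8},
    2: {0: 0, 1: 3, 2: 6, 3: 8},
}
# 样本与样本组映射关系容器: 按时间顺序排列的 (连续样本数目, 映射组标识) 对。
sample_to_group = [(10, 1), (44, 2)]  # 前 10 个样本用映射组 1, 其后 44 个用映射组 2

def start_nal_for_tile(sample_number, tile_id):
    """返回第 sample_number 个样本 (从 1 开始) 中指定分块对应的起始 NAL包编号。"""
    first = 1
    for count, group_id in sample_to_group:
        if first <= sample_number < first + count:
            return map_groups[group_id][tile_id]
        first += count
    raise ValueError("样本编号超出映射关系覆盖的范围")

print(start_nal_for_tile(3, 1))   # 2  (映射组 1 中分块 1 的起始 NAL包编号)
print(start_nal_for_tile(20, 1))  # 3  (映射组 2 中分块 1 的起始 NAL包编号)
```

目标子轨道包含多个分块时, 对其每个分块标识分别执行上述查找即可得到全部对应的 NAL包。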
可选地,作为另一实施例,每个子轨道数据定义容器可以包括分组标识。 文件解析器可以根据该分组标识,从视频文件中获取具有该分组标识的样本 组描述容器和具有该分组标识的样本与样本组映射关系容器。
具体地, 在视频文件中, 可能存在多个样本组描述容器, 不同的样本组 描述容器可以用于描述基于不同标准分组的样本的特性。 例如, 可以基于分 块与 NAL包之间的对应关系对视频轨道中的样本进行分组, 针对这种分组 标准的样本组描述容器可以用于描述各个分块与 NAL包之间的对应关系。 可以基于样本所属的时间层进行分组,针对这种分组标准的样本组描述容器 可以用于描述时间层的相关信息。
相应地, 可能存在多个样本与样本组映射关系容器, 不同的样本与样本 组映射关系容器可以用于指示基于不同分组标准划分的各个样本组。 例如, 可以基于分块与 NAL包之间的对应关系对视频轨道中的样本进行分组, 针 对这种分组标准的样本与样本组映射关系容器可以用于指示基于各个分块 与 NAL包之间的对应关系所划分的各个样本组。 可以基于样本所属的时间 层进行分组,针对这种分组标准的样本与样本组映射关系容器可以用于指示 基于时间层划分的各个样本组。
因此, 为了获取每个目标子轨道中各个分块与 NAL包的对应关系以及 相应的样本分组情况, 文件解析器需要从视频文件中获取用于描述分块与 NAL包的对应关系的样本组描述容器, 并获取用于指示基于分块与 NAL包 的对应关系的划分的各个样本组。 因此, 子轨道数据定义容器、 样本组描述 容器和样本与样本组映射关系容器可以包括取值相同的分组标识, 这样文件 解析器可以基于子轨道数据定义容器中的分组标识获取相应的样本组描述 容器以及样本与样本组映射关系容器。 例如, 在下面图 17至图 19的实施例 中, 子轨道数据定义容器包括的分组标识、 样本组描述容器包括的分组标识 和样本与样本组映射关系容器包括的分组标识均可以是分组类型, 使用
""grouping_type" (分组类型)字段表示。
可选地, 作为另一实施例, 子轨道数据定义容器可以不包括分组标识。 可以预先设定子轨道数据定义容器的分组标识的取值。 这样, 可以先获取存 储的子轨道数据定义容器的分组标识的取值, 然后根据该取值获取相应的样 本组描述容器以及样本与样本组映射关系容器。
图 5b是根据本发明另一实施例的处理视频的方法的示意性流程图。 图 5b的方法由媒体文件生成器执行。 图 5b的方法与图 5a的方法是相对应的, 在图 5b中, 将适当省略相同的描述。 在图 5b的实施例中, 视频的视频轨道 被划分为至少一个子轨道, 视频轨道由样本组成。
510b, 针对至少一个子轨道中的每个子轨道, 生成一个子轨道数据描述 容器和一个子轨道数据定义容器, 子轨道数据描述容器包括子轨道数据描述 容器描述的子轨道的区域信息, 子轨道的区域信息用于指示在视频的画面中 该子轨道对应的区域, 子轨道数据定义容器包括在组成视频轨道的样本中子 轨道数据定义容器描述的子轨道对应的 NAL包。
520b, 生成视频的视频文件, 视频文件包括针对每一个子轨道生成的一 个子轨道数据描述容器和一个子轨道数据定义容器以及组成视频轨道的样 本。
530b, 发送视频文件。
例如, 文件生成器可以向文件解析器发送视频文件。
本发明实施例中, 通过针对至少一个子轨道中的每个子轨道, 生成一个 子轨道数据描述容器和一个子轨道数据定义容器,子轨道数据描述容器包括 子轨道数据描述容器描述的子轨道的区域信息, 子轨道的区域信息用于指示 在视频的画面中子轨道对应的区域, 子轨道数据定义容器包括在组成视频轨 道的样本中子轨道数据定义容器描述的子轨道对应的 NAL包, 并生成包括 针对每个子轨道生成的子轨道数据描述容器和子轨道数据定义容器以及组 成视频轨道的样本的视频文件,使得文件解析器能够根据子轨道的区域信息 确定目标区域对应的目标子轨道, 并能够根据子轨道数据定义容器确定播放 时间段对应的样本中目标子轨道对应的 NAL包, 以播放目标区域在该播放 时间段内的画面, 从而能够有效地实现视频中区域画面的提取。 可选地, 作为一个实施例, 每个子轨道对应的区域可以由至少一个分块 组成。子轨道数据定义容器可以包括在组成视频轨道的样本中该子轨道数据 定义容器描述的子轨道的每个分块与 NAL包之间的对应关系的标识。
在步骤 520b之前, 文件生成器还可以生成样本组描述容器, 样本组描 述容器包括视频轨道中各个分块与 NAL包之间的对应关系以及各个分块与 NAL包之间的对应关系的标识。
视频文件可以进一步包括样本组描述容器。
可选地, 作为另一实施例, 在每个子轨道对应的区域中, 对于组成视频 轨道的样本, 标识相同的分块可以对应于相同编号的 NAL包。
可选地, 作为另一实施例, 在每个子轨道对应的区域中, 对于组成视频 轨道的样本中的至少两个样本, 至少一个标识相同的分块可以对应于不同编 号的 NAL包。
子轨道数据定义容器还可以包括子轨道数据定义容器描述的子轨道的 每个分块与 NAL包之间的对应关系的标识所对应的样本信息。
可选地, 作为另一实施例, 每个子轨道数据定义容器和样本组描述容器 分别包括相同的分组标识。
可选地, 作为另一实施例, 每个子轨道对应的区域可以由至少一个分块 组成。
子轨道数据定义容器可以包括该子轨道数据定义容器描述的子轨道的 每个分块的标识。
在步骤 520b之前, 文件生成器可以生成样本组描述容器以及样本与样 本组的映射关系容器, 样本组描述容器包括至少一个映射组, 至少一个映射 组中的每个映射组包括视频轨道中各个分块标识与 NAL 包之间的对应关 系,样本与样本组映射关系容器用于指示至少一个映射组中每个映射组对应 的样本。
视频文件还可以进一步包括样本组描述容器和样本与样本组的映射关 系容器。
可选地, 作为另一实施例, 子轨道数据定义容器、 样本组描述容器和样 本与样本组映射关系容器分别包括相同的分组标识。
下面将结合具体例子详细描述本发明实施例。 应注意, 这些例子只是为 了帮助本领域技术人员更好地理解本发明实施例, 而非限制本发明实施例的 范围。
图 6a是可应用本发明实施例的场景中的一个图像帧的示意图。 图 6b是 可应用本发明实施例的场景中的另一图像帧的示意图。
图 6a和图 6b可以是播放同一视频时的两个图像帧。 如图 6a和图 6b所 示, 中间的矩形区域可以为用户通过终端所指定的视频画面中的目标区域。 根据用户的需求, 需要单独呈现某段时间内的目标区域的画面。
下面将结合图 6a和图 6b的场景详细描述本发明实施例的处理视频的方 法的过程。 在图 7中, 重点描述生成视频文件的过程。
图 7是根据本发明一个实施例的处理视频的方法的过程的示意性流程 图。 图 7的方法由文件生成器执行。
701 , 文件生成器确定视频轨道中分块与 NAL包之间的对应关系。
具体地, 可以将视频画面划分为多个分块, 也就是, 将视频的图像帧划 分为多个分块。 视频的所有图像帧的分块数目和分块位置均是相同的, 因此 对于组成视频轨道的所有样本来说, 分块数目和分块位置也是相同的。
图 8是根据本发明一个实施例的分块的示意图。 如图 8所示, 可以将图
6a所示的图像帧划分为 4个分块, 即分块 0、 分块 1、 分块 2和分块 3。 4 个分块的大小可以是相同的, 其分块 ID分别为 0、 1、 2和 3。 该视频中其 它图像帧中的分块方式均与图 8相同, 不再赘述。 例如, 假设该视频包括 54 个图像帧, 该视频为单层编码的视频, 那么该视频的视频轨道可以由 54个 样本组成。 每个图像帧中的分块的划分方式均与图 8所示的方式相同, 也就 是, 每个样本对应的分块的划分方式也是与图 8所示的方式相同。
每个分块可以对应连续的一个或多个 NAL包。 具体地, 分块与 NAL包 之间的对应关系可以包括分块 ID、 分块对应的起始 NAL包的编号、 分块对 应的 NAL包的数目。其中,分块对应的起始 NAL包为分块对应的连续 NAL 包中第一个 NAL包。 在下面的描述中, 可以将分块 ID记为 tileID。
由于样本中 NAL包的编号是连续的, 因此通过分块对应的起始 NAL包的编号以及其对应的 NAL包的数目, 就可以确定该分块对应的 NAL包的编号。 对于至少两个样本, 如果其中各个分块对应的起始 NAL包的编号和 NAL包的数目均相同, 则这些样本属于同一个样本组; 否则, 这些样本属于不同的样本组。 关于分块与 NAL包之间的对应关系, 可以存在以下两种情况:
( A )在视频轨道的所有样本中, 相同的分块 ID所指示的分块, 对应于 相同编号的 NAL包。
这种情况下, 分块与 NAL包之间的对应关系的总条数与分块的总数目 可以是相同的。
图 9是根据本发明一个实施例的分块与 NAL包之间的对应关系的示意图。 如图 9所示, 每个分块对应的 NAL包由横向的虚线隔开。 表 1示出了图 9中分块与 NAL包之间的对应关系。 由于所有样本中, 相同的分块 ID所指示的分块, 对应于相同编号的 NAL包, 那么在该视频轨道中, 共有 4种分块与 NAL包之间的对应关系, 也就是分块与 NAL包之间的对应关系的总条数与分块的数目相同。 例如, 分块 0可以对应于 2个 NAL包, 起始 NAL包的编号为 0。 分块 1可以对应于 3个 NAL包, 起始 NAL包的编号为 2。 以此类推。
表 1 分块与 NAL包之间的对应关系
tileID (分块 ID)  NALU_start_number (起始 NAL包的编号)  NALU_number (NAL包的数目)
0  0  2
1  2  3
2  5  3
3  8  2
( B )在视频轨道的至少两个样本中, 相同的分块 ID所指示的分块, 对 应于不同编号的 NAL包。
假设图 6a所示的图像帧的分块的划分方式和图 6b所示的图像帧不同, 也就是, 在图 6a的图像帧对应的样本以及图 6b的图像帧对应的样本中, 相 同分块 ID所指示的分块, 对应于不同编号的 NAL包。 下面通过图 10和表 2的例子说明图 6a所示的图像帧的分块, 并通过图 11和表 3的例子来说明 图 6b所示的图像帧的分块。
图 10是根据本发明另一实施例的分块与 NAL包之间的对应关系的示意 图。 如图 10所示, 图 6a所示的图像帧可以由分块 0至分块 3组成, 每个分 块中 NAL包可以由横向的虚线隔开。 表 2示出了图 10所示的对应关系。 如 表 1所示, 分块 1可以对应于 1个 NAL包, 起始 NAL包的编号为 0。 分块 2可以对应于 3个 NAL包, 起始 NAL包的编号为 2。 以此类推。
表 2 分块与 NAL包之间的对应关系
tileID (分块 ID)  NALU_start_number (起始 NAL包的编号)  NALU_number (NAL包的数目)
0  0  2
1  2  3
2  5  3
3  8  2
图 11是根据本发明另一实施例的分块与 NAL包之间的对应关系的示意图。 如图 11所示, 如上所述, 图 6b所示的图像帧也可以由分块 0至分块 3组成, 每个分块中 NAL包可以通过横线隔开。 在图 11中, 各个分块与 NAL包之间的对应关系不同于图 10所示的对应关系。 表 3示出了图 11所示的对应关系。 如表 3所示, 分块 0可以对应于 3个 NAL包, 起始 NAL包的编号为 0。 分块 1可以对应于 3个 NAL包, 起始 NAL包的编号为 3。 以此类推。
表 3 分块与 NAL包之间的对应关系
tileID (分块 ID)  NALU_start_number (起始 NAL包的编号)  NALU_number (NAL包的数目)
0  0  3
1  3  3
2  6  2
3  8  3
可见, 上述表 2和表 3一起示出了 8种分块与 NAL包之间的对应关系。 此处, 假设在该视频轨道的其它样本中, 分块与 NAL包之间的对应关系符合上述 8种对应关系中的 4种。 因此, 在该视频轨道中, 共有上述 8种分块与 NAL包之间的对应关系。
702, 文件生成器根据步骤 701中的分块与 NAL包之间对应关系, 生成 样本组描述容器。
在样本组描述容器中, 上述对应关系的标识可以是条目索引。 具体地, 样本组描述容器可以包括整数个子样本与 NAL 包的映射关系条目 (Sub Sample NALU Map Entry ), 其具体数量与视频轨道中分块与 NAL包的对应 关系的数目相同。每个子样本与 NAL包的映射关系条目可以包括条目索引、 分块 ID、 该分块对应的起始 NAL包的编号、 该分块对应的 NAL包的数目。 具体地, 每个子样本与 NAL 包的映射关系条目可以包括以下字段: Entry— Index、 tileID、 NALU— start— number和 NALU— number。 "Entry— Index" 字段可以表示条目索引, 也就是分块与 NAL 包之间对应关系的标识。
"tilelD" 字段可以表示分块 ID, "NALU— start— number" 字段可以标识分块 对应的起始 NAL 包的编号, "NALU— number" 字段可以表示分块对应的 NAL包的数目。 各字段的具体含义见表 4。
此外, 样本组描述容器还可以包括图 5a的实施例中提到的分组标识。 在本实施例中, 分组标识可以是分组类型, 分组类型可以使用 "grouping_type" (分组类型) 字段来表示, 该字段的取值可以表示该样本组描述容器用于描述基于分块与 NAL包的对应关系的样本分组。 例如该字段可以取值为 "ssnm"。
按照 ISOBMFF定义的框架, 子样本与 NAL包的映射关系条目的一种 数据结构可以表示如下:
class SubSampleNALUMapEntry() extends VisualSampleGroupEntry('ssnm') {
    unsigned int(6) reserved = 0;
    unsigned int(1) large_size;
    if (large_size) {
        unsigned int(16) NALU_start_number;
        unsigned int(16) NALU_number;
    } else {
        unsigned int(8) NALU_start_number;
        unsigned int(8) NALU_number;
    }
    unsigned int(16) tileID; //分块 ID
}
表 4示出了上述数据结构中各字段的含义。
表 4 子样本与 NAL包的映射关系条目中字段的含义
字段名称 字段含义
large_size 记录分块对应的起始 NAL包的编号和分块对应的 NAL包数目的字段占用的字节数: 取值为 1表示占用 2个字节, 取值为 0表示占用 1个字节
NALU_start_number 分块对应的起始 NAL包的编号
NALU_number 分块对应的 NAL包的个数
tileID 分块 ID
表 5示出了针对分块与 NAL包之间的对应关系为情况 ( A )时的样本组 描述容器所包含的内容。
表 5 样本组描述容器
Grouping_type "ssnm"
Entry_Index (条目索引)  tileID (分块 ID)  NALU_start_number (起始 NAL包的编号)  NALU_number (NAL包的数目)
1  0  0  2
2  1  2  3
3  2  5  3
4  3  8  2
表 6示出了针对分块与 NAL包之间的对应关系为情况 ( B )时的样本组 描述容器所包含的内容。
表 6 样本组描述容器
Grouping_type "ssnm"
Entry_Index (条目索引)  tileID (分块 ID)  NALU_start_number (起始 NAL包的编号)  NALU_number (NAL包的数目)
1 0 0 2
2 1 2 3
3 2 5 3
4 3 8 2
5 0 0 3
6 1 3 3
7 2 6 2
8 3 8 3
在表 5和表 6中, 每一行是一个子样本与 NAL包的映射关系条目记录的对应关系。 其中 "Entry_Index" 字段可以表示每条子样本与 NAL包的映射关系条目在样本组描述容器中的存储位置, 后面的 3个字段是该条目中记录的内容。
703 , 文件生成器基于分块将视频轨道被划分为子轨道。
每个子轨道可以由一个或多个分块组成, 这些分块可以形成一个矩形区域。 本实施例中, 可以假设每个子轨道由一个分块组成, 那么上面所述的 4个分块将分别对应于 4个子轨道。
704, 对于每一个子轨道, 文件生成器生成用于描述该子轨道的子轨道 数据描述容器。
子轨道数据描述容器可以包括该容器描述的子轨道的区域信息。
另外, 每个子轨道数据描述容器还可以包括一个标志, 该标志可以指示该子轨道数据描述容器中包括该子轨道数据描述容器描述的子轨道的区域信息。 具体的, 该标志可以是一个 "flag" 字段, 可以对 "flag" 字段赋予特定的值, 从而指示该子轨道数据描述容器中包括该容器描述的子轨道的区域信息。 例如, "flag" 字段取值为 "1" 时, 可以表示该子轨道数据描述容器中包括该容器描述的子轨道的区域信息。 子轨道的区域信息可以包括该子轨道对应的区域的大小和位置。 表 7示出了子轨道的区域信息中的属性。 如表 7所示, 子轨道对应的区域的大小可以通过该区域的宽度和高度来表示。 子轨道对应的区域的位置可以通过该区域的左上角像素相对于图像的左上角像素的水平偏移和垂直偏移来表示。
当 "flag" 字段指示该容器包括子轨道的区域信息时, 子轨道数据描述 容器的子轨道的区域信息可以包含如下属性:
unsigned int(32) horizontal_offset
unsigned int(32) vertical_offset
unsigned int(32) region_width
unsigned int(32) region_height
unsigned int(32) tile_count //分块数目
for(i = 0; i < tile_count; i++) { //一个区域由至少一个分块组成
    unsigned int(32) tileID
    unsigned int(32) independent
}
表 7 子轨道的区域信息的属性以及对应含义
horizontal_offset 子轨道对应的区域的左上角像素相对于画面左上角像素的水平偏移
vertical_offset 子轨道对应的区域的左上角像素相对于画面左上角像素的垂直偏移
region_width 子轨道对应的区域的宽度
region_height 子轨道对应的区域的高度
tile_count 子轨道对应的区域所包含的分块数目
tileID 子轨道对应的区域所包含的分块的标识
independent 用于指示子轨道对应的区域能否独立解码的标识信息
图 12是图 8所示的分块在平面坐标系中的示意图。
表 8示出了图 12所示的各个分块对应的区域的大小和位置。 如表 8所 示, 通过像素来表示各个分块对应的区域的大小和位置。
表 8 子轨道的区域信息
分块 ID  左上角 X轴坐标(像素)  左上角 Y轴坐标(像素)  宽度(像素)  高度(像素)
0  0  0  160  480
1  160  0  160  480
2  320  0  160  480
3  480  0  160  480
705, 对于每个子轨道, 文件生成器生成用于描述该子轨道的子轨道数据定义容器。
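表 8 中的区域信息可以直接由分块网格计算得到。 以下 Python 片段假设画面为 640×480 像素、 按 4 列 × 1 行的井字格划分 (与图 12 一致), 属示意性计算:

```python
# 假设画面宽 640、高 480 像素, 按 4 列 × 1 行划分为 4 个分块 (与图 12 一致)。
FRAME_W, FRAME_H = 640, 480
COLS, ROWS = 4, 1

def tile_regions():
    """返回每个分块的区域信息: (分块 ID, 左上角 X 坐标, 左上角 Y 坐标, 宽, 高)。"""
    tile_w, tile_h = FRAME_W // COLS, FRAME_H // ROWS
    regions = []
    for row in range(ROWS):
        for col in range(COLS):
            tile_id = row * COLS + col
            regions.append((tile_id, col * tile_w, row * tile_h, tile_w, tile_h))
    return regions

for region in tile_regions():
    print(region)
# (0, 0, 0, 160, 480)
# (1, 160, 0, 160, 480)
# (2, 320, 0, 160, 480)
# (3, 480, 0, 160, 480)
```

计算结果与表 8 中各子轨道的区域信息逐行一致。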
具体地, 子轨道数据定义容器可以包括该子轨道数据定义容器描述的子 轨道的描述信息, 子轨道的描述信息可以指示该子轨道中每一分块与 NAL 包之间的对应关系。
具体地, 子轨道数据定义容器可以包括子轨道和样本组的映射关系容器 ( Sub Track Sample Group Box ), 子轨道和样本组的映射关系容器可以包括 该子轨道的一条或多条描述信息。
基于步骤 701中的情况(A )和(B ), 子轨道的描述信息所包含的具体 内容也可以分为两种情况。
( 1 )针对上述情况(A ), 对于组成视频轨道的样本而言, 相同分块 ID 指示的分块对应于编号相同的 NAL包。 因此, 子轨道和样本组的映射关系 容器可以包括整数条该子轨道的描述信息,每条描述信息可以包括组描述索 引, 组描述索引可以使用 "group— description— index" (组描述索引) 字段来 表示。 "group— description— index" 字段的数目与该子轨道对应的分块数目相 同。 "group— description— index" 字段可以用于指示子轨道数据定义容器描述 的子轨道中各个分块与 NAL包之间的对应关系标识。 每个分块可以对应于 一个样本组, 样本组可以包括一个或多个连续的样本, 样本组是基于分块与 NAL包之间的对应关系划分的。 "group— description— index" 字段的数目也可 以与该子轨道对应的样本组的数目相同。 因此, 子轨道的描述信息的条数与 该子轨道中分块的数目是相同的, 并与该子轨道对应的样本组的数目也是相 同的。
此外, 子轨道和样本组的映射关系容器还可以包括分组类型, 分组类型 可以使用 "grouping— type" (分组类型) 字段来表示, "grouping— type" 字段 可以表示该子轨道数据定义容器描述的是基于分块与 NAL包之间的对应关 系的子轨道信息。 例如, "grouping— type" 字段的取值也可以为 "ssnm"。 可 见, 子轨道数据定义容器中的 "grouping— type" 字段的取值与上述样本组描 述容器中的 "grouping— type" 字段的取值相同, 那么, 子轨道数据定义容器 与上述样本组描述容器是对应的。
按照 ISOBMFF定义的框架, 子轨道和样本组的映射关系容器的一种数据结构可以表示如下:
aligned(8) class SubTrackSampleGroupBox extends FullBox('stsg', 0, 1){
    unsigned int(32) grouping_type; //取值为 "ssnm"
    unsigned int(16) item_count;    //描述信息的条数
    for(i = 0; i < item_count; i++) {
        unsigned int(32) group_description_index;
    }
}
其中, 如上所述, "grouping_type" 可以表示分组类型, "item_count" 可以表示子轨道和样本组的映射关系容器中包含的子轨道的描述信息的条数。 每条描述信息可以包含上述 "group_description_index" 字段。
每个子轨道可以对应一个子轨道容器, 子轨道容器可以包括该子轨道对 应的子轨道数据描述容器和该子轨道对应的子轨道数据定义容器。
表 9示出了在情况( A )中第 1个子轨道的子轨道容器( Sub Track Box ) 的一个例子。 如表 9所示, 在该子轨道容器中, 包括子轨道数据描述容器和 子轨道数据定义容器。 在子轨道数据描述容器中, 可以包括子轨道的属性信 息。 子轨道的属性信息可以包括 ID、 水平偏移、 垂直偏移、 区域宽度、 区 域高度、 分块 ID以及独立性字段。 其中, 子轨道数据描述容器中的 ID也是 子轨道容器的 ID, 可以表示该子轨道容器描述的子轨道。 此外, 水平偏移、 垂直偏移、 区域宽度和区域高度用于表示该子轨道对应的区域的大小和位 置。
子轨道数据定义容器可以包括子轨道和样本组的映射关系容器, 该子轨 道和样本组的映射关系容器包括子轨道的描述信息。子轨道的描述信息可以 用于指示子轨道中各个分块对应的 NAL包。 子轨道的描述信息可以包括组 描述索引。 该子轨道数据定义容器可以包括 "grouping— type" 字段, 该字段 取值为 "ssnm" , 因此该子轨道数据定义容器可以与 "grouping— type" 字段 取值也为 "ssnm" 的样本组描述容器相对应。 本实施例中, 该子轨道数据定 义容器可以对应于表 5所示的样本组描述容器。
如表 9所示, 在上面的假设中, 第 1个子轨道对应的区域由分块 ID为 "0" 的分块组成。 在情况(A )中, 子轨道的描述信息的条数与子轨道对应的分块数目是相同的。 因此, 子轨道和样本组的映射关系容器可以包括一条子轨道的描述信息。 在这条描述信息中, 组描述索引 "group_description_index" 字段取值为 "1", 可以表示组成该视频轨道的样本中分块 ID为 "0" 的分块对应于 "grouping_type" 字段取值为 "ssnm" 的样本组描述容器中 "Entry_Index" 字段取值为 "1" 所指示的对应关系。
应理解, 在情况(A ) 中, 如果子轨道对应的区域由多个分块组成, 相 应地在子轨道和样本组的映射关系容器中可以包括多条子轨道的描述信息, 分块数目与描述信息的条数是相同的。 例如, 子轨道对应的区域由 3个分块 组成, 那么子轨道和样本组的映射关系容器中可以包括子轨道的 3条描述信 息。 表 9 子轨道容器
子轨道数据描述容器 flag="1"
ID  horizontal_offset(水平偏移)  vertical_offset(垂直偏移)  region_width(区域宽度)  region_height(区域高度)  tileID(分块 ID)  Independent(独立性)
    0  0  160  480  0
子轨道数据定义容器
grouping_type "ssnm"
group_description_index (组描述索引)
1
( 2 )针对上述情况(B ), 对于视频轨道中的至少两个样本, 相同分块 ID所指示的分块所对应的 NAL包编号不同。 子轨道的每条描述信息可以包 括一个 "sample— count" (样本数目 ) 字段和一个 "group— description— index" (组描述索引)字段。 "sample— count" 字段可以表示符合分块与 NAL包的 对应关系的连续的样本数目, 也就是 "sample— count" 字段指示了符合该分 块与 NAL包的对应关系的样本组。 "group— description— index"字段可以用于 指示一个样本组中各个分块与 NAL包之间的对应关系标识。 可见, 子轨道 的描述信息的条数与样本组的数目是相同的。
子轨道和样本组的映射关系容器还可以包括 "grouping— type" (分组类 型) 字段, "grouping— type" 字段可以表示该子轨道数据定义容器描述的是 基于分块与 NAL包之间的对应关系的子轨道信息。 例如, "grouping— type" 字段的取值也可以为 " ssnm "。 可见, 子轨道数据定义容器中的 "grouping— type" 字段的取值与上述样本组描述容器中的 "grouping— type" 字段的取值相同, 那么, 子轨道数据定义容器与上述样本组描述容器是对应 的。
子轨道的各条描述信息的排列顺序按照 "sample— count" 字段指示的连 续样本在视频轨道中的顺序进行排列。
按照 ISOBMFF定义的框架, 子轨道和样本组的映射关系容器的一种数 据结构可以表示如下:
aligned(8) class SubTrackSampleGroupBox extends FullBox('stsg', 0, 1){
    unsigned int(32) grouping_type; //取值为 "ssnm"
    unsigned int(16) item_count;    //描述信息的条数
    for(i = 0; i < item_count; i++) {
        unsigned int(32) group_description_index;
        unsigned int(8) sample_count;
    }
}
可见, 在子轨道和样本组映射关系容器的数据结构中, 定义了上述的各 个字段。 该数据结构中, "item— count" 可以表示子轨道的描述信息的条数, 在子轨道的每条描述信息中, 包括上述 " sample— count " 字段和 "group— description— index" 字段。
每个子轨道可以对应一个子轨道容器, 子轨道容器可以包括该子轨道对 应的子轨道数据描述容器和该子轨道对应的子轨道数据定义容器。
表 10示出了在情况(B )中第 1个子轨道对应的子轨道容器的一个例子。 如表 10所示, 该子轨道容器可以包括子轨道数据描述容器和子轨道数 据定义容器。 子轨道数据描述容器可以包括子轨道的属性信息, 属性信息可 以包括 ID、 水平偏移、 垂直偏移、 区域宽度、 区域高度、 分块 ID以及独立 性字段。 子轨道数据定义容器可以包括子轨道和样本组的映射关系容器, 子 轨道和样本组的映射关系容器可以包括子轨道的描述信息。子轨道的描述信 息可以用于指示子轨道中各个分块对应的 NAL包。 具体来说, 子轨道的描 述信息可以包括组描述索引和样本数目。
如前面所假设的, 图 6a和图 6b的图像帧所属的视频可以包括 54个图 像帧, 该视频可以是单层编码的视频, 那么每个图像帧可以对应一个样本, 共有 54个样本。
该子轨道数据定义容器可以包括 "grouping— type" 字段, 该字段取值为
"ssnm" , 因此该子轨道数据定义容器可以与 "grouping— type" 字段取值也 为 "ssnm" 的样本组描述容器相对应。 在本实施例中, 该子轨道数据定义容 器可以对应于表 6所示的样本组描述容器。 在上面的假设中, 第 1个子轨道 对应的区域由分块 ID为 "0" 的分块组成。
如表 10所示, 在子轨道的第 1条描述信息中, "group_description_index" 字段取值为 "1", "sample_count" 字段取值为 "10"。 具体来说, 第 1至第 10这 10个样本中分块 ID为 "0" 的分块可以对应 "grouping_type" 字段取值也为 "ssnm" 的样本组描述容器中 "Entry_Index" 字段取值为 "1" 所指示的分块与 NAL包之间的对应关系。 在子轨道的第 2条描述信息中, "group_description_index" 字段取值为 "5", "sample_count" 字段取值为 "30", 那么可以表示, 第 11至第 40这 30个样本中分块 ID为 "0" 的分块可以对应上述样本组描述容器中 "Entry_Index" 字段取值为 "5" 所指示的分块与 NAL包之间的对应关系。 在子轨道的第 3条描述信息中, "group_description_index" 字段取值为 "1", "sample_count" 字段取值为 "8", 可以表示, 第 41至第 48这 8个样本中分块 ID为 "0" 的分块可以对应上述样本组描述容器中 "Entry_Index" 字段取值为 "1" 所指示的分块与 NAL包之间的对应关系。 在子轨道的第 4条描述信息中, "group_description_index" 字段取值为 "5", "sample_count" 字段取值为 "6", 可以表示, 第 49至第 54这 6个样本中分块 ID为 "0" 的分块可以对应该样本组描述容器中 "Entry_Index" 字段取值为 "5" 所指示的分块与 NAL包之间的对应关系。
应理解, 在情况(B )中, 如果子轨道对应的区域由多个分块组成, 那么, 子轨道的描述信息的条数也会发生相应变化。 如上所述, 针对每个分块与 NAL包的对应关系, 可以对样本进行分组。 例如, 如果子轨道对应的区域由 2个分块组成, 基于第 1个分块与 NAL包之间的对应关系, 可以将样本分为 4组; 基于第 2个分块与 NAL包之间的对应关系, 可以将样本分为 3组。 那么, 子轨道和样本组映射关系容器中可以有 7条描述信息。

表 10 子轨道容器
子轨道容器
子轨道数据描述容器 flag="1"
ID  horizontal_offset  vertical_offset  region_width  region_height  tileID    Independent
    (水平偏移)          (垂直偏移)       (区域宽度)     (区域高度)     (分块 ID)  (独立性)
1   0                  0                160           480            0         1
子轨道数据定义容器
子轨道和样本组的映射关系容器 grouping_type "ssnm"
group_description_index (组描述索引)    sample_count (样本数目)
1                                       10
5                                       30
1                                       8
5                                       6
706, 文件生成器生成视频文件, 该视频文件包括上述样本组描述容器、 用于描述各个子轨道的子轨道数据描述容器和用于描述各个子轨道的子轨 道数据定义容器以及组成视频轨道的样本。
具体地, 该视频文件可以包括每个子轨道对应的子轨道容器, 子轨道容 器可以包括该子轨道对应的子轨道数据描述容器和子轨道数据定义容器。
例如, 在本实施例中, 视频文件可以包括一个 "grouping type" 字段取 值为 "ssnm" 的样本组描述容器和 4个子轨道容器, 并可以包括组成视频轨 道的样本。
707, 文件生成器向文件解析器发送视频文件。
本发明实施例中,针对每个子轨道生成一个子轨道数据描述容器以及一 个子轨道数据定义容器, 并生成包括用于描述每个子轨道的子轨道描述容器 和用于描述每个子轨道的子轨道数据定义容器的视频文件, 由于每个子轨道 数据描述容器包括子轨道的区域信息,每个子轨道数据定义容器包括子轨道 的描述信息,子轨道的描述信息用于指示子轨道中各个分块对应的 NAL包, 使得文件解析器能够根据子轨道的区域信息确定目标区域对应的目标子轨 道, 并根据目标子轨道的子轨道数据定义容器中的目标子轨道的描述信息以 及样本组描述容器,确定播放时间段内的样本中目标子轨道对应的 NAL包, 以播放目标区域在该播放时间段内的画面,从而能够有效地实现视频中区域 画面的提取。
上面介绍了生成视频文件的过程, 下面将介绍根据视频文件从视频中提 取目标区域的画面的过程。 图 13的过程与图 7的过程是对应的, 将适当省 略相同的描述。
图 13是与图 7的过程相对应的处理视频的方法的过程的示意性流程图。 图 13的方法由文件解析器执行。
1301 , 文件解析器从文件生成器接收视频文件。
视频的视频轨道可以划分为至少一个子轨道。视频文件可以包括至少一 个子轨道数据描述容器和至少一个子轨道数据定义容器以及组成视频轨道 的样本。每个子轨道可以由一个子轨道数据描述容器和一个子轨道数据定义 容器描述。
1302, 文件解析器确定在视频画面中要提取的目标区域的大小和位置, 及需要提取的播放时间段。
具体地, 文件解析器可以从应用获取要提取的目标区域对应的矩形的大 小和位置, 以及由用户选择或者应用决定的要提取的目标区域对应的播放时 间段。
如图 3的实施例中所描述的, 用户或节目提供商指定的目标区域的形状可以是任意的, 例如, 可以为矩形、 三角形或圆形等。 在判断子轨道对应的区域是否与目标区域存在交叠时, 通常基于矩形来判断交叠。 那么, 可以确定目标区域对应的矩形。 如果目标区域本身的形状为矩形, 那么目标区域对应的矩形也就是目标区域自身。 如果目标区域本身的形状不为矩形, 那么需要选择包含该目标区域的矩形来作为判断对象。 例如, 假设目标区域是三角区域, 那么目标区域对应的矩形可以是包含该三角区域的最小矩形。 目标区域对应的矩形的大小可以通过该矩形的宽度和高度来表示, 目标区域对应的矩形的位置可以通过该矩形左上角相对于画面左上角的水平偏移和垂直偏移来表示。

1303, 文件解析器根据视频文件确定播放时间段对应的样本。

文件解析器可以根据需要提取的播放时间段, 从视频轨道中选择该播放时间段内的一个或多个样本。 例如, 以上述例子为例进行说明, 假设视频包含 54个图像帧, 该播放时间段可以对应于第 20帧至第 54帧。 那么, 该播放时间段可以对应于第 20个样本至第 54个样本。 具体的, 确定播放时间段对应的样本为现有技术, 本发明实施例不再详述。
1304, 文件解析器从视频文件中获取所有的子轨道数据描述容器。
子轨道数据描述容器可以包括该子轨道数据描述容器描述的子轨道的 区域信息。 每个子轨道的区域信息用于指示该子轨道对应的区域。
1305 , 文件解析器根据目标区域对应的矩形的大小和位置以及每个子轨 道数据描述容器中的子轨道的区域信息,确定目标区域对应的子轨道作为目 标子轨道。
在下面将目标区域对应的子轨道称为目标子轨道。 具体地, 文件解析器可以根据图 3的实施例所描述的方式, 对每个子轨道对应的区域与目标区域进行比较, 确定子轨道对应的区域与目标区域是否存在交叠, 如果存在交叠, 则可以确定该子轨道对应于目标区域。
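在判断交叠时, 可以用一个简单的矩形相交测试来实现上述比较。 下面是一段示意性的 Python 代码(仅为示例草图, 其中的函数名与坐标数值均为假设, 并非本发明的规范实现):

```python
def rect_overlaps(a, b):
    """矩形以 (水平偏移, 垂直偏移, 宽度, 高度) 四元组表示, 判断两矩形是否存在交叠。"""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # 两矩形相交, 当且仅当在水平和垂直两个方向上的投影区间都相交
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

target = (200, 100, 300, 300)    # 目标区域对应的矩形(假设值)
subtrack = (160, 0, 160, 480)    # 某子轨道对应的区域(宽 160、高 480, 取自正文表中的取值)
hit = rect_overlaps(target, subtrack)
```

其中矩形的四元组与子轨道数据描述容器中的水平偏移、 垂直偏移、 区域宽度、 区域高度字段一一对应。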
在图 6a和图 6b所示的图像帧中, 假设目标区域本身为矩形。 图 14是 根据本发明一个实施例的目标区域对应的目标子轨道的示意图。
如图 14所示, 对目标区域的大小和位置以及 4个子轨道容器中子轨道 数据描述容器里子轨道对应的区域进行比较,确定目标区域对应的目标子轨 道为第 2个子轨道和第 3个子轨道。 即, 第 2个子轨道和第 3个子轨道为目 标子轨道。
1306, 文件解析器从视频文件中获取目标子轨道对应的子轨道数据定义容器。

例如, 上述目标区域对应第 2个子轨道和第 3个子轨道, 可以从视频文件中获取第 2个子轨道和第 3个子轨道分别对应的子轨道数据定义容器。
1307, 文件解析器根据上述播放时间段以及目标子轨道对应的子轨道数 据定义容器, 确定播放时间段对应的样本中目标子轨道的描述信息。
例如, 可以根据目标区域对应的播放时间段以及第 2个子轨道和第 3个 子轨道分别对应的子轨道数据定义容器, 确定播放时间段对应的样本中第 2 个子轨道的描述信息和第 3个子轨道的描述信息。 如图 7的步骤 701所述, 关于分块与 NAL包之间的对应关系可以存在 两种情况。 下面将分别针对这两种情况, 结合具体例子对步骤 1307进行描 述。
( 1 )对于组成视频轨道的样本, 相同分块 ID所指示的分块对应相同编 号的 NAL包。
在这种情况下,文件解析器可以直接从目标子轨道对应的子轨道数据定 义容器中的子轨道和样本组映射关系容器中, 获取该目标子轨道的描述信 息, 该目标子轨道的描述信息也就是播放时间段对应的样本中该目标子轨道 的描述信息。
下面以第 2个子轨道为例, 结合图 15进行说明。 图 15是根据本发明一 个实施例的子轨道的描述信息的示意图, 以表示视频轨道中所有样本中相同 分块 ID所指示的分块对应相同编号的 NAL包, 各个样本中分块与 NAL包 之间的对应关系都相同。
具体地, 文件解析器可以从第 2个子轨道对应的子轨道数据定义容器中 的子轨道和样本组映射关系容器, 获取第 2个子轨道的描述信息。 在第 2个 子轨道的各条描述信息中, "group— description— index" (组描述索引)字段有 不同的取值。 "group— description— index" 字段的取值的数目可以与该子轨道 对应的分块数目相同。
由于在这种情况下, 组成视频轨道的样本中相同分块 ID所指示的分块对应相同编号的 NAL包, 各个样本中分块与 NAL包之间的对应关系都相同。 因此, 对于每个子轨道来说, 所有样本可以共用同样的描述信息, 因此第 2个子轨道的描述信息即为播放时间段对应的样本中第 2个子轨道的描述信息。 如图 15所示, 第 2个子轨道对应于 ID为 "2" 的子轨道容器。 在播放时间段对应的样本中, 第 2个子轨道的描述信息中 "group_description_index" 字段的取值为 "2"。

第 3个子轨道对应的过程类似于第 2个子轨道, 不再赘述。 如图 15所示, 第 3个子轨道对应于 ID为 "3" 的子轨道容器。 在播放时间段对应的样本中, 第 3个子轨道的描述信息中 "group_description_index" 字段的取值为 "3"。
( 2 )在组成视频轨道的样本的至少两个样本中, 相同的分块 ID所指示 的分块, 对应于不同编号的 NAL包。
在这种情况下,文件解析器可以在目标子轨道对应的子轨道数据定义容 器中的子轨道和样本组映射关系容器中,根据该目标子轨道的各条描述信息 中 "sample— count" 字段的取值, 确定播放时间段对应的样本所对应的描述 信息, 这些描述信息即为播放时间段对应的样本中该目标子轨道的描述信 息。 下面将以第 2个子轨道为例, 结合图 16来进行说明。 图 16是根据本发 明另一实施例的子轨道的描述信息的示意图, 以表示在视频轨道的至少两个 样本中, 相同分块 ID所指示的分块对应于不同编号的 NAL包。
具体地, 可以从第 2个子轨道对应的子轨道数据定义容器中的子轨道和 样本组映射关系容器中, 获取第 2个子轨道的描述信息。 在第 2个子轨道的 各条描述信息中, "group— description— index" (组描述索引)字段以及相应的 "sample_count" (样本数目 )字段有着不同的取值。 每条描述信息可以包含 一个 "sample— count" 字段的取值和一个 "group— description— index" 字段的 取值。 "sample— count" 字段可以表示符合相应的 "group— description— index" 字段所指示的分块与 NAL包之间的对应关系的连续样本数目。
此夕卜, 因为已知 "group— description— index" 字段各个取值对应的连续样 本数目, 因此可以确定播放时间段对应的样本中第 2个子轨道的描述信息。 例如, 如图 16所示, 第 2个子轨道对应于 ID为 "2" 的子轨道容器。 第 2 个子轨道的描述信息共有 4条。 "sample— count" 字段的取值为 "10" , 可以 表示第 1至第 10个样本对应第 1条描述信息。 "sample— count" 字段的取值 为 "30" ,可以表示第 11至第 40个样本对应第 2条描述信息。 "sample— count" 字段的取值为 "8" , 可以表示第 41 至第 48 个样本对应第 3 条描述信息。
"sample— count" 字段的取值为 "6" , 可以表示第 49至第 54个样本对应第 4条描述信息。 如上假设, 播放时间段对应的样本为第 20至第 54个样本。 在播放时间段对应的样本中, 第 2个子轨道的描述信息为该子轨道对应的子 轨道和样本组的映射关系容器中的第 2、 3和 4条描述信息。
确定播放时间段对应的样本中第 3个子轨道对应的描述信息的过程类似 于第 2个子轨道, 不再赘述。 如图 16所示, 第 3个子轨道对应于 ID为 "3" 的子轨道容器。在播放时间段对应的样本中第 3个子轨道的描述信息为该子 轨道对应的子轨道和样本组的映射关系容器中的第 2、 3和 4条描述信息。
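确定播放时间段对应的样本落在哪几条描述信息中, 本质上是对 "sample_count" 游程区间与播放区间求交。 下面是一段示意性的 Python 代码(仅为示例草图, 函数名为假设, 数值取自上文第 2个子轨道的例子):

```python
def entries_for_play_range(sample_counts, first_sample, last_sample):
    """sample_counts: 各条描述信息的 sample_count 取值列表, 按排列顺序给出。
    返回覆盖 [first_sample, last_sample] 的描述信息序号(从 1 开始)。"""
    hit = []
    start = 1
    for i, count in enumerate(sample_counts, 1):
        end = start + count - 1
        # 区间 [start, end] 与播放区间相交即命中
        if start <= last_sample and end >= first_sample:
            hit.append(i)
        start = end + 1
    return hit

# 正文例子: 4 条描述信息的样本数目为 10, 30, 8, 6; 播放时间段对应第 20 至第 54 个样本
idx = entries_for_play_range([10, 30, 8, 6], 20, 54)
```

运行结果应为第 2、 3、 4 条描述信息, 与正文的结论一致。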
1308, 文件解析器根据目标子轨道的描述信息以及样本组描述容器, 确定播放时间段对应的样本中目标子轨道对应的 NAL包的编号。

例如, 根据第 2个子轨道的描述信息、 第 3个子轨道的描述信息以及样本组描述容器, 确定这两个子轨道对应的 NAL包的编号。 在该步骤中, 仍将针对图 7的步骤 701所述的两种情况进行描述。
( 1 )对于组成视频轨道的样本, 相同分块 ID所指示的分块对应相同编 号的 NAL包。
具体地, 文件解析器可以确定目标子轨道对应的子轨道和样本组映射关 系容器中的 "grouping_type" (分组类型)字段取值为 "ssnm" , 其取值可以 作为本发明实施例的分组标识,然后可以从视频文件中获取 "grouping— type" 字段取值为 "ssnm" 的样本组描述容器。 文件解析器可以从该样本组描述容 器中获取与 " group— description— index " (组描述索引 ) 字段取值相同的 "Entry— Index" (条目索引)字段所指示的分块与 NAL包之间的对应关系, 根据获取的分块与 NAL包之间的对应关系确定该子轨道对应的 NAL包的编 号。
下面以第 2个子轨道为例, 结合图 15进行说明。
如图 15所示,在第 2个子轨道的描述信息中, "group— description— index" 字段取值为 "2"。那么,在样本组描述容器中获取取值为 "2"的 "Entry— Index" 字段所指示的分块与 NAL包之间的对应关系。 可见, 第 2个子轨道对应的 NAL包的编号分别为 2、 3和 4。
第 3个子轨道对应的过程类似于第 2个子轨道, 不再赘述。 如图 15所 示, 第 3个子轨道对应的 NAL包的编号分别为 5、 6和 7。
( 2 )在组成视频轨道的样本的至少两个样本中, 相同的分块 ID所指示 的分块, 对应于不同编号的 NAL包。
具体地, 文件解析器可以确定目标子轨道对应的子轨道和样本组映射关 系容器中的 "grouping— type" (分组类型)字段取值为 "ssnm" , 然后可以从 视频文件中获取 "grouping— type" 字段取值为 "ssnm" 的样本组描述容器。 然后可以从该样本组描述容器中获取与 "group— description— index" (组描述 索引)字段取值相同的 "Entry— Index" (条目索引)字段所指示的分块与 NAL 包之间的对应关系, 根据获取的分块与 NAL包之间的对应关系确定该子轨 道对应的 NAL包的编号。
下面以第 2个子轨道为例, 结合图 16进行说明。
如图 16所示, 以第 20个样本为例进行说明。 在第 20个样本中, 第 2个子轨道的描述信息中, "group_description_index" 字段取值为 "6"。 那么, 在样本组描述容器中获取取值为 "6" 的 "Entry_Index" 字段所指示的分块与 NAL包之间的对应关系。 可见, 在第 20个样本中, 第 2个子轨道对应的 NAL包的编号分别为 3、 4和 5。
第 3个子轨道对应的过程类似于第 2个子轨道, 不再赘述。 如图 16所 示, 在第 20个样本中, 第 3个子轨道对应的 NAL包的编号分别为 6和 7。
对于播放时间段对应的每个样本,例如上述假设的第 20至第 54个样本, 确定 NAL包的编号的过程与上述第 20个样本的情况类似, 不再赘述。
1309, 根据步骤 1308中确定的 NAL包的编号, 从视频文件中获取相应 的 NAL包, 以便解码器对这些 NAL包进行解码, 以播放目标区域在播放时 间段内的画面。
例如, 当这些 NAL包对应的矩形区域超出目标区域时, 可以对该矩形 区域进行裁剪, 从而播放目标区域的画面。
本发明实施例中, 通过根据目标区域以及子轨道数据描述容器描述的子轨道的区域信息, 确定目标区域对应的子轨道作为目标子轨道, 并根据目标子轨道对应的子轨道数据定义容器中的目标子轨道的描述信息以及样本组描述容器, 确定播放时间段对应的样本中目标子轨道对应的 NAL包的编号, 使得能够对这些 NAL包进行解码来播放目标区域在该播放时间段内的画面, 从而能够有效地实现视频中区域画面的提取。
下面仍将结合图 6a和图 6b所示的场景描述本发明实施例。 在图 17中, 重点描述生成视频文件的过程。
图 17是根据本发明另一实施例的处理视频的方法的过程的示意性流程 图。 图 17的方法由文件生成器执行。
1701 , 文件生成器确定视频的轨道中分块与 NAL包之间的对应关系。 具体地, 可以将视频画面划分为多个分块, 也就是, 将视频的图像帧划 分为多个分块。 视频的所有图像帧的分块数目和分块位置均是相同的, 因此 对于轨道的样本来说, 分块数目和分块位置也是相同的。
在该实施例中, 分块示意图仍可以参见图 8。 如图 8所述, 每个图像帧 可以被划分为 4个分块, 即分块 0、 分块 1、 分块 2和分块 3。 相应地, 每个 样本对应的分块即为分块 0、 分块 1、 分块 2和分块 3。
分块与 NAL包之间的对应关系可以分组, 即下面所述的映射组。 对于 组成视频轨道的样本来说, 同一分块标识所指示的分块对应于相同编号的 NAL包, 这种情况下, 共有一个映射组。
对于组成视频轨道的样本来说, 至少一个相同分块标识所指示的分块对 应于不同编号的 NAL包。 这种情况下, 可以有多个映射组。 也就是说, 任 意的两个映射组中,至少有一个分块与 NAL包之间的对应关系是不相同的。
每个映射组具有标识, 本实施例中, 映射组的标识可以为条目索引。 例如, 假设针对于图 6a所示的图像帧, 分块与 NAL包之间的对应关系 如表 11所示。
表 11 映射组
假设针对于图 6b所述的图像帧, 分块与 NAL 包之间的对应关系如表 所示。
表 12 分块与 NAL包之间的对应关系
此处, 假设在该视频轨道的其它样本中, 分块与 NAL包之间的对应关 系符合上述两个映射组中的其中一组。 因此, 在该视频轨道中, 共有 2组分 块与 NAL包之间的对应关系, 即共有两个映射组。
1702, 根据步骤 1701中的分块与 NAL包之间的对应关系, 生成样本组 描述容器。
在样本组描述容器中, 可以包括整数个分块与 NAL包的映射关系条目 ( Tile NALU Map Entry ), 其具体数量与上述映射组的组数相同。 每个分块 与 NAL包的映射关系条目包括各个分块与 NAL包之间的对应关系。 按照 ISOBMFF定义的框架, 分块与 NAL包的映射关系条目的一种数 据结构可参考步骤 702中描述的数据结构。
class TileNALUMapEntry() extends VisualSampleGroupEntry('tlnm') {
    unsigned int(6) reserved = 0;
    unsigned int(1) large_size;
    if (large_size) {
        unsigned int(16) entry_count;
    } else {
        unsigned int(8) entry_count;
    }
    for (i = 1; i <= entry_count; i++) {
        if (large_size) {
            unsigned int(16) NALU_start_number;
        } else {
            unsigned int(8) NALU_start_number;
        }
        unsigned int(16) tileID;    // 分块 ID
    }
}
表 13示出了上述数据结构中各字段的含义。
表 13 分块与 NAL包的映射关系条目中字段含义

字段名称               字段含义
large_size             记录分块对应的起始 NAL包的编号和分块对应的 NAL包数目的字段占用的字节数:
                       取值为 1 表示占用 2 个字节; 取值为 0 表示占用 1 个字节
NALU_start_number      分块对应的起始 NAL包的编号
entry_count            样本中分块的数目
tileID                 分块 ID

例如, 表 14 示出了样本组描述容器所包含的内容。 如表 14 所示,
"grouping_type" (分组类型)字段的取值为 "tlnm"。 其中, 表 14中, 包括 两个映射组, 每个映射组中包括 4个分块与 NAL包之间的对应关系。 其中 "Entry— Index" 字段用于表示每个映射组在样本组描述容器中的存储位置。
表 14样本组描述容器
1703 , 根据步骤 1701中确定的分块与 NAL包之间的对应关系, 生成样 本与样本组的映射关系容器。
具体地,样本与样本组的映射关系容器可以包括整数条样本与映射组之 间的对应关系。 在每条样本与映射组之间的对应关系中, 可以包括一个 " sample_count " (样本数目 ) 字段和一个 " Index " (索引 ) 字段。 "sample— count" 字段可以表示有 "sample— count"个连续的样本符合相应的 "Index" 所指示的映射组中分块与 NAL包之间的对应关系。 各种样本与映 射组之间的对应关系的排列顺序按照 "sample— count" 字段对应的连续样本 在视频轨道中的排列顺序进行排列。
样本与样本组的映射关系容器还可以包括 "grouping— type" (分组类型 ) 字段。 该字段的取值可以表示该样本组描述容器用于描述基于分块与 NAL 包的对应关系的样本分组。
例如, 表 15示出了样本与样本组的映射关系容器所包含的具体内容。 如表 15所示, "grouping_type" 字段的取值可以为 "tlnm"。

在表 15中, 在第 1行所表示的样本与映射组之间的对应关系中, "Index" 字段取值为 "1", "sample_count" 字段取值为 "10", 可以表示, 第 1到第 10这 10个样本可以对应 "grouping_type" 取值为 "tlnm" 的样本组描述容器中 "Entry_index" 字段取值为 "1" 的映射组。 类似地, 第 11到第 40这 30个样本可以对应该样本组描述容器中 "Entry_index" 字段取值为 "2" 的映射组。 第 41到第 48这 8个样本可以对应该样本组描述容器中 "Entry_index" 字段取值为 "1" 的映射组。 第 49到第 54这 6个样本可以对应该样本组描述容器中 "Entry_index" 字段取值为 "2" 的映射组。
表 15 样本与样本组的映射关系容器
grouping_type "tlnm"
sample_count (样本数目)    Index (索引)
10                         1
30                         2
8                          1
6                          2
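表 15 这种 "sample_count + Index" 的游程式映射中, 查找某个样本所属映射组的过程可以用一段示意性的 Python 代码来说明(仅为示例草图, 函数名为假设, 数值取自上文的例子):

```python
def mapping_group_for_sample(runs, sample_number):
    """runs: [(Index, sample_count), ...], 按样本在视频轨道中的排列顺序给出。
    返回 sample_number(从 1 开始)对应的映射组的 Entry_index, 未覆盖时返回 None。"""
    start = 1
    for index, count in runs:
        # 当前游程覆盖样本区间 [start, start + count - 1]
        if start <= sample_number < start + count:
            return index
        start += count
    return None

# 正文表 15 的游程: (Index, sample_count) 依次为 (1,10), (2,30), (1,8), (2,6)
g = mapping_group_for_sample([(1, 10), (2, 30), (1, 8), (2, 6)], 20)
```

按此查找, 第 20 个样本对应 "Entry_index" 取值为 "2" 的映射组, 与后文步骤 1808 的例子一致。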
1704, 文件生成器基于分块将视频轨道被划分为子轨道。
每个子轨道可以由一个或多个分块组成, 这些分块可以形成一个矩形区域。 本实施例中, 可以假设每个子轨道由一个分块组成, 那么上面所述的 4个分块将分别对应于 4个子轨道。
1705 , 对于每一个子轨道, 生成用于描述该子轨道的子轨道数据描述容 器。
步骤 1705类似于图 7中的步骤 704, 不再赘述。
1706, 对于每个子轨道, 生成用于描述子轨道的子轨道数据定义容器。 子轨道数据定义容器可以包括子轨道的描述信息, 子轨道的描述信息可 以指示该子轨道中分块与 NAL包之间的对应关系。
具体地, 子轨道数据定义容器可以包括子轨道和样本组的映射关系容器, 子轨道和样本组的映射关系容器可以包括子轨道的描述信息。 子轨道和样本组的映射关系容器所包含的具体内容可以分为以下两种情况: 一种情况是子轨道和样本组的映射关系容器不包括 "grouping_type" 字段, 另一种情况是子轨道和样本组的映射关系容器包括 "grouping_type" 字段。 下面针对这两种情况进行描述。
( 1 )子轨道和样本组的映射关系容器可以不包括 "grouping_type" 字段。 这种情况下, 可以预先设定 "grouping_type" 字段的取值。 该取值可以与样本组描述容器中的 "grouping_type" 字段以及样本与样本组的映射关系容器中的 "grouping_type" 字段取值相同。 子轨道和样本组的映射关系容器可以包括子轨道的描述信息, 在子轨道的描述信息中, 可以包括 "tileID" (分块 ID )字段。 该字段可以表示该子轨道中分块的标识。 因此, "tileID" 字段的取值的数目可以与该子轨道中的分块的总数目相等。 那么, 子轨道的描述信息的条数与子轨道中分块的数目是相同的。
按照 ISOBMFF定义的框架, 子轨道和样本组的映射关系容器的一种数 据结构可以表示如下:
aligned(8) class SubTrackSampleGroupBox extends FullBox('stsg', 0, 1) {
    unsigned int(16) item_count;    // 描述信息的条数
    for (i = 0; i < item_count; i++) {
        unsigned int(32) tileID;
    }
}
在该数据结构中, "item_count" 字段可以表示子轨道的描述信息的条数。 在子轨道的每条描述信息中, 可以包括上述 "tileID" 字段。
每个子轨道可以对应一个子轨道容器, 子轨道容器可以包括该子轨道对 应的子轨道数据描述容器和该子轨道对应的子轨道数据定义容器。
表 16示出了第 1个子轨道的子轨道容器的一个例子, 用以表示不包括
"grouping— type" 字段的子轨道数据定义容器。 如表 16 所示, 在该子轨道 容器中, 包括子轨道数据描述容器和子轨道数据定义容器。 在子轨道数据描 述容器中, 可以包括 ID、 水平偏移、 垂直偏移、 区域宽度、 区域高度以及 独立性字段。 其中, 子轨道数据描述容器中的 ID也是子轨道容器的 ID, 可 以表示该子轨道容器描述的子轨道。 此外, 水平偏移、 垂直偏移、 区域宽度 和区域高度用于表示该子轨道对应的区域的大小和位置。独立性字段可以用 于指示子轨道对应的区域是否能独立解码。
子轨道数据定义容器可以包括子轨道和样本组的映射关系容器, 该子轨道和样本组的映射关系容器包括子轨道的描述信息。 子轨道的描述信息可以包括该子轨道的各个分块 ID。 如上假设, 第 1个子轨道对应的区域由第 1个分块组成, 即分块 ID为 "0" 的分块。 那么, 如表 16所示, 在该子轨道的描述信息中, "tileID" 字段取值为 "0"。
表 16 子轨道容器
子轨道容器
子轨道数据描述容器 flag="1"
ID  horizontal_offset  vertical_offset  region_width  region_height  Independent
    (水平偏移)          (垂直偏移)       (区域宽度)     (区域高度)     (独立性)
1   0                  0                160           480            1
子轨道数据定义容器
子轨道和样本组的映射关系容器
tileID
0
( 2 )子轨道和样本组的映射关系容器还可以包括 "grouping_type" (分组类型) 字段。 "grouping_type" 字段用于指示子轨道数据定义容器描述的是基于分块与 NAL包之间的对应关系的子轨道信息。 具体地, 子轨道和样本组的映射关系容器可以包括子轨道的整数条描述信息, 子轨道的每条描述信息可以包括一个 "tileID" 字段的取值。 那么, 子轨道的描述信息的条数仍与子轨道中分块的总数目相同。 也就是说, 子轨道和样本组的映射关系容器可以包括整数个 "tileID" 字段的取值。
按照 ISOBMFF定义的框架, 子轨道和样本组的映射关系容器的一种数 据结构可以表示如下:
aligned(8) class SubTrackSampleGroupBox extends FullBox('stsg', 0, 1) {
    unsigned int(32) grouping_type;
    unsigned int(16) item_count;    // 描述信息的条数
    for (i = 0; i < item_count; i++) {
        unsigned int(32) tileID;
    }
}
在上述数据结构中, "item_count" 字段可以表示子轨道的描述信息的条数。 在子轨道的每条描述信息中, 可以包括上述 "tileID" 字段。 并且, 该数据结构中定义了上述 "grouping_type" 字段。
表 17 示出了第 1 个子轨道的子轨道容器的一个例子, 用以表示包括 "grouping— type" 字段的子轨道数据定义容器。 如表 17 所示, 在该子轨道 容器中, 包括子轨道数据描述容器和子轨道数据定义容器。 在子轨道数据描 述容器中, 包括 ID、 水平偏移、 垂直偏移、 区域宽度、 区域高度以及独立 性字段。 其中, 子轨道数据描述容器中的 ID也是子轨道容器的 ID, 可以表 示该子轨道容器描述的子轨道。 此外, 水平偏移、 垂直偏移、 区域宽度和区 域高度用于表示该子轨道对应的区域的大小和位置。
子轨道数据定义容器可以包括子轨道和样本组的映射关系容器, 该子轨道和样本组的映射关系容器包括子轨道的描述信息。 如表 17所示, 在上面的假设中, 第 1个子轨道对应的区域由分块 ID为 "0" 的分块组成。 子轨道和样本组的映射关系容器可以包括一条子轨道的描述信息。 在子轨道的这条描述信息中, "tileID" 字段取值为 "0"。 此外, 子轨道和样本组的映射关系容器还可以包括 "grouping_type" 字段, 该 "grouping_type" 字段可以取值为 "tlnm"。 而上述表 14所示的样本组描述容器中的 "grouping_type" 字段取值为 "tlnm", 表 15所示的样本与样本组的映射关系容器中的 "grouping_type" 字段取值为 "tlnm", 那么, 该子轨道数据定义容器可以对应于表 14所示的样本组描述容器和表 15所示的样本与样本组的映射关系容器。
表 17 子轨道容器
子轨道容器
子轨道数据描述容器 flag="1"
ID  horizontal_offset  vertical_offset  region_width  region_height  Independent
    (水平偏移)          (垂直偏移)       (区域宽度)     (区域高度)     (独立性)
1   0                  0                160           480            1
子轨道数据定义容器
子轨道和样本组的映射关系容器 grouping_type "tlnm"
tileID
0
1707,文件生成器生成视频文件,该视频文件包括上述样本组描述容器、 各个子轨道对应的子轨道数据描述容器和各个子轨道对应的子轨道数据定 义容器以及组成视频轨道的样本。
步骤 1707和图 7的步骤 706类似, 不再赘述。
1708, 文件生成器向文件解析器发送视频文件。
本发明实施例中 ,针对每个子轨道生成一个子轨道数据描述容器以及一 个子轨道数据定义容器, 并生成包括用于描述每个子轨道的子轨道描述容器 和用于描述每个子轨道的子轨道数据定义容器的视频文件, 由于每个子轨道 数据描述容器包括子轨道的区域信息,每个子轨道数据定义容器包括子轨道 的描述信息,子轨道的描述信息用于指示子轨道中各个分块对应的 NAL包, 使得文件解析器能够根据子轨道的区域信息确定目标区域对应的目标子轨 道, 并根据目标子轨道的子轨道数据定义容器中的目标子轨道的描述信息、 样本组描述容器以及样本与样本组的映射关系容器,确定播放时间段内的样 本中每个目标子轨道中各个分块对应的 NAL包, 以播放目标区域在该播放 时间段内的画面, 从而能够有效地实现视频中区域画面的提取。
上面介绍了生成视频文件的过程,下面将介绍根据视频文件从视频中提 取目标区域的画面的过程。 图 18的过程与图 17的过程是对应的, 将适当省 略相同的描述。
图 18是与图 17 的过程相对应的处理视频的方法的过程的示意性流程 图。 图 18的方法由文件解析器执行。
步骤 1801至步骤 1806与图 13的步骤 1301至 1306类似, 不再赘述。 另外, 在该实施例中, 仍旧假设目标区域对应于第 2个子轨道和第 3个子轨 道, 即目标子轨道为第 2个子轨道和第 3个子轨道。
1807, 文件解析器根据目标子轨道对应的子轨道数据定义容器, 确定目 标子轨道的描述信息。
文件解析器可以从目标子轨道对应的子轨道数据定义容器, 直接获取目 标子轨道的描述信息, 目标子轨道的描述信息包括该目标子轨道中的分块
ID。
下面以第 2个子轨道为例, 结合图 19进行说明。 图 19是根据本发明一 个实施例的子轨道的描述信息的示意图。
具体地, 文件解析器可以从第 2个子轨道对应的子轨道数据定义容器中 的子轨道和样本组映射关系容器中, 获取第 2个子轨道的描述信息。 文件解 析器可以确定第 2个子轨道的描述信息中 "tilelD" 字段的取值。
如图 19所示, 第 2个子轨道对应于 ID为 "2" 的子轨道容器。 如上假设, 第 2个子轨道包含第 2个分块, 即分块 ID为 "1" 的分块。 因此, 在第 2个子轨道对应的子轨道数据定义容器中, 第 2个子轨道的描述信息中的 "tileID" (分块 ID )字段的取值为 "1"。 第 3个子轨道对应于 ID为 "3" 的子轨道容器。 如上假设, 第 3个子轨道包含第 3个分块, 即分块 ID为 "2" 的分块。 因此, 在第 3个子轨道对应的子轨道数据定义容器中, 第 3个子轨道的描述信息中的 "tileID" 字段的取值为 "2"。
1808, 根据目标子轨道的描述信息、 样本与样本组的映射关系容器以及 样本组描述容器, 确定播放时间段对应的样本中目标子轨道对应的 NAL包 的编号。
在该步骤中, 将针对图 17的步骤 1706所述的两种情况描述步骤 1808。 ( 1 )如果子轨道和样本组映射关系容器不包括 "grouping— type" (分组 类型 )字段, 文件解析器可以获取预先设定的 "grouping— type"字段的取值。 例如, 预先设定的 "grouping— type" 字段的取值可以为 "tlnm" , 即预先设定 的 "grouping— type" 字段的取值与样本组描述容器中的 "grouping— type" 字 段的取值以及样本与样本组的映射关系容器中的 "grouping— type" 字段的取 值相同。 然后文件解析器可以从视频文件中获取 "grouping— type" 字段取值 为 "tlnm" 的样本与样本组的映射关系容器。 文件解析器可以从样本与样本 组的映射关系容器中获取播放时间段对应的样本对应的 "Entry— Index"字段。 然后文件解析器可以在 "grouping— type" 字段取值为 "tlnm" 的样本组描述 容器中获取这些样本对应的 "Entry— Index" 字段所指示的映射组, 然后可以 在获取的映射组中确定目标子轨道的描述信息中所包含的分块 ID 对应的 NAL 包编号, 从而确定在该播放时间段对应的样本中该目标子轨道对应的 NAL包的编号。 下面以第 2个子轨道为例, 结合图 19进行说明。 例如, 仍假设播放时 间段对应于第 20至第 54个样本。以第 20个样本为例,可以从图 19中看出, 在样本与样本组的映射关系容器中, 其对应的 "Index" (索引)字段的取值 为 "2"。 由于样本与样本组的映射关系容器中的 "Index" 字段与样本组描述 容器中的 "Entry— Index" 字段的含义相同, 都是指示映射组。 因此, 对于第 20个样本而言, 对应的 "Index" (索引)字段的取值为 "2"。 那么在样本组 描述容器中, 文件解析器可以确定取值为 "2"的 "Entry— Index" (条目索引 ) 字段所指向的映射组。 如图 19所示, 第 20个样本对应于第 2个映射组。 而 第 2个子轨道的描述信息中, "tilelD" 字段的取值 "1"。 那么, 在第 20个 样本中, 对于第 2个子轨道, 在取值为 "2" 的 "Entry— Index" (条目索引) 字段所指向的映射组中, 分块 ID为 "1" 的分块对应的起始 NAL包的编号 为 3。 由于 NAL包是连续的, 在该映射组中, 可以看出, 分块 ID为 "2" 的分块对应的起始 NAL包的编号为 6。 那么说明, 分块 ID为 "1" 的分块 对应的 NAL包的编号分别为 3、 4和 5。也就是说,第 2个子轨道对应的 NAL 包的编号分别为 3、 4和 5。
同理,在第 20个样本中第 3个子轨道对应的 NAL包的编号分别为 6和 7。 具体过程类似于第 2个子轨道, 不再赘述。
( 2 )如果子轨道和样本组映射关系容器包括 "grouping— type" (分组类 型)字段, 则可以获取其中的 "grouping— type" 字段的取值, 该取值可以作 为本发明实施例的分组标识。 例如, 此处 "grouping— type" 字段的取值可以 为 "tlnm"。 文件解析器可以从视频文件中获取 "grouping— type" 字段取值为 "tlnm" 的样本与样本组的映射关系容器。 文件解析器可以从样本与样本组 的映射关系容器中获取播放时间段对应的样本对应的 "Entry— Index" 字段。 然后文件解析器可以在 "grouping— type" 字段取值为 "tlnm" 的样本组描述 容器中获取这些样本对应的 "Entry— Index" 字段所指示的映射组, 然后可以 在获取的映射组中确定目标子轨道的描述信息中所包含的分块 ID 对应的 NAL 包编号, 从而确定在该播放时间段对应的样本中该目标子轨道对应的 NAL包的编号。
针对第 2个子轨道与第 3个子轨道, 确定 NAL包编号的具体过程与步 骤 1808中的 ( 1 ) 的过程类似, 不再赘述。
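由于 NAL包编号连续, 某分块对应的 NAL包区间可以由本分块的起始编号与编号上紧随其后的下一个起始编号推出。 下面是一段示意性的 Python 代码(仅为示例草图, 函数名为假设; 例中分块 1、 2的起始编号取自图 19的例子, 分块 0、 3的起始编号以及 NAL包总数为假设值):

```python
def nal_numbers_for_tile(start_numbers, tile_id, total_nalus):
    """start_numbers: {tileID: 起始 NAL 包编号}; NAL 包编号在样本内连续。
    某分块对应的 NAL 包区间为 [本分块起始编号, 下一个起始编号)。"""
    starts = sorted(start_numbers.values())
    s = start_numbers[tile_id]
    # 找到编号上紧随其后的下一个起始编号; 若本分块是最后一个, 则止于样本的最后一个 NAL 包
    later = [x for x in starts if x > s]
    end = later[0] if later else total_nalus + 1
    return list(range(s, end))

# 假设某样本共 9 个 NAL 包, 各分块起始编号为 {0:1, 1:3, 2:6, 3:8}
nals = nal_numbers_for_tile({0: 1, 1: 3, 2: 6, 3: 8}, 1, 9)
```

按此推导, 分块 ID为 "1" 的分块对应编号 3、 4、 5 的 NAL包, 与图 19 例子中第 2个子轨道的结论一致。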
步骤 1809与图 13中的步骤 1309类似, 不再赘述。 本发明实施例中,通过根据目标区域以及子轨道数据描述容器描述的子 轨道的区域信息, 确定目标区域对应的子轨道作为目标子轨道, 并根据目标 子轨道对应的子轨道数据定义容器中的目标子轨道的描述信息、样本组描述 容器中的映射组以及样本与样本组的映射关系容器,确定播放时间段对应的 样本中目标子轨道中各个分块对应的 NAL包的编号, 使得能够对这些 NAL 包进行解码来播放目标区域在该播放时间段内的画面,从而能够有效地实现 视频中区域画面的提取。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各 示例的单元及算法步骤, 能够以电子硬件、 或者计算机软件和电子硬件的结 合来实现。 这些功能究竟以硬件还是软件方式来执行, 取决于技术方案的特 定应用和设计约束条件。 专业技术人员可以对每个特定的应用来使用不同方 法来实现所描述的功能, 但是这种实现不应认为超出本发明的范围。
所属领域的技术人员可以清楚地了解到, 为描述的方便和简洁, 上述描 述的系统、 装置和单元的具体工作过程, 可以参考前述方法实施例中的对应 过程, 在此不再赘述。
在本申请所提供的几个实施例中, 应该理解到, 所揭露的系统、 装置和 方法, 可以通过其它的方式实现。 例如, 以上所描述的装置实施例仅仅是示 意性的, 例如, 所述单元的划分, 仅仅为一种逻辑功能划分, 实际实现时可 以有另外的划分方式, 例如多个单元或组件可以结合或者可以集成到另一个 系统, 或一些特征可以忽略, 或不执行。 另一点, 所显示或讨论的相互之间 的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合 或通信连接, 可以是电性, 机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作 为单元显示的部件可以是或者也可以不是物理单元, 即可以位于一个地方, 或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或 者全部单元来实现本实施例方案的目的。
另外, 在本发明各个实施例中的各功能单元可以集成在一个处理单元 中, 也可以是各个单元单独物理存在, 也可以两个或两个以上单元集成在一 个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使 用时, 可以存储在一个计算机可读取存储介质中。 基于这样的理解, 本发明 的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部 分可以以软件产品的形式体现出来, 该计算机软件产品存储在一个存储介质 中, 包括若干指令用以使得一台计算机设备(可以是个人计算机, 服务器, 或者网络设备等)执行本发明各个实施例所述方法的全部或部分步骤。 而前 述的存储介质包括: U盘、移动硬盘、只读存储器( ROM, Read-Only Memory )、 随机存取存储器 ( RAM, Random Access Memory ), 磁碟或者光盘等各种可 以存储程序代码的介质。
以上所述, 仅为本发明的具体实施方式, 但本发明的保护范围并不局限 于此, 任何熟悉本技术领域的技术人员在本发明揭露的技术范围内, 可轻易 想到变化或替换, 都应涵盖在本发明的保护范围之内。 因此, 本发明的保护 范围应以所述权利要求的保护范围为准。

Claims

权利要求
1. 一种处理视频的设备, 其特征在于, 视频的视频轨道被划分为至少 一个子轨道,每个子轨道由一个子轨道数据描述容器和一个子轨道数据定义 容器描述, 所述设备包括:
接收单元, 用于: 接收所述视频对应的视频文件, 所述视频文件包括至 少一个子轨道数据描述容器、至少一个子轨道数据定义容器以及组成视频轨 道的样本, 所述子轨道数据描述容器包括所述子轨道数据描述容器描述的子 轨道的区域信息, 所述子轨道的区域信息用于指示在所述视频的画面中所述 子轨道对应的区域, 所述子轨道数据定义容器用于指示在所述组成所述视频 轨道的样本中所述子轨道数据定义容器描述的子轨道对应的网络提取层 NAL包;
确定单元, 用于:
确定在所述视频的画面中需要提取的目标区域以及需要提取的播放时 间段;
根据所述接收单元接收的所述视频文件,在所述组成所述视频轨道的样 本中确定所述播放时间段对应的样本;
根据所述目标区域以及所述子轨道数据描述容器包括的子轨道的区域 信息,在所述至少一个子轨道中确定与所述目标区域对应的子轨道作为目标 子轨道;
根据所述目标子轨道对应的子轨道数据定义容器,确定所述播放时间段 对应的样本中所述目标子轨道对应的 NAL包,所述确定的 NAL包被解码后 用于播放所述目标区域在所述播放时间段内的画面。
2. 根据权利要求 1 所述的设备, 其特征在于, 所述子轨道对应的区域 由至少一个分块组成;
所述视频文件还包括样本组描述容器, 所述样本组描述容器包括所述视 频轨道中各个分块与 NAL包之间的对应关系以及所述各个分块与 NAL包之 间的对应关系的标识;
所述目标子轨道对应的子轨道数据定义容器包括在所述组成视频轨道的样本中所述目标子轨道的每个分块与 NAL包之间的对应关系的标识; 所述确定单元根据所述目标子轨道对应的子轨道数据定义容器确定所述播放时间段对应的样本中所述目标子轨道对应的 NAL包具体为: 根据所述样本组描述容器和在所述组成视频轨道的样本中所述目标子轨道的每个分块与 NAL包之间的对应关系的标识, 确定所述播放时间段对应的样本中所述目标子轨道对应的 NAL包。
3. 根据权利要求 2所述的设备, 其特征在于, 在所述子轨道对应的区 域中, 对于所述组成视频轨道的样本, 标识相同的分块对应于相同编号的
NAL包。
4. 根据权利要求 2所述的设备, 其特征在于, 在所述子轨道对应的区域中, 在所述组成视频轨道的样本的至少两个样本中, 标识相同的分块对应于不同编号的 NAL包;
所述目标子轨道对应的子轨道数据定义容器还包括所述目标子轨道的 每个分块与 NAL包之间的对应关系的标识所对应的样本信息;
所述确定单元根据所述样本组描述容器和在所述组成视频轨道的样本 中所述目标子轨道的每个分块与 NAL包之间的对应关系的标识确定所述播 放时间段对应的样本中所述目标子轨道对应的 NAL包具体为: 根据所述目 标子轨道的每个分块与 NAL包之间的对应关系的标识、 所述目标子轨道的 每个分块与 NAL之间的对应关系的标识所对应的样本信息以及所述样本组 描述容器, 确定所述播放时间段对应的样本中所述目标子轨道对应的 NAL 包。
5. 根据权利要求 2至 4中任一项所述的设备, 其特征在于, 所述子轨 道数据定义容器还包括分组标识;
所述确定单元,还用于在确定所述播放时间段对应的样本中所述目标子 轨道对应的 NAL包之前, 根据所述分组标识, 从所述视频文件中获取具有 所述分组标识的所述样本组描述容器。
6. 根据权利要求 1 所述的设备, 其特征在于, 所述子轨道对应的区域 由至少一个分块组成;
所述视频文件还包括样本组描述容器, 所述样本组描述容器包括至少一 个映射组, 所述至少一个映射组中的每个映射组包括所述视频轨道中各个分 块标识与 NAL包之间的对应关系;
所述视频文件还包括样本与样本组映射关系容器, 所述样本与样本组映 射关系容器用于指示所述至少一个映射组中每个映射组对应的样本;
所述目标子轨道对应的子轨道数据定义容器包括所述目标子轨道的每个分块的标识; 所述确定单元根据所述目标子轨道对应的子轨道数据定义容器确定所述播放时间段对应的样本中所述目标子轨道对应的 NAL包具体为: 根据所述样本组描述容器、 所述样本与样本组映射关系容器和所述目标子轨道的每个分块的标识, 确定所述播放时间段对应的样本中所述目标子轨道对应的 NAL包。
7. 根据权利要求 6所述的设备, 其特征在于, 所述子轨道数据定义容 器包括分组标识;
所述确定单元,还用于在确定所述播放时间段对应的样本中所述目标子 轨道分别对应的 NAL包之前, 根据所述分组标识, 从所述视频文件中获取 具有所述分组标识的所述样本组描述容器和具有所述分组标识的所述样本 与样本组映射关系容器。
8. 一种处理视频的设备, 其特征在于, 视频的视频轨道被划分为至少 一个子轨道, 所述视频轨道由样本组成, 所述设备包括:
生成单元, 用于: 针对所述至少一个子轨道中的每个子轨道, 生成一个子轨道数据描述容器和一个子轨道数据定义容器, 所述子轨道数据描述容器包括所述子轨道数据描述容器描述的子轨道的区域信息, 所述子轨道的区域信息用于指示在所述视频的画面中所述子轨道对应的区域, 所述子轨道数据定义容器用于指示在所述组成所述视频轨道的样本中所述子轨道数据定义容器描述的子轨道对应的网络提取层 NAL包;
生成所述视频的视频文件, 所述视频文件包括针对所述每一个子轨道生 成的所述一个子轨道数据描述容器和所述一个子轨道数据定义容器以及所 述组成所述视频轨道的样本;
发送单元, 用于: 发送所述生成单元生成的所述视频文件。
9. 根据权利要求 8所述的设备, 其特征在于, 所述子轨道对应的区域 由至少一个分块组成;
所述子轨道数据定义容器包括在所述组成视频轨道的样本中所述子轨 道数据定义容器描述的子轨道的每个分块与 NAL 包之间的对应关系的标 识;
所述生成单元, 还用于在所述生成所述视频的视频文件之前, 生成样本 组描述容器, 所述样本组描述容器包括所述视频轨道中各个分块与 NAL包 之间的对应关系以及所述各个分块与 NAL包之间的对应关系的标识; 所述视频文件进一步包括所述样本组描述容器。
10. 根据权利要求 9所述的设备, 其特征在于, 在所述子轨道对应的区 域中, 对于所述组成所述视频轨道的样本, 标识相同的分块对应于相同编号 的 NAL包。
11. 根据权利要求 9所述的设备, 其特征在于, 在所述子轨道对应的区域中, 在所述组成所述视频轨道的样本的至少两个样本中, 标识相同的分块对应于不同编号的 NAL包;
所述子轨道数据定义容器还包括, 所述子轨道数据定义容器描述的子轨 道的每个分块与 NAL包之间的对应关系的标识所对应的样本信息。
12. 根据权利要求 9至 11 中任一项所述的设备, 其特征在于, 所述子 轨道数据定义容器和所述样本组描述容器分别包括相同的分组标识。
13. 根据权利要求 8所述的设备, 其特征在于, 所述子轨道对应的区域 由至少一个分块组成;
所述子轨道数据定义容器包括所述子轨道数据定义容器描述的子轨道 中每个分块的标识;
所述生成单元, 还用于在所述生成所述视频的视频文件之前, 生成样本 组描述容器以及样本与样本组的映射关系容器, 所述样本组描述容器包括至 少一个映射组, 所述至少一个映射组中的每个映射组包括所述视频轨道中各 个分块标识与 NAL包之间的对应关系, 所述样本与样本组映射关系容器用 于指示所述至少一个映射组中每个映射组对应的样本;
所述视频文件进一步包括: 所述样本组描述容器和所述样本与样本组的 映射关系容器。
14. 根据权利要求 13所述的设备, 其特征在于, 所述子轨道数据定义 容器、所述样本组描述容器和样本与样本组映射关系容器分别包括相同的分 组标识。
15. 一种处理视频的方法, 其特征在于, 视频的视频轨道被划分为至少 一个子轨道,每个子轨道由一个子轨道数据描述容器和一个子轨道数据定义 容器描述, 所述方法包括:
接收所述视频对应的视频文件,所述视频文件包括至少一个子轨道数据 描述容器、 至少一个子轨道数据定义容器以及组成所述视频轨道的样本, 所 述子轨道数据描述容器包括所述子轨道数据描述容器描述的子轨道的区域 信息, 所述子轨道的区域信息用于指示在所述视频的画面中所述子轨道对应 的区域, 所述子轨道数据定义容器用于指示在所述组成所述视频轨道的样本 中所述子轨道数据定义容器描述的子轨道对应的网络提取层 NAL包;
确定在所述视频的画面中需要提取的目标区域以及需要提取的播放时 间段;
根据所述视频文件,在所述组成所述视频轨道的样本中确定所述播放时 间段对应的样本;
根据所述目标区域以及所述子轨道数据描述容器包括的子轨道的区域 信息,在所述至少一个子轨道中确定与所述目标区域对应的子轨道作为目标 子轨道;
根据所述目标子轨道对应的子轨道数据定义容器,确定所述播放时间段 对应的样本中所述目标子轨道对应的 NAL包,所述确定的 NAL包被解码后 用于播放所述目标区域在所述播放时间段内的画面。
16. 根据权利要求 15所述的方法, 其特征在于, 所述子轨道对应的区 域由至少一个分块组成;
所述视频文件还包括样本组描述容器, 所述样本组描述容器包括所述视 频轨道中各个分块与 NAL包之间的对应关系以及所述各个分块与 NAL包之 间的对应关系的标识;
所述目标子轨道对应的子轨道数据定义容器包括在所述组成视频轨道 的样本中所述目标子轨道的每个分块与 NAL包之间的对应关系的标识; 所述根据目标子轨道对应的子轨道数据定义容器,确定所述播放时间段 对应的样本中所述目标子轨道对应的 NAL包 , 包括:
根据所述样本组描述容器和在所述组成视频轨道的样本中所述目标子 轨道的每个分块与 NAL包之间的对应关系的标识, 确定所述播放时间段对 应的样本中所述目标子轨道对应的 NAL包。
17. 根据权利要求 16所述的方法, 其特征在于, 在所述子轨道对应的 区域中, 对于所述组成视频轨道的样本, 标识相同的分块对应于相同编号的 NAL包。
18. 根据权利要求 16所述的方法, 其特征在于, 在所述子轨道对应的区域中, 在所述组成视频轨道的样本的至少两个样本中, 标识相同的分块对应于不同编号的 NAL包;
所述目标子轨道对应的子轨道数据定义容器还包括所述目标子轨道的每个分块与 NAL包之间的对应关系的标识所对应的样本信息; 所述根据所述样本组描述容器和在所述组成视频轨道的样本中所述目标子轨道的每个分块与 NAL包之间的对应关系的标识, 确定所述播放时间段对应的样本中所述目标子轨道对应的 NAL包, 包括:
根据所述目标子轨道的每个分块与 NAL包之间的对应关系的标识、 所 述目标子轨道的每个分块与 NAL之间的对应关系的标识所对应的样本信息 以及所述样本组描述容器,确定所述播放时间段对应的样本中所述目标子轨 道对应的 NAL包。
19. 根据权利要求 16至 18中任一项所述的方法, 其特征在于, 所述子 轨道数据定义容器还包括分组标识;
在所述根据所述样本组描述容器和在所述组成视频轨道的样本中所述 目标子轨道的每个分块与 NAL包之间的对应关系的标识, 确定所述播放时 间段对应的样本中所述目标子轨道对应的 NAL包之前 , 还包括:
根据所述分组标识 ,从所述视频文件中获取具有所述分组标识的所述样 本组描述容器。
20. 根据权利要求 15所述的方法, 其特征在于, 所述子轨道对应的区 域由至少一个分块组成;
所述视频文件还包括样本组描述容器, 所述样本组描述容器包括至少一 个映射组, 所述至少一个映射组中的每个映射组包括所述视频轨道中各个分 块标识与 NAL包之间的对应关系;
所述视频文件还包括样本与样本组映射关系容器, 所述样本与样本组映 射关系容器用于指示所述至少一个映射组中每个映射组对应的样本;
所述目标子轨道对应的子轨道数据定义容器包括所述目标子轨道的每个分块的标识; 所述根据所述目标子轨道对应的子轨道数据定义容器, 确定所述播放时间段对应的样本中所述目标子轨道对应的 NAL包, 包括:
根据所述样本组描述容器、所述样本与样本组映射关系容器和所述目标 子轨道的每个分块的标识,确定所述播放时间段对应的样本中所述目标子轨 道对应的 NAL包。
21. 根据权利要求 20所述的方法, 其特征在于, 所述子轨道数据定义 容器包括分组标识;
在所述根据所述样本组描述容器、所述样本与样本组映射关系容器和所 述目标子轨道的每个分块的标识,确定所述播放时间段对应的样本中所述目 标子轨道分别对应的 NAL包之前, 还包括:
根据所述分组标识,从所述视频文件中获取具有所述分组标识的所述样 本组描述容器和具有所述分组标识的所述样本与样本组映射关系容器。
22. 一种处理视频的方法, 其特征在于, 视频的视频轨道被划分为至少 一个子轨道, 所述视频轨道由样本组成, 所述方法包括:
针对所述至少一个子轨道中的每个子轨道, 生成一个子轨道数据描述容器和一个子轨道数据定义容器, 所述子轨道数据描述容器包括所述子轨道数据描述容器描述的子轨道的区域信息, 所述子轨道的区域信息用于指示在所述视频的画面中所述子轨道对应的区域, 所述子轨道数据定义容器用于指示在所述组成所述视频轨道的样本中所述子轨道数据定义容器描述的子轨道对应的网络提取层 NAL包;
生成所述视频的视频文件, 所述视频文件包括针对所述每一个子轨道生 成的所述一个子轨道数据描述容器和所述一个子轨道数据定义容器以及所 述组成所述视频轨道的样本;
发送所述视频文件。
23. 根据权利要求 22所述的方法, 其特征在于, 所述子轨道对应的区 域由至少一个分块组成;
所述子轨道数据定义容器包括在所述组成视频轨道的样本中所述子轨 道数据定义容器描述的子轨道的每个分块与 NAL 包之间的对应关系的标 识;
在所述生成所述视频的视频文件之前, 所述方法还包括:
生成样本组描述容器, 所述样本组描述容器包括所述视频轨道中各个分 块与 NAL包之间的对应关系以及所述各个分块与 NAL包之间的对应关系的 标识;
所述视频文件进一步包括所述样本组描述容器。
24. 根据权利要求 23所述的方法, 其特征在于, 在所述子轨道对应的 区域中, 对于所述组成所述视频轨道的样本, 标识相同的分块对应于相同编 号的 NAL包。
25. 根据权利要求 23所述的方法, 其特征在于, 在所述子轨道对应的区域中, 在所述组成所述视频轨道的样本的至少两个样本中, 标识相同的分块对应于不同编号的 NAL包;
所述子轨道数据定义容器还包括所述子轨道数据定义容器描述的子轨 道的每个分块与 NAL包之间的对应关系的标识所对应的样本信息。
26. 根据权利要求 23至 25中任一项所述的方法, 其特征在于, 所述子 轨道数据定义容器和所述样本组描述容器分别包括相同的分组标识。
27. 根据权利要求 22所述的方法, 其特征在于, 所述子轨道对应的区域由至少一个分块组成;
所述子轨道数据定义容器包括所述子轨道数据定义容器描述的子轨道 的每个分块的标识;
在所述生成所述视频的视频文件之前, 还包括:
生成样本组描述容器以及样本与样本组的映射关系容器, 所述样本组描 述容器包括至少一个映射组, 所述至少一个映射组中的每个映射组包括所述 视频轨道中各个分块标识与 NAL包之间的对应关系, 所述样本与样本组映 射关系容器用于指示所述至少一个映射组中每个映射组对应的样本;
所述视频文件进一步包括所述样本组描述容器和所述样本与样本组的 映射关系容器。
28. 根据权利要求 27所述的方法, 其特征在于, 所述子轨道数据定义 容器、所述样本组描述容器和样本与样本组映射关系容器分别包括相同的分 组标识。
PCT/CN2013/087773 2013-11-25 2013-11-25 处理视频的设备和方法 WO2015074273A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201810133819.3A CN108184101B (zh) 2013-11-25 2013-11-25 处理视频的设备和方法
CN201380002598.1A CN104919812B (zh) 2013-11-25 2013-11-25 处理视频的设备和方法
PCT/CN2013/087773 WO2015074273A1 (zh) 2013-11-25 2013-11-25 处理视频的设备和方法

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2013/087773 WO2015074273A1 (zh) 2013-11-25 2013-11-25 处理视频的设备和方法

Publications (1)

Publication Number Publication Date
WO2015074273A1 true WO2015074273A1 (zh) 2015-05-28

Family

ID=53178840

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/087773 WO2015074273A1 (zh) 2013-11-25 2013-11-25 处理视频的设备和方法

Country Status (2)

Country Link
CN (2) CN104919812B (zh)
WO (1) WO2015074273A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10652630B2 (en) * 2016-05-24 2020-05-12 Qualcomm Incorporated Sample entries and random access
US20180048877A1 (en) * 2016-08-10 2018-02-15 Mediatek Inc. File format for indication of video content
CN108235113B (zh) * 2016-12-14 2022-01-04 上海交通大学 一种全景视频渲染和呈现属性指示方法及系统
CN108989826B (zh) * 2017-06-05 2023-07-14 上海交通大学 视频资源的处理方法及装置

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101796834A (zh) * 2007-07-02 2010-08-04 Lg电子株式会社 数字广播系统和在数字广播系统中处理数据的方法
CN102271249A (zh) * 2005-09-26 2011-12-07 韩国电子通信研究院 用于可伸缩视频的感兴趣区域信息设置方法和解析方法
CN103026721A (zh) * 2010-07-20 2013-04-03 高通股份有限公司 布置用于串流传输视频数据的子轨道片段

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101255226B1 (ko) * 2005-09-26 2013-04-16 한국과학기술원 스케일러블 비디오 코딩에서 다중 roi 설정, 복원을위한 장치 및 방법
CN101453639B (zh) * 2007-11-29 2012-05-30 展讯通信(上海)有限公司 支持roi区域的多路视频流的编码、解码方法和系统
CA2758237C (en) * 2009-04-09 2017-08-15 Telefonaktiebolaget Lm Ericsson (Publ) Media container file management
US8976871B2 (en) * 2009-09-16 2015-03-10 Qualcomm Incorporated Media extractor tracks for file format track selection
EP3313083B1 (en) * 2011-06-08 2019-12-18 Koninklijke KPN N.V. Spatially-segmented content delivery
EP2560386A1 (en) * 2011-08-15 2013-02-20 MediaTek, Inc Video processing apparatus and method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102271249A (zh) * 2005-09-26 2011-12-07 韩国电子通信研究院 用于可伸缩视频的感兴趣区域信息设置方法和解析方法
CN101796834A (zh) * 2007-07-02 2010-08-04 Lg电子株式会社 数字广播系统和在数字广播系统中处理数据的方法
CN103026721A (zh) * 2010-07-20 2013-04-03 高通股份有限公司 布置用于串流传输视频数据的子轨道片段

Also Published As

Publication number Publication date
CN104919812A (zh) 2015-09-16
CN108184101A (zh) 2018-06-19
CN104919812B (zh) 2018-03-06
CN108184101B (zh) 2020-07-14

Similar Documents

Publication Publication Date Title
US11805304B2 (en) Method, device, and computer program for generating timed media data
JP6410899B2 (ja) メディアファイルの生成装置、生成方法、及びプログラム
US20200275143A1 (en) Method, device, and computer program for encapsulating scalable partitioned timed media data
US11178470B2 (en) Method, device, and computer program for encapsulating partitioned timed media data
US11876994B2 (en) Description of image composition with HEVC still image file format
US8495697B1 (en) Techniques to provide an enhanced video replay
CN110800311B (zh) 用于传输媒体内容的方法、装置和计算机程序
JP2018509029A (ja) 画像データのカプセル化
US11139000B2 (en) Method and apparatus for signaling spatial region information
JPWO2015060165A1 (ja) 表示処理装置、配信装置、および、メタデータ
WO2015074273A1 (zh) 处理视频的设备和方法
CN114556962B (zh) 多视点视频处理方法和装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13897986

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13897986

Country of ref document: EP

Kind code of ref document: A1