WO2018058773A1 - Video data processing method and apparatus - Google Patents


Info

Publication number
WO2018058773A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
spatial
code stream
representation
view
Prior art date
Application number
PCT/CN2016/107111
Other languages
French (fr)
Chinese (zh)
Inventor
邸佩云
宋翼
谢清鹏
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2018058773A1

Classifications

    • H: ELECTRICITY
      • H04: ELECTRIC COMMUNICATION TECHNIQUE
        • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
          • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
            • H04N 21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
              • H04N 21/23: Processing of content or additional data; Elementary server operations; Server middleware
                • H04N 21/234: Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
            • H04N 21/60: Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
              • H04N 21/65: Transmission of management data between client and server
                • H04N 21/658: Transmission by the client directed to the server
                  • H04N 21/6587: Control parameters, e.g. trick play commands, viewpoint selection
            • H04N 21/80: Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
              • H04N 21/81: Monomedia components thereof
                • H04N 21/816: Monomedia components thereof involving special video data, e.g. 3D video
              • H04N 21/83: Generation or processing of protective or descriptive data associated with content; Content structuring
                • H04N 21/84: Generation or processing of descriptive data, e.g. content descriptors
                • H04N 21/845: Structuring of content, e.g. decomposing content into time segments
                  • H04N 21/8456: Structuring of content by decomposing the content in the time domain, e.g. in time segments
              • H04N 21/85: Assembly of content; Generation of multimedia applications
                • H04N 21/858: Linking data to content, e.g. by linking a URL to a video object, by creating a hotspot
                  • H04N 21/8586: Linking data to content by using a URL

Definitions

  • the present invention relates to the field of streaming media data processing, and in particular, to a method and an apparatus for processing video data.
  • VR virtual reality
  • FOV field of view
  • The VR video is divided into multiple code streams corresponding to multiple fixed spatial objects, and the code stream of each fixed spatial object is transmitted based on the Hypertext Transfer Protocol (HTTP).
  • HTTP hypertext transfer protocol
  • DASH Dynamic adaptive streaming over HTTP
  • After the user switches the view, the terminal selects, according to the new spatial object, one or more fixed spatial objects in the video that cover it, each of which covers a part of the switched-to spatial object.
  • The terminal acquires the code stream of the panoramic spatial object, decodes the code streams of the one or more fixed spatial objects, and then presents the video content corresponding to the new spatial object.
  • The terminal needs to store the code stream of the panoramic spatial object in local storage and then select the corresponding code stream for presentation according to the new spatial object; the code streams corresponding to the non-presented spatial objects are excess video data, which not only occupies local storage space but also wastes the network transmission bandwidth used for video data transmission, so the approach has poor applicability.
  • The DASH technical specification mainly consists of two parts: the media presentation description (English: Media Presentation Description, MPD) and the media file format (English: file format).
  • the server prepares multiple versions of the code stream for the same video content.
  • Each version of the code stream is called a representation in the DASH standard (English: representation).
  • Representation is a collection and encapsulation of one or more codestreams in a transport format, one representation containing one or more segments.
  • Different versions of the code stream may have different coding parameters, such as bit rate and resolution; each code stream is divided into multiple small files, and each small file is called a segment (English: segment).
  • rep1 is a high-definition video stream with a bit rate of 4 Mbps (megabits per second);
  • rep2 is a standard-definition video stream with a bit rate of 2 Mbps;
  • rep3 is a standard-definition video stream with a bit rate of 1 Mbps.
  • the segment marked as shaded in Figure 3 is the segmentation data requested by the client.
  • The client requests the first three segments from the media representation rep3, then switches to rep2 and requests the fourth segment, then switches to rep1 and requests the fifth and sixth segments, and so on.
  • The segments of each representation can be stored end to end in one file, or each segment can be stored as a separate small file.
  • A segment may be encapsulated in accordance with ISO/IEC 14496-12 (the ISO Base Media File Format, ISO BMFF) or in accordance with ISO/IEC 13818-1 (MPEG-2 TS).
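As an illustration of the rate switching described above, the following sketch shows a hypothetical client policy (not mandated by the DASH standard or by this patent): for each throughput estimate, pick the highest-bitrate representation among rep1/rep2/rep3 that the measured throughput can sustain, falling back to the lowest-bitrate one otherwise.

```python
# Hypothetical rate-adaptation sketch; bitrates follow the rep1/rep2/rep3
# example above. The selection rule is an assumption for illustration.

REPRESENTATIONS = {"rep1": 4_000_000, "rep2": 2_000_000, "rep3": 1_000_000}

def select_representation(throughput_bps: int) -> str:
    """Pick the highest-bitrate representation the throughput can sustain."""
    affordable = {name: bw for name, bw in REPRESENTATIONS.items()
                  if bw <= throughput_bps}
    if not affordable:
        # Nothing fits: fall back to the lowest-bitrate representation.
        return min(REPRESENTATIONS, key=REPRESENTATIONS.get)
    return max(affordable, key=affordable.get)

# Throughput samples rising over time reproduce the switch pattern
# rep3 -> rep2 -> rep1 described in the text.
choices = [select_representation(t) for t in
           (900_000, 1_200_000, 1_500_000, 2_500_000, 4_200_000, 5_000_000)]
```

With the sample throughputs above, the client stays on rep3 for the first three segments, then steps up to rep2 and finally rep1, mirroring the segment-request sequence in the text.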
  • the media presentation description is called MPD
  • The MPD can be an XML file.
  • The information in the file is described hierarchically; as shown in FIG. 2, information at a higher level is fully inherited by the next level. The file describes some media metadata, which allows the client to learn about the media content on the server and to use this information to construct the HTTP URL for requesting a segment.
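As a minimal sketch of how a client might read such an MPD, the following Python example parses a hypothetical MPD fragment (invented here for illustration, not taken from the patent) with the standard library and collects the segment URLs from which HTTP requests would be constructed.

```python
# Minimal sketch: parse a (hypothetical) MPD and list the segment URLs.
import xml.etree.ElementTree as ET

MPD_XML = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="rep1" bandwidth="4000000">
        <SegmentList>
          <SegmentURL media="seg1.m4s"/>
          <SegmentURL media="seg2.m4s"/>
        </SegmentList>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>"""

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}
root = ET.fromstring(MPD_XML)
# Collect the media attribute of every SegmentURL element.
urls = [s.get("media") for s in root.findall(".//mpd:SegmentURL", NS)]
```

A real client would resolve these relative URLs against a BaseURL from the MPD before issuing HTTP requests.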
  • media presentation: a collection of structured data for presenting media content;
  • media presentation description (English: media presentation description, MPD): a standardized description of a media presentation, used to provide a streaming media service;
  • period (English: period): an interval of the media presentation; a contiguous sequence of one or more periods makes up the whole media presentation;
  • representation (English: representation): a structured data set of one or more encoded individual media types (such as audio or video); a representation is a collection and encapsulation of one or more code streams in a transport format, and one representation contains one or more segments;
  • adaptation set (English: AdaptationSet): a set of multiple interchangeable encoded versions of the same media content component; an adaptation set contains one or more representations;
  • subset (English: subset): a restriction on the combination of adaptation sets that may be presented together;
  • segment (English: segment): a media unit referenced by an HTTP uniform resource locator in the media presentation description; segment information describes a segment.
  • For the related technical concepts of the MPEG-DASH technology of the present invention, refer to the relevant provisions in ISO/IEC 23009-1:2014, Information technology -- Dynamic adaptive streaming over HTTP (DASH) -- Part 1: Media presentation description and segment formats, or to historical versions of the standard, such as ISO/IEC 23009-1:2013 or ISO/IEC 23009-1:2012.
  • Virtual reality technology is a computer simulation system that can create virtual worlds and let users experience them. It uses a computer to generate a simulated environment: an interactive, three-dimensional dynamic vision and entity-behavior simulation with multi-source information fusion, in which the user is immersed.
  • VR mainly involves aspects such as the simulated environment, perception, natural skills, and sensing devices.
  • The simulated environment consists of computer-generated, real-time, dynamic, three-dimensional realistic images. Perception means that an ideal VR system should provide all the perceptions a person has.
  • In addition to visual perception, there are also perceptions such as hearing, touch, force feedback, and motion, and even smell and taste; this is also known as multi-perception.
  • Natural skills refer to a person's head rotation, eye movement, gestures, and other human behaviors.
  • a sensing device is a three-dimensional interactive device.
  • VR video is also called 360-degree video or omnidirectional video.
  • Only the part of the video image, and the associated audio presentation, corresponding to the orientation of the user's head is presented.
  • The difference from ordinary video is that the entire content of an ordinary video is presented to the user, whereas for VR video only a subset of the entire video is presented (English: in VR typically only a subset of the entire video region represented by the video pictures).
  • "A Spatial Object is defined as a spatial part of a content component (e.g. a region of interest, or a tile) and represented by either an Adaptation Set or a Sub-Representation."
  • That is, a spatial object is defined as a part of a content component, such as a region of interest (ROI) or a tile, and spatial relationships can be described in an Adaptation Set or a Sub-Representation.
  • ROI region of interest
  • The existing DASH standard defines some descriptor elements in the MPD. Each descriptor element has two attributes, schemeIdUri and value, where schemeIdUri describes what the current descriptor is and value is the parameter value of the descriptor.
  • Two such descriptors are SupplementalProperty and EssentialProperty (the supplemental property descriptor and the essential property descriptor).
  • The spatial location information may further include the spatial location corresponding to the thick-line frame in the figure and the spatial location corresponding to the thin-line frame; the spatial object corresponding to the thick-line frame completely covers the spatial object of the thin-line frame, while the spatial object of the thin-line frame covers only part of the thick-line frame's spatial object;
  • Method 1: the spatial position of the thick-line frame is described by the center position of the viewing angle and the upper and lower angular ranges of the viewing angle; the spatial position of the thin-line frame is described in the same way;
  • All the spatial location information syntax in the present invention can exist independently in a box, such as an strp box. If an identifier exists in the upper-level box of the strp box indicating that an strp box is present, the strp box describes the spatial location information of both the thick-line frame and the thin-line frame.
  • the spatial position of the thick line frame is described by the upper and lower angular ranges of the central position of the viewing angle and the viewing angle;
  • the spatial position of the thin line frame is described by the offset of the central position of the viewing angle and the upper and lower angular extent of the viewing angle;
  • the spatial position of the thin line frame is described by the upper and lower angle ranges of the center position of the angle of view and the angle of view; the spatial position of the thick line frame is described by the offset of the center position of the angle of view and the range of the upper and lower angles of the angle of view.
  • The spatial position information is described by the center position and the width and height of the angle of view; the above-mentioned center position may also be replaced with the upper-left starting position;
  • the angular position of the spatial position described above may be replaced by a spatial position at coordinates x, y, and z of the three coordinate axes.
  • Figure 6 is a schematic diagram of the spatial relationship of spatial objects.
  • the image AS can be set as a content component, and AS1, AS2, AS3, and AS4 are four spatial objects included in the AS, and each spatial object is associated with a space.
  • The spatial relationship of each spatial object, for example the relationship between the spaces associated with the spatial objects, is described in the MPD.
  • The spatial object of the view code stream can completely contain the spatial object described by the spatial information, or may deviate from it somewhat.
  • the server may divide a space within a 360-degree view range to obtain a plurality of spatial objects, each spatial object corresponding to a sub-view of the user,
  • the splicing of multiple sub-views forms a complete human eye viewing angle.
  • The dynamically changing viewing angle of the human eye is usually about 120 degrees by 120 degrees.
  • For example, the spatial object 1 corresponding to frame 1 and the spatial object 1 corresponding to frame 2 described in the figure. The server may prepare a set of video code streams for each spatial object.
  • the server may obtain encoding configuration parameters of each code stream in the video, and generate a code stream corresponding to each spatial object of the video according to the encoding configuration parameters of the code stream.
  • The client may request the video stream segments corresponding to a certain angle of view for a certain time period, and output them to the spatial object corresponding to that angle of view when the video is output.
  • If the client outputs, for the same time period, the video stream segments corresponding to all angles of view within the 360-degree range, the complete video image for that period can be output within the entire 360-degree space.
  • the server may first map the spherical surface into a plane, and divide the space on the plane. Specifically, the server may map the spherical surface into a latitude and longitude plan by using a latitude and longitude mapping manner.
  • FIG. 8 is a schematic diagram of a spatial object according to an embodiment of the present invention. The server can map the spherical surface into a latitude and longitude plan, and divide the latitude and longitude plan into a plurality of spatial objects such as A to I.
  • the server may also map the spherical surface into a cube, expand the plurality of faces of the cube to obtain a plan view, or map the spherical surface to other polyhedrons, and expand the plurality of faces of the polyhedron to obtain a plan view or the like.
  • the server can also map the spherical surface to a plane by using more mapping methods, which can be determined according to the requirements of the actual application scenario, and is not limited herein. The following will be described in conjunction with FIG. 8 in a latitude and longitude mapping manner.
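Assuming, purely for illustration, a 3x3 partition of the latitude and longitude plan into the spatial objects A to I of FIG. 8 (the actual partition granularity is not fixed by the patent), a viewing direction can be mapped to a spatial object as follows:

```python
# Hedged sketch: map a viewing direction onto one of nine spatial objects
# A..I in an equirectangular (latitude/longitude) plan. The 3x3 grid and
# the A..I labeling order are assumptions made for this example.

def tile_for_direction(yaw_deg: float, pitch_deg: float) -> str:
    """yaw in [0, 360), pitch in [-90, 90); returns a tile label A..I."""
    col = min(int(yaw_deg / 120), 2)          # three 120-degree columns
    row = min(int((pitch_deg + 90) / 60), 2)  # three 60-degree rows
    return "ABCDEFGHI"[row * 3 + col]
```

The client would then request the code stream(s) of the tile (and usually its neighbours) covering the user's current viewing direction.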
  • each spatial object corresponds to one sub-view
  • each set of DASH code streams corresponding to each spatial object is for each sub-view.
  • The spatial information of the spatial objects associated with all images in such a view code stream is the same, so the view code stream can be regarded as a static view code stream.
  • the view code stream of each sub-view is part of the entire video stream, and the view code streams of all sub-views constitute a complete video stream.
  • the DASH code stream corresponding to the corresponding spatial object may be selected for playing according to the viewing angle currently viewed by the user.
  • the client can determine the DASH code stream corresponding to the target space object of the switch according to the new perspective selected by the user, and then switch the video play content to the DASH code stream corresponding to the target space object.
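The view switch described above can be sketched as follows; the data structure and the one-dimensional (yaw-only) coverage test are simplifying assumptions for illustration, and the representation identifiers are invented.

```python
# Illustrative sketch: when the user turns to a new viewing angle, find the
# spatial object whose region contains the view centre and switch playback
# to the DASH representation associated with it.
from dataclasses import dataclass

@dataclass
class SpatialObject:
    rep_id: str     # id of the associated DASH representation (hypothetical)
    yaw_min: float  # region covered, in degrees of yaw
    yaw_max: float

def target_representation(objects, view_yaw: float) -> str:
    """Return the representation id covering the given yaw angle."""
    for obj in objects:
        if obj.yaw_min <= view_yaw < obj.yaw_max:
            return obj.rep_id
    raise ValueError("no spatial object covers this view")

objs = [SpatialObject("as1", 0, 120), SpatialObject("as2", 120, 240),
        SpatialObject("as3", 240, 360)]
```

For example, a switch of the view centre from 30 to 130 degrees of yaw would move playback from "as1" to "as2".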
  • the embodiment of the invention provides a method and a device for processing video data based on HTTP dynamic adaptive streaming media, which can save transmission bandwidth resources of video data, improve flexibility and applicability of video presentation, and enhance user experience of video viewing.
  • the first aspect provides a method for processing video data based on HTTP dynamic adaptive streaming, which may include:
  • The media presentation description includes information of at least two representations. A first representation of the at least two representations is an author view code stream; the author view code stream includes a plurality of images, and the spatial information of the spatial objects associated with at least two of the plurality of images is different. A second representation of the at least two representations is a static view code stream; the static view code stream includes a plurality of images, and the spatial information of the spatial objects associated with the plurality of images is the same;
  • When the obtained instruction information indicates viewing the author view code stream, the segment of the first representation is acquired; otherwise, the segment of the second representation is acquired.
  • The embodiment of the present invention may describe the author view code stream and the static view code stream in the media presentation description, where the spatial information of the spatial objects associated with the images included in the author view code stream may change dynamically, while the spatial information of the spatial objects associated with the images included in the static view code stream does not change.
  • The embodiment may select segments of the corresponding code stream from the author view code stream and the static view code stream, thereby improving the flexibility of code stream segment selection and enhancing the video viewing user experience.
  • Because the embodiment of the invention obtains only the segments of the corresponding code stream from the author view code stream and the static view code stream, rather than all segments, it can save the transmission bandwidth resources of the video data and enhance the applicability of the data processing.
  • the media presentation description further includes identifier information, where the identifier information is used to identify an author view code stream of the video.
  • the media presentation description includes information of an adaptation set
  • the adaptation set is used to describe attributes of media data segments of the plurality of replaceable coded versions of the same media content component.
  • the information of the adaptive set includes the identifier information.
  • The media presentation description includes information of a representation, the representation being a collection and encapsulation of one or more code streams in a transmission format;
  • the information that is represented includes the identifier information.
  • The media presentation description includes information of a descriptor, and the descriptor is used to describe the spatial information of the spatial object associated with it;
  • the information of the descriptor includes the identifier information.
  • The embodiment of the invention can add the identifier information of the author view code stream to the media presentation description, which can improve the recognizability of the author view code stream.
  • The embodiment of the present invention may also carry the identifier information of the author view code stream in the information of the adaptation set of the media presentation description, or in the information of a representation of the media presentation description, or in the information of a descriptor of the media presentation description; the operation is flexible and the applicability is high.
  • the server needs to add a syntax element corresponding to the author view code stream when generating the MPD, and the client may obtain the author view code stream information according to the syntax element.
  • A representation for describing the author's view code stream may be added to the MPD and may be referred to as the first representation.
  • a representation existing in the MPD for describing a static view stream may be referred to as a second representation.
  • Several possible MPD syntax element examples are shown below. It should be understood that the MPD examples in the embodiments of the present invention show only the parts that modify syntax elements relative to the existing standard, not all syntax elements of the MPD file. Operators can apply the technical solution of the embodiments in combination with the relevant provisions of the DASH standard.
  • Table 2 is an attribute information table of the newly added syntax element:
  • the attribute @view_type is used to mark whether the corresponding representation is a non-author view (or static view) code stream or an author view (or dynamic view) code stream.
  • When the view_type value is 0, it indicates that the corresponding representation is a non-author view code stream; when the view_type value is 1, it indicates that the corresponding representation is an author view code stream.
  • When the client parses the MPD file locally, it can determine from this attribute whether the current video stream contains an author view stream.
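A minimal sketch of that client-side check, assuming the proposed @view_type attribute appears directly on the Representation element (the MPD fragment and representation ids below are hypothetical):

```python
# Sketch: parse the MPD and use the proposed @view_type attribute
# (0 = non-author/static view, 1 = author view) to find author view streams.
import xml.etree.ElementTree as ET

MPD_XML = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period><AdaptationSet>
    <Representation id="author" view_type="1"/>
    <Representation id="static" view_type="0"/>
  </AdaptationSet></Period>
</MPD>"""

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}
root = ET.fromstring(MPD_XML)
# Representations whose view_type is "1" are author view code streams.
author_reps = [r.get("id") for r in root.findall(".//mpd:Representation", NS)
               if r.get("view_type") == "1"]
```

If author_reps is non-empty, the video contains an author view stream and the client can offer that viewing mode.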
  • Example 1: described in the MPD descriptor
  • The server can extend the value attribute of the EssentialProperty in the existing MPD syntax by appending a new value after the original values.
  • Example 2: described in the representation
  • the syntax element view_type has been added to the property information of Representation.
  • Example 3: described in the attribute information of the adaptation set (AdaptationSet)
  • the syntax element view_type is added to the property information of the AdaptationSet (ie, the attribute information of the code stream set where the author's view stream is located).
  • description information of the independent file of the spatial information may also be added, such as adding an adaptation set, and describing the information of the spatial information file in the adaptation set;
  • the segment of the first representation carries spatial information of a spatial object associated with an image included in the segment of the first representation
  • the method further includes:
  • the spatial information of the spatial object associated with the image is a spatial relationship of the spatial component and the content component associated with the spatial object.
  • the spatial information is carried in a designated box in the segment of the first representation or in a specified box in a metadata representation associated with the segment of the first representation.
  • the specified box is a trun box included in a segment of the first representation, and the trun box is used to describe a set of consecutive samples of a track.
  • The embodiment of the present invention may add, in the author view code stream (specifically, in a segment of the author view code stream), the spatial information of the spatial object associated with an image contained in that segment, so that the client can, according to this spatial information, perform segment switching within the author view code stream, or switch between the author view code stream and the static view code stream, thereby improving the applicability of code stream switching and enhancing the client user experience.
  • the server may further add spatial information of one or more author space objects in the author view code stream.
  • the server may add the above spatial information to the trun box in the existing file format for describing the spatial information of the spatial object associated with each frame image of the author view code stream.
  • The server may add the syntax element tr_flags to the existing trun box and set its value to 0x001000, to mark that the spatial information describing the relative position of the preset spatial object in the global spatial object is included in the trun box.
  • FIG. 11 is a schematic illustration of the relative positions of author space objects in a panoramic space.
  • the point O is the center of the sphere corresponding to the 360-degree VR panoramic video spherical image, and can be regarded as the position of the human eye when viewing the VR panoramic image.
  • Point A is the center point of the author's perspective image
  • C and F are the boundary points of the author's perspective image through point A along the horizontal coordinate axis of the image;
  • E and D are the boundary points of the author's perspective image through point A along the vertical coordinate axis of the image;
  • B is the projection point of point A along the spherical meridian onto the equator;
  • I is the starting coordinate point in the horizontal direction on the equator line.
  • Center_pitch: the vertical deflection angle of the point to which the center of the author spatial object's image is mapped on the panoramic spherical (i.e. global space) image, such as ∠AOB in FIG. 11;
  • Center_yaw: the horizontal deflection angle of the point to which the center of the author spatial object's image is mapped on the panoramic spherical image, such as ∠IOB in FIG. 11;
  • Center_roll: the rotation angle, about the line connecting the sphere center and the point to which the center of the author spatial object's image is mapped on the panoramic spherical image, such as ∠DOB in FIG. 11;
  • Pitch_h: the height of the field of view of the author spatial object's image on the panoramic spherical image, expressed as the maximum vertical angle of the field of view, such as ∠DOE in FIG. 11; yaw_w: the width of the field of view of the author spatial object's image on the panoramic spherical image, expressed as the maximum horizontal angle of the field of view, such as ∠COF in FIG. 11.
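Assuming the five angle fields are carried as consecutive unsigned int(16) values in degrees (the field order and units here are assumptions; the patent does not fix them at this point), they could be serialized and parsed as follows:

```python
# Sketch: serialize the five angle fields as big-endian 16-bit unsigned
# integers, matching the "unsigned int(16)" fields of the proposed
# trun/strp box syntax. Field order and degree units are assumptions.
import struct

def pack_spatial_info(center_yaw, center_pitch, center_roll, yaw_w, pitch_h):
    """Pack the five angles into a 10-byte payload."""
    return struct.pack(">5H", center_yaw, center_pitch, center_roll,
                       yaw_w, pitch_h)

def unpack_spatial_info(payload):
    """Recover (center_yaw, center_pitch, center_roll, yaw_w, pitch_h)."""
    return struct.unpack(">5H", payload)

payload = pack_spatial_info(180, 45, 0, 120, 120)
```

Round-tripping through pack/unpack recovers the original angle tuple, which is the property a file-format parser relies on.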
  • Some of the unsigned int(16) fields may not be included in the trun box.
  • the flag identifier and the spatial location information may not exist in the same box.
  • the flag may exist in the upper level box of the box where the spatial location information is located, such as in the newly added box and its syntax description information.
  • The strp box is a box describing spatial information;
  • its upper-level box may be stbl (or another box);
  • a flag description is added in stbl, and if the flag indicates that an strp box is present, the strp box can be parsed within stbl.
  • the server may also add a new box and its syntax description to the video format for describing the spatial information of the author space object.
  • a new box and its syntax description information are as follows (example 2):
  • the information contained in the strp box is the spatial information of the newly added author space object, and the meaning of each syntax element included is the same as the meaning of each syntax element included in the above example 1.
  • The "unsigned int(16) center_roll; // fov center position roll" in the box may not exist in this example; this may be determined according to actual application scenario requirements and is not limited herein.
  • The above strp box can be included in the stbl box, with a flag in the stbl box marking the presence of the spatial location information strp box.
  • The upper-level box of the strp box can be determined according to actual application scenario requirements, and no limitation is imposed here.
  • The strp box described above may be described in the metadata (traf) of a DASH segment, or in the metadata of a track encapsulated in the ISOBMFF format.
  • The above strp box may also be included in a separate metadata stream or track associated with the author's view stream; the samples contained in that metadata stream or track are the spatial information related to the author's view stream.
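An ISO BMFF box is a 32-bit big-endian size followed by a 4-byte type and the payload. A minimal sketch of wrapping the spatial information in a box with the proposed strp type follows; the payload layout (five 16-bit angles) is an assumption carried over from the examples above.

```python
# Hedged sketch: build a minimal ISO BMFF box carrying spatial information
# under the proposed 'strp' type. Box = 32-bit size + 4-byte type + payload.
import struct

def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Wrap a payload in an ISO BMFF box (size includes the 8-byte header)."""
    return struct.pack(">I", 8 + len(payload)) + box_type + payload

# Assumed payload: center_yaw, center_pitch, center_roll, yaw_w, pitch_h.
strp = make_box(b"strp", struct.pack(">5H", 180, 45, 0, 120, 120))
```

A parser would read the size field, check the type against b"strp", and then decode the payload with the agreed field layout.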
  • The spatial information in the present invention may be the yaw angle of the center of the angle of view together with the width and height of the angle of view, or may be expressed as the yaw angle of the upper-left position and the yaw angle of the lower-right position of the angle of view.
• a flag is extended in a parameter set of the code stream; when the flag value is 1, the code stream data of each frame contains the spatial position information of the current frame.
• the semantics of the attributes describing the spatial position information are consistent with the syntax and semantics of the trun box and the strp box.
• the spatial position information may alternatively be encapsulated in SEI (Supplemental Enhancement Information) messages.
• the ROI payload type in the above syntax takes a specific value, such as 190, which is not limited herein.
• the semantics of the ROI payload are consistent with the syntax and semantics of the trun box and the strp box.
  • ROI_payload(payloadSize) description method one:
• the spatial information is described by the center-position yaw and pitch angles (center_pitch and center_yaw), or the center-position offsets (center_pitch_offset and center_yaw_offset), together with the width angle (yaw_w) and the height angle (pitch_h) of the spatial position.
• the yaw angles of the upper-left and lower-right positions of the space object can also be used for the description.
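The two description modes carry equivalent information. A minimal sketch of the conversion between them, assuming yaw increases rightward and pitch increases upward (sign conventions are an assumption of this sketch):

```python
def center_to_corners(center_yaw, center_pitch, yaw_w, pitch_h):
    """Center position + extent -> (upper-left, lower-right) yaw/pitch pairs."""
    upper_left = (center_yaw - yaw_w / 2.0, center_pitch + pitch_h / 2.0)
    lower_right = (center_yaw + yaw_w / 2.0, center_pitch - pitch_h / 2.0)
    return upper_left, lower_right

def corners_to_center(upper_left, lower_right):
    """Inverse of the above: corners -> (center_yaw, center_pitch, yaw_w, pitch_h)."""
    (ul_yaw, ul_pitch), (lr_yaw, lr_pitch) = upper_left, lower_right
    return ((ul_yaw + lr_yaw) / 2.0, (ul_pitch + lr_pitch) / 2.0,
            lr_yaw - ul_yaw, ul_pitch - lr_pitch)
```

Either form can therefore be carried in the box or SEI payload and recovered losslessly by the client.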
  • the second aspect provides a processing device for video data based on HTTP dynamic adaptive streaming, which may include:
• a receiving module configured to receive a media presentation description, where the media presentation description includes at least two representations: the first representation of the at least two representations is an author view code stream, the author view code stream includes multiple images, and the spatial information of the spatial objects associated with at least two of the multiple images is different; the second representation of the at least two representations is a static view code stream, the static view code stream includes multiple images, and the spatial information of the spatial objects associated with the multiple images is the same;
• the obtaining module is further configured to acquire the segment of the first representation when the obtained instruction information indicates viewing the author view code stream, and otherwise to acquire the segment of the second representation.
  • the media presentation description further includes identifier information, where the identifier information is used to identify an author view code stream of the video.
  • the media presentation description includes information of an adaptation set
• the adaptation set is used to describe the attributes of the media data segments of a plurality of interchangeable encoded versions of the same media content component.
• the information of the adaptation set includes the identifier information.
• the media presentation description includes information of a representation, a representation being a set and encapsulation of one or more code streams in a transmission format;
• the information of the representation includes the identifier information.
• the media presentation description includes information of a descriptor, and the descriptor is used to describe the spatial information of the spatial object associated with the representation;
• the information of the descriptor includes the identifier information.
  • the segment of the first representation carries spatial information of a spatial object associated with an image included in the segment of the first representation
  • the obtaining module is further configured to:
• the spatial information of the spatial object associated with the image describes the spatial relationship between the spatial object and its associated content component.
  • the spatial information is carried in a specified box in the segment of the first representation, or in a specified box in a metadata representation associated with the segment of the first representation.
  • the specified box is a trun box included in a segment of the first representation, and the trun box is used to describe a set of consecutive samples of a track.
• the embodiment of the present invention may describe the author view code stream and the static view code stream in the media presentation description, wherein the spatial information of the spatial objects associated with the images included in the author view code stream may change dynamically, while the spatial information of the spatial objects associated with the images included in the static view code stream does not change.
• the embodiment of the present invention can select segments of the corresponding code stream from the author view code stream and the static view code stream according to the obtained instruction information, improving the flexibility of code stream segment selection and enhancing the user experience of video viewing.
• the embodiment of the invention obtains only the segments of the corresponding code stream from the author view code stream and the static view code stream, without needing to acquire all segments, which can save the transmission bandwidth resources of the video data and enhance the applicability of the data processing.
• the embodiment of the present invention may add, in the author view code stream (specifically, in segments of the author view code stream), the spatial information of the spatial objects associated with the images included in the author view code stream, so that the client can perform segment switching within the author view code stream, or switching between the author view code stream and the static view code stream, according to the spatial information of the spatial objects associated with the images contained in the segments, thereby improving the applicability of code stream switching and enhancing the client user experience.
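The selection step described above can be sketched in a few lines of Python: the client picks the author view or static view representation according to the instruction information, using the view_type semantics described later in Table 2 (1 = author/dynamic view, 0 = non-author/static view). The dict layout is an assumption for illustration.

```python
def select_representation(mpd_representations, instruction):
    """Return the id of the representation matching the instruction:
    'author' -> the author view code stream (view_type == 1),
    anything else -> the static view code stream (view_type == 0)."""
    wanted = 1 if instruction == "author" else 0
    for rep in mpd_representations:
        if rep["view_type"] == wanted:
            return rep["id"]
    return None  # no matching representation described in the MPD

reps = [{"id": "rep_static", "view_type": 0},
        {"id": "rep_author", "view_type": 1}]
```

The client would then request segments only from the selected representation, which is what saves the transmission bandwidth mentioned above.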
  • FIG. 1 is a schematic diagram of an example of a framework for DASH standard transmission used in system layer video streaming media transmission
  • FIG. 2 is a schematic structural diagram of an MPD transmitted by a DASH standard used for system layer video streaming media transmission
  • FIG. 3 is a schematic diagram of switching of a code stream segment according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a segmentation storage manner in code stream data
• FIG. 5 is another schematic diagram of a segmentation storage manner in code stream data
  • Figure 6 is a schematic diagram of the spatial relationship of spatial objects
  • FIG. 7 is a schematic view of a perspective corresponding to a change in viewing angle
  • Figure 8 is another schematic diagram of the spatial relationship of spatial objects
  • FIG. 9 is a schematic flowchart of a method for processing video data based on HTTP dynamic adaptive streaming media according to an embodiment of the present invention.
  • Figure 10 is another schematic diagram of the spatial relationship of spatial objects
  • Figure 11 is a schematic illustration of the relative positions of author space objects in a panoramic space
  • FIG. 12 is a schematic structural diagram of a device for processing video data based on HTTP dynamic adaptive streaming media according to an embodiment of the present invention.
  • FIG. 1 is a schematic diagram of a frame example of DASH standard transmission used in system layer video streaming media transmission.
• the data transmission process of the system layer video streaming media transmission scheme includes two processes: the server side (such as an HTTP server or a media content preparation server, hereinafter referred to as the server) generates media data for the video content and responds to client requests, and the client (such as an HTTP streaming client) requests and obtains media data from the server.
  • the media data includes a media presentation description (MPD) and a media stream.
  • the MPD on the server includes a plurality of representations (also called presentations, English: representation), each representation describing a plurality of segments.
• the HTTP streaming request control module of the client obtains the MPD sent by the server, analyzes the MPD, determines the information of each segment of the video code stream described in the MPD, determines the segment to be requested, and sends the corresponding HTTP segment request to the server; the received segment is then decoded and played through the media player.
  • the media data generated by the server for the video content includes a video stream corresponding to different versions of the same video content, and an MPD of the code stream.
• for the video content of the same episode, the server generates code streams of low resolution, low bit rate, and low frame rate (such as 360p resolution, 300 kbps bit rate, 15 fps frame rate), of medium resolution, medium bit rate, and high frame rate (such as 720p resolution, 1200 kbps bit rate, 25 fps frame rate), and of high resolution, high bit rate, and high frame rate (such as 1080p resolution, 3000 kbps bit rate, 25 fps frame rate).
  • FIG. 2 is a schematic structural diagram of an MPD of a system transmission scheme DASH standard.
• each representation describes the information of a plurality of segments in time series, for example, Initialization Segment, Media Segment 1, Media Segment 2, ..., Media Segment 20, and so on.
  • the representation may include segmentation information such as a playback start time, a playback duration, and a network storage address (for example, a network storage address expressed in the form of a Uniform Resource Locator (URL)).
• in the process of the client requesting and obtaining the media data from the server, when the user selects a video to play, the client obtains the corresponding MPD from the server according to the video content requested by the user.
  • the client sends a request for downloading the code stream segment corresponding to the network storage address to the server according to the network storage address of the code stream segment described in the MPD, and the server sends the code stream segment to the client according to the received request.
• after the client obtains the code stream segment sent by the server, it can decode and play the segment through the media player.
  • the system layer video streaming media transmission scheme adopts the DASH standard, and realizes the transmission of video data by analyzing the MPD by the client, requesting the video data to the server as needed, and receiving the data sent by the server.
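To make the request flow concrete, here is a small sketch that expands a simplified segment template into the per-segment URLs a client would request after parsing the MPD. The template string is illustrative, not the DASH `$Number$` substitution syntax, and the URL layout is an assumption.

```python
def segment_urls(base_url, rep_id, start_number, count,
                 template="{rep}/seg-{num}.m4s"):
    """Expand a simplified SegmentTemplate-style pattern into the list of
    URLs for `count` consecutive segments of one representation."""
    return [base_url + template.format(rep=rep_id, num=start_number + i)
            for i in range(count)]

urls = segment_urls("http://example.com/", "rep1", start_number=1, count=3)
```

The client would issue one HTTP GET per returned URL and feed the downloaded segments to the media player.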
  • FIG. 3 is a schematic diagram of switching of a code stream segment according to an embodiment of the present invention.
• the server can prepare three different versions of code stream data for the same video content (such as a movie), and describe the three different versions of code stream data in the MPD using three Representations.
• the above three Representations (hereinafter referred to as rep) may be assumed to be rep1, rep2, and rep3.
• rep1 is a high-definition video with a bit rate of 4 Mbps (megabits per second);
• rep2 is a standard-definition video with a bit rate of 2 Mbps;
• rep3 is a normal video with a bit rate of 1 Mbps.
  • Each rep segment contains a video stream within a time period.
  • each rep describes the segments of each time segment according to the time series, and the segment lengths of the same time period are the same, thereby enabling content switching of segments on different reps.
• the segments marked as shaded in the figure are the segment data requested by the client: the first 3 segments requested by the client are segments of rep3, and the client may request rep2 when requesting the 4th segment, so that playback can switch to the 4th segment of rep2 after the 3rd segment of rep3 ends.
• the playback end point of the 3rd segment of rep3 (corresponding to the end moment of the playback time) is the playback start point of the 4th segment (corresponding to the start moment of playback), and it is also the playback start point of the 4th segment of rep2 or rep1, so that segments on different reps are aligned. After the client requests the 4th segment of rep2, it switches to rep1 and requests the 5th and 6th segments of rep1; it can then switch to rep3 and request the 7th segment of rep3, and then switch back to rep1 and request the 8th segment of rep1.
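The switching behaviour above can be sketched as a throughput-driven choice made at each aligned segment boundary. The bit-rate table matches the rep1/rep2/rep3 example; the selection rule itself (highest affordable rate, else the lowest rep) is a simplification for illustration, not a prescribed rate-adaptation algorithm.

```python
def choose_rep(bandwidth_kbps, reps):
    """Pick the highest-rate rep whose bit rate does not exceed the measured
    bandwidth; fall back to the lowest-rate rep if none is affordable."""
    affordable = [r for r in reps if r[1] <= bandwidth_kbps]
    if affordable:
        return max(affordable, key=lambda r: r[1])[0]
    return min(reps, key=lambda r: r[1])[0]

reps = [("rep1", 4000), ("rep2", 2000), ("rep3", 1000)]  # (name, kbps)
# one decision per segment boundary, for a rising bandwidth estimate
schedule = [choose_rep(bw, reps) for bw in (900, 1500, 2500, 5000)]
```

Because segments of the same period are aligned across reps, each decision takes effect cleanly at the next segment boundary.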
• the segments of each rep can be stored end to end in one file, or each segment can be stored as a separate small file.
• the segments may be encapsulated in accordance with ISO/IEC 14496-12 (ISO BMFF, Base Media File Format) or in accordance with ISO/IEC 13818-1 (MPEG-2 TS); this can be determined according to the requirements of the actual application scenario, and no limitation is imposed here.
• FIG. 4 is a schematic diagram of a segment storage mode in the code stream data, in which all the segments of the same rep are stored in one file; alternatively, each segment may be stored as a separate file, as shown in FIG. 5.
• FIG. 5 is another schematic diagram of the segment storage mode in the code stream data.
  • each segment in the segment of repA is stored as a file separately, and each segment in the segment of repB is also stored as a file separately.
• the server may describe information such as the URL of each segment in the MPD of the code stream in the form of a template or a list.
  • the server may use an index segment (English: index segment, that is, sidx in FIG. 5) in the MPD of the code stream to describe related information of each segment.
• the index segment describes the byte offset of each segment in the file in which it is stored, the size of each segment, and the duration of each segment (also referred to as the length of each segment).
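Given such an index, the client can translate a segment number into an HTTP byte range. A minimal sketch, assuming segments are laid out back to back in one file as in FIG. 4; the `(size, duration)` tuple layout is an assumption standing in for the real sidx fields.

```python
def byte_range(index, n):
    """Return the (first_byte, last_byte) range of segment n (0-based) from a
    simplified sidx-like index of (size_bytes, duration_s) entries; each
    segment's offset is the running sum of the sizes before it."""
    offset = sum(size for size, _ in index[:n])
    size = index[n][0]
    return offset, offset + size - 1

index = [(1000, 2.0), (1200, 2.0), (800, 2.0)]  # three 2-second segments
```

The resulting pair maps directly onto an HTTP `Range: bytes=first-last` request header.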
• the spatial area of the VR video is a 360-degree panoramic space (or omnidirectional space), which exceeds the normal visual range of the human eye; therefore, the user will change the viewing angle (that is, the field of view, FOV) at any time while watching the video.
• since the viewing angle of the user differs, the video images seen will also differ, so the content presented by the video needs to change as the user's viewing angle changes.
  • FIG. 7 is a schematic diagram of a perspective corresponding to a change in viewing angle.
• frame 1 and frame 2 are two different viewing angles of the user.
• the user can switch the viewing angle of the video from frame 1 to frame 2 through eye or head rotation, or by switching the picture of the video viewing device.
• the video image viewed when the user's viewing angle is frame 1 is the video image presented at that moment by the one or more spatial objects corresponding to that viewing angle.
• when the user's viewing angle is switched to frame 2, the video image viewed by the user should also switch to the video image presented at that moment by the spatial objects corresponding to frame 2.
  • the server may divide the panoramic space within a 360-degree viewing angle range to obtain a plurality of spatial objects, each spatial object corresponding to a sub-view of the user.
• the splicing of multiple sub-views forms a complete human-eye viewing angle; that is, the human-eye viewing angle (hereinafter referred to as the viewing angle) may correspond to one or more spatial objects, and the spatial objects corresponding to the viewing angle are all the spatial objects corresponding to the content objects within the range of the human eye.
• the viewing angle of the human eye can change dynamically, but is typically on the order of 120 degrees * 120 degrees; the spatial objects corresponding to the content objects within this range may include one or more objects, for example, the viewing angle 1 corresponding to frame 1 in FIG. 7 and the viewing angle 2 corresponding to frame 2.
• the client may obtain, through the MPD, the spatial information of the video code stream prepared by the server for each spatial object, and then request, according to the requirements of the viewing angle, the video code stream segments corresponding to one or more spatial objects in a certain period of time, and output the corresponding spatial objects according to the viewing-angle requirements.
• if the client outputs the video code stream segments corresponding to all the spatial objects within the 360-degree viewing angle range in the same time period, it can display the complete video image of the entire 360-degree panoramic space.
  • the server may first map the spherical surface into a plane, and divide the spatial object on the plane. Specifically, the server may map the spherical surface into a latitude and longitude plan by using a latitude and longitude mapping manner.
  • FIG. 8 is a schematic diagram of a spatial object according to an embodiment of the present invention. The server can map the spherical surface into a latitude and longitude plan, and divide the latitude and longitude plan into a plurality of spatial objects such as A to I.
  • the server may also map the spherical surface into a cube, expand the plurality of faces of the cube to obtain a plan view, or map the spherical surface to other polyhedrons, and expand the plurality of faces of the polyhedron to obtain a plan view or the like.
• the server can also map the spherical surface to a plane by using other mapping methods, which can be determined according to the requirements of the actual application scenario and is not limited herein. The following description uses the latitude and longitude mapping manner in conjunction with FIG. 8. After the server divides the panoramic space of the sphere into a plurality of spatial objects such as A to I, a set of DASH code streams can be prepared for each spatial object.
  • each spatial object corresponds to a set of DASH code streams.
• when the client user switches the viewing angle of the video, the client can obtain the code stream corresponding to the new spatial object according to the new viewing angle selected by the user, and the video content of the new spatial object's code stream can then be presented in the new viewing angle.
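A rough sketch of how a client could map a viewing direction onto one of the nine spatial objects A to I of FIG. 8, assuming a uniform 3x3 latitude-longitude division; the grid dimensions and the A..I labeling order are assumptions of this sketch.

```python
def tile_for_direction(yaw_deg, pitch_deg):
    """Map a viewing direction (yaw in [0, 360), pitch in [-90, 90] degrees)
    to one of nine tiles A..I laid out row-major on a 3x3 lat-long grid,
    with A at the top-left."""
    col = min(int(yaw_deg // 120), 2)           # three 120-degree columns
    row = min(int((90 - pitch_deg) // 60), 2)   # three 60-degree rows, top first
    return "ABCDEFGHI"[row * 3 + col]
```

The client would request the DASH code stream prepared for the returned tile (plus its neighbours, in practice, to cover the full FOV).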
  • FIG. 9 is a schematic flowchart diagram of a method for processing video data according to an embodiment of the present invention.
  • the method provided by the embodiment of the present invention includes the following steps:
• when the obtained instruction information indicates viewing the author view code stream, acquiring the segment of the first representation, and otherwise acquiring the segment of the second representation.
• the video image to be presented to the user at each playing time during video playback can be set according to the above-mentioned main storyline, and stringing together the video images of the playing times in time series yields the video sequence of the main storyline.
• the video image to be presented to the user at each playing time is presented on the spatial object corresponding to that playing time, that is, the video image to be presented in that time period is presented on that spatial object.
• the angle of view corresponding to the video image to be presented at each playing time may be set as the author's perspective;
• the spatial object that presents the video image in the author's perspective may be set as the author space object;
• the code stream corresponding to the author space object can be set as the author view code stream.
  • the author's view stream contains multiple video frames, and each video frame can be rendered as one image, that is, the author's view stream contains multiple images.
  • the image presented by the author's perspective is only part of the panoramic image (or VR image or omnidirectional image) that the entire video is to present.
• the spatial information of the spatial objects associated with the images presented by the author view code stream may be different or the same; that is, the spatial information of the spatial objects associated with at least two of the plurality of images included in the author view code stream is different.
  • the corresponding code stream can be prepared by the server for the author perspective of each play time.
  • the code stream corresponding to the author view may be set as the author view code stream.
  • the server may encode the author view code stream and transmit it to the client.
  • the story scene picture corresponding to the author view code stream may be presented to the user.
• the server does not need to transmit to the client the code streams of perspectives other than the author's perspective (set as the non-author perspective, that is, the static view code stream), which can save resources such as the transmission bandwidth of the video data.
• the author's perspective is the spatial object that presents the preset image according to the video storyline;
• the author space objects at different playing times may be different or the same; it can thus be seen that the author's perspective corresponds to a dynamic spatial object whose position constantly changes, that is, the position in the panoramic space of the author space object corresponding to each playing time may differ.
  • Each of the spatial objects shown in FIG. 8 is a spatial object divided according to a preset rule, and is a spatial object fixed in a relative position in the panoramic space.
• the author space object corresponding to any playing time is not necessarily fixed to one of the spatial objects shown in FIG. 8; it is a spatial object whose relative position in the global space constantly changes.
• the content of the video that the client obtains from the server is strung together from the author's perspectives and does not contain the spatial objects corresponding to non-author perspectives.
• the author view code stream only contains the content of the author space objects, and the MPD obtained from the server does not contain the spatial information of the author space objects, so the client can only decode and present the code stream of the author's perspective; if the viewing angle is switched to a non-author perspective during video viewing, the client cannot present the corresponding video content to the user.
  • the embodiment of the invention modifies the MPD file and the video file format (file format) of the video provided in the DASH standard, so as to realize the video content presentation in the process of switching between the author perspective and the non-author perspective in the video playback process.
• the modification of the DASH MPD file provided by the present invention can also be carried in an .m3u8 file defined by HTTP Live Streaming (HLS), in an .ismc file defined by Smooth Streaming (SS), or in the Session Description Protocol (SDP).
• the modification of the file format can likewise be applied to the ISOBMFF or MPEG2-TS file format, which can be determined according to the requirements of the actual application scenario.
  • the embodiment of the present invention will be described by taking the above identification information in the DASH code stream as an example.
• the identifier information may be added to the media presentation description for identifying the author view code stream of the video.
• the identifier information may be carried in the attribute information of the code stream set in which the author view code stream is carried in the media presentation description; that is, the identifier information may be carried in the information of the adaptation set in the media presentation description. The identifier information can also be carried in the information of the representation contained in the media presentation description. Further, the foregoing identifier information may also be carried in the information of the descriptor in the media presentation description.
• the client can quickly distinguish the author view code stream from the non-author view code stream by parsing the MPD to obtain the syntax elements added in the MPD.
• the specific modified or added syntax is described in Table 2 below; Table 2 is the attribute information table of the newly added syntax element:
  • the attribute @view_type is used to mark whether the corresponding representation is a non-author view (or static view) code stream or an author view (or dynamic view) code stream.
• when the view_type value is 0, it indicates that the corresponding representation is a non-author view code stream; when the view_type value is 1, it indicates that the corresponding representation is the author view code stream.
• when the client parses the MPD file locally, it can determine from this attribute whether the current video stream contains an author view code stream.
• Example 1: described in the MPD descriptor.
• the server can extend the value attribute of the EssentialProperty element in the existing MPD syntax by appending a new value after the original value.
• when the client parses the MPD, it can obtain the second value of value; that is, in this example, the second value of value is view_type.
• Example 2: described in the representation.
• the syntax element view_type is added to the attribute information of the Representation.
• Example 3: described in the attribute information of the adaptation set (AdaptationSet).
• the syntax element view_type is added to the attribute information of the AdaptationSet (that is, the attribute information of the code stream set in which the author view code stream is located).
• descriptive information of the spatial information file may also be added, such as adding an adaptation set and describing the information of the spatial information file in that adaptation set;
• by parsing the MPD, the client can determine the author view code stream according to identifier information such as the view_type carried in the MPD. Further, when the received instruction information indicates viewing the author view code stream, the client may obtain segments of the author view code stream and present them; if the instruction information indicates not viewing the author view code stream, segments of the static view code stream may be acquired for presentation. If the spatial information related to the author view code stream is encapsulated in a separate metadata file, the client can parse the MPD, obtain the metadata of the spatial information according to the codec identifier, and thereby parse the spatial information;
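Example 1 above can be sketched as follows: parse the MPD with Python's standard XML parser and collect the representations whose EssentialProperty value attribute carries 1 as its second comma-separated entry (view_type). The schemeIdUri and the sample MPD fragment are invented for illustration; only the "second value of value is view_type" convention comes from the text.

```python
import xml.etree.ElementTree as ET

MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period><AdaptationSet>
    <Representation id="rep1">
      <EssentialProperty schemeIdUri="urn:example:view" value="0,1"/>
    </Representation>
    <Representation id="rep2">
      <EssentialProperty schemeIdUri="urn:example:view" value="0,0"/>
    </Representation>
  </AdaptationSet></Period></MPD>"""

DASH_NS = "{urn:mpeg:dash:schema:mpd:2011}"

def author_view_reps(mpd_text):
    """Return the ids of representations whose EssentialProperty value has
    '1' as its second comma-separated entry (interpreted as view_type)."""
    root = ET.fromstring(mpd_text)
    ids = []
    for rep in root.iter(DASH_NS + "Representation"):
        for prop in rep.iter(DASH_NS + "EssentialProperty"):
            values = prop.get("value", "").split(",")
            if len(values) >= 2 and values[1].strip() == "1":
                ids.append(rep.get("id"))
    return ids
```

The same walk works for Example 2 or Example 3 by reading a view_type attribute on Representation or AdaptationSet instead.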
• the switching instruction information received by the client may include the above-mentioned head rotation, eye movement, gestures, or other human behavior information, and may also include input information of the user, where the input information may include keyboard input information, voice input information, and touch screen input information.
  • the server may also add spatial information of one or more author space objects to the author view code stream.
• each author space object corresponds to one or more images; that is, one or more images may be associated with the same spatial object, or each image may be associated with its own spatial object.
• the server can add the spatial information of each author space object in the author view code stream, or treat the spatial information as samples and encapsulate it independently in a track or file.
• the spatial information of an author space object is the spatial relationship between the author space object and its associated content component, that is, the spatial relationship between the author space object and the panoramic space.
  • the space described by the spatial information of the author space object may specifically be a partial space in the panoramic space, such as any one of the above-mentioned FIG. 8 or the solid line frame (or any one of the dotted lines) in FIG.
• the server may add the foregoing spatial information to the trun box included in the segments of the author view code stream in the existing file format, so as to describe the spatial information of the spatial object associated with each frame image of the author view code stream.
• the server may add the syntax element tr_flags to the existing trun box and set the value of tr_flags to 0x001000, to mark that the spatial information of the relative position of the preset spatial object in the global spatial object is included in the trun box.
• the spatial information of the author space object included in the trun box described above may be described by yaw angles, by the spatial position in a latitude-longitude map, or by other geometric figures; no limitation is imposed here.
• the trun box described above uses yaw-angle descriptions such as center_pitch, center_yaw, center_roll, pitch_h, and yaw_w to describe the center position (center_pitch, center_yaw, center_roll), height (pitch_h), and width (yaw_w) of the spatial information on the sphere.
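A minimal sketch of the flag check and of appending the five yaw-angle fields per sample. The 0x001000 flag value is taken from the text; the unsigned 16-bit big-endian field width and the per-sample ordering are assumptions for illustration.

```python
import struct

TR_SPATIAL_INFO = 0x001000  # tr_flags bit indicating per-sample spatial information

def has_spatial_info(tr_flags):
    """True if the trun box is marked as carrying spatial information."""
    return bool(tr_flags & TR_SPATIAL_INFO)

def pack_sample_spatial_info(samples):
    """Serialize center_pitch, center_yaw, center_roll, pitch_h, yaw_w for
    each sample as unsigned 16-bit big-endian values, in that order."""
    out = b""
    for s in samples:
        out += struct.pack(">5H", s["center_pitch"], s["center_yaw"],
                           s["center_roll"], s["pitch_h"], s["yaw_w"])
    return out

samples = [{"center_pitch": 30, "center_yaw": 90, "center_roll": 0,
            "pitch_h": 120, "yaw_w": 120}]
```

A demuxer would test `has_spatial_info` on the parsed tr_flags before attempting to read the extra per-sample fields.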
• Figure 11 is a schematic illustration of the relative position of an author space object in the panoramic space.
• in FIG. 11, the point O is the center of the sphere corresponding to the 360-degree VR panoramic video spherical image, and can be regarded as the position of the human eye when viewing the VR panoramic image.
• point A is the center point of the author's perspective image; C and F are the boundary points of the image through point A along the horizontal coordinate axis of the image; E and D are the boundary points of the image through point A along the longitudinal coordinate axis of the image; B is the projection point of point A along the spherical meridian onto the equator; and I is the starting coordinate point in the horizontal direction on the equator. The meaning of each element is explained as follows:
• center_pitch: the vertical deflection angle of the point to which the center position of the image of the author space object is mapped on the panoramic spherical (that is, global space) image, such as ∠AOB in FIG. 11;
• center_yaw: the horizontal deflection angle of the point to which the center position of the image of the author space object is mapped on the panoramic spherical image, such as ∠IOB in FIG. 11;
• center_roll: the rotation angle of the image around the line connecting the sphere center and the point to which the center position of the image of the author space object is mapped, such as ∠DOB in FIG. 11;
• pitch_h: the height of the field of view of the image of the author space object on the panoramic spherical image, expressed as the maximum vertical angle of the field of view, such as ∠DOE in FIG. 11;
• yaw_w: the width of the field of view of the image of the author space object on the panoramic spherical image, expressed as the maximum horizontal angle of the field of view, such as ∠COF in FIG. 11.
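As a geometric aside (not part of the patent's syntax), these angles determine how much of the panoramic sphere the author space object covers. A sketch using the spherical-zone area formula, treating the FOV as a latitude-longitude window centered at center_pitch:

```python
import math

def fov_area_fraction(center_pitch, yaw_w, pitch_h):
    """Fraction of the panoramic sphere covered by a lat-long FOV window
    of width yaw_w and height pitch_h degrees centered at pitch
    center_pitch. Zone area between pitch limits is proportional to the
    difference of sines of the latitudes."""
    top = math.radians(min(center_pitch + pitch_h / 2.0, 90.0))
    bottom = math.radians(max(center_pitch - pitch_h / 2.0, -90.0))
    return (yaw_w / 360.0) * (math.sin(top) - math.sin(bottom)) / 2.0
```

For example, a 120 * 120 degree window at the equator covers a bit under 29% of the sphere, which is why only a few spatial objects need to be streamed for one viewing angle.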
  • the server may also add a new box and its syntax description to the video format for describing the spatial information of the author space object.
  • a new box and its syntax description information are as follows (example 2):
  • the information contained in the strp box is the newly added spatial information of the author space object, and the meaning of each syntax element it includes is the same as the meaning of the corresponding syntax element in example 1 above.
  • the "unsigned int(16) center_roll; //fov center position roll" field in the box may be absent in this example; whether it is present may be determined according to actual application scenario requirements, and is not limited here.
  • the above-mentioned strp box may be included in the stbl box, with a flag in the stbl box indicating that an strp box carrying spatial location information is present.
  • the parent box of the strp box can be determined according to actual application scenario requirements, and is not limited here.
  • the strp box described above may also be carried in the track fragment metadata (traf) of a DASH segment, or in the metadata of a track based on the ISOBMFF format.
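Because the example-2 box syntax is published only as an image in this document, the following hypothetical sketch shows how a client might parse such an strp payload, assuming five unsigned int(16) fields in the order center_pitch, center_yaw, center_roll, pitch_h, yaw_w. The field order and sizes are assumptions based on the "unsigned int(16) center_roll" fragment quoted above, not the actual normative layout.

```python
import struct

# Hypothetical field order; the real example-2 layout is published only as an
# image, so this ordering is an assumption for illustration.
STRP_FIELDS = ("center_pitch", "center_yaw", "center_roll", "pitch_h", "yaw_w")

def parse_strp_payload(payload: bytes) -> dict:
    """Parse a hypothetical strp box payload of five unsigned int(16) angles."""
    values = struct.unpack(">5H", payload[:10])  # big-endian, as in ISOBMFF
    return dict(zip(STRP_FIELDS, values))

payload = struct.pack(">5H", 0, 30, 0, 110, 150)
print(parse_strp_payload(payload))
# → {'center_pitch': 0, 'center_yaw': 30, 'center_roll': 0, 'pitch_h': 110, 'yaw_w': 150}
```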
  • the spatial information described in the various embodiments above (FOV center position plus width and height, or FOV upper-left and lower-right positions) may also be carried in a separate metadata stream or metadata track associated with the author view code stream, each piece of spatial information corresponding to a sample in the metadata track or metadata file.
  • the client parses the segments of the author view code stream to obtain the spatial information of the author space object, and thereby determines the relative position of the author space object in the panoramic space; during video playback it can then determine, from the spatial information of the author space object of the currently viewed author view code stream and the trajectory of the view switch, the position of the space object after the view switch, so as to realize switching between the author-view stream and non-author-view streams.
  • the spatial information of the author space object can also be obtained by parsing the metadata track or the metadata file.
  • the spatial information of the non-author view after the switch may be determined according to the spatial information of the author view, and the code stream corresponding to the spatial information of the non-author view is then obtained and presented.
  • the client may set, as the starting point, the center position of the author space object determined above, a specified boundary position included in the author space object, or the positions corresponding to the upper-left and lower-right corners of the space object; for example, the position indicated by one or more of the parameters center_pitch, center_yaw, center_roll, pitch_h and yaw_w above may be set as the starting point.
  • the client may then calculate, according to the space-object switching trajectory of the view switch, the end-point space object indicated by that trajectory, and determine the end-point space object as the target space object.
  • the solid line area shown in FIG. 10 is an author space object
  • the dotted line area is a target space object calculated based on the author space object and the space object switching trajectory.
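As a rough illustration of how the dotted-line target space object could be derived from the author space object and the switching trajectory, the sketch below applies a trajectory of per-step yaw/pitch deltas to the starting FOV center. The delta-based trajectory format is an assumption for illustration, not the patent's normative representation.

```python
def target_space_object(start, trajectory):
    """Apply a view-switch trajectory (per-step yaw/pitch deltas, in degrees)
    to a starting FOV center and return the end-point FOV center.

    `start` is a (yaw, pitch) pair such as (center_yaw, center_pitch); the
    delta-based trajectory format is a hypothetical illustration.
    """
    yaw, pitch = start
    for dyaw, dpitch in trajectory:
        yaw = (yaw + dyaw + 180.0) % 360.0 - 180.0     # wrap yaw to [-180, 180)
        pitch = max(-90.0, min(90.0, pitch + dpitch))  # clamp pitch at the poles
    return yaw, pitch

print(target_space_object((0.0, 0.0), [(30.0, 10.0), (20.0, -5.0)]))  # → (50.0, 5.0)
```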
  • after determining the target space object, the client may request from the server the code stream corresponding to the target space object.
  • the client may send a request for acquiring a code stream of the target space object to the server according to information such as a URL of a code stream of each spatial object described in the MPD.
  • the server may send the code stream corresponding to the target space object to the client.
  • after the client obtains the code stream of the target space object, it can decode and play that code stream, thereby realizing the switching of the view code stream.
  • the client may determine the author view code stream according to the identification information carried in the MPD, and may also obtain the spatial information of the author space object corresponding to the author view code stream carried in that stream; during view switching it can then continue to obtain the author view code stream according to the position of the author space object, or determine, according to the author space object, the target space object of the non-author view after the view switch.
  • the client may then request from the server the non-author view code stream corresponding to the target space object for playback, thereby realizing playback of the view-switched code stream. Because the client requests code streams according to the view switch, it does not need to load the panoramic video stream, which saves transmission bandwidth of the video data and local storage space of the client.
  • the client requests, according to the target space object determined in the view switching process, the code stream corresponding to the target space object for playback, which reduces the bandwidth consumed by video data transmission while still realizing view-switched stream playback, thereby improving the applicability of video switching playback and enhancing the user's video viewing experience.
  • FIG. 12 is a schematic structural diagram of a device for processing video data based on HTTP dynamic adaptive streaming media according to an embodiment of the present invention.
  • the processing device provided by the embodiment of the present invention includes:
  • the receiving module 121 is configured to receive a media presentation description, where the media presentation description includes information of at least two representations; the first representation of the at least two representations is an author view code stream, the author view code stream includes a plurality of images, and the spatial information of the spatial objects associated with at least two of the plurality of images is different; the second representation of the at least two representations is a static view code stream, the static view code stream includes a plurality of images, and the spatial information of the spatial objects associated with the plurality of images is the same.
  • the media presentation description includes description information of at least one spatial information file associated with the author view code stream.
  • the obtaining module 122 is configured to obtain instruction information.
  • the obtaining module 122 is further configured to: acquire the segment of the first representation when the obtained instruction information indicates viewing the author view code stream, and otherwise acquire the segment of the second representation.
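The acquisition rule above can be sketched as a trivial selection function; the role names and segment URLs below are hypothetical placeholders, not values defined by the MPD schema.

```python
def choose_segment(mpd, instruction):
    """Pick which representation's next segment to fetch.

    `mpd` maps a role to its segment URL list; the role names
    ('author_view', 'static_view') and the URL layout are hypothetical.
    """
    role = "author_view" if instruction == "view_author" else "static_view"
    return mpd[role][0]  # next segment of the chosen representation

mpd = {
    "author_view": ["rep1/seg1.m4s"],   # first representation (author view)
    "static_view": ["rep2/seg1.m4s"],   # second representation (static view)
}
print(choose_segment(mpd, "view_author"))  # → rep1/seg1.m4s
print(choose_segment(mpd, "free_view"))    # → rep2/seg1.m4s
```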
  • the media presentation description further includes identification information, where the identification information is used to identify an author view code stream of the video.
  • the media presentation description includes information of an adaptation set, the adaptation set describing attributes of media data segments of a plurality of interchangeable encoded versions of the same media content component.
  • the information of the adaptive set includes the identifier information.
  • the media presentation description includes information of a representation, the representation being a set and encapsulation of one or more code streams in a transport format;
  • the information of the representation includes the identifier information.
  • the media presentation description includes information of a descriptor, and the descriptor is used to describe the spatial information of the spatial object associated with it;
  • the information of the descriptor includes the identifier information.
  • the segment of the first representation carries spatial information of a spatial object associated with an image included in the segment of the first representation
  • the media presentation description includes description information of a spatial information file;
  • the obtaining module 122 is further configured to: acquire part or all of the data in the spatial information file;
  • the obtaining module 122 is further configured to:
  • the spatial information of the spatial object associated with the image is a spatial relationship of the spatial object and its associated content component.
  • the spatial information is carried in a designated box in the segment of the first representation, or in a specified box in a metadata representation associated with the segment of the first representation.
  • the designated box is a trun box included in a segment of the first representation, and the trun box is used to describe a set of consecutive samples of a track.
  • the obtaining module 122 is further configured to:
  • the processing device for video data provided by this embodiment of the present invention may specifically be the client in the foregoing embodiments, and the implementations described in each step of the foregoing video data processing method may be realized by the built-in modules of the device; details are not repeated here.
  • the client may determine the author view code stream according to the identifier information carried in the MPD, and may also obtain the spatial information of the author space object corresponding to the author view code stream carried in that stream; during view switching it can then continue to acquire the author view code stream according to the position of the author space object, or determine, according to the author space object, the target space object of the non-author view after the view switch.
  • the client may then request from the server the non-author view code stream corresponding to the target space object for playback, thereby realizing playback of the view-switched code stream.
  • because the client requests code streams according to the view switch, it does not need to load the panoramic video stream, which saves transmission bandwidth of the video data and local storage space of the client.
  • the client requests, according to the target space object determined in the view switching process, the code stream corresponding to the target space object for playback, which reduces the bandwidth consumed by video data transmission while still realizing view-switched stream playback, thereby improving the applicability of video switching playback and enhancing the user's video viewing experience.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

Abstract

Disclosed in the embodiments of the present invention are a method and apparatus for processing hypertext transfer protocol (HTTP) dynamic adaptive streaming-based video data, the method comprising: receiving a media presentation description, the media presentation description comprising information of at least two representations, where a first representation among the at least two representations is an author view code stream, the author view code stream comprising a plurality of images, and the space information of the space objects associated with at least two images among the plurality of images is different, while a second representation of the at least two representations is a static view code stream, the static view code stream comprising a plurality of images whose associated space objects all have the same space information; obtaining instruction information; and acquiring a segment of the first representation if the instruction information indicates viewing the author view code stream, and otherwise acquiring a segment of the second representation. The embodiments of the present invention save transmission resources of video data, improve the flexibility and applicability of video presentation, and enhance the user's video viewing experience.

Description

Method and device for processing video data

Technical field

The present invention relates to the field of streaming media data processing, and in particular, to a method and an apparatus for processing video data.

Background art

With the continuing development of virtual reality (VR) technology, viewing applications for VR video, such as 360-degree video, are increasingly presented to users. During VR video viewing, the user may change the field of view (FOV) at any time. Each view corresponds to the video code stream of one spatial object, and when the view is switched, the VR video image presented within the user's view should switch along with it.

In the prior art, in the VR video preparation phase, the VR panoramic video is divided into multiple code streams corresponding to multiple fixed spatial objects, and each fixed spatial object corresponds to a set of code streams based on dynamic adaptive streaming over HTTP (DASH), where HTTP is the hypertext transfer protocol. When the user changes the view, the terminal selects, according to the new spatial object after the switch, one or more fixed spatial objects of the video that contain that spatial object, each fixed spatial object containing a part of the switched spatial object. The terminal acquires the code stream of the panoramic spatial object, decodes the code streams of the one or more fixed spatial objects, and then presents the video content corresponding to the new spatial object. In the prior art, the terminal needs to store the code stream of the panoramic spatial object in local storage space and then select the corresponding code stream for presentation according to the new spatial object; the code streams corresponding to spatial objects that are not presented are excess video data, which not only occupies local storage space but also wastes network transmission bandwidth for video data transmission, resulting in poor applicability.
Summary of the invention

I. Introduction to MPEG-DASH technology

In November 2011, the MPEG organization approved the DASH standard. The DASH standard is a technical specification for transmitting media streams based on the HTTP protocol (hereinafter referred to as the DASH technical specification). The DASH technical specification mainly consists of two parts: the media presentation description (MPD) and the media file format.

1. Media file format
In DASH, the server prepares multiple versions of the code stream for the same video content; each version of the code stream is called a representation in the DASH standard. A representation is a set and encapsulation of one or more code streams in a transport format, and one representation contains one or more segments. Different versions of the code stream may differ in coding parameters such as bit rate and resolution, and each code stream is divided into multiple small files, each of which is called a segment. The client can switch between different media representations while requesting media segment data. As shown in FIG. 3, the server prepares three representations for a movie, including rep1, rep2, and rep3, where rep1 is a high-definition video with a bit rate of 4 Mbps (megabits per second), rep2 is a standard-definition video with a bit rate of 2 Mbps, and rep3 is a standard-definition video with a bit rate of 1 Mbps. The segments marked as shaded in FIG. 3 are the segment data requested by the client: the first three segments requested by the client are segments of representation rep3; for the fourth segment the client switches to rep2 and requests the fourth segment of rep2; it then switches to rep1 and requests the fifth and sixth segments of rep1, and so on. The segments of each representation can be stored end to end in one file, or stored independently as individual small files. A segment may be encapsulated in the format of ISO/IEC 14496-12 (ISO BMFF, Base Media File Format) or in the format of ISO/IEC 13818-1 (MPEG-2 TS).
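The rep1/rep2/rep3 switching described above is driven, in a typical DASH client, by measured throughput. The following is a simplified sketch using the bit rates from the example (4, 2, and 1 Mbps); real rate-adaptation algorithms also consider buffer level and switch damping.

```python
REPRESENTATIONS = {"rep1": 4_000_000, "rep2": 2_000_000, "rep3": 1_000_000}  # bits/s

def pick_representation(measured_bps, reps=REPRESENTATIONS):
    """Choose the highest-bitrate representation the measured throughput can
    sustain, falling back to the lowest-bitrate one otherwise (a common,
    simplified DASH rate-adaptation heuristic)."""
    affordable = [r for r, bps in reps.items() if bps <= measured_bps]
    if not affordable:
        return min(reps, key=reps.get)  # nothing fits: take the 1 Mbps stream
    return max(affordable, key=reps.get)

print(pick_representation(1_500_000))  # → rep3
print(pick_representation(2_500_000))  # → rep2
print(pick_representation(5_000_000))  # → rep1
```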
2. Media presentation description

In the DASH standard, the media presentation description is called the MPD. The MPD may be an XML file in which the information is described in a hierarchical manner, as shown in FIG. 2; the information of an upper level is completely inherited by the level below it. The file describes media metadata that allows the client to learn about the media content information on the server and to use this information to construct the HTTP URLs for requesting segments.
In the DASH standard, a media presentation is a collection of structured data that presents media content. A media presentation description is a formalized file describing the media presentation, used to provide a streaming media service. A period is one of a group of consecutive periods that make up the entire media presentation; periods are continuous and non-overlapping. A representation is a structured data set encapsulating one or more media content components (individually encoded media types, such as audio or video) with descriptive metadata; that is, a representation is a set and encapsulation of one or more code streams in a transport format, and one representation contains one or more segments. An adaptation set represents a set of multiple interchangeable encoded versions of the same media content component; one adaptation set contains one or more representations. A subset is a combination of a group of adaptation sets such that, when the player plays all the adaptation sets in it, the corresponding media content can be obtained. Segment information is the media unit referenced by an HTTP uniform resource locator in the media presentation description; segment information describes segments of the media data, and the segments of the media data may be stored in one file or stored separately. In one possible manner, the segments of the media data are stored in the MPD.
For related technical concepts of the MPEG-DASH technology in the present invention, reference may be made to the relevant provisions in ISO/IEC 23009-1:2014, Information technology -- Dynamic adaptive streaming over HTTP (DASH) -- Part 1: Media presentation description and segment formats, or to the relevant provisions in historical versions of the standard, such as ISO/IEC 23009-1:2013 or ISO/IEC 23009-1:2012.
II. Introduction to virtual reality (VR) technology

Virtual reality technology is a computer simulation system that can create, and let users experience, a virtual world. It uses a computer to generate a simulated environment and is an interactive, three-dimensional dynamic visual and entity-behavior system simulation with multi-source information fusion, immersing the user in the environment. VR mainly involves the simulated environment, perception, natural skills, and sensing devices. The simulated environment consists of computer-generated, real-time, dynamic, three-dimensional realistic images. Perception means that an ideal VR system should have all the forms of perception a person has: in addition to the visual perception generated by computer graphics technology, there are also hearing, touch, force, and motion, and even smell and taste, which is also known as multi-perception. Natural skills refer to head rotation, eye movement, gestures, or other human actions; the computer processes data appropriate to the participant's actions, responds to the user's input in real time, and feeds the responses back to the user's senses. Sensing devices are three-dimensional interactive devices. When a VR video (or 360-degree video, or omnidirectional video) is presented on a head-mounted device or handheld device, only the video image and associated audio corresponding to the orientation of the user's head are presented.
The difference between VR video and normal video is that for normal video the entire video content is presented to the user, whereas for VR video only a subset of the entire video is presented to the user ("in VR typically only a subset of the entire video region represented by the video pictures" is presented).
III. Spatial description in the existing DASH standard

In the existing standard, the original description of spatial information reads: "The SRD scheme allows Media Presentation authors to express spatial relationships between Spatial Objects. A Spatial Object is defined as a spatial part of a content component (e.g. a region of interest, or a tile) and represented by either an Adaptation Set or a Sub-Representation."

That is, the MPD describes the spatial relationships between spatial objects. A spatial object is defined as a part of the space of a content component, for example an existing region of interest (ROI) or a tile; spatial relationships can be described in an Adaptation Set and a Sub-Representation. The existing DASH standard defines some descriptor elements in the MPD; each descriptor element has two attributes, schemeIdUri and value, where schemeIdUri describes what the current descriptor is and value is the parameter value of the descriptor. The existing standard already defines two descriptors, SupplementalProperty and EssentialProperty (the supplemental property descriptor and the essential property descriptor). In the existing standard, if the schemeIdUri of one of these descriptors is "urn:mpeg:dash:srd:2014" (or schemeIdUri="urn:mpeg:dash:VR:2017"), the descriptor describes the spatial information associated to the containing spatial object, and the corresponding value lists a series of SRD parameter values. The syntax of value is shown in Table 1:
Table 1

[The contents of Table 1 are published as images (PCTCN2016107111-appb-000001 and -000002) and are not reproduced here.]
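Since Table 1 itself is published only as an image, the sketch below parses an SRD value string using the field order implied by the later example value="1,0,0,1920,1080,3840,2160,2" (source_id, object position and size, total size, spatial_set_id); treat the field names as an assumption rather than the normative Table 1 definition.

```python
# Field order inferred from the example value="1,0,0,1920,1080,3840,2160,2";
# this is an assumption, not the normative Table 1 definition.
SRD_FIELDS = ("source_id", "object_x", "object_y", "object_width",
              "object_height", "total_width", "total_height", "spatial_set_id")

def parse_srd_value(value: str) -> dict:
    """Parse an SRD descriptor's value attribute into named integer fields."""
    parts = [int(p) for p in value.split(",")]
    return dict(zip(SRD_FIELDS, parts))

srd = parse_srd_value("1,0,0,1920,1080,3840,2160,2")
print(srd["object_width"], srd["total_width"])  # → 1920 3840
```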
In an embodiment of the present invention, the manner of describing spatial positions is further exemplified with reference to FIG. 11.

As shown in FIG. 11, the spatial position information may further include the spatial position corresponding to the thick-line frame and the spatial position corresponding to the thin-line frame in the figure; the space object corresponding to the thick-line frame completely covers the space object of the thin-line frame, and the space object of the thin-line frame covers part of the space object of the thick-line frame.

The above spatial positions may be described in the following ways:

Mode 1: the spatial position of the thick-line frame is described by the center position of the view and the upper and lower angular range of the view, and the spatial position of the thin-line frame is likewise described by the center position of the view and the upper and lower angular range of the view;
[The syntax for mode 1 is published as images (PCTCN2016107111-appb-000003 and -000004) and is not reproduced here.]
All the spatial position information syntax in the present invention may exist independently in a box, for example an strp box. If an identifier exists at the level above the strp box indicating that the strp box is present, the strp box describes the spatial position information of the thick-line frame and the spatial position information of the thin-line frame.

Mode 2: the spatial position of the thick-line frame is described by the center position of the view and the upper and lower angular range of the view; the spatial position of the thin-line frame is described by an offset of the view center position and an offset of the upper and lower angular range of the view;
[The syntax for mode 2 is published as images (PCTCN2016107111-appb-000005 and -000006) and is not reproduced here.]
Mode 3: the spatial position of the thin-line frame is described by the center position of the view and the upper and lower angular range of the view; the spatial position of the thick-line frame is described by an offset of the view center position and an offset of the upper and lower angular range of the view.
[The syntax for mode 3 is published as an image (PCTCN2016107111-appb-000007) and is not reproduced here.]
In all the above embodiments, the spatial position information is described by the center position and the width and height of the view; the center position may also be replaced by the upper-left starting position.

The angular representation of the above spatial position may also be replaced by a spatial position expressed in the coordinates of the three coordinate axes x, y, and z.
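Modes 2 and 3 above describe one frame relative to the other. The following is a minimal sketch, assuming the offsets are simple additive deltas on the mode-1 fields (an assumption for illustration; the actual offset semantics are defined by the box syntax, which is not reproduced here).

```python
def resolve_offset_position(base, offsets):
    """Resolve a relative spatial position (mode 2/3 style) against its base.

    Both arguments are dicts with keys center_yaw, center_pitch, yaw_w and
    pitch_h (degrees); additive offsets are an assumption for illustration.
    """
    return {key: base[key] + offsets.get(key, 0.0) for key in base}

thick = {"center_yaw": 0.0, "center_pitch": 0.0, "yaw_w": 150.0, "pitch_h": 110.0}
thin_offsets = {"center_yaw": 10.0, "yaw_w": -60.0, "pitch_h": -40.0}
print(resolve_offset_position(thick, thin_offsets))
# → {'center_yaw': 10.0, 'center_pitch': 0.0, 'yaw_w': 90.0, 'pitch_h': 70.0}
```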
FIG. 6 is a schematic diagram of the spatial relationship of spatial objects. The image AS can be regarded as one content component, and AS1, AS2, AS3, and AS4 are four spatial objects included in AS, each spatial object being associated with one space. The MPD describes the spatial relationships of the spatial objects, for example the relationships between the spaces associated with the spatial objects.

An MPD example is as follows:
[The MPD example is published as images (PCTCN2016107111-appb-000008 to -000010) and is not reproduced here.]
The upper-left coordinates of the spatial object, the width and height of the spatial object, and the reference space of the spatial object may also be relative values; for example, the above value="1,0,0,1920,1080,3840,2160,2" can equivalently be described as value="1,0,0,1,1,2,2,2".

The spatial information of the spatial object may also be yaw-angle information; for example, the above value="1,0,0,0,150,110" indicates that, in the spatial information of the corresponding spatial object, the center (or upper-left corner) is (0,0,0), the width of the view is 150 degrees, and the height is 110 degrees. The spatial object of the view code stream corresponding to this spatial information may completely contain the spatial object described by the spatial information, or there may be some deviation.
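The yaw-angle value format in this example can be unpacked as follows; the field names are chosen to match the text's reading of value="1,0,0,0,150,110" and are otherwise hypothetical.

```python
def parse_yaw_value(value: str) -> dict:
    """Parse a yaw-angle SRD-style value such as "1,0,0,0,150,110".

    Field naming (source_id, then a three-component center position, then
    view width/height in degrees) follows the example in the text and is
    otherwise an assumption.
    """
    source_id, a, b, c, width, height = (int(p) for p in value.split(","))
    return {"source_id": source_id, "center": (a, b, c),
            "width_deg": width, "height_deg": height}

info = parse_yaw_value("1,0,0,0,150,110")
print(info["center"], info["width_deg"], info["height_deg"])  # → (0, 0, 0) 150 110
```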
In some feasible implementations, for the output of a 360-degree large-view video image, the server may divide the space within the 360-degree view range to obtain multiple spatial objects, each spatial object corresponding to one sub-view of the user; the splicing of multiple sub-views forms a complete human-eye viewing angle, which changes dynamically and is usually about 120 degrees by 120 degrees, for example the spatial object corresponding to box 1 and the spatial object corresponding to box 2 in FIG. 7. The server may prepare a set of video code streams for each spatial object; specifically, the server may obtain the encoding configuration parameters of each code stream of the video and generate the code stream corresponding to each spatial object of the video according to those encoding configuration parameters. During video output, the client may request from the server the video code stream segments corresponding to a certain view in a certain time period and output them to the spatial object corresponding to that view. If the client outputs, in the same time period, the video code stream segments corresponding to all views within the 360-degree view range, the complete video image of that time period can be output and displayed in the entire 360-degree space.
具体实现中,在360度的空间的划分中,服务器可首先将球面映射为平面,在平面上对空间进行划分。具体的,服务器可采用经纬度的映射方式将球面映射为经纬平面图。如图8,图8是本发明实施例提供的空间对象的示意图。服务器可将球面映射为经纬平面图,并将经纬平面图划分为A~I等多个空间对象。进一步的,服务器也可将球面映射为立方体,再将立方体的多个面进行展开得到平面图;或者将球面映射为其他多面体,再将多面体的多个面进行展开得到平面图等。服务器还可采用更多的映射方式将球面映射为平面,具体可根据实际应用场景需求确定,在此不做限制。下面将以经纬度的映射方式,结合图8进行说明。In a specific implementation, to divide the 360-degree space, the server may first map the spherical surface onto a plane and divide the space on the plane. Specifically, the server may map the spherical surface into a latitude-longitude plan using a latitude-longitude mapping. As shown in FIG. 8, which is a schematic diagram of spatial objects according to an embodiment of the present invention, the server may map the spherical surface into a latitude-longitude plan and divide the plan into multiple spatial objects A to I. Further, the server may also map the spherical surface onto a cube and unfold the faces of the cube to obtain a plan, or map the spherical surface onto another polyhedron and unfold its faces to obtain a plan, and so on. The server may also use other mappings from the sphere to a plane, which may be determined according to the requirements of the actual application scenario and is not limited herein. The latitude-longitude mapping is described below with reference to FIG. 8.
如图8,服务器将球面的空间对象划分为A~I等多个空间对象之后,可为每个空间对象准备一组DASH码流。其中,每个空间对象对应一个子视角,每个空间对象对应的一组DASH码流为每个子视角的视角码流。一个视角码流中每个图像所关联的空间对象的空间信息相同,由此可将视角码流设为静态视角码流。每个子视角的视角码流为整个视频码流的一部分,所有子视角的视角码流构成一个完整的视频码流。视频播放过程中,可根据用户当前观看的视角选择相应的空间对象对应的DASH码流进行播放。用户切换视频观看的视角时,客户端则可根据用户选择的新视角确定切换的目标空间对象对应的DASH码流,进而可将视频播放内容切换为目标空间对象对应的DASH码流。As shown in FIG. 8, after the server divides the spherical space into multiple spatial objects A to I, it may prepare a set of DASH code streams for each spatial object. Each spatial object corresponds to one sub-view, and the set of DASH code streams corresponding to each spatial object is the view code stream of that sub-view. The spatial information of the spatial objects associated with all images in one view code stream is the same, so such a view code stream may be regarded as a static view code stream. The view code stream of each sub-view is part of the entire video code stream, and the view code streams of all sub-views constitute a complete video code stream. During video playback, the DASH code stream corresponding to the spatial object matching the view currently watched by the user may be selected for playback. When the user switches the viewing angle, the client may determine the DASH code stream corresponding to the target spatial object according to the new view selected by the user, and then switch the played content to the DASH code stream corresponding to the target spatial object.
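As an illustrative sketch (not part of the original patent text), the mapping from a viewing direction to the spatial object whose DASH segments the client should request could look like this; the uniform 3x3 A-I grid over the latitude-longitude plan and the angle conventions are assumptions:

```python
def tile_for_view(yaw_deg: float, pitch_deg: float, cols: int = 3, rows: int = 3) -> str:
    # Assumed uniform 3x3 split of the latitude-longitude plan, labeled
    # A..I row by row from the top; yaw in [0, 360), pitch in [-90, 90].
    col = int((yaw_deg % 360) // (360 / cols))
    row = int(min(pitch_deg + 90, 179.999) // (180 / rows))
    return "ABCDEFGHI"[row * cols + col]

# The client would then request, for the current time period, the DASH
# segments of the representation associated with the returned spatial object.
```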
本发明实施例提供了一种基于HTTP动态自适应流媒体的视频数据的处理方法及装置,可节省视频数据的传输带宽资源,提高视频呈现的灵活性和适用性,增强视频观看的用户体验。The embodiment of the invention provides a method and a device for processing video data based on HTTP dynamic adaptive streaming media, which can save transmission bandwidth resources of video data, improve flexibility and applicability of video presentation, and enhance user experience of video viewing.
第一方面提供了一种基于HTTP动态自适应流媒体的视频数据的处理方法,其可包括:The first aspect provides a method for processing video data based on HTTP dynamic adaptive streaming, which may include:
接收媒体呈现描述,所述媒体呈现描述包括至少两个的表示的信息,所述至少两个表示中的第一表示是作者视角码流,所述作者视角码流中包含多个图像,所述多个图像中至少两个图像所关联的空间对象的空间信息不同,所述至少两个表示中的第二表示是静态视角码流,所述静态视角码流中包含多个图像,所述多个图像所关联的空间对象的空间信息相同;Receiving a media presentation description, where the media presentation description includes information of at least two representations; a first representation of the at least two representations is an author view code stream, the author view code stream includes multiple images, and the spatial information of the spatial objects associated with at least two of the multiple images is different; a second representation of the at least two representations is a static view code stream, the static view code stream includes multiple images, and the spatial information of the spatial objects associated with the multiple images is the same;
得到指令信息;Get instruction information;
若所述指令信息是观看作者视角码流,则获取所述第一表示的分段,否则,获取所述第二表示的分段。If the instruction information indicates viewing the author view code stream, the segment of the first representation is acquired; otherwise, the segment of the second representation is acquired.
本发明实施例可在媒体呈现描述中描述作者视角码流和静态视角码流,其中,作者视角码流中包含的图像所关联的空间对象的空间信息可动态变化,静态视角码流中包含的图像所关联的空间对象的空间信息不变。本发明实施例可根据得到的指令信息从作者视角码流和镜头视角码流中选择相应的码流的分段,提高码流分段选择的灵活性,增强视频观看的用户体验。本发明实施例从作者视角码流和静态视角码流中获取相应的码流的分段,无需获取所有分段,可节省视频数据的传输带宽资源,增强了数据处理的适用性。The embodiment of the present invention may describe the author view code stream and the static view code stream in the media presentation description, where the spatial information of the spatial objects associated with the images included in the author view code stream may change dynamically, while the spatial information of the spatial objects associated with the images included in the static view code stream does not change. According to the obtained instruction information, the embodiment of the present invention may select segments of the corresponding code stream from the author view code stream and the lens view code stream, which improves the flexibility of code stream segment selection and enhances the user experience of video viewing. The embodiment of the present invention acquires segments of the corresponding code stream from the author view code stream and the static view code stream without having to acquire all segments, which saves transmission bandwidth resources of the video data and enhances the applicability of data processing.
在一种可能的实现方式中,所述媒体呈现描述还包含有标识信息,所述标识信息用于标识视频的作者视角码流。In a possible implementation manner, the media presentation description further includes identifier information, where the identifier information is used to identify an author view code stream of the video.
在一种可能的实现方式中,所述媒体呈现描述中包含自适应集的信息,所述自适应集为用于描述同一媒体内容成分的多个可互相替换的编码版本的媒体数据分段的属性的数据集合;In a possible implementation, the media presentation description includes information of an adaptation set, where the adaptation set is a data set describing the attributes of media data segments of multiple mutually interchangeable encoded versions of the same media content component;
其中,所述自适应集的信息中包含所述标识信息。The information of the adaptive set includes the identifier information.
在一种可能的实现方式中,所述媒体呈现描述中包含表示的信息,所述表示为传输格式中的一个或者多个码流的集合和封装;In a possible implementation manner, the media presentation description includes information indicating, the representation being a set and encapsulation of one or more code streams in a transmission format;
其中,所述表示的信息中包含所述标识信息。The information that is represented includes the identifier information.
在一种可能的实现方式中,所述媒体呈现描述中包含描述子的信息,所述描述子用于描述关联到的空间对象的空间信息;In a possible implementation manner, the media presentation description includes information about a descriptor, and the descriptor is used to describe spatial information of a spatial object to which the association is associated;
其中,所述描述子的信息中包含所述标识信息。The information of the descriptor includes the identifier information.
本发明实施例可在媒体呈现描述中添加作者视角码流的标识信息,可提高作者视角码流的可识别性。本发明实施例还可在媒体呈现描述的自适应集的信息中,或者媒体呈现描述的表示的信息中,或者媒体呈现描述的描述子的信息中携带作者视角码流的标识信息,操作灵活,适用性高。The embodiment of the present invention may add the identification information of the author view code stream to the media presentation description, which can improve the recognizability of the author view code stream. The embodiment of the present invention may also carry the identification information of the author view code stream in the information of the adaptation set of the media presentation description, in the information of a representation of the media presentation description, or in the information of a descriptor of the media presentation description, which is flexible in operation and highly applicable.
在本发明实施例中,服务器在生成MPD时需要增加对应作者视角码流的语法元素,客户端可以根据该语法元素得到作者视角码流信息。服务器生成MPD时可在MPD中添加用于描述作者视角码流的表示,设为第一表示。MPD中现有的用于描述静态视角码流的表示可称为第二表示。几种可能的MPD语法元素的表示方式如下所示。可以理解的是,本发明实施例的MPD示例仅示出了本发明技术对现有标准中规定MPD的语法元素进行修改的相关部分,未示出MPD文件的全部语法元素,本领域普通技术人员可以结合DASH标准中的相关规定运用本发明实施例的技术方案。In the embodiment of the present invention, the server needs to add a syntax element corresponding to the author view code stream when generating the MPD, and the client may obtain the author view code stream information according to that syntax element. When generating the MPD, the server may add to the MPD a representation describing the author view code stream, referred to as the first representation. The representation already existing in the MPD for describing the static view code stream may be referred to as the second representation. Several possible representations of the MPD syntax elements are shown below. It can be understood that the MPD examples of the embodiments of the present invention only show the parts in which the present technology modifies the MPD syntax elements specified in the existing standard, and do not show all syntax elements of an MPD file; a person of ordinary skill in the art may apply the technical solutions of the embodiments of the present invention in combination with the relevant provisions of the DASH standard.
在一种可能的实现方式中,在MPD中新增语法描述,如下表2,表2为新增语法元素的属性信息表:In a possible implementation manner, a new syntax description is added in the MPD, as shown in Table 2 below, and Table 2 is an attribute information table of the newly added syntax element:
表2Table 2
Figure PCTCN2016107111-appb-000011
在MPD中通过属性@view_type来标记对应的representation是非作者视角(或称静态视角)码流还是作者视角(或称动态视角)码流。当view_type值为0时,表示对应的representation是非作者视角码流;当view_type值为1时,表示对应的representation是作者视角码流。客户端在本地解析MPD文件时,可根据该属性来判断当前视频流中是否包含作者视角码流。下面将通过一些可行的实现方式对应的MPD样例进行说明:In the MPD, the attribute @view_type is used to mark whether the corresponding representation is a non-author view (or static view) code stream or an author view (or dynamic view) code stream. When the view_type value is 0, it indicates that the corresponding representation is a non-author view code stream; when the view_type value is 1, it indicates that the corresponding representation is the author view code stream. When the client parses the MPD file locally, it can determine whether the current video stream contains the author view stream according to the attribute. The following is a description of the MPD examples corresponding to some possible implementations:
样例一:描述在MPD描述子中Example 1: Described in the MPD descriptor
Figure PCTCN2016107111-appb-000012
Figure PCTCN2016107111-appb-000013
如上所示,在该样例中,服务器可在现有MPD语法的EssentialProperty包含的value属性的第二个值的位置插入一个新的值,原有value的第二个值以及第二个值以后的值依次往后挪一个值。客户端解析MPD之后则可获取得到value的第二个值,即在该样例中,value的第二个值是view_type。EssentialProperty中的value="0,0,…",即value的第二个值为0(即view_type=0),表示EssentialProperty描述的是固定视角码流;value="0,1",即value的第二个值为1(即view_type=1),表示EssentialProperty描述的是作者视角码流。As shown above, in this example, the server may insert a new value at the position of the second value of the value attribute contained in the EssentialProperty of the existing MPD syntax, and the original second value and all values after it are shifted back by one position. After parsing the MPD, the client can obtain the second value of value; that is, in this example, the second value of value is view_type. In the EssentialProperty, value="0,0,…", i.e. a second value of 0 (view_type=0), indicates that the EssentialProperty describes a fixed view code stream; value="0,1", i.e. a second value of 1 (view_type=1), indicates that the EssentialProperty describes the author view code stream.
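As an illustrative sketch (not part of the original patent text), the client-side check described above can be expressed as follows; the assumption is that view_type occupies the second comma-separated field of the EssentialProperty value attribute:

```python
def is_author_view(value_attr: str) -> bool:
    # view_type is assumed to be the second comma-separated field of the
    # EssentialProperty "value" attribute:
    #   0 = static (fixed) view code stream, 1 = author view code stream.
    fields = value_attr.split(",")
    return len(fields) > 1 and fields[1].strip() == "1"

author = is_author_view("0,1")              # author view code stream
static = is_author_view("0,0,0,0,1920,1080")  # fixed view code stream
```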
样例二:描述在表示中Example 2: Described in the representation
Figure PCTCN2016107111-appb-000014
Figure PCTCN2016107111-appb-000015
在该样例中,Representation的属性信息中新增了语法元素view_type。当view_type="0"或不设定时(默认是0),表示该Representation描述的码流为固定视角码流;当view_type="1"时,表示该Representation描述的码流为作者视角码流。In this example, the syntax element view_type is added to the attribute information of the Representation. When view_type="0" or the attribute is not set (the default is 0), the code stream described by the Representation is a fixed view code stream; when view_type="1", the code stream described by the Representation is the author view code stream.
样例三:描述在自适应集(adaptationSet)的属性信息中Example 3: Described in the attribute information of the adaptation set (adaptationSet)
Figure PCTCN2016107111-appb-000016
Figure PCTCN2016107111-appb-000017
该样例中,在AdaptationSet的属性信息(即作者视角码流所在码流集合的属性信息)中新增语法元素view_type。当view_type="0"或不设定时(默认是0),表示该AdaptationSet下包含的码流为固定视角码流;当view_type="1"时,表示AdaptationSet下包含的码流为作者视角码流。In this example, the syntax element view_type is added to the attribute information of the AdaptationSet (i.e., the attribute information of the code stream set in which the author view code stream is located). When view_type="0" or the attribute is not set (the default is 0), the code streams contained in the AdaptationSet are fixed view code streams; when view_type="1", the code streams contained in the AdaptationSet are author view code streams.
在上述所有的MPD样例中,也可以添加空间信息的独立文件的描述信息,比如增加一个adaptation set,在该adaptation set中描述空间信息文件的信息;In all of the above MPD examples, description information of an independent spatial information file may also be added, for example by adding an adaptation set in which the information of the spatial information file is described;
Figure PCTCN2016107111-appb-000018
在一种可能的实施方式中,所述第一表示的分段中携带所述第一表示的分段包含的图像所关联的空间对象的空间信息;In a possible implementation, the segment of the first representation carries spatial information of a spatial object associated with an image included in the segment of the first representation;
所述获取所述第一表示的分段之后,所述方法还包括: After the segmentation of the first representation is obtained, the method further includes:
解析所述第一表示的分段,获取所述第一表示的分段包含的图像所关联的空间对象的空间信息。Parsing the segment of the first representation, and acquiring spatial information of a spatial object associated with the image included in the segment of the first representation.
在一种可能的实施方式中,所述图像所关联的空间对象的空间信息为所述空间对象与其关联的内容成分的空间关系。In a possible implementation, the spatial information of the spatial object associated with the image is the spatial relationship between the spatial object and its associated content component.
在一种可能的实施方式中,所述空间信息携带在所述第一表示的分段中的指定box中,或者和第一表示的分段相关联的元数据表达中的指定box中。In a possible implementation manner, the spatial information is carried in a designated box in the segment of the first representation or in a specified box in a metadata representation associated with the segment of the first representation.
在一种可能的实施方式中,所述指定box为所述第一表示的分段中包含的trun box,所述trun box用于描述一个轨迹的一组连续样本。In a possible implementation manner, the specified box is a trun box included in a segment of the first representation, and the trun box is used to describe a set of consecutive samples of a track.
本发明实施例可在作者视角码流(具体可为作者视角码流中的分段)添加作者视角码流所包含的图像所关联的空间对象的空间信息,以供客户端根据作者视角码流的分段包含的图像所关联的空间对象的空间信息进行作者视角码流的分段切换,或者作者视角码流与静态视角码流的切换,提高了码流切换的适用性,增强客户端的用户体验。The embodiment of the present invention may add, in the author view code stream (specifically, in segments of the author view code stream), the spatial information of the spatial objects associated with the images included in the author view code stream, so that the client can perform segment switching within the author view code stream, or switching between the author view code stream and the static view code stream, according to the spatial information of the spatial objects associated with the images contained in the segments of the author view code stream, which improves the applicability of code stream switching and enhances the user experience of the client.
在本发明实施例中,服务器还可在作者视角码流中添加一个或者多个作者空间对象的空间信息。针对于DASH码流,服务器可在现有文件格式中的trun box中增加上述空间信息,用于描述作者视角码流的每一帧图像所关联的空间对象的空间信息。In the embodiment of the present invention, the server may further add spatial information of one or more author space objects in the author view code stream. For the DASH code stream, the server may add the above spatial information to the trun box in the existing file format for describing the spatial information of the spatial object associated with each frame image of the author view code stream.
添加描述的样例(样例一):An example of the added description (Example 1):
Figure PCTCN2016107111-appb-000019
Figure PCTCN2016107111-appb-000020
在该样例中,服务器可在现有的trun box中添加语法元素tr_flags,并将tr_flags的值设定为0x001000,用于标记预设空间对象在全局空间对象中的相对位置的空间信息包含在trun box中。In this example, the server may add the syntax element tr_flags to the existing trun box and set the value of tr_flags to 0x001000, to mark that the spatial information describing the relative position of the preset spatial object in the global spatial object is included in the trun box.
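As an illustrative sketch (not part of the original patent text), the flag test a parser would perform can be written as follows; the bit value 0x001000 is taken from the example above, and treating it as a single presence bit within tr_flags is an assumption:

```python
TR_FLAG_SPATIAL_INFO = 0x001000  # value from the example above (assumed presence bit)

def trun_has_spatial_info(tr_flags: int) -> bool:
    # True when the trun box flags mark that spatial information for the
    # samples is carried inside the box.
    return (tr_flags & TR_FLAG_SPATIAL_INFO) != 0
```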
上述trun box中采用偏航角方式描述,如用center_pitch、center_yaw、center_roll、pitch_h和yaw_w来描述空间信息在球面中的中心位置(center_pitch、center_yaw、center_roll)、高度(pitch_h)、宽度(yaw_w)的偏航角。如图11,图11是作者空间对象在全景空间中的相对位置的示意图。在图11中,O点为360度VR全景视频球面图像对应的球心,可认为是观看VR全景图像时人眼的位置。A点为作者视角图像中心点,C、F为作者视角图像中过A点的沿该图像横向坐标轴的边界点,E、D为作者视角图像中过A点的沿该图像纵向坐标轴的边界点,B为A点沿球面经线在赤道线的投影点,I为赤道线上水平方向的起始坐标点。各个元素的含义解释如下:The above trun box uses a yaw-angle description: center_pitch, center_yaw, center_roll, pitch_h and yaw_w describe, in yaw angles, the center position (center_pitch, center_yaw, center_roll), height (pitch_h) and width (yaw_w) of the spatial information on the sphere. FIG. 11 is a schematic diagram of the relative position of the author space object in the panoramic space. In FIG. 11, point O is the center of the sphere corresponding to the 360-degree VR panoramic video spherical image, and can be regarded as the position of the human eye when viewing the VR panoramic image. Point A is the center point of the author view image; C and F are the boundary points of the author view image passing through point A along the horizontal coordinate axis of the image; E and D are the boundary points of the author view image passing through point A along the vertical coordinate axis of the image; B is the projection point of point A along the spherical meridian onto the equator; and I is the starting coordinate point of the horizontal direction on the equator. The meaning of each element is explained as follows:
center_pitch:作者空间对象的图像的中心位置映射到全景球面(即全局空间)图像上的点的竖直方向的偏转角,如图11中的∠AOB;Center_pitch: the center position of the image of the author space object is mapped to the vertical direction of the point on the panoramic spherical (ie global space) image, such as ∠AOB in FIG. 11;
center_yaw:作者空间对象的图像的中心位置映射到全景球面图像上的点的水平方向的偏转角,如图11中的∠IOB;Center_yaw: the center position of the image of the author space object is mapped to the horizontal deflection angle of the point on the panoramic spherical image, as shown in FIG. 11 ∠ IOB;
center_roll:作者空间对象的图像的中心位置映射到全景球面图像上的点与球心连线方向的旋转角,如图11中的∠DOB;Center_roll: the center position of the image of the author space object is mapped to the rotation angle of the point on the panoramic spherical image and the direction of the connection of the spherical center, as shown in FIG. 11 ∠ DOB;
pitch_h:作者空间对象的图像在全景球面图像的视场高度,以视场纵向最大角度表示,如图11中∠DOE;yaw_w:作者空间对象的图像在全景球面图像的视场宽度,以视场横向最大角度表示,如图11中∠COF。pitch_h: the field-of-view height of the image of the author space object in the panoramic spherical image, expressed as the maximum vertical angle of the field of view, ∠DOE in FIG. 11; yaw_w: the field-of-view width of the image of the author space object in the panoramic spherical image, expressed as the maximum horizontal angle of the field of view, ∠COF in FIG. 11.
在本发明的一种实现方式中,trun box中可以不包括"unsigned int(16) center_roll;//fov的中心位置roll信息"。In an implementation manner of the present invention, the field "unsigned int(16) center_roll; // roll information of the center position of the fov" may be omitted from the trun box.
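As an illustrative sketch (not part of the original patent text), and assuming each of the five syntax elements is a 16-bit unsigned integer (the text shows unsigned int(16) only for center_roll; applying the same width to the other fields is an assumption), the spatial information payload could be packed and unpacked as follows:

```python
import struct

SPATIAL_FMT = ">5H"  # five big-endian 16-bit unsigned fields, an assumed layout

def pack_spatial_info(center_yaw, center_pitch, center_roll, yaw_w, pitch_h):
    # Angles in degrees, matching the five trun/strp box elements above.
    return struct.pack(SPATIAL_FMT, center_yaw, center_pitch, center_roll,
                       yaw_w, pitch_h)

def unpack_spatial_info(payload):
    return struct.unpack(SPATIAL_FMT, payload)

payload = pack_spatial_info(30, 10, 0, 150, 110)
```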
在本发明的一种实现方式中,flag标识和空间位置信息也可以不存在于同一个box中。例如,flag可以存在于空间位置信息所在box的上一级box中,比如在新增的box及其语法描述信息中。一个样例如下(样例二),其中strp box是描述空间信息的box,该box的上一级box是stbl(或者其他),那么在stbl中增加一个flag描述,如果flag描述包含strp box,那么在stbl中可以解析得到strp box。In an implementation manner of the present invention, the flag identifier and the spatial position information may also not be in the same box. For example, the flag may be in the parent box of the box in which the spatial position information is located, for example in a newly added box and its syntax description information. An example is as follows (Example 2): the strp box is a box describing the spatial information, and its parent box is stbl (or another box); a flag description is then added in stbl, and if the flag description indicates that the strp box is contained, the strp box can be parsed from stbl.
在一些可行的实施方式中,服务器端也可在视频格式中添加一个新的box及其语法描述,用于描述作者空间对象的空间信息。上述新增的box及其语法描述信息样例如下(样例二):In some feasible implementations, the server may also add a new box and its syntax description to the video format, for describing the spatial information of the author space object. The newly added box and its syntax description information are exemplified as follows (Example 2):
Figure PCTCN2016107111-appb-000021
Figure PCTCN2016107111-appb-000022
上述strp box包含的信息为新增的作者空间对象的空间信息,其中包含的各个语法元素的含义与上述样例一中包含的各个语法元素的含义相同。具体实现中,在该样例中Box里的“unsigned int(16)center_roll;//fov的中心位置roll”可以不存在,具体可根据实际应用场景需求确定,在此不做限制。The information contained in the strp box is the spatial information of the newly added author space object, and the meaning of each syntax element included is the same as the meaning of each syntax element included in the above example 1. In the specific implementation, the "unsigned int(16)center_roll;//fov center position roll" in the box may not exist in the example, and may be determined according to actual application scenario requirements, and is not limited herein.
上述的strp box可以包含在stbl box中,stbl box中的flag标记为存在空间位置信息strp box,具体的strp box的上一级box是什么可根据实际应用场景需求确定,在此不做限制。The above strp box may be contained in the stbl box, where a flag in the stbl box marks the presence of the spatial position information strp box; which box is the parent of the strp box may be determined according to the requirements of the actual application scenario and is not limited herein.
上述的strp box可以描述在DASH的分段的元数据(traf)中,也可以描述在基于ISOBMFF格式封装的一个轨迹(track)的元数据(metadata)中。The above strp box may be described in the metadata (traf) of a DASH segment, or in the metadata of a track encapsulated based on the ISOBMFF format.
上述的strp box也可以被包含在作者视角码流相关联的一个独立的元数据码流或者轨迹中,在元数据码流或者轨迹中包含的样本(sample)是作者视角码流相关的空间信息;The above strp box may also be included in a separate metadata stream or track associated with the author's view stream, and the sample contained in the metadata stream or track is spatial information related to the author's view stream. ;
本发明中的空间信息可以是视角的中心位置偏航角和宽高;或者也可以表示为视角左上的偏航角和右下的偏航角。The spatial information in the present invention may be the yaw angle of the center position of the view together with the width and height of the view; or it may be expressed as the yaw angle of the upper-left corner and the yaw angle of the lower-right corner of the view.
在本发明的一个实施例中,在码流的参数集中扩展一个flag,该flag值为1,表示在每一帧的码流数据中包含有当前帧的空间位置信息; In an embodiment of the present invention, a flag is extended in a parameter set of the code stream, and the flag value is 1, indicating that the spatial position information of the current frame is included in the code stream data of each frame;
上述的flag也可以描述在视频参数集(video_parameter_set,VPS),序列参数集(sequence_parameter_set,SPS)或者图像参数集(picture_parameter_set,PPS)中,具体语法如下:如果roi_extension_flag=1,表示在每一帧的码流数据中包含有当前帧的空间位置信息。在本发明的一种实现方式中,描述空间位置信息的属性的语义与trun box,strp box中的语法语义一致。The above flag may also be described in a video parameter set (video_parameter_set, VPS), a sequence parameter set (sequence_parameter_set, SPS) or a picture parameter set (picture_parameter_set, PPS). The specific syntax is as follows: if roi_extension_flag=1, the code stream data of each frame contains the spatial position information of the current frame. In an implementation manner of the present invention, the semantics of the attributes describing the spatial position information are consistent with the syntax semantics in the trun box and the strp box.
Figure PCTCN2016107111-appb-000023
描述方式二:在pps中描述空间位置信息Description Method 2: Describe the spatial location information in pps
Figure PCTCN2016107111-appb-000024
描述方式三:在pps中描述空间位置信息Description Method 3: Describe the spatial location information in pps
Figure PCTCN2016107111-appb-000025
Figure PCTCN2016107111-appb-000026
描述方式四:在pps中描述空间位置信息Description Method 4: Describe the spatial location information in pps
Figure PCTCN2016107111-appb-000027
描述方式五:在pps中描述空间位置信息。 Description Method 5: Describe the spatial location information in pps.
Figure PCTCN2016107111-appb-000028
在本发明的另一种实现方式中,将空间位置信息封装在SEI(Supplemental Enhancement Information,辅助增强信息)中。In another implementation manner of the present invention, the spatial position information is encapsulated in an SEI (Supplemental Enhancement Information) message.
Figure PCTCN2016107111-appb-000029
上述语法中的ROI表示一个具体取值,比如190,这里不作限定。在本发明的一种实现方式中,ROI中的语义与trun box、strp box中的语法语义一致。The ROI in the above syntax represents a specific value, for example 190, which is not limited herein. In an implementation manner of the present invention, the semantics in the ROI are consistent with the syntax semantics in the trun box and the strp box.
ROI_payload(payloadSize)的描述方法一:ROI_payload(payloadSize) description method one:
Figure PCTCN2016107111-appb-000030
Figure PCTCN2016107111-appb-000031
ROI_payload(payloadSize)的描述方法二:ROI_payload(payloadSize) description method two:
Figure PCTCN2016107111-appb-000032
ROI_payload(payloadSize)的描述方法三:ROI_payload(payloadSize) description method three:
Figure PCTCN2016107111-appb-000033
ROI_payload(payloadSize)的描述方法四:ROI_payload(payloadSize) description method four:
Figure PCTCN2016107111-appb-000034
在上述的实施例中,空间信息采用中心点位置偏航角(center_pitch和center_yaw)或者中心位置偏航角偏移(center_pitch_offset和center_yaw_offset)和空间位置宽度角(pitch_w)和高度角(pitch_h)来描述,在实际应用中也可以使用空间对象的左上位置的偏航角和右下位置的偏航角来描述。In the above embodiments, the spatial information is described by the yaw angles of the center point position (center_pitch and center_yaw) or the yaw-angle offsets of the center position (center_pitch_offset and center_yaw_offset) together with the spatial width angle (pitch_w) and height angle (pitch_h); in practical applications, the yaw angle of the upper-left position and the yaw angle of the lower-right position of the spatial object may also be used for the description.
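As an illustrative sketch (not part of the original patent text), the two descriptions are interconvertible; the conversion below assumes the center/extent form is in degrees and the extents are symmetric about the center:

```python
def center_to_corners(center_yaw, center_pitch, width, height):
    # Convert the center-position + width/height-angle description to the
    # alternative upper-left / lower-right yaw-angle description.
    half_w, half_h = width / 2.0, height / 2.0
    upper_left = (center_yaw - half_w, center_pitch + half_h)
    lower_right = (center_yaw + half_w, center_pitch - half_h)
    return upper_left, lower_right

corners = center_to_corners(0, 0, 150, 110)  # -> ((-75.0, 55.0), (75.0, -55.0))
```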
第二方面提供了一种基于HTTP动态自适应流媒体的视频数据的处理装置,其可包括:The second aspect provides a processing device for video data based on HTTP dynamic adaptive streaming, which may include:
接收模块,用于接收媒体呈现描述,所述媒体呈现描述包括至少两个的表示的信息,所述至少两个表示中的第一表示是作者视角码流,所述作者视角码流中包含多个图像,所述多个图像中至少两个图像所关联的空间对象的空间信息不同;所述至少两个表示中的第二表示是静态视角码流,所述静态视角码流中包含多个图像,所述多个图像所关联的空间对象的空间信息相同;a receiving module, configured to receive a media presentation description, where the media presentation description includes at least two representations, the first representation of the at least two representations is an author view code stream, and the author view code stream includes multiple The spatial information of the spatial object associated with at least two of the plurality of images is different; the second representation of the at least two representations is a static view code stream, and the static view code stream includes multiple An image, wherein spatial information of the spatial objects associated with the plurality of images is the same;
获取模块,用于得到指令信息;Obtaining a module for obtaining instruction information;
所述获取模块,还用于在得到的所述指令信息是观看作者视角码流时,获取所述第一表示的分段,否则,获取所述第二表示的分段。The acquiring module is further configured to acquire the segment of the first representation when the obtained instruction information indicates viewing the author view code stream, and otherwise acquire the segment of the second representation.
在一种可能的实现方式中,所述媒体呈现描述还包含有标识信息,所述标识信息用于标识视频的作者视角码流。In a possible implementation manner, the media presentation description further includes identifier information, where the identifier information is used to identify an author view code stream of the video.
在一种可能的实现方式中,所述媒体呈现描述中包含自适应集的信息,所述自适应集为用于描述同一媒体内容成分的多个可互相替换的编码版本的媒体数据分段的属性的数据集合;In a possible implementation, the media presentation description includes information of an adaptation set, where the adaptation set is a data set describing the attributes of media data segments of multiple mutually interchangeable encoded versions of the same media content component;
其中,所述自适应集的信息中包含所述标识信息。The information of the adaptive set includes the identifier information.
在一种可能的实现方式中,所述媒体呈现描述中包含表示的信息,所述表示为传输格式中的一个或者多个码流的集合和封装;In a possible implementation manner, the media presentation description includes information indicating, the representation being a set and encapsulation of one or more code streams in a transmission format;
其中,所述表示的信息中包含所述标识信息。The information that is represented includes the identifier information.
在一种可能的实现方式中,所述媒体呈现描述中包含描述子的信息,所述描述子用于描述关联到的空间对象的空间信息;In a possible implementation manner, the media presentation description includes information about a descriptor, and the descriptor is used to describe spatial information of a spatial object to which the association is associated;
其中,所述描述子的信息中包含所述标识信息。The information of the descriptor includes the identifier information.
在一种可能的实现方式中,所述第一表示的分段中携带所述第一表示的分段包含的图像所关联的空间对象的空间信息;In a possible implementation, the segment of the first representation carries spatial information of a spatial object associated with an image included in the segment of the first representation;
所述获取模块还用于:The obtaining module is further configured to:
解析所述第一表示的分段,获取所述第一表示的分段包含的图像所关联的空间对象的空间信息。Parsing the segment of the first representation, and acquiring spatial information of a spatial object associated with the image included in the segment of the first representation.
在一种可能的实现方式中,所述图像所关联的空间对象的空间信息为所述空间对象与其关联的内容成分的空间关系。In a possible implementation manner, the spatial information of the spatial object associated with the image is a spatial relationship of the spatial component and its associated content component.
在一种可能的实现方式中,所述空间信息携带在所述第一表示的分段中的指定box中,或者和第一表示的分段相关联的元数据表达中的指定box中。In a possible implementation, the spatial information is carried in a specified box in the segment of the first representation, or in a specified box in a metadata representation associated with the segment of the first representation.
在一种可能的实现方式中,所述指定box为所述第一表示的分段中包含的trun box,所述trun box用于描述一个轨迹的一组连续样本。In a possible implementation manner, the specified box is a trun box included in a segment of the first representation, and the trun box is used to describe a set of consecutive samples of a track.
可以理解的是,本发明装置实施例的具体实现方式,可以采用上述方法实施例对应的实现方式,在此不再赘述。It can be understood that the specific implementation of the apparatus embodiment of the present invention may adopt the implementation corresponding to the foregoing method embodiment, and details are not described herein again.
本发明实施例可在媒体呈现描述中描述作者视角码流和静态视角码流,其中,作者视角码流中包含的图像所关联的空间对象的空间信息可动态变化,静态视角码流中包含的图像所关联的空间对象的空间信息不变。本发明实施例可根据得到的指令信息从作者视角码流和镜头视角码流中选择相应的码流的分段,提高码流分段选择的灵活性,增强视频观看的用户体验。本发明实施例从作者视角码流和静态视角码流中获取相应的码流的分段,无需获取所有分段,可节省视频数据的传输带宽资源,增强了数据处理的适用性。本发明实施例可在作者视角码流(具体可为作者视角码流中的分段)添加作者视角码流所包含的图像所关联的空间对象的空间信息,以供客户端根据作者视角码流的分段包含的图像所关联的空间对象的空间信息进行作者视角码流的分段切换,或者作者视角码流与静态视角码流的切换,提高了码流切换的适用性,增强客户端的用户体验。The embodiment of the present invention may describe the author view code stream and the static view code stream in the media presentation description, where the spatial information of the spatial objects associated with the images included in the author view code stream may change dynamically, while the spatial information of the spatial objects associated with the images included in the static view code stream does not change. According to the obtained instruction information, the embodiment of the present invention may select segments of the corresponding code stream from the author view code stream and the lens view code stream, which improves the flexibility of code stream segment selection and enhances the user experience of video viewing. The embodiment of the present invention acquires segments of the corresponding code stream from the author view code stream and the static view code stream without having to acquire all segments, which saves transmission bandwidth resources of the video data and enhances the applicability of data processing. The embodiment of the present invention may add, in the author view code stream (specifically, in segments of the author view code stream), the spatial information of the spatial objects associated with the images included in the author view code stream, so that the client can perform segment switching within the author view code stream, or switching between the author view code stream and the static view code stream, according to that spatial information, which improves the applicability of code stream switching and enhances the user experience of the client.
附图说明DRAWINGS
为了更清楚地说明本发明实施例中的技术方案，下面将对实施例描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他的附图。In order to describe the technical solutions in the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. Apparently, the drawings in the following description show only some embodiments of the present invention, and those of ordinary skill in the art may derive other drawings from these drawings without creative effort.
图1是系统层视频流媒体传输采用的DASH标准传输的框架实例示意图；FIG. 1 is a schematic diagram of an example framework of DASH standard transmission used in system-layer video streaming;
图2是系统层视频流媒体传输采用的DASH标准传输的MPD的结构示意图；FIG. 2 is a schematic structural diagram of an MPD of DASH standard transmission used in system-layer video streaming;
图3是本发明实施例提供的码流分段的切换的一示意图；FIG. 3 is a schematic diagram of switching of code stream segments according to an embodiment of the present invention;
图4是码流数据中的分段存储方式的一示意图；FIG. 4 is a schematic diagram of a segment storage mode of code stream data;
图5是码流数据中的分段存储方式的另一示意图；FIG. 5 is another schematic diagram of a segment storage mode of code stream data;
图6是空间对象的空间关系的一示意图；FIG. 6 is a schematic diagram of the spatial relationship of spatial objects;
图7是视角变化对应的视角示意图；FIG. 7 is a schematic diagram of viewing angles corresponding to a viewing angle change;
图8是空间对象的空间关系的另一示意图；FIG. 8 is another schematic diagram of the spatial relationship of spatial objects;
图9是本发明实施例提供的基于HTTP动态自适应流媒体的视频数据的处理方法的流程示意图；FIG. 9 is a schematic flowchart of a method for processing video data based on HTTP dynamic adaptive streaming according to an embodiment of the present invention;
图10是空间对象的空间关系的另一示意图；FIG. 10 is another schematic diagram of the spatial relationship of spatial objects;
图11是作者空间对象在全景空间中的相对位置的示意图；FIG. 11 is a schematic diagram of the relative position of the author spatial object in the panoramic space;
图12是本发明实施例提供的基于HTTP动态自适应流媒体的视频数据的处理装置的结构示意图。FIG. 12 is a schematic structural diagram of an apparatus for processing video data based on HTTP dynamic adaptive streaming according to an embodiment of the present invention.
具体实施方式DETAILED DESCRIPTION
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。The technical solutions in the embodiments of the present invention are clearly and completely described in the following with reference to the accompanying drawings in the embodiments of the present invention. It is obvious that the described embodiments are only a part of the embodiments of the present invention, but not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative efforts are within the scope of the present invention.
当前以客户端为主导的系统层视频流媒体传输方案可采用DASH标准框架，如图1，图1是系统层视频流媒体传输采用的DASH标准传输的框架实例示意图。系统层视频流媒体传输方案的数据传输过程包括两个过程：服务器端（如HTTP服务器，媒体内容准备服务器，以下简称服务器）为视频内容生成媒体数据，响应客户端请求的过程，和客户端（如HTTP流媒体客户端）向服务器请求并获取媒体数据的过程。其中，上述媒体数据包括媒体呈现描述（英文：Media Presentation Description，MPD）和媒体码流。服务器上的MPD中包括多个表示（也称呈现，英文：representation），每个表示描述多个分段。客户端的HTTP流媒体请求控制模块获取服务器发送的MPD，并对MPD进行分析，确定MPD中描述的视频码流的各个分段的信息，进而可确定要请求的分段，向服务器发送相应的分段的HTTP请求，并通过媒体播放器进行解码播放。The current client-driven system-layer video streaming transmission scheme may adopt the DASH standard framework, as shown in FIG. 1, which is a schematic diagram of an example framework of DASH standard transmission used in system-layer video streaming. The data transmission process of the system-layer video streaming scheme includes two processes: the server side (such as an HTTP server or a media content preparation server, hereinafter referred to as the server) generates media data for the video content and responds to client requests; and the client (such as an HTTP streaming client) requests and obtains media data from the server. The media data includes a media presentation description (English: Media Presentation Description, MPD) and media code streams. The MPD on the server includes multiple representations (also called presentations; English: representation), and each representation describes multiple segments. The HTTP streaming request control module of the client obtains the MPD sent by the server, analyzes the MPD, determines the information of each segment of the video code stream described in the MPD, thereby determines the segment to be requested, sends the corresponding segment HTTP request to the server, and decodes and plays the segment through the media player.
1)在上述服务器为视频内容生成媒体数据的过程中，服务器为视频内容生成的媒体数据包括对应同一视频内容的不同版本的视频码流，以及码流的MPD。例如，服务器为同一集电视剧的视频内容生成低分辨率低码率低帧率（如360p分辨率、300kbps码率、15fps帧率）的码流，中分辨率中码率高帧率（如720p分辨率、1200kbps码率、25fps帧率）的码流，高分辨率高码率高帧率（如1080p分辨率、3000kbps码率、25fps帧率）的码流等。1) In the process in which the server generates media data for video content, the media data generated by the server for the video content includes video code streams of different versions corresponding to the same video content, and the MPD of the code streams. For example, for the video content of the same episode of a TV series, the server generates a code stream of low resolution, low bit rate and low frame rate (such as 360p resolution, 300 kbps bit rate, 15 fps frame rate), a code stream of medium resolution, medium bit rate and high frame rate (such as 720p resolution, 1200 kbps bit rate, 25 fps frame rate), a code stream of high resolution, high bit rate and high frame rate (such as 1080p resolution, 3000 kbps bit rate, 25 fps frame rate), and so on.
此外，服务器还可为该集电视剧的视频内容生成MPD。其中，如图2，图2是系统传输方案DASH标准的MPD的结构示意图。上述码流的MPD包含多个时期（Period），例如，图2的MPD中的period start=100s部分可包含多个自适应集（英文：adaptation set），每个adaptation set可包含Representation1、Representation2，…等多个表示。每个表示描述码流的一个或者多个分段。In addition, the server may also generate an MPD for the video content of the episode. As shown in FIG. 2, FIG. 2 is a schematic structural diagram of an MPD of the DASH-standard system transmission scheme. The MPD of the above code stream includes multiple periods (Period); for example, the period start=100s part in the MPD of FIG. 2 may include multiple adaptation sets (English: adaptation set), and each adaptation set may include multiple representations such as Representation1, Representation2, and so on. Each representation describes one or more segments of the code stream.
在本发明的一个实施例中，每个表示按照时序描述若干个分段（英文：Segment）的信息，例如初始化分段（英文：Initialization segment）、媒体分段（Media Segment）1、Media Segment2，…，Media Segment20等。表示中可以包括播放起始时刻、播放持续时长、网络存储地址（例如以统一资源定位符（英文：Universal Resource Locator，URL）的形式表示的网络存储地址）等分段信息。In an embodiment of the present invention, each representation describes, in time order, the information of several segments (English: Segment), for example an initialization segment (English: Initialization segment) and media segments (Media Segment) 1, Media Segment 2, ..., Media Segment 20, and so on. The representation may include segment information such as the playback start time, the playback duration, and the network storage address (for example, a network storage address expressed in the form of a Uniform Resource Locator (English: Universal Resource Locator, URL)).
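作为说明性示意（非本发明文本的一部分），下面给出一个解析上述MPD结构（Period/AdaptationSet/Representation/SegmentURL）的最小示例，其中的MPD片段、id与URL均为假设值。As an illustrative sketch (not part of the original text), the following minimal example parses the MPD structure described above (Period/AdaptationSet/Representation/SegmentURL); the MPD fragment, ids and URLs are invented values, not taken from any real stream.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal MPD; element names follow the DASH schema, but the
# id, timing and URL values are invented for illustration only.
MPD_XML = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period start="PT100S">
    <AdaptationSet>
      <Representation id="rep1" bandwidth="4000000">
        <SegmentList duration="2">
          <Initialization sourceURL="rep1-init.mp4"/>
          <SegmentURL media="rep1-seg1.mp4"/>
          <SegmentURL media="rep1-seg2.mp4"/>
        </SegmentList>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>"""

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

def list_segment_urls(mpd_text):
    """Map each Representation id to the media segment URLs it describes."""
    root = ET.fromstring(mpd_text)
    return {
        rep.get("id"): [s.get("media")
                        for s in rep.iterfind(".//mpd:SegmentURL", NS)]
        for rep in root.iterfind(".//mpd:Representation", NS)
    }
```

客户端据此即可确定要请求的分段的网络存储地址。A client could use such a mapping to determine the network storage address of the next segment to request.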
2)在客户端向服务器请求并获取媒体数据的过程中,用户选择播放视频时,客户端根据用户点播的视频内容向服务器获取相应的MPD。客户端根据MPD中描述的码流分段的网络存储地址,向服务器发送下载网络存储地址对应的码流分段的请求,服务器根据接收到的请求向客户端发送码流分段。客户端获取得到服务器发送的码流分段之后,则可通过媒体播放器进行解码、播放等操作。2) In the process of the client requesting and obtaining the media data from the server, when the user selects to play the video, the client obtains the corresponding MPD according to the video content requested by the user to the server. The client sends a request for downloading the code stream segment corresponding to the network storage address to the server according to the network storage address of the code stream segment described in the MPD, and the server sends the code stream segment to the client according to the received request. After the client obtains the stream segment sent by the server, it can perform decoding, playback, and the like through the media player.
系统层视频流媒体传输方案采用DASH标准,通过客户端分析MPD、按需向服务器请求视频数据并接收服务器发送的数据的方式实现视频数据的传输。The system layer video streaming media transmission scheme adopts the DASH standard, and realizes the transmission of video data by analyzing the MPD by the client, requesting the video data to the server as needed, and receiving the data sent by the server.
参见图3，是本发明实施例提供的码流分段的切换的一示意图。服务器可为同一个视频内容（比如一部电影）准备三个不同版本的码流数据，并在MPD中使用三个Representation对上述三个不同版本的码流数据进行描述。其中，上述三个Representation（以下简称rep）可假设为rep1、rep2和rep3等。其中，rep1是码率为4mbps（每秒兆比特）的高清视频，rep2是码率为2mbps的标清视频，rep3是码率为1mbps的普通视频。每个rep的segment包含一个时间段内的视频码流，同一个时间段内，不同的rep包含的segment相互对齐。即，每个rep按照时序描述每个时间段的segment，并且相同时段的segment长度相同，进而可实现不同rep上的segment的内容切换。如图，图中标记为阴影的分段是客户端请求播放的分段数据，其中，客户端请求的前3个分段是rep3的分段，客户端请求第4个分段时可请求rep2中的第4个分段，进而可在rep3的第3个segment播放结束之后切换到rep2的第4个分段上播放。rep3的第3个segment的播放终止点（对应到时间上可为播放结束时刻）即为第4个segment的播放起始点（对应到时间上可为播放起始时刻），同时也是rep2或者rep1的第4个segment的播放起始点，实现不同rep上的segment的对齐。客户端请求rep2的第4个分段之后切换到rep1，请求rep1的第5个分段和第6个分段等。随后可切换至rep3上，请求rep3的第7个分段，再切换到rep1上，请求rep1的第8个分段。每个rep的segment可以首尾相接的存在一个文件中，也可以独立存储为一个个的小文件。segment可以按照标准ISO/IEC 14496-12中的格式封装（ISO BMFF（Base Media File Format）），也可以是按照ISO/IEC 13818-1中的格式封装（MPEG-2TS）。具体可根据实际应用场景需求确定，在此不做限制。Referring to FIG. 3, which is a schematic diagram of switching of code stream segments according to an embodiment of the present invention, the server may prepare three different versions of code stream data for the same video content (for example, a movie), and describe these three versions with three Representations in the MPD. The three Representations (hereinafter referred to as rep) may be assumed to be rep1, rep2 and rep3. Here, rep1 is a high-definition video with a bit rate of 4 mbps (megabits per second), rep2 is a standard-definition video with a bit rate of 2 mbps, and rep3 is an ordinary video with a bit rate of 1 mbps. The segment of each rep contains the video code stream of one time period, and within the same time period the segments contained in different reps are aligned with each other. That is, each rep describes the segments of each time period in time order, and segments of the same time period have the same length, so that content switching between segments on different reps can be achieved. As shown in the figure, the shaded segments are the segment data that the client requests to play: the first three segments requested by the client are segments of rep3; when requesting the fourth segment, the client may request the fourth segment of rep2, and thus switch to playing the fourth segment of rep2 after the third segment of rep3 finishes playing. The playback end point of the third segment of rep3 (corresponding in time to the playback end moment) is the playback start point of the fourth segment (corresponding in time to the playback start moment), and is also the playback start point of the fourth segment of rep2 or rep1, thereby achieving alignment of segments on different reps. After requesting the fourth segment of rep2, the client switches to rep1 and requests the fifth and sixth segments of rep1, and so on. It may then switch to rep3 to request the seventh segment of rep3, and switch back to rep1 to request the eighth segment of rep1. The segments of each rep may be stored end-to-end in one file, or stored independently as individual small files. The segments may be encapsulated in the format of the standard ISO/IEC 14496-12 (ISO BMFF (Base Media File Format)) or in the format of ISO/IEC 13818-1 (MPEG-2 TS), which may be determined according to actual application scenario requirements and is not limited herein.
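作为说明性示意（非本发明文本的一部分），下面按上述rep1/rep2/rep3的码率给出分段对齐的表示切换的一个简化选择逻辑：由于各rep的分段在时间上对齐，客户端可按每个分段独立选择码率最合适的rep。As an illustrative sketch (not part of the original text), the following simplified selection logic uses the rep1/rep2/rep3 bit rates above: because segments are time-aligned across reps, the client may pick, per segment, the representation whose bit rate best fits the measured bandwidth.

```python
# Bit rates (bps) of the three representations from the example above.
REPS = {"rep1": 4_000_000, "rep2": 2_000_000, "rep3": 1_000_000}

def pick_rep(measured_bps):
    """Highest-bit-rate representation not exceeding the measured bandwidth."""
    fitting = [(bps, name) for name, bps in REPS.items() if bps <= measured_bps]
    if not fitting:                       # fall back to the cheapest stream
        return min(REPS, key=REPS.get)
    return max(fitting)[1]

def plan_requests(bandwidth_samples):
    """One (segment_index, rep) request per per-segment bandwidth sample."""
    return [(i + 1, pick_rep(b)) for i, b in enumerate(bandwidth_samples)]
```

由于相同时段的segment相互对齐，这样的按段切换不会造成内容缺失或重叠。Because segments of the same time period are aligned, such per-segment switching causes no gap or overlap in content.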
在DASH媒体文件格式中提到，上述segment有两种存储方式：一种是每个segment分开独立存储，如图4，图4是码流数据中的分段存储方式的一示意图；另一种是同一个rep上的所有segment均存储在一个文件中，如图5，图5是码流数据中的分段存储方式的另一示意图。如图4，repA的segment中每个segment单独存储为一个文件，repB的segment中每个segment也单独存储为一个文件。对应的，图4所示的存储方式，服务器可在码流的MPD中采用模板的形式或者列表的形式描述每个segment的URL等信息。如图5，rep1的segment中所有segment存储为一个文件，rep2的segment中所有segment存储为一个文件。对应的，图5所示的存储方法，服务器可在码流的MPD中采用一个索引分段（英文：index segment，也就是图5中的sidx）来描述每个segment的相关信息。索引分段描述了每个segment在其所存储的文件中的字节偏移，每个segment大小以及每个segment持续时间（duration，也称每个segment的时长）等信息。As mentioned in the DASH media file format, the above segments have two storage modes: one is that each segment is stored separately and independently, as shown in FIG. 4, which is a schematic diagram of a segment storage mode of code stream data; the other is that all segments of the same rep are stored in one file, as shown in FIG. 5, which is another schematic diagram of a segment storage mode of code stream data. As shown in FIG. 4, each segment of repA is stored separately as one file, and each segment of repB is also stored separately as one file. Correspondingly, for the storage mode shown in FIG. 4, the server may describe information such as the URL of each segment in the MPD of the code stream in the form of a template or a list. As shown in FIG. 5, all segments of rep1 are stored as one file, and all segments of rep2 are stored as one file. Correspondingly, for the storage method shown in FIG. 5, the server may use an index segment (English: index segment, that is, sidx in FIG. 5) in the MPD of the code stream to describe the related information of each segment. The index segment describes information such as the byte offset of each segment in the file in which it is stored, the size of each segment, and the duration of each segment.
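作为说明性示意（非本发明文本的一部分），下面示意索引分段所描述的字节偏移如何定位单文件中的各个segment：给定各segment的大小，第i个segment的字节区间即为其前面所有segment大小的累加；示例中的偏移与大小均为假设值。As an illustrative sketch (not part of the original text), the following shows how the byte offsets described by an index segment locate each segment inside a single file: given the size of each segment, the byte range of segment i is the running sum of the sizes before it; the offsets and sizes used are invented values.

```python
def byte_ranges(first_offset, segment_sizes):
    """Map 1-based segment index -> inclusive (start_byte, end_byte),
    assuming segments are stored back-to-back starting at first_offset."""
    ranges, offset = {}, first_offset
    for i, size in enumerate(segment_sizes, start=1):
        ranges[i] = (offset, offset + size - 1)
        offset += size
    return ranges
```

客户端可据此用HTTP Range请求单独获取某个segment。With such a table, a client could fetch an individual segment via an HTTP Range request.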
当前随着360度视频等VR视频的观看应用的日益普及，越来越多的用户加入到大视角的VR视频观看的体验队伍中。这种新的视频观看应用给用户带来了新的视频观看模式和视觉体验的同时，也带来了新的技术挑战。由于360度（本发明实施例将以360度为例进行说明）等大视角的视频观看过程中，VR视频的空间区域为360度的全景空间（或称全方位空间），超过了人眼正常的视觉范围，因此，用户在观看视频的过程中随时都会变换观看的角度（即视角，FOV）。用户观看的视角不同，看到的视频图像也将不同，故此视频呈现的内容需要随着用户的视角变化而变化。如图7，图7是视角变化对应的视角示意图。框1和框2分别为用户的两个不同的视角。用户在观看视频的过程中，可通过眼部或者头部转动，或者视频观看设备的画面切换等操作，将视频观看的视角由框1切换到框2。其中，用户的视角为框1时观看的视频图像为该视角对应的一个或者多个空间对象在该时刻所呈现的视频图像。下一个时刻用户的视角切换为框2，此时用户观看到的视频图像也应该切换为框2对应的空间对象在该时刻所呈现视频图像。At present, with the growing popularity of viewing applications for VR video such as 360-degree video, more and more users are joining the experience of viewing large-view VR video. This new video viewing application brings users new video viewing modes and visual experiences, and also brings new technical challenges. During the viewing of video with a large viewing angle such as 360 degrees (the embodiments of the present invention take 360 degrees as an example), the spatial area of the VR video is a 360-degree panoramic space (or omnidirectional space), which exceeds the normal visual range of the human eye; therefore, the user may change the viewing angle (that is, the field of view, FOV) at any time while watching the video. The video images seen differ with the user's viewing angle, so the content presented by the video needs to change as the user's viewing angle changes. As shown in FIG. 7, which is a schematic diagram of viewing angles corresponding to a viewing angle change, box 1 and box 2 are two different viewing angles of the user. While watching the video, the user may switch the viewing angle from box 1 to box 2 by turning the eyes or head, or by switching the picture of the video viewing device. The video image viewed when the user's viewing angle is box 1 is the video image presented at that moment by the one or more spatial objects corresponding to that viewing angle. At the next moment, the user's viewing angle switches to box 2, and the video image viewed by the user should then switch to the video image presented at that moment by the spatial object corresponding to box 2.
在一些可行的实施方式中，对于360度大视角的视频图像的输出，服务器可将360度的视角范围内的全景空间进行划分以得到多个空间对象，每个空间对象对应用户的一个子视角，多个子视角的拼接形成一个完整的人眼观察视角。即人眼视角（下面简称视角）可对应一个或者多个空间对象，视角对应的空间对象是人眼视角范围内的内容对象所对应的所有的空间对象。其中，人眼观察视角可以动态变化的，通常可为120度*120度，120度*120度的人眼视角范围内的内容对象对应的空间对象可包括一个或者多个，例如上述图7所述的框1对应的视角1，框2对应的视角2。进一步的，客户端可通过MPD获取服务器为每个空间对象准备的视频码流的空间信息，进而可根据视角的需求向服务器请求某一时间段某个或者多个空间对象对应的视频码流分段并按照视角需求输出对应的空间对象。客户端在同一个时间段内输出360度的视角范围内的所有空间对象对应的视频码流分段，则可在整个360度的全景空间内输出显示该时间段内的完整视频图像。In some feasible implementations, for the output of a 360-degree large-view video image, the server may divide the panoramic space within the 360-degree viewing range to obtain multiple spatial objects, each spatial object corresponding to one sub-view of the user, and the splicing of multiple sub-views forms a complete human-eye viewing angle. That is, the human-eye viewing angle (hereinafter referred to as the viewing angle) may correspond to one or more spatial objects, and the spatial objects corresponding to a viewing angle are all the spatial objects corresponding to the content objects within the range of that viewing angle. The human-eye viewing angle may change dynamically, and may typically be 120 degrees * 120 degrees; the spatial objects corresponding to the content objects within a 120 degrees * 120 degrees viewing range may include one or more, for example viewing angle 1 corresponding to box 1 and viewing angle 2 corresponding to box 2 in FIG. 7 above. Further, the client may obtain, through the MPD, the spatial information of the video code stream prepared by the server for each spatial object, and may then, according to the viewing-angle requirement, request from the server the video code stream segments corresponding to one or more spatial objects in a certain time period and output the corresponding spatial objects according to the viewing-angle requirement. If the client outputs, in the same time period, the video code stream segments corresponding to all spatial objects within the 360-degree viewing range, it can output and display the complete video image of that time period in the entire 360-degree panoramic space.
具体实现中，在360度的空间对象的划分中，服务器可首先将球面映射为平面，在平面上对空间对象进行划分。具体的，服务器可采用经纬度的映射方式将球面映射为经纬平面图。如图8，图8是本发明实施例提供的空间对象的示意图。服务器可将球面映射为经纬平面图，并将经纬平面图划分为A~I等多个空间对象。进一步的，服务器也可将球面映射为立方体，再将立方体的多个面进行展开得到平面图，或者将球面映射为其他多面体，再将多面体的多个面进行展开得到平面图等。服务器还可采用更多的映射方式将球面映射为平面，具体可根据实际应用场景需求确定，在此不做限制。下面将以经纬度的映射方式，结合图8进行说明。如图8，服务器将球面的全景空间划分为A~I等多个空间对象之后，则可为每个空间对象准备一组DASH码流。其中，每个空间对象对应一组DASH码流。客户端用户切换视频观看的视角时，客户端则可根据用户选择的新视角获取新空间对象对应的码流，进而可将新空间对象码流的视频内容呈现在新视角内。下面将结合图9至图12对本发明实施例提供的视频数据的处理方法及装置进行描述。In a specific implementation, when dividing the 360-degree spatial objects, the server may first map the sphere to a plane and divide the spatial objects on the plane. Specifically, the server may map the sphere to a latitude-longitude plan using a latitude-longitude mapping. As shown in FIG. 8, which is a schematic diagram of spatial objects provided by an embodiment of the present invention, the server may map the sphere to a latitude-longitude plan and divide the plan into multiple spatial objects A to I. Further, the server may also map the sphere to a cube and then unfold the faces of the cube to obtain a plan, or map the sphere to another polyhedron and unfold the faces of the polyhedron to obtain a plan, and so on. The server may also map the sphere to a plane in other mapping ways, which may be determined according to actual application scenario requirements and is not limited herein. The following description uses the latitude-longitude mapping with reference to FIG. 8. As shown in FIG. 8, after the server divides the panoramic space of the sphere into multiple spatial objects A to I, it may prepare one set of DASH code streams for each spatial object, where each spatial object corresponds to one set of DASH code streams. When the client user switches the viewing angle of video viewing, the client may obtain the code stream corresponding to the new spatial object according to the new viewing angle selected by the user, and may then present the video content of the new spatial object's code stream within the new viewing angle. The video data processing method and apparatus provided by the embodiments of the present invention are described below with reference to FIG. 9 to FIG. 12.
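作为说明性示意（非本发明文本的一部分），下面示意客户端如何在图8的经纬平面图上确定视角所覆盖的空间对象：此处假设平面被均匀划分为3×3网格并按行标记为A～I（专利文本未规定网格顺序与标号方式），视口坐标以度为单位。As an illustrative sketch (not part of the original text), the following shows how a client could determine which spatial objects of FIG. 8 a viewing angle covers; it assumes the latitude-longitude plan is evenly cut into a 3x3 grid labelled A to I row by row (the text does not fix the grid order or labelling), with viewport coordinates in degrees.

```python
TILE_IDS = ["A", "B", "C", "D", "E", "F", "G", "H", "I"]
GRID = 3                              # assumed 3x3 split of the 360x180 plan
TILE_W, TILE_H = 360 / GRID, 180 / GRID

def tiles_for_viewport(x, y, w, h):
    """Spatial objects overlapped by a viewport with top-left corner (x, y)
    and size (w, h), in degrees on the unwrapped latitude-longitude plan."""
    col0, col1 = int(x // TILE_W), int((x + w - 1e-9) // TILE_W)
    row0, row1 = int(y // TILE_H), int((y + h - 1e-9) // TILE_H)
    return [TILE_IDS[r * GRID + c]
            for r in range(row0, row1 + 1)
            for c in range(col0, col1 + 1)]
```

一个视角通常会同时覆盖多个空间对象，客户端需请求这些对象各自对应的码流分段。A viewing angle typically overlaps several spatial objects at once, so the client requests the code stream segments corresponding to each of them.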
参见图9,是本发明实施例提供的视频数据的处理方法的流程示意图。本发明实施例提供的方法包括步骤:FIG. 9 is a schematic flowchart diagram of a method for processing video data according to an embodiment of the present invention. The method provided by the embodiment of the present invention includes the following steps:
S901,接收媒体呈现描述。S901. Receive a media presentation description.
S902,得到指令信息。S902, obtaining instruction information.
S903，若所述指令信息是观看作者视角码流，则获取所述第一表示的分段，否则，获取所述第二表示的分段。S903. If the instruction information indicates viewing the author-view code stream, acquire the segment of the first representation; otherwise, acquire the segment of the second representation.
在一些可行的实施方式中，视频的制作者（以下简称作者）制作视频时，可根据视频的故事情节需求为视频播放设计一条主要情节路线。视频播放过程中，用户只需要观看该主要情节路线对应的视频图像则可了解到该故事情节，其他的视频图像可看可不看。由此可知，视频播放过程中，客户端可选择性的播放该故事情节对应的视频图像，其他的视频图像可以不呈现，可节省视频数据的传输资源和存储空间资源，提高视频数据的处理效率。作者设计故事的主要情节之后，可根据上述主要情节路线设定视频播放时每个播放时刻所要呈现给用户的视频图像，将每个播放时刻的视频图像按照时序串起来则可得到上述主要情节路线的故事情节。其中，上述每个播放时刻所要呈现给用户的视频图像可呈现在每个播放时刻对应的空间对象，即在该空间对象上呈现该时间段所要呈现的视频图像。具体实现中，上述每个播放时刻所要呈现的视频图像对应的视角可设为作者视角，呈现作者视角上的视频图像的空间对象可设为作者空间对象。作者视角对象对应的码流可设为作者视角码流。作者视角码流中包含多个视频帧，每个视频帧呈现时可为一个图像，即作者视角码流中包含多个图像。在视频播放过程中，在每个播放时刻，作者视角上呈现的图像仅是整个视频所要呈现的全景图像（或称VR图像或者全方位图像）中的一部分。在不同的播放时刻，作者视频码流呈现的图像所关联的空间对象的空间信息可以不同，也可以相同，即作者视角码流包含的多个图像至少两个图像所关联的空间对象的空间信息不同。In some feasible implementations, when the producer of a video (hereinafter referred to as the author) produces the video, a main plot route may be designed for video playback according to the storyline requirements of the video. During video playback, the user only needs to watch the video images corresponding to this main plot route to follow the storyline; the other video images may or may not be watched. It can be seen that, during video playback, the client may selectively play the video images corresponding to the storyline, and the other video images may not be presented, which can save transmission resources and storage space resources of the video data and improve the processing efficiency of the video data. After the author designs the main plot of the story, the video image to be presented to the user at each playback moment may be set according to the main plot route, and stringing the video images of all playback moments together in time order yields the storyline of the main plot route. The video image to be presented to the user at each playback moment may be presented on the spatial object corresponding to that playback moment, that is, the video image to be presented in that time period is presented on that spatial object. In a specific implementation, the viewing angle corresponding to the video image to be presented at each playback moment may be set as the author viewing angle, and the spatial object presenting the video image of the author viewing angle may be set as the author spatial object. The code stream corresponding to the author viewing-angle object may be set as the author-view code stream. The author-view code stream contains multiple video frames, and each video frame, when presented, may be one image; that is, the author-view code stream contains multiple images. During video playback, at each playback moment, the image presented in the author viewing angle is only a part of the panoramic image (also called the VR image or omnidirectional image) to be presented by the entire video. At different playback moments, the spatial information of the spatial objects associated with the images presented by the author video code stream may be different or the same; that is, among the multiple images contained in the author-view code stream, the spatial information of the spatial objects associated with at least two images differs.
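作为说明性示意（非本发明文本的一部分）：由于作者空间对象的空间信息随播放时刻动态变化，客户端可按时间排序保存（起始时刻，空间信息）样本，并查找某一呈现时刻生效的空间信息；下面的时刻与区域均为假设值。As an illustrative sketch (not part of the original text): since the spatial information of the author spatial object changes dynamically with the playback moment, a client could keep time-ordered (start_time, region) samples and look up the spatial information active at a given presentation time; the times and regions below are invented values.

```python
import bisect

# Hypothetical (start_time_s, (x, y, w, h)) samples of the author object.
SAMPLES = [
    (0.0, (0, 0, 120, 60)),
    (2.0, (60, 30, 120, 60)),
    (4.0, (180, 60, 120, 60)),
]

def author_region(t):
    """Spatial information of the author spatial object active at time t:
    the sample with the largest start time not exceeding t."""
    times = [start for start, _ in SAMPLES]
    i = bisect.bisect_right(times, t) - 1
    return SAMPLES[max(i, 0)][1]
```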
在一些可行的实施方式中,作者设计了每个播放时刻的作者视角之后,则可通过服务器对每个播放时刻的作者视角准备相应的码流。其中,作者视角对应的码流可设为作者视角码流。服务器可对作者视角码流进行编码并传输给客户端,客户端对作者视角码流进行解码之后,则可呈现作者视角码流对应的故事情节画面给用户。服务器无需传输作者视角以外其他视角(设为非作者视角,即静态视角码流)的码流给客户端,可节省视频数据的传输带宽等资源。In some feasible implementation manners, after the author designs the author perspective of each play time, the corresponding code stream can be prepared by the server for the author perspective of each play time. The code stream corresponding to the author view may be set as the author view code stream. The server may encode the author view code stream and transmit it to the client. After the client decodes the author view code stream, the story scene picture corresponding to the author view code stream may be presented to the user. The server does not need to transmit the code stream of other perspectives other than the author's perspective (set to the non-author perspective, that is, the static view stream) to the client, which can save resources such as the transmission bandwidth of the video data.
在一些可行的实施方式中，由于作者视角是作者根据视频故事情节设定的呈现预设图像的空间对象，不同的播放时刻上的作者空间对象可不同也可相同，由此可知作者视角是一个随着播放时刻不断变化的视角，作者空间对象是个不断变化位置的动态空间对象，即每个播放时刻对应的作者空间对象在全景空间中的位置不尽相同。上述图8所示的各个空间对象是按照预设规则划分的空间对象，是在全景空间中的相对位置固定的空间对象，任一播放时刻对应的作者空间对象不一定是图8所示的固定空间对象中的某一个，而是在全局空间中相对位置不断变化的空间对象。客户端从服务器获取的视频所呈现的内容是由各个作者视角串起来的，不包含非作者视角对应的空间对象，作者视角码流仅包含作者空间对象的内容，并且从服务器获取的MPD中不包含作者视角的作者空间对象的空间信息，则客户端只能解码并呈现作者视角的码流。若用户在观看视频的过程中，观看的视角切换到非作者视角上，客户端则无法呈现相应的视频内容给用户。In some feasible implementations, since the author viewing angle is a spatial object, set by the author according to the video storyline, that presents preset images, the author spatial objects at different playback moments may be different or the same. It follows that the author viewing angle is a viewing angle that keeps changing with the playback moment, and the author spatial object is a dynamic spatial object whose position keeps changing, that is, the position in the panoramic space of the author spatial object corresponding to each playback moment is not the same. The spatial objects shown in FIG. 8 above are spatial objects divided according to a preset rule and have fixed relative positions in the panoramic space; the author spatial object corresponding to any playback moment is not necessarily one of the fixed spatial objects shown in FIG. 8, but a spatial object whose relative position in the global space keeps changing. The content presented by the video that the client obtains from the server is strung together from the author viewing angles and does not contain the spatial objects corresponding to non-author viewing angles; the author-view code stream contains only the content of the author spatial objects; and if the MPD obtained from the server does not contain the spatial information of the author spatial objects of the author viewing angle, the client can only decode and present the code stream of the author viewing angle. If, while watching the video, the user switches the viewing angle to a non-author viewing angle, the client cannot present the corresponding video content to the user.
本发明实施例通过对DASH标准中提供的视频的MPD文件和视频文件格式(英文:file format)进行修改,实现视频播放过程中作者视角与非作者视角的相互切换过程中的视频内容呈现。The embodiment of the invention modifies the MPD file and the video file format (file format) of the video provided in the DASH standard, so as to realize the video content presentation in the process of switching between the author perspective and the non-author perspective in the video playback process.
本发明提供的对DASH的MPD文件的修改也可以携带在基于HTTP协议的实时流（英文：Http Live Streaming，HLS）定义的.m3u8文件中，或者平滑流（英文：Smooth Streaming，SS）的.ismc文件中，或者其他的会话描述协议（英文：Session Description Protocol，SDP）中；对文件格式的修改，也可应用在ISOBMFF或者MPEG2-TS的文件格式中，具体可根据实际应用场景需求确定，在此不做限制。本发明实施例将以上述标识信息携带在DASH码流中为例进行说明。The modification to the DASH MPD file provided by the present invention may also be carried in a .m3u8 file defined by HTTP Live Streaming (English: Http Live Streaming, HLS), in an .ismc file of Smooth Streaming (English: Smooth Streaming, SS), or in another session description protocol (English: Session Description Protocol, SDP); the modification to the file format may also be applied to the ISOBMFF or MPEG2-TS file formats, which may be determined according to actual application scenario requirements and is not limited herein. The embodiments of the present invention are described by taking the case where the above identification information is carried in a DASH code stream as an example.
在一些可行的实施方式中，服务器生成媒体呈现描述时，可在媒体呈现描述中添加标识信息，用于标识视频的作者视角码流。具体实现中，上述标识信息可携带在媒体呈现描述中携带的作者视角码流所在码流集合的属性信息中，即上述标识信息可携带在媒体呈现描述中的自适应集的信息中，上述标识信息也可携带在媒体呈现描述中包含的表示的信息中。进一步的，上述标识信息还可携带在媒体呈现描述中的描述子的信息中。客户端可通过解析MPD得到MPD中增加的语法元素，快速识别作者视角码流和非作者视角的码流。其中，具体改动或者增加的语法描述如下表2，表2为新增语法元素的属性信息表：In some feasible implementations, when the server generates the media presentation description, identification information may be added to the media presentation description for identifying the author-view code stream of the video. In a specific implementation, the identification information may be carried in the attribute information of the code stream set in which the author-view code stream is located, as carried in the media presentation description; that is, the identification information may be carried in the information of the adaptation set in the media presentation description, or in the information of a representation contained in the media presentation description. Further, the identification information may also be carried in the information of a descriptor in the media presentation description. By parsing the MPD to obtain the syntax elements added to it, the client can quickly identify the author-view code stream and the non-author-view code streams. The specific modified or added syntax is described in Table 2 below, which is an attribute information table of the newly added syntax elements:
表2 Table 2
Figure PCTCN2016107111-appb-000035
在MPD中通过属性@view_type来标记对应的representation是非作者视角(或称静态视角)码流还是作者视角(或称动态视角)码流。当view_type值为0时,表示对应的representation是非作者视角码流;当view_type值为1时,表示对应的representation是作者视角码流。客户端在本地解析MPD文件时,可根据该属性来判断当前视频流中是否包含作者视角码流。下面将通过一些可行的实现方式对应的MPD样例进行说明:In the MPD, the attribute @view_type is used to mark whether the corresponding representation is a non-author view (or static view) code stream or an author view (or dynamic view) code stream. When the view_type value is 0, it indicates that the corresponding representation is a non-author view code stream; when the view_type value is 1, it indicates that the corresponding representation is the author view code stream. When the client parses the MPD file locally, it can determine whether the current video stream contains the author view stream according to the attribute. The following is a description of the MPD examples corresponding to some possible implementations:
样例一:描述在MPD描述子中Example 1: Described in the MPD descriptor
Figure PCTCN2016107111-appb-000036
Figure PCTCN2016107111-appb-000037
Figure PCTCN2016107111-appb-000037
如上所示，在该样例中，服务器可在现有MPD语法的EssentialProperty包含value属性的第二个值的位置插入一个新的值，原有value的第二个值以及第二个值以后的值依次往后挪一个值。客户端解析MPD之后则可获取得到value的第二个值。即在该样例中，value的第二个值是view_type。EssentialProperty中的value=“0,0,…”，即value的第二个为0（即view_type=0）表示EssentialProperty描述的是固定视角码流（即静态视角码流）；value=“0,1”，即value的第二个为1（即view_type=1）表示EssentialProperty描述的是作者视角码流。As shown above, in this example, the server may insert a new value at the position of the second value of the value attribute contained in the EssentialProperty of the existing MPD syntax, and the original second value and all values after it each shift back by one position. After parsing the MPD, the client can obtain the second value of value; that is, in this example, the second value of value is view_type. In the EssentialProperty, value="0,0,...", that is, a second value of 0 (view_type=0), indicates that the EssentialProperty describes a fixed-view code stream (that is, a static-view code stream); value="0,1", that is, a second value of 1 (view_type=1), indicates that the EssentialProperty describes the author-view code stream.
Example 2: described in the representation
[MPD example — see original figure PCTCN2016107111-appb-000038]
In this example, a syntax element view_type is added to the attribute information of the Representation. When view_type="0" or the attribute is not set (the default is 0), the code stream described by the Representation is a fixed-view code stream; when view_type="1", the code stream described by the Representation is an author-view code stream.
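A hedged sketch of what such a Representation might look like (the original example figure is not reproduced here; identifiers and values are illustrative):

```xml
<!-- Hypothetical MPD fragment: view_type added as a Representation attribute
     (1 = author-view code stream; 0 or absent = fixed-view code stream). -->
<Representation id="1" view_type="1" bandwidth="1000000" width="1920" height="1080">
  <BaseURL>author_view.mp4</BaseURL>
</Representation>
```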
Example 3: described in the attribute information of the adaptation set (AdaptationSet)
[MPD example — see original figure PCTCN2016107111-appb-000039]
In this example, a syntax element view_type is added to the attribute information of the AdaptationSet (that is, the attribute information of the code stream set in which the author-view code stream resides). When view_type="0" or the attribute is not set (the default is 0), the code streams contained in the AdaptationSet are fixed-view code streams; when view_type="1", the code streams contained in the AdaptationSet are author-view code streams.
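A hedged sketch of what such an AdaptationSet might look like (the original example figure is not reproduced here; identifiers and values are illustrative):

```xml
<!-- Hypothetical MPD fragment: view_type added as an AdaptationSet attribute,
     marking every Representation it contains as author-view. -->
<AdaptationSet view_type="1" mimeType="video/mp4">
  <Representation id="1" bandwidth="1000000" width="1920" height="1080">
    <BaseURL>author_view.mp4</BaseURL>
  </Representation>
</AdaptationSet>
```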
If the spatial information related to the author-view stream is encapsulated independently in a metadata file (a timed metadata file), then in all of the above MPD examples, description information for that spatial-information file may be added, for example by adding an adaptation set in which the information of the spatial-information file is described:
[MPD example — see original figure PCTCN2016107111-appb-000040]
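A hedged sketch of what such an added adaptation set might look like (the original example figure is not reproduced here; the codecs value, association attributes, and file names are illustrative assumptions, not taken from the original):

```xml
<!-- Hypothetical MPD fragment: an extra AdaptationSet describing a timed
     metadata file that carries the spatial information of the author view. -->
<AdaptationSet contentType="metadata">
  <Representation id="spatial-info" codecs="2dcc" bandwidth="1000"
                  associationId="author-view" associationType="cdsc">
    <BaseURL>author_view_spatial_info.mp4</BaseURL>
  </Representation>
</AdaptationSet>
```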
By parsing the MPD, the client can determine the author-view code stream according to identification information such as view_type carried in the MPD. Further, when the received instruction information indicates that the author-view code stream is to be viewed, the client may acquire segments of the author-view code stream and present them. If the instruction information does not indicate viewing of the author-view code stream, segments of the static-view code stream may be acquired for presentation. If the spatial information related to the author-view stream is encapsulated in an independent metadata file, the client can parse the MPD and acquire the spatial-information metadata according to the codec identifier, thereby parsing out the spatial information.
In the embodiments of the present invention, the switching instruction information received by the client may include the above-mentioned head rotation, eye, gesture, or other human-behavior information, and may also include user input information, which may include keyboard input information, voice input information, touch-screen input information, and the like.
In some possible implementations, the server may also add spatial information of one or more author space objects to the author-view code stream. Each author space object corresponds to one or more images; that is, one or more images may be associated with the same space object, or each image may be associated with its own space object. The server may add the spatial information of each author space object to the author-view code stream, or may take the spatial information as samples and encapsulate it independently in a track or a file. The spatial information of an author space object is the spatial relationship between the author space object and its associated content component, that is, the spatial relationship between the author space object and the panoramic space. The space described by the spatial information of the author space object may specifically be a partial space within the panoramic space, such as any one of the spaces in FIG. 8 above, or the solid-line box (or any one of the dashed-line boxes) in FIG. 10. In a specific implementation, for a DASH code stream, the server may add the above spatial information to the trun box contained in the segments of the author-view code stream in the existing file format, to describe the spatial information of the space object associated with each frame of the author-view code stream.
Example of the added description (Example 1):
[File-format example — see original figures PCTCN2016107111-appb-000041 and PCTCN2016107111-appb-000042]
In this example, the server may add a syntax element tr_flags to the existing trun box and set the value of tr_flags to 0x001000, to mark that the spatial information of the relative position of the preset space object within the global space object is contained in the trun box.
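A hedged sketch, in ISOBMFF-style syntax, of how the trun box extension described above might look; since the original example figure is not reproduced here, the field widths and exact layout are assumptions:

```
// Hypothetical extension of the trun box: when (tr_flags & 0x001000) is set,
// each sample entry additionally carries the spatial information of the
// associated space object on the sphere.
aligned(8) class TrackRunBox extends FullBox('trun', version, tr_flags) {
    unsigned int(32) sample_count;
    // ... existing optional trun fields ...
    {
        if (tr_flags & 0x001000) {
            unsigned int(16) center_pitch; // vertical deflection of the fov center
            unsigned int(16) center_yaw;   // horizontal deflection of the fov center
            unsigned int(16) center_roll;  // roll around the center direction
            unsigned int(16) pitch_h;      // fov height (maximum vertical angle)
            unsigned int(16) yaw_w;        // fov width (maximum horizontal angle)
        }
    }[ sample_count ]
}
```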
In some possible implementations, the spatial information of the author space object contained in the above trun box is described by yaw angles; it may also be described by spatial positions in a latitude-longitude map, or by other geometric solid figures, which is not limited here. In the above trun box, the yaw-angle description uses center_pitch, center_yaw, center_roll, pitch_h and yaw_w to describe the center position of the spatial information on the sphere (center_pitch, center_yaw, center_roll), its height (pitch_h), and its width (yaw_w). FIG. 11 is a schematic diagram of the relative position of an author space object in the panoramic space. In FIG. 11, point O is the sphere center corresponding to the 360-degree VR panoramic video spherical image, and can be regarded as the position of the human eye when viewing the VR panoramic image. Point A is the center point of the author-view image; C and F are the boundary points of the author-view image through point A along the horizontal coordinate axis of the image; E and D are the boundary points of the author-view image through point A along the vertical coordinate axis of the image; B is the projection point of point A onto the equator along the spherical meridian; and I is the starting coordinate point of the horizontal direction on the equator. The meaning of each element is explained as follows:
center_pitch: the vertical deflection angle of the point to which the center position of the image of the author space object is mapped on the panoramic spherical (i.e. global-space) image, such as ∠AOB in FIG. 11;
center_yaw: the horizontal deflection angle of the point to which the center position of the image of the author space object is mapped on the panoramic spherical image, such as ∠IOB in FIG. 11;
center_roll: the rotation angle around the line connecting the sphere center and the point to which the center position of the image of the author space object is mapped on the panoramic spherical image, such as ∠DOB in FIG. 11;
pitch_h: the field-of-view height of the image of the author space object in the panoramic spherical image, expressed as the maximum vertical angle of the field of view, such as ∠DOE in FIG. 11; yaw_w: the field-of-view width of the image of the author space object in the panoramic spherical image, expressed as the maximum horizontal angle of the field of view, such as ∠COF in FIG. 11.
In some possible implementations, the server side may also add a new box and its syntax description to the video format, for describing the spatial information of the author space object. An example of the newly added box and its syntax description information is as follows (Example 2):
[Box syntax example — see original figure PCTCN2016107111-appb-000043]
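A hedged sketch, in ISOBMFF-style syntax, of how the strp box described below might look; the original example figure is not reproduced here, so the class name, box version/flags, and field widths are assumptions, while the field names follow the surrounding text:

```
// Hypothetical strp box carrying the spatial information of the author
// space object; field meanings are the same as in Example 1 above.
aligned(8) class SpatialRelativePositionBox extends FullBox('strp', 0, 0) {
    unsigned int(16) center_pitch; // vertical deflection of the fov center
    unsigned int(16) center_yaw;   // horizontal deflection of the fov center
    unsigned int(16) center_roll;  // roll of the fov center (may be absent)
    unsigned int(16) pitch_h;      // fov height (maximum vertical angle)
    unsigned int(16) yaw_w;        // fov width (maximum horizontal angle)
}
```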
The information contained in the above strp box is the spatial information of the newly added author space object; the meaning of each syntax element it contains is the same as that of the corresponding syntax element in Example 1 above. In a specific implementation, the line "unsigned int(16) center_roll; // roll of the fov center position" in the box of this example may be absent; this can be determined according to the requirements of the actual application scenario and is not limited here.
The above strp box may be contained in the stbl box, with a flag in the stbl box marking the presence of the spatial-position-information strp box; which box serves as the parent of the strp box can be determined according to the requirements of the actual application scenario and is not limited here.
The above strp box may be described in the segment-level metadata (traf) of DASH, or in the metadata of a track encapsulated in the ISOBMFF format.
The spatial information described in the above embodiments (the center position and the width and height of the fov, or the upper-left and lower-right positions of the fov) may also be contained in an independent metadata code stream or track associated with the author-view code stream, with each piece of spatial information corresponding to one sample in the metadata track or metadata file.
The client parses the segments of the author-view code stream; after acquiring the spatial information of the author space object, it can determine the relative position of the author space object in the panoramic space. Then, during video playback, it can determine the position of the space object after view switching according to the spatial information of the author space object in the author-view code stream currently being played and the trajectory of the view switch, thereby implementing switched playback between the author-view and non-author-view code streams. The above spatial information of the author space object may also be acquired by parsing the metadata track or the metadata file.
In some possible implementations, if the change of view during view switching is from the author view to a non-author view, the spatial information of the non-author view after switching may be determined according to the spatial information of the author view, so as to acquire and present the code stream corresponding to the spatial information of that non-author view. Specifically, the client may set as the starting point the center position of the author space object determined above, or a specified boundary position contained in the author space object, or the positions corresponding to the upper-left and lower-right corners of the space object, for example the position indicated by one or more of the parameters center_pitch, center_yaw, center_roll, pitch_h and yaw_w above. Further, the client may calculate, according to the space-object switching trajectory of the view switch, the end-point space object indicated by that trajectory, and determine the end-point space object as the target space object. For example, the solid-line region shown in FIG. 10 is the author space object, and the dashed-line region is the target space object calculated from the author space object and the space-object switching trajectory.
In some possible implementations, since the client does not acquire the entire panoramic video when playing the author-view code stream, after determining the target space object of the view switch, the client may request from the server the code stream corresponding to the target space object. In a specific implementation, the client may send a request for the code stream of the target space object to the server according to information such as the URL of the code stream of each space object described in the MPD. After receiving the request sent by the client, the server may send the code stream corresponding to the target space object to the client. After obtaining the code stream of the target space object, the client may decode and play it, thereby implementing switched playback of view code streams.
In the embodiments of the present invention, the client may determine the author-view code stream according to the identification information carried in the MPD, and may also acquire the spatial information of the author space object corresponding to the author-view code stream carried therein. Then, during view switching, it may acquire the author-view code stream for playback according to the position of the author space object, or determine, according to the author space object, the target space object of the non-author view after the switch. Further, it may request from the server the non-author-view code stream corresponding to the target space object for playback, implementing view-switched code stream playback. When switching code streams according to view switches, the client does not need to load the panoramic video code stream, which saves resources such as the transmission bandwidth of the video data and the local storage space of the client. The client requests for playback the code stream corresponding to the target space object determined during the view switch, reducing the bandwidth consumed by video data transmission while implementing view-switched code stream playback, improving the applicability of video switching playback and enhancing the user experience of video viewing.
Referring to FIG. 12, which is a schematic structural diagram of an apparatus for processing video data based on HTTP dynamic adaptive streaming according to an embodiment of the present invention, the processing apparatus provided by the embodiment of the present invention includes:
a receiving module 121, configured to receive a media presentation description, where the media presentation description includes information of at least two representations; a first representation of the at least two representations is an author-view code stream, the author-view code stream contains a plurality of images, and the spatial information of the space objects associated with at least two of the plurality of images differs; a second representation of the at least two representations is a static-view code stream, the static-view code stream contains a plurality of images, and the spatial information of the space objects associated with the plurality of images is the same.
In some possible implementations, the media presentation description includes at least one file of spatial information associated with the author-view code stream.
an acquiring module 122, configured to obtain instruction information.
The acquiring module 122 is further configured to acquire segments of the first representation when the obtained instruction information indicates viewing of the author-view code stream, and otherwise acquire segments of the second representation.
In some possible implementations, the media presentation description further contains identification information, and the identification information is used to identify the author-view code stream of the video.
In some possible implementations, the media presentation description contains information of an adaptation set, the adaptation set being a data set used to describe attributes of media data segments of a plurality of mutually interchangeable encoded versions of the same media content component;
wherein the information of the adaptation set contains the identification information.
In some possible implementations, the media presentation description contains information of a representation, a representation being a set and encapsulation of one or more code streams in a transport format;
wherein the information of the representation contains the identification information.
In some possible implementations, the media presentation description contains information of a descriptor, the descriptor being used to describe the spatial information of the associated space object;
wherein the information of the descriptor contains the identification information.
In some possible implementations, the segments of the first representation carry the spatial information of the space objects associated with the images contained in the segments of the first representation.
In some possible implementations, the media presentation description contains description information of a spatial-information file;
the acquiring module 122 is further configured to acquire part or all of the data in the spatial-information file.
The acquiring module 122 is further configured to:
parse the segments of the first representation and acquire the spatial information of the space objects associated with the images contained in the segments of the first representation.
In some possible implementations, the spatial information of the space object associated with an image is the spatial relationship between the space object and its associated content component.
In some possible implementations, the spatial information is carried in a specified box in the segments of the first representation, or in a specified box in a metadata expression associated with the segments of the first representation.
In some possible implementations, the specified box is a trun box contained in the segments of the first representation, the trun box being used to describe a set of consecutive samples of a track.
The acquiring module 122 is further configured to:
parse the spatial-information file and acquire the spatial information of the space objects associated with the images contained in the segments of the first representation.
In a specific embodiment, the apparatus for processing video data provided by this embodiment of the present invention may specifically be the client in the foregoing embodiments, and may implement, through its built-in modules, the implementations described in the steps of the foregoing video data processing method, which are not repeated here.
In the embodiments of the present invention, the client may determine the author-view code stream according to the identification information carried in the MPD, and may also acquire the spatial information of the author space object corresponding to the author-view code stream carried therein. Then, during view switching, it may acquire the author-view code stream for playback according to the position of the author space object, or determine, according to the author space object, the target space object of the non-author view after the switch. Further, it may request from the server the non-author-view code stream corresponding to the target space object for playback, implementing view-switched code stream playback. When switching code streams according to view switches, the client does not need to load the panoramic video code stream, which saves resources such as the transmission bandwidth of the video data and the local storage space of the client. The client requests for playback the code stream corresponding to the target space object determined during the view switch, reducing the bandwidth consumed by video data transmission while implementing view-switched code stream playback, improving the applicability of video switching playback and enhancing the user experience of video viewing.
The terms "first", "second", "third", "fourth" and the like in the specification, claims and drawings of the present invention are used to distinguish different objects, not to describe a particular order. Furthermore, the terms "include" and "have" and any variations thereof are intended to cover a non-exclusive inclusion. For example, a process, method, system, product or device that comprises a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to such a process, method, system, product or device.
A person of ordinary skill in the art can understand that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
What is disclosed above is merely preferred embodiments of the present invention, which certainly cannot be used to limit the scope of rights of the present invention. Therefore, equivalent changes made according to the claims of the present invention still fall within the scope of the present invention.

Claims (41)

  1. A method for processing video data based on HTTP dynamic adaptive streaming, wherein the method comprises:
    receiving a media presentation description, where the media presentation description includes information of at least two representations; a first representation of the at least two representations is an author-view code stream, the author-view code stream contains a plurality of images, and the spatial information of the space objects associated with at least two of the plurality of images differs; a second representation of the at least two representations is a static-view code stream, the static-view code stream contains a plurality of images, and the spatial information of the space objects associated with the plurality of images is the same;
    obtaining instruction information; and
    if the instruction information indicates viewing of the author-view code stream, acquiring segments of the first representation; if the instruction information indicates viewing of the static-view code stream, acquiring segments of the second representation.
  2. The method according to claim 1, wherein the spatial information of the space objects associated with the images contained in the view code stream is encapsulated in a file independent of the view code stream or in a track independent of the view code stream.
  3. The method according to claim 2, wherein the file independent of the view code stream is a metadata file, and samples in the metadata file comprise the spatial information.
  4. The method according to claim 2, wherein the track independent of the view code stream is a metadata track, and samples in the metadata track comprise the spatial information.
  5. The method according to claim 1, wherein the spatial information of the space objects associated with the images contained in the view code stream is encapsulated in the view code stream.
  6. The method according to claim 5, wherein the spatial information is encapsulated in the metadata of the view code stream.
  7. The method according to claim 5, wherein the view code stream includes a flag, and the flag is used to indicate whether spatial information exists in the view code stream.
  8. The method according to claim 7, wherein the box in which the flag resides describes the spatial information of the space object, or the box in which the flag resides contains a sub-box describing the spatial information of the space object.
  9. The method according to claim 5, wherein the spatial information is encapsulated in a supplemental enhancement information unit or a parameter set unit in the view code stream.
  10. The method according to claim 1, wherein the media presentation description further contains identification information, and the identification information is used to identify the author-view code stream of the video.
  11. The method according to claim 1, wherein the media presentation description further contains identification information, and the identification information is used to identify that the spatial information is encapsulated in a file independent of the author-view code stream.
  12. The method according to claim 11, wherein samples in the file independent of the view code stream are metadata of the spatial information.
  13. The method according to claim 10, wherein the media presentation description contains information of an adaptation set, the adaptation set being a data set used to describe attributes of media data segments of a plurality of mutually interchangeable encoded versions of the same media content component;
    wherein the information of the adaptation set contains the identification information.
  14. The method according to claim 10, wherein the media presentation description contains information of a representation, a representation being a set and encapsulation of one or more code streams in a transport format;
    wherein the information of the representation contains the identification information.
  15. The method according to claim 10, wherein the media presentation description contains information of a descriptor, the descriptor being used to describe the spatial information of the associated space object;
    wherein the information of the descriptor contains the identification information.
  16. 如权利要求13-15任一项所述的方法,其特征在于,所述第一表示的分段中携带所述第一表示的分段包含的图像所关联的空间对象的空间信息;The method according to any one of claims 13 to 15, wherein the segment of the first representation carries spatial information of a spatial object associated with an image included in the segment of the first representation;
    所述获取所述第一表示的分段之后,所述方法还包括:After the segmentation of the first representation is obtained, the method further includes:
    解析所述第一表示的分段,获取所述第一表示的分段包含的图像所关联的空间对象的空间信息。Parsing the segment of the first representation, and acquiring spatial information of a spatial object associated with the image included in the segment of the first representation.
  17. The method according to claim 11 or 12, wherein a segment of the first representation carries spatial information of the spatial object associated with an image included in the segment of the first representation;
    after the obtaining of the segment of the first representation, the method further includes:
    obtaining the identification information, and obtaining, according to the identification information, the spatial information encapsulated in the file independent of the view code stream.
  18. The method according to claim 16, wherein the spatial information of the spatial object associated with the image is a spatial relationship between the spatial object and its associated content component.
  19. The method according to claim 16 or 18, wherein the spatial information is carried in a designated box in the segment of the first representation, or in a designated box in a metadata representation associated with the segment of the first representation.
  20. The method according to claim 19, wherein the designated box further includes a flag, the flag being used to indicate whether spatial information is present in the box.
  21. The method according to claim 19, wherein the designated box is a trun box included in the segment of the first representation, the trun box being used to describe a set of contiguous samples of a track.
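Claims 19–21 place the spatial information and a presence flag inside a designated ISOBMFF box. A "full box" carries `size(4) + type(4) + version(1) + flags(3)` before its payload, so a reader can test the flag before parsing further. The sketch below builds and parses a toy `trun`-style full box; the specific flag bit chosen to mean "spatial information present" (bit 0) is an assumption for illustration, not a value taken from this application.

```python
import struct

SPATIAL_INFO_PRESENT = 0x000001  # hypothetical flag bit

def parse_full_box_header(buf):
    """Return (box_type, version, flags, payload) of one ISOBMFF full box."""
    size, box_type = struct.unpack(">I4s", buf[:8])
    version = buf[8]
    flags = int.from_bytes(buf[9:12], "big")
    return box_type.decode("ascii"), version, flags, buf[12:size]

# Build a toy 'trun'-like full box whose flags mark spatial info as present.
payload = b"\x00\x00\x00\x02"  # e.g. a sample count
box = (struct.pack(">I4s", 12 + len(payload), b"trun")
       + b"\x00"                                # version
       + SPATIAL_INFO_PRESENT.to_bytes(3, "big")  # flags
       + payload)

btype, version, flags, body = parse_full_box_header(box)
print(btype, bool(flags & SPATIAL_INFO_PRESENT))  # trun True
```

When the flag is clear, a client would skip the optional spatial fields and fall back to the file-level or track-level spatial metadata of claims 5–7.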
  22. An apparatus for processing video data based on dynamic adaptive streaming over HTTP, wherein the apparatus includes:
    a receiving module, configured to receive a media presentation description, the media presentation description including information of at least two representations, a first representation of the at least two representations being an author view code stream, the author view code stream including a plurality of images, spatial information of the spatial objects associated with at least two of the plurality of images being different, and a second representation of the at least two representations being a static view code stream, the static view code stream including a plurality of images, spatial information of the spatial objects associated with the plurality of images being the same;
    an obtaining module, configured to obtain instruction information;
    the obtaining module being further configured to: obtain a segment of the first representation when the obtained instruction information indicates viewing of the author view code stream; and obtain a segment of the second representation when the obtained instruction information indicates viewing of the static view code stream.
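The selection behavior of claim 22 reduces to a simple dispatch: given the instruction information, pick segments from the author-view or the static-view representation. The sketch below illustrates this; the representation names, segment file names, and the string form of the instruction information are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Representation:
    name: str
    segments: list

def select_segment(representations, instruction, index=0):
    """Pick a segment from the author-view or static-view representation
    according to the instruction information (claim 22)."""
    wanted = "author" if instruction == "watch_author_view" else "static"
    for rep in representations:
        if rep.name == wanted:
            return rep.segments[index]
    raise ValueError("no matching representation in the MPD")

reps = [
    Representation("author", ["author_seg0.m4s", "author_seg1.m4s"]),
    Representation("static", ["static_seg0.m4s"]),
]
print(select_segment(reps, "watch_author_view"))  # author_seg0.m4s
print(select_segment(reps, "watch_static_view"))  # static_seg0.m4s
```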
  23. The apparatus according to claim 22, wherein spatial information of the spatial object associated with an image included in the view code stream is encapsulated in a file independent of the view code stream or in a track independent of the view code stream.
  24. The apparatus according to claim 23, wherein the file independent of the view code stream is a metadata file, and samples in the metadata file include the spatial information.
  25. The apparatus according to claim 23, wherein the track independent of the view code stream is a metadata track, and samples in the metadata track include the spatial information.
  26. The apparatus according to claim 22, wherein spatial information of the spatial object associated with an image included in the view code stream is encapsulated in the view code stream.
  27. The apparatus according to claim 26, wherein the spatial information is encapsulated in metadata of the view code stream.
  28. The apparatus according to claim 26, wherein the view code stream includes a flag, the flag being used to indicate whether spatial information is present in the view code stream.
  29. The apparatus according to claim 28, wherein the box in which the flag is located describes the spatial information of the spatial object, or the box in which the flag is located contains a sub-box describing the spatial information of the spatial object.
  30. The apparatus according to claim 26, wherein the spatial information is encapsulated in a supplemental enhancement information (SEI) unit or a parameter set unit in the view code stream.
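Claim 30 places the spatial information inside an SEI unit of the elementary stream itself. In an H.264-style Annex B byte stream, NAL units are delimited by start codes and an SEI unit has `nal_unit_type` 6 (the low five bits of the first NAL byte). The sketch below locates SEI units in a toy stream; the stream contents are fabricated for illustration, and a real parser would also handle 3-byte start codes and emulation-prevention bytes, which are omitted here.

```python
START_CODE = b"\x00\x00\x00\x01"

def nal_units(stream):
    """Split an Annex B byte stream into NAL units (start codes removed)."""
    return [p for p in stream.split(START_CODE) if p]

def is_sei(nal):
    # H.264: nal_unit_type is the low 5 bits of the first byte; SEI is type 6.
    return nal[0] & 0x1F == 6

# Toy stream: one SPS-like unit (type 7) followed by one SEI unit (type 6).
stream = (START_CODE + bytes([0x67, 0xAA])
          + START_CODE + bytes([0x06, 0x05, 0x01]))
seis = [n for n in nal_units(stream) if is_sei(n)]
print(len(seis))  # 1
```

A decoder-side client would hand the SEI payload to the same spatial-information parser used for the box-carried case of claims 28–29.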
  31. The apparatus according to claim 22, wherein the media presentation description further includes identification information, the identification information being used to identify the author view code stream of the video.
  32. The apparatus according to claim 22, wherein the media presentation description further includes identification information, the identification information being used to indicate that the spatial information is encapsulated in a file independent of the author view code stream.
  33. The apparatus according to claim 32, wherein the samples in the file independent of the view code stream are metadata of the spatial information.
  34. The apparatus according to claim 31, wherein the media presentation description includes information of an adaptation set, the adaptation set being a data set describing attributes of media data segments of a plurality of mutually interchangeable encoded versions of the same media content component;
    wherein the information of the adaptation set includes the identification information.
  35. The apparatus according to claim 31, wherein the media presentation description includes information of a representation, the representation being a set and encapsulation of one or more code streams in a transport format;
    wherein the information of the representation includes the identification information.
  36. The apparatus according to claim 31, wherein the media presentation description includes information of a descriptor, the descriptor being used to describe spatial information of an associated spatial object;
    wherein the information of the descriptor includes the identification information.
  37. The apparatus according to any one of claims 34 to 36, wherein a segment of the first representation carries spatial information of the spatial object associated with an image included in the segment of the first representation;
    the obtaining module being further configured to:
    parse the segment of the first representation to obtain the spatial information of the spatial object associated with the image included in the segment of the first representation.
  38. The apparatus according to claim 37, wherein the spatial information of the spatial object associated with the image is a spatial relationship between the spatial object and its associated content component.
  39. The apparatus according to claim 37 or 38, wherein the spatial information is carried in a designated box in the segment of the first representation, or in a designated box in a metadata representation associated with the segment of the first representation.
  40. The apparatus according to claim 39, wherein the designated box is a trun box included in the segment of the first representation, the trun box being used to describe a set of contiguous samples of a track.
  41. The apparatus according to claim 39, wherein the designated box further includes a flag, the flag being used to indicate whether spatial information is present in the box.
PCT/CN2016/107111 2016-09-30 2016-11-24 Video data processing method and apparatus WO2018058773A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610874490.7 2016-09-30
CN201610874490.7A CN107888939A (en) 2016-09-30 2016-09-30 Video data processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2018058773A1 true WO2018058773A1 (en) 2018-04-05

Family

ID=61763034

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/107111 WO2018058773A1 (en) 2016-09-30 2016-11-24 Video data processing method and apparatus

Country Status (2)

Country Link
CN (1) CN107888939A (en)
WO (1) WO2018058773A1 (en)


Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108833937B (en) * 2018-05-30 2021-03-23 华为技术有限公司 Video processing method and device
WO2020063850A1 (en) * 2018-09-27 2020-04-02 华为技术有限公司 Method for processing media data and terminal and server
CN113228690B (en) * 2018-12-25 2023-09-08 索尼集团公司 Video reproduction device, reproduction method, and program
CN111417008B (en) 2019-01-08 2022-06-03 诺基亚技术有限公司 Method, apparatus and computer readable medium for virtual reality
CN111510757A (en) * 2019-01-31 2020-08-07 华为技术有限公司 Method, device and system for sharing media data stream
JP7373581B2 (en) 2019-03-14 2023-11-02 ノキア テクノロジーズ オサケユイチア Method and apparatus for late binding in media content
JP7286791B2 (en) * 2019-03-20 2023-06-05 北京小米移動軟件有限公司 Method and apparatus for transmitting viewpoint switching capability in VR360
CN115086635B (en) * 2021-03-15 2023-04-14 腾讯科技(深圳)有限公司 Multi-view video processing method, device and equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101589619A (en) * 2007-11-20 2009-11-25 索尼株式会社 Information processing device, information processing method, display control device, display control method, and program
CN101998116A (en) * 2009-08-31 2011-03-30 中国移动通信集团公司 Method, system and equipment for realizing multi-view video service
GB2506911A (en) * 2012-10-12 2014-04-16 Canon Kk Streaming data corresponding to divided image portions (tiles) via a description file including spatial and URL data
CN104010225A (en) * 2014-06-20 2014-08-27 合一网络技术(北京)有限公司 Method and system for displaying panoramic video
CN104904225A (en) * 2012-10-12 2015-09-09 佳能株式会社 Method and corresponding device for streaming video data
US20160088047A1 (en) * 2014-09-23 2016-03-24 Futurewei Technologies, Inc. Ownership Identification, Signaling, and Handling Of Content Components In Streaming Media
CN105791882A (en) * 2016-03-22 2016-07-20 腾讯科技(深圳)有限公司 Video coding method and device

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9147251B2 (en) * 2012-08-03 2015-09-29 Flyby Media, Inc. Systems and methods for efficient 3D tracking of weakly textured planar surfaces for augmented reality applications
EP3561728A1 (en) * 2013-07-26 2019-10-30 Huawei Technologies Co., Ltd. System and method for spatial adaptation in adaptive streaming
CN104469398B (en) * 2014-12-09 2015-12-30 北京清源新创科技有限公司 A kind of Internet video picture processing method and device
CN104602129B (en) * 2015-01-27 2018-03-06 三星电子(中国)研发中心 The player method and system of interactive multi-angle video
CN104735542B (en) * 2015-03-30 2018-09-28 北京奇艺世纪科技有限公司 A kind of video broadcasting method and device
CN105847379A (en) * 2016-04-14 2016-08-10 乐视控股(北京)有限公司 Tracking method and tracking apparatus for panoramic video moving direction
CN105933343B (en) * 2016-06-29 2019-01-08 深圳市优象计算技术有限公司 A kind of code stream caching method for 720 degree of panoramic video netcasts


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114097249A (en) * 2019-06-26 2022-02-25 佳能株式会社 Method and apparatus for packaging panoramic images in a file
CN114697631A (en) * 2022-04-26 2022-07-01 腾讯科技(深圳)有限公司 Immersion medium processing method, device, equipment and storage medium
CN114697631B (en) * 2022-04-26 2023-03-21 腾讯科技(深圳)有限公司 Immersion medium processing method, device, equipment and storage medium
WO2023207119A1 (en) * 2022-04-26 2023-11-02 腾讯科技(深圳)有限公司 Immersive media processing method and apparatus, device, and storage medium

Also Published As

Publication number Publication date
CN107888939A (en) 2018-04-06

Similar Documents

Publication Publication Date Title
WO2018058773A1 (en) Video data processing method and apparatus
CN110121734B (en) Information processing method and device
US11563793B2 (en) Video data processing method and apparatus
WO2018214698A1 (en) Method and device for displaying video information
CN107888993B (en) Video data processing method and device
CN109644262A (en) The method for sending omnidirectional's video, the method for receiving omnidirectional's video, the device for sending omnidirectional's video and the device for receiving omnidirectional's video
US20200389640A1 (en) Method and device for transmitting 360-degree video by using metadata related to hotspot and roi
WO2018126702A1 (en) Streaming media transmission method applied to virtual reality technology and client
CN109218755B (en) Media data processing method and device
US20210176446A1 (en) Method and device for transmitting and receiving metadata about plurality of viewpoints
KR20200066601A (en) Method and apparatus for transceiving metadata for multiple viewpoints
WO2019007096A1 (en) Method and apparatus for processing media information
WO2018058993A1 (en) Video data processing method and apparatus
WO2018072488A1 (en) Data processing method, related device and system
KR20200008631A (en) How to send 360 degree video, how to receive 360 degree video, 360 degree video transmitting device, 360 degree video receiving device
WO2018120474A1 (en) Information processing method and apparatus
CN108271084B (en) Information processing method and device
WO2023169003A1 (en) Point cloud media decoding method and apparatus and point cloud media coding method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16917518

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16917518

Country of ref document: EP

Kind code of ref document: A1