WO2018058993A1 - Method and apparatus for processing video data - Google Patents

Method and apparatus for processing video data

Info

Publication number
WO2018058993A1
Authority
WO
WIPO (PCT)
Prior art keywords
representation
segment
information
switching
code stream
Prior art date
Application number
PCT/CN2017/086548
Other languages
English (en)
Chinese (zh)
Inventor
邸佩云
谢清鹏
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201610890964.7A external-priority patent/CN107888993B/zh
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Publication of WO2018058993A1 publication Critical patent/WO2018058993A1/fr
Priority to US16/370,052 priority Critical patent/US20190230388A1/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23: Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234: Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/2343: Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements

Definitions

  • the present invention relates to the field of streaming media data processing, and in particular, to a method and an apparatus for processing video data.
  • VR video viewing applications such as 360-degree viewing angles are increasingly presented to users.
  • The user may change the field of view (FOV) at any time, and the VR video image presented within the user's field of view should switch accordingly.
  • In the user experience of the above application scenario, the user needs to see the new picture quickly after the switch, and the new picture must also be of high quality. Therefore, how to realize efficient and high-quality switching of VR video images is one of the problems to be solved in video stream data processing for VR applications.
  • The prior art divides the panoramic space of VR video viewing into a plurality of spatial objects and prepares, for each spatial object, a set of code streams based on Dynamic Adaptive Streaming over HTTP (DASH), which runs over the Hypertext Transfer Protocol (HTTP).
  • the terminal selects the DASH code stream of the spatial object corresponding to the switched perspective to play, and realizes video image switching of different viewing angles.
  • The DASH code stream of each area contains a plurality of segments, and video image switching takes the specific form of a playback switch from one segment to another.
  • The DASH technical specification is mainly composed of two major parts: the media presentation description (MPD) and the media file format.
  • the server prepares multiple versions of the code stream for the same video content.
  • Each version of the code stream is called a representation in the DASH standard (English: representation).
  • Representation is a collection and encapsulation of one or more codestreams in a transport format, one representation containing one or more segments.
  • Coding parameters such as code rate and resolution may differ between versions of the code stream; each code stream is divided into a plurality of small files, and each small file is called a segment.
  • For example, the server prepares three representations for a movie: rep1, rep2, and rep3.
  • rep1 is a high-definition video with a code rate of 4 Mbps (megabits per second)
  • rep2 is a standard-definition video with a code rate of 2 Mbps
  • rep3 is a standard-definition video with a code rate of 1 Mbps.
  • the segment marked as shaded in Figure 3 is the segmentation data requested by the client.
  • The first three segments requested by the client are segments of rep3; the client then switches to rep2 and requests the fourth segment, then switches to rep1 and requests the fifth and sixth segments, and so on.
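The request pattern in this example can be reproduced with a simple rate-based selection rule. The following is a minimal sketch only: the rule (pick the highest code rate that fits the measured throughput) and the per-segment throughput figures are assumptions made for illustration, not something the DASH standard or this embodiment prescribes.

```python
# Code rates in bits per second, taken from the rep1/rep2/rep3 example.
REPRESENTATIONS = {"rep1": 4_000_000, "rep2": 2_000_000, "rep3": 1_000_000}


def pick_representation(throughput_bps, representations=REPRESENTATIONS):
    """Return the highest-rate representation that fits the throughput.

    Falls back to the lowest-rate representation when nothing fits.
    """
    fitting = {rid: bps for rid, bps in representations.items()
               if bps <= throughput_bps}
    if not fitting:
        return min(representations, key=representations.get)
    return max(fitting, key=fitting.get)


# Hypothetical throughput estimates, one per requested segment (assumption).
throughput_per_segment = [1.2e6, 1.1e6, 1.5e6, 2.5e6, 4.5e6, 5.0e6]
requests = [pick_representation(t) for t in throughput_per_segment]
print(requests)
```

With these assumed throughput values the client requests three segments of rep3, one of rep2, and two of rep1.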
  • The segments of each representation can be stored end to end in a single file, or each segment can be stored as a separate small file.
  • the segment may be packaged in accordance with the standard ISO/IEC 14496-12 (ISO BMFF (Base Media File Format)) or may be encapsulated in accordance with ISO/IEC 13818-1 (MPEG-2 TS).
  • The media presentation description is called the MPD.
  • The MPD can be an XML file.
  • The information in the file is described in a hierarchical manner. As shown in FIG. 2, the information of an upper level is completely inherited by the next level. The file describes media metadata that lets the client understand the media content available on the server and use this information to construct the HTTP URL for requesting a segment.
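As a rough illustration of how a client might turn MPD information into a segment request, the sketch below expands a segment template into a URL. The `$RepresentationID$` and `$Number$` identifiers are the ones used by DASH segment templates; the base URL, the template string, and the function name are hypothetical.

```python
def build_segment_url(base_url, template, representation_id, number):
    """Expand a DASH-style segment template into a full HTTP URL."""
    path = (template
            .replace("$RepresentationID$", representation_id)
            .replace("$Number$", str(number)))
    return base_url.rstrip("/") + "/" + path


# Hypothetical base URL and template (assumptions, not from the source MPD).
url = build_segment_url(
    "http://example.com/video",
    "$RepresentationID$/seg-$Number$.m4s",
    "rep2",
    4,
)
print(url)  # http://example.com/video/rep2/seg-4.m4s
```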
  • media presentation: a collection of structured data for presenting media content
  • media presentation description (MPD): a standardized description of media presentation files for providing streaming media services
  • period
  • representation: a structured data set (encoded individual media types, such as audio or video); a collection and encapsulation of one or more code streams in a transport format, one representation comprising one or more segments
  • adaptation set (AdaptationSet): a set of multiple interchangeable coded versions of the same media content component; an adaptation set contains one or more representations
  • subset
  • segment information: a media unit referenced by the HTTP uniform resource locator in the media presentation description; the segment information describes the segments
  • For the technical concepts of MPEG-DASH relevant to the present invention, refer to the relevant provisions in ISO/IEC 23009-1:2014, Information technology -- Dynamic adaptive streaming over HTTP (DASH) -- Part 1: Media presentation description and segment formats, or to the relevant provisions in earlier versions of the standard, such as ISO/IEC 23009-1:2013 or ISO/IEC 23009-1:2012.
  • Virtual reality technology is a computer simulation system with which virtual worlds can be created and experienced. It uses a computer to generate a simulation environment: an interactive, multi-source information fusion system that simulates three-dimensional dynamic views and entity behavior, and in which the user is immersed.
  • VR mainly includes simulation environment, perception, natural skills and sensing equipment.
  • the simulation environment is a computer-generated, real-time, dynamic, three-dimensional, realistic image. Perception means that the ideal VR should have the perception that everyone has.
  • there are also perceptions such as hearing, touch, force, and motion, and even smell and taste, also known as multi-perception.
  • Natural skills refer to the rotation of the person's head, eyes, gestures, or other human behaviors.
  • a sensing device is a three-dimensional interactive device.
  • In VR video (also called 360-degree video or omnidirectional video), only the video image and associated audio corresponding to the orientation of the user's head are presented.
  • The difference from normal video is that in normal video the entire video content is presented to the user, whereas in VR video only a subset of the entire video is presented to the user (English: in VR typically only a subset of the entire video region represented by the video pictures).
  • "A Spatial Object is defined as a spatial part of a content component (e.g. a region of interest, or a tile) and represented by either an Adaptation Set or a Sub-Representation."
  • The spatial relationships between spatial objects are described in the MPD.
  • A spatial object is defined as a spatial part of a content component, such as an existing region of interest (ROI) or a tile; spatial relationships can be described in an Adaptation Set and a Sub-Representation.
  • the existing DASH standard defines some descriptor elements in the MPD.
  • Each descriptor element has two attributes, schemeIdUri and value.
  • schemeIdUri identifies what the current descriptor is, and value is the parameter value of the descriptor.
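To make the descriptor structure concrete, the sketch below reads the two attributes from a single descriptor element. The element name `SupplementalProperty`, the scheme URI `urn:mpeg:dash:srd:2014`, and the numeric value string are illustrative assumptions and are not taken from the MPD of this embodiment; the attribute is spelled `schemeIdUri`, as in the DASH schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical descriptor element (assumption, for illustration only).
SNIPPET = (
    '<SupplementalProperty schemeIdUri="urn:mpeg:dash:srd:2014" '
    'value="0,0,0,640,360,1920,1080"/>'
)


def parse_descriptor(xml_text):
    """Return (scheme, parameters) from one descriptor element."""
    elem = ET.fromstring(xml_text)
    scheme = elem.get("schemeIdUri")      # identifies the descriptor type
    params = [int(v) for v in elem.get("value").split(",")]
    return scheme, params


scheme, params = parse_descriptor(SNIPPET)
print(scheme, params)
```

A client would typically dispatch on the scheme first and only then interpret the value string, since the meaning of the parameters depends on the scheme.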
  • Figure 6 is a schematic diagram showing the spatial relationship of spatial objects.
  • The image AS can be taken as a content component; AS1, AS2, AS3, and AS4 are four spatial objects included in AS, and each spatial object is associated with a space.
  • The spatial relationship of each spatial object, for example the relationship between the spaces associated with the spatial objects, is described in the MPD.
  • The server may divide the space within a 360-degree view range to obtain a plurality of spatial objects, each spatial object corresponding to one sub-view; the splicing of one or more sub-views forms a complete human-eye viewing angle.
  • the viewing angle of the human eye is usually 120 degrees * 120 degrees.
  • the viewing angle 1 corresponding to the frame 1 and the viewing angle 2 corresponding to the frame 2 are shown in FIG.
  • the server may prepare a set of video code streams for each spatial object. Specifically, the server may obtain encoding configuration parameters of each code stream in the video, and generate a code stream corresponding to each spatial object of the video according to the encoding configuration parameters of the code stream.
  • the client may request a video stream segment corresponding to a certain angle of view for a certain period of time when the video is output, and output a spatial object corresponding to the perspective.
  • If the client outputs, in the same time period, the video stream segments corresponding to all the angles of view within the 360-degree viewing angle range, the complete video image for that time period can be output over the entire 360-degree space.
  • the client may first map the spherical surface into a plane, and divide the spatial object on the plane. Specifically, the client may map the spherical surface into a latitude and longitude plan by using a latitude and longitude mapping manner.
  • FIG. 9 is a schematic diagram of a spatial object according to an embodiment of the present invention. The client can map the spherical surface into a latitude and longitude plan, and divide the latitude and longitude plan into a plurality of spatial objects such as A to I.
  • the client may also map the spherical surface into a cube, expand the plurality of faces of the cube to obtain a plan view, or map the spherical surface to other polyhedrons, and expand the plurality of faces of the polyhedron to obtain a plan view.
  • the client can also map the spherical surface to a plane by using more mapping methods, which can be determined according to the requirements of the actual application scenario, and is not limited herein. The following will be described in conjunction with FIG. 10 in a latitude and longitude mapping manner.
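The latitude and longitude mapping can be illustrated with a small lookup from a viewing direction to a spatial object. This is a sketch under stated assumptions: the plane is taken to be split into a uniform 3 x 3 grid labelled A to I in row-major order from the top-left corner, which the text does not specify, and the function name is hypothetical.

```python
def tile_for_direction(yaw_deg, pitch_deg, cols=3, rows=3):
    """Map a viewing direction to a tile label on a latitude/longitude plane.

    yaw_deg is the longitude in [-180, 180]; pitch_deg is the latitude
    in [-90, 90]. Tiles are labelled A, B, C, ... in row-major order.
    """
    u = (yaw_deg + 180.0) / 360.0      # 0..1, left to right
    v = (90.0 - pitch_deg) / 180.0     # 0..1, top to bottom
    col = min(int(u * cols), cols - 1)  # clamp the right edge into the grid
    row = min(int(v * rows), rows - 1)  # clamp the bottom edge into the grid
    return chr(ord("A") + row * cols + col)


print(tile_for_direction(0.0, 0.0))      # centre of the plane -> E
print(tile_for_direction(-180.0, 90.0))  # top-left corner -> A
```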
  • a set of DASH code streams can be prepared for each spatial object by the server.
  • Each spatial object corresponds to one sub-view, and a set of DASH code streams corresponding to each spatial object is a view code stream of each sub-view.
  • the spatial information of the spatial object associated with each image in a view code stream is the same, whereby the view code stream can be set as a static code stream.
  • the DASH code stream corresponding to the corresponding spatial object may be selected for playing according to the viewing angle currently viewed by the user.
  • the client can determine the DASH code stream corresponding to the target space object of the switch according to the new perspective selected by the user, and then switch the video play content to the DASH code stream corresponding to the target space object.
  • the nine view code streams of repA to repI correspond to the nine spatial objects of A to I in the latitude and longitude map, respectively.
  • the repA is any one of a set of DASH code streams corresponding to the space object A.
  • the repA is taken as an example in the embodiment of the present invention.
  • Similarly, each of the other sub-view code streams is any one of the set of DASH code streams of its corresponding spatial object; repB, repC, ..., and repI are taken as examples for description.
  • The segments included in the view code stream of each sub-view are aligned; that is, the segments included in each view code stream within the same time period have the same length.
  • Segment alignment across the different view code streams allows the video image presentation to switch between segments of different view code streams as the view angle switches. For example, after the third segment of repD ends, the user switches to the 4th segment of repB, and then, when the 5th segment of repB finishes playing, switches to the 6th segment of repC. The video image presented by the client is switched from the picture of view D to the picture of view B, and then to the picture of view C.
  • the embodiment of the present invention provides a switching code stream having a segment duration different from a view code stream, and the playback duration corresponding to the segment included in the switching code stream is smaller than the playback duration of the segment included in the corresponding view code stream.
  • Each set of switching code streams corresponds to a set of view code streams (as shown in FIG. 11, repA represents a set of view code streams, repA' represents a set of switch code streams), and a set of switch code streams includes one or more switch code streams.
  • Each group of switching code streams corresponds to a spatial object.
  • the switching code stream and the corresponding view code stream correspond to the same spatial object, that is, the content components of the code stream segment of the same time period included in the switching code stream and the corresponding view code stream are the same.
  • the server prepares a set of switching code streams for each sub-view while preparing the view code stream of the video stream data, that is, each group of view code streams corresponds to a set of switching code streams.
  • Each set of view code streams and their corresponding switched code streams contain the same sub-views (ie, the same spatial objects), except that the segment length of the view code stream is longer, and the segment length of the switch code stream is shorter.
  • The client first selects the switching code stream, so that it can present high-quality video of the new perspective after a short time; when the client detects a segment of the switching code stream at which switching is possible, it can move to the view code stream.
  • The client's representation is then switched from the switching code stream to the view code stream, so that the best user experience can be maintained under the same bandwidth conditions.
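The benefit of shorter segments can be seen with a little arithmetic. The sketch below assumes that a client joins a stream at the next segment boundary, and it uses hypothetical durations of 4 seconds for a view code stream segment and 1 second for a switching code stream segment; neither figure comes from the text.

```python
import math


def wait_until_next_segment(switch_time, segment_duration):
    """Seconds from switch_time until the next segment boundary."""
    next_start = math.ceil(switch_time / segment_duration) * segment_duration
    return next_start - switch_time


# The user changes view 4.5 s into playback (assumed value).
print(wait_until_next_segment(4.5, 4.0))  # view code stream: 3.5 s
print(wait_until_next_segment(4.5, 1.0))  # switching code stream: 0.5 s
```

Under these assumptions the new perspective becomes available 3 seconds sooner through the switching code stream.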
  • In order to enable the client to identify the switching code stream, the server needs to add a syntax element corresponding to the switching code stream when generating the MPD, and the client may obtain the switching code stream information corresponding to the view code stream according to the syntax element.
  • When the server generates the MPD, a representation for describing the switching code stream may be added to the MPD; one representation may include description information of one or more switching code streams, and the representation may also be referred to as a switching code stream representation or a first representation.
  • An existing representation in the MPD for describing a view stream may be referred to as a view stream representation or a media representation or a second representation.
  • the code stream of the new perspective can be quickly selected to present a high-quality video of a new perspective.
  • Several possible representations of the MPD syntax elements are as follows. It can be understood that the MPD examples of the embodiment of the present invention show only the parts that modify the syntax elements of the MPD in the existing standard and do not show all the syntax elements of the MPD file; those skilled in the art can apply the technical solution of the embodiment of the present invention in conjunction with the relevant provisions of the DASH standard.
  • Table 2 is a syntax information table:
  • The switching code stream in the corresponding representation is marked in the MPD by the attribute @FovType. For the same viewing angle, code rate and other parameters, the client preferentially selects the representation of the switching code stream to present a new perspective.
  • the relevant MPD examples are as follows.
  • Table 3 is another syntax information table:
  • The above switch-representation has the same content as the other representations belonging to the same adaptation set, but not all of its segments can be seamlessly switched with the other representations; the representation can be switched to another representation only at the specified segments, which indicates that the representation is a switching code stream.
  • When the view angle is switched, the client first obtains segments of this representation to present the new perspective.
  • A new syntax element is added to the MPD to group the representations: one group contains the representations specified in the existing DASH standard, and the other group contains the representations of the switching code stream.
  • Examples of related MPDs are as follows:
  • the embodiment of the invention provides a method and a device for processing video data, which can improve the switching efficiency of media data segmentation and enhance the user experience of video viewing.
  • the first aspect provides a method for processing video data, which may include:
  • The client obtains the switching instruction information, which may include the above-mentioned head rotation, eye movement, gesture, or other human behavior information, and may also include input information of the user, such as keyboard input information, voice input information, and touch screen input information.
  • the identifier information includes at least one of a type identifier, a play duration indicating a segment, and switch point information.
  • the identification information used to identify the first representation may exist in multiple representations, which is more flexible and more applicable.
  • The first representation in the video is identified by the representation type identifier, and when the spatial object switching instruction is received, the client preferentially switches to a segment of the target first representation, which has a short playback duration. This improves the efficiency of code stream segment switching, quickly presents the video content of the switched video space area to the user, and enhances the user experience of video viewing.
  • the switching point information is used to identify the switching segment information for switching between the first representation and the second representation; wherein the switching segment information includes at least one of: a segment interval, a segment position of the first representation, and a segment position of the second representation;
  • the switching point information is a flag indicating the switching capability of the segment.
  • The switching segment information for content switching between the first representation and the second representation may be identified by the switching point information, and the switching segment information may take multiple forms, giving higher flexibility and stronger applicability.
  • the identifier information is carried in the attribute information of the representation set in which the first representation is carried in the media presentation description.
  • the identifier information is carried in the attribute information of the first representation carried in the media presentation description.
  • the identifier information is carried in the attribute information of the segment of the first representation carried in the media presentation description.
  • The identifier information used to identify the first representation may be carried in the media presentation description in multiple forms and, further, in attribute information at different locations of the media presentation description, giving higher flexibility and stronger applicability.
  • the obtaining of a target representation segment according to the current playing time and the target representation includes:
  • the segment in which the playback start time is the first time is determined as the target presentation segment.
  • The play start time of each segment may be determined according to the play duration of each segment included in the target representation, and the segment of the target representation whose play start time is closest to the current play time may be determined as the target segment of the video switch. The target segment can then be presented at its playback start time, which ensures that the video content played during view switching is coherent and presented smoothly, and enhances the user experience of video viewing.
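The selection of the target segment described above can be sketched as follows. The sketch assumes that the first segment starts at time 0 and that "closest" means the smallest absolute difference between a segment's start time and the current playing time; the function names and the sample durations are hypothetical.

```python
def segment_start_times(durations):
    """Playback start time of each segment, with the first starting at 0."""
    starts, elapsed = [], 0.0
    for duration in durations:
        starts.append(elapsed)
        elapsed += duration
    return starts


def pick_target_segment(durations, current_time):
    """Return (index, start_time) of the segment starting closest to now."""
    starts = segment_start_times(durations)
    index = min(range(len(starts)), key=lambda i: abs(starts[i] - current_time))
    return index, starts[index]


# Eight segments of 1 s each in the target representation (assumed values).
index, start = pick_target_segment([1.0] * 8, current_time=4.3)
print(index, start)  # zero-based index 4, which starts at 4.0 s
```

A client that must never jump backwards in time could instead restrict the search to start times at or after the current playing time; the text leaves that choice open.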
  • the media presentation description may refer to an example in the foregoing MPD.
  • the switching code stream may refer to the example in FIG.
  • the switching instruction information includes information indicating a viewing angle to be switched
  • The client may determine the information of the view code stream and the switching code stream according to the switching instruction information, such as the ID or storage location information of the view code stream and the ID or storage location information of the switching code stream.
  • The client may obtain the spatial object associated with the switched target view according to the switching instruction information, and then, by matching that spatial object against the spatial object associated with each switching code stream, determine a target switching code stream (or target representation) from the plurality of switching code streams.
  • The segment of the target switching code stream to be played (i.e., the target representation segment) may be determined according to the current playing time, and the corresponding HTTP request is then constructed according to the URL template included in the MPD, thereby requesting the corresponding segment of the switching code stream.
  • the segmented URL may be constructed according to the information of the target switching code stream according to the current playing time.
  • After receiving the segment of the switching code stream, the client can render it directly.
  • In an implementation manner of the embodiment of the present invention, the client subsequently switches from the switching code stream to the view code stream corresponding to the switched perspective, thereby preserving a good user experience.
  • a syntax element description of the switching point information is also added in the MPD.
  • The embodiment of the present invention describes a method for switching from a switching code stream to a view code stream. Because switching between the switching code stream and the view code stream is not possible at every segment, the embodiment provides a description method for the switching points. In the on-demand application scenario the description information is stored in the media data file; in the live application scenario it is stored in the MPD. Both modes are compatible with the existing DASH protocol, require minimal changes to the existing CDN and client, and support switching between the switching code stream and the view code stream.
  • the switching point information between the view code stream (ie, the non-switched code stream) and the switched code stream is described in the file, and the specific syntax is as follows:
  • When the flag in the sidx box takes the value 1, it may indicate that switching point information is included in the sidx box, or it may indicate the switching information of each segment.
  • FOV_group_change_Info: this information describes switching between the current segment and representations with other duration/FOVGroup/FovType attributes.
  • the switching point information between the view code stream and the switched code stream is described in the MPD, and the specific syntax is as shown in Table 4 below.
  • Table 4 is another syntax information table:
  • the FOV_group_change_Info information is added to an existing sidx box, and the information may also be added to other boxes, such as
  • the client may implement switching from the switching code stream to the view code stream in the following manner.
  • the client obtains an index segment in the switching code stream, parses the sidx information, and obtains information of the segment switching point (FOV_group_change_Info);
  • When the client detects switching point information for a segment, the current segment can be switched to a segment of the view code stream. The client uses FOV_group_change_Info and the playback start time of the current segment to find the segment in the view code stream that can be switched to from the current segment, and uses that information to construct the URL of the view code stream segment, as shown in FIG.
  • For example, the client detects the FOV_group_change_Info information of the 5th segment of the switching code stream repA' and determines that the 5th segment can be switched to repA. According to the playback start time of the 5th segment of repA', the client finds the segment in repA whose playback start time is closest to it (the second segment in repA) and constructs the segment URL. Using the constructed URL, the client then requests that segment of the view code stream.
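The repA' to repA example can be traced with the sketch below. It assumes that the switching point information has already been parsed into one boolean flag per switching code stream segment, and it uses hypothetical durations of 1 second for repA' segments and 4 seconds for repA segments; with those values the 5th segment of repA' and the 2nd segment of repA both start at 4 seconds. The function names are illustrative.

```python
def start_times(durations):
    """Playback start time of each segment, with the first starting at 0."""
    starts, elapsed = [], 0.0
    for duration in durations:
        starts.append(elapsed)
        elapsed += duration
    return starts


def plan_switch_back(switch_durations, switch_flags, view_durations, from_index=0):
    """Find where to leave the switching stream and where to join the view stream.

    Returns (switching segment index, view segment index), zero-based,
    or None when no remaining segment is flagged as a switching point.
    """
    switch_starts = start_times(switch_durations)
    view_starts = start_times(view_durations)
    for i in range(from_index, len(switch_flags)):
        if switch_flags[i]:
            j = min(range(len(view_starts)),
                    key=lambda k: abs(view_starts[k] - switch_starts[i]))
            return i, j
    return None


# Assumed data: only the 5th repA' segment carries switching point information.
flags = [False, False, False, False, True, False, False, False]
plan = plan_switch_back([1.0] * 8, flags, [4.0, 4.0])
print(plan)  # (4, 1): 5th segment of repA' -> 2nd segment of repA
```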
  • the second aspect provides a client, which can include:
  • an obtaining module, configured to parse the media presentation description and obtain the identifier information, where the identifier information is used to identify the first representation of the video, and the playback duration of the segment described by the first representation is smaller than the playback duration of the segment described by the second representation of the video;
  • a receiving module configured to obtain switching instruction information, where the switching instruction information is used to indicate that the current spatial object is switched to the target spatial object;
  • a determining module, configured to obtain a target representation according to the identifier information acquired by the acquiring module and the switching instruction information received by the receiving module, the target representation corresponding to the target spatial object;
  • the acquiring module is further configured to acquire a current playing time of the video, and obtain a target representation segment according to the current playing time and the target representation determined by the determining module.
  • the identifier information includes at least one of a type identifier, a play duration indicating a segment, and switch point information.
  • the switching point information is used to identify switching segment information indicating that the first representation and the second representation are switched;
  • the switching segment information includes: at least one of a segmentation interval, a segmentation location of the first representation, and a segmentation location of the second representation;
  • the switching point information is a flag indicating the switching capability of the segment.
  • the identifier information is carried in the attribute information of the representation set in which the first representation is carried in the media presentation description.
  • the identifier information is carried in the attribute information of the first representation carried in the media presentation description.
  • the identifier information is carried in the attribute information of the segment of the first representation carried in the media presentation description.
  • the acquiring module is specifically configured to:
  • the segment in which the playback start time is the first time is determined as the target presentation segment.
  • the third aspect provides a method for processing video data, which may include:
  • the server generates a first representation of the video according to the encoding configuration parameter of the first representation, and generates a second representation of the video according to the encoding configuration parameter of the second representation, where the playing duration of the segment described by the first representation is less than the playing duration of the segment described by the second representation;
  • the server generates a media presentation description, where the media presentation description includes identification information, and the identifier information is used to identify a first representation of the video.
  • the identifier information describes a playing duration of the segment of the first representation and a playing duration of the segment of the second representation
  • the playing duration of the segment of the first representation is less than the playing duration of the segment of the second representation of the video.
  • the identifier information describes switching point information of the first representation and the segment of the second representation.
  • the switching point information is used to identify switching segment information for content switching between the first representation and the second representation;
  • the switching segment information includes: at least one of a segmentation interval, a segmentation location of the first representation, and a segmentation location of the second representation;
  • the switching point information is a flag indicating the switching capability of the segment.
  • a fourth aspect provides a server, which can include:
  • a generating module, configured to generate a first representation of the video according to the encoding configuration parameter of the first representation, and generate a second representation of the video according to the encoding configuration parameter of the second representation, where the playing duration of the segment described by the first representation is less than the playing duration of the segment described by the second representation;
  • a description module configured to generate a media presentation description, where the media presentation description includes identification information, where the identifier information is used to identify a first representation of the video.
  • the identifier information describes a playing duration of the segment of the first representation and a playing duration of the segment of the second representation
  • the playing duration of the segment of the first representation is less than the playing duration of the segment of the second representation of the video.
  • the identifier information describes switching point information of the first representation and the segment of the second representation.
  • the switch point information is used to identify switch segment information that is used for content switching between the first representation and the second representation;
  • the switching segment information includes: at least one of a segmentation interval, a segmentation location of the first representation, and a segmentation location of the second representation;
  • the switching point information is a flag indicating the switching capability of the segment.
  • the fifth aspect provides a method for processing video data based on HTTP dynamic adaptive streaming, which may include:
  • the media presentation description including at least two representations, the representation including attribute information describing a media data segment, the media presentation description further comprising at least two switching code stream representations, the switching code stream representation including attribute information describing a data segment of the switching code stream;
  • the spatial objects associated with the at least two representations have a one-to-one correspondence with the spatial objects associated with the at least two switching code stream representations, and the playing duration corresponding to a media data segment described in a media representation is greater than the playing duration corresponding to a data segment of the switching code stream described in the switching code stream representation corresponding to that media representation;
  • determining, according to the target switching code stream representation, target switching code stream request information, where the switching code stream request information is used to request a partial data segment of the target switching code stream.
  • the media presentation description further includes spatial information of the spatial object associated with the switching code stream representation, where the spatial information is used to describe a spatial relationship between that spatial object and its associated content component.
  • the media presentation description includes information of an adaptive set for describing attributes of media data segments of a plurality of replaceable encoded versions of the same media content component.
  • the information of the adaptive set includes information of the at least two switching code stream representations.
  • the media presentation description includes information of a representation, the representation being a set and encapsulation of one or more code streams in a transmission format
  • the information of the representation includes information of the at least two switching code stream representations.
  • the information of the switching code stream representation includes at least one of a code stream type identifier, a play duration of the code stream segment, and switching point information.
  • the switching point information is used to identify switching segment information of a switching between a switching code stream and a non-switching code stream;
  • the switching segment information includes: at least one of a code stream segmentation interval, a code stream segmentation position of the switching code stream, and a code stream segmentation position of the non-switching code stream;
  • the switching point information is a flag indicating the switching capability of the segment.
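  • The client-side selection in the fifth aspect can be sketched as follows: given the spatial object the user has switched to, pick the switching code stream representation associated with that object and work out which of its segments covers the current playback time. The `Representation` dataclass, its field names, and both helper functions are hypothetical illustrations, not structures defined in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Representation:
    rep_id: str
    spatial_object: str        # label of the associated spatial object, e.g. "A" .. "I"
    segment_duration_ms: int   # playing duration of one segment
    is_switching: bool         # True for a switching code stream representation


def select_target_switching_rep(reps, target_object):
    """Return the switching representation associated with target_object."""
    for rep in reps:
        if rep.is_switching and rep.spatial_object == target_object:
            return rep
    raise LookupError(f"no switching representation for {target_object!r}")


def segment_number_at(rep, playback_time_ms):
    """1-based number of the segment that contains playback_time_ms."""
    return playback_time_ms // rep.segment_duration_ms + 1


# One view representation and one switching representation per spatial object.
reps = [
    Representation("repB", "B", 5000, False),
    Representation("repB-sw", "B", 200, True),
    Representation("repD", "D", 5000, False),
    Representation("repD-sw", "D", 200, True),
]
target = select_target_switching_rep(reps, "B")
first_segment = segment_number_at(target, playback_time_ms=12_300)
```

  • The request information would then name only the segments of `target` from `first_segment` onward, which is what "a partial data segment of the target switching code stream" amounts to in this sketch.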
  • a sixth aspect provides a client, which can include:
  • a receiving module configured to receive a media presentation description, where the media presentation description includes at least two representations, the representation includes attribute information describing a media data segment, and the media presentation description further includes at least two switching code stream representations
  • the switching code stream representation includes attribute information describing a data segment of the switching code stream, wherein the spatial objects associated with the at least two representations have a one-to-one correspondence with the spatial objects associated with the at least two switching code stream representations
  • the acquiring module is further configured to obtain a target switching code stream representation according to the switching instruction information and the media presentation description, where the target switching code stream representation is one of the at least two switching code stream representations;
  • the acquiring module is further configured to obtain target switching code stream request information according to the target switching code stream representation, where the switching code stream request information is used to request a partial data segment of the target switching code stream.
  • the media presentation description further includes spatial information of the spatial object associated with the switching code stream representation, where the spatial information is used to describe a spatial relationship between that spatial object and its associated content component;
  • the obtaining module is specifically configured to:
  • the media presentation description includes information of an adaptive set for describing attributes of media data segments of a plurality of replaceable encoded versions of the same media content component.
  • the information of the adaptive set includes information of the at least two switching code stream representations.
  • the media presentation description includes information of a representation, the representation being a set and encapsulation of one or more code streams in a transmission format
  • the information of the representation includes information of the at least two switching code stream representations.
  • the information of the switching code stream representation includes at least one of a code stream type identifier, a play duration of the code stream segment, and switching point information.
  • the switching point information is used to identify switching segment information of a switching between a switching code stream and a non-switching code stream;
  • the switching segment information includes: at least one of a code stream segmentation interval, a code stream segmentation position of the switching code stream, and a code stream segmentation position of the non-switching code stream;
  • the switching point information is a flag indicating the switching capability of the segment.
  • the seventh aspect provides a method for processing video data based on HTTP dynamic adaptive streaming, which may include:
  • the media presentation description including at least two representations, the representation comprising at least one segment, a segment duration of the first representation of the at least two representations being less than a segment duration of the second representation;
  • the spatial object associated with the first representation corresponds to the spatial object associated with the second representation
  • the first representation carries switching point information.
  • the media presentation description carries the identifier information
  • the identifier information includes at least one of a type identifier, a play duration indicating a segment, and switch point information.
  • the switching point information is used to identify the switching segment information indicating the switching between the first code stream and the second code stream;
  • the switching segment information includes: at least one of a segmentation interval, a segmentation location of the first representation, and a segmentation location of the second representation;
  • the switching point information is a flag indicating the switching capability of the segment.
  • the switching point information is carried in a designated box in the first representation.
  • the designated box is a sidx box included in the first representation, and the sidx box is used to describe segmentation information.
  • the representation type identifier is used to identify the first representation.
  • the media presentation description includes information of an adaptation set
  • the adaptation set is used to describe attributes of media data segments of the plurality of replaceable coded versions of the same media content component.
  • the information of the adaptive set includes the identifier information.
  • the media presentation description includes information of a representation, the representation being a set and encapsulation of one or more code streams in a transmission format
  • the information of the representation includes the identifier information.
  • the media presentation description includes information of a descriptor, and the descriptor is used to describe spatial information of the associated spatial object;
  • the information of the descriptor includes the identifier information.
  • the eighth aspect provides a client, which can include:
  • a receiving module configured to receive a media presentation description, where the media presentation description includes information of at least two representations, the representation includes at least one segment, and a segment duration of the first representation of the at least two representations is less than a segment duration of the second representation; wherein the spatial object associated with the first representation corresponds to the spatial object associated with the second representation;
  • the acquiring module is further configured to acquire the segment of the first representation according to the representation switching instruction, and acquire the segment of the second representation after a preset time.
  • the first representation carries switching point information.
  • the media presentation description carries the identifier information
  • the identifier information includes at least one of a type identifier, a play duration indicating a segment, and switch point information.
  • the switching point information is used to identify the switching segment information indicating the switching between the first code stream and the second code stream;
  • the switching segment information includes: at least one of a segmentation interval, a segmentation location of the first representation, and a segmentation location of the second representation;
  • the switching point information is a flag indicating the switching capability of the segment.
  • the switching point information is carried in a designated box in the first representation.
  • the designated box is a sidx box included in the first representation, and the sidx box is used to describe segmentation information.
  • the representation type identifier is used to identify the first representation.
  • the media presentation description includes information of an adaptation set
  • the adaptation set is used to describe attributes of media data segments of the plurality of replaceable coded versions of the same media content component.
  • the information of the adaptive set includes the identifier information.
  • the media presentation description includes information of a representation, the representation being a set and encapsulation of one or more code streams in a transmission format
  • the information of the representation includes the identifier information.
  • the media presentation description includes information of a descriptor, and the descriptor is used to describe spatial information of the associated spatial object;
  • the information of the descriptor includes the identifier information.
  • the embodiment of the present invention may identify the switching code stream and the view code stream included in the video according to the identifier information carried in the media presentation description.
  • the target switching code stream corresponding to the target spatial object may be identified from the plurality of switching code streams of the video according to the target spatial object, and then the target switching code may be determined according to the video playing time when the spatial object is switched.
  • the playback duration of a segment of the switching code stream is smaller than the playback duration of a segment of the view code stream. Therefore, when the spatial object is switched, playback can switch to a switching code stream segment with a shorter playback duration, which makes segment switching for the new spatial object faster and enhances the user experience.
  • the segment of the target view code stream corresponding to the target space object may be obtained and presented, and the segment switch play of the corresponding view code stream when the space object is switched is completed.
  • the client can switch to the playback of the target view code stream after completing the intermediate transition of the space object switching by the target switching code stream, which can ensure the stability of the video playback after the space object is switched, and enhance the user experience of the video viewing.
  • FIG. 1 is a schematic diagram of an example of a framework for DASH standard transmission used in system layer video streaming media transmission
  • FIG. 2 is a schematic structural diagram of an MPD transmitted by a DASH standard used for system layer video streaming media transmission
  • FIG. 3 is a schematic diagram of switching of a code stream segment according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a segmentation storage manner in code stream data
  • FIG. 5 is another schematic diagram of a segmentation storage manner in code stream data
  • Figure 6 is a schematic diagram showing the spatial relationship of a spatial object
  • FIG. 7 is a schematic diagram of a change in a spatial object corresponding to a change in a viewing angle
  • FIG. 8 is a schematic flowchart of a method for processing video data according to an embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of a spatial object according to an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of segmentation of a DASH code stream
  • FIG. 11 is another schematic diagram of segmentation of a DASH code stream
  • FIG. 12 is another schematic diagram of a change in a spatial object corresponding to a change in a viewing angle
  • FIG. 13 is a schematic structural diagram of a client according to an embodiment of the present invention.
  • FIG. 14 is a schematic structural diagram of a server according to an embodiment of the present invention.
  • FIG. 15 is another schematic structural diagram of a client according to an embodiment of the present invention.
  • FIG. 16 is another schematic structural diagram of a client according to an embodiment of the present invention.
  • FIG. 1 is a schematic diagram of a frame example of DASH standard transmission used in system layer video streaming media transmission.
  • the data transmission process of the system layer video streaming media transmission scheme includes two processes: the process in which the server side (such as an HTTP server or a media content preparation server, hereinafter referred to as the server) generates media data for video content and responds to client requests, and the process in which the client (such as an HTTP streaming client) requests and obtains media data from the server.
  • the media data includes a media presentation description (MPD) file and a media code stream.
  • the MPD on the server includes a plurality of representations (Representation), each representation describing a plurality of segments.
  • the HTTP streaming request control module of the client obtains the MPD sent by the server, analyzes the MPD, determines the information of each segment of the video code stream described in the MPD, and thereby determines the segment to be requested. It then requests the corresponding segment from the server through an HTTP request, and the segment is decoded and played by the media player.
  • the media data generated by the server for the video content includes a video code stream corresponding to different video qualities of the same video content, and an MPD file of the video code stream.
  • the server generates, for the video content of the same episode, a low-resolution, low-bit-rate, low-frame-rate code stream (such as 360p resolution, 300 kbps code rate, 15 fps frame rate), a medium-resolution, medium-bit-rate, high-frame-rate code stream (such as 720p resolution, 1200 kbps code rate, 25 fps frame rate), and a high-resolution, high-bit-rate, high-frame-rate code stream (such as 1080p resolution, 3000 kbps code rate, 25 fps frame rate).
  • FIG. 2 is a schematic structural diagram of an MPD of a system transmission scheme DASH standard.
  • each representation describes information of a plurality of segments (Segment) in time order, such as Initialization Segment, Media Segment 1, Media Segment 2, ..., Media Segment 20, and so on.
  • the representation may include segment information such as a play start time, a play duration, and a network storage address (for example, a network storage address expressed in the form of a Uniform Resource Locator (URL)).
  • In the process of the client requesting and obtaining the media data from the server, when the user chooses to play a video, the client obtains the corresponding MPD from the server according to the video content requested by the user.
  • the client sends a request for downloading the code stream segment corresponding to the network storage address to the server according to the network storage address of the code stream segment described in the MPD, and the server sends the code stream segment to the client according to the received request.
  • After the client obtains the code stream segment sent by the server, it can perform decoding, playback, and the like through the media player.
  • the system layer video streaming media transmission scheme adopts the DASH standard, and realizes the transmission of video data by analyzing the MPD by the client, requesting the video data to the server as needed, and receiving the data sent by the server.
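  • To make the request flow concrete, the sketch below parses a toy MPD and picks the representation whose bandwidth fits an estimated throughput. The three bandwidth values are the example code rates given above; the representation ids, the stripped-down MPD (no namespace, no segment information), and the selection rule in `pick_representation` are assumptions for illustration only, since the text does not prescribe how a client chooses among representations.

```python
import xml.etree.ElementTree as ET

# Toy MPD: only the attributes needed for the selection are present.
MPD_XML = """
<MPD>
  <Period>
    <AdaptationSet>
      <Representation id="low" bandwidth="300000"/>
      <Representation id="mid" bandwidth="1200000"/>
      <Representation id="high" bandwidth="3000000"/>
    </AdaptationSet>
  </Period>
</MPD>
""".strip()


def pick_representation(mpd_xml: str, available_bps: int) -> str:
    """Return the id of the highest-bandwidth representation that fits available_bps."""
    root = ET.fromstring(mpd_xml)
    reps = sorted(
        (int(rep.get("bandwidth")), rep.get("id")) for rep in root.iter("Representation")
    )
    chosen = reps[0][1]  # fall back to the lowest code rate if nothing fits
    for bandwidth, rep_id in reps:
        if bandwidth <= available_bps:
            chosen = rep_id
    return chosen


selected = pick_representation(MPD_XML, available_bps=2_000_000)
```

  • After this choice the client would request the segments of the selected representation one by one over HTTP, as described above.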
  • FIG. 3 is a schematic diagram of switching of a code stream segment according to an embodiment of the present invention.
  • the server can prepare three different video quality stream data for the same video content (such as a movie), and describe the three different video quality stream data in the MPD using three Representations.
  • the above three Representations (hereinafter referred to as rep) can be assumed to be rep1, rep2, rep3, and the like.
  • rep1 is a high-definition video with a code rate of 4 Mbps (megabits per second)
  • rep2 is a standard-definition video with a code rate of 2 Mbps
  • rep3 is a normal video with a code rate of 1 Mbps.
  • Each rep segment contains a video stream within a time period.
  • each rep describes the segments of each time segment according to the time series, and the segment lengths of the same time period are the same, thereby enabling content switching of segments on different reps.
  • the segments shaded in the figure are the segment data requested by the client. The first 3 segments requested by the client are segments of rep3, and when requesting the 4th segment the client may request the 4th segment of rep2, so that playback switches to the 4th segment of rep2 after the 3rd segment of rep3 ends.
  • the playback end point of the 3rd segment of rep3 (the end time of its playback) is the playback start point of the 4th segment (the start time of its playback), and is also the playback start point of the 4th segment of rep2 or rep1, which achieves alignment of segments on different reps. After the client requests the 4th segment of rep2, it switches to rep1 and requests the 5th and 6th segments of rep1. It can then switch to rep3 and request the 7th segment of rep3, then switch to rep1 and request the 8th segment of rep1.
  • when playback switches between segments of different reps, the switch takes place at the end of a segment of the previous rep (for example, the third segment on rep3 in FIG. 3, marked as segment3): the playback end time of segment3 is the playback start time of segment4, and the video content of segment3 and segment4 is continuous.
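  • The aligned-segment switching in this example can be written out as a small timeline. Because segments of the same time period have the same length on every rep, segment n starts at the same instant whichever rep it is taken from. The 2-second segment duration and both helper functions are assumptions; the text gives the request order (rep3, rep3, rep3, rep2, rep1, rep1, rep3, rep1) but no duration.

```python
SEGMENT_DURATION_S = 2  # assumed; the text does not state a segment duration

# Representation requested for each segment number 1..8, as in the example above.
REQUESTS = ["rep3", "rep3", "rep3", "rep2", "rep1", "rep1", "rep3", "rep1"]


def build_timeline(requests, duration_s=SEGMENT_DURATION_S):
    """Return (segment number, rep, start, end) tuples for aligned segments."""
    timeline = []
    for number, rep in enumerate(requests, start=1):
        start = (number - 1) * duration_s
        timeline.append((number, rep, start, start + duration_s))
    return timeline


def switch_points(timeline):
    """Segment numbers at which the requested representation changes."""
    return [
        cur[0]
        for prev, cur in zip(timeline, timeline[1:])
        if prev[1] != cur[1]
    ]


timeline = build_timeline(REQUESTS)
switches = switch_points(timeline)
```

  • Every entry ends exactly where the next one starts, so playback stays continuous even though the rep changes at segments 4, 5, 7 and 8.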
  • The segments of each rep can be stored end to end in one file, or each segment can be stored as a separate small file.
  • the segment may be packaged in accordance with the standard ISO/IEC 14496-12 (ISO BMFF (Base Media File Format)) or may be encapsulated in accordance with ISO/IEC 13818-1 (MPEG-2 TS). It can be determined according to the requirements of the actual application scenario, and no limitation is imposed here.
  • FIG. 4 is a schematic diagram of a segment storage manner in the code stream data, and FIG. 5 is another schematic diagram of a segment storage manner in the code stream data. All the segments on the same rep may be stored in one file, as shown in FIG. 5.
  • alternatively, each segment of repA is stored separately as a file, and each segment of repB is also stored separately as a file.
  • in that case, the server can describe information such as the URL of each segment in the MPD of the code stream in the form of a template or a list.
  • when the segments are stored in one file, the server may use an index segment (that is, sidx in FIG. 5) in the MPD of the code stream to describe related information of each segment.
  • the index segment describes the byte offset of each segment in the file in which it is stored, the size of each segment, and the playing duration of each segment.
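  • As an illustration of how an index segment yields the offset, size and duration of each segment, the sketch below reads a `sidx` box laid out as in ISO/IEC 14496-12 (SegmentIndexBox). The function names and the returned dictionary keys are hypothetical, and `make_sidx` only builds synthetic test data; offsets are counted from the first byte after the `sidx` box, which is how the box defines them, so a real client would add the box's own end position in the file.

```python
import struct


def parse_sidx(box: bytes):
    """Parse a 'sidx' box and return offset, size and duration per reference."""
    _size, box_type = struct.unpack_from(">I4s", box, 0)
    if box_type != b"sidx":
        raise ValueError("not a sidx box")
    version = box[8]
    pos = 12  # skip size, type, version and the 24-bit flags
    _reference_id, timescale = struct.unpack_from(">II", box, pos)
    pos += 8
    # earliest_presentation_time and first_offset are 32-bit in version 0, else 64-bit.
    fmt = ">II" if version == 0 else ">QQ"
    _earliest, first_offset = struct.unpack_from(fmt, box, pos)
    pos += struct.calcsize(fmt)
    _reserved, count = struct.unpack_from(">HH", box, pos)
    pos += 4

    offset = first_offset  # counted from the first byte after the sidx box
    entries = []
    for _ in range(count):
        ref, duration, sap = struct.unpack_from(">III", box, pos)
        pos += 12
        size = ref & 0x7FFFFFFF  # low 31 bits; the top bit is reference_type
        entries.append({
            "offset": offset,
            "size": size,
            "duration_s": duration / timescale,
            "starts_with_sap": bool(sap >> 31),
        })
        offset += size
    return entries


def make_sidx(timescale, refs):
    """Build a synthetic version-0 sidx box from (size, duration) pairs (demo data only)."""
    body = struct.pack(">B3xIIIIHH", 0, 1, timescale, 0, 0, 0, len(refs))
    for size, duration in refs:
        body += struct.pack(">III", size, duration, (1 << 31) | (1 << 28))
    return struct.pack(">I4s", 8 + len(body), b"sidx") + body


index = parse_sidx(make_sidx(1000, [(1000, 200), (1500, 200)]))
```

  • With such an index the client can turn "segment k" into an HTTP byte-range request against the single stored file.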
  • the rendering space of the VR video is a 360-degree space, which exceeds the normal visual range of the human eye; therefore, the user sees only part of this space at any moment, according to the viewing angle (that is, the field of view, FOV).
  • the user views different views, and the video images that are viewed will also be different. Therefore, the content of the video playback needs to change as the user's perspective changes.
  • Box 1 and box 2 are spatial objects corresponding to two different perspectives of the user, and different spatial objects display different segments of the video code stream.
  • the user can switch the viewing angle from box 1 to box 2 by turning the eyes or the head, or by switching the screen of the video viewing device.
  • the video image viewed when the user's perspective is box 1 is the video image presented by the content of one segment of the video stream.
  • when the user's perspective is switched to box 2, the video image viewed by the user should also switch to the video image presented at that moment by the spatial object corresponding to box 2.
  • that video image is the content of another segment.
  • the method and device for processing video data provided by the embodiments of the present invention can provide a more efficient and better manner of switching the visual experience for the switching of video stream segments caused by the view switching.
  • FIG. 8 is a schematic flowchart diagram of a method for processing video data according to an embodiment of the present invention.
  • the method provided by the embodiment of the present invention includes the following steps:
  • the server may divide the space within the 360-degree view range to obtain a plurality of spatial objects, each spatial object corresponding to a sub-view of the user, for example, the spatial object 1 corresponding to box 1 and the spatial object 2 corresponding to box 2 described in FIG. 7. Further, the server may prepare a set of video code streams for each spatial object. Specifically, the server may obtain the encoding configuration parameters of each code stream in the video, and generate the code stream corresponding to each spatial object of the video according to those encoding configuration parameters.
  • when outputting video, the client may request the video segment corresponding to a certain sub-view in a certain period of time and output it to the spatial object corresponding to that view.
  • if the client outputs the video segments corresponding to all the sub-views within the 360-degree viewing angle range in the same period of time, the complete video image for that period can be output across the entire 360-degree space.
  • the client may first map the spherical surface into a plane, and divide the space on the plane. Specifically, the client may map the spherical surface into a latitude and longitude plan by using a latitude and longitude mapping manner.
  • FIG. 9 is a schematic diagram of a spatial object according to an embodiment of the present invention. The client can map the spherical surface into a latitude and longitude plan, and divide the latitude and longitude plan into a plurality of spatial objects such as A to I. Further, the client can also map the sphere to a cube.
  • the plurality of faces of the cube are developed to obtain a plan view, or the spherical surface is mapped to another polyhedron, and a plurality of faces of the polyhedron are developed to obtain a plan view or the like.
  • the client can also map the spherical surface to a plane by using more mapping methods, which can be determined according to the requirements of the actual application scenario, and is not limited herein. The following will be described in conjunction with FIG. 9 in a latitude and longitude mapping manner.
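  • The latitude and longitude mapping can be sketched as a lookup from a viewing direction to the spatial object that contains it. The 3 x 3 grid, the row-by-row labelling from A at the top left to I at the bottom right, the angle conventions, and the function `spatial_object_for` are assumptions made for this example; the text only states that the plan is divided into spatial objects A to I.

```python
def spatial_object_for(yaw_deg: float, pitch_deg: float, cols: int = 3, rows: int = 3) -> str:
    """Map a viewing direction to the label of a cell on the latitude and longitude plan.

    yaw_deg is the longitude in degrees (wrapped into [-180, 180)),
    pitch_deg is the latitude in degrees (clamped to [-90, 90]).
    Cells are labelled A, B, C, ... row by row starting at the top-left corner.
    """
    pitch_deg = max(-90.0, min(90.0, pitch_deg))
    x = (yaw_deg + 180.0) % 360.0 / 360.0   # 0..1 from left to right
    y = (90.0 - pitch_deg) / 180.0          # 0..1 from top to bottom
    col = min(int(x * cols), cols - 1)
    row = min(int(y * rows), rows - 1)
    return chr(ord("A") + row * cols + col)


centre = spatial_object_for(0.0, 0.0)
```

  • A client could call this whenever the head orientation changes and compare the result with the spatial object currently being played to detect a view switch.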
  • a set of DASH code streams can be prepared for each spatial object by the server.
  • Each spatial object corresponds to one sub-view
  • a set of DASH code streams corresponding to each spatial object is a view code stream of each sub-view.
  • the view code stream of each sub-view is part of the entire video stream, and the view code streams of all sub-views constitute a complete video stream. That is, in a specific implementation, the set of DASH code streams corresponding to each spatial object are view code streams, the entire video can be divided into multiple view code streams, and the view code stream corresponding to a specific spatial object (set as a specified spatial object) can be referred to as a specified view code stream.
  • the corresponding DASH code stream corresponding to one or more spatial objects may be selected for playing according to the viewing angle currently viewed by the user.
  • the client can determine the DASH code stream (or the target view code stream) corresponding to the target spatial object of the switch according to the new perspective selected by the user, and then switch the video play content to the DASH code stream corresponding to the target spatial object.
  • FIG. 10 is a schematic diagram of segmentation of a DASH code stream.
  • in FIG. 10, the view code streams repA to repI correspond respectively to the 9 spatial objects A to I in the latitude and longitude plan.
  • repA is any one of the set of DASH code streams corresponding to spatial object A, and the embodiment of the present invention is described using repA as an example.
  • likewise, the code stream of each other sub-view is any one of the set of DASH code streams corresponding to its spatial object, and repB, repC, ..., and repI are taken as examples for description.
  • the segments included in the view code streams of the sub-views are aligned, that is, the segments of each view code stream in the same time period have the same length.
  • the segment alignment of the code streams of different viewing angles enables the video content to switch seamlessly between view code streams as the viewing angle is switched. For example, the user switches to the fourth segment of repB after the third segment of repD ends, and then switches to the sixth segment of repC when the fifth segment of repB ends.
  • the video image presented by the client is switched from the picture of view D to the picture of view B, and then to the picture of view C.
  • in the switching mode of the view code stream shown in FIG. 10, if the client has just started playing the third segment of repD, whose duration is 5 seconds, and the user switches the angle of view from view D to view B, the client needs to wait until the third segment finishes playing before switching to the fourth segment of repB, so the user waits 5 seconds before seeing the video image of view B.
  • a delay of 5 s is uncomfortable for the user; usually a delay of more than 200 ms already feels uncomfortable.
  • if the duration of the segments of the view code stream is simply shortened, for example to 200 ms, the presentation time of the video image of the new view angle is shortened when the view angle is switched, but the compression performance of the video is seriously affected.
  • the video quality of a 200 ms segment is much worse than the video quality of a 5 s segment.
  • to reach the same video quality, a larger transmission bandwidth or higher compression performance is required, which raises the transmission bandwidth and compression performance requirements of the video stream data and increases the video output cost of view switching.
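  • The delay argument above is simple arithmetic: a client that may only change streams at a segment boundary waits for the remainder of the current segment. The sketch below computes that wait for the 5 s and 200 ms segment durations used in the text. The function name and the 10 001 ms switch instant are assumptions, and download and decoding latency are ignored.

```python
def wait_until_next_boundary_ms(switch_time_ms: int, segment_duration_ms: int) -> int:
    """Time from the view change until the next segment boundary, in milliseconds."""
    elapsed_in_segment = switch_time_ms % segment_duration_ms
    return (segment_duration_ms - elapsed_in_segment) % segment_duration_ms


# The user turns 1 ms after the third 5 s segment (starting at 10 000 ms) begins.
wait_view_stream = wait_until_next_boundary_ms(10_001, 5000)
wait_short_segments = wait_until_next_boundary_ms(10_001, 200)
```

  • With 5 s segments the wait is close to the full 5 s, while with 200 ms segments it stays below 200 ms, which is the threshold the text gives for a noticeable delay.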
  • the embodiment of the present invention provides a switching code stream (set as a first representation or a switching code stream representation) whose segment duration differs from that of the view code stream: the duration of the segments included in the switching code stream is smaller than the duration of the segments of the corresponding view code stream.
  • one set of switching code streams corresponds to one set of view code streams, one set of switching code streams includes one or more switching code streams, and each set of switching code streams corresponds to one spatial object.
  • the switching code stream is associated with the same spatial object as its corresponding view code stream; that is, the video content of the code stream segments of the same time period included in the switching code stream and its corresponding view code stream is the same.
  • the server prepares a set of switching code streams for each view angle while preparing the view code stream of the video code stream data, that is, each set of view code streams corresponds to a set of switching code streams.
  • Each set of view code streams and its corresponding switching code streams cover the same sub-view (that is, the same spatial object); the only difference is that the segments of the view code stream are longer and the segments of the switching code stream are shorter.
  • the server may obtain an encoding configuration parameter of the view code stream (set as the second encoding configuration parameter) and an encoding configuration parameter of the switching code stream (set as the first encoding configuration parameter), generate a first representation according to the first encoding configuration parameter, and generate a second representation according to the second encoding configuration parameter.
  • the first encoding configuration parameter may include a playing duration (set as the first playing duration) of the segment of the first representation (set as the first representation segment), a first spatial object corresponding to the first representation, and the like.
  • the second encoding configuration parameter may include a playing duration (set as the second playing duration) of the segment of the second representation (set as the second representation segment), a second spatial object corresponding to the second representation, and the like.
  • The client may parse the MPD sent by the server and distinguish the switching code streams of the video from its view code streams according to the foregoing identification information: a code stream described by a representation that carries the identification information is a switching code stream, or a segment that carries the identification information is a segment of a switching code stream, and so on.
  • the identifier information may be an identifier of a code stream type (or a representation type identifier), a playback duration of the segment, or information of a switch point.
  • The server may use the foregoing identification information to describe, in the switching code stream itself, the segment positions at which the switching code stream can switch to the view code stream, or it may describe those segment positions in the MPD.
  • Among the segments of a switching code stream there may be one or more positions (switching points, that is, the positions of segments at which switching is possible) from which playback can switch to the view code stream. A view code stream and its corresponding switching code stream can be switched at the segments that the switching code stream designates as switching positions.
  • When playback switches from the switching code stream to a segment of the view code stream at a designated switching position, the video content before and after the switch is continuous.
  • Segments are aligned across the different view code streams and likewise across the different switching code streams, so playback can switch freely between the segments of different switching code streams. When playback switches between a switching code stream and a view code stream, the video content before and after the switch is continuous.
  • repA, repB, repC, and repD are view code streams corresponding to spatial objects A, B, C, and D, respectively (corresponding to the sub-view of FIG. 9).
  • repA' is one of a set of switching code streams corresponding to the spatial object A
  • repA' and repA correspond to the same sub-viewpoint
  • repA' may be a switching code stream corresponding to repA.
  • repB' may be a switching code stream corresponding to repB
  • repC' may be a switching code stream corresponding to repC
  • repD' may be a switching code stream corresponding to repD.
  • The segments of repA, repB, repC, and repD are aligned, so when the view changes, playback can switch freely between them at the end of each segment (which is also the playback start time of the next segment); that is, the content switches seamlessly.
  • The segments of repA', repB', repC', and repD' are likewise aligned, so when the view changes, playback can switch freely between them at the end of each segment (which is also the playback start time of the next segment).
  • A view code stream can switch to a switching code stream at a designated segment of the switching code stream, such as the designated segment corresponding to T2 in FIG. 11 (the second segment of the switching code stream, where T2 is the playback start time of that segment).
  • A switching code stream can switch to a segment of the view code stream at a designated switching point, such as T3 or T4 in the figure, where T3 is the playback start time of the second segment of the view code stream.
  • the view code stream and the switch code stream are described in the MPD.
  • the client requests the MPD from the server, and then can parse the MPD sent by the server, and obtain the identification information of the switching code stream from the MPD.
  • the client may also obtain the view code stream information of the view code stream from the MPD, for example, the view code streams of the foregoing repA, repB, repC, and repD.
  • The view code stream information may include the duration of each segment in the view code stream, the URL of each segment, and the like. For details, refer to the segment information described in the DASH standard.
  • the client may also obtain switching code stream information of the switching code stream from the MPD, for example, switching code stream information of the switching code stream such as repA', repB', repC', and repD'.
  • the foregoing switching code stream information may include a duration of each segment in the switching code stream, a URL related to each segment, and the like.
  • the switching code stream information further includes the foregoing identification information for identifying the switching code stream.
  • The representation type identifier is used to identify the first representation. When a spatial object switching instruction is received, the client preferentially selects segments of the first representation that corresponds to the target spatial object of the switch in order to switch the video content.
  • the client may also determine the switching code stream and the view code stream in the video according to the playing duration of the segment of the code stream.
  • The switching point information identifies the switching segment information that allows the switching code stream and the view code stream to switch content seamlessly, including: the segment interval of the switching code stream at which it can switch to the view code stream, the segment position in the switching code stream from which the switch is made, the segment position in the view code stream to which the switch is made, and the like.
  • The identification information may be carried in the media presentation description in the attribute information of the code stream set containing the switching code stream (such as the attribute information of the adaptation set); or in the attribute information of the switching code stream (such as the attribute information of the representation described above); or in the attribute information of a segment of the switching code stream (such as the attribute information of the segment described above). In a specific implementation, it may also be carried in an index segment of the target switching code stream on which video content switching is to be performed.
  • The foregoing representation type identifier may be a new syntax element in the MPD, used to indicate that the code stream described by the representation carrying the syntax element is a switching code stream.
  • Using the syntax elements added to the MPD, the client can quickly distinguish switching code streams from view code streams and then, when the view is switched, select from the switching code streams the target switching code stream corresponding to the target spatial object of the view switch.
  • the syntax elements may include: FovType, FovGroup, FOV_group_change_Info, and the like. The following describes the description of several possible MPD syntax elements:
  • Table 2 is a property information table of a syntax element:
  • The client can parse the MPD of the video code stream. If a representation in the MPD carries the FovType attribute, whatever its value, the code stream described by that representation can be identified as a switching code stream. When the view is switched, the client preferentially selects this representation, given the same view, bit rate and other parameters, to present the new view, thereby improving view switching efficiency and enhancing the user experience.
  • The other descriptions in the above example are the same as the related MPD descriptions provided in the DASH standard. For details, refer to the DASH standard; they are not limited herein. The related descriptions in the following examples can also be found in the DASH standard.
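  • The FovType check described above can be sketched as follows. This is a minimal illustration only: the MPD fragment is hypothetical, and writing FovType as an attribute of the Representation element with the value "1" is an assumption made for the example, not the syntax defined in Table 2.

```python
import xml.etree.ElementTree as ET

DASH_NS = "urn:mpeg:dash:schema:mpd:2011"

# Hypothetical MPD fragment: FovType is written as a Representation
# attribute here purely for illustration.
MPD_XML = f"""<MPD xmlns="{DASH_NS}">
  <Period>
    <AdaptationSet>
      <Representation id="repA" bandwidth="5000000"/>
      <Representation id="repA_switch" bandwidth="5000000" FovType="1"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def split_representations(mpd_xml):
    """Return (switching_ids, view_ids) based on the presence of FovType."""
    root = ET.fromstring(mpd_xml)
    switching, view = [], []
    for rep in root.iter(f"{{{DASH_NS}}}Representation"):
        # Any value of FovType marks the representation as a switching stream.
        target = switching if rep.get("FovType") is not None else view
        target.append(rep.get("id"))
    return switching, view


switching_ids, view_ids = split_representations(MPD_XML)
print(switching_ids, view_ids)
```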
  • Table 3 is an attribute information table of another syntax element:
  • The switch-representation above indicates that the representation is a switching code stream: it has the same content as the other representations in the same adaptation set, but not all of its segments can be switched seamlessly with other representations; it can switch to another representation only at designated segments.
  • When the view is switched, the client first obtains segments of this representation to present the new view.
  • a new syntax FovGroup is added to the MPD, and the representation is grouped.
  • One group is the view code stream, which is the existing representation, and the other group is the new code stream, that is, the switching code stream.
  • Grouping information is added to each representation, and the groups whose segments are freely switchable are determined from the grouping information.
  • Within each group, the representations can be switched freely; that is, the segments of representations belonging to the view code stream group can be switched freely among themselves, and the segments of representations belonging to the switching code stream group can be switched freely among themselves.
  • The segments of the two representations are aligned and can be switched seamlessly.
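  • The grouping rule above can be illustrated with a short sketch. The group values "view" and "switch" and the pairing of representation ids with groups are assumptions made for this example; the actual FovGroup values are defined by the MPD syntax.

```python
from collections import defaultdict

# Hypothetical (representation id, FovGroup value) pairs; the group values
# "view" and "switch" are illustrative only.
REPRESENTATIONS = [
    ("repA", "view"), ("repB", "view"),
    ("repA'", "switch"), ("repB'", "switch"),
]


def group_by_fov_group(reps):
    """Collect representation ids under their FovGroup value."""
    groups = defaultdict(list)
    for rep_id, group in reps:
        groups[group].append(rep_id)
    return dict(groups)


def freely_switchable(rep_x, rep_y, reps):
    """True when both representations carry the same FovGroup value."""
    lookup = dict(reps)
    return lookup[rep_x] == lookup[rep_y]


GROUPS = group_by_fov_group(REPRESENTATIONS)
print(GROUPS)
```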
  • the identifier information carried in the MPD may be an existing syntax element in the MPD, for example, a duration attribute corresponding to the segment.
  • Using the duration attribute of the segments described in the MPD, the client can treat the code stream whose segments have the shortest playback duration as the switching code stream.
  • After the switching code streams and view code streams have been determined, the client may request and play the relevant view code stream according to the view of the user watching the video, and may switch between the view code stream and the switching code stream.
  • The client may first determine, according to the view of the video the user is currently watching (referred to as the first view), the spatial object corresponding to the first view (referred to as the current spatial object), and may then determine the first view code stream corresponding to the first view (the current view code stream) according to the spatial object of each view code stream described in the MPD.
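  • A minimal sketch of this duration-based identification follows. The representation ids and the durations (in milliseconds) are made-up values standing in for what a client would read from the MPD.

```python
# Hypothetical segment durations (in milliseconds) read from the MPD for
# representations that describe the same spatial object.
SEGMENT_DURATION_MS = {"repA": 2000, "repA'": 500}


def pick_switching_stream(durations_ms):
    """Return the id of the representation with the shortest segments."""
    return min(durations_ms, key=durations_ms.get)


switching_id = pick_switching_stream(SEGMENT_DURATION_MS)
print(switching_id)
```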
  • the client may request the first view code stream from the server according to the view code stream information of the first view code stream.
  • The server can send the first view code stream to the client.
  • The client can decode and play the first view code stream. For example, assuming the first view code stream is repD shown in the figure, after the client obtains repD it can start playing repD from its first segment (which can be labeled segmentD1).
  • The identification information carried in the MPD in the embodiment of the present invention may also be carried in an .m3u8 file defined by HTTP Live Streaming (HLS) or in an .ismc file defined by Smooth Streaming (SS). This may be determined according to the requirements of the actual application scenario and is not limited herein. The embodiment of the present invention is described taking identification information carried in a DASH code stream as an example.
  • FIG. 12 is another schematic diagram of a spatial object change corresponding to a change in viewing angle.
  • each spatial object is prepared with a set of view code streams and a switching code stream.
  • The dashed box in (a), (b), and (c) of FIG. 12 represents the currently presented spatial object (i.e., the current spatial object), and the solid box represents the spatial object presented after switching (i.e., the target spatial object).
  • The view corresponding to the current spatial object includes the spatial objects A, B, D, and E; the view corresponding to the target spatial object after switching may include the spatial objects B, C, E, and F, or it may include the spatial objects C and F, which is not limited herein.
  • In FIG. 12(b), the view corresponding to the current spatial object includes the spatial objects A, B, D, and E; the view corresponding to the target spatial object after switching may include the spatial objects E, F, H, and I, or it may include the spatial objects F, H, and I, which is not limited herein.
  • the viewing angle corresponding to the current spatial object may include the spatial objects A and B; the viewing angle corresponding to the switched target spatial object includes the spatial objects E, F, H, and I, which are not limited herein.
  • the switching of the video content brought about by the spatial object switching will be described below in conjunction with step 704.
  • the perspective of the user watching the video may be monitored during the process of playing the first view code stream by the client.
  • When a view switching instruction is received (i.e., instruction information for switching from the current spatial object to the target spatial object), the target view code stream to switch to (such as repB in FIG. 11) may be determined according to the new view information carried in the view switching instruction.
  • the new view information carried in the view switching request may be a target space object of the view switch.
  • the client may select a target view code stream corresponding to the target space object from each view code stream in the video code stream according to the spatial object corresponding to each view code stream described in the MPD.
  • The client may further identify the switching code streams according to the indication information described in the MPD for each switching code stream, and then select from among them the target switching code stream (i.e., the target code stream, or target representation) corresponding to the target spatial object, such as repB' in FIG. 11.
  • the client constructs the URL of the segment to be requested according to the target switching code stream information described in the MPD.
  • the target segment can be requested from the server according to the above URL, and the target segment can be acquired and played.
  • The client may obtain the segment information of each segment of the target switching code stream described in the MPD, where the segment information may include the playback duration of each segment (hereinafter, the duration), and calculate the playback start time of each segment from the duration information; alternatively, the client calculates the playback start time of each segment from the segment duration information in the sidx box.
  • The client then selects, from the segments of the target switching code stream, the segment whose playback start time is closest to the switching trigger time (the target first representation segment, referred to as the first segment), and takes the playback start time of that segment as the moment at which playback switches from the first view code stream to the target switching code stream (referred to as the first moment).
  • the client determines the first segment, constructs the URL of the first segment and sends the URL request to the server.
  • the server may send the segmentation data of the segment to the client.
  • the client receives the view switching request at time T1, and then switches to play the video data of the first segment at time T2 after determining the first segment (assuming a second segment of repB').
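  • The selection of the first segment can be sketched as follows. The sketch assumes that "closest to the switching trigger time" means the earliest segment that starts at or after the trigger, and it uses made-up durations of 500 ms per switching segment and a trigger time T1 of 300 ms; the function names are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def segment_start_times(durations_ms):
    """Playback start time of each segment, from cumulative durations."""
    return list(accumulate(durations_ms, initial=0))[:-1]


def first_segment_after(trigger_ms, durations_ms):
    """Index and start time of the earliest segment that starts at or after
    the trigger time, or None if no such segment exists."""
    starts = segment_start_times(durations_ms)
    index = bisect_left(starts, trigger_ms)
    if index == len(starts):
        return None
    return index, starts[index]


SWITCH_DURATIONS_MS = [500] * 8   # assumed segment durations of repB'
T1 = 300                          # assumed switching trigger time
index, T2 = first_segment_after(T1, SWITCH_DURATIONS_MS)
print(index, T2)
```

  • With these values the selected segment is the second segment of the switching code stream (index 1), matching the example in which playback switches at T2.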
  • The target switching code stream is the switching code stream corresponding to the target view code stream; the video content of the target switching code stream is the same as that of the target view code stream, and the playback duration of the segments of the target switching code stream is shorter than that of the segments of the target view code stream.
  • Because the segments of the switching code stream are shorter than the segments of the view code stream, the client does not need to wait until the current segment of the current view code stream (e.g., segmentD1) finishes before switching to the new view; it can switch to the first segment (assumed to be the second segment of repB'), which improves the efficiency of code stream segment switching.
  • The video content of the switching code stream is the same as that of the corresponding view code stream. The quality of the video data of the switching code stream may be the same as, or slightly lower than, that of the corresponding view code stream. This ensures that a new view with a relatively high-quality video image is presented to the user after fast switching, avoids the discomfort that delay causes the user, and enhances the user experience of VR video viewing.
  • the target view code stream may be requested from the server according to the target view code stream information carried in the MPD.
  • The client may obtain description information (or segment information) of the switching code stream from the MPD, where the description information includes segment duration information and spatial information of the switching code stream. The segment duration information describes the duration of the segments of the switching code stream, and the spatial information describes the spatial object corresponding to the switching code stream.
  • The client may also obtain description information of the target view code stream from the MPD, where the description information includes segment duration information and spatial information of the target view code stream. The segment duration information describes the duration of the segments of the view code stream, and the spatial information describes the spatial object corresponding to the view code stream.
  • The client calculates the playback start time of each segment from the segment durations of the target view code stream, uses the spatial information to determine the view code stream that has the same view as the switching code stream, and finds in that view code stream the segment whose playback start time is closest to the current playback time; the playback start time of that segment is taken as the second moment.
  • the client may request the segment from the server according to the URL of the segment, receive and decode the segment, and then switch to the segment to play at the second moment.
  • The client may calculate the playback start time of each segment of the view code stream from the segment durations of the view code stream, and calculate the playback start time of each segment of the switching code stream from the segment durations of the switching code stream. The client may then determine a segment position at which the playback start times of the target view code stream and the target switching code stream are aligned.
  • Playback start time alignment means that when the switching code stream switches to the view code stream at that segment position, the video content played before and after the switch is continuous and not repeated.
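  • A minimal sketch of finding aligned segment positions follows, assuming both code streams start at time 0 and that durations are given in milliseconds. The durations (500 ms for the switching code stream, 2000 ms for the view code stream) and the function names are illustrative assumptions.

```python
from itertools import accumulate


def start_times(durations_ms):
    """Playback start time of each segment, from cumulative durations."""
    return list(accumulate(durations_ms, initial=0))[:-1]


def aligned_switch_points(switch_durations_ms, view_durations_ms):
    """Start times shared by both code streams, in ascending order."""
    view_starts = set(start_times(view_durations_ms))
    return [t for t in start_times(switch_durations_ms) if t in view_starts]


SWITCH = [500] * 8    # assumed switching code stream segments
VIEW = [2000] * 2     # assumed view code stream segments
ALIGNED = aligned_switch_points(SWITCH, VIEW)
print(ALIGNED)
```

  • Each value in the result is a time at which a segment of the switching code stream and a segment of the view code stream begin together, so switching there neither skips nor repeats content.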
  • the client may request the segment from the server according to the URL of the segment, receive and decode the segment, and then switch to the segment to play at the second moment.
  • The client may also switch between the target switching code stream and the target view code stream according to the switching point information described in the MPD.
  • The MPD of the video code stream generated by the server marks not only the switching code streams but also the positions at which each switching code stream can switch to the view code stream; that is, it marks the switching points between the switching code stream and the view code stream.
  • Table 4 below is a description table of the switching point indication information of the view code stream and the switching code stream:
  • FOV_group_change_Info marks information such as the switching points at which the switching code stream switches to the view code stream. The switching point information identifies the switching segment information that allows the content of the first representation (i.e., the switching code stream) and the second representation (i.e., the view code stream) to be switched seamlessly.
  • The switching segment information includes: the segment interval of the first representation at which it can switch to the second representation, the segment position in the first representation from which it switches to the second representation, the segment position in the second representation to which the first representation switches, and the like.
  • the specific MPD example is as follows:
  • the second target stream segment can be directly determined by the identifier information.
  • the switching start time of the switched code stream and the view code stream can be determined by the playback start time of the second segment of the view code stream.
  • By parsing the MPD, the client can obtain the FOV_group_change_Info information, determine the switching segment positions of each switching code stream and its corresponding view code stream, and then, according to the switching segment position information, determine the segment at which the switching code stream switches to the corresponding view code stream.
  • The switching segment closest to the playback start time of the target switching code stream may be selected as the target first representation segment, that is, the segment at which the target switching code stream switches to the target view code stream. Under these semantics, FOV_group_change_Info can be placed at the syntax level of the adaptation set or of the representation, which can be determined according to the actual application scenario and is not limited herein.
  • The client may request the target switching code stream from the server. After detecting the switching point information for switching from the switching code stream to the view code stream, the client requests, as indicated by the switching point information, the second target code stream segment of the target view code stream and presents that segment at its playback start time.
  • The switching point information between the view code stream and the switching code stream can also be described in the sidx box of the code stream.
  • The syntax of the sidx box is described in ISO/IEC 14496-12 as follows:
  • reference_ID: the ID of the code stream;
  • timescale: the time unit;
  • earliest_presentation_time: the earliest presentation time of the code stream described in the index segment, in units of timescale;
  • first_offset: the starting offset of the first segment after the index segment;
  • reference_count: the number of segments described in the index segment;
  • reference_type: 1 indicates that the segment is an index segment, and 0 indicates that the segment is media content;
  • referenced_size: the size of the segment;
  • subsegment_duration: the duration of the segment, in units of timescale;
  • starts_with_SAP: the stream access type of the segment;
  • SAP_delta_time: the earliest presentation time of the first stream access point.
  • FOV_group_change_Info: switching point identification information, indicating that the current segment (i.e., the target first representation segment) can be switched with any other representation having the same content component; that is, it gives the segment position in the target first representation at which it switches to the target second representation.
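  • The following sketch shows how a client might turn the sidx fields listed above into switching point times. It models only timescale, earliest_presentation_time, and subsegment_duration, and represents FOV_group_change_Info as a per-segment boolean flag, which is an assumption made for illustration; it does not parse a real sidx box, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class SidxReference:
    subsegment_duration: int          # in timescale units
    fov_group_change: bool = False    # hypothetical per-segment switching flag


def switch_point_times(timescale, earliest_presentation_time, references):
    """Start times, in seconds, of the segments flagged as switching points."""
    times = []
    elapsed = earliest_presentation_time
    for ref in references:
        if ref.fov_group_change:
            times.append(Fraction(elapsed, timescale))
        elapsed += ref.subsegment_duration
    return times


# Five 0.5 s segments at a 90 kHz timescale; only the fifth is a switch point.
REFS = [SidxReference(45000) for _ in range(4)]
REFS.append(SidxReference(45000, fov_group_change=True))
POINTS = switch_point_times(90000, 0, REFS)
print(POINTS)
```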
  • The FOV_group_change_Info information may be a flag indicating whether the current segment can switch to other representations that carry attribute information such as Duration/FOVGroup/FovType.
  • The indication information of the view code stream to which the current segment can switch may be described in the segment information of the segment carrying this information, and the view code stream corresponding to the switching code stream may be determined from that indication information.
  • The FOV_group_change_Info information may also be the ID value of a code stream, carrying attribute information such as Duration/FOVGroup/FovType, to which the segment carrying this information can currently switch.
  • the switching point information between the view code stream and the switching code stream can also be described in other new boxes, such as:
  • FOV_group_change_Info This information indicates the interval of the segment in which the segment of the switching stream is switched to the view code stream.
  • The client may determine, according to the switching point information carried in the segment information of the target switching code stream, the switching point at which the target switching code stream switches to the target view code stream, and then request the target view code stream from the server according to information such as the URL of the target view code stream described in the MPD.
  • the segmentation information of the target switching code stream may include switching segment location information that is switched by the target switching code stream to the target view code stream, for example, a switching segment position indicated by a value of a FOV_group_change_Info element carried in the MPD, or the FOV_group_change_Info The segmentation interval of the switching segment specified by the value of the element, and the like.
  • Starting from the segment of the target switching code stream to which playback switched from the current view code stream (the first switching segment, for example, the second segment of repB'), and using the value of the FOV_group_change_Info element mentioned above, the client may determine the segment of the target view code stream to which playback switches (the second segment).
  • the client may calculate the playback start time of each segment according to the duration of the segment in the MPD or the duration of the segment in the sidx box, and determine the second moment by the playback start time of the segment. For example, the time at which the playback start time of the segment in the view code stream and the segment play start time in the switching code stream are closest to each other is determined as the second time.
  • The client may request from the server the target segment of the target view code stream corresponding to that moment (such as the second segment of repB in FIG. 10, labeled segmentB2), where the second moment may be the playback start time of segmentB2, or the moment closest to the playback start time of segmentB2.
  • the client may select a target handover segment from each segment, such as segment B2, by comparing the second moment with the playback start time of each segment in the target view stream, and request the segment from the server.
  • the client may switch the play video data to segmentB2 when the target switch code stream is played to the playback start time of segmentB2, and present the user with the second-view high-quality video.
  • Before switching the played video data to the target view code stream, the client can first switch it from the current view code stream to the target switching code stream, so that the new view is presented to the user as quickly as possible.
  • The client may switch the played video data to the target view code stream at the preset second moment at which the target switching code stream switches to the target view code stream.
  • For example, while the client is playing segmentD1, the user triggers a view switching request at time T1. The client can switch to the first segment at time T2, so that the picture of the new view is presented to the user within the short interval between T1 and T2.
  • Playback can then switch from the first segment to segmentB2 at time T3, completing the switch from the first view to the second view.
  • Without the switching code stream, when the user triggers the view switching request at time T1, the client has to wait until segmentD1 finishes playing and then switch to segmentB2 at time T3, so the user waits (T3 - T1) for the new view. If (T3 - T1) is greater than 200 ms, it causes discomfort to the user and degrades the user experience.
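  • The difference between the two waiting times can be made concrete with a small calculation. The figures below (a 2000 ms view segment, a 500 ms switching segment, and a trigger at T1 = 300 ms) are assumed values, and the sketch ignores download and decoding time; only the 200 ms comfort threshold comes from the text above.

```python
def next_boundary(time_ms, segment_ms):
    """Earliest segment start at or after time_ms for fixed-length segments."""
    return -(-time_ms // segment_ms) * segment_ms


def wait_for_new_view(trigger_ms, segment_ms):
    """Time the user waits from the trigger until the next segment start."""
    return next_boundary(trigger_ms, segment_ms) - trigger_ms


T1 = 300                    # assumed trigger time
VIEW_SEGMENT_MS = 2000      # assumed view code stream segment length
SWITCH_SEGMENT_MS = 500     # assumed switching code stream segment length

wait_direct = wait_for_new_view(T1, VIEW_SEGMENT_MS)        # T3 - T1
wait_via_switch = wait_for_new_view(T1, SWITCH_SEGMENT_MS)  # T2 - T1
print(wait_direct, wait_via_switch)
```

  • Under these assumptions the direct switch makes the user wait 1700 ms, well above the 200 ms threshold, while switching through the switching code stream reduces the wait to 200 ms.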
  • The segment information of the target switching code stream may include one or more switching moments at which the target switching code stream can switch to the target view code stream. A switching moment indicates a point in time at which the target switching code stream can switch to the target view code stream, and may be expressed as the playback start time of a particular segment, such as the playback start time T3 of segmentB2 and the playback start time T4 of segmentB3 in FIG. 10.
  • A switching moment may specifically be the playback start time of a particular segment, for example, the playback start time of the second segment.
  • the server side may add indication information of the switching moment in the segment information field of the target switching code stream described in the MPD or the index segment.
  • The client may obtain the indication information of the switching moments from the MPD or the index segment and determine the switching moments at which the target switching code stream can switch to the target view code stream.
  • After the client determines those switching moments, it can select the switching moment closest to the first moment as the moment of the current switch from the target switching code stream to the target view code stream (i.e., the second moment).
  • the client may request a segment (such as repB2) whose starting time is closest to the second time in the respective segments of the target view code stream from the server, and switch to the segment to play.
  • The first moment may be the playback start time of the first segment, and the second moment may be the playback start time of the second segment. The duration between the first moment and the second moment is N times the segment length of the target switching code stream; for example, if N is 3, the first segment and the second segment are 3 segments apart.
  • N is an integer greater than or equal to 1, and may be determined according to the actual application scenario, which is not limited herein.
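  • The relation between the first moment, the second moment, and N can be sketched as follows. The sketch assumes fixed-length segments, both code streams starting at time 0, and made-up values (first moment at 500 ms, 500 ms switching segments, 2000 ms view segments); the function name and the search bound max_n are hypothetical.

```python
def second_moment(first_moment_ms, switch_segment_ms, view_segment_ms, max_n=1000):
    """Find the smallest N >= 1 for which first_moment + N * switching segment
    length falls on a view code stream segment start.
    Returns (N, second_moment) or None if no such N is found within max_n."""
    for n in range(1, max_n + 1):
        candidate = first_moment_ms + n * switch_segment_ms
        if candidate % view_segment_ms == 0:
            return n, candidate
    return None


N, T3 = second_moment(500, 500, 2000)
print(N, T3)
```

  • With these values N is 3: the client plays three switching segments starting at 500 ms and switches to the view code stream at 2000 ms.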
  • the client may parse the MPD of the video data, determine the view code stream information of each view code stream in the video data, and switch code stream information of each switch code stream.
  • the client may, according to the view angle of the video currently viewed by the user and the view code stream information of each view code stream determined above, request from the server and play the view code stream corresponding to the current view.
  • the client can switch the played video data from the current view code stream to the target switching code stream before switching the video data played by the client to the target view code stream, which increases the speed at which the video presents the user with a new perspective.
  • after determining the second moment at which the target switching code stream switches to the target view code stream, the client may switch the played video data to the target view code stream when playback of the target switching code stream reaches the second moment.
  • the embodiment of the present invention can provide a switching code stream, so that when the terminal user switches the viewing angle, the client can quickly switch to the switching code stream to obtain a high-quality new perspective; in addition, by means of the switching point information between the switching code stream and the view code stream, the client switches to the view code stream after requesting a piece of the switching code stream, so that the compression performance of the code stream received by the client is optimal and the best viewing experience can be guaranteed under the same bandwidth condition.
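The two-step switch described above (current view code stream, then N segments of the switching code stream, then the target view code stream) can be laid out as a request timeline. The function name, the stream labels, and the durations below are assumptions chosen for illustration and do not come from the embodiments.

```python
from typing import List, Tuple

Entry = Tuple[str, float, float]  # (stream label, start time, end time) in seconds


def build_timeline(first_moment: float, n: int, switch_dur: float,
                   view_dur: float, view_segments: int = 1) -> List[Entry]:
    """List the segments requested once the user changes the viewing angle.

    N short segments of the switching code stream are played first; playback
    then continues with segments of the target view code stream.
    """
    if n < 1:
        raise ValueError("N must be an integer greater than or equal to 1")
    timeline: List[Entry] = []
    t = first_moment
    for _ in range(n):
        timeline.append(("switching", t, t + switch_dur))
        t += switch_dur
    for _ in range(view_segments):
        timeline.append(("target_view", t, t + view_dur))
        t += view_dur
    return timeline


# Hypothetical values: 1-second switching segments, 3-second view segments.
plan = build_timeline(first_moment=1.0, n=2, switch_dur=1.0, view_dur=3.0)
```

Here the last switching segment ends at 3.0 s, which is the second moment at which the target view code stream takes over.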
  • FIG. 13 is a schematic structural diagram of a client provided by an embodiment of the present invention.
  • the client provided by the embodiment of the present invention includes:
  • the obtaining module 131 is configured to parse the media presentation description and obtain the identifier information, where the identifier information is used to identify the first representation of the video, and the playback duration of a segment of the first representation is shorter than the playback duration of a segment of the second representation of the video.
  • the receiving module 132 is configured to obtain switching instruction information, where the switching instruction information is used to indicate that the current spatial object is switched to the target spatial object.
  • a determining module 133, configured to determine, according to the identifier information acquired by the obtaining module and the switching instruction information received by the receiving module, a target representation from the first representation of the video, where the target representation corresponds to the target spatial object.
  • the obtaining module 131 is further configured to acquire a current playing time of the video, and obtain a target representation segment according to the current playing time and the target representation determined by the determining module.
  • the identifier information includes at least one of a type identifier of the representation, a playback duration of the segments of the representation, and switching point information.
  • the switching point information is used to identify switching segment information for switching between the first representation and the second representation;
  • the switching segment information includes at least one of a segmentation interval, a segmentation location of the first representation, and a segmentation location of the second representation.
  • the identifier information is carried in the attribute information of the representation set in which the first representation is carried in the media presentation description.
  • the identifier information is carried in the attribute information of the first representation carried in the media presentation description.
  • the identifier information is carried in the attribute information of the segment of the first representation carried in the media presentation description.
  • the obtaining module 131 is specifically configured to:
  • determine, as the target representation segment, the segment whose playback start time is the first moment.
  • the client provided by the embodiment of the present invention may be specifically the client in the foregoing embodiment, and the client may perform the implementation modes described in the foregoing steps in the foregoing embodiments by using the built-in modules, and details are not described herein.
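The cooperation of the obtaining, receiving, and determining modules can be sketched as below. The dictionary keys (`type`, `spatial_object`, `segment_duration`) and the sample values are hypothetical stand-ins for the identifier information parsed from the media presentation description, and the sketch assumes equal-length segments that start at time zero.

```python
import math
from typing import Dict, List, Optional, Tuple

Representation = Dict[str, object]


def determine_target(representations: List[Representation],
                     target_object: str) -> Optional[Representation]:
    """Pick the first-type (switching) representation tied to the target spatial object."""
    for rep in representations:
        if rep["type"] == "switching" and rep["spatial_object"] == target_object:
            return rep
    return None


def target_segment(rep: Representation, current_time: float) -> Tuple[int, float]:
    """Return (zero-based index, start time) of the first segment that starts
    at or after the current playing time."""
    duration = float(rep["segment_duration"])
    index = math.ceil(current_time / duration)
    return index, index * duration


# Hypothetical identifier information for two spatial objects A and B.
representations = [
    {"id": "view-A", "type": "view", "spatial_object": "A", "segment_duration": 3.0},
    {"id": "switch-A", "type": "switching", "spatial_object": "A", "segment_duration": 1.0},
    {"id": "switch-B", "type": "switching", "spatial_object": "B", "segment_duration": 1.0},
]

target = determine_target(representations, target_object="B")
index, first_moment = target_segment(target, current_time=2.4)
```

With a current playing time of 2.4 s, the segment that starts at 3.0 s is taken as the target representation segment in this sketch.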
  • FIG. 14 is a schematic structural diagram of a server according to an embodiment of the present invention.
  • the server provided by the embodiment of the present invention includes:
  • the generating module 141 is configured to generate a first representation of the video according to the encoding configuration parameter of the first representation, and generate a second representation of the video according to the encoding configuration parameter of the second representation, where the playback duration of a segment of the first representation is shorter than the playback duration of a segment of the second representation.
  • the description module 142 is configured to generate a media presentation description, where the media presentation description carries the identifier information, where the identifier information is used to identify the first representation of the video.
  • the identifier information describes a playing duration of the segment of the first representation and a playing duration of the segment of the second representation
  • the playing duration of the segment of the first representation is less than the playing duration of the segment of the second representation of the video.
  • the identifier information describes switching point information of the first representation and the segment of the second representation.
  • the switch point information is used to identify switch segment information that is used for content switching between the first representation and the second representation;
  • the switching segment information includes at least one of a segmentation interval, a segmentation location of the first representation, and a segmentation location of the second representation.
  • the server provided by the embodiment of the present invention may be specifically the server in the foregoing embodiment, and the implementation manners described in the foregoing steps in the foregoing embodiments may be performed by using the built-in modules, and details are not described herein.
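As a rough sketch of what the description module might emit, the following builds a small MPD in which the first (switching) representation is marked with a descriptor. `MPD`, `Period`, `AdaptationSet`, `Representation`, `SegmentTemplate`, and `SupplementalProperty` are standard DASH elements, but the scheme URI `urn:example:switching-representation`, the `interval=` value syntax, and the input dictionary keys are purely hypothetical; the text does not fix a concrete syntax for the identifier information.

```python
import xml.etree.ElementTree as ET

DASH_NS = "urn:mpeg:dash:schema:mpd:2011"
SWITCH_SCHEME = "urn:example:switching-representation"  # hypothetical scheme URI


def build_mpd(representations):
    """Serialize a minimal MPD; switching representations get a descriptor."""
    mpd = ET.Element("MPD", {"xmlns": DASH_NS, "type": "static"})
    period = ET.SubElement(mpd, "Period")
    adaptation_set = ET.SubElement(period, "AdaptationSet", {"mimeType": "video/mp4"})
    for rep in representations:
        node = ET.SubElement(adaptation_set, "Representation",
                             {"id": rep["id"], "bandwidth": str(rep["bandwidth"])})
        # Segment playback duration is expressed in timescale units (here ms).
        ET.SubElement(node, "SegmentTemplate",
                      {"timescale": "1000",
                       "duration": str(rep["segment_ms"]),
                       "media": rep["id"] + "_$Number$.m4s"})
        if rep["is_switching"]:
            # Hypothetical carriage of the type identifier and switching interval.
            ET.SubElement(node, "SupplementalProperty",
                          {"schemeIdUri": SWITCH_SCHEME,
                           "value": "interval=" + str(rep["switch_interval"])})
    return ET.tostring(mpd, encoding="unicode")


mpd_text = build_mpd([
    {"id": "switchB", "bandwidth": 6000000, "segment_ms": 1000,
     "is_switching": True, "switch_interval": 3},
    {"id": "viewB", "bandwidth": 4000000, "segment_ms": 3000,
     "is_switching": False},
])
```

The identifier information could equally be placed at the adaptation set or segment level, as the embodiments allow; the representation level is used here only to keep the sketch short.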
  • FIG. 15 is another schematic structural diagram of a client provided by an embodiment of the present invention.
  • the client provided by the embodiment of the present invention includes:
  • the receiving module 151 is configured to receive a media presentation description, where the media presentation description includes at least two representations, each representation including attribute information describing media data segments, and the media presentation description further includes at least two switching code stream representations, each switching code stream representation including attribute information describing data segments of a switching code stream, wherein the spatial objects associated with the at least two representations correspond one-to-one with the spatial objects associated with the at least two switching code stream representations, and the playback duration of a media data segment described in a representation is longer than the playback duration of a data segment of the switching code stream described in the switching code stream representation corresponding to that representation.
  • the obtaining module 152 is configured to obtain switching instruction information.
  • the obtaining module 152 is further configured to obtain a target switching code stream representation according to the switching instruction information and the media presentation description, where the target switching code stream representation is one of the at least two switching code stream representations.
  • the obtaining module 152 is further configured to obtain target switching code stream request information according to the target switching code stream representation, where the switching code stream request information is used to request a partial data segment of the target switching code stream.
  • the media presentation description further includes spatial information of the spatial object associated with the switching code stream representation, where the spatial information is used to describe the spatial relationship between the content component associated with the switching code stream representation and the associated content component.
  • the obtaining module 152 is specifically configured to:
  • the media presentation description includes information of an adaptive set for describing attributes of media data segments of a plurality of replaceable encoded versions of the same media content component.
  • the information of the adaptation set includes information of the at least two switching code stream representations.
  • the media presentation description includes information of a representation, where the representation is a set and encapsulation of one or more code streams in a transmission format
  • the information of the representation includes information of the at least two switching code stream representations.
  • the information of the switching code stream representation includes at least one of a code stream type identifier, a playback duration of the code stream segments, and switching point information.
  • the switching point information is used to identify switching segment information of a switching between a switching code stream and a non-switching code stream;
  • the switching segment information includes at least one of a code stream segmentation interval, a code stream segmentation position of the switching code stream, and a code stream segmentation position of the non-switching code stream.
  • the client provided by the embodiment of the present invention may be specifically the client in the foregoing embodiment, and the implementation manners described in the foregoing steps in the foregoing embodiments may be performed by using the built-in modules, and details are not described herein again.
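On the client side, locating the switching code stream representations in a received MPD and forming requests for a few of their data segments might look like the sketch below. The sample MPD, the scheme URI `urn:example:switching-representation`, and the `object=` value syntax are hypothetical; only the element names and the `$Number$` template substitution follow standard DASH conventions.

```python
import xml.etree.ElementTree as ET

NS = {"d": "urn:mpeg:dash:schema:mpd:2011"}
SWITCH_SCHEME = "urn:example:switching-representation"  # hypothetical scheme URI

MPD_TEXT = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"><Period><AdaptationSet>
<Representation id="viewB" bandwidth="4000000">
  <SegmentTemplate timescale="1000" duration="3000" startNumber="1" media="viewB_$Number$.m4s"/>
</Representation>
<Representation id="switchB" bandwidth="6000000">
  <SegmentTemplate timescale="1000" duration="1000" startNumber="1" media="switchB_$Number$.m4s"/>
  <SupplementalProperty schemeIdUri="urn:example:switching-representation" value="object=B"/>
</Representation>
</AdaptationSet></Period></MPD>"""


def switching_representations(mpd_text):
    """Map each spatial object to its switching code stream representation."""
    found = {}
    root = ET.fromstring(mpd_text)
    for rep in root.iterfind(".//d:Representation", NS):
        for prop in rep.iterfind("d:SupplementalProperty", NS):
            if prop.get("schemeIdUri") == SWITCH_SCHEME:
                spatial_object = prop.get("value").split("=", 1)[1]
                found[spatial_object] = rep
    return found


def segment_urls(rep, start_ms, count):
    """Build request URLs for `count` segments starting at playback time start_ms."""
    template = rep.find("d:SegmentTemplate", NS)
    duration_ms = int(template.get("duration")) * 1000 // int(template.get("timescale"))
    first_number = start_ms // duration_ms + int(template.get("startNumber", "1"))
    media = template.get("media")
    return [media.replace("$Number$", str(first_number + i)) for i in range(count)]


by_object = switching_representations(MPD_TEXT)
urls = segment_urls(by_object["B"], start_ms=2000, count=2)
```

Requesting only `count` segments reflects the idea that the client asks for part of the switching code stream and then returns to a view code stream.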
  • FIG. 16 is another schematic structural diagram of a client provided by an embodiment of the present invention.
  • the client provided by the embodiment of the present invention includes:
  • the receiving module 161 is configured to receive a media presentation description, where the media presentation description includes information of at least two representations, each representation includes at least one segment, and the segment duration of the first representation of the at least two representations is less than the segment duration of the second representation; wherein the spatial object associated with the first representation corresponds to the spatial object associated with the second representation.
  • the obtaining module 162 is configured to obtain switching instruction information.
  • the obtaining module 162 is further configured to acquire the segment of the first representation according to the switching instruction information, and acquire the segment of the second representation after a preset time.
  • the first representation carries switching point information.
  • the media presentation description carries the identifier information
  • the identifier information includes at least one of a type identifier of the representation, a playback duration of the segments of the representation, and switching point information.
  • the switching point information is used to identify the switching segment information indicating the switching between the first code stream and the second code stream;
  • the switching segment information includes at least one of a segmentation interval, a segmentation location of the first representation, and a segmentation location of the second representation.
  • the switching point information is carried in a designated box in the first representation.
  • the designated box is a sidx box included in the first representation, and the sidx box is used to describe segmentation information.
  • the representation type identifier is used to identify the first representation.
  • the media presentation description includes information of an adaptation set
  • the adaptation set is used to describe attributes of media data segments of the plurality of replaceable coded versions of the same media content component.
  • the information of the adaptation set includes the identifier information.
  • the media presentation description includes information of a representation, where the representation is a set and encapsulation of one or more code streams in a transmission format
  • the information of the representation includes the identifier information.
  • the media presentation description includes information of a descriptor, and the descriptor is used to describe spatial information of the associated spatial object;
  • the information of the descriptor includes the identifier information.
  • the client provided by the embodiment of the present invention may be specifically the client in the foregoing embodiment, and the implementation manners described in the foregoing steps in the foregoing embodiments may be performed by using the built-in modules, and details are not described herein again.
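Where the designated box is a sidx box, the segment boundaries at which a switch could take place can be derived from the standard fields of that box. The sketch below parses only the standard ISO base media file format layout of a sidx box; how the switching point information itself is encoded inside or alongside the box is not specified in the text, so nothing beyond the standard fields is assumed, and the function names are illustrative.

```python
import struct


def parse_sidx(box: bytes):
    """Parse a sidx box and return (timescale, earliest_presentation_time, references)."""
    _size, box_type = struct.unpack_from(">I4s", box, 0)
    if box_type != b"sidx":
        raise ValueError("not a sidx box")
    version = box[8]
    pos = 12  # size (4) + type (4) + version (1) + flags (3)
    _reference_id, timescale = struct.unpack_from(">II", box, pos)
    pos += 8
    if version == 0:
        earliest, _first_offset = struct.unpack_from(">II", box, pos)
        pos += 8
    else:
        earliest, _first_offset = struct.unpack_from(">QQ", box, pos)
        pos += 16
    _reserved, count = struct.unpack_from(">HH", box, pos)
    pos += 4
    references = []
    for _ in range(count):
        word1, duration, word3 = struct.unpack_from(">III", box, pos)
        pos += 12
        references.append({
            "size": word1 & 0x7FFFFFFF,        # low 31 bits: referenced_size
            "duration": duration,              # subsegment_duration, in timescale units
            "starts_with_sap": bool(word3 >> 31),
        })
    return timescale, earliest, references


def segment_start_times(timescale, earliest, references):
    """Return the playback start time, in seconds, of each referenced subsegment."""
    starts, t = [], earliest
    for ref in references:
        starts.append(t / timescale)
        t += ref["duration"]
    return starts
```

A client could compare these start times with the switching moments announced for the first representation to decide which subsegment to fetch.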
  • the embodiment of the present invention may identify the switching code stream and the view code stream included in the video according to the identifier information carried in the media presentation description.
  • the target switching code stream corresponding to the target spatial object may be identified from the plurality of switching code streams of the video according to the target spatial object, and then the segment of the target switching code stream to be played may be determined according to the video playing time when the spatial object is switched.
  • the playback duration of a segment of the switching code stream is shorter than the playback duration of a segment of the view code stream. Therefore, when the spatial object is switched, the client can switch to a switching code stream segment with a shorter playback duration, which improves the efficiency of segment switching for the spatial object and enhances the user experience.
  • the segment of the target view code stream corresponding to the target space object may be obtained and presented, and the segment switch play of the corresponding view code stream when the space object is switched is completed.
  • the client can switch to the playback of the target view code stream after completing the intermediate transition of the space object switching by the target switching code stream, which can ensure the stability of the video playback after the space object is switched, and enhance the user experience of the video viewing.
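The benefit of shorter segments can be illustrated with a small calculation of how long the client waits for the next segment boundary after a switching instruction arrives. The segment durations and the instruction time are hypothetical, and the sketch assumes that segments are of equal length, start at time zero, and that a stream can only be entered at a segment boundary.

```python
def wait_until_boundary(now_ms: int, segment_ms: int) -> int:
    """Milliseconds from now_ms until the next segment boundary (0 if on one)."""
    return (-now_ms) % segment_ms


# Hypothetical durations: 3-second view segments, 1-second switching segments.
instruction_ms = 3200
wait_view = wait_until_boundary(instruction_ms, 3000)
wait_switching = wait_until_boundary(instruction_ms, 1000)
```

Under these assumptions the target view code stream could not be entered for another 2800 ms, while the switching code stream can be entered after 800 ms; in general the wait is bounded by the segment duration of the stream being entered.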
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

Abstract

The invention relates to a method and an apparatus for processing video data. The method includes: parsing a media presentation description and obtaining identifier information, where the identifier information is used to identify a first representation of a video, and the playback duration of a segment of the first representation is shorter than the playback duration of a segment of a second representation of the video; obtaining switching instruction information, where the switching instruction information is used to instruct switching from a current spatial object to a target spatial object; determining a target representation from the first representation of the video according to the identifier information and the switching instruction information, the target representation corresponding to the target spatial object; and acquiring a current playing time of the video, and obtaining a target representation segment according to the current playing time and the target representation. The invention has the advantages of improving the switching efficiency of video data segments and improving the user's video viewing experience.
PCT/CN2017/086548 2016-09-30 2017-05-31 Method and apparatus for processing video data WO2018058993A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/370,052 US20190230388A1 (en) 2016-09-30 2019-03-29 Method and apparatus for processing video data

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201610878496 2016-09-30
CN201610878496.1 2016-09-30
CN201610890964.7A CN107888993B (zh) 2016-09-30 2016-10-11 Method and apparatus for processing video data
CN201610890964.7 2016-10-11

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/370,052 Continuation US20190230388A1 (en) 2016-09-30 2019-03-29 Method and apparatus for processing video data

Publications (1)

Publication Number Publication Date
WO2018058993A1 true WO2018058993A1 (fr) 2018-04-05

Family

ID=61763092

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/086548 WO2018058993A1 (fr) 2016-09-30 2017-05-31 Method and apparatus for processing video data

Country Status (1)

Country Link
WO (1) WO2018058993A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111757305A (zh) * 2020-06-16 2020-10-09 西安闻泰电子科技有限公司 Method, apparatus, system, and storage medium for switching a service execution terminal
US11323683B2 (en) * 2019-01-08 2022-05-03 Nokia Technologies Oy Method, an apparatus and a computer program product for virtual reality
CN114513674A (zh) * 2020-11-16 2022-05-17 上海科技大学 Interactive live-streaming data transmission/processing method, processing system, medium, and server

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130204973A1 (en) * 2010-10-06 2013-08-08 Humax Co., Ltd. Method for transmitting a scalable http stream for natural reproduction upon the occurrence of expression-switching during http streaming
CN104025604A (zh) * 2012-07-02 2014-09-03 索尼公司 Transmission device, transmission method, and network apparatus
CN104509119A (zh) * 2012-04-24 2015-04-08 Vid拓展公司 Method and apparatus for smooth stream switching in MPEG/3GPP-DASH
WO2015150736A1 (fr) * 2014-03-31 2015-10-08 British Telecommunications Public Limited Company Multicast streaming
CN105612753A (zh) * 2013-10-08 2016-05-25 高通股份有限公司 Switching between adaptation sets during media streaming

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11323683B2 (en) * 2019-01-08 2022-05-03 Nokia Technologies Oy Method, an apparatus and a computer program product for virtual reality
US20220247990A1 (en) * 2019-01-08 2022-08-04 Nokia Technologies Oy Method, An Apparatus And A Computer Program Product For Virtual Reality
US11943421B2 (en) * 2019-01-08 2024-03-26 Nokia Technologies Oy Method, an apparatus and a computer program product for virtual reality
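CN111757305A (zh) * 2020-06-16 2020-10-09 西安闻泰电子科技有限公司 Method, apparatus, system, and storage medium for switching a service execution terminal
CN111757305B (zh) * 2020-06-16 2023-12-19 西安闻泰电子科技有限公司 Method, apparatus, system, and storage medium for switching a service execution terminal
CN114513674A (zh) * 2020-11-16 2022-05-17 上海科技大学 Interactive live-streaming data transmission/processing method, processing system, medium, and server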

Similar Documents

Publication Publication Date Title
CN107888993B (zh) Method and apparatus for processing video data
WO2018058773A1 (fr) Method and apparatus for processing video data
KR102247399B1 (ko) Method, device, and computer program for adaptive streaming of virtual reality media content
KR102261559B1 (ko) Information processing method and apparatus
WO2018214698A1 (fr) Method and device for displaying video information
BR112019019836A2 (pt) Signaling important video information in network video streaming using MIME type parameters
TW201924323A (zh) Content source description for immersive media data
CN109644262A (zh) Method for transmitting omnidirectional video, method for receiving omnidirectional video, apparatus for transmitting omnidirectional video, and apparatus for receiving omnidirectional video
CN109362242B (zh) Method and apparatus for processing video data
WO2018068236A1 (fr) Video stream transmission method, related device, and system
US11323683B2 (en) Method, an apparatus and a computer program product for virtual reality
CN108282449B (zh) Streaming media transmission method and client applied to virtual reality technology
WO2020043126A1 (fr) Methods and apparatuses for processing and transmitting video data, and video data processing system
US20210176446A1 (en) Method and device for transmitting and receiving metadata about plurality of viewpoints
WO2018058993A1 (fr) Method and apparatus for processing video data
WO2019007096A1 (fr) Method and apparatus for processing media information
WO2018072488A1 (fr) Data processing method, related device, and system
US11677978B2 (en) Omnidirectional video processing method and device, related apparatuses and storage medium
KR20240007142A (ko) Split rendering of extended reality data over 5G networks
KR20200008631A (ko) Method for transmitting 360-degree video, method for receiving 360-degree video, 360-degree video transmission apparatus, and 360-degree video reception apparatus
WO2018120474A1 (fr) Information processing method and apparatus
CN114930869A (zh) Method, apparatus, and computer program product for video encoding and video decoding
CN108271084B (zh) Information processing method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17854449

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17854449

Country of ref document: EP

Kind code of ref document: A1