WO2018058993A1 - Video data processing method and apparatus - Google Patents

Info

Publication number
WO2018058993A1
Authority
WO
WIPO (PCT)
Prior art keywords
representation
segment
information
switching
code stream
Prior art date
Application number
PCT/CN2017/086548
Other languages
French (fr)
Chinese (zh)
Inventor
邸佩云
谢清鹏
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to CN201610878496 priority Critical
Priority to CN201610878496.1 priority
Priority to CN201610890964.7A priority patent/CN107888993B/en
Priority to CN201610890964.7 priority
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Publication of WO2018058993A1 publication Critical patent/WO2018058993A1/en
Priority claimed from US16/370,052 external-priority patent/US20190230388A1/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23: Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234: Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/2343: Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements

Abstract

Disclosed are a video data processing method and apparatus, the method comprising: parsing a media presentation description and acquiring identification information, the identification information being used to identify a first representation of a video, and a playback duration of a segment of the first representation being shorter than a playback duration of a segment of a second representation of the video; obtaining switching instruction information, the switching instruction information being used to instruct to switch a current space object to a target space object; determining a target representation from within the first representation of the video according to the identification information and the switching instruction information, the target representation corresponding to the target space object; and acquiring a current playback time of the video, and obtaining a target representation segment according to the current playback time and the target representation. The present invention has the advantages of improving switching efficiency of a video data segment and enhancing the video viewing user experience.

Description

Method and apparatus for processing video data
Technical field
The present invention relates to the field of streaming media data processing, and in particular to a method and an apparatus for processing video data.
Background
With the continuing development of virtual reality (VR) technology, viewing applications for VR video, such as 360-degree video, are increasingly available to users. While watching a VR video, the user may change the field of view (FOV) at any time, and the VR video image presented within the user's field of view should change accordingly. For a good user experience in this scenario, the user needs to see the new picture quickly after the switch, and the new picture needs to be of high quality. How to switch VR video images efficiently and with high quality is therefore one of the problems to be solved in video code stream processing for VR applications.
In the prior art, the panoramic space of a VR video is divided into a plurality of spatial objects, and a set of dynamic adaptive streaming over HTTP (DASH) code streams is prepared for each spatial object, where HTTP is the hypertext transfer protocol. When the user changes the field of view, the terminal selects and plays the DASH code stream of the spatial object corresponding to the new field of view, which switches the video image between fields of view. The DASH code stream corresponding to each region contains a plurality of segments, and switching the video image amounts to switching playback from one segment to another. When the field of view is switched, the terminal must wait until the currently playing segment finishes before it can cut to the next segment. The existing MPEG-DASH standard, approved by the Moving Picture Experts Group (MPEG), specifies how to switch between segments of code streams that represent different video qualities. In most current applications, however, the duration of each segment is relatively long, for example 5 seconds, so after a field-of-view switch the user may have to wait 5 seconds to see the picture of the new field of view. In VR applications, a field-of-view switching delay of more than 200 ms causes discomfort to the user. A 5-second interval therefore causes discomfort, degrades the user experience of the terminal, and makes VR video viewing poor.
Summary of the invention
I. Introduction to MPEG-DASH technology
In November 2011, MPEG approved the DASH standard, a technical specification for transmitting media streams over the HTTP protocol (hereinafter, the DASH technical specification). The DASH technical specification consists of two main parts: the media presentation description (MPD) and the media file format.
1. Media file format
In DASH, the server prepares multiple versions of the code stream for the same video content. Each version is called a representation in the DASH standard. A representation is a collection and encapsulation of one or more code streams in a transport format, and one representation contains one or more segments. Different versions of the code stream may differ in coding parameters such as bit rate and resolution. Each code stream is divided into multiple small files, and each small file is called a segment. The client can switch between representations while requesting media segment data. As shown in FIG. 3, the server prepares three representations for a movie: rep1, rep2, and rep3. rep1 is high-definition video with a bit rate of 4 Mbps (megabits per second), rep2 is standard-definition video with a bit rate of 2 Mbps, and rep3 is standard-definition video with a bit rate of 1 Mbps. The shaded segments in FIG. 3 are the segments the client requests for playback. The first three segments requested by the client belong to rep3. The client then switches to rep2 to request the fourth segment, and then switches to rep1 to request the fifth segment, the sixth segment, and so on. The segments of a representation may be stored end to end in a single file or stored independently as individual small files. A segment may be encapsulated in the format defined in ISO/IEC 14496-12 (ISO BMFF, Base Media File Format) or in the format defined in ISO/IEC 13818-1 (MPEG-2 TS).
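The switching pattern described for FIG. 3 can be sketched in Python as a simple throughput-driven choice among the three representations. This is only an illustration: the selection rule (highest bit rate that fits the measured throughput) and the throughput samples are assumptions made here, not something the text prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Representation:
    rep_id: str
    bitrate_mbps: float


# The three representations from the FIG. 3 example in the text.
REPRESENTATIONS = [
    Representation("rep1", 4.0),
    Representation("rep2", 2.0),
    Representation("rep3", 1.0),
]


def pick_representation(throughput_mbps, representations=REPRESENTATIONS):
    """Return the highest-bitrate representation that fits the throughput.

    Falls back to the lowest-bitrate representation when nothing fits.
    """
    fitting = [r for r in representations if r.bitrate_mbps <= throughput_mbps]
    if fitting:
        return max(fitting, key=lambda r: r.bitrate_mbps)
    return min(representations, key=lambda r: r.bitrate_mbps)


def plan_requests(throughput_per_segment):
    """Pair each 1-based segment number with the representation to request."""
    return [
        (number, pick_representation(mbps).rep_id)
        for number, mbps in enumerate(throughput_per_segment, start=1)
    ]


# Hypothetical throughput samples (Mbps), chosen to reproduce the FIG. 3 pattern.
plan = plan_requests([1.2, 1.5, 1.8, 2.5, 4.5, 5.0])
```

With these samples the plan requests segments 1 to 3 from rep3, segment 4 from rep2, and segments 5 and 6 from rep1, which matches the sequence described above.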
2. Media presentation description
In the DASH standard, the media presentation description is called the MPD. The MPD may be an XML file in which information is described hierarchically, as shown in FIG. 2, and information at a higher level is fully inherited by the next level down. The file describes media metadata that lets the client learn what media content is available on the server and use that information to construct the HTTP URLs for requesting segments.
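As a rough illustration of how a client might use the hierarchical MPD to build a segment URL, the following Python sketch parses a small MPD and resolves a SegmentTemplate declared at the AdaptationSet level for a Representation below it. The MPD fragment, host name, and file layout are hypothetical; only the element names and the namespace follow the DASH schema.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

NS = {"dash": "urn:mpeg:dash:schema:mpd:2011"}

# Hypothetical MPD; the host name and file naming are invented for illustration.
MPD_XML = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <BaseURL>http://example.com/video/</BaseURL>
  <Period>
    <AdaptationSet>
      <SegmentTemplate media="$RepresentationID$/seg-$Number$.m4s" startNumber="1"/>
      <Representation id="rep1" bandwidth="4000000"/>
      <Representation id="rep2" bandwidth="2000000"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def segment_url(mpd_xml, rep_id, number):
    """Build the HTTP URL of segment `number` of representation `rep_id`."""
    root = ET.fromstring(mpd_xml)
    base = root.findtext("dash:BaseURL", default="", namespaces=NS)
    for aset in root.iterfind("dash:Period/dash:AdaptationSet", NS):
        inherited = aset.find("dash:SegmentTemplate", NS)
        for rep in aset.iterfind("dash:Representation", NS):
            if rep.get("id") != rep_id:
                continue
            # A template on the Representation wins; otherwise the one
            # declared on the enclosing AdaptationSet is inherited.
            own = rep.find("dash:SegmentTemplate", NS)
            template = own if own is not None else inherited
            if template is None:
                raise ValueError("no SegmentTemplate for " + rep_id)
            path = template.get("media")
            path = path.replace("$RepresentationID$", rep_id)
            path = path.replace("$Number$", str(number))
            return urljoin(base, path)
    raise KeyError(rep_id)


url = segment_url(MPD_XML, "rep2", 3)
```

The sketch handles only the `$RepresentationID$` and `$Number$` identifiers; a real client would also need to handle the other template identifiers and segment addressing modes defined by the standard.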
The DASH standard uses the following terms. A media presentation is a collection of structured data that presents media content. A media presentation description is a file that describes a media presentation in a standardized way and is used to provide a streaming media service. A period is one of a sequence of consecutive, non-overlapping periods that together make up the entire media presentation. A representation is a structured data set that encapsulates one or more media content components (individual encoded media types such as audio or video) with descriptive metadata; that is, a representation is a collection and encapsulation of one or more code streams in a transport format, and one representation contains one or more segments. An adaptation set (AdaptationSet) is a set of interchangeable encoded versions of the same media content component, and one adaptation set contains one or more representations. A subset is a combination of adaptation sets; when the player plays all of the adaptation sets in it, the corresponding media content is obtained. Segment information is the media unit referenced by an HTTP uniform resource locator in the media presentation description. Segment information describes the segments of the media data, which may be stored in one file or stored separately; in one possible implementation, the MPD itself stores the segments of the media data.
For the technical concepts of MPEG-DASH relevant to the present invention, refer to the provisions of ISO/IEC 23009-1:2014, Information technology -- Dynamic adaptive streaming over HTTP (DASH) -- Part 1: Media presentation description and segment formats, or to the provisions of earlier versions of the standard, such as ISO/IEC 23009-1:2013 or ISO/IEC 23009-1:2012.
II. Introduction to virtual reality (VR) technology
Virtual reality technology is a computer simulation system for creating and experiencing virtual worlds. It uses a computer to generate a simulated environment, a system simulation of interactive three-dimensional dynamic scenes and entity behavior based on multi-source information fusion, in which the user can be immersed. VR mainly involves the simulated environment, perception, natural skills, and sensing devices. The simulated environment is a computer-generated, real-time, dynamic, three-dimensional realistic image. Perception means that an ideal VR system should provide every kind of perception a person has: in addition to the visual perception generated by computer graphics, it includes hearing, touch, force, and motion, and even smell and taste. This is also called multi-perception. Natural skills are head rotation, eye movement, gestures, and other human actions; the computer processes the data corresponding to the participant's actions, responds to the user's input in real time, and feeds the response back to the user's senses. Sensing devices are three-dimensional interaction devices. When VR video (also called 360-degree video or omnidirectional video) is presented on a head-mounted or handheld device, only the video image and associated audio corresponding to the orientation of the user's head are presented.
VR video differs from normal video in that the entire content of a normal video is presented to the user, whereas only a subset of a VR video is presented (in the original English: "in VR typically only a subset of the entire video region represented by the video pictures").
III. Spatial description in the existing DASH standard
The existing standard describes spatial information as follows: "The SRD scheme allows Media Presentation authors to express spatial relationships between Spatial Objects. A Spatial Object is defined as a spatial part of a content component (e.g. a region of interest, or a tile) and represented by either an Adaptation Set or a Sub-Representation."
That is, the MPD describes the spatial relationships between spatial objects. A spatial object is defined as a spatial part of a content component, such as an existing region of interest (ROI) or a tile. Spatial relationships can be described in an Adaptation Set or a Sub-Representation.
The existing DASH standard defines several descriptor elements in the MPD. Each descriptor element has two attributes, schemeIdURI and value: schemeIdURI identifies what the descriptor is, and value carries the descriptor's parameter values.
The existing standard already has two descriptors, SupplementalProperty and EssentialProperty (the supplemental property descriptor and the essential property descriptor). If the schemeIdURI of either descriptor is "urn:mpeg:dash:srd:2014" (or schemeIdURI=urn:mpeg:dash:VR:2017), the descriptor describes the spatial information associated with the containing spatial object, and the corresponding value lists a series of SRD parameter values. The syntax of value is given in Table 1:
Table 1
Figure PCTCN2017086548-appb-000001
Figure PCTCN2017086548-appb-000002
FIG. 6 is a schematic diagram of the spatial relationships of spatial objects. The image AS can be treated as a content component, and AS1, AS2, AS3, and AS4 are four spatial objects contained in AS. Each spatial object is associated with a space, and the MPD describes the spatial relationships of the spatial objects, for example the relationships between the spaces associated with them.
An example MPD is as follows:
Figure PCTCN2017086548-appb-000003
Figure PCTCN2017086548-appb-000004
Figure PCTCN2017086548-appb-000005
The upper-left coordinates of the spatial object, the width and height of the spatial object, and the space referenced by the spatial object may also be expressed as relative values. For example, value="1,0,0,1920,1080,3840,2160,2" above can be written as value="1,0,0,1,1,2,2,2".
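The equivalence of the absolute and relative forms can be checked with a small Python sketch that parses the value string and normalizes the region against the reference space. Table 1 is not reproduced in this text, so the field names below are assumptions based on the SRD scheme as commonly described (source id, object position and size, total size, spatial set id); treat them as illustrative labels.

```python
from fractions import Fraction
from typing import NamedTuple


class Srd(NamedTuple):
    # Field names are assumptions; Table 1 is not reproduced in this text.
    source_id: int
    object_x: int
    object_y: int
    object_width: int
    object_height: int
    total_width: int
    total_height: int
    spatial_set_id: int


def parse_srd(value):
    """Parse a comma-separated SRD value string with eight integer fields."""
    fields = [int(part) for part in value.split(",")]
    if len(fields) != 8:
        raise ValueError("expected 8 fields, got %d" % len(fields))
    return Srd(*fields)


def normalized_region(srd):
    """Return (x, y, width, height) as exact fractions of the reference space."""
    return (
        Fraction(srd.object_x, srd.total_width),
        Fraction(srd.object_y, srd.total_height),
        Fraction(srd.object_width, srd.total_width),
        Fraction(srd.object_height, srd.total_height),
    )


absolute = normalized_region(parse_srd("1,0,0,1920,1080,3840,2160,2"))
relative = normalized_region(parse_srd("1,0,0,1,1,2,2,2"))
```

Both forms normalize to the same region: the upper-left quarter of the reference space, starting at the origin and covering half its width and half its height.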
In some feasible implementations, to output video images with a large 360-degree field of view, the server may divide the space within the 360-degree field of view into a plurality of spatial objects. Each spatial object corresponds to one sub-view, and one or more sub-views are stitched together to form a complete human-eye viewing angle. The human-eye viewing angle is typically 120 degrees by 120 degrees. Examples are view 1, corresponding to box 1 in FIG. 7, and view 2, corresponding to box 2. The server may prepare a set of video code streams for each spatial object. Specifically, the server may obtain the encoding configuration parameters of each code stream of the video and generate the code stream for each spatial object of the video according to those parameters. During video output, the client may request from the server the video code stream segment corresponding to a given view for a given time period and output it to the spatial object corresponding to that view. If the client outputs, for the same time period, the video code stream segments corresponding to all views within the 360-degree range, the complete video image for that time period is displayed across the entire 360-degree space.
In a specific implementation, to divide the 360-degree space into spatial objects, the client may first map the sphere onto a plane and then divide the plane into spatial objects. Specifically, the client may use longitude-latitude mapping to map the sphere onto a longitude-latitude plan. FIG. 9 is a schematic diagram of spatial objects according to an embodiment of the present invention. The client may map the sphere onto a longitude-latitude plan and divide the plan into a plurality of spatial objects such as A to I. The client may also map the sphere onto a cube and unfold the faces of the cube into a plan, or map the sphere onto another polyhedron and unfold its faces into a plan. The client may use still other mappings from the sphere to a plane, as determined by the requirements of the actual application scenario; no limitation is imposed here. The following description uses longitude-latitude mapping and refers to FIG. 10.
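The longitude-latitude mapping and the division into spatial objects A to I can be sketched as follows. The 3-by-3 row-major layout, the angle conventions (yaw from -180 to 180 degrees, pitch from -90 to 90 degrees), and the function names are assumptions made for this illustration; FIG. 9 is not reproduced here and may arrange the objects differently.

```python
def plane_point(yaw_deg, pitch_deg, width, height):
    """Map a viewing direction to (x, y) on a longitude-latitude plan.

    Yaw -180 maps to the left edge and pitch +90 maps to the top edge.
    """
    x = ((yaw_deg + 180.0) % 360.0) / 360.0 * width
    y = (90.0 - pitch_deg) / 180.0 * height
    return x, y


def spatial_object_at(yaw_deg, pitch_deg, cols=3, rows=3):
    """Return the label of the grid cell that contains the direction.

    Cells are labelled A, B, C, ... in row-major order (assumed layout).
    """
    x, y = plane_point(yaw_deg, pitch_deg, cols, rows)
    col = min(int(x), cols - 1)  # clamp the right edge into the last column
    row = min(int(y), rows - 1)  # clamp the bottom edge into the last row
    return chr(ord("A") + row * cols + col)


center = spatial_object_at(0.0, 0.0)
top_left = spatial_object_at(-180.0, 90.0)
bottom_right = spatial_object_at(179.0, -90.0)
```

Under these assumptions, looking straight ahead falls in the middle object E, while the extreme directions fall in the corner objects A and I.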
As shown in FIG. 10, after the client divides the spherical space into a plurality of spatial objects such as A to I, the server may prepare a set of DASH code streams for each spatial object. Each spatial object corresponds to one sub-view, and the set of DASH code streams corresponding to a spatial object is the view code stream of that sub-view. Every image in a view code stream is associated with a spatial object that has the same spatial information, so a view code stream can be treated as a static code stream. During playback, the DASH code stream of the spatial object corresponding to the user's current view is selected and played. When the user switches views, the client determines the DASH code stream of the target spatial object according to the new view selected by the user and switches playback to that code stream.
In FIG. 10, the nine view code streams repA to repI correspond to the nine spatial objects A to I in the longitude-latitude plan. repA is any one of the set of DASH code streams corresponding to spatial object A, and the embodiments of the present invention use repA as an example. Likewise, each of repB to repI is any one of the set of DASH code streams corresponding to its spatial object, and the embodiments use repB, repC, ..., and repI as examples. The segments in the view code streams of the sub-views are aligned; that is, within the same time period the segments in every view code stream have the same length. Segment alignment allows the presented video image to switch between view code streams at segment boundaries as the view changes. For example, after the third segment of repD finishes playing, the client switches to the fourth segment of repB, and when the fifth segment of repB finishes playing, it switches to the sixth segment of repC. The video image presented by the client switches from the picture of view D to the picture of view B, and then to the picture of view C.
The embodiments of the present invention provide switching code streams whose segment duration differs from that of the view code streams: the playback duration of a segment in a switching code stream is shorter than the playback duration of a segment in the corresponding view code stream. Each set of switching code streams corresponds to one set of view code streams (as shown in FIG. 11, repA denotes a set of view code streams and repA' denotes a set of switching code streams). A set of switching code streams contains one or more switching code streams, and each set corresponds to one spatial object. A switching code stream and its corresponding view code stream correspond to the same spatial object; that is, for the same time period, the code stream segments of the switching code stream and of the corresponding view code stream have the same content component.
In some feasible implementations, when preparing the view code streams of the video, the server additionally prepares a set of switching code streams for each sub-view, so that each set of view code streams has a corresponding set of switching code streams. A set of view code streams and its corresponding switching code streams cover the same sub-view (that is, the same spatial object); the only difference is that the segments of the view code stream are longer and the segments of the switching code stream are shorter. When the user's view needs to switch, the client first selects the switching code stream, so that it can present high-quality video of the new view after a very short time. When the client detects that it can move from a segment of the switching code stream to the view code stream, it switches its representation from the switching code stream to the view code stream. In this way, the best user experience can be ensured under the same bandwidth conditions.
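A minimal Python sketch of this two-stage switch is shown below, using integer milliseconds to avoid rounding issues. The 200 ms switching-segment duration, the assumption that the view-stream segment duration is a whole multiple of it, and the rule that the client returns to the view code stream at the next view-stream segment boundary are all choices made for this illustration; the 5-second view-stream segment duration comes from the background section.

```python
def next_boundary_ms(t_ms, segment_ms):
    """Earliest segment start time at or after t_ms (ceiling to a multiple)."""
    return -(-t_ms // segment_ms) * segment_ms


def plan_switch(t_ms, view_segment_ms, switch_segment_ms):
    """Plan the requests for the new view after a view change at t_ms.

    Returns (stream kind, start_ms, end_ms) tuples: short switching-stream
    segments up to the next view-stream boundary, then one view-stream
    segment. Assumes view_segment_ms is a multiple of switch_segment_ms.
    """
    switch_start = next_boundary_ms(t_ms, switch_segment_ms)
    view_start = next_boundary_ms(t_ms, view_segment_ms)
    plan = [
        ("switching", start, start + switch_segment_ms)
        for start in range(switch_start, view_start, switch_segment_ms)
    ]
    plan.append(("view", view_start, view_start + view_segment_ms))
    return plan


# Hypothetical view change 7250 ms into playback.
plan = plan_switch(7250, view_segment_ms=5000, switch_segment_ms=200)
wait_with_switching_ms = plan[0][1] - 7250
wait_without_switching_ms = next_boundary_ms(7250, 5000) - 7250
```

In this example the new view can start 150 ms after the view change by using the switching code stream, compared with 2750 ms if the client had to wait for the next 5-second segment boundary of the view code stream.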
In the embodiments of the present invention, so that the client can identify switching code streams, the server adds syntax elements for the switching code streams when generating the MPD, and the client uses these syntax elements to obtain the switching code stream information corresponding to a view code stream. When generating the MPD, the server may add a representation that describes switching code streams. One such representation may contain the description information of one or more switching code streams, and it may be called a switching code stream representation or a first representation. An existing representation in the MPD that describes a view code stream may be called a view code stream representation, a media representation, or a second representation. When the user's view needs to switch, the client can then quickly select the code stream of the new view and present high-quality video of the new view. Several possible forms of the MPD syntax elements are shown below. The MPD examples in the embodiments show only the parts of the MPD syntax defined by the existing standard that the present technique modifies, not all syntax elements of an MPD file; a person of ordinary skill in the art can apply the technical solutions of the embodiments together with the relevant provisions of the DASH standard.
In one implementation of the embodiments of the present invention, a new syntax description is added to the MPD, as shown in Table 2, which is a syntax information table:
Table 2
Figure PCTCN2017086548-appb-000006
In the MPD, the attribute @FovType marks the switching code stream in the corresponding representation. For the same view, bit rate, and other parameters, the client preferentially selects the representation that denotes a switching code stream to present the new view. A related MPD example follows.
MPD sample 1:
Figure PCTCN2017086548-appb-000007
In this MPD sample, the representation with Representation id="author1" is a switching code stream.
MPD sample 2:
Figure PCTCN2017086548-appb-000008
Figure PCTCN2017086548-appb-000009
In this MPD sample, the representation with Representation id="3" is a switching code stream.
In another implementation of the embodiments of the present invention:
MPD sample 3:
Figure PCTCN2017086548-appb-000010
Figure PCTCN2017086548-appb-000011
In this MPD sample, all representations under AdaptationSet id="2" are switching code streams.
Another embodiment of the present invention describes switching code streams in the MPD in a different way, as shown in Table 3, which is another syntax information table:
Table 3
Figure PCTCN2017086548-appb-000012
The syntax above indicates that a switch-representation has the same content as the other representations in the same adaptation set, but not all of its segments can be switched seamlessly with the segments of other representations. It can switch to or from other representations only at specified segments, which marks it as a switching code stream. When the view switches, the client first fetches segments of this representation to present the new view.
A related MPD example follows:
MPD sample 4:
Figure PCTCN2017086548-appb-000013
Figure PCTCN2017086548-appb-000014
In this MPD sample, the representation with switch-representation id="3" is the switching code stream; this embodiment of the present invention adds a new representation type, switch-representation.
In another implementation of this embodiment of the present invention, a new syntax element is added to the MPD to group the representations: one group contains representations as specified in the existing DASH standard, and the other group contains representations of the switching code stream. A related MPD example is as follows:
MPD sample 5:
Figure PCTCN2017086548-appb-000015
Figure PCTCN2017086548-appb-000016
In this MPD, grouping information is added to each representation, and the groups whose segments can be switched can be identified from it. For example, the representations with Representation id="3" and Representation id="5" both have FovGroup="2"; the segments of these two representations are aligned and can be switched between.
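The grouping described above can be sketched as follows. This is a minimal illustration only: the MPD fragment is hypothetical (the real sample is available here only as a figure), XML namespaces are omitted, and the only name taken from the text is the FovGroup attribute.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical MPD fragment; only the FovGroup attribute name comes from the text.
MPD_XML = """
<MPD>
  <Period>
    <AdaptationSet>
      <Representation id="1" FovGroup="1"/>
      <Representation id="2" FovGroup="1"/>
      <Representation id="3" FovGroup="2"/>
      <Representation id="5" FovGroup="2"/>
    </AdaptationSet>
  </Period>
</MPD>
"""


def group_representations(mpd_xml):
    """Map each FovGroup value to the ids of the representations in that group."""
    root = ET.fromstring(mpd_xml)
    groups = defaultdict(list)
    for rep in root.iter("Representation"):
        groups[rep.get("FovGroup")].append(rep.get("id"))
    return dict(groups)


groups = group_representations(MPD_XML)
# Representations that share a group value have aligned segments
# and are candidates for switching between each other.
print(groups)
```

In this sketch, representations 3 and 5 end up in the same group, matching the example in the text.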
Embodiments of the present invention provide a method and an apparatus for processing video data, which can improve the efficiency of switching between media data segments and enhance the user experience of video viewing.
A first aspect provides a method for processing video data, which may include:
parsing a media presentation description to obtain identification information, where the identification information identifies a first representation of a video, and the playback duration of a segment described by the first representation is shorter than the playback duration of a segment described by a second representation of the video; obtaining switching instruction information, where the switching instruction information indicates switching from a current spatial object to a target spatial object; obtaining a target representation according to the identification information and the switching instruction information, where the target representation corresponds to the target spatial object; and obtaining a current playback time of the video, and obtaining a target representation segment according to the current playback time and the target representation.
In this embodiment of the present invention, the switching instruction information obtained by the client may include the aforementioned head rotation, eye movement, gesture, or other human behavior information, and may also include user input, such as keyboard input, voice input, and touchscreen input.
In a feasible implementation, the identification information includes at least one of: a representation type identifier, a playback duration of a representation segment, and switching point information.
In this embodiment of the present invention, the identification information that identifies the first representation can take several forms, which makes the scheme more flexible and more widely applicable. The first representation of the video is identified by the representation type identifier, so that when a spatial object switching instruction is received, a segment of the target first representation, which has a shorter playback duration, is preferentially selected for the switch. This improves the efficiency of switching playback between code stream segments, quickly presents the video content of the switched-to video spatial region to the user, and enhances the user experience of video viewing.
In a feasible implementation, the switching point information identifies switching segment information for switching between the first representation and the second representation, where the switching segment information includes at least one of: a segment interval, a segment position in the first representation, and a segment position in the second representation;
or
the switching point information is a flag that indicates the switching capability of a segment.
In a possible manner, a flag value of 1 indicates that the current segment can be switched, and a flag value of 0 indicates that the current segment cannot be switched seamlessly.
In this embodiment of the present invention, the switching segment information for content switching between the first representation and the second representation can be identified from the switching point information. The switching segment information can take several forms, which makes the scheme more flexible and more widely applicable.
In a feasible implementation, the identification information is carried, in the media presentation description, in the attribute information of the representation set to which the first representation belongs.
In a feasible implementation, the identification information is carried, in the media presentation description, in the attribute information of the first representation.
In a feasible implementation, the identification information is carried, in the media presentation description, in the attribute information of a segment of the first representation.
In this embodiment of the present invention, the identification information that identifies the first representation can be carried in the media presentation description in several forms, and can also be carried in attribute information at different positions in the media presentation description, which makes the scheme more flexible and more widely applicable.
In a feasible implementation, the obtaining a target representation segment according to the current playback time and the target representation includes:
obtaining segment information of the target representation, where the segment information includes the playback duration of each segment contained in the target representation;
calculating the playback start time of each segment according to the playback durations of the segments, and determining a first time according to the playback start times and the current playback time, where the first time is the playback start time, among those of all the segments, that is closest to the current playback time; and
determining the segment whose playback start time is the first time as the target representation segment.
In this embodiment of the present invention, the playback start time of each segment can be determined from the playback durations of the segments contained in the target representation. The segment of the target representation whose playback start time is closest to the current playback time is then chosen as the target segment of the video switch and is presented at its playback start time. This keeps the played video content continuous across the viewing angle switch, keeps presentation smooth, and enhances the user experience of video viewing.
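The three steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, times are in seconds, the first segment is assumed to start at time 0, and a tie between two equally close start times is resolved in favor of the earlier segment, which the text does not specify.

```python
from itertools import accumulate


def segment_start_times(durations):
    """Start time of each segment, assuming the first segment starts at 0."""
    if not durations:
        raise ValueError("the target representation has no segments")
    # Segment i starts when all previous segments have finished playing.
    return [0.0] + list(accumulate(durations))[:-1]


def pick_target_segment(durations, current_time):
    """Return (index, start_time) of the segment whose start is closest to current_time."""
    starts = segment_start_times(durations)
    index = min(range(len(starts)), key=lambda i: abs(starts[i] - current_time))
    return index, starts[index]


# Hypothetical target representation: eight segments of 0.5 s each,
# with playback currently at 1.8 s.
index, first_time = pick_target_segment([0.5] * 8, 1.8)
print(index, first_time)
```

With these assumed values the start times are 0, 0.5, 1.0, 1.5, 2.0, and so on, so the closest start time to 1.8 s is 2.0 s, which belongs to the fifth segment (index 4).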
In an implementation of this embodiment of the present invention, for the media presentation description, refer to the MPD examples above.
In an implementation of this embodiment of the present invention, for the switching code stream, refer to the example in Figure 11.
In an implementation of this embodiment of the present invention, the switching instruction information includes information indicating the viewing angle to switch to. From the switching instruction information, the client can determine information about the view code stream and the switching code stream, for example the ID or storage location of the view code stream and the ID or storage location of the switching code stream.
In an implementation of this embodiment of the present invention, the client can obtain, from the switching instruction information, the spatial object associated with the target viewing angle, and then determine a target switching code stream (also called the target representation) from multiple switching code streams according to that spatial object and the spatial object associated with each switching code stream.
After the target switching code stream is determined, the segment of the target switching code stream to be played (that is, the target representation segment) can be determined according to the current playback time. A corresponding HTTP request is then constructed from the URL template included in the MPD to request that segment of the switching code stream.
In an implementation of this embodiment of the present invention, the segment URL can be constructed from the current playback time and the information about the target switching code stream.
For how to construct segment URLs and request segments, refer to the descriptions in the DASH standard or other similar approaches; details are not repeated here.
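As a rough illustration of the URL construction step, the sketch below fills in a DASH-style segment template and builds (but does not send) an HTTP GET request. The base URL, the template string, and the function name are assumptions made for this example. Only the plain `$RepresentationID$` and `$Number$` identifiers are handled; format tags such as `$Number%05d$` are not.

```python
import urllib.request


def build_segment_url(base_url, template, representation_id, number):
    """Substitute the representation id and segment number into a URL template."""
    path = (template
            .replace("$RepresentationID$", representation_id)
            .replace("$Number$", str(number)))
    return base_url.rstrip("/") + "/" + path


# Hypothetical values: switching representation "3", fifth segment.
url = build_segment_url(
    "http://example.com/video/",
    "$RepresentationID$/seg-$Number$.mp4",
    "3",
    5,
)

# The request object is only constructed here; sending it is left to the caller.
request = urllib.request.Request(url, method="GET")
print(request.full_url)
```

The segment number would come from the target segment selected for the current playback time.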
After receiving a segment of the switching code stream, the client can present it directly.
In an implementation of this embodiment of the present invention, the client then also switches from the switching code stream to the view code stream corresponding to the switched-to viewing angle, which ensures a good user experience.
In an embodiment of another aspect of the embodiments of the present invention, a syntax element description for switching point information is also added to the MPD.
This embodiment of the present invention describes a method for switching from the switching code stream to the view code stream. Because the switching code stream and the view code stream cannot be switched between at every segment, this embodiment provides a way to describe switching points: in on-demand scenarios the description information is stored in the media data file, and in live scenarios it is stored in the MPD. Both approaches are compatible with the existing DASH protocol, require minimal changes to existing CDNs and clients, and support switching between the switching code stream and the view code stream.
The switching point information between the view code stream (that is, the non-switching code stream) and the switching code stream is described in the file, with the following syntax:
Figure PCTCN2017086548-appb-000017
In a possible embodiment, a flag value of 1 in the sidx box may indicate that the sidx box contains switching point information, or may indicate the switching information of each segment.
FOV_group_change_Info: identifies information about switching between the current segment and representations with other duration/FOVGroup/FovType attributes.
This information may indicate whether the current segment can be switched with streams of another duration/FOVGroup/FovType. For example, for MPD samples 1 to 3 in the foregoing embodiment, the code stream file video-3.mp4 of Representation id="3" contains the sidx box above; if FOV_group_change_Info=1 is parsed for a segment in that box, the segment can be switched to Representation id="2", and otherwise it cannot. For MPD sample 4 in Embodiment 1, FOV_group_change_Info=1 may indicate that the current segment can be switched with a representation whose attribute is FOVGroup=1.
This information may also be the segment ID, in another duration/FOVGroup/FovType stream, with which the current segment can be switched. For example, FOV_group_change_Info=4 indicates that the current segment can be switched with the 4th segment of the view code stream.
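For the first interpretation (a per-segment 0/1 flag), a client might scan the parsed values to find the next segment at which switching is allowed. The sketch below assumes the FOV_group_change_Info values have already been extracted from the sidx box into a list with one entry per segment; the box parsing itself is not shown, and the function name is hypothetical.

```python
def next_switchable_segment(change_flags, current_index):
    """Return the 0-based index of the first segment at or after current_index
    whose flag is 1, or None if no later segment allows switching."""
    for i in range(current_index, len(change_flags)):
        if change_flags[i] == 1:
            return i
    return None


# Hypothetical per-segment flags of a switching code stream:
# only the 5th and 9th segments allow a switch.
flags = [0, 0, 0, 0, 1, 0, 0, 0, 1]
print(next_switchable_segment(flags, 2))
```

Under the second interpretation, the stored value would be used directly as the segment ID in the view code stream instead of being tested against 1.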
The switching point information between the view code stream and the switching code stream is described in the MPD. The syntax is shown in Table 4 below, which is another syntax information table:
Table 4
Figure PCTCN2017086548-appb-000018
MPD sample 5:
Figure PCTCN2017086548-appb-000019
Figure PCTCN2017086548-appb-000020
In this MPD sample, the code stream with Representation id="3" is the switching code stream. At the segment with SegmentURL media="seg-m1-3.mp4" it can be switched to the view code stream, specifically to the 2nd segment of the view code stream.
In an implementation of this embodiment of the present invention, the FOV_group_change_Info information is added to the existing sidx box; it may also be added to other boxes, for example:
Figure PCTCN2017086548-appb-000021
The semantics of FOV_group_change_Info are the same as in the foregoing embodiment.
In an implementation of this embodiment of the present invention, the client can switch from the switching code stream to the view code stream as follows.
The client obtains the index segment of the switching code stream, parses the sidx information, and obtains the segment switching point information (FOV_group_change_Info).
When the client detects that the switching point information of a segment indicates that the current segment can be switched to a segment of the view code stream, the client uses FOV_group_change_Info or the playback start time of the current segment to find the segment of the view code stream that can be switched with the current segment, and constructs the URL of that view code stream segment. For example, in Figure 11, the client detects the FOV_group_change_Info information of the 5th segment of the switching code stream repA' and determines that a switch to repA is possible at the 5th segment. Based on the playback start time of the 5th segment of repA', the client finds the segment of repA whose start time is closest to it (the 2nd segment of repA) and constructs the URL of that segment. The client then requests the view code stream segment using the constructed URL.
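The repA'/repA example can be checked with a small sketch. Figure 11 is not reproduced here, so the segment durations are assumptions chosen to be consistent with the text: 0.5 s for segments of the switching code stream repA' and 2 s for segments of the view code stream repA. Segment numbers are 1-based, as in the text, and the function name is hypothetical.

```python
def matching_view_segment(switch_segment_no, switch_duration, view_duration, view_count):
    """Return the 1-based number of the view code stream segment whose playback
    start time is closest to the start time of the given switching segment."""
    switch_start = (switch_segment_no - 1) * switch_duration
    best_index = min(
        range(view_count),
        key=lambda i: abs(i * view_duration - switch_start),
    )
    return best_index + 1


# Assumed durations: repA' segments last 0.5 s, repA segments last 2 s.
# The 5th segment of repA' starts at 2.0 s, which is where the 2nd segment of repA starts.
view_segment_no = matching_view_segment(5, 0.5, 2.0, view_count=4)
print(view_segment_no)
```

The resulting segment number would then be used to construct the URL of the view code stream segment to request.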
A second aspect provides a client, which may include:
an obtaining module, configured to parse a media presentation description and obtain identification information, where the identification information identifies a first representation of a video, and the playback duration of a segment described by the first representation is shorter than the playback duration of a segment described by a second representation of the video;
a receiving module, configured to obtain switching instruction information, where the switching instruction information indicates switching from a current spatial object to a target spatial object; and
a determining module, configured to obtain a target representation according to the identification information obtained by the obtaining module and the switching instruction information received by the receiving module, where the target representation corresponds to the target spatial object;
where the obtaining module is further configured to obtain a current playback time of the video, and obtain a target representation segment according to the current playback time and the target representation determined by the determining module.
In a feasible implementation, the identification information includes at least one of: a representation type identifier, a playback duration of a representation segment, and switching point information.
In a feasible implementation, the switching point information identifies switching segment information for switching between the first representation and the second representation,
where the switching segment information includes at least one of: a segment interval, a segment position in the first representation, and a segment position in the second representation;
or
the switching point information is a flag that indicates the switching capability of a segment.
In a possible manner, a flag value of 1 indicates that the current segment can be switched, and a flag value of 0 indicates that the current segment cannot be switched seamlessly.
In a feasible implementation, the identification information is carried, in the media presentation description, in the attribute information of the representation set to which the first representation belongs.
In a feasible implementation, the identification information is carried, in the media presentation description, in the attribute information of the first representation.
In a feasible implementation, the identification information is carried, in the media presentation description, in the attribute information of a segment of the first representation.
In a feasible implementation, the obtaining module is specifically configured to:
obtain segment information of the target representation, where the segment information includes the playback duration of each segment contained in the target representation;
calculate the playback start time of each segment according to the playback durations of the segments, and determine a first time according to the playback start times and the current playback time, where the first time is the playback start time, among those of all the segments, that is closest to the current playback time; and
determine the segment whose playback start time is the first time as the target representation segment.
A third aspect provides a method for processing video data, which may include:
generating, by a server, a first representation of a video according to encoding configuration parameters of the first representation, and generating a second representation of the video according to encoding configuration parameters of the second representation, where the playback duration of a segment described by the first representation is shorter than the playback duration of a segment described by the second representation; and
generating, by the server, a media presentation description, where the media presentation description includes identification information, and the identification information identifies the first representation of the video.
In a feasible implementation, the identification information describes the playback duration of a segment of the first representation and the playback duration of a segment of the second representation,
where the playback duration of a segment of the first representation is shorter than the playback duration of a segment of the second representation of the video.
In a feasible implementation, the identification information describes switching point information of segments of the first representation and the second representation.
In a feasible implementation, the switching point information identifies switching segment information for content switching between the first representation and the second representation,
where the switching segment information includes at least one of: a segment interval, a segment position in the first representation, and a segment position in the second representation;
or
the switching point information is a flag that indicates the switching capability of a segment.
In a possible manner, a flag value of 1 indicates that the current segment can be switched, and a flag value of 0 indicates that the current segment cannot be switched seamlessly.
A fourth aspect provides a server, which may include:
a generating module, configured to generate a first representation of a video according to encoding configuration parameters of the first representation, and generate a second representation of the video according to encoding configuration parameters of the second representation, where the playback duration of a segment described by the first representation is shorter than the playback duration of a segment described by the second representation; and
a description module, configured to generate a media presentation description, where the media presentation description includes identification information, and the identification information identifies the first representation of the video.
In a feasible implementation, the identification information describes the playback duration of a segment of the first representation and the playback duration of a segment of the second representation,
where the playback duration of a segment of the first representation is shorter than the playback duration of a segment of the second representation of the video.
In a feasible implementation, the identification information describes switching point information of segments of the first representation and the second representation.
In a feasible implementation, the switching point information identifies switching segment information for content switching between the first representation and the second representation,
where the switching segment information includes at least one of: a segment interval, a segment position in the first representation, and a segment position in the second representation;
or
the switching point information is a flag that indicates the switching capability of a segment.
In a possible manner, a flag value of 1 indicates that the current segment can be switched, and a flag value of 0 indicates that the current segment cannot be switched seamlessly.
A fifth aspect provides a method for processing video data based on Dynamic Adaptive Streaming over HTTP, which may include:
receiving a media presentation description, where the media presentation description includes at least two representations, each including attribute information describing media data segments, and further includes at least two switching code stream representations, each including attribute information describing data segments of a switching code stream,
where the spatial objects associated with the at least two representations are in one-to-one correspondence with the spatial objects associated with the at least two switching code stream representations, and the playback duration of a media data segment described in a media representation is longer than the playback duration of a data segment of a switching code stream described in the switching code stream representation corresponding to that media representation;
obtaining switching instruction information;
obtaining a target switching code stream representation according to the switching instruction information and the media presentation description, where the target switching code stream representation is one of the at least two switching code stream representations; and
obtaining target switching code stream request information according to the target switching code stream representation, where the request information is used to request some of the data segments of the target switching code stream.
In a feasible implementation, the media presentation description further includes spatial information of the spatial object associated with each switching code stream representation, where the spatial information describes the spatial relationship between that spatial object and its associated content component;
the obtaining a target switching code stream representation according to the switching instruction information and the media presentation description includes:
obtaining spatial information of a target spatial object according to the switching instruction information; and
obtaining the target switching code stream representation according to the spatial information of the target spatial object and the spatial relationship.
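One possible way to carry out the two steps above is to compare rectangles. The text does not specify how spatial information is encoded or matched, so the sketch below rests on assumptions: each spatial object is an (x, y, width, height) rectangle in the coordinates of the full picture, and the switching code stream representation with the largest overlap with the target region is chosen. All names and values are hypothetical.

```python
def overlap_area(a, b):
    """Area of the intersection of two (x, y, width, height) rectangles."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    width = min(ax + aw, bx + bw) - max(ax, bx)
    height = min(ay + ah, by + bh) - max(ay, by)
    return max(width, 0) * max(height, 0)


def pick_switching_representation(target_region, switching_reps):
    """Return the id of the switching representation that best covers target_region."""
    return max(
        switching_reps,
        key=lambda rep_id: overlap_area(target_region, switching_reps[rep_id]),
    )


# Hypothetical spatial objects of two switching code stream representations.
switching_reps = {
    "sw-left": (0, 0, 960, 1080),
    "sw-right": (960, 0, 960, 1080),
}
# Hypothetical target spatial object derived from the switching instruction.
target = (800, 0, 640, 1080)
chosen = pick_switching_representation(target, switching_reps)
print(chosen)
```

Here the target region overlaps the right-hand object more than the left-hand one, so the right-hand switching representation is selected.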
In a feasible implementation, the media presentation description includes information about an adaptation set, where the adaptation set is a data set describing the attributes of media data segments of multiple interchangeable encoded versions of the same media content component,
where the information about the adaptation set includes information about the at least two switching code stream representations.
In a feasible implementation, the media presentation description includes information about a representation, where a representation is a collection and encapsulation of one or more code streams in a transport format,
where the information about the representation includes information about the at least two switching code stream representations.
In a feasible implementation, the information about a switching code stream representation includes at least one of: a code stream type identifier, a playback duration of a code stream segment, and switching point information.
In a feasible implementation, the switching point information identifies switching segment information for content switching between a switching code stream and a non-switching code stream,
where the switching segment information includes at least one of: a code stream segment interval, a code stream segment position in the switching code stream, and a code stream segment position in the non-switching code stream;
or
the switching point information is a flag that indicates the switching capability of a segment.
In a possible manner, a flag value of 1 indicates that the current segment can be switched, and a flag value of 0 indicates that the current segment cannot be switched seamlessly.
A sixth aspect provides a client, which may include:
a receiving module, configured to receive a media presentation description, where the media presentation description includes at least two representations, each representation includes attribute information describing media data segments, the media presentation description further includes at least two switching code stream representations, and each switching code stream representation includes attribute information describing data segments of a switching code stream, where the spatial objects associated with the at least two representations are in one-to-one correspondence with the spatial objects associated with the at least two switching code stream representations, and the playback duration of a media data segment described in a media representation is greater than the playback duration of a data segment of a switching code stream described in the switching code stream representation corresponding to that media representation;
an obtaining module, configured to obtain switching instruction information;
where the obtaining module is further configured to obtain a target switching code stream representation according to the switching instruction information and the media presentation description, where the target switching code stream representation is one of the at least two switching code stream representations; and
the obtaining module is further configured to obtain target switching code stream request information according to the target switching code stream representation, where the switching code stream request information is used to request some of the data segments of the target switching code stream.
In a feasible implementation, the media presentation description further includes spatial information of the spatial object associated with each switching code stream representation, where the spatial information describes the spatial relationship between the spatial object associated with the switching code stream representation and its associated content component;
the obtaining module is specifically configured to:
obtain spatial information of a target spatial object according to the switching instruction information; and
obtain the target switching code stream representation according to the spatial information of the target spatial object and the spatial relationship.
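The selection of a target switching code stream representation from spatial information can be illustrated with a short sketch. It assumes that each switching code stream representation carries a rectangular region (x, y, width, height) as its spatial information and that the target is matched by largest overlap. Neither the rectangle model nor the overlap rule is specified in this text, and the names `Region`, `SwitchingRep` and `select_target_switching_rep` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Rectangular spatial object in panorama pixel coordinates (assumed model)."""
    x: int
    y: int
    w: int
    h: int


@dataclass(frozen=True)
class SwitchingRep:
    """Hypothetical view of one switching code stream representation in the MPD."""
    rep_id: str
    region: Region


def overlap_area(a: Region, b: Region) -> int:
    """Area shared by two rectangles, or 0 if they do not intersect."""
    dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return max(dx, 0) * max(dy, 0)


def select_target_switching_rep(target: Region, reps: list[SwitchingRep]) -> SwitchingRep:
    """Pick the switching representation whose region overlaps the target most."""
    best = max(reps, key=lambda r: overlap_area(target, r.region))
    if overlap_area(target, best.region) == 0:
        raise LookupError("no switching representation covers the target spatial object")
    return best


reps = [
    SwitchingRep("switch_left", Region(0, 0, 960, 540)),
    SwitchingRep("switch_right", Region(960, 0, 960, 540)),
]
# Spatial information of the target spatial object, as derived from the switching instruction.
target = Region(1000, 100, 400, 300)
print(select_target_switching_rep(target, reps).rep_id)
```

A real client would read the regions from the spatial information in the media presentation description rather than hard-coding them.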
In a feasible implementation, the media presentation description includes information of an adaptation set, where the adaptation set is a data set describing attributes of media data segments of a plurality of mutually replaceable encoded versions of the same media content component;
where the information of the adaptation set includes the information of the at least two switching code stream representations.
In a feasible implementation, the media presentation description includes information of a representation, where the representation is a set and encapsulation of one or more code streams in a transmission format;
where the information of the representation includes the information of the at least two switching code stream representations.
In a feasible implementation, the information of a switching code stream representation includes at least one of: a code stream type identifier, a playback duration of a code stream segment, and switching point information.
In a feasible implementation, the switching point information is used to identify switching segment information for content switching between a switching code stream and a non-switching code stream;
where the switching segment information includes at least one of: a code stream segment interval, a code stream segment position of the switching code stream, and a code stream segment position of the non-switching code stream;
or
the switching point information is a flag, where the flag indicates the switching capability of a segment.
In a possible manner, a flag value of 1 indicates that the current segment can be switched, and a flag value of 0 indicates that the current segment cannot be switched seamlessly.
A seventh aspect provides a method for processing video data based on Dynamic Adaptive Streaming over HTTP (DASH), which may include:
receiving a media presentation description, where the media presentation description includes information of at least two representations, each representation includes at least one segment, and the segment duration of a first representation of the at least two representations is less than the segment duration of a second representation;
where the spatial object associated with the first representation corresponds to the spatial object associated with the second representation;
obtaining switching instruction information; and
according to the switching instruction, obtaining a segment of the first representation and, after a preset time, obtaining a segment of the second representation.
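One way to picture the two-stage fetch is the request plan sketched here. It is an illustration under stated assumptions, not the claimed method: times are in milliseconds, segments are numbered from 0, the long segment duration is a whole multiple of the short one, and the hand-over to the second representation happens at the first long-segment boundary at or after the preset time has elapsed. The function name `plan_switch` and all parameter names are hypothetical.

```python
def plan_switch(switch_ms, short_ms, long_ms, preset_ms, long_count=2):
    """Return the ordered (representation, segment_index) requests after a switch.

    "first" is the representation with short segments, "second" the one with
    long segments. Segment indices are 0-based (an assumption of this sketch).
    """
    if long_ms % short_ms != 0:
        raise ValueError("this sketch assumes long_ms is a multiple of short_ms")

    # Short segment that contains the moment of the switching instruction.
    first_short = switch_ms // short_ms

    # First long-segment boundary at or after (switch time + preset time),
    # computed with integer ceiling division.
    handover_ms = -(-(switch_ms + preset_ms) // long_ms) * long_ms

    # Short segments bridge the gap up to the hand-over point.
    requests = [("first", i) for i in range(first_short, handover_ms // short_ms)]

    # From the hand-over point on, fetch long segments of the second representation.
    first_long = handover_ms // long_ms
    requests += [("second", i) for i in range(first_long, first_long + long_count)]
    return requests


# Switch at 2.3 s, 0.5 s short segments, 2 s long segments, 1 s preset time.
for rep, index in plan_switch(2300, 500, 2000, 1000):
    print(rep, index)
```

With these example numbers the client bridges the interval from 2.0 s to 4.0 s with four short segments and then continues with long segments.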
In a feasible implementation, the first representation carries switching point information.
In a feasible implementation, the media presentation description carries identification information;
where the identification information includes at least one of: a representation type identifier, a playback duration of a segment of a representation, and switching point information.
In a feasible implementation, the switching point information is used to identify switching segment information for switching representations between a first code stream and a second code stream;
where the switching segment information includes at least one of: a segment interval, a segment position of the first representation, and a segment position of the second representation;
or
the switching point information is a flag, where the flag indicates the switching capability of a segment.
In a possible manner, a flag value of 1 indicates that the current segment can be switched, and a flag value of 0 indicates that the current segment cannot be switched seamlessly.
In a feasible implementation, the switching point information is carried in a designated box in the first representation.
In a feasible implementation, the designated box is a sidx box included in the first representation, where the sidx box is used to describe segment information.
In a feasible implementation, the representation type identifier is used to identify the first representation.
In a feasible implementation, the media presentation description includes information of an adaptation set, where the adaptation set is a data set describing attributes of media data segments of a plurality of mutually replaceable encoded versions of the same media content component;
where the information of the adaptation set includes the identification information.
In a feasible implementation, the media presentation description includes information of a representation, where the representation is a set and encapsulation of one or more code streams in a transmission format;
where the information of the representation includes the identification information.
In a feasible implementation, the media presentation description includes information of a descriptor, where the descriptor is used to describe spatial information of the associated spatial object;
where the information of the descriptor includes the identification information.
An eighth aspect provides a client, which may include:
a receiving module, configured to receive a media presentation description, where the media presentation description includes information of at least two representations, each representation includes at least one segment, and the segment duration of a first representation of the at least two representations is less than the segment duration of a second representation, where the spatial object associated with the first representation corresponds to the spatial object associated with the second representation;
an obtaining module, configured to obtain switching instruction information;
where the obtaining module is further configured to, according to the switching instruction, obtain a segment of the first representation and, after a preset time, obtain a segment of the second representation.
In a feasible implementation, the first representation carries switching point information.
In a feasible implementation, the media presentation description carries identification information;
where the identification information includes at least one of: a representation type identifier, a playback duration of a segment of a representation, and switching point information.
In a feasible implementation, the switching point information is used to identify switching segment information for switching representations between a first code stream and a second code stream;
where the switching segment information includes at least one of: a segment interval, a segment position of the first representation, and a segment position of the second representation;
or
the switching point information is a flag, where the flag indicates the switching capability of a segment.
In a possible manner, a flag value of 1 indicates that the current segment can be switched, and a flag value of 0 indicates that the current segment cannot be switched seamlessly.
In a feasible implementation, the switching point information is carried in a designated box in the first representation.
In a feasible implementation, the designated box is a sidx box included in the first representation, where the sidx box is used to describe segment information.
In a feasible implementation, the representation type identifier is used to identify the first representation.
In a feasible implementation, the media presentation description includes information of an adaptation set, where the adaptation set is a data set describing attributes of media data segments of a plurality of mutually replaceable encoded versions of the same media content component;
where the information of the adaptation set includes the identification information.
In a feasible implementation, the media presentation description includes information of a representation, where the representation is a set and encapsulation of one or more code streams in a transmission format;
where the information of the representation includes the identification information.
In a feasible implementation, the media presentation description includes information of a descriptor, where the descriptor is used to describe spatial information of the associated spatial object;
where the information of the descriptor includes the identification information.
According to the embodiments of the present invention, the switching code streams and the view code streams included in a video can be identified from the identification information carried in the media presentation description. When the spatial object is switched, the target switching code stream corresponding to the target spatial object can be identified from the plurality of switching code streams of the video, the target segment in the target switching code stream can then be determined according to the video playback time at which the spatial object is switched, and the target segment is presented. Because the playback duration of a segment of a switching code stream is less than that of a segment of a view code stream, the client can first switch to a switching code stream segment with a shorter playback duration when the spatial object is switched, which makes segment switching for the spatial object more efficient and improves the user experience. Further, the segments of the target view code stream corresponding to the target spatial object can be obtained and presented, completing the switch of view code stream segments for the new spatial object. After using the target switching code stream as an intermediate transition during the spatial object switch, the client can switch to playing the target view code stream, which keeps video playback stable after the spatial object is switched and improves the viewing experience.
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of an example framework of DASH standard transmission used for system-layer video streaming media transmission;
FIG. 2 is a schematic structural diagram of an MPD in DASH standard transmission used for system-layer video streaming media transmission;
FIG. 3 is a schematic diagram of code stream segment switching according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a segment storage manner in code stream data;
FIG. 5 is another schematic diagram of a segment storage manner in code stream data;
FIG. 6 is a schematic diagram of the spatial relationship of spatial objects;
FIG. 7 is a schematic diagram of a spatial object change corresponding to a view change;
FIG. 8 is a schematic flowchart of a method for processing video data according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a spatial object according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of segments of a DASH code stream;
FIG. 11 is another schematic diagram of segments of a DASH code stream;
FIG. 12 is another schematic diagram of a spatial object change corresponding to a view change;
FIG. 13 is a schematic structural diagram of a client according to an embodiment of the present invention;
FIG. 14 is a schematic structural diagram of a server according to an embodiment of the present invention;
FIG. 15 is another schematic structural diagram of a client according to an embodiment of the present invention;
FIG. 16 is another schematic structural diagram of a client according to an embodiment of the present invention.
DETAILED DESCRIPTION
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Current client-driven system-layer video streaming media transmission schemes may use the DASH standard framework, as shown in FIG. 1, which is a schematic diagram of an example framework of DASH standard transmission used for system-layer video streaming media transmission. The data transmission process of the system-layer video streaming media transmission scheme includes two processes: a process in which the server side (for example, an HTTP server or a media content preparation server, hereinafter referred to as the server) generates media data for video content and responds to client requests, and a process in which the client (for example, an HTTP streaming media client) requests and obtains the media data from the server. The media data includes a media presentation description (MPD) file and media code streams. The MPD on the server includes a plurality of representations, and each representation describes a plurality of segments. The HTTP streaming request control module of the client obtains the MPD sent by the server and parses it to determine the information of each segment of the video code stream described in the MPD, and thereby determines the segment to request. The client then requests the corresponding segment from the server through the HTTP request receiving end, and the media player decodes and plays the segment.
1) In the process in which the server generates media data for video content, the media data generated by the server includes video code streams of different video qualities for the same video content, and the MPD file of those code streams. For example, for the video content of the same episode of a TV series, the server generates a code stream with low resolution, low bit rate and low frame rate (for example, 360p resolution, 300 kbps bit rate, 15 fps frame rate), a code stream with medium resolution, medium bit rate and high frame rate (for example, 720p resolution, 1200 kbps bit rate, 25 fps frame rate), a code stream with high resolution, high bit rate and high frame rate (for example, 1080p resolution, 3000 kbps bit rate, 25 fps frame rate), and so on.
In addition, the server generates an MPD file for the video content of the episode. FIG. 2 is a schematic structural diagram of an MPD in the DASH standard of the system transmission scheme. The MPD of the code streams includes a plurality of periods (Period). For example, the part with period start=100s in the MPD of FIG. 2 may include a plurality of adaptation sets, and each adaptation set may include a plurality of representations such as Representation1, Representation2, and so on. Each representation describes one or more segments of a code stream.
In an embodiment of the present invention, each representation describes information of several segments (Segment) in time order, for example, an initialization segment (Initialization Segment), Media Segment 1, Media Segment 2, ..., Media Segment 20. A representation may include segment information such as the playback start time, the playback duration, and the network storage address (for example, a network storage address expressed as a Uniform Resource Locator (URL)).
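The Period, adaptation set and representation hierarchy described above can be walked with a few lines of code. The sketch below uses Python's standard `xml.etree.ElementTree` and the DASH MPD namespace `urn:mpeg:dash:schema:mpd:2011`. The embedded MPD is an abbreviated, hand-written sample (it omits attributes a real MPD requires and is not taken from FIG. 2), and `list_representations` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace used by DASH MPD documents.
DASH_NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Abbreviated sample MPD for illustration only; not schema-complete.
SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period start="PT100S">
    <AdaptationSet mimeType="video/mp4">
      <Representation id="Representation1" bandwidth="3000000" width="1920" height="1080"/>
      <Representation id="Representation2" bandwidth="1200000" width="1280" height="720"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def list_representations(mpd_xml):
    """Return (period start, representation id, bandwidth in bit/s) for every representation."""
    root = ET.fromstring(mpd_xml)
    found = []
    for period in root.findall("mpd:Period", DASH_NS):
        for adaptation_set in period.findall("mpd:AdaptationSet", DASH_NS):
            for rep in adaptation_set.findall("mpd:Representation", DASH_NS):
                found.append((period.get("start"), rep.get("id"), int(rep.get("bandwidth"))))
    return found


for start, rep_id, bandwidth in list_representations(SAMPLE_MPD):
    print(start, rep_id, bandwidth)
```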
2) In the process in which the client requests and obtains media data from the server, when a user chooses to play a video, the client obtains the corresponding MPD from the server according to the video content requested by the user. According to the network storage address of a code stream segment described in the MPD, the client sends the server a request to download the code stream segment at that address, and the server sends the code stream segment to the client in response. After obtaining the code stream segment, the client can decode and play it through the media player.
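As a rough illustration of how a client might turn segment address information into a download request, the sketch below resolves a segment URL from a URL template. It assumes the MPD describes segment addresses with a template using the DASH identifiers `$RepresentationID$` and `$Number$`, and handles only these two plain identifiers. The host name, the paths and the helper name `segment_url` are hypothetical.

```python
from urllib.parse import urljoin


def segment_url(mpd_url, template, rep_id, number):
    """Expand a segment URL template and resolve it against the MPD location."""
    path = template.replace("$RepresentationID$", rep_id).replace("$Number$", str(number))
    # Relative segment paths are resolved against the URL the MPD was fetched from.
    return urljoin(mpd_url, path)


mpd_url = "https://example.com/video/ep1/manifest.mpd"  # hypothetical address
template = "$RepresentationID$/seg-$Number$.m4s"        # hypothetical template

# The client would issue an HTTP GET for this URL and pass the response body to the media player.
print(segment_url(mpd_url, template, "rep2", 4))
```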
The system-layer video streaming media transmission scheme uses the DASH standard: the client parses the MPD, requests video data from the server on demand, and receives the data sent by the server, thereby implementing the transmission of video data.
Refer to FIG. 3, which is a schematic diagram of code stream segment switching according to an embodiment of the present invention. The server may prepare code stream data of three different video qualities for the same video content (for example, a movie) and describe them in the MPD using three Representations. The three Representations (hereinafter referred to as rep) are assumed to be rep1, rep2 and rep3, where rep1 is high-definition video with a bit rate of 4 Mbps (megabits per second), rep2 is standard-definition video with a bit rate of 2 Mbps, and rep3 is ordinary video with a bit rate of 1 Mbps. Each segment of a rep contains the video code stream of one time period, and segments of different reps that cover the same time period are aligned with each other. That is, each rep describes the segment of each time period in time order, and segments of the same time period have the same length, which makes it possible to switch content between segments of different reps. In the figure, the shaded segments are the segment data that the client requests for playback. The first three segments requested by the client are segments of rep3. For the fourth segment, the client may request the fourth segment of rep2, and so switch to playing the fourth segment of rep2 after the third segment of rep3 finishes playing. The playback end point of the third segment of rep3 (in time, its playback end time) is the playback start point of the fourth segment (in time, its playback start time), and is also the playback start point of the fourth segment of rep2 or rep1, so that the segments of different reps are aligned. After requesting the fourth segment of rep2, the client switches to rep1 and requests the fifth and sixth segments of rep1. It may then switch to rep3 and request the seventh segment of rep3, and then switch to rep1 and request the eighth segment of rep1.
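The request sequence of FIG. 3 can be reproduced with a small sketch. The text does not say why the client switches between reps, so the sketch assumes a simple rule: for each aligned segment, pick the highest bit rate that does not exceed the measured throughput. The bit rates of rep1, rep2 and rep3 come from the paragraph above. The throughput samples, the selection rule and the function names are hypothetical.

```python
# Bit rates from the example: rep1 = 4 Mbps, rep2 = 2 Mbps, rep3 = 1 Mbps.
BITRATES_KBPS = {"rep1": 4000, "rep2": 2000, "rep3": 1000}


def pick_rep(throughput_kbps):
    """Highest-bit-rate rep that fits the throughput, else the lowest-bit-rate rep."""
    affordable = [rep for rep, kbps in BITRATES_KBPS.items() if kbps <= throughput_kbps]
    if not affordable:
        return min(BITRATES_KBPS, key=BITRATES_KBPS.get)
    return max(affordable, key=BITRATES_KBPS.get)


def plan_requests(throughputs_kbps):
    """One (rep, segment number) request per aligned time period, numbered from 1."""
    return [(pick_rep(t), n) for n, t in enumerate(throughputs_kbps, start=1)]


# Hypothetical throughput measured before each of the eight segments.
samples = [1500, 1200, 1800, 2500, 5000, 4200, 1100, 6000]
for rep, number in plan_requests(samples):
    print(f"segment {number}: {rep}")
```

Because segments of the same time period are aligned across reps, the client can change rep at any segment boundary without a gap or overlap in playback time.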
It should be noted that, in existing DASH code streams, switching between segments of different reps is possible only after a segment of the previous rep (for example, the third segment of rep3 in FIG. 3, denoted segment3) has finished playing; only then can playback switch to the specified segment of the next rep (for example, the fourth segment of rep2 in FIG. 3, denoted segment4). In addition, the video content of segment3 and segment4 must be continuous in the time domain, that is, the playback end time of segment3 is the playback start time of segment4.
The segments of each rep may be stored end to end in a single file, or each may be stored independently as a small file. A segment may be encapsulated in the format defined in ISO/IEC 14496-12 (ISO BMFF, Base Media File Format) or in the format defined in ISO/IEC 13818-1 (MPEG-2 TS). The choice depends on the requirements of the actual application scenario and is not limited here.
The DASH media file format describes two ways of storing segments. In the first, each segment is stored separately, as shown in FIG. 4, which is a schematic diagram of a segment storage manner in code stream data. In the second, all segments of the same rep are stored in one file, as shown in FIG. 5, which is another schematic diagram of a segment storage manner in code stream data. In FIG. 4, each segment of repA is stored as a separate file, and each segment of repB is also stored as a separate file. For the storage manner shown in FIG. 4, the server may describe information such as the URL of each segment in the MPD of the code stream in the form of a template or a list. In FIG. 5, all segments of rep1 are stored in one file, and all segments of rep2 are stored in one file. For the storage manner shown in FIG. 5, the server may use an index segment (that is, sidx in FIG. 5) in the MPD of the code stream to describe the information of each segment. The index segment describes, for each segment, its byte offset in the file in which it is stored, its size, and its duration (also called the playback duration of the segment, or duration for short).
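For the single-file storage manner, the offset and size information in the index lets a client fetch one segment with an HTTP byte-range request. The sketch below is a simplified model, not a sidx parser: it assumes the index has already been decoded into the byte offset of the first segment and a list of segment sizes, and derives the inclusive byte range of each segment. The names `byte_ranges` and `range_header` and the sample numbers are hypothetical.

```python
from itertools import accumulate


def byte_ranges(first_offset, sizes):
    """Inclusive (start, end) byte range of each segment stored back to back in one file."""
    # Running boundaries: first_offset, first_offset + size0, first_offset + size0 + size1, ...
    boundaries = list(accumulate(sizes, initial=first_offset))
    return [(start, end - 1) for start, end in zip(boundaries, boundaries[1:])]


def range_header(first_offset, sizes, index):
    """Value of the HTTP Range header that selects segment `index` (0-based)."""
    start, end = byte_ranges(first_offset, sizes)[index]
    return f"bytes={start}-{end}"


# Hypothetical index: media data starts at byte 800, three segments follow.
sizes = [1000, 1500, 1200]
print(byte_ranges(800, sizes))
print(range_header(800, sizes, 1))
```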
With the growing popularity of applications for watching VR video such as 360-degree video, more and more users are experiencing VR video with a large viewing angle. This new way of watching video brings users new viewing modes and visual experiences, and it also brings new technical challenges. When a video with a large viewing angle such as 360 degrees is watched (the embodiments of the present invention use 360 degrees as an example), the presentation space of the VR video is a 360-degree space, which exceeds the normal visual range of the human eye. The user therefore changes the viewing angle (that is, the field of view, FOV) at any time while watching the video. Different viewing angles show different video images, so the content being played needs to change as the user's viewing angle changes. FIG. 7 is a schematic diagram of a spatial object change corresponding to a view change. Box 1 and box 2 are the spatial objects corresponding to two different viewing angles of the user, and different spatial objects display different segments of the video code stream. While watching the video, the user can switch the viewing angle from box 1 to box 2 by turning the eyes or head, or by switching the picture on the video viewing device. When the user's viewing angle is box 1, the video image being watched is the image presented by the content of one segment of the video code stream. At the next moment the user's viewing angle switches to box 2, and the video image being watched should switch to the image presented at that moment by the spatial object corresponding to box 2, which is the image presented by the content of another segment. For the user to see the switched video image quickly, the client needs to switch playback between segments of the video code stream faster and more smoothly. The method and apparatus for processing video data provided by the embodiments of the present invention offer a more efficient way of switching video code stream segments on a view change, with a better visual experience.
The video data processing method and apparatus provided by the embodiments of the present invention are described below with reference to FIG. 8 to FIG. 16.
FIG. 8 is a schematic flowchart of a video data processing method according to an embodiment of the present invention. The method provided by this embodiment includes the following steps:
S801. Parse the media presentation description to obtain identification information.
In some feasible implementations, to output a 360-degree large-view video image, the server may divide the space within the 360-degree viewing range into a plurality of spatial objects, each corresponding to one sub-view of the user, for example spatial object 1 corresponding to box 1 and spatial object 2 corresponding to box 2 in FIG. 7. Further, the server may prepare a set of video code streams for each spatial object. Specifically, the server may obtain the encoding configuration parameters of each code stream in the video and generate the code stream corresponding to each spatial object of the video according to those parameters. When outputting video, the client may request from the server the video segment corresponding to a given sub-view for a given time period and output it to the spatial object corresponding to that sub-view. If the client outputs, within the same time period, the video segments corresponding to all sub-views within the 360-degree viewing range, the complete video image for that time period is displayed over the entire 360-degree space.
In a specific implementation, to divide the 360-degree space, the client may first map the sphere onto a plane and divide the space on the plane. Specifically, the client may map the sphere to a longitude-latitude plan using longitude-latitude mapping. FIG. 9 is a schematic diagram of spatial objects according to an embodiment of the present invention. The client may map the sphere to a longitude-latitude plan and divide the plan into a plurality of spatial objects A to I. The client may also map the sphere to a cube and unfold the faces of the cube into a plan, or map the sphere to another polyhedron and unfold its faces into a plan. Other mappings from the sphere to a plane may also be used, as determined by the requirements of the actual application scenario; no limitation is imposed here. The following description uses longitude-latitude mapping with reference to FIG. 9.
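The longitude-latitude division can be sketched as follows. This minimal Python example maps a viewing direction to one of nine spatial objects, assuming a 3 x 3 grid labelled A to I in row-major order from the top-left corner; the grid layout, the angle conventions, and the function name are assumptions for illustration, since FIG. 9 itself is not reproduced here.

```python
def tile_for_direction(lon_deg, lat_deg, cols=3, rows=3):
    """Map a viewing direction to a tile label on a longitude-latitude plan.

    Assumed conventions: longitude in [-180, 180) grows left to right,
    latitude in [-90, 90] grows bottom to top, tiles are labelled
    A, B, C, ... row by row starting at the top-left corner.
    """
    x = ((lon_deg + 180.0) % 360.0) / 360.0  # 0..1, left to right, wraps around
    y = (90.0 - lat_deg) / 180.0             # 0..1, top to bottom
    col = min(int(x * cols), cols - 1)
    row = min(max(int(y * rows), 0), rows - 1)
    return chr(ord("A") + row * cols + col)


centre_tile = tile_for_direction(0.0, 0.0)
top_left_tile = tile_for_direction(-180.0, 90.0)
bottom_right_tile = tile_for_direction(179.0, -89.0)
```

A cube or other polyhedral mapping would need a different function, but the result is the same kind of lookup from viewing direction to spatial object.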
As shown in FIG. 9, after the client divides the spherical space into the spatial objects A to I, the server may prepare a set of DASH code streams for each spatial object. Each spatial object corresponds to one sub-view, and the set of DASH code streams corresponding to a spatial object is the view code stream of that sub-view. The view code stream of each sub-view is part of the entire video code stream, and the view code streams of all sub-views together constitute the complete video code stream. That is, in a specific implementation, the set of DASH code streams corresponding to each spatial object is a view code stream, the entire video can be divided into multiple view code streams, and the view code stream corresponding to a particular spatial object (referred to as the specified spatial object) may be called the specified view code stream. During playback, the DASH code streams corresponding to one or more spatial objects may be selected for playing according to the user's current viewing angle. When the user switches the viewing angle, the client may determine, according to the new viewing angle selected by the user, the DASH code stream corresponding to the target spatial object of the switch (also called the target view code stream), and then switch the played video content to that DASH code stream. FIG. 10 is a schematic diagram of the segments of DASH code streams.
In FIG. 10, the 9 view code streams repA to repI correspond to the 9 spatial objects A to I in the longitude-latitude plan, respectively. repA is any one of the set of DASH code streams corresponding to spatial object A, and the embodiments of the present invention use repA as an example. Likewise, each of repB to repI is any one of the set of DASH code streams corresponding to its spatial object, and the embodiments use repB, repC, ..., and repI as examples. The segments contained in the view code streams of the sub-views are aligned, that is, the segments contained in the view code streams within the same time period have the same length. Because the segments of different view code streams are aligned, the video content can be switched seamlessly between segments of different view code streams as the viewing angle changes. For example, the user switches to the 4th segment of repB after the 3rd segment of repD finishes playing, and then switches to the 6th segment of repC when the 5th segment of repB finishes playing. The video image presented by the client switches from the picture of view D to the picture of view B, and then to the picture of view C.
It should be noted that, in the view code stream switching shown in FIG. 10, if the client has just started playing the 3rd segment of repD, the duration of that segment is 5 seconds, and the user switches the viewing angle from view D to view B, the client must wait until the 3rd segment finishes playing before it can switch to the 4th segment of repB, so the user has to wait 5 s to see the video image of view B. In VR video viewing, this 5 s delay is uncomfortable for the user; discomfort typically arises once the delay exceeds 200 ms. Simply shortening the segment duration of the view code stream, for example to 200 ms, would shorten the time taken to present the new view after a switch, but it would seriously degrade the compression performance of the video: at the same target bit rate, the video quality of 200 ms segments is much worse than that of 5 s segments. Maintaining video quality would then require greater transmission bandwidth or higher compression performance, raising the requirements on both and increasing the video output cost of view switching.
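The waiting time discussed above is simply the distance from the moment of the view change to the next segment boundary. The following minimal Python sketch computes it in integer milliseconds; the function name and the sample switch time are assumptions for illustration, while the 5 s and 200 ms segment durations are the ones used in the text.

```python
def wait_until_boundary_ms(switch_time_ms, segment_ms):
    """Time from a view change until the next segment boundary, in milliseconds."""
    elapsed_in_segment = switch_time_ms % segment_ms
    return 0 if elapsed_in_segment == 0 else segment_ms - elapsed_in_segment


# View change 1 ms after the 3rd segment (starting at 10 000 ms) begins playing.
wait_with_5s_segments = wait_until_boundary_ms(10_001, 5_000)
wait_with_200ms_segments = wait_until_boundary_ms(10_001, 200)
```

With 5 s segments the wait is close to the full segment duration, whereas with 200 ms segments it stays below the discomfort threshold mentioned above, at the cost of compression efficiency.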
The embodiments of the present invention provide a switching code stream (referred to as the first representation, or the switching code stream representation) whose segment duration differs from that of the view code stream: the segments of a switching code stream are shorter than the segments of its corresponding view code stream. Each group of switching code streams corresponds to one group of view code streams, a group of switching code streams contains one or more switching code streams, and each group of switching code streams corresponds to one spatial object. A switching code stream and its corresponding view code stream are associated with the same spatial object, that is, the code stream segments of the switching code stream and of its corresponding view code stream for the same time period contain the same video content.
In some feasible implementations, while preparing the view code streams of the video code stream data, the server additionally prepares a group of switching code streams for each view, so that each group of view code streams has a corresponding group of switching code streams. A group of view code streams and its corresponding switching code streams cover the same sub-view (that is, the same spatial object); the only difference is that the segments of the view code stream are longer and the segments of the switching code stream are shorter. The server may obtain the encoding configuration parameters of the view code stream (referred to as the second encoding configuration parameters) and those of the switching code stream (referred to as the first encoding configuration parameters), generate the first representation according to the first encoding configuration parameters, and generate the second representation according to the second encoding configuration parameters. The first encoding configuration parameters may include the playback duration (the first playback duration) of the segments of the first representation (the first representation segments), the first spatial object corresponding to the first representation, and so on. The second encoding configuration parameters may include the playback duration (the second playback duration) of the segments of the second representation (the second representation segments), the second spatial object corresponding to the second representation, and so on. When generating the MPD, the server may add identification information to the MPD to identify the switching code streams in the video. The client may parse the MPD sent by the server and distinguish the switching code streams from the view code streams according to the identification information: a code stream described by a rep that carries the identification information may be a switching code stream, or a segment that carries the identification information may be a segment of a switching code stream. The identification information may be a code stream type identifier (also called a representation type identifier), the playback duration of the segments, switching point information, or the like. Specifically, the server may use the identification information to describe, either in the switching code stream or in the MPD, the segment position information at which the switching code stream can switch to the view code stream. Among the segments of a switching code stream there are one or more position points at which a switch to the view code stream is possible (also called switching points, specifically the positions of the segments at which switching can take place), and a view code stream and its corresponding switching code stream can be switched at the segments at these specified switching positions in the switching code stream. When the code stream is switched to a segment of the view code stream at such a segment position, the video content before and after the switch is continuous. In addition, segments are aligned across different view code streams and also across different switching code streams, so segments of different switching code streams can be switched freely, and the video content before and after a switch between a switching code stream and a view code stream is continuous, that is, the video content played after the switch immediately follows the video content played before the switch. FIG. 11 is another schematic diagram of the segments of DASH code streams. repA, repB, repC, and repD are the view code streams corresponding to spatial objects A, B, C, and D, respectively (corresponding to the sub-views in FIG. 9). repA' is one of the group of switching code streams corresponding to spatial object A; repA' and repA correspond to the same sub-view, and repA' may be the switching code stream corresponding to repA. Likewise, repB' may be the switching code stream corresponding to repB, repC' the one corresponding to repC, and repD' the one corresponding to repD. The segments of repA, repB, repC, and repD are aligned, so playback can switch freely among them (that is, the content switches seamlessly) at the playback end time of any segment (which is also the playback start time of the next segment) as the viewing angle changes. The segments of repA', repB', repC', and repD' are likewise aligned, so playback can switch freely among them at the playback end time of any segment. A view code stream can switch to a switching code stream at a specified segment of the switching code stream, such as the segment corresponding to T2 in FIG. 11 (the 2nd segment of the switching code stream, where T2 is the playback start time of that segment). A switching code stream can switch to a segment of a view code stream at a specified switching point, such as T3 or T4 in FIG. 11, where T3 is the playback start time of the second segment of the view code stream.
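The two-stage switch described above (enter the switching code stream of the new view at the next short-segment boundary, then move to the view code stream at the next switching point) can be sketched as follows. This is a minimal Python illustration under two assumptions that the text does not state: switching points coincide with the segment boundaries of the view code stream, and the view segment duration is a whole multiple of the switching segment duration. The function name and the 1 s switching segment duration are hypothetical.

```python
def plan_switch(change_ms, view_segment_ms, switch_segment_ms):
    """Return (enter_switch_ms, enter_view_ms) for a view change at change_ms.

    enter_switch_ms: first switching-stream segment boundary at or after the change.
    enter_view_ms:   first view-stream segment boundary at or after enter_switch_ms,
                     assumed here to be a valid switching point.
    """
    # Ceiling division written with floor division on negated operands.
    enter_switch = -(-change_ms // switch_segment_ms) * switch_segment_ms
    enter_view = -(-enter_switch // view_segment_ms) * view_segment_ms
    return enter_switch, enter_view


# View change at 1.2 s with 5 s view segments and (hypothetical) 1 s switching segments.
enter_switch_ms, enter_view_ms = plan_switch(1_200, 5_000, 1_000)
```

Under these assumptions the new view becomes visible after at most one short segment, while the longer, better-compressed view code stream takes over at the next switching point.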
In some feasible implementations, after preparing the view code streams of the video data and the switching code stream corresponding to each view code stream, the server describes the view code streams and the switching code streams in the MPD. The client requests the MPD from the server, parses the MPD sent by the server, and obtains the identification information of the switching code streams from it. The client may also obtain from the MPD the view code stream information of the view code streams, such as repA, repB, repC, and repD. The view code stream information may include the duration of each segment in the view code stream, the URL of each segment, and so on; for details, see the segment information described in the DASH standard. The client may also obtain from the MPD the switching code stream information of the switching code streams, such as repA', repB', repC', and repD'. The switching code stream information may include the duration of each segment in the switching code stream, the URL of each segment, and so on. In addition, the switching code stream information includes the identification information used to identify the switching code stream. The representation type identifier identifies the first representation: on receiving a spatial object switching instruction, the client preferentially selects segments of the first representation corresponding to the spatial object being switched to in order to switch the video content. The client may also determine which code streams in the video are switching code streams and which are view code streams according to the playback duration of their segments. The switching point information identifies the switching segment information for seamless content switching between the switching code stream and the view code stream, including the segment interval of the switching code stream at which it can switch to the view code stream, the segment position in the switching code stream at which it switches to the view code stream, and the segment position in the view code stream to which the switching code stream switches. In a specific implementation, the identification information may be carried in the attribute information of the code stream set containing the switching code stream in the media presentation description (for example, the attribute information of the adaptation set), in the attribute information of the switching code stream in the media presentation description (for example, the attribute information of the representation), or in the attribute information of a code stream segment of the switching code stream in the media presentation description (for example, the attribute information of the segment). The identification information may also be carried in the index segment of the target switching code stream to which the video content is to be switched.
In some feasible implementations, the representation type identifier may be a syntax element newly added to the MPD, which identifies the code stream described by the rep carrying it as a switching code stream. In a specific implementation, the client can use the added syntax element to quickly distinguish switching code streams from view code streams, and when the viewing angle switches, select from the switching code streams the target switching code stream corresponding to the target spatial object of the switch, so as to access the new view quickly and present its video data. The syntax elements may include FovType, FovGroup, FOV_group_change_Info, and so on. Several feasible ways of describing these MPD syntax elements are given below:
Manner 1:
Table 2 below is an attribute information table of a syntax element:
Table 2
Figure PCTCN2017086548-appb-000022
The client may parse the MPD of the video code stream. If parsing finds that a representation carries FovType (the value of FovType is not restricted here), the client may determine that the code stream described by that representation is a switching code stream. When the viewing angle, bit rate, and other parameters are the same, the client prefers that representation for presenting the new view, which improves view switching efficiency and enhances the user experience.
MPD example 1:
Figure PCTCN2017086548-appb-000023
In this MPD example, the Representation with id="3" carries fovType='1', which marks the code stream it describes as a switching code stream. The Representation with id="2" omits fovType, so fovType=0 applies by default, which marks the code stream it describes as a view code stream. The rest of the example follows the format of the corresponding MPD descriptions in the DASH standard, to which reference may be made; no limitation is imposed here. The same applies to the examples below and is not repeated.
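How a client might read this attribute can be sketched in Python with the standard `xml.etree.ElementTree` module. The MPD fragment below is a hypothetical reconstruction based only on the description above (the original example is available only as an image), and the bandwidth values and the function name are assumptions; fovType is the extension attribute proposed in this document, not part of the DASH standard.

```python
import xml.etree.ElementTree as ET

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Hypothetical fragment reconstructed from the prose, not the original figure.
MPD_XML = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet id="1">
      <Representation id="2" bandwidth="5000000"/>
      <Representation id="3" bandwidth="5000000" fovType="1"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def classify_representations(mpd_xml):
    """Map each Representation id to 'switching' or 'view' using fovType."""
    root = ET.fromstring(mpd_xml)
    kinds = {}
    for rep in root.iterfind(".//mpd:Representation", NS):
        # A missing fovType is treated as fovType=0, i.e. a view code stream.
        fov_type = rep.get("fovType", "0")
        kinds[rep.get("id")] = "switching" if fov_type != "0" else "view"
    return kinds


kinds = classify_representations(MPD_XML)
```

Here `kinds` maps id "2" to a view code stream and id "3" to a switching code stream, matching the description above.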
MPD example 2:
Figure PCTCN2017086548-appb-000024
In this MPD example, the attribute information of the AdaptationSet with id="2" carries fovType, which marks the code streams described by all reps under that AdaptationSet as switching code streams. The attribute information of the AdaptationSet with id="1" omits fovType, so fovType=0 applies by default, which marks the code streams described by all reps under that AdaptationSet as not being switching code streams.
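When fovType is carried at the AdaptationSet level, every Representation beneath it inherits the marking. A minimal Python sketch of that inheritance follows; the MPD fragment is a hypothetical reconstruction from the description above, and letting a Representation-level fovType override the inherited value is an assumption of this sketch rather than something the text specifies.

```python
import xml.etree.ElementTree as ET

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Hypothetical fragment reconstructed from the prose, not the original figure.
MPD_XML = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet id="1">
      <Representation id="1"/>
      <Representation id="2"/>
    </AdaptationSet>
    <AdaptationSet id="2" fovType="1">
      <Representation id="3"/>
      <Representation id="4"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def switching_representation_ids(mpd_xml):
    """Return ids of Representations marked as switching code streams."""
    root = ET.fromstring(mpd_xml)
    ids = []
    for adaptation_set in root.iterfind(".//mpd:AdaptationSet", NS):
        inherited = adaptation_set.get("fovType", "0")  # default: not switching
        for rep in adaptation_set.iterfind("mpd:Representation", NS):
            # Assumption: a Representation-level value overrides the inherited one.
            if rep.get("fovType", inherited) != "0":
                ids.append(rep.get("id"))
    return ids


switching_ids = switching_representation_ids(MPD_XML)
```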
Manner 2:
Table 3 below is an attribute information table of another syntax element:
Table 3
Figure PCTCN2017086548-appb-000025
Here, switch-representation identifies a representation whose content is the same as that of the other representations in the same adaptation set, but not all of its segments can be switched seamlessly with segments of other representations: it can switch with other representations only at specified segments, which indicates that the representation is a switching code stream. When the viewing angle switches, the client first fetches segments of this representation to present the new view.
MPD example 3:
Figure PCTCN2017086548-appb-000026
Figure PCTCN2017086548-appb-000027
In this MPD example, a new representation type, switch-representation, is added. switch-representation may serve as the type identifier of the description layer to which the switching code stream belongs, and it marks the code stream described by the switch-representation with id="3" as a switching code stream.
Manner 3:
A new syntax element, FovGroup, is added to the MPD to divide representations into groups: one group consists of the view code streams, that is, the existing representations; the other group consists of the newly added code streams, that is, the switching code streams.
MPD example 4:
Figure PCTCN2017086548-appb-000028
Figure PCTCN2017086548-appb-000029
In this MPD, grouping information is added to each representation, and the groups within which segments can be switched freely are determined from it. FovGroup="2" marks the group of switching code streams, and FovGroup="1" marks the group of view code streams. Representations within the same group can be switched freely: segments of representations that are all view code streams can be switched freely, and segments of representations that are all switching code streams can be switched freely. Representations in different groups can be switched only at specified segments. For example, the representations with id="3" and id="5" both have FovGroup="2"; both describe switching code streams, their segments are aligned, and they can be switched seamlessly.
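The grouping rule can be sketched in Python as a lookup of each representation's FovGroup followed by a same-group check. The MPD fragment is a hypothetical reconstruction from the description above; representations with id "2" and "4" and their group values are assumptions added to make the example complete.

```python
import xml.etree.ElementTree as ET

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Hypothetical fragment reconstructed from the prose, not the original figure.
MPD_XML = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet id="1">
      <Representation id="2" FovGroup="1"/>
      <Representation id="3" FovGroup="2"/>
    </AdaptationSet>
    <AdaptationSet id="2">
      <Representation id="4" FovGroup="1"/>
      <Representation id="5" FovGroup="2"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def fov_groups(mpd_xml):
    """Map each Representation id to its FovGroup value."""
    root = ET.fromstring(mpd_xml)
    return {
        rep.get("id"): rep.get("FovGroup")
        for rep in root.iterfind(".//mpd:Representation", NS)
    }


def can_switch_freely(groups, rep_a, rep_b):
    """Same group: any segment boundary; different groups: specified segments only."""
    return groups[rep_a] == groups[rep_b]


groups = fov_groups(MPD_XML)
free_between_3_and_5 = can_switch_freely(groups, "3", "5")
free_between_2_and_3 = can_switch_freely(groups, "2", "3")
```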
In some feasible implementations, the identification information carried in the MPD may be an existing syntax element of the MPD, for example the playback duration (duration) attribute of the segments. The client may parse the duration attribute of the segments contained in the MPD and treat the code stream whose segments have the shortest playback duration as the switching code stream.
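Identifying the switching code stream from segment durations alone can be sketched as follows. This minimal Python example assumes that each Representation carries a SegmentTemplate with the standard DASH duration and timescale attributes; the representation ids and the 5 s and 1 s durations are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Hypothetical fragment; ids and durations are illustrative only.
MPD_XML = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet id="1">
      <Representation id="repA">
        <SegmentTemplate timescale="1000" duration="5000"/>
      </Representation>
      <Representation id="repA_sw">
        <SegmentTemplate timescale="1000" duration="1000"/>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>"""


def shortest_segment_representations(mpd_xml):
    """Return ids of the Representations whose segments are shortest."""
    root = ET.fromstring(mpd_xml)
    seconds = {}
    for rep in root.iterfind(".//mpd:Representation", NS):
        template = rep.find("mpd:SegmentTemplate", NS)
        # Segment duration in seconds = duration / timescale (timescale defaults to 1).
        seconds[rep.get("id")] = (
            int(template.get("duration")) / int(template.get("timescale", "1"))
        )
    shortest = min(seconds.values())
    return [rep_id for rep_id, value in seconds.items() if value == shortest]


switching_candidates = shortest_segment_representations(MPD_XML)
```

This approach needs no new syntax element, but it relies on the switching code streams really having the shortest segments in the presentation.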
In some feasible implementations, after parsing the MPD of the video code stream and determining the type of the code stream described by each representation in the MPD, the client may request and play the relevant view code streams according to the user's viewing angle, and switch playback between view code streams and switching code streams. In a specific implementation, after obtaining the view code stream information of the view code stream corresponding to each view, the client may first determine the spatial object (the current spatial object) corresponding to the user's current viewing angle (the first view), and then, according to the spatial object corresponding to each view code stream described in the MPD, determine the first view code stream (also called the current view code stream) corresponding to the first view. The client may then request the first view code stream from the server according to its view code stream information. On receiving the request, the server sends the first view code stream to the client, which decodes and plays it. For example, if the first view code stream is repD in FIG. 10, the client, after obtaining repD, may start playing it from its first segment (which may be denoted segmentD1).
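The request step can be sketched in Python as a lookup from the current spatial object to its view code stream, followed by expansion of a segment URL template. `$RepresentationID$` and `$Number$` are standard DASH template identifiers; the template string, the mapping table, and the function names are hypothetical.

```python
# Hypothetical mapping from spatial object to view code stream, as would be read from the MPD.
VIEW_REP_FOR_OBJECT = {label: f"rep{label}" for label in "ABCDEFGHI"}

# Hypothetical URL template; only the $...$ identifiers come from the DASH standard.
SEGMENT_TEMPLATE = "video/$RepresentationID$/seg-$Number$.m4s"


def segment_url(template, rep_id, number):
    """Expand a DASH-style segment template for one representation and segment number."""
    return template.replace("$RepresentationID$", rep_id).replace("$Number$", str(number))


def first_request_for_view(current_object):
    """URL of the first segment of the view code stream for the current spatial object."""
    rep_id = VIEW_REP_FOR_OBJECT[current_object]
    return segment_url(SEGMENT_TEMPLATE, rep_id, 1)


first_url = first_request_for_view("D")
```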
In a specific implementation, the identification information carried in the MPD in the embodiments of the present invention may instead be carried in the .m3u8 file defined by HTTP Live Streaming (HLS) or in the .ismc file of Smooth Streaming (SS), as determined by the requirements of the actual application scenario; no limitation is imposed here. The embodiments of the present invention are described using the example in which the identification information is carried in a DASH code stream.
S802. Obtain switching instruction information.
S803. Determine a target representation from the first representation of the video according to the identification information and the switching instruction information.
In some feasible implementations, FIG. 12 is another schematic diagram of how the spatial object changes as the viewing angle changes. As shown in the figure, the space presented by the VR video is divided into nine spatial objects, spatial object A through spatial object I, and a set of view code streams and switching code streams is prepared for each spatial object. In (a), (b), and (c) of FIG. 12, the dashed box represents the currently presented spatial object (the current spatial object), and the solid box represents the spatial object presented after switching (the target spatial object).
In FIG. 12(a), the viewing angle corresponding to the current spatial object includes spatial objects A, B, D, and E; the viewing angle corresponding to the target spatial object after switching may include spatial objects B, C, E, and F, or may include spatial objects C and F, which is not limited here. In FIG. 12(b), the viewing angle corresponding to the current spatial object includes spatial objects A, B, D, and E; the viewing angle corresponding to the target spatial object after switching may include spatial objects E, F, H, and I, or may include spatial objects F, H, and I, which is not limited here. In FIG. 12(c), the viewing angle corresponding to the current spatial object may include spatial objects A and B; the viewing angle corresponding to the target spatial object after switching includes spatial objects E, F, H, and I, which is not limited here. The switching of video content caused by spatial object switching is described below with reference to step 704.
S804. Obtain the current playback time of the video, and obtain a target representation segment according to the current playback time and the target representation.
In some feasible implementations, the client may monitor the user's viewing angle while playing the first view code stream. If a viewing angle switching instruction is received (that is, switching instruction information for switching from the current video space to the target spatial object is detected), the client may determine the target view code stream to switch to (for example, repB in FIG. 11) according to the new viewing angle information carried in the switching instruction information. In a specific implementation, the new viewing angle information carried in the viewing angle switching request may be the target spatial object of the switch. According to the spatial object corresponding to each view code stream described in the MPD, the client may select, from the view code streams of the video code stream, the target view code stream corresponding to the target spatial object. Further, according to the indication information corresponding to each switching code stream described in the MPD, the client may determine the switching code stream corresponding to the target spatial object (the target code stream, or target representation), that is, select from the switching code streams the target switching code stream corresponding to the target viewing angle (for example, repB' in FIG. 11).
In some feasible implementations, after the client determines the representation to request (the target representation, referred to below as the target switching code stream), it constructs the URL of the segment to request according to the target switching code stream information described in the MPD, requests the target segment from the server using that URL, and obtains and plays the target segment. In a specific implementation, the client may obtain the segment information of each segment of the target switching code stream described in the MPD. The segment information may include the playback duration of each segment (the duration), from which the playback start time of each segment can be calculated; alternatively, the client may calculate the playback start time of each segment from the segment duration information in the sidx box. Then, according to the time at which the viewing angle switching request is received (the time at which the current viewing angle switches to the target spatial object, which may be denoted the switching trigger time or the current playback time), the client selects from the segments of the target switching code stream the segment whose playback start time is closest to the switching trigger time. It determines the playback start time of that segment (the first target segment, referred to as the first segment) as the time at which playback switches from the first view code stream to the target switching code stream (the first moment). After determining the first segment, the client constructs the URL of the first segment and sends a request for that URL to the server. After receiving the request, the server sends the segment data of that segment to the client. For example, in FIG. 11, the client receives the viewing angle switching request at time T1, determines the first segment (assumed to be the second segment of repB'), and switches to playing the video data of the first segment at time T2.
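The selection of the first segment described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names and the 0.5 s segment duration are hypothetical, and "closest to the switching trigger time" is interpreted as the first segment that starts at or after the trigger, which is consistent with the T1/T2 example but remains an assumption.

```python
from bisect import bisect_left
from itertools import accumulate


def segment_start_times(durations):
    """Playback start time of each segment, from per-segment durations (seconds)."""
    return list(accumulate([0.0] + list(durations[:-1])))


def pick_first_segment(durations, trigger_time):
    """Return (index, start_time) of the first segment starting at or after trigger_time.

    Assumption: a segment that has already started cannot be joined mid-way,
    so only start times at or after the trigger are candidates.
    """
    starts = segment_start_times(durations)
    i = bisect_left(starts, trigger_time)
    if i == len(starts):
        raise ValueError("no segment starts at or after the trigger time")
    return i, starts[i]


# Hypothetical switching code stream repB' with eight 0.5 s segments;
# the viewing angle switch is triggered at T1 = 0.3 s.
index, t2 = pick_first_segment([0.5] * 8, trigger_time=0.3)
print(index, t2)  # 1 0.5
```

With these assumed values the client would request the second segment of repB' (index 1) and switch to it at T2 = 0.5 s.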
It should be noted that the target switching code stream is the switching code stream corresponding to the target view code stream. The video content contained in the target switching code stream is the same as that contained in the target view code stream, and the segment playback duration of the target switching code stream is shorter than that of the target view code stream. Because the segments of the switching code stream are shorter than the segments of the view code stream, the client can switch to the new viewing angle, that is, switch to the first segment (assumed to be the second segment of repB'), without waiting for the current segment of the current view code stream (for example, segmentD1) to finish playing. This improves the efficiency of code stream segment switching. In a specific implementation, the video content contained in a switching code stream is the same as that of its corresponding view code stream, and the quality of the video data of the switching code stream may be the same as, or slightly lower than, that of the corresponding view code stream. This ensures that the new viewing angle is presented to the user as a relatively high-quality video image after a fast switch, avoids the discomfort that delay causes the user, and improves the user experience of VR video viewing.
In some feasible implementations, after switching the played video data from the first view code stream to the target switching code stream, the client may request the target view code stream from the server according to the target view code stream information carried in the MPD. In a specific implementation, the client may obtain the description information (also called segment information) of the switching code stream in the MPD, which includes the segment duration information and the spatial information of the switching code stream. The segment duration information describes the duration of the segments of the switching code stream, and the spatial information describes the spatial object corresponding to the switching code stream. The client may also obtain the description information of the target view code stream in the MPD, which includes the segment duration information and the spatial information of the target view code stream. Here the segment duration information describes the duration of the segments of the view code stream, and the spatial information describes the spatial object corresponding to the view code stream. From the segment durations of the target view code stream, the client calculates the playback start time of each segment; from the spatial information, it determines the view code stream that has the same viewing angle as the switching code stream. It then finds, in that view code stream, the segment whose playback start time is closest to the current playback time, and determines the playback start time of that segment as the second moment. The client may request the segment from the server using the segment's URL, receive and decode it, and switch to playing it at the second moment.
Further, in some feasible implementations, the client may calculate the playback start time of each segment of the view code stream from the segment durations of the view code stream, and the playback start time of each segment of the switching code stream from the segment durations of the switching code stream. It may then determine the segment positions at which the playback start times of the target view code stream and the target switching code stream are aligned. Aligned playback start times mean that, when playback switches from the switching code stream to the view code stream at that segment position, the video content played before and after the switch is continuous and not repeated. The client may request the segment from the server using the segment's URL, receive and decode it, and switch to playing it at the second moment.
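The alignment check described above can be illustrated with a short sketch. The function names and the segment durations (0.5 s for the switching code stream, 2 s for the view code stream) are hypothetical; exact fractions are used only so that start times compare equal without floating-point error.

```python
from fractions import Fraction
from itertools import accumulate


def start_times(durations):
    """Playback start time of each segment, from per-segment durations."""
    return list(accumulate(durations[:-1], initial=Fraction(0)))


def aligned_positions(switch_durations, view_durations):
    """Return (switch_index, view_index, time) triples at which a switching-stream
    segment and a view-stream segment start at the same instant."""
    view_index = {t: j for j, t in enumerate(start_times(view_durations))}
    return [
        (i, view_index[t], t)
        for i, t in enumerate(start_times(switch_durations))
        if t in view_index
    ]


# Hypothetical streams: eight 0.5 s switching segments, two 2 s view segments.
positions = aligned_positions([Fraction(1, 2)] * 8, [Fraction(2)] * 2)
for switch_i, view_i, t in positions:
    print(switch_i, view_i, float(t))
```

Under these assumed durations the fifth switching segment (index 4) and the second view segment (index 1) both start at 2 s, so switching there leaves the played content continuous and unrepeated.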
Further, in some feasible implementations, the client may also switch between the target switching code stream and the target view code stream according to switching point information described in the MPD. In the MPD of the video code stream generated by the server, in addition to marking the switching code streams, the server may mark the positions at which each switching code stream can switch to a view code stream, that is, mark information such as the switching points between the switching code stream and the view code stream. Table 4 below describes the switching point indication information for view code streams and switching code streams:
Table 4
Figure PCTCN2017086548-appb-000030
FOV_group_change_Info is used to mark information such as the switching points at which a switching code stream switches to a view code stream. The switching point information identifies the switching segment information with which the first representation (the switching code stream) and the second representation (the view code stream) can switch content seamlessly. The switching segment information includes: the first-representation segment interval at which the first representation switches to the second representation, the first-representation segment position at which the first representation switches to the second representation, and the second-representation segment position to which the first representation switches. Specific MPD examples are given below:
MPD example 5:
Figure PCTCN2017086548-appb-000031
Figure PCTCN2017086548-appb-000032
In this MPD example, the code stream with Representation id="3" is a switching code stream (the target switching code stream, that is, the target code stream). At the segment corresponding to Segment URL media="seg-m1-3.mp4" (the first target code stream segment), playback can switch to the view code stream (the target view code stream), and FOV_group_change_Info="2" directly indicates that the switching code stream can switch to the second segment of the view code stream (the second target code stream segment). Here FOV_group_change_Info="2" marks the position of the target second-representation segment to which the target first representation switches. After parsing the MPD and obtaining this identification information, the client can determine the second target code stream segment directly from it. The switching time between the switching code stream and the view code stream is then given by the playback start time of the second segment of the view code stream.
MPD example 6:
Figure PCTCN2017086548-appb-000033
Figure PCTCN2017086548-appb-000034
In a specific implementation, FOV_group_change_Info in MPD example 6 may instead indicate the interval between switchable segments, that is, the first-representation segment interval at which the target first representation switches to the target second representation. For example, FOV_group_change_Info=4 indicates that the switching code stream can switch to the view code stream every 4 segments. Under this semantics, the client obtains the FOV_group_change_Info information by parsing the MPD, determines the switching segment position information for switching between each switching code stream and its corresponding view code stream, and from that determines the segments at which the switching code stream and its corresponding view code stream switch. If the switching code stream contains more than one switching segment, the switching segment closest to the playback start time of the target switching code stream may be selected as the target first-representation segment, that is, the segment at which the target switching code stream switches to the target view code stream. Under this semantics, FOV_group_change_Info may be placed at the adaptation set or representation syntax level, depending on the actual application scenario, which is not limited here.
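The interval semantics can be sketched as follows. This is an illustration under stated assumptions rather than the patent's method: the function name is hypothetical, segment indices are 0-based, switch points are assumed to fall at every multiple of the interval counted from the first segment of the switching code stream, and the client is assumed to play at least one switching segment before switching.

```python
def next_switch_index(first_played, interval, segment_count):
    """Index of the nearest switching-stream segment, after the one first played,
    at whose start a switch to the view code stream is allowed.

    Assumption: switch points lie at indices 0, interval, 2 * interval, ...
    Returns None when no switch point remains in the stream.
    """
    if interval < 1:
        raise ValueError("interval must be at least 1")
    candidate = (first_played // interval + 1) * interval
    return candidate if candidate < segment_count else None


# FOV_group_change_Info=4: a switch is possible every 4 segments.
# The client entered the switching code stream at its second segment (index 1).
print(next_switch_index(first_played=1, interval=4, segment_count=12))  # 4
```

Here the client would play switching segments 1 to 3 and switch to the view code stream where the fifth switching segment (index 4) would otherwise begin.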
After determining, from the MPD description above, the target switching code stream corresponding to the target view code stream, the client may request the target switching code stream from the server. When it detects the switching point information for switching from the switching code stream to the view code stream, the client, as indicated by that information, requests the second target code stream segment of the target view code stream and presents that segment at its playback start time.
In a specific implementation, the switching point information between the view code stream and the switching code stream may also be described in the sidx box (index segment) data of the code stream. The syntax of the sidx box, as described in ISO/IEC 14496-12, is as follows:
Figure PCTCN2017086548-appb-000035
The syntax elements in the description above have the following meanings:
reference_ID: the ID of the code stream;
timescale: the time unit;
earliest_presentation_time: the earliest presentation time of the code stream described in the index segment, in units of timescale;
first_offset: the starting offset of the first segment after the index segment;
reference_count: the number of segments described in the index segment;
reference_type: 1 indicates that the segment is an index segment, and 0 indicates that the segment is media content;
referenced_size: the size of the segment;
subsegment_duration: the duration of the segment, in units of timescale;
starts_with_SAP: the stream access type of the segment;
SAP_delta_time: the earliest presentation time of the first stream access point;
FOV_group_change_Info: switching point identification information, indicating that the current segment (the target first-representation segment) can switch to any other representation that has the same content component; that is, it gives the target first-representation segment position at which the target first representation switches to the target second representation.
FOV_group_change_Info may carry either of the following two meanings:
1. FOV_group_change_Info may indicate whether the current segment can switch to a segment of another rep that carries attribute information such as Duration/FOVGroup/FovType. Specifically, the segment information of the segment that carries this field may also describe indication information of the view code stream to which the current segment can switch; that indication information identifies the view code stream corresponding to the switching code stream.
For example, in MPD examples 1 to 3 of the implementations above, the code stream file video-3.mp4 of Representation id="3" contains the sidx box described above. If parsing that box yields FOV_group_change_Info=1 for the n-th segment, this segment can switch to another representation that has the same content component. In examples 1 to 3, the code streams with Representation id="2" and Representation id="3" have the same viewing angle (Representation id="2" is only an example; the view code stream corresponding to the segment is determined by the actual application scenario), so Representation id="3" can switch to Representation id="2" at the position of the n-th segment; otherwise it cannot switch. In MPD example 4, Representation id="3" has FovGroup="2"; if parsing the sidx box yields FOV_group_change_Info=1 for the n-th segment, the code stream with Representation id="3" can switch, at the position of the n-th segment, to a representation with attribute FOVGroup=1 (a view code stream, for example the code stream with rep id="2").
2. FOV_group_change_Info may instead be the segment ID, in another code stream that carries attribute information such as Duration/FOVGroup/FovType, to which the segment carrying this field can switch. For example, FOV_group_change_Info=4 indicates that the current segment can switch to the fourth segment of the view code stream.
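As an illustration of how a client might derive segment start times from the sidx fields listed above, the sketch below parses the standard sidx body defined in ISO/IEC 14496-12. It covers only the standard fields: the position and width of FOV_group_change_Info in the extended box are given only in the figure, so that field is deliberately not parsed. The function names and the sample payload are hypothetical.

```python
import struct


def parse_sidx(payload, version=0):
    """Parse the body of a standard sidx box (the bytes after the FullBox header).

    Returns (timescale, earliest_presentation_time, entries); each entry is
    (reference_type, referenced_size, subsegment_duration, starts_with_SAP).
    """
    _reference_id, timescale = struct.unpack_from(">II", payload, 0)
    if version == 0:
        ept, _first_offset = struct.unpack_from(">II", payload, 8)
        pos = 16
    else:
        ept, _first_offset = struct.unpack_from(">QQ", payload, 8)
        pos = 24
    _reserved, reference_count = struct.unpack_from(">HH", payload, pos)
    pos += 4
    entries = []
    for _ in range(reference_count):
        a, duration, b = struct.unpack_from(">III", payload, pos)
        pos += 12
        # Top bit of `a` is reference_type; top bit of `b` is starts_with_SAP.
        entries.append((a >> 31, a & 0x7FFFFFFF, duration, b >> 31))
    return timescale, ept, entries


def start_times_seconds(timescale, ept, entries):
    """Playback start time of each referenced segment, in seconds."""
    times, t = [], ept
    for _type, _size, duration, _sap in entries:
        times.append(t / timescale)
        t += duration
    return times


# Hypothetical payload: timescale 1000, two media segments of 500 ticks each.
payload = struct.pack(">IIIIHH", 1, 1000, 0, 0, 0, 2)
payload += struct.pack(">III", 5000, 500, 0x90000000) * 2
timescale, ept, entries = parse_sidx(payload)
print(start_times_seconds(timescale, ept, entries))  # [0.0, 0.5]
```

A client reading the extended box would additionally need the exact layout of FOV_group_change_Info from the box definition before it could apply either of the two meanings above.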
In a specific implementation, the switching point information between the view code stream and the switching code stream may also be described in another, new box, for example:
Figure PCTCN2017086548-appb-000036
The semantics of FOV_group_change_Info above are the same as in the sidx box.
It may also be described as:
aligned(8) class SegmentSwitchBox extends FullBox('sswx', version, flag) {
  unsigned int(8) FOV_group_change_Info;
}
FOV_group_change_Info: indicates the segment interval at which segments of the switching stream switch to the view code stream.
In a specific implementation, the client may determine the switching point at which the target switching code stream switches to the target view code stream according to the switching point information carried in the segment information of the target switching code stream, and then request the target view code stream from the server according to information such as the URL of the target view code stream described in the MPD. The segment information of the target switching code stream may include switching segment position information for switching from the target switching code stream to the target view code stream, for example the switching segment position indicated by the value of the FOV_group_change_Info element carried in the MPD, or the segment interval of the switching segments indicated by that value. Starting from the segment of the target switching code stream at which playback switched from the current view code stream to the target switching code stream (the first switching segment, for example the second segment of repB'), and using the switching segment position information given by the value of FOV_group_change_Info, the client determines the target segment at which the target switching code stream switches to the target view code stream (the second switching segment). For example, in FIG. 10, assume that the segment information of the target switching code stream described in the MPD carries the indication FOV_group_change_Info=2, indicating that the fifth segment of the target switching code stream (referred to as the second segment) can switch with the second segment of the target view code stream. From the indication FOV_group_change_Info=2, the client determines that, after the fourth segment of the switching code stream of the second viewing angle, it can request the second segment of the view code stream of the second viewing angle.
In some feasible implementations, the client may calculate the playback start time of each segment from the segment durations in the MPD or in the sidx box, and determine the second moment from those start times. For example, the time at which the playback start time of a segment in the view code stream and the playback start time of a segment in the switching code stream are closest may be determined as the second moment. After determining the second moment, the client may request from the server the target segment of the target view code stream corresponding to that moment (for example, the second segment of repB in FIG. 10, denoted segmentB2). The second moment may be the playback start time of segmentB2, or segmentB2 may be the segment whose playback start time is closest to the second moment. By comparing the second moment with the playback start time of each segment in the target view code stream, the client selects the target switching segment, such as segmentB2, and requests it from the server. After receiving segmentB2 from the server, the client switches the played video data to segmentB2 when playback of the target switching code stream reaches the playback start time of segmentB2, presenting the user with high-quality video of the second viewing angle. After receiving a viewing angle switching request, and before switching the played video data from the current view code stream to the target view code stream, the client may first switch from the current view code stream to the target switching code stream, so that the video image of the new viewing angle is presented to the user sooner. The client may then switch the played video data to the target view code stream at the preset second moment at which the target switching code stream switches to the target view code stream. As shown in FIG. 10, while the client is playing segmentD1 the user triggers a viewing angle switching request at time T1. The client can switch to the first segment at time T2, so the picture of the new viewing angle is presented within the short interval between T1 and T2. The client then switches from the first segment to segmentB2 at time T3, completing the switch from the first viewing angle to the second viewing angle. With the segment switching method of the existing DASH standard, if the user triggers the viewing angle switching request at time T1, the client must wait for segmentD1 to finish playing and switch to segmentB2 at time T3, so the user waits (T3-T1) for the new viewing angle. If (T3-T1) is greater than 200 ms, the user experiences discomfort and the user experience is poor.
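The waiting times compared above can be worked through with a small sketch. All numbers are hypothetical (a trigger at T1 = 300 ms, 2000 ms view segments, 500 ms switching segments), and both streams are assumed to have uniform segment durations with the first segment starting at 0.

```python
def next_boundary_ms(trigger_ms, segment_ms):
    """Start time (ms) of the first segment beginning at or after trigger_ms,
    assuming uniform segments with the first one starting at 0."""
    return -(-trigger_ms // segment_ms) * segment_ms  # ceiling division


T1 = 300                          # viewing angle switch triggered (ms)
T3 = next_boundary_ms(T1, 2000)   # next view-stream segment (segmentB2)
T2 = next_boundary_ms(T1, 500)    # next switching-stream segment

print("wait without switching code stream:", T3 - T1, "ms")  # 1700 ms
print("wait with switching code stream:", T2 - T1, "ms")     # 200 ms
```

With these assumed durations the wait for the new viewing angle drops from 1700 ms to 200 ms, which is the effect the shorter switching segments are meant to achieve.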
Further, in some feasible implementations, the segment information of the target switching code stream may include one or more switching moments at which the target switching code stream can switch to the target view code stream. A switching moment indicates a time point at which the target switching code stream can switch to the target view code stream, and may specifically be the playback start time of a segment, such as the playback start time T3 of segmentB2 or the playback start time T4 of segmentB3 in FIG. 10, or the playback start time of the second segment described above. Specifically, the server may add indication information of the switching moments to the segment information field of the target switching code stream described in the MPD or in the index segment. After parsing the MPD or the index segment, the client can obtain the indication information from it and determine the switching moments at which the target switching code stream can switch to the target view code stream.
After determining these switching moments, the client may select the switching moment closest to the first moment as the switching moment used for this switch from the target switching code stream to the target view code stream (that is, the second moment). Further, the client may request from the server the segment of the target view code stream whose playback start time is closest to the second moment (such as repB2), and switch to that segment for playback.
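The selection described above can be sketched as follows. This is an illustrative sketch only: the function names and data layout are hypothetical, times are integer milliseconds, and restricting the candidates to switching moments that are not earlier than the first moment is an added assumption (the text only says "closest").

```python
def pick_second_moment(switch_moments, first_moment):
    """Pick the switching moment closest to first_moment.

    Assumption: only moments at or after first_moment are usable.
    Returns None when no such moment is signalled.
    """
    candidates = [t for t in switch_moments if t >= first_moment]
    if not candidates:
        return None
    return min(candidates, key=lambda t: t - first_moment)


def pick_target_view_segment(segment_starts, second_moment):
    """Pick the target view segment whose playback start time is closest
    to second_moment. segment_starts maps segment name -> start time."""
    return min(segment_starts, key=lambda name: abs(segment_starts[name] - second_moment))


# Assumed values loosely following FIG. 10: T3 = 4000 ms, T4 = 6000 ms.
second = pick_second_moment([4000, 6000], first_moment=2400)
segment = pick_target_view_segment(
    {"segmentB1": 2000, "segmentB2": 4000, "segmentB3": 6000}, second
)
print(second, segment)  # 4000 segmentB2
```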
It should be noted that, in the foregoing implementation, the first moment may be the playback start time of the first segment and the second moment may be the playback start time of the second segment, with the first segment and the second segment separated by three segments. That is, the duration between the first moment and the second moment is N times (N is assumed to be 3 here) the segment duration of the target switching code stream. In a specific implementation, N is an integer greater than or equal to 1 and may be determined according to the actual application scenario; no limitation is imposed here.
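The relation between the two moments can be written compactly. The symbols $t_1$, $t_2$ and $d_s$ are introduced here only for illustration, and the 200 ms segment duration in the worked line is an assumed value, not one given in the text.

```latex
% t_1: first moment, t_2: second moment,
% d_s: segment duration of the target switching code stream (assumed constant)
t_2 - t_1 = N \, d_s, \qquad N \in \mathbb{Z},\ N \ge 1
% Worked example with N = 3 and an assumed d_s = 200\,\mathrm{ms}:
t_2 - t_1 = 3 \times 200\,\mathrm{ms} = 600\,\mathrm{ms}
```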
In the embodiments of the present invention, the client may parse the MPD of the video data and determine the view code stream information of each view code stream and the switching code stream information of each switching code stream in the video data. According to the view from which the user is currently watching the video and the view code stream information determined above, the client may request from the server the view code stream corresponding to the current view and play it. After receiving a view switching request and before switching the played video data from the current view code stream to the target view code stream, the client may first switch the played video data from the current view code stream to the target switching code stream, so that video images of the new view are presented to the user more quickly. Further, after determining the second moment at which the target switching code stream switches to the target view code stream, the client may switch the played video data to the target view code stream when playback of the target switching code stream reaches the second moment. By providing switching code streams, the embodiments of the present invention allow the client, while the end user switches views, to switch quickly to a switching code stream and obtain a high-quality new view. In addition, the switching point information of the switching code stream and the view code stream lets the client switch to the view code stream after requesting a portion of the switching code stream, which ensures that the compression performance of the code stream received by the client is optimal and that the best view video experience is obtained under the same bandwidth.
FIG. 13 is a schematic structural diagram of a client according to an embodiment of the present invention. The client provided by this embodiment includes:
An obtaining module 131, configured to parse the media presentation description and obtain identification information, where the identification information identifies a first representation of the video, and the playback duration of a segment of the first representation is less than the playback duration of a segment of a second representation of the video.
A receiving module 132, configured to obtain switching instruction information, where the switching instruction information indicates switching from the current spatial object to a target spatial object.
A determining module 133, configured to determine a target representation from the first representations of the video according to the identification information obtained by the obtaining module and the switching instruction information received by the receiving module, where the target representation corresponds to the target spatial object.
The obtaining module 131 is further configured to obtain the current playback time of the video, and to obtain a target representation segment according to the current playback time and the target representation determined by the determining module.
In a feasible implementation, the identification information includes at least one of: a representation type identifier, the playback duration of a segment of the representation, and switching point information.
In a feasible implementation, the switching point information identifies switching segment information for representation switching between the first representation and the second representation;
where the switching segment information includes at least one of: a segment interval, a segment position of the first representation, and a segment position of the second representation.
In a feasible implementation, the identification information is carried in the attribute information, carried in the media presentation description, of the representation set to which the first representation belongs.
In a feasible implementation, the identification information is carried in the attribute information of the first representation carried in the media presentation description.
In a feasible implementation, the identification information is carried in the attribute information of the segments of the first representation carried in the media presentation description.
In a feasible implementation, the obtaining module is specifically configured to:
obtain segment information of the target representation, where the segment information of the target representation includes the playback duration of each segment contained in the target representation;
calculate the playback start time of each segment according to the playback durations of the segments, and determine a first moment according to the playback start times of the segments and the current playback time, where the first moment is the playback start time, among the playback start times of the segments, that is closest to the current playback time; and
determine the segment whose playback start time is the first moment as the target representation segment.
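The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, times are integer milliseconds measured from an assumed period start, and a tie between two equally close start times is resolved in favor of the earlier segment (the text does not say how ties are handled).

```python
from itertools import accumulate


def segment_start_times(durations, period_start=0):
    """Start time of segment i = period_start + sum of the durations before it."""
    offsets = list(accumulate(durations, initial=0))[:-1]
    return [period_start + offset for offset in offsets]


def target_segment_index(durations, current_time, period_start=0):
    """Index of the segment whose playback start time (the "first moment")
    is closest to current_time."""
    starts = segment_start_times(durations, period_start)
    if not starts:
        raise ValueError("the target representation has no segments")
    return min(range(len(starts)), key=lambda i: abs(starts[i] - current_time))


# Assumed values: five 200 ms segments, current playback time 470 ms.
durations = [200, 200, 200, 200, 200]
index = target_segment_index(durations, current_time=470)
print(index, segment_start_times(durations)[index])  # 2 400
```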
In a specific implementation, the client provided by this embodiment may be the client in the foregoing embodiments, and may perform, through its built-in modules, the implementations described in the steps of the foregoing embodiments. Details are not repeated here.
FIG. 14 is a schematic structural diagram of a server according to an embodiment of the present invention. The server provided by this embodiment includes:
A generating module 141, configured to generate a first representation of the video according to the encoding configuration parameters of the first representation, and to generate a second representation of the video according to the encoding configuration parameters of the second representation, where the playback duration of a segment of the first representation is less than the playback duration of a segment of the second representation.
A description module 142, configured to generate a media presentation description, where the media presentation description carries identification information, and the identification information identifies the first representation of the video.
In a feasible implementation, the identification information describes the playback duration of a segment of the first representation and the playback duration of a segment of the second representation;
where the playback duration of a segment of the first representation is less than the playback duration of a segment of the second representation of the video.
In a feasible implementation, the identification information describes switching point information of the segments of the first representation and the second representation.
In a feasible implementation, the switching point information identifies switching segment information for content switching between the first representation and the second representation;
where the switching segment information includes at least one of: a segment interval, a segment position of the first representation, and a segment position of the second representation.
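As a rough illustration of what the description module might emit, the sketch below builds a skeletal MPD with one view representation and one switching representation. The element names `MPD`, `Period`, `AdaptationSet`, `Representation` and `SegmentTemplate` are standard DASH elements; the attribute `representationType` and its values are hypothetical stand-ins for the identification information, since this passage does not fix a concrete syntax. The result is not a complete, schema-valid MPD.

```python
import xml.etree.ElementTree as ET


def build_mpd(view_segment_ms: int, switch_segment_ms: int) -> ET.Element:
    """Build a skeletal MPD describing a view representation and a
    switching representation with shorter segments."""
    if switch_segment_ms >= view_segment_ms:
        raise ValueError("switching segments must be shorter than view segments")
    mpd = ET.Element("MPD")
    period = ET.SubElement(mpd, "Period")
    adaptation_set = ET.SubElement(period, "AdaptationSet")
    for rep_id, rep_type, duration in (
        ("view", "normal", view_segment_ms),
        ("switch", "switching", switch_segment_ms),
    ):
        # "representationType" is a hypothetical attribute carrying the
        # identification information; it is not defined by the DASH standard.
        rep = ET.SubElement(
            adaptation_set,
            "Representation",
            {"id": rep_id, "representationType": rep_type},
        )
        ET.SubElement(
            rep, "SegmentTemplate", {"timescale": "1000", "duration": str(duration)}
        )
    return mpd


mpd = build_mpd(view_segment_ms=2000, switch_segment_ms=200)
print(ET.tostring(mpd, encoding="unicode"))
```

A client could then find the switching representation by reading the hypothetical attribute and compare the two `duration` values to confirm which representation has the shorter segments.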
In a specific implementation, the server provided by this embodiment may be the server in the foregoing embodiments, and may perform, through its built-in modules, the implementations described in the steps of the foregoing embodiments. Details are not repeated here.
FIG. 15 is another schematic structural diagram of a client according to an embodiment of the present invention. The client provided by this embodiment includes:
A receiving module 151, configured to receive a media presentation description, where the media presentation description includes at least two representations, and each representation includes attribute information describing media data segments; the media presentation description further includes at least two switching code stream representations, and each switching code stream representation includes attribute information describing data segments of a switching code stream; the spatial objects associated with the at least two representations are in one-to-one correspondence with the spatial objects associated with the at least two switching code stream representations; and the playback duration of a media data segment described in a media representation is greater than the playback duration of a data segment of a switching code stream described in the switching code stream representation corresponding to that media representation.
An obtaining module 152, configured to obtain switching instruction information.
The obtaining module 152 is further configured to obtain a target switching code stream representation according to the switching instruction information and the media presentation description, where the target switching code stream representation is one of the at least two switching code stream representations.
The obtaining module 152 is further configured to obtain target switching code stream request information according to the target switching code stream representation, where the target switching code stream request information is used to request some of the data segments of the target switching code stream.
In a feasible implementation, the media presentation description further includes spatial information of the spatial object associated with a switching code stream representation, where the spatial information describes the spatial relationship between the spatial object associated with the switching code stream representation and its associated content component;
the obtaining module 152 is specifically configured to:
obtain spatial information of the target spatial object according to the switching instruction information; and
obtain the target switching code stream representation according to the spatial information of the target spatial object and the spatial relationship.
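One possible way to carry out the two steps above is sketched below. It assumes that each spatial object is an axis-aligned rectangle inside the full picture and that the best match is the switching code stream representation whose rectangle overlaps the target spatial object the most; the `Region` type, the matching rule and the sample values are illustrative assumptions, not the method fixed by the specification.

```python
from typing import Dict, NamedTuple, Optional


class Region(NamedTuple):
    """Hypothetical spatial information: a rectangle in picture coordinates."""
    x: int
    y: int
    w: int
    h: int


def overlap_area(a: Region, b: Region) -> int:
    """Area of the intersection of two rectangles (0 if they do not overlap)."""
    dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return max(dx, 0) * max(dy, 0)


def pick_switch_representation(
    switch_reps: Dict[str, Region], target: Region
) -> Optional[str]:
    """Return the id of the switching representation that best covers target,
    or None if nothing overlaps."""
    if not switch_reps:
        return None
    best = max(switch_reps, key=lambda rep_id: overlap_area(switch_reps[rep_id], target))
    return best if overlap_area(switch_reps[best], target) > 0 else None


# Assumed layout: a 1920x540 picture split into two switching representations.
reps = {"switchA": Region(0, 0, 960, 540), "switchB": Region(960, 0, 960, 540)}
print(pick_switch_representation(reps, Region(1000, 100, 800, 400)))  # switchB
```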
In a feasible implementation, the media presentation description includes information of an adaptation set, where the adaptation set is a data set describing the attributes of media data segments of multiple interchangeable encoded versions of the same media content component;
where the information of the adaptation set includes information of the at least two switching code stream representations.
In a feasible implementation, the media presentation description includes information of a representation, where a representation is a collection and encapsulation of one or more code streams in a transmission format;
where the information of the representation includes information of the at least two switching code stream representations.
In a feasible implementation, the information of a switching code stream representation includes at least one of: a code stream type identifier, the playback duration of a code stream segment, and switching point information.
In a feasible implementation, the switching point information identifies switching segment information for content switching between a switching code stream and a non-switching code stream;
where the switching segment information includes at least one of: a code stream segment interval, a code stream segment position of the switching code stream, and a code stream segment position of the non-switching code stream.
In a specific implementation, the client provided by this embodiment may be the client in the foregoing embodiments, and may perform, through its built-in modules, the implementations described in the steps of the foregoing embodiments. Details are not repeated here.
FIG. 16 is another schematic structural diagram of a client according to an embodiment of the present invention. The client provided by this embodiment includes:
A receiving module 161, configured to receive a media presentation description, where the media presentation description includes information of at least two representations, each representation includes at least one segment, and the segment duration of a first representation of the at least two representations is less than the segment duration of a second representation; the spatial object associated with the first representation corresponds to the spatial object associated with the second representation.
An obtaining module 162, configured to obtain switching instruction information.
The obtaining module 162 is further configured to obtain a segment of the first representation according to the switching instruction information, and to obtain a segment of the second representation after a preset time.
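The behavior of the obtaining module 162 can be sketched as a request plan: fetch short segments of the first representation, then fetch a segment of the second representation once the preset time has elapsed. The sketch assumes constant segment durations starting at time 0, integer millisecond times, and a preset time equal to a whole number of first-representation segments; the function name and the sample values are hypothetical.

```python
def plan_requests(t_request, n_segments, first_seg_ms, second_seg_ms):
    """Return a list of (representation, segment_index) requests.

    t_request:    time at which the switching instruction arrives
    n_segments:   preset time expressed as a number of first-representation segments
    first_seg_ms: segment duration of the first (short-segment) representation
    second_seg_ms: segment duration of the second (long-segment) representation
    """
    # First moment: next segment boundary of the first representation.
    t_first = -(-t_request // first_seg_ms) * first_seg_ms
    # Second moment: the preset time after the first moment.
    t_second = t_first + n_segments * first_seg_ms
    if t_second % second_seg_ms != 0:
        raise ValueError("second moment is not a segment start of the second representation")
    plan = [("first", t // first_seg_ms) for t in range(t_first, t_second, first_seg_ms)]
    plan.append(("second", t_second // second_seg_ms))
    return plan


# Assumed values: 200 ms and 2000 ms segments, instruction at 3350 ms, N = 3.
print(plan_requests(3350, 3, 200, 2000))
# [('first', 17), ('first', 18), ('first', 19), ('second', 2)]
```

The check on the second moment reflects the earlier statement that a switching moment is the playback start time of a segment; a real client would presumably choose the preset time from signalled switching point information rather than reject a misaligned one.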
In a feasible implementation, the first representation carries switching point information.
In a feasible implementation, the media presentation description carries identification information;
where the identification information includes at least one of: a representation type identifier, the playback duration of a segment of the representation, and switching point information.
In a feasible implementation, the switching point information identifies switching segment information for representation switching between the first code stream and the second code stream;
where the switching segment information includes at least one of: a segment interval, a segment position of the first representation, and a segment position of the second representation.
In a feasible implementation, the switching point information is carried in a designated box in the first representation.
In a feasible implementation, the designated box is the sidx box contained in the first representation, and the sidx box is used to describe segment information.
In a feasible implementation, the representation type identifier is used to identify the first representation.
In a feasible implementation, the media presentation description contains information of an adaptation set, where the adaptation set is a data set describing the attributes of media data segments of multiple interchangeable encoded versions of the same media content component;
where the information of the adaptation set contains the identification information.
In a feasible implementation, the media presentation description contains information of a representation, where a representation is a collection and encapsulation of one or more code streams in a transmission format;
where the information of the representation contains the identification information.
In a feasible implementation, the media presentation description contains information of a descriptor, where the descriptor describes the spatial information of the associated spatial object;
where the information of the descriptor contains the identification information.
In a specific implementation, the client provided by this embodiment may be the client in the foregoing embodiments, and may perform, through its built-in modules, the implementations described in the steps of the foregoing embodiments. Details are not repeated here.
The embodiments of the present invention can identify the switching code streams and view code streams contained in a video according to the identification information carried in the media presentation description. During spatial object switching, the target switching code stream corresponding to the target spatial object can be identified among the multiple switching code streams of the video, and the target segment in the target switching code stream can then be determined according to the video playback time at which the spatial object is switched, and presented. Because the playback duration of a segment of a switching code stream is less than that of a segment of a view code stream, the client can first switch to the shorter switching code stream segment when the spatial object is switched, which improves the efficiency of segment switching for the spatial object and enhances the user experience. Further, the segments of the target view code stream corresponding to the target spatial object can be obtained and presented, completing the segment switching of the view code stream for the spatial object switch. After using the target switching code stream as an intermediate transition during spatial object switching, the client can switch to playing the target view code stream, which ensures the stability of video playback after the switch and enhances the viewing experience.
The terms "first", "second", "third", "fourth" and the like in the specification, claims and drawings of the embodiments of the present invention are used to distinguish different objects, not to describe a specific order. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that contains a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, system, product or device.
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the foregoing embodiments may be implemented by a computer program instructing related hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the foregoing method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
What is disclosed above is merely preferred embodiments of the present invention and certainly cannot be used to limit the scope of the rights of the present invention. Equivalent changes made in accordance with the claims of the present invention therefore still fall within the scope covered by the present invention.

Claims (38)

  1. A method for processing video data, comprising:
    parsing a media presentation description to obtain identification information, wherein the identification information identifies a first representation of a video, and the playback duration of a segment described by the first representation is less than the playback duration of a segment described by a second representation of the video;
    obtaining switching instruction information, wherein the switching instruction information indicates switching from a current spatial object to a target spatial object;
    obtaining a target representation according to the identification information and the switching instruction information, wherein the target representation corresponds to the target spatial object; and
    obtaining a current playback time of the video, and obtaining a target representation segment according to the current playback time and the target representation.
  2. The method according to claim 1, wherein the identification information comprises at least one of: a representation type identifier, the playback duration of a segment of the representation, and switching point information.
  3. The method according to claim 2, wherein the switching point information identifies switching segment information for representation switching between the first representation and the second representation,
    wherein the switching segment information comprises at least one of: a segment interval, a segment position of the first representation, and a segment position of the second representation;
    or
    the switching point information is a flag, and the flag indicates the switching capability of a segment.
  4. The method according to claim 1 or 2, wherein the media presentation description comprises attribute information of a representation set, the attribute information of the representation set comprises the identification information, and the first representation is one representation in the representation set.
  5. The method according to claim 1 or 2, wherein the media presentation description comprises attribute information of the first representation, and the attribute information of the first representation comprises the identification information.
  6. The method according to claim 1 or 2, wherein the media presentation description comprises attribute information of the segment described by the first representation, and the attribute information of the segment comprises the identification information.
  7. The method according to any one of claims 2 to 6, wherein the obtaining a target representation segment according to the current playback time and the target representation comprises:
    obtaining segment information of the target representation, wherein the segment information of the target representation comprises the playback duration of each segment contained in the target representation;
    calculating the playback start time of each segment according to the playback durations of the segments, and determining a first moment according to the playback start times of the segments and the current playback time, wherein the first moment is the playback start time, among the playback start times of the segments, that is closest to the current playback time; and
    determining the segment whose playback start time is the first moment as the target representation segment.
  8. A method for processing video data, wherein the method comprises:
    generating, by a server, a first representation of a video according to encoding configuration parameters of the first representation, and generating a second representation of the video according to encoding configuration parameters of the second representation, wherein the playback duration of a segment described by the first representation is less than the playback duration of a segment described by the second representation; and
    generating, by the server, a media presentation description, wherein the media presentation description comprises identification information, and the identification information identifies the first representation of the video.
  9. The method according to claim 8, wherein the identification information describes the playback duration of a segment of the first representation and the playback duration of a segment of the second representation.
  10. The method according to claim 8, wherein the identification information describes switching point information of the segments of the first representation and the second representation.
  11. The method according to claim 9 or 10, wherein the switching point information identifies switching segment information for content switching between the first representation and the second representation,
    wherein the switching segment information comprises at least one of: a segment interval, a segment position of the first representation, and a segment position of the second representation;
    or
    the switching point information is a flag, and the flag indicates the switching capability of a segment.
  12. A client, comprising:
    an obtaining module, configured to parse a media presentation description and obtain identification information, wherein the identification information identifies a first representation of a video, and the playback duration of a segment described by the first representation is less than the playback duration of a segment described by a second representation of the video;
    a receiving module, configured to obtain switching instruction information, wherein the switching instruction information indicates switching from a current spatial object to a target spatial object; and
    a determining module, configured to obtain a target representation according to the identification information obtained by the obtaining module and the switching instruction information received by the receiving module, wherein the target representation corresponds to the target spatial object;
    wherein the obtaining module is further configured to obtain a current playback time of the video, and to obtain a target representation segment according to the current playback time and the target representation obtained by the determining module.
  13. 如权利要求12所述的客户端,其特征在于,所述标识信息包括:表示类型标识、表示分段的播放时长以及切换点信息中的至少一种。The client according to claim 12, wherein the identification information comprises at least one of a representation type identifier, a playback duration indicating a segment, and switch point information.
  14. The client according to claim 13, wherein the switching point information is used to identify switching segment information for switching between the first representation and the second representation,
    wherein the switching segment information includes at least one of: a segment interval, a segment position of the first representation, and a segment position of the second representation;
    or
    the switching point information is a flag, the flag being used to indicate the switching capability of a segment.
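The two alternative forms of switching point information recited above (structured switching segment information, or a single flag) can be modelled roughly as follows. This is a minimal sketch, not the claimed implementation: the class and field names are hypothetical, and the reading of "segment interval" as "every N-th segment of the first representation is a switch point" is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class SwitchSegmentInfo:
    # Hypothetical field names; the claim only lists the three kinds of value.
    segment_interval: Optional[int] = None
    first_position: Optional[int] = None   # segment position in the first representation
    second_position: Optional[int] = None  # segment position in the second representation

    def __post_init__(self):
        fields = (self.segment_interval, self.first_position, self.second_position)
        # The claim requires "at least one of" the three values.
        if all(value is None for value in fields):
            raise ValueError("at least one of the three fields is required")


# Switching point information is either a plain flag or structured info.
SwitchPoint = Union[bool, SwitchSegmentInfo]


def is_switch_point(switch_point: SwitchPoint, segment_index: int) -> bool:
    """Decide whether a segment of the first representation allows a switch."""
    if isinstance(switch_point, bool):
        # Flag form: the flag itself states the segment's switching capability.
        return switch_point
    if switch_point.segment_interval is not None:
        # Assumption: every segment_interval-th segment is a switch point.
        return segment_index % switch_point.segment_interval == 0
    # Assumption: first_position names the single segment where switching is allowed.
    return segment_index == switch_point.first_position
```

For example, with an assumed interval of 4, segments 0, 4, 8 and so on of the short-segment representation would be treated as switch points.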
  15. The client according to claim 12 or 13, wherein the media presentation description includes attribute information of a representation set, the attribute information of the representation set includes the identification information, and the first representation is one representation in the representation set.
  16. The client according to claim 12 or 13, wherein the media presentation description includes attribute information of the first representation, and the attribute information of the first representation includes the identification information.
  17. The client according to claim 12 or 13, wherein the media presentation description includes attribute information of the segment described by the first representation, and the attribute information of the segment includes the identification information.
  18. The client according to any one of claims 13 to 17, wherein the obtaining module is specifically configured to:
    obtain segment information of the target representation, wherein the segment information of the target representation includes the playback duration of each segment contained in the target representation;
    calculate a playback start time of each segment according to the playback duration of each segment, and determine a first time according to the playback start times of the segments and the current playback time, wherein the first time is the playback start time, among the playback start times of the segments, that is closest to the current playback time;
    determine the segment whose playback start time is the first time as the target representation segment.
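The segment lookup recited in claim 18 can be sketched as below. This is an illustrative sketch under stated assumptions, not the patented implementation: the function name is hypothetical, the first segment is assumed to start at time 0, and a tie between two equally close start times is resolved in favour of the earlier segment.

```python
from itertools import accumulate


def find_target_segment(durations, current_time):
    """Return (index, start_time) of the segment whose start is closest to current_time.

    durations: playback duration of each segment of the target representation,
    in playback order. Assumes the first segment starts at time 0.
    """
    if not durations:
        raise ValueError("the target representation has no segments")
    # The start of segment i is the sum of the durations of segments 0..i-1.
    starts = [0] + list(accumulate(durations))[:-1]
    # Pick the start time nearest to the current playback time
    # (min() keeps the earliest segment when two starts are equally close).
    index = min(range(len(starts)), key=lambda i: abs(starts[i] - current_time))
    return index, starts[index]
```

With four 2-second segments the start times are 0, 2, 4 and 6, so a current playback time of 4.7 selects the segment starting at 4.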
  19. A server, comprising:
    a generating module, configured to generate a first representation of a video according to encoding configuration parameters of the first representation, and to generate a second representation of the video according to encoding configuration parameters of the second representation, wherein the playback duration of a segment described by the first representation is shorter than the playback duration of a segment described by the second representation;
    a description module, configured to generate a media presentation description, wherein the media presentation description includes identification information, and the identification information is used to identify the first representation of the video.
  20. The server according to claim 19, wherein the identification information describes the playback duration of the segments of the first representation and the playback duration of the segments of the second representation.
  21. The server according to claim 19, wherein the identification information describes switching point information of the segments of the first representation and the second representation.
  22. The server according to claim 20 or 21, wherein the switching point information is used to identify switching segment information for content switching between the first representation and the second representation,
    wherein the switching segment information includes at least one of: a segment interval, a segment position of the first representation, and a segment position of the second representation;
    or
    the switching point information is a flag, the flag being used to indicate the switching capability of a segment.
  23. A method for processing video data based on dynamic adaptive streaming over HTTP, wherein the method comprises:
    receiving a media presentation description, wherein the media presentation description includes at least two representations, each representation including attribute information describing media data segments, and the media presentation description further includes at least two switching code stream representations, each switching code stream representation including attribute information describing data segments of a switching code stream,
    wherein the spatial objects associated with the at least two representations are in one-to-one correspondence with the spatial objects associated with the at least two switching code stream representations, and the playback duration of a media data segment described in a media representation is longer than the playback duration of a data segment of a switching code stream described in the switching code stream representation corresponding to that media representation;
    obtaining switching instruction information;
    obtaining a target switching code stream representation according to the switching instruction information and the media presentation description, wherein the target view switching code stream representation is one of the at least two switching code stream representations;
    obtaining target switching code stream request information according to the target switching code stream representation, wherein the switching code stream request information is used to request some of the data segments of the target switching code stream.
  24. The method according to claim 23, wherein the media presentation description further includes spatial information of the spatial objects associated with the switching code stream representations, the spatial information being used to describe the spatial relationship between the spatial object associated with a switching code stream representation and the content component associated with it;
    the obtaining a target switching code stream representation according to the switching instruction information and the media presentation description includes:
    obtaining spatial information of a target spatial object according to the switching instruction information;
    obtaining the target switching code stream representation according to the spatial information of the target spatial object and the spatial relationship.
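The selection step of claim 24 can be illustrated with a small sketch. Everything concrete here is an assumption introduced for illustration: spatial information is modelled as an axis-aligned rectangle in pixels, the `Region` class and function names are hypothetical, and "best match" is taken to mean largest overlap with the target spatial object. The claim itself does not fix any of these choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    # Assumed rectangular spatial information: top-left corner plus size.
    x: int
    y: int
    w: int
    h: int


def overlap_area(a: Region, b: Region) -> int:
    """Area of the intersection of two rectangles (0 if they do not intersect)."""
    dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return max(dx, 0) * max(dy, 0)


def select_switching_representation(target: Region, candidates: dict):
    """Pick the id of the switching representation that best covers the target.

    candidates maps a representation id to the Region of its spatial object.
    Returns None when no candidate overlaps the target at all.
    """
    if not candidates:
        return None
    best_id = max(candidates, key=lambda rid: overlap_area(target, candidates[rid]))
    if overlap_area(target, candidates[best_id]) == 0:
        return None
    return best_id
```

A client would derive `target` from the switching instruction information and `candidates` from the spatial information carried in the media presentation description.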
  25. The method according to claim 23 or 24, wherein the media presentation description includes information of an adaptation set, the adaptation set being a data set used to describe attributes of media data segments of multiple mutually interchangeable encoded versions of the same media content component;
    wherein the information of the adaptation set includes information of the at least two switching code stream representations.
  26. The method according to claim 23 or 24, wherein the media presentation description includes information of a representation, the representation being a collection and encapsulation of one or more code streams in a transport format;
    wherein the information of the representation includes information of the at least two switching code stream representations.
  27. The method according to claim 25 or 26, wherein the information of the switching code stream representations includes at least one of: a code stream type identifier, a playback duration of a code stream segment, and switching point information.
  28. The method according to claim 27, wherein the switching point information is used to identify switching segment information for content switching between a switching code stream and a non-switching code stream,
    wherein the switching segment information includes at least one of: a code stream segment interval, a code stream segment position of the switching code stream, and a code stream segment position of the non-switching code stream;
    or
    the switching point information is a flag, the flag being used to indicate the switching capability of a segment.
  29. A method for processing video data based on dynamic adaptive streaming over HTTP, wherein the method comprises:
    receiving a media presentation description, wherein the media presentation description includes information of at least two representations, each representation including at least one segment, and the segment duration of a first representation of the at least two representations is shorter than the segment duration of a second representation;
    wherein the spatial object associated with the first representation corresponds to the spatial object associated with the second representation;
    obtaining switching instruction information;
    according to the representation switching instruction, obtaining segments of the first representation, and obtaining segments of the second representation after a preset time.
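The last step of claim 29 (fetch short segments of the first representation, then move to the second representation after a preset time) can be sketched as a simple request plan. This is only an illustration under assumptions not stated in the claim: the function and label names are hypothetical, times are integer milliseconds, the switch is assumed to land on a segment boundary of both representations, and switch point constraints are ignored.

```python
def plan_requests(switch_time, preset_time, short_dur, long_dur, end_time):
    """Build a list of (representation, start_time, duration) segment requests.

    switch_time: playback time at which the switching instruction takes effect
    preset_time: how long to stay on the short-segment first representation
    short_dur / long_dur: segment durations of the first / second representation
    end_time: playback time at which planning stops
    """
    plan = []
    t = switch_time
    handover = switch_time + preset_time
    # Phase 1: short segments of the first representation for fast switching.
    while t < handover:
        plan.append(("first", t, short_dur))
        t += short_dur
    # Phase 2: after the preset time, continue with the second representation.
    while t < end_time:
        plan.append(("second", t, long_dur))
        t += long_dur
    return plan
```

For instance, switching at 4000 ms with a 2000 ms preset time, 500 ms short segments and 2000 ms long segments yields four requests to the first representation followed by requests to the second representation starting at 6000 ms.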
  30. The method according to claim 29, wherein the first representation includes switching point information.
  31. The method according to claim 29, wherein the media presentation description includes identification information;
    wherein the identification information includes at least one of: a representation type identifier, a playback duration of a segment of the representation, and switching point information.
  32. The method according to claims 30 and 31, wherein the switching point information is used to identify switching segment information for representation switching between a first code stream and a second code stream,
    wherein the switching segment information includes at least one of: a segment interval, a segment position of the first representation, and a segment position of the second representation;
    or
    the switching point information is a flag, the flag being used to indicate the switching capability of a segment.
  33. The method according to claim 30, wherein the switching point information is included in a designated box in the first representation.
  34. The method according to claim 33, wherein the designated box is a sidx box contained in the first representation, and the sidx box is used to describe segment information.
  35. The method according to claim 31, wherein the representation type identifier is used to identify the first representation.
  36. The method according to claim 31, wherein the media presentation description includes information of an adaptation set, the adaptation set being a data set used to describe attributes of media data segments of multiple mutually interchangeable encoded versions of the same media content component;
    wherein the information of the adaptation set includes the identification information.
  37. The method according to claim 31, wherein the media presentation description includes information of a representation, the representation being a collection and encapsulation of one or more code streams in a transport format;
    wherein the information of the representation includes the identification information.
  38. The method according to claim 31, wherein the media presentation description includes information of a descriptor, the descriptor being used to describe spatial information of an associated spatial object;
    wherein the information of the descriptor includes the identification information.
PCT/CN2017/086548 2016-09-30 2017-05-31 Video data processing method and apparatus WO2018058993A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN201610878496 2016-09-30
CN201610878496.1 2016-09-30
CN201610890964.7A CN107888993B (en) 2016-09-30 2016-10-11 Video data processing method and device
CN201610890964.7 2016-10-11

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US16/370,052 US20190230388A1 (en) 2016-09-30 2019-03-29 Method and apparatus for processing video data

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/370,052 Continuation US20190230388A1 (en) 2016-09-30 2019-03-29 Method and apparatus for processing video data

Publications (1)

Publication Number Publication Date
WO2018058993A1 true WO2018058993A1 (en) 2018-04-05

Family

ID=61763092

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/086548 WO2018058993A1 (en) 2016-09-30 2017-05-31 Video data processing method and apparatus

Country Status (1)

Country Link
WO (1) WO2018058993A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130204973A1 (en) * 2010-10-06 2013-08-08 Humax Co., Ltd. Method for transmitting a scalable http stream for natural reproduction upon the occurrence of expression-switching during http streaming
CN104025604A (en) * 2012-07-02 2014-09-03 索尼公司 Transmission apparatus, transmission method, and network apparatus
CN104509119A (en) * 2012-04-24 2015-04-08 Vid拓展公司 Method and apparatus for smooth stream switching in MPEG/3GPP-DASH
WO2015150736A1 (en) * 2014-03-31 2015-10-08 British Telecommunications Public Limited Company Multicast streaming
CN105612753A (en) * 2013-10-08 2016-05-25 高通股份有限公司 Switching between adaptation sets during media streaming

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17854449

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17854449

Country of ref document: EP

Kind code of ref document: A1