WO2018068236A1 - Video stream transmission method, related device, and system - Google Patents

Video stream transmission method, related device, and system

Info

Publication number
WO2018068236A1
Authority
WO
WIPO (PCT)
Prior art keywords: information, video, sub, client, multiplexed
Prior art date
Application number
PCT/CN2016/101920
Other languages
English (en)
French (fr)
Inventor
Di Peiyun (邸佩云)
Xie Qingpeng (谢清鹏)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd.
Priority to CN201680086678.3A (published as CN109644296A)
Publication of WO2018068236A1
Priority to US 16/379,894 (granted as US 10,897,646 B2)

Classifications

    All classifications fall under H04N 21/00 (Selective content distribution, e.g. interactive television or video on demand [VOD]):

    • H04N 21/816: Monomedia components thereof involving special video data, e.g. 3D video
    • H04N 21/44012: Processing of video elementary streams involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs
    • H04N 21/23412: Processing of video elementary streams for generating or manipulating the scene composition of objects, e.g. MPEG-4 objects
    • H04N 21/2343: Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N 21/234345: Reformatting operations performed only on part of the stream, e.g. a region of the image or a time segment
    • H04N 21/236: Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data; remultiplexing of multiplex streams; assembling of a packetised elementary stream
    • H04N 21/4348: Demultiplexing of additional data and video streams
    • H04N 21/437: Interfacing the upstream path of the transmission network, e.g. for transmitting client requests to a VOD server
    • H04N 21/4728: End-user interface for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region
    • H04N 21/65: Transmission of management data between client and server

Definitions

  • the present invention relates to the field of video technologies, and in particular, to a video stream transmission method, related device, and system.
  • Virtual Reality (VR)
  • VR technology is a computer simulation system that can create and experience virtual worlds. It uses a computer to generate a simulation environment to immerse users in the environment.
  • VR technology can be widely used in many fields such as urban planning, interior design, industrial simulation, monument restoration, bridge road design, real estate sales, tourism teaching, education and training.
  • When VR technology is applied to existing video technology, panoramic video applications that exceed the normal visual range of the human eye become possible, bringing users a new viewing mode and visual experience.
  • The user can view VR video content in 360 degrees, for example in a virtual-reality live broadcast or recording system. However, because the video stream of a VR video is large and the request-feedback process between client and server is complicated, viewing VR video through a client may consume substantial bandwidth. Therefore, when VR video content is prepared, it is divided into multiple spatial objects, and only the spatial objects corresponding to the user's current viewing angle are sent to the client for rendering when viewed.
  • This reduces the amount of data transferred, but introduces a new problem: the client's viewing angle may correspond to several spatial objects at once, so the client must acquire the code streams of multiple spatial objects simultaneously and decode them synchronously. The client therefore has to wait until the code streams of all the spatial objects have been received before presentation can begin, which increases the delay when presenting a new viewing angle and degrades the user experience.
  • The technical problem to be solved by the embodiments of the present invention is to provide a video stream transmission method, a related device, and a system, which solve the prior-art problem of large presentation delay in the VR video experience.
  • The MPEG organization approved the DASH standard, a technical specification for transmitting media streams over the HTTP protocol (hereinafter the DASH technical specification). The DASH technical specification consists mainly of two parts: the media presentation description (English: Media Presentation Description, MPD) and the media file format (English: file format).
  • FIG. 1 is a schematic diagram of switching of a code stream segment according to an embodiment of the present invention.
  • The server prepares three different versions of code stream data for a movie and uses three representations (Representation, hereinafter rep) in the MPD to describe the three versions: rep1, rep2, and rep3.
  • rep1 is a high-definition video with a bit rate of 4 Mbps (megabits per second);
  • rep2 is a standard-definition video with a bit rate of 2 Mbps;
  • rep3 is a standard-definition video with a bit rate of 1 Mbps.
  • The segments shaded in FIG. 1 are the segment data requested by the client. The first three segments requested by the client are segments of media representation rep3; for the fourth segment the client switches to rep2 and requests rep2's fourth segment; it then switches to rep1 and requests the fifth and sixth segments, and so on.
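The segment-by-segment switching pattern above can be sketched as a simple rate-adaptation rule. The bit rates match the example (rep1 = 4 Mbps, rep2 = 2 Mbps, rep3 = 1 Mbps); the selection rule and bandwidth samples are assumptions for illustration, not part of the DASH standard.

```python
# Sketch of the representation switching described above.
REPRESENTATIONS = [          # (name, bit rate in Mbps), best first
    ("rep1", 4.0),
    ("rep2", 2.0),
    ("rep3", 1.0),
]

def choose_representation(measured_bandwidth_mbps: float) -> str:
    """Pick the highest-rate representation the measured bandwidth can carry."""
    for name, rate in REPRESENTATIONS:
        if measured_bandwidth_mbps >= rate:
            return name
    return REPRESENTATIONS[-1][0]   # fall back to the lowest rate

# A client whose throughput grows across segments switches rep3 -> rep2 -> rep1,
# mirroring the request pattern in FIG. 1.
bandwidth_per_segment = [1.2, 1.5, 1.8, 2.5, 4.2, 4.5]
schedule = [choose_representation(b) for b in bandwidth_per_segment]
```
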
  • Each representation's segments can be stored end to end in a single file, or each segment can be stored as a separate small file.
  • The segment may be packaged in accordance with ISO/IEC 14496-12 (the ISO Base Media File Format, ISO BMFF) or encapsulated in accordance with ISO/IEC 13818-1 (MPEG-2 TS).
  • The media presentation description is called the MPD. The MPD may be an XML file in which information is described hierarchically, as shown in FIG. 2.
  • FIG. 2 is a hierarchical structure diagram of the MPD file provided by an embodiment of the present invention; information at an upper level is fully inherited by the next level down.
  • The file describes media metadata that allows the client to understand the media content information on the server and to use that information to construct the HTTP URL of a segment request.
  • Media presentation: a collection of structured data for presenting media content.
  • Media presentation description (English: media presentation description): a standardized description of a media presentation file, used to provide a streaming media service.
  • Period (English: period): an interval of the media presentation; a media presentation consists of a contiguous sequence of one or more periods.
  • Representation (English: representation): a structured data set of one or more encoded media content components (individual media types, such as audio or video) with descriptive metadata; a representation is a collection and encapsulation of one or more code streams in a transport format and contains one or more segments.
  • Adaptation set (English: AdaptationSet): a set of mutually interchangeable encoded versions of the same media content component; one adaptation set contains one or more representations.
  • Subset (English: subset): a combination of a group of adaptation sets; when the player plays all of the adaptation sets in the subset, the corresponding media content can be obtained.
  • Segment information: a media unit referenced by an HTTP uniform resource locator in the media presentation.
  • For related technical concepts of the MPEG-DASH technology in the present invention, refer to the relevant provisions in ISO/IEC 23009-1:2014, Information technology -- Dynamic adaptive streaming over HTTP (DASH) -- Part 1: Media presentation description and segment formats, or to the corresponding provisions in historical versions of the standard, such as ISO/IEC 23009-1:2013 or ISO/IEC 23009-1:2012.
  • Virtual reality technology is a computer simulation system that can create and let users experience virtual worlds. It uses a computer to generate a simulation environment: a multi-source information-fusion, interactive, three-dimensional dynamic vision and a simulation of entity behavior that immerses the user in the environment.
  • VR mainly comprises the simulation environment, perception, natural skills, and sensing devices.
  • The simulation environment consists of computer-generated, real-time, dynamic, three-dimensional realistic images. Perception means that an ideal VR system should have every kind of perception a person has: besides vision, it includes hearing, touch, force feedback, and motion, and even smell and taste, which is also known as multi-perception.
  • Natural skills refer to the rotation of a person's head, the eyes, gestures, or other human actions.
  • A sensing device is a three-dimensional interactive device.
  • In VR video (also called 360-degree video or omnidirectional video), only the video image representation and associated audio presentation corresponding to the orientation of the user's head are presented.
  • "A Spatial Object is defined as a spatial part of a content component (e.g. a region of interest, or a tile) and represented by either an Adaptation Set or a Sub-Representation."
  • The spatial relationship between spatial objects (Spatial Objects) is described in the MPD.
  • A spatial object is defined as a part of a content component, such as a region of interest (ROI) or a tile; spatial relationships can be described in an Adaptation Set or a Sub-Representation.
  • The existing DASH standard defines several descriptor elements in the MPD. Each descriptor element has two attributes, schemeIdUri and value: schemeIdUri states what the current descriptor is, and value carries the descriptor's parameter value.
  • Two such descriptors are SupplementalProperty and EssentialProperty (the supplemental property descriptor and the essential property descriptor).
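A minimal sketch of reading such descriptor elements from an MPD fragment. The schemeIdUri "urn:mpeg:dash:srd:2014" is the scheme defined for spatial relationship descriptions in the DASH standard; the surrounding fragment is simplified for illustration.

```python
import xml.etree.ElementTree as ET

# A minimal AdaptationSet carrying an SRD SupplementalProperty descriptor.
MPD_FRAGMENT = """\
<AdaptationSet xmlns="urn:mpeg:dash:schema:mpd:2011">
  <SupplementalProperty schemeIdUri="urn:mpeg:dash:srd:2014"
                        value="1,0,0,1920,1080,3840,2160,2"/>
</AdaptationSet>
"""

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

def read_descriptors(xml_text: str) -> list[tuple[str, str]]:
    """Return (schemeIdUri, value) pairs for each SupplementalProperty element."""
    root = ET.fromstring(xml_text)
    return [
        (el.get("schemeIdUri"), el.get("value"))
        for el in root.findall("mpd:SupplementalProperty", NS)
    ]

descriptors = read_descriptors(MPD_FRAGMENT)
```
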
  • FIG. 3 is a schematic diagram showing the spatial relationship of a spatial object according to an embodiment of the present invention.
  • The image AS can be regarded as a content component, and AS1, AS2, AS3, and AS4 are four spatial objects contained in AS, each associated with a region of space.
  • The spatial relationship of each spatial object, for example the relationship between the regions the objects are associated with, is described in the MPD.
  • The video source ID is 1 (the same content source as the video source above); the upper-left coordinate of the spatial object is (0, 0); the width and height of the spatial object are (1920, 1080); the reference space of the spatial object is (3840, 2160); and the spatial object group ID is 2. Here the spatial object's width and height are one quarter of the reference space, and its coordinates place it in the upper-left corner; that is, Representation2 carries the content of AS1. The other spatial objects are described similarly, and spatial objects with the same spatial object group ID belong to the same video content.
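The SRD value string from the example above can be decoded into these fields as follows; the class and function names are hypothetical, but the field order follows the description.

```python
from dataclasses import dataclass

@dataclass
class SRD:
    source_id: int        # video source ID
    x: int                # upper-left corner of the spatial object
    y: int
    w: int                # width and height of the spatial object
    h: int
    total_w: int          # reference space of the spatial object
    total_h: int
    group_id: int         # spatial object group ID

def parse_srd(value: str) -> SRD:
    return SRD(*(int(v) for v in value.split(",")))

# The value string describing AS1 from the example above.
srd = parse_srd("1,0,0,1920,1080,3840,2160,2")

# AS1 occupies the upper-left quarter of the 3840x2160 reference space.
is_upper_left = (srd.x, srd.y) == (0, 0)
area_fraction = (srd.w * srd.h) / (srd.total_w * srd.total_h)
```
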
  • Figure 16 illustrates a method for a server to multiplex a spatial object stream corresponding to a client's perspective (FOV).
  • The client initiates an ROI request to the server; the server multiplexes the segments of the spatial objects corresponding to the ROI region and sends them to the client.
  • the method can be applied in the interaction of a client and a server based on MPEG-DASH technology.
  • an embodiment of the present invention provides a video stream transmission method, which may include:
  • The client sends a target request to the server, where the target request includes the target spatial position information corresponding to the target spatial object that the client requests to present in the virtual reality (VR) content component. The client receives the server's target request feedback in response to the target request; the feedback includes multiplexed video stream information obtained by performing preset multiplexing processing on the original video stream corresponding to the target spatial object. The client then performs video parsing and presentation according to the multiplexed video stream information.
  • That is, by responding to the client's request with a video stream that has undergone the preset multiplexing process, the number of client requests and the number of server responses are both reduced, and the video stream information of every view at a given time is guaranteed to arrive together. This removes the time spent waiting for all video streams to be received separately, thereby reducing the presentation delay when the viewing angle changes.
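The target request above can be sketched as an HTTP query string. The parameter names (`srd`, `bw`, `codec`, `maxres`) and the URL are hypothetical: the embodiments list the information the request may carry, but do not fix a wire format.

```python
from urllib.parse import urlencode

def build_target_request(base_url: str, x: int, y: int, w: int, h: int,
                         bandwidth_kbps: int, codec: str,
                         max_resolution: str) -> str:
    params = {
        "srd": f"{x},{y},{w},{h}",   # target spatial position in the VR component
        "bw": bandwidth_kbps,        # client bandwidth information
        "codec": codec,              # decoding standard the client supports
        "maxres": max_resolution,    # maximum video resolution of the client
    }
    return f"{base_url}?{urlencode(params)}"

url = build_target_request("http://example.com/vr/stream", 1920, 0, 1920, 1080,
                           8000, "hevc", "3840x2160")
```
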
  • The multiplexed video stream information includes information of N multiplexed sub-video streams, obtained by applying the preset multiplexing process to N sub-video streams.
  • The N sub-video streams are generated by dividing the target spatial object into N subspace objects and encoding each of them.
  • The target request feedback further includes multiplexing description information, which includes at least one of the following items: the number N of sub-video streams contained in the multiplexed video stream information; the offset of the starting position, within the multiplexed video stream information, of the first of the N sub-video streams; data amount information of the N multiplexed sub-video streams; the spatial position information corresponding to each of the N multiplexed sub-video streams in the VR content component; resolution information of the N multiplexed sub-video streams; and the video stream multiplexing type of the N multiplexed sub-video streams.
  • The client can perform parsing and rendering of the multiplexed video stream according to the content of the multiplexing description information.
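As a rough illustration, the multiplexing description information could be carried in a binary header like the one below: a count N, then a start offset and a data amount for each sub-video stream. This layout is an assumption for illustration only; the patent lists the fields but not their encoding.

```python
import struct

def pack_mux_description(entries: list[tuple[int, int]]) -> bytes:
    """Encode (offset, data_amount) pairs: 4-byte count, then 4+4 bytes each."""
    out = struct.pack(">I", len(entries))
    for offset, size in entries:
        out += struct.pack(">II", offset, size)
    return out

def parse_mux_description(buf: bytes) -> list[tuple[int, int]]:
    """Decode the count, then each sub-stream's start offset and data amount."""
    (n,) = struct.unpack_from(">I", buf, 0)
    return [struct.unpack_from(">II", buf, 4 + 8 * i) for i in range(n)]

desc = pack_mux_description([(0, 1000), (1000, 800), (1800, 1200)])
entries = parse_mux_description(desc)
```
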
  • The multiplexing description information further includes the spatial position information corresponding to each of the N multiplexed sub-video streams in the VR content component.
  • The client finally presents the parsed video streams according to the spatial position information corresponding to the N multiplexed sub-video streams.
  • The target request includes at least one of the following: an identifier of the media representation, viewing-angle information of the client's user, or spatial information of the media representation.
  • the server obtains N sub-video streams for multiplexing according to the target request.
  • The multiplexing description information further includes the spatial position information corresponding to each sub-video stream in the VR content component. The client can therefore determine, from the spatial position information of the video streams, which views' sub-video streams have already been requested, so that when repeated view content needs to be viewed there is no need to repeat the request, improving VR video transmission efficiency and user experience.
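The client-side bookkeeping just described can be sketched as a small cache of already-fetched spatial regions; the rectangle representation and class name are assumptions for illustration.

```python
class ViewCache:
    """Remember which spatial regions were already fetched; skip duplicates."""

    def __init__(self):
        self._fetched = set()          # set of (x, y, w, h) tuples

    def need_request(self, region: tuple[int, int, int, int]) -> bool:
        """True if this spatial region has not been requested yet."""
        if region in self._fetched:
            return False
        self._fetched.add(region)
        return True

cache = ViewCache()
first = cache.need_request((0, 0, 1920, 1080))    # new view -> request it
repeat = cache.need_request((0, 0, 1920, 1080))   # same view -> reuse local copy
```
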
  • The target request further includes at least one of: region-of-interest (ROI) information, bandwidth information of the client, decoding standard information supported by the client, and maximum video resolution information of the client.
  • the client may also carry some relevant parameters such as its own video playing condition or playing performance, so that the server can process and feedback the video stream in a more appropriate processing manner.
  • The preset multiplexing processing includes binary end-to-end splicing of video streams, binary end-to-end splicing of video segments, or sample-interleaved multiplexing processing.
  • the preset multiplexing processing method can include multiple types to meet different processing requirements of different VR videos.
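As a rough illustration of the first of these modes, binary end-to-end splicing can be sketched as concatenating the sub-streams and recording each one's start offset so the client can split them apart again; the helper names are hypothetical.

```python
def splice_streams(streams: list[bytes]) -> tuple[bytes, list[int]]:
    """Concatenate sub-streams byte-for-byte, recording each start offset."""
    offsets, payload = [], b""
    for s in streams:
        offsets.append(len(payload))   # start position of this sub-stream
        payload += s
    return payload, offsets

def split_streams(payload: bytes, offsets: list[int]) -> list[bytes]:
    """Recover the original sub-streams from the payload and offsets."""
    bounds = offsets + [len(payload)]
    return [payload[bounds[i]:bounds[i + 1]] for i in range(len(offsets))]

subs = [b"stream-one", b"stream-two-longer", b"s3"]
muxed, offs = splice_streams(subs)
```

The recorded offsets correspond to the "starting position offset" and "data amount" fields of the multiplexing description information, which is what lets the client demultiplex without any per-stream round trip.
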
  • an embodiment of the present invention provides a video stream transmission method, which may include:
  • The server receives a target request sent by the client, where the target request includes the target spatial position information corresponding to the target spatial object that the client requests to present in the virtual reality (VR) content component.
  • The server searches for the corresponding target spatial object in the VR content component according to the target spatial position information; the server obtains the multiplexed video stream information produced by performing the preset multiplexing process on the video stream corresponding to the target spatial object; and the server sends target request feedback, containing the multiplexed video stream information, to the client in response to the target request.
  • In other words, the server multiplexes and encapsulates the video streams related to the view position information in the client's request and transmits them to the client; here, a video stream related to the view position information is a video stream in the video content that contains some or all of the content in the view range requested by the client.
  • By responding to the client's request with a preset-multiplexed video stream, the server reduces the number of client requests and server responses and ensures that the video stream information of all views arrives simultaneously, removing the time spent waiting for each video stream to be received separately and thereby reducing the presentation delay of the viewing angle.
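The server-side lookup step (finding the spatial objects that cover the requested view) can be sketched as a rectangle-intersection test. The tiling follows the FIG. 3 example (four spatial objects covering a 3840x2160 source); the function names and the FOV rectangle are hypothetical.

```python
def intersects(a, b) -> bool:
    """True if two (x, y, w, h) rectangles overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

# Four spatial objects tiling a 3840x2160 source, as in FIG. 3.
SPATIAL_OBJECTS = {
    "AS1": (0, 0, 1920, 1080),
    "AS2": (1920, 0, 1920, 1080),
    "AS3": (0, 1080, 1920, 1080),
    "AS4": (1920, 1080, 1920, 1080),
}

def objects_for_fov(fov):
    """Spatial objects whose region intersects the requested FOV rectangle."""
    return sorted(name for name, region in SPATIAL_OBJECTS.items()
                  if intersects(region, fov))

# An FOV centred on the image touches all four tiles.
hit = objects_for_fov((960, 540, 1920, 1080))
```
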
  • The server obtains the multiplexed video stream information produced by performing the preset multiplexing processing on the original video stream corresponding to the target spatial object as follows:
  • The server divides the target spatial object into N subspace objects and encodes the N subspace objects to generate N corresponding sub-video streams, where N is a natural number greater than 1; the server then obtains the information of the N multiplexed sub-video streams produced by applying the preset multiplexing process to the N sub-video streams.
  • The target request feedback further includes multiplexing description information, which includes at least one of the following items: the number N of sub-video streams contained in the multiplexed video stream information; the offset of the starting position, within the multiplexed video stream information, of the first of the N sub-video streams; the data amount of the N multiplexed sub-video streams; the spatial position information corresponding to each of the N multiplexed sub-video streams in the VR content component; resolution information of the N multiplexed sub-video streams; and the video stream multiplexing type of the N multiplexed sub-video streams.
  • The multiplexing description information further includes the spatial position information corresponding to each of the N multiplexed sub-video streams in the VR content component.
  • The multiplexing description information further includes the spatial position information corresponding to each sub-video stream in the VR content component.
  • The preset multiplexing processing includes binary end-to-end splicing of video streams, binary end-to-end splicing of video segments, or sample-interleaved multiplexing processing.
  • an embodiment of the present invention provides a client, which may include:
  • a requesting module, configured to send a target request to the server, where the target request includes at least one of the following: the target spatial position information corresponding to the target spatial object that the client requests to present in the virtual reality VR content component, an identifier of the media representation, viewing-angle information of the client's user, or spatial information of the media representation;
  • a receiving module, configured to receive the server's target request feedback in response to the target request, where the target request feedback includes the multiplexed video stream information obtained by performing preset multiplexing processing on the original video stream corresponding to the target spatial object;
  • a processing module configured to perform video parsing and presentation according to the multiplexed video stream information.
  • the multiplexed video stream information includes information of N multiplexed sub-video streams obtained by the N sub-video streams respectively by the preset multiplexing process.
  • the N sub-video streams are corresponding sub-video streams that are obtained by dividing the target spatial object into N sub-space objects and encoding the N sub-space objects, where N is a natural number greater than 1.
  • the target request feedback further includes multiplexing description information, and the multiplexing description information includes at least one of the following items:
  • the number N of the sub-video streams included in the multiplexed video stream information;
  • the offset of the starting position of the first of the N sub-video streams within the multiplexed video stream information;
  • the video stream multiplexing type of the N multiplexed sub-video streams.
  • The multiplexing description information further includes the spatial position information corresponding to each of the N multiplexed sub-video streams in the VR content component.
  • The multiplexing description information further includes the spatial position information corresponding to each sub-video stream in the VR content component.
  • The target request further includes at least one of: region-of-interest (ROI) information, bandwidth information of the client, decoding standard information supported by the client, and maximum video resolution information of the client.
  • The preset multiplexing processing includes binary end-to-end splicing of video streams, binary end-to-end splicing of video segments, or sample-interleaved multiplexing processing.
  • An embodiment of the present invention provides a server, which may include:
  • a receiving module, configured to receive a target request sent by the client, where the target request includes at least one of the following: the target spatial position information corresponding to the target spatial object that the client requests to present in the virtual reality VR content component, an identifier of the media representation, viewing-angle information of the client's user, or spatial information of the media representation;
  • a parsing module, configured to search for the corresponding target spatial object in the VR content component according to the target spatial position information;
  • An acquiring module configured to obtain multiplexed video stream information obtained by performing preset multiplexing processing on the original video stream corresponding to the target spatial object;
  • a feedback module configured to send, to the client, target request feedback in response to the target request, where the target request feedback includes the multiplexed video stream information.
  • the acquiring module includes:
  • a dividing unit configured to divide the target spatial object into N subspace objects, and encode the N subspace objects to generate corresponding N sub video streams, where N is a natural number greater than 1;
  • an obtaining unit configured to acquire information of the N multiplexed sub-video streams obtained by performing the preset multiplexing process on the N sub-video streams.
  • the target request feedback further includes multiplexing description information, where the multiplexing description information includes at least one of the following items:
  • the number N of the sub video streams included in the multiplexed video stream information;
  • the starting position offset, within the multiplexed video stream information, of the first of the N sub-video streams;
  • the video stream multiplexing type of the N multiplexed sub video streams.
  • the multiplexing description information further includes: spatial location information corresponding to each of the N multiplexed sub-video streams in the VR content component.
  • the multiplexing description information further includes: spatial location information corresponding to each of the N sub-video streams in the VR content component.
  • the preset multiplexing process includes video stream binary end-to-end splicing, video segment binary end-to-end splicing, or sample interleave multiplexing.
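The two multiplexing modes named above can be sketched in a few lines. This is a hypothetical Python illustration only, not the patent's implementation; the function names and the assumption that all streams carry the same number of samples are the author's own.

```python
# Hypothetical sketch of the two preset multiplexing modes (names illustrative).

def splice_streams(streams):
    """Binary end-to-end splicing: each sub video stream is appended whole."""
    return b"".join(streams)

def interleave_samples(streams):
    """Sample interleaving: the i-th sample of every stream is written in turn.
    Each stream is given as a list of samples (bytes); equal sample counts assumed."""
    out = []
    for samples in zip(*streams):
        out.extend(samples)
    return b"".join(out)

assert splice_streams([b"AAAA", b"BBBB"]) == b"AAAABBBB"
assert interleave_samples([[b"A1", b"A2"], [b"B1", b"B2"]]) == b"A1B1A2B2"
```

Splicing keeps each sub-stream contiguous (a single offset plus size locates it), while sample interleaving lets the client start decoding every view as soon as the first samples arrive.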
  • an embodiment of the present invention provides a client, which may include a processor, a memory, and a transceiver, where the memory is configured to store instructions, and the processor is configured to invoke the instructions stored in the memory to execute the method provided by the embodiments of the present invention.
  • an embodiment of the present invention provides a server, which may include a processor, a memory, and a transceiver, where the memory is configured to store instructions, and the processor is configured to invoke the instructions stored in the memory to perform the method of the second aspect of the embodiments of the present invention.
  • An embodiment of the seventh aspect of the present invention provides a method for processing video data based on a streaming media technology, where the method includes:
  • the server receives a video data acquisition request sent by the client, where the acquisition request includes information of a spatial object;
  • the server encapsulates the video data corresponding to the at least two media representations into one code stream
  • the server sends the code stream to the client.
  • the information of the spatial object includes at least one of the following:
  • the identifier of the media representation, the perspective information of the user of the client, or the spatial information of the media representation.
  • the code stream includes at least one of the following information:
  • the starting position offset of the media representation in the code stream;
  • the amount of data of the media representation;
  • the spatial location information corresponding to the media representation;
  • the resolution information of the media representation.
  • the code stream includes a package identifier, where the identifier is used to indicate that the code stream adopts a segment interleaving encapsulation manner or a sample interleaving encapsulation manner.
  • An embodiment of the eighth aspect of the present invention provides a method for processing video data based on a streaming media technology, the method comprising:
  • the client sends a video data acquisition request to the server, where the acquisition request includes information of a spatial object;
  • the client receives a code stream sent by the server in response to the video data acquisition request, where the code stream includes the data of at least two media representations.
  • the information of the spatial object includes at least one of the following:
  • the identifier of the media representation, the perspective information of the user of the client, or the spatial information of the media representation.
  • the code stream includes at least one of the following information:
  • the starting position offset of the media representation in the code stream;
  • the amount of data of the media representation;
  • the spatial location information corresponding to the media representation;
  • the resolution information of the media representation.
  • the code stream includes a package identifier, where the identifier is used to indicate that the code stream adopts a segment interleaving encapsulation manner or a sample interleaving encapsulation manner.
  • An embodiment of the ninth aspect of the present invention provides a server based on a streaming media technology, the server comprising:
  • a receiver configured to receive a video data acquisition request sent by the client, where the acquisition request includes information of a spatial object
  • a processor configured to determine, according to the information of the spatial object, the video data corresponding to the at least two media representations
  • the processor is further configured to encapsulate the video data corresponding to the at least two media representations into one code stream;
  • a transmitter configured to send the code stream to the client.
  • the information of the spatial object includes at least one of the following:
  • the identifier of the media representation, the perspective information of the user of the client, or the spatial information of the media representation.
  • the code stream includes at least one of the following information:
  • the starting position offset of the media representation in the code stream;
  • the amount of data of the media representation;
  • the spatial location information corresponding to the media representation;
  • the resolution information of the media representation.
  • the code stream includes a package identifier, where the identifier is used to indicate that the code stream adopts a segment interleaving encapsulation manner or a sample interleaving encapsulation manner.
  • An embodiment of the tenth aspect of the present invention provides a client based on a streaming media technology, where the client includes:
  • a sender configured to send a video data acquisition request to the server, where the acquisition request includes information of a spatial object;
  • a receiver configured to receive a code stream sent by the server in response to the video data acquisition request, where the code stream includes the data of at least two media representations.
  • the information of the spatial object includes at least one of the following:
  • the identifier of the media representation, the perspective information of the user of the client, or the spatial information of the media representation.
  • the code stream includes at least one of the following information:
  • the starting position offset of the media representation in the code stream;
  • the amount of data of the media representation;
  • the spatial location information corresponding to the media representation;
  • the resolution information of the media representation.
  • the code stream includes a package identifier, where the identifier is used to indicate that the code stream adopts a segment interleaving encapsulation manner or a sample interleaving encapsulation manner.
  • the server multiplexes and encapsulates, according to the view location information in the request information of the client, the video streams involved in that view location information, and transmits them to the client. The video streams involved in the view location information are the video streams whose content partially or completely overlaps the range of views requested by the client. That is, by answering the client request with a video stream that has undergone the preset multiplexing processing, the number of client requests is reduced, the number of server responses is reduced, and the video stream information of all views at the same moment is guaranteed to arrive together, which eliminates the time spent waiting for each video stream to be received separately and thereby reduces the presentation delay when the viewing angle changes.
  • FIG. 2 is a hierarchical structure diagram of an MPD file according to an embodiment of the present invention;
  • FIG. 3 is a schematic diagram of spatial relationships of spatial objects according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a network architecture of a video stream transmission system according to an embodiment of the present invention.
  • FIG. 5 is a schematic flowchart of a video stream transmission method according to an embodiment of the present invention.
  • FIG. 6 is a 360 degree view change diagram provided by an embodiment of the present invention;
  • FIG. 7 is a schematic diagram of the mapping of a sphere to a latitude and longitude map according to an embodiment of the present invention;
  • FIG. 8 is a schematic flowchart of another video stream transmission method according to an embodiment of the present invention;
  • FIG. 9 is a schematic diagram of a multiplexed video stream according to an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of sample interleaving multiplexing in a video stream according to an embodiment of the present invention.
  • FIG. 11 is a schematic structural diagram of a client according to an embodiment of the present disclosure.
  • FIG. 12 is a schematic structural diagram of a server according to an embodiment of the present disclosure.
  • FIG. 13 is a schematic structural diagram of another client according to an embodiment of the present disclosure.
  • FIG. 14 is a schematic structural diagram of another server according to an embodiment of the present disclosure.
  • FIG. 15 is a schematic structural diagram of a video stream transmission system according to an embodiment of the present invention.
  • FIG. 16 is a schematic structural diagram of a system for segment multiplexing according to an embodiment of the present invention.
  • references to "an embodiment" herein mean that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of this phrase in various places in the specification do not necessarily refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art will understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
  • the client may be installed on the terminal device in the form of software or an APP, or may exist as an inherent functional component inside the system of the terminal device (such as a terminal device that supports VR video viewing). That is, the client mentioned in the present invention refers to a terminal device on which the client has been successfully installed.
  • the terminal device includes, but is not limited to, various forms of User Equipment (UE) that can provide a VR video viewing experience, such as an access terminal, a subscriber unit, a subscriber station, a mobile station, a remote station, a remote terminal, a mobile device, a user terminal, a wireless communication device, a user agent or user device, a cellular phone, a cordless phone, a smart phone, a tablet, a Session Initiation Protocol (SIP) phone, a Wireless Local Loop (WLL) station, a smart bracelet, smart wearable devices (such as smart glasses and smart helmets), an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a Personal Digital Assistant (PDA), a handheld device with wireless communication capability, a computing device or other processing device connected to a wireless modem, in-vehicle equipment, and terminal equipment in future 5G networks.
  • the server can be a cloud service device, a terminal device, or a core network device that stores a large number of VR video files, handles the interaction with client requests, and performs processing operations such as encoding, decoding, and multiplexing of the VR video.
  • "Multiple" means two or more. "And/or" describes the association relationship of associated objects and indicates that three relationships may exist: for example, A and/or B may indicate that A exists alone, A and B exist at the same time, or B exists alone. The character "/" generally indicates that the objects before and after it are in an "or" relationship.
  • FIG. 4 is a schematic diagram of a network architecture of a video streaming system according to an embodiment of the present invention.
  • the system includes a client and a server. The user can use the client to initiate a VR video request to the server through a wired or wireless network; the server responds to the VR video request and feeds back the corresponding VR video content to the client; the client parses the fed-back VR video content and presents the VR video effect to the user. That is, the user implements the VR video experience through the request-based video stream interaction between the client and the server.
  • the functions of the client include, but are not limited to, sending a VR video request to the server according to the current perspective location information of the client, where the request carries the perspective information of the client, the multiplexing description information, and the like.
  • the functions of the server include, but are not limited to: managing the description information of all media stream files of the VR video, including the spatial location information of the content of each video stream in the VR video; obtaining the request information of the client and parsing the perspective information carried in the request; reading the video streams according to the view information; and encapsulating and multiplexing the video streams involved in the user's view, where the packaged and multiplexed file includes the multiplexing description information of each view.
  • the server can also be a logic module in a Content Delivery Network (CDN).
  • the above network architecture is only one embodiment of the present invention. The network architecture in the embodiments of the present invention includes, but is not limited to, the foregoing network architecture; any network architecture that can implement the video stream transmission method of the present invention falls within the scope protected and covered by the invention.
  • FIG. 5 is a schematic flowchart of a video stream transmission method according to an embodiment of the present invention. The video streaming method in the embodiment of the present invention will be described in detail below, in conjunction with FIG. 5, from the interaction side between the client and the server. The method may include the following steps S501 to S505.
  • Step S501: The client sends a target request to the server, and the server receives the target request sent by the client, where the target request includes at least one of: the target spatial location information corresponding to the target spatial object that the client needs to present in the virtual reality VR content component, an identifier of the media representation, the perspective information of the user of the client, or the spatial information of the media representation.
  • A spatial object (Spatial Object) is a part of a content component; that is, a content component is composed of a plurality of spatial objects, for example of the sub-videos corresponding to the different perspectives.
  • the identifier of the media representation is the identifier of the sub-video stream, and the perspective information of the user of the client is the spatial object information. That is, in the embodiment of the present invention, the VR content component may be a VR video, and the target spatial object may be the perspective part of the VR video that the user needs to have presented, which can be called the Region of Interest (ROI): a region outlined from the processed image in the form of a box, a circle, an ellipse, or an irregular polygon is called the region of interest.
  • the target spatial location information carried in the target request can therefore be regarded as the viewing angle region that the user currently wants presented. It can be understood that the target request may be triggered by an angle movement action on the client, or by a related input instruction of the user, which is not specifically limited by the present invention.
  • the user can view the VR video in 360 degrees, but the video display area viewed by the user at any moment is only a part of the VR video. Therefore, when the content is prepared, the VR video is divided into multiple regions, each region corresponding to a group of adaptive code streams, and the client selects the corresponding video stream to receive according to the area the user is viewing.
  • FIG. 6 is a 360-degree view change diagram provided by an embodiment of the present invention.
  • the contents of the left border and the right border are respectively two view areas of the user. When watching the video, the user turns from the left border to the right border through some operation (such as turning a smart helmet), and when the user's perspective switches to the right border, the client presents the video content of the corresponding viewing area. Since the viewing position of the user is arbitrary, the content of a given perspective may span multiple VR partitioned areas, and the user then needs to obtain the video streams of multiple regions.
  • in addition to mapping the spherical surface in FIG. 6 to the latitude and longitude map, the spherical surface can also be mapped onto other geometries such as a cube or a polyhedron. In the following description, the 2D latitude-and-longitude image mapping mode is mainly described, but other mapping methods are also covered by the protection of the present invention.
  • FIG. 7 shows a mapping of a sphere to a latitude and longitude map according to an embodiment of the present invention, in which the target spatial position information is assumed to be the coordinates of the upper-left position of the viewing angle area and the width and height of the viewing angle area. For example, the upper-left position of the right border of FIG. 6 is (x, y) in the latitude and longitude picture, and the width and height of the right border are (w, h); the request of the client carries the values of x, y, w, h, or proportionally scaled values of x, y, w, h, or the corresponding angle values on the sphere.
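The three ways of carrying the viewport just described (absolute pixels, proportional values, or sphere angles) can be related with simple arithmetic. The sketch below is illustrative; the 360° x 180° equirectangular mapping and the function names are assumptions, not taken from the patent text.

```python
# Illustrative conversions between the three viewport encodings for a
# latitude-and-longitude (equirectangular) picture of size pic_w x pic_h.

def pixels_to_proportional(x, y, w, h, pic_w, pic_h):
    """Scale absolute pixel coordinates to [0, 1] proportional values."""
    return (x / pic_w, y / pic_h, w / pic_w, h / pic_h)

def pixels_to_angles(x, y, w, h, pic_w, pic_h):
    """Map pixels to sphere angles: the width spans 360 degrees of longitude,
    the height 180 degrees of latitude (equirectangular assumption)."""
    return (x / pic_w * 360.0, y / pic_h * 180.0,
            w / pic_w * 360.0, h / pic_h * 180.0)

# A 1024x512 latitude-longitude picture with a viewport at (256, 128), size 256x128:
assert pixels_to_proportional(256, 128, 256, 128, 1024, 512) == (0.25, 0.25, 0.25, 0.25)
assert pixels_to_angles(256, 128, 256, 128, 1024, 512) == (90.0, 45.0, 90.0, 45.0)
```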
  • the target request further includes at least one of region of interest (ROI) information, bandwidth information of the client, decoding standard information supported by the client, and maximum video resolution information of the client. That is, when the client initiates the video request, it may also carry related parameters such as video playing conditions or playing performance, so that the server can process and feed back the video stream in a more appropriate manner.
  • Step S502 The server searches for a corresponding target space object in the VR content component according to the target spatial location information.
  • the server searches for the corresponding target space object in the VR content component according to the target spatial location information in the received target request, so as to subsequently obtain the video stream corresponding to the target spatial object. For example, after receiving the target request of the client, the server parses the target request to obtain the perspective information requested by the client, and obtains, from the media presentation description information according to that perspective information, the video streams whose content overlaps the client view area.
  • Step S503 The server acquires multiplexed video stream information obtained by performing preset multiplexing processing on the original video stream corresponding to the target spatial object.
  • the server acquires the multiplexed video stream information obtained by performing the preset multiplexing process on the original video stream corresponding to the target spatial object. It should be noted that the preset multiplexing processing of the original video stream may be completed before the server receives the target request, or may be performed after the target request is received. If it is completed beforehand, the time for responding to the request is saved: after the target space object is determined, the pre-multiplexed video stream information corresponding to it is obtained directly, which improves the response rate of the server, shortens the response time, and improves the user's viewing experience. If the preset multiplexing process is performed after the target request is received, it takes a certain multiplexing processing time, but the storage space that a large number of advance multiplexing operations would require is saved. Of course, the two methods may also be combined: content that users may need to view frequently is multiplexed in advance, while content that users rarely view is not processed until the target request is received. Therefore, the present invention does not specifically limit when the server performs the preset multiplexing processing on the original video stream corresponding to the target space object.
  • for example, the server obtains the corresponding video streams according to the information of the video streams whose content overlaps the client view area, and performs the preset multiplexing processing on them. As shown in FIG. 6, the right border is the perspective area requested by the client, and areas A to I are the nine areas described in the media presentation description information; the code stream multiplexing module can derive the area coverage of the right border content, beginning with area B, from the information of the right border and the position information of the nine areas A to I.
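Deriving which of the nine tiled areas a requested viewport covers is an axis-aligned rectangle intersection test. The following is a hedged sketch under assumed coordinates (a 3x3 grid of 100x100 tiles); the actual layout of areas A to I in FIG. 6 is not specified here.

```python
# Sketch: which of the nine regions (A..I) overlap the requested viewport.
# Rectangles are (x, y, w, h); the 100x100 grid geometry is an assumption.

def overlaps(r1, r2):
    x1, y1, w1, h1 = r1
    x2, y2, w2, h2 = r2
    return x1 < x2 + w2 and x2 < x1 + w1 and y1 < y2 + h2 and y2 < y1 + h1

def covered_regions(viewport, regions):
    return [name for name, rect in regions.items() if overlaps(viewport, rect)]

# 3x3 grid of 100x100 regions labelled A..I row by row:
regions = {chr(ord("A") + r * 3 + c): (c * 100, r * 100, 100, 100)
           for r in range(3) for c in range(3)}
viewport = (150, 50, 100, 100)   # straddles columns 1-2 and rows 0-1
assert covered_regions(viewport, regions) == ["B", "C", "E", "F"]
```

The server would then fetch and multiplex only the streams of the returned regions.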
  • the multiplexed video stream includes the multiplexed video stream description information, where the description information includes part or all of the following: the number of video streams in the multiplexed video stream, the spatial area location information of each multiplexed video stream, the resolution information of each multiplexed video stream, the storage location information of each multiplexed video stream within the multiplexed file, the video stream multiplexing type, and the resolution information of the video source corresponding to each view. The specific preset multiplexing process may be binary end-to-end splicing of the multiplexed video streams in the multiplexed file, or sample-interleaved storage.
  • further, the multiplexed video stream information includes the information of N multiplexed sub-video streams obtained from N sub-video streams that have each undergone the preset multiplexing processing, where the N sub-video streams and the N multiplexed sub-video streams are in one-to-one correspondence. The N sub-video streams are the corresponding sub-video streams generated by dividing the target spatial object into N sub-space objects and dividing the original video stream according to the N sub-space objects, where N is a natural number greater than 1. The target request feedback further includes multiplexing description information, which includes at least one of the following items: the number N of sub-video streams included in the multiplexed video stream information; the starting position offset of the initial sub-video stream of the N sub-video streams within the multiplexed video stream information; the sizes of the N multiplexed sub-video streams; the spatial position information corresponding to the N multiplexed sub-video streams in the VR content component; the resolution information of the N multiplexed sub-video streams; the video stream multiplexing type of the N multiplexed sub-video streams; and the resolution information of the N sub-video streams. That is, the various types of information carried in the multiplexing description information make it convenient for the client to complete the parsing and rendering of the VR video requested by the user.
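The fields enumerated above can be sketched as a small record. This is a minimal illustrative model, not the patent's wire format; the field names and the `offset_of` helper are the author's assumptions.

```python
# Minimal sketch of the multiplexing description information (names illustrative).
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class MuxDescription:
    sub_stream_count: int                                # the number N
    first_offset: int                                    # offset of the first sub-stream
    sub_stream_sizes: List[int]                          # size of each multiplexed sub-stream
    spatial_positions: List[Tuple[int, int, int, int]]   # (x, y, w, h) per sub-stream
    mux_type: str                                        # e.g. "splice" or "sample_interleave"

    def offset_of(self, index: int) -> int:
        """Byte offset of sub-stream `index`; valid for the splice mux type,
        where sub-streams are stored contiguously one after another."""
        return self.first_offset + sum(self.sub_stream_sizes[:index])

desc = MuxDescription(2, 16, [100, 200],
                      [(0, 0, 640, 480), (640, 0, 640, 480)], "splice")
assert desc.offset_of(0) == 16
assert desc.offset_of(1) == 116
```

With such a record the client can seek directly to any sub-video stream without scanning the whole multiplexed file.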
  • Step S504: The server sends a target request feedback to the client in response to the target request, and the client receives the target request feedback, where the target request feedback includes the multiplexed video stream information obtained by performing the preset multiplexing processing on the original video stream corresponding to the target space object.
  • if the server directly returns the corresponding video streams, there may be a large amount of video stream coding redundancy, especially in VR video scenes containing repeated content. For example, in a sightseeing VR experience scene, the color of the sky or the color and texture of a river are basically the same everywhere, so the repeated content can be multiplexed to save transmission bandwidth and improve the time and efficiency of video stream transmission.
  • further, the multiplexing description information includes the spatial location information corresponding to each of the N multiplexed sub-video streams in the VR content component, that is, the specific spatial location information of the multiple multiplexed sub-video streams. Therefore, the client can finally parse and display the VR video that the user needs to view according to the information in the multiplexing description information.
  • further, the multiplexing description information may also include the spatial location information corresponding to the N sub-video streams in the VR content component. The client can thus know which views' sub-video streams have already been requested, so that when repeated view content needs to be viewed, there is no need to repeat the request, improving VR video transmission efficiency and user experience.
  • the preset multiplexing process includes video stream binary end-to-end splicing, video segment binary end-to-end splicing, or sample interleaving multiplexing. That is, the preset multiplexing processing may be implemented in multiple ways, which are not specifically limited by the present invention.
  • Step S505 The client performs video parsing and presentation according to the multiplexed video stream information.
  • the client parses the related video streams according to the multiplexed video stream information carried in the target request feedback sent by the server, and finally presents them. Specifically, the client obtains the multiplexed video stream, parses the multiplexed video stream description information in it, sends the video streams to the decoder for decoding, and presents the decoded video content according to the information in the multiplexed video stream description information.
  • other video streams that need to be transmitted to the client may be included.
  • the server multiplexes, encapsulates, and transmits to the client, according to the view location information in the request information of the client, the video streams involved in that view location information; the video streams involved in the view location information are the video streams in the video content whose content partially or completely overlaps the content range requested by the client. That is, by responding to the client request with a video stream that has undergone the preset multiplexing processing, the number of client requests is reduced, the number of server responses is also reduced, and the video stream information of each view at the same moment is guaranteed to arrive simultaneously, which reduces the time spent waiting for all video streams to be received separately and thereby reduces the presentation delay of the viewing angle.
  • FIG. 8 is a schematic flowchart of another video stream transmission method according to an embodiment of the present invention. Another video stream transmission method in the embodiment of the present invention will be described in detail below with reference to FIG. 8, from the interaction side between the client and the server. The method may include the following steps S801 to S806.
  • steps S801 to S802 in the embodiment provided in FIG. 8 are the same as steps S501 to S502 in the embodiment provided in FIG. 5, and specific implementation manners are not described herein again.
  • Step S803: The server divides the target spatial object into N subspace objects, and encodes the N subspace objects to generate corresponding N sub video streams, where N is a natural number greater than 1.
  • the target spatial object is divided into multiple sub-space objects so as to better multiplex the different sub-video streams corresponding to the multiple subspaces, thereby further improving the multiplexing efficiency of the video stream.
  • the target space object may be divided into N subspace objects according to the continuity of spatial positions, or according to the content of the video or its degree of overlap.
  • Step S804: The server acquires the information of the N multiplexed sub-video streams obtained by performing the preset multiplexing processing on the N sub-video streams.
  • the server acquires the information of the multiple multiplexed sub video streams obtained by performing the preset multiplexing processing on the sub-video streams of the multiple subspace objects, so that they can finally be transmitted at a small code rate. The preset multiplexing process may be performed in advance, or performed after the subspace objects are determined.
  • steps S805 to S806 in the embodiment provided in FIG. 8 are the same as the steps S504 to S505 in the embodiment provided in FIG. 5 , and specific implementation manners are not described herein again.
  • the related multiplexing description information of the video stream that uses the preset multiplexing processing in the foregoing embodiments may be expressed in any one of the following specific description manners:
  • FOVCount: the number of sub video streams in the multiplexed video stream information
  • First_offset: the offset of the sub-video stream of the first view within the multiplexed video stream
  • FOV_size: the size of each multiplexed sub-video stream in the multiplexed video stream
  • The client receives the multiplexed video stream information, parses the multiplexed video stream description information in 'fovm', and obtains the number of sub-video streams and the offset and size information of each sub-video stream;
  • FIG. 9 is a schematic diagram of a multiplexed video stream according to an embodiment of the present invention.
  • Video 1 to video n are the video content of the same time interval, and other data may not be present;
  • First_offset is the starting position offset of video 1.
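The client-side parsing for this description manner can be sketched as follows. The byte layout (FOVCount, First_offset, and the FOV_size values packed as 32-bit big-endian integers ahead of the end-to-end payloads) is an assumption for illustration, not the normative 'fovm' box format:

```python
import struct

def split_multiplexed_stream(data):
    """Parse a toy manner-1 multiplex: FOVCount, First_offset, then
    FOVCount FOV_size values (all uint32, big-endian), followed by the
    sub-video payloads laid end to end starting at First_offset."""
    fov_count, first_offset = struct.unpack_from(">II", data, 0)
    sizes = struct.unpack_from(">%dI" % fov_count, data, 8)
    streams, pos = [], first_offset
    for size in sizes:
        streams.append(data[pos:pos + size])  # one sub-video stream
        pos += size
    return streams

# Two sub-video streams of 3 and 2 bytes, payload starting at offset 16
blob = struct.pack(">IIII", 2, 16, 3, 2) + b"AAABB"
parts = split_multiplexed_stream(blob)
```

Each recovered sub-video stream would then be sent to its decoder before stitching.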
  • The first three steps are consistent with the client behavior in description manner 1.
  • In the fourth step, the presentation splices the decoded images according to the xywh information in 'fovm';
  • Step 5: Combining the ROI information carried in the client's request, present the stitched video stream content.
  • ROI_x: the x-axis position information, in the VR content component, corresponding to the N sub-video streams requested by the client
  • ROI_y: the y-axis position information, in the VR content component, corresponding to the N sub-video streams requested by the client
  • ROI_w: the width of the N sub-video streams requested by the client
  • ROI_h: the height of the N sub-video streams requested by the client
  • The new ROI information in this description manner can be used together with the information in description manners 1 and 2;
  • The first to fourth steps are consistent with the client behavior in description manner 2.
  • The fifth step becomes: present the video content of the region specified by ROI_x, ROI_y, ROI_w, and ROI_h within the stitched content of the multiple views.
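The fifth step above (presenting only the ROI_x/ROI_y/ROI_w/ROI_h region of the stitched content) amounts to a crop. A minimal sketch, using a plain 2-D array as a stand-in for the stitched frame and illustrative names:

```python
def crop_roi(frame, roi_x, roi_y, roi_w, roi_h):
    """Return the ROI_w x ROI_h region of a stitched frame (a list of
    pixel rows) whose top-left corner is at (ROI_x, ROI_y)."""
    return [row[roi_x:roi_x + roi_w] for row in frame[roi_y:roi_y + roi_h]]

# A 4x4 stitched frame; present only the 2x2 region at (1, 1)
frame = [[r * 4 + c for c in range(4)] for r in range(4)]
roi = crop_roi(frame, 1, 1, 2, 2)
```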
  • MultiplexType: the multiplexing mode of the stream files of the multiplexed video stream within the multiplex file: binary end-to-end splicing of the video streams (or of the stream segments), or sample-interleaved multiplexing of the samples of the video streams
  • Sample_offset: the offset of the sample in the multiplex file
  • Sample_size: the size of the sample
  • The client receives the multiplexed video stream, parses the multiplexed video stream description information in 'fovm', and obtains the mode information of the multiplexed video stream;
  • The client determines the multiplexing mode of the data of each view according to the multiplexing mode information. If the multiplexing mode is end-to-end splicing, the client parses the offset information and the data amount information and sends the data of each view to the decoder; if the mode is sample interleaving, the client parses the offset and data amount information of each sample and sends each sample to the corresponding decoder.
  • FIG. 10 is a schematic diagram of sample interleave multiplexing in a video stream according to an embodiment of the present invention.
  • The video streams of the different views (videos 1, 2, and 3) can be multiplexed by interleaving.
  • The sub-video streams corresponding to the checkered, oblique, and vertical lines may be multiplexed by interleaving, and the multiplexing result is the multiplexed video stream on the right side of FIG. 10.
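The client behavior described above (choose a demultiplexing path from the MultiplexType, then either take whole end-to-end payloads or route interleaved samples to per-view decoder buffers) can be sketched as follows. The type names and record layout are illustrative assumptions, not the embodiment's wire format:

```python
def demultiplex(multiplex_type, records):
    """Route multiplexed data to per-view decoder buffers.
    'end_to_end': records are (view_id, payload) whole-stream splices.
    'sample_interleaved': records are (view_id, sample_offset, sample)
    interleaved samples, as in FIG. 10."""
    per_view = {}
    if multiplex_type == "end_to_end":
        for view_id, payload in records:
            per_view[view_id] = payload            # one splice per view
    elif multiplex_type == "sample_interleaved":
        for view_id, _offset, sample in records:
            per_view.setdefault(view_id, b"")      # append each sample to
            per_view[view_id] += sample            # its view's buffer
    else:
        raise ValueError("unknown multiplexing mode")
    return per_view

# Samples of views 1 and 2 arrive interleaved
views = demultiplex("sample_interleaved",
                    [(1, 0, b"a1"), (2, 0, b"b1"), (1, 2, b"a2")])
```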
  • source_w and source_h are respectively the width and height of the video source corresponding to the view.
  • each spatial position involved in the above five description manners may be absolute position information in the VR video content, or may be a proportional value or a yaw angle.
  • The request information of the client uses the HTTP protocol;
  • the HTTP GET request carries the view area information of the client, such as the x, y, w, and h mentioned in the embodiment.
  • The request information of the client may further carry the bandwidth of the client, the decoding standard supported by the client, or the maximum video resolution of the client; the server selects, according to the information carried in the request, video streams that meet the client's performance requirements for multiplexed transmission.
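A client request of the kind just described, an HTTP GET whose query string carries the view area (x, y, w, h) plus optional bandwidth and decoder capability, might be constructed as below. The URL, parameter names, and values are illustrative, not fixed by the embodiment:

```python
from urllib.parse import urlencode

def build_view_request(base_url, x, y, w, h, bandwidth=None, codec=None):
    """Build a GET URL carrying the client's view area and, optionally,
    its bandwidth and supported decoding standard."""
    params = {"x": x, "y": y, "w": w, "h": h}
    if bandwidth is not None:
        params["bandwidth"] = bandwidth
    if codec is not None:
        params["codec"] = codec
    return base_url + "?" + urlencode(params)

url = build_view_request("http://example.com/vr/segment", 640, 0, 1280, 720,
                         bandwidth=5_000_000, codec="hevc")
```

The server would parse these parameters and multiplex only the sub-video streams overlapping the requested view.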
  • the multiplexed video stream data may be segment media data in the DASH protocol.
  • The code streams involved in the multiplexing may include, among the code streams generated on the server side, those whose content partially or completely overlaps the content area requested by the client.
  • the present invention also provides a related apparatus for implementing the foregoing method.
  • FIG. 11 is a schematic structural diagram of a client according to an embodiment of the present invention.
  • the client 10 includes a requesting module 101, a receiving module 102, and a processing module 103.
  • the requesting module 101 is configured to send a target request to the server, where the target request includes the target spatial location information corresponding to the target spatial object that the client requests to present in the virtual reality VR content component;
  • the receiving module 102 is configured to receive, by the server, the target request feedback in response to the target request, where the target request feedback includes the multiplexed video stream information obtained by performing preset multiplexing processing on the original video stream corresponding to the target spatial object;
  • the processing module 103 is configured to perform video parsing and presentation according to the multiplexed video stream information.
  • The multiplexed video stream information includes information of N multiplexed sub-video streams obtained by subjecting N sub-video streams to the preset multiplexing processing, where the N sub-video streams are the corresponding sub-video streams generated by dividing the target spatial object into N subspace objects and encoding the N subspace objects, and N is a natural number greater than 1;
  • the target request feedback further includes multiplexing description information,
  • the multiplexing description information includes at least one of the following items:
  • the number N of the sub-video streams included in the multiplexed video stream information;
  • the starting-position offset, within the multiplexed video stream information, of the first of the N sub-video streams;
  • the video stream multiplexing type of the N multiplexed sub-video streams.
  • the multiplexing description information further includes:
  • the N multiplexed sub-video streams respectively correspond to spatial location information in the VR content component.
  • the multiplexing description information further includes:
  • the N sub-video streams respectively correspond to spatial location information in the VR content component.
  • the target request further includes at least one of a region of interest ROI information, bandwidth information of the client, decoding standard information supported by the client, and maximum video resolution information of the client. item.
  • the preset multiplexing process includes a video stream binary end-to-end splicing process or a video segmentation binary end-to-end splicing or sample interleaving multiplexing process.
  • modules in the client 10 may be corresponding to the specific implementation manners in the foregoing method embodiments in FIG. 5 to FIG. 10, and details are not described herein again.
  • In the embodiments of the present invention, according to the view position information in the request information of the client, the server multiplexes and encapsulates the video streams related to that view position information and transmits them to the client, where a video stream related to the view position information is a video stream in the video content whose content partially or completely overlaps the view range requested by the client. That is, the server responds to the client's request by performing preset multiplexing processing on the video streams that respond to the request, which reduces both the number of client requests and the number of server responses, ensures that the video stream information of the various views at the same time instant arrives simultaneously, and reduces the time spent waiting for all the video streams to be received separately, thereby reducing the presentation delay of the view.
  • FIG. 12 is a schematic structural diagram of a server according to an embodiment of the present invention.
  • the server 20 includes: a receiving module 201, a parsing module 202, an obtaining module 203, and a feedback module 204.
  • the receiving module 201 is configured to receive a target request sent by the client, where the target request includes the target spatial location information corresponding to the target spatial object that the client needs to present in the virtual reality VR content component;
  • the parsing module 202 is configured to be in the VR content component according to the target spatial location information Find the corresponding target space object;
  • the obtaining module 203 is configured to obtain multiplexed video stream information obtained by performing preset multiplexing processing on the original video stream corresponding to the target spatial object.
  • the feedback module 204 is configured to send, to the client, target request feedback in response to the target request, where the target request feedback includes the multiplexed video stream information.
  • the obtaining module 203 includes:
  • a dividing unit configured to divide the target spatial object into N subspace objects, and divide the original video stream according to the N subspace objects to generate corresponding N sub video streams, where N is a natural number greater than 1.
  • an obtaining unit configured to acquire information of the N multiplexed sub-video streams obtained by performing the preset multiplexing process on the N sub-video streams.
  • the target request feedback further includes multiplexing description information, where the multiplexing description information includes at least one of the following items:
  • the number N of the sub-video streams included in the multiplexed video stream information;
  • the starting-position offset, within the multiplexed video stream information, of the first of the N sub-video streams;
  • the video stream multiplexing type of the N multiplexed sub-video streams.
  • the multiplexing description information further includes:
  • the N multiplexed sub-video streams respectively correspond to spatial location information in the VR content component.
  • the multiplexing description information further includes:
  • the N sub-video streams respectively correspond to spatial location information in the VR content component.
  • the preset multiplexing process includes a video stream binary end-to-end splicing process or a video segmentation binary end-to-end splicing or sample interleaving multiplexing process.
  • modules in the server 20 may be corresponding to the specific implementation manners in the foregoing method embodiments in FIG. 5 to FIG. 10, and details are not described herein again.
  • In the embodiments of the present invention, according to the view position information in the request information of the client, the server multiplexes and encapsulates the video streams related to that view position information and transmits them to the client, where a video stream related to the view position information is a video stream in the video content whose content partially or completely overlaps the view range requested by the client. That is, the server responds to the client's request by performing preset multiplexing processing on the video streams that respond to the request, which reduces both the number of client requests and the number of server responses, ensures that the video stream information of the various views at the same time instant arrives simultaneously, and reduces the time spent waiting for all the video streams to be received separately, thereby reducing the presentation delay of the view.
  • FIG. 13 is a schematic structural diagram of another client according to an embodiment of the present invention.
  • the client 30 includes a processor 301, a memory 302, and a transceiver 303.
  • the processor 301, the memory 302 and the transceiver 303 can be connected by a bus or other means.
  • the client 30 may further include a network interface 304 and a power module 305.
  • the processor 301 can be a digital signal processing (DSP) chip.
  • the memory 302 is used to store instructions.
  • The memory 302 may be a read-only memory (ROM) or a random access memory (RAM).
  • the memory 302 is configured to store a video stream transmission program code.
  • the transceiver 303 is used to transmit and receive signals.
  • Network interface 304 is used by client 30 to communicate data with other devices.
  • the network interface 304 can be a wired interface or a wireless interface.
  • the power module 305 is used to power the various modules of the client 30.
  • the processor 301 is configured to call an instruction stored in the memory 302 to perform the following operations:
  • sending, by using the transceiver 303, a target request to the server, where the target request includes the target spatial location information corresponding, in the virtual reality VR content component, to the target spatial object that the client requests to present;
  • receiving, by using the transceiver 303, target request feedback in response to the target request, where the target request feedback includes the multiplexed video stream information obtained by performing preset multiplexing processing on the original video stream corresponding to the target spatial object;
  • The multiplexed video stream information includes information of N multiplexed sub-video streams obtained by subjecting N sub-video streams to the preset multiplexing processing, where the N sub-video streams are the corresponding sub-video streams generated by dividing the target spatial object into N subspace objects and dividing the original video stream according to the N subspace objects, and N is a natural number greater than 1;
  • the target request feedback further includes multiplexing description information, where the multiplexing description information includes at least one of the following items:
  • the number N of the sub-video streams included in the multiplexed video stream information;
  • the starting-position offset, within the multiplexed video stream information, of the first of the N sub-video streams;
  • the video stream multiplexing type of the N multiplexed sub-video streams.
  • the multiplexing description information further includes:
  • the N multiplexed sub-video streams respectively correspond to spatial location information in the VR content component.
  • the multiplexing description information further includes:
  • the N sub-video streams respectively correspond to spatial location information in the VR content component.
  • the target request further includes at least one of a region of interest ROI information, bandwidth information of the client, decoding standard information supported by the client, and video maximum resolution information of the client.
  • the preset multiplexing process includes a video stream binary end-to-end splicing process or a video segmentation binary end-to-end splicing or sample interleaving multiplexing process.
  • In the embodiments of the present invention, according to the view position information in the request information of the client, the server multiplexes and encapsulates the video streams related to that view position information and transmits them to the client, where a video stream related to the view position information is a video stream in the video content whose content partially or completely overlaps the view range requested by the client. That is, the server responds to the client's request by performing preset multiplexing processing on the video streams that respond to the request, which reduces both the number of client requests and the number of server responses, ensures that the video stream information of the various views at the same time instant arrives simultaneously, and reduces the time spent waiting for all the video streams to be received separately, thereby reducing the presentation delay of the view.
  • FIG. 14 is a schematic structural diagram of another server according to an embodiment of the present invention.
  • the server 40 includes a processor 401, a memory 402, and a transceiver 403.
  • the processor 401, the memory 402, and the transceiver 403 may be connected by a bus or other means.
  • the server 40 may further include a network interface 404 and a power module 405.
  • the processor 401 can be a digital signal processing (DSP) chip.
  • the memory 402 is used to store instructions.
  • The memory 402 may be a read-only memory (ROM) or a random access memory (RAM).
  • the memory 402 is configured to store a video stream transmission program code.
  • the transceiver 403 is for transmitting and receiving signals.
  • Network interface 404 is used by server 40 for data communication with other devices.
  • the network interface 404 can be a wired interface or a wireless interface.
  • the power module 405 is used to power various modules of the server 40.
  • the processor 401 is configured to call an instruction stored in the memory 402 to perform the following operations:
  • receiving, by using the transceiver 403, a target request sent by the client, where the target request includes the target spatial location information corresponding, in the virtual reality VR content component, to the target spatial object that the client needs to present;
  • sending, by using the transceiver 403, target request feedback in response to the target request to the client, where the target request feedback includes the multiplexed video stream information.
  • The processor 401 is specifically configured to obtain the multiplexed video stream information obtained by performing the preset multiplexing processing on the original video stream corresponding to the target spatial object.
  • the target request feedback further includes multiplexing description information, where the multiplexing description information includes at least one of the following items:
  • the number N of the sub-video streams included in the multiplexed video stream information;
  • the starting-position offset, within the multiplexed video stream information, of the first of the N sub-video streams;
  • the multiplexing description information further includes:
  • the N multiplexed sub-video streams respectively correspond to spatial location information in the VR content component.
  • the multiplexing description information further includes:
  • the N sub-video streams respectively correspond to spatial location information in the VR content component.
  • the preset multiplexing process includes a video stream binary end-to-end splicing process or a video segmentation binary end-to-end splicing or sample interleaving multiplexing process.
  • In the embodiments of the present invention, according to the view position information in the request information of the client, the server multiplexes and encapsulates the video streams related to that view position information and transmits them to the client, where a video stream related to the view position information is a video stream in the video content whose content partially or completely overlaps the view range requested by the client. That is, the server responds to the client's request by performing preset multiplexing processing on the video streams that respond to the request, which reduces both the number of client requests and the number of server responses, ensures that the video stream information of the various views at the same time instant arrives simultaneously, and reduces the time spent waiting for all the video streams to be received separately, thereby reducing the presentation delay of the view.
  • FIG. 15 is a schematic structural diagram of a video stream transmission system according to an embodiment of the present invention.
  • the video stream transmission system 50 includes a VR client 501 and a VR server 502, where
  • The VR client 501 may be the client 30 in the foregoing embodiment of FIG. 13, and the VR server 502 may be the server 40 in the foregoing embodiment of FIG. 14. It can be understood that the video stream transmission system 50 in this embodiment of the present invention may further include devices such as a photographing device, a storage device, a routing device, a switching device, and a core network server.
  • In the embodiments of the present invention, according to the view position information in the request information of the client, the server multiplexes and encapsulates the video streams related to that view position information and transmits them to the client, where a video stream related to the view position information is a video stream in the video content whose content partially or completely overlaps the view range requested by the client. That is, the server responds to the client's request by performing preset multiplexing processing on the video streams that respond to the request, which reduces both the number of client requests and the number of server responses, ensures that the video stream information of the various views at the same time instant arrives simultaneously, and reduces the time spent waiting for all the video streams to be received separately, thereby reducing the presentation delay of the view.
  • An embodiment of the present invention further provides a computer storage medium, where the computer storage medium may store a program, and when the program is executed, some or all of the steps of any one of the video stream transmission methods described in the foregoing method embodiments are performed.
  • the disclosed apparatus may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • The division of the above units is only a logical function division; in actual implementation there may be other division manners. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • The mutual coupling, direct coupling, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be in electrical or other forms.
  • the units described above as separate components may or may not be physically separated.
  • The components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the above-described integrated unit if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium.
  • The technical solutions of the present invention essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product, which is stored in a storage medium.
  • The software product includes a number of instructions for causing a computer device (which may be a personal computer, a server, or a network device, and in particular a processor in a computer device) to perform all or some of the steps of the methods in the embodiments of the present invention.
  • The foregoing storage medium may include any medium that can store program code, such as a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a read-only memory (ROM), or a random access memory (RAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

A video stream transmission method, related device, and system, where the method may include: a client sends a target request to a server, the target request including the target spatial position information corresponding, in a virtual reality (VR) content component, to the target spatial object that the client requests to present; the client receives target request feedback sent by the server in response to the target request, the target request feedback including multiplexed video stream information obtained by performing preset multiplexing processing on the original video stream corresponding to the target spatial object; and the client parses and presents video according to the multiplexed video stream information. Video stream transmission efficiency can be effectively improved.

Description

Video Stream Transmission Method, Related Device, and System
Technical Field
The present invention relates to the field of video technologies, and in particular, to a video stream transmission method, related device, and system.
Background
Virtual reality (VR) technology is a computer simulation system that can create and let users experience a virtual world. It uses a computer to generate a simulated environment into which the user is immersed. At present, VR technology is widely applicable to many fields such as urban planning, interior design, industrial simulation, restoration of historic sites, bridge and road design, real-estate sales, tourism teaching, and education and training.
In the prior art, applying VR technology to existing video technology enables 360-degree panoramic video applications that exceed the normal visual range of the human eye. Such video applications bring a brand-new viewing mode and visual experience, but also bring technical challenges. That is, a user can watch VR video content through 360 degrees, for example in a virtual reality live or recorded broadcast system. However, because the video stream of a VR video is large and the request and feedback process between the client and the server is relatively complex, watching VR video through a client may consume a large amount of transmission bandwidth. Therefore, when VR video content is prepared, the video content is divided into multiple spatial objects, and the spatial objects corresponding to the user's view are sent to the client for presentation while the user is watching, which reduces the amount of transmitted data. However, this introduces a new problem: the client's view may correspond to multiple spatial objects at the same time, so the client has to obtain the code streams of the multiple spatial objects simultaneously, and because the decoded code streams of the multiple spatial objects are presented synchronously, the client must wait until the code streams of all the spatial objects have been received before presentation. This increases the delay with which the client presents a new view, degrading user experience.
Summary
The technical problem to be solved by the embodiments of the present invention is to provide a video stream transmission method, related device, and system, which solve the problem of a large presentation delay in the prior-art VR video experience.
I. Introduction to MPEG-DASH technology
In November 2011, the MPEG organization approved the DASH standard, which is a technical specification for transmitting media streams based on the HTTP protocol (hereinafter referred to as the DASH technical specification). The DASH technical specification mainly consists of two parts: the media presentation description (MPD) and the media file format.
1. Media file format
In DASH, the server prepares multiple versions of the code stream for the same video content, and each version of the code stream is called a representation in the DASH standard. A representation is a collection and encapsulation of one or more code streams in a transport format, and one representation contains one or more segments. Different versions of the code stream may differ in encoding parameters such as bit rate and resolution, and each code stream is split into multiple small files, each of which is called a segment. The client may switch between different media representations while requesting media segment data, as shown in FIG. 1, which is a schematic diagram of code-stream segment switching according to an embodiment of the present invention. The server prepares three different versions of code-stream data for one movie and describes them in the MPD using three representations (hereinafter rep): rep1, rep2, and rep3, where rep1 is high-definition video with a bit rate of 4 Mbps (megabits per second), rep2 is standard-definition video with a bit rate of 2 Mbps, and rep3 is standard-definition video with a bit rate of 1 Mbps. The segments marked as shaded in FIG. 1 are the segment data that the client requests to play: the first three segments requested by the client are segments of the media representation rep3, then the client switches to rep2 and requests the fourth segment, and afterwards switches to rep1 and requests the fifth segment, the sixth segment, and so on. The segments of each representation may be stored end to end in one file or stored independently as separate small files. A segment may be encapsulated in the format of the standard ISO/IEC 14496-12 (ISO BMFF, Base Media File Format) or in the format of ISO/IEC 13818-1 (MPEG-2 TS).
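The switching behavior in FIG. 1 (pick the highest-bit-rate representation that the measured bandwidth can sustain) can be sketched as a minimal rate-adaptation rule. The representation list mirrors rep1 to rep3 above; the safety margin and field names are assumptions for illustration:

```python
def choose_representation(representations, measured_bps, margin=0.8):
    """Pick the highest-bandwidth representation whose bit rate fits
    within measured_bps * margin; fall back to the lowest one."""
    affordable = [r for r in representations
                  if r["bandwidth"] <= measured_bps * margin]
    pool = affordable or [min(representations, key=lambda r: r["bandwidth"])]
    return max(pool, key=lambda r: r["bandwidth"])

# rep1/rep2/rep3 from the example: 4, 2, and 1 Mbps
reps = [{"id": "rep1", "bandwidth": 4_000_000},
        {"id": "rep2", "bandwidth": 2_000_000},
        {"id": "rep3", "bandwidth": 1_000_000}]
picked = choose_representation(reps, measured_bps=3_000_000)
```

Real DASH players use considerably more elaborate heuristics (buffer level, throughput history), but the segment-by-segment switch shown in FIG. 1 follows this basic shape.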
2. Media presentation description
In the DASH standard, the media presentation description is called the MPD. The MPD may be an XML file in which information is described hierarchically, as shown in FIG. 2, which is a hierarchical structure diagram of an MPD file according to an embodiment of the present invention; information at an upper level is fully inherited by the next level. The file describes media metadata that lets the client learn the media content information on the server and use this information to construct the HTTP URL for requesting a segment.
In the DASH standard, a media presentation is a collection of structured data that presents media content; a media presentation description is a normative file describing a media presentation, used to provide a streaming service; a period is one of a set of consecutive periods that make up the whole media presentation, where periods are continuous and non-overlapping; a representation is a structured data set encapsulating one or more media content components (individually encoded media types, such as audio or video) with descriptive metadata, i.e. a collection and encapsulation of one or more code streams in a transport format, where one representation contains one or more segments; an adaptation set (AdaptationSet) is a set of mutually interchangeable encoded versions of the same media content component, where one adaptation set contains one or more representations; a subset is a combination of a group of adaptation sets such that, when the player plays all the adaptation sets in it, the corresponding media content can be obtained; segment information is the media unit referenced by an HTTP uniform resource locator in the media presentation description. Segment information describes segments of media data, and the segments of media data may be stored in one file or stored separately; in one possible manner, the MPD stores the segments of the media data.
For the related technical concepts of the MPEG-DASH technology in the present invention, reference may be made to the relevant provisions in ISO/IEC 23009-1:2014 Information technology--Dynamic adaptive streaming over HTTP (DASH)--Part 1: Media presentation description and segment formats, or to the relevant provisions in historical versions of the standard, such as ISO/IEC 23009-1:2013 or ISO/IEC 23009-1:2012.
II. Introduction to virtual reality (VR) technology
Virtual reality technology is a computer simulation system that can create and let users experience a virtual world. It uses a computer to generate a simulated environment; it is an interactive, three-dimensional dynamic scene and entity-behavior system simulation based on multi-source information fusion, immersing the user in the environment. VR mainly involves the simulated environment, perception, natural skills, and sensing devices. The simulated environment consists of computer-generated, real-time dynamic, three-dimensional lifelike images. Perception means that an ideal VR system should have every kind of perception that a person has: in addition to the visual perception generated by computer graphics technology, there are perceptions such as hearing, touch, force, and motion, and even smell and taste, which is also called multi-perception. Natural skills refer to head rotation, eye and hand gestures, or other human actions; the computer processes data matching the participant's actions, responds to the user's input in real time, and feeds the responses back to the user's senses. Sensing devices are three-dimensional interaction devices. When a VR video (or 360-degree video, or omnidirectional video) is presented on a head-mounted device or a handheld device, only the video image corresponding to the orientation of the user's head is presented, together with the associated audio.
The difference between VR video and normal video is that for normal video the entire video content is presented to the user, whereas for VR video only a subset of the entire video is presented to the user (in VR typically only a subset of the entire video region represented by the video pictures).
III. Spatial description in the existing DASH standard:
In the existing standard, the original description of spatial information is: "The SRD scheme allows Media Presentation authors to express spatial relationships between Spatial Objects. A Spatial Object is defined as a spatial part of a content component (e.g. a region of interest, or a tile) and represented by either an Adaptation Set or a Sub-Representation."
[Translation]: The MPD describes the spatial relationships between spatial objects. A spatial object is defined as a part of the space of a content component, for example an existing region of interest (ROI) or a tile; spatial relationships can be described in an Adaptation Set or a Sub-Representation. The existing DASH standard defines some descriptor elements in the MPD, and each descriptor element has two attributes, schemeIdURI and value, where schemeIdURI describes what the current descriptor is and value is the parameter value of the descriptor. The existing standard has two existing descriptors, SupplementalProperty and EssentialProperty (the supplemental property descriptor and the essential property descriptor). In the existing standard, if the schemeIdURI of either of these two descriptors is "urn:mpeg:dash:srd:2014" (or schemeIdURI=urn:mpeg:dash:VR:2017), the descriptor describes the spatial information associated to the containing Spatial Object, and the corresponding value lists a series of SRD parameter values. The specific syntax of value is shown in Table 1:
Table 1
[Table 1 appears as images PCTCN2016101920-appb-000001 and PCTCN2016101920-appb-000002 in the original publication.]
As shown in FIG. 3, FIG. 3 is a schematic diagram of the spatial relationships of spatial objects according to an embodiment of the present invention. The image AS may be set as one content component, and AS1, AS2, AS3, and AS4 are four spatial objects contained in AS; each spatial object is associated with a space. The MPD describes the spatial relationships of the spatial objects, for example the relationships between the spaces associated with the spatial objects.
An MPD example is as follows:
<?xml version="1.0"encoding="UTF-8"?>
<MPD
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="urn:mpeg:dash:schema:mpd:2011"
xsi:schemaLocation="urn:mpeg:dash:schema:mpd:2011DASH-MPD.xsd"
[...]>
<Period>
<AdaptationSet[...]>
<SupplementalProperty schemeIdUri="urn:mpeg:dash:srd:2014"
value="1,0,0,1920,1080,1920,1080,1"/><!--Video source identifier: 1; top-left coordinates of the spatial object: (0,0); width and height of the spatial object: (1920,1080); reference space of the spatial object: (1920,1080); spatial object group ID: 1. Here the width and height of the spatial object equal the reference space, so the representation in Representation1 (id=1) corresponds to the entire video content-->
Figure PCTCN2016101920-appb-000003
value="1,0,0,1920,1080,3840,2160,2"/><!--Video source identifier: 1 (the same content source as the video source above); top-left coordinates of the spatial object: (0,0); width and height of the spatial object: (1920,1080); reference space of the spatial object: (3840,2160); spatial object group ID: 2. Here the width and height of the spatial object are one quarter of the reference space, and the coordinates show it is the top-left spatial object, i.e. AS1; Representation2 expresses the content of AS1. Similarly, the other spatial objects are described by the corresponding descriptors below; spatial objects with the same spatial object group ID belong to the same video content-->
Figure PCTCN2016101920-appb-000004
Figure PCTCN2016101920-appb-000005
The top-left coordinates of the spatial object, the width and height of the spatial object, and the reference space of the spatial object may also be relative values; for example, the above value="1,0,0,1920,1080,3840,2160,2" may be written as value="1,0,0,1,1,2,2,2".
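The SRD value strings above can be split into named fields. A minimal sketch, assuming the 8-field form shown in the example; the field names follow the general scheme of Table 1 but are chosen here for illustration:

```python
def parse_srd_value(value):
    """Split an SRD descriptor value such as
    "1,0,0,1920,1080,3840,2160,2" into named integer fields."""
    names = ["source_id", "object_x", "object_y", "object_width",
             "object_height", "total_width", "total_height",
             "spatial_set_id"]
    fields = [int(v) for v in value.split(",")]
    return dict(zip(names, fields))

srd = parse_srd_value("1,0,0,1920,1080,3840,2160,2")
```

With relative values such as "1,0,0,1,1,2,2,2", the same parse yields grid coordinates instead of pixel coordinates.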
FIG. 16 describes a method in which a server multiplexes the spatial-object code streams corresponding to the client's field of view (FOV). The client initiates an ROI request to the server, and the server multiplexes the segments of the spatial objects corresponding to the ROI region and sends them to the client. The method may be applied to client-server interaction based on the MPEG-DASH technology.
According to a first aspect, an embodiment of the present invention provides a video stream transmission method, which may include:
A client sends a target request to a server, the target request including the target spatial position information corresponding, in a virtual reality (VR) content component, to the target spatial object that the client requests to present; the client receives target request feedback sent by the server in response to the target request, the target request feedback including multiplexed video stream information obtained by performing preset multiplexing processing on the original video stream corresponding to the target spatial object; and the client parses and presents video according to the multiplexed video stream information. That is, the server responds to the client's request by performing preset multiplexing processing on the video streams that respond to the request, which reduces both the number of client requests and the number of server responses, ensures that the video stream information of the various views at the same time instant arrives simultaneously, and reduces the time spent waiting for all the video streams to be received separately, thereby reducing the presentation delay of the view.
With reference to the first aspect, in a first possible implementation, the multiplexed video stream information includes information of N multiplexed sub-video streams obtained by subjecting N sub-video streams to the preset multiplexing processing, where the N sub-video streams are generated by dividing the target spatial object into N subspace objects and encoding the N subspace objects to produce corresponding sub-video streams, N being a natural number greater than 1. The target request feedback further includes multiplexing description information, and the multiplexing description information includes at least one of the following items: the number N of sub-video streams included in the multiplexed video stream information; the starting-position offset, within the multiplexed video stream information, of the first of the N sub-video streams; the data amount information of the N multiplexed sub-video streams; the spatial position information corresponding to each of the N multiplexed sub-video streams in the VR content component; the resolution information of the N multiplexed sub-video streams; and the video stream multiplexing type of the N multiplexed sub-video streams. By carrying the multiplexing description information in the target request feedback, the client can parse and present the multiplexed video stream according to the content of the multiplexing description information.
With reference to the first possible implementation of the first aspect, in a second possible implementation, the multiplexing description information further includes: the spatial position information corresponding to each of the N multiplexed sub-video streams in the VR content component. The client performs the final presentation of the parsed video streams according to the spatial position information corresponding to each of the N multiplexed sub-video streams in the VR content component.
With reference to the first possible implementation of the first aspect, in a second possible implementation, the target request includes at least one of the following: an identifier of a media representation, view information of the user of the client, or spatial information of a media representation. The server obtains N sub-video streams for multiplexing according to the target request.
With reference to the first possible implementation of the first aspect, or with reference to the second possible implementation of the first aspect, in a third possible implementation, the multiplexing description information further includes: the spatial position information corresponding to each of the N sub-video streams in the VR content component. Thus, according to the spatial position information of the video streams, the client can know at any time which views' sub-video streams have already been requested, so that when repeated view content needs to be watched later, no repeated request is needed, improving VR video transmission efficiency and user experience.
With reference to the first aspect, or with reference to the first, second, or third possible implementation of the first aspect, in a fourth possible implementation, the target request further includes at least one of region-of-interest (ROI) information, bandwidth information of the client, decoding standard information supported by the client, and maximum video resolution information of the client. When initiating a video request, the client may also carry parameters related to its own video playback conditions or playback performance, so that the server can process and feed back the video streams in a more suitable manner.
With reference to the first aspect, or with reference to the first, second, third, or fourth possible implementation of the first aspect, in a fifth possible implementation, the preset multiplexing processing includes binary end-to-end splicing of video streams, binary end-to-end splicing of video segments, or sample-interleaved multiplexing. Multiple preset multiplexing modes may be included to meet the different processing requirements of different VR videos.
第二方面,本发明实施例提供了一种视频流传输方法,可包括:
服务器接收客户端发送的目标请求,所述目标请求包括所述客户端请求需要呈现的目标空间对象在虚拟现实VR内容成分中对应的目标空间位置信息;所述服务器根据所述目标空间位置信息在所述VR内容成分中查找对应的目标空间对象;所述服务器获取将所述目标空间对象对应的视频流进行了预设复用处理得到的复用视频流信息;所述服务器向所述客户端发送响应所述目标请求的目标请求反馈,所述目标请求反馈包括所述复用视频流信息。本发明实施例,通过服务器根据客户端的请求信息中的视角位置信息,将该视角位置信息所涉及到的视频流进行复用封装,传输到客户端,该视角位置信息所涉及到的视频流是指视频内容中存在和客户端所请求的视角范围的内容有部分或者全部重叠的视频流。即服务器通过将请求响应的视频流进行预设复用处理来响应客户端的请求,既减少了客户端的请求个数,也减少了服务器的响应个数,而且保证了同时刻的各个视角的视频流信息的同时到达,减少了等待所有视频流都分别接收到的时间,从而减少视角的呈现时延。
结合第二方面,在第一种可能的实现方式中,所述服务器获取将所述目标空间对象对应的原始视频流进行了预设复用处理得到的复用视频流信息,包括:
所述服务器将所述目标空间对象划分为N个子空间对象,并将所述N个子空间对象进行编码生成对应的N个子视频流,所述N为大于1的自然数;所述服务器获取所述N个子视频流分别进行了所述预设复用处理得到的N个被复用的子视频流的信息。
结合第二方面的第一种可能的实现方式,在第二种可能的实现方式中,所述目标请求反馈还包括复用描述信息,所述复用描述信息包括以下项中的至少一项:所述复用视频流信息中包含的所述子视频流的个数N;所述N个子视频流中的起始子视频流在所述复用视频流信息中的起始位置偏移;所述N个被复用的子视频流的数据量;所述N个被复用的子视频流分别在所述VR内容成分中所对应的空间位置信息;所述N个被复用的子视频流的分辨率信息;所述N个被复用的子视频流的视频流复用类型。
结合第二方面的第二种可能的实现方式,在第三种可能的实现方式中,所述复用描述信息还包括:所述N个被复用的子视频流分别在所述VR内容成分中对应的空间位置信息。
结合第二方面的第二种可能的实现方式,或者,结合第二方面的第三种可能的实现方式,在第四种可能的实现方式中,所述复用描述信息还包括:所述N个子视频流分别在所述VR内容成分中对应的空间位置信息。
结合第二方面,或者结合第二方面的第一种可能的实现方式,或者,结合第二方面的第二种可能的实现方式,或者,结合第二方面的第三种可能的实现方式,或者,结合第二方面的第四种可能的实现方式,在第五种可能的实现方式中,所述预设复用处理包括视频流二进制首尾拼接处理,或视频分段二进制首尾拼接处理或交织复用处理。
第三方面,本发明实施例提供了一种客户端,可包括:
请求模块,用于向服务器发送目标请求,所述目标请求包括如下的至少一种:所述客户端请求需要呈现的目标空间对象在虚拟现实VR内容成分中对应的目标空间位置信息,媒体表示的标识,客户端的用户的视角信息或媒体表示的空间信息;
接收模块,用于接收服务器响应所述目标请求的目标请求反馈,所述目标请求反馈包括将所述目标空间对象对应的原始视频流进行了预设复用处理得到的复用视频流信息;
处理模块,用于根据所述复用视频流信息进行视频解析呈现。
结合第三方面,在第一种可能的实现方式中,所述复用视频流信息包括N个子视频流分别经过所述预设复用处理得到的N个被复用的子视频流的信息,其中,所述N个子视频流是将所述目标空间对象划分为N个子空间对象,并将所述N个子空间对象进行编码生成的对应的子视频流,所述N为大于1的自然数;所述目标请求反馈还包括复用描述信息,所述复用描述信息包括以下项中的至少一项:
所述复用视频流信息中包含的所述子视频流的个数N;
所述N个子视频流中的起始子视频流在所述复用视频流信息中的起始位置偏移;
所述N个被复用的子视频流的数据量;
所述N个被复用的子视频流分别在所述VR内容成分中所对应的空间位置信息;
所述N个被复用的子视频流的分辨率信息;
所述N个被复用的子视频流的视频流复用类型。
结合第三方面的第一种可能的实现方式,在第二种可能的实现方式中,所述复用描述信息还包括:所述N个被复用的子视频流分别在所述VR内容成分中对应的空间位置信息。
结合第三方面的第一种可能的实现方式,或者,结合第三方面的第二种可能的实现方式,在第三种可能的实现方式中,所述复用描述信息还包括:所述N个子视频流分别在所述VR内容成分中对应的空间位置信息。
结合第三方面,或者结合第三方面的第一种可能的实现方式,或者,结合第三方面的第二种可能的实现方式,或者,结合第三方面的第三种可能的实现方式,在第四种可能的实现方式中,所述目标请求还包括感兴趣区域ROI信息、所述客户端的带宽信息、所述客户端支持的解码标准信息和所述客户端的视频最大分辨率信息中的至少一项。
结合第三方面,或者结合第三方面的第一种可能的实现方式,或者,结合第三方面的第二种可能的实现方式,或者,结合第三方面的第三种可能的实现方式,或者,结合第三方面的第四种可能的实现方式,在第五种可能的实现方式中,所述预设复用处理包括视频流二进制首尾拼接处理或视频分段二进制首尾拼接或样本交织复用处理。
第四方面,本发明实施例提供了一种服务器,可包括:
接收模块,用于接收客户端发送的目标请求,所述目标请求包括如下的至少一种:所述客户端请求需要呈现的目标空间对象在虚拟现实VR内容成分中对应的目标空间位置信息,媒体表示的标识,客户端的用户的视角信息或媒体表示的空间信息;
解析模块,用于根据所述目标空间位置信息在所述VR内容成分中查找对应的目标空间对象;
获取模块,用于获取将所述目标空间对象对应的原始视频流进行了预设复用处理得到的复用视频流信息;
反馈模块,用于向所述客户端发送响应所述目标请求的目标请求反馈,所述目标请求反馈包括所述复用视频流信息。
结合第四方面,在第一种可能的实现方式中,所述获取模块,包括:
划分单元,用于将所述目标空间对象划分为N个子空间对象,并将所述N个子空间对象进行编码生成对应的N个子视频流,所述N为大于1的自然数;
获取单元,用于获取所述N个子视频流分别进行了所述预设复用处理得到的N个被复用的子视频流的信息。
结合第四方面的第一种可能的实现方式,在第二种可能的实现方式中,所述目标请求反馈还包括复用描述信息,所述复用描述信息包括以下项中的至少一项:
所述复用视频流信息中包含的所述子视频流的个数N;
所述N个子视频流中的起始子视频流在所述复用视频流信息中的起始位置偏移;
所述N个被复用的子视频流的大小;
所述N个被复用的子视频流分别在所述VR内容成分中所对应的空间位置信息;
所述N个被复用的子视频流的分辨率信息;
所述N个被复用的子视频流的视频流复用类型。
结合第四方面的第二种可能的实现方式,在第三种可能的实现方式中,所述复用描述信息还包括:
所述N个被复用的子视频流分别在所述VR内容成分中对应的空间位置信息。
结合第四方面的第二种可能的实现方式,或者,结合第四方面的第三种可能的实现方式,在第四种可能的实现方式中,所述复用描述信息还包括:所述N个子视频流分别在所述VR内容成分中对应的空间位置信息。
结合第四方面,或者结合第四方面的第一种可能的实现方式,或者,结合第四方面的第二种可能的实现方式,或者,结合第四方面的第三种可能的实现方式,或者,结合第四方面的第四种可能的实现方式,在第五种可能的实现方式中,所述预设复用处理包括视频流二进制首尾拼接处理或视频分段二进制首尾拼接或样本交织复用处理。
第五方面,本发明实施例提供了一种客户端,可包括处理器、存储器和收发器,其中,存储器用于存储指令,处理器用于调用存储器中存储的指令来执行如本发明实施例第一方面任一方法中所描述的部分或全部步骤。
第六方面,本发明实施例提供了一种服务器,可包括处理器、存储器和收发器,其中,存储器用于存储指令,处理器用于调用存储器中存储的指令来执行如本发明实施例第二方面任一方法中所描述的部分或全部步骤。
本发明第七方面的实施例提供了一种基于流媒体技术的视频数据的处理方法,所述方法包括:
服务器接收客户端发送的视频数据获取请求,所述获取请求包括空间对象的信息;
所述服务器根据所述空间对象的信息确定至少两个媒体表示对应的视频数据;
所述服务器将所述至少两个媒体表示对应的视频数据封装成一个码流;
所述服务器将所述码流向所述客户端发送。
在一种可能的实现方式中,所述空间对象的信息包括如下的至少一种:
媒体表示的标识,客户端的用户的视角信息或媒体表示的空间信息。
在一种可能的实现方式中,所述码流包括如下的至少一种信息:
媒体表示的数量;
媒体表示在所述码流中的起始位置偏移;
媒体表示的数据量信息;
媒体表示对应的空间位置信息;
媒体表示的视频流复用类型;
或者
媒体表示的分辨率信息。
在一种可能的实现方式中,所述码流包括封装标识;所述标识用于指示码流采用的是分段交织的封装方式或者码流采用的是样本交织的封装方式。
本发明第八方面的实施例提供了一种基于流媒体技术的视频数据的处理方法,所述方法包括:
客户端向服务器发送视频数据获取请求,所述获取请求包括空间对象的信息;
所述客户端接收所述服务器响应所述视频数据获取请求后发送的码流,其中,所述码流包括至少两个媒体表示的数据。
在一种可能的实现方式中,所述空间对象的信息包括如下的至少一种:
媒体表示的标识,客户端的用户的视角信息或媒体表示的空间信息。
在一种可能的实现方式中,所述码流包括如下的至少一种信息:
媒体表示的数量;
媒体表示在所述码流中的起始位置偏移;
媒体表示的数据量信息;
媒体表示对应的空间位置信息;
媒体表示的视频流复用类型;
或者
媒体表示的分辨率信息。
在一种可能的实现方式中,所述码流包括封装标识;所述标识用于指示码流采用的是分段交织的封装方式或者码流采用的是样本交织的封装方式。
本发明第九方面的实施例提供了一种基于流媒体技术的服务器,所述服务器包括:
接收器,用于接收客户端发送的视频数据获取请求,所述获取请求包括空间对象的信息;
处理器,用于根据所述空间对象的信息确定至少两个媒体表示对应的视频数据;
所述处理器还用于将所述至少两个媒体表示对应的视频数据封装成一个码流;
发送器,用于将所述码流向所述客户端发送。
在一种可能的实现方式中,所述空间对象的信息包括如下的至少一种:
媒体表示的标识,客户端的用户的视角信息或媒体表示的空间信息。
在一种可能的实现方式中,所述码流包括如下的至少一种信息:
媒体表示的数量;
媒体表示在所述码流中的起始位置偏移;
媒体表示的数据量信息;
媒体表示对应的空间位置信息;
媒体表示的视频流复用类型;
或者
媒体表示的分辨率信息。
在一种可能的实现方式中,所述码流包括封装标识;所述标识用于指示码流采用的是分段交织的封装方式或者码流采用的是样本交织的封装方式。
本发明第十方面的实施例提供了一种基于流媒体技术的客户端,所述客户端包括:
发送器,用于向服务器发送视频数据获取请求,所述获取请求包括空间对象的信息;
接收器,用于接收所述服务器响应所述视频数据获取请求后发送的码流,其中,所述码流包括至少两个媒体表示的数据。
在一种可能的实现方式中,所述空间对象的信息包括如下的至少一种:
媒体表示的标识,客户端的用户的视角信息或媒体表示的空间信息。
在一种可能的实现方式中,所述码流包括如下的至少一种信息:
媒体表示的数量;
媒体表示在所述码流中的起始位置偏移;
媒体表示的数据量信息;
媒体表示对应的空间位置信息;
媒体表示的视频流复用类型;
或者
媒体表示的分辨率信息。
在一种可能的实现方式中,所述码流包括封装标识;所述标识用于指示码流采用的是分段交织的封装方式或者码流采用的是样本交织的封装方式。
实施本发明实施例,具有如下有益效果:
本发明实施例,通过服务器根据客户端的请求信息中的视角位置信息,将该视角位置信息所涉及到的视频流进行复用封装,传输到客户端,该视角位置信息所涉及到的视频流是指视频内容中存在和客户端所请求的视角范围的内容有部分或者全部重叠的视频流。即服务器通过将请求响应的视频流进行预设复用处理来响应客户端的请求,既减少了客户端的请求个数,也减少了服务器的响应个数,而且保证了同时刻的各个视角的视频流信息的同时到达,减少了等待所有视频流都分别接收到的时间,从而减少视角的呈现时延。
附图说明
为了更清楚地说明本发明实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本发明实施例提供的码流分段的切换的示意图;
图2为本发明实施例提供的MPD文件的分级式结构图;
图3是本发明实施例提供的空间对象的空间关系示意图;
图4是本发明实施例提供的视频流传输系统的网络架构示意图;
图5是本发明实施例提供的一种视频流传输方法的流程示意图;
图6是本发明实施例提供的360度视角变化图;
图7是本发明实施例提供的球面到经纬图的映射;
图8是本发明实施例提供的另一种视频流传输方法的流程示意图;
图9是本发明实施例提供的复用后的视频流示意图;
图10是本发明实施例提供的视频流中的样本交织复用示意图;
图11是本发明实施例提供的一种客户端的结构示意图;
图12是本发明实施例提供的一种服务器的结构示意图;
图13是本发明实施例提供的另一种客户端的结构示意图;
图14是本发明实施例提供的另一种服务器的结构示意图;
图15是本发明实施例提供的一种视频流传输系统的结构示意图。
图16是本发明实施例的分段复用的系统架构示意图。
具体实施方式
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。
本发明的说明书和权利要求书及所述附图中的术语“第一”、“第二”、“第三”和“第四”等是用于区别不同对象,而不是用于描述特定顺序。此外,术语“包括”和“具有”以及它们任何变形,意图在于覆盖不排他的包含。例如包含了一系列步骤或单元的过程、方法、系统、产品或设备没有限定于已列出的步骤或单元,而是可选地还包括没有列出的步骤或单元,或可选地还包括对于这些过程、方法、产品或设备固有的其它步骤或单元。
在本文中提及“实施例”意味着,结合实施例描述的特定特征、结构或特性可以包含在本发明的至少一个实施例中。在说明书中的各个位置出现该短语并不一定均是指相同的实施例,也不是与其它实施例互斥的独立的或备选的实施例。本领域技术人员显式地和隐式地理解的是,本文所描述的实施例可以与其它实施例相结合。
以下,对本申请中的部分用语进行解释说明,以便于本领域技术人员理解。
1)客户端,可以是以软件或APP的形式安装于终端设备上,也可以是以系统内部的固有功能组件的形式存在于终端设备(如带有VR视频观看的终端设备)上的客户端,即本发明中所提及的客户端是指已成功安装客户端的终端设备。而终端设备则包括但不限于可以进行VR视频观看体验的各种形式的用户设备(User Equipment,UE),如接入终端、终端设备、用户单元、用户站、移动站、移动台、远方站、远程终端、移动设备、用户终端、终端、无线通信设备、用户代理或用户装置、蜂窝电话、无绳电话、智能手机、平板电脑、会话启动协议(Session Initiation Protocol,SIP)电话、无线本地环路(Wireless Local Loop,WLL)站、智能手环、智能穿戴设备(如智能眼镜、智能头盔等)、MP3播放器(Moving Picture Experts Group Audio Layer III,动态影像专家压缩标准音频层面3)、MP4(Moving Picture Experts Group Audio Layer IV,动态影像专家压缩标准音频层面4)播放器、个人数字处理(Personal Digital Assistant,PDA)、具有无线通信功能的手持设备、计算设备或连接到无线调制解调器的其它处理设备、车载设备以及未来5G网络中的终端设备等。
2)、服务器,可以存储大量的VR视频文件、完成与客户端的请求交互,实现对VR视频的编码、解码以及复用等处理操作的云服务设备、终端设备、或核心网设备等。
3)“多个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。字符“/”一般表示前后关联对象是一种“或”的关系。
下面结合附图对本申请的实施例进行描述。
为了便于理解本发明实施例,下面先对本发明实施例所基于的视频流传输系统的网络架构进行描述。图4是本发明实施例提供的视频流传输系统的网络架构示意图,请参阅图4,该系统中包括客户端和服务器,其中,用户可以利用客户端通过有线网络或者是无线网络向服务器发起VR视频请求,服务器接收到该请求后则响应该VR视频请求向客户端反馈相应的VR视频内容,最终,客户端对反馈的VR视频内容进行解析,并为用户呈现VR视频效果,即用户通过客户端与服务器之间的视频流交互实现VR视频的体验。
在本发明各实施例中,客户端的功能包括但不限于:根据客户端当前的视角位置信息向服务器发送VR视频请求,该请求携带了客户端的视角信息、复用描述信息等。服务器的功能包括但不限于:管理VR视频的所有媒体流文件的描述信息,描述信息包括视频流的内容在VR视频中的空间位置信息;获取客户端的请求信息,并解析请求中所携带的视角信息;根据视角信息读取视角对应的视频流;将用户视角所涉及的视频流封装复用,封装复用后的文件中包含有各个视角的复用描述信息。可以理解的是,服务器也可以是内容分发网络(Content Delivery Network,CDN)中的一个逻辑模块。可以理解的是,以上网络架构只是本发明实施例中的其中一种实施方式,本发明实施例中的网络架构包括但不仅限于以上网络架构,只要能够实现本发明中的视频流传输方法的网络架构均属于本发明所保护和涵盖的范围。
参见图5,图5是本发明实施例提供的一种视频流传输方法的流程示意图。下面将结合附图5从客户端和服务器的交互侧对本发明实施例中的视频流传输方法进行详细描述。该方法可以包括以下步骤S501-步骤S505。
步骤S501:客户端向服务器发送目标请求,服务器接收客户端发送的目标请求,所述目标请求包括所述客户端请求需要呈现的目标空间对象在虚拟现实VR内容成分中对应的目标空间位置信息,媒体表示的标识,客户端的用户的视角信息或媒体表示的空间信息中的至少一种。
具体地,空间对象(Spatial Objects)是一个内容成分的一部分空间,即一个内容成分由多个空间对象组成,通俗来讲,当应用于VR视频中时,则可以理解为一个VR视频可以由多个视角对应的子视频所组成。媒体表示的标识是子视频流的标识,客户端的用户的视角信息是空间对象信息,即在本发明实施例中,VR内容成分可以是VR视频,目标空间对象可以是在VR视频中用户需要呈现的视角部分,可以称之为感兴趣区域(Region of Interest,ROI),在机器视觉、图像处理中,将被处理的图像以方框、圆、椭圆、不规则多边形等方式勾勒出需要处理的区域,称为感兴趣区域。
由于VR视频和通常的视频的差别就是,通常的视频都是整个视频内容都会被呈现给用户,而VR视频只有整个视频的一个子集(子视频)呈现,即当VR视频(或者360度视频,或者全方位视频(Omnidirectional video))在头带设备和手持设备上呈现时,只有VR视频中对应于用户头部的方位的面积和相关联的音频的部分会最终呈现。因此,目标请求中携带的目标空间位置信息则可以认为是用户当前感兴趣需要呈现的视角区域。可以理解的是,该目标请求可以是由客户端的角度移动动作触发的,也可以是用户的相关输入指令等触发的,本发明对此不作具体限定。
例如,用户可以360度地观看VR视频,但是在每个时刻上用户观看的视频显示区域只是VR视频的一部分,所以在内容准备时,会将VR视频划分成多个区域,每个区域对应一组自适应的码流,客户端根据用户观看的区域选择对应的视频码流接收观看。如图6所示,图6是本发明实施例提供的360度视角变化图,在图6中,左边框和右边框中的内容分别是用户的两个视角区域,用户在观看视频的时候,用户通过某种操作(如转动智能头盔),视角由左边框转变为右边框,当用户的视角转换到右边框后,客户端也要呈现相应的视角区域的视频内容;用户观看内容的视角位置是任意的,那么就存在用户观看某个视角时,该视角的内容会出现在多个VR划分的区域中,用户需要获取更多区域的视频流。可以理解的是,在现有的VR的2D图像映射中,除了将图6中的球面映射为经纬图,还可以将球面映射为立方体,多面体等其它几何体,在下面的描述中以经纬图作为2D图像映射方式为主进行描述,但是其它的映射方式也属于本发明所保护涵盖的范围。
如图7所示,图7是本发明实施例提供的球面到经纬图的映射,在该图中假设目标空间位置信息是视角区域的左上方位置在VR视频中的坐标和视角区域的宽高信息,例如为图6中右边部分的左上位置在经纬图中的坐标为(x,y),右边框的长宽是(w,h),则在客户端的请求中会携带x,y,w,h的值,或者是x,y,w,h的等比缩放值,也可以是在球体中的角度值。
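球面角度与经纬图像素坐标之间的换算可以示意如下(假设经纬图宽度对应0~360度偏航、高度对应±90度俯仰,这一映射约定是笔者举例所作的假设,并非本申请限定的映射方式):

```python
def sphere_to_equirect(yaw_deg: float, pitch_deg: float, w: int, h: int):
    """将球面角度(偏航0~360度,俯仰-90~90度)换算为经纬图像素坐标(示意)。"""
    x = yaw_deg / 360.0 * w
    # 俯仰+90度(天顶)映射到图像顶部y=0,-90度(天底)映射到图像底部
    y = (90.0 - pitch_deg) / 180.0 * h
    return x, y

# 经纬图3840x2160:偏航180度、俯仰0度落在图像正中心
print(sphere_to_equirect(180, 0, 3840, 2160))  # → (1920.0, 1080.0)
```

由此可见,客户端请求中携带的既可以是像素坐标(x,y,w,h),也可以等价地携带球体中的角度值。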
在一种可能的实现方式中,目标请求还包括感兴趣区域ROI信息、所述客户端的带宽信息、所述客户端支持的解码标准信息和所述客户端的视频最大分辨率信息中的至少一项。即客户端在发起视频请求时,还可以携带一些自身的视频播放条件或播放性能等的相关参数,以便于服务器以更合适的处理方式,进行视频流的处理和反馈。
步骤S502:所述服务器根据所述目标空间位置信息在所述VR内容成分中查找对应的目标空间对象。
具体地,服务器根据接收到的目标请求中的目标空间位置信息,在VR成分内容中查找出相对应的目标空间对象,以便于后续获得该目标空间对象对应的视频流。例如,服务器接收到客户端的目标请求后,解析目标请求获得客户端请求的视角信息,根据客户端的视角信息,从媒体呈现描述信息中获得和客户端视角区域有重叠内容的视频流。
步骤S503:所述服务器获取将所述目标空间对象对应的原始视频流进行了预设复用处理得到的复用视频流信息。
具体地,服务器在确定了目标空间对象(例如用户的目标视角)后,获取将该目标空间对象对应的原始视频流进行了预设复用处理得到的复用视频流信息。需要说明的是,对目标空间对象对应的原始视频流进行预设复用处理的操作,可以是服务器在接收到目标请求之前就已经完成的,也可以是在接收到目标请求之后才进行的处理。若是在接收到目标请求之前就完成的,便可以节省响应请求的时间,即在确定了目标空间对象之后,就直接获取预先处理好的该目标空间对象对应的原始视频流进行了预设复用处理的复用视频流信息,提高服务器的响应速率,缩短响应时间,提升用户的观看体验;若是在接收到目标请求之后才进行的预设复用处理,则需要耗费一定的复用处理时间,但是可以节省预先需要进行大量预设复用处理所需要的存储空间;当然也可以是上述两种方式的相结合,即将部分用户可能需要常用观看的内容,进行提前预设复用处理,而将用户可能不需要观看的内容,在接收到目标请求之后才去进行处理获取,因此本发明对服务器在何时对相关的目标空间对象对应的原始视频流进行的预设复用处理,不做具体限定。
例如,服务器根据和客户端视角区域有重叠内容的视频流的信息,获得对应的视频流,将视频流进行预设复用处理;如图6经纬图中右边框中是客户端请求中的视角区域,A到I区域是媒体呈现描述信息中描述的9个区域,码流复用模块根据右边框的信息和A到I的9个区域的位置信息,可以推导出右边框内容区域覆盖B,C,E,F四个区域;码流复用模块从码流获取模块中获得B,C,E,F四个区域对应的视频码流,将4个视频流进行复用,在复用后的视频流中包含所复用的视频流描述信息,该视频流描述信息包含有以下部分或者全部信息:复用视频流中的视频流的个数,每个被复用的视频流的空间区域位置信息,每个被复用的视频流的分辨率信息,每个被复用视频流在复用视频流中的存储位置信息,视频流复用类型,各视角所对应的视频源的分辨率信息。而具体的预设复用处理可以是所复用的视频流的码流文件在复用文件中二进制首尾拼接,或者样本交织存储等处理方式。
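上述“根据视角框位置推导所覆盖区域”的过程,本质上是矩形相交判断,可以示意如下(3×3的区域划分和视角框取值均为笔者为演示虚构,推导结果与上文的B、C、E、F一致):

```python
def covered_regions(vx, vy, vw, vh, W, H, rows=3, cols=3):
    """返回与视角矩形(vx, vy, vw, vh)有重叠的区域名(A~I,按行优先排列,示意)。"""
    names = [chr(ord("A") + r * cols + c) for r in range(rows) for c in range(cols)]
    rw, rh = W / cols, H / rows  # 每个划分区域的宽高
    hit = []
    for idx, name in enumerate(names):
        r, c = divmod(idx, cols)
        rx, ry = c * rw, r * rh
        # 两矩形有重叠,当且仅当在水平和垂直两个方向上都相交
        if vx < rx + rw and vx + vw > rx and vy < ry + rh and vy + vh > ry:
            hit.append(name)
    return hit

# 视角框落在经纬图右上部,覆盖B、C、E、F四个区域
print(covered_regions(1600, 300, 1600, 900, 3840, 2160))  # → ['B', 'C', 'E', 'F']
```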
在一种可能的实现方式中,复用视频流信息包括N个子视频流分别经过预设复用处理得到的N个被复用的子视频流的信息,N个子视频流与N个被复用的子视频流为一一对应关系。其中,N个子视频流是将目标空间对象划分为N个子空间对象,并将原始视频流按照N个子空间对象进行划分生成的对应的子视频流,N为大于1的自然数;目标请求反馈还包括复用描述信息,复用描述信息包括以下项中的至少一项:复用视频流信息中包含的子视频流的个数N;N个子视频流中的起始子视频流在复用视频流信息中的起始位置偏移;N个被复用的子视频流的大小;N个被复用的子视频流分别在VR内容成分中所对应的空间位置信息;N个被复用的子视频流的分辨率信息;N个被复用的子视频流的视频流复用类型;N个子视频流的分辨率信息,即复用描述信息中携带的各类信息的作用是便于让客户端根据复用描述信息完成用户所请求的VR视频的解析和呈现。
步骤S504:所述服务器向所述客户端发送响应所述目标请求的目标请求反馈,所述客户端接收服务器响应所述目标请求的目标请求反馈,所述目标请求反馈包括将所述目标空间对象对应的原始视频流进行了预设复用处理得到的复用视频流信息。
具体地,由于在现有技术中,用户请求获取的内容,服务器会直接返回对应的视频流,因此可能存在大量的视频流编码冗余,特别是对于在某些VR视频场景中,存在一些重复的场景,比如在旅游观光的VR体验场景中,可能天空的颜色,或者河流的颜色和纹理基本一致,因此可以将该部分重复内容进行复用,以节省视频流的传输带宽、时间和效率。
在一种可能的实现方式中,所述复用描述信息还包括:所述N个被复用的子视频流分别在所述VR内容成分中对应的空间位置信息。复用描述信息中包含了多个被复用的子视频流的具体空间位置信息,因此,客户端可以根据复用描述信息中的该信息,最终解析呈现用户需要观看的VR视频。
在一种可能的实现方式中,所述复用描述信息还包括:所述N个子视频流分别在所述VR内容成分中对应的空间位置信息。因此,客户端可以根据视频流的空间位置信息,随时获知已经请求过哪些视角的子视频流,以便于后续再有重复的视角内容需要观看时,无需再重复请求,以提升VR视频传输效率和用户体验。
在一种可能的实现方式中,所述预设复用处理包括视频流二进制首尾拼接处理或视频分段二进制首尾拼接或样本交织复用处理。即预设复用处理方式可以包括多种,本发明对此不作具体限定。
步骤S505:所述客户端根据所述复用视频流信息进行视频解析呈现。
具体地,客户端根据接收到的服务器发送的目标请求反馈中携带的复用视频流信息对相关的视频流进行解析,并最终呈现。例如,客户端获得复用后的视频流,解析复用后的视频流中的所复用的视频流描述信息,将视频流送入解码器解码,将解码后的视频流视频内容按照所复用的视频流描述信息中描述的信息进行呈现。在本发明中,除了复用各个视角对应的视频流,还可以包括其它需要传输到客户端的视频流。
本发明实施例,服务器根据客户端的请求信息中的视角位置信息,将该视角位置信息所涉及到的视频流进行复用封装,传输到客户端;该视角位置信息所涉及到的视频流是指视频内容中存在和客户端所请求的视角范围的内容有部分或者全部重叠的视频流。即通过将请求响应的视频流进行预设复用处理来响应客户端的请求,既减少了客户端的请求个数,也减少了服务器的响应个数,而且保证了同时刻的各个视角的视频流信息的同时到达,减少了等待所有视频流都分别接收到的时间,从而减少视角的呈现时延。
参见图8,图8是本发明实施例提供的另一种视频流传输方法的流程示意图。下面将结合附图8从客户端和服务器的交互侧对本发明实施例中的另一种视频流传输方法进行详细描述,该方法可以包括以下步骤S801-步骤S806。
图8提供的实施例中的步骤S801-步骤S802分别与图5提供的实施例中的步骤S501-步骤S502相同,具体的实现方式,这里不再赘述。
步骤S803:所述服务器将所述目标空间对象划分为N个子空间对象,并将所述N个子空间对象进行编码生成对应的N个子视频流,所述N为大于1的自然数。
具体地,将目标空间对象划分为多个子空间对象,以便于更细化地复用多个空间对应的不同子视频流,进一步提升视频流的复用效率。其中,将目标空间对象划分为N个子空间对象的原则,可以是按照空间位置的连续性来划分,也可以是按照视频的内容或者重叠性来划分。
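按空间位置连续性划分子空间对象的一种最简单做法是均匀网格切分,可以示意如下(划分的行列数为笔者举例所作的假设):

```python
def split_space(x, y, w, h, rows, cols):
    """把目标空间对象(x, y, w, h)均匀划分为rows*cols个子空间对象(示意)。"""
    sw, sh = w // cols, h // rows  # 每个子空间对象的宽高
    return [
        (x + c * sw, y + r * sh, sw, sh)  # 按行优先排列各子空间对象
        for r in range(rows)
        for c in range(cols)
    ]

subs = split_space(0, 0, 3840, 2160, 2, 2)  # 划分为2x2共4个子空间对象
print(len(subs))  # → 4
```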
步骤S804:所述服务器获取所述N个子视频流分别进行了所述预设复用处理得到的N个被复用的子视频流的信息。
具体地,服务器确定了多个子空间对象后,获取对该多个子空间对象进行了预设复用处理从而得到的多个被复用的子视频流的信息,以最终以较小的码率传输给客户端,节省带宽提升传输效率。可以理解的是,可以是预先就进行了预设复用处理,也可以是确定了子空间后才进行的预设复用处理。
图8提供的实施例中的步骤S805-步骤S806分别与图5提供的实施例中的步骤S504-步骤S505相同,具体的实现方式,这里不再赘述。
进一步地,上述实施例中所涉及的使用预设复用处理视频流的相关复用描述信息的几种描述方式,可以通过以下具体的描述方式中的任意一种实现:
描述方式一:
Figure PCTCN2016101920-appb-000006
其中
FOVCount:复用视频流信息中的子视频流的个数
first_offset:复用视频流信息中的第一个视角的子视频流在复用视频流中的偏移
FOV_size:每个被复用的子视频流在复用视频流中的大小
1、客户端接收到复用视频流信息,解析‘fovm’中的所复用的视频流描述信息,获得子视频流的个数,子视频流的偏移和大小信息;
2、根据子视频流个数,初始化多个视频流的解码器;
3、根据每个子视频流的偏移和数据量信息,将获取到的被复用视频流解复用,获得每个视频流的数据,送入对应的视频流的解码器进行解码和呈现。
如图9所示,图9为本发明实施例提供的复用后的视频流示意图,图9中视频1到视频n是同时间段内的视频内容,其它数据可以不存在;
first_offset是视频1的起始位置偏移。
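按描述方式一解复用的第3步(按偏移和数据量切分各子视频流)可以示意如下(字段含义沿用上文的FOVCount、first_offset和FOV_size,字节数据为笔者虚构):

```python
def demux(data: bytes, fov_count: int, first_offset: int, fov_sizes: list):
    """按first_offset和每个子流的FOV_size依次切出各子视频流(首尾拼接方式,示意)。"""
    streams, pos = [], first_offset
    for i in range(fov_count):
        streams.append(data[pos:pos + fov_sizes[i]])  # 第i个子流,送入对应解码器
        pos += fov_sizes[i]
    return streams

# 4字节的其它数据 + 三个首尾拼接的子视频流
muxed = b"hdr!" + b"AAAA" + b"BB" + b"CCC"
print(demux(muxed, 3, 4, [4, 2, 3]))  # → [b'AAAA', b'BB', b'CCC']
```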
描述方式二:
Figure PCTCN2016101920-appb-000007
Figure PCTCN2016101920-appb-000008
x:N个被复用的子视频流分别在VR内容成分中对应的x轴位置信息
y:N个被复用的子视频流分别在VR内容成分中对应的y轴位置信息
w:N个被复用的子视频流的宽
h:N个被复用的子视频流的高
客户端接收到复用流的行为:
1、前三个步骤和描述方式一中的客户端行为一致,第四步中的呈现是按照fovm中的xywh信息将解码后的图像拼接呈现;
步骤五:结合客户端请求携带的ROI信息,呈现拼接好的视频流内容。
描述方式三:
Figure PCTCN2016101920-appb-000009
ROI_x:客户端请求的N个子视频流分别在VR内容成分中对应的x轴位置信息
ROI_y:客户端请求的N个子视频流分别在VR内容成分中对应的y轴位置信息
ROI_w:客户端请求的N个子视频流的宽
ROI_h:客户端请求的N个子视频流的高
本描述方式中新增的ROI信息可以和描述方式一和二中的信息一同使用;
1、第一到四的步骤和描述方式二中的客户端行为一致,第五步骤描述为:在多个视角拼接好的内容里,将ROI_x,ROI_y,ROI_w,ROI_h指定的区域的视频内容呈现;
描述方式四:
Figure PCTCN2016101920-appb-000010
Figure PCTCN2016101920-appb-000011
MultiplexType:所复用的视频流的码流文件在复用文件中的复用方式:视频流码流(或者码流分段)二进制首尾拼接,或者各个视频流中的样本交织复用
sample_offset:样本在复用文件中的偏移
sample_size:样本的大小;
在本样例中的语法可以和上述的描述方式一,二和三一同使用
客户端接收到复用视频流的行为:
1、客户端接收到复用视频流,解析‘fovm’中的所复用的视频流描述信息,获得复用视频流的方式信息;
2、客户端根据复用方式信息,判断获取各个视角数据的复用方式;如果复用方式为首尾拼接方式,那么客户端解析偏移信息和数据量信息,将每个视角的数据送入解码器;如果复用方式为样本交织方式,那么客户端解析每个样本的偏移和数据量信息,将每个样本送入对应的解码器。如图10所示,图10为本发明实施例提供的视频流中的样本交织复用示意图,图中可将不同视角的视频(视频1、2和3)通过交织的方式进行复用。方格线、斜线以及竖线对应的子视频流可以经过交织的方式进行复用,复用结果为图10右侧的复用视频流。
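图10所示的样本交织复用,可以示意为按时间顺序轮流取各子流的下一个样本(样本内容为笔者虚构,且假设各子流样本数相同):

```python
def interleave(streams):
    """样本交织复用:按时间序轮流取每路子流的下一个样本(示意)。"""
    return [s[i] for i in range(len(streams[0])) for s in streams]

def deinterleave(samples, n):
    """解复用:把交织后的样本序列拆回n路子流(示意)。"""
    return [samples[i::n] for i in range(n)]

v1, v2, v3 = ["a1", "a2"], ["b1", "b2"], ["c1", "c2"]  # 三个视角的样本序列
mux = interleave([v1, v2, v3])
print(mux)  # → ['a1', 'b1', 'c1', 'a2', 'b2', 'c2']
```

解复用时每个样本按其所属子流送入对应的解码器,与上文第2步的描述一致。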
描述方式五:
Figure PCTCN2016101920-appb-000012
Figure PCTCN2016101920-appb-000013
或者
Figure PCTCN2016101920-appb-000014
在本描述方式中增加了每个视频流对应的视频源的分辨率信息,source_w和source_h分别是视角对应的视频源的宽高。
在上述的五种描述方式中涉及到的各个空间位置的语法,可以是VR视频内容中的绝对位置信息,也可以是比例值,或者是偏航角度。
在本发明实施例中,客户端的请求信息采用HTTP协议,在HTTP的GET请求中,携带客户端的视角区域信息,比如实施例中提及的x,y,w,h。
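在HTTP的GET请求中携带视角区域信息的一种写法是使用查询参数(URL和参数的组织方式为笔者虚构的示意,并非本申请限定的请求格式):

```python
from urllib.parse import urlencode, urlparse, parse_qs

def build_request_url(base, x, y, w, h):
    """构造携带视角区域x,y,w,h的GET请求URL(示意)。"""
    return base + "?" + urlencode({"x": x, "y": y, "w": w, "h": h})

url = build_request_url("http://example.com/vr/segment", 1600, 300, 1600, 900)
print(url)  # → http://example.com/vr/segment?x=1600&y=300&w=1600&h=900
```

服务器侧用parse_qs即可取回各参数,进而按上文方式推导所覆盖的区域。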
在本发明实施例中,客户端的请求信息中,还可以携带客户端的带宽、客户端支持的解码标准或者视频最大分辨率等信息;服务器根据请求携带的信息,选择符合客户端性能要求的视频流进行复用传输。
在本发明实施例中,复用的视频流数据可以是DASH协议中的segment媒体数据。
在本发明实施例中,复用所涉及到的码流可以包含服务器侧所产生的码流的内容和客户端所请求的内容区域有部分或者全部重叠的码流。
本发明实施例,保留了图5实施例中的方法和相对应的有益效果,并且详细讲述了多种预设复用处理的具体实现方式,进一步增强了本发明的可实施性,且更加完善的提升了视频流传输的效率。
为了便于更好地实施本发明实施例中图5和图8对应的视频流传输方法,本发明还提供了用于实现实施上述方法的相关设备。
请参见图11,图11是本发明实施例提供的一种客户端的结构示意图,如图11所示,客户端10包括:请求模块101、接收模块102、处理模块103。
请求模块101,用于向服务器发送目标请求,所述目标请求包括所述客户端请求需要呈现的目标空间对象在虚拟现实VR内容成分中对应的目标空间位置信息;
接收模块102,用于接收服务器响应所述目标请求的目标请求反馈,所述目标请求反馈包括将所述目标空间对象对应的原始视频流进行了预设复用处理得到的复用视频流信息;
处理模块103,用于根据所述复用视频流信息进行视频解析呈现。
具体地,所述复用视频流信息包括N个子视频流分别经过所述预设复用处理得到的N个被复用的子视频流的信息,其中,所述N个子视频流是将所述目标空间对象划分为N个子空间对象,并将所述N个子空间对象进行编码生成的对应的子视频流,所述N为大于1的自然数;所述目标请求反馈还包括复用描述信息,所述复用描述信息包括以下项中的至少一项:
所述复用视频流信息中包含的所述子视频流的个数N;
所述N个子视频流中的起始子视频流在所述复用视频流信息中的起始位置偏移;
所述N个被复用的子视频流的数据量;
所述N个被复用的子视频流分别在所述VR内容成分中所对应的空间位置信息;
所述N个被复用的子视频流的分辨率信息;
所述N个被复用的子视频流的视频流复用类型。
进一步地,所述复用描述信息还包括:
所述N个被复用的子视频流分别在所述VR内容成分中对应的空间位置信息。
再进一步地,所述复用描述信息还包括:
所述N个子视频流分别在所述VR内容成分中对应的空间位置信息。
再进一步地,所述目标请求还包括感兴趣区域ROI信息、所述客户端的带宽信息、所述客户端支持的解码标准信息和所述客户端的视频最大分辨率信息中的至少一项。
再进一步地,所述预设复用处理包括视频流二进制首尾拼接处理或视频分段二进制首尾拼接或样本交织复用处理。
可理解的是,客户端10中各模块的功能可对应参考上述图5至图10中的各方法实施例中的具体实现方式,这里不再赘述。
本发明实施例,通过服务器根据客户端的请求信息中的视角位置信息,将该视角位置信息所涉及到的视频流进行复用封装,传输到客户端,该视角位置信息所涉及到的视频流是指视频内容中存在和客户端所请求的视角范围的内容有部分或者全部重叠的视频流。即服务器通过将请求响应的视频流进行预设复用处理来响应客户端的请求,既减少了客户端的请求个数,也减少了服务器的响应个数,而且保证了同时刻的各个视角的视频流信息的同时到达,减少了等待所有视频流都分别接收到的时间,从而减少视角的呈现时延。
请参见图12,图12是本发明实施例提供的一种服务器的结构示意图,如图12所示,服务器20包括:接收模块201、解析模块202、获取模块203和反馈模块204。
接收模块201,用于接收客户端发送的目标请求,所述目标请求包括所述客户端请求需要呈现的目标空间对象在虚拟现实VR内容成分中对应的目标空间位置信息;
解析模块202,用于根据所述目标空间位置信息在所述VR内容成分中查找对应的目标空间对象;
获取模块203,用于获取将所述目标空间对象对应的原始视频流进行了预设复用处理得到的复用视频流信息;
反馈模块204,用于向所述客户端发送响应所述目标请求的目标请求反馈,所述目标请求反馈包括所述复用视频流信息。
具体地,获取模块203,包括:
划分单元,用于将所述目标空间对象划分为N个子空间对象,并将所述原始视频流按照所述N个子空间对象进行划分生成对应的N个子视频流,所述N为大于1的自然数;
获取单元,用于获取所述N个子视频流分别进行了所述预设复用处理得到的N个被复用的子视频流的信息。
进一步地,所述目标请求反馈还包括复用描述信息,所述复用描述信息包括以下项中的至少一项:
所述复用视频流信息中包含的所述子视频流的个数N;
所述N个子视频流中的起始子视频流在所述复用视频流信息中的起始位置偏移;
所述N个被复用的子视频流的大小;
所述N个被复用的子视频流分别在所述VR内容成分中所对应的空间位置信息;
所述N个被复用的子视频流的分辨率信息;
所述N个被复用的子视频流的视频流复用类型。
再进一步地,所述复用描述信息还包括:
所述N个被复用的子视频流分别在所述VR内容成分中对应的空间位置信息。
再进一步地,所述复用描述信息还包括:
所述N个子视频流分别在所述VR内容成分中对应的空间位置信息。
再进一步地,所述预设复用处理包括视频流二进制首尾拼接处理或视频分段二进制首尾拼接或样本交织复用处理。
可理解的是,服务器20中各模块的功能可对应参考上述图5至图10中的各方法实施例中的具体实现方式,这里不再赘述。
本发明实施例,通过服务器根据客户端的请求信息中的视角位置信息,将该视角位置信息所涉及到的视频流进行复用封装,传输到客户端,该视角位置信息所涉及到的视频流是指视频内容中存在和客户端所请求的视角范围的内容有部分或者全部重叠的视频流。即服务器通过将请求响应的视频流进行预设复用处理来响应客户端的请求,既减少了客户端的请求个数,也减少了服务器的响应个数,而且保证了同时刻的各个视角的视频流信息的同时到达,减少了等待所有视频流都分别接收到的时间,从而减少视角的呈现时延。
参见图13,图13是本发明实施例提供的另一种客户端的结构示意图,如图13所示,客户端30包括处理器301、存储器302和收发器303。其中处理器301、存储器302和收发器303可以通过总线或其他方式连接。
可选的,客户端30还可以包括网络接口304和电源模块305。
其中,处理器301可以是数字信号处理(Digital Signal Processing,DSP)芯片。
存储器302用于存储指令,具体实现中,存储器302可以采用只读存储器(英文:Read-Only Memory,简称:ROM)或随机存取存贮器(英文:Random Access Memory,简称:RAM),在本发明实施例中,存储器302用于存储视频流传输程序代码。
收发器303用于收发信号。
网络接口304用于客户端30与其他设备进行数据通信。该网络接口304可以为有线接口或无线接口。
电源模块305用于为客户端30的各个模块供电。
处理器301用于调用存储器302中存储的指令来执行如下操作:
通过收发器303向服务器发送目标请求,所述目标请求包括所述客户端请求需要呈现的目标空间对象在虚拟现实VR内容成分中对应的目标空间位置信息;
通过收发器303接收服务器响应所述目标请求的目标请求反馈,所述目标请求反馈包括将所述目标空间对象对应的原始视频流进行了预设复用处理得到的复用视频流信息;
根据所述复用视频流信息进行视频解析呈现。
具体地,所述复用视频流信息包括N个子视频流分别经过所述预设复用处理得到的N个被复用的子视频流的信息,其中,所述N个子视频流是将所述目标空间对象划分为N个子空间对象,并将所述原始视频流按照所述N个子空间对象进行划分生成的对应的子视频流,所述N为大于1的自然数;所述目标请求反馈还包括复用描述信息,所述复用描述信息包括以下项中的至少一项:
所述复用视频流信息中包含的所述子视频流的个数N;
所述N个子视频流中的起始子视频流在所述复用视频流信息中的起始位置偏移;
所述N个被复用的子视频流的大小;
所述N个被复用的子视频流分别在所述VR内容成分中所对应的空间位置信息;
所述N个被复用的子视频流的分辨率信息;
所述N个被复用的子视频流的视频流复用类型。
进一步地,所述复用描述信息还包括:
所述N个被复用的子视频流分别在所述VR内容成分中对应的空间位置信息。
再进一步地,所述复用描述信息还包括:
所述N个子视频流分别在所述VR内容成分中对应的空间位置信息。
再进一步地,所述目标请求还包括感兴趣区域ROI信息、所述客户端的带宽信息、所述客户端支持的解码标准信息和所述客户端的视频最大分辨率信息中的至少一项。
再进一步地,所述预设复用处理包括视频流二进制首尾拼接处理或视频分段二进制首尾拼接或样本交织复用处理。
需要说明的是,本发明实施例所描述的客户端30中各功能模块的功能可参见上述图5至图10中所示实施例中对应的客户端的相关描述,此处不再赘述。
本发明实施例,通过服务器根据客户端的请求信息中的视角位置信息,将该视角位置信息所涉及到的视频流进行复用封装,传输到客户端,该视角位置信息所涉及到的视频流是指视频内容中存在和客户端所请求的视角范围的内容有部分或者全部重叠的视频流。即服务器通过将请求响应的视频流进行预设复用处理来响应客户端的请求,既减少了客户端的请求个数,也减少了服务器的响应个数,而且保证了同时刻的各个视角的视频流信息的同时到达,减少了等待所有视频流都分别接收到的时间,从而减少视角的呈现时延。
参见图14,图14是本发明实施例提供的另一种服务器的结构示意图。如图14所示,服务器40包括处理器401、存储器402和收发器403。其中处理器401、存储器402和收发器403可以通过总线或其他方式连接。
可选的,服务器40还可以包括网络接口404和电源模块405。
其中,处理器401可以是数字信号处理(Digital Signal Processing,DSP)芯片。
存储器402用于存储指令,具体实现中,存储器402可以采用只读存储器(英文:Read-Only Memory,简称:ROM)或随机存取存贮器(英文:Random Access Memory,简称:RAM),在本发明实施例中,存储器402用于存储视频流传输程序代码。
收发器403用于收发信号。
网络接口404用于服务器40与其他设备进行数据通信。该网络接口404可以为有线接口或无线接口。
电源模块405用于为服务器40的各个模块供电。
处理器401用于调用存储器402中存储的指令来执行如下操作:
通过收发器403接收客户端发送的目标请求,所述目标请求包括所述客户端请求需要呈现的目标空间对象在虚拟现实VR内容成分中对应的目标空间位置信息;
根据所述目标空间位置信息在所述VR内容成分中查找对应的目标空间对象;
获取将所述目标空间对象对应的原始视频流进行了预设复用处理得到的复用视频流信息;
通过收发器403向所述客户端发送响应所述目标请求的目标请求反馈,所述目标请求反馈包括所述复用视频流信息。
具体地,处理器401用于获取将所述目标空间对象对应的原始视频流进行了预设复用处理得到的复用视频流信息,具体为:
将所述目标空间对象划分为N个子空间对象,并将所述N个子空间对象进行编码生成对应的N个子视频流,所述N为大于1的自然数;
获取所述N个子视频流分别进行了所述预设复用处理得到的N个被复用的子视频流的信息。
进一步地,所述目标请求反馈还包括复用描述信息,所述复用描述信息包括以下项中的至少一项:
所述复用视频流信息中包含的所述子视频流的个数N;
所述N个子视频流中的起始子视频流在所述复用视频流信息中的起始位置偏移;
所述N个被复用的子视频流的数据量;
所述N个被复用的子视频流分别在所述VR内容成分中所对应的空间位置信息;
所述N个被复用的子视频流的分辨率信息;
所述N个被复用的子视频流的视频流复用类型;
所述N个子视频流的分辨率信息。
再进一步地,所述复用描述信息还包括:
所述N个被复用的子视频流分别在所述VR内容成分中对应的空间位置信息。
再进一步地,所述复用描述信息还包括:
所述N个子视频流分别在所述VR内容成分中对应的空间位置信息。
再进一步地,所述预设复用处理包括视频流二进制首尾拼接处理或视频分段二进制首尾拼接或样本交织复用处理。
需要说明的是,本发明实施例所描述的服务器40中各功能模块的功能可参见上述图5至图10中所示实施例中对应的服务器的相关描述,此处不再赘述。
本发明实施例,通过服务器根据客户端的请求信息中的视角位置信息,将该视角位置信息所涉及到的视频流进行复用封装,传输到客户端,该视角位置信息所涉及到的视频流是指视频内容中存在和客户端所请求的视角范围的内容有部分或者全部重叠的视频流。即服务器通过将请求响应的视频流进行预设复用处理来响应客户端的请求,既减少了客户端的请求个数,也减少了服务器的响应个数,而且保证了同时刻的各个视角的视频流信息的同时到达,减少了等待所有视频流都分别接收到的时间,从而减少视角的呈现时延。
图15是本发明实施例提供的一种视频流传输系统的结构示意图,视频流传输系统50包括VR客户端501与VR服务器502,其中
VR客户端501可以为上述图13实施例中的客户端30,VR服务器502可以为上述图14实施例中的服务器40。可理解的是,本发明实施例中的视频流传输系统50还可以包括摄影设备、存储设备、路由设备、交换设备和核心网服务器等设备。
本发明实施例,通过服务器根据客户端的请求信息中的视角位置信息,将该视角位置信息所涉及到的视频流进行复用封装,传输到客户端,该视角位置信息所涉及到的视频流是指视频内容中存在和客户端所请求的视角范围的内容有部分或者全部重叠的视频流。即服务器通过将请求响应的视频流进行预设复用处理来响应客户端的请求,既减少了客户端的请求个数,也减少了服务器的响应个数,而且保证了同时刻的各个视角的视频流信息的同时到达,减少了等待所有视频流都分别接收到的时间,从而减少视角的呈现时延。
本发明实施例还提供一种计算机存储介质,其中,该计算机存储介质可存储有程序,该程序执行时包括上述方法实施例中记载的任意一种视频流传输方法的部分或全部步骤。
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。
需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本发明并不受所描述的动作顺序的限制,因为依据本发明,某些步骤可能可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定是本发明所必须的。
在本申请所提供的几个实施例中,应该理解到,所揭露的装置,可通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如上述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性或其它的形式。
上述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本发明各实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
上述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以为个人计算机、服务器或者网络设备等,具体可以是计算机设备中的处理器)执行本发明各个实施例上述方法的全部或部分步骤。其中,而前述的存储介质可包括:U盘、移动硬盘、磁碟、光盘、只读存储器(英文:Read-Only Memory,缩写:ROM)或者随机存取存储器(英文:Random Access Memory,缩写:RAM)等各种可以存储程序代码的介质。
以上所述,以上实施例仅用以说明本发明的技术方案,而非对其限制;尽管参照前述实施例对本发明进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本发明各实施例技术方案的精神和范围。

Claims (41)

  1. 一种视频流传输方法,其特征在于,包括:
    客户端向服务器发送目标请求,所述目标请求包括所述客户端请求需要呈现的目标空间对象在虚拟现实VR内容成分中对应的目标空间位置信息;
    所述客户端接收服务器响应所述目标请求的目标请求反馈,所述目标请求反馈包括将所述目标空间对象对应的原始视频流进行了预设复用处理得到的复用视频流信息;
    所述客户端根据所述复用视频流信息进行视频解析呈现。
  2. 如权利要求1所述的方法,其特征在于,所述复用视频流信息包括N个子视频流分别经过所述预设复用处理得到的N个被复用的子视频流的信息,其中,所述N个子视频流是将所述目标空间对象划分为N个子空间对象,并将所述原始视频流按照所述N个子空间对象进行划分生成的对应的子视频流,所述N为大于1的自然数;所述目标请求反馈还包括复用描述信息,所述复用描述信息包括以下项中的至少一项:
    所述复用视频流信息中包含的所述子视频流的个数N;
    所述N个子视频流中的起始子视频流在所述复用视频流信息中的起始位置偏移;
    所述N个被复用的子视频流的大小;
    所述N个被复用的子视频流分别在所述VR内容成分中所对应的空间位置信息;
    所述N个被复用的子视频流的分辨率信息;
    所述N个被复用的子视频流的视频流复用类型;
    所述N个子视频流的分辨率信息。
  3. 如权利要求2所述的方法,其特征在于,所述复用描述信息还包括:
    所述N个被复用的子视频流分别在所述VR内容成分中对应的空间位置信息。
  4. 如权利要求2或3所述的方法,其特征在于,所述复用描述信息还包括:
    所述N个子视频流分别在所述VR内容成分中对应的空间位置信息。
  5. 如权利要求1-4任意一项所述的方法,其特征在于,所述目标请求还包括感兴趣区域ROI信息、所述客户端的带宽信息、所述客户端支持的解码标准信息和所述客户端的视频最大分辨率信息中的至少一项。
  6. 如权利要求1-5任意一项所述的方法,其特征在于,所述预设复用处理包括视频流二进制首尾拼接处理或视频分段二进制首尾拼接或样本交织复用处理。
  7. 一种视频流传输方法,其特征在于,包括:
    服务器接收客户端发送的目标请求,所述目标请求包括所述客户端请求需要呈现的目标空间对象在虚拟现实VR内容成分中对应的目标空间位置信息;
    所述服务器根据所述目标空间位置信息在所述VR内容成分中查找对应的目标空间对象;
    所述服务器获取将所述目标空间对象对应的原始视频流进行了预设复用处理得到的复用视频流信息;
    所述服务器向所述客户端发送响应所述目标请求的目标请求反馈,所述目标请求反馈包括所述复用视频流信息。
  8. 如权利要求7所述的方法,其特征在于,所述服务器获取将所述目标空间对象对应的原始视频流进行了预设复用处理得到的复用视频流信息,包括:
    所述服务器将所述目标空间对象划分为N个子空间对象,并将所述原始视频流按照所述N个子空间对象进行划分生成对应的N个子视频流,所述N为大于1的自然数;
    所述服务器获取所述N个子视频流分别进行了所述预设复用处理得到的N个被复用的子视频流的信息。
  9. 如权利要求8所述的方法,其特征在于,所述目标请求反馈还包括复用描述信息,所述复用描述信息包括以下项中的至少一项:
    所述复用视频流信息中包含的所述子视频流的个数N;
    所述N个子视频流中的起始子视频流在所述复用视频流信息中的起始位置偏移;
    所述N个被复用的子视频流的大小;
    所述N个被复用的子视频流分别在所述VR内容成分中所对应的空间位置信息;
    所述N个被复用的子视频流的分辨率信息;
    所述N个被复用的子视频流的视频流复用类型;
    所述N个子视频流的分辨率信息。
  10. 如权利要求9所述的方法,其特征在于,所述复用描述信息还包括:
    所述N个被复用的子视频流分别在所述VR内容成分中对应的空间位置信息。
  11. 如权利要求9或10所述的方法,其特征在于,所述复用描述信息还包括:
    所述N个子视频流分别在所述VR内容成分中对应的空间位置信息。
  12. 如权利要求7-11任意一项所述的方法,其特征在于,所述预设复用处理包括视频流二进制首尾拼接处理或视频分段二进制首尾拼接或样本交织复用处理。
  13. 一种客户端,其特征在于,包括:
    请求模块,用于向服务器发送目标请求,所述目标请求包括所述客户端请求需要呈现的目标空间对象在虚拟现实VR内容成分中对应的目标空间位置信息;
    接收模块,用于接收服务器响应所述目标请求的目标请求反馈,所述目标请求反馈包括将所述目标空间对象对应的原始视频流进行了预设复用处理得到的复用视频流信息;
    处理模块,用于根据所述复用视频流信息进行视频解析呈现。
  14. 如权利要求13所述的客户端,其特征在于,所述复用视频流信息包括N个子视频流分别经过所述预设复用处理得到的N个被复用的子视频流的信息,其中,所述N个子视频流是将所述目标空间对象划分为N个子空间对象,并将所述原始视频流按照所述N个子空间对象进行划分生成的对应的子视频流,所述N为大于1的自然数;所述目标请求反馈还包括复用描述信息,所述复用描述信息包括以下项中的至少一项:
    所述复用视频流信息中包含的所述子视频流的个数N;
    所述N个子视频流中的起始子视频流在所述复用视频流信息中的起始位置偏移;
    所述N个被复用的子视频流的大小;
    所述N个被复用的子视频流分别在所述VR内容成分中所对应的空间位置信息;
    所述N个被复用的子视频流的分辨率信息;
    所述N个被复用的子视频流的视频流复用类型;
    所述N个子视频流的分辨率信息。
  15. 如权利要求14所述的客户端,其特征在于,所述复用描述信息还包括:
    所述N个被复用的子视频流分别在所述VR内容成分中对应的空间位置信息。
  16. 如权利要求14或15所述的客户端,其特征在于,所述复用描述信息还包括:
    所述N个子视频流分别在所述VR内容成分中对应的空间位置信息。
  17. 如权利要求13-16任意一项所述的客户端,其特征在于,所述目标请求还包括感兴趣区域ROI信息、所述客户端的带宽信息、所述客户端支持的解码标准信息和所述客户端的视频最大分辨率信息中的至少一项。
  18. 如权利要求13-17任意一项所述的客户端,其特征在于,所述预设复用处理包括视频流二进制首尾拼接处理或视频分段二进制首尾拼接或样本交织复用处理。
  19. 一种服务器,其特征在于,包括:
    接收模块,用于接收客户端发送的目标请求,所述目标请求包括所述客户端请求需要呈现的目标空间对象在虚拟现实VR内容成分中对应的目标空间位置信息;
    解析模块,用于根据所述目标空间位置信息在所述VR内容成分中查找对应的目标空间对象;
    获取模块,用于获取将所述目标空间对象对应的原始视频流进行了预设复用处理得到的复用视频流信息;
    反馈模块,用于向所述客户端发送响应所述目标请求的目标请求反馈,所述目标请求反馈包括所述复用视频流信息。
  20. 如权利要求19所述的服务器,其特征在于,所述获取模块,包括:
    划分单元,用于将所述目标空间对象划分为N个子空间对象,并将所述原始视频流按照所述N个子空间对象进行划分生成对应的N个子视频流,所述N为大于1的自然数;
    获取单元,用于获取所述N个子视频流分别进行了所述预设复用处理得到的N个被复用的子视频流的信息。
  21. 如权利要求20所述的服务器,其特征在于,所述目标请求反馈还包括复用描述信息,所述复用描述信息包括以下项中的至少一项:
    所述复用视频流信息中包含的所述子视频流的个数N;
    所述N个子视频流中的起始子视频流在所述复用视频流信息中的起始位置偏移;
    所述N个被复用的子视频流的大小;
    所述N个被复用的子视频流分别在所述VR内容成分中所对应的空间位置信息;
    所述N个被复用的子视频流的分辨率信息;
    所述N个被复用的子视频流的视频流复用类型;
    所述N个子视频流的分辨率信息。
  22. 如权利要求21所述的服务器,其特征在于,所述复用描述信息还包括:
    所述N个被复用的子视频流分别在所述VR内容成分中对应的空间位置信息。
  23. 如权利要求21或22所述的服务器,其特征在于,所述复用描述信息还包括:
    所述N个子视频流分别在所述VR内容成分中对应的空间位置信息。
  24. 如权利要求19-23任意一项所述的服务器,其特征在于,所述预设复用处理包括视频流二进制首尾拼接处理或视频分段二进制首尾拼接或样本交织复用处理。
  25. 一种视频流传输系统,包括VR客户端和VR服务器,其中
    所述VR客户端为如权利要求13-18任意一项所述的客户端;
    所述VR服务器为如权利要求19-24任意一项所述的服务器。
  26. 一种基于流媒体技术的视频数据的处理方法,其特征在于,所述方法包括:
    服务器接收客户端发送的视频数据获取请求,所述获取请求包括空间对象的信息;
    所述服务器根据所述空间对象的信息确定至少两个媒体表示对应的视频数据;
    所述服务器将所述至少两个媒体表示对应的视频数据封装成一个码流;
    所述服务器将所述码流向所述客户端发送。
  27. 根据权利要求26所述的视频数据的处理方法,其特征在于,所述空间对象的信息包括如下的至少一种:
    媒体表示的标识,客户端的用户的视角信息或媒体表示的空间信息。
  28. 根据权利要求26或27所述的视频数据的处理方法,其特征在于,所述码流包括如下的至少一种信息:
    媒体表示的数量;
    媒体表示在所述码流中的起始位置偏移;
    媒体表示的数据量信息;
    媒体表示对应的空间位置信息;
    媒体表示的视频流复用类型;
    或者
    媒体表示的分辨率信息。
  29. 根据权利要求26或27所述的视频数据的处理方法,其特征在于,所述码流包括封装标识;所述标识用于指示码流采用的是分段交织的封装方式或者码流采用的是样本交织的封装方式。
  30. 一种基于流媒体技术的视频数据的处理方法,其特征在于,所述方法包括:
    客户端向服务器发送视频数据获取请求,所述获取请求包括空间对象的信息;
    所述客户端接收所述服务器响应所述视频数据获取请求后发送的码流,其中,所述码流包括至少两个媒体表示的数据。
  31. 根据权利要求30所述的视频数据的处理方法,其特征在于,所述空间对象的信息包括如下的至少一种:
    媒体表示的标识,客户端的用户的视角信息或媒体表示的空间信息。
  32. 根据权利要求30或31所述的视频数据的处理方法,其特征在于,所述码流包括如下的至少一种信息:
    媒体表示的数量;
    媒体表示在所述码流中的起始位置偏移;
    媒体表示的数据量信息;
    媒体表示对应的空间位置信息;
    媒体表示的视频流复用类型;
    或者
    媒体表示的分辨率信息。
  33. 根据权利要求30或31所述的视频数据的处理方法,其特征在于,所述码流包括封装标识;所述标识用于指示码流采用的是分段交织的封装方式或者码流采用的是样本交织的封装方式。
  34. 一种基于流媒体技术的服务器,其特征在于,所述服务器包括:
    接收器,用于接收客户端发送的视频数据获取请求,所述获取请求包括空间对象的信息;
    处理器,用于根据所述空间对象的信息确定至少两个媒体表示对应的视频数据;
    所述处理器还用于将所述至少两个媒体表示对应的视频数据封装成一个码流;
    发送器,用于将所述码流向所述客户端发送。
  35. 根据权利要求34所述的服务器,其特征在于,所述空间对象的信息包括如下的至少一种:
    媒体表示的标识,客户端的用户的视角信息或媒体表示的空间信息。
  36. 根据权利要求34或35所述的服务器,其特征在于,所述码流包括如下的至少一种信息:
    媒体表示的数量;
    媒体表示在所述码流中的起始位置偏移;
    媒体表示的数据量信息;
    媒体表示对应的空间位置信息;
    媒体表示的视频流复用类型;
    或者
    媒体表示的分辨率信息。
  37. 根据权利要求34或35所述的服务器,其特征在于,所述码流包括封装标识;所述标识用于指示码流采用的是分段交织的封装方式或者码流采用的是样本交织的封装方式。
  38. 一种基于流媒体技术的客户端,其特征在于,所述客户端包括:
    发送器,用于向服务器发送视频数据获取请求,所述获取请求包括空间对象的信息;
    接收器,用于接收所述服务器响应所述视频数据获取请求后发送的码流,其中,所述码流包括至少两个媒体表示的数据。
  39. 根据权利要求38所述的客户端,其特征在于,所述空间对象的信息包括如下的至少一种:
    媒体表示的标识,客户端的用户的视角信息或媒体表示的空间信息。
  40. 根据权利要求38或39所述的客户端,其特征在于,所述码流包括如下的至少一种信息:
    媒体表示的数量;
    媒体表示在所述码流中的起始位置偏移;
    媒体表示的数据量信息;
    媒体表示对应的空间位置信息;
    媒体表示的视频流复用类型;
    或者
    媒体表示的分辨率信息。
  41. 根据权利要求38或39所述的客户端,其特征在于,所述码流包括封装标识;所述标识用于指示码流采用的是分段交织的封装方式或者码流采用的是样本交织的封装方式。
PCT/CN2016/101920 2016-10-10 2016-10-12 一种视频流传输方法、相关设备及系统 WO2018068236A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201680086678.3A CN109644296A (zh) 2016-10-10 2016-10-12 一种视频流传输方法、相关设备及系统
US16/379,894 US10897646B2 (en) 2016-10-10 2019-04-10 Video stream transmission method and related device and system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610886268.9 2016-10-10
CN201610886268 2016-10-10

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/379,894 Continuation US10897646B2 (en) 2016-10-10 2019-04-10 Video stream transmission method and related device and system

Publications (1)

Publication Number Publication Date
WO2018068236A1 true WO2018068236A1 (zh) 2018-04-19

Family

ID=61905052

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/101920 WO2018068236A1 (zh) 2016-10-10 2016-10-12 一种视频流传输方法、相关设备及系统

Country Status (3)

Country Link
US (1) US10897646B2 (zh)
CN (1) CN109644296A (zh)
WO (1) WO2018068236A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110830821A (zh) * 2018-08-14 2020-02-21 海能达通信股份有限公司 基于切片的rtp流传输方法、装置、终端及服务器
CN111131846A (zh) * 2019-12-05 2020-05-08 中国联合网络通信集团有限公司 视频播放方法、多媒体播放设备、边缘服务器和核心网
EP3787305A4 (en) * 2018-05-22 2021-03-03 Huawei Technologies Co., Ltd. PROCESS PLAYING VIDEO IN VR, TERMINAL, AND SERVER
US11024092B2 (en) 2017-02-01 2021-06-01 Pcms Holdings, Inc. System and method for augmented reality content delivery in pre-captured environments
US11991402B2 (en) 2019-03-26 2024-05-21 Interdigital Vc Holdings, Inc. System and method for multiplexed rendering of light fields

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10432688B2 (en) * 2015-03-13 2019-10-01 Telefonaktiebolaget Lm Ericsson (Publ) System and method for optimized delivery of live ABR media
WO2018221211A1 (ja) * 2017-05-30 2018-12-06 ソニー株式会社 画像処理装置および方法、ファイル生成装置および方法、並びにプログラム
US10911812B2 (en) * 2017-09-18 2021-02-02 S2 Security Corporation System and method for delivery of near-term real-time recorded video
KR102030983B1 (ko) * 2017-11-29 2019-10-11 전자부품연구원 분할 영상 기반의 라이브 스트리밍 서비스 운용 방법 및 이를 지원하는 전자 장치
CN111147930A (zh) * 2019-12-30 2020-05-12 上海曼恒数字技术股份有限公司 一种基于虚拟现实的数据输出方法及系统
CN114390335B (zh) * 2020-10-22 2022-11-18 华为终端有限公司 一种在线播放音视频的方法、电子设备及存储介质
CN116821941B (zh) * 2023-08-25 2023-12-19 建信金融科技有限责任公司 数据加密解密方法、装置、设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1234941A (zh) * 1997-07-18 1999-11-10 索尼公司 图像信号复用装置和方法、分解装置和方法及传输媒体
CN101568018A (zh) * 2008-04-22 2009-10-28 中兴通讯股份有限公司 一种无旋转全景摄像装置及其构成的监控系统
CN102737405A (zh) * 2011-03-31 2012-10-17 索尼公司 图像处理设备、图像处理方法和程序
WO2015014773A1 (en) * 2013-07-29 2015-02-05 Koninklijke Kpn N.V. Providing tile video streams to a client
CN105408916A (zh) * 2013-07-26 2016-03-16 华为技术有限公司 用于自适应流媒体中的空间自适应的系统和方法

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20110011000A (ko) * 2009-07-27 2011-02-08 삼성전자주식회사 3차원 영상 재생을 위한 부가 정보가 삽입된 3차원 영상 데이터스트림 생성 방법 및 그 장치, 3차원 영상 재생을 위한 부가 정보가 삽입된 3차원 영상 데이터스트림 수신 방법 및 그 장치
CN103747283B (zh) * 2013-12-24 2017-02-01 中国科学院声学研究所 视频分片的下载方法
CN104735464A (zh) * 2015-03-31 2015-06-24 华为技术有限公司 一种全景视频交互传输方法、服务器和客户端
CN105554513A (zh) * 2015-12-10 2016-05-04 Tcl集团股份有限公司 一种基于h.264的全景视频传输方法及系统
CN105704501B (zh) * 2016-02-06 2020-04-21 普宙飞行器科技(深圳)有限公司 一种基于无人机全景视频的虚拟现实直播系统
CN105578199A (zh) * 2016-02-22 2016-05-11 北京佰才邦技术有限公司 虚拟现实全景多媒体处理系统、方法及客户端设备
CN105791882B (zh) 2016-03-22 2018-09-18 腾讯科技(深圳)有限公司 视频编码方法及装置
CN105916060A (zh) * 2016-04-26 2016-08-31 乐视控股(北京)有限公司 数据传输的方法、装置及系统
CN105915937B (zh) * 2016-05-10 2019-12-13 上海乐相科技有限公司 一种全景视频播放方法及设备
EP3761645A1 (en) * 2016-05-26 2021-01-06 Vid Scale, Inc. Methods and apparatus of viewport adaptive 360 degree video delivery

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11024092B2 (en) 2017-02-01 2021-06-01 Pcms Holdings, Inc. System and method for augmented reality content delivery in pre-captured environments
EP3787305A4 (en) * 2018-05-22 2021-03-03 Huawei Technologies Co., Ltd. Method for playing video in VR, terminal, and server
US11765427B2 (en) 2018-05-22 2023-09-19 Huawei Technologies Co., Ltd. Virtual reality video playing method, terminal, and server
CN110830821A (zh) * 2018-08-14 2020-02-21 Hytera Communications Corp., Ltd. Slice-based RTP stream transmission method, apparatus, terminal and server
US11991402B2 (en) 2019-03-26 2024-05-21 Interdigital Vc Holdings, Inc. System and method for multiplexed rendering of light fields
CN111131846A (zh) * 2019-12-05 2020-05-08 China United Network Communications Group Co., Ltd. Video playing method, multimedia playing device, edge server and core network

Also Published As

Publication number Publication date
US10897646B2 (en) 2021-01-19
US20190238933A1 (en) 2019-08-01
CN109644296A (zh) 2019-04-16

Similar Documents

Publication Publication Date Title
WO2018068236A1 (zh) Video stream transmission method, related device and system
JP6721631B2 (ja) Method, apparatus and computer program product for video encoding and decoding
KR102241082B1 (ko) Method and apparatus for transmitting and receiving metadata for multiple viewpoints
JP7058273B2 (ja) Information processing method and apparatus
CN109691094A (zh) Method for transmitting omnidirectional video, method for receiving omnidirectional video, apparatus for transmitting omnidirectional video, and apparatus for receiving omnidirectional video
CN109644262A (zh) Method for transmitting omnidirectional video, method for receiving omnidirectional video, apparatus for transmitting omnidirectional video, and apparatus for receiving omnidirectional video
US20200092600A1 (en) Method and apparatus for presenting video information
KR102157658B1 (ko) Method and apparatus for transmitting and receiving metadata for multiple viewpoints
WO2018058773A1 (zh) Video data processing method and apparatus
JP7035088B2 (ja) High level signaling for fisheye video data
CN112219403B (zh) Rendered viewpoint metrics for immersive media
WO2019007120A1 (zh) Media data processing method and apparatus
CN113891117B (zh) Immersive media data processing method, apparatus, device and readable storage medium
WO2023061131A1 (zh) Media file encapsulation method, apparatus, device and storage medium
EP3637722A1 (en) Method and apparatus for processing media information
WO2018072488A1 (zh) Data processing method, related device and system
WO2018058993A1 (zh) Video data processing method and apparatus
WO2023169003A1 (zh) Point cloud media decoding method, and point cloud media encoding method and apparatus
WO2023284487A1 (zh) Volumetric media data processing method, apparatus, device and storage medium
WO2023024841A1 (zh) Point cloud media file encapsulation and decapsulation method, apparatus and storage medium
WO2023024843A1 (zh) Media file encapsulation and decapsulation method, device and storage medium
JP2023507586A (ja) Method and apparatus for encoding, decoding and rendering 6DoF content from 3DoF components

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16918478

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16918478

Country of ref document: EP

Kind code of ref document: A1