WO2020024373A1 - Method, client and server for transmitting media data - Google Patents

Method, client and server for transmitting media data

Info

Publication number
WO2020024373A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
perspective
sub
video image
current
Prior art date
Application number
PCT/CN2018/105036
Other languages
English (en)
French (fr)
Inventor
方华猛
范宇群
邸佩云
王业奎
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to PCT/CN2019/072735 (WO2020024567A1)
Publication of WO2020024373A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/816Monomedia components thereof involving special video data, e.g. 3D video
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/235Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/258Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
    • H04N21/25866Management of end-user data
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/262Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/435Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/4508Management of client data or end-user data
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/4728End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client 
    • H04N21/63Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
    • H04N21/643Communication protocols
    • H04N21/64322IP
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client 
    • H04N21/63Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
    • H04N21/643Communication protocols
    • H04N21/6437Real-time Transport Protocol [RTP]

Definitions

  • the present application relates to the technical field of streaming media transmission, and more particularly, to a method, a client, and a server for transmitting media data.
  • the ISO/IEC 23090-2 standard specification is also called the OMAF (omnidirectional media format) standard specification.
  • This specification defines a media application format that can implement omnidirectional media presentation in applications.
  • Omnidirectional media mainly refers to omnidirectional video (360-degree video) and related audio.
  • the OMAF specification first specifies a list of projection methods that can be used to convert spherical video into two-dimensional video, and then specifies how to use the ISO base media file format (ISOBMFF) to store omnidirectional media and the associated metadata, and how to encapsulate and transmit omnidirectional media data in a streaming media system, for example through dynamic adaptive streaming over HTTP (DASH) as specified in the ISO/IEC 23009-1 standard.
  • with the rapid development of virtual reality (VR) technology, panoramic video has become more and more widely used.
  • VR technology based on 360-degree panoramic video can create a simulated environment and bring users an interactive, immersive 3D experience.
  • panoramic video is composed of a series of panoramic images. These panoramic images can be generated by computer rendering, or by stitching, with a stitching algorithm, video images taken by multiple cameras from multiple different angles.
  • however, the image content viewed by the user at each moment only occupies a small part of the entire panoramic image.
  • therefore, it is sufficient to transmit, at each moment, only the content that the user is watching.
  • the present application provides a method, a client, and a server for transmitting media data, so as to reduce the delay in transmitting the video content within the current perspective range to the client.
  • a method for transmitting media data includes: a client sends first information to a server, where the first information is used to indicate spatial information of a region of a first target video image; and the client receives, from the server, video data packets corresponding to a second target video image, where at least one of the video data packets corresponding to the second target video image collectively carries second information, and the second information is used to indicate spatial information of a region of the second target video image.
  • the second target video image includes a first target video image
  • the first target video image includes a video image within a current perspective.
  • the second target video image including the first target video image may specifically mean that the second target video image includes only the first target video image, or that the second target video image includes other video images in addition to the first target video image. That is, in addition to sending the first target video image to the client, the server may also send other video images to the client.
  • the current perspective may be the perspective when the user views the video using the client.
  • when the current perspective changes, the client may send new first information to the server to request the video content within the new perspective.
  • the second information carried in the at least one video data packet may refer to that each video data packet in the at least one video data packet carries at least a part of the second information.
  • in other words, the at least one video data packet as a whole carries the second information.
  • in addition to collectively carrying the second information, the at least one video data packet may also carry code stream data.
  • the code stream data refers to data obtained by encoding a video image.
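  • As an illustration of how the second information and code stream data might share the same packets, here is a minimal Python sketch; the packet structure, the names VideoDataPacket and distribute_second_info, and the 16-byte fragment size are assumptions for the example, not defined by the patent.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class VideoDataPacket:
    seq: int                        # packet sequence number
    info_fragment: Optional[bytes]  # fragment of the second information, if any
    payload: bytes                  # code stream data (encoded video image)

def distribute_second_info(second_info: bytes,
                           payloads: List[bytes],
                           chunk_size: int = 16) -> List[VideoDataPacket]:
    """Spread the serialized second information over the leading packets;
    the packets as a whole then carry the complete second information."""
    chunks = [second_info[i:i + chunk_size]
              for i in range(0, len(second_info), chunk_size)]
    assert len(chunks) <= len(payloads), "not enough packets for the info"
    return [VideoDataPacket(seq, chunks[seq] if seq < len(chunks) else None, p)
            for seq, p in enumerate(payloads)]
```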
  • the area of the first target video image may refer to the area exactly covered or occupied by the first target video image; that is, the video images in the area of the first target video image all belong to the first target video image, and the video images in the first target video image are all within the area of the first target video image.
  • the second target video image meets a similar requirement.
  • the spatial information of the region of the first target video image may also be referred to as the region spatial information of the first target video image.
  • the spatial information of the region of the first target video image is used to indicate the spatial range or the spatial position of the region of the first target video image.
  • the spatial position of the region of the first target video image may be relative to a coordinate system, and the coordinate system may be a three-dimensional coordinate system or a two-dimensional coordinate system.
  • the origin of the three-dimensional coordinate system may be the center point of the panoramic video image, the upper-left corner point of the panoramic video image, or another fixed point in the panoramic video image.
  • the spatial position of the first target video image may also be the position of the first target video image within the area of the panoramic video image (that is, the spatial position of the first target video image relative to the panoramic video image).
  • the first target video image or the second target video image is a part of a panoramic video image.
  • after the server obtains the first information characterizing the video content in the current perspective, it can send the second target video image and the position-related information of the second target video image required by the user to the client in real time, which can reduce the time delay for the client to obtain the position-related information of the second target video image.
  • this enables the client to obtain the position-related information of the second target video image faster and reduces the transmission delay.
  • the first target video image is a video image within a current perspective
  • the first information includes spatial information of the current perspective
  • the spatial information of the current perspective can also be referred to as the regional spatial information of the current perspective.
  • when the first information includes the spatial information of the current perspective, the client directly reports the spatial information of the current perspective through the first information, and the server determines the second target video image according to the spatial information of the current perspective and sends the second target video image to the client.
  • the video data packet of the target video image directly carries the spatial position information of the target video image.
  • the spatial information of the current perspective includes spatial position information of the current perspective.
  • the spatial position information of the current perspective is any one of the following: the spherical coordinate value of the center point of the spherical area corresponding to the current perspective, the spherical coordinate value of the upper left corner of the spherical area corresponding to the current perspective, the plane coordinate value of the center point of the plane area corresponding to the current perspective, or the plane coordinate value of the upper left corner of the plane area corresponding to the current perspective.
  • for example, the spherical coordinate value of the center point of the spherical area corresponding to the current perspective is (X, Y, Z), where X corresponds to the azimuth (or yaw) angle of the spherical coordinates, Y corresponds to the pitch (or elevation) angle, and Z corresponds to the tilt (or roll) angle.
  • for example, the plane coordinate value of the center point of the plane area corresponding to the current perspective is (X, Y), where X and Y respectively represent the abscissa and ordinate of the center point of the plane area corresponding to the current perspective in a two-dimensional rectangular coordinate system.
  • for example, the plane coordinate value of the upper left corner of the plane area corresponding to the current perspective is (X, Y), where X and Y respectively represent the abscissa and ordinate of the upper left corner of the plane area corresponding to the current perspective in a two-dimensional rectangular coordinate system.
  • the spatial position information of the current perspective may also be the two-dimensional coordinate value of the upper right corner, the lower left corner, the lower right corner, or any preset position of the plane area corresponding to the current perspective.
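  • For illustration, the following Python sketch shows one possible in-memory representation of the four alternatives listed above; the enum values and field names are assumptions for the example, not part of the patent.

```python
from dataclasses import dataclass
from enum import Enum

class PositionKind(Enum):
    SPHERE_CENTER = 0     # spherical coordinates of the region's center point
    SPHERE_TOP_LEFT = 1   # spherical coordinates of the region's upper left corner
    PLANE_CENTER = 2      # plane coordinates of the region's center point
    PLANE_TOP_LEFT = 3    # plane coordinates of the region's upper left corner

@dataclass
class SpatialPosition:
    kind: PositionKind
    x: float        # azimuth/yaw for spherical kinds; abscissa for plane kinds
    y: float        # pitch/elevation for spherical kinds; ordinate for plane kinds
    z: float = 0.0  # tilt/roll angle; unused for the plane kinds

# e.g. a current perspective centered at azimuth 30, pitch -10, roll 0
current = SpatialPosition(PositionKind.SPHERE_CENTER, 30.0, -10.0, 0.0)
```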
  • the first target video image is a video image within a current perspective
  • the first information includes perspective information of the current perspective
  • the perspective information of the current perspective includes spatial information of the current perspective and viewpoint information of the current perspective.
  • the spatial information of the current viewing angle includes at least one of viewing angle direction information of the current viewing angle, viewing angle direction change speed information of the current viewing angle, and viewing angle coverage information of the current viewing angle.
  • the viewpoint information of the current viewpoint includes at least one of viewpoint position information of the current viewpoint, viewpoint displacement speed information of the current viewpoint, and acceleration information of viewpoint displacement speed of the current viewpoint.
  • the above method further includes: the client sends the first viewpoint information to the server, where the first viewpoint information is used to indicate a current viewpoint where the current perspective is located.
  • the client reports the viewpoint information of the viewpoint where the current perspective is located to the server, so that the client can obtain a video image matching the current viewpoint from the server, which can improve the viewing effect of the user.
  • the first viewpoint information may include a variety of information about the current viewpoint.
  • the first viewpoint information includes at least one of spatial position information of the current viewpoint, position change speed information of the current viewpoint, and acceleration information of position change speed of the current viewpoint.
  • the server can be made to send the video image of the perspective area corresponding to the current viewpoint to the client, so that the client can obtain the video image that matches the current viewpoint.
  • based on the information reported by the client, the server can predictively render, prefetch, and deliver the video images to be sent to the client, which can reduce the delay of video image transmission to the client, thereby improving user experience.
  • the spatial position information of the current viewpoint may represent the coordinate values of the position where the current viewpoint is located, expressed in a three-dimensional coordinate system (which may be of various types, for example, a Cartesian coordinate system or a spherical coordinate system).
  • the position of the current viewpoint (also referred to as the position where the current viewpoint is located) may change over time, and the position change speed information of the current viewpoint indicates how fast the position of the current viewpoint changes.
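  • As a sketch of how the reported speed and acceleration could drive such prediction (the constant-acceleration model below is an assumption; the patent does not prescribe a prediction method):

```python
def predict_viewpoint(pos, vel, acc, dt):
    """Extrapolate the viewpoint position dt seconds ahead using
    p' = p + v*dt + 0.5*a*dt^2 per coordinate axis."""
    return tuple(p + v * dt + 0.5 * a * dt * dt
                 for p, v, a in zip(pos, vel, acc))

# the server might pre-render the view region around the predicted position
future_pos = predict_viewpoint(pos=(0.0, 1.6, 0.0),
                               vel=(0.2, 0.0, 0.0),
                               acc=(0.05, 0.0, 0.0),
                               dt=0.5)   # -> (0.10625, 1.6, 0.0)
```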
  • the first viewpoint information includes spatial position information of a current viewpoint
  • the foregoing method further includes: the client sends first indication information to the server, where the first indication information includes a first identification bit, and the value of the first identification bit is used to indicate whether the spatial position information of the current viewpoint is relative spatial position information or absolute spatial position information; when the first identification bit is the first value, the spatial position information of the current viewpoint is relative spatial position information; when the first identification bit is the second value, the spatial position information of the current viewpoint is absolute spatial position information.
  • when the spatial position information of the current viewpoint is relative spatial position information, it may be the position information of the current viewpoint relative to the starting viewpoint, a specified viewpoint, or the previous viewpoint.
  • when the spatial position information of the current viewpoint is absolute spatial position information, it may be the position information of the current viewpoint relative to a fixed coordinate system (the fixed coordinate system may be set in advance).
  • the first identification bit may correspond to one bit or multiple bits.
  • the first and second values may be 0 and 1, respectively, or 1 and 0, respectively.
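  • A minimal sketch of such an identification bit, assuming a single bit where 0 is the first value (relative) and 1 is the second value (absolute); as the bullet above notes, the assignment could equally be reversed.

```python
def encode_first_indication(is_relative: bool) -> int:
    """Pack the first identification bit into the low bit of one byte:
    0 -> relative spatial position info, 1 -> absolute (assumed mapping)."""
    return 0 if is_relative else 1

def decode_first_indication(flags: int) -> bool:
    """Return True if the spatial position information is relative."""
    return (flags & 0x01) == 0
```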
  • the above-mentioned spatial information of the current perspective may include only the perspective direction information of the current perspective, or may include both the perspective direction information of the current perspective and the position information of the viewpoint where the current perspective is located.
  • the first indication information may be carried in a real-time transmission control protocol RTCP source description report sent by the client to the server.
  • when the spatial position information of the current viewpoint is relative spatial position information, the amount of data contained in the reported spatial information of the current viewpoint can be reduced, which reduces resource overhead.
  • the client may send instruction information to the server to indicate the composition of the spatial information of the current perspective, so that the server can accurately obtain the spatial information of the current perspective.
  • the above method further includes: the client sends second indication information to the server, where the second indication information includes a second identification bit, and the value of the second identification bit is used to indicate the composition of the spatial information of the current perspective, covering at least one of the following cases: when the second identification bit is the third value, the spatial information of the current perspective is composed of the perspective direction information of the current perspective; when the second identification bit is the fourth value, the spatial information of the current perspective is composed of the perspective direction information of the current perspective and the position information of the viewpoint where the current perspective is located.
  • optionally, the spatial information of the current perspective is composed of the perspective direction information of the current perspective, the position information of the viewpoint where the current perspective is located, and the perspective size information of the current perspective (also referred to as the viewing angle coverage information of the current perspective).
  • the above-mentioned spatial information of the current perspective may include at least one of perspective direction information of the current perspective, position information of the viewpoint where the current perspective is located, and perspective size information of the current perspective. Any different information combination included in the spatial information of the current perspective can be indicated through different values of the second identification bit.
  • for example, the composition of the spatial information of the current perspective may be indicated by the following different values of the second identification bit (one possible encoding is sketched after this list):
  • one value indicates that the spatial information of the current perspective includes the perspective size information of the current perspective;
  • one value indicates that the spatial information of the current perspective includes the position information of the viewpoint where the current perspective is located;
  • one value indicates that the spatial information of the current perspective includes the position information of the viewpoint where the current perspective is located and the perspective size information of the current perspective;
  • one value indicates that the spatial information of the current perspective includes the position information of the viewpoint where the current perspective is located, the perspective size information of the current perspective, and the perspective direction information of the current perspective.
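  • The sketch below shows one hypothetical assignment of identification-bit values to compositions; the concrete numbering (3 through 8) and field names are illustrative only, chosen to mirror the cases listed above.

```python
# Hypothetical mapping from second-identification-bit values to the
# components present in the spatial information of the current perspective.
COMPOSITION = {
    3: ("direction",),                          # third value
    4: ("direction", "viewpoint_position"),     # fourth value
    5: ("size",),
    6: ("viewpoint_position",),
    7: ("viewpoint_position", "size"),
    8: ("viewpoint_position", "size", "direction"),
}

def parse_spatial_info(flag: int, fields: dict) -> dict:
    """Keep only the components that the identification bit marks present."""
    return {name: fields[name] for name in COMPOSITION[flag]}

info = parse_spatial_info(4, {"direction": (30.0, -10.0, 0.0),
                              "viewpoint_position": (0.0, 1.6, 0.0),
                              "size": (90.0, 60.0)})
```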
  • the above-mentioned second indication information may be carried in a real-time transmission control protocol RTCP source description report sent by the client to the server.
  • the client sends third indication information to the server, where the third indication information includes a third identification bit, and the value of the third identification bit is used to indicate whether the viewing direction information of the current viewing angle is absolute viewing direction information or relative viewing direction information; when the value of the third identification bit is the sixth value, the viewing direction information of the current viewing angle is absolute viewing direction information; when the value of the third identification bit is the seventh value, the viewing direction information of the current viewing angle is relative viewing direction information.
  • the third indication information may be carried in a real-time transmission control protocol RTCP source description report sent by the client to the server.
  • when the viewing direction information of the current viewing angle is absolute viewing direction information, it may be viewing direction information relative to a fixed coordinate system; when the viewing direction information of the current viewing angle is relative viewing direction information, it may be viewing direction information relative to a previous viewing direction (for example, a deflection angle with respect to the previous viewing direction, or a deflection angle with respect to the initial viewing direction).
  • when relative viewing direction information is used, the amount of data contained in the viewing direction information of the current viewing angle can be reduced, which reduces resource overhead.
  • the first information, the first viewpoint information, the first indication information, and the second indication information are all carried in a real-time transmission control protocol RTCP source description report sent by the client to the server.
  • in this way, the multiple pieces of information can be sent to the server in a single RTCP source description report, which reduces the number of client-server interactions and the resource overhead.
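  • RFC 3550 defines a PRIV (type 8) SDES item for application-specific extensions; the sketch below uses it as one plausible carrier for the serialized view and viewpoint fields. The prefix string "vr-view" and the payload layout are assumptions, not specified by the patent.

```python
import struct

SDES_PRIV = 8  # RTCP SDES PRIV item type (RFC 3550)

def build_sdes_with_view_info(ssrc: int, value: bytes,
                              prefix: bytes = b"vr-view") -> bytes:
    """Build a one-chunk RTCP SDES packet whose PRIV item carries the
    serialized first information / viewpoint / indication fields."""
    item = (bytes([SDES_PRIV, 1 + len(prefix) + len(value), len(prefix)])
            + prefix + value)
    chunk = struct.pack("!I", ssrc) + item
    # terminate the item list with a null octet and pad to a 32-bit boundary
    pad = 4 - len(chunk) % 4 if len(chunk) % 4 else 4
    chunk += b"\x00" * pad
    header = struct.pack("!BBH", 0x81, 202, len(chunk) // 4)  # V=2, SC=1, PT=SDES
    return header + chunk
```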
  • the spatial information of the current perspective includes area range information of the current perspective.
  • the area range information of the current viewing angle includes an azimuth range (yaw angle range) and a pitch angle range of the current viewing angle.
  • the area range information of the current perspective includes the width and height of the two-dimensional perspective area corresponding to the current perspective.
  • the area range of the user's perspective can also be fixed. In that case, the client only needs to report the area range information of the perspective to the server once; afterwards, reporting the position information of the current perspective is sufficient, and the area range information does not need to be reported repeatedly.
  • the area range information of the current perspective may specifically be used to indicate a range of a region where the current perspective is located.
  • the area range information of the current viewing angle includes an azimuth range (yaw angle range) and a pitch angle range of the area of the current viewing angle.
  • the area range information of the current viewing angle includes the width and height of a planar area corresponding to the current viewing angle.
  • the area range information of the current viewing angle is specifically H and V, where H and V represent the azimuth range and the pitch range of the spherical coordinates of the VR, respectively.
  • the panoramic video image includes at least two sub-regions obtained by dividing the panoramic video image, where the current perspective covers at least one sub-region, and the first information is used to indicate the sub-region covered by the current perspective.
  • the sub-area covered by the current perspective is used for stitching to obtain the area of the first target video image.
  • when the first information indicates the sub-area covered by the current perspective, the client directly reports the information of the sub-area covered by the current perspective to the server, so that after receiving the first information the server can directly obtain the information of the sub-area covered by the current perspective and directly determine the second target video image according to that information, which can reduce the complexity for the server of determining the second target video image.
  • the sub-region covered by the current perspective includes both the sub-region completely covered by the region of the current perspective and the sub-region partially covered by the region of the current perspective.
  • for example, if the area of the current perspective completely covers sub-region 1 and partially covers sub-regions 2 and 3, then the sub-regions covered by the current perspective include sub-region 1, sub-region 2, and sub-region 3.
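  • A minimal sketch of how a client might compute the covered sub-regions for a rectangular viewport over a grid-divided panorama (the row-major grid numbering and function name are assumptions for the example):

```python
import math

def covered_subregions(vx: float, vy: float, vw: float, vh: float,
                       pano_w: int, pano_h: int,
                       cols: int, rows: int) -> list:
    """Return the IDs (row-major, starting at 0) of the grid sub-regions
    that a viewport rectangle overlaps; (vx, vy) is its upper left corner
    in panoramic-image coordinates."""
    tw, th = pano_w / cols, pano_h / rows
    c0, c1 = int(vx // tw), int(math.ceil((vx + vw) / tw)) - 1
    r0, r1 = int(vy // th), int(math.ceil((vy + vh) / th)) - 1
    return [r * cols + c
            for r in range(max(r0, 0), min(r1, rows - 1) + 1)
            for c in range(max(c0, 0), min(c1, cols - 1) + 1)]

# a viewport overlapping sub-regions 0 and 1 of a 4x2 grid -> [0, 1]
print(covered_subregions(100, 100, 2000, 1000, 7680, 3840, 4, 2))
```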
  • the second information includes spatial information of a region of the second target video image.
  • the spatial information of the area of the second target video image includes the spatial position information of the area where the second target video image is located
  • the spatial position information of the area where the second target video image is located is at least one of the following: the spherical coordinate value of the center point of the spherical area corresponding to the second target video image, the spherical coordinate value of the upper left corner of the spherical area corresponding to the second target video image, the plane coordinate value of the center point of the plane area corresponding to the second target video image, or the plane coordinate value of the upper left corner of the plane area corresponding to the second target video image.
  • for example, the spherical coordinate value of the center point of the spherical area corresponding to the second target video image is (X, Y, Z), where X corresponds to the azimuth (or yaw) angle of the spherical coordinates, Y corresponds to the pitch (or elevation) angle, and Z corresponds to the tilt (or roll) angle.
  • for example, the plane coordinate value of the center point of the plane area corresponding to the second target video image is (X, Y), where X and Y respectively represent the abscissa and ordinate of that center point in a two-dimensional rectangular coordinate system.
  • for example, the plane coordinate value of the upper left corner of the plane area corresponding to the second target video image is (X, Y), where X and Y respectively represent the abscissa and ordinate of that upper left corner in a two-dimensional rectangular coordinate system.
  • the spatial position information of the area where the second target video image is located may also be the two-dimensional coordinate value of the upper right corner, the lower left corner, the lower right corner, or any preset position of the area corresponding to the second target video image.
  • the spatial information of the second target video image includes area range information of the second target video image.
  • the area range information of the second target video image includes an azimuth range (yaw angle range) and a pitch angle range of the area of the second target video image.
  • the area range information of the second target video image is specifically H and V, where H and V represent the azimuth range and the elevation range of the VR spherical coordinates, respectively.
  • the area range information of the second target video image includes the width and height of the two-dimensional video image corresponding to the second target video image.
  • the panoramic video image includes at least two sub-regions obtained by dividing the panoramic video image, where the current perspective covers at least one sub-region, the second information is used to indicate the sub-regions covered by the second target video image, and the sub-regions covered by the second target video image are used for stitching to obtain the region of the second target video image.
  • the second information includes at least one piece of third information; each video data packet in the at least one video data packet carries a piece of third information, and the at least one video data packet collectively carries the at least one piece of third information.
  • the third information carried in any one of the at least one video data packet is used to indicate a sub-region to which the video image corresponding to any video data packet belongs.
  • for example, suppose the number of video data packets corresponding to the second target video image is 100. If the second information includes only one piece of third information, the third information may be carried only in the first video data packet or in the 100th video data packet; if the second information includes 10 pieces of third information, the 10 pieces of third information may be carried in any 10 of the 100 video data packets.
  • the at least one video data packet includes a video data packet that carries a view identifier.
  • any one of the at least one video data packet may carry the perspective identifier, and the perspective identifier is used to indicate the perspective in which the video image corresponding to that video data packet is located.
  • the perspective identifier may be bound to the sub-region ID, that is, there is a corresponding relationship between the sub-region ID and the perspective identifier.
  • the client can distinguish video packets of different perspectives through the perspective identification, which is convenient for splicing video images corresponding to video packets in a certain perspective.
  • the second target video image is the same as the first target video image.
  • the bandwidth occupied when transmitting the target video image can be reduced.
  • the second target video image further includes a video image other than the first target video image.
  • in this way, the client may display video images outside the current perspective in addition to the video image within the current perspective, so that the user can still see video images when suddenly turning (or suddenly changing the viewing angle).
  • optionally, the second target video image further includes a panoramic video image.
  • in this way, the panoramic video image can be displayed in addition to the first target video image corresponding to the current perspective, so that the user can still see video images when turning quickly (for example, quickly turning the head); this plays a certain transition role, preventing the situation in which no video content is visible during a sudden turn.
  • the image quality of the panoramic video image is lower than the image quality of the first target video image.
  • the above method further includes: the client receives a description file sent by the server, where the description file carries first perspective information or second perspective information, the first perspective information is used to indicate the maximum viewing angle range supported by the server, and the second perspective information is used to indicate the range of the initial viewing angle.
  • in this way, the client can obtain the viewing angle range supported by the server or the initial viewing angle of the server, which makes it convenient for the client, after subsequently receiving video data packets, to stitch the decoded video images according to the supported viewing angle range and the initial viewing angle.
  • the range of the area of the first target video image indicated by the first information should be within the maximum range of the view angle supported by the server indicated by the first perspective information.
  • the client can present the video image according to the initial perspective after the client is powered on. Next, the client can present the video image according to the current perspective of the user.
  • for example, after the client is turned on, the video image in perspective 1 (the initial perspective) is displayed, but the user wants to view the video image in perspective 2 (the user's current perspective); the client can then switch from perspective 1 to perspective 2 and present the video image in perspective 2.
  • the initial perspective here may be a preset perspective; when playback starts, the video image in the initial perspective is presented first.
  • the area range information of the initial viewing angle includes an azimuth range (yaw angle range) and a pitch angle range of the area of the initial viewing angle.
  • the area range information of the initial viewing angle includes the width and height of a planar area corresponding to the initial viewing angle.
  • the area range information of the initial viewing angle is specifically H and V, where H and V represent the azimuth range and the pitch range of the spherical coordinates of the VR, respectively.
  • the second viewing angle information may be used to indicate a region range of a default viewing angle in addition to a region range of the initial viewing angle.
  • the server may directly send the video image in the default perspective to the client, so that the client presents the video image in the range of the default perspective.
  • the above method further includes: the client sends a video description command to the server.
  • by sending a video description command to the server, the client can trigger the server to send the description file to the client.
  • when the description file carries the second perspective information, the description file also carries third perspective information, and the third perspective information is further used to indicate the spatial position of the initial perspective.
  • the third perspective information is any one of the following information: spherical coordinate value of the center point of the spherical area corresponding to the initial perspective, spherical coordinate value of the upper left corner of the spherical area corresponding to the initial perspective, and a plane corresponding to the initial perspective The plane coordinate value of the center point of the region, and the plane coordinate value of the upper left corner of the plane region corresponding to the initial viewing angle.
  • for example, the spherical coordinate value of the center point of the spherical area corresponding to the initial perspective is (X, Y, Z), where X corresponds to the azimuth (or yaw) angle of the spherical coordinates, Y corresponds to the pitch (or elevation) angle, and Z corresponds to the tilt (or roll) angle.
  • for example, the plane coordinate value of the center point of the plane area corresponding to the initial perspective is (X, Y), where X and Y respectively represent the abscissa and ordinate of that center point in a two-dimensional rectangular coordinate system.
  • for example, the plane coordinate value of the upper left corner of the plane area corresponding to the initial perspective is (X, Y), where X and Y respectively represent the abscissa and ordinate of that upper left corner in a two-dimensional rectangular coordinate system.
  • the above method further includes: the client receives a description file sent by the server, where the panoramic video image includes at least two sub-regions obtained by dividing the panoramic video image, and the description file carries sub-region description information for each sub-region; each sub-region is a sub-region of the area of the panoramic video image, and the sub-region description information includes the spatial information of the sub-region.
  • in this way, after receiving the video data packets of the target video image, the client can stitch the video images according to the information of each sub-region, so as to obtain the video content within the current perspective.
  • the above method further includes: the client sends a video description command to the server.
  • by sending a video description command to the server, the client can trigger the server to send the description file to the client.
  • the sub-region description information includes planar space information of each sub-region
  • the sub-region description information also includes mapping type information of video images within each sub-region
  • the planar space information of the sub-region is a two-dimensional coordinate value of a center point of the sub-region or a two-dimensional coordinate value of an upper left corner of the sub-region.
  • for example, the two-dimensional coordinate value of the center point of the sub-region is (X, Y), where X and Y respectively represent the abscissa and ordinate of the center point of the sub-region in a two-dimensional rectangular coordinate system.
  • for example, the two-dimensional coordinate value of the upper-left corner of the sub-region is (X, Y), where X and Y respectively represent the abscissa and ordinate of the upper-left corner of the sub-region in a two-dimensional rectangular coordinate system.
  • the planar space information of the sub-region may also be the two-dimensional coordinate value of the upper-right corner, the lower-left corner, the lower-right corner, or any preset position of the sub-region.
  • the mapping type indicated by the mapping type information is any one of a latitude and longitude map, a hexahedron, and an octahedron.
  • the sub-region description information includes spherical space information of each sub-region, and the sub-region description information further includes shape information of each sub-region.
  • the spherical space information of the sub-region may be represented by an azimuth angle, an elevation angle, and an inclination angle of a center point of the sub-region.
  • the shape information of the sub-region may indicate the shape type of the sub-region.
  • for example, the shape type of the sub-region may be a region enclosed by four great circles, or a region enclosed by two great circles and one small circle.
  • the three-dimensional space information of the sub-region includes a mapping type of the sub-region image, shape information of the sub-region, angle information of the sub-region, and region range information of the sub-region.
  • the description file further includes panoramic bitstream description information, and the panoramic bitstream description information includes mapping type information and size information of the panoramic video image.
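  • A hypothetical description file carrying sub-region description information and panoramic bitstream description information might look like the following, shown as a Python dict; the field names and values are illustrative, not mandated by the patent.

```python
description_file = {
    "panorama": {             # panoramic bitstream description information
        "mapping_type": "equirectangular",
        "width": 7680,
        "height": 3840,
    },
    "subregions": [           # sub-region description information
        {"id": 0, "x": 0,    "y": 0, "width": 1920, "height": 1920,
         "mapping_type": "equirectangular"},   # planar space info + mapping type
        {"id": 1, "x": 1920, "y": 0, "width": 1920, "height": 1920,
         "mapping_type": "equirectangular"},
        # ... one entry per sub-region of the panorama
    ],
}
```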
  • the first information is carried in a real-time transport control protocol (RTCP) source description report sent by the client to the server, and the video data packets of the target video image are real-time transport protocol (RTP) video data packets.
  • the first information and the second information are information in a customized TLV (type-length-value) format.
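  • A minimal sketch of such a customized TLV encoding, assuming a 1-byte type and a 2-byte big-endian length (the patent does not fix the field widths):

```python
import struct

def tlv_encode(t: int, value: bytes) -> bytes:
    """Type (1 byte) + length (2 bytes, big-endian) + value."""
    return struct.pack("!BH", t, len(value)) + value

def tlv_decode(buf: bytes):
    """Yield (type, value) pairs from concatenated TLV records."""
    off = 0
    while off + 3 <= len(buf):
        t, length = struct.unpack_from("!BH", buf, off)
        off += 3
        yield t, buf[off:off + length]
        off += length

# e.g. type 1 = serialized first information, type 2 = serialized second information
msg = tlv_encode(1, b"\x00\x1e\xff\xf6") + tlv_encode(2, b"\x00\x00")
print(list(tlv_decode(msg)))
```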
  • the first information and the second information are information in a signaling format for multimedia transmission applications defined in MPEG media transport (MMT).
  • that is, the information about the current perspective or the current viewpoint transmitted between the client and the server may be information in a real-time transport protocol, information in a custom TLV format, or information in the signaling format for multimedia transmission applications defined in MMT.
  • At least one video data packet collectively carries second viewpoint information, and the second viewpoint information is used to indicate a viewpoint corresponding to a second target video image.
  • the above-mentioned second viewpoint information may include multiple types of information of the viewpoint corresponding to the second target video image.
  • for example, the second viewpoint information may include at least one of the spatial position information, the position change speed information, and the acceleration information of the position change speed of the viewpoint corresponding to the second target video image.
  • the above-mentioned second viewpoint information is similar to the first viewpoint information, and may also include various kinds of information included in the first viewpoint information. For details, refer to related content of the first viewpoint information, and the description is not repeated here.
  • this makes it convenient for the client to present the target video image according to the corresponding viewpoint, improving the display effect.
  • a method for transmitting media data includes: a client receives a description file sent by a server, where the description file carries session description information of at least two sessions, the at least two sessions are sessions between the client and the server, the at least two sessions are used to transmit code stream data of the corresponding sub-region images, and the session description information includes the spatial information of the sub-region corresponding to the code stream data of the sub-region image transmitted through each session.
  • each sub-region is obtained by dividing the area of the panoramic video image.
  • the sub-region image is the video image within the sub-region.
  • the client sends first information to the server, where the first information is used to indicate the session corresponding to the sub-region covered by the current perspective, and the first information is determined according to the current viewing angle and the session description information; the client receives code stream data of the target video image sent by the server, where the target video image includes the image of the sub-region to which the current viewing angle belongs.
  • the above current perspective may refer to a perspective when a user watches a video using a client.
  • the client may send new first information to the server again to request video content within the new perspective.
  • the target video image is a panoramic video image.
  • the at least two sessions are all or part of a session established between the client and the server.
  • by obtaining a description file carrying the session description information of each session, the client can process the received code stream data of the target video image according to the description information.
  • one session may also be used to transmit video images of all sub-regions.
  • for example, if the target video image includes the video images in sub-region 1 (corresponding to session 1), sub-region 2 (corresponding to session 2), and sub-region 3 (corresponding to session 3), then session 1, session 2, and session 3 can be used to transmit the video images in sub-region 1, sub-region 2, and sub-region 3, respectively; alternatively, one session (any one of session 1, session 2, and session 3) may be used to transmit the video images in sub-region 1 to sub-region 3.
  • the client receiving the code stream data of the target video image sent by the server includes: the client receives, through the session corresponding to the sub-region covered by the current perspective, the code stream data of the sub-region image covered by the current perspective, so as to obtain the code stream data of the target video image.
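  • A sketch of this client-side session selection under these assumptions (the session-to-sub-region table would come from the description file; the names are illustrative):

```python
def select_sessions(covered_ids: set, session_table: dict) -> list:
    """Given the sub-region IDs covered by the current perspective and a
    mapping session -> set of sub-region IDs it transmits (taken from the
    description file), return the sessions to receive code stream data on."""
    return [s for s, ids in session_table.items()
            if covered_ids & ids]

# sessions 1..3 each carry one sub-region's code stream data
print(select_sessions({1, 2}, {1: {1}, 2: {2}, 3: {3}}))  # -> [1, 2]
```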
  • when the area space information of the sub-region is the planar area space information of the sub-region, the session description information further includes the mapping type information of the code stream data of the sub-region image transmitted by each session.
  • when the area space information of the sub-region is the spherical area space information of the sub-region, the session description information further includes the shape information of the sub-region corresponding to the code stream data of the sub-region image transmitted by each session.
  • the first information is carried in a real-time transport control protocol (RTCP) source description report sent by the client to the server, or the first information is information in TLV format.
  • a method for transmitting media data includes: a server receives first information sent by a client, where the first information is used to indicate the spatial position of a region of a first target video image, and the first target video image includes the video image within the current perspective; the server determines a second target video image according to the first information, where the second target video image includes the first target video image; and the server sends video data packets corresponding to the second target video image to the client, where at least one of the video data packets corresponding to the second target video image collectively carries second information, and the second information is used to indicate the spatial information of the region of the second target video image.
  • after the server obtains the first information characterizing the video content in the current perspective, it can send the second target video image and the position-related information of the second target video image required by the user to the client in real time, which can reduce the time delay for the client to obtain the position-related information of the second target video image.
  • the first target video image is a video image within a current perspective
  • the first information includes spatial information of the current perspective
  • the first target video image is a video image within a current perspective
  • the first information includes perspective information of the current perspective
  • the perspective information of the current perspective includes spatial information of the current perspective and viewpoint information of the current perspective.
  • the spatial information of the current viewing angle includes at least one of viewing angle direction information of the current viewing angle, viewing angle direction change speed information of the current viewing angle, and viewing angle coverage information of the current viewing angle.
  • the viewpoint information of the current viewpoint includes at least one of viewpoint position information of the current viewpoint, viewpoint displacement speed information of the current viewpoint, and acceleration information of viewpoint displacement speed of the current viewpoint.
  • the above method further includes: the server receives the first viewpoint information sent by the client, where the first viewpoint information is used to indicate a current viewpoint where the current perspective is located.
  • the server receives the viewpoint information, reported by the client, of the viewpoint where the current perspective is located, so that the server can transmit to the client a video image matching the current viewpoint, which can improve the user's viewing effect.
  • the first viewpoint information may include a variety of information about the current viewpoint.
  • the first viewpoint information includes at least one of spatial position information of the current viewpoint, position change speed information of the current viewpoint, and acceleration information of position change speed of the current viewpoint.
  • when the server obtains the spatial position information of the current viewpoint reported by the client, it can obtain a video image of the perspective area corresponding to the current viewpoint and send the video image to the client, so that the client can obtain a video image matching the current viewpoint, which can improve the display effect.
  • after the server obtains the position change speed information of the current viewpoint and/or the acceleration information of the position change speed of the current viewpoint reported by the client, the server can predictively render, prefetch, and deliver the video images to be sent to the client according to the reported information, which can reduce the delay of video image transmission to the client, thereby improving user experience.
  • the spatial position information of the current viewpoint may specifically be the coordinate values of the position where the current viewpoint is located, expressed in a three-dimensional coordinate system (which may be of various types, for example, a Cartesian coordinate system or a spherical coordinate system).
  • the position of the current viewpoint (also referred to as the position where the current viewpoint is located) may change over time, and the position change speed information of the current viewpoint indicates how fast the position of the current viewpoint changes.
  • the first viewpoint information includes spatial position information of the current viewpoint
  • the foregoing method further includes: receiving, by the server, first indication information sent by the client, where the first indication information includes a first identification bit, and the value of the first identification bit is used to indicate whether the spatial position information of the current viewpoint is relative spatial position information or absolute spatial position information; when the first identification bit is the first value, the spatial position information of the current viewpoint is relative spatial position information; when the first identification bit is the second value, the spatial position information of the current viewpoint is absolute spatial position information.
  • the spatial position information of the current viewpoint may be the relative position information of the current viewpoint with respect to the starting viewpoint, a specified viewpoint, or the previous viewpoint.
  • the above-mentioned spatial information of the current perspective may include only the perspective direction information of the current perspective, or may include both the perspective direction information of the current perspective and the position information of the viewpoint where the current perspective is located.
  • the client may send instruction information to the server to indicate the composition of the spatial information of the current perspective, so that the server can accurately obtain the spatial information of the current perspective.
  • the above method further includes: the server receives second indication information sent by the client, where the second indication information includes a second identification bit whose value indicates the composition of the spatial information of the current perspective, covering at least one of the following cases: when the second identification bit takes the third value, the spatial information of the current perspective consists of the perspective direction information of the current perspective; when the second identification bit takes the fourth value, the spatial information of the current perspective consists of the perspective direction information of the current perspective and the position information of the viewpoint where the current perspective is located.
  • the spatial information of the current perspective consists of the perspective direction information of the current perspective, the position information of the viewpoint where the current perspective is located, and the perspective size information of the current perspective.
  • the above-mentioned spatial information of the current perspective may include at least one of the perspective direction information of the current perspective, the position information of the viewpoint where the current perspective is located, and the perspective size information of the current perspective; any combination of information included in the spatial information of the current perspective can be indicated through different values of the second identification bit, as in the sketch below.
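  • As an illustrative sketch only (the flag values below are placeholders, not values defined by this application), the first and second identification bits could be interpreted as follows:

      # Hypothetical decoding of the identification bits described above.
      RELATIVE_POSITION = 0        # first value: position info is relative
      ABSOLUTE_POSITION = 1        # second value: position info is absolute
      DIRECTION_ONLY = 0           # third value: direction info only
      DIRECTION_AND_VIEWPOINT = 1  # fourth value: direction + viewpoint position

      def is_absolute_position(first_bit: int) -> bool:
          # True when the reported viewpoint coordinates are absolute
          return first_bit == ABSOLUTE_POSITION

      def perspective_fields(second_bit: int) -> tuple:
          # Returns the field set that makes up the spatial information
          if second_bit == DIRECTION_ONLY:
              return ("direction",)
          return ("direction", "viewpoint_position")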
  • the first information, the first viewpoint information, the first indication information, and the second indication information are all carried in a real-time transmission control protocol RTCP source description report sent by the client to the server.
  • by receiving the RTCP source description report once, the server can obtain the multiple pieces of information sent by the client, which can reduce the number of client-server interactions and reduce resource overhead.
  • the panoramic video image includes at least two sub-regions obtained by dividing the panoramic video image, where the current perspective covers at least one sub-region, and the first information is used to indicate the sub-region covered by the current perspective.
  • the sub-regions covered by the current perspective are stitched to obtain the region of the first target video image.
  • the second information includes spatial information of a region of the second target video image.
  • the panoramic video image includes at least two sub-regions obtained by dividing the panoramic video image, where the current perspective covers at least one sub-region, the second information is used to indicate the sub-regions covered by the second target video image, and the sub-regions covered by the second target video image are stitched to obtain the region of the second target video image.
  • the second information includes at least one piece of third information; each video data packet in the at least one video data packet carries third information, the at least one video data packet carries at least one piece of third information in total, and the third information carried in any one of the at least one video data packet is used to indicate the sub-region to which the video image corresponding to that video data packet belongs.
  • At least one video data packet includes a video data packet that carries a view identifier.
  • the above method further includes: the server sends a description file to the client, and the description file carries the first perspective information or the second perspective information, where the first perspective information is used to indicate the maximum area range of the perspective supported by the server, and the second perspective information is used to indicate the area range of the initial perspective.
  • when the description file carries the second perspective information, the description file also carries third perspective information, and the third perspective information is further used to indicate the spatial position of the initial perspective.
  • the above method further includes: the server sends a description file to the client, where the panoramic video image includes at least two sub-regions obtained by dividing the panoramic video image, the description file carries the sub-region description information of each sub-region, and the sub-region description information includes the spatial information of the sub-region.
  • the sub-region description information includes planar space information of each sub-region
  • the sub-region description information also includes mapping type information of video images within each sub-region
  • the sub-region description information includes spherical space information of each sub-region, and the sub-region description information further includes shape information of each sub-region.
  • At least one video data packet collectively carries second viewpoint information, and the second viewpoint information is used to indicate a viewpoint corresponding to a second target video image.
  • the above-mentioned second viewpoint information may include multiple types of information of the viewpoint corresponding to the second target video image.
  • the above-mentioned second viewpoint information may include at least one of the spatial position information of the corresponding viewpoint, the position change speed information of that viewpoint, and the acceleration information of the position change speed of that viewpoint.
  • the server carries, in the video data packets, the viewpoint corresponding to the target video image to be displayed, which facilitates the client presenting the target video image to be displayed according to the corresponding viewpoint and improves the display effect, as modeled in the sketch below.
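  • For illustration only, the viewpoint information described above can be modeled as a simple record; the field names below are assumptions introduced for this sketch, not terms defined by this application:

      from dataclasses import dataclass

      @dataclass
      class ViewpointInfo:
          position: tuple                          # (x, y, z) spatial position of the viewpoint
          velocity: tuple = (0.0, 0.0, 0.0)        # position change speed, per axis
          acceleration: tuple = (0.0, 0.0, 0.0)    # acceleration of the position change speed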
  • a method for transmitting media data includes: the server sends a description file to the client, and the description file carries session description information of at least two sessions, and the at least two sessions are sessions between the client and the server.
  • the at least two sessions are used to transmit the stream data of the corresponding sub-region images, and the session description information includes the spatial information of the sub-region corresponding to the stream data of the sub-region image transmitted through each session, where a sub-region is an area obtained by dividing the panoramic video image, and a sub-region image is the video image within the sub-region;
  • the server receives first information sent by the client, and the first information is used to indicate a session corresponding to the sub-area covered by the current perspective, The first information is determined according to the current viewing angle and the session description information;
  • the server sends code stream data of the target video image to the client, and the target video image includes a video image within a sub-region covered by the current viewing angle.
  • the at least two sessions are all or part of a session established between the client and the server.
  • the server transmits a description file carrying session description information of each session to the client before transmitting the target video image to the client, so that the client can process the received stream data of the target video image according to the description information.
  • the client receiving the stream data of the target video image sent by the server includes: the client receives, through the sessions corresponding to the sub-regions covered by the current perspective, the code stream data of the sub-region images covered by the current perspective, thereby obtaining the code stream data of the target video image.
  • when the area space information of the sub-region is the planar area space information of the sub-region, the session description information further includes the mapping type information of the code stream data of the sub-region image transmitted by each session.
  • when the area space information of the sub-region is the spherical area space information of the sub-region, the session description information further includes the shape information of the sub-region corresponding to the code stream data of the sub-region image transmitted by each session.
  • the first information is carried in a real-time transmission control protocol RTCP source description report sent by the client to the server, and the first information is information in TLV format, as in the sketch below.
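  • The TLV (type-length-value) layout itself is conventional; a minimal sketch of how the first information could be packed is given below, where the type code is a placeholder rather than a value defined by this application:

      import struct

      TYPE_FIRST_INFO = 0x01  # placeholder type code for the first information

      def tlv_encode(t: int, value: bytes) -> bytes:
          # 1-byte type, 2-byte big-endian length, then the value itself
          return struct.pack("!BH", t, len(value)) + value

      def tlv_decode(buf: bytes):
          t, length = struct.unpack("!BH", buf[:3])
          return t, buf[3:3 + length]

      # e.g. encode the IDs of the sub-regions covered by the current perspective
      packet = tlv_encode(TYPE_FIRST_INFO, bytes([2, 3, 6, 7]))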
  • a client includes a module for executing a method in any one of the foregoing implementation manners of the first aspect or the second aspect.
  • a client is a device capable of presenting a video image to a user.
  • a server includes a module for executing the method in any one of the foregoing third or fourth aspects.
  • the server is a device capable of storing a video image
  • the server may provide the video image to the client, so that the client can present the video image to the user.
  • a client is provided, including a non-volatile memory and a processor coupled to each other, where the processor is configured to call program code stored in the memory to execute some or all of the steps of the method in any implementation of the first aspect or the second aspect.
  • a server is provided, including a non-volatile memory and a processor coupled to each other, where the processor is configured to call program code stored in the memory to execute some or all of the steps of the method in any implementation of the third aspect or the fourth aspect.
  • a computer-readable storage medium is provided, storing program code, where the program code includes instructions for executing some or all of the steps of the method in any implementation of the first aspect, the second aspect, the third aspect, or the fourth aspect.
  • a computer program product is provided; when the computer program product runs on a computer, the computer is caused to execute instructions for some or all of the steps of the method in any implementation of the first aspect, the second aspect, the third aspect, or the fourth aspect.
  • FIG. 1 is a schematic flowchart of a method for transmitting media data according to an embodiment of the present application
  • FIG. 2 is a schematic flowchart of a method for transmitting media data according to an embodiment of the present application
  • FIG. 3 is a schematic flowchart of a method for transmitting media data according to an embodiment of the present application
  • FIG. 4 is a schematic flowchart of a method for transmitting media data according to an embodiment of the present application
  • FIG. 5 is a schematic flowchart of a method for transmitting media data according to an embodiment of the present application
  • FIG. 6 is a schematic flowchart of a method for transmitting media data according to an embodiment of the present application.
  • FIG. 7 is a schematic flowchart of a method for transmitting media data according to an embodiment of the present application.
  • FIG. 8 is a schematic flowchart of a method for transmitting media data according to an embodiment of the present application.
  • FIG. 9 is a schematic flowchart of a method for transmitting media data according to an embodiment of the present application.
  • FIG. 10 is a schematic block diagram of a client according to an embodiment of the present application.
  • FIG. 11 is a schematic block diagram of a server according to an embodiment of the present application.
  • FIG. 12 is a schematic diagram of a hardware structure of an apparatus for transmitting media data according to an embodiment of the present application.
  • FIG. 1 is a schematic flowchart of a method for transmitting media data according to an embodiment of the present application.
  • the method shown in FIG. 1 may be executed by a client.
  • the client may be a program provided on a terminal device to provide a video playback service for the user.
  • the terminal device may be a device having a function of playing a panoramic video, such as a VR device.
  • the method shown in FIG. 1 includes steps 110 and 120. Steps 110 and 120 are described in detail below.
  • the client sends the first information to the server.
  • the above-mentioned first information is information for determining video content within a current perspective.
  • the first information may be specifically used to indicate spatial information of a region of a first target video image, where the first target video image includes a video image within a current perspective.
  • the current viewing angle may refer to a viewing angle when a user watches a video using a client.
  • the client sends the first information to the server to request the video content within the current perspective; if the user's perspective changes, the client sends first information to the server again to request the video content within the new perspective.
  • the client receives a video data packet corresponding to the second target video image sent by the server.
  • the at least one video data packet carries second information in total, the second information is used to indicate the spatial information of the region of the second target video image, and the second target video image includes the first target video image.
  • that the second target video image includes the first target video image may specifically mean that the second target video image includes only the first target video image, or that the second target video image includes other video images in addition to the first target video image; that is, in addition to sending the first target video image to the client, the server may also send other video images to the client.
  • that the at least one video data packet carries the second information may mean that each video data packet in the at least one video data packet carries at least a part of the second information, so that the at least one video data packet as a whole carries the complete second information.
  • the at least one video data packet may also carry code stream data while collectively carrying the second information.
  • the code stream data refers to data obtained by encoding a video image.
  • the region of the first target video image may refer to the region exactly covered or occupied by the first target video image; that is, the video images within the region of the first target video image all belong to the first target video image, and the video images in the first target video image are all within the region of the first target video image.
  • the second target video image also meets similar requirements.
  • the spatial information of the region of the first target video image may also be referred to as the regional spatial information of the first target video image.
  • the spatial information of the region of the first target video image is used to indicate the spatial range or spatial position of the region of the first target video image.
  • the spatial position of the region of the first target video image may be expressed with respect to a coordinate system, which may be a three-dimensional coordinate system or a two-dimensional coordinate system.
  • the origin of the three-dimensional coordinate system may be the center point of the panoramic video image, the upper-left corner point of the panoramic video image, or another fixed point in the panoramic video image.
  • the spatial position of the first target video image may also be the position of the first target video image within the region of the panoramic video image (that is, the relative spatial position of the first target video image).
  • the second information includes at least one piece of third information; each video data packet in the at least one video data packet carries third information, and the at least one video data packet carries at least one piece of third information in total.
  • the third information carried by any video data packet is used to indicate the sub-region to which the video image corresponding to that video data packet belongs.
  • for example, if the number of video data packets corresponding to the second target video image is 100 and the second information includes only one piece of third information, then the third information may be carried only in the first video data packet or in the 100th video data packet.
  • if the second information includes 10 pieces of third information, the 10 pieces of third information may be carried in any 10 of the 100 video data packets.
  • after the server obtains the relevant information characterizing the video content within the current perspective, it can send the target video image and the position-related information of the target video image to the client in real time, which can reduce the time delay for the client to obtain the position-related information of the target video image.
  • the first target video image is a video image within a current perspective
  • the first information includes spatial information of the current perspective
  • the spatial information of the current perspective can also be referred to as the regional spatial information of the current perspective.
  • when the first information includes the spatial information of the current perspective, the client directly reports the spatial information of the current perspective through the first information, the server determines the second target video image according to the spatial information of the current perspective and sends it to the client, and the video data packets of the target video image directly carry the spatial position information of the target video image.
  • the spatial information of the current perspective includes spatial position information of the current perspective.
  • the spatial position information of the current perspective includes the following situations:
  • the spherical coordinate value of the center point of the current perspective is (X, Y, Z), where X corresponds to the azimuth (or yaw angle) of the spherical coordinates, Y corresponds to the pitch angle (pitch or elevation) of the spherical coordinates, and Z corresponds to the tilt angle (or roll angle) of the spherical coordinates.
  • the two-dimensional coordinate value of the center point of the two-dimensional viewing area corresponding to the current viewing angle is (X, Y), where X and Y respectively represent the abscissa and ordinate of the center point of the two-dimensional viewing area in the two-dimensional rectangular coordinate system.
  • the two-dimensional coordinate value of the upper-left corner of the two-dimensional viewing area corresponding to the current viewing angle is (X, Y), where X and Y respectively represent the abscissa and ordinate of the upper-left corner of the two-dimensional viewing area in the two-dimensional rectangular coordinate system.
  • the spatial information of the current perspective includes area range information of the current perspective.
  • the area information of the current perspective includes the following situations:
  • for example, the azimuth (yaw) angle range of the current viewing angle is 110 degrees, and the pitch angle range is 90 degrees.
  • the coverage of the current perspective includes the width and height of the two-dimensional perspective area corresponding to the current perspective.
  • the coverage of the user's perspective can also be fixed; in this case, the client only needs to report the area range information of the perspective to the server once, and afterwards only needs to report the position information of the current perspective, without repeatedly reporting the area range information.
  • the spatial information of the area of the second target video image includes the spatial position information of the area where the second target video image is located.
  • the spatial location information of the area where the second target video image is located specifically includes the following situations:
  • the spherical coordinate value of the center point of the second target video image is (X, Y, Z), where X corresponds to the azimuth or yaw of spherical coordinates, and Y corresponds to the pitch angle of spherical coordinates (pitch or elevation), Z corresponds to the tilt angle or roll angle of spherical coordinates.
  • the two-dimensional coordinate value of the center point of the two-dimensional image corresponding to the second target video image is (X, Y), where X and Y respectively represent the abscissa and ordinate of the center point of the two-dimensional image in the two-dimensional rectangular coordinate system.
  • the two-dimensional coordinate value of the upper-left corner of the two-dimensional image corresponding to the second target video image is (X, Y), where X and Y respectively represent the abscissa and ordinate of the upper-left corner of the two-dimensional image in the two-dimensional rectangular coordinate system.
  • the spatial information of the second target video image includes area range information of the second target video image.
  • the area range information of the second target video image includes the following specific situations:
  • the coverage range of the second target video image includes an azimuth range (yaw angle range) and a pitch angle range of the second target video image.
  • the coverage range of the second target video image includes the width and height of the two-dimensional video image corresponding to the second target video image.
  • the area range information of the second target video image is specifically H and V, where H and V respectively represent the azimuth range and the elevation range of the VR spherical coordinates.
  • the panoramic video image includes at least two sub-regions obtained by dividing the panoramic video image, where the current perspective covers at least one sub-region, and the second information is used to indicate the sub-region covered by the second target video image. Area, the sub-area covered by the second target video image is used for stitching to obtain the area of the second target video image.
  • the at least one video data packet includes a video data packet carrying a view identifier.
  • any one of the at least one video data packet may carry the perspective identifier, and the perspective identifier is used to indicate the perspective in which the video image corresponding to that video data packet is located.
  • the perspective identifier may be bound to the sub-region ID, that is, there is a corresponding relationship between the sub-region ID and the perspective identifier.
  • the client can distinguish video data packets of different perspectives through the perspective identifier, which facilitates stitching the video images corresponding to the video data packets of a given perspective.
  • the second target video image is the same as the first target video image.
  • the bandwidth occupied when transmitting the target video image can be reduced.
  • the second target video image further includes other video images than the first target video image.
  • the client may display a video image outside the current perspective while displaying the video image within the current perspective, so that the user can still see video images when turning suddenly (or suddenly changing the viewing angle).
  • the second target video image further includes a panoramic video image.
  • in this way, the panoramic video image can be displayed in addition to the video image corresponding to the current perspective, so that the user can still see a video image when turning quickly (for example, quickly turning the head), which avoids the situation where no video content is visible during a sudden turn.
  • the image quality of the panoramic video image is lower than the image quality of the first target video image.
  • the method shown in FIG. 1 further includes: the client receives a description file sent by the server, and the description file carries the first perspective information or the second perspective information.
  • the first perspective information is used to indicate the maximum area range of the perspective supported by the server, and the second perspective information is used to indicate the area range of the initial perspective.
  • in this way, the client can obtain the perspective range supported by the server or the initial perspective, which facilitates stitching the decoded video images according to the supported perspective range and the initial perspective after subsequently receiving video data packets.
  • the range of the area of the first target video image indicated by the first information should be within the maximum range of the view angle supported by the server indicated by the first perspective information.
  • the client can present the video image according to the initial perspective after the client is turned on. Next, the client can present the video image according to the current perspective of the user.
  • for example, after the client is turned on, the video image in perspective 1 (the initial perspective) is displayed, but the user wants to view the video image in perspective 2 (the user's current perspective); the client can then switch from perspective 1 to perspective 2 and present the video image in perspective 2.
  • the initial perspective here may be a preset perspective.
  • the video image in the initial perspective is presented first.
  • the area range information of the initial viewing angle may include an azimuth range (yaw angle range) and a pitch angle range of the region of the initial viewing angle.
  • the area range information of the initial viewing angle may also include the width and height of the planar area corresponding to the initial viewing angle.
  • the area range information of the initial viewing angle is specifically H and V, where H and V respectively represent the azimuth range and the pitch angle range of the VR spherical coordinates.
  • the second viewing angle information may be used to indicate a region range of a default viewing angle in addition to a region range of the initial viewing angle.
  • the server may directly send the video image in the default perspective to the client, so that the client presents the video image in the range of the default perspective.
  • the above method further includes: the client sends a video description command to the server.
  • by sending a video description command to the server, the client can trigger the server to send the description file to the client.
  • when the description file carries the second perspective information, the description file also carries third perspective information, and the third perspective information is also used to indicate the spatial position of the initial perspective.
  • the third perspective information is any one of the following information (11) to (14):
  • the spherical coordinate value of the center point of the spherical area corresponding to the initial angle of view is (X, Y, Z), where X corresponds to the azimuth (or yaw) of the spherical coordinates, Y corresponds to the pitch angle (pitch or elevation) of the spherical coordinates, and Z corresponds to the tilt angle (or roll angle) of the spherical coordinates.
  • the plane coordinate value of the center point of the plane region corresponding to the initial angle of view is (X, Y), where X and Y represent the abscissa and ordinate of the center point of that plane region in the two-dimensional rectangular coordinate system.
  • the plane coordinate value of the upper-left corner of the plane region corresponding to the initial angle of view is (X, Y), where X and Y represent the abscissa and ordinate of the upper-left corner of that plane region in the two-dimensional rectangular coordinate system.
  • the method shown in FIG. 1 further includes: the client receives a description file sent by the server, and the panoramic video image includes at least two sub-regions obtained by dividing the panoramic video image, and the description file carries each sub-region Sub-region description information of a region, each sub-region is a sub-region of a region of a panoramic video image, and the sub-region description information includes spatial information of the sub-region.
  • in this way, after receiving the video data packets of the target video image, the client can stitch the video images according to the information of each sub-region, so as to obtain the video content within the current perspective.
  • the method shown in FIG. 1 further includes: the client sends a video description command to the server.
  • by sending a video description command to the server, the client can trigger the server to send the description file to the client.
  • the sub-region description information includes the planar space information of each sub-region and further includes the mapping type information of the video image within each sub-region; the spherical space information of each sub-region can be determined based on the mapping type information and the planar space information of each sub-region.
  • the planar space information of the sub-region is a two-dimensional coordinate value of a center point of the sub-region or a two-dimensional coordinate value of an upper left corner of the sub-region.
  • the two-dimensional coordinate value of the center point of the sub-region is (X, Y), where X and Y represent the abscissa and ordinate of the center point of the two-dimensional viewing angle region in the two-dimensional rectangular coordinate system, respectively.
  • the two-dimensional coordinate value of the upper-left corner of the sub-region is (X, Y), where X and Y respectively represent the abscissa and ordinate of the upper-left corner of the two-dimensional viewing angle region in the two-dimensional rectangular coordinate system.
  • the planar space information of the sub-region may also be the two-dimensional coordinate value of the upper-right corner, the lower-left corner, the lower-right corner, or any preset position of the sub-region.
  • the mapping type indicated by the mapping type information is any one of a latitude-longitude map, a hexahedron, or an octahedron.
  • the sub-region description information includes spherical space information of each sub-region, and the sub-region description information further includes shape information of each sub-region.
  • the spherical space information of the sub-region may be represented by an azimuth angle, an elevation angle, and an inclination angle of a center point of the sub-region.
  • the shape information of the sub-region may indicate the shape type of the sub-region.
  • the shape type of the sub-region may be a region enclosed by four great circles, or by two great circles and one small circle.
  • the three-dimensional spatial information of the sub-region includes a mapping type of the sub-region image, shape information of the sub-region, angle information of the sub-region, and region range information of the sub-region.
  • the description file further includes panoramic bitstream description information, and the panoramic bitstream description information includes mapping type information and size information of the panoramic video image.
  • the first information is carried in a real-time transmission control protocol RTCP source description report sent by the client to the server, and the video data packet in the target video image is a streaming media real-time transmission protocol RTP video data packet.
  • the first information and the second information are customized TLV format information.
  • FIG. 2 is a schematic flowchart of a method for transmitting media data according to an embodiment of the present application. Similar to FIG. 1, the method shown in FIG. 2 can also be executed by a client.
  • the method shown in FIG. 2 includes steps 210 and 230. Steps 210 to 230 are described in detail below.
  • the client receives a description file sent by the server.
  • the description file carries session description information of at least two sessions; the at least two sessions are sessions between the client and the server and are used to transmit the code stream data of the corresponding sub-region images, and the session description information includes the spatial information of the sub-region corresponding to the code stream data of the sub-region image transmitted through each session.
  • the sub-region is obtained by dividing the region of the panoramic video image, and the sub-region image is a video image within the sub-region;
  • the client sends the first information to the server.
  • the first information is used to indicate a session corresponding to the sub-area covered by the current perspective, and the first information is determined according to the current perspective and the session description information.
  • the client receives the stream data of the target video image sent by the server.
  • the target video image includes an image of a sub-region to which the current perspective belongs.
  • the above current perspective may refer to a perspective when a user watches a video using a client.
  • the client may send new first information to the server again to request video content within the new perspective.
  • the target video image is a panoramic video image.
  • in this way, the client obtains a description file carrying the session description information of each session, so that the client can process the received stream data of the target video image according to the description information.
  • one session may also be used to transmit video images of all sub-regions.
  • for example, the target video image includes video images in sub-region 1 (corresponding to session 1), sub-region 2 (corresponding to session 2), and sub-region 3 (corresponding to session 3); then session 1, session 2, and session 3 can be used respectively to transmit the video images in sub-region 1, sub-region 2, and sub-region 3. Alternatively, one session (any one of session 1 to session 3) may be used to transmit the video images in sub-region 1 to sub-region 3, as sketched below.
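  • A minimal sketch of the session selection this implies is given below; the mapping and the function names are assumptions introduced for illustration:

      # session_of[sub_region_id] -> session handle, built from the SDP description file
      session_of = {1: "session1", 2: "session2", 3: "session3"}

      def sessions_for_view(covered_sub_regions):
          # The client requests stream data only over the sessions whose
          # sub-regions are covered by the current perspective.
          return [session_of[sr] for sr in covered_sub_regions]

      active_sessions = sessions_for_view([1, 2, 3])  # current perspective covers sub-regions 1 to 3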
  • the client receiving the code stream data of the target video image sent by the server includes: the client receives, through the sessions corresponding to the sub-regions covered by the current perspective, the code stream data of the sub-region images covered by the current perspective, thereby obtaining the code stream data of the target video image.
  • when the area space information of the sub-region is the planar area space information of the sub-region, the session description information further includes the mapping type information of the code stream data of the sub-region image transmitted by each session.
  • when the area space information of the sub-region is the spherical area space information of the sub-region, the session description information further includes the shape information of the sub-region corresponding to the code stream data of the sub-region image transmitted by each session.
  • the first information is carried in a real-time transmission control protocol RTCP source description report sent by the client to the server, and the first information is information in TLV format.
  • the method for transmitting media data in the embodiments of the present application has been described above from the perspective of the client with reference to FIGS. 1 and 2.
  • the method for transmitting media data in the embodiments of the present application is described below from the perspective of the server with reference to FIGS. 3 and 4.
  • the method shown in FIG. 3 corresponds to the method shown in FIG. 1, and the method shown in FIG. 4 corresponds to the method shown in FIG. 2; duplicate descriptions are appropriately omitted when introducing the methods shown in FIGS. 3 and 4.
  • FIG. 3 is a schematic flowchart of a method for transmitting media data according to an embodiment of the present application.
  • the method shown in FIG. 3 may be executed by a server.
  • the method shown in FIG. 3 includes steps 310 to 330. Steps 310 to 330 are described below.
  • the server receives the first information sent by the client.
  • the first information is used to indicate a spatial position of a region of the first target video image, and the first target video image includes a video image within a current perspective.
  • the server determines a second target video image according to the first information.
  • the second target video image includes the first target video image.
  • the server sends a video data packet corresponding to the second target video image to the client.
  • At least one video data packet carries second information in total, and the second information is used to indicate spatial information of a region of the second target video image.
  • after the server obtains the first information characterizing the video content within the current perspective, it can send the second target video image required by the user and the position-related information of the second target video image to the client in real time, which can reduce the time delay for the client to obtain the position-related information of the second target video image.
  • the first target video image is a video image within a current perspective
  • the first information includes spatial information of the current perspective
  • the panoramic video image includes at least two sub-regions obtained by dividing the panoramic video image, where the current perspective covers at least one sub-region, and the first information is used to indicate the sub-region covered by the current perspective.
  • the sub-regions covered by the current perspective are stitched to obtain the region of the first target video image.
  • the second information includes spatial information of a region of the second target video image.
  • the panoramic video image includes at least two sub-regions obtained by dividing the panoramic video image, where the current perspective covers at least one sub-region, the second information is used to indicate the sub-regions covered by the second target video image, and the sub-regions covered by the second target video image are stitched to obtain the region of the second target video image.
  • the second information includes at least one piece of third information; each video data packet in the at least one video data packet carries third information, the at least one video data packet carries at least one piece of third information in total, and the third information carried by any video data packet in the at least one video data packet is used to indicate the sub-region to which the video image corresponding to that video data packet belongs.
  • At least one video data packet includes a video data packet carrying a view identifier.
  • the above method further includes: the server sends a description file to the client, and the description file carries the first perspective information or the second perspective information, where the first perspective information is used to indicate the maximum area range of the perspective supported by the server, and the second perspective information is used to indicate the area range of the initial perspective.
  • when the description file carries the second perspective information, the description file also carries third perspective information, and the third perspective information is further used to indicate the spatial position of the initial perspective.
  • the above method further includes: the server sends a description file to the client, where the panoramic video image includes at least two sub-regions obtained by dividing the panoramic video image, the description file carries the sub-region description information of each sub-region, and the sub-region description information includes the spatial information of the sub-region.
  • the sub-region description information includes the planar space information of each sub-region and further includes the mapping type information of the video image within each sub-region; the spherical space information of each sub-region can be determined based on the mapping type information and the planar space information of each sub-region.
  • the sub-region description information includes spherical space information of each sub-region, and the sub-region description information further includes shape information of each sub-region.
  • FIG. 4 is a schematic flowchart of a method for transmitting media data according to an embodiment of the present application.
  • the method shown in FIG. 4 may be executed by a server.
  • the method shown in FIG. 4 includes steps 410 to 430. Steps 410 to 430 are described below.
  • the server sends a description file to the client.
  • the description file carries session description information of at least two sessions; the at least two sessions are sessions between the client and the server and are used to transmit the code stream data of the corresponding sub-region images.
  • the session description information includes the spatial information of the sub-region corresponding to the code stream data of the sub-region image transmitted through each session; a sub-region is obtained by dividing the region of the panoramic video image, and a sub-region image is the video image within the sub-region.
  • the server receives the first information sent by the client.
  • the first information is used to indicate the session corresponding to the sub-area covered by the current perspective, and the first information is determined according to the current perspective and the session description information.
  • the server sends code stream data of the target video image to the client.
  • the target video image includes a video image within a sub-region covered by the current perspective.
  • the server transmits a description file carrying session description information of each session to the client before transmitting the target video image to the client, so that the client can process the received stream data of the target video image according to the description information.
  • the client receiving the code stream data of the target video image sent by the server includes: the client receives, through the sessions corresponding to the sub-regions covered by the current perspective, the code stream data of the sub-region images covered by the current perspective, thereby obtaining the code stream data of the target video image.
  • when the area space information of the sub-region is the planar area space information of the sub-region, the session description information further includes the mapping type information of the code stream data of the sub-region image transmitted by each session.
  • when the area space information of the sub-region is the spherical area space information of the sub-region, the session description information further includes the shape information of the sub-region corresponding to the code stream data of the sub-region image transmitted by each session.
  • the first information is carried in a real-time transmission control protocol RTCP source description report sent by the client to the server, and the first information is information in TLV format.
  • the specific process of the method for transmitting media data according to Embodiment 1 is shown in FIG. 5.
  • the method shown in FIG. 5 includes steps 1001 to 1007. Steps 1001 to 1007 are described in detail below.
  • the server publishes an address of a video with a preset perspective position.
  • the server may select the video content of a certain perspective position from the panoramic video and then publish the address of that video content; alternatively, the server may render the content of a certain perspective position and then publish the address of that video content.
  • the format of the address of the video content published by the server may be the real-time streaming protocol (RTSP) format.
  • for example, the address of the video content published by the server may be: rtsp://server.example.com/video.
  • the client sends a video description request command to the server.
  • the server sends session description protocol (SDP) information to the client, describing the area range of the partial area (field of view, FOV) covering the user's perspective that is supported within the panoramic picture.
  • the client may send a video description request command to the address published in step 1001; after receiving the video description command request from the client, the server sends video description information to the client, which includes a description of the FOV that the server can support.
  • the request description command sent by the client to the server may be as shown in Table 1:
  • the DESCRIBE field indicates a video description command.
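  • Since Table 1 is not reproduced here, a typical RTSP DESCRIBE request of the kind described (header values illustrative) looks like:

      DESCRIBE rtsp://server.example.com/video RTSP/1.0
      CSeq: 2
      Accept: application/sdp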
  • the response content of the video description command may be an SDP description file, and the specific content of the SDP description file may be shown in Table 2:
  • the SDP description file shown in Table 2 describes a video session.
  • the bit rate of the FOV stream transmitted in the video session is 5000 kbps.
  • H and V are used to describe the range of the FOV; H and V can respectively represent the azimuth range (azimuth_range) and the elevation range (elevation_range) of the VR spherical coordinates, or H and V can respectively represent the pixel values of the width and height of the two-dimensional image.
  • the SDP description file shown in Table 2 above can be transmitted using the RTP protocol.
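  • Table 2 is likewise not reproduced here. A hedged sketch of such an SDP description file is given below; the attribute name a=fov is an assumption for this sketch, while b=AS:5000 corresponds to the 5000 kbps code rate and track1 to the control track mentioned in this embodiment:

      v=0
      o=- 0 0 IN IP4 server.example.com
      s=FOV video session
      t=0 0
      m=video 0 RTP/AVP 96
      b=AS:5000
      a=control:track1
      a=fov:110 90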
  • a session is established between the client and the server.
  • when establishing a session between the client and the server, the client may first send a session connection establishment command to the server, and the server then responds to the connection establishment command sent by the client, thereby establishing the session between the client and the server.
  • the content of the session connection establishment command sent by the client to the server can be shown in Table 3:
  • the SETUP field in Table 3 above indicates a session connection establishment command.
  • the first command among the session connection establishment commands indicates that the client needs to establish a connection with track1 in the SDP description file; the Transport field indicates that the FOV video content is transmitted in unicast over the RTP protocol, the RTP port number used by the client to receive the data stream is 20000, and the RTCP port number used to receive the control stream is 20001.
  • after receiving the session connection establishment command shown in Table 3, the server responds to the session connection establishment command; the content of the response can be as shown in Table 4.
  • the response content shown in Table 4 above is the server's response to the client's first connection establishment request, indicating that the client's connection establishment request is accepted.
  • the Transport field indicates that the unicast address is 10.70.144.123, and the client's RTP receiving port is 20000.
  • the RTCP receiving port is 20001, the RTP receiving port on the server is 50000, and the RTCP receiving port is 50001.
  • the session number of this connection is 12345678.
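  • Using the values quoted above, the SETUP exchange of Tables 3 and 4 can be sketched as follows (only the fields discussed are shown; other header values are illustrative):

      SETUP rtsp://server.example.com/video/track1 RTSP/1.0
      CSeq: 3
      Transport: RTP/AVP;unicast;client_port=20000-20001

      RTSP/1.0 200 OK
      CSeq: 3
      Session: 12345678
      Transport: RTP/AVP;unicast;destination=10.70.144.123;client_port=20000-20001;server_port=50000-50001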
  • the client sends an RTCP source description report to the server, where the RTCP source description report carries a center point of the user's current perspective and / or a coverage range of the client perspective.
  • the client may send the RTCP source description report to the server through the RTSP session, according to the RTCP port information in the server's response to the connection establishment command.
  • H and V together describe the coverage range of the viewing angle; H and V may respectively represent the azimuth range and the elevation range (elevation_range) of the VR spherical coordinates, or H and V may respectively represent the width and height of a two-dimensional image.
  • in addition to describing the client perspective coverage, the RTCP source description report can also describe the center point of the user's current perspective.
  • the specific format for describing the central point of the user's perspective in the RTCP source description report can be shown in Table 7.
  • X, Y, and Z collectively identify the information of the center point of the viewing angle; X, Y, and Z can respectively represent the azimuth, elevation, and tilt values of the VR spherical coordinates, or only X and Y may be retained to indicate the two-dimensional coordinates of the center point of the corresponding viewing area.
  • perspective information such as perspective coverage and perspective center point information may also be sent in the same SDES item.
  • the COVERAGE and CENTER_FOV fields may not exist.
  • the RTCP source description report only carries perspective information such as perspective coverage and perspective center point information.
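  • As a sketch only (the item layout and field widths are assumptions, not the normative table formats referenced above), a private SDES item carrying both the perspective coverage and the perspective center point could be laid out as:

      SDES item
        item type : private item carrying COVERAGE and CENTER_FOV
        length    : payload length in bytes
        payload   : X Y Z   (azimuth, elevation, tilt of the perspective center point)
                    H V     (azimuth range and elevation range of the coverage)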
  • the client sends a play command to the server.
  • in step 1006, the client sends the play command over the session established between the client and the server.
  • the specific format of the play command may be as shown in Table 9.
  • after receiving the play command sent by the client, the server responds to the client's play command; the specific response content can be as shown in Table 11.
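  • A typical play command reusing the session number from the SETUP response would be (values illustrative):

      PLAY rtsp://server.example.com/video RTSP/1.0
      CSeq: 4
      Session: 12345678
      Range: npt=0.000-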
  • the server sends to the client, in the form of RTP data packets, the video data corresponding to the user's perspective together with the corresponding video description information.
  • the video data packets may be data packets of the video within a first window range, and the video description information may include the center point coordinates and the content coverage of the video sent by the server to the client;
  • the video content in the first window includes the FOV content requested by the client; the video content in the first window may be the same as the FOV content requested by the user, or may be larger than the FOV content requested by the user.
  • carrying the coordinates of the center point of the window content and the azimuth range and elevation range information enables the server to support application scenarios in which the size of the window that is encoded and transmitted changes adaptively.
  • the RTP header format of the RTP packet used to carry video data may be shown in Table 11.
  • RTP data packets can be extended.
  • the extended format is shown in Table 12.
  • X, Y, and Z collectively represent the position information of the center point of the video that the server sends to the client; X, Y, and Z can correspond to the azimuth, elevation, and tilt values of the VR spherical coordinates, respectively.
  • alternatively, only X and Y may be retained to indicate the two-dimensional coordinate value of the center point of the corresponding viewing area, or the coordinate value of the upper-left corner of the viewing area.
  • H and V collectively characterize the range of the video content sent, where H and V can respectively represent the azimuth range and the elevation range (elevation_range) of the VR spherical coordinates, or H and V can respectively represent the width and height in pixels of a two-dimensional image.
  • for example, when the azimuth range is 110 degrees and the elevation range is 90 degrees, the corresponding RTP header is as shown in Table 13.
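  • With the 110-degree azimuth range and 90-degree elevation range of this example, the extended RTP header fields can be sketched as follows (the layout is an assumption based on the description above, not the normative Table 13):

      RTP fixed header (X bit = 1, signalling a header extension)
      header extension:
        X = 0    azimuth of the center point of the sent content
        Y = 0    elevation of the center point
        Z = 0    tilt of the center point
        H = 110  azimuth range of the sent content
        V = 90   elevation range of the sent content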
  • the client can send the RTCP source description report to the server again (re-perform step 1005) to update the user's perspective center point information or the user's perspective area information.
  • the server extracts FOV area content from the panoramic video or renders FOV content in real time according to the latest user perspective information, and then encodes and sends FOV video data.
  • the panoramic video space region can be divided into multiple sub-regions.
  • the transmission of media data in cases where there are multiple sub-regions is described in detail below with reference to Embodiments 2 to 4.
  • the client may display only the video content in the FOV region, or, in addition to the video content in the FOV region, the client may also display the video content of other regions adjacent to the FOV region.
  • in this case, a scheme that transmits the FOV region at high quality and the other regions at low quality is usually used to transmit the video data.
  • for example, a panoramic video is divided into 8 sub-regions, and sub-regions 2, 3, 6, and 7 cover the user's FOV.
  • in this case, sub-regions 2, 3, 6, and 7 can be encoded with a high-quality encoding method to obtain high-quality video data, while all sub-regions are encoded at low quality to obtain a low-quality panoramic video; the high-quality video data is transmitted to the client together with the low-quality panoramic video, both are decoded and rendered by the client, and the client presents the appropriate areas to the user according to the user's perspective, as sketched below.
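  • A minimal sketch of the tiling scheme just described, assuming the 8-tile layout of this example and hypothetical stream names:

      ALL_SUB_REGIONS = list(range(1, 9))  # panorama divided into 8 sub-regions
      fov_sub_regions = [2, 3, 6, 7]       # sub-regions covering the user's FOV

      def streams_to_send(fov):
          # High-quality streams only for the FOV tiles, plus one
          # low-quality panoramic stream as a fallback for fast head turns.
          return [f"tile{sr}_hq" for sr in fov] + ["panorama_lq"]

      print(streams_to_send(fov_sub_regions))
      # ['tile2_hq', 'tile3_hq', 'tile6_hq', 'tile7_hq', 'panorama_lq']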
  • the specific process of the method for transmitting media data according to Embodiment 2 is shown in FIG. 6.
  • the method shown in FIG. 6 includes steps 2001 to 2008. Steps 2001 to 2008 are described in detail below.
  • the server divides the panoramic space area into a plurality of sub-areas, and determines a preset FOV and a sub-area corresponding to the preset FOV.
  • the sub-region corresponding to the preset FOV can be obtained from the panoramic video or obtained by rendering (the corresponding sub-region range can be generated by rendering according to the sub-region involved in the requested perspective).
  • the address of the video content published by the server may be an address in the RTSP protocol format.
  • for example, the address of the video content published by the server may be: rtsp://server.example.com/video.
  • the client sends a video description command to the server.
  • the server describes the spatial mapping type of the panoramic video corresponding to each sub-region, the coordinates of the center of the sub-region, and the range information of the sub-region.
  • the client may send a video description command to the address in step 2001.
  • after receiving the video description command from the client, the server responds to the video description command and describes the FOV area range that the server can support.
  • the description command sent by the client to the server may be as shown in Table 14:
  • the DESCRIBE field indicates a video description command.
  • after receiving the video description command shown in Table 14, the server sends an SDP session description file to the client.
  • the specific format of the SDP description file is shown in Table 15:
  • a total of nine video sessions are described in the SDP description file shown in Table 15.
  • the code streams of the sub-regions (tiles) corresponding to eight of the video sessions are track1 to track8, and the code rate of the code stream of each sub-region is 5000 kbps.
  • the remaining video session corresponds to the panoramic code stream (track9), with a code rate of 1000 kbps.
  • the SDP description file shown in the above Table 15 may include only the sub-region description information, and does not include the panoramic code stream description information.
  • the above sub-region description information and panoramic code stream description information can be transmitted using the RTP protocol.
  • d=<projection_type> <shape_type>:<azimuth> <elevation> <tilt> <azimuth_range> <elevation_range>, where the semantics of each field in d are as follows:
  • projection_type: the expression type of the two-dimensional spatial range of the panoramic video, which can be a latitude and longitude map, a hexahedron, or an octahedron.
  • shape_type: the area shape identifier, identifying the type of shape enclosing the area; the area can be enclosed by four great circles, or by two great circles and one small circle.
  • azimuth: the azimuth of the center point of the sub-region
  • elevation: the elevation angle of the center point of the sub-region
  • tilt: the tilt angle of the center point of the sub-region
  • azimuth_range: the azimuth range of the sub-region
  • elevation_range: the elevation range of the sub-region.
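  • As an illustrative aid, the following minimal Python sketch parses a region description attribute of the above form; the exact delimiters and the example numeric codes are assumptions for illustration only:

```python
# Minimal sketch: parse a sphere-region description attribute of the form
# d=<projection_type> <shape_type>:<azimuth> <elevation> <tilt>
#   <azimuth_range> <elevation_range>
# The delimiters and numeric codes below are assumptions for illustration.

def parse_region_attribute(line):
    body = line.split("=", 1)[1]             # strip the leading "d="
    head, values = body.split(":", 1)
    projection_type, shape_type = head.split()
    az, el, tilt, az_range, el_range = (float(v) for v in values.split())
    return {
        "projection_type": int(projection_type),  # e.g. 0 = latitude/longitude map
        "shape_type": int(shape_type),             # e.g. 0 = four great circles
        "azimuth": az, "elevation": el, "tilt": tilt,
        "azimuth_range": az_range, "elevation_range": el_range,
    }

print(parse_region_attribute("d=0 0:45 0 0 90 90"))
```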
  • the above SDP description file can carry two-dimensional coordinate information describing the center point and range of the content of the area. There can be multiple specific forms. One possible form is shown in Table 16.
  • d=<projection_type>:<h_center> <v_center> <h> <v>, where the semantics of each field in d are as follows:
  • projection_type: the two-dimensional spatial expression type of the panoramic video; for example, it can be a latitude and longitude map, a hexahedron, or an octahedron.
  • h_center: the horizontal coordinate value of the center point of the area
  • v_center: the vertical coordinate value of the center point of the area
  • h: the horizontal width of the area
  • v: the vertical height of the area
  • alternatively, the description can use the two-dimensional coordinates of the upper left corner of the area content together with its size range.
  • the above SDP description file may carry two-dimensional coordinate information of the upper left corner of the description area content and two-dimensional coordinate information of a size range.
  • An optional form is shown in Table 17.
  • d=<projection_type>:<h_left_top> <v_left_top> <h> <v>, where the semantics of each field in d are as follows:
  • projection_type: the expression type of the two-dimensional spatial range of the panoramic video, which can be a latitude and longitude map, a hexahedron, or an octahedron.
  • h_left_top: the horizontal coordinate value of the upper left corner of the area
  • v_left_top: the vertical coordinate value of the upper left corner of the area
  • h: the horizontal width of the area
  • v: the vertical height of the area
  • nine sessions can be established between the client and the server.
  • the client may send a session connection establishment command to the server.
  • the session connection establishment command is shown in Table 3 above.
  • After receiving the session connection establishment command, the server responds to it; the specific content of the response can be as shown in Table 4 above.
  • the client sends an RTCP source description report to the server, and the RTCP source description report carries the sub-region session label required by the user's perspective.
  • the client can transmit the RTCP source description report to the server through a session, and the specific format of the RTCP source description report sent can be shown in Table 19.
  • the RTCP source description report shown in Table 19 carries the session labels of the sub-areas required by the client perspective (and optionally the session label information corresponding to the panoramic video).
  • the SESSION_FOV field may not exist.
  • SESSION_NUM indicates the number of required sessions.
  • sessionID1: the identifier of the first required session connection
  • sessionID2: the identifier of the second required session connection
  • a session connection identifier can be any number or character, as long as it uniquely distinguishes the multiple sessions between the client and the server.
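  • For illustration, a minimal Python sketch of serializing the required session labels into an SDES-style item in the spirit of Table 19 follows; the byte layout and the item type value are assumptions, not the patent's normative format:

```python
import struct

# Minimal sketch: serialize the session labels required by the current
# perspective into an SDES-style SESSION_FOV item. The byte layout and
# item type value are illustrative assumptions, not from the patent.

def build_session_fov_item(session_ids, item_type=9):
    payload = struct.pack("!B", len(session_ids))       # SESSION_NUM
    for sid in session_ids:
        sid_bytes = sid.encode("ascii")
        payload += struct.pack("!B", len(sid_bytes)) + sid_bytes
    # SDES item: type (1 byte), length (1 byte), payload
    return struct.pack("!BB", item_type, len(payload)) + payload

# Request the sessions carrying sub-regions 2, 3, 6 and 7.
item = build_session_fov_item(["track2", "track3", "track6", "track7"])
print(item.hex())
```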
  • the client sends a play command to the server.
  • the server sends an RTP video data packet to the client.
  • the server may send sub-region video data to the client through the multi-session connection given by the SDES information.
  • the client sends an RTCP source description report to the server to update which session connections the server should use to send the corresponding video content to the client.
  • When receiving the RTCP source description report resent by the client, the server extracts the FOV area content from the panoramic video or renders the FOV content in real time according to the latest user perspective information, and then encodes and sends the FOV video data.
  • In Embodiment 2, each session corresponds to a sub-area, and the client describes the sub-areas required by the user's perspective by carrying the corresponding session labels in the RTCP description report.
  • a session can be established between the client and the server, and an RTCP source description report is sent to the server through the session.
  • the RTCP description report carries the label of the sub-area required by the client user's perspective.
  • The specific process of the method for transmitting media data according to the third embodiment is shown in FIG. 7.
  • the method shown in FIG. 7 includes steps 3001 to 3008. Steps 3001 to 3008 are described in detail below.
  • the server divides the panoramic space area into a plurality of sub-areas, and determines a preset FOV and a sub-area corresponding to the preset FOV.
  • the sub-region corresponding to the preset FOV can be obtained from a panoramic video or obtained through rendering.
  • the address of the video content published by the server may be an address in the RTSP protocol format.
  • the address of the video content published by the server may be: rtsp://server.example.com/video.
  • the client sends a video description command to the server.
  • the server describes the type of the panoramic video space mapping format, the ID identification number of each sub-area, the coordinates of the sub-area center and the sub-area range information corresponding to each sub-area ID.
  • the client may send a video description command to the address in step 3003.
  • After receiving the video description command from the client, the server responds to it and describes the FOV areas that it can support.
  • the description command sent by the client to the server may be as shown in Table 20 below:
  • the DESCRIBE field indicates a video description command.
  • After receiving the video description command shown in Table 20, the server sends a session description (SDP) file to the client.
  • the specific format of the SDP description file is shown in Table 21:
  • The SDP description file shown in Table 21 describes a total of two video sessions: one carries the sub-region (tile) code streams, each with a code rate of 5000 kbps, and the other carries a panoramic code stream with a code rate of 1000 kbps. Both sessions can be transmitted using the RTP protocol.
  • d=<projection_type> <shape_type>:<subPic_Num> <subPicID> <azimuth> <elevation> <tilt> <azimuth_range> <elevation_range>, where the semantics of each field in d are as follows:
  • projection_type: the expression type of the two-dimensional spatial range of the panoramic video, which can be a latitude and longitude map, a hexahedron, or an octahedron.
  • shape_type: the area shape identifier, identifying the type of shape enclosing the area; the area can be enclosed by four great circles, or by two great circles and one small circle.
  • subPic_Num: the number of divided regions, which makes it easy to determine how many sets of region parameters there are in total
  • subPicID: the label of each area
  • azimuth: the azimuth of the center point of the area
  • elevation: the elevation angle of the center point of the area
  • tilt: the tilt angle of the center point of the area
  • azimuth_range: the azimuth range of the area
  • elevation_range: the elevation range of the area.
  • Optionally, only one video session may be described in the form shown in Table 21, describing only the sub-region code stream.
  • the panoramic code stream and the sub-region code stream can also be placed in the same description field.
  • One possible form is shown in Table 22:
  • d=<projection_type>:<subPic_Num> <subPicID> <x> <y> <h> <v>, where the semantics of each field in d are as follows:
  • projection_type: the expression type of the two-dimensional spatial range of the panoramic video, which can be a latitude and longitude map, a hexahedron, or an octahedron.
  • subPic_Num: the number of regions, which makes it easy to determine how many sets of region parameters there are
  • subPicID: the label of each area
  • x: the horizontal coordinate value of the center of the area; optionally, it can be the horizontal coordinate value of the upper left corner of the area
  • y: the vertical coordinate value of the center of the area; optionally, it can be the vertical coordinate value of the upper left corner of the area
  • h: the horizontal width of the area
  • v: the vertical height of the area
  • a session is established between the client and the server.
  • The process of establishing a session between the client and the server in step 3005 is the same as step 1004 above.
  • the client sends an RTCP source description report to the server, and the RTCP source description report carries a sub-area label required by a user's perspective.
  • the client may send an RTCP source description report to the server through a session established between the client and the server.
  • the SUBPIC_FOV field may not exist.
  • subPicNum: indicates how many sub-region contents are needed
  • subPicID1: the label of the first required sub-region
  • subPicID2: the label of the second required sub-region, and so on.
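  • For illustration, a minimal Python sketch of serializing the required sub-region labels into an SDES-style SUBPIC_FOV item follows; the byte layout and item type value are assumptions, not the patent's normative format:

```python
import struct

# Minimal sketch: serialize the sub-region labels required by the current
# perspective into an SDES-style SUBPIC_FOV item. The byte layout and
# item type value are illustrative assumptions, not from the patent.

def build_subpic_fov_item(sub_pic_ids, item_type=10):
    payload = struct.pack("!B", len(sub_pic_ids))            # subPicNum
    payload += b"".join(struct.pack("!B", i) for i in sub_pic_ids)
    return struct.pack("!BB", item_type, len(payload)) + payload

# Perspective currently covers sub-regions 2, 3, 6 and 7.
print(build_subpic_fov_item([2, 3, 6, 7]).hex())
```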
  • the client sends a play command to the server.
  • The specific content of the play command sent by the client to the server in step 3007 is the same as in step 1006 above, and will not be described in detail here.
  • the server sends an RTP video data packet corresponding to the user's perspective to the client.
  • The header of each RTP packet includes the ID of the area whose video content the packet currently carries, and optionally includes the FOV frame number to which the current area belongs, or other information identifying contents of different areas that belong to the same FOV.
  • An optional format of the RTP video data packet sent in step 3008 is shown in Table 24.
  • the RTP data packet can be extended.
  • the extended format is shown in Table 25.
  • the subPicID in Table 24 identifies the ID number of the region to which the currently carried video content belongs, and FOV_SN identifies the perspective number to which the content belongs, which makes it easy for the client to combine videos with different subPicIDs but the same FOV_SN into FOV content for display, as sketched below.
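  • A minimal Python sketch of this grouping (the packet representation is illustrative, not the patent's packet format):

```python
from collections import defaultdict

# Minimal sketch: group received sub-region packets by FOV_SN so that
# packets with different subPicIDs but the same FOV_SN can be combined
# into one FOV picture for display. Packet objects are illustrative.

def group_by_fov(packets):
    """packets: iterable of dicts with 'subPicID', 'FOV_SN', 'payload'."""
    fovs = defaultdict(dict)
    for pkt in packets:
        fovs[pkt["FOV_SN"]][pkt["subPicID"]] = pkt["payload"]
    return fovs  # {FOV_SN: {subPicID: payload, ...}, ...}

packets = [
    {"subPicID": 2, "FOV_SN": 17, "payload": b"..."},
    {"subPicID": 3, "FOV_SN": 17, "payload": b"..."},
    {"subPicID": 2, "FOV_SN": 18, "payload": b"..."},
]
for fov_sn, parts in group_by_fov(packets).items():
    print(fov_sn, sorted(parts))   # 17 [2, 3] / 18 [2]
```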
  • When the user's perspective changes, in order to accurately present the video within the user's perspective, the client can send the RTCP source description report to the server again (re-executing step 3006) to update the user's perspective center point information or perspective area range information.
  • the server extracts FOV area content from the panoramic video or renders FOV content in real time according to the latest user perspective information, and then encodes and sends FOV video data.
  • In Embodiment 3, the client determines the sub-areas corresponding to the user's perspective and carries the sub-area labels required by the user's perspective in the RTCP description report sent to the server.
  • Alternatively, the client may carry only the user's perspective information in the RTCP source description report; the server then determines the sub-areas corresponding to the user's perspective and sends the corresponding sub-area video data packets to the client.
  • The specific process of the method for transmitting media data according to the fourth embodiment is shown in FIG. 8.
  • the method shown in FIG. 8 includes steps 4001 to 4008. Steps 4001 to 4008 are described in detail below.
  • the server divides the panoramic space area into a plurality of sub-areas, and determines a preset FOV and a sub-area corresponding to the preset FOV.
  • the client sends a video description command to the server.
  • the server describes the type of the panoramic video space mapping format, the ID identification number of each sub-area, the sub-area center coordinates and the sub-area range information corresponding to each sub-area ID.
  • a session is established between the client and the server.
  • steps 4001 to 4005 in the fourth embodiment are the same as steps 3001 to 3005 in the third embodiment, and will not be described in detail here.
  • the client sends an RTCP source description report to the server.
  • the RTCP source description report carries the user's perspective information.
  • the perspective information includes the center point of the user's current perspective and the coverage of the client perspective.
  • the client may send an RTCP source description report to the server through the FOV video stream transmission session, according to the RTCP port information in the server's response to the connection command.
  • the specific format of the RTCP source description report may be as shown in Table 26.
  • COVERAGE (the value 9 is used here as an example)
  • H and V together describe the coverage of the viewing angle.
  • H and V can be the azimuth range and elevation range of the VR spherical coordinates, respectively, or the width and height of a two-dimensional image.
  • Optionally, perspective information such as the perspective coverage and the perspective center point information may be sent in the same SDES item; in that case the separate COVERAGE and CENTER_FOV fields may be absent, and the RTCP report carries the perspective coverage and perspective center point information together.
  • the client sends a play command to the server.
  • the server sends an RTP video data packet corresponding to the user's perspective to the client.
  • the header of each RTP packet includes the ID of the area whose video content the packet currently carries, optionally including the FOV frame number to which the current area belongs, or other information identifying contents of different areas belonging to the same FOV.
  • the perspective information of the user carried in the RTCP source description report sent by the client to the server includes not only the central point of the current perspective, but also the coverage of the perspective of the client.
  • the perspective information of the user carried in the RTCP source description report may also carry only the center point of the current perspective, and the coverage of the perspective of the client may be preset.
  • When the server responds to the client and describes the video content, in addition to describing the video content within the FOV area range it supports, it can also directly describe several pieces of initial FOV information for the client to select from.
  • the SDP description file sent by the server to the client may carry the initial FOV information provided by the server.
  • the format of the initial FOV information is shown in Table 27.
  • H and V together describe the range of FOV.
  • H and V can be the azimuth range and elevation range of the VR spherical coordinates, respectively, or the width and height of a two-dimensional image.
  • X, Y, and Z together identify the information of the initial perspective center point.
  • X, Y, and Z may correspond to the (azimuth, elevation, tilt) values of the VR spherical coordinates; alternatively, only two items (X and Y) may be retained, corresponding respectively to the two-dimensional coordinate value of the center point of the viewing area or the coordinate value of the upper left corner of the viewing area.
  • the first embodiment to the sixth embodiment all implement the real-time transmission of FOV information based on the streaming media transmission technology.
  • The specific implementation is to carry the user's perspective information and the corresponding FOV information in the original data.
  • some transmission protocols can also be customized to realize the real-time transmission of FOV information.
  • the perspective information fed back by the client can be sent using a message defined in a custom TLV message mode.
  • An optional TLV format is shown in Table 28.
  • Table 28: type 0x00 = FOV range information (payload: H, V); type 0x01 = FOV location information (payload: X, Y, Z); type 0x02 = FOV range and location information (payload: V, H, X, Y, Z); other types are reserved.
  • H and V together describe the range of FOV.
  • H and V can be the azimuth range and elevation range of the spherical coordinates of VR, respectively, and the width and height of the two-dimensional image.
  • X, Y, and Z together identify the information of the center point of the viewing angle.
  • X, Y, and Z may correspond to the (azimuth, elevation, tilt) values of the VR spherical coordinates; alternatively, only two items (X and Y) may be retained, corresponding respectively to the two-dimensional coordinate value of the center point of the viewing area or the coordinate value of the upper left corner of the viewing area.
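  • For illustration, a minimal Python sketch encoding the TLV feedback of Table 28 follows; the field widths (1-byte type, 2-byte length, 4-byte floats) are assumptions, since the patent fixes only the type semantics:

```python
import struct

# Minimal sketch of the custom TLV feedback of Table 28. Field widths
# (1-byte type, 2-byte length, 4-byte floats) are illustrative
# assumptions; only the type semantics come from the table.

TLV_FOV_RANGE = 0x00          # payload: H, V
TLV_FOV_LOCATION = 0x01       # payload: X, Y, Z
TLV_FOV_RANGE_AND_LOC = 0x02  # payload order per Table 28: V, H, X, Y, Z

def encode_tlv(tlv_type, *values):
    payload = b"".join(struct.pack("!f", v) for v in values)
    return struct.pack("!BH", tlv_type, len(payload)) + payload

# FOV of 90x90 degrees centered at azimuth 30, elevation 10, tilt 0.
msg = encode_tlv(TLV_FOV_RANGE_AND_LOC, 90.0, 90.0, 30.0, 10.0, 0.0)
print(len(msg), msg.hex())
```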
  • H and V together describe the window video range sent by the server to the client.
  • H and V can be the azimuth range and elevation range of the VR spherical coordinates, respectively, or the width and height.
  • X, Y, and Z together identify the information of the center point of the window.
  • X, Y, and Z can correspond to the azimuth, elevation, and tilt values of the VR spherical coordinates, respectively; alternatively, only two items (X and Y) may be retained for the perspective.
  • video content represents the video content, which is compressed video data.
  • TLV data sent by the server to the client needs to include the FOV label.
  • An optional TLV format is shown in Table 31:
  • H and V together describe the window video range sent by the server to the client.
  • H and V can be the azimuth range and elevation range of the VR spherical coordinates, respectively, or the width and height.
  • X, Y, and Z together identify the information of the center point of the window.
  • X, Y, and Z may correspond to the azimuth, elevation, and tilt values of the VR spherical coordinates, respectively; alternatively, only the two items (X and Y) corresponding to the perspective may be retained.
  • video content is compressed video data.
  • FOV_SN identifies the FOV number to which the current regional video belongs.
  • The method for transmitting media data according to the embodiments of the present application has been described above in detail with reference to the first to sixth embodiments.
  • When the client reports the spatial information of the current perspective to the server, it can also report the viewpoint information of the current perspective, so that the server can determine the second target video image by combining the spatial information of the current perspective with the viewpoint information of the current perspective.
  • the client can also indicate the specific composition of the spatial information of the current perspective by sending indication information to the server. The following describes these two cases in detail with reference to Embodiment 7 and Embodiment 8.
  • FIG. 9 shows a specific process of the method for transmitting media data shown in Embodiment 7.
  • the specific process shown in FIG. 9 includes steps 5001 to 5007. Steps 5001 to 5007 are described in detail below.
  • the server publishes an address of a video with a preset perspective position.
  • The specific process of step 5001 is the same as step 1001 in the first embodiment, and will not be described in detail here.
  • the client sends a description command to the address in step 5001.
  • The specific process of sending a description command in step 5002 is the same as that in step 1002; the related explanations, limitations, and examples of step 1002 also apply to step 5002 and will not be repeated here.
  • In step 5003, the server sends SDP information to the client.
  • the SDP information includes preset viewpoint position information, preset FOV direction information, and preset FOV range information.
  • the above SDP information may be an SDP description file, and the specific content of the SDP description file may be shown in Table 32.
  • a video session is described in the SDP description file in Table 32.
  • the information contained in the SDP description file in Table 32 adds the spatial position information of the preset viewpoint and the preset FOV direction information.
  • (location_x, location_y, location_z) is the spatial position information of the preset viewpoint
  • (X, Y, Z) is the preset FOV direction information.
  • (location_x, location_y, location_z) can collectively describe the spatial coordinate position of viewpoint A
  • (X, Y, Z) can identify the direction information of the FOV center on the unit ball with viewpoint A as the center of the sphere.
  • X may correspond to the azimuth or yaw in a three-dimensional coordinate system (for example, a three-dimensional rectangular coordinate system), Y may correspond to the pitch angle (pitch or elevation) in the three-dimensional coordinate system, and Z may correspond to the tilt angle or roll angle in the three-dimensional coordinate system.
  • the three-dimensional coordinate system may be a three-dimensional coordinate system with the position of the viewpoint A as an origin.
  • (X, Y) may be used to identify the direction information of the FOV center point on the unit ball with the viewpoint A as the center of the sphere.
  • (X, Y) may respectively correspond to the two-dimensional coordinate value of the center point of the viewing area, and (X, Y) may also correspond to the upper left corner coordinate value of the viewing area.
  • the SDP description file shown in Table 32 can be transmitted based on the RTP protocol.
  • a session is established between the client and the server.
  • The specific process of step 5004 is the same as that of step 1004, and will not be described in detail here.
  • the client sends an RTCP source description report to the server, where the RTCP source description report carries at least one of viewpoint position information of the current perspective, perspective direction information of the current perspective, and perspective coverage information of the current perspective.
  • the information such as the viewpoint position information of the current perspective, the perspective direction information of the current perspective, and the perspective coverage information of the current perspective carried in the RTCP source description report may be referred to as the perspective information of the current perspective.
  • the perspective information of the current perspective includes at least one of the viewpoint position information of the current perspective, the perspective direction information of the current perspective, and the perspective coverage information of the current perspective.
  • the above RTCP source description report may carry the perspective information of the user's current perspective, and the perspective information may include the viewpoint position information of the current perspective, the viewpoint displacement speed information, the acceleration information of the viewpoint displacement speed, the perspective direction information, and the perspective direction change speed information.
  • the above-mentioned perspective information of the current perspective may also be divided into spatial information of the current perspective (equivalent to the first perspective information above) and viewpoint information of the current perspective.
  • the viewpoint information of the current perspective includes the viewpoint position information of the current perspective, the viewpoint displacement speed information of the current perspective, and the acceleration information of the viewpoint displacement speed of the current perspective.
  • the spatial information of the current perspective includes the perspective direction information of the current perspective, the perspective direction change speed information of the current perspective, and the perspective coverage information of the current perspective.
  • The client may also send first indication information to the server, where the first indication information includes a first identification bit, and the value of the first identification bit is used to indicate whether the spatial position information of the current viewpoint is relative spatial position information or absolute spatial position information: when the first identification bit takes a first value, the spatial position information of the current viewpoint is relative spatial position information; when the first identification bit takes a second value, the spatial position information of the current viewpoint is absolute spatial position information.
  • the first indication information may be carried in an RTCP source description report sent by the client to the server.
  • the first identification bit may correspond to at least one bit, and the different values of the at least one bit may be used to indicate that the spatial position information of the current viewpoint is relative spatial position information or absolute spatial position information, respectively.
  • The present application does not limit which specific value of the first identification bit indicates relative spatial position information and which indicates absolute spatial position information; any manner in which the value of the first identification bit can indicate whether the spatial position information of the current viewpoint is relative spatial position information or absolute spatial position information falls within the protection scope of the present application.
  • When the spatial position information of the current viewpoint is relative spatial position information, the amount of data contained in the reported spatial information of the current viewpoint can be reduced, which reduces resource overhead.
  • the spatial position information of the current viewpoint may be relative position information of the current viewpoint with respect to the starting viewpoint or a specified viewpoint or the previous viewpoint.
  • When the spatial position information of the current viewpoint is absolute position information, it may be the position information of the current viewpoint with respect to a fixed coordinate system (the fixed coordinate system may be set in advance).
  • The client may further send second indication information to the server, where the second indication information includes a second identification bit, and the value of the second identification bit indicates the composition of the spatial information of the current perspective carried in the RTCP source description report.
  • the value of the second identification bit is used to indicate at least one of the following situations:
  • the spatial information of the current perspective is composed of the perspective direction information of the current perspective;
  • the spatial information of the current perspective is composed of the perspective direction information of the current perspective and the position information of the viewpoint where the current perspective is located;
  • the spatial information of the current perspective is composed of the perspective direction information of the current perspective, the position information of the viewpoint where the current perspective is located, and the perspective size information of the current perspective.
  • the second indication information may be carried in an RTCP source description report sent by the client to the server.
  • the viewing direction information of the current viewing angle may be absolute viewing direction information or relative viewing direction information.
  • When the viewing direction information of the current viewing angle is relative viewing direction information, the amount of data it contains can be reduced, which reduces resource overhead.
  • When the viewing direction information of the current viewing angle is absolute viewing direction information, it may be viewing direction information relative to a fixed coordinate system; when it is relative viewing direction information, it may be viewing direction information relative to some earlier viewing direction (for example, a deflection angle with respect to the previous viewing direction, or a deflection angle with respect to the initial viewing direction).
  • The client may further send third indication information to the server, where the third indication information includes a third identification bit, and the value of the third identification bit is used to indicate whether the viewing direction information of the current viewing angle is absolute viewing direction information or relative viewing direction information: when the third identification bit takes a sixth value, the viewing direction information of the current viewing angle is absolute viewing direction information; when the third identification bit takes a seventh value, the viewing direction information of the current viewing angle is relative viewing direction information.
  • the third indication information may be carried in an RTCP source description report sent by the client to the server.
  • the RTCP source description report in step 5005 may include only some of the main items among all the information that the foregoing perspective information may contain.
  • a possible specific format in the RTCP source description report is shown in Table 33.
  • H and V collectively describe the range of FOV.
  • H and V may respectively represent a horizontal coverage angle and a vertical coverage angle of a VR spherical coordinate, and may also be a width and a height of a two-dimensional image.
  • FOV_X, FOV_Y, and FOV_Z collectively identify the rotation information of FOV.
  • FOV_X corresponds to the azimuth or yaw of the three-dimensional coordinate system
  • FOV_Y corresponds to the pitch angle (pitch or elevation) of the three-dimensional coordinate system
  • FOV_Z corresponds to the tilt angle or roll angle of the three-dimensional coordinate system.
  • the three-dimensional coordinate system may be a three-dimensional coordinate system with the position of the current viewpoint as the origin.
  • Alternatively, only FOV_X and FOV_Y may be retained, to identify the two-dimensional coordinate value of the center point of the corresponding FOV region, or the coordinate value of the upper left corner of the FOV region, in the panoramic two-dimensional video image of the current viewpoint.
  • position_x, position_y, position_z are coordinate values of viewpoint space position information.
  • Optionally, the client can feed back only the viewpoint information to the server, which can reduce signaling overhead.
  • the specific format of the RTCP source description report reported by the client to the server can be shown in Table 34.
  • When the viewpoint is not fixed but keeps moving, information indicating the speed of the viewpoint movement can also be reported to the server, so that the server can perform predictive rendering and prefetching; this reduces the delay of video image transmission to the client, thereby improving the user experience.
  • H and V collectively describe the range of FOV.
  • H and V may respectively represent a horizontal coverage angle and a vertical coverage angle of a VR spherical coordinate, and may also be a width and a height of a two-dimensional image.
  • FOV_X, FOV_Y, and FOV_Z collectively identify the rotation information of FOV.
  • FOV_X corresponds to the azimuth or yaw of the three-dimensional coordinate system
  • FOV_Y corresponds to the pitch angle (pitch or elevation) of the three-dimensional coordinate system
  • FOV_Z corresponds to the tilt angle or roll angle of the three-dimensional coordinate system.
  • the three-dimensional coordinate system may be a three-dimensional coordinate system with the position of the current viewpoint as the origin.
  • Alternatively, only FOV_X and FOV_Y may be retained, to identify the two-dimensional coordinate value of the center point of the corresponding FOV region, or the coordinate value of the upper left corner of the FOV region, in the panoramic two-dimensional video image of the current viewpoint.
  • position_x, position_y, position_z are coordinate values of viewpoint space position information
  • speed_pos_x, speed_pos_y, and speed_pos_z are viewpoint position change speed values.
  • the RTCP source description report also carries speed_fov_x, speed_fov_y, and speed_fov_z.
  • speed_fov_x is the change speed value of the azimuth or yaw in the three-dimensional coordinate system of FOV
  • speed_fov_y is the change speed value of the pitch angle (pitch or elevation) of FOV in the three-dimensional coordinate system
  • speed_fov_z is the change speed value of the tilt angle or roll angle of the FOV in the three-dimensional coordinate system.
  • message_type identifies the type of perspective information carried, and indicates whether the current SDES contains one or more types of information.
  • the specific contents of message_type are shown in Table 36.
  • A mask may be used to indicate whether the current SDES contains one or more types of information: when a bit of message_type is 1, the information type corresponding to that bit (which may be a single type of information or a combination of multiple types) is carried; otherwise it is not carried.
  • The viewpoint position information, viewpoint displacement speed information, viewpoint displacement acceleration information, perspective direction information, perspective direction change speed information, and perspective coverage information of the current perspective can be sent in different SDES items (each type of information may correspond to one SDES item, or a combination of multiple types of information may correspond to one SDES item); a sketch of the mask idea follows below.
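  • For illustration, a minimal Python sketch of the message_type mask follows; the bit assignments are assumptions, not fixed by the patent:

```python
# Minimal sketch of the message_type mask idea: each bit flags one kind
# of perspective information carried in the SDES item. The specific bit
# assignments below are illustrative assumptions.

VIEWPOINT_POSITION   = 1 << 0
VIEWPOINT_SPEED      = 1 << 1
VIEWPOINT_ACCEL      = 1 << 2
VIEW_DIRECTION       = 1 << 3
VIEW_DIRECTION_SPEED = 1 << 4
VIEW_COVERAGE        = 1 << 5

def carried_types(message_type):
    names = ["viewpoint position", "viewpoint speed", "viewpoint acceleration",
             "view direction", "view direction speed", "view coverage"]
    return [n for i, n in enumerate(names) if message_type & (1 << i)]

# A report carrying position, direction and coverage information:
mask = VIEWPOINT_POSITION | VIEW_DIRECTION | VIEW_COVERAGE
print(carried_types(mask))
# ['viewpoint position', 'view direction', 'view coverage']
```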
  • the above-mentioned viewpoint spatial position information may be relative position information of the viewpoint with respect to a starting viewpoint or a certain designated viewpoint or a previous viewpoint.
  • the above-mentioned perspective direction information may be relative change information of the perspective direction with respect to a starting perspective direction, a specified perspective direction, or a previous perspective direction.
  • the client sends a play command to the server.
  • The specific process of step 5006 is the same as that of step 1006, and the description is not repeated here.
  • the server sends an RTP video data packet to the client.
  • In step 5007, the server sends the video data corresponding to the user's perspective to the client in the form of RTP data packets.
  • the RTP data packet may carry the spatial position information of the viewpoint where the current window is located, the direction information of the window center point, and the window coverage information.
  • the server can send the window video content rendered or divided from the panoramic video to the client through the session (the content contains the FOV content requested by the client), and the encoded data can be carried in an RTP data packet.
  • The RTP data packet needs to carry the spatial position information of the viewpoint where the window area is located, the coordinates of the center point of the window, and the horizontal and vertical coverage.
  • the RTP header format of the RTP packet is shown in Table 37.
  • the RTP data packet can be extended.
  • the header format of the extended RTP data packet is shown in Table 38.
  • position_x, position_y, and position_z collectively identify the position coordinates of the viewpoint where the current window is located;
  • X, Y, and Z collectively represent the position of the center point where the server sends the video to the client.
  • X may correspond to the azimuth or yaw angle of the three-dimensional coordinate system
  • Y may correspond to the pitch angle (pitch or elevation) in the three-dimensional coordinate system
  • Z may correspond to a tilt angle or a roll angle in a three-dimensional coordinate system.
  • the three-dimensional coordinate system may be a three-dimensional coordinate system with the position of the current viewpoint as the origin.
  • H and V collectively characterize the range of video content sent.
  • H and V may represent the horizontal coverage angle and the vertical coverage angle of the VR spherical coordinates, respectively; or H and V may also represent the pixel values of the width and height of the two-dimensional image, respectively.
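  • For illustration, a minimal Python sketch packing the window information of Table 38 into an RTP header extension follows; the extension profile value and the use of 4-byte floats are assumptions, not the patent's normative layout:

```python
import struct

# Minimal sketch: pack the window information of Table 38 into an RTP
# header extension block (RFC 3550 style: 16-bit profile, 16-bit length
# in 32-bit words, then the extension body). The profile value 0xBEDE
# and the 4-byte float encoding are illustrative assumptions.

def pack_window_extension(position, center, coverage, profile=0xBEDE):
    """position = (position_x, position_y, position_z): viewpoint location
    center   = (X, Y, Z): window center direction
    coverage = (H, V): horizontal / vertical coverage"""
    body = struct.pack("!8f", *position, *center, *coverage)
    length_words = len(body) // 4        # extension length in 32-bit words
    return struct.pack("!HH", profile, length_words) + body

ext = pack_window_extension((1.0, 0.0, 1.5), (30.0, 10.0, 0.0), (110.0, 90.0))
print(len(ext), ext.hex())
```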
  • The prediction information may be used to render, or to extract from the panoramic video, window content whose range is larger than the FOV range requested by the client.
  • In this way, when the perspective changes slightly, the content corresponding to the new FOV can be obtained from the previously sent window content; carrying the coordinates of the center point of the window content together with its azimuth and elevation coverage information therefore supports application scenarios in which the server adaptively changes the size of the window it encodes and transmits.
  • the client can send the RTCP source description report to the server again (re-perform step 5005) to update the user's perspective center point information or the user's perspective area range information.
  • the server extracts FOV area content from the panoramic video or renders FOV content in real time according to the latest user perspective information, and then encodes and sends FOV video data.
  • the existing TCP or UDP communication protocol can be used to transmit the perspective information.
  • the client feeds back the current viewpoint position or perspective information to the server, and the server sends the window content corresponding to that viewpoint position to the client according to the viewpoint position and perspective information requested by the client; the sent content needs to carry window position and size information.
  • When the client feeds back the perspective information to the server, it can use a custom TLV (type, length, value) message mode for sending.
  • A possible TLV format used when the client reports information is shown in Table 39.
  • H and V together describe the range of FOV, where H and V can represent the horizontal coverage angle and vertical coverage angle of the VR spherical coordinates, respectively, and can also be the width and height of a two-dimensional image.
  • X, Y, Z can collectively identify the information of the center point of the viewing angle.
  • X may correspond to the azimuth or yaw in the three-dimensional coordinate system
  • Y may correspond to the pitch angle (pitch or elevation) in the three-dimensional coordinate system
  • Z may correspond to a tilt angle or a roll angle in a three-dimensional coordinate system.
  • the three-dimensional coordinate system may be a three-dimensional coordinate system with the position of the current viewpoint as the origin.
  • H and V collectively characterize the range of video content sent.
  • H and V may represent the horizontal coverage angle and the vertical coverage angle of the VR spherical coordinates, respectively; or H and V may also represent the pixel values of the width and height of the two-dimensional image, respectively.
  • speed_fov_x is the change speed value of the azimuth or yaw in the three-dimensional coordinate system of FOV
  • speed_fov_y is the change speed value of the pitch angle (pitch or elevation) of FOV in the three-dimensional coordinate system
  • speed_fov_z is the change speed value of the tilt angle or roll angle of the FOV in the three-dimensional coordinate system.
  • the server can also use the TLV format when sending data to the client.
  • a possible TLV format adopted by the server is shown in Table 40.
  • H and V together describe the window video range sent by the server to the client.
  • H and V can be the horizontal and vertical coverage angles of the VR spherical coordinates, respectively, or the width and height of a two-dimensional image.
  • X, Y, Z together identify the center point of the window.
  • X may correspond to the azimuth or yaw angle of the three-dimensional coordinate system
  • Y may correspond to the pitch angle (pitch or elevation) in the three-dimensional coordinate system
  • Z may correspond to a tilt angle or a roll angle in a three-dimensional coordinate system.
  • position_x, position_y, and position_z indicate the position information of the viewpoint where the currently sent window is located.
  • Custom TLV format information can support real-time interactive feedback of viewpoint position information, viewpoint movement speed information, perspective direction information, perspective direction change speed information, and area size information between the client and the server, which reduces the transmission delay as much as possible and is suitable for application scenarios with high real-time requirements.
  • The client may also feed back information using the signal format defined for MMT.
  • In Embodiment 8, the perspective information reported by the client to the server, and the related information of the images carried in the video data packets sent by the server to the client, are transmitted using MPEG Media Transport (MMT), a signal format that MPEG defines for multimedia transmission applications.
  • MMT (specifically, the ISO/IEC 23008-1 standard) defines a set of signal formats for multimedia transmission applications. Among them, the field "urn:mpeg:mmt:app:vr:2017" is defined in the ISO/IEC 23008-1 standard to identify that the information is used for VR content transmission.
  • The viewpoint information of the MMT VR receiver (equivalent to the client above) needs to be fed back to the MMT VR sender (equivalent to the server above) periodically, or when the user's viewpoint position changes, so that the MMT VR sender can determine the position of the viewpoint where the current perspective is located.
  • The feedback can also describe the change information of the viewpoint position, to support applications in which the viewpoint changes.
  • The MMT VR receiver can also feed back one or more kinds of information, such as perspective direction change speed information, viewpoint change speed information, relative change information of the perspective direction with respect to a specific perspective, the initial perspective, or the previous perspective, and position change information of the viewpoint with respect to a specific viewpoint, the initial viewpoint, or the previous viewpoint (any specific combination of this information can be fed back).
  • To indicate which of this information (viewpoint position information, viewpoint change speed information, perspective direction change speed information, relative change information of the perspective direction with respect to a specific perspective, the initial perspective, or the previous perspective, and position change information of the viewpoint with respect to a specific viewpoint, the initial viewpoint, or the previous viewpoint) is transmitted between the client and the server, indication information can be transmitted between the client and the server (the indication information may be information sent by the client to the server, or information sent by the server to the client), and different values of different identification bits of the indication information indicate that different information is carried.
  • VRViewDependentSupportQuery and VRViewDependentSupportResponse are application information used between the client and the server to confirm whether the server supports video streaming based on the perspective.
  • VRViewDependentSupportQuery: the client uses this command to discover whether the server supports perspective-based video streaming;
  • VRViewDependentSupportResponse: the server uses this response to feed back to the client whether it supports perspective-based video streaming.
  • the information corresponding to 0x07 to 0x0B is the information that the client sends back to the server.
  • the specific meaning is as follows:
  • VRViewpointChangeFeedback: feeds back the current viewpoint position information
  • VRViewportSpeedFeedback: feeds back the change speed of the perspective direction
  • VRViewpointSpeedFeedback: feeds back the change speed of the viewpoint position
  • VRViewportDeltaChangeFeedback: feeds back the relative change of the perspective direction
  • VRViewpointDeltaChangeFeedback: feeds back the relative change of the viewpoint position.
  • the information corresponding to 0x0C to 0x0E is information about the window content that the server selects or renders and sends to the client.
  • the specific meaning is as follows:
  • VR_content_window_range: the size of the content range selected or rendered by the server
  • VR_content_window_centre: the location information of the center of the content selected or rendered by the server
  • VR_content_window_viewpoint: the position information of the viewpoint where the content selected or rendered by the server is located.
  • app_message_type represents the different message types shown in Table 41, and posx, posy, and posz represent the coordinate position of the viewpoint position in the three-dimensional Cartesian coordinate system.
  • A possible syntax of VRViewportSpeedFeedback is shown in Table 43.
  • app_message_type represents the different message types shown in Table 41; dirx_speed, diry_speed, and dirz_speed respectively represent the change speed of the perspective direction in the three-dimensional Cartesian or polar coordinate system.
  • A possible syntax of VRViewpointSpeedFeedback is shown in Table 44.
  • app_message_type indicates the different message types shown in Table 41; posx_speed, posy_speed, and posz_speed respectively represent the viewpoint position change speed along each axis of the three-dimensional Cartesian coordinate system.
  • app_message_type represents the different message types shown in Table 41; delta_dirx, delta_diry, and delta_dirz respectively represent the relative change of the perspective direction, in a three-dimensional Cartesian or polar coordinate system, with respect to a specific perspective, the initial perspective, or the previous perspective.
  • app_message_type represents the different message types shown in Table 41; delta_posx, delta_posy, and delta_posz represent the position change of the viewpoint, in a three-dimensional Cartesian coordinate system, relative to a specific viewpoint, the initial viewpoint, or the previous viewpoint.
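  • For illustration, a minimal Python sketch of building one such MMT application message (in the spirit of VRViewpointChangeFeedback, Table 42) follows; the field widths and the 0x07 type value are assumptions, the normative syntax being defined in ISO/IEC 23008-1:

```python
import struct

# Minimal sketch of an MMT application message in the spirit of Table 42
# (VRViewpointChangeFeedback): an app_message_type byte followed by the
# viewpoint coordinates. Field widths and the 0x07 type value are
# illustrative assumptions; the normative syntax is in ISO/IEC 23008-1.

VR_VIEWPOINT_CHANGE_FEEDBACK = 0x07   # assumed value in the 0x07-0x0B range above

def build_viewpoint_change_feedback(posx, posy, posz):
    # posx, posy, posz: viewpoint position in a 3D Cartesian coordinate system
    return struct.pack("!B3f", VR_VIEWPOINT_CHANGE_FEEDBACK, posx, posy, posz)

msg = build_viewpoint_change_feedback(1.2, 0.0, -0.5)
print(msg.hex())
```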
  • A possible syntax of VR_content_window_range is shown in Table 47.
  • app_message_type: the different message types shown in Table 41
  • Hor_resolution: the number of pixels in the width direction of the content rendered or sent by the server
  • Ver_resolution: the number of pixels in the height direction of the content rendered or sent by the server
  • Hor_fov: the horizontal field of view coverage of the content rendered or sent by the server
  • Ver_fov: the vertical field of view coverage of the content rendered or sent by the server.
  • A possible syntax of VR_content_window_centre is shown in Table 48.
  • app_message_type indicates the different message types shown in Table 41; center_x, center_y, and center_z collectively indicate the center point position information of the video sent by the server to the client.
  • center_x corresponds to the azimuth or yaw of the three-dimensional coordinates; center_y corresponds to the pitch angle (pitch or elevation); and center_z corresponds to the tilt angle or roll angle.
  • Alternatively, only center_x and center_y may be retained, representing respectively the two-dimensional coordinate value of the center point of the corresponding area or the coordinate value of the upper left corner of the viewing area.
  • A possible syntax of VR_content_window_viewpoint is shown in Table 49.
  • app_message_type represents the different message types shown in Table 41; posx_s, posy_s, and posz_s represent the viewpoint position information of the window video sent by the server to the client, expressed as coordinate position information in the three-dimensional Cartesian coordinate system.
  • FIG. 10 is a schematic block diagram of a client according to an embodiment of the present application.
  • the client 500 shown in FIG. 10 includes a sending module 510 and a receiving module 520.
  • the sending module 510 and the receiving module 520 in the client 500 may perform various steps in the method shown in FIG. 1 and FIG. 2.
  • the specific functions of the sending module 510 and the receiving module 520 are as follows:
  • a sending module 510 is configured to send first information to a server, where the first information is used to indicate spatial information of a region of a first target video image, where the first target video image includes a video image within a current perspective;
  • the receiving module 520 is configured to receive video data packets corresponding to a second target video image sent by the server, where the second target video image includes the first target video image, at least one of the video data packets corresponding to the second target video image carries second information, and the second information is used to indicate spatial information of an area of the second target video image.
  • the specific functions of the sending module 510 and the receiving module 520 are as follows:
  • the receiving module 520 is configured to receive a description file sent by the server, where the description file carries session description information of at least two sessions, and the at least two sessions are sessions between the client and the server.
  • the at least two sessions are used to transmit code stream data of respective sub-region images, and the session description information includes spatial information of the sub-regions corresponding to the code stream data of the sub-region images transmitted through the respective sessions, where the sub-regions are obtained by dividing the region of the panoramic video image, and a sub-region image is the video image within a sub-region;
  • the sending module 510 is configured to send first information to the server, where the first information is used to indicate the sessions corresponding to the sub-areas covered by the current perspective, and the first information is determined according to the current perspective and the session description information;
  • the receiving module 520 is further configured to receive code stream data of a target video image sent by the server, where the target video image includes the video images within the sub-regions covered by the current perspective.
  • FIG. 11 is a schematic block diagram of a server according to an embodiment of the present application.
  • the server 600 shown in FIG. 11 includes a receiving module 610, a determining module 620, and a sending module 630.
  • the receiving module 610, the determining module 620, and the sending module 630 in the server 600 may perform various steps in the methods shown in FIGS. 3 and 4.
  • the specific functions of the receiving module 610, the determining module 620, and the sending module 630 are as follows:
  • the receiving module 610 is configured to receive first information sent by a client, where the first information is used to indicate a spatial position of a region of a first target video image, where the first target video image includes a video image within a current perspective;
  • a determining module 620 configured to determine a second target video image according to the first information, where the second target video image includes the first target video image;
  • the sending module 630 is configured to send a video data packet corresponding to the second target video image to the client.
  • at least one of the video data packets corresponding to the second target video image carries second information, and the second information is used to indicate spatial information of a region of the second target video image.
  • When the server 600 executes the method shown in FIG. 4, only the receiving module 610 and the sending module 630 may be used.
  • the specific functions of the receiving module 610 and the sending module 630 are as follows:
  • the sending module 630 is configured to send a description file to the client, where the description file carries session description information of at least two sessions, where the at least two sessions are sessions between the client and the server, and the at least The two sessions are used to transmit code stream data of the corresponding sub-region image, and the session description information includes spatial information of the sub-region corresponding to the code stream data of the sub-region image transmitted through the respective sessions.
  • the sub-area is obtained by dividing the area of the panoramic video image, and the sub-area image is the video image within the sub-area;
  • the receiving module 610 is configured to receive first information sent by the client, where the first information is used to indicate the sessions corresponding to the sub-areas covered by the current perspective, and the first information is determined according to the current perspective and the session description information;
  • the sending module 630 is further configured to send code stream data of a target video image to the client, where the target video image includes a video image within a sub-region covered by the current perspective.
  • FIG. 12 is a schematic diagram of a hardware structure of an apparatus for transmitting media data according to an embodiment of the present application.
  • the apparatus 700 shown in FIG. 12 can be regarded as a computer device, and the apparatus 700 can serve as an implementation of the client 500 or the server 600 in the embodiments of the present application, or can be used to perform the method for transmitting media data in the embodiments of the present application.
  • the device 700 includes a processor 710, a memory 720, an input / output interface 730, and a bus 750, and may further include a communication interface 740.
  • the processor 710, the memory 720, the input / output interface 730, and the communication interface 740 implement a communication connection with each other through a bus 750.
  • the processor 710 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits that execute related programs, so as to implement the functions required by the modules in the client or server in the embodiments of the present application, or to perform the methods for transmitting media data in the method embodiments of the present application.
  • the processor 710 may be an integrated circuit chip and has a signal processing capability. In the implementation process, each step of the above method may be completed by using hardware integrated logic circuits or instructions in the form of software in the processor 710.
  • the processor 710 may be a general-purpose processor, a digital signal processor (DSP), an ASIC, a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in combination with the embodiments of the present application may be directly implemented by a hardware decoding processor, or may be performed by using a combination of hardware and software modules in the decoding processor.
  • the software module may be located in a mature storage medium such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, or an electrically erasable programmable memory, a register, and the like.
  • the storage medium is located in the memory 720; the processor 710 reads the information in the memory 720 and, in combination with its hardware, completes the functions required by the modules included in the client or server in the embodiments of the present application, or performs the methods for transmitting media data in the method embodiments of the present application.
  • the memory 720 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM).
  • the memory 720 may store an operating system and other application programs.
  • when software or firmware is used to implement the functions required by the client or server in the embodiments of this application, or to implement the methods for transmitting media data according to the embodiments of this application, the program code implementing the technical solutions provided by the embodiments of this application is stored in the memory 720, and the processor 710 executes the operations required by the modules included in the client or the server, or performs the methods for transmitting media data provided by the method embodiments of the present application.
  • the input / output interface 730 is used to receive input data and information, and output data such as operation results.
  • the communication interface 740 uses a transceiving device, such as but not limited to a transceiver, to implement communication between the apparatus 700 and other devices or a communication network; it can serve as the acquisition module or the sending module in the processing apparatus.
  • the bus 750 may include a path for transmitting information between various components of the device 700 (for example, the processor 710, the memory 720, the input / output interface 730, and the communication interface 740).
  • although the apparatus 700 shown in FIG. 12 only shows the processor 710, the memory 720, the input/output interface 730, the communication interface 740, and the bus 750, those skilled in the art should understand that in a specific implementation the apparatus 700 also includes other components necessary for normal operation, for example a display for presenting the video data to be played. Depending on specific needs, the apparatus 700 may further include hardware devices that implement other additional functions. Conversely, the apparatus 700 may include only the components necessary to implement the embodiments of the present application, and not necessarily all the components shown in FIG. 12.
  • the disclosed systems, devices, and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division into units is only a logical functional division.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, which may be electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.
  • the functional units in the embodiments of the present application may be integrated into one processing unit, or each of the units may exist separately physically, or two or more units may be integrated into one unit.
  • when the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium.
  • the technical solution of this application, in essence the part that contributes over the existing technology, or a part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage media include: USB flash drives, removable hard disks, read-only memories (ROMs), random access memories (RAMs), magnetic disks, optical discs, and other media that can store program code.


Abstract

This application provides a method and apparatus for transmitting media data. The method includes: a client sends first information to a server; the client receives video data packets corresponding to a second target video image sent by the server; the first information is used to indicate spatial information of the region of a first target video image, the first target video image includes the video image within the current viewing angle, the second target video image includes the first target video image, and among the video data packets corresponding to the second target video image at least one video data packet carries second information used to indicate spatial information of the region of the second target video image. This application can reduce the end-to-end latency of video playback and improve user experience.

Description

Method for transmitting media data, client, and server
This application claims priority to Chinese Patent Application No. 201810873806.X, filed with the China National Intellectual Property Administration on August 2, 2018 and entitled "Method for transmitting media data, client, and server", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of streaming media transmission technologies, and more specifically, to a method for transmitting media data, a client, and a server.
Background
The ISO/IEC 23090-2 standard is also known as the OMAF (omnidirectional media format) standard. It defines a media application format that enables the presentation of omnidirectional media in applications, where omnidirectional media mainly refers to omnidirectional video (360-degree video) and the associated audio. The OMAF specification first specifies a list of projection methods that can be used to convert spherical video into two-dimensional video, then how to use the ISO base media file format (ISOBMFF) to store omnidirectional media and its associated metadata, and how to encapsulate and transmit omnidirectional media data in a streaming system, for example through dynamic adaptive streaming over HTTP (DASH) as specified in ISO/IEC 23009-1.
With the rapid development of virtual reality (VR) technology, panoramic video is used ever more widely. VR technology based on 360-degree panoramic video can create a simulated environment that gives users an interactive three-dimensional dynamic visual experience. A panoramic video consists of a series of panoramic images, which may be rendered by a computer or stitched, using a stitching algorithm, from video images captured by multiple cameras from different angles. Generally, when watching a panoramic video, the image content a user sees at any moment occupies only a small part of the whole panoramic image; to save transmission bandwidth, when panoramic images are provided to the user by a remote server, only the content viewed at each moment may be transmitted.
When a user watches a panoramic video through a client, the image seen at each moment occupies only a small part of the whole panoramic image. Therefore, to save transmission bandwidth, when video data is provided to the user by a remote server, only the video content visible within the user's current viewing angle may be transmitted to the client; when the user's viewing angle changes, the video content within the changed viewing angle must be transmitted to the client. How to deliver the video content within the user's current viewing angle in time, while minimizing the transmission delay, is therefore a problem to be solved.
Summary
This application provides a method for transmitting media data, a client, and a server, to reduce the delay in transmitting the video content within the current viewing angle to the client.
According to a first aspect, a method for transmitting media data is provided, including: a client sends first information to a server, where the first information is used to indicate spatial information of the region of a first target video image; the client receives video data packets corresponding to a second target video image sent by the server, where among the video data packets corresponding to the second target video image, at least one video data packet collectively carries second information, and the second information is used to indicate spatial information of the region of the second target video image.
The second target video image includes the first target video image, and the first target video image includes the video image within the current viewing angle.
It should be understood that "the second target video image includes the first target video image" may mean that the second target video image includes only the first target video image, or that it additionally includes other video images. In other words, besides the first target video image, the server may also send other video images to the client.
The current viewing angle may be the viewing angle at which the user watches the video through the client; when the user's viewing angle changes, the client may send new first information to the server to request the video content within the new viewing angle.
It should be understood that "at least one video data packet collectively carries the second information" may mean that each of the at least one video data packet carries at least a part of the second information, and the at least one video data packet as a whole carries the second information. In addition, while collectively carrying the second information, the at least one video data packet may also carry code stream data, i.e., the data obtained by encoding the video image.
It should be understood that the region of the first target video image may be the region exactly covered or occupied by the first target video image; that is, all video content within the region belongs to the first target video image, and all of the first target video image lies within the region. The second target video image satisfies an analogous requirement.
The spatial information of the region of the first target video image, which may also be called region spatial information, indicates the spatial range or spatial position of that region. The spatial position may be expressed with respect to a coordinate system, which may be three-dimensional or two-dimensional. For example, when a three-dimensional coordinate system is used, its origin may be the center point of the panoramic video image, the top-left corner of the panoramic video image, or another fixed point in the panoramic video image. Alternatively, the spatial position of the first target video image may be its position within the panoramic video image region (in which case a coordinate system other than a three-dimensional Cartesian one, such as a spherical coordinate system, may be used).
Optionally, the first target video image or the second target video image is a partial video image of the panoramic video image.
In this application, once the server has obtained the first information characterizing the video content within the current viewing angle, it can send the second target video image the user needs, together with the position-related information of the second target video image, to the client in real time, which reduces the delay for the client to obtain that position-related information. Specifically, directly carrying the second information in the video data packets corresponding to the target video image enables the client to obtain the position-related information of the second target video image faster, reducing transmission delay.
In some implementations of the first aspect, the first target video image is the video image within the current viewing angle, and the first information includes spatial information of the current viewing angle (which may also be called region spatial information of the current viewing angle). In this case the client directly reports the spatial information of the current viewing angle through the first information; the server determines the second target video image according to it and directly carries the spatial position information of the target video image in the video data packets sent to the client.
Optionally, the spatial information of the current viewing angle includes spatial position information of the current viewing angle, which may be any of the following: the spherical coordinates of the center point of the spherical region corresponding to the current viewing angle; the spherical coordinates of the top-left corner of that spherical region; the planar coordinates of the center point of the planar region corresponding to the current viewing angle; or the planar coordinates of the top-left corner of that planar region.
Optionally, the spherical coordinates of the center point of the spherical region corresponding to the current viewing angle are (X, Y, Z), where X corresponds to the azimuth (or yaw) of the spherical coordinates, Y corresponds to the pitch (or elevation), and Z corresponds to the tilt (or roll). Optionally, the planar coordinates of the center point (or the top-left corner) of the planar region corresponding to the current viewing angle are (X, Y), the horizontal and vertical coordinates of that point in a two-dimensional Cartesian coordinate system. It should be understood that the spatial position information may also be the two-dimensional coordinates of the top-right corner, bottom-left corner, bottom-right corner, or any other agreed position of the planar region corresponding to the current viewing angle.
In some implementations of the first aspect, the first target video image is the video image within the current viewing angle, and the first information includes viewing angle information of the current viewing angle. Optionally, the viewing angle information includes spatial information of the current viewing angle and viewpoint information of the current viewing angle, where the spatial information includes at least one of direction information of the current viewing angle, direction change speed information of the current viewing angle, and coverage information of the current viewing angle, and the viewpoint information includes at least one of viewpoint position information, viewpoint displacement speed information, and viewpoint displacement acceleration information of the current viewing angle.
In some implementations of the first aspect, the method further includes: the client sends first viewpoint information to the server, where the first viewpoint information indicates the current viewpoint at which the current viewing angle is located. By reporting the viewpoint of the current viewing angle, the client can obtain from the server the video image matching the current viewpoint, improving the viewing experience. The first viewpoint information may include various information about the current viewpoint; optionally, it includes at least one of the spatial position information of the current viewpoint, the position change speed information of the current viewpoint, and the acceleration information of the position change speed of the current viewpoint.
When the client reports the spatial position of the current viewpoint, the server can deliver the video image of the viewing-angle region corresponding to the current viewpoint, making it convenient for the client to obtain the matching video image. When the client reports the position change speed and/or acceleration of the current viewpoint, the server can predictively render and prefetch the video images about to be sent to the client, which lowers the delay of delivering video images to the client and improves user experience.
Specifically, the spatial position information of the current viewpoint may be the coordinates of the position of the current viewpoint, expressed in a three-dimensional coordinate system of any type (for example Cartesian or spherical). The position of the current viewpoint may change over time, and the position change speed information indicates how fast it changes.
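Where the viewing-angle center is reported in spherical coordinates as above, a receiver typically converts it to a direction vector before selecting the content to deliver. The following is a minimal sketch of that conversion, assuming angles in degrees and an illustrative axis convention; the function name and convention are assumptions for illustration, not part of this application.

```python
import math

# Convert a viewing-angle center given as (azimuth, elevation) in degrees
# to a unit direction vector; the axis convention here is an assumption.
def fov_center_to_vector(azimuth_deg: float, elevation_deg: float):
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    x = math.cos(el) * math.cos(az)
    y = math.cos(el) * math.sin(az)
    z = math.sin(el)
    return (x, y, z)

direction = fov_center_to_vector(-45.0, -45.0)  # center at azimuth/elevation -45
```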
In some implementations of the first aspect, the first viewpoint information includes the spatial position information of the current viewpoint, and the method further includes: the client sends first indication information to the server, where the first indication information includes a first flag whose value indicates whether the spatial position information of the current viewpoint is relative or absolute: when the first flag takes a first value, the spatial position information is relative; when it takes a second value, it is absolute.
When relative, the spatial position information of the current viewpoint may be the position of the current viewpoint relative to the starting viewpoint, a designated viewpoint, or the previous viewpoint. When absolute, it may be the position of the current viewpoint relative to some fixed, pre-configured coordinate system.
It should be understood that the first flag may occupy one bit or multiple bits. When it occupies one bit, the first and second values may be 0 and 1, or 1 and 0, respectively.
The spatial information of the current viewing angle may contain only the direction information of the current viewing angle, or both the direction information and the position information of the viewpoint at which the current viewing angle is located. The first indication information may be carried in a real-time transport control protocol (RTCP) source description report sent by the client to the server. In this application, when the spatial position information of the current viewpoint is relative, the amount of data reported for the viewpoint is reduced, saving resource overhead.
To better indicate the composition of the spatial information of the current viewing angle, the client may send indication information to the server so that the server can obtain that spatial information accurately. In some implementations of the first aspect, the method further includes: the client sends second indication information to the server, where the second indication information includes a second flag whose value indicates the composition of the spatial information of the current viewing angle, covering at least one of the following cases: when the second flag takes a third value, the spatial information consists of the direction information of the current viewing angle; when it takes a fourth value, the spatial information consists of the direction information of the current viewing angle and the position information of the viewpoint at which the current viewing angle is located. Optionally, when the second flag takes a fifth value, the spatial information consists of the direction information, the viewpoint position information, and the size information of the current viewing angle (which may also be called coverage size information).
It should be understood that the spatial information of the current viewing angle may contain at least one of the direction information, the viewpoint position information, and the size information of the current viewing angle, and different values of the second flag may indicate any combination of these. For example: value X indicates that the spatial information includes the size information of the current viewing angle; value Y, the viewpoint position information; value Z, the viewpoint position information and the size information; value W, the viewpoint position information, the size information, and the direction information.
Optionally, the second indication information may be carried in an RTCP source description report sent by the client to the server.
In some implementations of the first aspect, the client sends third indication information to the server, where the third indication information includes a third flag whose value indicates whether the direction information of the current viewing angle is absolute or relative: when the third flag takes a sixth value, the direction information is absolute; when it takes a seventh value, it is relative. The third indication information may be carried in an RTCP source description report sent by the client to the server. When absolute, the direction information may be expressed with respect to some fixed coordinate system; when relative, it may be expressed with respect to a previous direction (for example, the deflection from the previous viewing direction, or from the initial viewing direction). In this application, using relative direction information reduces the amount of reported data and saves resource overhead.
Optionally, the first information, the first viewpoint information, the first indication information, and the second indication information are all carried in an RTCP source description report sent by the client to the server, so that all of them can be delivered in a single report, reducing the number of client-server interactions and the resource overhead.
In some implementations of the first aspect, the spatial information of the current viewing angle includes region range information of the current viewing angle. Optionally, the region range information includes the azimuth range (yaw range) and the elevation range of the current viewing angle, or the width and height of the two-dimensional region corresponding to the current viewing angle. For example, the region range information may be H and V, denoting the azimuth range and the elevation range in VR spherical coordinates. It should be understood that the region range of the user's viewing angle may also be fixed, in which case the client only needs to report it once; when the viewing angle changes again, the client only needs to report the position information of the current viewing angle, without repeating the region range information.
In some implementations of the first aspect, the panoramic video image includes at least two sub-regions obtained by dividing the panoramic video image, the current viewing angle covers at least one sub-region, the first information indicates the sub-regions covered by the current viewing angle, and those sub-regions are stitched to form the region of the first target video image. When the first information indicates the covered sub-regions, the client directly reports which sub-regions the current viewing angle covers, so after receiving the first information the server can directly determine the second target video image, which reduces the complexity on the server side. It should be understood that, in this application, the sub-regions covered by the current viewing angle include both sub-regions fully covered and sub-regions partially covered by the region of the current viewing angle. For example, if the region of the current viewing angle fully covers sub-region 1 and partially covers sub-regions 2 and 3, the covered sub-regions are sub-regions 1, 2, and 3.
In some implementations of the first aspect, the second information includes spatial information of the region of the second target video image. In some implementations, this spatial information includes spatial position information of the region where the second target video image is located, which is at least one of the following: the spherical coordinates of the center point of the spherical region corresponding to the second target video image; the spherical coordinates of the top-left corner of that spherical region; the planar coordinates of the center point of the planar region corresponding to the second target video image; or the planar coordinates of the top-left corner of that planar region. Optionally, the spherical coordinates of the center point are (X, Y, Z), where X corresponds to azimuth (yaw), Y to pitch (elevation), and Z to tilt (roll); alternatively, two-dimensional coordinates (X, Y) in a two-dimensional Cartesian coordinate system may be used for the center point or the top-left corner, or for the top-right, bottom-left, bottom-right, or any other agreed position.
In some implementations of the first aspect, the spatial information of the second target video image includes region range information of the second target video image. Optionally, this includes the azimuth range (yaw range) and the elevation range of the region of the second target video image, for example H and V in VR spherical coordinates, or the width and height of the corresponding two-dimensional video image.
In some implementations of the first aspect, the panoramic video image includes at least two sub-regions obtained by dividing the panoramic video image, the current viewing angle covers at least one sub-region, the second information indicates the sub-regions covered by the second target video image, and those sub-regions are stitched to form the region of the second target video image.
In some implementations of the first aspect, the second information includes at least one piece of third information; each of the at least one video data packet carries a piece of third information, the at least one video data packet collectively carries the at least one piece of third information, and the third information carried in any one video data packet indicates the sub-region to which the video image corresponding to that packet belongs. For example, if there are 100 video data packets corresponding to the second target video image and the second information contains only one piece of third information, the third information may be carried only in the 1st or the 100th packet; if the second information contains 10 pieces of third information, they may be carried in any 10 of the 100 packets.
In some implementations of the first aspect, the at least one video data packet includes video data packets carrying a viewing-angle identifier. Any of the packets may carry the identifier, which indicates the viewing angle of the video image corresponding to that packet; the identifier may be bound to a sub-region ID, i.e., there is a correspondence between sub-region IDs and viewing-angle identifiers. The identifier allows the client to distinguish packets of different viewing angles, facilitating the stitching of the video images of the packets within one viewing angle.
Optionally, the second target video image is identical to the first target video image; when the second target video image contains only the first target video image, the bandwidth occupied by transmitting the target video image is reduced. In some implementations of the first aspect, the second target video image additionally includes video images other than the first target video image; in that case the client can display video images outside the current viewing angle as well as within it, so that the user still sees video while turning suddenly (or abruptly changing the viewing angle). Optionally, the first target video image further includes the panoramic video image: displaying the panoramic video image in addition to the first video image corresponding to the current viewing angle acts as a buffer so that the user does not lose the picture when turning quickly (for example, turning the head quickly). Optionally, the image quality of the panoramic video image is lower than that of the first target video image; when the panoramic video image must be transmitted, transmitting it at lower quality reduces the amount of transmitted data and the bandwidth occupied.
In some implementations of the first aspect, the method further includes: the client receives a description file sent by the server, where the description file carries first viewing-angle information or second viewing-angle information; the first viewing-angle information indicates the maximum region range of viewing angles supported by the server, and the second viewing-angle information indicates the region range of an initial viewing angle. Through the description file the client obtains the supported viewing-angle range or the initial viewing angle of the server, so that after later receiving video data packets it can stitch the decoded video images accordingly. It should be understood that the range of the region of the first target video image indicated by the first information should lie within the maximum region range indicated by the first viewing-angle information.
When the description file carries the second viewing-angle information, the client may present video images according to the initial viewing angle after starting up, and then present video images according to the user's current viewing angle. For example, after startup the client presents the video within viewing angle 1 (the initial viewing angle); if the user wants to watch the video within viewing angle 2 (the user's current viewing angle), the client then switches from viewing angle 1 to viewing angle 2 and presents the latter. The initial viewing angle may be preset, and its video is presented first every time the client starts. Optionally, the region range information of the initial viewing angle includes the azimuth range (yaw range) and elevation range of the region of the initial viewing angle, or the width and height of the corresponding planar region; for example, H and V denoting the azimuth and elevation ranges in VR spherical coordinates.
Optionally, besides indicating the region range of the initial viewing angle, the second viewing-angle information may also indicate the region range of a default viewing angle. When the server has not obtained the client's viewing-angle information or the video image information corresponding to the current viewing angle, it may directly send the video within the default viewing angle to the client, so that the client presents the video within the default range.
Optionally, before the client receives the description file, the method further includes: the client sends a video description command to the server, which triggers the server to send the description file.
In some implementations of the first aspect, when the description file carries the second viewing-angle information, it also carries third viewing-angle information indicating the spatial position of the initial viewing angle. Optionally, the third viewing-angle information is any of the following: the spherical coordinates of the center point of the spherical region corresponding to the initial viewing angle; the spherical coordinates of the top-left corner of that spherical region; the planar coordinates of the center point of the planar region corresponding to the initial viewing angle; or the planar coordinates of the top-left corner of that planar region. The spherical coordinates may be (X, Y, Z), where X corresponds to azimuth (yaw), Y to pitch (elevation), and Z to tilt (roll); the planar coordinates (X, Y) are the horizontal and vertical coordinates of the corresponding point in a two-dimensional Cartesian coordinate system.
In some implementations of the first aspect, the method further includes: the client receives a description file sent by the server, where the panoramic video image includes at least two sub-regions obtained by dividing it, the description file carries sub-region description information of each sub-region, each sub-region is a sub-region of the region of the panoramic video image, and the sub-region description information includes spatial information of the sub-region. Carrying the sub-region description information in the description file allows the client, after receiving the video data packets of the target video image, to stitch the video images according to the information of each sub-region and so obtain the video content within the current viewing angle. Optionally, before receiving the description file the client sends a video description command to the server, which triggers the server to send it.
In some implementations of the first aspect, the sub-region description information includes planar spatial information of each sub-region and mapping type information of the video image within each sub-region; the spherical spatial information of each sub-region is determined according to the mapping type information and the planar spatial information. Optionally, the planar spatial information of a sub-region is the two-dimensional coordinates of its center point or of its top-left corner, namely (X, Y), the horizontal and vertical coordinates in a two-dimensional Cartesian coordinate system; the top-right, bottom-left, bottom-right, or any other agreed position may also be used. Optionally, the mapping type indicated by the mapping type information is any one of an equirectangular (latitude-longitude) map, a cube map, or an octahedron map.
In some implementations of the first aspect, the sub-region description information includes spherical spatial information of each sub-region and shape information of each sub-region. Optionally, the spherical spatial information of a sub-region may be expressed by the azimuth, elevation, and tilt of its center point. The shape information describes the shape type of the sub-region, for example enclosed by four great circles, or by two great circles and one small circle, among others.
In some implementations of the first aspect, the three-dimensional spatial information of a sub-region includes the mapping type of the sub-region image, the shape information of the sub-region, the angle information of the sub-region, and the region range information of the sub-region. Optionally, the description file further includes panoramic stream description information, which includes the mapping type information and size information of the panoramic video image.
Optionally, the first information is carried in a real-time transport control protocol (RTCP) source description report sent by the client to the server, and the video data packets of the target video image are real-time transport protocol (RTP) video data packets. Optionally, the first information and the second information are information in a custom TLV format. Optionally, the first information and the second information are information in a signaling format for multimedia transmission applications defined in MPEG Media Transport (MMT). Specifically, the information about the current viewpoint exchanged between the client and the server may be information in a real-time transport protocol, information in a custom TLV format, or information in the MMT-defined format for multimedia transmission applications.
In some implementations of the first aspect, the at least one video data packet collectively carries second viewpoint information, which indicates the viewpoint corresponding to the second target video image. Optionally, the second viewpoint information may include various information about that viewpoint: assuming the viewpoint corresponding to the second target video image is a first viewpoint, the second viewpoint information may include at least one of the spatial position information of the first viewpoint, the position change speed information of the first viewpoint, and the acceleration information of the position change speed of the first viewpoint. The second viewpoint information is analogous to the first viewpoint information and may contain the same kinds of information; see the description of the first viewpoint information, which is not repeated here. Carrying the viewpoint corresponding to the target video image to be displayed in the video data packets makes it convenient for the client to present the target video image at the corresponding viewpoint, improving the display effect.
According to a second aspect, a method for transmitting media data is provided, including: a client receives a description file sent by a server, where the description file carries session description information of at least two sessions, the at least two sessions are sessions between the client and the server and are used to transmit the code stream data of the respective corresponding sub-region images, and the session description information includes spatial information of the sub-regions corresponding to the code stream data of the sub-region images transmitted over each session, where the sub-regions are obtained by dividing the region of the panoramic video image and a sub-region image is the video image within a sub-region; the client sends first information to the server, where the first information indicates the sessions corresponding to the sub-regions covered by the current viewing angle and is determined according to the current viewing angle and the session description information; the client receives the code stream data of a target video image sent by the server, where the target video image includes the images of the sub-regions to which the current viewing angle belongs.
The current viewing angle may be the viewing angle at which the user watches the video through the client; when it changes, the client may send new first information to request the video content within the new viewing angle. Optionally, the target video image is the panoramic video image. It should be understood that the at least two sessions are all or some of the sessions established between the client and the server.
In this application, obtaining the description file carrying the session description information of each session before receiving the target video image makes it convenient for the client to process the received code stream data of the target video image according to the description information. It should be understood that, besides using the at least two sessions to transmit the code stream data of their respective sub-region images, one session may also be used to transmit the video images of all sub-regions. For example, if the target video image contains the video images within sub-region 1 (session 1), sub-region 2 (session 2), and sub-region 3 (session 3), sessions 1, 2, and 3 may transmit them respectively, or a single session (any one of the three) may transmit the video images of sub-regions 1 to 3.
In some implementations of the second aspect, receiving the code stream data of the target video image includes: the client receives, through the sessions corresponding to the sub-regions covered by the current viewing angle, the code stream data of the covered sub-region images, so as to obtain the code stream data of the target video image.
In some implementations of the second aspect, the region spatial information of a sub-region is planar region spatial information, and the session description information further includes mapping type information of the code stream data of the sub-region images transmitted over each session. In some implementations, the region spatial information of a sub-region is spherical region spatial information, and the session description information further includes shape information of the sub-regions corresponding to the code stream data transmitted over each session.
In some implementations of the second aspect, the first information is carried in an RTCP source description report sent by the client to the server. In some implementations, the first information is in TLV format. It should be understood that the limitations and explanations of the corresponding content in the implementations of the first aspect also apply to the implementations of the second aspect.
According to a third aspect, a method for transmitting media data is provided, including: a server receives first information sent by a client, where the first information indicates the spatial position of the region of a first target video image and the first target video image includes the video image within the current viewing angle; the server determines a second target video image according to the first information, where the second target video image includes the first target video image; the server sends video data packets corresponding to the second target video image to the client, where among those packets at least one video data packet collectively carries second information indicating spatial information of the region of the second target video image.
In this application, once the server has obtained the first information characterizing the video content within the current viewing angle, it can send the second target video image the user needs, together with its position-related information, to the client in real time, reducing the delay for the client to obtain that information.
The implementations of the third aspect mirror, on the server side, the implementations of the first aspect: the first information may include the spatial information or the viewing angle information of the current viewing angle; the server may receive the first viewpoint information indicating the current viewpoint and, by predictively rendering and prefetching based on the reported viewpoint position, speed, and acceleration, lower the delivery delay and improve user experience; the server may receive the first, second, and third indication information described for the first aspect, carried together with the first information and the first viewpoint information in a single RTCP source description report so that all of them are obtained from one report, reducing the number of client-server interactions; the panoramic video image may be divided into at least two sub-regions, with the first information indicating the sub-regions covered by the current viewing angle (stitched to form the region of the first target video image) and the second information indicating the sub-regions covered by the second target video image (stitched to form its region); the second information may consist of pieces of third information distributed over the packets, each indicating the sub-region of the video image in the packet carrying it; the packets may carry viewing-angle identifiers; the server may send the description files carrying the first, second, and third viewing-angle information, or the sub-region description information (planar spatial information with mapping type information, from which the spherical spatial information is determined, or spherical spatial information with shape information); and the at least one video data packet may collectively carry the second viewpoint information indicating the viewpoint corresponding to the second target video image, making it convenient for the client to present the target video image at the corresponding viewpoint and improving the display effect. The limitations and explanations of the corresponding content in the implementations of the first aspect also apply to the implementations of the third aspect.
According to a fourth aspect, a method for transmitting media data is provided, including: a server sends a description file to a client, where the description file carries session description information of at least two sessions, the at least two sessions are sessions between the client and the server and are used to transmit the code stream data of the respective corresponding sub-region images, and the session description information includes spatial information of the sub-regions corresponding to the code stream data of the sub-region images transmitted over each session, where the sub-regions are obtained by dividing the region of the panoramic video image and a sub-region image is the video image within a sub-region; the server receives first information sent by the client, where the first information indicates the sessions corresponding to the sub-regions covered by the current viewing angle and is determined according to the current viewing angle and the session description information; the server sends the code stream data of a target video image to the client, where the target video image includes the video image within the sub-regions covered by the current viewing angle.
It should be understood that the at least two sessions are all or some of the sessions established between the client and the server. In this application, by sending the client the description file carrying the session description information of each session before transmitting the target video image, the server makes it convenient for the client to process the received code stream data according to the description information. The implementations of the fourth aspect mirror those of the second aspect: the client receives the code stream data of the target video image through the sessions corresponding to the covered sub-regions; the region spatial information of a sub-region may be planar (with mapping type information in the session description information) or spherical (with shape information); and the first information may be carried in an RTCP source description report sent by the client to the server, or be in TLV format. The limitations and explanations of the corresponding content in the implementations of the first aspect also apply to the implementations of the fourth aspect.
According to a fifth aspect, a client is provided, including modules for performing the method in any implementation of the first or second aspect. It should be understood that the client is a device capable of presenting video images to a user.
According to a sixth aspect, a server is provided, including modules for performing the method in any implementation of the third or fourth aspect. It should be understood that the server is a device capable of storing video images and providing them to a client, so that the client can present the video images to the user.
According to a seventh aspect, a client is provided, including a non-volatile memory and a processor coupled to each other, where the processor is configured to invoke program code stored in the memory to perform some or all steps of the method in any implementation of the first or second aspect.
According to an eighth aspect, a server is provided, including a non-volatile memory and a processor coupled to each other, where the processor is configured to invoke program code stored in the memory to perform some or all steps of the method in any implementation of the third or fourth aspect.
According to a ninth aspect, a computer-readable storage medium is provided, storing program code that includes instructions for performing some or all steps of the method in any implementation of the first to fourth aspects.
According to a tenth aspect, a computer program product is provided which, when run on a computer, causes the computer to perform some or all steps of the method in any implementation of the first to fourth aspects.
Brief Description of the Drawings
FIG. 1 to FIG. 9 are schematic flowcharts of methods for transmitting media data according to embodiments of the present application;
FIG. 10 is a schematic block diagram of a client according to an embodiment of the present application;
FIG. 11 is a schematic block diagram of a server according to an embodiment of the present application;
FIG. 12 is a schematic diagram of the hardware structure of an apparatus for transmitting media data according to an embodiment of the present application.
Detailed Description
The technical solutions in this application are described below with reference to the accompanying drawings.
FIG. 1 is a schematic flowchart of a method for transmitting media data according to an embodiment of the present application. The method shown in FIG. 1 may be performed by a client, which may be a program on a terminal device that provides a video playback service; the terminal device may be a device capable of playing panoramic video, for example a VR device. The method shown in FIG. 1 includes steps 110 and 120, described in detail below.
110. The client sends first information to the server.
The first information is information used to determine the video content within the current viewing angle. It may specifically indicate spatial information of the region of a first target video image, where the first target video image includes the video image within the current viewing angle. Optionally, the current viewing angle may be the viewing angle at which the user watches the video through the client. The client requests the video content within the current viewing angle by sending the first information; if the user's viewing angle changes, the client sends first information again to request the video content within the new viewing angle.
120. The client receives video data packets corresponding to a second target video image sent by the server.
Among the video data packets corresponding to the second target video image, at least one video data packet collectively carries second information, which indicates spatial information of the region of the second target video image; the second target video image includes the first target video image. As explained above, the second target video image may include only the first target video image or also other video images, i.e., besides the first target video image the server may send the client other video images as well; "collectively carries" means that each of the at least one packet carries at least a part of the second information and the packets as a whole carry it, possibly together with code stream data, i.e., the data obtained by encoding the video image; the region of a target video image is the region it exactly covers or occupies; and the spatial information (region spatial information) of a region indicates its spatial range or position with respect to a two- or three-dimensional coordinate system whose origin may be the center point, the top-left corner, or another fixed point of the panoramic video image, or its position within the panoramic video image region expressed, for example, in spherical coordinates.
Optionally, the second information includes at least one piece of third information; each of the at least one video data packet carries a piece of third information, the packets collectively carry the at least one piece of third information, and the third information carried in any packet indicates the sub-region to which the video image corresponding to that packet belongs. For example, if there are 100 packets corresponding to the second target video image and the second information contains only one piece of third information, it may be carried only in the 1st or the 100th packet; if the second information contains 10 pieces, they may be carried in any 10 of the 100 packets.
In this application, once the server has obtained the information characterizing the video content within the current viewing angle, it can send the target video image the user needs, together with its position-related information, to the client in real time, reducing the delay for the client to obtain that information.
Optionally, as an embodiment, the first target video image is the video image within the current viewing angle, and the first information includes the spatial information (region spatial information) of the current viewing angle; the client thereby directly reports the spatial information of the current viewing angle, the server determines the second target video image according to it and directly carries the spatial position information of the target video image in the video data packets sent to the client.
Optionally, the spatial information of the current viewing angle includes its spatial position information, which covers the following cases:
(1) the spherical coordinates of the center point of the current viewing angle, for example (X, Y, Z), where X corresponds to azimuth (yaw), Y to pitch (elevation), and Z to tilt (roll);
(2) the two-dimensional coordinates (X, Y) of the center point of the two-dimensional viewing-angle region corresponding to the current viewing angle, i.e., its horizontal and vertical coordinates in a two-dimensional Cartesian coordinate system;
(3) the two-dimensional coordinates (X, Y) of the top-left/top-right/bottom-left/bottom-right corner of that two-dimensional region.
Optionally, the spatial information of the current viewing angle includes its region range information, covering the following cases:
(4) the azimuth range (yaw range) and elevation range of the current viewing angle, for example an azimuth range of 110 degrees and an elevation range of 90 degrees;
(5) the coverage of the current viewing angle given as the width and height of the corresponding two-dimensional viewing-angle region.
It should be understood that the coverage of the user's viewing angle may also be fixed, in which case the client only needs to report the region range information once; when the viewing angle changes again, it only needs to report the position information of the current viewing angle, without repeating the region range information.
Optionally, the spatial information of the region of the second target video image includes the spatial position information of the region where it is located, covering the following cases:
(6) the spherical coordinates (X, Y, Z) of the center point of the second target video image, with X, Y, Z as in case (1);
(7) the two-dimensional coordinates (X, Y) of the center point of the two-dimensional image corresponding to the second target video image;
(8) the two-dimensional coordinates (X, Y) of the top-left/top-right/bottom-left/bottom-right corner of that two-dimensional image.
Optionally, the spatial information of the second target video image includes its region range information, covering the following cases:
(9) the coverage of the second target video image given as its azimuth range (yaw range) and elevation range, for example H and V denoting the azimuth and elevation ranges in VR spherical coordinates;
(10) the coverage given as the width and height of the corresponding two-dimensional video image.
Optionally, as an embodiment, the panoramic video image includes at least two sub-regions obtained by dividing it, the current viewing angle covers at least one sub-region, the second information indicates the sub-regions covered by the second target video image, and those sub-regions are stitched to form the region of the second target video image.
Optionally, the at least one video data packet includes packets carrying a viewing-angle identifier. Any of the packets may carry the identifier, which indicates the viewing angle of the video image corresponding to that packet, and the identifier may be bound to a sub-region ID, i.e., there is a correspondence between sub-region IDs and viewing-angle identifiers. The identifier lets the client distinguish packets of different viewing angles, facilitating the stitching of the video images within one viewing angle.
Optionally, the second target video image is identical to the first target video image; when it contains only the first target video image, the bandwidth occupied by transmission is reduced. Optionally, as an embodiment, the second target video image also includes other video images beyond the first target video image, so that the client can display video outside the current viewing angle as well and the user still sees video while turning suddenly (or abruptly changing the viewing angle). Optionally, the first target video image further includes the panoramic video image, which acts as a buffer when the user turns quickly (for example, turns the head quickly), so that the video content does not disappear on a sudden turn; optionally, the image quality of the panoramic video image is lower than that of the first target video image, which reduces the amount of transmitted data and the bandwidth occupied.
Optionally, as an embodiment, the method shown in FIG. 1 further includes: the client receives a description file sent by the server, carrying first viewing-angle information (indicating the maximum region range of viewing angles supported by the server) or second viewing-angle information (indicating the region range of the initial viewing angle). As described above, this lets the client stitch the decoded video images according to the supported range and the initial viewing angle; the region of the first target video image indicated by the first information should lie within the maximum region range; when the description file carries the second viewing-angle information, the client presents the initial viewing angle after startup and then switches to the user's current viewing angle (for example, from initial viewing angle 1 to the user's viewing angle 2, presenting the video within viewing angle 2); the initial viewing angle may be preset and is presented first at every startup; its region range information may include the azimuth range (yaw range) and elevation range of its region, or the width and height of the corresponding planar region, for example H and V in VR spherical coordinates; the second viewing-angle information may also indicate the region range of a default viewing angle, whose video the server sends directly when it has not obtained the client's viewing-angle information, so that the client presents the video within the default range; and before receiving the description file the client may send a video description command, which triggers the server to send it.
Optionally, as an embodiment, when the description file carries the second viewing-angle information it also carries third viewing-angle information indicating the spatial position of the initial viewing angle, which is any of the following (11) to (14):
(11) the spherical coordinates of the center point of the spherical region corresponding to the initial viewing angle;
(12) the spherical coordinates of the top-left corner of that spherical region;
(13) the planar coordinates of the center point of the planar region corresponding to the initial viewing angle;
(14) the planar coordinates of the top-left corner of that planar region.
For example, the spherical coordinates of the center point are (X, Y, Z), where X corresponds to azimuth (yaw), Y to pitch (elevation), and Z to tilt (roll); the planar coordinates (X, Y) of the center point or the top-left corner are the horizontal and vertical coordinates of that point in a two-dimensional Cartesian coordinate system.
Optionally, as an embodiment, the method shown in FIG. 1 further includes: the client receives a description file sent by the server, where the panoramic video image includes at least two sub-regions obtained by dividing it and the description file carries the sub-region description information of each sub-region, each sub-region being a sub-region of the region of the panoramic video image and the sub-region description information including the spatial information of the sub-region. This allows the client, after receiving the video data packets of the target video image, to stitch the video images according to the information of each sub-region and obtain the video content within the current viewing angle. Optionally, before receiving the description file the client sends a video description command to the server, which triggers the server to send it.
Optionally, as an embodiment, the sub-region description information includes the planar spatial information of each sub-region and the mapping type information of the video image within each sub-region, with the spherical spatial information of each sub-region determined from the mapping type information and the planar spatial information; the planar spatial information may be the two-dimensional coordinates (X, Y) of the center point or top-left corner of the sub-region, i.e., the horizontal and vertical coordinates in a two-dimensional Cartesian coordinate system (the top-right, bottom-left, bottom-right, or any other agreed position may also be used); and the mapping type indicated may be any of an equirectangular map, a cube map, or an octahedron map.
Optionally, as an embodiment, the sub-region description information includes the spherical spatial information of each sub-region (which may be expressed by the azimuth, elevation, and tilt of its center point) and the shape information of each sub-region (for example enclosed by four great circles, or by two great circles and one small circle). Optionally, as an embodiment, the three-dimensional spatial information of a sub-region includes the mapping type of the sub-region image, the shape information, the angle information, and the region range information of the sub-region. Optionally, the description file further includes panoramic stream description information containing the mapping type information and size information of the panoramic video image.
Optionally, the first information is carried in an RTCP source description report sent by the client to the server, and the video data packets of the target video image are RTP video data packets. Optionally, the first information and the second information are information in a custom TLV format.
FIG. 2 is a schematic flowchart of a method for transmitting media data according to an embodiment of the present application. Like FIG. 1, the method shown in FIG. 2 may be performed by a client. The limitations and explanations of the corresponding content of the method in FIG. 1 also apply to the method in FIG. 2; repeated description is omitted below for brevity. The method shown in FIG. 2 includes steps 210 to 230, described in detail below.
210. The client receives a description file sent by the server.
The description file carries session description information of at least two sessions; the at least two sessions are sessions between the client and the server and are used to transmit the code stream data of the respective corresponding sub-region images; the session description information includes spatial information of the sub-regions corresponding to the code stream data of the sub-region images transmitted over each session; the sub-regions are obtained by dividing the region of the panoramic video image, and a sub-region image is the video image within a sub-region.
220. The client sends first information to the server.
The first information indicates the sessions corresponding to the sub-regions covered by the current viewing angle and is determined according to the current viewing angle and the session description information.
230. The client receives the code stream data of the target video image sent by the server.
The target video image includes the images of the sub-regions to which the current viewing angle belongs. The current viewing angle may be the viewing angle at which the user watches the video through the client; when it changes, the client may send new first information to request the video content within the new viewing angle. Optionally, the target video image is the panoramic video image.
In this application, obtaining the description file carrying the session description information of each session before receiving the target video image makes it convenient for the client to process the received code stream data according to the description information. Besides using the at least two sessions to transmit the code stream data of their respective sub-region images, a single session may also be used to transmit the video images of all sub-regions; for example, if the target video image contains the video images within sub-region 1 (session 1), sub-region 2 (session 2), and sub-region 3 (session 3), sessions 1 to 3 may transmit them respectively, or any one of the three sessions may transmit the video images of sub-regions 1 to 3.
Optionally, as an embodiment, receiving the code stream data of the target video image includes: the client receives, through the sessions corresponding to the sub-regions covered by the current viewing angle, the code stream data of the covered sub-region images, so as to obtain the code stream data of the target video image. Optionally, as an embodiment, the region spatial information of a sub-region is planar region spatial information and the session description information further includes the mapping type information of the code stream data of the sub-region images transmitted over each session; or the region spatial information is spherical region spatial information and the session description information further includes the shape information of the corresponding sub-regions. Optionally, as an embodiment, the first information is carried in an RTCP source description report sent by the client to the server; optionally, the first information is in TLV format.
The method for transmitting media data has been introduced above from the client side with reference to FIGS. 1 and 2; it is introduced below from the server side with reference to FIGS. 3 and 4. It should be understood that the method in FIG. 3 corresponds to the method in FIG. 1 and the method in FIG. 4 corresponds to the method in FIG. 2; repeated description is omitted below to avoid repetition.
FIG. 3 is a schematic flowchart of a method for transmitting media data according to an embodiment of the present application. The method shown in FIG. 3 may be performed by a server and includes steps 310 to 330, described below.
310. The server receives first information sent by the client, where the first information indicates the spatial position of the region of a first target video image and the first target video image includes the video image within the current viewing angle.
320. The server determines a second target video image according to the first information, where the second target video image includes the first target video image.
330. The server sends video data packets corresponding to the second target video image to the client, where among those packets at least one video data packet collectively carries second information indicating spatial information of the region of the second target video image.
In this application, once the server has obtained the first information characterizing the video content within the current viewing angle, it can send the second target video image the user needs, together with its position-related information, to the client in real time, reducing the delay for the client to obtain that information. The optional embodiments of the method in FIG. 3 mirror, on the server side, those of the method in FIG. 1: the first target video image may be the video image within the current viewing angle, with the first information including the spatial information of the current viewing angle; the panoramic video image may be divided into at least two sub-regions, with the first information indicating the sub-regions covered by the current viewing angle (stitched to form the region of the first target video image) and the second information including the spatial information of the region of the second target video image or indicating the sub-regions covered by it (stitched to form its region); the second information may consist of pieces of third information distributed over the packets, each indicating the sub-region of the video image in the packet carrying it; the packets may carry viewing-angle identifiers; and the server may send the description files carrying the first, second, and third viewing-angle information, or the sub-region description information (planar spatial information with mapping type information, from which the spherical spatial information is determined, or spherical spatial information with shape information), as described above.
FIG. 4 is a schematic flowchart of a method for transmitting media data according to an embodiment of the present application. The method shown in FIG. 4 may be performed by a server and includes steps 410 to 430, described below.
410. The server sends a description file to the client, where the description file carries session description information of at least two sessions between the client and the server, used to transmit the code stream data of the respective corresponding sub-region images, the session description information including spatial information of the sub-regions corresponding to the code stream data transmitted over each session, the sub-regions being obtained by dividing the region of the panoramic video image, and a sub-region image being the video image within a sub-region.
420. The server receives first information sent by the client, indicating the sessions corresponding to the sub-regions covered by the current viewing angle, determined according to the current viewing angle and the session description information.
430. The server sends the code stream data of the target video image to the client, where the target video image includes the video image within the sub-regions covered by the current viewing angle.
In this application, by sending the description file carrying the session description information of each session before transmitting the target video image, the server makes it convenient for the client to process the received code stream data according to the description information. The optional embodiments of the method in FIG. 4 mirror those of the method in FIG. 2: the client receives the code stream data of the covered sub-region images through the corresponding sessions to obtain the code stream data of the target video image; the region spatial information of a sub-region may be planar (with mapping type information in the session description information) or spherical (with shape information); and the first information may be carried in an RTCP source description report sent by the client to the server, or be in TLV format.
The method for transmitting media data according to the embodiments of this application has been described above with reference to FIGS. 1 to 4; it is described in detail below with reference to specific embodiments.
Embodiment 1:
The specific procedure of the method for transmitting media data in Embodiment 1 is shown in FIG. 5 and includes steps 1001 to 1007, described in detail below.
1001. The server publishes the address of the video at a preset viewing-angle position.
Specifically, the server may select the video content at a certain viewing-angle position from the panoramic video and then publish the address of that content; alternatively, it may first render the content at a certain viewing-angle position and then publish its address. The published address may be in real-time streaming protocol (RTSP) format, for example: rtsp://server.example.com/video.
1002. The client sends a video description request command to the server.
1003. The server sends session description protocol (SDP) information to the client, describing the supported region range of the part of the panoramic picture covering the user's viewing angle (field of view, FOV).
Specifically, in step 1002 the client may send a video description request command to the address in step 1001; after receiving it, the server sends video description information to the client, including the region range of the FOV the server can support. The request command sent by the client may be as shown in Table 1, where the DESCRIBE field denotes the video description command. [Table 1 is reproduced as an image in the original publication.]
After receiving the video description command shown in Table 1, the server may answer with an SDP description file whose content may be as shown in Table 2. [Table 2 is reproduced as an image in the original publication.]
The SDP description file in Table 2 describes one video session, in which the FOV stream is transmitted at a bit rate of 5000 kbps; H and V describe the FOV range, where H and V may respectively denote the azimuth range and elevation range in VR spherical coordinates, or the width and height of the two-dimensional image in pixels. The SDP description file in Table 2 may be transmitted over the RTP protocol.
1004. A session is established between the client and the server.
Specifically, the client may first send a session setup command to the server, and the server then answers it, thereby establishing the session between the client and the server. The session setup command sent by the client may be as shown in Table 3. [Table 3 is reproduced as an image in the original publication.]
The SETUP field in Table 3 denotes the session setup command. The first setup command indicates that the client wants to connect to track1 in the SDP description file; its Transport field indicates that the FOV video content is transmitted in unicast over RTP, with the client receiving the data stream on RTP port 20000 and the control stream on RTCP port 20001.
After receiving the session setup command in Table 3, the server answers it; the answer may be as shown in Table 4. [Table 4 is reproduced as an image in the original publication.] This answer to the client's first connection request indicates acceptance; its Transport field indicates the unicast address 10.70.144.123, the client's RTP receive port 20000 and RTCP receive port 20001, the server's RTP receive port 50000 and RTCP receive port 50001, and the session number 12345678 for this connection.
1005. The client sends an RTCP source description report to the server, carrying the center point of the user's current viewing angle and/or the coverage of the client's viewing angle.
Specifically, in step 1005 the client may send the RTCP source description report to the server over the RTSP session, using the RTCP port information the server returned when the connection was set up. The format of the part of the report describing the coverage of the client's viewing angle may be as shown in Table 5. [Table 5 is reproduced as an image in the original publication.]
In Table 5, the newly added SDES item type is identified by a COVERAGE field (for example COVERAGE=9), indicating that this RTCP source description report carries the coverage of the client's viewing angle. H and V together describe the coverage, where they may respectively denote the azimuth range and elevation range in VR spherical coordinates, or the width and height of a two-dimensional image.
Table 6 shows the specific format of an RTCP source description report describing a client viewing-angle coverage with an azimuth range of 110 degrees and an elevation range of 90 degrees. [Table 6 is reproduced as an image in the original publication.] It should be understood that when the client's FOV coverage changes dynamically, the client needs to send the COVERAGE=9 RTCP source description report to request playback of video content whose FOV range changes dynamically.
Besides the coverage of the client's viewing angle, the RTCP source description report may also describe the center point of the user's current viewing angle; the format may be as shown in Table 7. [Table 7 is reproduced as an image in the original publication.]
In Table 7, the newly added SDES item type is identified by a CENTER_FOV field (here CENTER_FOV=10 is taken as an example), indicating that the report carries the center-point information of the client's viewing angle. X, Y, and Z jointly identify the center point and may respectively denote the azimuth, elevation, and tilt values in VR spherical coordinates; alternatively, only X and Y may be kept, denoting the two-dimensional coordinates of the center point of the corresponding viewing-angle region or the coordinates of its top-left corner.
Table 8 shows the format of an RTCP source description report describing a user viewing-angle center point at (-45, -45, 0). [Table 8 is reproduced as an image in the original publication.]
It should be understood that the viewing-angle information, such as the coverage and the center point, may also be sent in the same SDES item. In addition, the COVERAGE and CENTER_FOV fields may be absent, in which case the RTCP source description report carries only the viewing-angle information such as the coverage and the center point.
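To make the SDES extension above concrete, the following is a minimal sketch of building an RTCP source description report carrying the COVERAGE and CENTER_FOV items. Tables 5 to 8 are only available as images in the original publication, so the numeric layout (16-bit angle values in degrees) is an assumption for illustration.

```python
import struct

# Hypothetical numeric layouts: 16-bit angles, degrees (an assumption).
COVERAGE = 9      # SDES item: FOV coverage (H, V)
CENTER_FOV = 10   # SDES item: FOV center point (X, Y, Z)

def sdes_item(item_type: int, payload: bytes) -> bytes:
    # Generic SDES item: 1-byte type, 1-byte length, then the payload.
    return struct.pack("!BB", item_type, len(payload)) + payload

def build_sdes_report(ssrc: int, h: int, v: int, x: int, y: int, z: int) -> bytes:
    items = (
        sdes_item(COVERAGE, struct.pack("!HH", h, v)) +
        sdes_item(CENTER_FOV, struct.pack("!hhh", x, y, z)) +
        b"\x00"  # item list terminator
    )
    # Pad the chunk to a 32-bit boundary as RTCP requires.
    while len(items) % 4:
        items += b"\x00"
    chunk = struct.pack("!I", ssrc) + items
    length = len(chunk) // 4  # RTCP length in 32-bit words minus the header word
    # V=2, P=0, SC=1 -> first byte 0x81; PT=202 is SDES.
    header = struct.pack("!BBH", 0x81, 202, length)
    return header + chunk

# Example: coverage 110x90 degrees, center at (-45, -45, 0).
packet = build_sdes_report(ssrc=0x1234, h=110, v=90, x=-45, y=-45, z=0)
```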
1006. The client sends a play command to the server.
Specifically, in step 1006 the client sends the play command over the session established between the client and the server; its format may be as shown in Table 9. [Table 9 is reproduced as an image in the original publication.] After receiving the play command sent by the client, the server answers it; the content of the answer may be as shown in Table 10. [Table 10 is reproduced as an image in the original publication.]
1007. The server sends the client the video data corresponding to the user's viewing angle, carried in RTP data packets, together with the corresponding video information.
The video data packets may be the data packets of the video within a first window, and the video information may include the center-point coordinates and the content coverage of the video the server sends to the client. The video content within the first window includes the FOV content requested by the client; it may be identical to the requested FOV content, or larger than it.
That is, to improve user experience, in practice prediction information may be used to render, or to crop from the panoramic video, window content larger than the FOV range requested by the client. When the client cannot obtain new FOV content in time, it can recover the content corresponding to the new FOV from the previous window content; the server therefore sends the center-point coordinates of the window content along with its azimuth and elevation range information, supporting application scenarios in which the encoded transmission window adapts its size.
Specifically, the RTP header format of the RTP data packets carrying the video data may be as shown in Table 11. [Table 11 is reproduced as an image in the original publication.] To carry the video region information, the RTP packet may be extended; the extended format is shown in Table 12. [Table 12 is reproduced as an image in the original publication.]
In Table 12, X, Y, and Z jointly denote the center-point position information of the video the server sends to the client. X, Y, and Z may respectively correspond to the azimuth, elevation, and tilt values in VR spherical coordinates; alternatively, when expressing the center-point position of the sent video, only two items (X and Y) may be kept, denoting the two-dimensional coordinates of the center point of the corresponding viewing-angle region or the coordinates of its top-left corner. Also in Table 12, H and V jointly characterize the range of the sent video content, where they may respectively denote the azimuth range and elevation range in VR spherical coordinates, or the width and height of the two-dimensional image in pixels.
Specifically, taking window video data with center point (-45, -45, 0), azimuth range 110 degrees, and elevation range 90 degrees as an example, the corresponding RTP header representation is shown in Table 13. [Table 13 is reproduced as an image in the original publication.]
When the user's viewing angle changes, to present the video within the user's viewing-angle range accurately, the client may send an RTCP source description report to the server again (re-executing step 1005) to update the center-point information or the region range information of the user's viewing angle. The server then extracts the FOV region content from the panoramic video, or renders the FOV content in real time, according to the latest user viewing-angle information, and encodes and sends the FOV video data.
The method for transmitting media data according to the embodiments of this application has been described above with reference to Embodiment 1. It should be understood that in Embodiment 1 the panoramic video spatial region is not divided into sub-regions; after receiving the play command the server only needs to send the video data packets corresponding to the user's viewing angle to the client.
In the process of transmitting media data, to better transmit the video of different regions, the panoramic video spatial region may be divided into multiple sub-regions; how media data is transmitted when multiple sub-regions exist is described in detail below with reference to Embodiments 2 to 4.
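As an illustration of the extended RTP header of Table 12, the following sketch appends a header extension carrying the window center (X, Y, Z) and range (H, V) to an RTP packet. Since Table 12 is only available as an image, the field widths, the dynamic payload type, and the profile-specific value are assumptions for illustration.

```python
import struct

def build_rtp_with_fov_extension(payload: bytes, seq: int, ts: int, ssrc: int,
                                 x: int, y: int, z: int, h: int, v: int) -> bytes:
    # One RTP header extension (RFC 3550 general form) carrying the window
    # center (X, Y, Z) and range (H, V); 16-bit fields are an assumption.
    ext_payload = struct.pack("!hhhHH", x, y, z, h, v) + b"\x00\x00"  # pad to 32-bit
    profile_specific = 0xBEDE  # placeholder "defined by profile" value
    ext = struct.pack("!HH", profile_specific, len(ext_payload) // 4) + ext_payload
    # First byte: V=2, X=1 (extension bit set); second byte: PT=96 (dynamic).
    header = struct.pack("!BBHII", 0x90, 96, seq, ts, ssrc)
    return header + ext + payload

# Window centered at (-45, -45, 0) covering 110x90 degrees.
pkt = build_rtp_with_fov_extension(b"...", seq=1, ts=90000, ssrc=0x1234,
                                   x=-45, y=-45, z=0, h=110, v=90)
```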
Embodiment 2
It should be understood that the client may present only the video content of the FOV region, or additionally present the video content of other regions adjacent to the FOV region. When the video content of the FOV region and of other regions must be transmitted simultaneously, in order to guarantee the viewing quality of the FOV region while reducing the amount of data transmitted over the network, a high-quality FOV region and low-quality other regions are usually transmitted.
For example, a panoramic video content is divided into 8 sub-regions, with sub-regions 2, 3, 6, and 7 covering the user's FOV. Sub-regions 2, 3, 6, and 7 may be encoded with high quality to obtain high-quality video data, and all sub-regions may additionally be encoded with low quality to obtain a low-quality panoramic video; the high-quality video data and the low-quality panoramic video are transmitted to the client together, decoded and rendered by the client, and the client presents a partial region to the user according to the user's viewing angle.
The specific procedure of the method for transmitting media data in Embodiment 2 is shown in FIG. 6 and includes steps 2001 to 2008, described in detail below.
2001. The server divides the panoramic spatial region into multiple sub-regions and determines a preset FOV and the sub-regions corresponding to it. The sub-regions corresponding to the preset FOV may be obtained from the panoramic video or generated by rendering (the agreed sub-region ranges may be rendered according to the sub-regions involved in the requested viewing angle).
2002. The address of the video in the preset FOV is published. The published address may be in RTSP format, for example: rtsp://server.example.com/video.
2003. The client sends a video description command to the server.
2004. The server describes the spatial mapping type of the panoramic video, the center coordinates of each sub-region, and the range information of each sub-region.
Specifically, the client may send the video description command to the address published in step 2002; after receiving it, the server answers it and describes the FOV region range it can support. The description command sent by the client may be as shown in Table 14, where the DESCRIBE field denotes the video description command. [Table 14 is reproduced as an image in the original publication.]
After receiving the video description command in Table 14, the server sends the client an SDP session description file whose specific format is shown in Table 15. [Table 15 is reproduced as an image in the original publication.]
The SDP description file in Table 15 describes nine video sessions in total. Eight of them correspond to the sub-region (tile) streams track1 to track8, each with a bit rate of 5000 kbps; the ninth corresponds to the panoramic stream (track9) with a bit rate of 1000 kbps. Optionally, the SDP description file in Table 15 may contain only the sub-region description information, without the panoramic stream description information. Both the sub-region description information and the panoramic stream description information may be transmitted over the RTP protocol.
d=<projection_type> <shape_type>:<azimuth> <elevation> <tilt> <azimuth_range> <elevation_range>, where the semantics of the fields in d are as follows:
projection_type: the expression type of the two-dimensional spatial range of the panoramic video, for example an equirectangular map, a cube map, or an octahedron map;
shape_type: the region shape identifier, identifying the shape type enclosing the region, which may be enclosed by four great circles, or by two great circles and one small circle, among others;
azimuth: the azimuth of the center point of the sub-region;
elevation: the elevation of the center point of the sub-region;
tilt: the tilt of the center point of the sub-region;
azimuth_range: the azimuth range of the sub-region;
elevation_range: the elevation range of the sub-region.
The SDP description file may instead carry two-dimensional coordinate information describing the center point and range of the region content; one possible form is shown in Table 16. [Table 16 is reproduced as an image in the original publication.] There, d=<projection_type>:<h_center> <v_center> <h> <v>, with the following semantics:
projection_type: the expression type of the two-dimensional spatial range of the panoramic video, for example an equirectangular map, a cube map, or an octahedron map;
h_center: the horizontal coordinate of the center point of the region;
v_center: the vertical coordinate of the center point of the region;
h: the horizontal width of the region;
v: the vertical height of the region.
When describing the position of the region content, besides the coordinates of the region center point, the two-dimensional coordinates of the top-left corner of the region content together with its size range may also be used. Optionally, the SDP description file carries the two-dimensional coordinates of the top-left corner and the size range; one optional form is shown in Table 17 and another in Table 18. [Tables 17 and 18 are reproduced as images in the original publication.] In both, d=<projection_type>:<h_left_top> <v_left_top> <h> <v>, with the following semantics:
projection_type: the expression type of the two-dimensional spatial range of the panoramic video, for example an equirectangular map, a cube map, or an octahedron map;
h_left_top: the horizontal coordinate of the top-left corner of the region;
v_left_top: the vertical coordinate of the top-left corner of the region;
h: the horizontal width of the region;
v: the vertical height of the region.
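The "d=" attribute variants above can be parsed mechanically; the following is a minimal parsing sketch for the spherical form, assuming the whitespace-separated layout just described. The attribute value in the example is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SphericalSubRegion:
    projection_type: str
    shape_type: str
    azimuth: float
    elevation: float
    tilt: float
    azimuth_range: float
    elevation_range: float

def parse_spherical_d_line(line: str) -> SphericalSubRegion:
    # Expected form: "d=<projection_type> <shape_type>:<azimuth> <elevation>
    # <tilt> <azimuth_range> <elevation_range>"
    assert line.startswith("d=")
    head, values = line[2:].split(":", 1)
    projection_type, shape_type = head.split()
    az, el, tilt, az_range, el_range = (float(x) for x in values.split())
    return SphericalSubRegion(projection_type, shape_type,
                              az, el, tilt, az_range, el_range)

region = parse_spherical_d_line("d=ERP 4greatcircles:-45 -45 0 110 90")
```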
2005. Multiple sessions are established between the client and the server.
For example, when the SDP description file describes the video content of nine streams, nine sessions may be established between the client and the server. For each of these sessions, the client may send a session setup command to the server, as shown in Table 3 above; after receiving it, the server answers it, as shown in Table 4 above. Through this process the session corresponding to each sub-region is established in turn, until nine sessions exist between the client and the server.
2006. The client sends an RTCP source description report to the server, carrying the session numbers of the sub-regions needed by the user's viewing angle.
Specifically, the client may transmit the RTCP source description report to the server over one session; its specific format may be as shown in Table 19. [Table 19 is reproduced as an image in the original publication.]
In Table 19, the newly added SDES item type is identified by a SESSION_FOV field (here SESSION_FOV=11 is taken as an example); the report carries the session number information corresponding to the sub-regions (optionally including the panoramic video) needed by the client's viewing angle. Optionally, the SESSION_FOV field may be absent. Also in Table 19, SESSION_NUM denotes the number of sessions needed, sessionID1 identifies the first needed session connection, sessionID2 the second, and so on. A session connection identifier may be any number or character string that uniquely distinguishes the multiple sessions between the client and the server.
2007. The client sends a play command to the server. It should be understood that this play command is valid for all the sessions above.
2008. The server sends RTP video data packets to the client.
Specifically, the server may send the sub-region video data to the client over the multiple session connections given in the SDES information. Optionally, when the user's viewing angle changes, the client sends an RTCP source description report to update over which session connections the server needs to send the corresponding video content. When it receives a re-sent RTCP source description report, the server extracts the FOV region content from the panoramic video, or renders the FOV content in real time, according to the latest user viewing-angle information, and then encodes and sends the FOV video data.
In Embodiment 2 above, multiple sessions are established between the client and the server, each corresponding to one sub-region, and the client describes its needs in the reported RTCP description report by carrying the session numbers corresponding to the sub-regions needed by the user's viewing angle. Alternatively, a single session may be established between the client and the server, over which the client sends an RTCP source description report carrying the numbers of the sub-regions needed by the user's viewing angle.
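The following sketch shows one way the SESSION_FOV item of Table 19 could be assembled. Table 19 is only available as an image, so the encoding here (a count byte followed by one-byte session identifiers) is an assumption for illustration.

```python
import struct

SESSION_FOV = 11  # assumed SDES item type from Table 19

def session_fov_item(session_ids: list[int]) -> bytes:
    # SDES item listing the sessions whose sub-regions cover the current FOV:
    # SESSION_NUM as one byte, then one byte per session ID (an assumption).
    payload = struct.pack("!B", len(session_ids)) + bytes(session_ids)
    return struct.pack("!BB", SESSION_FOV, len(payload)) + payload

item = session_fov_item([2, 3, 6, 7])  # FOV covered by sub-regions 2, 3, 6, 7
```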
Embodiment 3:
The specific procedure of the method for transmitting media data in Embodiment 3 is shown in FIG. 7 and includes steps 3001 to 3008, described in detail below.
3001. The server divides the panoramic spatial region into multiple sub-regions and determines a preset FOV and the sub-regions corresponding to it. The sub-regions corresponding to the preset FOV may be obtained from the panoramic video or generated by rendering.
3002. The address of the video in the preset FOV is published. The published address may be in RTSP format, for example: rtsp://server.example.com/video.
3003. The client sends a video description command to the server.
3004. The server describes the spatial mapping format type of the panoramic video, the ID number of each sub-region, and the sub-region center coordinates and sub-region range information corresponding to each sub-region ID.
In step 3003 the client may send the video description command to the address published in step 3002; after receiving it, the server answers it and describes the FOV region range it can support. The description command sent by the client may be as shown in Table 20, where the DESCRIBE field denotes the video description command. [Table 20 is reproduced as an image in the original publication.]
After receiving the video description command in Table 20, the server sends the client an SDP session description file whose specific format is shown in Table 21. [Table 21 is reproduced as an image in the original publication.]
The SDP description file in Table 21 describes two video sessions in total: one carries the sub-region (tile) streams, each at a bit rate of 5000 kbps, and the other carries the panoramic stream at a bit rate of 1000 kbps. Both sessions may be transmitted over the RTP protocol.
d=<projection_type>:<subPic_Num> <subPicID> <azimuth> <elevation> <tilt> <azimuth_range> <elevation_range> ... <subPicID> <azimuth> <elevation> <tilt> <azimuth_range> <elevation_range>, where the semantics of the fields in d are as follows:
projection_type: the expression type of the two-dimensional spatial range of the panoramic video, for example an equirectangular map, a cube map, or an octahedron map;
shape_type: the region shape identifier, identifying the shape type enclosing the region, which may be enclosed by four great circles, or by two great circles and one small circle, among others;
subPic_Num: the number of divided regions, making it easy to parse how many groups of region parameters there are;
subPicID: the number of each region;
azimuth: the azimuth of the center point of the region;
elevation: the elevation of the center point of the region;
tilt: the tilt of the center point of the region;
azimuth_range: the azimuth range of the region;
elevation_range: the elevation range of the region.
Optionally, the form in Table 21 may also describe only one video session carrying only the sub-region streams. Alternatively, the panoramic stream and the sub-region streams may be placed in the same description field; one possible form is shown in Table 22. [Table 22 is reproduced as an image in the original publication.]
In Table 22, d=<projection_type>:<subPic_Num> <subPicID> <x> <y> <h> <v> ... <subPicID> <x> <y> <h> <v>, with the following semantics:
projection_type: the expression type of the two-dimensional spatial range of the panoramic video, for example an equirectangular map, a cube map, or an octahedron map;
subPic_Num: the number of regions, making it easy to parse how many groups of region parameters there are;
subPicID: the number of each region;
x: the horizontal coordinate of the center of the region (optionally, of its top-left corner);
y: the vertical coordinate of the center of the region (optionally, of its top-left corner);
h: the horizontal width of the region;
v: the vertical height of the region.
3005. A session is established between the client and the server. The process is as shown in step 1004 above.
3006. The client sends an RTCP source description report to the server, carrying the numbers of the sub-regions needed by the user's viewing angle.
Specifically, the client may send the RTCP source description report over the session established between the client and the server; one possible form of the report sent in step 3006 is shown in Table 23. [Table 23 is reproduced as an image in the original publication.]
In Table 23, the newly added SDES item type is identified by a SUBPIC_FOV field (SUBPIC_FOV=12 is taken as an example), indicating which sub-regions (optionally including the panoramic video) the client's viewing angle needs. Optionally, the SUBPIC_FOV field may be absent. subPicNum denotes how many sub-region contents are needed, subPicID1 denotes the number of the first needed sub-region, subPicID2 the second, and so on.
3007. The client sends a play command to the server. The specific content is the same as in step 1006 above and is not described again here.
3008. The server sends the client the RTP video data packets corresponding to the user's viewing angle; each RTP packet header includes the ID number of the region of the video content it currently carries, and optionally the FOV frame number to which the current region belongs or other information identifying that the contents of different regions belong to the same FOV.
Specifically, one optional format of the RTP video data packets sent in step 3008 is shown in Table 24; to carry the region information, the RTP packet may be extended, with the extended format shown in Table 25. [Tables 24 and 25 are reproduced as images in the original publication.] In them, subPicID identifies the region ID number to which the currently carried video content belongs, and FOV_SN identifies the viewing-angle number to which that content belongs, making it easy for the client to combine the videos of different subPicIDs having the same FOV_SN into the FOV content for display.
In Embodiment 3, when the user's viewing angle changes, to present the video within the user's viewing-angle range accurately, the client may send an RTCP source description report to the server again (re-executing step 3006) to update the center-point information or region range information of the user's viewing angle. The server then extracts the FOV region content from the panoramic video, or renders the FOV content in real time, according to the latest user viewing-angle information, and encodes and sends the FOV video data.
In Embodiment 3 above, the client determines the sub-regions corresponding to the user's viewing angle and carries the needed sub-region numbers in the RTCP description report sent to the server. In fact, when the panoramic video spatial region is divided into multiple sub-regions, the client may also carry only the user's viewing-angle information in the RTCP source description report, with the server determining the sub-regions corresponding to the user's viewing angle and sending the video data packets of those sub-regions to the client.
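On the client side, the subPicID and FOV_SN fields described above allow packets to be regrouped before stitching. The following sketch illustrates that grouping, assuming packet objects exposing those fields; the helper names are hypothetical.

```python
from collections import defaultdict

def group_by_fov(packets):
    """Collect decoded sub-region payloads that belong to the same FOV."""
    fovs = defaultdict(dict)
    for pkt in packets:
        # pkt is assumed to expose fov_sn, sub_pic_id, and payload attributes.
        fovs[pkt.fov_sn][pkt.sub_pic_id] = pkt.payload
    return fovs

def ready_for_display(fovs, fov_sn, needed_sub_pics: set) -> bool:
    # The FOV can be stitched and displayed once every needed sub-region
    # with the same FOV_SN has arrived.
    return needed_sub_pics <= set(fovs.get(fov_sn, {}))
```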
Embodiment 4
The specific procedure of the method for transmitting media data in Embodiment 4 is shown in FIG. 8 and includes steps 4001 to 4008, described in detail below.
4001. The server divides the panoramic spatial region into multiple sub-regions and determines a preset FOV and the sub-regions corresponding to it.
4002. The address of the video in the preset FOV is published.
4003. The client sends a video description command to the server.
4004. The server describes the spatial mapping format type of the panoramic video, the ID number of each sub-region, and the sub-region center coordinates and sub-region range information corresponding to each sub-region ID.
4005. A session is established between the client and the server.
Steps 4001 to 4005 in Embodiment 4 are the same as steps 3001 to 3005 in Embodiment 3 and are not described again here.
4006. The client sends an RTCP source description report to the server, carrying the user's viewing-angle information, which includes the center point of the user's current viewing angle and the coverage of the client's viewing angle.
Specifically, in step 4006 the client may send the RTCP source description report to the server over the FOV video stream transmission session, using the RTCP port information the server returned after the connection command; the specific format of the report may be as shown in Table 26. [Table 26 is reproduced as an image in the original publication.]
In Table 26, the newly added SDES item type is identified by a COVERAGE field (COVERAGE=9 is taken as an example), indicating that this report carries the coverage of the client's viewing angle. H and V together describe the coverage; optionally, they may respectively be the azimuth range and elevation range in VR spherical coordinates, or the width and height of a two-dimensional image. Taking the RTCP description report in Table 26 as an example, an instance of a report describing a user viewing-angle center point at (-45, -45, 0) is shown in Table 7. Optionally, the viewing-angle information such as the coverage and the center point may be sent in the same SDES item; optionally, the COVERAGE and CENTER_FOV fields may be absent, with the RTCP report carrying only the viewing-angle information such as the coverage and the center point.
4007. The client sends a play command to the server.
4008. The server sends the client the RTP video data packets corresponding to the user's viewing angle; each RTP packet header includes the ID number of the region of the video content it currently carries, and optionally the FOV frame number to which the current region belongs or other information identifying that the contents of different regions belong to the same FOV.
Steps 4007 and 4008 are the same as steps 3007 and 3008 in Embodiment 3 and are not described again here.
In Embodiment 4 above, the user's viewing-angle information carried in the RTCP source description report sent to the server includes not only the center point of the current viewing angle but also the coverage of the client's viewing angle. Optionally, the viewing-angle information may carry only the center point of the current viewing angle, in which case the coverage of the client's viewing angle may be preset.
In Embodiments 1 to 4 above, when answering the client and describing the video content, besides describing the video content of the FOV region range it supports, the server may also directly describe several initial FOVs for the client to choose from.
Embodiment 5
In Embodiment 5, the SDP description file carries initial FOV information sent by the server. Specifically, the SDP description file the server sends to the client may carry the initial FOV information, whose format is shown in Table 27. [Table 27 is reproduced as an image in the original publication.]
In the SDP description file of Table 27, H and V together describe the FOV range; they may respectively be the azimuth range and elevation range in VR spherical coordinates, or the width and height of a two-dimensional image. X, Y, and Z jointly identify the initial viewing-angle center point; optionally they correspond respectively to the (azimuth, elevation, tilt) values in VR spherical coordinates, or only two items (X and Y) may be kept, corresponding respectively to the two-dimensional coordinates of the center point of the viewing-angle region or the coordinates of its top-left corner.
It should be understood that, apart from the SDP description file the server sends, the other steps of this embodiment may be the same as in Embodiments 1 to 4 above.
It should be understood that the embodiments above all realize real-time transmission of FOV information based on streaming media transmission technology, by carrying the user's viewing-angle information and the corresponding FOV information in the existing data. In fact, to realize real-time transmission of FOV information, custom transmission protocols may also be defined.
For example, the viewing-angle information fed back by the client may be sent using messages defined in a custom TLV (type, length, value) message format; one optional TLV format is shown in Table 28. [Table 28 is reproduced as an image in the original publication.] Different types have different payloads; one possible assignment is shown in Table 29:
Table 29
0x00: FOV range information, payload H, V
0x01: FOV position information, payload X, Y, Z
0x02: FOV range and position information, payload V, H, X, Y, Z
other values: reserved
In Table 29, H and V together describe the FOV range; they may respectively be the azimuth range and elevation range in VR spherical coordinates, or the width and height of a two-dimensional image. X, Y, and Z jointly identify the viewing-angle center point; optionally they correspond respectively to the (azimuth, elevation, tilt) values in VR spherical coordinates, or only two items (X and Y) may be kept, corresponding respectively to the two-dimensional coordinates of the center point of the viewing-angle region or the coordinates of its top-left corner.
It should be understood that the data sent by the server is also sent in TLV format; one optional assignment of payloads per type is shown in Table 30. [Table 30 is reproduced as an image in the original publication.] In Table 30, H and V together describe the window video range the server sends to the client, with the same alternatives as above; X, Y, and Z jointly identify the window center point, with the same alternatives as above; and video content denotes the compressed video data.
Further, when the server's data is sent in TLV format, to ensure that the FOV is stitched from the correct sub-region contents of the same moment, the TLV data the server sends to the client needs to include the FOV number. One optional TLV format is shown in Table 31. [Table 31 is reproduced as an image in the original publication.] In Table 31, H, V, X, Y, Z, and video content are as in Table 30, and FOV_SN identifies the FOV number to which the current region video belongs.
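The following sketch illustrates the custom TLV exchange of Tables 28 to 31, assuming a 1-byte type and 2-byte length followed by 16-bit values; those tables are only available as images, so the field widths are assumptions for illustration.

```python
import struct

TYPE_FOV_RANGE = 0x00      # payload: H, V
TYPE_FOV_POSITION = 0x01   # payload: X, Y, Z
TYPE_FOV_RANGE_POS = 0x02  # payload: V, H, X, Y, Z

def encode_tlv(msg_type: int, *values: int) -> bytes:
    # 1-byte type, 2-byte length, then signed 16-bit values (an assumption).
    payload = struct.pack(f"!{len(values)}h", *values)
    return struct.pack("!BH", msg_type, len(payload)) + payload

def decode_tlv(buf: bytes):
    msg_type, length = struct.unpack_from("!BH", buf)
    payload = buf[3:3 + length]
    values = struct.unpack(f"!{length // 2}h", payload)
    return msg_type, values

# Client-side feedback: FOV range 110x90 plus center (-45, -45, 0),
# in the V, H, X, Y, Z payload order of type 0x02.
msg = encode_tlv(TYPE_FOV_RANGE_POS, 90, 110, -45, -45, 0)
```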
The method for transmitting media data according to the embodiments of this application has been described in detail above with reference to the embodiments so far. In fact, when reporting the spatial information of the current viewing angle to the server, the client may also report the viewpoint information of the viewpoint at which the current viewing angle is located, so that the server can determine the second target video image by combining the spatial information of the current viewing angle with that viewpoint information. In addition, the client may indicate the specific composition of the spatial information of the current viewing angle by sending indication information to the server. These two cases are described in detail below with reference to Embodiments 7 and 8.
Embodiment 7
FIG. 9 shows the specific procedure of the method for transmitting media data in Embodiment 7, which includes steps 5001 to 5007, described in detail below.
5001. The server publishes the address of the video at a preset viewing-angle position. The specific process is the same as step 1001 in Embodiment 1 and is not described again here.
5002. The client sends a description command to the address in step 5001. The specific process is the same as step 1002, whose explanations, limitations, and examples also apply here and are not repeated.
5003. The server sends SDP information to the client.
The SDP information contains preset viewpoint position information, preset FOV direction information, and preset FOV range information. It may specifically be an SDP description file whose content may be as shown in Table 32. [Table 32 is reproduced as an image in the original publication.]
The SDP description file in Table 32 describes one video session. Compared with Table 2 in Embodiment 1, it additionally contains the spatial position information of the preset viewpoint and the preset FOV direction information: (location_x, location_y, location_z) is the spatial position information of the preset viewpoint, and (X, Y, Z) is the preset FOV direction information. Assuming the preset viewpoint is viewpoint A, (location_x, location_y, location_z) jointly describe the spatial coordinate position of viewpoint A, and (X, Y, Z) identify the direction information of the FOV center on the unit sphere centered at viewpoint A.
Optionally, in (X, Y, Z), X may correspond to the azimuth (yaw) in a three-dimensional coordinate system (for example a three-dimensional Cartesian coordinate system), Y to the pitch (elevation), and Z to the tilt (roll), where the three-dimensional coordinate system may take the position of viewpoint A as its origin. Optionally, only (X, Y) may be used to identify the direction information of the FOV center point on the unit sphere centered at viewpoint A; for example, (X, Y) may correspond to the two-dimensional coordinates of the center point of the viewing-angle region, or to the coordinates of its top-left corner. The SDP description file in Table 32 may be transmitted based on the RTP protocol.
5004. A session is established between the client and the server. The specific process is the same as step 1004 and is not described again here.
5005. The client sends an RTCP source description report to the server, carrying at least one of the viewpoint position information of the current viewing angle, the direction information of the current viewing angle, and the coverage information of the current viewing angle.
The information carried in the report, such as the viewpoint position information, direction information, and coverage information of the current viewing angle, may be called the viewing-angle information of the current viewing angle, which includes at least one of these three. Further, the report may carry the viewing-angle information of the user's current viewing angle, including one or more of the viewpoint position information, viewpoint displacement speed information, acceleration information of the viewpoint displacement speed, viewing-angle direction information, direction change speed information, and coverage information of the viewing angle.
It should be understood that the viewing-angle information of the current viewing angle may also be divided into the spatial information of the current viewing angle and the viewpoint information of the current viewing angle (equivalent to the first viewpoint information above), where the viewpoint information includes the viewpoint position information, viewpoint displacement speed information, and acceleration information of the viewpoint displacement speed, and the spatial information includes the direction information, direction change speed information, and coverage information of the current viewing angle.
Optionally, the client may also send first indication information to the server, containing a first flag whose value indicates whether the spatial position information of the current viewpoint is relative or absolute: when the first flag takes a first value the information is relative, and when it takes a second value it is absolute. The first indication information may be carried in the RTCP source description report sent to the server. The first flag may correspond to at least one bit, whose different values indicate relative or absolute position information; this application does not limit which value indicates which, and any scheme in which different values of the first flag distinguish relative from absolute spatial position information falls within the protection scope of this application. In this application, when the spatial position information of the current viewpoint is relative, the amount of reported data is reduced, saving resource overhead. It should be understood that when relative, the spatial position information may be relative to the starting viewpoint, a designated viewpoint, or the previous viewpoint; when absolute, it may be relative to some fixed, pre-configured coordinate system.
Since the viewing-angle information carried in the RTCP source description report may contain different kinds of information, the client may also send second indication information to the server to indicate what the report carries. Optionally, the second indication information includes a second flag whose value indicates the composition of the spatial information of the current viewing angle, covering at least one of the following: a third value indicates that the spatial information consists of the direction information of the current viewing angle; a fourth value, of the direction information and the position information of the viewpoint at which the current viewing angle is located; a fifth value, of the direction information, the viewpoint position information, and the viewing-angle size information. The second indication information may be carried in the RTCP source description report sent to the server.
Optionally, the direction information of the current viewing angle may be absolute or relative. In this application, using relative direction information reduces the amount of reported data and saves resource overhead. When absolute, the direction information may be relative to some fixed coordinate system; when relative, it may be relative to a previous direction (for example, the deflection from the previous viewing direction, or from the initial viewing direction). Optionally, the client may also send third indication information containing a third flag whose value indicates whether the direction information is absolute (a sixth value) or relative (a seventh value); the third indication information may be carried in the RTCP source description report sent to the server.
It should be understood that the RTCP source description report in step 5005 may contain only some principal items of the information above; one possible specific format of the report is shown in Table 33. [Table 33 is reproduced as an image in the original publication.]
In Table 33, the newly added SDES item type is identified by a FOV_POS_MESSAGE field (FOV_POS_MESSAGE=11 is taken as an example), indicating that this RTCP source description report carries the viewing-angle information fed back by the client. H and V together describe the FOV range; specifically they may respectively denote the horizontal and vertical coverage angles in VR spherical coordinates, or the width and height of a two-dimensional image. FOV_X, FOV_Y, and FOV_Z jointly identify the rotation information of the FOV; optionally, FOV_X corresponds to the azimuth (yaw) of a three-dimensional coordinate system, FOV_Y to the pitch (elevation), and FOV_Z to the tilt (roll), where the three-dimensional coordinate system may take the position of the current viewpoint as its origin. Alternatively, only two items (FOV_X and FOV_Y) may be kept, identifying the two-dimensional coordinates of the center point of the corresponding FOV region in the panoramic two-dimensional video image of the current viewpoint, or the coordinates of the top-left corner of the FOV region. In addition, position_x, position_y, and position_z are the coordinate values of the viewpoint spatial position information.
It should be understood that when the viewing direction is unchanged and the user only moves the viewpoint position, the client may feed back only the viewpoint information to the server, which reduces signaling overhead. In that case the specific format of the RTCP source description report the client reports may be as shown in Table 34. [Table 34 is reproduced as an image in the original publication.]
In Table 34, the newly added SDES item type is identified by a VP_POS_MESSAGE field (VP_POS_MESSAGE=12 is taken as an example); position_x, position_y, and position_z are the viewpoint spatial position coordinates in a three-dimensional Cartesian coordinate system, or the change values relative to the previous viewpoint position.
In some cases the viewpoint is not fixed but keeps moving. In that case, information indicating how fast the viewpoint moves may also be reported to the server, so that the server can perform predictive rendering and prefetch delivery, lowering the delay of delivering video images to the client and improving user experience. One possible implementation form of the RTCP source description report the client reports in this case is shown in Table 35. [Table 35 is reproduced as an image in the original publication.]
In Table 35, the newly added SDES item type uses a FOV_MESSAGE field (FOV_MESSAGE=13 is taken as an example; FOV_MESSAGE may also be another value, which is not limited here) to indicate that the report carries the client's viewing-angle information. H and V together describe the FOV range; specifically they may respectively denote the horizontal and vertical coverage angles in VR spherical coordinates, or the width and height of a two-dimensional image. FOV_X, FOV_Y, and FOV_Z jointly identify the rotation information of the FOV; optionally FOV_X corresponds to the azimuth (yaw), FOV_Y to the pitch (elevation), and FOV_Z to the tilt (roll) of a three-dimensional coordinate system whose origin may be the position of the current viewpoint; alternatively only FOV_X and FOV_Y may be kept, identifying the two-dimensional coordinates of the center point of the corresponding FOV region in the panoramic two-dimensional video image of the current viewpoint, or the coordinates of the top-left corner of the FOV region. In addition, position_x, position_y, and position_z are the viewpoint spatial position coordinates, and speed_pos_x, speed_pos_y, and speed_pos_z are the viewpoint position change speed values.
Further, the RTCP source description report also carries speed_fov_x, speed_fov_y, and speed_fov_z, where speed_fov_x is the change speed of the azimuth (yaw) of the FOV in the three-dimensional coordinate system, speed_fov_y the change speed of the pitch (elevation), and speed_fov_z the change speed of the tilt (roll).
In the RTCP source description report above, message_type identifies the type of viewing-angle information carried, characterizing whether the current SDES contains one of the types or a combination of several types of information. The specific content of message_type is shown in Table 36. [Table 36 is reproduced as an image in the original publication.] It should be understood that Table 36 lists only some possible combinations of the viewing-angle information; in fact any possible combination falls within the protection scope of this application, and for brevity they are not enumerated here. Optionally, a mask may be used to characterize whether the current SDES contains one type or a combination of several types of information: when a bit of message_type is 1, the information type corresponding to that bit (one type or a combination of several types) is carried, and otherwise it is not.
It should be understood that the different kinds of information in the viewing-angle information of the current viewing angle may be sent in different SDES items. Specifically, the viewpoint position information, viewpoint displacement speed information, acceleration information of the viewpoint displacement speed, direction information, direction change speed information, and coverage information of the current viewing angle may each be sent in a different SDES item (each kind of information may correspond to one SDES item, or a combination of several kinds may correspond to one SDES item).
Optionally, the viewpoint spatial position information may be the position of the viewpoint relative to the starting viewpoint, a designated viewpoint, or the previous viewpoint; the direction information may be the relative change of the viewing direction with respect to the starting viewing direction, a designated direction, or the previous direction.
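One way to realize the mask-based message_type described above is a simple bit-flag scheme; the following sketch assumes illustrative bit assignments, since Table 36 is only available as an image.

```python
# Each bit marks one kind of viewing-angle information present in the SDES
# item; the bit assignments below are assumptions for illustration.
HAS_FOV_RANGE      = 1 << 0  # H, V
HAS_FOV_DIRECTION  = 1 << 1  # FOV_X, FOV_Y, FOV_Z
HAS_VIEWPOINT_POS  = 1 << 2  # position_x/y/z
HAS_VIEWPOINT_SPD  = 1 << 3  # speed_pos_x/y/z
HAS_FOV_DIR_SPEED  = 1 << 4  # speed_fov_x/y/z

def describe(message_type: int) -> list[str]:
    names = {HAS_FOV_RANGE: "fov_range", HAS_FOV_DIRECTION: "fov_direction",
             HAS_VIEWPOINT_POS: "viewpoint_position",
             HAS_VIEWPOINT_SPD: "viewpoint_speed",
             HAS_FOV_DIR_SPEED: "fov_direction_speed"}
    return [name for bit, name in names.items() if message_type & bit]

# A report carrying direction, viewpoint position, and viewpoint speed:
fields = describe(HAS_FOV_DIRECTION | HAS_VIEWPOINT_POS | HAS_VIEWPOINT_SPD)
```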
5006. The client sends a play command to the server. The specific process is the same as step 1006 above and is not repeated here.
5007. The server sends RTP video data packets to the client.
Specifically, in step 5007 the server sends the video data corresponding to the user's viewing angle to the client in the form of RTP data packets. The RTP data packets may carry the spatial position information of the viewpoint where the current window is located, the direction information of the window center point, and the window coverage information. Specifically, the server may send, over the session, window video content rendered or cropped from the panoramic video (containing the FOV content requested by the client); the encoded data may be carried in RTP data packets, which need to carry the spatial position information of the viewpoint where the window region is located, the direction coordinates of the window center point, and the horizontal and vertical coverage.
The RTP header format of these RTP data packets is shown in Table 37. [Table 37 is reproduced as an image in the original publication.] To carry more information, the RTP data packet may be extended; the header format of the extended RTP data packet is shown in Table 38. [Table 38 is reproduced as an image in the original publication.]
In Table 38, position_x, position_y, and position_z jointly identify the position coordinates of the viewpoint where the current window is located, and X, Y, Z jointly denote the center-point position of the video the server sends to the client. Optionally, X may correspond to the azimuth (yaw), Y to the pitch (elevation), and Z to the tilt (roll) of a three-dimensional coordinate system whose origin may be the position of the current viewpoint. In addition, H and V jointly characterize the range of the sent video content: they may respectively denote the horizontal and vertical coverage angles in VR spherical coordinates, or the width and height of the two-dimensional image in pixels.
In Embodiment 7, to improve user experience, in practice prediction information may be used to render, or to crop from the panoramic video, window content larger than the FOV range requested by the client. When the client cannot obtain new FOV content in time, the content corresponding to the new FOV can be obtained from the previous window content; the server therefore sends the center-point coordinates of the window content along with its azimuth and elevation range information, supporting application scenarios in which the encoded transmission window adapts its size.
When the user's viewing angle changes, to present the video within the user's viewing-angle range accurately, the client may send an RTCP source description report to the server again (re-executing step 5005) to update the center-point information or region range information of the user's viewing angle. The server then extracts the FOV region content from the panoramic video, or renders the FOV content in real time, according to the latest user viewing-angle information, and encodes and sends the FOV video data.
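The predictive rendering and prefetching mentioned above can be driven directly by the reported speed values; the following sketch linearly extrapolates the FOV direction and viewpoint position over a lookahead interval. Linear extrapolation and the helper name are assumptions for illustration.

```python
def predict_window(fov_dir, fov_speed, vp_pos, vp_speed, lookahead_s: float):
    """fov_dir/fov_speed are (azimuth, elevation, tilt) and their rates;
    vp_pos/vp_speed are Cartesian viewpoint position and velocity."""
    pred_dir = tuple(d + s * lookahead_s for d, s in zip(fov_dir, fov_speed))
    pred_pos = tuple(p + s * lookahead_s for p, s in zip(vp_pos, vp_speed))
    return pred_dir, pred_pos

# Predict 200 ms ahead for a head turning 30 deg/s in azimuth.
direction, position = predict_window((-45, -45, 0), (30, 0, 0),
                                     (0.0, 1.6, 0.0), (0.1, 0.0, 0.0), 0.2)
```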
In the embodiments of this application, to reduce the delay as much as possible, the viewing-angle information may be transmitted using the existing TCP or UDP communication protocols. Specifically, the client feeds back the current viewpoint position or viewing-angle information to the server; the server, according to the requested viewpoint position and viewing-angle information, sends a certain window content at the corresponding viewpoint position to the client, and the sent content needs to carry the window position and size information.
When feeding back the viewing-angle information to the server, the client sends it in a custom TLV (type, length, value) message format. Optionally, one possible TLV format for client reporting is shown in Table 39. [Table 39 is reproduced as an image in the original publication.]
In Table 39, H and V together describe the FOV range, where they may respectively denote the horizontal and vertical coverage angles in VR spherical coordinates, or the width and height of a two-dimensional image. X, Y, and Z may jointly identify the viewing-angle center point; optionally X corresponds to the azimuth (yaw), Y to the pitch (elevation), and Z to the tilt (roll) of a three-dimensional coordinate system whose origin may be the position of the current viewpoint. speed_fov_x is the change speed of the azimuth (yaw) of the FOV in the three-dimensional coordinate system, speed_fov_y the change speed of the pitch (elevation), and speed_fov_z the change speed of the tilt (roll).
Besides the client using TLV-format information when reporting to the server, the server may also use TLV-format information when sending data to the client. Optionally, one possible TLV format used by the server is shown in Table 40. [Table 40 is reproduced as an image in the original publication.]
In Table 40, H and V together describe the window video range the server sends to the client; they may respectively be the horizontal and vertical coverage angles in VR spherical coordinates, or the width and height of a two-dimensional image. X, Y, and Z jointly identify the window center point, with the same correspondences as above. In addition, video content is the compressed video data, and position_x, position_y, and position_z denote the viewpoint position information of the currently sent window.
In this application, using custom TLV-format information supports real-time interactive feedback between client and server of viewpoint position information, viewpoint movement speed information, viewing-angle direction information, direction change speed information, and region size information, which minimizes the transmission delay and suits application scenarios with high real-time requirements.
Optionally, when feeding back information to the server, the client may also use the signaling format defined by MMT for multimedia transmission applications. In this application, the viewing-angle information the client reports to the server, and the image-related information carried in the video data packets the server sends to the client, may be transmitted not only as TLV-format information but also in a signaling format for multimedia transmission applications defined in MPEG Media Transport (MMT). The case where the client and the server use the MMT-defined signaling format to transmit viewing-angle information and the spatial information of video images is described in detail below with reference to Embodiment 8.
实施例八:
MMT(例如,具体在ISO/IEC 23008-1标准)定义了一套针对多媒体传输应用的信号格式。其中,在ISO/IEC 23008-1标准中还定义了字段"urn:mpeg:mmt:app:vr:2017"以用来标识信息是给VR内容传输使用。
在MMT标准下,MMT VR接收端(相当于上文中的客户端)的视点信息需要周期性地反馈给MMT VR发送端(相当于上文中的服务器),或者,当用户的视点位置发生变化时,MMT VR接收端的视点信息需要反馈给MMT VR接收端。以便于MMT VR发送端确定当前视角所在的视点位置。通过在已有的MMT信息类型中添加一种新的类型,可以用于描述视点位置的变化信息,以支持视点变化时的应用。
MMT VR接收端除了可以向MMT VR发送端反馈视点的位置信息之外,MMT VR接收端还可以向MMT VR发送端反馈视角变化速度信息、视点变化速度信息、视角相对于特定视角或初始视角或前一视角的方向相对变化信息、视点相对于特定视点或初始视点或者前一视点的位置变化信息等信息中的一种或者多种(可以具体反馈任意一种信息组合)。
可选地,为了指示客户端和服务器之间传输的信息具体包括视点位置信息、视角变化速度信息、视点变化速度信息、视角相对于特定视角或初始视角或前一视角的方向相对变化信息、视点相对于特定视点或初始视点或者前一视点的位置变化信息等信息中的哪些信息,可以在客户端与服务器之间传输一个指示信息(该指示信息可以是客户端发送给服务器的信息,也可以是服务器发送给客户端的信息),该指示信息的不同的标识位的不同取值表示携带不同的信息。
上述指示信息的标识位的取值与指示的信息之间的关系可以如表41所示。
表41
应用信息类型(Application Message Type) 应用信息名称(Application Message Name)
0x01 VRViewDependentSupportQuery
0x02 VRViewDependentSupportResponse
0x07 VRViewpointChangeFeedback
0x08 VRViewportSpeedFeedback
0x09 VRViewpointSpeedFeedback
0x0A VRViewportDeltaChangeFeedback
0x0B VRViewpointDeltaChangeFeedback
0x0C VR_content_window_range
0x0D VR_content_window_centre
0x0E VR_content_window_viewpoint
0x0F-0xFF Reserved for future use
表41中的各种应用信息的含义或者作用如下:
其中,VRViewDependentSupportQuery和VRViewDependentSupportResponse是客户端和服务器之间用于确认服务器是否支持基于视角传输视频流的应用信息。
VRViewDependentSupportQuery:客户端用该命令发现服务器是否支持基于视角传输视频流;
VRViewDependentSupportResponse:服务器向客户端反馈其是否支持基于视角传输视频流的标识。
表41中,0x07至0x0B对应的信息是客户端向服务器反馈的信息,具体含义如下:
VRViewpointChangeFeedback:反馈当前视点位置信息;
VRViewportSpeedFeedback:反馈视角方向变化速度信息;
VRViewpointSpeedFeedback:反馈视点位置变化速度信息;
VRViewportDeltaChangeFeedback:反馈视角相对变化信息;
VRViewpointDeltaChangeFeedback:反馈视点位置相对变化信息。
表41中,0x0C至0x0E对应的信息是服务器向客户端发送的、用于描述所选取或渲染的窗口内容的信息,具体含义如下:
VR_content_window_range:服务器端选取或者渲染的内容范围大小信息;
VR_content_window_centre:服务器端选取或者渲染的内容中心位置信息;
VR_content_window_viewpoint:服务器选取或者渲染的内容所在视点的位置信息。
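为便于引用,表41中的应用信息类型取值可以用如下常量表示;取值直接取自表41,常量名为本文为便于阅读而取的假设性名称。

```python
# 表41中定义的应用信息类型取值
VR_VIEW_DEPENDENT_SUPPORT_QUERY    = 0x01  # VRViewDependentSupportQuery
VR_VIEW_DEPENDENT_SUPPORT_RESPONSE = 0x02  # VRViewDependentSupportResponse
VR_VIEWPOINT_CHANGE_FEEDBACK       = 0x07  # VRViewpointChangeFeedback
VR_VIEWPORT_SPEED_FEEDBACK         = 0x08  # VRViewportSpeedFeedback
VR_VIEWPOINT_SPEED_FEEDBACK        = 0x09  # VRViewpointSpeedFeedback
VR_VIEWPORT_DELTA_CHANGE_FEEDBACK  = 0x0A  # VRViewportDeltaChangeFeedback
VR_VIEWPOINT_DELTA_CHANGE_FEEDBACK = 0x0B  # VRViewpointDeltaChangeFeedback
VR_CONTENT_WINDOW_RANGE            = 0x0C  # VR_content_window_range
VR_CONTENT_WINDOW_CENTRE           = 0x0D  # VR_content_window_centre
VR_CONTENT_WINDOW_VIEWPOINT        = 0x0E  # VR_content_window_viewpoint
```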
可选地,VRViewpointChangeFeedback的一种可能的语法如表42所示。
表42:VRViewpointChangeFeedback的语法(原文为图像)
在表42中,app_message_type表示表41中所示的不同消息类型,posx,posy,posz表示视点位置在三维笛卡尔坐标系中的坐标位置。
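按上述语义,构造一条VRViewpointChangeFeedback消息体的过程可以示意如下;表42的实际字段位宽以原文图像为准,这里假设消息类型为1字节、各坐标为32位有符号整数。

```python
import struct

def vr_viewpoint_change_feedback(posx: int, posy: int, posz: int) -> bytes:
    """打包视点位置反馈消息体:app_message_type后跟三维笛卡尔坐标。
    类型取值0x07来自表41;坐标位宽为本文假设。"""
    return struct.pack("!Biii", 0x07, posx, posy, posz)

msg = vr_viewpoint_change_feedback(1, 2, 0)
```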
可选地,VRViewportSpeedFeedback的一种可能的语法如表43所示。
表43:VRViewportSpeedFeedback的语法(原文为图像)
在表43中,app_message_type表示表41中所示的不同消息类型,dirx_speed,diry_speed,dirz_speed分别表示在三维笛卡尔坐标系或者极坐标系下的视角方向变化速度。
可选地,VRViewpointSpeedFeedback的一种可能的语法如表44所示。
表44:VRViewpointSpeedFeedback的语法(原文为图像)
在表44中,app_message_type表示表41中所示的不同消息类型,posx_speed,posy_speed,posz_speed分别表示在三维笛卡尔坐标系下的视点位置变化速度。
可选地,VRViewportDeltaChangeFeedback的一种可能的语法如表45所示。
表45:VRViewportDeltaChangeFeedback的语法(原文为图像)
在表45中,app_message_type表示表41中所示的不同消息类型,delta_dirx,delta_diry,delta_dirz分别表示在三维笛卡尔坐标系或者极坐标系下,视角相对于特定视角或初始视角或前一视角的方向相对变化值。
可选地,VRViewpointDeltaChangeFeedback的一种可能的语法如表46所示。
表46:VRViewpointDeltaChangeFeedback的语法(原文为图像)
在表46中,app_message_type表示表41中所示的不同消息类型,delta_posx,delta_posy,delta_posz表示视点位置在三维笛卡尔坐标系中相对于特定视点或初始视点或前一视点的位置变化量。
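可以看出,表43至表46所示的几种反馈消息在结构上类似,都是消息类型后跟三个分量。作为示意,可以用如下函数统一打包;其中位宽与编码方式均为本文假设。

```python
import struct

def three_component_feedback(app_message_type: int, c1: int, c2: int, c3: int) -> bytes:
    """统一打包"消息类型 + 三个分量"形式的反馈消息,分量含义随类型而变:
    0x08: dirx_speed, diry_speed, dirz_speed(视角方向变化速度)
    0x09: posx_speed, posy_speed, posz_speed(视点位置变化速度)
    0x0A: delta_dirx, delta_diry, delta_dirz(视角方向相对变化值)
    0x0B: delta_posx, delta_posy, delta_posz(视点位置相对变化量)
    """
    return struct.pack("!Biii", app_message_type, c1, c2, c3)

# 例如,反馈视点相对于前一视点的位移(5, 0, -3):
msg = three_component_feedback(0x0B, 5, 0, -3)
```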
可选地,VR_content_window_range的一种可能的语法如表47所示。
表47:VR_content_window_range的语法(原文为图像)
在表47中,各个主要语法的语义如下:
app_message_type:表41中所示的不同消息类型;
Hor_resolution:服务器渲染或者发送的内容在宽度方向上的像素个数;
Ver_resolution:服务器渲染或者发送的内容在高度方向上的像素个数;
Hor_fov:服务器渲染或者发送的内容水平方向视场角度覆盖范围;
Ver_fov:服务器渲染或者发送的内容垂直方向视场角度覆盖范围。
可选地,VR_content_window_centre的一种可能的语法如表48所示。
表48:VR_content_window_centre的语法(原文为图像)
在表48中,app_message_type表示表41中所示的不同消息类型;centre_x,centre_y,centre_z共同表示服务器向客户端发送的视频的中心点位置信息。
可选地,centre_x对应三维坐标的方位角(azimuth)或者偏航角(yaw);centre_y对应俯仰角(pitch或者elevation);centre_z对应倾斜角(tilt)或者翻滚角(roll)。
应理解,也可以只保留centre_x和centre_y,其中,centre_x和centre_y分别表示对应区域的中心点二维坐标值或视角区域左上角坐标值。
可选地,VR_content_window_viewpoint的一种可能的语法如表49所示。
表49:VR_content_window_viewpoint的语法(原文为图像)
在表49中,app_message_type表示表41中所示的不同消息类型;posx_s,posy_s,posz_s表示服务器向客户端发送的窗口视频所在的视点位置信息,具体可以用三维笛卡尔坐标系中的坐标位置信息表示。
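作为示意,客户端收到服务器发来的窗口信息(表47至表49)后,可以按如下思路解析;各字段的位宽与编码方式均为本文假设,实际以各表原文为准。

```python
import struct

def parse_window_message(data: bytes) -> dict:
    """按假设的布局(1字节类型 + 32位有符号整数字段)解析窗口信息消息。"""
    app_message_type = data[0]
    body = data[1:]
    if app_message_type == 0x0C:  # VR_content_window_range
        hor_res, ver_res, hor_fov, ver_fov = struct.unpack("!4i", body[:16])
        return {"type": "range", "hor_resolution": hor_res,
                "ver_resolution": ver_res, "hor_fov": hor_fov, "ver_fov": ver_fov}
    if app_message_type == 0x0D:  # VR_content_window_centre
        centre = struct.unpack("!3i", body[:12])
        return {"type": "centre", "centre": centre}
    if app_message_type == 0x0E:  # VR_content_window_viewpoint
        viewpoint = struct.unpack("!3i", body[:12])
        return {"type": "viewpoint", "viewpoint": viewpoint}
    raise ValueError("未知的消息类型: 0x%02X" % app_message_type)
```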
在实施例八中,通过在MMT信令中增加视点位置信息,能够支持视点变化情况下的VR视角传输应用。
图10是本申请实施例的客户端的示意性框图。图10所示的客户端500包括:发送模块510和接收模块520。
客户端500中的发送模块510和接收模块520可以执行图1和图2中所示的方法中的各个步骤。
当客户端500执行图1所示的方法时,发送模块510和接收模块520的具体作用如下:
发送模块510,用于向服务器发送第一信息,所述第一信息用于指示第一目标视频图像的区域的空间信息,所述第一目标视频图像包括当前视角之内的视频图像;
接收模块520,用于接收所述服务器发送的第二目标视频图像对应的视频数据包,其中,所述第二目标视频图像包括所述第一目标视频图像,在所述第二目标视频图像对应的视频数据包中,至少一个视频数据包共携带第二信息,所述第二信息用于指示所述第二目标视频图像的区域的空间信息。
当客户端500执行图2所示的方法时,发送模块510和接收模块520的具体作用如下:
接收模块520,用于接收所述服务器发送的描述文件,所述描述文件携带至少两个会话的会话描述信息,所述至少两个会话为所述客户端与所述服务器之间的会话,所述至少两个会话用于传输各自对应的子区域图像的码流数据,所述会话描述信息包括通过所述各个会话各自所传输的子区域图像的码流数据对应的子区域的空间信息,其中,子区域是对全景视频图像的区域进行划分得到的,子区域图像为子区域之内的视频图像;
发送模块510,用于向服务器发送第一信息,所述第一信息用于指示当前视角所覆盖的子区域对应的会话,所述第一信息根据所述当前视角和所述会话描述信息确定;
所述接收模块520还用于接收所述服务器发送的目标视频图像的码流数据,所述目标视频图像包括所述当前视角所覆盖的子区域之内的视频图像。
图11是本申请实施例的服务器的示意性框图。图11所示的服务器600包括:接收模块610、确定模块620和发送模块630。
服务器600中的接收模块610、确定模块620和发送模块630可以执行图3和图4中所示的方法中的各个步骤。
当服务器600执行图3所示的方法时,接收模块610、确定模块620和发送模块630的具体作用如下:
接收模块610,用于接收客户端发送的第一信息,所述第一信息用于指示第一目标视频图像的区域的空间位置,所述第一目标视频图像包括当前视角之内的视频图像;
确定模块620,用于根据所述第一信息确定第二目标视频图像,所述第二目标视频图像包括所述第一目标视频图像;
发送模块630,用于向所述客户端发送所述第二目标视频图像对应的视频数据包,在所述第二目标视频图像对应的视频数据包中,至少一个视频数据包共携带第二信息,所述第二信息用于指示所述第二目标视频图像的区域的空间信息。
应理解,当服务器600执行图4所示的方法时,只采用接收模块610和发送模块630即可,其中,接收模块610和发送模块630的具体作用如下:
发送模块630,用于向客户端发送描述文件,所述描述文件携带至少两个会话的会话描述信息,所述至少两个会话为所述客户端与所述服务器之间的会话,所述至少两个会话用于传输各自对应的子区域图像的码流数据,所述会话描述信息包括通过所述各个会话各自所传输的子区域图像的码流数据对应的子区域的空间信息,其中,子区域是对全景视频图像的区域进行划分得到的,子区域图像为子区域之内的视频图像;
接收模块610,用于接收所述客户端发送的第一信息,所述第一信息用于指示当前视角所覆盖的子区域对应的会话,所述第一信息根据所述当前视角和所述会话描述信息确定;
所述发送模块630还用于向所述客户端发送目标视频图像的码流数据,所述目标视频图像包括所述当前视角所覆盖的子区域之内的视频图像。
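作为帮助理解模块职责划分的示意,图10与图11所描述的结构可以用如下骨架代码表示;这只是假设性的伪实现,其中transport、panorama等对象及其方法名均为本文假设。

```python
class Client:
    """对应图10:发送模块510与接收模块520的职责划分。"""
    def __init__(self, transport):
        self.transport = transport  # 假设的底层传输对象

    def send_first_info(self, first_info):
        # 发送模块510:向服务器发送第一信息(目标视频图像区域的空间信息)
        self.transport.send(first_info)

    def receive_video_packets(self):
        # 接收模块520:接收携带第二信息的视频数据包
        return self.transport.recv()


class Server:
    """对应图11:接收模块610、确定模块620与发送模块630的职责划分。"""
    def __init__(self, transport, panorama):
        self.transport = transport
        self.panorama = panorama  # 假设的全景视频源对象

    def serve_once(self):
        first_info = self.transport.recv()                # 接收模块610
        target = self.panorama.select_region(first_info)  # 确定模块620:确定第二目标视频图像
        self.transport.send(target)                       # 发送模块630:发送对应的视频数据包
```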
图12是本申请实施例的传输媒体数据的装置的硬件结构示意图。图12所示的装置700可以视为是一种计算机设备,装置700可以作为本申请实施例的客户端500或者服务器600的一种实现方式,也可以作为本申请实施例的传输媒体数据的方法的一种实现方式,装置700包括处理器710、存储器720、输入/输出接口730和总线750,还可以包括通信接口740。其中,处理器710、存储器720、输入/输出接口730和通信接口740通过总线750实现彼此之间的通信连接。
处理器710可以采用通用的中央处理器(central processing unit,CPU),微处理器,应用专用集成电路(application specific integrated circuit,ASIC),或者一个或多个集成电路,用于执行相关程序,以实现本申请实施例的客户端或者服务器中的模块所需执行的功能,或者执行本申请方法实施例的传输媒体数据的方法。处理器710可能是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法的各步骤可以通过处理器710中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器710可以是通用处理器、数字信号处理器(digital signal processing,DSP)、ASIC、现场可编程门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件,可以实现或者执行本申请实施例中公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器,该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器、闪存、只读存储器、可编程只读存储器、电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器720,处理器710读取存储器720中的信息,结合其硬件完成本申请实施例的客户端或者服务器中包括的模块所需执行的功能,或者执行本申请方法实施例的传输媒体数据的方法。
存储器720可以是只读存储器(read only memory,ROM),静态存储设备,动态存储设备或者随机存取存储器(random access memory,RAM)。存储器720可以存储操作系统以及其他应用程序。在通过软件或者固件来实现本申请实施例的客户端或者服务器中包括的模块所需执行的功能,或者执行本申请方法实施例的传输媒体数据的方法时,用于实现本申请实施例提供的技术方案的程序代码保存在存储器720中,并由处理器710来执行客户端或者服务器中包括的模块所需执行的操作,或者执行本申请方法实施例提供的传输媒体数据的方法。
输入/输出接口730用于接收输入的数据和信息,输出操作结果等数据。
通信接口740使用例如但不限于收发器一类的收发装置,来实现装置700与其他设备或通信网络之间的通信。例如,通信接口740可以作为处理装置中的获取模块或者发送模块。
总线750可包括在装置700各个部件(例如处理器710、存储器720、输入/输出接口730和通信接口740)之间传送信息的通路。
应注意,尽管图12所示的装置700仅仅示出了处理器710、存储器720、输入/输出接口730、通信接口740以及总线750,但是在具体实现过程中,本领域的技术人员应当明白,装置700还包括实现正常运行所必须的其他器件,例如还可以包括显示器,用于显示要播放的视频数据。同时,根据具体需要,本领域的技术人员应当明白,装置700还可包括实现其他附加功能的硬件器件。此外,本领域的技术人员应当明白,装置700也可仅仅包括实现本申请实施例所必须的器件,而不必包括图12中所示的全部器件。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (72)

  1. 一种传输媒体数据的方法,其特征在于,包括:
    客户端向服务器发送第一信息,所述第一信息用于指示第一目标视频图像的区域的空间信息,所述第一目标视频图像包括当前视角之内的视频图像;
    所述客户端接收所述服务器发送的第二目标视频图像对应的视频数据包,其中,所述第二目标视频图像包括所述第一目标视频图像,在所述第二目标视频图像对应的视频数据包中,至少一个视频数据包共携带第二信息,所述第二信息用于指示所述第二目标视频图像的区域的空间信息。
  2. 如权利要求1所述的方法,其特征在于,所述第一目标视频图像为所述当前视角之内的视频图像,所述第一信息包括所述当前视角的空间信息。
  3. 如权利要求2所述的方法,其特征在于,所述方法还包括:
    所述客户端向所述服务器发送第一视点信息,所述第一视点信息用于指示所述当前视角所在的当前视点。
  4. 如权利要求3所述的方法,其特征在于,所述第一视点信息包括所述当前视点的空间位置信息、所述当前视点的位置变化速度信息和所述当前视点的位置变化速度的加速度信息中的至少一种。
  5. 如权利要求3或4所述的方法,其特征在于,所述第一视点信息包括当前视点的空间位置信息,所述方法还包括:
    所述客户端向所述服务器发送第一指示信息,所述第一指示信息包括第一标识位,所述第一标识位的取值用于指示所述当前视点的空间位置信息为相对空间位置信息或者绝对空间位置信息;
    其中,当所述第一标识位为第一取值时,所述当前视点的空间位置信息为相对空间位置信息;
    当所述第一标识位为第二取值时,所述当前视点的空间位置信息为绝对空间位置信息。
  6. 如权利要求2-5中任一项所述的方法,其特征在于,所述方法还包括:
    所述客户端向所述服务器发送第二指示信息,所述第二指示信息包括第二标识位,所述第二标识位的取值用于指示所述当前视角的空间信息的组成信息,所述第二标识位的取值用于指示下列情况中的至少一种:
    当所述第二标识位为第三取值时,所述当前视角的空间信息由所述当前视角的视角方向信息组成;
    当所述第二标识位为第四取值时,所述当前视角的空间信息由所述当前视角的视角方向信息和所述当前视角所在的视点的位置信息组成;
    当所述第二标识位为第五取值时,所述当前视角的空间信息由所述当前视角的视角方向信息、所述当前视角所在视点的位置信息以及所述当前视角的视角大小信息组成。
  7. 如权利要求1所述的方法,其特征在于,全景视频图像包括对所述全景视频图像进行划分得到的至少两个子区域,其中,所述当前视角覆盖至少一个子区域,所述第一信息用于指示所述当前视角覆盖的子区域,所述当前视角覆盖的子区域用于拼接得到所述第一目标视频图像的区域。
  8. 如权利要求1-7中任一项所述的方法,其特征在于,所述第二信息包括所述第二目标视频图像的区域的空间信息。
  9. 如权利要求1-7中任一项所述的方法,其特征在于,全景视频图像包括对所述全景视频图像进行划分得到的至少两个子区域,其中,所述当前视角覆盖至少一个子区域,所述第二信息用于指示所述第二目标视频图像覆盖的子区域,所述第二目标视频图像覆盖的子区域用于拼接得到所述第二目标视频图像的区域。
  10. 如权利要求9所述的方法,其特征在于,所述第二信息包括至少一个第三信息,所述至少一个视频数据包中每个视频数据包均携带第三信息,所述至少一个视频数据包共携带所述至少一个第三信息,所述至少一个视频数据包中的任一视频数据包携带的第三信息用于指示所述任一视频数据包对应的视频图像所属的子区域。
  11. 如权利要求1-10中任一项所述的方法,其特征在于,所述至少一个视频数据包包括携带视角标识的视频数据包。
  12. 如权利要求1-11中任一项所述的方法,其特征在于,所述方法还包括:
    所述客户端接收所述服务器发送的描述文件,所述描述文件携带第一视角信息或者第二视角信息,其中,所述第一视角信息用于指示所述服务器支持的视角的最大区域范围,所述第二视角信息用于指示初始视角的区域范围。
  13. 如权利要求12所述的方法,其特征在于,在所述描述文件携带所述第二视角信息时,所述描述文件还携带第三视角信息,所述第三视角信息还用于指示所述初始视角的空间位置。
  14. 如权利要求1-11中任一项所述的方法,其特征在于,所述方法还包括:
    所述客户端接收所述服务器发送的描述文件,全景视频图像包括对所述全景视频图像进行划分得到的至少两个子区域,所述描述文件携带各个子区域的子区域描述信息,所述子区域描述信息包括子区域的空间信息。
  15. 如权利要求14所述的方法,其特征在于,所述子区域描述信息包括所述各个子区域的平面空间信息,所述子区域描述信息还包括所述各个子区域之内的视频图像的映射类型信息,所述各个子区域的球面空间信息用于根据所述映射类型信息和所述各个子区域的平面空间信息确定。
  16. 如权利要求14所述的方法,其特征在于,所述子区域描述信息包括所述各个子区域的球面空间信息,所述子区域描述信息还包括所述各个子区域的形状信息。
  17. 如权利要求1-16中任一项所述的方法,其特征在于,所述至少一个视频数据包共携带第二视点信息,所述第二视点信息用于指示所述第二目标视频图像对应的视点。
  18. 一种传输媒体数据的方法,其特征在于,包括:
    客户端接收所述服务器发送的描述文件,所述描述文件携带至少两个会话的会话描述信息,所述至少两个会话为所述客户端与所述服务器之间的会话,所述至少两个会话用于传输各自对应的子区域图像的码流数据,所述会话描述信息包括通过所述各个会话各自所传输的子区域图像的码流数据对应的子区域的空间信息,其中,子区域是对全景视频图像的区域进行划分得到的,子区域图像为子区域之内的视频图像;
    所述客户端向服务器发送第一信息,所述第一信息用于指示当前视角所覆盖的子区域对应的会话,所述第一信息根据所述当前视角和所述会话描述信息确定;
    所述客户端接收所述服务器发送的目标视频图像的码流数据,所述目标视频图像包括所述当前视角所覆盖的子区域之内的视频图像。
  19. 一种传输媒体数据的方法,其特征在于,包括:
    服务器接收客户端发送的第一信息,所述第一信息用于指示第一目标视频图像的区域的空间位置,所述第一目标视频图像包括当前视角之内的视频图像;
    所述服务器根据所述第一信息确定第二目标视频图像,所述第二目标视频图像包括所述第一目标视频图像;
    所述服务器向所述客户端发送所述第二目标视频图像对应的视频数据包,在所述第二目标视频图像对应的视频数据包中,至少一个视频数据包共携带第二信息,所述第二信息用于指示所述第二目标视频图像的区域的空间信息。
  20. 如权利要求19所述的方法,其特征在于,所述第一目标视频图像为所述当前视角之内的视频图像,所述第一信息包括所述当前视角的空间信息。
  21. 如权利要求20所述的方法,其特征在于,所述方法还包括:
    所述服务器接收所述客户端发送的第一视点信息,所述第一视点信息用于指示所述当前视角所在的当前视点。
  22. 如权利要求21所述的方法,其特征在于,所述第一视点信息包括所述当前视点的空间位置信息、所述当前视点的位置变化速度信息和所述当前视点的位置变化速度的加速度信息中的至少一种。
  23. 如权利要求21或22所述的方法,其特征在于,所述第一视点信息包括当前视点的空间位置信息,所述方法还包括:
    所述服务器接收所述客户端发送的第一指示信息,所述第一指示信息包括第一标识位,所述第一标识位的取值用于指示所述当前视点的空间位置信息为相对空间位置信息或者绝对空间位置信息;
    其中,当所述第一标识位为第一取值时,所述当前视点的空间位置信息为相对空间位置信息;
    当所述第一标识位为第二取值时,所述当前视点的空间位置信息为绝对空间位置信息。
  24. 如权利要求20-23中任一项所述的方法,其特征在于,所述方法还包括:
    所述服务器接收所述客户端发送的第二指示信息,所述第二指示信息包括第二标识位,所述第二标识位的取值用于指示所述当前视角的空间信息的组成信息,所述第二标识位的取值用于指示下列情况中的至少一种:
    当所述第二标识位为第三取值时,所述当前视角的空间信息由所述当前视角的视角方向信息组成;
    当所述第二标识位为第四取值时,所述当前视角的空间信息由所述当前视角的视角方向信息和所述当前视角所在的视点的位置信息组成;
    当所述第二标识位为第五取值时,所述当前视角的空间信息由所述当前视角的视角方向信息、所述当前视角所在视点的位置信息以及所述当前视角的视角大小信息组成。
  25. 如权利要求19所述的方法,其特征在于,全景视频图像包括对所述全景视频图像进行划分得到的至少两个子区域,其中,所述当前视角覆盖至少一个子区域,所述第一信息用于指示所述当前视角覆盖的子区域,所述当前视角覆盖的子区域用于拼接得到所述第一目标视频图像的区域。
  26. 如权利要求19-25中任一项所述的方法,其特征在于,所述第二信息包括所述第二目标视频图像的区域的空间信息。
  27. 如权利要求19-25中任一项所述的方法,其特征在于,全景视频图像包括对所述全景视频图像进行划分得到的至少两个子区域,其中,所述当前视角覆盖至少一个子区域,所述第二信息用于指示所述第二目标视频图像覆盖的子区域,所述第二目标视频图像覆盖的子区域用于拼接得到所述第二目标视频图像的区域。
  28. 如权利要求27所述的方法,其特征在于,所述第二信息包括至少一个第三信息,所述至少一个视频数据包中每个视频数据包均携带第三信息,所述至少一个视频数据包共携带所述至少一个第三信息,所述至少一个视频数据包中的任一视频数据包携带的第三信息用于指示所述任一视频数据包对应的视频图像所属的子区域。
  29. 如权利要求19-28中任一项所述的方法,其特征在于,所述至少一个视频数据包包括携带视角标识的视频数据包。
  30. 如权利要求19-29中任一项所述的方法,其特征在于,所述方法还包括:
    所述服务器向所述客户端发送描述文件,所述描述文件携带第一视角信息或者第二视角信息,其中,所述第一视角信息用于指示所述服务器支持的视角的最大区域范围,所述第二视角信息用于指示初始视角的区域范围。
  31. 如权利要求30所述的方法,其特征在于,在所述描述文件携带所述第二视角信息时,所述描述文件还携带第三视角信息,所述第三视角信息还用于指示所述初始视角的空间位置。
  32. 如权利要求19-29中任一项所述的方法,其特征在于,所述方法还包括:
    所述服务器向所述客户端发送描述文件,全景视频图像包括对所述全景视频图像进行划分得到的至少两个子区域,所述描述文件携带各个子区域的子区域描述信息,所述子区域描述信息包括子区域的空间信息。
  33. 如权利要求32所述的方法,其特征在于,所述子区域描述信息包括所述各个子区域的平面空间信息,所述子区域描述信息还包括所述各个子区域之内的视频图像的映射类型信息,所述各个子区域的球面空间信息用于根据所述映射类型信息和所述各个子区域的平面空间信息确定。
  34. 如权利要求32所述的方法,其特征在于,所述子区域描述信息包括所述各个子区域的球面空间信息,所述子区域描述信息还包括所述各个子区域的形状信息。
  35. 如权利要求19-34中任一项所述的方法,其特征在于,所述至少一个视频数据包共携带第二视点信息,所述第二视点信息用于指示所述第二目标视频图像对应的视点。
  36. 一种传输媒体数据的方法,其特征在于,包括:
    服务器向客户端发送描述文件,所述描述文件携带至少两个会话的会话描述信息,所述至少两个会话为所述客户端与所述服务器之间的会话,所述至少两个会话用于传输各自对应的子区域图像的码流数据,所述会话描述信息包括通过所述各个会话各自所传输的子区域图像的码流数据对应的子区域的空间信息,其中,子区域是对全景视频图像的区域进行划分得到的,子区域图像为子区域之内的视频图像;
    所述服务器接收所述客户端发送的第一信息,所述第一信息用于指示当前视角所覆盖的子区域对应的会话,所述第一信息根据所述当前视角和所述会话描述信息确定;
    所述服务器向所述客户端发送目标视频图像的码流数据,所述目标视频图像包括所述当前视角所覆盖的子区域之内的视频图像。
  37. 一种客户端,其特征在于,包括:
    发送模块,用于向服务器发送第一信息,所述第一信息用于指示第一目标视频图像的区域的空间信息,所述第一目标视频图像包括当前视角之内的视频图像;
    接收模块,用于接收所述服务器发送的第二目标视频图像对应的视频数据包,其中,所述第二目标视频图像包括所述第一目标视频图像,在所述第二目标视频图像对应的视频数据包中,至少一个视频数据包共携带第二信息,所述第二信息用于指示所述第二目标视频图像的区域的空间信息。
  38. 如权利要求37所述的客户端,其特征在于,所述第一目标视频图像为所述当前视角之内的视频图像,所述第一信息包括所述当前视角的空间信息。
  39. 如权利要求38所述的客户端,其特征在于,所述发送模块还用于向所述服务器发送第一视点信息,所述第一视点信息用于指示所述当前视角所在的当前视点。
  40. 如权利要求39所述的客户端,其特征在于,所述第一视点信息包括所述当前视点的空间位置信息、所述当前视点的位置变化速度信息和所述当前视点的位置变化速度的加速度信息中的至少一种。
  41. 如权利要求39或40所述的客户端,其特征在于,所述第一视点信息包括当前视点的空间位置信息,所述发送模块还用于向所述服务器发送第一指示信息,所述第一指示信息包括第一标识位,所述第一标识位的取值用于指示所述当前视点的空间位置信息为相对空间位置信息或者绝对空间位置信息;
    其中,当所述第一标识位为第一取值时,所述当前视点的空间位置信息为相对空间位置信息;
    当所述第一标识位为第二取值时,所述当前视点的空间位置信息为绝对空间位置信息。
  42. 如权利要求38-41中任一项所述的客户端,其特征在于,所述发送模块还用于向所述服务器发送第二指示信息,所述第二指示信息包括第二标识位,所述第二标识位的取值用于指示所述当前视角的空间信息的组成信息,所述第二标识位的取值用于指示下列情况中的至少一种:
    当所述第二标识位为第三取值时,所述当前视角的空间信息由所述当前视角的视角方向信息组成;
    当所述第二标识位为第四取值时,所述当前视角的空间信息由所述当前视角的视角方向信息和所述当前视角所在的视点的位置信息组成;
    当所述第二标识位为第五取值时,所述当前视角的空间信息由所述当前视角的视角方向信息、所述当前视角所在视点的位置信息以及所述当前视角的视角大小信息组成。
  43. 如权利要求37所述的客户端,其特征在于,全景视频图像包括对所述全景视频图像进行划分得到的至少两个子区域,其中,所述当前视角覆盖至少一个子区域,所述第一信息用于指示所述当前视角覆盖的子区域,所述当前视角覆盖的子区域用于拼接得到所述第一目标视频图像的区域。
  44. 如权利要求37-43中任一项所述的客户端,其特征在于,所述第二信息包括所述第二目标视频图像的区域的空间信息。
  45. 如权利要求37-43中任一项所述的客户端,其特征在于,全景视频图像包括对所述全景视频图像进行划分得到的至少两个子区域,其中,所述当前视角覆盖至少一个子区域,所述第二信息用于指示所述第二目标视频图像覆盖的子区域,所述第二目标视频图像覆盖的子区域用于拼接得到所述第二目标视频图像的区域。
  46. 如权利要求45所述的客户端,其特征在于,所述第二信息包括至少一个第三信息,所述至少一个视频数据包中每个视频数据包均携带第三信息,所述至少一个视频数据包共携带所述至少一个第三信息,所述至少一个视频数据包中的任一视频数据包携带的第三信息用于指示所述任一视频数据包对应的视频图像所属的子区域。
  47. 如权利要求37-46中任一项所述的客户端,其特征在于,所述至少一个视频数据包包括携带视角标识的视频数据包。
  48. 如权利要求37-47中任一项所述的客户端,其特征在于,所述接收模块还用于:
    接收所述服务器发送的描述文件,所述描述文件携带第一视角信息或者第二视角信息,其中,所述第一视角信息用于指示所述服务器支持的视角的最大区域范围,所述第二视角信息用于指示初始视角的区域范围。
  49. 如权利要求48所述的客户端,其特征在于,在所述描述文件携带所述第二视角信息时,所述描述文件还携带第三视角信息,所述第三视角信息还用于指示所述初始视角的空间位置。
  50. 如权利要求37-47中任一项所述的客户端,其特征在于,所述接收模块还用于:
    接收所述服务器发送的描述文件,全景视频图像包括对所述全景视频图像进行划分得到的至少两个子区域,所述描述文件携带各个子区域的子区域描述信息,所述子区域描述信息包括子区域的空间信息。
  51. 如权利要求50所述的客户端,其特征在于,所述子区域描述信息包括所述各个子区域的平面空间信息,所述子区域描述信息还包括所述各个子区域之内的视频图像的映射类型信息,所述各个子区域的球面空间信息用于根据所述映射类型信息和所述各个子区域的平面空间信息确定。
  52. 如权利要求50所述的客户端,其特征在于,所述子区域描述信息包括所述各个子区域的球面空间信息,所述子区域描述信息还包括所述各个子区域的形状信息。
  53. 如权利要求37-52中任一项所述的客户端,其特征在于,所述至少一个视频数据包共携带第二视点信息,所述第二视点信息用于指示所述第二目标视频图像对应的视点。
  54. 一种客户端,其特征在于,包括:
    接收模块,用于接收所述服务器发送的描述文件,所述描述文件携带至少两个会话的会话描述信息,所述至少两个会话为所述客户端与所述服务器之间的会话,所述至少两个会话用于传输各自对应的子区域图像的码流数据,所述会话描述信息包括通过所述各个会话各自所传输的子区域图像的码流数据对应的子区域的空间信息,其中,子区域是对全景视频图像的区域进行划分得到的,子区域图像为子区域之内的视频图像;
    发送模块,用于向服务器发送第一信息,所述第一信息用于指示当前视角所覆盖的子区域对应的会话,所述第一信息根据所述当前视角和所述会话描述信息确定;
    所述接收模块还用于接收所述服务器发送的目标视频图像的码流数据,所述目标视频图像包括所述当前视角所覆盖的子区域之内的视频图像。
  55. 一种服务器,其特征在于,包括:
    接收模块,用于接收客户端发送的第一信息,所述第一信息用于指示第一目标视频图像的区域的空间位置,所述第一目标视频图像包括当前视角之内的视频图像;
    确定模块,用于根据所述第一信息确定第二目标视频图像,所述第二目标视频图像包括所述第一目标视频图像;
    发送模块,用于向所述客户端发送所述第二目标视频图像对应的视频数据包,在所述第二目标视频图像对应的视频数据包中,至少一个视频数据包共携带第二信息,所述第二信息用于指示所述第二目标视频图像的区域的空间信息。
  56. 如权利要求55所述的服务器,其特征在于,所述第一目标视频图像为所述当前视角之内的视频图像,所述第一信息包括所述当前视角的空间信息。
  57. 如权利要求56所述的服务器,其特征在于,所述接收模块还用于接收所述客户端发送的第一视点信息,所述第一视点信息用于指示所述当前视角所在的当前视点。
  58. 如权利要求57所述的服务器,其特征在于,所述第一视点信息包括所述当前视点的空间位置信息、所述当前视点的位置变化速度信息和所述当前视点的位置变化速度的加速度信息中的至少一种。
  59. 如权利要求57或58所述的服务器,其特征在于,所述第一视点信息包括当前视点的空间位置信息,所述接收模块还用于接收所述客户端发送的第一指示信息,所述第一指示信息包括第一标识位,所述第一标识位的取值用于指示所述当前视点的空间位置信息为相对空间位置信息或者绝对空间位置信息;
    其中,当所述第一标识位为第一取值时,所述当前视点的空间位置信息为相对空间位置信息;
    当所述第一标识位为第二取值时,所述当前视点的空间位置信息为绝对空间位置信息。
  60. 如权利要求56-59中任一项所述的服务器,其特征在于,所述接收模块还用于接收所述客户端发送的第二指示信息,所述第二指示信息包括第二标识位,所述第二标识位的取值用于指示所述当前视角的空间信息的组成信息,所述第二标识位的取值用于指示下列情况中的至少一种:
    当所述第二标识位为第三取值时,所述当前视角的空间信息由所述当前视角的视角方向信息组成;
    当所述第二标识位为第四取值时,所述当前视角的空间信息由所述当前视角的视角方向信息和所述当前视角所在的视点的位置信息组成;
    当所述第二标识位为第五取值时,所述当前视角的空间信息由所述当前视角的视角方向信息、所述当前视角所在视点的位置信息以及所述当前视角的视角大小信息组成。
  61. 如权利要求55所述的服务器,其特征在于,全景视频图像包括对所述全景视频图像进行划分得到的至少两个子区域,其中,所述当前视角覆盖至少一个子区域,所述第一信息用于指示所述当前视角覆盖的子区域,所述当前视角覆盖的子区域用于拼接得到所述第一目标视频图像的区域。
  62. 如权利要求55-61中任一项所述的服务器,其特征在于,所述第二信息包括所述第二目标视频图像的区域的空间信息。
  63. 如权利要求55-61中任一项所述的服务器,其特征在于,全景视频图像包括对所述全景视频图像进行划分得到的至少两个子区域,其中,所述当前视角覆盖至少一个子区域,所述第二信息用于指示所述第二目标视频图像覆盖的子区域,所述第二目标视频图像覆盖的子区域用于拼接得到所述第二目标视频图像的区域。
  64. 如权利要求63所述的服务器,其特征在于,所述第二信息包括至少一个第三信息,所述至少一个视频数据包中每个视频数据包均携带第三信息,所述至少一个视频数据包共携带所述至少一个第三信息,所述至少一个视频数据包中的任一视频数据包携带的第三信息用于指示所述任一视频数据包对应的视频图像所属的子区域。
  65. 如权利要求55-64中任一项所述的服务器,其特征在于,所述至少一个视频数据包包括携带视角标识的视频数据包。
  66. 如权利要求55-65中任一项所述的服务器,其特征在于,所述发送模块还用于:
    向所述客户端发送描述文件,所述描述文件携带第一视角信息或者第二视角信息,其中,所述第一视角信息用于指示所述服务器支持的视角的最大区域范围,所述第二视角信息用于指示初始视角的区域范围。
  67. 如权利要求66所述的服务器,其特征在于,在所述描述文件携带所述第二视角信息时,所述描述文件还携带第三视角信息,所述第三视角信息还用于指示所述初始视角的空间位置。
  68. 如权利要求55-65中任一项所述的服务器,其特征在于,所述发送模块还用于:
    向所述客户端发送描述文件,全景视频图像包括对所述全景视频图像进行划分得到的至少两个子区域,所述描述文件携带各个子区域的子区域描述信息,所述子区域描述信息包括子区域的空间信息。
  69. 如权利要求68所述的服务器,其特征在于,所述子区域描述信息包括所述各个子区域的平面空间信息,所述子区域描述信息还包括所述各个子区域之内的视频图像的映射类型信息,所述各个子区域的球面空间信息用于根据所述映射类型信息和所述各个子区域的平面空间信息确定。
  70. 如权利要求68所述的服务器,其特征在于,所述子区域描述信息包括所述各个子区域的球面空间信息,所述子区域描述信息还包括所述各个子区域的形状信息。
  71. 如权利要求55-70中任一项所述的服务器,其特征在于,所述至少一个视频数据包共携带第二视点信息,所述第二视点信息用于指示所述第二目标视频图像对应的视点。
  72. 一种服务器,其特征在于,包括:
    发送模块,用于向客户端发送描述文件,所述描述文件携带至少两个会话的会话描述信息,所述至少两个会话为所述客户端与所述服务器之间的会话,所述至少两个会话用于传输各自对应的子区域图像的码流数据,所述会话描述信息包括通过所述各个会话各自所传输的子区域图像的码流数据对应的子区域的空间信息,其中,子区域是对全景视频图像的区域进行划分得到的,子区域图像为子区域之内的视频图像;
    接收模块,用于接收所述客户端发送的第一信息,所述第一信息用于指示当前视角所覆盖的子区域对应的会话,所述第一信息根据所述当前视角和所述会话描述信息确定;
    所述发送模块还用于向所述客户端发送目标视频图像的码流数据,所述目标视频图像包括所述当前视角所覆盖的子区域之内的视频图像。
PCT/CN2018/105036 2018-08-02 2018-09-11 传输媒体数据的方法、客户端和服务器 WO2020024373A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/072735 WO2020024567A1 (zh) 2018-08-02 2019-01-22 传输媒体数据的方法、客户端和服务器

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810873806.X 2018-08-02
CN201810873806.XA CN110798707B (zh) 2018-08-02 2018-08-02 传输媒体数据的方法、客户端和服务器

Publications (1)

Publication Number Publication Date
WO2020024373A1 true WO2020024373A1 (zh) 2020-02-06

Family

ID=69230571

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/CN2018/105036 WO2020024373A1 (zh) 2018-08-02 2018-09-11 传输媒体数据的方法、客户端和服务器
PCT/CN2019/072735 WO2020024567A1 (zh) 2018-08-02 2019-01-22 传输媒体数据的方法、客户端和服务器

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/072735 WO2020024567A1 (zh) 2018-08-02 2019-01-22 传输媒体数据的方法、客户端和服务器

Country Status (3)

Country Link
US (1) US11368729B2 (zh)
CN (1) CN110798707B (zh)
WO (2) WO2020024373A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SG11202110312XA (en) * 2019-03-20 2021-10-28 Beijing Xiaomi Mobile Software Co Ltd Method and device for transmitting viewpoint switching capabilities in a vr360 application
WO2021011772A1 (en) * 2019-07-16 2021-01-21 Apple Inc. Streaming of volumetric point cloud content based on session description protocols and real time protocols
CN115086635B (zh) * 2021-03-15 2023-04-14 腾讯科技(深圳)有限公司 多视角视频的处理方法、装置、设备及存储介质
CN117581550A (zh) * 2021-11-10 2024-02-20 英特尔公司 基于内容分区的沉浸式媒体实时通信

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SG188630A1 (en) * 2010-09-24 2013-04-30 Gnzo Inc Video bit stream transmission system
JP5002047B2 (ja) * 2010-11-05 2012-08-15 シャープ株式会社 立体画像データ再生装置
US9349195B2 (en) * 2012-03-19 2016-05-24 Google Inc. Apparatus and method for spatially referencing images
EP2824883A1 (en) * 2013-07-12 2015-01-14 Alcatel Lucent A video client and video server for panoramic video consumption
CN108886639B (zh) * 2016-02-02 2021-05-07 弗劳恩霍夫应用研究促进协会 视频流传输中的场景部分和感兴趣区域处理
CN107318008A (zh) * 2016-04-27 2017-11-03 深圳看到科技有限公司 全景视频播放方法及播放装置
US10582201B2 (en) * 2016-05-19 2020-03-03 Qualcomm Incorporated Most-interested region in an image
CN106101847A (zh) * 2016-07-12 2016-11-09 三星电子(中国)研发中心 全景视频交互传输的方法和系统
EP3501014A1 (en) * 2016-08-17 2019-06-26 VID SCALE, Inc. Secondary content insertion in 360-degree video
BR112019003605A2 (pt) * 2016-08-25 2019-05-21 Lg Electronics Inc. método para transmitir vídeo omnidirecional, método para receber vídeo omnidirecional, aparelho para transmitir vídeo omnidirecional, e aparelho para receber vídeo omnidirecional
US10595069B2 (en) * 2016-12-05 2020-03-17 Adobe Inc. Prioritizing tile-based virtual reality video streaming using adaptive rate allocation
EP3334164B1 (en) * 2016-12-09 2019-08-21 Nokia Technologies Oy A method and an apparatus and a computer program product for video encoding and decoding
CN107396077B (zh) * 2017-08-23 2022-04-08 深圳看到科技有限公司 虚拟现实全景视频流投影方法和设备
CN107995493B (zh) * 2017-10-30 2021-03-19 河海大学 一种全景视频的多描述视频编码方法
CN108111832A (zh) * 2017-12-25 2018-06-01 北京麒麟合盛网络技术有限公司 增强现实ar视频的异步交互方法及系统

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105491353A (zh) * 2016-01-15 2016-04-13 广东小天才科技有限公司 一种远程监控方法和装置
CN105916060A (zh) * 2016-04-26 2016-08-31 乐视控股(北京)有限公司 数据传输的方法、装置及系统
CN105915937A (zh) * 2016-05-10 2016-08-31 上海乐相科技有限公司 一种全景视频播放方法及设备
CN105898254A (zh) * 2016-05-17 2016-08-24 亿唐都科技(北京)有限公司 节省带宽的vr全景视频布局方法、装置及展现方法、系统
WO2018067680A1 (en) * 2016-10-05 2018-04-12 Hidden Path Entertainment, Inc. System and method of capturing and rendering a stereoscopic panorama using a depth buffer
WO2018108104A1 (zh) * 2016-12-13 2018-06-21 中兴通讯股份有限公司 一种全景视频传输方法、装置、终端、服务器及系统
CN106791888A (zh) * 2016-12-20 2017-05-31 三星电子(中国)研发中心 基于用户视角的全景图片的传输方法和装置
CN107395984A (zh) * 2017-08-25 2017-11-24 北京佰才邦技术有限公司 一种视频传输的方法及装置
CN108040260A (zh) * 2017-12-13 2018-05-15 北京视博云科技有限公司 C/s架构下高清全景视频的观看方法及系统、终端及服务器

Also Published As

Publication number Publication date
CN110798707A (zh) 2020-02-14
US11368729B2 (en) 2022-06-21
CN110798707B (zh) 2023-06-16
US20210160552A1 (en) 2021-05-27
WO2020024567A1 (zh) 2020-02-06

Similar Documents

Publication Publication Date Title
US11282283B2 (en) System and method of predicting field of view for immersive video streaming
RU2711591C1 (ru) Способ, устройство и компьютерная программа для адаптивной потоковой передачи мультимедийного контента виртуальной реальности
WO2020024373A1 (zh) 传输媒体数据的方法、客户端和服务器
US20190325652A1 (en) Information Processing Method and Apparatus
EP3557845B1 (en) Method and device for transmitting panoramic videos, terminal, server and system
CN110519652B (zh) Vr视频播放方法、终端及服务器
TWI670973B (zh) 在iso基本媒體檔案格式推導虛擬實境投影、填充、感興趣區域及視埠相關軌跡並支援視埠滾動訊號之方法及裝置
US11451838B2 (en) Method for adaptive streaming of media
US20200145736A1 (en) Media data processing method and apparatus
US10659815B2 (en) Method of dynamic adaptive streaming for 360-degree videos
US20210250568A1 (en) Video data processing and transmission methods and apparatuses, and video data processing system
WO2019157803A1 (zh) 传输控制方法
WO2021190221A1 (zh) 沉浸式媒体提供方法、获取方法、装置、设备及存储介质
US20220369000A1 (en) Split rendering of extended reality data over 5g networks
US20190199921A1 (en) Method for transmitting 360-degree video, method for receiving 360-degree video, 360-degree video transmitting device, and 360-degree video receiving device
WO2020063924A1 (zh) 传输媒体数据的方法、客户端和服务器
US20230033063A1 (en) Method, an apparatus and a computer program product for video conferencing
CN107438203B (zh) 用于建立和接收清单的方法、网络设备及终端
GB2568020A (en) Transmission of video content based on feedback
GB2567136A (en) Moving between spatially limited video content and omnidirectional video content
WO2018120474A1 (zh) 一种信息的处理方法及装置
EP3386203A1 (en) Signalling of auxiliary content for a broadcast signal
WO2023284487A1 (zh) 容积媒体的数据处理方法、装置、设备以及存储介质
US20240195966A1 (en) A method, an apparatus and a computer program product for high quality regions change in omnidirectional conversational video

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18928620

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18928620

Country of ref document: EP

Kind code of ref document: A1