WO2022222533A1 - Video playback method, apparatus and system, and computer-readable storage medium - Google Patents

Video playback method, apparatus and system, and computer-readable storage medium

Info

Publication number
WO2022222533A1
Authority
WO
WIPO (PCT)
Prior art keywords
rotation
video
slice
camera position
camera
Prior art date
Application number
PCT/CN2021/141641
Other languages
English (en)
French (fr)
Inventor
侯成宝
屈小刚
李虹波
曹阳
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to EP21937755.3A (published as EP4319168A1)
Publication of WO2022222533A1


Classifications

    • H04N 21/2393 Interfacing the upstream path of the transmission network, e.g. prioritizing client content requests, involving handling client requests
    • G06F 1/1694 Constructional details or arrangements related to integrated I/O peripherals, the I/O peripheral being a single or a set of motion sensors for pointer control or gesture input obtained by sensing movements of the portable computer
    • G06F 3/04815 Interaction with a metaphor-based environment or interaction object displayed as three-dimensional, e.g. changing the user viewpoint with respect to the environment or object
    • G06F 3/0484 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F 3/0487 Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06T 15/20 Perspective computation
    • G06T 19/003 Navigation within 3D models or images
    • H04N 21/21805 Source of audio or video content enabling multiple viewpoints, e.g. using a plurality of cameras
    • H04N 21/266 Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
    • H04N 21/437 Interfacing the upstream path of the transmission network, e.g. for transmitting client requests to a VOD server
    • H04N 21/462 Content or additional data management, e.g. creating a master electronic program guide from data received from the Internet and a Head-end, controlling the complexity of a video stream by scaling the resolution or bit-rate based on the client capabilities
    • H04N 21/4728 End-user interface for requesting content, additional data or services, for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region
    • H04N 21/6587 Control parameters, e.g. trick play commands, viewpoint selection
    • H04N 21/816 Monomedia components thereof involving special video data, e.g. 3D video
    • H04N 23/90 Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums
    • H04N 21/2387 Stream processing in response to a playback request from an end-user, e.g. for trick-play
    • H04N 21/8456 Structuring of content, e.g. decomposing content into time segments, by decomposing the content in the time domain
    • H04N 9/8227 Transformation of the television signal for recording, involving the multiplexing of an additional signal and the colour video signal, the additional signal being at least another television signal

Description

  • the present application relates to the technical field of video processing, and in particular, to a video playback method, device, and system, and a computer-readable storage medium.
  • Surround playback requires that front-end shooting use multiple cameras, distributed at specific locations, to capture video images of the same focus area from different angles. Camera synchronization technology is used to guarantee that the multiple cameras capture images at the same time and at the same frequency. The multiple cameras then send the collected video streams to a video processing platform, which processes the multiple video streams so that surround playback of the focus area can be realized on the terminal.
  • the server usually splices video frames with the same acquisition time in multiple video streams into one video frame.
  • front-end shooting uses 16 cameras to capture video images from different angles in the same focus area.
  • the server adjusts the resolution of the video frames in each of the 16 received video streams to 960×540, and then splices the 16 video frames with the same acquisition time into one 3840×2160 video frame in a 4×4 grid, thereby obtaining a single video stream.
  • the server sends this video stream to the terminal. After decoding the video stream, the terminal selects 1/16 of the video picture (the sub-picture captured by one camera) according to the set viewing camera position and plays it.
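  • the following Python sketch illustrates this related-art tile selection (assumptions not stated in the patent: a row-major 4×4 layout, 960×540 tiles, cameras indexed 0-15):

        # Sketch of the related-art 4x4 tile selection described above.
        # Assumptions: tiles laid out row-major, each 960x540, indices 0-15.
        TILE_W, TILE_H, GRID = 960, 540, 4

        def tile_rect(camera_index: int):
            """Return (x, y, w, h) of the sub-picture of one camera inside
            the spliced 3840x2160 frame."""
            row, col = divmod(camera_index, GRID)
            return (col * TILE_W, row * TILE_H, TILE_W, TILE_H)

        # Example: the viewer selects camera 6.
        print(tile_rect(6))  # -> (1920, 540, 960, 540)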
  • the present application provides a video playback method, device and system, and a computer-readable storage medium, which can address the limited applicability of video playback in the related art.
  • in a first aspect, a video playback method is provided. The method includes: an upper-layer device receives a playback request sent by a terminal, where the playback request includes playback camera position information, and the playback camera position information is used to indicate the target camera position requested for playback.
  • the upper-layer device sends the video slice corresponding to the target camera position and the rotated video data corresponding to the target camera position to the terminal.
  • the rotation video data includes video data corresponding to the forward camera position and/or video data corresponding to the reverse camera position, where the forward camera position includes one or more first camera positions located in the clockwise direction of the target camera position, and the reverse camera position includes one or more second camera positions located in the counterclockwise direction of the target camera position.
  • in this method, the upper-layer device sends to the terminal the video slice corresponding to the target camera position requested by the terminal together with the rotation video data corresponding to the target camera position. After receiving the video slice corresponding to the target camera position, the terminal can decode the video slice to play the video picture captured at the target camera position; when the terminal receives a rotation instruction, it can realize surround playback of the video picture according to the pre-acquired rotation video data, so the surround playback delay is low. Moreover, the resolution of the played video picture can be the same as the resolution of the video images in the video slice or in the rotation video data. Therefore, the video playback method provided by the present application is not limited by the number of cameras used for front-end shooting and has a wide range of applications. In addition, the upper-layer device does not need to always send the video images captured by all cameras to the terminal, which reduces the amount of data transmission and saves transmission resources.
  • optionally, the implementation process in which the upper-layer device sends the rotation video data corresponding to the target camera position to the terminal includes: in response to receiving a rotation preparation request sent by the terminal, the upper-layer device sends the rotation video data to the terminal, where the rotation preparation request is used to request the rotation video data corresponding to the target camera position; or, in response to the playback request, the upper-layer device sends the rotation video data to the terminal. That is, the upper-layer device may actively send the rotation video data corresponding to the requested camera position after receiving the playback request, or passively send it after receiving the rotation preparation request.
  • in a possible implementation, when the forward camera position includes one first camera position located in the clockwise direction of the target camera position, the video data corresponding to the forward camera position includes the video slice corresponding to that first camera position.
  • when the forward camera position includes a plurality of first camera positions located in the clockwise direction of the target camera position, the video data corresponding to the forward camera position includes the video slices corresponding to each first camera position, or is a forward rotation slice, where the forward rotation slice includes a forward dynamic rotation sub-slice and/or a forward static rotation sub-slice.
  • the forward dynamic rotation sub-slice includes a plurality of image frame groups obtained based on the video images in the video slices corresponding to the plurality of first camera positions, and each image frame group in it is obtained based on the video images in the video slice corresponding to one first camera position. The image frame groups in the forward dynamic rotation sub-slice are arranged in chronological order and in order of the distances from the first camera positions to the target camera position in the clockwise direction, from nearest to farthest.
  • the forward static rotation sub-slice likewise includes a plurality of image frame groups, each obtained based on the video images in the video slice corresponding to one first camera position. The image frame groups in the forward static rotation sub-slice correspond to the same playback period and are arranged in order of the distances from the first camera positions to the target camera position in the clockwise direction, from nearest to farthest.
  • correspondingly, when the reverse camera position includes one second camera position located in the counterclockwise direction of the target camera position, the video data corresponding to the reverse camera position includes the video slice corresponding to that second camera position.
  • when the reverse camera position includes a plurality of second camera positions located in the counterclockwise direction of the target camera position, the video data corresponding to the reverse camera position includes the video slices corresponding to each second camera position, or is a reverse rotation slice, where the reverse rotation slice includes a reverse dynamic rotation sub-slice and/or a reverse static rotation sub-slice.
  • the reverse dynamic rotation sub-slice includes a plurality of image frame groups obtained based on the video images in the video slices corresponding to the plurality of second camera positions, and each image frame group in it is obtained based on the video images in the video slice corresponding to one second camera position. The image frame groups in the reverse dynamic rotation sub-slice are arranged in chronological order and in order of the distances from the second camera positions to the target camera position in the counterclockwise direction, from nearest to farthest.
  • the reverse static rotation sub-slice likewise includes a plurality of image frame groups, each obtained based on the video images in the video slice corresponding to one second camera position. The image frame groups in the reverse static rotation sub-slice correspond to the same playback period and are arranged in order of the distances from the second camera positions to the target camera position in the counterclockwise direction, from nearest to farthest.
  • the image frame group involved in this application includes one or more frames of video images, and each image frame group can be decoded independently.
  • the upper-layer device may also receive a surround playback request sent by the terminal, where the surround playback request includes rotation camera position information, and the rotation camera position information is used to indicate a rotation range.
  • the upper-layer device determines playback time information based on the surround playback request.
  • the upper-layer device generates a rotation slice according to the rotation camera position information and the playback time information.
  • the rotation slice includes image frame groups corresponding to multiple camera positions within the rotation range, where each image frame group includes one or more frames of video images and can be decoded independently.
  • the upper-layer device sends the rotation slice to the terminal.
  • in this method, the upper-layer device determines the playback time information according to the surround playback request sent by the terminal, and then generates the rotation slice according to the playback time information and the rotation camera position information in the surround playback request. Because the rotation slice contains image frame groups corresponding to multiple camera positions within the rotation range indicated by the rotation camera position information, after receiving the rotation slice the terminal decodes it to realize surround playback of the video picture, and the resolution of the played video picture can be the same as the resolution of the video images in the rotation slice. Therefore, the video playback method provided by the present application is not limited by the number of cameras used for front-end shooting and has a wide range of applications.
  • the group of picture frames is a GOP.
  • the group of image frames includes interpolated frames.
  • the image frame group includes a combination of interpolated frames and P frames.
  • the image frame group includes a combination of interpolated frames, P frames, and B frames.
  • in other words, when the image frame group includes interpolated frames, the rotation slices can be generated based on the interpolated frames.
  • in this way, the video slices sent by the upper-layer device to the terminal do not need to consist entirely of I frames or to use mini GOPs; normal GOPs can be used instead, which reduces the data volume of the video slices sent by the upper-layer device to the terminal. In addition, the data volume of an interpolated frame is usually smaller than that of an I frame, which reduces the data volume of the rotation slices sent by the upper-layer device to the terminal. Therefore, generating rotation slices with the frame interpolation technique can effectively reduce the consumption of network transmission resources.
  • optionally, the rotation preparation request includes one or more of: the preparation rotation direction, the number of preparation rotation camera positions, the identifiers of the preparation rotation camera positions, or the preparation rotation state, where the preparation rotation state includes a dynamic rotation state and/or a static rotation state.
  • alternatively, the content of the rotation preparation request is pre-configured in the terminal.
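  • as an illustration, such a rotation preparation request could be assembled as follows (a hedged sketch; the field names and values are assumptions, not the patent's wire format):

        # Hypothetical construction of a rotation preparation request carrying
        # the fields listed above. All names and values are illustrative only.
        def build_rotation_preparation_request(direction: str,
                                               num_positions: int,
                                               position_ids: list,
                                               state: str) -> dict:
            return {
                "type": "rotation_preparation",
                "direction": direction,          # "clockwise" / "counterclockwise"
                "num_positions": num_positions,  # how many positions to prepare
                "position_ids": position_ids,    # identifiers of those positions
                "state": state,                  # "dynamic", "static", or "both"
            }

        print(build_rotation_preparation_request(
            "clockwise", 3, ["cam05", "cam06", "cam07"], "dynamic"))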
  • in a second aspect, a video playback method is provided. The method includes: when the terminal receives a playback instruction, the terminal sends a playback request generated based on the playback instruction to the upper-layer device, where the playback request includes playback camera position information, and the playback camera position information is used to indicate the target camera position requested for playback.
  • the terminal receives the video slice corresponding to the target camera position and the rotation video data corresponding to the target camera position sent by the upper-layer device.
  • the rotation video data includes the video data corresponding to the forward camera position and/or the video data corresponding to the reverse camera position; the forward camera position includes one or more first camera positions located in the clockwise direction of the target camera position, and the reverse camera position includes one or more second camera positions located in the counterclockwise direction of the target camera position.
  • when the terminal receives a rotation instruction while playing the video picture based on the video slice corresponding to the target camera position, the terminal determines the rotation direction according to the rotation instruction; the rotation direction is clockwise or counterclockwise. In response to the rotation video data including target video data corresponding to camera positions located in that rotation direction from the target camera position, the terminal plays the video picture based on the target video data.
  • in this method, the upper-layer device sends to the terminal the video slice corresponding to the target camera position requested by the terminal together with the rotation video data corresponding to the target camera position. After receiving the video slice, the terminal decodes it to play the video picture captured at the target camera position; when the terminal receives a rotation instruction, it can realize surround playback of the video picture according to the rotation video data, the surround playback delay is low, and the resolution of the played video picture can be the same as the resolution of the video images in the video slice or in the rotation video data. Therefore, the video playback method provided by the present application is not limited by the number of cameras used for front-end shooting and has a wide range of applications.
  • the upper-layer device does not need to always send the video images captured by all cameras to the terminal, which can reduce the amount of data transmission and save transmission resources.
  • optionally, the terminal may also generate a rotation preparation request, where the rotation preparation request is used to request the rotation video data corresponding to the target camera position.
  • the terminal sends the rotation preparation request to the upper-layer device, and the rotation video data corresponding to the target camera position is sent by the upper-layer device in response to the rotation preparation request.
  • optionally, the rotation preparation request includes one or more of: the preparation rotation direction, the number of preparation rotation camera positions, the identifiers of the preparation rotation camera positions, or the preparation rotation state, where the preparation rotation state includes a dynamic rotation state and/or a static rotation state.
  • alternatively, the content of the rotation preparation request is pre-configured in the terminal.
  • optionally, the target video data is a target rotation slice.
  • the target rotation slice includes a plurality of image frame groups obtained based on the video images in the video slices corresponding to the multiple camera positions located in the rotation direction of the target camera position, and each image frame group in the target rotation slice is obtained based on the video images in the video slice corresponding to one camera position in that rotation direction.
  • the image frame group includes one or more frames of video images, and each image frame group can be decoded independently.
  • in response to the terminal receiving the rotation instruction in the video playing state, the target rotation slice includes a dynamic rotation sub-slice; the multiple image frame groups in the dynamic rotation sub-slice are arranged in chronological order and in order of the distances from the camera positions to the target camera position in the rotation direction, from nearest to farthest.
  • in response to the terminal receiving the rotation instruction in the video paused state, the target rotation slice includes a static rotation sub-slice; the multiple image frame groups in the static rotation sub-slice correspond to the same playback period and are arranged in order of the distances from the camera positions to the target camera position in the rotation direction, from nearest to farthest.
  • in this case, the implementation process in which the terminal plays the video picture based on the target video data includes: the terminal decodes and plays the target rotation slice.
  • in this way, surround playback of the video picture can be realized by decoding the pre-acquired rotation slices, and the surround playback delay is low.
  • optionally, the target video data includes the video slices corresponding to a plurality of camera positions located in the rotation direction of the target camera position.
  • in this case, the implementation process in which the terminal plays the video picture based on the target video data includes: the terminal generates image frame groups based on the video images in the video slices corresponding to the multiple camera positions, where each image frame group includes one or more frames of video images and can be decoded independently.
  • the terminal then plays the video images in the generated image frame groups in order of the distances from the camera positions to the target camera position in the rotation direction, from nearest to farthest.
  • in this way, when the terminal receives a rotation instruction, the terminal can decode and play the video images in the pre-acquired video slices corresponding to the camera positions in the rotation direction, realizing surround playback of the video picture with low delay.
  • optionally, when the terminal receives the rotation instruction, the terminal sends a surround playback request generated based on the rotation instruction to the upper-layer device, where the surround playback request includes rotation camera position information, and the rotation camera position information is used to indicate the rotation range.
  • the terminal receives the rotation slice sent by the upper-layer device, where the rotation slice includes image frame groups corresponding to multiple camera positions within the rotation range, each image frame group includes one or more frames of video images, and each image frame group can be decoded independently.
  • the terminal decodes and plays the rotation slice.
  • optionally, the image frame group is a GOP; or, the image frame group includes interpolated frames; or, the image frame group includes a combination of interpolated frames and P frames; or, the image frame group includes a combination of interpolated frames, P frames, and B frames.
  • in a third aspect, a video playback device is provided.
  • the video playback device is an upper-layer device.
  • the apparatus includes a plurality of functional modules, and the plurality of functional modules interact to implement the methods in the first aspect and the various embodiments thereof.
  • the multiple functional modules may be implemented based on software, hardware, or a combination of software and hardware, and the multiple functional modules may be arbitrarily combined or divided based on specific implementations.
  • in a fourth aspect, a video playback device is provided.
  • the video playback device is a terminal.
  • the apparatus includes a plurality of functional modules, and the plurality of functional modules interact to implement the methods in the second aspect and the respective embodiments thereof.
  • the multiple functional modules may be implemented based on software, hardware, or a combination of software and hardware, and the multiple functional modules may be arbitrarily combined or divided based on specific implementations.
  • in a fifth aspect, a video playback system is provided. The system includes an upper-layer device and a terminal, where the upper-layer device includes the video playback device according to the third aspect, and the terminal includes the video playback device according to the fourth aspect.
  • in a sixth aspect, a video playback device is provided, including a processor and a memory, where the memory is configured to store a computer program including program instructions, and the processor is configured to invoke the computer program to implement the video playback method according to any one of the first aspect, or to implement the video playback method according to any one of the second aspect.
  • in a seventh aspect, a computer storage medium is provided, storing instructions that, when executed by a processor of a computer device, implement the video playback method according to any one of the first aspect or the second aspect.
  • in an eighth aspect, a chip is provided, including a programmable logic circuit and/or program instructions; when the chip runs, it implements the method in the first aspect and its embodiments, or the method in the second aspect and its embodiments.
  • in a ninth aspect, a computer program product is provided, including a computer program that, when executed by a processor, implements the video playback method according to any one of the first aspect or the second aspect.
  • the upper-layer device sends to the terminal the video slice corresponding to the target camera position requested for playback together with the rotation video data corresponding to that camera position.
  • after receiving the video slice corresponding to the target camera position, the terminal decodes it to play the video picture captured at the target camera position; when the terminal receives a rotation instruction, it can realize surround playback of the video picture according to the prefetched rotation video data, the surround playback delay is low, and the resolution of the played video picture can be the same as the resolution of the video images in the video slice or in the rotation video data. Therefore, the video playback method provided by the embodiments of the present application is not limited by the number of cameras used for front-end shooting, and has a wide range of applications.
  • the upper-layer device does not need to always send the video images captured by all cameras to the terminal, which can reduce the amount of data transmission and save transmission resources.
  • in addition, rotation slices can be generated based on interpolated frames.
  • in that case, the video slices sent by the upper-layer device to the terminal do not need to consist entirely of I frames or use mini GOPs; normal GOPs can be used, which reduces the data volume of the video slices sent by the upper-layer device to the terminal.
  • moreover, the data volume of an interpolated frame is usually smaller than that of an I frame, which reduces the data volume of the rotation slices sent by the upper-layer device to the terminal. Therefore, using the frame interpolation technique to generate rotation slices can effectively reduce the consumption of network transmission resources.
  • FIG. 1 is a schematic structural diagram of a video playback system provided by an embodiment of the present application;
  • FIG. 2 is a schematic structural diagram of a video slice provided by an embodiment of the present application;
  • FIG. 3 is a schematic diagram comparing the structures of a GOP and an interpolated-frame stream obtained by encoding, provided by an embodiment of the present application;
  • FIG. 4 is a schematic diagram of a camera distribution scene on the media source side provided by an embodiment of the present application;
  • FIG. 5 is a schematic flowchart of a video playback method provided by an embodiment of the present application;
  • FIG. 6 is a schematic diagram of another camera distribution scene on the media source side provided by an embodiment of the present application;
  • FIG. 7 is a schematic structural diagram of video slices respectively corresponding to a plurality of camera positions, provided by an embodiment of the present application;
  • FIG. 8 is a schematic diagram of media content sent by an upper-layer device to a terminal according to an embodiment of the present application;
  • FIG. 9 is a schematic structural diagram of a forward static rotation slice provided by an embodiment of the present application;
  • FIG. 10 is a schematic structural diagram of video streams and interpolated-frame streams respectively corresponding to a plurality of camera positions, provided by an embodiment of the present application;
  • FIG. 11 is another schematic diagram of media content sent by an upper-layer device to a terminal, provided by an embodiment of the present application;
  • FIG. 12 is a schematic structural diagram of another forward static rotation slice provided by an embodiment of the present application;
  • FIG. 13 is a schematic diagram of a generation process of a dynamic rotation slice provided by an embodiment of the present application;
  • FIG. 14 is a schematic diagram of a generation process of another dynamic rotation slice provided by an embodiment of the present application;
  • FIG. 15 is a schematic diagram of a generation process of a static rotation slice provided by an embodiment of the present application;
  • FIG. 16 is a schematic diagram of a generation process of another static rotation slice provided by an embodiment of the present application;
  • FIG. 17 is a schematic structural diagram of a video playback device provided by an embodiment of the present application;
  • FIG. 18 is a schematic structural diagram of another video playback device provided by an embodiment of the present application;
  • FIG. 19 is a block diagram of a video playback apparatus provided by an embodiment of the present application.
  • FIG. 1 is a schematic structural diagram of a video playback system provided by an embodiment of the present application. As shown in FIG. 1 , the system includes: a media source 101 , a video server 102 and a terminal 103 .
  • the media source 101 is used to provide multiple video streams.
  • the media source 101 includes a plurality of cameras 1011 and a front-end encoder 1012 .
  • the camera 1011 is connected to the front-end encoder 1012 .
  • Each camera 1011 is used to capture one video stream and transmit the captured video stream to the front-end encoder 1012 .
  • the front-end encoder 1012 is configured to encode the video streams collected by the multiple cameras 1011 , and send the encoded video streams to the video server 102 .
  • multiple cameras 1011 are used to collect video images from different angles within the same focus area, and the multiple cameras 1011 collect images at the same time and frequency.
  • a camera synchronization technology can be used to achieve synchronous shooting of multiple cameras 1011 .
  • the number of cameras in the figure is only used for exemplary illustration, and not as a limitation on the video playback system provided by the embodiments of the present application.
  • the multiple cameras may be arranged in a ring-shaped arrangement or a fan-shaped arrangement, and the embodiment of the present application does not limit the arrangement of the cameras.
  • the video server 102 is configured to process the video stream sent by the media source 101 using the OTT (over the top) technology, and distribute the processed video stream to the terminal through a content delivery network (CDN).
  • CDN is an intelligent virtual network built on the existing network. CDN can include edge servers deployed in various places, and can also include central servers.
  • the video server 102 includes a video processing server 1021 and a video distribution server 1022 .
  • the video processing server 1021 is used to process the video stream using the OTT technology, and send the processed video stream to the video distribution server 1022; the video distribution server 1022 is used to distribute the video stream to the terminal.
  • the video processing server 1021 may also be called a video processing platform, and the video processing server 1021 may be one server, or a server cluster composed of several servers, or a cloud computing service center.
  • the video distribution server 1022 may be a central server or an edge server of the CDN. Certainly, the video processing server 1021 and the video distribution server 1022 may also be integrated together, which is not limited in this embodiment of the present application.
  • the terminal 103 is a video player, and is used for decoding and playing the video stream sent by the video server 102 .
  • the terminal 103 can change the playback angle through one or more control methods such as touch control, voice control, gesture control, or remote control control.
  • This embodiment of the present application does not limit the control manner for triggering the terminal to change the playback angle.
  • the terminal 103 may be a device such as a mobile phone, a tablet computer, or a smart wearable device that can change the playback angle through touch or voice control.
  • the terminal 103 may also be a device such as a set-top box (STB) that can change the playback angle through the control of a remote control.
  • the front-end encoder 1012 on the media source 101 side or the video processing server 1021 on the video server 102 side re-encodes (which may also be referred to as transcoding) each video stream to obtain groups of pictures (GOP), and generates video slices based on the GOPs for transmission, where each GOP can be decoded independently.
  • a video slice is usually encapsulated with multiple GOPs, and each GOP includes one or more frames of video images.
  • for example, a GOP may include an intra-coded picture (I) frame; alternatively, a GOP may include an I frame and one or more predictive-coded picture (P) frames located after the I frame; alternatively, a GOP may include an I frame, one or more P frames located after the I frame, and one or more bidirectionally predicted picture (B) frames located between the I frame and the P frames.
  • a GOP is usually a set of temporally continuous video images.
  • the time stamp of the GOP obtained by re-encoding the video stream corresponds to the acquisition time of the video image in the GOP by the camera.
  • the timestamp of the GOP may be set to the acquisition moment of the last frame of video image in the GOP.
  • alternatively, the GOP corresponds to a start timestamp and an end timestamp, where the start timestamp is the acquisition moment of the first frame of video image in the GOP, and the end timestamp is the acquisition moment of the last frame of video image in the GOP.
  • a GOP with a time length of less than 1 second is generally referred to as a mini GOP.
  • the time length of a GOP can be set by administrators. With a fixed time length, the number of video image frames contained in each GOP is positively correlated with the shooting frame rate of the camera; that is, the higher the shooting frame rate of the camera, the more video image frames each GOP contains.
  • for example, under a fixed GOP time length, the GOP may include 2 frames of video images (at a frame rate of 25 frames per second (FPS)), 3 frames of video images (at 30 FPS), 5 frames of video images (at 50 FPS), or 6 frames of video images (at 60 FPS).
  • the GOP may also include only one frame of video image (that is, only include I frame) or include more frames of video images, which is not limited in this embodiment of the present application.
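  • the relationship between frame rate and GOP size stated above can be reproduced with a small calculation (a sketch; the 0.1-second GOP duration is an assumption chosen to match the example counts):

        # With a fixed GOP duration, frames per GOP grow with the shooting
        # frame rate; at 25 FPS the fractional count rounds down to 2.
        GOP_SECONDS = 0.1  # assumed fixed GOP time length

        def frames_per_gop(fps: int) -> int:
            return max(1, int(fps * GOP_SECONDS))

        for fps in (25, 30, 50, 60):
            print(fps, "FPS ->", frames_per_gop(fps), "frames per GOP")
        # 25 FPS -> 2, 30 FPS -> 3, 50 FPS -> 5, 60 FPS -> 6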
  • the GOPs in the video slices are encoded and encapsulated for independent transmission, so that each GOP can be used independently as a separate slice (also referred to as a sub-slice).
  • a video slice can be encapsulated in the fragmented MP4 (fmp4) format.
  • the fmp4 format is a streaming media format defined in the MPEG-4 standard proposed by the Moving Picture Experts Group (MPEG).
  • FIG. 2 is a schematic structural diagram of a video slice provided by an embodiment of the present application.
  • as shown in FIG. 2, the video slice includes n encapsulation headers and n data fields (mdat), where each mdat carries the data of one GOP; that is, the video slice encapsulates n GOPs, where n is an integer greater than 1.
  • Each encapsulation header includes a moof field.
  • the encapsulation method of the video slice may also be referred to as a multi-moof header encapsulation method.
  • the encapsulation header may further include a styp field and a sidx field.
  • the term "slice" (segment) involved in the embodiments of the present application refers to video data that can be acquired by an independent request, and the term "sub-slice" (sub-segment) refers to video data that can be decoded and played independently; a slice usually includes one or more sub-slices.
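  • the multi-moof layout of FIG. 2 can be walked with a simple box scan; the sketch below pairs each moof with the following mdat so that every (moof, mdat) pair can serve as one independently usable sub-slice (simplified: no 64-bit box sizes, no validation):

        import struct

        def iter_boxes(data: bytes):
            # Scan top-level fmp4 boxes: 4-byte big-endian size + 4-byte type.
            offset = 0
            while offset + 8 <= len(data):
                size, boxtype = struct.unpack_from(">I4s", data, offset)
                if size < 8:
                    break  # malformed or unsupported size; stop the sketch here
                yield boxtype.decode("ascii", "replace"), offset, size
                offset += size

        def sub_slices(data: bytes):
            # Pair each moof header with the mdat that follows it; one pair
            # corresponds to one GOP, i.e. one independently decodable sub-slice.
            pending_moof = None
            for boxtype, offset, size in iter_boxes(data):
                if boxtype == "moof":
                    pending_moof = (offset, size)
                elif boxtype == "mdat" and pending_moof is not None:
                    yield {"moof": pending_moof, "mdat": (offset, size)}
                    pending_moof = None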
  • the front-end encoder 1012 on the media source 101 side or the video processing server 1021 on the video server 102 side may also re-encode each video stream to obtain an interpolated-frame stream. The interpolated-frame stream includes a plurality of interpolated frames; an interpolated frame is a P frame encoded without reference to a temporal motion vector and can be regarded as a continuation of the I frame. In other words, an interpolated frame is a P frame that can be decoded without depending on the I frame: an ordinary P frame must rely on the I frame for decoding, whereas an interpolated frame can be decoded independently. In the following, the P' frame is used to represent the interpolated frame.
  • FIG. 3 is a schematic diagram comparing the structures of a GOP and an interpolated-frame stream obtained by encoding, provided by an embodiment of the present application.
  • the GOP includes an I frame and 9 P frames located after the I frame.
  • the 9 P frames are respectively P-0 to P-8.
  • the interpolated-frame stream includes 4 P' frames.
  • the four P' frames are P'-1, P'-3, P'-5, and P'-7, respectively.
  • the video picture corresponding to the P'-1 frame is the same as that of the P-1 frame, the video picture corresponding to the P'-3 frame is the same as that of the P-3 frame, the video picture corresponding to the P'-5 frame is the same as that of the P-5 frame, and the video picture corresponding to the P'-7 frame is the same as that of the P-7 frame.
  • in this way, the P-0 frame is decoded depending on the I frame to obtain its video image, the P-2 frame can be decoded depending on the P'-1 frame, the P-4 frame depending on the P'-3 frame, the P-6 frame depending on the P'-5 frame, and the P-8 frame depending on the P'-7 frame.
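  • the decode-dependency pattern of FIG. 3 can be summarized as follows (a sketch assuming the example's numbering, where playback joins the stream at an even-numbered P frame):

        def reference_for(p_index: int) -> str:
            # P-0 still needs the I frame; every later even P frame can be
            # decoded from the nearest preceding interpolated frame P'.
            if p_index == 0:
                return "I"
            assert p_index % 2 == 0, "the example joins only at even P frames"
            return "P'-%d" % (p_index - 1)  # P-2 -> P'-1, P-4 -> P'-3, ...

        for i in (0, 2, 4, 6, 8):
            print("P-%d decodes from %s" % (i, reference_for(i)))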
  • the video processing server 1021 on the video server 102 side also generates a media content index (also referred to as an OTT index) according to externally set data.
  • the media content index is used to describe the information of each video stream, and the media content index is essentially a file describing the information of the video stream.
  • the information of the video stream includes address information of the video stream, time information of the video stream, and the like.
  • the address information of the video stream is used to indicate the acquisition address of the video stream; for example, the address information of the video stream may be the uniform resource locator (URL) address corresponding to the video stream.
  • the time information of the video stream is used to indicate the start moment and end moment of each video slice in the video stream.
  • the start moment of a video slice may be the collection moment of the first frame of video image in the video slice, and the end moment of a video slice may be the collection moment of the last frame of video image in the video slice.
  • the media content index may also include camera position information.
  • the camera position information includes the number of camera positions (that is, the number of cameras on the media source side) and the camera position angle corresponding to each video stream.
  • the camera position angle corresponding to a video stream is the camera position angle of the camera that captures that video stream.
  • FIG. 4 is a schematic diagram of a camera distribution scene on a media source side provided by an embodiment of the present application.
  • the scene includes 20 cameras, which are denoted as cameras 1-20 respectively.
  • the 20 cameras are arranged in a ring, and are used to shoot the same focus area M, and the shooting focus is point O.
  • the camera position angle corresponding to one of the cameras can be set to 0°, and the camera position angles of the other cameras can be calculated accordingly.
  • for example, the camera position angle corresponding to camera 4 can be set to 0°, and the camera position angles corresponding to the other cameras are calculated from it.
  • accordingly, the camera position angle corresponding to camera 9 is 90°, the camera position angle corresponding to camera 14 is 180°, and the camera position angle corresponding to camera 19 is 270°.
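  • the angles above follow from simple arithmetic: 20 cameras in a ring are spaced 360/20 = 18° apart, with camera 4 taken as the 0° reference:

        NUM_CAMERAS = 20
        SPACING = 360 / NUM_CAMERAS  # 18 degrees between adjacent cameras

        def camera_angle(camera_no: int, reference_no: int = 4) -> float:
            # Angle grows with the camera number, modulo a full circle.
            return ((camera_no - reference_no) * SPACING) % 360

        for cam in (4, 9, 14, 19):
            print("camera %d: %.0f degrees" % (cam, camera_angle(cam)))
        # camera 4: 0, camera 9: 90, camera 14: 180, camera 19: 270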
  • the media content index in this embodiment of the present application may be an m3u8 file (which may be referred to as a hypertext transfer protocol (HTTP) live streaming (HLS) index) or a media presentation description (MPD) file (which may be referred to as a dynamic adaptive streaming over HTTP (DASH) index).
  • the m3u8 file refers to the m3u file in the 8-bit unicode transformation format (UTF-8) encoding format.
  • a video stream may be transmitted between the video server 102 and the terminal 103 based on HTTP.
  • the process of the terminal acquiring the video content in the video server includes: the terminal first downloads the media content index from the video server, and obtains the information of the video stream by parsing the media content index.
  • the terminal selects the current video stream to be played, extracts the URL address of the video stream from the media content index, and sends a media content request to the video server through the URL address of the video stream.
  • after receiving the media content request, the video server sends the corresponding video stream to the terminal.
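  • the request flow just described can be sketched as follows (the index layout and URLs are hypothetical placeholders, not defined by the patent; a real client would parse an m3u8 or MPD file rather than JSON):

        import json
        from urllib.request import urlopen

        INDEX_URL = "https://example.com/ott/index.json"  # hypothetical address

        def fetch(url: str) -> bytes:
            with urlopen(url) as resp:
                return resp.read()

        # 1. Download and parse the media content index.
        index = json.loads(fetch(INDEX_URL))
        # 2. Extract the URL of the video stream to be played (assumed layout).
        stream_url = index["streams"]["camera_04"]["url"]
        # 3. Send the media content request; the server replies with the stream.
        video_slice = fetch(stream_url)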
  • the video playback system may further include a network device 104 , and the video server 102 and the terminal 103 are connected through the network device 104 .
  • Network device 104 may be a gateway or other intermediary device.
  • the video server 102 and the terminal 103 may also be directly connected, which is not limited in this embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a video playback method provided by an embodiment of the present application. The method can be applied to the video playback system shown in FIG. 1 . As shown in Figure 5, the method includes:
  • Step 501: When the terminal receives a play instruction, the terminal generates a play request based on the play instruction.
  • the playback request includes playback camera position information, and the playback camera position information is used to indicate the target camera position requested for playback.
  • for example, the playback camera position information includes the URL address of the video stream corresponding to the target camera position.
  • the playback camera position information may include an identifier of the target camera position.
  • the playback request may also be referred to as a media content request.
  • Step 502: The terminal sends the playback request to the upper-layer device.
  • the upper-layer device refers to the upstream device of the terminal.
  • the upper layer device may be the video server 102 or the network device 104 in the video playback system as shown in FIG. 1 .
  • Step 503: The upper-layer device sends the video slice corresponding to the target camera position and the rotation video data corresponding to the target camera position to the terminal.
  • the rotation video data corresponding to the target camera position includes video data corresponding to the forward camera position and/or video data corresponding to the reverse camera position.
  • the forward camera position includes one or more first camera positions located in the clockwise direction of the target camera position.
  • the reverse camera position includes one or more second camera positions located in the counterclockwise direction of the target camera position.
  • for example, when the target camera position is camera 4 in FIG. 4, the forward camera positions may include camera 5, camera 6, and camera 7, and the reverse camera positions may include camera 3, camera 2, camera 1, and so on.
  • FIG. 6 is a schematic diagram of another camera distribution scene on the media source side provided by an embodiment of the present application. As shown in FIG. 6, the scene includes 9 cameras, denoted camera J to camera R.
  • the nine cameras are arranged in a fan-shaped manner and are used to shoot the same focus area M'.
  • for example, when the target camera position is camera N, the forward camera positions may include camera M, camera L, camera K, and camera J, and the reverse camera positions may include camera O, camera P, camera Q, and camera R.
  • when the forward camera position includes one first camera position located in the clockwise direction of the target camera position, the video data corresponding to the forward camera position includes the video slice corresponding to that first camera position.
  • when the forward camera position includes a plurality of first camera positions located in the clockwise direction of the target camera position, the video data corresponding to the forward camera position includes the video slices corresponding to each of the plurality of first camera positions; or, the video data corresponding to the forward camera position is a forward rotation slice.
  • the forward rotation slice includes a forward dynamic rotation sub-slice and/or a forward static rotation sub-slice.
  • the forward dynamic rotation sub-slice includes a plurality of image frame groups obtained based on the video images in the video slices corresponding to the plurality of first camera positions, and each image frame group in the forward dynamic rotation sub-slice is obtained based on the video images in the video slice corresponding to one first camera position.
  • the multiple image frame groups in the forward dynamic rotation sub-slice are arranged in chronological order, and in order of increasing distance, in the clockwise direction, from the corresponding first camera positions to the target camera position.
  • the forward static rotation sub-slice includes a plurality of image frame groups obtained based on the video images in the video slices corresponding to the plurality of first camera positions, and each image frame group in the forward static rotation sub-slice is obtained based on the video images in the video slice corresponding to one first camera position.
  • the playback periods corresponding to the multiple image frame groups in the forward static rotation sub-slice are the same, and the image frame groups are arranged in order of increasing distance, in the clockwise direction, from the corresponding first camera positions to the target camera position.
  • When the reverse camera position includes one second camera position, the video data corresponding to the reverse camera position includes the video slice corresponding to that second camera position.
  • When the reverse camera position includes a plurality of second camera positions, the video data corresponding to the reverse camera position includes the video slice corresponding to each of the plurality of second camera positions; or, the video data corresponding to the reverse camera position is a reverse rotation slice.
  • the reverse rotation slice includes reverse dynamic rotation sub-slices and/or reverse static rotation sub-slices.
  • the reverse dynamic rotation sub-slice includes a plurality of image frame groups obtained based on the video images in the video slices corresponding to the plurality of second camera positions, and each image frame group in the reverse dynamic rotation sub-slice is obtained based on the video images in the video slice corresponding to one second camera position.
  • the multiple image frame groups in the reverse dynamic rotation sub-slice are arranged in chronological order, and in order of increasing distance, in the counterclockwise direction, from the corresponding second camera positions to the target camera position.
  • the reverse static rotation sub-slice includes a plurality of image frame groups obtained based on the video images in the video slices corresponding to the plurality of second camera positions, and each image frame group in the reverse static rotation sub-slice is obtained based on the video images in the video slice corresponding to one second camera position.
  • the playback periods corresponding to the multiple image frame groups in the reverse static rotation sub-slice are the same, and the image frame groups are arranged in order of increasing distance, in the counterclockwise direction, from the corresponding second camera positions to the target camera position.
  • the difference between the dynamic rotation slice and the static rotation slice is that the multiple image frame groups included in the former are arranged in chronological order, and the multiple image frame groups included in the latter have the same playback period.
  • the former is prepared for surround playback in the video playing state, and the latter is prepared for surround playback in the video paused state.
  • each image frame group in the rotated slice includes one or more frames of video images, and each image frame group can be decoded independently.
  • the image frame group is a GOP.
  • the image frame group may include inserted frames.
  • the image frame group may include a combination of inserted frames and P-frames, where the P-frames are decoded depending on the inserted frames.
  • the image frame group may include a combination of inserted frames, P-frames, and B-frames, where the P-frames are decoded depending on the inserted frames, and the B-frames are decoded depending on the inserted frames and the P-frames.
  • the image frame group in the rotated slice is a GOP.
  • FIG. 7 is a schematic structural diagram of video slices corresponding to multiple cameras respectively according to an embodiment of the present application.
  • each camera in camera J to camera R corresponds to video slice a (corresponding to time period T1), video slice b (corresponding to time period T2), and video slice c (corresponding to time period T3), arranged in chronological order.
  • Each video slice includes 5 GOPs
  • video slice a includes GOPs numbered 1 to 5
  • video slice b includes GOPs numbered 6 to 10
  • video slice c includes GOPs numbered 11 to 15.
  • the target camera is camera N
  • the forward camera includes camera M, camera L and camera K
  • the reverse camera includes camera O, camera P and camera Q
  • the rotation video data corresponding to the target camera position includes forward dynamic rotation slices and reverse dynamic rotation slices.
  • the media content in the time period T1 sent by the upper-layer device to the terminal may be as shown in FIG. 8, including the video slice N-a corresponding to camera N, the forward dynamic rotation slice N-a1 corresponding to camera N, and the reverse dynamic rotation slice N-a2 corresponding to camera N.
  • the video slice N-a corresponding to the camera N includes N-1 to N-5;
  • the forward dynamic rotation slice N-a1 includes M-1, L-2, K-3, M-4 and L-5;
  • the reverse dynamic rotation slice N-a2 includes O-1, P-2, Q-3, O-4 and P-5.
  • M-1, L-2 and K-3 form one forward dynamic rotation sub-slice
  • M-4 and L-5 form another forward dynamic rotation sub-slice
  • that is, the forward dynamic rotation slice N-a1 includes two forward dynamic rotation sub-slices
  • O-1, P-2 and Q-3 form one reverse dynamic rotation sub-slice
  • O-4 and P-5 form another reverse dynamic rotation sub-slice
  • that is, the reverse dynamic rotation slice N-a2 includes two reverse dynamic rotation sub-slices, as the sketch below illustrates.
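  • The interleaving above follows a simple rule; the following minimal sketch (an illustration, not the patent's implementation; "camera-index" labels stand in for GOPs) reproduces the FIG. 8 slices by cycling through the rotation-direction camera positions while the GOP index advances chronologically.

```python
# Sketch only: build the dynamic rotation slices N-a1 and N-a2 of FIG. 8.
def dynamic_rotation_slice(cameras, gop_indices):
    # For each successive GOP index, take the GOP of the next camera in the
    # rotation direction: playback time advances while the viewpoint rotates.
    return [f"{cameras[(k - 1) % len(cameras)]}-{k}" for k in gop_indices]

forward = ["M", "L", "K"]  # nearest to farthest in the clockwise direction
reverse = ["O", "P", "Q"]  # nearest to farthest in the counterclockwise direction

print(dynamic_rotation_slice(forward, range(1, 6)))  # ['M-1', 'L-2', 'K-3', 'M-4', 'L-5']
print(dynamic_rotation_slice(reverse, range(1, 6)))  # ['O-1', 'P-2', 'Q-3', 'O-4', 'P-5']
```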
  • FIG. 9 is a schematic structural diagram of a forward static rotating slice provided by an embodiment of the present application.
  • the forward camera position corresponding to the camera N includes the camera M and the camera L
  • camera N corresponds to 5 forward static rotation sub-slices 1-5 in the time period T1.
  • the five forward static rotation sub-slices 1-5 correspond in time to N-1 to N-5.
  • forward static rotation sub-slice 1 includes M-1 and L-1
  • forward static rotation sub-slice 2 includes M-2 and L-2
  • forward static rotation sub-slice 3 includes M-3 and L-3
  • forward static rotation sub-slice 4 includes M-4 and L-4
  • forward static rotation sub-slice 5 includes M-5 and L-5.
  • forward static rotation sub-slice 1 is used for clockwise surround playback when the video picture is paused on the video image corresponding to N-1
  • forward static rotation sub-slice 2 is used for clockwise surround playback when the video picture is paused on the video image corresponding to N-2, and so on; this is not repeated in this embodiment of the present application.
  • the structure of the reverse static rotation sub-slice can refer to the structure of the forward static rotation sub-slice.
  • FIG. 10 is a schematic structural diagram of a video stream and an inserted frame stream respectively corresponding to multiple cameras provided by an embodiment of the present application.
  • each camera from camera J to camera R corresponds to a video stream and an inserted frame stream
  • the video stream includes multiple video slices (FIG. 10 shows only one GOP in one video slice)
  • the GOP includes an I frame and 9 P frames located after the I frame, the 9 P frames being P-0 to P-8 respectively.
  • the inserted frame stream includes a plurality of P' frames obtained by interval coding for a plurality of P frames in the GOP, including P'-1, P'-3, P'-5 and P'-7.
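  • The relationship between a GOP and its inserted frame stream can be sketched as follows; the helper is hypothetical, and the actual interval re-encoding of P frames into independently decodable P' frames is codec work not shown here.

```python
# Sketch only: a GOP of I + P-0..P-8 and the P' frames generated for every
# second P frame, mirroring the FIG. 10 layout.
def encode_as_switch_frame(p_frame_label: str) -> str:
    # Stand-in for re-encoding a P frame as an independently decodable P' frame.
    return p_frame_label.replace("P-", "P'-")

def build_inserted_frame_stream(gop_p_frames):
    return [encode_as_switch_frame(p) for i, p in enumerate(gop_p_frames) if i % 2 == 1]

gop_p_frames = [f"P-{i}" for i in range(9)]       # P-0 .. P-8 after the I frame
print(build_inserted_frame_stream(gop_p_frames))  # ["P'-1", "P'-3", "P'-5", "P'-7"]
```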
  • the target camera is camera N
  • the forward camera includes camera M, camera L and camera K
  • the reverse camera includes camera O, camera P and camera Q
  • the rotation video data corresponding to the target camera position includes forward dynamic rotation slices and reverse dynamic rotation slices.
  • the media content in the time period T1' sent by the upper-layer device to the terminal may be as shown in FIG. 11, including the N-GOP corresponding to camera N, the forward dynamic rotation slice N-a1' corresponding to camera N, and the reverse dynamic rotation slice N-a2' corresponding to camera N.
  • the N-GOP corresponding to the camera N includes NI and NP-0 to NP-8;
  • the forward dynamic rotation slice N-a1' includes MI, MP-0, LP'-1, LP-2, KP'-3, KP-4, MP'-5, MP-6, LP'-7 and LP-8;
  • the reverse dynamic rotation slice N-a2' includes OI, OP-0, PP'-1, PP-2, QP'-3, QP-4, OP'-5, OP-6, PP'-7 and PP-8.
  • MI, MP-0, LP'-1, LP-2, KP'-3 and KP-4 form one forward dynamic rotation sub-slice
  • MP'-5, MP-6, LP'-7 and LP-8 form another forward dynamic rotation sub-slice
  • OI, OP-0, PP'-1, PP-2, QP'-3 and QP-4 form one reverse dynamic rotation sub-slice
  • OP'-5, OP-6, PP'-7 and PP-8 form another reverse dynamic rotation sub-slice.
  • FIG. 12 is a schematic structural diagram of another forward static rotating slice provided by an embodiment of the present application.
  • the forward camera position corresponding to the camera N includes the camera M and the camera L
  • the camera N corresponds to 10 forward static rotation sub-slices 1-10 in the time period T1 ′
  • the 10 forward static rotation sub-slices 1-10 correspond in time to the 10 frames of video images in the N-GOP
  • the forward static rotation sub-slice 1 includes MI and LI
  • the forward static rotation sub-slice 2 includes MI, MP-0, LI and LP-0 (MI is used for MP-0 decoding, LI is used for LP-0 decoding)
  • forward static rotation sub-slice 3 includes MP'-1 and LP'-1
  • forward static rotation sub-slice 4 includes MP'-1, MP-2, LP'-1 and LP-2 (MP'-1 is used for MP-2 decoding, LP'-1 is used for LP-2 decoding), and so on.
  • Optionally, MI and LI may not be included in forward static rotation sub-slice 2; in that case, MP-0 depends on the MI in forward static rotation sub-slice 1 for decoding, and LP-0 depends on the LI in forward static rotation sub-slice 1 for decoding.
  • Similarly, MP'-1 and LP'-1 may not be included in forward static rotation sub-slice 4; in that case, MP-2 depends on the MP'-1 in forward static rotation sub-slice 3 for decoding, and LP-2 depends on the LP'-1 in forward static rotation sub-slice 3 for decoding; and so on.
  • That is, the static rotation sub-slice corresponding to a P frame can be decoded based on the static rotation sub-slice that contains the I frame or P' frame on which the decoding of that P frame depends.
  • forward static rotation sub-slice 1 is used for clockwise surround playback when the video picture is paused on the video image corresponding to NI
  • forward static rotation sub-slice 2 is used for clockwise surround playback when the video picture is paused on the video image corresponding to NP-0, and so on; this is not repeated in this embodiment of the present application.
  • the structure of the reverse static rotation sub-slice can refer to the structure of the forward static rotation sub-slice.
  • In the second implementation manner, since the rotation slice can be generated based on inserted frames, the video slices sent by the upper-layer device to the terminal do not need to use all-I-frame coding or mini GOPs; normal GOPs can be used instead.
  • This implementation manner can reduce the data volume of the video slices sent by the upper-layer device to the terminal.
  • the data volume of the inserted frame is usually smaller than the data volume of the I frame.
  • the second implementation manner can effectively reduce the consumption of network transmission resources.
  • In one manner, the upper-layer device may, in response to the playback request, send both the video slice corresponding to the target camera position and the rotation video data corresponding to the target camera position to the terminal; that is, both the video slice corresponding to the target camera position and the rotation video data corresponding to the target camera position may be sent by the upper-layer device in response to the playback request.
  • In another manner, the upper-layer device may, in response to the playback request, send only the video slice corresponding to the target camera position to the terminal.
  • the terminal may also generate a rotation preparation request, and send the rotation preparation request to the upper layer device.
  • the rotation preparation request is used to request to obtain the rotation video data corresponding to the target camera.
  • After receiving the rotation preparation request, the upper-layer device sends the rotation video data corresponding to the target camera position to the terminal. That is, the video slice corresponding to the target camera position is sent by the upper-layer device in response to the playback request, and the rotation video data corresponding to the target camera position is sent by the upper-layer device in response to the rotation preparation request.
  • the terminal may send a playback request and a rotation preparation request to the upper-layer device at the same time; or, the terminal may first send a playback request to the upper-layer device, and then send a rotation preparation request to the upper-layer device, which is not limited in this embodiment of the present application. It is worth noting that the rotation preparation request is sent by the terminal to the upper-layer device before receiving the rotation instruction, that is, the rotation preparation request is used to prefetch the rotation video data corresponding to the position requested to be played.
  • In other words, the upper-layer device may actively send the rotation video data corresponding to the requested camera position to the terminal, or may passively send the rotation video data corresponding to the requested camera position to the terminal in response to the rotation preparation request sent by the terminal.
  • the rotation preparation request includes one or more of the preparation rotation direction, the number of preparation rotation camera positions, the identifiers of the preparation rotation camera positions, or the preparation rotation state
  • the preparation rotation state includes a dynamic rotation state and/or a static rotation state
  • the content in the rotation preparation request is pre-configured in the terminal.
  • the dynamic rotation state is used to instruct the acquisition of dynamic rotation slices
  • the static rotation state is used to instruct the acquisition of static rotation slices.
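  • By way of illustration only, a rotation preparation request might carry fields such as the following; the field names are hypothetical and merely mirror the parameters listed above.

```python
# Sketch only: a possible shape for a rotation preparation request.
rotation_preparation_request = {
    "direction": "clockwise",         # preparation rotation direction
    "num_positions": 3,               # number of preparation rotation camera positions
    "position_ids": ["M", "L", "K"],  # identifiers of the preparation rotation camera positions
    "state": ["dynamic", "static"],   # preparation rotation state(s) requested
}
```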
  • the terminal may generate and send a rotation preparation request to the upper-layer device after receiving the preset trigger operation.
  • For example, when the terminal detects that the video picture is rotated clockwise into landscape display, it determines that the preparation rotation direction is the clockwise direction, and the terminal can request the video data corresponding to the forward camera position from the upper-layer device. For another example, when the terminal detects that the video picture is rotated counterclockwise into landscape display, it determines that the preparation rotation direction is the counterclockwise direction, and the terminal can request the video data corresponding to the reverse camera position from the upper-layer device. For another example, the terminal may display a target button on the display interface, and when the terminal detects a touch operation on the target button, the terminal requests the rotation video data corresponding to the target camera position from the upper-layer device. For another example, the terminal may request corresponding rotation video data from the upper-layer device based on the user's historical behavior data, and so on.
  • Step 504 When the terminal receives a rotation instruction in the process of playing the video picture based on the video segment corresponding to the target camera position, the terminal determines the rotation direction according to the rotation instruction.
  • the direction of rotation is either clockwise or counterclockwise.
  • The following describes the manner in which the terminal receives the rotation instruction in the process of playing the video picture based on the video slice corresponding to the target camera position.
  • In one manner, when the terminal detects a sliding operation on the video playback interface, the terminal determines that a rotation instruction is received.
  • the terminal determines the rotation direction according to the sliding direction of the sliding operation. For example, a swipe direction to the left represents a counterclockwise rotation, and a swipe direction to the right represents a clockwise rotation.
  • In another manner, when the terminal receives a target remote control instruction sent by a remote control device, the terminal determines that the rotation instruction is received.
  • the terminal determines the rotation direction according to the key identifier in the target remote control instruction. For example, when the remote control button information includes a left button identifier, it indicates that the rotation direction is counterclockwise, and when the remote control button information includes a right button identifier, it indicates that the rotation direction is clockwise.
  • other keys on the remote control device may also be set to control the rotation direction, which is not limited in this embodiment of the present application.
  • Step 505 When the rotation video data includes target video data corresponding to the camera positions located in the rotation direction of the target camera position, the terminal plays the video picture based on the target video data.
  • the target video data is a target rotation slice
  • the target rotation slice includes a plurality of image frame groups obtained based on the video images in the video slices corresponding to a plurality of camera positions located in the rotation direction of the target camera position
  • Each image frame group in the target rotation slice is obtained based on the video images in the video slice corresponding to a camera position located in the rotation direction of the target camera position.
  • When the terminal receives the rotation instruction in the video playing state, the target rotation slice includes a dynamic rotation sub-slice; the multiple image frame groups in the dynamic rotation sub-slice are arranged in chronological order, and in order of increasing distance, in the rotation direction, from the corresponding camera positions to the target camera position; that is, the target rotation slice is a dynamic rotation slice.
  • When the terminal receives the rotation instruction in the video paused state, the target rotation slice includes a static rotation sub-slice; the playback periods corresponding to the multiple image frame groups in the static rotation sub-slice are the same, and the image frame groups are arranged in order of increasing distance, in the rotation direction, from the corresponding camera positions to the target camera position; that is, the target rotation slice is a static rotation slice.
  • the implementation process for the terminal to play a video picture based on the target video data includes: the terminal decodes and plays the target rotated slice.
  • If the terminal receives the rotation instruction in the video playing state, and the rotation direction indicated by the rotation instruction is the clockwise direction, the terminal decodes and plays the forward dynamic rotation sub-slice corresponding to the playback time.
  • If the terminal receives the rotation instruction in the video playing state, and the rotation direction indicated by the rotation instruction is the counterclockwise direction, the terminal decodes and plays the reverse dynamic rotation sub-slice corresponding to the playback time.
  • If the terminal receives the rotation instruction in the video paused state, and the rotation direction indicated by the rotation instruction is the clockwise direction, the terminal decodes and plays the forward static rotation sub-slice corresponding to the video pause time. If the terminal receives the rotation instruction in the video paused state, and the rotation direction indicated by the rotation instruction is the counterclockwise direction, the terminal decodes and plays the reverse static rotation sub-slice corresponding to the video pause time.
  • the surround playback of the video picture can be realized by decoding the pre-acquired rotation slices, and the surround playback delay is low.
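  • The selection logic above can be summarized in a minimal sketch; the structure of the prefetched data and the helper name are assumptions, not the embodiment's actual interface. The playback state selects dynamic versus static sub-slices, and the rotation direction selects forward versus reverse.

```python
# Sketch only: pick the rotation sub-slice to decode in step 505.
def select_rotation_sub_slice(prefetched, playing: bool, direction: str, t):
    # prefetched is assumed to map (sense, kind) -> {time: sub_slice}.
    kind = "dynamic" if playing else "static"
    sense = "forward" if direction == "clockwise" else "reverse"
    return prefetched[(sense, kind)][t]  # sub-slice for playback (or pause) time t
```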
  • the target video data includes video segments corresponding to a plurality of camera positions located in the rotation direction of the target camera position.
  • the implementation process for the terminal to play a video picture based on the target video data includes: the terminal generates an image frame group based on the video images in the video slices corresponding to the multiple camera positions. The terminal plays the video images in the generated image frame group in sequence according to the distances from the multiple camera positions to the target camera position in the rotation direction from near to far.
  • Specifically, the terminal generates, based on the video images in the video slices corresponding to the multiple camera positions located in the rotation direction of the target camera position, multiple image frame groups arranged in chronological order and ordered by increasing distance from the corresponding camera positions to the target camera position in the rotation direction; each image frame group is generated based on the video images in the video slice corresponding to one camera position; the terminal then plays the multiple image frame groups in sequence.
  • For example, referring to the example shown in FIG. 7, the target camera position is camera N, the forward camera positions include camera M and camera L, and the terminal is currently playing the video slice corresponding to camera N.
  • Assume that a rotation instruction is received when playing to N-2, and the rotation direction determined based on the rotation instruction is the clockwise direction; the terminal then extracts, and decodes and plays in turn, M-3 in the video slice a corresponding to camera M and L-4 in the video slice a corresponding to camera L.
  • If the terminal receives the rotation instruction in the video paused state, the terminal generates image frame groups based on the video images corresponding to the video pause time in the video slices corresponding to the multiple camera positions; each image frame group is generated based on the video image corresponding to the video pause time in the video slice corresponding to one camera position.
  • the target camera position is camera N
  • the forward camera position includes camera M and camera L
  • the current video picture of the terminal is paused at N-2 in the video slice a corresponding to camera N.
  • Assume that a rotation instruction is received and the rotation direction determined based on the rotation instruction is the clockwise direction; the terminal extracts M-2 from the video slice a corresponding to camera M and L-2 from the video slice a corresponding to camera L, and decodes and plays M-2 and L-2 in turn.
  • Optionally, when the target video data includes only the video slice corresponding to one camera position located in the rotation direction of the target camera position, the terminal directly decodes and plays the video images in the video slice corresponding to that camera position. If the terminal receives the rotation instruction in the video playing state, the terminal decodes and plays the video image corresponding to the next playback moment in the video slice corresponding to that camera position.
  • the target camera position is camera N
  • the forward camera position includes camera M
  • the terminal is currently playing the video segment corresponding to camera N, assuming that a rotation instruction is received when playing to N-2
  • the rotation direction determined based on the rotation instruction is the clockwise direction.
  • the terminal decodes and plays M-3 in the video slice a corresponding to the camera M.
  • If the terminal receives the rotation instruction in the video paused state, the terminal decodes and plays the video image corresponding to the video pause time in the video slice corresponding to that camera position.
  • the target camera position is camera N
  • the forward camera position includes camera M
  • the current video picture of the terminal is paused at N-2 in the video slice a corresponding to camera N.
  • Assume that a rotation instruction is received, and the rotation direction determined based on the rotation instruction is the clockwise direction; the terminal then decodes and plays M-2 in the video slice a corresponding to camera M.
  • In this way, when the terminal receives a rotation instruction, the terminal can decode and play the video images in the pre-acquired video slices corresponding to the camera positions in the rotation direction to realize surround playback of the video picture, and the surround playback delay is low.
  • To sum up, the upper-layer device sends to the terminal the video slice corresponding to the target camera position that the terminal requests to play and the rotation video data corresponding to the target camera position; after receiving the video slice corresponding to the target camera position, the terminal decodes the video slice to play the video images collected at the target camera position, and when the terminal receives the rotation instruction, it realizes surround playback of the video picture according to the pre-acquired rotation video data.
  • The surround playback delay is low, and the resolution of the played video picture may be the same as the resolution of the video images in the video slice or in the rotation video data. Therefore, the video playback method provided by the embodiments of the present application is not limited by the number of cameras used for front-end shooting, and has a wide range of applications.
  • the upper-layer device does not need to always send the video images captured by all cameras to the terminal, which can reduce the amount of data transmission and save transmission resources.
  • the following steps 506 to 511 may also be performed.
  • Step 506 The terminal generates a surround playback request based on the rotation instruction.
  • the surround playback request includes rotation camera position information, and the rotation camera position information is used to indicate the rotation range.
  • In one manner, the terminal can determine the starting camera position, the ending camera position, and the rotation direction according to the rotation instruction.
  • In this case, the rotation camera position information may include the identifier of the starting camera position, the identifier of the ending camera position, and the rotation direction.
  • In another manner, the terminal may determine the rotation angle according to the rotation instruction, and in this case the rotation camera position information may include the rotation angle.
  • the surround playback request generated by the terminal is used to request dynamic surround playback of video content.
  • the surround playback request is also used to determine the playback start time and playback end time.
  • the surround playback request further includes playback time information, where the playback time information includes one or more of a playback start time, a playback end time, or a surround playback duration.
  • the surround playback request generated by the terminal is used to request static surround playback of the video content.
  • the surround playback request is also used to determine the target playback time.
  • the surround playback request includes the target playback time, and the target playback time may be the video pause time.
  • the static surround playback of video content refers to the surround playback of the video images corresponding to the target playback moments provided by multiple cameras.
  • In one manner, when the terminal detects a sliding operation on the video playback interface, the terminal determines that a rotation instruction is received.
  • the terminal determines the rotation position information according to the sliding information of the sliding operation, where the sliding information includes one or more of a sliding start position, a sliding length, a sliding direction or a sliding angle.
  • the terminal generates a surround playback request based on the rotating camera position information.
  • the sliding start position, sliding length and sliding direction can be used to determine the starting position, the ending position and the rotation direction.
  • the sliding angle can be used to determine the rotation angle.
  • the sliding starting position corresponds to the starting camera position
  • the sliding direction corresponds to the rotation direction
  • the sliding length is used to define the number of camera positions to switch; sliding to the left indicates counterclockwise rotation, and sliding to the right indicates clockwise rotation.
  • For example, the unit length can be set to 1 cm; if the sliding length reaches 3 cm, it means that 3 camera positions are switched.
  • the sliding sensitivity is negatively related to the setting value of the unit length, that is, the smaller the setting value of the unit length is, the higher the sliding sensitivity is.
  • the sliding sensitivity can be set according to actual needs.
  • For example, assume the sliding direction is to the right, the sliding length is 5 cm, and the unit length is 1 cm.
  • The terminal then determines that the rotation direction is the clockwise direction and that 5 camera positions are switched, and the ending camera position is camera 14.
  • the surround playback duration may also be defined by the sliding duration.
  • the surround playback duration can be equal to the sliding duration.
  • the sliding angle is used to determine the rotation angle. The rotation angle and the sliding angle may be set to satisfy a certain relationship; for example, the rotation angle may be made equal to the sliding angle, or equal to 2 times the sliding angle, and so on.
  • the rotation position information includes the rotation angle
  • the positive and negative of the rotation angle may also be used to indicate the rotation direction. For example, if the rotation angle is positive, it means clockwise rotation, and if the rotation angle is negative, it means counterclockwise rotation.
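  • The sliding-information mapping described above amounts to simple arithmetic; the following sketch reproduces the 5 cm example, taking camera 9 as the starting camera position (consistent with the FIG. 4 examples given later). The unit length and field names are assumptions.

```python
# Sketch only: map sliding information to rotation camera position information.
UNIT_LENGTH_CM = 1.0  # smaller unit length -> higher sliding sensitivity

def rotation_info_from_slide(start_camera: int, slide_dir: str, slide_len_cm: float):
    direction = "clockwise" if slide_dir == "right" else "counterclockwise"
    switched = int(slide_len_cm / UNIT_LENGTH_CM)  # number of camera positions to switch
    step = 1 if direction == "clockwise" else -1
    return {"start": start_camera,
            "end": start_camera + step * switched,
            "direction": direction}

# Sliding 5 cm to the right from camera 9 ends at camera 14, as in the example above.
print(rotation_info_from_slide(9, "right", 5.0))
```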
  • In another manner, when the terminal receives the target remote control instruction sent by the remote control device, the terminal determines that the rotation instruction is received.
  • the target remote control instruction includes remote control key information, and the remote control key information includes key identification and/or key times.
  • the terminal determines the rotation position information according to the remote control key information. Then, the terminal generates a surround playback request based on the rotating camera position information.
  • the key identification can be used to determine the rotation direction.
  • the number of keys can be used to determine the number of switch positions.
  • the rotation direction is determined based on the key identification. For example, when the remote control button information includes a left button identifier, it indicates that the rotation direction is counterclockwise, and when the remote control button information includes a right button identifier, it indicates that the rotation direction is clockwise.
  • other keys on the remote control device may also be set to control the rotation direction, which is not limited in this embodiment of the present application.
  • the number of key presses is used to define the number of camera positions to switch. For example, if the number of key presses is 1, it means that one camera position is switched. For another example, assuming that the remote control key information includes the identifier of the left key and the number of key presses is 3, it means that 3 camera positions are switched by counterclockwise rotation. Referring to FIG. 4, assuming the starting camera position is camera 9, the terminal determines according to the key identifier that the rotation direction is the counterclockwise direction, determines according to the number of key presses that the number of switched camera positions is 3, and then determines that camera 6 is the ending camera position.
  • the surround playback duration may also be defined by the button duration.
  • the duration of the surround playback can be equal to the duration of the key press.
  • Step 507 The terminal sends a surround playback request to the upper layer device.
  • Step 508 The upper-layer device determines playback time information based on the surround playback request.
  • the surround playback request is used to request dynamic surround playback of video content
  • the playback time information includes a playback start time and a playback end time.
  • the implementation manners for the upper-layer device to determine the playback time information based on the surround playback request include the following five:
  • the implementation process of step 508 includes: the upper-layer device determines the playback start time and the playback end time according to the time when the surround playback request is received and the preset policy.
  • the preset strategy includes the preset surround playback duration.
  • the preset strategy is defined as follows: the video playback time when the upper-layer device receives the surround playback request is used as the playback start time, and the interval duration between the playback end time and the playback start time is equal to the preset surround playback duration.
  • the video playback time is 00:19:35
  • the preset surround playback duration is 2 seconds
  • the upper-layer device determines that the playback start time is 00:19:35, and the playback end time is 00:19:37.
  • Optionally, the preset strategy may also define that a video playback time spaced a certain duration apart from the video playback time corresponding to the reception time of the surround playback request is used as the playback start time; the playback start time may be located before the reception time of the surround playback request in time sequence, or after it.
  • the reception time of the surround playback request is 00:19:35
  • the playback start time may be 00:19:34
  • the playback start time may also be 00:19:36.
  • the surround playback request includes a playback start time and a playback end time.
  • the implementation process of step 508 includes: the upper-layer device identifies the playback start time and the playback end time in the surround playback request.
  • the specified field of the pre-defined or pre-configured surround playback request is used to carry the playback start time and the playback end time.
  • the pre-definition may be defined in a standard or a protocol; the pre-configuration may be pre-negotiation between the upper-layer device and the terminal.
  • the upper-layer device can identify the playback start time and the playback end time from the specified fields. For example, if the specified field of the surround playback request carries two times, which are 00:19:35 and 00:19:37 respectively, the upper-layer device determines that the playback start time is 00:19:35 and the playback end time is 00:19:37.
  • the playback start time is included in the surround playback request.
  • the implementation process of step 508 includes: the upper-layer device determines the playback end time according to the playback start time and the preset surround playback duration. For example, if the playback start time carried in the surround playback request is 00:19:35, and the preset surround playback duration is 2 seconds, the upper-layer device determines that the playback end time is 00:19:37.
  • the surround playback request includes the surround playback duration.
  • the implementation process of step 508 includes: the upper-layer device determines the playback start time and the playback end time according to the time of receiving the surround playback request and the surround playback duration.
  • the surround playback request includes the playback start time and the surround playback duration.
  • the implementation process of step 508 includes: the upper-layer device determines the playback end time according to the playback start time and the surround playback duration. For example, if the playback start time carried in the surround playback request is 00:19:35, and the surround playback duration is 2 seconds, the upper-layer device determines that the playback end time is 00:19:37.
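  • The five manners above reduce to filling in the missing fields of a (start, end) window; a minimal sketch follows (plain-dict request with hypothetical field names, times in seconds; not the embodiment's actual message format).

```python
# Sketch only: derive the playback window from a surround playback request.
PRESET_SURROUND_DURATION = 2.0  # seconds (the preset surround playback duration)

def playback_window(request: dict, receive_time: float):
    start = request.get("start_time")
    end = request.get("end_time")
    duration = request.get("surround_duration", PRESET_SURROUND_DURATION)
    if start is None:        # manners 1 and 4: anchor on the reception time
        start = receive_time
    if end is None:          # manners 1, 3, 4 and 5: derive the end time
        end = start + duration
    return start, end        # manner 2: both carried in the request
```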
  • the surround playback request is used to request static surround playback of video content
  • the playback time information includes the target playback time.
  • the surround playback request includes the target playback time.
  • Optionally, the upper-layer device determines the target playback time according to the moment when the surround playback request is received; the manner in which the upper-layer device determines the target playback time may refer to the manner in which the upper-layer device determines the playback start time in the above-mentioned first implementation, and is not repeated in this embodiment of the present application.
  • Step 509 The upper-layer device generates a rotating slice according to the rotating camera position information and the playing time information.
  • the rotation slice includes image frame groups corresponding to multiple camera positions within the rotation range.
  • the rotation slice sequentially includes image frame groups corresponding to multiple camera positions along the rotation direction from the starting camera position to the ending camera position.
  • the upper-layer device first determines the starting camera position, the ending camera position, and the rotation direction according to the rotation camera position information, and then determines a plurality of camera positions from the starting camera position to the ending camera position along the rotation direction.
  • When the rotation camera position information includes the identifier of the starting camera position, the identifier of the ending camera position, and the rotation direction, the upper-layer device can, after receiving the surround playback request, determine the starting camera position, the ending camera position, and the rotation direction according to the content of the rotation camera position information.
  • the rotation camera position information includes the rotation angle
  • the upper-layer device determines the end camera position and the rotation direction according to the start camera position and the rotation angle. For example, referring to Fig. 4 , assuming that the starting camera position determined by the upper-layer device is camera 9, and the rotation angle carried in the surround playback request is -90°, the upper-layer device determines that the rotation direction is counterclockwise, and the end camera position is camera 4.
  • the plurality of camera positions determined by the upper-layer device may include all or some of the camera positions from the starting camera position to the ending camera position along the rotation direction.
  • For example, referring to FIG. 4, assuming the starting camera position is camera 9, the ending camera position is camera 14, and the rotation direction is the clockwise direction, the multiple camera positions determined by the upper-layer device include, in turn, camera 9, camera 10, camera 11, camera 12, camera 13 and camera 14.
  • Optionally, the multiple camera positions determined by the upper-layer device may include only some of the camera positions from the starting camera position to the ending camera position along the rotation direction.
  • For example, if the union of the shooting area of camera 11 and the shooting area of camera 13 in FIG. 4 covers the shooting area of camera 12, the multiple camera positions determined by the upper-layer device may not include camera 12.
  • In this way, during static surround playback of the video images captured by cameras 9 to 14, since the video images captured by camera 11 and the video images captured by camera 13 include the content captured by camera 12, there is no sudden change in the video picture during the surround playback process, and the smoothness of the surround playback picture can be ensured.
  • the surround playback request is used to request dynamic surround playback of video content.
  • In a first implementation, the image frame groups in the rotation slice are GOPs. Then the implementation process of step 509 includes:
  • step 5091A1 the upper-layer device acquires m video segments corresponding to each of the multiple cameras from the playback start time to the playback end time, where m is a positive integer.
  • For example, assume the playback start time is t1 and the playback end time is t2, where t2>t1
  • the number of the multiple camera positions is q, where q is an integer greater than 0
  • the video stream corresponding to each camera position includes m video slices in the time period (t1, t2).
  • the upper-layer device obtains the m video slices corresponding to the q camera positions in the time period (t1, t2).
  • step 5092A1 the upper-layer device extracts one or more GOPs from the m video segments corresponding to each camera position according to the playback time information.
  • the upper-layer device determines the GOP extraction time and the number of GOP extractions corresponding to each camera position according to the surround playback duration and the number of multiple cameras, where the surround playback duration is equal to the difference between the playback end time and the playback start time.
  • the upper-layer device extracts GOPs from m video slices corresponding to each camera position according to the GOP extraction time corresponding to each camera position and the GOP extraction quantity.
  • the GOP extraction time corresponding to the previous position is located before the GOP extraction time corresponding to the latter position in time series.
  • the number of GOP extractions corresponding to each camera position is equal to the ratio of the surround playback duration to the product of the GOP duration and the number of multiple camera positions (this ratio can be rounded up or down).
  • For example, continuing the example in step 5091A1, assuming that the time length of each GOP is t, the number of GOP extractions corresponding to each camera position is equal to (t2-t1)/(q*t).
  • step 5093A1 the upper-layer device assembles the extracted GOPs to obtain rotated slices.
  • the upper-layer device sequentially assembles the extracted GOPs according to the rotation direction to obtain a rotating slice, and the rotating slice is a dynamic rotating slice.
  • the playback start time is the start time of the time period T2
  • the playback end time is the end time of the time period T2
  • the start camera position is the camera N
  • the end camera position is the camera R
  • the rotation direction is the counterclockwise direction
  • the video slice b corresponding to each camera position includes 5 GOPs
  • the number of GOP extractions corresponding to each camera position is 1.
  • FIG. 13 is a schematic diagram of the generation process of a dynamic rotation slice provided by an embodiment of the present application. As shown in FIG. 13,
  • the GOP extracted by the upper-layer device from the video slice b corresponding to camera N is N-6
  • the GOP extracted from the video slice b corresponding to camera O is O-7
  • the GOP extracted from the video slice b corresponding to camera P is P-8
  • the GOP extracted from video slice b corresponding to camera Q is Q-9
  • the GOP extracted from video slice b corresponding to camera R is R-10.
  • the upper-layer device sequentially assembles the GOPs extracted from the video slices corresponding to the five cameras according to the rotation direction to obtain dynamic rotation slices.
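  • Steps 5091A1 to 5093A1 can be condensed into the following sketch, which reproduces the FIG. 13 slice; the "camera-index" labels stand in for the extracted GOPs, and the function is an illustration rather than the embodiment's implementation.

```python
# Sketch only: one chronologically advancing GOP per camera position,
# assembled in rotation order (steps 5091A1-5093A1, FIG. 13 example).
def make_dynamic_rotation_slice(cameras_in_rotation_order, first_gop_index):
    return [f"{cam}-{first_gop_index + i}"
            for i, cam in enumerate(cameras_in_rotation_order)]

print(make_dynamic_rotation_slice(["N", "O", "P", "Q", "R"], 6))
# ['N-6', 'O-7', 'P-8', 'Q-9', 'R-10']
```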
  • In a second implementation, the image frame groups in the rotation slice are generated based on inserted frames. Then the implementation process of step 509 includes:
  • step 5091A2 the upper-layer device acquires m video segments corresponding to each of the multiple cameras from the playback start time to the playback end time, where m is a positive integer.
  • For the explanation of this step, reference may be made to the above-mentioned step 5091A1, which is not repeated in this embodiment of the present application.
  • step 5092A2 the upper-layer device extracts one or more frames of video images from the m video segments corresponding to each camera position according to the playback time information.
  • the upper-layer device determines the video image extraction time and the number of video image extractions corresponding to each camera position according to the surround playback duration and the number of the multiple camera positions, where the surround playback duration is equal to the difference between the playback end time and the playback start time.
  • the upper-layer device extracts video images from the m video slices corresponding to each camera position according to the video image extraction time corresponding to each camera position and the number of video image extractions.
  • the video image extraction time corresponding to the previous camera is located before the video image extraction time corresponding to the latter camera in time series.
  • the number of video image extractions corresponding to each camera position is equal to the ratio of the surround playback duration to the product of the time length of one frame of video image and the number of the multiple camera positions (this ratio can be rounded up or down).
  • step 5093A2 for each camera position in the multiple camera positions, the upper-layer device generates an image frame group according to the inserted frame stream corresponding to the camera position and the extracted video image, and generates image frames corresponding to the multiple camera positions. Groups are assembled to obtain rotating slices.
  • the upper-layer device sequentially assembles the generated image frame groups according to the rotation direction to obtain a rotating slice, and the rotating slice is a dynamic rotating slice.
  • FIG. 14 is a schematic diagram of a generation process of another dynamic rotation slice provided by an embodiment of the present application.
  • the upper-layer device extracts video images NI and NP-0 from the GOP corresponding to camera N, extracts video image OP'-1 from the inserted frame stream corresponding to camera O and video image OP-2 from the GOP corresponding to camera O, extracts video image PP'-3 from the inserted frame stream corresponding to camera P and video image PP-4 from the GOP corresponding to camera P, extracts video image QP'-5 from the inserted frame stream corresponding to camera Q and video image QP-6 from the GOP corresponding to camera Q, and extracts video image RP'-7 from the inserted frame stream corresponding to camera R and video image RP-8 from the GOP corresponding to camera R.
  • NI and NP-0 are image frame groups corresponding to camera N
  • OP'-1 and OP-2 are image frame groups corresponding to camera O
  • PP'-3 and PP-4 are image frame groups corresponding to camera P
  • QP'-5 and QP-6 are image frame groups corresponding to camera Q
  • RP'-7 and RP-8 are image frame groups corresponding to camera R.
  • the upper-layer device sequentially assembles the image frame groups corresponding to the five camera positions according to the rotation direction to obtain dynamic rotation slices.
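  • The inserted-frame variant of the assembly can be sketched as follows (labels only; the actual bitstream splicing is not shown): the first camera position contributes its I frame plus P-0, and each subsequent camera position contributes a P' frame followed by the next P frame, so every image frame group remains independently decodable.

```python
# Sketch only: assemble the FIG. 14 dynamic rotation slice from GOPs and
# inserted frame streams (step 5093A2).
def make_inserted_frame_rotation_slice(cameras):
    groups = []
    for i, cam in enumerate(cameras):
        if i == 0:
            groups.append([f"{cam}I", f"{cam}P-0"])                     # I frame + first P frame
        else:
            groups.append([f"{cam}P'-{2 * i - 1}", f"{cam}P-{2 * i}"])  # P' frame + next P frame
    return groups

print(make_inserted_frame_rotation_slice(["N", "O", "P", "Q", "R"]))
# [['NI', 'NP-0'], ["OP'-1", 'OP-2'], ["PP'-3", 'PP-4'], ["QP'-5", 'QP-6'], ["RP'-7", 'RP-8']]
```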
  • the surround playback request is used to request static surround playback of video content.
  • the image frame groups in the rotated slice are GOPs, and each GOP includes one frame of video image.
  • the implementation process of step 509 includes:
  • step 5091B1 the upper-layer device obtains the target video fragment corresponding to each camera position in the multiple camera positions, and the time period corresponding to the target video fragment includes the target playback moment.
  • the time period corresponding to the target video segment includes the target playback time, which means that the target playback time is located between the start time and the end time of the target video segment.
  • step 5092B1 the upper-layer device extracts a GOP corresponding to the target playback moment from the target video segment corresponding to each camera position.
  • a GOP corresponding to the target playback time means that the acquisition time of the video image in the GOP is the target playback time.
  • step 5093B1 the upper-layer device assembles the extracted GOPs to obtain rotated slices.
  • the upper-layer device sequentially assembles the extracted GOPs according to the rotation direction to obtain a rotating slice, and the rotating slice is a static rotating slice.
  • FIG. 15 is a schematic diagram of a generation process of a static rotation slice provided by an embodiment of the present application.
  • the GOP extracted by the upper-layer device from the video slice b corresponding to camera N is N-7
  • the GOP extracted from the video slice b corresponding to camera O is O-7
  • the GOP extracted from the video slice b corresponding to camera P is P-7
  • the GOP extracted from video slice b corresponding to camera Q is Q-7
  • the GOP extracted from video slice b corresponding to camera R is R-7.
  • the upper-layer device sequentially assembles the GOPs extracted from the video slices corresponding to the 5 camera positions according to the rotation direction to obtain static rotation slices.
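  • For static surround playback the extraction is simpler, as the following sketch shows: the GOP whose capture time matches the target playback moment is taken from every camera position and assembled in rotation order (steps 5091B1-5093B1, FIG. 15 example; the labels are illustrative).

```python
# Sketch only: same-moment GOP from each camera position, in rotation order.
def make_static_rotation_slice(cameras_in_rotation_order, target_gop_index):
    return [f"{cam}-{target_gop_index}" for cam in cameras_in_rotation_order]

print(make_static_rotation_slice(["N", "O", "P", "Q", "R"], 7))
# ['N-7', 'O-7', 'P-7', 'Q-7', 'R-7']
```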
  • step 509 the image frame groups in the rotated slice are generated based on interpolated frames. Then the implementation process of step 509 includes:
  • step 5091B2 the upper-layer device acquires the target video segment corresponding to each camera position in the multiple camera positions, and the time period corresponding to the target video fragment includes the target playback time.
  • the time period corresponding to the target video segment includes the target playback time, which means that the target playback time is located between the start time and the end time of the target video segment.
  • step 5092B2 the upper-layer device extracts a frame of video image corresponding to the target playback moment from the target video segment corresponding to each camera position.
  • a frame of video image corresponding to the target playback time means that the acquisition time of that video image is the target playback time.
  • step 5093B2 for each camera position in the multiple camera positions, the upper-layer device generates an image frame group according to the inserted frame stream corresponding to the camera position and the extracted video image, and the image frames corresponding to the multiple camera positions Groups are assembled to obtain rotating slices.
  • If the extracted video image is an I frame, the image frame group includes the I frame. If the extracted video image is not an I frame and the video image has a corresponding inserted frame, the image frame group includes the inserted frame corresponding to the video image. If the extracted video image is not an I frame and has no corresponding inserted frame, the image frame group includes the video image together with the I frame or inserted frame on which the decoding of the video image depends.
  • the upper-layer device sequentially assembles the generated image frame groups according to the rotation direction to obtain a rotation slice, and the rotation slice is a static rotation slice.
  • the starting camera is camera N
  • the ending camera is camera R
  • the target playback time is the time corresponding to MP-1 in the GOP corresponding to camera M
  • Figure 16 This is a schematic diagram of another generation process of static rotating slices provided by the embodiment of the present application.
  • the upper-layer device extracts the inserted frame NP'-1 from the inserted frame stream corresponding to camera N, extracts the inserted frame OP'-1 from the inserted frame stream corresponding to camera O, extracts the inserted frame PP'-1 from the inserted frame stream corresponding to camera P, extracts the inserted frame QP'-1 from the inserted frame stream corresponding to camera Q, and extracts the inserted frame RP'-1 from the inserted frame stream corresponding to camera R.
  • the upper-layer device sequentially assembles the video images corresponding to the five camera positions according to the rotation direction to obtain a static rotation slice.
  • the number of image frame groups included in the rotation slice may be the same as or different from the number of image frame groups included in other video slices.
  • for example, the number of image frame groups included in the rotation slice may be less than or greater than that included in other video slices.
  • the number of image frame groups included in a slice is not limited in this embodiment of the present application.
  • When the upper-layer device is a network device, after receiving the surround playback request, the upper-layer device first downloads the media content index from the video server, and obtains video stream information by parsing the media content index. The upper-layer device extracts the URL address of the video stream corresponding to each of the multiple camera positions from the media content index, and then obtains the corresponding video slices through the URL addresses of the video streams.
  • Step 510 The upper-layer device sends the rotated slice to the terminal.
  • When the surround playback request is used to request dynamic surround playback of video content, after sending the rotation slice to the terminal, the upper-layer device continues to send the video slice corresponding to the ending camera position to the terminal, so that the terminal can smoothly switch from the playback picture corresponding to the starting camera position to the playback picture corresponding to the ending camera position.
  • When the surround playback request is used to request static surround playback of video content, the upper-layer device stops sending video data to the terminal after sending the rotation slice to the terminal.
  • Step 511 The terminal decodes and plays the rotated slice.
  • The terminal decodes and plays the rotation slice, which realizes surround playback of the video pictures corresponding to the multiple camera positions in the rotation direction from the starting camera position to the ending camera position.
  • the resolution of the video picture played by the terminal may be the same as the resolution of the video image in the rotated slice.
  • It should be noted that step 506 and step 507 may be performed simultaneously with step 505; that is, after receiving the rotation instruction, the terminal may play video based on the pre-acquired rotation video data while generating a surround playback request and sending it to the upper-layer device.
  • The steps of the method can also be increased or decreased according to the situation. Any variation of the method that can easily be thought of by a person skilled in the art within the technical scope disclosed in the present application shall be covered by the protection scope of the present application, and is therefore not repeated here.
  • To sum up, in the video playback method provided by the embodiments of the present application, the upper-layer device sends to the terminal the video slice corresponding to the target camera position that the terminal requests to play and the rotation video data corresponding to the target camera position; after receiving the video slice corresponding to the target camera position, the terminal decodes the video slice to play the video images collected at the target camera position.
  • When the terminal receives a rotation instruction, it realizes surround playback of the video picture according to the pre-acquired rotation video data; the surround playback delay is low, and the resolution of the played video picture can be the same as the resolution of the video images in the video slice or in the rotation video data.
  • the video playback method provided by the embodiments of the present application is not limited by the number of cameras used for front-end shooting, and has a wide range of applications.
  • In addition, compared with the related art, the upper-layer device does not need to continuously send the video pictures captured by all cameras to the terminal, which reduces the amount of transmitted data and saves transmission resources.
  • In addition, rotation slices can be generated based on inserted frames. In that case, the video slices sent by the upper-layer device to the terminal do not need to consist entirely of I-frames or mini GOPs; normal GOPs can be used instead, which reduces the data amount of the video slices sent by the upper-layer device to the terminal. Moreover, the data amount of an inserted frame is usually smaller than that of an I-frame, which reduces the data amount of the rotation slices sent to the terminal. Therefore, generating rotation slices with the inserted-frame technique effectively reduces the consumption of network transmission resources.
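A sketch of the frame-selection logic this saving rests on is shown below: for a given display index, an inserted frame (P') is preferred as the decode anchor, and the I-frame chain is used only as a fallback. The list-of-strings representation of coded frames is an assumption for readability, not a real bitstream.

```python
def frame_group_for(frames: list, inserted: dict, idx: int) -> list:
    """Smallest independently decodable image frame group for the picture at
    display index idx within one GOP.

    frames: the GOP in display order, e.g. ["I", "P-0", "P-1", ..., "P-8"];
    inserted: display index -> inserted frame (P') carrying the same picture,
    e.g. {2: "P'-1", 4: "P'-3", 6: "P'-5", 8: "P'-7"}.
    """
    if idx == 0:
        return [frames[0]]            # the I-frame decodes on its own
    if idx in inserted:
        return [inserted[idx]]        # a P' frame decodes without the I-frame
    # A plain P frame: prepend the nearest anchor it can decode from -- an
    # earlier P' if one exists, otherwise the I-frame and the P frames between.
    anchors = [i for i in inserted if i < idx]
    start = max(anchors) if anchors else 0
    head = [inserted[start]] if anchors else [frames[0]]
    return head + frames[start + 1: idx + 1]
```

With the example inputs above, `frame_group_for(frames, inserted, 3)` returns `["P'-1", "P-2"]`: the P frame is paired with the inserted frame it decodes from, and the group is smaller than one anchored on a full I-frame.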
  • FIG. 17 is a schematic structural diagram of a video playback device provided by an embodiment of the present application.
  • The apparatus is applied to an upper-layer device; for example, the upper-layer device may be a video server or a network device in the video playback system shown in FIG. 1.
  • The apparatus 170 includes:
  • The receiving module 1701 is configured to receive a playback request sent by the terminal, where the playback request includes playback camera position information, and the playback camera position information is used to indicate the target camera position requested for playback.
  • The sending module 1702 is configured to send, to the terminal, the video slice corresponding to the target camera position and the rotation video data corresponding to the target camera position, where the rotation video data includes video data corresponding to a forward camera position and/or video data corresponding to a reverse camera position, the forward camera position includes one or more first camera positions located in the clockwise direction of the target camera position, and the reverse camera position includes one or more second camera positions located in the counterclockwise direction of the target camera position.
  • Optionally, the sending module 1702 is configured to: in response to the upper-layer device receiving the rotation preparation request sent by the terminal, send the rotation video data to the terminal, where the rotation preparation request is used to request the rotation video data corresponding to the target camera position; or, in response to the playback request, send the rotation video data to the terminal.
  • Optionally, the video data corresponding to the forward camera position includes the video slice corresponding to each first camera position. Alternatively, the forward camera position includes a plurality of first camera positions located in the clockwise direction of the target camera position, and the video data corresponding to the forward camera position is a forward rotation slice, where the forward rotation slice includes a forward dynamic rotation sub-slice and/or a forward static rotation sub-slice. The forward dynamic rotation sub-slice includes a plurality of image frame groups obtained based on the video images in the video slices corresponding to the plurality of first camera positions, and each image frame group in the forward dynamic rotation sub-slice is obtained based on the video images in the video slice corresponding to one first camera position; the image frame groups in the forward dynamic rotation sub-slice are arranged in chronological order and, by the distances from the first camera positions to the target camera position in the clockwise direction, from near to far. The forward static rotation sub-slice likewise includes a plurality of image frame groups obtained based on the video images in the video slices corresponding to the plurality of first camera positions, each obtained from the video slice corresponding to one first camera position; the image frame groups in the forward static rotation sub-slice correspond to the same playback period and are arranged by the distances from the first camera positions to the target camera position in the clockwise direction, from near to far.
  • Optionally, the video data corresponding to the reverse camera position includes the video slice corresponding to each second camera position. Alternatively, the reverse camera position includes a plurality of second camera positions located in the counterclockwise direction of the target camera position, and the video data corresponding to the reverse camera position is a reverse rotation slice, where the reverse rotation slice includes a reverse dynamic rotation sub-slice and/or a reverse static rotation sub-slice. The reverse dynamic rotation sub-slice includes a plurality of image frame groups obtained based on the video images in the video slices corresponding to the plurality of second camera positions, and each image frame group in the reverse dynamic rotation sub-slice is obtained based on the video images in the video slice corresponding to one second camera position; the image frame groups in the reverse dynamic rotation sub-slice are arranged in chronological order and, by the distances from the second camera positions to the target camera position in the counterclockwise direction, from near to far. The reverse static rotation sub-slice includes a plurality of image frame groups obtained based on the video images in the video slices corresponding to the first camera positions, each obtained from the video slice corresponding to one first camera position; the image frame groups in the reverse static rotation sub-slice correspond to the same playback period and are arranged by the distances from the first camera positions to the target camera position in the clockwise direction, from near to far.
  • The image frame group includes one or more frames of video images, and each image frame group can be decoded independently.
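To make the dynamic and static orderings described above concrete, here is a sketch of assembling one dynamic and one static rotation sub-slice from per-camera image frame groups; the dictionary-and-list data model and the identifier strings are assumptions for the example.

```python
def dynamic_rotation_sub_slice(frame_groups: dict, near_to_far: list) -> list:
    """One frame group per camera position, advancing through time while
    sweeping away from the target position: the k-th group is the k-th
    playback period taken from the k-th nearest camera."""
    return [frame_groups[cam][k] for k, cam in enumerate(near_to_far)]

def static_rotation_sub_slice(frame_groups: dict, near_to_far: list, t: int) -> list:
    """The frame group for the same playback period t from every camera,
    ordered near to far, so the picture rotates around a frozen moment."""
    return [frame_groups[cam][t] for cam in near_to_far]

# Illustrative per-camera data: frame groups indexed by playback period.
groups = {"M": ["M-1", "M-2", "M-3"],
          "L": ["L-1", "L-2", "L-3"],
          "K": ["K-1", "K-2", "K-3"]}
print(dynamic_rotation_sub_slice(groups, ["M", "L", "K"]))   # ['M-1', 'L-2', 'K-3']
print(static_rotation_sub_slice(groups, ["M", "L", "K"], 1))  # ['M-2', 'L-2', 'K-2']
```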
  • Optionally, the apparatus 170 further includes a processing module 1703.
  • The receiving module 1701 is further configured to receive a surround playback request sent by the terminal, where the surround playback request includes rotation camera position information, and the rotation camera position information is used to indicate a rotation range.
  • The processing module 1703 is configured to determine playback time information based on the surround playback request, and to generate a rotation slice according to the rotation camera position information and the playback time information, where the rotation slice includes image frame groups corresponding to multiple camera positions within the rotation range, an image frame group includes one or more frames of video images, and each image frame group can be decoded independently.
  • The sending module 1702 is further configured to send the rotation slice to the terminal.
  • Optionally, the image frame group is a GOP; or the image frame group includes an inserted frame; or the image frame group includes a combination of an inserted frame and P frames; or the image frame group includes a combination of an inserted frame, P frames, and B frames.
  • Optionally, the rotation preparation request includes one or more of a prepared rotation direction, a number of prepared rotation camera positions, identifiers of the prepared rotation camera positions, or a prepared rotation state, where the prepared rotation state includes a dynamic rotation state and/or a static rotation state, and the content in the rotation preparation request is pre-configured in the terminal.
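A hypothetical encoding of such a request is sketched below; the field names, defaults, and JSON wire format are assumptions, since the text does not fix a message format.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class RotationPreparationRequest:
    direction: str = "clockwise"      # prepared rotation direction
    camera_count: int = 3             # number of prepared rotation camera positions
    camera_ids: list = field(default_factory=lambda: ["M", "L", "K"])
    states: list = field(default_factory=lambda: ["dynamic", "static"])  # prepared rotation states

# What a terminal with this pre-configured content could send to the upper-layer device:
print(json.dumps(asdict(RotationPreparationRequest())))
```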
  • FIG. 18 is a schematic structural diagram of another video playback apparatus provided by an embodiment of the present application.
  • The apparatus is applied to a terminal; for example, the apparatus may be the terminal 103 in the video playback system shown in FIG. 1.
  • The apparatus 180 includes:
  • The sending module 1801 is configured to: when the terminal receives a playback instruction, send a playback request generated based on the playback instruction to the upper-layer device, where the playback request includes playback camera position information, and the playback camera position information is used to indicate the target camera position requested for playback.
  • The receiving module 1802 is configured to receive the video slice corresponding to the target camera position and the rotation video data corresponding to the target camera position that are sent by the upper-layer device, where the rotation video data includes video data corresponding to a forward camera position and/or video data corresponding to a reverse camera position, the forward camera position includes one or more first camera positions located in the clockwise direction of the target camera position, and the reverse camera position includes one or more second camera positions located in the counterclockwise direction of the target camera position.
  • The processing module 1803 is configured to: when the terminal receives a rotation instruction in the process of playing a video picture based on the video slice corresponding to the target camera position, determine a rotation direction according to the rotation instruction, where the rotation direction is the clockwise direction or the counterclockwise direction.
  • The playing module 1804 is configured to: in response to the rotation video data including target video data corresponding to a camera position located in the rotation direction of the target camera position, play a video picture based on the target video data.
  • Optionally, the processing module 1803 is further configured to generate a rotation preparation request, where the rotation preparation request is used to request the rotation video data corresponding to the target camera position.
  • The sending module 1801 is further configured to send the rotation preparation request to the upper-layer device, where the rotation video data corresponding to the target camera position is sent by the upper-layer device in response to the rotation preparation request.
  • Optionally, the rotation preparation request includes one or more of a prepared rotation direction, a number of prepared rotation camera positions, identifiers of the prepared rotation camera positions, or a prepared rotation state, where the prepared rotation state includes a dynamic rotation state and/or a static rotation state, and the content in the rotation preparation request is pre-configured in the terminal.
  • Optionally, the target video data is a target rotation slice. The target rotation slice includes a plurality of image frame groups obtained based on the video images in the video slices corresponding to multiple camera positions located in the rotation direction of the target camera position, and each image frame group in the target rotation slice is obtained based on the video images in the video slice corresponding to one camera position located in that rotation direction. The image frame group includes one or more frames of video images, and each image frame group can be decoded independently.
  • Specifically, in response to the terminal receiving the rotation instruction in the video playing state, the target rotation slice includes a dynamic rotation sub-slice, where the multiple image frame groups in the dynamic rotation sub-slice are arranged in chronological order and, by the distances from the multiple camera positions to the target camera position in the rotation direction, from near to far. Alternatively, in response to the terminal receiving the rotation instruction in the video paused state, the target rotation slice includes a static rotation sub-slice, where the multiple image frame groups in the static rotation sub-slice correspond to the same playback period and are arranged by the distances from the multiple camera positions to the target camera position in the rotation direction, from near to far.
  • The playing module 1804 is configured to decode and play the target rotation slice.
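The selection implied by the two branches above can be sketched as follows, with the pre-fetched sub-slices held in an assumed in-memory cache keyed by kind and playback period, and a print standing in for the real decoder.

```python
def decode_and_render(frame_group):
    print("decoding", frame_group)  # stand-in for handing a group to the decoder

def play_target_rotation_slice(cache: dict, playing: bool, period: int):
    """On a rotation instruction, decode the dynamic sub-slice while the video
    is playing, or the static sub-slice for the paused moment otherwise."""
    kind = "dynamic" if playing else "static"
    for frame_group in cache[kind][period]:
        decode_and_render(frame_group)
```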
  • Optionally, the target video data includes video slices respectively corresponding to a plurality of camera positions located in the rotation direction of the target camera position.
  • The playing module 1804 is configured to: generate image frame groups based on the video images in the video slices corresponding to the multiple camera positions, where an image frame group includes one or more frames of video images and each image frame group can be decoded independently; and play the video images in the generated image frame groups in sequence, in order of the distances from the multiple camera positions to the target camera position in the rotation direction, from near to far, as sketched below.
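When the pre-fetched data is raw per-camera slices rather than a prebuilt rotation slice, the terminal itself derives and orders the frame groups. A sketch follows, with the playback-period arithmetic inferred from the examples in the description (while playing, each successive camera supplies the next period; while paused, every camera supplies the pause period); the names are illustrative.

```python
def decode_and_render(frame_group):
    print("decoding", frame_group)  # stand-in for the real decoder

def play_from_per_camera_slices(groups: dict, near_to_far: list,
                                period: int, playing: bool):
    """Play one frame group per camera position, near to far along the
    rotation direction, stepping time forward only if the video was playing."""
    for k, cam in enumerate(near_to_far):
        t = period + k + 1 if playing else period
        decode_and_render(groups[cam][t])
```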
  • Optionally, the sending module 1801 is further configured to: when the terminal receives the rotation instruction, send a surround playback request generated based on the rotation instruction to the upper-layer device, where the surround playback request includes rotation camera position information, and the rotation camera position information is used to indicate a rotation range.
  • The receiving module 1802 is further configured to receive the rotation slice sent by the upper-layer device.
  • The rotation slice includes image frame groups corresponding to multiple camera positions within the rotation range, where an image frame group includes one or more frames of video images and each image frame group can be decoded independently.
  • The playing module 1804 is further configured to decode and play the rotation slice.
  • Optionally, the image frame group is a GOP; or the image frame group includes an inserted frame; or the image frame group includes a combination of an inserted frame and P frames; or the image frame group includes a combination of an inserted frame, P frames, and B frames.
  • Embodiments of the present application also provide a video playback system, where the system includes: an upper-layer device and a terminal.
  • The upper-layer device includes the video playback apparatus shown in FIG. 17, and the terminal includes the video playback apparatus shown in FIG. 18.
  • FIG. 19 is a block diagram of a video playback apparatus provided by an embodiment of the present application.
  • The video playback device may be an upper-layer device or a terminal; the upper-layer device may be a video server or a network device, and the terminal may be a mobile phone, a tablet computer, a smart wearable device, or a set-top box.
  • The video playback device 190 includes a processor 1901 and a memory 1902.
  • The memory 1902 is configured to store a computer program, where the computer program includes program instructions.
  • The processor 1901 is configured to invoke the computer program to implement the actions performed by the upper-layer device or the actions performed by the terminal in the video playback method shown in FIG. 5.
  • Optionally, the video playback device 190 further includes a communication bus 1903 and a communication interface 1904.
  • The processor 1901 includes one or more processing cores, and performs various functional applications and data processing by running the computer program.
  • The memory 1902 may be configured to store the computer program.
  • Optionally, the memory may store the operating system and the application program units required for at least one function.
  • The operating system may be an operating system such as a real-time operating system (Real Time eXecutive, RTX), LINUX, UNIX, WINDOWS, or OS X.
  • There may be a plurality of communication interfaces 1904, and the communication interfaces 1904 are configured to communicate with other storage devices or network devices.
  • For example, in this embodiment of the present application, the communication interface of the upper-layer device may be used to send the rotation slice to the terminal, and the communication interface of the terminal may be used to send the surround playback request to the upper-layer device.
  • A network device may be a switch, a router, or the like.
  • The memory 1902 and the communication interface 1904 are respectively connected to the processor 1901 through the communication bus 1903.
  • Embodiments of the present application further provide a computer-readable storage medium, where instructions are stored on the computer-readable storage medium. When the instructions are executed by a processor of a computer device, the actions performed by the upper-layer device or the actions performed by the terminal in the video playback method according to the foregoing method embodiments are implemented.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Computer Hardware Design (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • Geometry (AREA)
  • Computing Systems (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Disclosed are a video playback method, apparatus, and system, and a computer-readable storage medium, which belong to the field of video processing technologies. After receiving a playback request sent by a terminal, an upper-layer device sends to the terminal the video slice corresponding to the target camera position that the terminal requests to play, together with the rotation video data corresponding to that target camera position. When the terminal receives a rotation instruction, it can achieve surround playback of the video picture based on the pre-acquired rotation video data, so the surround playback delay is low. Moreover, the resolution of the video pictures played by the terminal can be the same as the resolution of the video images in the video slice or in the rotation video data, so the method is not limited by the number of cameras used for front-end shooting and has a wide range of applications.

Description

视频播放方法、装置及系统、计算机可读存储介质
本申请要求于2021年04月22日提交的申请号为202110435658.5、发明名称为“视频播放方法、装置及系统、计算机可读存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及视频处理技术领域,特别涉及一种视频播放方法、装置及系统、计算机可读存储介质。
背景技术
随着互联网技术的快速发展,用户开始追求更好的视频观看体验,从而衍生出围绕目标对象环绕观看的需求。特别是在体育比赛、演唱会或其它具有特定焦点的场景下,环绕观看需求更甚。为了满足用户的环绕观看需求,需要在终端上实现环绕播放。
环绕播放要求前端拍摄采用分布在特定位置的多相机采集同一焦点区域内不同角度的视频画面,同时基于相机同步技术,保证多相机采集图像的时刻和频率相同。然后多相机分别将采集的视频流发送到视频处理平台,由视频处理平台对多路视频流进行处理,进一步在终端上实现对焦点区域的环绕播放。
相关技术中,通常由服务端将多路视频流中采集时刻相同的视频帧拼接成一个视频帧。例如,前端拍摄采用16个相机采集同一焦点区域内不同角度的视频画面。服务端将接收到的16路视频流中每路视频流中的视频帧的分辨率均调整为960×540,然后将16路视频流中采集时刻相同的16个视频帧按照4×4等比例组合成分辨率为3840×2160(即4K)的一个视频帧,得到一路视频流。服务端向终端发送该路视频流。终端对该路视频流进行解码后,根据设置的观看机位,选择其中的1/16的视频画面(一个相机采集的视频画面)进行播放。
但是,采用相关技术中的视频播放方法,由于终端播放画面的分辨率与前端拍摄采用的相机数量成反比,导致前端拍摄采用的相机数量受限,因此应用局限性较高。
发明内容
本申请提供了一种视频播放方法、装置及系统、计算机可读存储介质,可以解决相关技术中视频播放的应用局限性较高的问题。
第一方面,提供了一种视频播放方法。该方法包括:上层设备接收终端发送的播放请求,该播放请求中包括播放机位信息,该播放机位信息用于指示所请求播放的目标机位。上层设备向终端发送目标机位对应的视频分片以及目标机位对应的旋转视频数据。该旋转视频数据包括正向机位对应的视频数据和/或逆向机位对应的视频数据。正向机位包括位于目标机位的顺时针方向的一个或多个第一机位,逆向机位包括位于目标机位的逆时针方向的一个或多个第二机位。
本申请中,上层设备向终端发送该终端所请求播放的目标机位对应的视频分片以及该目 标机位对应的旋转视频数据,终端在接收到目标机位对应的视频分片后,对该视频分片进行解码即可实现对该目标机位所采集的视频画面的播放;当终端接收到旋转指令时,可以根据预先获取的旋转视频数据实现对视频画面的环绕播放,环绕播放时延较低,且播放的视频画面的分辨率可以与视频分片中的视频图像或旋转视频数据中的视频图像的分辨率相同。因此本申请提供的视频播放方法不受限于前端拍摄所采用的相机数量,应用范围广。另外,与相关技术相比,上层设备无需始终向终端发送所有相机所采集的视频画面,可以减少数据传输量,节约传输资源。
可选地,上层设备向终端发送目标机位对应的旋转视频数据的实现过程,包括:响应于上层设备接收到终端发送的旋转预备请求,上层设备向终端发送旋转视频数据,旋转预备请求用于请求获取目标机位对应的旋转视频数据。或者,响应于播放请求,上层设备向终端发送旋转视频数据。
本申请中,上层设备可以在接收到终端发送的播放请求后,主动向终端发送所请求播放的机位对应的旋转视频数据,或者,也可以在接收到终端发送的旋转预备请求后,被动响应向终端发送所请求播放的机位对应的旋转视频数据。
可选地,正向机位对应的视频数据包括每个第一机位对应的视频分片。或者,正向机位包括位于目标机位的顺时针方向的多个第一机位,正向机位对应的视频数据为正向旋转分片,正向旋转分片包括正向动态旋转子分片和/或正向静态旋转子分片。其中,正向动态旋转子分片包括基于多个第一机位对应的视频分片中的视频图像得到的多个图像帧组,正向动态旋转子分片中的每个图像帧组基于一个第一机位对应的视频分片中的视频图像得到,正向动态旋转子分片中的多个图像帧组按照时间先后顺序排列,且按照多个第一机位在顺时针方向上到目标机位的距离由近至远依次排列。正向静态旋转子分片包括基于多个第一机位对应的视频分片中的视频图像得到的多个图像帧组,正向静态旋转子分片中的每个图像帧组基于一个第一机位对应的视频分片中的视频图像得到,正向静态旋转子分片中的多个图像帧组对应的播放时段相同,且按照多个第一机位在顺时针方向上到目标机位的距离由近至远依次排列。
可选地,逆向机位对应的视频数据包括每个第二机位对应的视频分片。或者,逆向机位包括位于目标机位的逆时针方向的多个第二机位,逆向机位对应的视频数据为逆向旋转分片,逆向旋转分片包括逆向动态旋转子分片和/或逆向静态旋转子分片。其中,逆向动态旋转子分片包括基于多个第二机位对应的视频分片中的视频图像得到的多个图像帧组,逆向动态旋转子分片中的每个图像帧组基于一个第二机位对应的视频分片中的视频图像得到,逆向动态旋转子分片中的多个图像帧组按照时间先后顺序排列,且按照多个第二机位在逆时针方向上到目标机位的距离由近至远依次排列。逆向静态旋转子分片包括基于多个第一机位对应的视频分片中的视频图像得到的多个图像帧组,逆向静态旋转子分片中的每个图像帧组基于一个第一机位对应的视频分片中的视频图像得到,逆向静态旋转子分片中的多个图像帧组对应的播放时段相同,且按照多个第一机位在顺时针方向上到目标机位的距离由近至远依次排列。
本申请中涉及的图像帧组包括一帧或多帧视频图像,每个图像帧组可被独立解码。
可选地,上层设备还可以接收终端发送的环绕播放请求,该环绕播放请求中包括旋转机位信息,旋转机位信息用于指示旋转范围。上层设备基于环绕播放请求确定播放时间信息。上层设备根据旋转机位信息和播放时间信息生成旋转分片,该旋转分片中包括旋转范围内的多个机位对应的图像帧组,图像帧组包括一帧或多帧视频图像,每个图像帧组可被独立解码。 上层设备向终端发送旋转分片。
本申请中,上层设备根据终端发送的环绕播放请求确定播放时间信息,然后根据播放时间信息以及环绕播放请求中的旋转机位信息生成旋转分片。由于旋转分片中包含旋转机位信息所指示的旋转范围内的多个机位对应的图像帧组,终端在接收到旋转分片后,对旋转分片进行解码即可实现对视频画面的环绕播放,且播放的视频画面的分辨率可以与旋转分片中的视频图像的分辨率相同。因此本申请提供的视频播放方法不受限于前端拍摄采用的摄像机数量,应用范围广。
可选地,图像帧组为GOP。或者,图像帧组包括插入帧。或者,图像帧组包括插入帧和P帧的组合。或者,图像帧组包括插入帧、P帧和B帧的组合。
本申请中,图像帧组包括插入帧,即旋转分片可以基于插入帧生成,此时上层设备向终端发送的视频分片中无需使用全I帧或mini GOP,而可以使用正常GOP,能够降低上层设备向终端发送的视频分片的数据量;并且,插入帧的数据量通常小于I帧的数据量,能够降低上层设备向终端发送的旋转分片的数据量,因此利用插入帧技术生成旋转分片,可以有效减少网络传输资源的消耗。
可选地,旋转预备请求包括预备旋转方向、预备旋转机位的数量、预备旋转机位的标识或预备旋转状态中的一个或多个,预备旋转状态包括动态旋转状态和/或静态旋转状态,旋转预备请求中的内容是终端中预先配置的。
第二方面,提供了一种视频播放方法。该方法包括:当终端接收到播放指令时,终端向上层设备发送基于播放指令生成的播放请求,播放请求中包括播放机位信息,播放机位信息用于指示所请求播放的目标机位。终端接收上层设备发送的目标机位对应的视频分片以及目标机位对应的旋转视频数据,旋转视频数据包括正向机位对应的视频数据和/或逆向机位对应的视频数据,正向机位包括位于目标机位的顺时针方向的一个或多个第一机位,逆向机位包括位于目标机位的逆时针方向的一个或多个第二机位。当终端在基于目标机位对应的视频分片播放视频画面的过程中,接收到旋转指令时,终端根据旋转指令确定旋转方向,旋转方向为顺时针方向或逆时针方向。响应于旋转视频数据包括位于目标机位的旋转方向上的机位对应的目标视频数据,终端基于目标视频数据播放视频画面。
本申请中,上层设备向终端发送该终端所请求播放的目标机位对应的视频分片以及该目标机位对应的旋转视频数据,终端在接收到目标机位对应的视频分片后,对该视频分片进行解码即可实现对该目标机位所采集的视频画面的播放;当终端接收到旋转指令时,可以根据旋转视频数据实现对视频画面的环绕播放,环绕播放时延较低,且播放的视频画面的分辨率可以与视频分片中的视频图像或旋转视频数据中的视频图像的分辨率相同。因此本申请提供的视频播放方法不受限于前端拍摄所采用的相机数量,应用范围广。另外,与相关技术相比,上层设备无需始终向终端发送所有相机所采集的视频画面,可以减少数据传输量,节约传输资源。
可选地,在终端接收到旋转指令之前,终端还可以生成旋转预备请求,该旋转预备请求用于请求获取目标机位对应的旋转视频数据。终端向上层设备发送旋转预备请求,目标机位对应的旋转视频数据是上层设备响应于旋转预备请求发送的。
可选地,旋转预备请求包括预备旋转方向、预备旋转机位的数量、预备旋转机位的标识 或预备旋转状态中的一个或多个,预备旋转状态包括动态旋转状态和/或静态旋转状态,旋转预备请求中的内容是终端中预先配置的。
在一种实现方式中,目标视频数据为目标旋转分片。目标旋转分片包括基于位于目标机位的旋转方向的多个机位对应的视频分片中的视频图像得到的多个图像帧组,目标旋转分片中的每个图像帧组基于位于目标机位的旋转方向的一个机位对应的视频分片中的视频图像得到,图像帧组包括一帧或多帧视频图像,每个图像帧组可被独立解码。
其中,响应于终端在视频播放状态下接收到旋转指令,目标旋转分片包括动态旋转子分片,该动态旋转子分片中的多个图像帧组按照时间先后顺序排列,且按照多个机位在旋转方向上到目标机位的距离由近至远依次排列。或者,响应于终端在视频暂停播放状态下接收到旋转指令,目标旋转分片包括静态旋转子分片,所述静态旋转子分片中的多个图像帧组对应的播放时段相同,且按照多个机位在旋转方向上到目标机位的距离由近至远依次排列。
相应地,终端基于目标视频数据播放视频画面的实现过程,包括:终端对目标旋转分片进行解码播放。
在该实现方式中,当终端接收到旋转指令时,通过对预先获取的旋转分片进行解码即可实现对视频画面的环绕播放,环绕播放时延较低。
在另一种实现方式中,目标视频数据包括位于目标机位的旋转方向的多个机位分别对应的视频分片。终端基于目标视频数据播放视频画面的实现过程,包括:终端基于多个机位对应的视频分片中的视频图像分别生成图像帧组,图像帧组包括一帧或多帧视频图像,每个图像帧组可被独立解码。终端按照多个机位在旋转方向上到目标机位的距离由近至远的顺序,依次播放生成的图像帧组中的视频图像。
在该实现方式中,当终端接收到旋转指令时,终端可以解码播放预先获取的旋转方向上的机位对应的视频分片中的视频图像实现对视频画面的环绕播放,环绕播放时延较低。
可选地,当终端接收到旋转指令时,终端向上层设备发送基于旋转指令生成的环绕播放请求,环绕播放请求中包括旋转机位信息,旋转机位信息用于指示旋转范围。终端接收上层设备发送的旋转分片,旋转分片中包括旋转范围内的多个机位对应的图像帧组,图像帧组包括一帧或多帧视频图像,每个图像帧组可被独立解码。终端对旋转分片进行解码播放。
可选地,图像帧组为GOP;或者,图像帧组包括插入帧;或者,图像帧组包括插入帧和P帧的组合;或者,图像帧组包括插入帧、P帧和B帧的组合。
第三方面,提供了一种视频播放装置。该视频播放装置为上层设备。所述装置包括多个功能模块,所述多个功能模块相互作用,实现上述第一方面及其各实施方式中的方法。所述多个功能模块可以基于软件、硬件或软件和硬件的结合实现,且所述多个功能模块可以基于具体实现进行任意组合或分割。
第四方面,提供了一种视频播放装置。该视频播放装置为终端。所述装置包括多个功能模块,所述多个功能模块相互作用,实现上述第二方面及其各实施方式中的方法。所述多个功能模块可以基于软件、硬件或软件和硬件的结合实现,且所述多个功能模块可以基于具体实现进行任意组合或分割。
第五方面,提供了一种视频播放系统,所述系统包括:上层设备和终端,所述上层设备包括如第三方面所述的视频播放装置,所述终端包括如第四方面所述的视频播放装置。
第六方面,提供了一种视频播放装置,包括:处理器和存储器;
所述存储器,用于存储计算机程序,所述计算机程序包括程序指令;
所述处理器,用于调用所述计算机程序,实现如第一方面任一所述的视频播放方法;或者,实现如第二方面任一所述的视频播放方法。
第七方面,提供了一种计算机存储介质,所述计算机存储介质上存储有指令,当所述指令被计算机设备的处理器执行时,实现如第一方面或第二方面任一所述的视频播放方法。
第八方面,提供了一种芯片,芯片包括可编程逻辑电路和/或程序指令,当芯片运行时,实现上述第一方面及其各实施方式中的方法或实现上述第二方面及其各实施方式中的方法。
第九方面,提供了一种计算机程序产品,包括计算机程序,所述计算机程序被处理器执行时,实现如第一方面或第二方面任一所述的视频播放方法。
本申请提供的技术方案带来的有益效果至少包括:
上层设备向终端发送该终端所请求播放的目标机位对应的视频分片以及该目标机位对应的旋转视频数据,终端在接收到目标机位对应的视频分片后,对该视频分片进行解码即可实现对该目标机位所采集的视频画面的播放;当终端接收到旋转指令时,可以根据预取的旋转视频数据实现对视频画面的环绕播放,环绕播放时延较低,且播放的视频画面的分辨率可以与视频分片中的视频图像或旋转视频数据中的视频图像的分辨率相同。因此本申请实施例提供的视频播放方法不受限于前端拍摄所采用的相机数量,应用范围广。另外,与相关技术相比,上层设备无需始终向终端发送所有相机所采集的视频画面,可以减少数据传输量,节约传输资源。另外,旋转分片可以基于插入帧生成,此时上层设备向终端发送的视频分片中无需使用全I帧或mini GOP,而可以使用正常GOP,能够降低上层设备向终端发送的视频分片的数据量;并且,插入帧的数据量通常小于I帧的数据量,能够降低上层设备向终端发送的旋转分片的数据量,因此利用插入帧技术生成旋转分片,可以有效减少网络传输资源的消耗。
附图说明
图1是本申请实施例提供的一种视频播放系统的结构示意图;
图2是本申请实施例提供的一种视频分片的结构示意图;
图3是本申请实施例提供的一种编码得到的GOP与插入帧流的对比结构示意图;
图4是本申请实施例提供的一种媒体源侧的摄像机分布场景示意图;
图5是本申请实施例提供的一种视频播放方法的流程示意图;
图6是本申请实施例提供的另一种媒体源侧的摄像机分布场景示意图;
图7是本申请实施例提供的多个机位分别对应的视频分片的结构示意图;
图8是本申请实施例提供的一种上层设备向终端发送的媒体内容的示意图;
图9是本申请实施例提供的一种正向静态旋转分片的结构示意图;
图10是本申请实施例提供的多个机位分别对应的视频流和插入帧流的结构示意图;
图11是本申请实施例提供的另一种上层设备向终端发送的媒体内容的示意图;
图12是本申请实施例提供的另一种正向静态旋转分片的结构示意图;
图13是本申请实施例提供的一种动态旋转分片的生成过程示意图;
图14是本申请实施例提供的另一种动态旋转分片的生成过程示意图;
图15是本申请实施例提供的一种静态旋转分片的生成过程示意图;
图16是本申请实施例提供的另一种静态旋转分片的生成过程示意图;
图17是本申请实施例提供的一种视频播放装置的结构示意图;
图18是本申请实施例提供的另一种视频播放装置的结构示意图;
图19是本申请实施例提供的一种视频播放装置的框图。
具体实施方式
为使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施方式作进一步地详细描述。
图1是本申请实施例提供的一种视频播放系统的结构示意图。如图1所示,该系统包括:媒体源101、视频服务器102和终端103。
媒体源101用于提供多路视频流。参见图1,媒体源101包括多个摄像机1011和前端编码器1012。摄像机1011与前端编码器1012连接。每个摄像机1011用于采集一路视频流,并将采集到的视频流传输至前端编码器1012。前端编码器1012用于对多个摄像机1011采集的视频流进行编码,并将编码后的视频流发送给视频服务器102。本申请实施例中,多个摄像机1011用于采集同一焦点区域内不同角度的视频图像,且该多个摄像机1011采集图像的时刻和频率相同。可选地,可以采用相机同步技术实现多个摄像机1011的同步拍摄。图中摄像机的数量仅用作示例性说明,不作为对本申请实施例提供的视频播放系统的限制。多个摄像机可以采用环形排布方式或扇形排布方式等,本申请实施例对摄像机的排布方式不做限定。
视频服务器102用于对媒体源101发送的视频流采用OTT(over the top)技术进行处理,并将处理后的视频流通过内容分发网络(content delivery network,CDN)分发至终端。CDN是构建在现有网络基础之上的智能虚拟网络,CDN可以包括部署在各地的边缘服务器,还可以包括中心服务器。可选地,参见图1,视频服务器102包括视频处理服务器1021和视频分发服务器1022。视频处理服务器1021用于采用OTT技术对视频流进行处理,并将处理后的视频流发送给视频分发服务器1022;视频分发服务器1022用于将视频流分发至终端。其中,视频处理服务器1021也可称为视频处理平台,视频处理服务器1021可以是一台服务器,或者由若干台服务器组成的服务器集群,或者是一个云计算服务中心。视频分发服务器1022可以是CDN的中心服务器或者边缘服务器。当然,视频处理服务器1021与视频分发服务器1022也可以集成在一起,本申请实施例对此不做限定。
终端103即视频播放端,用于对视频服务器102发送的视频流进行解码播放。可选地,终端103能够通过触控、语音控制、手势控制或遥控器控制等控制方式中一种或多种方式改变播放角度。本申请实施例对触发终端改变播放角度的控制方式不做限定。例如,终端103可以是手机、平板电脑或智能可穿戴设备等能够通过触控方式或语音控制方式改变播放角度的设备。或者,终端103也可以是机顶盒(set top box,STB)等能够通过遥控器的控制改变播 放角度的设备。
本申请实施例中,媒体源101侧的前端编码器1012或视频服务器102侧的视频处理服务器1021在获取多路视频流后,对每路视频流进行重新编码(也可称为转码)得到图像组(Group of Pictures,GOP),并基于GOP生成视频分片进行传输,每个GOP可被独立解码。其中,一个视频分片中通常封装有多个GOP,每个GOP包括一帧或多帧视频图像。例如,一个GOP可以包括一个帧内编码图像(intra coded picture,I)帧;或者,一个GOP可以包括一个I帧以及位于I帧之后的一个或多个预测编码图像(predictive coded picture,P)帧;又或者,一个GOP可以包括一个I帧、位于I帧之后的一个或多个P帧以及位于I帧和P帧之间的一个或多个双向预测编码图像(bidirectionally predicted picture,B)帧。GOP通常是一组时间上的连续视频图像。对视频流进行重新编码得到的GOP的时间戳与摄像机对该GOP中的视频图像的采集时刻对应。例如,GOP的时间戳可以被设置为该GOP中最后一帧视频图像的采集时刻。又例如,当GOP中包括多帧视频图像时,GOP对应有开始时间戳和结束时间戳,开始时间戳为该GOP中第一帧视频图像的采集时刻,结束时间戳为该GOP中最后一帧视频图像的采集时刻。
可选地,时间长度小于1秒的GOP通常被称为小GOP(mini GOP)。GOP的时间参数可由管理人员设置。在固定时间长度下,每个GOP中包含的视频图像帧数与摄像机的拍摄帧率正相关,即摄像机的拍摄帧率越高,每个GOP中包含的视频图像帧数越多。例如,GOP中可以包括2帧视频图像(可对应每秒传输帧数(frame per second,FPS)为25(简称:25FPS))、3帧视频图像(可对应30FPS)、5帧视频图像(可对应50FPS)或6帧视频图像(可对应60FPS)。当然,GOP中也可以只包括1帧视频图像(即仅包括I帧)或包括更多帧视频图像,本申请实施例对此不做限定。
本申请实施例中,视频分片中的GOP采用独立传输封装方式编码,使得每个GOP可以作为单独的碎片(也可称为子分片)进行独立使用。例如,视频分片可以采用碎片mp4(fragmented mp4,fmp4)格式进行封装。fmp4格式是运动图像专家组(moving picture expert group,MPEG)提出的MPEG-4标准中定义的流媒体格式。图2是本申请实施例提供的一种视频分片的结构示意图。如图2所示,该视频分片中包括n个封装头和n个数据字段(mdat),每个mdat用于承载一个GOP的数据,也即是该视频分片中封装有n个GOP,n为大于1的整数。每个封装头中包括moof字段。该视频分片的封装方式也可称为多moof头封装方式。可选地,封装头中还可以包括styp字段和sidx字段。
值得说明的是,本申请实施例中涉及的分片(segment)是指能够被独立请求获取的视频数据,子分片(fragment)是指能够被独立解码播放的视频数据。一个分片通常包括一个或多个子分片。
可选地,媒体源101侧的前端编码器1012或视频服务器102侧的视频处理服务器1021在获取多路视频流后,还可以对每路视频流进行重新编码得到插入帧流。插入帧流包括多个插入帧,插入帧为不参考时域运动矢量编码得到的P帧,插入帧可视为I帧的延续。插入帧被定义为不依赖于I帧能够独立解码的P帧,普通P帧必须依赖于I帧才能进行解码操作,而插入帧可以独立解码。本申请以下实施例中采用P’帧表示插入帧。
本申请实施例中,针对每个机位,前端编码器1012或视频处理服务器1021可以针对一个GOP中的多个P帧,间隔编码得到对应的P’帧。例如,图3是本申请实施例提供的一种 编码得到的GOP与插入帧流的对比结构示意图。如图3所示,GOP包括I帧以及位于I帧之后的9个P帧,该9个P帧分别为P-0至P-8,对应地,插入帧流包括4个P’帧,该4个P’帧分别为P’-1、P’-3、P’-5和P’-7。其中,P’-1帧对应的视频画面与P-1帧对应的视频画面相同,P’-3帧对应的视频画面与P-3帧对应的视频画面相同,P’-5帧对应的视频画面与P-5帧对应的视频画面相同,P’-7帧对应的视频画面与P-7帧对应的视频画面相同。P-0帧依赖于I帧解码得到视频图像,P-2帧可以依赖于P’-1帧解码得到视频图像,P-4帧可以依赖于P’-3帧解码得到视频图像,P-6帧可以依赖于P’-5帧解码得到视频图像,P-8帧可以依赖于P’-7帧解码得到视频图像。
视频服务器102侧的视频处理服务器1021还根据外部设置的数据,生成媒体内容索引(也可称为OTT索引)。媒体内容索引用于描述每条视频流的信息,媒体内容索引实质上为描述视频流的信息的文件。视频流的信息包括视频流的地址信息以及视频流的时间信息等。视频流的地址信息用于指示该视频流的获取地址,例如视频流的地址信息可以是该视频流对应的统一资源定位符(uniform resource locator,URL)地址。视频流的时间信息用于指示该视频流中每个视频分片的起始时刻和结束时刻。其中,视频分片的起始时刻可以是该视频分片中的第一帧视频图像的采集时刻,视频分片的结束时刻可以是该视频分片中最后一帧视频图像的采集时刻。可选地,该媒体内容索引中还可以包括机位信息。机位信息包括机位数量(即媒体源侧的摄像机数量)和每条视频流对应的机位角度。视频流对应的机位角度即摄像机对应的机位角度。
例如,图4是本申请实施例提供的一种媒体源侧的摄像机分布场景示意图。如图4所示,该场景中包括20个摄像机,分别记为摄像机1-20。该20个摄像机采用环形排布方式,用于拍摄同一焦点区域M,拍摄焦点均为点O。可以将其中一个摄像机对应的机位角度设置为0,并对应计算其它摄像机对应的机位角度。例如可以将摄像机4对应的机位角度设置为0°,分别计算其它摄像机对应的机位角度,则摄像机9对应的机位角度为90°,摄像机14对应的机位角度为180°,摄像机19对应的机位角度为270°。
管理人员可以将摄像机数量以及各个摄像机对应的机位角度输入视频处理服务器,供视频处理服务器生成媒体内容索引。可选地,本申请实施例中的媒体内容索引可以是m3u8文件(可称为基于超文本传输协议(hyper text transfer protocol,HTTP)的直播流(HTTP living streaming,HLS)索引)或媒体演示描述(media presentation description,MPD)文件(可称为基于HTTP的动态自适应流(dynamic adaptive streaming over HTTP,DASH)索引)。其中,m3u8文件是指8位统一码转换格式(8-bit unicode transformation format,UTF-8)编码格式的m3u文件。
本申请实施例中,视频服务器102与终端103之间可以基于超文本传输协议(hyper text transfer protocol,HTTP)传输视频流。终端获取视频服务器中的视频内容的过程包括:终端先从视频服务器下载媒体内容索引,通过解析该媒体内容索引得到视频流的信息。终端选择当前需要播放的视频流,并从媒体内容索引中提取该视频流的URL地址,然后通过该视频流的URL地址向视频服务器发送媒体内容请求。视频服务器接收到该媒体内容请求后,向终端发送对应的视频流。
可选地,请继续参见图1,该视频播放系统中还可以包括网络设备104,视频服务器102与终端103之间通过网络设备104连接。网络设备104可以是网关或其它中间设备。当然, 视频服务器102与终端103之间也可以直接连接,本申请实施例对此不做限定。
图5是本申请实施例提供的一种视频播放方法的流程示意图。该方法可以应用于如图1所示的视频播放系统中。如图5所示,该方法包括:
步骤501、当终端接收到播放指令时,终端基于该播放指令生成播放请求。
该播放请求中包括播放机位信息,该播放机位信息用于指示所请求播放的目标机位。
可选地,播放机位信息包括目标机位对应的视频流的URL地址。或者,当终端获取的媒体内容索引中包括机位信息时,播放机位信息可以包括目标机位的标识。本申请实施例中,播放请求也可称为媒体内容请求。
步骤502、终端向上层设备发送播放请求。
本申请实施例中,上层设备指终端的上游设备。可选地,上层设备可以是如图1所示的视频播放系统中的视频服务器102或网络设备104。
步骤503、上层设备向终端发送目标机位对应的视频分片以及该目标机位对应的旋转视频数据。
其中,目标机位对应的旋转视频数据包括正向机位对应的视频数据和/或逆向机位对应的视频数据。正向机位包括位于目标机位的顺时针方向的一个或多个第一机位。逆向机位包括位于目标机位的逆时针方向的一个或多个第二机位。例如在如图3所示的场景中,假设目标机位为摄像机4,正向机位可以包括摄像机5、摄像机6和摄像机7等,逆向机位可以包括摄像机3、摄像机2和摄像机1等。又例如,图6是本申请实施例提供的另一种媒体源侧的摄像机分布场景示意图。如图6所示,该场景中包括9个摄像机,分为记为摄像机J至摄像机R。该9个摄像机采用扇形排布方式,用于拍摄同一焦点区域M’。假设目标机位为摄像机N,正向机位可以包括摄像机M、摄像机L、摄像机K和摄像机J,逆向机位可以包括摄像机O、摄像机P、摄像机Q和摄像机R。
可选地,当正向机位包括位于目标机位的顺时针方向的一个第一机位时,正向机位对应的视频数据包括该第一机位对应的视频分片。当正向机位包括位于目标机位的顺时针方向的多个第一机位时,正向机位对应的视频数据包括该多个第一机位中的每个第一机位分别对应的视频分片;或者,正向机位对应的视频数据为正向旋转分片。其中,正向旋转分片包括正向动态旋转子分片和/或正向静态旋转子分片。正向动态旋转子分片包括基于该多个第一机位对应的视频分片中的视频图像得到的多个图像帧组,正向动态旋转子分片中的每个图像帧组基于一个第一机位对应的视频分片中的视频图像得到。正向动态旋转子分片中的多个图像帧组按照时间先后顺序排列,且按照多个第一机位在顺时针方向上到目标机位的距离由近至远依次排列。正向静态旋转子分片包括基于该多个第一机位对应的视频分片中的视频图像得到的多个图像帧组,正向静态旋转子分片中的每个图像帧组基于一个第一机位对应的视频分片中的视频图像得到。正向静态旋转子分片中的多个图像帧组对应的播放时段相同,且按照多个第一机位在顺时针方向上到目标机位的距离由近至远依次排列。
可选地,当逆向机位包括位于目标机位的逆时针方向的一个第二机位时,逆向机位对应的视频数据包括该第二机位对应的视频分片。当逆向机位包括位于目标机位的逆时针方向的多个第二机位时,逆向机位对应的视频数据包括该多个第二机位中的每个第二机位分别对应的视频分片;或者,逆向机位对应的视频数据为逆向旋转分片。其中,逆向旋转分片包括逆 向动态旋转子分片和/或逆向静态旋转子分片。逆向动态旋转子分片包括基于该多个第二机位对应的视频分片中的视频图像得到的多个图像帧组,逆向动态旋转子分片中的每个图像帧组基于一个第二机位对应的视频分片中的视频图像得到。逆向动态旋转子分片中的多个图像帧组按照时间先后顺序排列,且按照多个第二机位在逆时针方向上到目标机位的距离由近至远依次排列。逆向静态旋转子分片包括基于该多个第一机位对应的视频分片中的视频图像得到的多个图像帧组,逆向静态旋转子分片中的每个图像帧组基于一个第一机位对应的视频分片中的视频图像得到。逆向静态旋转子分片中的多个图像帧组对应的播放时段相同,且按照多个第一机位在顺时针方向上到目标机位的距离由近至远依次排列。
值得说明的是,动态旋转分片和静态旋转分片的区别在于:前者包括的多个图像帧组按照时间先后顺序排列,后者包括的多个图像帧组对应的播放时段相同。前者可以预备用于视频播放状态下进行的环绕播放,后者可以预备用于视频暂停播放状态下进行的环绕播放。
本申请实施例中,旋转分片中的每个图像帧组包括一帧或多帧视频图像,每个图像帧组可被独立解码。可选地,图像帧组为GOP。或者,图像帧组包括插入帧。或者,图像帧组包括插入帧和P帧的组合,P帧依赖于插入帧解码。或者,图像帧组包括插入帧、P帧和B帧的组合,P帧依赖于插入帧解码,B帧依赖于插入帧和P帧解码。
在第一种实现方式中,旋转分片中的图像帧组为GOP。例如,图7是本申请实施例提供的多个机位分别对应的视频分片的结构示意图。如图7所示,在图6所示的场景中的摄像机J至摄像机R中的每个摄像机都对应有按照时间先后顺序排列的视频分片a(对应时段T1)、视频分片b(对应时段T2)和视频分片c(对应时段T3)。每个视频分片分别包括5个GOP,视频分片a包括编号为1至5的GOP,视频分片b包括编号为6至10的GOP,视频分片c包括编号为11至15的GOP。
假设目标机位为摄像机N,正向机位包括摄像机M、摄像机L和摄像机K,逆向机位包括摄像机O、摄像机P和摄像机Q,目标机位对应的旋转视频数据包括正向动态旋转分片和逆向动态旋转分片。则上层设备向终端发送的时段T1内的媒体内容可以如图8所示,包括摄像机N对应的视频分片N-a、摄像机N对应的正向动态旋转分片N-a1以及摄像机N对应的逆向动态旋转分片N-a2。其中,摄像机N对应的视频分片N-a包括N-1至N-5;正向动态旋转分片N-a1包括M-1、L-2、K-3、M-4和L-5;逆向动态旋转分片N-a2包括O-1、P-2、Q-3、O-4和P-5。其中,M-1、L-2和K-3组成一个正向动态旋转子分片,M-4和L-5组成另一个正向动态旋转子分片,即正向动态旋转分片N-a1包括两个正向动态旋转子分片;对应地,O-1、P-2和Q-3组成一个逆向动态旋转子分片,O-4和P-5组成另一个逆向动态旋转子分片,即逆向动态旋转分片N-a2包括两个逆向动态旋转子分片。
例如,图9是本申请实施例提供的一种正向静态旋转分片的结构示意图。如图9所示,继续参考图7的示例,摄像机N对应的正向机位包括摄像机M和摄像机L,摄像机N在时段T1内对应有5个正向静态旋转子分片1-5,该5个正向静态旋转子分片1-5与N-1至N-5在时间上一一对应,正向静态旋转子分片1包括M-1和L-1,正向静态旋转子分片2包括M-2和L-2,正向静态旋转子分片3包括M-3和L-3,正向静态旋转子分片4包括M-4和L-4,正向静态旋转子分片5包括M-5和L-5。其中,正向静态旋转子分片1用于视频画面暂停在N-1对应的视频图像上时进行顺时针环绕播放,正向静态旋转子分片2用于视频画面暂停在N-2对应的视频图像上时进行顺时针环绕播放,以此类推,本申请实施例不再一一赘述。逆向 静态旋转子分片的结构可参考正向静态旋转子分片的结构。
在第二种实现方式中,当各个机位分别对应有视频流(包括视频分片)和插入帧流时,旋转分片中的图像帧组可以包括插入帧(P’帧)。例如,图10是本申请实施例提供的多个机位分别对应的视频流和插入帧流的结构示意图。如图10所示,在图6所示的场景中的摄像机J至摄像机R中的每个摄像机都对应有视频流和插入帧流,视频流包括多个视频分片(图10中仅示出视频分片中的一个GOP),该GOP包括I帧以及位于I帧之后的8个P帧,该8个P帧分别为P-1至P-8。插入帧流包括针对GOP中的多个P帧,间隔编码得到的多个P’帧,包括P’-1、P’-3、P’-5和P’-7。
假设目标机位为摄像机N,正向机位包括摄像机M、摄像机L和摄像机K,逆向机位包括摄像机O、摄像机P和摄像机Q,目标机位对应的旋转视频数据包括正向动态旋转分片和逆向动态旋转分片。则上层设备向终端发送的时段T1’内的媒体内容可以如图11所示,包括摄像机N对应的N-GOP、摄像机N对应的正向动态旋转分片N-a1’以及摄像机N对应的逆向动态旋转分片N-a2’。其中,摄像机N对应的N-GOP包括NI以及NP-0至NP-8;正向动态旋转分片N-a1’包括MI、MP-0、LP’-1、LP-2、KP’-3、KP-4、MP’-5、MP-6、LP’-7和LP-8;逆向动态旋转分片N-a2’包括OI、OP-0、PP’-1、PP-2、QP’-3、QP-4、OP’-5、OP-6、PP’-7和PP-8。其中,MI、MP-0、LP’-1、LP-2、KP’-3和KP-4组成一个正向动态旋转子分片,MP’-5、MP-6、LP’-7和LP-8组成另一个正向动态旋转子分片;对应地,OI、OP-0、PP’-1、PP-2、QP’-3和QP-4组成一个逆向动态旋转子分片,OP’-5、OP-6、PP’-7和PP-8组成另一个逆向动态旋转子分片。
例如,图12是本申请实施例提供的另一种正向静态旋转分片的结构示意图。如图12所示,继续参考图10的示例,摄像机N对应的正向机位包括摄像机M和摄像机L,摄像机N在时段T1’内对应有10个正向静态旋转子分片1-10,该10个正向静态旋转子分片1-10与N-GOP中的10帧视频图像在时间上一一对应,正向静态旋转子分片1包括MI和LI,正向静态旋转子分片2包括MI、MP-0、LI和LP-0(MI用于供MP-0解码,LI用于供LP-0解码),正向静态旋转子分片3包括MP’-1和LP’-1,正向静态旋转子分片4包括MP’-1、MP-2、LP’-1和LP-2(MP’-1用于供MP-2解码,LP’-1用于供LP-2解码),以此类推。可选地,正向静态旋转子分片2中也可以不包括MI和LI,MP-0依赖于正向静态旋转子分片1中的MI解码,LP-0依赖于正向静态旋转子分片1中的LI解码;正向静态旋转子分片4中也可以不包括MP’-1和LP’-1,MP-2依赖于正向静态旋转子分片3中的MP’-1解码,LP-2依赖于正向静态旋转子分片3中的LP’-1解码;等等。也即是,P帧对应的静态旋转子分片可以基于该P帧解码时所依赖的I帧或P’帧对应的静态旋转子分片进行解码。其中,正向静态旋转子分片1用于视频画面暂停在NI对应的视频图像上时进行顺时针环绕播放,正向静态旋转子分片2用于视频画面暂停在NP-0对应的视频图像上时进行顺时针环绕播放,以此类推,本申请实施例不再一一赘述。逆向静态旋转子分片的结构可参考正向静态旋转子分片的结构。
在上述第二种实现方式中,由于旋转分片可以基于插入帧生成,因此上层设备向终端发送的视频分片中无需使用全I帧或mini GOP,而可以使用正常GOP,相较于上述第一种实现方式,能够降低上层设备向终端发送的视频分片的数据量。另外,插入帧的数据量通常小于I帧的数据量,相较于第一种实现方式,能够降低上层设备向终端发送的正向旋转分片和逆向旋转分片的数据量。因此该第二种实现方式可以有效减少网络传输资源的消耗。
可选地,上层设备接收到终端发送的播放请求后,可以响应于该播放请求,向终端发送目标机位对应的视频分片以及目标机位对应的旋转视频数据,也即是,目标机位对应的视频分片以及目标机位对应的旋转视频数据可以均是上层设备响应于播放请求发送的。
或者,上层设备接收到终端发送的播放请求后,可以响应于该播放请求,只向终端发送目标机位对应的视频分片。终端还可以生成旋转预备请求,并向上层设备发送该旋转预备请求。该旋转预备请求用于请求获取目标机位对应的旋转视频数据。响应于上层设备接收到该旋转预备请求,上层设备再向终端发送目标机位对应的旋转视频数据。也即是,目标机位对应的视频分片是上层设备响应于播放请求发送的,目标机位对应的旋转视频数据是上层设备响应于旋转预备请求发送的。可选地,终端可以向上层设备同时发送播放请求和旋转预备请求;或者,终端也可以先向上层设备发送播放请求,再向上层设备发送旋转预备请求,本申请实施例对此不做限定。值得说明的是,旋转预备请求是终端在接收到旋转指令之前向上层设备发送的,也即是该旋转预备请求用于预取所请求播放的机位对应的旋转视频数据。
本申请实施例中,上层设备可以在接收到终端发送的播放请求后,主动向终端发送所请求播放的机位对应的旋转视频数据,或者,也可以在接收到终端发送的旋转预备请求后,被动响应向终端发送所请求播放的机位对应的旋转视频数据。
可选地,旋转预备请求包括预备旋转方向、预备旋转机位的数量、预备旋转机位的标识或预备旋转状态中的一个或多个,预备旋转状态包括动态旋转状态和/或静态旋转状态,该旋转预备请求中的内容是终端中预先配置的。其中,动态旋转状态用于指示获取动态旋转分片,静态旋转状态用于指示获取静态旋转分片。终端可以在接收到预设的触发操作后,生成并向上层设备发送旋转预备请求。例如,当终端检测到顺时针横屏显示视频图像时,确定预备旋转方向为顺时针方向,此时终端可以向上层设备请求正向机位对应的视频数据。又例如,当终端检测到逆时针横屏显示视频图像时,确定预备旋转方向为逆时针方向,此时终端可以向上层设备请求逆向机位对应的视频数据。又例如,终端可以在显示界面上显示目标按钮,当终端检测到对目标按钮的触控操作时,终端向上层设备请求目标机位对应的旋转视频数据。又例如,终端可以基于用户历史行为数据向上层设备请求相应的旋转视频数据,等等。
步骤504、当终端在基于目标机位对应的视频分片播放视频画面的过程中,接收到旋转指令时,终端根据该旋转指令确定旋转方向。
该旋转方向为顺时针方向或逆时针方向。终端在基于目标机位对应的视频分片播放视频画面的过程中接收到旋转指令,可以是终端在视频播放状态下接收到旋转指令,或者也可以是终端在视频暂停播放状态下接收到旋转指令。
在一种实现方式中,当终端在视频播放界面上检测到滑动操作时,终端确定接收到旋转指令。终端根据该滑动操作的滑动方向,确定旋转方向。例如,滑动方向向左表示逆时针旋转,滑动方向向右表示顺时针旋转。
在另一种实现方式中,当终端接收到遥控设备发送的目标遥控指令时,终端确定接收到旋转指令。终端根据目标遥控指令中的按键标识,确定旋转方向。例如,当遥控按键信息中包括左键的标识时,表示旋转方向为逆时针方向,当遥控按键信息中包括右键的标识时,表示旋转方向为顺时针方向。当然还可以设置遥控设备上的其它按键控制旋转方向,本申请实施例对此不做限定。
步骤505、响应于旋转视频数据包括位于目标机位的旋转方向上的机位对应的目标视频 数据,终端基于目标视频数据播放视频画面。
在一种实现方式中,目标视频数据为目标旋转分片,该目标旋转分片包括基于位于目标机位的旋转方向的多个机位对应的视频分片中的视频图像得到的多个图像帧组,该目标旋转分片中的每个图像帧组基于位于该目标机位的旋转方向的一个机位对应的视频分片中的视频图像得到。其中,响应于终端在视频播放状态下接收到该旋转指令,目标旋转分片包括动态旋转子分片,动态旋转子分片中的多个图像帧组按照时间先后顺序排列,且按照该多个机位在旋转方向上到目标机位的距离由近至远依次排列,也即是,该目标旋转分片为动态旋转分片。或者,响应于终端在视频暂停播放状态下接收到该旋转指令,目标旋转分片包括静态旋转子分片,静态旋转子分片中的多个图像帧组对应的播放时段相同,且按照该多个机位在旋转方向上到目标机位的距离由近至远依次排列,也即是,该目标旋转分片为静态旋转分片。
相应地,终端基于目标视频数据播放视频画面的实现过程,包括:终端对目标旋转分片进行解码播放。具体实现时,若终端在视频播放状态下接收到旋转指令,且旋转指令所指示的旋转方向为顺时针方向,则终端对与播放时刻对应的正向动态旋转子分片进行解码播放。若终端在视频播放状态下接收到旋转指令,且旋转指令所指示的旋转方向为逆时针方向,则终端对与播放时刻对应的逆向动态旋转子分片进行解码播放。若终端在视频暂停播放状态下接收到旋转指令,且旋转指令所指示的旋转方向为顺时针方向,则终端对视频暂停时刻对应的正向静态旋转子分片进行解码播放。若终端在视频暂停播放状态下接收到旋转指令,且旋转指令所指示的旋转方向为逆时针方向,则终端对视频暂停时刻对应的逆向静态旋转子分片进行解码播放。
在该实现方式中,当终端接收到旋转指令时,通过对预先获取的旋转分片进行解码即可实现对视频画面的环绕播放,环绕播放时延较低。
在另一种实现方式中,目标视频数据包括位于目标机位的旋转方向的多个机位分别对应的视频分片。终端基于目标视频数据播放视频画面的实现过程,包括:终端基于该多个机位对应的视频分片中的视频图像分别生成图像帧组。终端按照多个机位在旋转方向上到目标机位的距离由近至远的顺序,依次播放生成的图像帧组中的视频图像。
具体实现时,若终端在视频播放状态下接收到旋转指令,则终端根据位于目标机位的旋转方向的多个机位在旋转方向上到目标机位的距离,基于该多个机位对应的视频分片中的视频图像生成按照时间先后顺序排列的多个图像帧组,每个图像帧组基于一个机位对应的视频分片中的视频图像生成,然后依次播放该多个图像帧组中的视频图像。例如,参考图7示出的例子,目标机位为摄像机N,正向机位包括摄像机M和摄像机L,终端当前正在播放摄像机N对应的视频分片,假设在播放至N-2时接收到旋转指令,且基于该旋转指令确定的旋转方向为顺时针方向,则终端依次提取并解码播放摄像机M对应的视频分片a中的M-3以及摄像机L对应的视频分片a中的L-4。
若终端在视频暂停播放状态下接收到旋转指令,则终端基于该多个机位对应的视频分片中视频暂停时刻对应的视频图像分别生成图像帧组,每个图像帧组基于一个机位对应的视频分片中该视频暂停时刻对应的视频图像生成,然后按照该多个机位在旋转方向上到目标机位的距离由近至远的顺序,依次播放生成的图像帧组中的视频图像。例如,参考图7示出的例子,目标机位为摄像机N,正向机位包括摄像机M和摄像机L,终端当前的视频画面暂停在摄像机N对应的视频分片a中的N-2,此时接收到旋转指令,且基于该旋转指令确定的旋转 方向为顺时针方向,则终端分别提取摄像机M对应的视频分片a中的M-2以及摄像机L对应的视频分片a中的L-2,并依次解码播放M-2和L-2。
可选地,当目标视频数据仅包括位于目标机位的旋转方向的一个机位对应的视频分片时,终端直接解码播放该机位对应的视频分片中的视频图像。若终端在视频播放状态下接收到旋转指令,则终端解码播放该机位对应的视频分片中下一播放时刻对应的视频图像。例如,参考图7示出的例子,目标机位为摄像机N,正向机位包括摄像机M,终端当前正在播放摄像机N对应的视频分片,假设在播放至N-2时接收到旋转指令,且基于该旋转指令确定的旋转方向为顺时针方向,此时终端解码播放摄像机M对应的视频分片a中的M-3。若终端在视频暂停播放状态下接收到该旋转指令,则终端解码播放该机位对应的视频分片中视频暂停时刻对应的视频图像。例如,参考图7示出的例子,目标机位为摄像机N,正向机位包括摄像机M,终端当前的视频画面暂停在摄像机N对应的视频分片a中的N-2,此时接收到旋转指令,且基于该旋转指令确定的旋转方向为顺时针方向,终端解码播放摄像机M对应的视频分片a中的M-2。
在该实现方式中,当终端接收到旋转指令时,终端可以解码播放预先获取的旋转方向上的机位对应的视频分片中的视频图像实现对视频画面的环绕播放,环绕播放时延较低。
本申请实施例中,上层设备向终端发送该终端所请求播放的目标机位对应的视频分片以及该目标机位对应的旋转视频数据,终端在接收到目标机位对应的视频分片后,对该视频分片进行解码即可实现对该目标机位所采集的视频画面的播放;当终端接收到旋转指令时,可以根据预先获取的旋转视频数据实现对视频画面的环绕播放,环绕播放时延较低,且播放的视频画面的分辨率可以与视频分片中的视频图像或旋转视频数据中的视频图像的分辨率相同。因此本申请实施例提供的视频播放方法不受限于前端拍摄所采用的相机数量,应用范围广。另外,与相关技术相比,上层设备无需始终向终端发送所有相机所采集的视频画面,可以减少数据传输量,节约传输资源。
可选地,当终端接收到旋转指令时,还可以执行以下步骤506至步骤511。
步骤506、终端基于该旋转指令生成环绕播放请求。
环绕播放请求中包括旋转机位信息,该旋转机位信息用于指示旋转范围。可选地,当终端获取的媒体内容索引中包括机位信息时,终端接收到旋转指令后,可以根据旋转指令以及机位信息确定起始机位、终止机位和旋转方向,此时旋转机位信息中可以包括起始机位的标识、终止机位的标识和旋转方向。或者,终端接收到旋转指令后,可以根据旋转指令确定旋转角度,此时旋转机位信息中可以包括旋转角度。
可选地,当终端在视频播放状态下接收到旋转指令时,终端生成的环绕播放请求用于请求动态环绕播放视频内容。这种情况下,环绕播放请求还用于确定播放开始时刻和播放结束时刻。可选地,环绕播放请求还包括播放时间信息,该播放时间信息包括播放开始时刻、播放结束时刻或环绕播放时长中的一个或多个。
可选地,当终端在视频暂停播放状态下接收到旋转指令时,终端生成的环绕播放请求用于请求静态环绕播放视频内容。这种情况下,环绕播放请求还用于确定目标播放时刻。可选地,环绕播放请求中包括该目标播放时刻,该目标播放时刻可以是视频暂停时刻。静态环绕播放视频内容指,对多个机位提供的目标播放时刻对应的视频画面进行环绕播放。
在一种实现方式中,当终端在视频播放界面上检测到滑动操作时,终端确定接收到旋转 指令。终端根据该滑动操作的滑动信息,确定旋转机位信息,该滑动信息包括滑动起始位置、滑动长度、滑动方向或滑动角度中的一个或多个。然后终端基于该旋转机位信息生成环绕播放请求。其中,滑动起始位置、滑动长度和滑动方向可以用于确定起始机位、终止机位和旋转方向。滑动角度可以用于确定旋转角度。
可选地,滑动起始位置对应起始机位,滑动方向对应旋转方向,滑动长度用于定义切换的机位数量。滑动方向向左表示逆时针旋转,滑动方向向右表示顺时针旋转。滑动长度每达到单位长度,表示切换一个机位。例如单位长度可以设置为1厘米,当滑动长度达到3厘米时,表示切换3个机位。滑动敏感度与单位长度的设置值负相关,即单位长度的设置值越小,滑动敏感度越高。滑动敏感度可根据实际需求设置。例如,假设滑动方向向右,滑动长度为5厘米,单位长度为1厘米,则表示顺时针旋转切换5个机位。参考图4,假设滑动起始位置对应的起始机位为摄像机9,则终端确定旋转方向为顺时针,终止机位为摄像机14。
可选地,当环绕播放请求用于请求动态环绕播放视频内容时,还可通过滑动时长定义环绕播放时长。例如可以使环绕播放时长等于滑动时长。
可选地,滑动角度用于确定旋转角度。可以设置旋转角度与滑动角度满足一定关系,例如使旋转角度等于滑动角度;或者使旋转角度等于滑动角度的2倍;等等。当旋转机位信息中包括旋转角度时,还可以采用旋转角度的正负表示旋转方向。例如旋转角度为正值,表示顺时针旋转,旋转角度为负值,表示逆时针旋转。
在另一种实现方式中,当终端接收到遥控设备发送的目标遥控指令时,终端确定接收到旋转指令。目标遥控指令中包括遥控按键信息,遥控按键信息包括按键标识和/或按键次数。终端根据该遥控按键信息,确定旋转机位信息。然后终端基于该旋转机位信息生成环绕播放请求。其中,按键标识可以用于确定旋转方向。按键次数可以用于确定切换机位数量。
可选地,旋转方向基于按键标识确定。例如,当遥控按键信息中包括左键的标识时,表示旋转方向为逆时针方向,当遥控按键信息中包括右键的标识时,表示旋转方向为顺时针方向。当然还可以设置遥控设备上的其它按键控制旋转方向,本申请实施例对此不做限定。按键次数用于定义切换的机位数量,例如按键次数为1,表示切换一个机位。例如,假设遥控按键信息中包括左键的标识,且按键次数为3,则表示逆时针旋转切换3个机位。参考图4,假设起始机位为摄像机9,则终端根据按键标识确定旋转方向为逆时针,根据按键次数确定切换的机位数量为3,进而确定终止机位为摄像机6。
可选地,当环绕播放请求用于请求动态环绕播放视频内容时,还可通过按键时长定义环绕播放时长。例如可以使环绕播放时长等于按键时长。
步骤507、终端向上层设备发送环绕播放请求。
步骤508、上层设备基于环绕播放请求确定播放时间信息。
在本申请的一个可选实施例中,环绕播放请求用于请求动态环绕播放视频内容,则播放时间信息包括播放开始时刻和播放结束时刻。上层设备基于环绕播放请求确定播放时间信息的实现方式包括以下五种:
在第一种实现方式中,步骤508的实现过程包括:上层设备根据接收到环绕播放请求的时刻以及预设的策略,确定播放开始时刻和播放结束时刻。预设的策略中包括预设环绕播放时长。
可选地,预设的策略中定义有:将上层设备接收到环绕播放请求时的视频播放时刻作为 播放开始时刻,且播放结束时刻与播放开始时刻的间隔时长等于预设环绕播放时长。例如,上层设备接收到环绕播放请求时的视频播放时刻为00:19:35,预设环绕播放时长为2秒,则上层设备确定播放开始时刻为00:19:35,播放结束时刻为00:19:37。或者,预设的策略中也可以定义:将与环绕播放请求的接收时刻(对应视频播放时刻)间隔一定时长的视频播放时刻作为播放开始时刻,该播放开始时刻在时序上可以位于环绕播放请求的接收时刻之前,或者,该播放开始时刻在时序上也可以位于环绕播放请求的接收时刻之后。例如,环绕播放请求的接收时刻为00:19:35,播放开始时刻可以为00:19:34,或者,播放开始时刻也可以为00:19:36。
在第二种实现方式中,环绕播放请求中包括播放开始时刻和播放结束时刻。则步骤508的实现过程包括:上层设备在环绕播放请求中识别出播放开始时刻和播放结束时刻。
可选地,预先定义或预先配置环绕播放请求的指定字段用于携带播放开始时刻和播放结束时刻。其中,预先定义可以是在标准或协议中定义;预先配置可以是上层设备与终端预先协商。上层设备在接收到环绕播放请求后,可以从指定字段中识别出播放开始时刻和播放结束时刻。例如,环绕播放请求的指定字段中携带有两个时刻,分别为00:19:35和00:19:37,则上层设备确定播放开始时刻为00:19:35,播放结束时刻为00:19:37。
在第三种实现方式中,环绕播放请求中包括播放开始时刻。则步骤508的实现过程包括:上层设备根据播放开始时刻以及预设环绕播放时长,确定播放结束时刻。例如,环绕播放请求中携带的播放开始时刻为00:19:35,预设环绕播放时长为2秒,则上层设备确定播放结束时刻为00:19:37。
在第四种实现方式中,环绕播放请求中包括环绕播放时长。则步骤508的实现过程包括:上层设备根据接收到环绕播放请求的时刻以及环绕播放时长,确定播放开始时刻和播放结束时刻。该实现方式可参考上述第一种实现方式,本申请实施例在此不再赘述。
在第五种实现方式中,环绕播放请求中包括播放开始时刻和环绕播放时长。则步骤508的实现过程包括:上层设备根据播放开始时刻以及环绕播放时长,确定播放结束时刻。例如,环绕播放请求中携带的播放开始时刻为00:19:35,环绕播放时长为2秒,则上层设备确定播放结束时刻为00:19:37。
在本申请的另一个可选实施例中,环绕播放请求用于请求静态环绕播放视频内容,则播放时间信息包括目标播放时刻。可选地,环绕播放请求中包括该目标播放时刻。或者,环绕播放请求中不包括该目标播放时刻,上层设备根据接收到环绕播放请求的时刻确定目标播放时刻,上层设备确定目标播放时刻的方式可参考上述第一种实现方式中上层设备确定播放开始时刻的方式,本申请实施例在此不再赘述。
步骤509、上层设备根据旋转机位信息和播放时间信息生成旋转分片。
该旋转分片中包括旋转范围内的多个机位对应的图像帧组。可选地,该旋转分片中依次包括沿旋转方向从起始机位至终止机位的多个机位对应的图像帧组。
可选地,上层设备先根据旋转机位信息确定起始机位、终止机位和旋转方向,再沿旋转方向从起始机位至终止机位的机位中确定多个机位。
可选地,当旋转机位信息中包括起始机位的标识、终止机位的标识和旋转方向时,上层设备接收到环绕播放请求后,可以根据旋转机位信息中的内容确定起始机位、终止机位和旋转方向。当旋转机位信息中包括旋转角度时,上层设备接收到环绕播放请求后,根据起始机位和旋转角度,确定终止机位和旋转方向。例如,参考图4,假设上层设备确定的起始机位为 摄像机9,环绕播放请求中携带的旋转角度为-90°,则上层设备确定旋转方向为逆时针,终止机位为摄像机4。
可选地,上层设备确定的多个机位可以包括沿旋转方向从起始机位至终止机位的所有机位或部分机位。例如,参考图4,假设起始机位为摄像机9,终止机位为摄像机14,旋转方向为顺时针方向,则上层设备确定的多个机位依次包括摄像机9、摄像机10、摄像机11、摄像机12、摄像机13和摄像机14。或者,当环绕播放请求用于请求静态环绕播放视频内容时,上层设备确定的多个机位可以包括沿旋转方向从起始机位至终止机位的部分机位。例如,假设图3中摄像机11的拍摄区域和摄像机13的拍摄区域的并集完全覆盖摄像机12的拍摄区域,则上层设备确定的多个机位中可以不包括摄像机12的拍摄区域。在静态环绕播放摄像机9至摄像机14采集的视频画面时,由于摄像机11拍摄的视频画面和摄像机13拍摄的视频画面包含摄像机12拍摄的视频画面,因此不会导致环绕播放过程中的视频画面突变,进而可以保证环绕播放画面的流畅性。
在本申请的一个可选实施例中,环绕播放请求用于请求动态环绕播放视频内容。
在一种实现方式中,旋转分片中的图像帧组为GOP。则步骤509的实现过程包括:
在步骤5091A1中,上层设备获取多个机位中的每个机位对应的从播放开始时刻至播放结束时刻的m个视频分片,m为正整数。
例如,假设该多个机位沿旋转方向依次包括q个机位,播放开始时刻为t1,播放结束时刻为t2,q为大于0的整数,t2>t1,每个机位对应的视频流在时间段(t1,t2)包括m个视频分片。则上层设备分别获取该q个机位在时间段(t1,t2)内对应的m个视频分片。
在步骤5092A1中,上层设备根据播放时间信息,从每个机位对应的m个视频分片中提取一个或多个GOP。
可选地,上层设备根据环绕播放时长以及多个机位的数量,确定每个机位对应的GOP提取时刻以及GOP提取数量,该环绕播放时长等于播放结束时刻与播放开始时刻的差值。上层设备根据每个机位对应的GOP提取时刻以及GOP提取数量,从每个机位对应的m个视频分片中提取GOP。
沿旋转方向排布的两个机位中,前一个机位对应的GOP提取时刻在时序上位于后一个机位对应的GOP提取时刻之前。每个机位对应的GOP提取数量等于环绕播放时长与GOP的时间长度以及多个机位的数量的乘积的比值(可对该比值向上取整或向下取整)。
例如,继续参考步骤5091A中的例子,假设每个GOP的时间长度为t,每个机位对应的GOP提取数量等于(t2-t1)/(q*t)。
在步骤5093A1中,上层设备对提取的GOP进行组装,得到旋转分片。
可选地,上层设备按照旋转方向将提取的GOP依次进行组装,得到旋转分片,该旋转分片为动态旋转分片。
例如,请参考图7示出的例子,假设播放开始时刻为时段T2的起始时刻,播放结束时刻为时段T2的终止时刻,起始机位为摄像机N,终止机位为摄像机R,旋转方向为逆时针方向,各个机位对应的视频分片b包括5个GOP,每个机位对应的GOP提取数量为1,请参考图13,图13是本申请实施例提供的一种动态旋转分片的生成过程示意图。如图13所示,上层设备从摄像机N对应的视频分片b中提取的GOP为N-6,从摄像机O对应的视频分片b中提取的GOP为O-7,从摄像机P对应的视频分片b中提取的GOP为P-8,从摄像机Q对应 的视频分片b中提取的GOP为Q-9,从摄像机R对应的视频分片b中提取的GOP为R-10。然后上层设备按照旋转方向将从该5个机位对应的视频分片中提取的GOP依次进行组装,得到动态旋转分片。
在另一种实现方式中,旋转分片中的图像帧组基于插入帧生成。则步骤509的实现过程包括:
在步骤5091A2中,上层设备获取多个机位中的每个机位对应的从播放开始时刻至播放结束时刻的m个视频分片,m为正整数。
此步骤的解释可参考上述步骤5091A1,本申请实施例在此不再赘述。
在步骤5092A2中,上层设备根据播放时间信息,从每个机位对应的m个视频分片中提取一帧或多帧视频图像。
可选地,上层设备根据环绕播放时长以及多个机位的数量,确定每个机位对应的视频图像提取时刻以及视频图像提取数量,该环绕播放时长等于播放结束时刻与播放开始时刻的差值。上层设备根据每个机位对应的视频图像提取时刻以及视频图像提取数量,从每个机位对应的m个视频分片中提取视频图像。
沿旋转方向排布的两个机位中,前一个机位对应的视频图像提取时刻在时序上位于后一个机位对应的视频图像提取时刻之前。每个机位对应的视频图像提取数量等于环绕播放时长与视频图像的时间长度以及多个机位的数量的乘积的比值(可对该比值向上取整或向下取整)。
在步骤5093A2中,对于该多个机位中的每个机位,上层设备根据该机位对应的插入帧流以及提取的视频图像生成图像帧组,并对该多个机位对应的图像帧组进行组装,得到旋转分片。
可选地,上层设备按照旋转方向将生成的图像帧组依次进行组装,得到旋转分片,该旋转分片为动态旋转分片。
例如,请参考图12示出的例子,假设播放开始时刻为时段T1’的起始时刻,播放结束时刻为时段T1’的终止时刻,起始机位为摄像机N,终止机位为摄像机R,旋转方向为逆时针方向,每个机位对应的视频图像提取数量为2,请参考图14,图14是本申请实施例提供的另一种动态旋转分片的生成过程示意图。如图14所示,上层设备从摄像机N对应的GOP中提取视频图像NI和NP-0,从摄像机O对应的插入帧流中提取视频图像OP’-1并从摄像机O对应的GOP中提取视频图像OP-2,从摄像机P对应的插入帧流中提取视频图像PP’-3并从摄像机P对应的GOP中提取视频图像PP-4,从摄像机Q对应的插入帧流中提取视频图像QP’-5并从摄像机Q对应的GOP中提取视频图像QP-6,从摄像机R对应的插入帧流中提取视频图像RP’-7并从摄像机R对应的GOP中提取视频图像RP-8。其中,NI和NP-0为摄像机N对应的图像帧组,OP’-1和OP-2为摄像机O对应的图像帧组,PP’-3和PP-4为摄像机P对应的图像帧组,QP’-5和QP-6为摄像机Q对应的图像帧组,RP’-7和RP-8为摄像机R对应的图像帧组。上层设备按照旋转方向将从该5个机位对应的图像帧组依次进行组装,得到动态旋转分片。
在本申请的另一个可选实施例中,环绕播放请求用于请求静态环绕播放视频内容。
在一种实现方式中,旋转分片中的图像帧组为GOP,每个GOP包括一帧视频图像。则步骤509的实现过程包括:
在步骤5091B1中,上层设备获取多个机位中的每个机位对应的目标视频分片,该目标视 频分片对应的时间段包含目标播放时刻。
该目标视频分片对应的时间段包含目标播放时刻,指该目标播放时刻位于目标视频分片的起始时刻和结束时刻之间。
在步骤5092B1中,上层设备从每个机位对应的目标视频分片中,提取目标播放时刻对应的一个GOP。
该目标播放时刻对应的一个GOP,指该GOP中的视频图像的采集时刻为目标播放时刻。
在步骤5093B1中,上层设备对提取的GOP进行组装,得到旋转分片。
可选地,上层设备按照旋转方向将提取的GOP依次进行组装,得到旋转分片,该旋转分片为静态旋转分片。
例如,请参考图7示出的例子,起始机位为摄像机N,终止机位为摄像机R,目标播放时刻播放的是摄像机M对应的视频分片b中的GOP M-7,旋转方向为逆时针方向,请参考图15,图15是本申请实施例提供的一种静态旋转分片的生成过程示意图。如图15所示,上层设备从摄像机N对应的视频分片b中提取的GOP为N-7,从摄像机O对应的视频分片b中提取的GOP为O-7,从摄像机P对应的视频分片b中提取的GOP为P-7,从摄像机Q对应的视频分片b中提取的GOP为Q-7,从摄像机R对应的视频分片b中提取的GOP为R-7。然后上层设备按照旋转方向将从该5个机位对应的视频分片中提取的GOP依次进行组装,得到静态旋转分片。
在另一种实现方式中,旋转分片中的图像帧组基于插入帧生成。则步骤509的实现过程包括:
在步骤5091B2中,上层设备获取多个机位中的每个机位对应的目标视频分片,该目标视频分片对应的时间段包含目标播放时刻。
该目标视频分片对应的时间段包含目标播放时刻,指该目标播放时刻位于目标视频分片的起始时刻和结束时刻之间。
在步骤5092B2中,上层设备从每个机位对应的目标视频分片中,提取目标播放时刻对应的一帧视频图像。
该目标播放时刻对应的一帧视频图像,指该视频图像的采集时刻为目标播放时刻。
在步骤5093B2中,对于该多个机位中的每个机位,上层设备根据该机位对应的插入帧流以及提取的视频图像生成图像帧组,并对该多个机位对应的图像帧组进行组装,得到旋转分片。
其中,若提取的视频图像为I帧,则图像帧组包括该I帧。若提取的视频图像不为I帧且该视频图像对应有插入帧,则图像帧组包括该视频图像对应的插入帧。若提取的视频图像不为I帧且该视频图像没有对应的插入帧,则图像帧组包括该视频图像以及该视频图像解码所依赖的I帧或插入帧。
可选地,上层设备按照旋转方向将生成的图像帧组依次进行组装,得到旋转分片,该旋转分片为静态旋转分片。
例如,请参考图12示出的例子,起始机位为摄像机N,终止机位为摄像机R,目标播放时刻播放的是摄像机M对应的GOP中的MP-1,请参考图16,图16是本申请实施例提供的另一种静态旋转分片的生成过程示意图。如图16所示,上层设备从摄像机N对应的插入帧流中提取插入帧NP’-1,从摄像机O对应的插入帧流中提取插入帧OP’-1,从摄像机P对应 的插入帧流中提取插入帧PP’-1,从摄像机Q对应的插入帧流中提取插入帧QP’-1,从摄像机R对应的插入帧流中提取插入帧RP’-1。上层设备按照旋转方向将从该5个机位对应的视频图像依次进行组装,得到静态旋转分片。
可选地,旋转分片包含的图像帧组数量与其它视频分片包含的图像帧组数量可以相同,也可以不同,例如旋转分片包含的图像帧组数量可以少于或多于其它视频分片包含的图像帧组数量,本申请实施例对此不做限定。
可选地,当上层设备为网络设备时,上层设备接收到环绕播放请求后,先从视频服务器下载媒体内容索引,通过解析该媒体内容索引得到视频流的信息。上层设备从媒体内容索引中提取多个机位中每个机位对应的视频流的URL地址,然后通过视频流的URL分别获取对应的视频分片。
步骤510、上层设备向终端发送旋转分片。
可选地,当环绕播放请求用于请求动态环绕播放视频内容时,上层设备向终端发送旋转分片后,继续向终端发送终止机位对应的视频分片,使得终端能够流畅地从起始机位对应的播放画面切换至终止机位对应的播放画面。或者,当环绕播放请求用于请求静态环绕播放视频内容时,上层设备向终端发送旋转分片后,停止向终端发送视频数据。
步骤511、终端对旋转分片进行解码播放。
终端对旋转分片进行解码播放,能够实现对沿旋转方向从起始机位起至终止机位中的多个机位对应的视频画面的环绕播放。其中,终端播放的视频画面的分辨率可以与旋转分片中的视频图像的分辨率相同。
本申请实施例提供的方法的步骤的先后顺序能够进行适当调整,例如,步骤506和步骤507可以与步骤505同时执行,即终端接收到旋转指令后,可以在基于预先获取的旋转视频数据播放视频画面的同时,生成并向上层设备发送环绕播放请求。步骤也能够根据情况进行相应增减。任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化的方法,都应涵盖在本申请的保护范围之内,因此不再赘述。
综上所述,在本申请实施例提供的视频播放方法中,上层设备向终端发送该终端所请求播放的目标机位对应的视频分片以及该目标机位对应的旋转视频数据,终端在接收到目标机位对应的视频分片后,对该视频分片进行解码即可实现对该目标机位所采集的视频画面的播放;当终端接收到旋转指令时,可以根据预先获取的旋转视频数据实现对视频画面的环绕播放,环绕播放时延较低,且播放的视频画面的分辨率可以与视频分片中的视频图像或旋转视频数据中的视频图像的分辨率相同。因此本申请实施例提供的视频播放方法不受限于前端拍摄所采用的相机数量,应用范围广。另外,与相关技术相比,上层设备无需始终向终端发送所有相机所采集的视频画面,可以减少数据传输量,节约传输资源。另外,旋转分片可以基于插入帧生成,此时上层设备向终端发送的视频分片中无需使用全I帧或mini GOP,而可以使用正常GOP,能够降低上层设备向终端发送的视频分片的数据量;并且,插入帧的数据量通常小于I帧的数据量,能够降低上层设备向终端发送的旋转分片的数据量,因此利用插入帧技术生成旋转分片,可以有效减少网络传输资源的消耗。
图17是本申请实施例提供的一种视频播放装置的结构示意图。该装置应用于上层设备,例如,该上层设备可以是如图1所示的视频播放系统中的视频服务器或网络设备。如图17所 示,该装置170包括:
接收模块1701,用于接收终端发送的播放请求,播放请求中包括播放机位信息,播放机位信息用于指示所请求播放的目标机位。
发送模块1702,用于向终端发送目标机位对应的视频分片以及目标机位对应的旋转视频数据,旋转视频数据包括正向机位对应的视频数据和/或逆向机位对应的视频数据,正向机位包括位于目标机位的顺时针方向的一个或多个第一机位,逆向机位包括位于目标机位的逆时针方向的一个或多个第二机位。
可选地,发送模块1702,用于:响应于上层设备接收到终端发送的旋转预备请求,上层设备向终端发送旋转视频数据,旋转预备请求用于请求获取目标机位对应的旋转视频数据。或者,响应于播放请求,向终端发送旋转视频数据。
可选地,正向机位对应的视频数据包括每个第一机位对应的视频分片;或者,正向机位包括位于目标机位的顺时针方向的多个第一机位,正向机位对应的视频数据为正向旋转分片,正向旋转分片包括正向动态旋转子分片和/或正向静态旋转子分片,其中,正向动态旋转子分片包括基于多个第一机位对应的视频分片中的视频图像得到的多个图像帧组,正向动态旋转子分片中的每个图像帧组基于一个第一机位对应的视频分片中的视频图像得到,正向动态旋转子分片中的多个图像帧组按照时间先后顺序排列,且按照多个第一机位在顺时针方向上到目标机位的距离由近至远依次排列;正向静态旋转子分片包括基于多个第一机位对应的视频分片中的视频图像得到的多个图像帧组,正向静态旋转子分片中的每个图像帧组基于一个第一机位对应的视频分片中的视频图像得到,正向静态旋转子分片中的多个图像帧组对应的播放时段相同,且按照多个第一机位在顺时针方向上到目标机位的距离由近至远依次排列。
可选地,逆向机位对应的视频数据包括每个第二机位对应的视频分片;或者,逆向机位包括位于目标机位的逆时针方向的多个第二机位,逆向机位对应的视频数据为逆向旋转分片,逆向旋转分片包括逆向动态旋转子分片和/或逆向静态旋转子分片,其中,逆向动态旋转子分片包括基于多个第二机位对应的视频分片中的视频图像得到的多个图像帧组,逆向动态旋转子分片中的每个图像帧组基于一个第二机位对应的视频分片中的视频图像得到,逆向动态旋转子分片中的多个图像帧组按照时间先后顺序排列,且按照多个第二机位在逆时针方向上到目标机位的距离由近至远依次排列;逆向静态旋转子分片包括基于多个第一机位对应的视频分片中的视频图像得到的多个图像帧组,逆向静态旋转子分片中的每个图像帧组基于一个第一机位对应的视频分片中的视频图像得到,逆向静态旋转子分片中的多个图像帧组对应的播放时段相同,且按照多个第一机位在顺时针方向上到目标机位的距离由近至远依次排列。
其中,图像帧组包括一帧或多帧视频图像,每个图像帧组可被独立解码。
可选地,请继续参见图17,该装置170还包括:处理模块1703。
接收模块1701,还用于接收终端发送的环绕播放请求,环绕播放请求中包括旋转机位信息,旋转机位信息用于指示旋转范围。处理模块1703,用于基于环绕播放请求确定播放时间信息;并根据旋转机位信息和播放时间信息生成旋转分片,旋转分片中包括旋转范围内的多个机位对应的图像帧组,图像帧组包括一帧或多帧视频图像,每个图像帧组可被独立解码。发送模块1702,还用于向终端发送旋转分片。
可选地,图像帧组为GOP;或者,图像帧组包括插入帧;或者,图像帧组包括插入帧和P帧的组合;或者,图像帧组包括插入帧、P帧和B帧的组合。
可选地,旋转预备请求包括预备旋转方向、预备旋转机位的数量、预备旋转机位的标识或预备旋转状态中的一个或多个,预备旋转状态包括动态旋转状态和/或静态旋转状态,旋转预备请求中的内容是终端中预先配置的。
图18是本申请实施例提供的另一种视频播放装置的结构示意图。该装置应用于终端,例如,该装置可以是如图1所示的视频播放系统中的终端103。如图18所示,该装置180包括:
发送模块1801,用于当终端接收到播放指令时,向上层设备发送基于播放指令生成的播放请求,播放请求中包括播放机位信息,播放机位信息用于指示所请求播放的目标机位。
接收模块1802,用于接收上层设备发送的目标机位对应的视频分片以及目标机位对应的旋转视频数据,旋转视频数据包括正向机位对应的视频数据和/或逆向机位对应的视频数据,正向机位包括位于目标机位的顺时针方向的一个或多个第一机位,逆向机位包括位于目标机位的逆时针方向的一个或多个第二机位。
处理模块1803,用于当终端在基于目标机位对应的视频分片播放视频画面的过程中,接收到旋转指令时,根据旋转指令确定旋转方向,旋转方向为顺时针方向或逆时针方向。
播放模块1804,用于响应于旋转视频数据包括位于目标机位的旋转方向上的机位对应的目标视频数据,基于目标视频数据播放视频画面。
可选地,处理模块1803,还用于生成旋转预备请求,旋转预备请求用于请求获取目标机位对应的旋转视频数据。发送模块1801,还用于向上层设备发送旋转预备请求,目标机位对应的旋转视频数据是上层设备响应于旋转预备请求发送的。
可选地,旋转预备请求包括预备旋转方向、预备旋转机位的数量、预备旋转机位的标识或预备旋转状态中的一个或多个,预备旋转状态包括动态旋转状态和/或静态旋转状态,旋转预备请求中的内容是终端中预先配置的。
可选地,目标视频数据为目标旋转分片,目标旋转分片包括基于位于目标机位的旋转方向的多个机位对应的视频分片中的视频图像得到的多个图像帧组,目标旋转分片中的每个图像帧组基于位于目标机位的旋转方向的一个机位对应的视频分片中的视频图像得到,图像帧组包括一帧或多帧视频图像,每个图像帧组可被独立解码。其中,响应于终端在视频播放状态下接收到旋转指令,目标旋转分片包括动态旋转子分片,动态旋转子分片中的多个图像帧组按照时间先后顺序排列,且按照多个机位在旋转方向上到目标机位的距离由近至远依次排列;或者,响应于终端在视频暂停播放状态下接收到旋转指令,目标旋转分片包括静态旋转子分片,静态旋转子分片中的多个图像帧组对应的播放时段相同,且按照多个机位在旋转方向上到目标机位的距离由近至远依次排列。播放模块1804,用于对目标旋转分片进行解码播放。
可选地,目标视频数据包括位于目标机位的旋转方向的多个机位分别对应的视频分片。播放模块1804,用于:基于多个机位对应的视频分片中的视频图像分别生成图像帧组,图像帧组包括一帧或多帧视频图像,每个图像帧组可被独立解码。按照多个机位在旋转方向上到目标机位的距离由近至远的顺序,依次播放生成的图像帧组中的视频图像。
可选地,发送模块1801,还用于当终端接收到旋转指令时,向上层设备发送基于旋转指令生成的环绕播放请求,环绕播放请求中包括旋转机位信息,旋转机位信息用于指示旋转范围。接收模块1802,还用于接收上层设备发送的旋转分片,旋转分片中包括旋转范围内的多 个机位对应的图像帧组,图像帧组包括一帧或多帧视频图像,每个图像帧组可被独立解码。播放模块1804,还用于对旋转分片进行解码播放。
可选地,图像帧组为GOP;或者,图像帧组包括插入帧;或者,图像帧组包括插入帧和P帧的组合;或者,图像帧组包括插入帧、P帧和B帧的组合。
关于上述实施例中的装置,其中各个模块执行操作的具体方式已经在有关该方法的实施例中进行了详细描述,此处将不做详细阐述说明。
本申请实施例还提供了一种视频播放系统,该系统包括:上层设备和终端。上层设备包括如图17所示的视频播放装置,终端包括如图18所示的视频播放装置。
图19是本申请实施例提供的一种视频播放装置的框图。该视频播放装置可以是上层设备或终端,上层设备可以是视频服务器或网络设备,终端可以是手机、平板电脑、智能可穿戴设备或机顶盒等。如图19所示,该视频播放装置190包括:处理器1901和存储器1902。
存储器1902,用于存储计算机程序,所述计算机程序包括程序指令;
处理器1901,用于调用所述计算机程序,实现如图5所示的视频播放方法中上层设备执行的动作或终端执行的动作。
可选地,该视频播放装置190还包括通信总线1903和通信接口1904。
其中,处理器1901包括一个或者一个以上处理核心,处理器1901通过运行计算机程序,执行各种功能应用以及数据处理。
存储器1902可用于存储计算机程序。可选地,存储器可存储操作系统和至少一个功能所需的应用程序单元。操作系统可以是实时操作系统(Real Time eXecutive,RTX)、LINUX、UNIX、WINDOWS或OS X之类的操作系统。
通信接口1904可以为多个,通信接口1904用于与其它存储设备或网络设备进行通信。例如在本申请实施例中,上层设备的通信接口可以用于向终端发送旋转分片,终端的通信接口可以用于向上层设备发送环绕播放请求。网络设备可以是交换机或路由器等。
存储器1902与通信接口1904分别通过通信总线1903与处理器1901连接。
本申请实施例还提供了一种计算机可读存储介质,所述计算机可读存储介质上存储有指令,当所述指令被计算机设备的处理器执行时,实现如上述方法实施例所述的视频播放方法中上层设备执行的动作或者终端执行的动作。
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。
在本申请实施例中,术语“第一”、“第二”和“第三”仅用于描述目的,而不能理解为指示或暗示相对重要性。
本申请中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本文中字符“/”,一般表示前后关联对象是一种“或”的关系。
以上所述仅为本申请的可选实施例,并不用以限制本申请,凡在本申请的构思和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (30)

  1. 一种视频播放方法,其特征在于,所述方法包括:
    上层设备接收终端发送的播放请求,所述播放请求中包括播放机位信息,所述播放机位信息用于指示所请求播放的目标机位;
    所述上层设备向所述终端发送所述目标机位对应的视频分片以及所述目标机位对应的旋转视频数据,所述旋转视频数据包括正向机位对应的视频数据和/或逆向机位对应的视频数据,所述正向机位包括位于所述目标机位的顺时针方向的一个或多个第一机位,所述逆向机位包括位于所述目标机位的逆时针方向的一个或多个第二机位。
  2. 根据权利要求1所述的方法,其特征在于,所述上层设备向所述终端发送所述目标机位对应的旋转视频数据,包括:
    响应于所述上层设备接收到所述终端发送的旋转预备请求,所述上层设备向所述终端发送所述旋转视频数据,所述旋转预备请求用于请求获取所述目标机位对应的旋转视频数据;
    或者,
    响应于所述播放请求,所述上层设备向所述终端发送所述旋转视频数据。
  3. 根据权利要求1或2所述的方法,其特征在于,
    所述正向机位对应的视频数据包括每个所述第一机位对应的视频分片;或者,所述正向机位包括位于所述目标机位的顺时针方向的多个第一机位,所述正向机位对应的视频数据为正向旋转分片,所述正向旋转分片包括正向动态旋转子分片和/或正向静态旋转子分片,其中,所述正向动态旋转子分片或所述正向静态旋转子分片包括基于所述多个第一机位对应的视频分片中的视频图像得到的多个图像帧组,所述正向动态旋转子分片或所述正向静态旋转子分片中的每个图像帧组基于一个所述第一机位对应的视频分片中的视频图像得到;所述正向动态旋转子分片中的多个图像帧组按照时间先后顺序排列,且按照所述多个第一机位在顺时针方向上到所述目标机位的距离由近至远依次排列;所述正向静态旋转子分片中的多个图像帧组对应的播放时段相同,且按照所述多个第一机位在顺时针方向上到所述目标机位的距离由近至远依次排列;
    所述逆向机位对应的视频数据包括每个所述第二机位对应的视频分片;或者,所述逆向机位包括位于所述目标机位的逆时针方向的多个第二机位,所述逆向机位对应的视频数据为逆向旋转分片,所述逆向旋转分片包括逆向动态旋转子分片和/或逆向静态旋转子分片,其中,所述逆向动态旋转子分片或所述逆向静态旋转子分片包括基于所述多个第二机位对应的视频分片中的视频图像得到的多个图像帧组,所述逆向动态旋转子分片或所述逆向静态旋转子分片中的每个图像帧组基于一个所述第二机位对应的视频分片中的视频图像得到;所述逆向动态旋转子分片中的多个图像帧组按照时间先后顺序排列,且按照所述多个第二机位在逆时针方向上到所述目标机位的距离由近至远依次排列;所述逆向静态旋转子分片中的多个图像帧组对应的播放时段相同,且按照所述多个第一机位在顺时针方向上到所述目标机位的距离由近至远依次排列;
    其中,所述图像帧组包括一帧或多帧视频图像,每个所述图像帧组可被独立解码。
  4. 根据权利要求1至3任一所述的方法,其特征在于,所述方法还包括:
    所述上层设备接收所述终端发送的环绕播放请求,所述环绕播放请求中包括旋转机位信息,所述旋转机位信息用于指示旋转范围;
    所述上层设备基于所述环绕播放请求确定播放时间信息;
    所述上层设备根据所述旋转机位信息和所述播放时间信息生成旋转分片,所述旋转分片中包括所述旋转范围内的多个机位对应的图像帧组,所述图像帧组包括一帧或多帧视频图像,每个所述图像帧组可被独立解码;
    所述上层设备向所述终端发送所述旋转分片。
  5. 根据权利要求3或4所述的方法,其特征在于,所述图像帧组为图像组GOP;或者,所述图像帧组包括插入帧;或者,所述图像帧组包括插入帧和P帧的组合;或者,所述图像帧组包括插入帧、P帧和B帧的组合。
  6. 根据权利要求2所述的方法,其特征在于,所述旋转预备请求包括预备旋转方向、预备旋转机位的数量、预备旋转机位的标识或预备旋转状态中的一个或多个,所述预备旋转状态包括动态旋转状态和/或静态旋转状态,所述旋转预备请求中的内容是所述终端中预先配置的。
  7. 一种视频播放方法,其特征在于,所述方法包括:
    当终端接收到播放指令时,所述终端向上层设备发送基于所述播放指令生成的播放请求,所述播放请求中包括播放机位信息,所述播放机位信息用于指示所请求播放的目标机位;
    所述终端接收所述上层设备发送的所述目标机位对应的视频分片以及所述目标机位对应的旋转视频数据,所述旋转视频数据包括正向机位对应的视频数据和/或逆向机位对应的视频数据,所述正向机位包括位于所述目标机位的顺时针方向的一个或多个第一机位,所述逆向机位包括位于所述目标机位的逆时针方向的一个或多个第二机位;
    当所述终端在基于所述目标机位对应的视频分片播放视频画面的过程中,接收到旋转指令时,所述终端根据所述旋转指令确定旋转方向,所述旋转方向为顺时针方向或逆时针方向;
    响应于所述旋转视频数据包括位于所述目标机位的所述旋转方向上的机位对应的目标视频数据,所述终端基于所述目标视频数据播放视频画面。
  8. 根据权利要求7所述的方法,其特征在于,在所述终端接收到所述旋转指令之前,所述方法还包括:
    所述终端生成旋转预备请求,所述旋转预备请求用于请求获取所述目标机位对应的旋转视频数据;
    所述终端向所述上层设备发送所述旋转预备请求,所述目标机位对应的旋转视频数据是所述上层设备响应于所述旋转预备请求发送的。
  9. 根据权利要求8所述的方法,其特征在于,所述旋转预备请求包括预备旋转方向、预备 旋转机位的数量、预备旋转机位的标识或预备旋转状态中的一个或多个,所述预备旋转状态包括动态旋转状态和/或静态旋转状态,所述旋转预备请求中的内容是所述终端中预先配置的。
  10. 根据权利要求7至9任一所述的方法,其特征在于,所述目标视频数据为目标旋转分片,所述目标旋转分片包括基于位于所述目标机位的所述旋转方向的多个机位对应的视频分片中的视频图像得到的多个图像帧组,所述目标旋转分片中的每个图像帧组基于位于所述目标机位的所述旋转方向的一个机位对应的视频分片中的视频图像得到,所述图像帧组包括一帧或多帧视频图像,每个所述图像帧组可被独立解码;
    其中,响应于所述终端在视频播放状态下接收到所述旋转指令,所述目标旋转分片包括动态旋转子分片,所述动态旋转子分片中的多个图像帧组按照时间先后顺序排列,且按照所述多个机位在所述旋转方向上到所述目标机位的距离由近至远依次排列;或者,响应于所述终端在视频暂停播放状态下接收到所述旋转指令,所述目标旋转分片包括静态旋转子分片,所述静态旋转子分片中的多个图像帧组对应的播放时段相同,且按照所述多个机位在所述旋转方向上到所述目标机位的距离由近至远依次排列;
    所述终端基于所述目标视频数据播放视频画面,包括:
    所述终端对所述目标旋转分片进行解码播放。
  11. 根据权利要求7至9任一所述的方法,其特征在于,所述目标视频数据包括位于所述目标机位的所述旋转方向的多个机位分别对应的视频分片;
    所述终端基于所述目标视频数据播放视频画面,包括:
    所述终端基于所述多个机位对应的视频分片中的视频图像分别生成图像帧组,所述图像帧组包括一帧或多帧视频图像,每个所述图像帧组可被独立解码;
    所述终端按照所述多个机位在所述旋转方向上到所述目标机位的距离由近至远的顺序,依次播放生成的所述图像帧组中的视频图像。
  12. 根据权利要求7至11任一所述的方法,其特征在于,所述方法还包括:
    当所述终端接收到所述旋转指令时,所述终端向所述上层设备发送基于所述旋转指令生成的环绕播放请求,所述环绕播放请求中包括旋转机位信息,所述旋转机位信息用于指示旋转范围;
    所述终端接收所述上层设备发送的旋转分片,所述旋转分片中包括所述旋转范围内的多个机位对应的图像帧组,所述图像帧组包括一帧或多帧视频图像,每个所述图像帧组可被独立解码;
    所述终端对所述旋转分片进行解码播放。
  13. 根据权利要求10至12任一所述的方法,其特征在于,所述图像帧组为图像组GOP;或者,所述图像帧组包括插入帧;或者,所述图像帧组包括插入帧和P帧的组合;或者,所述图像帧组包括插入帧、P帧和B帧的组合。
  14. 一种视频播放装置,其特征在于,应用于上层设备,所述装置包括:
    接收模块,用于接收终端发送的播放请求,所述播放请求中包括播放机位信息,所述播放机位信息用于指示所请求播放的目标机位;
    发送模块,用于向所述终端发送所述目标机位对应的视频分片以及所述目标机位对应的旋转视频数据,所述旋转视频数据包括正向机位对应的视频数据和/或逆向机位对应的视频数据,所述正向机位包括位于所述目标机位的顺时针方向的一个或多个第一机位,所述逆向机位包括位于所述目标机位的逆时针方向的一个或多个第二机位。
  15. 根据权利要求14所述的装置,其特征在于,所述发送模块,用于:
    响应于所述上层设备接收到所述终端发送的旋转预备请求,向所述终端发送所述旋转视频数据,所述旋转预备请求用于请求获取所述目标机位对应的旋转视频数据;
    或者,响应于所述播放请求,向所述终端发送所述旋转视频数据。
  16. 根据权利要求14或15所述的装置,其特征在于,
    所述正向机位对应的视频数据包括每个所述第一机位对应的视频分片;或者,所述正向机位包括位于所述目标机位的顺时针方向的多个第一机位,所述正向机位对应的视频数据为正向旋转分片,所述正向旋转分片包括正向动态旋转子分片和/或正向静态旋转子分片,其中,所述正向动态旋转子分片或所述正向静态旋转子分片包括基于所述多个第一机位对应的视频分片中的视频图像得到的多个图像帧组,所述正向动态旋转子分片或所述正向静态旋转子分片中的每个图像帧组基于一个所述第一机位对应的视频分片中的视频图像得到;所述正向动态旋转子分片中的多个图像帧组按照时间先后顺序排列,且按照所述多个第一机位在顺时针方向上到所述目标机位的距离由近至远依次排列;所述正向静态旋转子分片中的多个图像帧组对应的播放时段相同,且按照所述多个第一机位在顺时针方向上到所述目标机位的距离由近至远依次排列;
    所述逆向机位对应的视频数据包括每个所述第二机位对应的视频分片;或者,所述逆向机位包括位于所述目标机位的逆时针方向的多个第二机位,所述逆向机位对应的视频数据为逆向旋转分片,所述逆向旋转分片包括逆向动态旋转子分片和/或逆向静态旋转子分片,其中,所述逆向动态旋转子分片或所述逆向静态旋转子分片包括基于所述多个第二机位对应的视频分片中的视频图像得到的多个图像帧组,所述逆向动态旋转子分片或所述逆向静态旋转子分片中的每个图像帧组基于一个所述第二机位对应的视频分片中的视频图像得到,所述逆向动态旋转子分片中的多个图像帧组按照时间先后顺序排列,且按照所述多个第二机位在逆时针方向上到所述目标机位的距离由近至远依次排列;所述逆向静态旋转子分片中的多个图像帧组对应的播放时段相同,且按照所述多个第一机位在顺时针方向上到所述目标机位的距离由近至远依次排列;
    其中,所述图像帧组包括一帧或多帧视频图像,每个所述图像帧组可被独立解码。
  17. 根据权利要求14至16任一所述的装置,其特征在于,所述装置还包括:处理模块;
    所述接收模块,还用于接收所述终端发送的环绕播放请求,所述环绕播放请求中包括旋转机位信息,所述旋转机位信息用于指示旋转范围;
    所述处理模块,用于基于所述环绕播放请求确定播放时间信息;并根据所述旋转机位信 息和所述播放时间信息生成旋转分片,所述旋转分片中包括所述旋转范围内的多个机位对应的图像帧组,所述图像帧组包括一帧或多帧视频图像,每个所述图像帧组可被独立解码;
    所述发送模块,还用于向所述终端发送所述旋转分片。
  18. 根据权利要求16或17所述的装置,其特征在于,所述图像帧组为图像组GOP;或者,所述图像帧组包括插入帧;或者,所述图像帧组包括插入帧和P帧的组合;或者,所述图像帧组包括插入帧、P帧和B帧的组合。
  19. 根据权利要求15所述的装置,其特征在于,所述旋转预备请求包括预备旋转方向、预备旋转机位的数量、预备旋转机位的标识或预备旋转状态中的一个或多个,所述预备旋转状态包括动态旋转状态和/或静态旋转状态,所述旋转预备请求中的内容是所述终端中预先配置的。
  20. 一种视频播放装置,其特征在于,应用于终端,所述装置包括:
    发送模块,用于当所述终端接收到播放指令时,向上层设备发送基于所述播放指令生成的播放请求,所述播放请求中包括播放机位信息,所述播放机位信息用于指示所请求播放的目标机位;
    接收模块,用于接收所述上层设备发送的所述目标机位对应的视频分片以及所述目标机位对应的旋转视频数据,所述旋转视频数据包括正向机位对应的视频数据和/或逆向机位对应的视频数据,所述正向机位包括位于所述目标机位的顺时针方向的一个或多个第一机位,所述逆向机位包括位于所述目标机位的逆时针方向的一个或多个第二机位;
    处理模块,用于当所述终端在基于所述目标机位对应的视频分片播放视频画面的过程中,接收到旋转指令时,根据所述旋转指令确定旋转方向,所述旋转方向为顺时针方向或逆时针方向;
    播放模块,用于响应于所述旋转视频数据包括位于所述目标机位的所述旋转方向上的机位对应的目标视频数据,基于所述目标视频数据播放视频画面。
  21. 根据权利要求20所述的装置,其特征在于,
    所述处理模块,还用于在所述终端接收到所述旋转指令之前,生成旋转预备请求,所述旋转预备请求用于请求获取所述目标机位对应的旋转视频数据;
    所述发送模块,还用于向所述上层设备发送所述旋转预备请求,所述目标机位对应的旋转视频数据是所述上层设备响应于所述旋转预备请求发送的。
  22. 根据权利要求21所述的装置,其特征在于,所述旋转预备请求包括预备旋转方向、预备旋转机位的数量、预备旋转机位的标识或预备旋转状态中的一个或多个,所述预备旋转状态包括动态旋转状态和/或静态旋转状态,所述旋转预备请求中的内容是所述终端中预先配置的。
  23. 根据权利要求20至22任一所述的装置,其特征在于,所述目标视频数据为目标旋转 分片,所述目标旋转分片包括基于位于所述目标机位的所述旋转方向的多个机位对应的视频分片中的视频图像得到的多个图像帧组,所述目标旋转分片中的每个图像帧组基于位于所述目标机位的所述旋转方向的一个机位对应的视频分片中的视频图像得到,所述图像帧组包括一帧或多帧视频图像,每个所述图像帧组可被独立解码;
    其中,响应于所述终端在视频播放状态下接收到所述旋转指令,所述目标旋转分片包括动态旋转子分片,所述动态旋转子分片中的多个图像帧组按照时间先后顺序排列,且按照所述多个机位在所述旋转方向上到所述目标机位的距离由近至远依次排列;或者,响应于所述终端在视频暂停播放状态下接收到所述旋转指令,所述目标旋转分片包括静态旋转子分片,所述静态旋转子分片中的多个图像帧组对应的播放时段相同,且按照所述多个机位在所述旋转方向上到所述目标机位的距离由近至远依次排列;
    所述播放模块,用于对所述目标旋转分片进行解码播放。
  24. 根据权利要求20至22任一所述的装置,其特征在于,所述目标视频数据包括位于所述目标机位的所述旋转方向的多个机位分别对应的视频分片;所述播放模块,用于:
    基于所述多个机位对应的视频分片中的视频图像分别生成图像帧组,所述图像帧组包括一帧或多帧视频图像,每个所述图像帧组可被独立解码;
    按照所述多个机位在所述旋转方向上到所述目标机位的距离由近至远的顺序,依次播放生成的所述图像帧组中的视频图像。
  25. 根据权利要求20至24任一所述的装置,其特征在于,
    所述发送模块,还用于当所述终端接收到所述旋转指令时,向所述上层设备发送基于所述旋转指令生成的环绕播放请求,所述环绕播放请求中包括旋转机位信息,所述旋转机位信息用于指示旋转范围;
    所述接收模块,还用于接收所述上层设备发送的旋转分片,所述旋转分片中包括所述旋转范围内的多个机位对应的图像帧组,所述图像帧组包括一帧或多帧视频图像,每个所述图像帧组可被独立解码;
    所述播放模块,还用于对所述旋转分片进行解码播放。
  26. 根据权利要求23至25任一所述的装置,其特征在于,所述图像帧组为图像组GOP;或者,所述图像帧组包括插入帧;或者,所述图像帧组包括插入帧和P帧的组合;或者,所述图像帧组包括插入帧、P帧和B帧的组合。
  27. 一种视频播放系统,其特征在于,所述系统包括:上层设备和终端,所述上层设备包括如权利要求14至19任一所述的视频播放装置,所述终端包括如权利要求20至26任一所述的视频播放装置。
  28. 一种视频播放装置,其特征在于,包括:处理器和存储器;
    所述存储器,用于存储计算机程序,所述计算机程序包括程序指令;
    所述处理器,用于调用所述计算机程序,实现如权利要求1至6任一所述的视频播放方 法;或者,实现如权利要求7至13任一所述的视频播放方法。
  29. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质上存储有指令,当所述指令被计算机设备的处理器执行时,实现如权利要求1至13任一所述的视频播放方法。
  30. 一种计算机程序产品,其特征在于,包括计算机程序,所述计算机程序被处理器执行时,实现如权利要求1至13任一所述的视频播放方法。
PCT/CN2021/141641 2021-04-22 2021-12-27 视频播放方法、装置及系统、计算机可读存储介质 WO2022222533A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP21937755.3A EP4319168A1 (en) 2021-04-22 2021-12-27 Video playing method, apparatus and system, and computer-readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110435658.5A CN115243076A (zh) 2021-04-22 2021-04-22 视频播放方法、装置及系统、计算机可读存储介质
CN202110435658.5 2021-04-22

Publications (1)

Publication Number Publication Date
WO2022222533A1 true WO2022222533A1 (zh) 2022-10-27

Family

ID=83666937

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/141641 WO2022222533A1 (zh) 2021-04-22 2021-12-27 视频播放方法、装置及系统、计算机可读存储介质

Country Status (3)

Country Link
EP (1) EP4319168A1 (zh)
CN (1) CN115243076A (zh)
WO (1) WO2022222533A1 (zh)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019004498A1 (ko) * 2017-06-29 2019-01-03 포디리플레이 인코포레이티드 다채널 영상 생성 방법, 다채널 영상 재생 방법 및 다채널 영상 재생 프로그램
CN107396085A (zh) * 2017-08-24 2017-11-24 三星电子(中国)研发中心 一种全视点视频图像的处理方法及系统
CN109996110A (zh) * 2017-12-29 2019-07-09 中兴通讯股份有限公司 一种视频播放方法、终端、服务器及存储介质
CN111447457A (zh) * 2020-03-25 2020-07-24 咪咕文化科技有限公司 直播视频处理方法、装置及存储介质
CN111447503A (zh) * 2020-04-26 2020-07-24 烽火通信科技股份有限公司 一种多视点视频的视点切换方法、服务器和系统

Also Published As

Publication number Publication date
CN115243076A (zh) 2022-10-25
EP4319168A1 (en) 2024-02-07

Similar Documents

Publication Publication Date Title
JP7256212B2 (ja) 360°没入型ビデオを提供するためのタイル選択および帯域幅最適化
CN112740711A (zh) 用于配置为支持多个360度视频会话的网络中的带宽优化的系统和方法
US10440416B1 (en) System and method for providing quality control in 360° immersive video during pause
JP2018186524A (ja) コンテンツ送信装置およびコンテンツ再生装置
US10958972B2 (en) Channel change method and apparatus
JP2020519094A (ja) ビデオ再生方法、デバイス、およびシステム
US10791366B2 (en) Fast channel change in a video delivery network
CN117581552A (zh) 在传输预创作视频帧和合成视频帧之间切换
US20180063590A1 (en) Systems and Methods for Encoding and Playing Back 360° View Video Content
US20130070839A1 (en) Statistical multiplexing of streaming media
CN110324580B (zh) 一种基于视联网的监控视频播放方法及装置
US20230045876A1 (en) Video Playing Method, Apparatus, and System, and Computer Storage Medium
CN106791860B (zh) 一种自适应视频编码控制系统及方法
JP2023171661A (ja) タイルベースの没入型ビデオをエンコードするためのエンコーダおよび方法
EP2781088A1 (en) Reducing amount op data in video encoding
WO2022222533A1 (zh) 视频播放方法、装置及系统、计算机可读存储介质
CN115174942A (zh) 一种自由视角切换方法及交互式自由视角播放系统
US11418560B1 (en) Media and application aware network architecture
WO2020152045A1 (en) A client and a method for managing, at the client, a streaming session of a multimedia content.
WO2009080114A1 (en) Method and apparatus for distributing media over a communications network
CN112291577B (zh) 直播视频的发送方法和装置、存储介质、电子装置
WO2022100742A1 (zh) 视频编码及视频播放方法、装置和系统
EP3493552A1 (en) Method for managing a streaming processing of a multimedia video spatially tiled stored on a network equipment, and corresponding terminal
Phillips et al. System and method for encoding 360 immersive video
CN114513658A (zh) 一种视频加载方法、装置、设备及介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21937755

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2021937755

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2021937755

Country of ref document: EP

Effective date: 20231026

NENP Non-entry into the national phase

Ref country code: DE