WO2018196530A1 - Video information processing method, terminal, and computer storage medium - Google Patents

Video information processing method, terminal, and computer storage medium

Info

Publication number: WO2018196530A1
Authority: WO (WIPO (PCT))
Prior art keywords: video frame, frame, video, sub video
Application number: PCT/CN2018/080579
Other languages: English (en), French (fr)
Inventor: 杨玉坤
Original Assignee: 腾讯科技(深圳)有限公司
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2018196530A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/21 Server components or server architectures
    • H04N 21/218 Source of audio or video content, e.g. local disk arrays
    • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/239 Interfacing the upstream path of the transmission network, e.g. prioritizing client content requests
    • H04N 21/2393 Interfacing the upstream path of the transmission network, involving handling client requests
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N 21/4402 Processing of video elementary streams, involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N 21/440218 Processing of video elementary streams, involving reformatting by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4
    • H04N 21/44012 Processing of video elementary streams, involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs
    • H04N 21/442 Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N 21/44213 Monitoring of end-user related data
    • H04N 21/44218 Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV program
    • H04N 21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N 21/65 Transmission of management data between client and server
    • H04N 21/658 Transmission by the client directed to the server
    • H04N 21/6587 Control parameters, e.g. trick play commands, viewpoint selection

Definitions

  • the present invention relates to information processing technologies, and in particular, to a video information processing method, a terminal, and a computer storage medium.
  • the embodiments of the present invention provide a video information processing method, a terminal, and a computer storage medium, which at least address the decoding-efficiency and picture-clarity problems of the prior art.
  • a video information processing method includes:
  • obtaining a video frame, and dividing the video frame into at least two sub video frames, where the formats of the sub video frames and the video frame meet a decoding strategy; detecting a spatial angle formed by a current line of sight of the human eye acting on a display area of the video frame; locating, according to the angle, a target area locked by the current line of sight in the display area; acquiring a sub video frame corresponding to the target area; and
  • decoding the sub video frame according to the decoding strategy.
  • a dividing unit configured to acquire a video frame, and divide the video frame into at least two sub video frames, where the format of the sub video frame and the video frame meets a decoding strategy
  • a detecting unit configured to detect a spatial angle formed by a current line of sight of the human eye acting on a display area of the video frame
  • a first processing unit configured to locate a target area locked by the current line of sight in the display area according to the angle
  • a second processing unit configured to acquire a sub video frame corresponding to the target area
  • a decoding unit configured to decode the sub video frame according to the decoding strategy.
  • the terminal includes: a processor and a memory for storing a computer program executable on the processor; wherein the processor is configured, when executing the computer program, to perform the video information processing method described above.
  • a computer storage medium storing computer executable instructions for executing the video information processing method according to any one of the above aspects.
  • a video information processing method, where the method is performed by a terminal that includes one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory, each program can include one or more units each corresponding to a set of instructions, and the one or more processors are configured to execute the instructions; the method comprising:
  • obtaining a video frame, and dividing the video frame into at least two sub video frames, where the formats of the sub video frames and the video frame meet a decoding strategy; detecting a spatial angle formed by a current line of sight of the human eye acting on a display area of the video frame; locating, according to the angle, a target area locked by the current line of sight in the display area; acquiring a sub video frame corresponding to the target area; and
  • decoding the sub video frame according to the decoding strategy.
  • the target area is locked by angle detection and angle-based positioning, and the sub video frame corresponding to the target area is obtained. Since the sub video frame is only a partial image of the video frame, decoding the sub video frame instead of all of the video improves decoding efficiency, and the improved decoding efficiency in turn guarantees and greatly improves the clarity of the picture.
  • FIG. 1 is a schematic diagram of an optional hardware structure of a mobile terminal implementing various embodiments of the present invention
  • FIG. 2 is a schematic diagram of hardware entities of each party performing information interaction according to an embodiment of the present invention
  • FIG. 3 is a schematic flowchart of an implementation process of a method according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a system architecture according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of a video frame in an application scenario according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of a scene for rendering an image by using VR technology according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of still another scenario for rendering an image by using VR technology according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of still another scenario for rendering an image by using VR technology according to an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of still another scenario for rendering an image by using VR technology according to an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of still another scenario for rendering an image by using VR technology according to an embodiment of the present invention.
  • FIG. 11 is a schematic diagram of video partitioning of an application scenario to which an embodiment of the present invention is applied.
  • FIG. 12 is a schematic diagram of video partitioning in still another application scenario to which an embodiment of the present invention is applied.
  • FIG. 13 is a schematic diagram of video partitioning in still another application scenario to which an embodiment of the present invention is applied;
  • FIG. 14 is a schematic structural diagram of hardware of a terminal according to an embodiment of the present invention.
  • A mobile terminal embodying various embodiments of the present invention will now be described with reference to the accompanying drawings.
  • suffixes such as “module,” “component,” or “unit” used to denote an element merely facilitate the description of the embodiments of the present invention and have no specific meaning per se. Therefore, “module” and “component” can be used interchangeably.
  • although the terms first, second, etc. are used herein to describe various elements (or thresholds, applications, instructions, or operations), these elements (or thresholds, applications, instructions, or operations) should not be limited by these terms; the terms are only used to distinguish one element (or threshold, application, instruction, or operation) from another.
  • for example, a first operation may be referred to as a second operation, and similarly a second operation may be referred to as a first operation; the first operation and the second operation are both operations, but they are not the same operation.
  • the steps in the embodiments of the present invention are not necessarily processed in the order described; the steps may be reordered as required, and steps may be deleted from or added to an embodiment.
  • the described step order is only an optional combination of the steps and does not represent the only possible combination; the order of the steps in the embodiments is not to be construed as limiting the present invention.
  • the intelligent terminal (such as a mobile terminal) of the embodiment of the present invention can be implemented in various forms.
  • the mobile terminal described in the embodiments of the present invention may include, for example, a mobile phone, a smart phone, a VR head mounted display terminal, and the like.
  • the VR head mounted display terminal includes, but is not limited to, VR glasses, VR eye masks, VR helmets, and the like.
  • the VR head-mounted display terminal closes off the wearer's vision and hearing from the outside world and guides the user to feel present in a virtual environment.
  • the display principle is that the left and right screens display the images for the left and right eyes respectively; the human eyes pick up this differential information and produce a stereoscopic impression in the mind.
  • FIG. 1 is a schematic diagram of an optional hardware structure of a mobile terminal implementing various embodiments of the present invention.
  • the mobile terminal 100 includes, but is not limited to, a mobile phone, a smart phone, a VR head mounted display terminal, and the like.
  • When the mobile terminal 100 is a VR head mounted display terminal, it may include: the wireless communication unit 110, the wireless internet unit 111, the sensing unit 120, the collecting unit 121, the dividing unit 130, the detecting unit 131, the first processing unit 132, the second processing unit 133, the decoding unit 134, the rendering and output unit 140, the display unit 141, the storage unit 150, the interface unit 160, the control unit 170, and the power supply unit 180.
  • Figure 1 illustrates a mobile terminal having various components, but it should be understood that not all illustrated components are required to be implemented. More or fewer components can be implemented instead. The components of the VR head mounted display terminal will be described in detail below.
  • a wireless communication unit 110 that allows for radio communication between a VR head mounted display terminal and a wireless communication system or network.
  • the wireless communication unit can communicate in various forms, for example with a background server in broadcast form, Wi-Fi form, or mobile communication (2G, 3G, or 4G) form.
  • the broadcast signal and/or the broadcast associated information may be received from the external broadcast management server via the broadcast channel.
  • the broadcast channel can include a satellite channel and/or a terrestrial channel.
  • the broadcast management server may be a server that generates and transmits a broadcast signal and/or broadcast associated information or a server that receives a previously generated broadcast signal and/or broadcast associated information and transmits it to the terminal.
  • the broadcast signal may include a TV broadcast signal, a radio broadcast signal, a data broadcast signal, and the like. Moreover, the broadcast signal may further include a broadcast signal combined with a TV or radio broadcast signal. Broadcast related information can also be provided via a mobile communication network.
  • the broadcast signal may exist in various forms, for example in the form of an Electronic Program Guide (EPG) of Digital Multimedia Broadcasting (DMB), an Electronic Service Guide (ESG) of Digital Video Broadcasting-Handheld (DVB-H), and the like.
  • EPG Electronic Program Guide
  • DMB Digital Multimedia Broadcasting
  • DVB-H Digital Video Broadcasting-Handheld
  • the broadcast signal and/or broadcast associated information may be stored in storage unit 150 (or other type of storage medium).
  • Wi-Fi is a technology that can connect terminals such as personal computers and mobile terminals (such as VR head-mounted display terminals and mobile phone terminals) wirelessly.
  • Wi-Fi hotspots can be accessed.
  • Wi-Fi hotspots are created by installing an access point on an internet connection. This access point transmits wireless signals over short distances, typically covering 300 feet.
  • When a Wi-Fi enabled VR head-mounted display terminal encounters a Wi-Fi hotspot, it can connect wirelessly to the Wi-Fi network.
  • the radio signal is transmitted to and/or received from at least one of a base station (e.g., an access point, a Node B, etc.), an external terminal, and a server.
  • a base station e.g., an access point, a Node B, etc.
  • Such radio signals may include voice call signals, video call signals, or various types of data transmitted and/or received in accordance with text and/or multimedia messages.
  • the wireless internet unit 111 supports various data transmission communication technologies including WLAN of the VR head mounted display terminal to access the Internet.
  • the unit can be internally or externally coupled to the VR head mounted display terminal.
  • the wireless Internet access technologies involved in the unit may include Wireless Local Area Network (WLAN), Wireless Broadband (Wibro), Worldwide Interoperability for Microwave Access (Wimax), High Speed Downlink Packet Access (HSDPA), and more.
  • the sensing unit 120 is configured to detect various user operations to obtain information such as spatial angle, distance, position, speed, and acceleration; the sensing unit may be a gyroscope.
  • the collecting unit 121 is configured to collect data, including video image data. The data detected by the sensing unit can also be aggregated into the collecting unit for data processing.
  • the dividing unit 130 is configured to acquire a video frame, and divide the video frame into at least two sub video frames, where the format of the sub video frame and the video frame meets a decoding strategy.
  • the detecting unit 131 is configured to detect a spatial angle formed by the current line of sight of the human eye acting on the display area of the video frame.
  • the first processing unit 132 is configured to locate a target area locked by the current line of sight in the display area according to the angle.
  • the second processing unit 133 is configured to acquire a sub video frame corresponding to the target area.
  • the decoding unit 134 is configured to decode the sub video frame according to the decoding strategy.
  • the rendering and output unit 140 is configured to render the data decoded by the decoding unit into an image and output it. Besides the image, the audio data for the corresponding image is also decoded; the audio data can be converted into an audio signal and output as sound through the rendering and output unit or through a dedicated audio output unit.
  • the image data is supplied to the display unit for display.
  • the display unit 141 is configured to display the decoded and rendered image data, which may be displayed in a related user interface (UI) or graphical user interface (GUI).
  • the storage unit 150 is configured to store software programs and the like for the processing and control operations performed by the control unit 170, or to temporarily store data (for example, image data, sensor data, audio data, etc.) that has been output or is to be output. Moreover, the storage unit may store data on the various vibration and audio signals output when a touch is applied to the touch screen.
  • the storage unit may include at least one type of storage medium, including a flash memory, a hard disk, a multimedia card, a card type memory (for example, SD or DX memory), a random access memory (RAM), a static memory, and the like.
  • the VR head mounted display terminal can cooperate with a network storage device that performs a storage function of the storage unit 150 through a network connection.
  • the interface unit 160 can employ 2G, 3G, or 4G wireless technology and the like, supports high-speed data transmission, transmits sound and data information simultaneously, and has an open interface, so that the VR head-mounted display terminal can be used more easily with various I/O devices.
  • the control unit 170 is configured to control the overall operation of the VR head mounted display terminal, for example performing the control and processing related to sensing of user operations, video data acquisition, data communication, and the like, and allocating and coordinating resources for the cooperation and interaction of the various hardware components.
  • the power supply unit 180 receives external power or internal power under the control of the control unit 170 and provides appropriate power required to operate the various components and components.
  • the various embodiments described herein can be implemented in a computer readable medium using, for example, computer software, hardware, or any combination thereof.
  • the embodiments described herein may be implemented using at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a processor, a controller, a microcontroller, a microprocessor, or an electronic unit designed to perform the functions described herein; in some cases, such an implementation may be realized in the control unit 170.
  • ASIC Application Specific Integrated Circuit
  • DSP Digital Signal Processing
  • DSPD Digital Signal Processing Device
  • PLD Programmable Logic Device
  • FPGA Field Programmable Gate Array
  • the software code can be implemented by a software application (or program) written in any suitable programming language, which can be stored in storage unit 150 and executed by control unit 170.
  • the specific hardware entity of the storage unit 150 may be a memory, and a specific hardware entity of the control unit 170 may be a controller.
  • FIG. 2 is a schematic diagram of hardware entities of each party performing information interaction in the embodiment of the present invention.
  • FIG. 2 includes: a terminal 1 and a server 2, and the terminal 1 is composed of terminals 11-13.
  • the terminals 11-13 adopt different VR head-mounted display terminals: the terminal 11 adopts a VR helmet, the terminal 12 adopts VR glasses composed of hardware entities, and the terminal 13 adopts VR glasses used in conjunction with a mobile phone (these VR glasses may be foldable cardboard glasses or non-foldable glasses, i.e., VR glasses composed of hardware entities).
  • the server 2 stores various video files. Through the interaction between the terminal 1 and the server 2, the video files to be played can be downloaded from the server 2 in real time online or in advance offline.
  • the resolution poses no problem in normal flat playback.
  • but when a VR head-mounted display terminal is used for panoramic playback, the picture becomes unclear.
  • the 360-degree panoramic video quality is limited by current hardware processing performance and encoding algorithms, and the resolution does not reach a good experience.
  • the human eye can only see one third or less of the panorama at a time, and when that area is enlarged to the screen size, picture sharpness is significantly reduced; in addition, the concave-convex magnifying lenses of the VR head-mounted display terminal enlarge the picture further, making sharpness worse still.
  • decoding Blu-ray 1080P video is already the limit for typical mobile phone hardware, and when it is played in panoramic mode with the magnification of the VR head-mounted display terminal, the picture quality degrades further.
  • since the processing performance of the hardware cannot be qualitatively improved in a short time, the playback quality of panoramic videos can instead be improved by the decoding mechanism adopted by the processing logic 10 in FIG. 2.
  • the processing logic 10 includes: S1, dividing the current video frame into at least two sub video frames; S2, capturing the user's current line of sight, and locating the target area locked by the current line of sight in the video frame according to the spatial angle formed by the current line of sight acting on the video frame; S3, obtaining the sub video frames corresponding to the target area according to the video numbers of the sub video frames, decoding those sub video frames according to the decoding strategy, and not decoding the other sub video frames.
  • the sub video frame is a partial image constituting the current video frame; decoding resources are therefore saved, and decoding is concentrated on the sub video frames corresponding to the target area locked by the user's current line of sight, which improves decoding efficiency, and the improved decoding efficiency brings an improvement in picture clarity (a sketch of this logic follows below).
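  • The following is a minimal, illustrative Python sketch of processing logic 10 (it is not part of the patent text): it assumes an equirectangular panorama pre-split into a 4x2 grid of 8 sub videos numbered 1 to 8 row-major, and the decoder methods decode_next_frame / skip_next_frame are hypothetical stand-ins:

```python
# Sketch of processing logic 10: S1 pre-splits the panorama into a 4x2 grid
# of 8 sub videos; S2 maps the gaze angle to the tiles it covers; S3 decodes
# only those tiles. All names here are illustrative, not the patent's API.

GRID_COLS, GRID_ROWS = 4, 2  # S1: 8 sub videos, numbered 1..8 row-major

def tile_number(yaw_deg: float, pitch_deg: float) -> int:
    """Map a gaze direction (yaw in [-180, 180), pitch in [-90, 90]) to the
    number of the sub video whose area contains that direction."""
    col = int((yaw_deg + 180.0) / 360.0 * GRID_COLS) % GRID_COLS
    row = int((pitch_deg + 90.0) / 180.0 * GRID_ROWS) % GRID_ROWS
    return row * GRID_COLS + col + 1

def visible_tiles(yaw_deg: float, pitch_deg: float,
                  fov_h: float = 100.0, fov_v: float = 100.0) -> set:
    """S2: collect every sub video touched by the field of view by sampling
    the corners and centre of the view rectangle."""
    tiles = set()
    for dy in (-fov_h / 2, 0.0, fov_h / 2):
        for dp in (-fov_v / 2, 0.0, fov_v / 2):
            yaw = (yaw_deg + dy + 180.0) % 360.0 - 180.0
            pitch = max(-89.9, min(89.9, pitch_deg + dp))
            tiles.add(tile_number(yaw, pitch))
    return tiles

def render_frame(yaw_deg: float, pitch_deg: float, decoders: dict) -> None:
    """S3: decode only the sub videos locked by the current line of sight."""
    wanted = visible_tiles(yaw_deg, pitch_deg)
    for number, decoder in decoders.items():
        if number in wanted:
            decoder.decode_next_frame()  # effective portion: decoded
        else:
            decoder.skip_next_frame()    # outside the line of sight: skipped
```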
  • FIG. 2 is only an example of a system architecture for implementing an embodiment of the present invention; the embodiment of the present invention is not limited to the system architecture described in FIG. 2, and various embodiments of the method of the present invention are proposed based on that system architecture.
  • the method includes: acquiring a video frame, and dividing the video frame into at least two sub video frames, where the formats of the sub video frames and the video frame satisfy the decoding strategy (101).
  • for example, the video frame and any one of the multi-channel sub videos obtained by dividing it are consistent in play length, and likewise consistent in the number of frames.
  • the video frame is not limited to ultra-high definition VR video.
  • taking VR video played in 360-degree panoramic mode as an example, the video format is ultra high definition; since the human eye can only see one third or less of the area at a time, when that area is enlarged to the screen size the sharpness of the picture is significantly reduced.
  • with the embodiment of the present invention, a single panoramic video is divided into multiple channels and stored as multiple independent videos, and the effective portion to decode is selected according to the current viewing angle of the eye; this saves the waste of resources caused by unnecessary decoding operations and focuses decoding on the effective portion.
  • the smaller the decoded video area, the less computation is consumed; unnecessary decoding work is avoided and decoding efficiency improves, so the achievable decoding resolution is greatly increased.
  • a spatial angle formed by the user's current line of sight acting on the video frame is detected (102).
  • a target area locked by the current line of sight in the display area is located according to the angle (103).
  • a sub video frame corresponding to the target area is obtained (104); specifically, it is obtained according to the video numbers assigned when the video frame was divided into at least two sub video frames.
  • the sub video frame is decoded (105) according to the decoding strategy.
  • only the image of the specified area is decoded, that is, only the image corresponding to the target area locked by the user's current line of sight; the complete image in the video frame is divided (or split, or cut) into a plurality of sub video frames, where each sub video frame is a partial image of the complete image, that is, the plurality of sub video frames together can constitute one video frame.
  • the multiple sub video frames, which may also be referred to as multi-channel sub videos, are stored separately.
  • the specific expressions employed are not limited to the examples in this embodiment.
  • the division of the video frame may also be referred to as splitting of the video frame, cutting of the video frame, or segmentation of the video frame, and the specific expression manner is not limited to the examples in the embodiment.
  • for a video frame (such as a Chinese map or a world map), the above angle (a known spatial angle) and the above video numbers (assigned when the video picture was divided, cut, or split) determine which sub video frames of the corresponding image lie within the user's current line-of-sight area.
  • those sub video frames are decoded according to the decoding policy: only the images of the corresponding sub videos are decoded and rendered to present the VR panoramic video to the user, while whatever is not within the current line-of-sight region is ignored directly and not decoded.
  • in implementation, the first video frame to the i-th video frame are obtained, and each is divided in turn, yielding multiple first sub video frames corresponding to the first video frame through multiple i-th sub video frames corresponding to the i-th video frame.
  • the first video frame and the plurality of first sub video frames are consistent in length and/or number of frames
  • the ith video frame and the plurality of i th sub video frames are consistent in length and/or frame number.
  • the first video frame and the i-th video frame need not be consistent in length and/or number of frames. An angle formed by the user's current line of sight acting on the first video frame is detected, and the target area locked by the current line of sight in the first video frame is located according to the angle.
  • suppose the complete image of the first video frame is divided into 8 blocks, and the corresponding first sub video frames are numbered 1 to 8.
  • if the target area locked by the current line of sight is the area corresponding to the first sub video frames numbered 2, 3, 6, and 7, then, using the video numbers 2, 3, 6, and 7 assigned when the first video frame was divided into the multiple first sub video frames, the sub video frames corresponding to the target area are obtained from the storage location of the sub videos; finally, those sub video frames are decoded according to the decoding strategy (a worked sketch of this tile lookup follows below).
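  • A worked Python sketch of this example (the grid layout, rectangle coordinates, and row-major numbering are assumptions for illustration): a 4x2 grid numbered 1-4 on the top row and 5-8 on the bottom, with a roughly centred target area, yields exactly the sub videos 2, 3, 6, and 7:

```python
# Which numbered sub videos does a pixel-space target area overlap?
# Assumed layout of the 8 blocks (row-major):
#   1 2 3 4
#   5 6 7 8
FRAME_W, FRAME_H = 3840, 2160
COLS, ROWS = 4, 2
TILE_W, TILE_H = FRAME_W // COLS, FRAME_H // ROWS  # 960 x 1080

def tiles_for_rect(x0: int, y0: int, x1: int, y1: int) -> list:
    """Return the numbers of all sub videos overlapped by the rectangle
    (x0, y0)-(x1, y1) given in pixels of the full panorama."""
    c0, c1 = x0 // TILE_W, min(x1 // TILE_W, COLS - 1)
    r0, r1 = y0 // TILE_H, min(y1 // TILE_H, ROWS - 1)
    return sorted(r * COLS + c + 1
                  for r in range(r0, r1 + 1)
                  for c in range(c0, c1 + 1))

# A centred target area spanning the middle of the panorama:
print(tiles_for_rect(1280, 500, 2500, 1600))  # -> [2, 3, 6, 7]
```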
  • in this way, the panoramic video quality can be improved by a factor of 2 or more, and unnecessary waste of computing resources is avoided.
  • the video information processing method of the embodiment of the present invention includes: acquiring a partitioning granularity parameter, where the partitioning granularity parameter represents a threshold value, or a threshold adjustable range, for the number of sub video frames used when dividing the video frame into the at least two sub video frames.
  • with a threshold value, different partitioning granularity parameters correspond to different thresholds; the threshold value may be a determined fixed value, and one of the fixed values is selected for dividing the video frame.
  • with a threshold adjustable range, the partitioning granularity parameter fluctuates within a threshold interval and the threshold need not be a determined fixed value: the threshold can change like a sliding window, and a threshold is randomly selected within the interval for dividing the video frame.
  • Sliding window is a kind of control technology.
  • the threshold is related to the current computing power.
  • the threshold is randomly selected according to the computing power.
  • the sliding window is used to achieve coordinated control between the threshold and the current computing power, ensuring that the selected threshold is accurate enough to balance picture quality against computing power. If threshold selection and computing power were handled independently, each without considering the other's situation, problems could occur: when the current computing power is strong, picture quality can already be ensured and a finer partitioning granularity is unnecessary unless higher definition is wanted; when computing power is very poor, a finer partitioning granularity must be chosen to ensure picture sharpness. A threshold selected within the range is therefore a more accurate measure for ensuring clarity than a partitioning granularity parameter with a fixed threshold.
  • as an example, the same video frame (such as a Chinese map or a world map) yields different numbers of sub video frames under different partitioning granularity parameters; for instance, it can be divided into 8 sub video frames under one parameter and into a different number under another.
  • the parameters are adjustable, with different thresholds giving different granularity.
  • the granularity parameter is not static and can be adjusted according to the actual picture quality or presentation requirements. For example, if the current picture is detected to be very clear, a lower partitioning granularity can be chosen; conversely, a higher granularity can be chosen. Likewise, if the user's own requirements on picture quality are not high (no need for 1080P, ultra-clear, or Blu-ray quality), or the current network is unstable, the system can select a lower granularity upon monitoring the network instability; conversely, a higher granularity can be selected (see the selection sketch below).
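  • A hedged Python sketch of this selection (the candidate tile counts, the compute score, and the bias formula are illustrative assumptions, not values from the patent):

```python
import random

def choose_tile_count(granularity, compute_score=None):
    """Pick the number of sub videos to divide into.

    granularity is either a fixed int threshold, or a (low, high) threshold
    adjustable range. With a range, the threshold moves like a sliding
    window: weak hardware or an unstable network biases the pick towards a
    finer division (more tiles, so decoding can stay on a small area), while
    strong hardware can afford a coarser one.
    """
    if isinstance(granularity, int):      # determined fixed value
        return granularity
    low, high = granularity               # threshold adjustable range
    if compute_score is None:
        return random.randint(low, high)  # random pick within the interval
    span = high - low                     # compute_score in [0, 1]; 0 = weak
    centre = high - int(span * compute_score)
    jitter = max(1, span // 4)
    return max(low, min(high, centre + random.randint(-jitter, jitter)))

print(choose_tile_count(8))             # fixed threshold: always 8 sub videos
print(choose_tile_count((4, 16)))       # random within the range
print(choose_tile_count((4, 16), 0.9))  # strong hardware: coarser division
```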
  • the video information processing method of the embodiment of the present invention includes: when a threshold value is obtained according to the partitioning granularity parameter, dividing the video frame into the number of sub video frames corresponding to the current threshold.
  • the format of the sub video frame and the video frame satisfies a decoding strategy.
  • the remaining details of this embodiment are as described above: the divided sub videos are consistent with the original video in play length and in number of frames; the angle formed by the user's current line of sight acting on the video frame is detected, the locked target area is located according to the angle, the sub video frames corresponding to the target area are obtained by their video numbers, and only those sub video frames, each a partial image of the complete image, are decoded.
  • the video information processing method of the embodiment of the present invention includes: when a threshold adjustable range is obtained according to the partitioning granularity parameter, randomly selecting a threshold from the range and dividing the video frame into the corresponding number of sub video frames according to the selected threshold.
  • the format of the sub video frame and the video frame satisfies a decoding strategy.
  • the remaining details of this embodiment are likewise as described above: the sub videos match the original video in play length and frame count, the target area locked by the current line of sight is located by the detected angle, the corresponding sub video frames are obtained by video number, and only those partial images are decoded, which focuses decoding on the effective portion and greatly improves the achievable decoding resolution.
  • an example: the video frame corresponds to a whole image (the entire picture), and the at least two sub video frames are partial images within that whole image; the video frame is cut into at least two sub video frames, each a partial picture of the entire image.
  • when the sub video frames and the video frame are consistent in play length and/or number of frames, the formats of the sub video frames and the video frame satisfy the decoding strategy.
  • that is, the formats of the sub video frames and the video frame meet a preset decoding strategy: for example, the video frame and any one of the multiple sub videos obtained by dividing it are consistent in play length, or consistent in the number of frames.
  • the video frame is divided into at least two sub video frames, the at least two sub video frames are stored separately, and video numbers are assigned to them so that the sub video frames can be queried after the target area is subsequently locked.
  • the at least two sub video frames may also be compressed before separate storage; in that case, when the sub video frames are queried by video number after the target area is locked, the corresponding sub video frames are first decompressed and then decoded according to the decoding strategy.
  • the video information processing method of the embodiment of the present invention includes: storing the at least two sub video frames separately; creating index information according to the frame type and storage address offset of the at least two sub video frames; and using the video number corresponding to each sub video frame as the index key of the index information.
  • the video number is the number obtained when the video frame was divided into the at least two sub video frames. In the process of obtaining the sub video frame corresponding to the target area according to that video number: 1) the frame type and storage address offset are queried from the index information according to the video number, and the video type of the sub video frame is identified according to the frame type, where different video types adopt different decoding strategies; 2) the storage location of the sub video frame is located according to the storage address offset, and the sub video frame is read from that location (a sketch of such an index follows below).
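  • A minimal Python sketch of such index information (field names and values are assumptions for illustration; a real implementation would read them from the custom file header described later):

```python
from dataclasses import dataclass

@dataclass
class FrameEntry:
    frame_type: str  # e.g. "I", "P", "B": decides the decoding strategy
    offset: int      # byte offset of the frame inside the sub video file

# Index keyed by the video number assigned when the frame was divided.
index = {
    2: [FrameEntry("I", 0), FrameEntry("P", 8192), FrameEntry("P", 11004)],
    3: [FrameEntry("I", 0), FrameEntry("P", 7680), FrameEntry("B", 10240)],
}

def locate_frame(video_number: int, frame_index: int) -> FrameEntry:
    """Step 1: query frame type and storage offset by video number.
    Step 2 is performed by the caller: seek to .offset in the sub video's
    file, read the frame, and pick a decoding strategy from .frame_type."""
    return index[video_number][frame_index]

entry = locate_frame(2, 1)
print(entry.frame_type, entry.offset)  # -> P 8192
```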
  • the video information processing method of the embodiment of the invention includes: 1) first positioning: a first operation is acquired, for example by extracting feature points such as the pupils and the inner and outer eye corners through face detection and applying a line-of-sight model, to obtain the position of the human eye's line of sight; from the first angle obtained by the first operation, for example by continuously detecting the angle of the human eye's line of sight, the first target area locked by the current line of sight in the video frame can be located; 2) second positioning: the shift of the human eye's line of sight from its resting position to the next position is acquired and the next position is located; a change from the first operation to a second operation is detected, for example the user's head rotation or eyeball rotation moving the user's current line of sight, and according to the angle change of the second operation relative to the first operation, the second target area locked after the current line of sight moves in the video frame is located, so that frame synchronization is implemented by the secondary positioning; 3) frame synchronization is performed on the storage address offsets in the index information according to the sub video frame offsets corresponding to the first target area and the second target area.
  • in VR mode, after on-demand decoding of video areas is adopted, each small video skips forward while the picture plays; a given sub video may not need to be decoded at the beginning, and only after a few seconds of playing, when the line of sight moves onto it, does its decoding need to start.
  • an example of the embodiment of the present invention: in the first positioning, the local area involves, say, frames 3, 4, and 5; when the user's line of sight moves, an offset is generated and a second positioning is required, and the frame offset generated by the line-of-sight movement must be frame synchronized. Since frame synchronization is realized, no matter how the user's current line of sight changes, the user operation can be captured accurately, the target area locked after the line-of-sight movement and the sub video frames corresponding to that target area are located accurately, and, according to the index information stored with the sub video frames, the sub video frames corresponding to the target area are read accurately from the storage location for subsequent decoding.
  • for frame synchronization there are two possibilities: 1. the sub video frame sequence 2 involved in the second positioning and the sub video frame sequence 1 involved in the previous first positioning are consecutive frames, and normal decoding is performed; 2. the two sequences are discontinuous frames, which raises the problem of frame-skip decoding; frame-skip decoding can then be implemented without adversely affecting normal decoding operations.
  • decoding the sub video frame according to the decoding policy thus involves multiple positioning, frame synchronization, and frame-skip decoding. For example, with the eye located at the center of the ball, the VR panoramic picture can be seen; for sensed positioning, the mobile phone's own sensor or an external device sensor is used to calculate a spatial angle, and this angle is then applied as the 3D view-control angle, completing sensor-based control of the viewing angle.
  • a default latitude and longitude can be used: taking the video frame as a map, the first positioning is at the center coordinate point of the map, and the second positioning tracks the line-of-sight offset as the eyeball or head moves (see the positioning sketch below).
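  • A small Python sketch of this two-stage positioning (sensor readings arriving as yaw/pitch deltas is an illustrative simplification, not the patent's sensor API):

```python
class GazeTracker:
    """First positioning starts at the map's centre coordinate; second
    positioning tracks the line-of-sight offset as the head or eyes move."""

    def __init__(self):
        self.longitude = 0.0  # first positioning: centre of the map
        self.latitude = 0.0

    def on_sensor_delta(self, d_yaw: float, d_pitch: float):
        """Apply the angle change formed by head or eyeball rotation,
        keeping the coordinates on the panorama."""
        self.longitude = (self.longitude + d_yaw + 180.0) % 360.0 - 180.0
        self.latitude = max(-90.0, min(90.0, self.latitude + d_pitch))
        return self.longitude, self.latitude

tracker = GazeTracker()
print(tracker.on_sensor_delta(30.0, -10.0))  # gaze moved right and down
```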
  • in this embodiment, the key-frame interval, i.e., the length of the group of pictures (GOP), is not fixed and can be adjusted dynamically: when decoding fails or frames are skipped, the GOP can be adjusted to its minimum value, avoiding the poor decoding efficiency that decoding failure or frame-skip decoding would otherwise cause.
  • the GOP is dynamically set as small as possible, and the type and starting offset of each frame can be recorded in the header of the customized file storage format.
  • when decoding of a sub video frame fails, the key-frame interval GOP of the at least two sub video frames is adjusted to the minimum value (GOP_min) among the GOP preset values, and the sub video frame is decoded according to GOP_min.
  • when the sub video frames are discontinuous, frame-skip decoding is performed: the key-frame interval GOP of the at least two sub video frames is likewise adjusted to GOP_min, and the sub video frames are decoded according to GOP_min.
  • an example of an embodiment of the present invention: a video codec (for example, H.264) must read frame data continuously in order to decode normally. If the decoder fails to decode, say, the 5th frame, or frames are intentionally skipped (the invisible area during VR playback is deliberately ignored), normal decoding cannot resume until the start of the next GOP.
  • this problem of decoding failure or frame-skip decoding can be solved by reducing the interval between video key frames; specifically, the key-frame interval GOP of the at least two sub video frames is adjusted to GOP_min, that is, a relatively small GOP value is used.
  • using a relatively small GOP ensures that, after some frames are skipped, the number of failures before decoding succeeds again is relatively small, thereby avoiding the adverse effect on normal decoding when frame skips or decoding failures reach a preset count (a sketch of this adjustment follows below).
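  • A hedged Python sketch of this GOP adjustment (the preset values and the decoder object with decode/set_gop methods are stand-ins, not the patent's API):

```python
GOP_PRESETS = [32, 16, 8, 4]
GOP_MIN = min(GOP_PRESETS)  # the minimum value GOP_min among the presets

def decode_sub_video(decoder, frame_numbers):
    """Decode the listed frames of one sub video; the list may be
    discontinuous because invisible frames were dropped by the caller."""
    prev = None
    for n in frame_numbers:
        skipped = prev is not None and n != prev + 1  # discontinuous frames
        ok = decoder.decode(n)  # stand-in: returns False on decode failure
        if skipped or not ok:
            # Decode failure or frame skip: shrink the key-frame interval so
            # the next restart point (key frame) is at most GOP_MIN away.
            decoder.set_gop(GOP_MIN)
        prev = n
```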
  • the processing logic formed by the policies and processes of the above embodiments may be implemented in advance through a customizable-decoding-area capability added to the video decoder; that is to say, the video decoder itself supports customized decoding of a specified target area.
  • the video information processing system of the embodiment of the present invention includes a terminal 41 and a server 42.
  • the terminal 41 can adopt different VR head-mounted display terminals, such as a VR helmet, VR glasses composed of hardware entities, or VR glasses used with a mobile phone (which may be foldable cardboard glasses or non-foldable glasses, i.e., VR glasses composed of hardware entities), and the like.
  • Various video files are stored in the server 42. Through the interaction between the terminal 41 and the server 42, the video files to be played can be downloaded from the server 42 in real time or in advance offline.
  • the processing is performed by the dividing unit 411, the detecting unit 412, the first processing unit 413, the second processing unit 414, and the decoding unit 415 in the terminal 41.
  • the dividing unit 411 is configured to acquire a video frame and divide the video frame into at least two sub video frames, where the formats of the sub video frames and the video frame meet the decoding strategy; the detecting unit 412 is configured to detect the spatial angle formed by the current line of sight of the human eye acting on the display area of the video frame; the first processing unit 413 is configured to locate the target area locked by the current line of sight in the display area according to the angle; the second processing unit 414 is configured to acquire the sub video frames corresponding to the target area, for example according to the video numbers obtained when the video frame was divided into at least two sub video frames; and the decoding unit 415 is configured to decode the sub video frames according to the decoding policy.
  • the positioning and decoding process is as described above: the known spatial angle and the video numbers determine which sub video frames lie within the user's current line-of-sight area, and only those sub video frames are decoded and rendered while content outside the line of sight is ignored; for example, with the first video frame divided into 8 numbered blocks and the current line of sight locking the areas numbered 2, 3, 6, and 7, the corresponding sub video frames are fetched from storage by video number and decoded according to the decoding strategy.
  • the dividing unit is further configured to: obtain a partitioning granularity parameter, where the partitioning granularity parameter represents a threshold for the number of sub video frames used when dividing the video frame into the at least two sub video frames, or a threshold adjustable range;
  • when a threshold value is obtained, divide the video frame into the number of sub video frames corresponding to the current threshold;
  • when a threshold adjustable range is obtained, randomly select a threshold from the range and divide the video frame into the corresponding number of sub video frames according to the selected threshold.
  • the video frame corresponds to an entire image, and the at least two sub video frames are partial images within that entire image; when the sub video frames and the video frame are consistent in play length and/or number of frames, the formats of the sub video frames and the video frame satisfy the decoding strategy.
  • The terminal further includes a storage unit configured to store the at least two sub video frames separately and independently.
  • An index creation unit is configured to create index information from the frame types and storage address offsets of the at least two sub video frames, using the video number corresponding to each sub video frame as the index key of the index information; the video numbers are those obtained when the video frame was divided into the at least two sub video frames.
  • The second processing unit is further configured to: query the frame type and storage address offset from the index information according to the video number; identify the video type of the sub video frame from the frame type; locate the storage location of the sub video frame from the storage address offset; and read the sub video frame from that storage location.
  • The terminal further includes: a first positioning unit configured to continuously detect the angle of the human eye's line of sight from its resting position and locate the first target area locked by the current line of sight in the video frame; and a second positioning unit configured to, when the line of sight shifts, locate, according to the change in the angle of the line of sight, the second target area locked after the current line of sight moves in the video frame.
  • A frame synchronization unit is configured to frame-synchronize the storage address offsets in the index information according to the sub video frame offsets corresponding to the first target area and the second target area, as in the sketch below.
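As a hedged illustration of such frame synchronization, the sketch below assumes a per-frame index mapping frame numbers to file offsets and a fixed frame rate; both are simplifications of the custom format described here, and all names are invented for this sketch:

```python
# Sketch of frame synchronization when the gaze moves from one target
# area to another: a tile that becomes visible mid-playback must start
# decoding at the frame matching the global play position, located via
# its per-frame offset index (frame number -> file offset).
def sync_offset(index, play_time_s, fps=30):
    frame_no = int(play_time_s * fps)       # e.g. 5 s * 30 fps = frame 150
    return frame_no, index[frame_no]

index = {n: n * 4096 for n in range(300)}   # toy index: fixed frame size
print(sync_offset(index, 5.0))              # -> (150, 614400)
```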
  • The decoding unit is further configured to: when decoding of a sub video frame fails, adjust the GOP (group of pictures) of the video key frames in the at least two sub video frames to the minimum value GOP_min among the GOP preset values and decode the sub video frame according to GOP_min; and, when the sub video frame is a discontinuous frame, perform frame-skipping decoding, likewise adjusting the key-frame interval GOP to GOP_min and decoding the sub video frame according to GOP_min (see the sketch below).
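The control flow could look roughly like the following sketch, in which the decoder is mocked and `GOP_MIN` stands in for the GOP preset minimum; it only illustrates how a shrunken GOP lets decoding resume quickly after a skipped or failed frame:

```python
# Sketch of the decode fallback: on a failed or skipped frame, drop the
# keyframe interval to GOP_min so decoding can resync quickly. The
# decoder itself is mocked; only the control flow is illustrated.
GOP_MIN = 4

def decode_tile(frames, gop):
    out, next_key = [], 0
    for n, frame in enumerate(frames):
        if frame is None:                   # skipped / failed frame
            gop = GOP_MIN                   # shrink GOP for fast recovery
            next_key = ((n // gop) + 1) * gop
            continue
        if n >= next_key:                   # decodable again at a keyframe
            out.append(frame)
    return out

frames = ["f%d" % i if i != 5 else None for i in range(12)]
print(decode_tile(frames, gop=30))  # resumes at frame 8 with GOP_min=4
```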
  • Taking the division of a video frame into eight sub video frames as an example, an embodiment of the present invention is described as follows:
  • First, the original ultra-HD VR video source is split into multiple independently stored videos; for example, a standard 4K video, i.e. a 3840x2160 video, is split into eight 960x1080 videos, as shown in FIG. 5 (the grid geometry is sketched below).
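For illustration, the geometry of this split can be computed as below; a 4x2 grid is assumed (the tile size 960x1080 fixes it), and each rectangle could then drive an external cropper or encoder:

```python
# Sketch: compute the crop rectangles that split a 3840x2160 frame into
# eight 960x1080 tiles (a 4x2 grid), matching FIG. 5. Only the geometry
# is shown here.
def crop_rects(w=3840, h=2160, cols=4, rows=2):
    tw, th = w // cols, h // rows
    return [(c * tw, r * th, tw, th)        # (x, y, width, height)
            for r in range(rows) for c in range(cols)]

for n, rect in enumerate(crop_rects(), start=1):
    print(n, rect)   # 1 (0, 0, 960, 1080) ... 8 (2880, 1080, 960, 1080)
```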
  • The video picture is cut, but each small video file has the same length and number of frames as the original video.
  • These 8 sub videos are then stored in a custom format with the GOP kept as small as possible; the custom-format file header records the type and starting offset of every frame (see the sketch below).
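A minimal sketch of such a header, assuming one fixed-size record per frame; the actual record layout is not specified in this description, so the field sizes and the type encoding below are assumptions:

```python
import struct

# Sketch of the custom container header: one fixed-size record per frame
# holding the frame type (here, 0 = key frame, 1 = delta frame) and its
# byte offset in the file, so any frame can be located directly.
RECORD = struct.Struct("<BQ")               # 1-byte type, 8-byte offset

def pack_index(frames):                     # frames: [(type, offset), ...]
    return b"".join(RECORD.pack(t, off) for t, off in frames)

def read_record(blob, frame_no):
    return RECORD.unpack_from(blob, frame_no * RECORD.size)

blob = pack_index([(0, 0), (1, 9000), (1, 12000), (0, 15000)])
print(read_record(blob, 3))                 # -> (0, 15000): key frame
```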
  • The video picture is a world map, as shown in FIG. 6; the VR rendering principle is to render this special picture onto a sphere.
  • On the sphere of FIG. 7, which carries no rendered texture, the latitude and longitude lines are clearly visible.
  • In FIG. 8 the texture has been applied but only the outlined sphere is rendered; because the texture attaches to the sphere, part of the latitude and longitude grid is covered, and the outline in FIG. 8 matches the outline of the final rendered image in FIG. 9.
  • The VR video picture is thus successfully rendered onto a sphere, as shown in FIG. 9; with VR technology, if the eye is located at the center of the sphere of FIG. 9, the VR panorama of FIG. 10 can be seen.
  • When the human eye looks at the world, the vertical or horizontal viewing angle can never exceed 180 degrees. The same holds in the computer: what can usually be seen is an arc-shaped area of about 100 degrees. So, at any moment, only a small part of the world map above is actually seen by the eyes.
  • When the head rotates, a spatial angle is computed directly with the phone's built-in sensor or an external device sensor, and that angle is applied to the 3D view control, which completes the sensor-controlled viewing angle.
  • From the known spatial angle and the segmentation numbers of the video picture, it is possible to calculate which of the numbered sub-pictures of the current world map lie within the line-of-sight area, as sketched below.
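A rough sketch of that calculation, assuming an equirectangular world-map picture divided into a 4x2 grid and a view of about 100 degrees; the border sampling and the mapping of longitude/latitude to tile columns and rows are simplifications made for this illustration:

```python
# Sketch: given a viewing yaw/pitch and a horizontal field of view of
# about 100 degrees, decide which tiles of an equirectangular picture
# (4x2 grid, numbered 1..8) fall inside the sight area, by sampling a
# few points across the view and mapping each to a tile.
def visible_tiles(yaw_deg, pitch_deg, fov_deg=100, cols=4, rows=2):
    tiles = set()
    for dx in (-fov_deg / 2, 0, fov_deg / 2):
        for dy in (-fov_deg / 4, 0, fov_deg / 4):
            lon = (yaw_deg + dx) % 360            # 0..360 east
            lat = max(-90, min(90, pitch_deg + dy))
            col = int(lon / (360 / cols)) % cols
            row = min(rows - 1, int((90 - lat) / (180 / rows)))
            tiles.add(row * cols + col + 1)
    return sorted(tiles)

print(visible_tiles(yaw_deg=45, pitch_deg=0))  # tiles around the gaze
```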
  • Only the images of the corresponding sub videos are then decoded and rendered; sub-pictures outside the range are simply ignored.
  • The computational cost of the decoder is proportional to the image area: the smaller the area, the less computation is consumed, so unnecessary decoding work can be saved.
  • The picture-quality bottleneck has two causes: 1) the video decoding performance of ordinary mobile devices is limited, with 1080P as the practical ceiling; 2) panoramic video requires a higher-definition picture, such as 4K or 8K. The picture actually decoded for a panoramic video at present is shown in FIG. 11.
  • The area seen in the VR head-mounted display terminal is the target area marked A1 in FIG. 11, shown in FIG. 12; at the current moment this target area actually occupies only a small share of the entire picture.
  • Existing coding technology and hardware processing performance cannot directly let a mobile device smoothly decode 4K or 8K video; with the embodiments of the present invention, however, decoding efficiency can be improved on current hardware.
  • When playing the current frame, the background computation only needs to decode the picture of the target area marked A1; even with some redundancy added, about 50% of useless processing can be avoided.
  • The picture is cut into 8 blocks which are compressed into new videos, i.e. the video frame is cut into 8 sub video frames that are compressed and stored independently, as shown in FIG. 13.
  • The target area marked A1 is composed of the areas numbered 3, 4, 7, and 8, so it corresponds to the sub video frames of those areas; during playback, only the current frame of sub videos 3, 4, 7, and 8 needs to be decoded.
  • Frame synchronization and frame-skipping decoding across the multiple video channels must also be considered.
  • Suppose a large 1000x1000 video, 10 seconds long with 300 frames in total, is split into 16 small videos;
  • each small video then has a resolution of 250x250, is also ten seconds long, and likewise has 300 frames in total.
  • A typical video codec (such as H.264) must read frame data continuously in order to decode normally. If the decoder fails on the 5th frame or deliberately skips it (invisible areas are deliberately skipped during VR playback), then decoding frames 6, 7, 8, 9, ... will fail until the start of the next GOP.
  • This problem can be solved by reducing the GOP, i.e. using a smaller GOP value.
  • A relatively small GOP ensures that, after some frames are skipped, the number of failures before decoding succeeds again is small, avoiding the problems caused by repeated decoding failures or frame-skipping decoding.
  • After on-demand decoding of video regions in VR mode, each small video plays its pictures in jumps. Video No. 1 may not need decoding at first; after a few seconds of playback the line of sight moves, decoding of video No. 1 must begin, and the playback start time is the 5th second. In this case the position of the frame at the 5th second must be found very accurately; otherwise the pictures of the different video channels cannot be synchronized.
  • With the custom video file storage format described above, an index of all frames of the video is added to the file header; the index records each frame's type and file address offset, so any frame can be quickly located, read, and decoded from the index records, achieving precise frame alignment, as sketched below.
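Illustratively, with such an index a precise seek could proceed as sketched below; the record layout and the convention that type 0 marks a key frame are assumptions carried over from the earlier header sketch:

```python
# Sketch of precise frame alignment using the header index: to start a
# newly visible tile at second 5 of a 30 fps stream, look up frame 150's
# record, then decode from the nearest preceding key frame.
def seek_frame(records, target_no):
    # records: list of (frame_type, offset); type 0 marks a key frame.
    key = max(n for n, (t, _) in enumerate(records[:target_no + 1]) if t == 0)
    return key, records[key][1]             # decode from here up to target

records = [(0, 0)] + [(1, 100 * n) for n in range(1, 150)] + [(0, 15000)]
print(seek_frame(records, target_no=150))   # -> (150, 15000)
```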
  • The terminal includes a processor 61 and a memory for storing a computer program capable of running on the processor; one representation of the memory is the computer storage medium 63 shown in FIG. 14,
  • and the terminal also includes a bus 62 for data communication.
  • When running the computer program, the processor executes:
  • acquiring a video frame and dividing it into at least two sub video frames, where the formats of the sub video frames and the video frame satisfy a decoding policy; detecting the spatial angle formed by the current line of sight of the human eye acting on the display area of the video frame; locating, according to the angle, the target area locked by the current line of sight in the display area; acquiring the sub video frame corresponding to the target area;
  • and decoding the sub video frame according to the decoding policy.
  • When running the computer program, the processor also executes: when the threshold size is obtained from the partition granularity parameter, dividing the video frame into the sub video frames corresponding to the current threshold; and, when the adjustable threshold range is obtained, randomly selecting a threshold from the range and dividing the video frame into the corresponding number of sub video frames.
  • The processor also executes: treating the formats of the sub video frames and the video frame as satisfying the decoding policy when they are consistent in playback length and/or number of frames.
  • The processor also executes: creating index information from the frame types and storage address offsets of the at least two sub video frames, using the video number corresponding to each sub video frame as the index key of the index information;
  • the video numbers being those obtained when the video frame was divided into the at least two sub video frames.
  • The processor also executes: reading the sub video frame from the storage location.
  • The processor also executes: continuously detecting the angle of the human eye's line of sight and locating the first target area locked by the current line of sight in the video frame.
  • The processor also executes: when decoding of the sub video frame fails, adjusting the GOP of the video key frames in the at least two sub video frames to the minimum value GOP_min among the GOP preset values and decoding the sub video frame according to GOP_min;
  • and, when the sub video frame is a discontinuous frame,
  • performing frame-skipping decoding, adjusting the key-frame GOP of the at least two sub video frames to GOP_min, and decoding the sub video frame according to GOP_min.
  • A computer storage medium stores computer-executable instructions for executing:
  • acquiring a video frame and dividing it into at least two sub video frames, where the formats of the sub video frames and the video frame satisfy a decoding policy;
  • and decoding the sub video frame according to the decoding policy.
  • The computer-executable instructions are also used to execute:
  • when the threshold size is obtained from the partition granularity parameter, dividing the video frame into the sub video frames corresponding to the current threshold;
  • and, when the adjustable threshold range is obtained, randomly selecting a threshold from the range and dividing the video frame into the corresponding number of sub video frames.
  • The computer-executable instructions are also used to execute:
  • treating the formats of the sub video frames and the video frame as satisfying the decoding policy when they are consistent in playback length and/or number of frames.
  • The computer-executable instructions are also used to execute:
  • creating index information from the frame types and storage address offsets of the at least two sub video frames, using the video number corresponding to each sub video frame as the index key of the index information;
  • the video numbers being those obtained when the video frame was divided into the at least two sub video frames.
  • The computer-executable instructions are also used to execute:
  • reading the sub video frame from the storage location.
  • The computer-executable instructions are also used to execute:
  • continuously detecting the angle of the human eye's line of sight and locating the first target area locked by the current line of sight in the video frame.
  • The computer-executable instructions are also used to execute:
  • when decoding of the sub video frame fails, adjusting the GOP of the video key frames in the at least two sub video frames to the minimum value GOP_min among the GOP preset values and decoding the sub video frame according to GOP_min;
  • and, when the sub video frame is a discontinuous frame, performing frame-skipping decoding, adjusting the key-frame GOP to GOP_min, and decoding the sub video frame according to GOP_min.
  • A video information processing method is performed by a terminal comprising one or more processors, a memory, and one or more programs,
  • where the one or more programs are stored in the memory, each program may include one or more units each corresponding to a set of instructions, and the one or more processors are configured to execute the instructions; the method includes:
  • acquiring a video frame and dividing it into at least two sub video frames, where the formats of the sub video frames and the video frame satisfy a decoding policy;
  • and decoding the sub video frame according to the decoding policy.
  • Acquiring a video frame and dividing it into at least two sub video frames includes:
  • when the threshold size is obtained from the partition granularity parameter, dividing the video frame into the sub video frames corresponding to the current threshold;
  • and, when the adjustable threshold range is obtained, randomly selecting a threshold from the range and dividing the video frame into the corresponding number of sub video frames.
  • The video frame corresponds to a whole image, and the at least two sub video frames are partial images within that whole image;
  • when the sub video frames and the video frame are consistent in playback length and/or number of frames, their formats satisfy the decoding policy.
  • The method further includes:
  • creating index information from the frame types and storage address offsets of the at least two sub video frames, using the video number corresponding to each sub video frame as the index key of the index information;
  • the video numbers being those obtained when the video frame was divided into the at least two sub video frames.
  • Acquiring the sub video frame corresponding to the target area includes:
  • reading the sub video frame from the storage location.
  • The method further includes:
  • continuously detecting the angle of the human eye's line of sight and locating the first target area locked by the current line of sight in the video frame.
  • Decoding the sub video frame according to the decoding policy includes:
  • when decoding of the sub video frame fails, adjusting the group-of-pictures length GOP of the video key frames in the at least two sub video frames to the minimum value GOP_min among the GOP preset values and decoding the sub video frame according to GOP_min;
  • and, when the sub video frame is a discontinuous frame, performing frame-skipping decoding, adjusting the key-frame interval GOP of the at least two sub video frames to GOP_min, and decoding the sub video frame according to GOP_min.
  • In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners.
  • The device embodiments described above are merely illustrative;
  • for example, the division into units is only a division by logical function,
  • and in actual implementation there may be other division manners: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • The coupling, direct coupling, or communication connections between the components shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or of other forms.
  • The units described above as separate components may or may not be physically separate, and components displayed as units may or may not be physical units, i.e. they may be located in one place or distributed over multiple network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiments.
  • The functional units in the embodiments of the present invention may all be integrated into one processing unit, or each unit may serve separately as one unit, or two or more units may be integrated into one unit;
  • the integrated unit may be implemented in the form of hardware or in the form of hardware plus software functional units.
  • The foregoing program may be stored in a computer-readable storage medium; when executed, the program performs
  • the steps of the foregoing method embodiments; and the foregoing storage medium includes various media that can store program code, such as a removable storage device, a ROM, a RAM, a magnetic disk, or an optical disk.
  • Alternatively, if the above integrated unit of the present invention is implemented in the form of a software functional unit and sold or used as a standalone product, it may also be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solutions of the embodiments of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium and including several instructions
  • for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present invention.
  • The foregoing storage medium includes various media that can store program code, such as a removable storage device, a ROM, a RAM, a magnetic disk, or an optical disk.
  • With the embodiments of the present invention, the target area is locked by angle detection and angle-based positioning, and the sub video frame corresponding to the target area is obtained. Since a sub video frame is a partial image of the full image in the video frame, decoding the sub video frame instead of the whole video improves decoding efficiency, and the improved decoding efficiency improves picture sharpness, so that picture clarity is guaranteed and greatly enhanced.

Abstract

The present invention discloses a video information processing method, a terminal, and a computer storage medium. The method includes: acquiring a video frame and dividing the video frame into at least two sub video frames, the formats of the sub video frames and the video frame satisfying a decoding policy; detecting the spatial angle formed by the current line of sight of the human eye acting on the display area of the video frame; locating, according to the angle, the target area locked by the current line of sight in the display area; acquiring the sub video frame corresponding to the target area; and decoding the sub video frame according to the decoding policy.

Description

Video information processing method, terminal, and computer storage medium
Cross-Reference to Related Applications
This application is based on and claims priority to Chinese Patent Application No. 201710289910.X, filed on April 27, 2017, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to information processing technologies, and in particular to a video information processing method, a terminal, and a computer storage medium.
Background
With the increasing intelligence of terminals and the development of imaging and Internet technologies, virtual reality (VR) has very good prospects in the mobile field, but the clarity of what it presents is still unsatisfactory. Ordinary existing mobile terminals cannot decode video formats above 1080P. This is not a problem for ordinary flat playback; that is, the decoded clarity does not affect normal display on the terminal. However, when a video is played in 360-degree panoramic mode, the human eye can only see one third of it or less, and when that region is enlarged to screen size the clarity of the picture drops noticeably; in addition, VR glasses themselves contain magnifying lenses with curved surfaces that enlarge the picture further, making the clarity even worse.
However, in the related art there is still no effective solution to this problem of unguaranteed picture clarity.
Summary
In view of this, embodiments of the present invention provide a video information processing method, a terminal, and a computer storage medium that at least solve the problems of the prior art.
A video information processing method of an embodiment of the present invention includes:
acquiring a video frame and dividing the video frame into at least two sub video frames, the formats of the sub video frames and the video frame satisfying a decoding policy;
detecting the spatial angle formed by the current line of sight of the human eye acting on the display area of the video frame;
locating, according to the angle, the target area locked by the current line of sight in the display area;
acquiring the sub video frame corresponding to the target area;
decoding the sub video frame according to the decoding policy.
A terminal of an embodiment of the present invention includes:
a dividing unit configured to acquire a video frame and divide the video frame into at least two sub video frames, the formats of the sub video frames and the video frame satisfying a decoding policy;
a detecting unit configured to detect the spatial angle formed by the current line of sight of the human eye acting on the display area of the video frame;
a first processing unit configured to locate, according to the angle, the target area locked by the current line of sight in the display area;
a second processing unit configured to acquire the sub video frame corresponding to the target area;
a decoding unit configured to decode the sub video frame according to the decoding policy.
A terminal of an embodiment of the present invention includes a processor and a memory for storing a computer program capable of running on the processor, wherein the processor is configured to perform the video information processing method of any of the above solutions when running the computer program.
A computer storage medium of an embodiment of the present invention stores computer-executable instructions for performing the video information processing method of any of the above solutions.
A video information processing method of an embodiment of the present invention is performed by a terminal including one or more processors, a memory, and one or more programs, where the one or more programs are stored in the memory, each program may include one or more units each corresponding to a set of instructions, and the one or more processors are configured to execute the instructions; the method includes:
acquiring a video frame and dividing the video frame into at least two sub video frames, the formats of the sub video frames and the video frame satisfying a decoding policy;
detecting the spatial angle formed by the current line of sight of the human eye acting on the display area of the video frame;
locating, according to the angle, the target area locked by the current line of sight in the display area;
acquiring the sub video frame corresponding to the target area;
decoding the sub video frame according to the decoding policy.
With the embodiments of the present invention, after the video frame is divided into at least two sub video frames, the target area is locked through angle detection and angle-based positioning, and the sub video frame corresponding to the target area is obtained. Since a sub video frame is a partial image of the full image in the video frame, decoding the sub video frame rather than the whole video improves decoding efficiency, and the improved decoding efficiency improves picture sharpness, so that picture clarity is guaranteed and greatly enhanced.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an optional hardware structure of a mobile terminal implementing embodiments of the present invention;
FIG. 2 is a schematic diagram of the hardware entities exchanging information in an embodiment of the present invention;
FIG. 3 is a schematic flowchart of a method of an embodiment of the present invention;
FIG. 4 is a schematic diagram of a system architecture of an embodiment of the present invention;
FIG. 5 is a schematic diagram of a video frame in an application scenario of an embodiment of the present invention;
FIGS. 6 to 10 are schematic diagrams of scenarios in which images are rendered with VR technology according to embodiments of the present invention;
FIGS. 11 to 13 are schematic diagrams of video division in application scenarios of embodiments of the present invention;
FIG. 14 is a schematic diagram of a hardware structure of a terminal of an embodiment of the present invention.
Detailed Description
The implementation of the technical solutions is described in further detail below with reference to the accompanying drawings.
Mobile terminals implementing the embodiments of the present invention will now be described with reference to the drawings. In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only to facilitate the description of the embodiments and have no specific meaning by themselves; "module" and "component" may therefore be used interchangeably.
In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the invention. It will be apparent to those of ordinary skill in the art, however, that the invention may be practiced without these specific details. In other instances, well-known methods, procedures, components, circuits, and networks are not described in detail so as not to obscure aspects of the embodiments unnecessarily.
In addition, although the terms "first", "second", etc. are used repeatedly herein to describe various elements (or thresholds or applications or instructions or operations), these elements (or thresholds or applications or instructions or operations) should not be limited by these terms, which are used only to distinguish one element (or threshold or application or instruction or operation) from another. For example, a first operation could be called a second operation, and a second operation could be called a first operation, without departing from the scope of the invention; the first operation and the second operation are both operations, just not the same operation.
The steps in the embodiments of the present invention are not necessarily processed in the order described; steps may be selectively shuffled and rearranged, deleted, or added as required. The step descriptions in the embodiments are only optional sequence combinations and do not represent all possible step-order combinations; the step order in the embodiments should not be regarded as limiting the invention.
The term "and/or" herein refers to any and all possible combinations of one or more of the associated listed items. It should also be noted that, as used in this specification, "comprise/include" specifies the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The intelligent terminal (e.g. mobile terminal) of the embodiments of the present invention may be implemented in various forms. For example, the mobile terminals described in the embodiments may include mobile phones, smartphones, VR head-mounted display terminals, and the like, where VR head-mounted display terminals are not limited to VR glasses, VR goggles, VR helmets, etc. A VR head-mounted display terminal uses a head-mounted display to close off the user's vision and hearing from the outside world and guide the user to feel present in a virtual environment; its display principle is that the left and right eye screens display the images for the left and right eyes respectively, and the human eye produces a sense of depth in the mind after acquiring this differing information.
FIG. 1 is a schematic diagram of an optional hardware structure of a mobile terminal implementing embodiments of the present invention. The mobile terminal 100 is not limited to a mobile phone, a smartphone, a VR head-mounted display terminal, and the like.
When the mobile terminal 100 is a VR head-mounted display terminal, it may include: a wireless communication unit 110, a wireless Internet unit 111, a sensing unit 120, an acquisition unit 121, a dividing unit 130, a detecting unit 131, a first processing unit 132, a second processing unit 133, a decoding unit 134, a rendering and output unit 140, a display unit 141, a storage unit 150, an interface unit 160, a control unit 170, and a power supply unit 180. FIG. 1 shows a mobile terminal with various components, but it should be understood that not all illustrated components are required; more or fewer components may be implemented instead. The elements of the VR head-mounted display terminal are described in detail below.
The wireless communication unit 110 allows radio communication between the VR head-mounted display terminal and a wireless communication system or network. It may communicate and interact with a background server in many forms, such as broadcasting, Wi-Fi, or mobile communication (2G, 3G, or 4G). When broadcasting is used, broadcast signals and/or broadcast-related information may be received from an external broadcast management server via a broadcast channel, which may include a satellite channel and/or a terrestrial channel. The broadcast management server may be a server that generates and sends broadcast signals and/or broadcast-related information, or a server that receives previously generated broadcast signals and/or broadcast-related information and sends them to the terminal. Broadcast signals may include TV broadcast signals, radio broadcast signals, data broadcast signals, and so on, and may further include signals combined with TV or radio broadcast signals. Broadcast-related information may also be provided via a mobile communication network. Broadcast signals may exist in various forms, for example an Electronic Program Guide (EPG) of Digital Multimedia Broadcasting (DMB) or an Electronic Service Guide (ESG) of Digital Video Broadcasting-Handheld (DVB-H). Broadcast signals and/or broadcast-related information may be stored in the storage unit 150 (or other types of storage media). Wi-Fi is a technology that wirelessly interconnects terminals such as personal computers and mobile terminals (e.g. VR head-mounted display terminals and mobile phones); with Wi-Fi, the terminal can access a Wi-Fi hotspot and thus the Wi-Fi network. A hotspot is created by installing an access point on an Internet connection; the access point transmits the wireless signal over a short range, typically covering 300 feet. When a Wi-Fi-capable VR head-mounted display terminal encounters a hotspot, it can connect to the Wi-Fi network wirelessly. With mobile communication (2G, 3G, or 4G), radio signals are sent to and/or received from at least one of a base station (e.g. an access point, a Node B, etc.), an external terminal, and a server; such radio signals may include voice call signals, video call signals, or various types of data sent and/or received according to text and/or multimedia messages.
The wireless Internet unit 111 supports the various data transmission and communication technologies of the VR head-mounted display terminal, including wireless, for Internet access, and may be internally or externally coupled to the terminal. The wireless Internet access technologies involved may include Wireless Local Area Networks (WLAN), Wireless Broadband (Wibro), Worldwide Interoperability for Microwave Access (Wimax), High Speed Downlink Packet Access (HSDPA), and so on.
The sensing unit 120 is configured to check various user operations to obtain information such as spatial angle, distance, position, velocity, and acceleration; the sensing unit may be a gyroscope. The acquisition unit 121 is configured to acquire data, including video image data; data detected by the sensing unit may also be aggregated into the acquisition unit for data processing.
The dividing unit 130 is configured to acquire a video frame and divide the video frame into at least two sub video frames, the formats of the sub video frames and the video frame satisfying a decoding policy. The detecting unit 131 is configured to detect the spatial angle formed by the current line of sight of the human eye acting on the display area of the video frame. The first processing unit 132 is configured to locate, according to the angle, the target area locked by the current line of sight in the display area. The second processing unit 133 is configured to acquire the sub video frame corresponding to the target area. The decoding unit 134 is configured to decode the sub video frame according to the decoding policy.
The rendering and output unit 140 is configured to render the decoded data of the decoding unit into images and output them; besides images, decoding also yields the audio data of the corresponding images, which may be converted into audio signals and output as sound by the rendering and output unit or a dedicated audio output unit. The image data is provided to the display unit for display. The display unit 141 is configured to display the rendered image data, which may be shown in a relevant User Interface (UI) or Graphical User Interface (GUI).
The storage unit 150 is configured to store software programs for the processing and control operations executed by the control unit 170, or to temporarily store data that has been or is to be output (e.g. image data, sensing data, audio data, and so on). It may also store data on the various vibrations and audio signals output when a touch is applied to the touch screen. The storage unit may include at least one type of storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g. SD or DX memory), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Programmable Read Only Memory (PROM), magnetic memory, magnetic disk, optical disk, and so on. Moreover, the VR head-mounted display terminal may cooperate with a network storage device that performs the storage function of the storage unit 150 over a network connection.
The interface unit 160 may apply 2G, 3G, or 4G, wireless technologies, and the like, supporting high-speed data transmission and the simultaneous transfer of voice and data information; with open interfaces, the VR head-mounted display terminal can work more easily with various I/O devices.
The control unit 170 is configured to control the overall operation of the VR head-mounted display terminal, for example performing the control and processing related to sensing of user operations, video data acquisition, data communication, and so on, and allocating and coordinating resources for the cooperation and interaction of the hardware components.
The power supply unit 180 receives external or internal power under the control of the control unit 170 and provides the appropriate power required to operate the elements and components.
The various embodiments described here may be implemented in a computer-readable medium using, for example, computer software, hardware, or any combination thereof. For hardware implementation, the embodiments may be implemented using at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a processor, a controller, a microcontroller, a microprocessor, or an electronic unit designed to perform the functions described herein; in some cases such an implementation may be realized in the control unit 170. For software implementation, embodiments such as procedures or functions may be implemented with separate software units that allow at least one function or operation to be performed. The software code may be implemented by a software application (or program) written in any suitable programming language, stored in the storage unit 150 and executed by the control unit 170; a concrete hardware entity of the storage unit 150 may be a memory, and a concrete hardware entity of the control unit 170 may be a controller.
So far, the unit composition of the mobile terminal, represented by the VR head-mounted display terminal, has been described according to its functions.
FIG. 2 is a schematic diagram of the hardware entities exchanging information in an embodiment of the present invention. FIG. 2 includes terminal 1 and server 2, where terminal 1 consists of terminals 11-13 using different VR head-mounted display terminals: terminal 11 uses a VR helmet, terminal 12 uses VR glasses (built as a hardware entity), and terminal 13 uses VR glasses paired with a mobile phone (the VR glasses may be foldable cardboard glasses or non-foldable glasses built as a hardware entity). The server 2 stores various video files; through the interaction between terminal 1 and server 2, the video files to be played can be downloaded from server 2 online in real time or offline in advance. When a video file is played locally on terminal 1, ordinary existing mobile terminals cannot decode video formats above 1080P, which is not a clarity problem for ordinary flat playback; panoramic playback on a VR head-mounted display terminal, however, leads to unclear pictures. In the VR field, and especially mobile VR, the picture quality of 360-degree panoramic video is limited by current hardware processing performance and coding algorithms, and its clarity falls short of a good experience. When a video file is played in 360-degree panoramic mode, the human eye can only see one third of it or less, and when that region is enlarged to screen size the clarity drops noticeably; in addition, the VR head-mounted display terminal itself carries magnifying lenses with curved surfaces that enlarge the picture further, making the clarity even worse. For example, hardware decoding of Blu-ray 1080P video is already the limit for an ordinary mobile phone, and playing it in panoramic mode, plus the magnifying effect of the VR head-mounted display terminal, makes the picture quality even worse. Although hardware performance cannot improve qualitatively in a short time, the playback quality of panoramic video can be improved to some extent through the decoding mechanism adopted in processing logic 10 of FIG. 2. Processing logic 10 includes: S1, dividing the current video frame into at least two sub video frames; S2, capturing the user's current line of sight and locating, from the spatial angle that the current line of sight forms on the video frame, the target area locked by the current line of sight in the video frame; S3, obtaining, from the video numbers of the sub video frames, the sub video frames corresponding to the target area, decoding those sub video frames according to the decoding policy, and not decoding the other sub video frames. With the embodiments of the present invention, since the sub video frames are partial images of the current video frame, decoding resources are saved and decoding is concentrated on the sub video frames corresponding to the target area locked by the user's current line of sight, which improves decoding efficiency; the improved decoding efficiency brings improved picture clarity.
The above example of FIG. 2 is only one system architecture instance for implementing the embodiments of the present invention; the embodiments are not limited to the system structure of FIG. 2, and the method embodiments below are proposed on the basis of that system architecture.
The video information processing method of an embodiment of the present invention, as shown in FIG. 3, includes: acquiring a video frame and dividing the video frame into at least two sub video frames, the formats of the sub video frames and the video frame satisfying a decoding policy (101). In one instance of the formats satisfying the preset decoding policy, a video frame and any one channel among the multiple channels of video obtained by dividing it, i.e. a given sub video, are consistent in length, or consistent in number of frames. The video frame is not limited to ultra-HD VR video; for a VR video that is a 360-degree panoramic video, the video format is ultra-HD when it is played in 360-degree panoramic mode. Since the human eye can only see one third of it or less, and the clarity drops noticeably when that region is enlarged to screen size, the embodiments of the present invention split the single panoramic video into multiple independently stored videos and select the effective portion for decoding and playback according to the current viewing angle of the eyes, thereby saving the resources wasted on unnecessary decoding and concentrating decoding on the effective portion: the smaller the decoded video area, the less computation is consumed, so unnecessary decoding is saved, decoding efficiency rises, and decoded clarity is greatly improved. The spatial angle formed by the user's current line of sight acting on the video frame is detected (102). The target area locked by the current line of sight in the display area is located according to the angle (103). The sub video frame corresponding to the target area is acquired; specifically, it can be obtained from the video numbers produced when the video frame was divided into the at least two sub video frames (104). The sub video frame is decoded according to the decoding policy (105). In the embodiments of the present invention only the image of the designated area is decoded, i.e. only the image corresponding to the target area locked by the user's current line of sight; that image is represented by the multiple sub video frames obtained by dividing (also called splitting, cutting, or segmenting) the complete image of the video frame, each sub video frame being a partial image of the complete image, so that the multiple sub video frames can constitute one video frame. The sketch after this paragraph illustrates the loop.
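As a non-limiting illustration of steps 101 to 105 (all objects and names below are toy stand-ins invented for this sketch, not the patented implementation):

```python
# Sketch of one frame tick of the pipeline: read the gaze angle, derive
# the visible tile numbers, and "decode" only those tiles.
def tick(angle, tile_store, pick_tiles):
    yaw, pitch = angle                          # step 102: spatial angle
    visible = pick_tiles(yaw, pitch)            # step 103: locked target area
    return {t: tile_store[t] for t in visible}  # steps 104-105: only these

tile_store = {n: "decoded-frame-%d" % n for n in range(1, 9)}
center_band = lambda yaw, pitch: [2, 3, 6, 7]   # stand-in gaze-to-tile mapper
print(sorted(tick((45, 0), tile_store, center_band)))  # -> [2, 3, 6, 7]
```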
Herein, the multiple sub video frames may also be called multiple channels of sub video frames, each stored independently; the specific wording used is not limited to these examples.
Herein, the division of a video frame may also be called the splitting, cutting, or segmentation of the video frame; the specific wording used is not limited to these examples.
In the embodiments of the present invention, from the above angle (a known spatial angle) and the above video numbers (the division, cutting, or segmentation numbers of the video picture), it can be determined which sub video frames of the image corresponding to a video frame (e.g. a map of China or a world map) fall within the user's current line-of-sight area. In this embodiment, decoding the sub video frames according to the decoding policy means decoding only the images of the corresponding sub videos and rendering them to present the panoramic VR video to the user; whatever is not within the current line-of-sight area is simply ignored and not decoded.
In one example, a first through an i-th video frame are acquired and processed in sequence into a plurality of first sub video frames corresponding to the first video frame through a plurality of i-th sub video frames corresponding to the i-th video frame, where the first video frame and the plurality of first sub video frames are consistent in length and/or number of frames, the i-th video frame and the plurality of i-th sub video frames are consistent in length and/or number of frames, and the first video frame and the i-th video frame need not be consistent in length and/or number of frames. The angle formed by the user's current line of sight acting on the first video frame is detected, and the target area locked by the current line of sight in the first video frame is located according to the angle. For example, the complete image of the first video frame is divided into 8 blocks, the corresponding first sub video frames being numbered 1, 2, ..., 8. If the target area locked by the current line of sight is the area formed by first sub video frames 2, 3, 6, and 7, then, from the video numbers 2, 3, 6, and 7 obtained when the first video frame was divided into the plurality of first sub video frames, the sub video frames corresponding to the target area are obtained from the sub video storage locations, and finally the sub video frames are decoded according to the decoding policy.
As for the bottleneck met when decoding SD, ultra-HD, Blu-ray 1080P, or higher-definition video in hardware, namely that hardware processing performance cannot improve qualitatively in a short time: if such video is played in panoramic mode, with the magnifying effect of the VR glasses added, picture clarity suffers. With the embodiments of the present invention, when a video is played in panoramic mode, the line of sight of the human eye is tracked, the single panoramic video is split into multiple independently stored videos, and the effective portion is selected for decoding and playback according to the current viewing angle of the eyes. For example, if at some moment the line of sight covers only a target area of one third or less of the whole video picture, the remaining two thirds or more that the sight does not cover need not be decoded immediately; decoding those unwatched regions anyway would inevitably waste the device's computing resources. By saving those computing resources and converting them into effective decoding, the panoramic video picture quality can be raised two-fold or more, sparing unnecessary waste of computing resources.
The video information processing method of an embodiment of the present invention includes: obtaining a partition granularity parameter, the partition granularity parameter representing the frame-count threshold size, or the adjustable threshold range, used when dividing the video frame into the at least two sub video frames. Here: 1) the threshold size means that different granularity parameters correspond to different thresholds, which may be fixed values, one of which is chosen for dividing the video frame; 2) the adjustable threshold range means the granularity parameter fluctuates within a threshold interval; the threshold need not be a fixed value and can vary like a sliding window, with one threshold chosen at random from the interval for dividing the video frame. The sliding window is a control technique: the threshold is associated with the current computing capability and chosen at random according to it, and the sliding window coordinates the threshold with the current computing capability to ensure the chosen threshold is precise enough, balancing picture clarity against computing capability. If the two threads of threshold selection and computing capability communicate while each performs its own processing without considering the other's situation, problems arise: for example, when the current computing capability is strong enough to guarantee clarity, a finer granularity parameter is unnecessary unless higher clarity is wanted, whereas when computing capability is poor, a finer granularity parameter must be chosen to guarantee clarity. A randomly selected threshold is thus a more accurate clarity-guaranteeing measure than a fixed-threshold granularity parameter (see the sketch below).
As an example of the partition granularity parameter in the embodiments of the present invention: the same video frame (e.g. a map of China or a world map) can yield different numbers of sub video frames under different granularity parameters. With the granularity parameter at a first threshold, a video frame may be divided into 6 sub video frames; at a second threshold, into 8 sub video frames; at a third threshold, into 10 sub video frames; and so on. The granularity parameter is adjustable, and different thresholds give different granularity. As for the adjustable granularity, the finer the division, the better the effect of avoiding wasted decoding and the higher the picture clarity. The granularity parameter is not fixed and can be adjusted according to the actual picture quality or presentation needs. For example, if the current picture is already detected as very clear, a coarser granularity can be chosen, and vice versa. Likewise, if the user does not demand high picture quality and does not need 1080P, ultra-HD, or Blu-ray quality, or if the current network is unstable so that such quality would stutter, the user can choose, or the system can choose upon detecting network instability, a coarser granularity, and vice versa.
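As a hedged sketch of this balance between clarity and computing capability (the capability scale and the window width below are assumptions made for illustration; they do not appear in the original disclosure):

```python
import random

# Sketch of the sliding-window idea: tie the tile-count threshold to the
# currently measured computing capability, picking at random within a
# window that narrows around a capability-appropriate value.
# Capability scale: 0.0 (weak device) .. 1.0 (strong device).
def windowed_tile_count(capability, lo=6, hi=10, window=1):
    # Strong devices can afford coarser division; weak devices need a
    # finer division so that each decoded tile covers less area.
    center = round(hi - capability * (hi - lo))
    low = max(lo, center - window)
    high = min(hi, center + window)
    return random.randint(low, high)

print(windowed_tile_count(0.9))  # strong device: around 6 tiles
print(windowed_tile_count(0.2))  # weak device: around 9 tiles
```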
The video information processing method of an embodiment of the present invention includes: when the threshold size is obtained from the partition granularity parameter, dividing the video frame into the sub video frames corresponding to the current threshold, the formats of the sub video frames and the video frame satisfying the decoding policy; in one instance, a video frame and a given sub video obtained by dividing it are consistent in length, or consistent in number of frames. The video frame is not limited to ultra-HD VR video; for a 360-degree panoramic VR video played in panoramic mode, the format is ultra-HD. Since the human eye can only see one third of it or less, and clarity drops noticeably when that region is enlarged to screen size, the single panoramic video is split into multiple independently stored videos and the effective portion is selected for decoding and playback according to the current viewing angle, saving the resources that unnecessary decoding would waste and concentrating decoding on the effective portion; the smaller the decoded area, the less computation is consumed, so decoding efficiency rises and decoded clarity is greatly improved. The angle formed by the user's current line of sight acting on the video frame is detected; the target area locked by the current line of sight in the video frame is located according to the angle; the sub video frame corresponding to the target area is obtained from the video numbers produced by the division; and the sub video frame is decoded according to the decoding policy. Only the image of the designated area, i.e. the area locked by the user's current line of sight, is decoded; that image is represented by the multiple sub video frames obtained by dividing (splitting, cutting, or segmenting) the complete image of the video frame, each being a partial image of it, so that the multiple sub video frames can constitute one video frame.
The video information processing method of an embodiment of the present invention likewise includes: when the adjustable threshold range is obtained from the partition granularity parameter, randomly selecting a threshold from the range and dividing the video frame into the corresponding number of sub video frames according to the selected threshold; the remaining aspects, including the formats satisfying the decoding policy, the angle detection, the location of the target area, the number-based acquisition of the sub video frames, and the policy-based decoding with its savings in decoding computation and gain in clarity, are as described in the preceding paragraph.
In one example of the embodiments of the present invention, the video frame corresponds to an entire image, and the at least two sub video frames are partial images within that entire image; the video frame (the full picture of the entire image) is cut into at least two sub video frames (partial pictures of the entire image). When the sub video frames and the video frame are consistent in playback length and/or number of frames, the formats of the sub video frames and the video frame satisfy the decoding policy; for example, a video frame and a given channel among the multiple channels obtained by dividing it, i.e. a given sub video, are consistent in length, or consistent in number of frames.
In the embodiments of the present invention, after a video frame is acquired and divided into at least two sub video frames, the sub video frames are stored separately and independently and are each given a video number, ready for sub video frame lookup once the target area is locked. To save storage space, the sub video frames may also be compressed before being stored independently; in that case, when the target area is locked and the sub video frames are looked up, the corresponding sub video frames found by video number are first decompressed and then decoded according to the decoding policy.
The video information processing method of an embodiment of the present invention includes: storing the at least two sub video frames separately and independently; creating index information from the frame types and storage address offsets of the at least two sub video frames, with the video number corresponding to each sub video frame as the index key of the index information, the video numbers being those obtained when the video frame was divided into the at least two sub video frames. In obtaining the sub video frame corresponding to the target area from those video numbers: 1) the frame type and storage address offset can be queried from the index information by video number, and the video type of the sub video frame identified from the frame type; different video types use different decoding policies, so knowing the video type in advance helps the subsequent fast video decoding and improves decoding efficiency, and the faster the decoding, the higher the resulting picture clarity; 2) the storage location of the sub video frame is located from the storage address offset, and the sub video frame is read from that location and then decoded. The custom file storage format of this embodiment supports the frame positioning and frame synchronization of subsequent decoding.
The video information processing method of an embodiment of the present invention includes: 1) a first positioning; specifically, a first operation is acquired, for example feature-point positions such as the pupil and the inner and outer eye corners are extracted by face detection, and the resting position of the eye's line of sight is obtained from a line-of-sight model or the like; from the first angle obtained by the first operation, e.g. by continuously detecting the angle of the eye's line of sight, the first target area locked by the current line of sight in the video frame can be located; 2) a second positioning; the transfer of the line of sight from its resting position to a next position is captured and that next position is located. The change from the first operation to a second operation is detected, for example when the user's head or eyeballs turn so that the user's current line of sight moves, and the second target area locked after the current line of sight moves is located from the angle change between the first and second operations, frame synchronization being achieved through this second positioning; 3) the storage address offsets in the index information are frame-synchronized according to the sub video frame offsets corresponding to the first and second target areas. In this embodiment, when VR mode is used and video regions are decoded on demand, every small video plays its pictures in jumps: video No. 1 may not need decoding at first, but after a few seconds of playback the line of sight moves, decoding of video No. 1 must begin, and the playback start time is the 5th second. In that case the position of the 5th-second frame must be found very precisely; otherwise the pictures of the different video channels cannot be synchronized. With the embodiments of the present invention, precise frame positioning is achieved through the two positionings and the custom video file storage format of the above embodiment.
One example of the embodiments of the present invention: in the first positioning, the local area involves frames 3, 4, and 5; when the user's line of sight moves, an offset arises, a second positioning is needed, and the frame offset produced by the sight movement is frame-synchronized. Because frame synchronization is achieved, however the user's current line of sight moves and changes, the user operation is captured precisely, the target area currently locked after the movement and the sub video frames corresponding to it are located exactly, and then, from the index information of the stored sub video frames, the sub video frames corresponding to the target area are read precisely from their storage locations for subsequent decoding. For frame synchronization there are two possibilities: 1) sub video frame sequence 2 involved in the second positioning is continuous with sub video frame sequence 1 involved in the first positioning, and normal decoding suffices; 2) the two sequences are discontinuous frames, which raises a frame-skipping decoding problem: with frame skipping, repeated skips or decoding failures reaching a preset value adversely affect normal decoding. This embodiment achieves frame-skipping decoding without adversely affecting normal decoding.
In the embodiments of the present invention, decoding the sub video frames according to the decoding policy involves multiple positionings, frame synchronization, and frame-skipping decoding. For example, if the eye is at the center of the sphere, the VR panoramic picture can be seen; during sensor positioning, a spatial angle is computed with the phone's built-in sensor or an external device sensor and applied to the 3D view control, completing the sensor-controlled viewing angle. The first positioning may use a default longitude and latitude: taking a map as the video frame, the first positioning is at the map's central coordinate point, and the second positioning tracks the sight offset, since the eyeballs or head move. When the user's line of sight moves, an offset arises, a second positioning is needed, and the frame offset produced by the movement is frame-synchronized. The length of the Group Of Pictures (GOP) between key frames is not fixed in this embodiment and can be adjusted dynamically: on decoding failure or frame-skipping decoding, the GOP can be reduced to the minimum, mitigating the low decoding efficiency that failures or frame skipping cause. In the custom file storage format of the above embodiment this GOP is dynamically set as small as possible, and the file header of the custom format records the type and starting offset of every frame. Specifically: 1) when decoding of a sub video frame fails, the key-frame interval GOP of the at least two sub video frames is adjusted to the minimum value (GOP_min) among the GOP preset values, and the sub video frame is decoded according to GOP_min; 2) when the sub video frame is a discontinuous frame, frame-skipping decoding is performed, the key-frame interval GOP of the at least two sub video frames is adjusted to GOP_min, and the sub video frame is decoded according to GOP_min.
One example of the embodiments of the present invention: a video codec (e.g. H.264) must read frame data continuously to decode normally. If the decoder fails on the 5th frame or deliberately skips it (invisible areas are deliberately skipped during VR playback), normal decoding can only restart at the beginning of the next GOP. This problem of decoding failure or frame skipping can be solved by reducing the key-frame interval: specifically, the key-frame interval GOP of the at least two sub video frames is adjusted to GOP_min, i.e. a relatively small GOP value (such as GOP_min) is used. A relatively small GOP (such as GOP_min) ensures that, after some frames are skipped, the number of failures before decoding succeeds again is small, thereby avoiding the adverse effect on normal decoding when repeated skips or failures reach a preset value.
The processing logic formed by the policies and processing of the above embodiments can be implemented in advance through a customizable decoding-region capability added to the video decoder; that is, the video decoder itself supports customizable decoding of a designated target area.
The video information processing system of an embodiment of the present invention, as shown in FIG. 4, includes a terminal 41 and a server 42. Terminal 41 may use different VR head-mounted display terminals, e.g. a VR helmet, VR glasses (built as a hardware entity), or VR glasses paired with a mobile phone (the VR glasses may be foldable cardboard glasses or non-foldable glasses built as a hardware entity), and so on. The server 42 stores various video files; through the interaction between terminal 41 and server 42, the video files to be played can be downloaded from server 42 online in real time or offline in advance. When a video file is played locally on terminal 41, it is processed by the dividing unit 411, the detecting unit 412, the first processing unit 413, the second processing unit 414, and the decoding unit 415 in terminal 41. The dividing unit 411 is configured to acquire a video frame and divide it into at least two sub video frames, the formats of the sub video frames and the video frame satisfying a decoding policy; the detecting unit 412 detects the spatial angle formed by the current line of sight of the human eye acting on the display area of the video frame; the first processing unit 413 is configured to locate, according to the angle, the target area locked by the current line of sight in the display area; the second processing unit 414 is configured to acquire the sub video frame corresponding to the target area, for example from the video numbers obtained when the video frame was divided into the at least two sub video frames; and the decoding unit 415 is configured to decode the sub video frame according to the decoding policy.
With the embodiments of the present invention, from the above angle (a known spatial angle) and the above video numbers (the division, cutting, or segmentation numbers of the video picture), it can be determined which sub video frames of the image corresponding to a video frame (e.g. a map of China or a world map) fall within the user's current line-of-sight area. In this embodiment, decoding the sub video frames according to the decoding policy means decoding only the images of the corresponding sub videos and rendering them to present the panoramic VR video to the user; whatever is not within the current line-of-sight area is simply ignored and not decoded.
In one example, a first through an i-th video frame are acquired and processed in sequence into a plurality of first sub video frames through a plurality of i-th sub video frames, where each video frame is consistent with its own sub video frames in length and/or number of frames, while the first and i-th video frames need not be consistent with each other in length and/or number of frames. The angle formed by the user's current line of sight acting on the first video frame is detected, and the target area locked by the current line of sight in the first video frame is located according to the angle. For example, the complete image of the first video frame is divided into 8 blocks numbered 1, 2, ..., 8; if the target area locked by the current line of sight is the area formed by first sub video frames 2, 3, 6, and 7, then, from the video numbers 2, 3, 6, and 7 obtained by the division, the sub video frames corresponding to the target area are obtained from the sub video storage locations and finally decoded according to the decoding policy.
In an implementation of the embodiments of the present invention, the dividing unit is further configured to: obtain a partition granularity parameter representing the frame-count threshold size, or the adjustable threshold range, used when dividing the video frame into the at least two sub video frames;
when the threshold size is obtained from the granularity parameter, divide the video frame into the sub video frames corresponding to the current threshold;
when the adjustable threshold range is obtained, randomly select a threshold from the range and divide the video frame into the corresponding number of sub video frames according to the selected threshold.
In an implementation, the video frame corresponds to an entire image and the at least two sub video frames are partial images within it; when the sub video frames and the video frame are consistent in playback length and/or number of frames, the formats of the sub video frames and the video frame satisfy the decoding policy.
In an implementation, the terminal further includes: a storage unit configured to store the at least two sub video frames separately and independently; and an index creation unit configured to create index information from the frame types and storage address offsets of the at least two sub video frames, with the video number corresponding to each sub video frame as the index key of the index information, the video numbers being those obtained when the video frame was divided into the at least two sub video frames.
In an implementation, the second processing unit is further configured to: query the frame type and storage address offset from the index information by video number; identify the video type of the sub video frame from the frame type; locate the storage location of the sub video frame from the storage address offset; and read the sub video frame from that location.
In an implementation, the terminal further includes: a first positioning unit configured to continuously detect the angle of the eye's line of sight from its resting position and locate the first target area locked by the current line of sight in the video frame; a second positioning unit configured to, when the line of sight shifts, locate from the angle change the second target area locked after the current line of sight moves in the video frame; and a frame synchronization unit configured to frame-synchronize the storage address offsets in the index information according to the sub video frame offsets corresponding to the first and second target areas.
In an implementation, the decoding unit is further configured to: when decoding of the sub video frame fails, adjust the GOP of the video key frames in the at least two sub video frames to the minimum value GOP_min among the GOP preset values and decode the sub video frame according to GOP_min; and, when the sub video frame is a discontinuous frame, perform frame-skipping decoding, adjust the key-frame interval GOP to GOP_min, and decode the sub video frame according to GOP_min.
An embodiment of the present invention is set out below taking a real application scenario as an example:
In using multi-channel video groups to raise VR video clarity with an embodiment of the present invention, the division of a video frame into 8 sub video frames serves as the example, and the embodiment is described as follows:
First, the original ultra-HD VR video source is split into multiple independently stored videos; for example, a standard 4K video, i.e. a 3840x2160 video, is split into eight 960x1080 videos, as shown in FIG. 5. The video picture is cut, but each small video file has the same length and number of frames as the original video.
Next, these 8 sub videos are stored in a custom format with the GOP as small as possible, the custom-format file header recording the type and starting offset of every frame. During VR video playback, the complete picture above is pasted onto a sphere, see FIGS. 6-9. The video picture is a world map, as shown in FIG. 6, and the VR rendering principle is to render this special picture onto a sphere. On the un-textured sphere of FIG. 7 the latitude and longitude lines are clearly visible. FIG. 8 shows the sphere with the texture applied but only the outline rendered; since the texture attaches the outline to the sphere, part of the latitude and longitude grid is covered, and the outline in FIG. 8 matches the outline of the final rendered image in FIG. 9. The VR video picture is successfully rendered onto a sphere, as shown in FIG. 9. With VR technology, if the eye is at the center of the sphere of FIG. 9, the VR panorama of FIG. 10 can be seen.
When the human eye looks at the world, the vertical or horizontal viewing angle can never exceed 180 degrees. The same holds inside the computer: what can usually be seen is an arc-shaped area of about 100 degrees, so at any moment only a small part of the world map above is seen by the eyes.
When the head rotates, a spatial angle is computed directly with the phone's built-in sensor or an external device sensor and applied to the 3D view control, completing the sensor-controlled viewing angle. From the known spatial angle and the segmentation numbers of the video picture, it can be calculated which of the numbered sub-pictures of the current world map lie within the line-of-sight area. Then only the images of the corresponding sub videos are decoded and rendered; those out of range are simply ignored. The decoder's computational cost is proportional to image area; the smaller the area, the less computation is consumed, so a large share of unnecessary decoding can be saved. The picture-quality bottleneck arises because: 1) ordinary mobile devices hit a video-decoding performance limit at 1080P; 2) panoramic video demands a higher-definition picture such as 4K or 8K. The picture actually decoded for panoramic video at present is shown in FIG. 11. The area seen in the VR head-mounted display terminal is the target area marked A1 in FIG. 11, shown in FIG. 12; at the current moment this target area occupies only a small share of the whole picture. Existing coding technology and hardware processing performance cannot directly let mobile devices smoothly decode 4K or 8K video, but with the embodiments of the present invention decoding efficiency can be raised on current processing performance. Specifically, waste of processing performance must be avoided, which yields a considerable gain in picture clarity: when playing the current frame, the background computation only needs to decode the picture of the target area marked A1, and even with some redundancy added, about 50% of useless processing can be avoided. Still taking the video of FIG. 11 as the example, with an embodiment of the present invention the picture is cut into 8 blocks that are compressed into new videos; that is, the video frame is cut into 8 sub video frames that are compressed and stored independently, as shown in FIG. 13. The target area marked A1 is formed by the areas numbered 3, 4, 7, and 8, corresponding to the sub video frames of those areas, so during playback only the current frame of sub videos 3, 4, 7, and 8 needs decoding. If the video picture is cut more finely, waste can be reduced further; for example, with 50% of the computing power saved, the video resolution can be raised higher, doubling the picture area and improving clarity and quality.
With the embodiments of the present invention, the frame synchronization and frame-skipping decoding of the multiple video channels must also be considered. Suppose a large 1000x1000 video, 10 seconds long with 300 frames in total, is split into 16 small videos; each small video then has a resolution of 250x250, is also ten seconds long, and likewise has 300 frames in total. But an ordinary video codec (e.g. H.264) must read frame data continuously to decode normally: if the decoder fails on the 5th frame or deliberately skips it (invisible areas are deliberately skipped during VR playback), decoding frames 6, 7, 8, 9, ... will fail until the next GOP begins, hurting decoding efficiency. This can be solved by reducing the GOP, i.e. using a relatively small GOP value. A relatively small GOP ensures that, after some frames are skipped, the number of failures before decoding succeeds again is small, avoiding the problems caused by repeated decoding failures or frame-skipping decoding.
When VR mode is used and video regions are decoded on demand, every small video plays its pictures in jumps: video No. 1 may not need decoding at first, but after a few seconds of playback the line of sight moves, decoding of video No. 1 must begin, and the playback start time is the 5th second. In that case the position of the 5th-second frame must be found very precisely; otherwise the pictures of the different channels cannot be synchronized. With the embodiments of the present invention, the custom video file storage format above can be used to add an index of all frames of the video to the file header; the index records each frame's type and file address offset, so any frame can be quickly located, read, and decoded from the index records, achieving precise frame positioning.
A terminal of an embodiment of the present invention, as shown in FIG. 14, includes: a processor 61 and a memory for storing a computer program capable of running on the processor, one representation of the memory being the computer storage medium 63 shown in FIG. 14, and a bus 62 for data communication.
When running the computer program, the processor executes:
acquiring a video frame and dividing the video frame into at least two sub video frames, the formats of the sub video frames and the video frame satisfying a decoding policy;
detecting the spatial angle formed by the current line of sight of the human eye acting on the display area of the video frame;
locating, according to the angle, the target area locked by the current line of sight in the display area;
acquiring the sub video frame corresponding to the target area;
decoding the sub video frame according to the decoding policy.
When running the computer program, the processor also executes:
obtaining a partition granularity parameter representing the frame-count threshold size, or the adjustable threshold range, used when dividing the video frame into the at least two sub video frames;
when the threshold size is obtained from the granularity parameter, dividing the video frame into the sub video frames corresponding to the current threshold;
when the adjustable threshold range is obtained, randomly selecting a threshold from the range and dividing the video frame into the corresponding number of sub video frames according to the selected threshold.
When running the computer program, the processor also executes:
treating the formats of the sub video frames and the video frame as satisfying the decoding policy when they are consistent in playback length and/or number of frames.
When running the computer program, the processor also executes:
storing the at least two sub video frames separately and independently;
creating index information from the frame types and storage address offsets of the at least two sub video frames, with the video number corresponding to each sub video frame as the index key of the index information;
the video numbers being those obtained when the video frame was divided into the at least two sub video frames.
When running the computer program, the processor also executes:
querying the frame type and storage address offset from the index information according to the video number;
identifying the video type of the sub video frame from the frame type, and locating the storage location of the sub video frame from the storage address offset;
reading the sub video frame from the storage location.
When running the computer program, the processor also executes:
continuously detecting the angle of the eye's line of sight according to its resting position, and locating the first target area locked by the current line of sight in the video frame;
when the line of sight shifts, locating, from the change in the angle of the line of sight, the second target area locked after the current line of sight moves in the video frame;
frame-synchronizing the storage address offsets in the index information according to the sub video frame offsets corresponding to the first and second target areas.
When running the computer program, the processor also executes:
when decoding of the sub video frame fails, adjusting the GOP of the video key frames in the at least two sub video frames to the minimum value GOP_min among the GOP preset values, and decoding the sub video frame according to GOP_min;
when the sub video frame is a discontinuous frame, performing frame-skipping decoding, adjusting the GOP of the video key frames in the at least two sub video frames to GOP_min, and decoding the sub video frame according to GOP_min.
A computer storage medium of an embodiment of the present invention stores computer-executable instructions for executing:
acquiring a video frame and dividing the video frame into at least two sub video frames, the formats of the sub video frames and the video frame satisfying a decoding policy;
detecting the spatial angle formed by the current line of sight of the human eye acting on the display area of the video frame;
locating, according to the angle, the target area locked by the current line of sight in the display area;
acquiring the sub video frame corresponding to the target area;
decoding the sub video frame according to the decoding policy.
The computer-executable instructions are also used to execute:
obtaining a partition granularity parameter representing the frame-count threshold size, or the adjustable threshold range, used when dividing the video frame into the at least two sub video frames;
when the threshold size is obtained from the granularity parameter, dividing the video frame into the sub video frames corresponding to the current threshold;
when the adjustable threshold range is obtained, randomly selecting a threshold from the range and dividing the video frame into the corresponding number of sub video frames according to the selected threshold.
The computer-executable instructions are also used to execute:
treating the formats of the sub video frames and the video frame as satisfying the decoding policy when they are consistent in playback length and/or number of frames.
The computer-executable instructions are also used to execute:
storing the at least two sub video frames separately and independently;
creating index information from the frame types and storage address offsets of the at least two sub video frames, with the video number corresponding to each sub video frame as the index key of the index information;
the video numbers being those obtained when the video frame was divided into the at least two sub video frames.
The computer-executable instructions are also used to execute:
querying the frame type and storage address offset from the index information according to the video number;
identifying the video type of the sub video frame from the frame type, and locating the storage location of the sub video frame from the storage address offset;
reading the sub video frame from the storage location.
The computer-executable instructions are also used to execute:
continuously detecting the angle of the eye's line of sight according to its resting position, and locating the first target area locked by the current line of sight in the video frame;
when the line of sight shifts, locating, from the change in the angle of the line of sight, the second target area locked after the current line of sight moves in the video frame;
frame-synchronizing the storage address offsets in the index information according to the sub video frame offsets corresponding to the first and second target areas.
The computer-executable instructions are also used to execute:
when decoding of the sub video frame fails, adjusting the GOP of the video key frames in the at least two sub video frames to the minimum value GOP_min among the GOP preset values, and decoding the sub video frame according to GOP_min;
when the sub video frame is a discontinuous frame, performing frame-skipping decoding, adjusting the GOP of the video key frames in the at least two sub video frames to GOP_min, and decoding the sub video frame according to GOP_min.
A video information processing method of an embodiment of the present invention is performed by a terminal including one or more processors, a memory, and one or more programs, where the one or more programs are stored in the memory, each program may include one or more units each corresponding to a set of instructions, and the one or more processors are configured to execute the instructions; the method includes:
acquiring a video frame and dividing the video frame into at least two sub video frames, the formats of the sub video frames and the video frame satisfying a decoding policy;
detecting the spatial angle formed by the current line of sight of the human eye acting on the display area of the video frame;
locating, according to the angle, the target area locked by the current line of sight in the display area;
acquiring the sub video frame corresponding to the target area;
decoding the sub video frame according to the decoding policy.
In an embodiment, acquiring a video frame and dividing the video frame into at least two sub video frames includes:
obtaining a partition granularity parameter representing the frame-count threshold size, or the adjustable threshold range, used when dividing the video frame into the at least two sub video frames;
when the threshold size is obtained from the granularity parameter, dividing the video frame into the sub video frames corresponding to the current threshold;
when the adjustable threshold range is obtained, randomly selecting a threshold from the range and dividing the video frame into the corresponding number of sub video frames according to the selected threshold.
In an embodiment, the video frame corresponds to an entire image, and the at least two sub video frames are partial images within that entire image;
when the sub video frames and the video frame are consistent in playback length and/or number of frames, the formats of the sub video frames and the video frame satisfy the decoding policy.
In an embodiment, the method further includes:
storing the at least two sub video frames separately and independently;
creating index information from the frame types and storage address offsets of the at least two sub video frames, with the video number corresponding to each sub video frame as the index key of the index information;
the video numbers being those obtained when the video frame was divided into the at least two sub video frames.
In an embodiment, acquiring the sub video frame corresponding to the target area includes:
querying the frame type and storage address offset from the index information according to the video number;
identifying the video type of the sub video frame from the frame type, and locating the storage location of the sub video frame from the storage address offset;
reading the sub video frame from the storage location.
In an embodiment, the method further includes:
continuously detecting the angle of the eye's line of sight according to its resting position, and locating the first target area locked by the current line of sight in the video frame;
when the line of sight shifts, locating, from the change in the angle of the line of sight, the second target area locked after the current line of sight moves in the video frame;
frame-synchronizing the storage address offsets in the index information according to the sub video frame offsets corresponding to the first and second target areas.
In an embodiment, decoding the sub video frame according to the decoding policy includes:
when decoding of the sub video frame fails, adjusting the group-of-pictures length GOP of the video key frames in the at least two sub video frames to the minimum value GOP_min among the GOP preset values, and decoding the sub video frame according to GOP_min;
when the sub video frame is a discontinuous frame, performing frame-skipping decoding, adjusting the key-frame interval GOP of the at least two sub video frames to GOP_min, and decoding the sub video frame according to GOP_min.
In the several embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other manners. The device embodiments described above are merely illustrative; for example, the division into units is only a division by logical function, and other division manners are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the coupling, direct coupling, or communication connections between the components shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or of other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiments.
In addition, the functional units in the embodiments of the present invention may all be integrated into one processing unit, or each unit may serve separately as one unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of hardware plus software functional units.
Those of ordinary skill in the art will understand that all or part of the steps implementing the above method embodiments may be completed by hardware related to program instructions; the foregoing program may be stored in a computer-readable storage medium, and when executed the program performs the steps of the above method embodiments; the foregoing storage medium includes various media that can store program code, such as a removable storage device, a ROM, a RAM, a magnetic disk, or an optical disk.
Alternatively, if the above integrated unit of the present invention is implemented in the form of a software functional unit and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium and including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present invention. The foregoing storage medium includes various media that can store program code, such as a removable storage device, a ROM, a RAM, a magnetic disk, or an optical disk.
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto; any change or replacement readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Industrial Applicability
With the embodiments of the present invention, after the video frame is divided into at least two sub video frames, the target area is locked through angle detection and angle-based positioning, and the sub video frame corresponding to the target area is obtained. Since a sub video frame is a partial image of the full image in the video frame, decoding the sub video frame rather than the whole video improves decoding efficiency, and the improved decoding efficiency improves picture sharpness, so that picture clarity is guaranteed and greatly enhanced.

Claims (23)

  1. A video information processing method, the method comprising:
    acquiring a video frame and dividing the video frame into at least two sub video frames, the formats of the sub video frames and the video frame satisfying a decoding policy;
    detecting the spatial angle formed by the current line of sight of the human eye acting on the display area of the video frame;
    locating, according to the angle, the target area locked by the current line of sight in the display area;
    acquiring the sub video frame corresponding to the target area; and
    decoding the sub video frame according to the decoding policy.
  2. The method according to claim 1, wherein acquiring a video frame and dividing the video frame into at least two sub video frames comprises:
    obtaining a partition granularity parameter, the partition granularity parameter representing the frame-count threshold size, or the adjustable threshold range, used when the video frame is divided into the at least two sub video frames;
    when the threshold size is obtained from the partition granularity parameter, dividing the video frame into the sub video frames corresponding to the current threshold; and
    when the adjustable threshold range is obtained from the partition granularity parameter, randomly selecting a threshold from the adjustable threshold range and dividing the video frame into the corresponding number of sub video frames according to the selected threshold.
  3. The method according to claim 1, wherein the video frame corresponds to an entire image, and the at least two sub video frames are partial images within the entire image; and
    when the sub video frames and the video frame are consistent in playback length and/or number of frames, the formats of the sub video frames and the video frame satisfy the decoding policy.
  4. The method according to any one of claims 1 to 3, further comprising:
    storing the at least two sub video frames separately and independently; and
    creating index information from the frame types and storage address offsets of the at least two sub video frames, with the video number corresponding to each sub video frame as the index key of the index information,
    the video numbers being those obtained when the video frame was divided into the at least two sub video frames.
  5. The method according to claim 4, wherein acquiring the sub video frame corresponding to the target area comprises:
    querying the frame type and storage address offset from the index information according to the video number;
    identifying the video type of the sub video frame from the frame type, and locating the storage location of the sub video frame from the storage address offset; and
    reading the sub video frame from the storage location.
  6. The method according to claim 4, further comprising:
    detecting, based on a predetermined algorithm and according to the resting position of the line of sight of the human eye, the angle of the line of sight, and locating the first target area locked by the current line of sight in the video frame;
    when the line of sight of the human eye shifts, locating, according to the change in the angle of the line of sight, the second target area locked after the current line of sight moves in the video frame; and
    frame-synchronizing the storage address offsets in the index information according to the sub video frame offsets corresponding to the first target area and the second target area.
  7. The method according to claim 4, wherein decoding the sub video frame according to the decoding policy comprises:
    when decoding of the sub video frame fails, adjusting the group-of-pictures (GOP) length of the video key frames in the at least two sub video frames to the minimum value GOP_min among the GOP preset values, and decoding the sub video frame according to GOP_min; and
    when the sub video frame is a discontinuous frame, performing frame-skipping decoding, adjusting the key-frame interval GOP of the at least two sub video frames to GOP_min, and decoding the sub video frame according to GOP_min.
  8. A terminal, comprising:
    a dividing unit configured to acquire a video frame and divide the video frame into at least two sub video frames, the formats of the sub video frames and the video frame satisfying a decoding policy;
    a detecting unit configured to detect the spatial angle formed by the current line of sight of the human eye acting on the display area of the video frame;
    a first processing unit configured to locate, according to the angle, the target area locked by the current line of sight in the display area;
    a second processing unit configured to acquire the sub video frame corresponding to the target area; and
    a decoding unit configured to decode the sub video frame according to the decoding policy.
  9. The terminal according to claim 8, wherein the dividing unit is further configured to:
    obtain a partition granularity parameter, the partition granularity parameter representing the frame-count threshold size, or the adjustable threshold range, used when the video frame is divided into the at least two sub video frames;
    when the threshold size is obtained from the partition granularity parameter, divide the video frame into the sub video frames corresponding to the current threshold; and
    when the adjustable threshold range is obtained from the partition granularity parameter, randomly select a threshold from the adjustable threshold range and divide the video frame into the corresponding number of sub video frames according to the selected threshold.
  10. The terminal according to claim 8, wherein the video frame corresponds to an entire image, and the at least two sub video frames are partial images within the entire image; and
    when the sub video frames and the video frame are consistent in playback length and/or number of frames, the formats of the sub video frames and the video frame satisfy the decoding policy.
  11. The terminal according to any one of claims 8 to 10, further comprising:
    a storage unit configured to store the at least two sub video frames separately and independently; and
    an index creation unit configured to create index information from the frame types and storage address offsets of the at least two sub video frames, with the video number corresponding to each sub video frame as the index key of the index information,
    the video numbers being those obtained when the video frame was divided into the at least two sub video frames.
  12. The terminal according to claim 11, wherein the second processing unit is further configured to:
    query the frame type and storage address offset from the index information according to the video number;
    identify the video type of the sub video frame from the frame type, and locate the storage location of the sub video frame from the storage address offset; and
    read the sub video frame from the storage location.
  13. The terminal according to claim 11, further comprising:
    a first positioning unit configured to continuously detect the angle of the line of sight of the human eye according to its resting position and to locate the first target area locked by the current line of sight in the video frame;
    a second positioning unit configured to, when the line of sight of the human eye shifts, locate, according to the change in the angle of the line of sight, the second target area locked after the current line of sight moves in the video frame; and
    a frame synchronization unit configured to frame-synchronize the storage address offsets in the index information according to the sub video frame offsets corresponding to the first target area and the second target area.
  14. The terminal according to claim 11, wherein the decoding unit is further configured to:
    when decoding of the sub video frame fails, adjust the group-of-pictures (GOP) length of the video key frames in the at least two sub video frames to the minimum value GOP_min among the GOP preset values, and decode the sub video frame according to GOP_min; and
    when the sub video frame is a discontinuous frame, perform frame-skipping decoding, adjust the key-frame interval GOP of the at least two sub video frames to GOP_min, and decode the sub video frame according to GOP_min.
  15. A terminal, comprising a processor and a memory for storing a computer program capable of running on the processor, wherein the processor is configured to perform the video information processing method of any one of claims 1-7 when running the computer program.
  16. A computer storage medium storing computer-executable instructions for performing the video information processing method of any one of claims 1-7.
  17. A video information processing method performed by a terminal, the terminal comprising one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory, each program may comprise one or more units each corresponding to a set of instructions, and the one or more processors are configured to execute the instructions; the method comprising:
    acquiring a video frame and dividing the video frame into at least two sub video frames, the formats of the sub video frames and the video frame satisfying a decoding policy;
    detecting the spatial angle formed by the current line of sight of the human eye acting on the display area of the video frame;
    locating, according to the angle, the target area locked by the current line of sight in the display area;
    acquiring the sub video frame corresponding to the target area; and
    decoding the sub video frame according to the decoding policy.
  18. The method according to claim 17, wherein acquiring a video frame and dividing the video frame into at least two sub video frames comprises:
    obtaining a partition granularity parameter, the partition granularity parameter representing the frame-count threshold size, or the adjustable threshold range, used when the video frame is divided into the at least two sub video frames;
    when the threshold size is obtained from the partition granularity parameter, dividing the video frame into the sub video frames corresponding to the current threshold; and
    when the adjustable threshold range is obtained from the partition granularity parameter, randomly selecting a threshold from the adjustable threshold range and dividing the video frame into the corresponding number of sub video frames according to the selected threshold.
  19. The method according to claim 17, wherein the video frame corresponds to an entire image, and the at least two sub video frames are partial images within the entire image; and
    when the sub video frames and the video frame are consistent in playback length and/or number of frames, the formats of the sub video frames and the video frame satisfy the decoding policy.
  20. The method according to any one of claims 17 to 19, further comprising:
    storing the at least two sub video frames separately and independently; and
    creating index information from the frame types and storage address offsets of the at least two sub video frames, with the video number corresponding to each sub video frame as the index key of the index information,
    the video numbers being those obtained when the video frame was divided into the at least two sub video frames.
  21. The method according to claim 20, wherein acquiring the sub video frame corresponding to the target area comprises:
    querying the frame type and storage address offset from the index information according to the video number;
    identifying the video type of the sub video frame from the frame type, and locating the storage location of the sub video frame from the storage address offset; and
    reading the sub video frame from the storage location.
  22. The method according to claim 20, further comprising:
    continuously detecting the angle of the line of sight of the human eye according to its resting position, and locating the first target area locked by the current line of sight in the video frame;
    when the line of sight of the human eye shifts, locating, according to the change in the angle of the line of sight, the second target area locked after the current line of sight moves in the video frame; and
    frame-synchronizing the storage address offsets in the index information according to the sub video frame offsets corresponding to the first target area and the second target area.
  23. The method according to claim 20, wherein decoding the sub video frame according to the decoding policy comprises:
    when decoding of the sub video frame fails, adjusting the group-of-pictures (GOP) length of the video key frames in the at least two sub video frames to the minimum value GOP_min among the GOP preset values, and decoding the sub video frame according to GOP_min; and
    when the sub video frame is a discontinuous frame, performing frame-skipping decoding, adjusting the key-frame interval GOP of the at least two sub video frames to GOP_min, and decoding the sub video frame according to GOP_min.
PCT/CN2018/080579 2017-04-27 2018-03-26 Video information processing method, terminal, and computer storage medium WO2018196530A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710289910.X 2017-04-27
CN201710289910.XA CN108810574B (zh) 2017-04-27 2017-04-27 Video information processing method and terminal

Publications (1)

Publication Number Publication Date
WO2018196530A1 true WO2018196530A1 (zh) 2018-11-01

Family

ID=63918001

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/080579 WO2018196530A1 (zh) 2017-04-27 2018-03-26 Video information processing method, terminal, and computer storage medium

Country Status (2)

Country Link
CN (1) CN108810574B (zh)
WO (1) WO2018196530A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109640151A (zh) * 2018-11-27 2019-04-16 Oppo广东移动通信有限公司 Video processing method and apparatus, electronic device, and storage medium
CN110933364A (zh) * 2019-10-25 2020-03-27 深圳市道通智能航空技术有限公司 Omnidirectional vision obstacle avoidance implementation method, system, apparatus, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015054235A1 (en) * 2013-10-07 2015-04-16 Vid Scale, Inc. User adaptive 3d video rendering and delivery
US9232257B2 (en) * 2010-09-22 2016-01-05 Thomson Licensing Method for navigation in a panoramic scene
CN105791882A (zh) * 2016-03-22 2016-07-20 腾讯科技(深圳)有限公司 Video coding method and apparatus
CN105915937A (zh) * 2016-05-10 2016-08-31 上海乐相科技有限公司 Panoramic video playback method and device
CN105916060A (zh) * 2016-04-26 2016-08-31 乐视控股(北京)有限公司 Data transmission method, apparatus, and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106060515B (zh) * 2016-07-14 2018-11-06 腾讯科技(深圳)有限公司 Panoramic media file pushing method and apparatus

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9232257B2 (en) * 2010-09-22 2016-01-05 Thomson Licensing Method for navigation in a panoramic scene
WO2015054235A1 (en) * 2013-10-07 2015-04-16 Vid Scale, Inc. User adaptive 3d video rendering and delivery
CN105791882A (zh) * 2016-03-22 2016-07-20 腾讯科技(深圳)有限公司 Video coding method and apparatus
CN105916060A (zh) * 2016-04-26 2016-08-31 乐视控股(北京)有限公司 Data transmission method, apparatus, and system
CN105915937A (zh) * 2016-05-10 2016-08-31 上海乐相科技有限公司 Panoramic video playback method and device

Also Published As

Publication number Publication date
CN108810574A (zh) 2018-11-13
CN108810574B (zh) 2021-03-12

Similar Documents

Publication Publication Date Title
US11245939B2 (en) Generating and transmitting metadata for virtual reality
US11653065B2 (en) Content based stream splitting of video data
US9363542B2 (en) Techniques to provide an enhanced video replay
KR20190022851A Apparatus and method for providing and displaying content
KR102384489B1 Information processing apparatus, information providing apparatus, control method, and computer-readable storage medium
US20220095002A1 (en) Method for transmitting media stream, and electronic device
EP3434021B1 (en) Method, apparatus and stream of formatting an immersive video for legacy and immersive rendering devices
WO2019149066A1 Video playback method, terminal device, and storage medium
WO2018196530A1 Video information processing method, terminal, and computer storage medium
KR20180052255A Method for providing streaming content and apparatus therefor
GB2567136A (en) Moving between spatially limited video content and omnidirectional video content
EP4021001A1 (en) Code stream processing method and device, first terminal, second terminal and storage medium
US11488633B2 (en) Playback device
US11134236B2 (en) Image processing device and system
US8933997B2 (en) Video output apparatus and method for controlling the same
KR101874084B1 Image processing apparatus, control method therefor, and recording medium storing a computer program
CN110636336A Transmitting apparatus and method, receiving apparatus and method, and computer-readable storage medium
CN115002335B Video processing method and apparatus, electronic device, and computer-readable storage medium
EP4044584A1 Panoramic video generation method, video acquisition method, and related apparatuses
KR20180013243A Method for providing streaming content, method for storing streaming content, and apparatus therefor
US20240013475A1 (en) Transparency range for volumetric video

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18790620

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18790620

Country of ref document: EP

Kind code of ref document: A1