WO2021249562A1 - Information transmission method, related device, and system - Google Patents

Information transmission method, related device, and system

Info

Publication number
WO2021249562A1
WO2021249562A1 (PCT/CN2021/099866)
Authority
WO
WIPO (PCT)
Prior art keywords
information
encoding
tracking
decoding device
image
Prior art date
Application number
PCT/CN2021/099866
Other languages
French (fr)
Chinese (zh)
Inventor
李龙龙
邸佩云
方华猛
宋翼
邹奕成
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2021249562A1 publication Critical patent/WO2021249562A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102 Methods or arrangements using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/127 Prioritisation of hardware or computational resources
    • H04N 19/134 Methods or arrangements using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/136 Incoming video signal characteristics or properties
    • H04N 19/14 Coding unit complexity, e.g. amount of activity or edge presence estimation
    • H04N 19/154 Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H04N 19/156 Availability of hardware or computational resources, e.g. encoding based on power-saving criteria
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F 3/013 Eye tracking input arrangements
    • G06F 3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures

Definitions

  • the present invention relates to the technical field of video coding and decoding, in particular to an information transmission method, related equipment and system.
  • in CloudVR (cloud virtual reality), high-load tasks such as the rendering required by VR games place a very large demand on computing resources.
  • traditionally, a high-performance game console is required, which leads to the high cost of VR games and cannot meet people's need to play VR games anytime, anywhere.
  • CloudVR technology uses the idea of terminal-cloud collaboration to separate VR game rendering from VR game interaction.
  • VR game interaction is completed in the terminal, and the interaction instructions (pose and position information of the head-mounted display and handles, user operation instructions, etc.) are transmitted to the cloud server through the wireless network.
  • the cloud server completes the rendering of the game according to the received interaction instructions, and transmits the game picture to the terminal for display through the wireless network.
  • CloudVR technology can significantly reduce the cost of VR game terminals and enable users to access the Internet to play VR games anytime, anywhere.
  • the cloud server and the terminal are connected through a wireless network.
  • when the delay is too large, obvious black borders appear in the VR game picture, and unsmooth phenomena such as screen freezes significantly degrade the VR gaming experience.
  • the embodiments of the present application provide an information transmission method, related equipment, and system, which can reduce the delay of the CloudVR system, reduce or even eliminate display artifacts such as black borders and screen freezes, and improve the user experience.
  • the embodiments of the present application provide an information transmission method.
  • the method is described from the perspective of an encoding device and includes: receiving tracking information of a decoding device, where the tracking information includes motion information or pose information of the decoding device; configuring encoding information of an image to be processed according to the tracking information of the decoding device, where the tracking information is associated with the encoding information and the encoding information includes one or more encoding parameters; encoding the image to be processed according to the encoding information; and sending a code stream to the decoding device, where the code stream includes the one or more encoding parameters.
  • the tracking information is obtained by the decoding device by tracking and detecting the motion state of itself or the user; the tracking information includes at least one of motion information and pose information, where the motion information is used to indicate the motion state of the decoding device. In a specific embodiment, the motion information includes the motion speed and/or acceleration of the decoding device, the motion speed includes an angular velocity and/or a linear velocity, and the acceleration includes an angular acceleration and/or a linear acceleration.
  • the pose information is used to indicate the position and/or posture of the decoding device or the user; that is, the pose information may indicate the position and posture (or orientation) of the decoding device in three-dimensional space, and the position may be expressed as coordinates in a three-dimensional coordinate system.
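  • To make this structure concrete, the following minimal Python sketch models the tracking information; the class and field names are illustrative assumptions, not terms from this application.

        from dataclasses import dataclass
        from typing import Optional, Tuple

        @dataclass
        class TrackingInfo:
            """Tracking information reported by the decoding device (illustrative)."""
            # Motion information: how the decoding device (or user) is moving.
            angular_velocity: Optional[float] = None      # rad/s
            linear_velocity: Optional[float] = None       # m/s
            angular_acceleration: Optional[float] = None  # rad/s^2
            linear_acceleration: Optional[float] = None   # m/s^2
            # Pose information: position and posture in three-dimensional space.
            position: Optional[Tuple[float, float, float]] = None     # (x, y, z)
            orientation: Optional[Tuple[float, float, float]] = None  # (yaw, pitch, roll)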
  • the correlation between the tracking information and the encoding information means that there is a corresponding relationship between the tracking information and the encoding information, and the encoding device stores the relationship between the two.
  • there may be a direct mapping relationship between the tracking information and the encoding information; that is, the tracking information is bound to the encoding information, and the encoding information can be determined directly from the tracking information.
  • alternatively, there may be an indirect correlation between the tracking information and the encoding information; for example, certain algorithmic processing or condition judgment must be performed on the tracking information to determine the corresponding encoding information. After the encoding device receives the specific tracking information uploaded by the decoding device, the corresponding encoding information can be determined according to that tracking information.
  • the encoding information includes one or more encoding parameters used by the encoder of the encoding device to encode the image to be processed (also called the image to be encoded). Since the encoder performs the encoding process with these encoding parameters, different encoding parameters entail different amounts of calculation, i.e., different computational complexity, during encoding. That is to say, in this application, the encoding device can adjust its configured encoding parameters based on the tracking information uploaded by the decoding device in real time, thereby adjusting the encoding computational complexity.
  • the encoding device can receive, in real time, instructions from the decoding device that include at least one item of tracking information such as position/posture/linear velocity/angular velocity/acceleration, and the encoding information of the encoder can then be adjusted according to the received tracking information; the adjustment strategy may be to adjust the computational complexity (i.e., the encoding parameters) of the encoder.
  • one of the main components of the delay is the encoding delay in the system.
  • the embodiment of the application adjusts the encoding delay of the encoder by adjusting the encoder's computational complexity, thereby reducing the overall system delay; the image-related information and encoding parameters can subsequently be sent to the decoding device, so that the decoding device can decode and display normally.
  • in this way, this application reduces the computational complexity of the encoder to reduce the system delay, which can greatly reduce or even eliminate the possibility of black borders in the picture, so that after the decoding device receives the code stream, the image can be decoded and displayed in time; this also ensures the smoothness of display on the decoding device and avoids stutters.
  • configuring the encoding information of the image to be processed according to the tracking information of the decoding device specifically includes: querying a preset mapping relationship according to the tracking information to obtain the encoding information of the image to be processed, where the preset mapping relationship includes the mapping relationship between the tracking information and the encoding information; and configuring the encoding information.
  • the preset mapping relationship may be pre-stored in the storage unit of the encoding device, and is used to characterize the mapping relationship between the tracking information and the encoding information.
  • the preset mapping relationship may be a mapping table, which may directly record the mapping relationship between various tracking information and encoding parameters; alternatively, the mapping table may record the mapping relationship between various value ranges of the motion information or pose information and the encoding parameters, so that by determining which value range the specific value in the tracking information falls into, the corresponding encoding parameter can be determined.
  • the encoding device can configure the encoding information (one or more encoding parameters) into the encoder (that is, replace the previously configured encoding parameters), thereby adjusting the encoding parameters of the encoder, i.e., the computational complexity of the encoding process.
  • the embodiment of the present application can realize rapid adjustment of encoding computational complexity by setting the preset mapping relationship, thereby realizing rapid adjustment of encoding delay, which helps eliminate black borders and stutters on the decoding end.
  • technicians can define the specific content of the preset mapping relationship according to actual needs and set it in the encoding device; the embodiments of the present application therefore also offer a variety of preset mapping relationships to suit different application scenarios and meet actual coding needs, as the lookup sketch below illustrates.
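  • As a hedged illustration only, the sketch below implements such a preset mapping as a table keyed on value ranges of one tracking quantity (here an assumed angular velocity); the ranges, threshold, and parameter values are invented for illustration and are not prescribed by this application.

        # Preset mapping: angular-velocity range (rad/s) -> encoding parameters.
        # The ranges and parameter values here are illustrative assumptions.
        PRESET_MAPPING = [
            # (lower bound, upper bound, encoding parameters)
            (1.0, float("inf"), {"deblock_filter": False, "ref": 1, "me_range": 4}),
            (0.0, 1.0,          {"deblock_filter": True,  "ref": 8, "me_range": 32}),
        ]

        def lookup_encoding_info(angular_velocity: float) -> dict:
            """Return the encoding parameters whose value range contains the input."""
            for low, high, params in PRESET_MAPPING:
                if low <= angular_velocity < high:
                    return params
            raise ValueError("no mapping entry for this tracking value")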
  • the one or more encoding parameters include one or more of: deblocking filter (deblock_filter) parameters, number of reference frames (Ref), motion estimation search range (me_range), motion estimation method (me_method), sub-pixel refinement strength (subme), and lookahead optimizer (lookahead) parameters; each is described below, and an example configuration follows the list.
  • the deblocking filter parameter is used to indicate whether to activate the deblock_filter function to perform deblocking filtering on the reconstructed image.
  • the number of reference frames parameter is used to indicate the maximum number of reference frames, that is, the number of reference frames used in image prediction.
  • the motion estimation search range parameter is used to indicate the motion estimation radius in the image prediction, that is, the radius of the pixel block prediction search performed by the encoder.
  • the motion estimation method parameter is used to select the full-pixel motion estimation method, i.e., the motion search algorithm (such as the diamond search algorithm, the hexagon search algorithm, or the asymmetric cross multi-level hexagon grid search algorithm).
  • the sub-pixel refinement strength (subme) parameter is used to control sub-pixel motion estimation and partition mode decision.
  • the lookahead optimizer parameter is used to set the size of the frame buffer used for lookahead (e.g., thread prediction).
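  • Pulling the six parameters together, the following two illustrative Python parameter sets (using this application's parameter names) contrast a low-complexity configuration with a high-complexity one; the concrete values are assumptions chosen to be consistent with the value ranges given in the embodiments below.

        LOW_COMPLEXITY = {            # for fast motion at the decoding end
            "deblock_filter": False,  # deblocking filter off
            "ref": 1,                 # few reference frames (0 < Ref <= 2)
            "me_method": "dia",       # diamond search: cheapest full-pixel search
            "me_range": 4,            # small motion-estimation radius
            "subme": 0,               # minimal sub-pixel refinement
            "lookahead": 0,           # tiny lookahead frame buffer
        }
        HIGH_COMPLEXITY = {           # for slow motion at the decoding end
            "deblock_filter": True,   # deblocking filter on
            "ref": 8,                 # more reference frames (2 < Ref <= 16)
            "me_method": "umh",       # multi-level hexagon grid search: more thorough
            "me_range": 32,           # larger motion-estimation radius
            "subme": 7,               # stronger sub-pixel refinement
            "lookahead": 40,          # larger lookahead frame buffer
        }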
  • in a possible embodiment, when the tracking information is greater than or equal to a preset threshold, the tracking information maps to first encoding information; when the tracking information is less than the preset threshold, the tracking information maps to second encoding information, and the first encoding information and the second encoding information satisfy at least one of the following relationships (a selection sketch follows this list):
  • the deblocking filter parameter in the first encoding information is used to indicate that the deblocking filter is turned off, and the deblocking filter parameter in the second encoding information is used to indicate that the deblocking filter is turned on.
  • for example, when the specific value in the tracking information (such as the value of velocity, acceleration, position, or attitude) is greater than or equal to the preset threshold, the deblocking filter is turned off to reduce coding complexity; when the specific value is less than the preset threshold, the deblocking filter is turned on.
  • the number of reference frames in the first encoded information is smaller than the number of reference frames in the second encoded information.
  • when the specific value in the tracking information (such as the value of velocity, acceleration, position, or posture) is greater than or equal to the preset threshold, 0 < Ref ≤ 2 is configured, thereby reducing the number of reference frames used in encoding prediction.
  • the coding complexity is reduced, thereby reducing the coding delay and avoiding black borders and stutters on the decoding end.
  • when the specific value is less than the preset threshold, 2 < Ref ≤ 16 is configured; the number of reference frames in coding prediction increases, and the coding complexity increases. Because the decoding end is moving slowly, the delay caused by increasing the number of reference frames will not cause black borders, stutters, or similar phenomena.
  • the motion estimation search range in the first coded information is smaller than the motion estimation search range in the second coded information.
  • when the specific value in the tracking information (such as the value of velocity, acceleration, position, or posture) is greater than or equal to the preset threshold, 4 ≤ me_range ≤ 8 is configured, thereby reducing the motion estimation radius in coding prediction.
  • the coding complexity is reduced, and the coding delay is reduced, avoiding black borders and stutters on the decoding end.
  • when the specific value is less than the preset threshold, 8 < me_range ≤ 64 is configured, and the motion estimation radius in coding prediction increases, so the coding complexity increases. Since the motion of the decoding end is slower, the delay caused by increasing the motion estimation radius will not cause black borders, stutters, or similar phenomena.
  • the calculation amount of the motion estimation mode in the first coded information is smaller than the calculation amount of the motion estimation mode in the second coded information.
  • when the specific value is greater than or equal to the preset threshold, a relatively simple motion estimation method is configured, such as the diamond search algorithm (dia); the search algorithm is simple and the amount of calculation is small, so the coding complexity is reduced and the coding delay is reduced, avoiding black borders and stutters on the decoding end.
  • when the specific value is less than the preset threshold, relatively complex motion estimation methods are configured, such as the hexagon search algorithm (hex) or the asymmetric cross multi-level hexagon grid search algorithm (umh), and the amount of calculation increases, that is, the coding complexity increases. Due to the slow motion of the decoding end, the delay caused by the more complex search algorithm will not cause black borders, stutters, or similar phenomena.
  • the sub-pixel refinement strength in the first encoding information is less than the sub-pixel refinement strength in the second encoding information.
  • when the specific values in the tracking information (such as velocity, acceleration, position, or posture) are greater than or equal to the preset threshold, subme is configured to be 0 or 1, thereby reducing coding complexity and coding delay to avoid black borders and stutters at the decoding end.
  • when the specific value is less than the preset threshold, 1 < subme ≤ 11 is configured, which increases coding complexity; since the decoding end moves slowly, the resulting delay will not cause black borders, stutters, or similar phenomena.
  • the lookahead optimizer parameter in the first encoding information is smaller than the lookahead optimizer parameter in the second encoding information.
  • when the specific values in the tracking information (such as velocity, acceleration, position, or posture) are greater than or equal to the preset threshold, 0 ≤ lookahead ≤ 2 is configured to reduce the size of the frame buffer, thereby reducing coding complexity and hence encoding delay, avoiding black borders and stutters at the decoding end.
  • when the specific value is less than the preset threshold, 2 < lookahead ≤ 250 is configured to increase the size of the frame buffer, thereby increasing coding complexity; due to the slower motion of the decoding end, the resulting delay will not cause black borders, stutters, or similar phenomena.
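  • Continuing the earlier sketch, the threshold rule of this embodiment then reduces to a single comparison that selects between the two illustrative parameter sets (the threshold value is again an assumption):

        PRESET_THRESHOLD = 1.0  # assumed angular-velocity threshold, rad/s

        def configure_encoding_info(tracking_value: float) -> dict:
            """Map a tracking value to the first or second encoding information."""
            if tracking_value >= PRESET_THRESHOLD:
                return LOW_COMPLEXITY   # first encoding information: cut encoding delay
            return HIGH_COMPLEXITY      # second encoding information: spend more compute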
  • the tracking information is information generated by the decoding device performing at least one of the following operations: head tracking, gesture tracking, eye tracking, or motion tracking.
  • head tracking tracks head movement by measuring the angle, angular velocity, or angular acceleration of the user's head rotation, thereby triggering a response in the visual picture.
  • gesture tracking tracks hand movement by detecting the posture, shape, movement speed, and direction of the user's hand in the real environment, thereby triggering a response in the visual picture or interaction with picture elements.
  • eye tracking tracks eye movement by measuring the position of the gaze point of the user's eyes or the movement of the eyeball relative to the head.
  • motion tracking tracks the user's motion by measuring the user's position and posture (i.e., pose) and the speed, acceleration, and direction of movement in the real environment. The embodiments of the present application can thus be applied to a variety of tracking scenarios, meeting the needs of users in different scenarios and improving the applicability and commercial value of the present application.
  • the decoding device includes one of a virtual reality (VR) device, an augmented reality (AR) device, a mixed reality (MR) device, or drone flight glasses.
  • the VR device can be VR glasses, a VR headset, a VR box, or another device that applies VR technology; an AR device can be AR glasses, an AR TV, an AR headset, or another device that applies AR technology; an MR device can be MR glasses, an MR terminal, an MR head-mounted display, an MR wearable device, or another device that applies MR technology.
  • the decoding device can be a head-mounted display device (Head Mount Display, HMD); the head-mounted display device and the host (i.e., the encoding device) can communicate and interact wirelessly or by wire: the host encodes the image and transmits it to the head-mounted display device, and the head-mounted display device decodes and displays the image, thereby bringing the user the visual and interactive experience of VR/AR/MR.
  • UAV flight glasses are devices used to interact with the drone's camera; the flight glasses and the drone can communicate and interact wirelessly.
  • the drone encodes the captured image/video and transmits it to the flight glasses; the flight glasses decode and display the image, thereby bringing the user the drone's visual experience, and can even realize control of the drone's flight attitude/shooting direction.
  • an embodiment of the present application provides a device for encoding an image.
  • the device is applied to an encoding device and includes: a receiving module, a parameter adjustment module, an encoding module, and a transmitting module.
  • the receiving module is used to receive the tracking information of the decoding device, where the tracking information includes the motion information or pose information of the decoding device; the parameter adjustment module is used to configure the encoding information of the image to be processed according to the tracking information of the decoding device, where the tracking information is associated with the encoding information and the encoding information includes one or more encoding parameters; the encoding module is used to encode the image to be processed according to the encoding information; and the transmission module is used to send a code stream to the decoding device, where the code stream includes the one or more encoding parameters. A skeleton of these modules is sketched below.
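  • A possible skeleton of the four cooperating modules, in Python; all method and collaborator names (encoder, preset_mapping, link) are placeholders assumed for illustration, not part of this application.

        class ImageEncodingDevice:
            """Illustrative structure of the device for encoding an image."""

            def __init__(self, encoder, preset_mapping):
                self.encoder = encoder
                self.preset_mapping = preset_mapping

            def receive(self, tracking_info):                 # receiving module
                return self.adjust_parameters(tracking_info)

            def adjust_parameters(self, tracking_info):       # parameter adjustment module
                encoding_info = self.preset_mapping.lookup(tracking_info)
                self.encoder.configure(encoding_info)         # replace previous parameters
                return encoding_info

            def encode(self, image):                          # encoding module
                return self.encoder.encode(image)

            def transmit(self, bitstream, encoding_info, link):  # transmission module
                link.send({"bitstream": bitstream, "params": encoding_info})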
  • the tracking information is obtained by the decoding device by tracking and detecting the motion state of itself or the user; the tracking information includes at least one of motion information and pose information, and the motion information is used to indicate the motion state of the decoding device.
  • the motion information includes the motion speed and/or acceleration of the decoding device
  • the motion speed includes angular velocity and/or linear velocity
  • the acceleration includes angular acceleration and/or linear acceleration.
  • the pose information is used to indicate the position and/or posture of the decoding device or the user; that is, the pose information may indicate the position and posture (or orientation) of the decoding device in three-dimensional space.
  • the correlation between the tracking information and the encoding information means that there is a corresponding relationship between the tracking information and the encoding information, and the relationship between the two is stored in the encoding device.
  • there may be a direct mapping relationship between the tracking information and the encoding information; that is, the tracking information and the encoding information are bound, and the parameter adjustment module can determine the encoding information directly from the tracking information.
  • the tracking information and the coded information may be indirectly related.
  • for example, the parameter adjustment module needs to perform certain algorithmic processing or condition judgment on the tracking information to determine the corresponding encoding information; after the receiving module receives the specific tracking information uploaded by the decoding device, the corresponding encoding information can be determined according to that tracking information.
  • the encoding information includes one or more encoding parameters used by the encoder of the encoding device to encode the image to be processed (also called the image to be encoded). Since the encoder performs the encoding process with these encoding parameters, different encoding parameters entail different amounts of calculation, i.e., different computational complexity, during encoding. That is to say, in this application, the encoding device can adjust its configured encoding parameters based on the tracking information uploaded by the decoding device in real time, thereby adjusting the encoding computational complexity.
  • the device of the embodiment of the present application can receive, in real time, instructions fed back by the decoding device that include at least one item of tracking information such as position/posture/linear velocity/angular velocity/acceleration, and adjust the encoding information of the encoder in the encoding device according to the received tracking information.
  • the adjustment strategy can be to adjust the computational complexity (i.e., the encoding parameters) of the encoder, thereby reducing the overall system delay; the image-related information and encoding parameters are subsequently sent to the decoding device so that the decoding device can decode and display normally, which can greatly reduce or even eliminate the possibility of black borders on the screen, ensure the smoothness of display on the decoding device, and avoid stutters.
  • the parameter adjustment module is specifically configured to: query a preset mapping relationship according to the tracking information to obtain the encoding information of the image to be processed, where the preset mapping relationship includes the mapping relationship between the tracking information and the encoding information; and configure the encoding information.
  • the one or more encoding parameters include one or more of: deblocking filter parameters, number of reference frames, motion estimation search range, motion estimation method, sub-pixel refinement strength, and lookahead optimizer parameters.
  • when the tracking information is greater than or equal to a preset threshold, the tracking information is mapped to first encoding information; when the tracking information is less than the preset threshold, the tracking information is mapped to second encoding information, and the first encoding information and the second encoding information satisfy at least one of the following relationships:
  • the deblocking filter parameter in the first encoding information is used to indicate that the deblocking filter is turned off, and the deblocking filter parameter in the second encoding information is used to indicate that the deblocking filter is turned on;
  • the number of reference frames in the first coded information is smaller than the number of reference frames in the second coded information; the motion estimation search range in the first coded information is smaller than the motion estimation search range in the second coded information;
  • the calculation amount of the motion estimation method in the first encoding information is less than the calculation amount of the motion estimation method in the second encoding information;
  • the sub-pixel refinement strength in the first encoding information is smaller than the sub-pixel refinement strength in the second encoding information;
  • the lookahead optimizer parameter in the first encoding information is smaller than the lookahead optimizer parameter in the second encoding information.
  • the tracking information is information generated by the decoding device performing at least one of the following operations: head tracking, gesture tracking, eye tracking, or motion tracking.
  • the motion information of the decoding device includes the motion speed and/or acceleration of the decoding device, the motion speed includes angular velocity and/or linear velocity, and the acceleration includes angular acceleration And/or linear acceleration.
  • the decoding device includes one of a virtual reality VR device, an augmented reality AR device, a mixed reality MR device, or drone flight glasses.
  • the functional modules of the device can cooperate with each other to implement the methods described in the related embodiments of the first aspect.
  • an embodiment of the present application provides a device for encoding an image.
  • the device may be an encoding device.
  • the encoding device includes a memory, a processor, and a transceiver; each of the memory, the processor, and the transceiver may be connected by a bus, or at least two of these components may be coupled together, where:
  • the transceiver is used to receive data from the outside world and send data to the outside world;
  • the memory is used to store program instructions and data
  • the processor is configured to execute program instructions in the memory to implement the method described in the first aspect or any possible embodiment of the first aspect.
  • an embodiment of the present application provides a system that includes an encoding device and a decoding device, wherein: the decoding device is configured to send tracking information of the decoding device to the encoding device, where the tracking information includes the motion information or pose information of the decoding device; and the encoding device is configured to configure the encoding information of the image to be processed according to the tracking information of the decoding device, where the tracking information is associated with the encoding information and the encoding information includes one or more encoding parameters, to encode the image to be processed according to the encoding information, and to send a code stream to the decoding device, where the code stream includes the one or more encoding parameters.
  • the decoding device is used to decode and display the image according to the code stream.
  • the encoding device may be the encoding device described in any embodiment of the second aspect or the third aspect.
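  • The per-frame interaction of such a system can be summarized as one round trip: tracking information up, a code stream configured from it down. A minimal Python sketch, assuming the device interfaces named here exist:

        def cloud_vr_session(decoding_device, encoding_device, frames):
            """One uplink/downlink round trip per frame (all interfaces assumed)."""
            for image in frames:
                tracking_info = decoding_device.sample_tracking()           # uplink
                encoding_info = encoding_device.adjust_parameters(tracking_info)
                bitstream = encoding_device.encode(image)    # complexity matched to motion
                decoded = decoding_device.decode(bitstream, encoding_info)  # downlink
                decoding_device.display(decoded)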
  • an embodiment of the present application provides a computing node cluster (or cloud cluster), including at least one computing node, where each computing node includes a processor and a memory, and the processor executes code in the memory to perform the method according to any one of the embodiments of the first aspect.
  • an embodiment of the present invention provides a non-volatile computer-readable storage medium; the computer-readable storage medium is used to store implementation code of the method described in the first aspect.
  • when the program code is executed by a computer, the computer implements the method described in any one of the embodiments of the first aspect.
  • an embodiment of the present invention provides a computer program product; the computer program product includes program instructions, and when the computer program product is executed by a computer, the computer executes the method described in any one of the embodiments of the first aspect.
  • the computer program product may be a software installation package.
  • the computer program product may be downloaded and executed on the computer to implement the method described in any embodiment of the first aspect.
  • in summary, the encoding device can receive, in real time, instructions fed back by the decoding device that contain at least one type of information such as position/posture/linear velocity/angular velocity/acceleration, and automatically adjust the computational complexity (encoding parameters) of the encoder according to the position/posture/linear velocity/angular velocity/acceleration of the decoding device, thereby adjusting the encoding delay and reducing the overall system delay; the image-related information and the configured encoding parameters can then be sent to the decoding device so that the decoding device can decode and display normally.
  • the present application reduces the system delay by reducing the computational complexity of the encoding device in the encoding process, which can fundamentally eliminate the possibility of black borders on the screen, ensure the smoothness of display on the decoding device, and avoid stutters.
  • FIG. 1 is a block diagram of an example video decoding system 10 provided by an embodiment of the present application.
  • FIG. 2 is an example diagram of a device experience scenario applied in an embodiment of the present application.
  • FIG. 3 is an example diagram of yet another device experience scenario applied in an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a video decoding device provided by an embodiment of the present application.
  • FIG. 5 is a simplified block diagram of a device that can be used as either or both of a source device and a destination device according to an embodiment of the present application.
  • FIG. 6 is an example diagram of a head-turning scene of a user wearing a device provided by an embodiment of the present application.
  • FIG. 7 is an example diagram of a black border phenomenon provided by an embodiment of the present application.
  • FIG. 8 is an example flow chart of an information transmission solution provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of four tracking modes for realizing interaction between users and screens involved in an embodiment of the present application.
  • FIG. 10 is a schematic flowchart of an information transmission method provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of some search templates provided by embodiments of the present application.
  • FIG. 12 is a logical schematic diagram of determining encoding information according to tracking information according to an embodiment of the present application.
  • FIG. 13 is a schematic flowchart of yet another information transmission method provided by an embodiment of the present application.
  • FIG. 14 is an example flowchart of another information transmission solution provided by an embodiment of the present application.
  • FIG. 15 is a structural diagram of a system provided by an embodiment of the present application and an encoding device and a decoding device in the system.
  • At least one (item) refers to one or more, and “multiple” refers to two or more.
  • "And/or" describes an association relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean: only A, only B, or both A and B, where A and B can be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following items" or similar expressions refers to any combination of these items, including any combination of a single item or multiple items.
  • For example, at least one of a, b, or c can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can each be single or multiple.
  • Video coding generally refers to a technology that processes a sequence of pictures that form a video or video sequence.
  • video coding as used herein may include video encoding and video decoding.
  • Video encoding is performed on the source side, and usually includes processing (for example, by compressing) the original video picture to reduce the amount of data required to represent the video picture, so as to store and/or transmit more efficiently.
  • Video decoding is performed on the destination side, and usually includes inverse processing relative to the encoder to reconstruct the video picture.
  • the “encoding” of video pictures involved in the embodiments should be understood as involving the “encoding” or “decoding” of the video sequence.
  • the combination of the encoding part and the decoding part is also called codec (encoding and decoding).
  • the term "video coder” generally refers to both video encoders and video decoders.
  • the term "video coding” or “coding” may generally refer to video encoding or video decoding.
  • FIG. 1 is a block diagram of a video decoding system 10 according to an example described in an embodiment of the present invention.
  • the video coding system 10 may include a source device 12 and a destination device 14.
  • the source device 12 generates encoded video data. Therefore, the source device 12 may be referred to as a video encoding device.
  • the destination device 14 can decode the encoded video data generated by the source device 12, and therefore, the destination device 14 can be referred to as a video decoding device.
  • Various implementations of source device 12, destination device 14, or both may include one or more processors and memory coupled to the one or more processors.
  • the memory may include, but is not limited to, RAM, ROM, EEPROM, flash memory, or any other medium that can be used to store desired program codes in the form of instructions or data structures that can be accessed by a computer.
  • the source device 12 and the destination device 14 may communicate with each other via a link 13, and the destination device 14 may receive encoded video data from the source device 12 via the link 13.
  • Link 13 may include one or more media or devices capable of moving encoded video data from source device 12 to destination device 14.
  • link 13 may include one or more communication media that enable source device 12 to transmit encoded video data directly to destination device 14 in real time.
  • the source device 12 may modulate the encoded video data according to a communication standard, such as a wireless communication protocol, and may transmit the modulated video data to the destination device 14.
  • the one or more communication media may include wireless and/or wired communication media, such as a radio frequency (RF) spectrum or one or more physical transmission lines.
  • RF radio frequency
  • the one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (e.g., the Internet).
  • the one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from source device 12 to destination device 14.
  • the source device 12 and the destination device 14 may include various devices, and the existence and (accurate) division of the functionality of the source device 12 and/or the destination device 14 may vary according to actual devices and applications.
  • At least one of the source device 12 and the destination device 14 may include a desktop computer, a mobile computing device, a notebook (e.g., laptop) computer, a tablet computer, a set-top box, a mobile phone, a smartphone, a television, a camera, a display device, a digital media player, a video game console, video streaming equipment (such as a content service server or content distribution server), broadcast receiver equipment, broadcast transmitter equipment, a vehicle-mounted device, a mobile vehicle, or the like.
  • the solution of the present application can be applied to an immersive virtual visual experience scene.
  • the source device 12 may be a host, which may be an independent terminal, a computing device, a physical server, or a cloud computing (cloud computing) platform.
  • the destination device 14 may be a virtual reality (Virtual Reality, VR) device, an augmented reality (Augmented Reality, AR) device, a mixed reality (Mixed Reality, MR) device, or the like.
  • the VR device can be VR glasses, a VR headset, a VR box, or another device that applies VR technology; an AR device can be AR glasses, an AR TV, an AR headset, or another device that applies AR technology; an MR device can be MR glasses, an MR terminal, an MR head-mounted display, an MR wearable device, or another device that applies MR technology.
  • the destination device 14 can be a head-mounted display device (Head Mount Display, HMD); the head-mounted display device and the host can communicate and interact wirelessly or by wire: the host encodes the image and transmits it to the head-mounted display device, and the head-mounted display device decodes the image and displays it, thereby bringing the user the visual and interactive experience of VR/AR/MR.
  • the head-mounted display device may be, for example, a mobile-end headset or a host-end headset.
  • a mobile-end headset, such as VR/AR/MR glasses or a VR/AR/MR mobile-phone box, can be connected to the host wirelessly (such as via Bluetooth, WiFi, or a mobile network).
  • the host-side headset can also be called an external head-mounted device, which requires a wired connection to the host and other accessories for use.
  • the computing function of the host can also be integrated into the head-mounted display device.
  • the head-mounted display device can be an all-in-one headset, which has an independent display device (as the decoding end) and a computing unit (as the encoding end); the two complete the communication interaction inside the all-in-one headset.
  • the solution of the present application can also be applied to the control or visual experience scenes of unmanned vehicles.
  • the source device 12 may be a drone, an unmanned car (not shown), etc., and the source device 12 may be equipped with a camera for image capture and encoding.
  • the destination device 14 may be a drone's flying glasses, an unmanned car control device (not shown), or the like.
  • Figure 3 shows the scene of interaction between the flying glasses and the drone.
  • the flying glasses and the drone can communicate and interact wirelessly.
  • the drone encodes the captured image/video and transmits it to the flying glasses.
  • the flying glasses decode the image and display it, bringing the user the drone's visual experience, and can even realize control of the drone's flight attitude/shooting direction.
  • the source device 12 includes an encoder 20, and optionally, the source device 12 may also include a picture source 16, a picture preprocessor 18, and a communication interface 22.
  • the encoder 20, the picture source 16, the picture preprocessor 18, and the communication interface 22 may be hardware components in the source device 12, or may be software programs in the source device 12. They are described as follows:
  • the picture source 16, which can include or be any type of picture capture device, for example for capturing real-world pictures or videos, and/or any type of picture generating device (for screen content encoding, some text on the screen is also considered part of the picture or image to be encoded), for example a computer graphics processor for generating computer animation pictures, or any type of device for acquiring and/or providing real-world pictures (such as images taken by a camera), computer animation pictures (for example, screen content or VR pictures), and/or any combination thereof (for example, AR/MR pictures).
  • the picture source 16 may be a camera for capturing pictures or a memory for storing pictures.
  • the picture source 16 may also include any type (internal or external) interface for storing previously captured or generated pictures and/or acquiring or receiving pictures.
  • the terms "picture”, "frame” or “image” can be used as synonyms.
  • when the picture source 16 is a camera, the picture source 16 may be, for example, a local camera or a camera integrated in the source device; when the picture source 16 is a memory, the picture source 16 may be, for example, a local memory or a memory integrated in the source device.
  • the picture source 16 includes an interface, the interface may be, for example, an external interface for receiving pictures from an external video source.
  • the external video source is, for example, an external picture capturing device such as a camera, an external memory, or an external picture generating device such as an external computer graphics processor, computer, or server.
  • the interface can be any type of interface based on any proprietary or standardized interface protocol, such as a wired or wireless interface, and an optical interface.
  • the picture transmitted from the picture source 16 to the picture preprocessor may also be referred to as original picture data 17.
  • the picture preprocessor 18 is configured to receive the original picture data 17 and perform preprocessing on the original picture data 17 to obtain the preprocessed picture 19 or the preprocessed picture data 19.
  • the pre-processing performed by the picture pre-processor 18 may include one or more of image rendering, trimming, color format conversion, toning, or denoising.
  • the encoder 20 is configured to receive the pre-processed picture data 19, and process the pre-processed picture data 19 using the configured coding prediction mode and coding parameters, so as to provide the coded picture data 21.
  • picture data can be divided into a set of non-overlapping blocks (also called image blocks or video blocks); that is, the current image to be processed by the encoder 20 may include one or more blocks at the block level.
  • the encoder 20 can perform encoding at the block level.
  • the term "image to be processed” may refer to a part of a picture or a frame. Specifically, the "image to be processed” may be a "image block to be processed", that is, a block currently to be processed.
  • in encoding, the image to be processed may include the block currently to be encoded; in decoding, the image to be processed may include the block currently to be decoded.
  • the prediction block is generated through spatial (intra-picture) prediction and temporal (inter-picture) prediction, and the prediction block is subtracted from the current block (the block currently being processed or to be processed) to obtain the residual block; the residual block is transformed in the transform domain and quantized to reduce the amount of data to be transmitted (compression), while the decoder side applies the inverse processing relative to the encoder to the coded or compressed block to reconstruct the current block for representation.
  • in addition, the encoder duplicates the decoder processing loop, so that the encoder and the decoder generate the same predictions (for example, intra-frame prediction and inter-frame prediction) and/or reconstructions, which are used for processing, i.e., coding, subsequent blocks. A pseudocode sketch of this loop follows.
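  • A hedged pseudocode rendering of this hybrid coding loop, in Python; the helper functions are placeholders for illustration, not a real codec API.

        def encode_block(current_block, reference_frames):
            """Sketch of the hybrid block-coding loop described above."""
            prediction = predict(current_block, reference_frames)  # intra- or inter-picture
            residual = current_block - prediction                  # prediction error
            coefficients = quantize(transform(residual))           # compress the residual
            bits = entropy_code(coefficients)
            # The encoder duplicates the decoder processing loop so that both
            # sides predict subsequent blocks from the same reconstruction.
            reconstruction = prediction + inverse_transform(dequantize(coefficients))
            reference_frames.update(reconstruction)
            return bits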
  • the encoder 20 may be used to implement the various embodiments described below to realize the application of the information transmission method described in the present invention on the encoding side.
  • the communication interface 22 can be used to receive the encoded picture data 21, and can transmit the encoded picture data 21 to the destination device 14 or any other device (such as a memory) through the link 13 for storage or direct reconstruction, so The other device can be any device used for decoding or storage.
  • the communication interface 22 can be used, for example, to encapsulate the encoded picture data 21 into a suitable format, such as a data packet, for transmission on the link 13.
  • the destination device 14 includes a decoder 30, and optionally, the destination device 14 may also include a communication interface 28, a picture post-processor 32, and a display device 34. They are described as follows:
  • the communication interface 28 can be used to receive the encoded picture data 21 from the source device 12 or any other source, for example, a storage device, and the storage device is, for example, an encoded picture data storage device.
  • the communication interface 28 can be used to transmit or receive the encoded picture data 21 via the link 13 between the source device 12 and the destination device 14 or via any type of network.
  • the link 13 is, for example, a direct wired or wireless connection.
  • the type of network is, for example, a wired or wireless network or any combination thereof, or any type of private network and public network, or any combination thereof.
  • the communication interface 28 may be used, for example, to decapsulate the data packet transmitted by the communication interface 22 to obtain the encoded picture data 21.
  • both the communication interface 28 and the communication interface 22 can be configured as one-way or two-way communication interfaces, and can be used, for example, to send and receive messages to establish a connection, and to confirm and exchange any other information related to the communication link and/or the data transmission, such as the transmission of encoded picture data.
  • the decoder 30 is configured to receive the encoded picture data 21 and parse the indication information transmitted in the code stream, where the indication information indicates the encoding parameters used when the encoder 20 encoded the image; based on the encoded picture data 21 and the indication information, image decoding can be performed, thereby providing decoded picture data 31 (also referred to as reconstructed picture data).
  • the decoder 30 may be used to implement the various embodiments described below to realize the application of the information transmission method described in the present invention on the decoding side.
  • the picture post processor 32 is configured to perform post-processing on the decoded picture data 31 to obtain the post-processed picture data 33.
  • the post-processing performed by the picture post-processor 32 may include one or more of rendering, color format conversion, toning, trimming, resampling, or any other processing; the picture post-processor 32 may also be used to transmit the post-processed picture data 33 to the display device 34.
  • the decoding device can also adjust one or more processing algorithms used by the picture post-processor 32 according to the tracking information (such as speed, angular velocity, acceleration, linear velocity, position, posture, etc.), for example standard dynamic range (SDR) image algorithms, high dynamic range (HDR) image algorithms, image enhancement algorithms, image super-resolution algorithms, and so on.
  • the display device 34 is configured to receive the post-processed picture data 33 to display the picture to, for example, a user or a viewer.
  • the display device 34 may be or may include any type of display for presenting reconstructed pictures, for example, an integrated or external display or monitor.
  • the display may include a liquid crystal display (LCD), an organic light emitting diode (OLED) display, a plasma display, a projector, a micro LED display, a liquid crystal on silicon (LCoS), Digital light processor (digital light processor, DLP) or any type of other display.
  • although FIG. 1 shows the source device 12 and the destination device 14 as separate devices, a device embodiment may also include both the source device 12 and the destination device 14 or the functionality of both, that is, the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality.
  • the same hardware and/or software may be used, or separate hardware and/or software, or any combination thereof may be used to implement the source device 12 or the corresponding functionality and the destination device 14 or the corresponding functionality .
  • both the encoder 20 and the decoder 30 can be implemented as any of various suitable circuits, for example, one or more microprocessors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), discrete logic, hardware, or any combination thereof.
  • the device can store the instructions of the software in a suitable non-transitory computer-readable storage medium, and can use one or more processors to execute the instructions in hardware to perform the technology of the present disclosure; any of the foregoing (including hardware, software, a combination of hardware and software, etc.) can be regarded as one or more processors.
  • the video decoding system 10 shown in FIG. 1 is only an example, and the technology of this application can be applied to video coding settings (for example, video encoding or video decoding) that do not necessarily include any data communication between the encoding and decoding devices.
  • the data can be retrieved from local storage, streamed on the network, etc.
  • the video encoding device can encode data and store the data to the memory, and/or the video decoding device can retrieve the data from the memory and decode the data.
  • encoding and decoding are performed by devices that do not communicate with each other but only encode data to the memory and/or retrieve data from the memory and decode the data.
  • FIG. 4 is a schematic structural diagram of a video decoding device 400 (for example, a video encoding device 400 or a video decoding device 400) provided by an embodiment of the present application.
  • the video coding device 400 is suitable for implementing the embodiments described herein.
  • the video coding device 400 may be a video decoder (for example, the decoder 30 of FIG. 1) or a video encoder (for example, the encoder 20 of FIG. 1).
  • the video coding device 400 may be one or more components of the decoder 30 in FIG. 1 or the encoder 20 in FIG. 1 described above.
  • the video decoding device 400 includes: an ingress port 410 and a receiver unit (Rx) 420 for receiving data; a processor, logic unit, or central processing unit (CPU) 430 for processing data; a transmitter unit (Tx) 440 and an egress port 450 for transmitting data; and a memory 460 for storing data.
  • the video decoding device 400 may further include optical-to-electrical (OE) components and electro-optical (EO) components coupled with the ingress port 410, the receiver unit 420, the transmitter unit 440, and the egress port 450 for the egress or ingress of optical or electrical signals.
  • the processor 430 is implemented by hardware and software.
  • the processor 430 may be implemented as one or more CPU chips, cores (for example, multi-core processors), FPGAs, ASICs, and DSPs.
  • the processor 430 communicates with the ingress port 410, the receiver unit 420, the transmitter unit 440, the egress port 450, and the memory 460.
  • the processor 430 includes a decoding module 470 (for example, an encoding module 470 or a decoding module 470).
  • the encoding/decoding module 470 implements the embodiments disclosed herein, that is, what is provided in the embodiments of the present invention. For example, the encoding/decoding module 470 implements, processes, or provides various encoding operations.
  • the encoding/decoding module 470 provides a substantial improvement to the function of the video decoding device 400 and affects the conversion of the video decoding device 400 to different states.
  • the encoding/decoding module 470 is implemented by instructions stored in the memory 460 and executed by the processor 430.
  • the memory 460 includes one or more magnetic disks, tape drives, and solid-state drives, and can be used as an overflow data storage device to store programs when these programs are selectively executed, and to store instructions and data read during program execution.
  • the memory 460 may be volatile and/or non-volatile, and may be read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM), and/or static random access memory (SRAM).
  • FIG. 5 is a simplified block diagram of an apparatus 500 that can be used as either or both of the source device 12 and the destination device 14 in FIG. 1, according to an exemplary embodiment.
  • the apparatus 500 may take the form of a computing system containing multiple computing devices (such as multiple computing chips or multiple servers), or the form of a single computing device such as a desktop computer, mobile computing device, notebook computer, tablet computer, set-top box, mobile phone, smartphone, television, camera, display device, digital media player, video game console, video streaming device, broadcast receiver device, broadcast transmitter device, vehicle-mounted device, mobile vehicle, or the like.
  • the processor 502 in the device 500 may be a central processing unit.
  • alternatively, the processor 502 may be any other type of device, or multiple devices, capable of manipulating or processing information, whether now existing or developed in the future.
  • although the disclosed implementations can be practiced with a single processor such as the processor 502, the use of more than one processor may achieve advantages in terms of speed and efficiency.
  • the memory 504 in the apparatus 500 may be a read only memory (Read Only Memory, ROM) device or a random access memory (random access memory, RAM) device. Any other suitable type of storage device can be used as the memory 504.
  • the memory 504 may include code and data 506 accessed by the processor 502 using the bus 512.
  • the memory 504 may further include an operating system 508 and an application program 510, and the application program 510 includes at least one program that permits the processor 502 to execute the method described herein.
  • the application program 510 may include applications 1 to N, which further include a video encoding application that performs the methods described herein, such as AR/VR/MR applications, drone flight/shooting control applications, autonomous driving control applications, and more.
  • the apparatus 500 may also include additional memory in the form of a secondary memory 514, which may be, for example, a memory card used with a mobile computing device. Because a video communication session may contain a large amount of information, this information may be stored in whole or in part in the secondary memory 514, and loaded into the memory 504 for processing as needed.
  • the apparatus 500 may also include one or more output devices, such as a display 518.
  • the display 518 may be a touch-sensitive display that combines a display and a touch-sensitive element operable to sense touch input.
  • the display 518 may be coupled to the processor 502 through the bus 512.
  • other output devices that allow the user to program the device 500 or use the device 500 in other ways may also be provided, or other output devices may be provided as an alternative to the display 518.
  • the display can be implemented in different ways, including a liquid crystal display (LCD), a cathode-ray tube (CRT) display, a plasma display, or a light emitting diode (LED) display, such as an organic LED (OLED) display.
  • the apparatus 500 may also include or be connected to an image sensing device 520, such as a camera, an infrared detector, or any other image sensing device that can sense an image, whether now existing or developed in the future.
  • The image sensing device 520 may be placed directly facing the user operating the apparatus 500, or may be placed facing the external environment. In an example, the position and optical axis of the image sensing device 520 may be configured such that its field of view includes an area immediately adjacent to the display 518 from which the display 518 is visible.
  • the apparatus 500 When the apparatus 500 is the destination device 14, it may optionally further include a motion sensing device 522, and the motion sensing device 522 may be used to realize the interaction between the user and the destination device.
  • the motion sensing device 522 can be used to detect at least one type of information such as the location/posture/linear velocity/angular velocity/acceleration of the destination device or of the user's body parts, so as to implement the tracking methods described in the embodiments of the present application: head tracking, gesture tracking, eye tracking, and motion tracking.
  • For example, to implement head tracking, the motion sensing device 522 may include at least one sensor such as an accelerometer, a gyroscope, a magnetometer, an optical capture device, or an inertial sensor, so as to monitor in real time at least one type of information such as the rotation angle, angular velocity, angular acceleration, and rotation direction of the head of the user wearing the destination device.
  • To implement gesture tracking, the motion sensing device 522 may include at least one device such as an accelerometer, gyroscope, magnetometer, inertial sensor, or an optical capture device such as an optical camera, infrared camera, or depth sensor, so as to monitor in real time at least one kind of information such as the posture, shape, movement speed, and movement direction of the user's hand.
  • To implement eye tracking, the motion sensing device 522 may include at least one device such as a built-in camera, an eye tracker, an infrared controller, or an iris image detector, so as to monitor in real time at least one kind of information such as the user's eyeball position, gaze direction, movement direction, and movement speed.
  • To implement motion tracking, the motion sensing device 522 may include at least one device such as an inertial measurement unit (IMU), an accelerometer, a gyroscope, a magnetometer, a depth camera, or a simultaneous localization and mapping (SLAM) system, so as to monitor at least one type of information such as the speed, acceleration, direction, position, and posture of the user moving in the real environment.
  • although the processor 502 and the memory 504 of the apparatus 500 are shown in FIG. 5 as integrated in a single unit, other configurations may also be used.
  • the operation of the processor 502 may be distributed in multiple directly coupled machines (each machine has one or more processors), or distributed in a local area or other network.
  • the memory 504 may be distributed across multiple machines, such as network-based storage or storage in multiple machines running the apparatus 500.
  • the bus 512 of the device 500 may be formed by multiple buses.
  • the secondary memory 514 may be directly coupled to other components of the apparatus 500 or accessed through a network, and may include a single integrated unit such as a memory card, or multiple units such as multiple memory cards. The apparatus 500 can therefore be implemented in a wide variety of configurations.
  • the existing VR/AR/MR technology allows users to obtain an immersive visual experience and satisfy the interaction between the user and the screen.
  • Current VR glasses can generally produce a field of view (FOV) exceeding 90 degrees (for example, 90-120 degrees).
  • This relies on two aspects. The first is magnified display technology: a magnified partial virtual scene can be displayed in front of the user's eyes, and within this display range, real-time three-dimensional images can be generated through three-dimensional engine technology.
  • The second is to use the data collected by the head position and attitude sensor (such as a head gyroscope), so that the three-dimensional engine responds to the head rotation direction (and to changes in the current head position).
  • When the head turns, the gyroscope can notify the image generation engine to render a new picture accordingly; the image generation engine sends the new picture back to the VR glasses, and the VR glasses update the displayed three-dimensional images in real time.
  • Ideally, the angle of the user's head rotation corresponds exactly to the visual 3D image simulated by the 3D engine, making the user feel as if they are observing a surrounding virtual 3D world through a large window. Because the user's head rotation produces a picture change that the user can understand in the virtual world, the user perceives that the virtual world gives feedback; the user's actions and the virtual world's feedback together form the interaction effect.
  • the image generation engine is located on the cloud server, that is, the game screen rendering is performed on the server side, and the game interaction is performed on the VR glasses side.
  • the server and the VR glasses are connected through a wireless network. After rendering, the new image is transmitted through wireless transmission.
  • As shown in FIG. 6, after the user puts on the VR glasses, assume that the current field of view is "field of view 1". When the user turns their head by a certain angle, the field of view rotates from "field of view 1" to "field of view 2". If the picture is not updated in time, the human eye may perceive a black area at the edge of "field of view 2", that is, black edges. Part (1) of FIG. 7 shows a VR scene without black borders, and part (2) of FIG. 7 shows a VR scene with black borders.
  • the devices/devices described in Figures 1 to 4 of the embodiments of the present application can solve the defects of the prior art, and can simultaneously avoid the occurrence of black borders, freezes and other unsmooth phenomena at the decoding end, and can also ensure the image resolution.
  • the CloudVR system includes two parts: a cloud server (equivalent to the source device 12 described in this application) and a VR device (equivalent to the destination device 14 described in this application).
  • After the server receives an instruction containing information such as the posture/rotational angular velocity/acceleration fed back by the VR device, the game rendering engine renders the corresponding image (the rendering resolution can remain unchanged), and the image, together with the posture/rotational angular velocity/acceleration information, is sent to the encoder.
  • the encoder automatically adjusts its computational complexity (encoding parameters) by judging information such as the rotation speed/acceleration/posture of the VR device, so as to adjust the encoding delay of the encoder.
  • After encoding, the image-related information can be sent to the VR device for decoding and display.
  • When the VR device rotates fast, the computational complexity of the encoder is reduced, thereby reducing the encoding delay of the encoder and the CloudVR system delay, and thus eliminating the possibility of black edges on the screen; when the VR device rotates slowly, the computational complexity of the encoder returns to normal, and the system delay returns to normal. Since the VR device in this solution can decode and display images in real time, the smoothness of the display of the VR device is also ensured and stuttering is avoided.
  • The computational complexity of encoding mentioned herein is determined by the texture complexity and motion complexity of the video image.
  • the destination device is a VR device as an example.
  • the posture/rotational angular velocity/acceleration of the VR device can be detected by head tracking.
  • the information that the VR device feeds back to the cloud server may not be limited to information obtained by head tracking, but may also be information obtained by gesture tracking, or information obtained by eye tracking, or information obtained by motion tracking.
  • Head tracking is to track the head movement by measuring the angle, angular velocity or angular acceleration when the user's head rotates, thereby triggering the response of the visual picture.
  • Through sensors inside the destination device such as accelerometers, gyroscopes, magnetometers, optical capture devices, and inertial sensors, information such as the rotation angle, angular velocity, angular acceleration, and rotation direction of the head of the user wearing the destination device can be monitored in real time.
  • The result of head tracking is that when the user puts on the destination device (such as a VR device) and turns their head, the picture they see moves with the movement of the head, simulating the scene in which the user turns their head and sees a new picture, so as to obtain an immersive visual experience.
  • Gesture tracking is to track the movement of the hand by detecting the posture, shape, movement speed, and direction of the user's hand in the real environment, thereby triggering the response of the visual screen or triggering the interaction with the screen elements.
  • The use of gesture tracking can be divided into two ways. One is the contact detection method, in which the user's hand is bound to a sensor (for example, a data glove worn on the hand, or a handheld device); the sensor can be an accelerometer, gyroscope, magnetometer, inertial sensor, etc., so as to monitor information such as the user's hand posture, shape, movement speed, and movement direction in real time.
  • The other is a non-contact detection method, which identifies the posture, shape, movement speed, and movement direction of the user's hand by configuring optical capture devices such as optical cameras, infrared cameras, and depth sensors in the destination device. Gesture tracking lets users participate directly and interact with the screen content, enhancing the user experience.
  • Eye tracking is to track the eye movement by measuring the position of the gaze point of the user's eyes or the movement of the eyeball relative to the head.
  • Through devices inside the destination device such as built-in cameras, eye trackers, infrared controllers, and iris image detectors, information such as the position of the user's eyeballs, gaze direction, movement direction, and movement speed can be tracked in real time through certain algorithms (such as video-based eye recording and corneal reflection).
  • The result of eye tracking is that when the user wears the destination device (such as a VR device), the picture they see moves with the movement of their eyes, simulating the scene in which the user moves their eyes to see a new picture, thereby obtaining an immersive visual experience.
  • Motion tracking tracks the user's movement by measuring the user's position and posture (i.e., pose) in the real environment, as well as the speed, acceleration, and direction of their movement in the real environment.
  • For example, an inertial measurement unit (IMU) containing an accelerometer, gyroscope, or magnetometer can be used to measure information such as the speed, acceleration, and direction of the user's movement in the real environment; a depth camera or a simultaneous localization and mapping (SLAM) system can also be used to identify changes in the real environment, so as to determine changes in the user's own motion and real-time position. Motion tracking can also trigger updates of the visual picture, or trigger the user's interaction with picture elements.
  • In the embodiments of the present application, the tracking method used can be any one of the tracking methods described above, or a combination of multiple tracking methods, such as a combination of head tracking and eye tracking, or a combination of gesture tracking and motion tracking; this application does not limit this.
  • FIG. 10 is a schematic flowchart of an information transmission method provided by an embodiment of the present invention. The method includes but is not limited to the following steps:
  • S101: The decoding device detects and obtains tracking information.
  • the tracking information is information generated by performing at least one of the following tracking methods on the decoding device: head tracking, gesture tracking, eye tracking, or motion tracking.
  • The tracking information of the decoding device includes at least one of motion information and pose information generated when the decoding device or the user's limbs move or rotate; it can include the motion information, the pose information, or both at the same time.
  • The motion information may include velocity (linear velocity, angular velocity, etc.) and/or acceleration (linear acceleration, angular acceleration, etc.), and the pose information may include the position and/or posture (or direction) of the decoding device or the user.
  • The pose information can represent the position and posture (or direction) of the decoding device in three-dimensional space.
  • For example, the position can be represented by the three coordinate axes x, y, and z of a three-dimensional coordinate system, and the direction can be represented by (α, β, γ), where (α, β, γ) denotes the angles of rotation around the three coordinate axes, as sketched in the example below.
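  • As an illustration only, the tracking information and pose described above could be modeled as the following minimal Python sketch; all field and type names are hypothetical, not taken from the original text:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Pose:
    # Position on the three coordinate axes of a 3D coordinate system.
    x: float
    y: float
    z: float
    # Direction as rotation angles (alpha, beta, gamma) around the three axes.
    alpha: float
    beta: float
    gamma: float

@dataclass
class TrackingInfo:
    # Motion information; any field may be absent, since the tracking
    # information only needs to contain at least one kind of data.
    linear_velocity: Optional[Tuple[float, float, float]] = None
    angular_velocity: Optional[Tuple[float, float, float]] = None
    linear_acceleration: Optional[Tuple[float, float, float]] = None
    angular_acceleration: Optional[Tuple[float, float, float]] = None
    # Pose information of the decoding device or the user.
    pose: Optional[Pose] = None
```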
  • S102: The decoding device sends its tracking information to the encoding device; correspondingly, the encoding device receives the tracking information of the decoding device.
  • S103: The encoding device configures the encoding information of the image to be processed according to the tracking information of the decoding device.
  • The image to be processed is an image that currently needs to be processed and transmitted to the decoding device side for display/interaction.
  • the tracking information is associated with the encoding information of the image to be processed, and the encoding information includes one or more encoding parameters (or set of encoding parameters) that the encoder of the encoding device needs to use to encode the image to be processed.
  • These encoding parameters may include, for example, one or more of the following: an instruction to turn the deblocking filter (deblock_filter) function on or off, the number of reference frames (Ref), the motion estimation search range (me_range), the motion estimation method (me_method), the sub-pixel refinement strength (subme), the lookahead optimizer parameter, and so on. They are described as follows:
  • The on/off instruction of the deblocking filter (deblock_filter) function is used to indicate whether to activate the deblock_filter function to perform deblocking filtering on the reconstructed image.
  • For this parameter, the following design can be made: when specific values in the tracking information (such as speed, angular velocity, acceleration, position, posture, etc.) are greater than or equal to a preset threshold, the decoding end is moving fast, and the deblock_filter function can be turned off to reduce the encoding complexity and the encoding delay; otherwise, the deblock_filter function can be turned on to improve image quality.
  • the number of reference frames (Ref) parameter is used to indicate the maximum number of reference frames, that is, the number of reference frames used in image prediction.
  • The value range of the number of reference frames is, for example, 0-16. The larger the value, the more accurate the prediction and the greater the computational complexity; conversely, the smaller the value, the worse the prediction accuracy and the smaller the computational complexity.
  • For this parameter, the following design can be made: when specific values in the tracking information (such as speed, angular velocity, acceleration, position, posture, etc.) are greater than or equal to a preset threshold, the decoding end is moving fast, and a smaller number of reference frames (for example, Ref = 1) can be configured to reduce the encoding complexity and the encoding delay; otherwise, a larger number of reference frames can be configured to improve prediction accuracy.
  • the motion estimation search range (me_range) parameter is used to indicate the motion estimation radius in the image prediction, that is, the radius of the pixel block prediction search performed by the encoder.
  • the value range of the motion estimation radius can be 4 to 64.
  • For this parameter, the following design can be made: when specific values in the tracking information (such as speed, angular velocity, acceleration, position, posture, etc.) are greater than or equal to a preset threshold, the decoding end is moving fast, and a small motion estimation radius (for example, me_range = 4) can be configured to reduce the encoding complexity and the encoding delay; when the values are less than the threshold, 8 ≤ me_range ≤ 64 can be configured, so the motion estimation radius in coding prediction increases and the coding complexity increases. Since the motion of the decoding end is slower, the delay caused by increasing the motion estimation radius will not cause black borders, freezes, or other phenomena.
  • the motion estimation method (me_method) is used to indicate the setting of the full-pixel motion estimation method.
  • The motion estimation method includes the motion search algorithm (such as the diamond search algorithm dia, the hexagon search algorithm hex, and the asymmetric cross multi-level hexagonal grid search algorithm umh). The more complex the motion search algorithm, the more accurate the prediction and the more complex the calculation; conversely, the simpler the motion search algorithm, the worse the prediction accuracy and the smaller the computational complexity.
  • The general matching criterion for motion estimation is the rate-distortion optimization criterion, for example based on the sum of absolute differences (SAD).
  • Figure 11 shows some possible search templates.
  • the black dots in the template represent the best prediction points found in this step.
  • The search templates listed in the figure include a small diamond template, medium diamond template, hexagon template, small square template, asymmetric cross template, 5×5 step-by-step search template, large hexagon template, regular octagon template, and so on. It should be noted that in specific implementations of this application, any other possible search templates can also be used, such as full search templates, three-step search templates, and four-step search templates, which are not limited in this application.
  • For this parameter, the following design can be made: when specific values in the tracking information (such as speed, angular velocity, acceleration, position, posture, etc.) are greater than or equal to a preset threshold, the decoding end is moving fast, and a relatively simple motion estimation method such as the diamond search algorithm dia is configured; the search algorithm is simple, the amount of calculation is small, the coding complexity is reduced, and the coding delay is reduced, so as to avoid black borders and freezes at the decoding end.
  • When the values are less than the threshold, relatively complex motion estimation methods are configured, such as the hexagon search algorithm hex or the asymmetric cross multi-level hexagonal grid search algorithm umh, and the amount of calculation, that is, the coding complexity, increases. Since the motion of the decoding end is slow, the delay caused by the more complex search algorithm will not cause black borders, freezes, or other phenomena. A minimal sketch of the diamond search is given below.
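  • As an illustration of the diamond search algorithm dia mentioned above, the following minimal Python sketch performs a full-pixel block search with the SAD matching criterion; the function names and the me_range handling are assumptions, not the patent's implementation:

```python
import numpy as np

def sad(block_a: np.ndarray, block_b: np.ndarray) -> int:
    # Sum of absolute differences, the matching criterion mentioned above.
    return int(np.abs(block_a.astype(int) - block_b.astype(int)).sum())

def diamond_search(ref: np.ndarray, cur_block: np.ndarray,
                   cx: int, cy: int, me_range: int = 16):
    """Full-pixel diamond search around (cx, cy) in the reference frame.
    Returns the best-matching (x, y) block position in `ref`."""
    h, w = cur_block.shape
    # Large diamond search pattern (9 points) and small diamond pattern (5 points).
    ldsp = [(0, 0), (2, 0), (-2, 0), (0, 2), (0, -2),
            (1, 1), (1, -1), (-1, 1), (-1, -1)]
    sdsp = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]

    def cost(x, y):
        # Reject positions outside the frame or outside the me_range radius.
        if x < 0 or y < 0 or x + w > ref.shape[1] or y + h > ref.shape[0]:
            return float("inf")
        if abs(x - cx) > me_range or abs(y - cy) > me_range:
            return float("inf")
        return sad(cur_block, ref[y:y + h, x:x + w])

    bx, by = cx, cy
    # Step 1: repeat the large diamond pattern until the best point is the center.
    while True:
        nx, ny = min(((bx + dx, by + dy) for dx, dy in ldsp),
                     key=lambda p: cost(*p))
        if (nx, ny) == (bx, by):
            break
        bx, by = nx, ny
    # Step 2: one refinement pass with the small diamond pattern.
    return min(((bx + dx, by + dy) for dx, dy in sdsp), key=lambda p: cost(*p))
```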
  • the sub-pixel subdivision intensity (subme) parameter is used to indicate the dynamic prediction and partitioning mode.
  • the value range of this parameter can be, for example, 0-11.
  • For this parameter, the following design can be made: when specific values in the tracking information (such as speed, angular velocity, acceleration, position, posture, etc.) are greater than or equal to a preset threshold, the decoding end is moving fast, and subme can be set to 0 or 1, thereby reducing the coding complexity and the coding delay, so as to avoid black borders, freezes, and other phenomena at the decoding end; otherwise, a larger subme value can be configured.
  • the lookahead optimizer parameter is used to set the frame buffer size for thread prediction.
  • the value range of this parameter is, for example, 0-250.
  • The larger the value, the more accurate the prediction and the greater the computational complexity; conversely, the smaller the value, the worse the prediction accuracy and the smaller the computational complexity.
  • For this parameter, the following design can be made: when specific values in the tracking information (such as speed, angular velocity, acceleration, position, posture, etc.) are greater than or equal to a preset threshold, the decoding end is moving fast, and the frame buffer size is reduced, thereby reducing the coding complexity and the coding delay, and avoiding black borders and freezes at the decoding end. A combined sketch of these per-parameter adjustments follows.
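  • Pulling the per-parameter designs above together, a minimal sketch of the threshold-based configuration might look as follows. The parameter names mirror the x264-style names used in the text; the threshold value, the choice of angular velocity as the tested quantity, and the concrete values in each branch are illustrative assumptions:

```python
SPEED_THRESHOLD = 30.0  # assumed threshold, e.g. angular velocity in deg/s

def configure_encoding_info(angular_velocity: float) -> dict:
    """Map tracking information to an encoding parameter set."""
    if angular_velocity >= SPEED_THRESHOLD:
        # Decoding end moves fast: reduce every parameter's computational
        # complexity to shrink the encoding delay.
        return {
            "deblock": False,      # turn the deblocking filter off
            "ref": 1,              # fewer reference frames
            "me_range": 4,         # small motion estimation radius
            "me_method": "dia",    # simple diamond search
            "subme": 0,            # minimal sub-pixel refinement
            "rc_lookahead": 0,     # no lookahead buffering
        }
    # Decoding end moves slowly: higher complexity is acceptable, so
    # favor prediction accuracy and image quality.
    return {
        "deblock": True,
        "ref": 4,
        "me_range": 16,
        "me_method": "umh",
        "subme": 7,
        "rc_lookahead": 40,
    }
```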
  • In general, when the tracking information is greater than or equal to a preset threshold, the encoding information mapped by the tracking information may be called first encoding information; when the tracking information is less than the preset threshold, the encoding information mapped by the tracking information is called second encoding information. The first encoding information and the second encoding information can then satisfy at least one of the following relationships:
  • the deblocking filter parameter in the first encoding information indicates that the deblocking filter is turned off, while the deblocking filter parameter in the second encoding information indicates that the deblocking filter is turned on;
  • the number of reference frames in the first encoding information is smaller than the number of reference frames in the second encoding information;
  • the motion estimation search range in the first encoding information is smaller than the motion estimation search range in the second encoding information;
  • the calculation amount of the motion estimation method in the first encoding information is less than the calculation amount of the motion estimation method in the second encoding information.
  • the tracking information is associated with the encoding information, which means that there is a corresponding relationship between the tracking information and the encoding information, and the encoding device stores the relationship between the two.
  • For example, the encoding device may store a mapping relationship between tracking information and encoding parameter sets. In this way, after receiving the tracking information, the encoding device can find the corresponding encoding parameters according to the mapping relationship and configure them in the encoder. That is, there may be a direct mapping relationship between the tracking information and the encoding information: the tracking information is bound to the encoding information, and the encoding information can be directly determined through the tracking information.
  • Alternatively, the tracking information and the encoding information may be indirectly related; for example, certain algorithm processing or condition judgments need to be performed on the tracking information to determine the corresponding encoding information.
  • After the encoding device receives the specific tracking information uploaded by the decoding device, it can determine the corresponding encoding information according to that tracking information.
  • For example, the encoding device can make a judgment on a preset condition based on the tracking information, and determine the corresponding encoding parameter set based on the judgment result.
  • For instance, the encoding device pre-stores mapping relationships between different data intervals and encoding parameter sets. After receiving the tracking information, the encoding device can determine the data interval in which the speed/acceleration/position data in the tracking information falls, and then find the corresponding encoding parameter set according to the mapping relationship, as sketched below.
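  • A minimal sketch of such a pre-stored interval mapping, with made-up interval bounds and parameter set names, might look like this:

```python
# Hypothetical mapping from data intervals of the tracked quantity
# (here: angular velocity V) to named encoding parameter sets.
INTERVAL_TO_PARAMSET = [
    # (lower bound inclusive, upper bound exclusive, parameter set)
    (0.0, 10.0, "high_quality_set"),
    (10.0, 30.0, "balanced_set"),
    (30.0, None, "low_latency_set"),  # None: no upper bound
]

def lookup_parameter_set(v: float) -> str:
    """Find the parameter set whose data interval contains v."""
    for lo, hi, name in INTERVAL_TO_PARAMSET:
        if v >= lo and (hi is None or v < hi):
            return name
    raise ValueError("tracking value outside all configured intervals")
```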
  • In short, the encoding device can receive, in real time, instructions containing at least one kind of tracking information such as position/posture/linear velocity/angular velocity/acceleration fed back by the decoding device, and then adjust the encoding information (encoding parameters) of the encoder in the encoding device according to the received tracking information, thereby adjusting the computational complexity of the encoder.
  • S104: The encoding device performs image encoding on the image to be processed according to the encoding information configured in S103.
  • the encoding process can be performed by the encoder in the encoding device. The specific coding process is not described here.
  • S105: The encoding device encodes the encoding parameters configured in S103 into a code stream and sends it to the decoding device.
  • Since the encoding device can adjust the encoding parameters of the encoder according to the tracking information uploaded by the decoding device, relevant information about the encoding parameters can be encoded into the code stream to facilitate subsequent decoding by the decoding device.
  • the code stream also contains the image information obtained after encoding the image, so that the decoding device can reconstruct (decode) the image based on the image information.
  • For example, the code stream sent to the decoding device contains motion vector difference (MVD) information, a reference image index, and so on. The specific content of the image information can be implemented by referring to existing encoding methods, which is not limited in this application.
  • S106: The decoding device parses the code stream from the encoding device.
  • the decoding device can obtain this information by parsing the code stream.
  • S107: The decoding device decodes and displays the image according to the instruction information.
  • the decoding process of the decoding device can be regarded as the inverse process of encoding, and the decoding device can decode (decode) the image information to reconstruct the image and display it on the display device.
  • the realization of the decoding process can refer to the existing decoding means.
  • In summary, the encoding device can receive, in real time, instructions containing at least one kind of information such as position/posture/linear velocity/angular velocity/acceleration fed back by the decoding device, and then further transmit these instructions to the encoder.
  • The encoder automatically adjusts its computational complexity (encoding parameters) according to the position/posture/linear velocity/angular velocity/acceleration of the decoding device, thereby adjusting the encoding delay of the encoder and reducing the delay of the entire system.
  • the image-related information and the configured encoding parameters are sent to the VR device so that the decoding device can decode and display normally.
  • Since the black border phenomenon of the decoding device is closely related to excessive system delay, this application reduces the system delay by reducing the computational complexity of the encoder, basically eliminating the possibility of black borders on the screen.
  • the VR device can decode and display images in real time and in time, so the display fluency of the VR device is also guaranteed, and the occurrence of jams is avoided.
  • Moreover, the embodiments of this application can keep the image rendering at a good resolution, ensuring the user experience.
  • Below, a specific CloudVR scenario is used as an example to illustrate the information transmission method provided by the embodiments of the present application.
  • In this scenario, the encoding device can be a cloud server, and the decoding device can be a VR head-mounted display (VR headset).
  • The tracking information of the VR headset is information obtained through head tracking (for example, the rotational angular velocity of the head, the pose, etc.). As shown in FIG. 13, the method includes but is not limited to the following steps:
  • S201: The VR head-mounted display detects the rotational angular velocity V and the pose information of the VR head-mounted display by means of head tracking.
  • For the specific implementation of head tracking, refer to the description of head tracking above, which is not repeated here.
  • S202: The VR head-mounted display sends information such as the rotational angular velocity V and the pose to the server.
  • S203: The server determines the image to be processed according to the pose information of the VR headset, and performs image rendering on the image to be processed.
  • The server can predict the picture according to the pose of the head-mounted display to determine the current image to be processed, and preprocess the image to be processed.
  • The preprocessing may include, for example, one or more of image rendering, trimming, color format conversion, color correction, or denoising.
  • the specific content of this part can be achieved with reference to existing methods.
  • S204: The server transmits the rotational angular velocity V of the VR headset to the internal encoder.
  • S205: After the encoder obtains the rotational angular velocity, it compares the rotational angular velocity V with preset thresholds, so as to start the encoding parameter configuration function, adjust the encoding parameters, and adjust the encoding complexity.
  • The preset thresholds may include T1 and T2, where T1 < T2.
  • When V ≥ T2, the encoder selects the first encoding parameter set according to the mapping relationship. The configured first encoding parameter set may include, for example, one or more of the following: the deblock function in the encoder is turned off, the number of reference frames is modified to 1, the motion estimation search range is 4x4, the motion estimation method uses diamond search dia, and so on. The coding computational complexity is thus greatly reduced, and the coding delay is significantly reduced.
  • When T1 ≤ V < T2, the encoder selects the second encoding parameter set according to the mapping relationship. The configured second encoding parameter set may include, for example, one or more of the following: the deblock function is turned on, the number of reference frames is increased to 2, the motion estimation search range is 8x8, the motion estimation method uses the hexagon search algorithm hex, and so on. The coding computational complexity is thus reduced, and the coding delay is reduced.
  • When V < T1, the encoder selects the third encoding parameter set according to the mapping relationship.
  • The configured third encoding parameter set may include, for example, one or more of the following: the deblock function is turned on, the number of reference frames is increased to 4, the motion estimation search range is 16x16, and the motion estimation method adopts the asymmetric cross multi-level hexagonal grid search algorithm umh.
  • In this case the computational complexity of the encoder is relatively large and the coding delay is relatively large; since the head-mounted display is moving slowly, this delay does not cause black borders or freezes.
  • An exemplary implementation is as follows:
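  • The following minimal Python sketch reconstructs the three-way selection described above; the boundary conditions V ≥ T2, T1 ≤ V < T2, and V < T1 are inferred from the complexity ordering of the three parameter sets, and the dict keys are illustrative:

```python
def select_parameter_set(v: float, t1: float, t2: float) -> dict:
    """Select one of the three encoding parameter sets from the rotational
    angular velocity V of the VR head-mounted display (requires T1 < T2)."""
    assert t1 < t2
    if v >= t2:
        # Fast head rotation: first set, lowest encoding complexity.
        return {"deblock": False, "ref": 1, "me_range": 4, "me_method": "dia"}
    if v >= t1:
        # Medium rotation speed: second set, moderate complexity.
        return {"deblock": True, "ref": 2, "me_range": 8, "me_method": "hex"}
    # Slow rotation or stationary: third set, highest quality.
    return {"deblock": True, "ref": 4, "me_range": 16, "me_method": "umh"}
```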
  • S206: The encoder of the server loads the configured encoding parameters and starts to perform image encoding on the image to be processed.
  • S207: The encoder of the server encodes the encoded image information and the selected encoding parameters into a code stream, and sends the code stream to the VR head-mounted display via the network.
  • The image information may include, for example, motion vector difference (MVD) information, a reference image index, and other content, which can be implemented with reference to existing coding methods and is not limited in this application.
  • the decoding device can obtain this information by parsing the code stream.
  • S208: The VR head-mounted display performs image decoding and display.
  • In addition to the solution described above, FIG. 14 shows another implementation scheme.
  • In that scheme, the encoding parameters configured in the encoder can include turning the deblock function on or off, changing the number of structural vertices in display rendering, and so on.
  • Implementation schemes of different forms, and schemes obtained from variants of the scheme of this application, all fall within the protection scope of this application.
  • In this embodiment, the server can adjust the computational complexity of the encoder and optimize its parameter configuration by judging the rotational angular velocity information fed back by the VR head-mounted display, thereby optimizing the encoding delay and the delay of the entire system when the user turns their head, significantly reducing the system delay of CloudVR and helping to reduce or even eliminate the black edge effect when turning the head.
  • Moreover, the faster the head turns, the less sensitive the human eye is to image quality; and the lower the CloudVR system latency, the less likely the user is to perceive black edges when turning the head.
  • Therefore, if the head-mounted display rotates quickly, the encoder can sacrifice a certain amount of image quality in exchange for reduced encoding complexity, thereby reducing the system delay; if the head-mounted display rotates slowly, the encoder can encode according to the original configuration without degrading image quality.
  • This not only guarantees low latency when the VR headset rotates quickly, fundamentally reducing the possibility of black borders, but also does not affect the image quality and experience of the VR headset when it is stationary or turning at low speed.
  • In addition, this embodiment can also guarantee the resolution of image rendering, ensuring the user experience.
  • In contrast, existing technical methods do not reduce the system delay, and therefore introduce adverse effects such as smearing and jitter.
  • In some embodiments, after the decoder of the decoding device completes decoding the image, the image post-processing link (for example, the picture post-processor 32 in FIG. 1) can also be improved to further reduce the system delay, so as to further avoid black edges and freezes.
  • For example, the decoding device can turn on or off one or more processing algorithms in the image post-processing stage according to its tracking information, so as to adjust the computational complexity of the image post-processing link and reduce the possibility of black borders or freezes.
  • Specifically, a mapping relationship between the tracking information of the decoding device and one or more processing algorithms used by the picture post-processor can be preset; the mapping relationship can then be queried according to the tracking information of the decoding device to determine the corresponding processing algorithms and configure them for the picture post-processor.
  • The one or more processing algorithms adopted by the picture post-processor may include, for example, at least one of the following: a standard dynamic range (SDR) image algorithm, a high dynamic range (HDR) image algorithm, an image enhancement algorithm, an image super-resolution algorithm, and so on.
  • The SDR image algorithm can be used to perform gamma curve correction of the image or video.
  • The HDR image algorithm can be used to provide a greater dynamic range and more image detail, so that the image better reflects the visual effect of the real environment.
  • Image enhancement algorithms can be used to adjust the brightness, contrast, saturation, hue, etc. of the image to increase the clarity of the image and reduce noise.
  • Image super-resolution algorithms can restore a low-resolution image or image sequence to a high-resolution image. It can be seen that the image quality of the decoding device can be improved through one or more of the above processing algorithms, bringing a better image look and feel to the user.
  • the tracking information of the decoding device can be obtained from the history cache of the memory of the decoding device, or it can be transmitted back to the decoding device by the encoding device through the code stream.
  • The tracking information of the decoding device includes at least one of motion information and pose information of the decoding device; the motion information includes the movement speed and/or acceleration of the decoding device, the movement speed includes angular velocity and/or linear velocity, and the acceleration includes angular acceleration and/or linear acceleration; the pose information includes position information and/or attitude information of the decoding device.
  • For example, when the decoding device is a VR head-mounted display and the tracking information is the rotational angular velocity V, a rotational angular velocity threshold of the VR head-mounted display can be set in the decoding device in advance.
  • The mapping relationship between the tracking information and the one or more processing algorithms adopted by the picture post-processor can be, for example, as follows:
  • When the rotational angular velocity V is greater than or equal to the threshold, one or more processing algorithms are turned off, for example, at least one of the SDR image algorithm, the HDR image algorithm, the image enhancement algorithm, and the image super-resolution algorithm. That is, when the VR headset is moving fast, the computational complexity of the image post-processing link can be reduced, thereby reducing the system delay and avoiding black edges, freezes, and other phenomena.
  • When the rotational angular velocity V is less than the threshold, one or more processing algorithms are turned on, for example, at least one of the SDR image algorithm, the HDR image algorithm, the image enhancement algorithm, and the image super-resolution algorithm. That is, the computational complexity of the image post-processing link can be increased, improving the quality of the output image and bringing a better viewing experience to the user. Since the movement of the VR head-mounted display is relatively slow at this time, the resulting delay does not cause black edges, freezes, and so on. A minimal sketch of this on/off mapping is given below.
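  • A minimal sketch of this threshold-based on/off mapping, with an assumed threshold value and hypothetical algorithm keys, might be:

```python
ROTATION_THRESHOLD = 30.0  # assumed rotational angular velocity threshold

def configure_post_processing(v: float) -> dict:
    """Turn the post-processing algorithms off while the headset moves fast
    (to cut system delay) and back on when the motion is slow."""
    fast = v >= ROTATION_THRESHOLD
    return {
        "sdr_correction": not fast,
        "hdr_mapping": not fast,
        "image_enhancement": not fast,
        "super_resolution": not fast,
    }
```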
  • multiple rotational angular velocity thresholds may also be designed to provide more mapping possibilities between tracking information and processing algorithms to adapt to more diversified application scenarios.
  • In this embodiment, the decoding device (such as a VR headset) turns on or off one or more processing algorithms of the image post-processing link (such as the picture post-processor) based on tracking information cached in history or tracking information returned in the code stream, so as to adjust the computational complexity, further optimize the delay of the entire system when the user turns their head, significantly reduce the system delay of CloudVR, and help to reduce or even eliminate the black edge effect when turning the head. When there are no black borders or freezes, it can also improve the image quality and bring users a better viewing experience.
  • FIG. 15 is a structural diagram of a system provided by an embodiment of the present invention, including the encoding device 60 and the decoding device 70 in the system.
  • the encoding device 60 and the decoding device 70 can communicate wirelessly.
  • the encoding device 60 may include a parameter adjustment module 601, an encoding module 602, a receiving module 603, and a transmitting module 604.
  • the decoding device 70 may include a tracking module 701, a decoding module 702, a display module 703, a transmitting module 704, and a receiving module 705.
  • The modules of the encoding device 60 and the decoding device 70 are respectively described as follows.
  • the receiving module 603 is configured to receive tracking information of the decoding device 70; the tracking information of the decoding device 70 includes at least one of motion information and pose information of the decoding device.
  • the parameter adjustment module 601 is configured to configure the encoding information of the image to be processed according to the tracking information of the decoding device 70; the tracking information is associated with the encoding information, and the encoding information includes one or more encoding parameters used for encoding the image to be processed.
  • the encoding module 602 is used to perform image encoding on the image to be processed.
  • the transmitting module 604 is configured to encode the encoding parameters and the encoded image information into a code stream and send to the decoding device 70.
  • the tracking module 701 is configured to obtain tracking information by tracking the decoding device 70 in real time; the tracking information is generated by performing at least one of the following on the decoding device 70: head tracking, gesture tracking, eye tracking, or motion tracking. The tracking information of the decoding device 70 includes at least one of motion information and pose information of the decoding device.
  • the transmitting module 704 is configured to send the tracking information of the decoding device 70 to the encoding device 60.
  • the receiving module 705 is configured to receive the code stream from the encoding device 60 to obtain encoding parameters and encoded image information.
  • the decoding module 702 is used for image decoding according to the encoded image information.
  • the display module 703 is used to display the decoded image.
  • The functions of the modules of the encoding device 60 and the decoding device 70 can refer to the related descriptions of the embodiments in FIG. 10 or FIG. 13 above.
  • For the encoding device 60, the receiving module 603 can be used to perform the tracking information reception of S102, the parameter adjustment module 601 is used to perform S103, the encoding module 602 is used to perform S104, and the transmitting module 604 is used to perform the code stream transmission of S105. For the decoding device 70, the tracking module 701 is used to perform S101, the transmitting module 704 is used to send the tracking information in S102, the receiving module 705 is used to perform the code stream reception of S105 and S106, the decoding module 702 is used to perform the image decoding in S107, and the display module 703 is used to display the image in S107. For brevity, details are not repeated here.
  • The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • When implemented by software, they may be implemented in whole or in part in the form of a computer program product.
  • The computer program product includes one or more computer instructions; when the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present invention are produced in whole or in part.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • For example, the computer instructions may be transmitted from a website, computer, server, or data center.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, and may also be a data storage device such as a server or a data center integrated with one or more available media.
  • The usable medium may be a magnetic medium (such as a floppy disk, hard disk, or magnetic tape), an optical medium (such as a DVD), or a semiconductor medium (such as a solid-state drive), and so on.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present application provides an information transmission method, a related device, and a system. The method is applied to an encoding device, and comprises: receiving tracking information of a decoding device, the tracking information comprising movement information or pose information of the decoding device; configuring encoding information of an image to be processed according to the tracking information, the tracking information being associated with the encoding information, and the encoding information comprising one or more encoding parameters; and encoding said image according to the encoding information, and sending a code stream to the decoding device, the code stream comprising the one or more encoding parameters. Implementing the embodiments of the present application can reduce or even eliminate the unsmooth phenomena such as black edge and picture freeze of the decoding device, thereby improving user experience.

Description

一种信息传输方法、相关设备及系统Information transmission method, related equipment and system
本申请要求于2020年06月12日提交中国专利局,申请号为202010535609.4,申请名称“一种信息传输方法、相关设备及系统”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims the priority of a Chinese patent application filed with the Chinese Patent Office on June 12, 2020, the application number is 202010535609.4, and the application name is "an information transmission method, related equipment and system", the entire content of which is incorporated herein by reference Applying.
技术领域Technical field
本发明涉及视频编解码技术领域,尤其涉及一种信息传输方法、相关设备及系统。The present invention relates to the technical field of video coding and decoding, in particular to an information transmission method, related equipment and system.
背景技术Background technique
目前虚拟现实(Virtual Reality,VR)游戏产业快速发展,所需要的渲染等高负荷任务对计算资源需求非常大,一般需要配置高性能的游戏主机,导致VR游戏的成本居高不下,并且不能满足人们随时随地玩VR游戏的需求。CloudVR技术利用端云协同的思想,将VR游戏渲染和VR游戏交互分离,VR游戏交互在终端完成,交互指令(头显和手势的姿态及位置信息,用户输出的指令等)通过无线网络传输到云端(Cloud)服务器;云端服务器根据接收到的交互指令完成游戏的渲染,并将游戏画面通过无线网络传输到终端显示。CloudVR技术可以显著降低VR游戏终端的成本,并使用户可以随时随地接入网络玩VR游戏。At present, the virtual reality (VR) game industry is developing rapidly. The high-load tasks such as required rendering have a very large demand for computing resources. Generally, a high-performance game console is required, which leads to the high cost of VR games and cannot meet the requirements. People need to play VR games anytime, anywhere. CloudVR technology uses the idea of terminal-cloud collaboration to separate VR game rendering and VR game interaction. VR game interaction is completed in the terminal, and the interactive instructions (head-display and gesture posture and position information, user output instructions, etc.) are transmitted to the wireless network through the wireless network. Cloud server: The cloud server completes the rendering of the game according to the received interactive instructions, and transmits the game screen to the terminal for display through the wireless network. CloudVR technology can significantly reduce the cost of VR game terminals and enable users to access the Internet to play VR games anytime, anywhere.
CloudVR技术中云端服务器和终端之间通过无线网络相连,相比在终端本地进行渲染及交互的VR游戏而言会存在较大时延,当时延过大时,VR游戏画面中会出现明显黑边、画面卡顿等不流畅现象,显著影响VR游戏体验。In CloudVR technology, the cloud server and the terminal are connected through a wireless network. Compared with the VR game that renders and interacts locally on the terminal, there will be a larger delay. When the delay is too large at the time, obvious black borders will appear in the VR game screen. Unsmooth phenomena such as screen freezes, etc., significantly affect the VR gaming experience.
发明内容Summary of the invention
本申请实施例提供一种信息传输方法、相关设备及系统,能够降低CloudVR系统的时延,减少甚至消除设备的黑边、画面卡顿等不流畅现象,提升用户使用体验。The embodiments of the present application provide an information transmission method, related equipment, and system, which can reduce the time delay of the CloudVR system, reduce or even eliminate the unevenness of the device such as black edges and screen freezes, and improve the user experience.
第一方面,本申请实施例提供了一种信息传输方法,该方法从编码设备的角度进行描述,包括:接收解码设备的跟踪信息,所述跟踪信息包括所述解码设备的运动信息或位姿信息;根据解码设备的跟踪信息,配置待处理图像的编码信息;所述跟踪信息与所述编码信息相关联,所述编码信息包括一个或多个编码参数;根据所述编码信息对所述待处理图像进行编码,并将码流发送至所述解码设备,所述码流包括所述一个或多个编码参数。In the first aspect, the embodiments of the present application provide an information transmission method. The method is described from the perspective of an encoding device, including: receiving tracking information of a decoding device, where the tracking information includes motion information or pose of the decoding device Information; according to the tracking information of the decoding device, configure the encoding information of the image to be processed; the tracking information is associated with the encoding information, and the encoding information includes one or more encoding parameters; according to the encoding information, the encoding information of the image to be processed is configured The image is processed for encoding, and a code stream is sent to the decoding device, where the code stream includes the one or more encoding parameters.
其中,跟踪信息是解码设备对自身或者用户的运动状态进行跟踪检测而获得的,所述跟踪信息包括运动信息和位姿信息中的至少一种,所述运动信息用于指示所述解码设备的运动情况,具体实施例中,所述运动信息包括所述解码设备的运动速度和/或加速度,所述运动速度包括角速度和/或线速度,所述加速度包括角加速度和/或线加速度。所述位姿信息用于指示所述解码设备或者用户的位置信息和/或姿态信息,即位姿数信息可表示解码设备在三维空间中的位置和姿态(或方向),位置可以通过三维坐标系中的三个坐标轴x、y、z表示,方向可以通过(α,β,γ)来表示,(α,β,γ)表示围绕三个坐标轴旋转的角度。Wherein, the tracking information is obtained by the decoding device by tracking and detecting the motion state of itself or the user, the tracking information includes at least one of motion information and pose information, and the motion information is used to indicate the performance of the decoding device Movement conditions, in a specific embodiment, the movement information includes the movement speed and/or acceleration of the decoding device, the movement speed includes an angular velocity and/or a linear velocity, and the acceleration includes an angular acceleration and/or a linear acceleration. The pose information is used to indicate the position information and/or pose information of the decoding device or the user, that is, the pose number information may indicate the position and pose (or direction) of the decoding device in a three-dimensional space, and the position may be in a three-dimensional coordinate system The three coordinate axes in x, y, z represent, the direction can be represented by (α, β, γ), and (α, β, γ) represents the angle of rotation around the three coordinate axes.
That the tracking information is associated with the encoding information means that there is a correspondence between the two, and the encoding device stores this relationship. For example, the relationship may be a direct mapping, that is, the tracking information is bound to the encoding information, and the encoding information can be determined directly from the tracking information. As another example, the relationship may be indirect; for instance, certain algorithmic processing or conditional judgment must be applied to the tracking information before the corresponding encoding information can be determined. After the encoding device receives the specific tracking information uploaded by the decoding device, it can determine the corresponding encoding information according to that specific tracking information.
The encoding information includes one or more encoding parameters used by the encoder of the encoding device to encode the image to be processed (also called the image to be encoded). Since the encoder performs the encoding process according to the encoding parameters, different encoding parameters require different amounts of computation during encoding, that is, they yield different computational complexity. In other words, in the present application the encoding device can adjust its configured encoding parameters in real time based on the tracking information uploaded by the decoding device, thereby adjusting the computational complexity of encoding.
It can be seen that, by implementing this embodiment of the present application, after receiving in real time an instruction fed back by the decoding device that contains at least one kind of tracking information such as position/attitude/linear velocity/angular velocity/acceleration, the encoding device can adjust the encoding information of its encoder according to the received tracking information; the adjustment strategy may be to adjust the computational complexity of the encoder (that is, the encoding parameters). In a coding and decoding system (for example, a CloudVR system), one of the main components of the latency is the encoding latency, and this embodiment adjusts the encoding latency of the encoder by adjusting its computational complexity, thereby reducing the latency of the whole system. The image-related information and the encoding parameters can subsequently be sent to the decoding device so that the decoding device can decode and display normally.
Since the black-border phenomenon on the decoding device is closely related to excessive system latency, the present application reduces system latency by lowering the computational complexity of the encoder, which can greatly reduce or even eliminate the possibility of black borders appearing in the picture. After receiving the bitstream, the decoding device can then decode and display the image in time, which also ensures smooth display on the decoding device and avoids stuttering.
Based on the first aspect, in a possible embodiment, configuring the encoding information of the image to be processed according to the tracking information of the decoding device specifically includes: querying a preset mapping relationship according to the tracking information to obtain the encoding information of the image to be processed, where the preset mapping relationship includes a mapping from the tracking information to the encoding information; and configuring the encoding information.
The preset mapping relationship may be stored in advance in a storage unit of the encoding device and characterizes the mapping from the tracking information to the encoding information. For example, in an implementation, the preset mapping relationship may be a mapping table. The mapping table may directly record the mapping between various kinds of tracking information and encoding parameters, or it may record the mapping between various value ranges of the motion information or pose information and encoding parameters; by determining which value range a specific value in the tracking information falls into, the encoding parameters corresponding to that tracking information can be determined. After obtaining the encoding information corresponding to the tracking information, the encoding device can configure that encoding information (one or more encoding parameters) into the encoder (that is, replace the previously configured encoding parameters), thereby adjusting the encoding parameters of the encoder, that is, the computational complexity of the encoding process.
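As a minimal sketch (not part of the claimed method), such a range-based mapping table could be implemented as follows, assuming angular velocity is the tracked quantity; all ranges and parameter values are hypothetical:

```python
# Hypothetical mapping table: angular-velocity ranges -> encoder parameters.
# The ranges and values are illustrative, not prescribed by this embodiment.
MAPPING_TABLE = [
    # (lower bound inclusive, upper bound exclusive, encoding parameters)
    (0.0, 0.5, {"ref": 4, "me_range": 16, "subme": 7, "lookahead": 20}),
    (0.5, 2.0, {"ref": 2, "me_range": 8,  "subme": 2, "lookahead": 2}),
    (2.0, float("inf"), {"ref": 1, "me_range": 4, "subme": 0, "lookahead": 0}),
]

def lookup_encoding_info(angular_velocity: float) -> dict:
    """Return the encoding parameters whose range contains the tracked value."""
    for low, high, params in MAPPING_TABLE:
        if low <= angular_velocity < high:
            return params
    raise ValueError("tracking value outside all configured ranges")
```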
It can be seen that, by setting a preset mapping relationship, this embodiment of the present application enables rapid adjustment of the encoding computational complexity and therefore rapid adjustment of the encoding latency, which helps eliminate black borders and stuttering at the decoding end. In addition, technicians can define the specific content of the preset mapping relationship according to actual needs and install it in the encoding device, so this embodiment also offers a choice among various preset mapping relationships to suit a wide variety of application scenarios and meet practical encoding needs.
Based on the first aspect, in a possible embodiment, the one or more encoding parameters include one or more of a deblocking filter (deblock_filter) parameter, a number of reference frames (Ref), a motion estimation search range (me_range), a motion estimation method (me_method), a sub-pixel subdivision strength (subme), and a lookahead optimizer parameter.
The deblocking filter parameter indicates whether to enable the deblock_filter function to perform deblocking filtering on the reconstructed image. The number-of-reference-frames parameter indicates the maximum number of reference frames, that is, the number of reference frames used in image prediction. The motion estimation search range parameter indicates the motion estimation radius in image prediction, that is, the radius within which the encoder performs the prediction search for a pixel block. The motion estimation method parameter sets the full-pixel motion estimation method; motion estimation methods include motion search algorithms (for example, the diamond search algorithm, the hexagon search algorithm, the asymmetric cross multi-hexagon grid search algorithm, and so on). The sub-pixel subdivision strength parameter indicates the dynamic prediction and partitioning mode. The lookahead optimizer parameter sets the frame buffer size for threaded prediction.
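These parameter names correspond closely to the options of widely used H.264 encoders. Purely as a hedged illustration, assuming an x264 command-line encoder is available and that the embodiment's parameters map onto x264's options as commented below (this mapping is an assumption, not stated by the application), a low-complexity configuration could be launched like this:

```python
import subprocess

# Hypothetical low-complexity configuration expressed with real x264 CLI options.
# Assumed mapping of the embodiment's parameter names onto x264 flags:
#   deblock_filter=0 -> --no-deblock      Ref       -> --ref
#   me_range         -> --merange         me_method -> --me (dia/hex/umh)
#   subme            -> --subme           lookahead -> --rc-lookahead
cmd = [
    "x264",
    "--no-deblock",         # disable the deblocking filter
    "--ref", "1",           # single reference frame
    "--merange", "4",       # small motion-estimation search radius
    "--me", "dia",          # diamond search: cheapest full-pel ME
    "--subme", "0",         # minimal sub-pixel refinement
    "--rc-lookahead", "0",  # no lookahead frame buffering
    "-o", "out.264",
    "input.y4m",            # hypothetical input file
]
subprocess.run(cmd, check=True)
```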
Based on the first aspect, in a possible embodiment, when the tracking information is greater than or equal to a preset threshold, the tracking information maps to first encoding information; when the tracking information is less than the preset threshold, the tracking information maps to second encoding information; and the first encoding information and the second encoding information satisfy at least one of the following relationships (a combined code sketch follows the list):
(1) The deblocking filter parameter in the first encoding information indicates that the deblocking filter is turned off, and the deblocking filter parameter in the second encoding information indicates that the deblocking filter is turned on.
For example, deblock_filter=1 means the function is enabled, and deblock_filter=0 means it is disabled. In a specific implementation, when a specific value in the tracking information (for example, a velocity, acceleration, position, or attitude value) is greater than or equal to the preset threshold, "deblock_filter=0" is configured, that is, the deblocking filter is turned off; the encoder is spared the work of deblocking filtering, which lowers the encoding complexity and therefore the encoding latency, avoiding black borders, stuttering, and similar artifacts at the decoding end. Conversely, when the specific value is less than the preset threshold, "deblock_filter=1" is configured and the deblocking filter is turned on; since the decoding end is moving slowly, the latency introduced by enabling the deblocking filter will not cause black borders, stuttering, or similar artifacts.
(2) The number of reference frames in the first encoding information is smaller than the number of reference frames in the second encoding information.
For example, when a specific value in the tracking information (for example, a velocity, acceleration, position, or attitude value) is greater than or equal to the preset threshold, 0 < Ref < 2 is configured, which reduces the number of reference frames used in encoding prediction and lowers the encoding complexity, thereby lowering the encoding latency and avoiding black borders, stuttering, and similar artifacts at the decoding end. Conversely, when the specific value is less than the preset threshold, 2 ≤ Ref ≤ 16 is configured; the number of reference frames used in encoding prediction increases and the encoding complexity increases, but since the decoding end is moving slowly, the latency introduced by the additional reference frames will not cause black borders, stuttering, or similar artifacts.
(3) The motion estimation search range in the first encoding information is smaller than the motion estimation search range in the second encoding information.
For example, when a specific value in the tracking information (for example, a velocity, acceleration, position, or attitude value) is greater than or equal to the preset threshold, 4 ≤ me_range ≤ 8 is configured, which reduces the motion estimation radius in encoding prediction and lowers the encoding complexity, thereby lowering the encoding latency and avoiding black borders, stuttering, and similar artifacts at the decoding end. Conversely, when the specific value is less than the preset threshold, 8 < me_range ≤ 64 is configured; the motion estimation radius increases and the encoding complexity increases, but since the decoding end is moving slowly, the latency introduced by the larger motion estimation radius will not cause black borders, stuttering, or similar artifacts.
(4) The computational load of the motion estimation method in the first encoding information is smaller than the computational load of the motion estimation method in the second encoding information.
For example, when a specific value in the tracking information (for example, a velocity, acceleration, position, or attitude value) is greater than or equal to the preset threshold, a relatively simple motion estimation method is configured, for example the diamond search algorithm (dia); the search algorithm is simple and its computational load is small, which lowers the encoding complexity and therefore the encoding latency, avoiding black borders, stuttering, and similar artifacts at the decoding end. Conversely, when the specific value is less than the preset threshold, a relatively complex motion estimation method is configured, for example the hexagon search algorithm (hex) or the asymmetric cross multi-hexagon grid search algorithm (umh); the computational load, and hence the encoding complexity, increases, but since the decoding end is moving slowly, the latency introduced by the more complex search algorithm will not cause black borders, stuttering, or similar artifacts.
(5) The sub-pixel subdivision strength in the first encoding information is smaller than the sub-pixel subdivision strength in the second encoding information.
For example, when a specific value in the tracking information (for example, a velocity, acceleration, position, or attitude value) is greater than or equal to the preset threshold, subme is configured to 0 or 1, which lowers the encoding complexity and therefore the encoding latency, avoiding black borders, stuttering, and similar artifacts at the decoding end. Conversely, when the specific value is less than the preset threshold, 1 < subme ≤ 11 is configured, which increases the encoding complexity; since the decoding end is moving slowly, the resulting latency will not cause black borders, stuttering, or similar artifacts.
(6) The lookahead optimizer parameter in the first encoding information is smaller than the lookahead optimizer parameter in the second encoding information.
For example, when a specific value in the tracking information (for example, a velocity, acceleration, position, or attitude value) is greater than or equal to the preset threshold, 0 ≤ lookahead < 2 is configured, which reduces the frame buffer size and thus the encoding complexity, thereby lowering the encoding latency and avoiding black borders, stuttering, and similar artifacts at the decoding end. Conversely, when the specific value is less than the preset threshold, 2 ≤ lookahead ≤ 250 is configured, which enlarges the frame buffer and increases the encoding complexity; since the decoding end is moving slowly, the resulting latency will not cause black borders, stuttering, or similar artifacts.
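A minimal sketch combining relationships (1) to (6) above into a single threshold rule; the threshold value and both parameter sets are illustrative assumptions chosen from the ranges given in this embodiment:

```python
# Hypothetical threshold on the tracked value (e.g. angular velocity in rad/s).
THRESHOLD = 1.0

# First encoding information: low complexity, used when motion is fast.
FAST_MOTION_PARAMS = {
    "deblock_filter": 0,  # (1) deblocking filter off
    "ref": 1,             # (2) 0 < Ref < 2
    "me_range": 4,        # (3) 4 <= me_range <= 8
    "me_method": "dia",   # (4) cheapest search algorithm
    "subme": 0,           # (5) subme equal to 0 or 1
    "lookahead": 0,       # (6) 0 <= lookahead < 2
}

# Second encoding information: higher complexity, used when motion is slow.
SLOW_MOTION_PARAMS = {
    "deblock_filter": 1,  # deblocking filter on
    "ref": 4,             # 2 <= Ref <= 16
    "me_range": 16,       # 8 < me_range <= 64
    "me_method": "umh",   # more thorough search algorithm
    "subme": 7,           # 1 < subme <= 11
    "lookahead": 20,      # 2 <= lookahead <= 250
}

def select_encoding_info(tracked_value: float) -> dict:
    """Map the tracked value to the first or second encoding information."""
    if tracked_value >= THRESHOLD:
        return FAST_MOTION_PARAMS
    return SLOW_MOTION_PARAMS
```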
Based on the first aspect, in a possible embodiment, the tracking information is information generated by performing at least one of the following operations on the decoding device: head tracking, gesture tracking, eye tracking, or motion tracking.
Head tracking tracks the movement of the head by measuring the angle, angular velocity, or angular acceleration of the user's head as it rotates, thereby triggering a response in the visual picture. Gesture tracking tracks the movement of the hand by detecting the posture, shape, movement speed, and movement direction of the user's hand in the real environment, thereby triggering a response in the visual picture or an interaction with picture elements. Eye tracking tracks eye movement by measuring the position of the gaze point of the user's eyes or the movement of the eyeball relative to the head. Motion tracking tracks the user's movement by measuring the user's position and attitude (that is, pose) in the real environment and the speed, acceleration, and direction of the user's movement in the real environment. It can be seen that the embodiments of the present application can be applied to a wide variety of tracking scenarios, meeting users' needs in different scenarios and improving the applicability and commercial value of the present application.
Based on the first aspect, in a possible embodiment, the decoding device includes one of a virtual reality (Virtual Reality, VR) device, an augmented reality (Augmented Reality, AR) device, a mixed reality (Mixed Reality, MR) device, or drone flight glasses.
For example, the VR device may be a device that applies VR technology, such as VR glasses, a VR head-mounted display, or a VR box; the AR device may be a device that applies AR technology, such as AR glasses, an AR television, or an AR head-mounted display; and the MR device may be a device that applies MR technology, such as MR glasses, an MR terminal, an MR head-mounted display, or an MR wearable device. In terms of product form, the decoding device may be a head-mounted display (Head Mount Display, HMD); the head-mounted display and the host (that is, the encoding device) can communicate and interact in a wireless or wired manner. The host encodes the image and transmits it to the head-mounted display, which decodes and displays the image, thereby bringing the user the visual and interactive experience of VR/AR/MR.
Drone flight glasses are a device for interacting with the camera of a drone. The flight glasses and the drone can communicate and interact wirelessly: the drone encodes the captured images/video and transmits them to the flight glasses, which decode and display the images, thereby giving the user the drone's field of view and even enabling control of the drone's flight attitude and shooting direction.
In a second aspect, an embodiment of the present application provides an apparatus for encoding an image. The apparatus is applied to an encoding device and includes a receiving module, a parameter adjustment module, an encoding module, and a transmitting module, where: the receiving module is configured to receive tracking information of a decoding device, the tracking information including motion information or pose information of the decoding device; the parameter adjustment module is configured to configure, according to the tracking information of the decoding device, encoding information of an image to be processed, where the tracking information is associated with the encoding information and the encoding information includes one or more encoding parameters; the encoding module is configured to encode the image to be processed according to the encoding information; and the transmitting module is configured to send a bitstream to the decoding device, the bitstream including the one or more encoding parameters.
Likewise, the tracking information is obtained by the decoding device by tracking and detecting the motion state of the device itself or of the user, and includes at least one of motion information and pose information. The motion information indicates the motion of the decoding device; in a specific embodiment, it includes the motion speed and/or acceleration of the decoding device, where the motion speed includes an angular velocity and/or a linear velocity, and the acceleration includes an angular acceleration and/or a linear acceleration. The pose information indicates the position information and/or attitude information of the decoding device or of the user; that is, the pose information can represent the position and attitude (or orientation) of the decoding device in three-dimensional space.
That the tracking information is associated with the encoding information means that there is a correspondence between the two, and this relationship is stored in the encoding device. For example, the relationship may be a direct mapping, that is, the tracking information is bound to the encoding information, and the parameter adjustment module can determine the encoding information directly from the tracking information. As another example, the relationship may be indirect; for instance, the parameter adjustment module must apply certain algorithmic processing or conditional judgment to the tracking information before the corresponding encoding information can be determined. After the receiving module receives the specific tracking information uploaded by the decoding device, the corresponding encoding information can be determined according to that specific tracking information.
The encoding information includes one or more encoding parameters used by the encoder of the encoding device to encode the image to be processed (also called the image to be encoded). Since the encoder performs the encoding process according to the encoding parameters, different encoding parameters require different amounts of computation during encoding, that is, they yield different computational complexity. In other words, in the present application the encoding device can adjust its configured encoding parameters in real time based on the tracking information uploaded by the decoding device, thereby adjusting the computational complexity of encoding.
It can be seen that the apparatus of this embodiment of the present application can receive in real time an instruction fed back by the decoding device that contains at least one kind of tracking information such as position/attitude/linear velocity/angular velocity/acceleration, and adjust the encoding information of the encoder in the encoding device according to the received tracking information; the adjustment strategy may be to adjust the computational complexity of the encoder (that is, the encoding parameters), thereby reducing the latency of the whole system. The image-related information and the encoding parameters can subsequently be sent to the decoding device so that it can decode and display normally, which can greatly reduce or even eliminate the possibility of black borders in the picture, ensures smooth display on the decoding device, and avoids stuttering.
Based on the second aspect, in a possible embodiment, the parameter adjustment module is specifically configured to: query a preset mapping relationship according to the tracking information to obtain the encoding information of the image to be processed, where the preset mapping relationship includes a mapping from the tracking information to the encoding information; and configure the encoding information.
Based on the second aspect, in a possible embodiment, the one or more encoding parameters include one or more of a deblocking filter parameter, a number of reference frames, a motion estimation search range, a motion estimation method, a sub-pixel subdivision strength, and a lookahead optimizer parameter.
Based on the second aspect, in a possible embodiment, when the tracking information is greater than or equal to a preset threshold, the tracking information maps to first encoding information; when the tracking information is less than the preset threshold, the tracking information maps to second encoding information; and the first encoding information and the second encoding information satisfy at least one of the following relationships:
the deblocking filter parameter in the first encoding information indicates that the deblocking filter is turned off, and the deblocking filter parameter in the second encoding information indicates that the deblocking filter is turned on;
the number of reference frames in the first encoding information is smaller than the number of reference frames in the second encoding information; the motion estimation search range in the first encoding information is smaller than the motion estimation search range in the second encoding information;
the computational load of the motion estimation method in the first encoding information is smaller than the computational load of the motion estimation method in the second encoding information;
the sub-pixel subdivision strength in the first encoding information is smaller than the sub-pixel subdivision strength in the second encoding information;
the lookahead optimizer parameter in the first encoding information is smaller than the lookahead optimizer parameter in the second encoding information.
Based on the second aspect, in a possible embodiment, the tracking information is information generated by performing at least one of the following operations on the decoding device: head tracking, gesture tracking, eye tracking, or motion tracking.
Based on the second aspect, in a possible embodiment, the motion information of the decoding device includes the motion speed and/or acceleration of the decoding device, where the motion speed includes an angular velocity and/or a linear velocity, and the acceleration includes an angular acceleration and/or a linear acceleration.
Based on the second aspect, in a possible embodiment, the decoding device includes one of a virtual reality (VR) device, an augmented reality (AR) device, a mixed reality (MR) device, or drone flight glasses.
In each of the embodiments described in the second aspect above, the functional modules of the apparatus can cooperate with one another to implement the methods described in the related embodiments of the first aspect.
In a third aspect, an embodiment of the present application provides a device for encoding an image. The device may be an encoding device that includes a memory, a processor, and a transceiver; the memory, the processor, and the transceiver may be connected by a bus, or at least two of them may be coupled together. Specifically:
the transceiver is configured to receive data from and send data to the outside;
the memory is configured to store program instructions and data;
the processor is configured to execute the program instructions in the memory to implement the method described in the first aspect or in any possible embodiment of the first aspect.
In a fourth aspect, an embodiment of the present application provides a system that includes an encoding device and a decoding device, where: the decoding device is configured to send tracking information of the decoding device to the encoding device, the tracking information including motion information or pose information of the decoding device; the encoding device is configured to configure, according to the tracking information of the decoding device, encoding information of an image to be processed, where the tracking information is associated with the encoding information and the encoding information includes one or more encoding parameters, and to encode the image to be processed according to the encoding information and send a bitstream to the decoding device, the bitstream including the one or more encoding parameters; and the decoding device is configured to decode and display the image according to the bitstream.
Specifically, the encoding device may be the encoding device described in any embodiment of the second aspect or the third aspect.
In a fifth aspect, an embodiment of the present application provides a computing node cluster (also called a cloud cluster), including at least one computing node, where each computing node includes a processor and a memory, and the processor executes the code in the memory to perform the method described in any embodiment of the first aspect.
In a sixth aspect, an embodiment of the present invention provides a non-volatile computer-readable storage medium; the computer-readable storage medium is used to store implementation code of the method described in the first aspect. When the program code is executed by a computer, the computer implements the method described in any embodiment of the first aspect.
In a seventh aspect, an embodiment of the present invention provides a computer program product; the computer program product includes program instructions, and when the computer program product is executed by a computer, the computer performs the method described in any embodiment of the first aspect. The computer program product may be a software installation package; when the method provided by any possible design of the first aspect needs to be used, the computer program product can be downloaded and executed on a computer to implement that method.
It can be seen that, by implementing the embodiments of the present application, the encoding device can receive in real time an instruction fed back by the decoding device that contains at least one kind of information such as position/attitude/linear velocity/angular velocity/acceleration, and automatically adjust the computational complexity (encoding parameters) of the encoder according to the position/attitude/linear velocity/angular velocity/acceleration of the decoding device, thereby adjusting the encoding latency and reducing the latency of the whole system. The image-related information and the configured encoding parameters can subsequently be sent to the decoding device so that it can decode and display normally. By lowering the computational complexity of the encoding device during encoding, the present application reduces the system latency, can fundamentally eliminate the possibility of black borders in the picture, ensures smooth display on the decoding device, and avoids stuttering.
Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the background art more clearly, the accompanying drawings required in the embodiments of the present invention or in the background art are described below.
FIG. 1 is a block diagram of an example video coding system 10 provided by an embodiment of the present application;
FIG. 2 is an example diagram of a device experience scenario to which an embodiment of the present application is applied;
FIG. 3 is an example diagram of another device experience scenario to which an embodiment of the present application is applied;
FIG. 4 is a schematic structural diagram of a video coding device provided by an embodiment of the present application;
FIG. 5 is a simplified block diagram of an apparatus that can be used as either or both of a source device and a destination device according to an embodiment of the present application;
FIG. 6 is an example diagram of a scenario in which a user wearing a device turns the head, provided by an embodiment of the present application;
FIG. 7 is an example diagram of the black-border phenomenon provided by an embodiment of the present application;
FIG. 8 is an example flowchart of an information transmission solution provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of four tracking modes for interaction between the user and the picture involved in an embodiment of the present application;
FIG. 10 is a schematic flowchart of an information transmission method provided by an embodiment of the present application;
FIG. 11 is a schematic diagram of some search templates provided by an embodiment of the present application;
FIG. 12 is a logical schematic diagram of determining encoding information according to tracking information provided by an embodiment of the present application;
FIG. 13 is a schematic flowchart of another information transmission method provided by an embodiment of the present application;
FIG. 14 is an example flowchart of another information transmission solution provided by an embodiment of the present application;
FIG. 15 is a schematic structural diagram of a system provided by an embodiment of the present application and of the encoding device and decoding device in the system.
Detailed Description
The embodiments of the present invention are described below with reference to the accompanying drawings in the embodiments of the present invention. The terms used in the description of the embodiments of the present invention are only for explaining specific embodiments of the present invention and are not intended to limit the present invention.
The terms "first", "second", and the like in the specification, the claims, and the foregoing drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion, for example, the inclusion of a series of steps or units. A method, system, product, or device is not necessarily limited to the steps or units clearly listed, but may include other steps or units that are not clearly listed or that are inherent to the process, method, product, or device.
It should be understood that in this application, "at least one (item)" means one or more, and "multiple" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: only A exists, only B exists, or both A and B exist, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects before and after it. "At least one of the following (items)" or a similar expression refers to any combination of these items, including a single item or any combination of multiple items. For example, at least one of a, b, or c may mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be single or multiple.
To better understand the technical solutions of the present application, the architecture of the video coding system to which the embodiments of the present invention are applied is described first. Video coding generally refers to the technology of processing a sequence of pictures that forms a video or video sequence. The video coding technology used herein may include video encoding and video decoding. Video encoding is performed on the source side and usually includes processing (for example, compressing) the original video pictures to reduce the amount of data required to represent them, for more efficient storage and/or transmission. Video decoding is performed on the destination side and usually includes inverse processing relative to the encoder to reconstruct the video pictures. The "coding" of video pictures in the embodiments should be understood as referring to the "encoding" or "decoding" of a video sequence. The combination of the encoding part and the decoding part is also called codec (encoding and decoding). As used herein, the term "video coder" generally refers to both a video encoder and a video decoder. In this document, the term "video coding" or "coding" may generally refer to video encoding or video decoding.
Referring to FIG. 1, FIG. 1 is a block diagram of an example video coding system 10 described in an embodiment of the present invention. As shown in FIG. 1, the video coding system 10 may include a source device 12 and a destination device 14. The source device 12 generates encoded video data and may therefore be referred to as a video encoding apparatus. The destination device 14 can decode the encoded video data generated by the source device 12 and may therefore be referred to as a video decoding apparatus. Various implementations of the source device 12, the destination device 14, or both may include one or more processors and a memory coupled to the one or more processors. The memory may include, but is not limited to, RAM, ROM, EEPROM, flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures accessible by a computer.
The source device 12 and the destination device 14 may be communicatively connected through a link 13, and the destination device 14 may receive encoded video data from the source device 12 via the link 13. The link 13 may include one or more media or apparatuses capable of moving the encoded video data from the source device 12 to the destination device 14. In one example, the link 13 may include one or more communication media that enable the source device 12 to transmit the encoded video data directly to the destination device 14 in real time. In this example, the source device 12 may modulate the encoded video data according to a communication standard (for example, a wireless communication protocol) and may transmit the modulated video data to the destination device 14. The one or more communication media may include wireless and/or wired communication media, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (for example, the Internet). The one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from the source device 12 to the destination device 14.
The source device 12 and the destination device 14 may include various apparatuses, and the existence and (exact) division of the functionality of the source device 12 and/or the destination device 14 may vary according to the actual device and application. At least one of the source device 12 and the destination device 14 may include a desktop computer, a mobile computing apparatus, a notebook (for example, laptop) computer, a tablet computer, a set-top box, a mobile phone, a smartphone, a television, a camera, a display apparatus, a digital media player, a video game console, a video streaming device (such as a content service server or a content distribution server), a broadcast receiver device, a broadcast transmitter device, a vehicle-mounted device, a mobile vehicle, or the like.
Referring to FIG. 2, in some possible implementations, the solution of the present application can be applied to immersive virtual visual experience scenarios. The source device 12 may be a host, which may be an independent terminal, a computing device, a physical server, or a cloud computing platform. The destination device 14 may be a virtual reality (Virtual Reality, VR) device, an augmented reality (Augmented Reality, AR) device, a mixed reality (Mixed Reality, MR) device, or the like.
For example, the VR device may be a device that applies VR technology, such as VR glasses, a VR head-mounted display, or a VR box; the AR device may be a device that applies AR technology, such as AR glasses, an AR television, or an AR head-mounted display; and the MR device may be a device that applies MR technology, such as MR glasses, an MR terminal, an MR head-mounted display, or an MR wearable device.
In terms of product form, the destination device 14 may be a head-mounted display (Head Mount Display, HMD). The head-mounted display and the host can communicate and interact in a wireless or wired manner: the host encodes the image and transmits it to the head-mounted display, which decodes and displays the image, thereby bringing the user the visual and interactive experience of VR/AR/MR. The head-mounted display may be, for example, a mobile headset or a host-tethered headset. A mobile headset, such as VR/AR/MR glasses or a VR/AR/MR phone box, can connect to the host wirelessly (for example, via Bluetooth, Wi-Fi, or a mobile network). A host-tethered headset may also be called an externally connected head-mounted device; such a headset requires a wired connection to the host and other accessories for use.
In addition, in yet another possible implementation, the computing functions of the host may also be integrated into the head-mounted display device. For example, the head-mounted display device may be an all-in-one headset, which has an independent display unit (serving as the decoding end) and a computing unit (serving as the encoding end); the two complete their communication and interaction inside the all-in-one headset.
Referring to FIG. 3, in still other possible implementations, the solution of the present application can also be applied to control or visual experience scenarios of unmanned vehicles. For example, the source device 12 may be a drone, a self-driving car (not shown), or the like, and may be equipped with a camera for image capture and encoding. The destination device 14 may be drone flight glasses, a self-driving car control apparatus (not shown), or the like. FIG. 3 shows a scenario of interaction between flight glasses and a drone. The flight glasses and the drone can communicate wirelessly: the drone encodes the captured images/video and transmits them to the flight glasses, which decode and display the images, thereby giving the user the drone's field of view and even enabling control of the drone's flight attitude and shooting direction.
Further, the source device 12 includes an encoder 20; optionally, the source device 12 may also include a picture source 16, a picture preprocessor 18, and a communication interface 22. In a specific implementation form, the encoder 20, the picture source 16, the picture preprocessor 18, and the communication interface 22 may be hardware components in the source device 12 or software programs in the source device 12. They are described as follows:
The picture source 16 may include or be any kind of picture capture device, used for example to capture real-world pictures or video, and/or any kind of device for generating pictures or annotations (for screen content encoding, some text on the screen is also considered part of the picture or image to be encoded), for example a computer graphics processor for generating computer-animated pictures, or any kind of device for obtaining and/or providing real-world pictures (for example, images captured by a camera) or computer-animated pictures (for example, screen content or VR pictures), and/or any combination thereof (for example, AR/MR pictures). The picture source 16 may be a camera for capturing pictures or a memory for storing pictures, and may also include any kind of (internal or external) interface for storing previously captured or generated pictures and/or for obtaining or receiving pictures. In the field of video coding, the terms "picture", "frame", and "image" can be used as synonyms. When the picture source 16 is a camera, it may be, for example, a local camera or a camera integrated in the source device; when the picture source 16 is a memory, it may be a local memory or, for example, a memory integrated in the source device. When the picture source 16 includes an interface, the interface may be, for example, an external interface for receiving pictures from an external video source; the external video source is, for example, an external picture capture device such as a camera, an external memory, or an external picture generating device such as an external computer graphics processor, computer, or server. The interface may be any kind of interface according to any proprietary or standardized interface protocol, for example a wired or wireless interface or an optical interface.
In this embodiment of the present invention, the picture transmitted from the picture source 16 to the picture preprocessor may also be referred to as original picture data 17.
The picture preprocessor 18 is configured to receive the original picture data 17 and perform preprocessing on it to obtain a preprocessed picture 19 or preprocessed picture data 19. For example, the preprocessing performed by the picture preprocessor 18 may include one or more of image rendering, trimming, color format conversion, color grading, or denoising.
The encoder 20 is configured to receive the preprocessed picture data 19 and process it using the configured coding prediction mode and encoding parameters, thereby providing encoded picture data 21. Generally, picture data can be partitioned into a set of non-overlapping blocks (also called image blocks or video blocks); in other words, the image currently to be processed by the encoder 20 may include one or more blocks at the block level. The encoder 20 can perform encoding at the block level. Herein, the term "image to be processed" may refer to a part of a picture or frame. Specifically, the "image to be processed" may be an "image block to be processed", that is, the block currently being processed. In encoding, the image to be processed may include the block currently to be encoded; in decoding, the image to be processed may include the block currently to be decoded. For example, on the encoder side, a prediction block is generated through spatial (intra-picture) prediction and temporal (inter-picture) prediction, the prediction block is subtracted from the current block (the block currently being processed or to be processed) to obtain a residual block, and the residual block is transformed in the transform domain and quantized to reduce the amount of data to be transmitted (compressed); the decoder side applies the inverse processing, relative to the encoder, to the encoded or compressed block to reconstruct the current block for presentation. In addition, the encoder replicates the decoder processing loop so that the encoder and the decoder generate identical predictions (for example, intra prediction and inter prediction) and/or reconstructions for processing, that is, encoding, subsequent blocks.
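As an illustrative sketch only (not the claimed encoder), the block-level "predict, subtract, transform, quantize" loop described above can be outlined as follows; the 2-D DCT basis is built explicitly so the snippet depends only on NumPy, and the block size and quantization step are assumptions:

```python
import numpy as np

N = 8  # assumed block size

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    d = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    d[0, :] /= np.sqrt(2.0)
    return d

D = dct_matrix(N)

def encode_block(current: np.ndarray, prediction: np.ndarray, qstep: float = 8.0):
    """Toy hybrid-coding step: residual -> 2-D transform -> quantization."""
    residual = current.astype(np.float64) - prediction   # subtract prediction
    coeffs = D @ residual @ D.T                          # 2-D DCT of the residual
    return np.round(coeffs / qstep)                      # uniform quantization

def decode_block(levels: np.ndarray, prediction: np.ndarray, qstep: float = 8.0):
    """Inverse processing: dequantize -> inverse transform -> add prediction."""
    coeffs = levels * qstep
    residual = D.T @ coeffs @ D
    return residual + prediction
```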
在一些实施例中,编码器20可以用于执行后文所描述的各个实施例,以实现本发明所描述的信息传输方法在编码侧的应用。In some embodiments, the encoder 20 may be used to implement the various embodiments described below to realize the application of the information transmission method described in the present invention on the encoding side.
通信接口22,可用于接收经编码图片数据21,并可通过链路13将经编码图片数据21传输至目的地设备14或任何其它设备(如存储器),以用于存储或直接重构,所述其它设备可为任何用于解码或存储的设备。通信接口22可例如用于将经编码图片数据21封装成合适的格式,例如数据包,以在链路13上传输。The communication interface 22 can be used to receive the encoded picture data 21, and can transmit the encoded picture data 21 to the destination device 14 or any other device (such as a memory) through the link 13 for storage or direct reconstruction, so The other device can be any device used for decoding or storage. The communication interface 22 can be used, for example, to encapsulate the encoded picture data 21 into a suitable format, such as a data packet, for transmission on the link 13.
The destination device 14 includes a decoder 30; optionally, the destination device 14 may further include a communication interface 28, a picture post-processor 32, and a display device 34. These are described as follows:
The communication interface 28 may be used to receive the encoded picture data 21 from the source device 12 or from any other source, for example a storage device such as an encoded-picture-data storage device. The communication interface 28 may be used to transmit or receive the encoded picture data 21 via the link 13 between the source device 12 and the destination device 14, or via any type of network; the link 13 is, for example, a direct wired or wireless connection, and the network may be any wired or wireless network or any combination thereof, or any type of private and public network or any combination thereof. The communication interface 28 may, for example, be used to decapsulate the data packets transmitted by the communication interface 22 so as to obtain the encoded picture data 21.
Both the communication interface 28 and the communication interface 22 may be configured as unidirectional or bidirectional communication interfaces, and may be used, for example, to send and receive messages to establish a connection, and to acknowledge and exchange any other information related to the communication link and/or to the data transmission, such as the transmission of encoded picture data.
The decoder 30 is configured to receive the encoded picture data 21 and to parse out the indication information carried in the bitstream, which indicates the encoding parameters used by the encoder 20 when encoding the image; based on the encoded picture data 21 and the indication information, the decoder performs image decoding and thereby provides decoded picture data 31 (also called reconstructed picture data). In some embodiments, the decoder 30 may be used to carry out the embodiments described below, so as to apply the information transmission method described in the present invention on the decoding side.
The picture post-processor 32 is configured to post-process the decoded picture data 31 to obtain post-processed picture data 33. The post-processing performed by the picture post-processor 32 may include one or more of, for example, rendering, color format conversion, color grading, trimming, resampling, or any other processing, and the post-processor may also be used to transmit the post-processed picture data 33 to the display device 34. In an optional embodiment of the present application, the decoding device may further enable or disable one or more of the processing algorithms used by the picture post-processor 32 according to tracking information (for example speed, angular velocity, acceleration, linear velocity, position, posture, and similar information); for example, at least one of the following algorithms may be adjusted: a standard dynamic range (Standard-Dynamic Range, SDR) image algorithm, a high dynamic range (High-Dynamic Range, HDR) image algorithm, an image enhancement algorithm, an image super-resolution algorithm, and so on.
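A minimal sketch of such tracking-driven toggling is given below; the threshold value and the algorithm names are illustrative assumptions, not values specified by this application:

    FAST_MOTION_THRESHOLD = 2.0  # rad/s, assumed value

    def select_postprocessing(angular_velocity):
        # Keep only the cheap SDR path while the device moves fast; re-enable
        # the costly algorithms (HDR, enhancement, super-resolution) when slow.
        heavy = {"HDR", "enhancement", "super_resolution"}
        if angular_velocity >= FAST_MOTION_THRESHOLD:
            return {"SDR"}
        return {"SDR"} | heavy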
The display device 34 is configured to receive the post-processed picture data 33 so as to display the picture to, for example, a user or viewer. The display device 34 may be, or may include, any type of display for presenting the reconstructed picture, for example an integrated or external display or monitor. For example, the display may include a liquid crystal display (liquid crystal display, LCD), an organic light-emitting diode (organic light emitting diode, OLED) display, a plasma display, a projector, a micro-LED display, a liquid crystal on silicon (liquid crystal on silicon, LCoS) display, a digital light processor (digital light processor, DLP), or any other type of display.
Although FIG. 1 depicts the source device 12 and the destination device 14 as separate devices, device embodiments may also include both devices, or the functionality of both, i.e. the source device 12 or corresponding functionality together with the destination device 14 or corresponding functionality. In such embodiments, the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality may be implemented using the same hardware and/or software, using separate hardware and/or software, or any combination thereof.
Each of the encoder 20 and the decoder 30 may be implemented as any of a variety of suitable circuits, for example one or more microprocessors, digital signal processors (digital signal processor, DSP), application-specific integrated circuits (application-specific integrated circuit, ASIC), field-programmable gate arrays (field-programmable gate array, FPGA), discrete logic, hardware, or any combination thereof. If the techniques are implemented partially in software, a device may store the software instructions in a suitable non-transitory computer-readable storage medium and may execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) may be regarded as one or more processors.
In some cases, the video coding system 10 shown in FIG. 1 is merely an example, and the techniques of this application may apply to video coding settings (for example video encoding or video decoding) that do not necessarily include any data communication between the encoding and decoding devices. In other examples, data may be retrieved from local memory, streamed over a network, and so on. A video encoding device may encode data and store it to memory, and/or a video decoding device may retrieve data from memory and decode it. In some examples, encoding and decoding are performed by devices that do not communicate with each other but simply encode data to memory and/or retrieve data from memory and decode it.
Referring to FIG. 4, FIG. 4 is a schematic structural diagram of a video coding device 400 (for example a video encoding device 400 or a video decoding device 400) provided by an embodiment of this application. The video coding device 400 is suitable for implementing the embodiments described herein. In one embodiment, the video coding device 400 may be a video decoder (for example the decoder 30 of FIG. 1) or a video encoder (for example the encoder 20 of FIG. 1). In another embodiment, the video coding device 400 may be one or more components of the decoder 30 of FIG. 1 or of the encoder 20 of FIG. 1 described above.
The video coding device 400 includes: ingress ports 410 and a receiver unit (Rx) 420 for receiving data; a processor, logic unit, or central processing unit (CPU) 430 for processing data; a transmitter unit (Tx) 440 and egress ports 450 for transmitting data; and a memory 460 for storing data. The video coding device 400 may further include optical-to-electrical conversion components and electrical-to-optical (EO) components coupled to the ingress ports 410, the receiver unit 420, the transmitter unit 440, and the egress ports 450, serving as egress or ingress for optical or electrical signals.
The processor 430 is implemented by hardware and software. The processor 430 may be implemented as one or more CPU chips, cores (for example multi-core processors), FPGAs, ASICs, and DSPs. The processor 430 communicates with the ingress ports 410, the receiver unit 420, the transmitter unit 440, the egress ports 450, and the memory 460. The processor 430 includes a coding module 470 (for example an encoding module 470 or a decoding module 470). The coding module 470 implements the embodiments disclosed herein, so as to realize the methods provided by the embodiments of the present invention. For example, the coding module 470 implements, processes, or provides various coding operations. The coding module 470 therefore substantially improves the functionality of the video coding device 400 and effects the transformation of the video coding device 400 to a different state. Alternatively, the coding module 470 is implemented as instructions stored in the memory 460 and executed by the processor 430.
The memory 460 includes one or more disks, tape drives, and solid-state drives, and may be used as an overflow data storage device to store programs when such programs are selected for execution, and to store instructions and data read during program execution. The memory 460 may be volatile and/or non-volatile, and may be read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM), and/or static random access memory (SRAM).
Referring to FIG. 5, FIG. 5 is a simplified block diagram of an apparatus 500 that may be used as either or both of the source device 12 and the destination device 14 of FIG. 1, according to an exemplary embodiment. The apparatus 500 may take the form of a computing system including multiple computing devices (such as multiple computing chips or multiple servers), or the form of a single computing device such as a desktop computer, a mobile computing device, a notebook computer, a tablet computer, a set-top box, a mobile phone, a smartphone, a television, a camera, a display device, a digital media player, a video game console, a video streaming device, a broadcast receiver device, a broadcast transmitter device, an in-vehicle device, a mobile vehicle, or the like.
The processor 502 in the apparatus 500 may be a central processing unit. Alternatively, the processor 502 may be any other type of device, or multiple devices, capable of manipulating or processing information, whether existing now or developed in the future. As shown in the figure, although the disclosed implementations may be practiced with a single processor such as the processor 502, advantages in speed and efficiency may be achieved by using more than one processor.
In one implementation, the memory 504 in the apparatus 500 may be a read-only memory (Read Only Memory, ROM) device or a random access memory (random access memory, RAM) device. Any other suitable type of storage device may be used as the memory 504. The memory 504 may include code and data 506 accessed by the processor 502 using a bus 512. The memory 504 may further include an operating system 508 and application programs 510; the application programs 510 include at least one program that permits the processor 502 to perform the methods described herein. For example, the application programs 510 may include applications 1 through N, which further include a video coding application that performs the methods described herein, such as an AR/VR/MR application, a drone flight/shooting control application, an autonomous driving control application, and so on. The apparatus 500 may also include additional memory in the form of a secondary memory 514, which may for example be a memory card used with a mobile computing device. Because a video communication session may contain a large amount of information, this information may be stored in whole or in part in the secondary memory 514 and loaded into the memory 504 as needed for processing.
The apparatus 500 may also include one or more output devices, such as a display 518. In one example, the display 518 may be a touch-sensitive display that combines a display with a touch-sensitive element operable to sense touch inputs. The display 518 may be coupled to the processor 502 via the bus 512. In addition to the display 518, other output devices that permit a user to program the apparatus 500 or otherwise use it may be provided, or other output devices may be provided as an alternative to the display 518. When the output device is or includes a display, the display may be implemented in various ways, including as a liquid crystal display (liquid crystal display, LCD), a cathode-ray tube (cathode-ray tube, CRT) display, a plasma display, or a light-emitting diode (LED) display such as an organic LED (organic LED, OLED) display.
The apparatus 500 may also include, or communicate with, an image sensing device 520, for example a camera, an infrared detector, or any other image sensing device, whether existing now or developed in the future, that can sense images. The image sensing device 520 may be positioned to face directly toward the user operating the apparatus 500, or to face directly toward the external environment. In one example, the position and optical axis of the image sensing device 520 may be configured so that its field of view includes an area immediately adjacent to the display 518 from which the display 518 is visible.
When the apparatus 500 is the destination device 14, it may optionally further include a motion sensing device 522, which may be used to realize interaction between the user and the destination device. Specifically, the motion sensing device 522 may be used to detect at least one type of information such as the position/posture/linear velocity/angular velocity/acceleration of the destination device or of the user's body parts, so as to implement the tracking modes described in the embodiments of this application: head tracking, gesture tracking, eye tracking, and motion tracking. For example, when the motion sensing device 522 is used to perform the head tracking described in the embodiments of this application, it may include at least one sensor such as an accelerometer, a gyroscope, a magnetometer, an optical capture device, or an inertial sensor, to monitor in real time at least one type of information such as the rotation angle, angular velocity, angular acceleration, and rotation direction of the head of the user wearing the destination device. When the motion sensing device 522 is used to perform the gesture tracking described in the embodiments of this application, it may include at least one device such as an accelerometer, a gyroscope, a magnetometer, an inertial sensor, or an optical capture device such as an optical camera, an infrared camera, or a depth sensor, to monitor in real time at least one type of information such as the posture, shape, movement speed, and movement direction of the user's hands. When the motion sensing device 522 is used to perform the eye tracking described in the embodiments of this application, it may include at least one device such as a built-in camera, an eye tracker, an infrared controller, or an iris image detector, to monitor in real time at least one type of information such as the position, gaze direction, movement direction, and movement speed of the user's eyeballs. When the motion sensing device 522 is used to perform the motion tracking described in the embodiments of this application, it may include at least one device such as an inertial measurement unit (IMU), an accelerometer, a gyroscope, a magnetometer, a depth camera, or a SLAM (simultaneous localization and mapping) system, to monitor at least one type of information such as the speed, acceleration, direction, position, and posture of the user moving in the real environment.
Although FIG. 5 depicts the processor 502 and the memory 504 of the apparatus 500 as integrated in a single unit, other configurations may also be used. The operations of the processor 502 may be distributed across multiple directly coupled machines (each having one or more processors), or across a local area or other network. The memory 504 may be distributed across multiple machines, for example as network-based memory or as memory in multiple machines running the apparatus 500. Although only a single bus is depicted here, the bus 512 of the apparatus 500 may be formed of multiple buses. Further, the secondary memory 514 may be directly coupled to the other components of the apparatus 500 or may be accessed over a network, and may comprise a single integrated unit such as one memory card, or multiple units such as multiple memory cards. The apparatus 500 may therefore be implemented in a wide variety of configurations.
Existing VR/AR/MR technology can give users an immersive visual experience and support interaction between the user and the picture.
Referring to FIG. 6, and taking head-mounted VR glasses as an example, current VR glasses achieve immersion and interaction mainly in two ways:
First, current VR glasses typically produce an image field of vision (Field of Vision, FOV) exceeding 90 degrees (for example 90 to 120 degrees). Through magnified display technology, a magnified local virtual scene can be presented in front of the user's eyes, and within this display range, real-time three-dimensional images can be generated by a three-dimensional engine.
Second, by using data collected by head pose sensing (for example a gyroscope on the head), the three-dimensional engine responds to the direction of head rotation (and to changes in the current head position). When the person turns their head, the gyroscope notifies the image generation engine to render a new picture accordingly; the image generation engine sends the new picture back to the VR glasses, and the VR glasses update the displayed three-dimensional image in real time. In this process, the angle of the user's head rotation exactly matches the view of the three-dimensional image simulated by the three-dimensional engine, so that the user feels as if they were observing a surrounding virtual three-dimensional world through a large window. Because the user's head rotation produces a picture change in the virtual world that the user can understand, the user perceives the virtual world as responding to them; the user's action combined with the virtual world's feedback to the user thus constitutes an interaction.
In CloudVR technology, the image generation engine resides on a cloud server: game picture rendering is performed on the server side, and game interaction takes place on the VR glasses side. The server and the VR glasses are connected over a wireless network, and after rendering, the new picture is transmitted wirelessly back to the VR glasses for display. As shown in FIG. 6, after the user puts on the VR glasses, assume that the current field of view (FOV) is "view 1". When the user turns their head by a certain angle, the field of view rotates from "view 1" to "view 2". If the picture is not updated in time, the human eye may perceive a black area at the edge of "view 2", i.e. black edges. Part (1) of FIG. 7 shows a VR scene without black edges, and part (2) of FIG. 7 shows a VR scene with black edges.
Generally speaking, a certain delay exists in the wireless transmission of image information. When the user turns their head too quickly, the rate at which pictures must be updated increases, and the delay means that these pictures are not displayed on the VR glasses in time, which causes the black-edge phenomenon. In addition, because pictures are not updated in time, the VR glasses may also stutter and otherwise feel unsmooth, significantly affecting the VR gaming experience.
The devices/apparatuses described in FIG. 1 to FIG. 4 of the embodiments of this application can remedy these defects of the prior art: they can avoid black edges, stuttering, and other unsmooth phenomena at the decoding end while at the same time preserving image resolution.
The technical solution proposed by the embodiments of this application is shown in FIG. 8. The CloudVR system comprises two parts: a cloud server (corresponding to the source device 12 described in this application) and a VR device (corresponding to the destination device 14 described in this application). After the cloud server receives an instruction fed back by the VR device containing information such as posture/rotational angular velocity/acceleration, the game rendering engine renders the corresponding image (the rendering resolution can remain unchanged) and sends the image together with the posture/rotational angular velocity/acceleration information to the encoder. By evaluating information such as the rotation speed/acceleration/posture of the VR device, the encoder automatically adjusts its computational complexity (encoding parameters) and thereby its encoding delay; the image-related information can subsequently be sent to the VR device for decoding and display. Specifically, when the VR device rotates quickly, the computational complexity of the encoder is reduced, which reduces the encoding delay and hence the overall CloudVR system delay, eliminating the possibility of black edges on the picture; when the VR device rotates slowly, the computational complexity of the encoder returns to normal, and the system delay returns to normal. Because the VR device in this solution can decode and display images in real time and in a timely manner, display fluency is also guaranteed and stuttering is avoided.
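The per-frame server behavior described above might be sketched as follows; all function names, the delay figures, and the angular-velocity threshold are hypothetical placeholders rather than details from this application:

    FAST_TURN = 1.5  # rad/s, assumed threshold

    def render(pose):                            # game rendering engine stub
        return {"pose": pose, "pixels": b"..."}  # rendering resolution unchanged

    def encode(frame, low_complexity):           # encoder stub
        delay_ms = 5 if low_complexity else 12   # illustrative encoding delays
        return {"frame": frame, "low_complexity": low_complexity,
                "delay_ms": delay_ms}

    def server_step(pose, angular_velocity):
        frame = render(pose)
        fast = angular_velocity >= FAST_TURN     # fast rotation: cut encoder work
        return encode(frame, low_complexity=fast)  # sent to the VR device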
The encoding computational complexity referred to herein is jointly determined by the texture complexity and the motion complexity of the video image: the more uniform the video image and the smoother the motion, the lower the encoding computational complexity; conversely, the higher it is.
It should be noted that the embodiment shown in FIG. 8 takes a VR device as the destination device by way of example; after the user puts on the VR device, the detection of its posture/rotational angular velocity/acceleration can be realized by head tracking. In practical applications of this application, however, the information that the VR device feeds back to the cloud server is not limited to information obtained by head tracking; it may also be information obtained by gesture tracking, by eye tracking, or by motion tracking.
Referring to FIG. 9, these four tracking modes for realizing interaction between the user and the picture are described below.
(1) Head tracking
Head tracking tracks head movement by measuring the angle, angular velocity, or angular acceleration of the user's head as it turns, thereby triggering a response in the visual picture. In concrete implementations, sensors such as accelerometers, gyroscopes, magnetometers, optical capture devices, and inertial sensors can be configured inside the destination device to monitor in real time information such as the rotation angle, angular velocity, angular acceleration, and rotation direction of the head of the user wearing the destination device. The result of head tracking is that when the user wearing the destination device (for example a VR device) turns their head, the picture they see moves with the head movement, simulating the scene of the user turning their head to see a new picture and thus providing an immersive visual experience.
(2) Gesture tracking
Gesture tracking tracks hand movement by detecting the posture, shape, movement speed, movement direction, and so on of the user's hands in the real environment, thereby triggering a response in the visual picture or an interaction with picture elements. In concrete implementations, gesture tracking can be realized in two ways. One is contact-based detection, in which the user's hand is bound to a sensor (for example a data glove worn on the hand, or a device held by the user); the sensor may be an accelerometer, gyroscope, magnetometer, inertial sensor, or the like, monitoring in real time information such as the posture, shape, movement speed, and movement direction of the user's hands. The other is contactless detection, in which optical capture devices such as optical cameras, infrared cameras, and depth sensors are configured in the destination device to recognize information such as the posture, shape, movement speed, and movement direction of the user's hands. Gesture tracking allows the user to participate in and interact with the picture content directly, improving the user experience.
(3) Eye tracking
Eye tracking tracks eye movement by measuring the position of the gaze point of the user's eyes or the movement of the eyeballs relative to the head. In concrete implementations, devices such as a built-in camera, an eye tracker, an infrared controller, and an iris image detector are configured inside the destination device, and certain algorithms (for example video-based eye recording and corneal reflection methods) are used to track in real time information such as the position of the user's eyeballs, the gaze direction, the movement direction, and the movement speed. When the user wearing the destination device (for example a VR device) moves their eyes, the picture they see moves with the eye movement, simulating the scene of the user shifting their gaze to see a new picture and thus providing an immersive visual experience.
(4) Motion tracking
Motion tracking tracks the user's movement by measuring the user's position and posture (i.e. pose) in the real environment and the speed, acceleration, direction, and so on of the user's movement in the real environment. In concrete implementations, an inertial measurement unit (IMU), for example, can measure the user's translational and rotational motion in the real environment, realizing motion measurement in six degrees of freedom (6DoF). As another example, an accelerometer, gyroscope, or magnetometer can measure information such as the speed, acceleration, and direction of the user's movement in the real environment. As yet another example, a depth camera or a SLAM (simultaneous localization and mapping) system can recognize changes in the real environment, thereby determining changes in the user's own motion as well as the user's real-time position. Motion tracking can also trigger updates of the visual picture, or interactions between the user and picture elements.
It should be noted that in the embodiments of this application, the tracking mode employed may be any one of the tracking modes described above, or a combination of several of them, for example a combination of head tracking and eye tracking, or a combination of head tracking and motion tracking, and so on; this application does not limit this.
Based on the system, devices, and tracking modes described above, an information transmission method provided by an embodiment of the present invention that can be used to avoid black edges and stuttering is described below. Referring to FIG. 10, FIG. 10 is a schematic flowchart of an information transmission method provided by an embodiment of the present invention. The method is described from the perspectives of the encoding device side (also called the source device side) and the decoding device side (also called the destination device side), and includes, without being limited to, the following steps:
S101. The decoding device detects and obtains tracking information.
The tracking information is information produced by applying at least one of the following tracking modes to the decoding device: head tracking, gesture tracking, eye tracking, or motion tracking. The specifics of these tracking modes have been described above and are not repeated here. In concrete implementations, the tracking information of the decoding device includes at least one of motion information and pose information produced when the decoding device or the user's body moves, translates, or rotates; that is, it may include motion information, or pose information, or both. The motion information may include velocity (linear velocity, angular velocity, etc.) and/or acceleration (linear acceleration, angular acceleration, etc.), and the pose information may include the position and/or posture (or orientation) of the decoding device or of the user. In other words, the pose information can represent the position and posture (or orientation) of the decoding device in three-dimensional space: the position can be expressed along the three coordinate axes x, y, and z of a three-dimensional coordinate system, and the orientation can be expressed as (α, β, γ), the angles of rotation about the three coordinate axes.
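For illustration, the tracking information could be carried in a structure such as the following; the field names are assumptions, not a format defined by this application:

    from dataclasses import dataclass
    from typing import Optional, Tuple

    @dataclass
    class TrackingInfo:
        position: Optional[Tuple[float, float, float]] = None     # (x, y, z)
        orientation: Optional[Tuple[float, float, float]] = None  # (alpha, beta, gamma)
        linear_velocity: Optional[float] = None
        angular_velocity: Optional[float] = None
        acceleration: Optional[float] = None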
S102. The decoding device sends its tracking information to the encoding device; correspondingly, the encoding device receives the tracking information of the decoding device.
S103. The encoding device configures the encoding information of the image to be processed according to the tracking information of the decoding device.
Here, the image to be processed is the image that currently needs to be processed and transmitted to the decoding device side for display/interaction. The tracking information is associated with the encoding information of the image to be processed, and the encoding information includes one or more encoding parameters (or an encoding parameter set) that the encoder of the encoding device is to use when encoding the image to be processed.
These encoding parameters may include, for example, one or more of the following: an instruction for enabling or disabling the deblocking filter (deblock_filter) function, the number of reference frames (Ref), the motion estimation search range (me_range), the motion estimation method (me_method), the sub-pixel subdivision strength (subme), the lookahead optimizer parameter, and so on. They are described in turn as follows:
(1) The instruction for enabling or disabling the deblocking filter (deblock_filter) function indicates whether the deblock_filter function is activated to deblock-filter the reconstructed image. For example, deblock_filter=1 means the function is enabled and deblock_filter=0 means it is disabled. In one concrete implementation, the design may be as follows: when a specific value in the tracking information (for example a speed, acceleration, position, or posture value) is greater than or equal to a preset threshold, the decoding end is moving quickly, so "deblock_filter=0" is configured, i.e. the deblocking filter function is disabled; the encoder is thereby spared the deblocking work, which lowers the encoding complexity and hence the encoding delay, avoiding black edges, stuttering, and similar phenomena at the decoding end. Conversely, when the specific value is less than the preset threshold, "deblock_filter=1" is configured and the deblocking filter function is enabled; because the decoding end is moving slowly, the delay introduced by enabling the deblocking filter does not cause black edges or stuttering.
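A minimal sketch of this rule follows, assuming an angular-velocity threshold; the threshold value is illustrative, not one specified by this application:

    THRESHOLD = 1.5  # rad/s, assumed value

    def deblock_filter_flag(angular_velocity):
        # Fast motion: deblock_filter=0 (disabled) to save encoding time;
        # slow motion: deblock_filter=1 (enabled) for better quality.
        return 0 if angular_velocity >= THRESHOLD else 1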
(2) The number-of-reference-frames (Ref) parameter indicates the maximum number of reference frames, i.e. the number of reference frames used in image prediction. Its value range is, for example, 0 to 16: the larger the value, the more accurate the prediction and the greater the computational complexity; conversely, the smaller the value, the poorer the prediction accuracy and the smaller the computational complexity. In one concrete implementation, the design may be as follows: when a specific value in the tracking information (for example a speed, acceleration, position, or posture value) is greater than or equal to a preset threshold, the decoding end is moving quickly, so 0<Ref<2 is configured, reducing the number of reference frames used in encoding prediction and thereby the encoding complexity and the encoding delay, which avoids black edges and stuttering at the decoding end. Conversely, when the specific value is less than the preset threshold, 2≤Ref≤16 is configured; the number of reference frames in encoding prediction increases and the encoding complexity increases, but because the decoding end is moving slowly, the delay introduced by the larger number of reference frames does not cause black edges or stuttering. It should be noted that the above example merely explains, and does not limit, the solution of this application.
(3) The motion estimation search range (me_range) parameter indicates the motion estimation radius used in image prediction, i.e. the radius within which the encoder searches for a predictive match for a pixel block. For example, the motion estimation radius may range from 4 to 64: the larger the value, the larger the search range, the more accurate the prediction, and the greater the computational complexity; conversely, the smaller the value, the poorer the prediction accuracy and the smaller the computational complexity. In one concrete implementation, the design may be as follows: when a specific value in the tracking information (for example a speed, acceleration, position, or posture value) is greater than or equal to a preset threshold, the decoding end is moving quickly, so 4≤me_range≤8 is configured, reducing the motion estimation radius in encoding prediction and thereby the encoding complexity and the encoding delay, which avoids black edges and stuttering at the decoding end. Conversely, when the specific value is less than the preset threshold, 8<me_range≤64 is configured; the motion estimation radius increases and the encoding complexity increases, but because the decoding end is moving slowly, the delay introduced by the larger radius does not cause black edges or stuttering. It should be noted that the above example merely explains, and does not limit, the solution of this application.
(4) The motion estimation method (me_method) specifies the full-pixel motion estimation method. The motion estimation method involves a motion search algorithm (for example the diamond search algorithm dia, the hexagon search algorithm hex, the asymmetric cross multi-level hexagonal grid point search algorithm umh, and so on): the more complex the motion search algorithm, the more accurate the prediction and the larger the amount of computation; conversely, the simpler the algorithm, the poorer the prediction accuracy and the smaller the computational complexity.
For example, a common matching criterion for motion estimation is the rate-distortion optimization criterion, with the matching error function J = SAD + λ·R, where λ is the Lagrange constant and R represents the number of bits that may be consumed in encoding the motion vector difference. The SAD (sum of absolute differences) is computed as SAD = ∑_(x,y)∈A |s[x,y] − s′[x,y]|. The point that minimizes J is recorded as the minimum block distortion (MBD) point.
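A small worked example of this criterion is given below; the λ and R values are arbitrary illustrative numbers:

    import numpy as np

    def sad(cur, ref):
        return int(np.abs(cur.astype(np.int64) - ref.astype(np.int64)).sum())

    def rd_cost(cur, ref, mv_bits, lam):
        return sad(cur, ref) + lam * mv_bits  # J = SAD + lambda * R

    cur = np.array([[10, 12], [11, 13]])
    ref = np.array([[ 9, 12], [12, 13]])
    print(rd_cost(cur, ref, mv_bits=6, lam=4.0))  # SAD = 2, so J = 2 + 24 = 26.0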
Different motion search algorithms use different search templates and accordingly arrive at different MBD points. FIG. 11 shows some possible search templates, where the black dots in a template denote the best prediction point found at that step. The search templates listed in the figure by way of example include a small diamond template, a medium diamond template, a hexagon template, a small square template, an asymmetric cross template, a 5×5 stepwise search template, a large hexagon template, a regular octagon template, and so on. It should be noted that in concrete implementations of this application, any other possible search template may also be used, for example a full search template, a three-step search template, a four-step search template, and so on; this application does not limit this.
In one concrete implementation, for example, the design may be as follows: when a specific value in the tracking information (for example a speed, acceleration, position, or posture value) is greater than or equal to a preset threshold, the decoding end is moving quickly, so a relatively simple motion estimation method is configured, for example the diamond search algorithm dia; the search algorithm is simple and the amount of computation small, which lowers the encoding complexity and hence the encoding delay, avoiding black edges and stuttering at the decoding end. Conversely, when the specific value is less than the preset threshold, a relatively complex motion estimation method is configured, for example the hexagon search algorithm hex or the asymmetric cross multi-level hexagonal grid point search algorithm umh; the amount of computation, i.e. the encoding complexity, increases, but because the decoding end is moving slowly, the delay introduced by the more complex search algorithm does not cause black edges or stuttering. It should be noted that the above example merely explains, and does not limit, the solution of this application.
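As an illustration of the simplest of these methods, a full-pixel small-diamond search in the spirit of the dia pattern might look as follows; this is a simplified sketch under assumed inputs, not the search algorithm of any particular encoder:

    import numpy as np

    def sad(a, b):
        return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())

    def diamond_search(cur, ref, x0, y0, me_range=16):
        # Repeatedly move to the lowest-SAD neighbor of the current center
        # until the center itself is the minimum (MBD) point.
        h, w = cur.shape
        bx, by, best = x0, y0, sad(cur, ref[y0:y0 + h, x0:x0 + w])
        while True:
            candidates = []
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # small diamond
                x, y = bx + dx, by + dy
                if abs(x - x0) > me_range or abs(y - y0) > me_range:
                    continue  # stay inside the search range
                if x < 0 or y < 0 or x + w > ref.shape[1] or y + h > ref.shape[0]:
                    continue  # stay inside the reference picture
                candidates.append((sad(cur, ref[y:y + h, x:x + w]), x, y))
            if not candidates or min(candidates)[0] >= best:
                return (bx - x0, by - y0), best  # motion vector and its SAD
            best, bx, by = min(candidates)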
(5) The sub-pixel subdivision strength (subme) parameter indicates the dynamic prediction and partitioning mode. Its value range is, for example, 0 to 11: the larger the value, the more accurate the prediction and the greater the computational complexity; conversely, the smaller the value, the poorer the prediction accuracy and the smaller the computational complexity. In one concrete implementation, the design may be as follows: when a specific value in the tracking information (for example a speed, acceleration, position, or posture value) is greater than or equal to a preset threshold, the decoding end is moving quickly, so subme is configured as 0 or 1, lowering the encoding complexity and hence the encoding delay and avoiding black edges and stuttering at the decoding end. Conversely, when the specific value is less than the preset threshold, 1<subme≤11 is configured, increasing the encoding complexity; because the decoding end is moving slowly, the resulting delay does not cause black edges or stuttering. It should be noted that the above example merely explains, and does not limit, the solution of this application.
(6) The lookahead optimizer parameter sets the frame buffer size for threaded prediction. Its value range is, for example, 0 to 250: the larger the value, the more accurate the prediction and the greater the computational complexity; conversely, the smaller the value, the poorer the prediction accuracy and the smaller the computational complexity. In one concrete implementation, the design may be as follows: when a specific value in the tracking information (for example a speed, acceleration, position, or posture value) is greater than or equal to a preset threshold, the decoding end is moving quickly, so 0≤lookahead<2 is configured to shrink the frame buffer, lowering the encoding complexity and hence the encoding delay and avoiding black edges and stuttering at the decoding end. Conversely, when the specific value is less than the preset threshold, 2≤lookahead≤250 is configured to enlarge the frame buffer, increasing the encoding complexity; because the decoding end is moving slowly, the resulting delay does not cause black edges or stuttering. It should be noted that the above example merely explains, and does not limit, the solution of this application.
Based on the above analysis, the following can be summarized. In possible embodiments of this application, for convenience, when the tracking information fed back by the decoding device is greater than or equal to the preset threshold, the encoding information to which the tracking information maps may be called the first encoding information; when the tracking information is less than the preset threshold, the encoding information to which it maps may be called the second encoding information. The first encoding information and the second encoding information may then satisfy at least one of the following relationships: the deblocking filter parameter in the first encoding information indicates that the deblocking filter is disabled, while the deblocking filter parameter in the second encoding information indicates that it is enabled; the number of reference frames in the first encoding information is smaller than that in the second encoding information; the motion estimation search range in the first encoding information is smaller than that in the second encoding information; the computational cost of the motion estimation method in the first encoding information is lower than that in the second encoding information; the sub-pixel subdivision strength in the first encoding information is smaller than that in the second encoding information; and the lookahead optimizer parameter in the first encoding information is smaller than that in the second encoding information.
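These relationships can be illustrated by two hypothetical parameter presets; the concrete values are example choices within the ranges given above, not values mandated by this application:

    FIRST_ENCODING_INFO = {      # used when tracking info >= threshold
        "deblock_filter": 0,     # deblocking disabled
        "ref": 1,                # 0 < Ref < 2
        "me_range": 4,           # 4 <= me_range <= 8
        "me_method": "dia",      # cheapest search pattern
        "subme": 0,              # 0 or 1
        "lookahead": 0,          # 0 <= lookahead < 2
    }
    SECOND_ENCODING_INFO = {     # used when tracking info < threshold
        "deblock_filter": 1,     # deblocking enabled
        "ref": 4,                # 2 <= Ref <= 16
        "me_range": 24,          # 8 < me_range <= 64
        "me_method": "umh",      # more exhaustive search pattern
        "subme": 7,              # 1 < subme <= 11
        "lookahead": 40,         # 2 <= lookahead <= 250
    }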
In the embodiments of this application, the statement that the tracking information is associated with the encoding information means that there is a correspondence between the tracking information and the encoding information, and the encoding device stores this relationship.
For example, in some specific embodiments, the encoding device may store a mapping relationship between tracking information and encoding parameter sets. In this way, after receiving the tracking information, the encoding device can find the corresponding encoding parameters according to this mapping and configure them into the encoder. For example, there may be a direct mapping between the tracking information and the encoding information, i.e. the tracking information is bound to the encoding information, and the encoding information can be determined directly from the tracking information.
As another example, in some specific embodiments, the tracking information and the encoding information may be indirectly associated; for example, certain algorithmic processing of, or conditional judgments on, the tracking information may be needed to determine the corresponding encoding information. After the encoding device receives the specific tracking information uploaded by the decoding device, it can determine the corresponding encoding information from that information. Referring to FIG. 12, the encoding device may also evaluate preset conditions based on the tracking information and determine the corresponding encoding parameter set based on the result. For example, the encoding device pre-stores mapping relationships between different data intervals and encoding parameter sets; after receiving the tracking information, it can determine the data interval in which data such as the speed/acceleration/position in the tracking information falls, and then find the corresponding encoding parameter set according to that mapping.
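A sketch of such an interval-based lookup is given below, with assumed break points and preset names:

    import bisect

    BREAKPOINTS = [0.5, 1.5]     # angular-velocity break points (rad/s), assumed
    PRESETS = ["quality", "balanced", "low_delay"]

    def preset_for(angular_velocity):
        return PRESETS[bisect.bisect_right(BREAKPOINTS, angular_velocity)]

    assert preset_for(0.2) == "quality"
    assert preset_for(2.0) == "low_delay"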
It can be seen that in the embodiments of this application, after receiving in real time an instruction fed back by the decoding device containing at least one type of tracking information such as position/posture/linear velocity/angular velocity/acceleration, the encoding device can adjust the encoding information (encoding parameters) of its encoder according to the received tracking information, thereby adjusting the computational complexity of the encoder.
S104. The encoding device performs image encoding on the image to be processed according to the encoding parameters configured in S103. The encoding process may be performed by the encoder in the encoding device; the specific encoding process is not elaborated here.
S105. The encoding device encodes information indicating the encoding parameters configured in S103 into the bitstream and sends it to the decoding device.
In the embodiments of this application, because the encoding device can adjust the encoding parameters of the encoder according to the tracking information uploaded by the decoding device, information about the encoding parameters can be encoded into the bitstream to facilitate subsequent decoding by the decoding device.
可理解的是,码流中还包含了对图像编码后获得的图像信息,以便于解码设备基于该图像信息进行图像重构(解码),例如,向解码设备发送的码流中包含运动矢量差信息(MVD)、参考图像索引等,图像信息的具体内容可参考现有编码手段实现,本申请不做限定。It is understandable that the code stream also contains the image information obtained after encoding the image, so that the decoding device can reconstruct (decode) the image based on the image information. For example, the code stream sent to the decoding device contains the motion vector difference. Information (MVD), reference image index, etc., the specific content of image information can be implemented by referring to existing encoding methods, which are not limited in this application.
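Purely as an illustration of what such a per-frame payload could carry, the C++ sketch below bundles the signaled encoding parameters with the coded image data. The field layout, names, and byte order are assumptions; this application leaves the actual bitstream syntax to existing codecs.

#include <cstdint>
#include <vector>

// Illustrative per-frame payload: signaled encoding parameters followed by
// the coded image data (assumed layout, not a standardized syntax).
struct FramePayload {
    uint8_t  deblockEnabled;
    uint8_t  refFrames;
    uint16_t meRange;
    uint8_t  meMethod;
    std::vector<uint8_t> codedImage;  // MVDs, reference picture indices, residuals, ...
};

// Flatten the payload into a byte stream in field order
// (no alignment padding; little-endian assumed for the 16-bit field).
std::vector<uint8_t> serialize(const FramePayload& p) {
    std::vector<uint8_t> bs = {
        p.deblockEnabled, p.refFrames,
        static_cast<uint8_t>(p.meRange & 0xFF),
        static_cast<uint8_t>(p.meRange >> 8),
        p.meMethod,
    };
    bs.insert(bs.end(), p.codedImage.begin(), p.codedImage.end());
    return bs;
}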
S106. The decoding device parses the bitstream from the encoding device.
For example, since the encoding parameter information and the image information are written into the bitstream, it is understandable that the decoding device can obtain this information by parsing the bitstream.
S107. The decoding device performs image decoding and display according to the indication information.
It is understandable that the decoding process of the decoding device can be regarded as the inverse of encoding: the decoding device can decode the image information to reconstruct the image and display it on a display device. The decoding process can be implemented with reference to existing decoding techniques.
It should be noted that the embodiments of this application mainly describe the solution from the implementation of encoding and decoding; other image processing procedures (for example, image pre-processing and image post-processing) can be implemented with reference to existing techniques and are not detailed in this application.
It can be seen that, by implementing the embodiments of this application, after the encoding device receives in real time an instruction fed back by the decoding device containing at least one of position/attitude/linear velocity/angular velocity/acceleration, it passes this information on to the encoder. The encoder automatically adjusts its computational complexity (encoding parameters) according to the position/attitude/linear velocity/angular velocity/acceleration of the decoding device, thereby adjusting the encoding delay and in turn reducing the delay of the entire system; the image-related information and the configured encoding parameters are then sent to the VR device so that the decoding device can decode and display normally. Since the black-border phenomenon on the decoding device is closely tied to excessive system delay, and one of the main components of that delay in a CloudVR system is the encoding delay, this application reduces the system delay by reducing the computational complexity of the encoder and thus fundamentally eliminates the possibility of black borders in the picture. In this solution, the VR device can decode and display images in real time and on time, which also guarantees display fluency on the VR device and avoids stuttering; furthermore, the embodiments of this application can keep picture rendering at a favorable resolution, ensuring a good user experience.
To better understand the solutions of the embodiments of this application, a specific CloudVR scenario is used below as an example to illustrate the information transmission method provided by the embodiments of this application. Referring to FIG. 13, the method is described from the perspectives of the encoding device side (or source device side) and the decoding device side (or destination device side), respectively. The encoding device may be a cloud server, and the decoding device may be a VR head-mounted display (headset); the tracking information of the VR headset is information obtained through head tracking (for example, the rotational angular velocity and pose of the head). As shown in FIG. 13, the method includes but is not limited to the following steps:
S201. The VR headset detects its rotational angular velocity V and pose information by means of head tracking. For the specific implementation, refer to the earlier description of head tracking, which is not repeated here.
S202. The VR headset sends the rotational angular velocity V, the pose, and other information to the server.
S203. The server determines the image to be processed according to the pose information of the VR headset and renders the image.
Specifically, the server can predict the image position based on the headset pose to determine the current image to be processed, and pre-process that image; the pre-processing may include, for example, one or more of image rendering, trimming, color format conversion, color correction, or denoising. The specific content of this part can be implemented with reference to existing techniques.
S204. The server passes the rotational angular velocity V of the VR headset to its internal encoder. After obtaining the rotational angular velocity, the encoder compares V against preset thresholds so as to trigger the encoding parameter configuration function, adjust the encoding parameters, and thereby adjust the encoding complexity. Exemplarily, the preset thresholds may include T1 and T2, where T1 < T2. When V > T2, S205-1 is executed next; when T1 ≤ V ≤ T2, S205-2 is executed next; when V < T1, S205-3 is executed next.
S205-1. When V > T2, the encoder configures the first encoding parameter set.
Specifically, when the VR headset rotates very fast, exceeding T2, the encoder selects the first encoding parameter set according to the mapping relationship. The configured first encoding parameter set may include, for example, one or more of the following: turning off the encoder's deblocking (deblock) function, reducing the number of reference frames to 1, setting the motion estimation search range to 4x4, and using diamond search (dia) as the motion estimation method. This greatly reduces the encoding computational complexity and significantly reduces the encoding delay.
S205-2. When T1 ≤ V ≤ T2, the second encoding parameter set is configured.
Specifically, when the VR headset rotates relatively fast, exceeding T1 but not T2, the encoder selects the second encoding parameter set according to the mapping relationship. The configured second encoding parameter set may include, for example, one or more of the following: turning on the deblock function, increasing the number of reference frames to 2, setting the motion estimation search range to 8x8, and using the hexagonal search algorithm (hex) for motion estimation. This reduces the encoding computational complexity and the encoding delay.
S205-3. When V < T1, the third encoding parameter set is configured.
Specifically, when the VR headset rotates slowly, below T1, the encoder selects the third encoding parameter set according to the mapping relationship. The configured third encoding parameter set may include, for example, one or more of the following: turning on the deblock function, increasing the number of reference frames to 4, setting the motion estimation search range to 16x16, and using the asymmetric cross multi-hexagon search algorithm (umh) for motion estimation. In this case the encoder's computational complexity is relatively high and the encoding delay is relatively long.
In a specific implementation, exemplary implementation code is as follows:
[Implementation code listing provided as images (PCTCN2021099866-appb-000001, -000002) in the original publication; not reproduced here.]
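Because the original listing survives only as images, the following C++ sketch reconstructs the S204–S205 threshold logic described above, using the parameter values listed in S205-1 to S205-3; the type and function names are illustrative assumptions, not the application's actual code.

// Minimal sketch of the S204/S205 threshold logic (assumed names).
enum MeMethod { ME_DIA, ME_HEX, ME_UMH };

struct EncoderConfig {
    bool     deblock;    // deblocking filter on/off
    int      refFrames;  // number of reference frames
    int      meRange;    // motion estimation search range (NxN)
    MeMethod meMethod;   // dia / hex / umh
};

// v: rotational angular velocity reported by the headset; thresholds t1 < t2.
EncoderConfig configureByAngularVelocity(double v, double t1, double t2) {
    if (v > t2) {
        return {false, 1, 4, ME_DIA};   // first set: lowest complexity and delay
    } else if (v >= t1) {
        return {true, 2, 8, ME_HEX};    // second set: intermediate complexity
    } else {
        return {true, 4, 16, ME_UMH};   // third set: highest quality, highest delay
    }
}

The three return branches correspond one-to-one to S205-1, S205-2, and S205-3; the configuration returned for slow rotation is the encoder's quality-oriented default, so picture quality is only traded away when the headset moves fast.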
It should be noted that, for details on how the encoder configures encoding parameters, reference may also be made to the description of S103 in the embodiment of FIG. 10 above, which is not repeated here.
S206. The server's encoder loads the configured encoding parameters and starts encoding the image to be processed.
S207. The server's encoder writes the coded image information and the selected encoding parameters into the bitstream and sends it to the VR headset over the network. The image information may include, for example, motion vector difference (MVD) information and reference picture indices; the specific content can be implemented with reference to existing encoding techniques and is not limited in this application.
S208. The VR headset parses the bitstream.
Specifically, since the encoding parameter information and the image information are written into the bitstream, it is understandable that the decoding device can obtain this information by parsing the bitstream.
S209. The VR headset performs image decoding and display. For details, refer to the description of S107 in the embodiment of FIG. 10 above, which is not repeated here.
It should be noted that the two preset thresholds T1 and T2 in the embodiment of FIG. 13, as well as the specific encoding parameters configured in the encoder, are only used to explain the solution of this application and are not limiting. In a specific implementation, different schemes can be designed based on the technical ideas of this application. For example, FIG. 14 shows another implementation in which only a single preset threshold T is used, and the encoding parameters configured in the encoder may include turning the deblock function on or off, changing the number of constructed vertices in display rendering, and so on. Implementations in different forms, as well as variants derived from the solution of this application, all fall within the protection scope of this application.
It can be seen that, by implementing this embodiment of the present invention, the server can adjust the encoder's computational complexity and optimize the encoder parameter configuration by evaluating the rotational angular velocity information fed back by the VR headset, thereby optimizing the encoding delay and, in turn, the head-turning delay of the whole system. This significantly reduces the system delay of CloudVR and helps to mitigate or even eliminate the black-border effect during head turns. When wearing a VR headset, the faster the head turns, the less clearly the human eye perceives picture quality; and the lower the CloudVR system delay, the less likely black borders are perceived when turning the head. In this embodiment, if the headset rotates quickly, the encoder can trade some picture quality for lower encoding complexity, thereby reducing the system delay; if the headset rotates slowly, the encoder can encode with its original configuration without degrading picture quality. This guarantees low delay during fast rotation of the VR headset, fundamentally reducing the possibility of black borders, without affecting picture quality and experience when the headset is stationary or turning slowly; at the same time, this embodiment also preserves the image rendering resolution and the user experience. Existing techniques, by contrast, do not reduce the system delay and therefore introduce adverse effects such as smearing and judder.
In addition, it should be noted that, in optional embodiments of this application, after the decoder of the decoding device finishes decoding the image, the image post-processing stage (for example, the picture post-processor 32 in FIG. 1) can also be improved to further reduce the system delay and thus further avoid black borders and stuttering.
Specifically, in the image post-processing stage, the decoding device can, according to its tracking information, enable or disable one or more processing algorithms of the post-processing stage, so as to adjust the computational complexity of post-processing and reduce the likelihood of black borders or stuttering.
In some possible embodiments, a mapping relationship between the tracking information of the decoding device and one or more processing algorithms used by the picture post-processor can be preset; then, by querying this mapping relationship with the tracking information of the decoding device, the corresponding processing algorithms can be determined and configured for the picture post-processor.
The one or more processing algorithms used by the picture post-processor may include, for example, at least one of the following: a standard dynamic range (SDR) image algorithm, a high dynamic range (HDR) image algorithm, an image enhancement algorithm, an image super-resolution algorithm, and so on. The SDR image algorithm can be used to perform gamma-curve correction on images or video. The HDR image algorithm can provide greater dynamic range and more image detail, so that the image better reflects the visual effect of a real environment. Image enhancement algorithms can adjust the brightness, contrast, saturation, hue, and the like of an image to increase clarity and reduce noise. Image super-resolution algorithms can restore a low-resolution image or image sequence to a high-resolution image. It can be seen that the above one or more processing algorithms can improve the picture quality of the decoding device and thus give the user a better visual experience.
The tracking information of the decoding device may be obtained from a history cache in the decoding device's memory, or may be returned to the decoding device by the encoding device in the bitstream. The tracking information of the decoding device includes at least one of motion information and pose information of the decoding device; the motion information includes the motion speed and/or acceleration of the decoding device, where the motion speed includes angular velocity and/or linear velocity and the acceleration includes angular acceleration and/or linear acceleration; the pose information includes position information and/or attitude information of the decoding device.
For example, taking the decoding device as a VR headset and the tracking information as the rotational angular velocity V, a rotational angular velocity threshold for the VR headset can be preset on the decoding device. The mapping relationship between the tracking information and the one or more processing algorithms used by the picture post-processor can then be, for example, as follows (a code sketch follows this example):
When the rotational angular velocity V exceeds the threshold, one or more processing algorithms are disabled, for example at least one of the SDR image algorithm, HDR image algorithm, image enhancement algorithm, and image super-resolution algorithm. That is, when the VR headset moves quickly, the computational complexity of image post-processing can be reduced, which lowers the system delay and avoids black borders, stuttering, and similar effects.
When the rotational angular velocity V is below the threshold, one or more processing algorithms are enabled, for example at least one of the SDR image algorithm, HDR image algorithm, image enhancement algorithm, and image super-resolution algorithm. That is, the computational complexity of image post-processing can be increased to improve the quality of the images output by the system and give the user a better viewing experience. Since the VR headset is moving relatively slowly at this time, the resulting delay will not cause black borders or stuttering.
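As a C++ sketch of this single-threshold toggle, the code below enables or disables a set of post-processing passes; the pass names, the bit-flag layout, and the all-or-nothing policy are illustrative assumptions (an implementation could just as well disable only the costliest passes).

#include <cmath>

// Post-processing passes as bit flags (assumed names for illustration).
enum PostProcPass : unsigned {
    PP_SDR      = 1u << 0,  // gamma-curve correction
    PP_HDR      = 1u << 1,  // high-dynamic-range mapping
    PP_ENHANCE  = 1u << 2,  // brightness/contrast/denoise adjustments
    PP_SUPERRES = 1u << 3,  // super-resolution upscaling
};

// Select which passes to run given the headset's rotational angular
// velocity v and a preset threshold: fast motion -> skip costly passes.
unsigned selectPostProcPasses(double v, double threshold) {
    if (std::fabs(v) >= threshold)
        return 0u;  // fast motion: disable all passes, minimize delay
    return PP_SDR | PP_HDR | PP_ENHANCE | PP_SUPERRES;  // slow: full quality
}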
It should be noted that the above example is only used to explain the solution of this application and is not limiting. In other implementations, multiple rotational angular velocity thresholds (for example, two or more) can also be designed, providing more possible mappings between tracking information and processing algorithms to suit more diverse application scenarios.
It can be seen that, by implementing this embodiment of the present invention, the decoding device (for example, a VR headset) enables or disables one or more processing algorithms of the image post-processing stage (for example, the picture post-processor) according to tracking information from its history cache or returned in the bitstream, thereby adjusting the computational complexity, further optimizing the head-turning delay of the whole system, and significantly reducing the system delay of CloudVR. This helps to mitigate or even eliminate the black-border effect during head turns, and can also improve picture quality while guaranteeing that no black borders or stuttering occur, giving the user a better viewing experience.
The method of the embodiments of the present invention has been described in detail above; the apparatus of the embodiments of the present invention is provided below.
Referring to FIG. 15, FIG. 15 is a schematic structural diagram of a system provided by an embodiment of the present invention and of the encoding device 60 and the decoding device 70 in the system, where the encoding device 60 and the decoding device 70 can communicate wirelessly. The encoding device 60 may include a parameter adjustment module 601, an encoding module 602, a receiving module 603, and a transmitting module 604. The decoding device 70 may include a tracking module 701, a decoding module 702, a display module 703, a transmitting module 704, and a receiving module 705.
The functions of the modules of the encoding device 60 and the decoding device 70 are described as follows.
For the encoding device 60:
The receiving module 603 is configured to receive tracking information of the decoding device 70; the tracking information of the decoding device 70 includes at least one of motion information and pose information of the decoding device.
The parameter adjustment module 601 is configured to configure, according to the tracking information of the decoding device 70, encoding information of the image to be processed; the tracking information is associated with the encoding information, and the encoding information includes one or more encoding parameters used for encoding the image to be processed.
The encoding module 602 is configured to perform image encoding on the image to be processed.
The transmitting module 604 is configured to encode the encoding parameters and the coded image information into a bitstream and send it to the decoding device 70.
For the decoding device 70:
The tracking module 701 is configured to obtain tracking information by tracking the decoding device 70 in real time, the tracking information being information generated by performing at least one of the following operations on the decoding device 70: head tracking, gesture tracking, eye tracking, or motion tracking; the tracking information of the decoding device 70 includes at least one of motion information and pose information of the decoding device.
The transmitting module 704 is configured to send the tracking information of the decoding device 70 to the encoding device 60.
The receiving module 705 is configured to receive the bitstream from the encoding device 60 to obtain the encoding parameters and the coded image information.
The decoding module 702 is configured to perform image decoding according to the coded image information.
The display module 703 is configured to display the decoded image.
It should be noted that, for the specific functions of the modules of the encoding device 60 and the decoding device 70, reference may be made to the descriptions of the embodiments of FIG. 10 or FIG. 13 above. For example, for the encoding device 60, the receiving module 603 may be configured to perform the tracking information reception of S102, the parameter adjustment module 601 to perform S103, the encoding module 602 to perform S104, and the transmitting module 604 to perform the bitstream transmission of S105; for the decoding device 70, the tracking module 701 is configured to perform S101, the transmitting module 704 to perform the tracking information transmission of S102, the receiving module 705 to perform the bitstream reception of S105 and S106, the decoding module 702 to perform the image decoding of S107, and the display module 703 to perform the image display of S107. For brevity of the specification, details are not repeated here.
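Purely as an illustration of how these modules could be wired together in software, the C++ sketch below uses assumed type and method names, signals only a single deblock flag, and stubs out the actual codec; it is not part of the claimed apparatus.

#include <cstdint>
#include <vector>

using Bitstream = std::vector<uint8_t>;
struct TrackingInfo { double angularVelocity = 0.0; };

// Encoding device 60: receiving module 603 feeds parameter adjustment 601;
// encoding module 602 produces the stream sent by transmitting module 604.
class EncodingDevice {
public:
    void receiveTracking(const TrackingInfo& t) {     // modules 603 + 601
        deblock_ = (t.angularVelocity < kThreshold);  // illustrative rule
    }
    Bitstream encode(const std::vector<uint8_t>& frame) const {  // modules 602 + 604
        Bitstream bs;
        bs.push_back(deblock_ ? 1 : 0);                  // signal the parameter
        bs.insert(bs.end(), frame.begin(), frame.end()); // stand-in for coded data
        return bs;
    }
private:
    static constexpr double kThreshold = 1.0;  // assumed units: rad/s
    bool deblock_ = true;
};

// Decoding device 70: receiving module 705 recovers the signaled parameter;
// decoding module 702 and display module 703 are elided here.
class DecodingDevice {
public:
    bool parseDeblockFlag(const Bitstream& bs) const {  // module 705
        return !bs.empty() && bs[0] == 1;
    }
};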
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions; when the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one network site, computer, server, or data center to another by wired (for example, coaxial cable, optical fiber, or digital subscriber line) or wireless (for example, infrared or microwave) means. The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid-state drive), or the like.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, refer to the related descriptions of other embodiments.

Claims (17)

  1. An information transmission method, characterized in that the method is applied to an encoding device and comprises:
    receiving tracking information of a decoding device, wherein the tracking information comprises motion information or pose information of the decoding device;
    configuring encoding information of an image to be processed according to the tracking information of the decoding device, wherein the tracking information is associated with the encoding information, and the encoding information comprises one or more encoding parameters; and
    encoding the image to be processed according to the encoding information, and sending a bitstream to the decoding device, wherein the bitstream comprises the one or more encoding parameters.
  2. The method according to claim 1, characterized in that the configuring encoding information of the image to be processed according to the tracking information of the decoding device specifically comprises:
    querying a preset mapping relationship according to the tracking information to obtain the encoding information of the image to be processed, wherein the preset mapping relationship comprises a mapping from the tracking information to the encoding information; and
    configuring the encoding information.
  3. The method according to claim 1 or 2, characterized in that the one or more encoding parameters comprise one or more of a deblocking filter parameter, a number of reference frames, a motion estimation search range, a motion estimation method, a sub-pixel refinement strength, and a lookahead optimizer parameter.
  4. The method according to claim 3, characterized in that, when the tracking information is greater than or equal to a preset threshold, the tracking information maps to first encoding information, and when the tracking information is less than the preset threshold, the tracking information maps to second encoding information, wherein the first encoding information and the second encoding information satisfy at least one of the following relationships:
    the deblocking filter parameter in the first encoding information indicates that the deblocking filter is turned off, and the deblocking filter parameter in the second encoding information indicates that the deblocking filter is turned on;
    the number of reference frames in the first encoding information is smaller than the number of reference frames in the second encoding information; the motion estimation search range in the first encoding information is smaller than the motion estimation search range in the second encoding information;
    the computational load of the motion estimation method in the first encoding information is smaller than the computational load of the motion estimation method in the second encoding information;
    the sub-pixel refinement strength in the first encoding information is smaller than the sub-pixel refinement strength in the second encoding information;
    the lookahead optimizer parameter in the first encoding information is smaller than the lookahead optimizer parameter in the second encoding information.
  5. The method according to any one of claims 1-4, characterized in that the tracking information is information generated by performing at least one of the following operations on the decoding device: head tracking, gesture tracking, eye tracking, or motion tracking.
  6. The method according to any one of claims 1-5, characterized in that the motion information of the decoding device comprises a motion speed and/or an acceleration of the decoding device, the motion speed comprises an angular velocity and/or a linear velocity, and the acceleration comprises an angular acceleration and/or a linear acceleration; and the pose information comprises position information and/or attitude information.
  7. The method according to any one of claims 1-6, characterized in that the decoding device comprises one of a virtual reality (VR) device, an augmented reality (AR) device, a mixed reality (MR) device, or drone flight goggles.
  8. An apparatus for encoding an image, characterized in that the apparatus is applied to an encoding device and comprises:
    a receiving module, configured to receive tracking information of a decoding device, wherein the tracking information comprises motion information or pose information of the decoding device;
    a parameter adjustment module, configured to configure encoding information of an image to be processed according to the tracking information of the decoding device, wherein the tracking information is associated with the encoding information, and the encoding information comprises one or more encoding parameters;
    an encoding module, configured to encode the image to be processed according to the encoding information; and
    a transmitting module, configured to send a bitstream to the decoding device, wherein the bitstream comprises the one or more encoding parameters.
  9. The apparatus according to claim 8, characterized in that the parameter adjustment module is specifically configured to:
    query a preset mapping relationship according to the tracking information to obtain the encoding information of the image to be processed, wherein the preset mapping relationship comprises a mapping from the tracking information to the encoding information; and
    configure the encoding information.
  10. The apparatus according to claim 8 or 9, characterized in that the one or more encoding parameters comprise one or more of a deblocking filter parameter, a number of reference frames, a motion estimation search range, a motion estimation method, a sub-pixel refinement strength, and a lookahead optimizer parameter.
  11. The apparatus according to claim 10, characterized in that, when the tracking information is greater than or equal to a preset threshold, the tracking information maps to first encoding information, and when the tracking information is less than the preset threshold, the tracking information maps to second encoding information, wherein the first encoding information and the second encoding information satisfy at least one of the following relationships:
    the deblocking filter parameter in the first encoding information indicates that the deblocking filter is turned off, and the deblocking filter parameter in the second encoding information indicates that the deblocking filter is turned on;
    the number of reference frames in the first encoding information is smaller than the number of reference frames in the second encoding information; the motion estimation search range in the first encoding information is smaller than the motion estimation search range in the second encoding information;
    the computational load of the motion estimation method in the first encoding information is smaller than the computational load of the motion estimation method in the second encoding information;
    the sub-pixel refinement strength in the first encoding information is smaller than the sub-pixel refinement strength in the second encoding information;
    the lookahead optimizer parameter in the first encoding information is smaller than the lookahead optimizer parameter in the second encoding information.
  12. The apparatus according to any one of claims 8-11, characterized in that the tracking information is information generated by performing at least one of the following operations on the decoding device: head tracking, gesture tracking, eye tracking, or motion tracking.
  13. The apparatus according to any one of claims 8-12, characterized in that the motion information of the decoding device comprises a motion speed and/or an acceleration of the decoding device, the motion speed comprises an angular velocity and/or a linear velocity, and the acceleration comprises an angular acceleration and/or a linear acceleration.
  14. The apparatus according to any one of claims 8-13, characterized in that the decoding device comprises one of a virtual reality (VR) device, an augmented reality (AR) device, a mixed reality (MR) device, or drone flight goggles.
  15. A device for encoding an image, characterized in that the device comprises a memory, a processor, and a transceiver, wherein:
    the transceiver is configured to receive data from and send data to the outside;
    the memory is configured to store program instructions and data; and
    the processor is configured to execute the program instructions in the memory to implement the method according to any one of claims 1-7.
  16. A system, characterized in that the system comprises an encoding device and a decoding device, wherein:
    the decoding device is configured to send tracking information of the decoding device to the encoding device, wherein the tracking information comprises motion information or pose information of the decoding device;
    the encoding device is configured to: configure encoding information of an image to be processed according to the tracking information of the decoding device, wherein the tracking information is associated with the encoding information and the encoding information comprises one or more encoding parameters; encode the image to be processed according to the encoding information; and send a bitstream to the decoding device, wherein the bitstream comprises the one or more encoding parameters; and
    the decoding device is configured to perform image decoding and display according to the bitstream.
  17. A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store program code, and when the program code is executed by a computer, the computer is configured to perform the method according to any one of claims 1-7.
PCT/CN2021/099866 2020-06-12 2021-06-11 Information transmission method, related device, and system WO2021249562A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010535609.4 2020-06-12
CN202010535609.4A CN113810696A (en) 2020-06-12 2020-06-12 Information transmission method, related equipment and system

Publications (1)

Publication Number Publication Date
WO2021249562A1 true WO2021249562A1 (en) 2021-12-16

Family

ID=78846907

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/099866 WO2021249562A1 (en) 2020-06-12 2021-06-11 Information transmission method, related device, and system

Country Status (2)

Country Link
CN (1) CN113810696A (en)
WO (1) WO2021249562A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9313493B1 (en) * 2013-06-27 2016-04-12 Google Inc. Advanced motion estimation
US10291932B2 (en) * 2015-03-06 2019-05-14 Qualcomm Incorporated Method and apparatus for low complexity quarter pel generation in motion search
CN109905702B (en) * 2017-12-11 2021-12-21 腾讯科技(深圳)有限公司 Method, device and storage medium for determining reference information in video coding
CN111801944B (en) * 2018-03-26 2021-10-22 华为技术有限公司 Video image encoder, decoder and corresponding motion information encoding method
CN110944171B (en) * 2018-09-25 2023-05-09 华为技术有限公司 Image prediction method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109791605A (en) * 2016-08-01 2019-05-21 脸谱科技有限责任公司 Auto-adaptive parameter in image-region based on eyctracker information
US20180349705A1 (en) * 2017-06-02 2018-12-06 Apple Inc. Object Tracking in Multi-View Video
CN107770561A (en) * 2017-10-30 2018-03-06 河海大学 A kind of multiresolution virtual reality device screen content encryption algorithm using eye-tracking data
CN110798497A (en) * 2018-08-03 2020-02-14 中国移动通信集团有限公司 Mixed reality interaction system and method
CN111179437A (en) * 2019-12-30 2020-05-19 上海曼恒数字技术股份有限公司 Cloud VR connectionless streaming system and connection method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115103175A (en) * 2022-07-11 2022-09-23 北京字跳网络技术有限公司 Image transmission method, device, equipment and medium
CN115103175B (en) * 2022-07-11 2024-03-01 北京字跳网络技术有限公司 Image transmission method, device, equipment and medium

Also Published As

Publication number Publication date
CN113810696A (en) 2021-12-17


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21822629

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21822629

Country of ref document: EP

Kind code of ref document: A1