WO2021249562A1 - Information transmission method, related device, and system - Google Patents

Information transmission method, related device, and system

Info

Publication number
WO2021249562A1
WO2021249562A1 (PCT/CN2021/099866)
Authority
WO
WIPO (PCT)
Prior art keywords
information
encoding
tracking
decoding device
image
Prior art date
Application number
PCT/CN2021/099866
Other languages
French (fr)
Chinese (zh)
Inventor
李龙龙
邸佩云
方华猛
宋翼
邹奕成
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2021249562A1 publication Critical patent/WO2021249562A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102 Methods or arrangements using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/127 Prioritisation of hardware or computational resources
    • H04N 19/134 Methods or arrangements using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/136 Incoming video signal characteristics or properties
    • H04N 19/14 Coding unit complexity, e.g. amount of activity or edge presence estimation
    • H04N 19/154 Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H04N 19/156 Availability of hardware or computational resources, e.g. encoding based on power-saving criteria
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F 3/013 Eye tracking input arrangements
    • G06F 3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures

Definitions

  • the present invention relates to the technical field of video coding and decoding, in particular to an information transmission method, related equipment and system.
  • in CloudVR (cloud virtual reality), high-load tasks such as the rendering required by VR games place a very large demand on computing resources.
  • traditionally, a high-performance game console is required, which leads to the high cost of VR games and cannot meet people's need to play VR games anytime, anywhere.
  • CloudVR technology uses the idea of terminal-cloud collaboration to separate VR game rendering from VR game interaction.
  • VR game interaction is completed in the terminal, and the interaction instructions (pose and position information of the head-mounted display and handles, user operation instructions, etc.) are transmitted to the cloud server through the wireless network.
  • the cloud server completes the rendering of the game according to the received interaction instructions, and transmits the game picture to the terminal for display through the wireless network.
  • CloudVR technology can significantly reduce the cost of VR game terminals and enable users to access the Internet to play VR games anytime, anywhere.
  • the cloud server and the terminal are connected through a wireless network.
  • when the delay is too large, obvious black borders appear in the VR game picture, and unsmooth phenomena such as screen freezes significantly degrade the VR gaming experience.
  • the embodiments of the present application provide an information transmission method, related equipment, and system, which can reduce the delay of the CloudVR system, reduce or even eliminate display artifacts such as black borders and screen freezes, and improve the user experience.
  • the embodiments of the present application provide an information transmission method.
  • the method is described from the perspective of an encoding device and includes: receiving tracking information of a decoding device, where the tracking information includes motion information or pose information of the decoding device; configuring encoding information of an image to be processed according to the tracking information of the decoding device, where the tracking information is associated with the encoding information and the encoding information includes one or more encoding parameters; encoding the image to be processed according to the encoding information; and sending a code stream to the decoding device, where the code stream includes the one or more encoding parameters.
  • the tracking information is obtained by the decoding device by tracking and detecting the motion state of itself or the user; the tracking information includes at least one of motion information and pose information, where the motion information is used to indicate the motion state of the decoding device. In a specific embodiment, the motion information includes the motion speed and/or acceleration of the decoding device, the motion speed includes an angular velocity and/or a linear velocity, and the acceleration includes an angular acceleration and/or a linear acceleration.
  • the pose information is used to indicate the position and/or posture of the decoding device or the user; that is, the pose information may indicate the position and posture (or orientation) of the decoding device in three-dimensional space, and the position may be expressed as coordinates in a three-dimensional coordinate system.
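  • To make this structure concrete, the following minimal Python sketch models the tracking information; the class and field names are illustrative assumptions, not terms from this application.

        from dataclasses import dataclass
        from typing import Optional, Tuple

        @dataclass
        class TrackingInfo:
            """Tracking information reported by the decoding device (illustrative)."""
            # Motion information: how the decoding device (or user) is moving.
            angular_velocity: Optional[float] = None      # rad/s
            linear_velocity: Optional[float] = None       # m/s
            angular_acceleration: Optional[float] = None  # rad/s^2
            linear_acceleration: Optional[float] = None   # m/s^2
            # Pose information: position and posture in three-dimensional space.
            position: Optional[Tuple[float, float, float]] = None     # (x, y, z)
            orientation: Optional[Tuple[float, float, float]] = None  # (yaw, pitch, roll)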
  • the correlation between the tracking information and the encoding information means that there is a corresponding relationship between the tracking information and the encoding information, and the encoding device stores the relationship between the two.
  • there may be a direct mapping relationship between the tracking information and the encoding information; that is, the tracking information is bound to the encoding information, and the encoding information can be determined directly from the tracking information.
  • alternatively, there may be an indirect correlation between the tracking information and the encoding information; for example, certain algorithmic processing or condition judgment must be performed on the tracking information to determine the corresponding encoding information. After the encoding device receives the specific tracking information uploaded by the decoding device, the corresponding encoding information can be determined according to that tracking information.
  • the encoding information includes one or more encoding parameters used by the encoder of the encoding device to encode the image to be processed (also called the image to be encoded). Since the encoder performs the encoding process with these encoding parameters, different encoding parameters entail different amounts of calculation, i.e., different computational complexity, during encoding. That is to say, in this application, the encoding device can adjust its configured encoding parameters based on the tracking information uploaded by the decoding device in real time, thereby adjusting the encoding computational complexity.
  • the encoding device can receive, in real time, instructions from the decoding device that include at least one item of tracking information such as position/posture/linear velocity/angular velocity/acceleration, and the encoding information of the encoder can then be adjusted according to the received tracking information; the adjustment strategy may be to adjust the computational complexity (i.e., the encoding parameters) of the encoder.
  • one of the main components of the delay is the encoding delay in the system.
  • the embodiment of the application adjusts the encoding delay of the encoder by adjusting the encoder's computational complexity, thereby reducing the overall system delay; the image-related information and encoding parameters can subsequently be sent to the decoding device, so that the decoding device can decode and display normally.
  • in this way, this application reduces the computational complexity of the encoder to reduce the system delay, which can greatly reduce or even eliminate the possibility of black borders in the picture, so that after the decoding device receives the code stream, the image can be decoded and displayed in time; this also ensures the smoothness of display on the decoding device and avoids stutters.
  • configuring the encoding information of the image to be processed according to the tracking information of the decoding device specifically includes: querying a preset mapping relationship according to the tracking information to obtain the encoding information of the image to be processed, where the preset mapping relationship includes the mapping relationship between the tracking information and the encoding information; and configuring the encoding information.
  • the preset mapping relationship may be pre-stored in the storage unit of the encoding device, and is used to characterize the mapping relationship between the tracking information and the encoding information.
  • the preset mapping relationship may be a mapping table, which may directly record the mapping relationship between various tracking information and encoding parameters; alternatively, the mapping table may record the mapping relationship between various value ranges of the motion information or pose information and the encoding parameters, so that by determining which value range the specific value in the tracking information falls into, the corresponding encoding parameter can be determined.
  • the encoding device can configure the encoding information (one or more encoding parameters) into the encoder (that is, replace the previously configured encoding parameters), thereby adjusting the encoding parameters of the encoder, i.e., the computational complexity of the encoding process.
  • the embodiment of the present application can realize rapid adjustment of encoding computational complexity by setting the preset mapping relationship, thereby realizing rapid adjustment of encoding delay, which helps eliminate black borders and stutters on the decoding end.
  • technicians can define the specific content of the preset mapping relationship according to actual needs and set it in the encoding device; the embodiments of the present application therefore also offer a variety of preset mapping relationships to suit different application scenarios and meet actual coding needs, as the lookup sketch below illustrates.
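  • As a hedged illustration only, the sketch below implements such a preset mapping as a table keyed on value ranges of one tracking quantity (here an assumed angular velocity); the ranges, threshold, and parameter values are invented for illustration and are not prescribed by this application.

        # Preset mapping: angular-velocity range (rad/s) -> encoding parameters.
        # The ranges and parameter values here are illustrative assumptions.
        PRESET_MAPPING = [
            # (lower bound, upper bound, encoding parameters)
            (1.0, float("inf"), {"deblock_filter": False, "ref": 1, "me_range": 4}),
            (0.0, 1.0,          {"deblock_filter": True,  "ref": 8, "me_range": 32}),
        ]

        def lookup_encoding_info(angular_velocity: float) -> dict:
            """Return the encoding parameters whose value range contains the input."""
            for low, high, params in PRESET_MAPPING:
                if low <= angular_velocity < high:
                    return params
            raise ValueError("no mapping entry for this tracking value")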
  • the one or more encoding parameters include one or more of: deblocking filter (deblock_filter) parameters, number of reference frames (Ref), motion estimation search range (me_range), motion estimation method (me_method), sub-pixel refinement strength (subme), and lookahead optimizer (lookahead) parameters; each is described below, and an example configuration follows the list.
  • the deblocking filter parameter is used to indicate whether to activate the deblock_filter function to perform deblocking filtering on the reconstructed image.
  • the number of reference frames parameter is used to indicate the maximum number of reference frames, that is, the number of reference frames used in image prediction.
  • the motion estimation search range parameter is used to indicate the motion estimation radius in the image prediction, that is, the radius of the pixel block prediction search performed by the encoder.
  • the motion estimation method parameter is used to select the full-pixel motion estimation method, i.e., the motion search algorithm (such as the diamond search algorithm, the hexagon search algorithm, or the asymmetric cross multi-level hexagon grid search algorithm).
  • the sub-pixel refinement strength (subme) parameter is used to control sub-pixel motion estimation and partition mode decision.
  • the lookahead optimizer parameter is used to set the size of the frame buffer used for lookahead (e.g., thread prediction).
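  • Pulling the six parameters together, the following two illustrative Python parameter sets (using this application's parameter names) contrast a low-complexity configuration with a high-complexity one; the concrete values are assumptions chosen to be consistent with the value ranges given in the embodiments below.

        LOW_COMPLEXITY = {            # for fast motion at the decoding end
            "deblock_filter": False,  # deblocking filter off
            "ref": 1,                 # few reference frames (0 < Ref <= 2)
            "me_method": "dia",       # diamond search: cheapest full-pixel search
            "me_range": 4,            # small motion-estimation radius
            "subme": 0,               # minimal sub-pixel refinement
            "lookahead": 0,           # tiny lookahead frame buffer
        }
        HIGH_COMPLEXITY = {           # for slow motion at the decoding end
            "deblock_filter": True,   # deblocking filter on
            "ref": 8,                 # more reference frames (2 < Ref <= 16)
            "me_method": "umh",       # multi-level hexagon grid search: more thorough
            "me_range": 32,           # larger motion-estimation radius
            "subme": 7,               # stronger sub-pixel refinement
            "lookahead": 40,          # larger lookahead frame buffer
        }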
  • in a possible embodiment, when the tracking information is greater than or equal to a preset threshold, the tracking information maps to first encoding information; when the tracking information is less than the preset threshold, the tracking information maps to second encoding information, and the first encoding information and the second encoding information satisfy at least one of the following relationships (a selection sketch follows this list):
  • the deblocking filter parameter in the first encoding information is used to indicate that the deblocking filter is turned off, and the deblocking filter parameter in the second encoding information is used to indicate that the deblocking filter is turned on.
  • for example, when the specific value in the tracking information (such as the value of velocity, acceleration, position, or attitude) is greater than or equal to the preset threshold, the deblocking filter is turned off to reduce coding complexity; when the specific value is less than the preset threshold, the deblocking filter is turned on.
  • the number of reference frames in the first encoded information is smaller than the number of reference frames in the second encoded information.
  • when the specific value in the tracking information (such as the value of velocity, acceleration, position, or posture) is greater than or equal to the preset threshold, 0 < Ref ≤ 2 is configured, thereby reducing the number of reference frames used in encoding prediction.
  • the coding complexity is reduced, thereby reducing the coding delay and avoiding black borders and stutters on the decoding end.
  • when the specific value is less than the preset threshold, 2 < Ref ≤ 16 is configured; the number of reference frames in coding prediction increases, and the coding complexity increases. Because the decoding end is moving slowly, the delay caused by increasing the number of reference frames will not cause black borders, stutters, or similar phenomena.
  • the motion estimation search range in the first coded information is smaller than the motion estimation search range in the second coded information.
  • when the specific value in the tracking information (such as the value of velocity, acceleration, position, or posture) is greater than or equal to the preset threshold, 4 ≤ me_range ≤ 8 is configured, thereby reducing the motion estimation radius in coding prediction.
  • the coding complexity is reduced, and the coding delay is reduced, avoiding black borders and stutters on the decoding end.
  • when the specific value is less than the preset threshold, 8 < me_range ≤ 64 is configured, and the motion estimation radius in coding prediction increases, so the coding complexity increases. Since the motion of the decoding end is slower, the delay caused by increasing the motion estimation radius will not cause black borders, stutters, or similar phenomena.
  • the calculation amount of the motion estimation mode in the first coded information is smaller than the calculation amount of the motion estimation mode in the second coded information.
  • when the specific value is greater than or equal to the preset threshold, a relatively simple motion estimation method is configured, such as the diamond search algorithm (dia); the search algorithm is simple and the amount of calculation is small, so the coding complexity is reduced and the coding delay is reduced, avoiding black borders and stutters on the decoding end.
  • when the specific value is less than the preset threshold, relatively complex motion estimation methods are configured, such as the hexagon search algorithm (hex) or the asymmetric cross multi-level hexagon grid search algorithm (umh), and the amount of calculation increases, that is, the coding complexity increases. Due to the slow motion of the decoding end, the delay caused by the more complex search algorithm will not cause black borders, stutters, or similar phenomena.
  • the sub-pixel refinement strength in the first encoding information is less than the sub-pixel refinement strength in the second encoding information.
  • when the specific values in the tracking information (such as velocity, acceleration, position, or posture) are greater than or equal to the preset threshold, subme is configured to be 0 or 1, thereby reducing coding complexity and coding delay to avoid black borders and stutters at the decoding end.
  • when the specific value is less than the preset threshold, 1 < subme ≤ 11 is configured, which increases coding complexity; since the decoding end moves slowly, the resulting delay will not cause black borders, stutters, or similar phenomena.
  • the lookahead optimizer parameter in the first encoding information is smaller than the lookahead optimizer parameter in the second encoding information.
  • when the specific values in the tracking information (such as velocity, acceleration, position, or posture) are greater than or equal to the preset threshold, 0 ≤ lookahead ≤ 2 is configured to reduce the size of the frame buffer, thereby reducing coding complexity and hence encoding delay, avoiding black borders and stutters at the decoding end.
  • when the specific value is less than the preset threshold, 2 < lookahead ≤ 250 is configured to increase the size of the frame buffer, thereby increasing coding complexity; due to the slower motion of the decoding end, the resulting delay will not cause black borders, stutters, or similar phenomena.
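  • Continuing the earlier sketch, the threshold rule of this embodiment then reduces to a single comparison that selects between the two illustrative parameter sets (the threshold value is again an assumption):

        PRESET_THRESHOLD = 1.0  # assumed angular-velocity threshold, rad/s

        def configure_encoding_info(tracking_value: float) -> dict:
            """Map a tracking value to the first or second encoding information."""
            if tracking_value >= PRESET_THRESHOLD:
                return LOW_COMPLEXITY   # first encoding information: cut encoding delay
            return HIGH_COMPLEXITY      # second encoding information: spend more compute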
  • the tracking information is information generated by the decoding device performing at least one of the following operations: head tracking, gesture tracking, eye tracking, or motion tracking.
  • head tracking tracks head movement by measuring the angle, angular velocity, or angular acceleration of the user's head rotation, thereby triggering a response in the visual picture.
  • gesture tracking tracks hand movement by detecting the posture, shape, movement speed, and direction of the user's hand in the real environment, thereby triggering a response in the visual picture or interaction with picture elements.
  • eye tracking tracks eye movement by measuring the position of the gaze point of the user's eyes or the movement of the eyeball relative to the head.
  • motion tracking tracks the user's motion by measuring the user's position and posture (i.e., pose) and the speed, acceleration, and direction of movement in the real environment. The embodiments of the present application can thus be applied to a variety of tracking scenarios, meeting the needs of users in different scenarios and improving the applicability and commercial value of the present application.
  • the decoding device includes one of a virtual reality (VR) device, an augmented reality (AR) device, a mixed reality (MR) device, or drone flight glasses.
  • the VR device can be VR glasses, a VR headset, a VR box, or another device that applies VR technology; an AR device can be AR glasses, an AR TV, an AR headset, or another device that applies AR technology; an MR device can be MR glasses, an MR terminal, an MR head-mounted display, an MR wearable device, or another device that applies MR technology.
  • the decoding device can be a head-mounted display device (Head Mount Display, HMD); the head-mounted display device and the host (i.e., the encoding device) can communicate and interact wirelessly or by wire: the host encodes the image and transmits it to the head-mounted display device, and the head-mounted display device decodes and displays the image, thereby bringing the user the visual and interactive experience of VR/AR/MR.
  • UAV flight glasses are devices used to interact with the drone's camera; the flight glasses and the drone can communicate and interact wirelessly.
  • the drone encodes the captured image/video and transmits it to the flight glasses; the flight glasses decode and display the image, thereby bringing the user the drone's visual experience, and can even realize control of the drone's flight attitude/shooting direction.
  • an embodiment of the present application provides a device for encoding an image.
  • the device is applied to an encoding device and includes: a receiving module, a parameter adjustment module, an encoding module, and a transmitting module.
  • the receiving module is used to receive the tracking information of the decoding device, where the tracking information includes the motion information or pose information of the decoding device; the parameter adjustment module is used to configure the encoding information of the image to be processed according to the tracking information of the decoding device, where the tracking information is associated with the encoding information and the encoding information includes one or more encoding parameters; the encoding module is used to encode the image to be processed according to the encoding information; and the transmission module is used to send a code stream to the decoding device, where the code stream includes the one or more encoding parameters. A skeleton of these modules is sketched below.
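  • A possible skeleton of the four cooperating modules, in Python; all method and collaborator names (encoder, preset_mapping, link) are placeholders assumed for illustration, not part of this application.

        class ImageEncodingDevice:
            """Illustrative structure of the device for encoding an image."""

            def __init__(self, encoder, preset_mapping):
                self.encoder = encoder
                self.preset_mapping = preset_mapping

            def receive(self, tracking_info):                 # receiving module
                return self.adjust_parameters(tracking_info)

            def adjust_parameters(self, tracking_info):       # parameter adjustment module
                encoding_info = self.preset_mapping.lookup(tracking_info)
                self.encoder.configure(encoding_info)         # replace previous parameters
                return encoding_info

            def encode(self, image):                          # encoding module
                return self.encoder.encode(image)

            def transmit(self, bitstream, encoding_info, link):  # transmission module
                link.send({"bitstream": bitstream, "params": encoding_info})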
  • the tracking information is obtained by the decoding device by tracking and detecting the motion state of itself or the user; the tracking information includes at least one of motion information and pose information, and the motion information is used to indicate the motion state of the decoding device.
  • the motion information includes the motion speed and/or acceleration of the decoding device
  • the motion speed includes angular velocity and/or linear velocity
  • the acceleration includes angular acceleration and/or linear acceleration.
  • the pose information is used to indicate the position and/or posture of the decoding device or the user; that is, the pose information may indicate the position and posture (or orientation) of the decoding device in three-dimensional space.
  • the correlation between the tracking information and the encoding information means that there is a corresponding relationship between the tracking information and the encoding information, and the relationship between the two is stored in the encoding device.
  • there may be a direct mapping relationship between the tracking information and the encoding information; that is, the tracking information and the encoding information are bound, and the parameter adjustment module can determine the encoding information directly from the tracking information.
  • the tracking information and the coded information may be indirectly related.
  • for example, the parameter adjustment module needs to perform certain algorithmic processing or condition judgment on the tracking information to determine the corresponding encoding information; after the receiving module receives the specific tracking information uploaded by the decoding device, the corresponding encoding information can be determined according to that tracking information.
  • the encoding information includes one or more encoding parameters used by the encoder of the encoding device to encode the image to be processed (also called the image to be encoded). Since the encoder performs the encoding process with these encoding parameters, different encoding parameters entail different amounts of calculation, i.e., different computational complexity, during encoding. That is to say, in this application, the encoding device can adjust its configured encoding parameters based on the tracking information uploaded by the decoding device in real time, thereby adjusting the encoding computational complexity.
  • the device of the embodiment of the present application can receive, in real time, instructions fed back by the decoding device that include at least one item of tracking information such as position/posture/linear velocity/angular velocity/acceleration, and adjust the encoding information of the encoder in the encoding device according to the received tracking information.
  • the adjustment strategy can be to adjust the computational complexity (i.e., the encoding parameters) of the encoder, thereby reducing the overall system delay; the image-related information and encoding parameters are subsequently sent to the decoding device so that the decoding device can decode and display normally, which can greatly reduce or even eliminate the possibility of black borders on the screen, ensure the smoothness of display on the decoding device, and avoid stutters.
  • the parameter adjustment module is specifically configured to: query a preset mapping relationship according to the tracking information to obtain the encoding information of the image to be processed, where the preset mapping relationship includes the mapping relationship between the tracking information and the encoding information; and configure the encoding information.
  • the one or more encoding parameters include one or more of: deblocking filter parameters, number of reference frames, motion estimation search range, motion estimation method, sub-pixel refinement strength, and lookahead optimizer parameters.
  • when the tracking information is greater than or equal to a preset threshold, the tracking information is mapped to first encoding information; when the tracking information is less than the preset threshold, the tracking information is mapped to second encoding information, and the first encoding information and the second encoding information satisfy at least one of the following relationships:
  • the deblocking filter parameter in the first encoding information is used to indicate that the deblocking filter is turned off, and the deblocking filter parameter in the second encoding information is used to indicate that the deblocking filter is turned on;
  • the number of reference frames in the first coded information is smaller than the number of reference frames in the second coded information; the motion estimation search range in the first coded information is smaller than the motion estimation search range in the second coded information;
  • the calculation amount of the motion estimation method in the first encoding information is less than the calculation amount of the motion estimation method in the second encoding information;
  • the sub-pixel refinement strength in the first encoding information is smaller than the sub-pixel refinement strength in the second encoding information;
  • the lookahead optimizer parameter in the first encoding information is smaller than the lookahead optimizer parameter in the second encoding information.
  • the tracking information is information generated by the decoding device performing at least one of the following operations: head tracking, gesture tracking, eye tracking, or motion tracking.
  • the motion information of the decoding device includes the motion speed and/or acceleration of the decoding device, the motion speed includes angular velocity and/or linear velocity, and the acceleration includes angular acceleration And/or linear acceleration.
  • the decoding device includes one of a virtual reality VR device, an augmented reality AR device, a mixed reality MR device, or drone flight glasses.
  • the functional modules of the device can cooperate with each other to implement the methods described in the related embodiments of the first aspect.
  • an embodiment of the present application provides a device for encoding an image.
  • the device may be an encoding device.
  • the encoding device includes a memory, a processor, and a transceiver; each of the memory, the processor, and the transceiver may be connected by a bus, or at least two of these components may be coupled together, where:
  • the transceiver is used to receive data from the outside world and send data to the outside world;
  • the memory is used to store program instructions and data
  • the processor is configured to execute program instructions in the memory to implement the method described in the first aspect or any possible embodiment of the first aspect.
  • an embodiment of the present application provides a system that includes an encoding device and a decoding device, wherein: the decoding device is configured to send tracking information of the decoding device to the encoding device, where the tracking information includes the motion information or pose information of the decoding device; and the encoding device is configured to configure the encoding information of the image to be processed according to the tracking information of the decoding device, where the tracking information is associated with the encoding information and the encoding information includes one or more encoding parameters, to encode the image to be processed according to the encoding information, and to send a code stream to the decoding device, where the code stream includes the one or more encoding parameters.
  • the decoding device is used to decode and display the image according to the code stream.
  • the encoding device may be the encoding device described in any embodiment of the second aspect or the third aspect.
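  • The per-frame interaction of such a system can be summarized as one round trip: tracking information up, a code stream configured from it down. A minimal Python sketch, assuming the device interfaces named here exist:

        def cloud_vr_session(decoding_device, encoding_device, frames):
            """One uplink/downlink round trip per frame (all interfaces assumed)."""
            for image in frames:
                tracking_info = decoding_device.sample_tracking()           # uplink
                encoding_info = encoding_device.adjust_parameters(tracking_info)
                bitstream = encoding_device.encode(image)    # complexity matched to motion
                decoded = decoding_device.decode(bitstream, encoding_info)  # downlink
                decoding_device.display(decoded)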
  • an embodiment of the present application provides a computing node cluster (or cloud cluster), including at least one computing node, where each computing node includes a processor and a memory, and the processor executes code in the memory to perform the method according to any one of the embodiments of the first aspect.
  • an embodiment of the present invention provides a non-volatile computer-readable storage medium; the computer-readable storage medium is used to store implementation code of the method described in the first aspect.
  • when the program code is executed by a computer, the computer implements the method described in any one of the embodiments of the first aspect.
  • an embodiment of the present invention provides a computer program product; the computer program product includes program instructions, and when the computer program product is executed by a computer, the computer executes the method described in any one of the embodiments of the first aspect.
  • the computer program product may be a software installation package.
  • the computer program product may be downloaded and executed on the computer to implement the method described in any embodiment of the first aspect.
  • in summary, the encoding device can receive, in real time, instructions fed back by the decoding device that contain at least one type of information such as position/posture/linear velocity/angular velocity/acceleration, and automatically adjust the computational complexity (encoding parameters) of the encoder according to the position/posture/linear velocity/angular velocity/acceleration of the decoding device, thereby adjusting the encoding delay and reducing the overall system delay; the image-related information and the configured encoding parameters can then be sent to the decoding device so that the decoding device can decode and display normally.
  • the present application reduces the system delay by reducing the computational complexity of the encoding device in the encoding process, which can fundamentally eliminate the possibility of black borders on the screen, ensure the smoothness of display on the decoding device, and avoid stutters.
  • FIG. 1 is a block diagram of an example video decoding system 10 provided by an embodiment of the present application.
  • FIG. 2 is an example diagram of a device experience scenario applied in an embodiment of the present application.
  • FIG. 3 is an example diagram of yet another device experience scenario applied in an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a video decoding device provided by an embodiment of the present application.
  • FIG. 5 is a simplified block diagram of a device that can be used as either or both of a source device and a destination device according to an embodiment of the present application.
  • FIG. 6 is an example diagram of a head-turning scene of a user wearing a device provided by an embodiment of the present application.
  • FIG. 7 is an example diagram of a black border phenomenon provided by an embodiment of the present application.
  • FIG. 8 is an example flow chart of an information transmission solution provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of four tracking modes for realizing interaction between users and screens involved in an embodiment of the present application.
  • FIG. 10 is a schematic flowchart of an information transmission method provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of some search templates provided by embodiments of the present application.
  • FIG. 12 is a logical schematic diagram of determining encoding information according to tracking information according to an embodiment of the present application.
  • FIG. 13 is a schematic flowchart of yet another information transmission method provided by an embodiment of the present application.
  • FIG. 14 is an example flowchart of another information transmission solution provided by an embodiment of the present application.
  • FIG. 15 is a structural diagram of a system provided by an embodiment of the present application and an encoding device and a decoding device in the system.
  • At least one (item) refers to one or more, and “multiple” refers to two or more.
  • "And/or" describes an association relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean: only A, only B, or both A and B, where A and B can be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following items" or similar expressions refers to any combination of these items, including any combination of a single item or multiple items.
  • For example, at least one of a, b, or c can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can each be single or multiple.
  • Video coding generally refers to a technology that processes a sequence of pictures that form a video or video sequence.
  • video coding as used herein may include video encoding and video decoding.
  • Video encoding is performed on the source side, and usually includes processing (for example, by compressing) the original video picture to reduce the amount of data required to represent the video picture, so as to store and/or transmit more efficiently.
  • Video decoding is performed on the destination side, and usually includes inverse processing relative to the encoder to reconstruct the video picture.
  • the “encoding” of video pictures involved in the embodiments should be understood as involving the “encoding” or “decoding” of the video sequence.
  • the combination of the encoding part and the decoding part is also called codec (encoding and decoding).
  • the term "video coder” generally refers to both video encoders and video decoders.
  • the term "video coding” or “coding” may generally refer to video encoding or video decoding.
  • FIG. 1 is a block diagram of a video decoding system 10 according to an example described in an embodiment of the present invention.
  • the video coding system 10 may include a source device 12 and a destination device 14.
  • the source device 12 generates encoded video data. Therefore, the source device 12 may be referred to as a video encoding device.
  • the destination device 14 can decode the encoded video data generated by the source device 12, and therefore, the destination device 14 can be referred to as a video decoding device.
  • Various implementations of source device 12, destination device 14, or both may include one or more processors and memory coupled to the one or more processors.
  • the memory may include, but is not limited to, RAM, ROM, EEPROM, flash memory, or any other medium that can be used to store desired program codes in the form of instructions or data structures that can be accessed by a computer.
  • the source device 12 and the destination device 14 may communicate with each other via a link 13, and the destination device 14 may receive encoded video data from the source device 12 via the link 13.
  • Link 13 may include one or more media or devices capable of moving encoded video data from source device 12 to destination device 14.
  • link 13 may include one or more communication media that enable source device 12 to transmit encoded video data directly to destination device 14 in real time.
  • the source device 12 may modulate the encoded video data according to a communication standard, such as a wireless communication protocol, and may transmit the modulated video data to the destination device 14.
  • the one or more communication media may include wireless and/or wired communication media, such as a radio frequency (RF) spectrum or one or more physical transmission lines.
  • RF radio frequency
  • the one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (e.g., the Internet).
  • the one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from source device 12 to destination device 14.
  • the source device 12 and the destination device 14 may include various devices, and the existence and (accurate) division of the functionality of the source device 12 and/or the destination device 14 may vary according to actual devices and applications.
  • At least one of the source device 12 and the destination device 14 may include a desktop computer, a mobile computing device, a notebook (e.g., laptop) computer, a tablet computer, a set-top box, a mobile phone, a smartphone, a television, a camera, a display device, a digital media player, a video game console, video streaming equipment (such as a content service server or content distribution server), broadcast receiver equipment, broadcast transmitter equipment, a vehicle-mounted device, a mobile vehicle, or the like.
  • the solution of the present application can be applied to an immersive virtual visual experience scene.
  • the source device 12 may be a host, which may be an independent terminal, a computing device, a physical server, or a cloud computing (cloud computing) platform.
  • the destination device 14 may be a virtual reality (Virtual Reality, VR) device, an augmented reality (Augmented Reality, AR) device, a mixed reality (Mixed Reality, MR) device, or the like.
  • the VR device can be VR glasses, a VR headset, a VR box, or another device that applies VR technology; an AR device can be AR glasses, an AR TV, an AR headset, or another device that applies AR technology; an MR device can be MR glasses, an MR terminal, an MR head-mounted display, an MR wearable device, or another device that applies MR technology.
  • the destination device 14 can be a head-mounted display device (Head Mount Display, HMD); the head-mounted display device and the host can communicate and interact wirelessly or by wire: the host encodes the image and transmits it to the head-mounted display device, and the head-mounted display device decodes the image and displays it, thereby bringing the user the visual and interactive experience of VR/AR/MR.
  • the head-mounted display device may be, for example, a mobile-end headset or a host-end headset.
  • a mobile-end headset, such as VR/AR/MR glasses or a VR/AR/MR mobile-phone box, can be connected to the host wirelessly (such as via Bluetooth, WiFi, or a mobile network).
  • the host-side headset can also be called an external head-mounted device, which requires a wired connection to the host and other accessories for use.
  • the computing function of the host can also be integrated into the head-mounted display device.
  • the head-mounted display device can be an all-in-one headset, which has an independent display device (as the decoding end) and a computing unit (as the encoding end); the two complete the communication interaction inside the all-in-one headset.
  • the solution of the present application can also be applied to the control or visual experience scenes of unmanned vehicles.
  • the source device 12 may be a drone, an unmanned car (not shown), etc., and the source device 12 may be equipped with a camera for image capture and encoding.
  • the destination device 14 may be a drone's flying glasses, an unmanned car control device (not shown), or the like.
  • Figure 3 shows the scene of interaction between the flying glasses and the drone.
  • the flying glasses and the drone can communicate and interact wirelessly.
  • the drone encodes the captured image/video and transmits it to the flying glasses.
  • the flying glasses decode the image and display it, bringing the user the drone's visual experience, and can even realize control of the drone's flight attitude/shooting direction.
  • the source device 12 includes an encoder 20, and optionally, the source device 12 may also include a picture source 16, a picture preprocessor 18, and a communication interface 22.
  • the encoder 20, the picture source 16, the picture preprocessor 18, and the communication interface 22 may be hardware components in the source device 12, or may be software programs in the source device 12. They are described as follows:
  • the picture source 16, which can include or be any type of picture capture device, for example for capturing real-world pictures or videos, and/or any type of picture generating device (for screen content encoding, some text on the screen is also considered part of the picture or image to be encoded), for example a computer graphics processor for generating computer animation pictures, or any type of device for acquiring and/or providing real-world pictures (such as images taken by a camera), computer animation pictures (for example, screen content or VR pictures), and/or any combination thereof (for example, AR/MR pictures).
  • the picture source 16 may be a camera for capturing pictures or a memory for storing pictures.
  • the picture source 16 may also include any type (internal or external) interface for storing previously captured or generated pictures and/or acquiring or receiving pictures.
  • the terms "picture”, "frame” or “image” can be used as synonyms.
  • when the picture source 16 is a camera, the picture source 16 may be, for example, a local camera or a camera integrated in the source device; when the picture source 16 is a memory, the picture source 16 may be, for example, a local memory or a memory integrated in the source device.
  • the picture source 16 includes an interface, the interface may be, for example, an external interface for receiving pictures from an external video source.
  • the external video source is, for example, an external picture capturing device such as a camera, an external memory, or an external picture generating device such as an external computer graphics processor, computer, or server.
  • the interface can be any type of interface based on any proprietary or standardized interface protocol, such as a wired or wireless interface, and an optical interface.
  • the picture transmitted from the picture source 16 to the picture preprocessor may also be referred to as original picture data 17.
  • the picture preprocessor 18 is configured to receive the original picture data 17 and perform preprocessing on the original picture data 17 to obtain the preprocessed picture 19 or the preprocessed picture data 19.
  • the pre-processing performed by the picture pre-processor 18 may include one or more of image rendering, trimming, color format conversion, toning, or denoising.
  • the encoder 20 is configured to receive the pre-processed picture data 19, and process the pre-processed picture data 19 using the configured coding prediction mode and coding parameters, so as to provide the coded picture data 21.
  • picture data can be divided into a set of non-overlapping blocks (also called image blocks or video blocks); that is, the current image to be processed by the encoder 20 may include one or more blocks at the block level.
  • the encoder 20 can perform encoding at the block level.
  • the term "image to be processed” may refer to a part of a picture or a frame. Specifically, the "image to be processed” may be a "image block to be processed", that is, a block currently to be processed.
  • in encoding, the image to be processed may include the block currently to be encoded; in decoding, the image to be processed may include the block currently to be decoded.
  • the prediction block is generated through spatial (intra-picture) prediction and temporal (inter-picture) prediction, and the prediction block is subtracted from the current block (the block currently being processed or to be processed) to obtain the residual block; the residual block is transformed in the transform domain and quantized to reduce the amount of data to be transmitted (compression), while the decoder side applies the inverse processing relative to the encoder to the coded or compressed block to reconstruct the current block for representation.
  • in addition, the encoder duplicates the decoder processing loop, so that the encoder and the decoder generate the same predictions (for example, intra-frame prediction and inter-frame prediction) and/or reconstructions, which are used for processing, i.e., coding, subsequent blocks. A pseudocode sketch of this loop follows.
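  • A hedged pseudocode rendering of this hybrid coding loop, in Python; the helper functions are placeholders for illustration, not a real codec API.

        def encode_block(current_block, reference_frames):
            """Sketch of the hybrid block-coding loop described above."""
            prediction = predict(current_block, reference_frames)  # intra- or inter-picture
            residual = current_block - prediction                  # prediction error
            coefficients = quantize(transform(residual))           # compress the residual
            bits = entropy_code(coefficients)
            # The encoder duplicates the decoder processing loop so that both
            # sides predict subsequent blocks from the same reconstruction.
            reconstruction = prediction + inverse_transform(dequantize(coefficients))
            reference_frames.update(reconstruction)
            return bits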
  • the encoder 20 may be used to implement the various embodiments described below to realize the application of the information transmission method described in the present invention on the encoding side.
  • the communication interface 22 can be used to receive the encoded picture data 21, and can transmit the encoded picture data 21 to the destination device 14 or any other device (such as a memory) through the link 13 for storage or direct reconstruction, so The other device can be any device used for decoding or storage.
  • the communication interface 22 can be used, for example, to encapsulate the encoded picture data 21 into a suitable format, such as a data packet, for transmission on the link 13.
  • the destination device 14 includes a decoder 30, and optionally, the destination device 14 may also include a communication interface 28, a picture post-processor 32, and a display device 34. They are described as follows:
  • the communication interface 28 can be used to receive the encoded picture data 21 from the source device 12 or any other source, for example, a storage device, and the storage device is, for example, an encoded picture data storage device.
  • the communication interface 28 can be used to transmit or receive the encoded picture data 21 via the link 13 between the source device 12 and the destination device 14 or via any type of network.
  • the link 13 is, for example, a direct wired or wireless connection.
  • the type of network is, for example, a wired or wireless network or any combination thereof, or any type of private network and public network, or any combination thereof.
  • the communication interface 28 may be used, for example, to decapsulate the data packet transmitted by the communication interface 22 to obtain the encoded picture data 21.
  • both the communication interface 28 and the communication interface 22 can be configured as one-way or two-way communication interfaces, and can be used, for example, to send and receive messages to establish a connection, and to confirm and exchange any other information related to the communication link and/or the data transmission, such as the transmission of encoded picture data.
  • the decoder 30 is configured to receive the encoded picture data 21 and parse the indication information transmitted in the code stream, where the indication information indicates the encoding parameters used when the encoder 20 encoded the image; based on the encoded picture data 21 and the indication information, image decoding can be performed, thereby providing decoded picture data 31 (also referred to as reconstructed picture data).
  • the decoder 30 may be used to implement the various embodiments described below to realize the application of the information transmission method described in the present invention on the decoding side.
  • the picture post processor 32 is configured to perform post-processing on the decoded picture data 31 to obtain the post-processed picture data 33.
  • the post-processing performed by the picture post-processor 32 may include one or more of rendering, color format conversion, toning, trimming, resampling, or any other processing; the picture post-processor 32 may also be used to transmit the post-processed picture data 33 to the display device 34.
  • the decoding device can also adjust one or more processing algorithms used by the picture post-processor 32 according to the tracking information (such as speed, angular velocity, acceleration, linear velocity, position, posture, etc.), for example standard dynamic range (SDR) image algorithms, high dynamic range (HDR) image algorithms, image enhancement algorithms, image super-resolution algorithms, and so on.
  • the display device 34 is configured to receive the post-processed picture data 33 to display the picture to, for example, a user or a viewer.
  • the display device 34 may be or may include any type of display for presenting reconstructed pictures, for example, an integrated or external display or monitor.
  • the display may include a liquid crystal display (LCD), an organic light emitting diode (OLED) display, a plasma display, a projector, a micro LED display, a liquid crystal on silicon (LCoS), Digital light processor (digital light processor, DLP) or any type of other display.
  • although FIG. 1 shows the source device 12 and the destination device 14 as separate devices, a device embodiment may also include both the source device 12 and the destination device 14 or the functionality of both, that is, the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality.
  • the same hardware and/or software may be used, or separate hardware and/or software, or any combination thereof may be used to implement the source device 12 or the corresponding functionality and the destination device 14 or the corresponding functionality .
  • both the encoder 20 and the decoder 30 can be implemented as any of various suitable circuits, for example, one or more microprocessors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), discrete logic, hardware, or any combination thereof.
  • the device can store the instructions of the software in a suitable non-transitory computer-readable storage medium, and can use one or more processors to execute the instructions in hardware to perform the technology of the present disclosure; any of the foregoing (including hardware, software, a combination of hardware and software, etc.) can be regarded as one or more processors.
  • the video decoding system 10 shown in FIG. 1 is only an example, and the technology of this application can be applied to video coding settings (for example, video encoding or video decoding) that do not necessarily include any data communication between the encoding and decoding devices.
  • the data can be retrieved from local storage, streamed on the network, etc.
  • the video encoding device can encode data and store the data to the memory, and/or the video decoding device can retrieve the data from the memory and decode the data.
  • encoding and decoding are performed by devices that do not communicate with each other but only encode data to the memory and/or retrieve data from the memory and decode the data.
  • FIG. 4 is a schematic structural diagram of a video decoding device 400 (for example, a video encoding device 400 or a video decoding device 400) provided by an embodiment of the present application.
  • the video coding device 400 is suitable for implementing the embodiments described herein.
  • the video coding device 400 may be a video decoder (for example, the decoder 30 of FIG. 1) or a video encoder (for example, the encoder 20 of FIG. 1).
  • the video coding device 400 may be one or more components of the decoder 30 in FIG. 1 or the encoder 20 in FIG. 1 described above.
  • the video decoding device 400 includes: an ingress port 410 and a receiver unit (Rx) 420 for receiving data; a processor, logic unit, or central processing unit (CPU) 430 for processing data; a transmitter unit (Tx) 440 and an egress port 450 for transmitting data; and a memory 460 for storing data.
  • the video decoding device 400 may further include optical-to-electrical (OE) components and electro-optical (EO) components coupled with the ingress port 410, the receiver unit 420, the transmitter unit 440, and the egress port 450 for the egress or ingress of optical or electrical signals.
  • the processor 430 is implemented by hardware and software.
  • the processor 430 may be implemented as one or more CPU chips, cores (for example, multi-core processors), FPGAs, ASICs, and DSPs.
  • the processor 430 communicates with the ingress port 410, the receiver unit 420, the transmitter unit 440, the egress port 450, and the memory 460.
  • the processor 430 includes a decoding module 470 (for example, an encoding module 470 or a decoding module 470).
  • the encoding/decoding module 470 implements the embodiments disclosed herein, that is, what is provided in the embodiments of the present invention. For example, the encoding/decoding module 470 implements, processes, or provides various encoding operations.
  • the encoding/decoding module 470 provides a substantial improvement to the function of the video decoding device 400 and affects the conversion of the video decoding device 400 to different states.
  • the encoding/decoding module 470 is implemented by instructions stored in the memory 460 and executed by the processor 430.
  • the memory 460 includes one or more magnetic disks, tape drives, and solid-state drives, and can be used as an overflow data storage device to store programs when these programs are selectively executed, and to store instructions and data read during program execution.
  • the memory 460 may be volatile and/or non-volatile, and may be read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM), and/or static random access memory (SRAM).
  • FIG. 5 is a simplified block diagram of an apparatus 500 that can be used as either or both of the source device 12 and the destination device 14 in FIG. 1, according to an exemplary embodiment.
  • the apparatus 500 may take the form of a computing system containing multiple computing devices (such as multiple computing chips or multiple servers), or the form of a single computing device such as a desktop computer, mobile computing device, notebook computer, tablet computer, set-top box, mobile phone, smartphone, television, camera, display device, digital media player, video game console, video streaming device, broadcast receiver device, broadcast transmitter device, vehicle-mounted device, mobile vehicle, or the like.
  • the processor 502 in the device 500 may be a central processing unit.
  • alternatively, the processor 502 may be any other type of device, or multiple devices, capable of manipulating or processing information, whether now existing or developed in the future.
  • although the disclosed implementations can be practiced with a single processor such as the processor 502, the use of more than one processor may achieve advantages in terms of speed and efficiency.
  • the memory 504 in the apparatus 500 may be a read only memory (Read Only Memory, ROM) device or a random access memory (random access memory, RAM) device. Any other suitable type of storage device can be used as the memory 504.
  • the memory 504 may include code and data 506 accessed by the processor 502 using the bus 512.
  • the memory 504 may further include an operating system 508 and an application program 510, and the application program 510 includes at least one program that permits the processor 502 to execute the method described herein.
  • the application program 510 may include applications 1 to N, which further include a video encoding application that performs the methods described herein, such as AR/VR/MR applications, drone flight/shooting control applications, autonomous driving control applications, and more.
  • the apparatus 500 may also include additional memory in the form of a secondary memory 514, which may be, for example, a memory card used with a mobile computing device. Because a video communication session may contain a large amount of information, this information may be stored in whole or in part in the secondary memory 514, and loaded into the memory 504 for processing as needed.
  • the apparatus 500 may also include one or more output devices, such as a display 518.
  • the display 518 may be a touch-sensitive display that combines a display and a touch-sensitive element operable to sense touch input.
  • the display 518 may be coupled to the processor 502 through the bus 512.
  • other output devices that allow the user to program the device 500 or use the device 500 in other ways may also be provided, or other output devices may be provided as an alternative to the display 518.
  • the display can be implemented in different ways, including a liquid crystal display (LCD), a cathode-ray tube (CRT) display, a plasma display, or a light emitting diode (LED) display, such as an organic LED (OLED) display.
  • the apparatus 500 may also include or be connected to an image sensing device 520, such as a camera, an infrared detector, or any other image sensing device that can sense an image, whether now existing or developed in the future.
  • The image sensing device 520 may be placed directly facing the user operating the apparatus 500, or may be placed facing the external environment. In an example, the position and optical axis of the image sensing device 520 may be configured such that its field of view includes an area immediately adjacent to the display 518 from which the display 518 is visible.
  • the apparatus 500 When the apparatus 500 is the destination device 14, it may optionally further include a motion sensing device 522, and the motion sensing device 522 may be used to realize the interaction between the user and the destination device.
  • the motion sensing device 522 can be used to detect at least one type of information such as the location/posture/linear velocity/angular velocity/acceleration of the destination device or of the user's body parts, so as to implement the tracking methods described in the embodiments of the present application: head tracking, gesture tracking, eye tracking, and motion tracking.
  • For example, to implement head tracking, the motion sensing device 522 may include at least one sensor such as an accelerometer, a gyroscope, a magnetometer, an optical capture device, or an inertial sensor, so as to monitor in real time at least one type of information such as the rotation angle, angular velocity, angular acceleration, and rotation direction of the head of the user wearing the destination device.
  • To implement gesture tracking, the motion sensing device 522 may include at least one device such as an accelerometer, gyroscope, magnetometer, inertial sensor, or an optical capture device such as an optical camera, infrared camera, or depth sensor, so as to monitor in real time at least one kind of information such as the posture, shape, movement speed, and movement direction of the user's hand.
  • To implement eye tracking, the motion sensing device 522 may include at least one device such as a built-in camera, an eye tracker, an infrared controller, or an iris image detector, so as to monitor in real time at least one kind of information such as the user's eyeball position, gaze direction, movement direction, and movement speed.
  • To implement motion tracking, the motion sensing device 522 may include at least one device such as an inertial measurement unit (IMU), an accelerometer, a gyroscope, a magnetometer, a depth camera, or a simultaneous localization and mapping (SLAM) system, so as to monitor at least one type of information such as the speed, acceleration, direction, position, and posture of the user moving in the real environment.
  • although the processor 502 and the memory 504 of the apparatus 500 are shown in FIG. 5 as integrated in a single unit, other configurations may also be used.
  • the operation of the processor 502 may be distributed in multiple directly coupled machines (each machine has one or more processors), or distributed in a local area or other network.
  • the memory 504 may be distributed across multiple machines, such as network-based storage or storage in multiple machines running the apparatus 500.
  • the bus 512 of the device 500 may be formed by multiple buses.
  • the secondary memory 514 may be directly coupled to other components of the apparatus 500 or accessed through a network, and may include a single integrated unit such as a memory card, or multiple units such as multiple memory cards. The apparatus 500 can therefore be implemented in a wide variety of configurations.
  • the existing VR/AR/MR technology allows users to obtain an immersive visual experience and satisfy the interaction between the user and the screen.
  • Current VR glasses can generally produce a field of view (FOV) exceeding 90 degrees (for example, 90-120 degrees).
  • This relies on two aspects. The first is magnified display technology: a magnified partial virtual scene can be displayed in front of the user's eyes, and within this display range, real-time three-dimensional images can be generated through three-dimensional engine technology.
  • The second is to use the data collected by the head position and attitude sensor (such as a head gyroscope), so that the three-dimensional engine responds to the head rotation direction (and to changes in the current head position).
  • When the head turns, the gyroscope can notify the image generation engine to render a new picture accordingly; the image generation engine sends the new picture back to the VR glasses, and the VR glasses update the displayed three-dimensional images in real time.
  • Ideally, the angle of the user's head rotation corresponds exactly to the visual 3D image simulated by the 3D engine, making the user feel as if they are observing a surrounding virtual 3D world through a large window. Because the user's head rotation produces a picture change that the user can understand in the virtual world, the user perceives that the virtual world gives feedback; the user's actions and the virtual world's feedback together form the interaction effect.
  • the image generation engine is located on the cloud server, that is, the game screen rendering is performed on the server side, and the game interaction is performed on the VR glasses side.
  • the server and the VR glasses are connected through a wireless network. After rendering, the new image is transmitted through wireless transmission.
  • As shown in FIG. 6, after the user puts on the VR glasses, assume that the current field of view is "field of view 1". When the user turns their head by a certain angle, the field of view rotates from "field of view 1" to "field of view 2". If the picture is not updated in time, the human eye may perceive a black area at the edge of "field of view 2", that is, black edges. Part (1) of FIG. 7 shows a VR scene without black borders, and part (2) of FIG. 7 shows a VR scene with black borders.
  • the devices/devices described in Figures 1 to 4 of the embodiments of the present application can solve the defects of the prior art, and can simultaneously avoid the occurrence of black borders, freezes and other unsmooth phenomena at the decoding end, and can also ensure the image resolution.
  • the CloudVR system includes two parts: a cloud server (equivalent to the source device 12 described in this application) and a VR device (equivalent to the destination device 14 described in this application).
  • After the server receives an instruction containing information such as the posture/rotational angular velocity/acceleration fed back by the VR device, the game rendering engine renders the corresponding image (the rendering resolution can remain unchanged), and the image, together with the posture/rotational angular velocity/acceleration information, is sent to the encoder.
  • the encoder automatically adjusts its computational complexity (encoding parameters) by judging information such as the rotation speed/acceleration/posture of the VR device, so as to adjust the encoding delay of the encoder.
  • After encoding, the image-related information can be sent to the VR device for decoding and display.
  • When the VR device rotates fast, the computational complexity of the encoder is reduced, thereby reducing the encoding delay of the encoder and the CloudVR system delay, and thus eliminating the possibility of black edges on the screen; when the VR device rotates slowly, the computational complexity of the encoder returns to normal, and the system delay returns to normal. Since the VR device in this solution can decode and display images in real time, the smoothness of the display of the VR device is also ensured and stuttering is avoided.
  • The computational complexity of encoding mentioned herein is determined by the texture complexity and motion complexity of the video image.
  • the destination device is a VR device as an example.
  • the posture/rotational angular velocity/acceleration of the VR device can be detected by head tracking.
  • the information that the VR device feeds back to the cloud server may not be limited to information obtained by head tracking, but may also be information obtained by gesture tracking, or information obtained by eye tracking, or information obtained by motion tracking.
  • Head tracking is to track the head movement by measuring the angle, angular velocity or angular acceleration when the user's head rotates, thereby triggering the response of the visual picture.
  • Through sensors inside the destination device such as accelerometers, gyroscopes, magnetometers, optical capture devices, and inertial sensors, information such as the rotation angle, angular velocity, angular acceleration, and rotation direction of the head of the user wearing the destination device can be monitored in real time.
  • The result of head tracking is that when the user puts on the destination device (such as a VR device) and turns their head, the picture they see moves with the movement of the head, simulating the scene in which the user turns their head and sees a new picture, so as to obtain an immersive visual experience.
  • Gesture tracking is to track the movement of the hand by detecting the posture, shape, movement speed, and direction of the user's hand in the real environment, thereby triggering the response of the visual screen or triggering the interaction with the screen elements.
  • The use of gesture tracking can be divided into two ways. One is the contact detection method, in which the user's hand is bound to a sensor (for example, a data glove worn on the hand, or a handheld device); the sensor can be an accelerometer, gyroscope, magnetometer, inertial sensor, etc., so as to monitor information such as the user's hand posture, shape, movement speed, and movement direction in real time.
  • The other is a non-contact detection method, which identifies the posture, shape, movement speed, and movement direction of the user's hand by configuring optical capture devices such as optical cameras, infrared cameras, and depth sensors in the destination device. Gesture tracking lets users participate directly and interact with the screen content, enhancing the user experience.
  • Eye tracking is to track the eye movement by measuring the position of the gaze point of the user's eyes or the movement of the eyeball relative to the head.
  • Through devices inside the destination device such as built-in cameras, eye trackers, infrared controllers, and iris image detectors, information such as the position of the user's eyeballs, gaze direction, movement direction, and movement speed can be tracked in real time through certain algorithms (such as video-based eye recording and corneal reflection).
  • The result of eye tracking is that when the user wears the destination device (such as a VR device), the picture they see moves with the movement of their eyes, simulating the scene in which the user moves their eyes to see a new picture, thereby obtaining an immersive visual experience.
  • Motion tracking tracks the user's movement by measuring the user's position and posture (i.e., pose) in the real environment, as well as the speed, acceleration, and direction of their movement in the real environment.
  • For example, an inertial measurement unit (IMU) containing an accelerometer, gyroscope, or magnetometer can be used to measure information such as the speed, acceleration, and direction of the user's movement in the real environment; a depth camera or a simultaneous localization and mapping (SLAM) system can also be used to identify changes in the real environment, so as to determine changes in the user's own motion and real-time position. Motion tracking can also trigger updates of the visual picture, or trigger the user's interaction with picture elements.
  • In the embodiments of the present application, the tracking method used can be any one of the tracking methods described above, or a combination of multiple tracking methods, such as a combination of head tracking and eye tracking, or a combination of gesture tracking and motion tracking; this application does not limit this.
  • FIG. 10 is a schematic flowchart of an information transmission method provided by an embodiment of the present invention. The method includes but is not limited to the following steps:
  • S101: The decoding device detects and obtains tracking information.
  • the tracking information is information generated by performing at least one of the following tracking methods on the decoding device: head tracking, gesture tracking, eye tracking, or motion tracking.
  • The tracking information of the decoding device includes at least one of motion information and pose information generated when the decoding device or the user's limbs move or rotate; it can include the motion information, the pose information, or both at the same time.
  • The motion information may include velocity (linear velocity, angular velocity, etc.) and/or acceleration (linear acceleration, angular acceleration, etc.), and the pose information may include the position and/or posture (or direction) of the decoding device or the user.
  • The pose information can represent the position and posture (or direction) of the decoding device in three-dimensional space.
  • For example, the position can be represented by the three coordinate axes x, y, and z of a three-dimensional coordinate system, and the direction can be represented by (α, β, γ), where (α, β, γ) denotes the angles of rotation around the three coordinate axes, as sketched in the example below.
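  • As an illustration only, the tracking information and pose described above could be modeled as the following minimal Python sketch; all field and type names are hypothetical, not taken from the original text:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Pose:
    # Position on the three coordinate axes of a 3D coordinate system.
    x: float
    y: float
    z: float
    # Direction as rotation angles (alpha, beta, gamma) around the three axes.
    alpha: float
    beta: float
    gamma: float

@dataclass
class TrackingInfo:
    # Motion information; any field may be absent, since the tracking
    # information only needs to contain at least one kind of data.
    linear_velocity: Optional[Tuple[float, float, float]] = None
    angular_velocity: Optional[Tuple[float, float, float]] = None
    linear_acceleration: Optional[Tuple[float, float, float]] = None
    angular_acceleration: Optional[Tuple[float, float, float]] = None
    # Pose information of the decoding device or the user.
    pose: Optional[Pose] = None
```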
  • S102: The decoding device sends its tracking information to the encoding device; correspondingly, the encoding device receives the tracking information of the decoding device.
  • S103: The encoding device configures the encoding information of the image to be processed according to the tracking information of the decoding device.
  • The image to be processed is an image that currently needs to be processed and transmitted to the decoding device side for display/interaction.
  • the tracking information is associated with the encoding information of the image to be processed, and the encoding information includes one or more encoding parameters (or set of encoding parameters) that the encoder of the encoding device needs to use to encode the image to be processed.
  • These encoding parameters may include, for example, one or more of the following: an instruction to turn the deblocking filter (deblock_filter) function on or off, the number of reference frames (Ref), the motion estimation search range (me_range), the motion estimation method (me_method), the sub-pixel refinement strength (subme), the lookahead optimizer parameter, and so on. They are described as follows:
  • The on/off instruction of the deblocking filter (deblock_filter) function is used to indicate whether to activate the deblock_filter function to perform deblocking filtering on the reconstructed image.
  • For this parameter, the following design can be made: when specific values in the tracking information (such as speed, angular velocity, acceleration, position, posture, etc.) are greater than or equal to a preset threshold, the decoding end is moving fast, and the deblock_filter function can be turned off to reduce the encoding complexity and the encoding delay; otherwise, the deblock_filter function can be turned on to improve image quality.
  • the number of reference frames (Ref) parameter is used to indicate the maximum number of reference frames, that is, the number of reference frames used in image prediction.
  • The value range of the number of reference frames is, for example, 0-16. The larger the value, the more accurate the prediction and the greater the computational complexity; conversely, the smaller the value, the worse the prediction accuracy and the smaller the computational complexity.
  • For this parameter, the following design can be made: when specific values in the tracking information (such as speed, angular velocity, acceleration, position, posture, etc.) are greater than or equal to a preset threshold, the decoding end is moving fast, and a smaller number of reference frames (for example, Ref = 1) can be configured to reduce the encoding complexity and the encoding delay; otherwise, a larger number of reference frames can be configured to improve prediction accuracy.
  • the motion estimation search range (me_range) parameter is used to indicate the motion estimation radius in the image prediction, that is, the radius of the pixel block prediction search performed by the encoder.
  • the value range of the motion estimation radius can be 4 to 64.
  • For this parameter, the following design can be made: when specific values in the tracking information (such as speed, angular velocity, acceleration, position, posture, etc.) are greater than or equal to a preset threshold, the decoding end is moving fast, and a small motion estimation radius (for example, me_range = 4) can be configured to reduce the encoding complexity and the encoding delay; when the values are less than the threshold, 8 ≤ me_range ≤ 64 can be configured, so the motion estimation radius in coding prediction increases and the coding complexity increases. Since the motion of the decoding end is slower, the delay caused by increasing the motion estimation radius will not cause black borders, freezes, or other phenomena.
  • the motion estimation method (me_method) is used to indicate the setting of the full-pixel motion estimation method.
  • The motion estimation method includes the motion search algorithm (such as the diamond search algorithm dia, the hexagon search algorithm hex, and the asymmetric cross multi-level hexagonal grid search algorithm umh). The more complex the motion search algorithm, the more accurate the prediction and the more complex the calculation; conversely, the simpler the motion search algorithm, the worse the prediction accuracy and the smaller the computational complexity.
  • The general matching criterion for motion estimation is the rate-distortion optimization criterion, for example based on the sum of absolute differences (SAD).
  • Figure 11 shows some possible search templates.
  • the black dots in the template represent the best prediction points found in this step.
  • The search templates listed in the figure include a small diamond template, medium diamond template, hexagon template, small square template, asymmetric cross template, 5×5 step-by-step search template, large hexagon template, regular octagon template, and so on. It should be noted that in specific implementations of this application, any other possible search templates can also be used, such as full search templates, three-step search templates, and four-step search templates, which are not limited in this application.
  • For this parameter, the following design can be made: when specific values in the tracking information (such as speed, angular velocity, acceleration, position, posture, etc.) are greater than or equal to a preset threshold, the decoding end is moving fast, and a relatively simple motion estimation method such as the diamond search algorithm dia is configured; the search algorithm is simple, the amount of calculation is small, the coding complexity is reduced, and the coding delay is reduced, so as to avoid black borders and freezes at the decoding end.
  • When the values are less than the threshold, relatively complex motion estimation methods are configured, such as the hexagon search algorithm hex or the asymmetric cross multi-level hexagonal grid search algorithm umh, and the amount of calculation, that is, the coding complexity, increases. Since the motion of the decoding end is slow, the delay caused by the more complex search algorithm will not cause black borders, freezes, or other phenomena. A minimal sketch of the diamond search is given below.
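  • As an illustration of the diamond search algorithm dia mentioned above, the following minimal Python sketch performs a full-pixel block search with the SAD matching criterion; the function names and the me_range handling are assumptions, not the patent's implementation:

```python
import numpy as np

def sad(block_a: np.ndarray, block_b: np.ndarray) -> int:
    # Sum of absolute differences, the matching criterion mentioned above.
    return int(np.abs(block_a.astype(int) - block_b.astype(int)).sum())

def diamond_search(ref: np.ndarray, cur_block: np.ndarray,
                   cx: int, cy: int, me_range: int = 16):
    """Full-pixel diamond search around (cx, cy) in the reference frame.
    Returns the best-matching (x, y) block position in `ref`."""
    h, w = cur_block.shape
    # Large diamond search pattern (9 points) and small diamond pattern (5 points).
    ldsp = [(0, 0), (2, 0), (-2, 0), (0, 2), (0, -2),
            (1, 1), (1, -1), (-1, 1), (-1, -1)]
    sdsp = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]

    def cost(x, y):
        # Reject positions outside the frame or outside the me_range radius.
        if x < 0 or y < 0 or x + w > ref.shape[1] or y + h > ref.shape[0]:
            return float("inf")
        if abs(x - cx) > me_range or abs(y - cy) > me_range:
            return float("inf")
        return sad(cur_block, ref[y:y + h, x:x + w])

    bx, by = cx, cy
    # Step 1: repeat the large diamond pattern until the best point is the center.
    while True:
        nx, ny = min(((bx + dx, by + dy) for dx, dy in ldsp),
                     key=lambda p: cost(*p))
        if (nx, ny) == (bx, by):
            break
        bx, by = nx, ny
    # Step 2: one refinement pass with the small diamond pattern.
    return min(((bx + dx, by + dy) for dx, dy in sdsp), key=lambda p: cost(*p))
```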
  • the sub-pixel subdivision intensity (subme) parameter is used to indicate the dynamic prediction and partitioning mode.
  • the value range of this parameter can be, for example, 0-11.
  • For this parameter, the following design can be made: when specific values in the tracking information (such as speed, angular velocity, acceleration, position, posture, etc.) are greater than or equal to a preset threshold, the decoding end is moving fast, and subme can be set to 0 or 1, thereby reducing the coding complexity and the coding delay, so as to avoid black borders, freezes, and other phenomena at the decoding end; otherwise, a larger subme value can be configured.
  • the lookahead optimizer parameter is used to set the frame buffer size for thread prediction.
  • the value range of this parameter is, for example, 0-250.
  • The larger the value, the more accurate the prediction and the greater the computational complexity; conversely, the smaller the value, the worse the prediction accuracy and the smaller the computational complexity.
  • For this parameter, the following design can be made: when specific values in the tracking information (such as speed, angular velocity, acceleration, position, posture, etc.) are greater than or equal to a preset threshold, the decoding end is moving fast, and the frame buffer size is reduced, thereby reducing the coding complexity and the coding delay, and avoiding black borders and freezes at the decoding end. A combined sketch of these per-parameter adjustments follows.
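  • Pulling the per-parameter designs above together, a minimal sketch of the threshold-based configuration might look as follows. The parameter names mirror the x264-style names used in the text; the threshold value, the choice of angular velocity as the tested quantity, and the concrete values in each branch are illustrative assumptions:

```python
SPEED_THRESHOLD = 30.0  # assumed threshold, e.g. angular velocity in deg/s

def configure_encoding_info(angular_velocity: float) -> dict:
    """Map tracking information to an encoding parameter set."""
    if angular_velocity >= SPEED_THRESHOLD:
        # Decoding end moves fast: reduce every parameter's computational
        # complexity to shrink the encoding delay.
        return {
            "deblock": False,      # turn the deblocking filter off
            "ref": 1,              # fewer reference frames
            "me_range": 4,         # small motion estimation radius
            "me_method": "dia",    # simple diamond search
            "subme": 0,            # minimal sub-pixel refinement
            "rc_lookahead": 0,     # no lookahead buffering
        }
    # Decoding end moves slowly: higher complexity is acceptable, so
    # favor prediction accuracy and image quality.
    return {
        "deblock": True,
        "ref": 4,
        "me_range": 16,
        "me_method": "umh",
        "subme": 7,
        "rc_lookahead": 40,
    }
```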
  • In general, when the tracking information is greater than or equal to a preset threshold, the encoding information mapped by the tracking information may be called first encoding information; when the tracking information is less than the preset threshold, the encoding information mapped by the tracking information is called second encoding information. The first encoding information and the second encoding information can then satisfy at least one of the following relationships:
  • the deblocking filter parameter in the first encoding information indicates that the deblocking filter is turned off, while the deblocking filter parameter in the second encoding information indicates that the deblocking filter is turned on;
  • the number of reference frames in the first encoding information is smaller than the number of reference frames in the second encoding information;
  • the motion estimation search range in the first encoding information is smaller than the motion estimation search range in the second encoding information;
  • the calculation amount of the motion estimation method in the first encoding information is less than the calculation amount of the motion estimation method in the second encoding information.
  • the tracking information is associated with the encoding information, which means that there is a corresponding relationship between the tracking information and the encoding information, and the encoding device stores the relationship between the two.
  • For example, the encoding device may store a mapping relationship between tracking information and encoding parameter sets. In this way, after receiving the tracking information, the encoding device can find the corresponding encoding parameters according to the mapping relationship and configure them in the encoder. That is, there may be a direct mapping relationship between the tracking information and the encoding information: the tracking information is bound to the encoding information, and the encoding information can be directly determined through the tracking information.
  • Alternatively, the tracking information and the encoding information may be indirectly related; for example, certain algorithm processing or condition judgments need to be performed on the tracking information to determine the corresponding encoding information.
  • After the encoding device receives the specific tracking information uploaded by the decoding device, it can determine the corresponding encoding information according to that tracking information.
  • For example, the encoding device can make a judgment on a preset condition based on the tracking information, and determine the corresponding encoding parameter set based on the judgment result.
  • For instance, the encoding device pre-stores mapping relationships between different data intervals and encoding parameter sets. After receiving the tracking information, the encoding device can determine the data interval in which the speed/acceleration/position data in the tracking information falls, and then find the corresponding encoding parameter set according to the mapping relationship, as sketched below.
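  • A minimal sketch of such a pre-stored interval mapping, with made-up interval bounds and parameter set names, might look like this:

```python
# Hypothetical mapping from data intervals of the tracked quantity
# (here: angular velocity V) to named encoding parameter sets.
INTERVAL_TO_PARAMSET = [
    # (lower bound inclusive, upper bound exclusive, parameter set)
    (0.0, 10.0, "high_quality_set"),
    (10.0, 30.0, "balanced_set"),
    (30.0, None, "low_latency_set"),  # None: no upper bound
]

def lookup_parameter_set(v: float) -> str:
    """Find the parameter set whose data interval contains v."""
    for lo, hi, name in INTERVAL_TO_PARAMSET:
        if v >= lo and (hi is None or v < hi):
            return name
    raise ValueError("tracking value outside all configured intervals")
```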
  • In short, the encoding device can receive, in real time, instructions containing at least one kind of tracking information such as position/posture/linear velocity/angular velocity/acceleration fed back by the decoding device, and then adjust the encoding information (encoding parameters) of the encoder in the encoding device according to the received tracking information, thereby adjusting the computational complexity of the encoder.
  • S104: The encoding device performs image encoding on the image to be processed according to the encoding information configured in S103.
  • the encoding process can be performed by the encoder in the encoding device. The specific coding process is not described here.
  • S105: The encoding device encodes the encoding parameters configured in S103 into a code stream and sends it to the decoding device.
  • Since the encoding device can adjust the encoding parameters of the encoder according to the tracking information uploaded by the decoding device, relevant information about the encoding parameters can be encoded into the code stream to facilitate subsequent decoding by the decoding device.
  • the code stream also contains the image information obtained after encoding the image, so that the decoding device can reconstruct (decode) the image based on the image information.
  • For example, the code stream sent to the decoding device contains motion vector difference (MVD) information, a reference image index, and so on. The specific content of the image information can be implemented by referring to existing encoding methods, which is not limited in this application.
  • S106: The decoding device parses the code stream from the encoding device.
  • the decoding device can obtain this information by parsing the code stream.
  • S107: The decoding device decodes and displays the image according to the instruction information.
  • the decoding process of the decoding device can be regarded as the inverse process of encoding, and the decoding device can decode (decode) the image information to reconstruct the image and display it on the display device.
  • the realization of the decoding process can refer to the existing decoding means.
  • In summary, the encoding device can receive, in real time, instructions containing at least one kind of information such as position/posture/linear velocity/angular velocity/acceleration fed back by the decoding device, and then further transmit these instructions to the encoder.
  • The encoder automatically adjusts its computational complexity (encoding parameters) according to the position/posture/linear velocity/angular velocity/acceleration of the decoding device, thereby adjusting the encoding delay of the encoder and reducing the delay of the entire system.
  • the image-related information and the configured encoding parameters are sent to the VR device so that the decoding device can decode and display normally.
  • Since the black border phenomenon of the decoding device is closely related to excessive system delay, this application reduces the system delay by reducing the computational complexity of the encoder, basically eliminating the possibility of black borders on the screen.
  • the VR device can decode and display images in real time and in time, so the display fluency of the VR device is also guaranteed, and the occurrence of jams is avoided.
  • Moreover, the embodiments of this application can keep the image rendering at a good resolution, ensuring the user experience.
  • Below, a specific CloudVR scenario is used as an example to illustrate the information transmission method provided by the embodiments of the present application.
  • In this scenario, the encoding device can be a cloud server, and the decoding device can be a VR head-mounted display (VR headset).
  • The tracking information of the VR headset is information obtained through head tracking (for example, the rotational angular velocity of the head, the pose, etc.). As shown in FIG. 13, the method includes but is not limited to the following steps:
  • S201: The VR head-mounted display detects the rotational angular velocity V and the pose information of the VR head-mounted display by means of head tracking.
  • For the specific implementation of head tracking, refer to the description of head tracking above, which is not repeated here.
  • S202: The VR head-mounted display sends information such as the rotational angular velocity V and the pose to the server.
  • S203: The server determines the image to be processed according to the pose information of the VR headset, and performs image rendering on the image to be processed.
  • The server can predict the picture according to the pose of the head-mounted display to determine the current image to be processed, and preprocess the image to be processed.
  • The preprocessing may include, for example, one or more of image rendering, trimming, color format conversion, color correction, or denoising.
  • the specific content of this part can be achieved with reference to existing methods.
  • S204: The server transmits the rotational angular velocity V of the VR headset to the internal encoder.
  • S205: After the encoder obtains the rotational angular velocity, it compares the rotational angular velocity V with preset thresholds, so as to start the encoding parameter configuration function, adjust the encoding parameters, and adjust the encoding complexity.
  • The preset thresholds may include T1 and T2, where T1 < T2.
  • When V ≥ T2, the encoder selects the first encoding parameter set according to the mapping relationship. The configured first encoding parameter set may include, for example, one or more of the following: the deblock function in the encoder is turned off, the number of reference frames is modified to 1, the motion estimation search range is 4x4, the motion estimation method uses diamond search dia, and so on. The coding computational complexity is thus greatly reduced, and the coding delay is significantly reduced.
  • When T1 ≤ V < T2, the encoder selects the second encoding parameter set according to the mapping relationship. The configured second encoding parameter set may include, for example, one or more of the following: the deblock function is turned on, the number of reference frames is increased to 2, the motion estimation search range is 8x8, the motion estimation method uses the hexagon search algorithm hex, and so on. The coding computational complexity is thus reduced, and the coding delay is reduced.
  • When V < T1, the encoder selects the third encoding parameter set according to the mapping relationship.
  • The configured third encoding parameter set may include, for example, one or more of the following: the deblock function is turned on, the number of reference frames is increased to 4, the motion estimation search range is 16x16, and the motion estimation method adopts the asymmetric cross multi-level hexagonal grid search algorithm umh.
  • In this case the computational complexity of the encoder is relatively large and the coding delay is relatively large; since the head-mounted display is moving slowly, this delay does not cause black borders or freezes.
  • An exemplary implementation is as follows:
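  • The following minimal Python sketch reconstructs the three-way selection described above; the boundary conditions V ≥ T2, T1 ≤ V < T2, and V < T1 are inferred from the complexity ordering of the three parameter sets, and the dict keys are illustrative:

```python
def select_parameter_set(v: float, t1: float, t2: float) -> dict:
    """Select one of the three encoding parameter sets from the rotational
    angular velocity V of the VR head-mounted display (requires T1 < T2)."""
    assert t1 < t2
    if v >= t2:
        # Fast head rotation: first set, lowest encoding complexity.
        return {"deblock": False, "ref": 1, "me_range": 4, "me_method": "dia"}
    if v >= t1:
        # Medium rotation speed: second set, moderate complexity.
        return {"deblock": True, "ref": 2, "me_range": 8, "me_method": "hex"}
    # Slow rotation or stationary: third set, highest quality.
    return {"deblock": True, "ref": 4, "me_range": 16, "me_method": "umh"}
```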
  • S206: The encoder of the server loads the configured encoding parameters and starts to perform image encoding on the image to be processed.
  • S207: The encoder of the server encodes the encoded image information and the selected encoding parameters into a code stream, and sends the code stream to the VR head-mounted display via the network.
  • The image information may include, for example, motion vector difference (MVD) information, a reference image index, and other content, which can be implemented with reference to existing coding methods and is not limited in this application.
  • the decoding device can obtain this information by parsing the code stream.
  • S208: The VR head-mounted display performs image decoding and display.
  • In addition to the solution described above, FIG. 14 shows another implementation scheme.
  • In that scheme, the encoding parameters configured in the encoder can include turning the deblock function on or off, changing the number of structural vertices in display rendering, and so on.
  • Implementation schemes of different forms, and schemes obtained from variants of the scheme of this application, all fall within the protection scope of this application.
  • In this embodiment, the server can adjust the computational complexity of the encoder and optimize its parameter configuration by judging the rotational angular velocity information fed back by the VR head-mounted display, thereby optimizing the encoding delay and the delay of the entire system when the user turns their head, significantly reducing the system delay of CloudVR and helping to reduce or even eliminate the black edge effect when turning the head.
  • Moreover, the faster the head turns, the less sensitive the human eye is to image quality; and the lower the CloudVR system latency, the less likely the user is to perceive black edges when turning the head.
  • Therefore, if the head-mounted display rotates quickly, the encoder can sacrifice a certain amount of image quality in exchange for reduced encoding complexity, thereby reducing the system delay; if the head-mounted display rotates slowly, the encoder can encode according to the original configuration without degrading image quality.
  • This not only guarantees low latency when the VR headset rotates quickly, fundamentally reducing the possibility of black borders, but also does not affect the image quality and experience of the VR headset when it is stationary or turning at low speed.
  • In addition, this embodiment can also guarantee the resolution of image rendering, ensuring the user experience.
  • In contrast, existing technical methods do not reduce the system delay, and therefore introduce adverse effects such as smearing and jitter.
  • In some embodiments, after the decoder of the decoding device completes decoding the image, the image post-processing link (for example, the picture post-processor 32 in FIG. 1) can also be improved to further reduce the system delay, so as to further avoid black edges and freezes.
  • For example, the decoding device can turn on or off one or more processing algorithms in the image post-processing stage according to its tracking information, so as to adjust the computational complexity of the image post-processing link and reduce the possibility of black borders or freezes.
  • Specifically, a mapping relationship between the tracking information of the decoding device and one or more processing algorithms used by the picture post-processor can be preset; the mapping relationship can then be queried according to the tracking information of the decoding device to determine the corresponding processing algorithms and configure them for the picture post-processor.
  • The one or more processing algorithms adopted by the picture post-processor may include, for example, at least one of the following: a standard dynamic range (SDR) image algorithm, a high dynamic range (HDR) image algorithm, an image enhancement algorithm, an image super-resolution algorithm, and so on.
  • The SDR image algorithm can be used to perform gamma curve correction of the image or video.
  • The HDR image algorithm can be used to provide a greater dynamic range and more image detail, so that the image better reflects the visual effect of the real environment.
  • Image enhancement algorithms can be used to adjust the brightness, contrast, saturation, hue, etc. of the image to increase the clarity of the image and reduce noise.
  • Image super-resolution algorithms can restore a low-resolution image or image sequence to a high-resolution image. It can be seen that the image quality of the decoding device can be improved through one or more of the above processing algorithms, bringing a better image look and feel to the user.
  • the tracking information of the decoding device can be obtained from the history cache of the memory of the decoding device, or it can be transmitted back to the decoding device by the encoding device through the code stream.
  • The tracking information of the decoding device includes at least one of motion information and pose information of the decoding device; the motion information includes the movement speed and/or acceleration of the decoding device, the movement speed includes angular velocity and/or linear velocity, and the acceleration includes angular acceleration and/or linear acceleration; the pose information includes position information and/or attitude information of the decoding device.
  • For example, when the decoding device is a VR head-mounted display and the tracking information is the rotational angular velocity V, a rotational angular velocity threshold of the VR head-mounted display can be set in the decoding device in advance.
  • The mapping relationship between the tracking information and the one or more processing algorithms adopted by the picture post-processor can be, for example, as follows:
  • When the rotational angular velocity V is greater than or equal to the threshold, one or more processing algorithms are turned off, for example, at least one of the SDR image algorithm, the HDR image algorithm, the image enhancement algorithm, and the image super-resolution algorithm. That is, when the VR headset is moving fast, the computational complexity of the image post-processing link can be reduced, thereby reducing the system delay and avoiding black edges, freezes, and other phenomena.
  • When the rotational angular velocity V is less than the threshold, one or more processing algorithms are turned on, for example, at least one of the SDR image algorithm, the HDR image algorithm, the image enhancement algorithm, and the image super-resolution algorithm. That is, the computational complexity of the image post-processing link can be increased, improving the quality of the output image and bringing a better viewing experience to the user. Since the movement of the VR head-mounted display is relatively slow at this time, the resulting delay does not cause black edges, freezes, and so on. A minimal sketch of this on/off mapping is given below.
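  • A minimal sketch of this threshold-based on/off mapping, with an assumed threshold value and hypothetical algorithm keys, might be:

```python
ROTATION_THRESHOLD = 30.0  # assumed rotational angular velocity threshold

def configure_post_processing(v: float) -> dict:
    """Turn the post-processing algorithms off while the headset moves fast
    (to cut system delay) and back on when the motion is slow."""
    fast = v >= ROTATION_THRESHOLD
    return {
        "sdr_correction": not fast,
        "hdr_mapping": not fast,
        "image_enhancement": not fast,
        "super_resolution": not fast,
    }
```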
  • multiple rotational angular velocity thresholds may also be designed to provide more mapping possibilities between tracking information and processing algorithms to adapt to more diversified application scenarios.
  • In this embodiment, the decoding device (such as a VR headset) turns on or off one or more processing algorithms of the image post-processing link (such as the picture post-processor) based on tracking information cached in history or tracking information returned in the code stream, so as to adjust the computational complexity, further optimize the delay of the entire system when the user turns their head, significantly reduce the system delay of CloudVR, and help to reduce or even eliminate the black edge effect when turning the head. When there are no black borders or freezes, it can also improve the image quality and bring users a better viewing experience.
  • FIG. 15 is a structural diagram of a system provided by an embodiment of the present invention, including the encoding device 60 and the decoding device 70 in the system.
  • the encoding device 60 and the decoding device 70 can communicate wirelessly.
  • the encoding device 60 may include a parameter adjustment module 601, an encoding module 602, a receiving module 603, and a transmitting module 604.
  • the decoding device 70 may include a tracking module 701, a decoding module 702, a display module 703, a transmitting module 704, and a receiving module 705.
  • The modules of the encoding device 60 and the decoding device 70 are respectively described as follows.
  • the receiving module 603 is configured to receive tracking information of the decoding device 70; the tracking information of the decoding device 70 includes at least one of motion information and pose information of the decoding device.
  • the parameter adjustment module 601 is configured to configure the encoding information of the image to be processed according to the tracking information of the decoding device 70; the tracking information is associated with the encoding information, and the encoding information includes one or more encoding parameters used for encoding the image to be processed.
  • the encoding module 602 is used to perform image encoding on the image to be processed.
  • the transmitting module 604 is configured to encode the encoding parameters and the encoded image information into a code stream and send to the decoding device 70.
  • the tracking module 701 is configured to obtain tracking information by tracking the decoding device 70 in real time; the tracking information is generated by performing at least one of the following on the decoding device 70: head tracking, gesture tracking, eye tracking, or motion tracking. The tracking information of the decoding device 70 includes at least one of motion information and pose information of the decoding device.
  • the transmitting module 704 is configured to send the tracking information of the decoding device 70 to the encoding device 60.
  • the receiving module 705 is configured to receive the code stream from the encoding device 60 to obtain encoding parameters and encoded image information.
  • the decoding module 702 is used for image decoding according to the encoded image information.
  • the display module 703 is used to display the decoded image.
  • The functions of the modules of the encoding device 60 and the decoding device 70 can refer to the related descriptions of the embodiments in FIG. 10 or FIG. 13 above.
  • For the encoding device 60, the receiving module 603 can be used to perform the tracking information reception of S102, the parameter adjustment module 601 is used to perform S103, the encoding module 602 is used to perform S104, and the transmitting module 604 is used to perform the code stream transmission of S105. For the decoding device 70, the tracking module 701 is used to perform S101, the transmitting module 704 is used to send the tracking information in S102, the receiving module 705 is used to perform the code stream reception of S105 and S106, the decoding module 702 is used to perform the image decoding in S107, and the display module 703 is used to display the image in S107. For brevity, details are not repeated here.
  • The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • When implemented by software, they may be implemented in whole or in part in the form of a computer program product.
  • The computer program product includes one or more computer instructions; when the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present invention are produced in whole or in part.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • For example, the computer instructions may be transmitted from a website, computer, server, or data center.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, and may also be a data storage device such as a server or a data center integrated with one or more available media.
  • The usable medium may be a magnetic medium (such as a floppy disk, hard disk, or magnetic tape), an optical medium (such as a DVD), or a semiconductor medium (such as a solid-state drive), and so on.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present application provides an information transmission method, a related device, and a system. The method is applied to an encoding device, and comprises: receiving tracking information of a decoding device, the tracking information comprising movement information or pose information of the decoding device; configuring encoding information of an image to be processed according to the tracking information, the tracking information being associated with the encoding information, and the encoding information comprising one or more encoding parameters; and encoding said image according to the encoding information, and sending a code stream to the decoding device, the code stream comprising the one or more encoding parameters. Implementing the embodiments of the present application can reduce or even eliminate the unsmooth phenomena such as black edge and picture freeze of the decoding device, thereby improving user experience.

Description

一种信息传输方法、相关设备及系统Information transmission method, related equipment and system
本申请要求于2020年06月12日提交中国专利局,申请号为202010535609.4,申请名称“一种信息传输方法、相关设备及系统”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims the priority of a Chinese patent application filed with the Chinese Patent Office on June 12, 2020, the application number is 202010535609.4, and the application name is "an information transmission method, related equipment and system", the entire content of which is incorporated herein by reference Applying.
技术领域Technical field
本发明涉及视频编解码技术领域,尤其涉及一种信息传输方法、相关设备及系统。The present invention relates to the technical field of video coding and decoding, in particular to an information transmission method, related equipment and system.
背景技术Background technique
目前虚拟现实(Virtual Reality,VR)游戏产业快速发展,所需要的渲染等高负荷任务对计算资源需求非常大,一般需要配置高性能的游戏主机,导致VR游戏的成本居高不下,并且不能满足人们随时随地玩VR游戏的需求。CloudVR技术利用端云协同的思想,将VR游戏渲染和VR游戏交互分离,VR游戏交互在终端完成,交互指令(头显和手势的姿态及位置信息,用户输出的指令等)通过无线网络传输到云端(Cloud)服务器;云端服务器根据接收到的交互指令完成游戏的渲染,并将游戏画面通过无线网络传输到终端显示。CloudVR技术可以显著降低VR游戏终端的成本,并使用户可以随时随地接入网络玩VR游戏。At present, the virtual reality (VR) game industry is developing rapidly. The high-load tasks such as required rendering have a very large demand for computing resources. Generally, a high-performance game console is required, which leads to the high cost of VR games and cannot meet the requirements. People need to play VR games anytime, anywhere. CloudVR technology uses the idea of terminal-cloud collaboration to separate VR game rendering and VR game interaction. VR game interaction is completed in the terminal, and the interactive instructions (head-display and gesture posture and position information, user output instructions, etc.) are transmitted to the wireless network through the wireless network. Cloud server: The cloud server completes the rendering of the game according to the received interactive instructions, and transmits the game screen to the terminal for display through the wireless network. CloudVR technology can significantly reduce the cost of VR game terminals and enable users to access the Internet to play VR games anytime, anywhere.
CloudVR技术中云端服务器和终端之间通过无线网络相连,相比在终端本地进行渲染及交互的VR游戏而言会存在较大时延,当时延过大时,VR游戏画面中会出现明显黑边、画面卡顿等不流畅现象,显著影响VR游戏体验。In CloudVR technology, the cloud server and the terminal are connected through a wireless network. Compared with the VR game that renders and interacts locally on the terminal, there will be a larger delay. When the delay is too large at the time, obvious black borders will appear in the VR game screen. Unsmooth phenomena such as screen freezes, etc., significantly affect the VR gaming experience.
发明内容Summary of the invention
本申请实施例提供一种信息传输方法、相关设备及系统,能够降低CloudVR系统的时延,减少甚至消除设备的黑边、画面卡顿等不流畅现象,提升用户使用体验。The embodiments of the present application provide an information transmission method, related equipment, and system, which can reduce the time delay of the CloudVR system, reduce or even eliminate the unevenness of the device such as black edges and screen freezes, and improve the user experience.
第一方面,本申请实施例提供了一种信息传输方法,该方法从编码设备的角度进行描述,包括:接收解码设备的跟踪信息,所述跟踪信息包括所述解码设备的运动信息或位姿信息;根据解码设备的跟踪信息,配置待处理图像的编码信息;所述跟踪信息与所述编码信息相关联,所述编码信息包括一个或多个编码参数;根据所述编码信息对所述待处理图像进行编码,并将码流发送至所述解码设备,所述码流包括所述一个或多个编码参数。In the first aspect, the embodiments of the present application provide an information transmission method. The method is described from the perspective of an encoding device, including: receiving tracking information of a decoding device, where the tracking information includes motion information or pose of the decoding device Information; according to the tracking information of the decoding device, configure the encoding information of the image to be processed; the tracking information is associated with the encoding information, and the encoding information includes one or more encoding parameters; according to the encoding information, the encoding information of the image to be processed is configured The image is processed for encoding, and a code stream is sent to the decoding device, where the code stream includes the one or more encoding parameters.
其中,跟踪信息是解码设备对自身或者用户的运动状态进行跟踪检测而获得的,所述跟踪信息包括运动信息和位姿信息中的至少一种,所述运动信息用于指示所述解码设备的运动情况,具体实施例中,所述运动信息包括所述解码设备的运动速度和/或加速度,所述运动速度包括角速度和/或线速度,所述加速度包括角加速度和/或线加速度。所述位姿信息用于指示所述解码设备或者用户的位置信息和/或姿态信息,即位姿数信息可表示解码设备在三维空间中的位置和姿态(或方向),位置可以通过三维坐标系中的三个坐标轴x、y、z表示,方向可以通过(α,β,γ)来表示,(α,β,γ)表示围绕三个坐标轴旋转的角度。Wherein, the tracking information is obtained by the decoding device by tracking and detecting the motion state of itself or the user, the tracking information includes at least one of motion information and pose information, and the motion information is used to indicate the performance of the decoding device Movement conditions, in a specific embodiment, the movement information includes the movement speed and/or acceleration of the decoding device, the movement speed includes an angular velocity and/or a linear velocity, and the acceleration includes an angular acceleration and/or a linear acceleration. The pose information is used to indicate the position information and/or pose information of the decoding device or the user, that is, the pose number information may indicate the position and pose (or direction) of the decoding device in a three-dimensional space, and the position may be in a three-dimensional coordinate system The three coordinate axes in x, y, z represent, the direction can be represented by (α, β, γ), and (α, β, γ) represents the angle of rotation around the three coordinate axes.
That the tracking information is associated with the encoding information means that there is a correspondence between the two, and the encoding device stores this relationship. For example, the relationship may be a direct mapping, that is, the tracking information is bound to the encoding information, and the encoding information can be determined directly from the tracking information. As another example, the relationship may be indirect; for instance, certain algorithmic processing or conditional judgment must be applied to the tracking information before the corresponding encoding information can be determined. After the encoding device receives the specific tracking information uploaded by the decoding device, it can determine the corresponding encoding information according to that specific tracking information.
The encoding information includes one or more encoding parameters used by the encoder of the encoding device to encode the image to be processed (also called the image to be encoded). Since the encoder performs the encoding process according to the encoding parameters, different encoding parameters require different amounts of computation during encoding, that is, they yield different computational complexity. In other words, in the present application the encoding device can adjust its configured encoding parameters in real time based on the tracking information uploaded by the decoding device, thereby adjusting the computational complexity of encoding.
It can be seen that, by implementing this embodiment of the present application, after receiving in real time an instruction fed back by the decoding device that contains at least one kind of tracking information such as position/attitude/linear velocity/angular velocity/acceleration, the encoding device can adjust the encoding information of its encoder according to the received tracking information; the adjustment strategy may be to adjust the computational complexity of the encoder (that is, the encoding parameters). In a coding and decoding system (for example, a CloudVR system), one of the main components of the latency is the encoding latency, and this embodiment adjusts the encoding latency of the encoder by adjusting its computational complexity, thereby reducing the latency of the whole system. The image-related information and the encoding parameters can subsequently be sent to the decoding device so that the decoding device can decode and display normally.
Since the black-border phenomenon on the decoding device is closely related to excessive system latency, the present application reduces system latency by lowering the computational complexity of the encoder, which can greatly reduce or even eliminate the possibility of black borders appearing in the picture. After receiving the bitstream, the decoding device can then decode and display the image in time, which also ensures smooth display on the decoding device and avoids stuttering.
Based on the first aspect, in a possible embodiment, configuring the encoding information of the image to be processed according to the tracking information of the decoding device specifically includes: querying a preset mapping relationship according to the tracking information to obtain the encoding information of the image to be processed, where the preset mapping relationship includes a mapping from the tracking information to the encoding information; and configuring the encoding information.
The preset mapping relationship may be stored in advance in a storage unit of the encoding device and characterizes the mapping from the tracking information to the encoding information. For example, in an implementation, the preset mapping relationship may be a mapping table. The mapping table may directly record the mapping between various kinds of tracking information and encoding parameters, or it may record the mapping between various value ranges of the motion information or pose information and encoding parameters; by determining which value range a specific value in the tracking information falls into, the encoding parameters corresponding to that tracking information can be determined. After obtaining the encoding information corresponding to the tracking information, the encoding device can configure that encoding information (one or more encoding parameters) into the encoder (that is, replace the previously configured encoding parameters), thereby adjusting the encoding parameters of the encoder, that is, the computational complexity of the encoding process.
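As a minimal sketch (not part of the claimed method), such a range-based mapping table could be implemented as follows, assuming angular velocity is the tracked quantity; all ranges and parameter values are hypothetical:

```python
# Hypothetical mapping table: angular-velocity ranges -> encoder parameters.
# The ranges and values are illustrative, not prescribed by this embodiment.
MAPPING_TABLE = [
    # (lower bound inclusive, upper bound exclusive, encoding parameters)
    (0.0, 0.5, {"ref": 4, "me_range": 16, "subme": 7, "lookahead": 20}),
    (0.5, 2.0, {"ref": 2, "me_range": 8,  "subme": 2, "lookahead": 2}),
    (2.0, float("inf"), {"ref": 1, "me_range": 4, "subme": 0, "lookahead": 0}),
]

def lookup_encoding_info(angular_velocity: float) -> dict:
    """Return the encoding parameters whose range contains the tracked value."""
    for low, high, params in MAPPING_TABLE:
        if low <= angular_velocity < high:
            return params
    raise ValueError("tracking value outside all configured ranges")
```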
It can be seen that, by setting a preset mapping relationship, this embodiment of the present application enables rapid adjustment of the encoding computational complexity and therefore rapid adjustment of the encoding latency, which helps eliminate black borders and stuttering at the decoding end. In addition, technicians can define the specific content of the preset mapping relationship according to actual needs and install it in the encoding device, so this embodiment also offers a choice among various preset mapping relationships to suit a wide variety of application scenarios and meet practical encoding needs.
Based on the first aspect, in a possible embodiment, the one or more encoding parameters include one or more of a deblocking filter (deblock_filter) parameter, a number of reference frames (Ref), a motion estimation search range (me_range), a motion estimation method (me_method), a sub-pixel subdivision strength (subme), and a lookahead optimizer parameter.
The deblocking filter parameter indicates whether to enable the deblock_filter function to perform deblocking filtering on the reconstructed image. The number-of-reference-frames parameter indicates the maximum number of reference frames, that is, the number of reference frames used in image prediction. The motion estimation search range parameter indicates the motion estimation radius in image prediction, that is, the radius within which the encoder performs the prediction search for a pixel block. The motion estimation method parameter sets the full-pixel motion estimation method; motion estimation methods include motion search algorithms (for example, the diamond search algorithm, the hexagon search algorithm, the asymmetric cross multi-hexagon grid search algorithm, and so on). The sub-pixel subdivision strength parameter indicates the dynamic prediction and partitioning mode. The lookahead optimizer parameter sets the frame buffer size for threaded prediction.
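These parameter names correspond closely to the options of widely used H.264 encoders. Purely as a hedged illustration, assuming an x264 command-line encoder is available and that the embodiment's parameters map onto x264's options as commented below (this mapping is an assumption, not stated by the application), a low-complexity configuration could be launched like this:

```python
import subprocess

# Hypothetical low-complexity configuration expressed with real x264 CLI options.
# Assumed mapping of the embodiment's parameter names onto x264 flags:
#   deblock_filter=0 -> --no-deblock      Ref       -> --ref
#   me_range         -> --merange         me_method -> --me (dia/hex/umh)
#   subme            -> --subme           lookahead -> --rc-lookahead
cmd = [
    "x264",
    "--no-deblock",         # disable the deblocking filter
    "--ref", "1",           # single reference frame
    "--merange", "4",       # small motion-estimation search radius
    "--me", "dia",          # diamond search: cheapest full-pel ME
    "--subme", "0",         # minimal sub-pixel refinement
    "--rc-lookahead", "0",  # no lookahead frame buffering
    "-o", "out.264",
    "input.y4m",            # hypothetical input file
]
subprocess.run(cmd, check=True)
```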
Based on the first aspect, in a possible embodiment, when the tracking information is greater than or equal to a preset threshold, the tracking information maps to first encoding information; when the tracking information is less than the preset threshold, the tracking information maps to second encoding information; and the first encoding information and the second encoding information satisfy at least one of the following relationships (a combined code sketch follows the list):
(1) The deblocking filter parameter in the first encoding information indicates that the deblocking filter is turned off, and the deblocking filter parameter in the second encoding information indicates that the deblocking filter is turned on.
For example, deblock_filter=1 means the function is enabled, and deblock_filter=0 means it is disabled. In a specific implementation, when a specific value in the tracking information (for example, a velocity, acceleration, position, or attitude value) is greater than or equal to the preset threshold, "deblock_filter=0" is configured, that is, the deblocking filter is turned off; the encoder is spared the work of deblocking filtering, which lowers the encoding complexity and therefore the encoding latency, avoiding black borders, stuttering, and similar artifacts at the decoding end. Conversely, when the specific value is less than the preset threshold, "deblock_filter=1" is configured and the deblocking filter is turned on; since the decoding end is moving slowly, the latency introduced by enabling the deblocking filter will not cause black borders, stuttering, or similar artifacts.
(2) The number of reference frames in the first encoding information is smaller than the number of reference frames in the second encoding information.
For example, when a specific value in the tracking information (for example, a velocity, acceleration, position, or attitude value) is greater than or equal to the preset threshold, 0 < Ref < 2 is configured, which reduces the number of reference frames used in encoding prediction and lowers the encoding complexity, thereby lowering the encoding latency and avoiding black borders, stuttering, and similar artifacts at the decoding end. Conversely, when the specific value is less than the preset threshold, 2 ≤ Ref ≤ 16 is configured; the number of reference frames used in encoding prediction increases and the encoding complexity increases, but since the decoding end is moving slowly, the latency introduced by the additional reference frames will not cause black borders, stuttering, or similar artifacts.
(3) The motion estimation search range in the first encoding information is smaller than the motion estimation search range in the second encoding information.
For example, when a specific value in the tracking information (for example, a velocity, acceleration, position, or attitude value) is greater than or equal to the preset threshold, 4 ≤ me_range ≤ 8 is configured, which reduces the motion estimation radius in encoding prediction and lowers the encoding complexity, thereby lowering the encoding latency and avoiding black borders, stuttering, and similar artifacts at the decoding end. Conversely, when the specific value is less than the preset threshold, 8 < me_range ≤ 64 is configured; the motion estimation radius increases and the encoding complexity increases, but since the decoding end is moving slowly, the latency introduced by the larger motion estimation radius will not cause black borders, stuttering, or similar artifacts.
(4) The computational load of the motion estimation method in the first encoding information is smaller than the computational load of the motion estimation method in the second encoding information.
For example, when a specific value in the tracking information (for example, a velocity, acceleration, position, or attitude value) is greater than or equal to the preset threshold, a relatively simple motion estimation method is configured, for example the diamond search algorithm (dia); the search algorithm is simple and its computational load is small, which lowers the encoding complexity and therefore the encoding latency, avoiding black borders, stuttering, and similar artifacts at the decoding end. Conversely, when the specific value is less than the preset threshold, a relatively complex motion estimation method is configured, for example the hexagon search algorithm (hex) or the asymmetric cross multi-hexagon grid search algorithm (umh); the computational load, and hence the encoding complexity, increases, but since the decoding end is moving slowly, the latency introduced by the more complex search algorithm will not cause black borders, stuttering, or similar artifacts.
(5) The sub-pixel subdivision strength in the first encoding information is smaller than the sub-pixel subdivision strength in the second encoding information.
For example, when a specific value in the tracking information (for example, a velocity, acceleration, position, or attitude value) is greater than or equal to the preset threshold, subme is configured to 0 or 1, which lowers the encoding complexity and therefore the encoding latency, avoiding black borders, stuttering, and similar artifacts at the decoding end. Conversely, when the specific value is less than the preset threshold, 1 < subme ≤ 11 is configured, which increases the encoding complexity; since the decoding end is moving slowly, the resulting latency will not cause black borders, stuttering, or similar artifacts.
(6) The lookahead optimizer parameter in the first encoding information is smaller than the lookahead optimizer parameter in the second encoding information.
For example, when a specific value in the tracking information (for example, a velocity, acceleration, position, or attitude value) is greater than or equal to the preset threshold, 0 ≤ lookahead < 2 is configured, which reduces the frame buffer size and thus the encoding complexity, thereby lowering the encoding latency and avoiding black borders, stuttering, and similar artifacts at the decoding end. Conversely, when the specific value is less than the preset threshold, 2 ≤ lookahead ≤ 250 is configured, which enlarges the frame buffer and increases the encoding complexity; since the decoding end is moving slowly, the resulting latency will not cause black borders, stuttering, or similar artifacts.
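A minimal sketch combining relationships (1) to (6) above into a single threshold rule; the threshold value and both parameter sets are illustrative assumptions chosen from the ranges given in this embodiment:

```python
# Hypothetical threshold on the tracked value (e.g. angular velocity in rad/s).
THRESHOLD = 1.0

# First encoding information: low complexity, used when motion is fast.
FAST_MOTION_PARAMS = {
    "deblock_filter": 0,  # (1) deblocking filter off
    "ref": 1,             # (2) 0 < Ref < 2
    "me_range": 4,        # (3) 4 <= me_range <= 8
    "me_method": "dia",   # (4) cheapest search algorithm
    "subme": 0,           # (5) subme equal to 0 or 1
    "lookahead": 0,       # (6) 0 <= lookahead < 2
}

# Second encoding information: higher complexity, used when motion is slow.
SLOW_MOTION_PARAMS = {
    "deblock_filter": 1,  # deblocking filter on
    "ref": 4,             # 2 <= Ref <= 16
    "me_range": 16,       # 8 < me_range <= 64
    "me_method": "umh",   # more thorough search algorithm
    "subme": 7,           # 1 < subme <= 11
    "lookahead": 20,      # 2 <= lookahead <= 250
}

def select_encoding_info(tracked_value: float) -> dict:
    """Map the tracked value to the first or second encoding information."""
    if tracked_value >= THRESHOLD:
        return FAST_MOTION_PARAMS
    return SLOW_MOTION_PARAMS
```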
Based on the first aspect, in a possible embodiment, the tracking information is information generated by performing at least one of the following operations on the decoding device: head tracking, gesture tracking, eye tracking, or motion tracking.
Head tracking tracks the movement of the head by measuring the angle, angular velocity, or angular acceleration of the user's head as it rotates, thereby triggering a response in the visual picture. Gesture tracking tracks the movement of the hand by detecting the posture, shape, movement speed, and movement direction of the user's hand in the real environment, thereby triggering a response in the visual picture or an interaction with picture elements. Eye tracking tracks eye movement by measuring the position of the gaze point of the user's eyes or the movement of the eyeball relative to the head. Motion tracking tracks the user's movement by measuring the user's position and attitude (that is, pose) in the real environment and the speed, acceleration, and direction of the user's movement in the real environment. It can be seen that the embodiments of the present application can be applied to a wide variety of tracking scenarios, meeting users' needs in different scenarios and improving the applicability and commercial value of the present application.
Based on the first aspect, in a possible embodiment, the decoding device includes one of a virtual reality (Virtual Reality, VR) device, an augmented reality (Augmented Reality, AR) device, a mixed reality (Mixed Reality, MR) device, or drone flight glasses.
For example, the VR device may be a device that applies VR technology, such as VR glasses, a VR head-mounted display, or a VR box; the AR device may be a device that applies AR technology, such as AR glasses, an AR television, or an AR head-mounted display; and the MR device may be a device that applies MR technology, such as MR glasses, an MR terminal, an MR head-mounted display, or an MR wearable device. In terms of product form, the decoding device may be a head-mounted display (Head Mount Display, HMD); the head-mounted display and the host (that is, the encoding device) can communicate and interact in a wireless or wired manner. The host encodes the image and transmits it to the head-mounted display, which decodes and displays the image, thereby bringing the user the visual and interactive experience of VR/AR/MR.
Drone flight glasses are a device for interacting with the camera of a drone. The flight glasses and the drone can communicate and interact wirelessly: the drone encodes the captured images/video and transmits them to the flight glasses, which decode and display the images, thereby giving the user the drone's field of view and even enabling control of the drone's flight attitude and shooting direction.
In a second aspect, an embodiment of the present application provides an apparatus for encoding an image. The apparatus is applied to an encoding device and includes a receiving module, a parameter adjustment module, an encoding module, and a transmitting module, where: the receiving module is configured to receive tracking information of a decoding device, the tracking information including motion information or pose information of the decoding device; the parameter adjustment module is configured to configure, according to the tracking information of the decoding device, encoding information of an image to be processed, where the tracking information is associated with the encoding information and the encoding information includes one or more encoding parameters; the encoding module is configured to encode the image to be processed according to the encoding information; and the transmitting module is configured to send a bitstream to the decoding device, the bitstream including the one or more encoding parameters.
Likewise, the tracking information is obtained by the decoding device by tracking and detecting the motion state of the device itself or of the user, and includes at least one of motion information and pose information. The motion information indicates the motion of the decoding device; in a specific embodiment, it includes the motion speed and/or acceleration of the decoding device, where the motion speed includes an angular velocity and/or a linear velocity, and the acceleration includes an angular acceleration and/or a linear acceleration. The pose information indicates the position information and/or attitude information of the decoding device or of the user; that is, the pose information can represent the position and attitude (or orientation) of the decoding device in three-dimensional space.
That the tracking information is associated with the encoding information means that there is a correspondence between the two, and this relationship is stored in the encoding device. For example, the relationship may be a direct mapping, that is, the tracking information is bound to the encoding information, and the parameter adjustment module can determine the encoding information directly from the tracking information. As another example, the relationship may be indirect; for instance, the parameter adjustment module must apply certain algorithmic processing or conditional judgment to the tracking information before the corresponding encoding information can be determined. After the receiving module receives the specific tracking information uploaded by the decoding device, the corresponding encoding information can be determined according to that specific tracking information.
The encoding information includes one or more encoding parameters used by the encoder of the encoding device to encode the image to be processed (also called the image to be encoded). Since the encoder performs the encoding process according to the encoding parameters, different encoding parameters require different amounts of computation during encoding, that is, they yield different computational complexity. In other words, in the present application the encoding device can adjust its configured encoding parameters in real time based on the tracking information uploaded by the decoding device, thereby adjusting the computational complexity of encoding.
It can be seen that the apparatus of this embodiment of the present application can receive in real time an instruction fed back by the decoding device that contains at least one kind of tracking information such as position/attitude/linear velocity/angular velocity/acceleration, and adjust the encoding information of the encoder in the encoding device according to the received tracking information; the adjustment strategy may be to adjust the computational complexity of the encoder (that is, the encoding parameters), thereby reducing the latency of the whole system. The image-related information and the encoding parameters can subsequently be sent to the decoding device so that it can decode and display normally, which can greatly reduce or even eliminate the possibility of black borders in the picture, ensures smooth display on the decoding device, and avoids stuttering.
Based on the second aspect, in a possible embodiment, the parameter adjustment module is specifically configured to: query a preset mapping relationship according to the tracking information to obtain the encoding information of the image to be processed, where the preset mapping relationship includes a mapping from the tracking information to the encoding information; and configure the encoding information.
Based on the second aspect, in a possible embodiment, the one or more encoding parameters include one or more of a deblocking filter parameter, a number of reference frames, a motion estimation search range, a motion estimation method, a sub-pixel subdivision strength, and a lookahead optimizer parameter.
Based on the second aspect, in a possible embodiment, when the tracking information is greater than or equal to a preset threshold, the tracking information maps to first encoding information; when the tracking information is less than the preset threshold, the tracking information maps to second encoding information; and the first encoding information and the second encoding information satisfy at least one of the following relationships:
the deblocking filter parameter in the first encoding information indicates that the deblocking filter is turned off, and the deblocking filter parameter in the second encoding information indicates that the deblocking filter is turned on;
the number of reference frames in the first encoding information is smaller than the number of reference frames in the second encoding information; the motion estimation search range in the first encoding information is smaller than the motion estimation search range in the second encoding information;
the computational load of the motion estimation method in the first encoding information is smaller than the computational load of the motion estimation method in the second encoding information;
the sub-pixel subdivision strength in the first encoding information is smaller than the sub-pixel subdivision strength in the second encoding information;
the lookahead optimizer parameter in the first encoding information is smaller than the lookahead optimizer parameter in the second encoding information.
Based on the second aspect, in a possible embodiment, the tracking information is information generated by performing at least one of the following operations on the decoding device: head tracking, gesture tracking, eye tracking, or motion tracking.
Based on the second aspect, in a possible embodiment, the motion information of the decoding device includes the motion speed and/or acceleration of the decoding device, where the motion speed includes an angular velocity and/or a linear velocity, and the acceleration includes an angular acceleration and/or a linear acceleration.
Based on the second aspect, in a possible embodiment, the decoding device includes one of a virtual reality (VR) device, an augmented reality (AR) device, a mixed reality (MR) device, or drone flight glasses.
In each of the embodiments described in the second aspect above, the functional modules of the apparatus can cooperate with one another to implement the methods described in the related embodiments of the first aspect.
In a third aspect, an embodiment of the present application provides a device for encoding an image. The device may be an encoding device that includes a memory, a processor, and a transceiver; the memory, the processor, and the transceiver may be connected by a bus, or at least two of them may be coupled together. Specifically:
the transceiver is configured to receive data from and send data to the outside;
the memory is configured to store program instructions and data;
the processor is configured to execute the program instructions in the memory to implement the method described in the first aspect or in any possible embodiment of the first aspect.
In a fourth aspect, an embodiment of the present application provides a system that includes an encoding device and a decoding device, where: the decoding device is configured to send tracking information of the decoding device to the encoding device, the tracking information including motion information or pose information of the decoding device; the encoding device is configured to configure, according to the tracking information of the decoding device, encoding information of an image to be processed, where the tracking information is associated with the encoding information and the encoding information includes one or more encoding parameters, and to encode the image to be processed according to the encoding information and send a bitstream to the decoding device, the bitstream including the one or more encoding parameters; and the decoding device is configured to decode and display the image according to the bitstream.
Specifically, the encoding device may be the encoding device described in any embodiment of the second aspect or the third aspect.
In a fifth aspect, an embodiment of the present application provides a computing node cluster (also called a cloud cluster), including at least one computing node, where each computing node includes a processor and a memory, and the processor executes the code in the memory to perform the method described in any embodiment of the first aspect.
In a sixth aspect, an embodiment of the present invention provides a non-volatile computer-readable storage medium; the computer-readable storage medium is used to store implementation code of the method described in the first aspect. When the program code is executed by a computer, the computer implements the method described in any embodiment of the first aspect.
In a seventh aspect, an embodiment of the present invention provides a computer program product; the computer program product includes program instructions, and when the computer program product is executed by a computer, the computer performs the method described in any embodiment of the first aspect. The computer program product may be a software installation package; when the method provided by any possible design of the first aspect needs to be used, the computer program product can be downloaded and executed on a computer to implement that method.
It can be seen that, by implementing the embodiments of the present application, the encoding device can receive in real time an instruction fed back by the decoding device that contains at least one kind of information such as position/attitude/linear velocity/angular velocity/acceleration, and automatically adjust the computational complexity (encoding parameters) of the encoder according to the position/attitude/linear velocity/angular velocity/acceleration of the decoding device, thereby adjusting the encoding latency and reducing the latency of the whole system. The image-related information and the configured encoding parameters can subsequently be sent to the decoding device so that it can decode and display normally. By lowering the computational complexity of the encoding device during encoding, the present application reduces the system latency, can fundamentally eliminate the possibility of black borders in the picture, ensures smooth display on the decoding device, and avoids stuttering.
Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the background art more clearly, the accompanying drawings required in the embodiments of the present invention or in the background art are described below.
FIG. 1 is a block diagram of an example video coding system 10 provided by an embodiment of the present application;
FIG. 2 is an example diagram of a device experience scenario to which an embodiment of the present application is applied;
FIG. 3 is an example diagram of another device experience scenario to which an embodiment of the present application is applied;
FIG. 4 is a schematic structural diagram of a video coding device provided by an embodiment of the present application;
FIG. 5 is a simplified block diagram of an apparatus that can be used as either or both of a source device and a destination device according to an embodiment of the present application;
FIG. 6 is an example diagram of a scenario in which a user wearing a device turns the head, provided by an embodiment of the present application;
FIG. 7 is an example diagram of the black-border phenomenon provided by an embodiment of the present application;
FIG. 8 is an example flowchart of an information transmission solution provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of four tracking modes for interaction between the user and the picture involved in an embodiment of the present application;
FIG. 10 is a schematic flowchart of an information transmission method provided by an embodiment of the present application;
FIG. 11 is a schematic diagram of some search templates provided by an embodiment of the present application;
FIG. 12 is a logical schematic diagram of determining encoding information according to tracking information provided by an embodiment of the present application;
FIG. 13 is a schematic flowchart of another information transmission method provided by an embodiment of the present application;
FIG. 14 is an example flowchart of another information transmission solution provided by an embodiment of the present application;
FIG. 15 is a schematic structural diagram of a system provided by an embodiment of the present application and of the encoding device and decoding device in the system.
Detailed Description
The embodiments of the present invention are described below with reference to the accompanying drawings in the embodiments of the present invention. The terms used in the description of the embodiments of the present invention are only for explaining specific embodiments of the present invention and are not intended to limit the present invention.
The terms "first", "second", and the like in the specification, the claims, and the foregoing drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion, for example, the inclusion of a series of steps or units. A method, system, product, or device is not necessarily limited to the steps or units clearly listed, but may include other steps or units that are not clearly listed or that are inherent to the process, method, product, or device.
It should be understood that in this application, "at least one (item)" means one or more, and "multiple" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: only A exists, only B exists, or both A and B exist, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects before and after it. "At least one of the following (items)" or a similar expression refers to any combination of these items, including a single item or any combination of multiple items. For example, at least one of a, b, or c may mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be single or multiple.
To better understand the technical solutions of the present application, the architecture of the video coding system to which the embodiments of the present invention are applied is described first. Video coding generally refers to the technology of processing a sequence of pictures that forms a video or video sequence. The video coding technology used herein may include video encoding and video decoding. Video encoding is performed on the source side and usually includes processing (for example, compressing) the original video pictures to reduce the amount of data required to represent them, for more efficient storage and/or transmission. Video decoding is performed on the destination side and usually includes inverse processing relative to the encoder to reconstruct the video pictures. The "coding" of video pictures in the embodiments should be understood as referring to the "encoding" or "decoding" of a video sequence. The combination of the encoding part and the decoding part is also called codec (encoding and decoding). As used herein, the term "video coder" generally refers to both a video encoder and a video decoder. In this document, the term "video coding" or "coding" may generally refer to video encoding or video decoding.
Referring to FIG. 1, FIG. 1 is a block diagram of an example video coding system 10 described in an embodiment of the present invention. As shown in FIG. 1, the video coding system 10 may include a source device 12 and a destination device 14. The source device 12 generates encoded video data and may therefore be referred to as a video encoding apparatus. The destination device 14 can decode the encoded video data generated by the source device 12 and may therefore be referred to as a video decoding apparatus. Various implementations of the source device 12, the destination device 14, or both may include one or more processors and a memory coupled to the one or more processors. The memory may include, but is not limited to, RAM, ROM, EEPROM, flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures accessible by a computer.
The source device 12 and the destination device 14 may be communicatively connected through a link 13, and the destination device 14 may receive encoded video data from the source device 12 via the link 13. The link 13 may include one or more media or apparatuses capable of moving the encoded video data from the source device 12 to the destination device 14. In one example, the link 13 may include one or more communication media that enable the source device 12 to transmit the encoded video data directly to the destination device 14 in real time. In this example, the source device 12 may modulate the encoded video data according to a communication standard (for example, a wireless communication protocol) and may transmit the modulated video data to the destination device 14. The one or more communication media may include wireless and/or wired communication media, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (for example, the Internet). The one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from the source device 12 to the destination device 14.
The source device 12 and the destination device 14 may include various apparatuses, and the existence and (exact) division of the functionality of the source device 12 and/or the destination device 14 may vary according to the actual device and application. At least one of the source device 12 and the destination device 14 may include a desktop computer, a mobile computing apparatus, a notebook (for example, laptop) computer, a tablet computer, a set-top box, a mobile phone, a smartphone, a television, a camera, a display apparatus, a digital media player, a video game console, a video streaming device (such as a content service server or a content distribution server), a broadcast receiver device, a broadcast transmitter device, a vehicle-mounted device, a mobile vehicle, or the like.
Referring to FIG. 2, in some possible implementations, the solution of the present application can be applied to immersive virtual visual experience scenarios. The source device 12 may be a host, which may be an independent terminal, a computing device, a physical server, or a cloud computing platform. The destination device 14 may be a virtual reality (Virtual Reality, VR) device, an augmented reality (Augmented Reality, AR) device, a mixed reality (Mixed Reality, MR) device, or the like.
For example, the VR device may be a device that applies VR technology, such as VR glasses, a VR head-mounted display, or a VR box; the AR device may be a device that applies AR technology, such as AR glasses, an AR television, or an AR head-mounted display; and the MR device may be a device that applies MR technology, such as MR glasses, an MR terminal, an MR head-mounted display, or an MR wearable device.
In terms of product form, the destination device 14 may be a head-mounted display (Head Mount Display, HMD). The head-mounted display and the host can communicate and interact in a wireless or wired manner: the host encodes the image and transmits it to the head-mounted display, which decodes and displays the image, thereby bringing the user the visual and interactive experience of VR/AR/MR. The head-mounted display may be, for example, a mobile headset or a host-tethered headset. A mobile headset, such as VR/AR/MR glasses or a VR/AR/MR phone box, can connect to the host wirelessly (for example, via Bluetooth, Wi-Fi, or a mobile network). A host-tethered headset may also be called an externally connected head-mounted device; such a headset requires a wired connection to the host and other accessories for use.
In addition, in yet another possible implementation, the computing functions of the host may also be integrated into the head-mounted display device. For example, the head-mounted display device may be an all-in-one headset, which has an independent display unit (serving as the decoding end) and a computing unit (serving as the encoding end); the two complete their communication and interaction inside the all-in-one headset.
Referring to FIG. 3, in still other possible implementations, the solution of the present application can also be applied to control or visual experience scenarios of unmanned vehicles. For example, the source device 12 may be a drone, a self-driving car (not shown), or the like, and may be equipped with a camera for image capture and encoding. The destination device 14 may be drone flight glasses, a self-driving car control apparatus (not shown), or the like. FIG. 3 shows a scenario of interaction between flight glasses and a drone. The flight glasses and the drone can communicate wirelessly: the drone encodes the captured images/video and transmits them to the flight glasses, which decode and display the images, thereby giving the user the drone's field of view and even enabling control of the drone's flight attitude and shooting direction.
Further, the source device 12 includes an encoder 20; optionally, the source device 12 may also include a picture source 16, a picture preprocessor 18, and a communication interface 22. In a specific implementation form, the encoder 20, the picture source 16, the picture preprocessor 18, and the communication interface 22 may be hardware components in the source device 12 or software programs in the source device 12. They are described as follows:
The picture source 16 may include or be any kind of picture capture device, used for example to capture real-world pictures or video, and/or any kind of device for generating pictures or annotations (for screen content encoding, some text on the screen is also considered part of the picture or image to be encoded), for example a computer graphics processor for generating computer-animated pictures, or any kind of device for obtaining and/or providing real-world pictures (for example, images captured by a camera) or computer-animated pictures (for example, screen content or VR pictures), and/or any combination thereof (for example, AR/MR pictures). The picture source 16 may be a camera for capturing pictures or a memory for storing pictures, and may also include any kind of (internal or external) interface for storing previously captured or generated pictures and/or for obtaining or receiving pictures. In the field of video coding, the terms "picture", "frame", and "image" can be used as synonyms. When the picture source 16 is a camera, it may be, for example, a local camera or a camera integrated in the source device; when the picture source 16 is a memory, it may be a local memory or, for example, a memory integrated in the source device. When the picture source 16 includes an interface, the interface may be, for example, an external interface for receiving pictures from an external video source; the external video source is, for example, an external picture capture device such as a camera, an external memory, or an external picture generating device such as an external computer graphics processor, computer, or server. The interface may be any kind of interface according to any proprietary or standardized interface protocol, for example a wired or wireless interface or an optical interface.
In this embodiment of the present invention, the picture transmitted from the picture source 16 to the picture preprocessor may also be referred to as original picture data 17.
The picture preprocessor 18 is configured to receive the original picture data 17 and perform preprocessing on it to obtain a preprocessed picture 19 or preprocessed picture data 19. For example, the preprocessing performed by the picture preprocessor 18 may include one or more of image rendering, trimming, color format conversion, color grading, or denoising.
The encoder 20 is configured to receive the preprocessed picture data 19 and process it using the configured coding prediction mode and encoding parameters, thereby providing encoded picture data 21. Generally, picture data can be partitioned into a set of non-overlapping blocks (also called image blocks or video blocks); in other words, the image currently to be processed by the encoder 20 may include one or more blocks at the block level. The encoder 20 can perform encoding at the block level. Herein, the term "image to be processed" may refer to a part of a picture or frame. Specifically, the "image to be processed" may be an "image block to be processed", that is, the block currently being processed. In encoding, the image to be processed may include the block currently to be encoded; in decoding, the image to be processed may include the block currently to be decoded. For example, on the encoder side, a prediction block is generated through spatial (intra-picture) prediction and temporal (inter-picture) prediction, the prediction block is subtracted from the current block (the block currently being processed or to be processed) to obtain a residual block, and the residual block is transformed in the transform domain and quantized to reduce the amount of data to be transmitted (compressed); the decoder side applies the inverse processing, relative to the encoder, to the encoded or compressed block to reconstruct the current block for presentation. In addition, the encoder replicates the decoder processing loop so that the encoder and the decoder generate identical predictions (for example, intra prediction and inter prediction) and/or reconstructions for processing, that is, encoding, subsequent blocks.
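As an illustrative sketch only (not the claimed encoder), the block-level "predict, subtract, transform, quantize" loop described above can be outlined as follows; the 2-D DCT basis is built explicitly so the snippet depends only on NumPy, and the block size and quantization step are assumptions:

```python
import numpy as np

N = 8  # assumed block size

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    d = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    d[0, :] /= np.sqrt(2.0)
    return d

D = dct_matrix(N)

def encode_block(current: np.ndarray, prediction: np.ndarray, qstep: float = 8.0):
    """Toy hybrid-coding step: residual -> 2-D transform -> quantization."""
    residual = current.astype(np.float64) - prediction   # subtract prediction
    coeffs = D @ residual @ D.T                          # 2-D DCT of the residual
    return np.round(coeffs / qstep)                      # uniform quantization

def decode_block(levels: np.ndarray, prediction: np.ndarray, qstep: float = 8.0):
    """Inverse processing: dequantize -> inverse transform -> add prediction."""
    coeffs = levels * qstep
    residual = D.T @ coeffs @ D
    return residual + prediction
```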
在一些实施例中,编码器20可以用于执行后文所描述的各个实施例,以实现本发明所描述的信息传输方法在编码侧的应用。In some embodiments, the encoder 20 may be used to implement the various embodiments described below to realize the application of the information transmission method described in the present invention on the encoding side.
通信接口22,可用于接收经编码图片数据21,并可通过链路13将经编码图片数据21传输至目的地设备14或任何其它设备(如存储器),以用于存储或直接重构,所述其它设备可为任何用于解码或存储的设备。通信接口22可例如用于将经编码图片数据21封装成合适的格式,例如数据包,以在链路13上传输。The communication interface 22 can be used to receive the encoded picture data 21, and can transmit the encoded picture data 21 to the destination device 14 or any other device (such as a memory) through the link 13 for storage or direct reconstruction, so The other device can be any device used for decoding or storage. The communication interface 22 can be used, for example, to encapsulate the encoded picture data 21 into a suitable format, such as a data packet, for transmission on the link 13.
The destination device 14 includes a decoder 30; optionally, the destination device 14 may further include a communication interface 28, a picture post-processor 32, and a display device 34. These are described as follows:
The communication interface 28 may be used to receive the encoded picture data 21 from the source device 12 or from any other source, for example a storage device such as an encoded-picture-data storage device. The communication interface 28 may be used to transmit or receive the encoded picture data 21 via the link 13 between the source device 12 and the destination device 14, or via any type of network; the link 13 is, for example, a direct wired or wireless connection, and the network may be any wired or wireless network or any combination thereof, or any type of private and public network or any combination thereof. The communication interface 28 may, for example, be used to decapsulate the data packets transmitted by the communication interface 22 so as to obtain the encoded picture data 21.
Both the communication interface 28 and the communication interface 22 may be configured as unidirectional or bidirectional communication interfaces, and may be used, for example, to send and receive messages to establish a connection, and to acknowledge and exchange any other information related to the communication link and/or to the data transmission, such as the transmission of encoded picture data.
The decoder 30 is configured to receive the encoded picture data 21 and to parse out the indication information carried in the bitstream, which indicates the encoding parameters used by the encoder 20 when encoding the image; based on the encoded picture data 21 and the indication information, the decoder performs image decoding and thereby provides decoded picture data 31 (also called reconstructed picture data). In some embodiments, the decoder 30 may be used to carry out the embodiments described below, so as to apply the information transmission method described in the present invention on the decoding side.
The picture post-processor 32 is configured to post-process the decoded picture data 31 to obtain post-processed picture data 33. The post-processing performed by the picture post-processor 32 may include one or more of, for example, rendering, color format conversion, color grading, trimming, resampling, or any other processing, and the post-processor may also be used to transmit the post-processed picture data 33 to the display device 34. In an optional embodiment of the present application, the decoding device may further enable or disable one or more of the processing algorithms used by the picture post-processor 32 according to tracking information (for example speed, angular velocity, acceleration, linear velocity, position, posture, and similar information); for example, at least one of the following algorithms may be adjusted: a standard dynamic range (Standard-Dynamic Range, SDR) image algorithm, a high dynamic range (High-Dynamic Range, HDR) image algorithm, an image enhancement algorithm, an image super-resolution algorithm, and so on.
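A minimal sketch of such tracking-driven toggling is given below; the threshold value and the algorithm names are illustrative assumptions, not values specified by this application:

    FAST_MOTION_THRESHOLD = 2.0  # rad/s, assumed value

    def select_postprocessing(angular_velocity):
        # Keep only the cheap SDR path while the device moves fast; re-enable
        # the costly algorithms (HDR, enhancement, super-resolution) when slow.
        heavy = {"HDR", "enhancement", "super_resolution"}
        if angular_velocity >= FAST_MOTION_THRESHOLD:
            return {"SDR"}
        return {"SDR"} | heavy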
The display device 34 is configured to receive the post-processed picture data 33 so as to display the picture to, for example, a user or viewer. The display device 34 may be, or may include, any type of display for presenting the reconstructed picture, for example an integrated or external display or monitor. For example, the display may include a liquid crystal display (liquid crystal display, LCD), an organic light-emitting diode (organic light emitting diode, OLED) display, a plasma display, a projector, a micro-LED display, a liquid crystal on silicon (liquid crystal on silicon, LCoS) display, a digital light processor (digital light processor, DLP), or any other type of display.
Although FIG. 1 depicts the source device 12 and the destination device 14 as separate devices, device embodiments may also include both devices, or the functionality of both, i.e. the source device 12 or corresponding functionality together with the destination device 14 or corresponding functionality. In such embodiments, the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality may be implemented using the same hardware and/or software, using separate hardware and/or software, or any combination thereof.
Each of the encoder 20 and the decoder 30 may be implemented as any of a variety of suitable circuits, for example one or more microprocessors, digital signal processors (digital signal processor, DSP), application-specific integrated circuits (application-specific integrated circuit, ASIC), field-programmable gate arrays (field-programmable gate array, FPGA), discrete logic, hardware, or any combination thereof. If the techniques are implemented partially in software, a device may store the software instructions in a suitable non-transitory computer-readable storage medium and may execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) may be regarded as one or more processors.
In some cases, the video coding system 10 shown in FIG. 1 is merely an example, and the techniques of this application may apply to video coding settings (for example video encoding or video decoding) that do not necessarily include any data communication between the encoding and decoding devices. In other examples, data may be retrieved from local memory, streamed over a network, and so on. A video encoding device may encode data and store it to memory, and/or a video decoding device may retrieve data from memory and decode it. In some examples, encoding and decoding are performed by devices that do not communicate with each other but simply encode data to memory and/or retrieve data from memory and decode it.
Referring to FIG. 4, FIG. 4 is a schematic structural diagram of a video coding device 400 (for example a video encoding device 400 or a video decoding device 400) provided by an embodiment of this application. The video coding device 400 is suitable for implementing the embodiments described herein. In one embodiment, the video coding device 400 may be a video decoder (for example the decoder 30 of FIG. 1) or a video encoder (for example the encoder 20 of FIG. 1). In another embodiment, the video coding device 400 may be one or more components of the decoder 30 of FIG. 1 or of the encoder 20 of FIG. 1 described above.
The video coding device 400 includes: ingress ports 410 and a receiver unit (Rx) 420 for receiving data; a processor, logic unit, or central processing unit (CPU) 430 for processing data; a transmitter unit (Tx) 440 and egress ports 450 for transmitting data; and a memory 460 for storing data. The video coding device 400 may further include optical-to-electrical conversion components and electrical-to-optical (EO) components coupled to the ingress ports 410, the receiver unit 420, the transmitter unit 440, and the egress ports 450, serving as egress or ingress for optical or electrical signals.
The processor 430 is implemented by hardware and software. The processor 430 may be implemented as one or more CPU chips, cores (for example multi-core processors), FPGAs, ASICs, and DSPs. The processor 430 communicates with the ingress ports 410, the receiver unit 420, the transmitter unit 440, the egress ports 450, and the memory 460. The processor 430 includes a coding module 470 (for example an encoding module 470 or a decoding module 470). The coding module 470 implements the embodiments disclosed herein, so as to realize the methods provided by the embodiments of the present invention. For example, the coding module 470 implements, processes, or provides various coding operations. The coding module 470 therefore substantially improves the functionality of the video coding device 400 and effects the transformation of the video coding device 400 to a different state. Alternatively, the coding module 470 is implemented as instructions stored in the memory 460 and executed by the processor 430.
The memory 460 includes one or more disks, tape drives, and solid-state drives, and may be used as an overflow data storage device to store programs when such programs are selected for execution, and to store instructions and data read during program execution. The memory 460 may be volatile and/or non-volatile, and may be read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM), and/or static random access memory (SRAM).
Referring to FIG. 5, FIG. 5 is a simplified block diagram of an apparatus 500 that may be used as either or both of the source device 12 and the destination device 14 of FIG. 1, according to an exemplary embodiment. The apparatus 500 may take the form of a computing system including multiple computing devices (such as multiple computing chips or multiple servers), or the form of a single computing device such as a desktop computer, a mobile computing device, a notebook computer, a tablet computer, a set-top box, a mobile phone, a smartphone, a television, a camera, a display device, a digital media player, a video game console, a video streaming device, a broadcast receiver device, a broadcast transmitter device, an in-vehicle device, a mobile vehicle, or the like.
The processor 502 in the apparatus 500 may be a central processing unit. Alternatively, the processor 502 may be any other type of device, or multiple devices, capable of manipulating or processing information, whether existing now or developed in the future. As shown in the figure, although the disclosed implementations may be practiced with a single processor such as the processor 502, advantages in speed and efficiency may be achieved by using more than one processor.
In one implementation, the memory 504 in the apparatus 500 may be a read-only memory (Read Only Memory, ROM) device or a random access memory (random access memory, RAM) device. Any other suitable type of storage device may be used as the memory 504. The memory 504 may include code and data 506 accessed by the processor 502 using a bus 512. The memory 504 may further include an operating system 508 and application programs 510; the application programs 510 include at least one program that permits the processor 502 to perform the methods described herein. For example, the application programs 510 may include applications 1 through N, which further include a video coding application that performs the methods described herein, such as an AR/VR/MR application, a drone flight/shooting control application, an autonomous driving control application, and so on. The apparatus 500 may also include additional memory in the form of a secondary memory 514, which may for example be a memory card used with a mobile computing device. Because a video communication session may contain a large amount of information, this information may be stored in whole or in part in the secondary memory 514 and loaded into the memory 504 as needed for processing.
The apparatus 500 may also include one or more output devices, such as a display 518. In one example, the display 518 may be a touch-sensitive display that combines a display with a touch-sensitive element operable to sense touch inputs. The display 518 may be coupled to the processor 502 via the bus 512. In addition to the display 518, other output devices that permit a user to program the apparatus 500 or otherwise use it may be provided, or other output devices may be provided as an alternative to the display 518. When the output device is or includes a display, the display may be implemented in various ways, including as a liquid crystal display (liquid crystal display, LCD), a cathode-ray tube (cathode-ray tube, CRT) display, a plasma display, or a light-emitting diode (LED) display such as an organic LED (organic LED, OLED) display.
The apparatus 500 may also include, or communicate with, an image sensing device 520, for example a camera, an infrared detector, or any other image sensing device, whether existing now or developed in the future, that can sense images. The image sensing device 520 may be positioned to face directly toward the user operating the apparatus 500, or to face directly toward the external environment. In one example, the position and optical axis of the image sensing device 520 may be configured so that its field of view includes an area immediately adjacent to the display 518 from which the display 518 is visible.
When the apparatus 500 is the destination device 14, it may optionally further include a motion sensing device 522, which may be used to realize interaction between the user and the destination device. Specifically, the motion sensing device 522 may be used to detect at least one type of information such as the position/posture/linear velocity/angular velocity/acceleration of the destination device or of the user's body parts, so as to implement the tracking modes described in the embodiments of this application: head tracking, gesture tracking, eye tracking, and motion tracking. For example, when the motion sensing device 522 is used to perform the head tracking described in the embodiments of this application, it may include at least one sensor such as an accelerometer, a gyroscope, a magnetometer, an optical capture device, or an inertial sensor, to monitor in real time at least one type of information such as the rotation angle, angular velocity, angular acceleration, and rotation direction of the head of the user wearing the destination device. When the motion sensing device 522 is used to perform the gesture tracking described in the embodiments of this application, it may include at least one device such as an accelerometer, a gyroscope, a magnetometer, an inertial sensor, or an optical capture device such as an optical camera, an infrared camera, or a depth sensor, to monitor in real time at least one type of information such as the posture, shape, movement speed, and movement direction of the user's hands. When the motion sensing device 522 is used to perform the eye tracking described in the embodiments of this application, it may include at least one device such as a built-in camera, an eye tracker, an infrared controller, or an iris image detector, to monitor in real time at least one type of information such as the position, gaze direction, movement direction, and movement speed of the user's eyeballs. When the motion sensing device 522 is used to perform the motion tracking described in the embodiments of this application, it may include at least one device such as an inertial measurement unit (IMU), an accelerometer, a gyroscope, a magnetometer, a depth camera, or a SLAM (simultaneous localization and mapping) system, to monitor at least one type of information such as the speed, acceleration, direction, position, and posture of the user moving in the real environment.
Although FIG. 5 depicts the processor 502 and the memory 504 of the apparatus 500 as integrated in a single unit, other configurations may also be used. The operations of the processor 502 may be distributed across multiple directly coupled machines (each having one or more processors), or across a local area or other network. The memory 504 may be distributed across multiple machines, for example as network-based memory or as memory in multiple machines running the apparatus 500. Although only a single bus is depicted here, the bus 512 of the apparatus 500 may be formed of multiple buses. Further, the secondary memory 514 may be directly coupled to the other components of the apparatus 500 or may be accessed over a network, and may comprise a single integrated unit such as one memory card, or multiple units such as multiple memory cards. The apparatus 500 may therefore be implemented in a wide variety of configurations.
Existing VR/AR/MR technology can give users an immersive visual experience and support interaction between the user and the picture.
Referring to FIG. 6, and taking head-mounted VR glasses as an example, current VR glasses achieve immersion and interaction mainly in two ways:
First, current VR glasses typically produce an image field of vision (Field of Vision, FOV) exceeding 90 degrees (for example 90 to 120 degrees). Through magnified display technology, a magnified local virtual scene can be presented in front of the user's eyes, and within this display range, real-time three-dimensional images can be generated by a three-dimensional engine.
Second, by using data collected by head pose sensing (for example a gyroscope on the head), the three-dimensional engine responds to the direction of head rotation (and to changes in the current head position). When the person turns their head, the gyroscope notifies the image generation engine to render a new picture accordingly; the image generation engine sends the new picture back to the VR glasses, and the VR glasses update the displayed three-dimensional image in real time. In this process, the angle of the user's head rotation exactly matches the view of the three-dimensional image simulated by the three-dimensional engine, so that the user feels as if they were observing a surrounding virtual three-dimensional world through a large window. Because the user's head rotation produces a picture change in the virtual world that the user can understand, the user perceives the virtual world as responding to them; the user's action combined with the virtual world's feedback to the user thus constitutes an interaction.
In CloudVR technology, the image generation engine resides on a cloud server: game picture rendering is performed on the server side, and game interaction takes place on the VR glasses side. The server and the VR glasses are connected over a wireless network, and after rendering, the new picture is transmitted wirelessly back to the VR glasses for display. As shown in FIG. 6, after the user puts on the VR glasses, assume that the current field of view (FOV) is "view 1". When the user turns their head by a certain angle, the field of view rotates from "view 1" to "view 2". If the picture is not updated in time, the human eye may perceive a black area at the edge of "view 2", i.e. black edges. Part (1) of FIG. 7 shows a VR scene without black edges, and part (2) of FIG. 7 shows a VR scene with black edges.
Generally speaking, a certain delay exists in the wireless transmission of image information. When the user turns their head too quickly, the rate at which pictures must be updated increases, and the delay means that these pictures are not displayed on the VR glasses in time, which causes the black-edge phenomenon. In addition, because pictures are not updated in time, the VR glasses may also stutter and otherwise feel unsmooth, significantly affecting the VR gaming experience.
The devices/apparatuses described in FIG. 1 to FIG. 4 of the embodiments of this application can remedy these defects of the prior art: they can avoid black edges, stuttering, and other unsmooth phenomena at the decoding end while at the same time preserving image resolution.
The technical solution proposed by the embodiments of this application is shown in FIG. 8. The CloudVR system comprises two parts: a cloud server (corresponding to the source device 12 described in this application) and a VR device (corresponding to the destination device 14 described in this application). After the cloud server receives an instruction fed back by the VR device containing information such as posture/rotational angular velocity/acceleration, the game rendering engine renders the corresponding image (the rendering resolution can remain unchanged) and sends the image together with the posture/rotational angular velocity/acceleration information to the encoder. By evaluating information such as the rotation speed/acceleration/posture of the VR device, the encoder automatically adjusts its computational complexity (encoding parameters) and thereby its encoding delay; the image-related information can subsequently be sent to the VR device for decoding and display. Specifically, when the VR device rotates quickly, the computational complexity of the encoder is reduced, which reduces the encoding delay and hence the overall CloudVR system delay, eliminating the possibility of black edges on the picture; when the VR device rotates slowly, the computational complexity of the encoder returns to normal, and the system delay returns to normal. Because the VR device in this solution can decode and display images in real time and in a timely manner, display fluency is also guaranteed and stuttering is avoided.
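The per-frame server behavior described above might be sketched as follows; all function names, the delay figures, and the angular-velocity threshold are hypothetical placeholders rather than details from this application:

    FAST_TURN = 1.5  # rad/s, assumed threshold

    def render(pose):                            # game rendering engine stub
        return {"pose": pose, "pixels": b"..."}  # rendering resolution unchanged

    def encode(frame, low_complexity):           # encoder stub
        delay_ms = 5 if low_complexity else 12   # illustrative encoding delays
        return {"frame": frame, "low_complexity": low_complexity,
                "delay_ms": delay_ms}

    def server_step(pose, angular_velocity):
        frame = render(pose)
        fast = angular_velocity >= FAST_TURN     # fast rotation: cut encoder work
        return encode(frame, low_complexity=fast)  # sent to the VR device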
The encoding computational complexity referred to herein is jointly determined by the texture complexity and the motion complexity of the video image: the more uniform the video image and the smoother the motion, the lower the encoding computational complexity; conversely, the higher it is.
It should be noted that the embodiment shown in FIG. 8 takes a VR device as the destination device by way of example; after the user puts on the VR device, the detection of its posture/rotational angular velocity/acceleration can be realized by head tracking. In practical applications of this application, however, the information that the VR device feeds back to the cloud server is not limited to information obtained by head tracking; it may also be information obtained by gesture tracking, by eye tracking, or by motion tracking.
Referring to FIG. 9, these four tracking modes for realizing interaction between the user and the picture are described below.
(1) Head tracking
Head tracking tracks head movement by measuring the angle, angular velocity, or angular acceleration of the user's head as it turns, thereby triggering a response in the visual picture. In concrete implementations, sensors such as accelerometers, gyroscopes, magnetometers, optical capture devices, and inertial sensors can be configured inside the destination device to monitor in real time information such as the rotation angle, angular velocity, angular acceleration, and rotation direction of the head of the user wearing the destination device. The result of head tracking is that when the user wearing the destination device (for example a VR device) turns their head, the picture they see moves with the head movement, simulating the scene of the user turning their head to see a new picture and thus providing an immersive visual experience.
(2) Gesture tracking
Gesture tracking tracks hand movement by detecting the posture, shape, movement speed, movement direction, and so on of the user's hands in the real environment, thereby triggering a response in the visual picture or an interaction with picture elements. In concrete implementations, gesture tracking can be realized in two ways. One is contact-based detection, in which the user's hand is bound to a sensor (for example a data glove worn on the hand, or a device held by the user); the sensor may be an accelerometer, gyroscope, magnetometer, inertial sensor, or the like, monitoring in real time information such as the posture, shape, movement speed, and movement direction of the user's hands. The other is contactless detection, in which optical capture devices such as optical cameras, infrared cameras, and depth sensors are configured in the destination device to recognize information such as the posture, shape, movement speed, and movement direction of the user's hands. Gesture tracking allows the user to participate in and interact with the picture content directly, improving the user experience.
(3) Eye tracking
Eye tracking tracks eye movement by measuring the position of the gaze point of the user's eyes or the movement of the eyeballs relative to the head. In concrete implementations, devices such as a built-in camera, an eye tracker, an infrared controller, and an iris image detector are configured inside the destination device, and certain algorithms (for example video-based eye recording and corneal reflection methods) are used to track in real time information such as the position of the user's eyeballs, the gaze direction, the movement direction, and the movement speed. When the user wearing the destination device (for example a VR device) moves their eyes, the picture they see moves with the eye movement, simulating the scene of the user shifting their gaze to see a new picture and thus providing an immersive visual experience.
(4) Motion tracking
Motion tracking tracks the user's movement by measuring the user's position and posture (i.e. pose) in the real environment and the speed, acceleration, direction, and so on of the user's movement in the real environment. In concrete implementations, an inertial measurement unit (IMU), for example, can measure the user's translational and rotational motion in the real environment, realizing motion measurement in six degrees of freedom (6DoF). As another example, an accelerometer, gyroscope, or magnetometer can measure information such as the speed, acceleration, and direction of the user's movement in the real environment. As yet another example, a depth camera or a SLAM (simultaneous localization and mapping) system can recognize changes in the real environment, thereby determining changes in the user's own motion as well as the user's real-time position. Motion tracking can also trigger updates of the visual picture, or interactions between the user and picture elements.
It should be noted that in the embodiments of this application, the tracking mode employed may be any one of the tracking modes described above, or a combination of several of them, for example a combination of head tracking and eye tracking, or a combination of head tracking and motion tracking, and so on; this application does not limit this.
Based on the system, devices, and tracking modes described above, an information transmission method provided by an embodiment of the present invention that can be used to avoid black edges and stuttering is described below. Referring to FIG. 10, FIG. 10 is a schematic flowchart of an information transmission method provided by an embodiment of the present invention. The method is described from the perspectives of the encoding device side (also called the source device side) and the decoding device side (also called the destination device side), and includes, without being limited to, the following steps:
S101. The decoding device detects and obtains tracking information.
The tracking information is information produced by applying at least one of the following tracking modes to the decoding device: head tracking, gesture tracking, eye tracking, or motion tracking. The specifics of these tracking modes have been described above and are not repeated here. In concrete implementations, the tracking information of the decoding device includes at least one of motion information and pose information produced when the decoding device or the user's body moves, translates, or rotates; that is, it may include motion information, or pose information, or both. The motion information may include velocity (linear velocity, angular velocity, etc.) and/or acceleration (linear acceleration, angular acceleration, etc.), and the pose information may include the position and/or posture (or orientation) of the decoding device or of the user. In other words, the pose information can represent the position and posture (or orientation) of the decoding device in three-dimensional space: the position can be expressed along the three coordinate axes x, y, and z of a three-dimensional coordinate system, and the orientation can be expressed as (α, β, γ), the angles of rotation about the three coordinate axes.
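For illustration, the tracking information could be carried in a structure such as the following; the field names are assumptions, not a format defined by this application:

    from dataclasses import dataclass
    from typing import Optional, Tuple

    @dataclass
    class TrackingInfo:
        position: Optional[Tuple[float, float, float]] = None     # (x, y, z)
        orientation: Optional[Tuple[float, float, float]] = None  # (alpha, beta, gamma)
        linear_velocity: Optional[float] = None
        angular_velocity: Optional[float] = None
        acceleration: Optional[float] = None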
S102. The decoding device sends its tracking information to the encoding device; correspondingly, the encoding device receives the tracking information of the decoding device.
S103. The encoding device configures the encoding information of the image to be processed according to the tracking information of the decoding device.
Here, the image to be processed is the image that currently needs to be processed and transmitted to the decoding device side for display/interaction. The tracking information is associated with the encoding information of the image to be processed, and the encoding information includes one or more encoding parameters (or an encoding parameter set) that the encoder of the encoding device is to use when encoding the image to be processed.
These encoding parameters may include, for example, one or more of the following: an instruction for enabling or disabling the deblocking filter (deblock_filter) function, the number of reference frames (Ref), the motion estimation search range (me_range), the motion estimation method (me_method), the sub-pixel subdivision strength (subme), the lookahead optimizer parameter, and so on. They are described in turn as follows:
(1) The instruction for enabling or disabling the deblocking filter (deblock_filter) function indicates whether the deblock_filter function is activated to deblock-filter the reconstructed image. For example, deblock_filter=1 means the function is enabled and deblock_filter=0 means it is disabled. In one concrete implementation, the design may be as follows: when a specific value in the tracking information (for example a speed, acceleration, position, or posture value) is greater than or equal to a preset threshold, the decoding end is moving quickly, so "deblock_filter=0" is configured, i.e. the deblocking filter function is disabled; the encoder is thereby spared the deblocking work, which lowers the encoding complexity and hence the encoding delay, avoiding black edges, stuttering, and similar phenomena at the decoding end. Conversely, when the specific value is less than the preset threshold, "deblock_filter=1" is configured and the deblocking filter function is enabled; because the decoding end is moving slowly, the delay introduced by enabling the deblocking filter does not cause black edges or stuttering.
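A minimal sketch of this rule follows, assuming an angular-velocity threshold; the threshold value is illustrative, not one specified by this application:

    THRESHOLD = 1.5  # rad/s, assumed value

    def deblock_filter_flag(angular_velocity):
        # Fast motion: deblock_filter=0 (disabled) to save encoding time;
        # slow motion: deblock_filter=1 (enabled) for better quality.
        return 0 if angular_velocity >= THRESHOLD else 1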
(2) The number-of-reference-frames (Ref) parameter indicates the maximum number of reference frames, i.e. the number of reference frames used in image prediction. Its value range is, for example, 0 to 16: the larger the value, the more accurate the prediction and the greater the computational complexity; conversely, the smaller the value, the poorer the prediction accuracy and the smaller the computational complexity. In one concrete implementation, the design may be as follows: when a specific value in the tracking information (for example a speed, acceleration, position, or posture value) is greater than or equal to a preset threshold, the decoding end is moving quickly, so 0<Ref<2 is configured, reducing the number of reference frames used in encoding prediction and thereby the encoding complexity and the encoding delay, which avoids black edges and stuttering at the decoding end. Conversely, when the specific value is less than the preset threshold, 2≤Ref≤16 is configured; the number of reference frames in encoding prediction increases and the encoding complexity increases, but because the decoding end is moving slowly, the delay introduced by the larger number of reference frames does not cause black edges or stuttering. It should be noted that the above example merely explains, and does not limit, the solution of this application.
(3) The motion estimation search range (me_range) parameter indicates the motion estimation radius used in image prediction, i.e. the radius within which the encoder searches for a predictive match for a pixel block. For example, the motion estimation radius may range from 4 to 64: the larger the value, the larger the search range, the more accurate the prediction, and the greater the computational complexity; conversely, the smaller the value, the poorer the prediction accuracy and the smaller the computational complexity. In one concrete implementation, the design may be as follows: when a specific value in the tracking information (for example a speed, acceleration, position, or posture value) is greater than or equal to a preset threshold, the decoding end is moving quickly, so 4≤me_range≤8 is configured, reducing the motion estimation radius in encoding prediction and thereby the encoding complexity and the encoding delay, which avoids black edges and stuttering at the decoding end. Conversely, when the specific value is less than the preset threshold, 8<me_range≤64 is configured; the motion estimation radius increases and the encoding complexity increases, but because the decoding end is moving slowly, the delay introduced by the larger radius does not cause black edges or stuttering. It should be noted that the above example merely explains, and does not limit, the solution of this application.
(4) The motion estimation method (me_method) specifies the full-pixel motion estimation method. The motion estimation method involves a motion search algorithm (for example the diamond search algorithm dia, the hexagon search algorithm hex, the asymmetric cross multi-level hexagonal grid point search algorithm umh, and so on): the more complex the motion search algorithm, the more accurate the prediction and the larger the amount of computation; conversely, the simpler the algorithm, the poorer the prediction accuracy and the smaller the computational complexity.
For example, a common matching criterion for motion estimation is the rate-distortion optimization criterion, with the matching error function J = SAD + λ·R, where λ is the Lagrange constant and R represents the number of bits that may be consumed in encoding the motion vector difference. The SAD (sum of absolute differences) is computed as SAD = ∑_(x,y)∈A |s[x,y] − s′[x,y]|. The point that minimizes J is recorded as the minimum block distortion (MBD) point.
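A small worked example of this criterion is given below; the λ and R values are arbitrary illustrative numbers:

    import numpy as np

    def sad(cur, ref):
        return int(np.abs(cur.astype(np.int64) - ref.astype(np.int64)).sum())

    def rd_cost(cur, ref, mv_bits, lam):
        return sad(cur, ref) + lam * mv_bits  # J = SAD + lambda * R

    cur = np.array([[10, 12], [11, 13]])
    ref = np.array([[ 9, 12], [12, 13]])
    print(rd_cost(cur, ref, mv_bits=6, lam=4.0))  # SAD = 2, so J = 2 + 24 = 26.0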
Different motion search algorithms use different search templates and accordingly arrive at different MBD points. FIG. 11 shows some possible search templates, where the black dots in a template denote the best prediction point found at that step. The search templates listed in the figure by way of example include a small diamond template, a medium diamond template, a hexagon template, a small square template, an asymmetric cross template, a 5×5 stepwise search template, a large hexagon template, a regular octagon template, and so on. It should be noted that in concrete implementations of this application, any other possible search template may also be used, for example a full search template, a three-step search template, a four-step search template, and so on; this application does not limit this.
In one concrete implementation, for example, the design may be as follows: when a specific value in the tracking information (for example a speed, acceleration, position, or posture value) is greater than or equal to a preset threshold, the decoding end is moving quickly, so a relatively simple motion estimation method is configured, for example the diamond search algorithm dia; the search algorithm is simple and the amount of computation small, which lowers the encoding complexity and hence the encoding delay, avoiding black edges and stuttering at the decoding end. Conversely, when the specific value is less than the preset threshold, a relatively complex motion estimation method is configured, for example the hexagon search algorithm hex or the asymmetric cross multi-level hexagonal grid point search algorithm umh; the amount of computation, i.e. the encoding complexity, increases, but because the decoding end is moving slowly, the delay introduced by the more complex search algorithm does not cause black edges or stuttering. It should be noted that the above example merely explains, and does not limit, the solution of this application.
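As an illustration of the simplest of these methods, a full-pixel small-diamond search in the spirit of the dia pattern might look as follows; this is a simplified sketch under assumed inputs, not the search algorithm of any particular encoder:

    import numpy as np

    def sad(a, b):
        return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())

    def diamond_search(cur, ref, x0, y0, me_range=16):
        # Repeatedly move to the lowest-SAD neighbor of the current center
        # until the center itself is the minimum (MBD) point.
        h, w = cur.shape
        bx, by, best = x0, y0, sad(cur, ref[y0:y0 + h, x0:x0 + w])
        while True:
            candidates = []
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # small diamond
                x, y = bx + dx, by + dy
                if abs(x - x0) > me_range or abs(y - y0) > me_range:
                    continue  # stay inside the search range
                if x < 0 or y < 0 or x + w > ref.shape[1] or y + h > ref.shape[0]:
                    continue  # stay inside the reference picture
                candidates.append((sad(cur, ref[y:y + h, x:x + w]), x, y))
            if not candidates or min(candidates)[0] >= best:
                return (bx - x0, by - y0), best  # motion vector and its SAD
            best, bx, by = min(candidates)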
(5) The sub-pixel subdivision strength (subme) parameter indicates the dynamic prediction and partitioning mode. Its value range is, for example, 0 to 11: the larger the value, the more accurate the prediction and the greater the computational complexity; conversely, the smaller the value, the poorer the prediction accuracy and the smaller the computational complexity. In one concrete implementation, the design may be as follows: when a specific value in the tracking information (for example a speed, acceleration, position, or posture value) is greater than or equal to a preset threshold, the decoding end is moving quickly, so subme is configured as 0 or 1, lowering the encoding complexity and hence the encoding delay and avoiding black edges and stuttering at the decoding end. Conversely, when the specific value is less than the preset threshold, 1<subme≤11 is configured, increasing the encoding complexity; because the decoding end is moving slowly, the resulting delay does not cause black edges or stuttering. It should be noted that the above example merely explains, and does not limit, the solution of this application.
(6) The lookahead optimizer parameter sets the frame buffer size for threaded prediction. Its value range is, for example, 0 to 250: the larger the value, the more accurate the prediction and the greater the computational complexity; conversely, the smaller the value, the poorer the prediction accuracy and the smaller the computational complexity. In one concrete implementation, the design may be as follows: when a specific value in the tracking information (for example a speed, acceleration, position, or posture value) is greater than or equal to a preset threshold, the decoding end is moving quickly, so 0≤lookahead<2 is configured to shrink the frame buffer, lowering the encoding complexity and hence the encoding delay and avoiding black edges and stuttering at the decoding end. Conversely, when the specific value is less than the preset threshold, 2≤lookahead≤250 is configured to enlarge the frame buffer, increasing the encoding complexity; because the decoding end is moving slowly, the resulting delay does not cause black edges or stuttering. It should be noted that the above example merely explains, and does not limit, the solution of this application.
Based on the above analysis, the following can be summarized. In possible embodiments of this application, for convenience, when the tracking information fed back by the decoding device is greater than or equal to the preset threshold, the encoding information to which the tracking information maps may be called the first encoding information; when the tracking information is less than the preset threshold, the encoding information to which it maps may be called the second encoding information. The first encoding information and the second encoding information may then satisfy at least one of the following relationships: the deblocking filter parameter in the first encoding information indicates that the deblocking filter is disabled, while the deblocking filter parameter in the second encoding information indicates that it is enabled; the number of reference frames in the first encoding information is smaller than that in the second encoding information; the motion estimation search range in the first encoding information is smaller than that in the second encoding information; the computational cost of the motion estimation method in the first encoding information is lower than that in the second encoding information; the sub-pixel subdivision strength in the first encoding information is smaller than that in the second encoding information; and the lookahead optimizer parameter in the first encoding information is smaller than that in the second encoding information.
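These relationships can be illustrated by two hypothetical parameter presets; the concrete values are example choices within the ranges given above, not values mandated by this application:

    FIRST_ENCODING_INFO = {      # used when tracking info >= threshold
        "deblock_filter": 0,     # deblocking disabled
        "ref": 1,                # 0 < Ref < 2
        "me_range": 4,           # 4 <= me_range <= 8
        "me_method": "dia",      # cheapest search pattern
        "subme": 0,              # 0 or 1
        "lookahead": 0,          # 0 <= lookahead < 2
    }
    SECOND_ENCODING_INFO = {     # used when tracking info < threshold
        "deblock_filter": 1,     # deblocking enabled
        "ref": 4,                # 2 <= Ref <= 16
        "me_range": 24,          # 8 < me_range <= 64
        "me_method": "umh",      # more exhaustive search pattern
        "subme": 7,              # 1 < subme <= 11
        "lookahead": 40,         # 2 <= lookahead <= 250
    }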
In the embodiments of this application, the statement that the tracking information is associated with the encoding information means that there is a correspondence between the tracking information and the encoding information, and the encoding device stores this relationship.
For example, in some specific embodiments, the encoding device may store a mapping relationship between tracking information and encoding parameter sets. In this way, after receiving the tracking information, the encoding device can find the corresponding encoding parameters according to this mapping and configure them into the encoder. For example, there may be a direct mapping between the tracking information and the encoding information, i.e. the tracking information is bound to the encoding information, and the encoding information can be determined directly from the tracking information.
As another example, in some specific embodiments, the tracking information and the encoding information may be indirectly associated; for example, certain algorithmic processing of, or conditional judgments on, the tracking information may be needed to determine the corresponding encoding information. After the encoding device receives the specific tracking information uploaded by the decoding device, it can determine the corresponding encoding information from that information. Referring to FIG. 12, the encoding device may also evaluate preset conditions based on the tracking information and determine the corresponding encoding parameter set based on the result. For example, the encoding device pre-stores mapping relationships between different data intervals and encoding parameter sets; after receiving the tracking information, it can determine the data interval in which data such as the speed/acceleration/position in the tracking information falls, and then find the corresponding encoding parameter set according to that mapping.
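A sketch of such an interval-based lookup is given below, with assumed break points and preset names:

    import bisect

    BREAKPOINTS = [0.5, 1.5]     # angular-velocity break points (rad/s), assumed
    PRESETS = ["quality", "balanced", "low_delay"]

    def preset_for(angular_velocity):
        return PRESETS[bisect.bisect_right(BREAKPOINTS, angular_velocity)]

    assert preset_for(0.2) == "quality"
    assert preset_for(2.0) == "low_delay"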
It can be seen that in the embodiments of this application, after receiving in real time an instruction fed back by the decoding device containing at least one type of tracking information such as position/posture/linear velocity/angular velocity/acceleration, the encoding device can adjust the encoding information (encoding parameters) of its encoder according to the received tracking information, thereby adjusting the computational complexity of the encoder.
S104. The encoding device performs image encoding on the image to be processed according to the encoding parameters configured in S103. The encoding process may be performed by the encoder in the encoding device; the specific encoding process is not elaborated here.
S105. The encoding device encodes information indicating the encoding parameters configured in S103 into the bitstream and sends it to the decoding device.
In the embodiments of this application, because the encoding device can adjust the encoding parameters of the encoder according to the tracking information uploaded by the decoding device, information about the encoding parameters can be encoded into the bitstream to facilitate subsequent decoding by the decoding device.
可理解的是,码流中还包含了对图像编码后获得的图像信息,以便于解码设备基于该图像信息进行图像重构(解码),例如,向解码设备发送的码流中包含运动矢量差信息(MVD)、参考图像索引等,图像信息的具体内容可参考现有编码手段实现,本申请不做限定。It is understandable that the code stream also contains the image information obtained after encoding the image, so that the decoding device can reconstruct (decode) the image based on the image information. For example, the code stream sent to the decoding device contains the motion vector difference. Information (MVD), reference image index, etc., the specific content of image information can be implemented by referring to existing encoding methods, which are not limited in this application.
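Purely as an illustration of what such a per-frame payload could carry, the C++ sketch below bundles the signaled encoding parameters with the coded image data. The field layout, names, and byte order are assumptions; this application leaves the actual bitstream syntax to existing codecs.

#include <cstdint>
#include <vector>

// Illustrative per-frame payload: signaled encoding parameters followed by
// the coded image data (assumed layout, not a standardized syntax).
struct FramePayload {
    uint8_t  deblockEnabled;
    uint8_t  refFrames;
    uint16_t meRange;
    uint8_t  meMethod;
    std::vector<uint8_t> codedImage;  // MVDs, reference picture indices, residuals, ...
};

// Flatten the payload into a byte stream in field order
// (no alignment padding; little-endian assumed for the 16-bit field).
std::vector<uint8_t> serialize(const FramePayload& p) {
    std::vector<uint8_t> bs = {
        p.deblockEnabled, p.refFrames,
        static_cast<uint8_t>(p.meRange & 0xFF),
        static_cast<uint8_t>(p.meRange >> 8),
        p.meMethod,
    };
    bs.insert(bs.end(), p.codedImage.begin(), p.codedImage.end());
    return bs;
}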
S106. The decoding device parses the bitstream from the encoding device.
For example, since the encoding parameter information and the image information are written into the bitstream, it is understandable that the decoding device can obtain this information by parsing the bitstream.
S107. The decoding device performs image decoding and display according to the indication information.
It is understandable that the decoding process of the decoding device can be regarded as the inverse of encoding: the decoding device can decode the image information to reconstruct the image and display it on a display device. The decoding process can be implemented with reference to existing decoding techniques.
It should be noted that the embodiments of this application mainly describe the solution from the implementation of encoding and decoding; other image processing procedures (for example, image pre-processing and image post-processing) can be implemented with reference to existing techniques and are not detailed in this application.
It can be seen that, by implementing the embodiments of this application, after the encoding device receives in real time an instruction fed back by the decoding device containing at least one of position/attitude/linear velocity/angular velocity/acceleration, it passes this information on to the encoder. The encoder automatically adjusts its computational complexity (encoding parameters) according to the position/attitude/linear velocity/angular velocity/acceleration of the decoding device, thereby adjusting the encoding delay and in turn reducing the delay of the entire system; the image-related information and the configured encoding parameters are then sent to the VR device so that the decoding device can decode and display normally. Since the black-border phenomenon on the decoding device is closely tied to excessive system delay, and one of the main components of that delay in a CloudVR system is the encoding delay, this application reduces the system delay by reducing the computational complexity of the encoder and thus fundamentally eliminates the possibility of black borders in the picture. In this solution, the VR device can decode and display images in real time and on time, which also guarantees display fluency on the VR device and avoids stuttering; furthermore, the embodiments of this application can keep picture rendering at a favorable resolution, ensuring a good user experience.
To better understand the solutions of the embodiments of this application, a specific CloudVR scenario is used below as an example to illustrate the information transmission method provided by the embodiments of this application. Referring to FIG. 13, the method is described from the perspectives of the encoding device side (or source device side) and the decoding device side (or destination device side), respectively. The encoding device may be a cloud server, and the decoding device may be a VR head-mounted display (headset); the tracking information of the VR headset is information obtained through head tracking (for example, the rotational angular velocity and pose of the head). As shown in FIG. 13, the method includes but is not limited to the following steps:
S201. The VR headset detects its rotational angular velocity V and pose information by means of head tracking. For the specific implementation, refer to the earlier description of head tracking, which is not repeated here.
S202. The VR headset sends the rotational angular velocity V, the pose, and other information to the server.
S203. The server determines the image to be processed according to the pose information of the VR headset and renders the image.
Specifically, the server can predict the image position based on the headset pose to determine the current image to be processed, and pre-process that image; the pre-processing may include, for example, one or more of image rendering, trimming, color format conversion, color correction, or denoising. The specific content of this part can be implemented with reference to existing techniques.
S204. The server passes the rotational angular velocity V of the VR headset to its internal encoder. After obtaining the rotational angular velocity, the encoder compares V against preset thresholds so as to trigger the encoding parameter configuration function, adjust the encoding parameters, and thereby adjust the encoding complexity. Exemplarily, the preset thresholds may include T1 and T2, where T1 < T2. When V > T2, S205-1 is executed next; when T1 ≤ V ≤ T2, S205-2 is executed next; when V < T1, S205-3 is executed next.
S205-1. When V > T2, the encoder configures the first encoding parameter set.
Specifically, when the VR headset rotates very fast, exceeding T2, the encoder selects the first encoding parameter set according to the mapping relationship. The configured first encoding parameter set may include, for example, one or more of the following: turning off the encoder's deblocking (deblock) function, reducing the number of reference frames to 1, setting the motion estimation search range to 4x4, and using diamond search (dia) as the motion estimation method. This greatly reduces the encoding computational complexity and significantly reduces the encoding delay.
S205-2. When T1 ≤ V ≤ T2, the second encoding parameter set is configured.
Specifically, when the VR headset rotates relatively fast, exceeding T1 but not T2, the encoder selects the second encoding parameter set according to the mapping relationship. The configured second encoding parameter set may include, for example, one or more of the following: turning on the deblock function, increasing the number of reference frames to 2, setting the motion estimation search range to 8x8, and using the hexagonal search algorithm (hex) for motion estimation. This reduces the encoding computational complexity and the encoding delay.
S205-3. When V < T1, the third encoding parameter set is configured.
Specifically, when the VR headset rotates slowly, below T1, the encoder selects the third encoding parameter set according to the mapping relationship. The configured third encoding parameter set may include, for example, one or more of the following: turning on the deblock function, increasing the number of reference frames to 4, setting the motion estimation search range to 16x16, and using the asymmetric cross multi-hexagon search algorithm (umh) for motion estimation. In this case the encoder's computational complexity is relatively high and the encoding delay is relatively long.
In a specific implementation, exemplary implementation code is as follows:
[Implementation code listing provided as images (PCTCN2021099866-appb-000001, -000002) in the original publication; not reproduced here.]
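Because the original listing survives only as images, the following C++ sketch reconstructs the S204–S205 threshold logic described above, using the parameter values listed in S205-1 to S205-3; the type and function names are illustrative assumptions, not the application's actual code.

// Minimal sketch of the S204/S205 threshold logic (assumed names).
enum MeMethod { ME_DIA, ME_HEX, ME_UMH };

struct EncoderConfig {
    bool     deblock;    // deblocking filter on/off
    int      refFrames;  // number of reference frames
    int      meRange;    // motion estimation search range (NxN)
    MeMethod meMethod;   // dia / hex / umh
};

// v: rotational angular velocity reported by the headset; thresholds t1 < t2.
EncoderConfig configureByAngularVelocity(double v, double t1, double t2) {
    if (v > t2) {
        return {false, 1, 4, ME_DIA};   // first set: lowest complexity and delay
    } else if (v >= t1) {
        return {true, 2, 8, ME_HEX};    // second set: intermediate complexity
    } else {
        return {true, 4, 16, ME_UMH};   // third set: highest quality, highest delay
    }
}

The three return branches correspond one-to-one to S205-1, S205-2, and S205-3; the configuration returned for slow rotation is the encoder's quality-oriented default, so picture quality is only traded away when the headset moves fast.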
It should be noted that, for details on how the encoder configures encoding parameters, reference may also be made to the description of S103 in the embodiment of FIG. 10 above, which is not repeated here.
S206. The server's encoder loads the configured encoding parameters and starts encoding the image to be processed.
S207. The server's encoder writes the coded image information and the selected encoding parameters into the bitstream and sends it to the VR headset over the network. The image information may include, for example, motion vector difference (MVD) information and reference picture indices; the specific content can be implemented with reference to existing encoding techniques and is not limited in this application.
S208. The VR headset parses the bitstream.
Specifically, since the encoding parameter information and the image information are written into the bitstream, it is understandable that the decoding device can obtain this information by parsing the bitstream.
S209. The VR headset performs image decoding and display. For details, refer to the description of S107 in the embodiment of FIG. 10 above, which is not repeated here.
It should be noted that the two preset thresholds T1 and T2 in the embodiment of FIG. 13, as well as the specific encoding parameters configured in the encoder, are only used to explain the solution of this application and are not limiting. In a specific implementation, different schemes can be designed based on the technical ideas of this application. For example, FIG. 14 shows another implementation in which only a single preset threshold T is used, and the encoding parameters configured in the encoder may include turning the deblock function on or off, changing the number of constructed vertices in display rendering, and so on. Implementations in different forms, as well as variants derived from the solution of this application, all fall within the protection scope of this application.
It can be seen that, by implementing this embodiment of the present invention, the server can adjust the encoder's computational complexity and optimize the encoder parameter configuration by evaluating the rotational angular velocity information fed back by the VR headset, thereby optimizing the encoding delay and, in turn, the head-turning delay of the whole system. This significantly reduces the system delay of CloudVR and helps to mitigate or even eliminate the black-border effect during head turns. When wearing a VR headset, the faster the head turns, the less clearly the human eye perceives picture quality; and the lower the CloudVR system delay, the less likely black borders are perceived when turning the head. In this embodiment, if the headset rotates quickly, the encoder can trade some picture quality for lower encoding complexity, thereby reducing the system delay; if the headset rotates slowly, the encoder can encode with its original configuration without degrading picture quality. This guarantees low delay during fast rotation of the VR headset, fundamentally reducing the possibility of black borders, without affecting picture quality and experience when the headset is stationary or turning slowly; at the same time, this embodiment also preserves the image rendering resolution and the user experience. Existing techniques, by contrast, do not reduce the system delay and therefore introduce adverse effects such as smearing and judder.
In addition, it should be noted that, in optional embodiments of this application, after the decoder of the decoding device finishes decoding the image, the image post-processing stage (for example, the picture post-processor 32 in FIG. 1) can also be improved to further reduce the system delay and thus further avoid black borders and stuttering.
Specifically, in the image post-processing stage, the decoding device can, according to its tracking information, enable or disable one or more processing algorithms of the post-processing stage, so as to adjust the computational complexity of post-processing and reduce the likelihood of black borders or stuttering.
In some possible embodiments, a mapping relationship between the tracking information of the decoding device and one or more processing algorithms used by the picture post-processor can be preset; then, by querying this mapping relationship with the tracking information of the decoding device, the corresponding processing algorithms can be determined and configured for the picture post-processor.
The one or more processing algorithms used by the picture post-processor may include, for example, at least one of the following: a standard dynamic range (SDR) image algorithm, a high dynamic range (HDR) image algorithm, an image enhancement algorithm, an image super-resolution algorithm, and so on. The SDR image algorithm can be used to perform gamma-curve correction on images or video. The HDR image algorithm can provide greater dynamic range and more image detail, so that the image better reflects the visual effect of a real environment. Image enhancement algorithms can adjust the brightness, contrast, saturation, hue, and the like of an image to increase clarity and reduce noise. Image super-resolution algorithms can restore a low-resolution image or image sequence to a high-resolution image. It can be seen that the above one or more processing algorithms can improve the picture quality of the decoding device and thus give the user a better visual experience.
The tracking information of the decoding device may be obtained from a history cache in the decoding device's memory, or may be returned to the decoding device by the encoding device in the bitstream. The tracking information of the decoding device includes at least one of motion information and pose information of the decoding device; the motion information includes the motion speed and/or acceleration of the decoding device, where the motion speed includes angular velocity and/or linear velocity and the acceleration includes angular acceleration and/or linear acceleration; the pose information includes position information and/or attitude information of the decoding device.
For example, taking the decoding device as a VR headset and the tracking information as the rotational angular velocity V, a rotational angular velocity threshold for the VR headset can be preset on the decoding device. The mapping relationship between the tracking information and the one or more processing algorithms used by the picture post-processor can then be, for example, as follows (a code sketch follows this example):
When the rotational angular velocity V exceeds the threshold, one or more processing algorithms are disabled, for example at least one of the SDR image algorithm, HDR image algorithm, image enhancement algorithm, and image super-resolution algorithm. That is, when the VR headset moves quickly, the computational complexity of image post-processing can be reduced, which lowers the system delay and avoids black borders, stuttering, and similar effects.
When the rotational angular velocity V is below the threshold, one or more processing algorithms are enabled, for example at least one of the SDR image algorithm, HDR image algorithm, image enhancement algorithm, and image super-resolution algorithm. That is, the computational complexity of image post-processing can be increased to improve the quality of the images output by the system and give the user a better viewing experience. Since the VR headset is moving relatively slowly at this time, the resulting delay will not cause black borders or stuttering.
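As a C++ sketch of this single-threshold toggle, the code below enables or disables a set of post-processing passes; the pass names, the bit-flag layout, and the all-or-nothing policy are illustrative assumptions (an implementation could just as well disable only the costliest passes).

#include <cmath>

// Post-processing passes as bit flags (assumed names for illustration).
enum PostProcPass : unsigned {
    PP_SDR      = 1u << 0,  // gamma-curve correction
    PP_HDR      = 1u << 1,  // high-dynamic-range mapping
    PP_ENHANCE  = 1u << 2,  // brightness/contrast/denoise adjustments
    PP_SUPERRES = 1u << 3,  // super-resolution upscaling
};

// Select which passes to run given the headset's rotational angular
// velocity v and a preset threshold: fast motion -> skip costly passes.
unsigned selectPostProcPasses(double v, double threshold) {
    if (std::fabs(v) >= threshold)
        return 0u;  // fast motion: disable all passes, minimize delay
    return PP_SDR | PP_HDR | PP_ENHANCE | PP_SUPERRES;  // slow: full quality
}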
It should be noted that the above example is only used to explain the solution of this application and is not limiting. In other implementations, multiple rotational angular velocity thresholds (for example, two or more) can also be designed, providing more possible mappings between tracking information and processing algorithms to suit more diverse application scenarios.
It can be seen that, by implementing this embodiment of the present invention, the decoding device (for example, a VR headset) enables or disables one or more processing algorithms of the image post-processing stage (for example, the picture post-processor) according to tracking information from its history cache or returned in the bitstream, thereby adjusting the computational complexity, further optimizing the head-turning delay of the whole system, and significantly reducing the system delay of CloudVR. This helps to mitigate or even eliminate the black-border effect during head turns, and can also improve picture quality while guaranteeing that no black borders or stuttering occur, giving the user a better viewing experience.
The method of the embodiments of the present invention has been described in detail above; the apparatus of the embodiments of the present invention is provided below.
Referring to FIG. 15, FIG. 15 is a schematic structural diagram of a system provided by an embodiment of the present invention and of the encoding device 60 and the decoding device 70 in the system, where the encoding device 60 and the decoding device 70 can communicate wirelessly. The encoding device 60 may include a parameter adjustment module 601, an encoding module 602, a receiving module 603, and a transmitting module 604. The decoding device 70 may include a tracking module 701, a decoding module 702, a display module 703, a transmitting module 704, and a receiving module 705.
The functions of the modules of the encoding device 60 and the decoding device 70 are described as follows.
For the encoding device 60:
The receiving module 603 is configured to receive tracking information of the decoding device 70; the tracking information of the decoding device 70 includes at least one of motion information and pose information of the decoding device.
The parameter adjustment module 601 is configured to configure, according to the tracking information of the decoding device 70, encoding information of the image to be processed; the tracking information is associated with the encoding information, and the encoding information includes one or more encoding parameters used for encoding the image to be processed.
The encoding module 602 is configured to perform image encoding on the image to be processed.
The transmitting module 604 is configured to encode the encoding parameters and the coded image information into a bitstream and send it to the decoding device 70.
For the decoding device 70:
The tracking module 701 is configured to obtain tracking information by tracking the decoding device 70 in real time, the tracking information being information generated by performing at least one of the following operations on the decoding device 70: head tracking, gesture tracking, eye tracking, or motion tracking; the tracking information of the decoding device 70 includes at least one of motion information and pose information of the decoding device.
The transmitting module 704 is configured to send the tracking information of the decoding device 70 to the encoding device 60.
The receiving module 705 is configured to receive the bitstream from the encoding device 60 to obtain the encoding parameters and the coded image information.
The decoding module 702 is configured to perform image decoding according to the coded image information.
The display module 703 is configured to display the decoded image.
It should be noted that, for the specific functions of the modules of the encoding device 60 and the decoding device 70, reference may be made to the descriptions of the embodiments of FIG. 10 or FIG. 13 above. For example, for the encoding device 60, the receiving module 603 may be configured to perform the tracking information reception of S102, the parameter adjustment module 601 to perform S103, the encoding module 602 to perform S104, and the transmitting module 604 to perform the bitstream transmission of S105; for the decoding device 70, the tracking module 701 is configured to perform S101, the transmitting module 704 to perform the tracking information transmission of S102, the receiving module 705 to perform the bitstream reception of S105 and S106, the decoding module 702 to perform the image decoding of S107, and the display module 703 to perform the image display of S107. For brevity of the specification, details are not repeated here.
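Purely as an illustration of how these modules could be wired together in software, the C++ sketch below uses assumed type and method names, signals only a single deblock flag, and stubs out the actual codec; it is not part of the claimed apparatus.

#include <cstdint>
#include <vector>

using Bitstream = std::vector<uint8_t>;
struct TrackingInfo { double angularVelocity = 0.0; };

// Encoding device 60: receiving module 603 feeds parameter adjustment 601;
// encoding module 602 produces the stream sent by transmitting module 604.
class EncodingDevice {
public:
    void receiveTracking(const TrackingInfo& t) {     // modules 603 + 601
        deblock_ = (t.angularVelocity < kThreshold);  // illustrative rule
    }
    Bitstream encode(const std::vector<uint8_t>& frame) const {  // modules 602 + 604
        Bitstream bs;
        bs.push_back(deblock_ ? 1 : 0);                  // signal the parameter
        bs.insert(bs.end(), frame.begin(), frame.end()); // stand-in for coded data
        return bs;
    }
private:
    static constexpr double kThreshold = 1.0;  // assumed units: rad/s
    bool deblock_ = true;
};

// Decoding device 70: receiving module 705 recovers the signaled parameter;
// decoding module 702 and display module 703 are elided here.
class DecodingDevice {
public:
    bool parseDeblockFlag(const Bitstream& bs) const {  // module 705
        return !bs.empty() && bs[0] == 1;
    }
};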
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions; when the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one network site, computer, server, or data center to another by wired (for example, coaxial cable, optical fiber, or digital subscriber line) or wireless (for example, infrared or microwave) means. The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid-state drive), or the like.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, refer to the related descriptions of other embodiments.

Claims (17)

  1. An information transmission method, characterized in that the method is applied to an encoding device and comprises:
    receiving tracking information of a decoding device, wherein the tracking information comprises motion information or pose information of the decoding device;
    configuring encoding information of an image to be processed according to the tracking information of the decoding device, wherein the tracking information is associated with the encoding information, and the encoding information comprises one or more encoding parameters; and
    encoding the image to be processed according to the encoding information, and sending a bitstream to the decoding device, wherein the bitstream comprises the one or more encoding parameters.
  2. The method according to claim 1, characterized in that the configuring encoding information of the image to be processed according to the tracking information of the decoding device specifically comprises:
    querying a preset mapping relationship according to the tracking information to obtain the encoding information of the image to be processed, wherein the preset mapping relationship comprises a mapping from the tracking information to the encoding information; and
    configuring the encoding information.
  3. The method according to claim 1 or 2, characterized in that the one or more encoding parameters comprise one or more of a deblocking filter parameter, a number of reference frames, a motion estimation search range, a motion estimation method, a sub-pixel refinement strength, and a lookahead optimizer parameter.
  4. The method according to claim 3, characterized in that, when the tracking information is greater than or equal to a preset threshold, the tracking information maps to first encoding information, and when the tracking information is less than the preset threshold, the tracking information maps to second encoding information, wherein the first encoding information and the second encoding information satisfy at least one of the following relationships:
    the deblocking filter parameter in the first encoding information indicates that the deblocking filter is turned off, and the deblocking filter parameter in the second encoding information indicates that the deblocking filter is turned on;
    the number of reference frames in the first encoding information is smaller than the number of reference frames in the second encoding information; the motion estimation search range in the first encoding information is smaller than the motion estimation search range in the second encoding information;
    the computational load of the motion estimation method in the first encoding information is smaller than the computational load of the motion estimation method in the second encoding information;
    the sub-pixel refinement strength in the first encoding information is smaller than the sub-pixel refinement strength in the second encoding information;
    the lookahead optimizer parameter in the first encoding information is smaller than the lookahead optimizer parameter in the second encoding information.
  5. The method according to any one of claims 1-4, characterized in that the tracking information is information generated by performing at least one of the following operations on the decoding device: head tracking, gesture tracking, eye tracking, or motion tracking.
  6. The method according to any one of claims 1-5, characterized in that the motion information of the decoding device comprises a motion speed and/or an acceleration of the decoding device, the motion speed comprises an angular velocity and/or a linear velocity, and the acceleration comprises an angular acceleration and/or a linear acceleration; and the pose information comprises position information and/or attitude information.
  7. The method according to any one of claims 1-6, characterized in that the decoding device comprises one of a virtual reality (VR) device, an augmented reality (AR) device, a mixed reality (MR) device, or drone flight goggles.
  8. An apparatus for encoding an image, characterized in that the apparatus is applied to an encoding device and comprises:
    a receiving module, configured to receive tracking information of a decoding device, wherein the tracking information comprises motion information or pose information of the decoding device;
    a parameter adjustment module, configured to configure encoding information of an image to be processed according to the tracking information of the decoding device, wherein the tracking information is associated with the encoding information, and the encoding information comprises one or more encoding parameters;
    an encoding module, configured to encode the image to be processed according to the encoding information; and
    a transmitting module, configured to send a bitstream to the decoding device, wherein the bitstream comprises the one or more encoding parameters.
  9. The apparatus according to claim 8, characterized in that the parameter adjustment module is specifically configured to:
    query a preset mapping relationship according to the tracking information to obtain the encoding information of the image to be processed, wherein the preset mapping relationship comprises a mapping from the tracking information to the encoding information; and
    configure the encoding information.
  10. The apparatus according to claim 8 or 9, characterized in that the one or more encoding parameters comprise one or more of a deblocking filter parameter, a number of reference frames, a motion estimation search range, a motion estimation method, a sub-pixel refinement strength, and a lookahead optimizer parameter.
  11. The apparatus according to claim 10, characterized in that, when the tracking information is greater than or equal to a preset threshold, the tracking information maps to first encoding information, and when the tracking information is less than the preset threshold, the tracking information maps to second encoding information, wherein the first encoding information and the second encoding information satisfy at least one of the following relationships:
    the deblocking filter parameter in the first encoding information indicates that the deblocking filter is turned off, and the deblocking filter parameter in the second encoding information indicates that the deblocking filter is turned on;
    the number of reference frames in the first encoding information is smaller than the number of reference frames in the second encoding information; the motion estimation search range in the first encoding information is smaller than the motion estimation search range in the second encoding information;
    the computational load of the motion estimation method in the first encoding information is smaller than the computational load of the motion estimation method in the second encoding information;
    the sub-pixel refinement strength in the first encoding information is smaller than the sub-pixel refinement strength in the second encoding information;
    the lookahead optimizer parameter in the first encoding information is smaller than the lookahead optimizer parameter in the second encoding information.
  12. The apparatus according to any one of claims 8-11, characterized in that the tracking information is information generated by performing at least one of the following operations on the decoding device: head tracking, gesture tracking, eye tracking, or motion tracking.
  13. The apparatus according to any one of claims 8-12, characterized in that the motion information of the decoding device comprises a motion speed and/or an acceleration of the decoding device, the motion speed comprises an angular velocity and/or a linear velocity, and the acceleration comprises an angular acceleration and/or a linear acceleration.
  14. The apparatus according to any one of claims 8-13, characterized in that the decoding device comprises one of a virtual reality (VR) device, an augmented reality (AR) device, a mixed reality (MR) device, or drone flight goggles.
  15. A device for encoding an image, characterized in that the device comprises a memory, a processor, and a transceiver, wherein:
    the transceiver is configured to receive data from and send data to the outside;
    the memory is configured to store program instructions and data; and
    the processor is configured to execute the program instructions in the memory to implement the method according to any one of claims 1-7.
  16. A system, characterized in that the system comprises an encoding device and a decoding device, wherein:
    the decoding device is configured to send tracking information of the decoding device to the encoding device, wherein the tracking information comprises motion information or pose information of the decoding device;
    the encoding device is configured to: configure encoding information of an image to be processed according to the tracking information of the decoding device, wherein the tracking information is associated with the encoding information and the encoding information comprises one or more encoding parameters; encode the image to be processed according to the encoding information; and send a bitstream to the decoding device, wherein the bitstream comprises the one or more encoding parameters; and
    the decoding device is configured to perform image decoding and display according to the bitstream.
  17. A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store program code, and when the program code is executed by a computer, the computer is configured to perform the method according to any one of claims 1-7.
PCT/CN2021/099866 2020-06-12 2021-06-11 Information transmission method, related device, and system WO2021249562A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010535609.4 2020-06-12
CN202010535609.4A CN113810696A (en) 2020-06-12 2020-06-12 Information transmission method, related equipment and system

Publications (1)

Publication Number Publication Date
WO2021249562A1 true WO2021249562A1 (en) 2021-12-16

Family

ID=78846907

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/099866 WO2021249562A1 (en) 2020-06-12 2021-06-11 Information transmission method, related device, and system

Country Status (2)

Country Link
CN (1) CN113810696A (en)
WO (1) WO2021249562A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9313493B1 (en) * 2013-06-27 2016-04-12 Google Inc. Advanced motion estimation
US10291932B2 (en) * 2015-03-06 2019-05-14 Qualcomm Incorporated Method and apparatus for low complexity quarter pel generation in motion search
CN109905702B (en) * 2017-12-11 2021-12-21 腾讯科技(深圳)有限公司 Method, device and storage medium for determining reference information in video coding
CN111801944B (en) * 2018-03-26 2021-10-22 华为技术有限公司 Video image encoder, decoder and corresponding motion information encoding method
CN110944171B (en) * 2018-09-25 2023-05-09 华为技术有限公司 Image prediction method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109791605A (en) * 2016-08-01 2019-05-21 脸谱科技有限责任公司 Auto-adaptive parameter in image-region based on eyctracker information
US20180349705A1 (en) * 2017-06-02 2018-12-06 Apple Inc. Object Tracking in Multi-View Video
CN107770561A (en) * 2017-10-30 2018-03-06 河海大学 A kind of multiresolution virtual reality device screen content encryption algorithm using eye-tracking data
CN110798497A (en) * 2018-08-03 2020-02-14 中国移动通信集团有限公司 Mixed reality interaction system and method
CN111179437A (en) * 2019-12-30 2020-05-19 上海曼恒数字技术股份有限公司 Cloud VR connectionless streaming system and connection method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115103175A (en) * 2022-07-11 2022-09-23 北京字跳网络技术有限公司 Image transmission method, device, equipment and medium
CN115103175B (en) * 2022-07-11 2024-03-01 北京字跳网络技术有限公司 Image transmission method, device, equipment and medium

Also Published As

Publication number Publication date
CN113810696A (en) 2021-12-17


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21822629

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21822629

Country of ref document: EP

Kind code of ref document: A1