CN115065827A - Video encoding method, video encoding device, electronic device, and medium


Info

Publication number: CN115065827A
Application number: CN202210768613.4A
Authority: CN (China)
Legal status: Pending
Language: Chinese (zh)
Inventor: 高立鑫
Assignee (current and original): Vivo Mobile Communication Co Ltd
Prior art keywords: video frame, pose information, pixel block, target
Application filed by Vivo Mobile Communication Co Ltd
Priority: CN202210768613.4A; PCT/CN2023/102731 (published as WO2024002065A1)

Classifications

    • H: Electricity
    • H04: Electric communication technique
    • H04N: Pictorial communication, e.g. television
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50: ... using predictive coding
    • H04N 19/503: ... using predictive coding involving temporal prediction
    • H04N 19/51: Motion estimation or motion compensation
    • H04N 19/513: Processing of motion vectors
    • H04N 19/10: ... using adaptive coding
    • H04N 19/134: ... adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/136: Incoming video signal characteristics or properties
    • H04N 19/137: Motion inside a coding unit, e.g. average field, frame or block difference
    • H04N 19/139: Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
    • H04N 19/169: ... adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N 19/182: ... the unit being a pixel

Abstract

The application discloses a video encoding method, a video encoding apparatus, an electronic device, and a medium, belonging to the technical field of video processing. The method includes: acquiring relative pose information of a first video frame and a second video frame, where the first video frame is a video frame to be encoded and the second video frame is a reference video frame; determining a motion vector and residual information of a first pixel block according to the relative pose information, where the first pixel block is a pixel block to be encoded in the first video frame; and encoding the motion vector and the residual information.

Description

Video encoding method, video encoding device, electronic device, and medium
Technical Field
The present application belongs to the field of video processing technologies, and in particular, to a video encoding method, apparatus, electronic device, and medium.
Background
Generally, in video encoding, to encode a pixel block in a video frame, a global motion search algorithm may first be used to search a reference video frame for a matching block that matches the pixel block; a motion vector and a residual of the pixel block are then determined from the matching block, so that the pixel block can be encoded according to the motion vector and the residual.
However, because a matching block must be searched for in the reference video frame, finding the matching block may take a long time, and encoding the pixel block to be encoded may therefore also take a long time.
This results in a long overall time for video encoding.
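For context, the conventional approach described above can be sketched as an exhaustive block-matching search. The sketch below is illustrative only; the block size, search range, and sum-of-absolute-differences (SAD) cost are assumptions on our part, not details given in this application:

```python
import numpy as np

def full_search_match(block, ref_frame, bx, by, search_range=16):
    """Exhaustive block matching: compare `block` (top-left at (bx, by) in the
    frame being encoded) against every candidate position in ref_frame within
    the search range, and return the displacement with the lowest SAD cost."""
    h, w = block.shape
    best_cost, best_mv = np.inf, (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = bx + dx, by + dy
            if 0 <= x and 0 <= y and y + h <= ref_frame.shape[0] and x + w <= ref_frame.shape[1]:
                cand = ref_frame[y:y + h, x:x + w]
                cost = np.abs(block.astype(np.int32) - cand.astype(np.int32)).sum()
                if cost < best_cost:
                    best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost  # evaluates (2 * search_range + 1)^2 candidates per block
```

The nested loop makes the cost of this search obvious: every block to be encoded triggers hundreds of candidate comparisons, which is the time consumption the method of this application avoids.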
Disclosure of Invention
Embodiments of the present application provide a video encoding method, a video encoding apparatus, an electronic device, and a medium, which can solve the problem that video encoding is time-consuming.
In a first aspect, an embodiment of the present application provides a video encoding method. The method includes: acquiring relative pose information of a first video frame and a second video frame, where the first video frame is a video frame to be encoded and the second video frame is a reference video frame; determining a motion vector and residual information of a first pixel block according to the relative pose information, where the first pixel block is a pixel block to be encoded in the first video frame; and encoding the motion vector and the residual information.
In a second aspect, an embodiment of the present application provides a video encoding apparatus, including an acquisition module, a determination module, and an encoding module. The acquisition module is configured to acquire relative pose information of a first video frame and a second video frame, where the first video frame is a video frame to be encoded and the second video frame is a reference video frame. The determination module is configured to determine a motion vector and residual information of a first pixel block according to the relative pose information acquired by the acquisition module. The encoding module is configured to encode the motion vector and the residual information determined by the determination module.
In a third aspect, embodiments of the present application provide an electronic device, which includes a processor and a memory, where the memory stores a program or instructions executable on the processor, and the program or instructions, when executed by the processor, implement the steps of the method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the steps of the method according to the first aspect.
In a sixth aspect, embodiments of the present application provide a computer program product, stored on a storage medium, for execution by at least one processor to implement the steps of the method according to the first aspect.
In the embodiments of the present application, the electronic device may acquire relative pose information of the first video frame and the second video frame and determine, according to the relative pose information, a motion vector and residual information of a first pixel block to be encoded in the first video frame, so that the electronic device can encode the motion vector and the residual information. Because the electronic device can determine the motion vector and the residual information of the first pixel block directly from the acquired relative pose information, it does not need to first search the second video frame for a pixel block matching the first pixel block, which reduces the time consumed by video encoding.
Drawings
Fig. 1 is a schematic flowchart of a video encoding method according to an embodiment of the present application;
fig. 2 is a second flowchart of a video encoding method according to an embodiment of the present application;
fig. 3 is a third schematic flowchart of a video encoding method according to an embodiment of the present application;
fig. 4 is a fourth flowchart illustrating a video encoding method according to an embodiment of the present application;
fig. 5 is a fifth flowchart illustrating a video encoding method according to an embodiment of the present application;
fig. 6 is a sixth flowchart illustrating a video encoding method according to an embodiment of the present application;
fig. 7 is a seventh schematic flowchart of a video encoding method according to an embodiment of the present application;
fig. 8 is a schematic diagram of a mapping relationship between a first pixel block and a second pixel block provided in an embodiment of the present application;
fig. 9 is a schematic structural diagram of a video encoding apparatus according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of an electronic device provided in an embodiment of the present application;
fig. 11 is a schematic hardware structure diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly below with reference to the drawings in the embodiments of the present application. It is obvious that the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present disclosure.
The terms "first", "second", and the like in the description and claims of the present application are used to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order. It should be appreciated that the data so used may be interchanged under appropriate circumstances, so that the embodiments of the application may be practiced in sequences other than those illustrated or described herein. Moreover, "first", "second", and the like are generally used generically and do not limit the number of objects; for example, a "first" object can be one object or more than one. In addition, "and/or" in the specification and claims denotes at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.
The video encoding method, apparatus, electronic device and medium provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
In the related art, during video encoding, to encode a given pixel block in a given video frame, the size of that pixel block may be used as a sliding window to search, one by one, all pixel blocks in a reference video frame for a matching block; a motion vector and a residual of the pixel block are then determined from the matching block, so that the pixel block can be encoded accordingly. However, because all pixel blocks of the reference video frame must be searched one by one with the pixel block's size as a sliding window, the number of pixel blocks searched is large, the computation required to find the matching block is heavy, and determining the motion vector and the residual takes a long time; encoding the pixel block therefore takes a long time, resulting in long video encoding times.
In the embodiments of the present application, to encode a given pixel block in a given video frame, the electronic device may directly obtain the relative pose information of that video frame and the reference video frame, determine the motion vector and the residual information of the pixel block according to the relative pose information, and then encode the motion vector and the residual information. Because the electronic device can determine the motion vector and the residual information directly from the relative pose information of the two frames, it does not need to search all pixel blocks of the reference video frame one by one with the pixel block's size as a sliding window. This reduces the time the electronic device spends determining the motion vector and residual information, and thus the time spent encoding the pixel block, thereby reducing the overall time consumed by video encoding.
Fig. 1 shows a flowchart of a video encoding method according to an embodiment of the present application. As shown in fig. 1, a video encoding method provided in an embodiment of the present application may include steps 101 to 103 described below.
Step 101, the electronic device acquires relative pose information of a first video frame and a second video frame.
In this embodiment, the electronic device may be any of the following: a mobile phone, a tablet computer, a notebook computer, a wearable device, an extended reality (XR) device, and the like. The XR device may specifically be an XR head-mounted display device, such as XR glasses or an XR helmet.
In an embodiment of the present application, the first video frame is a video frame to be encoded, and the second video frame is a reference video frame.
The first video frame may specifically be the video frame currently being encoded by the electronic device; the second video frame may be a specified video frame, or may be the video frame preceding the first video frame.
It should be noted that the above "previous video frame of the first video frame" may be understood as: a video frame preceding the first video frame in the sequence of acquired video frames.
Optionally, in this embodiment of the application, when the electronic device starts an XR-type application, it may start a camera of the electronic device and acquire N video frames through the camera, where N is a positive integer; the electronic device then encodes the N video frames, acquiring the relative pose information whenever a first video frame is encoded. The N video frames may specifically be XR video frames.
It is to be understood that the first video frame and the second video frame may both be video frames of the N video frames.
Optionally, in this embodiment of the application, the electronic device may first obtain the pose information of the first video frame in a first coordinate system and the pose information of the second video frame in the first coordinate system, and then determine the relative pose information from these two pieces of pose information.
The electronic device may acquire pose information of the first video frame in the first coordinate system and pose information of the second video frame in the first coordinate system through the following examples.
In one example, the electronic device may compute the pose information of either video frame (the first or the second) in the first coordinate system directly from that video frame itself.
In another example, the electronic device may compute the pose information of either video frame in the first coordinate system from that video frame together with its corresponding motion information, i.e., the motion information acquired by an Inertial Measurement Unit (IMU) when the video frame was acquired.
In yet another example, the electronic device may compute the pose information of either video frame in the first coordinate system from a motion tracking image frame and the motion information corresponding to that motion tracking image frame. Note that the motion tracking image frame is time-synchronized with the video frame in question, and the motion information corresponding to the motion tracking image frame is the motion information acquired by the IMU when the motion tracking image frame was acquired.
When a user makes a video call through an XR-type application (for example, a holographic video call, a remote expert system, or telemedicine), the electronic device may acquire a motion tracking image frame through at least one motion tracking camera, such as a fisheye camera, and synchronize the timing of the motion tracking image frame with that of the video frame through a timing synchronization module; the pose information of the video frame in the first coordinate system can then be obtained from the motion tracking image frame and its corresponding motion information.
In yet another example, the electronic device may directly acquire the pose information, in the first coordinate system, of the video frame preceding either of the two video frames, and then compute the pose information of that video frame in the first coordinate system from the preceding frame's pose information and the motion information corresponding to the video frame, i.e., the motion information acquired by the IMU when the video frame was acquired.
In this embodiment of the application, in addition to acquiring the pose information of one of the two video frames in the first coordinate system, the electronic device may acquire the pose information of the other video frame (i.e., the one of the first and second video frames not yet processed) through any of the examples above.
The manner in which the electronic device acquires the pose information of one video frame in the first coordinate system may be the same as or different from the manner in which it acquires the pose information of the other.
For example, the electronic device may compute the pose information of the first video frame in the first coordinate system from the first video frame, and compute the pose information of the second video frame in the first coordinate system from the second video frame; that is, the electronic device acquires the two pieces of pose information in the same manner.
As another example, the electronic device may compute the pose information of the first video frame in the first coordinate system from the first video frame, but compute the pose information of the second video frame in the first coordinate system from the second video frame together with its corresponding motion information; that is, the electronic device acquires the two pieces of pose information in different manners.
Step 102, the electronic device determines the motion vector and residual information of the first pixel block according to the relative pose information.
In this embodiment of the present application, the first pixel block is a pixel block to be encoded in a first video frame.
It is to be understood that the first pixel block may specifically be a pixel block currently being encoded by the electronic device.
The process by which the electronic device determines the motion vector and residual information for the first pixel block will be illustrated below.
Optionally, in this embodiment of the application, as shown in fig. 2 in combination with fig. 1, the step 102 may be specifically implemented by a step 102a and a step 102b described below.
Step 102a, the electronic device determines, in the second video frame and according to the relative pose information, a second pixel block that matches the first pixel block.
In the embodiments of the present application, any pair of matched feature points (or pixel blocks) in the first video frame and the second video frame lies on each other's epipolar lines; that is, any matched pair satisfies the epipolar constraint. The electronic device can therefore determine, according to the relative pose information, the pixel block in the second video frame that matches the first pixel block.
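As an aside, the epipolar constraint mentioned above can be written as x2ᵀ E x1 = 0 with the essential matrix E = [t]× R. A minimal numpy sketch follows; the function and variable names are illustrative, since the application does not spell this formula out:

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix [t]x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def epipolar_residual(x1, x2, R, t):
    """x1, x2: a matched point pair in normalized camera coordinates
    (3-vectors with z = 1); R, t: relative rotation and translation
    between the two frames. Returns x2^T E x1 with E = [t]x R, which
    is approximately 0 for a correctly matched pair."""
    E = skew(t) @ R  # essential matrix
    return float(x2 @ E @ x1)
```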
Optionally, in this embodiment of the application, the electronic device may determine the position information of the second pixel block from the position information of the first pixel block according to the relative pose information, thereby determining the second pixel block.
The position information of the first pixel block may be: position information of the first pixel block in the first video frame; the position information of the second pixel block may be: position information of the second pixel block in the second video frame.
Alternatively, in this embodiment of the application, step 102a may be specifically implemented by the following steps 102a1 and 102a2.
Step 102a1, the electronic device determines the mapping parameters according to the relative pose information and the target internal reference matrix.
In an embodiment of the present application, the target internal reference matrix is the internal reference (intrinsic) matrix of a first camera, and the first video frame and the second video frame are acquired by the first camera; the mapping parameters indicate the mapping relationship between pixel blocks of the first video frame and pixel blocks of the second video frame.
The first camera may specifically be: a Red Green Blue (RGB) camera.
Further optionally, in this embodiment of the application, the relative pose information may include a translation vector and a rotation matrix (for example, the third translation vector and the third rotation matrix in the following embodiments), where the translation vector is the relative translation vector, in the first coordinate system, between the moment the first video frame is acquired and the moment the second video frame is acquired, and the rotation matrix is the corresponding relative rotation matrix in the first coordinate system. The electronic device can therefore calculate a fourth translation vector from the translation vector and a fourth rotation matrix from the rotation matrix using the epipolar constraint equation, obtaining the mapping parameters. The first coordinate system may be a world coordinate system.
It is to be understood that the mapping parameters include a fourth translation vector and a fourth rotation matrix.
It should be noted that, for a description of the epipolar constraint equation, reference may be made to the related art; details are not repeated herein.
Step 102a2, the electronic device calculates the position information of the second pixel block according to the position information of the first pixel block and the mapping parameter.
Further optionally, the position information of the first pixel block may specifically be: coordinate information of the first pixel block in the first video frame; the position information of the second pixel block may specifically be: coordinate information of the second block of pixels in the second video frame.
Further optionally, in this embodiment of the application, the electronic device may translate the coordinate information of the first pixel block according to a fourth translation vector, and rotate the coordinate information of the first pixel block according to a fourth rotation matrix, so as to obtain the coordinate information of the second pixel block.
Therefore, when determining the second pixel block corresponding to the first pixel block, the electronic device can determine the mapping parameters directly from the relative pose information and the target internal reference matrix, and directly calculate the position information of the second pixel block from the position information of the first pixel block using the mapping parameters, without searching the second video frame for the second pixel block. This reduces the time spent finding the second pixel block and hence the time the electronic device spends on video encoding.
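One common way to realize such a coordinate mapping is to back-project the pixel with its depth, apply the relative pose, and re-project with the camera intrinsics. The sketch below assumes a known per-pixel (or per-block) depth, which is an assumption on our part; the application only states that the coordinates are translated and rotated by the mapping parameters:

```python
import numpy as np

def map_block_position(x1, y1, depth, K, R, t):
    """Map pixel (x1, y1) of the first video frame into the second video
    frame, given the intrinsic matrix K and the relative pose (R, t),
    assuming the pixel's depth in the first frame is known."""
    p1 = np.linalg.inv(K) @ np.array([x1, y1, 1.0]) * depth  # back-project to 3D
    p2 = R @ p1 + t                                          # express in second-frame coordinates
    uvw = K @ p2                                             # project into the second frame
    return uvw[0] / uvw[2], uvw[1] / uvw[2]                  # (x2, y2)
```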
Step 102b, the electronic device determines the motion vector and residual information of the first pixel block according to the position information of the first pixel block and the second pixel block.
Further optionally, in this embodiment of the application, the electronic device may calculate a motion vector of the first pixel block according to the position information of the first pixel block and the position information of the second pixel block.
Specifically, the electronic device may determine a difference value between the coordinate information of the first pixel block and the coordinate information of the second pixel block as the motion vector of the first pixel block.
In this embodiment, the residual information is used to indicate a residual between the first pixel block and the second pixel block.
Further optionally, in this embodiment of the application, the residual information may specifically include luminance residual information and color difference residual information. The electronic device may thus determine the difference between the luminance component of the first pixel block and the luminance component of the second pixel block as the luminance residual information, and the difference between the color difference component of the first pixel block and the color difference component of the second pixel block as the color difference residual information, thereby determining the residual information.
In the embodiments of the present application, the electronic device can determine the second pixel block matching the first pixel block directly from the relative pose information, without searching the second video frame for it; the electronic device can therefore quickly determine the motion vector and residual information of the first pixel block from the position information of the first and second pixel blocks, improving the efficiency with which this information is obtained.
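Putting steps 102a and 102b together, a minimal sketch of how the motion vector and the luminance/color-difference residuals might be computed (the YUV block layout and integer types are assumptions, not specified by the application):

```python
import numpy as np

def mv_and_residuals(pos1, pos2, block1, block2):
    """pos1/pos2: (x, y) positions of the first/second pixel block in their
    respective frames; block1/block2: dicts holding the 'y' (luma) and
    'u'/'v' (chroma) samples of each block as numpy arrays."""
    motion_vector = (pos1[0] - pos2[0], pos1[1] - pos2[1])  # coordinate difference
    residuals = {c: block1[c].astype(np.int32) - block2[c].astype(np.int32)
                 for c in ('y', 'u', 'v')}  # luma and chroma residuals
    return motion_vector, residuals  # residuals are mostly zeros for similar frames
```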
Step 103, the electronic device encodes the motion vector and the residual information.
It should be noted that, for the description of the electronic device encoding the motion vector and the residual information, reference may be made to specific descriptions in related technologies, and details are not repeated herein in this embodiment of the application.
It can be understood that, because the luminance and color difference of the first video frame and the second video frame are similar, the residual information may contain many zero values; these zero values compress well when the motion vector and residual information are encoded, improving the efficiency of video compression.
Optionally, in this embodiment of the application, after the electronic device encodes the motion vector and the residual information, the electronic device may transmit the encoded video stream to a receiver of the XR-like application.
In the embodiments of the present application, when encoding the first pixel block of the first video frame, the electronic device may acquire the relative pose information and directly determine the motion vector and residual information of the first pixel block based on it, so that the motion vector and the residual information can then be encoded.
According to the video encoding method provided by the embodiments of the present application, the electronic device can acquire the relative pose information of the first video frame and the second video frame and determine from it the motion vector and residual information of the first pixel block to be encoded in the first video frame, so that the motion vector and residual information can be encoded. Because the motion vector and residual information are determined directly from the acquired relative pose information, the electronic device does not need to first search the second video frame for a pixel block matching the first pixel block, which reduces the time consumed by video encoding.
Moreover, since the electronic device does not need to search the second video frame for the second pixel block matching the first pixel block, the computing power and battery consumption of a search algorithm are avoided; and when the user makes a video call through an XR-type application (e.g., a holographic video call, a remote expert system, or telemedicine), the reduced video encoding time improves the real-time performance of the call.
A detailed description will be given below of how the electronic device acquires the relative pose information.
Optionally, in this embodiment of the application, the first video frame and the second video frame are acquired by a second camera. Specifically, with reference to fig. 1 and as shown in fig. 3, before step 101, the video encoding method provided in this embodiment of the present application may further include the following step 201, and step 101 may be specifically implemented by the following step 101a.
Step 201, the electronic device acquires first pose information and second pose information.
In this embodiment of the application, the first pose information is pose information of the first video frame in a first coordinate system; the second pose information is pose information of the second video frame in the first coordinate system.
Wherein the first coordinate system can be a world coordinate system.
Further optionally, in this embodiment of the application, the second camera may specifically be: RGB camera. This second camera can be the same camera with first camera.
The manner in which the electronic device acquires the first pose information or the second pose information is described in detail below.
Optionally, in this embodiment of the application, with reference to fig. 3, as shown in fig. 4, the step 201 may be specifically implemented by a step 201a described below.
Step 201a, the electronic device calculates to obtain first target pose information according to the image information of the first target video frame.
In an embodiment of the present application, the first target video frame is a first video frame or a second video frame.
In the embodiment of the application, under the condition that the first target video frame is the first video frame, the first target pose information is first pose information; and under the condition that the first target video frame is the second video frame, the first target pose information is the second pose information.
Further optionally, in this embodiment of the application, the image information of the first target video frame may specifically include at least one of the following: position information of the feature points in the first target video frame, and depth information of the first target video frame.
Further optionally, in this embodiment of the application, the electronic device may use a simultaneous localization and mapping (SLAM) algorithm to calculate the first target pose information according to the image information of the first target video frame.
In this way, the electronic device can compute the first pose information (or the second pose information) directly from the image information of the first video frame (or the second video frame), so that the relative pose information can be acquired quickly, further reducing the time consumed by video encoding.
Optionally, in this embodiment of the application, with reference to fig. 3, as shown in fig. 5, the step 201 may be specifically implemented by the following steps 201b to 201 d.
Step 201b, the electronic device acquires the second target video frame and the first motion information.
In an embodiment of the present application, the second target video frame is a first video frame or a second video frame.
In this embodiment of the application, the first motion information is motion information acquired by the IMU when the second camera acquires the second target video frame.
Further optionally, in this embodiment of the application, the first motion information may be at least one of: acceleration information, angular velocity information.
Further optionally, in this embodiment of the application, when the user views an XR image or performs a video call through the XR-like application, the electronic device may acquire the second target video frame through the second camera, and acquire the first motion information through the IMU.
Step 201c, the electronic device calculates to obtain third pose information according to the second target video frame, the first motion information and the first external parameter matrix.
In an embodiment of the present application, the first external parameter matrix is the external parameter matrix between the second camera and the IMU, and the third pose information is the pose information of the IMU in the first coordinate system.
Further optionally, in this embodiment of the application, a first external parameter matrix is prestored in the electronic device, so that the electronic device may directly obtain the first external parameter matrix, and calculate to obtain third pose information according to the second target video frame, the first motion information, and the first external parameter matrix.
Further optionally, in this embodiment of the application, the electronic device may calculate, by using an SLAM algorithm, third pose information according to the second target video frame, the first motion information, and the first external parameter matrix.
Step 201d, the electronic device computes the second target pose information from the third pose information and the first external parameter matrix.
In the embodiment of the application, under the condition that the second target video frame is the first video frame, the second target pose information is the first pose information; and under the condition that the second target video frame is the second video frame, the second target pose information is the second pose information.
It should be noted that, for a description of how the electronic device computes the second target pose information from the third pose information and the first external parameter matrix, reference may be made to the related art; details are not repeated here.
In this way, the electronic device can acquire the first motion information while acquiring the first video frame or the second video frame, and thus directly compute the first pose information or the second pose information, so that the relative pose information can be acquired quickly, further reducing the time consumed by video encoding.
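The conversions in steps 201c and 201d between an IMU pose and a camera pose via the first external parameter matrix amount to composing homogeneous transforms. A minimal sketch, with the 4 × 4 matrix layout and the direction of the extrinsic both being conventions we assume:

```python
import numpy as np

def camera_pose_from_imu_pose(T_world_imu, T_imu_cam):
    """T_world_imu: 4x4 pose of the IMU in the first (world) coordinate
    system; T_imu_cam: 4x4 camera-to-IMU extrinsic matrix. Composing the
    two yields the camera pose in the world coordinate system."""
    return T_world_imu @ T_imu_cam
```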
Optionally, in this embodiment of the application, with reference to fig. 3, as shown in fig. 6, the step 201 may be specifically implemented by the following steps 201e to 201 g.
Step 201e, the electronic device obtains the motion tracking image frame and the second motion information.
In an embodiment of the application, the second motion information is motion information obtained by the IMU when the at least one third camera obtains the motion tracking image frame.
Further optionally, the at least one third camera may specifically include X third cameras, where X is a positive even number. Wherein each third camera may be a motion tracking camera.
Illustratively, the X third cameras may be two third cameras, or four third cameras, and so on.
Further optionally, in this embodiment of the application, the second motion information may be at least one of: acceleration information, angular velocity information.
Further optionally, in this embodiment of the application, the motion tracking image frame may be a first motion tracking image frame or a second motion tracking image frame. When the user views XR images or makes a video call through an XR-type application, the electronic device may acquire the first video frame through the second camera, acquire the first motion tracking image frame through the at least one third camera, and acquire the second motion information through the IMU; alternatively, the electronic device may acquire the second video frame through the second camera, acquire the second motion tracking image frame through the at least one third camera, and acquire the second motion information through the IMU.
Step 201f, the electronic device computes fourth pose information from the motion tracking image frame, the second motion information, and at least one second external parameter matrix.
In an embodiment of the application, the fourth pose information is the pose information of the IMU in the first coordinate system, and the at least one second external parameter matrix is the external parameter matrix between the at least one third camera and the IMU.
Further optionally, in this embodiment of the application, the electronic device may calculate, by using a SLAM algorithm, fourth pose information according to the motion tracking image frame, the second motion information, and the at least one second external reference matrix.
Step 201g, the electronic device computes the third target pose information from the fourth pose information and the first external parameter matrix.
In an embodiment of the present application, the first external parameter matrix is the external parameter matrix between the second camera and the IMU.
In the embodiment of the application, under the condition that the third target video frame is the first video frame, the third target pose information is the first pose information; under the condition that the third target video frame is the second video frame, the third target pose information is second pose information; and the motion tracking image frame and the third target video frame are synchronized in time sequence.
It is to be understood that, in the case where the third target video frame is the first video frame, the motion tracking image frame is the first motion tracking image frame; in a case where the third target video frame is the second video frame, the motion-tracking image frame is the second motion-tracking image frame.
In this way, the electronic device can acquire the second motion information while acquiring the motion tracking image frame that is time-synchronized with the first video frame (or the second video frame), and thus directly compute the first pose information or the second pose information, so that the relative pose information can be acquired quickly, further reducing the time consumed by video encoding.
Optionally, in this embodiment of the application, with reference to fig. 3 and as shown in fig. 7, step 201 may be specifically implemented by the following steps 201h to 201j.
Step 201h, the electronic device acquires the second pose information and the third motion information.
In this embodiment of the application, the third motion information is motion information obtained by the IMU when the second camera obtains the first video frame.
Further optionally, in this embodiment of the application, the third motion information may be at least one of the following: acceleration information, angular velocity information.
In this embodiment, the second video frame is a previous video frame of the first video frame.
When the user views XR images or makes a video call through an XR-type application, the electronic device may acquire the first video frame through the second camera and acquire the third motion information through the IMU.
Step 201i, the electronic device performs integral calculation on the second pose information and the third motion information to obtain fifth pose information.
In this embodiment of the application, the fifth pose information is pose information of the IMU in the first coordinate system.
Step 201j, the electronic device computes the first pose information from the fifth pose information and the first external parameter matrix.
In an embodiment of the present application, the first external parameter matrix is the external parameter matrix between the second camera and the IMU.
It should be noted that the electronic device may also obtain the second pose information through steps analogous to steps 201h to 201j: it may acquire the pose information, in the first coordinate system, of the video frame preceding the second video frame together with fourth motion information, perform integral calculation on them to obtain the pose information of the IMU in the first coordinate system, and then compute the second pose information from that pose information and the first external parameter matrix.
In this way, because the electronic device can directly acquire the pose information of the second video frame in the first coordinate system and the third motion information acquired by the IMU when the second camera acquires the first video frame, it can directly compute the first pose information or the second pose information, so that the relative pose information can be acquired quickly, further reducing the time consumed by video encoding.
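The integral calculation of step 201i can be sketched as simple dead reckoning from the previous pose using the IMU samples. The first-order integrator below is an assumption on our part; the application does not specify the integration scheme:

```python
import numpy as np

def integrate_imu_step(p, v, R, accel, gyro, dt):
    """One dead-reckoning step. p, v: position and velocity; R: rotation
    matrix of the previous pose; accel, gyro: bias- and gravity-compensated
    IMU acceleration and angular-rate samples; dt: sample interval."""
    wx = np.array([[0.0, -gyro[2], gyro[1]],
                   [gyro[2], 0.0, -gyro[0]],
                   [-gyro[1], gyro[0], 0.0]])
    R_new = R @ (np.eye(3) + wx * dt)              # first-order rotation update
    a_world = R @ accel                            # acceleration in world frame
    p_new = p + v * dt + 0.5 * a_world * dt ** 2   # integrate position
    v_new = v + a_world * dt                       # integrate velocity
    return p_new, v_new, R_new
```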
Step 101a, the electronic device determines relative pose information according to the first pose information and the second pose information.
Further optionally, in this embodiment of the application, the electronic device may determine the relative pose information according to the first pose information and the second pose information by using a preset algorithm.
In this way, the electronic device can determine the relative pose information directly from the first pose information and the second pose information, which improves the accuracy of the acquired relative pose information and hence the accuracy with which the second pixel block corresponding to the first pixel block is determined.
The following description will be given by taking an example in which the preset algorithm includes a first preset algorithm and a second preset algorithm.
Optionally, in this embodiment of the present application, the first pose information includes a first translation vector and a first rotation matrix, where the first translation vector is the translation vector of the second camera relative to the origin of the first coordinate system when the second camera acquires the first video frame, and the first rotation matrix is the rotation matrix of the second camera relative to a first coordinate axis of the first coordinate system at that moment. The second pose information includes a second translation vector and a second rotation matrix, where the second translation vector is the translation vector of the second camera relative to the origin of the first coordinate system when the second camera acquires the second video frame, and the second rotation matrix is the rotation matrix of the second camera relative to the first coordinate axis of the first coordinate system at that moment. Specifically, step 101a may be implemented by the following steps 101a1 and 101a2.
Step 101a1, the electronic device determines a third rotation matrix according to the first rotation matrix and the second rotation matrix.
Further optionally, in this embodiment of the application, the first coordinate axis may specifically be an X axis, a Y axis, or a Z axis.
Further optionally, in this embodiment of the application, the electronic device may determine, by using a first preset algorithm, a third rotation matrix according to the first rotation matrix and the second rotation matrix.
Specifically, the first preset algorithm may be: R = R₁⁻¹R₂;
where R is the third rotation matrix, R₁ is the first rotation matrix, and R₂ is the second rotation matrix.
Step 101a2, the electronic device calculates a third translation vector according to the first translation vector, the second translation vector, and the first rotation matrix.
Further optionally, in this embodiment of the application, the electronic device may determine, by using a second preset algorithm, a third translation vector according to the first translation vector, the second translation vector, and the first rotation matrix.
Specifically, the second preset algorithm may be: t = R₁⁻¹(t₂ - t₁);
where t is the third translation vector, t₁ is the first translation vector, t₂ is the second translation vector, and R₁ is the first rotation matrix.
In an embodiment of the present application, the relative pose information includes a third rotation matrix and a third translation vector.
Therefore, the electronic device can accurately determine the third translation vector and the third rotation matrix from the first translation vector, the first rotation matrix, the second translation vector, and the second rotation matrix, improving the accuracy of the determined relative pose information.
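The two preset algorithms above translate directly into code. A minimal numpy sketch (for rotation matrices, the inverse equals the transpose):

```python
import numpy as np

def relative_pose(R1, t1, R2, t2):
    """First pose: (R1, t1); second pose: (R2, t2), both expressed in the
    first (world) coordinate system. Returns the third rotation matrix and
    the third translation vector of the relative pose information."""
    R = np.linalg.inv(R1) @ R2           # R = R1^-1 R2      (first preset algorithm)
    t = np.linalg.inv(R1) @ (t2 - t1)    # t = R1^-1 (t2-t1) (second preset algorithm)
    return R, t
```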
The manner in which the electronic device acquires the first pose information may be the same as or different from the manner in which it acquires the second pose information.
For example, the electronic device may obtain the first pose information in the manner of the first example above and obtain the second pose information in the same manner; that is, the two are acquired in the same way.
As another example, the electronic device may obtain the first pose information in the manner of the first example and the second pose information in the manner of the third example; that is, the two are acquired in different ways.
The following describes a video encoding method provided in an embodiment of the present application in two different scenarios.
Scenario one: the user views XR images through an XR-type application.
When the user views XR images through an XR-type application, the electronic device may turn on a camera of the electronic device (e.g., the first camera or the second camera) to capture N video frames, and acquire N pieces of motion information through the IMU, the N pieces of motion information corresponding one-to-one to the N video frames. Then, when encoding the first video frame of the N video frames, the electronic device may read the first external parameter matrix between the camera (i.e., the first camera or the second camera) and the IMU, compute the first pose information of the first video frame in the first coordinate system from the first video frame, its corresponding motion information, and the first external parameter matrix, and compute the second pose information of the second video frame (i.e., the video frame preceding the first video frame among the N video frames) in the first coordinate system from the second video frame, its corresponding motion information, and the first external parameter matrix. The electronic device can then determine the relative pose information of the first and second video frames from the first pose information and the second pose information. Next, the electronic device may read the target internal reference matrix of its camera (i.e., the first camera or the second camera) and determine the mapping parameters from the relative pose information and the target internal reference matrix, the mapping parameters indicating the mapping relationship between pixel blocks of the first video frame and pixel blocks of the second video frame. Finally, the electronic device may calculate the position information of the second pixel block from the position information of the first pixel block and the mapping parameters, determine the motion vector and residual information of the first pixel block from the position information of the first and second pixel blocks, and encode the motion vector and the residual information.
For example, as shown in fig. 8, assume the first pixel block is a 4 × 4 pixel block whose position information in the first video frame (e.g., I1) is (x, y). The electronic device can then calculate the position information of the second pixel block in the second video frame (e.g., I2) from the position information of the first pixel block and the mapping parameters, obtaining (x′, y′).
Scenario two: the user makes a video call through an XR-type application.
When the user makes a video call through an XR-type application, the electronic device may start a camera of the electronic device (e.g., the first camera or the second camera) to collect N video frames, start at least one third camera of the electronic device to collect N motion tracking image frames, and acquire N pieces of motion information through the IMU, the N pieces of motion information corresponding one-to-one to the N motion tracking image frames. The electronic device may synchronize the timing of each video frame with the timing of each motion tracking image frame and then read the external parameter matrix between the at least one third camera and the IMU, so that it can determine the first pose information of the first video frame in the first coordinate system based on the first motion tracking image frame time-synchronized with the first video frame, the motion information corresponding to the first motion tracking image frame, and the at least one second external parameter matrix, and determine the second pose information of the second video frame in the first coordinate system based on the second motion tracking image frame time-synchronized with the second video frame, the motion information corresponding to the second motion tracking image frame, and the at least one second external parameter matrix. The electronic device can then determine the relative pose information of the first and second video frames from the first pose information and the second pose information. Next, the electronic device may read the target internal reference matrix of its camera (i.e., the first camera or the second camera) and determine the mapping parameters from the relative pose information and the target internal reference matrix, the mapping parameters indicating the mapping relationship between pixel blocks of the first video frame and pixel blocks of the second video frame. Finally, the electronic device may calculate the position information of the second pixel block from the position information of the first pixel block and the mapping parameters, determine the motion vector and residual information of the first pixel block from the position information of the first and second pixel blocks, encode the motion vector and the residual information, and transmit the encoded video stream to the receiver of the XR-type application.
In the video encoding method provided by the embodiments of the present application, the executing body may be a video encoding apparatus. The video encoding apparatus provided by the embodiments of the present application is described below, taking as an example a video encoding apparatus that performs the video encoding method.
Fig. 9 shows a schematic diagram of a possible structure of a video encoding apparatus according to an embodiment of the present application. As shown in fig. 9, the video encoding apparatus 60 may include: an acquisition module 61, a determination module 62 and an encoding module 63.
The acquiring module 61 is configured to acquire relative pose information of a first video frame and a second video frame, where the first video frame is a video frame to be encoded and the second video frame is a reference video frame. The determining module 62 is configured to determine a motion vector and residual information of the first pixel block according to the relative pose information acquired by the acquiring module 61. The encoding module 63 is configured to encode the motion vector and residual information determined by the determining module 62.
In a possible implementation manner, the determining module 62 is specifically configured to determine, according to the relative pose information, a second pixel block in the second video frame that matches the first pixel block; and determine the motion vector and residual information of the first pixel block according to the position information of the first pixel block and the second pixel block.
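As an illustration of this step (a toy sketch under assumptions, not the embodiment's implementation), once the second pixel block's position is known from the relative pose, the motion vector can be taken as the position difference of the two blocks and the residual as their pixel-wise difference. The block size of 16 and the motion-vector sign convention are assumptions:

```python
import numpy as np

def motion_vector_and_residual(frame1, frame2, xy1, xy2, b=16):
    # xy1: position of the first pixel block in the first video frame;
    # xy2: position of the matched second pixel block in the second
    # video frame (obtained from the relative pose rather than a search).
    x1, y1 = xy1
    x2, y2 = int(round(xy2[0])), int(round(xy2[1]))
    mv = (x2 - x1, y2 - y1)                        # motion vector
    block1 = frame1[y1:y1 + b, x1:x1 + b].astype(np.int16)
    block2 = frame2[y2:y2 + b, x2:x2 + b].astype(np.int16)
    return mv, block1 - block2                     # residual

# Toy frames: frame2 is frame1 shifted 4 pixels to the right, so the
# residual is all zeros and the motion vector is (4, 0).
frame1 = np.random.randint(0, 256, (720, 1280), dtype=np.uint8)
frame2 = np.roll(frame1, 4, axis=1)
mv, residual = motion_vector_and_residual(frame1, frame2, (320, 180), (324, 180))
```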
In a possible implementation manner, the determining module 62 is specifically configured to determine the mapping parameter according to the relative pose information and a target internal reference matrix, where the target internal reference matrix is an internal reference matrix of the first camera, and the first video frame and the second video frame are acquired by the first camera; the mapping parameter is used for indicating the mapping relationship between pixel blocks of the first video frame and pixel blocks of the second video frame; and calculate the position information of the second pixel block according to the position information of the first pixel block and the mapping parameter.
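The application does not fix a formula for the mapping parameter; one classical form such a mapping can take, assumed here purely for illustration, is the plane-induced homography H = K (R + t n^T / d) K^{-1} for scene points near a plane with normal n at depth d, built from the relative pose and the target internal reference matrix:

```python
import numpy as np

def mapping_parameter(K, R_rel, t_rel, n=np.array([0.0, 0.0, 1.0]), d=2.0):
    # Plane-induced homography: an assumed, classical pixel-to-pixel
    # mapping between two views, valid for points near a plane with
    # normal n at depth d. The plane and depth values are illustrative.
    return K @ (R_rel + np.outer(t_rel, n) / d) @ np.linalg.inv(K)

def second_block_position(H, x1, y1):
    # Project the first pixel block's position through the mapping.
    p = H @ np.array([x1, y1, 1.0])
    return p[0] / p[2], p[1] / p[2]

K = np.array([[1000.0, 0.0, 640.0],   # hypothetical target internal
              [0.0, 1000.0, 360.0],   # reference matrix (intrinsics)
              [0.0, 0.0, 1.0]])
R_rel = np.eye(3)                      # toy relative pose: pure shift
t_rel = np.array([0.02, 0.0, 0.0])
H = mapping_parameter(K, R_rel, t_rel)
x2, y2 = second_block_position(H, 320.0, 180.0)  # position in frame 2
```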
In a possible implementation manner, the first video frame and the second video frame are acquired by a second camera. The acquiring module 61 is specifically configured to acquire first pose information and second pose information, where the first pose information is pose information of the first video frame in a first coordinate system, and the second pose information is pose information of the second video frame in the first coordinate system. The determining module 62 is specifically configured to determine the relative pose information according to the first pose information and the second pose information acquired by the acquiring module 61.
In a possible implementation manner, the determining module 62 is further configured to calculate first target pose information according to image information of the first target video frame. Under the condition that the first target video frame is the first video frame, the first target pose information is the first pose information; and under the condition that the first target video frame is the second video frame, the first target pose information is the second pose information.
In a possible implementation manner, the acquiring module 61 is further configured to acquire a second target video frame and first motion information, where the first motion information is motion information obtained by the IMU when the second camera acquires the second target video frame. The determining module 62 is further configured to calculate third pose information according to the second target video frame, the first motion information, and a first external parameter matrix acquired by the acquiring module 61, where the third pose information is pose information of the IMU in the first coordinate system, and the first external parameter matrix is an external parameter matrix between the second camera and the IMU; and calculate second target pose information according to the third pose information and the first external parameter matrix. Under the condition that the second target video frame is the first video frame, the second target pose information is the first pose information; and under the condition that the second target video frame is the second video frame, the second target pose information is the second pose information.
In a possible implementation manner, the acquiring module 61 is further configured to acquire a motion tracking image frame and second motion information, where the second motion information is motion information obtained by the IMU when the at least one third camera acquires the motion tracking image frame. The determining module 62 is further configured to calculate fourth pose information according to the motion tracking image frame, the second motion information, and at least one second external parameter matrix acquired by the acquiring module 61, where the fourth pose information is pose information of the IMU in the first coordinate system, and the at least one second external parameter matrix is an external parameter matrix between the at least one third camera and the IMU; and calculate third target pose information according to the fourth pose information and a first external parameter matrix, where the first external parameter matrix is an external parameter matrix between the second camera and the IMU. Under the condition that the third target video frame is the first video frame, the third target pose information is the first pose information; under the condition that the third target video frame is the second video frame, the third target pose information is the second pose information; and the motion tracking image frame and the third target video frame are time-synchronized.
In a possible implementation manner, the acquiring module 61 is further configured to acquire second pose information and third motion information, where the third motion information is motion information obtained by the IMU when the second camera acquires the first video frame. The determining module 62 is further configured to perform integral calculation according to the second pose information and the third motion information acquired by the acquiring module 61 to obtain fifth pose information, where the fifth pose information is pose information of the IMU in the first coordinate system; and calculate the first pose information according to the fifth pose information and a first external parameter matrix, where the first external parameter matrix is an external parameter matrix between the second camera and the IMU.
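As a bare-bones illustration of the integral calculation (translation only; a real pipeline would also integrate gyroscope data, subtract gravity and model sensor bias, all omitted here), one might write:

```python
import numpy as np

def integrate_position(p0, v0, accel_samples, dt):
    # Euler-integrate accelerometer samples starting from the position
    # and velocity at the second video frame to approximate the position
    # at the first video frame. Gravity, bias, and rotation are ignored;
    # this is a sketch of the idea, not a usable IMU propagator.
    p, v = np.array(p0, dtype=float), np.array(v0, dtype=float)
    for a in accel_samples:
        v = v + np.asarray(a) * dt
        p = p + v * dt
    return p

# 200 samples of 0.1 m/s^2 along x over one second (dt = 5 ms).
p1 = integrate_position([0, 0, 0], [0, 0, 0],
                        [[0.1, 0.0, 0.0]] * 200, dt=0.005)
```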
In one possible implementation, the first pose information includes a first translation vector and a first rotation matrix, and the second pose information includes a second translation vector and a second rotation matrix. The determining module 62 is specifically configured to calculate a third rotation matrix according to the first rotation matrix and the second rotation matrix; and calculate a third translation vector according to the first translation vector, the second translation vector and the first rotation matrix. The relative pose information includes the third rotation matrix and the third translation vector. The first translation vector is a translation vector of the second camera relative to the origin of the first coordinate system when the first video frame is acquired, and the first rotation matrix is a rotation matrix of the second camera relative to the first coordinate axis of the first coordinate system when the first video frame is acquired; the second translation vector is a translation vector of the second camera relative to the origin of the first coordinate system when the second video frame is acquired, and the second rotation matrix is a rotation matrix of the second camera relative to the first coordinate axis of the first coordinate system when the second video frame is acquired.
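For reference, under one common convention (camera-to-world poses, i.e., a camera point satisfies x_world = R_i x_cam + t_i, which is an assumption), the third rotation matrix depends on the first and second rotation matrices and the third translation vector on the two translation vectors and the first rotation matrix, exactly the dependence stated above:

```python
import numpy as np

def relative_pose(R1, t1, R2, t2):
    # Third rotation matrix and third translation vector, assuming
    # camera-to-world poses: expresses the second camera pose in the
    # first camera's frame. The convention is an illustrative choice.
    R3 = R1.T @ R2
    t3 = R1.T @ (t2 - t1)
    return R3, t3

# Toy usage: the second pose is rotated 10 degrees about Y and shifted
# 5 cm along X relative to the first.
theta = np.deg2rad(10.0)
R1, t1 = np.eye(3), np.zeros(3)
R2 = np.array([[np.cos(theta), 0.0, np.sin(theta)],
               [0.0, 1.0, 0.0],
               [-np.sin(theta), 0.0, np.cos(theta)]])
t2 = np.array([0.05, 0.0, 0.0])
R3, t3 = relative_pose(R1, t1, R2, t2)
```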
According to the video encoding apparatus provided by the embodiment of the present application, the video encoding apparatus can determine the motion vector and residual information of the first pixel block directly according to the acquired relative pose information, without the need to search the second video frame for a pixel block matching the first pixel block before determining the motion vector and residual information, so that the time consumed by the video encoding apparatus in determining the motion vector and residual information of the first pixel block can be reduced, thereby reducing the time consumed in encoding the first pixel block and, in turn, the time consumed in video encoding.
The video encoding apparatus in the embodiment of the present application may be an electronic device, and may also be a component in the electronic device, such as an integrated circuit or a chip. The electronic device may be a terminal, or may be a device other than a terminal. The electronic device may be, for example, a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted electronic device, a Mobile Internet Device (MID), an Augmented Reality (AR)/Virtual Reality (VR) device, a robot, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), and may also be a server, a Network Attached Storage (NAS), a Personal Computer (PC), a Television (TV), an automated teller machine, a self-service machine, or the like; the embodiments of the present application are not specifically limited thereto.
The video encoding apparatus in the embodiment of the present application may be an apparatus having an operating system. The operating system may be an Android operating system (Android), an iOS operating system, or other possible operating systems, which is not specifically limited in the embodiments of the present application.
The video encoding apparatus provided in the embodiment of the present application can implement each process implemented by the method embodiments of fig. 1 to fig. 8, and is not described herein again to avoid repetition.
Optionally, as shown in fig. 10, an embodiment of the present application further provides an electronic device 70, including a processor 71 and a memory 72, where the memory 72 stores a program or an instruction that can be executed on the processor 71. When the program or the instruction is executed by the processor 71, each process of the above video encoding method embodiment is implemented and the same technical effect can be achieved; details are not repeated here to avoid repetition.
It should be noted that the electronic devices in the embodiments of the present application include the mobile electronic device and the non-mobile electronic device described above.
Fig. 11 is a schematic hardware structure diagram of an electronic device implementing the embodiment of the present application.
The electronic device 1100 includes, but is not limited to: a radio frequency unit 1101, a network module 1102, an audio output unit 1103, an input unit 1104, a sensor 1105, a display unit 1106, a user input unit 1107, an interface unit 1108, a memory 1109, a processor 1110, and the like.
Those skilled in the art will appreciate that the electronic device 1100 may further include a power source (e.g., a battery) for supplying power to the various components, and the power source may be logically connected to the processor 1110 via a power management system, so as to manage charging, discharging, and power consumption via the power management system. The electronic device structure shown in fig. 11 does not constitute a limitation of the electronic device; the electronic device may include more or fewer components than those shown, combine some components, or arrange components differently, which is not repeated here.
The processor 1110 is configured to acquire relative pose information of a first video frame and a second video frame, where the first video frame is a video frame to be encoded, and the second video frame is a reference video frame; determine a motion vector and residual information of a first pixel block according to the relative pose information, where the first pixel block is a pixel block to be encoded in the first video frame; and encode the motion vector and the residual information.
According to the electronic device provided by the embodiment of the present application, the electronic device can determine the motion vector and residual information of the first pixel block directly according to the acquired relative pose information, without needing to search the second video frame for a pixel block matching the first pixel block, so that the time consumed by the electronic device in determining the motion vector and residual information of the first pixel block can be reduced, thereby reducing the time consumed in encoding the first pixel block and, in turn, the time consumed in video encoding.
Optionally, in this embodiment of the present application, the processor 1110 is specifically configured to determine, according to the relative pose information, a second pixel block in the second video frame that matches the first pixel block; and determine the motion vector and residual information of the first pixel block according to the position information of the first pixel block and the second pixel block.
Therefore, the electronic device can determine the second pixel block matching the first pixel block in the second video frame directly according to the relative pose information, without searching the second video frame for it, so that the motion vector and residual information of the first pixel block can be determined quickly from the position information of the first pixel block and the second pixel block, improving the efficiency of acquiring the motion vector and residual information of the first pixel block.
Optionally, in this embodiment of the present application, the processor 1110 is specifically configured to determine a mapping parameter according to the relative pose information and a target internal reference matrix, where the target internal reference matrix is an internal reference matrix of the first camera, and the first video frame and the second video frame are acquired by the first camera; the mapping parameter is used for indicating the mapping relationship between pixel blocks of the first video frame and pixel blocks of the second video frame; and calculate the position information of the second pixel block according to the position information of the first pixel block and the mapping parameter.
Therefore, when determining the second pixel block corresponding to the first pixel block, the electronic device can determine the mapping parameter directly according to the relative pose information and the target internal reference matrix, and directly calculate the position information of the second pixel block from the position information of the first pixel block using the mapping parameter, without searching the second video frame for a pixel block matching the first pixel block, so that the time consumed in searching for the second pixel block can be reduced, and hence the time consumed by the electronic device in video encoding can be reduced.
Optionally, in this embodiment of the application, the first video frame and the second video frame are acquired by a second camera.
The processor 1110 is further configured to obtain first pose information and second pose information, where the first pose information is pose information of the first video frame in the first coordinate system, and the second pose information is pose information of the second video frame in the first coordinate system.
Processor 1110 is specifically configured to determine relative pose information according to the first pose information and the second pose information.
Therefore, the electronic device can determine the relative pose information directly according to the first pose information and the second pose information, which improves the accuracy of acquiring the relative pose information and, in turn, the accuracy of determining the second pixel block corresponding to the first pixel block.
Optionally, in this embodiment of the application, the processor 1110 is further configured to calculate first target pose information according to the image information of the first target video frame.
Under the condition that the first target video frame is the first video frame, the first target pose information is first pose information; and under the condition that the first target video frame is the second video frame, the first target pose information is the second pose information.
Therefore, the electronic device can calculate the first pose information (or the second pose information) directly according to the image information of the first video frame (or the second video frame), so that the electronic device can acquire the relative pose information quickly, further reducing the time consumed in video encoding.
Optionally, in this embodiment of the present application, the processor 1110 is further configured to obtain a second target video frame and first motion information, where the first motion information is motion information obtained by the IMU when the second camera obtains the second target video frame; calculate third pose information according to the second target video frame, the first motion information and the first external parameter matrix, where the third pose information is pose information of the IMU in the first coordinate system, and the first external parameter matrix is an external parameter matrix between the second camera and the IMU; and calculate second target pose information according to the third pose information and the first external parameter matrix.
Under the condition that the second target video frame is the first video frame, the second target pose information is first pose information; and under the condition that the second target video frame is the second video frame, the second target pose information is the second pose information.
Therefore, the electronic device can acquire the first motion information when acquiring the first video frame or the second video frame, so that the electronic device can directly calculate the first pose information or the second pose information, acquire the relative pose information quickly, and further reduce the time consumed in video encoding.
Optionally, in this embodiment of the application, the processor 1110 is further configured to acquire a motion tracking image frame and second motion information, where the second motion information is motion information acquired by the IMU when the at least one third camera acquires the motion tracking image frame; calculate fourth pose information according to the motion tracking image frame, the second motion information and at least one second external parameter matrix, where the fourth pose information is pose information of the IMU in the first coordinate system, and the at least one second external parameter matrix is an external parameter matrix between the at least one third camera and the IMU; and calculate third target pose information according to the fourth pose information and a first external parameter matrix, where the first external parameter matrix is an external parameter matrix between the second camera and the IMU.
Under the condition that the third target video frame is the first video frame, the third target pose information is first pose information; under the condition that the third target video frame is the second video frame, the third target pose information is second pose information; the motion tracking image frame and the third target video frame are synchronized in time sequence.
Therefore, the electronic device can acquire the second motion information when acquiring a motion tracking image frame that is time-synchronized with the first video frame (or the second video frame), so that the electronic device can directly calculate the first pose information or the second pose information, acquire the relative pose information quickly, and further reduce the time consumed in video encoding.
Optionally, in this embodiment of the application, the processor 1110 is further configured to obtain second pose information and third motion information, where the third motion information is motion information obtained by the IMU when the second camera obtains the first video frame; perform integral calculation according to the second pose information and the third motion information to obtain fifth pose information, where the fifth pose information is pose information of the IMU in the first coordinate system; and calculate the first pose information according to the fifth pose information and a first external parameter matrix, where the first external parameter matrix is an external parameter matrix between the second camera and the IMU.
Therefore, since the electronic device can directly acquire the pose information of the second video frame in the first coordinate system and the third motion information acquired by the IMU when the second camera acquires the first video frame, the electronic device can directly calculate the first pose information, acquire the relative pose information quickly, and further reduce the time consumed in video encoding.
Optionally, in this embodiment of the application, the first pose information includes a first translation vector and a first rotation matrix, and the second pose information includes a second translation vector and a second rotation matrix.
The processor 1110 is specifically configured to calculate a third rotation matrix according to the first rotation matrix and the second rotation matrix; and calculate a third translation vector according to the first translation vector, the second translation vector and the first rotation matrix.
The relative pose information comprises a third rotation matrix and a third translation vector; the first translation vector is a translation vector of the second camera relative to the origin of the first coordinate system when the first video frame is acquired, and the first rotation matrix is a rotation matrix of the second camera relative to the first coordinate axis of the first coordinate system when the first video frame is acquired; the second translation vector is a translation vector of the second camera relative to the origin of the first coordinate system when the second camera acquires the second video frame, and the second rotation matrix is a rotation matrix of the second camera relative to the first coordinate axis of the first coordinate system when the second camera acquires the second video frame.
Therefore, the electronic device can accurately determine the third translation vector and the third rotation matrix according to the first translation vector, the first rotation matrix, the second translation vector and the second rotation matrix, improving the accuracy of determining the relative pose information.
It should be understood that, in the embodiment of the present application, the input unit 1104 may include a Graphics Processing Unit (GPU) 11041 and a microphone 11042, and the GPU 11041 processes image data of still pictures or videos obtained by an image capturing device (such as a camera) in a video capturing mode or an image capturing mode. The display unit 1106 may include a display panel 11061, and the display panel 11061 may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 1107 includes at least one of a touch panel 11071, also called a touch screen, and other input devices 11072. The touch panel 11071 may include two parts: a touch detection device and a touch controller. Other input devices 11072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail here.
The memory 1109 may be used to store software programs as well as various data. The memory 1109 may mainly include a first storage area storing programs or instructions and a second storage area storing data, where the first storage area may store an operating system, and an application program or instruction (such as a sound playing function or an image playing function) required for at least one function, and the like. Further, the memory 1109 may include a volatile memory or a nonvolatile memory, or both. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), a static RAM (SRAM), a dynamic RAM (DRAM), a synchronous DRAM (SDRAM), a double data rate SDRAM (DDR SDRAM), an enhanced SDRAM (ESDRAM), a synchlink DRAM (SLDRAM), or a direct Rambus RAM (DRRAM). The memory 1109 in the embodiments of the present application includes, but is not limited to, these and any other suitable types of memory.
Processor 1110 may include one or more processing units; optionally, the processor 1110 integrates an application processor, which primarily handles operations related to the operating system, user interface, applications, etc., and a modem processor, which primarily handles wireless communication signals, such as a baseband processor. It will be appreciated that the modem processor described above may not be integrated into processor 1110.
The embodiments of the present application further provide a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements each process of the above-mentioned video encoding method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement each process of the above video encoding method embodiment, and can achieve the same technical effect, and the details are not repeated here to avoid repetition.
It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-a-chip.
Embodiments of the present application provide a computer program product, where the program product is stored in a storage medium, and the program product is executed by at least one processor to implement the processes of the above-mentioned video encoding method embodiments, and can achieve the same technical effects, and in order to avoid repetition, details are not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may also include performing the functions in a substantially simultaneous manner or in a reverse order depending on the functions involved; for example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a computer software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the present embodiments are not limited to those precise embodiments, which are intended to be illustrative rather than restrictive, and that various changes and modifications may be effected therein by one skilled in the art without departing from the scope of the appended claims.

Claims (20)

1. A method of video encoding, the method comprising:
acquiring relative pose information of a first video frame and a second video frame, wherein the first video frame is a video frame to be encoded, and the second video frame is a reference video frame;
determining a motion vector and residual information of a first pixel block according to the relative pose information, wherein the first pixel block is a pixel block to be encoded in the first video frame;
encoding the motion vector and the residual information.
2. The method of claim 1, wherein the determining a motion vector and residual information of the first pixel block according to the relative pose information comprises:
determining a second pixel block matched with the first pixel block in the second video frame according to the relative pose information;
and determining the motion vector and residual information of the first pixel block according to the position information of the first pixel block and the second pixel block.
3. The method according to claim 2, wherein the determining a second pixel block in the second video frame that matches the first pixel block according to the relative pose information comprises:
determining a mapping parameter according to the relative pose information and a target internal reference matrix, wherein the target internal reference matrix is an internal reference matrix of a first camera, and the first video frame and the second video frame are acquired by the first camera; the mapping parameter is used for indicating a mapping relationship between pixel blocks of the first video frame and pixel blocks of the second video frame;
and calculating the position information of the second pixel block according to the position information of the first pixel block and the mapping parameter.
4. The method of claim 1, wherein the first video frame and the second video frame are acquired by a second camera, and the method further comprises:
acquiring first pose information and second pose information, wherein the first pose information is pose information of the first video frame in a first coordinate system, and the second pose information is pose information of the second video frame in the first coordinate system;
wherein the acquiring relative pose information of the first video frame and the second video frame comprises:
and determining the relative pose information according to the first pose information and the second pose information.
5. The method of claim 4, further comprising:
calculating first target pose information according to image information of the first target video frame;
wherein, when the first target video frame is the first video frame, the first target pose information is the first pose information;
and under the condition that the first target video frame is the second video frame, the first target pose information is the second pose information.
6. The method of claim 4, further comprising:
acquiring a second target video frame and first motion information, wherein the first motion information is motion information acquired by an inertial measurement unit (IMU) when the second camera acquires the second target video frame;
calculating third pose information according to the second target video frame, the first motion information and a first external parameter matrix, wherein the third pose information is pose information of the IMU in the first coordinate system, and the first external parameter matrix is an external parameter matrix between the second camera and the IMU;
calculating second target pose information according to the third pose information and the first external parameter matrix;
wherein, when the second target video frame is the first video frame, the second target pose information is the first pose information;
and under the condition that the second target video frame is the second video frame, the second target pose information is the second pose information.
7. The method of claim 4, further comprising:
acquiring a motion tracking image frame and second motion information, wherein the second motion information is the motion information acquired by the IMU when at least one third camera acquires the motion tracking image frame;
calculating fourth pose information according to the motion tracking image frame, the second motion information and at least one second external parameter matrix, wherein the fourth pose information is pose information of the IMU in the first coordinate system, and the at least one second external parameter matrix is an external parameter matrix between the at least one third camera and the IMU;
calculating third target pose information according to the fourth pose information and a first external parameter matrix, wherein the first external parameter matrix is an external parameter matrix between the second camera and the IMU;
wherein, in a case that a third target video frame is the first video frame, the third target pose information is the first pose information;
in a case that the third target video frame is the second video frame, the third target pose information is the second pose information;
and the motion tracking image frame is time-synchronized with the third target video frame.
8. The method of claim 4, further comprising:
acquiring second pose information and third motion information, wherein the third motion information is motion information acquired by the IMU when the second camera acquires the first video frame;
performing integral calculation according to the second pose information and the third motion information to obtain fifth pose information, wherein the fifth pose information is pose information of the IMU in the first coordinate system;
and calculating first pose information according to the fifth pose information and a first external parameter matrix, wherein the first external parameter matrix is an external parameter matrix between the second camera and the IMU.
9. The method of claim 4, wherein the first pose information comprises a first translation vector and a first rotation matrix, and wherein the second pose information comprises a second translation vector and a second rotation matrix;
wherein the determining the relative pose information according to the first pose information and the second pose information comprises:
calculating a third rotation matrix according to the first rotation matrix and the second rotation matrix;
calculating a third translation vector according to the first translation vector, the second translation vector and the first rotation matrix;
wherein the relative pose information includes the third rotation matrix and the third translation vector;
the first translation vector is a translation vector of the second camera relative to the origin of the first coordinate system when the first video frame is acquired, and the first rotation matrix is a rotation matrix of the second camera relative to a first coordinate axis of the first coordinate system when the first video frame is acquired;
the second translation vector is a translation vector of the second camera relative to the origin of the first coordinate system when the second video frame is acquired, and the second rotation matrix is a rotation matrix of the second camera relative to the first coordinate axis of the first coordinate system when the second video frame is acquired.
10. A video encoding apparatus, characterized in that the video encoding apparatus comprises: an acquiring module, a determining module and an encoding module;
the acquiring module is configured to acquire relative pose information of a first video frame and a second video frame, wherein the first video frame is a video frame to be encoded, and the second video frame is a reference video frame;
the determining module is configured to determine a motion vector and residual information of a first pixel block according to the relative pose information acquired by the acquiring module, wherein the first pixel block is a pixel block to be encoded in the first video frame;
and the encoding module is configured to encode the motion vector and the residual information determined by the determining module.
11. The video encoding apparatus according to claim 10, wherein the determining module is specifically configured to determine, according to the relative pose information, a second pixel block in the second video frame that matches the first pixel block; and determine the motion vector and residual information of the first pixel block according to the position information of the first pixel block and the second pixel block.
12. The video encoding apparatus according to claim 11, wherein the determining module is specifically configured to determine the mapping parameter according to the relative pose information and a target internal reference matrix, wherein the target internal reference matrix is an internal reference matrix of a first camera, and the first video frame and the second video frame are acquired by the first camera; the mapping parameter is used for indicating a mapping relationship between pixel blocks of the first video frame and pixel blocks of the second video frame; and calculate the position information of the second pixel block according to the position information of the first pixel block and the mapping parameter.
13. The video encoding apparatus of claim 10, wherein the first video frame and the second video frame are acquired by a second camera;
the acquiring module is specifically configured to acquire first pose information and second pose information, where the first pose information is pose information of the first video frame in a first coordinate system, and the second pose information is pose information of the second video frame in the first coordinate system;
the determining module is specifically configured to determine the relative pose information according to the first pose information and the second pose information acquired by the acquiring module.
14. The video encoding apparatus of claim 13, wherein the determining module is further configured to calculate first target pose information according to image information of the first target video frame;
wherein, when the first target video frame is the first video frame, the first target pose information is the first pose information;
and under the condition that the first target video frame is the second video frame, the first target pose information is the second pose information.
15. The video encoding apparatus of claim 13, wherein the acquiring module is further configured to acquire a second target video frame and first motion information, wherein the first motion information is motion information acquired by an inertial measurement unit (IMU) when the second camera acquires the second target video frame;
the determining module is further configured to calculate third pose information according to the second target video frame and the first motion information acquired by the acquiring module and a first external parameter matrix, wherein the third pose information is pose information of the IMU in the first coordinate system, and the first external parameter matrix is an external parameter matrix between the second camera and the IMU; and calculate second target pose information according to the third pose information and the first external parameter matrix;
wherein, when the second target video frame is the first video frame, the second target pose information is the first pose information;
and under the condition that the second target video frame is the second video frame, the second target pose information is the second pose information.
16. The video encoding apparatus of claim 13, wherein the acquiring module is further configured to acquire a motion tracking image frame and second motion information, wherein the second motion information is motion information acquired by the IMU when the at least one third camera acquires the motion tracking image frame;
the determining module is further configured to calculate fourth pose information according to the motion tracking image frame, the second motion information, and at least one second external parameter matrix acquired by the acquiring module, wherein the fourth pose information is pose information of the IMU in the first coordinate system, and the at least one second external parameter matrix is an external parameter matrix between the at least one third camera and the IMU; and calculate third target pose information according to the fourth pose information and a first external parameter matrix, wherein the first external parameter matrix is an external parameter matrix between the second camera and the IMU;
wherein, in a case that a third target video frame is the first video frame, the third target pose information is the first pose information;
in a case that the third target video frame is the second video frame, the third target pose information is the second pose information;
and the motion tracking image frame is time-synchronized with the third target video frame.
17. The video encoding apparatus of claim 13, wherein the acquiring module is further configured to acquire second pose information and third motion information, wherein the third motion information is motion information acquired by the IMU when the second camera acquires the first video frame;
the determining module is further configured to perform integral calculation according to the second pose information and the third motion information acquired by the acquiring module to obtain fifth pose information, wherein the fifth pose information is pose information of the IMU in the first coordinate system; and calculate first pose information according to the fifth pose information and a first external parameter matrix, wherein the first external parameter matrix is an external parameter matrix between the second camera and the IMU.
18. The video encoding apparatus of claim 13, wherein the first pose information comprises a first translation vector and a first rotation matrix, and the second pose information comprises a second translation vector and a second rotation matrix;
the determining module is specifically configured to calculate a third rotation matrix according to the first rotation matrix and the second rotation matrix; and calculate a third translation vector according to the first translation vector, the second translation vector and the first rotation matrix;
wherein the relative pose information includes the third rotation matrix and the third translation vector;
the first translation vector is a translation vector of the second camera relative to the origin of the first coordinate system when the first video frame is acquired, and the first rotation matrix is a rotation matrix of the second camera relative to a first coordinate axis of the first coordinate system when the first video frame is acquired;
the second translation vector is a translation vector of the second camera relative to the origin of the first coordinate system when the second video frame is acquired, and the second rotation matrix is a rotation matrix of the second camera relative to the first coordinate axis of the first coordinate system when the second video frame is acquired.
19. An electronic device, characterized in that it comprises a processor and a memory, said memory storing a program or instructions executable on said processor, said program or instructions, when executed by said processor, implementing the steps of the video coding method according to any one of claims 1 to 9.
20. A readable storage medium, on which a program or instructions are stored which, when executed by a processor, implement the steps of the video encoding method according to any one of claims 1 to 9.
CN202210768613.4A 2022-06-30 2022-06-30 Video encoding method, video encoding device, electronic device, and medium Pending CN115065827A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210768613.4A CN115065827A (en) 2022-06-30 2022-06-30 Video encoding method, video encoding device, electronic device, and medium
PCT/CN2023/102731 WO2024002065A1 (en) 2022-06-30 2023-06-27 Video encoding method and apparatus, electronic device, and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210768613.4A CN115065827A (en) 2022-06-30 2022-06-30 Video encoding method, video encoding device, electronic device, and medium

Publications (1)

Publication Number Publication Date
CN115065827A true CN115065827A (en) 2022-09-16

Family

ID=83203396

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210768613.4A Pending CN115065827A (en) 2022-06-30 2022-06-30 Video encoding method, video encoding device, electronic device, and medium

Country Status (2)

Country Link
CN (1) CN115065827A (en)
WO (1) WO2024002065A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024002065A1 (en) * 2022-06-30 2024-01-04 维沃移动通信有限公司 Video encoding method and apparatus, electronic device, and medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112219087A (en) * 2019-08-30 2021-01-12 深圳市大疆创新科技有限公司 Pose prediction method, map construction method, movable platform and storage medium
CN111583350A (en) * 2020-05-29 2020-08-25 联想(北京)有限公司 Image processing method, device and system and server
WO2022072242A1 (en) * 2020-10-01 2022-04-07 Qualcomm Incorporated Coding video data using pose information of a user
CN115065827A (en) * 2022-06-30 2022-09-16 维沃移动通信有限公司 Video encoding method, video encoding device, electronic device, and medium


Also Published As

Publication number Publication date
WO2024002065A1 (en) 2024-01-04


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination