CN116760986B - Candidate motion vector generation method, candidate motion vector generation device, computer equipment and storage medium - Google Patents


Info

Publication number
CN116760986B
CN116760986B (Application CN202311064903.1A)
Authority
CN
China
Prior art keywords
reference frame
pixel region
encoded
frame
motion vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311064903.1A
Other languages
Chinese (zh)
Other versions
CN116760986A (en)
Inventor
冷龙韬
许诗燊
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202311064903.1A
Publication of CN116760986A
Application granted
Publication of CN116760986B
Active legal status
Anticipated expiration


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136Incoming video signal characteristics or properties
    • H04N19/137Motion inside a coding unit, e.g. average field, frame or block difference
    • H04N19/139Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/12Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
    • H04N19/122Selection of transform size, e.g. 8x8 or 2x4x8 DCT; Selection of sub-band transforms of varying structure or type
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/182Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/513Processing of motion vectors
    • H04N19/517Processing of motion vectors by encoding
    • H04N19/52Processing of motion vectors by encoding by predictive encoding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/96Tree coding, e.g. quad-tree coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Discrete Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present application relates to a candidate motion vector generation method, apparatus, computer device, storage medium, and computer program product, applicable to the field of cloud technology. The method comprises: obtaining a candidate reference frame corresponding to a pixel region to be encoded in a video frame; invoking a target filter, where the target filter is generated based on the reference frame identifiers corresponding to encoded pixel regions in the video frame that are associated with the pixel region to be encoded; performing a hash operation on the frame identifier of the candidate reference frame to obtain a frame identifier hash value; determining, based on the frame identifier hash value and the target filter, a matching result between the frame identifier of the candidate reference frame and the reference frame identifiers corresponding to the encoded pixel regions; and determining a candidate motion vector of the pixel region to be encoded based on the matching result and the motion vectors of the encoded pixel regions. The method improves the efficiency of candidate motion vector generation.

Description

Candidate motion vector generation method, candidate motion vector generation device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a candidate motion vector generating method, apparatus, computer device, and storage medium.
Background
Currently, in video coding, video is typically compressed efficiently by predicting pixel motion between the current frame and a reference frame using motion-compensated inter-frame prediction. This prediction yields a parameter called a motion vector, which represents the movement of a pixel block from frame to frame. To perform motion compensation more accurately, the encoder typically generates a series of candidate motion vectors representing possible movements of the pixel block.
However, existing candidate motion vector generation methods are generally based on heuristic algorithms, such as selecting the motion vectors of surrounding encoded pixel regions. When there are many candidate reference frames, such methods require a large number of frame-sequence matching computations, resulting in low candidate motion vector generation efficiency.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a candidate motion vector generation method, apparatus, computer device, and storage medium that can improve the candidate motion vector generation efficiency.
In a first aspect, the present application provides a candidate motion vector generation method. The method comprises the following steps:
Acquiring candidate reference frames corresponding to pixel areas to be coded in a video frame;
invoking a target filter; the target filter is generated based on a reference frame identification corresponding to an encoded pixel region in the video frame, the encoded pixel region being an encoded pixel region associated with the pixel region to be encoded;
performing hash operation on the frame identifiers of the candidate reference frames to obtain frame identifier hash values;
determining a matching result between the frame identification of the candidate reference frame and the reference frame identification corresponding to the encoded pixel region based on the frame identification hash value and the target filter;
and determining candidate motion vectors of the pixel region to be encoded based on the matching result and the motion vectors of the encoded pixel region.
In a second aspect, the application further provides a candidate motion vector generation device. The device comprises:
the candidate reference frame acquisition module is used for acquiring candidate reference frames corresponding to pixel areas to be coded in the video frames;
the filter calling module is used for calling the target filter; the target filter is generated based on a reference frame identification corresponding to an encoded pixel region in the video frame, the encoded pixel region being an encoded pixel region associated with the pixel region to be encoded;
the hash value calculation module is used for performing a hash operation on the frame identifiers of the candidate reference frames to obtain frame identifier hash values;
a matching result determining module, configured to determine a matching result between the frame identifier of the candidate reference frame and the reference frame identifier corresponding to the encoded pixel region based on the frame identifier hash value and the target filter;
and the candidate motion vector generation module is used for determining the candidate motion vector of the pixel region to be encoded based on the matching result and the motion vector of the encoded pixel region.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:
acquiring candidate reference frames corresponding to pixel areas to be coded in a video frame;
invoking a target filter; the target filter is generated based on a reference frame identification corresponding to an encoded pixel region in the video frame, the encoded pixel region being an encoded pixel region associated with the pixel region to be encoded;
performing hash operation on the frame identifiers of the candidate reference frames to obtain frame identifier hash values;
determining a matching result between the frame identification of the candidate reference frame and the reference frame identification corresponding to the encoded pixel region based on the frame identification hash value and the target filter;
and determining candidate motion vectors of the pixel region to be encoded based on the matching result and the motion vectors of the encoded pixel region.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring candidate reference frames corresponding to pixel areas to be coded in a video frame;
invoking a target filter; the target filter is generated based on a reference frame identification corresponding to an encoded pixel region in the video frame, the encoded pixel region being an encoded pixel region associated with the pixel region to be encoded;
performing hash operation on the frame identifiers of the candidate reference frames to obtain frame identifier hash values;
determining a matching result between the frame identification of the candidate reference frame and the reference frame identification corresponding to the encoded pixel region based on the frame identification hash value and the target filter;
and determining candidate motion vectors of the pixel region to be encoded based on the matching result and the motion vectors of the encoded pixel region.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of:
acquiring candidate reference frames corresponding to pixel areas to be coded in a video frame;
invoking a target filter; the target filter is generated based on a reference frame identification corresponding to an encoded pixel region in the video frame, the encoded pixel region being an encoded pixel region associated with the pixel region to be encoded;
performing hash operation on the frame identifiers of the candidate reference frames to obtain frame identifier hash values;
determining a matching result between the frame identification of the candidate reference frame and the reference frame identification corresponding to the encoded pixel region based on the frame identification hash value and the target filter;
and determining candidate motion vectors of the pixel region to be encoded based on the matching result and the motion vectors of the encoded pixel region.
In the candidate motion vector generation method, apparatus, computer device, storage medium, and computer program product above, a target filter is generated in advance based on the reference frame identifiers corresponding to the encoded pixel regions associated with the pixel region to be encoded. After the candidate reference frame corresponding to the pixel region to be encoded in the video frame is acquired, the target filter can be invoked directly, and a hash operation is performed on the frame identifier of the candidate reference frame to obtain a frame identifier hash value. The matching result between the frame identifier of the candidate reference frame and the reference frame identifiers corresponding to the encoded pixel regions can therefore be determined rapidly based on the frame identifier hash value and the target filter, and the candidate motion vector of the pixel region to be encoded is then determined based on the matching result and the motion vectors of the encoded pixel regions. This improves the generation efficiency of candidate motion vectors for the pixel region to be encoded. Especially when a large number of reference frames must be processed, this benefits the efficiency and effect of video encoding, saves computing resources, and improves video compression quality and encoding speed.
Drawings
FIG. 1 is a diagram of an application environment for a candidate motion vector generation method in one embodiment;
FIG. 2 is a flow diagram of a candidate motion vector generation method in one embodiment;
FIG. 3 is a schematic diagram of a portion of a pixel region of a video frame in one embodiment;
FIG. 4 is a schematic diagram of motion vectors in one embodiment;
FIG. 5 is a schematic diagram of generating candidate motion vectors in one embodiment;
FIG. 6 is a schematic diagram of an initial filter in one embodiment;
FIG. 7 is a schematic diagram of a target filter in one embodiment;
FIG. 8 is a flow diagram of a candidate motion vector generation method in one embodiment;
FIG. 9 is a flow diagram of a target filter build step in one embodiment;
FIG. 10 is a flow chart of candidate motion vector generation steps in one embodiment;
FIG. 11 is a schematic diagram of an application scenario of a candidate motion vector generation method in one embodiment;
fig. 12 is a schematic view of an application scenario of a candidate motion vector generation method according to another embodiment;
FIG. 13 is a block diagram of a candidate motion vector generation device in one embodiment;
FIG. 14 is a block diagram of a candidate motion vector generation device in one embodiment;
fig. 15 is an internal structural view of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The candidate motion vector generation method provided by the embodiments of the present application can be applied to the field of cloud technology. Cloud technology refers to a hosting technology that unifies a series of resources such as hardware, software, and networks in a wide area network or a local area network to realize the calculation, storage, processing, and sharing of data.
Cloud technology is a general term for the network technology, information technology, integration technology, management platform technology, application technology, and the like that are applied based on the cloud computing business model. These resources can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing technology is becoming an important support: the background services of technical network systems, such as video websites, picture websites, and other portal websites, require a large amount of computing and storage resources. With the rapid development of the internet industry, every item may have its own identification mark in the future, which will need to be transmitted to a background system for logical processing; data of different levels will be processed separately, and all kinds of industry data require strong backend system support, which can only be realized through cloud computing.
Cloud computing is a computing model that distributes computing tasks over a resource pool formed by a large number of computers, enabling various application systems to obtain computing power, storage space, and information services as needed. The network that provides the resources is called the "cloud". From the user's perspective, resources in the cloud can be expanded without limit, acquired at any time, used on demand, expanded at any time, and paid for according to use.
As a basic capability provider of cloud computing, a cloud computing resource pool (cloud platform for short, generally called an IaaS (Infrastructure as a Service) platform) is established, in which multiple types of virtual resources are deployed for external clients to select and use.
The candidate motion vector generation method provided by the embodiments of the present application can be applied to the application scenario shown in fig. 1. The scenario includes at least a terminal 102, a terminal 106, and a server 104, where the terminal 102 and the terminal 106 are communicatively connected to the server 104 through a wired or wireless network. The candidate motion vector generation method can be executed on a terminal or a server. Taking the terminal 102 as an example: the terminal 102 obtains a candidate reference frame corresponding to a pixel region to be encoded in a video frame; invokes a target filter, the target filter being generated based on the reference frame identifiers corresponding to the encoded pixel regions in the video frame that are associated with the pixel region to be encoded; performs a hash operation on the frame identifier of the candidate reference frame to obtain a frame identifier hash value; determines a matching result between the frame identifier of the candidate reference frame and the reference frame identifiers corresponding to the encoded pixel regions based on the frame identifier hash value and the target filter; and determines the candidate motion vector of the pixel region to be encoded based on the matching result and the motion vectors of the encoded pixel regions.
The terminal may be, but is not limited to, a desktop computer, notebook computer, smartphone, tablet computer, internet-of-things device, portable wearable device, or network device. An internet-of-things device may be a smart speaker, smart television, smart air conditioner, smart vehicle-mounted device, or the like. A portable wearable device may be a smart watch, smart bracelet, headset, or the like. A network device may be a router, switch, firewall, load balancer, network storage device, network adapter, or the like.
The server 104 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, basic cloud computing services such as big data and artificial intelligence platforms, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the present application is not limited herein.
In one embodiment, as shown in fig. 2, a candidate motion vector generation method is provided, and the method is applied to the terminal in fig. 1 for illustration, and includes the following steps:
S202, obtaining candidate reference frames corresponding to pixel areas to be encoded in the video frames.
A video is a frame sequence formed by a plurality of video frames. Video frames are also known as image frames; when a video frame is being video encoded, it may be referred to as an encoded frame.
The pixel region to be encoded may also be referred to as a unit to be encoded. When video encoding is performed, a video frame is divided into a plurality of small pixel regions, each of which may be referred to as a coding unit; a pixel region that has completed encoding may be referred to as an encoded unit, and a pixel region not yet encoded may be referred to as a unit to be encoded. For example, in HEVC, Coding Tree Units (CTUs) are used as the units for encoding, so video frames in a video sequence are divided into CTUs of uniform size during encoding, and a CTU may be further divided into smaller Coding Units (CUs) according to a quadtree structure. CTUs come in four sizes: 64×64, 32×32, 16×16, and 8×8. For example, 32×32 indicates that the corresponding CTU is 32 pixels wide and 32 pixels high, i.e., 32×32 pixels.
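The CTU partitioning described above can be sketched as follows. This is an illustrative Python sketch, not part of the claimed method; the function name and the clipping behavior at frame edges are assumptions:

```python
def split_into_ctus(frame_width, frame_height, ctu_size=64):
    """Divide a frame into CTU rectangles, given as (x, y, width, height).

    Edge CTUs are clipped when the frame dimensions are not exact
    multiples of ctu_size (an assumed, simplified handling).
    """
    ctus = []
    for y in range(0, frame_height, ctu_size):
        for x in range(0, frame_width, ctu_size):
            w = min(ctu_size, frame_width - x)
            h = min(ctu_size, frame_height - y)
            ctus.append((x, y, w, h))
    return ctus
```

For a 128×128 frame with 64×64 CTUs this yields four full-size CTUs; a 100×64 frame yields one full CTU and one clipped 36×64 CTU.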
The candidate reference frames are a series of image frames used for motion-compensated prediction, in particular for predicting the current pixel region to be encoded. Depending on whether the encoding scheme supports bi-directional prediction, they typically include past frames (also called forward reference frames) and possibly future frames (also called backward reference frames). For example, a B frame (bi-predictive frame) may use frames in both directions as candidate reference frames.
It should be noted that, in the embodiments of the present application, obtaining a candidate reference frame may specifically mean obtaining the frame identifier of the candidate reference frame. A frame identifier is a unique identifier used to identify each video frame in video coding. Each video frame has an associated frame identifier, which is used to manage the processing of the frame during encoding and decoding, including prediction, referencing, encoding, and decoding; for example, during motion compensation, a reference frame can be found through its frame identifier. In the embodiments of the present application, the frame identifier may be a POC (Picture Order Count) value. POC is a frame-ordering representation used in video coding, mainly expressing the difference between decoding order and display order, and is widely used in many video coding standards, such as H.264 and H.265. For candidate reference frames, frames closer to the current video frame in POC order are generally selected, because in many cases a frame closer in time to the current frame (closer in POC order) has higher image-content similarity with it and is therefore more suitable as a reference frame for motion compensation.
Specifically, the terminal acquires a preset reference frame range, acquires the video frames within that range from the video to which the current video frame belongs, and screens candidate reference frames from those video frames based on candidate screening conditions.
The reference frame range may be a time range or a number range. The time range refers to a certain period of time around the current video frame; for example, if the time range is set to 5 seconds, all video frames within 5 seconds of the current video frame may be selected as candidate reference frames. The number range refers to a specific number of reference frames, i.e., a fixed number of frames before or after the current video frame that may be selected as candidate reference frames; for example, if the number range is set to 5, the encoder may consider the 5 frames before and the 5 frames after the current frame as possible candidate reference frames.
Candidate screening conditions are used to screen candidate reference frames from the video frames that meet the reference frame range. A screening condition may specifically be a temporal similarity condition or a content similarity condition. The temporal similarity condition typically selects candidate reference frames based on the time interval between video frames, for example selecting the several frames closest in time to the current video frame to be encoded, since video frames closer in time are usually closer in content. The content similarity condition selects candidate reference frames according to the content similarity between video frames, for example by computing pixel differences, Structural Similarity (SSIM), or other image quality indexes between the video frame to be encoded and possible reference frames to evaluate their content similarity, and then selecting the frames with the most similar content as candidate reference frames.
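The screening step above can be sketched in Python for the temporal similarity condition, using POC distance as the concrete measure of temporal closeness. This is an illustrative sketch; the function name, the distance measure, and the candidate count are assumptions, not the patent's concrete parameters:

```python
def select_candidate_reference_frames(current_poc, available_pocs, max_candidates=5):
    """Pick the frames closest to the current frame in POC order.

    Assumes (as the description above does) that frames nearer in POC
    are more likely to be similar in content.
    """
    return sorted(available_pocs, key=lambda poc: abs(poc - current_poc))[:max_candidates]
```

For example, with a current POC of 10 and available POCs [1, 8, 9, 11, 12, 30], the three closest candidates are 9, 11, and 8.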
S204, calling a target filter.
The target filter is generated based on the reference frame identifiers corresponding to the encoded pixel regions in the video frame, where an encoded pixel region is one associated with the pixel region to be encoded. That is, the target filter maps and stores the reference frame identifiers of the encoded pixel regions, so that it can detect whether another frame identifier matches a reference frame identifier of the encoded pixel regions.
It will be appreciated that during video encoding, a video frame is divided into a number of pixel regions that are encoded individually. At any point in time, some of them have already been encoded (encoded pixel regions) while others are waiting to be encoded (pixel regions to be encoded); the latter are typically the next targets to be encoded. The target filter for the current pixel region to be encoded is generated based on the reference frame identifiers of its neighboring encoded pixel regions.
Referring to the partial pixel region of the video frame shown in fig. 3, where the region P is the current pixel region to be encoded, and the regions A0, A1, B2, B1 and B0 are the associated encoded pixel regions of the region P, the target filter corresponding to the region P is generated based on the reference frame identifiers corresponding to the regions A0, A1, B2, B1 and B0, respectively, where the regions A0, A1, B2, B1 and B0 are all encoded by adopting the inter-frame prediction encoding mode.
Specifically, the terminal may obtain an encoded pixel region associated with a pixel region to be encoded in a video frame, obtain reference frame identifiers corresponding to the encoded pixel regions respectively, and construct a target filter of a current encoded pixel region based on the reference frame identifiers for calling.
It should be noted that the rectangular frames in fig. 3 are used only to represent the positional relationship between the different regions, not their actual sizes, and that the reference frame identifiers of regions A0, A1, B2, B1, and B0 may be the same as or different from one another.
S206, carrying out hash operation on the frame identification of the candidate reference frame to obtain a frame identification hash value.
A hash operation (hashing) transforms an input of arbitrary length (also called the pre-image) into an output of fixed length through a hash algorithm; this output is the hash value. In the embodiments of the present application, the hash operation is applied to the frame identifiers of the candidate reference frames, so that each frame identifier is converted into a corresponding hash value.
Specifically, after obtaining the frame identifiers of the candidate reference frames, the terminal may sequentially perform hash operation on the frame identifiers of the candidate reference frames through a hash function to obtain the frame identifier hash values corresponding to the candidate reference frames. Wherein the hash function may be a function corresponding to the target filter.
S208, determining a matching result between the frame identification of the candidate reference frame and the reference frame identification corresponding to the encoded pixel region based on the frame identification hash value and the target filter.
The matching result may be that, among the reference frame identifiers of the encoded pixel regions, there is a target reference frame identifier matching the frame identifier of the candidate reference frame, or that no such matching target reference frame identifier exists.
Specifically, after obtaining the frame identification hash value of the candidate reference frame, the terminal obtains a value to be matched of a target position corresponding to the frame identification hash value in the target filter, and determines a matching result between the frame identification of the candidate reference frame and the reference frame identification corresponding to the encoded pixel region based on the value to be matched.
S210, determining candidate motion vectors of the pixel region to be encoded based on the matching result and the motion vectors of the encoded pixel region.
Wherein the motion vector is used to describe the positional difference between a particular region in one video frame and the corresponding region in another video frame (the reference frame). As shown in fig. 4, the region indicated by the solid rectangular frame in video frame 1 is the encoded pixel region m, and video frame 2 is the reference frame corresponding to the encoded pixel region m. The region indicated by the solid rectangular frame in video frame 2 is the region n (the best match) corresponding to the encoded pixel region m, and the vector pointing from the position corresponding to the encoded pixel region m to the position corresponding to the region n is the motion vector of the encoded pixel region.
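As a minimal numeric sketch of this definition (the coordinate values and the (x, y) tuple format are illustrative assumptions, e.g. top-left pixel coordinates of each region):

```python
def motion_vector(encoded_region_pos, best_match_pos):
    """The motion vector is the displacement from the encoded region's
    position in the current frame to the best-matching region's position
    in the reference frame (both given as (x, y) coordinates)."""
    return (best_match_pos[0] - encoded_region_pos[0],
            best_match_pos[1] - encoded_region_pos[1])

# Region m at (16, 16) in video frame 1, best match n at (20, 10) in the
# reference frame: the motion vector points from m to n.
mv = motion_vector((16, 16), (20, 10))
```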
Specifically, after obtaining a matching result corresponding to the reference frame identifier, the terminal determines a corresponding motion vector processing mode according to the matching result, and processes the motion vector of the encoded pixel region according to the determined motion vector processing mode to obtain a candidate motion vector of the pixel region to be encoded.
The motion vector processing mode comprises a motion vector inheritance mode and a motion vector scaling processing mode, wherein the motion vector inheritance mode is that a pixel region to be encoded directly inherits a motion vector of a reference frame corresponding to the encoded pixel region as a candidate motion vector; the motion vector scaling processing mode refers to scaling the motion vector of the reference frame corresponding to the encoded pixel region to obtain a candidate motion vector of the pixel region to be encoded.
For example, in fig. 3, if the frame identifier of a certain candidate reference frame of the pixel region to be encoded is the same as the reference frame identifier of the encoded pixel region A0, the motion vector of the region A0 is directly inherited as a candidate motion vector of the pixel region to be encoded, and the motion vectors of the region A1, the region B2, the region B1 and the region B0 are scaled to obtain 4 further candidate motion vectors.
Furthermore, it should be noted that the current pixel region to be encoded may correspond to a plurality of candidate reference frames, and each candidate reference frame needs to perform the above steps S204 to S210 to generate a corresponding candidate motion vector.
Referring to fig. 5, Cur frame represents the current video frame, in which region P is the current pixel region to be encoded, and region B2 and region B0 are two encoded pixel regions corresponding to region P. The reference frame corresponding to the encoded region B2 is Ref frame1, the reference frame corresponding to the encoded region B0 is Ref frame3, and the candidate reference frames corresponding to region P are Ref frame1, Ref frame2 and Ref frame3. When determining the candidate motion vectors of region P: for the candidate reference frame Ref frame1, performing the above steps S204 to S210 allows the motion vector of the encoded region B2 to be directly determined as candidate motion vector 1 of region P, while the motion vector of region B0 is scaled to obtain scaled motion vector 2, which is determined as candidate motion vector 2 of region P. For the candidate reference frame Ref frame2, performing steps S204 to S210 scales the motion vector of the encoded region B2 to obtain scaled motion vector 3, determined as candidate motion vector 3 of region P, and scales the motion vector of the encoded region B0 to obtain scaled motion vector 4, determined as candidate motion vector 4 of region P. For the candidate reference frame Ref frame3, performing steps S204 to S210 scales the motion vector of the encoded region B2 to obtain scaled motion vector 5, determined as candidate motion vector 5 of region P, and directly determines the motion vector of the encoded region B0 as candidate motion vector 6 of region P.
In the above embodiment, the terminal generates the target filter in advance based on the reference frame identifiers corresponding to the encoded pixel regions associated with the pixel region to be encoded; after obtaining a candidate reference frame corresponding to the pixel region to be encoded in the video frame, the terminal directly invokes the target filter and performs a hash operation on the frame identifier of the candidate reference frame to obtain a frame identifier hash value. The matching result between the frame identifier of the candidate reference frame and the reference frame identifiers corresponding to the encoded pixel regions can therefore be rapidly determined based on the frame identifier hash value and the target filter, and the candidate motion vector of the pixel region to be encoded is then determined based on the matching result and the motion vectors of the encoded pixel regions. This improves the generation efficiency of the candidate motion vectors of the pixel region to be encoded, which benefits the efficiency and effect of video encoding, particularly when a large number of reference frames are processed, saving computing resources and improving video compression quality and encoding speed.
In one embodiment, the candidate motion vector generating method further includes a process of generating a target filter, and the process specifically includes the following steps: acquiring a reference frame identifier corresponding to the encoded pixel region; carrying out hash operation on the reference frame identifier based on a hash function corresponding to the initial filter to obtain a hash value of the reference frame identifier; determining a frame identifier insertion position corresponding to the hash value of the reference frame identifier in the initial filter; and adjusting the numerical value corresponding to the frame identification insertion position in the initial filter to obtain a target filter.
The filter is a data structure, which includes a vector formed by binary bits (0 or 1) and a corresponding mapping function, the size of the binary vector can be determined according to requirements, for example, the binary vector can be set to 10 bits, the values of all bits are set to 0 at the beginning, and the filter at the beginning is the initial filter. The mapping function may be a hash function for mapping the reference frame identification to a position in the binary vector.
Specifically, for any one of the encoded pixel regions whose reference frame identifier is to be added to the initial filter, a hash operation is performed on the reference frame identifier through the hash function corresponding to the initial filter to obtain at least one reference frame identifier hash value. Each reference frame identifier hash value corresponds to one position in the binary vector of the initial filter, and this position is the frame identifier insertion position; the value at the frame identifier insertion position in the binary vector is therefore set to 1, so that the reference frame identifier is inserted into the initial filter. Through the above processing, all the reference frame identifiers of the encoded pixel regions are inserted into the initial filter.
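The insertion procedure above can be sketched as follows (a simplified model; the 10-bit vector size and the single modulo hash are illustrative assumptions taken from the later examples in this description):

```python
def build_target_filter(reference_frame_ids, num_bits=10):
    """Build the target filter: a binary vector in which, for each
    reference frame identifier of the encoded pixel regions, the value
    at the frame identifier insertion position is set to 1."""
    bits = [0] * num_bits                 # initial filter: all positions 0
    for frame_id in reference_frame_ids:
        insert_pos = frame_id % num_bits  # hash value = insertion position
        bits[insert_pos] = 1              # adjust the value at that position
    return bits
```

For example, inserting the POC values 23, 15 and 48 used in the misjudgment example later in this description sets bits 3, 5 and 8 of a 10-bit filter.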
In one embodiment, the number of hash functions corresponding to the filters is 1, the hash functions are modulo functions, the modulus of the modulo functions is the same as the number of bits of the target filters, the terminal performs hash operation on the reference frame identifier based on the hash function corresponding to the initial filter, and the process of obtaining the hash value of the reference frame identifier includes the following steps: and performing modular operation on the reference frame identifier based on a modular function corresponding to the initial filter to obtain a hash value of the reference frame identifier.
Wherein the modulus of the modulo function is the same size as the number of bits of the binary vector of the filter, to ensure that the reference frame identifier can be mapped into the range of the binary vector; for example, if the binary vector of the filter has R bits, the modulus can be set to R, so that any incoming reference frame identifier is mapped into the range 0 to R-1. In the embodiment of the present application, R may be set to 53, and the corresponding binary vector contains 53 positions from 0 to 52.
Referring to fig. 6, which is a schematic diagram of an initial filter in one embodiment, the binary vector of the initial filter contains 10 bits numbered 0 to 9, and at the initial time the values of all bits are 0. Suppose the reference frame identifiers of the encoded pixel regions include a reference frame identifier X, a reference frame identifier Y and a reference frame identifier Z. When the reference frame identifier X is to be inserted into the filter, a modulo operation is performed on X through the filter's hash function to obtain its reference frame identifier hash value; assuming this hash value is 5, the value at the position j=5 of the filter's binary vector is modified to 1, thereby inserting X into the initial filter. When the reference frame identifier Y is to be inserted, a modulo operation on Y yields its hash value; assuming this hash value is 8, the value at the position j=8 is modified to 1, inserting Y into the filter. When the reference frame identifier Z is to be inserted, a modulo operation on Z yields its hash value; assuming this hash value is 3, the value at the position j=3 is modified to 1, inserting Z into the filter. All reference frame identifiers of the encoded pixel regions are thus inserted into the initial filter, yielding the target filter shown in fig. 7.
In the above embodiment, the terminal performs hash operation on the reference frame identifier corresponding to the encoded pixel region based on the hash function corresponding to the initial filter to obtain the hash value of the reference frame identifier; determining a frame identifier insertion position corresponding to the hash value of the reference frame identifier in the initial filter; the numerical value corresponding to the frame identifier insertion position is adjusted in the initial filter, so that the reference frame identifier is stored in a relatively small space of the target filter, the storage cost is reduced, and whether the frame identifier of the candidate reference frame exists in the reference frame identifier corresponding to the encoded pixel region can be rapidly judged based on the constructed target filter, so that the calculation efficiency is improved.
In one embodiment, before acquiring the reference frame identifier corresponding to the encoded pixel region, the terminal may further acquire the encoding mode identifier of the encoded pixel region; when the coding mode of the coding mode identification characterization coded pixel area is inter prediction, executing the step of obtaining the reference frame identification corresponding to the coded pixel area.
Wherein the coding mode identifier identifies the coding mode of each coded pixel region in video coding. Coding modes include intra-frame coding (Intra coding) and inter-frame coding (Inter coding). Intra-frame coding means that each pixel block of an image is predicted and coded using only the information of other coded pixel blocks within the same frame; this mode is independent of other frames and does not use motion compensation. Inter-frame coding means that a pixel block of an image is predicted and coded using the pixel block at the corresponding position, or a nearby position, in a previous or subsequent frame; this mode is based on motion compensation, can effectively reduce the redundancy of a video sequence, and achieves a better compression effect.
Inter prediction (Inter prediction) is a specific method in inter-frame coding, mainly used for predicting a pixel block of the current frame. It is usually based on motion compensation, i.e. the pixel block of the current frame is predicted by calculating a motion vector between the current frame and a reference frame; the predicted pixel block is used to calculate a prediction error, which is encoded and transmitted together with the motion vector.
Specifically, after determining the current pixel region to be encoded, the terminal determines the associated encoded pixel regions of that region. For any one encoded pixel region, if the target filter is to be constructed based on it, the terminal first acquires the encoding mode identifier corresponding to that encoded pixel region. If the encoding mode identifier characterizes that the encoding mode of the encoded pixel region is not inter-frame prediction, the subsequent step of acquiring the reference frame identifier corresponding to the encoded pixel region is not executed, which omits an unnecessary step and improves calculation and processing efficiency. If the encoding mode identifier characterizes that the encoding mode of the encoded pixel region is inter-frame prediction, the step of obtaining the reference frame identifier corresponding to the encoded pixel region is executed, and acquiring and using the reference frame identifier in the subsequent steps helps to improve the accuracy and efficiency of coding.
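The mode-gating step above can be sketched as follows (the string mode identifiers and the dictionary layout of an encoded region are illustrative assumptions, not part of this description):

```python
INTER_PREDICTION = "inter"  # illustrative encoding mode identifier values
INTRA_PREDICTION = "intra"

def ref_ids_for_filter(encoded_regions):
    """Collect reference frame identifiers only from encoded regions whose
    encoding mode identifier characterizes inter-frame prediction; other
    regions are skipped, omitting the unnecessary acquisition step."""
    ids = []
    for region in encoded_regions:
        if region["mode"] == INTER_PREDICTION:
            ids.append(region["ref_id"])
    return ids
```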
In one embodiment, the process of hashing the frame identifier of the candidate reference frame by the terminal to obtain the frame identifier hash value includes the following steps: obtaining a hash function corresponding to a target filter; and carrying out hash operation on the frame identifications of the candidate reference frames based on the hash function to obtain frame identification hash values.
Wherein the target filter comprises a vector of binary bits (0 or 1) and a corresponding mapping function, which may be a hash function, for mapping the candidate reference frame identification to a position in the binary vector.
Specifically, the terminal obtains hash functions corresponding to the target filter, and performs hash operation on frame identifiers of candidate reference frames based on the hash functions to obtain frame identifier hash values, wherein each frame identifier hash value corresponds to a position of a binary vector of the target filter.
In one embodiment, the number of hash functions corresponding to the filter is 1, the hash functions are modulo functions, the terminal performs hash operation on the frame identifier of the candidate reference frame based on the hash functions, and the process of obtaining the hash value of the frame identifier includes the following steps: and performing modular operation on the frame identifications of the candidate reference frames based on a modular function corresponding to the target filter to obtain a frame identification hash value.
Wherein the modulus of the modulo function is the same size as the number of bits of the binary vector of the filter, to ensure that the reference frame identifier can be mapped into the range of the binary vector; for example, if the binary vector of the filter has R bits, the modulus of the modulo function can be set to R, so that any incoming reference frame identifier is mapped into the range 0 to R-1. In the embodiment of the present application, R may be set to 53, and the corresponding binary vector contains 53 positions from 0 to 52.
In the above embodiment, the terminal converts a complex frame identifier into a simple value through the hash operation, which makes the matching calculation very fast, because retrieving a hash value in computer memory is generally more efficient than searching for a complex frame identifier; moreover, the hash function compresses a larger or more complex frame identifier into a smaller value, which saves considerable storage space when a large amount of data is processed.
In one embodiment, the process of determining a matching result between the frame identification of the candidate reference frame and the reference frame identification corresponding to the encoded pixel region by the terminal based on the frame identification hash value and the target filter comprises the steps of: determining a target position corresponding to the frame identification hash value in the target filter; obtaining a value to be matched corresponding to a target position in a target filter; when the value to be matched is a first target value, determining that a target reference frame identifier matched with the frame identifier of the candidate reference frame exists in the reference frame identifiers corresponding to the encoded pixel areas; and when the value to be matched is a second target value, determining that the target reference frame identification matched with the frame identification of the candidate reference frame does not exist in the reference frame identifications corresponding to the encoded pixel areas.
Wherein the first target value may be 1 and the second target value may be 0.
Specifically, after obtaining the frame identifier hash value, the terminal determines the position in the binary vector of the target filter corresponding to the frame identifier hash value; this position is the target position. The terminal then obtains the value at the target position in the binary vector, which is the value to be matched. If the value to be matched is the first target value, for example 1, it is determined that a reference frame identifier identical to the frame identifier corresponding to the frame identifier hash value has been inserted into the target filter, that is, a target reference frame identifier matching the frame identifier of the candidate reference frame exists among the reference frame identifiers corresponding to the encoded pixel region. If the value to be matched is the second target value, for example 0, it is determined that no reference frame identifier identical to the frame identifier corresponding to the frame identifier hash value has been inserted into the target filter, that is, no target reference frame identifier matching the frame identifier of the candidate reference frame exists among the reference frame identifiers corresponding to the encoded pixel region.
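A minimal sketch of the membership test just described, assuming the single modulo hash and the bit-vector layout used in the examples of this description:

```python
def match_reference_frame(target_filter, cand_frame_id):
    """Determine the target position from the frame identifier hash value,
    then compare the value to be matched against the first target value
    (1: a matching identifier may exist) or the second target value
    (0: no matching identifier exists)."""
    target_pos = cand_frame_id % len(target_filter)  # frame identifier hash value
    return target_filter[target_pos] == 1
```

For a 10-bit filter built from POC values 23, 15 and 48 (bits 3, 5 and 8 set), querying 15 reports a match and querying 17 does not.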
For example, for the frame identifier a of a certain candidate reference frame, a modulo operation yields a frame identifier hash value of 3; the j=3 position in the target filter shown in fig. 7 is determined as the target position, and since the value at the target position is 1, it is determined that the frame identifier a exists in the reference frame identifiers corresponding to the encoded pixel region, that is, a=Z. For the frame identifier b of a certain candidate reference frame, a modulo operation yields a frame identifier hash value of 5; the j=5 position in the target filter shown in fig. 7 is determined as the target position, and since the value at the target position is 1, it is determined that the frame identifier b exists in the reference frame identifiers corresponding to the encoded pixel region, that is, b=X.
In the above embodiment, the terminal can quickly determine whether the frame identifier of the candidate reference frame is matched with the reference frame identifier corresponding to the encoded pixel region through the hash value and the target filter, that is, after one hash operation and one search operation, the result can be obtained, thereby improving the matching efficiency.
In one embodiment, the process of determining candidate motion vectors for a pixel region to be encoded by a terminal based on a matching result and motion vectors for the encoded pixel region comprises the steps of: when the matching result represents that the target reference frame identifier matched with the frame identifier of the candidate reference frame exists in the reference frame identifier corresponding to the encoded pixel region, acquiring a target motion vector corresponding to the target reference frame identifier from the motion vector of the encoded pixel region; determining a target motion vector as a candidate motion vector of a pixel region to be encoded; and performing scaling treatment on other motion vectors in the motion vectors of the encoded pixel region to obtain candidate motion vectors of the pixel region to be encoded.
The other motion vectors in the motion vectors of the encoded pixel region refer to motion vectors corresponding to other reference frame identifiers except the target reference frame identifier in the reference frame identifiers of the encoded pixel region.
Specifically, when the matching result indicates that a target reference frame identifier matched with the frame identifier of the candidate reference frame exists in the reference frame identifiers corresponding to the encoded pixel regions, determining the target reference frame identifier based on the frame identifier, for example, determining the frame identifier of the current candidate reference frame as the target reference frame identifier, acquiring a target motion vector corresponding to the target reference frame identifier from the motion vector of the encoded pixel region, and directly determining the target motion vector as the candidate motion vector of the current pixel region to be encoded; and acquiring other reference frame identifications in the reference frame identifications corresponding to the encoded pixel areas, acquiring other motion vectors corresponding to the other reference frame identifications from the motion vectors of the encoded pixel areas, and performing scaling processing on the other motion vectors to obtain candidate motion vectors of the pixel areas to be encoded.
For example, three reference frame identifiers are corresponding to the encoded pixel region, each reference frame identifier corresponds to one motion vector, if it is determined that a target reference frame identifier matched with a frame identifier of a candidate reference frame exists in the reference frame identifiers corresponding to the encoded pixel region, the motion vector corresponding to the target reference frame identifier in the three reference frame identifiers is directly determined to be a candidate motion vector of the current pixel region to be encoded, and scaling processing is performed on the motion vectors corresponding to two other reference frame identifiers except the target reference frame identifier in the three reference frame identifiers, so as to obtain two scaled motion vectors, where the two scaled motion vectors are the two candidate motion vectors of the current pixel region to be encoded.
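The inherit-or-scale rule in this example can be sketched as follows (a simplified model; the identifier-ratio scaling coefficient follows the scaling embodiment described in this text, and the (x, y) tuple format of motion vectors is an illustrative assumption):

```python
def candidate_mvs_when_matched(cand_id, ref_ids, mvs):
    """Directly inherit the motion vector whose reference frame identifier
    equals the candidate reference frame identifier, and scale the motion
    vectors of the other reference frame identifiers."""
    candidates = []
    for ref_id, (mx, my) in zip(ref_ids, mvs):
        if ref_id == cand_id:
            candidates.append((mx, my))              # direct inheritance
        else:
            coeff = cand_id / ref_id                 # scaling coefficient
            candidates.append((mx * coeff, my * coeff))  # scaled MV
    return candidates
```

With three encoded reference frame identifiers, one matching the candidate, the result is one inherited and two scaled candidate motion vectors, as in the example above.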
In the above embodiment, the terminal directly inherits the target motion vector corresponding to the target reference frame identifier as the candidate motion vector, so that redundant motion vector searching is avoided, thereby improving video coding efficiency, and scaling other motion vectors in the motion vectors of the coded pixel region, so that more accurate candidate motion vectors can be obtained without motion vector searching, and video coding quality can be improved.
In one embodiment, the process of scaling other motion vectors in the motion vectors of the encoded pixel region by the terminal to obtain candidate motion vectors of the pixel region to be encoded includes the following steps: determining a scaling factor based on frame identifications of the candidate reference frames and other reference frame identifications corresponding to the encoded pixel regions; and scaling other motion vectors of the encoded pixel region based on the scaling coefficient to obtain candidate motion vectors of the pixel region to be encoded.
Specifically, other reference frame identifiers in the reference frame identifiers corresponding to the encoded pixel regions are obtained, other motion vectors corresponding to the other reference frame identifiers are obtained from the motion vectors of the encoded pixel regions, the ratio of the frame identifiers of the current candidate reference frame to the other reference frame identifiers is determined to be a scaling coefficient, the product of the scaling coefficient and the corresponding other motion vectors is determined to be a scaled motion vector, and the obtained scaled motion vector is determined to be a candidate motion vector of the current pixel region to be encoded.
In the above embodiment, the terminal determines the scaling factor by considering the frame identifier of the candidate reference frame and the reference frame identifier corresponding to the encoded pixel region, so that the scaling factor is more in line with the change of the video content, and the candidate motion vector obtained by scaling the motion vector based on the scaling factor is more accurate, thereby improving the accuracy of video encoding.
In one embodiment, before acquiring the target motion vector corresponding to the target reference frame identifier from the motion vector of the encoded pixel region, the terminal may further search the reference frame identifier of the encoded pixel region for the target reference frame identifier identical to the frame identifier of the reference frame; and when the target reference frame identifier is found, executing the step of acquiring a target motion vector corresponding to the target reference frame identifier from the motion vectors of the encoded pixel areas.
Specifically, when the matching result indicates that a target reference frame identifier matching the frame identifier of the candidate reference frame exists among the reference frame identifiers corresponding to the encoded pixel region, in order to avoid the matching result being a misjudgment, the target reference frame identifier identical to the frame identifier of the reference frame may be searched for among the reference frame identifiers of the encoded pixel region. If the target reference frame identifier can be found among the reference frame identifiers of the encoded pixel region, the matching result is determined to be accurate, and the step of acquiring the target motion vector corresponding to the target reference frame identifier from the motion vectors of the encoded pixel region is then executed. If the target reference frame identifier cannot be found among the reference frame identifiers of the encoded pixel region, the matching result is determined to be inaccurate, that is, the target reference frame identifier does not actually exist among the reference frame identifiers corresponding to the encoded pixel region; in that case, the motion vectors of the encoded pixel region are obtained and each scaled to obtain the candidate motion vectors of the pixel region to be encoded.
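The filter lookup followed by the exact-search safeguard can be sketched as follows (names, the tuple motion-vector format and the single modulo hash are illustrative assumptions):

```python
def resolve_candidate(target_filter, encoded_ref_ids, encoded_mvs, cand_id):
    """Filter lookup followed by the exact search that guards against a
    filter misjudgment.  Returns the inherited motion vector, or None
    when no exact match exists and all motion vectors must be scaled."""
    if target_filter[cand_id % len(target_filter)] != 1:
        return None                    # definite miss: scale all MVs
    for ref_id, mv in zip(encoded_ref_ids, encoded_mvs):
        if ref_id == cand_id:          # exact match confirms the filter hit
            return mv
    return None                        # filter false positive: scale all MVs
```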
It should be noted that, in the embodiment of the present application, the target filter itself has a certain misjudgment rate, where a misjudgment can occur only when the matching result indicates that a reference frame identifier matching the frame identifier of the candidate reference frame exists among the reference frame identifiers corresponding to the encoded pixel region. As an illustration, assume that the reference frame identifiers of the encoded pixel regions associated with the current pixel region to be encoded are POC values 23, 15 and 48, whose corresponding modulo results are 3, 5 and 8 respectively, yielding the target filter shown in fig. 7, whose modulo function is:
D = POC mod R
where R is the number of bits of the target filter; the target filter shown in fig. 7 has 10 bits, i.e. R=10. Suppose the candidate reference frame identifier of the current pixel region to be encoded is a POC value of 15. Performing the modulo operation on this identifier based on the modulo function of the target filter gives 5, so the value to be matched at the position j=5 in the target filter is obtained; as can be seen from fig. 7, this value is 1, and it is therefore determined that a reference frame identifier with POC value 15 exists among the reference frame identifiers of the encoded pixel regions. From the process of generating the target filter, this judgment is correct. Suppose instead that the candidate reference frame identifier of the current pixel region to be encoded is a POC value of 25. The modulo operation again gives 5, and the value to be matched at the position j=5 is 1, so it is determined that a reference frame identifier with POC value 25 exists among the reference frame identifiers of the encoded pixel regions. From the process of generating the target filter, this judgment is erroneous.
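The misjudgment above can be reproduced directly (all values are taken from the example: R=10 and inserted POC values 23, 15 and 48):

```python
R = 10                        # number of bits of the target filter (fig. 7)
bits = [0] * R
for poc in (23, 15, 48):      # reference frame POCs of the encoded regions
    bits[poc % R] = 1         # D = POC mod R

correct = bits[15 % R] == 1    # POC 15 was inserted: judgment is correct
misjudged = bits[25 % R] == 1  # POC 25 also maps to bit 5: false positive
```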
In order to reduce the false positive rate, in the embodiment of the present application, the number of bits of the target filter may be set to be not smaller than the maximum value of the candidate reference frame identifiers and the reference frame identifiers; for example, if the maximum value is POC=50, the number of bits of the target filter may be set to 50, in which case no misjudgment occurs. However, considering that in practice the value of the frame identifier may be relatively large, setting the number of bits of the target filter excessively high reduces the calculation efficiency of determining the matching result. The number of bits of the target filter therefore needs to be set reasonably according to the magnitude of the POC values in the actual situation, balancing improved calculation efficiency against an acceptable misjudgment rate. For example, if the maximum value is POC=100, then ideally (without misjudgment) the number of bits of the target filter would be set to 100; but to improve the calculation efficiency of determining the matching result, it may be set to a value less than 100, for example 80 or 50, thereby improving calculation efficiency while accepting a certain misjudgment rate.
In the above embodiment, the terminal further searches the reference frame identifiers of the encoded pixel region for a target reference frame identifier identical to the frame identifier of the candidate reference frame, which reduces the possibility of an erroneous judgment; once the target reference frame identifier is found, the terminal can obtain the target motion vector corresponding to it directly from the motion vectors of the encoded pixel region without any additional search, thereby improving the encoding efficiency.
In one embodiment, the candidate motion vector generating method further includes the steps of: when the matching result represents that the target reference frame identification matched with the frame identification of the candidate reference frame does not exist in the reference frame identifications corresponding to the encoded pixel areas, the motion vector of the encoded pixel areas is obtained; and performing scaling treatment on the motion vector of the encoded pixel region to obtain a candidate motion vector of the pixel region to be encoded.
Specifically, when the matching result indicates that no target reference frame identifier matching the frame identifier of the candidate reference frame exists among the reference frame identifiers corresponding to the encoded pixel region, the terminal obtains the motion vectors respectively corresponding to the reference frame identifiers of the encoded pixel region, scales each motion vector based on its reference frame identifier to obtain the scaled motion vectors corresponding to the reference frame identifiers, and determines the scaled motion vectors as candidate motion vectors of the current pixel region to be encoded.
For example, suppose the encoded pixel region corresponds to three reference frame identifiers, each with its own motion vector. If it is determined that no target reference frame identifier matching the frame identifier of the candidate reference frame exists among these reference frame identifiers, the motion vectors corresponding to the three reference frame identifiers are each scaled, yielding three scaled motion vectors; these three scaled motion vectors are the candidate motion vectors of the current pixel region to be encoded.
In the above embodiment, the terminal obtains the candidate motion vector by using the motion vector of the encoded pixel region, so that the time for searching the motion vector can be saved, thereby improving the encoding efficiency, and the motion vector of the encoded pixel region is scaled, so that the candidate motion vector is more accurate, thereby improving the accuracy of video encoding.
In one embodiment, the process of scaling the motion vector of the encoded pixel region by the terminal to obtain the candidate motion vector of the pixel region to be encoded includes the following steps: determining a scaling factor based on the frame identification of the candidate reference frame and the reference frame identification corresponding to the encoded pixel region; and scaling the motion vector of the encoded pixel region based on the scaling coefficient to obtain a candidate motion vector of the pixel region to be encoded.
Specifically, the terminal determines a scaling coefficient from the frame identifier of the current candidate reference frame and the reference frame identifier of the encoded pixel region, determines the product of the scaling coefficient and the corresponding motion vector as the scaled motion vector, and determines the scaled motion vector so obtained as a candidate motion vector of the current pixel region to be encoded.
In the above embodiment, the terminal determines the scaling factor by considering the frame identifier of the candidate reference frame and the reference frame identifier corresponding to the encoded pixel region, so that the scaling factor is more in line with the change of the video content, and the candidate motion vector obtained by scaling the motion vector based on the scaling factor is more accurate, thereby improving the accuracy of video encoding.
In one embodiment, the process of determining the scaling factor by the terminal based on the frame identification of the candidate reference frame and the reference frame identification corresponding to the encoded pixel region comprises the steps of: determining a first identification interval between the candidate reference frame and the video frame based on the frame identification of the candidate reference frame; determining a second identification interval between a reference frame and a video frame of the encoded pixel region based on the reference frame identification; a scaling factor is determined based on the first identification interval and the second identification interval.
Wherein the first identification interval is used for representing an inter-frame distance between the candidate reference frame and the currently processed video frame; the second identification interval is used to characterize the inter-frame distance between the reference frame of the encoded pixel region and the currently processed video frame.
Specifically, when the terminal needs to scale a motion vector of the encoded pixel area, it may acquire the reference frame identifier corresponding to the motion vector, the frame identifier of the video frame containing the current pixel area to be encoded, and the frame identifier of the candidate reference frame corresponding to the current pixel area to be encoded. The terminal then determines the first identification interval based on the frame identifier of the current video frame and the frame identifier of the candidate reference frame, determines the second identification interval based on the frame identifier of the current video frame and the reference frame identifier of the encoded pixel area, determines the ratio of the first identification interval to the second identification interval, and determines this ratio as the scaling factor of the motion vector.
In one embodiment, the frame identifications are POC values, and the scaling factor, the frame identification of the candidate reference frame, the frame identification of the current video frame, and the reference frame identification of the encoded pixel region satisfy the following relationship:

scale = (POC_cur - POC_cand) / (POC_cur - POC_ref)

where scale represents the scaling factor, POC_cur represents the frame identification of the current video frame, POC_cand represents the frame identification of the candidate reference frame of the current pixel region to be encoded, and POC_ref represents the reference frame identification of the motion vector to be scaled.
In one embodiment, after obtaining the scaling factor, the terminal may perform scaling processing on the motion vector based on the scaling factor to obtain a scaled motion vector, where the scaled motion vector is a candidate motion vector of the current pixel area to be encoded, and the candidate motion vector, the scaling factor, and the motion vector satisfy the following relationship:

MVC = scale × MV

where MVC represents the candidate motion vector, scale represents the scaling factor, and MV represents the motion vector prior to the scaling process.
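The scaling relations described here can be checked numerically with a small sketch (the function names, POC values and motion vector are made up for illustration; each vector component is scaled independently):

```python
# Illustrative sketch of POC-distance-based motion vector scaling
# (names and sample values assumed, not taken from the embodiment).

def scaling_factor(poc_cur: int, poc_cand: int, poc_ref: int) -> float:
    # first identification interval / second identification interval
    return (poc_cur - poc_cand) / (poc_cur - poc_ref)

def scale_mv(mv, factor):
    # scale each component of the motion vector by the factor
    return (mv[0] * factor, mv[1] * factor)

# Current frame POC 8; candidate reference POC 4; neighbor's reference POC 6.
s = scaling_factor(8, 4, 6)          # (8-4)/(8-6) = 2.0
print(scale_mv((3, -1), s))          # (6.0, -2.0)
```

The candidate reference frame lies twice as far from the current frame as the neighbor's reference frame, so the inherited motion vector is doubled.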
In the above embodiment, the terminal may calculate the appropriate scaling factor more accurately by comparing the two identification intervals (i.e., the interval between the candidate reference frame and the video frame and the interval between the reference frame and the video frame of the encoded pixel region), so that the candidate motion vector of the pixel region to be encoded may be predicted more accurately, thereby improving the encoding accuracy.
In one embodiment, the candidate motion vector generating method further includes the steps of: determining the coding cost of each candidate motion vector; and selecting a motion vector of the pixel region to be coded from the candidate motion vectors based on the coding cost.
The coding cost, also referred to as encoding cost, is a measure reflecting information loss and compression efficiency in the video coding process. It may specifically include a distortion cost and a bit rate cost: the distortion cost refers to the difference between the original video and the decoded video caused by quantization and other lossy compression processes, while the bit rate cost refers to the number of bits used to represent the video. It can be understood that the lower the bit rate, the smaller the storage space and transmission bandwidth required to represent the video, but at the same time the distortion may increase.
Specifically, after obtaining the candidate motion vectors of the current pixel area to be encoded, for any one candidate motion vector the terminal determines the pixel difference between the predicted pixel area corresponding to that candidate motion vector and the current pixel area to be encoded, this pixel difference being the distortion cost; it also determines the number of bits required to encode the candidate motion vector, this number of bits being the bit rate cost. Together, these yield the coding cost of the candidate motion vector.
In the above embodiment, the terminal calculates the coding cost of each candidate motion vector. On the one hand, the motion vector with the minimum coding cost can be selected, further optimizing the video coding effect and improving video quality; on the other hand, by comparing the coding costs of the candidate motion vectors, encoding with a high-cost motion vector can be avoided, reducing unnecessary computation and improving encoding efficiency. In addition, a balance can be struck between coding effect and coding complexity: for example, under limited computing resources, selecting a motion vector with a smaller coding cost can reduce coding complexity while ensuring coding quality.
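A hedged sketch of this selection step follows. The SAD distortion and the per-candidate bit counts are simplified stand-ins for the costs described above, and combining the two costs through a Lagrange multiplier lambda is a common rate-distortion practice assumed here rather than stated in the embodiment:

```python
# Minimal sketch of min-cost motion vector selection (all names assumed).

def sad(block_a, block_b):
    # sum of absolute differences: a simple distortion measure
    return sum(abs(a - b) for a, b in zip(block_a, block_b))

def select_mv(cur_block, candidates, lam=4.0):
    """candidates: list of (mv, predicted_block, mv_bits) tuples;
    cost = distortion + lam * bit rate (lam is an assumed Lagrange weight)."""
    best = min(candidates,
               key=lambda c: sad(cur_block, c[1]) + lam * c[2])
    return best[0]

cur = [10, 20, 30, 40]
cands = [((1, 0), [12, 19, 33, 41], 6),   # distortion 7, cost 7 + 24 = 31
         ((0, 2), [10, 20, 30, 44], 10)]  # distortion 4, cost 4 + 40 = 44
print(select_mv(cur, cands))              # (1, 0)
```

Here the second candidate predicts the block slightly better, but its higher bit cost makes the first candidate the cheaper overall choice.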
In one embodiment, as shown in fig. 8, there is further provided a candidate motion vector generating method, which is described by taking the terminal in fig. 1 as an example, including the following steps:
s802, obtaining a candidate reference frame corresponding to a pixel region to be encoded in a video frame and an encoded pixel region encoding mode identifier associated with the pixel region to be encoded.
S804, when the coding mode of the coding mode identification representing the coded pixel region is inter-frame prediction, the reference frame identification corresponding to the coded pixel region is obtained.
S806, carrying out hash operation on the reference frame identifier based on a hash function corresponding to the initial filter to obtain a hash value of the reference frame identifier; determining a frame identifier insertion position corresponding to the hash value of the reference frame identifier in the initial filter; and adjusting the numerical value corresponding to the frame identification insertion position in the initial filter to obtain a target filter.
S808, carrying out hash operation on the frame identification of the candidate reference frame based on the hash function corresponding to the target filter to obtain a frame identification hash value.
S810, determining a target position corresponding to the frame identification hash value in the target filter.
S812, obtaining a value to be matched corresponding to the target position in the target filter.
S814, when the value to be matched is the first target value, the target motion vector corresponding to the target reference frame identifier is obtained from the motion vector of the encoded pixel region, the target motion vector is determined as the candidate motion vector of the pixel region to be encoded, and scaling is performed on other motion vectors in the motion vector of the encoded pixel region to obtain the candidate motion vector of the pixel region to be encoded.
That is, when the value to be matched is the first target value, it is determined that a target reference frame identifier matching the frame identifier of the candidate reference frame exists among the reference frame identifiers corresponding to the encoded pixel region, and the step of acquiring the target motion vector corresponding to the target reference frame identifier from the motion vectors of the encoded pixel region is executed.
And S816, when the value to be matched is the second target value, acquiring the motion vector of the encoded pixel region, and performing scaling treatment on the motion vector of the encoded pixel region to obtain the candidate motion vector of the pixel region to be encoded.
That is, when the value to be matched is the second target value, it is determined that no target reference frame identifier matching the frame identifier of the candidate reference frame exists among the reference frame identifiers corresponding to the encoded pixel region, and the step of obtaining the motion vectors of the encoded pixel region and scaling them to obtain the candidate motion vectors of the pixel region to be encoded is executed.
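The branch logic of S802–S816 can be condensed into the following sketch (all names are assumed; the 53-bit mask size is borrowed from the application scenario's formulas, and the scaling step is passed in as a hook rather than implemented):

```python
# Condensed, hypothetical sketch of steps S802-S816.
FILTER_BITS = 53

def build_filter(neighbor_ref_pocs):
    """Construction stage (cf. S804-S806): map each neighbor reference POC."""
    bf = 0
    for poc in neighbor_ref_pocs:
        bf |= 1 << (poc % FILTER_BITS)
    return bf

def candidate_mvs(cand_poc, bf, neighbor_mvs, scale):
    """Generation stage (cf. S808-S816). neighbor_mvs: reference POC -> MV."""
    hit = (bf >> (cand_poc % FILTER_BITS)) & 1          # S808-S812
    if hit and cand_poc in neighbor_mvs:                # S814: verified hit
        direct = [neighbor_mvs[cand_poc]]               # inherit directly
        others = [scale(mv, poc) for poc, mv in neighbor_mvs.items()
                  if poc != cand_poc]                   # scale the rest
        return direct + others
    # S816: no match (or a filter false positive) -> scale every neighbor MV
    return [scale(mv, poc) for poc, mv in neighbor_mvs.items()]

neighbors = {4: (1, 0), 6: (2, 2)}                      # reference POC -> MV
bf = build_filter(neighbors)
double = lambda mv, poc: (mv[0] * 2, mv[1] * 2)         # stand-in scaling
print(candidate_mvs(4, bf, neighbors, double))          # [(1, 0), (4, 4)]
print(candidate_mvs(5, bf, neighbors, double))          # [(2, 0), (4, 4)]
```

Note the extra membership check after the filter hit: as discussed above, it guards against false positives before a motion vector is inherited directly.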
The application also provides an application scene, which applies the candidate motion vector generation method, wherein the candidate motion vector generation method comprises a target filter construction stage and a candidate motion vector generation stage:
It should be noted that, in this application scenario, each coding unit may be denoted as a CU, the set formed by the coded units associated with the current unit to be coded may be denoted as the neighboring unit set (Neighbor_CUs), the set formed by the motion vectors of the coded units may be denoted as the neighboring motion vector set (Neighbor MVs), and the set formed by the reference frame identifiers of the coded units may be denoted as the neighboring frame identifier set (Neighbor POCs). Referring to fig. 9, the target filter construction stage includes the following steps:
(1) Obtain the coding mode identifier of the coded unit CU_i currently to be inserted into the filter, and determine based on the coding mode identifier whether the coding mode of CU_i is inter prediction; if not, end the current flow; if yes, execute step (2);
(2) Obtain the motion vectors of the coded unit CU_i and store them into the neighboring motion vector set (Neighbor MVs);
(3) Obtain the reference frame identifier (POC) of the coded unit CU_i and store it into the neighboring frame identifier set (Neighbor POCs);
(4) Map the reference frame identifier (POC) of the coded unit CU_i into the initial filter; then return to step (1) to process the coded unit CU_{i+1}, until the target filter is obtained from the initial filter into which the reference frame identifiers of all coded units associated with the current unit to be coded have been mapped.
The mapping of the reference frame identifier (POC) of the coded unit CU_i into the initial filter is implemented as follows:
BloomFilter = BloomFilter | (1 << (POC mod 53))
where mod is the modulo operation, "<<" is the left-shift operation, and "|" is the bitwise OR operation; the number of bits of the initial filter is 53, the corresponding bit positions being 0 to 52, so during mapping a 1 is shifted left by (POC mod 53) bits and OR-ed into the filter.
Referring to fig. 10, the candidate motion vector generation phase includes the steps of:
(1) Obtain the candidate reference frame identifier (POC_i) to be matched of the current unit to be coded;
(2) Map the candidate reference frame identifier (POC_i) to a target position of the target filter, and check whether the target position holds a mapping result;
The mapping of the candidate reference frame identifier (POC_i) to the target position of the target filter is implemented as follows:
Check whether (BloomFilter >> (POC mod 53)) & 1 is set.
where mod is the modulo operation, ">>" is the right-shift operation, and "&" is the bitwise AND operation.
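The two bit operations above can be transcribed directly into Python (a plain int serves as the 53-bit filter); the POC values here are made up, and POC 53 is included to show the wrap-around collision at the modulo boundary:

```python
# Direct transcription of the construction and check operations above.
M = 53
bloom = 0
for poc in (3, 17, 53):
    bloom |= 1 << (poc % M)          # construction-stage mapping

def is_set(poc: int) -> bool:
    # generation-stage check: is the bit at position (poc mod 53) set?
    return (bloom >> (poc % M)) & 1 == 1

print(is_set(17))   # True  (inserted)
print(is_set(0))    # True  (false positive: 53 mod 53 == 0 mod 53)
print(is_set(5))    # False
```

The collision between POC 53 and POC 0 is the same false-positive behavior analyzed earlier for the 10-bit example; a subsequent exact search over the stored Neighbor POCs resolves it.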
(3) If a mapping result is present, the candidate motion vector (Direct MVC) is determined by direct inheritance, and the remaining candidate motion vectors are determined by scaling.
Specifically, the motion vector corresponding to the target POC (i.e., POC_i) in the neighboring frame identifier set (Neighbor POCs) is directly determined as a candidate motion vector (MVC) of the current unit to be coded; the motion vectors corresponding to the POCs in the neighboring identifier set (Neighbor POCs) other than the target POC are scaled to obtain further candidate motion vectors (MVC) of the current unit to be coded.
(4) If no mapping result is present, all Direct MVC checks are skipped directly and only the Indirect MVC flow is carried out.
That is, the motion vector corresponding to each POC in the neighboring identifier set (Neighbor POCs) is scaled to obtain a candidate motion vector (MVC) of the current unit to be coded.
(5) Return to step (1) to process the next candidate reference frame identifier (POC_{i+1}) to be matched of the current unit to be coded, until all candidate reference frame identifiers of the current unit to be coded have been matched, obtaining the set (MVCs) formed by the candidate motion vectors of the current unit to be coded.
In practical application, take the case where the current unit to be encoded corresponds to 32 candidate reference frames and the number of reference frame identifiers (POCs) used by the 5 coding units around the current unit to be encoded ranges from 1 to 10. If all the surrounding coding units use the same reference frame, i.e., only one reference frame identifier POC is used by the 5 surrounding coding units, then as the current coding unit traverses and matches each candidate reference frame identifier, 31 Direct MVC processes all fail; if the surrounding coding units all use different reference frames, i.e., 10 reference frame identifiers POC are used by the 5 surrounding coding units, then 22 Direct MVC processes all fail. Actual tests show that, by adopting the candidate motion vector generation method under the 32-candidate-reference-frame configuration, the target filter constructed from the reference frame mappings of the 5 surrounding coding units can reach a judgment accuracy of more than 99%, and, depending on the reference frame usage of the surrounding coding units, roughly 20% of the candidate motion vector generation work of the current coding unit can be skipped.
The present application also provides an application scenario, which may be a real-time video call scenario carried out through an instant messaging client. As shown in fig. 11, the scenario includes at least a terminal 1102, a terminal 1106 and a server 1104; the terminal 1102 and the terminal 1106 establish communication connections with the server 1104 through a wired or wireless network, the terminal 1102 communicates with the terminal 1106 through the server 1104, the terminal 1102 corresponds to user 1, and the terminal 1106 corresponds to user 2. The terminal 1102 shoots the call of user 1 through a camera to obtain a first video, obtains the candidate motion vectors corresponding to the video frames in the first video by the candidate motion vector generation method provided by the embodiments of the present application, encodes the video frames in the first video based on the candidate motion vectors, and sends the encoded first video stream to the server 1104; the server 1104 forwards the first video stream to the terminal 1106, and the terminal 1106 decodes the first video stream according to the corresponding coding mode and plays the decoded first video. Meanwhile, the terminal 1106 shoots the call of user 2 through its camera to obtain a second video, obtains the candidate motion vectors corresponding to the video frames in the second video by the candidate motion vector generation method provided by the embodiments of the present application, encodes the video frames in the second video based on the candidate motion vectors, and sends the encoded second video stream to the server 1104; the server 1104 forwards the second video stream to the terminal 1102, and the terminal 1102 decodes the second video stream according to the corresponding coding mode and plays the decoded second video.
In addition to playing the second video, the terminal 1102 may also play the first video shot by itself, that is, play the first video and the second video simultaneously in its display interface. Similarly, the terminal 1106 can play the second video and the first video simultaneously. The display interfaces of the terminal 1102 and the terminal 1106 are shown in detail in fig. 11.
The application also provides an application scenario, which may be a live streaming scenario. As shown in fig. 12, the scenario includes at least a live broadcast terminal 1202, a live broadcast server 1204, a viewing terminal 1206 and a viewing terminal 1208; the live broadcast terminal 1202, the viewing terminal 1206 and the viewing terminal 1208 all establish communication connections with the live broadcast server 1204 through a wired or wireless network. The live broadcast terminal 1202 may shoot the live broadcast user to obtain a live video, obtain the candidate motion vectors corresponding to the video frames in the live video by the candidate motion vector generation method provided by the embodiments of the present application, encode the video frames in the live video based on the candidate motion vectors, and send the encoded live video stream to the live broadcast server 1204. The live broadcast server 1204 forwards the live video stream to the viewing terminal 1206 and the viewing terminal 1208; after receiving the forwarded live video stream, the viewing terminal 1206 and the viewing terminal 1208 decode it to obtain the live video, which can then be viewed.
It should be understood that, although the steps in the flowcharts related to the embodiments described above are shown sequentially as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the execution order of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments; their execution order is likewise not necessarily sequential, and they may be performed in turns or alternately with at least some of the other steps, sub-steps or stages.
Based on the same inventive concept, the embodiment of the application also provides a candidate motion vector generation device for realizing the candidate motion vector generation method. The implementation of the solution provided by the apparatus is similar to the implementation described in the above method, so the specific limitation in the embodiments of the candidate motion vector generating apparatus provided below may be referred to the limitation of the candidate motion vector generating method hereinabove, and will not be described herein.
In one embodiment, as shown in fig. 13, there is provided a candidate motion vector generation device including: a candidate reference frame acquisition module 1302, a filter invocation module 1304, a hash value calculation module 1306, a match result determination module 1308, and a candidate motion vector generation module 1310, wherein:
a candidate reference frame obtaining module 1302, configured to obtain a candidate reference frame corresponding to a pixel region to be encoded in a video frame.
The filter invoking module 1304 is configured to invoke a target filter, where the target filter is generated based on a reference frame identifier corresponding to an encoded pixel region in the video frame, and the encoded pixel region is an encoded pixel region associated with the pixel region to be encoded.
The hash value calculation module 1306 is configured to perform hash operation on the frame identifier of the candidate reference frame to obtain a hash value of the frame identifier.
A matching result determination module 1308 for determining a matching result between the frame identification of the candidate reference frame and the reference frame identification corresponding to the encoded pixel region based on the frame identification hash value and the target filter.
The candidate motion vector generation module 1310 is configured to determine a candidate motion vector of the pixel area to be encoded based on the matching result and the motion vector of the encoded pixel area.
In the above embodiment, the target filter is generated in advance based on the reference frame identifiers corresponding to the encoded pixel region associated with the pixel region to be encoded; after the candidate reference frame corresponding to the pixel region to be encoded in the video frame is obtained, the target filter is called directly and a hash operation is performed on the frame identifier of the candidate reference frame to obtain the frame identifier hash value. The matching result between the frame identifier of the candidate reference frame and the reference frame identifiers corresponding to the encoded pixel region can therefore be determined rapidly based on the frame identifier hash value and the target filter, and the candidate motion vector of the pixel region to be encoded is then determined based on the matching result and the motion vectors of the encoded pixel region. This improves the generation efficiency of the candidate motion vectors of the pixel region to be encoded, which is particularly beneficial to the efficiency and effect of video encoding when a large number of reference frames are processed, saving computing resources and improving video compression quality and encoding speed.
In one embodiment, as shown in fig. 14, the apparatus further comprises a filter construction module 1312 for: acquiring a reference frame identifier corresponding to the encoded pixel region; carrying out hash operation on the reference frame identifier based on a hash function corresponding to the initial filter to obtain a hash value of the reference frame identifier; determining a frame identifier insertion position corresponding to the hash value of the reference frame identifier in the initial filter; and adjusting the numerical value corresponding to the frame identification insertion position in the initial filter to obtain a target filter.
In one embodiment, filter construction module 1312 is also configured to: acquiring a coding mode identifier of a coded pixel area; when the coding mode of the coding mode identification characterization coded pixel area is inter prediction, executing the step of obtaining the reference frame identification corresponding to the coded pixel area.
In one embodiment, the hash value calculation module 1306 is further configured to: obtaining a hash function corresponding to a target filter; and carrying out hash operation on the frame identifications of the candidate reference frames based on the hash function to obtain frame identification hash values.
In one embodiment, the match result determination module 1308 is further to: determining a target position corresponding to the frame identification hash value in the target filter; obtaining a value to be matched corresponding to a target position in a target filter; when the value to be matched is a first target value, determining that a target reference frame identifier matched with the frame identifier of the candidate reference frame exists in the reference frame identifiers corresponding to the encoded pixel areas; and when the value to be matched is a second target value, determining that the target reference frame identification matched with the frame identification of the candidate reference frame does not exist in the reference frame identifications corresponding to the encoded pixel areas.
In one embodiment, candidate motion vector generation module 1310 is further configured to: when the matching result represents that the target reference frame identifier matched with the frame identifier of the candidate reference frame exists in the reference frame identifier corresponding to the encoded pixel region, acquiring a target motion vector corresponding to the target reference frame identifier from the motion vector of the encoded pixel region; determining a target motion vector as a candidate motion vector of a pixel region to be encoded; and performing scaling treatment on other motion vectors in the motion vectors of the encoded pixel region to obtain candidate motion vectors of the pixel region to be encoded.
In one embodiment, candidate motion vector generation module 1310 is further configured to: searching a target reference frame identifier which is the same as the frame identifier of the reference frame from the reference frame identifiers of the encoded pixel areas; and when the target reference frame identifier is found, executing the step of acquiring a target motion vector corresponding to the target reference frame identifier from the motion vectors of the encoded pixel areas.
In one embodiment, candidate motion vector generation module 1310 is further configured to: when the matching result represents that the target reference frame identification matched with the frame identification of the candidate reference frame does not exist in the reference frame identifications corresponding to the encoded pixel areas, the motion vector of the encoded pixel areas is obtained; and performing scaling treatment on the motion vector of the encoded pixel region to obtain a candidate motion vector of the pixel region to be encoded.
In one embodiment, candidate motion vector generation module 1310 is further configured to: determining a scaling factor based on the frame identification of the candidate reference frame and the reference frame identification corresponding to the encoded pixel region; and scaling the motion vector of the encoded pixel region based on the scaling coefficient to obtain a candidate motion vector of the pixel region to be encoded.
In one embodiment, candidate motion vector generation module 1310 is further configured to: determining a first identification interval between the candidate reference frame and the video frame based on the frame identification of the candidate reference frame; determining a second identification interval between a reference frame and a video frame of the encoded pixel region based on the reference frame identification; a scaling factor is determined based on the first identification interval and the second identification interval.
In one embodiment, as shown in fig. 14, the apparatus further includes a motion vector selection module 1314 for: determining the coding cost of each candidate motion vector; and selecting a motion vector of the pixel region to be coded from the candidate motion vectors based on the coding cost.
The respective modules in the candidate motion vector generation device described above may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a terminal, and whose internal structure may be as shown in fig. 15. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input device. The processor, the memory, and the input/output interface are connected through a system bus, and the communication interface, the display unit, and the input device are connected to the system bus through the input/output interface. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The input/output interface of the computer device is used to exchange information between the processor and an external device. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless mode may be implemented through Wi-Fi, a mobile cellular network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements a candidate motion vector generation method. The display unit of the computer device is used to form a visual picture and may be a display screen, a projection device, or a virtual reality imaging device; the display screen may be a liquid crystal display or an electronic ink display. The input device of the computer device may be a touch layer covering the display screen, a key, a trackball, or a touchpad arranged on a housing of the computer device, or an external keyboard, touchpad, mouse, or the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 15 is merely a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In an embodiment, a computer device is also provided, comprising a memory and a processor, the memory storing a computer program; the processor implements the steps of the method embodiments described above when executing the computer program.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) involved in the present application are information and data authorized by the user or sufficiently authorized by all parties, and the collection, use, and processing of the related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by a computer program stored on a non-volatile computer-readable storage medium; when executed, the program may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM is available in a variety of forms, such as static random access memory (SRAM) and dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processor referred to in the embodiments provided herein may be, but is not limited to, a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, or a data processing logic unit based on quantum computing.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features contains no contradiction, it should be considered to be within the scope of this specification.
The foregoing embodiments illustrate only a few implementations of the application and are described in relative detail, but they should not be construed as limiting the scope of the application. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the concept of the application, and these all fall within the protection scope of the application. Therefore, the protection scope of the application shall be subject to the appended claims.

Claims (14)

1. A method of candidate motion vector generation, the method comprising:
acquiring candidate reference frames corresponding to pixel areas to be coded in a video frame;
invoking a target filter; the target filter is generated based on a reference frame identification corresponding to an encoded pixel region in the video frame, the encoded pixel region being an encoded pixel region associated with the pixel region to be encoded;
performing a hash operation on the frame identifier of the candidate reference frame to obtain a frame identifier hash value;
determining a matching result between the frame identification of the candidate reference frame and the reference frame identification corresponding to the encoded pixel region based on the frame identification hash value and the target filter;
and determining candidate motion vectors of the pixel region to be encoded based on the matching result and the motion vectors of the encoded pixel region.
2. The method according to claim 1, wherein the method further comprises:
acquiring a reference frame identifier corresponding to the encoded pixel region;
performing a hash operation on the reference frame identifier based on a hash function corresponding to an initial filter to obtain a reference frame identifier hash value;
determining a frame identifier insertion position corresponding to the reference frame identifier hash value in the initial filter;
and adjusting the value corresponding to the frame identifier insertion position in the initial filter to obtain the target filter.
3. The method of claim 2, wherein prior to the obtaining the reference frame identifier corresponding to the encoded pixel region, the method further comprises:
acquiring a coding mode identifier of the encoded pixel region;
and when the coding mode identifier indicates that the coding mode of the encoded pixel region is inter-frame prediction, performing the step of acquiring the reference frame identifier corresponding to the encoded pixel region.
4. The method of claim 1, wherein hashing the frame identification of the candidate reference frame to obtain a frame identification hash value comprises:
obtaining a hash function corresponding to the target filter;
and performing a hash operation on the frame identifier of the candidate reference frame based on the hash function to obtain a frame identifier hash value.
5. The method of claim 1, wherein the determining a match between the frame identification of the candidate reference frame and the reference frame identification corresponding to the encoded pixel region based on the frame identification hash value and the target filter comprises:
determining a target position corresponding to the frame identification hash value in the target filter;
obtaining a value to be matched corresponding to the target position in the target filter;
when the value to be matched is a first target value, determining that a target reference frame identifier matching the frame identifier of the candidate reference frame exists among the reference frame identifiers corresponding to the encoded pixel region;
and when the value to be matched is a second target value, determining that no target reference frame identifier matching the frame identifier of the candidate reference frame exists among the reference frame identifiers corresponding to the encoded pixel region.
6. The method of claim 1, wherein the determining the candidate motion vector for the pixel region to be encoded based on the matching result and the motion vector for the encoded pixel region comprises:
when the matching result indicates that a target reference frame identifier matching the frame identifier of the candidate reference frame exists among the reference frame identifiers corresponding to the encoded pixel region, acquiring a target motion vector corresponding to the target reference frame identifier from the motion vectors of the encoded pixel region;
determining the target motion vector as a candidate motion vector of the pixel region to be encoded;
and scaling the other motion vectors among the motion vectors of the encoded pixel region to obtain candidate motion vectors of the pixel region to be encoded.
7. The method of claim 6, wherein prior to the obtaining the target motion vector corresponding to the target reference frame identity from the motion vectors of the encoded pixel region, the method further comprises:
searching the reference frame identifiers of the encoded pixel region for a target reference frame identifier identical to the frame identifier of the candidate reference frame;
and when the target reference frame identifier is found, performing the step of acquiring the target motion vector corresponding to the target reference frame identifier from the motion vectors of the encoded pixel region.
8. The method of claim 6, wherein the method further comprises:
when the matching result indicates that no target reference frame identifier matching the frame identifier of the candidate reference frame exists among the reference frame identifiers corresponding to the encoded pixel region, acquiring the motion vector of the encoded pixel region;
and scaling the motion vector of the encoded pixel region to obtain a candidate motion vector of the pixel region to be encoded.
9. The method of claim 8, wherein the scaling the motion vector of the encoded pixel region to obtain a candidate motion vector of the pixel region to be encoded comprises:
determining a scaling factor based on the frame identifier of the candidate reference frame and the reference frame identifier corresponding to the encoded pixel region;
and scaling the motion vector of the encoded pixel region based on the scaling factor to obtain a candidate motion vector of the pixel region to be encoded.
10. The method of claim 9, wherein the determining a scaling factor based on the frame identification of the candidate reference frame and the reference frame identification corresponding to the encoded pixel region comprises:
determining a first identification interval between the candidate reference frame and the video frame based on a frame identification of the candidate reference frame;
determining a second identification interval between a reference frame of the encoded pixel region and the video frame based on the reference frame identification;
and determining the scaling factor based on the first identification interval and the second identification interval.
11. The method according to any one of claims 1 to 10, further comprising:
determining the coding cost of each candidate motion vector;
and selecting the motion vector of the pixel region to be encoded from the candidate motion vectors based on the coding cost.
12. A candidate motion vector generation device, the device comprising:
the candidate reference frame acquisition module is used for acquiring candidate reference frames corresponding to pixel areas to be coded in the video frames;
the filter invoking module is used for invoking the target filter; the target filter is generated based on a reference frame identifier corresponding to an encoded pixel region in the video frame, the encoded pixel region being an encoded pixel region associated with the pixel region to be encoded;
the hash value calculation module is used for carrying out hash operation on the frame identifications of the candidate reference frames to obtain frame identification hash values;
a matching result determining module, configured to determine a matching result between the frame identifier of the candidate reference frame and the reference frame identifier corresponding to the encoded pixel region based on the frame identifier hash value and the target filter;
and the candidate motion vector generation module is used for determining the candidate motion vector of the pixel region to be encoded based on the matching result and the motion vector of the encoded pixel region.
13. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 11 when the computer program is executed.
14. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 11.
CN202311064903.1A 2023-08-23 2023-08-23 Candidate motion vector generation method, candidate motion vector generation device, computer equipment and storage medium Active CN116760986B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311064903.1A CN116760986B (en) 2023-08-23 2023-08-23 Candidate motion vector generation method, candidate motion vector generation device, computer equipment and storage medium


Publications (2)

Publication Number Publication Date
CN116760986A (en) 2023-09-15
CN116760986B (en) 2023-11-14

Family

ID=87959517

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311064903.1A Active CN116760986B (en) 2023-08-23 2023-08-23 Candidate motion vector generation method, candidate motion vector generation device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116760986B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103595991A (en) * 2013-11-04 2014-02-19 天津大学 Method for predicting pixel level of depth video coding
CN108833923A (en) * 2018-06-20 2018-11-16 腾讯科技(深圳)有限公司 Video coding, coding/decoding method, device, storage medium and computer equipment
WO2019036080A1 (en) * 2017-08-15 2019-02-21 Google Llc Constrained motion field estimation for inter prediction
CN109743570A (en) * 2019-01-09 2019-05-10 北京工业大学 A kind of compression method of screen content video
KR20200081207A (en) * 2018-12-27 2020-07-07 에스케이텔레콤 주식회사 Method for representing motion information and apparatus using the same
CN111479115A (en) * 2020-04-14 2020-07-31 腾讯科技(深圳)有限公司 Video image processing method and device and computer readable storage medium
CN113287309A (en) * 2018-12-27 2021-08-20 Oppo广东移动通信有限公司 Code prediction method, device and computer storage medium
WO2022166727A1 (en) * 2021-02-05 2022-08-11 北京字节跳动网络技术有限公司 Screen content processing method and apparatus, and device
CN116193126A (en) * 2023-02-24 2023-05-30 上海哔哩哔哩科技有限公司 Video coding method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11190800B2 (en) * 2019-02-07 2021-11-30 Qualcomm Incorporated Motion vector predictor list generation for intra block copy mode in video coding
US11930159B2 (en) * 2019-11-29 2024-03-12 Intel Corporation Method and system of video coding with intra block copying




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
REG Reference to a national code: Ref country code: HK; Ref legal event code: DE; Ref document number: 40095369; Country of ref document: HK