CN115733988A - Video data processing method and device, computer equipment and storage medium
- Publication number: CN115733988A (application number CN202211372149.3A)
- Authority: CN (China)
- Prior art keywords: reference frame, unit, frame set, coded, coding
- Legal status: Pending (an assumption, not a legal conclusion)
Landscapes
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The embodiments of this application provide a video data processing method and apparatus, a computer device, and a storage medium, applicable to scenarios such as cloud technology, in-vehicle use, and audio/video encoding and decoding. The method includes: acquiring a target video frame from video data, and acquiring an associated coding unit associated with a unit to be coded in the target video frame, where the associated coding unit precedes the unit to be coded in coding order and is adjacent to it; determining, according to the associated coding unit, an available reference frame set corresponding to the unit to be coded in the video data; and, if a parent coding unit to which the unit to be coded belongs exists in the target video frame, determining a candidate reference frame set corresponding to the unit to be coded according to the parent coding unit and the available reference frame set, where the candidate reference frame set is traversed to find a target reference frame for the unit to be coded. With this method and apparatus, both the coding effect and the coding efficiency of the target video frame can be taken into account.
Description
Technical Field
The present application relates to the field of internet technologies, and in particular, to a method and an apparatus for processing video data, a computer device, and a storage medium.
Background
In data transmission scenarios (e.g., live streaming), the video data to be transmitted must be encoded into a video bitstream to improve transmission efficiency. During encoding, a unit to be coded is obtained from the target video frame currently being encoded, and inter prediction or intra prediction is performed on that unit. In the inter prediction mode, a reference frame selection algorithm is needed to determine, within the video data, the reference frame used for coding the unit to be coded.
Current reference frame selection algorithms may obtain the video frames whose coding order precedes the target video frame. One such algorithm computes, for each of these frames, its distance to the target video frame and its coding quality, superimposes the two values, sorts the combined scores in descending order, and takes the frame with the maximum score as the target reference frame for the unit to be coded in the target video frame. However, this algorithm considers only the distance and the coding quality, not the content similarity between the target reference frame and the target video frame: when the image content changes drastically between the two, the content difference is large, and coding the target video frame against such a reference frame noticeably degrades the coding effect. Another reference frame selection algorithm traverses the coded video frames, performing one pass of encoding for each possible reference frame combination to find the best reference frame. However, if many video frames precede the target video frame in coding order, this traversal consumes a large amount of time and reduces coding efficiency. Current reference frame selection algorithms therefore cannot achieve both good coding effect and high coding efficiency.
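For concreteness, the score-based variant described above can be sketched as follows. The negative-distance scoring is only one plausible reading of "superimposing" distance and coding quality, and all names (Frame, poc, quality) are illustrative assumptions rather than anything specified in the patent:

```python
# Minimal sketch of score-based reference frame selection (illustrative only).
from dataclasses import dataclass

@dataclass
class Frame:
    poc: int        # picture order count (play order position)
    quality: float  # coding quality of the reconstructed frame

def score_based_reference(coded_frames, target_poc):
    """Superimpose a distance term and a quality term; take the maximum."""
    def score(f):
        return -abs(target_poc - f.poc) + f.quality  # closer + better quality wins
    return max(coded_frames, key=score)

frames = [Frame(0, 0.90), Frame(1, 0.70), Frame(2, 0.95)]
print(score_based_reference(frames, target_poc=3).poc)  # 2
```

As the Background notes, nothing in this score reflects content similarity, which is exactly the gap the proposed method targets.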
Disclosure of Invention
The embodiments of this application provide a video data processing method and apparatus, a computer device, and a storage medium, which can take both the coding effect and the coding efficiency of a target video frame into account.
In one aspect, an embodiment of the present application provides a video data processing method, including:
acquiring a target video frame from video data, and acquiring an associated coding unit associated with a unit to be coded in the target video frame; the coding sequence of the associated coding unit is earlier than that of the unit to be coded, and the associated coding unit is adjacent to the unit to be coded;
determining an available reference frame set corresponding to a unit to be coded in video data according to the associated coding unit;
if a parent coding unit to which the unit to be coded belongs exists in the target video frame, determining a candidate reference frame set corresponding to the unit to be coded according to the parent coding unit and the available reference frame set; the candidate reference frame set is traversed to find a target reference frame for the unit to be coded; the target reference frame is used for coding the unit to be coded.
An aspect of an embodiment of the present application provides a video data processing apparatus, including:
the encoding unit acquisition module is used for acquiring a target video frame from the video data and acquiring an associated encoding unit associated with a unit to be encoded in the target video frame; the coding sequence of the associated coding unit is earlier than that of the unit to be coded, and the associated coding unit is adjacent to the unit to be coded;
the available set determining module is used for determining an available reference frame set corresponding to a unit to be coded in the video data according to the associated coding unit;
the candidate set determining module is used for determining a candidate reference frame set corresponding to the unit to be coded according to the parent coding unit and the available reference frame set if the parent coding unit to which the unit to be coded belongs exists in the target video frame; the candidate reference frame set is traversed to find a target reference frame for the unit to be coded; the target reference frame is used for coding the unit to be coded.
The number of the associated coding units is S, and S is a positive integer;
the available set determination module includes:
a first type obtaining unit, configured to obtain prediction coding types of the S associated coding units;
the first determining unit is used for acquiring a full reference frame set constructed for the unit to be coded in the video data if an associated coding unit whose prediction coding type is intra-frame prediction exists among the S associated coding units, and determining the full reference frame set as the available reference frame set corresponding to the unit to be coded;
and the second determining unit is used for determining an available reference frame set corresponding to the unit to be coded in the video data according to the S associated coding units if the prediction coding types of the S associated coding units are all inter-frame prediction.
Wherein the full reference frame set comprises a forward full reference frame set and a backward full reference frame set;
a first determining unit, specifically configured to obtain, in video data, an encoded video frame whose encoding order is earlier than that of a target video frame;
a first determining unit, configured to add the encoded video frame to the forward full reference frame set if the playing order of the encoded video frame is earlier than the target video frame;
the first determining unit is specifically configured to add the encoded video frame to the backward full reference frame set if the playing order of the encoded video frame is later than that of the target video frame.
The second determining unit is specifically configured to obtain a threshold of the number of coding units associated with the video encoder; the coding unit number threshold is greater than or equal to S;
the second determining unit is specifically configured to, if S is smaller than the threshold of the number of coding units, acquire a full reference frame set constructed for the unit to be coded in the video data, and determine the full reference frame set as the available reference frame set corresponding to the unit to be coded;
the second determining unit is specifically configured to determine, if S is equal to the threshold of the number of coding units, an available reference frame set corresponding to the unit to be coded in the video data according to the reference frames used by the S associated coding units.
The number of the associated coding units is S, and S is a positive integer;
the available set determination module includes:
the full set determining unit is used for acquiring a full reference frame set constructed for the unit to be coded in the video data if the prediction coding types of the S associated coding units are all inter-frame prediction and S is equal to the coding unit number threshold associated with the video encoder; the full reference frame set comprises a forward full reference frame set and a backward full reference frame set;
a reference frame determining unit, configured to obtain the forward reference frame closest in position to the target video frame in the forward full reference frame set, and obtain the backward reference frame closest in position to the target video frame in the backward full reference frame set;
and the reference frame merging unit is used for merging the reference frames used by the S associated coding units with the forward reference frame and the backward reference frame to obtain the available reference frame set corresponding to the unit to be coded.
Wherein the available reference frame set comprises a forward available reference frame set and a backward available reference frame set;
a reference frame merging unit, specifically configured to determine a union of reference frames used by the S associated encoding units as an associated reference frame set;
the reference frame merging unit is specifically used for determining, if the associated reference frame set does not include the forward reference frame and the backward reference frame, the reference frames in the associated reference frame set whose play order is earlier than that of the target video frame, together with the forward reference frame, as the forward available reference frame set corresponding to the unit to be coded;
the reference frame merging unit is specifically configured to determine the reference frames in the associated reference frame set whose play order is later than that of the target video frame, together with the backward reference frame, as the backward available reference frame set corresponding to the unit to be coded.
Wherein the candidate set determination module comprises:
a second type acquiring unit for acquiring a predictive coding type of the parent coding unit;
the third determining unit is used for determining the available reference frame set as a candidate reference frame set corresponding to the unit to be coded if the prediction coding type of the parent coding unit is intra-frame prediction;
and the fourth determining unit is used for determining the candidate reference frame set corresponding to the unit to be coded according to the reference frame used by the parent coding unit and the available reference frame set if the predictive coding type of the parent coding unit is inter-frame prediction.
The fourth determining unit is specifically configured to acquire an inter-frame coding mode of the parent coding unit;
a fourth determining unit, configured to determine, if the inter-frame coding mode is not the inter-frame skip mode, the available reference frame set as a candidate reference frame set corresponding to the unit to be coded;
and the fourth determining unit is specifically configured to, if the inter-frame coding mode is an inter-frame skip mode, match the reference frame used by the parent coding unit with the available reference frame set to obtain a candidate reference frame set corresponding to the unit to be coded.
Wherein the candidate set determination module comprises:
a reference frame matching unit, configured to match a reference frame used by the parent coding unit with an available reference frame set if the prediction coding type of the parent coding unit is inter prediction and the inter coding mode of the parent coding unit is inter skip mode;
a fifth determining unit, configured to determine, if an intersection exists between the reference frame used by the parent encoding unit and a reference frame in the available reference frame set, the intersection between the reference frame used by the parent encoding unit and the reference frame in the available reference frame set as a candidate reference frame set corresponding to the unit to be encoded;
and the sixth determining unit is used for determining the available reference frame set as the candidate reference frame set corresponding to the unit to be coded if no intersection exists between the reference frame used by the parent coding unit and the reference frame in the available reference frame set.
Wherein the available reference frame set comprises a forward available reference frame set and a backward available reference frame set; the candidate reference frame set comprises a forward candidate reference frame set and a backward candidate reference frame set;
a fifth determining unit, specifically configured to determine, as the forward candidate reference frame set corresponding to the unit to be coded, the reference frames that are used by the parent coding unit, are earlier than the target video frame in play order, and belong to the forward available reference frame set;
and the fifth determining unit is specifically configured to determine, as the backward candidate reference frame set corresponding to the unit to be coded, the reference frames that are used by the parent coding unit, are later than the target video frame in play order, and belong to the backward available reference frame set.
Wherein, the apparatus further includes:
and the set transfer module is used for determining the available reference frame set as a candidate reference frame set corresponding to the unit to be coded if the parent coding unit to which the unit to be coded belongs does not exist in the target video frame.
An aspect of an embodiment of the present application provides a computer device, including: a processor and a memory;
the processor is connected with the memory, wherein the memory is used for storing a computer program, and the computer program causes the computer device to execute the method provided by the embodiment of the application when being executed by the processor.
An aspect of the embodiments of the present application provides a computer-readable storage medium, which stores a computer program, where the computer program is adapted to be loaded and executed by a processor, so as to enable a computer device having the processor to execute the method provided by the embodiments of the present application.
An aspect of an embodiment of the present application provides a computer program product, which includes a computer program, and the computer program is stored in a computer readable storage medium. The processor of the computer device reads the computer program from the computer-readable storage medium, and the processor executes the computer program, so that the computer device executes the method provided by the embodiment of the application.
Therefore, the embodiments of this application provide a fast reference frame selection algorithm that exploits the prediction correlation of video content within a target video frame (that is, coding units in the same frame are highly likely to use the same reference frames). On one hand, the algorithm obtains the associated coding units adjacent to the unit to be coded and uses them to restrict the reference frames of the unit to be coded, yielding an available reference frame set; on the other hand, it obtains the parent coding unit to which the unit to be coded belongs and uses it to restrict the child coding unit, yielding a candidate reference frame set. With this fast algorithm, a candidate reference frame set fusing the reference frames used by the associated coding units and those used by the parent coding unit can be selected from all video frames. Because these reference frames are determined by the associated coding units and the parent coding unit, they have a higher content similarity with the target video frame. The encoder therefore only needs to traverse the smaller candidate reference frame set rather than all coded video frames, which reduces traversal time, while the higher content similarity of the candidates means the target reference frame with the best coding effect can still be found in the traversal result. Both the coding effect and the coding efficiency of the target video frame are thus taken into account.
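The two-stage narrowing summarized above can be illustrated with a compact, runnable sketch. Frames are represented by play-order indices (POC), and the function names and data shapes are assumptions made for illustration, not the patent's implementation (mode checks such as the inter skip condition are omitted):

```python
# Hedged sketch: neighbours bound the available set, the parent CU bounds
# the candidate set. POC integers stand in for video frames.

def available_set(neighbor_refs, nearest_fwd, nearest_bwd):
    """Union of the reference frames used by adjacent coded CUs, with the
    nearest forward/backward frames always kept available."""
    refs = set().union(*neighbor_refs) if neighbor_refs else set()
    refs.update({nearest_fwd, nearest_bwd})
    return refs

def candidate_set(parent_refs, available):
    """Intersect with the parent CU's references when possible; otherwise
    fall back to the available set (the 'no intersection' case)."""
    inter = set(parent_refs) & available
    return inter if inter else available

neighbors = [{1, 2}, {2, 4}]                    # refs used by adjacent CUs
avail = available_set(neighbors, nearest_fwd=2, nearest_bwd=4)
print(candidate_set({2, 4, 5}, avail))          # {2, 4}
print(candidate_set({7}, avail))                # no overlap -> {1, 2, 4}
```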
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the related art, the drawings needed in the description of the embodiments or the related art are briefly introduced below. The drawings in the following description are obviously only some embodiments of the present application; other drawings can be derived from them by those skilled in the art without creative effort.
Fig. 1 is a schematic structural diagram of a network architecture according to an embodiment of the present application;
fig. 2 is a schematic view of a scenario for performing data interaction according to an embodiment of the present application;
fig. 3 is a schematic flowchart of a video data processing method according to an embodiment of the present application;
fig. 4 is a schematic view of a scenario for acquiring an associated coding unit according to an embodiment of the present application;
FIG. 5 is a schematic view of a scene for acquiring a forward reference frame and a backward reference frame according to an embodiment of the present application;
FIG. 6 is a flow chart illustrating fast reference frame selection according to an embodiment of the present application;
fig. 7 is a schematic flowchart of a video data processing method according to an embodiment of the present application;
fig. 8 is a schematic flowchart of a video data processing method according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a video data processing apparatus according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Specifically, please refer to fig. 1, which is a schematic structural diagram of a network architecture according to an embodiment of the present disclosure. As shown in fig. 1, the network architecture may include a server 2000 and a terminal device cluster. The terminal device cluster may include one or more terminal devices; the number of terminal devices in the cluster is not limited here. As shown in fig. 1, the terminal devices may include a terminal device 3000a, a terminal device 3000b, a terminal device 3000c, …, and a terminal device 3000n. The terminal devices 3000a through 3000n may each be directly or indirectly connected to the server 2000 over a network in a wired or wireless manner, so that each terminal device may exchange data with the server 2000 through the network connection.
Each terminal device in the terminal device cluster may be an intelligent terminal with a data processing function, such as a smartphone, a tablet computer, a notebook computer, a desktop computer, an intelligent voice interaction device, a smart home appliance (e.g., a smart TV), a wearable device, a vehicle-mounted terminal, or an aircraft. It should be understood that each terminal device in the cluster shown in fig. 1 may have an application client installed; when the application client runs on a terminal device, it may exchange data with the server 2000. The application client may be a social client, a multimedia client (e.g., a video client), an entertainment client (e.g., a game client), an education client, a live broadcast client, or another client with a video coding function. The application client may be an independent client or an embedded sub-client integrated in another client, which is not limited here.
The server 2000 may be the server corresponding to the application client. The server 2000 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.
For convenience of understanding, in the embodiments of the present application, one terminal device may be selected as a target terminal device from a plurality of terminal devices shown in fig. 1. For example, the terminal device 3000c shown in fig. 1 may be used as a target terminal device in the embodiment of the present application, and an application client having a video coding function may be integrated in the target terminal device. At this time, the target terminal device may implement data interaction with the server 2000 through the application client.
It should be understood that the video data processing method provided by the embodiment of the present application may be executed by a computer device having a video encoding function, and the computer device may implement data encoding and data transmission on multimedia data (e.g., video data) through a cloud technology. The video data processing method provided in the embodiment of the present application may be executed by the server 2000 (that is, the computer device may be the server 2000), may also be executed by the target terminal device (that is, the computer device may be the target terminal device), and may also be executed by both the server 2000 and the target terminal device. In other words, the server 2000 may perform encoding processing on the video data by using the video data processing method provided in the embodiment of the present application, and further send the video code stream obtained through the encoding processing to the target terminal device, and the target terminal device may decode and play the video code stream. Alternatively, the target terminal device may also perform encoding processing on the video data by using the video data processing method provided in the embodiment of the present application, and further send the video code stream obtained through the encoding processing to the server 2000. Optionally, the target terminal device may further send the video code stream obtained through the encoding processing to other terminal devices in the terminal device cluster.
Cloud technology is a hosting technology that unifies resources such as hardware, software, and networks in a wide area network or local area network to realize the computation, storage, processing, and sharing of data. It is a general term for the network, information, integration, management platform, and application technologies applied in the cloud computing business model; these resources can form a pool and be used on demand, flexibly and conveniently. Cloud computing will become an increasingly important backbone: background services of technical network systems, such as video websites, picture websites, and web portals, require large amounts of computing and storage resources. With the development of the internet industry, each item may carry its own identification mark that must be transmitted to a background system for logical processing; data of different levels are processed separately, and all kinds of industry data require strong backend support, which can be provided through cloud computing.
It can be understood that the network framework may be applied to video call scenarios, video transmission scenarios, cloud conference scenarios, live broadcast scenarios, cloud game scenarios, and the like; the specific service scenarios are not listed one by one here. Cloud gaming, also called gaming on demand, is an online gaming technology based on cloud computing. It enables light-end devices (thin clients) with relatively limited graphics processing and data computing capabilities to run high-quality games. In a cloud game scenario, the game runs not on the player's game terminal but on a cloud server, which renders the game scene into audio and video streams and transmits them to the player's terminal over the network. The player's terminal does not need strong graphics and data processing capabilities; it only needs basic streaming media playback capability and the ability to capture player input instructions and send them to the cloud server.
A cloud conference is an efficient, convenient, and low-cost conference form based on cloud computing. Through a simple, easy-to-use internet interface, users can quickly and efficiently share voice, data files, and video with teams and clients around the world, while complex technologies such as the transmission and processing of conference data are handled by the cloud conference service provider. At present, domestic cloud conferences mainly focus on service content in the Software as a Service (SaaS) mode, including service forms such as telephone, network, and video; video conferences based on cloud computing are called cloud conferences. In the cloud conference era, data transmission, processing, and storage are all handled by the computing resources of the video conference vendor. Users do not need to purchase expensive hardware or install complicated software; they can hold efficient teleconferences simply by opening a browser and logging in to the corresponding interface. The cloud conference system supports dynamic multi-server cluster deployment and provides multiple high-performance servers, which greatly improves conference stability, security, and usability. In recent years, video conferencing has been welcomed by many users because it greatly improves communication efficiency, continuously reduces communication cost, and upgrades internal management; it has been widely applied in fields such as transportation, finance, telecom operators, education, enterprises, and the internet of vehicles. After video conferencing adopts cloud computing, it becomes even more attractive in convenience, speed, and usability, which will surely stimulate a new wave of video conference applications.
It should be understood that a computer device (e.g., the target terminal device) with a video coding function may encode video data through a video encoder to obtain the video stream corresponding to the video data, thereby improving transmission efficiency. For example, the video encoder may be an HEVC (High Efficiency Video Coding) video encoder, a VVC (Versatile Video Coding) video encoder, or the like. The VVC video encoder is also called the H.266 video encoder; the Versatile Video Coding standard specifies the decoding flow and syntax as well as the coding flow and syntax of H.266.
H.266 is a coding standard whose bitrate, at the same subjective quality, is only about 50% of that of the previous-generation HEVC standard. This greatly helps today's massive video service data, because a video stream of the same quality requires less storage space and less bandwidth. However, the encoding complexity of the H.266 video encoder has correspondingly increased severalfold, since the new standard introduces more sophisticated coding tools to achieve higher compression ratios. High coding complexity means encoding requires more computing resources and more time; for low-delay services such as live broadcast, it directly degrades the user experience. It is therefore meaningful to preserve the low-bitrate capability of the video encoder while reducing its encoding complexity as much as possible.
For convenience of understanding, in the embodiments of the present application, a video frame to be encoded in the video data may be referred to as a target video frame, and a basic coding unit to be encoded in the target video frame may be referred to as a unit to be coded. The unit to be coded may be a coding unit (CU) to be encoded; the CU is the basic coding unit in an H.266 video encoder.
It is understood that the target video frame may have different video frame types (i.e., frame types), and the reference frame selected when coding a unit to be coded in the target video frame differs with the frame type. The frame types here may include a first type, a second type, and a third type. In this embodiment, the frame type of an intra-coded frame (I frame) is referred to as the first type, that of a bidirectionally predicted frame (B frame) as the second type, and that of a forward-predicted frame (P frame) as the third type.
It is understood that the video data in the embodiments of the present application may be any video data that needs to be encoded in a service scenario. For example, the video data may be captured directly by an image collector (e.g., a camera) of the terminal device, recorded in real time by the image collector during a live broadcast or video call, downloaded by the terminal device over a network, or obtained by the terminal device from a server during a game or conference.
For easy understanding, please refer to fig. 2, and fig. 2 is a schematic diagram of a scenario for performing data interaction according to an embodiment of the present application. The server 20a shown in fig. 2 may be the server 2000 in the embodiment corresponding to fig. 1, and the terminal device 20b shown in fig. 2 may be the target terminal device in the embodiment corresponding to fig. 1. For convenience of understanding, in the embodiment of the present application, the terminal device 20b is taken as an example of a transmitting end for transmitting a video stream of video data, and the server 20a is taken as an example of a receiving end for receiving the video stream of video data.
It should be understood that terminal device 20b may obtain video data (e.g., video data 21 a). One or more video frames may be included in the video data 21a, and the number of video frames in the video data 21a is not limited in the embodiment of the present application. Further, the terminal device 20b needs to perform encoding processing on the video data 21a by a video encoder (e.g., an h.266 video encoder) to generate a video code stream associated with the video data 21a.
As shown in fig. 2, when performing encoding processing on video data 21a, terminal device 20b may acquire a target video frame (e.g., video frame 21 b) that needs to be subjected to encoding processing from video data 21a, further acquire a unit to be encoded from video frame 21b, and further acquire an associated encoding unit associated with the unit to be encoded from video data 21a. The coding order of the associated coding units is earlier than that of the units to be coded (namely, the associated coding units are coded completely before the units to be coded start to be coded), and the associated coding units are adjacent to the units to be coded. It should be understood that the number of associated coding units is not limited in the embodiments of the present application, and the embodiments of the present application take the example that there is an associated coding unit associated with a unit to be coded.
Further, the terminal device 20b may perform encoding processing on the unit to be encoded based on the encoding policy of the video encoder to obtain a compressed code stream corresponding to the unit to be encoded. It should be understood that when the terminal device 20b completes the encoding process on each unit to be encoded in the target video frame, the compressed code stream corresponding to each unit to be encoded may be obtained, and further when the encoding process on each video frame in the video data 21a is completed, the compressed code stream corresponding to each unit to be encoded may be encapsulated into the video code stream associated with the video data 21a, so as to complete the encoding process on the video data 21a.
The coding strategy of the video encoder may include an intra prediction mode (i.e., intra-frame predictive coding) and an inter prediction mode (i.e., inter-frame predictive coding), collectively referred to as coding prediction techniques. Intra prediction (intra coding) means the current frame is coded without referring to information from other frames; inter prediction (inter coding) means the current frame is predicted using information from adjacent frames. When performing inter prediction on a unit to be coded in a target video frame, the video encoder may select one frame from either the forward reference frame list or the backward reference frame list as the reference frame (unidirectional prediction), or select one frame from each of the two lists, two frames in total (bidirectional prediction). Optionally, unidirectional prediction may also be restricted to selecting a frame only from the forward reference frame list and not from the backward one. A video frame of the second type (B frame) may be inter-predicted using unidirectional or bidirectional prediction, while a video frame of the third type (P frame) may use only unidirectional prediction.
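As a sketch of the two-list selection just described, the following assumes each list holds POC values sorted by play order, so the nearest forward frame is the last forward entry and the nearest backward frame is the first backward entry; this ordering and the function name are illustrative assumptions:

```python
# Illustrative only: unidirectional prediction draws one frame from one list;
# bidirectional prediction draws one frame from each list.
from typing import List, Optional, Tuple

def pick_references(fwd_list: List[int], bwd_list: List[int],
                    bidirectional: bool) -> Tuple[int, Optional[int]]:
    fwd = fwd_list[-1]                # nearest forward frame in play order
    if bidirectional and bwd_list:
        return fwd, bwd_list[0]       # plus the nearest backward frame
    return fwd, None                  # unidirectional: forward frame only

print(pick_references([0, 1, 2], [4, 5], bidirectional=True))   # (2, 4)
print(pick_references([0, 1, 2], [4, 5], bidirectional=False))  # (2, None)
```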
It should be understood that the embodiments of the present application may be applied to reference frame selection in the inter prediction mode. As shown in fig. 2, the terminal device 20b may determine, according to the associated coding unit, the available reference frame set corresponding to the unit to be coded in the video data 21a. The available reference frame set may include one or more video frames; the number of reference frames in the set is not limited here, and all of them are already coded video frames. Alternatively, if no associated coding unit exists for the unit to be coded, the terminal device 20b may determine the video frames of the video data 21a whose coding order precedes the target video frame as the available reference frame set corresponding to the unit to be coded.
Further, as shown in fig. 2, the terminal device 20b may determine whether a parent coding unit to which the unit to be coded belongs exists in the video frame 21b, and if the parent coding unit to which the unit to be coded belongs exists in the target video frame, determine a candidate reference frame set corresponding to the unit to be coded according to the parent coding unit and the available reference frame set, where the candidate reference frame set may include one or more video frames. Alternatively, if there is no parent coding unit to which the unit to be coded belongs, the terminal device 20b may determine the available reference frame set as a candidate reference frame set corresponding to the unit to be coded.
Further, as shown in fig. 2, after determining the candidate reference frame set, the terminal device 20b may traverse the candidate reference frame set to obtain a target reference frame, perform encoding processing on the unit to be encoded based on the target reference frame to obtain a compressed code stream corresponding to the unit to be encoded, and further obtain a video code stream associated with the video data 21a. At this time, the terminal device 20b may transmit the video stream associated with the video data 21a to the server 20a, so that the server 20a may perform decoding processing on the video stream through a video decoder when receiving the video stream, to obtain the video data 21a.
It can be understood that the compressed code stream corresponding to the unit to be coded may include, but is not limited to, a motion vector, a reference frame index, a reference frame list, and the like; the server 20a may generate inter-predicted pixel values from this information, that is, restore the unit to be coded. The reference frame index is an index for locating a specific reference frame in a reference frame list; it can locate, in the reference frame list, the specific reference frame used when coding the unit to be coded. The reference frame list may be a full reference frame list or a candidate reference frame list, which is not limited in this application.
It can be seen that, when a unit to be coded in a target video frame needs to be encoded, the embodiments of this application can obtain the associated coding units associated with the unit and the parent coding unit to which it belongs from the target video frame, and determine the candidate reference frame set corresponding to the unit accordingly. The reference frames in the candidate set are those associated with the associated coding units and the parent coding unit. Given the correlation between the video content of the unit to be coded and that of its parent coding unit, and between the unit to be coded and its associated coding units, these reference frames have a high content similarity with the target video frame. Therefore, when coding the unit to be coded based on the candidate reference frame set, the encoder can traverse only this set instead of all coded video frames. This both preserves the coding effect of the target video frame and simplifies reference frame selection, effectively reducing the share of reference frame selection in the complexity of the whole coding process. It thereby lowers the computational complexity of inter coding in the video encoder and reduces the overhead in coding time (i.e., improves coding efficiency), computing resources, and bandwidth resources.
A specific implementation manner of determining the candidate reference frame set in the video data by the computer device having the video coding function may refer to the following embodiments corresponding to fig. 3 to fig. 8.
Further, please refer to fig. 3, wherein fig. 3 is a schematic flowchart of a video data processing method according to an embodiment of the present application. The method may be executed by a server, or may be executed by a terminal device, or may be executed by both the server and the terminal device, where the server may be the server 20a in the embodiment corresponding to fig. 2, and the terminal device may be the terminal device 20b in the embodiment corresponding to fig. 2. For convenience of understanding, the embodiment of the present application is described as an example in which the method is executed by a terminal device. The video data processing method may include the following steps S101 to S103:
step S101, a target video frame is obtained from video data, and a related coding unit related to a unit to be coded in the target video frame is obtained;
Specifically, the terminal device may obtain a video frame to be encoded from the video data and determine it as the target video frame. Further, the terminal device may perform image block division on the target video frame through the video encoder to obtain one or more image blocks (i.e., coding blocks) of the target video frame, and then obtain the unit to be coded from these image blocks. The purpose of image block division is to use smaller blocks for finely moving regions and larger blocks for static background, enabling more accurate prediction; in the embodiments of this application, a coding unit CU may be referred to as an image block. Further, the terminal device may obtain, according to the position of the unit to be coded among the image blocks, the image blocks satisfying the association condition, and determine them as the associated coding units. The coding order of an associated coding unit is earlier than that of the unit to be coded, and the associated coding unit is adjacent to the unit to be coded. The association condition is therefore that the obtained image blocks precede the unit to be coded in coding order and are adjacent to it.
For easy understanding, please refer to fig. 4, which is a schematic view of a scenario for obtaining associated coding units according to an embodiment of the present application. As shown in fig. 4, the position diagram 44a may include a unit to be coded and its associated coding units. In the position diagram 44a, E may represent the unit to be coded, while A, B, C, and D represent associated coding units, also called adjacent CUs: A is the "left" CU, B the "top-left" CU, C the "top" CU, and D the "top-right" CU. The position diagram 44a draws the unit to be coded large and the associated coding units small, but this is only illustrative; in practice an associated coding unit is not necessarily smaller than the unit to be coded, and the embodiments of this application do not limit the sizes of either.
It can be understood that, according to the encoding standard, the associated encoding unit a, the associated encoding unit B, the associated encoding unit C and the associated encoding unit D are encoded before the encoding of the unit to be encoded E is started. It should be understood that the sizes of the unit to be encoded E, the associated encoding unit a, the associated encoding unit B, the associated encoding unit C and the associated encoding unit D are not limited in the embodiments of the present application. The sizes of the unit to be encoded E, the associated encoding unit a, the associated encoding unit B, the associated encoding unit C, and the associated encoding unit D are determined by a partitioning manner of the video encoder, for example, the partitioning manner may be a quadtree, a binary tree, a ternary tree, or the like, and the partitioning manner is not limited in this embodiment of the present application.
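The neighbour lookup of fig. 4 can be sketched on a simplified fixed grid. A real encoder addresses variable-sized CUs by pixel position, so the grid index, the dictionary of coded CUs, and the function name here are purely illustrative assumptions:

```python
# Collect the coded neighbours A (left), B (top-left), C (top), D (top-right)
# of the unit at grid position (row, col), skipping positions not yet coded.

def associated_units(coded: dict, row: int, col: int) -> list:
    positions = [(row, col - 1),      # A: left
                 (row - 1, col - 1),  # B: top-left
                 (row - 1, col),      # C: top
                 (row - 1, col + 1)]  # D: top-right
    return [coded[p] for p in positions if p in coded]

coded = {(0, 0): "CU-a", (0, 1): "CU-b", (0, 2): "CU-c", (1, 0): "CU-d"}
print(associated_units(coded, row=1, col=1))  # ['CU-d', 'CU-a', 'CU-b', 'CU-c']
```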
Optionally, in this embodiment of the present application, a limit may be added to the size of the unit to be encoded (that is, a size limit is added), for example, when the number of pixels of the unit to be encoded exceeds a pixel threshold (for example, 512 pixels), the fast policy provided in this embodiment of the present application is executed, where the fast policy is step S101 to step S103 in this embodiment of the present application.
The image block division schematic diagram 44b shown in fig. 4 may be an image block division schematic diagram of a target video frame to be subjected to encoding processing, which is obtained from video data, and the image block division schematic diagram 44b may be a schematic diagram obtained by performing image block division processing by an h.266 video encoder. Similarly, the present application does not limit the specific partitioning form in the image block partitioning diagram 44b, and the specific partitioning form in the image block partitioning diagram 44b satisfies the partitioning manner specified by the video encoder.
The image block division diagram 44b shows that the target video frame may be divided into (an image block 40a, an image block 40b, an image block 40c, an image block 40d, an image block 40e, an image block 40f, an image block 40g, an image block 40h, an image block 40i, an image block 40 j), (an image block 41 a), (an image block 42a, an image block 42b, an image block 42c, an image block 42d, an image block 42 e), (an image block 43a, an image block 43b, an image block 43c, an image block 43d, an image block 43e, an image block 43f, an image block 43g, an image block 43h, an image block 43i, and an image block 43 j).
The (image block 40a, image block 40b, image block 40c, image block 40d, image block 40e, image block 40f, image block 40g, image block 40h, image block 40i, and image block 40 j) may be divided into (image block 40a, image block 40b, image block 40c, image block 40 d), (image block 40e, image block 40 f), (image block 40 g), (image block 40h, image block 40i, and image block 40 j). Further, (image block 40a, image block 40b, image block 40c, image block 40 d) may be divided into (image block 40 a), (image block 40 b), (image block 40 c), (image block 40 d), (image block 40e, image block 40 f) may be divided into (image block 40 e), (image block 40 f), (image block 40h, image block 40i, image block 40 j) may be divided into (image block 40 h), (image block 40 i), (image block 40 j). Similarly, the (image block 42a, image block 42b, image block 42c, image block 42d, image block 42 e) and (image block 43a, image block 43b, image block 43c, image block 43d, image block 43e, image block 43f, image block 43g, image block 43h, image block 43i, and image block 43 j) may be divided, and will not be described herein again.
For ease of understanding, the image block 42e shown in fig. 4 may serve as the unit to be coded E; in this case, the image block 42d may serve as the associated coding unit A, the image block 40g as the associated coding unit B, the image block 40j as the associated coding unit C, and the image block 41a as the associated coding unit D. Alternatively, the image block 42d shown in fig. 4 may serve as the unit to be coded E; in this case, (the image block 42a, the image block 42b, and the image block 42c) may serve as the associated coding unit A, the image block 40g as the associated coding unit B, the image block 40j as the associated coding unit C, and the image block 41a as the associated coding unit D.
It should be understood that the terminal device may obtain an encoding policy of a video encoder (e.g., h.266 video encoder), and perform encoding processing on the unit to be encoded based on the encoding policy of the video encoder. The coding modes associated with the coding strategy can include an inter-frame prediction mode and an intra-frame prediction mode, so that when the terminal device performs inter-frame prediction processing on the unit to be coded, the terminal device can determine a reference video frame associated with the unit to be coded based on the frame type of the target video frame. Wherein different video compression standards may correspond to different reference video frames. If the frame type of the target video frame is a B frame (i.e., the second type) or a P frame (i.e., the third type), the terminal device may perform the following steps S102 to S103; alternatively, if the frame type of the target video frame is an I frame (i.e., the first type), the terminal device does not need to perform the following steps S102 to S103.
The H.266 video encoder encodes in a block division manner: an image block is divided into several CUs, and the division may be nested, with a CU serving as a new image block that is further divided into several CUs until the minimum CU size limit is reached. The CU is thus the basic unit of coding prediction.
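A minimal recursion in the spirit of this nested division is sketched below, restricted to quadtree splits for brevity (H.266 also permits binary and ternary splits); the minimum CU size and the split-decision callback are stand-in assumptions:

```python
# Recursively split a square block into leaf CUs until the split callback
# declines or the minimum CU size is reached.

MIN_CU = 8  # minimum CU edge length in pixels (illustrative)

def split_cu(x, y, size, needs_split):
    """Yield leaf CUs as (x, y, size) tuples."""
    if size > MIN_CU and needs_split(x, y, size):
        half = size // 2
        for dx in (0, half):
            for dy in (0, half):
                yield from split_cu(x + dx, y + dy, half, needs_split)
    else:
        yield (x, y, size)

# Split the 32x32 block once, then split only its top-left quadrant again:
leaves = list(split_cu(0, 0, 32, lambda x, y, s: s == 32 or (x < 16 and y < 16)))
print(len(leaves))  # 7 leaf CUs: four 8x8 blocks plus three 16x16 blocks
```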
The number of the associated coding units is S, where S may be a non-negative integer. It is understood that if S is a positive integer, the terminal device may perform the following steps S102 to S103. Optionally, if S is equal to 0, the terminal device may obtain a full reference frame set constructed for the unit to be encoded in the video data, and determine the full reference frame set as an available reference frame set corresponding to the unit to be encoded.
Step S102, determining an available reference frame set corresponding to a unit to be coded in video data according to an associated coding unit;
Specifically, if the prediction coding types of the S associated coding units are all inter prediction, and S is equal to the coding unit number threshold associated with the video encoder, the terminal device may obtain, in the video data, the full reference frame set constructed for the unit to be coded. The full reference frame set comprises a forward full reference frame set and a backward full reference frame set. Further, the terminal device may obtain the forward reference frame closest in position to the target video frame in the forward full reference frame set, and the backward reference frame closest in position to the target video frame in the backward full reference frame set. Further, the terminal device may merge the reference frames used by the S associated coding units with the forward reference frame and the backward reference frame to obtain the available reference frame set corresponding to the unit to be coded.
It will be appreciated that, when performing inter prediction, the video encoder may construct a reference frame list for the target video frame. The list comprises two parts: a forward reference frame list (i.e., the forward full reference frame set) and a backward reference frame list (i.e., the backward full reference frame set). The forward reference frame list contains video frames that precede the current frame (i.e., the target video frame) in both coding order and play order, while the backward reference frame list contains video frames that precede the current frame in coding order but follow it in play order. The number of video frames in the reference frame list is not limited in the embodiments of the present application.
It can be understood that, depending on the coding rules, the video frames in the full reference frame list have different probabilities of being selected as the best reference frame. Since video data has content continuity, contiguous areas of the same video frame are very likely to select the same reference frame (i.e., the reference frames of the same image block area are approximately the same). The embodiments of this application therefore propose to restrict the reference frames usable by the current CU (i.e., the unit to be coded) to those used by its neighboring CUs (i.e., the associated coding units). Specifically, before the encoder performs motion-search inter prediction on the current CU, the reference frames used by the neighboring CUs are collected, and only those reference frames can be selected as the reference frame of the current CU.
It can also be understood that, since video frames closer in play order have a higher probability of content similarity, frames whose play order is closer to the current frame to be coded (i.e., the target video frame) are more likely to be selected as reference frames. Therefore, to minimize the impact on the coding effect, the two frames in the reference frame lists (i.e., the forward and backward full reference frame sets) closest in play distance to the current frame are always kept available, even if neither is referenced by a neighboring CU (i.e., even if they do not belong to the reference frames used by the S associated coding units). The video frame in the forward full reference frame set closest in play distance to the target video frame is the forward reference frame, and the one in the backward full reference frame set closest in play distance is the backward reference frame.
For example, the forward full reference frame list may include video frames X1, X2, and X3, and the backward full reference frame list may include video frames X4, X5, and X6. Video frames X1 through X6 all precede the target video frame in coding order; X1, X2, and X3 precede the target video frame in display order, while X4, X5, and X6 follow it, so that in play order the frames are arranged as X1, X2, X3, the target video frame, X4, X5, X6. In this case, the terminal device may determine video frame X3 as the forward reference frame and video frame X4 as the backward reference frame.
Wherein the available reference frame set comprises a forward available reference frame set and a backward available reference frame set. It should be understood that the terminal device merges the reference frames used by the S associated coding units, the forward reference frame, and the backward reference frame to obtain the available reference frame set corresponding to the unit to be coded: the terminal device may determine the union of the reference frames used by the S associated coding units as an associated reference frame set. The terminal device may determine the reference frames in the associated reference frame set whose playing order is earlier than that of the target video frame as a forward associated reference frame set, and determine the reference frames in the associated reference frame set whose playing order is later than that of the target video frame as a backward associated reference frame set; the forward associated reference frame set and the backward associated reference frame set may be collectively referred to as the associated reference frame set. In other words, the terminal device may add the reference frames used by the S associated coding units whose playing order is earlier than that of the target video frame to the forward associated reference frame set, and add those whose playing order is later than that of the target video frame to the backward associated reference frame set. Further, if the associated reference frame set includes neither the forward reference frame nor the backward reference frame, the terminal device may determine the associated reference frame set, the forward reference frame, and the backward reference frame together as the available reference frame set corresponding to the unit to be encoded. In other words, in this case the terminal device may determine the reference frames in the associated reference frame set whose playing order is earlier than that of the target video frame, together with the forward reference frame, as the forward available reference frame set corresponding to the unit to be encoded; and determine the reference frames in the associated reference frame set whose playing order is later than that of the target video frame, together with the backward reference frame, as the backward available reference frame set corresponding to the unit to be encoded.
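For ease of understanding, the case analysis above collapses into a simple set construction, since adding the forward and backward reference frames to sets that may already contain them is harmless. The following is a minimal sketch under the assumption that reference frames are identified by their playing-order index (POC):

```python
def build_available_sets(assoc_refs, target_poc, forward_ref, backward_ref):
    """assoc_refs: union of the reference frames (as POCs) used by the S
    associated coding units; forward_ref/backward_ref: the frames closest
    to the target video frame in playing order, which stay available in
    every case."""
    forward_available = {poc for poc in assoc_refs if poc < target_poc}
    backward_available = {poc for poc in assoc_refs if poc > target_poc}
    forward_available.add(forward_ref)    # always available, even if no
    backward_available.add(backward_ref)  # associated CU referenced them
    return forward_available, backward_available
```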
For convenience of understanding, the embodiments of the present application are described by taking an example in which a forward associated reference frame set includes reference frames with a playing order earlier than that of a target video frame, and a backward associated reference frame set includes reference frames with a playing order later than that of the target video frame. Optionally, if the forward associated reference frame set does not include a reference frame (that is, the associated reference frame set does not include a reference frame whose playing order is earlier than that of the target video frame), the terminal device may determine the forward reference frame as a forward available reference frame set corresponding to the unit to be encoded; if the backward associated reference frame set does not include a reference frame (i.e., the associated reference frame set does not include a reference frame whose playing order is later than that of the target video frame), the terminal device may determine the backward reference frame as a backward available reference frame set corresponding to the unit to be encoded.
For example, taking S equal to 3 as an example, the S associated coding units may include associated coding unit P1, associated coding unit P2, and associated coding unit P3. The reference frames used by associated coding unit P1 may be reference frame W1 and reference frame W2; the reference frame used by associated coding unit P2 may be reference frame W1; and the reference frame used by associated coding unit P3 may be reference frame W3. Thus, the union of the reference frames used by the S associated coding units may be reference frame W1, reference frame W2, and reference frame W3, and these three reference frames can form the associated reference frame set. If reference frame W1 and reference frame W3 are played earlier than the target video frame and reference frame W2 is played later than the target video frame, then reference frame W1 and reference frame W3 can form the forward associated reference frame set, and reference frame W2 can form the backward associated reference frame set.
Optionally, if the associated reference frame set includes a forward reference frame and a backward reference frame, the associated reference frame set is determined as an available reference frame set corresponding to the unit to be encoded. In other words, if the associated reference frame set comprises a forward reference frame and a backward reference frame, determining the reference frame with the playing sequence earlier than that of the target video frame in the associated reference frame set as a forward available reference frame set corresponding to the unit to be encoded; and determining the reference frames with the playing sequence later than that of the target video frames in the associated reference frame set as a backward available reference frame set corresponding to the unit to be coded.
Optionally, if the associated reference frame set includes a forward reference frame and does not include a backward reference frame, the associated reference frame set and the backward reference frame are determined as an available reference frame set corresponding to the unit to be encoded. In other words, if the associated reference frame set includes a forward reference frame and does not include a backward reference frame, determining a reference frame in the associated reference frame set, which has a playing sequence earlier than that of the target video frame, as a forward available reference frame set corresponding to the unit to be encoded; and determining the reference frame and the backward reference frame which play sequence is later than that of the target video frame in the associated reference frame set as a backward available reference frame set corresponding to the unit to be coded.
For convenience of understanding, the embodiments of the present application will be described with reference to an example in which a reference frame in a later playing order than a target video frame is included in an associated reference frame set. Optionally, if the backward associated reference frame set does not include the reference frame, the terminal device may determine the backward reference frame as a backward available reference frame set corresponding to the unit to be encoded.
Optionally, if the associated reference frame set does not include a forward reference frame and includes a backward reference frame, the associated reference frame set and the forward reference frame are determined as an available reference frame set corresponding to the unit to be encoded. In other words, if the associated reference frame set does not include the forward reference frame and includes the backward reference frame, the reference frame and the forward reference frame which are earlier than the target video frame in the playing sequence in the associated reference frame set are determined as the forward available reference frame set corresponding to the unit to be encoded; and determining the reference frames with the playing sequence later than that of the target video frames in the associated reference frame set as a backward available reference frame set corresponding to the unit to be coded.
For convenience of understanding, the embodiments of the present application are described by taking an example in which the forward associated reference frame set includes reference frames that are earlier in playing order than the target video frame. Optionally, if the forward associated reference frame set does not include a reference frame, the terminal device may determine the forward reference frame as a forward available reference frame set corresponding to the unit to be encoded.
Optionally, if there is an associated coding unit of which the prediction coding type is intra-frame prediction in the S associated coding units, or S is smaller than the threshold of the number of coding units associated with the video encoder, the terminal device may determine the full-size reference frame set as an available reference frame set corresponding to the unit to be coded.
It should be understood that the embodiment of the present application does not limit the specific value of the coding unit number threshold, for example, when the video encoder is an h.266 video encoder, the coding unit number threshold may be equal to 4.
Step S103, if a parent coding unit to which a unit to be coded belongs exists in the target video frame, determining a candidate reference frame set corresponding to the unit to be coded according to the parent coding unit and the available reference frame set;
specifically, a specific process of the terminal device determining the candidate reference frame set corresponding to the unit to be encoded according to the parent encoding unit and the available reference frame set may be described as follows: if the prediction encoding type of the parent encoding unit is inter prediction and the inter encoding mode of the parent encoding unit is inter skip mode, the terminal device may match the reference frame used by the parent encoding unit with the available reference frame set. Further, if there is an intersection between the reference frame used by the parent coding unit and the reference frame in the available reference frame set, the terminal device may determine the intersection between the reference frame used by the parent coding unit and the reference frame in the available reference frame set as the candidate reference frame set corresponding to the unit to be coded. Optionally, if there is no intersection between the reference frame used by the parent coding unit and the reference frame in the available reference frame set, the terminal device may determine the available reference frame set as the candidate reference frame set corresponding to the unit to be coded.
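For ease of understanding, this parent-based restriction may be sketched as follows (illustrative only; the field names `pred_type`, `inter_mode`, and `ref_frames` are assumptions):

```python
def candidate_set_from_parent(parent_cu, available_set):
    """Narrow the candidate reference frames by the parent CU only when the
    parent was inter-coded in SKIP mode; otherwise keep the available set."""
    if parent_cu is None or parent_cu.pred_type != "inter" \
            or parent_cu.inter_mode != "skip":
        return set(available_set)
    overlap = set(parent_cu.ref_frames) & set(available_set)
    # An empty intersection falls back to the full available set.
    return overlap if overlap else set(available_set)
```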
It is understood that the inter-coding mode of the parent coding unit may be an inter SKIP mode (i.e., SKIP mode), an inter Merge mode (i.e., Merge mode), an AMVP (Advanced Motion Vector Prediction) mode, or the like. When the SKIP mode is used for inter-frame prediction of the parent coding unit, the parent coding unit does not need to transmit residual coefficients or an MVD (Motion Vector Difference); when the Merge mode is used for inter-frame prediction of the parent coding unit, the motion parameters of the parent coding unit can be obtained directly from an adjacent encoded coding unit (namely, an associated coding unit associated with the parent coding unit, where the associated coding unit associated with the parent coding unit refers to the associated coding unit obtained when the parent coding unit itself was the unit to be coded); an MVD is required when the AMVP mode is used for inter-frame prediction of the parent coding unit.
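For ease of understanding, the differences between the three inter-coding modes described above may be summarized as follows (a simplified, illustrative view; encoder-specific details are omitted):

```python
# Simplified comparison of the inter-coding modes (illustrative assumption).
INTER_MODE_PROPERTIES = {
    #  mode      residual coefficients  MVD       motion parameters
    "skip":  {"residual": False, "mvd": False, "motion": "inherited"},
    "merge": {"residual": True,  "mvd": False, "motion": "inherited"},
    "amvp":  {"residual": True,  "mvd": True,  "motion": "searched"},
}
```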
The SKIP mode is a special inter-frame prediction technique that directly reuses a block of content from a reference frame; restricting this rule to the SKIP mode improves the probability that the reference frames of the parent CU (i.e., the parent coding unit) and the child CU (i.e., the child coding unit) are the same. Here, parent CU and child CU are relative terms: the parent coding unit may be regarded as the parent CU of the unit to be coded, and the unit to be coded may be regarded as a child CU of the parent coding unit, so the child coding unit is the unit to be coded.
Wherein the available reference frame set comprises a forward available reference frame set and a backward available reference frame set, and the candidate reference frame set comprises a forward candidate reference frame set and a backward candidate reference frame set. The specific process of the terminal device determining the candidate reference frame set corresponding to the unit to be encoded may be described as follows: the terminal device may determine the reference frames, among those used by the parent coding unit, whose playing order is earlier than that of the target video frame, together with the forward available reference frame set, as the forward candidate reference frame set corresponding to the unit to be coded; and determine the reference frames, among those used by the parent coding unit, whose playing order is later than that of the target video frame, together with the backward available reference frame set, as the backward candidate reference frame set corresponding to the unit to be coded.
For convenience of understanding, the embodiments of the present application take as an example the case where the reference frames used by the parent coding unit include both reference frames whose playing order is earlier than that of the target video frame and reference frames whose playing order is later than that of the target video frame. Optionally, if the reference frames used by the parent coding unit do not include a reference frame whose playing order is earlier than that of the target video frame, the terminal device may determine the forward available reference frame set as the forward candidate reference frame set corresponding to the unit to be coded; if the reference frames used by the parent coding unit do not include a reference frame whose playing order is later than that of the target video frame, the terminal device may determine the backward available reference frame set as the backward candidate reference frame set corresponding to the unit to be coded.
Alternatively, if the prediction coding type of the parent coding unit is intra-frame prediction, the terminal device may determine the available reference frame set as a candidate reference frame set corresponding to the unit to be coded. Alternatively, if the prediction encoding type of the parent encoding unit is inter prediction and the inter encoding mode of the parent encoding unit is not inter skip mode, the terminal device may determine the available reference frame set as a candidate reference frame set corresponding to the unit to be encoded. Optionally, if there is no parent coding unit to which the unit to be coded belongs in the target video frame, the terminal device may determine the available reference frame set as a candidate reference frame set corresponding to the unit to be coded.
For ease of understanding, please refer to fig. 4 again. The image block 42e may be a unit to be encoded, and (image block 42a, image block 42e) may be the parent coding unit to which image block 42e belongs. Alternatively, image block 40d may be the unit to be encoded, and (image block 40b, image block 40d) may be the parent coding unit to which image block 40d belongs (meaning that image block 40b and image block 40d may belong to the same coding unit), or (image block 40c, image block 40d) may be the parent coding unit to which image block 40d belongs (meaning that image block 40c and image block 40d may belong to the same coding unit).
The candidate reference frame set is used for traversing a target reference frame for the unit to be coded; the target reference frame is used for coding the unit to be coded. It should be understood that the terminal device may determine the video frames in the candidate reference frame set as the reference video frames associated with the unit to be encoded. The video encoder cannot determine in advance which reference video frames to select for the encoding process, and different selections may yield different encoding effects. In order to obtain the best encoding effect, the video encoder may perform one-pass encoding on each possible reference frame combination, including extremely high-complexity motion search and motion compensation, so as to obtain the reference frame combination with the best encoding effect. The encoding effect in the embodiments of the present application may be understood as distortion, and may be measured using a rate-distortion cost.
Wherein the candidate reference frame set comprises a forward candidate reference frame set and a backward candidate reference frame set. It should be understood that the specific process by which the terminal device traverses the candidate reference frame set for the target reference frame can be described as follows: the terminal device may determine the video frame type of the target video frame. The video frame type of the target video frame may be used to guide the video encoder in selecting, within the candidate reference frame set, a reference frame for encoding the target video frame. Further, if the video frame type is a unidirectional prediction type (i.e., a third type), the terminal device may traverse the forward candidate reference frame set or the backward candidate reference frame set for the target reference frame used for encoding the unit to be encoded. Optionally, if the video frame type is a bidirectional prediction type (i.e., a second type), the terminal device may traverse the forward candidate reference frame set, the backward candidate reference frame set, or the bidirectional reference frame set for the target reference frame used for encoding the unit to be encoded, where the bidirectional reference frame set comprises the forward candidate reference frame set and the backward candidate reference frame set. In other words, if the video frame type is a bidirectional prediction type, the terminal device may traverse the forward candidate reference frame set or the backward candidate reference frame set for the target reference frame; or the terminal device may traverse the forward candidate reference frame set and the backward candidate reference frame set together.
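For ease of understanding, the dependence of the traversal space on the video frame type may be sketched as follows (illustrative only; a real encoder evaluates each yielded combination with motion search and a rate-distortion cost):

```python
def enumerate_reference_choices(frame_type, fwd_candidates, bwd_candidates):
    """Yield the reference frame combinations to evaluate for one unit."""
    if frame_type == "unidirectional":
        # Traverse the forward or the backward candidate reference frame set.
        for f in list(fwd_candidates) + list(bwd_candidates):
            yield (f,)
    elif frame_type == "bidirectional":
        for f in fwd_candidates:              # single forward reference
            yield (f,)
        for b in bwd_candidates:              # single backward reference
            yield (b,)
        for f in fwd_candidates:              # one reference per direction
            for b in bwd_candidates:
                yield (f, b)
```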
For ease of understanding, please refer to fig. 5, which is a schematic view of a scene for acquiring a forward reference frame and a backward reference frame according to an embodiment of the present application. As shown in fig. 5, which illustrates bidirectional prediction of a unit to be encoded, the video frame 53c may be the target video frame, the video frame set 53a may be the forward candidate reference frame set corresponding to the target video frame 53c, and the video frame set 53b may be the backward candidate reference frame set corresponding to the target video frame 53c. The forward candidate reference frame set 53a and the backward candidate reference frame set 53b may each include a plurality of video frames; the embodiments of the present application do not limit the number of video frames in either set. For ease of understanding, the embodiments of the present application take the case where each set includes 3 video frames as an example: the forward candidate reference frame set 53a may include video frame 50a, video frame 50b, and video frame 50c, and the backward candidate reference frame set 53b may include video frame 51a, video frame 51b, and video frame 51c.
As shown in fig. 5, the target video frame 53c may include the unit to be encoded 52a, the video frame 50c may include the coding unit 52b, and the video frame 51a may include the coding unit 52c. When the video frame 50c and the video frame 51a are determined as the reference frames used by the target video frame, the video encoder may encode the unit to be encoded 52a based on the coding unit 52b and the coding unit 52c; in this case, the unit to be encoded selects one frame as a reference frame from each of the forward candidate reference frame set 53a and the backward candidate reference frame set 53b and performs motion search, and the coding unit 52b and the coding unit 52c may be referred to as reference blocks.
Optionally, when the unit to be encoded selects reference frames from the forward candidate reference frame set 53a and the backward candidate reference frame set 53b and performs motion search, the video encoder may determine the video frame 50a and the video frame 51a as the reference frames used by the target video frame; the video encoder may also determine the video frame 50b alone, or the video frame 51b alone, as the reference frame used by the target video frame, which is not limited herein.
For ease of understanding, please refer to fig. 6, which is a schematic flowchart of fast reference frame selection according to an embodiment of the present application. As shown in fig. 6, the terminal device may perform step S21 to obtain the unit to be encoded in the target video frame. Further, the terminal device may perform step S22 to obtain the associated coding units associated with the unit to be encoded in the target video frame. If the associated coding units all exist (i.e., the number of associated coding units is equal to the coding unit number threshold) and all of them were encoded by inter-frame prediction (i.e., the prediction coding types of the associated coding units are all inter-frame prediction), the terminal device may perform step S23; otherwise, the terminal device skips steps S23 and S24.
As shown in fig. 6, in step S23 the terminal device may disable the reference frames in the reference frame lists (i.e., the full reference frame lists) that are not used by the associated coding units, that is, obtain the reference frames used by the associated coding units. Further, the terminal device may perform step S24 and set the frames closest to the current frame (i.e., the target video frame) in the forward and backward reference frame lists (i.e., the forward full reference frame set and the backward full reference frame set) as available; at this point the terminal device sets a total of two frames as available. In other words, the terminal device may merge the reference frames used by the associated coding units, the forward reference frame, and the backward reference frame to obtain the available reference frame set corresponding to the unit to be coded.
As shown in fig. 6, the terminal device may perform step S25 to obtain, in the target video frame, the parent coding unit to which the unit to be encoded belongs. If the parent coding unit exists, the parent coding unit uses the inter-frame skip mode (i.e., the inter-frame coding mode of the parent coding unit is the inter-frame skip mode), and the reference frames used by the parent coding unit are available (i.e., available to the unit to be encoded, indicating that the reference frames used by the parent coding unit were used by the associated coding units, or are closest to the target video frame, and exist in the full reference frame list), the terminal device may perform step S26; otherwise, the terminal device skips step S26.
As shown in fig. 6, in step S26 the terminal device may disable the reference frames in the reference frame list other than those used by the parent coding unit, that is, obtain the intersection between the reference frames used by the parent coding unit and the reference frames in the available reference frame set, thereby obtaining the candidate reference frame set corresponding to the unit to be coded.
Further, as shown in fig. 6, the terminal device may execute step S27, where step S27 indicates that the selection of the fast reference frame provided in the embodiment of the present application is finished, and an output result of the selection of the fast reference frame is a candidate reference frame set, where the candidate reference frame set may be used to select a reference frame for encoding processing for a unit to be encoded in the target video frame.
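For ease of understanding, steps S21 to S27 may be combined into the following end-to-end sketch of the fast reference frame selection flow (illustrative only; all helper names are assumptions):

```python
def fast_reference_frame_selection(full_list, neighbors, parent,
                                   nearest_fwd, nearest_bwd, cu_threshold):
    available = set(full_list)
    # Steps S22 to S24: restrict by the associated CUs only when they all
    # exist and were all inter-predicted; the two nearest frames remain
    # available in any case.
    if len(neighbors) == cu_threshold and \
            all(cu.pred_type == "inter" for cu in neighbors):
        used = set()
        for cu in neighbors:
            used.update(cu.ref_frames)
        available = (used & set(full_list)) | {nearest_fwd, nearest_bwd}
    # Steps S25 and S26: a SKIP-mode parent CU further narrows the set.
    if parent is not None and parent.pred_type == "inter" \
            and parent.inter_mode == "skip":
        overlap = set(parent.ref_frames) & available
        if overlap:
            available = overlap
    return available  # step S27: the candidate reference frame set
```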
Optionally, the terminal device may further obtain a target video frame from the video data, and obtain a unit to be encoded from the target video frame. Further, if a parent coding unit to which the unit to be coded belongs exists in the target video frame, the terminal device may determine an available reference frame set corresponding to the unit to be coded in the video data according to the parent coding unit and the full reference frame set. For a specific process of determining the available reference frame set by the terminal device according to the parent coding unit and the full reference frame set, reference may be made to the description of determining the candidate reference frame set according to the parent coding unit and the available reference frame set, which will not be described herein again. Further, the terminal device may acquire an associated coding unit associated with the unit to be coded, and determine a candidate reference frame set corresponding to the unit to be coded according to the associated coding unit and the available reference frame set. For a specific process of determining the candidate reference frame set by the terminal device according to the associated coding unit and the available reference frame set, reference may be made to the description of determining the available reference frame set according to the associated coding unit and the full reference frame set, which will not be described herein again.
Therefore, the embodiments of the present application provide a fast reference frame selection algorithm that fully considers the prediction correlation of video content within the target video frame (that is, the reference frames of coding units in the same frame are approximately the same with high probability). On one hand, the algorithm obtains the associated coding units adjacent to the unit to be coded and uses them to restrict the reference frames of the unit to be coded, thereby obtaining the available reference frame set; on the other hand, it obtains the parent coding unit to which the unit to be coded belongs and uses it to restrict the child coding unit, thereby obtaining the candidate reference frame set. It can be understood that, through the fast reference frame selection algorithm provided in the embodiments of the present application, a candidate reference frame set fusing the reference frames used by the associated coding units and the reference frames used by the parent coding unit can be selected from all video frames; because the reference frames in the candidate reference frame set are determined by the associated coding units and the parent coding unit, they have higher content similarity with the target video frame. Therefore, the embodiments of the present application can traverse the video frames in the smaller candidate reference frame set instead of traversing all the encoded video frames, which not only reduces the time consumed by traversal, but also allows the target reference frame with the best coding effect to be found in the traversal result, since the candidate reference frame set contains reference frames with higher content similarity. In this way, the coding effect and the coding efficiency of the target video frame can both be taken into account (namely, the coding effect of the target video frame is improved while its coding efficiency is ensured, and the coding efficiency is improved while the coding effect is ensured).
Further, please refer to fig. 7, and fig. 7 is a flowchart illustrating a video data processing method according to an embodiment of the present application. The video data processing method may include the following steps S1021 to S1023, and the steps S1021 to S1023 are an embodiment of the step S102 in the embodiment corresponding to fig. 3.
Step S1021, obtaining the predictive coding types of the S associated coding units;
the prediction coding type may include an inter prediction type (i.e., inter prediction) and an intra prediction type (i.e., intra prediction). The S associated coding units may have the same predictive coding type or different predictive coding types, which is not limited in this application. For example, suppose S equals 4; the 4 associated coding units may specifically include associated coding unit P1, associated coding unit P2, associated coding unit P3, and associated coding unit P4. The predictive coding types of associated coding unit P1, associated coding unit P2, and associated coding unit P3 may all be inter prediction, while the predictive coding type of associated coding unit P4 may be intra prediction; alternatively, the predictive coding types of associated coding unit P1, associated coding unit P2, associated coding unit P3, and associated coding unit P4 may all be inter prediction.
Step S1022, if there is an associated coding unit with a predictive coding type that is intra-frame prediction in the S associated coding units, acquiring a full reference frame set constructed for the unit to be coded in the video data, and determining the full reference frame set as an available reference frame set corresponding to the unit to be coded;
specifically, if there is an associated coding unit whose prediction coding type is intra-frame prediction among the S associated coding units, the terminal device may acquire the encoded video frames whose encoding order is earlier than that of the target video frame in the video data. The number of encoded video frames is not limited in the embodiments of the present application. Further, if the playing order of an encoded video frame is earlier than that of the target video frame, the terminal device may add the encoded video frame to the forward full reference frame set. Optionally, if the playing order of an encoded video frame is later than that of the target video frame, the terminal device may add the encoded video frame to the backward full reference frame set. In other words, the terminal device may add the encoded video frames having a playing order earlier than the target video frame to the forward full reference frame set, and add the encoded video frames having a playing order later than the target video frame to the backward full reference frame set. The forward full reference frame set and the backward full reference frame set can be collectively referred to as the full reference frame set. Further, the terminal device may determine the full reference frame set as the available reference frame set corresponding to the unit to be encoded.
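For ease of understanding, the construction of the forward and backward full reference frame sets may be sketched as follows (illustrative; frames are identified by their playing-order index):

```python
def build_full_reference_sets(coded_pocs, target_poc):
    """coded_pocs: playing-order indices of the video frames whose coding
    order is earlier than that of the target video frame."""
    forward_full = sorted(poc for poc in coded_pocs if poc < target_poc)
    backward_full = sorted(poc for poc in coded_pocs if poc > target_poc)
    return forward_full, backward_full
```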
For convenience of understanding, the embodiments of the present application take the example that the forward full reference frame set includes encoded video frames with a playing order earlier than that of the target video frame, and the backward full reference frame set includes encoded video frames with a playing order later than that of the target video frame. Optionally, if there is no encoded video frame whose playing order is earlier than that of the target video frame, the terminal device may determine the backward full-scale reference frame set as an available reference frame set corresponding to the unit to be encoded, that is, the forward full-scale reference frame set is empty, in other words, the terminal device may determine the backward full-scale reference frame set and the forward full-scale reference frame set that is an empty set as available reference frame sets corresponding to the unit to be encoded; if there is no encoded video frame whose playing order is later than that of the target video frame, the terminal device may determine the forward full-scale reference frame set as an available reference frame set corresponding to the unit to be encoded, that is, the backward full-scale reference frame set is empty, in other words, the terminal device may determine the forward full-scale reference frame set and the backward full-scale reference frame set that is an empty set as available reference frame sets corresponding to the unit to be encoded.
In step S1023, if the prediction coding types of the S associated coding units are all inter-frame prediction, determining an available reference frame set corresponding to the unit to be coded in the video data according to the S associated coding units.
Specifically, if the prediction coding types of the S associated coding units are all inter-frame prediction, the terminal device may obtain the coding unit number threshold associated with the video encoder, where the coding unit number threshold is greater than or equal to S. Further, if S is smaller than the coding unit number threshold, the terminal device may obtain the full reference frame set constructed for the unit to be coded in the video data, and determine the full reference frame set as the available reference frame set corresponding to the unit to be coded; for the specific process of acquiring the full reference frame set, reference may be made to the description of step S1022 above, which will not be repeated here. It should be noted that S being smaller than the coding unit number threshold indicates that the unit to be coded is located in the image boundary area of the target video frame. Optionally, if S is equal to the coding unit number threshold, the terminal device may determine, according to the reference frames used by the S associated coding units, the available reference frame set corresponding to the unit to be coded in the video data.
It should be understood that, according to the reference frames used by the S associated coding units, a specific process of determining an available reference frame set corresponding to a unit to be coded in the video data by the terminal device may be described as follows: the terminal device can acquire the full reference frame set constructed for the unit to be coded in the video data. Wherein the full reference frame set comprises a forward full reference frame set and a backward full reference frame set. For a specific process of the terminal device acquiring the full reference frame set constructed for the unit to be encoded in the video data, reference may be made to the description of step S1022 described above, which will not be described again here. Further, the terminal device may obtain a forward reference frame closest to the position of the target video frame in the forward full reference frame set, and obtain a backward reference frame closest to the position of the target video frame in the backward full reference frame set. Further, the terminal device may merge the reference frame, the forward reference frame, and the backward reference frame used by the S associated coding units to obtain an available reference frame set corresponding to the unit to be coded.
The terminal device may sort the reference frames in the forward full reference frame set according to the playing order to obtain sorted reference frames of the forward full reference frame set, and determine the last reference frame in the sorted reference frames of the forward full reference frame set as the forward reference frame closest to the target video frame; similarly, the terminal device may perform sorting processing on the reference frames in the backward full-scale reference frame set according to the playing sequence to obtain sorted reference frames of the backward full-scale reference frame set, and determine the first reference frame in the sorted reference frames of the backward full-scale reference frame set as the backward reference frame closest to the target video frame.
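For ease of understanding, this sorting-based selection may be sketched as follows (illustrative only; both sets are assumed to be non-empty):

```python
def nearest_reference_frames(forward_full, backward_full):
    """Return the forward and backward frames closest to the target video
    frame in playing order: the last frame of the sorted forward set and
    the first frame of the sorted backward set."""
    forward_ref = sorted(forward_full)[-1]
    backward_ref = sorted(backward_full)[0]
    return forward_ref, backward_ref
```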
For a specific process of the terminal device merging the reference frame, the forward reference frame, and the backward reference frame used by the S associated coding units, reference may be made to the description of step S102 in the embodiment corresponding to fig. 3, and details are not repeated here.
It can be understood that, by determining the full reference frame set as the available reference frame set corresponding to the unit to be encoded, regions with poor prediction effect (i.e., where an associated coding unit with intra-frame prediction exists among the S associated coding units, or where S is smaller than the coding unit number threshold) still select reference frames by traversing the full reference frame set (i.e., when the full reference frame set is determined as the available reference frame set and, in a subsequent step, the available reference frame set is determined as the candidate reference frame set, traversing the candidate reference frame set is equivalent to traversing the full reference frame set), so as to reduce the influence on the coding effect as much as possible.
Alternatively, the terminal device may obtain a coding unit number threshold associated with the video encoder. Further, if S is smaller than the threshold of the number of coding units, the terminal device may acquire a full-size reference frame set constructed for the unit to be coded in the video data, and determine the full-size reference frame set as an available reference frame set corresponding to the unit to be coded. Optionally, if S is equal to the threshold of the number of coding units, the terminal device may determine, according to the S associated coding units, an available reference frame set corresponding to the unit to be coded in the video data. The specific process of the terminal device determining the available reference frame set corresponding to the unit to be encoded in the video data according to the S associated encoding units may be described as follows: the terminal device may obtain the predictive coding types of the S associated coding units. Further, if there is an associated coding unit of which the prediction coding type is intra-frame prediction in the S associated coding units, the terminal device may obtain, in the video data, a full reference frame set constructed for the unit to be coded, and determine the full reference frame set as an available reference frame set corresponding to the unit to be coded. Optionally, if the prediction coding types of the S associated coding units are all inter-frame prediction, the terminal device may determine, according to the reference frames used by the S associated coding units, an available reference frame set corresponding to the unit to be coded in the video data.
Alternatively, the terminal device may obtain the predictive coding types of the S associated coding units. Further, if an associated coding unit with a prediction coding type of intra-frame prediction exists in the S associated coding units, a full reference frame set constructed for the unit to be coded is acquired from the video data, and the full reference frame set is determined as an available reference frame set corresponding to the unit to be coded. Optionally, if the prediction coding types of the S associated coding units are all inter-frame prediction, determining an available reference frame set corresponding to the unit to be coded in the video data according to the reference frames used by the S associated coding units.
Alternatively, the terminal device may obtain a coding unit number threshold associated with the video encoder. Further, if S is smaller than the threshold of the number of coding units, the terminal device may acquire a full-size reference frame set constructed for the unit to be coded in the video data, and determine the full-size reference frame set as an available reference frame set corresponding to the unit to be coded. Optionally, if S is equal to the threshold of the number of coding units, the terminal device may determine, according to the reference frames used by the S associated coding units, an available reference frame set corresponding to the unit to be coded in the video data.
Optionally, if S is smaller than the threshold of the number of coding units, the terminal device may determine, according to the reference frames used by the S associated coding units, an available reference frame set corresponding to the unit to be coded in the video data.
Therefore, the embodiment of the application can acquire the associated coding units adjacent to the unit to be coded in the target video frame, and determine the available reference frame set corresponding to the unit to be coded according to the predicted coding types of the associated coding units and the number of the associated coding units, so that the accuracy of the available reference frame set is improved, the accuracy of the candidate reference frame set is improved, and further, when the unit to be coded in the target video frame is coded based on the candidate reference frame set, the candidate reference frame set can be traversed without traversing the full reference frame set, so that the coding effect and the coding efficiency of the target video frame can be considered at the same time. It can be understood that, when the prediction coding types of the associated coding units are all inter-frame prediction and the number of the associated coding units is equal to the threshold of the number of coding units, in the embodiment of the present application, an available reference frame set corresponding to a unit to be coded is determined in the video data according to reference frames used by S associated coding units (that is, a full reference frame list of the unit to be coded is reduced according to the reference frame usage of the associated coding units), otherwise, the full reference frame set obtained in the video data is determined as the available reference frame set corresponding to the unit to be coded.
Further, please refer to fig. 8, wherein fig. 8 is a schematic flowchart of a video data processing method according to an embodiment of the present application. The video data processing method may include the following steps S1031 to S1033, and the steps S1031 to S1033 are a specific embodiment of the step S103 in the embodiment corresponding to fig. 3.
Step S1031, acquiring a predictive coding type of the parent coding unit;
the prediction encoding type may include inter prediction type (i.e., inter prediction) and intra prediction type (i.e., intra prediction), among others.
Step S1032, if the prediction coding type of the parent coding unit is intra-frame prediction, determining the available reference frame set as a candidate reference frame set corresponding to the unit to be coded;
it may be understood that, when determining the available reference frame set as the candidate reference frame set corresponding to the unit to be encoded, the terminal device may determine the forward available reference frame set as the forward candidate reference frame set corresponding to the unit to be encoded, and determine the backward available reference frame set as the backward candidate reference frame set corresponding to the unit to be encoded. Wherein, the forward available reference frame set and the backward available reference frame set can be collectively referred to as an available reference frame set; the forward candidate reference frame set and the backward candidate reference frame set may be collectively referred to as a candidate reference frame set.
Step S1033, if the prediction encoding type of the parent encoding unit is inter-frame prediction, determining a candidate reference frame set corresponding to the unit to be encoded according to the reference frame used by the parent encoding unit and the available reference frame set.
Specifically, if the prediction encoding type of the parent encoding unit is inter prediction, the terminal device may acquire the inter encoding mode of the parent encoding unit. Further, if the inter-coding mode is not the inter-skip mode, the terminal device may determine the available reference frame set as a candidate reference frame set corresponding to the unit to be coded. Optionally, if the inter-frame coding mode is an inter-frame skip mode, the terminal device may match the reference frame used by the parent coding unit with the available reference frame set to obtain a candidate reference frame set corresponding to the unit to be coded.
It should be understood that, the specific process of the terminal device matching the reference frame used by the parent coding unit with the available reference frame set to obtain the candidate reference frame set corresponding to the unit to be coded may be described as follows: the terminal device may match the reference frame used by the parent coding unit with the set of available reference frames. Further, if there is an intersection between the reference frame used by the parent coding unit and the reference frame in the available reference frame set, the terminal device may determine the intersection between the reference frame used by the parent coding unit and the reference frame in the available reference frame set as the candidate reference frame set corresponding to the unit to be coded. The intersection between the reference frame used by the parent coding unit and the reference frame in the available reference frame set is determined as the candidate reference frame set corresponding to the unit to be coded, which represents that the child CU (i.e. the child coding unit) can only use the reference frame of the parent CU (i.e. the parent coding unit). Optionally, if there is no intersection between the reference frame used by the parent coding unit and the reference frame in the available reference frame set, the terminal device may determine the available reference frame set as the candidate reference frame set corresponding to the unit to be coded.
Alternatively, the terminal device may acquire the inter-coding mode of the parent coding unit. Further, if the inter-coding mode is not the inter-skip mode, the terminal device may determine the available reference frame set as a candidate reference frame set corresponding to the unit to be coded. Optionally, if the inter-frame coding mode is an inter-frame skip mode, the terminal device may match the reference frame used by the parent coding unit with the available reference frame set to obtain a candidate reference frame set corresponding to the unit to be coded.
Alternatively, the terminal device may acquire the predictive coding type of the parent coding unit. Further, if the prediction encoding type of the parent encoding unit is intra-frame prediction, the terminal device may determine the available reference frame set as a candidate reference frame set corresponding to the unit to be encoded. Alternatively, if the predictive coding type of the parent coding unit is inter-frame prediction, the terminal device may match the reference frame used by the parent coding unit with the available reference frame set to obtain a candidate reference frame set corresponding to the unit to be coded.
Therefore, the parent coding unit to which the unit to be coded belongs can be acquired from the target video frame, the candidate reference frame set corresponding to the unit to be coded is determined according to the predictive coding type of the parent coding unit and the inter-frame coding mode of the parent coding unit, so that the accuracy of the candidate reference frame set is improved, and further, when the unit to be coded in the target video frame is coded based on the candidate reference frame set, the candidate reference frame set can be traversed without traversing the full reference frame set, so that the coding effect and the coding efficiency of the target video frame can be considered at the same time. It can be understood that, when the prediction coding type of the parent coding unit is inter-frame prediction, the inter-frame coding mode of the parent coding unit is inter-frame skip mode, and the reference frame used by the parent coding unit is available for the unit to be coded, an intersection between the reference frame used by the parent coding unit and a reference frame in the available reference frame set is determined as a candidate reference frame set corresponding to the unit to be coded (i.e., the unit to be coded multiplexes the reference frame of the parent coding unit under a specific condition, i.e., the reference frame of the child coding unit is limited by the reference frame of the parent coding unit), otherwise, the available reference frame set corresponding to the unit to be coded is determined as the candidate reference frame set corresponding to the unit to be coded.
Further, referring to fig. 9, fig. 9 is a schematic structural diagram of a video data processing apparatus according to an embodiment of the present application, where the video data processing apparatus 1 may include: a coding unit acquisition module 11, an available set determination module 12, and a candidate set determination module 13; further, the video data processing apparatus 1 may further include: a collection transfer module 14;
the encoding unit acquiring module 11 is configured to acquire a target video frame from video data, and acquire an associated encoding unit associated with a unit to be encoded in the target video frame; the coding sequence of the associated coding unit is earlier than that of the unit to be coded, and the associated coding unit is adjacent to the unit to be coded;
an available set determining module 12, configured to determine, according to the associated coding unit, an available reference frame set corresponding to the unit to be coded in the video data;
the number of the associated coding units is S, and S is a positive integer;
the available set determination module 12 includes: a first type obtaining unit 121, a first determining unit 122, a second determining unit 123, a full set determining unit 124, a reference frame determining unit 125, and a reference frame merging unit 126;
a first type obtaining unit 121, configured to obtain prediction coding types of the S associated coding units;
a first determining unit 122, configured to, if an associated coding unit whose predictive coding type is intra-frame prediction exists in the S associated coding units, acquire a full reference frame set constructed for a unit to be coded in the video data, and determine the full reference frame set as an available reference frame set corresponding to the unit to be coded;
wherein the full reference frame set comprises a forward full reference frame set and a backward full reference frame set;
a first determining unit 122, specifically configured to obtain, in the video data, an encoded video frame with an encoding order earlier than that of the target video frame;
a first determining unit 122, configured to add the encoded video frame to the forward full reference frame set if the playing order of the encoded video frame is earlier than the target video frame;
the first determining unit 122 is specifically configured to add the encoded video frame to the backward full reference frame set if the playing order of the encoded video frame is later than the target video frame.
A second determining unit 123, configured to determine, according to the S associated coding units, an available reference frame set corresponding to the unit to be coded in the video data if the prediction coding types of the S associated coding units are all inter-frame prediction.
The second determining unit 123 is specifically configured to obtain a threshold of the number of coding units associated with the video encoder; the coding unit number threshold is greater than or equal to S;
a second determining unit 123, configured to, if S is smaller than the threshold of the number of coding units, obtain a full reference frame set constructed for the unit to be coded in the video data, and determine the full reference frame set as an available reference frame set corresponding to the unit to be coded;
the second determining unit 123 is specifically configured to determine, if S is equal to the threshold of the number of coding units, an available reference frame set corresponding to a unit to be coded in the video data according to the reference frames used by the S associated coding units.
A full set determining unit 124, configured to obtain a full reference frame set constructed for a unit to be encoded in the video data if the prediction coding types of the S associated coding units are all inter-frame prediction and S is equal to a threshold of the number of coding units associated with the video encoder; the full reference frame set comprises a forward full reference frame set and a backward full reference frame set;
a reference frame determining unit 125, configured to obtain a forward reference frame closest to the position of the target video frame in the forward full-scale reference frame set, and obtain a backward reference frame closest to the position of the target video frame in the backward full-scale reference frame set;
a reference frame merging unit 126, configured to merge the reference frame, the forward reference frame, and the backward reference frame used by the S associated coding units to obtain an available reference frame set corresponding to the unit to be coded.
Wherein the available reference frame set comprises a forward available reference frame set and a backward available reference frame set;
a reference frame merging unit 126, configured to specifically determine a union of the reference frames used by the S associated encoding units as an associated reference frame set;
the reference frame merging unit 126 is specifically configured to determine, if the associated reference frame set does not include the forward reference frame and the backward reference frame, a reference frame and a forward reference frame that are earlier in playing order than the target video frame in the associated reference frame set as a forward available reference frame set corresponding to the unit to be encoded;
the reference frame merging unit 126 is specifically configured to determine a reference frame and a backward reference frame in the associated reference frame set, which have a play order later than that of the target video frame, as a backward available reference frame set corresponding to the unit to be encoded.
For a specific implementation manner of the first type obtaining unit 121, the first determining unit 122, the second determining unit 123, the full set determining unit 124, the reference frame determining unit 125, and the reference frame merging unit 126, reference may be made to the description of step S102 in the embodiment corresponding to fig. 3 and steps S1021 to S1023 in the embodiment corresponding to fig. 7, which will not be repeated here.
A candidate set determining module 13, configured to determine, if a parent coding unit to which a unit to be coded belongs exists in the target video frame, a candidate reference frame set corresponding to the unit to be coded according to the parent coding unit and the available reference frame set; the candidate reference frame set is used for traversing a target reference frame for the unit to be coded; the target reference frame is used for coding the unit to be coded.
Wherein, the candidate set determining module 13 includes: a second type obtaining unit 131, a third determining unit 132, a fourth determining unit 133, a reference frame matching unit 134, a fifth determining unit 135, a sixth determining unit 136;
a second type acquiring unit 131 for acquiring a predictive coding type of the parent coding unit;
a third determining unit 132, configured to determine, if the prediction coding type of the parent coding unit is intra-frame prediction, the available reference frame set as a candidate reference frame set corresponding to the unit to be coded;
a fourth determining unit 133, configured to determine, if the prediction coding type of the parent coding unit is inter-frame prediction, a candidate reference frame set corresponding to the unit to be coded according to the reference frame used by the parent coding unit and the available reference frame set.
The fourth determining unit 133 is specifically configured to acquire the inter-frame coding mode of the parent coding unit;
the fourth determining unit 133 is specifically configured to determine, if the inter-frame coding mode is not the inter-frame skip mode, the available reference frame set as a candidate reference frame set corresponding to the unit to be coded;
the fourth determining unit 133 is specifically configured to, if the inter-frame coding mode is the inter-frame skip mode, match the reference frame used by the parent coding unit with the available reference frame set to obtain a candidate reference frame set corresponding to the unit to be coded.
A reference frame matching unit 134, configured to match the reference frame used by the parent coding unit with the available reference frame set if the prediction coding type of the parent coding unit is inter-frame prediction and the inter-frame coding mode of the parent coding unit is the inter-frame skip mode;
a fifth determining unit 135, configured to determine, if there is an intersection between the reference frame used by the parent coding unit and a reference frame in the available reference frame set, the intersection between the reference frame used by the parent coding unit and a reference frame in the available reference frame set as a candidate reference frame set corresponding to the unit to be coded;
wherein the available reference frame set comprises a forward available reference frame set and a backward available reference frame set; the candidate reference frame set comprises a forward candidate reference frame set and a backward candidate reference frame set;
a fifth determining unit 135, specifically configured to determine the reference frames, among those used by the parent coding unit, whose playing order is earlier than that of the target video frame and which belong to the forward available reference frame set, as the forward candidate reference frame set corresponding to the unit to be encoded;
the fifth determining unit 135 is specifically configured to determine the reference frames, among those used by the parent coding unit, whose playing order is later than that of the target video frame and which belong to the backward available reference frame set, as the backward candidate reference frame set corresponding to the unit to be encoded.
A sixth determining unit 136, configured to determine the available reference frame set as a candidate reference frame set corresponding to the unit to be encoded if there is no intersection between the reference frame used by the parent encoding unit and the reference frame in the available reference frame set.
For specific implementation manners of the second type obtaining unit 131, the third determining unit 132, the fourth determining unit 133, the reference frame matching unit 134, the fifth determining unit 135, and the sixth determining unit 136, reference may be made to the description of step S1031 to step S1033 in the embodiment corresponding to fig. 3 and the embodiment corresponding to fig. 8, which will not be repeated herein.
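Likewise, the decision path through the units 131 to 136 can be sketched as follows; the sketch reuses the structures assumed in the previous example and additionally assumes an inter_mode field on the coding unit, with a hypothetical "skip" value standing in for the inter-frame skip mode, none of which is prescribed by this embodiment.

```python
def candidate_reference_sets(parent, forward_avail, backward_avail, target):
    """Derive the forward/backward candidate reference frame sets for the
    unit to be coded from its parent coding unit and the available sets."""
    # Intra-predicted parent: the available sets become the candidates.
    if parent.prediction_type == "intra":
        return forward_avail, backward_avail
    # Inter-predicted parent outside the (assumed) skip mode: likewise.
    if parent.inter_mode != "skip":
        return forward_avail, backward_avail
    # Skip mode: intersect the parent's reference frames with the available
    # sets; an empty intersection leaves the available sets unchanged.
    inter = parent.reference_frames & (forward_avail | backward_avail)
    if not inter:
        return forward_avail, backward_avail
    # Split the intersection by playing order relative to the target frame.
    forward = {f for f in inter if f.play_order < target.play_order}
    backward = {f for f in inter if f.play_order > target.play_order}
    return forward, backward
```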
Optionally, the set transferring module 14 is configured to determine, if there is no parent coding unit to which the unit to be coded belongs in the target video frame, the available reference frame set as a candidate reference frame set corresponding to the unit to be coded.
For specific implementation manners of the encoding unit obtaining module 11, the available set determining module 12, the candidate set determining module 13 and the set transferring module 14, reference may be made to the description of steps S101 to S103 in the embodiment corresponding to fig. 3, and the description of steps S1021 to S1023 in the embodiment corresponding to fig. 7 and the description of steps S1031 to S1033 in the embodiment corresponding to fig. 8, which will not be described again here. In addition, the beneficial effects of the same method are not described in detail.
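Putting the modules together, a hedged end-to-end sketch of the available set determining module 12, the candidate set determining module 13, and the set transferring module 14 might read as follows; it reuses the two functions sketched above, and the fields unit.neighbours and unit.parent are hypothetical stand-ins for encoder-specific bookkeeping.

```python
def select_candidate_sets(unit, target, coded_frames, encoder_threshold):
    """End-to-end flow of modules 12 to 14 for one unit to be encoded."""
    fwd_avail, bwd_avail = available_reference_sets(
        unit.neighbours, coded_frames, target, encoder_threshold)
    if unit.parent is None:
        # Module 14: no parent coding unit, so the available sets carry over.
        return fwd_avail, bwd_avail
    # Module 13: narrow the available sets using the parent coding unit.
    return candidate_reference_sets(unit.parent, fwd_avail, bwd_avail, target)
```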
Further, referring to fig. 10, fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application, where the computer device may be a terminal device or a server. As shown in fig. 10, the computer device 1000 may include: a processor 1001, a network interface 1004, and a memory 1005; furthermore, the computer device 1000 may also include: a user interface 1003 and at least one communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. In some embodiments, the user interface 1003 may include a Display screen (Display) and a Keyboard (Keyboard); optionally, the user interface 1003 may further include a standard wired interface and a wireless interface. Optionally, the network interface 1004 may include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (non-volatile memory), such as at least one disk memory. Optionally, the memory 1005 may also be at least one storage device located remotely from the processor 1001. As shown in fig. 10, the memory 1005, as a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the computer device 1000 shown in fig. 10, the network interface 1004 may provide a network communication function; the user interface 1003 is mainly used for providing an input interface for a user; and the processor 1001 may be used to invoke the device control application stored in the memory 1005 to implement:
acquiring a target video frame from video data, and acquiring an associated coding unit associated with a unit to be coded in the target video frame; the coding sequence of the associated coding unit is earlier than that of the unit to be coded, and the associated coding unit is adjacent to the unit to be coded;
determining an available reference frame set corresponding to a unit to be coded in video data according to the associated coding unit;
if a parent coding unit to which a unit to be coded belongs exists in the target video frame, determining a candidate reference frame set corresponding to the unit to be coded according to the parent coding unit and the available reference frame set; the candidate reference frame set is used for traversing a target reference frame for the unit to be coded; the target reference frame is used for coding the unit to be coded.
It should be understood that the computer device 1000 described in this embodiment of the present application may perform the video data processing method described in the embodiment corresponding to fig. 3, fig. 7, or fig. 8, and may also implement the video data processing apparatus 1 described in the embodiment corresponding to fig. 9, which will not be repeated here. In addition, the beneficial effects of the same method are not described in detail.
Further, it is to be noted that: an embodiment of the present application also provides a computer-readable storage medium, where the computer-readable storage medium stores the computer program executed by the aforementioned video data processing apparatus 1; when a processor executes the computer program, the video data processing method described in the embodiment corresponding to fig. 3, fig. 7, or fig. 8 can be performed, and details are therefore not repeated here. In addition, the beneficial effects of the same method are not described in detail. For technical details not disclosed in the embodiments of the computer-readable storage medium referred to in the present application, reference is made to the description of the method embodiments of the present application.
Further, it should be noted that: an embodiment of the present application also provides a computer program product, which may include a computer program that may be stored in a computer-readable storage medium. A processor of a computer device reads the computer program from the computer-readable storage medium and executes it, so that the computer device performs the video data processing method described in the embodiment corresponding to fig. 3, fig. 7, or fig. 8; details are therefore not repeated here. In addition, the beneficial effects of the same method are not described in detail. For technical details not disclosed in the embodiments of the computer program product referred to in the present application, reference is made to the description of the method embodiments of the present application.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing related hardware; the computer program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above disclosure is only a preferred embodiment of the present application and is not intended to limit the scope of the claims of the present application; therefore, equivalent variations and modifications made according to the claims of the present application still fall within the scope covered by the present application.
Claims (15)
1. A method of processing video data, comprising:
acquiring a target video frame from video data, and acquiring an associated coding unit associated with a unit to be coded in the target video frame; the coding sequence of the associated coding unit is earlier than that of the unit to be coded, and the associated coding unit is adjacent to the unit to be coded;
determining an available reference frame set corresponding to the unit to be coded in the video data according to the associated coding unit;
if a parent coding unit to which the unit to be coded belongs exists in the target video frame, determining a candidate reference frame set corresponding to the unit to be coded according to the parent coding unit and the available reference frame set; the candidate reference frame set is used for traversing a target reference frame for the unit to be encoded; the target reference frame is used for coding the unit to be coded.
2. The method of claim 1, wherein the number of associated coding units is S, and S is a positive integer;
the determining, according to the associated coding unit, an available reference frame set corresponding to the unit to be coded in the video data includes:
acquiring the prediction coding types of the S associated coding units;
if an associated coding unit whose prediction coding type is intra-frame prediction exists in the S associated coding units, acquiring a full reference frame set constructed for the unit to be coded in the video data, and determining the full reference frame set as an available reference frame set corresponding to the unit to be coded;
and if the prediction coding types of the S associated coding units are all inter-frame prediction, determining an available reference frame set corresponding to the unit to be coded in the video data according to the S associated coding units.
3. The method of claim 2, wherein the full reference frame set comprises a forward full reference frame set and a backward full reference frame set;
the acquiring, in the video data, a full reference frame set constructed for the unit to be encoded includes:
acquiring coded video frames with a coding sequence earlier than that of the target video frame in the video data;
if the playing order of the encoded video frames is earlier than that of the target video frame, adding the encoded video frames to the forward full reference frame set;
if the playing order of the encoded video frames is later than that of the target video frame, adding the encoded video frames to the backward full reference frame set.
4. The method of claim 2, wherein the determining the available reference frame set corresponding to the unit to be coded in the video data according to the S associated coding units comprises:
obtaining a coding unit quantity threshold associated with a video encoder; the coding unit quantity threshold is greater than or equal to S;
if S is less than the coding unit quantity threshold, acquiring a full reference frame set constructed for the unit to be coded in the video data, and determining the full reference frame set as an available reference frame set corresponding to the unit to be coded;
if S is equal to the coding unit quantity threshold, determining an available reference frame set corresponding to the unit to be coded in the video data according to the reference frames used by the S associated coding units.
5. The method of claim 1, wherein the number of the associated coding units is S, and S is a positive integer;
the determining, according to the associated coding unit, an available reference frame set corresponding to the unit to be coded in the video data includes:
if the prediction coding types of the S associated coding units are all inter-frame prediction and S is equal to the coding unit quantity threshold associated with a video encoder, acquiring a full reference frame set constructed for the unit to be coded in the video data; the full reference frame set comprises a forward full reference frame set and a backward full reference frame set;
acquiring a forward reference frame closest to the position of the target video frame from the forward full reference frame set, and acquiring a backward reference frame closest to the position of the target video frame from the backward full reference frame set;
and merging the reference frames used by the S associated coding units, the forward reference frame and the backward reference frame to obtain an available reference frame set corresponding to the unit to be coded.
6. The method of claim 5, wherein the set of available reference frames comprises a forward set of available reference frames and a backward set of available reference frames;
the merging the reference frames used by the S associated coding units, the forward reference frame, and the backward reference frame to obtain an available reference frame set corresponding to the unit to be coded includes:
determining a union of reference frames used by the S associated coding units as an associated reference frame set;
if the associated reference frame set does not comprise the forward reference frame and the backward reference frame, determining the reference frames in the associated reference frame set whose playing order is earlier than that of the target video frame, together with the forward reference frame, as the forward available reference frame set corresponding to the unit to be encoded;
and determining the reference frames in the associated reference frame set whose playing order is later than that of the target video frame, together with the backward reference frame, as the backward available reference frame set corresponding to the unit to be encoded.
7. The method of claim 1, wherein the determining the candidate reference frame set corresponding to the unit to be encoded according to the parent coding unit and the available reference frame set comprises:
acquiring a predictive coding type of the parent coding unit;
if the prediction coding type of the parent coding unit is intra-frame prediction, determining the available reference frame set as a candidate reference frame set corresponding to the unit to be coded;
and if the prediction coding type of the parent coding unit is inter-frame prediction, determining a candidate reference frame set corresponding to the unit to be coded according to the reference frame used by the parent coding unit and the available reference frame set.
8. The method of claim 7, wherein the determining the candidate reference frame set corresponding to the unit to be encoded according to the reference frame used by the parent coding unit and the available reference frame set comprises:
acquiring an inter-frame coding mode of the parent coding unit;
if the inter-frame coding mode is not the inter-frame skip mode, determining the available reference frame set as a candidate reference frame set corresponding to the unit to be coded;
and if the inter-frame coding mode is the inter-frame skip mode, matching the reference frame used by the parent coding unit with the available reference frame set to obtain a candidate reference frame set corresponding to the unit to be coded.
9. The method of claim 1, wherein the determining the candidate reference frame set corresponding to the unit to be encoded according to the parent coding unit and the available reference frame set comprises:
if the prediction coding type of the parent coding unit is inter-frame prediction and the inter-frame coding mode of the parent coding unit is the inter-frame skip mode, matching the reference frame used by the parent coding unit with the available reference frame set;
if an intersection exists between the reference frame used by the parent coding unit and the reference frame in the available reference frame set, determining the intersection between the reference frame used by the parent coding unit and the reference frame in the available reference frame set as a candidate reference frame set corresponding to the unit to be coded;
and if no intersection exists between the reference frame used by the parent coding unit and the reference frame in the available reference frame set, determining the available reference frame set as the candidate reference frame set corresponding to the unit to be coded.
10. The method of claim 9, wherein the set of available reference frames comprises a forward set of available reference frames and a backward set of available reference frames; the candidate reference frame set comprises a forward candidate reference frame set and a backward candidate reference frame set;
the determining the intersection between the reference frame used by the parent coding unit and the reference frames in the available reference frame set as the candidate reference frame set corresponding to the unit to be coded includes:
determining the reference frames, among those used by the parent coding unit, whose playing order is earlier than that of the target video frame and which belong to the forward available reference frame set, as the forward candidate reference frame set corresponding to the unit to be coded;
and determining the reference frames, among those used by the parent coding unit, whose playing order is later than that of the target video frame and which belong to the backward available reference frame set, as the backward candidate reference frame set corresponding to the unit to be coded.
11. The method according to any one of claims 1-10, further comprising:
and if the parent coding unit to which the unit to be coded belongs does not exist in the target video frame, determining the available reference frame set as a candidate reference frame set corresponding to the unit to be coded.
12. A video data processing apparatus, comprising:
the encoding unit acquiring module is used for acquiring a target video frame from video data and acquiring an associated encoding unit associated with a unit to be encoded in the target video frame; the coding sequence of the associated coding unit is earlier than that of the unit to be coded, and the associated coding unit is adjacent to the unit to be coded;
an available set determining module, configured to determine, according to the associated coding unit, an available reference frame set corresponding to the unit to be coded in the video data;
a candidate set determining module, configured to determine, if a parent coding unit to which the unit to be coded belongs exists in the target video frame, a candidate reference frame set corresponding to the unit to be coded according to the parent coding unit and the available reference frame set; the candidate reference frame set is used for traversing a target reference frame for the unit to be encoded; the target reference frame is used for coding the unit to be coded.
13. A computer device, comprising: a processor and a memory;
the processor is connected to the memory, wherein the memory is used for storing a computer program, and the processor is used for calling the computer program to enable the computer device to execute the method of any one of claims 1-11.
14. A computer-readable storage medium, in which a computer program is stored which is adapted to be loaded and executed by a processor, so that a computer device having said processor performs the method of any of claims 1-11.
15. A computer program product, characterized in that the computer program product comprises a computer program stored in a computer readable storage medium and adapted to be read and executed by a processor to cause a computer device having the processor to perform the method of any of claims 1-11.
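For intuition only, the following worked example steps through the set constructions of claims 3, 5, 6, 9, and 10 on a small invented group of pictures; every frame index, set, and variable name is hypothetical and chosen purely for illustration rather than taken from the embodiments.

```python
# Invented group of pictures: frames are identified by play-order index.
# The target frame has play order 4; frames already coded are 0, 8, 2, 6.
target = 4
coded = [0, 8, 2, 6]

# Claim 3: split the coded frames by playing order relative to the target.
forward_full = {p for p in coded if p < target}    # {0, 2}
backward_full = {p for p in coded if p > target}   # {6, 8}

# Claim 5: nearest coded frame on each side of the target.
nearest_fwd = max(forward_full)                    # 2
nearest_bwd = min(backward_full)                   # 6

# Claim 6: union of the neighbours' reference frames plus the two nearest.
assoc_refs = [{0}, {0, 2}, {8}]                    # invented neighbour sets
associated = set().union(*assoc_refs) | {nearest_fwd, nearest_bwd}
forward_avail = {p for p in associated if p < target}    # {0, 2}
backward_avail = {p for p in associated if p > target}   # {6, 8}

# Claims 9-10: intersect the parent's reference frames with the available
# sets; a non-empty intersection, split by playing order, is the candidate.
parent_refs = {0, 6, 9}
inter = parent_refs & (forward_avail | backward_avail)   # {0, 6}
if inter:
    forward_cand = {p for p in inter if p < target}      # {0}
    backward_cand = {p for p in inter if p > target}     # {6}
else:
    forward_cand, backward_cand = forward_avail, backward_avail
```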
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202211372149.3A | 2022-11-03 | 2022-11-03 | Video data processing method and device, computer equipment and storage medium
Publications (1)

Publication Number | Publication Date
---|---
CN115733988A (en) | 2023-03-03
Family
ID=85294663

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202211372149.3A (Pending) | Video data processing method and device, computer equipment and storage medium | 2022-11-03 | 2022-11-03

Country Status (1)

Country | Link
---|---
CN | CN115733988A (en)
Legal Events

Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication | 
 | SE01 | Entry into force of request for substantive examination | 
 | REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 40083089; Country of ref document: HK