CN114827630A

CN114827630A - Method, system, device and medium for learning CU deep partitioning based on frequency domain distribution

Info

Publication number: CN114827630A
Application number: CN202210241583.1A
Authority: CN
Inventors: 许皓淇; 曹英烈; 周智恒
Original assignee: South China University of Technology SCUT; Guangzhou City University of Technology
Current assignee: South China University of Technology SCUT; Guangzhou City University of Technology
Priority date: 2022-03-11
Filing date: 2022-03-11
Publication date: 2022-07-29
Anticipated expiration: 2042-03-11
Also published as: CN114827630B

Abstract

The invention discloses a method, a system, a device and a medium for learning CU deep partitioning based on frequency domain distribution, wherein the method comprises the following steps: dividing the image into a plurality of 64x64 size blocks and performing DCT to obtain a frequency domain coefficient distribution matrix F ₆₄ (ii) a Corresponding to W ₆₄ Calculating a probability score p ₆₄ If it is less than the division threshold α ₆₄ The downward division is ended and greater than the division threshold α ₆₄ Continuously dividing the data into 4 32x32 sub-CU blocks according to the quadtree principle, and obtaining a 32x 32-size frequency domain coefficient matrix F in the same way ₃₂ And W ₃₂ Is calculated to obtainGet the probability score p ₃₂ Then is compared with a division threshold value alpha ₃₂ Judging whether to continue dividing; and so on until all CU blocks stop partitioning ahead of time or partition the block into the smallest 8x8CU blocks. The invention judges whether to continue the division or not through the probability fraction and the division threshold value, does not need to carry out traversal recursion on all conditions, lightens the complexity of CU deep division, saves a large amount of encoding time, and can be widely applied to the technical field of video encoding.

Description

Method, system, device and medium for learning CU deep partition based on frequency domain distribution

Technical Field

The invention relates to the technical field of artificial intelligence and video coding, in particular to a method, a system, a device and a medium for CU deep partitioning based on frequency domain distribution learning.

Background

With the development of internet and communication technology in recent years, the rapid increase of video traffic has brought about great challenges to video coding technology.

In the conventional coding framework (HEVC for example), any coded frame needs to be divided into multiple ctu (code Tree unit) sequences before performing the subsequent quantization operation of the predictive transform. The CTU can be continuously divided into CUs (code units) with different sizes downwards according to the principle of a quadtree, and the sizes of the CUs are maximally 64x64 and minimally 8x 8. The division of CTUs determines the efficiency of subsequent coding.

In order to obtain the optimal CTU partition, the encoding procedure uses a traversal recursive method, which continuously partitions the 64x64CU down to 8x8CU, and predicts each partition by using a built-in rate-distortion cost function until one of the optimal prediction cases is selected.

The division mode causes great waste on coding time and computing resources, and the resolution of the video image is obviously improved along with the increase of the resolution of the video image. Therefore, how to reduce the complexity of CU depth partitioning becomes a popular problem in the current industry.

Disclosure of Invention

To solve at least one of the technical problems in the prior art to a certain extent, an object of the present invention is to provide a method, a system, an apparatus, and a medium for learning CU depth partitioning based on frequency domain distribution.

The technical scheme adopted by the invention is as follows:

a method for learning CU deep division based on frequency domain distribution comprises the following steps:

obtaining a video image, dividing the video image into a plurality of first CU blocks with the size of 64x64, and obtaining a frequency domain coefficient distribution matrix of DCT (discrete cosine transform) transformation of the first CU blocks

According to the coefficient distribution matrix of frequency domain

Obtaining a probability score

If it is

The first CU block is divided downwards into 4 second CU blocks with the size of 32x32, and a frequency domain coefficient distribution matrix of DCT transformation of the second CU blocks is obtained

According to the coefficient distribution matrix of frequency domain

Obtaining a probability score

Otherwise, ending the division of the first CU block;

if it is

Dividing the second CU block into 4 third CU blocks with the size of 16x16 downwards, and acquiring a frequency domain coefficient distribution matrix of DCT (discrete cosine transform) transformation of the third CU blocks

According to the coefficient distribution matrix of frequency domain

Obtaining a probability score

Otherwise, ending the division of the second CU block;

if it is

Divide the third CU block down into 4 fourth CU blocks of size 8x 8; otherwise, ending the division of the third CU block;

wherein ,α_N To divide the threshold, N is 64, 32, 16.

Further, the probability score

Obtained by the following formula:

wherein ,W₆₄ And distributing the weight matrix for a preset frequency domain, wherein i represents a row coordinate of the matrix, and j represents a row coordinate of the matrix.

Further, the frequency domain distribution weight matrix W ₆₄ Obtained by the following method:

obtaining training set and sample needed by network

Representing a DCT transform frequency domain coefficient distribution matrix corresponding to a k 64x64 size CU block; l is _k When the CU block is divided down, 0 and 1 indicate that the CU block is divided down, 0 indicates no, and 1 indicates yes;

training the network according to a preset loss function to obtain a frequency domain distribution weight matrix W ₆₄ ；

The expression of the preset loss function is as follows:

further, the division threshold α ₆₄ Obtained by the following method:

selecting label samples with L equal to 0 in training set

Calculating a probability score from the selected samples

From calculated probability scores

Obtaining a partition threshold α ₆₄ 。

Further, threshold values are divided

Further, the dividing the video image into first CU blocks of 64x64 size includes:

the video image is divided into several first CU blocks of 64x64 size according to the luminance component.

after dividing the video image into a plurality of first CU blocks of 64x64 size, pixel interpolation is carried out on the pixel area of 64x64 size of the residual insufficient division.

The other technical scheme adopted by the invention is as follows:

a frequency domain distribution-based learning CU depth partitioning system, comprising:

the first dividing module is used for acquiring a video image, dividing the video image into a plurality of first CU blocks with the size of 64x64, and acquiring a frequency domain coefficient distribution matrix of DCT (discrete cosine transform) transformation of the first CU blocks

According to the coefficient distribution matrix of frequency domain

Obtaining a probability score

A second division module for if

According to the coefficient distribution matrix of frequency domain

Obtaining a probability score

Otherwise, ending the division of the first CU block;

a third division module for if

According to the coefficient distribution matrix of frequency domain

Obtaining a probability score

Otherwise, ending the division of the second CU block;

a fourth division module for if

wherein ,α_N To divide the threshold, N is 64，32，16。

The other technical scheme adopted by the invention is as follows:

an apparatus for learning CU depth partitioning based on frequency domain distribution, comprising:

at least one processor;

at least one memory for storing at least one program;

when executed by the at least one processor, cause the at least one processor to implement the method described above.

The other technical scheme adopted by the invention is as follows:

a computer readable storage medium in which a processor executable program is stored, which when executed by a processor is for performing the method as described above.

The invention has the beneficial effects that: the invention judges whether to continue the division or not through the probability fraction and the division threshold value to obtain a mode of terminating the division in advance, does not need to carry out traversal recursion on all conditions, reduces the complexity of CU deep division and saves a large amount of coding time.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description is made on the drawings of the embodiments of the present invention or the related technical solutions in the prior art, and it should be understood that the drawings in the following description are only for convenience and clarity of describing some embodiments in the technical solutions of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a flowchart illustrating steps of a method for learning CU deep partition based on frequency domain distribution according to an embodiment of the present invention

FIG. 2 is a schematic diagram of a video image in an embodiment of the invention;

FIG. 3 is a pictorial representation of a CU block 64x64DCT transform frequency-domain coefficient distribution of the video image of FIG. 2;

FIG. 4 is a schematic flow chart of a method for learning CU depth fast partition based on frequency domain distribution according to an embodiment of the present invention;

FIG. 5 is a diagram illustrating a frequency domain distribution weight matrix W according to an embodiment of the present invention ₆₄ And a division threshold α ₆₄ And (4) network training flow chart.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention and are not to be construed as limiting the present invention. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.

In the description of the present invention, it should be understood that the orientation or positional relationship referred to in the description of the orientation, such as the upper, lower, front, rear, left, right, etc., is based on the orientation or positional relationship shown in the drawings, and is only for convenience of description and simplification of description, and does not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention.

In the description of the present invention, the meaning of a plurality of means is one or more, the meaning of a plurality of means is two or more, and larger, smaller, larger, etc. are understood as excluding the number, and larger, smaller, inner, etc. are understood as including the number. If the first and second are described for the purpose of distinguishing technical features, they are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated or implicitly indicating the precedence of the technical features indicated.

In the description of the present invention, unless otherwise explicitly limited, terms such as arrangement, installation, connection and the like should be understood in a broad sense, and those skilled in the art can reasonably determine the specific meanings of the above terms in the present invention in combination with the specific contents of the technical solutions.

As shown in fig. 1, the present embodiment provides a method for learning CU depth partitioning based on frequency domain distribution, including the following steps:

s101, obtaining a video image, dividing the video image into a plurality of first CU blocks with the size of 64x64, and obtaining a frequency domain coefficient distribution matrix of DCT (discrete cosine transform) transformation of the first CU blocks

According to the coefficient distribution matrix of frequency domain

Obtaining a probability score

The luminance component of the video frame image is selected and divided into N64 x64 sized CU blocks. If the remaining areas are less than 64x64 pixels, the areas are interpolated.

DCT transformation is carried out on the CU area with the size of k 64x64 to obtain a frequency domain coefficient matrix

k is 1,2, N. Calculating the division probability:

s102, if

According to the coefficient distribution matrix of frequency domain

Obtaining a probability score

Otherwise, the division of the first CU block is ended.

If it is

The CU area stops partitioning in advance; if it is

The CU area (called LCU _64) continues to be divided down into 4CU areas of size 32x32 according to the quadtree principle.

For the continued downward partition of LCU _64, the frequency domain coefficient matrix of the mth CU area with the size of 32x32 is obtained

m is 1,2,3, 4. Calculating the division probability:

s103, if

According to the coefficient distribution matrix of frequency domain

Obtaining a probability score

Otherwise, the division of the second CU block is ended.

If it is

The CU area stops partitioning in advance; if it is

The CU area (called LCU _32) continues to be divided down into 4CU areas of size 16x16 according to the quadtree principle.

Similarly, the LCU _32 is continuously divided downwards to obtain the frequency domain coefficient matrix of the nth CU area with the size of 32x32

n is 1,2,3, 4. Calculating the division probability:

s104, if

Divide the third CU block down into 4 fourth CU blocks of size 8x 8; otherwise, the division of the third CU block is ended.

If it is

The CU area stops partitioning in advance; if it is

The CU area continues to be divided down into 4CU areas of size 8x8 according to the quadtree principle, ending.

Therefore, the method can obtain a mode of terminating the partition in advance, does not need to perform traversal recursion on all the situations, reduces the complexity of CU deep partition, and saves a large amount of coding time. Meanwhile, the probability fraction nature of the calculation division based on the frequency domain learning is only calculated on the two matrixes, the processor is simple and convenient to process, and great calculation resources and time can be saved.

The above method is explained in detail below with reference to the drawings and the specific embodiments.

As shown in fig. 4, fig. 4 is a diagram illustrating the frequency domain distribution-based CU deep learning method according to an embodiment of the present inventionA flow chart of a fast partitioning method. The method comprises a frequency domain distribution learning network module and a CU deep division judging module. When the CU deep division judgment is carried out, a frequency domain distribution weight matrix W needs to be learned through a frequency domain distribution learning network module firstly _N And a division threshold value alpha _N Two important parameters, N64, 32, 16.

FIG. 5 shows a learning frequency domain distribution weight matrix W provided by an embodiment of the present invention ₆₄ And a division threshold α ₆₄ And (4) network training flow chart. Distributing the weight matrix W for the remaining frequency domains ₃₂ ，W ₁₆ And a division threshold α ₃₂ ，α ₁₆ And similar training process is also performed, and additional drawings are omitted.

Step A1: obtaining training set and sample needed by network

step A2: setting a loss function

Training the network to obtain a frequency domain distribution weight matrix W ₆₄ ；

Step A3: according to the frequency domain distribution weight matrix W which is learned currently ₆₄ Selecting a label sample with L being 0 in the data set

Computing partition probability scores

Step A4: observation probability score

Selecting a suitable way to set the division threshold value alpha ₆₄ . In this embodiment, the selection is made

Better experimental results can be obtained.

Referring to fig. 2 and 3, the more high-frequency components in a region tends to be divided into smaller CU blocks, and the frequency domain distribution learning network module learns how to represent the relationship between the richness (content richness) of the high-frequency components and the CU division depth. This association distributes the weight matrix W across the domains _N And a division threshold α _N Parameter representation.

Step S1: dividing the brightness component of any video image into a plurality of CU blocks with the size of 64x64, and solving a frequency domain coefficient distribution matrix of DCT (discrete cosine transform) transformation of the CU blocks

According to

Calculating the corresponding probability score;

step S2: decision probability score

And a division threshold α ₆₄ The relationship between;

step S3: if it is

Ending the partitioning of the CU block in advance;

step S4: if it is

The CU block is divided down into 4 32x32 sub-CU blocks according to the quadtree principle;

step S5: performing DCT (discrete cosine transform) on 4 32x32 sub-CU blocks to obtain frequency domain coefficient distribution matrix

Calculating a probability score

Step S6: decision probability score

And a division threshold α ₃₂ The relationship between;

step S7: if it is

Ending the partitioning of the CU block in advance;

step S8: if it is

The CU block is divided down into 4 16x16 sub-CU blocks according to the quadtree principle;

step S9: performing DCT (discrete cosine transform) on 4 16x16 sub-CU blocks to obtain frequency domain coefficient distribution matrix

Calculating a probability score

Step S10: decision probability score

And a division threshold α ₁₆ The relationship between;

step S11: if it is

Ending the partitioning of the CU block in advance;

step S12: if it is

The CU block is divided down into 4 8x8 sub-CU blocks according to the quadtree principle, and this division is ended.

It can be seen that steps S3, S7, and S11 all have a chance to end partitioning in advance, so that in many cases, it is avoided to make decisions after traversing all partitioning modes of a CU block, and waste of encoding time and computational resources can be avoided. The division among different sub-CU blocks is independent and not influenced, so that a program can be processed in parallel, a plurality of sub-CU blocks are decided at the same time, and the coding time is greatly saved. Wherein, only one step is executed in S3 and S4, S7 and S8, and S11 and S12, even if only 9 steps are needed for dividing a 64x64CU block into a minimum 8x8CU block, which involves 3 DCT transformation steps, 3 inter-matrix operations, and three judgments, thereby greatly reducing the demand on the computing resources of the processor.

In the test experiment in the embodiment, a CU division depth result similar to that obtained by the conventional coding framework HEVC can be obtained, and the coding time is greatly reduced on the premise of not affecting the video quality and the code rate.

The present embodiment further provides a system for learning CU depth partitioning based on frequency domain distribution, including:

According to the coefficient distribution matrix of frequency domain

Obtaining a probability score

A second division module for if

According to the coefficient distribution matrix of frequency domain

Obtaining a probability score

Otherwise, ending the division of the first CU block;

a third division module for if

According to the coefficient distribution matrix of frequency domain

Obtaining a probability score

Otherwise, ending the division of the second CU block;

a fourth division module for if

wherein ,α_N To divide the threshold, N is 64, 32, 16.

The frequency domain distribution learning-based CU depth partitioning system of the present embodiment can execute the frequency domain distribution learning-based CU depth partitioning method provided in the embodiment of the present invention, can execute any combination of the implementation steps of the embodiment of the method, and has corresponding functions and advantageous effects of the method.

The present embodiment further provides a device for learning CU depth partitioning based on frequency domain distribution, including:

at least one processor;

at least one memory for storing at least one program;

when executed by the at least one processor, cause the at least one processor to implement the method of fig. 1.

The device for learning CU depth partitioning based on frequency domain distribution according to the present embodiment can perform the method for learning CU depth partitioning based on frequency domain distribution provided in the embodiments of the method of the present invention, can perform any combination of the implementation steps of the embodiments of the method, and has corresponding functions and advantages of the method.

The embodiment of the application also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and executed by the processor to cause the computer device to perform the method illustrated in fig. 1.

The present embodiment further provides a storage medium, which stores an instruction or a program capable of executing the frequency domain distribution learning CU depth partitioning method according to the embodiments of the present invention, and when the instruction or the program is executed, the instruction or the program can execute any combination of the implementation steps of the embodiments of the method, and has corresponding functions and advantages of the method.

In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.

Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.

The functions may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the technical solution of the present invention or a part thereof which substantially contributes to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

In the foregoing description of the specification, reference to the description of "one embodiment/example," "another embodiment/example," or "certain embodiments/examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A method for learning CU depth partitioning based on frequency domain distribution is characterized by comprising the following steps:

According to the coefficient distribution matrix of frequency domain

Obtaining a probability score

If it is

According to the coefficient distribution matrix of frequency domain

Obtaining a probability score

Otherwise, ending the division of the first CU block;

if it is

According to the coefficient distribution matrix of frequency domain

Obtaining a probability score

Otherwise, ending the division of the second CU block;

if it is

wherein ,α_N To divide the threshold, N is 64, 32, 16.

2. The method as claimed in claim 1, wherein the probability score is used to learn CU depth partition based on frequency domain distribution

By the followingThe formula is calculated to obtain:

wherein ,W₆₄ For a preset frequency domain distribution weight matrix, i represents the row coordinate of the matrix, and j represents the column coordinate of the matrix.

3. The method as claimed in claim 2, wherein the frequency domain distribution weight matrix W is a weight matrix of learning CU ₆₄ Obtained by the following method:

obtaining training set and sample needed by network

The expression of the preset loss function is as follows:

4. the method as claimed in claim 3, wherein the partition threshold α is a ₆₄ Obtained by the following method:

selecting label samples with L being 0 in training set

According to the selectionCalculating probability scores of the samples

From calculated probability scores

Obtaining a partition threshold α ₆₄ 。

5. The method as claimed in claim 4, wherein the partition threshold is set according to the frequency domain distribution

6. The method according to claim 1, wherein the dividing the video image into 64x64 sized first CU blocks comprises:

7. The method according to claim 1, wherein the dividing the video image into 64x64 sized first CU blocks comprises:

after dividing a video image into a plurality of first CU blocks with the size of 64x64, pixel interpolation is carried out on a pixel area with the size of 64x64 in the residual insufficient division.

8. A system for learning CU deep partitioning based on frequency domain distribution, comprising:

According to the coefficient distribution matrix of frequency domain

Obtaining a probability score

A second division module for if

According to the coefficient distribution matrix of frequency domain

Obtaining a probability score

Otherwise, ending the division of the first CU block;

a third division module for if

According to the coefficient distribution matrix of frequency domain

Obtaining a probability score

Otherwise, ending the division of the second CU block;

a fourth division module for if

Divide the third CU block down into 4 fourth CU blocks of size 8x 8;

otherwise, ending the division of the third CU block;

wherein ,α_N To divide the threshold, N is 64, 32, 16.

9. An apparatus for learning CU depth partitioning based on frequency domain distribution, comprising:

at least one processor;

at least one memory for storing at least one program;

when executed by the at least one processor, cause the at least one processor to implement the method of any one of claims 1-7.

10. A computer-readable storage medium, in which a program executable by a processor is stored, wherein the program executable by the processor is adapted to perform the method according to any one of claims 1 to 7 when executed by the processor.