CN116389752A

CN116389752A - Video encoding method and device, video decoding method and device and electronic equipment

Info

Publication number: CN116389752A
Application number: CN202310530604.6A
Authority: CN
Inventors: 张韵东; 张博; 昝劲文; 李国新
Original assignee: Zhongxing Intelligent System Technology Co ltd; Chongqing Zhongxing Micro Artificial Intelligence Chip Technology Co ltd; Vimicro Corp
Current assignee: Zhongxing Intelligent System Technology Co ltd; Chongqing Zhongxing Micro Artificial Intelligence Chip Technology Co ltd; Vimicro Corp
Priority date: 2023-05-11
Filing date: 2023-05-11
Publication date: 2023-07-04

Abstract

The disclosure provides a video encoding method and device, a video decoding method and device and electronic equipment, and relates to the technical field of digital multimedia. The video encoding method includes: acquiring a target image block in a video frame; traversing uncoded image blocks in a video frame, and determining at least one first image block conforming to a preset rule; respectively acquiring respective first residual matrixes of a target image block and at least one first image block; fusing the first residual matrixes of the target image block and at least one first image block respectively to obtain a second residual matrix; and under the condition that the coding cost of the second residual matrix is smaller than the sum of the coding cost of the first residual matrix of each of the target image block and the at least one first image block, obtaining the coding data corresponding to the target image block and the at least one first image block based on the second residual matrix. Therefore, the expression efficiency of the coded data can be improved, the code rate of video quantization output is greatly reduced, and the video compression efficiency and the video coding performance are integrally improved.

Description

Video encoding method and device, video decoding method and device and electronic equipment

Technical Field

The disclosure relates to the technical field of digital multimedia, in particular to a video coding method and device, a video decoding method and device and electronic equipment.

Background

Digital multimedia technology processes, stores, transmits, and presents digitized information using computer technology and communication technology. As an emerging information carrier, video is vivid, intuitive, diverse and easy to understand, and has wide application prospects in information transmission and social communication. Video coding technology has therefore become a current research focus. Research institutions have invested considerable resources and effort in improving the efficiency of video coding, that is, in reducing the amount of data as much as possible while maintaining visual quality.

Video consists of successive video frames, and there is typically a strong similarity between successive video frames, i.e. there is a lot of redundant information. In the related art, a video frame is described by a predicted image plus a residual, so that redundancy in the spatial and temporal dimensions is reduced and video compression efficiency is improved. However, some redundancy and internal correlation still remain in the residual, which affects video compression efficiency and video coding performance.

Disclosure of Invention

In view of the above, the present disclosure provides a video encoding method and apparatus, a video decoding method and apparatus, and an electronic device, so as to solve the encoding performance problem in the related art.

In a first aspect, a video encoding method is provided, including: acquiring a target image block in a video frame; traversing uncoded image blocks in a video frame, and determining at least one first image block conforming to a preset rule; respectively acquiring respective first residual matrixes of a target image block and at least one first image block; fusing the first residual matrixes of the target image block and at least one first image block respectively to obtain a second residual matrix; and under the condition that the coding cost of the second residual matrix is smaller than the sum of the coding cost of the first residual matrix of each of the target image block and the at least one first image block, obtaining the coding data corresponding to the target image block and the at least one first image block based on the second residual matrix.

In some embodiments, traversing uncoded image blocks in a video frame, determining at least one first image block that meets a preset rule comprises: traversing uncoded image blocks in the video frame, and determining the image blocks with the prediction modes being inter-frame prediction as second image blocks; traversing second image blocks in the video frame, and determining k second image blocks which are the same as the target image block in area and are adjacent to the target image block as first image blocks, wherein k is a positive integer.

In some embodiments, traversing uncoded image blocks in a video frame, determining at least one first image block that meets a preset rule comprises: traversing uncoded image blocks in the video frame, and determining the image blocks whose prediction mode is inter-frame prediction as third image blocks; traversing the third image blocks in the video frame, and determining at least one third image block that is adjacent to the target image block as a first image block, wherein the sum of the areas of the at least one third image block and the target image block is equal to a preset area.

In some embodiments, the encoding cost is a rate-distortion cost calculated based on a stream length and a video distortion level of the encoded data of the target image block.

In some embodiments, the video encoding method further comprises: and under the condition that the coding cost of the second residual matrix is greater than or equal to that of the first residual matrix of the target image block, obtaining the coded data corresponding to the target image block based on the first residual matrix of the target image block.

In some embodiments, the prediction mode of the target image block is inter prediction.

In a second aspect, there is provided a video decoding method, comprising: acquiring coded data corresponding to a target image block and at least one first image block, wherein the coded data comprises a second residual matrix corresponding to the target image block and the at least one first image block, and the coded data is obtained based on the method of the first aspect; acquiring a preset rule corresponding to a target image block; splitting the second residual matrixes based on a preset rule to obtain a plurality of first residual matrixes; the encoded data is decoded into a target image block and at least one first image block based on a plurality of first residual matrices.

In a third aspect, there is provided a video encoding apparatus comprising: the first acquisition module is used for acquiring a target image block in a video frame; the determining module is used for traversing uncoded image blocks in the video frame and determining at least one first image block conforming to a preset rule; the second acquisition module is used for respectively acquiring first residual matrixes of the target image block and at least one first image block; the fusion module is used for fusing the first residual matrixes of the target image block and at least one first image block to obtain a second residual matrix; and the encoding module is used for obtaining encoding data corresponding to the target image block and the at least one first image block based on the second residual matrix under the condition that the encoding cost of the second residual matrix is smaller than the sum of the encoding cost of the first residual matrix of each of the target image block and the at least one first image block.

In a fourth aspect, there is provided a video decoding apparatus comprising: the first acquisition module is used for acquiring the target image block and the coded data corresponding to the at least one first image block, wherein the coded data comprises a second residual matrix corresponding to the target image block and the at least one first image block, and the coded data is obtained based on the method of the first aspect; the second acquisition module is used for acquiring a preset rule corresponding to the target image block; the splitting module is used for splitting the second residual matrixes based on a preset rule to obtain a plurality of first residual matrixes; and a decoding module for decoding the encoded data into a target image block and at least one first image block based on the plurality of first residual matrices.

In a fifth aspect, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the methods of the first and second aspects described above via execution of executable instructions.

In a sixth aspect, there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the methods of the first and second aspects.

According to the video coding method provided by the embodiment of the disclosure, the target image block and the residual error matrix of the first image block conforming to the preset rule are fused to obtain the second residual error matrix, and the internal correlation of the residual error matrix is optimized; and comparing the sum of the coding costs of the second residual matrix and the first residual matrix of each of the target image block and the first image block, and obtaining the coding data corresponding to the target image block and the first image block based on the second residual matrix under the condition that the coding cost of the second residual matrix is smaller. Therefore, the expression efficiency of the coded data can be improved, the code rate of video quantization output is greatly reduced, and the video compression efficiency and the video coding performance are integrally improved.

Drawings

Fig. 1 is a flow chart illustrating a video encoding method according to an embodiment of the disclosure.

Fig. 2 is a flowchart illustrating a process of traversing an uncoded image block in a video frame to determine at least one first image block according to a preset rule according to an embodiment of the present disclosure.

Fig. 3 is a flowchart illustrating another method for traversing an uncoded image block in a video frame to determine at least one first image block according to a preset rule according to an embodiment of the present disclosure.

Fig. 4 is a flow chart illustrating a video decoding method according to an embodiment of the disclosure.

Fig. 5 illustrates a schematic structural diagram of a video encoding apparatus in an embodiment of the present disclosure.

Fig. 6 illustrates a schematic structure of a video decoding apparatus in an embodiment of the present disclosure.

Fig. 7 shows a schematic structural diagram of an electronic device in an embodiment of the disclosure.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.

Video coding technology removes redundant information from an original video file through a compression algorithm and reduces the size of the video file, so that the video file is easier to transmit, store and process. In the related art, each video frame is divided into a plurality of image blocks. Each image block is predicted using an intra-frame prediction mode or an inter-frame prediction mode to obtain a corresponding predicted image block, and the predicted image block is subtracted from the image block to obtain the residual matrix of the image block. Operations such as frequency domain transformation, quantization and entropy coding are then performed on the residual matrix to obtain the coded data of the image block. In this process, the frequency domain transformation expects the elements of the residual matrix to satisfy a certain distribution in order to achieve a good compression effect. In practice, however, the residual matrix often fails to satisfy this condition, so the coding performance cannot be guaranteed.

In view of this, the present disclosure provides a video encoding method, including: acquiring a target image block in a video frame; traversing uncoded image blocks in a video frame, and determining at least one first image block conforming to a preset rule; respectively acquiring respective first residual matrixes of a target image block and at least one first image block; fusing the first residual matrixes of the target image block and at least one first image block respectively to obtain a second residual matrix; and under the condition that the coding cost of the second residual matrix is smaller than the sum of the coding cost of the first residual matrix of each of the target image block and the at least one first image block, obtaining the coding data corresponding to the target image block and the at least one first image block based on the second residual matrix. Therefore, the expression efficiency of the coded data can be improved, the code rate of video quantization output is greatly reduced, and the video compression efficiency and the video coding performance are integrally improved.

The embodiment of the disclosure provides a video encoding method, a video decoding method, a device, electronic equipment and a storage medium. The video encoding device and/or the video decoding device may be integrated in an electronic device, which may be a terminal such as a smart phone, a video camera, a monitoring camera, or a server.

It is understood that the video encoding method and/or the video decoding method according to the embodiments of the present disclosure may be performed on a terminal, may be performed on a server, or may be performed jointly by the terminal and the server. This should not be construed as limiting the present disclosure.

The video encoding method and the video decoding method of the embodiment of the disclosure can be applied to any video encoder and decoder, and are applicable to any video encoding standard such as H.265, H.264, AVS, AV1 and the like. It is to be understood that the above examples are intended solely to clearly describe the specific implementations of the embodiments of the present disclosure and should not be construed as limiting the present disclosure.

The present exemplary embodiment will be described in detail below with reference to the accompanying drawings and examples.

First, in an embodiment of the present disclosure, a video encoding method is provided, which may be performed by any electronic device having computing processing capabilities.

Fig. 1 is a schematic flow chart of a video encoding method according to an embodiment of the disclosure, and as shown in fig. 1, the video encoding method provided by the embodiment of the disclosure includes the following steps.

S101, acquiring a target image block in a video frame.

In particular, a video consists of a series of consecutive still pictures, and a video frame is a still picture in the video. The encoder divides each video frame into a plurality of image blocks according to a particular rule. The encoder may divide the video frame into a plurality of image blocks of the same size or into a plurality of image blocks of non-identical size. Each image block includes chrominance and luminance information of a portion of the image in the video frame. For example, the encoder may divide the video frame into a plurality of coding tree units with the same size according to a fixed size, and then tree-divide each coding tree unit according to a specific rule to obtain a plurality of image blocks.

In some embodiments, any image block may be selected as a target image block, and after the processing of the current target image block is finished, other unprocessed image blocks in the video frame are selected as new target image blocks. Thus, the method of the embodiment is guaranteed to traverse all image blocks in the video frame.

S102, traversing uncoded image blocks in the video frame, and determining at least one first image block conforming to a preset rule.

In some embodiments, the encoder traverses all the image blocks not encoded in the video frame in which the target image block is located, and determines at least one image block that meets a preset rule as the first image block. Illustratively, the encoder starts searching from an uncoded image block numbered closest to the target image block according to the image block numbering order, and stops searching when an image block conforming to a preset rule is searched. The image blocks in the same video frame can be numbered according to a scanning sequence from left to right and from top to bottom, and the number of each image block is unique.
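The raster-scan numbering described above can be sketched as follows for the simplest case, in which the frame is cut into a fixed-size grid. This is only an illustration under stated assumptions: the function name `number_blocks`, the tuple layout, the numbering starting at 1, and the clipping of blocks at the right and bottom borders are hypothetical choices, not details given in the disclosure.

```python
from typing import List, Tuple

# (number, x, y, width, height); this layout is an assumption for illustration
NumberedBlock = Tuple[int, int, int, int, int]

def number_blocks(frame_w: int, frame_h: int, size: int) -> List[NumberedBlock]:
    """Cut a frame into size x size blocks and number them in raster order."""
    blocks: List[NumberedBlock] = []
    number = 1  # assumed to start at 1
    for y in range(0, frame_h, size):        # top to bottom
        for x in range(0, frame_w, size):    # left to right
            w = min(size, frame_w - x)       # clip at the right border
            h = min(size, frame_h - y)       # clip at the bottom border
            blocks.append((number, x, y, w, h))
            number += 1
    return blocks

grid = number_blocks(64, 48, 32)
```

Because the numbers grow along the scanning order, searching from the number closest to the target block amounts to visiting the blocks that follow it most closely in the scan.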

For example, the preset rule may be a rule set for an area of the image block; alternatively, the preset rule may be a rule set for the length and/or width of the image block.

In some embodiments, if the encoder has traversed all the uncoded image blocks in the video frame where the target image block is located and still has not found an image block meeting the preset rule, the target image block is encoded separately to obtain the encoded data corresponding to the target image block, and the processing of the current target image block is ended.

S103, respectively acquiring respective first residual matrixes of the target image block and at least one first image block.

In the encoding process, the encoder calculates the difference between the original image block and the predicted image block to obtain residual information, and stores the residual information into the encoding information, so that the data volume of the video code stream is reduced, and the compression efficiency and the video quality are improved. Wherein, the predictive image block is calculated by an encoder based on a predictive algorithm, and the residual information is usually represented by a matrix. Thus, the above-mentioned processing is performed on the target image block and the at least one first image block, respectively, so that the respective first residual matrices of the target image block and the at least one first image block can be obtained.

And S104, fusing the first residual matrixes of the target image block and at least one first image block to obtain a second residual matrix.

In some embodiments, multiple first residual matrices or transposed versions of the first residual matrices may be stitched into one large matrix as the second residual matrix. Wherein the elements in the second residual matrix are a set of all elements in the plurality of first residual matrices.

Illustratively, suppose the first residual matrices comprise an 8×4 matrix A and a 4×8 matrix B. To fuse the two matrices, A and the transpose of B can be arranged horizontally and combined to obtain an 8×8 second residual matrix $(A \mid B^T)_{8\times 8}$; or the transpose of A and B can be arranged vertically and combined to obtain an 8×8 second residual matrix $\begin{pmatrix} A^T \\ B \end{pmatrix}_{8\times 8}$.

Illustratively, suppose the first residual matrices comprise a 4×4 matrix C and a 4×4 matrix D. To fuse the two matrices, C and D can be arranged horizontally and combined to obtain a 4×8 second residual matrix $(C \mid D)_{4\times 8}$; or C and D can be arranged vertically and combined to obtain an 8×4 second residual matrix $\begin{pmatrix} C \\ D \end{pmatrix}_{8\times 4}$.

The above combining manner can combine a plurality of first residual matrices into one second residual matrix. The combination method is simple and easy to restore, and the information contained in the original first residual matrix is not lost.
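A minimal sketch of the stitching and its inverse is given below for the 8×4 / 4×8 case described above. The helper names (`transpose`, `hstack`, `hsplit`) and the placeholder element values are assumptions made for illustration; the disclosure does not prescribe a data layout.

```python
from typing import List, Tuple

Matrix = List[List[int]]

def transpose(m: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

def hstack(left: Matrix, right: Matrix) -> Matrix:
    """Arrange two matrices with equal row counts side by side."""
    if len(left) != len(right):
        raise ValueError("row counts must match")
    return [l + r for l, r in zip(left, right)]

def hsplit(fused: Matrix, left_cols: int) -> Tuple[Matrix, Matrix]:
    """Undo hstack, given the column count of the left part."""
    return ([row[:left_cols] for row in fused],
            [row[left_cols:] for row in fused])

# Placeholder residuals: A is 8x4, B is 4x8 (values are arbitrary)
A = [[r * 4 + c for c in range(4)] for r in range(8)]
B = [[100 + r * 8 + c for c in range(8)] for r in range(4)]

fused = hstack(A, transpose(B))      # (A | B^T), 8x8
A_back, Bt_back = hsplit(fused, 4)   # decoder-side split
B_back = transpose(Bt_back)          # undo the transpose
```

The split needs only the shapes of the original blocks and the arrangement that was used, which is consistent with the statement that the combination is easy to restore and loses no information.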

In some embodiments, elements in the second residual matrix may be rearranged through matrix transformation, so that information with approximate energy sizes in the second residual matrix is gathered together, so as to improve energy aggregation, and enable the second residual matrix to present a better coding effect.

And S105, obtaining the coded data corresponding to the target image block and the at least one first image block based on the second residual matrix under the condition that the coding cost of the second residual matrix is smaller than the sum of the coding cost of the first residual matrix of each of the target image block and the at least one first image block.

In some embodiments, the second residual matrix is frequency domain transformed. The frequency domain transformation is an orthogonal transformation for transforming the image from the space domain to other domains (such as the frequency domain), and the complex computation in the space domain can be simplified after being transformed to the other domains by means of the characteristic of the orthogonal transformation, so as to obtain the frequency domain coefficient of the second residual matrix. Illustratively, the frequency domain transform may be a discrete cosine transform (Discrete Cosine Transform, DCT) or a discrete sine transform (Discrete Sine Transform, DST), or the like. The frequency domain transform can compress most of the energy in the information into fewer coefficients, further increasing the energy concentration.
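The energy-compaction property mentioned above can be illustrated with a small orthonormal two-dimensional DCT-II written in plain Python. This is a sketch for illustration only: real encoders use fast integer approximations defined by the relevant standard, and the function names `dct_1d` and `dct_2d` are assumptions.

```python
import math
from typing import List, Sequence

def dct_1d(x: Sequence[float]) -> List[float]:
    """Orthonormal DCT-II of a one-dimensional sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def dct_2d(block: Sequence[Sequence[float]]) -> List[List[float]]:
    """Separable 2D DCT: transform the rows, then the columns."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(c) for c in zip(*rows)]   # cols[j] holds column j
    return [list(r) for r in zip(*cols)]     # back to row-major order

# A perfectly flat residual: all energy ends up in the DC coefficient
flat = [[3.0] * 4 for _ in range(4)]
coeffs = dct_2d(flat)
```

Because the transform is orthonormal, the sum of squared coefficients equals the sum of squared input samples; the transform only redistributes energy, ideally into a few low-frequency coefficients.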

The frequency domain coefficients of the second residual matrix still carry a large amount of information. To compress it further, the frequency domain coefficients are quantized, that is, each coefficient is mapped to a specific range of values. Illustratively, each coefficient may be divided by a scaling factor and the result rounded so that the remainder is discarded, ultimately quantizing the frequency domain coefficients into integers. In this process, part of the accuracy of the information (i.e., image quality) is lost, but redundant information is removed, so that coding efficiency is greatly improved and the amount of information is compressed.
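The division-and-rounding step can be sketched as below. A single uniform scaling factor (`step`) and round-to-nearest are assumptions made for illustration; practical encoders typically use per-frequency scaling and their own rounding offsets.

```python
from typing import List, Sequence

def quantize(coeffs: Sequence[Sequence[float]], step: float) -> List[List[int]]:
    """Divide each coefficient by the scaling factor and round to an integer."""
    return [[round(c / step) for c in row] for row in coeffs]

def dequantize(levels: Sequence[Sequence[int]], step: float) -> List[List[float]]:
    """Decoder-side approximation of the original coefficients."""
    return [[level * step for level in row] for row in levels]

coeffs = [[50.0, -7.0], [3.0, 0.4]]   # placeholder coefficients
levels = quantize(coeffs, 8.0)        # small coefficients collapse to 0
restored = dequantize(levels, 8.0)    # differs from coeffs: precision was lost
```

With round-to-nearest, the reconstruction error of each coefficient is at most half the scaling factor, which is the accuracy given up in exchange for fewer and smaller values to entropy-code.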

And carrying out entropy coding on the quantized frequency domain coefficients to finally obtain coded data of a second residual matrix, namely coded data corresponding to the target image block and at least one first image block.

In some embodiments, encoded data with high compression efficiency and a low code rate may have lost too much information, so that the image decoded from it is heavily distorted; conversely, if more information is retained, the image distortion is smaller but the code rate increases accordingly. The coding cost can therefore be used to measure the trade-off between image quality and coding efficiency. The coding cost represents the distortion of different methods at a given code rate; the smaller the coding cost of a method, the higher its coding performance.

Thus, in the face of two schemes of encoding a plurality of image blocks simultaneously based on the second residual matrix or encoding a target image block separately, the encoding costs of the first residual matrix of the target image block, the first residual matrix of the first image block, and the second residual matrix can be calculated, respectively. And if the coding cost of the second residual matrix is smaller than the sum of the coding cost of the first residual matrix of each of the target image block and the first image block, selecting to code the target image block and the first image block based on the second residual matrix to obtain corresponding coded data.

In some embodiments, 1 bit is used to record whether the current encoded data was encoded based on the second residual matrix, so that the decoder can subsequently determine, when decoding the encoded video, whether the second residual matrix needs to be split into a plurality of first residual matrices.
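The comparison and the 1-bit record can be sketched together as follows. The function names and the convention that a bit value of 1 means "encoded based on the second residual matrix" are assumptions; the disclosure only states that 1 bit is used.

```python
from typing import Sequence, Tuple

def choose_encoding(joint_cost: float,
                    separate_costs: Sequence[float]) -> Tuple[int, str]:
    """Return (flag_bit, mode) for the target block and its first image blocks.

    joint_cost: coding cost of the fused (second) residual matrix.
    separate_costs: coding costs of the individual first residual matrices.
    """
    if joint_cost < sum(separate_costs):
        return 1, "joint"      # encode all blocks from the second residual matrix
    return 0, "separate"       # encode the target block on its own

def needs_split(flag_bit: int) -> bool:
    """Decoder side: split the residual matrix only if the flag is set."""
    return flag_bit == 1
```

Note that when the two costs are equal the sketch falls back to separate encoding, matching the "smaller than" condition of step S105.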

Through the steps, the method of the embodiment can obtain the second residual matrix by fusing the residual matrix of the target image block and the residual matrix of the first image block conforming to the preset rule, and optimize the internal correlation of the residual matrix; and comparing the coding cost of the second residual matrix with the sum of the coding cost of the first residual matrix of each of the target image block and the first image block, and obtaining the coding data corresponding to the target image block and the first image block based on the second residual matrix under the condition that the coding cost of the second residual matrix is smaller. Therefore, the expression efficiency of the coded data can be improved, the code rate of video quantization output is greatly reduced, and the video compression efficiency and the video coding performance are integrally improved.

In some embodiments, the rate distortion cost of the residual matrix may be used as the coding cost. The rate distortion cost is obtained based on the code stream length and video distortion degree of the coded data of the target image block through calculation, and can be written as:

J = D + λR

where J is the rate-distortion cost, R is the code stream length of the encoded data, λ is the Lagrange coefficient, and D is the distortion degree. The distortion degree is obtained by inverse-quantizing and inverse-transforming the encoded data, combining the result with the predicted image block to obtain a reconstructed image block, and subtracting the reconstructed image block from the original target image block point by point. The smaller the rate-distortion cost of a residual matrix, the smaller the distortion at a given code rate, that is, the smaller the coding cost.

The rate-distortion cost fully reflects how the coding cost depends on the code stream length and the distortion degree; and by adjusting λ, the relative weight of the code stream length and the distortion degree can be flexibly adjusted. Therefore, using the rate-distortion cost as the coding cost gives good performance.
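A small numeric sketch of J = D + λR follows. The disclosure says only that D comes from a point-by-point difference between the original and reconstructed blocks; summing the squared differences (SSE) is an assumption here, as are the function names and the sample values.

```python
from typing import Sequence

def sse(original: Sequence[Sequence[float]],
        reconstructed: Sequence[Sequence[float]]) -> float:
    """Sum of squared point-by-point differences (assumed distortion measure)."""
    return sum((o - r) ** 2
               for row_o, row_r in zip(original, reconstructed)
               for o, r in zip(row_o, row_r))

def rd_cost(distortion: float, bits: float, lam: float) -> float:
    """Rate-distortion cost J = D + lambda * R."""
    return distortion + lam * bits

original = [[10, 12], [8, 9]]
reconstructed = [[10, 11], [9, 9]]
d = sse(original, reconstructed)

# Two hypothetical candidates: (distortion, bits)
precise = (2.0, 40.0)   # low distortion, long code stream
compact = (10.0, 20.0)  # higher distortion, short code stream
```

With a small λ such as 0.1 the low-distortion candidate has the lower cost, while with λ = 1 the shorter code stream wins, which is the weighting effect of λ described above.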

In some embodiments, if the encoding cost of the second residual matrix is greater than or equal to the sum of the encoding costs of the first residual matrices of the target image block and the first image block, selecting to encode the target image block based on the first residual matrix of the target image block to obtain encoded data corresponding to the target image block; and meanwhile, the first image block is not encoded, and the first image block is used as an uncoded image block to participate in the subsequent encoding process. In the encoding of the first residual matrix based on the target image block, the first residual matrix also needs to be subjected to frequency domain transformation and quantization.

Through the steps, the first image block which is not used as the new target image block in the subsequent processing process can be ensured, repeated calculation and processing of the first image block are avoided, so that the calculation force is saved, and the coding performance is improved.

One implementation of traversing an uncoded image block in a video frame, determining at least one first image block that meets a preset rule in an embodiment of the present disclosure is described below. As shown in fig. 2, traversing uncoded image blocks in a video frame provided in an embodiment of the present disclosure, determining at least one first image block that meets a preset rule includes the following steps.

S201, traversing an uncoded image block in a video frame, and determining the image block with the prediction mode being inter-frame prediction as a second image block.

As above, the predicted image block is calculated by the encoder based on the prediction algorithm. Specifically, during the encoding process, the encoder predicts the uncoded image block according to the reference image with highest similarity with the current uncoded image block and some algorithms (such as motion compensation technology), and obtains a predicted image block. Wherein each uncoded image block corresponds to a predicted image block. And, according to the difference of the positions of the reference images, the above prediction process is divided into two prediction modes of intra prediction and inter prediction.

In the intra prediction mode, luminance and chrominance information of adjacent pixels in a video frame are relatively close and gradually changed, and abrupt changes are not generated. Therefore, the coded image blocks in the same video frame can be used as reference images to predict the uncoded image blocks, so that the purpose of reducing the spatial redundancy is achieved. In the actual encoding process, the encoded data of the already encoded image block is decoded into the original image block as a reference image of the uncoded image block.

Or in the inter prediction mode, the change between the front and rear video frames in the same video is smaller, and the time correlation of the video is higher. Therefore, it is easier to find an image block with a small difference from the uncoded image block from the video frame other than the video frame where the uncoded image block is located as a reference image, and predict the uncoded image block, thereby achieving the purpose of reducing time redundancy.

Since the reference image of the image block in the intra prediction mode is a certain image block in the current video frame, when the method of the embodiment is executed, it cannot be guaranteed that the reference image of the first image block conforming to the preset rule is an encoded image block, and at this time, an additional algorithm needs to be set to ensure that the method of the embodiment is continuously executed. Thus, in some embodiments, the image block of the inter prediction mode may be selected first as the second image block. And selecting the image block conforming to the preset rule from the second image blocks as the first image block, so as to ensure that the method of the embodiment can be completely executed without increasing excessive calculation amount.

S202, traversing second image blocks in the video frame, and determining k second image blocks which are the same as the target image block in area and are adjacent to the target image block as first image blocks.

In some embodiments, the preset rule may be to determine, as the first image blocks, the k second image blocks that have the same area as the target image block and are nearest to it, where k is a positive integer whose value can be set according to the actual application requirements.

In some embodiments, a second image block adjacent to the target image block may be understood as one whose number is closest to that of the target image block. This keeps the method of this embodiment orderly and avoids searching for too many image blocks that meet the preset rule, which saves computing power and improves coding performance.

Illustratively, Table 1 shows information on some of the image blocks of the video frame in which the target image block is located; widths and heights are in pixels and areas are in square pixels. When k=1, the search may proceed from the number of the target image block toward higher numbers, and the first second image block found that has the same area as the target image block is used as the first image block.

First, the image block numbered 1 is taken as the target image block and the remaining image blocks in the table are traversed. No image block conforming to the preset rule (that is, with an area of 1024 square pixels) is found, so the target image block is encoded independently. Next, the image block numbered 2 is taken as the new target image block and the uncoded image blocks in the table are traversed; the image block numbered 4 is found and used as the first image block, because its area equals that of the target image block and it is the nearest such block to the target image block.

TABLE 1

Image block number	Width (pixels)	Height (pixels)	Area (square pixels)	Prediction mode
1	64	16	1024	Inter prediction
2	16	32	512	Inter prediction
3	16	8	128	Inter prediction
4	16	32	512	Inter prediction
5	16	16	256	Inter prediction
6	8	16	128	Inter prediction
7	32	16	512	Inter prediction
8	8	8	64	Inter prediction
9	16	16	256	Inter prediction
10	16	8	128	Inter prediction

For example, when k=3, the search may likewise proceed from the number of the target image block, and the first three second image blocks found that have the same area as the target image block are used as the three first image blocks.

Through the above steps, the method of the present embodiment can effectively determine at least one first image block by simple rules. The preset rule can ensure that the first residual matrixes of the target image block and the first image block can be combined into the second residual matrix in a simple mode, so that the coding performance is further improved.
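The same-area rule of S202 can be illustrated with a minimal Python sketch over the data of Table 1. The names `Block`, `TABLE_1` and `select_same_area`, the `coded` set of already-encoded block numbers, and the use of the absolute difference of block numbers as the measure of nearness are assumptions made for illustration only; the disclosure does not prescribe a particular data structure.

```python
from typing import NamedTuple


class Block(NamedTuple):
    number: int
    width: int   # pixels
    height: int  # pixels
    mode: str    # "inter" or "intra" (hypothetical labels)

    @property
    def area(self) -> int:
        return self.width * self.height


# Image blocks listed in Table 1 (all use inter prediction).
TABLE_1 = [
    Block(1, 64, 16, "inter"), Block(2, 16, 32, "inter"),
    Block(3, 16, 8, "inter"), Block(4, 16, 32, "inter"),
    Block(5, 16, 16, "inter"), Block(6, 8, 16, "inter"),
    Block(7, 32, 16, "inter"), Block(8, 8, 8, "inter"),
    Block(9, 16, 16, "inter"), Block(10, 16, 8, "inter"),
]


def select_same_area(target, blocks, coded, k):
    """Return up to k uncoded inter blocks with the target's area, nearest first."""
    # Second image blocks: uncoded blocks whose prediction mode is inter prediction.
    second_blocks = [
        b for b in blocks
        if b.number != target.number
        and b.number not in coded
        and b.mode == "inter"
    ]
    same_area = [b for b in second_blocks if b.area == target.area]
    # Assumption: "nearest" means the smallest difference in block number.
    same_area.sort(key=lambda b: abs(b.number - target.number))
    return same_area[:k]


# Block 1 (1024 square pixels) has no partner and is encoded independently.
print(select_same_area(TABLE_1[0], TABLE_1, set(), 1))  # []
# Block 2 (512 square pixels) is paired with block 4 when k = 1.
print([b.number for b in select_same_area(TABLE_1[1], TABLE_1, {1}, 1)])  # [4]
```

With k=3 and block 2 as the target, this sketch returns only blocks 4 and 7, because Table 1 contains just two other blocks of 512 square pixels.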

Another implementation of traversing the uncoded image blocks in a video frame to determine at least one first image block that meets a preset rule is described below. As shown in fig. 3, in this implementation provided in an embodiment of the present disclosure, the determination includes the following steps.

S301, traversing uncoded image blocks in the video frame, and determining the image block with the prediction mode being inter-frame prediction as a third image block.

S302, traversing the third image blocks in the video frame, and determining at least one third image block adjacent to the target image block as a first image block, wherein the sum of the areas of the at least one third image block and the target image block is equal to a preset area.

In some embodiments, at least one third image block that is nearest to the target image block and whose area, summed with the area of the target image block, equals a preset area may be taken as the first image block. The specific value of the preset area can be set according to the actual application requirements, but to ensure that the subsequent frequency domain transformation can be executed correctly, the preset area needs to be set to 2^m × 2^n, where m and n are positive integers.
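The constraint on the preset area can be checked with a short sketch. Since 2^m × 2^n = 2^(m+n) and m and n are positive integers, a valid preset area is a power of two that is at least 4. The function name `is_valid_preset_area` is hypothetical.

```python
def is_valid_preset_area(area: int) -> bool:
    """Return True if area == 2**m * 2**n for some positive integers m and n."""
    # 2**m * 2**n == 2**(m + n) with m + n >= 2, i.e. a power of two >= 4.
    # A positive integer is a power of two when it has exactly one bit set.
    return area >= 4 and (area & (area - 1)) == 0


print(is_valid_preset_area(1024))  # True  (e.g. m = 6, n = 4)
print(is_valid_preset_area(896))   # False (not a power of two)
```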

Illustratively, with continued reference to Table 1, during encoding the preset area is first set to 1024 square pixels.

First, the image block numbered 1 is taken as the target image block; its area equals the preset area, so it is encoded independently. Next, the image block numbered 2 is taken as the new target image block and the uncoded image blocks in the table are traversed; the image block numbered 4 is found and used as the first image block, because the sum of its area and that of the target image block is 1024 square pixels and it is the nearest such block to the target image block. Then the image block numbered 3 is taken as the new target image block and the uncoded image blocks in the table are traversed. If the image block numbered 4 has already been coded, the image blocks numbered 5, 6 and 7 are found as three first image blocks, so that blocks 3, 5, 6 and 7 are grouped (128 + 256 + 128 + 512 = 1024 square pixels). If the image block numbered 4 has not been coded, the image blocks numbered 4, 5 and 6 are found as three first image blocks, so that blocks 3, 4, 5 and 6 are grouped (128 + 512 + 256 + 128 = 1024 square pixels).

Through the steps, the method of the embodiment can effectively determine at least one first image block, and the preset rule is more flexible. The first residual matrixes of the target image block and the first image block can be combined into the second residual matrix in a simple mode, and coding performance is further improved.
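The preset-area rule of S302 can be sketched as follows. The disclosure does not state how to choose among several block combinations that reach the preset area, so this sketch assumes one possible tie-break: prefer the fewest blocks, and among equally sized combinations prefer the blocks whose numbers are closest to the target. The names `AREAS` and `select_by_preset_area` and the `max_blocks` bound are hypothetical. All blocks in Table 1 use inter prediction, so the prediction-mode filter of S301 is omitted here.

```python
from itertools import combinations

# Block number -> area in square pixels, taken from Table 1.
AREAS = {1: 1024, 2: 512, 3: 128, 4: 512, 5: 256,
         6: 128, 7: 512, 8: 64, 9: 256, 10: 128}


def select_by_preset_area(target, coded, preset_area, areas=AREAS, max_blocks=4):
    """Return numbers of first image blocks that fill preset_area with the target."""
    needed = preset_area - areas[target]
    if needed <= 0:
        return []  # the target alone already reaches the preset area
    # Uncoded candidates, ordered by closeness of block number to the target.
    candidates = sorted(
        (n for n in areas if n != target and n not in coded),
        key=lambda n: abs(n - target),
    )
    # Assumed tie-break: try smaller groups first, nearest blocks first.
    for size in range(1, max_blocks + 1):
        for combo in combinations(candidates, size):
            if sum(areas[n] for n in combo) == needed:
                return sorted(combo)
    return []  # no suitable group: encode the target independently


print(select_by_preset_area(1, set(), 1024))      # []
print(select_by_preset_area(2, {1}, 1024))        # [4]
print(select_by_preset_area(3, {1, 2, 4}, 1024))  # [5, 6, 7]
print(select_by_preset_area(3, {1, 2}, 1024))     # [4, 5, 6]
```

Under these assumptions the sketch reproduces the groupings of the example above; a different tie-break would be equally compatible with the text as long as the encoder and decoder agree on it.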

In some embodiments, image blocks whose prediction mode is inter prediction may be selected as target image blocks first. As described above, image blocks in the intra prediction mode depend on one another. Therefore, the method of the embodiments of the present disclosure is first performed with the inter-predicted image blocks as target image blocks, and the remaining image blocks are encoded only after all inter-predicted image blocks have been encoded. This ensures that the reference image of each image block has been encoded, so that the method of this embodiment can proceed smoothly.
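The ordering described above can be sketched in a few lines. The representation of a block as a `(number, mode)` pair and the function name `encoding_order` are assumptions for illustration.

```python
def encoding_order(blocks):
    """Return block numbers with inter-predicted blocks first, then the rest.

    blocks is a list of (number, mode) pairs; the original order is kept
    within each group.
    """
    inter = [number for number, mode in blocks if mode == "inter"]
    others = [number for number, mode in blocks if mode != "inter"]
    return inter + others


print(encoding_order([(1, "intra"), (2, "inter"), (3, "inter"), (4, "intra")]))
# [2, 3, 1, 4]
```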

In practice, it is found that within a video frame the number of image blocks in the intra prediction mode is much smaller than the number in the inter prediction mode, so the probability of finding a suitable first image block for them is small. Thus, in some embodiments, image blocks whose prediction mode is intra prediction may be encoded separately; this sacrifices some video compression efficiency but saves computing power and improves coding performance as a whole.

Based on the same inventive concept, the embodiment of the present disclosure further provides a video decoding method, as shown in fig. 4, where the video decoding method provided in the embodiment of the present disclosure includes the following steps.

S401, obtaining coding data corresponding to a target image block and at least one first image block.

In some embodiments, the decoder may obtain the encoded data and restore the encoded data to a second residual matrix by inverse quantization and inverse frequency domain transformation.

S402, acquiring a preset rule corresponding to the target image block.

In some embodiments, the preset rule corresponding to the target image block may be obtained according to information in the encoded data or information preset in the decoder.

S403, splitting the second residual matrix based on the preset rule to obtain a plurality of first residual matrices.

In the decoding process, the decoder learns from information in the encoded data whether the first residual matrix of the target image block has been fused with the first residual matrices of other image blocks. If so, the second residual matrix can be split into a plurality of first residual matrices based on the preset rule. The plurality of first residual matrices correspond to the target image block and the at least one first image block conforming to the preset rule.

S404, decoding the encoded data into the target image block and the at least one first image block based on the plurality of first residual matrices.

Each first residual matrix is the original residual data of the corresponding image block, so the target image block and the first image block can each be obtained from the corresponding first residual matrix and predicted image block. The predicted image block is obtained from the reference image using a prediction algorithm.

Through the above steps, the method of this embodiment can restore the encoded data of the target image block into the target image block, completing the conversion from data to image. The plurality of images are then combined by an algorithm to complete the conversion from data to video.
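Steps S403 and S404 can be sketched as follows. This section does not specify how the first residual matrices are laid out inside the second residual matrix, so the sketch assumes, purely for illustration, that each first residual matrix was flattened row by row and the results were concatenated in a known order. The function names `split_fused_residual` and `reconstruct` are hypothetical, and clipping of the reconstructed samples to the valid pixel range is omitted.

```python
def split_fused_residual(fused, shapes):
    """Split a flat fused residual into matrices with the given (height, width) shapes."""
    if len(fused) != sum(h * w for h, w in shapes):
        raise ValueError("fused residual length does not match the block shapes")
    residuals, offset = [], 0
    for h, w in shapes:
        flat = fused[offset:offset + h * w]
        # Rebuild the h x w matrix row by row.
        residuals.append([flat[r * w:(r + 1) * w] for r in range(h)])
        offset += h * w
    return residuals


def reconstruct(predicted, residual):
    """Reconstructed block = predicted block + residual, element by element."""
    return [[p + r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(predicted, residual)]


# A 2x2 target block followed by a 1x4 first image block (same area).
fused = [1, 2, 3, 4, 5, 6, 7, 8]
target_res, first_res = split_fused_residual(fused, [(2, 2), (1, 4)])
print(target_res)                                    # [[1, 2], [3, 4]]
print(first_res)                                     # [[5, 6, 7, 8]]
print(reconstruct([[10, 10], [10, 10]], target_res))  # [[11, 12], [13, 14]]
```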

Based on the same inventive concept, a video encoding apparatus is also provided in the embodiments of the present disclosure, as described below. Because the apparatus embodiment solves the problem on a principle similar to that of the method embodiment, its implementation may refer to the implementation of the method embodiment, and repeated details are omitted.

Fig. 5 shows a schematic structural diagram of a video encoding apparatus according to an embodiment of the present disclosure, and as shown in fig. 5, the video encoding apparatus 500 includes: a first acquisition module 501, a determination module 502, a second acquisition module 503, a fusion module 504, and an encoding module 505.

Specifically, the first obtaining module 501 is configured to obtain a target image block in a video frame. The determining module 502 is configured to traverse the uncoded image blocks in the video frame and determine at least one first image block that meets a preset rule. The second obtaining module 503 is configured to obtain a first residual matrix of each of the target image block and the at least one first image block. The fusion module 504 is configured to fuse respective first residual matrices of the target image block and at least one first image block to obtain a second residual matrix. The encoding module 505 is configured to obtain encoded data corresponding to the target image block and the at least one first image block based on the second residual matrix when the encoding cost of the second residual matrix is less than the sum of the encoding costs of the first residual matrices of the target image block and the at least one first image block.

In some embodiments, the determining module 502 is further configured to traverse the uncoded image blocks in the video frame, determine that the image block whose prediction mode is inter-prediction is the second image block; traversing second image blocks in the video frame, and determining k second image blocks which are the same as the target image block in area and are adjacent to the target image block as first image blocks, wherein k is a positive integer.

In some embodiments, the determining module 502 is further configured to traverse the uncoded image blocks in the video frame and determine the image blocks whose prediction mode is inter prediction as third image blocks; and to traverse the third image blocks in the video frame and determine at least one third image block adjacent to the target image block as a first image block, wherein the sum of the areas of the at least one third image block and the target image block is equal to the preset area.

In some embodiments, the encoding module 505 is further configured to obtain, based on the first residual matrix of the target image block, encoded data corresponding to the target image block if the encoding cost of the second residual matrix is greater than or equal to the encoding cost of the first residual matrix of the target image block.
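The decision made by the encoding module can be sketched as a comparison of encoding costs. The sketch assumes the common Lagrangian form J = D + λ·R for the rate-distortion cost, where D is a distortion measure, R is the stream length in bits and λ is a weighting factor; the disclosure only states that the cost is based on stream length and distortion, so this form, the value of λ and the function names are assumptions. The comparison follows the condition stated for the encoding module 505: the fused result is kept only when it is strictly cheaper than the sum of the individual costs.

```python
from typing import Sequence


def rd_cost(distortion: float, bits: int, lam: float) -> float:
    """Rate-distortion cost J = D + lambda * R (assumed Lagrangian form)."""
    return distortion + lam * bits


def choose_encoding(fused_cost: float, separate_costs: Sequence[float]) -> str:
    """Return "fused" if the second residual matrix is strictly cheaper, else "separate"."""
    return "fused" if fused_cost < sum(separate_costs) else "separate"


lam = 0.5  # hypothetical Lagrange multiplier
fused = rd_cost(distortion=10, bits=100, lam=lam)            # 60.0
separate = [rd_cost(6, 70, lam), rd_cost(5, 60, lam)]         # [41.0, 35.0]
print(choose_encoding(fused, separate))                       # fused
print(choose_encoding(76.0, separate))                        # separate
```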

It should be noted that, when the video encoding device provided in the foregoing embodiment is used for video encoding, only the division of the foregoing functional modules is used for illustration, and in practical application, the foregoing functional allocation may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to perform all or part of the functions described above. In addition, the video encoding device and the video encoding method provided in the foregoing embodiments belong to the same concept, and specific implementation processes of the video encoding device and the video encoding method are detailed in the method embodiments and are not repeated herein.

Based on the same inventive concept, a video decoding apparatus is also provided in the embodiments of the present disclosure, as described below. Because the apparatus embodiment solves the problem on a principle similar to that of the method embodiment, its implementation may refer to the implementation of the method embodiment, and repeated details are omitted.

Fig. 6 is a schematic structural diagram of a video decoding apparatus according to an embodiment of the present disclosure, and as shown in fig. 6, the video decoding apparatus 600 includes: a first acquisition module 601, a second acquisition module 602, a splitting module 603 and a decoding module 604.

Specifically, the first obtaining module 601 is configured to obtain encoded data corresponding to the target image block and the at least one first image block, where the encoded data includes a second residual matrix corresponding to the target image block and the at least one first image block, and the encoded data is obtained based on a method in an embodiment of the disclosure. The second obtaining module 602 is configured to obtain a preset rule corresponding to the target image block. The splitting module 603 is configured to split the second residual matrix based on a preset rule, to obtain a plurality of first residual matrices. The decoding module 604 is configured to decode the encoded data into a target image block and at least one first image block based on the plurality of first residual matrices.

It should be noted that, when the video decoding device provided in the foregoing embodiment is used for video decoding, only the division of the foregoing functional modules is used for illustration, in practical application, the foregoing functional allocation may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to perform all or part of the functions described above. In addition, the video decoding device and the video decoding method provided in the foregoing embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiments, which are not described herein again.

Those skilled in the art will appreciate that the various aspects of the present disclosure may be implemented as a system, method, or program product. Accordingly, various aspects of the disclosure may be embodied in the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, any of which may be referred to herein as a "circuit," "module," or "system."

An electronic device 700 according to such an embodiment of the present disclosure is described below with reference to fig. 7. The electronic device 700 shown in fig. 7 is merely an example and should not be construed as limiting the functionality and scope of application of the embodiments of the present disclosure.

As shown in fig. 7, the electronic device 700 is embodied in the form of a general purpose computing device. Components of the electronic device 700 may include, but are not limited to: at least one processing unit 710, at least one storage unit 720, and a bus 730 connecting the different system components, including the storage unit 720 and the processing unit 710.

The storage unit 720 stores program code that is executable by the processing unit 710, so that the processing unit 710 performs the steps according to the various exemplary embodiments of the present disclosure described in the method embodiments of the present specification.

In some embodiments, the processing unit 710 may perform the following steps of the method embodiments described above: acquiring a target image block in a video frame; traversing uncoded image blocks in a video frame, and determining at least one first image block conforming to a preset rule; respectively acquiring respective first residual matrixes of a target image block and at least one first image block; fusing the first residual matrixes of the target image block and at least one first image block respectively to obtain a second residual matrix; and under the condition that the coding cost of the second residual matrix is smaller than the sum of the coding cost of the first residual matrix of each of the target image block and the at least one first image block, obtaining the coding data corresponding to the target image block and the at least one first image block based on the second residual matrix.

In some embodiments, the processing unit 710 may also perform the following steps of the method embodiments described above: acquiring encoded data corresponding to a target image block and at least one first image block, wherein the encoded data comprises a second residual matrix corresponding to the target image block and the at least one first image block, and the encoded data is obtained based on the method in the embodiments of the disclosure; acquiring a preset rule corresponding to the target image block; splitting the second residual matrix based on the preset rule to obtain a plurality of first residual matrices; and decoding the encoded data into the target image block and the at least one first image block based on the plurality of first residual matrices.

The memory unit 720 may include readable media in the form of volatile memory units, such as Random Access Memory (RAM) 7201 and/or cache memory 7202, and may further include Read Only Memory (ROM) 7203.

The storage unit 720 may also include a program/utility 7204 having a set (at least one) of program modules 7205, such program modules 7205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.

Bus 730 may be a bus representing one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.

The electronic device 700 may also communicate with one or more external devices 740 (e.g., keyboard, pointing device, bluetooth device, etc.), one or more devices that enable a user to interact with the electronic device 700, and/or any device (e.g., router, modem, etc.) that enables the electronic device 700 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 750. Also, electronic device 700 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through network adapter 760. As shown, network adapter 760 communicates with other modules of electronic device 700 over bus 730. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 700, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.

From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, including several instructions to cause a computing device (may be a personal computer, a server, a terminal device, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.

In an exemplary embodiment of the present disclosure, a computer-readable storage medium, which may be a readable signal medium or a readable storage medium, is also provided, on which a program product capable of implementing the above method of the present disclosure is stored. In some possible implementations, the various aspects of the disclosure may also be implemented in the form of a program product comprising program code which, when the program product is run on a terminal device, causes the terminal device to carry out the steps according to the various exemplary embodiments of the disclosure described in the method embodiments of this specification.

More specific examples of the computer readable storage medium in the present disclosure may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

In this disclosure, a computer readable storage medium may include a data signal propagated in baseband or as part of a carrier wave, with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Alternatively, the program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

In particular implementations, the program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).

It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.

Furthermore, although the steps of the methods in the present disclosure are depicted in a particular order in the drawings, this does not require or imply that the steps must be performed in that particular order or that all illustrated steps be performed in order to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following the general principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims

1. A video encoding method, comprising:

acquiring a target image block in a video frame;

traversing uncoded image blocks in the video frame, and determining at least one first image block conforming to a preset rule;

respectively acquiring respective first residual matrixes of the target image block and the at least one first image block;

fusing the first residual matrixes of the target image block and the at least one first image block to obtain a second residual matrix;

and under the condition that the coding cost of the second residual matrix is smaller than the sum of the coding cost of the first residual matrix of each of the target image block and the at least one first image block, obtaining the coding data corresponding to the target image block and the at least one first image block based on the second residual matrix.

2. The method of claim 1, wherein the traversing uncoded image blocks in the video frame and determining at least one first image block conforming to a preset rule comprises:

traversing uncoded image blocks in the video frame, and determining the image blocks with the prediction modes being inter-frame prediction as second image blocks;

traversing second image blocks in the video frame, and determining k second image blocks which are the same as the target image block in area and are adjacent to the target image block as the first image block, wherein k is a positive integer.

3. The method of claim 1, wherein the traversing uncoded image blocks in the video frame and determining at least one first image block conforming to a preset rule comprises:

traversing uncoded image blocks in the video frame, and determining the image blocks with the prediction modes being inter-frame prediction as third image blocks;

traversing third image blocks in the video frame, and determining at least one third image block which is adjacent to the target image block as the first image block, wherein the sum of the areas of the at least one third image block and the target image block is equal to a preset area.

4. The method of claim 1, wherein the encoding cost is a rate-distortion cost calculated based on a stream length and a video distortion level of encoded data of the target image block.

5. The method as recited in claim 1, further comprising:

and under the condition that the coding cost of the second residual matrix is greater than or equal to the coding cost of the first residual matrix of the target image block, obtaining the coded data corresponding to the target image block based on the first residual matrix of the target image block.

6. The method of claim 1, wherein the prediction mode of the target image block is inter prediction.

7. A video decoding method, comprising:

acquiring encoded data corresponding to a target image block and at least one first image block, wherein the encoded data comprises a second residual matrix corresponding to the target image block and the at least one first image block, and the encoded data is obtained based on the method of any one of claims 1 to 6;

acquiring a preset rule corresponding to the target image block;

splitting the second residual matrix based on the preset rule to obtain a plurality of first residual matrices;

decoding the encoded data into the target image block and the at least one first image block based on the plurality of first residual matrices.

8. A video encoding apparatus, comprising:

the first acquisition module is used for acquiring a target image block in a video frame;

the determining module is used for traversing the uncoded image blocks in the video frame and determining at least one first image block conforming to a preset rule;

the second acquisition module is used for respectively acquiring the first residual matrixes of the target image block and the at least one first image block;

the fusion module is used for fusing the first residual matrixes of the target image block and the at least one first image block to obtain a second residual matrix;

and the encoding module is used for obtaining the encoded data corresponding to the target image block and the at least one first image block based on the second residual matrix under the condition that the encoding cost of the second residual matrix is smaller than the sum of the encoding costs of the respective first residual matrices of the target image block and the at least one first image block.

9. A video decoding apparatus, comprising:

a first obtaining module, configured to obtain encoded data corresponding to a target image block and at least one first image block, where the encoded data includes a second residual matrix corresponding to the target image block and the at least one first image block, and the encoded data is obtained based on the method of any one of claims 1 to 6;

the second acquisition module is used for acquiring a preset rule corresponding to the target image block;

the splitting module is used for splitting the second residual matrixes based on the preset rule to obtain a plurality of first residual matrixes;

a decoding module for decoding the encoded data into the target image block and the at least one first image block based on the plurality of first residual matrices.

10. An electronic device, comprising:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to perform the method of any one of claims 1 to 7 via execution of the executable instructions.

11. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of any of claims 1 to 7.