WO2019141255A1

WO2019141255A1 - Image filtering method and device

Info

Publication number: WO2019141255A1
Application number: PCT/CN2019/072412
Authority: WO
Inventors: 姚佳宝
Original assignee: 杭州海康威视数字技术股份有限公司
Priority date: 2018-01-18
Filing date: 2019-01-18
Publication date: 2019-07-25
Also published as: CN110062225B; CN110062225A

Abstract

The present application relates to an image filtering method and a device, pertaining to the field of video imaging. The method comprises: acquiring a distorted image, the distorted image being distorted with respect to an original video image input into a video encoding system; dividing the distorted image, and acquiring multiple distorted image blocks comprised in the distorted image; filtering each of the distorted image blocks of the distorted image by means of a convolutional neural network model, and obtaining undistorted image blocks respectively corresponding to the distorted image blocks; and generating an image frame according to the undistorted image blocks respectively corresponding to the distorted image blocks. The device comprises: a first acquisition module, a second acquisition module, a filtering module, and a generation module. The present application reduces the amount of resources required for filtering, such that an apparatus meets a resource requirement for filtering.

Description

Method and device for image filtering

The present application claims priority to Chinese Patent Application No. 20181005042, filed on Jan. 18, s., the entire disclosure of which is hereby incorporated by reference.

Technical field

The present application relates to the field of video, and in particular, to a method and an apparatus for filtering pictures.

Background technique

In the video coding system, when encoding the original video picture, the original video picture is processed multiple times and the reconstructed picture is obtained. The resulting reconstructed picture may have been pixel-shifted relative to the original video picture, ie, the reconstructed picture is distorted, resulting in visual impairment or artifacts.

These distortions not only affect the subjective and objective quality of the reconstructed image. If the reconstructed image is used as a reference for subsequent encoded pixels, this will also affect the prediction accuracy of subsequent encoded pixels and affect the size of the final bitstream. Therefore, in the video codec system, an in-loop filtering module is added, and the reconstructed picture is filtered by an in-loop filtering module to eliminate distortion existing in the reconstructed picture.

In the process of implementing the present application, the inventors found that the above method has at least the following defects:

Currently, the in-loop filtering module filters the reconstructed picture of the entire frame. When the reconstructed picture is a high-resolution picture, the resources required for filtering and reconstructing the picture are often high, so that the device may not be satisfied. For example, filtering a reconstructed picture of 4K resolution may cause insufficient memory.

Summary of the invention

In order to enable the device to meet the resources required for filtering, the embodiment of the present application provides a method and an apparatus for filtering a picture. The technical solution is as follows:

In a first aspect, an embodiment of the present application provides a method for filtering a picture, where the method includes:

Obtaining a distorted picture that is distorted relative to an original video picture that is input into the video encoding system;

Obtaining a plurality of first image blocks by dividing the distorted picture;

And filtering each first image block by using a convolutional neural network model to obtain a second image block corresponding to each of the first image blocks;

And generating a frame de-distorted picture according to the second image block corresponding to each of the first image blocks.

Optionally, the acquiring the plurality of first image blocks by dividing the distorted picture comprises:

Dividing the distorted picture according to a target width and a target height to obtain a plurality of distorted image blocks included in the distorted picture;

Performing edge expansion processing on each of the plurality of distorted image blocks according to the first expanded size to obtain a first image block corresponding to each of the distorted image blocks.

Optionally, the plurality of distorted image blocks include a first distorted image block located at a vertex position of the distorted picture, a second distorted image block located on an upper boundary and a lower boundary of the distorted picture, located in the distortion a third distorted image block on the left and right borders of the picture and a fourth distorted image block other than the first distorted image block, the second distorted image block, and the third distorted image block;

The width and height of the first distorted image block are equal to W ₁ -lap and H ₁ -lap, respectively, W ₁ is the target width, H ₁ is the target height, and lap is the first expanded size. The width and height of the second distorted image block are equal to W ₁ -2lap and H ₁ -lap, respectively, and the width and height of the third distorted image block are W ₁ -lap and H ₁ -2lap, respectively, the fourth distortion The width and height of the image block are W ₁ -2lap and H ₁ -2lap, respectively.

Optionally, the performing the edge expansion processing on each of the plurality of distorted image blocks according to the first expanded size to obtain the first image block corresponding to each of the distorted image blocks comprises:

And performing a method of expanding a target edge of the target distortion image block according to the first expanded size to obtain a first image block corresponding to the target distortion image block, where the target distortion image block is the first distortion image block, a second distortion image block and the third distortion image block, the target edge being an edge of the target distortion image block that does not coincide with a boundary of the distortion picture;

And expanding, according to the first expanded size, four edges of the fourth distorted image block to obtain a first image block corresponding to the fourth distorted image block.

Optionally, before using the convolutional neural network model to separately filter each of the distorted image blocks of the distorted picture, the method further includes:

And setting a flanged dimension corresponding to the convolution layer included in the convolutional neural network model, the set expansion dimension is not less than zero and not greater than a second expansion dimension corresponding to the convolution layer, and the second expansion The edge size is the expanded size of the convolutional layer when the convolutional neural network model is trained.

Optionally, the method further includes:

The first expanded size is set according to a second expanded size corresponding to each convolution layer included in the convolutional neural network model.

Optionally, the generating, according to the de-distorted image block corresponding to each of the distorted image blocks, a frame of the de-distorted image, comprising:

Performing edge processing on the de-distorted image blocks corresponding to each of the distortion image blocks to obtain a third image block corresponding to each of the distortion image blocks;

The third image block corresponding to each of the distorted image blocks is composed into a frame de-distorted picture.

Optionally, the method further includes:

The target width and the target height are determined according to the first expanded size, the width and height of the distorted picture.

In a second aspect, the embodiment of the present application provides an apparatus for filtering a picture, where the apparatus includes:

a first acquiring module, configured to acquire a distorted picture, where the distorted picture is distorted with respect to an original video picture input to the video encoding system;

a second acquiring module, configured to obtain a plurality of first image blocks by dividing the distorted picture;

a filtering module, configured to filter each first image block by using a convolutional neural network model to obtain a second image block corresponding to each of the first image blocks;

And a generating module, configured to generate a frame de-distorted picture according to the second image block corresponding to each of the first image blocks.

Optionally, the second obtaining module includes:

a dividing unit, configured to divide the distorted picture according to a target width and a target height, to obtain a plurality of distorted image blocks included in the distorted picture;

And an edge expansion unit, configured to perform edge expansion processing on each of the plurality of distortion image blocks according to the first expansion size to obtain a first image block corresponding to each of the distortion image blocks.

Optionally, the edge expansion unit is configured to:

Optionally, the device further includes:

a first setting module, configured to set an edge expansion size corresponding to the convolution layer included in the convolutional neural network model, where the expanded size of the setting is not less than zero and not greater than a second expansion corresponding to the convolution layer a size, the second expanded size is an expanded size of the convolution layer when the convolutional neural network model is trained

Optionally, the device further includes:

And a second setting module, configured to set the first expanded size according to a second expanded size corresponding to each convolution layer included in the convolutional neural network model.

Optionally, the generating module includes:

An edge-splitting unit is configured to perform edge-splitting processing on the de-distorted image blocks corresponding to each of the distortion image blocks to obtain a third image block corresponding to each of the distortion image blocks;

And a component unit, configured to form a third image block corresponding to each of the distortion image blocks into a frame de-distorted picture.

Optionally, the device further includes:

And a determining module, configured to determine the target width and the target height according to the first expanded size, the width and height of the distorted picture.

In a third aspect, an embodiment of the present application provides a computer readable storage medium, where the computer readable storage medium stores a computer program, and when the computer program is executed by a processor, the first aspect or the first aspect is implemented. The method steps provided in any alternative manner.

The technical solutions provided by the embodiments of the present application may include the following beneficial effects:

By dividing the distorted picture generated in the video encoding and decoding process, obtaining a plurality of distorted image blocks included in the distorted picture, and then using the convolutional neural network model to respectively filter each distorted image block of the distorted picture to obtain each distorted image. The block corresponding de-distorted image block generates a frame of picture according to the de-distorted image block corresponding to each of the distorted image blocks. The generated one-frame picture is a filtered picture. Since the distortion image block is filtered by using convolutional neural network filtering, the filtering of the entire frame distortion picture can reduce the resources required for filtering, so that the device can satisfy Required for filtering.

The above general description and the following detailed description are intended to be illustrative and not restrictive.

DRAWINGS

The drawings herein are incorporated in and constitute a part of the specification,

FIG. 1 is a flowchart of a method for filtering a picture according to an embodiment of the present application;

2 is a flowchart of another method for filtering a picture provided by an embodiment of the present application;

3 is a structural block diagram of a video encoding system according to an embodiment of the present application;

4 is a structural block diagram of another video encoding system according to an embodiment of the present application;

FIG. 5 is a schematic diagram of a divided image block provided by an embodiment of the present application; FIG.

FIG. 6 is another schematic diagram of a divided image block provided by an embodiment of the present application; FIG.

FIG. 7 is another schematic diagram of a divided image block provided by an embodiment of the present application; FIG.

FIG. 8 is another schematic diagram of a divided image block provided by an embodiment of the present application; FIG.

FIG. 9 is another schematic diagram of a divided image block provided by an embodiment of the present application; FIG.

FIG. 10 is another schematic diagram of a divided image block provided by an embodiment of the present application; FIG.

11 is a system architecture diagram of a technical solution provided by an embodiment of the present application;

FIG. 12 is a schematic diagram of data flow of a technical solution provided by an embodiment of the present application;

FIG. 13 is a schematic diagram of obtaining a distortion image color component of a distorted image according to an embodiment of the present application; FIG.

14 is a schematic diagram of side information components provided by an embodiment of the present application;

15 is a second schematic diagram of side information components provided by an embodiment of the present application;

16 is a flowchart of a method for removing distortion of a distorted image according to an embodiment of the present application;

17 is a flowchart of a method for training a convolutional neural network model provided by an embodiment of the present application;

FIG. 18 is a flowchart of another method for filtering a picture according to an embodiment of the present disclosure;

19 is a structural block diagram of a video encoding system according to an embodiment of the present application;

20 is a structural block diagram of another video encoding system according to an embodiment of the present application;

FIG. 21 is a structural block diagram of another video encoding system according to an embodiment of the present disclosure;

FIG. 22 is a schematic diagram of an apparatus for filtering a picture according to an embodiment of the present application;

FIG. 23 is a schematic structural diagram of a device according to an embodiment of the present application.

The embodiments of the present application have been illustrated by the above-described figures, which will be described in more detail hereinafter. The drawings and the written description are not intended to limit the scope of the present invention in any way, and the concept of the present application will be described by those skilled in the art by referring to the specific embodiments.

Detailed ways

Exemplary embodiments will be described in detail herein, examples of which are illustrated in the accompanying drawings. The following description refers to the same or similar elements in the different figures unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Instead, they are merely examples of devices and methods consistent with aspects of the present application as detailed in the appended claims.

Referring to FIG. 1, an embodiment of the present application provides a method for image filtering, including:

Step 101: Acquire a distortion picture generated by a video codec process.

Step 102: Acquire a plurality of distorted image blocks by dividing the distorted picture.

Optionally, in the process of video encoding or decoding, the entire frame of the video image may be obtained, and then the entire frame of the video image is divided to obtain multiple distortion pictures. Alternatively, in the process of video encoding or decoding, part of the image data in the entire frame of the video image may be acquired each time. When the acquired image data reaches a distorted image block, the following operations are performed on the distorted image block, thereby The method of dividing the distorted picture into a plurality of distorted image blocks is realized, and the efficiency of video encoding or decoding can be improved.

Step 103: Filter each of the distorted image blocks using a convolutional neural network model to obtain a de-distorted image block corresponding to each of the distorted image blocks.

Optionally, one or more distorted image blocks can be filtered at the same time, that is, parallel filtering can be implemented to improve filtering efficiency.

Step 104: Generate a frame de-distorted picture according to the de-distorted image block corresponding to each of the distorted image blocks.

The method provided in this embodiment may occur in a video encoding process or in a video decoding process. Therefore, the distorted picture may be a video picture generated during the video encoding process or a video picture generated during the video decoding process.

In the embodiment of the present application, a plurality of distorted image blocks are obtained by dividing a distorted picture generated in a video encoding and decoding process, and then each distorted image block is separately filtered by using a convolutional neural network model to obtain each distorted image. The block corresponding de-distorted image block generates a frame de-distorted picture according to the de-distorted image block corresponding to each of the distorted image blocks. The generated de-distorted picture is a filtered picture. Since the distortion image block is filtered by using convolutional neural network filtering, the filtering of the entire frame distortion picture can reduce the resources required for filtering, thereby making the device The resources required for filtering can be satisfied, and the resources can be resources such as video memory and/or memory.

Referring to FIG. 2, an embodiment of the present application provides a method for filtering a picture, which may filter a distortion picture generated during an encoding process, including:

Step 201: Acquire a distorted picture generated during video encoding.

A reconstructed picture may be generated during the video encoding process, and the distorted picture may be the reconstructed picture, or may be a picture obtained by filtering the reconstructed picture.

Referring to the structure diagram of the video coding system shown in FIG. 3, the video coding system includes a prediction module, an adder, a transform unit, a quantization unit, an entropy encoder, an inverse quantization unit, an inverse transform unit, a reconstruction unit, and a CNN (convolution neural network). Model) and the buffer and other parts.

The process of encoding the video coding system may be: inputting the original picture into the prediction module and the adder, and the prediction module predicts the input original picture according to the reference picture in the buffer to obtain prediction data, and inputs the prediction data into the addition method. , entropy coder and reconstruction unit. The prediction module includes an intra prediction unit, a motion estimation and motion compensation unit, and a switch. The intra prediction unit may perform intra prediction on the original picture to obtain intra prediction data, and the motion estimation and motion compensation unit performs inter prediction on the original picture according to the reference picture buffered in the buffer to obtain inter prediction data, and the switch selects the intra frame. Predict the data or output the inter prediction data to the adder and the reconstruction unit. Optionally, the intra prediction data may include intra mode information, and the inter prediction data may include inter mode information.

The adder generates prediction error information according to the prediction data and the original picture, and the transform unit transforms the prediction error information, and outputs the transformed prediction error information to the quantization unit; the quantization unit quantizes the transformed prediction error information according to the quantization parameter. The residual information is obtained, and the residual information is output to an entropy encoder and an inverse quantization unit; the entropy encoder encodes information such as residual information and prediction data to form a bitstream. At the same time, the inverse quantization unit and the inverse transform unit respectively perform inverse quantization and inverse transform processing on the residual information to obtain prediction error information, and input the prediction error information into the reconstruction unit; the reconstruction unit generates the prediction error information according to the prediction error information and the prediction data. Refactor the image. Correspondingly, in this step, the reconstructed picture generated by the reconstructing unit may be acquired, and the reconstructed picture is taken as a distorted picture.

Optionally, referring to FIG. 4, a filter may be connected between the convolutional neural network model and the reconstruction unit, and the filter may also filter the reconstructed picture generated by the reconstruction unit, and output the filtered reconstructed picture. Correspondingly, in this step, the filtered reconstructed picture may be obtained, and the filtered reconstructed picture is taken as a distorted picture.

The distorted picture is distorted relative to the original video picture.

Step 202: Divide the distorted picture according to the target width and the target height to obtain a plurality of distorted image blocks included in the distorted picture.

The distortion image block divided in this step may be an image block of equal size or may not be an image block of equal size.

In the first case, when each distorted image block can be equal in size, the width of each distorted image block in the distorted picture can be equal to the target width, and the height of each distorted image block in the distorted picture can be equal to the target height.

When the width of the distorted picture is an integral multiple of the target width, there is no overlap between the distorted image blocks in each line of the distorted image block obtained according to the target width division. For example, referring to FIG. 5, the width of the distorted picture is equal to an integral multiple of the target width, and each line obtained according to the target width division includes three distorted image blocks, and for each line of distorted image blocks, the line includes three distorted image blocks. There is no overlap.

When the width of the distorted picture is not an integer multiple of the target width, there is an overlap between two distorted image blocks in each line of the distorted image block obtained according to the target width division. For example, referring to FIG. 6, the width of the distorted picture is not equal to an integral multiple of the target width, and each line obtained according to the target width includes four distorted image blocks, and for each line of distorted image blocks, the line includes the third and fourth There is an overlap between the pieces of distorted image, where ΔW is the overlapping width of the third and fourth distorted image blocks in FIG.

When the height of the distorted picture is an integral multiple of the target height, there is no overlap between the respective distorted image blocks in each column of the distorted image block obtained according to the target height division. For example, referring to FIG. 5, the height of the distorted picture is equal to an integral multiple of the target height, and each column obtained according to the target height division includes three distorted image blocks, and for each column of distorted image blocks, the column includes three distorted image blocks. There is no overlap.

When the height of the distorted picture is not an integer multiple of the target height, there is an overlap between the two distorted image blocks in each column of the distorted image block obtained according to the target height division. For example, referring to FIG. 7, the height of the distorted picture is not equal to an integral multiple of the target height, and each column obtained according to the target height division includes four distorted image blocks, and for each column of distorted image blocks, the column includes the third and fourth There is an overlap between the pieces of distorted image, where ΔH is the overlapping height of the third and fourth distorted image blocks in FIG.

In the second case, when each of the distorted image blocks can be unequal in size, the obtained plurality of distorted image blocks may include a first distorted image block, a second distorted image block, a third distorted image block, and a fourth distorted image block. Types.

Referring to FIG. 8 (the solid line frame in the figure is a distorted image block), the first distorted image block is located at a vertex position of the distorted picture, and the first distorted image block is respectively the image blocks P1, P5, P16, and P20 in FIG. The width and height of a distorted image block P1, P5, P16, and P20 are equal to W ₁ -lap and H ₁ -lap, respectively, W ₁ is the target width, H ₁ is the target height, and lap is the first expanded size.

The second distorted image block is located on an upper boundary and a lower boundary of the distorted picture, the second distorted image block is different from the first distorted image block, and the second distorted image block is respectively the image block P2, P3, P4 in FIG. , P17, P18, and P19, and the width and height of the second distorted image blocks P2, P3, P4, P17, P18, and P19 are equal to W ₁ -2lap and H ₁ -lap, respectively.

The third distorted image block is located on the left and right borders of the distorted picture, the third distorted image block is different from the first distorted image block, and the third distorted image block is respectively the image blocks P6, P11, P10 and P15 in FIG. The width and height of the third distortion image blocks P6, P11, P10, and P15 are W ₁ -lap and H ₁ -2lap, respectively.

The distortion image block of the plurality of distortion tiles except the first distortion image block, the second distortion image block, and the third distortion image block is a fourth distortion image block, and the fourth distortion image block is the image block in FIG. 8 respectively. P7, P8, P9, P12, P13 and P14, the width and height of the fourth distortion image blocks P7, P8, P9, P12, P13 and P14 are W ₁ -2lap and H ₁ -2lap, respectively.

Wherein, in the second case, the last two distorted image blocks in each line of the distorted image block may have partial overlap or partial overlap, for example, the distorted image block P4 located in the first line in FIG. There is a partial overlap with P5, and ΔW is the overlapping width of the distorted image blocks P4 and P5 of the first line in FIG. And, the last two distorted image blocks in each column of the distorted image block may have partial overlap or partial overlap. For example, the distorted image blocks P11 and P16 located in the first column in FIG. 8 partially overlap. In Fig. 8, ΔH is the overlapping height of the distortion image blocks P11 and P16 of the first column.

Before performing this step, the first expanded size may also be set, and the target width and the target height are determined according to the first expanded size, the width and height of the distorted picture.

Wherein, the convolutional neural network model comprises a plurality of convolution layers, each convolution layer corresponding to a second expanded size. The first expanded size is calculated based on the second expanded size corresponding to each convolutional layer. Optionally, the second expanded size corresponding to each convolution layer may be accumulated to obtain an accumulated value, and the first expanded size is set to be greater than or equal to the accumulated value.

The process of determining the target width and target height will be described in the following content, which will not be explained here.

Optionally, obtaining the distorted picture may be obtaining an entire frame of the distorted picture, and then dividing the entire frame of the distorted picture; or

Each time the partial image data included in the frame distortion picture is acquired, when the acquired image data reaches the data amount of a distorted image block, the distorted image block is output, thereby realizing division of the distorted picture, so that it is not necessary to wait for the entire frame. Distorted images improve the efficiency of video encoding.

Wherein, in the first case, when the acquired image data can form a distorted image block having a width of a target width and a height of a target height, the distorted image block is output, thereby realizing dividing the distorted picture into a plurality of distorted images of equal size. Piece. For the second case described above, when the acquired image data is the data of the first distorted image block and can constitute the first distorted image block, the first distorted tile is output, and the acquired image data is the data of the second distorted image block. And when the second distortion image block can be composed, the second distortion tile is output, and when the acquired image data is data of the third distortion image block and can form a third distortion image block, the third distortion tile is output, when acquired When the image data is the data of the fourth distorted image block and can constitute the fourth distorted image block, outputting the fourth distorted tile; thereby dividing the distorted picture into the first distorted image block, the second distorted image block, and the third distorted image block And four types of distorted image blocks of the fourth distorted image block.

Step 203: Perform a process of expanding each of the distorted image blocks according to the first expanded size to obtain a first image block corresponding to each of the distorted image blocks.

Optionally, when each of the distorted image blocks is sized, the step may be:

The four edges of the target image block are respectively subjected to edge expansion processing according to the first expanded edge size to obtain a first image block corresponding to the target image block, and the target image block is any one of the plurality of distortion image blocks.

The width of the edge of each of the target image blocks is equal to the first expanded size. Assume that the width of the target image block is W ₁ , the height of the target image block is H ₁ , and the size of the first expanded edge is lap. After the target image block is expanded, the width of the first image block is W ₂ = W ₁ + ₂ lap. And the height of the first image block is H ₂ = H ₁ + ₂ lap.

For example, referring to FIG. 10, for any of the distorted image blocks, a distorted image block P1 is assumed, and each edge of the distorted image block is expanded by a first expanded-edge size lap to obtain a distorted image block.

When the distorted picture is divided into equal-sized distorted image blocks in the first case, the target width and the target height may be determined as follows before performing step 202.

The process for determining the target width may include the 31-34 process, which are:

31: Select a width value from the preset width range.

The preset width ranges from greater than 0 and less than an integer value in the width of the distorted picture. Optionally, the preset width range is greater than the first expanded size and smaller than the integer value of the width of the distorted picture. The first expanded edge size is typically greater than or equal to 1 pixel. For example, assuming that the width of the distorted picture is 10 pixels and assuming that the first expanded size is 1 pixel, the preset width range includes integer values 2, 3, 4, 5, 6, 7, 8, and 9.

32: If the width of the distorted picture is equal to an integral multiple of the width value, the width value is determined as the target width and ends.

33: If the width of the distortion symbol is not equal to an integral multiple of the width value, the overlap width corresponding to the width value is calculated according to the first formula as follows.

The first formula is:

In the first formula, ΔW is the overlap width corresponding to the selected width value, W ₁ is the selected width value, W ₂ is the width of the first image block after the edge processing of the distorted image block, and W ₃ is the distortion. The width of the image, % is the remainder operation.

34: If there is still an unselected width value in the preset width range, a width value is selected from the unselected width values, and execution 32 is returned; otherwise, the width value corresponding to the minimum overlap width is determined as the target width.

The process of determining the target height may include the 35-38 process, which are:

35: Select a height value from the preset height range.

The preset height range is greater than 0 and less than an integer value in the height of the distorted picture. Optionally, the preset height range is greater than the first expanded size and less than the integer value of the height of the distorted picture. For example, assuming that the height of the distorted picture is 10 pixels and assuming that the first expanded size is 1 pixel, the preset height range includes integer values 2, 3, 4, 5, 6, 7, 8, and 9.

36: If the height of the distorted picture is equal to an integral multiple of the height value, the height value is determined as the target height and ends.

37: If the height of the distorted picture is not equal to an integral multiple of the height value, the overlap height corresponding to the height value is calculated according to the second formula as follows.

The second formula is:

In the second formula, ΔH is the overlap width corresponding to the selected width value, H ₁ is the selected height value, H ₂ is the height of the first image block after the distortion image block is expanded, and H ₃ is the distortion picture. the height of.

38: If there is still an unselected height value in the preset height range, a height value is selected from the unselected height values, and execution 36 is returned; otherwise, the height value corresponding to the minimum overlap height is determined as the target height.

Optionally, when each of the divided image blocks is not equal in size, the step (step 203) may be:

For the distorted image block located at the boundary of the distorted picture, the target edge of the distorted image block is subjected to edge expansion processing, and the target edge is an edge of the distorted image block that does not coincide with the boundary of the distorted picture, and for other distorted image blocks of the distorted picture, The four edges of the distorted image block can be separately subjected to edge expansion processing. The detailed implementation is as follows:

Performing edge expansion processing on the target edge of the target distortion image block according to the first expanded size to obtain a first image block corresponding to the target distortion image block, the target distortion image block being the first distortion image block, the second distortion image block, and the third A distorted image block, the target edge being an edge of the target distorted image block that does not coincide with the boundary of the distorted picture. And, according to the first expanded size, four edges of the fourth distortion image block are subjected to edge expansion processing to obtain a first image block corresponding to the fourth distortion image block. Wherein, the width of the expanded edge is equal to the first expanded size.

In other words, the target edge of the first distortion image block, the target edge of the second distortion image block, and the target edge of the third distortion image block are respectively subjected to edge expansion processing according to the first expanded edge size, to obtain a first distortion image block correspondingly. a first image block, a first image block corresponding to the second distortion image block, and a first image block corresponding to the third distortion image block. The target edge of the distortion image block is an edge of the distortion image block that does not coincide with the boundary of the distortion image block. .

For example, referring to FIG. 8, for the first distorted image block P1, the target edges of the first distorted image block P1 are the right edge and the lower edge. Referring to FIG. 9, the right and lower edges are respectively performed according to the first expanded edge size lap. The edge expansion processing obtains the first image block corresponding to the first distortion image block P1 (which is a broken line frame including P1).

Referring to FIG. 8, for the second distorted image block P2, the target edges of the second distorted image block P2 are a left edge, a right edge, and a lower edge. Referring to FIG. 9, the left edge and the right edge are respectively according to the first expanded edge size lap. The edge expansion processing is performed with the lower edge to obtain a first image block corresponding to the second distortion image block P2 (which is a dotted line frame including P2).

Referring to FIG. 8, for the third distortion image block P6, the target edges of the third distortion image block P6 are an upper edge, a lower edge, and a right edge. Referring to FIG. 9, the upper edge ratio and the lower edge are respectively according to the first edge expansion size lap. The edge expansion processing is performed with the right edge to obtain a first image block corresponding to the third distortion image block P6 (which is a broken line frame including P6).

Referring to FIG. 8, for the fourth distortion image block P8, referring to FIG. 9, the four edges of the fourth distortion image block P8 are respectively subjected to edge expansion processing according to the first expanded edge size lap, and the first corresponding to the fourth distortion image block P8 is obtained. Image block (which is a dashed box including P8).

Wherein, the width of each of the first image blocks obtained in the second case described above is equal to the target width, and the height of each of the first image blocks is equal to the target height.

When each of the distorted image blocks obtained by dividing the distorted picture in the second case is not a distorted image block of equal size, the target width and the target height may be determined as follows before performing step 202. Can be:

According to each width value in the preset width range, the first parameter corresponding to each width value is calculated according to the following third formula, and the width value corresponding to the smallest first parameter is determined as the target width.

The third formula is:

In the above third formula, S ₁ is the first parameter, W ₁ is the width value in the preset width range, and W ₃ is the width of the distorted picture.

According to each height value in the preset height range, the second parameter corresponding to each height value is calculated according to the fourth formula below, and the height value corresponding to the smallest second parameter is determined as the target height.

The fourth formula is:

In the fourth formula above, S ₂ is the second parameter, H ₁ is the height value in the preset height range, and H ₃ is the height of the distorted picture.

Optionally, in the foregoing first case or the second case, the edge expansion processing of the distortion image block may be performed in multiple manners. In this step, the following three manners are listed to perform the edge of the distortion image block. The edge expansion processing is respectively.

In the first method, the edge of the distorted image block is subjected to edge expansion processing using a preset pixel value.

For example, the preset pixel value may be a pixel value of 0, 1, 2, or 3, and as shown in FIG. 10, the four edges of the distorted image block P1 may be expanded by a preset pixel value, and the width of the edge expansion of each edge is equal to The first expanded size, the pixel value of each pixel in the region obtained by the edge expansion is a preset pixel value.

In the second mode, the edge is subjected to edge expansion processing using the pixel value of each pixel included in the edge of the distorted image block.

For example, referring to FIG. 10, for the left edge of the distorted image block, the left edge may be subjected to edge expansion processing using the pixel value of each pixel included in the left edge, and each pixel in the region obtained by the left edge is expanded. The pixel value of the pixel is the pixel value of a certain pixel included in the left edge.

In a third mode, the edge is subjected to edge expansion processing using a neighbor image block adjacent to the edge of the distorted image block.

For example, referring to FIG. 10, the neighboring image block adjacent to the right edge of the distorted image block P1 is P4, and the right edge of the distorted image block P1 is subjected to edge expansion processing using the neighboring image block P4.

Step 204: Filter each first image block of the distorted picture by using a convolutional neural network model to obtain a second image block corresponding to each first image block.

The convolutional neural network model can be any convolutional neural network model that currently appears, or it can be a pre-established convolutional neural network model.

The convolutional neural network includes a plurality of convolutional layers, each convolutional layer corresponding to one trim size and a second expanded size, the trim size being equal to the second expanded size. Each convolution layer performs a clipping operation on the input first image block, performs a trimming process on the first image block according to the trimming size, and according to the second expanded edge before outputting the first image block The first image block is subjected to edge expansion processing such that the size of the first image block input to the convolutional layer is equal to the size of the first image block output from the convolutional layer.

In this embodiment, in the first case, the expansion size corresponding to each convolution layer may be set before performing this step, and for each convolution layer, the expansion of the convolution layer may be set. The size is not less than 0 and is not greater than the second expanded size corresponding to the convolutional layer when the convolutional neural network model is trained, that is, the expanded size corresponding to the convolutional layer is greater than or equal to 0 and less than or equal to the volume. The second expanded edge size corresponding to the laminate.

Since the first expanded edge size is greater than or equal to the accumulated value of the second expanded edge size corresponding to each of the convolutional layers, and the trimming dimension of the convolutional layer is equal to the second expanded side size corresponding to the convolved layer, After the image block is input to the convolutional neural network model, the size of the second image block corresponding to the first image block output by the convolutional neural network model is greater than or equal to the size of the distorted image block corresponding to the first image block.

or,

In the first case or the second case, the second expanded size corresponding to each convolution layer may not be set before the step is performed, and the corresponding trimming size of the convolution layer is equal to the convolution a second expanded size corresponding to the layer, such that after the first image block is input to the convolutional neural network model, the size of the second image block corresponding to the first image block output by the convolutional neural network model is equal to the first image The size of the block.

In this step, when a pre-established convolutional neural network model is used, an edge information component corresponding to the first image block may also be generated, where the side information component represents a distortion feature of the first image block relative to the original image; The distorted image color component of the image block and the side information component are input to a pre-established convolutional neural network model for convolution filtering processing to obtain a de-distorted second image block.

For the scheme of using the pre-established convolutional neural network model, a system architecture diagram is also provided for implementing the scheme. Referring to FIG. 11, the method includes: an edge information component generation module 11, a convolutional neural network 12, and a network training module 13;

The convolutional neural network 12 may include the following three-layer structure:

The input layer processing unit 121 is configured to receive input data input to the convolutional neural network model, where the input data includes a distorted image color component of the first image block, and an edge information component of the first image block; The data is subjected to a convolution filtering process of the first layer;

The hidden layer processing unit 122 performs at least one layer of convolution filtering processing on the output data of the input layer processing unit 121;

The output layer processing unit 123 performs convolution filtering processing on the output data of the hidden layer processing unit 122, and outputs the result as a de-distorted image color component for generating a de-distorted second image block.

12 is a schematic diagram of a data flow implementing the solution, in which the distorted image color component of the first image block and the side information component of the first image block are input as input data to the pre-trained convolutional nerve In the network model, the convolutional neural network model can be represented by a convolutional neural network of a preset structure and a configured network parameter set. After the input data is subjected to convolution filtering processing of the input layer, the hidden layer, and the output, de-distortion is obtained. The second image block.

The input data as a convolutional neural network model may include one or more side information components according to actual needs, and may also include one or more distorted image color components, for example, including at least a Y color component, a U color component, and One of the V color components, correspondingly, includes one or more de-distorted image color components.

For example, in some image processing, there may be distortion only for one of all color components, and only the color component of the distorted image block may be used as input data, such as two color components, in the de-distortion process. If there is a distortion condition, the two color components of the distorted image block are all taken as input data, and correspondingly, the corresponding de-distorted image color components are output.

The stored data of each pixel of an image block, including the values of all the color components of the pixel, can be extracted from the stored data of each pixel as needed when obtaining the distorted image color component of the distorted image block. The value of one or more of the desired color components is derived to obtain a distorted image color component of the distorted image block.

As shown in FIG. 13, taking the YUV color space as an example, the value of the Y color component of each pixel is extracted therefrom, thereby obtaining the Y color component of the distorted image. In the left diagram of Fig. 13, [0, 0] and [0, 1] are positions, and Y, U, and V are three channel distortion image color components of pixel points. For example, the position [0, 0] is a stored data of one pixel, and the stored data includes three channel distortion image color components of Y, U, and V; in the right picture of FIG. 13, [0, 0] and [0, 1 ] is still the position, Y is the Y channel distortion image color component.

For the side information component, it represents the distortion feature of the first image block relative to the original image block in the original picture, which is an expression of the distortion feature determined by the image processing process. In this step, the side information component corresponding to the first image block is used as the input data to be input to the convolutional neural network model.

In an optional embodiment, the distortion feature may include at least one of the following distortion features:

Distortion, distortion position, and distortion type:

First, the side information component can represent the degree of distortion of the distorted first image block relative to its corresponding original image block in the original picture.

Second, the side information component may also represent the distorted position of the distorted first image block relative to the corresponding original image block in the original picture, and the side information component may include the boundary coordinates of each coding unit in the first image block. For example, in a mainstream video codec application, an image is usually divided into a plurality of non-overlapping and non-fixed coding units, and the coding unit is separately subjected to predictive coding and different degrees of quantization processing. The distortion between the coding units usually does not have Consistency, pixel mutations usually occur at the boundaries of the coding unit. Therefore, the boundary coordinates of the coding unit can be used as a priori edge information component to characterize the distortion position.

Again, the side information component may also represent the distortion type of the distorted first image block relative to the corresponding original image block in the original picture, and the side information component may include the prediction mode of each coding unit in the first image block. For example, in a video codec application, different coding units in an image may adopt different prediction modes, and different prediction modes may affect the distribution of residual data, thereby affecting the characteristics of the distorted first image block. Therefore, the prediction mode of the coding unit may be used as An edge information component that characterizes the type of distortion.

Optionally, the side information component may be a combination of one or more of the foregoing distortion degree, distortion position, and distortion type, or one or more of the above distortion degree, distortion position, and distortion type may be used. The parameter indicates, for example, that after image processing, the degree of distortion of the distorted first image block may be represented by a parameter of physical meaning, or the distortion of the first image block of the distortion may be represented by two parameters of different physical meanings. Correspondingly, one or more parameters indicating the degree of distortion can be used as the side information component according to actual needs, that is, input as input data to the convolutional neural network model.

The side information component of the first image block may be an edge information guide map, which is a matrix structure of the same height as the first image block. The side information component includes an edge information component of each pixel of the first image block in which the position of the side information component of the pixel is the same as the position of the pixel in the first image block.

As shown in FIG. 14, the matrix structure of the side information component is the same as the matrix structure of the distorted first image block color component, wherein the coordinates [0, 0], [0, 1] represent the distortion position, and the matrix element value 1 represents The degree of distortion, that is, the side information component, can simultaneously indicate the degree of distortion and the position of the distortion.

As shown in Fig. 15, the coordinates [0, 0], [0, 1], [2, 0], [2, 4] represent the distortion position, and the element values 1 and 2 of the matrix represent the distortion type, that is, the side information component. At the same time, it can indicate the degree of distortion and the position of distortion.

Moreover, in the above solution provided by the embodiment of the present application, two side information components respectively illustrated in FIG. 14 and FIG. 15 may be included.

The first image block is also a matrix, with each element in the matrix being the distorted image color component of the pixel in the first image block. The distorted image color component of the pixel may include the color component of any one of the three channels Y, U, V or more.

Further, according to an alternative embodiment of the solution and the need, when the distorted image color component includes a plurality of, the side information component may include side information components respectively corresponding to each of the distorted image color components.

That is to say: the side information component of the pixel in the side information component of the first image block includes the side information component corresponding to each of the distortion image color components in the pixel.

The above solution provided by the embodiment of the present application can be applied to various alternative embodiments of the presently known embodiments, for example, in an application scenario in which an image is subjected to super-resolution processing, and the present invention does not limited.

For the scheme of filtering using the pre-established convolutional neural network model, referring to FIG. 16, the following processing steps are specifically included:

In this step, the side information components of the first image block can be generated by the following two

steps

61 and 62, respectively.

Step 61: Determine, for each of the first image blocks to be processed, a distortion level value of each pixel in the first image block.

In an optional embodiment, after the image processing of the original image is performed in different manners, the physical parameters indicating the degree of distortion may also be different. Therefore, in this step, the corresponding image processing manner may be determined to accurately represent the corresponding image. The degree of distortion of the degree of pixel distortion can be as follows:

The first way: for the first image block obtained by the codec, the quantization parameter of each coding unit in the first image block is known, that is, the quantization of each coding unit in the first image block can be obtained. a parameter, determining a quantization parameter of a coding unit where each pixel of the first image block is located as a distortion level value of each pixel of the first image block;

The quantization parameter of each coding unit in the first image block is included in the quantization unit in the video coding system, so the quantization parameter of each coding unit in the first image block can be acquired from the quantization unit.

The second way: for the first image block obtained by the codec, the coding information of each coding unit in the first image block is known, that is, the coding of each coding unit in the first image block can be obtained. And calculating, according to the coding information of each coding unit in the first image block, a quantization parameter of each coding unit, determining a quantization parameter of a coding unit where each pixel of the first image block is located as a first image block. The degree of distortion of each pixel.

The encoding information of each coding unit is included in the current original video picture, and the coding information of each coding unit in the first image block may be acquired from the current original video picture.

Step 62: Generate, according to the position of each pixel point in the first image block, a side information component corresponding to the first image block by using the obtained distortion degree value of each pixel point, where each component value included in the side information component is The pixel at the same position on the first image block corresponds to the position of the side information component of the side information component in the first image block being the same as the position of the pixel point in the first image block.

Since each component value included in the side information component corresponds to a pixel point of the same position on the first image block, the side information component has the same structure as the distortion image color component of the first image block, that is, a matrix representing the side information component. The matrix representing the color component of the first image block is of the same type.

In this step, the obtained distortion level value of each pixel point may be determined as the component value of the same position of the pixel information in the side information component corresponding to the first image block, based on the position of each pixel point in the first image block. That is, the distortion degree value of each pixel is directly determined as the component value corresponding to the pixel.

When the pixel value range of the first image block is different from the value range of the pixel degree distortion value, the obtained distortion degree value of each pixel point may be normalized based on the pixel value range of the first image block, thereby obtaining The degree of distortion after processing, the range of values of the distortion level after processing is the same as the range of pixel values;

Then, based on the position of each pixel point in the first image block, the processed distortion level value of each pixel point is determined as the component value of the same position of the pixel point in the side information component corresponding to the first image block.

In this step, the distortion degree value of the pixel point can be standardized by the following formula:

Where, norm(x) is the processed distortion degree value obtained after the normalization process, and x is the distortion degree value of the pixel point, and the pixel value range of the first image block is [PIXEL _MIN , PIXEL _MAX ], and the distortion degree value of the pixel point The range of values is [QP _MIN , QP _MAX ].

Through the above two steps, that is, the process of generating the side information component by generating the side information component of the first image block, it can also be understood that the side information guide map corresponding to the first image block is generated, and the side information guide map passes through the side thereof. The information component represents the degree of distortion of the first image block, and the side information guide map is of equal width and width to the first image block.

In the embodiment of the present invention, the convolutional neural network model includes an input layer, an implicit layer, and an output layer as an example, and the first image block is filtered by using a convolutional neural network for any first image block to be processed. A de-distorted second image block is obtained, the scheme being described as follows.

Step 63: For any one of the first image blocks to be processed, the distortion image color component of the first image block and the generated side information component are used as input data of a pre-established convolutional neural network model, and are first performed by the input layer. The convolution filtering process of the layer obtains an image block expressed in a sparse form, and outputs the image block expressed in a sparse form.

In this step, in the convolutional neural network model, the input data may be input to the network through respective channels. In this step, the first image block color component Y and c _m channels of the c _v channels may be The side information component M is combined in the dimension of the channel to form the input data I of c _v +c _m channels, and multidimensional convolution filtering and nonlinear mapping are performed on the input data I by using the following formula to generate n ₁ Image blocks represented by sparse forms:

F ₁ (I)=g(W ₁ *I+B ₁ );

Where F ₁ (I) is the output of the input layer (for image blocks expressed in sparse form), I is the input of the convolution layer in the input layer, * is the convolution operation, and W ₁ is the convolution layer of the input layer. The weight coefficient of the group, B ₁ is the offset coefficient of the convolution layer filter bank of the input layer, and g() is a nonlinear mapping function.

Wherein, W ₁ corresponds to n ₁ convolution filters, that is, n ₁ convolution filters are applied to the input of the convolution layer of the input layer, and n ₁ image blocks are output; convolution of each convolution filter The size of the kernel is c ₁ ×f ₁ ×f ₁ , where c ₁ is the number of input channels and f ₁ is the spatial size of each convolution kernel.

Next, an example is given. In this example, the parameters of the input layer can be: c ₁ = 2, f ₁ = 5, n ₁ = 64, and a ReLU (Rectified linear unit) function is used as g(), its function. The expression is:

g(x)=max(0,x);

Then the input layer convolution processing expression in this embodiment is:

F ₁ (I)=max(0, W ₁ *I+B ₁ );

Step 64: The hidden layer performs further high-dimensional mapping on the image block F ₁ (I) expressed by the input layer in a sparse form to obtain a high-dimensional image block and outputs the high-dimensional image block.

In the embodiment of the present invention, the convolution layer number, the convolution layer connection mode, the convolution layer attribute, and the like included in the hidden layer are not limited, and various structures known at present may be adopted, but the hidden layer includes at least 1 convolution layer.

For example, the hidden layer contains a N-1 (N ≥ 2) layer convolutional layer, and the hidden layer processing is represented by:

F _i (I)=g(W _i *F _i-1 (I)+B _i ), i∈{2,3,...,N};

Where F _i (I) represents the output of the i-th layer convolutional layer in the convolutional neural network, * is the convolution operation, W _i is the weight coefficient of the i-th layer convolutional layer filter bank, and B _i is the convolutional layer The offset coefficient of the filter bank, g() is a nonlinear mapping function.

Wherein, W _i corresponds to n _i convolution filters, that is, n _i convolution filters are applied to the input of the i-th convolution layer, and n _i image blocks are output; convolution of each convolution filter The size of the kernel is c _i ×f _i ×f _i , where c _i is the number of input channels and f _i is the spatial size of each convolution kernel.

Next, an example is given. In this example, the hidden layer may include a convolution layer whose convolution filter parameters are: c ₂ = 64, f ₂ =1, n ₂ = 32, Using the ReLU (Rectified Linear Unit) function as g(), the convolution processing expression of the hidden layer in this embodiment is:

F ₂ (I)=max(0, W ₂ *F ₁ (I)+B ₂ );

Step 65: The output layer aggregates the high-dimensional image block F _N (I) output by the hidden layer, and outputs the de-distorted image color component of the first image block, for generating the de-distorted second image block.

The structure of the output layer is not limited in the embodiment of the present invention, and the output layer may be a Residual Learning structure, a Direct Learning structure, or other structures.

The processing using the Residual Learning structure is as follows:

The convolution operation is performed on the high-dimensional image block outputted by the hidden layer to obtain the compensation residual, and then added to the input distortion image color component to obtain the de-distorted image color component, that is, the de-distorted second image block is obtained. The output layer processing can be expressed by the following formula:

F(I)=W _N+1 *F _N (I)+B _N+1 +Y;

Where F(I) is the de-distorted image color component of the output layer, F _N (I) is the output of the hidden layer (which is a high-dimensional image block), * is the convolution operation, and W _N+1 is the output layer. The weight coefficient of the convolution layer filter bank, B _N+1 is the offset coefficient of the convolution layer filter bank of the output layer, and Y is the distorted image color component that is not subjected to convolution filtering processing and is to be subjected to de-distortion processing.

Wherein, W _N+1 corresponds to n _N+1 convolution filters, that is, n _N+1 convolution filters are applied to the input of the N+1th convolution layer, and n _N+1 image blocks are output. , n _N+1 is the number of output de-distorted image color components, generally equal to the number of input distortion image color components. If only one de-distorted image color component is output, n _{N+1 is} generally 1 The size of the convolution kernel of each convolution filter is c _N+1 ×f _N+1 ×f _N+1 , where c _N+1 is the number of input channels and f _N+1 is the number of each convolution kernel The size of the space.

The processing using the Direct Learning structure is as follows:

After the convolution operation of the output of the hidden layer, the de-distorted image color component is directly output, that is, the de-distorted second image block is obtained. The output layer processing can be expressed by the following formula:

F(I)=W _N+1 *F _N (I)+B _N+1 ;

Where F(I) is the output of the output layer, F _N (I) is the output of the hidden layer, * is the convolution operation, and W _N+1 is the weight coefficient of the convolutional layer filter bank of the output layer, B _{N+ 1} is the offset coefficient of the convolution layer filter bank of the output layer.

Next, an example is shown. In this example, the output layer adopts a Residual Learning structure, and the output layer includes a convolution layer. The convolution filter parameters of the output layer are: c ₃ =32, f ₃ =3,n ₃ =1, the convolution processing expression of the output layer in this embodiment is:

F(I)=W ₃ *F ₃ (I)+B ₃ +Y.

It should be noted that, in this embodiment, multiple distortion image blocks can be filtered at the same time, so that parallelization filtering can be implemented, and the efficiency of video coding is improved.

In the above solution provided by the embodiment of the present invention, a method for training a convolutional neural network model is also proposed, as shown in FIG. 17, which specifically includes the following processing steps:

Step 71: Acquire a preset training set, where the preset training set includes an original sample image, and a distortion image color component of the plurality of distortion images corresponding to the original sample image, and an edge information component corresponding to each of the distortion images, where the distortion image corresponds to The side information component represents the distorted feature of the distorted image relative to the original sample image. The distortion characteristics of the plurality of distorted images are different.

In this step, the original sample image (ie, the undistorted natural image) may be subjected to image processing with different degrees of distortion to obtain a distortion image corresponding to each original sample image, and according to the steps in the above-described de-distortion method, for each distortion And generating an edge information component corresponding to the distortion image, so that for each original sample image, the original sample image, the distortion image corresponding to the original sample image, and the side information component corresponding to the distortion image are combined into an image pair, and the image is composed of the image The preset training set Ω is composed. Since the original sample image is subjected to image processing of different degrees of distortion, the original sample image may correspond to a plurality of distorted images.

Further, the training set may include an original sample image, and performing image processing on the original sample image to obtain a plurality of distortion images having different distortion characteristics, and side information components corresponding to each distortion image; that is, the training set includes the An original sample image, the plurality of distorted images corresponding to the one original sample image and the side information components corresponding to each of the distorted images.

The training set may also include a plurality of original sample images, respectively performing image processing on each of the original sample images to obtain a plurality of distorted images having different distortion characteristics, and side information components corresponding to each distorted image; that is, the training set includes Each of the original sample images, the plurality of distorted images corresponding to the original sample image and the side information component corresponding to each of the distorted images of the original sample image.

Step 72: Initialize the parameters of the network parameter set of the convolutional neural network CNN for the convolutional neural network CNN of the preset structure. The initialized parameter set may be represented by θ ₁ , and the initialized parameters may be set according to actual needs and experience.

In this step, the training-related high-level parameters, such as the learning rate and the gradient descent algorithm, may be appropriately set, and may be set in the manner mentioned above, or may be set in other manners, and will not be described in detail herein.

Step 73: Perform forward calculation.

Optionally, the distortion image color component of each of the distortion images in the preset training set and the corresponding side information component are input to a convolutional neural network of a preset structure to perform convolution filtering processing, to obtain a de-distorted image corresponding to the distortion image. Color component.

In this step, specifically, the forward calculation of the convolutional neural network CNN with the parameter set θ _i is performed on the preset training set Ω, and the output F(Y) of the convolutional neural network is obtained, that is, the corresponding image of each distortion image Distorted image color component.

The first time you enter this step, the current parameter set is θ ₁ . When you enter this step again, the current parameter set θ _i is obtained by adjusting the parameter set θ _i-1 used last time. description.

Step 74: Determine a loss value of the plurality of original sample images based on the original image color component of the plurality of original sample images and the obtained de-distorted image color component.

Specifically, the mean square error (MSE) formula can be used as the loss function to obtain the loss value L(θ _i ), as shown in the following formula:

Where H represents the number of pairs of images selected from the preset training set in a single training, and I _h represents the input data of the combined edge component and the distorted image color component corresponding to the hth distorted image, F(I _h | θ _i ) represents the de-distorted image color component calculated by the convolutional neural network CNN forwardly under the parameter set θ _i for the h-th distorted image, and X _h represents the original image color component corresponding to the h-th distorted image, i is The number of times the forward calculation has been performed is currently counted.

Step 75: Determine whether the convolutional neural network of the preset structure adopting the current parameter set is converged based on the loss value. If not, proceed to step 76. If it converges, proceed to step 77.

Optionally, the convergence may be determined when the loss value is less than the preset loss value threshold. For example, the loss value of each original sample image in the plurality of original sample images is less than a preset loss value threshold, determining convergence, or For another example, the loss value of the original sample image of the plurality of original sample images is less than a preset loss value threshold, and the convergence is determined; or the difference between the loss value and the last calculated loss value may be calculated by the current calculation. When the value is less than the preset change threshold, the convergence is determined. For example, for each original sample image, the difference between the loss value of the original sample image obtained this time and the loss value of the original sample image obtained last time is calculated, that is, Calculating the difference of each original sample image, determining convergence when the difference of each original sample image is less than a preset change threshold, or determining convergence when the difference of any original sample image is less than a preset change threshold The invention is not limited herein.

In step 76, the parameters in the current parameter set are adjusted to obtain an adjusted parameter set, and then proceeds to step 73 for the next forward calculation.

Specifically, the back propagation algorithm can be used to adjust the parameters in the current parameter set.

Step 77: The current parameter set is taken as the final parameter set θ _{final of the} output, and the convolutional neural network of the preset structure adopting the final parameter set θ _final is used as the trained convolutional neural network model.

Step 205: Generate a frame of picture according to the second image block corresponding to each first image block.

In this step, if the second expanded size corresponding to each convolutional layer in the convolutional neural network model is set to zero, and the first expanded size is equal to the second expanded size corresponding to each of the convolutional layers The accumulated value is obtained, and the obtained second image block corresponding to each first image block is equal in width to the distortion image block corresponding to each first image block, and may be according to the distortion image block corresponding to each first image block. In the position of the distorted picture, the second image block corresponding to each first image block is composed into a frame de-distorted picture, and the frame de-distorted picture is buffered in the buffer as a frame reference picture.

Optionally, for the last two second image blocks in each row, if the corresponding distorted image blocks of the two second image blocks respectively partially overlap, before the frame de-distorting picture is formed, the last second The overlapping portion is removed from the image block. And/or, for the last two second image blocks in each column, if the corresponding distorted image blocks of the two second image blocks respectively partially overlap, before the one frame de-distorted picture is formed, the last second The overlapping portion is removed from the image block. Then form a frame to de-distort the picture.

or,

In this step, if the expanded edge size corresponding to each convolutional layer in the convolutional neural network model is not set, that is, the corresponding trimming size and the second expanded side size of each convolutional layer are equal, each obtained The second image block corresponding to the first image block is equal in width to the first image block, and the second image block corresponding to each first image block may be trimmed according to the first expanded size to obtain each De-distorting image blocks corresponding to the first image block, according to the position of the distorted image block corresponding to each first image block in the distorted picture, the de-distorted image blocks corresponding to each first image block are combined into one frame de-distorted picture, The frame de-distorted picture is buffered in the buffer as a frame reference picture.

Wherein, when the edge trimming process is performed, for the second image block corresponding to any one of the first image blocks, the edge of the second image block that is subjected to the edge expansion processing is determined, and the second image is determined according to the first expanded edge size. The edge determined in the block is trimmed to obtain a de-distorted image block corresponding to the first image block, and the width of the cut edge is equal to the first expanded size.

or,

In this step, for each convolutional layer in the convolutional neural network model, if the convolutional layer corresponding to the convolutional layer is set to be larger than 0 and smaller than the second expanded dimension corresponding to the convolutional layer, that is, the volume The size of the second image block is smaller than the size of the first image block and larger than the size of the first image block corresponding to the first image block obtained by the filtering. a size of the distorted image block corresponding to the first image block, calculating an accumulated value of the expanded edge size corresponding to each convolution layer, and calculating a difference between the first expanded size and the accumulated value, according to the difference The second image block corresponding to the first image block is subjected to trimming processing to obtain a de-distorted image block corresponding to each first image block, according to the position of the distorted image block corresponding to each first image block in the distorted picture, The de-distorted image block corresponding to each first image block constitutes a frame de-distorted picture, and the frame-de-distorted picture is buffered in the buffer as a frame reference picture.

Wherein, when the edge trimming process is performed, for the second image block corresponding to any one of the first image blocks, the edge of the second image block that is subjected to the edge expansion processing is determined, and the second image block is determined according to the difference. The edge is trimmed to obtain a de-distorted image block corresponding to the first image block, and the width of the cropped edge is equal to the difference.

Optionally, for the last two de-distorted image blocks in each row, if the distorted image blocks corresponding to the two de-distorted image blocks respectively overlap partially, the distortion may be de-distorted before forming a frame-de-distorted picture. The overlapping portion is removed from the image block. And/or, for the last two de-distorted image blocks in each column, if the distorted image blocks corresponding to the two de-distorted image blocks respectively overlap partially, the distortion can be de-distorted from the last before composing a frame-de-distorted picture. The overlapping portion is removed from the image block. Then form a frame to de-distort the picture.

In the embodiment of the present application, a plurality of distorted image blocks are obtained by dividing a distorted picture generated in a video encoding process, and then using a convolutional neural network model to simultaneously filter one or more distorted image blocks to obtain each The de-distorted image block corresponding to the distorted image block generates a frame de-distorted picture according to the de-distorted image block corresponding to each distorted image block. The generated de-distorted picture is a filtered picture. Since the distortion image block is filtered by using convolutional neural network filtering, the filtering of the entire frame distortion picture can reduce the resources required for filtering, thereby making the device Can meet the resources needed for filtering. In addition, multiple distortion image blocks can be filtered at the same time, which can improve the filtering efficiency and improve the video coding efficiency.

Referring to FIG. 18, an embodiment of the present application provides a method for filtering a picture, which may filter a distortion picture generated during a decoding process, including:

Step 301: Acquire a distorted picture generated during video decoding.

A reconstructed picture may be generated during the video decoding process, and the distorted picture may be the reconstructed picture, or may be a picture obtained by filtering the reconstructed picture.

Referring to the structure diagram of the video decoding system shown in FIG. 19, the video decoding system includes a prediction module, an entropy decoder, an inverse quantization unit, an inverse transform unit, a reconstruction unit, a convolutional neural network model CNN, and a buffer.

The video decoding system encodes a process of inputting a bit stream into an entropy decoder, and the entropy decoder decodes the bit stream to obtain mode information, quantization parameters, and residual information, and input the mode information into a prediction module, The quantization parameter is input to the convolutional neural network model, and the residual information is input to the inverse quantization unit. The prediction module predicts the input mode information according to the reference picture in the buffer to obtain prediction data, and inputs the prediction data into the reconstruction unit. The prediction module includes an intra prediction unit, a motion estimation and motion compensation unit, and a switch, and the mode information may include intra mode information and inter mode information. The intra prediction unit may predict intra prediction information for the intra mode information, and the motion estimation and motion compensation unit may obtain inter prediction data by inter prediction of the inter mode information according to the reference picture buffered in the buffer, and the switch selects the intraframe. Predict the data or output the inter prediction data to the reconstruction unit.

The inverse quantization unit and the inverse transform unit respectively perform inverse quantization and inverse transform processing on the residual information to obtain prediction error information, and input the prediction error information into the reconstruction unit; the reconstruction unit generates and reconstructs the prediction error information according to the prediction error information and the prediction data. image. Correspondingly, in this step, the reconstructed picture generated by the reconstructing unit may be acquired, and the reconstructed picture is taken as a distorted picture.

Optionally, referring to FIG. 20, a filter may be connected between the convolutional neural network model and the reconstruction unit, and the filter may also filter the reconstructed picture generated by the reconstruction unit, and output the filtered reconstructed picture. Correspondingly, in this step, the filtered reconstructed picture may be obtained, and the filtered reconstructed picture is taken as a distorted picture.

Optionally, referring to FIG. 21, the mode information output by the entropy decoder may include only intra mode information, and the prediction module includes only an intra prediction unit, and the intra prediction unit predicts the intra mode information to obtain prediction data. Input to the reconstruction unit, and the reconstructed unit generates a reconstructed picture. Correspondingly, in this step, the filtered reconstructed picture may be obtained, and the filtered reconstructed picture is taken as a distorted picture.

Steps 302-305 are the same as steps 202-205 above, and will not be described in detail herein.

In the embodiment of the present application, a plurality of distorted image blocks are obtained by dividing a distorted picture generated in a video decoding process, and then using a convolutional neural network model to simultaneously filter one or more distorted image blocks to obtain each The de-distorted image block corresponding to the distorted image block generates a frame de-distorted picture according to the de-distorted image block corresponding to each distorted image block. The generated de-distorted picture is a filtered picture. Since the distortion image block is filtered by using convolutional neural network filtering, the filtering of the entire frame distortion picture can reduce the resources required for filtering, thereby making the device Can meet the resources needed for filtering. In addition, multiple distortion image blocks can be filtered at the same time, which can improve the filtering efficiency and improve the video decoding efficiency.

The following is an embodiment of the apparatus of the present application, which may be used to implement the method embodiments of the present application. For details not disclosed in the device embodiments of the present application, please refer to the method embodiments of the present application.

Referring to FIG. 22, an embodiment of the present application provides a device 400 for image filtering, where the device 400 includes:

a first acquiring module 401, configured to acquire a distorted picture, where the distorted picture is distorted with respect to an original video picture input to the video encoding system;

The second obtaining module 402 is configured to obtain a plurality of first image blocks by dividing the distorted picture;

a filtering module 403, configured to filter each first image block by using a convolutional neural network model to obtain a second image block corresponding to each of the first image blocks;

The generating module 404 is configured to generate a frame de-distorted picture according to the second image block corresponding to each of the first image blocks.

Optionally, the second obtaining module 402 includes:

Optionally, the edge expansion unit is configured to:

Optionally, the device 400 further includes:

a first setting module, configured to set an edge expansion size corresponding to the convolution layer included in the convolutional neural network model, where the expanded size of the setting is not less than zero and not greater than a second expansion corresponding to the convolution layer The size, the second expanded size is an expanded size of the convolutional layer when the convolutional neural network model is trained.

Optionally, the device 400 further includes:

Optionally, the generating module 404 includes:

Optionally, the device 400 further includes:

In the embodiment of the present application, a plurality of distorted image blocks are obtained by dividing a distorted picture generated in a video encoding and decoding process, and then using a convolutional neural network model to simultaneously filter one or more distorted image blocks, thereby obtaining each Deblocking image blocks corresponding to the distorted image blocks, and generating a frame of de-distorted pictures according to the de-distorted image blocks corresponding to each of the distorted image blocks. The generated de-distorted picture is a filtered picture. Since the distortion image block is filtered by using convolutional neural network filtering, the filtering of the entire frame distortion picture can reduce the resources required for filtering, thereby making the device Can meet the resources needed for filtering.

With regard to the apparatus in the above embodiments, the specific manner in which the respective modules perform the operations has been described in detail in the embodiment relating to the method, and will not be explained in detail herein.

FIG. 23 is a block diagram showing the structure of a terminal 500 according to an exemplary embodiment of the present invention. The terminal 500 can be a portable mobile terminal, such as a smart phone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III), and a MP4 (Moving Picture Experts Group Audio Layer IV). Image experts compress standard audio layers 4) players, laptops or desktops. Terminal 500 may also be referred to as a user device, a portable terminal, a laptop terminal, a desktop terminal, and the like.

Generally, the terminal 500 includes a processor 501 and a memory 502.

Processor 501 can include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like. The processor 501 may be configured by at least one of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). achieve. The processor 501 may also include a main processor and a coprocessor. The main processor is a processor for processing data in an awake state, which is also called a CPU (Central Processing Unit); the coprocessor is A low-power processor for processing data in standby. In some embodiments, the processor 501 can be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and rendering of the content that the display needs to display. In some embodiments, the processor 501 may also include an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.

Memory 502 can include one or more computer readable storage media, which can be non-transitory. Memory 502 can also include high speed random access memory, as well as non-volatile memory, such as one or more disk storage devices, flash storage devices. In some embodiments, the non-transitory computer readable storage medium in the memory 502 is configured to store at least one instruction for execution by the processor 501 to implement one of the methods provided by the method embodiments of the present application. Image filtering method.

In some embodiments, the terminal 500 optionally further includes: a peripheral device interface 503 and at least one peripheral device. The processor 501, the memory 502, and the peripheral device interface 503 can be connected by a bus or a signal line. Each peripheral device can be connected to the peripheral device interface 503 via a bus, signal line or circuit board. Specifically, the peripheral device includes at least one of a radio frequency circuit 504, a touch display screen 505, a camera 506, an audio circuit 507, a positioning component 508, and a power source 509.

Peripheral device interface 503 can be used to connect at least one peripheral device associated with an I/O (Input/Output) to processor 501 and memory 502. In some embodiments, processor 501, memory 502, and peripheral interface 503 are integrated on the same chip or circuit board; in some other embodiments, any of processor 501, memory 502, and peripheral interface 503 or The two can be implemented on a separate chip or circuit board, which is not limited in this embodiment.

The RF circuit 504 is configured to receive and transmit an RF (Radio Frequency) signal, also referred to as an electromagnetic signal. Radio frequency circuit 504 communicates with the communication network and other communication devices via electromagnetic signals. The RF circuit 504 converts the electrical signal into an electromagnetic signal for transmission, or converts the received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 504 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like. Radio frequency circuit 504 can communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to, the World Wide Web, a metropolitan area network, an intranet, generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the RF circuit 504 may also include NFC (Near Field Communication) related circuitry, which is not limited in this application.

The display screen 505 is used to display a UI (User Interface). The UI can include graphics, text, icons, video, and any combination thereof. When display 505 is a touch display, display 505 also has the ability to acquire touch signals over the surface or surface of display 505. The touch signal can be input to the processor 501 as a control signal for processing. At this point, display 505 can also be used to provide virtual buttons and/or virtual keyboards, also referred to as soft buttons and/or soft keyboards. In some embodiments, the display screen 505 may be one, and the front panel of the terminal 500 is disposed; in other embodiments, the display screen 505 may be at least two, respectively disposed on different surfaces of the terminal 500 or in a folded design; In still other embodiments, display screen 505 can be a flexible display screen disposed on a curved surface or a folded surface of terminal 500. Even the display screen 505 can be set to a non-rectangular irregular pattern, that is, a profiled screen. The display screen 505 can be prepared by using an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).

Camera component 506 is used to capture images or video. Optionally, camera assembly 506 includes a front camera and a rear camera. Usually, the front camera is placed on the front panel of the terminal, and the rear camera is placed on the back of the terminal. In some embodiments, the rear camera is at least two, which are respectively a main camera, a depth camera, a wide-angle camera, and a telephoto camera, so as to realize the background blur function of the main camera and the depth camera, and the main camera Combine with a wide-angle camera for panoramic shooting and VR (Virtual Reality) shooting or other integrated shooting functions. In some embodiments, camera assembly 506 can also include a flash. The flash can be a monochrome temperature flash or a two-color temperature flash. The two-color temperature flash is a combination of a warm flash and a cool flash that can be used for light compensation at different color temperatures.

The audio circuit 507 can include a microphone and a speaker. The microphone is used to collect sound waves of the user and the environment, and convert the sound waves into electrical signals for processing to the processor 501 for processing, or input to the RF circuit 504 for voice communication. For the purpose of stereo acquisition or noise reduction, there may be multiple microphones, which are respectively disposed at different parts of the terminal 500. The microphone can also be an array microphone or an omnidirectional acquisition microphone. The speaker is then used to convert electrical signals from processor 501 or radio frequency circuit 504 into sound waves. The speaker can be a conventional film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only can the electrical signal be converted into human audible sound waves, but also the electrical signal can be converted into sound waves that are inaudible to humans for ranging and the like. In some embodiments, the audio circuit 507 can also include a headphone jack.

The location component 508 is used to locate the current geographic location of the terminal 500 to implement navigation or LBS (Location Based Service). The positioning component 508 can be a positioning component based on a US-based GPS (Global Positioning System), a Chinese Beidou system, or a Russian Galileo system.

Power source 509 is used to power various components in terminal 500. The power source 509 can be an alternating current, a direct current, a disposable battery, or a rechargeable battery. When the power source 509 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is a battery that is charged by a wired line, and a wireless rechargeable battery is a battery that is charged by a wireless coil. The rechargeable battery can also be used to support fast charging technology.

In some embodiments, terminal 500 also includes one or more sensors 510. The one or more sensors 510 include, but are not limited to, an acceleration sensor 511, a gyro sensor 512, a pressure sensor 513, a fingerprint sensor 514, an optical sensor 515, and a proximity sensor 516.

The acceleration sensor 511 can detect the magnitude of the acceleration on the three coordinate axes of the coordinate system established by the terminal 500. For example, the acceleration sensor 511 can be used to detect components of gravity acceleration on three coordinate axes. The processor 501 can control the touch display 505 to display the user interface in a landscape view or a portrait view based on the gravitational acceleration signal acquired by the acceleration sensor 511. The acceleration sensor 511 can also be used for the acquisition of game or user motion data.

The gyro sensor 512 can detect the body direction and the rotation angle of the terminal 500, and the gyro sensor 512 can cooperate with the acceleration sensor 511 to collect the 3D motion of the user to the terminal 500. Based on the data collected by the gyro sensor 512, the processor 501 can implement functions such as motion sensing (such as changing the UI according to the user's tilting operation), image stabilization at the time of shooting, game control, and inertial navigation.

The pressure sensor 513 may be disposed at a side border of the terminal 500 and/or a lower layer of the touch display screen 505. When the pressure sensor 513 is disposed on the side frame of the terminal 500, the user's holding signal to the terminal 500 can be detected, and the processor 501 performs left and right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 513. When the pressure sensor 513 is disposed on the lower layer of the touch display screen 505, the operability control on the UI interface is controlled by the processor 501 according to the user's pressure on the touch display screen 505. The operability control includes at least one of a button control, a scroll bar control, an icon control, and a menu control.

The fingerprint sensor 514 is used to collect the fingerprint of the user. The processor 501 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 514, or the fingerprint sensor 514 identifies the identity of the user according to the collected fingerprint. Upon identifying that the identity of the user is a trusted identity, the processor 501 authorizes the user to perform related sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying and changing settings, and the like. The fingerprint sensor 514 can be disposed on the front, back, or side of the terminal 500. When the physical button or vendor logo is provided on the terminal 500, the fingerprint sensor 514 can be integrated with the physical button or the manufacturer logo.

Optical sensor 515 is used to collect ambient light intensity. In one embodiment, the processor 501 can control the display brightness of the touch display 505 based on the ambient light intensity acquired by the optical sensor 515. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 505 is raised; when the ambient light intensity is low, the display brightness of the touch display screen 505 is lowered. In another embodiment, the processor 501 can also dynamically adjust the shooting parameters of the camera assembly 506 based on the ambient light intensity acquired by the optical sensor 515.

Proximity sensor 516, also referred to as a distance sensor, is typically disposed on the front panel of terminal 500. Proximity sensor 516 is used to collect the distance between the user and the front of terminal 500. In one embodiment, when the proximity sensor 516 detects that the distance between the user and the front side of the terminal 500 is gradually decreasing, the processor 501 controls the touch display screen 505 to switch from the bright screen state to the screen state; when the proximity sensor 516 detects When the distance between the user and the front side of the terminal 500 gradually becomes larger, the processor 501 controls the touch display screen 505 to switch from the state of the screen to the bright state.

It will be understood by those skilled in the art that the structure shown in FIG. 23 does not constitute a limitation to the terminal 500, and may include more or less components than those illustrated, or may combine some components or adopt different component arrangements.

Other embodiments of the present application will be readily apparent to those skilled in the <RTIgt; The application is intended to cover any variations, uses, or adaptations of the application, which are in accordance with the general principles of the application and include common general knowledge or common technical means in the art that are not disclosed herein. . The specification and examples are to be regarded as illustrative only,

It is to be understood that the invention is not limited to the details of the details and The scope of the present application is limited only by the accompanying claims.

Claims

A method for filtering a picture, the method comprising:

Obtaining a distorted picture that is distorted relative to an original video picture that is input into the video encoding system;

Obtaining a plurality of first image blocks by dividing the distorted picture;

And filtering each first image block by using a convolutional neural network model to obtain a second image block corresponding to each of the first image blocks;

And generating a frame de-distorted picture according to the second image block corresponding to each of the first image blocks.
The method according to claim 1, wherein the obtaining the plurality of first image blocks by dividing the distorted picture comprises:

Dividing the distorted picture according to a target width and a target height to obtain a plurality of distorted image blocks included in the distorted picture;

Performing edge expansion processing on each of the plurality of distorted image blocks according to the first expanded size to obtain a first image block corresponding to each of the distorted image blocks.
The method according to claim 2, wherein said plurality of distorted image blocks comprise a first distorted image block located at a vertex position of said distorted picture, and located on an upper boundary and a lower boundary of said distorted picture a second distortion image block, a third distortion image block located on a left boundary and a right boundary of the distortion picture, and a fourth distortion in addition to the first distortion image block, the second distortion image block, and the third distortion image block Image block

The width and height of the first distorted image block are equal to W 1 -lap and H 1 -lap, respectively, W 1 is the target width, H 1 is the target height, and lap is the first expanded size. The width and height of the second distorted image block are equal to W 1 -2lap and H 1 -lap, respectively, and the width and height of the third distorted image block are W 1 -lap and H 1 -2lap, respectively, the fourth distortion The width and height of the image block are W 1 -2lap and H 1 -2lap, respectively.
The method according to claim 3, wherein said each of said plurality of distorted image blocks is subjected to edge expansion processing according to said first expanded size to obtain said corresponding each of said distorted image blocks The first image block includes:

And performing a method of expanding a target edge of the target distortion image block according to the first expanded size to obtain a first image block corresponding to the target distortion image block, where the target distortion image block is the first distortion image block, a second distortion image block and the third distortion image block, the target edge being an edge of the target distortion image block that does not coincide with a boundary of the distortion picture;

And expanding, according to the first expanded size, four edges of the fourth distorted image block to obtain a first image block corresponding to the fourth distorted image block.
The method according to claim 2, wherein before the filtering of each of the distorted image blocks of the distorted picture using the convolutional neural network model, the method further comprises:

And setting a flanged dimension corresponding to the convolution layer included in the convolutional neural network model, the set expansion dimension is not less than zero and not greater than a second expansion dimension corresponding to the convolution layer, and the second expansion The edge size is the expanded size of the convolutional layer when the convolutional neural network model is trained.
The method of claim 5, wherein the method further comprises:

The first expanded size is set according to a second expanded size corresponding to each convolution layer included in the convolutional neural network model.
The method according to any one of claims 1 to 4, wherein the generating a frame of the de-distorted picture according to the de-distorted image block corresponding to each of the distorted image blocks comprises:

Performing edge processing on the de-distorted image blocks corresponding to each of the distortion image blocks to obtain a third image block corresponding to each of the distortion image blocks;

The third image block corresponding to each of the distorted image blocks is composed into a frame de-distorted picture.
The method according to any one of claims 2 to 6, wherein the method further comprises:

The target width and the target height are determined according to the first expanded size, the width and height of the distorted picture.
A device for filtering pictures, characterized in that the device comprises:

a first acquiring module, configured to acquire a distorted picture, where the distorted picture is distorted with respect to an original video picture input to the video encoding system;

a second acquiring module, configured to obtain a plurality of first image blocks by dividing the distorted picture;

a filtering module, configured to filter each first image block by using a convolutional neural network model to obtain a second image block corresponding to each of the first image blocks;

And a generating module, configured to generate a frame de-distorted picture according to the second image block corresponding to each of the first image blocks.
The device of claim 9, wherein the second obtaining module comprises:

a dividing unit, configured to divide the distorted picture according to a target width and a target height, to obtain a plurality of distorted image blocks included in the distorted picture;

And an edge expansion unit, configured to perform edge expansion processing on each of the plurality of distortion image blocks according to the first expansion size to obtain a first image block corresponding to each of the distortion image blocks.
The apparatus according to claim 10, wherein said plurality of distorted image blocks include a first distorted image block located at a vertex position of said distorted picture, and an upper boundary and a lower boundary of said distorted picture a second distortion image block, a third distortion image block located on a left boundary and a right boundary of the distortion picture, and a fourth distortion in addition to the first distortion image block, the second distortion image block, and the third distortion image block Image block

The width and height of the first distorted image block are equal to W 1 -lap and H 1 -lap, respectively, W 1 is the target width, H 1 is the target height, and lap is the first expanded size. The width and height of the second distorted image block are equal to W 1 -2lap and H 1 -lap, respectively, and the width and height of the third distorted image block are W 1 -lap and H 1 -2lap, respectively, the fourth distortion The width and height of the image block are W 1 -2lap and H 1 -2lap, respectively.
The apparatus according to claim 11, wherein said edge expansion unit is configured to:

And performing a method of expanding a target edge of the target distortion image block according to the first expanded size to obtain a first image block corresponding to the target distortion image block, where the target distortion image block is the first distortion image block, a second distortion image block and the third distortion image block, the target edge being an edge of the target distortion image block that does not coincide with a boundary of the distortion picture;

And expanding, according to the first expanded size, four edges of the fourth distorted image block to obtain a first image block corresponding to the fourth distorted image block.
The device of claim 10, wherein the device further comprises:

a first setting module, configured to set an edge expansion size corresponding to the convolution layer included in the convolutional neural network model, where the expanded size of the setting is not less than zero and not greater than a second expansion corresponding to the convolution layer The size, the second expanded size is an expanded size of the convolutional layer when the convolutional neural network model is trained.
The device of claim 13 wherein said device further comprises:

And a second setting module, configured to set the first expanded size according to a second expanded size corresponding to the number of each convolution layer included in the convolutional neural network model.
The apparatus according to any one of claims 9 to 12, wherein the generating module comprises:

An edge-splitting unit is configured to perform edge-splitting processing on the de-distorted image blocks corresponding to each of the distortion image blocks to obtain a third image block corresponding to each of the distortion image blocks;

And a component unit, configured to form a third image block corresponding to each of the distortion image blocks into a frame de-distorted picture.
The device according to any one of claims 10 to 14, wherein the device further comprises:

And a determining module, configured to determine the target width and the target height according to the first expanded size, the width and height of the distorted picture.