WO2019141255A1 - Image filtering method and device - Google Patents


Info

Publication number: WO2019141255A1
Authority: WIPO (PCT)
Prior art keywords: image block, distorted, distortion, picture, image
Application number: PCT/CN2019/072412
Other languages: French (fr), Chinese (zh)
Inventor: Yao Jiabao (姚佳宝)
Original Assignee: Hangzhou Hikvision Digital Technology Co., Ltd. (杭州海康威视数字技术股份有限公司)
Application filed by Hangzhou Hikvision Digital Technology Co., Ltd.


Classifications

    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/13 Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H04N19/184 Adaptive coding characterised by the coding unit, the unit being bits, e.g. of the compressed video stream
    • H04N19/61 Transform coding in combination with predictive coding
    • H04N19/80 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation

Definitions

  • the present application relates to the field of video, and in particular, to a method and an apparatus for filtering pictures.
  • When encoding an original video picture, the original video picture is processed multiple times, and a reconstructed picture is obtained.
  • The resulting reconstructed picture may be pixel-shifted relative to the original video picture, i.e., the reconstructed picture is distorted, resulting in visual impairment or artifacts.
  • In the related art, the in-loop filtering module filters the entire frame of the reconstructed picture.
  • When the reconstructed picture is a high-resolution picture, the resources required for filtering the reconstructed picture are often high, and the device may not be able to provide them. For example, filtering a reconstructed picture of 4K resolution may cause insufficient memory.
  • the embodiment of the present application provides a method and an apparatus for filtering a picture.
  • the technical solution is as follows:
  • an embodiment of the present application provides a method for filtering a picture, where the method includes:
  • the acquiring the plurality of first image blocks by dividing the distorted picture comprises:
  • the plurality of distorted image blocks include a first distorted image block located at a vertex position of the distorted picture, a second distorted image block located on the upper and lower boundaries of the distorted picture, a third distorted image block located on the left and right boundaries of the distorted picture, and a fourth distorted image block other than the first distorted image block, the second distorted image block, and the third distorted image block;
  • the width and height of the first distorted image block are equal to W1 - lap and H1 - lap, respectively, where W1 is the target width, H1 is the target height, and lap is the first expanded size; the width and height of the second distorted image block are equal to W1 - 2lap and H1 - lap, respectively; the width and height of the third distorted image block are W1 - lap and H1 - 2lap, respectively; and the width and height of the fourth distorted image block are W1 - 2lap and H1 - 2lap, respectively.
  • the performing the edge expansion processing on each of the plurality of distorted image blocks according to the first expanded size to obtain the first image block corresponding to each of the distorted image blocks comprises:
  • before using the convolutional neural network model to separately filter each of the distorted image blocks of the distorted picture, the method further includes:
  • the set expanded size is not less than zero and not greater than a second expanded size corresponding to the convolution layer, where the second expanded size is the expanded size used for the convolution layer when the convolutional neural network model is trained.
  • the method further includes:
  • the first expanded size is set according to a second expanded size corresponding to each convolution layer included in the convolutional neural network model.
  • the generating a frame of de-distorted picture according to the de-distorted image block corresponding to each of the distorted image blocks comprises:
  • the third image block corresponding to each of the distorted image blocks is composed into a frame de-distorted picture.
  • the method further includes:
  • the target width and the target height are determined according to the first expanded size, the width and height of the distorted picture.
  • the embodiment of the present application provides an apparatus for filtering a picture, where the apparatus includes:
  • a first acquiring module configured to acquire a distorted picture, where the distorted picture is distorted with respect to an original video picture input to the video encoding system
  • a second acquiring module configured to obtain a plurality of first image blocks by dividing the distorted picture
  • a filtering module configured to filter each first image block by using a convolutional neural network model to obtain a second image block corresponding to each of the first image blocks;
  • a generating module configured to generate a frame de-distorted picture according to the second image block corresponding to each of the first image blocks.
  • the second obtaining module includes:
  • a dividing unit configured to divide the distorted picture according to a target width and a target height, to obtain a plurality of distorted image blocks included in the distorted picture
  • an edge expansion unit configured to perform edge expansion processing on each of the plurality of distortion image blocks according to the first expansion size to obtain a first image block corresponding to each of the distortion image blocks.
  • the plurality of distorted image blocks include a first distorted image block located at a vertex position of the distorted picture, a second distorted image block located on the upper and lower boundaries of the distorted picture, a third distorted image block located on the left and right boundaries of the distorted picture, and a fourth distorted image block other than the first distorted image block, the second distorted image block, and the third distorted image block;
  • the width and height of the first distorted image block are equal to W1 - lap and H1 - lap, respectively, where W1 is the target width, H1 is the target height, and lap is the first expanded size; the width and height of the second distorted image block are equal to W1 - 2lap and H1 - lap, respectively; the width and height of the third distorted image block are W1 - lap and H1 - 2lap, respectively; and the width and height of the fourth distorted image block are W1 - 2lap and H1 - 2lap, respectively.
  • the edge expansion unit is configured to:
  • the device further includes:
  • a first setting module configured to set an expanded size corresponding to each convolution layer included in the convolutional neural network model, where the set expanded size is not less than zero and not greater than a second expanded size corresponding to the convolution layer, the second expanded size being the expanded size used for the convolution layer when the convolutional neural network model is trained;
  • the device further includes:
  • a second setting module configured to set the first expanded size according to a second expanded size corresponding to each convolution layer included in the convolutional neural network model.
  • the generating module includes:
  • a trimming unit configured to perform edge-trimming processing on the de-distorted image block corresponding to each of the distorted image blocks to obtain a third image block corresponding to each of the distorted image blocks;
  • a composing unit configured to compose the third image blocks corresponding to the distorted image blocks into a frame of de-distorted picture.
  • the device further includes:
  • a determining module configured to determine the target width and the target height according to the first expanded size, the width and height of the distorted picture.
  • an embodiment of the present application provides a computer readable storage medium, where the computer readable storage medium stores a computer program, and when the computer program is executed by a processor, the method provided in the first aspect is implemented.
  • By dividing the distorted picture generated in the video encoding and decoding process, a plurality of distorted image blocks included in the distorted picture are obtained; the convolutional neural network model is then used to filter each distorted image block of the distorted picture to obtain the de-distorted image block corresponding to each distorted image block, and a frame of picture is generated according to the de-distorted image blocks corresponding to the distorted image blocks.
  • The generated frame of picture is a filtered picture. Since the convolutional neural network filters distorted image blocks instead of the entire frame of the distorted picture, the resources required for filtering are reduced, so that the device can meet the resource requirements of the filtering.
  • FIG. 1 is a flowchart of a method for filtering a picture according to an embodiment of the present application
  • FIG. 2 is a flowchart of another method for filtering a picture provided by an embodiment of the present application.
  • FIG. 3 is a structural block diagram of a video encoding system according to an embodiment of the present application.
  • FIG. 4 is a structural block diagram of another video encoding system according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a divided image block provided by an embodiment of the present application.
  • FIG. 6 is another schematic diagram of a divided image block provided by an embodiment of the present application.
  • FIG. 7 is another schematic diagram of a divided image block provided by an embodiment of the present application.
  • FIG. 8 is another schematic diagram of a divided image block provided by an embodiment of the present application.
  • FIG. 9 is another schematic diagram of a divided image block provided by an embodiment of the present application.
  • FIG. 10 is another schematic diagram of a divided image block provided by an embodiment of the present application.
  • FIG. 11 is a system architecture diagram of a technical solution provided by an embodiment of the present application.
  • FIG. 12 is a schematic diagram of data flow of a technical solution provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of obtaining a distortion image color component of a distorted image according to an embodiment of the present application.
  • FIG. 14 is a first schematic diagram of side information components provided by an embodiment of the present application;
  • FIG. 15 is a second schematic diagram of side information components provided by an embodiment of the present application;
  • FIG. 16 is a flowchart of a method for removing distortion of a distorted image according to an embodiment of the present application;
  • FIG. 17 is a flowchart of a method for training a convolutional neural network model provided by an embodiment of the present application.
  • FIG. 18 is a flowchart of another method for filtering a picture according to an embodiment of the present disclosure.
  • FIG. 19 is a structural block diagram of a video encoding system according to an embodiment of the present application.
  • FIG. 20 is a structural block diagram of another video encoding system according to an embodiment of the present application.
  • FIG. 21 is a structural block diagram of another video encoding system according to an embodiment of the present disclosure.
  • FIG. 22 is a schematic diagram of an apparatus for filtering a picture according to an embodiment of the present application.
  • FIG. 23 is a schematic structural diagram of a device according to an embodiment of the present application.
  • an embodiment of the present application provides a method for image filtering, including:
  • Step 101 Acquire a distortion picture generated by a video codec process.
  • Step 102 Acquire a plurality of distorted image blocks by dividing the distorted picture.
  • In this step, the entire frame of the video picture may be obtained, and then the entire frame is divided to obtain multiple distorted image blocks.
  • Alternatively, part of the image data of the entire video frame may be acquired each time; when the acquired image data amounts to one distorted image block, the following operations are performed on that distorted image block. This realizes dividing the distorted picture into a plurality of distorted image blocks and can improve the efficiency of video encoding or decoding.
  • Step 103 Filter each of the distorted image blocks using a convolutional neural network model to obtain a de-distorted image block corresponding to each of the distorted image blocks.
  • one or more distorted image blocks can be filtered at the same time, that is, parallel filtering can be implemented to improve filtering efficiency.
  • Step 104 Generate a frame de-distorted picture according to the de-distorted image block corresponding to each of the distorted image blocks.
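  • As an illustration of steps 101-104, the following Python sketch shows the block-based filtering pipeline in its simplest form; the helper cnn_filter stands in for the convolutional neural network model, the block sizes are illustrative, and the edge expansion and overlap handling described later are omitted.

```python
import numpy as np

def filter_distorted_picture(picture, block_h, block_w, cnn_filter):
    """Minimal sketch of steps 101-104 for one color plane.

    picture:    2-D array holding one color component of the distorted
                picture (step 101).
    cnn_filter: callable mapping a distorted block to a de-distorted
                block of the same size (stands in for the CNN model).
    """
    h, w = picture.shape
    out = np.empty_like(picture)
    # Step 102: divide the distorted picture into image blocks.
    for y in range(0, h, block_h):
        for x in range(0, w, block_w):
            block = picture[y:y + block_h, x:x + block_w]
            # Step 103: filter each distorted image block independently.
            out[y:y + block.shape[0], x:x + block.shape[1]] = cnn_filter(block)
    # Step 104: the reassembled plane is the frame of de-distorted picture.
    return out
```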
  • the method provided in this embodiment may occur in a video encoding process or in a video decoding process. Therefore, the distorted picture may be a video picture generated during the video encoding process or a video picture generated during the video decoding process.
  • a plurality of distorted image blocks are obtained by dividing a distorted picture generated in a video encoding and decoding process, and then each distorted image block is separately filtered by using a convolutional neural network model to obtain each distorted image.
  • the block corresponding de-distorted image block generates a frame de-distorted picture according to the de-distorted image block corresponding to each of the distorted image blocks.
  • The generated de-distorted picture is a filtered picture. Since the convolutional neural network filters distorted image blocks instead of the entire frame of the distorted picture, the resources required for filtering are reduced, so that the device can meet the resource requirements of the filtering; the resources may be resources such as video memory and/or memory.
  • an embodiment of the present application provides a method for filtering a picture, which may filter a distortion picture generated during an encoding process, including:
  • Step 201 Acquire a distorted picture generated during video encoding.
  • a reconstructed picture may be generated during the video encoding process, and the distorted picture may be the reconstructed picture, or may be a picture obtained by filtering the reconstructed picture.
  • The video coding system includes a prediction module, an adder, a transform unit, a quantization unit, an entropy encoder, an inverse quantization unit, an inverse transform unit, a reconstruction unit, a CNN (convolutional neural network) model, a buffer, and other parts.
  • The encoding process of the video coding system may be: the original picture is input into the prediction module and the adder; the prediction module predicts the input original picture according to the reference picture in the buffer to obtain prediction data, and inputs the prediction data into the adder, the entropy encoder, and the reconstruction unit.
  • The prediction module includes an intra prediction unit, a motion estimation and motion compensation unit, and a switch.
  • the intra prediction unit may perform intra prediction on the original picture to obtain intra prediction data
  • the motion estimation and motion compensation unit performs inter prediction on the original picture according to the reference picture buffered in the buffer to obtain inter prediction data
  • The switch selects either the intra prediction data or the inter prediction data and outputs the selection to the adder and the reconstruction unit.
  • the intra prediction data may include intra mode information
  • the inter prediction data may include inter mode information.
  • a filter may be connected between the convolutional neural network model and the reconstruction unit, and the filter may also filter the reconstructed picture generated by the reconstruction unit, and output the filtered reconstructed picture.
  • the filtered reconstructed picture may be obtained, and the filtered reconstructed picture is taken as a distorted picture.
  • the distorted picture is distorted relative to the original video picture.
  • Step 202 Divide the distorted picture according to the target width and the target height to obtain a plurality of distorted image blocks included in the distorted picture.
  • The distorted image blocks divided in this step may or may not be of equal size.
  • When the distorted image blocks are of equal size, the width of each distorted image block in the distorted picture can be equal to the target width, and the height of each distorted image block can be equal to the target height.
  • When the width of the distorted picture is not an integer multiple of the target width, there is an overlap between two distorted image blocks in each line of distorted image blocks obtained by dividing according to the target width.
  • For example, when the width of the distorted picture is not equal to an integer multiple of the target width and each line obtained by dividing according to the target width includes four distorted image blocks, the third and fourth distorted image blocks in each line overlap, where ΔW is the overlapping width of the third and fourth distorted image blocks in the figure.
  • When the height of the distorted picture is equal to an integer multiple of the target height, for example when each column obtained by dividing according to the target height includes three distorted image blocks, there is no overlap among the three distorted image blocks in a column.
  • When the height of the distorted picture is not equal to an integer multiple of the target height, for example when each column obtained by dividing according to the target height includes four distorted image blocks, the third and fourth distorted image blocks in each column overlap.
  • The obtained plurality of distorted image blocks may include four types: a first distorted image block, a second distorted image block, a third distorted image block, and a fourth distorted image block.
  • The first distorted image blocks are located at the vertex positions of the distorted picture, for example the image blocks P1, P5, P16, and P20 in FIG. 8. The width and height of the first distorted image blocks P1, P5, P16, and P20 are equal to W1 - lap and H1 - lap, respectively, where W1 is the target width, H1 is the target height, and lap is the first expanded size.
  • The second distorted image blocks are located on the upper and lower boundaries of the distorted picture and are different from the first distorted image blocks, for example the image blocks P2, P3, P4, P17, P18, and P19 in FIG. 8. The width and height of the second distorted image blocks P2, P3, P4, P17, P18, and P19 are equal to W1 - 2lap and H1 - lap, respectively.
  • The third distorted image blocks are located on the left and right boundaries of the distorted picture and are different from the first distorted image blocks, for example the image blocks P6, P11, P10, and P15 in FIG. 8. The width and height of the third distorted image blocks P6, P11, P10, and P15 are W1 - lap and H1 - 2lap, respectively.
  • The distorted image blocks other than the first, second, and third distorted image blocks are fourth distorted image blocks, for example the image blocks P7, P8, P9, P12, P13, and P14 in FIG. 8. The width and height of the fourth distorted image blocks P7, P8, P9, P12, P13, and P14 are W1 - 2lap and H1 - 2lap, respectively.
  • The last two distorted image blocks in each line of distorted image blocks may or may not partially overlap; for example, the distorted image blocks P4 and P5 in the first line in FIG. 8 partially overlap, and ΔW is the overlapping width of the distorted image blocks P4 and P5 of the first line.
  • Similarly, the last two distorted image blocks in each column of distorted image blocks may or may not partially overlap; for example, the distorted image blocks P11 and P16 in the first column in FIG. 8 partially overlap, and ΔH is the overlapping height of the distorted image blocks P11 and P16 of the first column.
  • The first expanded size may also be set, and the target width and the target height are determined according to the first expanded size and the width and height of the distorted picture.
  • the convolutional neural network model comprises a plurality of convolution layers, each convolution layer corresponding to a second expanded size.
  • the first expanded size is calculated based on the second expanded size corresponding to each convolutional layer.
  • the second expanded size corresponding to each convolution layer may be accumulated to obtain an accumulated value, and the first expanded size is set to be greater than or equal to the accumulated value.
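  • As a small sketch, assuming each convolution layer's second expanded size is (k - 1) / 2 for kernel size k (a typical choice, used here only for illustration), the first expanded size can be taken as the accumulated value:

```python
# Hypothetical 3-layer model with 9x9, 5x5 and 3x3 kernels; the per-layer
# second expanded sizes of (k - 1) // 2 are an illustrative assumption.
second_expanded_sizes = [(k - 1) // 2 for k in (9, 5, 3)]  # [4, 2, 1]

# The first expanded size is set greater than or equal to the sum.
lap = sum(second_expanded_sizes)  # 7
```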
  • obtaining the distorted picture may be obtaining an entire frame of the distorted picture, and then dividing the entire frame of the distorted picture; or
  • image data may be acquired part by part: when the acquired image data can form a distorted image block whose width is the target width and whose height is the target height, the distorted image block is output, thereby dividing the distorted picture into a plurality of distorted image blocks of equal size.
  • Alternatively, when the acquired image data is the data of a first distorted image block and can constitute the first distorted image block, the first distorted image block is output; when the acquired image data is the data of a second distorted image block and can constitute the second distorted image block, the second distorted image block is output; when the acquired image data is the data of a third distorted image block and can constitute the third distorted image block, the third distorted image block is output; and when the acquired image data is the data of a fourth distorted image block and can constitute the fourth distorted image block, the fourth distorted image block is output. The distorted picture is thereby divided into the four types of distorted image blocks: the first, second, third, and fourth distorted image blocks.
  • Step 203 Perform a process of expanding each of the distorted image blocks according to the first expanded size to obtain a first image block corresponding to each of the distorted image blocks.
  • the four edges of the target image block are respectively subjected to edge expansion processing according to the first expanded edge size to obtain a first image block corresponding to the target image block, and the target image block is any one of the plurality of distortion image blocks.
  • The process for determining the target width may include processes 31 to 34, which are as follows:
  • The preset width range consists of integer values greater than 0 and less than the width of the distorted picture; alternatively, the preset width range consists of integer values greater than the first expanded size and less than the width of the distorted picture.
  • The first expanded size is typically greater than or equal to 1 pixel. For example, assuming that the width of the distorted picture is 10 pixels and the first expanded size is 1 pixel, the preset width range includes the integer values 2, 3, 4, 5, 6, 7, 8, and 9.
  • In the first formula, ΔW is the overlap width corresponding to the selected width value, W1 is the selected width value, W2 is the width of the first image block obtained after edge expansion of the distorted image block, W3 is the width of the distorted picture, and % is the remainder operation.
  • If the height of the distorted picture is equal to an integer multiple of the height value, the height value is determined as the target height and the process ends.
  • In the second formula, ΔH is the overlap height corresponding to the selected height value, H1 is the selected height value, H2 is the height of the first image block obtained after edge expansion of the distorted image block, and H3 is the height of the distorted picture.
  • When the divided distorted image blocks are not of equal size, this step (step 203) may be:
  • performing edge expansion processing on the target edges of each target distorted image block according to the first expanded size, the target distorted image block being a first distorted image block, a second distorted image block, or a third distorted image block, and a target edge being an edge of the target distorted image block that does not coincide with a boundary of the distorted picture;
  • performing edge expansion processing on the four edges of each fourth distorted image block to obtain the first image block corresponding to the fourth distorted image block. The width of each expanded edge is equal to the first expanded size.
  • For example, the target edges of the first distorted image block P1 are its right edge and lower edge; edge expansion processing is performed on the right edge and the lower edge according to the first expanded size lap to obtain the first image block corresponding to the first distorted image block P1 (the dashed-line box including P1).
  • The target edges of the second distorted image block P2 are its left edge, right edge, and lower edge; edge expansion processing is performed on these edges according to the first expanded size lap to obtain the first image block corresponding to the second distorted image block P2 (the dashed-line box including P2).
  • The target edges of the third distorted image block P6 are its upper edge, lower edge, and right edge; edge expansion processing is performed on these edges according to the first expanded size lap to obtain the first image block corresponding to the third distorted image block P6 (the dashed-line box including P6).
  • The four edges of the fourth distorted image block P8 are respectively subjected to edge expansion processing according to the first expanded size lap to obtain the first image block corresponding to the fourth distorted image block P8 (the dashed-line box including P8).
  • The width of each first image block obtained in the second case described above is equal to the target width, and the height of each first image block is equal to the target height.
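  • A minimal numpy sketch of this edge expansion rule: edges that coincide with the picture boundary are left unexpanded, and every other edge is expanded by lap pixels. Replication padding is used here as one of the expansion manners described below, and the boundary flags are assumed to be supplied by the caller.

```python
import numpy as np

def expand_block(block, lap, at_top, at_bottom, at_left, at_right):
    """Edge expansion (step 203) for one distorted image block.

    Edges on the distorted-picture boundary (at_* flags True) are left
    unexpanded; every other edge is expanded by `lap` pixels.
    """
    pads = ((0 if at_top else lap, 0 if at_bottom else lap),
            (0 if at_left else lap, 0 if at_right else lap))
    return np.pad(block, pads, mode="edge")

# A vertex block such as P1 (top-left corner): only its right and lower
# edges are expanded, turning a (H1 - lap) x (W1 - lap) block into an
# H1 x W1 first image block (here H1 = W1 = 16, lap = 2).
p1 = np.zeros((16 - 2, 16 - 2))
first_block = expand_block(p1, 2, at_top=True, at_bottom=False,
                           at_left=True, at_right=False)
assert first_block.shape == (16, 16)  # target height x target width
```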
  • Before performing step 202, the target width and the target height may be determined as follows:
  • In the third formula, S1 is the first parameter, W1 is a width value in the preset width range, and W3 is the width of the distorted picture.
  • In the fourth formula, S2 is the second parameter, H1 is a height value in the preset height range, and H3 is the height of the distorted picture.
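  • The third and fourth formulas themselves are not reproduced in this text, so the following sketch only illustrates the overall selection procedure: a candidate that divides the picture dimension exactly is taken immediately (the case above where the process ends), and otherwise the candidate with the smallest overlap is chosen. The fallback criterion is an assumption for illustration, not the patented formula.

```python
def choose_target_len(picture_len, candidates):
    """Pick a target width or height from the preset range (sketch)."""
    best, best_overlap = None, None
    for cand in candidates:
        rem = picture_len % cand
        if rem == 0:
            return cand               # exact fit: determined immediately
        overlap = cand - rem          # overlap of the shifted last block
        if best_overlap is None or overlap < best_overlap:
            best, best_overlap = cand, overlap
    return best

# Example: a 10-pixel-wide picture with the preset range 2..9.
print(choose_target_len(10, range(2, 10)))  # -> 2 (divides 10 exactly)
```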
  • the edge of the distorted image block is subjected to edge expansion processing using a preset pixel value.
  • The preset pixel value may be 0, 1, 2, or 3. As shown in FIG. 10, the four edges of the distorted image block P1 may be expanded with the preset pixel value; the width of the expansion of each edge is equal to the first expanded size, and the pixel value of each pixel in the region obtained by the edge expansion is the preset pixel value.
  • Alternatively, an edge may be subjected to edge expansion processing using the pixel values of the pixels included in that edge of the distorted image block. For example, the left edge may be expanded using the pixel values of the pixels included in the left edge, so that the pixel value of each pixel in the region obtained by expanding the left edge is the pixel value of a corresponding pixel included in the left edge.
  • the neighboring image block adjacent to the right edge of the distorted image block P1 is P4, and the right edge of the distorted image block P1 is subjected to edge expansion processing using the neighboring image block P4.
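  • The three expansion manners just described can be sketched with numpy as follows, expanding only the right edge of a block by lap pixels for brevity; the neighbor array is a hypothetical adjacent block:

```python
import numpy as np

lap = 2
block = np.arange(16, dtype=np.uint8).reshape(4, 4)
neighbor = np.full((4, 4), 9, dtype=np.uint8)   # hypothetical right neighbor

# Manner 1: expand with a preset pixel value (e.g., 0).
opt1 = np.pad(block, ((0, 0), (0, lap)), mode="constant", constant_values=0)

# Manner 2: expand by replicating the pixels of the edge itself.
opt2 = np.pad(block, ((0, 0), (0, lap)), mode="edge")

# Manner 3: expand using the adjacent columns of the neighboring block.
opt3 = np.concatenate([block, neighbor[:, :lap]], axis=1)

assert opt1.shape == opt2.shape == opt3.shape == (4, 4 + lap)
```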
  • The convolutional neural network includes a plurality of convolution layers, each convolution layer corresponding to one trimming size and one second expanded size, the trimming size being equal to the second expanded size.
  • Each convolution layer trims the input first image block according to the trimming size, and performs edge expansion processing on the first image block according to the second expanded size before outputting it, so that the size of the first image block input to the convolution layer is equal to the size of the first image block output from the convolution layer.
  • The expanded size corresponding to each convolution layer may be set before performing this step: for each convolution layer, the expanded size of the convolution layer is set to be not less than 0 and not greater than the second expanded size corresponding to the convolution layer when the convolutional neural network model is trained; that is, the expanded size corresponding to the convolution layer is greater than or equal to 0 and less than or equal to the second expanded size corresponding to the convolution layer.
  • the size of the second image block corresponding to the first image block output by the convolutional neural network model is greater than or equal to the size of the distorted image block corresponding to the first image block.
  • Alternatively, the expanded size corresponding to each convolution layer may be left unchanged before this step is performed, in which case the trimming size corresponding to each convolution layer is equal to the second expanded size corresponding to that convolution layer, so that after the first image block is input to the convolutional neural network model, the size of the second image block output by the model is equal to the size of the first image block.
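  • A PyTorch sketch of this size bookkeeping: a 3x3 convolution with no padding trims one pixel on every side (its trimming size); setting the layer's padding (the expanded size) anywhere between 0 and the trimming size trades output size against computation, and padding equal to the trimming size keeps the block size unchanged. The kernel size and channel count are illustrative.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 64, 64)        # one 64x64 first image block

trim = 1                             # trimming size of a 3x3 kernel
for pad in (0, trim):                # expanded size in [0, trimming size]
    conv = nn.Conv2d(1, 8, kernel_size=3, padding=pad)
    y = conv(x)
    print(pad, tuple(y.shape[-2:]))  # pad=0 -> (62, 62); pad=1 -> (64, 64)
```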
  • A side information component corresponding to the first image block may also be generated, where the side information component represents a distortion feature of the first image block relative to the original picture;
  • the distorted image color component of the image block and the side information component are input to a pre-established convolutional neural network model for convolution filtering processing to obtain a de-distorted second image block.
  • The system includes: a side information component generation module 11, a convolutional neural network 12, and a network training module 13;
  • the convolutional neural network 12 may include the following three-layer structure:
  • the input layer processing unit 121 is configured to receive input data input to the convolutional neural network model, where the input data includes the distorted image color component of the first image block and the side information component of the first image block, and to perform the first layer of convolution filtering processing on the input data;
  • the hidden layer processing unit 122 performs at least one layer of convolution filtering processing on the output data of the input layer processing unit 121;
  • the output layer processing unit 123 performs convolution filtering processing on the output data of the hidden layer processing unit 122, and outputs the result as a de-distorted image color component for generating a de-distorted second image block.
  • The convolutional neural network model can be represented by a convolutional neural network of a preset structure and a configured network parameter set. After the input data is subjected to the convolution filtering processing of the input layer, the hidden layer, and the output layer, the de-distorted second image block is obtained.
  • The input data of the convolutional neural network model may include one or more side information components according to actual needs, and may also include one or more distorted image color components, for example at least one of a Y color component, a U color component, and a V color component; correspondingly, the output includes one or more de-distorted image color components.
  • The stored data of each pixel of an image block includes the values of all the color components of the pixel; when obtaining the distorted image color component of a distorted image block, the values of one or more desired color components can be extracted from the stored data of each pixel.
  • As for the side information component, it represents the distortion feature of the first image block relative to the original image block in the original picture, and it is an expression of the distortion feature determined by the image processing process.
  • the side information component corresponding to the first image block is used as the input data to be input to the convolutional neural network model.
  • The side information component may also represent the distortion type of the distorted first image block relative to the corresponding original image block in the original picture; for example, the side information component may include the prediction mode of each coding unit in the first image block, and the prediction mode of a coding unit may be used as a side information component that characterizes the distortion type.
  • The side information component of the first image block may be a side information guide map, which is a matrix structure with the same width and height as the first image block.
  • The side information component includes a side information component of each pixel of the first image block, where the position of the side information component of a pixel is the same as the position of the pixel in the first image block.
  • For example, the matrix structure of the side information component is the same as the matrix structure of the color component of the distorted first image block, where the coordinates [0, 0] and [0, 1] represent distortion positions and the matrix element value 1 represents the degree of distortion; that is, the side information component can simultaneously indicate the degree of distortion and the position of the distortion.
  • In another example, the coordinates [0, 0], [0, 1], [2, 0], and [2, 4] represent distortion positions, and the matrix element values 1 and 2 represent the distortion type; that is, the side information component can simultaneously indicate the distortion type and the position of the distortion.
  • two side information components respectively illustrated in FIG. 14 and FIG. 15 may be included.
  • the first image block is also a matrix, with each element in the matrix being the distorted image color component of the pixel in the first image block.
  • The distorted image color component of a pixel may include the color component of any one or more of the three channels Y, U, and V.
  • the side information component may include side information components respectively corresponding to each of the distorted image color components.
  • the side information component of the pixel in the side information component of the first image block includes the side information component corresponding to each of the distortion image color components in the pixel.
  • The side information components of the first image block can be generated by the following steps 61 and 62.
  • Step 61 Determine, for each of the first image blocks to be processed, a distortion level value of each pixel in the first image block.
  • the quantization parameter of each coding unit in the first image block is included in the quantization unit in the video coding system, so the quantization parameter of each coding unit in the first image block can be acquired from the quantization unit.
  • The encoding information of each coding unit is included in the encoding information of the current original video picture, so the encoding information of each coding unit in the first image block may be acquired from the encoding information of the current original video picture.
  • Step 62 Generate, according to the position of each pixel in the first image block, the side information component corresponding to the first image block by using the obtained distortion degree value of each pixel, where each component value included in the side information component corresponds to the pixel at the same position in the first image block, that is, the position of each component value in the side information component is the same as the position of the corresponding pixel in the first image block.
  • Since each component value included in the side information component corresponds to the pixel at the same position in the first image block, the side information component has the same structure as the distorted image color component of the first image block, that is, the matrix representing the side information component has the same shape as the matrix representing the color component of the first image block.
  • Specifically, the obtained distortion degree value of each pixel may be normalized based on the pixel value range of the first image block to obtain a processed distortion degree value whose value range is the same as the pixel value range;
  • the processed distortion degree value of each pixel is then determined as the component value at the same position in the side information component corresponding to the first image block.
  • The normalization may be expressed as norm(x) = PIXEL_MIN + (x - QP_MIN) × (PIXEL_MAX - PIXEL_MIN) / (QP_MAX - QP_MIN), where norm(x) is the processed distortion degree value obtained after the normalization processing, x is the distortion degree value of the pixel, [PIXEL_MIN, PIXEL_MAX] is the pixel value range of the first image block, and [QP_MIN, QP_MAX] is the value range of the distortion degree value.
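  • A numpy sketch of steps 61-62 with the normalization above, assuming the quantization parameter of the coding unit covering each pixel is used as the distortion degree value and the pixel range is 8-bit; both ranges are illustrative assumptions:

```python
import numpy as np

PIXEL_MIN, PIXEL_MAX = 0, 255   # pixel value range (8-bit assumption)
QP_MIN, QP_MAX = 0, 51          # distortion degree value range (assumption)

def side_info_component(qp_map):
    """Per-pixel distortion degree values -> side information component.

    qp_map has the same shape as the first image block and holds the QP
    of the coding unit covering each pixel, so each component value is
    aligned with the pixel at the same position (step 62).
    """
    x = qp_map.astype(np.float32)
    return PIXEL_MIN + (x - QP_MIN) * (PIXEL_MAX - PIXEL_MIN) / (QP_MAX - QP_MIN)

# A 4x4 block covered by two coding units with QP 22 and QP 37.
m = side_info_component(np.array([[22, 22, 37, 37]] * 4))
```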
  • Taking a convolutional neural network model that includes an input layer, a hidden layer, and an output layer as an example, the scheme of filtering any first image block to be processed with the convolutional neural network to obtain a de-distorted second image block is described as follows.
  • Step 63 For any first image block to be processed, the distorted image color component of the first image block and the generated side information component are used as the input data of the pre-established convolutional neural network model; the input layer first performs the first layer of convolution filtering processing to obtain image blocks expressed in a sparse form and outputs them.
  • the input data may be input to the network through respective channels.
  • The distorted image color component Y of cv channels and the side information component M of cm channels may be combined in the channel dimension to form the input data I of cv + cm channels, and multidimensional convolution filtering and nonlinear mapping are performed on the input data I by the input layer to generate n1 image blocks expressed in a sparse form: F1(I) = g(W1 * I + B1);
  • W1 corresponds to n1 convolution filters, that is, n1 convolution filters are applied to the input of the convolution layer of the input layer, and n1 image blocks are output; the size of the convolution kernel of each convolution filter is c1 × f1 × f1, where c1 is the number of input channels and f1 is the spatial size of each convolution kernel; B1 is the offset coefficient of the input-layer filter bank, and for the nonlinear mapping g() a ReLU (Rectified Linear Unit) may be used.
  • Step 64 The hidden layer performs at least one further layer of convolution filtering processing on the output of the input layer: Fi(I) = g(Wi * Fi-1(I) + Bi), i ∈ {2, 3, ..., N};
  • Fi(I) represents the output of the i-th convolution layer of the convolutional neural network, * is the convolution operation, Wi is the weight coefficient of the i-th convolution layer filter bank, Bi is the offset coefficient of the i-th convolution layer filter bank, and g() is a nonlinear mapping function.
  • Wi corresponds to ni convolution filters, that is, ni convolution filters are applied to the input of the i-th convolution layer, and ni image blocks are output; the size of the convolution kernel of each convolution filter is ci × fi × fi, where ci is the number of input channels and fi is the spatial size of each convolution kernel.
  • Step 65 The output layer aggregates the high-dimensional image blocks FN(I) output by the hidden layer and outputs the de-distorted image color component of the first image block, which is used for generating the de-distorted second image block.
  • The structure of the output layer is not limited in the embodiments of the present application; the output layer may adopt a Residual Learning structure, a Direct Learning structure, or another structure.
  • The processing using the Residual Learning structure is as follows: F(I) = WN+1 * FN(I) + BN+1 + Y;
  • F(I) is the de-distorted image color component output by the output layer, FN(I) is the output of the hidden layer (a high-dimensional image block), * is the convolution operation, WN+1 is the weight coefficient of the convolution layer filter bank of the output layer, BN+1 is the offset coefficient of the convolution layer filter bank of the output layer, and Y is the distorted image color component that has not undergone convolution filtering processing and is to be de-distorted.
  • WN+1 corresponds to nN+1 convolution filters, that is, nN+1 convolution filters are applied to the input of the (N+1)-th convolution layer, and nN+1 image blocks are output; nN+1 is the number of output de-distorted image color components and is generally equal to the number of input distorted image color components (if only one de-distorted image color component is output, nN+1 is generally 1); the size of the convolution kernel of each convolution filter is cN+1 × fN+1 × fN+1, where cN+1 is the number of input channels and fN+1 is the spatial size of each convolution kernel.
  • When the Direct Learning structure is used, the de-distorted image color component is directly output, that is, the de-distorted second image block is obtained.
  • In this case the output layer processing can be expressed by the following formula: F(I) = WN+1 * FN(I) + BN+1;
  • F(I) is the output of the output layer, FN(I) is the output of the hidden layer, * is the convolution operation, WN+1 is the weight coefficient of the convolution layer filter bank of the output layer, and BN+1 is the offset coefficient of the convolution layer filter bank of the output layer.
  • As above, WN+1 corresponds to nN+1 convolution filters applied to the input of the (N+1)-th convolution layer, nN+1 is the number of output de-distorted image color components (generally 1 if only one component is output), and the size of the convolution kernel of each convolution filter is cN+1 × fN+1 × fN+1.
  • When the output layer adopts the Residual Learning structure, the output layer includes a convolution layer.
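  • Tying steps 63-65 together, the following PyTorch sketch shows a model of this shape. The layer count, channel widths, and kernel sizes are illustrative assumptions rather than the parameters of this application; the output layer uses the Residual Learning structure, adding the convolution output to the distorted color component Y.

```python
import torch
import torch.nn as nn

class DeDistortCNN(nn.Module):
    """Input layer + hidden layers + residual output layer (steps 63-65)."""

    def __init__(self, c_v=1, c_m=1, n1=64, f=3):
        super().__init__()
        pad = (f - 1) // 2  # per-layer expanded size == trimming size
        # Input layer: F1(I) = g(W1 * I + B1) on c_v + c_m channels.
        self.input_layer = nn.Sequential(
            nn.Conv2d(c_v + c_m, n1, f, padding=pad), nn.ReLU())
        # Hidden layers: Fi(I) = g(Wi * Fi-1(I) + Bi).
        self.hidden = nn.Sequential(
            nn.Conv2d(n1, n1, f, padding=pad), nn.ReLU(),
            nn.Conv2d(n1, n1, f, padding=pad), nn.ReLU())
        # Output layer: one convolution producing c_v color components.
        self.output_layer = nn.Conv2d(n1, c_v, f, padding=pad)

    def forward(self, y, m):
        # Combine color component Y and side information M along channels.
        i = torch.cat([y, m], dim=1)
        fn = self.hidden(self.input_layer(i))
        # Residual Learning: F(I) = WN+1 * FN(I) + BN+1 + Y.
        return self.output_layer(fn) + y

model = DeDistortCNN()
y = torch.randn(1, 1, 64, 64)   # distorted color component of a block
m = torch.randn(1, 1, 64, 64)   # side information component
out = model(y, m)               # de-distorted second image block
```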
  • Multiple distorted image blocks can be filtered at the same time, so that parallel filtering can be implemented and the efficiency of video coding is improved.
  • Referring to FIG. 17, a method for training a convolutional neural network model is also provided, which specifically includes the following processing steps:
  • Step 71 Acquire a preset training set, where the preset training set includes an original sample image, the distorted image color components of a plurality of distorted images corresponding to the original sample image, and a side information component corresponding to each of the distorted images, the side information component corresponding to a distorted image representing the distortion feature of that distorted image relative to the original sample image.
  • the distortion characteristics of the plurality of distorted images are different.
  • The training set may be obtained by performing image processing on an original sample image to obtain a plurality of distorted images having different distortion features, together with the side information component corresponding to each distorted image; that is, the training set includes the original sample image, the plurality of distorted images corresponding to the original sample image, and the side information components corresponding to the distorted images.
  • The training-related high-level parameters, such as the learning rate and the gradient descent algorithm, may be set appropriately; they may be set in the manner mentioned above or in other manners, which are not described in detail herein.
  • Step 73 Perform forward calculation.
  • The distorted image color component of each distorted image in the preset training set and the corresponding side information component are input to the convolutional neural network of the preset structure for convolution filtering processing, obtaining the de-distorted image color component corresponding to each distorted image.
  • That is, the forward calculation of the convolutional neural network CNN with the parameter set Θi is performed on the preset training set Ω, and the output F(Y) of the convolutional neural network is obtained, that is, the de-distorted image color component corresponding to each distorted image.
  • In the first forward calculation, the current parameter set is Θ1; in subsequent iterations, the current parameter set Θi is obtained by adjusting the previously used parameter set Θi-1.
  • Step 74 Determine a loss value of the plurality of original sample images based on the original image color component of the plurality of original sample images and the obtained de-distorted image color component.
  • The loss value may be calculated as the mean square error (MSE) between the original image color components and the obtained de-distorted image color components.
  • Step 75 Determine whether the convolutional neural network of the preset structure adopting the current parameter set is converged based on the loss value. If not, proceed to step 76. If it converges, proceed to step 77.
  • the convergence may be determined when the loss value is less than the preset loss value threshold.
  • Convergence may be determined when the loss value of each original sample image in the plurality of original sample images is less than a preset loss value threshold, or when the loss value of any original sample image in the plurality of original sample images is less than the preset loss value threshold; alternatively, the difference between the currently calculated loss value and the previously calculated loss value may be computed, and convergence is determined when the difference is less than a preset change threshold.
  • That is, the difference between the loss value of each original sample image obtained this time and the loss value of the same original sample image obtained last time is calculated; convergence may be determined when the difference of each original sample image is less than a preset change threshold, or when the difference of any original sample image is less than the preset change threshold. This is not limited in the present application.
  • Step 76 Adjust the current parameter set, and return to step 73 to continue the forward calculation.
  • Step 77 The current parameter set is taken as the final parameter set ⁇ final of the output, and the convolutional neural network of the preset structure adopting the final parameter set ⁇ final is used as the trained convolutional neural network model.
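  • A PyTorch sketch of the training loop of steps 73-77, reusing the DeDistortCNN sketch above; the optimizer, learning rate, and convergence threshold are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def train(model, loader, lr=1e-4, loss_threshold=1e-4, max_epochs=100):
    """loader yields (y, m, x): distorted color component, side
    information component, and original sample color component."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)  # assumed optimizer
    for _ in range(max_epochs):
        total = 0.0
        for y, m, x in loader:
            out = model(y, m)              # step 73: forward calculation
            loss = F.mse_loss(out, x)      # step 74: MSE loss value
            opt.zero_grad()
            loss.backward()                # step 76: adjust the parameters
            opt.step()
            total += loss.item()
        if total / len(loader) < loss_threshold:  # step 75: convergence
            break
    return model                           # step 77: trained model
```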
  • When the expanded size corresponding to each convolution layer in the convolutional neural network model is set to zero and the first expanded size is equal to the accumulated value of the second expanded sizes corresponding to the convolution layers, the obtained second image block corresponding to each first image block is equal in size to the distorted image block corresponding to that first image block; according to the position of the distorted image block corresponding to each first image block in the distorted picture, the second image blocks corresponding to the first image blocks may be composed into a frame of de-distorted picture, and the frame of de-distorted picture is buffered in the buffer as a frame of reference picture.
  • When the expanded size corresponding to each convolution layer is equal to the second expanded size corresponding to that convolution layer, the obtained second image block corresponding to each first image block is equal in size to the first image block; the second image block corresponding to each first image block may then be trimmed according to the first expanded size to obtain the de-distorted image block corresponding to each first image block. According to the position of the distorted image block corresponding to each first image block in the distorted picture, the de-distorted image blocks corresponding to the first image blocks are composed into a frame of de-distorted picture, and the frame of de-distorted picture is buffered in the buffer as a frame of reference picture.
  • When the trimming processing is performed, for the second image block corresponding to any first image block, the edges of the second image block that underwent edge expansion processing are determined, and the determined edges are trimmed according to the first expanded size to obtain the de-distorted image block corresponding to the first image block; the width of each trimmed edge is equal to the first expanded size.
  • When the expanded size corresponding to a convolution layer is set to be larger than 0 and smaller than the second expanded size corresponding to that convolution layer, the size of the second image block obtained by the filtering is smaller than the size of the first image block and larger than the size of the distorted image block corresponding to the first image block. The second image block corresponding to each first image block is trimmed to obtain the de-distorted image block corresponding to each first image block; according to the position of the distorted image block corresponding to each first image block in the distorted picture, the de-distorted image blocks corresponding to the first image blocks are composed into a frame of de-distorted picture, and the frame of de-distorted picture is buffered in the buffer as a frame of reference picture.
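  • A numpy sketch of the trimming and composition just described, assuming each second image block is equal in size to its first image block, so that exactly the lap-wide expanded regions have to be cut before each block is placed back at the position of its distorted image block:

```python
import numpy as np

def trim_block(block, lap, at_top, at_bottom, at_left, at_right):
    """Cut the lap-wide expanded region from every edge that was expanded
    (edges on the picture boundary, flagged True, were never expanded)."""
    t = 0 if at_top else lap
    b = block.shape[0] - (0 if at_bottom else lap)
    l = 0 if at_left else lap
    r = block.shape[1] - (0 if at_right else lap)
    return block[t:b, l:r]

def compose(frame_h, frame_w, placed_blocks):
    """placed_blocks: ((y, x), de_distorted_block) pairs, (y, x) being the
    position of the corresponding distorted image block in the picture."""
    frame = np.zeros((frame_h, frame_w), dtype=np.float32)
    for (y, x), blk in placed_blocks:
        frame[y:y + blk.shape[0], x:x + blk.shape[1]] = blk
    return frame
```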
  • In the embodiment of the present application, a plurality of distorted image blocks are obtained by dividing the distorted picture generated in the video encoding process, and the convolutional neural network model is then used to filter one or more distorted image blocks at the same time to obtain the de-distorted image block corresponding to each distorted image block; a frame of de-distorted picture is generated according to the de-distorted image blocks corresponding to the distorted image blocks.
  • The generated de-distorted picture is a filtered picture. Since the convolutional neural network filters distorted image blocks instead of the entire frame of the distorted picture, the resources required for filtering are reduced, so that the device can meet the resource requirements of the filtering.
  • In addition, multiple distorted image blocks can be filtered at the same time, which can improve the filtering efficiency and thus the video coding efficiency.
  • an embodiment of the present application provides a method for filtering a picture, which may filter a distortion picture generated during a decoding process, including:
  • Step 301 Acquire a distorted picture generated during video decoding.
  • a reconstructed picture may be generated during the video decoding process, and the distorted picture may be the reconstructed picture, or may be a picture obtained by filtering the reconstructed picture.
  • the video decoding system includes a prediction module, an entropy decoder, an inverse quantization unit, an inverse transform unit, a reconstruction unit, a convolutional neural network model CNN, and a buffer.
  • The decoding process of the video decoding system may be: the bit stream is input into the entropy decoder, and the entropy decoder decodes the bit stream to obtain mode information, quantization parameters, and residual information; the mode information is input into the prediction module, the quantization parameters are input into the convolutional neural network model, and the residual information is input into the inverse quantization unit.
  • the prediction module performs prediction according to the input mode information and the reference picture in the buffer to obtain prediction data, and inputs the prediction data into the reconstruction unit.
  • the prediction module includes an intra prediction unit, a motion estimation and motion compensation unit, and a switch, and the mode information may include intra mode information and inter mode information.
  • the intra prediction unit may perform intra prediction according to the intra mode information to obtain intra prediction data, the motion estimation and motion compensation unit may perform inter prediction according to the inter mode information and the reference picture buffered in the buffer to obtain inter prediction data, and the switch selects either the intra prediction data or the inter prediction data to output to the reconstruction unit.
  • the inverse quantization unit and the inverse transform unit respectively perform inverse quantization and inverse transform processing on the residual information to obtain prediction error information, and input the prediction error information into the reconstruction unit; the reconstruction unit generates a reconstructed picture according to the prediction error information and the prediction data.
  • the reconstructed picture generated by the reconstructing unit may be acquired, and the reconstructed picture is taken as a distorted picture.
  • a filter may be connected between the convolutional neural network model and the reconstruction unit, and the filter may also filter the reconstructed picture generated by the reconstruction unit, and output the filtered reconstructed picture.
  • the filtered reconstructed picture may be obtained, and the filtered reconstructed picture is taken as a distorted picture.
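The decoder-side data flow described above can be summarized in code. The following sketch passes the pipeline stages in as functions, since the patent does not prescribe their implementations; every stage name here is a hypothetical stand-in:

```python
def decode_and_filter(bitstream, reference_buffer, entropy_decode,
                      inverse_quantize, inverse_transform, predict,
                      cnn_block_filter):
    # Entropy decoding yields mode information, quantization
    # parameters, and residual information.
    mode_info, qp, residual = entropy_decode(bitstream)

    # Inverse quantization followed by inverse transform recovers the
    # prediction error information.
    pred_error = inverse_transform(inverse_quantize(residual, qp))

    # Intra or inter prediction, selected by the mode information,
    # using reference pictures from the buffer.
    prediction = predict(mode_info, reference_buffer)

    # The reconstruction unit combines prediction data and prediction
    # error into the reconstructed (distorted) picture.
    distorted = prediction + pred_error

    # Block-wise CNN filtering (steps 302-305) yields the de-distorted
    # picture, which is buffered as a reference picture.
    de_distorted = cnn_block_filter(distorted, qp)
    reference_buffer.append(de_distorted)
    return de_distorted
```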
  • Steps 302-305 are the same as steps 202-205 above, and will not be described in detail herein.
  • In this embodiment, a plurality of distorted image blocks are obtained by dividing a distorted picture generated in the video decoding process; one or more of these blocks are then filtered simultaneously using a convolutional neural network model to obtain the de-distorted image block corresponding to each distorted image block, and a frame of de-distorted picture is generated according to those de-distorted image blocks.
  • The generated de-distorted picture is the filtered picture. Because the convolutional neural network filters individual image blocks rather than the entire frame of the distorted picture, the resources required for filtering are reduced, so the device can meet the resource requirements of filtering.
  • Moreover, multiple distorted image blocks can be filtered at the same time, which improves both the filtering efficiency and the video decoding efficiency.
  • an embodiment of the present application provides a device 400 for image filtering, where the device 400 includes:
  • a first acquiring module 401 configured to acquire a distorted picture, where the distorted picture is distorted with respect to an original video picture input to the video encoding system;
  • the second obtaining module 402 is configured to obtain a plurality of first image blocks by dividing the distorted picture;
  • a filtering module 403 configured to filter each first image block by using a convolutional neural network model to obtain a second image block corresponding to each of the first image blocks;
  • the generating module 404 is configured to generate a frame de-distorted picture according to the second image block corresponding to each of the first image blocks.
  • the second obtaining module 402 includes:
  • a dividing unit configured to divide the distorted picture according to a target width and a target height, to obtain a plurality of distorted image blocks included in the distorted picture;
  • an edge expansion unit configured to perform edge expansion processing on each of the plurality of distortion image blocks according to the first expansion size to obtain a first image block corresponding to each of the distortion image blocks.
  • the plurality of distorted image blocks include a first distorted image block located at a vertex position of the distorted picture, a second distorted image block located on the upper and lower boundaries of the distorted picture, a third distorted image block located on the left and right boundaries of the distorted picture, and a fourth distorted image block other than the first distorted image block, the second distorted image block, and the third distorted image block;
  • the width and height of the first distorted image block are equal to W1-lap and H1-lap respectively, where W1 is the target width, H1 is the target height, and lap is the first expanded size;
  • the width and height of the second distorted image block are equal to W1-2lap and H1-lap respectively, the width and height of the third distorted image block are W1-lap and H1-2lap respectively, and the width and height of the fourth distorted image block are W1-2lap and H1-2lap respectively, as tabulated in the sketch below.
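These four block sizes can be written out directly. A small sketch, assuming W1, H1, and lap as defined in the preceding items (the category names are illustrative only):

```python
def distorted_block_size(category, W1, H1, lap):
    """(width, height) of a distorted image block by its position.

    After edge expansion by `lap` on each edge that does not lie on
    the picture boundary, every block becomes exactly W1 x H1.
    """
    sizes = {
        'corner':     (W1 - lap,     H1 - lap),      # first distorted image block
        'top_bottom': (W1 - 2 * lap, H1 - lap),      # second distorted image block
        'left_right': (W1 - lap,     H1 - 2 * lap),  # third distorted image block
        'interior':   (W1 - 2 * lap, H1 - 2 * lap),  # fourth distorted image block
    }
    return sizes[category]
```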
  • the edge expansion unit is configured to: perform edge expansion processing on the target edge of a target distorted image block according to the first expanded size to obtain the first image block corresponding to the target distorted image block, where the target distorted image block is the first, second, or third distorted image block and the target edge is an edge of the target distorted image block that does not coincide with a boundary of the distorted picture; and perform edge expansion processing on the four edges of the fourth distorted image block according to the first expanded size to obtain the first image block corresponding to the fourth distorted image block.
  • the device 400 further includes:
  • a first setting module configured to set an edge expansion size corresponding to each convolution layer included in the convolutional neural network model, where the set expansion size is not less than zero and not greater than the second expanded size corresponding to the convolution layer, the second expanded size being the expansion size of the convolution layer used when training the convolutional neural network model.
  • the device 400 further includes:
  • a second setting module configured to set the first expanded size according to a second expanded size corresponding to each convolution layer included in the convolutional neural network model.
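One way to realize the second setting module follows the accumulation rule given later in the description: the second expanded sizes of the convolution layers are summed, and the first expanded size is set to be greater than or equal to that sum. A sketch; the example values assume, for illustration only, three 3x3 convolution layers trained with one pixel of padding each:

```python
def set_first_expansion_size(per_layer_second_sizes):
    """First expanded size `lap`: any value not less than the sum of
    the per-layer second expanded sizes is valid; the sum is the
    minimal choice."""
    return sum(per_layer_second_sizes)

# e.g. three 3x3 convolution layers trained with padding 1 each:
# set_first_expansion_size([1, 1, 1]) == 3
```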
  • the generating module 404 includes:
  • a trimming unit configured to perform trimming processing on the de-distorted image block corresponding to each of the distorted image blocks to obtain a third image block corresponding to each of the distorted image blocks;
  • a combining unit configured to combine the third image blocks corresponding to the distorted image blocks into a frame of de-distorted picture.
  • a determining module configured to determine the target width and the target height according to the first expanded size and the width and height of the distorted picture.
  • In the embodiments of the present application, a plurality of distorted image blocks are obtained by dividing a distorted picture generated in the video encoding and decoding process; one or more of these blocks are then filtered simultaneously using a convolutional neural network model to obtain the de-distorted image block corresponding to each distorted image block, and a frame of de-distorted picture is generated according to those de-distorted image blocks.
  • The generated de-distorted picture is the filtered picture. Because the convolutional neural network filters individual image blocks rather than the entire frame of the distorted picture, the resources required for filtering are reduced, so the device can meet the resource requirements of filtering.
  • FIG. 23 is a block diagram showing the structure of a terminal 500 according to an exemplary embodiment of the present invention.
  • the terminal 500 can be a portable mobile terminal, such as a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, or a desktop computer.
  • Terminal 500 may also be referred to as a user device, a portable terminal, a laptop terminal, a desktop terminal, and the like.
  • the terminal 500 includes a processor 501 and a memory 502.
  • Processor 501 can include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like.
  • the processor 501 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array).
  • the processor 501 may also include a main processor and a coprocessor.
  • the main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in the standby state.
  • the processor 501 can be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display.
  • the processor 501 may also include an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.
  • Memory 502 can include one or more computer-readable storage media, which can be non-transitory. Memory 502 can also include high-speed random access memory, as well as non-volatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 502 is configured to store at least one instruction for execution by the processor 501 to implement the image filtering method provided by the method embodiments of the present application.
  • the terminal 500 optionally further includes: a peripheral device interface 503 and at least one peripheral device.
  • the processor 501, the memory 502, and the peripheral device interface 503 can be connected by a bus or a signal line.
  • Each peripheral device can be connected to the peripheral device interface 503 via a bus, signal line or circuit board.
  • the peripheral device includes at least one of a radio frequency circuit 504, a touch display screen 505, a camera 506, an audio circuit 507, a positioning component 508, and a power source 509.
  • Peripheral device interface 503 can be used to connect at least one peripheral device associated with an I/O (Input/Output) to processor 501 and memory 502.
  • In some embodiments, the processor 501, the memory 502, and the peripheral device interface 503 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 501, the memory 502, and the peripheral device interface 503 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
  • the RF circuit 504 is configured to receive and transmit an RF (Radio Frequency) signal, also referred to as an electromagnetic signal.
  • Radio frequency circuit 504 communicates with the communication network and other communication devices via electromagnetic signals.
  • the RF circuit 504 converts the electrical signal into an electromagnetic signal for transmission, or converts the received electromagnetic signal into an electrical signal.
  • the radio frequency circuit 504 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like.
  • Radio frequency circuit 504 can communicate with other terminals via at least one wireless communication protocol.
  • the wireless communication protocols include, but are not limited to, the World Wide Web, a metropolitan area network, an intranet, generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks.
  • the RF circuit 504 may also include NFC (Near Field Communication) related circuitry, which is not limited in this application.
  • the display screen 505 is used to display a UI (User Interface).
  • the UI can include graphics, text, icons, video, and any combination thereof.
  • when the display 505 is a touch display, the display 505 also has the ability to collect touch signals on or above the surface of the display 505.
  • the touch signal can be input to the processor 501 as a control signal for processing.
  • display 505 can also be used to provide virtual buttons and/or virtual keyboards, also referred to as soft buttons and/or soft keyboards.
  • In some embodiments, there may be one display screen 505, disposed on the front panel of the terminal 500; in other embodiments, there may be at least two display screens 505, respectively disposed on different surfaces of the terminal 500 or adopting a folded design; in still other embodiments, the display screen 505 may be a flexible display screen disposed on a curved surface or a folded surface of the terminal 500. The display screen 505 may even be set to a non-rectangular irregular pattern, that is, an irregularly-shaped screen.
  • the display screen 505 can be prepared by using an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
  • Camera component 506 is used to capture images or video.
  • camera assembly 506 includes a front camera and a rear camera.
  • the front camera is placed on the front panel of the terminal, and the rear camera is placed on the back of the terminal.
  • In some embodiments, there are at least two rear cameras, each being one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so as to realize a background blur function by fusing the main camera and the depth-of-field camera, panoramic shooting and VR (Virtual Reality) shooting by fusing the main camera and the wide-angle camera, or other fused shooting functions.
  • camera assembly 506 can also include a flash.
  • the flash can be a monochrome temperature flash or a two-color temperature flash.
  • the two-color temperature flash is a combination of a warm flash and a cool flash that can be used for light compensation at different color temperatures.
  • the audio circuit 507 can include a microphone and a speaker.
  • the microphone is used to collect sound waves of the user and the environment, and convert the sound waves into electrical signals that are input to the processor 501 for processing or input to the radio frequency circuit 504 for voice communication.
  • the microphone can also be an array microphone or an omnidirectional acquisition microphone.
  • the speaker is then used to convert electrical signals from processor 501 or radio frequency circuit 504 into sound waves.
  • the speaker can be a conventional film speaker or a piezoelectric ceramic speaker.
  • the audio circuit 507 can also include a headphone jack.
  • the location component 508 is used to locate the current geographic location of the terminal 500 to implement navigation or LBS (Location Based Service).
  • the positioning component 508 can be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the Galileo system of the European Union.
  • Power source 509 is used to power various components in terminal 500.
  • the power source 509 can be an alternating current, a direct current, a disposable battery, or a rechargeable battery.
  • the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery.
  • a wired rechargeable battery is a battery that is charged by a wired line
  • a wireless rechargeable battery is a battery that is charged by a wireless coil.
  • the rechargeable battery can also be used to support fast charging technology.
  • terminal 500 also includes one or more sensors 510.
  • the one or more sensors 510 include, but are not limited to, an acceleration sensor 511, a gyro sensor 512, a pressure sensor 513, a fingerprint sensor 514, an optical sensor 515, and a proximity sensor 516.
  • the acceleration sensor 511 can detect the magnitude of the acceleration on the three coordinate axes of the coordinate system established by the terminal 500.
  • the acceleration sensor 511 can be used to detect components of gravity acceleration on three coordinate axes.
  • the processor 501 can control the touch display 505 to display the user interface in a landscape view or a portrait view based on the gravitational acceleration signal acquired by the acceleration sensor 511.
  • the acceleration sensor 511 can also be used for the acquisition of game or user motion data.
  • the gyro sensor 512 can detect the body direction and the rotation angle of the terminal 500, and the gyro sensor 512 can cooperate with the acceleration sensor 511 to collect the 3D motion of the user to the terminal 500. Based on the data collected by the gyro sensor 512, the processor 501 can implement functions such as motion sensing (such as changing the UI according to the user's tilting operation), image stabilization at the time of shooting, game control, and inertial navigation.
  • the pressure sensor 513 may be disposed at a side border of the terminal 500 and/or a lower layer of the touch display screen 505.
  • When the pressure sensor 513 is disposed on the side frame of the terminal 500, the user's holding signal to the terminal 500 can be detected, and the processor 501 performs left/right hand recognition or shortcut operations according to the holding signal collected by the pressure sensor 513.
  • When the pressure sensor 513 is disposed at the lower layer of the touch display screen 505, the processor 501 controls the operability controls on the UI interface according to the user's pressure operation on the touch display screen 505.
  • the operability control includes at least one of a button control, a scroll bar control, an icon control, and a menu control.
  • the fingerprint sensor 514 is used to collect the fingerprint of the user.
  • the processor 501 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 514, or the fingerprint sensor 514 identifies the identity of the user according to the collected fingerprint. Upon identifying that the identity of the user is a trusted identity, the processor 501 authorizes the user to perform related sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying and changing settings, and the like.
  • the fingerprint sensor 514 can be disposed on the front, back, or side of the terminal 500. When the physical button or vendor logo is provided on the terminal 500, the fingerprint sensor 514 can be integrated with the physical button or the manufacturer logo.
  • Optical sensor 515 is used to collect ambient light intensity.
  • the processor 501 can control the display brightness of the touch display 505 based on the ambient light intensity acquired by the optical sensor 515. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 505 is raised; when the ambient light intensity is low, the display brightness of the touch display screen 505 is lowered.
  • the processor 501 can also dynamically adjust the shooting parameters of the camera assembly 506 based on the ambient light intensity acquired by the optical sensor 515.
  • Proximity sensor 516, also referred to as a distance sensor, is typically disposed on the front panel of terminal 500. Proximity sensor 516 is used to collect the distance between the user and the front of terminal 500. In one embodiment, when the proximity sensor 516 detects that the distance between the user and the front side of the terminal 500 gradually decreases, the processor 501 controls the touch display screen 505 to switch from the bright-screen state to the off-screen state; when the proximity sensor 516 detects that the distance between the user and the front side of the terminal 500 gradually increases, the processor 501 controls the touch display screen 505 to switch from the off-screen state to the bright-screen state.
  • Those skilled in the art will understand that the structure shown in FIG. 23 does not constitute a limitation to the terminal 500, which may include more or fewer components than shown, combine some components, or adopt a different component arrangement.

Abstract

The present application relates to an image filtering method and a device, pertaining to the field of video imaging. The method comprises: acquiring a distorted image, the distorted image being distorted with respect to an original video image input into a video encoding system; dividing the distorted image, and acquiring multiple distorted image blocks comprised in the distorted image; filtering each of the distorted image blocks of the distorted image by means of a convolutional neural network model, and obtaining undistorted image blocks respectively corresponding to the distorted image blocks; and generating an image frame according to the undistorted image blocks respectively corresponding to the distorted image blocks. The device comprises: a first acquisition module, a second acquisition module, a filtering module, and a generation module. The present application reduces the amount of resources required for filtering, such that an apparatus meets a resource requirement for filtering.

Description

Method and device for image filtering
The present application claims priority to Chinese Patent Application No. 201810050422.8, filed on January 18, 2018 and entitled "Method and device for image filtering", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of video, and in particular, to a method and an apparatus for filtering pictures.
Background
In a video coding system, when an original video picture is encoded, it is processed multiple times and a reconstructed picture is obtained. The reconstructed picture may be pixel-shifted relative to the original video picture, that is, the reconstructed picture is distorted, resulting in visual impairment or artifacts.
These distortions not only affect the subjective and objective quality of the reconstructed picture; if the reconstructed picture is used as a reference for subsequently encoded pixels, they also affect the prediction accuracy of the subsequent pixels and the size of the final bitstream. Therefore, an in-loop filtering module is added to the video codec system, and the reconstructed picture is filtered by the in-loop filtering module to eliminate the distortion in the reconstructed picture.
In the process of implementing the present application, the inventors found that the above approach has at least the following defect:
At present, the in-loop filtering module filters the entire frame of the reconstructed picture. When the reconstructed picture has a high resolution, the resources required to filter it are often high and may exceed what the device can provide. For example, filtering a reconstructed picture of 4K resolution may cause insufficient video memory.
Summary
In order to enable a device to meet the resources required for filtering, the embodiments of the present application provide a method and an apparatus for filtering a picture. The technical solution is as follows:
In a first aspect, an embodiment of the present application provides a method for filtering a picture, the method including:
acquiring a distorted picture, the distorted picture being distorted relative to an original video picture input into a video encoding system;
obtaining a plurality of first image blocks by dividing the distorted picture;
filtering each first image block by using a convolutional neural network model to obtain a second image block corresponding to each first image block;
generating a frame of de-distorted picture according to the second image block corresponding to each first image block.
Optionally, the obtaining a plurality of first image blocks by dividing the distorted picture includes:
dividing the distorted picture according to a target width and a target height to obtain a plurality of distorted image blocks included in the distorted picture;
performing edge expansion processing on each of the plurality of distorted image blocks according to a first expanded size to obtain a first image block corresponding to each distorted image block.
Optionally, the plurality of distorted image blocks include a first distorted image block located at a vertex position of the distorted picture, a second distorted image block located on the upper and lower boundaries of the distorted picture, a third distorted image block located on the left and right boundaries of the distorted picture, and a fourth distorted image block other than the first, second, and third distorted image blocks;
the width and height of the first distorted image block are equal to W1-lap and H1-lap respectively, where W1 is the target width, H1 is the target height, and lap is the first expanded size; the width and height of the second distorted image block are equal to W1-2lap and H1-lap respectively; the width and height of the third distorted image block are W1-lap and H1-2lap respectively; and the width and height of the fourth distorted image block are W1-2lap and H1-2lap respectively.
Optionally, the performing edge expansion processing on each of the plurality of distorted image blocks according to the first expanded size to obtain the first image block corresponding to each distorted image block includes:
performing edge expansion processing on a target edge of a target distorted image block according to the first expanded size to obtain the first image block corresponding to the target distorted image block, where the target distorted image block is the first distorted image block, the second distorted image block, or the third distorted image block, and the target edge is an edge of the target distorted image block that does not coincide with a boundary of the distorted picture;
performing edge expansion processing on the four edges of the fourth distorted image block according to the first expanded size to obtain the first image block corresponding to the fourth distorted image block.
Optionally, before the filtering each distorted image block of the distorted picture by using the convolutional neural network model, the method further includes:
setting an edge expansion size corresponding to each convolution layer included in the convolutional neural network model, the set expansion size being not less than zero and not greater than a second expanded size corresponding to the convolution layer, where the second expanded size is the expansion size of the convolution layer used when training the convolutional neural network model.
Optionally, the method further includes:
setting the first expanded size according to the second expanded size corresponding to each convolution layer included in the convolutional neural network model.
Optionally, the generating a frame of de-distorted picture according to the de-distorted image block corresponding to each distorted image block includes:
performing trimming processing on the de-distorted image block corresponding to each distorted image block to obtain a third image block corresponding to each distorted image block;
combining the third image blocks corresponding to the distorted image blocks into a frame of de-distorted picture.
Optionally, the method further includes:
determining the target width and the target height according to the first expanded size and the width and height of the distorted picture.
In a second aspect, an embodiment of the present application provides an apparatus for filtering a picture, the apparatus including:
a first acquiring module configured to acquire a distorted picture, the distorted picture being distorted relative to an original video picture input into a video encoding system;
a second acquiring module configured to obtain a plurality of first image blocks by dividing the distorted picture;
a filtering module configured to filter each first image block by using a convolutional neural network model to obtain a second image block corresponding to each first image block;
a generating module configured to generate a frame of de-distorted picture according to the second image block corresponding to each first image block.
Optionally, the second acquiring module includes:
a dividing unit configured to divide the distorted picture according to a target width and a target height to obtain a plurality of distorted image blocks included in the distorted picture;
an edge expansion unit configured to perform edge expansion processing on each of the plurality of distorted image blocks according to a first expanded size to obtain a first image block corresponding to each distorted image block.
Optionally, the plurality of distorted image blocks include a first distorted image block located at a vertex position of the distorted picture, a second distorted image block located on the upper and lower boundaries of the distorted picture, a third distorted image block located on the left and right boundaries of the distorted picture, and a fourth distorted image block other than the first, second, and third distorted image blocks;
the width and height of the first distorted image block are equal to W1-lap and H1-lap respectively, where W1 is the target width, H1 is the target height, and lap is the first expanded size; the width and height of the second distorted image block are equal to W1-2lap and H1-lap respectively; the width and height of the third distorted image block are W1-lap and H1-2lap respectively; and the width and height of the fourth distorted image block are W1-2lap and H1-2lap respectively.
Optionally, the edge expansion unit is configured to:
perform edge expansion processing on a target edge of a target distorted image block according to the first expanded size to obtain the first image block corresponding to the target distorted image block, where the target distorted image block is the first distorted image block, the second distorted image block, or the third distorted image block, and the target edge is an edge of the target distorted image block that does not coincide with a boundary of the distorted picture; and
perform edge expansion processing on the four edges of the fourth distorted image block according to the first expanded size to obtain the first image block corresponding to the fourth distorted image block.
Optionally, the apparatus further includes:
a first setting module configured to set an edge expansion size corresponding to each convolution layer included in the convolutional neural network model, the set expansion size being not less than zero and not greater than the second expanded size corresponding to the convolution layer, where the second expanded size is the expansion size of the convolution layer used when training the convolutional neural network model.
Optionally, the apparatus further includes:
a second setting module configured to set the first expanded size according to the second expanded size corresponding to each convolution layer included in the convolutional neural network model.
Optionally, the generating module includes:
a trimming unit configured to perform trimming processing on the de-distorted image block corresponding to each distorted image block to obtain a third image block corresponding to each distorted image block;
a combining unit configured to combine the third image blocks corresponding to the distorted image blocks into a frame of de-distorted picture.
Optionally, the apparatus further includes:
a determining module configured to determine the target width and the target height according to the first expanded size and the width and height of the distorted picture.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method steps provided by the first aspect or any optional manner of the first aspect.
The technical solutions provided by the embodiments of the present application may include the following beneficial effects:
By dividing a distorted picture generated in the video encoding and decoding process, a plurality of distorted image blocks included in the distorted picture are obtained; each distorted image block of the distorted picture is then filtered by using a convolutional neural network model to obtain the de-distorted image block corresponding to each distorted image block, and a frame of picture is generated according to those de-distorted image blocks. The generated frame is the filtered picture. Because the convolutional neural network filters individual distorted image blocks rather than the entire frame of the distorted picture, the resources required for filtering are reduced, enabling the device to meet the resource requirements of filtering.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present application.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the present application.
FIG. 1 is a flowchart of a method for filtering a picture according to an embodiment of the present application;
FIG. 2 is a flowchart of another method for filtering a picture according to an embodiment of the present application;
FIG. 3 is a structural block diagram of a video encoding system according to an embodiment of the present application;
FIG. 4 is a structural block diagram of another video encoding system according to an embodiment of the present application;
FIG. 5 is a schematic diagram of dividing image blocks according to an embodiment of the present application;
FIG. 6 is another schematic diagram of dividing image blocks according to an embodiment of the present application;
FIG. 7 is another schematic diagram of dividing image blocks according to an embodiment of the present application;
FIG. 8 is another schematic diagram of dividing image blocks according to an embodiment of the present application;
FIG. 9 is another schematic diagram of dividing image blocks according to an embodiment of the present application;
FIG. 10 is another schematic diagram of dividing image blocks according to an embodiment of the present application;
FIG. 11 is a system architecture diagram of the technical solution provided by an embodiment of the present application;
FIG. 12 is a schematic diagram of the data flow of the technical solution provided by an embodiment of the present application;
FIG. 13 is a schematic diagram of obtaining the distorted-image color components of a distorted image according to an embodiment of the present application;
FIG. 14 is a first schematic diagram of a side information component according to an embodiment of the present application;
FIG. 15 is a second schematic diagram of a side information component according to an embodiment of the present application;
FIG. 16 is a flowchart of a de-distortion method for a distorted image according to an embodiment of the present application;
FIG. 17 is a flowchart of a convolutional neural network model training method according to an embodiment of the present application;
FIG. 18 is a flowchart of another method for filtering a picture according to an embodiment of the present application;
FIG. 19 is a structural block diagram of a video encoding system according to an embodiment of the present application;
FIG. 20 is a structural block diagram of another video encoding system according to an embodiment of the present application;
FIG. 21 is a structural block diagram of another video encoding system according to an embodiment of the present application;
FIG. 22 is a schematic diagram of an apparatus for filtering a picture according to an embodiment of the present application;
FIG. 23 is a schematic structural diagram of a device according to an embodiment of the present application.
The above drawings show explicit embodiments of the present application, which are described in more detail below. These drawings and the written description are not intended to limit the scope of the inventive concept in any way, but to explain the concept of the present application to those skilled in the art with reference to specific embodiments.
Detailed description
Exemplary embodiments are described in detail here, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present application as detailed in the appended claims.
Referring to FIG. 1, an embodiment of the present application provides a method for filtering a picture, including:
Step 101: Acquire a distorted picture generated by a video encoding or decoding process.
Step 102: Obtain a plurality of distorted image blocks by dividing the distorted picture.
Optionally, in the process of video encoding or decoding, an entire frame of video picture may be acquired and then divided to obtain a plurality of distorted image blocks. Alternatively, part of the image data of the frame may be acquired at a time; whenever the acquired image data amounts to one distorted image block, the following operations are performed on that block. This likewise divides the distorted picture into a plurality of distorted image blocks, and can improve the efficiency of video encoding or decoding.
Step 103: Filter each distorted image block by using a convolutional neural network model to obtain a de-distorted image block corresponding to each distorted image block.
Optionally, one or more distorted image blocks may be filtered at the same time; that is, the filtering can be parallelized, improving filtering efficiency.
Step 104: Generate a frame of de-distorted picture according to the de-distorted image block corresponding to each distorted image block.
The method provided by this embodiment may take place in the video encoding process or in the video decoding process, so the distorted picture may be a video picture generated during video encoding or a video picture generated during video decoding.
In the embodiments of the present application, a plurality of distorted image blocks are obtained by dividing a distorted picture generated in the video encoding and decoding process; each distorted image block is then filtered by using a convolutional neural network model to obtain the corresponding de-distorted image block, and a frame of de-distorted picture is generated from those blocks. The generated frame is the filtered picture. Because the convolutional neural network filters individual image blocks rather than the entire frame, the resources required for filtering, such as video memory and/or memory, are reduced, so the device can meet the resource requirements of filtering.
Referring to FIG. 2, an embodiment of the present application provides a method for filtering a picture, which may filter a distorted picture generated during the encoding process, including:
Step 201: Acquire a distorted picture generated during video encoding.
A reconstructed picture is generated during the video encoding process; the distorted picture may be the reconstructed picture, or may be a picture obtained by filtering the reconstructed picture.
Referring to the structural diagram of the video encoding system shown in FIG. 3, the video encoding system consists of a prediction module, an adder, a transform unit, a quantization unit, an entropy encoder, an inverse quantization unit, an inverse transform unit, a reconstruction unit, a CNN (convolutional neural network model), a buffer, and other parts.
The encoding process of the video encoding system may be as follows: the original picture is input into the prediction module and the adder; the prediction module predicts the input original picture according to the reference picture in the buffer to obtain prediction data, and inputs the prediction data into the adder, the entropy encoder, and the reconstruction unit. The prediction module includes an intra prediction unit, a motion estimation and motion compensation unit, and a switch. The intra prediction unit may perform intra prediction on the original picture to obtain intra prediction data; the motion estimation and motion compensation unit performs inter prediction on the original picture according to the reference picture buffered in the buffer to obtain inter prediction data; and the switch selects either the intra prediction data or the inter prediction data to output to the adder and the reconstruction unit. Optionally, the intra prediction data may include intra mode information, and the inter prediction data may include inter mode information.
The adder generates prediction error information according to the prediction data and the original picture; the transform unit transforms the prediction error information and outputs the transformed prediction error information to the quantization unit; the quantization unit quantizes the transformed prediction error information according to a quantization parameter to obtain residual information, and outputs the residual information to the entropy encoder and the inverse quantization unit; the entropy encoder encodes the residual information, the prediction data, and other information to form a bitstream. Meanwhile, the inverse quantization unit and the inverse transform unit respectively perform inverse quantization and inverse transform processing on the residual information to obtain prediction error information, which is input into the reconstruction unit; the reconstruction unit generates a reconstructed picture according to the prediction error information and the prediction data. Correspondingly, in this step, the reconstructed picture generated by the reconstruction unit may be acquired and taken as the distorted picture.
Optionally, referring to FIG. 4, a filter may also be connected in series between the convolutional neural network model and the reconstruction unit; the filter may filter the reconstructed picture generated by the reconstruction unit and output the filtered reconstructed picture. Correspondingly, in this step, the filtered reconstructed picture may be acquired and taken as the distorted picture.
The distorted picture is distorted relative to the original video picture.
Step 202: Divide the distorted picture according to the target width and the target height to obtain the plurality of distorted image blocks included in the distorted picture.
The distorted image blocks obtained by the division in this step may or may not be image blocks of equal size.
In the first case, when the distorted image blocks are of equal size, the width of each distorted image block in the distorted picture may be equal to the target width, and the height of each distorted image block may be equal to the target height.
When the width of the distorted picture is an integer multiple of the target width, there is no overlap between the distorted image blocks in each row obtained by dividing according to the target width. For example, referring to FIG. 5, the width of the distorted picture is an integer multiple of the target width; each row obtained by the division includes three distorted image blocks, and there is no overlap among the three distorted image blocks in each row.
When the width of the distorted picture is not an integer multiple of the target width, two distorted image blocks in each row overlap. For example, referring to FIG. 6, the width of the distorted picture is not an integer multiple of the target width; each row obtained by the division includes four distorted image blocks, and in each row the third and fourth distorted image blocks overlap, where ΔW in FIG. 6 is the overlapping width of the third and fourth distorted image blocks.
When the height of the distorted picture is an integer multiple of the target height, there is no overlap between the distorted image blocks in each column obtained by dividing according to the target height. For example, referring to FIG. 5, the height of the distorted picture is an integer multiple of the target height; each column obtained by the division includes three distorted image blocks, and there is no overlap among the three distorted image blocks in each column.
When the height of the distorted picture is not an integer multiple of the target height, two distorted image blocks in each column overlap. For example, referring to FIG. 7, the height of the distorted picture is not an integer multiple of the target height; each column obtained by the division includes four distorted image blocks, and in each column the third and fourth distorted image blocks overlap, where ΔH in FIG. 7 is the overlapping height of the third and fourth distorted image blocks. A code sketch of this layout follows.
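The overlap behaviour described above can be expressed along a single axis: blocks are laid out at a stride equal to the target size, and when the picture size is not an integer multiple of the target size, the last block is shifted inward so that it still fits, overlapping its neighbour. A sketch (assuming the picture is at least one target size long):

```python
def block_origins(picture_len, target_len):
    """Origins of blocks of length `target_len` along one axis of a
    picture of length `picture_len`. If `picture_len` is not an
    integer multiple of `target_len`, the last block is shifted
    inward, overlapping its neighbour (the ΔW of FIG. 6 and the
    ΔH of FIG. 7)."""
    origins = list(range(0, picture_len - target_len + 1, target_len))
    if origins[-1] + target_len < picture_len:
        origins.append(picture_len - target_len)
    return origins

# block_origins(9, 3)  -> [0, 3, 6]      no overlap (as in FIG. 5)
# block_origins(10, 3) -> [0, 3, 6, 7]   last two blocks overlap by 2 (as in FIG. 6)
```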
In the second case, when the distorted image blocks are not of equal size, the plurality of distorted image blocks obtained may include four types: first, second, third, and fourth distorted image blocks.
Referring to FIG. 8 (the solid-line boxes in the figure are distorted image blocks), the first distorted image blocks are located at the vertex positions of the distorted picture and are the image blocks P1, P5, P16, and P20 in FIG. 8; their width and height are equal to W1-lap and H1-lap respectively, where W1 is the target width, H1 is the target height, and lap is the first expanded size.
The second distorted image blocks are located on the upper and lower boundaries of the distorted picture and differ from the first distorted image blocks; they are the image blocks P2, P3, P4, P17, P18, and P19 in FIG. 8, and their width and height are equal to W1-2lap and H1-lap respectively.
The third distorted image blocks are located on the left and right boundaries of the distorted picture and differ from the first distorted image blocks; they are the image blocks P6, P11, P10, and P15 in FIG. 8, and their width and height are W1-lap and H1-2lap respectively.
The distorted image blocks other than the first, second, and third distorted image blocks are fourth distorted image blocks; they are the image blocks P7, P8, P9, P12, P13, and P14 in FIG. 8, and their width and height are W1-2lap and H1-2lap respectively.
In this second case, the last two distorted image blocks in each row may or may not partially overlap; for example, in FIG. 8 the distorted image blocks P4 and P5 in the first row partially overlap, and ΔW in FIG. 8 is their overlapping width. Likewise, the last two distorted image blocks in each column may or may not partially overlap; for example, in FIG. 8 the distorted image blocks P11 and P16 in the first column partially overlap, and ΔH in FIG. 8 is their overlapping height.
Before this step is performed, the first expansion size may be set, and the target width and the target height may be determined according to the first expansion size and the width and height of the distorted picture.

The convolutional neural network model includes a plurality of convolutional layers, each of which corresponds to a second expansion size. The first expansion size is calculated from the second expansion sizes corresponding to the convolutional layers. Optionally, the second expansion sizes corresponding to the convolutional layers may be accumulated to obtain an accumulated value, and the first expansion size is set to be greater than or equal to that accumulated value, as illustrated by the sketch below.
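The following is a minimal sketch of how the first expansion size could be accumulated. The assumption that each convolutional layer's second expansion size equals (f - 1) // 2, the number of border pixels a "valid" convolution with spatial kernel size f removes, is illustrative and not stated in the text.

```python
def first_expansion_size(kernel_sizes, margin=0):
    """Accumulate the assumed second expansion size (f - 1) // 2 of every
    convolutional layer; `margin` adds optional extra pixels, since the text
    only requires the first expansion size to be at least the accumulated value."""
    return sum((f - 1) // 2 for f in kernel_sizes) + margin

# With the example layer sizes used later in the text (f1=5, f2=1, f3=3):
lap = first_expansion_size([5, 1, 3])   # 2 + 0 + 1 = 3
```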
The process of determining the target width and the target height is described later and is not explained here.

Optionally, obtaining the distorted picture may mean obtaining an entire frame of the distorted picture and only then dividing that entire frame; alternatively, part of the image data of the frame is acquired at a time, and whenever the acquired image data amounts to one distorted image block, that distorted image block is output. The distorted picture is thus divided without waiting for the entire frame, which improves the efficiency of video coding.

In the first case above, a distorted image block is output whenever the acquired image data can compose a block whose width is the target width and whose height is the target height, so the distorted picture is divided into a plurality of equal-sized distorted image blocks. In the second case above, the first distorted image block is output when the acquired image data belongs to a first distorted image block and can compose it; the second distorted image block is output when the acquired image data belongs to a second distorted image block and can compose it; the third distorted image block is output when the acquired image data belongs to a third distorted image block and can compose it; and the fourth distorted image block is output when the acquired image data belongs to a fourth distorted image block and can compose it. The distorted picture is thereby divided into the four types of distorted image blocks: first, second, third and fourth, as the sketch below illustrates.
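As a hedged illustration of the first case, the sketch below divides a picture into equal-sized W1 x H1 blocks. The choice of shifting the last block of each row and column back to the picture edge, so that it overlaps its neighbour, is one plausible reading of the overlap described above.

```python
import numpy as np

def split_equal_blocks(picture, w1, h1):
    """Divide `picture` (an H3 x W3 array) into equal-sized W1 x H1 blocks; the
    last block of a row/column overlaps its neighbour when W3/H3 is not an
    integer multiple of W1/H1."""
    h3, w3 = picture.shape[:2]
    xs = list(range(0, w3 - w1 + 1, w1))
    ys = list(range(0, h3 - h1 + 1, h1))
    if xs[-1] + w1 < w3:
        xs.append(w3 - w1)   # overlapping last column of blocks
    if ys[-1] + h1 < h3:
        ys.append(h3 - h1)   # overlapping last row of blocks
    return [picture[y:y + h1, x:x + w1] for y in ys for x in xs]
```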
Step 203: Expand each distorted image block according to the first expansion size to obtain a first image block corresponding to each distorted image block.

Optionally, when the divided distorted image blocks are of equal size, this step may be as follows.

The four edges of a target image block are expanded according to the first expansion size to obtain the first image block corresponding to the target image block, the target image block being any one of the plurality of distorted image blocks.

The width by which each edge of the target image block is expanded equals the first expansion size. Assuming that the width of the target image block is W1, its height is H1 and the first expansion size is lap, the first image block obtained by expanding the target image block has width W2 = W1 + 2lap and height H2 = H1 + 2lap.

For example, referring to FIG. 10, for any distorted image block, taken to be the distorted image block P1, each edge of the distorted image block is expanded by the first expansion size lap to obtain the corresponding first image block.
When the distorted picture is divided into equal-sized distorted image blocks in the first case, the target width and the target height may be determined as follows before step 202 is performed.

Determining the target width may include processes 31-34:

31: Select a width value from a preset width range.

The preset width range consists of integer values greater than 0 and smaller than the width of the distorted picture; optionally, it consists of integer values greater than the first expansion size and smaller than the width of the distorted picture. The first expansion size is typically greater than or equal to 1 pixel. For example, if the width of the distorted picture is 10 pixels and the first expansion size is 1 pixel, the preset width range includes the integer values 2, 3, 4, 5, 6, 7, 8 and 9.

32: If the width of the distorted picture is an integer multiple of the width value, determine the width value as the target width and end.

33: If the width of the distorted picture is not an integer multiple of the width value, calculate the overlap width corresponding to the width value according to the following first formula.
The first formula is:

ΔW = W1 - (W3 % W1)

In the first formula, ΔW is the overlap width corresponding to the selected width value, W1 is the selected width value, W2 is the width of the first image block obtained after the distorted image block is expanded, W3 is the width of the distorted picture, and % is the remainder operation.
34: If unselected width values remain in the preset width range, select one of the unselected width values and return to 32; otherwise, determine the width value corresponding to the minimum overlap width as the target width.
Determining the target height may include processes 35-38:

35: Select a height value from a preset height range.

The preset height range consists of integer values greater than 0 and smaller than the height of the distorted picture; optionally, it consists of integer values greater than the first expansion size and smaller than the height of the distorted picture. For example, if the height of the distorted picture is 10 pixels and the first expansion size is 1 pixel, the preset height range includes the integer values 2, 3, 4, 5, 6, 7, 8 and 9.

36: If the height of the distorted picture is an integer multiple of the height value, determine the height value as the target height and end.

37: If the height of the distorted picture is not an integer multiple of the height value, calculate the overlap height corresponding to the height value according to the following second formula.
The second formula is:

ΔH = H1 - (H3 % H1)

In the second formula, ΔH is the overlap height corresponding to the selected height value, H1 is the selected height value, H2 is the height of the first image block obtained after the distorted image block is expanded, and H3 is the height of the distorted picture.
38: If unselected height values remain in the preset height range, select one of the unselected height values and return to 36; otherwise, determine the height value corresponding to the minimum overlap height as the target height. A sketch of this selection procedure follows.
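Below is a minimal sketch of processes 31-34 (and, by symmetry, 35-38). The overlap expression value - (extent % value) follows the reconstruction of the first and second formulas above and is therefore an assumption rather than a quotation.

```python
def choose_target_size(extent, lap):
    """Pick a block size from the preset range (lap, extent) with zero or,
    failing that, minimal overlap; `extent` is the picture width W3 (or
    height H3) and `lap` is the first expansion size."""
    best_value, best_overlap = None, None
    for value in range(lap + 1, extent):      # process 31: candidate values
        remainder = extent % value
        if remainder == 0:                    # process 32: exact multiple
            return value
        overlap = value - remainder           # process 33: reconstructed formula
        if best_overlap is None or overlap < best_overlap:
            best_value, best_overlap = value, overlap
    return best_value                         # process 34: minimal overlap wins

target_width = choose_target_size(extent=1920, lap=3)
```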
Optionally, when the divided distorted image blocks are not of equal size, this step (step 203) may be as follows.

For a distorted image block located on the boundary of the distorted picture, the target edges of the distorted image block are expanded, a target edge being an edge of the distorted image block that does not coincide with the boundary of the distorted picture; for the other distorted image blocks of the distorted picture, all four edges of the distorted image block are expanded. The detailed implementation is as follows.

The target edges of a target distorted image block are expanded according to the first expansion size to obtain the first image block corresponding to the target distorted image block, the target distorted image block being a first distorted image block, a second distorted image block or a third distorted image block, and a target edge being an edge of the target distorted image block that does not coincide with the boundary of the distorted picture. In addition, the four edges of each fourth distorted image block are expanded according to the first expansion size to obtain the first image block corresponding to the fourth distorted image block. In each case, the width of the expansion equals the first expansion size.

In other words, the target edges of the first distorted image block, of the second distorted image block and of the third distorted image block are expanded according to the first expansion size to obtain the first image blocks corresponding to the first, second and third distorted image blocks respectively, the target edges of a distorted image block being the edges of the distorted image block that do not coincide with the boundary of the distorted picture.

For example, referring to FIG. 8, for the first distorted image block P1 the target edges are the right edge and the lower edge; referring to FIG. 9, the right edge and the lower edge are each expanded by the first expansion size lap to obtain the first image block corresponding to the first distorted image block P1 (the dashed box containing P1).

Referring to FIG. 8, for the second distorted image block P2 the target edges are the left edge, the right edge and the lower edge; referring to FIG. 9, the left edge, the right edge and the lower edge are each expanded by the first expansion size lap to obtain the first image block corresponding to the second distorted image block P2 (the dashed box containing P2).

Referring to FIG. 8, for the third distorted image block P6 the target edges are the upper edge, the lower edge and the right edge; referring to FIG. 9, the upper edge, the lower edge and the right edge are each expanded by the first expansion size lap to obtain the first image block corresponding to the third distorted image block P6 (the dashed box containing P6).

Referring to FIG. 8, for the fourth distorted image block P8, and referring to FIG. 9, the four edges of the fourth distorted image block P8 are each expanded by the first expansion size lap to obtain the first image block corresponding to the fourth distorted image block P8 (the dashed box containing P8).

In the second case, each first image block thus obtained has a width equal to the target width and a height equal to the target height.
When the distorted image blocks obtained by dividing the distorted picture in the second case are not of equal size, the target width and the target height may be determined as follows before step 202 is performed.

For each width value in the preset width range, a first parameter corresponding to that width value is calculated according to the following third formula, and the width value corresponding to the smallest first parameter is determined as the target width.
The third formula is:

S1 = W3 % W1

In the third formula, S1 is the first parameter, W1 is a width value in the preset width range, and W3 is the width of the distorted picture.
For each height value in the preset height range, a second parameter corresponding to that height value is calculated according to the following fourth formula, and the height value corresponding to the smallest second parameter is determined as the target height.
The fourth formula is:

S2 = H3 % H1

In the fourth formula, S2 is the second parameter, H1 is a height value in the preset height range, and H3 is the height of the distorted picture.
Optionally, in either the first case or the second case, the edges of a distorted image block may be expanded in a number of ways; three of them are listed in this step.

In the first way, the edges of the distorted image block are expanded using a preset pixel value.

For example, the preset pixel value may be 0, 1, 2, 3 or another pixel value. Referring to FIG. 10, the four edges of the distorted image block P1 may be expanded with the preset pixel value, each edge being expanded by a width equal to the first expansion size, and every pixel in the region produced by the expansion takes the preset pixel value.

In the second way, an edge is expanded using the pixel values of the pixels on that edge of the distorted image block.

For example, referring to FIG. 10, the left edge of the distorted image block may be expanded using the pixel values of the pixels on the left edge; each pixel in the region produced by expanding the left edge takes the pixel value of some pixel on that left edge.

In the third way, an edge is expanded using the neighbour image block adjacent to that edge of the distorted image block.

For example, referring to FIG. 10, the neighbour image block adjacent to the right edge of the distorted image block P1 is P4, and the right edge of the distorted image block P1 is expanded using the neighbour image block P4. The sketch below illustrates all three ways.
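As a hedged illustration, the NumPy sketch below mimics the three expansion ways for a single-channel block. The np.pad modes are stand-ins ("constant" for the preset pixel value, "edge" for replicating the block's own border pixels), and the neighbour-based way is approximated by re-cropping a larger window from the full picture; clamping at the picture boundary is an assumption for blocks that touch the border.

```python
import numpy as np

def expand_constant(block, lap, value=0):
    """Way 1: pad every edge by `lap` pixels with a preset pixel value."""
    return np.pad(block, lap, mode="constant", constant_values=value)

def expand_replicate(block, lap):
    """Way 2: pad every edge by repeating the block's own edge pixels."""
    return np.pad(block, lap, mode="edge")

def expand_from_neighbours(picture, x, y, w1, h1, lap):
    """Way 3: crop the expanded window from the picture itself, so the added
    border pixels come from the neighbouring image blocks."""
    h3, w3 = picture.shape[:2]
    x0, y0 = max(0, x - lap), max(0, y - lap)
    x1, y1 = min(w3, x + w1 + lap), min(h3, y + h1 + lap)
    return picture[y0:y1, x0:x1]
```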
Step 204: Filter each first image block of the distorted picture using the convolutional neural network model to obtain a second image block corresponding to each first image block.

The convolutional neural network model may be any currently available convolutional neural network model, or a pre-established convolutional neural network model.

The convolutional neural network includes a plurality of convolutional layers, each of which corresponds to a trimming size and a second expansion size, the trimming size being equal to the second expansion size. While performing the convolution operation on the input first image block, each convolutional layer trims the first image block according to the trimming size and, before outputting it, expands it according to the second expansion size, so that the size of the first image block input to the convolutional layer equals the size of the first image block output from the convolutional layer.

In this embodiment, in the first case above, the expansion size corresponding to each convolutional layer may be set before this step is performed. For each convolutional layer, the expansion size may be set to be not smaller than 0 and not larger than the second expansion size that corresponded to that convolutional layer when the convolutional neural network model was trained; that is, the expansion size of the convolutional layer after setting is greater than or equal to 0 and less than or equal to the second expansion size corresponding to the convolutional layer.

Since the first expansion size is greater than or equal to the accumulated value of the second expansion sizes corresponding to the convolutional layers, and the trimming size of a convolutional layer equals the second expansion size corresponding to that convolutional layer, after a first image block is input to the convolutional neural network model, the size of the second image block that the model outputs for that first image block is greater than or equal to the size of the distorted image block corresponding to that first image block.

Alternatively, in the first case or the second case above, the second expansion size corresponding to each convolutional layer may be left unset before this step is performed; the trimming size of a convolutional layer then equals the second expansion size corresponding to that convolutional layer, so that after a first image block is input to the convolutional neural network model, the size of the second image block output by the model equals the size of that first image block.
In this step, when the pre-established convolutional neural network model is used, a side information component corresponding to the first image block may also be generated, the side information component representing the distortion features of the first image block relative to the original picture. The distorted image color component of the first image block and the side information component are input to the pre-established convolutional neural network model for convolution filtering, yielding the de-distorted second image block.

For the scheme that uses a pre-established convolutional neural network model, a system architecture diagram is also provided; see FIG. 11, which includes a side information component generation module 11, a convolutional neural network 12 and a network training module 13.

The convolutional neural network 12 may include the following three-layer structure:

an input layer processing unit 121, configured to receive the input data of the convolutional neural network model, which in this scheme includes the distorted image color component of the first image block and the side information component of the first image block, and to perform the first layer of convolution filtering on the input data;

a hidden layer processing unit 122, which performs at least one layer of convolution filtering on the output data of the input layer processing unit 121; and

an output layer processing unit 123, which performs the last layer of convolution filtering on the output data of the hidden layer processing unit 122 and outputs the result as the de-distorted image color component used to generate the de-distorted second image block.

FIG. 12 is a schematic diagram of the data flow of this solution. The distorted image color component of the first image block and the side information component of the first image block are input as input data to the pre-trained convolutional neural network model, which can be represented by a convolutional neural network of a preset structure together with a configured network parameter set. After the input data pass through the convolution filtering of the input layer, the hidden layer and the output layer, the de-distorted second image block is obtained.

Depending on actual needs, the input data of the convolutional neural network model may include one or more side information components and one or more distorted image color components, for example at least one of the Y color component, the U color component and the V color component; correspondingly, one or more de-distorted image color components are output.

For example, in some image processing only one of the color components may be distorted, in which case only that color component of the distorted image block is used as input data during de-distortion; if two color components are distorted, both color components of the distorted image block are used as input data and the corresponding de-distorted image color components are output for both.

The stored data of each pixel of an image block includes the values of all color components of that pixel. When the distorted image color component of a distorted image block is obtained, the values of the required color component or components can be extracted from the stored data of each pixel as needed, thereby obtaining the distorted image color component of the distorted image block.

As shown in FIG. 13, taking the YUV color space as an example, the value of the Y color component of each pixel is extracted, thereby obtaining the Y color component of the distorted image. In the left diagram of FIG. 13, [0,0] and [0,1] are positions and Y, U and V are the three channel color components of a pixel; for example, position [0,0] holds the stored data of one pixel, which includes the Y, U and V channel color components. In the right diagram of FIG. 13, [0,0] and [0,1] are still positions and Y is the Y-channel color component. A short sketch of this extraction follows.
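The component extraction of FIG. 13 can be sketched in a couple of lines; the H x W x 3 array layout with Y in channel 0 is an illustrative assumption, not a format prescribed by the text.

```python
import numpy as np

# A hypothetical 4 x 4 frame whose last axis holds the Y, U, V values of each pixel.
frame = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)
y_component = frame[:, :, 0]   # the Y color component of the distorted image
```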
The side information component represents the distortion features of the first image block relative to the corresponding original image block in the original picture; it is an expression of the distortion features determined by the image processing process. In this step, the side information component corresponding to the first image block is used as input data to be fed to the convolutional neural network model.

In an optional embodiment, the distortion features may include at least one of the following distortion features:

the degree of distortion, the distortion position and the distortion type.

First, the side information component can represent the degree of distortion of the distorted first image block relative to its corresponding original image block in the original picture.

Second, the side information component can also represent the distortion position of the distorted first image block relative to the corresponding original image block in the original picture, and may include the boundary coordinates of the coding units in the first image block. For example, in mainstream video codec applications an image is usually divided into a plurality of non-overlapping coding units of varying size, which are separately subjected to predictive coding and to quantization of differing degrees. The distortion between coding units is usually not consistent, and pixel discontinuities typically arise at the boundaries of the coding units; the boundary coordinates of the coding units can therefore serve as an a priori side information component characterizing the distortion position.

Third, the side information component can also represent the distortion type of the distorted first image block relative to the corresponding original image block in the original picture, and may include the prediction modes of the coding units in the first image block. For example, in video codec applications different coding units of an image may use different prediction modes, and different prediction modes affect the distribution of the residual data and hence the characteristics of the distorted first image block; the prediction mode of a coding unit can therefore serve as a side information component characterizing the distortion type.

Optionally, the side information component may be a combination of one or more of the above degree of distortion, distortion position and distortion type. Any one of them may also be represented by one or more parameters; for example, after image processing, the degree of distortion of the distorted first image block may be represented by a parameter with one physical meaning, or by two parameters with different physical meanings. Accordingly, one or more parameters that each represent the degree of distortion can be used as side information components, i.e., input as input data to the convolutional neural network model, according to actual needs.

The side information component of the first image block may be a side information guide map, a matrix structure with the same height and width as the first image block. It includes the side information component of each pixel of the first image block, and the position of a pixel's side information component in the matrix is the same as the position of that pixel in the first image block.

As shown in FIG. 14, the matrix structure of the side information component is the same as the matrix structure of the color component of the distorted first image block; the coordinates [0,0] and [0,1] represent distortion positions and the matrix element value 1 represents the degree of distortion, i.e., the side information component can simultaneously represent the degree of distortion and the distortion position.
As further shown in FIG. 15, the coordinates [0,0], [0,1], [2,0] and [2,4] represent distortion positions and the matrix element values 1 and 2 represent distortion types, i.e., the side information component can simultaneously represent the distortion type and the distortion position.
Moreover, the solution provided by the embodiments of the present application may include both of the side information components illustrated in FIG. 14 and FIG. 15.

The first image block is also a matrix, each element of which is the distorted image color component of a pixel in the first image block. The distorted image color component of a pixel may include the color component of any one or more of the Y, U and V channels.

Further, according to an optional embodiment of the solution and as needed, when there are multiple kinds of distorted image color components, the side information component may include a side information component corresponding to each kind of distorted image color component.

That is, in the side information component of the first image block, the side information component of a pixel includes the side information component corresponding to each distorted image color component of that pixel. A sketch of such a guide map follows.
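Below is a minimal sketch of a side information guide map: a matrix of the same height and width as the first image block holding one distortion value per pixel. Filling it with the quantization parameter (QP) of the coding unit containing each pixel matches the first way described in step 61 later; the coding-unit layout and QP values here are made up for the demonstration.

```python
import numpy as np

def side_info_guide_map(block_h, block_w, coding_units):
    """coding_units: list of (x, y, w, h, qp) rectangles covering the block;
    every pixel of a rectangle receives that coding unit's QP value."""
    guide = np.zeros((block_h, block_w), dtype=np.float32)
    for x, y, w, h, qp in coding_units:
        guide[y:y + h, x:x + w] = qp
    return guide

# Two hypothetical 4-wide, 8-tall coding units with QPs 22 and 37:
guide = side_info_guide_map(8, 8, [(0, 0, 4, 8, 22.0), (4, 0, 4, 8, 37.0)])
```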
The solution provided by the embodiments of the present application can be applied to various currently known application scenarios, for example to application scenarios in which super-resolution processing is performed on images; the present invention is not limited in this respect.

The scheme that filters with a pre-established convolutional neural network model, see FIG. 16, specifically includes the following processing steps.

In this step, the side information component of the first image block may be generated through the following two steps 61 and 62.

Step 61: For any first image block to be processed, determine the distortion-degree value of each pixel in the first image block.

In an optional embodiment, after the original image has been processed in different ways, the physical parameter representing the degree of distortion may also differ. In this step, therefore, the distortion-degree value that accurately represents the degree of distortion of a pixel can be determined on the basis of the image processing method used, specifically as follows.

First way: for a first image block obtained through encoding and decoding, the quantization parameter of each coding unit in the first image block is known, i.e., the quantization parameter of each coding unit in the first image block can be obtained, and the quantization parameter of the coding unit in which each pixel of the first image block is located is determined as the distortion-degree value of that pixel.

In a video coding system, the quantization unit holds the quantization parameter of each coding unit in the first image block, so the quantization parameter of each coding unit in the first image block can be obtained from the quantization unit.

Second way: for a first image block obtained through encoding and decoding, the coding information of each coding unit in the first image block is known, i.e., the coding information of each coding unit in the first image block can be obtained; the quantization parameter of each coding unit is calculated from the coding information of that coding unit, and the quantization parameter of the coding unit in which each pixel of the first image block is located is determined as the distortion-degree value of that pixel.

The current original video picture includes the coding information of each coding unit, so the coding information of each coding unit in the first image block can be obtained from the current original video picture.

Step 62: Based on the position of each pixel in the first image block, use the obtained distortion-degree values of the pixels to generate the side information component corresponding to the first image block, where each component value included in the side information component corresponds to the pixel at the same position in the first image block, i.e., the position of a component value within the side information component is the same as the position of the corresponding pixel within the first image block.

Since each component value included in the side information component corresponds to the pixel at the same position in the first image block, the side information component has the same structure as the distorted image color component of the first image block; that is, the matrix representing the side information component and the matrix representing the color component of the first image block are of the same type.

In this step, based on the position of each pixel in the first image block, the obtained distortion-degree value of each pixel can be determined as the component value at the same position in the side information component corresponding to the first image block; that is, the distortion-degree value of each pixel is directly used as the component value corresponding to that pixel.

When the pixel value range of the first image block differs from the value range of the distortion-degree values of the pixels, the obtained distortion-degree values of the pixels may instead be normalized on the basis of the pixel value range of the first image block to obtain processed distortion-degree values whose value range is the same as the pixel value range; then, based on the position of each pixel in the first image block, the processed distortion-degree value of each pixel is determined as the component value at the same position in the side information component corresponding to the first image block.
In this step, the distortion-degree value of a pixel can be normalized using the following formula:
norm(x) = ((x - QP_MIN) / (QP_MAX - QP_MIN)) × (PIXEL_MAX - PIXEL_MIN) + PIXEL_MIN

where norm(x) is the processed distortion-degree value obtained after normalization, x is the distortion-degree value of the pixel, the pixel value range of the first image block is [PIXEL_MIN, PIXEL_MAX], and the value range of the distortion-degree values of the pixels is [QP_MIN, QP_MAX]. The sketch below implements this mapping.
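The following one-liner implements the normalization; the reconstructed min-max form of norm(x) above, and the 8-bit pixel range in the example call, are assumptions.

```python
def normalize_distortion(x, qp_min, qp_max, pixel_min=0.0, pixel_max=255.0):
    """Map a distortion-degree value from [QP_MIN, QP_MAX] onto the pixel
    value range [PIXEL_MIN, PIXEL_MAX]."""
    return (x - qp_min) / (qp_max - qp_min) * (pixel_max - pixel_min) + pixel_min

print(normalize_distortion(37, 0, 51))   # QP 37 in [0, 51] maps to 185.0
```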
Through the above two steps, the side information component of the first image block is generated. The process of generating the side information component can also be understood as generating the side information guide map corresponding to the first image block: the guide map represents the degree of distortion of the first image block through its side information components, and it has the same height and width as the first image block.

In the embodiments of the present invention, taking a convolutional neural network model with the structure of an input layer, a hidden layer and an output layer as an example, for any first image block to be processed the convolutional neural network is used to filter the first image block and obtain a de-distorted second image block; the scheme is described as follows.

Step 63: For any first image block to be processed, take the distorted image color component of the first image block and the generated side information component as the input data of the pre-established convolutional neural network model; the input layer performs the first layer of convolution filtering to obtain image blocks represented in sparse form, and outputs these image blocks represented in sparse form.

In the convolutional neural network model, the input data can be fed into the network through their respective channels. In this step, the first image block color component Y of cv channels and the side information component M of cm channels can be merged along the channel dimension to jointly form the input data I of cv + cm channels, and multidimensional convolution filtering and nonlinear mapping are applied to the input data I using the following formula, producing n1 image blocks represented in sparse form:

F1(I) = g(W1 * I + B1);

where F1(I) is the output of the input layer (the image blocks represented in sparse form), I is the input of the convolutional layer in the input layer, * is the convolution operation, W1 is the weight coefficients of the convolutional layer filter bank of the input layer, B1 is the offset coefficients of the convolutional layer filter bank of the input layer, and g() is the nonlinear mapping function.

W1 corresponds to n1 convolution filters, i.e., n1 convolution filters act on the input of the convolutional layer of the input layer and n1 image blocks are output. The convolution kernel of each convolution filter has size c1 × f1 × f1, where c1 is the number of input channels and f1 is the spatial size of each convolution kernel.

As an example, the parameters of the input layer may be c1 = 2, f1 = 5, n1 = 64, with the ReLU (Rectified Linear Unit) function used as g(), whose expression is:

g(x) = max(0, x);

so that in this example the convolution processing expression of the input layer is:

F1(I) = max(0, W1 * I + B1);
Step 64: The hidden layer performs further high-dimensional mapping on the image blocks F1(I) represented in sparse form that the input layer outputs, obtaining high-dimensional image blocks, and outputs the high-dimensional image blocks.

The embodiments of the present invention do not limit the number of convolutional layers in the hidden layer, the way the convolutional layers are connected, or the attributes of the convolutional layers; various currently known structures can be used, provided the hidden layer contains at least one convolutional layer.

For example, if the hidden layer contains N-1 (N ≥ 2) convolutional layers, the hidden layer processing is expressed by:

Fi(I) = g(Wi * Fi-1(I) + Bi), i ∈ {2, 3, …, N};

where Fi(I) is the output of the i-th convolutional layer of the convolutional neural network, * is the convolution operation, Wi is the weight coefficients of the filter bank of the i-th convolutional layer, Bi is the offset coefficients of that filter bank, and g() is the nonlinear mapping function.

Wi corresponds to ni convolution filters, i.e., ni convolution filters act on the input of the i-th convolutional layer and ni image blocks are output. The convolution kernel of each convolution filter has size ci × fi × fi, where ci is the number of input channels and fi is the spatial size of each convolution kernel.

As an example, the hidden layer may include one convolutional layer whose convolution filter parameters are c2 = 64, f2 = 1, n2 = 32, with the ReLU (Rectified Linear Unit) function used as g(); the convolution processing expression of the hidden layer in this example is:

F2(I) = max(0, W2 * F1(I) + B2);
Step 65: The output layer aggregates the high-dimensional image blocks FN(I) output by the hidden layer and outputs the de-distorted image color component of the first image block, which is used to generate the de-distorted second image block.

The embodiments of the present invention do not limit the structure of the output layer: the output layer may have a Residual Learning structure, a Direct Learning structure, or another structure.

The processing with the Residual Learning structure is as follows.

A convolution operation is performed on the high-dimensional image blocks output by the hidden layer to obtain a compensation residual, which is then added to the input distorted image color component to obtain the de-distorted image color component, i.e., the de-distorted second image block. The output layer processing can be expressed by:

F(I) = WN+1 * FN(I) + BN+1 + Y;

where F(I) is the de-distorted image color component output by the output layer, FN(I) is the output of the hidden layer (the high-dimensional image blocks), * is the convolution operation, WN+1 is the weight coefficients of the convolutional layer filter bank of the output layer, BN+1 is the offset coefficients of the convolutional layer filter bank of the output layer, and Y is the distorted image color component that has not undergone convolution filtering and is to be de-distorted.

WN+1 corresponds to nN+1 convolution filters, i.e., nN+1 convolution filters act on the input of the (N+1)-th convolutional layer and nN+1 image blocks are output. nN+1 is the number of de-distorted image color components output, generally equal to the number of distorted image color components input; if only one kind of de-distorted image color component is output, nN+1 generally takes the value 1. The convolution kernel of each convolution filter has size cN+1 × fN+1 × fN+1, where cN+1 is the number of input channels and fN+1 is the spatial size of each convolution kernel.

The processing with the Direct Learning structure is as follows.

A convolution operation is performed on the output of the hidden layer and the de-distorted image color component is output directly, i.e., the de-distorted second image block is obtained. The output layer processing can be expressed by:

F(I) = WN+1 * FN(I) + BN+1;

where F(I) is the output of the output layer, FN(I) is the output of the hidden layer, * is the convolution operation, WN+1 is the weight coefficients of the convolutional layer filter bank of the output layer, and BN+1 is the offset coefficients of the convolutional layer filter bank of the output layer.

Here, too, WN+1 corresponds to nN+1 convolution filters, i.e., nN+1 convolution filters act on the input of the (N+1)-th convolutional layer and nN+1 image blocks are output. nN+1 is the number of de-distorted image color components output, generally equal to the number of distorted image color components input; if only one kind of de-distorted image color component is output, nN+1 generally takes the value 1. The convolution kernel of each convolution filter has size cN+1 × fN+1 × fN+1, where cN+1 is the number of input channels and fN+1 is the spatial size of each convolution kernel.
As an example, the output layer may adopt the Residual Learning structure and include one convolutional layer whose convolution filter parameters are c3 = 32, f3 = 3, n3 = 1; the convolution processing expression of the output layer in this example (with N = 2, so the hidden layer output is F2(I)) is:

F(I) = W3 * F2(I) + B3 + Y.

A sketch of this example network follows.
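The three-layer example (5x5x64 input layer, 1x1x32 hidden layer, 3x3x1 Residual Learning output layer) can be sketched as below. The use of PyTorch, of zero padding to keep spatial sizes equal, and of a single Y channel plus one side information channel (c1 = 2) are illustrative choices; the text handles the size bookkeeping with its own trimming/expansion mechanism.

```python
import torch
import torch.nn as nn

class DeDistortCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # input layer: c1=2 channels (Y component + side information), f1=5, n1=64
        self.input_layer = nn.Conv2d(2, 64, kernel_size=5, padding=2)
        # hidden layer: c2=64, f2=1, n2=32
        self.hidden_layer = nn.Conv2d(64, 32, kernel_size=1)
        # output layer: c3=32, f3=3, n3=1, producing the compensation residual
        self.output_layer = nn.Conv2d(32, 1, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, i):
        y = i[:, :1]                           # distorted color component Y (assumed channel 0)
        f1 = self.relu(self.input_layer(i))    # F1(I) = max(0, W1 * I + B1)
        f2 = self.relu(self.hidden_layer(f1))  # F2(I) = max(0, W2 * F1(I) + B2)
        return self.output_layer(f2) + y       # F(I) = W3 * F2(I) + B3 + Y

model = DeDistortCNN()
second_block = model(torch.rand(1, 2, 64, 64))  # batch of one 64 x 64 first image block
```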
It should be noted that in this embodiment a plurality of distorted image blocks can be filtered at the same time, so that parallel filtering is achieved and the efficiency of video coding is improved.

The above solution provided by the embodiments of the present invention also proposes a convolutional neural network model training method, shown in FIG. 17, which specifically includes the following processing steps.

Step 71: Obtain a preset training set. The preset training set includes an original sample image, the distorted image color components of a plurality of distorted images corresponding to the original sample image, and the side information component corresponding to each distorted image, where the side information component corresponding to a distorted image represents the distortion features of that distorted image relative to the original sample image. The distortion features of the plurality of distorted images differ.

In this step, the original sample images (i.e., undistorted natural images) can be processed in advance with different degrees of distortion to obtain the distorted images corresponding to each original sample image, and, following the steps of the de-distortion method above, the side information component corresponding to each distorted image is generated. For each original sample image, the original sample image, a distorted image corresponding to it and the side information component corresponding to that distorted image form an image pair, and these image pairs compose the preset training set Ω. Since each original sample image is processed with different degrees of distortion, one original sample image may correspond to a plurality of distorted images.

Further, the training set may include one original sample image, on which the above image processing is performed to obtain a plurality of distorted images with different distortion features and the side information component corresponding to each distorted image; that is, the training set includes that one original sample image, the plurality of distorted images corresponding to it and the side information component corresponding to each distorted image.

The training set may also include a plurality of original sample images, on each of which the above image processing is performed to obtain a plurality of distorted images with different distortion features and the side information component corresponding to each distorted image; that is, the training set includes each original sample image, the plurality of distorted images corresponding to that original sample image and the side information component corresponding to each of its distorted images.

Step 72: For a convolutional neural network CNN of a preset structure, initialize the parameters in the network parameter set of the convolutional neural network CNN. The initialized parameter set can be denoted θ1, and the initialized parameters can be set according to actual needs and experience.

In this step, the high-level parameters related to training, such as the learning rate and the gradient descent algorithm, can also be set reasonably; they can be set in the manner mentioned above or in other ways, which are not described in detail here.

Step 73: Perform forward computation.

Optionally, the distorted image color component of each distorted image in the preset training set and the corresponding side information component are input to the convolutional neural network of the preset structure for convolution filtering, obtaining the de-distorted image color component corresponding to that distorted image.

Specifically, this step can be the forward computation, on the preset training set Ω, of the convolutional neural network CNN with parameter set θi, obtaining the output F(Y) of the convolutional neural network, i.e., the de-distorted image color component corresponding to each distorted image.

The first time this step is entered, the current parameter set is θ1; when this step is entered again later, the current parameter set θi is obtained by adjusting the previously used parameter set θi-1, as described below.

Step 74: Based on the original image color components of the plurality of original sample images and the obtained de-distorted image color components, determine the loss values of the plurality of original sample images.

Specifically, the mean squared error (MSE) formula can be used as the loss function to obtain the loss value L(θi); see the following formula:
L(θi) = (1 / (2H)) × Σ (h = 1 to H) ||F(Ih | θi) - Xh||²

where H is the number of image pairs selected from the preset training set in a single training pass, Ih is the input data obtained by merging the side information component and the distorted image color component corresponding to the h-th distorted image, F(Ih | θi) is the de-distorted image color component obtained by the forward computation of the convolutional neural network CNN under the parameter set θi for the h-th distorted image, Xh is the original image color component corresponding to the h-th distorted image, and i is the count of forward computations performed so far.
Step 75: Determine, based on the loss values, whether the convolutional neural network of the preset structure with the current parameter set has converged. If not, go to step 76; if so, go to step 77.
Optionally, convergence may be determined when the loss value is below a preset loss threshold: for example, convergence is declared when the loss value of every original sample image is below the threshold, or when the loss value of any one original sample image is below it. Alternatively, convergence may be determined when the difference between the loss value computed this time and the loss value computed last time is below a preset change threshold: for each original sample image, the difference between its current loss value and its previous loss value is computed, and convergence is declared when every such difference is below the change threshold, or when any one of them is. The present invention is not limited in this respect.
Step 76: Adjust the parameters in the current parameter set to obtain an adjusted parameter set, then return to step 73 for the next forward computation.
Specifically, the parameters in the current parameter set may be adjusted using the back-propagation algorithm.
Step 77: Take the current parameter set as the final output parameter set θfinal, and take the convolutional neural network of the preset structure with θfinal as the trained convolutional neural network model.
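Tying steps 73 to 77 together, a hedged end-to-end sketch of the training loop, reusing the mse_loss helper above; the thresholds, iteration cap, and optimizer are illustrative assumptions rather than values fixed by the patent:

```python
def train(cnn, optimizer, loader, loss_threshold=1e-4, max_iters=100000):
    """Steps 73-77: iterate forward pass, loss, convergence test, update."""
    prev_loss = None
    for i, (inputs, targets) in enumerate(loader):
        if i >= max_iters:
            break
        outputs = cnn(inputs)              # step 73: forward computation
        loss = mse_loss(outputs, targets)  # step 74: loss L(theta_i)
        # Step 75: convergence test on the loss (either criterion above).
        if loss.item() < loss_threshold:
            break
        if prev_loss is not None and abs(prev_loss - loss.item()) < 1e-8:
            break
        prev_loss = loss.item()
        # Step 76: adjust theta_i by back-propagation for the next pass.
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # Step 77: the current parameters are theta_final.
    return cnn
```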
Step 205: Generate a frame of picture from the second image block corresponding to each first image block.
In this step, if the second expansion size corresponding to each convolutional layer in the convolutional neural network model is set to zero, and the first expansion size equals the sum of the second expansion sizes of all convolutional layers, then the second image block obtained for each first image block has the same width and height as the distorted image block corresponding to that first image block. The second image blocks corresponding to the first image blocks can therefore be composed into one frame of de-distorted picture according to the positions of their distorted image blocks within the distorted picture, and the frame of de-distorted picture is buffered as a reference picture.
Optionally, for the last two second image blocks in each row, if their corresponding distorted image blocks partially overlap, the overlapping portion may be removed from the last second image block before the de-distorted picture is composed. Likewise, for the last two second image blocks in each column, if their corresponding distorted image blocks partially overlap, the overlapping portion may be removed from the last second image block before composition. The frame of de-distorted picture is then composed.
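For this first case, in which each second image block matches its distorted image block in size, composing the de-distorted picture might look as follows; numpy is used, and it is assumed that the position of each distorted image block was recorded when the picture was divided:

```python
import numpy as np

def assemble_picture(blocks, positions, height, width):
    """blocks[k] is the second image block obtained for the k-th first image
    block; positions[k] = (y, x) is the top-left corner of the corresponding
    distorted image block within the distorted picture."""
    picture = np.zeros((height, width), dtype=np.float32)
    for block, (y, x) in zip(blocks, positions):
        h = min(block.shape[0], height - y)  # guard the picture boundary
        w = min(block.shape[1], width - x)
        # Writing in order lets a later block overwrite an overlapped region;
        # the patent instead removes the overlap from the last block first.
        picture[y:y + h, x:x + w] = block[:h, :w]
    return picture
```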
Alternatively,
in this step, if no expansion size is set for the convolutional layers of the convolutional neural network model, i.e., each convolutional layer's trimming size equals its second expansion size, then the second image block obtained for each first image block has the same width and height as the first image block. Each second image block may then be trimmed according to the first expansion size to obtain the de-distorted image block corresponding to each first image block. The de-distorted image blocks are composed into one frame of de-distorted picture according to the positions of their distorted image blocks within the distorted picture, and the frame of de-distorted picture is buffered as a reference picture.
During trimming, for the second image block corresponding to any first image block, the edges of the second image block that underwent expansion are determined, and those edges are trimmed by the first expansion size, yielding the de-distorted image block corresponding to that first image block; the width of each trimmed edge equals the first expansion size.
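A minimal sketch of this trimming, assuming the set of expanded edges of each block is known from the division step; the function name and parameters are illustrative:

```python
def trim_block(block, lap, expanded_edges):
    """Crop lap pixels from each edge of a second image block that was
    expanded; expanded_edges is a subset of {'top','bottom','left','right'}."""
    top = lap if 'top' in expanded_edges else 0
    left = lap if 'left' in expanded_edges else 0
    bottom = block.shape[0] - (lap if 'bottom' in expanded_edges else 0)
    right = block.shape[1] - (lap if 'right' in expanded_edges else 0)
    return block[top:bottom, left:right]
```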
Alternatively,
in this step, if for each convolutional layer of the convolutional neural network model the configured expansion size is greater than 0 and less than that layer's second expansion size (i.e., the layer's trimming size is greater than its configured expansion size), then the second image block obtained by filtering each first image block is smaller than the first image block but larger than the distorted image block corresponding to the first image block. The sum of the configured expansion sizes of all convolutional layers is computed, along with the difference between the first expansion size and that sum. Each second image block is trimmed according to this difference to obtain the de-distorted image block corresponding to each first image block; the de-distorted image blocks are composed into one frame of de-distorted picture according to the positions of their distorted image blocks within the distorted picture, and the frame of de-distorted picture is buffered as a reference picture.
During trimming, for the second image block corresponding to any first image block, the edges of the second image block that underwent expansion are determined, and those edges are trimmed by the above difference, yielding the de-distorted image block corresponding to that first image block; the width of each trimmed edge equals the difference.
Optionally, for the last two de-distorted image blocks in each row, if their corresponding distorted image blocks partially overlap, the overlapping portion may be removed from the last de-distorted image block before the de-distorted picture is composed. Likewise, for the last two de-distorted image blocks in each column, if their corresponding distorted image blocks partially overlap, the overlapping portion may be removed from the last de-distorted image block before composition. The frame of de-distorted picture is then composed.
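For this intermediate case, the crop width is the difference between the first expansion size and the accumulated per-layer expansion sizes; an illustrative computation, reusing the trim_block sketch above:

```python
def residual_trim_width(lap, configured_expansions):
    """Crop width for the intermediate case: the first expansion size minus
    the accumulated expansion sizes configured for the convolutional layers."""
    diff = lap - sum(configured_expansions)
    assert diff >= 0, "configured expansions must not exceed the first size"
    return diff

# e.g. trim_block(second_block, residual_trim_width(lap, [1, 1, 0]), edges)
```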
In the embodiments of the present application, a distorted picture generated during video encoding is divided to obtain a plurality of distorted image blocks, and the convolutional neural network model can then filter one or more distorted image blocks at a time to obtain the de-distorted image block corresponding to each distorted image block, from which one frame of de-distorted picture is generated. The generated de-distorted picture is the filtered picture. Because the convolutional neural network filters distorted image blocks rather than the whole distorted frame, the resources required for filtering are reduced, so that the device can meet the resource requirements of filtering. In addition, multiple distorted image blocks can be filtered simultaneously, which improves filtering efficiency and thus video encoding efficiency.
Referring to FIG. 18, an embodiment of the present application provides a picture filtering method that can filter a distorted picture generated during decoding, including:
Step 301: Acquire a distorted picture generated during video decoding.
A reconstructed picture is generated during video decoding; the distorted picture may be that reconstructed picture, or a picture obtained by filtering it.
Referring to the schematic structural diagram of the video decoding system shown in FIG. 19, the video decoding system includes a prediction module, an entropy decoder, an inverse quantization unit, an inverse transform unit, a reconstruction unit, a convolutional neural network model CNN, and a buffer.
The decoding process of this video decoding system is as follows: a bitstream is input to the entropy decoder, which decodes it to obtain mode information, quantization parameters, and residual information; the mode information is input to the prediction module, the quantization parameters are input to the convolutional neural network model, and the residual information is input to the inverse quantization unit. The prediction module performs prediction on the input mode information according to the reference pictures in the buffer to obtain prediction data, which is input to the reconstruction unit. The prediction module includes an intra prediction unit, a motion estimation and motion compensation unit, and a switch; the mode information may include intra mode information and inter mode information. The intra prediction unit predicts intra prediction data from the intra mode information; the motion estimation and motion compensation unit performs inter prediction on the inter mode information according to the reference pictures buffered in the buffer to obtain inter prediction data; and the switch selects whether the intra prediction data or the inter prediction data is output to the reconstruction unit.
The inverse quantization unit and the inverse transform unit perform inverse quantization and inverse transform processing on the residual information, respectively, to obtain prediction error information, which is input to the reconstruction unit; the reconstruction unit generates the reconstructed picture from the prediction error information and the prediction data. Accordingly, in this step, the reconstructed picture generated by the reconstruction unit may be acquired and used as the distorted picture.
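Schematically, the reconstruction unit adds the recovered prediction error back to the prediction data; a hedged sketch (the clipping to an 8-bit sample range is an assumption):

```python
import numpy as np

def reconstruct(prediction_data, prediction_error):
    """Reconstruction unit: add the prediction error information (recovered
    by inverse quantization and inverse transform of the residual) back to
    the prediction data."""
    return np.clip(prediction_data + prediction_error, 0, 255)
```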
Optionally, referring to FIG. 20, a filter may be connected in series between the convolutional neural network model and the reconstruction unit; this filter can filter the reconstructed picture generated by the reconstruction unit and output a filtered reconstructed picture. Accordingly, in this step, the filtered reconstructed picture may be acquired and used as the distorted picture.
Optionally, referring to FIG. 21, the mode information output by the entropy decoder may include only intra mode information, and the prediction module may include only an intra prediction unit, which predicts prediction data from the intra mode information and inputs it to the reconstruction unit; the reconstruction unit generates the reconstructed picture. Accordingly, in this step, the reconstructed picture may be acquired and used as the distorted picture.
Steps 302-305 are respectively the same as steps 202-205 above and are not described in detail here.
In the embodiments of the present application, a distorted picture generated during video decoding is divided to obtain a plurality of distorted image blocks, and the convolutional neural network model can then filter one or more distorted image blocks at a time to obtain the de-distorted image block corresponding to each distorted image block, from which one frame of de-distorted picture is generated. The generated de-distorted picture is the filtered picture. Because the convolutional neural network filters distorted image blocks rather than the whole distorted frame, the resources required for filtering are reduced, so that the device can meet the resource requirements of filtering. In addition, multiple distorted image blocks can be filtered simultaneously, which improves filtering efficiency and thus video decoding efficiency.
The following are apparatus embodiments of the present application, which may be used to carry out the method embodiments of the present application. For details not disclosed in the apparatus embodiments, please refer to the method embodiments of the present application.
Referring to FIG. 22, an embodiment of the present application provides a picture filtering apparatus 400, the apparatus 400 including:
a first acquiring module 401, configured to acquire a distorted picture, the distorted picture being distorted relative to the original video picture input to the video encoding system;
a second acquiring module 402, configured to acquire a plurality of first image blocks by dividing the distorted picture;
a filtering module 403, configured to filter each first image block using a convolutional neural network model to obtain the second image block corresponding to each first image block;
a generating module 404, configured to generate one frame of de-distorted picture from the second image block corresponding to each first image block.
Optionally, the second acquiring module 402 includes:
a dividing unit, configured to divide the distorted picture according to a target width and a target height to obtain a plurality of distorted image blocks included in the distorted picture;
an edge expansion unit, configured to perform edge expansion processing on each of the plurality of distorted image blocks according to a first expansion size to obtain the first image block corresponding to each distorted image block.
Optionally, the plurality of distorted image blocks include first distorted image blocks located at the vertex positions of the distorted picture, second distorted image blocks located on the upper and lower boundaries of the distorted picture, third distorted image blocks located on the left and right boundaries of the distorted picture, and fourth distorted image blocks other than the first, second, and third distorted image blocks;
the width and height of a first distorted image block are respectively W1 - lap and H1 - lap, where W1 is the target width, H1 is the target height, and lap is the first expansion size; the width and height of a second distorted image block are respectively W1 - 2lap and H1 - lap; the width and height of a third distorted image block are respectively W1 - lap and H1 - 2lap; and the width and height of a fourth distorted image block are respectively W1 - 2lap and H1 - 2lap.
Optionally, the edge expansion unit is configured to:
perform edge expansion processing on the target edges of a target distorted image block according to the first expansion size to obtain the first image block corresponding to the target distorted image block, the target distorted image block being a first distorted image block, a second distorted image block, or a third distorted image block, and the target edges being the edges of the target distorted image block that do not coincide with a boundary of the distorted picture; and
perform edge expansion processing on the four edges of a fourth distorted image block according to the first expansion size to obtain the first image block corresponding to the fourth distorted image block, as sketched below.
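An illustrative sketch of the edge expansion unit, padding only the target edges by lap pixels; the use of numpy's edge-replication padding is an assumption, since the patent does not fix the padding values here:

```python
import numpy as np

def expand_block(block, lap, target_edges):
    """Pad each target edge (an edge not on the picture boundary) by lap
    pixels; target_edges is a subset of {'top', 'bottom', 'left', 'right'}."""
    pad = ((lap if 'top' in target_edges else 0,
            lap if 'bottom' in target_edges else 0),
           (lap if 'left' in target_edges else 0,
            lap if 'right' in target_edges else 0))
    return np.pad(block, pad, mode='edge')  # replicate border samples
```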
Optionally, the apparatus 400 further includes:
a first setting module, configured to set the expansion size corresponding to a convolutional layer included in the convolutional neural network model, the set expansion size being not less than zero and not greater than the second expansion size corresponding to that convolutional layer, where the second expansion size is the expansion size of the convolutional layer used when training the convolutional neural network model.
Optionally, the apparatus 400 further includes:
a second setting module, configured to set the first expansion size according to the second expansion size corresponding to each convolutional layer included in the convolutional neural network model (see the sketch below).
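A one-line sketch of the relation used by the second setting module, under the assumption that the first expansion size is simply the accumulated second expansion size over the layers:

```python
def first_expansion_size(second_expansion_sizes):
    """First expansion size as the accumulated second expansion size over
    all convolutional layers, e.g. [1] * 5 -> 5 for five 3x3 layers."""
    return sum(second_expansion_sizes)
```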
Optionally, the generating module 404 includes:
a trimming unit, configured to perform trimming processing on the de-distorted image block corresponding to each distorted image block to obtain the third image block corresponding to each distorted image block;
a composing unit, configured to compose the third image blocks corresponding to the distorted image blocks into one frame of de-distorted picture.
Optionally, the apparatus 400 further includes:
a determining module, configured to determine the target width and the target height according to the first expansion size and the width and height of the distorted picture.
In the embodiments of the present application, a distorted picture generated during video encoding or decoding is divided to obtain a plurality of distorted image blocks, and the convolutional neural network model can then filter one or more distorted image blocks at a time to obtain the de-distorted image block corresponding to each distorted image block, from which one frame of de-distorted picture is generated. The generated de-distorted picture is the filtered picture. Because the convolutional neural network filters distorted image blocks rather than the whole distorted frame, the resources required for filtering are reduced, so that the device can meet the resource requirements of filtering.
With regard to the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the related method and will not be elaborated here.
FIG. 23 shows a structural block diagram of a terminal 500 provided by an exemplary embodiment of the present invention. The terminal 500 may be a portable mobile terminal, such as a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 500 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
Generally, the terminal 500 includes a processor 501 and a memory 502.
The processor 501 may include one or more processing cores, such as a 4-core or 8-core processor. The processor 501 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), or PLA (Programmable Logic Array). The processor 501 may also include a main processor and a coprocessor: the main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 501 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 501 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
The memory 502 may include one or more computer-readable storage media, which may be non-transitory. The memory 502 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 502 is configured to store at least one instruction, which is executed by the processor 501 to implement the picture filtering method provided by the method embodiments of this application.
In some embodiments, the terminal 500 optionally further includes a peripheral device interface 503 and at least one peripheral device. The processor 501, the memory 502, and the peripheral device interface 503 may be connected by a bus or signal lines. Each peripheral device may be connected to the peripheral device interface 503 by a bus, a signal line, or a circuit board. Specifically, the peripheral devices include at least one of a radio frequency circuit 504, a touch display screen 505, a camera 506, an audio circuit 507, a positioning component 508, and a power supply 509.
The peripheral device interface 503 may be used to connect at least one I/O (Input/Output)-related peripheral device to the processor 501 and the memory 502. In some embodiments, the processor 501, the memory 502, and the peripheral device interface 503 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 501, the memory 502, and the peripheral device interface 503 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 504 is configured to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 504 communicates with communication networks and other communication devices via electromagnetic signals, converting electrical signals into electromagnetic signals for transmission, or converting received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 504 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like. The radio frequency circuit 504 may communicate with other terminals via at least one wireless communication protocol, including but not limited to the World Wide Web, metropolitan area networks, intranets, mobile communication networks of all generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 504 may also include NFC (Near Field Communication)-related circuitry, which is not limited in this application.
The display screen 505 is configured to display a UI (User Interface), which may include graphics, text, icons, video, and any combination thereof. When the display screen 505 is a touch display screen, it also has the ability to capture touch signals on or above its surface; the touch signal may be input to the processor 501 as a control signal for processing. In this case, the display screen 505 may also be used to provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 505, arranged on the front panel of the terminal 500; in other embodiments, there may be at least two display screens 505, arranged on different surfaces of the terminal 500 or in a folding design; in still other embodiments, the display screen 505 may be a flexible display screen arranged on a curved or folding surface of the terminal 500. The display screen 505 may even be set in a non-rectangular irregular shape, i.e., a shaped screen. The display screen 505 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
The camera assembly 506 is configured to capture images or video. Optionally, the camera assembly 506 includes a front camera and a rear camera. Generally, the front camera is arranged on the front panel of the terminal, and the rear camera is arranged on its back. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so as to realize a background blur function by fusing the main camera and the depth-of-field camera, panoramic shooting and VR (Virtual Reality) shooting by fusing the main camera and the wide-angle camera, or other fusion shooting functions. In some embodiments, the camera assembly 506 may also include a flash, which may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm flash and a cold flash, which can be used for light compensation at different color temperatures.
The audio circuit 507 may include a microphone and a speaker. The microphone is configured to capture sound waves from the user and the environment and convert them into electrical signals that are input to the processor 501 for processing or to the radio frequency circuit 504 for voice communication. For the purpose of stereo capture or noise reduction, there may be multiple microphones arranged at different parts of the terminal 500; the microphone may also be an array microphone or an omnidirectional microphone. The speaker is configured to convert electrical signals from the processor 501 or the radio frequency circuit 504 into sound waves. The speaker may be a traditional film speaker or a piezoelectric ceramic speaker; a piezoelectric ceramic speaker can convert electrical signals not only into sound waves audible to humans but also into sound waves inaudible to humans, for purposes such as ranging. In some embodiments, the audio circuit 507 may also include a headphone jack.
The positioning component 508 is configured to locate the current geographic position of the terminal 500 to implement navigation or LBS (Location Based Service). The positioning component 508 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the Galileo system.
The power supply 509 is configured to supply power to the components in the terminal 500. The power supply 509 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery. When the power supply 509 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery: a wired rechargeable battery is charged through a wired line, and a wireless rechargeable battery is charged through a wireless coil. The rechargeable battery may also support fast-charging technology.
In some embodiments, the terminal 500 further includes one or more sensors 510, including but not limited to an acceleration sensor 511, a gyroscope sensor 512, a pressure sensor 513, a fingerprint sensor 514, an optical sensor 515, and a proximity sensor 516.
The acceleration sensor 511 can detect the magnitudes of acceleration on the three coordinate axes of the coordinate system established by the terminal 500; for example, it can be used to detect the components of gravitational acceleration on the three axes. The processor 501 can control the touch display screen 505 to display the user interface in landscape or portrait view according to the gravitational acceleration signal captured by the acceleration sensor 511. The acceleration sensor 511 may also be used to capture motion data for games or of the user.
The gyroscope sensor 512 can detect the body orientation and rotation angle of the terminal 500 and can cooperate with the acceleration sensor 511 to capture the user's 3D motion on the terminal 500. Based on the data captured by the gyroscope sensor 512, the processor 501 can implement functions such as motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 513 may be arranged on a side frame of the terminal 500 and/or in the lower layer of the touch display screen 505. When the pressure sensor 513 is arranged on a side frame, it can detect the user's grip signal on the terminal 500, and the processor 501 performs left/right-hand recognition or shortcut operations according to the grip signal. When the pressure sensor 513 is arranged in the lower layer of the touch display screen 505, the processor 501 controls the operable controls on the UI according to the user's pressure operation on the screen. The operable controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 514 is configured to capture the user's fingerprint. The processor 501 identifies the user according to the fingerprint captured by the fingerprint sensor 514, or the fingerprint sensor 514 identifies the user according to the captured fingerprint. When the user's identity is recognized as trusted, the processor 501 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings. The fingerprint sensor 514 may be arranged on the front, back, or side of the terminal 500; when a physical button or manufacturer logo is provided on the terminal 500, the fingerprint sensor 514 may be integrated with it.
The optical sensor 515 is configured to capture the ambient light intensity. In one embodiment, the processor 501 can control the display brightness of the touch display screen 505 according to the ambient light intensity captured by the optical sensor 515: when the ambient light intensity is high, the display brightness is increased; when it is low, the display brightness is decreased. In another embodiment, the processor 501 can also dynamically adjust the shooting parameters of the camera assembly 506 according to the ambient light intensity captured by the optical sensor 515.
The proximity sensor 516, also called a distance sensor, is generally arranged on the front panel of the terminal 500 and is configured to capture the distance between the user and the front of the terminal 500. In one embodiment, when the proximity sensor 516 detects that this distance is gradually decreasing, the processor 501 controls the touch display screen 505 to switch from the screen-on state to the screen-off state; when it detects that the distance is gradually increasing, the processor 501 controls the touch display screen 505 to switch from the screen-off state to the screen-on state.
Those skilled in the art will understand that the structure shown in FIG. 23 does not constitute a limitation on the terminal 500, which may include more or fewer components than shown, combine certain components, or adopt a different component arrangement.
Other embodiments of the present application will readily occur to those skilled in the art upon consideration of the specification and practice of the application disclosed herein. The present application is intended to cover any variations, uses, or adaptations of the application that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the application indicated by the following claims.
It should be understood that the present application is not limited to the precise structure described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present application is limited only by the appended claims.

Claims (16)

  1. A picture filtering method, characterized in that the method comprises:
    acquiring a distorted picture, the distorted picture being distorted relative to an original video picture input into a video encoding system;
    acquiring a plurality of first image blocks by dividing the distorted picture;
    filtering each first image block using a convolutional neural network model to obtain a second image block corresponding to each first image block; and
    generating one frame of de-distorted picture according to the second image block corresponding to each first image block.
  2. The method according to claim 1, characterized in that the acquiring a plurality of first image blocks by dividing the distorted picture comprises:
    dividing the distorted picture according to a target width and a target height to obtain a plurality of distorted image blocks included in the distorted picture; and
    performing edge expansion processing on each of the plurality of distorted image blocks according to a first expansion size to obtain the first image block corresponding to each distorted image block.
  3. The method according to claim 2, characterized in that the plurality of distorted image blocks comprise first distorted image blocks located at vertex positions of the distorted picture, second distorted image blocks located on the upper and lower boundaries of the distorted picture, third distorted image blocks located on the left and right boundaries of the distorted picture, and fourth distorted image blocks other than the first, second, and third distorted image blocks;
    the width and height of a first distorted image block are respectively W1 - lap and H1 - lap, where W1 is the target width, H1 is the target height, and lap is the first expansion size; the width and height of a second distorted image block are respectively W1 - 2lap and H1 - lap; the width and height of a third distorted image block are respectively W1 - lap and H1 - 2lap; and the width and height of a fourth distorted image block are respectively W1 - 2lap and H1 - 2lap.
  4. The method according to claim 3, characterized in that the performing edge expansion processing on each of the plurality of distorted image blocks according to a first expansion size to obtain the first image block corresponding to each distorted image block comprises:
    performing edge expansion processing on target edges of a target distorted image block according to the first expansion size to obtain the first image block corresponding to the target distorted image block, the target distorted image block being a first distorted image block, a second distorted image block, or a third distorted image block, and the target edges being the edges of the target distorted image block that do not coincide with a boundary of the distorted picture; and
    performing edge expansion processing on the four edges of a fourth distorted image block according to the first expansion size to obtain the first image block corresponding to the fourth distorted image block.
  5. The method according to claim 2, characterized in that before the filtering of each distorted image block of the distorted picture using the convolutional neural network model, the method further comprises:
    setting an expansion size corresponding to a convolutional layer included in the convolutional neural network model, the set expansion size being not less than zero and not greater than a second expansion size corresponding to the convolutional layer, the second expansion size being the expansion size of the convolutional layer used when training the convolutional neural network model.
  6. The method according to claim 5, characterized in that the method further comprises:
    setting the first expansion size according to the second expansion size corresponding to each convolutional layer included in the convolutional neural network model.
  7. The method according to any one of claims 1 to 4, characterized in that the generating one frame of de-distorted picture according to the de-distorted image block corresponding to each distorted image block comprises:
    performing trimming processing on the de-distorted image block corresponding to each distorted image block to obtain a third image block corresponding to each distorted image block; and
    composing the third image blocks corresponding to the distorted image blocks into one frame of de-distorted picture.
  8. The method according to any one of claims 2 to 6, characterized in that the method further comprises:
    determining the target width and the target height according to the first expansion size and the width and height of the distorted picture.
  9. A picture filtering apparatus, characterized in that the apparatus comprises:
    a first acquiring module, configured to acquire a distorted picture, the distorted picture being distorted relative to an original video picture input into a video encoding system;
    a second acquiring module, configured to acquire a plurality of first image blocks by dividing the distorted picture;
    a filtering module, configured to filter each first image block using a convolutional neural network model to obtain a second image block corresponding to each first image block; and
    a generating module, configured to generate one frame of de-distorted picture according to the second image block corresponding to each first image block.
  10. The apparatus according to claim 9, characterized in that the second acquiring module comprises:
    a dividing unit, configured to divide the distorted picture according to a target width and a target height to obtain a plurality of distorted image blocks included in the distorted picture; and
    an edge expansion unit, configured to perform edge expansion processing on each of the plurality of distorted image blocks according to a first expansion size to obtain the first image block corresponding to each distorted image block.
  11. The apparatus according to claim 10, characterized in that the plurality of distorted image blocks comprise first distorted image blocks located at vertex positions of the distorted picture, second distorted image blocks located on the upper and lower boundaries of the distorted picture, third distorted image blocks located on the left and right boundaries of the distorted picture, and fourth distorted image blocks other than the first, second, and third distorted image blocks;
    the width and height of a first distorted image block are respectively W1 - lap and H1 - lap, where W1 is the target width, H1 is the target height, and lap is the first expansion size; the width and height of a second distorted image block are respectively W1 - 2lap and H1 - lap; the width and height of a third distorted image block are respectively W1 - lap and H1 - 2lap; and the width and height of a fourth distorted image block are respectively W1 - 2lap and H1 - 2lap.
  12. The apparatus according to claim 11, characterized in that the edge expansion unit is configured to:
    perform edge expansion processing on target edges of a target distorted image block according to the first expansion size to obtain the first image block corresponding to the target distorted image block, the target distorted image block being a first distorted image block, a second distorted image block, or a third distorted image block, and the target edges being the edges of the target distorted image block that do not coincide with a boundary of the distorted picture; and
    perform edge expansion processing on the four edges of a fourth distorted image block according to the first expansion size to obtain the first image block corresponding to the fourth distorted image block.
  13. The apparatus according to claim 10, characterized in that the apparatus further comprises:
    a first setting module, configured to set an expansion size corresponding to a convolutional layer included in the convolutional neural network model, the set expansion size being not less than zero and not greater than a second expansion size corresponding to the convolutional layer, the second expansion size being the expansion size of the convolutional layer used when training the convolutional neural network model.
  14. The apparatus according to claim 13, characterized in that the apparatus further comprises:
    a second setting module, configured to set the first expansion size according to the second expansion size corresponding to each convolutional layer included in the convolutional neural network model.
  15. The apparatus according to any one of claims 9 to 12, characterized in that the generating module comprises:
    a trimming unit, configured to perform trimming processing on the de-distorted image block corresponding to each distorted image block to obtain a third image block corresponding to each distorted image block; and
    a composing unit, configured to compose the third image blocks corresponding to the distorted image blocks into one frame of de-distorted picture.
  16. The apparatus according to any one of claims 10 to 14, characterized in that the apparatus further comprises:
    a determining module, configured to determine the target width and the target height according to the first expansion size and the width and height of the distorted picture.
PCT/CN2019/072412 2018-01-18 2019-01-18 Image filtering method and device WO2019141255A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810050422.8 2018-01-18
CN201810050422.8A CN110062225B (en) 2018-01-18 2018-01-18 Picture filtering method and device

Publications (1)

Publication Number Publication Date
WO2019141255A1 true WO2019141255A1 (en) 2019-07-25

Family

ID=67301965

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/072412 WO2019141255A1 (en) 2018-01-18 2019-01-18 Image filtering method and device

Country Status (2)

Country Link
CN (1) CN110062225B (en)
WO (1) WO2019141255A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040213472A1 (en) * 2003-02-17 2004-10-28 Taku Kodama Image compression apparatus, image decompression apparatus, image compression method, image decompression method, program, and recording medium
US20110182469A1 (en) * 2010-01-28 2011-07-28 Nec Laboratories America, Inc. 3d convolutional neural networks for automatic human action recognition
CN105611303A (en) * 2016-03-07 2016-05-25 京东方科技集团股份有限公司 Image compression system, decompression system, training method and device, and display device
CN107018422A (en) * 2017-04-27 2017-08-04 四川大学 Still image compression method based on depth convolutional neural networks
CN107197260A (en) * 2017-06-12 2017-09-22 清华大学深圳研究生院 Video coding post-filter method based on convolutional neural networks
CN107590804A (en) * 2017-09-14 2018-01-16 浙江科技学院 Screen picture quality evaluating method based on channel characteristics and convolutional neural networks

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102214362B (en) * 2011-04-27 2012-09-05 天津大学 Block-based quick image mixing method
CN107925762B (en) * 2015-09-03 2020-11-27 联发科技股份有限公司 Video coding and decoding processing method and device based on neural network

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040213472A1 (en) * 2003-02-17 2004-10-28 Taku Kodama Image compression apparatus, image decompression apparatus, image compression method, image decompression method, program, and recording medium
US20110182469A1 (en) * 2010-01-28 2011-07-28 Nec Laboratories America, Inc. 3d convolutional neural networks for automatic human action recognition
CN105611303A (en) * 2016-03-07 2016-05-25 京东方科技集团股份有限公司 Image compression system, decompression system, training method and device, and display device
CN107018422A (en) * 2017-04-27 2017-08-04 四川大学 Still image compression method based on depth convolutional neural networks
CN107197260A (en) * 2017-06-12 2017-09-22 清华大学深圳研究生院 Video coding post-filter method based on convolutional neural networks
CN107590804A (en) * 2017-09-14 2018-01-16 浙江科技学院 Screen picture quality evaluating method based on channel characteristics and convolutional neural networks

Also Published As

Publication number Publication date
CN110062225B (en) 2021-06-11
CN110062225A (en) 2019-07-26


Legal Events

Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19741103; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19741103; Country of ref document: EP; Kind code of ref document: A1)