WO2022037162A1 - Image processing method, apparatus, device, computer storage medium and system

Image processing method, apparatus, device, computer storage medium and system

Info

Publication number
WO2022037162A1
WO2022037162A1 (PCT/CN2021/096017)
Authority
WO
WIPO (PCT)
Prior art keywords: network model, image, preset, training, encoding
Prior art date
Application number
PCT/CN2021/096017
Other languages
English (en)
French (fr)
Inventor
马展
王锡宁
陈彤
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司
Publication of WO2022037162A1

Classifications

    • H04N19/147: Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/176: Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/436: Implementation details or hardware specially adapted for video compression or decompression, using parallelised computational arrangements
    • G06N3/02: Neural networks
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods

Definitions

  • the present application relates to the technical field of video coding and decoding, and in particular, to an image processing method, apparatus, device, computer storage medium and system.
  • Deep learning is a branch of machine learning. It refers to algorithms that attempt to use multiple processing layers containing complex structures or multiple nonlinear transformations to perform high-level abstraction on data. Its powerful expressive ability has made it widely used in computer vision and image processing, where it performs well on video and image tasks.
  • image encoding and decoding and image post-processing technologies based on deep learning often adopt the scheme of inputting the entire image into the encoding and decoding network at one time for processing.
  • this scheme will greatly increase the running time of the encoding and decoding and the running memory requirements.
  • moreover, the structure of the existing schemes means that the encoding and decoding of different blocks are not completely independent, so the encoding and decoding processing cannot be parallelized, and the running time and running memory requirements of encoding and decoding still cannot be reduced.
  • the present application proposes an image processing method, apparatus, device, computer storage medium and system, which can realize the parallelization of encoding and decoding processing, can improve the peak signal-to-noise ratio of reconstructed images, and can also reduce the total calculation amount of the post-processing network, thereby reducing the running time and memory requirements of encoding and decoding.
  • an embodiment of the present application provides an image processing method, which is applied to an image processing apparatus, and the method includes:
  • the plurality of reconstruction blocks are obtained after a plurality of image blocks divided by the image to be processed pass through a preset encoding network model and a preset decoding network model;
  • a target image is obtained by filtering the block boundaries in the reconstructed image by using a preset post-processing network model.
  • an embodiment of the present application provides an image processing method, which is applied to a decoding device, and the method includes:
  • receiving the code stream transmitted by the encoding device; wherein the code stream is obtained after a plurality of image blocks divided from the image to be processed pass through a preset encoding network model;
  • a target image is obtained by filtering the block boundaries in the reconstructed image by using a preset post-processing network model.
  • an embodiment of the present application provides an image processing method, which is applied to an encoding device, and the method includes:
  • the code stream is transmitted to the decoding device.
  • an embodiment of the present application provides an image processing apparatus, the image processing apparatus includes: an acquisition unit and a processing unit; wherein,
  • the obtaining unit is configured to obtain a plurality of reconstruction blocks; wherein, the plurality of reconstruction blocks are obtained after a plurality of image blocks divided by the image to be processed pass through a preset encoding network model and a preset decoding network model ;
  • the processing unit is configured to splice the plurality of reconstructed blocks to generate a reconstructed image, and to filter the block boundaries in the reconstructed image by using a preset post-processing network model to obtain a target image.
  • an embodiment of the present application provides an image processing apparatus, the image processing apparatus includes: a first memory and a first processor; wherein,
  • the first memory for storing executable instructions that can be executed on the first processor
  • the first processor is configured to execute the method according to the first aspect when executing the executable instructions.
  • an embodiment of the present application provides a decoding device, where the decoding device includes: a receiving unit, a decoding unit, and a post-processing unit; wherein,
  • the receiving unit is configured to receive a code stream transmitted by an encoding device; wherein, the code stream is obtained by a plurality of image blocks divided by the image to be processed through a preset encoding network model;
  • the decoding unit is configured to use a preset decoding network model to parse the code stream to obtain a plurality of reconstruction blocks;
  • the post-processing unit is configured to splice the plurality of reconstructed blocks to generate a reconstructed image, and to filter the block boundaries in the reconstructed image by using a preset post-processing network model to obtain a target image.
  • an embodiment of the present application provides a decoding device, where the decoding device includes: a second memory and a second processor; wherein,
  • the second memory for storing executable instructions executable on the second processor
  • the second processor is configured to execute the method according to the second aspect when executing the executable instructions.
  • an embodiment of the present application provides an encoding device, the encoding device includes: an acquisition unit, a block unit, an encoding unit, and a sending unit; wherein,
  • the acquisition unit configured to acquire the image to be processed
  • the block unit is configured to block the to-be-processed image to obtain multiple image blocks; wherein the multiple image blocks are equal in size and do not overlap;
  • the encoding unit configured to use a preset encoding network model to encode the plurality of image blocks to generate a code stream;
  • the sending unit is configured to transmit the code stream to a decoding device.
  • an embodiment of the present application provides an encoding device, where the encoding device includes: a third memory and a third processor; wherein,
  • the third memory for storing executable instructions that can be executed on the third processor
  • the third processor is configured to execute the method according to the third aspect when executing the executable instructions.
  • an embodiment of the present application provides a computer storage medium, where an image processing program is stored in the computer storage medium; when the image processing program is executed by the first processor, the method described in the first aspect is implemented; when it is executed by the second processor, the method of the second aspect is implemented; and when it is executed by the third processor, the method of the third aspect is implemented.
  • an embodiment of the present application provides a video system, where the video system includes: an encoding device and a decoding device; wherein,
  • the encoding device is configured to obtain an image to be processed; divide the image to be processed into blocks to obtain multiple image blocks, wherein the multiple image blocks are of equal size and do not overlap; use a preset encoding network model to encode the plurality of image blocks to generate a code stream; and transmit the code stream to a decoding device;
  • the decoding device is configured to receive the code stream transmitted by the encoding device; use a preset decoding network model to parse the code stream to obtain multiple reconstructed blocks; splice the multiple reconstructed blocks to generate a reconstructed image; and use a preset post-processing network model to filter the block boundaries in the reconstructed image to obtain a target image.
  • FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a post-processing network model provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a residual block provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of using a preset post-processing network model to eliminate blockiness according to an embodiment of the present application
  • FIG. 5 is a schematic flowchart of another image processing method provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of an end-to-end network structure including a preset encoding network model and a preset decoding network model provided by an embodiment of the present application;
  • FIG. 7 is a schematic flowchart of another image processing method provided by an embodiment of the present application.
  • FIG. 8 is a detailed schematic flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 9A is a schematic diagram of a reconstructed image with block effect provided by an embodiment of the present application.
  • FIG. 9B is a schematic diagram of a reconstructed image without block effect provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram comparing image rate-distortion curves before and after a preset post-processing network model provided by an embodiment of the application;
  • FIG. 11 is a schematic diagram of the composition and structure of an image processing apparatus according to an embodiment of the present application.
  • FIG. 12 is a schematic diagram of a hardware structure of an image processing apparatus provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of the composition and structure of a decoding device provided by an embodiment of the application.
  • FIG. 14 is a schematic diagram of a hardware structure of a decoding device provided by an embodiment of the application.
  • FIG. 15 is a schematic diagram of the composition and structure of an encoding device provided by an embodiment of the application.
  • FIG. 16 is a schematic diagram of a hardware structure of an encoding device provided by an embodiment of the application.
  • FIG. 17 is a schematic structural diagram of a video system according to an embodiment of the present application.
  • an embodiment of the present application provides an image processing method, which is applied to an image processing apparatus, and the method includes:
  • the plurality of reconstruction blocks are obtained after a plurality of image blocks divided by the image to be processed pass through a preset encoding network model and a preset decoding network model;
  • a target image is obtained by filtering the block boundaries in the reconstructed image by using a preset post-processing network model.
  • the method further includes:
  • the multiple reconstructed training blocks are obtained after multiple training blocks, divided from at least one training image in the training set, pass through the preset encoding network model and the preset decoding network model;
  • a post-processing network model is constructed, and the post-processing network model is trained based on the at least one reconstructed training image to obtain the preset post-processing network model.
  • the training of the post-processing network model based on the at least one reconstructed training image to obtain the preset post-processing network model includes:
  • the post-processing network model obtained after training is determined as the preset post-processing network model.
  • performing filtering processing on block boundaries in the reconstructed image by using a preset post-processing network model to obtain a target image including:
  • the target image is obtained by replacing the corresponding local area including the block boundary in the reconstructed image with the at least one processed rectangular area.
  • the method further includes:
  • the target image is obtained by replacing the corresponding local area including the block boundary in the reconstructed image with the at least one target rectangular area.
  • an embodiment of the present application provides an image processing method, which is applied to a decoding device, and the method includes:
  • receiving the code stream transmitted by the encoding device; wherein the code stream is obtained after a plurality of image blocks divided from the image to be processed pass through a preset encoding network model;
  • a target image is obtained by filtering the block boundaries in the reconstructed image by using a preset post-processing network model.
  • the method further includes:
  • the training set includes at least one training image
  • An encoding network model and a decoding network model are constructed, and model training is performed on the encoding network model and the decoding network model based on the training set to obtain the preset encoding network model and the preset decoding network model.
  • performing model training on the encoding network model and the decoding network model based on the training set to obtain the preset encoding network model and the preset decoding network model including:
  • the encoding network model and the decoding network model obtained after training are determined as the preset encoding network model and the preset decoding network model.
  • the method further includes:
  • the multiple reconstructed training blocks are obtained after multiple training blocks, divided from at least one training image in the training set, pass through the preset encoding network model and the preset decoding network model;
  • a post-processing network model is constructed, and the post-processing network model is trained based on the at least one reconstructed training image to obtain the preset post-processing network model.
  • the training of the post-processing network model based on the at least one reconstructed training image to obtain the preset post-processing network model includes:
  • the post-processing network model obtained after training is determined as the preset post-processing network model.
  • performing filtering processing on block boundaries in the reconstructed image by using a preset post-processing network model to obtain a target image including:
  • the target image is obtained by replacing the corresponding local area including the block boundary in the reconstructed image with the at least one processed rectangular area.
  • the method further includes:
  • the target image is obtained by replacing the corresponding local area including the block boundary in the reconstructed image with the at least one target rectangular area.
  • an embodiment of the present application provides an image processing method, which is applied to an encoding device, and the method includes:
  • the code stream is transmitted to the decoding device.
  • the method further includes:
  • the training set includes at least one training image
  • the preset decoding network model is used to instruct the decoding device to parse the code stream to obtain a plurality of reconstructed blocks.
  • performing model training on the encoding network model and the decoding network model based on the training set to obtain a preset encoding network model and a preset decoding network model including:
  • the encoding network model and the decoding network model obtained after training are determined as the preset encoding network model and the preset decoding network model.
  • an embodiment of the present application provides an image processing apparatus, the image processing apparatus includes: an acquisition unit and a processing unit; wherein,
  • the obtaining unit is configured to obtain a plurality of reconstruction blocks; wherein, the plurality of reconstruction blocks are obtained after a plurality of image blocks divided by the image to be processed pass through a preset encoding network model and a preset decoding network model ;
  • the processing unit is configured to splice the plurality of reconstructed blocks to generate a reconstructed image, and to filter the block boundaries in the reconstructed image by using a preset post-processing network model to obtain a target image.
  • an embodiment of the present application provides an image processing apparatus, the image processing apparatus includes: a first memory and a first processor; wherein,
  • the first memory for storing executable instructions that can be executed on the first processor
  • the first processor is configured to execute the method according to any one of the first aspects when executing the executable instructions.
  • an embodiment of the present application provides a decoding device, where the decoding device includes: a receiving unit, a decoding unit, and a post-processing unit; wherein,
  • the receiving unit is configured to receive the code stream transmitted by the encoding device; wherein the code stream is obtained after the multiple image blocks divided from the to-be-processed image pass through the preset encoding network model;
  • the decoding unit is configured to use a preset decoding network model to parse the code stream to obtain a plurality of reconstruction blocks;
  • the post-processing unit is configured to splice the plurality of reconstructed blocks to generate a reconstructed image, and to filter the block boundaries in the reconstructed image by using a preset post-processing network model to obtain a target image.
  • an embodiment of the present application provides a decoding device, where the decoding device includes: a second memory and a second processor; wherein,
  • the second memory for storing executable instructions executable on the second processor
  • the second processor is configured to execute the method according to any one of the second aspects when running the executable instructions.
  • an embodiment of the present application provides an encoding device, the encoding device includes: an acquisition unit, a block unit, an encoding unit, and a sending unit; wherein,
  • the acquisition unit configured to acquire the image to be processed
  • the block unit is configured to block the to-be-processed image to obtain multiple image blocks; wherein the multiple image blocks are equal in size and do not overlap;
  • the encoding unit configured to use a preset encoding network model to encode the plurality of image blocks to generate a code stream;
  • the sending unit is configured to transmit the code stream to a decoding device.
  • an embodiment of the present application provides an encoding device, where the encoding device includes: a third memory and a third processor; wherein,
  • the third memory for storing executable instructions that can be executed on the third processor
  • the third processor is configured to execute the method according to any one of the third aspects when executing the executable instructions.
  • an embodiment of the present application provides a computer storage medium, where an image processing program is stored in the computer storage medium, and when the image processing program is executed by the first processor, the implementation of any one of the first aspects is implemented.
  • the method according to any one of the second aspects is implemented when executed by the second processor, or the method according to any one of the third aspects is implemented when executed by the third processor.
  • an embodiment of the present application provides a video system, where the video system includes: an encoding device and a decoding device; wherein,
  • the encoding device is configured to obtain an image to be processed; divide the image to be processed into blocks to obtain multiple image blocks, wherein the multiple image blocks are of equal size and do not overlap; use a preset encoding network model to encode the plurality of image blocks to generate a code stream; and transmit the code stream to a decoding device;
  • the decoding device is configured to receive the code stream transmitted by the encoding device; use a preset decoding network model to parse the code stream to obtain multiple reconstructed blocks; splice the multiple reconstructed blocks to generate a reconstructed image; and use a preset post-processing network model to filter the block boundaries in the reconstructed image to obtain a target image.
  • An artificial neural network (Artificial Neural Network, ANN), also referred to as a neural network or connection model, is an algorithmic mathematical model that imitates the behavioral characteristics of animal neural networks and performs distributed parallel information processing. Such a network depends on the complexity of the system, and achieves the purpose of processing information by adjusting the interconnections between a large number of internal nodes.
  • Deep learning is a branch of machine learning. It refers to algorithms that attempt to use multiple processing layers containing complex structures or multiple nonlinear transformations to perform high-level abstraction on data. Its powerful expressive ability allows it to achieve the best results on various machine learning tasks, and its performance in video and image processing also exceeds that of current related technologies.
  • the autoencoder is an important part of deep learning.
  • the neural network is trained end-to-end through a large number of data sets, which can continuously improve the accuracy.
  • the decoding (decode) process makes the input and output closer and closer, which is an unsupervised learning process.
  • this is a good starting point for the development of the image compression field, and it is also beneficial to future work in the direction of video compression.
  • the new scheme based on neural network has better performance and prospects than the traditional scheme in the whole system.
  • post-processing refers to methods designed to enhance the quality of compressed images and eliminate artificial traces, so as to improve the visual effect of the image.
  • post-processing is also widely used in video compression. Since deep learning has been widely applied to computer vision and image processing in recent years, some current research work also uses deep learning for image or video compression post-processing, and has achieved certain results.
  • an embodiment of the present application provides an image processing method.
  • the basic idea of the method is to obtain multiple reconstructed blocks, where the multiple reconstructed blocks are obtained after multiple image blocks divided from the image to be processed pass through a preset encoding network model and a preset decoding network model; splice the multiple reconstructed blocks to generate a reconstructed image; and filter the block boundaries in the reconstructed image by using a preset post-processing network model to obtain the target image.
  • in this way, the blocks are completely independent, so the preset encoding network model and the preset decoding network model can be used to realize multi-core parallel encoding and decoding processing;
  • because the size of the image input to the preset encoding network model and the preset decoding network model is reduced, the running time and running memory requirements of encoding and decoding can also be reduced; in addition, filtering the block boundaries in the reconstructed image can eliminate the block effect at the block boundaries and improve the peak signal-to-noise ratio of the reconstructed image;
  • since the application only performs post-processing on the rectangular areas around the block boundaries, the total calculation amount of the post-processing network is also reduced, and because the processing of each rectangular area is completely independent, post-processing can also be parallelized, further reducing the running time and memory requirements of single-core post-processing.
  • FIG. 1 shows a schematic flowchart of an image processing method provided by an embodiment of the present application. As shown in Figure 1, the method may include:
  • S101 Acquire multiple reconstructed blocks; wherein the multiple reconstructed blocks are obtained after multiple image blocks divided from the image to be processed pass through a preset encoding network model and a preset decoding network model.
  • the method is applied to an image processing apparatus, or a device integrated with an image processing apparatus, such as a decoding device.
  • the encoding and decoding network includes an encoding network model and a decoding network model, and the preset encoding network model and the preset decoding network model are obtained by model training based on the neural network structure.
  • the preset encoding network model is used for encoding a plurality of image blocks divided by the image to be processed to generate a code stream, and the preset decoding network model is used for parsing the code stream to obtain a plurality of reconstructed blocks.
  • the multiple image blocks have the same size and do not overlap.
  • the size of each image block may be 128*128, but the embodiment of the present application does not specifically limit it.
  • the blocks are completely independent, and the preset coding network model and the preset decoding network model can be used for parallel coding and decoding processing, and multiple reconstructed blocks can be obtained in parallel.
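  • as an illustration of the block division described above, the following is a minimal sketch (using numpy; the function name and array layout are assumptions, not taken from the application) of dividing an image into equal, non-overlapping 128*128 blocks that can then be encoded and decoded independently:

```python
import numpy as np

def split_into_blocks(image: np.ndarray, block_size: int = 128):
    """Divide an H x W x C image into equal, non-overlapping blocks.

    Assumes H and W are multiples of block_size (e.g. the 128*128 blocks
    mentioned above); padding strategies are outside this sketch.
    """
    h, w, _ = image.shape
    blocks = []
    for top in range(0, h, block_size):
        for left in range(0, w, block_size):
            blocks.append(image[top:top + block_size, left:left + block_size])
    return blocks

# each block can then be passed through the preset encoding and decoding
# network models independently, which is what enables multi-core parallelism
blocks = split_into_blocks(np.zeros((256, 256, 3), dtype=np.uint8))
```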
  • S102 Splice the multiple reconstructed blocks to generate a reconstructed image.
  • a reconstructed image may be generated by splicing.
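  • a minimal numpy sketch of this splicing step is shown below (the function name is illustrative); it simply writes the reconstructed blocks back into their raster-order positions:

```python
import numpy as np

def splice_blocks(blocks, height: int, width: int, block_size: int = 128):
    """Stitch reconstructed blocks (in raster order) back into one image."""
    out = np.zeros((height, width, blocks[0].shape[-1]), dtype=blocks[0].dtype)
    idx = 0
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            out[top:top + block_size, left:left + block_size] = blocks[idx]
            idx += 1
    return out  # the spliced result still shows block effect at the boundaries
```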
  • the reconstructed image obtained by splicing has obvious block effect; at this time, the block boundary in the reconstructed image needs to be filtered to reduce the block effect at the block boundary.
  • S103 Use a preset post-processing network model to filter the block boundaries in the reconstructed image to obtain a target image.
  • the preset post-processing network model is also obtained by model training based on the neural network structure.
  • the method may further include:
  • a post-processing network model is constructed, and the post-processing network model is trained based on the at least one reconstructed training image to obtain the preset post-processing network model.
  • the multiple reconstructed training blocks may be obtained by passing through the preset encoding network model and the preset decoding network model through multiple training blocks divided by at least one training image in the training set.
  • a training set needs to be obtained first, and the training set includes at least one training image.
  • the selection of the training set has a great influence on the training of the entire neural network.
  • a neural network image codec (Neural Image Codec, NIC) data set may be selected.
  • the NIC dataset is a development dataset of the Institute of Electrical and Electronics Engineers (IEEE) standard test model NIC based on deep learning image compression.
  • the NIC data set can be divided into a training set, a verification set and a test set; wherein the size of the images in the training set may be 256*256, and the size of the images in the verification set and the test set may also be 256*256, but the embodiment of this application does not make any limitation on this.
  • the post-processing network model adopts a neural network structure, which may be composed of convolutional layers, activation functions, and multiple cascaded residual blocks for improving model performance.
  • the network structure is shown in Figure 2.
  • the activation function can be a rectified linear unit (Rectified Linear Unit, ReLU), also known as a modified linear unit, which is a commonly used activation function in artificial neural networks and usually refers to the nonlinear function represented by the ramp function and its variants.
  • as shown in Figure 2, the reconstructed block-boundary pixels with obvious block effect are used as input and pass through the first convolutional layer, multiple residual blocks (for example, 9 cascaded residual blocks), the second convolutional layer and the third convolutional layer in sequence; an adder then superimposes the output of the third convolutional layer on the input of the first convolutional layer, so that reconstructed block-boundary pixels without obvious block effect can be output.
  • here, the first convolutional layer and the second convolutional layer include activation functions, while the third convolutional layer does not. The first convolutional layer and the second convolutional layer are denoted k3n128+ReLU, indicating that their convolution kernel size is 3*3, the number of output features is 128, the stride is 1, and an activation function is included; the third convolutional layer is denoted k3n3, indicating that its convolution kernel size is 3*3, the number of output features is 3, the stride is 1, and no activation function is included.
  • for each residual block, the network structure is shown in Figure 3. The feature map is used as input and passes through the fourth convolutional layer and the fifth convolutional layer in sequence; an adder then superimposes the output of the fifth convolutional layer on the input of the fourth convolutional layer to obtain the output feature map.
  • the fourth convolutional layer includes an activation function and the fifth convolutional layer does not. The fourth convolutional layer is denoted k3n128+ReLU, indicating that its convolution kernel size is 3*3, the number of output features is 128, the stride is 1, and an activation function is included; the fifth convolutional layer is denoted k3n128, indicating that its convolution kernel size is 3*3, the number of output features is 128, the stride is 1, but no activation function is included.
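  • the structure described above can be sketched as follows in PyTorch (a minimal illustration; the class and variable names are assumptions, and only the layer choices follow the k3n128/k3n3 notation in the text):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """k3n128+ReLU followed by k3n128, with a local skip connection (Figure 3)."""
    def __init__(self, channels: int = 128):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return x + out  # adder superimposes the block output on its input

class PostProcessingNet(nn.Module):
    """Figure 2 structure: k3n128+ReLU, 9 cascaded residual blocks,
    k3n128+ReLU, k3n3, plus a global skip connection to the network input."""
    def __init__(self, num_residual_blocks: int = 9):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(3, 128, 3, 1, 1), nn.ReLU(inplace=True))
        self.body = nn.Sequential(*[ResidualBlock(128) for _ in range(num_residual_blocks)])
        self.tail = nn.Sequential(
            nn.Conv2d(128, 128, 3, 1, 1), nn.ReLU(inplace=True),  # second convolutional layer
            nn.Conv2d(128, 3, 3, 1, 1),                           # third convolutional layer, no activation
        )

    def forward(self, x):
        return x + self.tail(self.body(self.head(x)))
```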
  • the training set and the preset algorithm can be used to perform model training on the post-processing network model.
  • the training of the post-processing network model based on the at least one reconstructed training image to obtain the preset post-processing network model may include:
  • the post-processing network model obtained after training is determined as the preset post-processing network model.
  • a 256*256 training image in the training set is first divided into equal, non-overlapping 128*128 training blocks, which are input to the preset encoding network model and the preset decoding network model; the obtained reconstructed blocks are then re-spliced into a 256*256 reconstructed training image with blockiness.
  • the reconstructed training image with block effect can be used as the training input image of the post-processing network model, and the training image in the training set can be used as the training target image of the post-processing network model;
  • the cost function can be a rate-distortion cost function, and the degree of distortion is the mean square error between the training input image and the training target image.
  • the preset algorithm may be an adaptive moment estimation (Adaptive moment estimation, Adam) gradient optimization algorithm.
  • the Adam gradient optimization algorithm is an extension of the stochastic gradient descent method, widely used in deep learning applications in computer vision and natural language processing. It works well and can achieve good results quickly.
  • the post-processing network is trained using the Adam gradient optimization algorithm; the network parameters of the preset encoding network model and the preset decoding network model are kept fixed during training, and only the post-processing network model is iteratively updated. After the loss value corresponding to the cost function converges to the preset threshold, the post-processing network model obtained by training at this time is the preset post-processing network model.
  • the preset threshold is specifically set according to the actual situation, and is not limited in any embodiment of the present application.
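  • a minimal sketch of this training step, reusing the PostProcessingNet class from the sketch above (the learning rate, batch shapes and the in-memory one-batch loader are assumptions used only for illustration):

```python
import torch
import torch.nn as nn

post_net = PostProcessingNet()
optimizer = torch.optim.Adam(post_net.parameters(), lr=1e-4)  # learning rate is an assumption
mse = nn.MSELoss()

# each pair: (re-spliced reconstruction with blockiness, original training image)
loader = [(torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256))]

for degraded, target in loader:
    optimizer.zero_grad()
    restored = post_net(degraded)
    # the preset encoding/decoding models are frozen, so the rate term of the
    # rate-distortion cost is constant and the loss reduces to the MSE distortion
    loss = mse(restored, target)
    loss.backward()
    optimizer.step()
```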
  • performing filtering processing on block boundaries in the reconstructed image by using a preset post-processing network model to obtain a target image may include:
  • the target image is obtained by replacing the corresponding local area including the block boundary in the reconstructed image with the at least one processed rectangular area.
  • for a vertical block boundary, the specific range of the rectangular area is: in the horizontal direction, from 16 pixels on the left side of the block boundary to 16 pixels on the right side of the block boundary; in the vertical direction, from the upper edge to the lower edge of the reconstructed block.
  • for a horizontal block boundary, the specific range of the rectangular area is: in the vertical direction, from 16 pixels on the upper side of the block boundary to 16 pixels on the lower side of the block boundary; in the horizontal direction, from the left edge to the right edge of the reconstructed block.
  • in both cases the size of the rectangular area is 32*128, where both 32 and 128 are measured in pixels; that is, the sizes of the reconstructed image and the reconstructed blocks are expressed in numbers of pixels.
  • the at least one rectangular area can be input into a preset post-processing network model to obtain at least one processed rectangular area;
  • the processed rectangular area replaces the corresponding local area including the block boundary in the reconstructed image to obtain the target image.
  • the target image obtained at this time can reduce the blocking effect.
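  • the extraction of these 32*128 rectangular areas can be sketched as follows (numpy; the function name and the returned (position, strip) layout are assumptions used only for illustration):

```python
import numpy as np

def extract_boundary_strips(recon: np.ndarray, block_size: int = 128, margin: int = 16):
    """Collect the 32*128 rectangular areas straddling every block boundary.

    Vertical boundaries yield strips that are 32 pixels wide and one block
    high; horizontal boundaries are handled symmetrically. Positions are
    returned so the processed strips can be written back later.
    """
    h, w, _ = recon.shape
    strips = []
    for x in range(block_size, w, block_size):        # vertical boundaries
        for y in range(0, h, block_size):
            strips.append(((y, x - margin), recon[y:y + block_size, x - margin:x + margin]))
    for y in range(block_size, h, block_size):        # horizontal boundaries
        for x in range(0, w, block_size):
            strips.append(((y - margin, x), recon[y - margin:y + margin, x:x + block_size]))
    return strips
```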
  • the method may further include:
  • the target image is obtained by replacing the corresponding local area including the block boundary in the reconstructed image with the at least one target rectangular area.
  • specifically, for each processed 32*128 rectangular area, the edge pixel areas with a width of 8 pixels and a height of 128 pixels on the left and right sides are discarded, and only the central rectangular area with a size of 16*128 is kept, so that at least one target rectangular area is obtained by cropping; finally, the at least one target rectangular area is used to replace the corresponding local area including the block boundary in the reconstructed image, and the target image without obvious block effect can be obtained.
  • FIG. 4 shows a schematic structural diagram of using a preset post-processing network model to eliminate blockiness according to an embodiment of the present application.
  • as shown in Fig. 4, for a reconstructed image with obvious block effect, at least one rectangular area including a block boundary is first extracted (the gray rectangular areas shown in Fig. 4); the extracted at least one rectangular area is then input into the preset post-processing network model, which outputs at least one processed rectangular area; next, through cropping, the edge pixel areas on the left and right sides of each processed rectangular area are discarded to obtain at least one target rectangular area; finally, the at least one target rectangular area replaces the corresponding local area including the block boundary in the reconstructed image, and a reconstructed image without obvious block effect can be obtained.
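  • for a vertical boundary, the crop-and-replace step just described might look like the following numpy sketch (the horizontal case is the transposed analogue; the function and parameter names are assumptions):

```python
import numpy as np

def write_back_vertical_strip(recon: np.ndarray, processed: np.ndarray,
                              top: int, left: int, keep: int = 16):
    """Crop a processed 128 x 32 vertical-boundary strip to its central
    16-pixel-wide band (dropping 8 edge pixels per side) and write it back
    into the reconstructed image at the position it was extracted from."""
    crop = (processed.shape[1] - keep) // 2            # 8 pixels discarded per side
    center = processed[:, crop:crop + keep]
    recon[top:top + processed.shape[0], left + crop:left + crop + keep] = center
    return recon
```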
  • This embodiment provides an image processing method: multiple reconstructed blocks are acquired, where the multiple reconstructed blocks are obtained after multiple image blocks divided from the image to be processed pass through a preset encoding network model and a preset decoding network model; the multiple reconstructed blocks are spliced to generate a reconstructed image; and a preset post-processing network model is used to filter the block boundaries in the reconstructed image to obtain a target image.
  • in this way, the blocks are completely independent, so the preset encoding network model and the preset decoding network model can be used to realize multi-core parallel encoding and decoding processing;
  • because the size of the image input to the preset encoding network model and the preset decoding network model is reduced, the running time and running memory requirements of encoding and decoding can also be reduced; in addition, filtering the block boundaries in the reconstructed image can eliminate the block effect at the block boundaries and improve the peak signal-to-noise ratio of the reconstructed image; at the same time, because only the rectangular areas around the block boundaries are post-processed, the total calculation amount of the post-processing network is also reduced, and the processing of each rectangular area is completely independent;
  • therefore, post-processing can also be parallelized, further reducing the single-core post-processing running time and memory requirements.
  • FIG. 5 shows a schematic flowchart of another image processing method provided by the embodiment of the present application. As shown in Figure 5, the method may include:
  • S501 Receive a code stream transmitted by an encoding device; wherein the code stream is obtained after a plurality of image blocks divided from the image to be processed pass through a preset encoding network model.
  • S502 Use a preset decoding network model to parse the code stream to obtain a plurality of reconstruction blocks.
  • this method is applied to a decoding device.
  • the encoding device performs compression encoding through the preset encoding network model to generate a code stream
  • the code stream can be transmitted to the decoding device, and the decoding device uses the preset decoding network model to parse the code stream, thereby obtaining multiple reconstruction blocks.
  • the preset encoding network model and the preset decoding network model are obtained by model training based on the neural network structure.
  • the preset encoding network model is used to instruct the encoding device to encode the multiple image blocks divided from the image to be processed to generate a code stream, and the preset decoding network model is used to instruct the decoding device to parse the code stream to obtain multiple reconstructed blocks.
  • the method may further include:
  • the training set includes at least one training image
  • An encoding network model and a decoding network model are constructed, and model training is performed on the encoding network model and the decoding network model based on the training set to obtain the preset encoding network model and the preset decoding network model.
  • a training set needs to be obtained first, and the training set includes at least one training image.
  • canonical high-definition static image datasets, such as the NIC dataset, can be collected and organized; then, according to the NIC dataset, a training set for model training, as well as a test set and a cross-validation set for model testing and model validation, can be obtained.
  • FIG. 6 shows a schematic diagram of an end-to-end network structure including a preset coding network model and a preset decoding network model provided by an embodiment of the present application.
  • the encoding end adopts an encoding network model structure, which may include a main encoder, a Hyper prior encoder and a context model. Among them, for the input image, it can be divided into multiple image blocks.
  • the role of the main encoder is to transform the input original image into a feature map with 192 channels, and the row and column dimensions are 1/16 of the original size respectively.
  • the role of the super-prior codec and context model is to estimate the probability distribution of pixels in the feature map from the feature map, and provide the probability distribution to the entropy encoder.
  • the entropy encoder here can use arithmetic coding, and it is lossless entropy coding and compression.
  • the feature map generated by the main encoder can be quantized by the rounding quantization module, and the entropy encoder performs lossless entropy coding (such as arithmetic coding) on the quantized feature map using the probability distribution provided by the super-prior encoder, the super-prior decoder and the context model, so as to form the code stream; the compressed data generated by the super-prior encoder uses a fixed probability distribution for probability calculation and, after passing through the entropy encoder, is added to the final code stream as additional information.
  • the decoding end adopts a decoding network model structure, which may include a main decoder, a super-a priori decoder and a context model.
  • the role of the super-prior decoder and the context model is to decode the probability distribution of the pixels in the feature map from the additional information and provide it to the entropy decoder, while the role of the main decoder is to restore the feature map to reconstructed blocks, which are then stitched into a reconstructed image.
  • in short, the main encoder is used to convert the image from the pixel domain into the feature domain, the super-prior encoder is used to convert the feature domain into a probability distribution, the super-prior decoder is used to convert the probability distribution back into the feature domain, and the main decoder converts the feature domain back into the pixel domain to obtain the reconstructed image.
  • the probability distribution of the context model can be represented by (μ, σ); where μ represents the mean, and σ represents the variance.
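  • as a rough illustration of the main encoder described above (the layer count, kernel size and activations are assumptions; only the 192-channel output and the 1/16 downsampling factor come from the text):

```python
import torch
import torch.nn as nn

class MainEncoder(nn.Module):
    """Illustrative analysis transform: four stride-2 convolutions bring an
    H x W x 3 input to a 192-channel feature map of size H/16 x W/16."""
    def __init__(self, channels: int = 192):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(3, channels, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 5, stride=2, padding=2),
        )

    def forward(self, x):
        return self.layers(x)

y = MainEncoder()(torch.rand(1, 3, 128, 128))  # -> torch.Size([1, 192, 8, 8])
```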
  • model training can be performed on the encoding network model and the decoding network model by using the training set and the preset algorithm.
  • performing model training on the encoding network model and the decoding network model based on the training set to obtain the preset encoding network model and the preset decoding network model can include:
  • the encoding network model and the decoding network model obtained after training are determined as the preset encoding network model and the preset decoding network model.
  • the preset algorithm may be the Adam gradient optimization algorithm.
  • the Adam gradient optimization algorithm is used to train the end-to-end network structure of the encoding network model and the decoding network model.
  • the cost function may be a rate-distortion cost function, and the degree of distortion is the mean square error between the training image input by the network structure and the reconstructed image output by the network structure.
  • the code rate is estimated by calculating the amount of information contained in the pixels in the feature map using the probability distribution obtained by the super-prior encoder, the super-prior decoder and the context model.
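  • a minimal PyTorch sketch of this rate estimate and of a rate-distortion cost built from it (the Gaussian entropy model, the lambda value and the function names are assumptions used only to illustrate the idea):

```python
import torch

def estimate_rate(y_hat, mu, sigma):
    """Estimate the bit cost of a quantized feature map from its Gaussian
    parameters: the probability of each integer value is the Gaussian mass
    on [y-0.5, y+0.5], and the rate is its negative log2."""
    dist = torch.distributions.Normal(mu, sigma.clamp(min=1e-6))
    prob = dist.cdf(y_hat + 0.5) - dist.cdf(y_hat - 0.5)
    return -torch.log2(prob.clamp(min=1e-9)).sum()

def rate_distortion_loss(x, x_hat, y_hat, mu, sigma, lam=0.01):
    """Rate-distortion cost R + lambda * D with MSE distortion."""
    rate = estimate_rate(y_hat, mu, sigma) / x.numel()
    distortion = torch.mean((x - x_hat) ** 2)
    return rate + lam * distortion
```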
  • the encoding network model and the decoding network model are fully trained by using the training set; after the loss value corresponding to the cost function converges to the preset threshold, the encoding network model and the decoding network model obtained at this time are saved as the preset encoding network model and the preset decoding network model of the end-to-end network structure in the embodiments of the present application.
  • the post-processing network model is trained.
  • the method may further include:
  • the multiple reconstructed training blocks are obtained after multiple training blocks, divided from at least one training image in the training set, pass through the preset encoding network model and the preset decoding network model;
  • a post-processing network model is constructed, and the post-processing network model is trained based on the at least one reconstructed training image to obtain the preset post-processing network model.
  • the training of the post-processing network model based on the at least one reconstructed training image to obtain the preset post-processing network model may include:
  • the post-processing network model obtained after training is determined as the preset post-processing network model.
  • the Adam gradient optimization algorithm can also be used.
  • each training image in the training set can be divided into multiple training blocks of equal size and without overlap; after these are input to the preset encoding network model and the preset decoding network model, the obtained multiple reconstructed blocks are re-spliced into a reconstructed training image with blockiness.
  • the reconstructed training image with block effect can be used as the training input image of the post-processing network model, and the training image in the training set can be used as the training target image of the post-processing network model;
  • the mean square error between the training input image and the training target image can be used to construct the cost function for model training.
  • the network parameters of the preset encoding network model and the preset decoding network model are kept fixed, and only the post-processing network model is updated iteratively.
  • the post-processing network model obtained by training at this time is the preset post-processing network model.
  • the preset threshold is specifically set according to the actual situation, and is not limited in any embodiment of the present application.
  • S503 Splice the multiple reconstructed blocks to generate a reconstructed image.
  • S504 Use a preset post-processing network model to filter the block boundaries in the reconstructed image to obtain a target image.
  • the preset post-processing network model can be used to filter the block boundaries in the reconstructed image, so as to obtain a target image that eliminates blockiness.
  • performing filtering processing on block boundaries in the reconstructed image by using a preset post-processing network model to obtain a target image may include:
  • the target image is obtained by replacing the corresponding local area including the block boundary in the reconstructed image with the at least one processed rectangular area.
  • the method may further include:
  • the target image is obtained by replacing the corresponding local area including the block boundary in the reconstructed image with the at least one target rectangular area.
  • the at least one rectangular area can be input into a preset post-processing network model to obtain at least one processed rectangular area;
  • a processed rectangular area replaces the corresponding local area including the block boundary in the reconstructed image to obtain the target image.
  • the target image obtained at this time can reduce the blocking effect.
  • that is, the at least one processed rectangular area needs to be cropped, for example by discarding the edge pixel areas on the left and right sides, to obtain at least one target rectangular area; replacing the corresponding local area in the reconstructed image with the at least one target rectangular area can then produce the target image without obvious block effect.
  • This embodiment provides an image processing method: a code stream transmitted by an encoding device is received, where the code stream is obtained after a plurality of image blocks divided from the image to be processed pass through a preset encoding network model; a preset decoding network model is used to parse the code stream to obtain multiple reconstructed blocks; the multiple reconstructed blocks are spliced to generate a reconstructed image; and a preset post-processing network model is used to filter the block boundaries in the reconstructed image to obtain the target image.
  • the blocks are completely independent, and the preset encoding network model and the preset decoding network model can be used to realize multi-core parallel encoding and decoding processing;
  • because the size of the image input to the preset encoding network model and the preset decoding network model is reduced, the running time and running memory requirements of encoding and decoding can also be reduced; in addition, filtering the block boundaries in the reconstructed image can eliminate the block effect at the block boundaries and improve the peak signal-to-noise ratio of the reconstructed image; at the same time, because only the rectangular areas around the block boundaries are post-processed, the total calculation amount of the post-processing network is also reduced, and the processing of each rectangular area is completely independent.
  • the parallelization of post-processing can also be achieved, further reducing the single-core post-processing runtime and memory requirements.
  • FIG. 7 shows a schematic flowchart of another image processing method provided by the embodiment of the present application. As shown in Figure 7, the method may include:
  • S701 Acquire an image to be processed.
  • S702 Divide the to-be-processed image into blocks to obtain multiple image blocks; wherein the multiple image blocks are of equal size and do not overlap.
  • S703 Use a preset coding network model to encode the plurality of image blocks to generate a code stream.
  • this method is applied to an encoding device. After the encoding device performs compression encoding through the preset encoding network model to generate a code stream, the code stream can be transmitted to the decoding device, and the decoding device uses the preset decoding network model to parse the code stream, thereby obtaining multiple reconstruction blocks.
  • the preset encoding network model and the preset decoding network model are obtained by model training based on the neural network structure.
  • the preset encoding network model is used to instruct the encoding device to encode the multiple image blocks divided from the image to be processed to generate a code stream, and the preset decoding network model is used by the decoding device to parse the code stream to obtain multiple reconstructed blocks.
  • the method may further include:
  • the training set includes at least one training image
  • An encoding network model and a decoding network model are constructed, and model training is performed on the encoding network model and the decoding network model based on the training set to obtain a preset encoding network model and a preset decoding network model.
  • performing model training on the encoding network model and the decoding network model based on the training set to obtain a preset encoding network model and a preset decoding network model may include:
  • the encoding network model and the decoding network model obtained after training are determined as the preset encoding network model and the preset decoding network model.
  • the Adam gradient optimization algorithm can be used to train the encoding network model and the decoding network model.
  • the cost function may be a rate-distortion cost function
  • the degree of distortion is the mean square error between the training image input by the network structure and the reconstructed image output by the network structure.
  • the encoding network model and the decoding network model obtained by training at this time are the preset encoding network model and the preset decoding network model in the embodiments of this application.
  • This embodiment provides an image processing method: an image to be processed is acquired; the image to be processed is divided into blocks to obtain multiple image blocks, wherein the multiple image blocks are of equal size and do not overlap; the multiple image blocks are encoded by using a preset encoding network model to generate a code stream; and the code stream is transmitted to a decoding device.
  • In this way, for the multiple image blocks divided from the image to be processed, the blocks are completely independent of each other, and the preset encoding network model and the preset decoding network model can be used to realize multi-core parallel encoding and decoding;
  • moreover, because the size of the images input to the preset encoding network model and the preset decoding network model after blocking is reduced, the running time and running memory requirements of encoding and decoding are also reduced.
  • FIG. 8 shows a detailed schematic flowchart of an image processing method provided by an embodiment of the present application. As shown in Figure 8, the detailed process may include:
  • S801 Select an appropriate static image training set.
  • the selection of the training set has a great influence on the training of the entire neural network.
  • the NIC data set can be selected.
  • the NIC data set is the development data set of NIC, the IEEE standard test model for deep-learning-based image compression. This data set can include a training set with an image size of 256*256, as well as a validation set and a test set.
  • S802 Establish a multi-layer deep neural network model, including an encoding network model, a decoding network model, and a post-processing network model.
  • the encoding end adopts the encoding network model structure, including the main encoder, the hyper-prior encoder and the context model.
  • the role of the main encoder is to transform the input image into a feature map with 192 channels whose row and column sizes are each 1/16 of the original size.
  • the role of the hyper-prior encoder/decoder and the context model is to estimate, from the feature map, the probability distribution of the pixels in the feature map and provide it to the entropy encoder.
  • the compressed data generated by the hyper-prior encoder uses a fixed probability distribution for probability calculation, and is added to the final compressed code stream as additional information after entropy encoding.
  • the decoding end adopts the decoding network model structure, including the main decoder, the hyper-prior decoder and the context model.
  • the role of the hyper-prior decoder and the context model is to decode, from the additional information, the probability distribution of the pixels in the feature map and provide it to the entropy decoder.
  • the role of the main decoder is to restore the feature map to a reconstructed image.
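  • As an illustrative, non-limiting sketch of the shapes described above, the following Python/PyTorch code shows a main encoder built from four stride-2 convolutions, which turns an input block into a 192-channel feature map whose rows and columns are 1/16 of the original size, together with a mirrored main decoder; the exact layer configuration, kernel sizes and the omitted hyper-prior/context branches are assumptions, not the embodiment's exact network.

```python
# Minimal PyTorch sketch of the shapes described above: four stride-2
# convolutions give a 192-channel feature map at 1/16 of the input size per
# side, and a mirrored decoder restores the block. Layer count, kernel size and
# the omitted hyper-prior/context branches are assumptions for illustration.
import torch
import torch.nn as nn

class MainEncoder(nn.Module):
    def __init__(self, channels=192):
        super().__init__()
        layers, in_ch = [], 3
        for _ in range(4):                       # 2**4 = 16x downsampling
            layers += [nn.Conv2d(in_ch, channels, 5, stride=2, padding=2), nn.ReLU()]
            in_ch = channels
        self.net = nn.Sequential(*layers[:-1])   # no activation after the last layer
    def forward(self, x):
        return self.net(x)

class MainDecoder(nn.Module):
    def __init__(self, channels=192):
        super().__init__()
        layers = []
        for i in range(4):
            out_ch = 3 if i == 3 else channels
            layers += [nn.ConvTranspose2d(channels, out_ch, 5, stride=2,
                                          padding=2, output_padding=1), nn.ReLU()]
        self.net = nn.Sequential(*layers[:-1])
    def forward(self, y):
        return self.net(y)

x = torch.rand(1, 3, 128, 128)     # one 128*128 image block
y = MainEncoder()(x)               # -> (1, 192, 8, 8), i.e. 1/16 per side
x_hat = MainDecoder()(y)           # -> (1, 3, 128, 128)
```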
  • As shown in FIG. 2, the post-processing network model can be composed of convolutional layers, an activation function, and multiple cascaded residual blocks used to improve the performance of the model.
  • the specific network structure inside the residual block is shown in Figure 3.
  • k3n128 represents a convolutional layer with a convolution kernel size of 3*3, an output feature number of 128, and a stride of 1;
  • k3n3 represents a convolutional layer with a convolution kernel size of 3*3, an output feature number of 3, and a stride of 1.
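  • The following is a minimal sketch of the post-processing network of FIG. 2 and the residual block of FIG. 3 using the k3n128/k3n3 layers just described; the number of cascaded residual blocks (nine here) follows the example mentioned elsewhere in the description and is otherwise an assumption.

```python
# Minimal sketch of the post-processing network of FIG. 2 and the residual block
# of FIG. 3: k3n128 = 3*3 convolution, 128 output features, stride 1 (followed by
# ReLU where stated); k3n3 = 3*3 convolution, 3 output features, no activation.
# The number of cascaded residual blocks (9) is taken as an assumption.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch=128):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)    # k3n128 + ReLU
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)    # k3n128, no activation
    def forward(self, x):
        return x + self.conv2(torch.relu(self.conv1(x)))  # adder: input + branch

class PostProcessingNet(nn.Module):
    def __init__(self, num_blocks=9):
        super().__init__()
        self.head = nn.Conv2d(3, 128, 3, padding=1)     # first k3n128 + ReLU
        self.blocks = nn.Sequential(*[ResidualBlock() for _ in range(num_blocks)])
        self.conv2 = nn.Conv2d(128, 128, 3, padding=1)  # second k3n128 + ReLU
        self.tail = nn.Conv2d(128, 3, 3, padding=1)     # k3n3, no activation
    def forward(self, x):
        f = torch.relu(self.head(x))
        out = self.tail(torch.relu(self.conv2(self.blocks(f))))
        return x + out                                   # adder: network input + output

patch = torch.rand(1, 3, 128, 32)   # a 32*128 block-boundary rectangle
deblocked = PostProcessingNet()(patch)
```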
  • S803 Use the training set and the preset algorithm to perform model training on the encoding network model and the decoding network model to obtain a preset encoding network model and a preset decoding network model.
  • the Adam gradient optimization algorithm may be used to perform model training on the end-to-end encoding network model and decoding network model.
  • the cost function is the rate-distortion cost function, where the distortion is the mean square error between the training image input to the network structure and the reconstructed image output by the network structure, and the code rate is estimated by calculating the amount of information contained in the pixels of the feature map according to the probability distribution obtained by the model.
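  • A hedged sketch of such a rate-distortion cost is shown below; the trade-off weight `lam` and the name `likelihoods` (the probabilities assigned to the quantized feature-map elements) are illustrative assumptions rather than values taken from the embodiment.

```python
# A rate-distortion cost of the kind described above: distortion is the MSE
# between the input training image and the reconstruction, and the rate is the
# information content (in bits) of the quantized feature map under the learned
# probability model. The weighting `lam` and the name `likelihoods` are assumptions.
import torch

def rate_distortion_loss(x, x_hat, likelihoods, lam=0.01):
    num_pixels = x.shape[0] * x.shape[2] * x.shape[3]
    rate = -torch.log2(likelihoods).sum() / num_pixels     # estimated bits per pixel
    distortion = torch.mean((x - x_hat) ** 2)               # mean square error
    return rate + lam * distortion

# Typical use with the Adam optimiser mentioned above (learning rate assumed):
#   opt = torch.optim.Adam(codec_parameters, lr=1e-4)
#   loss = rate_distortion_loss(x, x_hat, likelihoods); loss.backward(); opt.step()
```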
  • S804 Based on the preset encoding network model and the preset decoding network model obtained by training, use the training set and the preset algorithm to perform model training on the post-processing network model to obtain a preset post-processing network model.
  • model training is performed on the post-processing network model using the preset encoding network model and the preset decoding network model saved in step S803.
  • for the 256*256 training images in the training set, after each of them is divided into equal-size, non-overlapping 128*128 training blocks and input into the preset encoding network model and the preset decoding network model, the obtained reconstructed blocks are re-spliced into 256*256 reconstructed training images with blocking effect.
  • the reconstructed training image with blocking effect can be used as the training input image of the post-processing network model, and the original uncompressed training image in the training set can be used as the training target image of the post-processing network model;
  • the cost function of the model training is constructed from the mean square error between the training input image and the training target image, and the Adam gradient optimization algorithm is used to train the post-processing network.
  • during training, the network parameters of the preset encoding network model and the preset decoding network model are kept fixed, and only the post-processing network model is iteratively updated. After the loss value corresponding to the cost function converges, the post-processing network model obtained by training at this time is the preset post-processing network model.
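  • The following sketch illustrates one such training step with the codec frozen; `encode_decode_blocks` is a hypothetical helper (not part of the described embodiment) that splits a training image into 128*128 blocks, runs them through the fixed preset encoding and decoding network models and splices the reconstructed blocks back together.

```python
# One training step for the post-processing network: the codec stays frozen, the
# blocky re-spliced reconstruction is the input, the original training image is
# the target, and the MSE cost is minimised with Adam (only post_net is updated).
import torch

def train_post_processing_step(post_net, optimizer, target_image, encode_decode_blocks):
    with torch.no_grad():                               # codec parameters stay fixed
        blocky_recon = encode_decode_blocks(target_image)
    restored = post_net(blocky_recon)
    loss = torch.mean((restored - target_image) ** 2)   # MSE cost function
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# optimizer = torch.optim.Adam(post_net.parameters(), lr=1e-4)   # assumed learning rate
```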
  • S805 Divide the to-be-processed image into 128*128 image blocks of equal size without overlapping, and input a preset coding network model to generate a code stream to be transmitted.
  • the image to be processed can be divided into multiple image blocks of equal size without overlapping, and these image blocks are input into the preset encoding network model to generate a code stream; specifically, the output data of the preset encoding network model is quantized, losslessly entropy encoded, and output as compressed data.
  • the image to be processed is divided into 128*128 non-overlapping image blocks, which are then input into the preset encoding network model, and the preset encoding network model is used to encode each image block independently to generate a feature map.
  • the feature map is quantized by rounding, and the entropy encoder uses the probability distribution provided by the hyper-prior encoder, the hyper-prior decoder and the context model to perform lossless entropy coding (such as arithmetic coding) on the quantized feature map to form a code stream, which is superimposed with the additional code stream generated by the hyper-prior encoder as the final compressed data and then transmitted to the decoding end in the form of a code stream.
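  • A simplified sketch of this per-block encoding flow is given below; `encoder` stands for the preset encoding network model and `entropy_encode` is a placeholder for the arithmetic coder driven by the hyper-prior and context probability model, both of which are assumptions for illustration.

```python
# Simplified sketch of step S805: split the image into non-overlapping 128*128
# blocks, encode each block independently with the trained encoder, quantize the
# feature map by rounding, and hand it to an entropy coder.
import numpy as np
import torch

def split_into_blocks(image, block=128):
    """image: (H, W, 3) uint8 array; H and W are assumed divisible by `block`."""
    h, w, _ = image.shape
    return [image[r:r + block, c:c + block]
            for r in range(0, h, block)
            for c in range(0, w, block)]

def encode_blocks(image, encoder, entropy_encode):
    bitstreams = []
    for blk in split_into_blocks(image):
        x = torch.from_numpy(np.ascontiguousarray(blk))
        x = x.permute(2, 0, 1).unsqueeze(0).float() / 255.0
        with torch.no_grad():
            y = encoder(x)                    # 192-channel feature map, 1/16 size per side
        y_hat = torch.round(y)                # quantization by rounding
        bitstreams.append(entropy_encode(y_hat))  # lossless entropy coding, e.g. arithmetic coding
    return bitstreams                         # one code stream per block (hyper-prior side info omitted)
```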
  • S806 Parse the code stream by using a preset decoding network model to obtain 128*128 reconstructed blocks, and splice them to generate a reconstructed image.
  • the decoding end, in a manner symmetrical to the encoding end, reconstructs the feature map of each block into a 128*128 reconstructed block through the entropy decoder and the preset decoding network model, and finally splices the reconstructed blocks into a reconstructed image with obvious blocking effect.
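  • The decoding side can be sketched in a symmetrical way; `decoder` and `entropy_decode` below are placeholders for the preset decoding network model and its entropy decoder, and the row-major block layout is an assumption.

```python
# Sketch of step S806: entropy-decode each block's feature map, reconstruct a
# 128*128 block with the decoder, and splice the blocks back into a full image.
import numpy as np
import torch

def decode_and_splice(bitstreams, decoder, entropy_decode, grid_hw, block=128):
    rows, cols = grid_hw                        # number of block rows and columns
    canvas = np.zeros((rows * block, cols * block, 3), dtype=np.float32)
    for idx, bs in enumerate(bitstreams):
        y_hat = entropy_decode(bs)              # quantized feature map of one block
        with torch.no_grad():
            x_hat = decoder(y_hat)[0].permute(1, 2, 0).clamp(0, 1).numpy()
        r, c = divmod(idx, cols)
        canvas[r * block:(r + 1) * block, c * block:(c + 1) * block] = x_hat
    return canvas                               # reconstruction with visible block boundaries
```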
  • S807 Use a preset post-processing network model to perform local post-processing on the block boundaries in the reconstructed image to obtain a target image.
  • In step S807, local post-processing is performed on the block boundaries of the reconstructed image obtained in step S806. Specifically, rectangular areas near the block boundaries are extracted in the manner shown in FIG. 4.
  • for a horizontal boundary, the specific range of the rectangular area is: horizontally, from 16 pixels to the left of the block boundary to 16 pixels to the right of the block boundary; vertically, from the upper edge of the block to the lower edge of the block.
  • for a vertical boundary, the specific range of the rectangular area is: vertically, from 16 pixels above the block boundary to 16 pixels below the block boundary; horizontally, from the left edge of the block to the right edge of the block.
  • for 128*128 reconstructed blocks, the size of each rectangular area is 32*128 pixels.
  • the embodiment of the present application may further crop the rectangular area output by the preset post-processing network model, for example, discarding the edge pixel areas with a width of 8 pixels and a height of 128 pixels on the left and right sides, and keeping only the central rectangular area with a size of 16*128.
  • the corresponding block boundary rectangular area in the original reconstructed image is then replaced by this 16*128 rectangular area, and a reconstructed image without obvious blocking effect can be obtained.
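  • A sketch of this extract, post-process, crop and replace procedure for the boundaries between horizontally adjacent 128*128 blocks is given below (boundaries between vertically adjacent blocks are handled symmetrically); `post_process` stands for the preset post-processing network model.

```python
# Local deblocking as in S807 for boundaries between horizontally adjacent
# 128*128 blocks: a strip covering 16 pixels on each side of the boundary and
# the full block height (32*128) is run through the post-processing network,
# the 8-pixel margins of its output are discarded, and the central 16-pixel-wide
# strip replaces the same area in the reconstruction.
import numpy as np

def deblock_column_boundaries(recon, post_process, block=128, half=16, keep=8):
    """recon: (H, W, 3) spliced reconstruction; post_process: strip -> strip."""
    h, w, _ = recon.shape
    out = recon.copy()
    for x in range(block, w, block):                      # each boundary column
        for y in range(0, h, block):                      # one strip per block row
            strip = recon[y:y + block, x - half:x + half]     # 128 x 32 region
            restored = post_process(strip)                    # same shape as input
            centre = restored[:, keep:2 * half - keep]        # central 16 columns
            out[y:y + block, x - keep:x + keep] = centre      # replace in place
    return out
```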
  • To sum up, the embodiments of the present application provide a block-based encoding and decoding scheme for static images: by dividing the input image into blocks and then independently encoding and decoding each image block, multi-core parallel encoding and decoding of images can be realized, thereby reducing the running time required for encoding and decoding the image and the running memory requirement of each core; in addition, local post-processing at the block boundaries of the reconstructed image can reduce the blocking effect at those boundaries.
  • the specific steps are as follows: (1) select an appropriate static image training set, validation set and test set; (2) establish an end-to-end encoding network model, a decoding network model and a post-processing network model for reconstructed images; (3) train the encoding network model and the decoding network model of the end-to-end network, and obtain a preset encoding network model and a preset decoding network model after training; (4) divide the training images in the training set into non-overlapping 128*128 blocks, input them into the trained preset encoding network model and preset decoding network model, splice the decoded reconstructed blocks into reconstructed images, use these as new training data to train the post-processing network model, and obtain a preset post-processing network model after training; (5) at the encoding end, the output data of the preset encoding network model is quantized and losslessly entropy encoded to serve as compressed data, which is transmitted to the decoding end in the form of a code stream; (6) at the decoding end, the code stream is parsed by the preset decoding network model to obtain reconstructed blocks, the reconstructed blocks are spliced into a reconstructed image, and the preset post-processing network model performs local post-processing on the block boundaries to obtain the target image.
  • the input image is processed in blocks on the basis of the existing image encoding and decoding network structure, and the blocks are independently encoded and decoded to realize multi-core parallel encoding and decoding, which can reduce the running time and the memory requirement of single-core operation.
  • local post-processing is performed on the rectangular areas at the block boundaries of the reconstructed image, which can reduce the total calculation amount; at the same time, each rectangular area is processed completely independently, so the parallelization of post-processing can be realized, thereby reducing the post-processing running time and the single-core memory requirement.
  • the technical solutions of the embodiments of the present application can realize multi-core parallel encoding and decoding of images, and reduce the single-core encoding and decoding running time and running memory requirements. In the encoding and decoding process based on the preset encoding network model and the preset decoding network model, the multiple image blocks divided from the image are independent of one another, so multi-core parallel encoding and decoding of images can be realized. In addition, since the size of the images input to the preset encoding network model and the preset decoding network model after blocking is reduced, the running time and running memory required for single-core encoding and decoding are also reduced.
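  • As a sketch of how this block independence maps onto multiple cores, the snippet below distributes per-block encoding over a process pool using only the Python standard library; `encode_one_block` is a hypothetical per-block worker wrapping the preset encoding network model and entropy coder.

```python
# Because each block is encoded independently, per-block work can be spread
# across CPU cores; `encode_one_block` is a hypothetical worker function.
from concurrent.futures import ProcessPoolExecutor

def parallel_encode(blocks, encode_one_block, workers=4):
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encode_one_block, blocks))   # one code stream per block
```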
  • the running time of each core and the running memory requirement of a single core are reduced to 1/20 of the non-block encoding and decoding processing.
  • the unit of running time is seconds (s), and the unit of running memory requirement is megabytes (MB).
  Table 1
                                     Codec network without blocking    Codec network with blocking
  Running time (s)                   10.85                             0.5
  Running memory requirement (MB)    5195                              263
  • the technical solutions of the embodiments of the present application use a preset post-processing network model to eliminate blockiness in the reconstructed image, which can improve the peak signal-to-noise ratio of the reconstructed image.
  • the technical solutions of the embodiments of the present application use a preset post-processing network model to solve the blocking effect caused by discontinuous block boundaries due to block-based encoding and decoding; as shown in FIGS. 9A and 9B, the reconstructed image in FIG. 9A obviously exhibits blocking effect, and after post-processing it can be clearly seen in FIG. 9B that the blocking effect of the reconstructed image has been effectively removed.
  • the peak signal-to-noise ratio of the reconstructed image is about 0.05 dB higher than that without post-processing.
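  • For reference, a common way to compute the peak signal-to-noise ratio quoted above is sketched below, assuming 8-bit images; this is the generic formula rather than the embodiment's exact measurement procedure.

```python
# Generic PSNR computation for 8-bit images.
import numpy as np

def psnr(reference, test, peak=255.0):
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```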
  • the technical solutions of the embodiments of the present application only perform post-processing on the block boundary rectangular areas, which reduces the total calculation amount of the post-processing network; and since the processing of each rectangular area is completely independent, the parallelization of post-processing can also be realized, thereby reducing the single-core post-processing running time and memory requirements.
  • the post-processing method adopted in this scheme is local post-processing of the block boundary rectangular area.
  • the total calculation amount can be reduced to 40%, and it can be seen from Table 2 that the running time and running memory required for post-processing on each core are reduced to 1/90 of those of whole-image post-processing.
  • This embodiment provides an image processing method, and the specific implementation of the foregoing embodiments is described in detail through this embodiment. It can be seen from this embodiment that, for the multiple image blocks divided from the image to be processed, the blocks are completely independent of each other, so multi-core parallel encoding and decoding can be realized by using the preset encoding network model and the preset decoding network model; and because the size of the images input to the preset encoding network model and the preset decoding network model is reduced after blocking, the running time and running memory requirements of encoding and decoding can also be reduced;
  • by filtering the block boundaries in the reconstructed image, the blocking effect at the block boundaries can also be eliminated, and the peak signal-to-noise ratio of the reconstructed image can also be improved;
  • only the rectangular areas at the block boundaries are post-processed, which also reduces the total calculation amount of the post-processing network; and since the processing of each rectangular area is completely independent, the parallelization of post-processing can also be realized, further reducing the running time and memory requirements of single-core post-processing.
  • FIG. 11 shows a schematic structural diagram of an image processing apparatus 110 provided by an embodiment of the present application.
  • the image processing apparatus 110 may include: an acquisition unit 1101 and a processing unit 1102; wherein,
  • the obtaining unit 1101 is configured to obtain multiple reconstructed blocks; wherein the multiple reconstructed blocks are obtained after multiple image blocks divided from the image to be processed pass through a preset encoding network model and a preset decoding network model;
  • the processing unit 1102 is configured to splice the multiple reconstructed blocks to generate a reconstructed image, and to filter the block boundaries in the reconstructed image by using a preset post-processing network model to obtain a target image.
  • the image processing apparatus 110 may further include a construction unit 1103 and a training unit 1104; wherein,
  • the obtaining unit 1101 is further configured to obtain a plurality of reconstructed training blocks, wherein the plurality of reconstructed training blocks are obtained after a plurality of training blocks divided from at least one training image in the training set pass through the preset encoding network model and the preset decoding network model; and to splice the plurality of reconstructed training blocks to obtain at least one reconstructed training image;
  • the training unit 1104 is configured to train the post-processing network model based on the at least one reconstructed training image to obtain the preset post-processing network model.
  • the training unit 1104 is specifically configured to use a preset algorithm to perform model training on the post-processing network model based on the at least one reconstructed training image; and, when the loss value corresponding to the cost function of the model training converges to a preset threshold, to determine the post-processing network model obtained after training as the preset post-processing network model.
  • the image processing apparatus 110 may further include a determination unit 1105 configured to determine at least one rectangular area including the block boundary in the reconstructed image;
  • the processing unit 1102 is specifically configured to input the at least one rectangular area into the preset post-processing network model to obtain at least one processed rectangular area; and to replace the corresponding local area including the block boundary in the reconstructed image with the at least one processed rectangular area to obtain the target image.
  • the processing unit 1102 is further configured to crop the at least one processed rectangular area to obtain at least one target rectangular area; and to replace the corresponding local area including the block boundary in the reconstructed image with the at least one target rectangular area to obtain the target image.
  • a "unit” may be a part of a circuit, a part of a processor, a part of a program or software, etc., of course, it may also be a module, and it may also be non-modular.
  • each component in this embodiment may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware, or can be implemented in the form of software function modules.
  • the integrated unit is implemented in the form of a software functional module and is not sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of this embodiment, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method described in this embodiment.
  • the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (Read Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or other media that can store program codes.
  • an embodiment of the present application provides a computer storage medium, which is applied to the image processing apparatus 110; the computer storage medium stores an image processing program, and when the image processing program is executed by the first processor, the method described in any one of the foregoing embodiments is implemented.
  • the image processing apparatus 110 may include: a first communication interface 1201 , a first memory 1202 and a first processor 1203 ; various components are coupled together through a first bus system 1204 .
  • the first bus system 1204 is used to realize the connection communication between these components.
  • the first bus system 1204 also includes a power bus, a control bus and a status signal bus.
  • the various buses are labeled as the first bus system 1204 in FIG. 12 for clarity of illustration. Among them,
  • the first communication interface 1201 is used for receiving and sending signals in the process of sending and receiving information with other external network elements;
  • a first memory 1202 for storing computer programs that can run on the first processor 1203;
  • the first processor 1203 is configured to, when running the computer program, execute:
  • obtaining a plurality of reconstructed blocks, wherein the plurality of reconstructed blocks are obtained after a plurality of image blocks divided from the image to be processed pass through a preset encoding network model and a preset decoding network model;
  • splicing the plurality of reconstructed blocks to generate a reconstructed image; and filtering the block boundaries in the reconstructed image by using a preset post-processing network model to obtain a target image.
  • the first memory 1202 in this embodiment of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories.
  • the non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory.
  • Volatile memory may be Random Access Memory (RAM), which acts as an external cache.
  • many forms of RAM may be used, for example: static RAM (Static RAM, SRAM), dynamic RAM (Dynamic RAM, DRAM), synchronous DRAM (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link dynamic random access memory (Synchlink DRAM, SLDRAM), and direct Rambus RAM (Direct Rambus RAM, DR RAM).
  • the first processor 1203 may be an integrated circuit chip with signal processing capability. In the implementation process, each step of the above-mentioned method may be completed by an integrated logic circuit of hardware in the first processor 1203 or an instruction in the form of software.
  • the above-mentioned first processor 1203 may be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.
  • the methods, steps, and logic block diagrams disclosed in the embodiments of this application can be implemented or executed.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the first memory 1202, and the first processor 1203 reads the information in the first memory 1202, and completes the steps of the above method in combination with its hardware.
  • the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof.
  • the processing unit can be implemented in one or more application specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSP Device, DSPD), programmable logic devices (Programmable Logic Device, PLD), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described herein, or a combination thereof.
  • the techniques described herein may be implemented through modules (eg, procedures, functions, etc.) that perform the functions described herein.
  • Software codes may be stored in memory and executed by a processor.
  • the memory can be implemented in the processor or external to the processor.
  • the first processor 1203 is further configured to execute the method described in any one of the foregoing embodiments when running the computer program.
  • This embodiment provides an image processing apparatus, and the image processing apparatus may include an acquisition unit and a processing unit.
  • In this way, for the multiple image blocks divided from the image to be processed, the blocks are completely independent of each other, so the preset encoding network model and the preset decoding network model can be used to realize multi-core parallel encoding and decoding;
  • because the size of the images input to the preset encoding network model and the preset decoding network model after blocking is reduced, the running time and running memory requirements of encoding and decoding can also be reduced; in addition, by filtering the block boundaries in the reconstructed image, the blocking effect at the block boundaries can also be eliminated, and the peak signal-to-noise ratio of the reconstructed image can also be improved;
  • the application only performs post-processing on the rectangular areas at the block boundaries, which also reduces the total calculation amount of the post-processing network; and since the processing of each rectangular area is completely independent, the parallelization of post-processing can also be realized, further reducing the running time and memory requirements of single-core post-processing.
  • FIG. 13 shows a schematic structural diagram of the composition of a decoding device 130 provided by an embodiment of the present application.
  • the decoding device 130 may include: a receiving unit 1301, a decoding unit 1302 and a post-processing unit 1303; wherein,
  • the receiving unit 1301 is configured to receive a code stream transmitted by an encoding device; wherein the code stream is obtained after a plurality of image blocks divided from the image to be processed pass through a preset encoding network model;
  • the decoding unit 1302 is configured to parse the code stream by using a preset decoding network model to obtain a plurality of reconstructed blocks;
  • the post-processing unit 1303 is configured to splice the plurality of reconstructed blocks to generate a reconstructed image, and to filter the block boundaries in the reconstructed image by using a preset post-processing network model to obtain a target image.
  • the decoding device 130 may further include an acquisition unit 1304, a construction unit 1305 and a training unit 1306; wherein,
  • Obtaining unit 1304, configured to obtain a training set; wherein, the training set includes at least one training image;
  • a construction unit 1305, configured to construct an encoding network model and a decoding network model
  • the training unit 1306 is configured to perform model training on the encoding network model and the decoding network model based on the training set to obtain the preset encoding network model and the preset decoding network model.
  • the training unit 1306 is specifically configured to use a preset algorithm to perform model training on the encoding network model and the decoding network model based on the training set; and, when the loss value corresponding to the cost function of the model training converges to a preset threshold, to determine the encoding network model and the decoding network model obtained after training as the preset encoding network model and the preset decoding network model.
  • the obtaining unit 1304 is further configured to obtain multiple reconstructed training blocks, wherein the multiple reconstructed training blocks are obtained after multiple training blocks divided from at least one training image in the training set pass through the preset encoding network model and the preset decoding network model; and to splice the multiple reconstructed training blocks to obtain at least one reconstructed training image;
  • the construction unit 1305 is further configured to construct a post-processing network model
  • the training unit 1306 is further configured to train the post-processing network model based on the at least one reconstructed training image to obtain the preset post-processing network model.
  • the training unit 1306 is specifically configured to use a preset algorithm to perform model training on the post-processing network model based on the at least one reconstructed training image; and, when the loss value corresponding to the cost function of the model training converges to a preset threshold, to determine the post-processing network model obtained after training as the preset post-processing network model.
  • the post-processing unit 1303 is specifically configured to determine at least one rectangular area including the block boundary in the reconstructed image; and input the at least one rectangular area into the preset post-processing network model, obtaining at least one processed rectangular area; and replacing the corresponding local area including the block boundary in the reconstructed image with the at least one processed rectangular area to obtain the target image.
  • the post-processing unit 1303 is further configured to crop the at least one processed rectangular area to obtain at least one target rectangular area; and to replace the corresponding local area including the block boundary in the reconstructed image with the at least one target rectangular area to obtain the target image.
  • a "unit” may be a part of a circuit, a part of a processor, a part of a program or software, etc., of course, it may also be a module, and it may also be non-modular.
  • each component in this embodiment may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware, or can be implemented in the form of software function modules.
  • the integrated unit may be stored in a computer-readable storage medium.
  • this embodiment provides a computer storage medium, which is applied to the decoding device 130; the computer storage medium stores an image processing program, and when the image processing program is executed by the second processor, the method described in any one of the foregoing embodiments is implemented.
  • the decoding device 130 may include: a second communication interface 1401 , a second memory 1402 and a second processor 1403 ; various components are coupled together through a second bus system 1404 .
  • the second bus system 1404 is used to realize the connection communication between these components.
  • the second bus system 1404 also includes a power bus, a control bus, and a status signal bus.
  • the various buses are labeled as the second bus system 1404 in FIG. 14 for clarity of illustration. Among them,
  • the second communication interface 1401 is used for receiving and sending signals in the process of sending and receiving information with other external network elements;
  • a second memory 1402 for storing computer programs that can run on the second processor 1403;
  • the second processor 1403 is configured to, when running the computer program, execute:
  • receiving the code stream transmitted by the encoding device, wherein the code stream is obtained after a plurality of image blocks divided from the image to be processed pass through a preset encoding network model; parsing the code stream by using a preset decoding network model to obtain a plurality of reconstructed blocks;
  • splicing the plurality of reconstructed blocks to generate a reconstructed image; and filtering the block boundaries in the reconstructed image by using a preset post-processing network model to obtain a target image.
  • the second processor 1403 is further configured to execute the method described in any one of the foregoing embodiments when running the computer program.
  • the hardware function of the second memory 1402 is similar to that of the first memory 1202, and the hardware function of the second processor 1403 is similar to that of the first processor 1203; details are not described here.
  • This embodiment provides a decoding device, and the decoding device may include a receiving unit, a decoding unit, and a post-processing unit.
  • For the multiple image blocks divided from the image to be processed, the blocks are completely independent of each other, and the preset encoding network model and the preset decoding network model can be used to realize multi-core parallel encoding and decoding;
  • because the size of the images input to the preset encoding network model and the preset decoding network model after blocking is reduced, the running time and running memory requirements of encoding and decoding can also be reduced; in addition, by filtering the block boundaries in the reconstructed image, the blocking effect at the block boundaries can also be eliminated, and the peak signal-to-noise ratio of the reconstructed image can also be improved;
  • the application only performs post-processing on the rectangular areas at the block boundaries, which also reduces the total calculation amount of the post-processing network; and since the processing of each rectangular area is completely independent, the parallelization of post-processing can also be realized, further reducing the running time and memory requirements of single-core post-processing.
  • FIG. 15 shows a schematic structural diagram of an encoding device 150 provided by an embodiment of the present application.
  • the encoding device 150 may include: an obtaining unit 1501, a block unit 1502, an encoding unit 1503 and a sending unit 1504; wherein,
  • an acquiring unit 1501 configured to acquire an image to be processed
  • the blocking unit 1502 is configured to divide the image to be processed into blocks to obtain multiple image blocks, wherein the multiple image blocks are of equal size and do not overlap;
  • an encoding unit 1503 configured to encode the plurality of image blocks by using a preset encoding network model to generate a code stream;
  • the sending unit 1504 is configured to transmit the code stream to the decoding device.
  • the encoding device 150 may further include a construction unit 1505 and a training unit 1506; wherein,
  • the obtaining unit 1501 is further configured to obtain a training set; wherein, the training set includes at least one training image;
  • a construction unit 1505 configured to construct an encoding network model and a decoding network model
  • the training unit 1506 is configured to perform model training on the encoding network model and the decoding network model based on the training set to obtain a preset encoding network model and a preset decoding network model; wherein the preset decoding network model is used to instruct the decoding device to parse the code stream to obtain a plurality of reconstructed blocks.
  • the training unit 1506 is specifically configured to use a preset algorithm to perform model training on the encoding network model and the decoding network model based on the training set; and, when the loss value corresponding to the cost function of the model training converges to a preset threshold, to determine the encoding network model and the decoding network model obtained after training as the preset encoding network model and the preset decoding network model.
  • a "unit” may be a part of a circuit, a part of a processor, a part of a program or software, etc., of course, it may also be a module, and it may also be non-modular.
  • each component in this embodiment may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware, or can be implemented in the form of software function modules.
  • the integrated unit may be stored in a computer-readable storage medium.
  • this embodiment provides a computer storage medium, which is applied to the encoding device 150; the computer storage medium stores an image processing program, and when the image processing program is executed by the third processor, the method described in any one of the foregoing embodiments is implemented.
  • the encoding device 150 may include: a third communication interface 1601 , a third memory 1602 and a third processor 1603 ; various components are coupled together through a third bus system 1604 .
  • the third bus system 1604 is used to implement connection communication between these components.
  • the third bus system 1604 also includes a power bus, a control bus, and a status signal bus.
  • the various buses are labeled as the third bus system 1604 in FIG. 16 for clarity of illustration. Among them,
  • the third communication interface 1601 is used for receiving and sending signals in the process of sending and receiving information with other external network elements;
  • a third memory 1602 for storing computer programs that can run on the third processor 1603;
  • the third processor 1603 is configured to, when running the computer program, execute:
  • acquiring an image to be processed; dividing the image to be processed into blocks to obtain multiple image blocks, wherein the multiple image blocks are of equal size and do not overlap; encoding the multiple image blocks by using a preset encoding network model to generate a code stream; and transmitting the code stream to the decoding device.
  • the third processor 1603 is further configured to execute the method described in any one of the foregoing embodiments when running the computer program.
  • the hardware function of the third memory 1602 is similar to that of the first memory 1202, and the hardware function of the third processor 1603 is similar to that of the first processor 1203; details are not described here.
  • This embodiment provides an encoding device, and the encoding device may include an acquisition unit, a block unit, an encoding unit, and a sending unit.
  • For the multiple image blocks divided from the image to be processed, the blocks are completely independent of each other, and the preset encoding network model and the preset decoding network model can be used to realize multi-core parallel encoding and decoding;
  • moreover, because the size of the images input to the preset encoding network model and the preset decoding network model after blocking is reduced, the running time and running memory requirements of encoding and decoding are also reduced.
  • FIG. 17 shows a schematic structural diagram of a video system 170 provided by an embodiment of the present application.
  • the video system 170 may include: the encoding device 150 described in the foregoing embodiments and the decoding device 130 described in the foregoing embodiments. in,
  • the encoding device 150 is configured to obtain an image to be processed; to divide the image to be processed into blocks to obtain multiple image blocks, wherein the multiple image blocks are of equal size and do not overlap; to encode the multiple image blocks by using a preset encoding network model to generate a code stream; and to transmit the code stream to the decoding device 130;
  • the decoding device 130 is configured to receive the code stream transmitted by the encoding device 150; to parse the code stream by using a preset decoding network model to obtain a plurality of reconstructed blocks; to splice the plurality of reconstructed blocks to generate a reconstructed image; and to filter the block boundaries in the reconstructed image by using a preset post-processing network model to obtain a target image.
  • In this way, for the multiple image blocks divided from the image to be processed, the blocks are completely independent of each other, and a preset encoding network model and a preset decoding network model can be used to implement multi-core parallel encoding and decoding;
  • because the size of the images input to the preset encoding network model and the preset decoding network model after blocking is reduced, the running time and running memory requirements of encoding and decoding can also be reduced; in addition, filtering the block boundaries in the reconstructed image can also eliminate the blocking effect at the block boundaries and improve the peak signal-to-noise ratio of the reconstructed image;
  • at the same time, the application only performs post-processing on the rectangular areas at the block boundaries, which also reduces the total calculation amount of the post-processing network;
  • and since the processing of each rectangular area is completely independent, the parallelization of post-processing can also be realized, which further reduces the running time and memory requirements of single-core post-processing.
  • In the embodiments of the present application, multiple reconstructed blocks are obtained, wherein the multiple reconstructed blocks are obtained after multiple image blocks divided from the image to be processed pass through a preset encoding network model and a preset decoding network model; the multiple reconstructed blocks are spliced to generate a reconstructed image; and the block boundaries in the reconstructed image are filtered by using a preset post-processing network model to obtain a target image.
  • In this way, for the multiple image blocks divided from the image to be processed, the blocks are completely independent of each other, so the preset encoding network model and the preset decoding network model can be used to realize multi-core parallel encoding and decoding;
  • because the size of the images input to the preset encoding network model and the preset decoding network model after blocking is reduced, the running time and running memory requirements of encoding and decoding can also be reduced; in addition, by filtering the block boundaries in the reconstructed image, the blocking effect at the block boundaries can also be eliminated, and the peak signal-to-noise ratio of the reconstructed image can also be improved;
  • the application only performs post-processing on the rectangular areas at the block boundaries, which also reduces the total calculation amount of the post-processing network; and since the processing of each rectangular area is completely independent, the parallelization of post-processing can also be realized, further reducing the running time and memory requirements of single-core post-processing.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

本申请实施例公开了一种图像处理方法、装置、设备、计算机存储介质和系统,该方法包括:获取多个重构块;其中,所述多个重构块是由待处理图像所划分的多个图像块经由预设编码网络模型和预设解码网络模型后得到的;对所述多个重构块进行拼接,生成重构图像;利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像。

Description

图像处理方法、装置、设备、计算机存储介质和系统
相关申请的交叉引用
本申请要求在2020年08月21日提交中国专利局、申请号为202010851882.8、申请名称为“图像处理方法、装置、设备、计算机存储介质和系统”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及视频编解码技术领域,尤其涉及一种图像处理方法、装置、设备、计算机存储介质和系统。
背景技术
近年来,人工神经网络已经发展到了深度学习(deep learning)阶段。深度学习是机器学习的分支,是一种试图使用包含复杂结构或由多重非线性变换构成的多个处理层对数据进行高层抽象的算法,其强大的表达能力使其在计算机视觉和图像处理中得到了广泛应用,在视频和图像处理上的表现也具有较好的效果。
目前,基于深度学习的图像编解码以及图像后处理技术往往采用将整张图像一次性输入编解码网络进行处理的方案。但是随着图像尺寸的增大以及编解码网络的加深,这种方案会导致编解码的运行时间以及运行内存需求的大大增加。另外,即使目前存在一些基于块的图像编解码方案,但是现有方案的结构使得块与块之间的编解码不完全独立,无法实现编解码处理的并行化,仍然无法降低编解码的运行时间以及运行内存需求。
发明内容
本申请提出一种图像处理方法、装置、设备、计算机存储介质和系统,可以实现编解码处理的并行化,能够提高重构图像的峰值信噪比;同时还能够降低后处理网络的总计算量,从而降低编解码的运行时间以及运行内存需求。
本申请的技术方案是这样实现的:
第一方面,本申请实施例提供了一种图像处理方法,应用于图像处理装置,所述方法包括:
获取多个重构块;其中,所述多个重构块是由待处理图像所划分的多个图像块经由预设编码网络模型和预设解码网络模型后得到的;
对所述多个重构块进行拼接,生成重构图像;
利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像。
第二方面,本申请实施例提供了一种图像处理方法,应用于解码设备,所述方法包括:
接收编码设备传输的码流;其中,所述码流是由待处理图像所划分的多个图像块经由预设编码网络模型后得到的;
利用预设解码网络模型解析所述码流,获取多个重构块;
对所述多个重构块进行拼接,生成重构图像;
利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像。
第三方面,本申请实施例提供了一种图像处理方法,应用于编码设备,所述方法包括:
获取待处理图像;
对所述待处理图像进行分块,得到多个图像块;其中,所述多个图像块大小相等且无重叠;
利用预设编码网络模型对所述多个图像块进行编码,生成码流;
将所述码流传输到解码设备。
第四方面,本申请实施例提供了一种图像处理装置,所述图像处理装置包括:获取单元和处理单元; 其中,
所述获取单元,配置为获取多个重构块;其中,所述多个重构块是由待处理图像所划分的多个图像块经由预设编码网络模型和预设解码网络模型后得到的;
所述处理单元,配置为对所述多个重构块进行拼接,生成重构图像;以及利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像。
第五方面,本申请实施例提供了一种图像处理装置,所述图像处理装置包括:第一存储器和第一处理器;其中,
所述第一存储器,用于存储能够在所述第一处理器上运行的可执行指令;
所述第一处理器,用于在运行所述可执行指令时,执行如第一方面所述的方法。
第六方面,本申请实施例提供了一种解码设备,所述解码设备包括:接收单元、解码单元和后处理单元;其中,
所述接收单元,配置为接收编码设备传输的码流;其中,所述码流是由待处理图像所划分的多个图像块经由预设编码网络模型后得到的;
所述解码单元,配置为利用预设解码网络模型解析所述码流,获取多个重构块;
所述后处理单元,配置为对所述多个重构块进行拼接,生成重构图像,并利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像。
第七方面,本申请实施例提供了一种解码设备,所述解码设备包括:第二存储器和第二处理器;其中,
所述第二存储器,用于存储能够在所述第二处理器上运行的可执行指令;
所述第二处理器,用于在运行所述可执行指令时,执行如第二方面所述的方法。
第八方面,本申请实施例提供了一种编码设备,所述编码设备包括:获取单元、分块单元、编码单元和发送单元;其中,
所述获取单元,配置为获取待处理图像;
所述分块单元,配置为对所述待处理图像进行分块,得到多个图像块;其中,所述多个图像块大小相等且无重叠;
所述编码单元,配置为利用预设编码网络模型对所述多个图像块进行编码,生成码流;
所述发送单元,配置为将所述码流传输到解码设备。
第九方面,本申请实施例提供了一种编码设备,所述编码设备包括:第三存储器和第三处理器;其中,
所述第三存储器,用于存储能够在所述第三处理器上运行的可执行指令;
所述第三处理器,用于在运行所述可执行指令时,执行如第三方面所述的方法。
第十方面,本申请实施例提供了一种计算机存储介质,所述计算机存储介质存储有图像处理程序,所述图像处理程序被第一处理器执行时实现如第一方面所述的方法、或者被第二处理器执行时实现如第二方面所述的方法、或者被第三处理器执行时实现如第三方面所述的方法。
第十一方面,本申请实施例提供了一种视频系统,所述视频系统包括:编码设备和解码设备;其中,
所述编码设备,配置为获取待处理图像;以及对所述待处理图像进行分块,得到多个图像块;其中,所述多个图像块大小相等且无重叠;以及利用预设编码网络模型对所述多个图像块进行编码,生成码流;并将所述码流传输到解码设备;
所述解码设备,配置为接收所述编码设备传输的码流;以及利用预设解码网络模型解析所述码流,获取多个重构块;以及对所述多个重构块进行拼接,生成重构图像,并利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像。
附图说明
图1为本申请实施例提供的一种图像处理方法的流程示意图;
图2为本申请实施例提供的一种后处理网络模型的结构示意图;
图3为本申请实施例提供的一种残差块的结构示意图;
图4为本申请实施例提供的一种利用预设后处理网络模型进行块效应消除的结构示意图;
图5为本申请实施例提供的另一种图像处理方法的流程示意图;
图6为本申请实施例提供的一种包括预设编码网络模型和预设解码网络模型的端到端网络结构示意图;
图7为本申请实施例提供的又一种图像处理方法的流程示意图;
图8为本申请实施例提供的一种图像处理方法的详细流程示意图;
图9A为本申请实施例提供的一种带有块效应的重构图像示意图;
图9B为本申请实施例提供的一种无块效应的重构图像示意图;
图10为本申请实施例提供的一种预设后处理网络模型前后的图像率失真曲线对比示意图;
图11为本申请实施例提供的一种图像处理装置的组成结构示意图;
图12为本申请实施例提供的一种图像处理装置的硬件结构示意图;
图13为本申请实施例提供的一种解码设备的组成结构示意图;
图14为本申请实施例提供的一种解码设备的硬件结构示意图;
图15为本申请实施例提供的一种编码设备的组成结构示意图;
图16为本申请实施例提供的一种编码设备的硬件结构示意图;
图17为本申请实施例提供的一种视频系统的组成结构示意图。
具体实施方式
第一方面,本申请实施例提供了一种图像处理方法,应用于图像处理装置,所述方法包括:
获取多个重构块;其中,所述多个重构块是由待处理图像所划分的多个图像块经由预设编码网络模型和预设解码网络模型后得到的;
对所述多个重构块进行拼接,生成重构图像;
利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像。
在一些实施例中,所述方法还包括:
获取多个重构训练块;其中,所述多个重构训练块是由训练集合中的至少一张训练图像所划分的多个训练块经由所述预设编码网络模型和所述预设解码网络模型后得到的;
对所述多个重构训练块进行拼接,得到至少一张重构训练图像;
构建后处理网络模型,基于所述至少一张重构训练图像对所述后处理网络模型进行训练,得到所述预设后处理网络模型。
在一些实施例中,所述基于所述至少一张重构训练图像对所述后处理网络模型进行训练,得到所述预设后处理网络模型,包括:
基于所述至少一张重构训练图像,利用预设算法对所述后处理网络模型进行模型训练;
当所述模型训练的代价函数对应的损失值收敛到预设阈值时,将训练后得到的后处理网络模型确定为所述预设后处理网络模型。
在一些实施例中,所述利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像,包括:
确定所述重构图像中包括所述块边界的至少一个矩形区域;
将所述至少一个矩形区域输入所述预设后处理网络模型,得到至少一个处理后的矩形区域;
利用所述至少一个处理后的矩形区域替换所述重构图像中包括所述块边界的对应局部区域,得到所述目标图像。
在一些实施例中,在所述得到至少一个处理后的矩形区域之后,所述方法还包括:
对所述至少一个处理后的矩形区域进行裁剪,得到至少一个目标矩形区域;
利用所述至少一个目标矩形区域替换所述重构图像中包括所述块边界的对应局部区域,得到所述目标图像。
第二方面,本申请实施例提供了一种图像处理方法,应用于解码设备,所述方法包括:
接收编码设备传输的码流;其中,所述码流是由待处理图像所划分的多个图像块经由预设编码网络模型后得到的;
利用预设解码网络模型解析所述码流,获取多个重构块;
对所述多个重构块进行拼接,生成重构图像;
利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像。
在一些实施例中,所述方法还包括:
获取训练集合;其中,所述训练集合包括至少一张训练图像;
构建编码网络模型和解码网络模型,基于所述训练集合对所述编码网络模型和所述解码网络模型进行模型训练,得到所述预设编码网络模型和所述预设解码网络模型。
在一些实施例中,所述基于所述训练集合对所述编码网络模型和所述解码网络模型进行模型训练,得到所述预设编码网络模型和所述预设解码网络模型,包括:
基于所述训练集合,利用预设算法对所述编码网络模型和所述解码网络模型进行模型训练;
当所述模型训练的代价函数对应的损失值收敛到预设阈值时,将训练后得到的编码网络模型和解码网络模型确定为所述预设编码网络模型和所述预设解码网络模型。
在一些实施例中,所述方法还包括:
获取多个重构训练块;其中,所述多个重构训练块是由所述训练集合中的至少一张训练图像所划分的多个训练块经由所述预设编码网络模型和所述预设解码网络模型后得到的;
对所述多个重构训练块进行拼接,得到至少一张重构训练图像;
构建后处理网络模型,基于所述至少一张重构训练图像对所述后处理网络模型进行训练,得到所述预设后处理网络模型。
在一些实施例中,所述基于所述至少一张重构训练图像对所述后处理网络模型进行训练,得到所述预设后处理网络模型,包括:
基于所述至少一张重构训练图像,利用预设算法对所述后处理网络模型进行模型训练;
当所述模型训练的代价函数对应的损失值收敛到预设阈值时,将训练后得到的后处理网络模型确定为所述预设后处理网络模型。
在一些实施例中,所述利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像,包括:
确定所述重构图像中包括所述块边界的至少一个矩形区域;
将所述至少一个矩形区域输入所述预设后处理网络模型,得到至少一个处理后的矩形区域;
利用所述至少一个处理后的矩形区域替换所述重构图像中包括所述块边界的对应局部区域,得到所述目标图像。
在一些实施例中,在所述得到至少一个处理后的矩形区域之后,所述方法还包括:
对所述至少一个处理后的矩形区域进行裁剪,得到至少一个目标矩形区域;
利用所述至少一个目标矩形区域替换所述重构图像中包括所述块边界的对应局部区域,得到所述目标图像。
第三方面,本申请实施例提供了一种图像处理方法,应用于编码设备,所述方法包括:
获取待处理图像;
对所述待处理图像进行分块,得到多个图像块;其中,所述多个图像块大小相等且无重叠;
利用预设编码网络模型对所述多个图像块进行编码,生成码流;
将所述码流传输到解码设备。
在一些实施例中,所述方法还包括:
获取训练集合;其中,所述训练集合包括至少一张训练图像;
构建编码网络模型和解码网络模型,基于所述训练集合对所述编码网络模型和所述解码网络模型进行模型训练,得到预设编码网络模型和预设解码网络模型;其中,所述预设解码网络模型用于指示所述解码设备解析所述码流以得到多个重构块。
在一些实施例中,所述基于所述训练集合对所述编码网络模型和所述解码网络模型进行模型训练,得到预设编码网络模型和预设解码网络模型,包括:
基于所述训练集合,利用预设算法对所述编码网络模型和所述解码网络模型进行模型训练;
当所述模型训练的代价函数对应的损失值收敛到预设阈值时,将训练后得到的编码网络模型和解码网络模型确定为所述预设编码网络模型和所述预设解码网络模型。
第四方面,本申请实施例提供了一种图像处理装置,所述图像处理装置包括:获取单元和处理单元;其中,
所述获取单元,配置为获取多个重构块;其中,所述多个重构块是由待处理图像所划分的多个图像块经由预设编码网络模型和预设解码网络模型后得到的;
所述处理单元,配置为对所述多个重构块进行拼接,生成重构图像;以及利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像。
第五方面,本申请实施例提供了一种图像处理装置,所述图像处理装置包括:第一存储器和第一处理器;其中,
所述第一存储器,用于存储能够在所述第一处理器上运行的可执行指令;
所述第一处理器,用于在运行所述可执行指令时,执行如第一方面中任一项所述的方法。
第六方面,本申请实施例提供了一种解码设备,所述解码设备包括:接收单元、解码单元和后处理单元;其中,
所述接收单元,配置为接收编码设备传输的码流;其中,所述码流是由待处理图像所划分的多个图 像块经由预设编码网络模型后得到的;
所述解码单元,配置为利用预设解码网络模型解析所述码流,获取多个重构块;
所述后处理单元,配置为对所述多个重构块进行拼接,生成重构图像,并利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像。
第七方面,本申请实施例提供了一种解码设备,所述解码设备包括:第二存储器和第二处理器;其中,
所述第二存储器,用于存储能够在所述第二处理器上运行的可执行指令;
所述第二处理器,用于在运行所述可执行指令时,执行如第二方面中任一项所述的方法。
第八方面,本申请实施例提供了一种编码设备,所述编码设备包括:获取单元、分块单元、编码单元和发送单元;其中,
所述获取单元,配置为获取待处理图像;
所述分块单元,配置为对所述待处理图像进行分块,得到多个图像块;其中,所述多个图像块大小相等且无重叠;
所述编码单元,配置为利用预设编码网络模型对所述多个图像块进行编码,生成码流;
所述发送单元,配置为将所述码流传输到解码设备。
第九方面,本申请实施例提供了一种编码设备,所述编码设备包括:第三存储器和第三处理器;其中,
所述第三存储器,用于存储能够在所述第三处理器上运行的可执行指令;
所述第三处理器,用于在运行所述可执行指令时,执行如第三方面中任一项所述的方法。
第十方面,本申请实施例提供了一种计算机存储介质,所述计算机存储介质存储有图像处理程序,所述图像处理程序被第一处理器执行时实现如第一方面中任一项所述的方法、或者被第二处理器执行时实现如第二方面中任一项所述的方法、或者被第三处理器执行时实现如第三方面中任一项所述的方法。
第十一方面,本申请实施例提供了一种视频系统,所述视频系统包括:编码设备和解码设备;其中,
所述编码设备,配置为获取待处理图像;以及对所述待处理图像进行分块,得到多个图像块;其中,所述多个图像块大小相等且无重叠;以及利用预设编码网络模型对所述多个图像块进行编码,生成码流;并将所述码流传输到解码设备;
所述解码设备,配置为接收所述编码设备传输的码流;以及利用预设解码网络模型解析所述码流,获取多个重构块;以及对所述多个重构块进行拼接,生成重构图像,并利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像。
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述。可以理解的是,此处所描述的具体实施例仅仅用于解释相关申请,而非对该申请的限定。另外还需要说明的是,为了便于描述,附图中仅示出了与有关申请相关的部分。
人工神经网络(Artificial Neural Networks,ANNs)也可简称为神经网络,或者称为连接模型,它是一种模仿动物神经网络行为特征,进行分布式并行信息处理的算法数学模型。这种网络依靠系统的复杂程度,通过调整内部大量节点之间相互连接的关系,从而达到处理信息的目的。
近年来,人工神经网络已经发展到了深度学习阶段。深度学习是机器学习的分支,是一种试图使用包含复杂结构或由多重非线性变换构成的多个处理层对数据进行高层抽象的算法,其强大的表达能力使其在各个机器学习的任务上取得了最好的效果,而且在视频和图像处理上的表现也超过了目前相关技术。
应理解,自编码器(Autoencoder)是深度学习的一个重要内容,神经网络通过大量数据集进行端到端(end-to-end)的训练,可以不断提高准确率,而Autoencoder通过设计编码(encode)和解码(decode)过程使得输入和输出越来越接近,这是一种无监督学习过程。这里,由于目前深度学习在视频和图像处理上的优秀表现,结合深度学习中的Autoencoder的基本思路进行视频和图像压缩,并且用深度学习的方式来提供一种新的编解码方法,对于未来视频压缩领域的发展是一个好的开始,也有利于未来在视频压缩方向上,基于神经网络的新方案在整个系统中有着比传统方案更好的表现和前景。
还需要说明的是,目前图像压缩方案一般会造成图像信息损失、图像质量下降,产生人工痕迹(artifacts)。这时候在图像压缩后就需要进行后处理,而且后处理是指针对压缩图像设计质量增强和人工痕迹消除的方法,用以改善图像的视觉效果。类似地,后处理在视频压缩中也被广泛采用。这样,由于近年来,深度学习在计算机视觉和图像处理中得到了广泛应用,使得目前一些研究工作也可以将深度学习用于图像或视频压缩后处理,并且取得了一定的效果。
然而,基于深度学习的图像编解码以及图像后处理技术的现有方案中,往往采用将整幅图像一次性输入编解码网络进行处理,但是随着图像尺寸的增大以及编解码网络的加深,这种方案会导致编解码的 运行时间以及运行内存需求的大大增加。另外,即使目前存在一些基于块的图像编解码方案,但是现有方案采用了长短期记忆人工神经网络结构,使得块与块之间的编解码不完全独立,无法实现编解码处理的并行化。而且现有的基于深度学习的图像后处理技术同样采用整幅图像输入的方案,该方案在导致运行时间以及运行内存增加的同时,由于图像中一些没必要被处理的区域也被后处理网络所计算,带来了计算冗余。
基于此,本申请实施例提供了一种图像处理方法,该方法的基本思想是:获取多个重构块;其中,所述多个重构块是由待处理图像所划分的多个图像块经由预设编码网络模型和预设解码网络模型后得到的;对所述多个重构块进行拼接,生成重构图像;利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像。这样,针对待处理图像所划分的多个图像块,块与块之间完全独立,可以利用预设编码网络模型和预设解码网络模型实现多核并行编解码处理;而且由于分块后输入预设编码网络模型和预设解码网络模型的图像尺寸降低,还可以降低编解码的运行时间以及运行内存需求;另外,通过对重构图像中的块边界进行滤波处理,还能够消除分块边界处的块效应,且还能够提高重构图像的峰值信噪比;同时本申请仅对块边界的矩形区域进行后处理,还降低了后处理网络的总计算量,且对每个矩形区域处理完全独立,还可以实现后处理的并行化,进一步降低了单核后处理运行时间及内存需求。
下面将结合附图对本申请各实施例进行详细说明。
本申请的一实施例中,参见图1,其示出了本申请实施例提供的一种图像处理方法的流程示意图。如图1所示,该方法可以包括:
S101:获取多个重构块;其中,所述多个重构块是由待处理图像所划分的多个图像块经由预设编码网络模型和预设解码网络模型后得到的。
需要说明的是,该方法应用于图像处理装置,或者集成有图像处理装置的设备,例如解码设备。
还需要说明的是,编解码网络包括编码网络模型和解码网络模型,而预设编码网络模型和预设解码网络模型是基于神经网络结构进行模型训练得到的。其中,预设编码网络模型用于对待处理图像所划分的多个图像块进行编码以生成码流,预设解码网络模型用于对码流进行解析以得到多个重构块。
另外,针对待处理图像所划分的多个图像块,这多个图像块大小相等且无重叠。通常情况下,每个图像块的大小可以为128*128尺寸,但是本申请实施例并不作具体限定。
这样,针对待处理图像所划分的多个图像块,块与块之间完全独立,可以利用预设编码网络模型和预设解码网络模型进行并行编解码处理,能够并行得到多个重构块。
S102:对所述多个重构块进行拼接,生成重构图像。
需要说明的是,在得到多个重构块之后,可以通过拼接生成重构图像。但是拼接得到的重构图像带有明显的块效应;这时候需要对重构图像中的块边界进行滤波处理,以减弱块边界处的块效应。
S103:利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像。
需要说明的是,预设后处理网络模型也是基于神经网络结构进行模型训练得到的。在一些实施例中,在S103之前,该方法还可以包括:
获取多个重构训练块;
对所述多个重构训练块进行拼接,得到至少一张重构训练图像;
构建后处理网络模型,基于所述至少一张重构训练图像对所述后处理网络模型进行训练,得到所述预设后处理网络模型。
需要说明的是,多个重构训练块可以是由训练集合中的至少一张训练图像所划分的多个训练块经由预设编码网络模型和预设解码网络模型后得到的。具体来讲,在模型训练前,首先需要获取训练集合,该训练集合中包括至少一张训练图像。这里,训练集合的选取对于整个神经网络的训练具有很大影响,在本申请实施例中,可以选取神经网络图像编解码器(Neutral Image Codec,NIC)数据集。NIC数据集是基于深度学习的图像压缩的电气和电子工程师协会(Institute of Electrical and Electronics Engineers,IEEE)标准测试模型NIC的开发数据集,在该数据集中,可以包括有训练集合,也可以包括有验证集合和测试集合;其中,训练集合中的图像大小可以为256*256,验证集合和测试集合中的图像大小也可以为256*256,但是本申请实施例不作任何限定。
还需要说明的是,后处理网络模型采用神经网络结构,其可以是由卷积层、激活函数以及用于提供模型性能的多个级联的残差块构成,网络结构如图2所示。这里,激活函数可以是线性整流函数(Rectified Linear Unit,ReLU),又称修正线性单元,是一种人工神经网络中常用的激活函数,通常指代以斜坡函数及其变种为代表的非线性函数。
示例性地,以消除重构块边界的块效应为例,如图2所示,针对带有明显块效应的重构块边界像素,可以顺序经过第一卷积层、多个残差块(例如9个级联的残差块)、第二卷积层和第三卷积层,然后利 用加法器将第三卷积层的输出和第一卷积层的输入进行叠加,能够输出无明显块效应的重构块边界像素。其中,第一卷积层和第二卷积层包括有激活函数,第三卷积层不包括激活函数;且第一卷积层和第二卷积层表示为k3n128+ReLU,表明第一卷积层和第二卷积层的卷积核大小为3*3,输出特征数为128,步长为1,且包括有激活函数;而第三卷积层表示为k3n3,表明第三卷积层的卷积核大小为3*3,输出特征数为3,步长为1,但未包括激活函数。
对于每一个残差块,其网络结构如图3所示。在图3中,特征图作为输入,然后顺序经过第四卷积层和第五卷积层,再利用加法器将第五卷积层的输出和第四卷积层的输入进行叠加,从而得到输出特征图。其中,第四卷积层包括有激活函数,第五卷积层不包括激活函数;且第四卷积层表示为k3n128+ReLU,表明第一卷积层和第二卷积层的卷积核大小为3*3,输出特征数为128,步长为1,且包括有激活函数;而第五卷积层表示为k3n128,表明第五卷积层的卷积核大小为3*3,输出特征数为128,步长为1,但未包括激活函数。
这样,构建出后处理网络模型后,可以利用训练集合以及预设算法对后处理网络模型进行模型训练。具体地,在一些实施例中,所述基于所述至少一张重构训练图像对所述后处理网络模型进行训练,得到所述预设后处理网络模型,可以包括:
基于所述至少一张重构训练图像,利用预设算法对所述后处理网络模型进行模型训练;
当所述模型训练的代价函数对应的损失值收敛到预设阈值时,将训练后得到的后处理网络模型确定为所述预设后处理网络模型。
需要说明的是,针对训练集合中尺寸为256*256的训练图像,在将其划分为等大且无重叠的128*128的训练块并输入预设编码网络模型和预设解码网络模型后,将所得到的重构块重新拼接为256*256的带有块效应的重构训练图像。这时候可以将带有块效应的重构训练图像作为后处理网络模型的训练输入图像,将训练集合中的训练图像作为后处理网络模型的训练目标图像;然后可以根据训练输入图像和训练目标图像的均方差构建模型训练的代价函数。这里,代价函数可以为率失真代价函数,而失真度为训练输入图像和训练目标图像的均方差。
还需要说明的是,预设算法可以为自适应矩估计(Adaptive moment estimation,Adam)梯度优化算法。Adam梯度优化算法是一种对随机梯度下降法的扩展,在计算机视觉和自然语言处理中广泛应用于深度学习应用,其工作表现良好,能够很快地取得较好的成果。这样,利用Adam梯度优化算法训练后处理网络,在训练过程中保持预设编码网络模型和预设解码网络模型的网络参数固定,仅迭代更新后处理网络模型。在其代价函数对应的损失(Loss)值达到收敛且收敛到预设阈值后,这时候训练得到的后处理网络模型即为预设后处理网络模型。这里,预设阈值根据实际情况进行具体设定,本申请实施例不作任何限定。
可以理解,由于重构图像带有明显块效应,这里的滤波处理具体是指消除重构图像中块边界处的块效应。在一些实施例中,对于S103来说,所述利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像,可以包括:
确定所述重构图像中包括所述块边界的至少一个矩形区域;
将所述至少一个矩形区域输入所述预设后处理网络模型,得到至少一个处理后的矩形区域;
利用所述至少一个处理后的矩形区域替换所述重构图像中包括所述块边界的对应局部区域,得到所述目标图像。
需要说明的是,首先需要提取重构图像中块边界附件的矩形区域。对于横向边界,矩形区域具体范围为,横向方向:块边界左侧16像素至块边界右侧16像素;纵向方向:重构块上沿至下沿。对于纵向边界,矩形区域具体范围为,纵向方向:块边界上侧16像素至块边界下侧16像素;横向方向:重构块左沿至右沿。示例性地,如果采用大小均为128*128重构块拼接成的重构图像,其矩形区域大小均为32*128。这里,32的单位为像素,128的单位为像素;也就是说,重构图像或者重构块的大小都是采用像素数量表示。
这样,在通过提取确定出重构图像中包括块边界的至少一个矩形区域后,可以将这至少一个矩形区域输入预设后处理网络模型,得到至少一个处理后的矩形区域;然后利用这至少一个处理后的矩形区域替换重构图像中包括块边界的对应局部区域,得到目标图像。这时候所得到的目标图像能够减弱块效应。
为了进一步消除块效应,还可以消除预设后处理网络模型对边界补0的卷积操作所导致的边界图像失真,这时候需要对这至少一个处理后的矩形区域进行进一步裁剪。在一些实施例中,在得到至少一个处理后的矩形区域之后,该方法还可以包括:
对所述至少一个处理后的矩形区域进行裁剪,得到至少一个目标矩形区域;
利用所述至少一个目标矩形区域替换所述重构图像中包括所述块边界的对应局部区域,得到所述目标图像。
需要说明的是,针对预设后处理网络模型所输出的至少一个处理后的矩形区域,可以通过舍弃左右两侧宽度为8像素、高度为128像素的边缘像素区域,仅保留中心大小为16*128的矩形区域,即为经过裁剪得到的至少一个目标矩形区域;最后利用这至少一个目标矩形区域替换重构图像中包括块边界的对应局部区域,可以得到无明显块效应的目标图像。
具体地,参见图4,其示出了本申请实施例提供的一种利用预设后处理网络模型进行块效应消除的结构示意图。如图4所示,针对带有明显块效应的重构图像,首先提取包括块边界的至少一个矩形区域,图4所示的灰色矩形区域表示纵向边界的矩形区域;将所提取出的至少一个矩形区域输入预设后处理网络模型,输出至少一个处理后的矩形区域;然后通过裁剪,舍弃矩形区域左右两侧的边缘像素区域,得到至少一个目标矩形区域;最后通过替换,即利用这至少一个目标矩形区域替换重构图像中包括块边界的对应局部区域,可以得到无明显块效应的重构图像。
本实施例提供了一种图像处理方法,获取多个重构块;其中,所述多个重构块是由待处理图像所划分的多个图像块经由预设编码网络模型和预设解码网络模型后得到的;对所述多个重构块进行拼接,生成重构图像;利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像。这样,针对待处理图像所划分的多个图像块,块与块之间完全独立,可以利用预设编码网络模型和预设解码网络模型实现多核并行编解码处理;而且由于分块后输入预设编码网络模型和预设解码网络模型的图像尺寸降低,还可以降低编解码的运行时间以及运行内存需求;另外,通过对重构图像中的块边界进行滤波处理,还能够消除分块边界处的块效应,且还能够提高重构图像的峰值信噪比;同时由于仅对块边界的矩形区域进行后处理,还降低了后处理网络的总计算量,且对每个矩形区域处理完全独立,还可以实现后处理的并行化,进一步降低了单核后处理运行时间及内存需求。
本申请的另一实施例中,参见图5,其示出了本申请实施例提供的另一种图像处理方法的流程示意图。如图5所示,该方法可以包括:
S501:接收编码设备传输的码流;其中,所述码流是由待处理图像所划分的多个图像块经由预设编码网络模型后得到的。
S502:利用预设解码网络模型解析所述码流,获取多个重构块。
需要说明的是,该方法应用于解码设备。在编码设备通过预设编码网络模型进行压缩编码生成码流后,可以将码流传输到解码设备,由解码设备利用预设解码网络模型来解析码流,从而获取到多个重构块。
还需要说明的是,预设编码网络模型和预设解码网络模型是基于神经网络结构进行模型训练得到的。其中,预设编码网络模型用于指示编码设备对待处理图像所划分的多个图像块进行编码以生成码流,预设解码网络模型用于指示解码设备解析码流以得到多个重构块。
这里,对于预设编码网络模型和预设解码网络模型而言,在一些实施例中,该方法还可以包括:
获取训练集合;其中,所述训练集合包括至少一张训练图像;
构建编码网络模型和解码网络模型,基于所述训练集合对所述编码网络模型和所述解码网络模型进行模型训练,得到所述预设编码网络模型和所述预设解码网络模型。
需要说明的是,在模型训练前,首先需要获取训练集合,该训练集合中包括至少一张训练图像。具体地,可以收集和整理规范的高清静态图像数据集,例如NIC数据集;然后根据NIC数据集,可以得到用于模型训练的训练集合,以及用于模型测试和模型验证的测试集合和交叉验证集合等。
另外,对于编码网络模型和解码网络模型的构建,需要建立多层深度神经网络模型,即端到端的编解码网络结构。如图6所示,其示出了本申请实施例提供的一种包括预设编码网络模型和预设解码网络模型的端到端网络结构示意图。在图6中,编码端采用编码网络模型结构,可以包括有主编码器、超先验(Hyper prior)编码器和上下文模型。其中,对于输入图像,可以划分为多个图像块。针对每一图像块,主编码器的作用为将输入的原始图像变换为通道数为192,行和列尺寸分别为原尺寸大小1/16的特征图。超先验编解码器及上下文模型的作用为根据特征图来估计特征图中像素的概率分布,并且将该概率分布提供给熵编码器。这里的熵编码器可以采用算术编码,而且为无损熵编码压缩。在编码端,针对主编码器所产生的特征图可以通过量化模块采用四舍五入取整的方式进行量化,熵编码器利用超先验编码器、超先验解码器和上下文模型提供的概率分布对量化后的特征图进行无损熵编码(如算术编码)形成码流;并且超先验编码器产生的压缩数据采用固定概率分布进行概率计算,在经过熵编码器后作为额外信息加入到最终的码流中。解码端采用解码网络模型结构,可以包括有主解码器、超先验解码器以及上下文模型。其中,超先验解码器以及上下文模型的作用为通过额外信息解码出特征图中像素的概率分布提供给熵解码器,而主解码器作用为将特征图还原为重构块,然后再根据重构块拼接成重构图像。
还需要注意的是,在图6中,主编码器是用于将图像的像素域转换为特征域,超先验编码器是用于 将特征域转换为概率分布;而超先验解码器则是用于将概率分布转换为特征域,再由主解码器将特征域转换为像素域,以重建出重构图像。另外,上下文模型的概率分布可以采用(μ,σ)表示;其中,μ表示均值,σ表示方差。
这样,构建出编码网络模型和解码网络模型后,可以利用训练集合以及预设算法对编码网络模型和解码网络模型进行模型训练。具体来讲,在一些实施例中,所述基于所述训练集合对所述编码网络模型和所述解码网络模型进行模型训练,得到所述预设编码网络模型和所述预设解码网络模型,可以包括:
基于所述训练集合,利用预设算法对所述编码网络模型和所述解码网络模型进行模型训练;
当所述模型训练的代价函数对应的损失值收敛到预设阈值时,将训练后得到的编码网络模型和解码网络模型确定为所述预设编码网络模型和所述预设解码网络模型。
需要说明的是,预设算法可以为Adam梯度优化算法。采用Adam梯度优化算法对编码网络模型和解码网络模型的端到端网络结构进行模型训练。这里,代价函数可以为率失真代价函数,失真度为网络结构输入的训练图像和网络结构输出的重构图像之间的均方差。其中,码率通过利用超先验编码器、超先验解码器及上下文模型所得的概率分布计算特征图中像素包含的信息量进行估计。利用训练集合对编码网络模型和解码网络模型进行充分训练,在其代价函数对应的损失值达到收敛且收敛到预设阈值后,保存编码网络模型和解码网络模型,以作为本申请实施例中端到端网络结构的预设编码网络模型和预设解码网络模型。
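率失真代价的构成可用如下假设性PyTorch草图示意:失真取训练图像与重构图像之间的均方差,码率利用超先验及上下文模型估计出的高斯分布(μ,σ)对量化后特征图逐像素估计信息量;其中权重lam仅为示例性的率失真折中系数,函数及参数命名均为假设。

```python
import torch
from torch.distributions import Normal

def rate_distortion_loss(x, x_hat, y_hat, mu, sigma, lam=0.01):
    """率失真代价 J = R + lam * D 的示意实现。

    x, x_hat : 训练图像与网络输出的重构图像;
    y_hat    : 四舍五入量化后的特征图;
    mu, sigma: 超先验与上下文模型估计出的高斯分布参数 (μ, σ)。
    """
    # 失真:训练图像与重构图像之间的均方差
    distortion = torch.mean((x - x_hat) ** 2)

    # 码率:以量化区间 [y-0.5, y+0.5] 上的高斯概率估计每个特征像素包含的信息量(比特)
    gauss = Normal(mu, sigma)
    prob = gauss.cdf(y_hat + 0.5) - gauss.cdf(y_hat - 0.5)
    bits = -torch.log2(prob.clamp(min=1e-9))
    num_pixels = x.shape[0] * x.shape[2] * x.shape[3]
    rate = bits.sum() / num_pixels            # 折算为每像素比特数(示意)

    return rate + lam * distortion
```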
在预设编码网络模型和预设解码网络模型的基础上,对后处理网络模型进行训练。具体地,在一些实施例中,该方法还可以包括:
获取多个重构训练块;其中,所述多个重构训练块是由所述训练集合中的至少一张训练图像所划分的多个训练块经由所述预设编码网络模型和所述预设解码网络模型后得到的;
对所述多个重构训练块进行拼接,得到至少一张重构训练图像;
构建后处理网络模型,基于所述至少一张重构训练图像对所述后处理网络模型进行训练,得到所述预设后处理网络模型。
进一步地,所述基于所述至少一张重构训练图像对所述后处理网络模型进行训练,得到所述预设后处理网络模型,可以包括:
基于所述至少一张重构训练图像,利用预设算法对所述后处理网络模型进行模型训练;
当所述模型训练的代价函数对应的损失值收敛到预设阈值时,将训练后得到的后处理网络模型确定为所述预设后处理网络模型。
需要说明的是,对于后处理网络模型的模型训练,也可以采用Adam梯度优化算法。针对训练集合中的训练图像,可以将其划分为等大且无重叠的多个训练块并输入预设编码网络模型和预设解码网络模型后,将所得到的多个重构块重新拼接为带有块效应的重构训练图像。这时候可以将带有块效应的重构训练图像作为后处理网络模型的训练输入图像,将训练集合中的训练图像作为后处理网络模型的训练目标图像;然后可以根据训练输入图像和训练目标图像的均方差构建模型训练的代价函数。在利用Adam梯度优化算法训练后处理网络的过程中,保持预设编码网络模型和预设解码网络模型的网络参数固定,仅迭代更新后处理网络模型。当其代价函数对应的损失值达到收敛且收敛到预设阈值后,这时候训练得到的后处理网络模型即为预设后处理网络模型。这里,预设阈值根据实际情况进行具体设定,本申请实施例不作任何限定。
S503:对所述多个重构块进行拼接,生成重构图像。
S504:利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像。
需要说明的是,针对重构图像,在得到预设后处理网络模型后,可以利用该预设后处理网络模型对重构图像中的块边界进行滤波处理,以得到消除块效应的目标图像。
在一些实施例中,对于S504来说,所述利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像,可以包括:
确定所述重构图像中包括所述块边界的至少一个矩形区域;
将所述至少一个矩形区域输入所述预设后处理网络模型,得到至少一个处理后的矩形区域;
利用所述至少一个处理后的矩形区域替换所述重构图像中包括所述块边界的对应局部区域,得到所述目标图像。
进一步地,在得到至少一个处理后的矩形区域之后,该方法还可以包括:
对所述至少一个处理后的矩形区域进行裁剪,得到至少一个目标矩形区域;
利用所述至少一个目标矩形区域替换所述重构图像中包括所述块边界的对应局部区域,得到所述目标图像。
也就是说,在提取确定出重构图像中包括块边界的至少一个矩形区域后,可以将这至少一个矩形区域输入预设后处理网络模型,得到至少一个处理后的矩形区域;然后利用这至少一个处理后的矩形区域替换重构图像中包括块边界的对应局部区域,得到目标图像。这时候所得到的目标图像能够减弱块效应。为了进一步消除块效应,还可以消除预设后处理网络模型对边界补0的卷积操作所导致的边界图像失真,这时候需要对这至少一个处理后的矩形区域进行裁剪,比如舍弃左右两侧宽度为8像素、高度为128像素的边缘像素区域,仅保留中心大小为16*128的矩形区域,以得到至少一个目标矩形区域;最后利用这至少一个目标矩形区域替换重构图像中包括块边界的对应局部区域,可以得到无明显块效应的目标图像。
本实施例提供了一种图像处理方法,通过接收编码设备传输的码流;其中,所述码流是由待处理图像所划分的多个图像块经由预设编码网络模型后得到的;利用预设解码网络模型解析所述码流,获取多个重构块;对所述多个重构块进行拼接,生成重构图像;利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像。这样,针对待处理图像所划分的多个图像块,块与块之间完全独立,可以利用预设编码网络模型和预设解码网络模型实现多核并行编解码处理;而且由于分块后输入预设编码网络模型和预设解码网络模型的图像尺寸降低,还可以降低编解码的运行时间以及运行内存需求;另外,通过对重构图像中的块边界进行滤波处理,还能够消除分块边界处的块效应,且还能够提高重构图像的峰值信噪比;同时由于仅对块边界的矩形区域进行后处理,还降低了后处理网络的总计算量,且对每个矩形区域处理完全独立,还可以实现后处理的并行化,进一步降低了单核后处理运行时间及内存需求。
本申请的又一实施例中,参见图7,其示出了本申请实施例提供的又一种图像处理方法的流程示意图。如图7所示,该方法可以包括:
S701:获取待处理图像。
S702:对所述待处理图像进行分块,得到多个图像块;其中,所述多个图像块大小相等且无重叠。
S703:利用预设编码网络模型对所述多个图像块进行编码,生成码流。
S704:将所述码流传输到解码设备。
需要说明的是,该方法应用于编码设备。在编码设备通过预设编码网络模型进行压缩编码生成码流后,可以将码流传输到解码设备,由解码设备利用预设解码网络模型来解析码流,从而获取到多个重构块。
还需要说明的是,预设编码网络模型和预设解码网络模型是基于神经网络结构进行模型训练得到的。其中,预设编码网络模型用于指示编码设备对待处理图像所划分的多个图像块进行编码以生成码流,预设解码网络模型用于指示解码设备解析码流以得到多个重构块。
这里,对于预设编码网络模型和预设解码网络模型而言,在一些实施例中,该方法还可以包括:
获取训练集合;其中,所述训练集合包括至少一张训练图像;
构建编码网络模型和解码网络模型,基于所述训练集合对所述编码网络模型和所述解码网络模型进行模型训练,得到预设编码网络模型和预设解码网络模型。
进一步地,所述基于所述训练集合对所述编码网络模型和所述解码网络模型进行模型训练,得到预设编码网络模型和预设解码网络模型,可以包括:
基于所述训练集合,利用预设算法对所述编码网络模型和所述解码网络模型进行模型训练;
当所述模型训练的代价函数对应的损失值收敛到预设阈值时,将训练后得到的编码网络模型和解码网络模型确定为所述预设编码网络模型和所述预设解码网络模型。
需要说明的是,对于编码网络模型和解码网络模型的构建,需要建立多层深度神经网络模型,即端到端的编解码网络结构,具体如图6所示。在构建出编码网络模型和解码网络模型后,可以采用Adam梯度优化算法对编码网络模型和解码网络模型进行模型训练。这里,代价函数可以为率失真代价函数,失真度为网络结构输入的训练图像和网络结构输出的重构图像之间的均方差。利用训练集合对编码网络模型和解码网络模型进行充分训练,在其代价函数对应的损失值达到收敛且收敛到预设阈值后,这时候训练得到的编码网络模型和解码网络模型即为本申请实施例中的预设编码网络模型和预设解码网络模型。
本实施例提供了一种图像处理方法,通过获取待处理图像;对所述待处理图像进行分块,得到多个图像块;其中,所述多个图像块大小相等且无重叠;利用预设编码网络模型对所述多个图像块进行编码,生成码流;将所述码流传输到解码设备。这样,针对待处理图像所划分的多个图像块,块与块之间完全独立,可以利用预设编码网络模型和预设解码网络模型实现多核并行编解码处理;而且由于分块后输入预设编码网络模型和预设解码网络模型的图像尺寸降低,从而降低了编解码的运行时间以及运行内存需求。
本申请的再一实施例中,基于前述实施例相同的发明构思,参见图8,其示出了本申请实施例提供的一种图像处理方法的详细流程示意图。如图8所示,该详细流程可以包括:
S801:构建训练集合。
需要说明的是,对于步骤S801来说,可以选取合适的静态图像训练集合。这里,训练集合的选取对于整个神经网络的训练有很大的影响,在本申请实施例中,可以选取NIC数据集。NIC数据集是基于深度学习的图像压缩的IEEE标准测试模型NIC的开发数据集,在该数据集中,可以包括有图像大小为256*256的训练集合,也可以包括有图像大小为256*256的验证集合和测试集合。
S802:建立多层深度神经网络模型,包括编码网络模型、解码网络模型和后处理网络模型。
需要说明的是,编码网络模型和解码网络模型的端到端网络结构如图6所示。编码端采用编码网络模型结构,包含主编码器、超先验编码器及上下文模型。主编码器的作用为将输入图像变换为通道数为192,行和列尺寸分别为原大小1/16的特征图。超先验编解码器及上下文模型的作用为根据特征图来估计特征图中像素的概率分布并提供给熵编码器。在编码端,超先验编码器产生的压缩数据采用固定概率分布进行概率计算,经熵编码后作为额外信息加入到最终的压缩码流中。解码端采用解码网络模型结构,包含主解码器、超先验解码器及上下文模型。超先验解码器及上下文模型的作用为通过额外信息解码出特征图中像素的概率分布并提供给熵解码器。主解码器的作用为将特征图还原为重构图像。
对于后处理网络模型而言,具体如图2所示,可以由卷积层、激活函数及用于提高模型性能的多个级联的残差块构成。其中,残差块内部的具体网络结构如图3所示。在图中,k3n128表示卷积核大小为3*3,输出特征数为128,步长为1的卷积层;k3n3表示卷积核大小为3*3,输出特征数为3,步长为1的卷积层。
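图3中k3n128所表示的卷积核3*3、输出特征数128、步长1的卷积层,可结合残差连接用如下PyTorch草图示意;层数与激活函数仅为示例性假设,以实际网络结构为准。

```python
import torch.nn as nn

class ResidualBlockSketch(nn.Module):
    """示意用残差块:两个 k3n128 卷积层(3*3 卷积、128 个输出特征、步长 1),
    中间使用 ReLU 激活,输出与输入逐元素相加构成残差连接。"""
    def __init__(self, channels=128):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return x + out
```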
S803:利用训练集合和预设算法对编码网络模型和解码网络模型进行模型训练,得到预设编码网络模型和预设解码网络模型。
需要说明的是,对于步骤S803来说,可以采用Adam梯度优化算法对端到端的编码网络模型和解码网络模型进行模型训练。其中,代价函数为率失真代价函数,失真度为网络结构输入的训练图像与网络结构输出的重构图像之间的均方差;码率通过利用超先验编码器、超先验解码器及上下文模型所得的概率分布计算特征图中像素包含的信息量进行估计。通过在S801中建立的训练集合的基础上进行充分训练,当其代价函数对应的损失值达到收敛后,保存训练后的编码网络模型和解码网络模型,将其作为端到端的预设编码网络模型和预设解码网络模型。
S804:基于训练得到的预设编码网络模型和预设解码网络模型,利用训练集合和预设算法对后处理网络模型进行模型训练,得到预设后处理网络模型。
需要说明的是,使用步骤S803中保存的预设编码网络模型和预设解码网络模型对后处理网络模型进行模型训练。针对步骤S801中所述训练集合中尺寸为256*256的训练图像,将其划分为等大且无重叠的128*128的训练块并输入预设编码网络模型和预设解码网络模型后,将所得到的重构块重新拼接为256*256的带有块效应的重构训练图像。这时候可以将带有块效应的重构训练图像作为后处理网络模型的训练输入图像,将训练集合中未经编码压缩的训练图像作为后处理网络模型的训练目标图像;然后可以根据训练输入图像和训练目标图像的均方差构建模型训练的代价函数,采用Adam梯度优化算法训练后处理网络,在训练过程中保持预设编码网络模型和预设解码网络模型的网络参数固定,仅迭代更新后处理网络模型。在其代价函数对应的损失(Loss)值达到收敛后,训练得到的后处理网络模型即为预设后处理网络模型。
S805:将待处理图像划分为128*128的等大小且无重叠的图像块并输入预设编码网络模型,生成待传输的码流。
需要说明的是,针对待处理图像,可以划分为等大无重叠的多个图像块,将这些图像块输入预设编码网络模型,以生成码流;具体可以是将预设编码网络模型的输出数据经过量化和无损熵编码输出为压缩数据。
也就是说,在编码端,将待处理图像划分为128*128等大小无重叠的图像块后输入预设编码网络模型,利用预设编码网络模型对每个图像块独立地进行编码产生特征图。然后对特征图采用四舍五入取整的方式进行量化,熵编码器则利用超先验编码器、超先验解码器及上下文模型提供的概率分布对量化后的特征图进行无损熵编码(如算术编码)形成码流,并与超先验编码器产生的额外码流叠加作为最终的压缩数据,再以码流形式传输到解码端。
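编码端将待处理图像划分为128*128等大无重叠图像块这一步,可用NumPy示意如下;该草图假设图像高宽均为128的整数倍,函数命名为示例。

```python
import numpy as np

def split_into_blocks(image, block=128):
    """将 (H, W, C) 的待处理图像划分为等大且无重叠的 block*block 图像块,
    返回形状为 (num_blocks, block, block, C) 的数组以及块的行列数。"""
    h, w, c = image.shape
    rows, cols = h // block, w // block
    blocks = (image[:rows * block, :cols * block]
              .reshape(rows, block, cols, block, c)
              .transpose(0, 2, 1, 3, 4)
              .reshape(rows * cols, block, block, c))
    return blocks, (rows, cols)
```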
S806:通过预设解码网络模型解析码流,得到128*128的重构块,并且拼接生成重构图像。
需要说明的是,解码端以同编码端对称的方式,通过熵解码器和预设解码网络模型将每个块的特征图重建为128*128的重构块,最后通过拼接的方式重建出带有明显块效应的重构图像。
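与上面的划分函数对称,解码端将各128*128的重构块按行列顺序拼接为重构图像的过程可示意如下(同样为假设性草图):

```python
def merge_blocks(blocks, grid, block=128):
    """将 (rows*cols, block, block, C) 的重构块按行列顺序拼接为重构图像。"""
    rows, cols = grid
    c = blocks.shape[-1]
    return (blocks.reshape(rows, cols, block, block, c)
                  .transpose(0, 2, 1, 3, 4)
                  .reshape(rows * block, cols * block, c))
```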
S807:利用预设后处理网络模型对重构图像中的块边界进行局部后处理,得到目标图像。
需要说明的是,对于步骤S807来说,对步骤S806中重构图像的块边界进行局部后处理。具体地,采用如图4所示的方式提取图像边界附近的矩形区域。对于横向边界的矩形区域具体范围为,横向:块边界左侧16像素至块边界右侧16像素;纵向:块上沿至块下沿。对于纵向边界的矩形区域具体范围为,纵向:块边界上侧16像素至块边界下侧16像素;横向:块左沿至块右沿。对于步骤S806中采用大小为128*128的重构块拼接成的重构图像,其矩形区域的大小均为32*128像素。将该矩形区域像素输入步骤S804中训练的预设后处理网络模型,输出减弱块效应的边界矩形区域。为了进一步消除预设后处理网络模型中对边界补0的卷积操作所导致的边界图像失真,本申请实施例可以对预设后处理网络模型输出的矩形区域进行进一步裁剪,例如,舍弃左右两侧宽度为8像素、高度为128像素的边缘像素区域,仅保留中心大小为16*128的矩形区域。最后用该大小为16*128的矩形区域替换原重构图像中对应的块边界矩形区域,可以得到无明显块效应的重构图像。
这样,本申请实施例提供了一种对于静态图像的分块编解码方案,通过对输入图像进行分块后对每个图像块独立地进行编解码,能够实现图像的多核并行编解码处理,从而减少了对图像进行编解码所需的运行时间以及每个核的运行内存需求;另外,针对重构图像的边界处进行局部后处理,可以减小分块边界处的块效应。具体步骤如下:(1)选取合适的静态图像训练集合、测试集合和验证集合;(2)建立端到端网络的编码网络模型、解码网络模型以及重构图像的后处理网络模型;(3)训练端到端网络的编码网络模型和解码网络模型,训练后得到预设编码网络模型和预设解码网络模型;(4)将训练集合中的训练图像分成128*128的无重叠块输入训练后的预设编码网络模型和预设解码网络模型,然后将解码得到的重构块拼接成重构图像后作为新的训练数据训练后处理网络模型,训练后得到预设后处理网络模型;(5)编码端将经过预设编码网络模型后的输出数据,在经过量化和无损熵编码后作为压缩数据,以码流形式传输到解码端;(6)解码端通过预设解码网络模型将码流还原成128*128的重构块,并将其拼接重建出重构图像;(7)利用预设后处理网络模型对重构图像中的块边界区域进行局部后处理,以减小边界处的块效应,最后得到目标图像。
简言之,在本申请实施例中,在现有图像编解码网络结构的基础上对输入图像作分块处理,块与块之间独立地编解码,实现编解码多核并行处理,能够降低运行时间及单核运行内存需求。另外,针对重构图像中块边界处的矩形区域进行局部后处理,可以减小总计算量,同时每个矩形区域在处理过程中完全独立,能够实现后处理的并行化,从而能够达到降低运行时间及单核内存需求的效果。
也就是说,本申请实施例的技术方案能够实现图像的多核并行编解码,且降低单核编解码运行时间及运行内存需求。由于基于预设编码网络模型和预设解码网络模型的编解码过程中,划分得到的多个图像块块与块之间完全独立,因此可以实现图像的多核并行编解码。另外,由于分块后输入预设编码网络模型和预设解码网络模型的图像尺寸降低,因此还降低了单核编解码所需要的运行时间及运行内存需求。如表1所示,在原图像大小为512*768的柯达测试集中,每个核的运行时间及单核运行内存需求均减小为无分块编解码处理的1/20。其中,运行时间的单位为秒(Second,s),运行内存需求的单位为兆字节(MByte,MB)。
表1
  无分块编解码网络参数 有分块编解码网络参数
运行时间(s) 10.85 0.5
运行内存需求(MB) 5195 263
另外,本申请实施例的技术方案采用预设后处理网络模型消除重构图像中的块效应,可以提高重构图像的峰值信噪比。而且本申请实施例的技术方案采用预设后处理网络模型解决了由于分块编解码导致块边界不连续产生的块效应;如图9A和图9B所示,图9A的重构图像中明显存在块效应,在经过后处理之后,图9B可以明显看出重构图像的块效应问题得到了有效解决。如图10所示,其示出了码率和峰值信噪比之间的率失真曲线示例;经过预设后处理网络模型的后处理之后,重构图像的峰值信噪比比无后处理提高了约0.05dB。此外,本申请实施例的技术方案仅对块边界矩形区域进行后处理,可以降低后处理网络的总计算量,且对每个矩形区域的处理完全独立,还可以实现后处理的并行化,从而降低了单核后处理运行时间及内存需求。本方案采用的后处理方法为块边界矩形区域局部后处理。以处理柯达数据集为例,相比于整幅图像的后处理,总计算量可减少至40%,且由表2可知,每个核后处理所需要的运行时间及运行内存均减小至整幅图像后处理的1/90。
表2
  无分块后处理网络参数 有分块后处理网络参数
运行时间(s) 11.72 0.13
运行内存需求(MB) 6352 63
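上文提到的峰值信噪比(PSNR)按常用定义由均方差计算得到,示意如下;该函数仅为假设性草图,并假定像素取值范围为0~255。

```python
import numpy as np

def psnr(target, recon, max_val=255.0):
    """由均方差计算峰值信噪比(dB):PSNR = 10 * log10(MAX^2 / MSE)。"""
    mse = np.mean((target.astype(np.float64) - recon.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```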
本实施例提供了一种图像处理方法,通过本实施例对前述实施例的具体实现进行了详细阐述,从中可以看出,针对待处理图像所划分的多个图像块,块与块之间完全独立,可以利用预设编码网络模型和预设解码网络模型实现多核并行编解码处理;而且由于分块后输入预设编码网络模型和预设解码网络模型的图像尺寸降低,还可以降低编解码的运行时间以及运行内存需求;另外,通过对重构图像中的块边界进行滤波处理,还能够消除分块边界处的块效应,且还能够提高重构图像的峰值信噪比;同时由于仅对块边界的矩形区域进行后处理,还降低了后处理网络的总计算量,且对每个矩形区域处理完全独立,还可以实现后处理的并行化,进一步降低了单核后处理运行时间及内存需求。
本申请的再一实施例中,基于前述实施例相同的发明构思,参见图11,其示出了本申请实施例提供的一种图像处理装置110的组成结构示意图。如图11所示,图像处理装置110可以包括:获取单元1101和处理单元1102;其中,
获取单元1101,配置为获取多个重构块;其中,所述多个重构块是由待处理图像所划分的多个图像块经由预设编码网络模型和预设解码网络模型后得到的;
处理单元1102,配置为对所述多个重构块进行拼接,生成重构图像;以及利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像。
在一些实施例中,参见图11,图像处理装置110还可以包括构建单元1103和训练单元1104;其中,
获取单元1101,还配置为获取多个重构训练块;其中,所述多个重构训练块是由训练集合中的至少一张训练图像所划分的多个训练块经由所述预设编码网络模型和所述预设解码网络模型后得到的;以及对所述多个重构训练块进行拼接,得到至少一张重构训练图像;
构建单元1103,配置为构建后处理网络模型;
训练单元1104,配置为基于所述至少一张重构训练图像对所述后处理网络模型进行训练,得到所述预设后处理网络模型。
进一步地,训练单元1104,具体配置为基于所述至少一张重构训练图像,利用预设算法对所述后处理网络模型进行模型训练;以及当所述模型训练的代价函数对应的损失值收敛到预设阈值时,将训练后得到的后处理网络模型确定为所述预设后处理网络模型。
在一些实施例中,参见图11,图像处理装置110还可以包括确定单元1105,配置为确定所述重构图像中包括所述块边界的至少一个矩形区域;
处理单元1102,具体配置为将所述至少一个矩形区域输入所述预设后处理网络模型,得到至少一个处理后的矩形区域;以及利用所述至少一个处理后的矩形区域替换所述重构图像中包括所述块边界的对应局部区域,得到所述目标图像。
进一步地,处理单元1102,还配置为对所述至少一个处理后的矩形区域进行裁剪,得到至少一个目标矩形区域;以及利用所述至少一个目标矩形区域替换所述重构图像中包括所述块边界的对应局部区域,得到所述目标图像。
可以理解地,在本申请实施例中,“单元”可以是部分电路、部分处理器、部分程序或软件等等,当然也可以是模块,还可以是非模块化的。而且在本实施例中的各组成部分可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。
所述集成的单元如果以软件功能模块的形式实现并非作为独立的产品进行销售或使用时,可以存储在一个计算机可读取存储介质中,基于这样的理解,本实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)或processor(处理器)执行本实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
因此,本申请实施例提供了一种计算机存储介质,应用于图像处理装置110,该计算机存储介质存储有图像处理程序,所述图像处理程序被第一处理器执行时实现前述实施例中任一项所述的方法。
基于上述图像处理装置110的组成以及计算机存储介质,参见图12,其示出了本申请实施例提供的图像处理装置110的硬件结构示意图。如图12所示,图像处理装置110可以包括:第一通信接口1201、第一存储器1202和第一处理器1203;各个组件通过第一总线系统1204耦合在一起。可理解,第一总线系统1204用于实现这些组件之间的连接通信。第一总线系统1204除包括数据总线之外,还包括电源总线、控制总线和状态信号总线。但是为了清楚说明起见,在图12中将各种总线都标为第一总线系统1204。其中,
第一通信接口1201,用于在与其他外部网元之间进行收发信息过程中,信号的接收和发送;
第一存储器1202,用于存储能够在第一处理器1203上运行的计算机程序;
第一处理器1203,用于在运行所述计算机程序时,执行:
获取多个重构块;其中,所述多个重构块是由待处理图像所划分的多个图像块经由预设编码网络模型和预设解码网络模型后得到的;
对所述多个重构块进行拼接,生成重构图像;
利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像。
可以理解,本申请实施例中的第一存储器1202可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(Read-Only Memory,ROM)、可编程只读存储器(Programmable ROM,PROM)、可擦除可编程只读存储器(Erasable PROM,EPROM)、电可擦除可编程只读存储器(Electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(Random Access Memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(Static RAM,SRAM)、动态随机存取存储器(Dynamic RAM,DRAM)、同步动态随机存取存储器(Synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(Double Data Rate SDRAM,DDRSDRAM)、增强型同步动态随机存取存储器(Enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(Synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(Direct Rambus RAM,DRRAM)。本申请描述的系统和方法的第一存储器1202旨在包括但不限于这些和任意其它适合类型的存储器。
而第一处理器1203可能是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法的各步骤可以通过第一处理器1203中的硬件的集成逻辑电路或者软件形式的指令完成。上述的第一处理器1203可以是通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现成可编程门阵列(Field Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于第一存储器1202,第一处理器1203读取第一存储器1202中的信息,结合其硬件完成上述方法的步骤。
可以理解的是,本申请描述的这些实施例可以用硬件、软件、固件、中间件、微码或其组合来实现。对于硬件实现,处理单元可以实现在一个或多个专用集成电路(Application Specific Integrated Circuits,ASIC)、数字信号处理器(Digital Signal Processing,DSP)、数字信号处理设备(DSP Device,DSPD)、可编程逻辑设备(Programmable Logic Device,PLD)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)、通用处理器、控制器、微控制器、微处理器、用于执行本申请所述功能的其它电子单元或其组合中。对于软件实现,可通过执行本申请所述功能的模块(例如过程、函数等)来实现本申请所述的技术。软件代码可存储在存储器中并通过处理器执行。存储器可以在处理器中或在处理器外部实现。
可选地,作为另一个实施例,第一处理器1203还配置为在运行所述计算机程序时,执行前述实施例中任一项所述的方法。
本实施例提供了一种图像处理装置,该图像处理装置可以包括获取单元和处理单元。这样,针对待处理图像所划分的多个图像块,块与块之间完全独立,可以利用预设编码网络模型和预设解码网络模型实现多核并行编解码处理;而且由于分块后输入预设编码网络模型和预设解码网络模型的图像尺寸降低,还可以降低编解码的运行时间以及运行内存需求;另外,通过对重构图像中的块边界进行滤波处理,还能够消除分块边界处的块效应,且还能够提高重构图像的峰值信噪比;同时本申请仅对块边界的矩形区域进行后处理,还降低了后处理网络的总计算量,且对每个矩形区域处理完全独立,还可以实现后处理的并行化,进一步降低了单核后处理运行时间及内存需求。
本申请的再一实施例中,基于前述实施例相同的发明构思,参见图13,其示出了本申请实施例提供的一种解码设备130的组成结构示意图。如图13所示,解码设备130可以包括:接收单元1301、解码单元1302和后处理单元1303;其中,
接收单元1301,配置为接收编码设备传输的码流;其中,所述码流是由待处理图像所划分的多个图像块经由预设编码网络模型后得到的;
解码单元1302,配置为利用预设解码网络模型解析所述码流,获取多个重构块;
后处理单元1303,配置为对所述多个重构块进行拼接,生成重构图像,并利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像。
在一些实施例中,参见图13,解码设备130还可以包括获取单元1304、构建单元1305和训练单元1306;其中,
获取单元1304,配置为获取训练集合;其中,所述训练集合包括至少一张训练图像;
构建单元1305,配置为构建编码网络模型和解码网络模型;
训练单元1306,配置为基于所述训练集合对所述编码网络模型和所述解码网络模型进行模型训练,得到所述预设编码网络模型和所述预设解码网络模型。
进一步地,训练单元1306,具体配置为基于所述训练集合,利用预设算法对所述编码网络模型和所述解码网络模型进行模型训练;以及当所述模型训练的代价函数对应的损失值收敛到预设阈值时,将训练后得到的编码网络模型和解码网络模型确定为所述预设编码网络模型和所述预设解码网络模型。
在一些实施例中,获取单元1304,还配置为获取多个重构训练块;其中,所述多个重构训练块是由所述训练集合中的至少一张训练图像所划分的多个训练块经由所述预设编码网络模型和所述预设解码网络模型后得到的;以及对所述多个重构训练块进行拼接,得到至少一张重构训练图像;
构建单元1305,还配置为构建后处理网络模型;
训练单元1306,还配置为基于所述至少一张重构训练图像对所述后处理网络模型进行训练,得到所述预设后处理网络模型。
进一步地,训练单元1306,具体配置为基于所述至少一张重构训练图像,利用预设算法对所述后处理网络模型进行模型训练;以及当所述模型训练的代价函数对应的损失值收敛到预设阈值时,将训练后得到的后处理网络模型确定为所述预设后处理网络模型。
在一些实施例中,后处理单元1303,具体配置为确定所述重构图像中包括所述块边界的至少一个矩形区域;以及将所述至少一个矩形区域输入所述预设后处理网络模型,得到至少一个处理后的矩形区域;以及利用所述至少一个处理后的矩形区域替换所述重构图像中包括所述块边界的对应局部区域,得到所述目标图像。
进一步地,后处理单元1303,还配置为对所述至少一个处理后的矩形区域进行裁剪,得到至少一个目标矩形区域;以及利用所述至少一个目标矩形区域替换所述重构图像中包括所述块边界的对应局部区域,得到所述目标图像。
可以理解地,在本实施例中,“单元”可以是部分电路、部分处理器、部分程序或软件等等,当然也可以是模块,还可以是非模块化的。而且在本实施例中的各组成部分可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。
所述集成的单元如果以软件功能模块的形式实现并非作为独立的产品进行销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本实施例提供了一种计算机存储介质,应用于解码设备130,该计算机存储介质存储有图像处理程序,所述图像处理程序被第二处理器执行时实现前述实施例中任一项所述的方法。
基于上述解码设备130的组成以及计算机存储介质,参见图14,其示出了本申请实施例提供的解码设备130的硬件结构示意图。如图14所示,解码设备130可以包括:第二通信接口1401、第二存储器1402和第二处理器1403;各个组件通过第二总线系统1404耦合在一起。可理解,第二总线系统1404用于实现这些组件之间的连接通信。第二总线系统1404除包括数据总线之外,还包括电源总线、控制总线和状态信号总线。但是为了清楚说明起见,在图14中将各种总线都标为第二总线系统1404。其中,
第二通信接口1401,用于在与其他外部网元之间进行收发信息过程中,信号的接收和发送;
第二存储器1402,用于存储能够在第二处理器1403上运行的计算机程序;
第二处理器1403,用于在运行所述计算机程序时,执行:
接收编码设备传输的码流;其中,所述码流是由待处理图像所划分的多个图像块经由预设编码网络模型后得到的;
利用预设解码网络模型解析所述码流,获取多个重构块;
对所述多个重构块进行拼接,生成重构图像;
利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像。
可选地,作为另一个实施例,第二处理器1403还配置为在运行所述计算机程序时,执行前述实施例中任一项所述的方法。
可以理解,第二存储器1402与第一存储器1202的硬件功能类似,第二处理器1403与第一处理器1203的硬件功能类似;这里不再详述。
本实施例提供了一种解码设备,该解码设备可以包括接收单元、解码单元和后处理单元。这样,针对待处理图像所划分的多个图像块,块与块之间完全独立,可以利用预设编码网络模型和预设解码网络模型实现多核并行编解码处理;而且由于分块后输入预设编码网络模型和预设解码网络模型的图像尺寸降低,还可以降低编解码的运行时间以及运行内存需求;另外,通过对重构图像中的块边界进行滤波处理,还能够消除分块边界处的块效应,且还能够提高重构图像的峰值信噪比;同时本申请仅对块边界的矩形区域进行后处理,还降低了后处理网络的总计算量,且对每个矩形区域处理完全独立,还可以实现后处理的并行化,进一步降低了单核后处理运行时间及内存需求。
本申请的再一实施例中,基于前述实施例相同的发明构思,参见图15,其示出了本申请实施例提供的一种编码设备150的组成结构示意图。如图15所示,编码设备150可以包括:获取单元1501、分块单元1502、编码单元1503和发送单元1504;其中,
获取单元1501,配置为获取待处理图像;
分块单元1502,配置为对所述待处理图像进行分块,得到多个图像块;其中,所述多个图像块大小相等且无重叠;
编码单元1503,配置为利用预设编码网络模型对所述多个图像块进行编码,生成码流;
发送单元1504,配置为将所述码流传输到解码设备。
在一些实施例中,参见图15,编码设备150还可以包括构建单元1505和训练单元1506;其中,
获取单元1501,还配置为获取训练集合;其中,所述训练集合包括至少一张训练图像;
构建单元1505,配置为构建编码网络模型和解码网络模型;
训练单元1506,配置为基于所述训练集合对所述编码网络模型和所述解码网络模型进行模型训练,得到预设编码网络模型和预设解码网络模型;其中,预设解码网络模型用于指示所述解码设备解析所述码流以得到多个重构块。
进一步地,训练单元1506,具体配置为基于所述训练集合,利用预设算法对所述编码网络模型和所述解码网络模型进行模型训练;以及当所述模型训练的代价函数对应的损失值收敛到预设阈值时,将训练后得到的编码网络模型和解码网络模型确定为所述预设编码网络模型和所述预设解码网络模型。
可以理解地,在本实施例中,“单元”可以是部分电路、部分处理器、部分程序或软件等等,当然也可以是模块,还可以是非模块化的。而且在本实施例中的各组成部分可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。
所述集成的单元如果以软件功能模块的形式实现并非作为独立的产品进行销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本实施例提供了一种计算机存储介质,应用于编码设备150,该计算机存储介质存储有图像处理程序,所述图像处理程序被第三处理器执行时实现前述实施例中任一项所述的方法。
基于上述编码设备150的组成以及计算机存储介质,参见图16,其示出了本申请实施例提供的编码设备150的硬件结构示意图。如图16所示,编码设备150可以包括:第三通信接口1601、第三存储器1602和第三处理器1603;各个组件通过第三总线系统1604耦合在一起。可理解,第三总线系统1604用于实现这些组件之间的连接通信。第三总线系统1604除包括数据总线之外,还包括电源总线、控制总线和状态信号总线。但是为了清楚说明起见,在图16中将各种总线都标为第三总线系统1604。其中,
第三通信接口1601,用于在与其他外部网元之间进行收发信息过程中,信号的接收和发送;
第三存储器1602,用于存储能够在第三处理器1603上运行的计算机程序;
第三处理器1603,用于在运行所述计算机程序时,执行:
获取待处理图像;
对所述待处理图像进行分块,得到多个图像块;其中,所述多个图像块大小相等且无重叠;
利用预设编码网络模型对所述多个图像块进行编码,生成码流;
将所述码流传输到解码设备。
可选地,作为另一个实施例,第三处理器1603还配置为在运行所述计算机程序时,执行前述实施例中任一项所述的方法。
可以理解,第三存储器1602与第一存储器1202的硬件功能类似,第三处理器1603与第一处理器1203的硬件功能类似;这里不再详述。
本实施例提供了一种编码设备,该编码设备可以包括获取单元、分块单元、编码单元和发送单元。这样,针对待处理图像所划分的多个图像块,块与块之间完全独立,可以利用预设编码网络模型和预设解码网络模型实现多核并行编解码处理;而且由于分块后输入预设编码网络模型和预设解码网络模型的图像尺寸降低,从而降低了编解码的运行时间以及运行内存需求。
本申请的再一实施例中,基于前述实施例相同的发明构思,参见图17,其示出了本申请实施例提供的一种视频系统170的组成结构示意图。如图17所示,视频系统170可以包括:前述实施例所述的编码设备150和前述实施例所述的解码设备130。其中,
编码设备150,配置为获取待处理图像;以及对所述待处理图像进行分块,得到多个图像块;其中,所述多个图像块大小相等且无重叠;以及利用预设编码网络模型对所述多个图像块进行编码,生成码流;并将所述码流传输到解码设备130;
解码设备130,配置为接收编码设备150传输的码流;以及利用预设解码网络模型解析所述码流,获取多个重构块;以及对所述多个重构块进行拼接,生成重构图像,并利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像。
在本申请实施例中,针对待处理图像所划分的多个图像块,块与块之间完全独立,可以利用预设编码网络模型和预设解码网络模型实现多核并行编解码处理;而且由于分块后输入预设编码网络模型和预设解码网络模型的图像尺寸降低,还可以降低编解码的运行时间以及运行内存需求;另外,通过对重构图像中的块边界进行滤波处理,还能够消除分块边界处的块效应,且还能够提高重构图像的峰值信噪比;同时本申请仅对块边界的矩形区域进行后处理,还降低了后处理网络的总计算量,且对每个矩形区域处理完全独立,还可以实现后处理的并行化,进一步降低了单核后处理运行时间及内存需求。
需要说明的是,在本申请中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者装置不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者装置所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括该要素的过程、方法、物品或者装置中还存在另外的相同要素。
上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。
本申请所提供的几个方法实施例中所揭露的方法,在不冲突的情况下可以任意组合,得到新的方法实施例。
本申请所提供的几个产品实施例中所揭露的特征,在不冲突的情况下可以任意组合,得到新的产品实施例。
本申请所提供的几个方法或设备实施例中所揭露的特征,在不冲突的情况下可以任意组合,得到新的方法实施例或设备实施例。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。
工业实用性
本申请实施例中,通过获取多个重构块;其中,所述多个重构块是由待处理图像所划分的多个图像块经由预设编码网络模型和预设解码网络模型后得到的;对所述多个重构块进行拼接,生成重构图像;利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像。这样,针对待处理图像所划分的多个图像块,块与块之间完全独立,可以利用预设编码网络模型和预设解码网络模型实现多核并行编解码处理;而且由于分块后输入预设编码网络模型和预设解码网络模型的图像尺寸降低,还可以降低编解码的运行时间以及运行内存需求;另外,通过对重构图像中的块边界进行滤波处理,还能够消除分块边界处的块效应,且还能够提高重构图像的峰值信噪比;同时本申请仅对块边界的矩形区域进行后处理,还降低了后处理网络的总计算量,且对每个矩形区域处理完全独立,还可以实现后处理的并行化,进一步降低了单核后处理运行时间及内存需求。

Claims (23)

  1. 一种图像处理方法,应用于图像处理装置,所述方法包括:
    获取多个重构块;其中,所述多个重构块是由待处理图像所划分的多个图像块经由预设编码网络模型和预设解码网络模型后得到的;
    对所述多个重构块进行拼接,生成重构图像;
    利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像。
  2. 根据权利要求1所述的方法,其中,所述方法还包括:
    获取多个重构训练块;其中,所述多个重构训练块是由训练集合中的至少一张训练图像所划分的多个训练块经由所述预设编码网络模型和所述预设解码网络模型后得到的;
    对所述多个重构训练块进行拼接,得到至少一张重构训练图像;
    构建后处理网络模型,基于所述至少一张重构训练图像对所述后处理网络模型进行训练,得到所述预设后处理网络模型。
  3. 根据权利要求2所述的方法,其中,所述基于所述至少一张重构训练图像对所述后处理网络模型进行训练,得到所述预设后处理网络模型,包括:
    基于所述至少一张重构训练图像,利用预设算法对所述后处理网络模型进行模型训练;
    当所述模型训练的代价函数对应的损失值收敛到预设阈值时,将训练后得到的后处理网络模型确定为所述预设后处理网络模型。
  4. 根据权利要求1至3任一项所述的方法,其中,所述利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像,包括:
    确定所述重构图像中包括所述块边界的至少一个矩形区域;
    将所述至少一个矩形区域输入所述预设后处理网络模型,得到至少一个处理后的矩形区域;
    利用所述至少一个处理后的矩形区域替换所述重构图像中包括所述块边界的对应局部区域,得到所述目标图像。
  5. 根据权利要求4所述的方法,其中,在所述得到至少一个处理后的矩形区域之后,所述方法还包括:
    对所述至少一个处理后的矩形区域进行裁剪,得到至少一个目标矩形区域;
    利用所述至少一个目标矩形区域替换所述重构图像中包括所述块边界的对应局部区域,得到所述目标图像。
  6. 一种图像处理方法,应用于解码设备,所述方法包括:
    接收编码设备传输的码流;其中,所述码流是由待处理图像所划分的多个图像块经由预设编码网络模型后得到的;
    利用预设解码网络模型解析所述码流,获取多个重构块;
    对所述多个重构块进行拼接,生成重构图像;
    利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像。
  7. 根据权利要求6所述的方法,其中,所述方法还包括:
    获取训练集合;其中,所述训练集合包括至少一张训练图像;
    构建编码网络模型和解码网络模型,基于所述训练集合对所述编码网络模型和所述解码网络模型进行模型训练,得到所述预设编码网络模型和所述预设解码网络模型。
  8. 根据权利要求7所述的方法,其中,所述基于所述训练集合对所述编码网络模型和所述解码网络模型进行模型训练,得到所述预设编码网络模型和所述预设解码网络模型,包括:
    基于所述训练集合,利用预设算法对所述编码网络模型和所述解码网络模型进行模型训练;
    当所述模型训练的代价函数对应的损失值收敛到预设阈值时,将训练后得到的编码网络模型和解码网络模型确定为所述预设编码网络模型和所述预设解码网络模型。
  9. 根据权利要求7所述的方法,其中,所述方法还包括:
    获取多个重构训练块;其中,所述多个重构训练块是由所述训练集合中的至少一张训练图像所划分的多个训练块经由所述预设编码网络模型和所述预设解码网络模型后得到的;
    对所述多个重构训练块进行拼接,得到至少一张重构训练图像;
    构建后处理网络模型,基于所述至少一张重构训练图像对所述后处理网络模型进行训练,得到所述预设后处理网络模型。
  10. 根据权利要求9所述的方法,其中,所述基于所述至少一张重构训练图像对所述后处理网络模型进行训练,得到所述预设后处理网络模型,包括:
    基于所述至少一张重构训练图像,利用预设算法对所述后处理网络模型进行模型训练;
    当所述模型训练的代价函数对应的损失值收敛到预设阈值时,将训练后得到的后处理网络模型确定为所述预设后处理网络模型。
  11. 根据权利要求6至10任一项所述的方法,其中,所述利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像,包括:
    确定所述重构图像中包括所述块边界的至少一个矩形区域;
    将所述至少一个矩形区域输入所述预设后处理网络模型,得到至少一个处理后的矩形区域;
    利用所述至少一个处理后的矩形区域替换所述重构图像中包括所述块边界的对应局部区域,得到所述目标图像。
  12. 根据权利要求11所述的方法,其中,在所述得到至少一个处理后的矩形区域之后,所述方法还包括:
    对所述至少一个处理后的矩形区域进行裁剪,得到至少一个目标矩形区域;
    利用所述至少一个目标矩形区域替换所述重构图像中包括所述块边界的对应局部区域,得到所述目标图像。
  13. 一种图像处理方法,应用于编码设备,所述方法包括:
    获取待处理图像;
    对所述待处理图像进行分块,得到多个图像块;其中,所述多个图像块大小相等且无重叠;
    利用预设编码网络模型对所述多个图像块进行编码,生成码流;
    将所述码流传输到解码设备。
  14. 根据权利要求13所述的方法,其中,所述方法还包括:
    获取训练集合;其中,所述训练集合包括至少一张训练图像;
    构建编码网络模型和解码网络模型,基于所述训练集合对所述编码网络模型和所述解码网络模型进行模型训练,得到预设编码网络模型和预设解码网络模型;其中,所述预设解码网络模型用于指示所述解码设备解析所述码流以得到多个重构块。
  15. 根据权利要求14所述的方法,其中,所述基于所述训练集合对所述编码网络模型和所述解码网络模型进行模型训练,得到预设编码网络模型和预设解码网络模型,包括:
    基于所述训练集合,利用预设算法对所述编码网络模型和所述解码网络模型进行模型训练;
    当所述模型训练的代价函数对应的损失值收敛到预设阈值时,将训练后得到的编码网络模型和解码网络模型确定为所述预设编码网络模型和所述预设解码网络模型。
  16. 一种图像处理装置,所述图像处理装置包括:获取单元和处理单元;其中,
    所述获取单元,配置为获取多个重构块;其中,所述多个重构块是由待处理图像所划分的多个图像块经由预设编码网络模型和预设解码网络模型后得到的;
    所述处理单元,配置为对所述多个重构块进行拼接,生成重构图像;以及利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像。
  17. 一种图像处理装置,所述图像处理装置包括:第一存储器和第一处理器;其中,
    所述第一存储器,用于存储能够在所述第一处理器上运行的可执行指令;
    所述第一处理器,用于在运行所述可执行指令时,执行如权利要求1至5任一项所述的方法。
  18. 一种解码设备,所述解码设备包括:接收单元、解码单元和后处理单元;其中,
    所述接收单元,配置为接收编码设备传输的码流;其中,所述码流是由待处理图像所划分的多个图像块经由预设编码网络模型后得到的;
    所述解码单元,配置为利用预设解码网络模型解析所述码流,获取多个重构块;
    所述后处理单元,配置为对所述多个重构块进行拼接,生成重构图像,并利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像。
  19. 一种解码设备,所述解码设备包括:第二存储器和第二处理器;其中,
    所述第二存储器,用于存储能够在所述第二处理器上运行的可执行指令;
    所述第二处理器,用于在运行所述可执行指令时,执行如权利要求6至12任一项所述的方法。
  20. 一种编码设备,所述编码设备包括:获取单元、分块单元、编码单元和发送单元;其中,
    所述获取单元,配置为获取待处理图像;
    所述分块单元,配置为对所述待处理图像进行分块,得到多个图像块;其中,所述多个图像块大小相等且无重叠;
    所述编码单元,配置为利用预设编码网络模型对所述多个图像块进行编码,生成码流;
    所述发送单元,配置为将所述码流传输到解码设备。
  21. 一种编码设备,所述编码设备包括:第三存储器和第三处理器;其中,
    所述第三存储器,用于存储能够在所述第三处理器上运行的可执行指令;
    所述第三处理器,用于在运行所述可执行指令时,执行如权利要求13至15任一项所述的方法。
  22. 一种计算机存储介质,其中,所述计算机存储介质存储有图像处理程序,所述图像处理程序被第一处理器执行时实现如权利要求1至5任一项所述的方法、或者被第二处理器执行时实现如权利要求6至12任一项所述的方法、或者被第三处理器执行时实现如权利要求13至15任一项所述的方法。
  23. 一种视频系统,所述视频系统包括:编码设备和解码设备;其中,
    所述编码设备,配置为获取待处理图像;以及对所述待处理图像进行分块,得到多个图像块;其中,所述多个图像块大小相等且无重叠;以及利用预设编码网络模型对所述多个图像块进行编码,生成码流;并将所述码流传输到解码设备;
    所述解码设备,配置为接收所述编码设备传输的码流;以及利用预设解码网络模型解析所述码流,获取多个重构块;以及对所述多个重构块进行拼接,生成重构图像,并利用预设后处理网络模型对所述重构图像中的块边界进行滤波处理,得到目标图像。