WO2022022176A1 - Image processing method and related device - Google Patents

Publication number
WO2022022176A1
WO2022022176A1 PCT/CN2021/101807 CN2021101807W WO2022022176A1 WO 2022022176 A1 WO2022022176 A1 WO 2022022176A1 CN 2021101807 W CN2021101807 W CN 2021101807W WO 2022022176 A1 WO2022022176 A1 WO 2022022176A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
data
pixels
image blocks
adaptive
Application number
PCT/CN2021/101807
Other languages
French (fr)
Chinese (zh)
Inventor
赵政辉
马思伟
王晶
Original Assignee
华为技术有限公司
北京大学
Application filed by 华为技术有限公司 and 北京大学
Publication of WO2022022176A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4023 Scaling of whole images or parts thereof, e.g. expanding or contracting based on decimating pixels or lines of pixels; based on inserting pixels or lines of pixels
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4092 Image resolution transcoding, e.g. by using client-server architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/60 Image enhancement or restoration using machine learning, e.g. neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/90 Dynamic range modification of images or parts thereof
    • G06T5/94 Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/13 Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20021 Dividing image into blocks, subimages or windows
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Definitions

  • the multiple pixels are obtained by setting the resolution of the camera component through a setting interface in the camera application.
  • the arrangement order of the N first encoded representations is the same as the arrangement order of the N first image blocks, where the arrangement order of the N first image blocks is their arrangement order in the first image, and the corresponding relationship includes the arrangement order of the N first encoded representations and the arrangement order of the N first image blocks.
  • if the remainder is less than [expression missing], the first image is filled with the pixel median value on both sides in the width direction, and the width of the pixel median value filled on each side is [expression missing], where g is the remainder.
  • a third aspect of the present application provides a model training method, the method comprising:
  • the preprocessed N first image blocks are processed by the first coding neural network to obtain N groups of first feature maps;
  • the N groups of second feature maps are processed by the first decoding neural network to obtain N first reconstructed image blocks;
  • the method further includes:
  • quantizing the N first adaptive data to obtain N first adaptive quantization data, where the N first adaptive quantization data are used to compensate the N first reconstructed image blocks;
  • Obtaining the distortion loss of the second image relative to the first image includes:
  • each of the N first image blocks has the same size.
  • the target resolution is obtained by setting the resolution of the camera component according to a setting interface in a camera application.
  • the target resolution is obtained according to a target image group in a gallery obtained by the imaging component, the pixels of the target image group are the target pixels, and among the image groups of different pixels, the proportion of the target image group in the gallery is the largest.
  • images of multiple pixels are obtained by the imaging component, the multiple pixels are e × f, e/a is equal to an integer, f/b is equal to an integer, the e includes the c and the f includes the d.
  • the plurality of pixels are obtained by setting the resolution of the imaging component through a setting interface in the imaging application.
  • the pixels of the first image block are a × b, the a is the number of pixels in the width direction, the b is the number of pixels in the height direction, and the pixels of the first image are r × t;
  • the r and the t are proportionally enlarged to obtain the first image whose pixels are r2 × t2, and [expression missing] is equal to an integer;
  • filling the edge of the first image with the median value of pixels includes:
  • if the remainder is less than [expression missing], then fill the pixel median value on both sides of the width direction of the first image, so that the width of the pixel median value filled on each side is [expression missing], where the g is the remainder.
  • the N first image blocks include a first target image block, and the range of pixel values of the first target image block is smaller than the range of pixel values of the first image;
  • the method further includes:
  • the one first adaptive data is obtained from the inverse quantized first target image block.
  • a fourth aspect of the present application provides an encoding device, the device comprising:
  • a first acquisition module configured to acquire a first image
  • a segmentation module used for segmenting the first image to obtain N first image blocks, where N is an integer greater than 1;
  • a second acquisition module configured to acquire N first adaptive data from N first image blocks, where N first adaptive data corresponds to N first image blocks one-to-one;
  • a preprocessing module configured to preprocess the N first image blocks according to the N first adaptive data
  • the coding neural network module processes the preprocessed N first image blocks through the coding neural network to obtain N groups of first feature maps
  • the quantization and entropy encoding module is configured to perform quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations.
  • the N first encoded representations are used for entropy decoding to obtain N groups of second feature maps, and the N groups of second feature maps are used for processing through a decoding neural network to obtain N first reconstructed image blocks.
  • the N first adaptive data are used to compensate the N first reconstructed image blocks, and the compensated N first reconstructed image blocks are used to combine into a second image.
  • the device further includes:
  • the sending module is configured to send the N pieces of first coded representations, the N pieces of first adaptive data and corresponding relationships to the decoding end, where the correspondence relationships include correspondences between the N pieces of first adaptive data and the N pieces of first coded representations.
  • the device further includes:
  • a quantization module configured to quantize the N first adaptive data to obtain the N first adaptive quantized data, and the N first adaptive quantized data is used to compensate the N first reconstructed image blocks;
  • the larger N is, the smaller the information entropy of the single first adaptive quantization data is.
  • the second image is processed by a fusion neural network to obtain the third image, and the fusion neural network is used to reduce the difference between the second image and the first image, the difference including block effect.
  • the target resolution is obtained by setting the resolution of the camera component according to the setting interface in the camera application.
  • the target resolution is obtained according to the target image group in the gallery obtained by the imaging component, the pixels of the target image group are target pixels, and among the image groups of different pixels, the proportion of the target image group in the gallery is the largest.
  • images of multiple pixels are obtained by the imaging component, the multiple pixels are e × f, e/a is equal to an integer, f/b is equal to an integer, e includes c and f includes d.
  • the plurality of pixels are obtained by setting the resolution of the camera component through a setting interface in the camera application.
  • the pixels of the first image block are a × b, a is the number of pixels in the width direction, b is the number of pixels in the height direction, and the pixels of the first image are r × t.
  • the device also includes:
  • a padding module, configured to: if r/a is not equal to an integer, and/or t/b is not equal to an integer, fill the edges of the first image with the pixel median value such that r1/a is equal to an integer and t1/b is equal to an integer, where the pixels of the first image after filling are r1 × t1.
  • the device further includes:
  • an amplification module, configured to: if [expression missing] is not equal to an integer, proportionally enlarge the r and the t to obtain the first image whose pixels are r2 × t2, where [expression missing] is equal to an integer;
  • the padding module is specifically configured to: if [expression missing] is not equal to an integer, fill the edges of the first image with the pixel median value.
  • the second obtaining unit is further configured to: if [expression missing] is not equal to an integer, obtain the remainder
  • the filling module is specifically configured to: if the remainder is greater than [expression missing], fill the pixel median value on only one side of the width direction of the first image.
  • the filling module is specifically configured to: if the remainder is less than [expression missing], fill the pixel median value on both sides of the width direction of the first image, so that the width of the pixel median value filled on each side is [expression missing], where g is the remainder.
  • the N first tiles include a first target tile, and the range of pixel values of the first target tile is smaller than the range of pixel values of the first image.
  • the device also includes:
  • an inverse quantization module for inverse quantization of the pixel value of the first target image block
  • the second obtaining module is specifically configured to obtain a piece of first adaptive data from the inverse quantized first target image block.
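
The encoder-side steps listed for the fourth aspect (padding, segmentation, obtaining adaptive data, preprocessing) can be pictured with a minimal sketch. Several things here are assumptions that the text above does not state: the image is a 2-D grayscale array, each block's adaptive data is taken to be its mean pixel value, preprocessing is taken to be subtraction of that mean, the fill value is computed with `np.median`, and the padding is split roughly evenly between the two sides. The function names `pad_to_multiple`, `split_blocks` and `preprocess` are hypothetical.

```python
import numpy as np

def pad_to_multiple(image, a, b):
    """Pad a 2-D image so its width is a multiple of a and its height a multiple of b.

    Assumption: the fill value is the median of the image's pixels, and the
    padding is split roughly evenly between opposite edges.
    """
    image = np.asarray(image, dtype=np.float64)
    t, r = image.shape                      # t = height, r = width
    pad_w, pad_h = (-r) % a, (-t) % b       # pixels still needed in each direction
    left, top = pad_w // 2, pad_h // 2
    fill = float(np.median(image))
    return np.pad(image,
                  ((top, pad_h - top), (left, pad_w - left)),
                  mode="constant", constant_values=fill)

def split_blocks(image, a, b):
    """Split the image into equally sized a-wide, b-high blocks in raster order."""
    t, r = image.shape
    return [image[y:y + b, x:x + a]
            for y in range(0, t, b)
            for x in range(0, r, a)]

def preprocess(blocks):
    """Return (adaptive_data, preprocessed_blocks).

    Assumption: adaptive data = block mean, preprocessing = mean removal.
    """
    adaptive = [float(blk.mean()) for blk in blocks]
    return adaptive, [blk - m for blk, m in zip(blocks, adaptive)]
```

With a 7-wide, 5-high input and 4 × 4 blocks, this sketch pads to 8 × 8 and yields N = 4 blocks, each paired one-to-one with one piece of adaptive data, which is the correspondence the encoding device sends to the decoding end.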
  • a fifth aspect of the present application provides a decoding device, the decoding device comprising:
  • the acquisition module is used to acquire N first encoded representations, N first adaptive data and corresponding relationships, where the corresponding relationships include the correspondences between the N first adaptive data and the N first encoded representations, the N first adaptive data are in one-to-one correspondence with the N first encoded representations, and N is an integer greater than 1;
  • the entropy decoding module performs entropy decoding on the N first encoded representations to obtain N groups of second feature maps;
  • a decoding neural network module for processing N groups of second feature maps to obtain N first reconstructed image blocks
  • a compensation module configured to compensate the N first reconstructed image blocks by using the N first adaptive data
  • the combining module is used for combining the compensated N first reconstructed image blocks to obtain a second image.
  • the N first encoded representations are obtained by quantization and entropy encoding of N groups of first feature maps, the N groups of first feature maps are obtained by processing the preprocessed N first image blocks through a coding neural network, and the preprocessed N first image blocks are obtained by preprocessing the N first image blocks with the N first adaptive data.
  • the N first adaptive data are obtained from the N first image blocks, and the N first image blocks are obtained by dividing the first image.
  • the N pieces of first adaptive data are N pieces of first adaptive quantization data, and the N pieces of first adaptive quantization data are obtained by quantizing the N pieces of first adaptive data;
  • the compensation module is specifically configured to compensate the N first reconstructed image blocks by using the N first adaptive quantization data.
  • the larger N is, the smaller the information entropy of the single first adaptive quantization data is.
  • the arrangement order of the N first encoded representations is the same as the arrangement order of the N first image blocks, and the arrangement order of the N first image blocks is their arrangement order in the first image.
  • the corresponding relationship includes the arrangement order of the N first encoded representations and the arrangement order of the N first image blocks.
  • the device further includes:
  • the fusion neural network module is used to process the second image to obtain the third image, so as to reduce the difference between the second image and the first image, and the difference includes block effect.
  • each of the N first tiles has the same size.
  • the size of the first image block is a fixed value.
  • the pixels of the first image block are a × b, a and b are obtained according to the target pixels, the target pixels are c × d, c/a is equal to an integer, d/b is equal to an integer, a and c are the numbers of pixels in the width direction, b and d are the numbers of pixels in the height direction, the target pixels are obtained according to the target resolution of the terminal device, the terminal device includes a camera component, the pixels of an image obtained by the camera component under the setting of the target resolution are the target pixels, and the first image is obtained by the camera component.
  • the target resolution is obtained by setting the resolution of the camera component according to the setting interface in the camera application.
  • the target resolution is obtained according to the target image group in the gallery obtained by the imaging component, the pixels of the target image group are target pixels, and among the image groups of different pixels, the proportion of the target image group in the gallery is the largest.
  • images of multiple pixels are obtained by the imaging component, the multiple pixels are e × f, e/a is equal to an integer, f/b is equal to an integer, e includes c and f includes d.
  • the plurality of pixels are obtained by setting the resolution of the camera component through a setting interface in the camera application.
  • the pixels of the first image block are a × b, a is the number of pixels in the width direction, b is the number of pixels in the height direction, and the pixels of the first image are r × t. If r/a is not equal to an integer, and/or t/b is not equal to an integer, the edges of the first image are filled with the pixel median value such that r1/a is equal to an integer and t1/b is equal to an integer, where the pixels of the first image after filling are r1 × t1.
  • r and t are proportionally enlarged by the encoder to obtain the first image with pixels r2 × t2, where [expression missing] is equal to an integer.
  • if the remainder is greater than [expression missing], the first image is padded with the pixel median value on only one side in the width direction.
  • if the remainder is less than [expression missing], the first image is filled with the pixel median value on both sides in the width direction, and the width of the pixel median value filled on each side is [expression missing], where g is the remainder.
  • the N first image blocks include a first target image block, the range of pixel values of the first target image block is smaller than the range of pixel values of the first image, at least one piece of first adaptive data is obtained from the inverse quantized first target image block, and the inverse quantized first target image block is obtained by inversely quantizing the pixel values of the first target image block.
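
The compensation and combining modules of the decoding device can be illustrated with a short sketch. It assumes, purely for illustration, that compensation means adding each block's adaptive value back to its reconstructed block (the inverse of a mean-removal preprocessing, which the text does not specify) and that the blocks arrive in raster order, matching the arrangement order described above. `compensate` and `combine` are hypothetical names.

```python
import numpy as np

def compensate(recon_blocks, adaptive_data):
    """Add each block's adaptive value back to its reconstructed block.

    Assumption: the i-th adaptive value corresponds to the i-th block,
    i.e. the one-to-one correspondence is given by arrangement order.
    """
    return [blk + m for blk, m in zip(recon_blocks, adaptive_data)]

def combine(blocks, cols):
    """Tile equally sized blocks, given in raster order, into one image.

    cols is the number of blocks per row of the reconstructed image.
    """
    rows = [np.hstack(blocks[i:i + cols]) for i in range(0, len(blocks), cols)]
    return np.vstack(rows)
```

Calling `combine(compensate(blocks, adaptive), cols)` would then yield the second image; any fusion network that reduces block effect is outside the scope of this sketch.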
  • a sixth aspect of the present application provides a training device, the device comprising:
  • a first acquisition module configured to acquire a first image
  • a segmentation module configured to segment the first image to obtain N first image blocks, where N is an integer greater than 1;
  • a second obtaining module configured to obtain N pieces of first adaptive data from the N first image blocks, and the N first adaptive data are in one-to-one correspondence with the N first image blocks;
  • a preprocessing module configured to preprocess the N first image blocks according to the N first adaptive data
  • the first coding neural network module is used to process the preprocessed N first image blocks to obtain N groups of first feature maps;
  • a quantization and entropy encoding module for performing quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations
  • a training module configured to jointly train a model by using a loss function until the image distortion value between the first image and the second image reaches a first preset level, the model includes the first coding neural network, a quantization network, an entropy encoding network, an entropy decoding network, and the first decoding neural network.
  • the model further includes a segmentation network, and a trainable parameter in the segmentation network is the size of the first image block.
  • the output module is used to output a second coding neural network and a second decoding neural network
  • the second coding neural network is a model obtained after the first coding neural network is iteratively trained
  • the second decoding neural network is a model obtained after the first decoding neural network is iteratively trained.
  • a quantization module configured to quantize the N pieces of first adaptive data to obtain N pieces of first adaptive quantization data, where the N pieces of first adaptive quantization data are used to compensate the N first reconstructed image blocks;
  • the second image is processed by a fusion neural network to obtain a third image, and the fusion neural network is used to reduce the difference between the second image and the first image, the difference includes block effect;
  • the third obtaining module is specifically configured to obtain the distortion loss of the third image relative to the first image
  • the model includes a fusion neural network.
  • each of the N first image blocks has the same size.
  • the first images used for training differ in size, while the size of the first image block is a fixed value.
  • the pixels of the first image block are a × b, the a and the b are obtained according to target pixels, the target pixels are c × d, c/a is equal to an integer, d/b is equal to an integer, the a and c are the numbers of pixels in the width direction, the b and d are the numbers of pixels in the height direction, and the target pixels are obtained according to the target resolution of the terminal device.
  • the target resolution is obtained by setting the resolution of the camera component according to a setting interface in a camera application.
  • the target resolution is obtained according to a target image group in a gallery obtained by the imaging component, the pixels of the target image group are the target pixels, and among the image groups of different pixels, the proportion of the target image group in the gallery is the largest.
  • images of multiple pixels are obtained by the imaging component, the multiple pixels are e × f, e/a is equal to an integer, f/b is equal to an integer, the e includes the c and the f includes the d.
  • the plurality of pixels are obtained by setting the resolution of the imaging component through a setting interface in the imaging application.
  • the pixels of the first image block are a × b, the a is the number of pixels in the width direction, the b is the number of pixels in the height direction, and the pixels of the first image are r × t;
  • the device also includes:
  • a padding module, configured to: if r/a is not equal to an integer, and/or t/b is not equal to an integer, fill the edges of the first image with the pixel median value such that r1/a is equal to an integer and t1/b is equal to an integer, where the pixels of the first image after filling are r1 × t1.
  • the device further includes:
  • an amplification module, configured to: if [expression missing] is not equal to an integer, proportionally enlarge the r and the t to obtain the first image whose pixels are r2 × t2, where [expression missing] is equal to an integer;
  • the padding module is specifically configured to: if [expression missing] is not equal to an integer, fill the edges of the first image with the pixel median value.
  • the second acquisition module is further configured to, after r and t are proportionally enlarged, obtain the remainder if [expression missing] is not equal to an integer;
  • the filling module is specifically configured to: if the remainder is greater than [expression missing], fill the pixel median value on only one side of the width direction of the first image.
  • if the remainder is less than [expression missing], then fill the pixel median value on both sides of the width direction of the first image, so that the width of the pixel median value filled on each side is [expression missing], where the g is the remainder.
  • the N first image blocks include a first target image block, and the range of pixel values of the first target image block is smaller than the range of pixel values of the first image;
  • the device also includes:
  • an inverse quantization module for inverse quantization of the pixel value of the first target image block
  • the second obtaining module is specifically configured to obtain one piece of first adaptive data from the inverse quantized first target image block.
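
The stopping rule of the training module, namely training until the distortion between the first image and the second image reaches a preset level, can be shown with a deliberately tiny stand-in. Everything here is an assumption made for illustration: the whole encoder/decoder chain is replaced by a single scalar gain `w`, the distortion is taken to be mean squared error, and plain gradient descent is used. None of this is the model or loss described in the application; `mse` and `train_until` are hypothetical names.

```python
import numpy as np

def mse(first, second):
    """Mean squared error, used here as the image distortion value (an assumption)."""
    return float(np.mean((first - second) ** 2))

def train_until(first, threshold, lr=0.1, max_steps=1000):
    """Gradient descent on a toy 'codec' second = w * first.

    Stops as soon as the distortion reaches the preset level `threshold`.
    Returns (w, distortion, steps_taken).
    """
    w = 0.0  # stand-in for all trainable parameters of the real model
    for step in range(max_steps):
        second = w * first
        d = mse(first, second)
        if d <= threshold:
            return w, d, step
        # d(MSE)/dw for second = w * first
        grad = float(np.mean(2.0 * (second - first) * first))
        w -= lr * grad
    return w, mse(first, w * first), max_steps
```

In a real system the scalar `w` would be replaced by the parameters of the coding network, the entropy model and the decoding network, and the loss would typically be optimised with an automatic-differentiation framework; the loop structure and the threshold test are the only parts this sketch is meant to convey.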
  • a seventh aspect of the present application provides an encoding device, which may include a memory, a processor, and a bus system, wherein the memory is used to store a program, and the processor is used to execute the program in the memory, including the following steps:
  • obtaining N first adaptive data from N first image blocks, where the N first adaptive data correspond to the N first image blocks one-to-one;
  • the processor may also be configured to execute the steps performed by the encoding end in each possible implementation manner of the first aspect. For details, refer to the first aspect, which will not be repeated here.
  • Entropy decoding is performed on the N first encoded representations to obtain N groups of second feature maps;
  • the N groups of second feature maps are processed by the decoding neural network to obtain N first reconstructed image blocks;
  • the compensated N first reconstructed image blocks are combined to obtain a second image.
  • the decoding device is a virtual reality VR device, a mobile phone, a tablet, a laptop computer, a server, or a smart wearable device.
  • the processor may also be configured to execute the steps performed by the decoding end in each possible implementation manner of the second aspect, and details can be found in the second aspect, which will not be repeated here.
  • a ninth aspect of the present application provides a training device, which may include a memory, a processor, and a bus system, wherein the memory is used to store a program, and the processor is used to execute the program in the memory, including the following steps:
  • the first encoding neural network, the quantization network, the entropy encoding network, the entropy decoding network, and the first decoding neural network are jointly trained by using the loss function, until the image distortion value between the first image and the second image reaches the first preset level;
  • an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program runs on a computer, it enables the computer to execute any of the image processing methods described in the first to third aspects above.
  • the present application provides a chip system
  • the chip system includes a processor for supporting an execution device or a training device to implement the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods.
  • the chip system further includes a memory for storing program instructions and data necessary for the execution device or the training device.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • FIG. 1 is a schematic structural diagram of the main framework of artificial intelligence.
  • FIG. 2b is a schematic diagram of another application scenario of the embodiment of the present application.
  • FIG. 2c is a schematic diagram of another application scenario of the embodiment of the present application.
  • FIG. 3b is another schematic flowchart of the image processing method provided by the embodiment of the present application.
  • FIG. 4 is a schematic diagram of dividing and combining images in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a CNN-based image encoding processing process in an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a CNN-based image decoding process in an embodiment of the application.
  • FIG. 7 is a schematic diagram of a setting interface for setting a resolution of a camera of a terminal device in an embodiment of the present application
  • FIG. 12 is a schematic flowchart of a model training method provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of a training process provided by an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of an encoding device provided by an embodiment of the present application.
  • FIG. 15 is a schematic structural diagram of a decoding apparatus provided by an embodiment of the present application.
  • FIG. 16 is a schematic structural diagram of a training device provided by an embodiment of the present application.
  • FIG. 17 is a schematic structural diagram of an execution device provided by an embodiment of the present application.
  • FIG. 18 is a schematic structural diagram of a training device provided by an embodiment of the present application.
  • FIG. 19 is a schematic structural diagram of a chip provided by an embodiment of the present application.
  • FIG. 1 is a schematic structural diagram of the main framework of artificial intelligence.
  • the above-mentioned main framework of artificial intelligence is elaborated along two dimensions.
  • the "intelligent information chain" reflects a series of processes from data acquisition to data processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, data goes through the progression "data-information-knowledge-wisdom".
  • the "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of human intelligence and information (providing and processing technology implementation) to the industrial ecology of the system.
  • the data on the upper layer of the infrastructure is used to represent the data sources in the field of artificial intelligence.
  • the data involves graphics, images, voice, and text, as well as IoT data from traditional devices, including business data from existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making, etc.
  • the image compression method provided by the embodiment of the present application can be applied to an image compression process in a terminal device, and specifically, can be applied to an album, video monitoring, etc. on the terminal device.
  • FIG. 2a is a schematic diagram of an application scenario of an embodiment of the present application.
  • a terminal device may acquire an image to be compressed, where the image to be compressed may be a photo taken by a camera component or a frame captured from a video, and the camera component is generally a camera.
  • the terminal device divides the extracted image through a central processing unit (CPU) to obtain multiple tiles.
  • the CPU can obtain and load the above saved file in the corresponding storage location, and obtain the decoded feature map based on entropy decoding.
  • the decoding neural network reconstructs the feature maps to obtain multiple reconstructed image blocks. After obtaining the multiple reconstructed image blocks, the terminal device combines them through the CPU to obtain a reconstructed image.
  • the image compression method provided by the embodiment of the present application can be applied to an image compression process in the cloud, and specifically, can be applied to functions such as a cloud album on a cloud device, and the cloud device can be a cloud server.
  • FIG. 2b is a schematic diagram of another application scenario of an embodiment of the present application.
  • a terminal device may acquire an image to be compressed, and the image to be compressed may be a photo captured by a camera component or a frame taken from a video.
  • the terminal device can perform entropy encoding on the to-be-compressed picture through the CPU to obtain encoded data.
  • any lossless compression method based on the prior art can also be used.
  • the image compression method provided by the embodiments of the present application can be applied to image compression of terminal devices and image decompression processes of cloud devices. Specifically, it can be applied to functions such as cloud albums on cloud devices.
  • the cloud device can be a cloud server.
  • FIG. 2c is a schematic diagram of another application scenario of an embodiment of the present application.
  • a terminal device may acquire an image to be compressed, where the image to be compressed may be a photo captured by a camera component or a frame taken from a video.
  • the terminal device can perform entropy encoding on the to-be-compressed picture through the CPU to obtain encoded data.
  • a neural network can be composed of neural units, and a neural unit can refer to an operation unit that takes $x_s$ and an intercept 1 as inputs, and the output of the operation unit can be: $f\left(\sum_{s=1}^{n} W_s x_s + b\right)$, where $s = 1, 2, \ldots, n$ indexes the inputs.
  • $W_s$ is the weight of $x_s$.
  • b is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
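  • The neural unit described above (a weighted sum of the inputs plus a bias, passed through an activation function) can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment; the function names and the input values are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def neural_unit(xs, ws, b, f=sigmoid):
    """Output of one neural unit: f(sum_s W_s * x_s + b)."""
    if len(xs) != len(ws):
        raise ValueError("xs and ws must have the same length")
    return f(sum(w * x for w, x in zip(ws, xs)) + b)


# Hypothetical inputs, weights, and bias chosen only for illustration.
y = neural_unit(xs=[1.0, 2.0], ws=[0.5, -0.25], b=0.0)
```

  • Here the weighted sum is 0.5 - 0.5 + 0 = 0, so the sigmoid returns 0.5; any other activation function can be passed through the `f` argument.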
  • a neural network is a network formed by connecting a plurality of the above single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field can be an area composed of several neural units.
  • a deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with multiple hidden layers.
  • the DNN is divided according to the positions of different layers.
  • the neural network inside the DNN can be divided into three categories: input layer, hidden layer, and output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and the middle layers are all hidden layers.
  • the layers are fully connected, that is, any neuron in the i-th layer is connected to every neuron in the (i+1)-th layer.
  • the input layer does not have a W parameter.
  • more hidden layers allow the network to better capture the complexities of the real world.
  • a model with more parameters is more complex and has a larger "capacity", which means that it can complete more complex learning tasks.
  • Training the deep neural network is the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (the weight matrix formed by the vectors W of many layers).
  • the convolution kernel can be initialized in the form of a matrix of random size, and the convolution kernel can obtain reasonable weights by learning during the training process of the convolutional neural network.
  • the immediate benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
  • the neural network can use the error back propagation (BP) algorithm to correct the size of the parameters in the initial neural network model during the training process, so that the reconstruction error loss of the neural network model becomes smaller and smaller.
  • the input signal is passed forward until an error loss is produced at the output, and the parameters in the initial neural network model are updated by back-propagating the error loss information, so that the error loss converges.
  • the back-propagation algorithm is a backward pass driven by the error loss, aiming to obtain the parameters of the optimal neural network model, such as the weight matrix.
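  • The forward pass, error loss, and backward parameter update described above can be illustrated with a deliberately tiny model. The sketch below assumes a single weight, a squared-error loss, and plain gradient descent; the function name, learning rate, and data are hypothetical and are not taken from the embodiment.

```python
def train_weight(x, target, w=0.0, lr=0.1, steps=50):
    """Fit y = w * x to one sample by repeated forward/backward passes."""
    for _ in range(steps):
        error = w * x - target   # forward pass: prediction minus target
        grad = 2.0 * error * x   # backward pass: dL/dw for L = error ** 2
        w -= lr * grad           # update the parameter against the gradient
    return w


# Hypothetical sample: the weight should converge towards 3.0.
w = train_weight(x=1.0, target=3.0)
```

  • With these values each step shrinks the error by a factor of 0.8, so the error loss converges as the text describes.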
  • the terminal device may be a mobile phone, a tablet, a notebook computer, a smart wearable device, or the like.
  • the terminal device may be a virtual reality (virtual reality, VR) device.
  • the embodiments of the present application can also be applied to intelligent monitoring, where a camera can be configured in the intelligent monitoring system and pictures to be compressed can be obtained through the camera. It should be understood that the embodiments of the present application can also be applied to other scenarios that require image compression, which are not listed one by one here.
  • FIG. 3a is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • step 301 the terminal device acquires a first image.
  • the terminal device can obtain the first image, where the first image can be a photo taken by the camera component or a frame captured from the captured video, the terminal device includes the camera component, and the camera component is generally a camera.
  • the first image may also be an image obtained by the terminal device from the network, or an image obtained by the terminal device using a screen capture tool.
  • FIG. 3b is another schematic flowchart of the image processing method provided by the embodiment of the present application.
  • Fig. 3b illustrates the whole process of outputting the third image from the first image.
  • step 302 the terminal device sends the first image to the cloud device.
  • before the terminal device sends the first image to the cloud device, the terminal device may perform lossless encoding on the first image to obtain encoded data.
  • the encoding method can be entropy encoding, or other lossless compression methods.
  • step 303 the cloud device divides the first image to obtain N first image blocks.
  • the cloud device may receive the first image sent by the terminal device. If the first image undergoes lossless encoding by the terminal device, the cloud device also needs to perform lossless decoding on it.
  • the cloud device divides the first image to obtain N first image blocks, where N is an integer greater than 1.
  • FIG. 4 is a schematic diagram of dividing and combining images in an embodiment of the present application. As shown in FIG. 4 , the first image 401 is divided into 12 first tiles. Wherein, when the size of the first image is determined, the size of the first image block determines the value of N.
  • the N being 12 described here is just an example, and the size of the first image block will be described in detail in the subsequent description.
  • each of the N first tiles has the same size.
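  • The division of the first image into N equally sized first image blocks can be sketched as follows. This is a minimal sketch for a single-channel image stored as a list of rows; the function name and the 6×8 example image are hypothetical, and the sketch assumes the image size is an exact multiple of the tile size.

```python
def split_into_tiles(image, tile_h, tile_w):
    """Split a 2-D image into equal, non-overlapping tiles in row-major order."""
    height, width = len(image), len(image[0])
    if height % tile_h or width % tile_w:
        raise ValueError("image size must be a multiple of the tile size")
    tiles = []
    for top in range(0, height, tile_h):
        for left in range(0, width, tile_w):
            tiles.append([row[left:left + tile_w] for row in image[top:top + tile_h]])
    return tiles


# Hypothetical 6x8 single-channel image divided into 2x2 tiles: N = 3 * 4 = 12.
image = [[r * 8 + c for c in range(8)] for r in range(6)]
tiles = split_into_tiles(image, 2, 2)
```

  • For a fixed image size, the tile size alone determines N, which is why the example yields 12 tiles.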
  • step 304 the cloud device obtains M first averages from the N first tiles.
  • the first tile includes data of three channels, and the number M of the first average values obtained by the cloud device is equal to 3N.
  • the first image is a grayscale image, that is, a one-channel image
  • the first block includes data of one channel, and the number M of the first mean values obtained by the cloud device is equal to N. Because the processing manner of each channel is similar, for the convenience of description, only one channel is used as an example for description in this embodiment of the present application.
  • the mean refers to the mean of the pixel values of all the pixels in the first tile.
  • step 305 the cloud device preprocesses the N first image blocks by using the N first averages.
  • the preprocessing may be to subtract the mean value from the pixel value of each pixel point in the first image block to obtain N first image blocks after preprocessing.
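  • The mean extraction and the mean-subtraction preprocessing can be sketched for one channel as follows. The helper names and the pixel values are hypothetical and serve only to illustrate the two steps.

```python
def tile_mean(tile):
    """Mean of the pixel values of all pixels in one tile."""
    values = [p for row in tile for p in row]
    return sum(values) / len(values)


def subtract_mean(tile, mean):
    """Preprocess a tile by subtracting its mean from every pixel value."""
    return [[p - mean for p in row] for row in tile]


# Two hypothetical 2x2 tiles with very different brightness levels.
tiles = [[[10, 20], [30, 40]], [[200, 200], [200, 204]]]
means = [tile_mean(t) for t in tiles]
preprocessed = [subtract_mean(t, m) for t, m in zip(tiles, means)]
```

  • After preprocessing, both tiles are centred on zero even though their original brightness differs, so the encoding neural network sees inputs with a comparable range.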
  • step 306 the pre-processed N first image blocks are processed through an encoding neural network to obtain N sets of first feature maps.
  • the coding neural network is a CNN
  • the terminal device may perform feature extraction on the preprocessed N first image blocks based on the CNN to obtain N groups of first feature maps.
  • Each set of first feature maps corresponds to one first block, and each set of first feature maps includes at least one feature map.
  • the first feature map may also be referred to as a channel feature map image, wherein each semantic channel corresponds to a first feature map.
  • the CNN 502 can multiply the upper left 3×3 pixels of the input data (the first tile) by the weights and map them to the neurons in the upper left end of the first feature map.
  • the weight to be multiplied will also be 3×3.
  • the CNN 502 scans the input data (the first tile) one by one from left to right and top to bottom, and multiplies the weights to map the neurons of the feature map.
  • the 3x3 weights used are called filters or filter kernels.
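  • The scanning of a tile with a 3×3 filter kernel can be sketched as follows. The sketch assumes a stride of 1 and no padding, which the text does not specify; the function name, the 4×4 tile, and the averaging kernel are hypothetical.

```python
def conv2d_valid(tile, kernel):
    """Slide the kernel over the tile left-to-right, top-to-bottom (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(tile) - kh + 1
    out_w = len(tile[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += tile[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 tile and a 3x3 averaging kernel.
tile = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1 / 9] * 3 for _ in range(3)]
feature_map = conv2d_valid(tile, kernel)
```

  • The upper-left output value is computed from the upper-left 3×3 pixels of the tile, matching the mapping described above; in a trained network the kernel weights would be learned rather than fixed.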
  • the process of applying filters in the CNN 502 is the process of performing convolution operations using filter kernels, and the extracted results are called "channel feature maps", where the channel feature maps can also be called multi-channel feature map images. The term "multi-channel feature map image" may refer to a set of feature map images corresponding to multiple channels.
  • the channel feature map may be generated by the CNN 502, also referred to as a "feature extraction layer" or "convolutional layer" of a CNN.
  • the layers of a CNN can define the mapping of output to input.
  • the mapping defined by the layers is performed as one or more filter kernels (convolution kernels) to be applied to the input data to generate channel feature maps to be output to the next layer.
  • the input data can be the first block or the channel feature map output by CNN502.
  • a CNN 502 receives a first tile 501 as input and generates a channel feature map 503 as output. Additionally, during forward execution, the next CNN layer receives the channel feature map 503 as input and generates the channel feature map of the next layer as output. Each subsequent layer then receives the channel feature map generated in the previous layer and uses it as input to generate the channel feature map of the next layer. Finally, a set of first feature maps 504 generated in the (X1)-th layer is obtained, where X1 is an integer greater than 1; that is, the channel feature map of each of the above layers may serve as a set of first feature maps 504.
  • the cloud device repeats the above operations for each first block, so as to obtain N groups of first feature maps.
  • the length and width of each feature map in the multi-channel feature map image gradually decrease, and the number of semantic channels in the multi-channel feature map image gradually increases, so as to realize data compression of the first image block.
  • processing operations can be performed in addition to the operation of applying convolution kernels that map input feature maps to output feature maps.
  • Examples of other processing operations may include, but are not limited to, applying activation functions, pooling, resampling, and the like.
  • u represents the jth channel of the output of the ith convolutional layer.
  • v represents the output result of the corresponding activation function, and β and γ are the trainable parameters of the activation function, which are used to enhance the nonlinear expression ability of the neural network.
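  • An activation of this kind, with input u, output v, and two trainable parameters, is commonly written as a generalized divisive normalization (GDN). The sketch below assumes the widely used form v_j = u_j / sqrt(β_j + Σ_k γ_jk · u_k²), applied across channels at one spatial position; this form, the function name, and the parameter values are assumptions and are not quoted from the embodiment.

```python
import math


def gdn(u, beta, gamma):
    """Assumed GDN form: v[j] = u[j] / sqrt(beta[j] + sum_k gamma[j][k] * u[k]**2)."""
    channels = range(len(u))
    return [
        u[j] / math.sqrt(beta[j] + sum(gamma[j][k] * u[k] ** 2 for k in channels))
        for j in channels
    ]


# Hypothetical two-channel input and parameters (in training these would be learned).
u = [1.0, 2.0]
beta = [4.0, 4.0]
gamma = [[1.0, 1.0], [1.0, 1.0]]
v = gdn(u, beta, gamma)
```

  • With these values the normalizing factor is sqrt(4 + 1 + 4) = 3 for both channels, so each channel is divided by the pooled energy of all channels.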
  • the first image block is transformed into another space (at least one first feature map) through the CNN convolutional neural network.
  • the number of first feature maps is 192, that is, the number of semantic channels is 192, and each semantic channel corresponds to a first feature map.
  • at least one first feature map may be in the form of a three-dimensional tensor, and its size may be 192×w×h, where w×h is the width and height of the matrix corresponding to the first feature map of a single channel.
  • step 307 the N groups of first feature maps are quantized and entropy encoded to obtain N first encoded representations.
  • the N groups of first feature maps are converted to a quantization center according to a specified rule, so that entropy coding can be performed subsequently.
  • the quantization operation may convert the N sets of first feature maps from floating point numbers into bit streams (e.g., bit streams using integers of a specific bit width, such as 8-bit integers or 4-bit integers).
  • the quantization operation may be performed on the N sets of first feature maps using, but not limited to, rounding.
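  • Quantization by rounding can be sketched as follows. The function name and the feature values are hypothetical; the sketch uses Python's built-in `round`, which rounds exact halves to the nearest even integer, and the text does not say which tie-breaking rule the embodiment uses.

```python
def quantize(feature_map):
    """Round every floating-point feature value to the nearest integer."""
    return [[round(v) for v in row] for row in feature_map]


# Hypothetical 2x2 feature map of floating-point values.
feature_map = [[0.4, 1.6], [-2.2, 3.7]]
quantized = quantize(feature_map)
```

  • The resulting integers form a discrete alphabet, which is what allows the subsequent entropy encoding step to assign probabilities to symbols.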
  • the probability estimation of each point in the output feature can be obtained by using an entropy estimation network, and the output feature is entropy encoded by using the probability estimation to obtain a binary code stream.
  • the entropy encoding process mentioned in this application can use existing entropy coding technology, which is not described again in this application.
  • step 308 the cloud device sends the N first encoded representations, the N first mean values and the corresponding relationship to the terminal device.
  • the terminal device stores the first image in the cloud device. If the terminal device needs to acquire the first image, it can send a request to the cloud device. After the cloud device receives the request sent by the terminal device, the cloud device sends N first encoded representations, N first mean values and corresponding relationships to the terminal device.
  • the correspondence relationship refers to the correspondence relationship between the N first coded representations and the N first mean values.
  • step 310 the terminal device processes N groups of second feature maps through a decoding neural network to obtain N first reconstructed image blocks.
  • the transposed CNN 602 receives a set of second feature maps 601 and generates a reconstructed feature map 603 as output.
  • the next-layer transposed CNN receives the reconstructed feature map 603 as an input, and generates a reconstructed feature map of the next layer as an output.
  • Each subsequent transposed CNN layer will then receive the reconstructed feature map generated in the previous layer and generate the next reconstructed feature map as output.
  • finally, the first reconstructed image block 604 generated in the (X2)-th layer is obtained, where X2 is an integer greater than 1; that is, the reconstructed feature map of each of the above layers may serve as the first reconstructed image block 604.
  • the cloud device repeats the above operations for each set of second feature maps, so as to obtain N first reconstructed image blocks.
  • iGDN: inverse generalized divisive normalization
  • v represents the jth channel of the output of the ith convolutional layer.
  • u represents the output of the corresponding activation function, and β and γ are the trainable parameters of the activation function, which are used to enhance the nonlinear expression ability of the neural network.
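  • For the decoding side, the inverse activation is commonly written as u_j = v_j · sqrt(β_j + Σ_k γ_jk · v_k²). The sketch below assumes this form; it is an approximate inverse with its own trainable parameters, not an exact algebraic inverse of the encoder activation. The function name and the parameter values are hypothetical.

```python
import math


def igdn(v, beta, gamma):
    """Assumed iGDN form: u[j] = v[j] * sqrt(beta[j] + sum_k gamma[j][k] * v[k]**2)."""
    channels = range(len(v))
    return [
        v[j] * math.sqrt(beta[j] + sum(gamma[j][k] * v[k] ** 2 for k in channels))
        for j in channels
    ]


# Hypothetical two-channel input and parameters (in training these would be learned).
v_in = [1.0, 2.0]
beta = [4.0, 4.0]
gamma = [[1.0, 1.0], [1.0, 1.0]]
u_out = igdn(v_in, beta, gamma)
```

  • Here the scaling factor is sqrt(4 + 1 + 4) = 3, so each channel is multiplied rather than divided by the pooled energy.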
  • the cloud device sends the N first mean values and the corresponding relationship to the terminal device.
  • the terminal device compensates the N first reconstructed image blocks by using the N first mean values through the corresponding relationship. Compensation refers to adding a first mean value to the pixel value of each pixel in the first reconstructed image block to obtain a compensated first reconstructed image block.
  • N first reconstructed picture blocks after compensation can be obtained.
  • when the terminal device receives N first quantized mean values from the cloud device, the terminal device compensates the N first reconstructed image blocks by using the N first quantized mean values. It should be noted that when the terminal device compensates the N first reconstructed image blocks by using the N first quantized mean values, the cloud device also preprocesses the N first image blocks by using the N first quantized mean values.
  • step 312 the terminal device combines the compensated N first reconstructed picture blocks to obtain a second image.
  • the combination is the inverse process of the division: each of the N first image blocks is replaced with its corresponding first reconstructed image block, and the N first reconstructed image blocks are then combined.
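  • Compensation and combination can be sketched together for one channel as follows. The sketch assumes the reconstructed blocks are stored in row-major order; the helper names and the pixel values are hypothetical.

```python
def add_mean(tile, mean):
    """Compensate a reconstructed tile by adding its mean back to every pixel."""
    return [[p + mean for p in row] for row in tile]


def combine_tiles(tiles, tiles_per_row):
    """Stitch row-major tiles back into one image (the inverse of the division)."""
    image = []
    for start in range(0, len(tiles), tiles_per_row):
        band = tiles[start:start + tiles_per_row]
        for r in range(len(band[0])):
            image.append([p for tile in band for p in tile[r]])
    return image


# Two hypothetical 2x2 reconstructed tiles and their corresponding mean values.
reconstructed = [[[-1, 1], [0, 0]], [[2, -2], [1, -1]]]
means = [10, 20]
compensated = [add_mean(t, m) for t, m in zip(reconstructed, means)]
second_image = combine_tiles(compensated, tiles_per_row=2)
```

  • Pairing each reconstructed tile with its own mean through `zip` plays the role of the correspondence relationship between encoded representations and mean values.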
  • the embodiments of the present application enhance the performance of each first image block by highlighting the local characteristics of each first image block, but this also tends to cause blockiness between adjacent first reconstructed image blocks.
  • Blockiness refers to a discontinuity at the boundary between adjacent first reconstructed image blocks, which forms a defect in the reconstructed image.
  • the fusion neural network is a CNN. Please refer to FIG. 5 and FIG. 6. In terms of CNN structure, the fusion neural network can be a combination of an encoding neural network and a decoding neural network. By taking the output 504 in FIG. 5 as the input 601 in FIG. 6, and taking the second image as the input 501 in FIG. 5, the output of FIG. 6 is the third image. Through the fusion neural network, the blocking effect in the second image can be eliminated. It should be noted that this is only a simple example of the framework of a fusion neural network. In practical applications, the framework of the fusion neural network, such as the number of CNN layers, the number of transposed CNN layers, and the size of the matrix of each CNN layer, may be independent of the encoding neural network and the decoding neural network.
  • a rectified linear unit (ReLU) layer is also included, and the ReLU is used to set the negative numbers in the feature map output by the convolution kernel to zero.
  • the image processing method in the embodiment of the present application can process images of different sizes, such as the fourth image, the third image.
  • the pixels of the fourth image are different from the pixels of the first image.
  • the process of processing the fourth image by using the image processing method in the embodiment of the present application is similar to the process of processing the first image above, and details are not repeated here.
  • the cloud device divides the fourth image, and M second image blocks can be obtained.
  • the size of the second image block is the same as that of the first image block.
  • the same encoding neural network and decoding neural network are used to process the first tile and the second tile, and for the first tile and the second tile, the number of convolution operations in the processing flow and the number of data involved in each convolution operation are the same.
  • a corresponding convolution operation unit can be designed according to the number of times of the above-mentioned convolution operation and/or the number of data involved in the convolution operation each time, so that the convolution operation unit matches the processing flow.
  • the convolution operation unit matches the first block, or the convolution operation unit is matched to the encoding neural network and/or the decoding neural network. The higher the matching degree between the convolution operation unit and the first image block, the smaller the number of idle multipliers and adders in the convolution operation unit in the processing flow, that is, the higher the usage efficiency of the convolution operation unit.
  • the size of the first tile not only affects the value of N, but also affects whether the image can be divided into an integer number of tiles.
  • the size of the first image block is generally determined by the following two aspects.
  • the first aspect is the influence of the model on the size of the first tile.
  • the model includes an encoding neural network and a decoding neural network, and may also include a fusion neural network.
  • the influence of the model on the size of the first tile generally includes the impact on the size of the first tile when the model is trained and the impact on the size of the first tile when the model is used.
  • the terminal device when the terminal device is the encoding end, such as the aforementioned first application scenario, it is more meaningful to determine the size of the first image block by the target resolution. Because the target resolution indicates the pixels of the image that the terminal device may acquire in the future, that is, the pixels of the image that the encoding end will use the image processing method in the embodiment of this application to process in the future, when training the model, it can target pixels for training.
  • the target resolution is obtained by setting the resolution of the camera component according to the setting interface in the camera application.
  • the setting interface of the camera application can set the resolution obtained by the camera component. Use the resolution that has been selected in the setting interface as the target resolution.
  • FIG. 7 is a schematic diagram of a setting interface for setting a resolution of a camera of a terminal device in an embodiment of the present application.
  • the option 701 with a resolution of [4:3] 10MP is selected, although the option here does not specify the specific value of the target resolution.
  • from the first image obtained by shooting, it can be known that the pixels of the first image are 2736×3648, that is, the target pixels are 2736×3648.
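  • the size of the first tile is determined such that the width of the target pixels divided by the width of the first tile is equal to an integer, and the height of the target pixels divided by the height of the first tile is equal to an integer.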
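  • For target pixels of 2736×3648, the tile sizes that divide the image into an integer number of tiles can be enumerated as follows. The sketch assumes square tiles, which the text does not require; the function name is hypothetical.

```python
import math


def square_tile_sizes(width, height):
    """All square tile side lengths that divide both image dimensions exactly."""
    common = math.gcd(width, height)
    return [s for s in range(1, common + 1) if common % s == 0]


# Target pixels taken from the example above.
sizes = square_tile_sizes(2736, 3648)
```

  • Every returned side length divides both 2736 and 3648, so no edge padding is needed for those sizes; a size such as 256 is absent from the list and would leave a remainder.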
  • Any first block can be understood as a local characteristic of the first image, and by highlighting the local characteristic of the first image, the reconstruction quality of the image can be improved, that is, the compression quality of the image can be improved.
  • FIG. 10 is a schematic diagram for comparison of image compression quality in an embodiment of the present application.
  • the abscissa represents the number of bits per pixel (bit-per-pixel, BPP), which is used to measure the code rate.
  • the ordinate represents the peak signal-to-noise ratio (PSNR), which is used to measure the quality.
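  • The two axes can be computed as follows. This is a minimal sketch assuming 8-bit single-channel images and the standard PSNR definition 10·log10(MAX² / MSE); the function names and sample values are hypothetical.

```python
import math


def bits_per_pixel(num_bytes, width, height):
    """Code rate: total bits in the compressed stream divided by the pixel count."""
    return 8 * num_bytes / (width * height)


def psnr(original, reconstructed, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared = [
        (a - b) ** 2
        for row_o, row_r in zip(original, reconstructed)
        for a, b in zip(row_o, row_r)
    ]
    mse = sum(squared) / len(squared)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical values: a 1000-byte stream for a 100x80 image, and a 2x2 image pair.
rate = bits_per_pixel(1000, 100, 80)
quality = psnr([[100, 100], [100, 100]], [[101, 99], [100, 100]])
```

  • A rate-distortion curve plots one (BPP, PSNR) point per operating point of a codec, so a curve that lies higher at the same BPP indicates better compression quality.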
  • the compression algorithms compared with the image processing methods in the embodiments of the present application include different implementations of the JPEG2000, HEVC (high efficiency video coding) and VVC (versatile video coding) standards.
  • for JPEG2000, the reference software OpenJPEG is used to represent its compression performance.
  • the implementation integrated in Matlab is used as a supplement to the compression performance of JPEG2000.
  • for HEVC, the reference software HM-16.15 is used to reflect the rate-distortion (RD) performance.
  • the performance of the VVC standard is expressed using the VVC standard reference software VTM-6.2.
  • the input image bit depth and the internal computation bit depth are set to 8 to be compatible with the format of the input image, and the test image is encoded using the all-intra (AI) configuration.
  • the rate-distortion performance of various compression algorithms is shown in Figure 10.
  • the rate-distortion performance curve of OpenJPEG is 1001
  • the rate-distortion performance curve implemented by Matlab of JPEG2000 is 1002
  • the performance curve of 420 image format compression of the reference software HM-16.15 is 1003
  • the performance curve of the unblocked convolutional neural network image compression algorithm is 1004
  • the performance curve of the present invention is 1005
  • the performance curve of the 420 image format compression of the reference software VTM-6.2 is 1006.
  • the database 230 stores the first image collection, and optionally, the database 230 further includes a fourth image collection.
  • the training device 220 generates a target model/rule 201 for processing the first image and/or the fourth image, and uses the first image and/or the fourth image in the database to iteratively train the target model/rule 201 to obtain a mature Target Model/Rule 201.
  • the target model/rule 201 includes an encoding neural network and a decoding neural network.
  • the target model/rule 201 further includes a fusion neural network.
  • the encoding neural network and decoding neural network obtained by the training device 220 can be applied to different systems or devices, such as mobile phones, tablets, laptops, VR devices, monitoring systems, and so on.
  • the execution device 210 may call data, codes, etc. in the data storage system 250 , and may also store data, instructions, etc. in the data storage system 250 .
  • the data storage system 250 may be placed in the execution device 210 , or the data storage system 250 may be an external memory relative to the execution device 210 .
  • the calculation module 211 receives the first image sent by the client device 240, divides the first image to obtain N first image blocks, extracts N pieces of first adaptive data from the N first image blocks, and uses the N pieces of first adaptive data to preprocess the N first image blocks. It then performs feature extraction on the preprocessed N first image blocks through the encoding neural network to obtain N groups of first feature maps, and performs quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations, where N is an integer greater than 1.
  • the execution device 210 and the terminal device 240 may be separate devices.
  • the execution device 210 is configured with an I/O interface 212 for data interaction with the terminal device 240. A "user" may input the first image to the I/O interface 212 through the terminal device 240, and the execution device 210 returns the second image to the terminal device 240 through the I/O interface 212 to provide it to the user.
  • the relationship between the terminal device 240 and the execution device 210 can be described by the relationship between the terminal device and the encoder and the decoder.
  • the encoding end is a device that uses an encoding neural network
  • the decoding end is a device that uses a decoding neural network.
  • the encoding end and the decoding end can be the same device or independent devices.
  • the terminal device is similar to the terminal device in the above image processing method, and the terminal device may be an encoding end and/or a decoding end.
  • the terminal device 240 and the execution device 210 reference may be made to the foregoing related descriptions of FIGS. 2a-2c.
  • FIG. 11 is only a schematic structural diagram of an image processing system provided by an embodiment of the present invention, and the positional relationship among the devices, components, modules, etc. shown in the figure does not constitute any limitation.
  • the execution device 210 may be configured in the terminal device 240.
  • the execution device 210 may be a module in the main processor (host CPU) of the mobile phone or tablet for performing array image processing. The execution device 210 may also be a graphics processing unit (GPU) or a neural network processor (NPU) in the mobile phone or tablet, where the GPU or NPU is mounted to the main processor as a coprocessor and the main processor assigns tasks to it.
  • FIG. 12 is a schematic flowchart of a model training method provided by an embodiment of the present application.
  • the model training method provided by the embodiment of the present application may include:
  • step 1201 the training device acquires a first image.
  • step 1202 the training device divides the first image to obtain N first image blocks, where N is an integer greater than 1.
  • the training device obtains N pieces of first adaptive data from the N first image blocks, and the N first adaptive data corresponds to the N first image blocks one-to-one.
  • step 1204 the training device preprocesses the N first tiles according to the N first adaptive data.
  • step 1205 the training device processes the preprocessed N first image blocks through the first coding neural network to obtain N groups of first feature maps.
  • step 1207 the training device performs entropy decoding on the N first encoded representations to obtain N sets of second feature maps.
  • step 1209 the training device compensates the N first reconstructed tiles by using the N first adaptive data.
  • step 1210 the training device combines the compensated N first reconstructed blocks to obtain a second image.
  • FIG. 13 is a schematic diagram of a training process provided by an embodiment of the present application.
  • the loss function of the model in the embodiment is:
  • the training process includes dividing the first image into blocks of different sizes in multiple iterations of training, that is, the values of N are different.
  • the size of the first block is optimized by comparing the loss functions obtained from multiple iterations.
  • the training device outputs a second encoding neural network and a second decoding neural network, where the second encoding neural network is a model obtained by performing iterative training on the first encoding neural network, and the second decoding neural network is a model obtained by performing iterative training on the first decoding neural network.
  • the method further includes:
  • the training device quantizes the N pieces of first adaptive data to obtain N pieces of first adaptive quantization data, where the N pieces of first adaptive quantization data are used to compensate the N pieces of first reconstructed image blocks.
  • the training device processes the second image through a fusion neural network to obtain a third image, so as to reduce the difference between the second image and the first image, where the difference includes blockiness;
  • the model includes a fusion neural network.
  • each of the N first tiles has the same size.
  • the size of the first image used for training is different, and the size of the first image block is a fixed value.
  • the target resolution is obtained according to a target image group in the gallery captured by the camera component, where the pixels of the target image group are the target pixels, and among the image groups with different pixels, the target image group has the largest proportion in the gallery.
  • the method further includes:
  • filling the edge of the first image with the median value of pixels includes:
  • the remainder is less than Then fill the pixel median value on both sides of the width direction of the first image, so that the width of the pixel median value filled on each side is Wherein, the g is the remainder.
  • the N first tiles include a first target tile, and the range of pixel values of the first target tile is smaller than the range of pixel values of the first image;
  • the method further includes:
  • a first acquisition module 1401, configured to acquire a first image
  • the entropy decoding module 1502 performs entropy decoding on the N first encoded representations to obtain N groups of second feature maps;
  • a compensation module 1504 configured to compensate the N first reconstructed image blocks by using the N first adaptive data
  • the combining module 1505 is configured to combine the N first reconstructed image blocks after compensation to obtain a second image.
  • the decoding apparatus may also perform all or part of the operations performed by the terminal device in the embodiment corresponding to FIG. 3a.
  • FIG. 16 is a schematic structural diagram of a training apparatus 1600 provided by an embodiment of the present application.
  • the training apparatus 1600 includes:
  • the first acquisition module 1601 is used to acquire a first image.
  • a segmentation module 1602, configured to segment the first image to obtain N first image blocks, where N is an integer greater than 1.
  • the second obtaining module 1603 is configured to obtain N pieces of first adaptive data from the N first image blocks, where the N pieces of first adaptive data are in one-to-one correspondence with the N first image blocks.
  • the first coding neural network module 1605 is configured to process the pre-processed N first image blocks to obtain N groups of first feature maps.
  • a quantization and entropy encoding module 1606, configured to perform quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations.
  • the entropy decoding module 1607 performs entropy decoding on the N first encoded representations to obtain N sets of second feature maps.
  • the first decoding neural network module 1608 is configured to process the N groups of second feature maps to obtain N first reconstructed image blocks.
  • Compensation module 1609 configured to compensate the N first reconstructed image blocks by using the N first adaptive data.
  • the combining module 1610 is configured to combine the N first reconstructed image blocks after compensation to obtain a second image.
  • the third acquiring module 1611 is configured to acquire the distortion loss of the second image relative to the first image.
  • the output module 1613 is used for outputting a second coding neural network and a second decoding neural network, where the second coding neural network is a model obtained by performing iterative training on the first coding neural network, and the second decoding neural network is a model obtained by performing iterative training on the first decoding neural network.
  • the training apparatus is further configured to perform all or part of the operations performed by the terminal device and/or the cloud device in the embodiment corresponding to FIG. 3a.
  • the N first tiles include a first target tile, and the range of pixel values of the first target tile is smaller than the range of pixel values of the first image;
  • FIG. 17 is a schematic structural diagram of the execution device provided by the embodiment of the present application.
  • the execution device 1700 may specifically be a virtual reality (VR) device, a mobile phone, a tablet, a laptop, a smart wearable device, a monitoring data processing device, a server, or the like, which is not limited here.
  • the encoding apparatus described in the embodiment corresponding to FIG. 14 and/or the decoding apparatus described in the embodiment corresponding to FIG. 15 may be deployed on the execution device 1700 to implement the functions of the apparatus in the embodiment corresponding to FIG. 14 and/or FIG. 15.
  • the execution device 1700 includes: a receiver 1701, a transmitter 1702, a processor 1703, and a memory 1704 (the number of processors 1703 in the execution device 1700 may be one or more, and one processor is taken as an example in FIG. 17), where the processor 1703 may include an application processor 17031 and a communication processor 17032.
  • the receiver 1701, the transmitter 1702, the processor 1703, and the memory 1704 may be connected by a bus or otherwise.
  • the processor 1703 may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory 1704, and the processor 1703 reads the information in the memory 1704, and completes the steps of the above method in combination with its hardware.
  • the receiver 1701 can be used to receive input numerical or character information, and to generate signal input related to performing relevant settings and function control of the device.
  • the transmitter 1702 can be used to output digital or character information through the first interface; the transmitter 1702 can also be used to send instructions to the disk group through the first interface to modify the data in the disk group; the transmitter 1702 can also include a display device such as a display screen.
  • the processor 1703 is configured to perform the operations performed by the terminal device and/or the cloud device in the embodiment corresponding to FIG. 3a.
  • N first adaptive data from N first image blocks, and N first adaptive data correspond to N first image blocks one-to-one;
  • the application processor 17031 can also be used to perform all or part of the operations that can be performed by the terminal device in the embodiment corresponding to FIG. 3a.
  • the central processing unit 1822 is configured to perform all or part of the operations performed by the training device in the embodiment corresponding to FIG. 16.
  • the processor mentioned in any one of the above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the program of the method in the first aspect.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Compression Of Band Width Or Redundancy In Fax (AREA)

Abstract

The present application relates to the field of artificial intelligence. Disclosed is an image processing method, comprising: acquiring a first image; segmenting the first image to obtain N first image blocks; acquiring N pieces of first adaptive data from the N first image blocks, wherein the N pieces of first adaptive data correspond to the N first image blocks on a one-to-one basis; preprocessing the N first image blocks according to the N pieces of first adaptive data; processing the preprocessed N first image blocks by means of a coding neural network to obtain N sets of first feature maps; and performing quantization and entropy coding on the N sets of first feature maps, so as to obtain N first coded representations. In the present application, by extracting multiple pieces of adaptive information, the multiple pieces of adaptive information can be used for compensating for multiple reconstructed image blocks, such that local characteristics are highlighted, and the image quality of a second image is improved.

Description

An image processing method and related device
This application claims priority to Chinese Patent Application No. 202010754333.9, filed with the China Patent Office on July 30, 2020 and entitled "An image processing method and related device", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to an image processing method and a related device.
Background
Multimedia data now accounts for the vast majority of Internet traffic. Compression of image data plays a vital role in the storage and efficient transmission of multimedia data, so image coding is a technology of great practical value.
Image coding has a long research history. Researchers have proposed a large number of methods and established a variety of international image coding standards, such as JPEG, JPEG2000, WebP, and BPG. Although these coding methods are widely used today, they show certain limitations in the face of the ever-growing volume of image data and the continual emergence of new media types. In recent years, researchers have begun to study image coding methods based on deep learning, and some have achieved good results. For example, Ballé et al. proposed an end-to-end optimized image coding method whose performance surpasses the best existing image coding performance, and even surpasses BPG, currently the best traditional coding standard. Deep learning image coding is a lossy image coding technology. Its general process is as follows: at the encoding end, adaptive data of the image is extracted, the image is preprocessed using the adaptive data, and the preprocessed image is encoded by an encoding neural network to obtain compressed data; at the decoding end, the compressed data is decoded to obtain an image close to the original image.
Although deep learning image coding is a considerable advance over traditional coding methods, reducing the loss of image quality during coding remains a problem that lossy image coding technology has always needed to solve.
Summary
The present application provides an image processing method and a related device for improving image quality.
A first aspect of the present application provides an image processing method. The method includes: an encoding end acquires a first image and then segments the first image to obtain N first image blocks, where N is an integer greater than 1. The encoding end acquires N pieces of first adaptive data from the N first image blocks, where the N pieces of first adaptive data are in one-to-one correspondence with the N first image blocks. The encoding end preprocesses the N first image blocks by using the N pieces of first adaptive data. After the preprocessing, the encoding end processes the preprocessed N first image blocks through an encoding neural network to obtain N groups of first feature maps. The encoding end performs quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations. By extracting multiple pieces of adaptive information, the multiple pieces of adaptive information can be used to compensate multiple reconstructed image blocks, thereby highlighting local characteristics and improving the image quality of the second image.
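The segmentation, adaptive-data extraction, and preprocessing steps of the first aspect can be illustrated with a minimal sketch. It rests on assumptions that this passage does not state: the adaptive data of a block is taken to be the mean of its pixel values, and preprocessing is taken to be subtracting that mean. The function names are hypothetical, and the encoding neural network, quantization, and entropy coding are omitted.

```python
from statistics import mean


def split_into_blocks(image, a, b):
    """Split a 2-D list of pixel values into blocks of width a and height b.

    Blocks are returned in raster order (left to right, top to bottom).
    """
    height, width = len(image), len(image[0])
    assert width % a == 0 and height % b == 0, "image must divide evenly"
    blocks = []
    for top in range(0, height, b):
        for left in range(0, width, a):
            blocks.append([row[left:left + a] for row in image[top:top + b]])
    return blocks


def extract_adaptive_data(block):
    # Assumption: the adaptive data of a block is its mean pixel value.
    return mean(p for row in block for p in row)


def preprocess(block, adaptive):
    # Assumption: preprocessing subtracts the adaptive data from every pixel.
    return [[p - adaptive for p in row] for row in block]


image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [50, 50, 90, 90],
    [50, 50, 90, 90],
]
blocks = split_into_blocks(image, a=2, b=2)                   # N = 4 blocks
adaptive_data = [extract_adaptive_data(blk) for blk in blocks]
preprocessed = [preprocess(blk, m) for blk, m in zip(blocks, adaptive_data)]
print(adaptive_data)
```

Because each block carries its own adaptive value, a dark block and a bright block are each normalized around their own level, which is the local behavior the paragraph above refers to.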
In an optional design of the first aspect, if entropy decoding is performed on the N first encoded representations, N groups of second feature maps can be obtained. If the N groups of second feature maps are processed through a decoding neural network, N first reconstructed image blocks can be obtained, and the N pieces of first adaptive data are used to compensate the N first reconstructed image blocks. If the compensated N first reconstructed image blocks are combined, a second image can be obtained.
In an optional design of the first aspect, the method further includes: the encoding end sends, to a decoding end, the N first encoded representations, the N pieces of first adaptive data, and a correspondence, where the correspondence includes the correspondence between the N pieces of first adaptive data and the N first encoded representations.
In an optional design of the first aspect, the method further includes: the encoding end quantizes the N pieces of first adaptive data to obtain N pieces of first adaptive quantization data. The encoding end sends the N pieces of first adaptive quantization data to the decoding end, where the N pieces of first adaptive quantization data are used to compensate the N first reconstructed image blocks. Because N is an integer greater than 1, the encoding end needs to acquire multiple pieces of first adaptive data. Compared with acquiring only one piece of adaptive data from the first image, quantizing the first adaptive data when multiple pieces are acquired can reduce the data amount of the first adaptive data.
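As a rough illustration of quantizing the adaptive data, the sketch below applies uniform scalar quantization. The quantizer type and the step size are assumptions chosen for illustration; the passage does not specify how the quantization is performed.

```python
def quantize(values, step):
    """Uniform scalar quantization: map each value to an integer index."""
    return [round(v / step) for v in values]


def dequantize(indices, step):
    """Recover approximate values from the integer indices."""
    return [i * step for i in indices]


adaptive_data = [10.3, 199.6, 50.2, 89.9]   # one value per image block
step = 4                                     # hypothetical quantization step
indices = quantize(adaptive_data, step)      # small integers to be transmitted
restored = dequantize(indices, step)         # values used for compensation
print(indices, restored)
```

With this kind of quantizer the reconstruction error of each value is at most half the step, so the step trades side-information size against compensation accuracy.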
In an optional design of the first aspect, the larger N is, the smaller the information entropy of a single piece of first adaptive quantization data. The larger N is, the more first image blocks there are and the more pieces of first adaptive quantization data there are. In this case, reducing the information entropy of the first adaptive quantization data further increases the degree of quantization of the first adaptive data and reduces the data amount of the first adaptive data.
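The relation between coarser quantization and lower information entropy can be checked numerically. The sketch below measures the empirical Shannon entropy of quantized adaptive values for two step sizes. The rule tying the step to N (`step_for_block_count`) is purely hypothetical; the passage only states that the entropy per value should shrink as N grows.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Empirical Shannon entropy, in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def step_for_block_count(n_blocks):
    # Hypothetical schedule: use a coarser step when there are more blocks.
    return 2 if n_blocks <= 4 else 8


adaptive = [12, 14, 17, 23, 24, 30, 31, 33]          # N = 8 adaptive values
fine = [v // 2 for v in adaptive]                     # step used for small N
coarse = [v // step_for_block_count(len(adaptive)) for v in adaptive]
print(entropy_bits(fine), entropy_bits(coarse))
```

Because the coarse index here is a function of the fine index (`v // 8 == (v // 2) // 4`), its entropy cannot exceed that of the fine index.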
In an optional design of the first aspect, the arrangement order of the N first encoded representations is the same as the arrangement order of the N first image blocks, where the arrangement order of the N first image blocks is their order in the first image, and the correspondence includes the arrangement order of the N first encoded representations and the arrangement order of the N first image blocks. Compared with acquiring only one piece of adaptive data from the first image, the present application involves multiple pieces of adaptive data and multiple first image blocks. Therefore, the correspondence between the multiple pieces of adaptive data and the multiple first image blocks needs to be guaranteed. Guaranteeing the correspondence through the arrangement order reduces the data amount.
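A short sketch of how arrangement order alone can carry the correspondence: if encoded representations and adaptive data are both stored in the raster order of the blocks, the position of each block follows from its index and no explicit identifier needs to be sent. Raster order and the helper name are assumptions made for illustration.

```python
def block_position(index, image_width, a, b):
    """Return the (left, top) pixel offset of the block at a raster index."""
    blocks_per_row = image_width // a
    row, col = divmod(index, blocks_per_row)
    return col * a, row * b


# Hypothetical payloads, both listed in the raster order of the blocks.
encoded = ["code0", "code1", "code2", "code3", "code4", "code5"]
adaptive = [10, 200, 50, 90, 30, 70]

# Image width 6 with 2x2 blocks gives 3 blocks per row.
placement = [
    (code, value, block_position(i, image_width=6, a=2, b=2))
    for i, (code, value) in enumerate(zip(encoded, adaptive))
]
for entry in placement:
    print(entry)
```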
In an optional design of the first aspect, if the second image is processed through a fusion neural network, a third image is obtained. The fusion neural network is used to reduce the difference between the second image and the first image, where the difference includes blocking artifacts. The present application enhances the rendering of each image block by highlighting its local characteristics, but this also tends to cause blocking artifacts between image blocks. Processing the second image through the fusion neural network reduces the impact of blocking artifacts and improves image quality.
In an optional design of the first aspect, each of the N first image blocks has the same size. Because every first image block has the same size, in the operations on feature maps and convolutional layers in the encoding neural network, each image block involves the same number of multiplications and additions in the convolution operations, which can improve computational efficiency.
In an optional design of the first aspect, when the method is used to segment first images of different sizes, the size of the first image block is a fixed value. When processing images of different sizes, the encoding end divides each image into image blocks of the same size. Fixing the size of the first image block makes a close match between the convolution operation unit and the image block possible, which can reduce the cost of the convolution operation unit or improve its utilization.
In an optional design of the first aspect, the first image block has a×b pixels, where a and b are obtained according to target pixels, the target pixels are c×d, c/a is an integer, and d/b is an integer. a and c are numbers of pixels in the width direction, and b and d are numbers of pixels in the height direction. The target pixels are obtained according to a target resolution of a terminal device; the terminal device includes a camera component, an image obtained by the camera component under the target resolution setting has the target pixels, and the first image is obtained by the camera component. The encoding end and/or the decoding end may or may not be the terminal device. Under the target resolution setting, the image obtained by the encoding end can be divided exactly into image blocks, which avoids filling in useless data and improves image quality.
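The divisibility condition can be made concrete with a small sketch that picks a block size a×b for given target pixels c×d. The upper limit of 512 pixels per side and the example resolution are hypothetical; the passage only requires that a divide c and b divide d.

```python
def divides_evenly(c, d, a, b):
    """True when a c x d image splits into a x b blocks with no remainder."""
    return c % a == 0 and d % b == 0


def largest_divisor_at_most(n, limit):
    """Largest divisor of n that does not exceed limit (1 always qualifies)."""
    for candidate in range(min(n, limit), 0, -1):
        if n % candidate == 0:
            return candidate


c, d = 4000, 3000                        # hypothetical target pixels
a = largest_divisor_at_most(c, 512)      # hypothetical cap on block width
b = largest_divisor_at_most(d, 512)      # hypothetical cap on block height
n_blocks = (c // a) * (d // b)
print(a, b, n_blocks)
```

With such a choice every captured image at the target resolution splits into whole blocks, so no padding is needed.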
In an optional design of the first aspect, the target resolution is obtained from the resolution set for the camera component in a settings interface of a camera application. The settings interface of the camera application can set the resolution at which the camera component captures images. Using the resolution already selected in the settings interface as the target resolution improves the efficiency of obtaining the target resolution.
In an optional design of the first aspect, the target resolution is obtained according to a target image group in a gallery obtained by the camera component; the pixels of the target image group are the target pixels, and among image groups of different pixels, the target image group accounts for the largest proportion of the gallery. The gallery that the encoding end obtains through the camera component includes image groups of different pixels. Determining the target pixels from the target image group ensures that most images can be divided exactly into image blocks, which avoids filling in useless data and improves image quality.
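Selecting the target pixels from a gallery amounts to finding the most frequent image size. A minimal sketch, assuming the gallery is available as a list of (width, height) pairs; the data and function name are hypothetical.

```python
from collections import Counter


def target_pixels(gallery_sizes):
    """Return the most common (width, height) and its share of the gallery."""
    (size, count), = Counter(gallery_sizes).most_common(1)
    return size, count / len(gallery_sizes)


# Hypothetical gallery: sizes of the images captured by the camera component.
gallery = [(4000, 3000)] * 5 + [(1920, 1080)] * 2 + [(3000, 3000)]
size, share = target_pixels(gallery)
print(size, share)
```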
In an optional design of the first aspect, images of multiple pixel sizes are obtained through the camera component, the multiple pixel sizes being e×f, where e/a is an integer, f/b is an integer, e includes c, and f includes d. The terminal device can obtain images of different pixels through the camera component, and e×f is the set of pixel sizes of those images. The present application requires that the images of different pixels obtained by the camera component can all be divided exactly into image blocks, which avoids filling in useless data and thereby improves image quality.
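When several capture sizes e×f must all divide evenly, one way to find a compatible block size is to take the greatest common divisor of the widths and of the heights; any divisor of those values also works. This is only an illustrative sketch with hypothetical resolutions, not a rule stated in the text.

```python
from functools import reduce
from math import gcd


def common_block_size(sizes):
    """Largest (a, b) such that a divides every width and b every height."""
    a = reduce(gcd, (e for e, _ in sizes))
    b = reduce(gcd, (f for _, f in sizes))
    return a, b


# Hypothetical set of resolutions offered by the camera settings.
sizes = [(4000, 3000), (3000, 3000), (2000, 1500)]
a, b = common_block_size(sizes)
print(a, b)
```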
In an optional design of the first aspect, the multiple pixel sizes are obtained by setting the resolution of the camera component through the settings interface in the camera application.
In an optional design of the first aspect, the first image block has a×b pixels, where a is the number of pixels in the width direction and b is the number of pixels in the height direction, and the first image has r×t pixels. After acquiring the first image and before segmenting the first image, the method further includes: if r/a is not an integer and/or t/b is not an integer, filling the edges of the first image with the pixel median so that r1/a is an integer and t1/b is an integer, where the filled first image has r1×t1 pixels. The size of the first image block is fixed, and the encoding end may need to handle images of different pixels, so some images cannot be divided exactly. When an image cannot be divided exactly, filling the edges of the image with the image median improves the compatibility of the model while limiting the impact on image quality. The image median is the median of the pixels.
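The padding step can be sketched as follows. Two points are assumptions rather than statements from the text: the pixel median is computed as the median of all pixel values in the image, and the padding is added on the right and bottom edges only (the side-selection rules of the later optional designs are not modeled).

```python
from statistics import median


def padded_size(r, t, a, b):
    """Smallest (r1, t1) with r1 >= r, t1 >= t, a dividing r1, b dividing t1."""
    r1 = -(-r // a) * a   # ceiling division, then back to pixels
    t1 = -(-t // b) * b
    return r1, t1


def pad_with_median(image, a, b):
    # Assumption: fill value is the median of all pixel values in the image.
    t, r = len(image), len(image[0])
    r1, t1 = padded_size(r, t, a, b)
    fill = median(p for row in image for p in row)
    padded = [row + [fill] * (r1 - r) for row in image]   # pad the right edge
    padded += [[fill] * r1 for _ in range(t1 - t)]        # pad the bottom edge
    return padded


image = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15],
]
padded = pad_with_median(image, a=4, b=4)   # 5x3 pixels become 8x4 pixels
for row in padded:
    print(row)
```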
In an optional design of the first aspect, after acquiring the first image and before filling the edges of the first image, the method further includes: if t/b is not an integer, scaling up r and t proportionally to obtain a first image of r2×t2 pixels, where t2/b is an integer. If r2/a is not an integer, the edges of the first image are filled with the pixel median. The number of image blocks filled with the pixel median affects image quality. Scaling up the image proportionally reduces the number of image blocks filled with the image median and improves image quality.
In an optional design of the first aspect, after r and t are scaled up proportionally, if r2/a is not an integer, the remainder of r2/a is obtained. If the remainder is greater than [formula image PCTCN2021101807-appb-000014], the pixel median is filled on only one side of the first image in the width direction. Filling the pixel median on only one side of the image limits the impact of filling on the image blocks while further reducing the number of image blocks filled with the image median, improving image quality.
In an optional design of the first aspect, if the remainder is less than [formula image PCTCN2021101807-appb-000015], the pixel median is filled on both sides of the first image in the width direction, so that the width of the pixel median filled on each side is [formula image PCTCN2021101807-appb-000016], where g is the remainder. This reduces the impact of filling on the image blocks and improves image quality.
In an optional design of the first aspect, the N first image blocks include a first target image block, and the range of pixel values of the first target image block is smaller than the range of pixel values of the first image. Before acquiring the N pieces of first adaptive data from the N first image blocks, the method further includes: the encoding end inverse-quantizes the pixel values of the first target image block. The encoding end acquires one piece of first adaptive data from the inverse-quantized first target image block. Inverse-quantizing the pixel values of the first target image block further highlights the local characteristics of the image.
A second aspect of the present application provides an image processing method, the method including:
A decoding end acquires N first encoded representations, N pieces of first adaptive data, and a correspondence. The correspondence includes the correspondence between the N pieces of first adaptive data and the N first encoded representations. The N pieces of first adaptive data are in one-to-one correspondence with the N first encoded representations, where N is an integer greater than 1. The decoding end performs entropy decoding on the N first encoded representations to obtain N groups of second feature maps. The decoding end processes the N groups of second feature maps through a decoding neural network to obtain N first reconstructed image blocks. The decoding end compensates the N first reconstructed image blocks by using the N pieces of first adaptive data. The decoding end combines the compensated N first reconstructed image blocks to obtain a second image. Compensating multiple reconstructed image blocks with multiple pieces of adaptive data highlights the local characteristics of each image block, thereby improving the image quality of the second image.
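The compensation and combination steps at the decoding end can be sketched as below. The sketch assumes that compensation adds each block's adaptive value back to its pixels (the inverse of a mean-subtraction preprocessing, which is itself an assumption) and that blocks are stored in raster order. The reconstructed blocks are hypothetical stand-ins for the output of the decoding neural network.

```python
def compensate(block, adaptive):
    # Assumption: compensation adds the block's adaptive data to every pixel.
    return [[p + adaptive for p in row] for row in block]


def combine(blocks, blocks_per_row):
    """Stitch equally sized blocks, given in raster order, into one image."""
    image = []
    for start in range(0, len(blocks), blocks_per_row):
        row_of_blocks = blocks[start:start + blocks_per_row]
        for y in range(len(row_of_blocks[0])):
            image.append([p for blk in row_of_blocks for p in blk[y]])
    return image


# Hypothetical decoder outputs: four 2x2 reconstructed blocks.
reconstructed = [
    [[0, 1], [-1, 0]],
    [[2, -2], [0, 0]],
    [[0, 0], [1, -1]],
    [[-3, 3], [0, 0]],
]
adaptive_data = [10, 200, 50, 90]   # received in the same order as the blocks

compensated = [compensate(blk, m) for blk, m in zip(reconstructed, adaptive_data)]
second_image = combine(compensated, blocks_per_row=2)
for row in second_image:
    print(row)
```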
In an optional design of the second aspect, the N first encoded representations are obtained by quantizing and entropy encoding N groups of first feature maps; the N groups of first feature maps are obtained by processing preprocessed N first image blocks through an encoding neural network; the preprocessed N first image blocks are obtained by preprocessing N first image blocks using the N pieces of first adaptive data; the N pieces of first adaptive data are obtained from the N first image blocks; and the N first image blocks are obtained by segmenting a first image.
In an optional design of the second aspect, the N pieces of first adaptive data are N pieces of first adaptive quantization data, which are obtained by quantizing the N pieces of first adaptive data. The decoding end specifically compensates the N first reconstructed image blocks by using the N pieces of first adaptive quantization data.
In an optional design of the second aspect, the larger N is, the smaller the information entropy of a single piece of first adaptive quantization data.
In an optional design of the second aspect, the arrangement order of the N first encoded representations is the same as the arrangement order of the N first image blocks, where the arrangement order of the N first image blocks is their order in the first image, and the correspondence includes the arrangement order of the N first encoded representations and the arrangement order of the N first image blocks.
In an optional design of the second aspect, the method further includes: the decoding end processes the second image through a fusion neural network to obtain a third image. The second image is processed through the fusion neural network to reduce the difference between the second image and the first image, where the difference includes blocking artifacts.
In an optional design of the second aspect, each of the N first image blocks has the same size.
In an optional design of the second aspect, when the method is used to combine image blocks into second images of different sizes, the size of the first image block is a fixed value.
In an optional design of the second aspect, the first image block has a×b pixels, where a and b are obtained according to target pixels, the target pixels are c×d, c/a is an integer, and d/b is an integer. a and c are numbers of pixels in the width direction, and b and d are numbers of pixels in the height direction. The target pixels are obtained according to a target resolution of a terminal device; the terminal device includes a camera component, an image obtained by the camera component under the target resolution setting has the target pixels, and the first image is obtained by the camera component.
In an optional design of the second aspect, the target resolution is obtained from the resolution set for the camera component through a setting interface in a camera application.
In an optional design of the second aspect, the target resolution is obtained according to a target image group in a gallery of images obtained by the camera component, and the pixel size of the target image group is the target pixel size. Among the image groups of different pixel sizes, the target image group accounts for the largest proportion of the gallery.
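As a rough illustration of the gallery-based design, the target pixel size can be taken as the pixel size shared by the largest group of images. The sketch below assumes the gallery is available as a list of (width, height) pairs; that representation and the function name are assumptions made for illustration only.

```python
from collections import Counter


def most_common_pixel_size(gallery):
    """Return the (width, height) pair with the largest share of the gallery.

    `gallery` is an iterable of (width, height) tuples, one per image.
    """
    counts = Counter(gallery)
    return counts.most_common(1)[0][0]


gallery = [(4000, 3000), (1920, 1080), (4000, 3000), (1280, 720), (4000, 3000)]
c, d = most_common_pixel_size(gallery)
print(c, d)  # 4000 3000
```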
In an optional design of the second aspect, images of multiple pixel sizes are obtained through the camera component. The multiple pixel sizes are e×f, where e/a is an integer and f/b is an integer; e includes c, and f includes d.
In an optional design of the second aspect, the multiple pixel sizes can be obtained by setting the resolution of the camera component through the setting interface in the camera application.
In an optional design of the second aspect, the first image block has a pixel size of a×b, where a is the number of pixels in the width direction and b is the number of pixels in the height direction, and the first image has a pixel size of r×t. When r/a is not an integer and/or t/b is not an integer, the edge of the first image is padded with the pixel median so that r1/a is an integer and t1/b is an integer, where r1×t1 is the pixel size of the padded first image.
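The padding step can be sketched as follows. This is a minimal illustration, not the application's implementation: it assumes a single-channel image stored as a NumPy array of shape (t, r), interprets "pixel median" as the median of the image's own pixel values, and pads only on the right and bottom edges. All three choices are assumptions.

```python
import math

import numpy as np


def pad_to_block_multiple(image: np.ndarray, a: int, b: int) -> np.ndarray:
    """Pad a (t, r) image so that its width is a multiple of a and its height a multiple of b."""
    t, r = image.shape
    r1 = math.ceil(r / a) * a  # padded width, r1 / a is an integer
    t1 = math.ceil(t / b) * b  # padded height, t1 / b is an integer
    # Assumption: the fill value is the median of the image's pixel values.
    fill = np.median(image).astype(image.dtype)
    # Assumption: padding is added on the bottom and right edges only.
    return np.pad(image, ((0, t1 - t), (0, r1 - r)),
                  mode="constant", constant_values=fill)


image = np.arange(35, dtype=np.uint8).reshape(5, 7)  # t = 5 rows, r = 7 columns
padded = pad_to_block_multiple(image, a=4, b=3)
print(padded.shape)  # (6, 8)
```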
In an optional design of the second aspect, when t/b is not an integer, r and t are enlarged proportionally by the encoding end to obtain a first image with a pixel size of r2×t2, where t2/b is an integer.
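A possible way to compute the enlarged size is sketched below. It assumes that the height is the dimension made divisible by b, that the smallest sufficient enlargement is used, and that the scaled width is rounded to the nearest integer. None of these details is stated in the text, so they should be read as assumptions.

```python
import math


def enlarged_size(r: int, t: int, b: int) -> tuple[int, int]:
    """Return (r2, t2): the size after proportional enlargement so that t2 % b == 0."""
    if t % b == 0:
        return r, t  # height already divisible, no enlargement needed
    t2 = math.ceil(t / b) * b  # smallest multiple of b not below t (assumption)
    r2 = round(r * t2 / t)     # same scale factor for the width, rounded (assumption)
    return r2, t2


print(enlarged_size(1000, 700, 256))  # (1097, 768)
```

The resulting width is generally not a multiple of a, which is why the enlargement can still be followed by padding in the width direction.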
In an optional design of the second aspect, if r2/a is not an integer and the remainder of r2/a is greater than the value shown in Figure PCTCN2021101807-appb-000029, the first image is padded with the pixel median on only one side in the width direction.
In an optional design of the second aspect, if the remainder is less than the value shown in Figure PCTCN2021101807-appb-000030, the first image is padded with the pixel median on both sides in the width direction, and the width of the pixel median padded on each side is the value shown in Figure PCTCN2021101807-appb-000031, where g is the remainder.
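The one-sided versus two-sided rule can be sketched as below. The threshold and the per-side width are given only as formula images in the source, so this sketch takes the threshold as a caller-supplied parameter and splits the missing width as evenly as possible between the two sides. Both choices, the side used for one-sided padding, and the handling of a remainder equal to the threshold are assumptions.

```python
import numpy as np


def pad_width_direction(image: np.ndarray, a: int, threshold: float, fill) -> np.ndarray:
    """Pad a (t, r) image in the width direction so that its width is a multiple of a."""
    _, r = image.shape
    g = r % a  # remainder of r / a
    if g == 0:
        return image
    missing = a - g  # columns needed to reach the next multiple of a
    if g > threshold:
        left, right = 0, missing  # one side only (right side is an assumption)
    else:
        left = missing // 2       # both sides, split as evenly as possible (assumption)
        right = missing - left
    return np.pad(image, ((0, 0), (left, right)),
                  mode="constant", constant_values=fill)


one_side = pad_width_direction(np.zeros((2, 15)), a=8, threshold=4, fill=128)
two_sides = pad_width_direction(np.zeros((2, 10)), a=8, threshold=4, fill=128)
print(one_side.shape, two_sides.shape)  # (2, 16) (2, 16)
```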
In an optional design of the second aspect, the N first image blocks include a first target image block, and the range of pixel values of the first target image block is smaller than the range of pixel values of the first image. At least one of the first adaptive data is obtained from the inverse-quantized first target image block, and the inverse-quantized first target image block is obtained by inverse-quantizing the pixel values of the first target image block.
A third aspect of the present application provides a model training method, the method including:
acquiring a first image;
dividing the first image to obtain N first image blocks, where N is an integer greater than 1;
obtaining N first adaptive data from the N first image blocks, the N first adaptive data being in one-to-one correspondence with the N first image blocks;
preprocessing the N first image blocks according to the N first adaptive data;
processing the preprocessed N first image blocks through a first encoding neural network to obtain N groups of first feature maps;
performing quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations;
performing entropy decoding on the N first encoded representations to obtain N groups of second feature maps;
processing the N groups of second feature maps through a first decoding neural network to obtain N first reconstructed image blocks;
compensating the N first reconstructed image blocks by using the N first adaptive data;
combining the compensated N first reconstructed image blocks to obtain a second image;
obtaining a distortion loss of the second image relative to the first image;
jointly training a model by using a loss function until the image distortion value between the first image and the second image reaches a first preset level, the model including the first encoding neural network, a quantization network, an entropy encoding network, an entropy decoding network, and the first decoding neural network. Optionally, the model further includes a segmentation network, and the trainable parameter in the segmentation network is the size of the first image block;
outputting a second encoding neural network and a second decoding neural network, where the second encoding neural network is the model obtained after iterative training of the first encoding neural network, and the second decoding neural network is the model obtained after iterative training of the first decoding neural network.
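The data flow of the steps above can be sketched end to end with NumPy. This is a toy illustration under several assumptions that are not taken from the text: the adaptive data is the mean of each block, preprocessing subtracts it, compensation adds it back, the distortion loss is the mean squared error, and `toy_codec` (uniform rounding) stands in for the encoding network, quantization, entropy coding, entropy decoding, and decoding network. Nothing is trained here; in the described method the loss would drive joint training of the networks.

```python
import numpy as np


def split_blocks(image, a, b):
    """Split a (t, r) image into blocks b high and a wide, in raster order."""
    t, r = image.shape
    return [image[y:y + b, x:x + a] for y in range(0, t, b) for x in range(0, r, a)]


def combine_blocks(blocks, t, r, a, b):
    """Reassemble raster-ordered blocks into a (t, r) image."""
    out = np.empty((t, r), dtype=blocks[0].dtype)
    coords = [(y, x) for y in range(0, t, b) for x in range(0, r, a)]
    for block, (y, x) in zip(blocks, coords):
        out[y:y + b, x:x + a] = block
    return out


def toy_codec(block, step=4.0):
    """Hypothetical stand-in for the learned codec: uniform quantization with the given step."""
    return np.round(block / step) * step


def round_trip(image, a, b):
    """Run the block pipeline once and return (second image, distortion loss)."""
    blocks = split_blocks(image, a, b)
    adaptive = [float(blk.mean()) for blk in blocks]            # assumed adaptive data
    preprocessed = [blk - m for blk, m in zip(blocks, adaptive)]
    reconstructed = [toy_codec(p) for p in preprocessed]
    compensated = [rb + m for rb, m in zip(reconstructed, adaptive)]
    second = combine_blocks(compensated, *image.shape, a, b)
    loss = float(np.mean((second - image) ** 2))                # assumed distortion: MSE
    return second, loss


rng = np.random.default_rng(0)
first = rng.uniform(0.0, 255.0, size=(8, 12))
second, loss = round_trip(first, a=4, b=4)
print(second.shape, loss)
```

Because blocks and adaptive data are both kept in raster order, their one-to-one correspondence is carried by list position alone.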
In an optional design of the third aspect, the method further includes:
quantizing the N first adaptive data to obtain N first adaptive quantized data, the N first adaptive quantized data being used to compensate the N first reconstructed image blocks.
In an optional design of the third aspect, the larger N is, the smaller the information entropy of a single first adaptive quantized data is.
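Quantizing the adaptive data can be sketched as a uniform quantizer. The step size, the integer index representation, and the function names are assumptions for illustration; the text does not specify the quantizer.

```python
import numpy as np


def quantize_adaptive(adaptive, step=1.0):
    """Uniformly quantize adaptive data to integer indices (hypothetical scheme)."""
    return np.round(np.asarray(adaptive, dtype=np.float64) / step).astype(np.int64)


def dequantize_adaptive(indices, step=1.0):
    """Map integer indices back to the values used for compensation."""
    return indices.astype(np.float64) * step


adaptive = [12.3, 200.7, 97.49]
q = quantize_adaptive(adaptive)
print(q.tolist())  # [12, 201, 97]
print(dequantize_adaptive(q).tolist())  # [12.0, 201.0, 97.0]
```

With a uniform quantizer of this kind, the value recovered for compensation differs from the original adaptive data by at most half a step.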
In an optional design of the third aspect, the N first encoded representations are arranged in the same order as the N first image blocks, the order of the N first image blocks being the order in which they are arranged in the first image.
In an optional design of the third aspect, the second image is processed through a fusion neural network to obtain a third image, the fusion neural network being used to reduce the difference between the second image and the first image, the difference including blocking artifacts;
obtaining the distortion loss of the second image relative to the first image includes:
obtaining a distortion loss of the third image relative to the first image;
the model includes the fusion neural network.
In an optional design of the third aspect, each of the N first image blocks has the same size.
In an optional design of the third aspect, the first images used in two training iterations differ in size, while the size of the first image block is a fixed value.
In an optional design of the third aspect, the first image block has a pixel size of a×b, where a and b are obtained according to a target pixel size. The target pixel size is c×d, c/a is an integer, and d/b is an integer; a and c are numbers of pixels in the width direction, and b and d are numbers of pixels in the height direction. The target pixel size is obtained according to a target resolution of a terminal device. The terminal device includes a camera component, an image obtained by the camera component at the target resolution setting has the target pixel size, and the first image is obtained by the camera component.
In an optional design of the third aspect, the target resolution is obtained from the resolution set for the camera component through a setting interface in a camera application.
In an optional design of the third aspect, the target resolution is obtained according to a target image group in a gallery of images obtained by the camera component, and the pixel size of the target image group is the target pixel size. Among the image groups of different pixel sizes, the target image group accounts for the largest proportion of the gallery.
In an optional design of the third aspect, images of multiple pixel sizes are obtained through the camera component. The multiple pixel sizes are e×f, where e/a is an integer and f/b is an integer; e includes c, and f includes d.
In an optional design of the third aspect, the multiple pixel sizes are obtained by setting the resolution of the camera component through the setting interface in the camera application.
In an optional design of the third aspect, the first image block has a pixel size of a×b, where a is the number of pixels in the width direction and b is the number of pixels in the height direction, and the first image has a pixel size of r×t;
after acquiring the first image and before dividing the first image, the method further includes:
if r/a is not an integer and/or t/b is not an integer, padding the edge of the first image with the pixel median so that r1/a is an integer and t1/b is an integer, where r1×t1 is the pixel size of the padded first image.
In an optional design of the third aspect, after acquiring the first image and before padding the edge of the first image, the method further includes:
if t/b is not an integer, enlarging r and t proportionally to obtain the first image with a pixel size of r2×t2, where t2/b is an integer;
the padding of the edge of the first image with the pixel median if r/a is not an integer and/or t/b is not an integer includes:
if r2/a is not an integer, padding the edge of the first image with the pixel median.
In an optional design of the third aspect, after r and t are enlarged proportionally, if r2/a is not an integer, the remainder of r2/a is obtained. If the remainder is greater than the value shown in Figure PCTCN2021101807-appb-000047, the pixel median is padded on only one side of the first image in the width direction.
In an optional design of the third aspect, if the remainder is less than the value shown in Figure PCTCN2021101807-appb-000048, the pixel median is padded on both sides of the first image in the width direction, so that the width of the pixel median padded on each side is the value shown in Figure PCTCN2021101807-appb-000049, where g is the remainder.
In an optional design of the third aspect, the N first image blocks include a first target image block, and the range of pixel values of the first target image block is smaller than the range of pixel values of the first image;
before obtaining the N first adaptive data from the N first image blocks, the method further includes:
inverse-quantizing the pixel values of the first target image block;
obtaining the N first adaptive data from the N first image blocks includes:
obtaining one first adaptive data from the inverse-quantized first target image block.
A fourth aspect of the present application provides an encoding apparatus, the apparatus including:
a first acquisition module, configured to acquire a first image;
a segmentation module, configured to divide the first image to obtain N first image blocks, where N is an integer greater than 1;
a second acquisition module, configured to obtain N first adaptive data from the N first image blocks, the N first adaptive data being in one-to-one correspondence with the N first image blocks;
a preprocessing module, configured to preprocess the N first image blocks according to the N first adaptive data;
an encoding neural network module, configured to process the preprocessed N first image blocks through an encoding neural network to obtain N groups of first feature maps;
a quantization and entropy encoding module, configured to perform quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations.
In an optional design of the fourth aspect, the N first encoded representations are used for entropy decoding to obtain N groups of second feature maps, the N groups of second feature maps are used for processing by a decoding neural network to obtain N first reconstructed image blocks, the N first adaptive data are used to compensate the N first reconstructed image blocks, and the compensated N first reconstructed image blocks are used to be combined into a second image.
In an optional design of the fourth aspect, the apparatus further includes:
a sending module, configured to send the N first encoded representations, the N first adaptive data, and a correspondence to the decoding end, the correspondence including the correspondence between the N first adaptive data and the N first encoded representations.
In an optional design of the fourth aspect, the apparatus further includes:
a quantization module, configured to quantize the N first adaptive data to obtain N first adaptive quantized data, the N first adaptive quantized data being used to compensate the N first reconstructed image blocks;
the sending module is specifically configured to send the N first adaptive quantized data to the decoding end.
In an optional design of the fourth aspect, the larger N is, the smaller the information entropy of a single first adaptive quantized data is.
In an optional design of the fourth aspect, the N first encoded representations are arranged in the same order as the N first image blocks, the order of the N first image blocks being the order in which they are arranged in the first image. The correspondence includes the arrangement order of the N first encoded representations and the arrangement order of the N first image blocks.
In an optional design of the fourth aspect, the second image is used for processing by a fusion neural network to obtain a third image, the fusion neural network being used to reduce the difference between the second image and the first image, the difference including blocking artifacts.
In an optional design of the fourth aspect, each of the N first image blocks has the same size.
In an optional design of the fourth aspect, when the apparatus is used to process first images of different sizes, the size of the first image block is a fixed value.
In an optional design of the fourth aspect, the first image block has a pixel size of a×b, where a and b are obtained according to a target pixel size. The target pixel size is c×d, c/a is an integer, and d/b is an integer; a and c are numbers of pixels in the width direction, and b and d are numbers of pixels in the height direction. The target pixel size is obtained according to a target resolution of a terminal device. The terminal device includes a camera component, an image obtained by the camera component at the target resolution setting has the target pixel size, and the first image is obtained by the camera component.
In an optional design of the fourth aspect, the target resolution is obtained from the resolution set for the camera component through a setting interface in a camera application.
In an optional design of the fourth aspect, the target resolution is obtained according to a target image group in a gallery of images obtained by the camera component, and the pixel size of the target image group is the target pixel size. Among the image groups of different pixel sizes, the target image group accounts for the largest proportion of the gallery.
In an optional design of the fourth aspect, images of multiple pixel sizes are obtained through the camera component. The multiple pixel sizes are e×f, where e/a is an integer and f/b is an integer; e includes c, and f includes d.
In an optional design of the fourth aspect, the multiple pixel sizes are obtained by setting the resolution of the camera component through the setting interface in the camera application.
In an optional design of the fourth aspect, the first image block has a pixel size of a×b, where a is the number of pixels in the width direction and b is the number of pixels in the height direction, and the first image has a pixel size of r×t. The apparatus further includes:
a padding module, configured to, if r/a is not an integer and/or t/b is not an integer, pad the edge of the first image with the pixel median so that r1/a is an integer and t1/b is an integer, where r1×t1 is the pixel size of the padded first image.
In an optional design of the fourth aspect, the apparatus further includes:
an enlargement module, configured to, if t/b is not an integer, enlarge r and t proportionally to obtain the first image with a pixel size of r2×t2, where t2/b is an integer;
the padding module is specifically configured to pad the edge of the first image with the pixel median if r2/a is not an integer.
In an optional design of the fourth aspect, the second acquisition unit is further configured to obtain the remainder of r2/a if r2/a is not an integer;
the padding module is specifically configured to pad the pixel median on only one side of the first image in the width direction if the remainder is greater than the value shown in Figure PCTCN2021101807-appb-000063.
In an optional design of the fourth aspect, the padding module is specifically configured to, if the remainder is less than the value shown in Figure PCTCN2021101807-appb-000064, pad the pixel median on both sides of the first image in the width direction, so that the width of the pixel median padded on each side is the value shown in Figure PCTCN2021101807-appb-000065, where g is the remainder.
In an optional design of the fourth aspect, the N first image blocks include a first target image block, and the range of pixel values of the first target image block is smaller than the range of pixel values of the first image. The apparatus further includes:
an inverse quantization module, configured to inverse-quantize the pixel values of the first target image block;
the second acquisition module is specifically configured to obtain one first adaptive data from the inverse-quantized first target image block.
A fifth aspect of the present application provides a decoding apparatus, the decoding apparatus including:
an acquisition module, configured to obtain N first encoded representations, N first adaptive data, and a correspondence, the correspondence including the correspondence between the N first adaptive data and the N first encoded representations, the N first adaptive data being in one-to-one correspondence with the N first encoded representations, where N is an integer greater than 1;
an entropy decoding module, configured to perform entropy decoding on the N first encoded representations to obtain N groups of second feature maps;
a decoding neural network module, configured to process the N groups of second feature maps to obtain N first reconstructed image blocks;
a compensation module, configured to compensate the N first reconstructed image blocks by using the N first adaptive data;
a combination module, configured to combine the compensated N first reconstructed image blocks to obtain a second image.
In an optional design of the fifth aspect, the N first encoded representations are obtained by quantization and entropy encoding of N groups of first feature maps, the N groups of first feature maps are obtained by processing preprocessed N first image blocks through an encoding neural network, the preprocessed N first image blocks are obtained by preprocessing N first image blocks according to the N first adaptive data, the N first adaptive data are obtained from the N first image blocks, and the N first image blocks are obtained by dividing a first image.
In an optional design of the fifth aspect, the N first adaptive data are N first adaptive quantized data, and the N first adaptive quantized data are obtained by quantizing the N first adaptive data;
the compensation module is specifically configured to compensate the N first reconstructed image blocks by using the N first adaptive quantized data.
In an optional design of the fifth aspect, the larger N is, the smaller the information entropy of a single first adaptive quantized data is.
In an optional design of the fifth aspect, the N first encoded representations are arranged in the same order as the N first image blocks, the order of the N first image blocks being the order in which they are arranged in the first image. The correspondence includes the arrangement order of the N first encoded representations and the arrangement order of the N first image blocks.
In an optional design of the fifth aspect, the apparatus further includes:
a fusion neural network module, configured to process the second image to obtain a third image, so as to reduce the difference between the second image and the first image, the difference including blocking artifacts.
In an optional design of the fifth aspect, each of the N first image blocks has the same size.
In an optional design of the fifth aspect, when the apparatus is used to combine and generate second images of different sizes, the size of the first image block is a fixed value.
In an optional design of the fifth aspect, the first image block has a pixel size of a×b, where a and b are obtained according to a target pixel size. The target pixel size is c×d, c/a is an integer, and d/b is an integer; a and c are numbers of pixels in the width direction, and b and d are numbers of pixels in the height direction. The target pixel size is obtained according to a target resolution of a terminal device. The terminal device includes a camera component, an image obtained by the camera component at the target resolution setting has the target pixel size, and the first image is obtained by the camera component.
In an optional design of the fifth aspect, the target resolution is obtained from the resolution set for the camera component through a setting interface in a camera application.
In an optional design of the fifth aspect, the target resolution is obtained according to a target image group in a gallery of images obtained by the camera component, and the pixel size of the target image group is the target pixel size. Among the image groups of different pixel sizes, the target image group accounts for the largest proportion of the gallery.
In an optional design of the fifth aspect, images of multiple pixel sizes are obtained through the camera component. The multiple pixel sizes are e×f, where e/a is an integer and f/b is an integer; e includes c, and f includes d.
In an optional design of the fifth aspect, the multiple pixel sizes are obtained by setting the resolution of the camera component through the setting interface in the camera application.
In an optional design of the fifth aspect, the first image block has a pixel size of a×b, where a is the number of pixels in the width direction and b is the number of pixels in the height direction, and the first image has a pixel size of r×t. When r/a is not an integer and/or t/b is not an integer, the edge of the first image is padded with the pixel median so that r1/a is an integer and t1/b is an integer, where r1×t1 is the pixel size of the padded first image.
In an optional design of the fifth aspect, if $t/b$ is not an integer, r and t are scaled up proportionally by the encoding end to obtain a first image whose pixels are r2×t2, where $t2/b$ is an integer.
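A minimal sketch of the proportional enlargement follows. It assumes that the elided condition concerns the height t and the block height b, that the height is raised to the next multiple of b, and that the scaled width is rounded to the nearest integer. None of these details is confirmed by the text, and `scale_to_block_height` is a hypothetical name.

```python
def scale_to_block_height(r, t, b):
    """Scale (r, t) by a common factor so that the new height is a multiple of b."""
    if t % b == 0:
        return r, t              # already aligned, nothing to do
    t2 = -(-t // b) * b          # next multiple of b above t
    r2 = round(r * t2 / t)       # same scale factor applied to the width
    return r2, t2


# A 1000x750 image with block height 128 becomes 1024x768.
r2, t2 = scale_to_block_height(1000, 750, 128)
```

After this step the width r2 may still fail to be a multiple of a, which is why width padding can still be needed.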
In an optional design of the fifth aspect, if $r2/a$ is not an integer and the remainder of $r2/a$ is greater than $a/2$, the first image is padded with the median pixel value on only one side in the width direction.
In an optional design of the fifth aspect, if the remainder is less than $a/2$, the first image is padded with the median pixel value on both sides in the width direction, and the width of the padding on each side is $(a-g)/2$, where g is the remainder.
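The one-side versus two-side rule can be sketched as follows, assuming the threshold is a/2 and the per-side width is (a − g)/2. Several details are assumptions of the sketch rather than statements of the text: the single padded side is taken to be the right side, a remainder exactly equal to a/2 is treated as the two-side case, and an odd amount of padding is split unevenly.

```python
def width_padding(r2, a):
    """Return (left, right) padding widths that make r2 a multiple of a."""
    g = r2 % a                   # remainder of r2 / a
    if g == 0:
        return 0, 0
    missing = a - g              # pixels needed to reach the next multiple of a
    if 2 * g > a:                # remainder greater than a/2: pad one side only
        return 0, missing
    left = missing // 2          # otherwise split the padding over both sides
    return left, missing - left


one_side = width_padding(1000, 128)   # remainder 104 > 64
two_side = width_padding(900, 128)    # remainder 4 < 64
```

A large remainder means only a narrow strip is missing, so padding one side keeps the content nearly centred; a small remainder means a wide strip is missing, which is shared between both sides.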
In an optional design of the fifth aspect, the N first image blocks include a first target image block whose range of pixel values is smaller than the range of pixel values of the first image. At least one piece of first adaptive data is obtained from the inverse-quantized first target image block, which is obtained by inverse-quantizing the pixel values of the first target image block.
A sixth aspect of the present application provides a training apparatus, the apparatus including:
a first obtaining module, configured to obtain a first image;
a segmentation module, configured to segment the first image to obtain N first image blocks, where N is an integer greater than 1;
a second obtaining module, configured to obtain N pieces of first adaptive data from the N first image blocks, where the N pieces of first adaptive data are in one-to-one correspondence with the N first image blocks;
a preprocessing module, configured to preprocess the N first image blocks according to the N pieces of first adaptive data;
a first encoding neural network module, configured to process the preprocessed N first image blocks to obtain N groups of first feature maps;
a quantization and entropy encoding module, configured to perform quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations;
an entropy decoding module, configured to perform entropy decoding on the N first encoded representations to obtain N groups of second feature maps;
a first decoding neural network module, configured to process the N groups of second feature maps to obtain N first reconstructed image blocks;
a compensation module, configured to compensate the N first reconstructed image blocks by using the N pieces of first adaptive data;
a combining module, configured to combine the compensated N first reconstructed image blocks to obtain a second image;
a third obtaining module, configured to obtain a distortion loss of the second image relative to the first image;
a training module, configured to jointly train a model by using a loss function until the image distortion value between the first image and the second image reaches a first preset level, where the model includes the first encoding neural network, a quantization network, an entropy encoding network, an entropy decoding network, and the first decoding neural network; optionally, the model further includes a segmentation network, and the trainable parameter in the segmentation network is the size of the first image block; and
an output module, configured to output a second encoding neural network and a second decoding neural network, where the second encoding neural network is the model obtained after iterative training of the first encoding neural network, and the second decoding neural network is the model obtained after iterative training of the first decoding neural network.
In an optional design of the sixth aspect, the apparatus further includes:
a quantization module, configured to quantize the N pieces of first adaptive data to obtain N pieces of first adaptive quantized data, where the N pieces of first adaptive quantized data are used to compensate the N first reconstructed image blocks.
In an optional design of the sixth aspect, the larger N is, the smaller the information entropy of a single piece of first adaptive quantized data.
In an optional design of the sixth aspect, the N first encoded representations are arranged in the same order as the N first image blocks, and the order of the N first image blocks is the order in which they are arranged in the first image.
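Keeping the encoded representations in the same order as the blocks in the image means the decoder can reassemble the image without extra position data. The sketch below illustrates this with a plain raster (left-to-right, top-to-bottom) order, which is an assumption; the text only requires that both orders match. The names `split_blocks` and `combine_blocks` are hypothetical.

```python
def split_blocks(image, a, b):
    """Split a row-major image into a x b blocks, listed in raster order."""
    t, r = len(image), len(image[0])
    blocks = []
    for y in range(0, t, b):            # top to bottom
        for x in range(0, r, a):        # left to right
            blocks.append([row[x:x + a] for row in image[y:y + b]])
    return blocks


def combine_blocks(blocks, r, a):
    """Rebuild an image of width r from raster-ordered blocks of width a."""
    per_row = r // a                    # number of blocks per horizontal band
    image = []
    for i in range(0, len(blocks), per_row):
        band = blocks[i:i + per_row]
        for y in range(len(band[0])):
            image.append([p for block in band for p in block[y]])
    return image


image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
blocks = split_blocks(image, a=2, b=2)
restored = combine_blocks(blocks, r=4, a=2)
```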
In an optional design of the sixth aspect, the second image is processed by a fusion neural network to obtain a third image, where the fusion neural network is used to reduce the difference between the second image and the first image, and the difference includes blocking artifacts;
the third obtaining module is specifically configured to obtain a distortion loss of the third image relative to the first image; and
the model includes the fusion neural network.
In an optional design of the sixth aspect, each of the N first image blocks has the same size.
In an optional design of the sixth aspect, the first images used in two training iterations differ in size, while the size of the first image block is a fixed value.
In an optional design of the sixth aspect, the pixels of the first image block are a×b, where a and b are obtained according to target pixels, the target pixels are c×d, $c/a$ is an integer, and $d/b$ is an integer; a and c are numbers of pixels in the width direction, and b and d are numbers of pixels in the height direction. The target pixels are obtained according to a target resolution of a terminal device. The terminal device includes a camera component, the pixels of an image obtained by the camera component at the target resolution setting are the target pixels, and the first image is obtained by the camera component.
In an optional design of the sixth aspect, the target resolution is obtained by setting the resolution of the camera component through a settings interface in a camera application.
In an optional design of the sixth aspect, the target resolution is obtained from a target image group in a gallery captured by the camera component. The pixels of the target image group are the target pixels, and among the image groups with different pixels, the target image group accounts for the largest proportion of the gallery.
In an optional design of the sixth aspect, images of a plurality of pixel sizes are obtained by the camera component. The plurality of pixel sizes are e×f, where $e/a$ is an integer, $f/b$ is an integer, e includes c, and f includes d.
In an optional design of the sixth aspect, the plurality of pixel sizes are obtained by setting the resolution of the camera component through the settings interface in the camera application.
In an optional design of the sixth aspect, the pixels of the first image block are a×b, where a is the number of pixels in the width direction and b is the number of pixels in the height direction, and the pixels of the first image are r×t;
the apparatus further includes:
a padding module, configured to: if $r/a$ is not an integer and/or $t/b$ is not an integer, pad the edges of the first image with the median pixel value so that $r1/a$ is an integer and $t1/b$ is an integer, where the pixels of the padded first image are r1×t1.
In an optional design of the sixth aspect, the apparatus further includes:
a scaling module, configured to: if $t/b$ is not an integer, scale up r and t proportionally to obtain the first image whose pixels are r2×t2, where $t2/b$ is an integer;
the padding module is specifically configured to: if $r2/a$ is not an integer, pad the edges of the first image with the median pixel value.
In an optional design of the sixth aspect, the second obtaining module is further configured to: after r and t are scaled up proportionally, if $r2/a$ is not an integer, obtain the remainder of $r2/a$;
the padding module is specifically configured to: if the remainder is greater than $a/2$, pad the median pixel value on only one side of the first image in the width direction.
In an optional design of the sixth aspect, if the remainder is less than $a/2$, the median pixel value is padded on both sides of the first image in the width direction, so that the width of the padding on each side is $(a-g)/2$, where g is the remainder.
In an optional design of the sixth aspect, the N first image blocks include a first target image block, and the range of pixel values of the first target image block is smaller than the range of pixel values of the first image;
the apparatus further includes:
an inverse quantization module, configured to inverse-quantize the pixel values of the first target image block;
the second obtaining module is specifically configured to obtain one piece of first adaptive data from the inverse-quantized first target image block.
A seventh aspect of the present application provides an encoding device, which may include a memory, a processor, and a bus system, where the memory is configured to store a program and the processor is configured to execute the program in the memory, including the following steps:
obtaining a first image;
segmenting the first image to obtain N first image blocks, where N is an integer greater than 1;
obtaining N pieces of first adaptive data from the N first image blocks, where the N pieces of first adaptive data are in one-to-one correspondence with the N first image blocks;
preprocessing the N first image blocks by using the N pieces of first adaptive data;
processing the preprocessed N first image blocks by using an encoding neural network to obtain N groups of first feature maps; and
performing quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations.
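The data flow of the encoding steps can be sketched with stand-ins. Everything concrete here is an assumption made for illustration: the adaptive data is taken to be each block's mean pixel value, preprocessing subtracts that mean, the encoding neural network is replaced by an identity mapping, quantization divides by a fixed step and rounds, and entropy encoding is omitted. The text fixes none of these choices.

```python
def encode_blocks(blocks, step=4):
    """Toy encoder: returns (adaptive_data, coded), one entry per input block."""
    adaptive, coded = [], []
    for block in blocks:
        pixels = [p for row in block for p in row]
        mean = round(sum(pixels) / len(pixels))              # adaptive data (assumed: block mean)
        residual = [[p - mean for p in row] for row in block]  # preprocessing (assumed: subtract mean)
        features = residual                                  # stand-in for the encoding neural network
        quantized = [[round(v / step) for v in row] for row in features]
        adaptive.append(mean)
        coded.append(quantized)                              # a real codec would entropy-encode this
    return adaptive, coded


blocks = [[[100, 108], [100, 108]], [[10, 10], [10, 10]]]
adaptive, coded = encode_blocks(blocks)
```

The two returned lists have the same length and order, which reflects the one-to-one correspondence between adaptive data and image blocks.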
In an optional design of the seventh aspect, the encoding device is a virtual reality (VR) device, a mobile phone, a tablet, a laptop computer, a server, or a smart wearable device.
In the seventh aspect of the present application, the processor may further be configured to perform the steps performed by the encoding end in each possible implementation of the first aspect. For details, refer to the first aspect; they are not repeated here.
An eighth aspect of the present application provides a decoding device, which may include a memory, a processor, and a bus system, where the memory is configured to store a program and the processor is configured to execute the program in the memory, including the following steps:
obtaining N first encoded representations, N pieces of first adaptive data, and a correspondence, where the correspondence includes the correspondence between the N pieces of first adaptive data and the N first encoded representations, the N pieces of first adaptive data are in one-to-one correspondence with the N first encoded representations, and N is an integer greater than 1;
performing entropy decoding on the N first encoded representations to obtain N groups of second feature maps;
processing the N groups of second feature maps by using a decoding neural network to obtain N first reconstructed image blocks;
compensating the N first reconstructed image blocks by using the N pieces of first adaptive data; and
combining the compensated N first reconstructed image blocks to obtain a second image.
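The decoding steps can be sketched in the same toy setting. As assumptions for illustration only, each encoded representation is an already entropy-decoded grid of integers, the decoding neural network is replaced by a rescaling by a fixed step, and compensation adds the block's adaptive value (assumed to be a block mean) back to every pixel.

```python
def decode_blocks(coded, adaptive, step=4):
    """Toy decoder: rebuild one block per (encoded representation, adaptive data) pair."""
    if len(coded) != len(adaptive):
        raise ValueError("each encoded representation needs exactly one piece of adaptive data")
    reconstructed = []
    for quantized, mean in zip(coded, adaptive):
        features = [[v * step for v in row] for row in quantized]    # second feature maps (rescaled)
        block = features                                             # stand-in for the decoding neural network
        reconstructed.append([[v + mean for v in row] for row in block])  # compensation (assumed: add mean)
    return reconstructed


coded = [[[-1, 1], [-1, 1]], [[0, 0], [0, 0]]]
adaptive = [104, 10]
reconstructed = decode_blocks(coded, adaptive)
```

Pairing the two lists by position is what the one-to-one correspondence provides; the reconstructed blocks would then be combined into the second image in their original order.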
In an optional design of the eighth aspect, the decoding device is a virtual reality (VR) device, a mobile phone, a tablet, a laptop computer, a server, or a smart wearable device.
In the eighth aspect of the present application, the processor may further be configured to perform the steps performed by the decoding end in each possible implementation of the second aspect. For details, refer to the second aspect; they are not repeated here.
A ninth aspect of the present application provides a training device, which may include a memory, a processor, and a bus system, where the memory is configured to store a program and the processor is configured to execute the program in the memory, including the following steps:
obtaining a first image;
segmenting the first image to obtain N first image blocks, where N is an integer greater than 1;
obtaining N pieces of first adaptive data from the N first image blocks, where the N pieces of first adaptive data are in one-to-one correspondence with the N first image blocks;
preprocessing the N first image blocks according to the N pieces of first adaptive data;
processing the preprocessed N first image blocks by using a first encoding neural network to obtain N groups of first feature maps;
performing quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations;
performing entropy decoding on the N first encoded representations to obtain N groups of second feature maps;
processing the N groups of second feature maps by using a first decoding neural network to obtain N first reconstructed image blocks;
compensating the N first reconstructed image blocks by using the N pieces of first adaptive data;
combining the compensated N first reconstructed image blocks to obtain a second image;
obtaining a distortion loss of the second image relative to the first image;
jointly training the first encoding neural network, a quantization network, an entropy encoding network, an entropy decoding network, and the first decoding neural network by using a loss function until the image distortion value between the first image and the second image reaches a first preset level; and
outputting a second encoding neural network and a second decoding neural network, where the second encoding neural network is the model obtained after iterative training of the first encoding neural network, and the second decoding neural network is the model obtained after iterative training of the first decoding neural network.
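The stopping rule of the joint training (iterate until the distortion between the first and second image reaches a preset level) can be sketched with a deliberately tiny model. The assumptions are substantial: the "encoder" and "decoder" are single scalar weights, the distortion loss is mean squared error, optimization is plain gradient descent, and the quantization and entropy coding stages are left out. None of this is taken from the application.

```python
def mse(x, y):
    """Mean squared error between two equally long pixel lists."""
    return sum((p - q) ** 2 for p, q in zip(x, y)) / len(x)


def train(first_image, threshold=1e-3, lr=0.1, max_steps=10_000):
    """Jointly update toy encoder/decoder weights until the distortion is small enough."""
    w_enc, w_dec = 0.5, 0.5                  # toy stand-ins for the two networks
    loss = float("inf")
    for _ in range(max_steps):
        second_image = [w_dec * (w_enc * p) for p in first_image]
        loss = mse(first_image, second_image)
        if loss <= threshold:                # distortion reached the preset level
            break
        # Gradient of the MSE with respect to the product w_enc * w_dec.
        g = sum(2 * (s - p) * p for s, p in zip(second_image, first_image)) / len(first_image)
        g_enc, g_dec = g * w_dec, g * w_enc  # chain rule for each weight
        w_enc -= lr * g_enc
        w_dec -= lr * g_dec
    return w_enc, w_dec, loss


first_image = [0.2, 0.5, 0.8, 1.0]           # hypothetical normalized pixel values
w_enc, w_dec, loss = train(first_image)
```

Both weights receive gradients from the same loss, which is the sense in which the training is joint; the returned weights correspond to the trained "second" networks.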
In the ninth aspect of the present application, the processor may further be configured to perform the steps performed by the decoding end in each possible implementation of the third aspect. For details, refer to the third aspect; they are not repeated here.
According to a tenth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program that, when run on a computer, causes the computer to perform the image processing method according to any one of the first to third aspects.
According to an eleventh aspect, an embodiment of the present application provides a computer program that, when run on a computer, causes the computer to perform the image processing method according to any one of the first to third aspects.
According to a twelfth aspect, the present application provides a chip system. The chip system includes a processor configured to support an execution device or a training device in implementing the functions involved in the foregoing aspects, for example, sending or processing the data and/or information involved in the foregoing methods. In a possible design, the chip system further includes a memory configured to store the program instructions and data necessary for the execution device or the training device. The chip system may consist of a chip, or may include a chip and other discrete components.
Description of drawings
FIG. 1 is a schematic structural diagram of a main framework of artificial intelligence;
FIG. 2a is a schematic diagram of an application scenario of an embodiment of the present application;
FIG. 2b is a schematic diagram of another application scenario of an embodiment of the present application;
FIG. 2c is a schematic diagram of another application scenario of an embodiment of the present application;
FIG. 3a is a schematic flowchart of an image processing method provided by an embodiment of the present application;
FIG. 3b is another schematic flowchart of the image processing method provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of segmenting and combining images in an embodiment of the present application;
FIG. 5 is a schematic diagram of a CNN-based image encoding process in an embodiment of the present application;
FIG. 6 is a schematic diagram of a CNN-based image decoding process in an embodiment of the present application;
FIG. 7 is a schematic diagram of a settings interface for setting the camera resolution of a terminal device in an embodiment of the present application;
FIG. 8 is a schematic flowchart of image padding in an embodiment of the present application;
FIG. 9 is another schematic flowchart of image padding in an embodiment of the present application;
FIG. 10 is a schematic comparison of image compression quality in an embodiment of the present application;
FIG. 11 is a system architecture diagram of an image processing system provided by an embodiment of the present application;
FIG. 12 is a schematic flowchart of a model training method provided by an embodiment of the present application;
FIG. 13 is a schematic diagram of a training process provided by an embodiment of the present application;
FIG. 14 is a schematic structural diagram of an encoding apparatus provided by an embodiment of the present application;
FIG. 15 is a schematic structural diagram of a decoding apparatus provided by an embodiment of the present application;
FIG. 16 is a schematic structural diagram of a training apparatus provided by an embodiment of the present application;
FIG. 17 is a schematic structural diagram of an execution device provided by an embodiment of the present application;
FIG. 18 is a schematic structural diagram of a training device provided by an embodiment of the present application;
FIG. 19 is a schematic structural diagram of a chip provided by an embodiment of the present application.
Detailed description
The embodiments of the present invention are described below with reference to the accompanying drawings in the embodiments of the present invention. The terms used in the description of the embodiments are intended only to explain specific embodiments of the present invention and are not intended to limit the present invention.
The embodiments of the present application are described below with reference to the accompanying drawings. A person of ordinary skill in the art will recognize that, as technology develops and new scenarios emerge, the technical solutions provided in the embodiments of the present application are equally applicable to similar technical problems.
The terms "first", "second", and the like in the specification, claims, and accompanying drawings of the present application are used to distinguish between similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate; this is merely the manner of distinguishing objects with the same attributes in the description of the embodiments of the present application. In addition, the terms "include" and "have" and any variations thereof are intended to cover a non-exclusive inclusion, so that a process, method, system, product, or device that includes a series of units is not necessarily limited to those units, but may include other units that are not expressly listed or that are inherent to such a process, method, product, or device.
First, the overall workflow of an artificial intelligence system is described. Refer to FIG. 1, which is a schematic structural diagram of a main framework of artificial intelligence. The framework is described below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects the series of processes from data acquisition to data processing, for example the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, data is refined from data to information, then to knowledge, and finally to wisdom. The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (provision and processing technology implementation) to the industrial ecosystem of the system.
(1) Infrastructure
The infrastructure provides computing power for the artificial intelligence system, enables communication with the outside world, and is supported by a basic platform. Communication with the outside world takes place through sensors. Computing power is provided by smart chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs). The basic platform includes a distributed computing framework, networks, and related platform assurance and support, and may include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside world to obtain data, and the data is provided to smart chips in the distributed computing system provided by the basic platform for computation.
(2) Data
The data at the layer above the infrastructure represents the data sources in the field of artificial intelligence. The data involves graphics, images, speech, and text, as well as Internet of Things data from conventional devices, including business data from existing systems and sensed data such as force, displacement, liquid level, temperature, and humidity.
(3) Data processing
Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making, and the like.
Machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and the like on data.
Reasoning is the process in which a computer or intelligent system simulates human intelligent reasoning and, according to a reasoning control strategy, uses formalized information to carry out machine thinking and problem solving. Its typical functions are search and matching.
Decision-making is the process of making decisions after reasoning over intelligent information, and usually provides functions such as classification, ranking, and prediction.
(4) General capabilities
After the data has undergone the data processing described above, some general capabilities can be formed based on the results, such as an algorithm or a general-purpose system, for example translation, text analysis, computer vision processing, speech recognition, and image recognition.
(5) Intelligent products and industry applications
Intelligent products and industry applications are the products and applications of artificial intelligence systems in various fields. They package the overall artificial intelligence solution, turning intelligent information decision-making into products and putting it into practical use. The main application fields include intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, and safe cities.
The present application can be applied to image processing in the field of artificial intelligence. Several application scenarios in which it is implemented in products are introduced below.
Scenario 1: image compression and decompression in a terminal device, where both the encoding end and the decoding end are terminal devices.
The image compression method provided in the embodiments of the present application can be applied to the image compression process in a terminal device, for example to an album or video surveillance on the terminal device. Refer to FIG. 2a, which is a schematic diagram of an application scenario of an embodiment of the present application. As shown in FIG. 2a, the terminal device can obtain an image to be compressed, which may be a photo taken by a camera component or a frame captured from a video; the camera component is generally a camera. The terminal device segments the obtained image by using a central processing unit (CPU) to obtain a plurality of image blocks. The terminal device can then use an artificial intelligence (AI) encoding neural network (encoding neural network for short) in an embedded neural-network processing unit (NPU) to perform feature extraction on the image blocks, transforming the image block data into output features with lower redundancy and producing a probability estimate for each point in the output features. The CPU performs entropy encoding on the extracted output features by using these probability estimates, which reduces the coding redundancy of the output features and further reduces the amount of data transmitted during image block compression, and saves the resulting encoded data as a data file in the corresponding storage location. When the user needs the file saved in that storage location, the CPU can obtain and load the file from the storage location and obtain the decoded feature maps through entropy decoding. An AI decoding neural network (decoding neural network for short) in the NPU reconstructs the feature maps to obtain a plurality of reconstructed image blocks. The terminal device then combines the reconstructed image blocks by using the CPU to obtain a reconstructed image.
In particular, in this scenario the terminal device may save the encoded data on a cloud device. When the user needs the encoded data, it can be obtained from the cloud device.
2. Image compression and decompression in the cloud: both the encoding end and the decoding end are the cloud device.
The image compression method provided in the embodiments of this application can be applied to image compression in the cloud, for example in functions such as a cloud album on a cloud device; the cloud device may be a cloud server. Refer to FIG. 2b, which is a schematic diagram of another application scenario of an embodiment of this application. As shown in FIG. 2b, the terminal device acquires an image to be compressed, which may be a photo taken by a camera component or a frame captured from a video. The terminal device may entropy-encode the image to be compressed through the CPU to obtain encoded data; instead of entropy coding, any lossless compression method in the prior art may be used. The terminal device transmits the encoded data to the cloud device, and the cloud device performs the corresponding entropy decoding on the received encoded data to recover the image to be compressed. The cloud device divides the recovered image through the CPU to obtain multiple tiles. The server then performs feature extraction on the tiles through an encoding neural network in a graphics processing unit (GPU), which transforms the tile data into output features with lower redundancy and produces a probability estimate for each point in the output features. The CPU entropy-encodes the extracted output features using these probability estimates, which reduces the coding redundancy of the output features and further reduces the amount of data transmitted during tile compression, and saves the resulting encoded data as a data file in a corresponding storage location. When the user needs the file saved in that storage location, the CPU obtains and loads the file from the storage location, obtains the decoded feature maps through entropy decoding, and reconstructs the feature maps through a decoding neural network in the NPU to obtain multiple reconstructed tiles. After the reconstructed tiles are obtained, the cloud device combines them through the CPU to obtain a reconstructed image. The cloud device may then entropy-encode this image through the CPU to obtain encoded data (any other lossless compression method in the prior art may also be used) and transmit the encoded data to the terminal device, which performs the corresponding entropy decoding on the received encoded data to obtain the decoded image.
3. Image compression on a cloud device and image decompression on a terminal device: the encoding end is the cloud device, and the decoding end is the terminal device.
The image compression method provided in the embodiments of this application can be applied to image compression on a cloud device and image decompression on a terminal device, for example in functions such as a cloud album on the cloud device; the cloud device may be a cloud server. Refer to FIG. 2c, which is a schematic diagram of another application scenario of an embodiment of this application. As shown in FIG. 2c, the terminal device acquires an image to be compressed, which may be a photo taken by a camera component or a frame captured from a video. The terminal device may entropy-encode the image to be compressed through the CPU to obtain encoded data; instead of entropy coding, any lossless compression method in the prior art may be used. The terminal device transmits the encoded data to the cloud device, and the cloud device performs the corresponding entropy decoding on the received encoded data to recover the image to be compressed. The cloud device divides the recovered image through the CPU to obtain multiple tiles. The server then performs feature extraction on the tiles through an encoding neural network in the GPU, which transforms the tile data into output features with lower redundancy and produces a probability estimate for each point in the output features. The CPU entropy-encodes the extracted output features using these probability estimates, which reduces the coding redundancy of the output features and further reduces the amount of data transmitted during tile compression, and saves the resulting encoded data as a data file in a corresponding storage location. When the terminal device needs the image, it receives the encoded data sent by the cloud device and obtains the decoded feature maps through entropy decoding. The terminal device reconstructs the feature maps through a decoding neural network in the NPU to obtain multiple reconstructed tiles. After the reconstructed tiles are obtained, the terminal device combines them through the CPU to obtain a reconstructed image.
Because the embodiments of this application involve extensive use of neural networks, the related terms and concepts are introduced first for ease of understanding.
(1) Neural network
A neural network may be composed of neural units. A neural unit may be an operation unit that takes $x_s$ and an intercept of 1 as inputs, and its output may be:
$$\text{output} = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$$
Here $s = 1, 2, \ldots, n$, where $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit. $f$ is the activation function of the neural unit; it introduces nonlinearity into the neural network and converts the input signal of the neural unit into an output signal. The output signal of the activation function can serve as the input of the next convolutional layer, and the activation function may be, for example, a sigmoid function. A neural network is formed by connecting many such neural units, so that the output of one neural unit can be the input of another. The input of each neural unit can be connected to a local receptive field of the previous layer to extract the features of that field; a local receptive field may be a region composed of several neural units.
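The neural unit described above can be illustrated with a short sketch. The function names `sigmoid` and `neuron_output` are hypothetical and chosen only for this example, and the sigmoid is used because the text names it as one possible activation function.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid, one possible choice for the activation function f."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(x: list, w: list, b: float) -> float:
    """Return f(sum_s w[s] * x[s] + b) for a single neural unit."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    weighted_sum = sum(w_s * x_s for w_s, x_s in zip(w, x)) + b
    return sigmoid(weighted_sum)


# The weighted sum is 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0.0, and sigmoid(0.0) is 0.5.
example_output = neuron_output([1.0, 2.0], [0.5, -0.25], 0.0)
print(example_output)
```

Any other nonlinear function could be passed in place of the sigmoid; the structure of the weighted sum plus bias stays the same.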
(2) Deep neural network
A deep neural network (DNN), also called a multi-layer neural network, can be understood as a neural network with multiple hidden layers. According to their positions, the layers of a DNN fall into three categories: the input layer, the hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. Adjacent layers are fully connected, that is, every neuron in layer i is connected to every neuron in layer i+1.
Although a DNN looks complicated, the work of each layer is simple. Each layer computes the expression $\vec{y} = \alpha(W\vec{x} + \vec{b})$, where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, $W$ is the weight matrix (also called the coefficients), and $\alpha()$ is the activation function. Each layer simply applies this operation to the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, there are also many coefficients $W$ and offset vectors $\vec{b}$. These parameters are defined in the DNN as follows, taking the coefficient $W$ as an example: in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $W_{24}^{3}$. The superscript 3 is the layer in which the coefficient $W$ is located, and the subscripts are the output index 2 in the third layer and the input index 4 in the second layer.
In summary, the coefficient from the k-th neuron of layer L-1 to the j-th neuron of layer L is defined as $W_{jk}^{L}$.
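The per-layer operation and the coefficient indexing can be sketched as follows. This is a minimal illustration, not the patent's implementation: `dense_layer` and `relu` are hypothetical names, ReLU is an assumed activation chosen for easy arithmetic, and Python indices are 0-based, so the coefficient from the k-th neuron of the previous layer to the j-th neuron of the current layer is stored at `W[j - 1][k - 1]`.

```python
def relu(z: float) -> float:
    """Assumed activation function for this sketch."""
    return max(0.0, z)


def dense_layer(W, x, b, alpha):
    """Compute y = alpha(W x + b) for one fully connected layer.

    W[j][k] is the coefficient from input neuron k to output neuron j.
    """
    y = []
    for j, row in enumerate(W):
        z = sum(w_jk * x_k for w_jk, x_k in zip(row, x)) + b[j]
        y.append(alpha(z))
    return y


W = [[1.0, -1.0, 0.5],   # coefficients feeding output neuron 1
     [0.0, 2.0, 1.0]]    # coefficients feeding output neuron 2
x = [2.0, 1.0, 4.0]      # input vector with three neurons
b = [0.0, -10.0]         # offset vector
y = dense_layer(W, x, b, relu)
print(y)
```

Stacking several such layers, each with its own `W` and `b`, gives the multi-layer structure described above.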
Note that the input layer has no $W$ parameters. In a deep neural network, more hidden layers allow the network to better model complex real-world situations. In theory, a model with more parameters has higher complexity and a larger "capacity", which means that it can complete more complex learning tasks. Training a deep neural network is the process of learning the weight matrices; the ultimate goal is to obtain the weight matrices of all layers of the trained network (the weight matrices formed by the vectors W of the many layers).
(3) Convolutional neural network
A convolutional neural network (CNN) is a deep neural network with a convolutional structure. It contains a feature extractor composed of convolutional layers and subsampling layers, and this feature extractor can be regarded as a filter. A convolutional layer is a layer of neurons that performs convolution on the input signal. In a convolutional layer, a neuron may be connected to only some of the neurons in the adjacent layer. A convolutional layer usually contains several feature planes, each of which may consist of neural units arranged in a rectangle. Neural units in the same feature plane share weights, and the shared weights are the convolution kernel. Weight sharing means that the way image information is extracted is independent of position. A convolution kernel can be initialized in the form of a matrix of random size, and it acquires reasonable weights through learning during training of the CNN. A direct benefit of weight sharing is fewer connections between the layers of the CNN, together with a lower risk of overfitting.
(4) Loss function
When a deep neural network is trained, the goal is for its output to be as close as possible to the value that is actually to be predicted. The predicted value of the current network is therefore compared with the desired target value, and the weight vector of each layer is updated according to the difference between the two. (Before the first update there is usually an initialization step, in which parameters are preconfigured for each layer of the network.) For example, if the predicted value is too high, the weight vectors are adjusted to lower the prediction, and the adjustment continues until the network predicts the desired target value or a value very close to it. This requires defining in advance how the difference between the predicted value and the target value is measured, which is the role of the loss function or objective function: an equation that measures the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) indicates a larger difference, so training a deep neural network becomes the process of reducing this loss as much as possible.
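As a concrete illustration of "a higher loss means a larger difference", the sketch below uses the mean squared error. The source does not say which loss function is used, so this choice and the name `mse_loss` are assumptions made for the example.

```python
def mse_loss(predicted, target):
    """Mean squared error between two equally long lists of values."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return sum(squared_errors) / len(squared_errors)


perfect = mse_loss([1.0, 2.0], [1.0, 2.0])   # prediction equals target
off_by_two = mse_loss([3.0, 0.0], [1.0, 0.0])  # one value is off by 2
print(perfect, off_by_two)
```

The first call yields a loss of zero because the prediction matches the target, while the second yields a positive loss that grows with the size of the error.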
(5) Back propagation algorithm
A neural network can use the error back propagation (BP) algorithm to correct the parameters of the initial neural network model during training, so that the reconstruction error loss of the model becomes smaller and smaller. Specifically, passing the input signal forward to the output produces an error loss, and the parameters of the initial model are updated by propagating the error loss information backward, so that the error loss converges. The back propagation algorithm is a backward pass driven by the error loss, and it aims to obtain the parameters of the optimal neural network model, such as the weight matrices.
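The forward pass, backward pass, and parameter update can be sketched for the smallest possible model, a single linear unit y = w*x + b trained with a squared-error loss. The model, the learning rate, and the names `train_step` and `train` are assumptions made for illustration and are not taken from the source.

```python
def train_step(w, b, x, target, lr):
    """One forward pass and one gradient-descent update for y = w*x + b."""
    y = w * x + b               # forward pass
    error = y - target
    loss = error * error        # squared-error loss
    grad_w = 2.0 * error * x    # dLoss/dw by the chain rule
    grad_b = 2.0 * error        # dLoss/db by the chain rule
    return w - lr * grad_w, b - lr * grad_b, loss


def train(x, target, steps=10, lr=0.05):
    """Repeat the update and record the loss seen at each step."""
    w, b, losses = 0.0, 0.0, []
    for _ in range(steps):
        w, b, loss = train_step(w, b, x, target, lr)
        losses.append(loss)
    return w, b, losses


w, b, losses = train(x=2.0, target=4.0)
print(losses[0], losses[-1])
```

With these settings the error shrinks at every step, which is the convergence behaviour the paragraph describes; in a multi-layer network the same chain rule is applied layer by layer from the output back to the input.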
In the embodiments of this application, in addition to image division, a step of extracting adaptive data from the multiple tiles is added between image division and the encoding neural network. The adaptive data may be a mean, a mean square deviation, or the like. The extracted adaptive data is used to preprocess the tiles, and the encoding neural network performs feature extraction on the preprocessed tiles. Besides preprocessing the tiles, the adaptive data is also used to compensate the reconstructed tiles. For ease of description, the image processing method in the embodiments of this application is described in detail below using the mean as the adaptive data and the third application scenario above as the application scenario. In the third application scenario, the cloud device is the encoding end and the terminal device is the decoding end.
As an example, the terminal device may be a mobile phone, a tablet, a notebook computer, a smart wearable device, or the like. As another example, the terminal device may be a virtual reality (VR) device. As another example, the embodiments of this application can also be applied to intelligent surveillance, in which a camera can be configured so that the surveillance system obtains the image to be compressed through the camera. It should be understood that the embodiments of this application can also be applied to other scenarios that require image compression, which are not listed one by one here.
Refer to FIG. 3a, which is a schematic flowchart of an image processing method provided by an embodiment of this application.
In step 301, the terminal device acquires a first image.
The first image may be a photo taken by a camera component or a frame captured from a recorded video; the terminal device includes the camera component, which is generally a camera. The first image may also be an image that the terminal device obtains from a network or captures with a screenshot tool.
For the image processing method in the embodiments of this application, reference may also be made to FIG. 3b, which is another schematic flowchart of the image processing method provided by an embodiment of this application. FIG. 3b shows the entire flow from the first image to the output of a third image.
In step 302, the terminal device sends the first image to the cloud device.
Before sending the first image to the cloud device, the terminal device may losslessly encode the first image to obtain encoded data. The encoding method may be entropy coding or another lossless compression method.
In step 303, the cloud device divides the first image to obtain N first tiles.
The cloud device receives the first image sent by the terminal device. If the first image was losslessly encoded by the terminal device, the cloud device first needs to losslessly decode it. The cloud device divides the first image to obtain N first tiles, where N is an integer greater than 1. FIG. 4 is a schematic diagram of dividing and combining an image in an embodiment of this application. As shown in FIG. 4, the first image 401 is divided into 12 first tiles. For a first image of a given size, the size of the first tiles determines the value of N. N = 12 is only an example, and the size of the first tiles is described in detail later.
Optionally, all of the N first tiles have the same size.
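The division into equally sized tiles can be sketched as follows, assuming a single-channel image stored as a list of rows and an image size that is an exact multiple of the tile size. The function name `split_into_tiles` and the raster (left to right, top to bottom) tile order are assumptions made for this example; the 6×8 toy image is chosen only so that N is 12, and it does not reproduce the layout of FIG. 4.

```python
def split_into_tiles(image, tile_h, tile_w):
    """Split a 2-D list of pixel values into equally sized tiles in raster order."""
    height, width = len(image), len(image[0])
    if height % tile_h != 0 or width % tile_w != 0:
        raise ValueError("image size must be a multiple of the tile size")
    tiles = []
    for top in range(0, height, tile_h):
        for left in range(0, width, tile_w):
            tile = [row[left:left + tile_w] for row in image[top:top + tile_h]]
            tiles.append(tile)
    return tiles


# Toy 6x8 image whose pixel value encodes its position (row * 8 + column).
image = [[r * 8 + c for c in range(8)] for r in range(6)]
tiles = split_into_tiles(image, tile_h=2, tile_w=2)
print(len(tiles))  # N is fixed by the image size and the tile size
```

Halving the tile height or width doubles N, which is the relationship between tile size and N stated above.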
In step 304, the cloud device obtains M first means from the N first tiles.
If the first image is a three-channel image, each first tile contains data of three channels, and the number M of first means obtained by the cloud device equals 3N. If the first image is a grayscale image, that is, a one-channel image, each first tile contains data of one channel, and M equals N. Because every channel is processed in a similar way, the embodiments of this application describe only one channel for ease of description. The mean is the mean of the pixel values of all pixels in a first tile.
In step 305, the cloud device preprocesses the N first tiles using the N first means.
The preprocessing may consist of subtracting the mean of a first tile from the pixel value of each pixel in that tile, which yields the N preprocessed first tiles.
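Steps 304 and 305 can be sketched for one channel of one tile as follows. The helper names `tile_mean` and `subtract_mean` are hypothetical, and the tile is assumed to be a list of rows of pixel values.

```python
def tile_mean(tile):
    """Mean of the pixel values of all pixels in one single-channel tile."""
    pixels = [p for row in tile for p in row]
    return sum(pixels) / len(pixels)


def subtract_mean(tile, mean):
    """Preprocess a tile by subtracting its mean from every pixel value."""
    return [[p - mean for p in row] for row in tile]


tile = [[10, 20],
        [30, 40]]
mean = tile_mean(tile)               # (10 + 20 + 30 + 40) / 4
centered = subtract_mean(tile, mean)
print(mean, centered)
```

After this step the pixel values of each tile are centered on zero, and the mean has to be kept alongside the tile so that it can be used later for compensation.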
In step 306, the N preprocessed first tiles are processed by the encoding neural network to obtain N groups of first feature maps.
In the embodiments of this application, the encoding neural network is optionally a CNN, and the cloud device may perform feature extraction on the N preprocessed first tiles based on the CNN to obtain N groups of first feature maps. Each group of first feature maps corresponds to one first tile and includes at least one feature map. In the following, a first feature map may also be called a channel feature map image, where each semantic channel corresponds to one first feature map.
Refer to FIG. 5, which is a schematic diagram of a CNN-based image encoding process in an embodiment of this application. FIG. 5 shows a first tile 501, a CNN 502, a channel feature map 503, and a group of first feature maps 504, where the CNN 502 may include multiple CNN layers.
For example, the CNN 502 may multiply the upper-left 3×3 pixels of the input data (the first tile) by weights and map the result to the neuron at the upper-left corner of the first feature map. The weights are also 3×3. The CNN 502 then repeats the same operation while scanning the input data (the first tile) step by step from left to right and from top to bottom, multiplying by the weights to map the neurons of the feature map. The 3×3 weights used here are called a filter or filter kernel. In other words, applying a filter in the CNN 502 is a convolution operation with the filter kernel, and the extracted result is called a "channel feature map". A channel feature map may also be called a multi-channel feature map image, a term that refers to a set of feature map images corresponding to multiple channels. According to an embodiment, the channel feature map may be generated by the CNN 502, which is also called the "feature extraction layer" or "convolutional layer" of the CNN. A CNN layer defines a mapping from input to output. The mapping defined by a layer is performed as one or more filter kernels (convolution kernels) applied to the input data to generate the channel feature map that is output to the next layer. The input data may be the first tile or a channel feature map output by the CNN 502.
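The scanning operation described above can be sketched for a single channel and a single filter kernel. This sketch assumes a stride of 1 and no padding, neither of which is specified in the source, and it computes the sliding multiply-and-sum without flipping the kernel, as is common in CNN implementations. The name `conv2d_valid` is hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the products."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    feature_map = []
    for i in range(out_h):              # top to bottom
        row = []
        for j in range(out_w):          # left to right
            acc = 0.0
            for di in range(k_h):
                for dj in range(k_w):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# 4x4 input whose pixel value is row * 4 + column, and a 3x3 kernel of ones.
image_4x4 = [[r * 4 + c for c in range(4)] for r in range(4)]
kernel_3x3 = [[1.0] * 3 for _ in range(3)]
feature_map = conv2d_valid(image_4x4, kernel_3x3)
print(feature_map)
```

Each output value is produced by the same 3×3 weights at a different position, which is the weight sharing described in the CNN overview; using several kernels would produce several channel feature maps.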
Referring to FIG. 5, during forward execution the CNN 502 receives the first tile 501 as input and generates the channel feature map 503 as output. The next CNN layer then receives the channel feature map 503 as input and generates another channel feature map 503 as output. Each subsequent layer likewise receives the channel feature map generated by the previous layer as input and generates the channel feature map of the next layer as output. Finally, the group of first feature maps 504 generated in the X1-th layer is received, where X1 is an integer greater than 1; that is, the channel feature map of any of the above layers may serve as the group of first feature maps 504.
The cloud device repeats the above operations for each first tile to obtain the N groups of first feature maps.
Optionally, as the CNN 502 gets deeper, the length and width of each feature map in the multi-channel feature map image gradually decrease and the number of semantic channels gradually increases, which compresses the data of the first tile.
In addition to applying convolution kernels that map input feature maps to output feature maps, other processing operations can be performed. Examples include, but are not limited to, activation functions, pooling, and resampling.
For example, as shown in FIG. 3b, each convolution layer may optionally be followed by a generalized divisive normalization (GDN) activation function, which has the form:
$$v_j^{(i)} = \frac{u_j^{(i)}}{\sqrt{\beta_j + \sum_{k} \gamma_{jk}\,\bigl(u_k^{(i)}\bigr)^{2}}}$$
Here u denotes the j-th channel output by the i-th convolutional layer, v denotes the corresponding output of the activation function, and β and γ are trainable parameters of the activation function, which enhance the nonlinear expressive power of the neural network.
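The following sketch evaluates a GDN activation at one spatial position. It assumes the commonly published form of GDN, in which each channel value is divided by the square root of β plus a γ-weighted sum of the squared channel values; the exact formula used in this application is given in the original figure and may differ in detail. The name `gdn` and the parameter values are hypothetical.

```python
import math


def gdn(u, beta, gamma):
    """Apply GDN across channels at one spatial position.

    u[j] is the value of channel j, beta[j] and gamma[j][k] are the
    trainable parameters (fixed numbers in this sketch).
    """
    v = []
    for j, u_j in enumerate(u):
        norm = beta[j] + sum(gamma[j][k] * u_k * u_k for k, u_k in enumerate(u))
        v.append(u_j / math.sqrt(norm))
    return v


u = [3.0, 4.0]                      # two channels at one position
beta = [11.0, 11.0]
gamma = [[1.0, 1.0], [1.0, 1.0]]
v = gdn(u, beta, gamma)             # the denominator is sqrt(11 + 9 + 16) for both channels
print(v)
```

Because every channel is normalized by the energy of the other channels, large responses are damped relative to small ones, which is the source of the nonlinearity.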
It should be noted that the above is only one implementation of feature extraction on the first tile; in practical applications, the specific implementation of feature extraction is not limited.
In the embodiments of this application, the first tile is transformed in the above manner into another space (at least one first feature map) by the CNN. Optionally, the number of first feature maps is 192, that is, the number of semantic channels is 192, and each semantic channel corresponds to one first feature map. The at least one first feature map may take the form of a three-dimensional tensor of size 192×w×h, where w×h is the width and length of the matrix corresponding to the first feature map of a single channel.
In step 307, the N groups of first feature maps are quantized and entropy-encoded to obtain N first encoded representations.
In the embodiments of this application, after the N preprocessed first tiles are processed by the encoding neural network to obtain the N groups of first feature maps, the N groups of first feature maps can be quantized and entropy-encoded to obtain the N first encoded representations.
The N groups of first feature maps are converted to quantization centers according to a specified rule so that entropy coding can be performed subsequently. The quantization operation may convert the N groups of first feature maps from floating-point numbers into a bitstream (for example, a bitstream of integers with a specific bit width, such as 8-bit or 4-bit integers). In some embodiments, rounding may be used, without limitation, to quantize the N groups of first feature maps.
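Quantization by rounding can be sketched as follows for one feature map stored as a list of rows. The name `quantize` is hypothetical. Note that Python's built-in `round` resolves exact ties (values ending in .5) to the nearest even integer, which is an implementation detail of this sketch and not something the source specifies.

```python
def quantize(feature_map):
    """Round every floating-point feature value to the nearest integer."""
    return [[round(value) for value in row] for row in feature_map]


features = [[0.4, 1.6],
            [-2.3, 3.7]]
quantized = quantize(features)
print(quantized)
```

The integers produced here are the symbols that the subsequent entropy coder turns into a bitstream; clipping them to a fixed range such as 8 bits would be an additional step.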
In the embodiments of this application, an entropy estimation network can be used to obtain a probability estimate for each point in the output features, and these probability estimates are used to entropy-encode the output features into a binary code stream. It should be noted that the entropy coding process mentioned in this application can use existing entropy coding techniques, which are not described in detail here.
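The link between probability estimates and code size can be illustrated without implementing an actual entropy coder. The sketch below computes the ideal code length, the sum of -log2 p over the symbols, which a practical entropy coder such as an arithmetic coder approaches. The function name and the probability tables are hypothetical, and a single shared table is used for simplicity, whereas the entropy estimation network described above produces an estimate for each point.

```python
import math


def ideal_code_length_bits(symbols, probability):
    """Sum of -log2 p(symbol): the ideal size in bits of the entropy-coded stream."""
    return sum(-math.log2(probability[s]) for s in symbols)


symbols = [0, 0, 1, 0]                         # quantized feature values
uniform = {0: 0.5, 1: 0.5}                     # uninformative estimate
matched = {0: 0.75, 1: 0.25}                   # estimate matching the data
bits_uniform = ideal_code_length_bits(symbols, uniform)
bits_matched = ideal_code_length_bits(symbols, matched)
print(bits_uniform, bits_matched)
```

The closer the probability estimates are to the actual distribution of the symbols, the fewer bits are needed, which is why an accurate entropy estimation network reduces the size of the code stream.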
In step 308, the cloud device sends the N first encoded representations, the N first mean values, and the correspondence to the terminal device.
In step 302 above, the terminal device stores the first image on the cloud device. If the terminal device needs to obtain the first image, it may send a request to the cloud device. After receiving the request, the cloud device sends the N first encoded representations, the N first mean values, and the correspondence to the terminal device. The correspondence refers to the correspondence between the N first encoded representations and the N first mean values.
Optionally, the arrangement order of the N first encoded representations is the same as the arrangement order of the N first image blocks, where the arrangement order of the N first image blocks is their order in the first image. The correspondence includes the arrangement order of the N first encoded representations and the arrangement order of the N first image blocks.
Optionally, before the cloud device sends the N first mean values to the terminal device, the cloud device quantizes the N first mean values to obtain N first quantized mean values. For example, the pixel value of each pixel in a first image block is represented by 8 bits, and the first mean value of the first image block is a 32-bit floating-point number. The cloud device quantizes the first mean value, and the resulting first quantized mean value has fewer than 32 bits. The fewer bits the first quantized mean value has, the smaller its information entropy. Further, the number of bits of the first quantized mean value is equal to the number of bits of the pixel value of each pixel in the first image block; that is, when each pixel value in the first image block is represented by 8 bits, the first quantized mean value is also represented by 8 bits.
Optionally, when the cloud device quantizes the N first mean values, the larger the value of N, the smaller the information entropy of a single first quantized mean value. The information entropy describes the degree to which the cloud device quantizes the N first mean values: the smaller the information entropy of a single first quantized mean value, the higher the degree of quantization. For first images with the same pixel size, the smaller the pixel size of each first image block, the larger N is, and the larger N is, the larger the data amount of the N first mean values. For example, assume that the first image is 640×480 pixels, the first image block is 320×480 pixels, N is 2, and each first quantized mean value is represented by 8 bits; the data amount of the N first quantized mean values is then 2×8 bits. Assume instead that the first image block is 1×1 pixels, N is 640×480, and each first quantized mean value is represented by 8 bits; the data amount of the N first quantized mean values is then 640×480×8 bits, which equals the data amount of the first image, also 640×480×8 bits. It follows that the larger the value of N, the larger the data amount of the N first mean values; when N equals the number of pixels in the first image, even the quantized first mean values amount to as much data as the first image itself. Therefore, in this embodiment of the present application, the larger the value of N, the smaller the information entropy of a single first quantized mean value.
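The arithmetic in the example above can be reproduced with a short Python sketch. The function name `quantized_mean_bits` is hypothetical; the sketch assumes the image divides into whole blocks and that every quantized mean uses the same number of bits.

```python
def quantized_mean_bits(image_w, image_h, block_w, block_h, bits_per_mean=8):
    """Total bits needed to send one quantized mean per image block."""
    if image_w % block_w or image_h % block_h:
        raise ValueError("image must divide into whole blocks")
    n = (image_w // block_w) * (image_h // block_h)  # number of blocks N
    return n * bits_per_mean

print(quantized_mean_bits(640, 480, 320, 480))  # 16 (N = 2)
print(quantized_mean_bits(640, 480, 1, 1))      # 2457600 (N = 640 * 480)
print(640 * 480 * 8)                            # 2457600, the 8-bit image itself
```

With 1×1 blocks the side information alone is as large as the uncompressed 8-bit image, which is why the bit budget per mean has to shrink as N grows.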
In step 309, the terminal device performs entropy decoding on the N first encoded representations to obtain N groups of second feature maps.
After receiving the N first encoded representations sent by the cloud device, the terminal device performs entropy decoding on them to obtain the N groups of second feature maps.
In step 310, the terminal device processes the N groups of second feature maps through the decoding neural network to obtain N first reconstructed image blocks.
In this embodiment of the present application, optionally, the decoding neural network is a CNN, and the terminal device may reconstruct the N groups of second feature maps based on the CNN to obtain the N first reconstructed image blocks. Each group of second feature maps corresponds to one first image block and includes at least one feature map. Hereinafter, a second feature map may also be referred to as a reconstructed feature map image, where each semantic channel corresponds to one second feature map.
Referring to FIG. 6, FIG. 6 is a schematic diagram of a CNN-based image decoding process in an embodiment of the present application. FIG. 6 shows a group of second feature maps 601, a transposed CNN 602, a reconstructed feature map 603, and a first reconstructed image block 604. The transposed CNN 602 may include multiple transposed CNN layers.
For example, the transposed CNN 602 may multiply the top-left pixel of the input data (a group of second feature maps 601) by weights and map the result to the neurons at the top-left of the reconstructed feature map 603. The weights to be multiplied are 3×3. Thereafter, in the same way, the transposed CNN 602 scans the input data (the group of second feature maps 601) one pixel at a time from left to right and from top to bottom, multiplying by the weights to map the neurons of the feature map. After the transposed CNN 602 with 3×3 weights is applied, the length and width of the resulting reconstructed feature map 603 are each three times those of the second feature map. The 3×3 weights used here are called an inverse filter or inverse filter kernel. That is, applying the inverse filter in the transposed CNN 602 is a process of performing a deconvolution operation with the inverse filter kernel, and the extracted result is called a "reconstructed feature map". According to an embodiment, the reconstructed feature map may be generated by the transposed CNN 602, which is also referred to as a transposed convolutional layer of the CNN. A layer of the CNN may define a mapping from output to input. The mapping defined by a layer is executed as one or more inverse filter kernels (transposed convolutional layers) applied to the input data to generate the reconstructed feature map that is output to the next layer. The input data may be a group of second feature maps or the reconstructed feature map of a particular layer.
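The scanning process described above can be sketched in plain Python for a single channel. This is an illustration rather than the network of the embodiment: the function name is hypothetical, the kernel values are arbitrary, and a stride of 3 with no padding is an assumption chosen so that a 3×3 kernel triples the width and height as the text describes.

```python
def transposed_conv2d(inp, kernel, stride):
    """Single-channel transposed convolution with a square kernel and no padding."""
    k = len(kernel)
    in_h, in_w = len(inp), len(inp[0])
    out_h = (in_h - 1) * stride + k
    out_w = (in_w - 1) * stride + k
    out = [[0.0] * out_w for _ in range(out_h)]
    # Scan the input left to right, top to bottom; each input pixel is
    # multiplied by the whole kernel and accumulated into the output.
    for i in range(in_h):
        for j in range(in_w):
            for di in range(k):
                for dj in range(k):
                    out[i * stride + di][j * stride + dj] += inp[i][j] * kernel[di][dj]
    return out

inp = [[1.0, 2.0], [3.0, 4.0]]            # hypothetical 2x2 feature map
kernel = [[1.0] * 3 for _ in range(3)]    # hypothetical 3x3 inverse filter kernel
out = transposed_conv2d(inp, kernel, stride=3)
print(len(out), len(out[0]))  # 6 6
```

Each input pixel lands in its own 3×3 region of the output, so the 2×2 input becomes a 6×6 reconstructed feature map.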
Referring to FIG. 6, the transposed CNN 602 receives a group of second feature maps 601 and generates the reconstructed feature map 603 as output. The next transposed CNN layer receives the reconstructed feature map 603 as input and generates the reconstructed feature map of the next layer as output. Each subsequent transposed CNN layer then receives the reconstructed feature map generated by the previous layer and generates the next reconstructed feature map as output. Finally, the first reconstructed image block 604 generated in the (X2)-th layer is received, where X2 is an integer greater than 1; that is, the reconstructed feature map of any of the above layers may serve as the first reconstructed image block 604. The terminal device repeats the above operations for each group of second feature maps to obtain the N first reconstructed image blocks.
Optionally, as the number of transposed CNN layers increases, the length and width of each feature map in the reconstructed feature maps gradually increase until the size of the first image block before it was input to the encoding neural network is restored. The number of semantic channels of the reconstructed feature maps gradually decreases until the number of semantic channels of the first image block before it was input to the encoding neural network is restored: when the first image block is a single-channel image, the first reconstructed image block 604 has 1 semantic channel, and when the first image block is a three-channel image, the first reconstructed image block 604 has 3 semantic channels. This reconstruction implements the data decoding of the first image block.
In addition to applying the transposed convolution kernels that map each group of second feature maps to reconstructed feature maps, other processing operations may also be performed. Examples of other processing operations include, but are not limited to, applying activation functions, pooling, and resampling.
For example, optionally, as shown in FIG. 3b, each layer of transposed convolution kernels in the decoding neural network is followed by inverse generalized divisive normalization (iGDN). iGDN is the approximate inverse of the GDN activation function used at the encoder side, and is expressed as:
Figure PCTCN2021101807-appb-000108
where v denotes the j-th channel output by the i-th convolutional layer, u denotes the output of the corresponding activation function, and β and γ are trainable parameters of the activation function, used to enhance the nonlinear expression capability of the neural network.
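As a reading aid, a form of iGDN commonly used in the learned image compression literature is sketched below. It is an assumption that the formula in the figure has this shape; the index layout (layer i, channels j and k) and the per-channel subscripts on β and γ are chosen here only to match the symbols named in the text.

```latex
u_i^{(j)} = v_i^{(j)} \cdot \left( \beta_j + \sum_{k} \gamma_{jk} \left( v_i^{(k)} \right)^{2} \right)^{1/2}
```

Under the same assumption, the encoder-side GDN divides by the same square-root term instead of multiplying by it, which is why iGDN is described as its approximate inverse.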
In step 311, the terminal device compensates the N first reconstructed image blocks using the N first mean values.
In step 308 above, the cloud device sent the N first mean values and the correspondence to the terminal device. After obtaining the N first reconstructed image blocks through the decoding neural network, the terminal device uses the correspondence to compensate the N first reconstructed image blocks with the N first mean values. Compensation means adding the first mean value to the pixel value of each pixel in a first reconstructed image block to obtain a compensated first reconstructed image block. After the terminal device has performed compensation on each of the N first reconstructed image blocks, the compensated N first reconstructed image blocks are obtained.
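The compensation step can be sketched in a few lines of Python. The function name `compensate` and the sample blocks are hypothetical; pairing blocks and means by list position is an assumption that stands in for the arrangement-order correspondence described earlier.

```python
def compensate(blocks, means):
    """Add each block's mean back to every pixel of that block.

    blocks[i] and means[i] are paired by position, mirroring the
    arrangement-order correspondence described in the text.
    """
    if len(blocks) != len(means):
        raise ValueError("need exactly one mean per block")
    return [
        [[pixel + mean for pixel in row] for row in block]
        for block, mean in zip(blocks, means)
    ]

# Two hypothetical 2x2 reconstructed blocks (mean-removed) and their means.
blocks = [[[-2, 0], [1, 1]], [[5, -5], [0, 0]]]
means = [100, 30]
print(compensate(blocks, means))
# [[[98, 100], [101, 101]], [[35, 25], [30, 30]]]
```

The same function applies unchanged when the means are quantized means, provided the encoder subtracted the same quantized values during preprocessing.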
Optionally, when the terminal device receives N first quantized mean values from the cloud device, the terminal device compensates the N first reconstructed image blocks using the N first quantized mean values. It should be understood that when the terminal device compensates the N first reconstructed image blocks using the N first quantized mean values, the cloud device also preprocesses the N first image blocks using the N first quantized mean values.
In step 312, the terminal device combines the compensated N first reconstructed image blocks to obtain a second image.
Referring to FIG. 4, combination is the inverse process of segmentation: the N first image blocks are replaced with the N first reconstructed image blocks, and the N first reconstructed image blocks are then combined.
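Combination as the inverse of segmentation can be sketched as follows. The function name `combine_blocks` is hypothetical, and the sketch assumes equally sized blocks listed in row-major order (left to right, then top to bottom), which is one possible arrangement order.

```python
def combine_blocks(blocks, blocks_per_row):
    """Stitch equally sized 2-D blocks, given in row-major order, into one image."""
    if len(blocks) % blocks_per_row != 0:
        raise ValueError("block count must fill complete rows")
    image = []
    for start in range(0, len(blocks), blocks_per_row):
        row_of_blocks = blocks[start:start + blocks_per_row]
        block_height = len(row_of_blocks[0])
        for y in range(block_height):
            line = []
            for block in row_of_blocks:
                line.extend(block[y])  # append row y of each block side by side
            image.append(line)
    return image

# Four hypothetical 2x2 reconstructed blocks arranged as a 2x2 grid.
blocks = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[9, 10], [11, 12]],
    [[13, 14], [15, 16]],
]
second_image = combine_blocks(blocks, blocks_per_row=2)
print(second_image)
# [[1, 2, 5, 6], [3, 4, 7, 8], [9, 10, 13, 14], [11, 12, 15, 16]]
```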
In step 313, the terminal device processes the second image through a fusion neural network to obtain a third image.
The embodiments of the present application enhance the representation of each first image block by highlighting its local characteristics, but this also tends to cause blocking artifacts between adjacent first reconstructed image blocks. A blocking artifact is a discontinuity at the boundary between one first reconstructed image block and another, which forms a defect in the reconstructed image. Processing the second image through the fusion neural network reduces the impact of blocking artifacts and improves image quality.
Optionally, the fusion neural network is a CNN. Referring to FIG. 5 and FIG. 6, in terms of CNN structure, the fusion neural network may be a combination of the encoding neural network and the decoding neural network: the output 504 in FIG. 5 is used as the input 601 in FIG. 6, the second image is used as the input 501 in FIG. 5, and the output of FIG. 6 is the third image. The fusion neural network can thus eliminate the blocking artifacts in the second image. It should be understood that this is only a simple example of the framework of the fusion neural network. In practical applications, the framework of the fusion neural network, such as the number of CNN layers, the number of transposed CNN layers, and the size of the matrix of each CNN layer, may be independent of the encoding neural network and the decoding neural network.
Optionally, as shown in FIG. 3b, a rectified linear unit (ReLU) layer follows the convolution kernels of the fusion neural network. The ReLU sets negative values in the feature map output by the convolution kernels to zero.
The process of handling one image with the image processing method in the embodiments of the present application has been described above. Optionally, the image processing method can process images of different sizes, for example a fourth image whose pixel size differs from that of the first image. The process of handling the fourth image is similar to the process of handling the first image described above, and details are not repeated here. In particular, when the fourth image is processed, the cloud device segments the fourth image to obtain M second image blocks, where the second image block has the same size as the first image block. When the first image block and the second image block have the same size and are processed with the same encoding neural network and decoding neural network, the number of convolution operations in the processing flow and the number of data items involved in each convolution operation are the same for both. In this case, a convolution operation unit can be designed according to the number of convolution operations and/or the number of data items involved in each convolution operation, so that the convolution operation unit matches the processing flow. Because the number of convolution operations in the processing flow and the number of data items involved in each convolution operation are determined by the size of the first image block and by the CNN, the convolution operation unit can also be regarded as matching the first image block, or as matching the encoding neural network and/or the decoding neural network. The better the convolution operation unit matches the first image block, the fewer multipliers and adders in the convolution operation unit are idle during the processing flow, that is, the more efficiently the convolution operation unit is used.
The image processing method in the embodiments of the present application has been described above. In the above process, the size of the first image block affects not only the value of N but also whether the image can be divided into an integer number of image blocks. The size of the first image block is generally determined by the following two factors. The first factor is the influence of the model on the size of the first image block. The model includes the encoding neural network and the decoding neural network, and may also include the fusion neural network. This influence generally covers both training and use of the model. During training, the model is trained with image blocks of different sizes to determine the size range or specific size at which the model converges faster, the output image quality is higher, or the compression performance is higher. Different models are suited to image blocks of different sizes, and different models may perform differently in different scenarios, which is the generalization problem of the model; the influence during use of the model includes this generalization problem. The second factor is whether the image can be divided into an integer number of image blocks. If it cannot, some image blocks are incomplete, which affects the model's reconstruction of those blocks and reduces image quality. To reduce the influence of the second factor on image quality, some related technical solutions are proposed below.
In step 302 above, the terminal device sent the first image to the cloud device. In this scenario, the encoding neural network in the cloud device may serve this terminal device, or this type of terminal device, exclusively. If the terminal device includes an imaging component, such as a camera, it is desirable that the first image obtained by the terminal device through the camera can be divided by the cloud device into an integer number of blocks. Assume that the first image block is a×b pixels, where a is the number of pixels in the width direction and b is the number of pixels in the height direction. a and b are obtained from the target pixel size, the target pixel size is c×d,
Figure PCTCN2021101807-appb-000109
is an integer, and
Figure PCTCN2021101807-appb-000110
is an integer. The target pixel size is obtained from the target resolution of the terminal device, and the target resolution is the default resolution of the imaging component of the terminal device or the resolution that the terminal device has set for it. The pixel size of an image obtained by the imaging component at the target resolution is the target pixel size, and the first image is obtained through the imaging component. In particular, when the terminal device is the encoding end, as in the first application scenario described above, determining the size of the first image block from the target resolution is more meaningful. The target resolution indicates the pixel size of images that the terminal device is likely to acquire in the future, that is, the pixel size of images that the encoding end will process with the image processing method in the embodiments of the present application, so the model can be trained specifically for the target pixel size.
Optionally, the target resolution is obtained from the resolution setting of the imaging component in the setting interface of the camera application. The setting interface of the camera application can set the resolution of images captured by the imaging component, and the resolution currently selected in the setting interface is used as the target resolution. Referring to FIG. 7, FIG. 7 is a schematic diagram of the setting interface for the camera resolution of the terminal device in an embodiment of the present application. In the setting interface, option 701 with a resolution of [4:3] 10MP is selected. Although the option does not state the specific value of the target resolution, the first image obtained by shooting shows that the first image is 2736×3648 pixels, that is, the target pixel size is 2736×3648. After the target pixel size is determined, the size of the first image block is determined such that
Figure PCTCN2021101807-appb-000111
is an integer and
Figure PCTCN2021101807-appb-000112
is an integer.
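The search for a block size that tiles the 2736×3648 example exactly can be sketched in Python. The sketch assumes that the two integer conditions shown as figures are that c/a and d/b are integers, which is an interpretation of the surrounding definitions rather than a reading of the figures. The function names, the restriction to square blocks, and the search range of 100 to 400 pixels are hypothetical.

```python
def divides_evenly(c, d, a, b):
    """True if a c x d image splits into whole a x b blocks (assumed condition)."""
    return c % a == 0 and d % b == 0

def candidate_block_sizes(c, d, low, high):
    """List square block sizes s in [low, high] that tile a c x d image exactly."""
    return [s for s in range(low, high + 1) if divides_evenly(c, d, s, s)]

# Target pixel size from the FIG. 7 example.
print(candidate_block_sizes(2736, 3648, 100, 400))  # [114, 152, 228, 304]
print(divides_evenly(2736, 3648, 256, 256))         # False
```

A power-of-two size such as 256 does not tile this image, whereas 304×304 gives a 9×12 grid with no incomplete blocks.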
Optionally, the target resolution is obtained from a target image group in the gallery obtained through the imaging component. The pixel size of the target image group is the target pixel size, and among the image groups of different pixel sizes, the target image group accounts for the largest proportion of the gallery. The gallery obtained by the encoding end through the imaging component includes image groups of different pixel sizes. For example, as shown in FIG. 7, the terminal corresponding to the setting interface can obtain images of 4 pixel sizes through the camera. By determining which pixel size accounts for the largest share of images in the camera gallery of the terminal device, it can be ensured that images of that pixel size are divided into exactly an integer number of blocks.
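Selecting the pixel size that dominates the gallery can be sketched with the standard library. The function name and the gallery contents are hypothetical; only the 2736×3648 size comes from the text, and the other sizes are made up for illustration.

```python
from collections import Counter

def target_pixel_size(gallery_sizes):
    """Return the (width, height) that occurs most often in the gallery."""
    if not gallery_sizes:
        raise ValueError("gallery is empty")
    return Counter(gallery_sizes).most_common(1)[0][0]

# Hypothetical gallery: one (width, height) entry per stored image.
gallery = [(2736, 3648)] * 5 + [(1080, 1080)] * 2 + [(2736, 2052)]
print(target_pixel_size(gallery))  # (2736, 3648)
```

The block size would then be chosen so that this most common pixel size divides into whole blocks.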
Optionally, images of multiple pixel sizes are obtained through the imaging component, the multiple pixel sizes are e×f,
Figure PCTCN2021101807-appb-000113
is an integer,
Figure PCTCN2021101807-appb-000114
is an integer, e includes c, and f includes d. The terminal device can obtain images of different pixel sizes through the imaging component, and e×f is the set of pixel sizes of those images. For example, as shown in FIG. 7, the terminal corresponding to the setting interface can obtain images of 4 pixel sizes through the camera. If
Figure PCTCN2021101807-appb-000115
is an integer and
Figure PCTCN2021101807-appb-000116
is an integer, images of all 4 pixel sizes can be divided into an integer number of blocks.
Optionally, e×f further includes pixel sizes obtained by the terminal device through screenshots.
Optionally, the multiple pixel sizes are obtained from the resolution settings of the imaging component in the setting interface of the camera application.
The above describes solutions that ensure, as far as possible, that the first image is divided into an integer number of blocks. In practical applications, however, there are always images that cannot be divided into an integer number of blocks. In this case, to improve the compatibility of the model, the first image needs to be padded, as described below.
Optionally, the first image block is a×b pixels, where a is the number of pixels in the width direction and b is the number of pixels in the height direction, and the first image is r×t pixels. After the first image is acquired and before the first image is segmented, the method further includes: if
Figure PCTCN2021101807-appb-000117
is not an integer, and/or
Figure PCTCN2021101807-appb-000118
is not an integer, padding the edges of the first image with the pixel median value so that
Figure PCTCN2021101807-appb-000119
is an integer and
Figure PCTCN2021101807-appb-000120
is an integer, where the padded first image is r1×t1 pixels. The pixel median value is the middle value of the range of pixel values of a single pixel of the first image; for example, when one pixel of the first image is represented by 8 bits, the pixel median value is 128. Padding the edges of the image with the image median value improves the compatibility of the model while limiting the impact on image quality. The image median value is the median value of a pixel.
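Padding with the pixel median value can be sketched as follows. The sketch assumes that the integer conditions shown as figures are r/a and t/b before padding and r1/a and t1/b after padding. Padding only the right and bottom edges is a simplification made here; the function name is hypothetical.

```python
def pad_to_multiple(image, a, b, fill=128):
    """Pad a 2-D image on the right and bottom so its size is a multiple of a x b.

    fill=128 is the pixel median value for 8-bit pixels, as stated in the text.
    """
    t = len(image)        # height in pixels
    r = len(image[0])     # width in pixels
    r1 = -(-r // a) * a   # width rounded up to a multiple of a
    t1 = -(-t // b) * b   # height rounded up to a multiple of b
    padded = [row + [fill] * (r1 - r) for row in image]
    padded += [[fill] * r1 for _ in range(t1 - t)]
    return padded

image = [[10, 20, 30, 40, 50] for _ in range(3)]  # hypothetical image, r = 5, t = 3
padded = pad_to_multiple(image, a=4, b=2)
print(len(padded[0]), len(padded))  # 8 4
print(padded[0])                    # [10, 20, 30, 40, 50, 128, 128, 128]
```

After padding, the 8×4 image splits into exactly four 4×2 blocks.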
Optionally, before the edges of the first image are padded, the method further includes: if
Figure PCTCN2021101807-appb-000121
is not an integer, enlarging r and t proportionally to obtain the first image with r2×t2 pixels, where
Figure PCTCN2021101807-appb-000122
is an integer. The number of image blocks padded with the pixel median value affects image quality. Enlarging the image proportionally reduces the number of image blocks padded with the image median value and thereby improves image quality. FIG. 8 is a schematic flowchart of image padding in an embodiment of the present application. In 8a of FIG. 8, the first image block is a×b pixels and the first image is r×t pixels. After r and t are enlarged proportionally, as shown in 8b of FIG. 8, the first image is r2×t2 pixels. Before enlargement, as shown in 8a of FIG. 8, 6 image blocks need padding; after enlargement, as shown in 8b of FIG. 8, 4 image blocks need padding, so the number of image blocks that need padding is reduced. In particular, if
Figure PCTCN2021101807-appb-000123
is not an integer, r and t are enlarged proportionally so that
Figure PCTCN2021101807-appb-000124
and the number of image blocks that need padding is then reduced to 2.
Optionally, after r and t are enlarged proportionally, if
Figure PCTCN2021101807-appb-000125
is not an integer, the remainder of
Figure PCTCN2021101807-appb-000126
is obtained. If the remainder is greater than
Figure PCTCN2021101807-appb-000127
the pixel median value is padded on only one side of the first image in the width direction. Padding on only one side of the image further reduces the number of image blocks padded with the image median value while limiting the impact of padding on the image blocks, and thereby improves image quality. As shown in 8b of FIG. 8, the remainder g is greater than
Figure PCTCN2021101807-appb-000128
and, as shown in 8c of FIG. 8, the pixel median value is padded on one side of the first image.
Optionally, if the remainder is less than
Figure PCTCN2021101807-appb-000129
the pixel median is filled on both sides of the first image in the width direction, so that the width of the pixel median filled on each side is
Figure PCTCN2021101807-appb-000130
where g is the remainder. This reduces the effect of filling on the image blocks and improves image quality. FIG. 9 is another schematic flowchart of image filling in an embodiment of the present application.
If the remainder g is less than
Figure PCTCN2021101807-appb-000131
the median is filled on both sides of the first image, and the width of the filled median is
Figure PCTCN2021101807-appb-000132
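The choice between one-sided and two-sided filling described above can be sketched as follows. This is a minimal sketch under stated assumptions: the threshold and the per-side width appear only in formula images that are not reproduced here, so `threshold` is passed in as a parameter, the two-sided case splits the missing width as evenly as possible, and the one-sided case fills the right side. The name `plan_width_padding` is hypothetical.

```python
def plan_width_padding(r2, a, threshold):
    """Return (left, right) fill widths so that r2 + left + right is a multiple of a.

    r2: image width after enlargement; a: block width.
    threshold: value the remainder g is compared with (elided in the text,
    so it is an explicit parameter here).
    """
    g = r2 % a                      # remainder of the width with respect to a
    if g == 0:
        return (0, 0)               # width already divides evenly
    missing = a - g                 # pixels needed to complete the last block
    if g > threshold:
        # Large remainder: fill one side only (right side is an assumption).
        return (0, missing)
    # Otherwise fill both sides; the even split is an assumption.
    left = missing // 2
    return (left, missing - left)
```

The case in which g equals the threshold is not specified in the text; the sketch treats it as two-sided filling.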
In the present application, the first image is divided to obtain N first image blocks, a mean value is obtained from each first image block, and each mean value is then used to compensate the corresponding first reconstructed image block, so as to highlight the local characteristics of the first image. In particular, the N first image blocks include a first target image block, and the range of pixel values of the first target image block is smaller than the range of pixel values of the first image. Before the N pieces of first adaptive data are obtained from the N first image blocks, the method further includes: the cloud device performs inverse quantization on the pixel values of the first target image block. The cloud device obtains the N pieces of first adaptive data from the inverse-quantized first target image block. Inverse quantization of the pixel values of the first target image block further highlights the local characteristics of the first image.
Any first image block can be understood as a local characteristic of the first image. Highlighting the local characteristics of the first image improves the reconstruction quality of the image, that is, the compression quality. FIG. 10 is a schematic comparison of image compression quality in an embodiment of the present application. The abscissa is the number of bits per pixel (BPP), which measures the bit rate. The ordinate is the peak signal-to-noise ratio (PSNR), which measures quality. The compression algorithms compared with the image processing method of the embodiments of the present application include different implementations of the JPEG2000, HEVC (high efficiency video coding), and VVC (versatile video coding) standards. For JPEG2000, the reference software OpenJPEG is used to represent its compression performance, and the implementation integrated in Matlab is used as a supplement. For HEVC, the reference software HM-16.15 is used to reflect the rate-distortion (RD) performance. For VVC, the reference software VTM-6.2 is used to represent the performance of the standard. Note that in the encoding configuration of VTM-6.2, the input image bit depth and the internal computation bit depth are set to 8 to be compatible with the format of the input image, and the test images are encoded using the all-intra (AI) configuration. The rate-distortion performance of the compression algorithms is shown in FIG. 10.
Curve 1001 is the rate-distortion performance of OpenJPEG, curve 1002 is that of the Matlab implementation of JPEG2000, curve 1003 is that of the reference software HM-16.15 with 420 image format compression, curve 1004 is that of the convolutional neural network image compression algorithm without block division, curve 1005 is that of the present invention, and curve 1006 is that of the reference software VTM-6.2 with 420 image format compression.
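For reference, the two axes of FIG. 10 can be computed as in the sketch below. It uses the standard definitions of BPP and PSNR and assumes 8-bit images (peak value 255) given as flat lists of pixel values; the function names are illustrative and do not come from the application.

```python
import math


def bits_per_pixel(num_bytes, width, height):
    """Bit rate in bits per pixel for a compressed file of num_bytes bytes."""
    return num_bytes * 8 / (width * height)


def psnr(original, reconstructed, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    # Mean squared error between the original and reconstructed pixels.
    mse = sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf             # identical images: PSNR is unbounded
    return 10 * math.log10(max_value ** 2 / mse)
```

A point on a rate-distortion curve is then the pair (BPP, PSNR) measured for one image at one quality setting.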
The image processing method in the embodiments of the present application is described above. The image processing system in the embodiments of the present application is described below.
Referring to FIG. 11, FIG. 11 is a system architecture diagram of an image processing system provided by an embodiment of the present application. In FIG. 11, the image processing system 200 includes an execution device 210, a training device 220, a database 230, a client device 240, and a data storage system 250. The execution device 210 includes a computing module 211.
The database 230 stores a first image set. Optionally, the database 230 further includes a fourth image set. The training device 220 generates a target model/rule 201 for processing the first image and/or the fourth image, and iteratively trains the target model/rule 201 using the first image and/or the fourth image in the database to obtain a mature target model/rule 201. In the embodiments of the present application, the target model/rule 201 includes an encoding neural network and a decoding neural network. Optionally, the target model/rule 201 further includes a fusion neural network.
The encoding neural network and the decoding neural network obtained by the training device 220 can be applied in different systems or devices, such as mobile phones, tablets, laptops, VR devices, and monitoring systems. The execution device 210 may call data, code, and the like in the data storage system 250, and may also store data, instructions, and the like in the data storage system 250. The data storage system 250 may be located in the execution device 210, or may be an external memory relative to the execution device 210.
The computing module 211 receives the first image sent by the client device 240, divides the first image to obtain N first image blocks, extracts N pieces of first adaptive data from the N first image blocks, and preprocesses the N first image blocks using the N pieces of first adaptive data. It then performs feature extraction on the preprocessed N first image blocks through the encoding neural network to obtain N groups of first feature maps, and performs quantization and entropy encoding on the N groups of first feature maps to obtain N encoded representations, where N is an integer greater than 1.
The computing module 211 may further perform entropy decoding on the N encoded representations to obtain N groups of second feature maps, and then process the N groups of second feature maps through the decoding neural network to obtain N first reconstructed image blocks. The N first reconstructed image blocks are then compensated using the N pieces of first adaptive data. The computing module 211 combines the N first reconstructed image blocks to obtain a second image. Optionally, when the target model/rule 201 further includes a fusion neural network, the computing module 211 may also process the second image using the fusion neural network to obtain a third image. The fusion neural network is used to reduce the difference between the second image and the first image, where the difference includes blocking artifacts.
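The data flow of the computing module 211 can be illustrated with the following sketch. It is not the application's implementation: the adaptive data is taken to be the block mean, preprocessing is assumed to subtract that mean, and `toy_codec` is a hypothetical stand-in (uniform rounding) for the encoding neural network, quantization, entropy coding, and decoding neural network. Images are lists of rows whose width and height are multiples of the block size.

```python
def split_into_blocks(image, a, b):
    """Split an image into blocks a pixels wide and b pixels high, in row-major order."""
    height, width = len(image), len(image[0])
    blocks = []
    for top in range(0, height, b):
        for left in range(0, width, a):
            blocks.append([row[left:left + a] for row in image[top:top + b]])
    return blocks


def block_mean(block):
    """Adaptive data of one block (assumed here to be the mean pixel value)."""
    values = [v for row in block for v in row]
    return sum(values) / len(values)


def shift(block, offset):
    """Add a constant offset to every pixel of a block."""
    return [[v + offset for v in row] for row in block]


def toy_codec(block, step=4):
    """Hypothetical lossy stand-in for the encode/quantize/decode chain."""
    return [[round(v / step) * step for v in row] for row in block]


def combine_blocks(blocks, width, a):
    """Reassemble row-major blocks into an image of the given width."""
    per_row = width // a
    image = []
    for start in range(0, len(blocks), per_row):
        group = blocks[start:start + per_row]
        for y in range(len(group[0])):
            image.append([v for block in group for v in block[y]])
    return image


def compress_and_reconstruct(image, a, b):
    """Split, preprocess with the means, code each block, compensate, and combine."""
    blocks = split_into_blocks(image, a, b)
    means = [block_mean(blk) for blk in blocks]
    preprocessed = [shift(blk, -m) for blk, m in zip(blocks, means)]
    reconstructed = [toy_codec(blk) for blk in preprocessed]
    compensated = [shift(blk, m) for blk, m in zip(reconstructed, means)]
    return combine_blocks(compensated, len(image[0]), a)
```

Because each block is coded relative to its own mean, a block whose pixels are all equal is reconstructed exactly in this sketch, regardless of the quantization step.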
In some embodiments of the present application, referring to FIG. 11, the execution device 210 and the terminal device 240 may be separate devices. The execution device 210 is configured with an I/O interface 212 for data interaction with the terminal device 240. A "user" may input the first image to the I/O interface 212 through the terminal device 240, and the execution device 210 returns the second image to the terminal device 240 through the I/O interface 212 to provide it to the user. In addition, the relationship between the terminal device 240 and the execution device 210 can be described in terms of the relationship between the terminal device and the encoding end and decoding end. The encoding end is a device that uses the encoding neural network, and the decoding end is a device that uses the decoding neural network. The encoding end and the decoding end may be the same device or separate devices. The terminal device is similar to the terminal device in the image processing method above, and may be the encoding end and/or the decoding end. For the relationship between the terminal device 240 and the execution device 210, refer to the foregoing descriptions of FIG. 2a to FIG. 2c.
It should be noted that FIG. 11 is only a schematic architecture diagram of the image processing system provided by an embodiment of the present invention, and the positional relationships among the devices, components, and modules shown in the figure do not constitute any limitation. For example, in other embodiments of the present application, the execution device 210 may be configured in the terminal device 240. When the terminal device is a mobile phone or tablet, the execution device 210 may be a module in the host processor (Host CPU) of the mobile phone or tablet that performs array image processing. The execution device 210 may also be a graphics processing unit (GPU) or a neural network processor (NPU) in the mobile phone or tablet, where the GPU or NPU is attached to the host processor as a coprocessor and is assigned tasks by the host processor.
With reference to the above description, the specific implementation process of the training phase of the image processing method provided by the embodiments of the present application is described below.
Specifically, referring to FIG. 12, FIG. 12 is a schematic flowchart of a model training method provided by an embodiment of the present application. The model training method provided by the embodiment of the present application may include:
In step 1201, the training device acquires a first image.
In step 1202, the training device divides the first image to obtain N first image blocks, where N is an integer greater than 1.
In step 1203, the training device obtains N pieces of first adaptive data from the N first image blocks, where the N pieces of first adaptive data are in one-to-one correspondence with the N first image blocks.
In step 1204, the training device preprocesses the N first image blocks according to the N pieces of first adaptive data.
In step 1205, the training device processes the preprocessed N first image blocks through a first encoding neural network to obtain N groups of first feature maps.
In step 1206, the training device performs quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations.
In step 1207, the training device performs entropy decoding on the N first encoded representations to obtain N groups of second feature maps.
In step 1208, the training device processes the N groups of second feature maps through a first decoding neural network to obtain N first reconstructed image blocks.
In step 1209, the training device compensates the N first reconstructed image blocks using the N pieces of first adaptive data.
In step 1210, the training device combines the compensated N first reconstructed image blocks to obtain a second image.
In step 1211, the training device obtains the distortion loss of the second image relative to the first image.
In step 1212, the training device jointly trains a model using a loss function until the image distortion value between the first image and the second image reaches a first preset level, where the model includes the first encoding neural network, a quantization network, an entropy encoding network, an entropy decoding network, and the first decoding neural network.
Referring to FIG. 13, FIG. 13 is a schematic diagram of a training process provided by an embodiment of the present application. The loss function of the model in the embodiment is:
$$\mathrm{loss} = l_d + P \times l_r$$
In the loss function above, $l_d$ represents the information entropy of the first encoded representations. $P \times l_r$ represents the distortion metric between the first image and the second image, where $l_r$ is the distortion loss between the first image and the second image, and $P$ is the balance factor between the two loss terms, which characterizes the relative relationship between the first encoded representations and the quality of the reconstructed image.
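A small numeric sketch of this loss is given below. The text does not state how l_d and l_r are computed, so the sketch assumes the empirical entropy of the quantized symbols (in bits per symbol) for l_d and the mean squared error for l_r; both choices, and all function names, are illustrative assumptions rather than the application's definitions.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Empirical entropy of a symbol sequence, in bits per symbol (assumed form of l_d)."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def mse(original, reconstructed):
    """Mean squared error between two pixel lists (assumed form of l_r)."""
    return sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / len(original)


def rd_loss(symbols, original, reconstructed, p):
    """loss = l_d + P * l_r, with P weighting distortion against rate."""
    l_d = entropy_bits(symbols)
    l_r = mse(original, reconstructed)
    return l_d + p * l_r
```

A larger P penalizes distortion more heavily, which pushes training toward higher reconstruction quality at the cost of a higher bit rate.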
Optionally, to obtain a suitable block size, the training process includes dividing the first image into image blocks of different sizes across multiple training iterations, that is, using different values of N. The loss function values obtained in these iterations are compared to optimize the size of the first image block.
In step 1213, the training device outputs a second encoding neural network and a second decoding neural network, where the second encoding neural network is the model obtained after iterative training of the first encoding neural network, and the second decoding neural network is the model obtained after iterative training of the first decoding neural network.
For specific descriptions of steps 1201 to 1211, refer to the descriptions in the image processing method above.
Optionally, the method further includes:
The training device quantizes the N pieces of first adaptive data to obtain N pieces of first adaptive quantized data, where the N pieces of first adaptive quantized data are used to compensate the N first reconstructed image blocks.
Optionally, the larger N is, the smaller the information entropy of a single piece of first adaptive quantized data.
Optionally, the arrangement order of the N first encoded representations is the same as the arrangement order of the N first image blocks, where the arrangement order of the N first image blocks is their arrangement order in the first image.
Optionally, the training device processes the second image through a fusion neural network to obtain a third image, so as to reduce the difference between the second image and the first image, where the difference includes blocking artifacts;
The training device is specifically configured to obtain the distortion loss of the third image relative to the first image;
The model includes the fusion neural network.
Optionally, each of the N first image blocks has the same size.
Optionally, in two training iterations, the sizes of the first images used for training are different, and the size of the first image block is a fixed value.
Optionally, the pixels of the first image block are a×b, where a and b are obtained according to target pixels, and the target pixels are c×d,
Figure PCTCN2021101807-appb-000133
is equal to an integer,
Figure PCTCN2021101807-appb-000134
is equal to an integer, a and c are numbers of pixels in the width direction, b and d are numbers of pixels in the height direction, the target pixels are obtained according to a target resolution of the terminal device, the terminal device includes a camera component, the pixels of an image obtained by the camera component at the target resolution setting are the target pixels, and the first image is obtained by the camera component.
Optionally, the target resolution is obtained from the resolution setting of the camera component in a settings interface of the camera application.
Optionally, the target resolution is obtained according to a target image group in a gallery obtained by the camera component, where the pixels of the target image group are the target pixels and, among the image groups of different pixels, the target image group accounts for the largest proportion of the gallery.
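The gallery-based choice of target pixels, and the check that a block size fits them, can be sketched as follows. The two divisibility conditions appear only as formula images in this text, so the sketch assumes they state that c is a multiple of a and d is a multiple of b. The function names and the candidate list are hypothetical.

```python
from collections import Counter


def target_pixels(gallery_sizes):
    """Return the (width, height) that occurs most often in the gallery."""
    return Counter(gallery_sizes).most_common(1)[0][0]


def block_sizes_dividing(c, d, candidates):
    """Keep the candidate block sizes (a, b) for which a divides c and b divides d."""
    return [(a, b) for a, b in candidates if c % a == 0 and d % b == 0]
```

For a gallery dominated by 4000×3000 images, a block size such as 250×250 passes this check, whereas 256×256 does not and would require filling.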
Optionally, images of multiple pixel sizes are obtained by the camera component, where the multiple pixel sizes are e×f,
Figure PCTCN2021101807-appb-000135
is equal to an integer,
Figure PCTCN2021101807-appb-000136
is equal to an integer, e includes c, and f includes d.
Optionally, the multiple pixel sizes are obtained from the resolution settings of the camera component in the settings interface of the camera application.
Optionally, the pixels of the first image block are a×b, where a is the number of pixels in the width direction and b is the number of pixels in the height direction, and the pixels of the first image are r×t;
After the first image is acquired and before the first image is divided, the method further includes:
if
Figure PCTCN2021101807-appb-000137
is not equal to an integer, and/or
Figure PCTCN2021101807-appb-000138
is not equal to an integer, filling the edges of the first image with the pixel median so that
Figure PCTCN2021101807-appb-000139
is equal to an integer and
Figure PCTCN2021101807-appb-000140
is equal to an integer, where the pixels of the first image after filling are r1×t1.
Optionally, after the first image is acquired and before the edges of the first image are filled, the method further includes:
if
Figure PCTCN2021101807-appb-000141
is not equal to an integer, proportionally enlarging r and t to obtain the first image with pixels r2×t2, where
Figure PCTCN2021101807-appb-000142
is equal to an integer;
the filling of the edges of the first image with the pixel median if
Figure PCTCN2021101807-appb-000143
is not equal to an integer, and/or
Figure PCTCN2021101807-appb-000144
is not equal to an integer, includes:
if
Figure PCTCN2021101807-appb-000145
is not equal to an integer, filling the edges of the first image with the pixel median.
Optionally, after r and t are proportionally enlarged, if
Figure PCTCN2021101807-appb-000146
is not equal to an integer, the remainder of
Figure PCTCN2021101807-appb-000147
is obtained. If the remainder is greater than
Figure PCTCN2021101807-appb-000148
the training device fills the pixel median on only one side of the first image in the width direction.
Optionally, if the remainder is less than
Figure PCTCN2021101807-appb-000149
the pixel median is filled on both sides of the first image in the width direction, so that the width of the pixel median filled on each side is
Figure PCTCN2021101807-appb-000150
where g is the remainder.
Optionally, the N first image blocks include a first target image block, and the range of pixel values of the first target image block is smaller than the range of pixel values of the first image;
Before the N pieces of first adaptive data are obtained from the N first image blocks, the method further includes:
The training device performs inverse quantization on the pixel values of the first target image block;
The training device is specifically configured to obtain one piece of first adaptive data from the inverse-quantized first target image block.
On the basis of the embodiments corresponding to FIG. 1 to FIG. 13, related devices for implementing the above solutions of the embodiments of the present application are provided below. Referring to FIG. 14, FIG. 14 is a schematic structural diagram of an encoding apparatus 1400 provided by an embodiment of the present application. The encoding apparatus 1400 corresponds to the encoding end and may be a terminal device or a cloud device. The encoding apparatus 1400 includes:
a first acquisition module 1401, configured to acquire a first image;
a division module 1402, configured to divide the first image to obtain N first image blocks, where N is an integer greater than 1;
a second acquisition module 1403, configured to obtain N pieces of first adaptive data from the N first image blocks, where the N pieces of first adaptive data are in one-to-one correspondence with the N first image blocks;
a preprocessing module 1404, configured to preprocess the N first image blocks according to the N pieces of first adaptive data;
an encoding neural network module 1405, configured to process the preprocessed N first image blocks through the encoding neural network to obtain N groups of first feature maps;
a quantization and entropy encoding module 1406, configured to perform quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations. Optionally, the encoding apparatus is further configured to perform all or some of the operations performed by the cloud device in the embodiment corresponding to FIG. 3a.
The encoding apparatus in the embodiments of the present application is described above. On the basis of the embodiments corresponding to FIG. 1 to FIG. 13, the decoding apparatus in the embodiments of the present application is described below. Referring to FIG. 15, FIG. 15 is a schematic structural diagram of a decoding apparatus 1500 provided by an embodiment of the present application. The decoding apparatus 1500 corresponds to the decoding end and may be a terminal device or a cloud device. The decoding apparatus 1500 includes:
an acquisition module 1501, configured to obtain N first encoded representations, N pieces of first adaptive data, and a correspondence, where the correspondence includes the correspondence between the N pieces of first adaptive data and the N first encoded representations, the N pieces of first adaptive data are in one-to-one correspondence with the N first encoded representations, and N is an integer greater than 1;
an entropy decoding module 1502, configured to perform entropy decoding on the N first encoded representations to obtain N groups of second feature maps;
a decoding neural network module 1503, configured to process the N groups of second feature maps to obtain N first reconstructed image blocks;
a compensation module 1504, configured to compensate the N first reconstructed image blocks using the N pieces of first adaptive data;
a combining module 1505, configured to combine the compensated N first reconstructed image blocks to obtain a second image.
Optionally, the decoding apparatus is further configured to perform all or some of the operations performed by the terminal device in the embodiment corresponding to FIG. 3a.
上面对本申请实施例中的解码装置进行了描述,在图1至图13所对应的实施例的基础上,为了更好的实施本申请实施例的上述方案,下面还提供对本申请实施例中的训练装置进行描述。具体参阅图16,图16为本申请实施例提供的训练装置1600的一个结构示意图,训练装置1600包括:The decoding apparatus in the embodiments of the present application has been described above. On the basis of the embodiments corresponding to FIG. 1 to FIG. 13 , in order to better implement the above solutions of the embodiments of the present application, the following also provides a description of the embodiments of the present application. The training device is described. Referring specifically to FIG. 16, FIG. 16 is a schematic structural diagram of a training apparatus 1600 provided by an embodiment of the present application. The training apparatus 1600 includes:
第一获取模块1601,用于获取第一图像。The first acquisition module 1601 is used to acquire a first image.
分割模块1602,用于分割所述第一图像,获得N个第一图块,N为大于1的整数。A segmentation module 1602, configured to segment the first image to obtain N first image blocks, where N is an integer greater than 1.
第二获取模块1603,用于从所述N个第一图块中获取N个第一自适应数据,所述N个第一自适应数据与所述N个第一图块一一对应。The second obtaining module 1603 is configured to obtain N pieces of first adaptive data from the N first image blocks, where the N pieces of first adaptive data are in one-to-one correspondence with the N first image blocks.
预处理模块1604,用于根据所述N个第一自适应数据对所述N个第一图块进行预处理;A preprocessing module 1604, configured to preprocess the N first image blocks according to the N first adaptive data;
第一编码神经网络模块1605,用于处理预处理后的N个第一图块,得到N组第一特征图。The first coding neural network module 1605 is configured to process the pre-processed N first image blocks to obtain N groups of first feature maps.
量化和熵编码模块1606,用于对所述N组第一特征图进行量化和熵编码,得到N个第一编码表示。A quantization and entropy encoding module 1606, configured to perform quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations.
熵解码模块1607,对所述N个第一编码表示进行熵解码,以得到N组第二特征图。The entropy decoding module 1607 performs entropy decoding on the N first encoded representations to obtain N sets of second feature maps.
第一解码神经网络模块1608,用于处理所述N组第二特征图,得到N个第一重构图块。The first decoding neural network module 1608 is configured to process the N groups of second feature maps to obtain N first reconstructed image blocks.
补偿模块1609,用于通过所述N个第一自适应数据补偿所述N个第一重构图块。 Compensation module 1609, configured to compensate the N first reconstructed image blocks by using the N first adaptive data.
组合模块1610,用于组合补偿后的N个第一重构图块,得到第二图像。The combining module 1610 is configured to combine the N first reconstructed image blocks after compensation to obtain a second image.
第三获取模块1611,用于获取所述第二图像相对于所述第一图像的失真损失。The third acquiring module 1611 is configured to acquire the distortion loss of the second image relative to the first image.
训练模块1612,用于利用损失函数对模型进行联合训练,直至所述第一图像与所述第二图像之间的图像失真值达到第一预设程度,所述模型包括所述第一编码神经网络、量化网络、熵编码网络、熵解码网络、所述第一解码神经网络。可选地,所述模型还包括分割网络,分割网络中的可训练的参数为第一图块的大小。可选地,所述模型还包括分割网络,分割网络中的可训练的参数为第一图块的大小。A training module 1612, configured to jointly train a model by using a loss function, until the image distortion value between the first image and the second image reaches a first preset level, the model includes the first coding nerve network, quantization network, entropy encoding network, entropy decoding network, and the first decoding neural network. Optionally, the model further includes a segmentation network, and a trainable parameter in the segmentation network is the size of the first image block. Optionally, the model further includes a segmentation network, and a trainable parameter in the segmentation network is the size of the first image block.
The output module 1613 is configured to output a second encoding neural network and a second decoding neural network, where the second encoding neural network is the model obtained after iterative training is performed on the first encoding neural network, and the second decoding neural network is the model obtained after iterative training is performed on the first decoding neural network.
Optionally, the training apparatus is further configured to perform all or some of the operations performed by the terminal device and/or the cloud device in the embodiment corresponding to FIG. 3a.
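The distortion loss acquired by the third acquiring module 1611 and the stopping condition used by the training module 1612 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the distortion is measured as mean squared error and that the "first preset level" is a numeric threshold; the names `distortion_loss`, `reached_preset_level`, and `threshold` are hypothetical and do not appear in the application.

```python
def distortion_loss(first_image, second_image):
    """Mean squared error between two equal-sized images given as 2-D lists.

    MSE is an assumed distortion measure; the application does not fix one here.
    """
    if len(first_image) != len(second_image):
        raise ValueError("images must have the same number of rows")
    total, count = 0.0, 0
    for row_a, row_b in zip(first_image, second_image):
        if len(row_a) != len(row_b):
            raise ValueError("images must have the same number of columns")
        for pa, pb in zip(row_a, row_b):
            total += (pa - pb) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    return total / count


def reached_preset_level(loss, threshold=1.0):
    """Hypothetical stopping test: training stops once the loss is small enough."""
    return loss <= threshold


first = [[10, 20], [30, 40]]
second = [[11, 19], [30, 42]]
loss = distortion_loss(first, second)  # (1 + 1 + 0 + 4) / 4
```

In a real joint-training loop, the loss would be recomputed after each parameter update and training would continue until the stopping test passes.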
In an optional design of the sixth aspect, the N first image blocks include a first target image block, and the range of pixel values of the first target image block is smaller than the range of pixel values of the first image;
The apparatus further includes:
an inverse quantization module, configured to inverse-quantize the pixel values of the first target image block;
The second acquiring module 1603 is specifically configured to obtain one piece of first adaptive data from the inverse-quantized first target image block.
Next, an execution device provided by an embodiment of the present application is described. Referring to FIG. 17, FIG. 17 is a schematic structural diagram of the execution device provided by the embodiment of the present application. The execution device 1700 may specifically be a virtual reality (VR) device, a mobile phone, a tablet, a laptop computer, a smart wearable device, a monitoring data processing device, a server, or the like, which is not limited here. The encoding apparatus described in the embodiment corresponding to FIG. 14 and/or the decoding apparatus described in the embodiment corresponding to FIG. 15 may be deployed on the execution device 1700 to implement the functions of the apparatus in the embodiment corresponding to FIG. 14 and/or FIG. 15. Specifically, the execution device 1700 includes a receiver 1701, a transmitter 1702, a processor 1703, and a memory 1704 (the number of processors 1703 in the execution device 1700 may be one or more, and one processor is taken as an example in FIG. 17), where the processor 1703 may include an application processor 17031 and a communication processor 17032. In some embodiments of the present application, the receiver 1701, the transmitter 1702, the processor 1703, and the memory 1704 may be connected by a bus or in another manner.
The memory 1704 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1703. A part of the memory 1704 may further include a non-volatile random access memory (NVRAM). The memory 1704 stores operating instructions for the processor, executable modules, or data structures, or a subset thereof, or an extended set thereof, where the operating instructions may include various operating instructions for implementing various operations.
The processor 1703 controls the operation of the execution device. In a specific application, the components of the execution device are coupled together through a bus system, where the bus system may include, in addition to a data bus, a power bus, a control bus, a status signal bus, and the like. For clarity of description, however, the various buses are all referred to as the bus system in the figure.
The methods disclosed in the foregoing embodiments of the present application may be applied to the processor 1703 or implemented by the processor 1703. The processor 1703 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the foregoing methods may be completed by an integrated logic circuit of hardware in the processor 1703 or by instructions in the form of software. The processor 1703 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 1703 may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in the embodiments of the present application may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium that is mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1704, and the processor 1703 reads the information in the memory 1704 and completes the steps of the foregoing methods in combination with its hardware.
The receiver 1701 may be configured to receive input digit or character information, and to generate signal input related to the settings and function control of the execution device. The transmitter 1702 may be configured to output digit or character information through a first interface; the transmitter 1702 may further be configured to send an instruction to a disk group through the first interface to modify data in the disk group; and the transmitter 1702 may further include a display device such as a display screen.
In this embodiment of the present application, in one case, the processor 1703 is configured to perform the operations performed by the terminal device and/or the cloud device in the embodiment corresponding to FIG. 3a.
Optionally, the application processor 17031 is configured to: acquire a first image;
segment the first image to obtain N first image blocks, where N is an integer greater than 1;
obtain N pieces of first adaptive data from the N first image blocks, where the N pieces of first adaptive data are in one-to-one correspondence with the N first image blocks;
preprocess the N first image blocks according to the N pieces of first adaptive data;
process the preprocessed N first image blocks through an encoding neural network to obtain N groups of first feature maps; and
perform quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations.
In addition, the application processor 17031 may further be configured to perform all or some of the operations that can be performed by the cloud device in the embodiment corresponding to FIG. 3a.
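The first steps of the encoding flow above (segmentation, extraction of adaptive data, and preprocessing) can be sketched as follows. This is only an illustration under stated assumptions: the first adaptive data is taken to be the mean pixel value of each image block, and preprocessing is taken to be subtraction of that mean. The function names are hypothetical, and the encoding neural network, quantization, and entropy encoding are omitted.

```python
def split_into_tiles(image, tile_h, tile_w):
    """Split an H x W image (2-D list) into equal tiles in row-major order."""
    h, w = len(image), len(image[0])
    if h % tile_h or w % tile_w:
        raise ValueError("image size must be a multiple of the tile size")
    tiles = []
    for r in range(0, h, tile_h):
        for c in range(0, w, tile_w):
            tiles.append([row[c:c + tile_w] for row in image[r:r + tile_h]])
    return tiles


def tile_mean(tile):
    """Assumed form of the adaptive data: the mean pixel value of one tile."""
    values = [p for row in tile for p in row]
    return sum(values) / len(values)


def preprocess_tiles(image, tile_h, tile_w):
    """Return (preprocessed tiles, adaptive data), one adaptive value per tile."""
    tiles = split_into_tiles(image, tile_h, tile_w)
    adaptive = [tile_mean(t) for t in tiles]
    # Assumed preprocessing: remove each tile's own mean.
    preprocessed = [[[p - m for p in row] for row in t]
                    for t, m in zip(tiles, adaptive)]
    return preprocessed, adaptive


image = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]
preprocessed, adaptive = preprocess_tiles(image, 2, 2)
```

Keeping the tiles in row-major order means the position of each adaptive value in the list is enough to identify the image block it belongs to.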
Optionally, the application processor 17031 is configured to: obtain N first encoded representations;
perform entropy decoding on the N first encoded representations to obtain N groups of second feature maps;
process the N groups of second feature maps through a decoding neural network to obtain N first reconstructed image blocks;
compensate the N first reconstructed image blocks by using the N pieces of first adaptive data; and
combine the compensated N first reconstructed image blocks to obtain a second image.
In addition, the application processor 17031 may further be configured to perform all or some of the operations that can be performed by the terminal device in the embodiment corresponding to FIG. 3a.
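The last two steps of the decoding flow above (compensation and combination) can be sketched in the same spirit. The sketch assumes that compensation adds a per-block scalar (for example, a block mean) back to each reconstructed image block, and that the blocks arrive in row-major order; `compensate_and_combine` and `tiles_per_row` are hypothetical names, and entropy decoding and the decoding neural network are not modeled.

```python
def compensate_and_combine(recon_tiles, adaptive, tiles_per_row):
    """Add each tile's adaptive value back, then stitch tiles into one image."""
    if len(recon_tiles) != len(adaptive):
        raise ValueError("need exactly one adaptive value per tile")
    if len(recon_tiles) % tiles_per_row:
        raise ValueError("tile count must be a multiple of tiles_per_row")
    # Assumed compensation: add the per-tile scalar to every pixel.
    compensated = [[[p + m for p in row] for row in t]
                   for t, m in zip(recon_tiles, adaptive)]
    image = []
    for i in range(0, len(compensated), tiles_per_row):
        band = compensated[i:i + tiles_per_row]  # one horizontal band of tiles
        for r in range(len(band[0])):
            image.append([p for t in band for p in t[r]])
    return image


recon = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(4)]
second_image = compensate_and_combine(recon, [1.0, 2.0, 3.0, 4.0], tiles_per_row=2)
```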
An embodiment of the present application further provides a training device. Referring to FIG. 18, FIG. 18 is a schematic structural diagram of the training device provided by the embodiment of the present application. The training apparatus described in the embodiment corresponding to FIG. 16 may be deployed on the training device 1800 to implement the functions of the training apparatus in the embodiment corresponding to FIG. 16. Specifically, the training device 1800 is implemented by one or more servers. The training device 1800 may vary considerably with configuration or performance, and may include one or more central processing units (CPUs) 1822 (for example, one or more processors), a memory 1832, and one or more storage media 1830 (for example, one or more mass storage devices) that store an application program 1842 or data 1844. The memory 1832 and the storage medium 1830 may provide transient storage or persistent storage. The program stored in the storage medium 1830 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the training device. Further, the central processing unit 1822 may be configured to communicate with the storage medium 1830 and to execute, on the training device 1800, the series of instruction operations in the storage medium 1830.
The training device 1800 may further include one or more power supplies 1826, one or more wired or wireless network interfaces 1850, one or more input/output interfaces 1858, and/or one or more operating systems 1841, such as Windows Server™, Mac OS X™, Unix™, Linux™, and FreeBSD™.
In this embodiment of the present application, the central processing unit 1822 is configured to perform all or some of the operations performed by the training apparatus in the embodiment corresponding to FIG. 16.
An embodiment of the present application further provides a computer program product that, when run on a computer, causes the computer to perform the steps performed by the execution device in the method described in the embodiment shown in FIG. 17, or causes the computer to perform the steps performed by the training device in the method described in the embodiment shown in FIG. 18.
An embodiment of the present application further provides a computer-readable storage medium that stores a program for signal processing. When the program is run on a computer, it causes the computer to perform the steps performed by the execution device in the method described in the embodiment shown in FIG. 17, or causes the computer to perform the steps performed by the training device in the method described in the embodiment shown in FIG. 18.
The execution device and the training device provided by the embodiments of the present application may specifically be a chip. The chip includes a processing unit and a communication unit, where the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit can execute computer-executable instructions stored in a storage unit, so that the chip in the execution device performs the operations performed by the terminal device and/or the cloud device described in the embodiment shown in FIG. 3a, or so that the chip in the training device performs the model training method described in the embodiment shown in FIG. 13. Optionally, the storage unit is a storage unit inside the chip, such as a register or a cache. The storage unit may alternatively be a storage unit that is in the wireless access device and located outside the chip, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
Specifically, referring to FIG. 19, FIG. 19 is a schematic structural diagram of a chip provided by an embodiment of the present application. The chip may be a neural network processor NPU 2000. The NPU 2000 is mounted on a host CPU (Host CPU) as a coprocessor, and the Host CPU allocates tasks to it. The core part of the NPU is the arithmetic circuit 2003, and the controller 2004 controls the arithmetic circuit 2003 to fetch matrix data from memory and perform multiplication operations.
In some implementations, the arithmetic circuit 2003 internally includes multiple processing units (Process Engine, PE). In some implementations, the arithmetic circuit 2003 is a two-dimensional systolic array. The arithmetic circuit 2003 may alternatively be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 2003 is a general-purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 2002 and buffers it on each PE in the arithmetic circuit. The arithmetic circuit fetches the data of matrix A from the input memory 2001 and performs a matrix operation with matrix B, and the partial results or final results of the resulting matrix are stored in the accumulator 2008.
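The role of the accumulator in this example can be illustrated with a small software model. The sketch below is not a description of the NPU hardware: it simply computes C = A × B by splitting the shared dimension into slices and adding each slice's partial products into an accumulator, which is one common way partial results arise. The function name `matmul_with_accumulator` and the slice width `k_tile` are hypothetical.

```python
def matmul_with_accumulator(a, b, k_tile=2):
    """Compute a @ b by accumulating partial products over slices of the inner dimension."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b must match")
    accumulator = [[0.0] * n for _ in range(m)]  # plays the role of accumulator 2008
    for start in range(0, k, k_tile):
        stop = min(start + k_tile, k)
        # Each pass adds one partial result into the accumulator.
        for i in range(m):
            for j in range(n):
                accumulator[i][j] += sum(a[i][p] * b[p][j] for p in range(start, stop))
    return accumulator


A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10],
     [11, 12]]
C = matmul_with_accumulator(A, B)
```

After the first pass the accumulator holds only a partial result; it equals the full product once every slice has been processed.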
The unified memory 2006 is used to store input data and output data. Weight data is transferred to the weight memory 2002 directly through the direct memory access controller (DMAC) 2005. Input data is also transferred to the unified memory 2006 through the DMAC.
The bus interface unit (BIU) 2010 is used for the interaction between the AXI bus and both the DMAC and the instruction fetch buffer (IFB) 2009.
The bus interface unit 2010 is used by the instruction fetch buffer 2009 to obtain instructions from the external memory, and is also used by the direct memory access controller 2005 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly used to transfer input data in the external memory (DDR) to the unified memory 2006, to transfer weight data to the weight memory 2002, or to transfer input data to the input memory 2001.
The vector calculation unit 2007 includes multiple arithmetic processing units and, where necessary, further processes the output of the arithmetic circuit, for example by vector multiplication, vector addition, exponential operations, logarithmic operations, and magnitude comparison. It is mainly used for computation in the non-convolutional/non-fully-connected layers of a neural network, such as batch normalization, pixel-level summation, and upsampling of feature planes.
In some implementations, the vector calculation unit 2007 can store the processed output vector in the unified memory 2006. For example, the vector calculation unit 2007 may apply a linear function and/or a nonlinear function to the output of the arithmetic circuit 2003, for example to perform linear interpolation on a feature plane extracted by a convolutional layer, or, as another example, to a vector of accumulated values to generate activation values. In some implementations, the vector calculation unit 2007 generates normalized values, pixel-level summed values, or both. In some implementations, the processed output vector can be used as an activation input to the arithmetic circuit 2003, for example for use in a subsequent layer of the neural network.
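The kind of post-processing attributed to the vector calculation unit can be sketched in software. The example assumes a simplified batch normalization (per-column mean and variance, with no learned scale or shift) followed by ReLU as the nonlinear function; both choices, and the name `vector_postprocess`, are illustrative assumptions rather than details taken from the application.

```python
import math


def vector_postprocess(rows, eps=1e-5):
    """Normalize each column over the batch, then apply ReLU to get activations."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    variances = [sum((v - mu) ** 2 for v in c) / n for c, mu in zip(cols, means)]
    return [
        # ReLU (assumed nonlinearity) applied to the normalized value.
        [max((v - mu) / math.sqrt(var + eps), 0.0)
         for v, mu, var in zip(row, means, variances)]
        for row in rows
    ]


accumulated = [[1.0, 2.0],
               [3.0, 4.0]]
activations = vector_postprocess(accumulated)
```

The resulting activations could then be fed back as the input of the next layer's matrix operation.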
The instruction fetch buffer 2009 connected to the controller 2004 is used to store the instructions used by the controller 2004;
The unified memory 2006, the input memory 2001, the weight memory 2002, and the instruction fetch buffer 2009 are all on-chip memories. The external memory is private to the NPU hardware architecture.
The processor mentioned anywhere above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the program of the method in the first aspect.
In addition, it should be noted that the apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided in the present application, a connection relationship between modules indicates that they have a communication connection, which may be specifically implemented as one or more communication buses or signal lines.
From the description of the above embodiments, those skilled in the art can clearly understand that the present application can be implemented by software plus the necessary general-purpose hardware, and can of course also be implemented by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. In general, any function completed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function can also vary, for example analog circuits, digital circuits, or dedicated circuits. For the present application, however, a software implementation is the better implementation in most cases. Based on this understanding, the technical solutions of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a training device, a network device, or the like) to execute the methods described in the embodiments of the present application.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When software is used, the embodiments may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can store, or a data storage device such as a training device or a data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).

Claims (31)

  1. An image processing method, comprising:
    acquiring a first image;
    segmenting the first image to obtain N first image blocks, where N is an integer greater than 1;
    obtaining N pieces of first adaptive data from the N first image blocks, where the N pieces of first adaptive data are in one-to-one correspondence with the N first image blocks;
    preprocessing the N first image blocks according to the N pieces of first adaptive data;
    processing the preprocessed N first image blocks through an encoding neural network to obtain N groups of first feature maps; and
    performing quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations.
  2. The method according to claim 1, wherein the N first encoded representations are used for entropy decoding to obtain N groups of second feature maps, the N groups of second feature maps are used for processing by a decoding neural network to obtain N first reconstructed image blocks, the N pieces of first adaptive data are used to compensate the N first reconstructed image blocks, and the compensated N first reconstructed image blocks are used to be combined into a second image.
  3. The method according to claim 1 or 2, wherein the method further comprises:
    sending, to a decoding end, the N first encoded representations, the N pieces of first adaptive data, and a correspondence, where the correspondence includes the correspondence between the N pieces of first adaptive data and the N first encoded representations.
  4. The method according to claim 3, wherein the method further comprises:
    quantizing the N pieces of first adaptive data to obtain N pieces of first adaptive quantized data, where the N pieces of first adaptive quantized data are used to compensate the N first reconstructed image blocks;
    wherein the sending the N pieces of first adaptive data to the decoding end comprises:
    sending the N pieces of first adaptive quantized data to the decoding end.
  5. The method according to claim 4, wherein the larger N is, the smaller the information entropy of a single piece of first adaptive quantized data is.
  6. The method according to any one of claims 3 to 5, wherein the arrangement order of the N first encoded representations is the same as the arrangement order of the N first image blocks, the arrangement order of the N first image blocks is the arrangement order of the N first image blocks in the first image, and the correspondence includes the arrangement order of the N first encoded representations and the arrangement order of the N first image blocks.
  7. The method according to any one of claims 1 to 6, wherein each of the N first image blocks has the same size.
  8. The method according to claim 7, wherein when the method is used to segment first images of different sizes, the size of the first image block is a fixed value.
  9. The method according to any one of claims 1 to 8, wherein the first image block has a×b pixels, a and b are obtained according to target pixels, the target pixels are c×d, c/a is an integer, d/b is an integer, a and c are numbers of pixels in the width direction, b and d are numbers of pixels in the height direction, the target pixels are obtained according to a target resolution of a terminal device, the terminal device includes a camera component, an image obtained by the camera component at the target resolution setting has the target pixels, and the first image is obtained by the camera component.
  10. The method according to claim 9, wherein the target resolution is obtained from the resolution set for the camera component through a settings interface in a camera application.
  11. The method according to claim 9, wherein the target resolution is obtained according to a target image group in a gallery of images obtained by the camera component, the images of the target image group have the target pixels, and among image groups of different pixels, the target image group accounts for the largest proportion of the gallery.
  12. The method according to any one of claims 1 to 8, wherein the first image block has a×b pixels, a is the number of pixels in the width direction, b is the number of pixels in the height direction, and the first image has r×t pixels;
    after the first image is acquired and before the first image is segmented, the method further comprises:
    if r/a is not an integer and/or t/b is not an integer, padding the edge of the first image with the pixel median so that r1/a is an integer and t1/b is an integer, where the padded first image has r1×t1 pixels.
  13. The method according to claim 12, wherein after acquiring the first image and before filling the edges of the first image, the method further comprises:
    if
    Figure PCTCN2021101807-appb-100007
    is not an integer, proportionally enlarging r and t to obtain the first image with pixels r2×t2, wherein
    Figure PCTCN2021101807-appb-100008
    is an integer;
    and the step of filling the edges of the first image with the pixel median if
    Figure PCTCN2021101807-appb-100009
    is not an integer, and/or
    Figure PCTCN2021101807-appb-100010
    is not an integer, comprises:
    if
    Figure PCTCN2021101807-appb-100011
    is not an integer, filling the edges of the first image with the pixel median.
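The order of operations in claim 13 (enlarge proportionally first, pad afterwards) can be illustrated with a size calculation. Because the formula images are not reproduced here, this sketch assumes that the first check concerns the height ratio t/b, that enlargement makes t2/b an integer, and that the remaining check concerns the width ratio r2/a. These are assumptions, and the function name is hypothetical.

```python
import math


def enlarge_then_pad_size(r, t, a, b):
    """Return ((r2, t2), (r1, t1)): the size after scaling, then after padding.

    r x t is the image size and a x b the block size (width x height).
    Which dimension is scaled to a block multiple first is an assumption.
    """
    t2 = math.ceil(t / b) * b    # smallest multiple of b that is >= t
    r2 = round(r * t2 / t)       # scale the width by the same factor
    r1 = math.ceil(r2 / a) * a   # pad the width up to a multiple of a
    return (r2, t2), (r1, t2)


# Hypothetical sizes with 128x128 blocks.
exact = enlarge_then_pad_size(1000, 750, 128, 128)
needs_padding = enlarge_then_pad_size(1000, 700, 128, 128)
```

In the first case scaling alone yields 1024×768, so no padding is needed; in the second the scaled width of 1097 is padded to 1152.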
  14. The method according to any one of claims 1 to 13, wherein the N first image blocks include a first target image block, and the range of pixel values of the first target image block is smaller than the range of pixel values of the first image;
    before acquiring the N first adaptive data from the N first image blocks, the method further comprises:
    inversely quantizing the pixel values of the first target image block;
    and acquiring the N first adaptive data from the N first image blocks comprises:
    acquiring one first adaptive data from the inversely quantized first target image block.
  15. An image processing method, comprising:
    acquiring N first encoded representations, N first adaptive data, and a correspondence, wherein the correspondence includes the correspondence between the N first adaptive data and the N first encoded representations, the N first adaptive data are in one-to-one correspondence with the N first codes, and N is an integer greater than 1;
    performing entropy decoding on the N first encoded representations to obtain N groups of second feature maps;
    processing the N groups of second feature maps by a decoding neural network to obtain N first reconstructed image blocks;
    compensating the N first reconstructed image blocks by the N first adaptive data;
    combining the compensated N first reconstructed image blocks to obtain a second image.
  16. The method according to claim 15, wherein the N first encoded representations are obtained by quantization and entropy encoding of N groups of first feature maps, the N groups of first feature maps are obtained by processing N preprocessed first image blocks with an encoding neural network, the N preprocessed first image blocks are obtained by preprocessing N first image blocks with the N first adaptive data, the N first adaptive data are obtained from the N first image blocks, and the N first image blocks are obtained by segmenting a first image.
  17. The method according to claim 15 or 16, wherein the larger N is, the smaller the information entropy of a single first adaptive quantization data.
  18. The method according to any one of claims 15 to 17, wherein the method further comprises:
    processing the second image by a fusion neural network to obtain a third image, so as to reduce the difference between the second image and the first image, the difference including blocking artifacts.
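The data flow of claims 15 and 16 (segment, derive adaptive data, preprocess, encode, decode, compensate, combine) can be sketched end to end. This sketch makes several assumptions that are not stated in the claims: the adaptive data is taken to be the per-block mean, preprocessing subtracts it, compensation adds it back, and the encoding network, quantization, entropy coding, and decoding network are replaced by an identity stand-in. All function names are hypothetical.

```python
def split_blocks(image, a, b):
    """Split a 2-D image (list of rows) into a-wide, b-high blocks, raster order."""
    blocks = []
    for y in range(0, len(image), b):
        for x in range(0, len(image[0]), a):
            blocks.append([row[x:x + a] for row in image[y:y + b]])
    return blocks


def block_mean(block):
    values = [v for row in block for v in row]
    return sum(values) / len(values)


def encode(image, a, b):
    """Encoder side: per-block adaptive data plus preprocessed blocks."""
    blocks = split_blocks(image, a, b)
    adaptive = [block_mean(blk) for blk in blocks]
    # Preprocess by removing the block mean. A real system would then apply
    # the encoding network, quantization and entropy coding (omitted here).
    coded = [[[v - m for v in row] for row in blk]
             for blk, m in zip(blocks, adaptive)]
    return coded, adaptive


def decode(coded, adaptive, blocks_per_row):
    """Decoder side: compensate each block, then combine in raster order."""
    # Identity stand-in for entropy decoding and the decoding network.
    recon = [[[v + m for v in row] for row in blk]
             for blk, m in zip(coded, adaptive)]
    image = []
    for i in range(0, len(recon), blocks_per_row):
        row_of_blocks = recon[i:i + blocks_per_row]
        for y in range(len(row_of_blocks[0])):
            image.append([v for blk in row_of_blocks for v in blk[y]])
    return image


first_image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
coded, adaptive = encode(first_image, a=2, b=2)
second_image = decode(coded, adaptive, blocks_per_row=2)
```

Because the codec stand-in is lossless, `second_image` equals `first_image` here; with a real lossy codec the compensated blocks would only approximate the originals, which is where the fusion network of claim 18 would come in.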
  19. An encoding apparatus, comprising:
    a first acquisition module, configured to acquire a first image;
    a segmentation module, configured to segment the first image to obtain N first image blocks, where N is an integer greater than 1;
    a second acquisition module, configured to acquire N first adaptive data from the N first image blocks, the N first adaptive data being in one-to-one correspondence with the N first image blocks;
    a preprocessing module, configured to preprocess the N first image blocks according to the N first adaptive data;
    an encoding neural network module, configured to process the preprocessed N first image blocks to obtain N groups of first feature maps;
    a quantization and entropy encoding module, configured to perform quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations.
  20. The apparatus according to claim 19, wherein the N first encoded representations are used for entropy decoding to obtain N groups of second feature maps, the N groups of second feature maps are to be processed by a decoding neural network to obtain N first reconstructed image blocks, the N first adaptive data are used to compensate the N first reconstructed image blocks, and the compensated N first reconstructed image blocks are to be combined into a second image.
  21. The apparatus according to claim 19 or 20, wherein the apparatus further comprises:
    a sending module, configured to send the N first encoded representations, the N first adaptive data, and a correspondence to a decoding end, wherein the correspondence includes the correspondence between the N first adaptive data and the N first encoded representations.
  22. The apparatus according to claim 21, wherein the apparatus further comprises:
    a quantization module, configured to quantize the N first adaptive data to obtain N first adaptive quantization data, the N first adaptive quantization data being used to compensate the N first reconstructed image blocks;
    the sending module is specifically configured to send the N first adaptive quantization data to the decoding end.
  23. The apparatus according to claim 22, wherein the larger N is, the smaller the information entropy of a single first adaptive quantization data.
  24. The apparatus according to any one of claims 21 to 23, wherein the arrangement order of the N first encoded representations is the same as the arrangement order of the N first image blocks, the arrangement order of the N first image blocks is their arrangement order in the first image, and the correspondence includes the arrangement order of the N first encoded representations and the arrangement order of the N first image blocks.
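Claims 22 to 24 describe quantizing the adaptive data before sending it and relying on a shared arrangement order as the correspondence. A minimal sketch follows, assuming uniform scalar quantization with a hypothetical step size and assuming that position in the list is the only link between an encoded representation and its adaptive value. The claims do not specify the quantizer, and all names are illustrative.

```python
def quantize_adaptive(values, step=1.0):
    """Uniformly quantize adaptive values to integer indices (assumed scheme)."""
    return [round(v / step) for v in values]


def dequantize_adaptive(indices, step=1.0):
    """Map integer indices back to reconstructed adaptive values."""
    return [i * step for i in indices]


def pair_by_order(encoded_representations, adaptive_indices):
    """Pair the i-th encoded representation with the i-th adaptive index.

    No explicit block index is stored; both lists are assumed to follow the
    order of the blocks in the first image.
    """
    if len(encoded_representations) != len(adaptive_indices):
        raise ValueError("expected one adaptive value per encoded representation")
    return list(zip(encoded_representations, adaptive_indices))


means = [3.4, 5.6, 11.5, 13.2]            # hypothetical per-block adaptive data
indices = quantize_adaptive(means, step=0.5)
restored = dequantize_adaptive(indices, step=0.5)
pairs = pair_by_order(["c0", "c1", "c2", "c3"], indices)
```

The decoder would compensate with `restored` rather than the original means, so the step size trades side-information cost against compensation accuracy.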
  25. The apparatus according to any one of claims 19 to 24, wherein each of the N first image blocks has the same size.
  26. The apparatus according to claim 25, wherein when the apparatus is used to process first images of different sizes, the size of the first image block is a fixed value.
  27. A decoding apparatus, comprising:
    an acquisition module, configured to acquire N first encoded representations, N first adaptive data, and a correspondence, wherein the correspondence includes the correspondence between the N first adaptive data and the N first encoded representations, the N first adaptive data are in one-to-one correspondence with the N first codes, and N is an integer greater than 1;
    an entropy decoding module, configured to perform entropy decoding on the N first encoded representations to obtain N groups of second feature maps;
    a decoding neural network module, configured to process the N groups of second feature maps to obtain N first reconstructed image blocks;
    a compensation module, configured to compensate the N first reconstructed image blocks by using the N first adaptive data;
    a combining module, configured to combine the compensated N first reconstructed image blocks to obtain a second image.
  28. The apparatus according to claim 27, wherein the N first encoded representations are obtained by quantization and entropy encoding of N groups of first feature maps, the N groups of first feature maps are obtained by processing N preprocessed first image blocks with an encoding neural network, the N preprocessed first image blocks are obtained by preprocessing N first image blocks with the N first adaptive data, the N first adaptive data are obtained from the N first image blocks, and the N first image blocks are obtained by segmenting a first image.
  29. The apparatus according to claim 27 or 28, wherein the larger N is, the smaller the information entropy of a single first adaptive quantization data.
  30. The apparatus according to any one of claims 27 to 29, wherein the apparatus further comprises:
    a fusion neural network module, configured to process the second image to obtain a third image, so as to reduce the difference between the second image and the first image, the difference including blocking artifacts.
  31. An image processing device, comprising: a non-volatile memory and a processor coupled to each other, wherein the processor calls program code stored in the memory to perform the method according to any one of claims 1 to 18.
PCT/CN2021/101807 2020-07-30 2021-06-23 Image processing method and related device WO2022022176A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010754333.9 2020-07-30
CN202010754333.9A CN114066914A (en) 2020-07-30 2020-07-30 Image processing method and related equipment

Publications (1)

Publication Number Publication Date
WO2022022176A1 (en) 2022-02-03

Family

ID=80037157

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/101807 WO2022022176A1 (en) 2020-07-30 2021-06-23 Image processing method and related device

Country Status (2)

Country Link
CN (1) CN114066914A (en)
WO (1) WO2022022176A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114923855A (en) * 2022-05-12 2022-08-19 泉州装备制造研究所 Leather quality grading method
WO2023207836A1 (en) * 2022-04-26 2023-11-02 华为技术有限公司 Image encoding method and apparatus, and image decompression method and apparatus

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101978698A (en) * 2008-03-18 2011-02-16 三星电子株式会社 Method and apparatus for encoding and decoding image
US20140185665A1 (en) * 2012-12-28 2014-07-03 Qualcomm Incorporated High-frequency-pass sample adaptive offset in video coding
CN105635732A (en) * 2014-10-30 2016-06-01 联想(北京)有限公司 Adaptive sampling point compensation coding method and device, and method and device for decoding video code stream
CN111052740A (en) * 2017-07-06 2020-04-21 三星电子株式会社 Method and apparatus for encoding or decoding image
CN111405287A (en) * 2019-01-03 2020-07-10 华为技术有限公司 Prediction method and device of chrominance block
CN112822489A (en) * 2020-12-30 2021-05-18 北京博雅慧视智能技术研究院有限公司 Hardware implementation method and device for sample adaptive offset compensation filtering

Also Published As

Publication number Publication date
CN114066914A (en) 2022-02-18

Similar Documents

Publication Publication Date Title
WO2021155832A1 (en) Image processing method and related device
WO2022021938A1 (en) Image processing method and device, and neutral network training method and device
US20230336758A1 (en) Encoding with signaling of feature map data
TWI834087B (en) Method and apparatus for reconstruct image from bitstreams and encoding image into bitstreams, and computer program product
WO2022022176A1 (en) Image processing method and related device
US20230336759A1 (en) Decoding with signaling of segmentation information
US20230353764A1 (en) Method and apparatus for decoding with signaling of feature map data
WO2022179588A1 (en) Data coding method and related device
CN116547969A (en) Processing method of chroma subsampling format in image decoding based on machine learning
US20240078414A1 (en) Parallelized context modelling using information shared between patches
US11403782B2 (en) Static channel filtering in frequency domain
TWI826160B (en) Image encoding and decoding method and apparatus
WO2023174256A1 (en) Data compression method and related device
WO2022100140A1 (en) Compression encoding method and apparatus, and decompression method and apparatus
WO2023066536A1 (en) Attention based context modelling for image and video compression
WO2023172153A1 (en) Method of video coding by multi-modal processing
CN114693811A (en) Image processing method and related equipment
WO2023160835A1 (en) Spatial frequency transform based image modification using inter-channel correlation information
WO2023121499A1 (en) Methods and apparatus for approximating a cumulative distribution function for use in entropy coding or decoding data
EP4226325A1 (en) A method and apparatus for encoding or decoding a picture using a neural network
WO2024002496A1 (en) Parallel processing of image regions with neural networks – decoding, post filtering, and rdoq
WO2024002497A1 (en) Parallel processing of image regions with neural networks – decoding, post filtering, and rdoq

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 21851415; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: pct application non-entry in european phase. Ref document number: 21851415; Country of ref document: EP; Kind code of ref document: A1