WO2022022176A1 - Image processing method and related device

Image processing method and related device

Info

Publication number: WO2022022176A1
Application number: PCT/CN2021/101807
Authority: WIPO (PCT)
Prior art keywords: image, data, pixels, image blocks, adaptive
Other languages: English (en), Chinese (zh)
Inventors: 赵政辉 (Zhao Zhenghui), 马思伟 (Ma Siwei), 王晶 (Wang Jing)
Original Assignees: 华为技术有限公司 (Huawei Technologies Co., Ltd.), 北京大学 (Peking University)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.) and 北京大学 (Peking University)
Publication of WO2022022176A1

Classifications

    • G06N3/04 Neural networks; architecture, e.g. interconnection topology
    • G06N3/045 Neural networks; combinations of networks
    • G06N3/08 Neural networks; learning methods
    • G06N3/084 Neural networks; backpropagation, e.g. using gradient descent
    • G06T3/4023 Scaling of whole images or parts thereof, based on decimating or inserting pixels or lines of pixels
    • G06T3/4092 Image resolution transcoding, e.g. by using client-server architectures
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T5/60 Image enhancement or restoration using machine learning, e.g. neural networks
    • G06T5/94 Dynamic range modification based on local image properties, e.g. for local contrast enhancement
    • G06T7/11 Image analysis; segmentation; region-based segmentation
    • G06T7/13 Image analysis; segmentation; edge detection
    • G06T2207/20021 Special algorithmic details; dividing image into blocks, subimages or windows
    • G06T2207/20081 Special algorithmic details; training; learning
    • G06T2207/20084 Special algorithmic details; artificial neural networks [ANN]

Definitions

  • the multiple pixels are obtained by setting the resolution of the camera component through a setting interface in the camera application.
  • the arrangement order of the N first encoded representations is the same as the arrangement order of the N first image blocks, the arrangement order of the N first image blocks is their arrangement order in the first image, and the correspondence includes the arrangement order of the N first encoded representations and the arrangement order of the N first image blocks.
  • if the remainder is less than a/2, the first image is filled with the pixel median value on both sides in the width direction, and the width of the pixel median value filled on each side is g/2, where g is the remainder.
  • a third aspect of the present application provides a model training method, the method comprising:
  • the preprocessed N first image blocks are processed by the first coding neural network to obtain N groups of first feature maps;
  • the N groups of second feature maps are processed by the first decoding neural network to obtain N first reconstructed image blocks;
  • the method further includes:
  • quantizing the N first adaptive data to obtain N first adaptive quantization data, where the N first adaptive quantization data are used to compensate the N first reconstructed image blocks;
  • Obtaining the distortion loss of the second image relative to the first image includes:
  • each of the N first image blocks has the same size.
  • the target resolution is obtained by setting the resolution of the camera component according to a setting interface in a camera application.
  • the target resolution is obtained according to a target image group in a gallery of images obtained by the imaging component; the pixels of the target image group are the target pixels, and among image groups of different pixels, the target image group accounts for the largest proportion of the gallery.
  • images of multiple pixels are obtained by the imaging component, the multiple pixels being e×f, where e/a is equal to an integer and f/b is equal to an integer, the e includes the c, and the f includes the d.
  • the plurality of pixels are obtained by setting the resolution of the imaging component through a setting interface in the imaging application.
  • the pixels of the first image block are a×b, where a is the number of pixels in the width direction and b is the number of pixels in the height direction, and the pixels of the first image are r×t;
  • the r and the t are proportionally enlarged to obtain the first image whose pixels are r2×t2, where t2/b is equal to an integer;
  • filling the edge of the first image with the median value of pixels includes:
  • if the remainder is less than a/2, the pixel median value is filled on both sides of the first image in the width direction, so that the width of the pixel median value filled on each side is g/2, where g is the remainder.
  • the N first image blocks include a first target image block, and the range of pixel values of the first target image block is smaller than the range of pixel values of the first image;
  • the method further includes:
  • the one piece of first adaptive data is obtained from the inverse-quantized first target image block.
  • a fourth aspect of the present application provides an encoding device, the device comprising:
  • a first acquisition module configured to acquire a first image
  • a segmentation module used for segmenting the first image to obtain N first image blocks, where N is an integer greater than 1;
  • a second acquisition module configured to acquire N first adaptive data from N first image blocks, where N first adaptive data corresponds to N first image blocks one-to-one;
  • a preprocessing module configured to preprocess the N first image blocks according to the N first adaptive data
  • the coding neural network module processes the preprocessed N first image blocks through the coding neural network to obtain N groups of first feature maps
  • the quantization and entropy encoding module is configured to perform quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations.
  • the N first encoded representations are used for entropy decoding to obtain N sets of second feature maps, and the N sets of second feature maps are used for processing through a decoding neural network to obtain N first reconstructed image blocks.
  • the N first adaptive data are used to compensate the N first reconstructed image blocks, and the compensated N first reconstructed image blocks are used to combine into a second image.
  • the device further includes:
  • the sending module is configured to send the N first encoded representations, the N first adaptive data, and the correspondence to the decoding end, where the correspondence includes the correspondence between the N first adaptive data and the N first encoded representations.
  • the device further includes:
  • a quantization module configured to quantize the N first adaptive data to obtain the N first adaptive quantized data, and the N first adaptive quantized data is used to compensate the N first reconstructed image blocks;
  • the larger N is, the smaller the information entropy of the single first adaptive quantization data is.
  • the second image is processed by a fusion neural network to obtain the third image, and the fusion neural network is used to reduce the difference between the second image and the first image, the difference including block effect.
  • the target resolution is obtained by setting the resolution of the camera component according to the setting interface in the camera application.
  • the target resolution is obtained according to the target image group in the gallery of images obtained by the imaging component; the pixels of the target image group are the target pixels, and among image groups of different pixels, the target image group accounts for the largest proportion of the gallery.
  • images of multiple pixels are obtained by the imaging component, the multiple pixels being e×f, where e/a is equal to an integer and f/b is equal to an integer, e includes c, and f includes d.
  • the plurality of pixels are obtained by setting the resolution of the camera component through a setting interface in the camera application.
  • the pixels of the first image block are a ⁇ b, a is the number of pixels in the width direction, b is the number of pixels in the height direction, and the pixels of the first image are r ⁇ t.
  • the device also includes:
  • a padding module configured to, if r/a is not equal to an integer and/or t/b is not equal to an integer, fill the edges of the first image with the pixel median value such that r1/a is equal to an integer and t1/b is equal to an integer, where the pixels of the first image after filling are r1×t1.
  • the device further includes:
  • an amplification module configured to, if t/b is not equal to an integer, proportionally enlarge the r and the t to obtain the first image whose pixels are r2×t2, where t2/b is equal to an integer;
  • the padding module is specifically configured to, if r2/a is not equal to an integer, fill the edges of the first image with the pixel median value.
  • the second obtaining unit is further configured to, if r2/a is not equal to an integer, obtain the remainder;
  • the filling module is specifically configured to, if the remainder is greater than a/2, fill the pixel median value on only one side of the first image in the width direction.
  • the filling module is specifically configured to, if the remainder is less than a/2, fill the pixel median value on both sides of the first image in the width direction, so that the width of the pixel median value filled on each side is g/2, where g is the remainder.
  • the N first image blocks include a first target image block, and the range of pixel values of the first target image block is smaller than the range of pixel values of the first image.
  • the device also includes:
  • an inverse quantization module configured to inversely quantize the pixel values of the first target image block;
  • the second obtaining module is specifically configured to obtain a piece of first adaptive data from the inverse quantized first target image block.
  • a fifth aspect of the present application provides a decoding device, the decoding device comprising:
  • the acquisition module is configured to acquire N first encoded representations, N first adaptive data, and a correspondence, where the correspondence includes the correspondence between the N first adaptive data and the N first encoded representations, the N first adaptive data are in one-to-one correspondence with the N first encoded representations, and N is an integer greater than 1;
  • the entropy decoding module performs entropy decoding on the N first encoded representations to obtain N groups of second feature maps;
  • a decoding neural network module for processing N groups of second feature maps to obtain N first reconstructed image blocks
  • a compensation module configured to compensate the N first reconstructed image blocks by using the N first adaptive data
  • the combining module is used for combining the compensated N first reconstructed image blocks to obtain a second image.
  • the N first encoded representations are obtained by quantizing and entropy encoding N groups of first feature maps; the N groups of first feature maps are obtained by processing the preprocessed N first image blocks through a coding neural network; and the preprocessed N first image blocks are obtained by preprocessing the N first image blocks with the N first adaptive data.
  • the N first adaptive data are obtained from the N first image blocks, and the N first image blocks are obtained by dividing the first image.
  • the N pieces of first adaptive data are N pieces of first adaptive quantization data, and the N pieces of first adaptive quantization data are obtained by quantizing the N pieces of first adaptive data;
  • the compensation module is specifically configured to compensate the N first reconstructed image blocks by using the N first adaptive quantization data.
  • the larger N is, the smaller the information entropy of the single first adaptive quantization data is.
  • the arrangement order of the N first encoded representations is the same as the arrangement order of the N first image blocks, and the arrangement order of the N first image blocks is their arrangement order in the first image.
  • the corresponding relationship includes the arrangement order of the N first encoded representations and the arrangement order of the N first image blocks.
  • the device further includes:
  • the fusion neural network module is configured to process the second image to obtain the third image, so as to reduce the difference between the second image and the first image, where the difference includes blocking artifacts.
  • each of the N first image blocks has the same size.
  • the size of the first image block is a fixed value.
  • the pixels of the first image block are a×b, where a and b are obtained according to the target pixel; the target pixel is c×d, c/a is equal to an integer, and d/b is equal to an integer; a and c are the numbers of pixels in the width direction, and b and d are the numbers of pixels in the height direction; the target pixel is obtained according to the target resolution of the terminal device, the terminal device includes a camera component, the pixel of an image obtained by the camera component under the target resolution setting is the target pixel, and the first image is obtained by the camera component.
  • the target resolution is obtained by setting the resolution of the camera component according to the setting interface in the camera application.
  • the target resolution is obtained according to the target image group in the gallery of images obtained by the imaging component; the pixels of the target image group are the target pixels, and among image groups of different pixels, the target image group accounts for the largest proportion of the gallery.
  • images of multiple pixels are obtained by the imaging component, the multiple pixels being e×f, where e/a is equal to an integer and f/b is equal to an integer, e includes c, and f includes d.
  • the plurality of pixels are obtained by setting the resolution of the camera component through a setting interface in the camera application.
  • the pixels of the first image block are a×b, a is the number of pixels in the width direction, b is the number of pixels in the height direction, and the pixels of the first image are r×t; when r/a is not equal to an integer and/or t/b is not equal to an integer, the edges of the first image are filled with the pixel median value such that r1/a is equal to an integer and t1/b is equal to an integer, where the pixels of the first image after filling are r1×t1.
  • r and t are proportionally enlarged by the encoder to obtain the first image with pixels r2×t2, where t2/b is equal to an integer.
  • if the remainder is greater than a/2, the first image is padded with the pixel median value on only one side in the width direction.
  • if the remainder is less than a/2, the first image is filled with the pixel median value on both sides in the width direction, and the width of the pixel median value filled on each side is g/2, where g is the remainder.
  • the N first image blocks include a first target image block, the range of pixel values of the first target image block is smaller than the range of pixel values of the first image, and at least one piece of the first adaptive data is obtained from the inverse-quantized first target image block, which is obtained by inversely quantizing the pixel values of the first target image block.
  • a sixth aspect of the present application provides a training device, the device comprising:
  • a first acquisition module configured to acquire a first image
  • a segmentation module configured to segment the first image to obtain N first image blocks, where N is an integer greater than 1;
  • a second obtaining module configured to obtain N pieces of first adaptive data from the N first image blocks, and the N first adaptive data are in one-to-one correspondence with the N first image blocks;
  • a preprocessing module configured to preprocess the N first image blocks according to the N first adaptive data
  • the first coding neural network module is used to process the preprocessed N first image blocks to obtain N groups of first feature maps;
  • a quantization and entropy encoding module for performing quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations
  • a training module configured to jointly train a model by using a loss function until the image distortion value between the first image and the second image reaches a first preset level, the model including the first coding neural network, a quantization network, an entropy encoding network, an entropy decoding network, and the first decoding neural network.
  • the model further includes a segmentation network, and a trainable parameter in the segmentation network is the size of the first image block.
  • the model further includes a segmentation network, and the trainable parameter in the segmentation network is the size of the first image block;
  • the output module is used to output a second coding neural network and a second decoding neural network
  • the second coding neural network is a model obtained after the first coding neural network performs iterative training
  • the second decoding neural network is a model obtained after the first decoding neural network performs iterative training.
  • a quantization module configured to quantize the N first adaptive data to obtain N first adaptive quantization data, where the N first adaptive quantization data are used to compensate the N first reconstructed image blocks;
  • the second image is processed by a fusion neural network to obtain a third image, and the fusion neural network is used to reduce the difference between the second image and the first image, where the difference includes blocking artifacts;
  • the third obtaining module is specifically configured to obtain the distortion loss of the third image relative to the first image
  • the model includes a fusion neural network.
  • each of the N first image blocks has the same size.
  • the sizes of the first images used for training are different, and the size of the first image block is a fixed value.
  • the pixels of the first image block are a×b, where the a and the b are obtained according to a target pixel; the target pixel is c×d, c/a is equal to an integer, and d/b is equal to an integer; the a and c are the numbers of pixels in the width direction, the b and d are the numbers of pixels in the height direction, and the target pixel is obtained according to the target resolution of the terminal device.
  • the target resolution is obtained by setting the resolution of the camera component according to a setting interface in a camera application.
  • the target resolution is obtained according to a target image group in a gallery of images obtained by the imaging component; the pixels of the target image group are the target pixels, and among image groups of different pixels, the target image group accounts for the largest proportion of the gallery.
  • images of multiple pixels are obtained by the imaging component, the multiple pixels being e×f, where e/a is equal to an integer and f/b is equal to an integer, the e includes the c, and the f includes the d.
  • the plurality of pixels are obtained by setting the resolution of the imaging component through a setting interface in the imaging application.
  • the pixels of the first image block are a×b, where a is the number of pixels in the width direction and b is the number of pixels in the height direction, and the pixels of the first image are r×t;
  • the device also includes:
  • a padding module configured to, if r/a is not equal to an integer and/or t/b is not equal to an integer, fill the edges of the first image with the pixel median value such that r1/a is equal to an integer and t1/b is equal to an integer, where the pixels of the first image after filling are r1×t1.
  • the device further includes:
  • an amplification module configured to, if t/b is not equal to an integer, proportionally enlarge the r and the t to obtain the first image whose pixels are r2×t2, where t2/b is equal to an integer;
  • the padding module is specifically configured to, if r2/a is not equal to an integer, fill the edges of the first image with the pixel median value.
  • the second acquisition module is further configured to, after proportionally amplifying r and t, if r2/a is not equal to an integer, obtain the remainder;
  • the filling module is specifically configured to, if the remainder is greater than a/2, fill the pixel median value on only one side of the first image in the width direction.
  • if the remainder is less than a/2, the pixel median value is filled on both sides of the first image in the width direction, so that the width of the pixel median value filled on each side is g/2, where g is the remainder.
  • the N first image blocks include a first target image block, and the range of pixel values of the first target image block is smaller than the range of pixel values of the first image;
  • the device also includes:
  • an inverse quantization module configured to inversely quantize the pixel values of the first target image block;
  • the second obtaining module is specifically configured to obtain one piece of first adaptive data from the inverse-quantized first target image block.
  • a seventh aspect of the present application provides an encoding device, which may include a memory, a processor, and a bus system, wherein the memory is used to store a program, and the processor is used to execute the program in the memory, including the following steps:
  • acquiring N first adaptive data from N first image blocks, where the N first adaptive data correspond to the N first image blocks one-to-one;
  • the processor may also be configured to execute the steps performed by the encoding end in each possible implementation manner of the first aspect. For details, refer to the first aspect, which will not be repeated here.
  • Entropy decoding is performed on the N first encoded representations to obtain N groups of second feature maps;
  • the N groups of second feature maps are processed by the decoding neural network to obtain N first reconstructed image blocks;
  • the compensated N first reconstructed image blocks are combined to obtain a second image.
  • the decoding device is a virtual reality VR device, a mobile phone, a tablet, a laptop computer, a server, or a smart wearable device.
  • the processor may also be configured to execute the steps performed by the decoding end in each possible implementation manner of the second aspect, and details can be found in the second aspect, which will not be repeated here.
  • a ninth aspect of the present application provides a training device, which may include a memory, a processor, and a bus system, wherein the memory is used to store a program, and the processor is used to execute the program in the memory, including the following steps:
  • the first encoding neural network, the quantization network, the entropy encoding network, the entropy decoding network, and the first decoding neural network are jointly trained by using the loss function, until the image distortion value between the first image and the second image reaches the first preset level;
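The patent does not give a concrete loss function; as a minimal sketch only, a joint rate-distortion objective of the common form L = R + λ·D could look as follows, where `bits_per_pixel` and `lam` are hypothetical names standing in for the rate estimate and the trade-off weight:

```python
import torch

def joint_rd_loss(first_image, second_image, bits_per_pixel, lam=0.01):
    # Distortion term D: mean squared error between the reconstructed
    # second image and the original first image.
    distortion = torch.mean((second_image - first_image) ** 2)
    # Rate term R: estimated length of the entropy-coded representation.
    # L = R + lambda * D trades off code rate against reconstruction quality.
    return bits_per_pixel + lam * distortion

# Joint training (illustrative): repeat forward pass -> loss -> backward pass
# until the distortion value reaches the first preset level, e.g.
#   loss = joint_rd_loss(x, model(x), bpp_estimate)
#   loss.backward(); optimizer.step(); optimizer.zero_grad()
```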
  • an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium; when it runs on a computer, the computer is enabled to execute any one of the image processing methods described in the first to third aspects.
  • the present application provides a chip system
  • the chip system includes a processor for supporting an execution device or a training device to implement the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods.
  • the chip system further includes a memory for storing the program instructions and data necessary for the execution device or the training device.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • FIG. 1 is a schematic structural diagram of an artificial intelligence main framework.
  • FIG. 2b is a schematic diagram of another application scenario of the embodiment of the present application.
  • FIG. 2c is a schematic diagram of another application scenario of the embodiment of the present application.
  • FIG. 3b is another schematic flowchart of the image processing method provided by the embodiment of the present application.
  • FIG. 4 is a schematic diagram of dividing and combining images in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a CNN-based image encoding processing process in an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a CNN-based image decoding process in an embodiment of the application.
  • FIG. 7 is a schematic diagram of a setting interface for setting a resolution of a camera of a terminal device in an embodiment of the present application
  • FIG. 12 is a schematic flowchart of a model training method provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of a training process provided by an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of an encoding device provided by an embodiment of the present application.
  • FIG. 15 is a schematic structural diagram of a decoding apparatus provided by an embodiment of the present application.
  • FIG. 16 is a schematic structural diagram of a training device provided by an embodiment of the application.
  • FIG. 17 is a schematic structural diagram of an execution device provided by an embodiment of the present application.
  • FIG. 18 is a schematic structural diagram of a training device provided by an embodiment of the present application.
  • FIG. 19 is a schematic structural diagram of a chip provided by an embodiment of the present application.
  • Figure 1 is a structural schematic diagram of the main frame of artificial intelligence.
  • the above-mentioned artificial intelligence main framework is elaborated along two dimensions.
  • the "intelligent information chain” reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, data has gone through the process of "data-information-knowledge-wisdom".
  • the "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure and information (provision and processing of technical implementations) to the industrial ecology of the system.
  • the data on the upper layer of the infrastructure is used to represent the data sources in the field of artificial intelligence.
  • the data involves graphics, images, voice, and text, as well as IoT data from traditional devices, including business data from existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making, etc.
  • the image compression method provided by the embodiment of the present application can be applied to an image compression process in a terminal device, and specifically, can be applied to an album, video monitoring, etc. on the terminal device.
  • FIG. 2a is a schematic diagram of an application scenario of an embodiment of the present application.
  • a terminal device may acquire an image to be compressed, where the image to be compressed may be a photo taken by a camera component or a frame captured from a video, and the camera component is generally a camera.
  • the terminal device divides the image to be compressed through a central processing unit (CPU) to obtain multiple image blocks.
  • the CPU can obtain and load the above saved file in the corresponding storage location, and obtain the decoded feature map based on entropy decoding.
  • the decoding neural network reconstructs the feature map to obtain multiple reconstructed image blocks. After obtaining the multiple reconstructed image blocks, the terminal device combines them through the CPU to obtain a reconstructed image.
  • the image compression method provided by the embodiment of the present application can be applied to an image compression process in the cloud, and specifically, can be applied to functions such as a cloud album on a cloud device, and the cloud device can be a cloud server.
  • FIG. 2b is a schematic diagram of another application scenario of an embodiment of the present application.
  • a terminal device may acquire an image to be compressed, and the image to be compressed may be a photo taken by a camera component or a frame captured from a video.
  • the terminal device can perform entropy encoding on the to-be-compressed picture through the CPU to obtain encoded data.
  • any lossless compression method based on the prior art can also be used.
  • the image compression method provided by the embodiments of the present application can be applied to image compression of terminal devices and image decompression processes of cloud devices. Specifically, it can be applied to functions such as cloud albums on cloud devices.
  • the cloud device can be a cloud server.
  • FIG. 2c is a schematic diagram of another application scenario of an embodiment of the present application.
  • a terminal device may acquire an image to be compressed, where the image to be compressed may be a photo taken by a camera component or a frame captured from a video.
  • the terminal device can perform entropy encoding on the to-be-compressed picture through the CPU to obtain encoded data.
  • a neural network can be composed of neural units, and a neural unit can refer to an operation unit that takes x_s and an intercept of 1 as inputs, and the output of the operation unit can be: h_{W,b}(x) = f(W^T x) = f(Σ_s W_s·x_s + b), where:
  • Ws is the weight of Xs
  • b is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
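A minimal numerical sketch of the neural unit just described, assuming the reconstructed formula h = f(Σ_s W_s·x_s + b) above and taking the sigmoid as the activation function f:

```python
import numpy as np

def sigmoid(z):
    # Sigmoid activation f, converting the weighted input signal into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def neural_unit(xs, ws, b):
    # Output of a single neural unit: f(sum_s W_s * x_s + b).
    return sigmoid(np.dot(ws, xs) + b)

# Example: three inputs xs with weights Ws and bias b.
y = neural_unit(np.array([0.5, -1.0, 2.0]), np.array([0.1, 0.4, -0.3]), 0.2)
```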
  • a neural network is a network formed by connecting a plurality of the above single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field can be an area composed of several neural units.
  • a deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with multiple hidden layers.
  • the layers inside the DNN can be divided into three categories according to their positions: input layer, hidden layers, and output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and the middle layers are all hidden layers.
  • the layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
  • the input layer does not have a W parameter.
  • more hidden layers allow the network to better capture the complexities of the real world.
  • a model with more parameters is more complex and has a larger "capacity", which means that it can complete more complex learning tasks.
  • Training the deep neural network is the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (the weight matrix formed by the vectors W of many layers).
  • the convolution kernel can be initialized in the form of a matrix of random size, and the convolution kernel can obtain reasonable weights by learning during the training process of the convolutional neural network.
  • the immediate benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
  • the neural network can use the error back propagation (BP) algorithm to correct the size of the parameters in the initial neural network model during the training process, so that the reconstruction error loss of the neural network model becomes smaller and smaller.
  • the input signal is passed forward until the output generates an error loss, and the parameters in the initial neural network model are updated by back-propagating the error loss information, so that the error loss converges.
  • the back-propagation algorithm is a back-propagation movement dominated by error loss, aiming to obtain the parameters of the optimal neural network model, such as the weight matrix.
  • the terminal device may be a mobile phone, a tablet, a notebook computer, a smart wearable device, or the like.
  • the terminal device may be a virtual reality (virtual reality, VR) device.
  • the embodiments of the present application can also be applied to intelligent monitoring, in which a camera can be configured, and the intelligent monitoring can obtain the pictures to be compressed through the camera. It should be understood that the embodiments of the present application can also be applied to other scenarios that require image compression, which will not be listed one by one here.
  • FIG. 3a is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • step 301 the terminal device acquires a first image.
  • the terminal device can obtain the first image, where the first image can be a photo taken by the camera component or a frame captured from the captured video, the terminal device includes the camera component, and the camera component is generally a camera.
  • the first image may also be an image obtained by the terminal device from the network, or an image obtained by the terminal device using a screen capture tool.
  • FIG. 3b is another schematic flowchart of the image processing method provided by the embodiment of the present application.
  • FIG. 3b illustrates the whole process from the input of the first image to the output of the third image.
  • step 302 the terminal device sends the first image to the cloud device.
  • before the terminal device sends the first image to the cloud device, the terminal device may perform lossless encoding on the first image to obtain encoded data.
  • the encoding method can be entropy encoding, or other lossless compression methods.
  • step 303 the cloud device divides the first image to obtain N first image blocks.
  • the cloud device may receive the first image sent by the terminal device. If the first image undergoes lossless encoding by the terminal device, the cloud device also needs to perform lossless decoding on it.
  • the cloud device divides the first image to obtain N first image blocks, where N is an integer greater than 1.
  • FIG. 4 is a schematic diagram of dividing and combining images in an embodiment of the present application. As shown in FIG. 4, the first image 401 is divided into 12 first image blocks. When the size of the first image is determined, the size of the first image block determines the value of N.
  • the N being 12 described here is just an example, and the size of the first image block will be described in detail in the subsequent description.
  • each of the N first image blocks has the same size.
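As an illustration of the division in FIG. 4, here is a minimal sketch that splits an image into equally sized blocks, assuming the image dimensions are already integer multiples of the block size (the padding described later handles the other cases); `block_h` and `block_w` are illustrative names:

```python
import numpy as np

def divide_image(image, block_h, block_w):
    # Split an H x W (x C) image into equally sized first image blocks,
    # ordered left to right, top to bottom (their order in the first image).
    h, w = image.shape[:2]
    assert h % block_h == 0 and w % block_w == 0
    blocks = []
    for y in range(0, h, block_h):
        for x in range(0, w, block_w):
            blocks.append(image[y:y + block_h, x:x + block_w])
    return blocks  # N = (h // block_h) * (w // block_w) blocks
```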
  • step 304 the cloud device obtains M first mean values from the N first image blocks.
  • if the first image is a three-channel image, each first image block includes data of three channels, and the number M of the first mean values obtained by the cloud device is equal to 3N.
  • if the first image is a grayscale image, that is, a one-channel image, each first image block includes data of one channel, and the number M of the first mean values obtained by the cloud device is equal to N. Because the processing of each channel is similar, for convenience of description, only one channel is used as an example in this embodiment of the present application.
  • the mean refers to the mean of the pixel values of all the pixels in the first image block.
  • step 305 the cloud device preprocesses the N first image blocks by using the N first averages.
  • the preprocessing may be to subtract the mean value from the pixel value of each pixel point in the first image block to obtain N first image blocks after preprocessing.
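A minimal sketch of steps 304-305, assuming one-channel blocks as in the text: the first adaptive data of each block is taken to be its pixel mean, and preprocessing subtracts that mean from every pixel:

```python
import numpy as np

def preprocess(blocks):
    # For each first image block, obtain its first mean value, then subtract
    # that mean from the pixel value of every pixel in the block.
    means = [float(np.mean(b)) for b in blocks]           # N first mean values
    centered = [b.astype(np.float32) - m for b, m in zip(blocks, means)]
    return centered, means  # means are later sent along for compensation
```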
  • step 306 the pre-processed N first image blocks are processed through an encoding neural network to obtain N sets of first feature maps.
  • the coding neural network is a CNN
  • the terminal device may perform feature extraction on the preprocessed N first image blocks based on the CNN to obtain N groups of first feature maps.
  • Each set of first feature maps corresponds to one first block, and each set of first feature maps includes at least one feature map.
  • the first feature map may also be referred to as a channel feature map image, wherein each semantic channel corresponds to a first feature map.
  • the CNN 502 can multiply the upper-left 3×3 pixels of the input data (the first image block) by the weights and map them to the neurons at the upper-left end of the first feature map.
  • the weights to be multiplied will also be 3×3.
  • the CNN 502 scans the input data (the first image block) from left to right and top to bottom, multiplying by the weights to map the neurons of the feature map.
  • the 3×3 weights used are called filters or filter kernels.
  • the process of applying filters in CNN 502 is the process of performing convolution operations using filter kernels, and the extracted results are called "channel feature maps"; a channel feature map can also be called a multi-channel feature map image, where the term "multi-channel feature map image" may refer to a set of feature map images corresponding to multiple channels.
  • the channel feature map may be generated by CNN 502, also referred to as a "feature extraction layer” or “convolutional layer” of a CNN.
  • the layers of a CNN can define the mapping of output to input.
  • the mapping defined by the layers is performed as one or more filter kernels (convolution kernels) to be applied to the input data to generate channel feature maps to be output to the next layer.
  • the input data can be the first image block or the channel feature map output by a previous layer of CNN 502.
  • CNN 502 receives a first image block 501 as input and generates a channel feature map 503 as output. During forward execution, the next CNN layer receives the channel feature map 503 as input and generates the channel feature map of the next layer as output. Each subsequent layer then receives the channel feature map generated by the previous layer as input and generates the next channel feature map as output. Finally, a set of first feature maps 504 generated in the X1-th layer is obtained, where X1 is an integer greater than 1; that is, the channel feature maps of the above layers may be used as the set of first feature maps 504.
  • the cloud device repeats the above operations for each first block, so as to obtain N groups of first feature maps.
  • the length and width of each feature map in the multi-channel feature map image gradually decrease, and the number of semantic channels in the multi-channel feature map image gradually increases, so as to realize data compression of the first image block.
  • processing operations can be performed in addition to the operation of applying convolution kernels that map input feature maps to output feature maps.
  • Examples of other processing operations may include, but are not limited to, applications such as activation functions, pooling, resampling, and the like.
  • in the activation function (GDN), u represents the j-th channel of the output of the i-th convolutional layer, v represents the corresponding output of the activation function, and α and β are trainable parameters of the activation function, used to enhance the nonlinear expression ability of the neural network.
  • the first image block is transformed into another space (at least one first feature map) through the CNN convolutional neural network.
  • the number of first feature maps is 192, that is, the number of semantic channels is 192, and each semantic channel corresponds to a first feature map.
  • at least one first feature map may be in the form of a three-dimensional tensor, and its size may be 192×w×h, where w and h are the width and height of the matrix corresponding to the first feature map of a single channel.
  • step 307 the N groups of first feature maps are quantized and entropy encoded to obtain N first encoded representations.
  • the N groups of first feature maps are converted to a quantization center according to a specified rule, so that entropy coding can be performed subsequently.
  • the quantization operation may convert the N sets of first feature maps from floating point numbers into bit streams (eg, bit streams using specific bit integers such as 8-bit integers or 4-bit integers).
  • the quantization operation may be performed on the N sets of first feature maps using rounding, but is not limited thereto.
  • the probability estimation of each point in the output feature can be obtained by using an entropy estimation network, and the output feature is entropy encoded by using the probability estimation to obtain a binary code stream.
  • existing entropy coding technology can be used for the entropy encoding process mentioned in this application, which is not repeated here.
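A minimal sketch of step 307 under the rounding-based quantization mentioned above; since the entropy coder itself is existing technology, only the ideal code length implied by the probability estimates is computed here, and `prob` is a hypothetical symbol-to-probability mapping standing in for the entropy estimation network:

```python
import numpy as np

def quantize(feature_maps):
    # Round the floating-point feature maps to the nearest quantization
    # center (here: the nearest integer) so they can be entropy encoded.
    return np.rint(feature_maps).astype(np.int8)

def ideal_code_length_bits(symbols, prob):
    # With a probability estimate prob(s) for each quantized symbol s, the
    # ideal entropy-coded length is -sum(log2 prob(s)); the actual binary
    # code stream would be produced by an existing entropy coder.
    return float(-np.sum(np.log2([prob[int(s)] for s in symbols.flatten()])))
```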
  • step 308 the cloud device sends the N first encoded representations, the N first mean values and the corresponding relationship to the terminal device.
  • the terminal device stores the first image in the cloud device. If the terminal device needs to acquire the first image, it can send a request to the cloud device. After the cloud device receives the request sent by the terminal device, the cloud device sends N first encoded representations, N first mean values and corresponding relationships to the terminal device.
  • the correspondence relationship refers to the correspondence relationship between the N first coded representations and the N first mean values.
  • step 310 the terminal device processes N groups of second feature maps through a decoding neural network to obtain N first reconstructed image blocks.
  • the transposed CNN 602 receives a set of second feature maps 601 and generates a reconstructed feature map 603 as output.
  • the next-layer transposed CNN receives the reconstructed feature map 603 as an input, and generates a reconstructed feature map of the next layer as an output.
  • Each subsequent transposed CNN layer will then receive the reconstructed feature map generated in the previous layer and generate the next reconstructed feature map as output.
  • the first reconstructed image block 604 generated in the X2-th layer is obtained, where X2 is an integer greater than 1; that is, the reconstructed feature maps of the above layers may be used as the first reconstructed image block 604.
  • the cloud device repeats the above operations for each set of second feature maps, so as to obtain N first reconstructed image blocks.
  • the activation function here is iGDN (inverse generalized divisive normalization). In the iGDN activation function, v represents the j-th channel of the output of the i-th convolutional layer, u represents the corresponding output of the activation function, and α and β are trainable parameters of the activation function, used to enhance the nonlinear expression ability of the neural network.
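The patent does not spell out the GDN/iGDN formulas (it only names the trainable parameters); a common form from the image-compression literature (Ballé et al.), shown for a single spatial position with channel vector u, is sketched below as an assumption, with `beta` and `gamma` as the trainable parameters:

```python
import numpy as np

def gdn(u, beta, gamma):
    # Generalized divisive normalization across channels at one position:
    # v_j = u_j / sqrt(beta_j + sum_k gamma[j, k] * u_k ** 2)
    return u / np.sqrt(beta + gamma @ (u ** 2))

def igdn(v, beta, gamma):
    # Inverse GDN used on the decoding side: multiply instead of divide.
    return v * np.sqrt(beta + gamma @ (v ** 2))

# u, v: channel vectors (C,); beta: (C,) offsets; gamma: (C, C) weights.
```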
  • the cloud device sends the N first mean values and the corresponding relationship to the terminal device.
  • the terminal device compensates the N first reconstructed image blocks by using the N first mean values through the corresponding relationship. Compensation refers to adding a first mean value to the pixel value of each pixel in the first reconstructed image block to obtain a compensated first reconstructed image block.
  • N first reconstructed picture blocks after compensation can be obtained.
  • when the terminal device receives N first quantized mean values from the cloud device, the terminal device compensates the N first reconstructed image blocks by using the N first quantized mean values. It should be noted that when the terminal device compensates the N first reconstructed image blocks with the N first quantized mean values, the cloud device will also have preprocessed the N first image blocks with the N first quantized mean values.
  • step 312 the terminal device combines the compensated N first reconstructed picture blocks to obtain a second image.
  • the combination is the inverse process of the division: the N first image blocks are replaced with the N first reconstructed image blocks, and then the N first reconstructed image blocks are combined.
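A minimal sketch of steps 311-312, mirroring the preprocessing sketch above: each first mean value is added back to its reconstructed block, and the blocks are stitched together in their original order; `blocks_per_row` is an illustrative name:

```python
import numpy as np

def compensate_and_combine(recon_blocks, means, blocks_per_row):
    # Compensation: add each block's first mean value back to the pixel
    # value of every pixel; combination: inverse of the division step.
    compensated = [b + m for b, m in zip(recon_blocks, means)]
    rows = [np.concatenate(compensated[i:i + blocks_per_row], axis=1)
            for i in range(0, len(compensated), blocks_per_row)]
    return np.concatenate(rows, axis=0)  # the second image
```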
  • the embodiments of the present application enhance the performance of each first image block by highlighting its local characteristics, but this also easily causes blockiness between adjacent first reconstructed image blocks.
  • blockiness refers to a discontinuity phenomenon at the boundaries between adjacent first reconstructed image blocks, forming a defect in the reconstructed image.
  • the fusion neural network is a CNN. Referring to FIG. 5 and FIG. 6, in terms of structure, the fusion neural network can be a combination of the encoding neural network and the decoding neural network: taking the output 504 in FIG. 5 as the input 601 in FIG. 6, and taking the second image as the input 501 in FIG. 5, the output of FIG. 6 is the third image. Through the fusion neural network, the blocking artifacts in the second image can be eliminated. It should be noted that this is only a simple example of the framework of a fusion neural network; in practical applications, the framework of the fusion neural network, such as the number of CNN layers, the number of transposed CNN layers, and the matrix size of each CNN layer, can be unrelated to the encoding neural network and the decoding neural network.
  • a rectified linear unit (ReLU) layer is also included; the ReLU sets the negative numbers in the feature map output by the convolution kernel to zero.
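As a minimal sketch of the fusion step under the structure just described (a hedged composition only, with `encode_fn` and `decode_fn` standing in for the FIG. 5 and FIG. 6 networks):

```python
def fuse(second_image, encode_fn, decode_fn):
    # The fusion neural network can be a combination of an encoding and a
    # decoding neural network: the second image goes in as input 501, the
    # resulting features 504 are fed in as input 601, and the output is
    # the third image with reduced blocking artifacts.
    features = encode_fn(second_image)   # FIG. 5: input 501 -> output 504
    return decode_fn(features)           # FIG. 6: input 601 -> third image
```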
  • the image processing method in the embodiment of the present application can process images of different sizes, such as a fourth image.
  • the pixels of the fourth image are different from the pixels of the first image.
  • the process of processing the fourth image by using the image processing method in the embodiment of the present application is similar to the process of processing the first image above, and details are not repeated here.
  • the cloud device divides the fourth image, and M second image blocks can be obtained.
  • the size of the second image block is the same as that of the first image block.
  • the same encoding neural network and decoding neural network are used to process the first image block and the second image block; in the processing flow, the number of convolution operations and the amount of data involved in each convolution operation are the same for the first image block and the second image block.
  • a corresponding convolution operation unit can be designed according to the number of times of the above-mentioned convolution operation and/or the number of data involved in the convolution operation each time, so that the convolution operation unit matches the processing flow.
  • the convolution operation unit matches the first image block, or the convolution operation unit matches the encoding neural network and/or the decoding neural network. The higher the matching degree between the convolution operation unit and the first image block, the smaller the number of idle multipliers and adders in the convolution operation unit during the processing flow, that is, the higher the usage efficiency of the convolution operation unit.
  • the size of the first image block not only affects the value of N, but also affects whether the image can be divided exactly into an integer number of blocks.
  • the size of the first image block is generally determined by the following two aspects.
  • the first aspect is the influence of the model on the size of the first image block.
  • the model includes an encoding neural network and a decoding neural network, and may also include a fusion neural network.
  • the influence of the model on the size of the first image block generally includes the influence on the size of the first image block when the model is trained and the influence when the model is used.
  • when the terminal device is the encoding end, as in the aforementioned first application scenario, it is more meaningful to determine the size of the first image block by the target resolution, because the target resolution indicates the pixels of the images that the terminal device may acquire in the future, that is, the pixels of the images that the encoding end will process with the image processing method of this embodiment; therefore, the model can be trained for the target pixels.
  • the target resolution is obtained by setting the resolution of the camera component according to the setting interface in the camera application.
  • the resolution obtained by the camera component can be set in the setting interface of the camera application; the resolution that has been selected in the setting interface is used as the target resolution.
  • FIG. 7 is a schematic diagram of a setting interface for setting a resolution of a camera of a terminal device in an embodiment of the present application.
  • the option 701 with a resolution of [4:3] 10MP is selected, although the option here does not specify the specific value of the target resolution.
  • from the first image obtained by shooting, it can be known that the pixels of the first image are 2736×3648, that is, the target pixels are 2736×3648.
  • the size of the first image block is determined such that 2736/a is equal to an integer and 3648/b is equal to an integer.
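Following the reconstructed padding rule above (an assumption, since the original formulas did not survive extraction), here is a minimal sketch that pads the width direction with the pixel median so that the padded width is divisible by the block width a; a one-channel image is assumed, as in the example earlier:

```python
import numpy as np

def pad_width_with_median(image, a):
    # If r / a is not an integer, fill the width direction with the pixel
    # median so the padded width r1 satisfies r1 % a == 0. Splitting the
    # fill between the two sides follows the reconstructed g/2 rule and is
    # an assumption, not a quote from the patent.
    r = image.shape[1]
    pad = (-r) % a                 # pixels needed to reach a multiple of a
    med = np.median(image)
    left, right = pad // 2, pad - pad // 2
    return np.pad(image, ((0, 0), (left, right)), constant_values=med)
```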
  • any first image block can be understood as a local characteristic of the first image, and by highlighting the local characteristics of the first image, the reconstruction quality of the image, that is, the compression quality of the image, can be improved.
  • FIG. 10 is a schematic diagram for comparison of image compression quality in an embodiment of the present application.
  • the abscissa represents the number of bits per pixel (bit-per-pixel, BPP), which is used to measure the code rate.
  • the ordinate represents the peak signal-to-noise ratio (PSNR), which is used to measure the quality.
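As a minimal sketch of the two axes of FIG. 10, assuming 8-bit images:

```python
import numpy as np

def bpp(code_length_bits, height, width):
    # Bits per pixel (BPP): total code stream length divided by pixel count.
    return code_length_bits / (height * width)

def psnr(original, reconstructed, max_val=255.0):
    # Peak signal-to-noise ratio in dB, measuring reconstruction quality.
    mse = np.mean((original.astype(np.float64) - reconstructed) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```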
  • the compression algorithms compared with the image processing methods in the embodiments of the present application include different implementations of the JPEG2000, HEVC (high efficiency video coding) and VVC (versatile video coding) standards.
  • for JPEG2000, the reference software OpenJPEG is used to represent its compression performance.
  • the JPEG2000 implementation integrated in Matlab is used as a supplement to the compression performance of JPEG2000.
  • for HEVC, the reference software HM-16.15 is used to reflect its rate-distortion (RD) performance.
  • the performance of the VVC standard is represented using the VVC reference software VTM-6.2.
  • the input image bit depth and the internal bit depth are set to 8 to be compatible with the format of the input image, and the test images are encoded using the all-intra (AI) configuration.
  • the rate-distortion performance of various compression algorithms is shown in Figure 10.
  • the rate-distortion performance curve of OpenJPEG is 1001
  • the rate-distortion performance curve implemented by Matlab of JPEG2000 is 1002
  • the performance curve of 420 image format compression of the reference software HM-16.15 is 1003
  • the performance curve of the convolutional neural network image compression algorithm without block division is 1004
  • the performance curve of the present invention is 1005
  • the performance curve of the 420 image format compression of the reference software VTM-6.2 is 1006.
  • the database 230 stores the first image collection, and optionally, the database 230 further includes a fourth image collection.
  • the training device 220 generates a target model/rule 201 for processing the first image and/or the fourth image, and uses the first images and/or the fourth images in the database to iteratively train the target model/rule 201 to obtain a mature target model/rule 201.
  • the target model/rule 201 includes an encoding neural network and a decoding neural network.
  • the target model/rule 201 further includes a fusion neural network.
  • the encoding neural network and decoding neural network obtained by the training device 220 can be applied to different systems or devices, such as mobile phones, tablets, laptops, VR devices, monitoring systems, and so on.
  • the execution device 210 may call data, codes, etc. in the data storage system 250 , and may also store data, instructions, etc. in the data storage system 250 .
  • the data storage system 250 may be placed in the execution device 210 , or the data storage system 250 may be an external memory relative to the execution device 210 .
  • the calculation module 211 receives the first image sent by the client device 240, divides the first image to obtain N first image blocks, extracts N pieces of first adaptive data from the N first image blocks, preprocesses the N first image blocks with the N pieces of first adaptive data, performs feature extraction on the preprocessed N first image blocks through the coding neural network to obtain N groups of first feature maps, and performs quantization and entropy encoding to obtain N first encoded representations, where N is an integer greater than 1.
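A minimal sketch of this encoding-side flow is given below, assuming per-block means as the first adaptive data and a caller-supplied coding network; the actual adaptive data, network architecture, and entropy coder are those defined by the embodiments, not by this sketch.

```python
# Hedged sketch of the encoding-side flow of the calculation module:
# divide -> extract adaptive data -> preprocess -> coding network ->
# quantize (entropy coding of the symbols is omitted here).
import torch

def encode_image(x, coding_net, block=128):
    """x: (1, C, H, W) tensor with H and W divisible by `block`."""
    # 1. Divide the first image into N first image blocks (row-major).
    blocks = (x.unfold(2, block, block).unfold(3, block, block)
                .permute(0, 2, 3, 1, 4, 5)
                .reshape(-1, x.shape[1], block, block))
    # 2. One piece of first adaptive data per block (here: its mean).
    adaptive = blocks.mean(dim=(1, 2, 3), keepdim=True)   # (N, 1, 1, 1)
    # 3. Preprocess each block with its own adaptive data.
    preprocessed = blocks - adaptive
    # 4. Coding neural network -> N groups of first feature maps.
    features = coding_net(preprocessed)
    # 5. Quantization of the feature maps.
    symbols = torch.round(features)
    return symbols, adaptive
```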
  • the execution device 210 and the terminal device 240 may be separate devices.
  • the execution device 210 is configured with an I/O interface 212 for data interaction with the terminal device 240: a "user" may
  • input the first image to the I/O interface 212 through the terminal device 240, and
  • the execution device 210 returns the second image to the terminal device 240 through the I/O interface 212 to provide it to the user.
  • the relationship between the terminal device 240 and the execution device 210 can be described by the relationship among the terminal device, the encoding end, and the decoding end.
  • the encoding end is a device that uses an encoding neural network
  • the decoding end is a device that uses a decoding neural network.
  • the encoding end and the decoding end can be the same device or independent devices.
  • the terminal device is similar to the terminal device in the above image processing method, and the terminal device may be an encoding end and/or a decoding end.
  • for the terminal device 240 and the execution device 210, reference may be made to the foregoing related descriptions of FIGS. 2a-2c.
  • FIG. 11 is only a schematic structural diagram of an image processing system provided by an embodiment of the present invention, and the positional relationship among the devices, components, modules, etc. shown in the figure does not constitute any limitation.
  • the execution device 210 may be configured in the terminal device 240.
  • the execution device 210 may be a module in the main processor (host processor, such as the CPU) of the mobile phone or tablet for performing image processing, or the execution device 210 may be a graphics processing unit (GPU) or a neural network processor (NPU) in the mobile phone or tablet; the GPU or NPU is mounted on the main processor as a coprocessor, and the main processor assigns tasks to it.
  • GPU graphics processing unit
  • NPU neural network processor
  • FIG. 12 is a schematic flowchart of a model training method provided by an embodiment of the present application.
  • the model training method provided by the embodiment of the present application may include:
  • step 1201 the training device acquires a first image.
  • step 1202 the training device divides the first image to obtain N first image blocks, where N is an integer greater than 1.
  • step 1203 the training device obtains N pieces of first adaptive data from the N first image blocks, where the N pieces of first adaptive data correspond one-to-one to the N first image blocks.
  • step 1204 the training device preprocesses the N first tiles according to the N first adaptive data.
  • step 1205 the training device processes the preprocessed N first image blocks through the first coding neural network to obtain N groups of first feature maps.
  • step 1206 the training device performs quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations.
  • step 1207 the training device performs entropy decoding on the N first encoded representations to obtain N groups of second feature maps.
  • step 1208 the training device processes the N groups of second feature maps through the first decoding neural network to obtain N first reconstructed image blocks.
  • step 1209 the training device compensates the N first reconstructed image blocks by using the N first adaptive data.
  • step 1210 the training device combines the compensated N first reconstructed blocks to obtain a second image.
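Continuing the per-block-mean assumption from the encoding sketch above, the following sketch illustrates steps 1208 to 1210 on the reconstruction side (entropy decoding is omitted): the decoder output is compensated with the first adaptive data, and the compensated blocks are combined into the second image.

```python
# Hedged sketch of decoding, compensation, and combination; the block
# ordering matches the row-major split used in the encoding sketch.
import torch

def decode_image(symbols, adaptive, decoding_net, grid_hw, block=128):
    """grid_hw: (rows, cols) of the block grid; adaptive: (N, 1, 1, 1)."""
    recon_blocks = decoding_net(symbols)    # N first reconstructed blocks
    recon_blocks = recon_blocks + adaptive  # compensate with adaptive data
    rows, cols = grid_hw
    _, c, b, _ = recon_blocks.shape
    # Combine: (rows*cols, C, b, b) -> (1, C, rows*b, cols*b).
    img = (recon_blocks.reshape(1, rows, cols, c, b, b)
                       .permute(0, 3, 1, 4, 2, 5)
                       .reshape(1, c, rows * b, cols * b))
    return img
```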
  • FIG. 13 is a schematic diagram of a training process provided by an embodiment of the present application.
  • the loss function of the model in the embodiment takes a rate-distortion form, L = D + λ·R, where D is the distortion loss of the second image relative to the first image, R is the code rate of the N first encoded representations, and λ is a weight that balances distortion against rate.
  • the training process includes dividing the first image into blocks of different sizes in multiple iterations of training, that is, the values of N are different.
  • the size of the first block is optimized by comparing the loss functions obtained from multiple iterations.
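The loss above can be written as a short training objective. In the sketch below, the mean-squared-error distortion, the likelihood-based rate proxy, and the value of λ are illustrative assumptions, not the exact definitions of this application.

```python
# Hedged sketch of a rate-distortion training objective of the form
# L = D + lambda * R.
import torch
import torch.nn.functional as F

def rd_loss(first_image, second_image, likelihoods, lmbda=0.01):
    """likelihoods: entropy-model probabilities of the quantized symbols
    (an assumption of this sketch)."""
    distortion = F.mse_loss(second_image, first_image)
    # Rate proxy: average negative log-likelihood in bits per symbol.
    rate = -torch.log2(likelihoods).mean()
    return distortion + lmbda * rate
```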
  • the training device outputs a second encoding neural network and a second decoding neural network
  • the second encoding neural network is a model obtained by performing iterative training on the first encoding neural network
  • the second decoding neural network is a model obtained by performing iterative training on the first decoding neural network.
  • the method further includes:
  • the training device quantizes the N pieces of first adaptive data to obtain N pieces of first adaptive quantization data, where the N pieces of first adaptive quantization data are used to compensate the N pieces of first reconstructed image blocks.
  • the training device processes the second image through a fusion neural network to obtain a third image, so as to reduce the difference between the second image and the first image, where the difference includes blockiness;
  • the model includes a fusion neural network.
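As an illustration of such a fusion neural network, the sketch below uses a small residual CNN that predicts a correction to the second image to suppress blockiness; the depth and channel widths are our assumptions, not an architecture given in this application.

```python
# Hedged sketch of a fusion network: third image = second image plus a
# predicted residual correction.
import torch.nn as nn

class FusionNet(nn.Module):
    def __init__(self, channels=3, width=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, channels, 3, padding=1),
        )

    def forward(self, second_image):
        return second_image + self.body(second_image)
```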
  • each of the N first image blocks has the same size.
  • the first images used for training may have different sizes, while the size of the first image block is a fixed value.
  • the target resolution is obtained according to the target image group in the gallery obtained by the camera component; the pixels of the target image group are the target pixels, and among the image groups with different pixels, the target image group is the image group that accounts for the largest proportion of the gallery.
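A minimal sketch of selecting the target resolution from the gallery in this way follows; the input data structure is an illustrative assumption.

```python
# Group the gallery images by their pixel dimensions and take the group
# with the largest share as the target resolution.
from collections import Counter

def target_resolution(gallery_sizes):
    """gallery_sizes: iterable of (width, height) tuples for the gallery."""
    (w, h), _ = Counter(gallery_sizes).most_common(1)[0]
    return w, h

print(target_resolution([(2736, 3648), (2736, 3648), (1920, 1080)]))
# -> (2736, 3648): the image group with the largest ratio in the gallery
```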
  • the method further includes:
  • filling the edge of the first image with the median value of pixels includes:
  • if the width of the first image is not an integer multiple of the width w of the first image block, filling the pixel median value on both sides of the first image in the width direction, so that the width of the pixel median value filled on each side is (w − g)/2, where g is the remainder of dividing the width of the first image by w.
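A hedged sketch of this median filling follows; the rounding convention when w − g is odd is our assumption.

```python
# Pad the image width with the pixel median so the padded width is
# divisible by the block width w, splitting the padding between sides.
import numpy as np

def pad_width_with_median(img, w):
    """img: (H, W[, C]) array; w: block width."""
    g = img.shape[1] % w               # remainder of width / block width
    if g == 0:
        return img
    pad = w - g                        # total columns to add
    left, right = pad // 2, pad - pad // 2
    median = np.median(img)
    widths = [(0, 0), (left, right)] + [(0, 0)] * (img.ndim - 2)
    return np.pad(img, widths, mode="constant", constant_values=median)
```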
  • the N first image blocks include a first target image block, and the range of pixel values of the first target image block is smaller than the range of pixel values of the first image;
  • the method further includes:
  • a first acquisition module 1401, configured to acquire a first image
  • an entropy decoding module 1502, configured to perform entropy decoding on the N first encoded representations to obtain N groups of second feature maps;
  • a compensation module 1504, configured to compensate the N first reconstructed image blocks by using the N first adaptive data;
  • a combining module 1505, configured to combine the N first reconstructed image blocks after compensation to obtain a second image.
  • the decoding apparatus may also perform all or part of the operations performed by the terminal device in the embodiment corresponding to FIG. 3a.
  • FIG. 16 is a schematic structural diagram of a training apparatus 1600 provided by an embodiment of the present application.
  • the training apparatus 1600 includes:
  • the first acquisition module 1601 is used to acquire a first image.
  • a segmentation module 1602, configured to segment the first image to obtain N first image blocks, where N is an integer greater than 1.
  • the second obtaining module 1603 is configured to obtain N pieces of first adaptive data from the N first image blocks, where the N pieces of first adaptive data are in one-to-one correspondence with the N first image blocks.
  • the first coding neural network module 1605 is configured to process the pre-processed N first image blocks to obtain N groups of first feature maps.
  • a quantization and entropy encoding module 1606, configured to perform quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations.
  • an entropy decoding module 1607, configured to perform entropy decoding on the N first encoded representations to obtain N groups of second feature maps.
  • the first decoding neural network module 1608 is configured to process the N groups of second feature maps to obtain N first reconstructed image blocks.
  • a compensation module 1609, configured to compensate the N first reconstructed image blocks by using the N first adaptive data.
  • the combining module 1610 is configured to combine the N first reconstructed image blocks after compensation to obtain a second image.
  • the third acquiring module 1611 is configured to acquire the distortion loss of the second image relative to the first image.
  • the output module 1613 is used for outputting a second coding neural network and a second decoding neural network, where the second coding neural network is a model obtained by performing iterative training on the first coding neural network, and the second decoding neural network is a model obtained by performing iterative training on the first decoding neural network.
  • the training apparatus is further configured to perform all or part of the operations performed by the terminal device and/or the cloud device in the embodiment corresponding to FIG. 3a.
  • the N first image blocks include a first target image block, and the range of pixel values of the first target image block is smaller than the range of pixel values of the first image;
  • FIG. 17 is a schematic structural diagram of the execution device provided by the embodiment of the present application.
  • the execution device 1700 may specifically be a virtual reality (VR) device, a mobile phone, a tablet, a laptop, a smart wearable device, a monitoring data processing device, a server, etc., which is not limited here.
  • the encoding apparatus described in the embodiment corresponding to FIG. 14 and/or the decoding apparatus described in the embodiment corresponding to FIG. 15 may be deployed on the execution device 1700 to implement the functions of the apparatuses in the embodiments corresponding to FIG. 14 and/or FIG. 15.
  • the execution device 1700 includes: a receiver 1701, a transmitter 1702, a processor 1703, and a memory 1704 (the number of processors 1703 in the execution device 1700 may be one or more; one processor is taken as an example in FIG. 17), where the processor 1703 may include an application processor 17031 and a communication processor 17032.
  • the receiver 1701, the transmitter 1702, the processor 1703, and the memory 1704 may be connected by a bus or otherwise.
  • the processor 1703 may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory 1704, and the processor 1703 reads the information in the memory 1704, and completes the steps of the above method in combination with its hardware.
  • the receiver 1701 can be used to receive input numerical or character information, and to generate signal input related to performing relevant settings and function control of the device.
  • the transmitter 1702 can be used to output digital or character information through the first interface; the transmitter 1702 can also be used to send instructions to the disk group through the first interface to modify the data in the disk group; the transmitter 1702 can also include a display device such as a display screen.
  • the processor 1703 is configured to perform the operations performed by the terminal device and/or the cloud device in the embodiment corresponding to FIG. 3a.
  • obtaining N pieces of first adaptive data from the N first image blocks, where the N pieces of first adaptive data correspond one-to-one to the N first image blocks;
  • the application processor 17031 can also be used to perform all or part of the operations that can be performed by the terminal device in the embodiment corresponding to FIG. 3a.
  • the central processing unit 1822 is configured to perform all or part of the operations performed by the training device in the embodiment corresponding to FIG. 16 .
  • the processor mentioned in any one of the above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the program of the method in the first aspect.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Compression Of Band Width Or Redundancy In Fax (AREA)

Abstract

The present application relates to the field of artificial intelligence. Disclosed is an image processing method, comprising the steps of: acquiring a first image; segmenting the first image to obtain N first image blocks; acquiring N pieces of first adaptive data from the N first image blocks, the N pieces of first adaptive data corresponding one-to-one to the N first image blocks; preprocessing the N first image blocks according to the N pieces of first adaptive data; processing the preprocessed N first image blocks by means of an encoding neural network to obtain N sets of first feature maps; and performing quantization and entropy encoding on the N sets of first feature maps so as to obtain N first encoded representations. In the present application, extracting multiple pieces of adaptive information makes it possible to use them to compensate multiple reconstructed image blocks, so that local features are highlighted and the image quality of a second image is improved.
PCT/CN2021/101807 2020-07-30 2021-06-23 Procédé de traitement d'image et dispositif associé WO2022022176A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010754333.9 2020-07-30
CN202010754333.9A CN114066914A (zh) 2020-07-30 2020-07-30 一种图像处理方法以及相关设备

Publications (1)

Publication Number Publication Date
WO2022022176A1 true WO2022022176A1 (fr) 2022-02-03

Family

ID=80037157

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/101807 WO2022022176A1 (fr) 2020-07-30 2021-06-23 Procédé de traitement d'image et dispositif associé

Country Status (2)

Country Link
CN (1) CN114066914A (fr)
WO (1) WO2022022176A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114923855A (zh) * 2022-05-12 2022-08-19 泉州装备制造研究所 一种皮革质量等级划分方法
WO2023207836A1 (fr) * 2022-04-26 2023-11-02 华为技术有限公司 Procédé et appareil de codage d'image, et procédé et appareil de décompression d'image

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101978698A (zh) * 2008-03-18 2011-02-16 三星电子株式会社 用于对图像进行编码和解码的方法及设备
US20140185665A1 (en) * 2012-12-28 2014-07-03 Qualcomm Incorporated High-frequency-pass sample adaptive offset in video coding
CN105635732A (zh) * 2014-10-30 2016-06-01 联想(北京)有限公司 自适应样点补偿编码、对视频码流进行解码的方法及装置
CN111052740A (zh) * 2017-07-06 2020-04-21 三星电子株式会社 用于编码或解码图像的方法和装置
CN111405287A (zh) * 2019-01-03 2020-07-10 华为技术有限公司 色度块的预测方法和装置
CN112822489A (zh) * 2020-12-30 2021-05-18 北京博雅慧视智能技术研究院有限公司 一种样本自适应偏移补偿滤波的硬件实现方法及装置

Also Published As

Publication number Publication date
CN114066914A (zh) 2022-02-18

Similar Documents

Publication Publication Date Title
WO2021155832A1 (fr) Procédé de traitement d'image et dispositif associé
WO2022021938A1 (fr) Procédé et dispositif de traitement d'image, et procédé et dispositif d'apprentissage de réseau neutre
TWI834087B (zh) 用於從位元流重建圖像及用於將圖像編碼到位元流中的方法及裝置、電腦程式產品
US20230336758A1 (en) Encoding with signaling of feature map data
WO2022022176A1 (fr) Procédé de traitement d'image et dispositif associé
US20230336759A1 (en) Decoding with signaling of segmentation information
US20230353764A1 (en) Method and apparatus for decoding with signaling of feature map data
WO2022179588A1 (fr) Procédé de codage de données et dispositif associé
US20240078414A1 (en) Parallelized context modelling using information shared between patches
CN116547969A (zh) 基于机器学习的图像译码中色度子采样格式的处理方法
US11403782B2 (en) Static channel filtering in frequency domain
TWI826160B (zh) 圖像編解碼方法和裝置
WO2023174256A1 (fr) Procédé de compression de données et dispositif associé
WO2022100140A1 (fr) Procédé et appareil de codage par compression, et procédé et appareil de décompression
WO2023160835A1 (fr) Modification d'image basée sur une transformée de fréquence spatiale à l'aide d'informations de corrélation inter-canaux
EP4388739A1 (fr) Modélisation de contexte basée sur l'attention pour compression d'image et de vidéo
WO2023172153A1 (fr) Procédé de codage vidéo par traitement multimodal
CN114693811A (zh) 一种图像处理方法以及相关设备
WO2023121499A1 (fr) Procédés et appareil pour approximer une fonction de distribution cumulative pour une utilisation dans des données de codage ou de décodage entropique
EP4226325A1 (fr) Procédé et appareil pour coder ou décoder une image à l'aide d'un réseau neuronal
EP4396942A1 (fr) Procédés et appareil pour approximer une fonction de distribution cumulative pour une utilisation dans des données de codage ou de décodage entropique
WO2024002496A1 (fr) Traitement parallèle de régions d'image à l'aide de réseaux neuronaux, décodage, post-filtrage et rdoq
WO2024002497A1 (fr) Traitement parallèle de régions d'image à l'aide de réseaux neuronaux, décodage, post-filtrage et rdoq

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21851415

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21851415

Country of ref document: EP

Kind code of ref document: A1