WO2022022176A1

WO2022022176A1 - Image processing method and related device

Info

Publication number: WO2022022176A1
Application number: PCT/CN2021/101807
Authority: WO
Inventors: 赵政辉; 马思伟; 王晶
Original assignee: 华为技术有限公司; 北京大学
Priority date: 2020-07-30
Filing date: 2021-06-23
Publication date: 2022-02-03
Also published as: CN114066914A

Abstract

The present application relates to the field of artificial intelligence. Disclosed is an image processing method, comprising: acquiring a first image; segmenting the first image to obtain N first image blocks; acquiring N pieces of first adaptive data from the N first image blocks, wherein the N pieces of first adaptive data correspond to the N first image blocks on a one-to-one basis; preprocessing the N first image blocks according to the N pieces of first adaptive data; processing the preprocessed N first image blocks by means of a coding neural network to obtain N sets of first feature maps; and performing quantization and entropy coding on the N sets of first feature maps, so as to obtain N first coded representations. In the present application, multiple pieces of adaptive data are extracted and can be used to compensate multiple reconstructed image blocks, such that local characteristics are highlighted and the image quality of a second image is improved.

Description

An image processing method and related equipment

This application claims priority to Chinese patent application No. 202010754333.9, entitled "An image processing method and related equipment" and filed with the China Patent Office on July 30, 2020, the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the field of artificial intelligence, and in particular, to an image processing method and related equipment.

Background

Multimedia data now accounts for the vast majority of Internet traffic. Compression of image data plays a vital role in the storage and efficient transmission of multimedia data, so image coding is a technology with great practical value.

Research on image coding has a long history. Researchers have proposed a large number of methods and formulated a variety of international standards, such as JPEG, JPEG2000, WebP, and BPG. Although these traditional coding methods are widely used, they show limitations in view of the growing volume of image data and emerging new media types. In recent years, researchers have begun to study image coding methods based on deep learning, and some have achieved good results. For example, Ballé et al. proposed an end-to-end optimized image coding method that achieved the best image coding performance at the time, even surpassing BPG, the best traditional coding standard. Deep learning image coding is a lossy image coding technology. Its general process is as follows: the encoding end extracts adaptive data from the image, preprocesses the image with the adaptive data, processes the preprocessed image with an encoding neural network, and encodes the result to obtain compressed data; the decoding end decodes the compressed data to obtain an image similar to the original image.

Although the above-mentioned deep learning image coding has made great progress compared with the traditional coding method, how to reduce the loss of image quality during the coding process is a problem that the lossy image coding technology has always needed to solve.

Summary of the invention

The present application provides an image processing method and related equipment for improving image quality.

A first aspect of the present application provides an image processing method. The method includes: an encoding end acquires a first image and divides the first image to obtain N first image blocks, where N is an integer greater than 1. The encoding end obtains N pieces of first adaptive data from the N first image blocks, and the N pieces of first adaptive data are in one-to-one correspondence with the N first image blocks. The encoding end preprocesses the N first image blocks by using the N pieces of first adaptive data. After preprocessing, the encoding end processes the preprocessed N first image blocks through an encoding neural network to obtain N groups of first feature maps. The encoding end performs quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations. Because multiple pieces of adaptive data are extracted, they can be used to compensate multiple reconstructed image blocks, which highlights local characteristics and improves the image quality of the second image.
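The first three steps of this aspect (dividing, extracting adaptive data, preprocessing) can be illustrated with a minimal sketch. This passage does not define what the adaptive data is, so the sketch assumes it is the mean pixel value of each block and that preprocessing subtracts it; the function names are hypothetical, and the encoding neural network, quantization, and entropy encoding are left out.

```python
from statistics import mean

def split_into_blocks(image, block_h, block_w):
    """Split a 2-D list of pixel values into equally sized blocks in raster order."""
    height, width = len(image), len(image[0])
    if height % block_h or width % block_w:
        raise ValueError("image size must be a multiple of the block size")
    blocks = []
    for top in range(0, height, block_h):
        for left in range(0, width, block_w):
            blocks.append([row[left:left + block_w]
                           for row in image[top:top + block_h]])
    return blocks

def block_mean(block):
    """Assumed adaptive data: the mean pixel value of one block."""
    return mean(p for row in block for p in row)

def preprocess(block, adaptive):
    """Assumed preprocessing: subtract the block's adaptive value from each pixel."""
    return [[p - adaptive for p in row] for row in block]

image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [50, 50, 90, 90],
    [50, 50, 90, 90],
]
blocks = split_into_blocks(image, 2, 2)          # N = 4 first image blocks
adaptive_data = [block_mean(b) for b in blocks]  # one value per block
preprocessed = [preprocess(b, m) for b, m in zip(blocks, adaptive_data)]
```

With a single image-wide mean, the bright and dark quadrants would keep large offsets after preprocessing; with one value per block, each block is centred on its own local level.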

In an optional design of the first aspect, if entropy decoding is performed on the N first encoded representations, N groups of second feature maps can be obtained. If N groups of second feature maps are processed through a decoding neural network, N first reconstructed image blocks can be obtained, and N first adaptive data are used to compensate the N first reconstructed image blocks. If the compensated N first reconstructed blocks are combined, a second image can be obtained.
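The compensation and combination steps can be sketched as follows. This is an illustration under assumptions: compensation is taken to be adding each block's adaptive value back to its pixels, the reconstructed blocks are hand-written stand-ins for the output of the decoding neural network, and the function names are hypothetical.

```python
def compensate(block, adaptive):
    """Assumed compensation: add the adaptive value back to each pixel."""
    return [[p + adaptive for p in row] for row in block]

def combine_blocks(blocks, blocks_per_row):
    """Stitch equally sized blocks, given in raster order, into one image."""
    image = []
    for start in range(0, len(blocks), blocks_per_row):
        row_of_blocks = blocks[start:start + blocks_per_row]
        for y in range(len(row_of_blocks[0])):
            image.append([p for block in row_of_blocks for p in block[y]])
    return image

# Stand-ins for the N first reconstructed image blocks (small residual errors).
reconstructed = [
    [[0, 0], [0, 0]],
    [[1, -1], [0, 0]],
    [[0, 0], [0, 0]],
    [[0, 0], [2, -2]],
]
adaptive_data = [10, 200, 50, 90]
compensated = [compensate(b, m) for b, m in zip(reconstructed, adaptive_data)]
second_image = combine_blocks(compensated, blocks_per_row=2)
```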

In an optional design of the first aspect, the method further includes: the encoding end sends the N first encoded representations, the N pieces of first adaptive data, and a correspondence to the decoding end, where the correspondence includes the correspondence between the N pieces of first adaptive data and the N first encoded representations.

In an optional design of the first aspect, the method further includes: the encoding end quantizes the N pieces of first adaptive data to obtain N pieces of first adaptive quantization data. The encoding end sends the N pieces of first adaptive quantization data to the decoding end, where the N pieces of first adaptive quantization data are used to compensate the N first reconstructed image blocks. Because N is an integer greater than 1, the encoding end needs to acquire multiple pieces of first adaptive data. Compared with acquiring a single piece of adaptive data from the first image, acquiring multiple pieces increases the amount of side information, and quantizing the first adaptive data reduces its data amount.
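A minimal sketch of quantizing the adaptive data is given below. The passage does not name a quantizer, so uniform scalar quantization with a hypothetical step size is assumed; `quantize` and `dequantize` are illustrative names.

```python
def quantize(values, step):
    """Uniform scalar quantization: map each value to an integer index."""
    return [round(v / step) for v in values]

def dequantize(indices, step):
    """Recover approximate values from the integer indices."""
    return [i * step for i in indices]

adaptive_data = [10.3, 199.6, 50.2, 89.9]       # one value per image block
indices = quantize(adaptive_data, step=4)       # what the encoding end would send
restored = dequantize(indices, step=4)          # what the decoding end would use
max_error = max(abs(a - r) for a, r in zip(adaptive_data, restored))
```

The integer indices take fewer bits to transmit than the original values, at the cost of an error bounded by half the step size.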

In an optional design of the first aspect, the larger N is, the smaller the information entropy of a single piece of first adaptive quantization data is. The larger N is, the more first image blocks there are, and the more pieces of first adaptive quantization data there are. In this case, reducing the information entropy of each piece of first adaptive quantization data means quantizing the first adaptive data more coarsely, which further reduces the data amount of the first adaptive data.
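The link between coarser quantization and lower information entropy can be checked numerically. The sketch below measures the empirical Shannon entropy of quantized values for two step sizes; the sample values, the step sizes, and the idea of choosing a larger step when N grows are assumptions made for illustration.

```python
from collections import Counter
from math import log2

def empirical_entropy(symbols):
    """Shannon entropy, in bits per symbol, of the observed symbol frequencies."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * log2(c / total) for c in counts.values())

def quantize(values, step):
    """Uniform scalar quantization to integer indices."""
    return [round(v / step) for v in values]

values = [12, 14, 30, 33, 61, 66, 90, 95]       # hypothetical adaptive data
fine = empirical_entropy(quantize(values, step=2))     # e.g. used for small N
coarse = empirical_entropy(quantize(values, step=32))  # e.g. used for large N
```

In this example the fine step keeps all eight values distinct, while the coarse step merges them into four equally likely indices, so each piece of quantized data carries fewer bits.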

In an optional design of the first aspect, the arrangement order of the N first encoded representations is the same as the arrangement order of the N first image blocks, the arrangement order of the N first image blocks is their arrangement order in the first image, and the correspondence includes the arrangement order of the N first encoded representations and the arrangement order of the N first image blocks. Compared with obtaining a single piece of adaptive data from the first image, the present application has multiple pieces of adaptive data and multiple first image blocks, so the correspondence between them must be preserved. Preserving the correspondence through the arrangement order, rather than through explicit identifiers, reduces the amount of data.
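One way to read this design is that the position of an item in the transmitted sequence is itself the block identifier. The sketch below assumes raster (row-major) order, which the passage does not state explicitly; the function names are hypothetical.

```python
def block_position(index, blocks_per_row):
    """Map a position in the encoded sequence to (block_row, block_col)."""
    return divmod(index, blocks_per_row)

def block_index(block_row, block_col, blocks_per_row):
    """Inverse mapping: block coordinates to position in the sequence."""
    return block_row * blocks_per_row + block_col

# With 4 blocks per row, the sixth encoded representation (index 5)
# belongs to the block in row 1, column 1 of the first image.
position = block_position(5, blocks_per_row=4)
```

Because both ends agree on the order, no per-block identifier needs to be sent alongside the encoded representations or the adaptive data.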

In an optional design of the first aspect, the second image is processed through a fusion neural network to obtain a third image, and the fusion neural network is used to reduce the difference between the second image and the first image, where the difference includes blocking artifacts. The present application improves the quality of each image block by highlighting its local characteristics, but this can easily cause blocking artifacts between image blocks. Processing the second image through the fusion neural network reduces the influence of the blocking artifacts and improves the image quality.

In an optional design of the first aspect, each of the N first image blocks has the same size. If every first image block has the same size, then when the feature maps are convolved with the convolution layers of the encoding neural network, the convolution of each image block involves the same number of multiplications and additions, which improves operational efficiency.
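The claim about equal operation counts can be made concrete with a small calculation. The sketch below counts multiply-accumulate operations for one convolution layer, assuming stride 1 and "same" padding so that the output has the block's spatial size; the layer shape is hypothetical and not taken from the application.

```python
def conv_macs(block_h, block_w, kernel, in_channels, out_channels):
    """Multiply-accumulate count of one stride-1, same-padded convolution layer."""
    return block_h * block_w * kernel * kernel * in_channels * out_channels

# Every 64x64 block costs the same, so the work per block is predictable.
macs_per_block = conv_macs(64, 64, kernel=3, in_channels=3, out_channels=16)
```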

In an optional design of the first aspect, when the method is used to segment first images of different sizes, the size of the first image block is a fixed value. That is, when processing images of different sizes, the encoding end divides them into blocks of the same size. Fixing the size of the first image block allows the convolution operation unit to be closely matched to the block, which can reduce the cost of the convolution operation unit or improve its utilization.

In an optional design of the first aspect, the pixels of the first image block are a×b, where a and b are obtained according to target pixels, the target pixels are c×d, c/a is an integer, and d/b is an integer. Here a and c are numbers of pixels in the width direction, and b and d are numbers of pixels in the height direction. The target pixels are obtained according to a target resolution of a terminal device. The terminal device includes a camera component, the pixels of an image obtained by the camera component under the target resolution setting are the target pixels, and the first image is obtained by the camera component. The encoding end and/or the decoding end may or may not be a terminal device. Under the target resolution setting, the image obtained by the encoding end can be divided exactly into blocks, which avoids filling in useless data and improves image quality.
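A block size that divides the target pixels exactly can be found by searching the divisors of the target width and height. The sketch below assumes the condition is that the target width divided by a and the target height divided by b are both integers, and it picks the largest divisors under a hypothetical upper bound `max_block`; the selection rule is an assumption, not the application's.

```python
def largest_divisor_at_most(n, limit):
    """Largest divisor of n that does not exceed limit."""
    for candidate in range(min(n, limit), 0, -1):
        if n % candidate == 0:
            return candidate

def block_size_for(target_w, target_h, max_block=256):
    """Pick a block size (a, b) that tiles target_w x target_h with no remainder."""
    a = largest_divisor_at_most(target_w, max_block)
    b = largest_divisor_at_most(target_h, max_block)
    return a, b

a, b = block_size_for(1920, 1080)
```

For a 1920×1080 target this yields 240×216 blocks, an 8×5 grid, with no filled pixels.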

In an optional design of the first aspect, the target resolution is the resolution set for the camera component through the settings interface of the camera application. The settings interface of the camera application can set the resolution of images obtained by the camera component. Using the resolution already selected in the settings interface as the target resolution makes the target resolution more efficient to obtain.

In an optional design of the first aspect, the target resolution is obtained according to a target image group in a gallery obtained by the camera component, the pixels of the target image group are the target pixels, and among the image groups of different pixel sizes, the target image group accounts for the largest proportion of the gallery. The gallery obtained by the encoding end through the camera component includes image groups of different pixel sizes. Determining the target pixels from the target image group ensures that most images can be divided exactly into blocks, which avoids filling in useless data and improves image quality.
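Selecting the image group with the largest share of the gallery amounts to finding the most frequent pixel size. A minimal sketch, assuming the gallery is available as a list of (width, height) pairs; `target_pixels` is a hypothetical name.

```python
from collections import Counter

def target_pixels(gallery_sizes):
    """Return the (width, height) that occurs most often in the gallery."""
    return Counter(gallery_sizes).most_common(1)[0][0]

gallery = [(4000, 3000), (1920, 1080), (4000, 3000), (4000, 3000), (1920, 1080)]
target = target_pixels(gallery)
```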

In an optional design of the first aspect, images of multiple pixel sizes are obtained by the camera component, the multiple pixel sizes are e×f, e/a is an integer, and f/b is an integer, where e includes c and f includes d. The terminal device can obtain images of different pixel sizes through the camera component, and e×f is the set of those pixel sizes. The present application provides that images of every pixel size obtained by the camera component can be divided exactly into blocks, which avoids filling in useless data and improves image quality.
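When every supported pixel size must be divisible by the block size, candidate block sizes are the common divisors of all widths and of all heights. The sketch below computes the greatest such values; the resolution list is hypothetical.

```python
from functools import reduce
from math import gcd

def common_block_size(resolutions):
    """Largest (a, b) such that a divides every width and b divides every height."""
    widths = [w for w, _ in resolutions]
    heights = [h for _, h in resolutions]
    return reduce(gcd, widths), reduce(gcd, heights)

supported = [(1920, 1080), (1280, 720), (3840, 2160)]
a, b = common_block_size(supported)
```

Any divisor of the returned values also satisfies the condition, so a smaller block such as 128×120 would work equally well for these three sizes.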

In an optional design of the first aspect, the multiple pixels are obtained by setting the resolution of the camera component through a setting interface in the camera application.

In an optional design of the first aspect, the pixels of the first image block are a×b, a is the number of pixels in the width direction, b is the number of pixels in the height direction, and the pixels of the first image are r×t. After acquiring the first image and before dividing the first image, the method further includes: if r/a is not an integer and/or t/b is not an integer, filling the edges of the first image with the pixel median so that r1/a is an integer and t1/b is an integer, where the pixels of the first image after filling are r1×t1. The size of the first image block is fixed, while the encoding end may face images of different pixel sizes, so some images cannot be divided exactly. When an image cannot be divided exactly, filling its edges with the image median improves the compatibility of the model while limiting the impact on image quality. The image median is the median of the pixel values.
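Median filling can be sketched as below. The passage only says that the edges are filled, so placing all of the fill on the right and bottom edges is an assumption, as is the function name.

```python
from statistics import median

def pad_to_multiple(image, block_w, block_h):
    """Pad the right and bottom edges with the image's median pixel value."""
    height, width = len(image), len(image[0])
    fill = median(p for row in image for p in row)
    pad_w = -width % block_w     # columns needed to reach a multiple of block_w
    pad_h = -height % block_h    # rows needed to reach a multiple of block_h
    padded = [row + [fill] * pad_w for row in image]
    padded += [[fill] * (width + pad_w) for _ in range(pad_h)]
    return padded

image = [
    [1, 2, 3],
    [4, 5, 9],
    [7, 8, 6],
]
padded = pad_to_multiple(image, block_w=2, block_h=2)   # 3x3 becomes 4x4
```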

In an optional design of the first aspect, after acquiring the first image and before filling the edges of the first image, the method further includes: if t/b is not an integer, proportionally enlarging r and t to obtain the first image with pixels r2×t2, where t2/b is an integer. If r2/a is not an integer, the edges of the first image are filled with the pixel median. The number of image blocks that contain filled median values affects image quality. Proportionally enlarging the image reduces the number of image blocks that contain filled values and thereby improves image quality.
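The proportional enlargement can be sketched as a size calculation. The sketch assumes the height is raised to the next multiple of the block height b and the width is scaled by the same factor and rounded to a whole pixel; the rounding rule and the function name are assumptions.

```python
def enlarged_size(r, t, b):
    """Scale width r and height t so that the new height is a multiple of b."""
    t2 = -(-t // b) * b          # smallest multiple of b that is >= t
    r2 = round(r * t2 / t)       # same scale factor applied to the width
    return r2, t2

# A 1000x750 image with 256-pixel blocks: the height becomes 768 and the
# width becomes 1024, which in this case needs no further filling.
r2, t2 = enlarged_size(1000, 750, 256)
```

After enlargement only the width may still leave a remainder, so any remaining fill is confined to the width direction.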

In an optional design of the first aspect, after proportionally enlarging r and t, if r2/a is not an integer, the remainder of r2/a is obtained. If the remainder is greater than a/2, the pixel median is filled on only one side of the first image in the width direction. Filling the median on only one side further reduces the number of image blocks that contain filled values while limiting the impact of the filling on each image block, which improves image quality.

In an optional design of the first aspect, if the remainder is less than a/2, the pixel median is filled on both sides of the first image in the width direction, so that the width of the pixel median filled on each side is (a - g)/2, where g is the remainder. This reduces the impact of the filling on the image blocks and improves image quality.
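The choice between one-sided and two-sided filling can be sketched as follows. The formulas did not survive in this text, so the sketch assumes the threshold is half the block width (a/2) and that two-sided filling splits the missing a - g pixels evenly; which side receives one-sided fill, and the handling of an odd split or a remainder exactly equal to a/2, are also assumptions.

```python
def width_padding(r2, a):
    """Return (left, right) fill widths for image width r2 and block width a."""
    g = r2 % a                   # remainder of r2 / a
    if g == 0:
        return 0, 0              # already an exact multiple, nothing to fill
    missing = a - g              # pixels needed to reach the next multiple of a
    if g > a / 2:
        return 0, missing        # small gap: fill one side only (right, by assumption)
    left = missing // 2          # large gap: split across both sides
    return left, missing - left

one_sided = width_padding(1200, 256)   # remainder 176 > 128
two_sided = width_padding(1097, 256)   # remainder 73 < 128
```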

In an optional design of the first aspect, the N first image blocks include a first target image block, and the range of pixel values of the first target image block is smaller than the range of pixel values of the first image. Before acquiring the N pieces of first adaptive data from the N first image blocks, the method further includes: the encoding end inverse-quantizes the pixel values of the first target image block. The encoding end obtains one piece of first adaptive data from the inverse-quantized first target image block. Inverse-quantizing the pixel values of the first target image block further highlights the local characteristics of the image.

A second aspect of the present application provides an image processing method, the method comprising:

The decoding end obtains N first encoded representations, N pieces of first adaptive data, and a correspondence. The correspondence includes the correspondence between the N pieces of first adaptive data and the N first encoded representations. The N pieces of first adaptive data are in one-to-one correspondence with the N first encoded representations, where N is an integer greater than 1. The decoding end performs entropy decoding on the N first encoded representations to obtain N groups of second feature maps. The decoding end processes the N groups of second feature maps through a decoding neural network to obtain N first reconstructed image blocks. The decoding end compensates the N first reconstructed image blocks by using the N pieces of first adaptive data. The decoding end combines the compensated N first reconstructed image blocks to obtain a second image. Compensating multiple reconstructed image blocks with multiple pieces of adaptive data highlights the local characteristics of each image block, thereby improving the image quality of the second image.

In an optional design of the second aspect, the N first encoded representations are obtained through quantization and entropy encoding of N groups of first feature maps, and the N groups of first feature maps are obtained by processing N preprocessed first image blocks through an encoding neural network. The preprocessed N first image blocks are obtained by preprocessing N first image blocks with the N pieces of first adaptive data, the N pieces of first adaptive data are obtained from the N first image blocks, and the N first image blocks are obtained by dividing the first image.

In an optional design of the second aspect, the N pieces of first adaptive data are N pieces of first adaptive quantization data, and the N pieces of first adaptive quantization data are obtained by quantizing the N pieces of first adaptive data. Specifically, the decoding end compensates the N first reconstructed image blocks by using the N first adaptive quantization data.

In an optional design of the second aspect, the larger N is, the smaller the information entropy of the single first adaptive quantization data is.

In an optional design of the second aspect, the arrangement order of the N first encoded representations is the same as the arrangement order of the N first image blocks, the arrangement order of the N first image blocks is their arrangement order in the first image, and the correspondence includes the arrangement order of the N first encoded representations and the arrangement order of the N first image blocks.

In an optional design of the second aspect, the method further includes: the decoding end processes the second image through a fusion neural network to obtain the third image. The second image is processed by a fusion neural network to reduce differences between the second image and the first image, including blockiness.

In an optional design of the second aspect, each of the N first tiles has the same size.

In an optional design of the second aspect, when the method is used to combine and generate second images of different sizes, the size of the first image block is a fixed value.

In an optional design of the second aspect, the pixels of the first image block are a×b, where a and b are obtained according to target pixels, the target pixels are c×d, c/a is an integer, and d/b is an integer. Here a and c are numbers of pixels in the width direction, and b and d are numbers of pixels in the height direction. The target pixels are obtained according to a target resolution of a terminal device. The terminal device includes a camera component, the pixels of an image obtained by the camera component under the target resolution setting are the target pixels, and the first image is obtained by the camera component.

In an optional design of the second aspect, the target resolution is obtained by setting the resolution of the camera component according to the setting interface in the camera application.

In an optional design of the second aspect, the target resolution is obtained according to a target image group in a gallery obtained by the camera component, and the pixels of the target image group are the target pixels. Among the image groups of different pixel sizes, the target image group accounts for the largest proportion of the gallery.

In an optional design of the second aspect, images of multiple pixel sizes are obtained by the camera component, the multiple pixel sizes are e×f, e/a is an integer, and f/b is an integer, where e includes c and f includes d.

In an optional design of the second aspect, the multiple pixels can be obtained by setting the resolution of the camera component through a setting interface in the camera application.

In an optional design of the second aspect, the pixels of the first image block are a×b, a is the number of pixels in the width direction, b is the number of pixels in the height direction, and the pixels of the first image are r×t. If r/a is not an integer and/or t/b is not an integer, the edges of the first image are filled with the pixel median so that r1/a is an integer and t1/b is an integer, where the pixels of the first image after filling are r1×t1.

In an optional design of the second aspect, if t/b is not an integer, r and t are proportionally enlarged by the encoding end to obtain the first image with pixels r2×t2, where t2/b is an integer.

In an optional design of the second aspect, if r2/a is not an integer and the remainder of r2/a is greater than a/2, the first image is filled with the pixel median on only one side in the width direction.

In an optional design of the second aspect, if the remainder is less than a/2, the first image is filled with the pixel median on both sides in the width direction, and the width of the pixel median filled on each side is (a - g)/2, where g is the remainder.

In an optional design of the second aspect, the N first image blocks include a first target image block, the range of pixel values of the first target image block is smaller than the range of pixel values of the first image, at least one piece of first adaptive data is obtained from the inverse-quantized first target image block, and the inverse-quantized first target image block is obtained by inverse-quantizing the pixel values of the first target image block.

A third aspect of the present application provides a model training method, the method comprising:

acquiring a first image;

dividing the first image to obtain N first image blocks, where N is an integer greater than 1;

acquiring N pieces of first adaptive data from the N first image blocks, where the N pieces of first adaptive data are in one-to-one correspondence with the N first image blocks;

preprocessing the N first image blocks according to the N pieces of first adaptive data;

processing the preprocessed N first image blocks through a first encoding neural network to obtain N groups of first feature maps;

performing quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations;

performing entropy decoding on the N first encoded representations to obtain N groups of second feature maps;

processing the N groups of second feature maps through a first decoding neural network to obtain N first reconstructed image blocks;

compensating the N first reconstructed image blocks with the N pieces of first adaptive data;

combining the compensated N first reconstructed image blocks to obtain a second image;

obtaining a distortion loss of the second image relative to the first image;

jointly training a model with a loss function until the image distortion value between the first image and the second image reaches a first preset level, where the model includes the first encoding neural network, a quantization network, an entropy encoding network, an entropy decoding network, and the first decoding neural network. Optionally, the model further includes a segmentation network, and a trainable parameter in the segmentation network is the size of the first image block;

outputting a second encoding neural network and a second decoding neural network, where the second encoding neural network is the model obtained after iterative training of the first encoding neural network, and the second decoding neural network is the model obtained after iterative training of the first decoding neural network.
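The stopping condition of the training method can be illustrated with a distortion measure. The passage does not name one, so the sketch assumes mean squared error between the first image and the reconstructed second image, with a hypothetical threshold `max_mse` standing in for the first preset level; the network updates themselves are not shown.

```python
def mse(first, second):
    """Mean squared error between two equally sized 2-D pixel lists."""
    diffs = [(p - q) ** 2
             for row_a, row_b in zip(first, second)
             for p, q in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)

def reached_preset_level(first, second, max_mse):
    """True once the distortion is at or below the assumed preset level."""
    return mse(first, second) <= max_mse

first_image = [[10, 20], [30, 40]]
second_image = [[10, 22], [30, 36]]   # stand-in for a reconstruction
distortion = mse(first_image, second_image)
```

In a training loop, the check would run after each iteration, and training would stop once `reached_preset_level` returns true.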

In an optional design of the third aspect, the method further includes:

quantizing the N pieces of first adaptive data to obtain N pieces of first adaptive quantization data, where the N pieces of first adaptive quantization data are used to compensate the N first reconstructed image blocks.

In an optional design of the third aspect, the larger the N, the smaller the information entropy of the single first adaptive quantization data.

In an optional design of the third aspect, the arrangement order of the N first encoded representations is the same as the arrangement order of the N first image blocks, and the arrangement order of the N first image blocks is their arrangement order in the first image.

In an optional design of the third aspect, the second image is processed by a fusion neural network to obtain a third image, and the fusion neural network is used to reduce the difference between the second image and the first image, the difference includes block effect;

Obtaining the distortion loss of the second image relative to the first image includes:

obtaining a distortion loss of the third image relative to the first image;

The model includes a fusion neural network.

In an optional design of the third aspect, each of the N first image blocks has the same size.

In an optional design of the third aspect, in two iterations of training, the size of the first image used for training is different, and the size of the first image block is a fixed value.

In an optional design of the third aspect, the pixels of the first image block are a×b, where a and b are obtained according to target pixels, the target pixels are c×d, c/a is an integer, and d/b is an integer. Here a and c are numbers of pixels in the width direction, and b and d are numbers of pixels in the height direction. The target pixels are obtained according to a target resolution of a terminal device. The terminal device includes a camera component, the pixels of an image obtained by the camera component under the target resolution setting are the target pixels, and the first image is obtained by the camera component.

In an optional design of the third aspect, the target resolution is obtained by setting the resolution of the camera component according to a setting interface in a camera application.

In an optional design of the third aspect, the target resolution is obtained according to a target image group in a gallery obtained by the camera component, the pixels of the target image group are the target pixels, and among the image groups of different pixel sizes, the target image group accounts for the largest proportion of the gallery.

In an optional design of the third aspect, images of multiple pixel sizes are obtained by the camera component, the multiple pixel sizes are e×f, e/a is an integer, and f/b is an integer, where e includes c and f includes d.

In an optional design of the third aspect, the plurality of pixels are obtained by setting the resolution of the imaging component through a setting interface in the imaging application.

In an optional design of the third aspect, the pixels of the first image block are a×b, a is the number of pixels in the width direction, b is the number of pixels in the height direction, and the pixels of the first image are r×t;

After acquiring the first image and before segmenting the first image, the method further includes:

if r/a is not an integer and/or t/b is not an integer, filling the edges of the first image with the pixel median so that r1/a is an integer and t1/b is an integer, where the pixels of the first image after filling are r1×t1.

In an optional design of the third aspect, after acquiring the first image and before filling the edges of the first image, the method further includes:

if t/b is not an integer, proportionally enlarging r and t to obtain the first image whose pixels are r2×t2, where t2/b is an integer;

filling the edges of the first image with the pixel median if r/a is not an integer and/or t/b is not an integer includes:

if r2/a is not an integer, filling the edges of the first image with the pixel median.

In an optional design of the third aspect, after proportionally enlarging r and t, if r2/a is not an integer, the remainder of r2/a is obtained. If the remainder is greater than a/2, the pixel median is filled on only one side of the first image in the width direction.

In an optional design of the third aspect, if the remainder is less than a/2, the pixel median is filled on both sides of the first image in the width direction, so that the width of the pixel median filled on each side is (a - g)/2, where g is the remainder.

In an optional design of the third aspect, the N first image blocks include a first target image block, and the range of pixel values of the first target image block is smaller than the range of pixel values of the first image;

Before acquiring the N pieces of first adaptive data from the N first image blocks, the method further includes:

inverse-quantizing the pixel values of the first target image block;

Acquiring the N pieces of first adaptive data from the N first image blocks includes:

obtaining one piece of first adaptive data from the inverse-quantized first target image block.

A fourth aspect of the present application provides an encoding device, the device comprising:

a first acquisition module, configured to acquire a first image;

A segmentation module, used for segmenting the first image to obtain N first image blocks, where N is an integer greater than 1;

A second acquisition module, configured to acquire N first adaptive data from N first image blocks, where N first adaptive data corresponds to N first image blocks one-to-one;

a preprocessing module, configured to preprocess the N first image blocks according to the N first adaptive data;

The coding neural network module processes the preprocessed N first image blocks through the coding neural network to obtain N groups of first feature maps;

The quantization and entropy encoding module is configured to perform quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations.

In an optional design of the fourth aspect, the N first encoded representations are used for entropy decoding to obtain N groups of second feature maps, the N groups of second feature maps are processed through a decoding neural network to obtain N first reconstructed image blocks, the N pieces of first adaptive data are used to compensate the N first reconstructed image blocks, and the compensated N first reconstructed image blocks are combined into a second image.

In an optional design of the fourth aspect, the device further includes:

The sending module is configured to send the N pieces of first coded representations, the N pieces of first adaptive data and corresponding relationships to the decoding end, where the correspondence relationships include correspondences between the N pieces of first adaptive data and the N pieces of first coded representations.

In an optional design of the fourth aspect, the device further includes:

a quantization module, configured to quantize the N first adaptive data to obtain the N first adaptive quantized data, and the N first adaptive quantized data is used to compensate the N first reconstructed image blocks;

The sending module is specifically configured to send the N pieces of first adaptive quantization data to the decoding end.

In an optional design of the fourth aspect, the larger N is, the smaller the information entropy of the single first adaptive quantization data is.

In an optional design of the fourth aspect, the arrangement order of the N first encoded representations is the same as the arrangement order of the N first image blocks, the arrangement order of the N first image blocks is their arrangement order in the first image, and the correspondence includes the arrangement order of the N first encoded representations and the arrangement order of the N first image blocks.

In an optional design of the fourth aspect, the second image is processed by a fusion neural network to obtain the third image, and the fusion neural network is used to reduce the difference between the second image and the first image, the difference including block effect.

In an optional design of the fourth aspect, each of the N first tiles has the same size.

In an optional design of the fourth aspect, when the apparatus is used to process first images of different sizes, the size of the first image block is a fixed value.

In an optional design of the fourth aspect, the pixels of the first image block are a×b, where a and b are obtained according to target pixels. The target pixels are c×d, c/a is equal to an integer, and d/b is equal to an integer, where a and c are numbers of pixels in the width direction, and b and d are numbers of pixels in the height direction. The target pixels are obtained according to the target resolution of the terminal device. The terminal device includes a camera component, the pixels of an image obtained by the camera component under the target resolution setting are the target pixels, and the first image is obtained by the camera component.
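As a rough illustration of choosing a block size from the target pixels, the sketch below lists block sizes a×b that divide a target size c×d exactly. It assumes the integer conditions are that c/a and d/b are integers; the search bounds `min_side` and `max_side` and the function names are hypothetical and not taken from the application.

```python
def divisors(n):
    """Return all positive divisors of n in ascending order."""
    return [k for k in range(1, n + 1) if n % k == 0]

def candidate_block_sizes(c, d, min_side=64, max_side=512):
    """List (a, b) pairs such that c/a and d/b are both integers.

    min_side and max_side are assumed bounds, used only to keep the
    candidate list short.
    """
    widths = [a for a in divisors(c) if min_side <= a <= max_side]
    heights = [b for b in divisors(d) if min_side <= b <= max_side]
    return [(a, b) for a in widths for b in heights]

# Example: target pixels of 1920×1080.
sizes = candidate_block_sizes(1920, 1080)
```

Any pair returned this way tiles the target image without padding, which is the property the integer conditions appear intended to guarantee.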

In an optional design of the fourth aspect, the target resolution is obtained by setting the resolution of the camera component according to the setting interface in the camera application.

In an optional design of the fourth aspect, the target resolution is obtained according to a target image group in the gallery obtained by the camera component, where the pixels of the target image group are the target pixels and, among the image groups of different pixels, the target image group accounts for the largest proportion of the gallery.
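Selecting the target pixels from the gallery can be sketched as a simple frequency count. This is a minimal sketch that assumes each gallery image is described by a (width, height) pair; the function name `target_pixels_from_gallery` is hypothetical.

```python
from collections import Counter

def target_pixels_from_gallery(image_sizes):
    """Return the (width, height) shared by the largest group of images.

    image_sizes is an assumed representation of the gallery: one
    (width, height) tuple per stored image.
    """
    counts = Counter(image_sizes)
    (size, _count), = counts.most_common(1)
    return size

gallery = [(4000, 3000), (1920, 1080), (4000, 3000), (1280, 720), (4000, 3000)]
target = target_pixels_from_gallery(gallery)
```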

In an optional design of the fourth aspect, images of multiple pixels are obtained by the camera component, the multiple pixels are e×f, e/a is equal to an integer, and f/b is equal to an integer, where e includes c and f includes d.

In an optional design of the fourth aspect, the plurality of pixels are obtained by setting the resolution of the camera component through a setting interface in the camera application.

In an optional design of the fourth aspect, the pixels of the first image block are a×b, where a is the number of pixels in the width direction and b is the number of pixels in the height direction, and the pixels of the first image are r×t. The device also includes:

a padding module, configured to, if r/a is not equal to an integer and/or t/b is not equal to an integer, fill the edge of the first image with the pixel median value, so that r1/a is equal to an integer and t1/b is equal to an integer, where the pixels of the first image after filling are r1×t1.

In an optional design of the fourth aspect, the device further includes:

Amplification module for use if described

is equal to an integer;

The padding module is specifically used if

In an optional design of the fourth aspect, the second obtaining unit is further configured to

not equal to an integer, get

the remainder;

The filling module is specifically used if the remainder is greater than

In an optional design of the fourth aspect, the filling module is specifically used if the remainder is less than

Then fill the median value of pixels on both sides of the width direction of the first image, so that the width of the pixel median value filled on each side is

where g is the remainder.
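The width-direction filling described above can be sketched as follows. Several details are assumptions because the corresponding formulas are not legible in this text: the threshold that decides between one-sided and two-sided filling is taken to be a/2, the fill value is taken to be the middle of the 8-bit range (128), and the function name `pad_width` is hypothetical.

```python
import numpy as np

def pad_width(image, a, fill_value=128, threshold=None):
    """Pad the width of `image` so that it becomes a multiple of `a`.

    Assumptions (not confirmed by the source text):
      - threshold defaults to a / 2;
      - fill_value 128 stands in for the "pixel median value".
    """
    width = image.shape[1]
    g = width % a                      # remainder of width divided by a
    if g == 0:
        return image
    if threshold is None:
        threshold = a / 2
    total = a - g                      # columns needed to reach a multiple of a
    if g > threshold:
        left, right = 0, total         # fill one side only
    else:
        left = total // 2              # fill both sides
        right = total - left
    pad_spec = [(0, 0), (left, right)] + [(0, 0)] * (image.ndim - 2)
    return np.pad(image, pad_spec, mode="constant", constant_values=fill_value)

# Example: width 10 with a = 8 gives g = 2, so 3 columns are added per side.
both_sides = pad_width(np.zeros((4, 10), dtype=np.uint8), a=8)
# Example: width 11 with a = 4 gives g = 3 > 2, so 1 column is added on one side.
one_side = pad_width(np.zeros((4, 11), dtype=np.uint8), a=4)
```

With two-sided filling, each side receives about (a − g)/2 columns, which follows directly from needing a − g extra columns in total.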

In an optional design of the fourth aspect, the N first tiles include a first target tile, and the range of pixel values of the first target tile is smaller than the range of pixel values of the first image. The device also includes:

an inverse quantization module for inverse quantization of the pixel value of the first target image block;

The second obtaining module is specifically configured to obtain a piece of first adaptive data from the inverse quantized first target image block.

A fifth aspect of the present application provides a decoding device, the decoding device comprising:

The acquisition module is configured to acquire N first encoded representations, N first adaptive data, and a correspondence, where the correspondence includes the correspondence between the N first adaptive data and the N first encoded representations, the N first adaptive data are in one-to-one correspondence with the N first encoded representations, and N is an integer greater than 1;

The entropy decoding module performs entropy decoding on the N first encoded representations to obtain N groups of second feature maps;

A decoding neural network module for processing N groups of second feature maps to obtain N first reconstructed image blocks;

a compensation module, configured to compensate the N first reconstructed image blocks by using the N first adaptive data;

The combining module is used for combining the compensated N first reconstructed image blocks to obtain a second image.

In an optional design of the fifth aspect, the N first encoded representations are obtained by performing quantization and entropy encoding on N groups of first feature maps, the N groups of first feature maps are obtained by processing the preprocessed N first image blocks through a coding neural network, and the preprocessed N first image blocks are obtained by preprocessing the N first image blocks with the N first adaptive data. The N first adaptive data are obtained from the N first image blocks, and the N first image blocks are obtained by dividing the first image.

In an optional design of the fifth aspect, the N pieces of first adaptive data are N pieces of first adaptive quantization data, and the N pieces of first adaptive quantization data are obtained by quantizing the N pieces of first adaptive data;

The compensation module is specifically configured to compensate the N first reconstructed image blocks by using the N first adaptive quantization data.

In an optional design of the fifth aspect, the larger N is, the smaller the information entropy of the single first adaptive quantization data is.

In an optional design of the fifth aspect, the arrangement order of the N first encoded representations is the same as the arrangement order of the N first image blocks, where the arrangement order of the N first image blocks is their arrangement order in the first image. The correspondence includes the arrangement order of the N first encoded representations and the arrangement order of the N first image blocks.

In an optional design of the fifth aspect, the device further includes:

The fusion neural network module is used to process the second image to obtain the third image, so as to reduce the difference between the second image and the first image, and the difference includes block effect.

In an optional design of the fifth aspect, each of the N first tiles has the same size.

In an optional design of the fifth aspect, when the apparatus is used to combine and generate second images of different sizes, the size of the first image block is a fixed value.

In an optional design of the fifth aspect, the pixels of the first image block are a×b, where a and b are obtained according to target pixels. The target pixels are c×d, c/a is equal to an integer, and d/b is equal to an integer, where a and c are numbers of pixels in the width direction, and b and d are numbers of pixels in the height direction. The target pixels are obtained according to the target resolution of the terminal device. The terminal device includes a camera component, the pixels of an image obtained by the camera component under the target resolution setting are the target pixels, and the first image is obtained by the camera component.

In an optional design of the fifth aspect, the target resolution is obtained by setting the resolution of the camera component according to the setting interface in the camera application.

In an optional design of the fifth aspect, the target resolution is obtained according to a target image group in the gallery obtained by the camera component, where the pixels of the target image group are the target pixels and, among the image groups of different pixels, the target image group accounts for the largest proportion of the gallery.

In an optional design of the fifth aspect, images of multiple pixels are obtained by the camera component, the multiple pixels are e×f, e/a is equal to an integer, and f/b is equal to an integer, where e includes c and f includes d.

In an optional design of the fifth aspect, the plurality of pixels are obtained by setting the resolution of the camera component through a setting interface in the camera application.

In an optional design of the fifth aspect, the pixels of the first image block are a×b, where a is the number of pixels in the width direction and b is the number of pixels in the height direction, and the pixels of the first image are r×t. If r/a is not equal to an integer and/or t/b is not equal to an integer, the edge of the first image is filled with the pixel median value, so that r1/a is equal to an integer and t1/b is equal to an integer, where the pixels of the first image after filling are r1×t1.

In an optional design of the fifth aspect, in

equal to an integer.

In an optional design of the fifth aspect, if

not equal to an integer,

the remainder is greater than

In an optional design of the fifth aspect, if the remainder is less than

where g is the remainder.

In an optional design of the fifth aspect, the N first image blocks include a first target image block, the range of pixel values of the first target image block is smaller than the range of pixel values of the first image, at least one piece of first adaptive data is obtained from the inverse-quantized first target image block, and the inverse-quantized first target image block is obtained by inversely quantizing the pixel values of the first target image block.

A sixth aspect of the present application provides a training device, the device comprising:

a first acquisition module, configured to acquire a first image;

a segmentation module, configured to segment the first image to obtain N first image blocks, where N is an integer greater than 1;

a second obtaining module, configured to obtain N pieces of first adaptive data from the N first image blocks, and the N first adaptive data are in one-to-one correspondence with the N first image blocks;

The first coding neural network module is used to process the preprocessed N first image blocks to obtain N groups of first feature maps;

a quantization and entropy encoding module, for performing quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations;

an entropy decoding module, which performs entropy decoding on the N first encoded representations to obtain N groups of second feature maps;

a first decoding neural network module for processing the N groups of second feature maps to obtain N first reconstructed image blocks;

a combining module for combining the compensated N first reconstructed image blocks to obtain a second image;

a third acquiring module, configured to acquire the distortion loss of the second image relative to the first image;

A training module, configured to jointly train a model by using a loss function until the image distortion value between the first image and the second image reaches a first preset level, where the model includes the first coding neural network, a quantization network, an entropy encoding network, an entropy decoding network, and the first decoding neural network. Optionally, the model further includes a segmentation network, and a trainable parameter in the segmentation network is the size of the first image block;

The output module is configured to output a second coding neural network and a second decoding neural network, where the second coding neural network is the model obtained after iterative training of the first coding neural network, and the second decoding neural network is the model obtained after iterative training of the first decoding neural network.
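The stopping condition used by the training module can be illustrated with a simple distortion measure. The sketch below assumes mean squared error as the distortion loss and a hypothetical threshold `max_mse` as the "first preset level"; the application does not state which distortion metric or threshold is used in this passage.

```python
import numpy as np

def distortion_mse(first_image, second_image):
    """Mean squared error between the original and the reconstructed image.

    MSE is an assumed choice of distortion measure for illustration.
    """
    x = np.asarray(first_image, dtype=np.float64)
    y = np.asarray(second_image, dtype=np.float64)
    return float(np.mean((x - y) ** 2))

def reached_preset_level(first_image, second_image, max_mse):
    """Return True once the distortion is at or below the assumed preset level."""
    return distortion_mse(first_image, second_image) <= max_mse

original = np.zeros((2, 2))
reconstructed = np.full((2, 2), 2.0)
loss = distortion_mse(original, reconstructed)
```

In a joint training loop, such a check would be evaluated after each iteration to decide whether to continue updating the coding and decoding neural networks.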

In an optional design of the sixth aspect, the device further includes:

A quantization module, configured to quantize the N first adaptive data to obtain N first adaptive quantization data, where the N first adaptive quantization data are used to compensate the N first reconstructed image blocks.

In an optional design of the sixth aspect, the larger the N, the smaller the information entropy of the single first adaptive quantization data.

In an optional design of the sixth aspect, the arrangement order of the N first encoded representations is the same as the arrangement order of the N first image blocks, where the arrangement order of the N first image blocks is their arrangement order in the first image.

In an optional design of the sixth aspect, the second image is processed by a fusion neural network to obtain a third image, and the fusion neural network is used to reduce the difference between the second image and the first image, the difference includes block effect;

The third obtaining module is specifically configured to obtain the distortion loss of the third image relative to the first image;

The model includes a fusion neural network.

In an optional design of the sixth aspect, each of the N first image blocks has the same size.

In an optional design of the sixth aspect, in two iterations of training, the size of the first image used for training is different, and the size of the first image block is a fixed value.

In an optional design of the sixth aspect, the pixels of the first block are a×b, the a and the b are obtained according to a target pixel, and the target pixel is c×d,

is equal to an integer,

In an optional design of the sixth aspect, the target resolution is obtained by setting the resolution of the camera component according to a setting interface in a camera application.

In an optional design of the sixth aspect, the target resolution is obtained according to a target image group in a gallery obtained by the imaging component, where the pixels of the target image group are the target pixels and, among the image groups of different pixels, the target image group accounts for the largest proportion of the gallery.

In an optional design of the sixth aspect, images of multiple pixels are obtained by the imaging component, and the multiple pixels are e×f,

is equal to an integer,

equal to an integer, the e includes the c and the f includes the d.

In an optional design of the sixth aspect, the plurality of pixels are obtained by setting the resolution of the imaging component through a setting interface in the imaging application.

In an optional design of the sixth aspect, the pixels of the first image block are a×b, where a is the number of pixels in the width direction and b is the number of pixels in the height direction, and the pixels of the first image are r×t;

The device also includes:

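a padding module, configured to, if r/a is not equal to an integer and/or t/b is not equal to an integer, fill the edge of the first image with the pixel median value, so that r1/a is equal to an integer and t1/b is equal to an integer, where the pixels of the first image after filling are r1×t1.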

In an optional design of the sixth aspect, the device further includes:

Amplification module for use if described

is equal to an integer;

The padding module is specifically used if

In an optional design of the sixth aspect, the second acquisition module is further configured to, after proportionally amplifying r and t, if

not equal to an integer, get

the remainder;

The filling module is specifically used for if the remainder is greater than

Then the pixel median value is only filled on one side of the width direction of the first image.

In an optional design of the sixth aspect, if the remainder is less than

Wherein, the g is the remainder.

In an optional design of the sixth aspect, the N first tiles include a first target tile, and the range of pixel values of the first target tile is smaller than the range of pixel values of the first image ;

The device also includes:

The second obtaining module is specifically configured to obtain a first adaptive data from the inverse quantized first target image block.

A seventh aspect of the present application provides an encoding device, which may include a memory, a processor, and a bus system, wherein the memory is used to store a program, and the processor is used to execute the program in the memory, including the following steps:

get the first image;

Divide the first image to obtain N first image blocks, where N is an integer greater than 1;

Obtain N first adaptive data from N first image blocks, and N first adaptive data correspond to N first image blocks one-to-one;

Preprocess the N first image blocks by using the N first adaptive data;

The preprocessed N first image blocks are processed by the coding neural network to obtain N groups of first feature maps;

Perform quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations.
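The first steps of this encoding flow (dividing the image, extracting one piece of adaptive data per block, and preprocessing each block) can be sketched as below. The sketch assumes a single-channel image, raster-order division, the per-block pixel mean as the adaptive data, and mean subtraction as the preprocessing; these choices and the function names are illustrative assumptions, and the coding neural network, quantization, and entropy encoding are omitted.

```python
import numpy as np

def split_into_blocks(image, a, b):
    """Divide a 2-D image into blocks of width a and height b in raster order."""
    height, width = image.shape
    if width % a != 0 or height % b != 0:
        raise ValueError("image size must be a multiple of the block size")
    return [image[y:y + b, x:x + a]
            for y in range(0, height, b)
            for x in range(0, width, a)]

def extract_and_preprocess(blocks):
    """Return per-block adaptive data and the preprocessed blocks.

    Assumption: adaptive data is the block mean, and preprocessing
    subtracts that mean from the block.
    """
    adaptive = [float(block.mean()) for block in blocks]
    preprocessed = [block.astype(np.float64) - m
                    for block, m in zip(blocks, adaptive)]
    return adaptive, preprocessed

first_image = np.arange(16, dtype=np.float64).reshape(4, 4)
blocks = split_into_blocks(first_image, a=2, b=2)      # N = 4 blocks
adaptive, preprocessed = extract_and_preprocess(blocks)
```

Because the blocks are produced in raster order, the list index alone can serve as the correspondence between a block, its adaptive data, and its encoded representation.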

In an optional design of the seventh aspect, the encoding device is a virtual reality VR device, a mobile phone, a tablet, a laptop computer, a server, or a smart wearable device.

In the seventh aspect of the present application, the processor may also be configured to execute the steps performed by the encoding end in each possible implementation manner of the first aspect. For details, refer to the first aspect, which will not be repeated here.

An eighth aspect of the present application provides a decoding device, which may include a memory, a processor, and a bus system, wherein the memory is used to store a program, and the processor is used to execute the program in the memory, including the following steps:

Obtain N first encoded representations, N first adaptive data, and a correspondence, where the correspondence includes the correspondence between the N first adaptive data and the N first encoded representations, the N first adaptive data are in one-to-one correspondence with the N first encoded representations, and N is an integer greater than 1;

The N groups of second feature maps are processed by the decoding neural network to obtain N first reconstructed image blocks;

Compensate the N first reconstructed image blocks by using the N first adaptive data;

The compensated N first reconstructed image blocks are combined to obtain a second image.
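The last two decoding steps (compensation and combination) can be sketched as follows. The sketch assumes that compensation adds a scalar adaptive value back to each reconstructed block and that blocks arrive in raster order; `tiles_per_row` and the function name are hypothetical, and the entropy decoding and decoding neural network are omitted.

```python
import numpy as np

def compensate_and_combine(reconstructed_blocks, adaptive, tiles_per_row):
    """Compensate each reconstructed block and stitch the blocks into one image.

    Assumptions: each adaptive value is a scalar added to its block, and
    the blocks are ordered row by row as they appeared in the first image.
    """
    compensated = [block + value
                   for block, value in zip(reconstructed_blocks, adaptive)]
    rows = [np.hstack(compensated[i:i + tiles_per_row])
            for i in range(0, len(compensated), tiles_per_row)]
    return np.vstack(rows)

reconstructed_blocks = [np.zeros((2, 2)) for _ in range(4)]
second_image = compensate_and_combine(
    reconstructed_blocks, adaptive=[1.0, 2.0, 3.0, 4.0], tiles_per_row=2)
```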

In an optional design of the eighth aspect, the decoding device is a virtual reality VR device, a mobile phone, a tablet, a laptop computer, a server, or a smart wearable device.

In the eighth aspect of the present application, the processor may also be configured to execute the steps performed by the decoding end in each possible implementation manner of the second aspect, and details can be found in the second aspect, which will not be repeated here.

A ninth aspect of the present application provides a training device, which may include a memory, a processor, and a bus system, wherein the memory is used to store a program, and the processor is used to execute the program in the memory, including the following steps:

get the first image;

preprocessing the N first image blocks according to the N first adaptive data;

obtaining a distortion loss of the second image relative to the first image;

The first encoding neural network, the quantization network, the entropy encoding network, the entropy decoding network, and the first decoding neural network are jointly trained by using the loss function until the image distortion value between the first image and the second image reaches the first preset level;

In the ninth aspect of the present application, the processor may also be used to execute the steps performed by the decoding end in each possible implementation manner of the third aspect, and details can be found in the third aspect, which will not be repeated here.

In a tenth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to execute the image processing method described in any one of the first to third aspects above.

In an eleventh aspect, an embodiment of the present application provides a computer program, which, when run on a computer, causes the computer to execute the image processing method described in any one of the first to third aspects above.

In a twelfth aspect, the present application provides a chip system. The chip system includes a processor for supporting an execution device or a training device in implementing the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods. In a possible design, the chip system further includes a memory for storing the program instructions and data necessary for the execution device or the training device. The chip system may be composed of chips, or may include chips and other discrete devices.

Description of drawings

FIG. 1 is a schematic structural diagram of the main framework of artificial intelligence;

FIG. 2a is a schematic diagram of an application scenario of an embodiment of the present application;

FIG. 2b is a schematic diagram of another application scenario of an embodiment of the present application;

FIG. 2c is a schematic diagram of another application scenario of an embodiment of the present application;

FIG. 3a is a schematic flowchart of an image processing method provided by an embodiment of the present application;

FIG. 3b is another schematic flowchart of the image processing method provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of dividing and combining images in an embodiment of the present application;

FIG. 5 is a schematic diagram of a CNN-based image encoding process in an embodiment of the present application;

FIG. 6 is a schematic diagram of a CNN-based image decoding process in an embodiment of the present application;

FIG. 7 is a schematic diagram of a setting interface for setting the resolution of a camera of a terminal device in an embodiment of the present application;

FIG. 8 is a schematic flowchart of image filling in an embodiment of the present application;

FIG. 9 is another schematic flowchart of image filling in an embodiment of the present application;

FIG. 10 is a schematic diagram of a comparison of image compression quality in an embodiment of the present application;

FIG. 11 is a system architecture diagram of an image processing system provided by an embodiment of the present application;

FIG. 12 is a schematic flowchart of a model training method provided by an embodiment of the present application;

FIG. 13 is a schematic diagram of a training process provided by an embodiment of the present application;

FIG. 14 is a schematic structural diagram of an encoding device provided by an embodiment of the present application;

FIG. 15 is a schematic structural diagram of a decoding device provided by an embodiment of the present application;

FIG. 16 is a schematic structural diagram of a training device provided by an embodiment of the present application;

FIG. 17 is a schematic structural diagram of an execution device provided by an embodiment of the present application;

FIG. 18 is a schematic structural diagram of a training device provided by an embodiment of the present application;

FIG. 19 is a schematic structural diagram of a chip provided by an embodiment of the present application.

detailed description

The embodiments of the present invention will be described below with reference to the accompanying drawings in the embodiments of the present invention. The terms used in the embodiments of the present invention are only used to explain specific embodiments of the present invention, and are not intended to limit the present invention.

The embodiments of the present application will be described below with reference to the accompanying drawings. Those of ordinary skill in the art know that with the development of technology and the emergence of new scenarios, the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems.

The terms "first", "second" and the like in the description and claims of the present application and the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that the terms used in this way can be interchanged under appropriate circumstances, and this is only a distinguishing manner adopted when describing objects with the same attributes in the embodiments of the present application. Furthermore, the terms "comprising" and "having" and any variations thereof are intended to cover a non-exclusive inclusion, such that a process, method, system, product or device comprising a series of units is not necessarily limited to those units, but may include other units that are not explicitly listed or that are inherent to the process, method, product, or device.

First, the overall workflow of the artificial intelligence system will be described. Please refer to FIG. 1, which is a schematic structural diagram of the main framework of artificial intelligence. This framework is elaborated along two dimensions. The "intelligent information chain" reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, data goes through the refinement process of "data-information-knowledge-wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (provision and processing technology implementation) to the industrial ecology of the system.

(1) Infrastructure

The infrastructure provides computing power support for the artificial intelligence system, communicates with the outside world, and is supported by the basic platform. Communication with the outside world takes place through sensors; computing power is provided by smart chips (hardware acceleration chips such as CPU, NPU, GPU, ASIC, and FPGA); the basic platform includes platform guarantees and support related to the distributed computing framework and the network, which can include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside world to obtain data, and the data is provided for calculation to the smart chips in the distributed computing system provided by the basic platform.

(2) Data

The data on the upper layer of the infrastructure is used to represent the data sources in the field of artificial intelligence. The data involves graphics, images, voice, and text, as well as IoT data from traditional devices, including business data from existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.

(3) Data processing

Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making, etc.

Among them, machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, etc. on data.

Reasoning refers to the process of simulating human's intelligent reasoning method in a computer or intelligent system, using formalized information to carry out machine thinking and solving problems according to the reasoning control strategy, and the typical function is search and matching.

Decision-making refers to the process of making decisions after intelligent information is reasoned, usually providing functions such as classification, sorting, and prediction.

(4) General ability

After the above-mentioned data processing, some general capabilities can be formed based on the results of data processing, such as algorithms or a general system, such as translation, text analysis, computer vision processing, speech recognition, image identification, etc.

(5) Smart products and industry applications

Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They encapsulate the overall artificial intelligence solution and turn intelligent information decision-making into products that are deployed in practice. The application areas mainly include intelligent terminals, intelligent transportation, smart healthcare, autonomous driving, safe cities, and the like.

This application can be applied to the field of image processing in the field of artificial intelligence. The following will introduce multiple application scenarios that are applied to products.

1. Image compression and decompression applied to the terminal device: the encoding end and the decoding end are both terminal devices.

The image compression method provided by the embodiment of the present application can be applied to an image compression process in a terminal device, and specifically, can be applied to an album, video monitoring, and the like on the terminal device. Specifically, reference may be made to FIG. 2a, which is a schematic diagram of an application scenario of an embodiment of the present application. As shown in FIG. 2a, a terminal device may acquire an image to be compressed, where the image to be compressed may be a photo taken by a camera component or a frame captured from a video, and the camera component is generally a camera. The terminal device divides the acquired image through a central processing unit (CPU) to obtain multiple image blocks. After obtaining the multiple image blocks, the terminal device can perform feature extraction on them through the artificial intelligence (AI) coding neural network (referred to as the coding neural network) in the embedded neural-network processing unit (NPU), transform the image block data into output features with lower redundancy, and generate a probability estimate for each point in the output features. Entropy coding is then performed on the extracted output features by using these probability estimates, which reduces the coding redundancy of the output features and further reduces the amount of data transmitted in the image block compression process, and the encoded data is saved in the corresponding storage location in the form of a data file. When the user needs to obtain the file saved in the above storage location, the CPU can obtain and load the saved file from the corresponding storage location and obtain the decoded feature maps through entropy decoding. The AI decoding neural network (referred to as the decoding neural network) reconstructs the feature maps to obtain multiple reconstructed image blocks. After obtaining the multiple reconstructed image blocks, the terminal device combines them through the CPU to obtain a reconstructed image.
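The role of the probability estimates in entropy coding can be illustrated with the ideal code length: a symbol with estimated probability p costs about −log2(p) bits, so better estimates yield fewer bits. The sketch below only computes this ideal length and is not an actual entropy coder; the symbol alphabet, the probability table, and the function name are hypothetical.

```python
import math

def ideal_code_length_bits(symbols, probabilities):
    """Sum of -log2(p) over the symbols, i.e. the ideal entropy-coded size."""
    return sum(-math.log2(probabilities[s]) for s in symbols)

# Hypothetical probability estimates for three quantized feature values.
probabilities = {0: 0.5, 1: 0.25, 2: 0.25}
bits = ideal_code_length_bits([0, 0, 1, 2], probabilities)
```

A practical entropy coder, such as an arithmetic coder, approaches this length to within a small overhead.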

In particular, in this scenario, the terminal device can save the encoded data on the cloud device. When the user needs to obtain the above-mentioned encoded data, the encoded data can be obtained from the cloud device.

2. Image compression and decompression applied to the cloud: the encoding end and the decoding end are both cloud devices.

The image compression method provided by the embodiment of the present application can be applied to an image compression process in the cloud, and specifically, can be applied to functions such as a cloud album on a cloud device, where the cloud device can be a cloud server. Specifically, reference may be made to FIG. 2b, which is a schematic diagram of another application scenario of an embodiment of the present application. As shown in FIG. 2b, a terminal device may acquire an image to be compressed, where the image to be compressed may be a photo taken by a camera component or a frame captured from a video. The terminal device can perform entropy encoding on the image to be compressed through the CPU to obtain encoded data. Instead of entropy coding, any lossless compression method in the prior art can also be used. The terminal device can transmit the encoded data to the cloud device, and the cloud device can perform corresponding entropy decoding on the received encoded data to obtain the image to be compressed. The terminal device divides the extracted image through the CPU to obtain multiple image blocks. After obtaining the multiple image blocks, the server can perform feature extraction on them through the coding neural network in the graphics processing unit (GPU), transform the image block data into output features with lower redundancy, and generate a probability estimate for each point in the output features. The CPU performs entropy encoding on the extracted output features by using the probability estimate of each point in the output features, which reduces the coding redundancy of the output features and further reduces the amount of data transmitted in the image block compression process, and the encoded data is stored in the corresponding storage location in the form of a data file. When the user needs to obtain the file saved in the above storage location, the CPU can obtain and load the saved file from the corresponding storage location, obtain the decoded feature maps through entropy decoding, and reconstruct the feature maps through the decoding neural network in the NPU to obtain multiple reconstructed image blocks. After obtaining the multiple reconstructed image blocks, the cloud device combines them through the CPU to obtain a reconstructed image. The cloud device can perform entropy encoding on the compressed image through the CPU to obtain encoded data; any other lossless compression method in the prior art can also be used. The cloud device can transmit the encoded data to the terminal device, and the terminal device can perform corresponding entropy decoding on the received encoded data to obtain a decoded image.

3. Image compression is performed on the cloud device and image decompression on the terminal device: the encoding end is the cloud device, and the decoding end is the terminal device.

The image compression method provided by the embodiments of the present application can be applied to image compression on cloud devices and image decompression on terminal devices, for example to functions such as a cloud album on a cloud device, where the cloud device can be a cloud server. Specifically, reference may be made to FIG. 2c, which is a schematic diagram of another application scenario of an embodiment of the present application. As shown in FIG. 2c, a terminal device may acquire an image to be compressed, which may be a photo captured by a camera component or a frame taken from a video. The terminal device can perform entropy encoding on the image to be compressed through the CPU to obtain encoded data; besides entropy coding, any lossless compression method based on the prior art can also be used. The terminal device can transmit the encoded data to the cloud device, and the cloud device can perform the corresponding entropy decoding on the received encoded data to obtain the image to be compressed. The cloud device divides the obtained image through the CPU to obtain multiple image blocks. The cloud device can then perform feature extraction on the image blocks through the coding neural network in the GPU, transforming the image block data into output features with lower redundancy and generating a probability estimate for each point in the output features. The CPU performs entropy encoding on the extracted output features using the probability estimate of each point, which reduces the coding redundancy of the output features and further reduces the amount of data transmitted during image block compression. The encoded data obtained by encoding is saved in the corresponding storage location in the form of a data file.
When the terminal device needs to acquire the above image, the terminal device receives the encoded data sent by the cloud device, and acquires the decoded feature map based on entropy decoding. The terminal device reconstructs the feature map through the decoding neural network in the NPU to obtain multiple reconstructed image blocks. After obtaining multiple image blocks, the terminal device combines the multiple reconstructed image blocks through the CPU to obtain a reconstructed image.

Since the embodiments of the present application involve a large number of neural network applications, for ease of understanding, related terms and concepts of the neural networks that may be involved in the embodiments of the present application are first introduced below.

(1) Neural network

A neural network can be composed of neural units. A neural unit can refer to an operation unit that takes $x_s$ and an intercept of 1 as inputs, and the output of the operation unit can be:

$$h_{W,b}(x) = f\left(W^{T}x\right) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$$

where s = 1, 2, ..., n, n is a natural number greater than 1, $W_s$ is the weight of $x_s$, and b is the bias of the neural unit. f is the activation function of the neural unit, which introduces nonlinear characteristics into the neural network by converting the input signal of the neural unit into an output signal. The output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function. A neural network is a network formed by connecting many such neural units together, that is, the output of one neural unit can be the input of another neural unit. The input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of that local receptive field, and the local receptive field can be an area composed of several neural units.
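As a minimal sketch of the neural unit described above, the following Python snippet computes f(ΣW_s·x_s + b) with a sigmoid activation. The function names and the input values are illustrative assumptions, not part of the application.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, one possible choice for the activation function f."""
    return 1.0 / (1.0 + math.exp(-z))

def neural_unit(xs, ws, b, f=sigmoid):
    """Output of one neural unit: f(sum_s W_s * x_s + b)."""
    if len(xs) != len(ws):
        raise ValueError("xs and ws must have the same length")
    return f(sum(w * x for w, x in zip(ws, xs)) + b)

# Illustrative inputs: the weighted sum is 0.5*1.0 - 0.25*2.0 + 0.0 = 0.0
out = neural_unit(xs=[1.0, 2.0], ws=[0.5, -0.25], b=0.0)
```

Because the weighted sum is zero here, the sigmoid returns its midpoint value.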

(2) Deep neural network

A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with multiple hidden layers. According to their positions, the layers inside a DNN can be divided into three categories: input layer, hidden layer, and output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and the layers in between are all hidden layers. The layers are fully connected, that is, every neuron in the i-th layer is connected to every neuron in the (i+1)-th layer.

Although a DNN looks complicated, the work of each layer is not. In short, each layer computes the following linear relationship followed by an activation:

$$\vec{y} = \alpha\left(W\vec{x} + \vec{b}\right)$$

where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, $W$ is the weight matrix (also called the coefficients), and $\alpha()$ is the activation function. Each layer simply applies this operation to the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, the number of coefficients $W$ and offset vectors $\vec{b}$ is also large. These parameters are defined in the DNN as follows, taking the coefficient $W$ as an example: suppose that, in a three-layer DNN, the linear coefficient from the 4th neuron in the second layer to the 2nd neuron in the third layer is defined as $W^{3}_{24}$. The superscript 3 represents the layer in which the coefficient $W$ is located, and the subscript corresponds to the output index 2 in the third layer and the input index 4 in the second layer.

To sum up, the coefficient from the k-th neuron in layer L-1 to the j-th neuron in layer L is defined as $W^{L}_{jk}$.

It should be noted that the input layer does not have a W parameter. In a deep neural network, more hidden layers allow the network to better capture the complexities of the real world. In theory, a model with more parameters is more complex and has a larger "capacity", which means that it can complete more complex learning tasks. Training the deep neural network is the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (the weight matrix formed by the vectors W of many layers).
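The per-layer operation and the indexing convention for the coefficients can be sketched as follows. This is only an illustration: the layer sizes, the weight values, and the choice of tanh as the activation are assumptions, and `W[j][k]` uses 0-based indices where the text uses 1-based ones.

```python
import math

def dense_layer(W, x, b, alpha):
    """Compute y = alpha(W x + b); W[j][k] links input k to output j."""
    return [alpha(sum(W[j][k] * x[k] for k in range(len(x))) + b[j])
            for j in range(len(W))]

def forward(layers, x, alpha=math.tanh):
    """Apply each (W, b) pair in turn; the input layer itself has no W."""
    for W, b in layers:
        x = dense_layer(W, x, b, alpha)
    return x

# Hypothetical three-layer network: 2 inputs -> 4 hidden units -> 2 outputs
layers = [
    ([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]], [0.0, 0.0, 0.0, 0.0]),
    ([[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]], [0.0, 0.0]),
]
# Coefficient from the 4th neuron of layer 2 to the 2nd neuron of layer 3
w3_24 = layers[1][0][1][3]
y = forward(layers, [1.0, -1.0])
```

Storing the output index first and the input index second matches the subscript order described above.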

(3) Convolutional Neural Network

A convolutional neural network (CNN) is a deep neural network with a convolutional structure. A convolutional neural network contains a feature extractor composed of convolutional layers and subsampling layers, and this feature extractor can be viewed as a filter. A convolutional layer is a layer of neurons in the convolutional neural network that performs convolution on the input signal. In a convolutional layer, a neuron may be connected to only some of the neurons in the adjacent layer. A convolutional layer usually contains several feature planes, and each feature plane can be composed of neural units arranged in a rectangle. Neural units in the same feature plane share weights, and the shared weights are the convolution kernel. Weight sharing can be understood as meaning that the way image information is extracted is independent of position. The convolution kernel can be initialized in the form of a matrix of random size, and it obtains reasonable weights through learning during the training of the convolutional neural network. In addition, a direct benefit of weight sharing is that it reduces the connections between the layers of the convolutional neural network while also reducing the risk of overfitting.

(4) Loss function

In the process of training a deep neural network, the output of the network should be as close as possible to the value that is actually desired. The predicted value of the current network can therefore be compared with the desired target value, and the weight vector of each layer is then updated according to the difference between the two (there is usually an initialization process before the first update, in which parameters are pre-configured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to make the prediction lower, and the adjustment continues until the deep neural network can predict the desired target value or a value very close to it. It is therefore necessary to define in advance how to measure the difference between the predicted value and the target value. This is the role of the loss function or objective function, which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, the higher its output value (the loss), the greater the difference, so training the deep neural network becomes a process of reducing the loss as much as possible.

(5) Back propagation algorithm

During training, the neural network can use the error back propagation (BP) algorithm to correct the values of the parameters in the initial neural network model, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, the input signal is passed forward until an error loss is produced at the output, and the parameters in the initial neural network model are updated by propagating the error loss information backward, so that the error loss converges. The back-propagation algorithm is a backward pass driven by the error loss, aiming to obtain the parameters of the optimal neural network model, such as the weight matrix.
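The interplay between the loss function and back propagation can be illustrated with a deliberately small example: a single linear unit trained by gradient descent on a mean squared error loss. The toy data, the learning rate, and the function name are assumptions chosen for illustration; the networks in the embodiments are far larger and their loss functions are not specified here.

```python
def train_linear_unit(samples, lr=0.1, epochs=200):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0  # initialization before the first update
    history = []
    n = len(samples)
    for _ in range(epochs):
        # Forward pass: prediction errors and the resulting loss
        errors = [(w * x + b) - t for x, t in samples]
        history.append(sum(e * e for e in errors) / n)
        # Backward pass: gradients of the loss with respect to w and b
        grad_w = sum(2 * e * x for e, (x, _) in zip(errors, samples)) / n
        grad_b = sum(2 * e for e in errors) / n
        # Update step: move the parameters against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, history

# Toy data generated from t = 2x + 1 (illustrative only)
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b, history = train_linear_unit(samples)
```

The recorded `history` shows the loss shrinking as the parameters approach the values that generated the data.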

In the embodiments of the present application, in addition to image segmentation, a step of extracting adaptive data from the multiple image blocks is added between image segmentation and the coding neural network. The adaptive data may be the mean, the mean square error, or the like. The extracted adaptive data is used to preprocess the image blocks, and the coding neural network performs feature extraction on the preprocessed image blocks. In addition to preprocessing the image blocks, the adaptive data is also used to compensate the reconstructed image blocks. For convenience of description, the image processing method in the embodiments of the present application is described in detail below by taking the mean as the adaptive data and the third application scenario above as the application scenario. In the third application scenario, the cloud device is the encoding end, and the terminal device is the decoding end.

As an example, the terminal device may be a mobile phone, a tablet, a notebook computer, a smart wearable device, or the like. As another example, the terminal device may be a virtual reality (VR) device. As another example, the embodiments of the present application can also be applied to intelligent monitoring, in which a camera can be configured so that the intelligent monitoring system obtains images to be compressed through the camera. It should be understood that the embodiments of the present application can also be applied to other scenarios that require image compression, which are not listed one by one here.

Please refer to FIG. 3a, which is a schematic flowchart of an image processing method provided by an embodiment of the present application.

In step 301, the terminal device acquires a first image.

The terminal device can obtain the first image. The first image can be a photo taken by a camera component of the terminal device (generally a camera) or a frame captured from a recorded video. The first image may also be an image obtained by the terminal device from the network, or an image obtained by the terminal device using a screen capture tool.

In particular, regarding the image processing method in the embodiment of the present application, reference may also be made to FIG. 3b, which is another schematic flowchart of the image processing method provided by the embodiment of the present application. Fig. 3b illustrates the whole process of outputting the third image from the first image.

In step 302, the terminal device sends the first image to the cloud device.

Before the terminal device sends the first image to the cloud device, the terminal device may perform lossless encoding on the first image to obtain encoded data. The encoding method can be entropy encoding, or other lossless compression methods.

In step 303, the cloud device divides the first image to obtain N first image blocks.

The cloud device may receive the first image sent by the terminal device. If the first image was losslessly encoded by the terminal device, the cloud device also needs to perform lossless decoding on it. The cloud device divides the first image to obtain N first image blocks, where N is an integer greater than 1. FIG. 4 is a schematic diagram of dividing and combining images in an embodiment of the present application. As shown in FIG. 4, the first image 401 is divided into 12 first image blocks. When the size of the first image is fixed, the size of the first image block determines the value of N. The value of 12 for N is only an example; the size of the first image block is described in detail later.

Optionally, each of the N first tiles has the same size.
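The division in step 303 can be sketched as follows for a single-channel image stored as a list of rows. The function name, the row-major ordering of the blocks, and the requirement that the image size be a multiple of the block size are assumptions made for this illustration.

```python
def split_image(image, block_h, block_w):
    """Split a 2-D image (list of rows) into equal, non-overlapping blocks.

    Blocks are returned in row-major order (left to right, top to bottom).
    Assumes the image height and width are multiples of the block size.
    """
    height, width = len(image), len(image[0])
    if height % block_h or width % block_w:
        raise ValueError("image size must be a multiple of the block size")
    blocks = []
    for top in range(0, height, block_h):
        for left in range(0, width, block_w):
            blocks.append([row[left:left + block_w]
                           for row in image[top:top + block_h]])
    return blocks

# Hypothetical 6x8 single-channel image split into 12 blocks of 2x2 pixels
image = [[r * 8 + c for c in range(8)] for r in range(6)]
blocks = split_image(image, block_h=2, block_w=2)
```

With a fixed image size, choosing a smaller block size yields a larger N, as stated above.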

In step 304, the cloud device obtains M first averages from the N first tiles.

If the first image is a three-channel image, the first tile includes data of three channels, and the number M of the first average values obtained by the cloud device is equal to 3N. If the first image is a grayscale image, that is, a one-channel image, the first block includes data of one channel, and the number M of the first mean values obtained by the cloud device is equal to N. Because the processing manner of each channel is similar, for the convenience of description, only one channel is used as an example for description in this embodiment of the present application. The mean refers to the mean of the pixel values of all the pixels in the first tile.

In step 305, the cloud device preprocesses the N first image blocks by using the N first averages.

The preprocessing may be to subtract the mean value from the pixel value of each pixel point in the first image block to obtain N first image blocks after preprocessing.
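Steps 304 and 305 can be sketched for one channel as follows: compute the mean of each block, then subtract it from every pixel of that block. The helper names and pixel values are illustrative assumptions.

```python
def block_mean(block):
    """Mean of the pixel values of all pixels in one single-channel block."""
    pixels = [p for row in block for p in row]
    return sum(pixels) / len(pixels)

def preprocess(blocks):
    """Return (means, preprocessed blocks) with each block's mean removed."""
    means = [block_mean(b) for b in blocks]
    centered = [[[p - m for p in row] for row in b]
                for b, m in zip(blocks, means)]
    return means, centered

# Two illustrative 2x2 blocks
blocks = [[[10, 20], [30, 40]], [[200, 200], [200, 200]]]
means, centered = preprocess(blocks)
```

The list of means keeps the same order as the blocks, which is one simple way to maintain the one-to-one correspondence needed later for compensation.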

In step 306, the pre-processed N first image blocks are processed through an encoding neural network to obtain N sets of first feature maps.

In this embodiment of the present application, optionally, the coding neural network is a CNN, and the cloud device may perform feature extraction on the preprocessed N first image blocks based on the CNN to obtain N groups of first feature maps. Each group of first feature maps corresponds to one first image block, and each group of first feature maps includes at least one feature map. Hereinafter, the first feature map may also be referred to as a channel feature map image, where each semantic channel corresponds to one first feature map.

In the embodiment of the present application, referring to FIG. 5, FIG. 5 is a schematic diagram of a CNN-based image coding process in the embodiment of the present application. FIG. 5 shows a first image block 501, a CNN 502, a channel feature map 503, and a set of first feature maps 504, where the CNN 502 may include multiple CNN layers.

For example, the CNN 502 can multiply the upper-left 3×3 pixels of the input data (the first image block) by the weights and map the result to the neuron at the upper-left corner of the first feature map. The weights to be multiplied are also 3×3. Thereafter, in the same way, the CNN 502 scans the input data (the first image block) position by position from left to right and top to bottom, multiplying by the weights to map the neurons of the feature map. Here, the 3×3 weights are called a filter or filter kernel. That is to say, applying filters in the CNN 502 is the process of performing convolution operations with filter kernels, and the extracted results are called "channel feature maps". The channel feature maps can also be called a multi-channel feature map image, where the term "multi-channel feature map image" may refer to a set of feature map images corresponding to multiple channels. According to an embodiment, the channel feature map may be generated by the CNN 502, also referred to as a "feature extraction layer" or "convolutional layer" of a CNN. The layers of a CNN define a mapping from input to output. The mapping defined by a layer is performed as one or more filter kernels (convolution kernels) applied to the input data to generate the channel feature maps that are output to the next layer. The input data can be the first image block or the channel feature map output by a layer of the CNN 502.
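The scanning process described above can be sketched as a plain 2-D convolution with a single 3×3 kernel. The stride of 1, the absence of padding, and the kernel values are assumptions for illustration; as in most CNN frameworks, the kernel is applied without flipping.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):          # top to bottom
        row = []
        for j in range(out_w):      # left to right
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

# Illustrative 4x4 block and a 3x3 kernel of ones
block = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
feature_map = conv2d(block, kernel)
```

The upper-left output value is the weighted sum of the upper-left 3×3 pixels, and each further output value comes from shifting the window by one pixel.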

Referring to FIG. 5, during forward execution, the CNN 502 receives a first image block 501 as input and generates a channel feature map 503 as output. The next CNN layer then receives the channel feature map 503 as input and generates the channel feature map of the next layer as output. Each subsequent layer likewise receives the channel feature map generated by the previous layer as input and generates the channel feature map of the next layer. Finally, the set of first feature maps 504 generated by the X1-th layer is obtained, where X1 is an integer greater than 1; that is, the channel feature maps of each of the above layers may be used as a set of first feature maps 504.

The cloud device repeats the above operations for each first block, so as to obtain N groups of first feature maps.

Optionally, as the number of layers of the CNN 502 increases, the length and width of each feature map in the multi-channel feature map image gradually decrease and the number of semantic channels gradually increases, thereby compressing the data of the first image block.

At the same time, other processing operations can be performed in addition to the operation of applying convolution kernels that map input feature maps to output feature maps. Examples of other processing operations may include, but are not limited to, applications such as activation functions, pooling, resampling, and the like.

For example, as shown in FIG. 3b, optionally, each convolutional layer is followed by a GDN (generalized divisive normalization) activation function, which in its standard form is expressed as:

$$v_j^{(i)} = \frac{u_j^{(i)}}{\sqrt{\beta_j + \sum_k \gamma_{jk}\left(u_k^{(i)}\right)^2}}$$

where $u_j^{(i)}$ represents the j-th channel of the output of the i-th convolutional layer, $v_j^{(i)}$ represents the output of the corresponding activation function, and β and γ are the trainable parameters of the activation function, which are used to enhance the nonlinear expression ability of the neural network.
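The following sketch evaluates GDN across channels at a single spatial position. It assumes the commonly used form v_j = u_j / sqrt(β_j + Σ_k γ_jk·u_k²); the function name and all parameter values are illustrative assumptions, and in a trained network β and γ would be learned rather than fixed.

```python
import math

def gdn(u, beta, gamma):
    """Apply GDN across channels at a single spatial position.

    u[j] is the value of channel j; beta[j] and gamma[j][k] are the
    trainable parameters (fixed here for illustration).
    """
    return [u[j] / math.sqrt(beta[j] + sum(gamma[j][k] * u[k] ** 2
                                           for k in range(len(u))))
            for j in range(len(u))]

u = [3.0, 4.0]
beta = [1.0, 1.0]
gamma = [[1.0, 0.0], [0.0, 0.5]]
v = gdn(u, beta, gamma)
```

Each channel is divided by a term that depends on the energy of the other channels, which is what makes the activation nonlinear.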

It should be noted that the above is only an implementation manner of performing feature extraction on the first image block, and in practical applications, the specific implementation manner of feature extraction is not limited.

In this embodiment of the present application, in the above manner, the first image block is transformed into another space (at least one first feature map) through the CNN. Optionally, the number of first feature maps is 192, that is, the number of semantic channels is 192, and each semantic channel corresponds to one first feature map. In this embodiment of the present application, the at least one first feature map may take the form of a three-dimensional tensor of size 192×w×h, where w and h are the width and height of the matrix corresponding to the first feature map of a single channel.

In step 307, the N groups of first feature maps are quantized and entropy encoded to obtain N first encoded representations.

In the embodiment of the present application, after the coding neural network processes the preprocessed N first image blocks to obtain the N groups of first feature maps, quantization and entropy encoding may be performed on the N groups of first feature maps to obtain N first encoded representations.

In this embodiment of the present application, the values in the N groups of first feature maps are converted to quantization centers according to a specified rule, so that entropy coding can be performed subsequently. The quantization operation may convert the N groups of first feature maps from floating-point numbers into integers of a specific bit width (e.g., 8-bit or 4-bit integers) that can be written to a bit stream. In some embodiments, the quantization operation may be performed on the N groups of first feature maps by rounding, although it is not limited to rounding.
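Quantization by rounding can be sketched as follows. The function name and the sample values are assumptions; note that Python's built-in `round` rounds exact halves to the nearest even integer, which is only one of several possible rounding rules.

```python
def quantize(feature_map):
    """Round every floating-point value to the nearest integer."""
    return [[int(round(v)) for v in row] for row in feature_map]

# Illustrative 2x2 feature map of floating-point values
feature_map = [[0.2, -1.7], [3.4, 2.6]]
quantized = quantize(feature_map)
```

The resulting integers are the symbols that the entropy coder subsequently encodes.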

In the embodiment of the present application, the probability estimate of each point in the output features can be obtained by using an entropy estimation network, and the output features are entropy encoded using the probability estimates to obtain a binary code stream. It should be noted that the entropy encoding process mentioned in this application can use existing entropy coding technology, which is not described again in this application.
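To illustrate why the probability estimates matter, the sketch below computes the ideal code length, -log2 p per symbol, that an entropy coder can approach. It does not implement an actual entropy coder, and the probability table is a hypothetical stand-in for the output of the entropy estimation network.

```python
import math

def ideal_code_length(symbols, prob):
    """Total ideal code length in bits: sum of -log2 p(symbol)."""
    return sum(-math.log2(prob[s]) for s in symbols)

# Hypothetical probability estimates for quantized feature values
prob = {0: 0.5, 1: 0.25, -1: 0.25}
symbols = [0, 0, 1, -1, 0, 0]
bits = ideal_code_length(symbols, prob)
```

Symbols that the model predicts as likely cost fewer bits, so more accurate probability estimates give a shorter code stream.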

In step 308, the cloud device sends the N first encoded representations, the N first mean values and the corresponding relationship to the terminal device.

In the above step 302, the terminal device stores the first image in the cloud device. If the terminal device needs to acquire the first image, it can send a request to the cloud device. After the cloud device receives the request sent by the terminal device, the cloud device sends N first encoded representations, N first mean values and corresponding relationships to the terminal device. The correspondence relationship refers to the correspondence relationship between the N first coded representations and the N first mean values.

Optionally, the arrangement order of the N first encoded representations is the same as the arrangement order of the N first image blocks, where the arrangement order of the N first image blocks is their order in the first image. The correspondence includes the arrangement order of the N first encoded representations and the arrangement order of the N first image blocks.

Optionally, before the cloud device sends the N first mean values to the terminal device, the cloud device quantizes the N first mean values to obtain N first quantized mean values. For example, the pixel value of each pixel in the first image block is represented by 8 bits, and the first mean value of the first image block is a 32-bit floating-point number. The cloud device quantizes the first mean value, and the number of bits of the resulting first quantized mean value is less than 32. The smaller the number of bits of the first quantized mean value, the smaller its information entropy. Further, the number of bits of the first quantized mean value may be equal to the number of bits of the pixel value of each pixel in the first image block, that is, when the pixel value of each pixel in the first image block is represented by 8 bits, the first quantized mean value is also represented by 8 bits.
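One simple way to quantize a floating-point mean to the same bit width as the pixels is rounding followed by clamping, as sketched below. This particular rule and the function name are assumptions; the application does not fix the quantization method.

```python
def quantize_mean(mean, bits=8):
    """Round a floating-point mean to an unsigned integer of `bits` bits."""
    max_value = (1 << bits) - 1
    return min(max(int(round(mean)), 0), max_value)

# Illustrative means, including values just outside the 8-bit range
quantized_means = [quantize_mean(m) for m in [25.3, 199.8, 255.9, -0.4]]
```

If the decoder compensates with the quantized means, the encoder would subtract the same quantized means so that both ends use identical values.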

Optionally, when the cloud device quantizes the N first mean values, the larger the value of N, the smaller the information entropy of a single first quantized mean value. Here the information entropy describes the degree to which the cloud device quantizes the N first mean values: the smaller the information entropy of a single first quantized mean value, the higher the degree of quantization applied by the cloud device to the N first mean values. For first images of the same size, the smaller each first image block, the larger N is, and the larger N is, the larger the data amount of the N first mean values. For example, assuming that the size of the first image is 640×480 pixels, the size of each first image block is 320×480 pixels, N is 2, and each first quantized mean value is represented by 8 bits, the data amount of the N first quantized mean values is 2×8 bits. Assuming instead that the size of each first image block is 1×1 pixel, N is 640×480, and each first quantized mean value is represented by 8 bits, the data amount of the N first quantized mean values is 640×480×8 bits, which equals the data amount of the first image, also 640×480×8 bits. It can be seen that the larger the value of N, the larger the data amount of the N first mean values; when N equals the number of pixels in the first image, even the quantized first mean values reach the data amount of the first image. Therefore, in the embodiments of the present application, the larger the value of N, the smaller the information entropy of a single first quantized mean value.
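The arithmetic in this example can be checked with a few lines of Python. The function name is an assumption; the image and block sizes are the ones used in the text, for a single channel.

```python
def mean_side_info_bits(image_w, image_h, block_w, block_h, bits_per_mean=8):
    """Return (N, total bits) for the quantized means of one channel."""
    n = (image_w // block_w) * (image_h // block_h)
    return n, n * bits_per_mean

# 640x480 image with 320x480 blocks, then with 1x1 blocks
n_large, bits_large = mean_side_info_bits(640, 480, 320, 480)
n_pixel, bits_pixel = mean_side_info_bits(640, 480, 1, 1)
image_bits = 640 * 480 * 8  # 8-bit single-channel image
```

With 1×1 blocks the means alone occupy as many bits as the image itself, which is why fewer bits per mean are desirable as N grows.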

In step 309, the terminal device performs entropy decoding on the N first encoded representations to obtain N sets of second feature maps.

After the terminal device receives the N first encoded representations sent by the cloud device, the terminal device performs entropy decoding on the N first encoded representations to obtain N sets of second feature maps.

In step 310, the terminal device processes N groups of second feature maps through a decoding neural network to obtain N first reconstructed image blocks.

In this embodiment of the present application, optionally, the decoding neural network is a CNN, and the terminal device may reconstruct the N groups of second feature maps based on the CNN to obtain N first reconstructed image blocks. Each group of second feature maps corresponds to one first image block, and each group of second feature maps includes at least one feature map. Hereinafter, the second feature map may also be referred to as a reconstructed feature map image, where each semantic channel corresponds to one second feature map.

In the embodiment of the present application, referring to FIG. 6, FIG. 6 is a schematic diagram of a CNN-based image decoding process in the embodiment of the present application, and FIG. 6 shows a set of second feature maps 601, transposed CNN 602, and reconstructed feature maps 603 , the first reconstructed image block 604 . The transposed CNN 602 may include multiple transposed CNN layers.

For example, the transposed CNN 602 can multiply the top-left pixel of the input data (a set of second feature maps 601) by the weights and map the result to the neurons at the top left of the reconstructed feature map 603. The weights to be multiplied are 3×3. Thereafter, in the same way, the transposed CNN 602 scans the input data (the set of second feature maps 601) position by position from left to right and top to bottom, multiplying by the weights to map the neurons of the feature map. After passing through the transposed CNN 602 with 3×3 weights, the length and width of the resulting reconstructed feature map 603 are three times those of the second feature map. Here, the 3×3 weights are called an inverse filter or inverse filter kernel. That is, applying an inverse filter in the transposed CNN 602 is the process of performing a deconvolution operation with an inverse filter kernel, and the extracted result is called a "reconstructed feature map". According to an embodiment, the reconstructed feature map may be generated by the transposed CNN 602, also known as the transposed convolutional layer of the CNN. The layers of a CNN define a mapping from input to output. The mapping defined by a layer is performed as one or more inverse filter kernels (transposed convolution kernels) applied to the input data to generate the reconstructed feature maps that are output to the next layer. The input data can be a set of second feature maps or the reconstructed feature map of a particular layer.
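A transposed convolution that triples the length and width can be sketched as follows. The stride of 3 is an assumption chosen so that a 3×3 kernel gives exactly three times the input size, as in the example above; the function name and values are likewise illustrative.

```python
def transposed_conv2d(feature_map, kernel, stride):
    """Scatter each input value, scaled by `kernel`, into the output grid."""
    kh, kw = len(kernel), len(kernel[0])
    in_h, in_w = len(feature_map), len(feature_map[0])
    out_h = (in_h - 1) * stride + kh
    out_w = (in_w - 1) * stride + kw
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(in_h):
        for j in range(in_w):
            for di in range(kh):
                for dj in range(kw):
                    # Overlapping contributions (if any) are accumulated
                    out[i * stride + di][j * stride + dj] += (
                        feature_map[i][j] * kernel[di][dj])
    return out

# Illustrative 2x2 feature map and a 3x3 kernel of ones
feature_map = [[1.0, 2.0], [3.0, 4.0]]
kernel = [[1.0] * 3 for _ in range(3)]
reconstructed = transposed_conv2d(feature_map, kernel, stride=3)
```

Each input value is spread over a 3×3 patch of the output, so the 2×2 input becomes a 6×6 reconstructed feature map.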

Referring to FIG. 6, the transposed CNN 602 receives a set of second feature maps 601 as input and generates a reconstructed feature map 603 as output. The next transposed CNN layer receives the reconstructed feature map 603 as input and generates the reconstructed feature map of the next layer as output. Each subsequent transposed CNN layer likewise receives the reconstructed feature map generated by the previous layer and generates the next reconstructed feature map as output. Finally, the first reconstructed image block 604 generated by the X2-th layer is obtained, where X2 is an integer greater than 1; that is, the reconstructed feature map of each of the above layers may be used as the first reconstructed image block 604. The terminal device repeats the above operations for each set of second feature maps, so as to obtain N first reconstructed image blocks.

Optionally, as the number of layers of the transposed CNN increases, the length and width of each feature map in the reconstructed feature map gradually increase until the size of the first image block before it was input to the encoding neural network is restored. The number of semantic channels of the reconstructed feature map gradually decreases until the number of channels of the first image block before it was input to the encoding neural network is restored. When the first image block is a single-channel image, the number of channels of the first reconstructed image block 604 is 1; when the first image block is a three-channel image, the number of channels of the first reconstructed image block 604 is 3. Through the above reconstruction, the data of the first image block is decoded.

Meanwhile, in addition to applying transposed convolution kernels that map each set of second feature maps to reconstructed feature maps, other processing operations can also be performed. Examples of other processing operations include, but are not limited to, applying activation functions, pooling, resampling, and the like.

For example, optionally, as shown in FIG. 3b, each transposed convolutional layer in the decoding neural network is followed by inverse generalized divisive normalization (iGDN), where iGDN is the approximate inverse of the GDN activation function at the encoding end. In its standard form, iGDN is expressed as:

$$u_j^{(i)} = v_j^{(i)}\sqrt{\beta_j + \sum_k \gamma_{jk}\left(v_k^{(i)}\right)^2}$$

where $v_j^{(i)}$ represents the j-th channel of the output of the i-th convolutional layer, $u_j^{(i)}$ represents the output of the corresponding activation function, and β and γ are the trainable parameters of the activation function, which are used to enhance the nonlinear expression ability of the neural network.

In step 311, the terminal device compensates the N first reconstructed image blocks by using the N first mean values.

In the above step 308, the cloud device sends the N first mean values and the corresponding relationship to the terminal device. After obtaining the N first reconstructed image blocks through the decoding neural network, the terminal device uses the corresponding relationship to compensate the N first reconstructed image blocks with the N first mean values. Compensation refers to adding the first mean value to the pixel value of each pixel in the corresponding first reconstructed image block to obtain a compensated first reconstructed image block. By repeating this compensation for each of the N first reconstructed image blocks, the terminal device obtains N compensated first reconstructed image blocks.

Optionally, when the terminal device receives N first quantized mean values from the cloud device, the terminal device compensates the N first reconstructed image blocks by using the N first quantized mean values. It should be understood that, in this case, the cloud device also preprocesses the N first image blocks by using the N first quantized mean values.
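The mean-based preprocessing and compensation can be sketched as follows. This is a minimal illustration rather than the implementation of the application: blocks are plain lists of rows, all function names are hypothetical, preprocessing is assumed to subtract the block mean so that compensation (adding it back) restores the original level, and quantization of the mean is assumed to be rounding to the nearest integer.

```python
def block_mean(block):
    """Return the mean pixel value of one image block (a list of rows)."""
    pixels = [p for row in block for p in row]
    return sum(pixels) / len(pixels)


def preprocess(block, mean):
    """Assumed preprocessing: subtract the block's mean from every pixel."""
    return [[p - mean for p in row] for row in block]


def compensate(block, mean):
    """Compensation as described above: add the mean to every pixel."""
    return [[p + mean for p in row] for row in block]


# Hypothetical 2x2 blocks standing in for the N first image blocks.
blocks = [[[10, 20], [30, 40]], [[200, 210], [220, 230]]]
means = [block_mean(b) for b in blocks]          # one mean per block
quantized_means = [round(m) for m in means]      # assumed quantization: rounding

# Encoder and decoder must use the same (quantized) mean for each block.
residuals = [preprocess(b, m) for b, m in zip(blocks, quantized_means)]
restored = [compensate(r, m) for r, m in zip(residuals, quantized_means)]
```

In this sketch no codec sits between `preprocess` and `compensate`, so `restored` equals `blocks`; in the described method the residuals would pass through the encoding and decoding neural networks first.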

In step 312, the terminal device combines the compensated N first reconstructed picture blocks to obtain a second image.

Referring to FIG. 4, combination is the inverse process of division: the N first image blocks are replaced with the N compensated first reconstructed image blocks, which are then combined.
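Division and its inverse can be sketched as below. This is only an illustrative sketch: the image is a list of rows, the function names are hypothetical, and the blocks are assumed to be stored in raster order (left to right, then top to bottom).

```python
def split_into_blocks(image, a, b):
    """Split an image into blocks of width a and height b, in raster order."""
    height, width = len(image), len(image[0])
    if width % a or height % b:
        raise ValueError("image size must be a multiple of the block size")
    blocks = []
    for top in range(0, height, b):
        for left in range(0, width, a):
            blocks.append([row[left:left + a] for row in image[top:top + b]])
    return blocks


def combine_blocks(blocks, width, height, a, b):
    """Inverse of split_into_blocks: place each block back at its position."""
    image = [[0] * width for _ in range(height)]
    blocks_per_row = width // a
    for index, block in enumerate(blocks):
        top = (index // blocks_per_row) * b
        left = (index % blocks_per_row) * a
        for dy, row in enumerate(block):
            image[top + dy][left:left + a] = row
    return image


# A hypothetical 6x4 image divided into six 2x2 blocks and put back together.
image = [[r * 6 + c for c in range(6)] for r in range(4)]
blocks = split_into_blocks(image, 2, 2)
rebuilt = combine_blocks(blocks, 6, 4, 2, 2)
```

Because the block order is preserved, the position of each block in the list is enough to place it back in the image without any extra side information.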

In step 313, the terminal device processes the second image through the fusion neural network to obtain a third image.

The embodiments of the present application enhance the performance of each first image block by highlighting its local characteristics, but this also easily causes blockiness between adjacent first reconstructed image blocks. Blockiness refers to a discontinuity at the boundary between adjacent first reconstructed image blocks, which forms a defect in the reconstructed image. By processing the second image through the fusion neural network, the influence of blockiness can be reduced and the image quality can be improved.

Optionally, the fusion neural network is a CNN. Please refer to Figure 5 and Figure 6. In terms of CNN structure, the fusion neural network can be a combination of the encoding neural network and the decoding neural network. By taking the output 504 in FIG. 5 as the input 601 in FIG. 6, and taking the second image as the input 501 in FIG. 5, the output of FIG. 6 is the third image. Through the fusion neural network, the blockiness in the second image can be eliminated. It should be noted that this is only a simple example of the framework of a fusion neural network. In practical applications, the framework of the fusion neural network, such as the number of CNN layers, the number of transposed CNN layers, and the size of the matrix of each CNN layer, may be independent of the encoding neural network and the decoding neural network.

Optionally, as shown in Figure 3b, each convolution kernel of the fusion neural network is followed by a rectified linear unit (ReLU) layer, which sets the negative values in the feature map output by the convolution kernel to zero.

The flow of processing an image by using the image processing method in the embodiment of the present application has been described above. Optionally, the image processing method in the embodiment of the present application can process images of different sizes, such as a fourth image whose pixels are different from the pixels of the first image. The process of processing the fourth image is similar to the process of processing the first image above, and details are not repeated here. In particular, when the fourth image is processed, the cloud device divides the fourth image to obtain M second image blocks, and the size of the second image block is the same as that of the first image block. When the sizes of the first image block and the second image block are the same, the same encoding neural network and decoding neural network are used to process them, and the number of convolution operations in the processing flow and the amount of data involved in each convolution operation are the same for the first image block and the second image block. In this case, a corresponding convolution operation unit can be designed according to the number of convolution operations and/or the amount of data involved in each convolution operation, so that the convolution operation unit matches the processing flow. Because the number of convolution operations in the processing flow and the amount of data involved in each convolution operation are determined by the size of the first image block and the CNN, it can also be considered that the convolution operation unit matches the first image block, or that the convolution operation unit matches the encoding neural network and/or the decoding neural network. The higher the matching degree between the convolution operation unit and the first image block, the smaller the number of idle multipliers and adders in the convolution operation unit during the processing flow, that is, the higher the usage efficiency of the convolution operation unit.

The image processing methods in the embodiments of the present application are described above. In the above process, the size of the first image block affects not only the value of N but also whether the image can be divided exactly into an integer number of image blocks. The size of the first image block is generally determined by the following two aspects. The first aspect is the influence of the model on the size of the first image block. The model includes an encoding neural network and a decoding neural network, and may also include a fusion neural network. The influence of the model generally includes the influence during training and the influence during use. The influence during training includes training the model with image blocks of different sizes to determine the size interval or size value at which the model converges faster, the quality of the image output by the model is higher, or the compression performance of the model is higher. Different models target image blocks of different sizes, and the performance of different models may differ across scenes, which is the generalization problem of the model. The influence during use includes this generalization problem. The second aspect is whether the image can be divided exactly into an integer number of image blocks. If it cannot, some image blocks will be incomplete, which affects the reconstruction of those image blocks by the model and reduces the quality of the image. In order to reduce the influence of the second aspect on the image quality, some related technical solutions are proposed below.

In the above step 302, the terminal device sends the first image to the cloud device. In this scenario, the encoding neural network in the cloud device may specifically serve this terminal device, or this type of terminal device. If the terminal device includes a camera component, such as a camera, it is desirable that the first image obtained by the terminal device through the camera component can be divided by the cloud device into an integer number of image blocks. Assume that the pixels of the first image block are a×b, where a is the number of pixels in the width direction and b is the number of pixels in the height direction. a and b are obtained according to a target pixel, the target pixel is c×d, c/a is an integer, and d/b is an integer. The target pixel is obtained according to the target resolution of the terminal device, and the target resolution is the default resolution of the camera component of the terminal device, or the resolution set for it by the terminal device. The pixels of the image obtained by the camera component of the terminal device under the target resolution are the target pixels, and the first image is obtained through the camera component. In particular, when the terminal device is the encoding end, as in the aforementioned first application scenario, it is more meaningful to determine the size of the first image block from the target resolution. Because the target resolution indicates the pixels of the images that the terminal device may acquire in the future, that is, the pixels of the images that the encoding end will process with the image processing method in the embodiment of this application, the model can be trained specifically for the target pixel.

Optionally, the target resolution is obtained from the resolution set for the camera component through the setting interface in the camera application. The setting interface of the camera application can set the resolution of the images obtained by the camera component, and the resolution selected in the setting interface is used as the target resolution. Please refer to FIG. 7, which is a schematic diagram of a setting interface for setting the resolution of the camera of a terminal device in an embodiment of the present application. In the setting interface, the option 701 with a resolution of [4:3] 10MP is selected. Although this option does not state the specific value of the target resolution, the first image obtained by shooting shows that the pixels of the first image are 2736×3648, that is, the target pixels are 2736×3648. Once the target pixel is determined, the size of the first image block is determined such that c/a is an integer and d/b is an integer.
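The divisibility condition on the block size can be checked with a few lines of code. The sketch below lists block widths and heights that divide the 2736×3648 target pixel from the example exactly; the function names and the search interval of 100 to 400 pixels are assumptions made only for illustration.

```python
def divisors(n):
    """Return all positive divisors of n in ascending order."""
    return [k for k in range(1, n + 1) if n % k == 0]


def candidate_block_sizes(c, d, smallest, largest):
    """List widths a with c/a an integer and heights b with d/b an integer."""
    widths = [a for a in divisors(c) if smallest <= a <= largest]
    heights = [b for b in divisors(d) if smallest <= b <= largest]
    return widths, heights


# Target pixel from the example above; the 100..400 interval is hypothetical.
widths, heights = candidate_block_sizes(2736, 3648, 100, 400)
```

Any pair (a, b) taken from `widths` and `heights` divides the target image into an integer number of blocks; which pair is best would still depend on the model, as discussed under the first aspect.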

Optionally, the target resolution is obtained according to a target image group in the gallery obtained through the camera component. The pixels of the target image group are the target pixels, and among the image groups of different pixels, the target image group accounts for the largest proportion of the gallery. The gallery obtained by the encoding end through the camera component includes image groups of different pixels. For example, as shown in FIG. 7, the terminal corresponding to the setting interface can obtain images of 4 kinds of pixels through the camera. By determining which kind of pixel accounts for the largest proportion of images in the gallery of the terminal device, it can be ensured that images of that pixel can be divided exactly into an integer number of blocks.
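Selecting the target pixel from the gallery amounts to finding the most frequent image size. A minimal sketch follows; the function name and the gallery contents are hypothetical and serve only to illustrate the selection.

```python
from collections import Counter


def target_pixel_from_gallery(image_sizes):
    """Return the most frequent (width, height) and its share of the gallery."""
    counts = Counter(image_sizes)
    size, count = counts.most_common(1)[0]
    return size, count / len(image_sizes)


# Hypothetical gallery: ten images in three different pixel sizes.
gallery = [(2736, 3648)] * 7 + [(2736, 2736)] * 2 + [(1920, 1080)]
target_pixel, share = target_pixel_from_gallery(gallery)
```

Here the 2736×3648 group dominates the gallery, so it would be used as the target pixel when choosing a and b.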

Optionally, images of multiple pixels are obtained through the camera component, the multiple pixels are e×f, e/a is an integer, and f/b is an integer, where e includes c and f includes d. The terminal device can obtain images of different pixels through the camera component, and e×f is the set of pixels of these images. For example, as shown in FIG. 7, the terminal corresponding to the setting interface can obtain images of 4 kinds of pixels through the camera. If e/a is an integer and f/b is an integer for each of them, the images of these 4 kinds of pixels can all be divided into an integer number of blocks.
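When several pixel sizes must all be divisible, a and b have to be common divisors of every width e and every height f. The sketch below computes the largest such block with the greatest common divisor; the function names and the second pixel size are hypothetical, since the sizes offered in FIG. 7 are not listed in the text.

```python
from functools import reduce
from math import gcd


def largest_common_block(sizes):
    """Return the largest (a, b) dividing every width e and every height f."""
    a = reduce(gcd, (e for e, _ in sizes))
    b = reduce(gcd, (f for _, f in sizes))
    return a, b


def divides_all(sizes, a, b):
    """Check that e/a and f/b are integers for every pixel size e x f."""
    return all(e % a == 0 and f % b == 0 for e, f in sizes)


# 2736x3648 is taken from the example above; 1920x2560 is a hypothetical size.
sizes = [(2736, 3648), (1920, 2560)]
a, b = largest_common_block(sizes)
```

Every divisor of the returned a and b is also a valid block dimension, so smaller blocks can be chosen from them if the model favors a smaller size.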

Optionally, e×f further includes pixels obtained by the terminal device by taking screenshots.

Optionally, the multiple pixels are obtained by setting the resolution of the camera component through a setting interface in the camera application.

The schemes for ensuring, as far as possible, that the first image can be divided into an integer number of blocks have been described above. In practical applications, however, there are always images that cannot be divided into an integer number of blocks. In this case, in order to improve the compatibility of the model, the first image needs to be filled, as described below.

Optionally, the pixels of the first image block are a×b, where a is the number of pixels in the width direction and b is the number of pixels in the height direction, and the pixels of the first image are r×t. After acquiring the first image and before dividing the first image, the method further includes: if r/a is not an integer and/or t/b is not an integer, filling the edge of the first image with the pixel median value so that r1/a is an integer and t1/b is an integer, where the pixels of the first image after filling are r1×t1. The pixel median value refers to the median of the value range of a single pixel of the first image; for example, when 8 bits are used to represent one pixel of the first image, the pixel median value is 128. Filling the edges of the image with the pixel median value improves the compatibility of the model with little impact on image quality. The image median value mentioned below is this pixel median value.
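Filling with the pixel median value can be sketched as follows. This is a minimal illustration under stated assumptions: the image is a list of rows, the function name is hypothetical, and the fill is placed on the right and bottom edges only, which is one of several possible placements.

```python
def pad_to_block_multiple(image, a, b, bits=8):
    """Fill the right and bottom edges so width and height become multiples of a and b."""
    median = 1 << (bits - 1)            # 128 when a pixel uses 8 bits
    t, r = len(image), len(image[0])    # height t, width r
    r1 = -(-r // a) * a                 # smallest multiple of a that is >= r
    t1 = -(-t // b) * b                 # smallest multiple of b that is >= t
    padded = [row + [median] * (r1 - r) for row in image]
    padded += [[median] * r1 for _ in range(t1 - t)]
    return padded


# Hypothetical 5x3 image filled up to 8x4 for 4x2 blocks.
image = [[0] * 5 for _ in range(3)]
padded = pad_to_block_multiple(image, 4, 2)
```

An image whose width and height are already multiples of a and b is returned unchanged, so the function can be applied to every input before division.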

Optionally, before filling the edge of the first image, the method further includes: if t/b is not an integer, proportionally enlarging r and t to obtain the first image whose pixels are r2×t2, where t2/b is an integer. The number of image blocks filled with the pixel median value affects the quality of the image. Proportionally enlarging the image reduces the number of image blocks that contain filled values and thereby improves image quality. FIG. 8 is a schematic flowchart of image filling in an embodiment of the present application. In 8a of FIG. 8, the pixels of the first image block are a×b, and the pixels of the first image are r×t. After r and t are proportionally enlarged, as shown in 8b of FIG. 8, the pixels of the first image are r2×t2. Before enlargement, as shown in 8a of FIG. 8, the number of image blocks to be filled is 6. After enlargement, as shown in 8b of FIG. 8, the number of image blocks to be filled is 4, so enlargement reduces the number of image blocks to be filled. In particular, if

not equal to an integer, scale r and t proportionally so that

Then the number of tiles to be filled is reduced to 2.

Optionally, after proportionally enlarging r and t, if

not equal to an integer, get

the remainder. If the remainder is greater than

then the pixel median value is filled on only one side of the first image in the width direction. Filling only one side of the image further reduces the number of image blocks filled with the image median value while limiting the impact of filling on the image blocks, which improves image quality. As shown in 8b of FIG. 8, the remainder g is greater than

As shown in 8c of Fig. 8, the pixel median is filled on one side of the first image.

Optionally, if the remainder is less than

where g is the remainder. This reduces the impact of filling on the image blocks and improves image quality. FIG. 9 is another schematic flowchart of image filling in an embodiment of the present application.

If the remainder g is less than

Then fill the image median value on both sides of the first image, and the width of the filled image median value is

The present application obtains N first image blocks by dividing the first image, obtains a respective mean value from each first image block, and then uses the mean value to compensate the corresponding first reconstructed image block, so as to highlight the local characteristics of the first image. In particular, the N first image blocks include a first target image block, and the range of pixel values of the first target image block is smaller than the range of pixel values of the first image. Before acquiring the N pieces of first adaptive data from the N first image blocks, the method further includes: the cloud device inversely quantizes the pixel values of the first target image block. The cloud device then obtains a piece of first adaptive data from the inversely quantized first target image block. Inversely quantizing the pixel values of the first target image block further enhances the local characteristics of the first image. Any first image block can be understood as a local characteristic of the first image, and highlighting the local characteristics of the first image improves the reconstruction quality of the image, that is, the compression quality of the image.

FIG. 10 is a schematic diagram comparing image compression quality in an embodiment of the present application. The abscissa represents the number of bits per pixel (BPP), which measures the code rate. The ordinate represents the peak signal-to-noise ratio (PSNR), which measures the quality. The compression algorithms compared with the image processing method in the embodiments of the present application include different implementations of the JPEG2000, HEVC (high efficiency video coding) and VVC (versatile video coding) standards. For JPEG2000, the reference software OpenJPEG is used to represent its compression performance, and the implementation integrated in Matlab is used as a supplement. For HEVC, the reference software HM-16.15 is used to reflect the rate-distortion (RD) performance. The performance of the VVC standard is represented by the VVC reference software VTM-6.2. It should be noted that in the encoding configuration of VTM-6.2, the input image bit depth and the internal bit depth are set to 8 to be compatible with the format of the input image, and the test images are encoded using the all intra (AI) configuration. The rate-distortion performance of the various compression algorithms is shown in FIG. 10: curve 1001 is OpenJPEG, curve 1002 is the Matlab implementation of JPEG2000, curve 1003 is 420 image format compression with the reference software HM-16.15, curve 1004 is the convolutional neural network image compression algorithm without block division, curve 1005 is the present application, and curve 1006 is 420 image format compression with the reference software VTM-6.2.
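The two axes of such a rate-distortion plot can be computed as sketched below. The functions use the standard definitions of PSNR and BPP; the function names and the sample numbers are hypothetical and are not taken from FIG. 10.

```python
import math


def psnr(original, reconstructed, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared_errors = [
        (o - r) ** 2
        for o_row, r_row in zip(original, reconstructed)
        for o, r in zip(o_row, r_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf                 # identical images
    return 10 * math.log10(max_value ** 2 / mse)


def bits_per_pixel(num_bytes, width, height):
    """Code rate: size of the compressed representation in bits per pixel."""
    return 8 * num_bytes / (width * height)


# Hypothetical example: a 100x80 image compressed to 1000 bytes.
rate = bits_per_pixel(1000, 100, 80)
```

Each point on a curve in such a plot pairs one `bits_per_pixel` value with the `psnr` of the corresponding reconstruction; a curve that lies higher at the same BPP indicates better compression performance.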

The image processing method in the embodiment of the present application is described above, and the image processing system in the embodiment of the present application is described below.

Please refer to FIG. 11. FIG. 11 is a system architecture diagram of an image processing system provided by an embodiment of the present application. In FIG. 11, the image processing system 200 includes an execution device 210, a training device 220, a database 230, a client device 240 and a data storage system 250. The execution device 210 includes a computing module 211.

The database 230 stores a first image set, and optionally, the database 230 further includes a fourth image set. The training device 220 generates a target model/rule 201 for processing the first image and/or the fourth image, and uses the first image and/or the fourth image in the database to iteratively train the target model/rule 201 to obtain a mature target model/rule 201. In this embodiment of the present application, the target model/rule 201 includes an encoding neural network and a decoding neural network. Optionally, the target model/rule 201 further includes a fusion neural network.

The encoding neural network and decoding neural network obtained through training by the training device 220 can be applied to different systems or devices, such as mobile phones, tablets, laptops, VR devices, monitoring systems, and so on. The execution device 210 may call data, code, etc. in the data storage system 250, and may also store data, instructions, etc. in the data storage system 250. The data storage system 250 may be placed in the execution device 210, or the data storage system 250 may be an external memory relative to the execution device 210.

The computing module 211 receives the first image sent by the client device 240, divides the first image to obtain N first image blocks, extracts N pieces of first adaptive data from the N first image blocks, and uses the N pieces of first adaptive data to preprocess the N first image blocks. It then performs feature extraction on the preprocessed N first image blocks through the encoding neural network to obtain N groups of first feature maps, and performs quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations, where N is an integer greater than 1.

The computing module 211 may further perform entropy decoding on the N first encoded representations to obtain N groups of second feature maps, and then process the N groups of second feature maps through the decoding neural network to obtain N first reconstructed image blocks. After the N first reconstructed image blocks are obtained, they are compensated by using the N pieces of first adaptive data. The computing module 211 combines the N compensated first reconstructed image blocks to obtain a second image. Optionally, when the target model/rule 201 further includes a fusion neural network, the computing module 211 may also use the fusion neural network to process the second image to obtain a third image. The fusion neural network is used to reduce the difference between the second image and the first image, where the difference includes blockiness.

In some embodiments of the present application, referring to FIG. 11, the execution device 210 and the terminal device 240 may be separate devices. The execution device 210 is configured with an I/O interface 212 for data interaction with the terminal device 240. A "user" may input the first image to the I/O interface 212 through the terminal device 240, and the execution device 210 returns the second image to the terminal device 240 through the I/O interface 212 to provide it to the user. In addition, the relationship between the terminal device 240 and the execution device 210 can be described in terms of the relationship between the terminal device and the encoding end and the decoding end. The encoding end is a device that uses an encoding neural network, and the decoding end is a device that uses a decoding neural network. The encoding end and the decoding end can be the same device or independent devices. The terminal device is similar to the terminal device in the above image processing method, and the terminal device may be an encoding end and/or a decoding end. To facilitate understanding of the relationship between the terminal device 240 and the execution device 210, reference may be made to the foregoing descriptions of FIGS. 2a-2c.

It is worth noting that FIG. 11 is only a schematic architecture diagram of an image processing system provided by an embodiment of the present application, and the positional relationship among the devices, components, modules, etc. shown in the figure does not constitute any limitation. For example, in other embodiments of the present application, the execution device 210 may be configured in the terminal device 240. As an example, when the terminal device is a mobile phone or a tablet, the execution device 210 may be a module in the main processor (Host CPU) of the mobile phone or tablet for performing array image processing. The execution device 210 may also be a graphics processing unit (GPU) or a neural network processor (NPU) in the mobile phone or tablet, where the GPU or NPU is mounted on the main processor as a coprocessor and the main processor assigns tasks to it.

With reference to the above description, the following begins to describe the specific implementation process of the training phase of the image processing method provided by the embodiment of the present application.

Specifically, please refer to FIG. 12. FIG. 12 is a schematic flowchart of a model training method provided by an embodiment of the present application. The model training method provided by the embodiment of the present application may include:

In step 1201, the training device acquires a first image.

In step 1202, the training device divides the first image to obtain N first image blocks, where N is an integer greater than 1.

In step 1203, the training device obtains N pieces of first adaptive data from the N first image blocks, and the N first adaptive data corresponds to the N first image blocks one-to-one.

In step 1204, the training device preprocesses the N first tiles according to the N first adaptive data.

In step 1205, the training device processes the preprocessed N first image blocks through the first coding neural network to obtain N groups of first feature maps.

In step 1206, the training device performs quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations.

In step 1207, the training device performs entropy decoding on the N first encoded representations to obtain N sets of second feature maps.

In step 1208, the training device processes the N groups of second feature maps through the first decoding neural network to obtain N first reconstructed image blocks.

In step 1209, the training device compensates the N first reconstructed tiles by using the N first adaptive data.

In step 1210, the training device combines the compensated N first reconstructed blocks to obtain a second image.

In step 1211, the training device obtains the distortion loss of the second image relative to the first image.

In step 1212, the training device uses the loss function to jointly train the model until the image distortion value between the first image and the second image reaches a first preset level, where the model includes the first encoding neural network, a quantization network, an entropy encoding network, an entropy decoding network, and the first decoding neural network.

Please refer to FIG. 13 , which is a schematic diagram of a training process provided by an embodiment of the present application. The loss function of the model in the embodiment is:

loss = l_d + P × l_r

In the above loss function, l_d represents the information entropy of the first encoded representation, and P × l_r represents the distortion metric between the first image and the second image, where l_r represents the distortion loss of the second image relative to the first image and P represents the balance factor between the two loss terms, which describes the relative relationship between the first encoded representation and the reconstructed image quality.
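The loss above can be evaluated as sketched below. This is a minimal sketch under stated assumptions: the function names are hypothetical, l_r is taken to be the mean squared error, and l_d is approximated by the empirical entropy of the quantized symbols, whereas a trained codec would typically estimate the entropy with a learned probability model.

```python
import math
from collections import Counter


def empirical_entropy_bits(symbols):
    """Total bits needed for the symbols under their empirical distribution."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c * math.log2(c / n) for c in counts.values())


def mean_squared_error(first, second):
    """Distortion between two equally long flat pixel lists (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(first, second)) / len(first)


def rate_distortion_loss(l_d, l_r, P):
    """loss = l_d + P * l_r, with P balancing rate against distortion."""
    return l_d + P * l_r


# Hypothetical quantized symbols and pixel values.
l_d = empirical_entropy_bits([0, 0, 1, 1])
l_r = mean_squared_error([10, 20, 30, 40], [12, 18, 31, 39])
loss = rate_distortion_loss(l_d, l_r, P=2)
```

A larger P makes distortion more expensive relative to rate, so training would favor higher reconstruction quality at the cost of a larger encoded representation.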

Optionally, in order to obtain an appropriate block size, the training process includes dividing the first image into blocks of different sizes in multiple iterations of training, that is, the values of N are different. The size of the first block is optimized by comparing the loss functions obtained from multiple iterations.

In step 1213, the training device outputs a second encoding neural network and a second decoding neural network, where the second encoding neural network is a model obtained by performing iterative training on the first encoding neural network, and the second decoding neural network is a model obtained by performing iterative training on the first decoding neural network.

For specific descriptions of steps 1201 to 1211, reference may be made to the descriptions in the above image processing method.

Optionally, the method further includes:

The training device quantizes the N pieces of first adaptive data to obtain N pieces of first adaptive quantization data, where the N pieces of first adaptive quantization data are used to compensate the N pieces of first reconstructed image blocks.

Optionally, the larger the N, the smaller the information entropy of the single first adaptive quantization data.

Optionally, the arrangement order of the N first encoded representations is the same as the arrangement order of the N first image blocks, and the arrangement order of the N first image blocks is the order in which the N first image blocks are arranged in the first image.

Optionally, the training device processes the second image through a fusion neural network to obtain a third image, so as to reduce the difference between the second image and the first image, where the difference includes blockiness;

The training device is specifically configured to obtain the distortion loss of the third image relative to the first image;

The model includes a fusion neural network.

Optionally, each of the N first tiles has the same size.

Optionally, in two iterations of training, the size of the first image used for training is different, and the size of the first image block is a fixed value.

Optionally, the pixels of the first image block are a×b, a and b are obtained according to a target pixel, the target pixel is c×d, c/a is an integer, and d/b is an integer, where the target pixel is obtained according to a target resolution of the terminal device.

Optionally, the target resolution is obtained by setting the resolution of the camera component according to a setting interface in the camera application.

Optionally, the target resolution is obtained according to a target image group in the gallery obtained through the camera component, the pixels of the target image group are the target pixels, and among the image groups of different pixels, the target image group accounts for the largest proportion of the gallery.

Optionally, images of multiple pixels are obtained through the camera component, the multiple pixels are e×f, e/a is an integer, and f/b is an integer, where e includes c and f includes d.

Optionally, the plurality of pixels are obtained by setting the resolution of the imaging component through a setting interface in the imaging application.

Optionally, the pixels of the first image block are a×b, where a is the number of pixels in the width direction and b is the number of pixels in the height direction, and the pixels of the first image are r×t;

if r/a is not an integer and/or t/b is not an integer, the training device fills the edge of the first image with the pixel median value so that r1/a is an integer and t1/b is an integer, where the pixels of the first image after filling are r1×t1.

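Optionally, after acquiring the first image and before filling the edge of the first image, the method further includes:

if t/b is not an integer, proportionally enlarging r and t to obtain the first image whose pixels are r2×t2, where t2/b is an integer;

the filling of the edge of the first image with the pixel median value if r/a is not an integer and/or t/b is not an integer includes: if r2/a is not an integer, filling the edge of the first image with the pixel median value.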

Optionally, after proportionally enlarging r and t, if

not equal to an integer, get

the remainder. If the remainder is greater than

Then the training device only fills the pixel median value on one side of the width direction of the first image.

Optionally, if the remainder is less than

Wherein, the g is the remainder.

Optionally, the N first tiles include a first target tile, and the range of pixel values of the first target tile is smaller than the range of pixel values of the first image;

The training device inversely quantizes the pixel value of the first target image block;

The training device is specifically configured to acquire a piece of first adaptive data from the inverse quantized first target image block.

On the basis of the embodiments corresponding to FIG. 1 to FIG. 13, in order to better implement the above solutions of the embodiments of the present application, related devices for implementing the above solutions are also provided below. Referring specifically to FIG. 14, FIG. 14 is a schematic structural diagram of an encoding apparatus 1400 provided by an embodiment of the present application. The encoding apparatus 1400 corresponds to an encoding end, and the encoding apparatus 1400 may be a terminal device or a cloud device. The encoding apparatus 1400 includes:

a first acquisition module 1401, configured to acquire a first image;

a segmentation module 1402, configured to segment the first image to obtain N first image blocks, where N is an integer greater than 1;

a second acquisition module 1403, configured to obtain N pieces of first adaptive data from the N first image blocks, where the N pieces of first adaptive data are in one-to-one correspondence with the N first image blocks;

a preprocessing module 1404, configured to preprocess the N first image blocks according to the N pieces of first adaptive data;

an encoding neural network module 1405, configured to process the preprocessed N first image blocks through the encoding neural network to obtain N groups of first feature maps;

a quantization and entropy encoding module 1406, configured to perform quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations.

Optionally, the encoding apparatus may also perform all or part of the operations performed by the cloud device in the embodiment corresponding to FIG. 3a.

The encoding device in the embodiment of the present application has been described above. On the basis of the embodiments corresponding to FIG. 1 to FIG. 13 , in order to better implement the above solutions of the embodiment of the present application, the following also provides a description of the encoding device in the embodiment of the present application. The decoding device is described. Referring specifically to FIG. 15, FIG. 15 is a schematic structural diagram of a decoding apparatus 1500 provided by an embodiment of the present application. The decoding apparatus 1500 corresponds to a decoding end, and the decoding apparatus 1500 may be a terminal device or a cloud device. The decoding apparatus 1500 includes:

an obtaining module 1501, configured to obtain N first coded representations, N pieces of first adaptive data, and a corresponding relationship, where the corresponding relationship includes the correspondence between the N pieces of first adaptive data and the N first coded representations, the N pieces of first adaptive data are in one-to-one correspondence with the N first coded representations, and N is an integer greater than 1;

an entropy decoding module 1502, configured to perform entropy decoding on the N first coded representations to obtain N groups of second feature maps;

a decoding neural network module 1503, configured to process the N groups of second feature maps to obtain N first reconstructed image blocks;

a compensation module 1504, configured to compensate the N first reconstructed image blocks by using the N pieces of first adaptive data; and

a combining module 1505, configured to combine the compensated N first reconstructed image blocks to obtain a second image.

Optionally, the decoding apparatus may also perform all or part of the operations performed by the terminal device in the embodiment corresponding to FIG. 3a.
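The decoder-side flow through modules 1501 to 1505 can be sketched in the same spirit. The following is only an illustration under stated assumptions: entropy decoding is omitted, the decoding neural network is an identity placeholder, compensation is assumed to add the adaptive data (here a per-block scalar) back to each reconstructed block, and blocks are assumed to be stored in row-major order. The names `decode_blocks`, `blocks_per_row`, and `step` are hypothetical.

```python
import numpy as np

def decode_blocks(quantized, adaptive, blocks_per_row, step=4.0):
    """Rebuild an image from quantized feature blocks and per-block adaptive data."""
    # Dequantize; entropy decoding is left out of this sketch.
    features = [q.astype(np.float64) * step for q in quantized]
    # Placeholder for the decoding neural network (identity mapping here).
    reconstructed = features
    # Assumption: compensation adds each block's adaptive data back.
    compensated = [blk + m for blk, m in zip(reconstructed, adaptive)]
    # Combine the blocks row by row, following their arrangement order.
    rows = [np.hstack(compensated[i:i + blocks_per_row])
            for i in range(0, len(compensated), blocks_per_row)]
    return np.vstack(rows)

quantized = [np.zeros((2, 2), dtype=np.int32) for _ in range(4)]
second_image = decode_blocks(quantized, [1.0, 2.0, 3.0, 4.0], blocks_per_row=2)
```

In this toy call every feature block is zero, so each quadrant of `second_image` simply takes the value of its own adaptive data, which makes the per-block nature of the compensation visible.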

The decoding apparatus in the embodiments of the present application has been described above. On the basis of the embodiments corresponding to FIG. 1 to FIG. 13, in order to better implement the above solutions of the embodiments of the present application, a training apparatus in the embodiments of the present application is described below. Referring specifically to FIG. 16, FIG. 16 is a schematic structural diagram of a training apparatus 1600 provided by an embodiment of the present application. The training apparatus 1600 includes:

The first acquisition module 1601 is used to acquire a first image.

A segmentation module 1602, configured to segment the first image to obtain N first image blocks, where N is an integer greater than 1.

The second obtaining module 1603 is configured to obtain N pieces of first adaptive data from the N first image blocks, where the N pieces of first adaptive data are in one-to-one correspondence with the N first image blocks.

A preprocessing module 1604, configured to preprocess the N first image blocks according to the N first adaptive data;

The first coding neural network module 1605 is configured to process the pre-processed N first image blocks to obtain N groups of first feature maps.

A quantization and entropy encoding module 1606, configured to perform quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations.

The entropy decoding module 1607 performs entropy decoding on the N first encoded representations to obtain N sets of second feature maps.

The first decoding neural network module 1608 is configured to process the N groups of second feature maps to obtain N first reconstructed image blocks.

Compensation module 1609, configured to compensate the N first reconstructed image blocks by using the N first adaptive data.

The combining module 1610 is configured to combine the N first reconstructed image blocks after compensation to obtain a second image.

The third acquiring module 1611 is configured to acquire the distortion loss of the second image relative to the first image.

A training module 1612, configured to jointly train a model by using a loss function until the image distortion value between the first image and the second image reaches a first preset level, where the model includes the first coding neural network, a quantization network, an entropy encoding network, an entropy decoding network, and the first decoding neural network. Optionally, the model further includes a segmentation network, and a trainable parameter in the segmentation network is the size of the first image block.

The output module 1613 is configured to output a second coding neural network and a second decoding neural network, where the second coding neural network is a model obtained by performing iterative training on the first coding neural network, and the second decoding neural network is a model obtained by performing iterative training on the first decoding neural network.

Optionally, the training apparatus is further configured to perform all or part of the operations performed by the terminal device and/or the cloud device in the embodiment corresponding to FIG. 3a.
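The stopping condition used by the training module, namely training until the distortion between the first image and the second image reaches a preset level, can be sketched as follows. This is a minimal illustration and not the training procedure of the application: mean squared error is assumed as the distortion measure, and `step_fn` is a hypothetical callback that stands in for one joint update of the model and returns the current second image.

```python
import numpy as np

def mse_distortion(first_image, second_image):
    """Assumed distortion measure: mean squared error between the two images."""
    diff = first_image.astype(np.float64) - second_image.astype(np.float64)
    return float(np.mean(diff ** 2))

def train_until(step_fn, first_image, preset_level, max_iters=1000):
    """Call step_fn until the distortion is at or below preset_level."""
    loss = float("inf")
    for iteration in range(1, max_iters + 1):
        second_image = step_fn()  # one (hypothetical) joint training update
        loss = mse_distortion(first_image, second_image)
        if loss <= preset_level:
            return iteration, loss
    return max_iters, loss

# Toy stand-in for a model: each step moves the reconstruction halfway to the target.
first_image = np.full((2, 2), 8.0)
state = {"recon": np.zeros((2, 2))}

def toy_step():
    state["recon"] = state["recon"] + 0.5 * (first_image - state["recon"])
    return state["recon"]

iterations, final_loss = train_until(toy_step, first_image, preset_level=1.0)
```

The toy reconstruction takes the values 4, 6, and 7 over the first three steps, so the distortion falls from 16 to 4 to 1 and the loop stops at the third iteration.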

The device also includes:

The second obtaining module 1603 is specifically configured to obtain a first adaptive data from the inverse quantized first target image block.

Next, an execution device provided by an embodiment of the present application is introduced. Referring to FIG. 17, FIG. 17 is a schematic structural diagram of the execution device provided by an embodiment of the present application. The execution device 1700 may specifically be a virtual reality (VR) device, a mobile phone, a tablet, a laptop, a smart wearable device, a monitoring data processing device, a server, or the like, which is not limited here. The encoding apparatus described in the embodiment corresponding to FIG. 14 and/or the decoding apparatus described in the embodiment corresponding to FIG. 15 may be deployed on the execution device 1700 to implement the functions of the apparatus in the embodiment corresponding to FIG. 14 and/or FIG. 15. Specifically, the execution device 1700 includes a receiver 1701, a transmitter 1702, a processor 1703, and a memory 1704 (the number of processors 1703 in the execution device 1700 may be one or more, and one processor is taken as an example in FIG. 17), where the processor 1703 may include an application processor 17031 and a communication processor 17032. In some embodiments of the present application, the receiver 1701, the transmitter 1702, the processor 1703, and the memory 1704 may be connected by a bus or in other ways.

Memory 1704 may include read-only memory and random access memory, and provides instructions and data to processor 1703 . A portion of memory 1704 may also include non-volatile random access memory (NVRAM). The memory 1704 stores processors and operating instructions, executable modules or data structures, or a subset thereof, or an extended set thereof, wherein the operating instructions may include various operating instructions for implementing various operations.

The processor 1703 controls the operation of the execution device. In a specific application, various components of the execution device are coupled together through a bus system, where the bus system may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus. However, for the sake of clarity, the various buses are referred to as bus systems in the figures.

The methods disclosed in the above embodiments of the present application may be applied to the processor 1703 or implemented by the processor 1703. The processor 1703 may be an integrated circuit chip with signal processing capability. In the implementation process, each step of the above method can be completed by an integrated logic circuit of hardware in the processor 1703 or by instructions in the form of software. The processor 1703 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 1703 may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software modules may be located in a storage medium that is mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory 1704, and the processor 1703 reads the information in the memory 1704 and completes the steps of the above method in combination with its hardware.

The receiver 1701 can be used to receive input numerical or character information, and to generate signal input related to the relevant settings and function control of the execution device. The transmitter 1702 can be used to output numerical or character information through the first interface; the transmitter 1702 can also be used to send instructions to the disk group through the first interface to modify the data in the disk group; the transmitter 1702 can also include a display device such as a display screen.

In this embodiment of the present application, in one case, the processor 1703 is configured to perform the operations performed by the terminal device and/or the cloud device in the embodiment corresponding to FIG. 3a.

Optionally, the application processor 17031 is configured to acquire the first image;

preprocessing the N first image blocks according to the N first adaptive data;

Perform quantization and entropy encoding on N groups of first feature maps to obtain N first encoded representations;

Besides, the application processor 17031 can also be used to perform all or part of the operations that can be performed by the cloud device in the embodiment corresponding to FIG. 3a.

Optionally, the application processor 17031 is configured to obtain N first encoded representations;

performing entropy decoding on the N first encoded representations to obtain N sets of second feature maps;

Process N groups of second feature maps through a decoding neural network to obtain N first reconstructed image blocks;

combining the compensated N first reconstructed blocks to obtain a second image;

Besides, the application processor 17031 can also be used to perform all or part of the operations that can be performed by the terminal device in the embodiment corresponding to FIG. 3a.

An embodiment of the present application also provides a training device. Referring to FIG. 18, FIG. 18 is a schematic structural diagram of the training device provided by an embodiment of the present application. The training apparatus described in the embodiment corresponding to FIG. 16 may be deployed on the training device 1800 to implement the functions of the training apparatus in the embodiment corresponding to FIG. 16. Specifically, the training device 1800 is implemented by one or more servers. The training device 1800 may vary greatly depending on configuration or performance, and may include one or more central processing units (CPUs) 1822 (e.g., one or more processors), a memory 1832, and one or more storage media 1830 (e.g., one or more mass storage devices) that store applications 1842 or data 1844. The memory 1832 and the storage medium 1830 may be short-term storage or persistent storage. The program stored in the storage medium 1830 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the training device. Further, the central processing unit 1822 may be configured to communicate with the storage medium 1830 and to execute, on the training device 1800, the series of instruction operations in the storage medium 1830.

Training device 1800 may also include one or more power supplies 1826, one or more wired or wireless network interfaces 1850, one or more input and output interfaces 1858, and/or one or more operating systems 1841, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.

In this embodiment of the present application, the central processing unit 1822 is configured to perform all or part of the operations performed by the training device in the embodiment corresponding to FIG. 16 .

Embodiments of the present application also provide a computer program product that, when running on a computer, causes the computer to execute the steps performed by the execution device in the method described in the foregoing embodiment shown in FIG. 17, or causes the computer to execute the steps performed by the training device in the method described in the foregoing embodiment shown in FIG. 18.

Embodiments of the present application further provide a computer-readable storage medium, where a program for performing signal processing is stored in the computer-readable storage medium, and when the program runs on a computer, it causes the computer to execute the steps performed by the execution device in the method described in the embodiment shown in FIG. 17, or causes the computer to execute the steps performed by the training device in the method described in the embodiment shown in FIG. 18.

The execution device and the training device provided by the embodiments of the present application may specifically be a chip. The chip includes a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit can execute the computer-executable instructions stored in a storage unit, so that the chip in the execution device executes the operations performed by the terminal device and/or the cloud device described in the embodiment shown in FIG. 3a, or so that the chip in the training device executes the model training method described in the embodiment shown in FIG. 13 above. Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).

Specifically, referring to FIG. 19, FIG. 19 is a schematic structural diagram of a chip provided by an embodiment of the present application. The chip may be represented as a neural network processor (NPU) 2000. The NPU 2000 is mounted as a coprocessor on the host CPU (Host CPU), and tasks are allocated by the Host CPU. The core part of the NPU is the arithmetic circuit 2003; the controller 2004 controls the arithmetic circuit 2003 to extract matrix data from the memory and perform multiplication operations.

In some implementations, the arithmetic circuit 2003 includes multiple processing units (Process Engine, PE). In some implementations, the arithmetic circuit 2003 is a two-dimensional systolic array. The arithmetic circuit 2003 may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 2003 is a general-purpose matrix processor.

For example, suppose there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 2002 and buffers it on each PE in the arithmetic circuit. The arithmetic circuit fetches the data of matrix A from the input memory 2001, performs a matrix operation on it with matrix B, and stores the partial result or final result of the matrix in an accumulator 2008.
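The role of the accumulator in this example can be illustrated with a short software sketch. It is only an analogy for the data flow, not a model of the hardware: the function name `matmul_with_accumulator` and the `tile` parameter are hypothetical, and the tiling along the inner dimension is an assumption chosen to show how partial results build up into the final matrix C.

```python
import numpy as np

def matmul_with_accumulator(A, B, tile=2):
    """Compute C = A @ B by summing partial products over slices of the inner dimension."""
    m, k = A.shape
    k_b, n = B.shape
    assert k == k_b, "inner dimensions must match"
    # Plays the role of accumulator 2008: holds partial results, then the final result.
    acc = np.zeros((m, n), dtype=np.float64)
    for start in range(0, k, tile):
        a_slice = A[:, start:start + tile]   # slice of the input matrix A
        b_slice = B[start:start + tile, :]   # matching slice of the weight matrix B
        acc += a_slice @ b_slice             # add this partial result
    return acc

A = np.arange(6).reshape(2, 3)
B = np.arange(12).reshape(3, 4)
C = matmul_with_accumulator(A, B)
```

After the last slice has been processed, the accumulator holds the same values as a direct product of A and B.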

Unified memory 2006 is used to store input data and output data. The weight data is transferred directly to the weight memory 2002 through the direct memory access controller (DMAC) 2005. Input data is also transferred to unified memory 2006 via the DMAC.

The bus interface unit (BIU) 2010 is used for the interaction among the AXI bus, the DMAC, and the instruction fetch buffer (IFB) 2009.

The bus interface unit 2010 is used by the instruction fetch buffer 2009 to obtain instructions from the external memory, and is also used by the DMAC 2005 to obtain the original data of the input matrix A or the weight matrix B from the external memory.

The DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 2006 , the weight data to the weight memory 2002 , or the input data to the input memory 2001 .

The vector calculation unit 2007 includes a plurality of operation processing units and, if necessary, further processes the output of the arithmetic circuit, for example by vector multiplication, vector addition, exponential operation, logarithmic operation, or size comparison. It is mainly used for non-convolutional/fully connected layer network computation in neural networks, such as Batch Normalization, pixel-level summation, and upsampling of feature planes.

In some implementations, the vector calculation unit 2007 can store the processed output vectors in the unified memory 2006. For example, the vector calculation unit 2007 may apply a linear function and/or a nonlinear function to the output of the arithmetic circuit 2003, for example linearly interpolating the feature plane extracted by a convolutional layer, or applying the function to a vector of accumulated values to generate activation values. In some implementations, the vector calculation unit 2007 generates normalized values, pixel-level summed values, or both. In some implementations, the vector of processed outputs can be used as activation input to the arithmetic circuit 2003, e.g., for use in subsequent layers in a neural network.
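As a rough software analogy for this post-processing, the sketch below normalizes a matrix of accumulated values and then applies a nonlinear function to produce activation values. The specific choices are assumptions for illustration only: the normalization is a plain per-column batch normalization without learned scale or shift, the nonlinearity is ReLU, and the name `vector_postprocess` is hypothetical.

```python
import numpy as np

def vector_postprocess(accumulated, eps=1e-5):
    """Normalize accumulated values per column, then apply ReLU."""
    mean = accumulated.mean(axis=0, keepdims=True)
    var = accumulated.var(axis=0, keepdims=True)
    normalized = (accumulated - mean) / np.sqrt(var + eps)
    # Assumption: ReLU stands in for the nonlinear function mentioned above.
    return np.maximum(normalized, 0.0)

accumulated = np.array([[1.0, 2.0],
                        [3.0, 6.0]])
activations = vector_postprocess(accumulated)
```

In this small input the first row lies below the column means and is clipped to zero, while the second row lies above them and yields positive activation values.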

The instruction fetch buffer 2009 connected to the controller 2004 is used to store the instructions used by the controller 2004.

Unified memory 2006, input memory 2001, weight memory 2002 and instruction fetch memory 2009 are all On-Chip memories. External memory is private to the NPU hardware architecture.

Wherein, the processor mentioned in any one of the above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the program of the method in the first aspect.

In addition, it should be noted that the device embodiments described above are only schematic. The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they can be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment. In addition, in the drawings of the device embodiments provided in the present application, the connection relationship between the modules indicates that there is a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.

From the description of the above embodiments, those skilled in the art can clearly understand that the present application can be implemented by means of software plus necessary general-purpose hardware, and can of course also be implemented by dedicated hardware such as special components. Under normal circumstances, all functions completed by a computer program can be easily implemented by corresponding hardware, and the specific hardware structures used to implement the same function can also be various, such as analog circuits, digital circuits, or dedicated circuits. However, for the present application, a software program implementation is the better implementation in many cases. Based on this understanding, the technical solutions of the present application, in essence or in the parts that contribute to the prior art, can be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk, and includes several instructions to cause a computer device (which may be a personal computer, a training device, a network device, or the like) to execute the methods described in the various embodiments of the present application.

In the above-mentioned embodiments, it may be implemented in whole or in part by software, hardware, firmware or any combination thereof. When implemented in software, it can be implemented in whole or in part in the form of a computer program product.

The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of the present application are generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that a computer can store, or a data storage device, such as a training device or a data center, that integrates one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVD), or semiconductor media (e.g., solid state disk (SSD)), and the like.

Claims

An image processing method, comprising:

acquiring a first image;

segmenting the first image to obtain N first image blocks, where N is an integer greater than 1;

obtaining N pieces of first adaptive data from the N first image blocks, where the N pieces of first adaptive data are in one-to-one correspondence with the N first image blocks;

preprocessing the N first image blocks according to the N pieces of first adaptive data;

processing the preprocessed N first image blocks through a coding neural network to obtain N groups of first feature maps; and

performing quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations.
The method according to claim 1, wherein the N first encoded representations are used for entropy decoding to obtain N groups of second feature maps, the N groups of second feature maps are used for processing by a decoding neural network to obtain N first reconstructed image blocks, the N pieces of first adaptive data are used to compensate the N first reconstructed image blocks, and the compensated N first reconstructed image blocks are used for combination into a second image.
The method according to claim 1 or 2, wherein the method further comprises:

sending the N first coded representations, the N pieces of first adaptive data, and a corresponding relationship to the decoding end, where the corresponding relationship includes the correspondence between the N pieces of first adaptive data and the N first coded representations.
The method according to claim 3, wherein the method further comprises:

Quantizing the N first adaptive data to obtain N first adaptive quantization data, where the N first adaptive quantization data is used to compensate the N first reconstructed image blocks;

the sending of the N pieces of first adaptive data to the decoding end includes:

sending the N first adaptive quantization data to the decoding end.
The method according to claim 4, wherein the larger the N is, the smaller the information entropy of the single first adaptive quantization data is.
The method according to any one of claims 3 to 5, wherein the arrangement order of the N first coded representations is the same as the arrangement order of the N first image blocks, the arrangement order of the N first image blocks is the arrangement order of the N first image blocks in the first image, and the corresponding relationship includes the arrangement order of the N first coded representations and the arrangement order of the N first image blocks.
The method according to any one of claims 1 to 6, wherein each of the N first image blocks has the same size.
The method according to claim 7, wherein when the method is used to segment the first images of different sizes, the size of the first image block is a fixed value.
The method according to any one of claims 1 to 8, wherein the pixels of the first image block are a×b, the a and the b are obtained according to a target pixel, the target pixel is c×d, c/a is equal to an integer, d/b is equal to an integer, the a and the c are the numbers of pixels in the width direction, the b and the d are the numbers of pixels in the height direction, the target pixel is obtained according to a target resolution of a terminal device, the terminal device includes an imaging component, the pixels of an image obtained by the imaging component under the setting of the target resolution are the target pixel, and the first image is obtained by the imaging component.
The method according to claim 9, wherein the target resolution is obtained by setting the resolution of the imaging component through a setting interface in a camera application.
The method according to claim 9, wherein the target resolution is obtained according to a target image group in a gallery obtained by the imaging component, the pixels of the target image group are the target pixel, and among the image groups of different pixels in the gallery, the target image group accounts for the largest proportion.
The method according to any one of claims 1 to 8, wherein the pixels of the first image block are a×b, the a is the number of pixels in the width direction, the b is the number of pixels in the height direction, and the pixels of the first image are r×t;

after acquiring the first image, and before segmenting the first image, the method further includes:

if r/a is not equal to an integer, and/or t/b is not equal to an integer, filling the edges of the first image with the pixel median such that r1/a is equal to an integer and t1/b is equal to an integer, where the pixels of the first image after filling are r1×t1.
The method according to claim 12, characterized in that, after acquiring the first image, before filling the edge of the first image, the method further comprises:

if said
is not equal to an integer, then the r and the t are proportionally enlarged to obtain the first image whose pixels are r2×t2, and the
is equal to an integer;

said if
not equal to an integer, and/or
is not equal to an integer, then filling the edge of the first image with the median value of pixels includes:

like
not equal to an integer, then fill the edges of the first image with the pixel median.
The method according to any one of claims 1 to 13, wherein the N first image blocks include a first target image block, and the range of pixel values of the first target image block is smaller than the range of pixel values of the first image;

before obtaining the N pieces of first adaptive data from the N first image blocks, the method further includes:

inversely quantizing the pixel values of the first target image block;

the obtaining of the N pieces of first adaptive data from the N first image blocks includes:

obtaining one piece of first adaptive data from the inversely quantized first target image block.
An image processing method, comprising:

obtaining N first coded representations, N pieces of first adaptive data, and a corresponding relationship, where the corresponding relationship includes the correspondence between the N pieces of first adaptive data and the N first coded representations, the N pieces of first adaptive data are in one-to-one correspondence with the N first coded representations, and N is an integer greater than 1;

Entropy decoding is performed on the N first encoded representations to obtain N groups of second feature maps;

The N groups of second feature maps are processed by a decoding neural network to obtain N first reconstructed image blocks;

compensating the N first reconstructed image blocks by the N first adaptive data;

The compensated N first reconstructed image blocks are combined to obtain a second image.
The method according to claim 15, wherein the N first encoded representations are obtained by quantization and entropy encoding of N groups of first feature maps, the N groups of first feature maps are obtained by processing the preprocessed N first image blocks through a coding neural network, the preprocessed N first image blocks are obtained by preprocessing N first image blocks through the N pieces of first adaptive data, the N pieces of first adaptive data are obtained from the N first image blocks, and the N first image blocks are obtained by dividing a first image.
The method according to claim 15 or 16, wherein the larger the N, the smaller the information entropy of the single first adaptive quantization data.
The method according to any one of claims 15 to 17, wherein the method further comprises:

The second image is processed by a fusion neural network to obtain a third image to reduce the difference between the second image and the first image, the difference including blockiness.
An encoding device, comprising:

a first acquisition module, configured to acquire a first image;

a segmentation module, configured to segment the first image to obtain N first image blocks, where N is an integer greater than 1;

a second obtaining module, configured to obtain N pieces of first adaptive data from the N first image blocks, and the N first adaptive data are in one-to-one correspondence with the N first image blocks;

a preprocessing module, configured to preprocess the N first image blocks according to the N first adaptive data;

a coding neural network module, configured to process the preprocessed N first image blocks to obtain N groups of first feature maps;

A quantization and entropy encoding module, configured to perform quantization and entropy encoding on the N groups of first feature maps to obtain N first encoded representations.
The device according to claim 19, wherein the N first encoded representations are used for entropy decoding to obtain N groups of second feature maps, the N groups of second feature maps are used for processing through a decoding neural network to obtain N first reconstructed image blocks, the N pieces of first adaptive data are used to compensate the N first reconstructed image blocks, and the compensated N first reconstructed image blocks are used for combination into a second image.
The device according to claim 19 or 20, wherein the device further comprises:

a sending module, configured to send the N first coded representations, the N pieces of first adaptive data, and a corresponding relationship to the decoding end, where the corresponding relationship includes the correspondence between the N pieces of first adaptive data and the N first coded representations.
The apparatus of claim 21, wherein the apparatus further comprises:

a quantization module, configured to quantize the N pieces of first adaptive data to obtain N pieces of first adaptive quantization data, where the N pieces of first adaptive quantization data are used to compensate the N first reconstructed image blocks;

The sending module is specifically configured to send the N pieces of first adaptive quantization data to the decoding end.
The device according to claim 22, wherein the larger N is, the smaller the information entropy of a single piece of first adaptive quantization data.
The device according to any one of claims 21 to 23, wherein the arrangement order of the N first encoded representations is the same as the arrangement order of the N first image blocks, the arrangement order of the N first image blocks is the arrangement order of the N first image blocks in the first image, and the correspondence comprises the arrangement order of the N first encoded representations and the arrangement order of the N first image blocks.
The device according to any one of claims 19 to 24, wherein each of the N first image blocks has the same size.
The device according to claim 25, wherein, when the device is used to process first images of different sizes, the size of the first image block is a fixed value.
A decoding device, comprising:

an acquisition module, configured to acquire N first encoded representations, N pieces of first adaptive data and a correspondence, wherein the correspondence comprises the correspondence between the N pieces of first adaptive data and the N first encoded representations, the N pieces of first adaptive data are in one-to-one correspondence with the N first encoded representations, and N is an integer greater than 1;

an entropy decoding module, configured to perform entropy decoding on the N first encoded representations to obtain N groups of second feature maps;

a decoding neural network module, configured to process the N groups of second feature maps to obtain N first reconstructed image blocks;

a compensation module, configured to compensate the N first reconstructed image blocks by using the N pieces of first adaptive data; and

a combining module, configured to combine the compensated N first reconstructed image blocks to obtain a second image.
The device according to claim 27, wherein the N first encoded representations are obtained by performing quantization and entropy encoding on N groups of first feature maps, the N groups of first feature maps are obtained by processing preprocessed N first image blocks through an encoding neural network, the preprocessed N first image blocks are obtained by preprocessing N first image blocks according to the N pieces of first adaptive data, the N pieces of first adaptive data are obtained from the N first image blocks, and the N first image blocks are obtained by segmenting a first image.
The device according to claim 27 or 28, wherein the larger N is, the smaller the information entropy of a single piece of first adaptive quantization data.
The device according to any one of claims 27 to 29, wherein the device further comprises:

a fusion neural network module, configured to process the second image to obtain a third image, so as to reduce the difference between the second image and the first image, wherein the difference includes blocking artifacts.
An image processing device, comprising a non-volatile memory and a processor coupled to each other, wherein the processor calls program code stored in the memory to perform the method according to any one of claims 1 to 18.
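
The data flow recited for the encoding device and the decoding device above can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes, purely for illustration, that the adaptive data of a block is its rounded mean pixel value, that preprocessing subtracts that value, and that compensation adds it back. The encoding and decoding neural networks and the entropy coder are replaced by a plain uniform quantizer with a hypothetical step size `step`. All function names are hypothetical, and the image dimensions are assumed to be multiples of the block size.

```python
from statistics import fmean


def split_blocks(image, size):
    """Split a 2-D list into size x size blocks, in raster order."""
    h, w = len(image), len(image[0])
    return [
        [row[x:x + size] for row in image[y:y + size]]
        for y in range(0, h, size)
        for x in range(0, w, size)
    ]


def encode(image, size, step=4):
    """Return (codes, adaptive); list position links each code to its adaptive value."""
    blocks = split_blocks(image, size)
    # Assumption: adaptive data = rounded block mean (already quantized to an integer).
    adaptive = [round(fmean(v for row in b for v in row)) for b in blocks]
    # Preprocess (subtract the mean), then a uniform quantizer stands in for
    # the encoding network, quantization and entropy encoding.
    codes = [
        [[round((v - m) / step) for v in row] for row in b]
        for b, m in zip(blocks, adaptive)
    ]
    return codes, adaptive


def decode(codes, adaptive, width, size, step=4):
    """Reconstruct blocks, compensate with the adaptive data, and combine."""
    per_row = width // size
    # Dequantization stands in for entropy decoding and the decoding network;
    # adding m back is the compensation step.
    blocks = [
        [[q * step + m for q in row] for row in c]
        for c, m in zip(codes, adaptive)
    ]
    image = []
    for r in range(0, len(blocks), per_row):
        band = blocks[r:r + per_row]
        for y in range(size):
            image.append([v for b in band for v in b[y]])
    return image


first_image = [
    [10, 12, 200, 204],
    [14, 16, 208, 212],
    [50, 50, 90, 94],
    [50, 50, 98, 102],
]
codes, adaptive = encode(first_image, size=2)
second_image = decode(codes, adaptive, width=4, size=2)
```

In this sketch the correspondence between the adaptive data and the encoded representations is carried only by their shared arrangement order, and the block size stays fixed regardless of the image size. The optional fusion neural network that reduces blocking artifacts is not modelled.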