CN114066914A - Image processing method and related equipment - Google Patents

Image processing method and related equipment

Info

Publication number
CN114066914A
Authority
CN
China
Prior art keywords
image
tiles
data
neural network
integer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010754333.9A
Other languages
Chinese (zh)
Inventor
赵政辉
马思伟
王晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Huawei Technologies Co Ltd
Original Assignee
Peking University
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University, Huawei Technologies Co Ltd
Priority to CN202010754333.9A
Priority to PCT/CN2021/101807
Publication of CN114066914A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4023Scaling of whole images or parts thereof, e.g. expanding or contracting based on decimating pixels or lines of pixels; based on inserting pixels or lines of pixels
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4092Image resolution transcoding, e.g. by using client-server architectures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/60Image enhancement or restoration using machine learning, e.g. neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/90Dynamic range modification of images or parts thereof
    • G06T5/94Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Compression Of Band Width Or Redundancy In Fax (AREA)

Abstract

The application relates to the field of artificial intelligence and discloses an image processing method, which comprises the following steps: acquiring a first image; segmenting the first image to obtain N first image blocks; acquiring N first adaptive data from the N first image blocks, where the N first adaptive data correspond one-to-one to the N first image blocks; preprocessing the N first image blocks according to the N first adaptive data; processing the N preprocessed first image blocks through an encoding neural network to obtain N groups of first feature maps; and quantizing and entropy encoding the N groups of first feature maps to obtain N first encoded representations. By extracting multiple pieces of adaptive information, multiple reconstructed image blocks can be compensated with that information, so that local characteristics are highlighted and the image quality of the second image is improved.

Description

Image processing method and related equipment
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to an image processing method and related device.
Background
Multimedia data now accounts for the vast majority of internet traffic. Compressing image data is crucial for the storage and efficient transmission of multimedia data, which makes image coding a technology of great practical value.
Research on image coding has a long history; researchers have proposed a large number of methods and established various international standards such as JPEG, JPEG2000, WebP, and BPG. Although widely used, these conventional methods show certain limitations in the face of the ever-increasing amount of image data and continually emerging new media types. In recent years, researchers have begun to study deep learning-based image coding methods, and some have achieved good results. For example, Ballé et al. proposed an end-to-end optimized image coding method whose performance surpasses that of the best current conventional coding standard, BPG. Deep learning image coding is a lossy image coding technology, and its general flow is as follows: the encoding end extracts adaptive data from the image, preprocesses the image using the adaptive data, and encodes the preprocessed image with an encoding neural network to obtain compressed data; the decoding end decodes the compressed data to obtain an image similar to the original.
Although deep learning image coding represents significant progress over conventional coding methods, reducing the loss of image quality during encoding remains a problem that lossy image coding technology needs to solve.
Disclosure of Invention
The application provides an image processing method and related equipment, which are used for improving the image quality.
A first aspect of the present application provides an image processing method, including:
the encoding end acquires a first image and divides it to obtain N first tiles, where N is an integer greater than 1. The encoding end obtains N first adaptive data from the N first tiles, the N first adaptive data corresponding one-to-one to the N first tiles. The encoding end preprocesses the N first tiles using the N first adaptive data. After preprocessing, the encoding end processes the N preprocessed first tiles through the encoding neural network to obtain N groups of first feature maps, and then quantizes and entropy encodes the N groups of first feature maps to obtain N first coded representations. By extracting multiple pieces of adaptive information, the reconstructed tiles can be compensated with that information, so that local characteristics are highlighted and the image quality of the second image is improved.
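As a purely illustrative sketch of this encoder-side flow (the tile size, the use of the tile mean as the adaptive datum, and the stub network and entropy coder are assumptions for illustration, not the networks defined in this application):

```python
import numpy as np

def encode_nn(x):
    # Stand-in for the encoding neural network: any mapping from a tile to a feature vector.
    return x.flatten()[:64]

def entropy_encode(q):
    # Stand-in for entropy coding: here just a raw byte serialization.
    return q.astype(np.int32).tobytes()

def encode_first_image(image, tile_h=64, tile_w=64):
    """Sketch of the encoder side: segment, extract per-tile adaptive data,
    preprocess, encode, quantize, and entropy-code each tile."""
    h, w = image.shape[:2]
    tiles = [image[i:i + tile_h, j:j + tile_w]
             for i in range(0, h, tile_h)
             for j in range(0, w, tile_w)]              # N first tiles, in raster order

    coded, adaptive = [], []
    for tile in tiles:
        adapt = float(tile.mean())                      # one possible first adaptive datum (assumption)
        pre = tile.astype(np.float32) - adapt           # preprocessing with the adaptive datum
        feat = encode_nn(pre)                           # one group of first feature maps
        q = np.round(feat)                              # quantization
        coded.append(entropy_encode(q))                 # one first coded representation
        adaptive.append(adapt)
    return coded, adaptive                              # the order preserves the tile arrangement
```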
In an alternative design of the first aspect, N sets of second feature maps can be obtained by entropy decoding the N first coded representations. Processing the N sets of second feature maps through the decoding neural network yields N first reconstructed tiles, and the N first adaptive data are used to compensate the N first reconstructed tiles. Combining the compensated N first reconstructed tiles yields the second image.
In an alternative design of the first aspect, the method further includes: the encoding end sends N first encoding representations, N first adaptive data and corresponding relations to the decoding end, wherein the corresponding relations comprise the corresponding relations between the N first adaptive data and the N first encoding representations.
In an alternative design of the first aspect, the method further includes: the encoding end quantizes the N first adaptive data to obtain N first adaptive quantized data, and sends the N first adaptive quantized data to the decoding end, where they are used to compensate the N first reconstructed tiles. Since N is an integer greater than 1, the encoding end needs to obtain multiple first adaptive data. Compared with obtaining only one adaptive datum from the whole first image, quantizing the first adaptive data when multiple adaptive data are obtained reduces their data amount.
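A minimal sketch of such quantization, assuming the adaptive data are per-tile mean values and a uniform quantization step (the step value is an arbitrary assumption):

```python
def quantize_adaptive(adaptive, step=8.0):
    # Coarser steps lower the entropy of each quantized datum, reducing the side information.
    return [round(a / step) * step for a in adaptive]
```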
In an alternative design of the first aspect, the larger N is, the smaller the information entropy of each individual first adaptive quantized datum. The larger N is, the larger the number of first tiles and hence the number of first adaptive quantized data. In this case, reducing the information entropy of the first adaptive quantized data, that is, quantizing the first adaptive data more coarsely, further reduces the data amount of the first adaptive data.
In an optional design of the first aspect, the arrangement order of the N first coded representations is the same as the arrangement order of the N first tiles, the arrangement order of the N first tiles is their arrangement order in the first image, and the correspondence includes the arrangement order of the N first coded representations and the arrangement order of the N first tiles. Unlike obtaining only one adaptive datum from the first image, this application uses multiple adaptive data and multiple first tiles, so the correspondence between them must be preserved. Conveying the correspondence through the arrangement order reduces the amount of data.
In an alternative design of the first aspect, a third image is obtained by processing the second image through a fusion neural network, the fusion neural network being configured to reduce differences between the second image and the first image, the differences including blockiness. This application enhances the representation of each tile by highlighting its local characteristics, which also tends to cause blockiness between tiles. Processing the second image through the fusion neural network reduces the influence of the blockiness and improves image quality.
In an alternative design of the first aspect, each of the N first tiles is the same size. If every first tile has the same size, each tile involves the same numbers of multiplications and additions in the convolution operations between the feature maps and the convolutional layers of the encoding neural network, which improves computational efficiency.
In an optional design of the first aspect, the size of the first tile is a fixed value when the method is used to segment first images of different sizes. When processing images of different sizes, the encoding end divides them into tiles of the same size. Fixing the size of the first tile allows the convolution operation unit to be closely matched to the tile, which can reduce the cost of the convolution operation unit or improve its utilization.
In an alternative design of the first aspect, the pixels of the first tile are a × b, where a and b are derived from a target pixel of c × d, c/a is equal to an integer, and d/b is equal to an integer; a and c are the numbers of pixels in the width direction, and b and d are the numbers of pixels in the height direction. The target pixel is obtained according to a target resolution of a terminal device, the terminal device includes an image capturing component, the pixel of an image obtained by the image capturing component under the target resolution setting is the target pixel, and the first image is obtained by the image capturing component. The encoding end and/or the decoding end may or may not be the terminal device. Under the target resolution setting, the image obtained by the encoding end can be divided exactly into tiles, which avoids padding with useless data and thus improves image quality.
In an alternative design of the first aspect, the target resolution is obtained from the resolution setting of the image capturing component in a setting interface of the image capturing application. The setting interface of the image capturing application can set the resolution at which the image capturing component captures images. Using the resolution selected in the setting interface as the target resolution improves the efficiency of obtaining the target resolution.
In an alternative design of the first aspect, the target resolution is obtained from a target image group in a gallery captured by the image capturing component; the pixel of the target image group is the target pixel, and among the image groups with different pixels, the target image group accounts for the largest proportion of the gallery. The gallery obtained by the encoding end through the image capturing component contains image groups with different pixels. Determining the target pixel from the target image group allows most images to be divided exactly into tiles, which avoids padding with useless data and improves image quality.
In an alternative design of the first aspect, the image capturing component obtains images of a plurality of pixels, the plurality of pixels being e × f, where e/a is equal to an integer and f/b is equal to an integer, e includes c, and f includes d. The terminal device can obtain images of different pixels through the image capturing component, and e × f is the set of pixels of those images. By constraining the pixels of the images obtained by the image capturing component in this way, every such image can be divided exactly into tiles, which avoids padding with useless data and improves image quality.
In an alternative design of the first aspect, the plurality of pixels are set by a resolution of the image capture component via a setting interface in an image capture application.
In an optional design of the first aspect, the pixels of the first tile are a × b, a is the number of pixels in the width direction, b is the number of pixels in the height direction, and the pixels of the first image are r × t. After acquiring the first image and before segmenting it, the method further comprises: if r/a is not equal to an integer and/or t/b is not equal to an integer, filling the edges of the first image with the pixel median so that r1/a is equal to an integer and t1/b is equal to an integer, where the pixels of the padded first image are r1 × t1. Since the size of the first tile is fixed, the encoding end may face images with different pixels, some of which cannot be divided exactly. When an image cannot be divided exactly, filling its edges with the pixel median improves the compatibility of the model while limiting the impact on image quality. The pixel median is the median of the pixel values.
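A minimal sketch of this median filling, assuming for simplicity that the padding is added only on the right and bottom edges (the application itself describes one-sided and two-sided variants below):

```python
import numpy as np

def pad_to_tile_grid(image: np.ndarray, a: int, b: int) -> np.ndarray:
    """Pad the image edges with the pixel median so the result splits exactly into a x b tiles."""
    t, r = image.shape[:2]                 # t: height, r: width
    pad_w = (-r) % a                       # columns needed so the width is a multiple of a
    pad_h = (-t) % b                       # rows needed so the height is a multiple of b
    median = np.median(image)              # the pixel median used as the fill value
    return np.pad(image, ((0, pad_h), (0, pad_w)) + ((0, 0),) * (image.ndim - 2),
                  mode="constant", constant_values=median)
```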
In an alternative design of the first aspect, after acquiring the first image and before filling its edges, the method further includes: if t/b is not equal to an integer, scaling r and t up proportionally to obtain a first image whose pixels are r2 × t2, where t2/b is equal to an integer. If r2/a is not equal to an integer, the edges of the first image are filled with the pixel median. The number of tiles that contain filled pixel medians affects the image quality; scaling the image proportionally reduces the number of tiles that need such filling and thus improves image quality.
In an alternative design of the first aspect, after r and t are scaled up proportionally, if r2/a is not equal to an integer, the remainder of r2/a is obtained. If the remainder is greater than a/2, the pixel median is filled only at one side of the first image in the width direction. Filling the pixel median at only one side further reduces the number of tiles that contain filled values while limiting the influence of the filling on the tiles, and thus improves image quality.
In an alternative design of the first aspect, if the remainder is less than a/2, the pixel median is filled at both sides of the first image in the width direction, and the width of the pixel median filled at each side is (a - g)/2, where g is the remainder. This reduces the influence of the filling on the tiles and improves image quality.
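The scale-then-pad logic of the preceding designs can be sketched as follows; the choice of which dimension is scaled versus padded, and the a/2 threshold, are reconstructed assumptions rather than a definitive reading of the application:

```python
import numpy as np

def nn_resize(img, shape):
    # Nearest-neighbour resize, used only as a stand-in for proper interpolation.
    ys = (np.arange(shape[0]) * img.shape[0] / shape[0]).astype(int)
    xs = (np.arange(shape[1]) * img.shape[1] / shape[1]).astype(int)
    return img[ys][:, xs]

def scale_then_pad_width(image: np.ndarray, a: int, b: int) -> np.ndarray:
    """Scale the image proportionally so the height is a multiple of b, then fill the
    remaining width mismatch with the pixel median on one side or both sides."""
    t, r = image.shape[:2]                          # t: height, r: width
    if t % b != 0:                                  # make the height a multiple of b by scaling
        scale = (t + (-t) % b) / t
        image = nn_resize(image, (int(round(t * scale)), int(round(r * scale))))
    r2 = image.shape[1]
    g = r2 % a                                      # remainder in the width direction
    if g == 0:
        return image
    median, need = np.median(image), a - g
    if g > a // 2:                                  # small mismatch: fill the median on one side only
        pads = (0, need)
    else:                                           # large mismatch: split the fill across both sides
        pads = (need // 2, need - need // 2)
    return np.pad(image, ((0, 0), pads) + ((0, 0),) * (image.ndim - 2),
                  mode="constant", constant_values=median)
```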
In an alternative design of the first aspect, the N first tiles include a first target tile whose range of pixel values is smaller than the range of pixel values of the first image. Before obtaining the N first adaptive data from the N first tiles, the method further comprises: the encoding end inverse quantizes the pixel values of the first target tile and then obtains one first adaptive datum from the inverse-quantized first target tile. Inverse quantizing the pixel values of the first target tile further highlights the local characteristics of the image.
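A minimal sketch of such inverse quantization of a tile's pixel values, assuming a simple linear stretch of the tile's own value range to the full range:

```python
import numpy as np

def dequantize_tile(tile: np.ndarray, full_range: int = 255) -> np.ndarray:
    """Linearly stretch a tile whose pixel values span a narrow range to the full
    range of the first image so that its local characteristics stand out."""
    lo, hi = tile.min(), tile.max()
    if hi == lo:                                   # flat tile, nothing to stretch
        return tile.astype(np.float32)
    return (tile.astype(np.float32) - lo) * (full_range / (hi - lo))
```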
A second aspect of the present application provides an image processing method, the method comprising:
the decoding end obtains N first coded representations, N first adaptive data, and a correspondence. The correspondence includes the correspondence between the N first adaptive data and the N first coded representations; the N first adaptive data correspond one-to-one to the N first coded representations, where N is an integer greater than 1. The decoding end entropy decodes the N first coded representations to obtain N sets of second feature maps, processes the N sets of second feature maps through the decoding neural network to obtain N first reconstructed tiles, compensates the N first reconstructed tiles with the N first adaptive data, and combines the compensated N first reconstructed tiles to obtain a second image. Compensating the reconstructed tiles with multiple adaptive data highlights the local characteristics of each tile and thus improves the image quality of the second image.
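A purely illustrative sketch of this decoder-side flow, with stand-ins for entropy decoding and the decoding neural network (the raster-order reassembly and the additive compensation are assumptions for illustration):

```python
import numpy as np

def decode_image(coded, adaptive, tiles_per_row, tile_h, tile_w):
    """Entropy-decode each coded representation, run the decoding network stand-in,
    compensate with the matching adaptive datum, then reassemble the tiles."""
    def entropy_decode(buf):                      # stand-in for entropy decoding
        return np.frombuffer(buf, dtype=np.int32).astype(np.float32)

    def decode_nn(feat):                          # stand-in for the decoding neural network
        out = np.zeros(tile_h * tile_w, dtype=np.float32)
        out[:min(feat.size, out.size)] = feat[:out.size]
        return out.reshape(tile_h, tile_w)

    recon = []
    for buf, adapt in zip(coded, adaptive):       # one-to-one correspondence
        tile = decode_nn(entropy_decode(buf))
        recon.append(tile + adapt)                # compensation with the adaptive datum
    rows = [np.concatenate(recon[i:i + tiles_per_row], axis=1)
            for i in range(0, len(recon), tiles_per_row)]
    return np.concatenate(rows, axis=0)           # the second image
```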
In an alternative design of the second aspect, the N first coded representations are obtained by quantizing and entropy coding N groups of first feature maps, the N groups of first feature maps are obtained by processing the N preprocessed first tiles through a coding neural network, the N preprocessed first tiles are obtained by preprocessing the N first tiles through N first adaptive data, the N first adaptive data are obtained from the N first tiles, and the N first tiles are obtained by dividing the first image.
In an alternative design of the second aspect, the N first adaptive data are N first adaptive quantized data, and the N first adaptive quantized data are obtained by quantizing the N first adaptive data. The decoding end compensates N first reconstruction blocks through N first adaptive quantization data.
In an alternative design of the second aspect, the larger N, the smaller the information entropy of the single first adaptively quantized data.
In an optional design of the second aspect, an arrangement order of the N first coding representations is the same as an arrangement order of the N first tiles, the arrangement order of the N first tiles is an arrangement order of the N first tiles in the first image, and the correspondence relationship includes the arrangement order of the N first coding representations and the arrangement order of the N first tiles.
In an alternative design of the second aspect, the method further includes: and the decoding end processes the second image through the fusion neural network to obtain a third image. The second image is processed through a fusion neural network to reduce differences between the second image and the first image, the differences including blockiness.
In an alternative design of the second aspect, each of the N first tiles is the same size.
In an alternative design of the second aspect, the size of the first tile is a fixed value when the method is used to combine second images of different sizes. In an alternative design of the second aspect, the pixels of the first tile are a × b, where a and b are derived from a target pixel of c × d, c/a is equal to an integer, and d/b is equal to an integer; a and c are the numbers of pixels in the width direction, and b and d are the numbers of pixels in the height direction. The target pixel is obtained according to a target resolution of the terminal device, the terminal device includes an image capturing component, the pixel of an image obtained by the image capturing component under the target resolution setting is the target pixel, and the first image is obtained by the image capturing component.
In an alternative design of the second aspect, the target resolution is set according to a resolution of the image capture component from a setting interface in the image capture application.
In an alternative design of the second aspect, the target resolution is obtained from a target image group in a gallery obtained by the image capturing means, and a pixel of the target image group is a target pixel. In the image group of different pixels, the ratio of the target image group in the gallery is the largest.
In an alternative design of the second aspect, images of a plurality of pixels are obtained by the image capturing component, the plurality of pixels being e × f, where e/a is equal to an integer and f/b is equal to an integer, e includes c, and f includes d.
In an alternative design of the second aspect, the plurality of pixels may be obtained by setting a resolution of the image capture component by a setting interface in an image capture application.
In an optional design of the second aspect, the pixels of the first tile are a × b, a is the number of pixels in the width direction, b is the number of pixels in the height direction, and the pixels of the first image are r × t. When r/a is not equal to an integer and/or t/b is not equal to an integer, the edges of the first image are filled with the pixel median so that r1/a is equal to an integer and t1/b is equal to an integer, where the pixels of the padded first image are r1 × t1.
In an alternative design of the second aspect, when t/b is not equal to an integer, r and t are scaled up proportionally by the encoding end to obtain the first image whose pixels are r2 × t2, where t2/b is equal to an integer.
In an alternative design of the second aspect, if r2/a is not equal to an integer and the remainder of r2/a is greater than a/2, the pixel median is filled only at one side of the first image in the width direction.
In an alternative design of the second aspect, if the remainder is less than a/2, the pixel median is filled at both sides of the first image in the width direction, and the width of the pixel median filled at each side is (a - g)/2, where g is the remainder.
In an alternative design of the second aspect, the N first tiles include a first target tile, a range of pixel values of the first target tile is smaller than a range of pixel values of the first image, and at least one of the first adaptive data is obtained from an inverse quantized first target tile, which is obtained by inverse quantizing pixel values of the first target tile.
A third aspect of the present application provides a model training method, the method including:
acquiring a first image;
segmenting the first image to obtain N first image blocks, wherein N is an integer larger than 1;
acquiring N first adaptive data from the N first image blocks, wherein the N first adaptive data are in one-to-one correspondence with the N first image blocks;
preprocessing the N first tiles according to the N first adaptive data;
processing the N preprocessed first image blocks through a first coding neural network to obtain N groups of first feature maps;
quantizing and entropy coding the N groups of first feature maps to obtain N first coded representations;
entropy decoding the N first coded representations to obtain N groups of second feature maps;
processing the N groups of second feature maps through a first decoding neural network to obtain N first reconstruction image blocks;
compensating the N first reconstruction tiles by the N first adaptive data;
combining the compensated N first reconstruction image blocks to obtain a second image;
acquiring distortion loss of the second image relative to the first image;
and performing joint training on a model by using a loss function until an image distortion value between the first image and the second image reaches a first preset degree, wherein the model comprises the first encoding neural network, a quantization network, an entropy encoding network, an entropy decoding network and the first decoding neural network. Optionally, the model further comprises a segmentation network, the trainable parameters in the segmentation network being the size of the first tile.
And outputting a second coding neural network and a second decoding neural network, wherein the second coding neural network is a model obtained after the iterative training is performed on the first coding neural network, and the second decoding neural network is a model obtained after the iterative training is performed on the first decoding neural network.
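One way to sketch a joint training step for the encoding and decoding neural networks, assuming PyTorch, an MSE distortion loss, and straight-through gradients for the quantizer (the application does not prescribe these choices):

```python
import torch

def train_step(first_image_tiles, adaptive, encoder, decoder, optimizer):
    """One joint training step over a list of tile tensors and their adaptive data.
    encoder/decoder are placeholder nn.Modules, not the networks of this application."""
    optimizer.zero_grad()
    recon = []
    for tile, adapt in zip(first_image_tiles, adaptive):
        pre = tile - adapt                                    # preprocessing with adaptive data
        feat = encoder(pre)                                   # first encoding neural network
        q = feat + (torch.round(feat) - feat).detach()        # quantization, straight-through gradient
        out = decoder(q)                                      # first decoding neural network
        recon.append(out + adapt)                             # compensation
    second_image = torch.cat(recon, dim=-1)                   # combine tiles (1-D layout assumed)
    first_image = torch.cat(list(first_image_tiles), dim=-1)
    loss = torch.nn.functional.mse_loss(second_image, first_image)  # distortion loss
    loss.backward()                                           # joint training via backpropagation
    optimizer.step()
    return loss.item()
```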
In an alternative design of the third aspect, the method further includes:
quantizing the N first adaptive data to obtain N first adaptive quantized data, the N first adaptive quantized data being used to compensate the N first reconstructed tiles;
in an alternative design of the third aspect, the larger the N, the smaller the information entropy of the single first adaptively quantized data.
In an optional design of the third aspect, an arrangement order of the N first coded representations is the same as an arrangement order of the N first tiles, and the arrangement order of the N first tiles is an arrangement order of the N first tiles in the first image.
In an optional design of the third aspect, the second image is processed by a fused neural network to obtain a third image, the fused neural network is configured to reduce a difference between the second image and the first image, the difference includes a blocking effect;
acquiring a distortion loss of the second image relative to the first image comprises:
acquiring distortion loss of the third image relative to the first image;
the model includes a converged neural network.
In an alternative design of the third aspect, each of the N first tiles is the same size.
In an alternative design of the third aspect, in the two iterative trainings, sizes of the first images for training are different, and the size of the first image block is a fixed value.
In an alternative design of the third aspect, the pixels of the first tile are a × b, where a and b are derived from a target pixel of c × d, c/a is equal to an integer, and d/b is equal to an integer; a and c are the numbers of pixels in the width direction, and b and d are the numbers of pixels in the height direction. The target pixel is obtained according to a target resolution of the terminal device, the terminal device includes an image capturing component, the pixel of an image obtained by the image capturing component under the target resolution setting is the target pixel, and the first image is obtained by the image capturing component.
In an alternative design of the third aspect, the target resolution is set according to a resolution of the image capture component from a setting interface in an image capture application.
In an optional design of the third aspect, the target resolution is obtained from a target image group in a gallery obtained by the image capturing unit, a pixel of the target image group is the target pixel, and a ratio of the target image group in the gallery is the largest in the image group of different pixels.
In an alternative design of the third aspect, images of a plurality of pixels are obtained by the image capturing component, the plurality of pixels being e × f, where e/a is equal to an integer and f/b is equal to an integer, e includes c, and f includes d.
In an optional design of the third aspect, the plurality of pixels are set by a resolution setting of the image pickup component through a setting interface in the image pickup application.
In an optional design of the third aspect, the pixels of the first tile are a × b, a is the number of pixels in the width direction, b is the number of pixels in the height direction, and the pixels of the first image are r × t;
after acquiring the first image and before segmenting the first image, the method further comprises:
if r/a is not equal to an integer and/or t/b is not equal to an integer, filling the edges of the first image with the pixel median so that r1/a is equal to an integer and t1/b is equal to an integer, where the pixels of the padded first image are r1 × t1.
In an optional design of the third aspect, after acquiring the first image and before filling the edges of the first image, the method further includes:
if t/b is not equal to an integer, scaling r and t up proportionally to obtain the first image whose pixels are r2 × t2, where t2/b is equal to an integer;
and the step of filling the edges of the first image with the pixel median if r/a is not equal to an integer and/or t/b is not equal to an integer comprises:
if r2/a is not equal to an integer, filling the edges of the first image with the pixel median.
In an alternative design of the third aspect, after r and t are scaled up proportionally, if r2/a is not equal to an integer, the remainder of r2/a is obtained. If the remainder is greater than a/2, the pixel median is filled only at one side of the first image in the width direction.
In an alternative design of the third aspect, if the remainder is less than a/2, the pixel median is filled at both sides of the first image in the width direction so that the width of the pixel median filled at each side is (a - g)/2, where g is the remainder.
In an optional design of the third aspect, the N first tiles comprise a first target tile having a range of pixel values that is smaller than a range of pixel values of the first image;
before obtaining N first adaptive data from the N first tiles, the method further includes:
inverse quantizing pixel values of the first target tile;
obtaining N first adaptive data from the N first tiles comprises:
obtaining the one first adaptive data from the dequantized first target tile.
A fourth aspect of the present application provides an encoding apparatus, the apparatus comprising:
the first acquisition module is used for acquiring a first image;
the segmentation module is used for segmenting the first image to obtain N first image blocks, wherein N is an integer larger than 1;
the second obtaining module is used for obtaining N first adaptive data from N first image blocks, and the N first adaptive data are in one-to-one correspondence with the N first image blocks;
the preprocessing module is used for preprocessing the N first image blocks according to the N first self-adaptive data;
the encoding neural network module is used for processing the N preprocessed first image blocks through the encoding neural network to obtain N groups of first feature maps;
and the quantization and entropy coding module is used for quantizing and entropy coding the N groups of first characteristic graphs to obtain N first coding representations.
In an optional design of the fourth aspect, the N first coded representations are used for entropy decoding to obtain N sets of second feature maps, the N sets of second feature maps are used for processing by a decoding neural network to obtain N first reconstruction blocks, the N first adaptive data are used for compensating the N first reconstruction blocks, and the compensated N first reconstruction blocks are used for combining into the second image.
In an alternative design of the fourth aspect, the apparatus further includes:
and the sending module is used for sending the N first coding representations, the N first adaptive data and the corresponding relation to the decoding end, wherein the corresponding relation comprises the corresponding relation between the N first adaptive data and the N first coding representations.
In an alternative design of the fourth aspect, the apparatus further includes:
a quantization module, configured to quantize the N first adaptive data to obtain N first adaptive quantized data, where the N first adaptive quantized data are used to compensate the N first reconstructed tiles;
the sending module is specifically configured to send the N first adaptive quantized data to the decoding end.
In an alternative design of the fourth aspect, the larger N, the smaller the information entropy of the single first adaptively quantized data.
In an optional design of the fourth aspect, an arrangement order of the N first coding representations is the same as an arrangement order of the N first tiles, the arrangement order of the N first tiles is an arrangement order of the N first tiles in the first image, and the correspondence relationship includes the arrangement order of the N first coding representations and the arrangement order of the N first tiles.
In an alternative design of the fourth aspect, the second image is processed by a fusion neural network to obtain a third image, and the fusion neural network is used to reduce a difference between the second image and the first image, where the difference includes a blocking effect.
In an alternative design of the fourth aspect, each of the N first tiles is the same size.
In an alternative design of the fourth aspect, the size of the first tile is a fixed value when the apparatus is used to process first images of different sizes. In an alternative design of the fourth aspect, the pixels of the first tile are a × b, where a and b are derived from a target pixel of c × d, c/a is equal to an integer, and d/b is equal to an integer; a and c are the numbers of pixels in the width direction, and b and d are the numbers of pixels in the height direction. The target pixel is obtained according to a target resolution of the terminal device, the terminal device includes an image capturing component, the pixel of an image obtained by the image capturing component under the target resolution setting is the target pixel, and the first image is obtained by the image capturing component.
In an alternative design of the fourth aspect, the target resolution is set according to a resolution of the image capture component from a setting interface in the image capture application.
In an alternative design of the fourth aspect, the target resolution is obtained from a target image group in a gallery obtained by the image capturing means, a pixel of the target image group is a target pixel, and a ratio of the target image group in the gallery is largest in the image group of different pixels.
In an alternative design of the fourth aspect, images of a plurality of pixels are obtained by the image capturing component, the plurality of pixels being e × f, where e/a is equal to an integer and f/b is equal to an integer, e includes c, and f includes d.
In an alternative design of the fourth aspect, the plurality of pixels are set by a resolution of the image capture component via a setting interface in the image capture application.
In an optional design of the fourth aspect, the pixels of the first tile are a × b, a is the number of pixels in the width direction, b is the number of pixels in the height direction, and the pixels of the first image are r × t. The device further comprises:
a filling module, configured to fill the edges of the first image with the pixel median if r/a is not equal to an integer and/or t/b is not equal to an integer, so that r1/a is equal to an integer and t1/b is equal to an integer, where the pixels of the padded first image are r1 × t1.
In an alternative design of the fourth aspect, the apparatus further includes:
an amplifying module, configured to scale r and t up proportionally if t/b is not equal to an integer, to obtain the first image whose pixels are r2 × t2, where t2/b is equal to an integer;
the filling module is specifically configured to fill the edges of the first image with the pixel median if r2/a is not equal to an integer.
In an alternative design of the fourth aspect, the second obtaining module is further configured to obtain the remainder of r2/a if r2/a is not equal to an integer;
the filling module is specifically configured to fill the pixel median only at one side of the first image in the width direction if the remainder is greater than a/2.
In an optional design of the fourth aspect, the filling module is specifically configured to fill the pixel median at both sides of the first image in the width direction if the remainder is less than a/2, so that the width of the pixel median filled at each side is (a - g)/2, where g is the remainder.
In an optional design of the fourth aspect, the N first tiles comprise a first target tile having a range of pixel values smaller than a range of pixel values of the first image. The device further comprises:
an inverse quantization module to inverse quantize pixel values of the first target tile;
the second obtaining module is specifically configured to obtain a first adaptive data from the dequantized first target tile.
A fifth aspect of the present application provides a decoding apparatus comprising:
the acquisition module is used for acquiring N first coded representations, N first adaptive data and a correspondence, wherein the correspondence includes the correspondence between the N first adaptive data and the N first coded representations, the N first adaptive data correspond one-to-one to the N first coded representations, and N is an integer greater than 1;
the entropy decoding module is used for carrying out entropy decoding on the N first coded representations to obtain N groups of second feature maps;
the decoding neural network module is used for processing the N groups of second feature maps to obtain N first reconstruction image blocks;
a compensation module for compensating the N first reconstruction tiles by the N first adaptive data;
and the combination module is used for combining the compensated N first reconstruction image blocks to obtain a second image.
In an optional design of the fifth aspect, the N first coded representations are obtained by quantization and entropy coding N sets of first feature maps, the N sets of first feature maps are obtained by processing N pre-processed first tiles by a coding neural network, the N pre-processed first tiles are obtained by pre-processing N first tiles by the N first adaptive data, the N first adaptive data are obtained from the N first tiles, and the N first tiles are obtained by dividing the first image.
In an alternative design of the fifth aspect, the N first adaptive data are N first adaptive quantized data, and the N first adaptive quantized data are obtained by quantizing the N first adaptive data;
the compensation module is specifically configured to compensate the N first reconstruction tiles with the N first adaptive quantization data.
In an alternative design of the fifth aspect, the larger N, the smaller the information entropy of the single first adaptively quantized data.
In an alternative design of the fifth aspect, an arrangement order of the N first coded representations is the same as an arrangement order of the N first tiles, and the arrangement order of the N first tiles is an arrangement order of the N first tiles in the first image. The corresponding relation comprises the arrangement sequence of the N first coding representations and the arrangement sequence of the N first image blocks.
In an alternative design of the fifth aspect, the apparatus further includes:
and the fusion neural network module is used for processing the second image to obtain a third image so as to reduce the difference between the second image and the first image, wherein the difference comprises a blocking effect.
In an alternative design of the fifth aspect, each of the N first tiles is the same size.
In an alternative design of the fifth aspect, the size of the first tile is a fixed value when the apparatus is used to combine second images of different sizes. In an alternative design of the fifth aspect, the pixels of the first tile are a × b, where a and b are derived from a target pixel of c × d, c/a is equal to an integer, and d/b is equal to an integer; a and c are the numbers of pixels in the width direction, and b and d are the numbers of pixels in the height direction. The target pixel is obtained according to a target resolution of the terminal device, the terminal device includes an image capturing component, the pixel of an image obtained by the image capturing component under the target resolution setting is the target pixel, and the first image is obtained by the image capturing component.
In an alternative design of the fifth aspect, the target resolution is set according to a resolution of the image capture component from a setting interface in the image capture application.
In an alternative design of the fifth aspect, the target resolution is obtained from a target image group in a gallery obtained by the image capturing means, a pixel of the target image group is a target pixel, and a ratio of the target image group in the gallery is largest in the image group of different pixels.
In an alternative design of the fifth aspect, images of a plurality of pixels are obtained by the image capturing component, the plurality of pixels being e × f, where e/a is equal to an integer and f/b is equal to an integer, e includes c, and f includes d.
In an alternative design of the fifth aspect, the plurality of pixels are set by a resolution of the image capture component via a setting interface in the image capture application.
In an optional design of the fifth aspect, the pixels of the first tile are a × b, a is the number of pixels in the width direction, b is the number of pixels in the height direction, and the pixels of the first image are r × t. When r/a is not equal to an integer and/or t/b is not equal to an integer, the edges of the first image are filled with the pixel median so that r1/a is equal to an integer and t1/b is equal to an integer, where the pixels of the padded first image are r1 × t1.
In an alternative design of the fifth aspect, when t/b is not equal to an integer, r and t are scaled up proportionally by the encoding end to obtain the first image whose pixels are r2 × t2, where t2/b is equal to an integer.
In an alternative design of the fifth aspect, if r2/a is not equal to an integer and the remainder of r2/a is greater than a/2, the pixel median is filled only at one side of the first image in the width direction.
In an alternative design of the fifth aspect, if the remainder is less than a/2, the pixel median is filled at both sides of the first image in the width direction, and the width of the pixel median filled at each side is (a - g)/2, where g is the remainder.
In an alternative design of the fifth aspect, the N first tiles include a first target tile, a range of pixel values of the first target tile is smaller than a range of pixel values of the first image, and at least one of the first adaptive data is obtained from an inverse quantized first target tile, which is obtained by inverse quantizing pixel values of the first target tile.
A sixth aspect of the present application provides a training apparatus, the apparatus comprising:
the first acquisition module is used for acquiring a first image;
the segmentation module is used for segmenting the first image to obtain N first image blocks, wherein N is an integer larger than 1;
a second obtaining module, configured to obtain N first adaptive data from the N first tiles, where the N first adaptive data correspond to the N first tiles one to one;
a preprocessing module, configured to preprocess the N first tiles according to the N first adaptive data;
the first coding neural network module is used for processing the N preprocessed first image blocks to obtain N groups of first feature maps;
the quantization and entropy coding module is used for quantizing and entropy coding the N groups of first characteristic graphs to obtain N first coding representations;
the entropy decoding module is used for carrying out entropy decoding on the N first coded representations to obtain N groups of second feature maps;
the first decoding neural network module is used for processing the N groups of second feature maps to obtain N first reconstruction image blocks;
a compensation module to compensate the N first reconstruction tiles with the N first adaptive data;
the combination module is used for combining the compensated N first reconstruction image blocks to obtain a second image;
a third obtaining module, configured to obtain a distortion loss of the second image relative to the first image;
the training module is used for performing joint training on a model by using a loss function until an image distortion value between the first image and the second image reaches a first preset degree, and the model comprises the first coding neural network, a quantization network, an entropy coding network, an entropy decoding network and the first decoding neural network. Optionally, the model further comprises a segmentation network, the trainable parameter in the segmentation network being the size of the first tile;
and the output module is used for outputting a second coding neural network and a second decoding neural network, wherein the second coding neural network is a model obtained after the iterative training is executed on the first coding neural network, and the second decoding neural network is a model obtained after the iterative training is executed on the first decoding neural network.
In an alternative design of the sixth aspect, the apparatus further includes:
a quantization module configured to quantize the N first adaptive data to obtain N first adaptive quantized data, where the N first adaptive quantized data are used to compensate for the N first reconstructed tiles;
in an alternative design of the sixth aspect, the larger the N, the smaller the information entropy of the single first adaptively quantized data.
In an optional design of the sixth aspect, an arrangement order of the N first coded representations is the same as an arrangement order of the N first tiles, and the arrangement order of the N first tiles is an arrangement order of the N first tiles in the first image.
In an alternative design of the sixth aspect, the second image is processed by a fused neural network to obtain a third image, the fused neural network is configured to reduce a difference between the second image and the first image, the difference includes a blocking effect;
the third obtaining module is specifically configured to obtain a distortion loss of the third image with respect to the first image;
the model includes a converged neural network.
In an alternative design of the sixth aspect, each of the N first tiles is the same size.
In an alternative design of the sixth aspect, in the two iterative trainings, sizes of the first images for training are different, and the size of the first image block is a fixed value.
In an alternative design of the sixth aspect, the pixels of the first tile are a × b, where a and b are derived from a target pixel of c × d, c/a is equal to an integer, and d/b is equal to an integer; a and c are the numbers of pixels in the width direction, and b and d are the numbers of pixels in the height direction. The target pixel is obtained according to a target resolution of the terminal device, the terminal device includes an image capturing component, the pixel of an image obtained by the image capturing component under the target resolution setting is the target pixel, and the first image is obtained by the image capturing component.
In an alternative design of the sixth aspect, the target resolution is set according to a resolution of the image capture component according to a setting interface in an image capture application.
In an optional design of the sixth aspect, the target resolution is obtained from a target image group in a gallery obtained by the image capturing unit, a pixel of the target image group is the target pixel, and a ratio of the target image group in the gallery is the largest in the image group of different pixels.
In an alternative design of the sixth aspect, images of a plurality of pixels are obtained by the image capturing component, the plurality of pixels being e × f, where e/a is equal to an integer and f/b is equal to an integer, e includes c, and f includes d.
In an alternative design of the sixth aspect, the plurality of pixels are set by a resolution setting of the image capture component via a setting interface in the image capture application.
In an optional design of the sixth aspect, the pixels of the first tile are a × b, a is the number of pixels in the width direction, b is the number of pixels in the height direction, and the pixels of the first image are r × t;
the device further comprises:
a filling module, configured to fill the edges of the first image with the pixel median if r/a is not equal to an integer and/or t/b is not equal to an integer, so that r1/a is equal to an integer and t1/b is equal to an integer, where the pixels of the padded first image are r1 × t1.
In an alternative design of the sixth aspect, the apparatus further includes:
an amplifying module, configured to: if t/b is not equal to an integer, scale up r and t equally to obtain the first image with pixels r2 × t2, where t2/b is equal to an integer;
the padding module is specifically configured to: if r2/a is not equal to an integer, fill the edges of the first image with pixel median values.
In an optional design of the sixth aspect, the second obtaining module is further configured to: after r and t are scaled up, if r2/a is not equal to an integer, obtain the remainder of r2/a;
the padding module is specifically configured to: if the remainder is greater than a/2, fill the pixel median values only on one side of the first image in the width direction.
In an alternative design of the sixth aspect, if the remainder is less than a/2, the pixel median values are filled on both sides of the first image in the width direction, such that the width of the pixel median values filled on each side is (a - g)/2, where g is the remainder.
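As an illustration only, the sketch below (Python with NumPy, hypothetical function name) shows one way the width-direction median padding described above could be implemented; the a/2 threshold and the (a - g)/2 split follow the reconstruction above and are assumptions, not the exact rule of this design.

import numpy as np

def pad_width_with_median(image, a):
    # Pad the width of a single-channel H x W image with the pixel median so W becomes a multiple of a.
    median = int(np.median(image))
    h, w = image.shape
    g = w % a                        # remainder of w / a
    if g == 0:
        return image
    pad_total = a - g                # columns needed to reach the next multiple of a
    if g > a / 2:                    # assumed rule: a small amount of padding goes on one side only
        left, right = 0, pad_total
    else:                            # assumed rule: a large amount of padding is split over both sides
        left, right = pad_total // 2, pad_total - pad_total // 2
    return np.pad(image, ((0, 0), (left, right)), mode="constant", constant_values=median)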
In an optional design of the sixth aspect, the N first tiles comprise a first target tile having a range of pixel values that is smaller than a range of pixel values of the first image;
the device further comprises:
an inverse quantization module to inverse quantize pixel values of the first target tile;
the second obtaining module is specifically configured to obtain a first adaptive data from the dequantized first target tile.
A seventh aspect of the present application provides an encoding device, which may include a memory, a processor, and a bus system, where the memory is configured to store a program, and the processor is configured to execute the program in the memory to perform the following steps:
acquiring a first image;
segmenting the first image to obtain N first image blocks, wherein N is an integer larger than 1;
acquiring N first adaptive data from N first image blocks, wherein the N first adaptive data correspond to the N first image blocks one by one;
preprocessing the N first image blocks through the N first adaptive data;
n preprocessed first image blocks are processed through a coding neural network to obtain N groups of first feature maps;
and quantizing and entropy coding the N groups of first feature maps to obtain N first coded representations.
In an optional design of the seventh aspect, the encoding device is a virtual reality VR device, a mobile phone, a tablet, a laptop, a server, or a smart wearable device.
In a seventh aspect of the present application, the processor may be further configured to execute steps executed by the encoding end in each possible implementation manner of the first aspect, which may specifically refer to the first aspect, and details are not described here.
An eighth aspect of the present application provides a decoding device, which may include a memory, a processor, and a bus system, where the memory is configured to store a program, and the processor is configured to execute the program in the memory to perform the following steps:
acquiring N first coded representations, N first adaptive data, and a correspondence, where the correspondence includes the correspondence between the N first adaptive data and the N first coded representations, the N first adaptive data correspond to the N first coded representations one to one, and N is an integer greater than 1;
entropy decoding the N first coded representations to obtain N groups of second feature maps;
processing the N groups of second feature maps through a decoding neural network to obtain N first reconstruction image blocks;
compensating the N first reconstruction tiles by the N first adaptive data;
and combining the compensated N first reconstruction image blocks to obtain a second image.
In an optional design of the eighth aspect, the decoding device is a virtual reality VR device, a mobile phone, a tablet, a laptop, a server, or an intelligent wearable device.
In the eighth aspect of the present application, the processor may be further configured to execute steps executed by the decoding end in each possible implementation manner of the second aspect, and reference may be specifically made to the second aspect, which is not described herein again.
A ninth aspect of the present application provides a training device, which may include a memory, a processor, and a bus system, where the memory is configured to store a program, and the processor is configured to execute the program in the memory to perform the following steps:
acquiring a first image;
segmenting the first image to obtain N first image blocks, wherein N is an integer larger than 1;
acquiring N first adaptive data from the N first image blocks, wherein the N first adaptive data are in one-to-one correspondence with the N first image blocks;
preprocessing the N first tiles according to the N first adaptive data;
processing the N preprocessed first image blocks through a first coding neural network to obtain N groups of first feature maps;
quantizing and entropy coding the N groups of first feature maps to obtain N first coded representations;
entropy decoding the N first coded representations to obtain N groups of second feature maps;
processing the N groups of second feature maps through a first decoding neural network to obtain N first reconstruction image blocks;
compensating the N first reconstruction tiles by the N first adaptive data;
combining the compensated N first reconstruction image blocks to obtain a second image;
acquiring distortion loss of the second image relative to the first image;
performing joint training on the first encoding neural network, the quantization network, the entropy encoding network, the entropy decoding network and the first decoding neural network by using a loss function until an image distortion value between the first image and the second image reaches a first preset degree;
and outputting a second coding neural network and a second decoding neural network, wherein the second coding neural network is a model obtained after the iterative training is performed on the first coding neural network, and the second decoding neural network is a model obtained after the iterative training is performed on the first decoding neural network.
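To make the joint training described above concrete, the following is a highly simplified PyTorch-style sketch of one training step; the encoder, decoder, segmentation and merge helpers are hypothetical placeholders, the entropy-coding rate term is omitted, and straight-through rounding is used only as a stand-in for the quantization network.

import torch

def train_step(encoder, decoder, optimizer, first_image, split_fn, merge_fn):
    tiles = split_fn(first_image)                    # N first tiles, each a (C, b, a) tensor
    means = [t.mean() for t in tiles]                # N first adaptive data
    recon_tiles = []
    for t, m in zip(tiles, means):
        pre = t - m                                  # preprocessing with the adaptive data
        feat = encoder(pre.unsqueeze(0))             # a group of first feature maps
        feat_q = feat + (torch.round(feat) - feat).detach()   # differentiable stand-in for quantization
        rec = decoder(feat_q).squeeze(0)             # a first reconstruction tile
        recon_tiles.append(rec + m)                  # compensation with the adaptive data
    second_image = merge_fn(recon_tiles)             # combine into the second image
    loss = torch.nn.functional.mse_loss(second_image, first_image)   # distortion loss
    optimizer.zero_grad()
    loss.backward()                                  # back-propagate through both networks jointly
    optimizer.step()
    return loss.item()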
In the ninth aspect of the present application, the processor may be further configured to execute the step performed by the decoding end in each possible implementation manner of the third aspect, and specifically, refer to the third aspect, which is not described herein again.
In a tenth aspect, an embodiment of the present application provides a computer-readable storage medium, in which a computer program is stored, and when the computer program runs on a computer, the computer is caused to execute the image processing method according to any one of the first to third aspects.
In an eleventh aspect, an embodiment of the present application provides a computer program, which when run on a computer, causes the computer to execute the image processing method according to any one of the first to third aspects.
In a twelfth aspect, the present application provides a chip system, which includes a processor for enabling an executing device or a training device to implement the functions referred to in the above aspects, for example, to transmit or process data and/or information referred to in the above methods. In one possible design, the system-on-chip further includes a memory for storing program instructions and data necessary for the execution device or the training device. The chip system may be formed by a chip, or may include a chip and other discrete devices.
Drawings
FIG. 1 is a schematic structural diagram of an artificial intelligence body framework;
FIG. 2a is a schematic diagram of an application scenario according to an embodiment of the present application;
FIG. 2b is a schematic diagram of another application scenario according to an embodiment of the present application;
FIG. 2c is a schematic diagram of another application scenario according to an embodiment of the present application;
fig. 3a is a schematic flowchart of an image processing method according to an embodiment of the present application;
fig. 3b is another schematic flowchart of an image processing method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a segmented and combined image according to an embodiment of the present application;
fig. 5 is a schematic diagram of a CNN-based image encoding process in an embodiment of the present application;
fig. 6 is a schematic diagram of a CNN-based image decoding process in an embodiment of the present application;
fig. 7 is a schematic diagram of a setting interface for setting resolution by a camera of a terminal device in an embodiment of the present application;
FIG. 8 is a schematic flow chart of image filling in the embodiment of the present application;
FIG. 9 is another schematic flow chart of image filling in the embodiment of the present application;
FIG. 10 is a comparison diagram of image compression quality in the embodiment of the present application;
FIG. 11 is a system architecture diagram of an image processing system according to an embodiment of the present application;
FIG. 12 is a schematic flow chart of a model training method according to an embodiment of the present disclosure;
FIG. 13 is a schematic diagram of a training process provided by an embodiment of the present application;
fig. 14 is a schematic structural diagram of an encoding apparatus according to an embodiment of the present application;
fig. 15 is a schematic structural diagram of a decoding apparatus according to an embodiment of the present application;
FIG. 16 is a schematic structural diagram of a training device according to an embodiment of the present disclosure;
fig. 17 is a schematic structural diagram of an execution device according to an embodiment of the present application;
FIG. 18 is a schematic structural diagram of a training apparatus provided in an embodiment of the present application;
fig. 19 is a schematic structural diagram of a chip according to an embodiment of the present disclosure.
Detailed Description
The embodiments of the present invention will be described below with reference to the drawings. The terminology used in the description of the embodiments of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Embodiments of the present application are described below with reference to the accompanying drawings. As can be known to those skilled in the art, with the development of technology and the emergence of new scenarios, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems.
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely descriptive of the various embodiments of the application and how objects of the same nature can be distinguished. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The general workflow of the artificial intelligence system is described first. Referring to fig. 1, fig. 1 is a schematic structural diagram of an artificial intelligence main framework, which is explained below from two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects the general process from data acquisition to processing, for example, the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data undergoes a refinement process of "data - information - knowledge - wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and the information (technologies for providing and processing information) up to the industrial ecology of the system.
(1) Infrastructure
The infrastructure provides computing power support for the artificial intelligent system, realizes communication with the outside world, and realizes support through a foundation platform. Communicating with the outside through a sensor; the computing power is provided by intelligent chips (hardware acceleration chips such as CPU, NPU, GPU, ASIC, FPGA and the like); the basic platform comprises distributed computing framework, network and other related platform guarantees and supports, and can comprise cloud storage and computing, interconnection and intercommunication networks and the like. For example, sensors and external communications acquire data that is provided to intelligent chips in a distributed computing system provided by the base platform for computation.
(2) Data
Data at the upper level of the infrastructure is used to represent the data source for the field of artificial intelligence. The data relates to graphs, images, voice and texts, and also relates to the data of the Internet of things of traditional equipment, including service data of the existing system and sensing data such as force, displacement, liquid level, temperature, humidity and the like.
(3) Data processing
Data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.
The machine learning and the deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like on data.
Inference means a process of simulating an intelligent human inference mode in a computer or an intelligent system, using formalized information to think about and solve a problem by a machine according to an inference control strategy, and a typical function is searching and matching.
The decision-making refers to a process of making a decision after reasoning intelligent information, and generally provides functions of classification, sequencing, prediction and the like.
(4) General capabilities
After the above-mentioned data processing, further based on the result of the data processing, some general capabilities may be formed, such as algorithms or a general system, e.g. translation, analysis of text, computer vision processing, speech recognition, recognition of images, etc.
(5) Intelligent product and industrial application
Intelligent products and industry applications refer to the products and applications of the artificial intelligence system in various fields; they encapsulate the overall artificial intelligence solution, commercialize intelligent information decision-making, and realize practical applications. The application fields mainly include: intelligent terminals, intelligent transportation, intelligent medical treatment, autonomous driving, safe cities, and the like.
The application can be applied to the field of image processing in the field of artificial intelligence, and a plurality of application scenes of a plurality of products falling to the ground are introduced below.
First, the method is applied to the image compression and decompression processes in the terminal device, where both the encoding end and the decoding end are terminal devices.
The image compression method provided in the embodiment of the present application can be applied to the image compression process in the terminal device, and in particular to albums, video surveillance, and the like on the terminal device. Specifically, referring to fig. 2a, fig. 2a is a schematic diagram of an application scenario of the embodiment of the present application. As shown in fig. 2a, the terminal device may acquire an image to be compressed, where the image to be compressed may be a photo taken by an image pickup component or a frame captured from a video, and the image pickup component is generally a camera. The terminal device divides the acquired image through a central processing unit (CPU) to obtain a plurality of image blocks. After obtaining the plurality of image blocks, the terminal device may perform feature extraction on them through an artificial intelligence (AI) encoding neural network (encoding neural network for short) in a neural-network processing unit (NPU), transform the image block data into output features with lower redundancy, and generate a probability estimate for each feature point in the output features. The CPU then entropy-encodes the extracted output features using the probability estimate of each point, thereby reducing the coding redundancy of the output features and further reducing the amount of data transmitted in the image block compression process, and stores the encoded data obtained by encoding in a corresponding storage location in the form of a data file. When a user needs to acquire a file stored in the storage location, the CPU may acquire and load the stored file from the corresponding storage location, obtain the feature maps by entropy decoding, and reconstruct the feature maps through an AI decoding neural network (decoding neural network for short) in the NPU to obtain a plurality of reconstructed image blocks. After obtaining the plurality of reconstructed image blocks, the terminal device combines them through the CPU to obtain a reconstructed image.
In particular, in such a scenario, the terminal device may save the encoded data on the cloud device. When the user needs to obtain the coded data, the coded data can be obtained from the cloud equipment.
Second, the method is applied to the image compression and decompression processes on the cloud, where both the encoding end and the decoding end are cloud devices.
The image compression method provided by the embodiment of the application can be applied to the image compression process of the cloud, specifically, the image compression method can be applied to the functions of a cloud album and the like on the cloud equipment, and the cloud equipment can be a cloud server. Specifically, referring to fig. 2b, fig. 2b is another application scenario diagram of the embodiment of the present application, as shown in fig. 2b, the terminal device may acquire an image to be compressed, where the image to be compressed may be a picture taken by an image pickup component or a frame of picture captured from a video. The terminal equipment can carry out entropy coding on the picture to be compressed through the CPU to obtain coded data. Instead of entropy coding, any lossless compression method based on the prior art may be used. The terminal device can transmit the coded data to the cloud device, and the cloud device can perform corresponding entropy decoding on the received coded data to obtain an image to be compressed. The terminal device divides the extracted image by the CPU to obtain a plurality of image blocks. After obtaining a plurality of image blocks, the server can perform feature extraction on the obtained plurality of image blocks through a coding neural network in a Graphics Processing Unit (GPU), convert the image block data into output features with lower redundancy, generate probability estimation of each point in the output features, and the CPU performs entropy coding on the extracted output features through the probability estimation of each point in the output features, so as to reduce coding redundancy of the output features, further reduce data transmission amount in the image block compression process, and store coded data obtained through coding in a corresponding storage position in the form of a data file. When a user needs to acquire a file stored in the storage location, the CPU can acquire and load the stored file at the corresponding storage location, acquire a decoded feature map based on entropy decoding, reconstruct the feature map through a decoding neural network in the NPU to obtain a plurality of reconstructed image blocks, and after the plurality of image blocks are obtained, the cloud device combines the plurality of reconstructed image blocks through the CPU to obtain a reconstructed image. The cloud device can perform entropy coding on a picture to be compressed through the CPU to obtain coded data, the coding method can also be any other lossless compression method based on the prior art, the cloud device can transmit the coded data to the terminal device, and the terminal device can perform corresponding entropy decoding on the received coded data to obtain a decoded image.
Third, the method is applied to image decompression on the terminal device and image compression on the cloud device, where the encoding end is the cloud device and the decoding end is the terminal device.
The image compression method provided by the embodiment of the present application can be applied to image compression on the cloud device and image decompression on the terminal device; specifically, it can be applied to functions such as a cloud album on the cloud device, and the cloud device may be a cloud server. Specifically, referring to fig. 2c, fig. 2c is another application scenario diagram of the embodiment of the present application. As shown in fig. 2c, the terminal device may acquire an image to be compressed, where the image to be compressed may be a picture taken by an image pickup component or a frame captured from a video. The terminal device can perform entropy coding on the picture to be compressed through the CPU to obtain coded data; instead of entropy coding, any lossless compression method based on the prior art may be used. The terminal device can transmit the coded data to the cloud device, and the cloud device can perform corresponding entropy decoding on the received coded data to obtain the image to be compressed. The cloud device divides the obtained image through the CPU to obtain a plurality of image blocks. After obtaining the plurality of image blocks, the server can perform feature extraction on them through the encoding neural network in the GPU, transform the image block data into output features with lower redundancy, and generate a probability estimate for each point in the output features; the CPU entropy-encodes the extracted output features using the probability estimate of each point, thereby reducing the coding redundancy of the output features and further reducing the amount of data transmitted in the image block compression process, and stores the encoded data obtained by encoding in a corresponding storage location in the form of a data file. When the terminal device needs to acquire the image, the terminal device receives the coded data sent by the cloud device and obtains the feature maps by entropy decoding. The terminal device reconstructs the feature maps through the decoding neural network in the NPU to obtain a plurality of reconstructed image blocks. After obtaining the plurality of reconstructed image blocks, the terminal device combines them through the CPU to obtain a reconstructed image.
Since the embodiments of the present application relate to the application of a large number of neural networks, for the sake of understanding, the following description will be made first of all with respect to terms and concepts of the neural networks to which the embodiments of the present application may relate.
(1) Neural network
The neural network may be composed of neural units. A neural unit may refer to an operation unit that takes x_s and an intercept of 1 as inputs, and the output of the operation unit may be:
h_{W,b}(x) = f(W^T x) = f( Σ_{s=1}^{n} W_s · x_s + b ),
where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, which is used to introduce a nonlinear characteristic into the neural network to convert an input signal of the neural unit into an output signal. The output signal of the activation function may be used as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by joining together a plurality of the above single neural units, that is, the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to the local receptive field of the previous layer to extract features of the local receptive field, and the local receptive field may be a region composed of several neural units.
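As a minimal numerical sketch of the neural unit output above (Python, names illustrative):

import numpy as np

def neural_unit(x, w, b):
    # Output of a single neural unit: f(sum_s W_s * x_s + b), here with a sigmoid activation f.
    s = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-s))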
(2) Deep neural network
Deep Neural Networks (DNNs), also called multi-layer neural networks, can be understood as neural networks with multiple hidden layers. The DNNs are divided according to the positions of different layers, and the neural networks inside the DNNs can be divided into three categories: input layer, hidden layer, output layer. Generally, the first layer is an input layer, the last layer is an output layer, and the middle layers are hidden layers. The layers are all connected, that is, any neuron of the ith layer is necessarily connected with any neuron of the (i + 1) th layer.
Although a DNN looks complex, the work of each layer is actually not complex, and is simply the following linear relational expression: y = α(W · x + b), where x is the input vector, y is the output vector, b is the offset (bias) vector, W is the weight matrix (also called coefficients), and α() is the activation function. Each layer simply performs this operation on the input vector x to obtain the output vector y. Because a DNN has a large number of layers, the number of coefficients W and offset vectors b is also large. These parameters are defined in the DNN as follows. Taking the coefficient W as an example, assume that in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as W^3_{24}, where the superscript 3 represents the layer at which the coefficient W is located, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer.
In summary, the coefficient from the kth neuron at layer (L-1) to the jth neuron at layer L is defined as W^L_{jk}. Note that the input layer has no W parameter. In a deep neural network, more hidden layers enable the network to better depict complex situations in the real world. Theoretically, a model with more parameters has higher complexity and a larger "capacity", which means that it can complete more complex learning tasks. Training the deep neural network is the process of learning the weight matrices, and its final goal is to obtain the weight matrices (formed by the vectors W of many layers) of all layers of the trained deep neural network.
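As a sketch only, the per-layer computation and the W^L_{jk} indexing convention can be written as follows (Python/NumPy, with a ReLU standing in for the activation α):

import numpy as np

def dnn_layer(x, W, b):
    # One DNN layer: y = alpha(W @ x + b). Entry W[j, k] plays the role of W^L_{jk},
    # i.e. the coefficient from the kth neuron of layer L-1 to the jth neuron of layer L.
    return np.maximum(0.0, W @ x + b)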
(3) Convolutional neural network
A Convolutional Neural Network (CNN) is a deep neural network with a convolutional structure. The convolutional neural network comprises a feature extractor consisting of convolutional layers and sub-sampling layers, which can be regarded as a filter. The convolutional layer is a neuron layer for performing convolutional processing on an input signal in a convolutional neural network. In convolutional layers of convolutional neural networks, one neuron may be connected to only a portion of the neighbor neurons. In a convolutional layer, there are usually several characteristic planes, and each characteristic plane may be composed of several neural units arranged in a rectangular shape. The neural units of the same feature plane share weights, where the shared weights are convolution kernels. Sharing weights may be understood as the way image information is extracted is location independent. The convolution kernel can be initialized in the form of a matrix of random size, and can be learned to obtain reasonable weights in the training process of the convolutional neural network. In addition, sharing weights brings the direct benefit of reducing connections between layers of the convolutional neural network, while reducing the risk of overfitting.
(4) Loss function
In the process of training a deep neural network, because the output of the deep neural network is expected to be as close as possible to the value that is really expected to be predicted, the weight vector of each layer of the neural network can be updated according to the difference between the predicted value of the current network and the really expected target value (of course, an initialization process is usually performed before the first update, that is, parameters are preconfigured for each layer in the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to lower the prediction, and the adjustment continues until the deep neural network can predict the really expected target value or a value very close to it. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is the purpose of the loss function (loss function) or objective function (objective function), which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the deep neural network becomes a process of reducing this loss as much as possible.
(5) Back propagation algorithm
The neural network can adopt a Back Propagation (BP) algorithm to correct the size of parameters in the initial neural network model in the training process, so that the reconstruction error loss of the neural network model is smaller and smaller. Specifically, the error loss is generated by transmitting the input signal in the forward direction until the output, and the parameters in the initial neural network model are updated by reversely propagating the error loss information, so that the error loss is converged. The back propagation algorithm is a back propagation motion with error loss as a dominant factor, aiming at obtaining the optimal parameters of the neural network model, such as a weight matrix.
In the embodiment of the application, not only the image segmentation operation is performed, but also a step of extracting adaptive data of a plurality of image blocks is added between the image segmentation and the encoding neural network, and the adaptive data can be a mean value, a mean square error and the like. The extracted adaptive data is used for preprocessing the image blocks. And the coding neural network performs feature extraction on the preprocessed multiple image blocks. The adaptation data is used to compensate the reconstructed tile in addition to pre-processing the tile. For convenience of description, the image processing method in the embodiment of the present application will be described in detail below by taking the adaptive data as an average value and the application scenario as the third application scenario. In a third application scenario, the cloud device is a coding end, and the terminal device is a decoding end.
As an example, the terminal device may be a mobile phone, a tablet, a notebook, a smart wearable device, and the like. As another example, the terminal device may be a Virtual Reality (VR) device. As another example, the embodiment of the present application may also be applied to intelligent monitoring, where a camera may be configured in the intelligent monitoring, and the intelligent monitoring may obtain a picture to be compressed through the camera, and it should be understood that the embodiment of the present application may also be applied to other scenes that need to be subjected to image compression, and no other scenes are listed here.
Referring to fig. 3a, fig. 3a is a schematic flowchart of an image processing method according to an embodiment of the present disclosure.
In step 301, the terminal device acquires a first image.
The terminal device may acquire a first image, where the first image may be a picture taken by an image pickup component or a frame of picture taken from a taken video, and the terminal device includes the image pickup component, and the image pickup component is typically a camera. The first image may also be an image obtained by the terminal device from a network, or an image obtained by the terminal device using a screen capture tool.
In particular, regarding the image processing method in the embodiment of the present application, reference may also be made to fig. 3b, where fig. 3b is another schematic flow chart of the image processing method provided in the embodiment of the present application. Fig. 3b illustrates the overall flow of outputting the third image from the first image.
In step 302, the terminal device sends a first image to the cloud device.
Before the terminal device sends the first image to the cloud device, the terminal device may perform lossless encoding on the first image to obtain encoded data. The encoding method may be entropy encoding, or other lossless compression methods.
In step 303, the cloud device segments the first image to obtain N first image blocks.
The cloud device can receive a first image sent by the terminal device. If the first image is subjected to lossless encoding by the terminal device, the cloud device also needs to perform lossless decoding on the first image. The cloud device divides the first image to obtain N first image blocks, wherein N is an integer larger than 1. Fig. 4 is a schematic diagram of a segmented and combined image in an embodiment of the present application. As shown in fig. 4, the first image 401 is divided into 12 first tiles. Wherein, under the condition that the size of the first image is determined, the size of the first image block determines the value of N. N is 12, which is just an example, and the size of the first block will be described in detail in the following description.
Optionally, each of the N first tiles is the same size.
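One possible way to segment the first image into N first tiles of equal size is sketched below (Python/NumPy; it assumes the image dimensions are already integer multiples of the tile size, and the function name is illustrative):

import numpy as np

def split_into_tiles(image, a, b):
    # Split a single-channel H x W image into tiles of b rows by a columns, in raster order.
    h, w = image.shape
    assert h % b == 0 and w % a == 0, "image must divide into an integer number of tiles"
    tiles = []
    for y in range(0, h, b):
        for x in range(0, w, a):
            tiles.append(image[y:y + b, x:x + a])
    return tiles   # N = (h // b) * (w // a) first tiles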
In step 304, the cloud device obtains M first mean values from the N first tiles.
If the first image is a three-channel image, the first image block includes data of three channels, and the number M of the first average values acquired by the cloud device is equal to 3N. If the first image is a grayscale image, that is, a channel image, the first image block includes data of one channel, and the number M of the first average values acquired by the cloud device is equal to N. Because the processing manner of each channel is similar, for convenience of description, in the embodiment of the present application, only one channel is taken as an example for description. The average value is an average value of pixel values of all the pixels in the first image block.
In step 305, the cloud device preprocesses the N first tiles with the N first average values.
The preprocessing may be to subtract the average value from the pixel value of each pixel point in the first tile to obtain N preprocessed first tiles.
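Steps 304 and 305 can be sketched as follows for single-channel tiles (for a three-channel image the mean would be computed per channel, giving M = 3N values); names are illustrative:

import numpy as np

def preprocess_tiles(tiles):
    # Compute one mean per tile (the first adaptive data) and subtract it from every pixel of that tile.
    means = [float(np.mean(t)) for t in tiles]
    pre = [t.astype(np.float32) - m for t, m in zip(tiles, means)]
    return pre, means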
In step 306, the preprocessed N first tiles are processed by the encoding neural network to obtain N sets of first feature maps.
In this embodiment of the application, optionally, the encoding neural network is a CNN, and the cloud device may perform feature extraction on the N preprocessed first image blocks based on the CNN to obtain N groups of first feature maps. Each group of first feature maps corresponds to one first image block, and each group of first feature maps includes at least one feature map. Hereinafter, the first feature map may also be referred to as a channel feature map image, where each semantic channel corresponds to one first feature map.
In the embodiment of the present application, referring to fig. 5, fig. 5 is a schematic diagram of an image encoding process based on CNN in the embodiment of the present application, and fig. 5 shows a first tile 501, a CNN502, a channel feature map 503, and a set of first feature maps 504, where the CNN502 may include multiple CNN layers.
For example, the CNN502 may multiply the upper-left 3 × 3 pixels of the input data (first tile) by weights and map them to neurons at the upper-left end of the first feature map. The weight to be multiplied will also be 3 x 3. Thereafter, in the same process, the CNN502 scans the input data (first tile) one by one from left to right and top to bottom, and multiplies by the weights to map the neurons of the feature map. The 3 × 3 weights used are referred to herein as filters or filter kernels. That is, the process of applying a filter in the CNN502 is a process of performing a convolution operation using a filter kernel, and the extracted result is referred to as a "channel feature map", where the channel feature map may also be referred to as a multichannel feature map image, and the term "multichannel feature map image" may refer to a feature map image set corresponding to a plurality of channels. According to an embodiment, a channel feature map may be generated by CNN502, which CNN502 is also referred to as a "feature extraction layer" or "convolutional layer" of the CNN. The layers of CNN may define the mapping of outputs to inputs. The mapping defined by the layers is performed as one or more filter kernels (convolution kernels) to be applied to the input data to generate a channel profile to be output to the next layer. The input data may be the first tile or the channel profile output by CNN 502.
Referring to fig. 5, during forward execution, the CNN502 receives the first tile 501 as an input and generates the channel feature map 503. During forward execution, the next CNN layer then receives the channel feature map 503 as an input and generates the channel feature map of the next layer as an output. Each subsequent layer likewise receives the channel feature map generated by the previous layer and generates the next channel feature map as an output. Finally, the set of first feature maps 504 generated at the (X1)th layer is obtained, where X1 is an integer greater than 1; that is, the channel feature maps of any of the layers described above may serve as the set of first feature maps 504.
The cloud device repeats the above operations for each first image block, so as to obtain N groups of first feature maps.
Optionally, as the CNN502 level increases, the length and width of each feature map in the multi-channel feature map image gradually decrease, and the number of semantic channels of the multi-channel feature map image gradually increases, so as to implement data compression on the first tile.
Meanwhile, other processing operations may be performed in addition to the operation of applying a convolution kernel that maps the input feature map to the output feature map. Examples of other processing operations may include, but are not limited to, applications such as activation functions, pooling, resampling, and the like.
For example, as shown in fig. 3b, optionally, each convolutional layer is followed by a GDN (generalized divisive normalization) activation function, which can be expressed as:
v_j^(i) = u_j^(i) / sqrt( β_j + Σ_k γ_jk · (u_k^(i))^2 ),
where u_j^(i) represents the jth channel of the ith convolutional layer output, v_j^(i) denotes the corresponding activation function output, and β and γ are trainable parameters of the activation function, used to enhance the nonlinear expression capability of the neural network.
It should be noted that the above is only one implementation of feature extraction for the first image block, and in practical applications, the specific implementation of feature extraction is not limited.
In the embodiment of the present application, the first tile is transformed to another space (at least one first feature map) by the CNN convolutional neural network in the above manner. Optionally, the number of the first feature maps is 192, that is, the number of the semantic channels is 192, and each semantic channel corresponds to one first feature map. In this embodiment, at least one first feature map may be in the form of a three-dimensional tensor, and the size of the tensor may be 192 × w × h, where w × h is the width and length of the matrix corresponding to the first feature map of a single channel.
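A minimal PyTorch-style sketch of such an encoding network is given below; the layer count, kernel sizes, strides, and the simplified GDN module are assumptions made only for illustration and are not the exact network of this embodiment.

import torch
import torch.nn as nn

class SimpleGDN(nn.Module):
    # Simplified GDN: v_j = u_j / sqrt(beta_j + sum_k gamma_jk * u_k^2)
    def __init__(self, channels):
        super().__init__()
        self.beta = nn.Parameter(torch.ones(channels))
        self.gamma = nn.Parameter(0.1 * torch.eye(channels))
    def forward(self, u):
        # Combine the squared channels with the gamma matrix via a 1x1 convolution.
        norm = nn.functional.conv2d(u * u, self.gamma.view(*self.gamma.shape, 1, 1), bias=self.beta)
        return u / torch.sqrt(norm)

class TileEncoder(nn.Module):
    # Stride-2 convolutions shrink the width/height while the channel count grows to 192.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, 5, stride=2, padding=2), SimpleGDN(64),
            nn.Conv2d(64, 128, 5, stride=2, padding=2), SimpleGDN(128),
            nn.Conv2d(128, 192, 5, stride=2, padding=2),
        )
    def forward(self, tile):
        return self.net(tile)   # a group of first feature maps of shape (B, 192, w, h)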
In step 307, the N sets of first feature maps are quantized and entropy encoded to obtain N first encoded representations.
In the embodiment of the application, after N groups of first feature maps are obtained by processing the N preprocessed first image blocks through the coding neural network, quantization and entropy coding can be performed on the N groups of processed first feature maps to obtain N first coded representations.
In the embodiment of the application, the N groups of first feature maps are mapped to quantization centers according to a specified rule so that entropy coding can be performed subsequently. The quantization operation may convert the N groups of first feature maps from floating-point numbers to a bitstream (e.g., a bitstream using integers of a particular bit width, such as 8-bit or 4-bit integers). In some embodiments, the quantization operation may be performed on the N groups of first feature maps using, but not limited to, rounding.
In the embodiment of the present application, an entropy estimation network may be used to obtain probability estimates of each point in the output features, and entropy encoding is performed on the output features by using the probability estimates to obtain binary code streams.
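The quantization step, and the way a probability estimate determines the code length that entropy coding can approach, can be sketched as follows (the probability model is a hypothetical placeholder; a real implementation would use an arithmetic or range coder):

import numpy as np

def quantize(feature_maps):
    # Round the floating-point feature maps to the nearest quantization centers (integers).
    return np.rint(feature_maps).astype(np.int32)

def ideal_code_length_bits(symbols, prob):
    # Entropy coding can approach -log2 p(symbol) bits per symbol under probability model `prob`.
    return float(sum(-np.log2(max(prob(s), 1e-12)) for s in symbols.ravel()))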
In step 308, the cloud device sends N first coded representations, N first mean values, and a corresponding relationship to the terminal device.
In the step 302, the terminal device stores the first image in the cloud device. If the terminal device needs to acquire the first image, a request can be sent to the cloud device. After the cloud device receives the request sent by the terminal device, the cloud device sends N first code representations, N first mean values and the corresponding relation to the terminal device. The correspondence relationship refers to a correspondence relationship between the N first coded representations and the N first mean values.
Optionally, the arrangement order of the N first coding representations is the same as the arrangement order of the N first tiles, the arrangement order of the N first tiles is the arrangement order of the N first tiles in the first image, and the correspondence relationship includes the arrangement order of the N first coding representations and the arrangement order of the N first tiles.
Optionally, before the cloud device sends the N first mean values to the terminal device, the cloud device quantizes the N first mean values to obtain the N first quantized mean values. For example, the pixel value of each pixel point in the first tile is represented by 8 bits, and the first average value of the first tile is a 32-bit floating point number. And quantizing the first average value by the cloud side equipment, wherein the bit number of the obtained first quantized average value is less than 32. The smaller the number of bits of the first quantized average value is, the smaller the information entropy of the first quantized average value is. Further, the number of bits of the first quantized average is equal to the number of bits of the pixel value of each pixel in the first block, that is, when the pixel value of each pixel in the first block is represented by 8 bits, the first quantized average is also represented by 8 bits.
Optionally, on the basis that the cloud device quantizes the N first mean values, the larger the value of N is, the smaller the information entropy of a single first quantized mean value is. The information entropy is used for describing the quantization degree of the N first mean values by the cloud side equipment. If the information entropy of a single first quantization mean value is smaller, the smaller the information entropy of the single first quantization mean value is, the higher the degree of quantization of the N first mean values by the cloud-side device is. When processing first images of the same pixels, the smaller the pixels of each first image, the larger N. The larger N is, the larger the data amount of the N first means is. For example, assuming that the pixels of the first image are 640 × 480, the pixels of the first tile are 320 × 480, N is 2, and each first quantization average is represented by 8 bits, the data amount of the N first quantization averages is 2 × 8 bits. Assuming that the pixels of the first tile are 1 × 1, N is 640 × 480, and each first quantization average is represented by 8 bits, the data amount of the N first quantization averages is 640 × 480 × 8 bits. The data amount of the first image is also 640 × 480 × 8 bits. It can be seen that the larger the value of N, the larger the data amount of N first average values, and when N is equal to the pixel size of the first image, the data amount of even the quantized first average value reaches the data amount of the first image. Therefore, the larger the value of N in the embodiment of the present application, the smaller the information entropy of the single first quantization mean.
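As a purely illustrative sketch of the quantization of the means discussed above (assuming 8-bit pixel values):

import numpy as np

def quantize_mean_to_8bit(mean):
    # Map a 32-bit floating-point mean of 8-bit pixel values onto an 8-bit integer.
    return int(np.clip(round(mean), 0, 255))

# Data volume of the N quantized means, as in the example above:
#   N = 2         -> 2 * 8 bits
#   N = 640 * 480 -> 640 * 480 * 8 bits, i.e. as large as the first image itself.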
In step 309, the terminal device entropy decodes the N first encoded representations to obtain N sets of second feature maps.
After the terminal device receives the N first coded representations sent by the cloud device, the terminal device performs entropy decoding on the N first coded representations to obtain N groups of second characteristic diagrams.
In step 310, the terminal device processes the N sets of second feature maps by decoding the neural network to obtain N first reconstruction tiles.
In this embodiment of the application, optionally, the decoding neural network is CNN, and the terminal device may reconstruct N groups of second feature maps based on CNN to obtain N groups of first image blocks. Each group of second feature maps corresponds to one first image block, and each group of second feature maps at least comprises one feature map. In the following, the second feature map may also be referred to as a reconstructed feature map image, wherein each semantic channel corresponds to one second feature map.
In this embodiment, referring to fig. 6, fig. 6 is a schematic diagram of an image decoding process based on CNN in this embodiment, and fig. 6 shows a set of second feature maps 601, a transposed CNN602, a reconstructed feature map 603, and a first reconstructed tile block 604. Where the transposed CNN602 may include multiple transposed CNN layers.
For example, the transposed CNN602 may multiply the top-left pixel of the input data (set of second feature maps 601) by a weight and map it to the upper-left neuron of the reconstructed feature map 603. The weight to be multiplied will be 3 x 3. Thereafter, in the same process, the transposed CNN602 scans the input data (a set of second feature maps 601) one by one from left to right and from top to bottom, and multiplies the weights to map the neurons of the feature maps. After the transposed CNN602 with the weight of 3 × 3, the length and width of the obtained reconstructed feature map 603 become 3 times of the second feature map. The 3 × 3 weights used are referred to herein as inverse filters or inverse filter kernels. That is, the process of applying the inverse filter in the transposed CNN602 is a process of performing an deconvolution operation using an inverse filter kernel, and the extracted result is referred to as a "reconstructed feature map". According to an embodiment, the reconstructed feature map may be generated by a transposed CNN602, the transposed CNN602 also being referred to as a transposed convolutional layer of the CNN. The layers of CNN may define the mapping of outputs to inputs. The mapping defined by the layers is performed as one or more anti-filter kernels (transposed convolutional layers) to be applied to the input data to generate a reconstructed feature map to be output to the next layer. The input data may be a set of second profiles or a reconstructed profile of a particular layer.
Referring to fig. 6, the transposed CNN602 receives a set of second feature maps 601 and generates the reconstructed feature map 603 as an output. The next transposed CNN layer receives the reconstructed feature map 603 as an input and generates the reconstructed feature map of the next layer as an output. Each subsequent transposed CNN layer likewise receives the reconstructed feature map generated by the previous layer and generates the next reconstructed feature map as an output. Finally, the first reconstruction tile 604 generated at the (X2)th layer is obtained, where X2 is an integer greater than 1; that is, the reconstructed feature maps of any of the layers described above may serve as the first reconstruction tile 604. The terminal device repeats the above operations on each group of second feature maps, so that N first reconstruction image blocks can be obtained.
Optionally, as the number of layers of the transposed CNN increases, the length and width of each feature map in the reconstructed feature map gradually increase until the size of the first tile before being input into the encoded neural network is restored. The number of semantic channels of the reconstructed feature map is gradually reduced until the semantic channel of the first tile block before being input into the encoding neural network is restored, when the first tile block is a single-channel image, the semantic channel of the first reconstructed tile block 604 is 1, and when the first tile block is a three-channel image, the semantic channel of the first reconstructed tile block 604 is 3. Through the reconstruction, the data decoding of the first image block is realized.
Meanwhile, other processing operations may be performed in addition to applying the operation of mapping each set of second feature maps to the transposed convolution kernel of the reconstructed feature map. Examples of other processing operations may include, but are not limited to, applications such as activation functions, pooling, resampling, and the like.
For example, as shown in fig. 3b, each transposed convolutional layer in the decoding neural network is followed by inverse generalized divisive normalization (iGDN), which is an approximate inverse of the GDN activation function at the encoding side and can be expressed as:
u_j^(i) = v_j^(i) · sqrt( β_j + Σ_k γ_jk · (v_k^(i))^2 ),
where v_j^(i) represents the jth channel of the ith transposed convolutional layer output, u_j^(i) denotes the corresponding activation function output, and β and γ are trainable parameters of the activation function, used to enhance the nonlinear expression capability of the neural network.
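Mirroring the encoder sketch given earlier, a minimal decoding network with transposed convolutions and a simplified iGDN could look as follows (layer sizes are assumptions; SimpleGDN refers to the module from the encoder sketch):

import torch
import torch.nn as nn

class SimpleIGDN(nn.Module):
    # Simplified iGDN: u_j = v_j * sqrt(beta_j + sum_k gamma_jk * v_k^2), an approximate inverse of GDN.
    def __init__(self, channels):
        super().__init__()
        self.beta = nn.Parameter(torch.ones(channels))
        self.gamma = nn.Parameter(0.1 * torch.eye(channels))
    def forward(self, v):
        norm = nn.functional.conv2d(v * v, self.gamma.view(*self.gamma.shape, 1, 1), bias=self.beta)
        return v * torch.sqrt(norm)

class TileDecoder(nn.Module):
    # Transposed convolutions restore the tile resolution; the channel count shrinks back to the image channels.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(192, 128, 5, stride=2, padding=2, output_padding=1), SimpleIGDN(128),
            nn.ConvTranspose2d(128, 64, 5, stride=2, padding=2, output_padding=1), SimpleIGDN(64),
            nn.ConvTranspose2d(64, 1, 5, stride=2, padding=2, output_padding=1),
        )
    def forward(self, feature_maps):
        return self.net(feature_maps)   # a first reconstruction tile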
In step 311, the terminal device compensates the N first reconstruction tiles by the N first averages.
In step 308, the cloud device sends the N first averages and the corresponding relationship to the terminal device. After N first reconstruction image blocks are obtained through the decoding neural network, the terminal device compensates the N first reconstruction image blocks by using the N first average values through the corresponding relation. The compensation means that the pixel value of each pixel point in the first reconstruction image block is added with the first average value to obtain the compensated first reconstruction image block. After the terminal device repeatedly performs compensation on the N first reconstruction blocks, the compensated N first reconstruction blocks can be obtained.
Optionally, when the terminal device receives the N first quantized average values from the cloud device, the terminal device compensates the N first reconstruction tiles with the N first quantized average values. It should be noted that when the terminal device compensates the N first reconstruction tiles with the N first quantized average values, the cloud device also preprocesses the N first tiles with the N first quantized average values.
In step 312, the terminal device combines the compensated N first reconstruction tiles to obtain a second image.
Referring to fig. 4, the combination is an inverse process of the division, the N first blocks are replaced by N first reconstructed blocks, and then the N first reconstructed blocks are combined.
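Steps 311 and 312 (compensation and combination) are the inverse of the earlier splitting sketch and can be written as follows (illustrative names):

import numpy as np

def compensate_and_merge(recon_tiles, means, image_h, image_w, a, b):
    # Add each tile's mean back (compensation), then place the tiles into the second image in raster order.
    second_image = np.zeros((image_h, image_w), dtype=np.float32)
    idx = 0
    for y in range(0, image_h, b):
        for x in range(0, image_w, a):
            second_image[y:y + b, x:x + a] = recon_tiles[idx] + means[idx]
            idx += 1
    return second_image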
In step 313, the terminal device processes the second image through the fusion neural network to obtain a third image.
The embodiment of the application enhances the performance for each tile by highlighting the local characteristics of each first tile, but this also easily causes a blocking effect between the first reconstruction tiles. The blocking effect is the defect that discontinuities appear at the boundaries between first reconstruction tiles when they are combined into the reconstructed image. Processing the second image through the fusion neural network can reduce the influence of the blocking effect and improve the image quality.
Optionally, the fusion neural network is a CNN. Referring to fig. 5 and fig. 6, in terms of CNN structure, the fusion neural network may be a combination of an encoding neural network and a decoding neural network: the output 504 in fig. 5 is taken as the input 601 in fig. 6, the second image is taken as the input 501 in fig. 5, and the output in fig. 6 is the third image. Through the fusion neural network, blocking artifacts in the second image can be eliminated. It should be noted that this merely illustrates one possible framework of the fusion neural network; in practical applications, the framework of the fusion neural network, such as the number of CNN layers, the number of transposed CNN layers, and the size of the matrix of each CNN layer, need not be related to the encoding neural network or the decoding neural network.
Optionally, as shown in fig. 3b, each convolution kernel of the fusion neural network is followed by a rectified linear unit (ReLU) layer, and the ReLU is used to clip the negative values in the feature map output by the convolution kernel to zero.
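A sketch of one possible fusion network, following the description above of convolutional layers followed by ReLU (the depth, channel counts, and the residual connection are assumptions for illustration):

import torch.nn as nn

class FusionNet(nn.Module):
    # Maps the combined second image to a third image with a reduced blocking effect.
    def __init__(self, channels=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, channels, 3, padding=1),
        )
    def forward(self, second_image):
        # Predict a correction and add it back, so the network mainly has to remove the block boundaries.
        return second_image + self.net(second_image)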
The above description has been made on the flow of processing one image by using the image processing method in the embodiment of the present application, and optionally, the image processing method in the embodiment of the present application may process images with different sizes, for example, a fourth image, and pixels of the fourth image are different from pixels of the first image. The process of processing the fourth image by using the image processing method in the embodiment of the present application is similar to the process of processing the first image, and details are not repeated here. In particular, when the fourth image is processed by using the image processing method in the embodiment of the present application, the cloud device segments the fourth image, and may obtain M second tiles. The size of the second image block is the same as that of the first image block. In the case that the sizes of the first and second blocks are the same, the first and second blocks are processed using the same encoding and decoding neural networks, and the number of convolution operations in the processing flow and the number of data participating in each convolution operation of the first and second blocks are the same. In this case, the corresponding convolution operation unit may be designed according to the number of times of convolution operation and/or the number of data participating in convolution operation each time, so that the convolution operation unit matches with the processing flow. Since the number of convolution operations in the processing flow and the number of data participating in convolution operations at each time are determined by the size of the first tile block and the CNN, it can also be considered that the convolution operation unit matches the first tile block, or the convolution operation unit matches the encoding neural network and/or the decoding neural network. The higher the matching degree of the convolution operation unit and the first image block is, the smaller the number of idle multipliers and adders in the convolution operation unit in the processing flow is, namely, the higher the use efficiency of the convolution operation unit is.
The image processing method in the embodiment of the present application has been described above. In the above flow, the size of the first tile not only affects the value of N, but also affects whether the image can be divided into exactly an integer number of tiles. The size of the first tile is generally determined from the following two aspects. The first aspect is the influence of the model on the size of the first tile, where the model includes the encoding neural network and the decoding neural network, and possibly the fusion neural network. This influence generally includes the influence during model training and the influence when the model is used. The influence during model training consists in training the model with tiles of different sizes and determining in which interval, or at which value, of the tile size the model converges faster, the image quality output by the model is higher, or the compression performance of the model is higher. Tiles of different sizes correspond to different models, and different models may perform differently in different scenarios, which is the generalization problem of the models. The influence on the size of the first tile when using the model includes this generalization problem. The second aspect is whether the image can be divided into exactly an integer number of tiles. If the image cannot be divided into an integer number of tiles, some tiles are incomplete, which affects the reconstruction of the tiles by the model and reduces the image quality. In order to reduce the influence of the second aspect on the image quality, some related technical solutions are proposed below.
In step 302, the terminal device sends the first image to the cloud device. In this scenario, the encoding neural network in the cloud device may serve this terminal device exclusively, or terminal devices of this type. If the terminal device includes an image capturing component, such as a camera, it is desirable that the first image obtained by the terminal device through the camera can be divided into an integer number of tiles by the cloud device. Assuming that the pixels of the first tile are a × b, where a is the number of pixels in the width direction and b is the number of pixels in the height direction, a and b are obtained from the target pixel c × d such that c/a is equal to an integer and d/b is equal to an integer. The target pixel is derived from a target resolution of the terminal device, the target resolution being a default resolution of the image capturing component of the terminal device or a resolution set for it on the terminal device. The pixel of an image obtained by the image capturing component of the terminal device under the setting of the target resolution is the target pixel, and the first image is obtained from the image capturing component. In particular, when the terminal device is the encoding side, as in the first application scenario described above, it is more meaningful to determine the size of the first tile from the target resolution, because the target resolution indicates the pixels of images that the terminal device may acquire in the future, that is, the pixels of images that the encoding side will process with the image processing method in the embodiment of the present application; the model may therefore be trained with images of the target pixel.
Alternatively, the target resolution is set through a setting interface for the resolution of the image capturing component in the camera application. The setting interface of the camera application can set the resolution at which the image capturing component shoots, and the resolution selected in the setting interface is taken as the target resolution. Referring to fig. 7, fig. 7 is a schematic diagram of a setting interface for setting the resolution of the camera of the terminal device in the embodiment of the present application. In the setting interface, the resolution option 701 of [4:3] 10MP is selected; although this option does not specify a particular value of the target resolution, the first image obtained by shooting shows that its pixels are 2736 × 3648, that is, the target pixel is 2736 × 3648. The size of the first tile is then determined from the target pixel such that 2736/a is equal to an integer and 3648/b is equal to an integer.
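As a simple sketch of this check, the code below tests whether a candidate tile size a × b divides a target pixel c × d evenly; the 2736 × 3648 value is the target pixel from the example above, while the candidate tile sizes are assumptions for illustration.

```python
def divides_evenly(c: int, d: int, a: int, b: int) -> bool:
    """True if a target image of c x d pixels splits into an integer number
    of a x b tiles, i.e. c/a and d/b are both integers."""
    return c % a == 0 and d % b == 0

target_c, target_d = 2736, 3648                      # target pixel from the example above
for a, b in [(171, 152), (144, 144), (152, 152)]:    # hypothetical candidate tile sizes
    print((a, b), divides_evenly(target_c, target_d, a, b))
# (171, 152) True, (144, 144) False, (152, 152) True
```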
Alternatively, the target resolution is obtained from a target image group in a gallery obtained by the image capturing component, the pixel of the target image group is the target pixel, and the target image group accounts for the largest proportion among the image groups of different pixels in the gallery. The gallery obtained by the encoding side through the image capturing component includes image groups of different pixels. For example, as shown in fig. 7, the terminal corresponding to the setting interface can obtain images of 4 different pixels through the camera. It is determined which pixel accounts for the largest proportion of images in the gallery of the camera of the terminal device, and it can then be guaranteed that images of that pixel can be divided into exactly an integer number of tiles.
Alternatively, images of a plurality of pixels are obtained by the image capturing component, the plurality of pixels being e × f, with e/a equal to an integer and f/b equal to an integer, where e includes c and f includes d. The terminal device can obtain images of different pixels through the image capturing component, and e × f is the set of pixels of the images of the different pixels. For example, as shown in fig. 7, the terminal corresponding to the setting interface can obtain images of 4 different pixels through the camera. If e/a is equal to an integer and f/b is equal to an integer, images of these 4 pixels can all be divided into an integer number of tiles.
Optionally, the e × f further includes pixels obtained by the terminal device through screen capture.
Alternatively, the plurality of pixels are set by the resolution of the image pickup means through a setting interface in the image pickup application.
The above describes schemes that try to ensure that the first image can be divided into an integer number of tiles, but in practical applications there are images that cannot be divided into an integer number of tiles. In this case, in order to improve the compatibility of the model, the first image needs to be padded, as described below.
Optionally, the pixels of the first tile are a × b, a is the number of pixels in the width direction, b is the number of pixels in the height direction, and the pixels of the first image are r × t. After acquiring the first image and before segmenting the first image, the method further comprises: if r/a is not equal to an integer, and/or t/b is not equal to an integer, filling the edges of the first image with the pixel median value such that r1/a is equal to an integer and t1/b is equal to an integer, where r1 × t1 are the pixels of the padded first image. The pixel median value refers to the median of the value range of a single pixel point of the first image; for example, when one pixel point of the first image is represented by 8 bits, the pixel median value is 128. By filling the edges of the image with the pixel median value, the compatibility of the model is improved while the influence on the image quality is reduced.
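A minimal sketch of this padding step is given below, assuming an 8-bit image held as a NumPy array so that the pixel median value is 128; it pads the bottom and right edges up to the next multiple of the tile size (how the padding is distributed between sides is refined in the variants that follow).

```python
import numpy as np

def pad_to_multiple(img: np.ndarray, a: int, b: int, median: int = 128) -> np.ndarray:
    """Pad an (height, width, channels) 8-bit image with the pixel median value so
    that width becomes a multiple of a and height a multiple of b."""
    h, w = img.shape[:2]
    pad_h = (-h) % b                      # extra rows needed in the height direction
    pad_w = (-w) % a                      # extra columns needed in the width direction
    return np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)),
                  mode="constant", constant_values=median)

# Usage with assumed sizes: a 700x500 image (r x t = 700 x 500) and 128x128 tiles.
first_image = np.zeros((500, 700, 3), dtype=np.uint8)
padded = pad_to_multiple(first_image, a=128, b=128)
print(padded.shape)   # (512, 768, 3): r1/a and t1/b are now integers
```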
Optionally, before filling the edges of the first image, the method further comprises: if t/b is not equal to an integer, magnifying r and t in equal proportion to obtain the first image with pixels r2 × t2, where t2/b is equal to an integer. The number of tiles that are filled with the pixel median value affects the image quality, and magnifying the image in equal proportion reduces the number of tiles that have to be filled with the pixel median value, thereby improving the image quality. As shown in fig. 8, fig. 8 is a schematic flow chart of image filling in the embodiment of the present application. In 8a of fig. 8, the pixels of the first tile are a × b and the pixels of the first image are r × t; after r and t are scaled up, the pixels of the first image are r2 × t2, as shown in 8b of fig. 8. Before magnification, as shown in 8a of fig. 8, the number of tiles to be filled is 6; after magnification, as shown in 8b of fig. 8, the number of tiles to be filled is 4, so the number of tiles to be filled is reduced. In particular, if the corresponding ratio is not equal to an integer, r and t are magnified in equal proportion so that it becomes equal to an integer, and the number of tiles that need to be filled is reduced to 2.
Alternatively, after r and t are scaled up, if r2/a is not equal to an integer, the remainder g of r2/a is obtained. If the remainder is greater than a/2, the pixel median values are filled only on one side of the first image in the width direction. Filling the pixel median value on only one side of the image further reduces the number of tiles filled with the median value while reducing the influence of the filling on each tile, thereby improving the image quality. As shown in 8b of fig. 8, the remainder g is greater than a/2, so, as shown in 8c of fig. 8, the pixel median value is filled on one side of the first image.
Alternatively, if the remainder is less than a/2, the pixel median values are filled on both sides of the first image in the width direction, so that the width of the pixel median values filled on each side is (a - g)/2, where g is the remainder. This reduces the influence of the filling on the tiles and improves the image quality. As shown in fig. 9, fig. 9 is another schematic flow chart of image filling in the embodiment of the present application. If the remainder g is less than a/2, the pixel median values are filled on both sides of the first image, the width of the filled median values on each side being (a - g)/2.
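The side-selection rule described above can be sketched as follows, again assuming an 8-bit NumPy image: when the width remainder g of r2/a exceeds a/2 the missing columns are filled on one side only, otherwise the fill of a - g columns is split between the two sides.

```python
import numpy as np

def pad_width_by_remainder(img: np.ndarray, a: int, median: int = 128) -> np.ndarray:
    """Pad the width of an (h, w, c) image with the pixel median so that w becomes a
    multiple of the tile width a, choosing the sides from the remainder g."""
    h, w = img.shape[:2]
    g = w % a
    if g == 0:
        return img
    fill = a - g
    if g > a / 2:
        left, right = 0, fill            # remainder large: pad one side only
    else:
        left = fill // 2                 # remainder small: split (a - g) between both sides
        right = fill - left
    return np.pad(img, ((0, 0), (left, right), (0, 0)),
                  mode="constant", constant_values=median)

# Usage with assumed sizes: width 700 and tile width 128 give g = 60 < 64, so both sides are padded.
print(pad_width_by_remainder(np.zeros((512, 700, 3), dtype=np.uint8), a=128).shape)  # (512, 768, 3)
```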
According to the image processing method, the first image is segmented to obtain N first tiles, a respective mean value is obtained from each first tile, and the mean values are then used to compensate the first reconstructed tiles, so as to highlight the local characteristics of the first image. In particular, the N first tiles include a first target tile whose range of pixel values is smaller than the range of pixel values of the first image. Before obtaining the N first adaptive data from the N first tiles, the method further comprises: the cloud device inverse quantizes the pixel values of the first target tile, and then acquires the corresponding first adaptive data from the inversely quantized first target tile. Inverse quantizing the pixel values of the first target tile further highlights the local characteristics of the first image. Any first tile can be understood as a local characteristic of the first image, and by highlighting the local characteristics of the first image, the reconstruction quality of the image, that is, the compression quality of the image, can be improved.

As shown in fig. 10, fig. 10 is a comparison diagram of image compression quality in the embodiment of the present application. The abscissa is the number of bits per pixel (BPP), which measures the code rate; the ordinate is the peak signal-to-noise ratio (PSNR), which measures quality. The compression algorithms compared with the image processing method in the embodiment of the present application include different implementations of the JPEG2000, HEVC (High Efficiency Video Coding) and VVC (Versatile Video Coding) standards. For JPEG2000, the reference software OpenJPEG is used to express its compression performance, and the implementation integrated in Matlab complements the JPEG2000 compression performance. For HEVC, the reference software HM-16.15 is used to reflect the rate-distortion (RD) performance. The performance of the VVC standard is expressed using the VVC reference software VTM-6.2. It should be noted that in the encoding configuration of VTM-6.2, the input image bit depth and the internal computation bit depth are set to 8 to be compatible with the format of the input images, and the test images are encoded using the all-intra (AI) configuration. The rate-distortion performance of each compression algorithm is shown in fig. 10: the rate-distortion curve of OpenJPEG is 1001, the curve of the Matlab implementation of JPEG2000 is 1002, the curve of the reference software HM-16.15 compressing the 420 image format is 1003, the curve of a non-tiled convolutional neural network image compression algorithm is 1004, the curve of the present invention is 1005, and the curve of the reference software VTM-6.2 compressing the 420 image format is 1006.
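As a hedged illustration of the mean-based adaptive data and the inverse quantization of the first target tile mentioned at the start of the preceding paragraph, the sketch below extracts a per-tile mean as the first adaptive data, subtracts it before encoding, and adds it back to the reconstructed tile; the linear stretch used for the low-dynamic-range tile is one possible reading of the inverse quantization step, not the prescribed one.

```python
import numpy as np

def preprocess_tile(tile: np.ndarray):
    """Return the zero-mean tile and its mean (used here as the first adaptive data)."""
    mean = float(tile.mean())
    return tile.astype(np.float32) - mean, mean

def compensate_tile(reconstructed: np.ndarray, mean: float) -> np.ndarray:
    """Add the adaptive data back to the first reconstructed tile."""
    return reconstructed + mean

def stretch_low_range_tile(tile: np.ndarray, full_range: float = 255.0) -> np.ndarray:
    """One possible 'inverse quantization' of a tile whose pixel-value range is
    smaller than that of the first image: stretch it to the full value range."""
    lo, hi = float(tile.min()), float(tile.max())
    if hi == lo:
        return tile.astype(np.float32)
    return (tile.astype(np.float32) - lo) * (full_range / (hi - lo))

# Usage with a hypothetical narrow-range tile.
tile = np.random.randint(100, 140, size=(128, 128), dtype=np.uint8)
centered, adaptive = preprocess_tile(stretch_low_range_tile(tile))
```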
The image processing method in the embodiment of the present application is described above, and the image processing system in the embodiment of the present application is described below.
Referring to fig. 11, fig. 11 is a system architecture diagram of an image processing system according to an embodiment of the present disclosure, in fig. 11, an image processing system 200 includes an execution device 210, a training device 220, a database 230, a client device 240, and a data storage system 250, and the execution device 210 includes a calculation module 211.
The database 230 stores a first image set, and optionally, the database 230 further includes a fourth image set. The training device 220 generates the target model/rule 201 for processing the first image and/or the fourth image, and iteratively trains the target model/rule 201 by using the first image and/or the fourth image in the database to obtain a mature target model/rule 201. In the embodiment of the present application, the target model/rule 201 includes an encoding neural network and a decoding neural network, and optionally, the target model/rule 201 further includes a fusion neural network.
The encoded neural network and the decoded neural network obtained by the training device 220 can be applied to different systems or devices, such as a mobile phone, a tablet, a notebook computer, a VR device, a monitoring system, and so on. The execution device 210 may call data, codes, and the like in the data storage system 250, or store data, instructions, and the like in the data storage system 250. The data storage system 250 may be disposed in the execution device 210 or the data storage system 250 may be an external memory with respect to the execution device 210.
The calculation module 211 receives a first image sent by the client device 240, segments the first image to obtain N first tiles, extracts N first adaptive data from the N first tiles, preprocesses the N first tiles with the N first adaptive data, performs feature extraction on the preprocessed N first tiles with an encoding neural network to obtain N groups of first feature maps, and quantizes and entropy encodes the obtained N groups of first feature maps to obtain N first encoded representations, where N is an integer greater than 1.
The calculation module 211 may further perform entropy decoding on the N first encoded representations to obtain N groups of second feature maps, and then process the N groups of second feature maps through a decoding neural network to obtain N first reconstructed tiles. After the N first reconstructed tiles are obtained, the N first adaptive data are used to compensate them. The calculation module 211 combines the N first reconstructed tiles to obtain a second image. Optionally, when the target model/rule 201 further includes a fusion neural network, the calculation module 211 may further process the second image using the fusion neural network to obtain a third image, where the fusion neural network is used to reduce the difference between the second image and the first image, the difference including blocking artifacts.
In some embodiments of the present application, referring to fig. 11, the execution device 210 and the client device 240 may be independent devices; the execution device 210 is configured with the I/O interface 212 to perform data interaction with the client device 240, the "user" may input the first image to the I/O interface 212 through the client device 240, and the execution device 210 returns the second image to the client device 240 through the I/O interface 212 to provide it to the user. In addition, the relationship between the client device 240 and the execution device 210 can be described in terms of the relationship between the terminal device and the encoding side and the decoding side. The encoding side is a device using the encoding neural network, the decoding side is a device using the decoding neural network, and the encoding side and the decoding side may be the same device or independent devices. The terminal device may be the encoding side and/or the decoding side, similar to the terminal device in the image processing method described above. To facilitate understanding of the relationship between the client device 240 and the execution device 210, reference may be made to the related description of fig. 2a to 2c above.
It should be noted that fig. 11 is only a schematic structural diagram of an image processing system according to an embodiment of the present invention, and the positional relationship between the devices, modules, and the like shown in the figure does not constitute any limitation. For example, in other embodiments of the present application, the execution device 210 may be configured in the client device 240; for example, when the client device is a mobile phone or a tablet, the execution device 210 may be a module for image processing in the host processor (Host CPU) of the mobile phone or tablet, and the execution device 210 may also be a graphics processing unit (GPU) or a neural network processor (NPU) in the mobile phone or tablet, where the GPU or NPU is mounted on the host processor as a coprocessor and is assigned tasks by the host processor.
With reference to the above description, a specific implementation flow of the training phase of the image processing method provided in the embodiment of the present application is described below.
Specifically, referring to fig. 12, fig. 12 is a schematic flowchart of a model training method provided in the embodiment of the present application, where the model training method provided in the embodiment of the present application may include:
in step 1201, the training device acquires a first image.
In step 1202, the training apparatus segments the first image to obtain N first tiles, where N is an integer greater than 1.
In step 1203, the training device obtains N first adaptive data from the N first tiles, where the N first adaptive data are in one-to-one correspondence with the N first tiles.
In step 1204, the training device preprocesses the N first tiles according to the N first adaptive data.
In step 1205, the training device processes the N preprocessed first image blocks through the first coding neural network to obtain N groups of first feature maps.
In step 1206, the training device quantizes and entropy encodes the N sets of first feature maps to obtain N first encoded representations.
In step 1207, the training device performs entropy decoding on the N first encoded representations to obtain N sets of second feature maps.
In step 1208, the training apparatus processes the N sets of second feature maps through the first decoding neural network to obtain N first reconstruction blocks.
In step 1209, the training device compensates the N first reconstructed tiles with the N first adaptive data.
In step 1210, the training apparatus combines the compensated N first reconstructed tiles to obtain a second image.
In step 1211, the training device obtains a distortion loss of the second image relative to the first image.
In step 1212, the training device jointly trains a model using a loss function until an image distortion value between the first image and the second image reaches a first preset degree, where the model includes the first encoding neural network, a quantization network, an entropy encoding network, an entropy decoding network, and the first decoding neural network.
Referring to fig. 13, fig. 13 is a schematic diagram of a training process according to an embodiment of the present application. The loss function of the model in this example is:

loss = l_d + P × l_r

In the above loss function, l_d represents the information entropy of the first encoded representation, and P × l_r represents a distortion measure between the first image and the second image, where l_r represents the distortion loss between the first image and the second image and P represents a balance factor between the two loss terms, which trades off the size of the first encoded representation against the quality of the reconstructed image.
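For readers who prefer code, here is a minimal sketch of this rate-distortion objective, assuming PyTorch; the differentiable rate estimate and the use of mean squared error as the distortion measure are assumptions, and the default value of the balance factor P is arbitrary.

```python
import torch

def rd_loss(rate_bits: torch.Tensor, first_image: torch.Tensor,
            second_image: torch.Tensor, P: float = 0.01) -> torch.Tensor:
    """loss = l_d + P * l_r, where l_d is the (estimated) information entropy of the
    first encoded representation and l_r is the distortion between the images."""
    l_d = rate_bits.mean()                               # rate term from an entropy model
    l_r = torch.mean((first_image - second_image) ** 2)  # distortion term (MSE assumed)
    return l_d + P * l_r

# Typical training step (model, optimizer and rate estimate are assumed to exist):
#   rate_bits, second_image = model(first_image)
#   loss = rd_loss(rate_bits, first_image, second_image)
#   loss.backward(); optimizer.step()
```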
Optionally, in order to obtain a suitable tile size, the training process includes segmenting the first image into tiles of different sizes in multiple rounds of iterative training, that is, N takes different values. The loss functions obtained in the multiple rounds are compared to optimize the size of the first tile.
In step 1213, the training apparatus outputs a second encoding neural network and a second decoding neural network, where the second encoding neural network is a model obtained after the iterative training is performed on the first encoding neural network, and the second decoding neural network is a model obtained after the iterative training is performed on the first decoding neural network.
The detailed description of steps 1201 to 1211 may refer to the description in the above-described image processing method.
Optionally, the method further comprises:
the training device quantizes the N first adaptive data to obtain N first adaptive quantized data, which are used to compensate the N first reconstructed tiles.
Optionally, the larger the N, the smaller the information entropy of the single first adaptively quantized data.
Optionally, an arrangement order of the N first coded representations is the same as an arrangement order of the N first tiles, and the arrangement order of the N first tiles is an arrangement order of the N first tiles in the first image.
Optionally, the training device processes the second image through a fusion neural network to obtain a third image, so as to reduce a difference between the second image and the first image, where the difference includes a blocking effect;
the training device is specifically configured to obtain a distortion loss of the third image with respect to the first image;
the model includes a converged neural network.
Optionally, each of the N first tiles is the same size.
Optionally, in two iterative trainings, the sizes of the first images for training are different, and the size of the first image block is a fixed value.
Optionally, the pixels of the first tile are a × b, a and b are obtained according to a target pixel, the target pixel is c × d, c/a is equal to an integer, and d/b is equal to an integer, where a and c are the numbers of pixels in the width direction and b and d are the numbers of pixels in the height direction. The target pixel is obtained according to a target resolution of the terminal device, the terminal device includes an image capturing component, the pixel of an image obtained by the image capturing component under the setting of the target resolution is the target pixel, and the first image is obtained by the image capturing component.
Optionally, the target resolution is set according to a setting interface in an imaging application for the resolution of the imaging component.
Optionally, the target resolution is obtained from a target image group in a gallery obtained by the image capturing component, a pixel of the target image group is the target pixel, and a ratio of the target image group in the gallery is the largest in an image group of different pixels.
Alternatively, images of a plurality of pixels are obtained by the image capturing component, the plurality of pixels being e × f, with e/a equal to an integer and f/b equal to an integer, where e includes c and f includes d.
Optionally, the plurality of pixels are set by a resolution of the imaging component through a setting interface in the imaging application.
Optionally, pixels of the first tile are a × b, where a is the number of pixels in the width direction, b is the number of pixels in the height direction, and pixels of the first image are r × t;
after acquiring the first image, prior to segmenting the first image, the method further comprises:
if r/a is not equal to an integer, and/or t/b is not equal to an integer, filling the edges of the first image with pixel median values such that r1/a is equal to an integer and t1/b is equal to an integer, the pixels of the padded first image being r1 × t1.
Optionally, after acquiring the first image, before filling in the edge of the first image, the method further comprises:
if t/b is not equal to an integer, magnifying r and t in equal proportion to obtain the first image with pixels r2 × t2, where t2/b is equal to an integer;

if r/a is not equal to an integer, and/or t/b is not equal to an integer, filling the edges of the first image with pixel median values comprises:

if r2/a is not equal to an integer, filling the edges of the first image with pixel median values.
Alternatively, after r and t are scaled up, if r2/a is not equal to an integer, the remainder g of r2/a is obtained. If the remainder is greater than a/2, the training device fills the pixel median values only on one side of the first image in the width direction.
Optionally, if the remainder is less than a/2, the pixel median values are filled on both sides of the first image in the width direction, so that the width of the pixel median values filled on each side is (a - g)/2, where g is the remainder.
Optionally, the N first tiles comprise a first target tile having a range of pixel values smaller than a range of pixel values of the first image;
before obtaining N first adaptive data from the N first tiles, the method further includes:
the training device dequantizes pixel values of the first target tile;
the training device is specifically configured to obtain a first adaptive data from the dequantized first target block.
On the basis of the embodiments corresponding to fig. 1 to fig. 13, in order to better implement the above-mentioned scheme of the embodiments of the present application, the following also provides related equipment for implementing the above-mentioned scheme. Specifically referring to fig. 14, fig. 14 is a schematic structural diagram of an encoding apparatus 1400 provided in the present embodiment, the encoding apparatus 1400 corresponds to an encoding end, the encoding apparatus 1400 may be a terminal device or a cloud device, and the encoding apparatus 1400 includes:
a first acquiring module 1401 for acquiring a first image;
a segmentation module 1402, configured to segment the first image to obtain N first image blocks, where N is an integer greater than 1;
a second obtaining module 1403, configured to obtain N first adaptive data from the N first tiles, where the N first adaptive data correspond to the N first tiles one to one;
a preprocessing module 1404, configured to preprocess the N first tiles according to the N first adaptive data;
the encoding neural network module 1405 is used for processing the N preprocessed first image blocks through the encoding neural network to obtain N groups of first feature maps;
and a quantization and entropy coding module 1406, configured to quantize and entropy code the N groups of first feature maps to obtain N first coded representations. Optionally, the encoding apparatus is further configured to perform all or part of the operations performed by the cloud device in the embodiment corresponding to fig. 3 a.
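To show how the modules of encoding apparatus 1400 could be chained, here is a hedged sketch in which each module is reduced to a plain function; the tile size, the use of a per-tile mean as the adaptive data, and the placeholder networks passed in as callables are assumptions for illustration only.

```python
import numpy as np

def split_into_tiles(image: np.ndarray, a: int, b: int):
    """Segmentation module: cut an (h, w, c) image into b x a tiles, row by row.
    Assumes the image dimensions are already multiples of the tile size."""
    h, w = image.shape[:2]
    return [image[y:y + b, x:x + a]
            for y in range(0, h, b) for x in range(0, w, a)]

def encode_image(image, encoder_net, quantize_and_entropy_code, a=128, b=128):
    """Acquisition -> segmentation -> adaptive data -> preprocessing ->
    encoding neural network -> quantization and entropy coding."""
    tiles = split_into_tiles(image, a, b)                          # N first tiles
    adaptive = [float(t.mean()) for t in tiles]                    # N first adaptive data
    preprocessed = [t.astype(np.float32) - m for t, m in zip(tiles, adaptive)]
    feature_maps = [encoder_net(t) for t in preprocessed]          # N groups of first feature maps
    coded = [quantize_and_entropy_code(f) for f in feature_maps]   # N first encoded representations
    return coded, adaptive
```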
The above describes the encoding apparatus in the embodiment of the present application, and on the basis of the embodiments corresponding to fig. 1 to fig. 13, in order to better implement the above-described scheme of the embodiment of the present application, the following also provides a description of the decoding apparatus in the embodiment of the present application. Specifically referring to fig. 15, fig. 15 is a schematic structural diagram of a decoding apparatus 1500 provided in the embodiment of the present application, the decoding apparatus 1500 corresponds to a decoding end, the decoding apparatus 1500 may be a terminal device or a cloud device, and the decoding apparatus 1500 includes:
an obtaining module 1501, configured to obtain N first encoded representations, N first adaptive data, and a correspondence, where the correspondence includes the correspondence between the N first adaptive data and the N first encoded representations, the N first adaptive data correspond one to one to the N first encoded representations, and N is an integer greater than 1;
an entropy decoding module 1502, which performs entropy decoding on the N first encoded representations to obtain N sets of second feature maps;
a decoding neural network module 1503, configured to process the N groups of second feature maps to obtain N first reconstruction blocks;
a compensation module 1504 to compensate the N first reconstruction tiles with the N first adaptive data;
a combining module 1505 is used for combining the compensated N first reconstruction tiles to obtain a second image.
Optionally, the decoding apparatus is further configured to perform all or part of the operations performed by the terminal device in the embodiment corresponding to fig. 3 a.
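A matching sketch of decoding apparatus 1500 is given below, under the same assumptions as the encoder sketch above; the entropy decoder and decoding network are again passed in as placeholder callables, and the row-wise stitching is one simple way to combine the tiles.

```python
import numpy as np

def decode_image(coded, adaptive, entropy_decode, decoder_net, tiles_per_row: int):
    """Entropy decoding -> decoding neural network -> compensation -> combination."""
    feature_maps = [entropy_decode(c) for c in coded]                 # N groups of second feature maps
    reconstructed = [decoder_net(f) for f in feature_maps]            # N first reconstructed tiles
    compensated = [t + m for t, m in zip(reconstructed, adaptive)]    # add the adaptive data back
    rows = [np.concatenate(compensated[i:i + tiles_per_row], axis=1)
            for i in range(0, len(compensated), tiles_per_row)]
    return np.concatenate(rows, axis=0)                               # the second image
```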
The decoding apparatus in the embodiment of the present application is described above, and on the basis of the embodiments corresponding to fig. 1 to fig. 13, in order to better implement the above-described scheme of the embodiment of the present application, a description is also provided below for a training apparatus in the embodiment of the present application. Referring specifically to fig. 16, fig. 16 is a schematic structural diagram of a training apparatus 1600 provided in the embodiment of the present application, the training apparatus 1600 including:
a first acquiring module 1601 is configured to acquire a first image.
A segmenting module 1602, configured to segment the first image to obtain N first image blocks, where N is an integer greater than 1.
A second obtaining module 1603, configured to obtain N first adaptive data from the N first tiles, where the N first adaptive data correspond to the N first tiles one to one.
A preprocessing module 1604, configured to preprocess the N first tiles according to the N first adaptive data;
the first encoding neural network module 1605 is configured to process the preprocessed N first image blocks to obtain N groups of first feature maps.
A quantization and entropy coding module 1606, configured to quantize and entropy code the N groups of first feature maps to obtain N first coded representations.
An entropy decoding module 1607, which performs entropy decoding on the N first encoded representations to obtain N sets of second feature maps.
A first decoding neural network module 1608, configured to process the N sets of second feature maps to obtain N first reconstruction tiles.
A compensation module 1609 for compensating the N first reconstructed tiles by the N first adaptive data.
And a combining module 1610 configured to combine the compensated N first reconstruction tiles to obtain a second image.
A third obtaining module 1611, configured to obtain a distortion loss of the second image relative to the first image.
A training module 1612, configured to perform joint training on a model using a loss function until an image distortion value between the first image and the second image reaches a first preset degree, where the model includes the first encoding neural network, a quantization network, an entropy encoding network, an entropy decoding network, and the first decoding neural network. Optionally, the model further includes a segmentation network, and the trainable parameter in the segmentation network is the size of the first tile.
An output module 1613, configured to output a second coding neural network and a second decoding neural network, where the second coding neural network is a model obtained after the iterative training is performed on the first coding neural network, and the second decoding neural network is a model obtained after the iterative training is performed on the first decoding neural network.
Optionally, the training apparatus is further configured to perform all or part of the operations performed by the terminal device and/or the cloud device in the embodiment corresponding to fig. 3 a.
In an optional design of the sixth aspect, the N first tiles comprise a first target tile having a range of pixel values that is smaller than a range of pixel values of the first image;
the device further comprises:
an inverse quantization module to inverse quantize pixel values of the first target tile;
the second obtaining module 1603 is specifically configured to obtain a first adaptive data from the dequantized first target tile.
Referring to fig. 17, fig. 17 is a schematic structural diagram of an execution device provided in the embodiment of the present application, and the execution device 1700 may be embodied as a virtual reality VR device, a mobile phone, a tablet, a notebook computer, an intelligent wearable device, a monitoring data processing device, a server, and the like, which is not limited herein. The execution apparatus 1700 may have the encoding apparatus described in the embodiment corresponding to fig. 14 and/or the decoding apparatus described in the embodiment corresponding to fig. 15 deployed thereon, so as to implement the functions of the apparatuses in the embodiments corresponding to fig. 14 and/or fig. 15. Specifically, the execution apparatus 1700 includes: a receiver 1701, a transmitter 1702, a processor 1703 and a memory 1704 (wherein the number of processors 1703 in the execution device 1700 may be one or more, for example one processor in fig. 17), wherein the processor 1703 may include an application processor 17031 and a communication processor 17032. In some embodiments of the present application, the receiver 1701, the transmitter 1702, the processor 1703 and the memory 1704 may be connected by a bus or other means.
Memory 1704, which may include both read-only memory and random-access memory, provides instructions and data to processor 1703. A portion of memory 1704 may also include non-volatile random access memory (NVRAM). The memory 1704 stores the processor and operating instructions, executable modules or data structures, or a subset or an expanded set thereof, wherein the operating instructions may include various operating instructions for performing various operations.
The processor 1703 controls the operation of the execution apparatus. In a particular application, the various components of the execution device are coupled together by a bus system that may include a power bus, a control bus, a status signal bus, etc., in addition to a data bus. For clarity of illustration, the various buses are referred to in the figures as a bus system.
The method disclosed in the embodiments of the present application may be applied to the processor 1703 or implemented by the processor 1703. The processor 1703 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 1703. The processor 1703 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic device, or discrete hardware components. The processor 1703 may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory 1704, and the processor 1703 reads the information in the memory 1704 and completes the steps of the above method in combination with its hardware.
The receiver 1701 may be used to receive input numeric or character information and generate signal inputs related to performing device related settings and function control. The transmitter 1702 may be configured to output numeric or character information via a first interface; the transmitter 1702 may also be configured to send instructions to the disk pack through the first interface to modify data in the disk pack; the transmitter 1702 may also include a display device such as a display screen.
In this embodiment of the present application, the processor 1703 is configured to execute the operations executed by the terminal device and/or the cloud device in the embodiment corresponding to fig. 3a.
Optionally, an application processor 17031 for acquiring a first image;
segmenting the first image to obtain N first image blocks, wherein N is an integer larger than 1;
acquiring N first adaptive data from N first image blocks, wherein the N first adaptive data correspond to the N first image blocks one by one;
preprocessing the N first image blocks according to the N first self-adaptive data;
n preprocessed first image blocks are processed through a coding neural network to obtain N groups of first feature maps;
quantizing and entropy encoding the N groups of first feature maps to obtain N first encoded representations;
in addition, the application processor 17031 may be further configured to perform all or part of the operations that the cloud device in the embodiment corresponding to fig. 3a may perform.
Optionally, an application processor 17031, configured to obtain N first encoded representations;
entropy decoding the N first encoded representations to obtain N sets of second feature maps;
processing the N sets of second feature maps by a decoding neural network to obtain N first reconstruction tiles;
compensating the N first reconstruction tiles by the N first adaptive data;
combining the compensated N first reconstruction blocks to obtain a second image;
in addition, the application processor 17031 may be further configured to perform all or part of the operations that the terminal device may perform in the embodiment corresponding to fig. 3 a.
Referring to fig. 18, fig. 18 is a schematic structural diagram of a training device provided in an embodiment of the present application. The training apparatus described in the embodiment corresponding to fig. 16 may be deployed on the training device 1800 to implement the functions of the training apparatus in the embodiment corresponding to fig. 16. Specifically, the training device 1800 is implemented by one or more servers, and the training device 1800 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 1822 (e.g., one or more processors), a memory 1832, and one or more storage media 1830 (e.g., one or more mass storage devices) storing an application 1842 or data 1844. The memory 1832 and the storage medium 1830 may be transient storage or persistent storage. The program stored on the storage medium 1830 may include one or more modules (not shown), each of which may include a series of instruction operations on the training device. Still further, the central processor 1822 may be configured to communicate with the storage medium 1830 to execute the series of instruction operations in the storage medium 1830 on the training device 1800.
The training apparatus 1800 may also include one or more power supplies 1826, one or more wired or wireless network interfaces 1850, one or more input-output interfaces 1858, and/or one or more operating systems 1841, such as Windows Server, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, and so forth.
In this embodiment, the central processor 1822 is configured to perform all or part of the operations performed by the training apparatus in the embodiment corresponding to fig. 16.
Also provided in the embodiments of the present application is a computer program product, which when run on a computer, causes the computer to perform the steps performed by the device in the method described in the foregoing embodiment shown in fig. 17, or causes the computer to perform the steps performed by the training device in the method described in the foregoing embodiment shown in fig. 18.
Also provided in the embodiments of the present application is a computer-readable storage medium, which stores a program for signal processing, and when the program is run on a computer, the program causes the computer to execute the steps executed by the device in the method described in the foregoing embodiment shown in fig. 17, or causes the computer to execute the steps executed by the training device in the method described in the foregoing embodiment shown in fig. 18.
The execution device and the training device provided by the embodiment of the application can be specifically chips, and the chips comprise: a processing unit, which may be for example a processor, and a communication unit, which may be for example an input/output interface, a pin or a circuit, etc. The processing unit may execute the computer execution instruction stored in the storage unit, so as to enable the chip in the execution device to perform the operation performed by the terminal device and/or the cloud device described in the embodiment shown in fig. 3a, or to enable the chip in the training device to perform the model training method described in the embodiment shown in fig. 13. Optionally, the storage unit is a storage unit in the chip, such as a register, a cache, and the like, and the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a Random Access Memory (RAM), and the like.
Specifically, referring to fig. 19, fig. 19 is a schematic structural diagram of a chip provided in the embodiment of the present application, where the chip may be represented as a neural network processor NPU2000, and the NPU2000 is mounted on a main CPU (Host CPU) as a coprocessor, and the Host CPU allocates tasks. The core portion of the NPU is an arithmetic circuit 2003, and the controller 2004 controls the arithmetic circuit 2003 to extract matrix data in the memory and perform multiplication.
In some implementations, the arithmetic circuit 2003 internally includes a plurality of processing units (PEs). In some implementations, the arithmetic circuitry 2003 is a two-dimensional systolic array. The arithmetic circuit 2003 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 2003 is a general purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to the matrix B from the weight memory 2002 and buffers it in each PE in the arithmetic circuit. The arithmetic circuit takes the matrix A data from the input memory 2001, performs a matrix operation with the matrix B, and stores the partial results or final results of the obtained matrix in the accumulator 2008.
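The matrix flow described above amounts to the computation sketched below; this is only a functional illustration of C = A × B with an accumulator, not a model of the systolic-array hardware.

```python
import numpy as np

def matmul_with_accumulator(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Accumulate partial products of A (input memory) and B (weight memory)
    into an accumulator, one rank-1 partial result per step."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    accumulator = np.zeros((m, n), dtype=np.float64)
    for i in range(k):
        accumulator += np.outer(A[:, i], B[i, :])   # partial result added to the accumulator
    return accumulator

A = np.arange(6, dtype=np.float64).reshape(2, 3)    # input matrix A
B = np.arange(12, dtype=np.float64).reshape(3, 4)   # weight matrix B
print(np.allclose(matmul_with_accumulator(A, B), A @ B))  # True
```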
The unified memory 2006 is used to store input data and output data. The weight data is transferred to the weight memory 2002 through a direct memory access controller (DMAC) 2005. Input data is also carried into the unified memory 2006 by the DMAC.
The bus interface unit (BIU) 2010 is used for the interaction of the AXI bus with the DMAC and the instruction fetch buffer (IFB) 2009. The bus interface unit 2010 is used by the instruction fetch buffer 2009 to obtain instructions from the external memory, and is also used by the storage unit access controller 2005 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 2006 or to transfer weight data to the weight memory 2002 or to transfer input data to the input memory 2001.
The vector calculation unit 2007 includes a plurality of operation processing units, and further processes the output of the arithmetic circuit when necessary, for example vector multiplication, vector addition, exponential operations, logarithmic operations, and magnitude comparison. It is mainly used for non-convolutional/fully connected layer computation in the neural network, such as batch normalization, pixel-level summation, and up-sampling of feature planes.
In some implementations, the vector calculation unit 2007 can store the vector of processed outputs to the unified memory 2006. For example, the vector calculation unit 2007 may apply a linear function and/or a nonlinear function to the output of the arithmetic circuit 2003, such as linear interpolation of the feature planes extracted by the convolutional layers, and further such as a vector of accumulated values, to generate the activation values. In some implementations, the vector calculation unit 2007 generates normalized values, pixel-level summed values, or both. In some implementations, the vector of processed outputs can be used as activation inputs to the arithmetic circuit 2003, e.g., for use in subsequent layers in a neural network.
An instruction fetch buffer 2009 connected to the controller 2004 for storing instructions used by the controller 2004;
the unified memory 2006, the input memory 2001, the weight memory 2002, and the instruction fetch memory 2009 are all On-Chip memories. The external memory is private to the NPU hardware architecture.
Wherein any of the aforementioned processors may be a general purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits configured to control the execution of the programs of the method of the first aspect.
It should be noted that the above-described embodiments of the apparatus are merely schematic, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiments of the apparatus provided in the present application, the connection relationship between the modules indicates that there is a communication connection therebetween, and may be implemented as one or more communication buses or signal lines.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus necessary general-purpose hardware, and certainly can also be implemented by special-purpose hardware including application-specific integrated circuits, special-purpose CPUs, special-purpose memories, special-purpose components and the like. Generally, functions performed by computer programs can be easily implemented by corresponding hardware, and the specific hardware structures for implementing the same function may be various, such as analog circuits, digital circuits, or dedicated circuits. However, for the present application, a software program implementation is preferable in most cases. Based on such understanding, the technical solutions of the present application may be substantially embodied in the form of a software product, which is stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a training device, or a network device) to execute the methods according to the embodiments of the present application.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a training device or a data center, integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), among others.

Claims (31)

1. An image processing method, comprising:
acquiring a first image;
segmenting the first image to obtain N first image blocks, wherein N is an integer larger than 1;
acquiring N first adaptive data from the N first image blocks, wherein the N first adaptive data are in one-to-one correspondence with the N first image blocks;
preprocessing the N first tiles according to the N first adaptive data;
n preprocessed first image blocks are processed through a coding neural network to obtain N groups of first feature maps;
and quantizing and entropy coding the N groups of first feature maps to obtain N first coded representations.
2. The method of claim 1, wherein the N first encoded representations are used for entropy decoding to obtain N sets of second feature maps, wherein the N sets of second feature maps are used for processing by a decoding neural network to obtain N first reconstructed blocks, wherein the N first adaptive data are used for compensating the N first reconstructed blocks, and wherein the compensated N first reconstructed blocks are used for combining into a second image.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
sending the N first coded representations, the N first adaptive data and a corresponding relation to a decoding end, wherein the corresponding relation comprises the corresponding relation between the N first adaptive data and the N first coded representations.
4. The method of claim 3, further comprising:
quantizing the N first adaptive data to obtain N first adaptive quantized data, the N first adaptive quantized data being used to compensate the N first reconstructed tiles;
the sending the N first adaptive data to the decoding end includes:
and sending the N pieces of first self-adaptive quantized data to the decoding end.
5. The method of claim 4, wherein the larger the N, the smaller the information entropy of the single first adaptively quantized data.
6. The method according to any one of claims 3 to 5, wherein a permutation order of the N first coded representations is the same as a permutation order of the N first tiles, the permutation order of the N first tiles is a permutation order of the N first tiles in the first image, and the correspondence relationship includes the permutation order of the N first coded representations and the permutation order of the N first tiles.
7. The method of any one of claims 1 to 6, wherein each of the N first tiles is the same size.
8. The method of claim 7, wherein the size of the first tile is a fixed value when the method is used to segment the first image of different sizes.
9. The method according to any one of claims 1 to 8, wherein the pixels of the first tile are a × b, wherein a and b are derived from a target pixel, wherein the target pixel is c × d, c/a is equal to an integer, and d/b is equal to an integer, a and c are the numbers of pixels in the width direction, and b and d are the numbers of pixels in the height direction; the target pixel is obtained according to a target resolution of a terminal device, the terminal device comprises an image pickup component, a pixel of an image obtained by the image pickup component under the setting of the target resolution is the target pixel, and the first image is obtained by the image pickup component.
10. The method of claim 9, wherein the target resolution is set according to a resolution of the imaging component from a setting interface in an imaging application.
11. The method according to claim 9, wherein the target resolution is obtained from a target image group in a gallery obtained by the image pickup device, a pixel of the target image group is the target pixel, and a ratio of the target image group in the gallery is the largest in the image group of different pixels.
12. The method according to any one of claims 1 to 8, wherein the pixels of the first tile are a x b, a is the number of pixels in the width direction, b is the number of pixels in the height direction, and r x t is the pixels of the first image;
after acquiring the first image, prior to segmenting the first image, the method further comprises:
if r/a is not equal to an integer, and/or t/b is not equal to an integer, filling the edges of the first image with pixel median values such that r1/a is equal to an integer and t1/b is equal to an integer, the pixels of the padded first image being r1 × t1.
13. The method of claim 12, wherein after acquiring the first image, prior to filling in edges of the first image, the method further comprises:
if the r/a is not equal to an integer, scaling up the r and the t equally to obtain the first image with pixels r2 × t2, wherein r2/a is equal to an integer;
the filling the edges of the first image with the pixel median value if r/a is not equal to an integer and/or t/b is not equal to an integer comprises:
if t2/b is not equal to an integer, filling the edges of the first image with the pixel median value.
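Claim 13 orders the two operations: first enlarge the image proportionally until the width is a multiple of a, then fall back to median padding if the height is still not a multiple of b. A sketch under that reading, reusing pad_to_tile_grid from the previous example and plain nearest-neighbour enlargement (both are illustrative choices, not the patent's interpolation method):

```python
import numpy as np

def scale_then_pad(image, a, b):
    t, r = image.shape[:2]
    if r % a != 0:
        r2 = -(-r // a) * a                      # next multiple of a
        scale = r2 / r
        t2 = int(round(t * scale))               # r and t are scaled equally
        rows = np.minimum((np.arange(t2) / scale).astype(int), t - 1)
        cols = np.minimum((np.arange(r2) / scale).astype(int), r - 1)
        image = image[rows][:, cols]             # nearest-neighbour enlargement to r2 x t2
    return pad_to_tile_grid(image, a, b)         # pads only if the height is still not a multiple of b
```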
14. The method according to any one of claims 1 to 13, wherein the N first tiles comprise a first target tile having a range of pixel values smaller than a range of pixel values of the first image;
before obtaining N first adaptive data from the N first tiles, the method further includes:
inverse quantizing pixel values of the first target tile;
the obtaining N first adaptive data from the N first tiles comprises:
obtaining first adaptive data from the inverse-quantized first target tile.
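Claim 14 leaves "inverse quantizing" open; one plausible reading is stretching a tile whose pixel values occupy a narrower range than the full image back to the full range before its adaptive data is computed. A sketch under that assumption (the 0–255 full range and the function name are illustrative):

```python
import numpy as np

def inverse_quantize_tile(tile, full_min=0.0, full_max=255.0):
    lo, hi = float(tile.min()), float(tile.max())
    if hi == lo:
        return tile.astype(np.float32)          # flat tile: nothing to stretch
    # Map the tile's narrower value range back onto the full pixel range.
    return (tile.astype(np.float32) - lo) / (hi - lo) * (full_max - full_min) + full_min
```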
15. An image processing method, comprising:
acquiring N first coded representations, N first adaptive data and a corresponding relationship, wherein the corresponding relationship comprises the corresponding relationship between the N first adaptive data and the N first coded representations, the N first adaptive data and the N first coded representations are in one-to-one correspondence, and N is an integer greater than 1;
entropy decoding the N first coded representations to obtain N groups of second feature maps;
processing the N groups of second feature maps through a decoding neural network to obtain N first reconstruction image blocks;
compensating the N first reconstruction tiles by the N first adaptive data;
and combining the compensated N first reconstruction image blocks to obtain a second image.
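Read as a pipeline, the decoding method of claim 15 can be sketched as below. The entropy decoder, the decoding neural network and compensation by simple addition are placeholders for the patent's actual components, and tiles_per_row is assumed to be known from the correspondence information.

```python
import torch

def decode_image(coded_reps, adaptive_data, entropy_decode, decoder_net, tiles_per_row):
    tiles = []
    for rep, offset in zip(coded_reps, adaptive_data):
        feature_map = entropy_decode(rep)     # one of the N groups of second feature maps
        recon = decoder_net(feature_map)      # one of the N first reconstructed tiles
        tiles.append(recon + offset)          # compensation with the first adaptive data
    rows = [torch.cat(tiles[i:i + tiles_per_row], dim=-1)       # stitch tiles into rows
            for i in range(0, len(tiles), tiles_per_row)]
    return torch.cat(rows, dim=-2)            # combine rows into the second image
```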
16. The method of claim 15, wherein the N first coded representations are obtained by quantizing and entropy coding N sets of first feature maps, the N sets of first feature maps are obtained by processing N pre-processed first tiles through a coding neural network, the N pre-processed first tiles are obtained by pre-processing N first tiles with the N first adaptive data, the N first adaptive data are obtained from the N first tiles, and the N first tiles are obtained by segmenting a first image.
17. The method according to claim 15 or 16, wherein the larger the N, the smaller the information entropy of a single first adaptively quantized data.
18. The method according to any one of claims 15 to 17, further comprising:
processing the second image through a fusion neural network to obtain a third image, so as to reduce the difference between the second image and the first image, wherein the difference comprises a blocking artifact.
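The claims do not describe the fusion neural network's architecture. As a purely illustrative stand-in, a small residual CNN applied to the stitched second image is one way to suppress blocking artifacts at tile boundaries:

```python
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    def __init__(self, channels=3, width=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, channels, 3, padding=1),
        )

    def forward(self, second_image):
        # Predict a residual so the third image stays close to the second image
        # while smoothing the tile boundaries.
        return second_image + self.body(second_image)
```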
19. An encoding apparatus, comprising:
the first acquisition module is used for acquiring a first image;
the segmentation module is used for segmenting the first image to obtain N first image blocks, wherein N is an integer larger than 1;
a second obtaining module, configured to obtain N first adaptive data from the N first tiles, where the N first adaptive data correspond to the N first tiles one to one;
a preprocessing module, configured to preprocess the N first tiles according to the N first adaptive data;
the encoding neural network module is used for processing the N preprocessed first image blocks to obtain N groups of first feature maps;
and the quantization and entropy coding module is used for quantizing and entropy coding the N groups of first characteristic graphs to obtain N first coding representations.
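The modules of the encoding apparatus in claim 19 can be read as consecutive stages of one pipeline. The sketch below uses illustrative stand-ins: a fixed tile size a × b, the per-tile mean as the adaptive data, subtraction as the pre-processing, and encoder_net / entropy_encode as placeholders for the encoding neural network and the entropy coder.

```python
import numpy as np

def encode_image(image, a, b, encoder_net, entropy_encode):
    t, r = image.shape[:2]
    tiles = [image[y:y + b, x:x + a]                              # segmentation module
             for y in range(0, t, b) for x in range(0, r, a)]
    adaptive = [tile.mean() for tile in tiles]                    # second obtaining module
    preprocessed = [tile - m for tile, m in zip(tiles, adaptive)] # preprocessing module
    feature_maps = [encoder_net(p) for p in preprocessed]         # encoding neural network module
    coded = [entropy_encode(np.round(f)) for f in feature_maps]   # quantization and entropy coding module
    return coded, adaptive                                        # N first coded representations and adaptive data
```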
20. The apparatus of claim 19, wherein the N first encoded representations are used for entropy decoding to obtain N sets of second feature maps, wherein the N sets of second feature maps are used for processing by a decoding neural network to obtain N first reconstructed blocks, wherein the N first adaptive data are used for compensating the N first reconstructed blocks, and wherein the compensated N first reconstructed blocks are used for combining into a second image.
21. The apparatus of claim 19 or 20, further comprising:
a sending module, configured to send the N first encoded representations, the N first adaptive data, and a corresponding relationship to a decoding end, where the corresponding relationship includes a corresponding relationship between the N first adaptive data and the N first encoded representations.
22. The apparatus of claim 21, further comprising:
a quantization module configured to quantize the N first adaptive data to obtain N first adaptive quantized data, where the N first adaptive quantized data are used to compensate for the N first reconstructed tiles;
the sending module is specifically configured to send the N first adaptive quantized data to the decoding end.
23. The apparatus of claim 22, wherein the larger the N, the smaller the information entropy of a single first adaptively quantized data.
24. The apparatus according to any one of claims 21 to 23, wherein an arrangement order of the N first coded representations is the same as an arrangement order of the N first tiles, the arrangement order of the N first tiles is an arrangement order of the N first tiles in the first image, and the correspondence relationship includes the arrangement order of the N first coded representations and the arrangement order of the N first tiles.
25. The apparatus of any one of claims 19 to 24, wherein each of the N first tiles is the same size.
26. The apparatus of claim 25, wherein the size of the first tile is a fixed value when the apparatus is used to process first images of different sizes.
27. A decoding apparatus, comprising:
an obtaining module, configured to obtain N first coded representations, N first adaptive data, and a corresponding relationship, where the corresponding relationship includes a corresponding relationship between the N first adaptive data and the N first coded representations, the N first adaptive data correspond to the N first coded representations one to one, and N is an integer greater than 1;
the entropy decoding module is used for carrying out entropy decoding on the N first coded representations to obtain N groups of second feature maps;
the decoding neural network module is used for processing the N groups of second feature maps to obtain N first reconstruction image blocks;
a compensation module to compensate the N first reconstruction tiles with the N first adaptive data;
and the combination module is used for combining the compensated N first reconstruction image blocks to obtain a second image.
28. The apparatus of claim 27, wherein the N first coded representations are obtained by quantizing and entropy coding N sets of first feature maps, the N sets of first feature maps are obtained by processing N pre-processed first tiles through a coding neural network, the N pre-processed first tiles are obtained by pre-processing N first tiles with the N first adaptive data, the N first adaptive data are obtained from the N first tiles, and the N first tiles are obtained by segmenting a first image.
29. The apparatus according to claim 27 or 28, wherein the larger the N, the smaller the information entropy of a single first adaptively quantized data.
30. The apparatus of any one of claims 27 to 29, further comprising:
and the fusion neural network module is used for processing the second image to obtain a third image, so as to reduce the difference between the second image and the first image, wherein the difference comprises a blocking artifact.
31. An image processing apparatus comprising: a non-volatile memory and a processor coupled to each other, the processor calling program code stored in the memory to perform the method as described in any one of claims 1-18.
CN202010754333.9A 2020-07-30 2020-07-30 Image processing method and related equipment Pending CN114066914A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010754333.9A CN114066914A (en) 2020-07-30 2020-07-30 Image processing method and related equipment
PCT/CN2021/101807 WO2022022176A1 (en) 2020-07-30 2021-06-23 Image processing method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010754333.9A CN114066914A (en) 2020-07-30 2020-07-30 Image processing method and related equipment

Publications (1)

Publication Number Publication Date
CN114066914A true CN114066914A (en) 2022-02-18

Family

ID=80037157

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010754333.9A Pending CN114066914A (en) 2020-07-30 2020-07-30 Image processing method and related equipment

Country Status (2)

Country Link
CN (1) CN114066914A (en)
WO (1) WO2022022176A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115022637A (en) * 2022-04-26 2022-09-06 华为技术有限公司 Image coding method, image decompression method and device
CN114923855A (en) * 2022-05-12 2022-08-19 泉州装备制造研究所 Leather quality grading method

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20090099720A (en) * 2008-03-18 2009-09-23 삼성전자주식회사 Method and apparatus for video encoding and decoding
US9912942B2 (en) * 2012-12-28 2018-03-06 Qualcomm Incorporated High-frequency-pass sample adaptive offset in video coding
CN105635732B (en) * 2014-10-30 2018-12-14 联想(北京)有限公司 The method and device that adaptive sampling point compensation is encoded, is decoded to video code flow
WO2019009448A1 (en) * 2017-07-06 2019-01-10 삼성전자 주식회사 Method and device for encoding or decoding image
CN111405287B (en) * 2019-01-03 2023-02-28 华为技术有限公司 Prediction method and device of chrominance block
CN112822489B (en) * 2020-12-30 2023-05-16 北京博雅慧视智能技术研究院有限公司 Hardware implementation method and device for sample self-adaptive offset compensation filtering

Also Published As

Publication number Publication date
WO2022022176A1 (en) 2022-02-03

Similar Documents

Publication Publication Date Title
CN113259665B (en) Image processing method and related equipment
US20210125070A1 (en) Generating a compressed representation of a neural network with proficient inference speed and power consumption
US20200293284A1 (en) Accelerated quantized multiply-and-add operations
US20210266565A1 (en) Compression for deep neural network
WO2022021938A1 (en) Image processing method and device, and neutral network training method and device
CN113066017B (en) Image enhancement method, model training method and equipment
US20220215595A1 (en) Systems and methods for image compression at multiple, different bitrates
WO2022022176A1 (en) Image processing method and related device
EP4283876A1 (en) Data coding method and related device
WO2022028197A1 (en) Image processing method and device thereof
CN113066018A (en) Image enhancement method and related device
CN116547969A (en) Processing method of chroma subsampling format in image decoding based on machine learning
Nair et al. Deep-learning with context sensitive quantization and interpolation for underwater image compression and quality image restoration
WO2022266955A1 (en) Image decoding method and apparatus, image processing method and apparatus, and device
US11403782B2 (en) Static channel filtering in frequency domain
TWI826160B (en) Image encoding and decoding method and apparatus
CN114501031B (en) Compression coding and decompression method and device
CN116095183A (en) Data compression method and related equipment
CN115409697A (en) Image processing method and related device
CN118020306A (en) Video encoding and decoding method, encoder, decoder, and storage medium
CN114693811A (en) Image processing method and related equipment
CN116170595A (en) Original image processing method and device, electronic equipment and storage medium
CN115315946A (en) Video compression with adaptive iterative intra prediction
Atsalakis et al. A window-based color quantization technique and its embedded implementation
CN117376564A (en) Data encoding and decoding method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination