WO2022021938A1 - Image processing method and device, and neural network training method and device

Image processing method and device, and neural network training method and device

Info

Publication number
WO2022021938A1
WO2022021938A1 (PCT/CN2021/086836)
Authority
WO
WIPO (PCT)
Prior art keywords
image
training
area
images
model
Prior art date
Application number
PCT/CN2021/086836
Other languages
French (fr)
Chinese (zh)
Inventor
赵政辉
马思伟
王晶
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
北京大学 (Peking University)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.) and 北京大学 (Peking University)
Publication of WO2022021938A1 publication Critical patent/WO2022021938A1/en

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 - Image coding
    • G06T9/002 - Image coding using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 - Image coding
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00 - Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/41 - Bandwidth or redundancy reduction
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 - Incoming video signal characteristics or properties
    • H04N19/14 - Coding unit complexity, e.g. amount of activity or edge presence estimation
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation

Definitions

  • the present application relates to the field of artificial intelligence, and more particularly, to an image processing method and apparatus, and a method and apparatus for neural network training.
  • Artificial intelligence is a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that responds in a way similar to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
  • Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theory.
  • Image compression can reduce redundant information in image data. Therefore, image compression is of great significance to improve the storage efficiency and transmission efficiency of images.
  • Traditional image compression methods such as Joint Photographic Experts Group (JPEG) have good compression effects in medium and high bit rate regions, but in low bit rate regions, the compression effects of traditional image compression methods are not ideal.
  • Alternatively, the image can be compressed by a neural network.
  • This approach mainly uses a neural network and a corresponding nonlinear transformation to extract image features, so as to achieve compression. Compared with traditional image compression methods, it avoids complicated parameter design and module design.
  • Correspondingly, a neural network can be used for decoding to reconstruct the image. How to improve the image compression performance of neural networks has become a technical problem that urgently needs to be solved.
  • the present application provides an image processing method and device, and a neural network training method and device, which can improve the image compression effect of the neural network.
  • In a first aspect, an image processing method is provided, comprising: determining texture complexity information corresponding to each area image in a plurality of area images in an image to be processed; determining, according to the texture complexity information corresponding to each area image, an image compression model corresponding to each area image, wherein different texture complexity information corresponds to different image compression models; and compressing each area image by using the image compression model corresponding to that area image.
  • In different areas of an image, the texture complexity may not be the same.
  • For example, in background areas such as sky and beach, the texture complexity is low; in areas of interest or foreground areas that include people and other objects, the texture complexity is high.
  • With this method, regions of different texture complexity in the image to be processed can be given compression processing adapted to their texture complexity, improving the overall compression effect of the image to be processed.
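As an illustration of the first aspect, the following Python sketch wires the pieces together: non-overlapping division, a gradient-based texture measure, and per-region model selection. The tile size, the two-level high/low split, and the threshold value are assumptions for illustration only, not details taken from the application.

```python
import numpy as np

def split_into_regions(img, tile=128):
    """Non-overlapping region images covering every pixel of a 2-D
    grayscale array (edge tiles may be smaller than `tile`)."""
    h, w = img.shape[:2]
    return [img[r:r + tile, c:c + tile]
            for r in range(0, h, tile)
            for c in range(0, w, tile)]

def texture_complexity(region):
    """Median per-pixel gradient magnitude as a simple texture measure."""
    gy, gx = np.gradient(region.astype(np.float64))
    return float(np.median(np.hypot(gx, gy)))

def compress_adaptive(img, models, threshold=4.0):
    """models: dict mapping a complexity level to a compression callable.
    The two-level split and the threshold value are illustrative assumptions."""
    results = []
    for region in split_into_regions(img):
        level = "high" if texture_complexity(region) > threshold else "low"
        results.append((level, models[level](region)))  # complexity -> model
    return results
```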
  • The method may further include: using an image decompression model corresponding to the image compression model used for compressing each area image, decompressing the image features obtained by compressing that area image, to obtain a region decompressed image corresponding to each area image; and performing splicing processing and optimization processing on the multiple region decompressed images to obtain a restored to-be-processed image, where the optimization processing includes adjusting the edges of each region decompressed image.
  • Because each image decompression model handles a different texture complexity, when the region decompressed images obtained by decompression are spliced together, two adjacent decompressed images may show line discontinuities or color differences after splicing.
  • Through the optimization processing, the degree of image distortion between the complete image after compression, decompression, and splicing and the image before processing can be made smaller.
  • Determining the texture complexity information corresponding to each area image in the multiple area images in the image to be processed may include: calculating the gradient size of each pixel in each area image, and determining the texture complexity information of each area image according to the gradient sizes of the pixels.
  • the magnitude of the gradient of a pixel can be determined based on the brightness of the pixel or other representations of color.
  • the texture complexity of the region image can be represented by the median or average of the gradient sizes of each pixel in the region image.
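The application does not fix a particular gradient operator; the following sketch assumes simple forward differences of the luminance in the horizontal and vertical directions, matching the description of the classification model later in the text:

```python
import numpy as np

def gradient_texture_complexity(luma, use_median=True):
    """Texture complexity of a region from per-pixel gradient magnitudes.

    `luma` is assumed to be a 2-D array of pixel luminance values; forward
    differences approximate the horizontal and vertical gradients.
    """
    y = luma.astype(np.float64)
    gx = np.zeros_like(y)
    gy = np.zeros_like(y)
    gx[:, :-1] = y[:, 1:] - y[:, :-1]   # horizontal luminance difference
    gy[:-1, :] = y[1:, :] - y[:-1, :]   # vertical luminance difference
    mag = np.hypot(gx, gy)              # per-pixel gradient magnitude
    return float(np.median(mag) if use_median else np.mean(mag))
```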
  • the method further includes: dividing the image to be processed into the multiple area images, the multiple area images do not overlap, and the multiple area images Include all pixels in the image to be processed.
  • Because the multiple area images include all the pixels in the to-be-processed image and do not overlap, no image content is encoded twice, which can reduce the bit rate.
  • In a second aspect, a neural network training method is provided, comprising: determining texture complexity information corresponding to each training area image in a plurality of training area images in a training image; determining, according to the texture complexity information corresponding to each training area image, a codec model corresponding to each training area image, wherein different texture complexity information corresponds to different codec models, and each codec model is used to compress the input training area image and decompress the compression result to obtain multiple decompressed training area images; and adjusting the parameters of the codec models according to the rate distortion obtained from the decompressed training area images and the training area images.
  • In this way, each codec is trained on images of its corresponding texture complexity, so that the compression performance of each codec for images of that texture complexity becomes better.
  • In different areas of an image, the texture complexity may not be the same. For example, in background areas such as sky and beach, the texture complexity is low; in areas of interest or foreground areas including objects such as people, the texture complexity is high. Dividing a complete image into multiple regions and using the image of each region to train a codec makes the training data better match the texture complexity associated with that codec, thereby improving each codec's compression performance for images of the corresponding texture complexity.
  • Optionally, the multiple decompressed training area images are stitched and optimized through a fusion model to obtain a training restoration image, where the optimization processing includes adjusting the edge of at least one decompressed training area image; and the parameters of the fusion model are adjusted according to the degree of image distortion between the training restoration image and the training image.
  • the image processed by the codec can be less distorted after splicing.
  • the parameters of the codec model may also be adjusted according to the degree of image distortion between the training restoration image and the training image.
  • The neural network models required in the image processing process can thus be trained in an "end-to-end" fashion.
  • rate-distortion can be determined according to the degree of image distortion and the bit rate.
  • the bit rate is used to indicate the compression degree of the image, which can be determined according to the compression result of the codec model.
  • the fusion model can also adjust other regions other than the edges of the decompressed training region image.
  • Determining the texture complexity information corresponding to each training area image in the multiple training area images in the training image may include: calculating the gradient size of each pixel in each training area image, and determining the texture complexity information of each training area image according to the gradient sizes of the pixels.
  • The method may further include: dividing the training image into the multiple training area images, where the multiple training area images do not overlap and include all the pixels in the training image.
  • In a third aspect, an electronic device and an image processing apparatus are provided, comprising a storage module and a processing module. The storage module is used to store program instructions; when the program instructions are executed in the processor, the processing module is used to: determine the texture complexity information corresponding to each area image in the multiple area images in the image to be processed; determine, according to the texture complexity information corresponding to each area image, the image compression model corresponding to each area image, wherein different texture complexity information corresponds to different image compression models; and compress each area image by using the image compression model corresponding to that area image.
  • Optionally, the processing module is further configured to: use an image decompression model corresponding to the image compression model used for compressing each area image to decompress the image features obtained by compressing that area image, to obtain a region decompressed image corresponding to each area image; and perform splicing processing and optimization processing on the multiple region decompressed images to obtain the restored to-be-processed image, where the optimization processing includes performing pixel adjustment on the edges of the region decompressed images.
  • Optionally, the processing module is further configured to: calculate the gradient size of each pixel in each area image, and determine the texture complexity information of each area image according to the gradient sizes of the pixels.
  • Optionally, the processing module is further configured to: divide the image to be processed into the multiple area images, where the multiple area images do not overlap and include all the pixels in the image to be processed.
  • In a fourth aspect, a neural network training device is provided, comprising a storage module and a processing module. The storage module is used for storing program instructions; when the program instructions are executed in the processor, the processing module is used to: determine the texture complexity information corresponding to each training area image in the multiple training area images in the training image; determine, according to the texture complexity information corresponding to each training area image, the codec model corresponding to each training area image, wherein different texture complexity information corresponds to different codec models, and each codec model is used to compress the input training area image and decompress the compression result to obtain multiple decompressed training area images; and adjust the parameters of the codec models according to the rate distortion obtained from the decompressed training area images and the training area images.
  • Optionally, the processing module is further configured to: perform splicing processing and optimization processing on the plurality of decompressed training area images through a fusion model to obtain a training restoration image, where the optimization processing includes performing pixel adjustment on the edge of at least one decompressed training area image; and adjust the parameters of the fusion model according to the degree of image distortion between the training restoration image and the training image.
  • the parameters of the codec model may also be adjusted according to the degree of image distortion between the training restoration image and the training image.
  • Optionally, the processing module is further configured to: calculate the gradient size of each pixel in each training area image, and determine the texture complexity information of each training area image according to the gradient sizes of the pixels.
  • Optionally, the processing module is further configured to: divide the training image into the multiple training area images, where the multiple training area images do not overlap and include all the pixels in the training image.
  • In a fifth aspect, an electronic device is provided, comprising a memory and a processor, wherein the memory is used for storing program instructions; when the program instructions are executed in the processor, the processor is used to execute the method described in the first aspect or the second aspect.
  • The processor in the fifth aspect above may be either a central processing unit (CPU), or a combination of a CPU and a neural network computing processor, where the neural network computing processor may include a graphics processing unit (GPU), a neural-network processing unit (NPU), a tensor processing unit (TPU), and so on.
  • TPU is Google's fully customized artificial intelligence accelerator application-specific integrated circuit for machine learning.
  • In a sixth aspect, a computer-readable medium is provided, storing program code for execution by a device, the program code comprising instructions for performing the method in any one of the implementations of the first aspect or the second aspect.
  • In a seventh aspect, a computer program product comprising instructions is provided; when the computer program product is run on a computer, the computer is caused to execute the method in any one of the implementations of the first aspect or the second aspect.
  • In an eighth aspect, a chip is provided, comprising a processor and a data interface; the processor reads, through the data interface, instructions stored in a memory, and executes the method in any one of the implementations of the first aspect or the second aspect.
  • Optionally, the chip may further include a memory in which instructions are stored; the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to execute the method in any one of the implementations of the first aspect or the second aspect.
  • the above chip may specifically be a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
  • FIG. 1 is a schematic structural diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a convolutional neural network according to an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of another convolutional neural network provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a hardware structure of a chip according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of an image processing system.
  • FIG. 7 is a schematic structural diagram of an image processing system provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a neural network training method provided by an embodiment of the present application.
  • FIG. 9 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of the compression performance improvement achieved by the image processing method of the embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of a neural network training apparatus provided by an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of a neural network training apparatus according to an embodiment of the present application.
  • A neural network can be composed of neural units. A neural unit can be an operation unit that takes x_s and an intercept 1 as inputs, and the output of the operation unit can be: h_{W,b}(x) = f(∑_s W_s · x_s + b), where W_s is the weight of x_s and b is the bias of the neural unit.
  • f is an activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
  • a neural network is a network formed by connecting a plurality of the above single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field can be an area composed of several neural units.
  • a deep neural network also known as a multi-layer neural network, can be understood as a neural network with multiple hidden layers.
  • The layers of the DNN can be divided, according to their positions, into three categories: input layer, hidden layers, and output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and the layers in between are all hidden layers.
  • the layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
  • Although a DNN looks complicated, the work of each layer is not complicated. In short, each layer is just the linear relationship expression y = α(W · x + b), where x is the input vector, y is the output vector, b is the offset vector, W is the weight matrix (also called coefficients), and α() is the activation function.
  • Each layer simply performs this operation on an input vector x to obtain an output vector y. Because the number of DNN layers is large, the numbers of coefficient matrices W and offset vectors b are also large.
  • These parameters are defined in the DNN as follows, taking the coefficient W as an example: suppose that in a three-layer DNN, the linear coefficient from the fourth neuron in the second layer to the second neuron in the third layer is defined as W^3_{24}, where the superscript 3 represents the layer of the coefficient W, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer.
  • In summary, the coefficient from the k-th neuron in layer L-1 to the j-th neuron in layer L is defined as W^L_{jk}.
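A minimal numeric sketch of this notation, using NumPy and a sigmoid activation; the layer sizes and random values are arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One fully connected layer computes y = alpha(W @ x + b).
# W[j, k] is the coefficient from neuron k of the previous layer
# to neuron j of this layer, matching the W^L_jk indexing above.
rng = np.random.default_rng(0)
x = rng.normal(size=4)            # input vector
W2 = rng.normal(size=(5, 4))      # layer 2 weight matrix
b2 = rng.normal(size=5)           # layer 2 offset vector
W3 = rng.normal(size=(3, 5))      # layer 3 weight matrix
b3 = rng.normal(size=3)

h = sigmoid(W2 @ x + b2)          # hidden layer output
y = sigmoid(W3 @ h + b3)          # network output
print(y.shape)                    # (3,)
```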
  • the input layer does not have a W parameter.
  • more hidden layers allow the network to better capture the complexities of the real world.
  • a model with more parameters is more complex and has a larger "capacity", which means that it can complete more complex learning tasks.
  • Training the deep neural network is the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (the weight matrix formed by the vectors W of many layers).
  • Convolutional neural network is a deep neural network with a convolutional structure.
  • A convolutional neural network contains a feature extractor composed of convolutional layers and subsampling layers, and the feature extractor can be viewed as a filter.
  • the convolutional layer refers to the neuron layer in the convolutional neural network that convolves the input signal.
  • In a convolutional layer of a convolutional neural network, a neuron can be connected to only part of the neurons in the adjacent layer.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some neural units arranged in a rectangle. Neural units in the same feature plane share weights, and the shared weights here are convolution kernels. Shared weights can be understood as the way to extract image information is independent of location.
  • the convolution kernel can be initialized in the form of a matrix of random size, and the convolution kernel can obtain reasonable weights by learning during the training process of the convolutional neural network.
  • the immediate benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
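To make weight sharing concrete, the following unoptimized sketch slides one shared kernel over an image; the example edge kernel is an assumption, not a kernel from the application:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a single shared kernel over the image ('valid' padding).
    The same weight matrix is applied at every spatial location, which
    is the weight sharing described above."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.empty((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

edge_kernel = np.array([[-1.0, 0.0, 1.0]] * 3)    # simple horizontal-edge extractor
feature_map = conv2d_valid(np.random.rand(8, 8), edge_kernel)
print(feature_map.shape)                           # (6, 6)
```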
  • Recurrent neural networks are used to process sequence data.
  • In a traditional neural network, the layers are fully connected, while the nodes within each layer are unconnected.
  • Although such an ordinary neural network solves many problems, it is still powerless for many others. For example, to predict the next word of a sentence, the previous words are generally needed, because the words in a sentence are not independent of each other. RNN is called a recurrent neural network because the current output of a sequence is also related to the previous outputs.
  • RNN can process sequence data of any length.
  • the training of RNN is the same as the training of traditional CNN or DNN.
  • the neural network can use the error back propagation (BP) algorithm to correct the size of the parameters in the initial neural network model during the training process, so that the reconstruction error loss of the neural network model becomes smaller and smaller.
  • BP error back propagation
  • In the back propagation algorithm, the input signal is passed forward until the output generates an error loss; the parameters in the initial neural network model are then updated by back-propagating the error loss information, so that the error loss converges.
  • the back-propagation algorithm is a back-propagation movement dominated by error loss, aiming to obtain the parameters of the optimal neural network model, such as the weight matrix.
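As a schematic illustration of one such update (a single linear layer with a squared error loss is assumed purely for brevity):

```python
import numpy as np

# One layer, squared error loss: L = 0.5 * ||W @ x - t||^2.
# Backpropagation gives dL/dW = outer(W @ x - t, x); gradient descent
# then moves W against that gradient so the error loss shrinks.
rng = np.random.default_rng(1)
W = rng.normal(size=(3, 4))        # initial weight matrix
x = rng.normal(size=4)             # fixed input
t = rng.normal(size=3)             # target output

lr = 0.5 / float(x @ x)            # step size chosen small enough to converge
for _ in range(100):
    err = W @ x - t                # forward pass error
    W -= lr * np.outer(err, x)     # backward pass: update weights
print(np.linalg.norm(W @ x - t))   # approaches 0
```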
  • the pixel value of the image can be a red-green-blue (RGB) color value, and the pixel value can be a long integer representing the color.
  • For example, the pixel value may be 256*Red + 100*Green + 76*Blue, where Blue represents the blue component, Green represents the green component, and Red represents the red component. In each color component, a smaller value indicates lower brightness, and a larger value indicates higher brightness.
  • the pixel values can be grayscale values.
  • Image compression refers to the technology of representing the original pixel matrix lossy or lossless with fewer bits, also known as image coding.
  • Image compression which performs transformations on the image content, can reduce the amount of data required to represent a digital image, thereby reducing the space occupied by image storage.
  • Image data can be compressed because there is redundancy in the data.
  • The redundancy of image data is mainly manifested as: spatial redundancy caused by the correlation between adjacent pixels in the image; temporal redundancy caused by the correlation between different frames in an image sequence; and spectral redundancy caused by the correlation between different color planes or spectral bands.
  • the purpose of data compression is to reduce the number of bits required to represent data by removing these data redundancies. Due to the huge amount of image data, it is very difficult to store, transmit and process, so the compression of image data is very important.
  • Image decompression is the inverse process of image compression, which can also be called decompression or decoding.
  • image decoding the information format of the input compact representation can be restored as an image.
  • the peak signal-to-noise ratio (PSNR) between the original image and the encoded reconstructed image is used to measure the image distortion.
  • PSNR can be the PSNR of the luminance, or a linear combination of the PSNR of the luminance and the PSNR of the chrominance.
  • Usually, the PSNR of the luminance (Y-PSNR) is used as the main criterion.
  • The peak signal is the maximum possible pixel value in the image (for example, the maximum pixel brightness), and the noise is the mean square error between the pixel values of the original image and the reconstructed image (the squared differences are averaged); the PSNR is obtained by converting the ratio of the two.
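A hedged sketch of this PSNR computation, assuming 8-bit pixels with a peak value of 255:

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """PSNR between original and reconstructed images: the peak signal is
    the maximum possible pixel value, the noise is the mean square error."""
    mse = np.mean((original.astype(np.float64) -
                   reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```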
  • The code rate (rate), also known as the encoding bit rate, can be the average amount of data per pixel in the compressed data (bits per pixel, bpp), which is used to indicate the degree of data compression.
  • The code rate can be determined according to the proportion of the data volume after image compression.
  • Rate distortion is used to express the relationship between image distortion and bit rate.
  • Rate distortion optimization refers to reducing image distortion and bit rate as much as possible according to preset rules. That is to say, in the case of a bit rate as small as possible, the distortion of the obtained image can be reduced as much as possible, so as to achieve a better compression effect.
  • rate-distortion optimization a balance point can be found between bit rate and distortion, so that the compression effect is optimal.
  • Alternatively, the rule for rate-distortion optimization may be that the distortion is smallest while the code rate is kept below an upper limit, or that the code rate is smallest while the distortion is kept below a given limit, and so on.
  • Rate-distortion can be calculated by the rate-distortion function.
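The application does not give an explicit rate-distortion function; the sketch below assumes the common Lagrangian form J = R + λ·D, with the rate in bits per pixel and the distortion as mean square error:

```python
import numpy as np

def rate_distortion_cost(compressed_bits, original, reconstructed, lam=0.01):
    """Lagrangian rate-distortion cost J = R + lambda * D, with R in bits
    per pixel (bpp) and D the mean square error over a 2-D grayscale image.
    The form and the value of lambda are illustrative assumptions."""
    num_pixels = original.shape[0] * original.shape[1]
    rate_bpp = compressed_bits / num_pixels
    distortion = float(np.mean((original.astype(np.float64) -
                                reconstructed.astype(np.float64)) ** 2))
    return rate_bpp + lam * distortion
```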
  • an embodiment of the present application provides a system architecture 100 .
  • a data collection device 160 is used to collect training data.
  • the training data may include training images.
  • the data collection device 160 After collecting the training data, the data collection device 160 stores the training data in the database 130 , and the training device 120 obtains the target model/rule 101 by training based on the training data maintained in the database 130 .
  • The training device 120 processes the input original image and compares the output image with the original image, until the rate-distortion determined by the difference between the image output by the training device 120 and the original image is less than a certain threshold, at which point the training of the target model/rule 101 is completed.
  • the above target model/rule 101 can be used to implement the image processing method of the embodiment of the present application.
  • the target model/rule 101 in this embodiment of the present application may specifically be a neural network.
  • the training data maintained in the database 130 may not necessarily come from the collection of the data collection device 160, and may also be received from other devices.
  • the training device 120 may not necessarily train the target model/rule 101 completely based on the training data maintained by the database 130, and may also obtain training data from the cloud or other places for model training.
  • The above description should not be taken as a limitation on the embodiments of the present application.
  • The target model/rule 101 trained by the training device 120 can be applied to different systems or devices, such as the execution device 110 shown in FIG. 1. The execution device 110 can be a terminal such as a laptop, an augmented reality (AR)/virtual reality (VR) device, or an in-vehicle terminal, and can also be a server or the cloud.
  • The execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices. A user can input data to the I/O interface 112 through the client device 140; in this embodiment of the present application, the input data may include an image to be processed input by the client device.
  • The preprocessing module 113 and the preprocessing module 114 are used to perform preprocessing according to the input data (such as the image to be processed) received by the I/O interface 112.
  • The preprocessing module 113 and the preprocessing module 114 may also be absent (or only one of them may be present), and the calculation module 111 may be used directly to process the input data.
  • When the execution device 110 preprocesses the input data, or when the calculation module 111 of the execution device 110 performs calculation and other related processing, the execution device 110 can call data, code, and the like in the data storage system 150 for the corresponding processing, and the data and instructions obtained by that processing may also be stored in the data storage system 150.
  • the I/O interface 112 returns the processing result, such as the above-obtained image classification result, to the client device 140 so as to be provided to the user.
  • the training device 120 can generate corresponding target models/rules 101 based on different training data for different goals or tasks, and the corresponding target models/rules 101 can be used to achieve the above goals or complete The above task, thus providing the user with the desired result.
  • the user can manually specify the input data, which can be operated through the interface provided by the I/O interface 112 .
  • the client device 140 can automatically send the input data to the I/O interface 112 . If the user's authorization is required to request the client device 140 to automatically send the input data, the user can set the corresponding permission in the client device 140 .
  • the user can view the result output by the execution device 110 on the client device 140, and the specific presentation form can be a specific manner such as display, sound, and action.
  • the client device 140 can also be used as a data collection terminal to collect the input data of the input I/O interface 112 and the output result of the output I/O interface 112 as new sample data as shown in the figure, and store them in the database 130 .
  • the I/O interface 112 directly uses the input data input into the I/O interface 112 and the output result of the output I/O interface 112 as shown in the figure as a new sample The data is stored in database 130 .
  • FIG. 1 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship among the devices, devices, modules, etc. shown in the figure does not constitute any limitation.
  • For example, in FIG. 1, the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 may also be placed in the execution device 110.
  • the target model/rule 101 is obtained by training the training device 120.
  • the target model/rule 101 may be the neural network in the present application in this embodiment of the present application.
  • The neural network in this embodiment of the present application may be, for example, a convolutional neural network (CNN), a deep convolutional neural network (DCNN), or a recurrent neural network (RNN).
  • CNN is a very common neural network
  • a convolutional neural network is a deep neural network with a convolutional structure and a deep learning architecture.
  • A deep learning architecture refers to performing multiple levels of learning at different levels of abstraction through machine learning algorithms.
  • CNN is a feed-forward artificial neural network in which individual neurons can respond to images fed into it.
  • a convolutional neural network (CNN) 200 may include an input layer 210 , a convolutional/pooling layer 220 (where the pooling layer is optional), and a neural network layer 230 .
  • the input layer 210 can obtain the image to be processed, and pass the obtained image to be processed by the convolution layer/pooling layer 220 and the subsequent neural network layer 230 for processing, and the processing result of the image can be obtained.
  • the internal layer structure in the CNN 200 in Figure 2 is described in detail below.
  • The convolutional/pooling layer 220 may include layers 221-226. For example, in one implementation, layer 221 is a convolutional layer, layer 222 is a pooling layer, layer 223 is a convolutional layer, layer 224 is a pooling layer, layer 225 is a convolutional layer, and layer 226 is a pooling layer; in another implementation, layers 221 and 222 are convolutional layers, layer 223 is a pooling layer, layers 224 and 225 are convolutional layers, and layer 226 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
  • the convolution layer 221 may include many convolution operators.
  • the convolution operator is also called a kernel. Its role in image processing is equivalent to a filter that extracts specific information from the input image matrix.
  • The convolution operator can essentially be a weight matrix, which is usually predefined. During a convolution operation on an image, the weight matrix is usually moved along the horizontal direction of the input image one pixel at a time (or two pixels at a time, depending on the value of the stride), so as to extract a specific feature from the image.
  • the size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image.
  • The convolution outputs of multiple weight matrices are stacked to form the depth dimension of the convolved image, where that dimension can be understood as being determined by the number of weight matrices described above.
  • Different weight matrices can be used to extract different features in the image. For example, one weight matrix is used to extract image edge information, another weight matrix is used to extract specific colors of the image, and another weight matrix is used to extract unwanted noise in the image.
  • When the multiple weight matrices have the same size (rows × columns), the convolution feature maps extracted by them also have the same size, and the multiple extracted convolution feature maps of the same size are then combined to form the output of the convolution operation.
  • The weight values in these weight matrices need to be obtained through extensive training in practical applications, and each weight matrix formed by the trained weight values can be used to extract information from the input image, so that the convolutional neural network 200 can make correct predictions.
  • When the convolutional neural network 200 has multiple convolutional layers, the initial convolutional layers (e.g., 221) tend to extract more general, low-level features; as the depth of the network increases, the features extracted by the later convolutional layers (e.g., 226) become more and more complex, such as features with high-level semantics.
  • Features with higher semantics are more suitable for the problem to be solved.
  • A pooling layer often follows a convolutional layer: it can be one convolutional layer followed by one pooling layer, or multiple convolutional layers followed by one or more pooling layers.
  • the pooling layer may include an average pooling operator and/or a max pooling operator for sampling the input image to obtain a smaller size image.
  • the average pooling operator can calculate the pixel values in the image within a certain range to produce an average value as the result of average pooling.
  • the max pooling operator can take the pixel with the largest value within a specific range as the result of max pooling. Also, just as the size of the weight matrix used in the convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size.
  • the size of the output image after processing by the pooling layer can be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average or maximum value of the corresponding sub-region of the image input to the pooling layer.
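As a small illustration of both operators, the following sketch implements 2×2 average and max pooling over non-overlapping windows (input dimensions divisible by 2 are assumed):

```python
import numpy as np

def pool2x2(image, mode="max"):
    """Each output pixel is the max or average of a 2x2 sub-region,
    so the output is half the input size in each dimension."""
    h, w = image.shape
    blocks = image.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

x = np.arange(16, dtype=np.float64).reshape(4, 4)
print(pool2x2(x, "max"))    # 2x2 output of window maxima
print(pool2x2(x, "avg"))    # 2x2 output of window averages
```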
  • After processing by the convolutional layer/pooling layer 220, the convolutional neural network 200 is not yet able to output the required output information, because, as mentioned before, the convolutional layer/pooling layer 220 only extracts features and reduces the parameters brought by the input image. To generate the final output information (the required class information or other relevant information), the convolutional neural network 200 needs to use the neural network layer 230 to generate one output or a set of outputs of the required number of classes. Therefore, the neural network layer 230 may include multiple hidden layers (231, 232 to 23n as shown in FIG. 2) and the output layer 240; the parameters contained in the multiple hidden layers may be pre-trained based on relevant training data of a specific task type, where the task type can include, for example, image recognition, image classification, and image super-resolution reconstruction.
  • After the multiple hidden layers in the neural network layer 230, the last layer of the entire convolutional neural network 200 is the output layer 240. The output layer 240 has a loss function similar to the categorical cross-entropy, specifically used to calculate the prediction error.
  • Once the forward propagation of the entire convolutional neural network 200 (as shown in Figure 2, the propagation in the direction from 210 to 240) is completed, the back propagation (as shown in Figure 2, the propagation in the direction from 240 to 210) starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 200 and the error between the result output by the convolutional neural network 200 through the output layer and the ideal result.
  • a convolutional neural network (CNN) 200 may include an input layer 110 , a convolutional/pooling layer 120 (where the pooling layer is optional), and a neural network layer 130 .
  • Compared with FIG. 2, the multiple convolutional layers/pooling layers in the convolutional layer/pooling layer 120 in FIG. 3 are parallel, and the separately extracted features are all input to the neural network layer 130 for processing.
  • It should be noted that the convolutional neural networks shown in FIG. 2 and FIG. 3 are only examples of two possible convolutional neural networks for the image processing method of the embodiments of the present application; in specific applications, the convolutional neural network used in the image processing method may also exist in the form of other network models.
  • FIG. 4 is a hardware structure of a chip provided by an embodiment of the application, and the chip includes a neural network processor 50 .
  • the chip can be set in the execution device 110 as shown in FIG. 1 to complete the calculation work of the calculation module 111 .
  • the chip can also be set in the training device 120 as shown in FIG. 1 to complete the training work of the training device 120 and output the target model/rule 101 .
  • the algorithms of each layer in the convolutional neural network shown in Figures 2 and 3 can be implemented in the chip shown in Figure 4.
  • the neural network processor NPU 50 is mounted on the main central processing unit (CPU) (host CPU) as a coprocessor, and tasks are allocated by the main CPU.
  • the core part of the NPU is the operation circuit 503, and the controller 504 controls the operation circuit 503 to extract the data in the memory (weight memory or input memory) and perform operations.
  • the arithmetic circuit 503 includes multiple processing units (process engines, PEs). In some implementations, arithmetic circuit 503 is a two-dimensional systolic array. The arithmetic circuit 503 may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, arithmetic circuit 503 is a general-purpose matrix processor.
  • the operation circuit fetches the data corresponding to the matrix B from the weight memory 502 and buffers it on each PE in the operation circuit.
  • the arithmetic circuit fetches the data of the matrix A from the input memory 501 and performs the matrix operation on the matrix B, and stores the partial result or the final result of the matrix in the accumulator 508.
  • the vector calculation unit 507 can further process the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison and so on.
  • In some implementations, the vector computation unit 507 can be used for network computation of non-convolutional/non-FC layers in the neural network, such as pooling, batch normalization, and local response normalization.
  • vector computation unit 507 can store the processed output vectors to unified buffer 506 .
  • the vector calculation unit 507 may apply a nonlinear function to the output of the arithmetic circuit 503, such as a vector of accumulated values, to generate activation values.
  • vector computation unit 507 generates normalized values, merged values, or both.
  • the vector of processed outputs can be used as activation input to the arithmetic circuit 503, eg, for use in subsequent layers in a neural network.
  • Unified memory 506 is used to store input data and output data.
  • The direct memory access controller (DMAC) 505 transfers input data in the external memory to the input memory 501 and/or the unified memory 506, stores weight data in the external memory into the weight memory 502, and stores data in the unified memory 506 into the external memory.
  • a bus interface unit (BIU) 510 is used to realize the interaction between the main CPU, the DMAC and the instruction fetch memory 509 through the bus.
  • the instruction fetch memory (instruction fetch buffer) 509 connected with the controller 504 is used to store the instructions used by the controller 504;
  • the controller 504 is used for invoking the instructions cached in the memory 509 to control the working process of the operation accelerator.
  • The unified memory 506, the input memory 501, the weight memory 502, and the instruction fetch memory 509 are all on-chip memories; the external memory is memory outside the NPU, and can be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM), or another readable and writable memory.
  • each layer in the convolutional neural network shown in FIG. 2 and FIG. 3 may be performed by the operation circuit 503 or the vector calculation unit 507 .
  • The execution device 110 in FIG. 1 described above can execute the steps of the image processing method of the embodiments of the present application; the CNN models shown in FIG. 2 and FIG. 3 and the chip shown in FIG. 4 can also be used to execute those steps.
  • The neural network training method and the image processing method of the embodiments of the present application are described in detail below with reference to the accompanying drawings.
  • an embodiment of the present application provides a system architecture 300 .
  • the system architecture includes a local device 301, a local device 302, an execution device 210 and a data storage system 250, wherein the local device 301 and the local device 302 are connected with the execution device 210 through a communication network.
  • the execution device 210 may be implemented by one or more servers.
  • the execution device 210 may be used in conjunction with other computing devices, such as data storage, routers, load balancers and other devices.
  • the execution device 210 may be arranged on one physical site, or distributed across multiple physical sites.
  • the execution device 210 may use the data in the data storage system 250 or call the program code in the data storage system 250 to implement the image processing method in this embodiment of the present application.
  • The execution device 210 may perform the following process: determining the texture complexity information corresponding to each area image in the multiple area images in the image to be processed; determining, according to the texture complexity information corresponding to each area image, an image compression model corresponding to each area image, wherein different texture complexity information corresponds to different image compression models; and compressing each area image by using the image compression model corresponding to that area image.
  • By using, for each region image, the compression model corresponding to its texture complexity, the compression effect for the to-be-processed image can be improved.
  • a user may operate respective user devices (eg, local device 301 and local device 302 ) to interact with execution device 210 .
  • Each local device may represent any computing device, such as a personal computer, computer workstation, smartphone, tablet, smart camera, smart car or other type of cellular phone, media consumption device, wearable device, set-top box, gaming console, and the like.
  • Each user's local device can interact with the execution device 210 through any communication mechanism/standard communication network, which can be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof.
  • In one implementation, the local device 301 and the local device 302 obtain the relevant parameters of the target neural network from the execution device 210, deploy the target neural network on the local device 301 and the local device 302, and use the target neural network for image classification, image processing, or the like.
  • In another implementation, the target neural network can be directly deployed on the execution device 210; the execution device 210 obtains the images to be processed from the local device 301 and the local device 302, and performs classification or other types of image processing on the images to be processed according to the target neural network.
  • The above execution device 210 may also be a cloud device, in which case it may be deployed in the cloud; alternatively, it may be a terminal device, in which case it may be deployed on the user terminal side. This is not limited in this embodiment of the present application.
  • FIG. 6 shows a schematic structural diagram of an image processing system.
  • the image processing system 600 includes an encoder 610 and a quantization module 620 at the encoding end, and a decoder 630 at the decoding end.
  • the encoder 610 and the decoder 630 are neural networks.
  • the image processing system 600 may be applied in the scenario of transmitting and storing images.
  • the encoder 610 and the quantization module 620 at the encoding end may be set in a server in the cloud.
  • The image data undergoes an image encoding process in the cloud, resulting in compressed data in a compact representation.
  • Storing compressed data can reduce the storage space occupied by saving images.
  • the transmission of compressed data can reduce the occupation of transmission resources in the image transmission process and reduce the demand for bandwidth.
  • the decoder 630 on the decoding side may be provided in a terminal device serving as a client.
  • the decoding end performs a decoding operation on the compressed data to obtain a reconstructed image.
  • the terminal device can display the reconstructed image through a display.
  • the encoder 610 is used for extracting features of the image to be processed, so as to obtain image features.
  • The quantization module 620 is used for quantizing image features to obtain compressed data. Quantization, in the field of digital signal processing, is the process of approximating the continuous values of a signal (or a large number of possible discrete values) by a finite number (or fewer) of discrete values.
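As an illustration of quantization in this sense, a minimal uniform scalar quantizer; the step size is an assumed parameter, not one from the application:

```python
import numpy as np

def quantize(features, step=0.5):
    """Uniform scalar quantization: continuous feature values are mapped
    to a finite set of integer indices (the compact representation)."""
    return np.round(features / step).astype(np.int32)

def dequantize(indices, step=0.5):
    """Approximate inverse used at the decoding end."""
    return indices.astype(np.float64) * step

f = np.array([0.07, 1.24, -0.9, 2.51])
q = quantize(f)             # [0, 2, -2, 5]
print(dequantize(q))        # [0.0, 1.0, -1.0, 2.5]
```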
  • the cloud can transmit the compressed data to the client.
  • the decoder 630 is used to decompress the compressed data to obtain a reconstructed image.
  • the image processing system 600 uses the entire image as input, and performs nonlinear transformation on the image to reduce the correlation between codewords and improve the compression performance of the neural network.
  • Texture in computer graphics includes both the texture of an object's surface in the usual sense, that is, grooves that make the surface uneven, and color patterns on a smooth surface of the object. Texture complexity can be used to reflect how strongly pixel values vary in an image. Different categories of images differ significantly in texture details and other characteristics, and image content with different texture complexity differs considerably.
  • the image processing system 600 uses the same encoder to perform the same processing for images with different texture complexities, which hinders further improvement of the compression performance.
  • embodiments of the present application provide an image processing system to improve image compression performance.
  • FIG. 7 is a schematic structural diagram of an image processing system provided by an embodiment of the present application.
  • In the image processing system provided by the embodiment of the present application, the neural network corresponding to the image texture complexity is selected from multiple neural networks. This structure adaptively adjusts the selection of the neural network according to the image content, and compresses images according to their different texture characteristics, achieving a further improvement in image compression performance.
  • Image processing system 700 includes compression system 710 and decompression system 720 .
  • the compression system 710 includes a segmentation model 711 , a classification model 712 , and a compression model 713
  • the decompression system 720 includes a decompression module 721 and a fusion module 722 .
  • Compression system 710 and decompression system 720 may be located in the same or different devices.
  • the image to be processed is input into the compression system 710, and the compression system 710 is used for compressing the image to be processed.
  • the segmentation model 711 can segment the to-be-processed image to obtain multiple region images.
  • Area images can also be referred to as image blocks.
  • the sizes of the plurality of area images may be the same or different.
  • the image can be segmented according to the target size to obtain multiple region images with the same size.
  • the multiple region images do not overlap.
  • For example, the to-be-processed image can be divided into multiple 128×128 area images. Through this non-overlapping division of the image to be processed, a plurality of area images with non-repeating contents is formed.
  • the region images are input into the classification model 712.
  • the classification model 712 is used to calculate the texture complexity of the input image.
  • Specifically, the gradient of each pixel in the image can be calculated, which implements a first-order differential operation on the image and takes directionality into account.
  • The direction of the image gradient is the direction of the maximum rate of change of the image gray level, which can reflect the gray level changes at the edges of the image.
  • the gradient operator always points in the direction of the most drastic transformation.
  • the direction of the gradient operator is orthogonal to the edges in the image.
  • the size of the gradient operator represents the rate of change of the grayscale of the image.
  • the classification model 712 can calculate, for the input image, the difference values of each pixel's luminance in the horizontal and vertical directions. From these two difference values, the gradient magnitude of the pixel can be calculated. From the gradient magnitudes of the individual pixels, the average or median gradient magnitude of the image can be determined. The average or median gradient magnitude indicates how smooth the image is, reflecting its texture complexity; this computation is sketched below.
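A minimal sketch of this texture-complexity measure, assuming a grayscale numpy image; the use of forward differences and the zero boundary handling are assumptions, since the patent only specifies horizontal and vertical luminance differences:

```python
import numpy as np

def mean_gradient_magnitude(image: np.ndarray) -> float:
    """Average per-pixel gradient magnitude from first-order luminance differences."""
    img = image.astype(np.float64)
    dx = np.zeros_like(img)
    dy = np.zeros_like(img)
    dx[:, :-1] = img[:, 1:] - img[:, :-1]  # horizontal (x) difference
    dy[:-1, :] = img[1:, :] - img[:-1, :]  # vertical (y) difference
    grad = np.sqrt(dx ** 2 + dy ** 2)      # gradient magnitude per pixel
    return float(grad.mean())              # average reflects texture complexity
```

The median of `grad` could be used in place of the mean, as the text notes.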
  • the compression module 713 is configured to perform image feature extraction on the image according to the texture complexity of the image by using a compression model corresponding to the texture complexity, so as to realize the compression of the image.
  • the compression module 713 may include an AI model for extracting features, called a compression model (also called an image compression model), or the compression module 713 may invoke the compression model through an interface to extract image features.
  • the compression model can be a neural network model that is pre-trained.
  • the region image can be input into the compression model to obtain the image features of the region image.
  • the compression model may be, for example, CNN, RNN, or the like.
  • the compression module 713 may store the correspondence between texture complexity and compression model. Therefore, according to the texture complexity of the area image, the compression module 713 may determine the compression model corresponding to that texture complexity from a plurality of compression models and use it to process the area image; a threshold-based lookup of this kind is sketched below.
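A minimal sketch of such a correspondence, assuming a two-way split by a gradient-magnitude threshold; the threshold value and the model names are illustrative assumptions, and the patent allows two or more complexity classes:

```python
COMPLEXITY_THRESHOLD = 20.0  # assumed mean-gradient threshold, not from the patent

def select_compression_model(mean_gradient: float, models: dict):
    """Map an area image's texture complexity to its compression model."""
    key = "complex" if mean_gradient >= COMPLEXITY_THRESHOLD else "simple"
    return models[key]

# usage sketch:
# model = select_compression_model(mean_gradient_magnitude(tile),
#                                  {"simple": simple_codec, "complex": complex_codec})
```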
  • the image features of each regional image can be obtained.
  • Compression system 710 may also include quantization models and the like.
  • the quantization model can quantize the image features.
  • the decompression system 720 is used to decompress the processing result of the compression system 710 .
  • the image features processed by the compression system 710 are input into the decompression module 721 .
  • the image features processed by the compression system 710 may be the image features output by the compression module 713 .
  • if the compression system 710 further includes a quantization model, the image features processed by the compression system 710 may be quantized image features.
  • the decompression module 721 is used to decompress the image features to obtain a decompressed image.
  • the decompression module 721 may be configured to perform image decompression on the image features according to the texture complexity of the image using a decompression model corresponding to the texture complexity to obtain a decompressed image.
  • the decompression module 721 may receive indication information, where the indication information is used to indicate the decompression model corresponding to each image feature.
  • the decompression module 721 can decompress the image feature by using the decompression model indicated by the indication information.
  • the decompression module 721 may include an AI model for decompressing image features, called a decompression model or an image decompression model, or the decompression module 721 may invoke the decompression model through an interface to decompress the image.
  • the decompression model can be a neural network model that is pre-trained.
  • the image feature and the image texture complexity corresponding to the image feature can be input into the decompression model to obtain the decompressed image of the region.
  • the decompression model can be, for example, CNN, RNN, or the like.
  • the decompression module 721 may store the correspondence between texture complexity and decompression model. Therefore, according to the texture complexity of the regional image, the decompression module 721 can determine the decompression model corresponding to that texture complexity from a plurality of decompression models and use it to process the image features.
  • the restored images of each region can be obtained.
  • the fusion module 722 is used to fuse the restored regional images.
  • the fusion module 722 may include an AI model for image fusion, which is called a fusion model, or the fusion module 722 may call the fusion model through an interface to realize fusion of regional images.
  • the fusion model can be a neural network model that is pre-trained.
  • the restored image of each region can be input into the fusion model to obtain the fused image.
  • the fused image may also be referred to as a reconstructed image or a compressed reconstructed image.
  • the fusion model can be, for example, a CNN or the like.
  • fusing the regional images may consist of splicing (stitching) the regional images together.
  • fusing the regional images may also include adjusting the edge pixels of the regional images, so that the error between the reconstructed image and the image to be processed is smaller and the degree of distortion is reduced; a sketch of such stitching and seam adjustment follows.
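A minimal sketch of stitching restored tiles back into a full image and softening the seams, assuming equally sized tiles in row-major order; the simple seam averaging here is an illustrative stand-in for the learned fusion model described in the text:

```python
import numpy as np

def stitch_tiles(tiles, rows: int, cols: int) -> np.ndarray:
    """Reassemble equally sized tiles (row-major) and soften seam pixels."""
    th, tw = tiles[0].shape[:2]
    out = np.zeros((rows * th, cols * tw) + tiles[0].shape[2:], dtype=np.float64)
    for i, t in enumerate(tiles):
        r, c = divmod(i, cols)
        out[r * th:(r + 1) * th, c * tw:(c + 1) * tw] = t
    for c in range(tw, cols * tw, tw):            # vertical seams
        out[:, c - 1:c + 1] = out[:, c - 1:c + 1].mean(axis=1, keepdims=True)
    for r in range(th, rows * th, th):            # horizontal seams
        out[r - 1:r + 1, :] = out[r - 1:r + 1, :].mean(axis=0, keepdims=True)
    return out
```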
  • by calculating the texture complexity of the regional images, the image processing system 700 uses different compression and decompression models for regional images of different texture complexity, thereby improving image compression performance.
  • the image processing system 700 divides the image to be processed and calculates the texture complexity of the images in different regions, so that the foreground and background of the image to be processed can be processed using different compression models and decompression models, thereby improving image compression performance.
  • the decompression system 720 in the image processing system 700 reduces the degree of image distortion and improves the image compression performance by adjusting the edge pixels of the regional image during regional image fusion.
  • Each AI model used in the image processing system 700 may be obtained through end-to-end training; alternatively, the compression model and the decompression model may be trained first, and then the fusion model may be trained.
  • for the training method of the AI models used in the image processing system 700, reference may be made to the description of FIG. 8.
  • End-to-end training is a machine learning paradigm.
  • the learning process is not divided into artificial sub-problems; instead, the deep learning model learns the mapping from the raw data directly to the desired output.
  • FIG. 8 is a schematic structural diagram of a neural network model training method provided by an embodiment of the present application.
  • the training images to be processed can be obtained.
  • the complete training image to be processed can be divided to obtain multiple training area images.
  • the plurality of training area images do not overlap, and the plurality of training area images include all pixels in the training image.
  • each encoding/decoding (codec) model is used to compress the input training area images and to decompress the compression result, thereby obtaining multiple decompressed training area images.
  • the codec model includes a compression model and a decompression model.
  • the compression model is used to compress the image
  • the decompression model is used to decompress the processing result of the compression model.
  • each training area image is input into the codec model corresponding to it, and the codec model processes the input training area image to obtain the decompressed training area image corresponding to it.
  • the compression model compresses the training area image to obtain the training features of the training area image.
  • the decompression model decodes the training features of the training area image to obtain the decompressed training area image corresponding to the training area image.
  • the parameters of the codec model are adjusted according to the rate distortion obtained from the decompressed training area image and the training area image.
  • the encoding and decoding model can be trained by using the training images.
  • in each iteration, the training area images are processed with the parameter-adjusted codec model until the rate distortion gradually converges, yielding the trained codec model; a sketch of such a training loop follows.
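A minimal PyTorch-style sketch of this rate-distortion training loop for one codec model, assuming the model returns both the reconstruction and an estimated bit rate, and assuming MSE as the distortion measure with trade-off weight `lam` (all names and the exact loss form are illustrative assumptions):

```python
import torch

def train_codec(codec, tiles, lam: float = 0.01, steps: int = 1000, lr: float = 1e-4):
    """Adjust codec parameters until the rate-distortion loss converges."""
    opt = torch.optim.Adam(codec.parameters(), lr=lr)
    for _ in range(steps):
        for x in tiles:                        # training area images (tensors)
            x_hat, rate = codec(x)             # reconstruction and bit-rate estimate
            distortion = torch.mean((x - x_hat) ** 2)
            loss = rate + lam * distortion     # rate-distortion objective
            opt.zero_grad()
            loss.backward()
            opt.step()
    return codec
```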
  • a fusion model can be used to stitch and optimize the decompressed training area images corresponding to each training area image input to the fusion model to obtain a training restoration image.
  • This embodiment of the present application does not limit the sequence of the splicing process and the optimization process.
  • the optimization process includes adjustments to the edge regions of the decompressed training images.
  • the optimization process may also include adjustments to regions other than the border regions of the decompressed training image.
  • the adjustment in the optimization process is a color adjustment; for example, the brightness and chromaticity can be adjusted.
  • the parameters of the fusion model can be adjusted according to the image distortion degree of the training recovery image and the training image to be processed to complete the training of the fusion model.
  • the fusion model can stitch the decompressed training images corresponding to the images in each training area, and modify and adjust the pixels located in the edge area of the decompressed training images to obtain the training recovery image.
  • the training recovery image is the recovered training image to be processed.
  • the degree of image distortion can be determined from the training recovery image and the to-be-processed training image.
  • the bit rate can be determined according to the average amount of data per pixel in the compression result.
  • the parameters of the fusion model and the encoding/decoding model corresponding to each training area image are adjusted, so that the image distortion degree is reduced when the bit rate meets the preset conditions.
  • alternatively, the parameters of the fusion model and the codec model corresponding to each training area image may be adjusted according to the bit rate, so that the bit rate is reduced while the image distortion degree satisfies a preset condition.
  • the compression performance can be reflected as a whole through rate-distortion.
  • the parameters of the fusion model and the codec model corresponding to each training area image can be adjusted to minimize rate-distortion.
  • the codec model and fusion model after parameter adjustment are used for processing each time until the rate-distortion gradually converges, so as to obtain the codec model and fusion model after training.
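For reference, a common form of the rate-distortion objective in learned image compression, shown here as an assumption since the patent states the goal verbally rather than as a formula:

```latex
% R: bit rate, the average amount of data (bits) per pixel in the compression result
% D: image distortion, e.g. the error between the restored image and the original
% \lambda: weight trading off rate against distortion
L_{\mathrm{RD}} = R + \lambda \cdot D
```

Minimizing this objective reduces distortion at a given rate (or rate at a given distortion), matching the two adjustment strategies described above.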
  • the training of the AI models in the image processing system 700 can thus be implemented in an "end-to-end" manner.
  • the codec model may also be pre-trained first, and the fusion model may be trained by using the pre-trained codec model.
  • training images to be processed may be acquired.
  • the to-be-processed training image can be divided to obtain multiple training region images.
  • the codec model corresponding to the training area image can be used for compression, and the compression result can be decompressed to determine the bit rate and the image distortion degree, thereby determining the rate distortion.
  • the rate-distortion is optimized by adjusting the parameters of the codec model.
  • a large number of training images to be processed are processed to obtain a large number of training region images to cover images of each texture complexity.
  • for each texture complexity, the parameter-adjusted codec model is used in each iteration to process training area images of that texture complexity until the rate-distortion gradually converges, yielding each pre-trained codec model.
  • the pre-trained encoding and decoding model is used to process the to-be-processed training image, so as to obtain the decompressed training area image corresponding to each training area image in the to-be-processed training image.
  • the fusion model is used to fuse the decompressed training images corresponding to the images of each training area in a to-be-processed training image to obtain a training recovery image.
  • the fusion model is used to fuse the processing results of the pre-trained codec models for each to-be-processed training image, until the error between the training restored image and the to-be-processed training image before compression gradually converges; the trained fusion model is thereby obtained.
  • the parameters of the codec model may also be adjusted to obtain each AI model that has been trained in the image processing system 700 .
  • the encoding and decoding models corresponding to each texture complexity are trained by using the training area images of different texture complexities.
  • Using the trained codec model can realize differential processing of regional images with different texture complexity, thereby improving the overall image compression performance.
  • the bit rate and the image distortion degree can be calculated from the decompressed training area image and the training area image, so as to determine the rate distortion, and the parameters of the codec model can be adjusted according to the rate distortion, completing the training or pre-training of the codec model. After that, the trained or pre-trained codec model is used to train the fusion model.
  • the training restoration image may also be determined by processing the decompressed training area images with the fusion model. The bit rate is calculated from the decompressed training area images; the image distortion degree is calculated from the training restoration image and the training image; and the rate distortion is determined from the bit rate and the image distortion degree. After that, the parameters of the codec model and the fusion model can be adjusted according to the rate distortion to complete their training.
  • FIG. 9 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • the method 900 shown in FIG. 9 can be executed by an image processing device; the image processing device can be a mobile terminal, a computer, a server, or another device with sufficient computing power.
  • the method 900 may be applied in fields that need to compress images, such as image transmission and image storage.
  • the method includes S910 to S930, and these steps are described in detail below.
  • an image compression model corresponding to each regional image is determined according to the texture complexity information corresponding to each regional image.
  • the difference value of each pixel of the area image in two different directions can be calculated. Taking the horizontal direction of the area image as the x-axis and the vertical direction as the y-axis, a plane rectangular coordinate system is established.
  • the area image is stored in the form of a two-dimensional array, and the difference value of the pixel (i, j) in the x direction is: dx(i, j) = p(i+1, j) − p(i, j).
  • the difference value of the pixel (i, j) in the y direction is: dy(i, j) = p(i, j+1) − p(i, j).
  • p(i, j) may be the brightness of the pixel (i, j), or may be other parameters used to represent the color of the pixel.
  • the gradient of each pixel (i, j) in the area image, calculated from the difference values in these two directions, can be represented by the vector (dx(i, j), dy(i, j)).
  • the gradient magnitude Grad(i, j) of pixel (i, j) is: Grad(i, j) = √(dx(i, j)² + dy(i, j)²).
  • the average value of the gradient magnitude of the region image can then be obtained: G = (1 / (W × H)) × Σ_{i=1..W} Σ_{j=1..H} Grad(i, j).
  • W represents the number of pixels in the area image in the x direction
  • H represents the number of pixels in the area image in the y direction.
  • the gradient mean value G of the area image can be used to represent and evaluate the texture complexity of the area image.
  • the gradient magnitude (which can also be referred to as the gradient length) of each pixel (i, j) can also be expressed as: Grad(i, j) = |dx(i, j)| + |dy(i, j)|.
  • the image compression model corresponding to the first texture complexity information of each region image may be determined according to the corresponding relationship between the texture complexity and the compression model.
  • the correspondence between the texture complexity information and the image compression model may include two or more types of texture complexity information, and a compression model corresponding to each texture complexity information.
  • when the average gradient magnitude of the image is greater than or equal to a preset value, the texture complexity of the image can be determined to be complex; when the average gradient magnitude is smaller than the preset value, the texture complexity of the image can be determined to be simple. Therefore, the compression model corresponding to the image can be determined according to the average gradient magnitude of the image.
  • the correspondence between texture complexity and compression model is the same correspondence that was used when training the compression models used in the method 900.
  • each area image is compressed using an image compression model corresponding to the each area image.
  • image compression models corresponding to the image texture complexities can be used to compress each region image in the to-be-processed image, thereby improving the image compression effect of the to-be-processed image.
  • the image to be processed may be a complete image, for example, a photo captured by a camera, or a frame of image in a video.
  • the multiple region images may include all pixels in the to-be-processed image, thereby reducing image distortion and improving compression performance.
  • the texture complexity of different regions of the image may not be the same. For example, in background areas such as sky and beach, the texture complexity is low; in regions of interest or foreground regions containing objects such as people, the image complexity is high.
  • the regions of different texture complexity in the image to be processed can thus each receive compression processing adapted to their texture complexity, improving the overall compression effect of the image to be processed.
  • the embodiments of the present application provide a more flexible image processing manner.
  • the image to be processed is divided into multiple area images, and each area image is compressed to obtain compressed data.
  • the compressed data corresponding to each regional image may be decompressed separately.
  • Each area image is compressed using an image compression model corresponding to the area image to obtain the image features of the area image.
  • for decompression, the decompression model corresponding to the image compression model used during compression should be used, to obtain a region decompressed image; the region decompressed image can also be understood as a restored regional image obtained by decompression. A sketch of this pairing follows.
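A minimal sketch of the per-area decompression flow, assuming each compressed result carries an identifier of the model used at compression time (corresponding to the indication information mentioned earlier); the names are illustrative assumptions:

```python
def decompress_areas(compressed, decompressors: dict):
    """compressed: list of (model_key, features) pairs, one per area image."""
    restored = []
    for model_key, features in compressed:
        # use the decompression model paired with the compression model
        restored.append(decompressors[model_key](features))
    return restored
```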
  • the region decompressed images corresponding to each region image in the to-be-processed image are spliced, so that the to-be-processed image can be restored.
  • optimization processing can be performed on the decompressed images of each region.
  • the optimization process may include adjusting the border regions of the decompressed image for one or more regions.
  • since the second images (region decompressed images) corresponding to the individual region images may be produced by different decompression models, discontinuous lines or color differences may appear at the edges of two adjacent second images after splicing.
  • the pixels of the edge region of one or more second images may be adjusted.
  • the optimization process may also include adjustments to regions other than the edge regions of the region-decompressed image.
  • Optimization processing may be performed before or after splicing, which is not limited in this embodiment of the present application.
  • the second images may be stitched, and the pixels in their edge regions optimized, using the fusion model.
  • the pixels of one or more edge regions of the second image are adjusted, which can further reduce the degree of image distortion and improve the image compression effect.
  • FIG. 10 is a schematic diagram of the compression performance improvement of the image processing method provided by the embodiment of the present application.
  • Using the image processing method 900 to compress and decompress images can achieve better image compression performance.
  • the image to be processed is divided into a plurality of area images of equal size.
  • the plurality of area images include all pixels of the image to be processed.
  • the compression model and the decompression model corresponding to the image can be determined according to the average value of the gradient size of the image, the image is compressed, and the compression result is decompressed.
  • the image processing method provided by the embodiment of the present application adopts a multi-model processing approach, which effectively improves the PSNR at the same bit rate, meaning lower image distortion.
  • the image processing system provided by the embodiment of the present application, the AI model training method required by the image processing system, and the image processing method are described above with reference to FIGS. 1 to 10 .
  • FIG. 11 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
  • the image processing apparatus 2000 includes a storage module 2010 and a processing module 2020 .
  • the storage module 2010 is used to store program instructions.
  • the processing module 2020 is configured to:
  • determine the texture complexity information corresponding to each of multiple area images in the image to be processed, and determine, according to the texture complexity information corresponding to each area image, the image compression model corresponding to each area image, wherein different texture complexity information corresponds to different image compression models;
  • Each area image is compressed by using an image compression model corresponding to each area image.
  • the processing module 2020 is further configured to use the image decompression model corresponding to the image compression model used to compress each area image to decompress the image features obtained after each area image is compressed, so as to obtain the region decompressed image corresponding to each area image.
  • the processing module 2020 is further configured to perform stitching processing and optimization processing on the decompressed images of the multiple regions to obtain a restored image to be processed, and the optimization processing includes performing pixel adjustment on the edges of the decompressed images of the multiple regions.
  • the processing module 2020 is further configured to calculate the gradient size of each pixel in each regional image.
  • the processing module 2020 is further configured to, according to the gradient size of each pixel, determine the texture complexity information of each regional image.
  • the processing module 2020 is further configured to divide the image to be processed into the multiple area images; the multiple area images do not overlap, and the multiple area images include all pixels of the image to be processed.
  • FIG. 12 is a schematic structural diagram of a neural network training apparatus provided by an embodiment of the present application.
  • the neural network training apparatus 3000 includes a storage module 3010 and a processing module 3020 .
  • the storage module 3010 is used to store program instructions.
  • the processing module 3020 is configured to:
  • the codec model corresponding to each training area image is determined, wherein different texture complexity information corresponds to different codec models, and each codec model is used to compress the input training area images and decompress the compression results to obtain a plurality of decompressed training area images;
  • the parameters of the codec model are adjusted according to the rate distortion obtained from the decompressed training area image and the training area image.
  • the processing module 3020 is further configured to perform stitching processing and optimization processing on the multiple decompressed training area images by using a fusion model, and the optimization processing includes performing pixel adjustment on the edges of the multiple decompressed training area images.
  • the processing module 3020 is further configured to adjust the parameters of the fusion model according to the degree of image distortion between the training restored image and the training image.
  • the processing module 3020 is further configured to adjust parameters of the encoding and decoding model according to the degree of image distortion between the training restoration image and the training image.
  • the processing module 3020 is further configured to calculate the gradient size of each pixel in each training area image.
  • the processing module 3020 is further configured to, according to the gradient size of each pixel, determine the texture complexity information of each training area image.
  • the processing module 3020 is further configured to divide the training image into the multiple training area images; the multiple training area images do not overlap, and the multiple training area images include all pixels of the training image.
  • FIG. 13 is a schematic diagram of a hardware structure of an image processing apparatus according to an embodiment of the present application.
  • the image processing apparatus 4000 shown in FIG. 13 includes a memory 4001 , a processor 4002 , a communication interface 4003 , and a bus 4004 .
  • the memory 4001 , the processor 4002 , and the communication interface 4003 are connected to each other through the bus 4004 for communication.
  • the memory 4001 may be ROM, static storage device and RAM.
  • the memory 4001 may store a program. When the program stored in the memory 4001 is executed by the processor 4002, the processor 4002 and the communication interface 4003 are used to execute each step of the image processing method of the embodiment of the present application.
  • the processor 4002 may be a general-purpose CPU, a microprocessor, an ASIC, a GPU, or one or more integrated circuits, and is used to execute related programs so as to realize the functions required of the units in the image processing apparatus of the embodiments of the present application, or to execute the image processing method of the method embodiments of the present application.
  • the processor 4002 may also be an integrated circuit chip with signal processing capability, for example, the chip shown in FIG. 4 .
  • each step of the image processing method in the embodiment of the present application may be completed by an integrated logic circuit of hardware in the processor 4002 or an instruction in the form of software.
  • the above-mentioned processor 4002 may also be a general-purpose processor, DSP, ASIC, FPGA or other programmable logic devices, discrete gate or transistor logic devices, and discrete hardware components.
  • the methods, steps, and logic block diagrams disclosed in the embodiments of this application can be implemented or executed.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory 4001; the processor 4002 reads the information in the memory 4001 and, in combination with its hardware, completes the functions required of the units included in the image processing apparatus of the embodiments of the present application, or performs the image processing method of the method embodiments of the present application.
  • the communication interface 4003 implements communication between the device 4000 and other devices or a communication network using a transceiver device such as, but not limited to, a transceiver.
  • the image to be processed can be acquired through the communication interface 4003 .
  • Bus 4004 may include a pathway for communicating information between various components of device 4000 (eg, memory 4001, processor 4002, communication interface 4003).
  • FIG. 14 is a schematic diagram of a hardware structure of a neural network training apparatus according to an embodiment of the present application. Similar to the above-mentioned apparatus 3000 and apparatus 4000 , the neural network training apparatus 5000 shown in FIG. 14 includes a memory 5001 , a processor 5002 , a communication interface 5003 and a bus 5004 . The memory 5001 , the processor 5002 , and the communication interface 5003 are connected to each other through the bus 5004 for communication.
  • the neural network can be trained by the neural network training apparatus 5000 shown in FIG. 14 , and the neural network obtained by training can be used to execute the image processing method of the embodiment of the present application.
  • the apparatus shown in FIG. 14 can obtain training data and the neural network to be trained from the outside through the communication interface 5003, and then the processor can train the neural network to be trained according to the training data.
  • although the apparatus 4000 and the apparatus 5000 only show a memory, a processor, and a communication interface, in a specific implementation process, those skilled in the art should understand that the apparatus 4000 and the apparatus 5000 may also include other devices necessary for normal operation. Meanwhile, according to specific needs, those skilled in the art should understand that the apparatus 4000 and the apparatus 5000 may further include hardware devices that implement other additional functions. In addition, those skilled in the art should understand that the apparatus 4000 and the apparatus 5000 may include only the devices necessary to implement the embodiments of the present application, and need not include all the devices shown in FIG. 13 and FIG. 14.
  • the processor in the embodiment of the present application may be a central processing unit (central processing unit, CPU), and the processor may also be other general-purpose processors, digital signal processors (digital signal processors, DSP), application-specific integrated circuits (application specific integrated circuit, ASIC), off-the-shelf programmable gate array (field programmable gate array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the memory in the embodiments of the present application may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or flash memory.
  • Volatile memory may be random access memory (RAM), which acts as an external cache.
  • By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct rambus random access memory (direct rambus RAM, DR RAM).
  • the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • the above-described embodiments may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, all or part of the processes or functions described in the embodiments of the present application are generated.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired manner or in a wireless manner (e.g., infrared, wireless, microwave, etc.).
  • the computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server, a data center, or the like that contains one or more sets of available media.
  • the usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVDs), or semiconductor media.
  • the semiconductor medium may be a solid state drive.
  • "At least one" means one or more, and "a plurality of" means two or more.
  • "At least one of the following items" or similar expressions refer to any combination of these items, including any combination of a single item or plural items.
  • for example, at least one of a, b, or c can represent: a, b, c, a and b, a and c, b and c, or a, b and c, where each of a, b, and c can be singular or plural.
  • the size of the sequence numbers of the above-mentioned processes does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • in essence, the technical solution of the present application, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, or other media that can store program code.

Abstract

An image processing method and device, and a neutral network training method and device. The image processing method comprises: determining texture complexity information corresponding to each of a plurality of regional images in an image to be processed (S910); determining, according to the texture complexity information corresponding to each regional image, an image compression model corresponding to each regional image (S920), different texture complexity information corresponding to different image compression models; and using the image compression model corresponding to each regional image to compress each regional image (S930). According to the method, the compression model corresponding to the texture complexity of each of the regional images having different texture complexities in the image to be processed is used to compress the regional image, which improves the overall compression effect of the image to be processed.

Description

Image processing method and device, and neural network training method and device

This application claims priority to the Chinese patent application filed with the Chinese Patent Office on July 30, 2020, with application number 202010754067.X and entitled "Image processing method and device, and neural network training method and device", the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to the field of artificial intelligence, and more particularly, to an image processing method and apparatus, and a neural network training method and apparatus.

Background

Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning, and decision-making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theory.

Image compression reduces redundant information in image data; it is therefore important for improving the storage and transmission efficiency of images. Traditional image compression methods such as the joint photographic experts group (JPEG) method achieve good compression in the medium and high bit-rate range; in the low bit-rate range, their compression effect is less than ideal.

Images can be compressed by a neural network. This approach mainly uses a neural network and corresponding nonlinear transformations to extract image features, thereby achieving compression. Compared with traditional image compression methods, it avoids complicated parameter design and module design. During decompression, a neural network can be used for decoding to reconstruct the image. How to improve the compression performance of neural networks on images has become a technical problem that urgently needs to be solved.
SUMMARY OF THE INVENTION

The present application provides an image processing method and device, and a neural network training method and device, which can improve the image compression effect of a neural network.

In a first aspect, an image processing method is provided, the method comprising: determining texture complexity information corresponding to each of a plurality of area images in an image to be processed; determining, according to the texture complexity information corresponding to each area image, an image compression model corresponding to each area image, wherein different texture complexity information corresponds to different image compression models; and compressing each area image by using the image compression model corresponding to that area image.

In different regions of a complete image, the texture complexity may differ. For example, in background areas such as sky and beach, the texture complexity of the image is low; in regions of interest or foreground regions containing objects such as people, the image complexity is high.

By dividing the image to be processed into multiple area images and compressing each area image with the compression model corresponding to its texture complexity, the regions of different texture complexity in the image to be processed receive compression processing adapted to their texture complexity, improving the overall compression effect of the image to be processed.

With reference to the first aspect, in some possible implementations, the method further includes: using an image decompression model corresponding to the image compression model used to compress each area image, decompressing the image features obtained after each area image is compressed, to obtain a region decompressed image corresponding to each area image; and performing splicing processing and optimization processing on the multiple region decompressed images to obtain a restored image to be processed, the optimization processing including adjusting the edges of the multiple region decompressed images.

Since a complete image is divided, and the region decompressed images produced by the decompression models that handle different texture complexities are spliced together, discontinuous lines or color differences may appear between two adjacent region decompressed images after splicing. By adjusting the pixels in the edge regions of one or more region decompressed images, the image distortion between the complete image after compression, decompression, and splicing and the image before processing can be made smaller.

With reference to the first aspect, in some possible implementations, determining the texture complexity information corresponding to each of the multiple area images in the image to be processed includes: calculating the gradient magnitude of each pixel in each area image; and determining the texture complexity information of each area image according to the gradient magnitude of each pixel.

The gradient magnitude of a pixel may be determined according to the pixel's brightness or another representation of its color.

The texture complexity of an area image may be represented by the median or average of the gradient magnitudes of the pixels in that area image.

With reference to the first aspect, in some possible implementations, the method further includes: dividing the image to be processed into the multiple area images, where the multiple area images do not overlap and include all pixels in the image to be processed.

When the multiple area images include all pixels in the image to be processed and do not overlap, the bit rate can be reduced.

In a second aspect, a neural network training method is provided, the method comprising: determining texture complexity information corresponding to each of a plurality of training area images in a training image; determining, according to the texture complexity information corresponding to each training area image, a codec model corresponding to each training area image, wherein different texture complexity information corresponds to different codec models, and each codec model is used to compress the input training area image and decompress the compression result to obtain multiple decompressed training area images; and adjusting the parameters of the codec model according to rate distortion, the rate distortion being obtained from the decompressed training area images and the training area images.

By training, for the training area images of each texture complexity, the codec corresponding to that texture complexity, each codec achieves better compression performance on images of its corresponding texture complexity.

Within a complete image, the texture complexity may vary. For example, in background areas such as sky and beach, the texture complexity of the image is low; in regions of interest or foreground regions containing objects such as people, the image complexity is high. Dividing a complete image into multiple regions and training each codec on the images of the corresponding regions makes the training data better match the texture complexity that the codec handles, improving each codec's compression performance on images of its corresponding texture complexity.

With reference to the second aspect, in some possible implementations, the multiple decompressed training area images are spliced and optimized by a fusion model to obtain a training restoration image, the optimization processing including adjusting the edges of at least one decompressed training area image; and the parameters of the fusion model are adjusted according to the image distortion between the training restoration image and the training image.

By adopting a fusion model, the images processed by the codecs exhibit less distortion after splicing.

The parameters of the codec model may also be adjusted according to the image distortion between the training restoration image and the training image.

Optionally, the neural network models required in the image processing process can be trained in an "end-to-end" manner.

The rate distortion is calculated from the image distortion between the training restoration image and the to-be-processed training image, and the parameters of the codec model and the fusion model are adjusted according to the rate distortion to complete their training; this enables "end-to-end" training of the neural network models required in the image processing process and reduces the complexity of neural network training.

It should be understood that the rate distortion can be determined from the image distortion and the bit rate. The bit rate represents the degree of compression of the image and can be determined from the compression result of the codec model.

It should be understood that the fusion model may also adjust regions other than the edges of the decompressed training area images.

With reference to the second aspect, in some possible implementations, determining the texture complexity information corresponding to each of the multiple training area images in the training image includes: calculating the gradient magnitude of each pixel in each training area image; and determining the texture complexity information of each training area image according to the gradient magnitude of each pixel.

With reference to the second aspect, in some possible implementations, the method further includes: dividing the training image into the multiple training area images, where the multiple training area images do not overlap and include all pixels in the training image.
第三方面,提供一种电子设备,图像处理装置,其特征在于,包括存储模块和处理模块;所述存储模块用于存储程序指令;当所述程序指令在所述处理器中执行时,所述处理模块用于:确定待处理图像中的多个区域图像中每个区域图像对应的纹理复杂度信息;根据所述每个区域图像对应的纹理复杂度信息,确定所述每个区域图像对应的图像压缩模型,其中,不同的纹理复杂度信息对应不同的图像压缩模型;利用所述每个区域图像对应的图像压缩模型对所述每个区域图像进行压缩。In a third aspect, an electronic device and an image processing apparatus are provided, which are characterized by comprising a storage module and a processing module; the storage module is used to store program instructions; when the program instructions are executed in the processor, the The processing module is used to: determine the texture complexity information corresponding to each area image in the multiple area images in the image to be processed; according to the texture complexity information corresponding to each area image, determine the corresponding The image compression model, wherein, different texture complexity information corresponds to different image compression models; the image compression model corresponding to each area image is used to compress each area image.
结合第三方面,在一些可能的实现方式中,所述处理模块还用于:利用与压缩所述每个区域图像的图像压缩模型对应的图像解压模型,对所述每个区域图像压缩后得到的图像特征进行解压缩,得到所述每个区域图像对应的区域解压图像;对所述多个区域解压图像进行拼接处理和优化处理,得到恢复后的待处理图像,所述优化处理包括对所述多个区域解压图像的边沿进行像素调整。With reference to the third aspect, in some possible implementations, the processing module is further configured to: use an image decompression model corresponding to the image compression model for compressing the image of each area, compress the image of each area to obtain Decompress the image features of the multiple regions to obtain a region decompressed image corresponding to each region image; perform splicing processing and optimization processing on the multiple region decompressed images to obtain the restored image to be processed, and the optimization processing includes Pixel adjustment is performed on the edges of the decompressed image in the multiple regions.
结合第三方面,在一些可能的实现方式中,所述处理模块还用于:计算每个区域图像中每个像素的梯度大小;根据所述每个像素的梯度大小,确定每个区域图像的纹理复杂度信息。With reference to the third aspect, in some possible implementations, the processing module is further configured to: calculate the gradient size of each pixel in each area image; determine the gradient size of each area image according to the gradient size of each pixel Texture complexity information.
结合第三方面,在一些可能的实现方式中,所述处理模块还用于:将所述待处理图像划分为所述多个区域图像,所述多个区域图像不重叠,且所述多个区域图像包括所述待处理图像中的全部像素。With reference to the third aspect, in some possible implementations, the processing module is further configured to: divide the image to be processed into the multiple area images, the multiple area images do not overlap, and the multiple area images The area image includes all the pixels in the image to be processed.
第四方面,提供一种神经网络训练装置,包括存储模块和处理模块;所述存储模块用于存储程序指令,当所述程序指令在所述处理器中执行时,所述处理模块用于:确定训练 图像中的多个训练区域图像中每个训练区域图像对应的纹理复杂度信息;根据所述每个训练区域图像对应的训练纹理复杂度信息,确定所述每个训练区域图像对应的编解码模型,其中,不同的纹理复杂度信息对应不同的编解码模型,每个所述编解码模型用于对输入的所述训练区域图像进行压缩,并对压缩结果进行解压得到多个解压训练区域图像;根据率失真调整所述编解码模型的参数,所述率失真根据所述解压训练区域图像和所述训练区域图像得到。In a fourth aspect, a neural network training device is provided, comprising a storage module and a processing module; the storage module is used for storing program instructions, and when the program instructions are executed in the processor, the processing module is used for: Determine the texture complexity information corresponding to each training area image in the multiple training area images in the training image; according to the training texture complexity information corresponding to each training area image, determine the encoding corresponding to each training area image. A decoding model, wherein different texture complexity information corresponds to different encoding and decoding models, and each encoding and decoding model is used to compress the input image of the training area, and decompress the compression result to obtain multiple decompressed training areas image; adjust the parameters of the codec model according to the rate-distortion, and the rate-distortion is obtained from the decompressed training area image and the training area image.
结合第四方面,在一些可能的实现方式中,所述处理模块还用于:通过融合模型对所述多个解压训练区域图像进行拼接处理和优化处理,得到训练恢复图像,所述优化处理包括对至少一个所述解压训练区域图像的边沿进行像素调整;根据所述训练恢复图像与所述训练图像的图像失真度,调整融合模型的参数。With reference to the fourth aspect, in some possible implementations, the processing module is further configured to: perform splicing processing and optimization processing on the plurality of decompressed training area images through a fusion model to obtain a training restoration image, and the optimization processing includes: Pixel adjustment is performed on the edge of at least one image of the decompressed training area; and the parameters of the fusion model are adjusted according to the image distortion degree of the training restoration image and the training image.
The parameters of the codec models may also be adjusted according to the image distortion between the training restored image and the training image.
With reference to the fourth aspect, in some possible implementations, the processing module is further configured to: calculate the gradient magnitude of each pixel in each training area image; and determine the texture complexity information of each training area image according to the gradient magnitude of each pixel.
With reference to the fourth aspect, in some possible implementations, the processing module is further configured to: divide the training image into the multiple training area images, where the multiple training area images do not overlap and together include all pixels of the training image.
In a fifth aspect, an electronic device is provided, including a memory and a processor. The memory is configured to store program instructions. When the program instructions are executed in the processor, the processor is configured to perform the method in the first aspect or the second aspect.
The processor in the fifth aspect may be a central processing unit (CPU), or a combination of a CPU and a neural network computing processor. The neural network computing processor here may include a graphics processing unit (GPU), a neural-network processing unit (NPU), a tensor processing unit (TPU), and the like. The TPU is an artificial intelligence accelerator application-specific integrated circuit fully customized by Google for machine learning.
In a sixth aspect, a computer-readable medium is provided. The computer-readable medium stores program code to be executed by a device, and the program code includes instructions for performing the method in any one of the implementations of the first aspect or the second aspect.

In a seventh aspect, a computer program product containing instructions is provided. When the computer program product runs on a computer, the computer is caused to perform the method in any one of the implementations of the first aspect or the second aspect.

In an eighth aspect, a chip is provided. The chip includes a processor and a data interface. The processor reads, through the data interface, instructions stored in a memory, to perform the method in any one of the implementations of the first aspect or the second aspect.

Optionally, as an implementation, the chip may further include a memory, where instructions are stored in the memory. The processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to perform the method in any one of the implementations of the first aspect or the second aspect.

The chip may specifically be a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
Description of Drawings
FIG. 1 is a schematic structural diagram of a system architecture according to an embodiment of this application.

FIG. 2 is a schematic structural diagram of a convolutional neural network according to an embodiment of this application.

FIG. 3 is a schematic structural diagram of another convolutional neural network according to an embodiment of this application.

FIG. 4 is a schematic diagram of a hardware structure of a chip according to an embodiment of this application.

FIG. 5 is a schematic diagram of a system architecture according to an embodiment of this application.

FIG. 6 is a schematic structural diagram of an image processing system.

FIG. 7 is a schematic structural diagram of an image processing system according to an embodiment of this application.

FIG. 8 is a schematic diagram of a neural network training method according to an embodiment of this application.

FIG. 9 is a schematic flowchart of an image processing method according to an embodiment of this application.

FIG. 10 is a schematic diagram of the compression performance of the image processing method according to an embodiment of this application.

FIG. 11 is a schematic structural diagram of an image processing apparatus according to an embodiment of this application.

FIG. 12 is a schematic structural diagram of a neural network training apparatus according to an embodiment of this application.

FIG. 13 is a schematic structural diagram of an image processing apparatus according to an embodiment of this application.

FIG. 14 is a schematic structural diagram of a neural network training apparatus according to an embodiment of this application.
Description of Embodiments
The technical solutions in the embodiments of this application are described below with reference to the accompanying drawings in the embodiments of this application. Clearly, the described embodiments are only some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort shall fall within the protection scope of this application.

Because the embodiments of this application involve extensive application of neural networks, for ease of understanding, related terms and concepts of neural networks that may be involved in the embodiments of this application are first introduced below.
(1) Neural network
A neural network may be composed of neural units. A neural unit may be an operation unit that takes x_s and an intercept 1 as inputs, and the output of the operation unit may be:

$$h_{W,b}(x)=f\left(W^{T}x\right)=f\left(\sum_{s=1}^{n}W_{s}x_{s}+b\right)$$

where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, which introduces nonlinear characteristics into the neural network to convert the input signal of the neural unit into an output signal. The output signal of the activation function may serve as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by joining many such single neural units together, that is, the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to the local receptive field of the previous layer to extract features of the local receptive field, and the local receptive field may be an area composed of several neural units.
(2) Deep neural network
A deep neural network (DNN), also known as a multi-layer neural network, may be understood as a neural network with multiple hidden layers. Dividing a DNN according to the positions of its layers, the neural network inside the DNN may be divided into three categories of layers: the input layer, the hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. The layers are fully connected, that is, any neuron in the i-th layer is necessarily connected to any neuron in the (i+1)-th layer.
Although a DNN looks complicated, the work of each layer is actually not complicated. In brief, each layer computes the following linear relationship expression:

$$\vec{y}=\alpha\left(W\vec{x}+\vec{b}\right)$$

where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, $W$ is the weight matrix (also called coefficients), and $\alpha(\cdot)$ is the activation function. Each layer simply performs this operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, there are also many coefficients $W$ and offset vectors $\vec{b}$. These parameters are defined in the DNN as follows, taking the coefficient $W$ as an example. Assume that in a three-layer DNN, the linear coefficient from the fourth neuron in the second layer to the second neuron in the third layer is defined as $W_{24}^{3}$: the superscript 3 represents the layer in which the coefficient $W$ is located, and the subscripts correspond to the output index 2 in the third layer and the input index 4 in the second layer.

In summary, the coefficient from the k-th neuron in layer L-1 to the j-th neuron in layer L is defined as $W_{jk}^{L}$.

It should be noted that the input layer has no $W$ parameters. In a deep neural network, more hidden layers enable the network to better depict complex situations in the real world. In theory, a model with more parameters has higher complexity and a larger "capacity", which means that it can complete more complex learning tasks. Training a deep neural network is the process of learning the weight matrices, and its ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors $W$ of many layers).
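To make the layer-wise relationship above concrete, the following is a minimal sketch (an illustration, not part of the patent text) of a forward pass through a DNN in which every layer computes y = α(Wx + b); the layer sizes and the choice of sigmoid as α(·) are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dnn_forward(x, weights, biases):
    """Apply y = sigmoid(W @ x + b) layer by layer."""
    for W, b in zip(weights, biases):
        x = sigmoid(W @ x + b)
    return x

# Hypothetical three-layer DNN: 4 inputs -> 5 hidden units -> 2 outputs.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((5, 4)), rng.standard_normal((2, 5))]
biases = [np.zeros(5), np.zeros(2)]
y = dnn_forward(rng.standard_normal(4), weights, biases)
```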
(3) Convolutional neural network
A convolutional neural network (CNN) is a deep neural network with a convolutional structure. A convolutional neural network contains a feature extractor composed of convolutional layers and subsampling layers, and the feature extractor may be regarded as a filter. A convolutional layer is a neuron layer in a convolutional neural network that performs convolution processing on the input signal. In a convolutional layer of a convolutional neural network, one neuron may be connected to only some of the neurons in adjacent layers. A convolutional layer usually contains several feature planes, and each feature plane may be composed of neural units arranged in a rectangle. Neural units in the same feature plane share weights, and the shared weights are the convolution kernel. Sharing weights may be understood as meaning that the way image information is extracted is independent of position. A convolution kernel may be initialized in the form of a matrix of random size, and reasonable weights may be obtained for the convolution kernel through learning during the training of the convolutional neural network. In addition, a direct benefit of sharing weights is reducing the connections between the layers of the convolutional neural network while also reducing the risk of overfitting.
(4) Recurrent neural network

Recurrent neural networks (RNN) are used to process sequence data. In a traditional neural network model, from the input layer to the hidden layer to the output layer, the layers are fully connected, while the nodes within each layer are unconnected. Although this ordinary neural network solves many difficult problems, it is still powerless against many others. For example, to predict the next word of a sentence, one generally needs to use the preceding words, because the words in a sentence are not independent of one another. The reason why an RNN is called a recurrent neural network is that the current output of a sequence is also related to the previous outputs. Concretely, the network memorizes the previous information and applies it to the calculation of the current output, that is, the nodes within the hidden layer are no longer unconnected but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment. In theory, an RNN can process sequence data of any length. The training of an RNN is the same as the training of a traditional CNN or DNN.
Why is a recurrent neural network needed when there is already a convolutional neural network? The reason is simple: a convolutional neural network rests on the premise that elements are independent of each other and that inputs and outputs are also independent, such as cats and dogs. But in the real world, many elements are interconnected, such as stock prices changing over time, or a person saying: "I like traveling, and my favorite place is Yunnan; I must go there when I have the chance." If a blank is to be filled here, humans all know that it should be "Yunnan", because humans infer from the context. But how can a machine be made to do this? The RNN emerged in response. RNNs are designed to give machines the ability to memorize as humans do. Therefore, the output of an RNN needs to rely on the current input information and historical memory information.
(5) Loss function
In the process of training a deep neural network, because it is hoped that the output of the deep neural network is as close as possible to the value that is actually to be predicted, the predicted value of the current network can be compared with the actually desired target value, and the weight vector of each layer of the neural network is then updated according to the difference between the two (of course, there is usually an initialization process before the first update, that is, parameters are preconfigured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to make it predict lower, and the adjustment continues until the deep neural network can predict the actually desired target value or a value very close to it. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value". This is the loss function or objective function, an important equation used to measure the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the deep neural network becomes a process of reducing this loss as much as possible.
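As an illustration of the role of a loss function, the sketch below (not the patent's training procedure) uses mean squared error as the loss and takes a single gradient step to reduce it; the toy model, data, and learning rate are all assumptions:

```python
import numpy as np

def mse_loss(pred, target):
    """Mean squared error between predictions and targets."""
    return np.mean((pred - target) ** 2)

# One update step of a linear model pred = w * x trained to match a target.
x, target, w, lr = 2.0, 10.0, 1.0, 0.05
pred = w * x
loss = mse_loss(np.array([pred]), np.array([target]))
grad_w = 2 * (pred - target) * x   # d(loss)/dw for the MSE loss
w -= lr * grad_w                   # adjust the weight to lower the loss
```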
(6) Back propagation algorithm
A neural network may use the error back propagation (BP) algorithm to correct the values of the parameters in the initial neural network model during training, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, forward propagation of the input signal to the output produces an error loss, and the parameters in the initial neural network model are updated by back-propagating the error loss information, so that the error loss converges. The back propagation algorithm is a back propagation movement dominated by the error loss, and aims to obtain the optimal parameters of the neural network model, such as the weight matrices.
(7) Pixel value
The pixel value of an image may be a red-green-blue (RGB) color value, and the pixel value may be a long integer representing a color. For example, the pixel value is 256×Red+100×Green+76×Blue, where Blue represents the blue component, Green represents the green component, and Red represents the red component. In each color component, a smaller value indicates lower brightness, and a larger value indicates higher brightness. For a grayscale image, the pixel value may be a grayscale value.
(8) Image compression
Image compression is a technology for representing the original pixel matrix with fewer bits, lossily or losslessly, and is also called image coding. Through image compression, a transformation is performed on the image content, which can reduce the amount of data required to represent a digital image and thereby reduce the storage space occupied by the image.
Image data can be compressed because there is redundancy in the data. The redundancy of image data is mainly manifested as: spatial redundancy caused by the correlation between adjacent pixels in an image; temporal redundancy caused by the correlation between different frames in an image sequence; and spectral redundancy caused by the correlation of different color planes or spectral bands. The purpose of data compression is to reduce the number of bits required to represent the data by removing these redundancies. Because the amount of image data is huge and difficult to store, transmit, and process, the compression of image data is very important.
(9) Image decompression
Image decompression is the inverse process of image compression, and may also be called decompression or decoding. Through image decoding, the compact representation of the input information can be restored to an image.
(10) Image distortion
Image distortion is generally measured by the peak signal-to-noise ratio (PSNR) between the original image and the encoded reconstructed image. This PSNR may be the PSNR of luminance, or a linear combination of the PSNR of luminance and the PSNR of chrominance. In the simplest case, the PSNR of luminance (Y-PSNR) is used as the main criterion. Here, the peak signal is the maximum value of a pixel in the image (for example, the maximum pixel brightness), and the noise is the mean squared error between the pixel values of the original image and the reconstructed image (the mean of the squared differences); converting the ratio of the two into decibel form gives the PSNR.
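From the definition above, PSNR = 10·log10(MAX² / MSE). A minimal sketch (assuming 8-bit images, so the peak value MAX is 255):

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio, in decibels, between two images."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images: no noise
    return 10.0 * np.log10(peak ** 2 / mse)
```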
(11) Code rate
The code rate (rate), which may also be called the encoding rate, may be the average amount of data per pixel in the compressed data (bits per pixel, bpp), and is used to indicate the degree of data compression. The code rate may be determined according to the proportion of the data amount after image compression.
(12) Rate distortion
Rate distortion is used to express the relationship between image distortion and code rate.

A smaller image distortion means more image detail and a higher code rate. A larger image distortion means that more detail is lost from the image due to compression, and hence a lower code rate.

Rate-distortion optimization (R-D optimization) refers to reducing the image distortion and the code rate as much as possible according to a preset rule. That is, the distortion of the obtained image is kept as small as possible at a code rate that is as small as possible, so as to achieve a better compression effect. Through rate-distortion optimization, a balance point can be found between the code rate and the distortion so that the compression effect is optimal. Of course, the rule for rate-distortion optimization may also be to minimize the distortion while ensuring that the code rate does not exceed an upper limit, or to minimize the code rate while ensuring that the distortion does not exceed a given limit, and so on.
The rate distortion may be calculated by a rate-distortion function. The rate-distortion function may be expressed, for example, as I = D + λR, or I = D·R, where I is the rate distortion, R is the code rate, D is the distortion, and λ is a preset Lagrangian coefficient.
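A minimal sketch of evaluating the cost I = D + λR for one compressed image; taking the distortion D as mean squared error and the code rate R as bits per pixel is one common instantiation and an assumption here, since the patent text leaves the exact measures open:

```python
import numpy as np

def rd_cost(original, reconstructed, compressed_num_bits, lam):
    """Rate-distortion cost I = D + lambda * R."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    d = np.mean(diff ** 2)                    # distortion as MSE
    r = compressed_num_bits / original.size   # code rate in bits per pixel (bpp)
    return d + lam * r
```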
As shown in FIG. 1, an embodiment of this application provides a system architecture 100. In FIG. 1, a data collection device 160 is configured to collect training data. For the image processing method in the embodiments of this application, the training data may include training images.

After collecting the training data, the data collection device 160 stores the training data in a database 130, and a training device 120 obtains a target model/rule 101 through training based on the training data maintained in the database 130.

The following describes how the training device 120 obtains the target model/rule 101 based on the training data. The training device 120 processes the input original image and compares the output image with the original image, until the rate distortion determined from the difference between the image output by the training device 120 and the original image is less than a certain threshold, thereby completing the training of the target model/rule 101.

The target model/rule 101 can be used to implement the image processing method in the embodiments of this application. The target model/rule 101 in the embodiments of this application may specifically be a neural network. It should be noted that, in actual applications, the training data maintained in the database 130 is not necessarily all collected by the data collection device 160 and may also be received from other devices. It should further be noted that the training device 120 does not necessarily train the target model/rule 101 entirely based on the training data maintained in the database 130 and may also obtain training data from the cloud or elsewhere for model training. The foregoing description shall not be construed as a limitation on the embodiments of this application.
The target model/rule 101 obtained through training by the training device 120 may be applied to different systems or devices, such as the execution device 110 shown in FIG. 1. The execution device 110 may be a terminal, such as a mobile phone terminal, a tablet computer, a laptop computer, an augmented reality (AR)/virtual reality (VR) device, or an in-vehicle terminal, or may be a server, the cloud, or the like. In FIG. 1, the execution device 110 is provided with an input/output (I/O) interface 112 for data interaction with external devices. A user may input data to the I/O interface 112 through a client device 140. In the embodiments of this application, the input data may include the to-be-processed image input by the client device.

A preprocessing module 113 and a preprocessing module 114 are configured to perform preprocessing according to the input data (such as the to-be-processed image) received by the I/O interface 112. In the embodiments of this application, the preprocessing module 113 and the preprocessing module 114 may also be absent (or only one of them may be present), and a calculation module 111 is directly used to process the input data.

When the execution device 110 preprocesses the input data, or when the calculation module 111 of the execution device 110 performs calculation or other related processing, the execution device 110 may invoke data, code, and the like in a data storage system 150 for the corresponding processing, and may also store the data, instructions, and the like obtained from the corresponding processing into the data storage system 150.

Finally, the I/O interface 112 returns the processing result, such as the classification result of the image obtained as described above, to the client device 140, so as to provide it to the user.
It is worth noting that the training device 120 may generate, based on different training data, corresponding target models/rules 101 for different goals or different tasks, and the corresponding target models/rules 101 may be used to achieve the foregoing goals or complete the foregoing tasks, thereby providing the user with the desired results.

In the case shown in FIG. 1, the user may manually specify the input data, and this manual operation may be performed through an interface provided by the I/O interface 112. In another case, the client device 140 may automatically send the input data to the I/O interface 112. If the client device 140 is required to obtain the user's authorization before automatically sending the input data, the user may set the corresponding permission in the client device 140. The user may view, on the client device 140, the result output by the execution device 110, and the specific presentation form may be display, sound, action, or another specific manner. The client device 140 may also serve as a data collection terminal that collects, as new sample data, the input data entering the I/O interface 112 and the output result from the I/O interface 112 as shown in the figure, and stores them in the database 130. Of course, the collection may also be done without the client device 140; instead, the I/O interface 112 directly stores, as new sample data, the input data entering the I/O interface 112 and the output result from the I/O interface 112 as shown in the figure into the database 130.

It is worth noting that FIG. 1 is only a schematic diagram of a system architecture provided by an embodiment of this application, and the positional relationships among the devices, components, modules, and the like shown in the figure do not constitute any limitation. For example, in FIG. 1, the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 may also be placed in the execution device 110.

As shown in FIG. 1, the target model/rule 101 is obtained through training by the training device 120. In the embodiments of this application, the target model/rule 101 may be the neural network in this application. Specifically, the neural network used in the embodiments of this application may be a CNN, a deep convolutional neural network (DCNN), a recurrent neural network (RNN), or the like.
Because the CNN is a very common neural network, the structure of the CNN is described in detail below with reference to FIG. 2. As described in the foregoing introduction to basic concepts, a convolutional neural network is a deep neural network with a convolutional structure, and is a deep learning architecture. The deep learning architecture refers to learning at multiple levels of abstraction through machine learning algorithms. As a deep learning architecture, the CNN is a feed-forward artificial neural network in which individual neurons can respond to the images fed into it.

The structure of the neural network specifically used in the image processing method in the embodiments of this application may be as shown in FIG. 2. In FIG. 2, a convolutional neural network (CNN) 200 may include an input layer 210, convolutional/pooling layers 220 (where the pooling layers are optional), and neural network layers 230. The input layer 210 may obtain the to-be-processed image and hand it over to the convolutional/pooling layers 220 and the subsequent neural network layers 230 for processing, so that the processing result of the image can be obtained. The internal layer structure of the CNN 200 in FIG. 2 is described in detail below.
Convolutional/pooling layers 220:

Convolutional layer:

As shown in FIG. 2, the convolutional/pooling layers 220 may include, for example, layers 221 to 226. In one implementation, layer 221 is a convolutional layer, layer 222 is a pooling layer, layer 223 is a convolutional layer, layer 224 is a pooling layer, layer 225 is a convolutional layer, and layer 226 is a pooling layer. In another implementation, layers 221 and 222 are convolutional layers, layer 223 is a pooling layer, layers 224 and 225 are convolutional layers, and layer 226 is a pooling layer. That is, the output of a convolutional layer may be used as the input of a subsequent pooling layer, or may be used as the input of another convolutional layer to continue the convolution operation.

The following takes the convolutional layer 221 as an example to describe the internal working principle of a convolutional layer.
The convolutional layer 221 may include many convolution operators. A convolution operator is also called a kernel, and its role in image processing is equivalent to a filter that extracts specific information from the input image matrix. A convolution operator may essentially be a weight matrix, and this weight matrix is usually predefined. In the process of performing the convolution operation on an image, the weight matrix is usually moved across the input image along the horizontal direction one pixel at a time (or two pixels at a time, depending on the value of the stride), so as to complete the work of extracting specific features from the image. The size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image; during the convolution operation, the weight matrix extends to the entire depth of the input image. Therefore, convolution with a single weight matrix produces a convolutional output with a single depth dimension, but in most cases a single weight matrix is not used; instead, multiple weight matrices of the same size (rows × columns), that is, multiple matrices of the same shape, are applied. The outputs of the weight matrices are stacked to form the depth dimension of the convolved image, where the dimension may be understood as being determined by the "multiple" mentioned above. Different weight matrices may be used to extract different features in the image: for example, one weight matrix is used to extract image edge information, another weight matrix is used to extract a specific color of the image, and yet another weight matrix is used to blur unwanted noise in the image. The multiple weight matrices have the same size (rows × columns), so the convolutional feature maps extracted by them also have the same size, and the extracted convolutional feature maps of the same size are then combined to form the output of the convolution operation.
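The sliding-window operation described above can be illustrated as follows; this is a simplified sketch (single input channel, single kernel, no padding), not the patent's implementation:

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image`, summing elementwise products at each position."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# A 3x3 edge-extraction kernel, one example of the feature extractors mentioned above.
edge_kernel = np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]])
```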
In actual applications, the weight values in these weight matrices need to be obtained through extensive training, and the weight matrices formed by the trained weight values may be used to extract information from the input image, so that the convolutional neural network 200 makes correct predictions.

When the convolutional neural network 200 has multiple convolutional layers, the initial convolutional layer (for example, layer 221) often extracts more general features, which may also be called low-level features. As the depth of the convolutional neural network 200 increases, the features extracted by the later convolutional layers (for example, layer 226) become more and more complex, such as high-level semantic features; features with higher-level semantics are more applicable to the problem to be solved.
Pooling layer:

Because it is often necessary to reduce the number of training parameters, a pooling layer often needs to be introduced periodically after a convolutional layer. For the layers 221 to 226 illustrated as 220 in FIG. 2, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers. In image processing, the sole purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a max pooling operator, which are used to sample the input image to obtain an image of a smaller size. The average pooling operator may calculate the pixel values in the image within a specific range to produce an average value as the result of average pooling. The max pooling operator may take, within a specific range, the pixel with the largest value in that range as the result of max pooling. In addition, just as the size of the weight matrix in the convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output after processing by the pooling layer may be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average or maximum value of the corresponding sub-area of the image input to the pooling layer.
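A minimal sketch of the average and max pooling operators just described, reducing the spatial size of a single-channel input; the window size and the non-overlapping windows are assumptions for illustration:

```python
import numpy as np

def pool2d(image, window=2, mode="max"):
    """Downsample `image` by taking the max or mean of each window x window block."""
    h, w = image.shape[0] // window, image.shape[1] // window
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = image[i*window:(i+1)*window, j*window:(j+1)*window]
            out[i, j] = patch.max() if mode == "max" else patch.mean()
    return out
```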
Neural network layers 230:

After processing by the convolutional/pooling layers 220, the convolutional neural network 200 is not yet able to output the required output information. As described above, the convolutional/pooling layers 220 only extract features and reduce the parameters brought by the input image. However, to generate the final output information (the required class information or other related information), the convolutional neural network 200 needs to use the neural network layers 230 to generate one output or a set of outputs whose number equals the number of required classes. Therefore, the neural network layers 230 may include multiple hidden layers (231, 232 to 23n as shown in FIG. 2) and an output layer 240. The parameters contained in the multiple hidden layers may be obtained through pre-training based on related training data of a specific task type; for example, the task type may include image recognition, image classification, image super-resolution reconstruction, and the like.

After the multiple hidden layers in the neural network layers 230, that is, as the last layer of the entire convolutional neural network 200, there is the output layer 240. The output layer 240 has a loss function similar to the categorical cross entropy and is specifically used to calculate the prediction error. Once the forward propagation of the entire convolutional neural network 200 (propagation in the direction from 210 to 240 in FIG. 2) is completed, the back propagation (propagation in the direction from 240 to 210 in FIG. 2) starts to update the weight values and biases of the layers mentioned above, so as to reduce the loss of the convolutional neural network 200 and the error between the result output by the convolutional neural network 200 through the output layer and the ideal result.
The structure of the neural network specifically used in the image processing method in the embodiments of this application may also be as shown in FIG. 3. In FIG. 3, a convolutional neural network (CNN) 200 may include an input layer 110, convolutional/pooling layers 120 (where the pooling layers are optional), and neural network layers 130. Compared with FIG. 2, the multiple convolutional/pooling layers within the convolutional/pooling layers 120 in FIG. 3 are parallel, and the separately extracted features are all input to the neural network layers 130 for processing.

It should be noted that the convolutional neural networks shown in FIG. 2 and FIG. 3 are only examples of two possible convolutional neural networks for the image processing method in the embodiments of this application. In specific applications, the convolutional neural network used in the image processing method in the embodiments of this application may also exist in the form of other network models.
FIG. 4 shows a hardware structure of a chip provided by an embodiment of this application, and the chip includes a neural network processor 50. The chip may be disposed in the execution device 110 shown in FIG. 1 to complete the calculation work of the calculation module 111. The chip may also be disposed in the training device 120 shown in FIG. 1 to complete the training work of the training device 120 and output the target model/rule 101. The algorithms of all layers of the convolutional neural networks shown in FIG. 2 and FIG. 3 can be implemented in the chip shown in FIG. 4.

The neural network processor NPU 50 is mounted as a coprocessor on a host central processing unit (host CPU), and the host CPU assigns tasks. The core part of the NPU is an operation circuit 503, and a controller 504 controls the operation circuit 503 to fetch data from a memory (a weight memory or an input memory) and perform operations.

In some implementations, the operation circuit 503 internally includes multiple processing engines (PEs). In some implementations, the operation circuit 503 is a two-dimensional systolic array. The operation circuit 503 may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 503 is a general-purpose matrix processor.

For example, assume there are an input matrix A, a weight matrix B, and an output matrix C. The operation circuit fetches the data corresponding to matrix B from a weight memory 502 and buffers it on each PE in the operation circuit. The operation circuit fetches the data of matrix A from an input memory 501, performs a matrix operation with matrix B, and stores the partial results or final results of the matrix in an accumulator 508.
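At its core, the computation just described is a matrix multiplication with accumulation of partial results. The sketch below mirrors that flow in plain Python as a functional illustration only; it does not model the PE array, the memories, or the data movement of the actual hardware:

```python
import numpy as np

def matmul_accumulate(A, B):
    """C = A @ B, built up by accumulating one partial result per column of A."""
    C = np.zeros((A.shape[0], B.shape[1]))   # plays the role of the accumulator
    for k in range(A.shape[1]):
        C += np.outer(A[:, k], B[k, :])      # partial result contributed by index k
    return C
```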
A vector calculation unit 507 may further process the output of the operation circuit, for example with vector multiplication, vector addition, exponential operations, logarithmic operations, and size comparison. For example, the vector calculation unit 507 may be used for network calculation of the non-convolutional/non-FC layers in a neural network, such as pooling, batch normalization, and local response normalization.

In some implementations, the vector calculation unit 507 can store the processed output vectors to a unified buffer 506. For example, the vector calculation unit 507 may apply a nonlinear function to the output of the operation circuit 503, such as a vector of accumulated values, to generate activation values. In some implementations, the vector calculation unit 507 generates normalized values, merged values, or both. In some implementations, the vector of processed outputs can be used as activation input to the operation circuit 503, for example, for use in subsequent layers of the neural network.

A unified memory 506 is used to store input data and output data.

A direct memory access controller (DMAC) 505 transfers input data in an external memory to the input memory 501 and/or the unified memory 506, stores weight data in the external memory into the weight memory 502, and stores data in the unified memory 506 into the external memory.

A bus interface unit (BIU) 510 is used to implement interaction among the host CPU, the DMAC, and an instruction fetch buffer 509 through a bus.

The instruction fetch buffer 509 connected to the controller 504 is used to store instructions used by the controller 504.

The controller 504 is used to invoke the instructions buffered in the instruction fetch buffer 509, to control the working process of the operation accelerator.

Generally, the unified memory 506, the input memory 501, the weight memory 502, and the instruction fetch buffer 509 are all on-chip memories, and the external memory is a memory outside the NPU. The external memory may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM), or another readable and writable memory.

The operations of the layers in the convolutional neural networks shown in FIG. 2 and FIG. 3 may be performed by the operation circuit 503 or the vector calculation unit 507.
The execution device 110 in FIG. 1 described above can perform the steps of the image processing method in the embodiments of this application, and the CNN models shown in FIG. 2 and FIG. 3 and the chip shown in FIG. 4 may also be used to perform the steps of the image processing method in the embodiments of this application. The neural network training method and the image processing method in the embodiments of this application are described in detail below with reference to the accompanying drawings.
As shown in FIG. 5, an embodiment of this application provides a system architecture 300. The system architecture includes a local device 301, a local device 302, an execution device 210, and a data storage system 250, where the local device 301 and the local device 302 are connected to the execution device 210 through a communication network.

The execution device 210 may be implemented by one or more servers. Optionally, the execution device 210 may be used in cooperation with other computing devices, such as data storage devices, routers, and load balancers. The execution device 210 may be arranged on one physical site or distributed across multiple physical sites. The execution device 210 may use the data in the data storage system 250 or invoke the program code in the data storage system 250 to implement the image processing method in the embodiments of this application.
Specifically, the execution device 210 may perform the following process: determining texture complexity information corresponding to each of multiple area images in a to-be-processed image; determining, according to the texture complexity information corresponding to each area image, an image compression model corresponding to each area image, where different texture complexity information corresponds to different image compression models; and compressing each area image by using the image compression model corresponding to that area image.

Through the foregoing process, the execution device 210 can compress each area image by using the compression model corresponding to the texture complexity of that area image, which improves the image compression effect for the to-be-processed image. A high-level sketch of this process is given below.
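In the sketch below, the names `classify`, `models`, and `model.compress` are hypothetical stand-ins for the classification model and the set of per-complexity compression models described in this application; it illustrates the control flow only, not the actual networks:

```python
def compress_image(region_images, models, classify):
    """Compress each region with the model matching its texture complexity."""
    compressed = []
    for region in region_images:
        complexity = classify(region)   # texture complexity class of this region
        model = models[complexity]      # a different compression model per class
        compressed.append(model.compress(region))
    return compressed
```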
A user may operate respective user devices (for example, the local device 301 and the local device 302) to interact with the execution device 210. Each local device may represent any computing device, such as a personal computer, a computer workstation, a smartphone, a tablet computer, a smart camera, a smart car or another type of cellular phone, a media consumption device, a wearable device, a set-top box, or a game console.

The local device of each user may interact with the execution device 210 through a communication network of any communication mechanism/communication standard. The communication network may be a wide area network, a local area network, a point-to-point connection, or the like, or any combination thereof.

In an implementation, the local device 301 and the local device 302 obtain the related parameters of the target neural network from the execution device 210, deploy the target neural network on the local device 301 and the local device 302, and use the target neural network for image classification, image processing, or the like.

In another implementation, the target neural network may be directly deployed on the execution device 210. The execution device 210 obtains the to-be-processed images from the local device 301 and the local device 302, and classifies the to-be-processed images or performs other types of image processing according to the target neural network.

The execution device 210 may also be a cloud device; in this case, the execution device 210 may be deployed in the cloud. Alternatively, the execution device 210 may be a terminal device; in this case, the execution device 210 may be deployed on the user terminal side. This is not limited in the embodiments of this application.
FIG. 6 shows a schematic structural diagram of an image processing system.

The image processing system 600 includes an encoder 610 and a quantization module 620 at the encoding end, and a decoder 630 at the decoding end. The encoder 610 and the decoder 630 are neural networks.

The image processing system 600 may be applied in scenarios where images are transmitted and stored.
The encoder 610 and the quantization module 620 at the encoding end may be disposed in a server in the cloud. The image data goes through the image encoding process in the cloud to obtain compactly represented compressed data. Storing the compressed data can reduce the storage space occupied by saving the image. Transmitting the compressed data can reduce the occupation of transmission resources during image transmission and reduce the demand for bandwidth.

The decoder 630 at the decoding end may be disposed in a terminal device serving as a client. The decoding end performs a decoding operation on the compressed data to obtain a reconstructed image. The terminal device may display the reconstructed image on a display.

The encoder 610 is used to extract features from the to-be-processed image, so as to obtain image features.
The quantization module 620 is used to quantize the image features to obtain compressed data. Quantization is, in the field of digital signal processing, the process of approximating the continuous values of a signal (or a large number of possible discrete values) by a finite number of (or fewer) discrete values.
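A minimal sketch of scalar quantization as just defined: continuous feature values are mapped to a finite set of integers by scaling and rounding, with the step size an assumed parameter (the patent does not specify the quantization scheme):

```python
import numpy as np

def quantize(features, step=0.1):
    """Map continuous feature values to discrete integer levels."""
    return np.round(features / step).astype(np.int32)

def dequantize(indices, step=0.1):
    """Approximately reconstruct the original continuous values."""
    return indices.astype(np.float64) * step
```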
云端可以将压缩数据传输至客户端。The cloud can transmit the compressed data to the client.
解码器630用于对压缩数据进行解压缩,以得到重构图像。The decoder 630 is used to decompress the compressed data to obtain a reconstructed image.
The image processing system 600 takes the entire image as input and applies a nonlinear transformation to it, reducing the correlation between codewords and improving the compression performance of the neural network.

Natural images contain very rich texture information. In computer graphics, texture includes both texture in the usual sense, i.e., the uneven grooves on an object's surface, and the color patterns on an object's smooth surface. Texture complexity reflects how sharply the pixel values in an image vary. Images of different categories differ markedly in texture detail and similar characteristics, and images of different texture complexities differ considerably in their content characteristics.

The image processing system 600 applies the same processing with the same encoder to images of different texture complexities, which hinders further improvement of the compression performance.
To solve the above problem, an embodiment of this application provides an image processing system that improves image compression performance.

FIG. 7 is a schematic structural diagram of an image processing system provided by an embodiment of this application. According to the texture complexity of an image, the neural network corresponding to that texture complexity is selected from multiple neural networks. This structure adapts the choice of neural network to the image content and compresses images according to their different texture characteristics, thereby further improving image compression performance.

The image processing system 700 includes a compression system 710 and a decompression system 720. The compression system 710 includes a segmentation model 711, a classification model 712, and a compression module 713; the decompression system 720 includes a decompression module 721 and a fusion module 722.

The compression system 710 and the decompression system 720 may be located in the same device or in different devices.

The image to be processed is input into the compression system 710, which compresses it.
The segmentation model 711 can segment the image to be processed to obtain multiple region images.

A region image may also be called an image block.

The multiple region images may or may not have the same size. To reduce the difficulty of image segmentation, the image can be segmented according to a target size, yielding multiple region images of the same size.

The multiple region images may or may not overlap. To improve compression performance, the region images do not overlap.

For example, the image to be processed can be divided into multiple 128×128 region images. Through this non-overlapping division of the image to be processed, multiple region images with no repeated content are formed.
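Purely as an illustration of the division step (not part of the embodiments), a minimal NumPy sketch follows; the block size of 128 matches the example above, and the assumption that the image dimensions are exact multiples of the block size is a simplification.

```python
import numpy as np

def split_into_regions(image: np.ndarray, block: int = 128):
    """Split an H×W(×C) image into non-overlapping block×block region images.

    Assumes H and W are exact multiples of `block`; a real system would
    pad or handle border blocks separately.
    """
    h, w = image.shape[:2]
    regions = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            regions.append(image[y:y + block, x:x + block])
    return regions
```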
The region images are input into the classification model 712, which computes the texture complexity of its input image.

A simple first-order differential operation on a digital image has a fixed directionality and can therefore only detect edges in one particular direction. To overcome this shortcoming of the first-order derivative, the gradient of each pixel in the image can be computed, realizing a first-order differential operation on the image while taking directionality into account.

The direction of the image gradient is the direction of the maximum rate of change of the image gray level, and it reflects the gray-level changes along image edges. The gradient always points in the direction of the sharpest variation; in image processing, its direction is orthogonal to the edges in the image, and its magnitude represents the rate of change of the image gray level.

The classification model 712 can compute, for the input image, the difference values of each pixel's luminance in the horizontal and vertical directions. From these two difference values, the gradient magnitude of each pixel can be computed. From the gradient magnitudes of all pixels, the average gradient magnitude, or the median of the gradient magnitudes of the pixels in the image, can be determined. The average gradient magnitude or the median indicates how smooth the image is and reflects its texture complexity.
The compression module 713 extracts image features from an image using the compression model corresponding to the image's texture complexity, thereby compressing the image. The compression module 713 may contain an AI model for feature extraction, called a compression model (or an image compression model), or it may call the compression model through an interface. The compression model may be a pre-trained neural network model, for example a CNN or an RNN. A region image is input into the compression model to obtain the image features of that region image.

The compression module 713 may store the correspondence between texture complexities and compression models. Thus, according to the texture complexity of a region image, the compression module 713 can determine the compression model corresponding to that texture complexity from among the multiple compression models and use it to process the region image.
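A minimal sketch of such a stored correspondence, assuming a hypothetical two-level labelling into "simple" and "complex" and a hypothetical threshold; the registry entries are placeholders standing in for trained compression models:

```python
# Hypothetical registry mapping a texture-complexity label to a model.
# Labels and threshold are illustrative only; real entries would be
# trained neural network compression models.
COMPLEXITY_THRESHOLD = 20.0

compression_models = {
    "simple": "compression model trained on low-texture regions",
    "complex": "compression model trained on high-texture regions",
}

def select_compression_model(avg_gradient: float):
    """Return the label and model matching the region's texture complexity."""
    label = "complex" if avg_gradient >= COMPLEXITY_THRESHOLD else "simple"
    return label, compression_models[label]
```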
By processing the region images with the compression module 713, the image features of each region image are obtained.

The compression system 710 may also include a quantization model or the like, which quantizes the image features.

The decompression system 720 decompresses the processing results of the compression system 710.

The image features produced by the compression system 710 are input into the decompression module 721. These may be the image features output by the compression module 713 or, when the compression system 710 also includes a quantization model, the quantized image features.

The decompression module 721 decompresses the image features to obtain a decompressed image.
The decompression module 721 may, according to the texture complexity of the image, decompress the image features using the decompression model corresponding to that texture complexity to obtain the decompressed image.

Alternatively, the decompression module 721 may receive indication information that indicates the decompression model corresponding to each image feature, and decompress each image feature with the indicated decompression model.

The decompression module 721 may contain an AI model for decompressing image features, called a decompression model or an image decompression model, or it may call the decompression model through an interface. The decompression model may be a pre-trained neural network model, for example a CNN or an RNN. An image feature and the image texture complexity corresponding to that feature are input into the decompression model to obtain the decompressed region image.

The decompression module 721 may store the correspondence between texture complexities and decompression models. Thus, according to the texture complexity of a region image, the decompression module 721 can determine the decompression model corresponding to that texture complexity from among the multiple decompression models and use it to process the image feature.
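To illustrate only the pairing of compression and decompression models selected by the same label, the following sketch uses coarse quantizers as toy stand-ins; the real models are trained neural networks, and the step sizes are invented:

```python
import numpy as np

# Toy stand-ins: each "compressor" is a coarse quantizer and each
# "decompressor" is its dequantizer. The point shown is the pairing of
# the two sides through the same texture-complexity label.
steps = {"simple": 16, "complex": 4}  # hypothetical quantization steps

def toy_compress(region: np.ndarray, label: str) -> np.ndarray:
    return np.round(region / steps[label]).astype(np.int32)

def toy_decompress(feature: np.ndarray, label: str) -> np.ndarray:
    # The label acts as the indication information selecting the paired model.
    return feature * steps[label]
```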
Through the processing of the decompression module 721, the restored region images are obtained.

The fusion module 722 fuses the restored region images. The fusion module 722 may contain an AI model for image fusion, called a fusion model, or it may call the fusion model through an interface. The fusion model may be a pre-trained neural network model, for example a CNN. The restored region images are input into the fusion model to obtain the fused image, which may also be called the reconstructed image or the compressed-reconstructed image.

Fusing the region images may consist of stitching them together.

Further, fusing the region images may also include adjusting the edge pixels of the region images, so that the error between the reconstructed image and the image to be processed is smaller and the distortion is reduced.
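A minimal sketch of the stitching part of fusion, assuming the regions come from the row-major non-overlapping split sketched earlier; the learned edge-pixel adjustment performed by the fusion model is not shown:

```python
import numpy as np

def stitch_regions(regions, rows: int, cols: int) -> np.ndarray:
    """Reassemble a row-major list of equally sized region images.

    rows/cols are the number of blocks vertically and horizontally; a
    learned fusion model would additionally adjust pixels near block edges.
    """
    block_rows = [np.concatenate(regions[r * cols:(r + 1) * cols], axis=1)
                  for r in range(rows)]
    return np.concatenate(block_rows, axis=0)
```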
By computing the texture complexity of each region image and, where the complexities differ, processing the data with different compression and decompression models, the image processing system 700 improves image compression performance.

By segmenting the image to be processed and computing the texture complexity of each region image separately, the image processing system 700 can process the foreground and the background of the image with different compression and decompression models, improving image compression performance.

By adjusting the edge pixels of the region images during region-image fusion, the decompression system 720 in the image processing system 700 reduces image distortion and improves image compression performance.

The AI models used in the image processing system 700 can be obtained through end-to-end training; alternatively, the compression and decompression models can be trained first and the fusion model trained afterwards. For the training method of the AI models used in the image processing system 700, refer to the description of FIG. 8.

End-to-end training is a machine learning paradigm in which the learning process is not manually divided into sub-problems; instead, a deep learning model directly learns the mapping from the raw data to the desired output.
FIG. 8 is a schematic flowchart of a neural network model training method provided by an embodiment of this application.
At S810, the texture complexity information corresponding to each training area image among multiple training area images in a training image is determined.

A training image to be processed can be obtained and divided in its entirety to obtain the multiple training area images.

The multiple training area images do not overlap, and together they include all the pixels in the training image.

At S820, according to the training texture complexity information corresponding to each training area image, the encoding/decoding model corresponding to that training area image is determined.

Different texture complexity information corresponds to different encoding/decoding models.

Each encoding/decoding model compresses the training area images input to it and decompresses the compression results, yielding multiple decompressed training area images.

That is, an encoding/decoding model includes a compression model and a decompression model: the compression model compresses the image, and the decompression model decompresses the compression model's output.

Each training area image is input into its corresponding encoding/decoding model, which processes it to obtain the decompressed training area image corresponding to that training area image.

Within the encoding/decoding model, the compression model compresses the training area image to obtain its training features, and the decompression model decodes those training features to obtain the decompressed training area image corresponding to the training area image.
At S830, the parameters of the encoding/decoding model are adjusted according to the rate-distortion, which is obtained from the decompressed training area images and the training area images.

That is, the encoding/decoding models can be trained using the training images.

When performing S830, the training area images are processed each time with the parameter-adjusted encoding/decoding model, until the rate-distortion gradually converges, yielding the trained encoding/decoding model.
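The embodiments do not spell out a concrete loss, so the sketch below assumes the common rate-distortion objective L = R + λ·D; the encoder/decoder modules, the rate proxy, and the value of λ are placeholders, written in PyTorch style:

```python
import torch
import torch.nn.functional as F

def rd_training_step(encoder, decoder, region, optimizer, lam=0.01):
    """One rate-distortion step: L = R + lam * D (assumed formulation).

    encoder/decoder: paired torch.nn.Module for one texture-complexity class.
    region: tensor of shape (N, C, H, W) holding training area images.
    The L1 norm of the features is only a crude stand-in for a true
    bit-rate estimate, which the embodiments do not specify.
    """
    optimizer.zero_grad()
    features = encoder(region)              # compression / feature extraction
    recon = decoder(features)               # decompression / reconstruction
    distortion = F.mse_loss(recon, region)  # D: image distortion
    rate = features.abs().mean()            # R: placeholder rate proxy
    loss = rate + lam * distortion
    loss.backward()
    optimizer.step()
    return loss.item()
```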
For a training image to be processed, a fusion model can be used to perform stitching processing and optimization processing on the decompressed training area images corresponding to the training area images input into it, obtaining a training restoration image. The embodiments of this application do not limit the order of the stitching processing and the optimization processing.

The optimization processing includes adjusting the edge regions of the decompressed training images.

The optimization processing may also include adjusting regions of the decompressed training images other than their edge regions.

It should be understood that the adjustment in the optimization processing is an adjustment of color; for example, the luminance and chrominance can be adjusted.
The parameters of the fusion model can be adjusted according to the image distortion between the training restoration image and the training image to be processed, completing the training of the fusion model.

The rate-distortion can also be determined from the image distortion between the training restoration image and the training image to be processed, and the parameters of the encoding/decoding models adjusted accordingly, completing the training of the encoding/decoding models.

The fusion model can stitch the decompressed training images corresponding to the training area images and adjust the pixels located in the edge regions of the decompressed training images, obtaining the training restoration image, i.e., the restored training image to be processed.

The image distortion can be determined from the difference between the training restoration image and the training image to be processed. The bit rate can be determined from the average amount of data per pixel in the compression results.

According to the image distortion between the training restoration image and the training image to be processed, the parameters of the fusion model and of the encoding/decoding model corresponding to each training area image are adjusted, so that the image distortion is reduced while the bit rate satisfies a preset condition.

Alternatively, the parameters of the fusion model and of the encoding/decoding model corresponding to each training area image may first be adjusted according to the bit rate, and then adjusted so that the bit rate is reduced while the image distortion satisfies a preset condition.

Alternatively, the rate-distortion can reflect the compression performance as a whole, and the parameters of the fusion model and of the encoding/decoding model corresponding to each training area image can be adjusted to minimize the rate-distortion.

For each training image to be processed, processing is performed each time with the parameter-adjusted encoding/decoding models and fusion model, until the rate-distortion gradually converges, yielding the trained encoding/decoding models and fusion model.
Thus, the training of the AI models in the image processing system 700 can be implemented in an end-to-end manner.

To speed up the training of the AI models in the image processing system 700, the encoding/decoding models can instead be pre-trained first, and the fusion model then trained using the pre-trained encoding/decoding models.

Specifically, a training image to be processed can be obtained and divided to obtain multiple training area images.

Each training area image can be compressed with its corresponding encoding/decoding model and the compression result decompressed, so as to determine the bit rate and the image distortion and thereby the rate-distortion. The parameters of that encoding/decoding model are adjusted to optimize the rate-distortion.

A large number of training images to be processed are processed, yielding a large number of training area images that cover images of every texture complexity. For each texture complexity, the training area images of that complexity are processed each time with the parameter-adjusted encoding/decoding model, until the rate-distortion gradually converges, yielding the pre-trained encoding/decoding models.
The pre-trained encoding/decoding models are used to process the training images to be processed, obtaining the decompressed training area image corresponding to each training area image in each training image.

The fusion model fuses the decompressed training images corresponding to the training area images of one training image to be processed, obtaining a training restoration image.

The parameters of the fusion model are adjusted to minimize the difference between the training restoration image and the training image to be processed before compression.

With the adjusted fusion model, the processing results of the pre-trained encoding/decoding models for each training image to be processed are fused, until the error between the training restoration image and the pre-compression training image gradually converges, yielding the trained fusion model.

In some embodiments, while the fusion model is being trained with the pre-trained encoding/decoding models, the parameters of the encoding/decoding models may also be adjusted, yielding all the trained AI models of the image processing system 700.

Through S810 to S830, the encoding/decoding model corresponding to each texture complexity is trained during the training process using training area images of different texture complexities. With the trained encoding/decoding models, region images of different texture complexities can be processed differently, improving the compression performance for the image as a whole.
Image processing can then be performed with the trained encoding/decoding models; see the description of FIG. 9.

That is, at S830 the bit rate and the image distortion can be computed from the decompressed training area images and the training area images to determine the rate-distortion, and the parameters of the encoding/decoding models adjusted according to the rate-distortion, completing the training or pre-training of the encoding/decoding models. Afterwards, the fusion model is trained using the trained or pre-trained encoding/decoding models.

Alternatively, at S830 the training restoration image can be determined from the decompressed training area images through the processing of the fusion model. The bit rate is computed from the decompressed training area images, and the image distortion from the training restoration image and the training image; the rate-distortion is then determined from the bit rate and the image distortion. The parameters of the encoding/decoding models and the fusion model can then be adjusted according to the rate-distortion, completing the training of the encoding/decoding models and the fusion model.
FIG. 9 is a schematic flowchart of an image processing method provided by an embodiment of this application. The method 900 shown in FIG. 9 can be executed by an image processing apparatus, which may be a mobile terminal, a computer, a server, or any other device whose computing power is sufficient for image processing.

The method 900 can be applied in fields that require image compression, such as image transmission and image storage. The method includes S910 to S930, each of which is described in detail below.

At S910, the texture complexity information corresponding to each of multiple region images in an image to be processed is determined.

At S920, according to the texture complexity information corresponding to each region image, the image compression model corresponding to that region image is determined.

Different texture complexity information corresponds to different image compression models.
To determine the texture complexity of a region image, the difference values of each of its pixels in two different directions can be computed. A plane rectangular coordinate system is established with the horizontal direction of the region image as the x axis and the vertical direction as the y axis. The region image is stored as a two-dimensional array, and the difference value of pixel (i,j) in the x direction is:

dx(i,j)=p(i,j)-p(i-1,j)

The difference value of pixel (i,j) in the y direction is:

dy(i,j)=p(i,j)-p(i,j-1)

where p(i,j) may be the luminance of pixel (i,j), or another parameter representing the color of that pixel.
From the difference values in these two directions, the gradient (also called the gradient operator) of each pixel (i,j) in the region image can be computed and represented by the vector (dx(i,j), dy(i,j)).

The gradient magnitude Grad(i,j) of pixel (i,j) is:
Grad(i,j)=√(dx(i,j)²+dy(i,j)²)
From this, the average of the gradient magnitudes over the region image can be obtained:
G=(1/(W×H))×Σ Grad(i,j), summed over all pixels (i,j) of the region image
where W is the number of pixels of the region image in the x direction and H is the number of pixels in the y direction. The average gradient G of the region image can be used to represent and evaluate its texture complexity.

In some embodiments, the gradient magnitude (also called the gradient length) Grad(i,j) of each pixel (i,j) can also be expressed as:

Grad(i,j)=|dx(i,j)|+|dy(i,j)|

In the above gradient computations, taking the absolute values or the squares of the difference values in the x and y directions prevents positive and negative values in the two directions from cancelling each other out, making the gradient computation more accurate.
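The formulas above translate directly into NumPy; this sketch is illustrative, and the threshold in texture_label is a hypothetical stand-in for the preset value discussed below:

```python
import numpy as np

def average_gradient(region: np.ndarray, use_abs: bool = False) -> float:
    """Average gradient magnitude G of a 2-D luminance array.

    Columns are taken as the x (horizontal) direction and rows as the
    y (vertical) direction, matching the definitions above; pixels with
    no left/upper neighbour get a zero difference for simplicity.
    """
    p = region.astype(np.float64)
    dx = np.zeros_like(p)
    dy = np.zeros_like(p)
    dx[:, 1:] = p[:, 1:] - p[:, :-1]  # dx(i,j) = p(i,j) - p(i-1,j)
    dy[1:, :] = p[1:, :] - p[:-1, :]  # dy(i,j) = p(i,j) - p(i,j-1)
    grad = np.abs(dx) + np.abs(dy) if use_abs else np.sqrt(dx ** 2 + dy ** 2)
    return float(grad.mean())         # mean = (1/(W*H)) * sum of Grad(i,j)

def texture_label(region: np.ndarray, threshold: float = 20.0) -> str:
    # threshold is a hypothetical stand-in for the preset value.
    return "complex" if average_gradient(region) >= threshold else "simple"
```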
The image compression model corresponding to the texture complexity information of each region image can be determined according to the correspondence between texture complexities and compression models.

The correspondence between texture complexity information and image compression models may include two or more kinds of texture complexity information and the compression model corresponding to each kind.

For example, when the average of the gradient magnitudes of an image is greater than or equal to a preset value, the texture complexity of the image can be determined to be complex; when the average is smaller than the preset value, the texture complexity can be determined to be simple. The compression model corresponding to the image can thus be determined from the average of its gradient magnitudes.

It should be understood that this correspondence between texture complexities and compression models is the same correspondence that was used when training the compression models employed in the method 900.
At S930, each region image is compressed using the image compression model corresponding to that region image.

Through S910 to S930, each region image in the image to be processed is compressed with the image compression model corresponding to its texture complexity, improving the image compression effect for the image to be processed.

The image to be processed may be a complete image, for example a photo captured by a camera or one frame of a video.

The image to be processed is divided to obtain the multiple region images. To reduce the complexity of the division, the region images may all have the same size. To reduce the amount of compressed data, the region images do not overlap. The multiple region images may include all the pixels in the image to be processed, reducing image distortion and improving compression performance.

The texture complexity may differ across different regions of a complete image. For example, in background regions such as sky or beach, the texture complexity is low; in regions of interest or foreground regions containing targets such as people, the complexity is high.

By dividing the image to be processed into multiple region images and compressing each region image with the compression model corresponding to its texture complexity, regions of different texture complexities within the image can each receive compression processing suited to their complexity, improving the compression effect for the image as a whole. The embodiments of this application thus provide a more flexible way of processing images.
The image to be processed has been divided into multiple region images, and each region image has been compressed, yielding compressed data. When the compressed data is decompressed to recover the image to be processed, the compressed data corresponding to each region image can be decompressed separately.

Each region image was compressed with its corresponding image compression model to obtain the image features of that region image. The image features obtained from compression should be decompressed with the decompression model corresponding to the image compression model used during compression, yielding a region decompressed image, which can be understood as the restored region image obtained by decompression.

The region decompressed images corresponding to the region images of the image to be processed are stitched together, restoring the image to be processed.
Further, the region decompressed images can be optimized.

The optimization processing may include adjusting the edge regions of one or more region decompressed images.

Because the second images (i.e., the region decompressed images) corresponding to the region images may have been obtained through different decompression models, discontinuous lines or color differences may appear in the edge regions of two adjacent second images after stitching. To reduce the image distortion between the decompressed image and the image to be processed, the pixels in the edge regions of one or more second images can be adjusted.

The optimization processing may also include adjusting regions of the region decompressed images other than their edge regions.

The optimization processing may be performed before or after stitching; the embodiments of this application do not limit this.

A fusion model can be used to perform the stitching processing and the optimization processing on the pixels of the edge regions of the second images.

By adjusting the pixels in the edge regions of one or more second images through the optimization processing, the image distortion can be further reduced and the image compression effect improved.
FIG. 10 is a schematic diagram of the compression performance of the image processing method provided by an embodiment of this application.

Compressing and decompressing images with the image processing method 900 achieves better image compression performance.

The image to be processed is divided into multiple region images of equal size, which together include all the pixels of the image to be processed.

For each region image, when the average of its gradient magnitudes is greater than or equal to the preset value, its texture complexity is determined to be complex; when the average is smaller than the preset value, its texture complexity is determined to be simple. The compression model and the decompression model corresponding to the image can thus be determined from the average of its gradient magnitudes; the image is compressed, and the compression result decompressed.

Afterwards, the fusion model adjusts the pixels located in the edge regions of the decompressed images and stitches them together, yielding the decompressed image.

Different compression algorithms were tested on the Kodak dataset. As shown in FIG. 10, compared with processing the image to be processed with a single compression model and decompression model regardless of image complexity, the multi-model image processing method provided by the embodiments of this application effectively improves the PSNR at the same bit rate, i.e., achieves lower image distortion.
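For reference, the metrics in this comparison can be computed from their standard definitions as sketched below; this is generic code, not the embodiments' evaluation harness, and the 8-bit peak value of 255 is an assumption:

```python
import numpy as np

def psnr(original: np.ndarray, reconstructed: np.ndarray,
         peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)  # assumes the images differ (mse > 0)
    return float(10.0 * np.log10(peak ** 2 / mse))

def bits_per_pixel(compressed_size_bytes: int, height: int, width: int) -> float:
    """Bit rate as the average number of bits spent per pixel."""
    return compressed_size_bytes * 8.0 / (height * width)
```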
The image processing system, the training method for the AI models that the image processing system requires, and the image processing method provided by the embodiments of this application have been described above with reference to FIG. 1 to FIG. 10. The apparatus embodiments of this application are described below with reference to FIG. 11 to FIG. 14. It should be understood that the descriptions of the image processing system, of the training method for the AI models it requires, and of the image processing method correspond to the descriptions of the apparatus embodiments; for parts not described in detail, refer to the descriptions above.
FIG. 11 is a schematic structural diagram of an image processing apparatus provided by an embodiment of this application.

The image processing apparatus 2000 includes a storage module 2010 and a processing module 2020.

The storage module 2010 stores program instructions.

When the program instructions are executed in the processor, the processing module 2020 is configured to:

determine the texture complexity information corresponding to each of multiple region images in an image to be processed;

determine, according to the texture complexity information corresponding to each region image, the image compression model corresponding to that region image, where different texture complexity information corresponds to different image compression models; and

compress each region image using the image compression model corresponding to that region image.

Optionally, the processing module 2020 is further configured to decompress the image features obtained by compressing each region image, using the image decompression model corresponding to the image compression model that compressed that region image, so as to obtain the region decompressed image corresponding to each region image.

The processing module 2020 is further configured to perform stitching processing and optimization processing on the multiple region decompressed images to obtain the restored image to be processed, where the optimization processing includes pixel adjustment of the edges of the multiple region decompressed images.

Optionally, the processing module 2020 is further configured to compute the gradient magnitude of each pixel in each region image, and to determine the texture complexity information of each region image according to the gradient magnitude of each pixel.

Optionally, the processing module 2020 is further configured to divide the image to be processed into the multiple region images, where the multiple region images do not overlap and include all the pixels in the image to be processed.
FIG. 12 is a schematic structural diagram of a neural network training apparatus provided by an embodiment of this application.

The neural network training apparatus 3000 includes a storage module 3010 and a processing module 3020.

The storage module 3010 stores program instructions.

When the program instructions are executed in the processor, the processing module 3020 is configured to:

determine the texture complexity information corresponding to each training area image among multiple training area images in a training image;

determine, according to the training texture complexity information corresponding to each training area image, the encoding/decoding model corresponding to that training area image, where different texture complexity information corresponds to different encoding/decoding models, and each encoding/decoding model compresses the training area images input to it and decompresses the compression results to obtain multiple decompressed training area images; and

adjust the parameters of the encoding/decoding model according to the rate-distortion, which is obtained from the decompressed training area images and the training area images.

Optionally, the processing module 3020 is further configured to perform stitching processing and optimization processing on the multiple decompressed training area images through a fusion model, where the optimization processing includes pixel adjustment of the edges of the multiple decompressed training area images.

The processing module 3020 is further configured to adjust the parameters of the fusion model according to the image distortion between the training restoration image and the training image.

Optionally, the processing module 3020 is further configured to adjust the parameters of the encoding/decoding model according to the image distortion between the training restoration image and the training image.

Optionally, the processing module 3020 is further configured to compute the gradient magnitude of each pixel in each training area image, and to determine the texture complexity information of each training area image according to the gradient magnitude of each pixel.

Optionally, the processing module 3020 is further configured to divide the training image into the multiple training area images, where the multiple training area images do not overlap and include all the pixels in the training image.
FIG. 13 is a schematic diagram of the hardware structure of an image processing apparatus according to an embodiment of this application. The image processing apparatus 4000 shown in FIG. 13 includes a memory 4001, a processor 4002, a communication interface 4003, and a bus 4004, with the memory 4001, the processor 4002, and the communication interface 4003 communicatively connected to one another through the bus 4004.

The memory 4001 may be a ROM, a static storage device, or a RAM. The memory 4001 may store a program; when the program stored in the memory 4001 is executed by the processor 4002, the processor 4002 and the communication interface 4003 are used to perform the steps of the image processing method of the embodiments of this application.

The processor 4002 may be a general-purpose CPU, a microprocessor, an ASIC, a GPU, or one or more integrated circuits, and executes related programs to implement the functions to be performed by the units in the image processing apparatus of the embodiments of this application, or to execute the image processing method of the method embodiments of this application.

The processor 4002 may also be an integrated circuit chip with signal processing capability, for example the chip shown in FIG. 4. During implementation, the steps of the image processing method of the embodiments of this application may be completed by integrated logic circuits of hardware in the processor 4002 or by instructions in the form of software.

The processor 4002 may also be a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or any conventional processor. The steps of the methods disclosed in the embodiments of this application may be directly embodied as being executed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 4001; the processor 4002 reads the information in the memory 4001 and, in combination with its hardware, completes the functions to be performed by the units included in the image processing apparatus of the embodiments of this application, or executes the image processing method of the method embodiments of this application.

The communication interface 4003 uses a transceiver apparatus, such as but not limited to a transceiver, to implement communication between the apparatus 4000 and other devices or communication networks. For example, the image to be processed can be acquired through the communication interface 4003.

The bus 4004 may include a path for transferring information between the components of the apparatus 4000 (for example, the memory 4001, the processor 4002, and the communication interface 4003).
FIG. 14 is a schematic diagram of the hardware structure of a neural network training apparatus according to an embodiment of this application. Similar to the apparatus 3000 and the apparatus 4000 described above, the neural network training apparatus 5000 shown in FIG. 14 includes a memory 5001, a processor 5002, a communication interface 5003, and a bus 5004, with the memory 5001, the processor 5002, and the communication interface 5003 communicatively connected to one another through the bus 5004.

The neural network can be trained by the neural network training apparatus 5000 shown in FIG. 14, and the trained neural network can then be used to execute the image processing method of the embodiments of this application.

Specifically, the apparatus shown in FIG. 14 can obtain the training data and the neural network to be trained from the outside through the communication interface 5003, after which the processor trains the neural network to be trained according to the training data.

It should be noted that although the apparatus 4000 and the apparatus 5000 described above show only a memory, a processor, and a communication interface, those skilled in the art should understand that, in specific implementations, the apparatus 4000 and the apparatus 5000 may also include other devices necessary for normal operation. Meanwhile, according to specific needs, those skilled in the art should understand that the apparatus 4000 and the apparatus 5000 may further include hardware devices implementing other additional functions. In addition, those skilled in the art should understand that the apparatus 4000 and the apparatus 5000 may also include only the devices necessary for implementing the embodiments of this application, without necessarily including all the devices shown in FIG. 13 and FIG. 14.
It should be understood that the processor in the embodiments of this application may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or any conventional processor.

It should also be understood that the memory in the embodiments of this application may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), used as an external cache. By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the processes or functions described in the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wired or wireless (for example, infrared, radio, microwave) means. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or a data center containing one or more sets of available media. The available media may be magnetic media (for example, floppy disks, hard disks, magnetic tapes), optical media (for example, DVDs), or semiconductor media. The semiconductor media may be solid-state drives.
It should be understood that the term "and/or" herein describes only an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate the three cases where only A exists, both A and B exist, and only B exists, where A and B may be singular or plural. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects before and after it, but may also indicate an "and/or" relationship; refer to the context for understanding.

In this application, "at least one" means one or more, and "multiple" means two or more. "At least one of the following items" or similar expressions refer to any combination of those items, including any combination of single items or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or multiple.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above processes do not imply an order of execution; the order of execution of the processes should be determined by their functions and internal logic, and shall not constitute any limitation on the implementation of the embodiments of the present application.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered to go beyond the scope of this application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing are merely specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (18)

  1. An image processing method, comprising:
    determining texture complexity information corresponding to each area image of a plurality of area images in an image to be processed;
    determining, according to the texture complexity information corresponding to each area image, an image compression model corresponding to that area image, wherein different texture complexity information corresponds to different image compression models; and
    compressing each area image by using the image compression model corresponding to that area image.
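By way of illustration only, and not as part of the claims, the selection logic of claim 1 can be sketched as follows in Python; the thresholds, the 'low'/'mid'/'high' model keys, and the mean-gradient complexity measure are assumptions of this sketch rather than features fixed by the application:

    import numpy as np

    def texture_complexity(area: np.ndarray) -> float:
        # Mean gradient magnitude as one plausible texture-complexity
        # measure; expects a 2-D (grayscale) array.
        gy, gx = np.gradient(area.astype(np.float64))
        return float(np.mean(np.hypot(gx, gy)))

    def compress_by_complexity(area_images, models, thresholds=(10.0, 30.0)):
        # Route each area image to the compression model matching its
        # texture complexity. `models` maps 'low'/'mid'/'high' to callables
        # that compress an area image into image features.
        compressed = []
        for area in area_images:
            c = texture_complexity(area)
            if c < thresholds[0]:
                key = 'low'
            elif c < thresholds[1]:
                key = 'mid'
            else:
                key = 'high'
            compressed.append(models[key](area))
        return compressed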
  2. The method according to claim 1, further comprising:
    decompressing, by using an image decompression model corresponding to the image compression model that compressed each area image, the image features obtained by compressing that area image, to obtain an area decompressed image corresponding to that area image; and
    performing stitching processing and optimization processing on the plurality of area decompressed images to obtain a restored to-be-processed image, wherein the optimization processing includes adjusting an edge of at least one of the area decompressed images.
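A minimal sketch of the stitching and edge handling of claim 2, assuming equally sized area decompressed images laid out in a rows-by-cols grid; the simple seam averaging below merely stands in for the claimed edge adjustment, which the application realizes differently:

    import numpy as np

    def stitch_and_smooth(areas, rows, cols):
        # Reassemble the areas in row-major order.
        grid = [np.concatenate(areas[r * cols:(r + 1) * cols], axis=1)
                for r in range(rows)]
        img = np.concatenate(grid, axis=0).astype(np.float64)
        h_step, w_step = areas[0].shape[0], areas[0].shape[1]
        # Average the two pixel rows/columns straddling each internal seam
        # as a stand-in for the claimed edge optimization.
        for r in range(1, rows):
            i = r * h_step
            img[i - 1:i + 1] = np.mean(img[i - 1:i + 1], axis=0, keepdims=True)
        for c in range(1, cols):
            j = c * w_step
            img[:, j - 1:j + 1] = np.mean(img[:, j - 1:j + 1], axis=1, keepdims=True)
        return img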
  3. The method according to claim 1 or 2, wherein determining the texture complexity information corresponding to each area image of the plurality of area images in the image to be processed comprises:
    calculating a gradient magnitude of each pixel in each area image; and
    determining the texture complexity information of each area image according to the gradient magnitude of each pixel.
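The per-pixel gradient computation of claim 3 admits many concrete operators; a sketch using 3x3 Sobel filters (an assumption, since the claim only requires some gradient magnitude) is:

    import numpy as np

    def pixel_gradient_magnitude(area: np.ndarray) -> np.ndarray:
        # Per-pixel gradient magnitude via 3x3 Sobel filters on a 2-D
        # (grayscale) array, with edge-replicated padding.
        kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
        ky = kx.T
        padded = np.pad(area.astype(np.float64), 1, mode='edge')
        h, w = area.shape
        gx = np.empty((h, w))
        gy = np.empty((h, w))
        for i in range(h):
            for j in range(w):
                window = padded[i:i + 3, j:j + 3]
                gx[i, j] = np.sum(window * kx)
                gy[i, j] = np.sum(window * ky)
        return np.hypot(gx, gy)

The texture complexity information of an area can then be taken, for example, as the mean or a histogram statistic of the returned magnitudes.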
  4. The method according to any one of claims 1 to 3, further comprising:
    dividing the image to be processed into the plurality of area images, wherein the plurality of area images do not overlap and together include all pixels of the image to be processed.
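A sketch of the non-overlapping division of claim 4; the tile size is an assumption, and edge tiles shrink so that every pixel is covered exactly once:

    import numpy as np

    def split_into_areas(img: np.ndarray, tile: int = 64):
        # Non-overlapping tiles that together cover every pixel; edge tiles
        # are smaller when the image size is not a multiple of `tile`.
        h, w = img.shape[:2]
        return [img[r:r + tile, c:c + tile]
                for r in range(0, h, tile)
                for c in range(0, w, tile)]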
  5. A neural network training method, comprising:
    determining texture complexity information corresponding to each training area image of a plurality of training area images in a training image;
    determining, according to the texture complexity information corresponding to each training area image, a codec model corresponding to that training area image, wherein different texture complexity information corresponds to different codec models, and each codec model is used to compress the input training area image and to decompress the compression result to obtain a plurality of decompressed training area images; and
    adjusting parameters of the codec model according to a rate-distortion measure, the rate-distortion measure being obtained from the decompressed training area images and the training area images.
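One way to read the rate-distortion adjustment of claim 5 is a standard loss of the form rate + λ·distortion; a sketch in PyTorch, where the codec interface (returning a reconstruction and an estimated bit cost) and the value of λ are assumptions of this sketch:

    import torch

    def rd_train_step(codec, batch, optimizer, lam=0.01):
        # One rate-distortion update: loss = rate + lambda * distortion.
        recon, bits = codec(batch)
        distortion = torch.mean((recon - batch) ** 2)   # MSE distortion
        rate = torch.mean(bits)                         # estimated bits
        loss = rate + lam * distortion
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()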
  6. The neural network training method according to claim 5, further comprising:
    performing stitching processing and optimization processing on the plurality of decompressed training area images through a fusion model to obtain a training restored image, wherein the optimization processing includes adjusting an edge of at least one of the decompressed training area images; and
    adjusting parameters of the codec model and the fusion model according to a degree of image distortion between the training restored image and the training image.
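The joint adjustment of claim 6 can be sketched as a training stage in which the distortion between the training restored image and the training image back-propagates through both the fusion model and the codec models; all interfaces below (split_fn, the codec return convention, a fusion module that accepts a list of tensors) are assumptions of this sketch:

    import torch

    def joint_train_step(codecs, fusion, image, split_fn, optimizer):
        # `optimizer` is assumed to hold the parameters of all codecs and
        # of the fusion model, so one backward pass adjusts both.
        areas = split_fn(image)
        decoded = [codec(a)[0] for codec, a in zip(codecs, areas)]
        restored = fusion(decoded)   # stitches areas and adjusts edges
        distortion = torch.mean((restored - image) ** 2)
        optimizer.zero_grad()
        distortion.backward()
        optimizer.step()
        return distortion.item()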
  7. The method according to claim 5 or 6, wherein determining the texture complexity information corresponding to each training area image of the plurality of training area images in the training image comprises:
    calculating a gradient magnitude of each pixel in each training area image; and
    determining the texture complexity information of each training area image according to the gradient magnitude of each pixel.
  8. The method according to any one of claims 5 to 7, further comprising:
    dividing the training image into the plurality of training area images, wherein the plurality of training area images do not overlap and together include all pixels of the training image.
  9. An image processing apparatus, comprising a storage module and a processing module, wherein:
    the storage module is configured to store program instructions; and
    when the program instructions are executed in the processor, the processing module is configured to:
    determine texture complexity information corresponding to each area image of a plurality of area images in an image to be processed;
    determine, according to the texture complexity information corresponding to each area image, an image compression model corresponding to that area image, wherein different texture complexity information corresponds to different image compression models; and
    compress each area image by using the image compression model corresponding to that area image.
  10. The apparatus according to claim 9, wherein the processing module is further configured to:
    decompress, by using an image decompression model corresponding to the image compression model that compressed each area image, the image features obtained by compressing that area image, to obtain an area decompressed image corresponding to that area image; and
    perform stitching processing and optimization processing on the plurality of area decompressed images to obtain a restored to-be-processed image, wherein the optimization processing includes performing pixel adjustment on edges of the plurality of area decompressed images.
  11. The apparatus according to claim 9 or 10, wherein the processing module is further configured to:
    calculate a gradient magnitude of each pixel in each area image; and
    determine the texture complexity information of each area image according to the gradient magnitude of each pixel.
  12. The apparatus according to any one of claims 9 to 11, wherein the processing module is further configured to:
    divide the image to be processed into the plurality of area images, wherein the plurality of area images do not overlap and together include all pixels of the image to be processed.
  13. A neural network training apparatus, comprising a storage module and a processing module, wherein:
    the storage module is configured to store program instructions; and
    when the program instructions are executed in the processor, the processing module is configured to:
    determine texture complexity information corresponding to each training area image of a plurality of training area images in a training image;
    determine, according to the texture complexity information corresponding to each training area image, a codec model corresponding to that training area image, wherein different texture complexity information corresponds to different codec models, and each codec model is used to compress the input training area image and to decompress the compression result to obtain a plurality of decompressed training area images; and
    adjust parameters of the codec model according to a rate-distortion measure, the rate-distortion measure being obtained from the decompressed training area images and the training area images.
  14. The apparatus according to claim 13, wherein the processing module is further configured to:
    perform stitching processing and optimization processing on the plurality of decompressed training area images through the fusion model to obtain a training restored image, wherein the optimization processing includes performing pixel adjustment on an edge of at least one of the decompressed training area images; and
    adjust parameters of the codec model and the fusion model according to a degree of image distortion between the training restored image and the training image.
  15. The apparatus according to claim 13 or 14, wherein the processing module is further configured to:
    calculate a gradient magnitude of each pixel in each training area image; and
    determine the texture complexity information of each training area image according to the gradient magnitude of each pixel.
  16. The apparatus according to any one of claims 13 to 15, wherein the processing module is further configured to:
    divide the training image into the plurality of training area images, wherein the plurality of training area images do not overlap and together include all pixels of the training image.
  17. A computer-readable storage medium, wherein the computer-readable medium stores program code for execution by a device, the program code comprising instructions for executing the method according to any one of claims 1 to 8.
  18. A chip, wherein the chip comprises a processor and a data interface, and the processor reads, through the data interface, instructions stored in a memory, to execute the method according to any one of claims 1 to 8.
PCT/CN2021/086836 2020-07-30 2021-04-13 Image processing method and device, and neutral network training method and device WO2022021938A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010754067.X 2020-07-30
CN202010754067.XA CN114067007A (en) 2020-07-30 2020-07-30 Image processing method and device and neural network training method and device

Publications (1)

Publication Number Publication Date
WO2022021938A1 true WO2022021938A1 (en) 2022-02-03

Family

ID=80037134

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/086836 WO2022021938A1 (en) 2020-07-30 2021-04-13 Image processing method and device, and neutral network training method and device

Country Status (2)

Country Link
CN (1) CN114067007A (en)
WO (1) WO2022021938A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115147501B (en) * 2022-09-05 2022-12-02 深圳市明源云科技有限公司 Picture decompression method and device, terminal device and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101217668A (en) * 2008-01-14 2008-07-09 浙江大学 A mixed image compression method based on block classification
CN103700121A (en) * 2013-12-30 2014-04-02 Tcl集团股份有限公司 Method and device for compressing composite image
CN108062780A (en) * 2017-12-29 2018-05-22 百度在线网络技术(北京)有限公司 Method for compressing image and device
CN109996078A (en) * 2019-02-25 2019-07-09 阿里巴巴集团控股有限公司 A kind of method for compressing image, device and electronic equipment

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114491272A (en) * 2022-02-14 2022-05-13 北京有竹居网络技术有限公司 Multimedia content recommendation method and device
CN114491272B (en) * 2022-02-14 2023-09-12 北京有竹居网络技术有限公司 Multimedia content recommendation method and device
CN115278246A (en) * 2022-08-01 2022-11-01 天津大学 End-to-end intelligent compression coding method and device for depth map
CN115278246B (en) * 2022-08-01 2024-04-16 天津大学 Depth map end-to-end intelligent compression coding method and device
CN116684607A (en) * 2023-07-26 2023-09-01 腾讯科技(深圳)有限公司 Image compression and decompression method and device, electronic equipment and storage medium
CN116684607B (en) * 2023-07-26 2023-11-14 腾讯科技(深圳)有限公司 Image compression and decompression method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114067007A (en) 2022-02-18

Similar Documents

Publication Publication Date Title
WO2022021938A1 (en) Image processing method and device, and neutral network training method and device
WO2020216227A9 (en) Image classification method and apparatus, and data processing method and apparatus
WO2020177651A1 (en) Image segmentation method and image processing device
WO2021043273A1 (en) Image enhancement method and apparatus
WO2022001372A1 (en) Neural network training method and apparatus, and image processing method and apparatus
WO2021018163A1 (en) Neural network search method and apparatus
CN113259665B (en) Image processing method and related equipment
WO2020177607A1 (en) Image denoising method and apparatus
CN113066017B (en) Image enhancement method, model training method and equipment
WO2021018245A1 (en) Image classification method and apparatus
WO2021018251A1 (en) Image classification method and device
CN113066018A (en) Image enhancement method and related device
CN113011562A (en) Model training method and device
WO2024002211A1 (en) Image processing method and related apparatus
WO2022179588A1 (en) Data coding method and related device
WO2021057091A1 (en) Viewpoint image processing method and related device
TWI826160B (en) Image encoding and decoding method and apparatus
WO2021042774A1 (en) Image recovery method, image recovery network training method, device, and storage medium
CN113284055A (en) Image processing method and device
WO2023174256A1 (en) Data compression method and related device
WO2022022176A1 (en) Image processing method and related device
WO2022001364A1 (en) Method for extracting data features, and related apparatus
CN115409697A (en) Image processing method and related device
WO2021189321A1 (en) Image processing method and device
CN115294429A (en) Feature domain network training method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21849568

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21849568

Country of ref document: EP

Kind code of ref document: A1