WO2022021938A1 - Image processing method and apparatus, and neural network training method and apparatus - Google Patents

Image processing method and apparatus, and neural network training method and apparatus Download PDF

Info

Publication number
WO2022021938A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
training
area
images
model
Prior art date
Application number
PCT/CN2021/086836
Other languages
English (en)
Chinese (zh)
Inventor
赵政辉
马思伟
王晶
Original Assignee
华为技术有限公司
北京大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司, 北京大学
Publication of WO2022021938A1

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 - Image coding
    • G06T9/002 - Image coding using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 - Image coding
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00 - Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/41 - Bandwidth or redundancy reduction
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 - Incoming video signal characteristics or properties
    • H04N19/14 - Coding unit complexity, e.g. amount of activity or edge presence estimation
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation

Definitions

  • the present application relates to the field of artificial intelligence, and more particularly, to an image processing method and apparatus, and a method and apparatus for neural network training.
  • Artificial intelligence is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that responds in a similar way to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theory.
  • Image compression can reduce redundant information in image data. Therefore, image compression is of great significance to improve the storage efficiency and transmission efficiency of images.
  • Traditional image compression methods such as Joint Photographic Experts Group (JPEG) have good compression effects in medium and high bit rate regions, but in low bit rate regions, the compression effects of traditional image compression methods are not ideal.
  • An image can also be compressed by a neural network.
  • This approach mainly uses a neural network and corresponding nonlinear transformations to extract image features, thereby achieving compression. Compared with traditional image compression methods, it avoids complicated parameter design and module design.
  • Correspondingly, a neural network can be used for decoding to reconstruct the image. How to improve the image compression performance of neural networks has become a technical problem that urgently needs to be solved.
  • the present application provides an image processing method and device, and a neural network training method and device, which can improve the image compression effect of the neural network.
  • In a first aspect, an image processing method is provided, comprising: determining texture complexity information corresponding to each area image in a plurality of area images in an image to be processed; determining, according to the texture complexity information corresponding to each area image, an image compression model corresponding to that area image, wherein different texture complexity information corresponds to different image compression models; and compressing each area image by using the image compression model corresponding to that area image.
  • In different areas of an image, the texture complexity may not be the same.
  • For example, in background areas such as sky and beach, the texture complexity of the image is low; in areas of interest or foreground areas containing people and other objects, the texture complexity is high.
  • In this way, the areas with different texture complexities in the image to be processed can each be given compression processing adapted to their texture complexity, so as to improve the overall compression effect of the image to be processed, as illustrated by the sketch below.
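  • The following is a thin, illustrative Python sketch of how such region-wise model selection could be orchestrated; the complexity measure, the classifier and the per-class encoders are assumed placeholders, not the models defined in this application.

```python
def compress_by_region(region_images, complexity_fn, classify, model_for_class):
    """Compress each area image with the model assigned to its texture complexity class.

    region_images:   list of image arrays (the area images)
    complexity_fn:   maps an area image to a scalar texture-complexity score
    classify:        maps a complexity score to a class label (assumed thresholds)
    model_for_class: dict from class label to a callable compression model
    """
    results = []
    for region in region_images:
        score = complexity_fn(region)              # e.g. mean gradient magnitude
        model = model_for_class[classify(score)]   # different complexity -> different model
        results.append(model(region))              # compress with the selected model
    return results
```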
  • The method may further include: using an image decompression model corresponding to the image compression model used to compress each area image, decompressing the image features obtained by compressing that area image, so as to obtain an area decompressed image corresponding to each area image; and performing splicing processing and optimization processing on the multiple area decompressed images to obtain a restored to-be-processed image, where the optimization processing includes adjusting the edges of each area decompressed image.
  • Because the area decompressed images are produced by different image decompression models, each handling a different texture complexity, two adjacent decompressed images may show line discontinuities, color differences or the like after splicing.
  • Through the optimization processing (sketched below), the degree of image distortion between the complete image obtained after compression, decompression and splicing and the image before processing can be made smaller.
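  • A rough sketch of the splicing and edge-adjustment step is shown below; the simple averaging over a narrow band around each seam merely stands in for the learned optimization processing and is only an illustrative assumption.

```python
import numpy as np

def stitch_and_smooth(tiles, grid, tile_size, seam_width=2):
    """Splice region decompressed images (row-major list) and soften internal seams.

    tiles: list of (tile_size, tile_size) grayscale arrays; grid: (rows, cols).
    """
    rows, cols = grid
    h = w = tile_size
    canvas = np.zeros((rows * h, cols * w), dtype=np.float64)
    for idx, tile in enumerate(tiles):
        r, c = divmod(idx, cols)
        canvas[r * h:(r + 1) * h, c * w:(c + 1) * w] = tile
    # Replace a narrow band around each internal seam with its local average to
    # reduce line discontinuities between adjacent region decompressed images.
    for r in range(1, rows):
        band = slice(r * h - seam_width, r * h + seam_width)
        canvas[band, :] = canvas[band, :].mean(axis=0, keepdims=True)
    for c in range(1, cols):
        band = slice(c * w - seam_width, c * w + seam_width)
        canvas[:, band] = canvas[:, band].mean(axis=1, keepdims=True)
    return canvas
```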
  • Determining the texture complexity information corresponding to each area image in the multiple area images in the image to be processed may include: calculating the gradient magnitude of each pixel in each area image; and determining the texture complexity information of each area image according to the gradient magnitude of each pixel.
  • the magnitude of the gradient of a pixel can be determined based on the brightness of the pixel or other representations of color.
  • the texture complexity of the region image can be represented by the median or average of the gradient sizes of each pixel in the region image.
  • The method may further include: dividing the image to be processed into the multiple area images, where the multiple area images do not overlap and together include all pixels in the image to be processed.
  • Because the multiple area images include all the pixels in the to-be-processed image and do not overlap, the bit rate can be reduced.
  • In a second aspect, a neural network training method is provided, comprising: determining texture complexity information corresponding to each training area image in a plurality of training area images in a training image; determining, according to the texture complexity information corresponding to each training area image, a codec model corresponding to that training area image, wherein different texture complexity information corresponds to different codec models, and each codec model is used to compress the input training area image and decompress the compression result to obtain multiple decompressed training area images; and adjusting the parameters of the codec models according to the rate-distortion obtained from the decompressed training area images and the training area images.
  • In this way, each codec is trained on images of the texture complexity it corresponds to, so that the compression performance of each codec for images of the corresponding texture complexity becomes better.
  • In different areas of an image, the texture complexity may not be the same. For example, in background areas such as sky and beach, the texture complexity of the image is low; in areas of interest or foreground areas containing objects such as people, the texture complexity is high. Dividing a complete image into multiple areas and using the images of each area to train the codecs makes the training data better match the texture complexity corresponding to each codec, thereby improving each codec's compression performance for images of the corresponding texture complexity.
  • The multiple decompressed training area images may be spliced and optimized through a fusion model to obtain a training restoration image, where the optimization processing includes adjusting the edge of at least one decompressed training area image; the parameters of the fusion model are adjusted according to the degree of image distortion between the training restoration image and the training image.
  • the image processed by the codec can be less distorted after splicing.
  • the parameters of the codec model may also be adjusted according to the degree of image distortion between the training restoration image and the training image.
  • The neural network models required in the image processing process can thus be trained in an "end-to-end" fashion.
  • rate-distortion can be determined according to the degree of image distortion and the bit rate.
  • the bit rate is used to indicate the compression degree of the image, which can be determined according to the compression result of the codec model.
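  • One common way to realize the rate-distortion objective described above is a weighted sum of distortion and bit rate; the mean-square-error distortion, the bits-per-pixel estimate and the trade-off weight lmbda below are illustrative assumptions, not values from this application.

```python
import numpy as np

def rate_distortion_loss(original, reconstructed, compressed_bits, lmbda=0.01):
    """Return (loss, distortion, rate), where loss = distortion + lmbda * rate."""
    distortion = float(np.mean((original.astype(np.float64) -
                                reconstructed.astype(np.float64)) ** 2))  # image distortion (MSE)
    rate = compressed_bits / original.size                                # bits per pixel (bpp)
    return distortion + lmbda * rate, distortion, rate
```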
  • the fusion model can also adjust other regions other than the edges of the decompressed training region image.
  • Determining the texture complexity information corresponding to each training area image in the multiple training area images in the training image may include: calculating the gradient magnitude of each pixel in each training area image; and determining the texture complexity information of each training area image according to the gradient magnitude of each pixel.
  • The method may further include: dividing the training image into the multiple training area images, where the multiple training area images do not overlap and together include all pixels in the training image.
  • In a third aspect, an image processing apparatus is provided, characterized by comprising a storage module and a processing module; the storage module is used to store program instructions; when the program instructions are executed in the processor, the processing module is used to: determine the texture complexity information corresponding to each area image in the multiple area images in the image to be processed; determine, according to the texture complexity information corresponding to each area image, the image compression model corresponding to that area image, wherein different texture complexity information corresponds to different image compression models; and compress each area image by using the image compression model corresponding to that area image.
  • The processing module may be further configured to: use an image decompression model corresponding to the image compression model used to compress each area image to decompress the image features obtained by compressing that area image, so as to obtain an area decompressed image corresponding to each area image; and perform splicing processing and optimization processing on the multiple area decompressed images to obtain the restored to-be-processed image, where the optimization processing includes performing pixel adjustment on the edges of the multiple area decompressed images.
  • The processing module may be further configured to: calculate the gradient magnitude of each pixel in each area image; and determine the texture complexity information of each area image according to the gradient magnitude of each pixel.
  • The processing module may be further configured to: divide the image to be processed into the multiple area images, where the multiple area images do not overlap and together include all pixels in the image to be processed.
  • In a fourth aspect, a neural network training apparatus is provided, comprising a storage module and a processing module; the storage module is used for storing program instructions, and when the program instructions are executed in the processor, the processing module is used to: determine the texture complexity information corresponding to each training area image in the multiple training area images in a training image; determine, according to the texture complexity information corresponding to each training area image, the codec model corresponding to that training area image, wherein different texture complexity information corresponds to different codec models, and each codec model is used to compress the input training area image and decompress the compression result to obtain multiple decompressed training area images; and adjust the parameters of the codec models according to the rate-distortion, where the rate-distortion is obtained from the decompressed training area images and the training area images.
  • The processing module may be further configured to: perform splicing processing and optimization processing on the plurality of decompressed training area images through a fusion model to obtain a training restoration image, where the optimization processing includes performing pixel adjustment on the edge of at least one decompressed training area image; and adjust the parameters of the fusion model according to the degree of image distortion between the training restoration image and the training image.
  • the parameters of the codec model may also be adjusted according to the degree of image distortion between the training restoration image and the training image.
  • The processing module may be further configured to: calculate the gradient magnitude of each pixel in each training area image; and determine the texture complexity information of each training area image according to the gradient magnitude of each pixel.
  • The processing module may be further configured to: divide the training image into the multiple training area images, where the multiple training area images do not overlap and together include all pixels in the training image.
  • In a fifth aspect, an electronic device is provided, comprising a memory and a processor, wherein the memory is used for storing program instructions; when the program instructions are executed in the processor, the processor is used for executing the method described in the first aspect or the second aspect.
  • The processor in the fifth aspect above may be a central processing unit (CPU), or a combination of a CPU and a neural network computing processor, where the neural network computing processor may include a graphics processing unit (GPU), a neural-network processing unit (NPU), a tensor processing unit (TPU), and so on.
  • A TPU is Google's fully customized application-specific integrated circuit for accelerating machine learning.
  • In a sixth aspect, a computer-readable medium is provided, which stores program code for execution by a device, the program code comprising instructions for performing the method in any one of the implementations of the first aspect or the second aspect.
  • In a seventh aspect, a computer program product comprising instructions is provided, which, when run on a computer, causes the computer to execute the method in any one of the implementations of the first aspect or the second aspect.
  • In an eighth aspect, a chip is provided, comprising a processor and a data interface, where the processor reads instructions stored in a memory through the data interface and executes the method in any one of the implementations of the first aspect or the second aspect.
  • The chip may further include a memory in which instructions are stored; the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to execute the method in any one of the implementations of the first aspect or the second aspect.
  • the above chip may specifically be a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
  • FIG. 1 is a schematic structural diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a convolutional neural network according to an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of another convolutional neural network provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a hardware structure of a chip according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of an image processing system.
  • FIG. 7 is a schematic structural diagram of an image processing system provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a neural network training method provided by an embodiment of the present application.
  • FIG. 9 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of the compression performance improvement achieved by the image processing method according to an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of a neural network training apparatus provided by an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of a neural network training apparatus according to an embodiment of the present application.
  • A neural network can be composed of neural units. A neural unit can be an operation unit that takes x_s and an intercept 1 as input, and the output of the operation unit can be: h(x) = f(∑_s W_s·x_s + b), where W_s is the weight of x_s and b is the bias of the neural unit.
  • f is an activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
  • a neural network is a network formed by connecting a plurality of the above single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field can be an area composed of several neural units.
  • a deep neural network also known as a multi-layer neural network, can be understood as a neural network with multiple hidden layers.
  • the DNN is divided according to the positions of different layers.
  • the neural network inside the DNN can be divided into three categories: input layer, hidden layer, and output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and the middle layers are all hidden layers.
  • the layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
  • Although a DNN looks complicated, the work of each layer is not complicated. In short, each layer computes the linear relationship expression y = α(W·x + b), where x is the input vector, y is the output vector, b is the offset vector, W is the weight matrix (also called coefficients), and α() is the activation function.
  • Each layer simply performs this operation on the input vector x to obtain the output vector y. Since a DNN has many layers, there are also many coefficient matrices W and offset vectors b.
  • Take the coefficient W as an example. Suppose that in a three-layer DNN, the linear coefficient from the fourth neuron of the second layer to the second neuron of the third layer is defined as W^3_{24}: the superscript 3 represents the layer in which the coefficient W is located, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer.
  • In general, the coefficient from the k-th neuron of the (L-1)-th layer to the j-th neuron of the L-th layer is defined as W^L_{jk}.
  • the input layer does not have a W parameter.
  • more hidden layers allow the network to better capture the complexities of the real world.
  • a model with more parameters is more complex and has a larger "capacity", which means that it can complete more complex learning tasks.
  • Training the deep neural network is the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (the weight matrix formed by the vectors W of many layers).
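  • A minimal sketch of the layer-by-layer operation y = α(W·x + b) discussed above is given below; each weight matrix has shape (n_out, n_in), so W[j, k] is the coefficient from the k-th neuron of the previous layer to the j-th neuron of the current layer, matching the W^L_jk convention. The layer sizes and random weights are arbitrary examples.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dnn_forward(x, weights, biases):
    """Forward pass: repeatedly apply y = sigmoid(W @ x + b), layer by layer."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]  # two weight layers
biases = [np.zeros(4), np.zeros(2)]
print(dnn_forward(rng.standard_normal(3), weights, biases))
```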
  • Convolutional neural network is a deep neural network with a convolutional structure.
  • a convolutional neural network consists of a feature extractor consisting of convolutional layers and subsampling layers, which can be viewed as a filter.
  • the convolutional layer refers to the neuron layer in the convolutional neural network that convolves the input signal.
  • In a convolutional layer of a convolutional neural network, a neuron can be connected to only some of the neurons of the adjacent layer.
  • A convolutional layer usually contains several feature planes, and each feature plane can be composed of some neural units arranged in a rectangle. Neural units in the same feature plane share weights, and the shared weights here are the convolution kernel. Sharing weights can be understood as meaning that the way image information is extracted is independent of location.
  • the convolution kernel can be initialized in the form of a matrix of random size, and the convolution kernel can obtain reasonable weights by learning during the training process of the convolutional neural network.
  • the immediate benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
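  • The weight-sharing idea can be illustrated with a bare-bones 2D convolution: the same kernel (weight matrix) slides over every position of the input, so the number of parameters does not grow with image size. The stride-1, no-padding, single-channel setup and the example edge kernel are assumptions for illustration.

```python
import numpy as np

def conv2d(image, kernel):
    """Naive single-channel 2D convolution (cross-correlation), stride 1, no padding."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)  # same shared weights at every position
    return out

edge_kernel = np.array([[-1, 0, 1],
                        [-2, 0, 2],
                        [-1, 0, 1]], dtype=float)  # one weight matrix that responds to vertical edges
```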
  • Recurrent neural networks are used to process sequence data.
  • In an ordinary neural network, the layers are fully connected, while the nodes within each layer are unconnected.
  • Although such an ordinary neural network solves many problems, it is still powerless for many others. For example, to predict the next word of a sentence, the previous words are generally needed, because the words in a sentence are not independent of one another. RNNs are called recurrent neural networks because the current output of a sequence is also related to the previous output.
  • RNN can process sequence data of any length.
  • the training of RNN is the same as the training of traditional CNN or DNN.
  • the neural network can use the error back propagation (BP) algorithm to correct the size of the parameters in the initial neural network model during the training process, so that the reconstruction error loss of the neural network model becomes smaller and smaller.
  • In the back-propagation algorithm, the input signal is propagated forward until the output produces an error loss, and the parameters in the initial neural network model are updated by back-propagating the error loss information, so that the error loss converges.
  • the back-propagation algorithm is a back-propagation movement dominated by error loss, aiming to obtain the parameters of the optimal neural network model, such as the weight matrix.
  • the pixel value of the image can be a red-green-blue (RGB) color value, and the pixel value can be a long integer representing the color.
  • For example, the pixel value may be 256*Red+100*Green+76*Blue, where Blue represents the blue component, Green represents the green component, and Red represents the red component. In each color component, the smaller the value, the lower the brightness, and the larger the value, the higher the brightness.
  • the pixel values can be grayscale values.
  • Image compression refers to the technology of representing the original pixel matrix, either lossily or losslessly, with fewer bits; it is also known as image coding.
  • Image compression which performs transformations on the image content, can reduce the amount of data required to represent a digital image, thereby reducing the space occupied by image storage.
  • Image data can be compressed because there is redundancy in the data.
  • The redundancy of image data is mainly manifested as: spatial redundancy caused by the correlation between adjacent pixels in the image; temporal redundancy caused by the correlation between different frames in an image sequence; and spectral redundancy caused by the correlation of different color planes or spectral bands.
  • the purpose of data compression is to reduce the number of bits required to represent data by removing these data redundancies. Due to the huge amount of image data, it is very difficult to store, transmit and process, so the compression of image data is very important.
  • Image decompression is the inverse process of image compression, which can also be called decompression or decoding.
  • Through image decoding, the input compact representation can be restored to an image.
  • the peak signal-to-noise ratio (PSNR) between the original image and the encoded reconstructed image is used to measure the image distortion.
  • The PSNR can be the PSNR of the luminance, or a linear combination of the PSNR of the luminance and the PSNR of the chrominance.
  • Generally, the PSNR of the luminance (Y-PSNR) is used as the main criterion.
  • The peak signal is the maximum value of the pixels in the image (for example, the maximum value of pixel brightness).
  • The noise refers to the mean square error between the pixel values of the original image and of the reconstructed image (the squares of the differences are averaged); the PSNR is obtained by converting the ratio of the two.
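  • A conventional PSNR computation consistent with the description above is sketched below, assuming 8-bit pixels (peak value 255).

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between an original and a reconstructed image."""
    mse = np.mean((original.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")          # identical images: no distortion
    return 10.0 * np.log10(peak ** 2 / mse)
```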
  • The code rate (rate), also known as the encoding bit rate, can be the average data amount per pixel in the compressed data (bits per pixel, bpp), which is used to indicate the degree of data compression.
  • The code rate can be determined according to the proportion of the data volume after image compression.
  • Rate distortion is used to express the relationship between image distortion and bit rate.
  • Rate distortion optimization refers to reducing image distortion and bit rate as much as possible according to preset rules. That is to say, in the case of a bit rate as small as possible, the distortion of the obtained image can be reduced as much as possible, so as to achieve a better compression effect.
  • Through rate-distortion optimization, a balance point can be found between bit rate and distortion, so that the compression effect is optimal.
  • the rule for rate-distortion optimization may also be that the distortion is the smallest when the code rate is guaranteed to be less than the upper limit, or the code rate is the smallest when the distortion is guaranteed to be less than the lower limit, and so on.
  • Rate-distortion can be calculated by the rate-distortion function.
  • an embodiment of the present application provides a system architecture 100 .
  • a data collection device 160 is used to collect training data.
  • the training data may include training images.
  • the data collection device 160 After collecting the training data, the data collection device 160 stores the training data in the database 130 , and the training device 120 obtains the target model/rule 101 by training based on the training data maintained in the database 130 .
  • The training device 120 processes the input original image and compares the output image with the original image, until the rate-distortion determined by the difference between the image output by the training device 120 and the original image is less than a certain threshold, thereby completing the training of the target model/rule 101.
  • the above target model/rule 101 can be used to implement the image processing method of the embodiment of the present application.
  • the target model/rule 101 in this embodiment of the present application may specifically be a neural network.
  • the training data maintained in the database 130 may not necessarily come from the collection of the data collection device 160, and may also be received from other devices.
  • the training device 120 may not necessarily train the target model/rule 101 completely based on the training data maintained by the database 130, and may also obtain training data from the cloud or other places for model training.
  • The above description should not be construed as a limitation on the embodiments of the present application.
  • The target model/rule 101 trained by the training device 120 can be applied to different systems or devices, such as the execution device 110 shown in FIG. 1, which may be a laptop, an augmented reality (AR)/virtual reality (VR) device, an in-vehicle terminal or the like, and may also be a server or the cloud.
  • The execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices, and a user can input data to the I/O interface 112 through the client device 140. In this embodiment of the present application, the input data may include an image to be processed input by the client device.
  • the preprocessing module 113 and the preprocessing module 114 are used to perform preprocessing according to the input data (such as the image to be processed) received by the I/O interface 112.
  • The preprocessing module 113 and the preprocessing module 114 may also be absent, or there may be only one preprocessing module, in which case the calculation module 111 is used directly to process the input data.
  • the execution device 110 When the execution device 110 preprocesses the input data, or the calculation module 111 of the execution device 110 performs calculations and other related processing, the execution device 110 can call the data, codes, etc. in the data storage system 150 for corresponding processing , the data and instructions obtained by corresponding processing may also be stored in the data storage system 150 .
  • the I/O interface 112 returns the processing result, such as the above-obtained image classification result, to the client device 140 so as to be provided to the user.
  • the training device 120 can generate corresponding target models/rules 101 based on different training data for different goals or tasks, and the corresponding target models/rules 101 can be used to achieve the above goals or complete The above task, thus providing the user with the desired result.
  • the user can manually specify the input data, which can be operated through the interface provided by the I/O interface 112 .
  • the client device 140 can automatically send the input data to the I/O interface 112 . If the user's authorization is required to request the client device 140 to automatically send the input data, the user can set the corresponding permission in the client device 140 .
  • the user can view the result output by the execution device 110 on the client device 140, and the specific presentation form can be a specific manner such as display, sound, and action.
  • the client device 140 can also be used as a data collection terminal to collect the input data of the input I/O interface 112 and the output result of the output I/O interface 112 as new sample data as shown in the figure, and store them in the database 130 .
  • The I/O interface 112 may also directly store, as new sample data, the input data input into the I/O interface 112 and the output result of the I/O interface 112 shown in the figure into the database 130.
  • FIG. 1 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship among the devices, devices, modules, etc. shown in the figure does not constitute any limitation.
  • In FIG. 1, the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 may also be placed in the execution device 110.
  • the target model/rule 101 is obtained by training the training device 120.
  • the target model/rule 101 may be the neural network in the present application in this embodiment of the present application.
  • The neural network used in this embodiment of the present application may be, for example, a convolutional neural network (CNN), a deep convolutional neural network (DCNN), or a recurrent neural network (RNN).
  • CNN is a very common neural network
  • a convolutional neural network is a deep neural network with a convolutional structure and a deep learning architecture.
  • A deep learning architecture refers to learning at multiple levels of abstraction through machine learning algorithms.
  • CNN is a feed-forward artificial neural network in which individual neurons can respond to images fed into it.
  • a convolutional neural network (CNN) 200 may include an input layer 210 , a convolutional/pooling layer 220 (where the pooling layer is optional), and a neural network layer 230 .
  • the input layer 210 can obtain the image to be processed, and pass the obtained image to be processed by the convolution layer/pooling layer 220 and the subsequent neural network layer 230 for processing, and the processing result of the image can be obtained.
  • the internal layer structure in the CNN 200 in Figure 2 is described in detail below.
  • The convolutional layer/pooling layer 220 may include layers 221-226. For example, in one implementation, layer 221 is a convolutional layer, layer 222 is a pooling layer, layer 223 is a convolutional layer, layer 224 is a pooling layer, layer 225 is a convolutional layer, and layer 226 is a pooling layer; in another implementation, layers 221 and 222 are convolutional layers, layer 223 is a pooling layer, layers 224 and 225 are convolutional layers, and layer 226 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
  • the convolution layer 221 may include many convolution operators.
  • the convolution operator is also called a kernel. Its role in image processing is equivalent to a filter that extracts specific information from the input image matrix.
  • The convolution operator can essentially be a weight matrix, which is usually predefined. During a convolution operation on an image, the weight matrix is usually moved along the horizontal direction on the input image one pixel after another (or two pixels after two pixels, depending on the value of the stride), so as to complete the work of extracting specific features from the image.
  • the size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image.
  • each weight matrix is stacked to form the depth dimension of the convolutional image, where the dimension can be understood as determined by the "multiple" described above.
  • Different weight matrices can be used to extract different features in the image. For example, one weight matrix is used to extract image edge information, another weight matrix is used to extract specific colors of the image, and another weight matrix is used to extract unwanted noise in the image.
  • the multiple weight matrices have the same size (row ⁇ column), the size of the convolution feature maps extracted from the multiple weight matrices with the same size is also the same, and then the multiple extracted convolution feature maps with the same size are combined to form The output of the convolution operation.
  • weight values in these weight matrices need to be obtained through a lot of training in practical applications, and each weight matrix formed by the weight values obtained by training can be used to extract information from the input image, so that the convolutional neural network 200 can make correct predictions .
  • Compared with the features extracted by the initial convolutional layer (e.g., 221), the features extracted by the later convolutional layers (e.g., 226) become more and more complex, such as high-level semantic features.
  • features with higher semantics are more suitable for the problem to be solved.
  • A pooling layer may follow a convolutional layer: one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers.
  • the pooling layer may include an average pooling operator and/or a max pooling operator for sampling the input image to obtain a smaller size image.
  • the average pooling operator can calculate the pixel values in the image within a certain range to produce an average value as the result of average pooling.
  • the max pooling operator can take the pixel with the largest value within a specific range as the result of max pooling. Also, just as the size of the weight matrix used in the convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size.
  • the size of the output image after processing by the pooling layer can be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average or maximum value of the corresponding sub-region of the image input to the pooling layer.
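  • A minimal sketch of the two pooling operators over non-overlapping 2x2 windows is shown below; each output pixel summarizes (by average or maximum) the corresponding sub-region of the input, as described above. The 2x2 window size is an assumed example.

```python
import numpy as np

def pool2x2(image, mode="max"):
    """Average or max pooling of a 2D array over non-overlapping 2x2 windows."""
    h, w = image.shape
    blocks = image[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))
```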
  • After being processed by the convolutional layer/pooling layer 220, the convolutional neural network 200 is not yet sufficient to output the required output information, because, as mentioned above, the convolutional layer/pooling layer 220 only extracts features and reduces the parameters brought by the input image. To generate the final output information (the required class information or other relevant information), the convolutional neural network 200 needs to use the neural network layer 230 to generate one output, or a set of outputs of the desired number of classes. Therefore, the neural network layer 230 may include multiple hidden layers (231, 232 to 23n as shown in FIG. 2) and the output layer 240, and the parameters contained in the multiple hidden layers may be pre-trained based on relevant training data of a specific task type; for example, the task type can include image recognition, image classification, image super-resolution reconstruction and so on.
  • After the multiple hidden layers in the neural network layer 230, the last layer of the entire convolutional neural network 200 is the output layer 240, which has a loss function similar to classification cross-entropy and is specifically used to calculate the prediction error.
  • Once the forward propagation of the entire convolutional neural network 200 (in FIG. 2, propagation from 210 to 240) is completed, back propagation (in FIG. 2, propagation from 240 to 210) starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 200 and the error between the result output by the convolutional neural network 200 through the output layer and the ideal result.
  • a convolutional neural network (CNN) 200 may include an input layer 110 , a convolutional/pooling layer 120 (where the pooling layer is optional), and a neural network layer 130 .
  • Compared with FIG. 2, the multiple convolutional layers/pooling layers within the convolutional layer/pooling layer 120 in FIG. 3 are parallel, and the separately extracted features are all input to the neural network layer 130 for processing.
  • The convolutional neural networks shown in FIG. 2 and FIG. 3 are only two examples of possible convolutional neural networks used in the image processing method according to the embodiments of the present application.
  • In a specific application, the convolutional neural network used in the image processing method of the embodiments may also exist in the form of other network models.
  • FIG. 4 is a hardware structure of a chip provided by an embodiment of the application, and the chip includes a neural network processor 50 .
  • the chip can be set in the execution device 110 as shown in FIG. 1 to complete the calculation work of the calculation module 111 .
  • the chip can also be set in the training device 120 as shown in FIG. 1 to complete the training work of the training device 120 and output the target model/rule 101 .
  • the algorithms of each layer in the convolutional neural network shown in Figures 2 and 3 can be implemented in the chip shown in Figure 4.
  • the neural network processor NPU 50 is mounted on the main central processing unit (CPU) (host CPU) as a coprocessor, and tasks are allocated by the main CPU.
  • the core part of the NPU is the operation circuit 503, and the controller 504 controls the operation circuit 503 to extract the data in the memory (weight memory or input memory) and perform operations.
  • the arithmetic circuit 503 includes multiple processing units (process engines, PEs). In some implementations, arithmetic circuit 503 is a two-dimensional systolic array. The arithmetic circuit 503 may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, arithmetic circuit 503 is a general-purpose matrix processor.
  • the operation circuit fetches the data corresponding to the matrix B from the weight memory 502 and buffers it on each PE in the operation circuit.
  • the arithmetic circuit fetches the data of the matrix A from the input memory 501 and performs the matrix operation on the matrix B, and stores the partial result or the final result of the matrix in the accumulator 508.
  • the vector calculation unit 507 can further process the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison and so on.
  • the vector computing unit 507 can be used for network computation of non-convolutional/non-FC layers in the neural network, such as pooling, batch normalization, local response normalization, etc. .
  • vector computation unit 507 can store the processed output vectors to unified buffer 506 .
  • the vector calculation unit 507 may apply a nonlinear function to the output of the arithmetic circuit 503, such as a vector of accumulated values, to generate activation values.
  • vector computation unit 507 generates normalized values, merged values, or both.
  • the vector of processed outputs can be used as activation input to the arithmetic circuit 503, eg, for use in subsequent layers in a neural network.
  • Unified memory 506 is used to store input data and output data.
  • The storage unit access controller 505 (direct memory access controller, DMAC) transfers the input data in the external memory to the input memory 501 and/or the unified memory 506, stores the weight data in the external memory into the weight memory 502, and stores the data in the unified memory 506 into the external memory.
  • a bus interface unit (BIU) 510 is used to realize the interaction between the main CPU, the DMAC and the instruction fetch memory 509 through the bus.
  • the instruction fetch memory (instruction fetch buffer) 509 connected with the controller 504 is used to store the instructions used by the controller 504;
  • the controller 504 is used for invoking the instructions cached in the memory 509 to control the working process of the operation accelerator.
  • The unified memory 506, the input memory 501, the weight memory 502 and the instruction fetch memory 509 are all on-chip memories, and the external memory is memory outside the NPU; the external memory may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM) or another readable and writable memory.
  • each layer in the convolutional neural network shown in FIG. 2 and FIG. 3 may be performed by the operation circuit 503 or the vector calculation unit 507 .
  • the execution device 110 in FIG. 1 described above can execute each step of the image processing method of the embodiment of the present application.
  • The CNN models shown in FIG. 2 and FIG. 3 and the chip shown in FIG. 4 can also be used to execute the steps of the image processing method of the embodiments of the present application.
  • The neural network training method according to the embodiments of the present application and the image processing method according to the embodiments of the present application are described in detail below with reference to the accompanying drawings.
  • an embodiment of the present application provides a system architecture 300 .
  • the system architecture includes a local device 301, a local device 302, an execution device 210 and a data storage system 250, wherein the local device 301 and the local device 302 are connected with the execution device 210 through a communication network.
  • the execution device 210 may be implemented by one or more servers.
  • the execution device 210 may be used in conjunction with other computing devices, such as data storage, routers, load balancers and other devices.
  • the execution device 210 may be arranged on one physical site, or distributed across multiple physical sites.
  • the execution device 210 may use the data in the data storage system 250 or call the program code in the data storage system 250 to implement the image processing method in this embodiment of the present application.
  • The execution device 210 may perform the following process: determine the texture complexity information corresponding to each area image in the multiple area images in the image to be processed; determine, according to the texture complexity information corresponding to each area image, an image compression model corresponding to that area image, wherein different texture complexity information corresponds to different image compression models; and compress each area image by using the image compression model corresponding to that area image.
  • the image compression effect of the to-be-processed image can be improved by using the compression model corresponding to the texture complexity of the region image for the region image with different texture complexity.
  • a user may operate respective user devices (eg, local device 301 and local device 302 ) to interact with execution device 210 .
  • Each local device may represent any computing device, such as a personal computer, computer workstation, smartphone, tablet, smart camera, smart car or other type of cellular phone, media consumption device, wearable device, set-top box, gaming console, and the like.
  • Each user's local device can interact with the execution device 210 through any communication mechanism/standard communication network, which can be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof.
  • The local device 301 and the local device 302 obtain the relevant parameters of the target neural network from the execution device 210, deploy the target neural network on the local device 301 and the local device 302, and use the target neural network for image classification, image processing or the like.
  • Alternatively, the target neural network can be directly deployed on the execution device 210; the execution device 210 obtains the images to be processed from the local device 301 and the local device 302, and classifies the images to be processed or performs other types of image processing on them according to the target neural network.
  • The above execution device 210 may also be a cloud device, in which case the execution device 210 may be deployed in the cloud; or the above execution device 210 may be a terminal device, in which case the execution device 210 may be deployed on the user terminal side. This is not limited in this embodiment of the present application.
  • FIG. 6 shows a schematic structural diagram of an image processing system.
  • the image processing system 600 includes an encoder 610 and a quantization module 620 at the encoding end, and a decoder 630 at the decoding end.
  • the encoder 610 and the decoder 630 are neural networks.
  • the image processing system 600 may be applied in the scenario of transmitting and storing images.
  • the encoder 610 and the quantization module 620 at the encoding end may be set in a server in the cloud.
  • The image data undergoes an image encoding process in the cloud, resulting in compressed data in a compact representation.
  • Storing compressed data can reduce the storage space occupied by saving images.
  • the transmission of compressed data can reduce the occupation of transmission resources in the image transmission process and reduce the demand for bandwidth.
  • the decoder 630 on the decoding side may be provided in a terminal device serving as a client.
  • the decoding end performs a decoding operation on the compressed data to obtain a reconstructed image.
  • the terminal device can display the reconstructed image through a display.
  • the encoder 610 is used for extracting features of the image to be processed, so as to obtain image features.
  • The quantization module 620 is used for quantizing image features to obtain compressed data. Quantization is, in the field of digital signal processing, the process of approximating the continuous values of a signal (or a large number of possible discrete values) by a finite number (or fewer) of discrete values; a toy example is sketched below.
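  • A toy uniform quantizer in the spirit of the quantization module 620: continuous feature values are approximated by a finite set of levels through rounding. The step size is an assumed parameter for illustration only.

```python
import numpy as np

def quantize(features, step=0.5):
    """Map continuous feature values to discrete levels spaced `step` apart."""
    return np.round(features / step) * step
```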
  • the cloud can transmit the compressed data to the client.
  • the decoder 630 is used to decompress the compressed data to obtain a reconstructed image.
  • the image processing system 600 uses the entire image as input, and performs nonlinear transformation on the image to reduce the correlation between codewords and improve the compression performance of the neural network.
  • Texture in computer graphics includes both the texture of an object's surface in the usual sense, that is, the uneven grooves the surface exhibits, and the color pattern on a smooth object surface. Texture complexity can be used to reflect how strongly the pixel values in an image vary. Different categories of images differ significantly in texture details and other characteristics, and image contents with different texture complexities differ considerably.
  • the image processing system 600 uses the same encoder to perform the same processing for images with different texture complexities, which hinders further improvement of the compression performance.
  • embodiments of the present application provide an image processing system to improve image compression performance.
  • FIG. 7 is a schematic structural diagram of an image processing system provided by an embodiment of the present application.
  • In the image processing system provided by the embodiment of the present application, the neural network corresponding to the texture complexity of the image is selected from multiple neural networks.
  • This structure adaptively adjusts the selection of the neural network according to the image content and compresses the image according to its different texture characteristics, thereby achieving a further improvement in image compression performance.
  • Image processing system 700 includes compression system 710 and decompression system 720 .
  • the compression system 710 includes a segmentation model 711 , a classification model 712 , and a compression model 713
  • the decompression system 720 includes a decompression module 721 and a fusion module 722 .
  • Compression system 710 and decompression system 720 may be located in the same or different devices.
  • the image to be processed is input into the compression system 710, and the compression system 710 is used for compressing the image to be processed.
  • the segmentation model 711 can segment the to-be-processed image to obtain multiple region images.
  • Area images can also be referred to as image blocks.
  • the sizes of the plurality of area images may be the same or different.
  • the image can be segmented according to the target size to obtain multiple region images with the same size.
  • the multiple region images do not overlap.
  • For example, the to-be-processed image can be divided into multiple 128×128 area images, as in the sketch below. Through the non-overlapping division of the image to be processed, a plurality of area images with non-repetitive contents are formed.
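  • A sketch of this segmentation step is given here; padding the right and bottom borders (by edge replication) for images whose sides are not multiples of 128 is an assumption, not something specified in this application.

```python
import numpy as np

def split_into_tiles(image, tile=128):
    """Split an image into non-overlapping tile x tile area images (row-major order)."""
    h, w = image.shape[:2]
    pad_h, pad_w = (-h) % tile, (-w) % tile
    padded = np.pad(image, ((0, pad_h), (0, pad_w)) + ((0, 0),) * (image.ndim - 2), mode="edge")
    return [padded[r:r + tile, c:c + tile]
            for r in range(0, padded.shape[0], tile)
            for c in range(0, padded.shape[1], tile)]
```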
  • the region images are input into the classification model 712.
  • the classification model 712 is used to calculate the texture complexity of the input image.
  • The gradient of each pixel in the image can be calculated, which realizes a first-order differential operation on the image and takes directionality into account.
  • The direction of the gradient of the image is the direction of the maximum rate of change of the image gray level, which can reflect the gray-level change on the edges of the image.
  • The gradient operator always points in the direction of the most drastic change.
  • the direction of the gradient operator is orthogonal to the edges in the image.
  • the size of the gradient operator represents the rate of change of the grayscale of the image.
  • the classification model 712 can calculate, for the input image, the difference value of the luminance of each pixel in the horizontal direction and in the vertical direction. From these two difference values, the gradient magnitude of each pixel can be calculated. From the gradient magnitudes of the individual pixels, the average gradient magnitude of the image, or the median of the gradient magnitudes, can be determined. The average (or median) gradient magnitude indicates how smooth the image is and thus reflects the texture complexity of the image.
  • the compression module 713 is configured to perform image feature extraction on the image according to the texture complexity of the image by using a compression model corresponding to the texture complexity, so as to realize the compression of the image.
  • the compression module 713 may include an AI model for extracting features, called a compression model (also called an image compression model), or the compression module 713 may invoke the compression model through an interface to extract image features.
  • the compression model can be a neural network model that is pre-trained.
  • the region image can be input into the compression model to obtain the image features of the region image.
  • the compression model may be, for example, CNN, RNN, or the like.
  • the compression module 713 may store the correspondence between the texture complexity and the compression model. Therefore, according to the texture complexity of the area image, the compression module 713 may determine a compression model corresponding to the texture complexity from a plurality of compression models, and process the area image.
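  • A minimal sketch of such a correspondence lookup is given below; the threshold value, class names, and model identifiers are hypothetical and only illustrate the idea of selecting a compression model from the texture complexity.

```python
# Hypothetical registry of pre-trained compression models keyed by
# texture-complexity class; the threshold value is illustrative only.
COMPLEXITY_THRESHOLD = 20.0

def select_compression_model(mean_gradient: float, models: dict):
    # Pick the compression model that corresponds to the texture complexity
    # of the area image (here simply "simple" vs. "complex").
    key = "complex" if mean_gradient >= COMPLEXITY_THRESHOLD else "simple"
    return models[key]

models = {"simple": "cnn_encoder_simple", "complex": "cnn_encoder_complex"}
print(select_compression_model(35.2, models))  # -> cnn_encoder_complex
```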
  • the image features of each regional image can be obtained.
  • Compression system 710 may also include quantization models and the like.
  • the quantization model can quantify image features.
  • the decompression system 720 is used to decompress the processing result of the compression system 710 .
  • the image features processed by the compression system 710 are input into the decompression module 721 .
  • the image features processed by the compression system 710 may be the image features output by the compression module 713 .
  • the compression system 710 may further include a quantization model, the image features processed by the compression system 710 may be quantized image features.
  • the decompression module 721 is used to decompress the image features to obtain a decompressed image.
  • the decompression module 721 may be configured to perform image decompression on the image features according to the texture complexity of the image using a decompression model corresponding to the texture complexity to obtain a decompressed image.
  • the decompression module 721 may receive indication information, where the indication information is used to indicate the decompression model corresponding to each image feature.
  • the decompression module 721 can decompress the image feature by using the decompression model indicated by the indication information.
  • the decompression module 721 may include an AI model for decompressing image features, called a decompression model or an image decompression model, or the decompression module 721 may invoke the decompression model through an interface to decompress the image.
  • the decompression model can be a neural network model that is pre-trained.
  • the image feature and the image texture complexity corresponding to the image feature can be input into the decompression model to obtain the decompressed image of the region.
  • the decompression model can be, for example, CNN, RNN, or the like.
  • the decompression module 721 may store the correspondence between the texture complexity and the decompression model. Therefore, according to the texture complexity of the regional image, the decompression module 721 can determine a decompression model corresponding to the texture complexity from a plurality of decompression models, and process the image feature.
  • the restored images of each region can be obtained.
  • the fusion module 722 is used to fuse the restored regional images.
  • the fusion module 722 may include an AI model for image fusion, which is called a fusion model, or the fusion module 722 may call the fusion model through an interface to realize fusion of regional images.
  • the fusion model can be a neural network model that is pre-trained.
  • the restored image of each region can be input into the fusion model to obtain the fused image.
  • the fused image may also be referred to as a reconstructed image or a compressed reconstructed image.
  • the fusion model can be, for example, a CNN or the like.
  • the fusion of regional images can be splicing the regional images.
  • the fusion of the regional images may also include adjustment of the edge pixels of the regional images, so that the error between the reconstructed image and the image to be processed is smaller and the degree of distortion is reduced.
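  • The sketch below illustrates one possible splicing-plus-edge-adjustment step; the seam width and the simple averaging of a narrow pixel band at block boundaries are assumptions, since the application only states that edge pixels may be adjusted.

```python
import numpy as np

def stitch_and_smooth(blocks, grid, block=128, seam=2):
    # Tile the restored area images back into a full image in row-major order.
    rows, cols = grid
    channels = blocks[0].shape[2]
    out = np.zeros((rows * block, cols * block, channels), dtype=np.float32)
    for idx, blk in enumerate(blocks):
        r, c = divmod(idx, cols)
        out[r * block:(r + 1) * block, c * block:(c + 1) * block] = blk
    # Lightly average a narrow band of pixels on each side of every block
    # boundary so that seams between adjacent area images are less visible.
    for c in range(1, cols):
        x = c * block
        out[:, x - seam:x + seam] = out[:, x - seam:x + seam].mean(axis=1, keepdims=True)
    for r in range(1, rows):
        y = r * block
        out[y - seam:y + seam, :] = out[y - seam:y + seam, :].mean(axis=0, keepdims=True)
    return out

blocks = [np.full((128, 128, 3), i, dtype=np.float32) for i in range(6)]
reconstructed = stitch_and_smooth(blocks, grid=(2, 3))
```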
  • the image processing system 700 uses different compression models and decompression models to process data by calculating the texture complexity of the regional images under the condition that the texture complexity of the regional images is different, thereby improving the image compression performance.
  • the image processing system 700 divides the image to be processed and calculates the texture complexity of the images in different regions, so that the foreground and background of the image to be processed can be processed using different compression models and decompression models, thereby improving image compression performance.
  • the decompression system 720 in the image processing system 700 reduces the degree of image distortion and improves the image compression performance by adjusting the edge pixels of the regional image during regional image fusion.
  • Each AI model used in the image processing system 700 may be obtained through end-to-end training; alternatively, the compression model and the decompression model may be trained first, and then the fusion model may be trained.
  • the training method of the AI model adopted in the image processing system 700 reference may be made to the description of FIG. 8 .
  • End-to-end training is a machine learning paradigm.
  • the entire learning process does not divide artificial sub-problems, but is completely handed over to the deep learning model to directly learn the mapping from the original data to the desired output.
  • FIG. 8 is a schematic structural diagram of a neural network model training method provided by an embodiment of the present application.
  • the training images to be processed can be obtained.
  • the complete training image to be processed can be divided to obtain multiple training area images.
  • the plurality of training area images do not overlap, and the plurality of training area images include all pixels in the training image.
  • Each of the encoding and decoding models is used to compress the input images of the training area, and decompress the compression result, thereby obtaining multiple decompressed images of the training area.
  • the codec model includes a compression model and a decompression model.
  • the compression model is used to compress the image
  • the decompression model is used to decompress the processing result of the compression model.
  • Each training area image is input into an encoding/decoding model corresponding to the training area image, and the encoding/decoding model processes the input training area image to obtain a decoded training area image corresponding to the training area image.
  • the compression model compresses the images in the training area to obtain the training features of the images in the training area.
  • the decompression model decodes the training features of the training area image to obtain the decompressed training area image corresponding to the training area image.
  • the parameters of the codec model are adjusted according to the rate distortion obtained from the decompressed training area image and the training area image.
  • the encoding and decoding model can be trained by using the training images.
  • the training area images are processed with the parameter-adjusted codec model in each iteration until the rate-distortion gradually converges, so as to obtain the trained codec model.
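  • Shown below is a minimal sketch of one such parameter-adjustment step, written with PyTorch as an assumption; the compression/decompression model interfaces, the entropy-model likelihoods used to estimate the rate, and the lambda weight are all hypothetical and not taken from the application.

```python
import torch

def rate_distortion_loss(x, x_hat, likelihoods, lam=0.01):
    # Rate: estimated bits per pixel from an entropy model's likelihoods.
    num_pixels = x.shape[0] * x.shape[2] * x.shape[3]
    rate = -torch.log2(likelihoods).sum() / num_pixels
    # Distortion: mean squared error between input and reconstruction.
    distortion = torch.mean((x - x_hat) ** 2)
    return rate + lam * distortion

# One parameter-adjustment step per batch of training area images
# (compress_model / decompress_model / optimizer are hypothetical objects):
#
# features, likelihoods = compress_model(batch)
# restored = decompress_model(features)
# loss = rate_distortion_loss(batch, restored, likelihoods)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```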
  • a fusion model can be used to stitch and optimize the decompressed training area images corresponding to each training area image input to the fusion model to obtain a training restoration image.
  • This embodiment of the present application does not limit the sequence of the splicing process and the optimization process.
  • the optimization process includes adjustments to the edge regions of the decompressed training images.
  • the optimization process may also include adjustments to regions other than the border regions of the decompressed training image.
  • the adjustment in the optimization process is an adjustment of color; for example, the luminance and the chrominance can be adjusted.
  • the parameters of the fusion model can be adjusted according to the image distortion degree of the training recovery image and the training image to be processed to complete the training of the fusion model.
  • the fusion model can stitch the decompressed training images corresponding to the images in each training area, and modify and adjust the pixels located in the edge area of the decompressed training images to obtain the training recovery image.
  • the training recovery image is the recovered training image to be processed.
  • the degree of image distortion can be determined.
  • the bit rate can be determined according to the average amount of data per pixel in the compression result.
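  • As a simple illustration of this bits-per-pixel notion (the function name is introduced here, not taken from the application):

```python
def bits_per_pixel(compressed_bytes: int, width: int, height: int) -> float:
    # Code rate expressed as the average number of bits spent per pixel.
    return compressed_bytes * 8 / (width * height)

print(bits_per_pixel(4096, 128, 128))  # 2.0 bpp for a 4 KiB compressed block
```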
  • the parameters of the fusion model and the encoding/decoding model corresponding to each training area image are adjusted, so that the image distortion degree is reduced when the bit rate meets the preset conditions.
  • alternatively, the parameters of the fusion model and of the encoding/decoding model corresponding to each training area image may be adjusted according to the code rate, so that the bit rate is reduced while the image distortion degree satisfies the preset condition.
  • the compression performance can be reflected as a whole through rate-distortion.
  • the parameters of the fusion model and the codec model corresponding to each training area image can be adjusted to minimize rate-distortion.
  • the codec model and fusion model after parameter adjustment are used for processing each time until the rate-distortion gradually converges, so as to obtain the codec model and fusion model after training.
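  • The application does not write out the rate-distortion objective; a common formulation, shown here only as an illustration (λ is a trade-off weight chosen by the implementer), is:

```latex
\mathcal{L} \;=\; R + \lambda D
\;=\; \frac{\operatorname{bits}(\hat{y})}{W \cdot H}
\;+\; \lambda \cdot \frac{1}{W H} \sum_{i,j} \bigl(p(i,j) - \hat{p}(i,j)\bigr)^{2}
```

  • Here R is the code rate in bits per pixel, D is the mean-squared distortion between the image to be processed p and the reconstruction p̂, and minimizing L over the parameters of the codec and fusion models corresponds to the rate-distortion minimization described above.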
  • the training of the AI models in the image processing system 700 can thus be implemented in an "end-to-end" manner.
  • the codec model may also be pre-trained first, and the fusion model may be trained by using the pre-trained codec model.
  • training images to be processed may be acquired.
  • the to-be-processed training image can be divided to obtain multiple training region images.
  • the codec model corresponding to each training area image can be used for compression, and the compression result can be decompressed to determine the bit rate and the image distortion degree, thereby determining the rate-distortion.
  • the rate-distortion is optimized by adjusting the parameters of the codec model.
  • a large number of training images to be processed are processed to obtain a large number of training region images to cover images of each texture complexity.
  • for each texture complexity, the parameter-adjusted codec model is used in each iteration to process the training area images of that texture complexity until the rate-distortion gradually converges, so as to obtain each pre-trained codec model.
  • the pre-trained encoding and decoding model is used to process the to-be-processed training image, so as to obtain the decompressed training area image corresponding to each training area image in the to-be-processed training image.
  • the fusion model is used to fuse the decompressed training images corresponding to the images of each training area in a to-be-processed training image to obtain a training recovery image.
  • the fusion model is used, together with the pre-trained codec models, to process each training image to be processed until the error between the training restored image and the training image to be processed before compression gradually converges, so that the trained fusion model is obtained.
  • the parameters of the codec model may also be adjusted to obtain each AI model that has been trained in the image processing system 700 .
  • the encoding and decoding models corresponding to each texture complexity are trained by using the training area images of different texture complexities.
  • Using the trained codec model can realize differential processing of regional images with different texture complexity, thereby improving the overall image compression performance.
  • the code rate and the image distortion degree can be calculated from the decompressed training area image and the training area image, so as to determine the rate-distortion, and the parameters of the codec model are adjusted according to the rate-distortion, thereby completing the training (or pre-training) of the codec model. After that, the trained or pre-trained codec model is used to train the fusion model.
  • alternatively, the training restoration image may be obtained by processing the decompressed training area images with the fusion model. The bit rate is calculated according to the decompressed training area images, the image distortion degree is calculated according to the training restoration image and the training image, and the rate-distortion is determined according to the bit rate and the image distortion degree. After that, the parameters of the codec model and the fusion model can be adjusted according to the rate-distortion to complete the training of the codec model and the fusion model.
  • FIG. 9 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • the method 900 shown in FIG. 9 can be executed by an image processing device; the image processing device can be a mobile terminal, a computer, a server, or another device whose computing power is sufficient to execute the method.
  • the method 900 may be specifically applied in fields that need to compress images, such as image transmission and image storage.
  • the method includes S910 to S930, and these steps are described in detail below.
  • an image compression model corresponding to each regional image is determined according to the texture complexity information corresponding to each regional image.
  • the difference value of each pixel of the area image in two different directions can be calculated. Taking the horizontal direction of the area image as the x-axis and the vertical direction as the y-axis, a plane rectangular coordinate system is established.
  • the area image is stored in the form of a two-dimensional array, and the difference value of the pixel (i, j) in the x direction is: dx(i, j) = p(i+1, j) − p(i, j).
  • the difference value of the pixel (i, j) in the y direction is: dy(i, j) = p(i, j+1) − p(i, j).
  • p(i, j) may be the brightness of the pixel (i, j), or may be other parameters used to represent the color of the pixel.
  • the gradient of each pixel (i, j) in the area image, calculated from the difference values in these two directions, can be represented by the vector (dx(i, j), dy(i, j)).
  • the gradient size Grad(i, j) of pixel (i, j) is: Grad(i, j) = √(dx(i, j)² + dy(i, j)²).
  • the average value of the gradient magnitude of the area image can then be obtained as G = (1 / (W × H)) × Σ Grad(i, j), where the sum is taken over all pixels (i, j) of the area image, and where:
  • W represents the number of pixels in the area image in the x direction
  • H represents the number of pixels in the area image in the y direction.
  • the gradient mean value G of the area image can be used to represent and evaluate the texture complexity of the area image.
  • the gradient size of each pixel (i, j) (which can also be referred to as the gradient length), Grad(i, j), can also be expressed as: Grad(i, j) = |dx(i, j)| + |dy(i, j)|.
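  • The sketch below computes the mean gradient magnitude G of a luminance array following the finite differences described above; the zero-difference boundary handling and the array layout are assumptions, since the application does not specify them.

```python
import numpy as np

def mean_gradient_magnitude(luma: np.ndarray) -> float:
    # Finite differences of the luminance along the two image axes.
    dx = np.zeros_like(luma, dtype=np.float32)
    dy = np.zeros_like(luma, dtype=np.float32)
    dx[:-1, :] = luma[1:, :].astype(np.float32) - luma[:-1, :]
    dy[:, :-1] = luma[:, 1:].astype(np.float32) - luma[:, :-1]
    grad = np.sqrt(dx ** 2 + dy ** 2)  # Grad(i, j) for every pixel
    return float(grad.mean())          # G, the mean gradient magnitude

luma = np.random.randint(0, 256, size=(128, 128)).astype(np.float32)
G = mean_gradient_magnitude(luma)
# An area image whose G exceeds a preset value would be classified as "complex".
```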
  • the image compression model corresponding to the first texture complexity information of each region image may be determined according to the corresponding relationship between the texture complexity and the compression model.
  • the correspondence between the texture complexity information and the image compression model may include two or more types of texture complexity information, and a compression model corresponding to each texture complexity information.
  • when the average value of the gradient size of the image is greater than or equal to a preset value, the texture complexity of the image can be determined to be complex; when the average value of the gradient size of the image is smaller than the preset value, the texture complexity of the image can be determined to be simple. Therefore, the compression model corresponding to the image can be determined according to the average value of the gradient size of the image.
  • the correspondence between the texture complexity and the compression model is the same as the correspondence between the texture complexity and the compression model used when training the first compression model used in the method 900 .
  • each area image is compressed using an image compression model corresponding to the each area image.
  • image compression models corresponding to the image texture complexities can be used to compress each region image in the to-be-processed image, thereby improving the image compression effect of the to-be-processed image.
  • the image to be processed may be a complete image, for example, a photo captured by a camera, or a frame of image in a video.
  • the multiple region images may include all pixels in the to-be-processed image, thereby reducing image distortion and improving compression performance.
  • the texture complexity may differ across the regions of an image. For example, in background areas such as sky and beach, the texture complexity is low; in regions of interest or foreground regions containing objects such as people, the texture complexity is high.
  • the regions with different texture complexities in the image to be processed can each be given compression processing adapted to their texture complexity, so as to improve the overall compression effect of the image to be processed.
  • the embodiments of the present application provide a more flexible image processing manner.
  • the image to be processed is divided into multiple area images, and each area image is compressed to obtain compressed data.
  • the compressed data corresponding to each regional image may be decompressed separately.
  • Each area image is compressed using an image compression model corresponding to the area image to obtain the image features of the area image.
  • the decompression model corresponding to the image compression model during compression should be used for decompression to obtain a regional decompressed image, and the regional decompressed image can also be understood as a restored regional image obtained by decompression.
  • the region decompressed images corresponding to each region image in the to-be-processed image are spliced, so that the to-be-processed image can be restored.
  • optimization processing can be performed on the decompressed images of each region.
  • the optimization process may include adjusting the edge regions of one or more region decompressed images.
  • since the region decompressed images (second images) corresponding to the respective area images may be obtained by different decompression models, discontinuous lines or color differences may appear in the edge regions of two adjacent second images after splicing.
  • the pixels of the edge region of one or more second images may be adjusted.
  • the optimization process may also include adjustments to regions other than the edge regions of the region-decompressed image.
  • Optimization processing may be performed before or after splicing, which is not limited in this embodiment of the present application.
  • the pixels in the edge region of the second image may be stitched and optimized using the fusion model.
  • the pixels of one or more edge regions of the second image are adjusted, which can further reduce the degree of image distortion and improve the image compression effect.
  • FIG. 10 is a schematic diagram of the compression performance improvement achieved by the image processing method of the embodiment of the present application.
  • Using the image processing method 900 to compress and decompress images can achieve better image compression performance.
  • the image to be processed is divided into a plurality of area images of equal size.
  • the plurality of area images include all pixels of the image to be processed.
  • the compression model and the decompression model corresponding to the image can be determined according to the average value of the gradient size of the image, the image is compressed, and the compression result is decompressed.
  • the image processing method provided by the embodiment of the present application adopts a multi-model processing approach, which can effectively improve the PSNR at the same bit rate, so that the image distortion is lower.
  • the image processing system provided by the embodiment of the present application, the AI model training method required by the image processing system, and the image processing method are described above with reference to FIGS. 1 to 10 .
  • FIG. 11 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
  • the image processing apparatus 2000 includes a storage module 2010 and a processing module 2020 .
  • the storage module 2010 is used to store program instructions.
  • the processing module 2020 is configured to:
  • according to the texture complexity information corresponding to each area image in the image to be processed, the image compression model corresponding to each area image is determined, wherein different texture complexity information corresponds to different image compression models;
  • Each area image is compressed by using an image compression model corresponding to each area image.
  • the processing module 2020 is further configured to use an image decompression model corresponding to the image compression model used to compress each area image to decompress the image features obtained after each area image is compressed, so as to obtain the region decompressed image corresponding to each area image.
  • the processing module 2020 is further configured to perform stitching processing and optimization processing on the decompressed images of the multiple regions to obtain a restored image to be processed, and the optimization processing includes performing pixel adjustment on the edges of the decompressed images of the multiple regions.
  • the processing module 2020 is further configured to calculate the gradient size of each pixel in each regional image.
  • the processing module 2020 is further configured to, according to the gradient size of each pixel, determine the texture complexity information of each regional image.
  • the processing module 2020 is further configured to divide the image to be processed into the multiple area images, where the multiple area images do not overlap and include all pixels of the image to be processed.
  • FIG. 12 is a schematic structural diagram of a neural network training apparatus provided by an embodiment of the present application.
  • the neural network training apparatus 3000 includes a storage module 3010 and a processing module 3020 .
  • the storage module 3010 is used to store program instructions.
  • the processing module 3020 is configured to:
  • the codec model corresponding to each training area image is determined according to the texture complexity information corresponding to each training area image, wherein different texture complexity information corresponds to different codec models, and each codec model is used to compress the input training area images and to decompress the compression results, so as to obtain a plurality of decompressed training area images;
  • the parameters of the codec model are adjusted according to the rate distortion obtained from the decompressed training area image and the training area image.
  • the processing module 3020 is further configured to perform stitching processing and optimization processing on the multiple decompressed training area images by using a fusion model, and the optimization processing includes performing pixel adjustment on the edges of the multiple decompressed training area images.
  • the processing module 3020 is further configured to adjust the parameters of the fusion model according to the degree of image distortion between the training restored image and the training image.
  • the processing module 3020 is further configured to adjust parameters of the encoding and decoding model according to the degree of image distortion between the training restoration image and the training image.
  • the processing module 3020 is further configured to calculate the gradient size of each pixel in each training area image.
  • the processing module 3020 is further configured to, according to the gradient size of each pixel, determine the texture complexity information of each training area image.
  • the processing module 3020 is further configured to divide the training image into the multiple training area images, where the multiple training area images do not overlap and include all pixels of the training image.
  • FIG. 13 is a schematic diagram of a hardware structure of an image processing apparatus according to an embodiment of the present application.
  • the image processing apparatus 4000 shown in FIG. 13 includes a memory 4001 , a processor 4002 , a communication interface 4003 , and a bus 4004 .
  • the memory 4001 , the processor 4002 , and the communication interface 4003 are connected to each other through the bus 4004 for communication.
  • the memory 4001 may be ROM, static storage device and RAM.
  • the memory 4001 may store a program. When the program stored in the memory 4001 is executed by the processor 4002, the processor 4002 and the communication interface 4003 are used to execute each step of the image processing method of the embodiment of the present application.
  • the processor 4002 may be a general-purpose CPU, a microprocessor, an ASIC, a GPU, or one or more integrated circuits, and is configured to execute related programs, so as to implement the functions required to be performed by the units in the image processing apparatus of the embodiments of the present application, or to execute the image processing method of the method embodiments of the present application.
  • the processor 4002 may also be an integrated circuit chip with signal processing capability, for example, the chip shown in FIG. 4 .
  • each step of the image processing method in the embodiment of the present application may be completed by an integrated logic circuit of hardware in the processor 4002 or an instruction in the form of software.
  • the above-mentioned processor 4002 may also be a general-purpose processor, DSP, ASIC, FPGA or other programmable logic devices, discrete gate or transistor logic devices, and discrete hardware components.
  • the methods, steps, and logic block diagrams disclosed in the embodiments of this application can be implemented or executed.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory 4001; the processor 4002 reads the information in the memory 4001 and, in combination with its hardware, completes the functions required to be performed by the units included in the image processing apparatus of the embodiments of the present application, or performs the image processing method of the method embodiments of the present application.
  • the communication interface 4003 implements communication between the device 4000 and other devices or a communication network using a transceiver device such as, but not limited to, a transceiver.
  • the image to be processed can be acquired through the communication interface 4003 .
  • Bus 4004 may include a pathway for communicating information between various components of device 4000 (eg, memory 4001, processor 4002, communication interface 4003).
  • FIG. 14 is a schematic diagram of a hardware structure of a neural network training apparatus according to an embodiment of the present application. Similar to the above-mentioned apparatus 3000 and apparatus 4000 , the neural network training apparatus 5000 shown in FIG. 14 includes a memory 5001 , a processor 5002 , a communication interface 5003 and a bus 5004 . The memory 5001 , the processor 5002 , and the communication interface 5003 are connected to each other through the bus 5004 for communication.
  • the neural network can be trained by the neural network training apparatus 5000 shown in FIG. 14 , and the neural network obtained by training can be used to execute the image processing method of the embodiment of the present application.
  • the apparatus shown in FIG. 14 can obtain training data and the neural network to be trained from the outside through the communication interface 5003, and then the processor can train the neural network to be trained according to the training data.
  • although the apparatus 4000 and the apparatus 5000 shown above include only a memory, a processor, and a communication interface, in a specific implementation process those skilled in the art should understand that the apparatus 4000 and the apparatus 5000 may also include other devices necessary for normal operation. Meanwhile, according to specific needs, those skilled in the art should understand that the apparatus 4000 and the apparatus 5000 may further include hardware devices implementing other additional functions. In addition, those skilled in the art should understand that the apparatus 4000 and the apparatus 5000 may include only the devices necessary for implementing the embodiments of the present application, and need not include all the devices shown in FIG. 13 and FIG. 14.
  • the processor in the embodiment of the present application may be a central processing unit (central processing unit, CPU), and the processor may also be other general-purpose processors, digital signal processors (digital signal processors, DSP), application-specific integrated circuits (application specific integrated circuit, ASIC), off-the-shelf programmable gate array (field programmable gate array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the memory in the embodiments of the present application may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically programmable Erase programmable read-only memory (electrically EPROM, EEPROM) or flash memory.
  • Volatile memory may be random access memory (RAM), which acts as an external cache.
  • By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct rambus random access memory (direct rambus RAM, DR RAM).
  • the above embodiments may be implemented in whole or in part by software, hardware, firmware or any other combination.
  • the above-described embodiments may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, all or part of the processes or functions described in the embodiments of the present application are generated.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wire or wirelessly (e.g., infrared, wireless, microwave, etc.).
  • the computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server, a data center, or the like that contains one or more sets of available media.
  • the usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVDs), or semiconductor media.
  • the semiconductor medium may be a solid state drive.
  • "At least one" means one or more, and "a plurality of" means two or more.
  • "At least one of the following items" or similar expressions refer to any combination of these items, including any combination of a single item or a plurality of items.
  • For example, "at least one of a, b, or c" can represent: a, b, c, a and b, a and c, b and c, or a, b, and c, where a, b, and c can each be singular or plural.
  • the size of the sequence numbers of the above-mentioned processes does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • the technical solutions of the present application, in essence, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage medium includes: U disk, mobile hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk and other media that can store program codes .

Abstract

Image processing method and apparatus, and neural network training method and apparatus. The image processing method comprises: determining texture complexity information corresponding to each of a plurality of regional images in an image to be processed (S910); determining, according to the texture complexity information corresponding to each regional image, an image compression model corresponding to each regional image (S920), different texture complexity information corresponding to different image compression models; and using the image compression model corresponding to each regional image to compress each regional image (S930). According to the method, the compression model corresponding to the texture complexity of each of the regional images having different texture complexities in the image to be processed is used to compress that regional image, which improves the overall compression effect of the image to be processed.
PCT/CN2021/086836 2020-07-30 2021-04-13 Procédé et dispositif de traitement d'image, et procédé et dispositif d'apprentissage de réseau neutre WO2022021938A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010754067.X 2020-07-30
CN202010754067.XA CN114067007A (zh) 2020-07-30 2020-07-30 图像处理方法与装置、神经网络训练的方法与装置

Publications (1)

Publication Number Publication Date
WO2022021938A1 true WO2022021938A1 (fr) 2022-02-03

Family

ID=80037134

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/086836 WO2022021938A1 (fr) 2020-07-30 2021-04-13 Procédé et dispositif de traitement d'image, et procédé et dispositif d'apprentissage de réseau neutre

Country Status (2)

Country Link
CN (1) CN114067007A (fr)
WO (1) WO2022021938A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114491272A (zh) * 2022-02-14 2022-05-13 北京有竹居网络技术有限公司 一种多媒体内容推荐方法及装置
CN115278246A (zh) * 2022-08-01 2022-11-01 天津大学 一种深度图端到端智能压缩编码方法及装置
CN116684607A (zh) * 2023-07-26 2023-09-01 腾讯科技(深圳)有限公司 图像压缩和解压缩方法、装置、电子设备及存储介质

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115147501B (zh) * 2022-09-05 2022-12-02 深圳市明源云科技有限公司 图片解压方法、装置、终端设备以及存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101217668A (zh) * 2008-01-14 2008-07-09 浙江大学 基于块分类的混合图像压缩方法
CN103700121A (zh) * 2013-12-30 2014-04-02 Tcl集团股份有限公司 一种复合图像的压缩方法及装置
CN108062780A (zh) * 2017-12-29 2018-05-22 百度在线网络技术(北京)有限公司 图像压缩方法和装置
CN109996078A (zh) * 2019-02-25 2019-07-09 阿里巴巴集团控股有限公司 一种图像压缩方法、装置及电子设备


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114491272A (zh) * 2022-02-14 2022-05-13 北京有竹居网络技术有限公司 一种多媒体内容推荐方法及装置
CN114491272B (zh) * 2022-02-14 2023-09-12 北京有竹居网络技术有限公司 一种多媒体内容推荐方法及装置
CN115278246A (zh) * 2022-08-01 2022-11-01 天津大学 一种深度图端到端智能压缩编码方法及装置
CN115278246B (zh) * 2022-08-01 2024-04-16 天津大学 一种深度图端到端智能压缩编码方法及装置
CN116684607A (zh) * 2023-07-26 2023-09-01 腾讯科技(深圳)有限公司 图像压缩和解压缩方法、装置、电子设备及存储介质
CN116684607B (zh) * 2023-07-26 2023-11-14 腾讯科技(深圳)有限公司 图像压缩和解压缩方法、装置、电子设备及存储介质

Also Published As

Publication number Publication date
CN114067007A (zh) 2022-02-18

Similar Documents

Publication Publication Date Title
WO2020216227A9 (fr) Procédé et appareil de classification d'image et procédé et appareil de traitement de données
WO2022021938A1 (fr) Procédé et dispositif de traitement d'image, et procédé et dispositif d'apprentissage de réseau neutre
WO2020177651A1 (fr) Procédé de segmentation d'image et dispositif de traitement d'image
WO2021018163A1 (fr) Procédé et appareil de recherche de réseau neuronal
CN113259665B (zh) 一种图像处理方法以及相关设备
WO2021043273A1 (fr) Procédé et appareil d'amélioration d'image
WO2020177607A1 (fr) Procédé et appareil de débruitage d'image
WO2022001372A1 (fr) Procédé et appareil d'entraînement de réseau neuronal, et procédé et appareil de traitement d'image
CN113066017B (zh) 一种图像增强方法、模型训练方法及设备
WO2021018245A1 (fr) Procédé et appareil de classification d'images
WO2021018251A1 (fr) Procédé et dispositif de classification d'image
WO2024002211A1 (fr) Procédé de traitement d'image et appareil associé
CN113066018A (zh) 一种图像增强方法及相关装置
CN113011562A (zh) 一种模型训练方法及装置
WO2022179588A1 (fr) Procédé de codage de données et dispositif associé
TWI826160B (zh) 圖像編解碼方法和裝置
WO2021042774A1 (fr) Procédé de récupération d'image, procédé d'entraînement de réseau de récupération d'image, dispositif, et support de stockage
CN113284055A (zh) 一种图像处理的方法以及装置
WO2023174256A1 (fr) Procédé de compression de données et dispositif associé
WO2022022176A1 (fr) Procédé de traitement d'image et dispositif associé
WO2022001364A1 (fr) Procédé d'extraction de caractéristiques de données et appareil associé
WO2021057091A1 (fr) Procédé de traitement d'image de point de vue et dispositif associé
CN115409697A (zh) 一种图像处理方法及相关装置
WO2021189321A1 (fr) Procédé et dispositif de traitement d'image
CN115294429A (zh) 一种基于特征域网络训练方法和装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21849568

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21849568

Country of ref document: EP

Kind code of ref document: A1