WO2022021938A1 - Image processing method and device, and neural network training method and device

Image processing method and device, and neural network training method and device

Info

Publication number
WO2022021938A1
WO2022021938A1 (PCT/CN2021/086836)
Authority
WO
WIPO (PCT)
Prior art keywords
image
training
area
images
model
Prior art date
Application number
PCT/CN2021/086836
Other languages
French (fr)
Chinese (zh)
Inventor
赵政辉
马思伟
王晶
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
北京大学 (Peking University)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.) and 北京大学 (Peking University)
Publication of WO2022021938A1 publication Critical patent/WO2022021938A1/en

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 - Image coding
    • G06T9/002 - Image coding using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 - Image coding
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00 - Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/41 - Bandwidth or redundancy reduction
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 - Incoming video signal characteristics or properties
    • H04N19/14 - Coding unit complexity, e.g. amount of activity or edge presence estimation
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation

Definitions

  • the present application relates to the field of artificial intelligence, and more particularly, to an image processing method and apparatus, and a method and apparatus for neural network training.
  • Artificial intelligence is a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that responds in a way similar to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
  • Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theory.
  • Image compression can reduce redundant information in image data. Therefore, image compression is of great significance to improve the storage efficiency and transmission efficiency of images.
  • Traditional image compression methods such as Joint Photographic Experts Group (JPEG) have good compression effects in medium and high bit rate regions, but in low bit rate regions, the compression effects of traditional image compression methods are not ideal.
  • Alternatively, the image can be compressed by a neural network.
  • This approach mainly uses a neural network and a corresponding nonlinear transformation to extract image features, so as to achieve compression. Compared with traditional image compression methods, it avoids complicated parameter design and module design.
  • Correspondingly, a neural network can be used for decoding to reconstruct the image. How to improve the image compression performance of neural networks has become a technical problem that urgently needs to be solved.
  • the present application provides an image processing method and device, and a neural network training method and device, which can improve the image compression effect of the neural network.
  • In a first aspect, an image processing method is provided, comprising: determining texture complexity information corresponding to each area image in a plurality of area images in an image to be processed; determining, according to the texture complexity information corresponding to each area image, an image compression model corresponding to each area image, wherein different texture complexity information corresponds to different image compression models; and compressing each area image by using the image compression model corresponding to that area image.
  • In different areas of an image, the texture complexity may not be the same.
  • For example, in background areas such as sky and beach, the texture complexity is low; in areas of interest or foreground areas that include people and other objects, the texture complexity is high.
  • With this method, regions of different texture complexity in the image to be processed can be given compression processing adapted to their texture complexity, improving the overall compression effect of the image to be processed.
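As an illustration of the first aspect, the following Python sketch wires the pieces together: non-overlapping division, a gradient-based texture measure, and per-region model selection. The tile size, the two-level high/low split, and the threshold value are assumptions for illustration only, not details taken from the application.

```python
import numpy as np

def split_into_regions(img, tile=128):
    """Non-overlapping region images covering every pixel of a 2-D
    grayscale array (edge tiles may be smaller than `tile`)."""
    h, w = img.shape[:2]
    return [img[r:r + tile, c:c + tile]
            for r in range(0, h, tile)
            for c in range(0, w, tile)]

def texture_complexity(region):
    """Median per-pixel gradient magnitude as a simple texture measure."""
    gy, gx = np.gradient(region.astype(np.float64))
    return float(np.median(np.hypot(gx, gy)))

def compress_adaptive(img, models, threshold=4.0):
    """models: dict mapping a complexity level to a compression callable.
    The two-level split and the threshold value are illustrative assumptions."""
    results = []
    for region in split_into_regions(img):
        level = "high" if texture_complexity(region) > threshold else "low"
        results.append((level, models[level](region)))  # complexity -> model
    return results
```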
  • The method may further include: using an image decompression model corresponding to the image compression model used for compressing each area image, decompressing the image features obtained by compressing that area image, to obtain a region decompressed image corresponding to each area image; and performing splicing processing and optimization processing on the multiple region decompressed images to obtain a restored to-be-processed image, where the optimization processing includes adjusting the edges of each region decompressed image.
  • Because each image decompression model handles a different texture complexity, when the region decompressed images obtained by decompression are spliced together, two adjacent decompressed images may show line discontinuities or color differences after splicing.
  • Through the optimization processing, the degree of image distortion between the complete image after compression, decompression, and splicing and the image before processing can be made smaller.
  • Determining the texture complexity information corresponding to each area image in the multiple area images in the image to be processed may include: calculating the gradient size of each pixel in each area image, and determining the texture complexity information of each area image according to the gradient sizes of the pixels.
  • the magnitude of the gradient of a pixel can be determined based on the brightness of the pixel or other representations of color.
  • the texture complexity of the region image can be represented by the median or average of the gradient sizes of each pixel in the region image.
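The application does not fix a particular gradient operator; the following sketch assumes simple forward differences of the luminance in the horizontal and vertical directions, matching the description of the classification model later in the text:

```python
import numpy as np

def gradient_texture_complexity(luma, use_median=True):
    """Texture complexity of a region from per-pixel gradient magnitudes.

    `luma` is assumed to be a 2-D array of pixel luminance values; forward
    differences approximate the horizontal and vertical gradients.
    """
    y = luma.astype(np.float64)
    gx = np.zeros_like(y)
    gy = np.zeros_like(y)
    gx[:, :-1] = y[:, 1:] - y[:, :-1]   # horizontal luminance difference
    gy[:-1, :] = y[1:, :] - y[:-1, :]   # vertical luminance difference
    mag = np.hypot(gx, gy)              # per-pixel gradient magnitude
    return float(np.median(mag) if use_median else np.mean(mag))
```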
  • the method further includes: dividing the image to be processed into the multiple area images, the multiple area images do not overlap, and the multiple area images Include all pixels in the image to be processed.
  • Because the multiple area images include all the pixels in the to-be-processed image and do not overlap, no image content is encoded twice, which can reduce the bit rate.
  • In a second aspect, a neural network training method is provided, comprising: determining texture complexity information corresponding to each training area image in a plurality of training area images in a training image; determining, according to the texture complexity information corresponding to each training area image, a codec model corresponding to each training area image, wherein different texture complexity information corresponds to different codec models, and each codec model is used to compress the input training area image and decompress the compression result to obtain multiple decompressed training area images; and adjusting the parameters of the codec models according to the rate distortion obtained from the decompressed training area images and the training area images.
  • In this way, each codec is trained on images of its corresponding texture complexity, so that the compression performance of each codec for images of that texture complexity becomes better.
  • In different areas of an image, the texture complexity may not be the same. For example, in background areas such as sky and beach, the texture complexity is low; in areas of interest or foreground areas including objects such as people, the texture complexity is high. Dividing a complete image into multiple regions and using the image of each region to train a codec makes the training data better match the texture complexity associated with that codec, thereby improving each codec's compression performance for images of the corresponding texture complexity.
  • Optionally, the multiple decompressed training area images are stitched and optimized through a fusion model to obtain a training restoration image, where the optimization processing includes adjusting the edge of at least one decompressed training area image; and the parameters of the fusion model are adjusted according to the degree of image distortion between the training restoration image and the training image.
  • the image processed by the codec can be less distorted after splicing.
  • the parameters of the codec model may also be adjusted according to the degree of image distortion between the training restoration image and the training image.
  • The neural network models required in the image processing process can thus be trained in an "end-to-end" fashion.
  • rate-distortion can be determined according to the degree of image distortion and the bit rate.
  • the bit rate is used to indicate the compression degree of the image, which can be determined according to the compression result of the codec model.
  • the fusion model can also adjust other regions other than the edges of the decompressed training region image.
  • Determining the texture complexity information corresponding to each training area image in the multiple training area images in the training image may include: calculating the gradient size of each pixel in each training area image, and determining the texture complexity information of each training area image according to the gradient sizes of the pixels.
  • The method may further include: dividing the training image into the multiple training area images, where the multiple training area images do not overlap and include all the pixels in the training image.
  • In a third aspect, an electronic device and an image processing apparatus are provided, comprising a storage module and a processing module. The storage module is used to store program instructions; when the program instructions are executed in the processor, the processing module is used to: determine the texture complexity information corresponding to each area image in the multiple area images in the image to be processed; determine, according to the texture complexity information corresponding to each area image, the image compression model corresponding to each area image, wherein different texture complexity information corresponds to different image compression models; and compress each area image by using the image compression model corresponding to that area image.
  • Optionally, the processing module is further configured to: use an image decompression model corresponding to the image compression model used for compressing each area image to decompress the image features obtained by compressing that area image, to obtain a region decompressed image corresponding to each area image; and perform splicing processing and optimization processing on the multiple region decompressed images to obtain the restored to-be-processed image, where the optimization processing includes performing pixel adjustment on the edges of the region decompressed images.
  • Optionally, the processing module is further configured to: calculate the gradient size of each pixel in each area image, and determine the texture complexity information of each area image according to the gradient sizes of the pixels.
  • Optionally, the processing module is further configured to: divide the image to be processed into the multiple area images, where the multiple area images do not overlap and include all the pixels in the image to be processed.
  • In a fourth aspect, a neural network training device is provided, comprising a storage module and a processing module. The storage module is used for storing program instructions; when the program instructions are executed in the processor, the processing module is used to: determine the texture complexity information corresponding to each training area image in the multiple training area images in the training image; determine, according to the texture complexity information corresponding to each training area image, the codec model corresponding to each training area image, wherein different texture complexity information corresponds to different codec models, and each codec model is used to compress the input training area image and decompress the compression result to obtain multiple decompressed training area images; and adjust the parameters of the codec models according to the rate distortion obtained from the decompressed training area images and the training area images.
  • Optionally, the processing module is further configured to: perform splicing processing and optimization processing on the plurality of decompressed training area images through a fusion model to obtain a training restoration image, where the optimization processing includes performing pixel adjustment on the edge of at least one decompressed training area image; and adjust the parameters of the fusion model according to the degree of image distortion between the training restoration image and the training image.
  • the parameters of the codec model may also be adjusted according to the degree of image distortion between the training restoration image and the training image.
  • Optionally, the processing module is further configured to: calculate the gradient size of each pixel in each training area image, and determine the texture complexity information of each training area image according to the gradient sizes of the pixels.
  • Optionally, the processing module is further configured to: divide the training image into the multiple training area images, where the multiple training area images do not overlap and include all the pixels in the training image.
  • In a fifth aspect, an electronic device is provided, comprising a memory and a processor, wherein the memory is used for storing program instructions; when the program instructions are executed in the processor, the processor is used to execute the method described in the first aspect or the second aspect.
  • The processor in the fifth aspect above may be either a central processing unit (CPU), or a combination of a CPU and a neural network computing processor, where the neural network computing processor may include a graphics processing unit (GPU), a neural-network processing unit (NPU), a tensor processing unit (TPU), and so on.
  • TPU is Google's fully customized artificial intelligence accelerator application-specific integrated circuit for machine learning.
  • In a sixth aspect, a computer-readable medium is provided, storing program code for execution by a device, the program code comprising instructions for performing the method in any one of the implementations of the first aspect or the second aspect.
  • In a seventh aspect, a computer program product comprising instructions is provided; when the computer program product is run on a computer, the computer is caused to execute the method in any one of the implementations of the first aspect or the second aspect.
  • In an eighth aspect, a chip is provided, comprising a processor and a data interface; the processor reads, through the data interface, instructions stored in a memory, and executes the method in any one of the implementations of the first aspect or the second aspect.
  • Optionally, the chip may further include a memory in which instructions are stored; the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to execute the method in any one of the implementations of the first aspect or the second aspect.
  • the above chip may specifically be a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
  • FIG. 1 is a schematic structural diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a convolutional neural network according to an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of another convolutional neural network provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a hardware structure of a chip according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of an image processing system.
  • FIG. 7 is a schematic structural diagram of an image processing system provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a neural network training method provided by an embodiment of the present application.
  • FIG. 9 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of the compression performance improvement achieved by the image processing method of the embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of a neural network training apparatus provided by an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of a neural network training apparatus according to an embodiment of the present application.
  • A neural network can be composed of neural units. A neural unit can be an operation unit that takes x_s and an intercept 1 as inputs, and the output of the operation unit can be: h_{W,b}(x) = f(∑_s W_s · x_s + b), where W_s is the weight of x_s and b is the bias of the neural unit.
  • f is an activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
  • a neural network is a network formed by connecting a plurality of the above single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field can be an area composed of several neural units.
  • a deep neural network also known as a multi-layer neural network, can be understood as a neural network with multiple hidden layers.
  • The layers of the DNN can be divided, according to their positions, into three categories: input layer, hidden layers, and output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and the layers in between are all hidden layers.
  • the layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
  • Although a DNN looks complicated, the work of each layer is not complicated. In short, each layer is just the linear relationship expression y = α(W · x + b), where x is the input vector, y is the output vector, b is the offset vector, W is the weight matrix (also called coefficients), and α() is the activation function.
  • Each layer simply performs this operation on an input vector x to obtain an output vector y. Because the number of DNN layers is large, the numbers of coefficient matrices W and offset vectors b are also large.
  • These parameters are defined in the DNN as follows, taking the coefficient W as an example: suppose that in a three-layer DNN, the linear coefficient from the fourth neuron in the second layer to the second neuron in the third layer is defined as W^3_{24}, where the superscript 3 represents the layer of the coefficient W, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer.
  • In summary, the coefficient from the k-th neuron in layer L-1 to the j-th neuron in layer L is defined as W^L_{jk}.
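A minimal numeric sketch of this notation, using NumPy and a sigmoid activation; the layer sizes and random values are arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One fully connected layer computes y = alpha(W @ x + b).
# W[j, k] is the coefficient from neuron k of the previous layer
# to neuron j of this layer, matching the W^L_jk indexing above.
rng = np.random.default_rng(0)
x = rng.normal(size=4)            # input vector
W2 = rng.normal(size=(5, 4))      # layer 2 weight matrix
b2 = rng.normal(size=5)           # layer 2 offset vector
W3 = rng.normal(size=(3, 5))      # layer 3 weight matrix
b3 = rng.normal(size=3)

h = sigmoid(W2 @ x + b2)          # hidden layer output
y = sigmoid(W3 @ h + b3)          # network output
print(y.shape)                    # (3,)
```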
  • the input layer does not have a W parameter.
  • more hidden layers allow the network to better capture the complexities of the real world.
  • a model with more parameters is more complex and has a larger "capacity", which means that it can complete more complex learning tasks.
  • Training the deep neural network is the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (the weight matrix formed by the vectors W of many layers).
  • Convolutional neural network is a deep neural network with a convolutional structure.
  • A convolutional neural network contains a feature extractor composed of convolutional layers and subsampling layers, and the feature extractor can be viewed as a filter.
  • the convolutional layer refers to the neuron layer in the convolutional neural network that convolves the input signal.
  • In a convolutional layer of a convolutional neural network, a neuron can be connected to only part of the neurons in the adjacent layer.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some neural units arranged in a rectangle. Neural units in the same feature plane share weights, and the shared weights here are convolution kernels. Shared weights can be understood as the way to extract image information is independent of location.
  • the convolution kernel can be initialized in the form of a matrix of random size, and the convolution kernel can obtain reasonable weights by learning during the training process of the convolutional neural network.
  • the immediate benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
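To make weight sharing concrete, the following unoptimized sketch slides one shared kernel over an image; the example edge kernel is an assumption, not a kernel from the application:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a single shared kernel over the image ('valid' padding).
    The same weight matrix is applied at every spatial location, which
    is the weight sharing described above."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.empty((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

edge_kernel = np.array([[-1.0, 0.0, 1.0]] * 3)    # simple horizontal-edge extractor
feature_map = conv2d_valid(np.random.rand(8, 8), edge_kernel)
print(feature_map.shape)                           # (6, 6)
```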
  • Recurrent neural networks are used to process sequence data.
  • In a traditional neural network, the layers are fully connected, while the nodes within each layer are unconnected.
  • Although such an ordinary neural network solves many problems, it is still powerless for many others. For example, to predict the next word of a sentence, the previous words are generally needed, because the words in a sentence are not independent of each other. RNN is called a recurrent neural network because the current output of a sequence is also related to the previous outputs.
  • RNN can process sequence data of any length.
  • the training of RNN is the same as the training of traditional CNN or DNN.
  • the neural network can use the error back propagation (BP) algorithm to correct the size of the parameters in the initial neural network model during the training process, so that the reconstruction error loss of the neural network model becomes smaller and smaller.
  • BP error back propagation
  • In the back propagation algorithm, the input signal is passed forward until the output generates an error loss; the parameters in the initial neural network model are then updated by back-propagating the error loss information, so that the error loss converges.
  • the back-propagation algorithm is a back-propagation movement dominated by error loss, aiming to obtain the parameters of the optimal neural network model, such as the weight matrix.
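As a schematic illustration of one such update (a single linear layer with a squared error loss is assumed purely for brevity):

```python
import numpy as np

# One layer, squared error loss: L = 0.5 * ||W @ x - t||^2.
# Backpropagation gives dL/dW = outer(W @ x - t, x); gradient descent
# then moves W against that gradient so the error loss shrinks.
rng = np.random.default_rng(1)
W = rng.normal(size=(3, 4))        # initial weight matrix
x = rng.normal(size=4)             # fixed input
t = rng.normal(size=3)             # target output

lr = 0.5 / float(x @ x)            # step size chosen small enough to converge
for _ in range(100):
    err = W @ x - t                # forward pass error
    W -= lr * np.outer(err, x)     # backward pass: update weights
print(np.linalg.norm(W @ x - t))   # approaches 0
```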
  • the pixel value of the image can be a red-green-blue (RGB) color value, and the pixel value can be a long integer representing the color.
  • For example, the pixel value may be 256*Red + 100*Green + 76*Blue, where Blue represents the blue component, Green represents the green component, and Red represents the red component. In each color component, a smaller value indicates lower brightness, and a larger value indicates higher brightness.
  • the pixel values can be grayscale values.
  • Image compression refers to the technology of representing the original pixel matrix lossy or lossless with fewer bits, also known as image coding.
  • Image compression which performs transformations on the image content, can reduce the amount of data required to represent a digital image, thereby reducing the space occupied by image storage.
  • Image data can be compressed because there is redundancy in the data.
  • The redundancy of image data is mainly manifested as: spatial redundancy caused by the correlation between adjacent pixels in the image; temporal redundancy caused by the correlation between different frames in an image sequence; and spectral redundancy caused by the correlation between different color planes or spectral bands.
  • the purpose of data compression is to reduce the number of bits required to represent data by removing these data redundancies. Due to the huge amount of image data, it is very difficult to store, transmit and process, so the compression of image data is very important.
  • Image decompression is the inverse process of image compression, which can also be called decompression or decoding.
  • image decoding the information format of the input compact representation can be restored as an image.
  • the peak signal-to-noise ratio (PSNR) between the original image and the encoded reconstructed image is used to measure the image distortion.
  • PSNR can be the PSNR of the luminance, or a linear combination of the PSNR of the luminance and the PSNR of the chrominance.
  • Usually, the PSNR of the luminance (Y-PSNR) is used as the main criterion.
  • The peak signal is the maximum possible pixel value in the image (for example, the maximum pixel brightness), and the noise is the mean square error between the pixel values of the original image and the reconstructed image (the squared differences are averaged); the PSNR is obtained by converting the ratio of the two.
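A hedged sketch of this PSNR computation, assuming 8-bit pixels with a peak value of 255:

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """PSNR between original and reconstructed images: the peak signal is
    the maximum possible pixel value, the noise is the mean square error."""
    mse = np.mean((original.astype(np.float64) -
                   reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```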
  • The code rate (rate), also known as the encoding bit rate, can be the average amount of data per pixel in the compressed data (bits per pixel, bpp), which is used to indicate the degree of data compression.
  • The code rate can be determined according to the proportion of the data volume after image compression.
  • Rate distortion is used to express the relationship between image distortion and bit rate.
  • Rate distortion optimization refers to reducing image distortion and bit rate as much as possible according to preset rules. That is to say, in the case of a bit rate as small as possible, the distortion of the obtained image can be reduced as much as possible, so as to achieve a better compression effect.
  • rate-distortion optimization a balance point can be found between bit rate and distortion, so that the compression effect is optimal.
  • Alternatively, the rule for rate-distortion optimization may be that the distortion is smallest while the code rate is kept below an upper limit, or that the code rate is smallest while the distortion is kept below a given limit, and so on.
  • Rate-distortion can be calculated by the rate-distortion function.
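The application does not give an explicit rate-distortion function; the sketch below assumes the common Lagrangian form J = R + λ·D, with the rate in bits per pixel and the distortion as mean square error:

```python
import numpy as np

def rate_distortion_cost(compressed_bits, original, reconstructed, lam=0.01):
    """Lagrangian rate-distortion cost J = R + lambda * D, with R in bits
    per pixel (bpp) and D the mean square error over a 2-D grayscale image.
    The form and the value of lambda are illustrative assumptions."""
    num_pixels = original.shape[0] * original.shape[1]
    rate_bpp = compressed_bits / num_pixels
    distortion = float(np.mean((original.astype(np.float64) -
                                reconstructed.astype(np.float64)) ** 2))
    return rate_bpp + lam * distortion
```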
  • an embodiment of the present application provides a system architecture 100 .
  • a data collection device 160 is used to collect training data.
  • the training data may include training images.
  • the data collection device 160 After collecting the training data, the data collection device 160 stores the training data in the database 130 , and the training device 120 obtains the target model/rule 101 by training based on the training data maintained in the database 130 .
  • The training device 120 processes the input original image and compares the output image with the original image, until the rate-distortion determined by the difference between the image output by the training device 120 and the original image is less than a certain threshold, at which point the training of the target model/rule 101 is completed.
  • the above target model/rule 101 can be used to implement the image processing method of the embodiment of the present application.
  • the target model/rule 101 in this embodiment of the present application may specifically be a neural network.
  • the training data maintained in the database 130 may not necessarily come from the collection of the data collection device 160, and may also be received from other devices.
  • the training device 120 may not necessarily train the target model/rule 101 completely based on the training data maintained by the database 130, and may also obtain training data from the cloud or other places for model training.
  • The above description should not be taken as a limitation on the embodiments of the present application.
  • The target model/rule 101 trained by the training device 120 can be applied to different systems or devices, such as the execution device 110 shown in FIG. 1. The execution device 110 can be a terminal such as a laptop, an augmented reality (AR)/virtual reality (VR) device, or an in-vehicle terminal, and can also be a server or the cloud.
  • The execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices. A user can input data to the I/O interface 112 through the client device 140; in this embodiment of the present application, the input data may include an image to be processed input by the client device.
  • The preprocessing module 113 and the preprocessing module 114 are used to perform preprocessing according to the input data (such as the image to be processed) received by the I/O interface 112.
  • The preprocessing module 113 and the preprocessing module 114 may also be absent (or only one of them may be present), and the calculation module 111 may be used directly to process the input data.
  • When the execution device 110 preprocesses the input data, or when the calculation module 111 of the execution device 110 performs calculation and other related processing, the execution device 110 can call data, code, and the like in the data storage system 150 for the corresponding processing, and the data and instructions obtained by that processing may also be stored in the data storage system 150.
  • the I/O interface 112 returns the processing result, such as the above-obtained image classification result, to the client device 140 so as to be provided to the user.
  • the training device 120 can generate corresponding target models/rules 101 based on different training data for different goals or tasks, and the corresponding target models/rules 101 can be used to achieve the above goals or complete The above task, thus providing the user with the desired result.
  • the user can manually specify the input data, which can be operated through the interface provided by the I/O interface 112 .
  • the client device 140 can automatically send the input data to the I/O interface 112 . If the user's authorization is required to request the client device 140 to automatically send the input data, the user can set the corresponding permission in the client device 140 .
  • the user can view the result output by the execution device 110 on the client device 140, and the specific presentation form can be a specific manner such as display, sound, and action.
  • the client device 140 can also be used as a data collection terminal to collect the input data of the input I/O interface 112 and the output result of the output I/O interface 112 as new sample data as shown in the figure, and store them in the database 130 .
  • the I/O interface 112 directly uses the input data input into the I/O interface 112 and the output result of the output I/O interface 112 as shown in the figure as a new sample The data is stored in database 130 .
  • FIG. 1 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship among the devices, devices, modules, etc. shown in the figure does not constitute any limitation.
  • For example, in FIG. 1, the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 may also be placed in the execution device 110.
  • the target model/rule 101 is obtained by training the training device 120.
  • the target model/rule 101 may be the neural network in the present application in this embodiment of the present application.
  • The neural network in this embodiment of the present application may be, for example, a convolutional neural network (CNN), a deep convolutional neural network (DCNN), or a recurrent neural network (RNN).
  • CNN is a very common neural network
  • a convolutional neural network is a deep neural network with a convolutional structure and a deep learning architecture.
  • A deep learning architecture refers to performing multiple levels of learning at different levels of abstraction through machine learning algorithms.
  • CNN is a feed-forward artificial neural network in which individual neurons can respond to images fed into it.
  • a convolutional neural network (CNN) 200 may include an input layer 210 , a convolutional/pooling layer 220 (where the pooling layer is optional), and a neural network layer 230 .
  • the input layer 210 can obtain the image to be processed, and pass the obtained image to be processed by the convolution layer/pooling layer 220 and the subsequent neural network layer 230 for processing, and the processing result of the image can be obtained.
  • the internal layer structure in the CNN 200 in Figure 2 is described in detail below.
  • The convolutional/pooling layer 220 may include layers 221-226. For example, in one implementation, layer 221 is a convolutional layer, layer 222 is a pooling layer, layer 223 is a convolutional layer, layer 224 is a pooling layer, layer 225 is a convolutional layer, and layer 226 is a pooling layer; in another implementation, layers 221 and 222 are convolutional layers, layer 223 is a pooling layer, layers 224 and 225 are convolutional layers, and layer 226 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
  • the convolution layer 221 may include many convolution operators.
  • the convolution operator is also called a kernel. Its role in image processing is equivalent to a filter that extracts specific information from the input image matrix.
  • The convolution operator can essentially be a weight matrix, which is usually predefined. During a convolution operation on an image, the weight matrix is usually moved along the horizontal direction of the input image one pixel at a time (or two pixels at a time, depending on the value of the stride), so as to extract a specific feature from the image.
  • the size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image.
  • The convolution outputs of multiple weight matrices are stacked to form the depth dimension of the convolved image, where that dimension can be understood as being determined by the number of weight matrices described above.
  • Different weight matrices can be used to extract different features in the image. For example, one weight matrix is used to extract image edge information, another weight matrix is used to extract specific colors of the image, and another weight matrix is used to extract unwanted noise in the image.
  • When the multiple weight matrices have the same size (rows × columns), the convolution feature maps extracted by them also have the same size, and the multiple extracted convolution feature maps of the same size are then combined to form the output of the convolution operation.
  • The weight values in these weight matrices need to be obtained through extensive training in practical applications, and each weight matrix formed by the trained weight values can be used to extract information from the input image, so that the convolutional neural network 200 can make correct predictions.
  • When the convolutional neural network 200 has multiple convolutional layers, the initial convolutional layers (e.g., 221) tend to extract more general, low-level features; as the depth of the network increases, the features extracted by the later convolutional layers (e.g., 226) become more and more complex, such as features with high-level semantics.
  • Features with higher semantics are more suitable for the problem to be solved.
  • A pooling layer often follows a convolutional layer: it can be one convolutional layer followed by one pooling layer, or multiple convolutional layers followed by one or more pooling layers.
  • the pooling layer may include an average pooling operator and/or a max pooling operator for sampling the input image to obtain a smaller size image.
  • the average pooling operator can calculate the pixel values in the image within a certain range to produce an average value as the result of average pooling.
  • the max pooling operator can take the pixel with the largest value within a specific range as the result of max pooling. Also, just as the size of the weight matrix used in the convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size.
  • the size of the output image after processing by the pooling layer can be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average or maximum value of the corresponding sub-region of the image input to the pooling layer.
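As a small illustration of both operators, the following sketch implements 2×2 average and max pooling over non-overlapping windows (input dimensions divisible by 2 are assumed):

```python
import numpy as np

def pool2x2(image, mode="max"):
    """Each output pixel is the max or average of a 2x2 sub-region,
    so the output is half the input size in each dimension."""
    h, w = image.shape
    blocks = image.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

x = np.arange(16, dtype=np.float64).reshape(4, 4)
print(pool2x2(x, "max"))    # 2x2 output of window maxima
print(pool2x2(x, "avg"))    # 2x2 output of window averages
```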
  • After processing by the convolutional layer/pooling layer 220, the convolutional neural network 200 is not yet able to output the required output information, because, as mentioned before, the convolutional layer/pooling layer 220 only extracts features and reduces the parameters brought by the input image. To generate the final output information (the required class information or other relevant information), the convolutional neural network 200 needs to use the neural network layer 230 to generate one output or a set of outputs of the required number of classes. Therefore, the neural network layer 230 may include multiple hidden layers (231, 232 to 23n as shown in FIG. 2) and the output layer 240; the parameters contained in the multiple hidden layers may be pre-trained based on relevant training data of a specific task type, where the task type can include, for example, image recognition, image classification, and image super-resolution reconstruction.
  • After the multiple hidden layers in the neural network layer 230, the last layer of the entire convolutional neural network 200 is the output layer 240. The output layer 240 has a loss function similar to the categorical cross-entropy, specifically used to calculate the prediction error.
  • Once the forward propagation of the entire convolutional neural network 200 (as shown in Figure 2, the propagation in the direction from 210 to 240) is completed, the back propagation (as shown in Figure 2, the propagation in the direction from 240 to 210) starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 200 and the error between the result output by the convolutional neural network 200 through the output layer and the ideal result.
  • a convolutional neural network (CNN) 200 may include an input layer 110 , a convolutional/pooling layer 120 (where the pooling layer is optional), and a neural network layer 130 .
  • Compared with FIG. 2, the multiple convolutional layers/pooling layers in the convolutional layer/pooling layer 120 in FIG. 3 are parallel, and the separately extracted features are all input to the neural network layer 130 for processing.
  • It should be noted that the convolutional neural networks shown in FIG. 2 and FIG. 3 are only examples of two possible convolutional neural networks for the image processing method of the embodiments of the present application; in specific applications, the convolutional neural network used in the image processing method may also exist in the form of other network models.
  • FIG. 4 is a hardware structure of a chip provided by an embodiment of the application, and the chip includes a neural network processor 50 .
  • the chip can be set in the execution device 110 as shown in FIG. 1 to complete the calculation work of the calculation module 111 .
  • the chip can also be set in the training device 120 as shown in FIG. 1 to complete the training work of the training device 120 and output the target model/rule 101 .
  • the algorithms of each layer in the convolutional neural network shown in Figures 2 and 3 can be implemented in the chip shown in Figure 4.
  • the neural network processor NPU 50 is mounted on the main central processing unit (CPU) (host CPU) as a coprocessor, and tasks are allocated by the main CPU.
  • the core part of the NPU is the operation circuit 503, and the controller 504 controls the operation circuit 503 to extract the data in the memory (weight memory or input memory) and perform operations.
  • the arithmetic circuit 503 includes multiple processing units (process engines, PEs). In some implementations, arithmetic circuit 503 is a two-dimensional systolic array. The arithmetic circuit 503 may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, arithmetic circuit 503 is a general-purpose matrix processor.
  • the operation circuit fetches the data corresponding to the matrix B from the weight memory 502 and buffers it on each PE in the operation circuit.
  • the arithmetic circuit fetches the data of the matrix A from the input memory 501 and performs the matrix operation on the matrix B, and stores the partial result or the final result of the matrix in the accumulator 508.
  • the vector calculation unit 507 can further process the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison and so on.
  • In some implementations, the vector computation unit 507 can be used for network computation of non-convolutional/non-FC layers in the neural network, such as pooling, batch normalization, and local response normalization.
  • vector computation unit 507 can store the processed output vectors to unified buffer 506 .
  • the vector calculation unit 507 may apply a nonlinear function to the output of the arithmetic circuit 503, such as a vector of accumulated values, to generate activation values.
  • vector computation unit 507 generates normalized values, merged values, or both.
  • the vector of processed outputs can be used as activation input to the arithmetic circuit 503, eg, for use in subsequent layers in a neural network.
  • Unified memory 506 is used to store input data and output data.
  • The direct memory access controller (DMAC) 505 transfers input data in the external memory to the input memory 501 and/or the unified memory 506, stores weight data in the external memory into the weight memory 502, and stores data in the unified memory 506 into the external memory.
  • a bus interface unit (BIU) 510 is used to realize the interaction between the main CPU, the DMAC and the instruction fetch memory 509 through the bus.
  • the instruction fetch memory (instruction fetch buffer) 509 connected with the controller 504 is used to store the instructions used by the controller 504;
  • the controller 504 is used for invoking the instructions cached in the memory 509 to control the working process of the operation accelerator.
  • The unified memory 506, the input memory 501, the weight memory 502, and the instruction fetch memory 509 are all on-chip memories; the external memory is memory outside the NPU, and can be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM), or another readable and writable memory.
  • each layer in the convolutional neural network shown in FIG. 2 and FIG. 3 may be performed by the operation circuit 503 or the vector calculation unit 507 .
  • The execution device 110 in FIG. 1 described above can execute the steps of the image processing method of the embodiments of the present application; the CNN models shown in FIG. 2 and FIG. 3 and the chip shown in FIG. 4 can also be used to execute those steps.
  • The neural network training method and the image processing method of the embodiments of the present application are described in detail below with reference to the accompanying drawings.
  • an embodiment of the present application provides a system architecture 300 .
  • the system architecture includes a local device 301, a local device 302, an execution device 210 and a data storage system 250, wherein the local device 301 and the local device 302 are connected with the execution device 210 through a communication network.
  • the execution device 210 may be implemented by one or more servers.
  • the execution device 210 may be used in conjunction with other computing devices, such as data storage, routers, load balancers and other devices.
  • the execution device 210 may be arranged on one physical site, or distributed across multiple physical sites.
  • the execution device 210 may use the data in the data storage system 250 or call the program code in the data storage system 250 to implement the image processing method in this embodiment of the present application.
  • The execution device 210 may perform the following process: determining the texture complexity information corresponding to each area image in the multiple area images in the image to be processed; determining, according to the texture complexity information corresponding to each area image, an image compression model corresponding to each area image, wherein different texture complexity information corresponds to different image compression models; and compressing each area image by using the image compression model corresponding to that area image.
  • By using, for each region image, the compression model corresponding to its texture complexity, the compression effect for the to-be-processed image can be improved.
  • a user may operate respective user devices (eg, local device 301 and local device 302 ) to interact with execution device 210 .
  • Each local device may represent any computing device, such as a personal computer, computer workstation, smartphone, tablet, smart camera, smart car or other type of cellular phone, media consumption device, wearable device, set-top box, gaming console, and the like.
  • Each user's local device can interact with the execution device 210 through any communication mechanism/standard communication network, which can be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof.
  • In one implementation, the local device 301 and the local device 302 obtain the relevant parameters of the target neural network from the execution device 210, deploy the target neural network on the local device 301 and the local device 302, and use the target neural network for image classification, image processing, or the like.
  • In another implementation, the target neural network can be directly deployed on the execution device 210; the execution device 210 obtains the images to be processed from the local device 301 and the local device 302, and performs classification or other types of image processing on the images to be processed according to the target neural network.
  • The above execution device 210 may also be a cloud device, in which case it may be deployed in the cloud; alternatively, it may be a terminal device, in which case it may be deployed on the user terminal side. This is not limited in this embodiment of the present application.
  • FIG. 6 shows a schematic structural diagram of an image processing system.
  • the image processing system 600 includes an encoder 610 and a quantization module 620 at the encoding end, and a decoder 630 at the decoding end.
  • the encoder 610 and the decoder 630 are neural networks.
  • the image processing system 600 may be applied in the scenario of transmitting and storing images.
  • the encoder 610 and the quantization module 620 at the encoding end may be set in a server in the cloud.
  • The image data undergoes an image encoding process in the cloud, resulting in compressed data in a compact representation.
  • Storing compressed data can reduce the storage space occupied by saving images.
  • the transmission of compressed data can reduce the occupation of transmission resources in the image transmission process and reduce the demand for bandwidth.
  • the decoder 630 on the decoding side may be provided in a terminal device serving as a client.
  • the decoding end performs a decoding operation on the compressed data to obtain a reconstructed image.
  • the terminal device can display the reconstructed image through a display.
  • the encoder 610 is used for extracting features of the image to be processed, so as to obtain image features.
  • The quantization module 620 is used for quantizing image features to obtain compressed data. Quantization, in the field of digital signal processing, is the process of approximating the continuous values of a signal (or a large number of possible discrete values) by a finite number (or fewer) of discrete values.
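As an illustration of quantization in this sense, a minimal uniform scalar quantizer; the step size is an assumed parameter, not one from the application:

```python
import numpy as np

def quantize(features, step=0.5):
    """Uniform scalar quantization: continuous feature values are mapped
    to a finite set of integer indices (the compact representation)."""
    return np.round(features / step).astype(np.int32)

def dequantize(indices, step=0.5):
    """Approximate inverse used at the decoding end."""
    return indices.astype(np.float64) * step

f = np.array([0.07, 1.24, -0.9, 2.51])
q = quantize(f)             # [0, 2, -2, 5]
print(dequantize(q))        # [0.0, 1.0, -1.0, 2.5]
```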
  • the cloud can transmit the compressed data to the client.
  • the decoder 630 is used to decompress the compressed data to obtain a reconstructed image.
  • the image processing system 600 uses the entire image as input, and performs nonlinear transformation on the image to reduce the correlation between codewords and improve the compression performance of the neural network.
  • Texture in computer graphics includes both the texture of an object's surface in the usual sense, that is, grooves that make the surface uneven, and color patterns on a smooth surface of the object. Texture complexity can be used to reflect how strongly pixel values vary in an image. Different categories of images differ significantly in texture details and other characteristics, and image content with different texture complexity differs considerably.
  • the image processing system 600 uses the same encoder to perform the same processing for images with different texture complexities, which hinders further improvement of the compression performance.
  • embodiments of the present application provide an image processing system to improve image compression performance.
  • FIG. 7 is a schematic structural diagram of an image processing system provided by an embodiment of the present application.
  • In the image processing system provided by the embodiment of the present application, the neural network corresponding to the image texture complexity is selected from multiple neural networks. This structure adaptively adjusts the selection of the neural network according to the image content, and compresses images according to their different texture characteristics, achieving a further improvement in image compression performance.
  • Image processing system 700 includes compression system 710 and decompression system 720 .
  • the compression system 710 includes a segmentation model 711 , a classification model 712 , and a compression model 713
  • the decompression system 720 includes a decompression module 721 and a fusion module 722 .
  • Compression system 710 and decompression system 720 may be located in the same or different devices.
  • the image to be processed is input into the compression system 710, and the compression system 710 is used for compressing the image to be processed.
  • the segmentation model 711 can segment the to-be-processed image to obtain multiple region images.
  • Area images can also be referred to as image blocks.
  • the sizes of the plurality of area images may be the same or different.
  • the image can be segmented according to the target size to obtain multiple region images with the same size.
  • the multiple region images do not overlap.
  • For example, the to-be-processed image can be divided into multiple 128×128 area images. Through this non-overlapping division of the image to be processed, a plurality of area images with non-repeating contents is formed.
  • the region images are input into the classification model 712.
  • the classification model 712 is used to calculate the texture complexity of the input image.
  • Specifically, the gradient of each pixel in the image can be calculated, which implements a first-order differential operation on the image and takes directionality into account.
  • The direction of the image gradient is the direction of the maximum rate of change of the image gray level, which can reflect the gray level changes at the edges of the image.
  • the gradient operator always points in the direction of the most drastic transformation.
  • the direction of the gradient operator is orthogonal to the edges in the image.
  • the size of the gradient operator represents the rate of change of the grayscale of the image.
  • the classification model 712 can calculate, for the input image, the difference values of each pixel's luminance in the horizontal and vertical directions. From these two difference values, the gradient magnitude of the pixel can be calculated. From the gradient magnitudes of the individual pixels, the average or median gradient magnitude of the image can be determined. The average or median gradient magnitude indicates how smooth the image is, reflecting its texture complexity; this computation is sketched below.
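A minimal sketch of this texture-complexity measure, assuming a grayscale numpy image; the use of forward differences and the zero boundary handling are assumptions, since the patent only specifies horizontal and vertical luminance differences:

```python
import numpy as np

def mean_gradient_magnitude(image: np.ndarray) -> float:
    """Average per-pixel gradient magnitude from first-order luminance differences."""
    img = image.astype(np.float64)
    dx = np.zeros_like(img)
    dy = np.zeros_like(img)
    dx[:, :-1] = img[:, 1:] - img[:, :-1]  # horizontal (x) difference
    dy[:-1, :] = img[1:, :] - img[:-1, :]  # vertical (y) difference
    grad = np.sqrt(dx ** 2 + dy ** 2)      # gradient magnitude per pixel
    return float(grad.mean())              # average reflects texture complexity
```

The median of `grad` could be used in place of the mean, as the text notes.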
  • the compression module 713 is configured to perform image feature extraction on the image according to the texture complexity of the image by using a compression model corresponding to the texture complexity, so as to realize the compression of the image.
  • the compression module 713 may include an AI model for extracting features, called a compression model (also called an image compression model), or the compression module 713 may invoke the compression model through an interface to extract image features.
  • the compression model can be a neural network model that is pre-trained.
  • the region image can be input into the compression model to obtain the image features of the region image.
  • the compression model may be, for example, CNN, RNN, or the like.
  • the compression module 713 may store the correspondence between texture complexity and compression model. Therefore, according to the texture complexity of the area image, the compression module 713 may determine the compression model corresponding to that texture complexity from a plurality of compression models and use it to process the area image; a threshold-based lookup of this kind is sketched below.
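A minimal sketch of such a correspondence, assuming a two-way split by a gradient-magnitude threshold; the threshold value and the model names are illustrative assumptions, and the patent allows two or more complexity classes:

```python
COMPLEXITY_THRESHOLD = 20.0  # assumed mean-gradient threshold, not from the patent

def select_compression_model(mean_gradient: float, models: dict):
    """Map an area image's texture complexity to its compression model."""
    key = "complex" if mean_gradient >= COMPLEXITY_THRESHOLD else "simple"
    return models[key]

# usage sketch:
# model = select_compression_model(mean_gradient_magnitude(tile),
#                                  {"simple": simple_codec, "complex": complex_codec})
```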
  • the image features of each regional image can be obtained.
  • Compression system 710 may also include quantization models and the like.
  • the quantization model can quantize the image features.
  • the decompression system 720 is used to decompress the processing result of the compression system 710 .
  • the image features processed by the compression system 710 are input into the decompression module 721 .
  • the image features processed by the compression system 710 may be the image features output by the compression module 713 .
  • if the compression system 710 further includes a quantization model, the image features processed by the compression system 710 may be quantized image features.
  • the decompression module 721 is used to decompress the image features to obtain a decompressed image.
  • the decompression module 721 may be configured to perform image decompression on the image features according to the texture complexity of the image using a decompression model corresponding to the texture complexity to obtain a decompressed image.
  • the decompression module 721 may receive indication information, where the indication information is used to indicate the decompression model corresponding to each image feature.
  • the decompression module 721 can decompress the image feature by using the decompression model indicated by the indication information.
  • the decompression module 721 may include an AI model for decompressing image features, called a decompression model or an image decompression model, or the decompression module 721 may invoke the decompression model through an interface to decompress the image.
  • the decompression model can be a neural network model that is pre-trained.
  • the image feature and the image texture complexity corresponding to the image feature can be input into the decompression model to obtain the decompressed image of the region.
  • the decompression model can be, for example, CNN, RNN, or the like.
  • the decompression module 721 may store the correspondence between texture complexity and decompression model. Therefore, according to the texture complexity of the regional image, the decompression module 721 can determine the decompression model corresponding to that texture complexity from a plurality of decompression models and use it to process the image features.
  • the restored images of each region can be obtained.
  • the fusion module 722 is used to fuse the restored regional images.
  • the fusion module 722 may include an AI model for image fusion, which is called a fusion model, or the fusion module 722 may call the fusion model through an interface to realize fusion of regional images.
  • the fusion model can be a neural network model that is pre-trained.
  • the restored image of each region can be input into the fusion model to obtain the fused image.
  • the fused image may also be referred to as a reconstructed image or a compressed reconstructed image.
  • the fusion model can be, for example, a CNN or the like.
  • fusing the regional images may consist of splicing (stitching) the regional images together.
  • fusing the regional images may also include adjusting the edge pixels of the regional images, so that the error between the reconstructed image and the image to be processed is smaller and the degree of distortion is reduced; a sketch of such stitching and seam adjustment follows.
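A minimal sketch of stitching restored tiles back into a full image and softening the seams, assuming equally sized tiles in row-major order; the simple seam averaging here is an illustrative stand-in for the learned fusion model described in the text:

```python
import numpy as np

def stitch_tiles(tiles, rows: int, cols: int) -> np.ndarray:
    """Reassemble equally sized tiles (row-major) and soften seam pixels."""
    th, tw = tiles[0].shape[:2]
    out = np.zeros((rows * th, cols * tw) + tiles[0].shape[2:], dtype=np.float64)
    for i, t in enumerate(tiles):
        r, c = divmod(i, cols)
        out[r * th:(r + 1) * th, c * tw:(c + 1) * tw] = t
    for c in range(tw, cols * tw, tw):            # vertical seams
        out[:, c - 1:c + 1] = out[:, c - 1:c + 1].mean(axis=1, keepdims=True)
    for r in range(th, rows * th, th):            # horizontal seams
        out[r - 1:r + 1, :] = out[r - 1:r + 1, :].mean(axis=0, keepdims=True)
    return out
```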
  • by calculating the texture complexity of the regional images, the image processing system 700 uses different compression and decompression models for regional images of different texture complexity, thereby improving image compression performance.
  • the image processing system 700 divides the image to be processed and calculates the texture complexity of the images in different regions, so that the foreground and background of the image to be processed can be processed using different compression models and decompression models, thereby improving image compression performance.
  • the decompression system 720 in the image processing system 700 reduces the degree of image distortion and improves the image compression performance by adjusting the edge pixels of the regional image during regional image fusion.
  • Each AI model used in the image processing system 700 may be obtained through end-to-end training; alternatively, the compression model and the decompression model may be trained first, and then the fusion model may be trained.
  • for the training method of the AI models used in the image processing system 700, reference may be made to the description of FIG. 8.
  • End-to-end training is a machine learning paradigm.
  • the learning process is not divided into artificial sub-problems; instead, the deep learning model learns the mapping from the raw data directly to the desired output.
  • FIG. 8 is a schematic structural diagram of a neural network model training method provided by an embodiment of the present application.
  • the training images to be processed can be obtained.
  • the complete training image to be processed can be divided to obtain multiple training area images.
  • the plurality of training area images do not overlap, and the plurality of training area images include all pixels in the training image.
  • each encoding/decoding (codec) model is used to compress the input training area images and to decompress the compression result, thereby obtaining multiple decompressed training area images.
  • the codec model includes a compression model and a decompression model.
  • the compression model is used to compress the image
  • the decompression model is used to decompress the processing result of the compression model.
  • each training area image is input into the codec model corresponding to it, and the codec model processes the input training area image to obtain the decompressed training area image corresponding to it.
  • the compression model compresses the training area image to obtain the training features of the training area image.
  • the decompression model decodes the training features of the training area image to obtain the decompressed training area image corresponding to the training area image.
  • the parameters of the codec model are adjusted according to the rate distortion obtained from the decompressed training area image and the training area image.
  • the encoding and decoding model can be trained by using the training images.
  • in each iteration, the training area images are processed with the parameter-adjusted codec model until the rate distortion gradually converges, yielding the trained codec model; a sketch of such a training loop follows.
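A minimal PyTorch-style sketch of this rate-distortion training loop for one codec model, assuming the model returns both the reconstruction and an estimated bit rate, and assuming MSE as the distortion measure with trade-off weight `lam` (all names and the exact loss form are illustrative assumptions):

```python
import torch

def train_codec(codec, tiles, lam: float = 0.01, steps: int = 1000, lr: float = 1e-4):
    """Adjust codec parameters until the rate-distortion loss converges."""
    opt = torch.optim.Adam(codec.parameters(), lr=lr)
    for _ in range(steps):
        for x in tiles:                        # training area images (tensors)
            x_hat, rate = codec(x)             # reconstruction and bit-rate estimate
            distortion = torch.mean((x - x_hat) ** 2)
            loss = rate + lam * distortion     # rate-distortion objective
            opt.zero_grad()
            loss.backward()
            opt.step()
    return codec
```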
  • a fusion model can be used to stitch and optimize the decompressed training area images corresponding to each training area image input to the fusion model to obtain a training restoration image.
  • This embodiment of the present application does not limit the sequence of the splicing process and the optimization process.
  • the optimization process includes adjustments to the edge regions of the decompressed training images.
  • the optimization process may also include adjustments to regions other than the border regions of the decompressed training image.
  • the adjustment in the optimization process is a color adjustment; for example, the brightness and chromaticity can be adjusted.
  • the parameters of the fusion model can be adjusted according to the image distortion degree of the training recovery image and the training image to be processed to complete the training of the fusion model.
  • the fusion model can stitch the decompressed training images corresponding to the images in each training area, and modify and adjust the pixels located in the edge area of the decompressed training images to obtain the training recovery image.
  • the training recovery image is the recovered training image to be processed.
  • the degree of image distortion can be determined from the training recovery image and the to-be-processed training image.
  • the bit rate can be determined according to the average amount of data per pixel in the compression result.
  • the parameters of the fusion model and the encoding/decoding model corresponding to each training area image are adjusted, so that the image distortion degree is reduced when the bit rate meets the preset conditions.
  • alternatively, the parameters of the fusion model and the codec model corresponding to each training area image may be adjusted according to the bit rate, so that the bit rate is reduced while the image distortion degree satisfies a preset condition.
  • the compression performance can be reflected as a whole through rate-distortion.
  • the parameters of the fusion model and the codec model corresponding to each training area image can be adjusted to minimize rate-distortion.
  • the codec model and fusion model after parameter adjustment are used for processing each time until the rate-distortion gradually converges, so as to obtain the codec model and fusion model after training.
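For reference, a common form of the rate-distortion objective in learned image compression, shown here as an assumption since the patent states the goal verbally rather than as a formula:

```latex
% R: bit rate, the average amount of data (bits) per pixel in the compression result
% D: image distortion, e.g. the error between the restored image and the original
% \lambda: weight trading off rate against distortion
L_{\mathrm{RD}} = R + \lambda \cdot D
```

Minimizing this objective reduces distortion at a given rate (or rate at a given distortion), matching the two adjustment strategies described above.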
  • the training of the AI models in the image processing system 700 can thus be implemented in an "end-to-end" manner.
  • the codec model may also be pre-trained first, and the fusion model may be trained by using the pre-trained codec model.
  • training images to be processed may be acquired.
  • the to-be-processed training image can be divided to obtain multiple training region images.
  • the codec model corresponding to the training area image can be used for compression, and the compression result can be decompressed to determine the bit rate and the image distortion degree, thereby determining the rate distortion.
  • the rate-distortion is optimized by adjusting the parameters of the codec model.
  • a large number of training images to be processed are processed to obtain a large number of training region images to cover images of each texture complexity.
  • for each texture complexity, the parameter-adjusted codec model is used in each iteration to process training area images of that texture complexity until the rate-distortion gradually converges, yielding each pre-trained codec model.
  • the pre-trained encoding and decoding model is used to process the to-be-processed training image, so as to obtain the decompressed training area image corresponding to each training area image in the to-be-processed training image.
  • the fusion model is used to fuse the decompressed training images corresponding to the images of each training area in a to-be-processed training image to obtain a training recovery image.
  • the fusion model is used to fuse the processing results of the pre-trained codec models for each to-be-processed training image, until the error between the training restored image and the to-be-processed training image before compression gradually converges; the trained fusion model is thereby obtained.
  • the parameters of the codec model may also be adjusted to obtain each AI model that has been trained in the image processing system 700 .
  • the encoding and decoding models corresponding to each texture complexity are trained by using the training area images of different texture complexities.
  • Using the trained codec model can realize differential processing of regional images with different texture complexity, thereby improving the overall image compression performance.
  • the bit rate and the image distortion degree can be calculated from the decompressed training area image and the training area image, so as to determine the rate distortion, and the parameters of the codec model can be adjusted according to the rate distortion, completing the training or pre-training of the codec model. After that, the trained or pre-trained codec model is used to train the fusion model.
  • the training restoration image may also be determined by processing the decompressed training area images with the fusion model. The bit rate is calculated from the decompressed training area images; the image distortion degree is calculated from the training restoration image and the training image; and the rate distortion is determined from the bit rate and the image distortion degree. After that, the parameters of the codec model and the fusion model can be adjusted according to the rate distortion to complete their training.
  • FIG. 9 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • the method 900 shown in FIG. 9 can be executed by an image processing device; the image processing device can be a mobile terminal, a computer, a server, or another device with sufficient computing power.
  • the method 900 may be applied in fields that need to compress images, such as image transmission and image storage.
  • the method includes S910 to S930, and these steps are described in detail below.
  • an image compression model corresponding to each regional image is determined according to the texture complexity information corresponding to each regional image.
  • the difference value of each pixel of the area image in two different directions can be calculated. Taking the horizontal direction of the area image as the x-axis and the vertical direction as the y-axis, a plane rectangular coordinate system is established.
  • the area image is stored in the form of a two-dimensional array, and the difference value of the pixel (i, j) in the x direction is: dx(i, j) = p(i+1, j) − p(i, j).
  • the difference value of the pixel (i, j) in the y direction is: dy(i, j) = p(i, j+1) − p(i, j).
  • p(i, j) may be the brightness of the pixel (i, j), or may be other parameters used to represent the color of the pixel.
  • the gradient of each pixel (i, j) in the area image, calculated from the difference values in these two directions, can be represented by the vector (dx(i, j), dy(i, j)).
  • the gradient magnitude Grad(i, j) of pixel (i, j) is: Grad(i, j) = √(dx(i, j)² + dy(i, j)²).
  • the average value of the gradient magnitude of the region image can then be obtained: G = (1 / (W × H)) × Σ_{i=1..W} Σ_{j=1..H} Grad(i, j).
  • W represents the number of pixels in the area image in the x direction
  • H represents the number of pixels in the area image in the y direction.
  • the gradient mean value G of the area image can be used to represent and evaluate the texture complexity of the area image.
  • the gradient magnitude (which can also be referred to as the gradient length) of each pixel (i, j) can also be expressed as: Grad(i, j) = |dx(i, j)| + |dy(i, j)|.
  • the image compression model corresponding to the first texture complexity information of each region image may be determined according to the corresponding relationship between the texture complexity and the compression model.
  • the correspondence between the texture complexity information and the image compression model may include two or more types of texture complexity information, and a compression model corresponding to each texture complexity information.
  • when the average gradient magnitude of the image is greater than or equal to a preset value, the texture complexity of the image can be determined to be complex; when the average gradient magnitude is smaller than the preset value, the texture complexity of the image can be determined to be simple. Therefore, the compression model corresponding to the image can be determined according to the average gradient magnitude of the image.
  • the correspondence between texture complexity and compression model is the same correspondence that was used when training the compression models used in the method 900.
  • each area image is compressed using an image compression model corresponding to the each area image.
  • image compression models corresponding to the image texture complexities can be used to compress each region image in the to-be-processed image, thereby improving the image compression effect of the to-be-processed image.
  • the image to be processed may be a complete image, for example, a photo captured by a camera, or a frame of image in a video.
  • the multiple region images may include all pixels in the to-be-processed image, thereby reducing image distortion and improving compression performance.
  • the texture complexity of different regions of the image may not be the same. For example, in background areas such as sky and beach, the texture complexity is low; in regions of interest or foreground regions containing objects such as people, the image complexity is high.
  • the regions of different texture complexity in the image to be processed can thus each receive compression processing adapted to their texture complexity, improving the overall compression effect of the image to be processed.
  • the embodiments of the present application provide a more flexible image processing manner.
  • the image to be processed is divided into multiple area images, and each area image is compressed to obtain compressed data.
  • the compressed data corresponding to each regional image may be decompressed separately.
  • Each area image is compressed using an image compression model corresponding to the area image to obtain the image features of the area image.
  • for decompression, the decompression model corresponding to the image compression model used during compression should be used, to obtain a region decompressed image; the region decompressed image can also be understood as a restored regional image obtained by decompression. A sketch of this pairing follows.
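A minimal sketch of the per-area decompression flow, assuming each compressed result carries an identifier of the model used at compression time (corresponding to the indication information mentioned earlier); the names are illustrative assumptions:

```python
def decompress_areas(compressed, decompressors: dict):
    """compressed: list of (model_key, features) pairs, one per area image."""
    restored = []
    for model_key, features in compressed:
        # use the decompression model paired with the compression model
        restored.append(decompressors[model_key](features))
    return restored
```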
  • the region decompressed images corresponding to each region image in the to-be-processed image are spliced, so that the to-be-processed image can be restored.
  • optimization processing can be performed on the decompressed images of each region.
  • the optimization process may include adjusting the border regions of the decompressed image for one or more regions.
  • since the second images (region decompressed images) corresponding to the individual region images may be produced by different decompression models, discontinuous lines or color differences may appear at the edges of two adjacent second images after splicing.
  • the pixels of the edge region of one or more second images may be adjusted.
  • the optimization process may also include adjustments to regions other than the edge regions of the region-decompressed image.
  • Optimization processing may be performed before or after splicing, which is not limited in this embodiment of the present application.
  • the second images may be stitched, and the pixels in their edge regions optimized, using the fusion model.
  • the pixels of one or more edge regions of the second image are adjusted, which can further reduce the degree of image distortion and improve the image compression effect.
  • FIG. 10 is a schematic diagram of the compression performance improvement of the image processing method provided by the embodiment of the present application.
  • Using the image processing method 900 to compress and decompress images can achieve better image compression performance.
  • the image to be processed is divided into a plurality of area images of equal size.
  • the plurality of area images include all pixels of the image to be processed.
  • the compression model and the decompression model corresponding to the image can be determined according to the average value of the gradient size of the image, the image is compressed, and the compression result is decompressed.
  • the image processing method provided by the embodiment of the present application adopts a multi-model processing approach, which effectively improves the PSNR at the same bit rate, meaning lower image distortion.
  • the image processing system provided by the embodiment of the present application, the AI model training method required by the image processing system, and the image processing method are described above with reference to FIGS. 1 to 10 .
  • FIG. 11 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
  • the image processing apparatus 2000 includes a storage module 2010 and a processing module 2020 .
  • the storage module 2010 is used to store program instructions.
  • the processing module 2020 is configured to:
  • determine the texture complexity information corresponding to each of multiple area images in the image to be processed, and determine, according to the texture complexity information corresponding to each area image, the image compression model corresponding to each area image, wherein different texture complexity information corresponds to different image compression models;
  • Each area image is compressed by using an image compression model corresponding to each area image.
  • the processing module 2020 is further configured to use the image decompression model corresponding to the image compression model used to compress each area image to decompress the image features obtained after each area image is compressed, so as to obtain the region decompressed image corresponding to each area image.
  • the processing module 2020 is further configured to perform stitching processing and optimization processing on the decompressed images of the multiple regions to obtain a restored image to be processed, and the optimization processing includes performing pixel adjustment on the edges of the decompressed images of the multiple regions.
  • the processing module 2020 is further configured to calculate the gradient size of each pixel in each regional image.
  • the processing module 2020 is further configured to, according to the gradient size of each pixel, determine the texture complexity information of each regional image.
  • the processing module 2020 is further configured to divide the image to be processed into the multiple area images; the multiple area images do not overlap, and the multiple area images include all pixels of the image to be processed.
  • FIG. 12 is a schematic structural diagram of a neural network training apparatus provided by an embodiment of the present application.
  • the neural network training apparatus 3000 includes a storage module 3010 and a processing module 3020 .
  • the storage module 3010 is used to store program instructions.
  • the processing module 3020 is configured to:
  • the codec model corresponding to each training area image is determined, wherein different texture complexity information corresponds to different codec models, and each codec model is used to compress the input training area images and decompress the compression results to obtain a plurality of decompressed training area images;
  • the parameters of the codec model are adjusted according to the rate distortion obtained from the decompressed training area image and the training area image.
  • the processing module 3020 is further configured to perform stitching processing and optimization processing on the multiple decompressed training area images by using a fusion model, and the optimization processing includes performing pixel adjustment on the edges of the multiple decompressed training area images.
  • the processing module 3020 is further configured to adjust the parameters of the fusion model according to the degree of image distortion between the training restored image and the training image.
  • the processing module 3020 is further configured to adjust parameters of the encoding and decoding model according to the degree of image distortion between the training restoration image and the training image.
  • the processing module 3020 is further configured to calculate the gradient size of each pixel in each training area image.
  • the processing module 3020 is further configured to, according to the gradient size of each pixel, determine the texture complexity information of each training area image.
  • the processing module 3020 is further configured to divide the training image into the multiple training area images; the multiple training area images do not overlap, and the multiple training area images include all pixels of the training image.
  • FIG. 13 is a schematic diagram of a hardware structure of an image processing apparatus according to an embodiment of the present application.
  • the image processing apparatus 4000 shown in FIG. 13 includes a memory 4001 , a processor 4002 , a communication interface 4003 , and a bus 4004 .
  • the memory 4001 , the processor 4002 , and the communication interface 4003 are connected to each other through the bus 4004 for communication.
  • the memory 4001 may be ROM, static storage device and RAM.
  • the memory 4001 may store a program. When the program stored in the memory 4001 is executed by the processor 4002, the processor 4002 and the communication interface 4003 are used to execute each step of the image processing method of the embodiment of the present application.
  • the processor 4002 may be a general-purpose CPU, a microprocessor, an ASIC, a GPU, or one or more integrated circuits, and is used to execute related programs so as to realize the functions required of the units in the image processing apparatus of the embodiments of the present application, or to execute the image processing method of the method embodiments of the present application.
  • the processor 4002 may also be an integrated circuit chip with signal processing capability, for example, the chip shown in FIG. 4 .
  • each step of the image processing method in the embodiment of the present application may be completed by an integrated logic circuit of hardware in the processor 4002 or an instruction in the form of software.
  • the above-mentioned processor 4002 may also be a general-purpose processor, DSP, ASIC, FPGA or other programmable logic devices, discrete gate or transistor logic devices, and discrete hardware components.
  • the methods, steps, and logic block diagrams disclosed in the embodiments of this application can be implemented or executed.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory 4001; the processor 4002 reads the information in the memory 4001 and, in combination with its hardware, completes the functions required of the units included in the image processing apparatus of the embodiments of the present application, or performs the image processing method of the method embodiments of the present application.
  • the communication interface 4003 implements communication between the device 4000 and other devices or a communication network using a transceiver device such as, but not limited to, a transceiver.
  • the image to be processed can be acquired through the communication interface 4003 .
  • Bus 4004 may include a pathway for communicating information between various components of device 4000 (eg, memory 4001, processor 4002, communication interface 4003).
  • FIG. 14 is a schematic diagram of a hardware structure of a neural network training apparatus according to an embodiment of the present application. Similar to the above-mentioned apparatus 3000 and apparatus 4000 , the neural network training apparatus 5000 shown in FIG. 14 includes a memory 5001 , a processor 5002 , a communication interface 5003 and a bus 5004 . The memory 5001 , the processor 5002 , and the communication interface 5003 are connected to each other through the bus 5004 for communication.
  • the neural network can be trained by the neural network training apparatus 5000 shown in FIG. 14 , and the neural network obtained by training can be used to execute the image processing method of the embodiment of the present application.
  • the apparatus shown in FIG. 14 can obtain training data and the neural network to be trained from the outside through the communication interface 5003, and then the processor can train the neural network to be trained according to the training data.
  • although the apparatus 4000 and the apparatus 5000 only show a memory, a processor, and a communication interface, in a specific implementation process, those skilled in the art should understand that the apparatus 4000 and the apparatus 5000 may also include other devices necessary for normal operation. Meanwhile, according to specific needs, those skilled in the art should understand that the apparatus 4000 and the apparatus 5000 may further include hardware devices that implement other additional functions. In addition, those skilled in the art should understand that the apparatus 4000 and the apparatus 5000 may include only the devices necessary to implement the embodiments of the present application, and need not include all the devices shown in FIG. 13 and FIG. 14.
  • the processor in the embodiment of the present application may be a central processing unit (central processing unit, CPU), and the processor may also be other general-purpose processors, digital signal processors (digital signal processors, DSP), application-specific integrated circuits (application specific integrated circuit, ASIC), off-the-shelf programmable gate array (field programmable gate array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the memory in the embodiments of the present application may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or flash memory.
  • Volatile memory may be random access memory (RAM), which acts as an external cache.
  • By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct rambus random access memory (direct rambus RAM, DR RAM).
  • the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • the above-described embodiments may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, all or part of the processes or functions described in the embodiments of the present application are generated.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired manner or in a wireless manner (e.g., infrared, wireless, microwave, etc.).
  • the computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server, a data center, or the like that contains one or more sets of available media.
  • the usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVDs), or semiconductor media.
  • the semiconductor medium may be a solid state drive.
  • "At least one" means one or more, and "a plurality of" means two or more.
  • "At least one of the following items" or similar expressions refer to any combination of these items, including any combination of a single item or plural items.
  • for example, at least one of a, b, or c can represent: a, b, c, a and b, a and c, b and c, or a, b and c, where each of a, b, and c can be singular or plural.
  • the size of the sequence numbers of the above-mentioned processes does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • in essence, the technical solution of the present application, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, or other media that can store program code.

Abstract

An image processing method and device, and a neutral network training method and device. The image processing method comprises: determining texture complexity information corresponding to each of a plurality of regional images in an image to be processed (S910); determining, according to the texture complexity information corresponding to each regional image, an image compression model corresponding to each regional image (S920), different texture complexity information corresponding to different image compression models; and using the image compression model corresponding to each regional image to compress each regional image (S930). According to the method, the compression model corresponding to the texture complexity of each of the regional images having different texture complexities in the image to be processed is used to compress the regional image, which improves the overall compression effect of the image to be processed.

Description

Image processing method and device, and neural network training method and device

This application claims priority to the Chinese patent application filed with the Chinese Patent Office on July 30, 2020, with application number 202010754067.X and entitled "Image processing method and device, and neural network training method and device", the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to the field of artificial intelligence, and more particularly, to an image processing method and apparatus, and a neural network training method and apparatus.

Background

Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning, and decision-making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theory.

Image compression reduces redundant information in image data; it is therefore important for improving the storage and transmission efficiency of images. Traditional image compression methods such as the joint photographic experts group (JPEG) method achieve good compression in the medium and high bit-rate range; in the low bit-rate range, their compression effect is less than ideal.

Images can be compressed by a neural network. This approach mainly uses a neural network and corresponding nonlinear transformations to extract image features, thereby achieving compression. Compared with traditional image compression methods, it avoids complicated parameter design and module design. During decompression, a neural network can be used for decoding to reconstruct the image. How to improve the compression performance of neural networks on images has become a technical problem that urgently needs to be solved.
SUMMARY OF THE INVENTION

The present application provides an image processing method and device, and a neural network training method and device, which can improve the image compression effect of a neural network.

In a first aspect, an image processing method is provided, the method comprising: determining texture complexity information corresponding to each of a plurality of area images in an image to be processed; determining, according to the texture complexity information corresponding to each area image, an image compression model corresponding to each area image, wherein different texture complexity information corresponds to different image compression models; and compressing each area image by using the image compression model corresponding to that area image.

In different regions of a complete image, the texture complexity may differ. For example, in background areas such as sky and beach, the texture complexity of the image is low; in regions of interest or foreground regions containing objects such as people, the image complexity is high.

By dividing the image to be processed into multiple area images and compressing each area image with the compression model corresponding to its texture complexity, the regions of different texture complexity in the image to be processed receive compression processing adapted to their texture complexity, improving the overall compression effect of the image to be processed.

With reference to the first aspect, in some possible implementations, the method further includes: using an image decompression model corresponding to the image compression model used to compress each area image, decompressing the image features obtained after each area image is compressed, to obtain a region decompressed image corresponding to each area image; and performing splicing processing and optimization processing on the multiple region decompressed images to obtain a restored image to be processed, the optimization processing including adjusting the edges of the multiple region decompressed images.

Since a complete image is divided, and the region decompressed images produced by the decompression models that handle different texture complexities are spliced together, discontinuous lines or color differences may appear between two adjacent region decompressed images after splicing. By adjusting the pixels in the edge regions of one or more region decompressed images, the image distortion between the complete image after compression, decompression, and splicing and the image before processing can be made smaller.

With reference to the first aspect, in some possible implementations, determining the texture complexity information corresponding to each of the multiple area images in the image to be processed includes: calculating the gradient magnitude of each pixel in each area image; and determining the texture complexity information of each area image according to the gradient magnitude of each pixel.

The gradient magnitude of a pixel may be determined according to the pixel's brightness or another representation of its color.

The texture complexity of an area image may be represented by the median or average of the gradient magnitudes of the pixels in that area image.

With reference to the first aspect, in some possible implementations, the method further includes: dividing the image to be processed into the multiple area images, where the multiple area images do not overlap and include all pixels in the image to be processed.

When the multiple area images include all pixels in the image to be processed and do not overlap, the bit rate can be reduced.

In a second aspect, a neural network training method is provided, the method comprising: determining texture complexity information corresponding to each of a plurality of training area images in a training image; determining, according to the texture complexity information corresponding to each training area image, a codec model corresponding to each training area image, wherein different texture complexity information corresponds to different codec models, and each codec model is used to compress the input training area image and decompress the compression result to obtain multiple decompressed training area images; and adjusting the parameters of the codec model according to rate distortion, the rate distortion being obtained from the decompressed training area images and the training area images.

By training, for the training area images of each texture complexity, the codec corresponding to that texture complexity, each codec achieves better compression performance on images of its corresponding texture complexity.

Within a complete image, the texture complexity may vary. For example, in background areas such as sky and beach, the texture complexity of the image is low; in regions of interest or foreground regions containing objects such as people, the image complexity is high. Dividing a complete image into multiple regions and training each codec on the images of the corresponding regions makes the training data better match the texture complexity that the codec handles, improving each codec's compression performance on images of its corresponding texture complexity.

With reference to the second aspect, in some possible implementations, the multiple decompressed training area images are spliced and optimized by a fusion model to obtain a training restoration image, the optimization processing including adjusting the edges of at least one decompressed training area image; and the parameters of the fusion model are adjusted according to the image distortion between the training restoration image and the training image.

By adopting a fusion model, the images processed by the codecs exhibit less distortion after splicing.

The parameters of the codec model may also be adjusted according to the image distortion between the training restoration image and the training image.

Optionally, the neural network models required in the image processing process can be trained in an "end-to-end" manner.

The rate distortion is calculated from the image distortion between the training restoration image and the to-be-processed training image, and the parameters of the codec model and the fusion model are adjusted according to the rate distortion to complete their training; this enables "end-to-end" training of the neural network models required in the image processing process and reduces the complexity of neural network training.

It should be understood that the rate distortion can be determined from the image distortion and the bit rate. The bit rate represents the degree of compression of the image and can be determined from the compression result of the codec model.

It should be understood that the fusion model may also adjust regions other than the edges of the decompressed training area images.

With reference to the second aspect, in some possible implementations, determining the texture complexity information corresponding to each of the multiple training area images in the training image includes: calculating the gradient magnitude of each pixel in each training area image; and determining the texture complexity information of each training area image according to the gradient magnitude of each pixel.

With reference to the second aspect, in some possible implementations, the method further includes: dividing the training image into the multiple training area images, where the multiple training area images do not overlap and include all pixels in the training image.
第三方面,提供一种电子设备,图像处理装置,其特征在于,包括存储模块和处理模块;所述存储模块用于存储程序指令;当所述程序指令在所述处理器中执行时,所述处理模块用于:确定待处理图像中的多个区域图像中每个区域图像对应的纹理复杂度信息;根据所述每个区域图像对应的纹理复杂度信息,确定所述每个区域图像对应的图像压缩模型,其中,不同的纹理复杂度信息对应不同的图像压缩模型;利用所述每个区域图像对应的图像压缩模型对所述每个区域图像进行压缩。In a third aspect, an electronic device and an image processing apparatus are provided, which are characterized by comprising a storage module and a processing module; the storage module is used to store program instructions; when the program instructions are executed in the processor, the The processing module is used to: determine the texture complexity information corresponding to each area image in the multiple area images in the image to be processed; according to the texture complexity information corresponding to each area image, determine the corresponding The image compression model, wherein, different texture complexity information corresponds to different image compression models; the image compression model corresponding to each area image is used to compress each area image.
结合第三方面,在一些可能的实现方式中,所述处理模块还用于:利用与压缩所述每个区域图像的图像压缩模型对应的图像解压模型,对所述每个区域图像压缩后得到的图像特征进行解压缩,得到所述每个区域图像对应的区域解压图像;对所述多个区域解压图像进行拼接处理和优化处理,得到恢复后的待处理图像,所述优化处理包括对所述多个区域解压图像的边沿进行像素调整。With reference to the third aspect, in some possible implementations, the processing module is further configured to: use an image decompression model corresponding to the image compression model for compressing the image of each area, compress the image of each area to obtain Decompress the image features of the multiple regions to obtain a region decompressed image corresponding to each region image; perform splicing processing and optimization processing on the multiple region decompressed images to obtain the restored image to be processed, and the optimization processing includes Pixel adjustment is performed on the edges of the decompressed image in the multiple regions.
结合第三方面,在一些可能的实现方式中,所述处理模块还用于:计算每个区域图像中每个像素的梯度大小;根据所述每个像素的梯度大小,确定每个区域图像的纹理复杂度信息。With reference to the third aspect, in some possible implementations, the processing module is further configured to: calculate the gradient size of each pixel in each area image; determine the gradient size of each area image according to the gradient size of each pixel Texture complexity information.
结合第三方面,在一些可能的实现方式中,所述处理模块还用于:将所述待处理图像划分为所述多个区域图像,所述多个区域图像不重叠,且所述多个区域图像包括所述待处理图像中的全部像素。With reference to the third aspect, in some possible implementations, the processing module is further configured to: divide the image to be processed into the multiple area images, the multiple area images do not overlap, and the multiple area images The area image includes all the pixels in the image to be processed.
第四方面,提供一种神经网络训练装置,包括存储模块和处理模块;所述存储模块用于存储程序指令,当所述程序指令在所述处理器中执行时,所述处理模块用于:确定训练 图像中的多个训练区域图像中每个训练区域图像对应的纹理复杂度信息;根据所述每个训练区域图像对应的训练纹理复杂度信息,确定所述每个训练区域图像对应的编解码模型,其中,不同的纹理复杂度信息对应不同的编解码模型,每个所述编解码模型用于对输入的所述训练区域图像进行压缩,并对压缩结果进行解压得到多个解压训练区域图像;根据率失真调整所述编解码模型的参数,所述率失真根据所述解压训练区域图像和所述训练区域图像得到。In a fourth aspect, a neural network training device is provided, comprising a storage module and a processing module; the storage module is used for storing program instructions, and when the program instructions are executed in the processor, the processing module is used for: Determine the texture complexity information corresponding to each training area image in the multiple training area images in the training image; according to the training texture complexity information corresponding to each training area image, determine the encoding corresponding to each training area image. A decoding model, wherein different texture complexity information corresponds to different encoding and decoding models, and each encoding and decoding model is used to compress the input image of the training area, and decompress the compression result to obtain multiple decompressed training areas image; adjust the parameters of the codec model according to the rate-distortion, and the rate-distortion is obtained from the decompressed training area image and the training area image.
结合第四方面,在一些可能的实现方式中,所述处理模块还用于:通过融合模型对所述多个解压训练区域图像进行拼接处理和优化处理,得到训练恢复图像,所述优化处理包括对至少一个所述解压训练区域图像的边沿进行像素调整;根据所述训练恢复图像与所述训练图像的图像失真度,调整融合模型的参数。With reference to the fourth aspect, in some possible implementations, the processing module is further configured to: perform splicing processing and optimization processing on the plurality of decompressed training area images through a fusion model to obtain a training restoration image, and the optimization processing includes: Pixel adjustment is performed on the edge of at least one image of the decompressed training area; and the parameters of the fusion model are adjusted according to the image distortion degree of the training restoration image and the training image.
The parameters of the codec models may also be adjusted according to the image distortion between the training restored image and the training image.
With reference to the fourth aspect, in some possible implementations, the processing module is further configured to: calculate the gradient magnitude of each pixel in each training area image; and determine the texture complexity information of each training area image according to the gradient magnitude of each pixel.
With reference to the fourth aspect, in some possible implementations, the processing module is further configured to: divide the training image into the multiple training area images, where the multiple training area images do not overlap and together include all pixels of the training image.
In a fifth aspect, an electronic device is provided, including a memory and a processor. The memory is configured to store program instructions. When the program instructions are executed in the processor, the processor is configured to perform the method in the first aspect or the second aspect.
The processor in the fifth aspect may be a central processing unit (CPU), or a combination of a CPU and a neural network computing processor. The neural network computing processor here may include a graphics processing unit (GPU), a neural-network processing unit (NPU), a tensor processing unit (TPU), and the like. The TPU is an artificial intelligence accelerator application-specific integrated circuit fully customized by Google for machine learning.
In a sixth aspect, a computer-readable medium is provided. The computer-readable medium stores program code to be executed by a device, and the program code includes instructions for performing the method in any one of the implementations of the first aspect or the second aspect.

In a seventh aspect, a computer program product containing instructions is provided. When the computer program product runs on a computer, the computer is caused to perform the method in any one of the implementations of the first aspect or the second aspect.

In an eighth aspect, a chip is provided. The chip includes a processor and a data interface. The processor reads, through the data interface, instructions stored in a memory, to perform the method in any one of the implementations of the first aspect or the second aspect.

Optionally, as an implementation, the chip may further include a memory, where instructions are stored in the memory. The processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to perform the method in any one of the implementations of the first aspect or the second aspect.

The chip may specifically be a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
Description of Drawings
FIG. 1 is a schematic structural diagram of a system architecture according to an embodiment of this application.

FIG. 2 is a schematic structural diagram of a convolutional neural network according to an embodiment of this application.

FIG. 3 is a schematic structural diagram of another convolutional neural network according to an embodiment of this application.

FIG. 4 is a schematic diagram of a hardware structure of a chip according to an embodiment of this application.

FIG. 5 is a schematic diagram of a system architecture according to an embodiment of this application.

FIG. 6 is a schematic structural diagram of an image processing system.

FIG. 7 is a schematic structural diagram of an image processing system according to an embodiment of this application.

FIG. 8 is a schematic diagram of a neural network training method according to an embodiment of this application.

FIG. 9 is a schematic flowchart of an image processing method according to an embodiment of this application.

FIG. 10 is a schematic diagram of the compression performance of the image processing method according to an embodiment of this application.

FIG. 11 is a schematic structural diagram of an image processing apparatus according to an embodiment of this application.

FIG. 12 is a schematic structural diagram of a neural network training apparatus according to an embodiment of this application.

FIG. 13 is a schematic structural diagram of an image processing apparatus according to an embodiment of this application.

FIG. 14 is a schematic structural diagram of a neural network training apparatus according to an embodiment of this application.
Description of Embodiments
The technical solutions in the embodiments of this application are described below with reference to the accompanying drawings in the embodiments of this application. Clearly, the described embodiments are only some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort shall fall within the protection scope of this application.

Because the embodiments of this application involve extensive application of neural networks, for ease of understanding, related terms and concepts of neural networks that may be involved in the embodiments of this application are first introduced below.
(1) Neural network
A neural network may be composed of neural units. A neural unit may be an operation unit that takes x_s and an intercept 1 as inputs, and the output of the operation unit may be:

$$h_{W,b}(x)=f\left(W^{T}x\right)=f\left(\sum_{s=1}^{n}W_{s}x_{s}+b\right)$$

where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, which introduces nonlinear characteristics into the neural network to convert the input signal of the neural unit into an output signal. The output signal of the activation function may serve as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by joining many such single neural units together, that is, the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to the local receptive field of the previous layer to extract features of the local receptive field, and the local receptive field may be an area composed of several neural units.
(2) Deep neural network
A deep neural network (DNN), also known as a multi-layer neural network, may be understood as a neural network with multiple hidden layers. Dividing a DNN according to the positions of its layers, the neural network inside the DNN may be divided into three categories of layers: the input layer, the hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. The layers are fully connected, that is, any neuron in the i-th layer is necessarily connected to any neuron in the (i+1)-th layer.
Although a DNN looks complicated, the work of each layer is actually not complicated. In brief, each layer computes the following linear relationship expression:

$$\vec{y}=\alpha\left(W\vec{x}+\vec{b}\right)$$

where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, $W$ is the weight matrix (also called coefficients), and $\alpha(\cdot)$ is the activation function. Each layer simply performs this operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, there are also many coefficients $W$ and offset vectors $\vec{b}$. These parameters are defined in the DNN as follows, taking the coefficient $W$ as an example. Assume that in a three-layer DNN, the linear coefficient from the fourth neuron in the second layer to the second neuron in the third layer is defined as $W_{24}^{3}$: the superscript 3 represents the layer in which the coefficient $W$ is located, and the subscripts correspond to the output index 2 in the third layer and the input index 4 in the second layer.

In summary, the coefficient from the k-th neuron in layer L-1 to the j-th neuron in layer L is defined as $W_{jk}^{L}$.

It should be noted that the input layer has no $W$ parameters. In a deep neural network, more hidden layers enable the network to better depict complex situations in the real world. In theory, a model with more parameters has higher complexity and a larger "capacity", which means that it can complete more complex learning tasks. Training a deep neural network is the process of learning the weight matrices, and its ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors $W$ of many layers).
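To make the layer-wise relationship above concrete, the following is a minimal sketch (an illustration, not part of the patent text) of a forward pass through a DNN in which every layer computes y = α(Wx + b); the layer sizes and the choice of sigmoid as α(·) are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dnn_forward(x, weights, biases):
    """Apply y = sigmoid(W @ x + b) layer by layer."""
    for W, b in zip(weights, biases):
        x = sigmoid(W @ x + b)
    return x

# Hypothetical three-layer DNN: 4 inputs -> 5 hidden units -> 2 outputs.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((5, 4)), rng.standard_normal((2, 5))]
biases = [np.zeros(5), np.zeros(2)]
y = dnn_forward(rng.standard_normal(4), weights, biases)
```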
(3) Convolutional neural network
A convolutional neural network (CNN) is a deep neural network with a convolutional structure. A convolutional neural network contains a feature extractor composed of convolutional layers and subsampling layers, and the feature extractor may be regarded as a filter. A convolutional layer is a neuron layer in a convolutional neural network that performs convolution processing on the input signal. In a convolutional layer of a convolutional neural network, one neuron may be connected to only some of the neurons in adjacent layers. A convolutional layer usually contains several feature planes, and each feature plane may be composed of neural units arranged in a rectangle. Neural units in the same feature plane share weights, and the shared weights are the convolution kernel. Sharing weights may be understood as meaning that the way image information is extracted is independent of position. A convolution kernel may be initialized in the form of a matrix of random size, and reasonable weights may be obtained for the convolution kernel through learning during the training of the convolutional neural network. In addition, a direct benefit of sharing weights is reducing the connections between the layers of the convolutional neural network while also reducing the risk of overfitting.
(4) Recurrent neural network

Recurrent neural networks (RNN) are used to process sequence data. In a traditional neural network model, from the input layer to the hidden layer to the output layer, the layers are fully connected, while the nodes within each layer are unconnected. Although this ordinary neural network solves many difficult problems, it is still powerless against many others. For example, to predict the next word of a sentence, one generally needs to use the preceding words, because the words in a sentence are not independent of one another. The reason why an RNN is called a recurrent neural network is that the current output of a sequence is also related to the previous outputs. Concretely, the network memorizes the previous information and applies it to the calculation of the current output, that is, the nodes within the hidden layer are no longer unconnected but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment. In theory, an RNN can process sequence data of any length. The training of an RNN is the same as the training of a traditional CNN or DNN.
Why is a recurrent neural network needed when there is already a convolutional neural network? The reason is simple: a convolutional neural network rests on the premise that elements are independent of each other and that inputs and outputs are also independent, such as cats and dogs. But in the real world, many elements are interconnected, such as stock prices changing over time, or a person saying: "I like traveling, and my favorite place is Yunnan; I must go there when I have the chance." If a blank is to be filled here, humans all know that it should be "Yunnan", because humans infer from the context. But how can a machine be made to do this? The RNN emerged in response. RNNs are designed to give machines the ability to memorize as humans do. Therefore, the output of an RNN needs to rely on the current input information and historical memory information.
(5) Loss function
In the process of training a deep neural network, because it is hoped that the output of the deep neural network is as close as possible to the value that is actually to be predicted, the predicted value of the current network can be compared with the actually desired target value, and the weight vector of each layer of the neural network is then updated according to the difference between the two (of course, there is usually an initialization process before the first update, that is, parameters are preconfigured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to make it predict lower, and the adjustment continues until the deep neural network can predict the actually desired target value or a value very close to it. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value". This is the loss function or objective function, an important equation used to measure the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the deep neural network becomes a process of reducing this loss as much as possible.
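As an illustration of the role of a loss function, the sketch below (not the patent's training procedure) uses mean squared error as the loss and takes a single gradient step to reduce it; the toy model, data, and learning rate are all assumptions:

```python
import numpy as np

def mse_loss(pred, target):
    """Mean squared error between predictions and targets."""
    return np.mean((pred - target) ** 2)

# One update step of a linear model pred = w * x trained to match a target.
x, target, w, lr = 2.0, 10.0, 1.0, 0.05
pred = w * x
loss = mse_loss(np.array([pred]), np.array([target]))
grad_w = 2 * (pred - target) * x   # d(loss)/dw for the MSE loss
w -= lr * grad_w                   # adjust the weight to lower the loss
```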
(6) Back propagation algorithm
A neural network may use the error back propagation (BP) algorithm to correct the values of the parameters in the initial neural network model during training, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, forward propagation of the input signal to the output produces an error loss, and the parameters in the initial neural network model are updated by back-propagating the error loss information, so that the error loss converges. The back propagation algorithm is a back propagation movement dominated by the error loss, and aims to obtain the optimal parameters of the neural network model, such as the weight matrices.
(7) Pixel value
The pixel value of an image may be a red-green-blue (RGB) color value, and the pixel value may be a long integer representing a color. For example, the pixel value is 256×Red+100×Green+76×Blue, where Blue represents the blue component, Green represents the green component, and Red represents the red component. In each color component, a smaller value indicates lower brightness, and a larger value indicates higher brightness. For a grayscale image, the pixel value may be a grayscale value.
(8) Image compression
Image compression is a technology for representing the original pixel matrix with fewer bits, lossily or losslessly, and is also called image coding. Through image compression, a transformation is performed on the image content, which can reduce the amount of data required to represent a digital image and thereby reduce the storage space occupied by the image.
Image data can be compressed because there is redundancy in the data. The redundancy of image data is mainly manifested as: spatial redundancy caused by the correlation between adjacent pixels in an image; temporal redundancy caused by the correlation between different frames in an image sequence; and spectral redundancy caused by the correlation of different color planes or spectral bands. The purpose of data compression is to reduce the number of bits required to represent the data by removing these redundancies. Because the amount of image data is huge and difficult to store, transmit, and process, the compression of image data is very important.
(9) Image decompression
Image decompression is the inverse process of image compression, and may also be called decompression or decoding. Through image decoding, the compact representation of the input information can be restored to an image.
(10) Image distortion
Image distortion is generally measured by the peak signal-to-noise ratio (PSNR) between the original image and the encoded reconstructed image. This PSNR may be the PSNR of luminance, or a linear combination of the PSNR of luminance and the PSNR of chrominance. In the simplest case, the PSNR of luminance (Y-PSNR) is used as the main criterion. Here, the peak signal is the maximum value of a pixel in the image (for example, the maximum pixel brightness), and the noise is the mean squared error between the pixel values of the original image and the reconstructed image (the mean of the squared differences); converting the ratio of the two into decibel form gives the PSNR.
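From the definition above, PSNR = 10·log10(MAX² / MSE). A minimal sketch (assuming 8-bit images, so the peak value MAX is 255):

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio, in decibels, between two images."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images: no noise
    return 10.0 * np.log10(peak ** 2 / mse)
```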
(11) Code rate
The code rate (rate), which may also be called the encoding rate, may be the average amount of data per pixel in the compressed data (bits per pixel, bpp), and is used to indicate the degree of data compression. The code rate may be determined according to the proportion of the data amount after image compression.
(12) Rate distortion
Rate distortion is used to express the relationship between image distortion and code rate.

A smaller image distortion means more image detail and a higher code rate. A larger image distortion means that more detail is lost from the image due to compression, and hence a lower code rate.

Rate-distortion optimization (R-D optimization) refers to reducing the image distortion and the code rate as much as possible according to a preset rule. That is, the distortion of the obtained image is kept as small as possible at a code rate that is as small as possible, so as to achieve a better compression effect. Through rate-distortion optimization, a balance point can be found between the code rate and the distortion so that the compression effect is optimal. Of course, the rule for rate-distortion optimization may also be to minimize the distortion while ensuring that the code rate does not exceed an upper limit, or to minimize the code rate while ensuring that the distortion does not exceed a given limit, and so on.
The rate distortion may be calculated by a rate-distortion function. The rate-distortion function may be expressed, for example, as I = D + λR, or I = D·R, where I is the rate distortion, R is the code rate, D is the distortion, and λ is a preset Lagrangian coefficient.
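A minimal sketch of evaluating the cost I = D + λR for one compressed image; taking the distortion D as mean squared error and the code rate R as bits per pixel is one common instantiation and an assumption here, since the patent text leaves the exact measures open:

```python
import numpy as np

def rd_cost(original, reconstructed, compressed_num_bits, lam):
    """Rate-distortion cost I = D + lambda * R."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    d = np.mean(diff ** 2)                    # distortion as MSE
    r = compressed_num_bits / original.size   # code rate in bits per pixel (bpp)
    return d + lam * r
```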
As shown in FIG. 1, an embodiment of this application provides a system architecture 100. In FIG. 1, a data collection device 160 is configured to collect training data. For the image processing method in the embodiments of this application, the training data may include training images.

After collecting the training data, the data collection device 160 stores the training data in a database 130, and a training device 120 obtains a target model/rule 101 through training based on the training data maintained in the database 130.

The following describes how the training device 120 obtains the target model/rule 101 based on the training data. The training device 120 processes the input original image and compares the output image with the original image, until the rate distortion determined from the difference between the image output by the training device 120 and the original image is less than a certain threshold, thereby completing the training of the target model/rule 101.

The target model/rule 101 can be used to implement the image processing method in the embodiments of this application. The target model/rule 101 in the embodiments of this application may specifically be a neural network. It should be noted that, in actual applications, the training data maintained in the database 130 is not necessarily all collected by the data collection device 160 and may also be received from other devices. It should further be noted that the training device 120 does not necessarily train the target model/rule 101 entirely based on the training data maintained in the database 130 and may also obtain training data from the cloud or elsewhere for model training. The foregoing description shall not be construed as a limitation on the embodiments of this application.
The target model/rule 101 obtained through training by the training device 120 may be applied to different systems or devices, such as the execution device 110 shown in FIG. 1. The execution device 110 may be a terminal, such as a mobile phone terminal, a tablet computer, a laptop computer, an augmented reality (AR)/virtual reality (VR) device, or an in-vehicle terminal, or may be a server, the cloud, or the like. In FIG. 1, the execution device 110 is provided with an input/output (I/O) interface 112 for data interaction with external devices. A user may input data to the I/O interface 112 through a client device 140. In the embodiments of this application, the input data may include the to-be-processed image input by the client device.

A preprocessing module 113 and a preprocessing module 114 are configured to perform preprocessing according to the input data (such as the to-be-processed image) received by the I/O interface 112. In the embodiments of this application, the preprocessing module 113 and the preprocessing module 114 may also be absent (or only one of them may be present), and a calculation module 111 is directly used to process the input data.

When the execution device 110 preprocesses the input data, or when the calculation module 111 of the execution device 110 performs calculation or other related processing, the execution device 110 may invoke data, code, and the like in a data storage system 150 for the corresponding processing, and may also store the data, instructions, and the like obtained from the corresponding processing into the data storage system 150.

Finally, the I/O interface 112 returns the processing result, such as the classification result of the image obtained as described above, to the client device 140, so as to provide it to the user.
It is worth noting that the training device 120 may generate, based on different training data, corresponding target models/rules 101 for different goals or different tasks, and the corresponding target models/rules 101 may be used to achieve the foregoing goals or complete the foregoing tasks, thereby providing the user with the desired results.

In the case shown in FIG. 1, the user may manually specify the input data, and this manual operation may be performed through an interface provided by the I/O interface 112. In another case, the client device 140 may automatically send the input data to the I/O interface 112. If the client device 140 is required to obtain the user's authorization before automatically sending the input data, the user may set the corresponding permission in the client device 140. The user may view, on the client device 140, the result output by the execution device 110, and the specific presentation form may be display, sound, action, or another specific manner. The client device 140 may also serve as a data collection terminal that collects, as new sample data, the input data entering the I/O interface 112 and the output result from the I/O interface 112 as shown in the figure, and stores them in the database 130. Of course, the collection may also be done without the client device 140; instead, the I/O interface 112 directly stores, as new sample data, the input data entering the I/O interface 112 and the output result from the I/O interface 112 as shown in the figure into the database 130.

It is worth noting that FIG. 1 is only a schematic diagram of a system architecture provided by an embodiment of this application, and the positional relationships among the devices, components, modules, and the like shown in the figure do not constitute any limitation. For example, in FIG. 1, the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 may also be placed in the execution device 110.

As shown in FIG. 1, the target model/rule 101 is obtained through training by the training device 120. In the embodiments of this application, the target model/rule 101 may be the neural network in this application. Specifically, the neural network used in the embodiments of this application may be a CNN, a deep convolutional neural network (DCNN), a recurrent neural network (RNN), or the like.
Because the CNN is a very common neural network, the structure of the CNN is described in detail below with reference to FIG. 2. As described in the foregoing introduction to basic concepts, a convolutional neural network is a deep neural network with a convolutional structure, and is a deep learning architecture. The deep learning architecture refers to learning at multiple levels of abstraction through machine learning algorithms. As a deep learning architecture, the CNN is a feed-forward artificial neural network in which individual neurons can respond to the images fed into it.

The structure of the neural network specifically used in the image processing method in the embodiments of this application may be as shown in FIG. 2. In FIG. 2, a convolutional neural network (CNN) 200 may include an input layer 210, convolutional/pooling layers 220 (where the pooling layers are optional), and neural network layers 230. The input layer 210 may obtain the to-be-processed image and hand it over to the convolutional/pooling layers 220 and the subsequent neural network layers 230 for processing, so that the processing result of the image can be obtained. The internal layer structure of the CNN 200 in FIG. 2 is described in detail below.
Convolutional/pooling layers 220:

Convolutional layer:

As shown in FIG. 2, the convolutional/pooling layers 220 may include, for example, layers 221 to 226. In one implementation, layer 221 is a convolutional layer, layer 222 is a pooling layer, layer 223 is a convolutional layer, layer 224 is a pooling layer, layer 225 is a convolutional layer, and layer 226 is a pooling layer. In another implementation, layers 221 and 222 are convolutional layers, layer 223 is a pooling layer, layers 224 and 225 are convolutional layers, and layer 226 is a pooling layer. That is, the output of a convolutional layer may be used as the input of a subsequent pooling layer, or may be used as the input of another convolutional layer to continue the convolution operation.

The following takes the convolutional layer 221 as an example to describe the internal working principle of a convolutional layer.
The convolutional layer 221 may include many convolution operators. A convolution operator is also called a kernel, and its role in image processing is equivalent to a filter that extracts specific information from the input image matrix. A convolution operator may essentially be a weight matrix, and this weight matrix is usually predefined. In the process of performing the convolution operation on an image, the weight matrix is usually moved across the input image along the horizontal direction one pixel at a time (or two pixels at a time, depending on the value of the stride), so as to complete the work of extracting specific features from the image. The size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image; during the convolution operation, the weight matrix extends to the entire depth of the input image. Therefore, convolution with a single weight matrix produces a convolutional output with a single depth dimension, but in most cases a single weight matrix is not used; instead, multiple weight matrices of the same size (rows × columns), that is, multiple matrices of the same shape, are applied. The outputs of the weight matrices are stacked to form the depth dimension of the convolved image, where the dimension may be understood as being determined by the "multiple" mentioned above. Different weight matrices may be used to extract different features in the image: for example, one weight matrix is used to extract image edge information, another weight matrix is used to extract a specific color of the image, and yet another weight matrix is used to blur unwanted noise in the image. The multiple weight matrices have the same size (rows × columns), so the convolutional feature maps extracted by them also have the same size, and the extracted convolutional feature maps of the same size are then combined to form the output of the convolution operation.
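The sliding-window operation described above can be illustrated as follows; this is a simplified sketch (single input channel, single kernel, no padding), not the patent's implementation:

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image`, summing elementwise products at each position."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# A 3x3 edge-extraction kernel, one example of the feature extractors mentioned above.
edge_kernel = np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]])
```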
In actual applications, the weight values in these weight matrices need to be obtained through extensive training, and the weight matrices formed by the trained weight values may be used to extract information from the input image, so that the convolutional neural network 200 makes correct predictions.

When the convolutional neural network 200 has multiple convolutional layers, the initial convolutional layer (for example, layer 221) often extracts more general features, which may also be called low-level features. As the depth of the convolutional neural network 200 increases, the features extracted by the later convolutional layers (for example, layer 226) become more and more complex, such as high-level semantic features; features with higher-level semantics are more applicable to the problem to be solved.
Pooling layer:

Because it is often necessary to reduce the number of training parameters, a pooling layer often needs to be introduced periodically after a convolutional layer. For the layers 221 to 226 illustrated as 220 in FIG. 2, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers. In image processing, the sole purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a max pooling operator, which are used to sample the input image to obtain an image of a smaller size. The average pooling operator may calculate the pixel values in the image within a specific range to produce an average value as the result of average pooling. The max pooling operator may take, within a specific range, the pixel with the largest value in that range as the result of max pooling. In addition, just as the size of the weight matrix in the convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output after processing by the pooling layer may be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average or maximum value of the corresponding sub-area of the image input to the pooling layer.
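A minimal sketch of the average and max pooling operators just described, reducing the spatial size of a single-channel input; the window size and the non-overlapping windows are assumptions for illustration:

```python
import numpy as np

def pool2d(image, window=2, mode="max"):
    """Downsample `image` by taking the max or mean of each window x window block."""
    h, w = image.shape[0] // window, image.shape[1] // window
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = image[i*window:(i+1)*window, j*window:(j+1)*window]
            out[i, j] = patch.max() if mode == "max" else patch.mean()
    return out
```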
Neural network layers 230:

After processing by the convolutional/pooling layers 220, the convolutional neural network 200 is not yet able to output the required output information. As described above, the convolutional/pooling layers 220 only extract features and reduce the parameters brought by the input image. However, to generate the final output information (the required class information or other related information), the convolutional neural network 200 needs to use the neural network layers 230 to generate one output or a set of outputs whose number equals the number of required classes. Therefore, the neural network layers 230 may include multiple hidden layers (231, 232 to 23n as shown in FIG. 2) and an output layer 240. The parameters contained in the multiple hidden layers may be obtained through pre-training based on related training data of a specific task type; for example, the task type may include image recognition, image classification, image super-resolution reconstruction, and the like.

After the multiple hidden layers in the neural network layers 230, that is, as the last layer of the entire convolutional neural network 200, there is the output layer 240. The output layer 240 has a loss function similar to the categorical cross entropy and is specifically used to calculate the prediction error. Once the forward propagation of the entire convolutional neural network 200 (propagation in the direction from 210 to 240 in FIG. 2) is completed, the back propagation (propagation in the direction from 240 to 210 in FIG. 2) starts to update the weight values and biases of the layers mentioned above, so as to reduce the loss of the convolutional neural network 200 and the error between the result output by the convolutional neural network 200 through the output layer and the ideal result.
The structure of the neural network specifically used in the image processing method in the embodiments of this application may also be as shown in FIG. 3. In FIG. 3, a convolutional neural network (CNN) 200 may include an input layer 110, convolutional/pooling layers 120 (where the pooling layers are optional), and neural network layers 130. Compared with FIG. 2, the multiple convolutional/pooling layers within the convolutional/pooling layers 120 in FIG. 3 are parallel, and the separately extracted features are all input to the neural network layers 130 for processing.

It should be noted that the convolutional neural networks shown in FIG. 2 and FIG. 3 are only examples of two possible convolutional neural networks for the image processing method in the embodiments of this application. In specific applications, the convolutional neural network used in the image processing method in the embodiments of this application may also exist in the form of other network models.
FIG. 4 shows a hardware structure of a chip provided by an embodiment of this application, and the chip includes a neural network processor 50. The chip may be disposed in the execution device 110 shown in FIG. 1 to complete the calculation work of the calculation module 111. The chip may also be disposed in the training device 120 shown in FIG. 1 to complete the training work of the training device 120 and output the target model/rule 101. The algorithms of all layers of the convolutional neural networks shown in FIG. 2 and FIG. 3 can be implemented in the chip shown in FIG. 4.

The neural network processor NPU 50 is mounted as a coprocessor on a host central processing unit (host CPU), and the host CPU assigns tasks. The core part of the NPU is an operation circuit 503, and a controller 504 controls the operation circuit 503 to fetch data from a memory (a weight memory or an input memory) and perform operations.

In some implementations, the operation circuit 503 internally includes multiple processing engines (PEs). In some implementations, the operation circuit 503 is a two-dimensional systolic array. The operation circuit 503 may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 503 is a general-purpose matrix processor.

For example, assume there are an input matrix A, a weight matrix B, and an output matrix C. The operation circuit fetches the data corresponding to matrix B from a weight memory 502 and buffers it on each PE in the operation circuit. The operation circuit fetches the data of matrix A from an input memory 501, performs a matrix operation with matrix B, and stores the partial results or final results of the matrix in an accumulator 508.
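At its core, the computation just described is a matrix multiplication with accumulation of partial results. The sketch below mirrors that flow in plain Python as a functional illustration only; it does not model the PE array, the memories, or the data movement of the actual hardware:

```python
import numpy as np

def matmul_accumulate(A, B):
    """C = A @ B, built up by accumulating one partial result per column of A."""
    C = np.zeros((A.shape[0], B.shape[1]))   # plays the role of the accumulator
    for k in range(A.shape[1]):
        C += np.outer(A[:, k], B[k, :])      # partial result contributed by index k
    return C
```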
A vector calculation unit 507 may further process the output of the operation circuit, for example with vector multiplication, vector addition, exponential operations, logarithmic operations, and size comparison. For example, the vector calculation unit 507 may be used for network calculation of the non-convolutional/non-FC layers in a neural network, such as pooling, batch normalization, and local response normalization.

In some implementations, the vector calculation unit 507 can store the processed output vectors to a unified buffer 506. For example, the vector calculation unit 507 may apply a nonlinear function to the output of the operation circuit 503, such as a vector of accumulated values, to generate activation values. In some implementations, the vector calculation unit 507 generates normalized values, merged values, or both. In some implementations, the vector of processed outputs can be used as activation input to the operation circuit 503, for example, for use in subsequent layers of the neural network.

A unified memory 506 is used to store input data and output data.

A direct memory access controller (DMAC) 505 transfers input data in an external memory to the input memory 501 and/or the unified memory 506, stores weight data in the external memory into the weight memory 502, and stores data in the unified memory 506 into the external memory.

A bus interface unit (BIU) 510 is used to implement interaction among the host CPU, the DMAC, and an instruction fetch buffer 509 through a bus.

The instruction fetch buffer 509 connected to the controller 504 is used to store instructions used by the controller 504.

The controller 504 is used to invoke the instructions buffered in the instruction fetch buffer 509, to control the working process of the operation accelerator.

Generally, the unified memory 506, the input memory 501, the weight memory 502, and the instruction fetch buffer 509 are all on-chip memories, and the external memory is a memory outside the NPU. The external memory may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM), or another readable and writable memory.

The operations of the layers in the convolutional neural networks shown in FIG. 2 and FIG. 3 may be performed by the operation circuit 503 or the vector calculation unit 507.
The execution device 110 in FIG. 1 described above can perform the steps of the image processing method in the embodiments of this application, and the CNN models shown in FIG. 2 and FIG. 3 and the chip shown in FIG. 4 may also be used to perform the steps of the image processing method in the embodiments of this application. The neural network training method and the image processing method in the embodiments of this application are described in detail below with reference to the accompanying drawings.
As shown in FIG. 5, an embodiment of this application provides a system architecture 300. The system architecture includes a local device 301, a local device 302, an execution device 210, and a data storage system 250, where the local device 301 and the local device 302 are connected to the execution device 210 through a communication network.

The execution device 210 may be implemented by one or more servers. Optionally, the execution device 210 may be used in cooperation with other computing devices, such as data storage devices, routers, and load balancers. The execution device 210 may be arranged on one physical site or distributed across multiple physical sites. The execution device 210 may use the data in the data storage system 250 or invoke the program code in the data storage system 250 to implement the image processing method in the embodiments of this application.
Specifically, the execution device 210 may perform the following process: determining texture complexity information corresponding to each of multiple area images in a to-be-processed image; determining, according to the texture complexity information corresponding to each area image, an image compression model corresponding to each area image, where different texture complexity information corresponds to different image compression models; and compressing each area image by using the image compression model corresponding to that area image.

Through the foregoing process, the execution device 210 can compress each area image by using the compression model corresponding to the texture complexity of that area image, which improves the image compression effect for the to-be-processed image. A high-level sketch of this process is given below.
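In the sketch below, the names `classify`, `models`, and `model.compress` are hypothetical stand-ins for the classification model and the set of per-complexity compression models described in this application; it illustrates the control flow only, not the actual networks:

```python
def compress_image(region_images, models, classify):
    """Compress each region with the model matching its texture complexity."""
    compressed = []
    for region in region_images:
        complexity = classify(region)   # texture complexity class of this region
        model = models[complexity]      # a different compression model per class
        compressed.append(model.compress(region))
    return compressed
```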
A user may operate respective user devices (for example, the local device 301 and the local device 302) to interact with the execution device 210. Each local device may represent any computing device, such as a personal computer, a computer workstation, a smartphone, a tablet computer, a smart camera, a smart car or another type of cellular phone, a media consumption device, a wearable device, a set-top box, or a game console.

The local device of each user may interact with the execution device 210 through a communication network of any communication mechanism/communication standard. The communication network may be a wide area network, a local area network, a point-to-point connection, or the like, or any combination thereof.

In an implementation, the local device 301 and the local device 302 obtain the related parameters of the target neural network from the execution device 210, deploy the target neural network on the local device 301 and the local device 302, and use the target neural network for image classification, image processing, or the like.

In another implementation, the target neural network may be directly deployed on the execution device 210. The execution device 210 obtains the to-be-processed images from the local device 301 and the local device 302, and classifies the to-be-processed images or performs other types of image processing according to the target neural network.

The execution device 210 may also be a cloud device; in this case, the execution device 210 may be deployed in the cloud. Alternatively, the execution device 210 may be a terminal device; in this case, the execution device 210 may be deployed on the user terminal side. This is not limited in the embodiments of this application.
FIG. 6 shows a schematic structural diagram of an image processing system.

The image processing system 600 includes an encoder 610 and a quantization module 620 at the encoding end, and a decoder 630 at the decoding end. The encoder 610 and the decoder 630 are neural networks.

The image processing system 600 may be applied in scenarios where images are transmitted and stored.
The encoder 610 and the quantization module 620 at the encoding end may be disposed in a server in the cloud. The image data goes through the image encoding process in the cloud to obtain compactly represented compressed data. Storing the compressed data can reduce the storage space occupied by saving the image. Transmitting the compressed data can reduce the occupation of transmission resources during image transmission and reduce the demand for bandwidth.

The decoder 630 at the decoding end may be disposed in a terminal device serving as a client. The decoding end performs a decoding operation on the compressed data to obtain a reconstructed image. The terminal device may display the reconstructed image on a display.

The encoder 610 is used to extract features from the to-be-processed image, so as to obtain image features.
The quantization module 620 is used to quantize the image features to obtain compressed data. Quantization is, in the field of digital signal processing, the process of approximating the continuous values of a signal (or a large number of possible discrete values) by a finite number of (or fewer) discrete values.
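A minimal sketch of scalar quantization as just defined: continuous feature values are mapped to a finite set of integers by scaling and rounding, with the step size an assumed parameter (the patent does not specify the quantization scheme):

```python
import numpy as np

def quantize(features, step=0.1):
    """Map continuous feature values to discrete integer levels."""
    return np.round(features / step).astype(np.int32)

def dequantize(indices, step=0.1):
    """Approximately reconstruct the original continuous values."""
    return indices.astype(np.float64) * step
```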
云端可以将压缩数据传输至客户端。The cloud can transmit the compressed data to the client.
解码器630用于对压缩数据进行解压缩,以得到重构图像。The decoder 630 is used to decompress the compressed data to obtain a reconstructed image.
The image processing system 600 takes the entire image as input and applies a nonlinear transformation to it, reducing the correlation between codewords and improving the compression performance of the neural network.

Natural images contain very rich texture information. In computer graphics, texture includes both texture in the usual sense, i.e., the uneven grooves on an object's surface, and the color patterns on an object's smooth surface. Texture complexity reflects how sharply the pixel values in an image vary. Images of different categories differ markedly in texture detail and similar characteristics, and images of different texture complexities differ considerably in their content characteristics.

The image processing system 600 applies the same processing with the same encoder to images of different texture complexities, which hinders further improvement of the compression performance.
To solve the above problem, an embodiment of this application provides an image processing system that improves image compression performance.

FIG. 7 is a schematic structural diagram of an image processing system provided by an embodiment of this application. According to the texture complexity of an image, the neural network corresponding to that texture complexity is selected from multiple neural networks. This structure adapts the choice of neural network to the image content and compresses images according to their different texture characteristics, thereby further improving image compression performance.

The image processing system 700 includes a compression system 710 and a decompression system 720. The compression system 710 includes a segmentation model 711, a classification model 712, and a compression module 713; the decompression system 720 includes a decompression module 721 and a fusion module 722.

The compression system 710 and the decompression system 720 may be located in the same device or in different devices.

The image to be processed is input into the compression system 710, which compresses it.
The segmentation model 711 can segment the image to be processed to obtain multiple region images.

A region image may also be called an image block.

The multiple region images may or may not have the same size. To reduce the difficulty of image segmentation, the image can be segmented according to a target size, yielding multiple region images of the same size.

The multiple region images may or may not overlap. To improve compression performance, the region images do not overlap.

For example, the image to be processed can be divided into multiple 128×128 region images. Through this non-overlapping division of the image to be processed, multiple region images with no repeated content are formed.
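Purely as an illustration of the division step (not part of the embodiments), a minimal NumPy sketch follows; the block size of 128 matches the example above, and the assumption that the image dimensions are exact multiples of the block size is a simplification.

```python
import numpy as np

def split_into_regions(image: np.ndarray, block: int = 128):
    """Split an H×W(×C) image into non-overlapping block×block region images.

    Assumes H and W are exact multiples of `block`; a real system would
    pad or handle border blocks separately.
    """
    h, w = image.shape[:2]
    regions = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            regions.append(image[y:y + block, x:x + block])
    return regions
```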
The region images are input into the classification model 712, which computes the texture complexity of its input image.

A simple first-order differential operation on a digital image has a fixed directionality and can therefore only detect edges in one particular direction. To overcome this shortcoming of the first-order derivative, the gradient of each pixel in the image can be computed, realizing a first-order differential operation on the image while taking directionality into account.

The direction of the image gradient is the direction of the maximum rate of change of the image gray level, and it reflects the gray-level changes along image edges. The gradient always points in the direction of the sharpest variation; in image processing, its direction is orthogonal to the edges in the image, and its magnitude represents the rate of change of the image gray level.

The classification model 712 can compute, for the input image, the difference values of each pixel's luminance in the horizontal and vertical directions. From these two difference values, the gradient magnitude of each pixel can be computed. From the gradient magnitudes of all pixels, the average gradient magnitude, or the median of the gradient magnitudes of the pixels in the image, can be determined. The average gradient magnitude or the median indicates how smooth the image is and reflects its texture complexity.
The compression module 713 extracts image features from an image using the compression model corresponding to the image's texture complexity, thereby compressing the image. The compression module 713 may contain an AI model for feature extraction, called a compression model (or an image compression model), or it may call the compression model through an interface. The compression model may be a pre-trained neural network model, for example a CNN or an RNN. A region image is input into the compression model to obtain the image features of that region image.

The compression module 713 may store the correspondence between texture complexities and compression models. Thus, according to the texture complexity of a region image, the compression module 713 can determine the compression model corresponding to that texture complexity from among the multiple compression models and use it to process the region image.
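A minimal sketch of such a stored correspondence, assuming a hypothetical two-level labelling into "simple" and "complex" and a hypothetical threshold; the registry entries are placeholders standing in for trained compression models:

```python
# Hypothetical registry mapping a texture-complexity label to a model.
# Labels and threshold are illustrative only; real entries would be
# trained neural network compression models.
COMPLEXITY_THRESHOLD = 20.0

compression_models = {
    "simple": "compression model trained on low-texture regions",
    "complex": "compression model trained on high-texture regions",
}

def select_compression_model(avg_gradient: float):
    """Return the label and model matching the region's texture complexity."""
    label = "complex" if avg_gradient >= COMPLEXITY_THRESHOLD else "simple"
    return label, compression_models[label]
```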
By processing the region images with the compression module 713, the image features of each region image are obtained.

The compression system 710 may also include a quantization model or the like, which quantizes the image features.

The decompression system 720 decompresses the processing results of the compression system 710.

The image features produced by the compression system 710 are input into the decompression module 721. These may be the image features output by the compression module 713 or, when the compression system 710 also includes a quantization model, the quantized image features.

The decompression module 721 decompresses the image features to obtain a decompressed image.
The decompression module 721 may, according to the texture complexity of the image, decompress the image features using the decompression model corresponding to that texture complexity to obtain the decompressed image.

Alternatively, the decompression module 721 may receive indication information that indicates the decompression model corresponding to each image feature, and decompress each image feature with the indicated decompression model.

The decompression module 721 may contain an AI model for decompressing image features, called a decompression model or an image decompression model, or it may call the decompression model through an interface. The decompression model may be a pre-trained neural network model, for example a CNN or an RNN. An image feature and the image texture complexity corresponding to that feature are input into the decompression model to obtain the decompressed region image.

The decompression module 721 may store the correspondence between texture complexities and decompression models. Thus, according to the texture complexity of a region image, the decompression module 721 can determine the decompression model corresponding to that texture complexity from among the multiple decompression models and use it to process the image feature.
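To illustrate only the pairing of compression and decompression models selected by the same label, the following sketch uses coarse quantizers as toy stand-ins; the real models are trained neural networks, and the step sizes are invented:

```python
import numpy as np

# Toy stand-ins: each "compressor" is a coarse quantizer and each
# "decompressor" is its dequantizer. The point shown is the pairing of
# the two sides through the same texture-complexity label.
steps = {"simple": 16, "complex": 4}  # hypothetical quantization steps

def toy_compress(region: np.ndarray, label: str) -> np.ndarray:
    return np.round(region / steps[label]).astype(np.int32)

def toy_decompress(feature: np.ndarray, label: str) -> np.ndarray:
    # The label acts as the indication information selecting the paired model.
    return feature * steps[label]
```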
Through the processing of the decompression module 721, the restored region images are obtained.

The fusion module 722 fuses the restored region images. The fusion module 722 may contain an AI model for image fusion, called a fusion model, or it may call the fusion model through an interface. The fusion model may be a pre-trained neural network model, for example a CNN. The restored region images are input into the fusion model to obtain the fused image, which may also be called the reconstructed image or the compressed-reconstructed image.

Fusing the region images may consist of stitching them together.

Further, fusing the region images may also include adjusting the edge pixels of the region images, so that the error between the reconstructed image and the image to be processed is smaller and the distortion is reduced.
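A minimal sketch of the stitching part of fusion, assuming the regions come from the row-major non-overlapping split sketched earlier; the learned edge-pixel adjustment performed by the fusion model is not shown:

```python
import numpy as np

def stitch_regions(regions, rows: int, cols: int) -> np.ndarray:
    """Reassemble a row-major list of equally sized region images.

    rows/cols are the number of blocks vertically and horizontally; a
    learned fusion model would additionally adjust pixels near block edges.
    """
    block_rows = [np.concatenate(regions[r * cols:(r + 1) * cols], axis=1)
                  for r in range(rows)]
    return np.concatenate(block_rows, axis=0)
```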
By computing the texture complexity of each region image and, where the complexities differ, processing the data with different compression and decompression models, the image processing system 700 improves image compression performance.

By segmenting the image to be processed and computing the texture complexity of each region image separately, the image processing system 700 can process the foreground and the background of the image with different compression and decompression models, improving image compression performance.

By adjusting the edge pixels of the region images during region-image fusion, the decompression system 720 in the image processing system 700 reduces image distortion and improves image compression performance.

The AI models used in the image processing system 700 can be obtained through end-to-end training; alternatively, the compression and decompression models can be trained first and the fusion model trained afterwards. For the training method of the AI models used in the image processing system 700, refer to the description of FIG. 8.

End-to-end training is a machine learning paradigm in which the learning process is not manually divided into sub-problems; instead, a deep learning model directly learns the mapping from the raw data to the desired output.
FIG. 8 is a schematic flowchart of a neural network model training method provided by an embodiment of this application.
At S810, the texture complexity information corresponding to each training area image among multiple training area images in a training image is determined.

A training image to be processed can be obtained and divided in its entirety to obtain the multiple training area images.

The multiple training area images do not overlap, and together they include all the pixels in the training image.

At S820, according to the training texture complexity information corresponding to each training area image, the encoding/decoding model corresponding to that training area image is determined.

Different texture complexity information corresponds to different encoding/decoding models.

Each encoding/decoding model compresses the training area images input to it and decompresses the compression results, yielding multiple decompressed training area images.

That is, an encoding/decoding model includes a compression model and a decompression model: the compression model compresses the image, and the decompression model decompresses the compression model's output.

Each training area image is input into its corresponding encoding/decoding model, which processes it to obtain the decompressed training area image corresponding to that training area image.

Within the encoding/decoding model, the compression model compresses the training area image to obtain its training features, and the decompression model decodes those training features to obtain the decompressed training area image corresponding to the training area image.
At S830, the parameters of the encoding/decoding model are adjusted according to the rate-distortion, which is obtained from the decompressed training area images and the training area images.

That is, the encoding/decoding models can be trained using the training images.

When performing S830, the training area images are processed each time with the parameter-adjusted encoding/decoding model, until the rate-distortion gradually converges, yielding the trained encoding/decoding model.
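The embodiments do not spell out a concrete loss, so the sketch below assumes the common rate-distortion objective L = R + λ·D; the encoder/decoder modules, the rate proxy, and the value of λ are placeholders, written in PyTorch style:

```python
import torch
import torch.nn.functional as F

def rd_training_step(encoder, decoder, region, optimizer, lam=0.01):
    """One rate-distortion step: L = R + lam * D (assumed formulation).

    encoder/decoder: paired torch.nn.Module for one texture-complexity class.
    region: tensor of shape (N, C, H, W) holding training area images.
    The L1 norm of the features is only a crude stand-in for a true
    bit-rate estimate, which the embodiments do not specify.
    """
    optimizer.zero_grad()
    features = encoder(region)              # compression / feature extraction
    recon = decoder(features)               # decompression / reconstruction
    distortion = F.mse_loss(recon, region)  # D: image distortion
    rate = features.abs().mean()            # R: placeholder rate proxy
    loss = rate + lam * distortion
    loss.backward()
    optimizer.step()
    return loss.item()
```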
For a training image to be processed, a fusion model can be used to perform stitching processing and optimization processing on the decompressed training area images corresponding to the training area images input into it, obtaining a training restoration image. The embodiments of this application do not limit the order of the stitching processing and the optimization processing.

The optimization processing includes adjusting the edge regions of the decompressed training images.

The optimization processing may also include adjusting regions of the decompressed training images other than their edge regions.

It should be understood that the adjustment in the optimization processing is an adjustment of color; for example, the luminance and chrominance can be adjusted.
The parameters of the fusion model can be adjusted according to the image distortion between the training restoration image and the training image to be processed, completing the training of the fusion model.

The rate-distortion can also be determined from the image distortion between the training restoration image and the training image to be processed, and the parameters of the encoding/decoding models adjusted accordingly, completing the training of the encoding/decoding models.

The fusion model can stitch the decompressed training images corresponding to the training area images and adjust the pixels located in the edge regions of the decompressed training images, obtaining the training restoration image, i.e., the restored training image to be processed.

The image distortion can be determined from the difference between the training restoration image and the training image to be processed. The bit rate can be determined from the average amount of data per pixel in the compression results.

According to the image distortion between the training restoration image and the training image to be processed, the parameters of the fusion model and of the encoding/decoding model corresponding to each training area image are adjusted, so that the image distortion is reduced while the bit rate satisfies a preset condition.

Alternatively, the parameters of the fusion model and of the encoding/decoding model corresponding to each training area image may first be adjusted according to the bit rate, and then adjusted so that the bit rate is reduced while the image distortion satisfies a preset condition.

Alternatively, the rate-distortion can reflect the compression performance as a whole, and the parameters of the fusion model and of the encoding/decoding model corresponding to each training area image can be adjusted to minimize the rate-distortion.

For each training image to be processed, processing is performed each time with the parameter-adjusted encoding/decoding models and fusion model, until the rate-distortion gradually converges, yielding the trained encoding/decoding models and fusion model.
Thus, the training of the AI models in the image processing system 700 can be implemented in an end-to-end manner.

To speed up the training of the AI models in the image processing system 700, the encoding/decoding models can instead be pre-trained first, and the fusion model then trained using the pre-trained encoding/decoding models.

Specifically, a training image to be processed can be obtained and divided to obtain multiple training area images.

Each training area image can be compressed with its corresponding encoding/decoding model and the compression result decompressed, so as to determine the bit rate and the image distortion and thereby the rate-distortion. The parameters of that encoding/decoding model are adjusted to optimize the rate-distortion.

A large number of training images to be processed are processed, yielding a large number of training area images that cover images of every texture complexity. For each texture complexity, the training area images of that complexity are processed each time with the parameter-adjusted encoding/decoding model, until the rate-distortion gradually converges, yielding the pre-trained encoding/decoding models.
The pre-trained encoding/decoding models are used to process the training images to be processed, obtaining the decompressed training area image corresponding to each training area image in each training image.

The fusion model fuses the decompressed training images corresponding to the training area images of one training image to be processed, obtaining a training restoration image.

The parameters of the fusion model are adjusted to minimize the difference between the training restoration image and the training image to be processed before compression.

With the adjusted fusion model, the processing results of the pre-trained encoding/decoding models for each training image to be processed are fused, until the error between the training restoration image and the pre-compression training image gradually converges, yielding the trained fusion model.

In some embodiments, while the fusion model is being trained with the pre-trained encoding/decoding models, the parameters of the encoding/decoding models may also be adjusted, yielding all the trained AI models of the image processing system 700.

Through S810 to S830, the encoding/decoding model corresponding to each texture complexity is trained during the training process using training area images of different texture complexities. With the trained encoding/decoding models, region images of different texture complexities can be processed differently, improving the compression performance for the image as a whole.
Image processing can then be performed with the trained encoding/decoding models; see the description of FIG. 9.

That is, at S830 the bit rate and the image distortion can be computed from the decompressed training area images and the training area images to determine the rate-distortion, and the parameters of the encoding/decoding models adjusted according to the rate-distortion, completing the training or pre-training of the encoding/decoding models. Afterwards, the fusion model is trained using the trained or pre-trained encoding/decoding models.

Alternatively, at S830 the training restoration image can be determined from the decompressed training area images through the processing of the fusion model. The bit rate is computed from the decompressed training area images, and the image distortion from the training restoration image and the training image; the rate-distortion is then determined from the bit rate and the image distortion. The parameters of the encoding/decoding models and the fusion model can then be adjusted according to the rate-distortion, completing the training of the encoding/decoding models and the fusion model.
FIG. 9 is a schematic flowchart of an image processing method provided by an embodiment of this application. The method 900 shown in FIG. 9 can be executed by an image processing apparatus, which may be a mobile terminal, a computer, a server, or any other device whose computing power is sufficient for image processing.

The method 900 can be applied in fields that require image compression, such as image transmission and image storage. The method includes S910 to S930, each of which is described in detail below.

At S910, the texture complexity information corresponding to each of multiple region images in an image to be processed is determined.

At S920, according to the texture complexity information corresponding to each region image, the image compression model corresponding to that region image is determined.

Different texture complexity information corresponds to different image compression models.
To determine the texture complexity of a region image, the difference values of each of its pixels in two different directions can be computed. A plane rectangular coordinate system is established with the horizontal direction of the region image as the x axis and the vertical direction as the y axis. The region image is stored as a two-dimensional array, and the difference value of pixel (i,j) in the x direction is:

dx(i,j)=p(i,j)-p(i-1,j)

The difference value of pixel (i,j) in the y direction is:

dy(i,j)=p(i,j)-p(i,j-1)

where p(i,j) may be the luminance of pixel (i,j), or another parameter representing the color of that pixel.
From the difference values in these two directions, the gradient (also called the gradient operator) of each pixel (i,j) in the region image can be computed and represented by the vector (dx(i,j), dy(i,j)).

The gradient magnitude Grad(i,j) of pixel (i,j) is:
Grad(i,j)=√(dx(i,j)²+dy(i,j)²)
From this, the average of the gradient magnitudes over the region image can be obtained:
G=(1/(W×H))×Σ Grad(i,j), summed over all pixels (i,j) of the region image
where W is the number of pixels of the region image in the x direction and H is the number of pixels in the y direction. The average gradient G of the region image can be used to represent and evaluate its texture complexity.

In some embodiments, the gradient magnitude (also called the gradient length) Grad(i,j) of each pixel (i,j) can also be expressed as:

Grad(i,j)=|dx(i,j)|+|dy(i,j)|

In the above gradient computations, taking the absolute values or the squares of the difference values in the x and y directions prevents positive and negative values in the two directions from cancelling each other out, making the gradient computation more accurate.
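The formulas above translate directly into NumPy; this sketch is illustrative, and the threshold in texture_label is a hypothetical stand-in for the preset value discussed below:

```python
import numpy as np

def average_gradient(region: np.ndarray, use_abs: bool = False) -> float:
    """Average gradient magnitude G of a 2-D luminance array.

    Columns are taken as the x (horizontal) direction and rows as the
    y (vertical) direction, matching the definitions above; pixels with
    no left/upper neighbour get a zero difference for simplicity.
    """
    p = region.astype(np.float64)
    dx = np.zeros_like(p)
    dy = np.zeros_like(p)
    dx[:, 1:] = p[:, 1:] - p[:, :-1]  # dx(i,j) = p(i,j) - p(i-1,j)
    dy[1:, :] = p[1:, :] - p[:-1, :]  # dy(i,j) = p(i,j) - p(i,j-1)
    grad = np.abs(dx) + np.abs(dy) if use_abs else np.sqrt(dx ** 2 + dy ** 2)
    return float(grad.mean())         # mean = (1/(W*H)) * sum of Grad(i,j)

def texture_label(region: np.ndarray, threshold: float = 20.0) -> str:
    # threshold is a hypothetical stand-in for the preset value.
    return "complex" if average_gradient(region) >= threshold else "simple"
```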
The image compression model corresponding to the texture complexity information of each region image can be determined according to the correspondence between texture complexities and compression models.

The correspondence between texture complexity information and image compression models may include two or more kinds of texture complexity information and the compression model corresponding to each kind.

For example, when the average of the gradient magnitudes of an image is greater than or equal to a preset value, the texture complexity of the image can be determined to be complex; when the average is smaller than the preset value, the texture complexity can be determined to be simple. The compression model corresponding to the image can thus be determined from the average of its gradient magnitudes.

It should be understood that this correspondence between texture complexities and compression models is the same correspondence that was used when training the compression models employed in the method 900.
At S930, each region image is compressed using the image compression model corresponding to that region image.

Through S910 to S930, each region image in the image to be processed is compressed with the image compression model corresponding to its texture complexity, improving the image compression effect for the image to be processed.

The image to be processed may be a complete image, for example a photo captured by a camera or one frame of a video.

The image to be processed is divided to obtain the multiple region images. To reduce the complexity of the division, the region images may all have the same size. To reduce the amount of compressed data, the region images do not overlap. The multiple region images may include all the pixels in the image to be processed, reducing image distortion and improving compression performance.

The texture complexity may differ across different regions of a complete image. For example, in background regions such as sky or beach, the texture complexity is low; in regions of interest or foreground regions containing targets such as people, the complexity is high.

By dividing the image to be processed into multiple region images and compressing each region image with the compression model corresponding to its texture complexity, regions of different texture complexities within the image can each receive compression processing suited to their complexity, improving the compression effect for the image as a whole. The embodiments of this application thus provide a more flexible way of processing images.
The image to be processed has been divided into multiple region images, and each region image has been compressed, yielding compressed data. When the compressed data is decompressed to recover the image to be processed, the compressed data corresponding to each region image can be decompressed separately.

Each region image was compressed with its corresponding image compression model to obtain the image features of that region image. The image features obtained from compression should be decompressed with the decompression model corresponding to the image compression model used during compression, yielding a region decompressed image, which can be understood as the restored region image obtained by decompression.

The region decompressed images corresponding to the region images of the image to be processed are stitched together, restoring the image to be processed.
Further, the region decompressed images can be optimized.

The optimization processing may include adjusting the edge regions of one or more region decompressed images.

Because the second images (i.e., the region decompressed images) corresponding to the region images may have been obtained through different decompression models, discontinuous lines or color differences may appear in the edge regions of two adjacent second images after stitching. To reduce the image distortion between the decompressed image and the image to be processed, the pixels in the edge regions of one or more second images can be adjusted.

The optimization processing may also include adjusting regions of the region decompressed images other than their edge regions.

The optimization processing may be performed before or after stitching; the embodiments of this application do not limit this.

A fusion model can be used to perform the stitching processing and the optimization processing on the pixels of the edge regions of the second images.

By adjusting the pixels in the edge regions of one or more second images through the optimization processing, the image distortion can be further reduced and the image compression effect improved.
FIG. 10 is a schematic diagram of the compression performance of the image processing method provided by an embodiment of this application.

Compressing and decompressing images with the image processing method 900 achieves better image compression performance.

The image to be processed is divided into multiple region images of equal size, which together include all the pixels of the image to be processed.

For each region image, when the average of its gradient magnitudes is greater than or equal to the preset value, its texture complexity is determined to be complex; when the average is smaller than the preset value, its texture complexity is determined to be simple. The compression model and the decompression model corresponding to the image can thus be determined from the average of its gradient magnitudes; the image is compressed, and the compression result decompressed.

Afterwards, the fusion model adjusts the pixels located in the edge regions of the decompressed images and stitches them together, yielding the decompressed image.

Different compression algorithms were tested on the Kodak dataset. As shown in FIG. 10, compared with processing the image to be processed with a single compression model and decompression model regardless of image complexity, the multi-model image processing method provided by the embodiments of this application effectively improves the PSNR at the same bit rate, i.e., achieves lower image distortion.
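For reference, the metrics in this comparison can be computed from their standard definitions as sketched below; this is generic code, not the embodiments' evaluation harness, and the 8-bit peak value of 255 is an assumption:

```python
import numpy as np

def psnr(original: np.ndarray, reconstructed: np.ndarray,
         peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)  # assumes the images differ (mse > 0)
    return float(10.0 * np.log10(peak ** 2 / mse))

def bits_per_pixel(compressed_size_bytes: int, height: int, width: int) -> float:
    """Bit rate as the average number of bits spent per pixel."""
    return compressed_size_bytes * 8.0 / (height * width)
```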
The image processing system, the training method for the AI models that the image processing system requires, and the image processing method provided by the embodiments of this application have been described above with reference to FIG. 1 to FIG. 10. The apparatus embodiments of this application are described below with reference to FIG. 11 to FIG. 14. It should be understood that the descriptions of the image processing system, of the training method for the AI models it requires, and of the image processing method correspond to the descriptions of the apparatus embodiments; for parts not described in detail, refer to the descriptions above.
FIG. 11 is a schematic structural diagram of an image processing apparatus provided by an embodiment of this application.

The image processing apparatus 2000 includes a storage module 2010 and a processing module 2020.

The storage module 2010 stores program instructions.

When the program instructions are executed in the processor, the processing module 2020 is configured to:

determine the texture complexity information corresponding to each of multiple region images in an image to be processed;

determine, according to the texture complexity information corresponding to each region image, the image compression model corresponding to that region image, where different texture complexity information corresponds to different image compression models; and

compress each region image using the image compression model corresponding to that region image.

Optionally, the processing module 2020 is further configured to decompress the image features obtained by compressing each region image, using the image decompression model corresponding to the image compression model that compressed that region image, so as to obtain the region decompressed image corresponding to each region image.

The processing module 2020 is further configured to perform stitching processing and optimization processing on the multiple region decompressed images to obtain the restored image to be processed, where the optimization processing includes pixel adjustment of the edges of the multiple region decompressed images.

Optionally, the processing module 2020 is further configured to compute the gradient magnitude of each pixel in each region image, and to determine the texture complexity information of each region image according to the gradient magnitude of each pixel.

Optionally, the processing module 2020 is further configured to divide the image to be processed into the multiple region images, where the multiple region images do not overlap and include all the pixels in the image to be processed.
FIG. 12 is a schematic structural diagram of a neural network training apparatus provided by an embodiment of this application.

The neural network training apparatus 3000 includes a storage module 3010 and a processing module 3020.

The storage module 3010 stores program instructions.

When the program instructions are executed in the processor, the processing module 3020 is configured to:

determine the texture complexity information corresponding to each training area image among multiple training area images in a training image;

determine, according to the training texture complexity information corresponding to each training area image, the encoding/decoding model corresponding to that training area image, where different texture complexity information corresponds to different encoding/decoding models, and each encoding/decoding model compresses the training area images input to it and decompresses the compression results to obtain multiple decompressed training area images; and

adjust the parameters of the encoding/decoding model according to the rate-distortion, which is obtained from the decompressed training area images and the training area images.

Optionally, the processing module 3020 is further configured to perform stitching processing and optimization processing on the multiple decompressed training area images through a fusion model, where the optimization processing includes pixel adjustment of the edges of the multiple decompressed training area images.

The processing module 3020 is further configured to adjust the parameters of the fusion model according to the image distortion between the training restoration image and the training image.

Optionally, the processing module 3020 is further configured to adjust the parameters of the encoding/decoding model according to the image distortion between the training restoration image and the training image.

Optionally, the processing module 3020 is further configured to compute the gradient magnitude of each pixel in each training area image, and to determine the texture complexity information of each training area image according to the gradient magnitude of each pixel.

Optionally, the processing module 3020 is further configured to divide the training image into the multiple training area images, where the multiple training area images do not overlap and include all the pixels in the training image.
FIG. 13 is a schematic diagram of the hardware structure of an image processing apparatus according to an embodiment of this application. The image processing apparatus 4000 shown in FIG. 13 includes a memory 4001, a processor 4002, a communication interface 4003, and a bus 4004, with the memory 4001, the processor 4002, and the communication interface 4003 communicatively connected to one another through the bus 4004.

The memory 4001 may be a ROM, a static storage device, or a RAM. The memory 4001 may store a program; when the program stored in the memory 4001 is executed by the processor 4002, the processor 4002 and the communication interface 4003 are used to perform the steps of the image processing method of the embodiments of this application.

The processor 4002 may be a general-purpose CPU, a microprocessor, an ASIC, a GPU, or one or more integrated circuits, and executes related programs to implement the functions to be performed by the units in the image processing apparatus of the embodiments of this application, or to execute the image processing method of the method embodiments of this application.

The processor 4002 may also be an integrated circuit chip with signal processing capability, for example the chip shown in FIG. 4. During implementation, the steps of the image processing method of the embodiments of this application may be completed by integrated logic circuits of hardware in the processor 4002 or by instructions in the form of software.

The processor 4002 may also be a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or any conventional processor. The steps of the methods disclosed in the embodiments of this application may be directly embodied as being executed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 4001; the processor 4002 reads the information in the memory 4001 and, in combination with its hardware, completes the functions to be performed by the units included in the image processing apparatus of the embodiments of this application, or executes the image processing method of the method embodiments of this application.

The communication interface 4003 uses a transceiver apparatus, such as but not limited to a transceiver, to implement communication between the apparatus 4000 and other devices or communication networks. For example, the image to be processed can be acquired through the communication interface 4003.

The bus 4004 may include a path for transferring information between the components of the apparatus 4000 (for example, the memory 4001, the processor 4002, and the communication interface 4003).
FIG. 14 is a schematic diagram of the hardware structure of a neural network training apparatus according to an embodiment of this application. Similar to the apparatus 3000 and the apparatus 4000 described above, the neural network training apparatus 5000 shown in FIG. 14 includes a memory 5001, a processor 5002, a communication interface 5003, and a bus 5004, with the memory 5001, the processor 5002, and the communication interface 5003 communicatively connected to one another through the bus 5004.

The neural network can be trained by the neural network training apparatus 5000 shown in FIG. 14, and the trained neural network can then be used to execute the image processing method of the embodiments of this application.

Specifically, the apparatus shown in FIG. 14 can obtain the training data and the neural network to be trained from the outside through the communication interface 5003, after which the processor trains the neural network to be trained according to the training data.

It should be noted that although the apparatus 4000 and the apparatus 5000 described above show only a memory, a processor, and a communication interface, those skilled in the art should understand that, in specific implementations, the apparatus 4000 and the apparatus 5000 may also include other devices necessary for normal operation. Meanwhile, according to specific needs, those skilled in the art should understand that the apparatus 4000 and the apparatus 5000 may further include hardware devices implementing other additional functions. In addition, those skilled in the art should understand that the apparatus 4000 and the apparatus 5000 may also include only the devices necessary for implementing the embodiments of this application, without necessarily including all the devices shown in FIG. 13 and FIG. 14.
It should be understood that the processor in the embodiments of this application may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or any conventional processor.

It should also be understood that the memory in the embodiments of this application may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), used as an external cache. By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the processes or functions described in the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wired or wireless (for example, infrared, radio, microwave) means. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or a data center containing one or more sets of available media. The available media may be magnetic media (for example, floppy disks, hard disks, magnetic tapes), optical media (for example, DVDs), or semiconductor media. The semiconductor media may be solid-state drives.
It should be understood that the term "and/or" herein describes only an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate the three cases where only A exists, both A and B exist, and only B exists, where A and B may be singular or plural. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects before and after it, but may also indicate an "and/or" relationship; refer to the context for understanding.

In this application, "at least one" means one or more, and "multiple" means two or more. "At least one of the following items" or similar expressions refer to any combination of those items, including any combination of single items or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or multiple.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above processes do not imply an order of execution; the order of execution of the processes should be determined by their functions and internal logic, and shall not constitute any limitation on the implementation of the embodiments of the present application.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered to go beyond the scope of this application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing are merely specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (18)

  1. An image processing method, comprising:
    determining texture complexity information corresponding to each area image of a plurality of area images in an image to be processed;
    determining, according to the texture complexity information corresponding to each area image, an image compression model corresponding to that area image, wherein different texture complexity information corresponds to different image compression models; and
    compressing each area image by using the image compression model corresponding to that area image.
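By way of illustration only, and not as part of the claims, the selection logic of claim 1 can be sketched as follows in Python; the thresholds, the 'low'/'mid'/'high' model keys, and the mean-gradient complexity measure are assumptions of this sketch rather than features fixed by the application:

    import numpy as np

    def texture_complexity(area: np.ndarray) -> float:
        # Mean gradient magnitude as one plausible texture-complexity
        # measure; expects a 2-D (grayscale) array.
        gy, gx = np.gradient(area.astype(np.float64))
        return float(np.mean(np.hypot(gx, gy)))

    def compress_by_complexity(area_images, models, thresholds=(10.0, 30.0)):
        # Route each area image to the compression model matching its
        # texture complexity. `models` maps 'low'/'mid'/'high' to callables
        # that compress an area image into image features.
        compressed = []
        for area in area_images:
            c = texture_complexity(area)
            if c < thresholds[0]:
                key = 'low'
            elif c < thresholds[1]:
                key = 'mid'
            else:
                key = 'high'
            compressed.append(models[key](area))
        return compressed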
  2. The method according to claim 1, further comprising:
    decompressing, by using an image decompression model corresponding to the image compression model that compressed each area image, the image features obtained by compressing that area image, to obtain an area decompressed image corresponding to that area image; and
    performing stitching processing and optimization processing on the plurality of area decompressed images to obtain a restored to-be-processed image, wherein the optimization processing includes adjusting an edge of at least one of the area decompressed images.
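A minimal sketch of the stitching and edge handling of claim 2, assuming equally sized area decompressed images laid out in a rows-by-cols grid; the simple seam averaging below merely stands in for the claimed edge adjustment, which the application realizes differently:

    import numpy as np

    def stitch_and_smooth(areas, rows, cols):
        # Reassemble the areas in row-major order.
        grid = [np.concatenate(areas[r * cols:(r + 1) * cols], axis=1)
                for r in range(rows)]
        img = np.concatenate(grid, axis=0).astype(np.float64)
        h_step, w_step = areas[0].shape[0], areas[0].shape[1]
        # Average the two pixel rows/columns straddling each internal seam
        # as a stand-in for the claimed edge optimization.
        for r in range(1, rows):
            i = r * h_step
            img[i - 1:i + 1] = np.mean(img[i - 1:i + 1], axis=0, keepdims=True)
        for c in range(1, cols):
            j = c * w_step
            img[:, j - 1:j + 1] = np.mean(img[:, j - 1:j + 1], axis=1, keepdims=True)
        return img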
  3. The method according to claim 1 or 2, wherein determining the texture complexity information corresponding to each area image of the plurality of area images in the image to be processed comprises:
    calculating a gradient magnitude of each pixel in each area image; and
    determining the texture complexity information of each area image according to the gradient magnitude of each pixel.
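The per-pixel gradient computation of claim 3 admits many concrete operators; a sketch using 3x3 Sobel filters (an assumption, since the claim only requires some gradient magnitude) is:

    import numpy as np

    def pixel_gradient_magnitude(area: np.ndarray) -> np.ndarray:
        # Per-pixel gradient magnitude via 3x3 Sobel filters on a 2-D
        # (grayscale) array, with edge-replicated padding.
        kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
        ky = kx.T
        padded = np.pad(area.astype(np.float64), 1, mode='edge')
        h, w = area.shape
        gx = np.empty((h, w))
        gy = np.empty((h, w))
        for i in range(h):
            for j in range(w):
                window = padded[i:i + 3, j:j + 3]
                gx[i, j] = np.sum(window * kx)
                gy[i, j] = np.sum(window * ky)
        return np.hypot(gx, gy)

The texture complexity information of an area can then be taken, for example, as the mean or a histogram statistic of the returned magnitudes.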
  4. The method according to any one of claims 1 to 3, further comprising:
    dividing the image to be processed into the plurality of area images, wherein the plurality of area images do not overlap and together include all pixels of the image to be processed.
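A sketch of the non-overlapping division of claim 4; the tile size is an assumption, and edge tiles shrink so that every pixel is covered exactly once:

    import numpy as np

    def split_into_areas(img: np.ndarray, tile: int = 64):
        # Non-overlapping tiles that together cover every pixel; edge tiles
        # are smaller when the image size is not a multiple of `tile`.
        h, w = img.shape[:2]
        return [img[r:r + tile, c:c + tile]
                for r in range(0, h, tile)
                for c in range(0, w, tile)]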
  5. A neural network training method, comprising:
    determining texture complexity information corresponding to each training area image of a plurality of training area images in a training image;
    determining, according to the texture complexity information corresponding to each training area image, a codec model corresponding to that training area image, wherein different texture complexity information corresponds to different codec models, and each codec model is used to compress the input training area image and to decompress the compression result to obtain a plurality of decompressed training area images; and
    adjusting parameters of the codec model according to a rate-distortion measure, the rate-distortion measure being obtained from the decompressed training area images and the training area images.
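One way to read the rate-distortion adjustment of claim 5 is a standard loss of the form rate + λ·distortion; a sketch in PyTorch, where the codec interface (returning a reconstruction and an estimated bit cost) and the value of λ are assumptions of this sketch:

    import torch

    def rd_train_step(codec, batch, optimizer, lam=0.01):
        # One rate-distortion update: loss = rate + lambda * distortion.
        recon, bits = codec(batch)
        distortion = torch.mean((recon - batch) ** 2)   # MSE distortion
        rate = torch.mean(bits)                         # estimated bits
        loss = rate + lam * distortion
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()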
  6. The neural network training method according to claim 5, further comprising:
    performing stitching processing and optimization processing on the plurality of decompressed training area images through a fusion model to obtain a training restored image, wherein the optimization processing includes adjusting an edge of at least one of the decompressed training area images; and
    adjusting parameters of the codec model and the fusion model according to a degree of image distortion between the training restored image and the training image.
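The joint adjustment of claim 6 can be sketched as a training stage in which the distortion between the training restored image and the training image back-propagates through both the fusion model and the codec models; all interfaces below (split_fn, the codec return convention, a fusion module that accepts a list of tensors) are assumptions of this sketch:

    import torch

    def joint_train_step(codecs, fusion, image, split_fn, optimizer):
        # `optimizer` is assumed to hold the parameters of all codecs and
        # of the fusion model, so one backward pass adjusts both.
        areas = split_fn(image)
        decoded = [codec(a)[0] for codec, a in zip(codecs, areas)]
        restored = fusion(decoded)   # stitches areas and adjusts edges
        distortion = torch.mean((restored - image) ** 2)
        optimizer.zero_grad()
        distortion.backward()
        optimizer.step()
        return distortion.item()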
  7. The method according to claim 5 or 6, wherein determining the texture complexity information corresponding to each training area image of the plurality of training area images in the training image comprises:
    calculating a gradient magnitude of each pixel in each training area image; and
    determining the texture complexity information of each training area image according to the gradient magnitude of each pixel.
  8. The method according to any one of claims 5 to 7, further comprising:
    dividing the training image into the plurality of training area images, wherein the plurality of training area images do not overlap and together include all pixels of the training image.
  9. An image processing apparatus, comprising a storage module and a processing module, wherein:
    the storage module is configured to store program instructions; and
    when the program instructions are executed in the processor, the processing module is configured to:
    determine texture complexity information corresponding to each area image of a plurality of area images in an image to be processed;
    determine, according to the texture complexity information corresponding to each area image, an image compression model corresponding to that area image, wherein different texture complexity information corresponds to different image compression models; and
    compress each area image by using the image compression model corresponding to that area image.
  10. The apparatus according to claim 9, wherein the processing module is further configured to:
    decompress, by using an image decompression model corresponding to the image compression model that compressed each area image, the image features obtained by compressing that area image, to obtain an area decompressed image corresponding to that area image; and
    perform stitching processing and optimization processing on the plurality of area decompressed images to obtain a restored to-be-processed image, wherein the optimization processing includes performing pixel adjustment on edges of the plurality of area decompressed images.
  11. The apparatus according to claim 9 or 10, wherein the processing module is further configured to:
    calculate a gradient magnitude of each pixel in each area image; and
    determine the texture complexity information of each area image according to the gradient magnitude of each pixel.
  12. The apparatus according to any one of claims 9 to 11, wherein the processing module is further configured to:
    divide the image to be processed into the plurality of area images, wherein the plurality of area images do not overlap and together include all pixels of the image to be processed.
  13. A neural network training apparatus, comprising a storage module and a processing module, wherein:
    the storage module is configured to store program instructions; and
    when the program instructions are executed in the processor, the processing module is configured to:
    determine texture complexity information corresponding to each training area image of a plurality of training area images in a training image;
    determine, according to the texture complexity information corresponding to each training area image, a codec model corresponding to that training area image, wherein different texture complexity information corresponds to different codec models, and each codec model is used to compress the input training area image and to decompress the compression result to obtain a plurality of decompressed training area images; and
    adjust parameters of the codec model according to a rate-distortion measure, the rate-distortion measure being obtained from the decompressed training area images and the training area images.
  14. The apparatus according to claim 13, wherein the processing module is further configured to:
    perform stitching processing and optimization processing on the plurality of decompressed training area images through the fusion model to obtain a training restored image, wherein the optimization processing includes performing pixel adjustment on an edge of at least one of the decompressed training area images; and
    adjust parameters of the codec model and the fusion model according to a degree of image distortion between the training restored image and the training image.
  15. The apparatus according to claim 13 or 14, wherein the processing module is further configured to:
    calculate a gradient magnitude of each pixel in each training area image; and
    determine the texture complexity information of each training area image according to the gradient magnitude of each pixel.
  16. The apparatus according to any one of claims 13 to 15, wherein the processing module is further configured to:
    divide the training image into the plurality of training area images, wherein the plurality of training area images do not overlap and together include all pixels of the training image.
  17. A computer-readable storage medium, wherein the computer-readable medium stores program code for execution by a device, the program code comprising instructions for executing the method according to any one of claims 1 to 8.
  18. A chip, wherein the chip comprises a processor and a data interface, and the processor reads, through the data interface, instructions stored in a memory, to execute the method according to any one of claims 1 to 8.
PCT/CN2021/086836 2020-07-30 2021-04-13 Image processing method and device, and neutral network training method and device WO2022021938A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010754067.X 2020-07-30
CN202010754067.XA CN114067007A (en) 2020-07-30 2020-07-30 Image processing method and device and neural network training method and device

Publications (1)

Publication Number Publication Date
WO2022021938A1 true WO2022021938A1 (en) 2022-02-03

Family

ID=80037134

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/086836 WO2022021938A1 (en) 2020-07-30 2021-04-13 Image processing method and device, and neutral network training method and device

Country Status (2)

Country Link
CN (1) CN114067007A (en)
WO (1) WO2022021938A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115147501B (en) * 2022-09-05 2022-12-02 深圳市明源云科技有限公司 Picture decompression method and device, terminal device and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101217668A (en) * 2008-01-14 2008-07-09 浙江大学 A mixed image compression method based on block classification
CN103700121A (en) * 2013-12-30 2014-04-02 Tcl集团股份有限公司 Method and device for compressing composite image
CN108062780A (en) * 2017-12-29 2018-05-22 百度在线网络技术(北京)有限公司 Method for compressing image and device
CN109996078A (en) * 2019-02-25 2019-07-09 阿里巴巴集团控股有限公司 A kind of method for compressing image, device and electronic equipment

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114491272A (en) * 2022-02-14 2022-05-13 北京有竹居网络技术有限公司 Multimedia content recommendation method and device
CN114491272B (en) * 2022-02-14 2023-09-12 北京有竹居网络技术有限公司 Multimedia content recommendation method and device
CN115278246A (en) * 2022-08-01 2022-11-01 天津大学 End-to-end intelligent compression coding method and device for depth map
CN115278246B (en) * 2022-08-01 2024-04-16 天津大学 Depth map end-to-end intelligent compression coding method and device
CN116684607A (en) * 2023-07-26 2023-09-01 腾讯科技(深圳)有限公司 Image compression and decompression method and device, electronic equipment and storage medium
CN116684607B (en) * 2023-07-26 2023-11-14 腾讯科技(深圳)有限公司 Image compression and decompression method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114067007A (en) 2022-02-18

Similar Documents

Publication Publication Date Title
WO2022021938A1 (en) Image processing method and device, and neutral network training method and device
WO2020216227A9 (en) Image classification method and apparatus, and data processing method and apparatus
WO2020177651A1 (en) Image segmentation method and image processing device
WO2021043273A1 (en) Image enhancement method and apparatus
WO2022001372A1 (en) Neural network training method and apparatus, and image processing method and apparatus
WO2021018163A1 (en) Neural network search method and apparatus
CN113259665B (en) Image processing method and related equipment
WO2020177607A1 (en) Image denoising method and apparatus
CN113066017B (en) Image enhancement method, model training method and equipment
WO2021018245A1 (en) Image classification method and apparatus
WO2021018251A1 (en) Image classification method and device
CN113066018A (en) Image enhancement method and related device
CN113011562A (en) Model training method and device
WO2024002211A1 (en) Image processing method and related apparatus
WO2022179588A1 (en) Data coding method and related device
WO2021057091A1 (en) Viewpoint image processing method and related device
TWI826160B (en) Image encoding and decoding method and apparatus
WO2021042774A1 (en) Image recovery method, image recovery network training method, device, and storage medium
CN113284055A (en) Image processing method and device
WO2023174256A1 (en) Data compression method and related device
WO2022022176A1 (en) Image processing method and related device
WO2022001364A1 (en) Method for extracting data features, and related apparatus
CN115409697A (en) Image processing method and related device
WO2021189321A1 (en) Image processing method and device
CN115294429A (en) Feature domain network training method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21849568

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21849568

Country of ref document: EP

Kind code of ref document: A1