WO2020237646A1 - Image processing method and device, and computer-readable storage medium - Google Patents

Image processing method and device, and computer-readable storage medium

Info

Publication number
WO2020237646A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
processed
channel
frequency domain
neural network
Prior art date
Application number
PCT/CN2019/089588
Other languages
English (en)
Chinese (zh)
Inventor
李恒杰
赵文军
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司 filed Critical 深圳市大疆创新科技有限公司
Priority to PCT/CN2019/089588 priority Critical patent/WO2020237646A1/fr
Priority to CN201980008045.4A priority patent/CN111630570A/zh
Publication of WO2020237646A1 publication Critical patent/WO2020237646A1/fr

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 Image coding
    • G06T 9/002 Image coding using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Definitions

  • The embodiments of the present invention relate to the field of image processing technology, and in particular to an image processing method, an image processing device, and a computer-readable storage medium.
  • Image coding, also known as image compression, refers to a technology that represents an image, or the information contained in an image, with a small number of bits under given quality constraints.
  • Existing image coding is generally implemented by an image encoder. Different images often differ from one another, and so do their features; when an image is encoded by an image encoder, the features of different images must be extracted manually and the encoder parameters adjusted according to the extracted features. This manual process is time-consuming and labor-intensive.
  • The embodiments of the present invention provide an image processing method, an image processing device, and a computer-readable storage medium, so as to solve the technical problem in the prior art that manual image encoding is relatively time-consuming and labor-intensive.
  • In a first aspect, an image processing method is provided, including: obtaining frequency domain information of an image to be processed, the frequency domain information being obtained by an image encoder through time-frequency transform processing; processing the frequency domain information through a preset first neural network model to obtain a first encoding parameter of the image to be processed; and sending the first encoding parameter to the image encoder, so that the image encoder encodes the image to be processed according to the first encoding parameter.
  • In a second aspect, an image processing device is provided, including a memory and a processor. The memory is used to store program code; the processor calls the program code and, when the program code is executed, performs the following operations: acquiring frequency domain information of the image to be processed, the frequency domain information being obtained by an image encoder through time-frequency transform processing; processing the frequency domain information through a preset first neural network model to obtain the first encoding parameter of the image to be processed; and sending the first encoding parameter to the image encoder, so that the image encoder encodes the image to be processed according to the first encoding parameter.
  • In a third aspect, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements the image processing method described in the first aspect.
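  • As a rough illustration of the flow claimed above, the following Python sketch shows the three steps end to end. The encoder interface (`encoder.transform`, `encoder.encode`) and the model name `param_net` are illustrative placeholders, not part of the disclosure.

```python
import torch

# A minimal sketch of the claimed pipeline, assuming a PyTorch model and a
# generic image encoder object; all names here are hypothetical.
def encode_image(image, encoder, param_net):
    # Step 1: the image encoder performs the time-frequency transform
    # (e.g. DCT/DWT) and exposes the resulting coefficients.
    freq_info = encoder.transform(image)          # frequency domain information

    # Step 2: the preset first neural network model maps the frequency
    # domain information to the first encoding parameter.
    with torch.no_grad():
        first_param = param_net(freq_info)

    # Step 3: the parameter is sent back to the encoder, which quantizes
    # and entropy-codes the image accordingly, producing the code stream.
    return encoder.encode(image, first_param)
```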
  • In the embodiments of the present invention, the frequency domain information of the image to be processed is obtained through the time-frequency transform processing of the image encoder; the frequency domain information is processed through the preset first neural network model to obtain the first encoding parameter of the image to be processed; and the first encoding parameter is sent to the image encoder, so that the image encoder encodes the image to be processed according to the first encoding parameter.
  • In this way, the first neural network model automatically optimizes the first encoding parameter of the image to be processed.
  • An optimal first encoding parameter is thereby selected for the image encoder, which improves the encoder's efficiency and performance and achieves a higher-quality encoding result at the same compression rate; this further yields a better decoding result under the same evaluation index, effectively combining deep learning with the internal structure of the image encoder.
  • In other words, the embodiments of the invention disclose an image encoder parameter optimization method based on frequency domain characteristics, which can be applied to products involving image and video compression coding.
  • Fig. 1 is a schematic flowchart of an image coding method provided by an exemplary embodiment of the present invention.
  • Fig. 2 is a block diagram of an image encoder provided by an exemplary embodiment of the present invention.
  • Fig. 3 is a structural diagram of a VGG-16 model provided by an exemplary embodiment of the present invention.
  • Fig. 4 is a structural diagram of a ResNet34 model provided by an exemplary embodiment of the present invention.
  • Fig. 5 is a structural diagram of a GoogLeNet model provided by an exemplary embodiment of the present invention.
  • Fig. 6 is a block diagram of an image encoder based on a first neural network model provided by an exemplary embodiment of the present invention.
  • Fig. 7 is a block diagram of an image encoder based on a first neural network model provided by another exemplary embodiment of the present invention.
  • Fig. 8 is a block diagram of an image encoder based on a first neural network model and a second neural network model provided by an exemplary embodiment of the present invention.
  • Fig. 9 is a flowchart of image information processing provided by an exemplary embodiment of the present invention.
  • Fig. 10 is a schematic structural diagram of an image processing device provided by Embodiment 6 of the present invention.
  • Since deep learning does not require manual selection of features, it extracts image features by training a network, and the extracted features are then used to produce subsequent decision results, achieving functions such as classification and recognition. That is, a neural network has strong learning ability: it can comprehensively and accurately establish the mapping between samples and labels, complete many tasks that traditional methods cannot, or greatly improve the efficiency and accuracy of traditional methods. Using deep learning to optimize the parameters of an image encoder can therefore largely avoid the shortcomings of manual encoder parameter tuning.
  • To this end, the present invention provides an image processing method, device, and computer-readable storage medium that effectively combine deep learning with the encoding process or internal structure of an image encoder, avoiding the deficiencies of manual tuning and giving full play to the advantages of deep learning.
  • The image processing method, device, and computer-readable storage medium provided by the present invention can be applied to any image encoding scenario.
  • Fig. 1 is a schematic flowchart of an image coding method provided by an exemplary embodiment of the present invention.
  • The method provided in the embodiment of the present invention may be executed by any terminal device and/or server with computing and processing capabilities, which is not limited by the present invention.
  • The method provided by the embodiment of the present invention may include the following steps.
  • In step S110, frequency domain information of the image to be processed is acquired; the frequency domain information is obtained by the image encoder performing time-frequency transform processing on the image to be processed.
  • The time-frequency transform includes any one of the Karhunen-Loève (K-L) transform, the Fourier transform, the cosine transform, and the wavelet transform.
  • Here, a standard image encoder is taken as an example for illustration, but the technical solution provided by the embodiments of the present invention can be applied to any image encoder.
  • Fig. 2 is a block diagram of an image encoder provided by an exemplary embodiment of the present invention.
  • The general flow of a standard image encoder is as follows: the image to be processed is input to the image encoder, sequentially undergoes transform processing, quantization processing, and entropy coding processing, and code stream information is output.
  • The main image coding standards are JPEG (Joint Photographic Experts Group), JPEG2000, and so on.
  • Their basic coding is done under the framework of Fig. 2.
  • The transform processing is mainly a process of converting the information of the image to be processed from the time domain to the frequency domain.
  • Its purpose is to separate the frequency domain information of different frequency bands and, exploiting the fact that the human eye is insensitive to high-frequency information, choose different quantization steps for different frequency bands, thereby reducing spatial redundancy in image compression and obtaining a higher compression rate.
  • The quantization processing approximates the transformed frequency domain information according to a certain quantization step.
  • The quantized image information can be expressed with fewer bits, which is an important part of image compression.
  • The entropy coding processing expresses the quantized image information according to a certain coding rule.
  • The code stream information obtained after entropy coding is the representation of the image to be processed after being encoded by the image encoder.
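  • As a concrete, hedged example of the quantization step just described, the following sketch quantizes one 8x8 block of transform coefficients against a quantization table; the function names and table shape are illustrative, not taken from the disclosure.

```python
import numpy as np

# JPEG-style quantization of one 8x8 block of DCT coefficients: Q holds a
# larger step for higher frequencies, to which the eye is less sensitive.
def quantize_block(dct_block: np.ndarray, Q: np.ndarray) -> np.ndarray:
    return np.round(dct_block / Q).astype(np.int32)

# The decoder-side approximation simply multiplies back by the table,
# which is why quantization is the lossy part of the pipeline.
def dequantize_block(q_block: np.ndarray, Q: np.ndarray) -> np.ndarray:
    return (q_block * Q).astype(np.float64)
```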
  • Traditional optimization of image encoders relies mainly on manual parameter tuning, starting from the encoder's encoding process: the accuracy of the transform, the design of the quantization tables, and the estimation of the probability distribution in entropy coding.
  • In addition, current image encoders do not fully consider the differences between images, so the encoding parameters generalize poorly.
  • In step S120, the frequency domain information is processed through the preset first neural network model to obtain the first encoding parameter of the image to be processed.
  • The first neural network model may include N arithmetic units connected in sequence.
  • Processing the frequency domain information through the preset first neural network model to obtain the first encoding parameter of the image to be processed may include: inputting the frequency domain information to the nth arithmetic unit of the first neural network model, and outputting the first encoding parameter through the Nth arithmetic unit of the first neural network model, where N ≥ n and n is a positive integer greater than or equal to 2.
  • The arithmetic unit mentioned in the embodiment of the present invention may be a software module.
  • The first neural network model may include at least one of the following: the VGG-16 model, the VGG-19 model, the ResNet model, GoogLeNet, and so on. For example, n = 2, but the present invention is not limited to this.
  • VGGNet is a deep convolutional neural network. It explored the relationship between the depth of a convolutional neural network and its performance, successfully constructing networks 16 to 19 layers deep, and proved that increasing network depth can affect final performance to a certain extent, sharply reducing the error rate; at the same time it is very scalable and generalizes well when transferred to other image data, so it can be used to extract image features.
  • VGGNet has several different structures, such as the VGG-16 and VGG-19 models.
  • In the embodiment of the present invention, a network designed by modifying the VGG-16 model is used as the first neural network model for parameter optimization. Since the transform processing of the image encoder can be considered similar in function to a neural network convolutional layer, the first two convolutional layers and the first pooling layer of the VGG-16 model can be removed here, and the resulting new neural network serves as the neural network model for parameter optimization.
  • As shown in Fig. 3, the VGG-16 model may include a first arithmetic unit 310, a second arithmetic unit 320, a third arithmetic unit 330, a fourth arithmetic unit 340, a fifth arithmetic unit 350, a sixth arithmetic unit 360, and a seventh arithmetic unit 370, connected in sequence.
  • The first arithmetic unit 310 may include a first convolutional layer 311, a second convolutional layer 312, and a first pooling layer 313.
  • The second arithmetic unit 320 may include a third convolutional layer 321 and a fourth convolutional layer 322.
  • The third arithmetic unit 330 may include a second pooling layer 331, a fifth convolutional layer 332, a sixth convolutional layer 333, and a seventh convolutional layer 334.
  • The fourth arithmetic unit 340 may include a third pooling layer 341, an eighth convolutional layer 342, a ninth convolutional layer 343, and a tenth convolutional layer 344.
  • The fifth arithmetic unit 350 may include a fourth pooling layer 351, an eleventh convolutional layer 352, a twelfth convolutional layer 353, and a thirteenth convolutional layer 354.
  • The sixth arithmetic unit 360 may include a fifth pooling layer 361, a first fully connected layer 362, a second fully connected layer 363, and a third fully connected layer 364.
  • The seventh arithmetic unit 370 includes a softmax (normalization) layer.
  • In the embodiment of the present invention, the frequency domain information output after the image to be processed is transformed by the image encoder is input directly to the second arithmetic unit 320 of the VGG-16 model, and is then processed in turn by the third arithmetic unit 330, the fourth arithmetic unit 340, the fifth arithmetic unit 350, the sixth arithmetic unit 360, and the seventh arithmetic unit 370, which outputs the first encoding parameter. That is, the network that processes the frequency domain information includes 11 convolutional layers, 4 pooling layers, and 3 fully connected layers.
  • Compared with the full VGG-16, the network used to process frequency domain information in the embodiment of the present invention has fewer layers, which reduces the amount of computation, speeds up processing, and improves the real-time performance of the encoding process.
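  • A possible sketch of such a truncated network, assuming the torchvision layout of VGG-16 and a regression head instead of the classification head (both assumptions, not details from the disclosure), is shown below.

```python
import torch.nn as nn
from torchvision.models import vgg16

# Drop VGG-16's first two conv layers and first pooling layer, so the
# encoder's 3-channel coefficient matrices enter at the "second unit".
def truncated_vgg16(num_coding_params: int = 1) -> nn.Module:
    base = vgg16(weights=None)
    layers = list(base.features.children())[5:]   # skip conv1_1, ReLU, conv1_2, ReLU, pool1
    # Rebuild the first remaining conv layer to accept 3 input channels
    # (Y, U, V coefficient planes) instead of 64 feature maps.
    layers[0] = nn.Conv2d(3, 128, kernel_size=3, padding=1)
    return nn.Sequential(
        *layers,                                  # 11 conv layers and 4 pooling layers
        nn.AdaptiveAvgPool2d((7, 7)),
        nn.Flatten(),
        nn.Linear(512 * 7 * 7, 4096), nn.ReLU(),  # 3 fully connected layers
        nn.Linear(4096, 4096), nn.ReLU(),
        nn.Linear(4096, num_coding_params),       # first encoding parameter(s)
    )
```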
  • Fig. 4 is a structural diagram of a ResNet34 model provided by an exemplary embodiment of the present invention.
  • The first neural network model for processing frequency domain information in the embodiment of the present invention may also be a ResNet model with the first convolutional layer and the first pooling layer removed (i.e., the first arithmetic unit 410 in Fig. 4); the frequency domain information output by the image encoder is input directly to the second convolutional layer, and the first encoding parameter is finally output through a fully connected (fc) layer.
  • Fig. 5 is a structural diagram of a GoogLeNet model provided by an exemplary embodiment of the present invention.
  • The first neural network model for processing frequency domain information in the embodiment of the present invention may also be the GoogLeNet model without the first arithmetic unit 510.
  • The first arithmetic unit 510 may include the first convolutional layer, the first max pooling layer, and the first LRN (Local Response Normalization) layer; the first encoding parameter is finally output through the softmax (normalization) layer.
  • "IM" in Fig. 5 is an abbreviation of Inception module.
  • It should be noted that the present invention is not limited to these models. According to specific application scenarios and actual needs, other neural networks can be selected for feature extraction, such as other variants of CNN (Convolutional Neural Network) or RNN (Recurrent Neural Network), for example VGG-19 or networks of the same depth or level, ResNet50 and deeper networks, GoogLeNet, and similar deep neural networks.
  • Although the above description takes removing the first arithmetic unit of the first neural network model as an example, in other embodiments more arithmetic units may be removed.
  • In step S130, the first encoding parameter is sent to the image encoder, so that the image encoder encodes the image to be processed according to the first encoding parameter.
  • The first encoding parameter may include at least one of a typical quantization parameter design, a quantization table design, a feature transform accuracy design, a rate control proportion design, and the like.
  • The typical quantization parameter design and the quantization table design correspond to the parameters used by the image encoder for quantization processing; the feature transform accuracy corresponds to the parameters of the image encoder's time-frequency transform; and the rate control proportion design corresponds to the parameters of the image encoder's entropy coding processing.
  • In summary, the image processing method provided by the embodiment of the present invention acquires the frequency domain information of the image to be processed through the time-frequency transform processing of the image encoder; processes the frequency domain information through the preset first neural network model to obtain the first encoding parameter of the image to be processed; and sends the first encoding parameter to the image encoder, so that the image encoder encodes the image to be processed according to the first encoding parameter.
  • In this way, the first neural network model automatically optimizes the first encoding parameter of the image to be processed.
  • An optimal first encoding parameter is thereby selected for the image encoder, which improves the encoder's efficiency and performance and achieves a higher-quality encoding result at the same compression rate; this further yields a better decoding result under the same evaluation index, effectively combining deep learning with the internal structure of the image encoder.
  • The embodiment of the invention thus discloses an image encoder parameter optimization method based on frequency domain characteristics, which can be applied to products involving image and video compression coding.
  • In an embodiment, sending the first encoding parameter to the image encoder so that the image encoder encodes the image to be processed according to the first encoding parameter may include: sending the first encoding parameter to the image encoder, so that the image encoder performs quantization processing and entropy coding processing on the frequency domain information of the image to be processed according to the first encoding parameter.
  • Optionally, the method may further include: the image encoder encoding the image to be processed according to the first encoding parameter to obtain the code stream information of the image to be processed.
  • Optionally, the method may further include: using an image decoder to perform a decoding operation on the code stream information to obtain a reconstructed image to be processed.
  • Fig. 6 is a block diagram of an image encoder based on a first neural network model provided by an exemplary embodiment of the present invention.
  • In the example of Fig. 6, the first encoding parameter may include a typical quantization parameter design and/or a quantization table design, together with a rate control proportion design.
  • The image encoder quantizes the image to be processed according to the first encoding parameter (for example, the typical quantization parameter design and/or the quantization table design) to generate the quantization information of the image to be processed, then entropy-codes the quantization information according to the first encoding parameter (for example, the rate control proportion design), and outputs the code stream information.
  • The code stream information is input to the image decoder for decoding, and the image decoder outputs the decoded image, that is, the reconstructed image to be processed.
  • In another embodiment, sending the first encoding parameter to the image encoder so that the image encoder encodes the image to be processed according to the first encoding parameter may include: sending the first encoding parameter to the image encoder, so that the image encoder performs the time-frequency transform on the image to be processed again according to the first encoding parameter to generate new frequency domain information, and then performs quantization processing and entropy coding processing on the new frequency domain information based on the first encoding parameter. In this way, the time-frequency transform parameters can be adjusted in a feedback manner, further improving the coding performance of the image encoder. An example is described below with reference to the schematic view of Fig. 7.
  • Fig. 7 is a block diagram of an image encoder based on a first neural network model provided by another exemplary embodiment of the present invention.
  • The image to be processed is input to the image encoder, where a transform operation (time-frequency transform processing) is performed to generate the frequency domain information of the image to be processed. The frequency domain information is then input to the first neural network model, which processes it and outputs the first encoding parameter of the image to be processed. The first encoding parameter is fed back to the transform operation of the image encoder: the time-frequency transform is re-executed on the image to be processed according to the first encoding parameter, generating new frequency domain information of the image to be processed; the new frequency domain information is then quantized according to the first encoding parameter to generate the quantization information of the image to be processed; and entropy coding is performed on the quantization information according to the first encoding parameter to generate the code stream information.
  • The image decoder receives the code stream information and can decode it to reconstruct the image to be processed.
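  • A compact sketch of this feedback arrangement, with the same hypothetical interfaces as earlier (including an `encoder.transform` that accepts a parameter, which is an assumption):

```python
# Fig. 7 variant: the first encoding parameter re-drives the transform
# before quantization and entropy coding. All interfaces are illustrative.
def encode_image_feedback(image, encoder, param_net):
    freq_info = encoder.transform(image)           # initial transform
    p = param_net(freq_info)                       # first encoding parameter
    new_freq = encoder.transform(image, p)         # transform re-run with p
    quant_info = encoder.quantize(new_freq, p)
    return encoder.entropy_code(quant_info, p)     # code stream information
```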
  • In yet another embodiment, sending the first encoding parameter to the image encoder so that the image encoder encodes the image to be processed according to the first encoding parameter may include: sending the first encoding parameter to the image encoder, so that the image encoder performs quantization processing on the frequency domain information according to the first encoding parameter and generates the quantization information of the image to be processed.
  • In this embodiment, the method may further include: acquiring the quantization information of the image to be processed; processing the quantization information through a preset second neural network model to obtain a second encoding parameter of the image to be processed; and sending the second encoding parameter to the image encoder, so that the image encoder encodes the image to be processed according to the second encoding parameter.
  • That is, the second neural network model can process the quantization information produced by the image encoder's quantization and output the second encoding parameter, so that the image encoder can further encode the image to be processed according to the second encoding parameter. This can further improve coding efficiency and performance.
  • The second neural network model here can adopt any one or more neural network structures, which is not limited by the present invention.
  • The second neural network may include M arithmetic units connected in sequence.
  • Processing the quantization information through the preset second neural network model to obtain the second encoding parameter of the image to be processed may include: inputting the quantization information to the mth arithmetic unit of the second neural network model, and outputting the second encoding parameter through the Mth arithmetic unit of the second neural network model, where M ≥ m and m is a positive integer greater than or equal to 2.
  • For example, m can be equal to 2 or 3, but the present invention is not limited to this.
  • The second neural network model may include at least one of the following: the VGG-16 model, the VGG-19 model, the ResNet model, the GoogLeNet model, and so on; reference may be made to the description of the first neural network model in the embodiments of Figs. 3-5.
  • The second encoding parameter may include at least one of a typical quantization parameter design, a quantization table design, a rate control proportion design, and the like.
  • In an embodiment, sending the second encoding parameter to the image encoder so that the image encoder encodes the image to be processed according to the second encoding parameter may include: sending the second encoding parameter to the image encoder, so that the image encoder re-quantizes the frequency domain information of the image to be processed according to the second encoding parameter (for example, a typical quantization parameter design and/or a quantization table design) to generate new quantization information, and then entropy-codes the new quantization information based on the second encoding parameter (for example, the rate control proportion design).
  • In another embodiment, sending the second encoding parameter to the image encoder so that the image encoder encodes the image to be processed according to the second encoding parameter may include: sending the second encoding parameter to the image encoder, so that the image encoder performs entropy coding processing on the quantization information according to the second encoding parameter. This is illustrated below with reference to Fig. 8.
  • Fig. 8 is a block diagram of an image encoder based on a first neural network model and a second neural network model provided by an exemplary embodiment of the present invention.
  • In the example of Fig. 8, the first encoding parameter may include a typical quantization parameter design and/or a quantization table design, and the second encoding parameter may include a rate control proportion design.
  • The image to be processed is input to the image encoder, where a transform operation (time-frequency transform processing) is performed to generate the frequency domain information of the image to be processed. The frequency domain information is then input to the first neural network model, which processes it and outputs the first encoding parameter of the image to be processed. The first encoding parameter is input to the quantization operation of the image encoder, and the frequency domain information is quantized according to the first encoding parameter to generate the quantization information of the image to be processed. The quantization information is then input to the second neural network model, which outputs the second encoding parameter, and entropy coding is performed on the quantization information according to the second encoding parameter to generate the code stream information.
  • The image decoder receives the code stream information and can decode it to reconstruct the image to be processed.
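  • A hedged sketch of this two-model arrangement (Fig. 8), reusing the illustrative encoder/model interfaces from the earlier sketches:

```python
# Two-stage variant: the first model sets quantization-side parameters from
# the frequency domain information, the second sets the entropy-coding-side
# parameter from the quantization information. All interfaces are assumed.
def encode_image_two_stage(image, encoder, param_net_1, param_net_2):
    freq_info = encoder.transform(image)
    qp = param_net_1(freq_info)                  # e.g. quantization table design
    quant_info = encoder.quantize(freq_info, qp)
    rc = param_net_2(quant_info)                 # e.g. rate control proportion
    return encoder.entropy_code(quant_info, rc)  # code stream information
```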
  • The purpose of training a neural network is to give it feature extraction capability.
  • Each layer of the neural network extracts different feature information based on the previous layer, so a deep neural network can extract high-dimensional features, establish the mapping between samples and targets, and complete complex classification, regression, and other tasks.
  • The convolution kernels of a neural network can be regarded as a series of filters, and the initial convolutional layers can be regarded as extracting image frequency domain features.
  • The transform processing of the image encoder likewise extracts the frequency characteristics of the image and separates the frequency domain information of different frequency bands; that is to say, the transform processing is also a filtering operation.
  • Therefore, in the embodiment of the present invention, the output of the image encoder's transform (for example, DCT (Discrete Cosine Transform) or DWT (Discrete Wavelet Transform)) is used as the input of the first neural network model.
  • With frequency domain information as the input of the first neural network model, the number of layers of the neural network can be reduced and the training process can be sped up. Moreover, the internal structure of the image encoder is better combined with deep learning, better exploiting the advantages of deep learning and improving its applicability to the task of image encoder parameter optimization.
  • In an embodiment, before processing the frequency domain information through the preset first neural network model, the method may further include: training the first neural network model with a preset first training data set.
  • The first training data set may include the frequency domain information of a number of images whose first encoding parameters have been labeled.
  • The first training data set may be obtained through the following steps: time-frequency transform is performed on a number of images whose first encoding parameters have been labeled, to obtain their frequency domain information; the frequency domain information of these labeled images forms the first training data set.
  • Specifically, K (K being a positive integer greater than or equal to 1) groups of optimal first encoding parameters are prepared in advance for an image data set, so that the optimal first encoding parameters corresponding to at least some images in the data set are known; that is, at least some image labels are known.
  • After training, the first neural network model has the ability to optimize the parameters of the image encoder.
  • The frequency domain information obtained from an image is input to the first neural network model, and its optimal first encoding parameters can be output.
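  • A hypothetical training loop for the first model is sketched below; the tensors `freq_coeffs` (per-image DCT/DWT coefficients) and `labels` (the pre-computed optimal first encoding parameters), as well as the choice of loss, are assumptions, since the disclosure does not fix them.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def train_param_net(param_net, freq_coeffs, labels, epochs=10, lr=1e-4):
    loader = DataLoader(TensorDataset(freq_coeffs, labels),
                        batch_size=16, shuffle=True)
    opt = torch.optim.Adam(param_net.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()   # regression onto the labeled parameters
    for _ in range(epochs):        # "multiple iterations" of training
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(param_net(x), y)
            loss.backward()
            opt.step()
    return param_net
```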
  • Similarly, before processing the quantization information through the preset second neural network model, the method may further include: training the second neural network model with a preset second training data set.
  • The second training data set here includes the quantization information of a number of images whose second encoding parameters have been labeled.
  • The second training data set may be obtained by the following steps: time-frequency transform and quantization processing are performed on a number of images whose second encoding parameters have been labeled, to obtain their quantization information; the quantization information of these labeled images forms the second training data set.
  • Specifically, P (P being a positive integer greater than or equal to 1) groups of optimal second encoding parameters are prepared in advance for an image data set, so that the optimal second encoding parameters corresponding to at least some images in the data set are known; that is, at least some image labels are known.
  • The images with known labels are time-frequency transformed and quantized to obtain the corresponding quantization information, which serves as the second training data set of the second neural network model, so that the second neural network model learns the mapping between image features and the optimal second encoding parameters. After multiple iterations, network training is complete, and the mapping between images and optimal second encoding parameters is modeled.
  • After training, the second neural network model has the ability to optimize the parameters of the image encoder.
  • The quantization information obtained from an image is input to the second neural network model, and its optimal second encoding parameters can be output.
  • In an embodiment, before acquiring the frequency domain information, the method may further include: if the image to be processed is in YUV format, determining the dimensions of the U channel, the V channel, and the Y channel of the image to be processed; and if the dimensions of the U channel and the V channel are inconsistent with the dimension of the Y channel, performing a preprocessing operation on the image to be processed, so that the dimensions of the U channel and the V channel of the image to be processed are consistent with the dimension of the Y channel.
  • In another embodiment, the method may further include: if the image to be processed is in a preset format, performing a preprocessing operation on the image to be processed, so that the dimensions of the U channel and the V channel of the image to be processed are consistent with the dimension of the Y channel.
  • For example, when the preset format is the YUV422 format or the YUV420 format, a preprocessing operation is performed on the image to be processed, so that the dimensions of the U channel and the V channel of the image to be processed are consistent with the dimension of the Y channel.
  • The input format of an image encoder is basically the YUV format; the main YUV formats are YUV444, YUV422, YUV420, and so on. Under the YUV422 and YUV420 data formats, the UV components are down-sampled, which makes the channel dimensions inconsistent.
  • Performing the preprocessing operation on the image to be processed so that the dimensions of the U channel and the V channel are consistent with the dimension of the Y channel may include: performing an up-sampling operation on the U channel and the V channel of the image to be processed, so that the dimensions of the three channels Y, U, and V of the image to be processed are the same.
  • Performing the up-sampling operation on the U channel and the V channel may include: performing a bilinear interpolation operation on the U channel and the V channel of the image to be processed, so that the dimensions of the Y, U, and V channels of the image to be processed are the same.
  • Alternatively, performing the preprocessing operation may include: performing a down-sampling operation on the Y channel of the image to be processed, so that the dimensions of the three channels Y, U, and V of the image to be processed are the same.
  • That is, the problem of different dimensions of the three YUV channels can be solved by up-sampling the U and V channels and unifying the three channels to the Y dimension.
  • It can also be solved by down-sampling the Y channel and unifying the three channels to the UV dimension; the present invention does not limit this.
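  • A minimal sketch of the up-sampling route, assuming YUV420 input with Y of shape (H, W) and U, V of shape (H/2, W/2) (the OpenCV call and array layout are illustrative choices, not from the disclosure):

```python
import numpy as np
import cv2

def upsample_uv(y: np.ndarray, u: np.ndarray, v: np.ndarray) -> np.ndarray:
    h, w = y.shape
    # Bilinear interpolation brings U and V up to the Y dimension.
    u_up = cv2.resize(u, (w, h), interpolation=cv2.INTER_LINEAR)
    v_up = cv2.resize(v, (w, h), interpolation=cv2.INTER_LINEAR)
    return np.stack([y, u_up, v_up], axis=0)   # 3 x H x W, ready for DCT/DWT
```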
  • In this case, acquiring the frequency domain information of the image to be processed after the time-frequency transform processing of the image encoder may include: performing DCT transform respectively on the Y, U, and V channels of the image to be processed, which now have the same dimensions, to generate the frequency domain information of the Y, U, and V channels.
  • Alternatively, it may include: performing DWT transform respectively on the Y, U, and V channels of the image to be processed, which now have the same dimensions, to generate the frequency domain information of the Y, U, and V channels.
  • That is, a discrete cosine transform (DCT) or a discrete wavelet transform (DWT) may be used to implement the transform processing.
  • Taking DCT as an example, a two-dimensional DCT transform can be performed on the image to be processed.
  • The image to be processed is divided into 8×8 blocks, and each block undergoes the following transform (the standard two-dimensional DCT):

    F(u,v) = \frac{1}{4} C_u C_v \sum_{m=0}^{7} \sum_{n=0}^{7} x[m,n] \cos\frac{(2m+1)u\pi}{16} \cos\frac{(2n+1)v\pi}{16}

  • C_u and C_v are the first transform parameter and the second transform parameter, respectively, with C_u = 1/\sqrt{2} when u = 0 and C_u = 1 otherwise (and likewise for C_v).
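  • For concreteness, a sketch of the 8×8 block-wise DCT of one channel, implemented with SciPy's type-II DCT (an illustrative choice; any DCT implementation would do):

```python
import numpy as np
from scipy.fftpack import dct

def blockwise_dct(channel: np.ndarray, block: int = 8) -> np.ndarray:
    # Separable 2-D DCT applied block by block, JPEG style; any trailing
    # rows/columns that do not fill a whole block are left untransformed.
    h, w = channel.shape
    out = np.zeros((h, w), dtype=np.float64)
    for i in range(0, h - h % block, block):
        for j in range(0, w - w % block, block):
            b = channel[i:i + block, j:j + block].astype(np.float64)
            out[i:i + block, j:j + block] = dct(
                dct(b, axis=0, norm='ortho'), axis=1, norm='ortho')
    return out
```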
  • Taking DWT as an example, the input image to be processed can be regarded as x[m,n], where both m and n are positive integers greater than or equal to 1.
  • The DWT uses a high-pass filter h[n] and a low-pass filter g[n] to repeatedly extract different frequency components of the image to be processed. Each DWT stage filters and then down-samples by two:

    a[k] = \sum_{n} x[n] \, g[2k-n], \qquad d[k] = \sum_{n} x[n] \, h[2k-n]

    where a[k] is the low-frequency (approximation) output and d[k] is the high-frequency (detail) output; applying the stage along rows and columns, and recursively to the approximation, yields the multi-band frequency domain information.
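  • Similarly, a one-level 2-D DWT of a channel can be sketched with PyWavelets (an illustrative library choice; the disclosure names no implementation):

```python
import numpy as np
import pywt

def dwt2_bands(channel: np.ndarray):
    # One DWT stage: low-pass/high-pass filtering plus down-sampling along
    # rows and columns yields four sub-bands (LL, LH, HL, HH).
    ll, (lh, hl, hh) = pywt.dwt2(channel.astype(np.float64), 'haar')
    return ll, lh, hl, hh   # LL can be decomposed again for more levels
```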
  • In an embodiment, before the frequency domain information is processed by the preset first neural network model, the method may further include: concatenating (cascading) the frequency domain information of the Y, U, and V channels of the image to be processed.
  • Fig. 9 is a flowchart of image information processing provided by an exemplary embodiment of the present invention.
  • In Fig. 9, up-sampling the U and V channels separately is taken as an example for illustration.
  • The Y channel is subjected directly to DCT/DWT transform, while the U and V channels are first up-sampled so that the three channels Y, U, and V have the same dimensions, and are then subjected to DCT/DWT transform, which preserves those dimensions.
  • The transformed frequency domain information of the three channels is concatenated, and the result is output as the input of the first neural network model.
  • That is, the input of the first neural network model is the coefficient matrix after DCT/DWT transform, i.e., the frequency domain information.
  • The transform does not change the dimensions of the matrix; that is, for each image to be processed, the transformed coefficient matrix can still be regarded as three-channel data. Therefore, after the DCT/DWT transform, the coefficient matrix can be used directly as the input of the first neural network model.
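  • Putting the pieces together, a sketch of the Fig. 9 preprocessing path, reusing the illustrative `upsample_uv` and `blockwise_dct` helpers defined above:

```python
import numpy as np
import torch

def preprocess_yuv(y: np.ndarray, u: np.ndarray, v: np.ndarray) -> torch.Tensor:
    yuv = upsample_uv(y, u, v)                            # 3 x H x W pixels
    coeffs = np.stack([blockwise_dct(c) for c in yuv])    # 3 x H x W coefficients
    return torch.from_numpy(coeffs).float().unsqueeze(0)  # 1 x 3 x H x W model input
```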
  • The image processing method provided by the embodiment of the present invention exploits the similarity between the transform processing in the image encoder's encoding process and deep-learning feature extraction: the frequency domain information output by the transform stage of the image encoder is used as the input of the neural network model, combining the time-frequency transform part of the encoding process with the neural network model, so that a new neural network model with fewer layers can be used to extract image features.
  • This reduces the amount of data to be processed, for example by reducing the number of weights to train, which shortens both the training time of the network and the optimization time of the image encoder, saving time.
  • The neural network model provided by the embodiments of the present invention can automatically establish the mapping between image features and optimal encoding parameters, so that optimal encoding parameters can be selected adaptively according to the input image to be processed, guiding the image encoder in image encoding, simplifying encoding parameter optimization, and applying deep learning to the task of image encoder parameter optimization.
  • The neural network model can adaptively generate the corresponding optimal encoding parameters for each image to be processed. While improving the efficiency of image encoder parameter optimization, it optimizes for the characteristics of each image and takes the differences between images into account, so the encoding performance of the image encoder is improved and the image quality after decoding can be greatly improved.
  • Fig. 10 is a schematic structural diagram of an image processing device provided by an exemplary embodiment of the present invention.
  • The image processing device may include a memory 101 and a processor 102.
  • The memory 101 may be used to store program code.
  • The processor 102 can call the program code; when the program code is executed, it can be used to perform the following operations: acquiring frequency domain information of the image to be processed, the frequency domain information being obtained by an image encoder through time-frequency transform processing; processing the frequency domain information through the preset first neural network model to obtain the first encoding parameter of the image to be processed; and sending the first encoding parameter to the image encoder, so that the image encoder encodes the image to be processed according to the first encoding parameter.
  • The image processing device acquires the frequency domain information of the image to be processed, the frequency domain information being obtained through the time-frequency transform processing of the image encoder; processes the frequency domain information through the preset first neural network model to obtain the first encoding parameter of the image to be processed; and sends the first encoding parameter to the image encoder, so that the image encoder encodes the image to be processed according to the first encoding parameter.
  • In this way, the first neural network model automatically optimizes the first encoding parameter of the image to be processed.
  • An optimal first encoding parameter is thereby selected for the image encoder, which improves the encoder's efficiency and performance and achieves a higher-quality encoding result at the same compression rate; this further yields a better decoding result under the same evaluation index, effectively combining deep learning with the internal structure of the image encoder.
  • The embodiment of the invention thus discloses an image encoder parameter optimization method based on frequency domain characteristics, which can be applied to products involving image and video compression coding.
  • The first neural network model may include N arithmetic units connected in sequence.
  • When processing the frequency domain information through the preset first neural network model to obtain the first encoding parameter of the image to be processed, the processor may be used to: input the frequency domain information to the nth arithmetic unit of the first neural network model, and output the first encoding parameter through the Nth arithmetic unit of the first neural network model.
  • Here, N ≥ n, and n is a positive integer greater than or equal to 2.
  • The first neural network model may include at least one of the following: the VGG-16 model, the VGG-19 model, the ResNet model, and the GoogLeNet model. For example, n = 2.
  • In an embodiment, the processor may also be used to train the first neural network model with a preset first training data set, where the first training data set includes the frequency domain information of a number of images whose first encoding parameters have been labeled.
  • The processor may obtain the first training data set by executing the following steps: performing time-frequency transform on a number of images whose first encoding parameters have been labeled, to obtain their frequency domain information; and forming the first training data set from the frequency domain information of these labeled images.
  • In an embodiment, the processor may be further configured to: if the image to be processed is in YUV format, determine the dimensions of the U channel, the V channel, and the Y channel of the image to be processed; and if the dimensions of the U channel and the V channel are inconsistent with the dimension of the Y channel, perform a preprocessing operation on the image to be processed, so that the dimensions of the U channel and the V channel of the image to be processed are consistent with the dimension of the Y channel.
  • In another embodiment, the processor may be further configured to: if the image to be processed is in a preset format, perform a preprocessing operation on the image to be processed, so that the dimensions of the U channel and the V channel of the image to be processed are consistent with the dimension of the Y channel.
  • For example, when the preset format is the YUV422 format or the YUV420 format, the processor performs a preprocessing operation on the image to be processed, so that the dimensions of the U channel and the V channel of the image to be processed are consistent with the dimension of the Y channel.
  • When performing the preprocessing operation on the image to be processed so that the dimensions of the U channel and the V channel are consistent with the dimension of the Y channel, the processor may be used to: perform an up-sampling operation on the U channel and the V channel of the image to be processed, so that the dimensions of the Y, U, and V channels of the image to be processed are the same.
  • When performing the up-sampling operation on the U channel and the V channel, the processor may be used to: perform a bilinear interpolation operation on the U channel and the V channel of the image to be processed, so that the dimensions of the Y, U, and V channels of the image to be processed are the same.
  • Alternatively, when performing the preprocessing operation, the processor may be used to: perform a down-sampling operation on the Y channel of the image to be processed, so that the dimensions of the Y, U, and V channels of the image to be processed are the same.
  • When acquiring the frequency domain information of the image to be processed through the time-frequency transform processing of the image encoder, the processor may be used to: perform DCT transform respectively on the Y, U, and V channels of the image to be processed to generate the frequency domain information of the Y, U, and V channels; or perform DWT transform respectively on the Y, U, and V channels of the image to be processed to generate the frequency domain information of the Y, U, and V channels.
  • Before processing the frequency domain information through the preset first neural network model, the processor may also be used to: concatenate the frequency domain information of the Y, U, and V channels of the image to be processed.
  • The first encoding parameter includes at least one of a typical quantization parameter design, a quantization table design, a feature transform accuracy design, and a rate control proportion design.
  • When sending the first encoding parameter to the image encoder so that the image encoder encodes the image to be processed according to the first encoding parameter, the processor may be used to: send the first encoding parameter to the image encoder, so that the image encoder performs quantization processing and entropy coding processing on the frequency domain information of the image to be processed according to the first encoding parameter.
  • Alternatively, the processor may be used to: send the first encoding parameter to the image encoder, so that the image encoder performs quantization processing on the frequency domain information according to the first encoding parameter and generates the quantization information of the image to be processed.
  • In this case, the processor may also be used to: acquire the quantization information of the image to be processed; process the quantization information through a preset second neural network model to obtain the second encoding parameter of the image to be processed; and send the second encoding parameter to the image encoder, so that the image encoder encodes the image to be processed according to the second encoding parameter.
  • When sending the second encoding parameter to the image encoder so that the image encoder encodes the image to be processed according to the second encoding parameter, the processor may be used to: send the second encoding parameter to the image encoder, so that the image encoder performs entropy coding processing on the quantization information according to the second encoding parameter.
  • The second neural network includes M arithmetic units connected in sequence. When processing the quantization information through the preset second neural network model to obtain the second encoding parameter of the image to be processed, the processor may be used to: input the quantization information to the mth arithmetic unit of the second neural network model, and output the second encoding parameter through the Mth arithmetic unit of the second neural network model, where M ≥ m and m is a positive integer greater than or equal to 2.
  • The second neural network model may include at least one of the following: the VGG-16 model, the VGG-19 model, the ResNet model, and the GoogLeNet model.
  • The second encoding parameter may include at least one of a typical quantization parameter design, a quantization table design, and a rate control proportion design.
  • In an embodiment, the processor may be further configured so that the image encoder encodes the image to be processed according to the first encoding parameter to obtain the code stream information of the image to be processed.
  • The processor may be further configured to: use an image decoder to perform a decoding operation on the code stream information to obtain a reconstructed image to be processed.
  • This embodiment also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the image processing method described in the foregoing embodiments.
  • The disclosed device and method may be implemented in other ways.
  • The device embodiments described above are only illustrative.
  • The division into units is only a logical functional division; in actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • The displayed or discussed mutual coupling, direct coupling, or communication connection may be indirect coupling or a communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
  • The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • The functional units in the various embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
  • The above-mentioned integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
  • The above-mentioned integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium.
  • The above-mentioned software functional unit is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor execute part of the steps of the methods described in the various embodiments of the present invention.
  • The aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Embodiments of the present invention provide an image processing method and device, and a computer-readable storage medium. The image processing method comprises: acquiring frequency domain information of an image to be processed, the frequency domain information being obtained by an image encoder performing time-frequency transform processing; processing the frequency domain information by means of a preset first neural network model to obtain a first encoding parameter of the image to be processed; and sending the first encoding parameter to the image encoder, so that the image encoder encodes the image to be processed according to the first encoding parameter. According to the embodiments of the present invention, an image encoder and a first neural network model are combined effectively, improving image encoding efficiency and performance.
PCT/CN2019/089588 2019-05-31 2019-05-31 Image processing method and device, and computer-readable storage medium WO2020237646A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2019/089588 WO2020237646A1 (fr) 2019-05-31 2019-05-31 Image processing method and device, and computer-readable storage medium
CN201980008045.4A CN111630570A (zh) 2019-05-31 2019-05-31 Image processing method and device, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/089588 WO2020237646A1 (fr) 2019-05-31 2019-05-31 Image processing method and device, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2020237646A1 true WO2020237646A1 (fr) 2020-12-03

Family

ID=72261321

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/089588 WO2020237646A1 (fr) 2019-05-31 2019-05-31 Image processing method and device, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN111630570A (fr)
WO (1) WO2020237646A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112669202A (zh) * 2020-12-25 2021-04-16 北京达佳互联信息技术有限公司 Image processing method and apparatus, electronic device, and computer-readable storage medium
CN112749802A (zh) * 2021-01-25 2021-05-04 深圳力维智联技术有限公司 Training method and apparatus for a neural network model, and computer-readable storage medium
CN115412731A (zh) * 2021-05-11 2022-11-29 北京字跳网络技术有限公司 Video processing method, apparatus, device, and storage medium

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112330567B (zh) * 2020-11-23 2023-07-21 中国建设银行股份有限公司 Image processing method and apparatus
CN115225200A (zh) * 2021-04-21 2022-10-21 华为技术有限公司 Data processing method and apparatus
CN113643261B (zh) * 2021-08-13 2023-04-18 江南大学 Method for diagnosing chest and lung diseases based on a frequency attention network
CN113691818B (zh) * 2021-08-25 2023-06-30 深圳龙岗智能视听研究院 Video object detection method and system, storage medium, and computer vision terminal
CN114745556B (zh) * 2022-02-07 2024-04-02 浙江智慧视频安防创新中心有限公司 Encoding method and apparatus, digital retina system, electronic device, and storage medium
CN116600106B (zh) * 2023-05-18 2024-04-09 深圳聚源视芯科技有限公司 Image compression method and system with dynamically adjustable compression rate
CN116506622B (zh) * 2023-06-26 2023-09-08 瀚博半导体(上海)有限公司 Model training method, and video encoding parameter optimization method and apparatus

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160360202A1 (en) * 2015-06-05 2016-12-08 Sony Corporation Banding prediction for video encoding
CN109286825A (zh) * 2018-12-14 2019-01-29 北京百度网讯科技有限公司 Method and apparatus for processing video
CN109325595A (zh) * 2018-10-23 2019-02-12 天津天地伟业信息系统集成有限公司 JPEG self-learning quantization method and apparatus based on traffic scenes
CN109819252A (zh) * 2019-03-20 2019-05-28 福州大学 Quantization parameter cascading method independent of GOP structure

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3545679B1 (fr) * 2016-12-02 2022-08-24 Huawei Technologies Co., Ltd. Apparatus and method for encoding an image
KR102301232B1 (ko) * 2017-05-31 2021-09-10 삼성전자주식회사 Method and apparatus for processing multi-channel feature map images

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160360202A1 (en) * 2015-06-05 2016-12-08 Sony Corporation Banding prediction for video encoding
CN109325595A (zh) * 2018-10-23 2019-02-12 天津天地伟业信息系统集成有限公司 JPEG self-learning quantization method and apparatus based on traffic scenes
CN109286825A (zh) * 2018-12-14 2019-01-29 北京百度网讯科技有限公司 Method and apparatus for processing video
CN109819252A (zh) * 2019-03-20 2019-05-28 福州大学 Quantization parameter cascading method independent of GOP structure

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112669202A (zh) * 2020-12-25 2021-04-16 北京达佳互联信息技术有限公司 Image processing method and apparatus, electronic device, and computer-readable storage medium
CN112669202B (zh) * 2020-12-25 2023-08-08 北京达佳互联信息技术有限公司 Image processing method and apparatus, electronic device, and computer-readable storage medium
CN112749802A (zh) * 2021-01-25 2021-05-04 深圳力维智联技术有限公司 Training method and apparatus for a neural network model, and computer-readable storage medium
CN112749802B (zh) * 2021-01-25 2024-02-09 深圳力维智联技术有限公司 Training method and apparatus for a neural network model, and computer-readable storage medium
CN115412731A (zh) * 2021-05-11 2022-11-29 北京字跳网络技术有限公司 Video processing method, apparatus, device, and storage medium

Also Published As

Publication number Publication date
CN111630570A (zh) 2020-09-04

Similar Documents

Publication Publication Date Title
WO2020237646A1 (fr) Image processing method and device, and computer-readable storage medium
Cheng et al. Learned image compression with discretized gaussian mixture likelihoods and attention modules
CN111641832B (zh) Encoding method, decoding method, apparatus, electronic device, and storage medium
CN113259676B (zh) Image compression method and apparatus based on deep learning
CN111641826B (zh) Method, apparatus, and system for encoding and decoding data
US11178430B2 (en) Adaptive DCT sharpener
CN113079378B (zh) Image processing method and apparatus, and electronic device
CN110753225A (zh) Video compression method, apparatus, and terminal device
WO2023130333A1 (fr) Encoding and decoding method, encoder, decoder, and storage medium
CN114449276B (zh) Learning-based hyperprior side information compensation image compression method
CN113822147A (zh) Deep compression method for collaborative machine semantic tasks
Li et al. Multiple description coding based on convolutional auto-encoder
Fu et al. An extended hybrid image compression based on soft-to-hard quantification
CN116600119B (zh) Video encoding and decoding method and apparatus, computer device, and storage medium
CN111080729B (zh) Method and system for constructing a training picture compression network based on the attention mechanism
RU2683614C2 (ru) Encoder, decoder, and operating method using interpolation
JP2014521273A (ja) Method and apparatus for encoding an image
CN115294222A (zh) Image encoding method, image processing method, terminal, and medium
CN114882133B (zh) Image encoding and decoding method, system, device, and medium
CN117459737B (zh) Training method for an image preprocessing network and image preprocessing method
CN116916034B (zh) SAFD-based image processing method, apparatus, device, and storage medium
CN117676149B (zh) Image compression method based on frequency-domain decomposition
CN115914630B (zh) Image compression method, apparatus, device, and storage medium
WO2023051223A1 (fr) Filtering method and apparatus, encoding method and apparatus, decoding method and apparatus, computer-readable medium, and electronic device
Sachdeva et al. A Review on Digital Image Compression Techniques

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19930884

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19930884

Country of ref document: EP

Kind code of ref document: A1