WO2020047756A1

WO2020047756A1 - Image encoding method and apparatus

Info

Publication number: WO2020047756A1
Application number: PCT/CN2018/104022
Authority: WO
Inventors: 赵文军; 牛兵兵
Original assignee: 深圳市大疆创新科技有限公司
Priority date: 2018-09-04
Filing date: 2018-09-04
Publication date: 2020-03-12
Also published as: CN110870310A

Abstract

Provided are an image encoding method and apparatus. The method comprises: acquiring an image to be encoded; obtaining, according to the image to be encoded and by means of a trained neural network model, an optimal encoding parameter corresponding to the image to be encoded; and encoding the image to be encoded according to the optimal encoding parameter corresponding to the image to be encoded. The processing method involves automatically selecting, by using a trained neural network model, an optimal encoding parameter corresponding to an image to be encoded, and then encoding the image to be encoded by using the optimal encoding parameter corresponding to the image to be encoded, wherein same takes the difference between images to be encoded into full consideration, and may select the best matched encoding parameter for each of the images to be encoded, thereby improving the encoding quality of the image to be encoded.

Description

Image coding method and device

Copyright statement

The content disclosed in this patent document contains material which is subject to copyright protection. The copyright is owned by the copyright owner. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the official records and archives of the Patent and Trademark Office.

Technical field

The present application relates to the field of image encoding and decoding, and more particularly, to an image encoding method and device.

Background technique

With the rapid development of image or video technology, how to improve the quality of image encoding and decoding has become a hot issue that people care about.

For image codec quality, the selection of encoding parameters is particularly critical. Traditional encoding and decoding technologies usually use fixed encoding parameters to encode the image to be encoded, or manually select encoding parameters of the image to be encoded, resulting in poor encoding quality of the image to be encoded.

Summary of the Invention

The present application provides an image encoding method and device, which can improve the encoding quality of an image to be encoded.

According to a first aspect, an image encoding method is provided, including: acquiring an image to be encoded; and obtaining an optimal encoding parameter corresponding to the image to be encoded according to the image to be encoded and a trained neural network model; The optimal encoding parameter corresponding to the encoded image encodes the image to be encoded.

According to a second aspect, an image encoding device is provided, including: a memory for storing a program; a processor for executing the program stored in the memory to perform the following operations: obtaining an image to be encoded; and according to the image to be encoded Obtaining the optimal encoding parameter corresponding to the image to be encoded by using the trained neural network model; and encoding the image to be encoded according to the optimal encoding parameter corresponding to the image to be encoded.

According to a third aspect, there is provided an image encoding apparatus including a module for performing the method of the first aspect.

In a fourth aspect, there is provided a computer-readable storage medium having stored thereon instructions for performing the method in the first aspect.

In a fifth aspect, a computer program product is provided including instructions for performing the method in the first aspect.

This application first uses the trained neural network model to automatically select the optimal encoding parameters corresponding to the image to be encoded, and then uses the optimal encoding parameters corresponding to the image to be encoded to encode the image to be encoded. The above image encoding method fully considers the differences between the images to be encoded, and selects the encoding parameters that most closely match the images to be encoded, thereby improving the encoding quality of the images to be encoded.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of a conventional image coding process.

FIG. 2 is a schematic flowchart of an image coding method according to an embodiment of the present application.

FIG. 3 is a schematic diagram of an image encoding process provided by an embodiment of the present application.

FIG. 4 is an example diagram of a processing manner of a neural network model provided by an embodiment of the present application for an image to be encoded.

FIG. 5 is a schematic structural diagram of a neural network model according to an embodiment of the present application.

FIG. 6 is a schematic diagram of an implementation process of spatial pyramid pooling provided by an embodiment of the present application.

FIG. 7 is an example diagram of an input image format and a configuration manner of the number of channels provided by an embodiment of the present application.

FIG. 8 is a schematic diagram of a training process of a neural network model according to an embodiment of the present application.

FIG. 9 is a schematic flowchart of a training step of a neural network model according to an embodiment of the present application.

FIG. 10 is a schematic structural diagram of an image encoding device according to an embodiment of the present application.

detailed description

The embodiments of the present application can be applied to standard or non-standard image or video encoders. For example, it is applicable to JPEG (joint photograthic experts), JPEG2000, H.264, H.265 and other standard encoders.

In order to facilitate understanding, the traditional image encoding process (or video encoding process) is briefly introduced with reference to FIG. 1.

As shown in FIG. 1, after receiving an image to be encoded, the traditional image encoding process usually includes processes such as transform 12, quantization 14, and entropy encoding 16, and finally output a code stream of the image to be encoded. The decoder usually decodes the received bitstream according to the inverse process of the above process to recover the image information before encoding.

The encoding parameters mentioned in the embodiments of the present application may refer to any parameters that need to be used in the encoding process, and may include, for example, one or more of the following parameters: a parameter for indicating a transformation mode of an image to be encoded (or a transformation process Related parameters), parameters used to indicate the quantization mode of the image to be encoded (or related parameters of the quantization process), and parameters used to indicate the entropy encoding mode of the image to be encoded (or related parameters of the entropy encoding process).

The relevant parameters of the transformation process may include, for example, parameters indicating a transformation manner and / or a precision of the transformation. The changing mode may include a discrete cosine transform (DCT), a discrete wavelet transform (DWT), and the like.

The relevant parameters of the quantization process may include, for example, a parameter indicating a selection manner of the quantization parameter and a parameter indicating a design manner of the quantization table.

The relevant parameters of the entropy encoding process may include, for example, parameters indicating an entropy encoding mode, parameters indicating an estimation of a probability distribution of the entropy encoding, and the like. The entropy coding method may be, for example, Shannon coding, Huffman coding, or arithmetic coding.

In the traditional image encoding process, the value of the encoding parameters is generally fixed, that is, regardless of whether the images to be encoded are the same, they are encoded using a uniform encoding parameter. The appropriate encoding parameters are not selected for the differences of each image, resulting in encoding. The generality of the parameters is poor. Some manufacturers have proposed optimization schemes for coding parameters, but most of these optimization schemes require manual optimization of coding parameters. Manual coding parameter optimization requires higher professional quality of coding personnel, which is time-consuming and labor-intensive. Complex issues.

Considering the differences between images, the embodiment of the present application proposes an image coding scheme that uses the neural network model to adaptively select the optimal coding parameters for the image to be coded from the perspective of deep learning. With a view to improving the encoding quality of images. In order to facilitate understanding, before introducing the image coding methods provided in the embodiments of the present application in detail, the concepts of deep learning and neural network models are briefly introduced.

Deep learning originated from the study of neural networks. In the 1960s, inspired by neuroscience research on the structure of the human brain, in order to make machines have the same intelligence as humans, artificial neural networks were proposed to simulate the process of data processing by the human brain.

In the mid-1980s, the backpropogation (BP) algorithm was proposed to provide a way to learn neural network models with multiple hidden layer structures, allowing the rapid development of neural network research. However, since traditional neural networks are basically fully connected networks, there are too many network parameters in the case of large input dimensions and it is difficult to train. For this reason, research on neural networks for high-dimensional data has been stagnant. However, with the development of convolutional neural network (CNN) and other neural network models, it has solved the problem of too many neural network model parameters and difficult to train, and it has also made neural network models more and more applied in various fields. .

At present, neural network models based on deep learning are widely used in image fields, such as object detection and face recognition in images, and have achieved great success. Compared with traditional image detection algorithms, deep learning-based neural network models do not need to manually select features. Instead, the neural network model is trained by learning to extract image features, and then use the extracted image features for subsequent decision-making to achieve Classification and recognition of images.

The embodiment of the present application uses a neural network model to optimize image coding parameters, so as to improve the image coding quality.

The image encoding method provided by the embodiment of the present application is described in detail below with reference to FIG. 2.

In step 22, an image to be encoded is acquired.

The image format of the image to be encoded is not specifically limited in the embodiment of the present application, and may be an image in a YUV format or an image in an RGB format.

In step 24, according to the image to be encoded, an optimal coding parameter corresponding to the image to be encoded is obtained through the trained neural network model.

The aforementioned neural network model may be used to indicate a mapping relationship between an image to be encoded and its corresponding optimal encoding parameter. For example, the neural network model can directly output the optimal encoding parameters corresponding to the image to be encoded; for another example, the neural network model can output the preset probability of each candidate encoding parameter as the optimal encoding parameter, and then a parameter selection module ( Or parameter selection step) from which the candidate encoding parameter with the highest probability is selected as the optimal encoding parameter corresponding to the image to be encoded.

The embodiment of the present application does not specifically limit the structure of the neural network model, for example, it may be a CNN, a recurrent neural network (RNN), or a fully convolutional network (FCN). The embodiment of the present application does not specifically limit the training method of the neural network model provided in the embodiment of the present application. The detailed description will be given below in combination with specific embodiments, which will not be described in detail here.

In step 26, the image to be encoded is encoded according to the optimal encoding parameter corresponding to the image to be encoded.

In the above steps, the encoding of the image to be encoded may be implemented by using a standard encoder (such as an encoder supporting the JPEG or H264 standard), or by using a non-standard encoder, which is not limited in this embodiment of the present application.

The embodiment of the present application first uses the trained neural network model to automatically select the optimal encoding parameter corresponding to the image to be encoded, and then uses the optimal encoding parameter corresponding to the image to be encoded to encode the image to be encoded. The above image encoding method fully considers the differences between the images to be encoded, and selects the encoding parameters that most closely match the images to be encoded, thereby improving the encoding quality of the images to be encoded.

For encoders, it is often difficult to establish the mapping relationship between the image to be encoded and the optimal encoding parameters through theoretical derivation. This embodiment of the present application uses a deep learning-based method and uses a trained neural network model to encode each image to be encoded. The parameters are optimized, thereby establishing the mapping relationship between the image to be encoded and the optimal encoding parameters, solving the problem of poor universality of traditional encoding parameters, and optimizing the encoding framework.

This embodiment of the present application does not specifically limit the timing of performing steps 22 to 26.

Optionally, as an implementation manner, steps 22 to 26 may be performed online (that is, steps 22 to 26 are performed once for each frame of an image to be encoded). The following describes this implementation manner in detail with reference to FIG. 3.

Referring to FIG. 3, when an image to be encoded is received, the image to be encoded may be input into a neural network model and an encoder, respectively. The neural network model can calculate the probability (or probability estimation) of the preset candidate encoding parameters as the optimal encoding parameters according to the characteristics of the image to be encoded, and output the calculated probability to a selection module, which can be selected from the selection module. The candidate encoding parameter with the highest probability is used as the optimal encoding parameter corresponding to the image to be encoded, and the optimal encoding parameter corresponding to the image to be encoded is sent to the encoder. The encoder can be a standard encoder or a non-standard encoder. The encoder can encode an image to be encoded according to an optimal encoding parameter corresponding to the image to be encoded to obtain a code stream of the image to be encoded. Then, the encoder can send the code stream to the decoding end, and the decoder at the decoding end decodes the image to obtain a decoded image.

Optionally, as another implementation manner, step 22 to step 24 may be performed offline, and then step 26 may be performed online. For example, assuming that the image to be encoded contains a group of images, a target image (such as the first frame of the image to be encoded) can be selected from this group of images, and the optimal corresponding to the target image is determined offline using steps 22 to 24 Encoding parameters. Then, the encoding parameters of the encoder can be set to the optimal encoding parameters corresponding to the target image, and each frame image in the to-be-encoded image is encoded online. The advantage of this implementation is that there is no need to adjust the structure of the encoder or the encoding process, and it has good compatibility with traditional encoding methods.

The structure of the neural network model or the operations it needs to implement can be flexibly arranged as required, which is not limited in the embodiments of the present application. In the following, the structure of the neural network model or the operations that it needs to implement are described in detail in combination with specific embodiments.

Optionally, the neural network model can be used to perform the operations shown in Figure 4:

Step 42: Perform feature extraction on the image to be encoded to obtain a feature vector.

There can be multiple feature extraction methods for the image to be encoded. As shown in FIG. 5, the neural network model may include multiple convolution and down-sampling layers (or convolution layers) for extracting feature vectors of an image to be encoded. Alternatively, the features of the image to be encoded may also be extracted by other methods. For example, traditional feature extraction methods such as pyramid decomposition or principal component analysis (PCA) can be used to extract the features of the image to be encoded. Alternatively, in some embodiments, a traditional feature extraction manner may be combined with a feature extraction manner based on a convolution operation. After the feature extraction and downsampling operations, a feature vector of the image to be encoded can be obtained, and the feature vector can be regarded as a high-dimensional feature representation of the image to be encoded.

Step 44: Determine the optimal encoding parameter corresponding to the image to be encoded according to the feature vector.

There may be multiple implementations of step 44.

Optionally, as an embodiment, if the size (or resolution) of the image to be encoded is fixed, the feature vector may be directly input to the fully connected layer of the neural network model, and then the candidate encoding parameters are calculated using the fully connected layer to be optimal Probability of encoding parameters, thereby determining an optimal encoding parameter corresponding to an image to be encoded.

Optionally, as another embodiment, it is assumed that the neural network model is a neural network model formed by a convolution and downsampling layer and a fully connected layer as shown in FIG. 5, because the convolution and downsampling layer can receive an arbitrary size (or If the size of the image to be encoded is not fixed, the size of the feature map output by the convolution and downsampling layers is also not fixed. In this case, as shown in Figure 5, a spatial pyramid pooling layer can be set between the convolutional layer and the fully connected layer, and the feature maps output by the convolutional and downsampling layers can be processed to obtain a fixed-dimensional Feature vector, and then input a fixed-dimensional feature vector to the fully connected layer to obtain the probability that each candidate encoding parameter is the optimal encoding parameter, thereby determining the optimal encoding parameter corresponding to the image to be encoded. The introduction of the spatial pyramid pooling layer enables the neural network model provided by the embodiments of the present application to be able to process images to be encoded in any size (or resolution). Of course, other methods can also be used to convert the feature map output by the convolution layer into a feature vector with a fixed number of dimensions. This embodiment of the present application is not limited to this. The use of the spatial pyramid pooling layer can better avoid the image to be encoded Loss of critical information.

The embodiment of the present application does not limit the specific implementation manner of the spatial pyramid pooling layer. A possible implementation manner is given below with reference to FIG. 6.

As shown in Figure 6, for the feature maps (or high-dimensional hidden feature maps) output through the convolution and downsampling layers, the spatial pyramid pooling layer can block them at three scales as shown in Figure 6, and then A feature is extracted in each block, so that the feature dimensions of the output of feature maps of any size are the same, which effectively solves the requirement of inputting feature vectors of fixed dimensions in the fully-connected layer later.

The fully connected layer can use the input feature vector to generate the probability that each candidate encoding parameter is the optimal encoding parameter, thereby determining the optimal encoding parameter of the image to be encoded. The fully connected layer is equivalent to mapping the hidden features of the image to be encoded into the space of the optimal encoding parameter (or optimal encoding parameter mode) of the encoder, thereby generating the optimal encoding parameter of the image to be encoded.

The above mainly describes the process of using the neural network model. Before using the neural network model, it usually needs to be trained to obtain the parameters of the neural network model. The following describes the training process of the neural network model in combination with specific embodiments.

The training process of the neural network model can be performed based on the training samples. The training samples can be input images with known optimal encoding parameters. The format of the input image may be an RGB format or a YUV format, which is not limited in this embodiment of the present application.

The input image data can be original image data or image data after preprocessing. For example, the data of the input image may be normalized data. The normalized processing of the input image data can improve the convergence performance and feature expression ability of the neural network model.

Taking the format of the original image as YUV (for example, YUV444, YUV422, YUV420) as an example, for the original image f (x, y), the components of the original image can be normalized in the following manner to obtain the input image:

Neural network models generally use one or more channels to process images. Taking an image in the RGB format as an example, a neural network model usually uses three channels to process the color components R, G, and B, respectively. Taking an image in the YUV format as an example, a neural network model usually uses three channels to separately color the components Y, G, and B. U, V for processing.

In order to reduce the training complexity of the neural network model and improve the convergence performance of the neural network model, the data of some color components of the input image used for training can be merged together, so that the data of the input image is reduced from three-channel data to double Channel data or single channel data. For example, the data of the input image may be dual-channel data, and one of the two-channel data includes data corresponding to two color components of the input image. As another example, the data of the input image may be single-channel data, and the single-channel data includes data corresponding to each color component of the input image.

The number of channels of the input image can be determined based on the format of the input image or the down-sampling method of the color components of the input image. Taking FIG. 7 as an example, for an input image in the YUV444 format, since the amount of data of each color component is the same, the data of such an input image can still maintain three channels of data. For the input image of YUV422 format, since its color component U and color component V are down-sampled in the horizontal direction, as shown in FIG. 7, the color component U and the color component V can be spliced in the horizontal direction so that the input image The data is dual-channel data. For the input image of YUV420 format, because its color component U and color component V are down-sampled in both horizontal and vertical directions, the color component U and color component V are reduced to the data amount of color component Y in both horizontal and vertical directions. Therefore, in order to facilitate training, as shown in FIG. 7, the data of the color component U and the color component V can be stitched and then combined with the color component Y (such as being placed below the color component Y) so that the input image The data is single-channel data.

The following describes the training process of the neural network model with reference to FIG. 8.

First, a certain number of candidate coding parameters can be set in advance. The embodiment of the present application does not specifically limit the number and selection method of candidate coding parameters, and may be selected based on experience or experiments. For example, multiple sets of coding parameters may be obtained first based on experience; then, optionally, the multiple sets of candidate coding parameters may be randomly modified to generate modified coding parameters; then, a peak signal to noise ratio (peak signal to noise) may be used ratio, PSNR) or other evaluation methods to evaluate the coding performance of these coding parameters, and select the coding parameter with the best coding performance as a candidate coding parameter. The number of candidate coding parameters may be set according to actual needs, or may be directly set to a fixed value. For example, the number of candidate coding parameters may be set to 27.

As shown in FIG. 8, for an input image, it can be input to a neural network model to obtain the probability that each candidate encoding parameter is the optimal encoding parameter. Taking the number of candidate coding parameters as 27 as an example, the probability in FIG. 8 may be a 27 × 1 vector, which corresponds to the 27 groups of candidate coding parameters one by one. The value of each element of the vector can be a number between 0 and 1, which is used to indicate the probability that a set of candidate coding parameters corresponding to the element is the optimal coding parameter. The probability that each candidate coding parameter output by the neural network model is the optimal coding parameter can be called the true value output by the neural network model.

Since the optimal encoding parameters of the input image are known in advance, the values of the optimal encoding parameters corresponding to the input image can be set to 1 and the values of the remaining encoding parameters are set to 0 to obtain the theoretical value output by the neural network.

Then, the parameters of the neural network model can be adjusted according to the deviation between the real value output from the neural network model and the theoretical value, so that the real value output by the neural network model is as close to the theoretical value as possible, thereby implementing the training of the neural network model.

The training process of the neural network model can be seen as a process of continuous iteration, so that the real value and the theoretical value of the output of the neural network model are continuously approaching. The termination of the iterative process can be determined by a loss function (or cost function).

The embodiment of the present application does not specifically limit the type of the loss function used in the training process of the neural network model, and may be a minimum squared-error (MSE) function or a cross-entropy function.

The cross-entropy function can be defined as follows:

Among them, y ′ is the actual output, y is the ideal output, and n is the number of candidate encoding parameters.

Compared to the MSE function, the cross-entropy function converges more easily.

The resolution of the input image used for training can be selected according to actual needs, an image with a fixed resolution can be selected, or an image with a different resolution can be selected.

In order to enable the trained neural network model to support the optimization of encoding parameters of images with multiple resolutions, the input image can be selected as an image with multiple resolutions. These input images can be randomly input into the neural network model for training, or they can be input into the neural network model for training in a certain order.

For example, the input image may include a first image set and a second image set. Each image in the first image set has the same resolution, and the resolution of the images in the second image set is different from the resolution of the images in the first image set. The images in the second image set may have the same or different resolutions, which is not limited in this embodiment of the present application.

When training a neural network model, see Figure 9, which can be performed in two steps.

In step 92, the neural network model is trained using the images in the first image set to obtain the parameters of the neural network model.

The training phase corresponding to this step can be called a single-resolution image training phase, the purpose is to be able to quickly train a neural network model based on a resolution image.

In step 94, the images in the second image set are used to modify the parameters of the neural network model.

The training phase corresponding to this step can be called a multi-resolution image training phase, and the purpose is to fine-tune the parameters of the neural network model generated in step 92, so that the neural network model can be used to process images of other resolutions.

The two-step training method shown in FIG. 9 can make the parameters of the neural network model quickly converge, thereby improving the training efficiency of the neural network model.

The method embodiments of the present application are described in detail above with reference to FIGS. 1 to 9, and the device embodiments of the present application are described in detail below with reference to FIG. 10. It should be understood that the device embodiments correspond to the method embodiments. Therefore, for the parts that are not described in detail, reference may be made to the foregoing method embodiments.

FIG. 10 is a schematic structural diagram of an image encoding device according to an embodiment of the present application. The image encoding device 1000 includes a memory 1010 and a processor 1020.

The memory 1010 may be used to store a program. The processor 1020 may be configured to execute a program stored in the memory to perform the following operations: obtaining an image to be encoded; and obtaining an optimal encoding corresponding to the image to be encoded according to the image to be encoded by using a trained neural network model. Parameters; encoding the image to be encoded according to an optimal encoding parameter corresponding to the image to be encoded.

Optionally, the neural network model is used to perform feature extraction on the image to be encoded to obtain a feature vector; and according to the feature vector, determine an optimal encoding parameter corresponding to the image to be encoded.

Optionally, determining the optimal encoding parameter corresponding to the image to be encoded according to the feature vector may include generating an output vector according to the feature vector, where the output vector is used to represent multiple preset presets. The probability that each candidate encoding parameter is an optimal encoding parameter; and based on the probability that each of the plurality of candidate encoding parameters is an optimal encoding parameter, the one with the highest probability is selected from the plurality of candidate encoding parameters as the corresponding one of the image to be encoded Optimal encoding parameters.

Optionally, before acquiring the image to be encoded, the processor 1020 may be further configured to perform the following operation: training the neural network model according to an input image whose optimal encoding parameters are known.

Optionally, the training process of the neural network model may use a cross-entropy function as a loss function.

Optionally, the data of the input image may be normalized data.

Optionally, the data of the input image may be dual-channel data, and one of the two-channel data may include data corresponding to two color components of the input image.

Optionally, the data of the input image may be single-channel data, and the single-channel data may include data corresponding to each color component of the input image.

Optionally, the data of the input image may be data in YUV or RGB format.

Optionally, the input image may include a first image set and a second image set, each image in the first image set has the same resolution, and the resolution of the images in the second image set is different from The resolution of the images in the first image set, and the training of the neural network model according to the input image whose optimal encoding parameters are known may include: using the images in the first image set to train the neural network The network model is trained to obtain the parameters of the neural network model; the images in the second image set are used to modify the parameters of the neural network model.

Optionally, the encoding the image to be encoded according to the optimal encoding parameter corresponding to the image to be encoded may include: using a standard or non-standard encoder according to the optimal encoding parameter corresponding to the image to be encoded. Encoding the image to be encoded.

Optionally, the optimal encoding parameter corresponding to the image to be encoded may include at least one of the following: a parameter for indicating a transformation manner of the image to be encoded, and a parameter for indicating a quantization manner of the image to be encoded. Parameters, and parameters used to indicate an entropy encoding manner of the image to be encoded.

In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any other combination. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present invention are wholly or partially generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be from a website site, computer, server, or data center Transmission by wire (such as coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (such as infrared, wireless, microwave, etc.) to another website site, computer, server, or data center. The computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, a data center, and the like that includes one or more available medium integration. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)), etc. .

Those of ordinary skill in the art may realize that the units and algorithm steps of each example described in connection with the embodiments disclosed herein can be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Professional technicians can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of this application.

In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are only schematic. For example, the division of the unit is only a logical function division. In actual implementation, there may be another division manner. For example, multiple units or components may be combined or Can be integrated into another system, or some features can be ignored or not implemented. In addition, the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, which may be electrical, mechanical or other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each of the units may exist separately physically, or two or more units may be integrated into one unit.

The above is only a specific implementation of this application, but the scope of protection of this application is not limited to this. Any person skilled in the art can easily think of changes or replacements within the technical scope disclosed in this application. It should be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims

An image encoding method, comprising:

Obtaining an image to be encoded;

Obtaining the optimal encoding parameters corresponding to the image to be encoded according to the trained neural network model according to the image to be encoded;

And encoding the image to be encoded according to an optimal encoding parameter corresponding to the image to be encoded.
The method according to claim 1, wherein the neural network model is used to perform feature extraction on the image to be encoded to obtain a feature vector; and according to the feature vector, determine the best corresponding to the image to be encoded. Optimal coding parameters.
The method according to claim 2, wherein the determining an optimal encoding parameter corresponding to the image to be encoded according to the feature vector comprises:

Generating an output vector according to the feature vector, where the output vector is used to indicate a probability that each of a plurality of preset candidate coding parameters is an optimal coding parameter;

According to the probability that each of the plurality of candidate encoding parameters is an optimal encoding parameter, an optimal encoding parameter corresponding to the image to be encoded is selected from among the plurality of candidate encoding parameters with a highest probability.
The method according to any one of claims 1-3, wherein before the acquiring an image to be encoded, the method further comprises:

The neural network model is trained according to an input image whose optimal encoding parameters are known.
The method according to claim 4, wherein the training process of the neural network model uses a cross-entropy function as a loss function.
The method according to claim 4 or 5, wherein the data of the input image is normalized data.
The method according to any one of claims 4-6, wherein the data of the input image is dual-channel data, and one of the two-channel data includes data corresponding to two color components of the input image .
The method according to any one of claims 4 to 6, wherein the data of the input image is single-channel data, and the single-channel data includes data corresponding to each color component of the input image.
The method according to claim 7 or 8, wherein the data of the input image is data in YUV or RGB format.
The method according to any one of claims 4-9, wherein the input image includes a first image set and a second image set, and each image in the first image set has the same resolution, The resolution of the images in the second image set is different from the resolution of the images in the first image set,

The training the neural network model based on an input image with known optimal encoding parameters includes:

Using the images in the first image set to train the neural network model to obtain parameters of the neural network model;

The images in the second image set are used to modify the parameters of the neural network model.
The method according to any one of claims 1 to 10, wherein the encoding the image to be encoded according to an optimal encoding parameter corresponding to the image to be encoded comprises:

According to an optimal encoding parameter corresponding to the image to be encoded, the standard or non-standard encoder is used to encode the image to be encoded.
The method according to any one of claims 1-11, wherein the optimal encoding parameter corresponding to the image to be encoded includes at least one of the following: a method for indicating a transformation manner of the image to be encoded A parameter for indicating a quantization manner of the image to be encoded and a parameter for indicating an entropy encoding manner of the image to be encoded.
An image encoding device, comprising:

Memory for storing programs;

A processor, configured to execute a program stored in the memory to perform the following operations:

Obtaining an image to be encoded;

Obtaining the optimal encoding parameters corresponding to the image to be encoded according to the trained neural network model according to the image to be encoded;

And encoding the image to be encoded according to an optimal encoding parameter corresponding to the image to be encoded.
The image encoding device according to claim 13, wherein the neural network model is used to perform feature extraction on the image to be encoded to obtain a feature vector; and according to the feature vector, determine that the image to be encoded corresponds to Optimal encoding parameters.
The image encoding device according to claim 14, wherein the determining an optimal encoding parameter corresponding to the image to be encoded according to the feature vector comprises:

Generating an output vector according to the feature vector, where the output vector is used to indicate a probability that each of a plurality of preset candidate coding parameters is an optimal coding parameter;

According to the probability that each of the plurality of candidate encoding parameters is an optimal encoding parameter, an optimal encoding parameter corresponding to the image to be encoded is selected from among the plurality of candidate encoding parameters with a highest probability.
The image encoding device according to any one of claims 13-15, wherein before the acquiring an image to be encoded, the processor is further configured to perform the following operations:

The neural network model is trained according to an input image whose optimal encoding parameters are known.
The image coding device according to claim 16, wherein the training process of the neural network model uses a cross-entropy function as a loss function.
The image encoding device according to claim 16 or 17, wherein the data of the input image is normalized data.
The image encoding device according to any one of claims 16 to 18, wherein the data of the input image is dual-channel data, and one of the dual-channel data includes two color components corresponding to the input image The data.
The image encoding device according to any one of claims 16 to 18, wherein the data of the input image is single-channel data, and the single-channel data includes data corresponding to each color component of the input image.
The image encoding device according to claim 19 or 20, wherein the data of the input image is data in YUV or RGB format.
The image encoding device according to any one of claims 16-21, wherein the input image includes a first image set and a second image set, and each image in the first image set has the same resolution Rate, the resolution of the images in the second image set is different from the resolution of the images in the first image set,

The training the neural network model based on an input image with known optimal encoding parameters includes:

Using the images in the first image set to train the neural network model to obtain parameters of the neural network model;

The images in the second image set are used to modify the parameters of the neural network model.
The image encoding device according to any one of claims 13 to 22, wherein the encoding the image to be encoded according to an optimal encoding parameter corresponding to the image to be encoded, comprises:

According to an optimal encoding parameter corresponding to the image to be encoded, the standard or non-standard encoder is used to encode the image to be encoded.
The image encoding device according to any one of claims 13 to 23, wherein an optimal encoding parameter corresponding to the image to be encoded includes at least one of the following: an instruction for indicating a transformation of the image to be encoded The parameter of the mode is used to indicate a parameter of the quantization mode of the image to be encoded and the parameter of the entropy encoding mode of the image to be encoded.
A computer-readable storage medium, characterized in that it stores instructions for performing the method according to any one of claims 1-12.
A computer program product, comprising instructions for performing a method according to any one of claims 1-12.