WO2020047756A1 - Image encoding method and apparatus - Google Patents
- Publication number
- WO2020047756A1 (PCT/CN2018/104022)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- encoded
- encoding
- neural network
- network model
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/12—Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/124—Quantisation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/44—Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/625—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using discrete cosine transform [DCT]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/63—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/90—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
- H04N19/91—Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
Definitions
- the present application relates to the field of image encoding and decoding, and more particularly, to an image encoding method and device.
- the present application provides an image encoding method and device, which can improve the encoding quality of an image to be encoded.
- an image encoding method including: acquiring an image to be encoded; obtaining an optimal encoding parameter corresponding to the image to be encoded according to the image to be encoded and a trained neural network model; and encoding the image to be encoded according to the optimal encoding parameter corresponding to the image to be encoded.
- an image encoding device including: a memory for storing a program; and a processor for executing the program stored in the memory to perform the following operations: obtaining an image to be encoded; obtaining the optimal encoding parameter corresponding to the image to be encoded according to the image to be encoded by using the trained neural network model; and encoding the image to be encoded according to the optimal encoding parameter corresponding to the image to be encoded.
- an image encoding apparatus including a module for performing the method of the first aspect.
- a computer-readable storage medium having stored thereon instructions for performing the method in the first aspect.
- a computer program product including instructions for performing the method in the first aspect.
- This application first uses the trained neural network model to automatically select the optimal encoding parameters corresponding to the image to be encoded, and then uses the optimal encoding parameters corresponding to the image to be encoded to encode the image to be encoded.
- the above image encoding method fully considers the differences between the images to be encoded, and selects the encoding parameters that most closely match the images to be encoded, thereby improving the encoding quality of the images to be encoded.
- FIG. 1 is a schematic diagram of a conventional image coding process.
- FIG. 2 is a schematic flowchart of an image coding method according to an embodiment of the present application.
- FIG. 3 is a schematic diagram of an image encoding process provided by an embodiment of the present application.
- FIG. 4 is an example diagram of a processing manner of a neural network model provided by an embodiment of the present application for an image to be encoded.
- FIG. 5 is a schematic structural diagram of a neural network model according to an embodiment of the present application.
- FIG. 6 is a schematic diagram of an implementation process of spatial pyramid pooling provided by an embodiment of the present application.
- FIG. 7 is an example diagram of an input image format and a configuration manner of the number of channels provided by an embodiment of the present application.
- FIG. 8 is a schematic diagram of a training process of a neural network model according to an embodiment of the present application.
- FIG. 9 is a schematic flowchart of a training step of a neural network model according to an embodiment of the present application.
- FIG. 10 is a schematic structural diagram of an image encoding device according to an embodiment of the present application.
- the embodiments of the present application can be applied to standard or non-standard image or video encoders.
- standard encoders such as JPEG (Joint Photographic Experts Group), JPEG2000, H.264, and H.265.
- after receiving an image to be encoded, the traditional image encoding process usually includes stages such as transform 12, quantization 14, and entropy encoding 16, and finally outputs a code stream of the image to be encoded.
- the decoder usually decodes the received bitstream according to the inverse process of the above process to recover the image information before encoding.
- the encoding parameters mentioned in the embodiments of the present application may refer to any parameters that need to be used in the encoding process, and may include, for example, one or more of the following: a parameter indicating the transformation mode of the image to be encoded (or related parameters of the transformation process), a parameter indicating the quantization mode of the image to be encoded (or related parameters of the quantization process), and a parameter indicating the entropy encoding mode of the image to be encoded (or related parameters of the entropy encoding process).
- the relevant parameters of the transformation process may include, for example, parameters indicating a transformation manner and / or a precision of the transformation.
- the transformation mode may include discrete cosine transform (DCT), discrete wavelet transform (DWT), and the like.
- the relevant parameters of the quantization process may include, for example, a parameter indicating a selection manner of the quantization parameter and a parameter indicating a design manner of the quantization table.
- the relevant parameters of the entropy encoding process may include, for example, parameters indicating an entropy encoding mode, parameters indicating an estimation of a probability distribution of the entropy encoding, and the like.
- the entropy coding method may be, for example, Shannon coding, Huffman coding, or arithmetic coding.
- the value of the encoding parameters is generally fixed; that is, regardless of whether the images to be encoded are the same, they are encoded using uniform encoding parameters. Appropriate encoding parameters are not selected for the differences between images, so the generality of the encoding parameters is poor.
- Some manufacturers have proposed optimization schemes for coding parameters, but most of these schemes require manual optimization. Manual optimization of coding parameters demands a high level of expertise from encoding personnel, and is a time-consuming and labor-intensive problem.
- the embodiment of the present application proposes an image coding scheme that, from the perspective of deep learning, uses a neural network model to adaptively select the optimal coding parameters for the image to be coded, with a view to improving the encoding quality of images.
- the concepts of deep learning and neural network models are briefly introduced.
- Deep learning originated from the study of neural networks. In the 1960s, inspired by neuroscience research on the structure of the human brain, in order to make machines have the same intelligence as humans, artificial neural networks were proposed to simulate the process of data processing by the human brain.
- BP (backpropagation)
- neural network models based on deep learning are widely used in image fields, such as object detection and face recognition in images, and have achieved great success.
- deep learning-based neural network models do not need manually selected features. Instead, the neural network model learns to extract image features through training, and then uses the extracted features for subsequent decision-making, achieving classification and recognition of images.
- the embodiment of the present application uses a neural network model to optimize image coding parameters, so as to improve the image coding quality.
- step 22 an image to be encoded is acquired.
- the image format of the image to be encoded is not specifically limited in the embodiment of the present application, and may be an image in a YUV format or an image in an RGB format.
- step 24 according to the image to be encoded, an optimal coding parameter corresponding to the image to be encoded is obtained through the trained neural network model.
- the aforementioned neural network model may be used to indicate a mapping relationship between an image to be encoded and its corresponding optimal encoding parameter.
- the neural network model can directly output the optimal encoding parameters corresponding to the image to be encoded; for another example, the neural network model can output, for each preset candidate encoding parameter, the probability that it is the optimal encoding parameter, and then a parameter selection module (or parameter selection step) selects the candidate encoding parameter with the highest probability as the optimal encoding parameter corresponding to the image to be encoded.
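For the second case, the selection step reduces to an argmax over the model's output probabilities. A minimal sketch (the candidate parameter sets and probability values below are hypothetical, for illustration only):

```python
import numpy as np

def select_optimal_parameters(probabilities, candidate_params):
    """Pick the candidate encoding parameter set with the highest probability."""
    best = int(np.argmax(probabilities))
    return candidate_params[best]

# Hypothetical probabilities for three candidate parameter sets.
probs = np.array([0.1, 0.7, 0.2])
candidates = [{"qp": 20}, {"qp": 28}, {"qp": 36}]
print(select_optimal_parameters(probs, candidates))  # {'qp': 28}
```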
- the embodiment of the present application does not specifically limit the structure of the neural network model; for example, it may be a convolutional neural network (CNN), a recurrent neural network (RNN), or a fully convolutional network (FCN).
- the embodiment of the present application does not specifically limit the training method of the neural network model provided in the embodiment of the present application. The detailed description will be given below in combination with specific embodiments, which will not be described in detail here.
- step 26 the image to be encoded is encoded according to the optimal encoding parameter corresponding to the image to be encoded.
- the encoding of the image to be encoded may be implemented by using a standard encoder (such as an encoder supporting the JPEG or H.264 standard), or by using a non-standard encoder, which is not limited in this embodiment of the present application.
- the embodiment of the present application first uses the trained neural network model to automatically select the optimal encoding parameter corresponding to the image to be encoded, and then uses the optimal encoding parameter corresponding to the image to be encoded to encode the image to be encoded.
- the above image encoding method fully considers the differences between the images to be encoded, and selects the encoding parameters that most closely match the images to be encoded, thereby improving the encoding quality of the images to be encoded.
- This embodiment of the present application uses a deep learning-based method and a trained neural network model to optimize the encoding parameters of each image to be encoded, thereby establishing the mapping relationship between the image to be encoded and the optimal encoding parameters, solving the problem of poor universality of traditional encoding parameters, and optimizing the encoding framework.
- This embodiment of the present application does not specifically limit the timing of performing steps 22 to 26.
- steps 22 to 26 may be performed online (that is, steps 22 to 26 are performed once for each frame of an image to be encoded).
- the image to be encoded may be input into a neural network model and an encoder, respectively.
- the neural network model can calculate, according to the characteristics of the image to be encoded, the probability (or probability estimate) that each preset candidate encoding parameter is the optimal encoding parameter, and output the calculated probabilities to a selection module. The selection module selects the candidate encoding parameter with the highest probability as the optimal encoding parameter corresponding to the image to be encoded, and sends it to the encoder.
- the encoder can be a standard encoder or a non-standard encoder.
- the encoder can encode an image to be encoded according to an optimal encoding parameter corresponding to the image to be encoded to obtain a code stream of the image to be encoded. Then, the encoder can send the code stream to the decoding end, and the decoder at the decoding end decodes the image to obtain a decoded image.
- step 22 to step 24 may be performed offline, and then step 26 may be performed online.
- for a target image (such as the first frame of the images to be encoded), the optimal encoding parameters corresponding to the target image are determined offline using steps 22 to 24.
- then, the encoding parameters of the encoder can be set to the optimal encoding parameters corresponding to the target image, and each frame in the images to be encoded is encoded online.
- the structure of the neural network model or the operations it needs to implement can be flexibly arranged as required, which is not limited in the embodiments of the present application. In the following, the structure of the neural network model or the operations that it needs to implement are described in detail in combination with specific embodiments.
- the neural network model can be used to perform the operations shown in Figure 4:
- Step 42 Perform feature extraction on the image to be encoded to obtain a feature vector.
- the neural network model may include multiple convolution and down-sampling layers (or convolution layers) for extracting feature vectors of an image to be encoded.
- the features of the image to be encoded may also be extracted by other methods.
- traditional feature extraction methods such as pyramid decomposition or principal component analysis (PCA) can be used to extract the features of the image to be encoded.
- a traditional feature extraction manner may be combined with a feature extraction manner based on a convolution operation.
- Step 44 Determine the optimal encoding parameter corresponding to the image to be encoded according to the feature vector.
- step 44 There may be multiple implementations of step 44.
- the feature vector may be directly input to the fully connected layer of the neural network model, and the fully connected layer then calculates the probability that each candidate encoding parameter is the optimal encoding parameter, thereby determining the optimal encoding parameter corresponding to the image to be encoded.
- when the neural network model is formed by convolution and downsampling layers and a fully connected layer, as shown in FIG. 5, the convolution and downsampling layers can receive input of arbitrary size (or resolution); therefore, if the size of the image to be encoded is not fixed, the size of the feature map output by the convolution and downsampling layers is also not fixed.
- a spatial pyramid pooling layer can be set between the convolutional layers and the fully connected layer to process the feature maps output by the convolution and downsampling layers into a feature vector of fixed dimension. The fixed-dimensional feature vector is then input to the fully connected layer to obtain the probability that each candidate encoding parameter is the optimal encoding parameter, thereby determining the optimal encoding parameter corresponding to the image to be encoded.
- the introduction of the spatial pyramid pooling layer enables the neural network model provided by the embodiments of the present application to be able to process images to be encoded in any size (or resolution). Of course, other methods can also be used to convert the feature map output by the convolution layer into a feature vector with a fixed number of dimensions. This embodiment of the present application is not limited to this.
- the use of the spatial pyramid pooling layer can better avoid the loss of critical information in the image to be encoded.
- the embodiment of the present application does not limit the specific implementation manner of the spatial pyramid pooling layer. A possible implementation manner is given below with reference to FIG. 6.
- the spatial pyramid pooling layer can divide feature maps into blocks at three scales, as shown in FIG. 6, and then extract a feature from each block, so that feature maps of any size produce output features of the same dimension. This effectively satisfies the fully connected layer's requirement for input feature vectors of fixed dimension.
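The three-scale blocking can be sketched as follows for a single-channel feature map. The levels (1, 2, 4), giving 1 + 4 + 16 = 21 output values, are an assumption for illustration; the patent does not specify the block counts:

```python
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Pool an (H, W) feature map into a fixed-length vector: at each
    level, the map is split into level x level blocks and the maximum of
    each block is kept, so any input size yields the same output length."""
    h, w = feature_map.shape
    out = []
    for n in levels:
        # Block boundaries chosen so every pixel falls in exactly one block.
        hs = np.linspace(0, h, n + 1, dtype=int)
        ws = np.linspace(0, w, n + 1, dtype=int)
        for i in range(n):
            for j in range(n):
                block = feature_map[hs[i]:hs[i + 1], ws[j]:ws[j + 1]]
                out.append(block.max())
    return np.array(out)

# Feature maps of different sizes yield the same fixed-length output.
print(spatial_pyramid_pool(np.random.rand(13, 17)).shape)  # (21,)
```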
- the fully connected layer can use the input feature vector to generate the probability that each candidate encoding parameter is the optimal encoding parameter, thereby determining the optimal encoding parameter of the image to be encoded.
- the fully connected layer is equivalent to mapping the hidden features of the image to be encoded into the space of the optimal encoding parameter (or optimal encoding parameter mode) of the encoder, thereby generating the optimal encoding parameter of the image to be encoded.
- the above mainly describes the process of using the neural network model. Before using the neural network model, it usually needs to be trained to obtain the parameters of the neural network model. The following describes the training process of the neural network model in combination with specific embodiments.
- the training process of the neural network model can be performed based on the training samples.
- the training samples can be input images with known optimal encoding parameters.
- the format of the input image may be an RGB format or a YUV format, which is not limited in this embodiment of the present application.
- the input image data can be original image data or image data after preprocessing.
- the data of the input image may be normalized data.
- the normalized processing of the input image data can improve the convergence performance and feature expression ability of the neural network model.
- the components of the original image can be normalized to obtain the input image.
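The exact normalization formula is not reproduced in this text; a common choice, assumed here for illustration, scales each 8-bit component into [0, 1]:

```python
import numpy as np

def normalize_components(image_u8):
    """Scale 8-bit color components into [0, 1].
    Dividing by 255 is an assumed normalization; the source does not
    specify the formula."""
    return image_u8.astype(np.float32) / 255.0

img = np.array([[0, 128, 255]], dtype=np.uint8)
print(normalize_components(img))  # components scaled into [0, 1]
```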
- Neural network models generally use one or more channels to process images. Taking an image in the RGB format as an example, a neural network model usually uses three channels to process the color components R, G, and B, respectively. Taking an image in the YUV format as an example, a neural network model usually uses three channels to process the components Y, U, and V, respectively.
- the data of some color components of the input image used for training can be merged together, so that the data of the input image is reduced from three-channel data to dual-channel data or single-channel data.
- the data of the input image may be dual-channel data, and one of the two-channel data includes data corresponding to two color components of the input image.
- the data of the input image may be single-channel data, and the single-channel data includes data corresponding to each color component of the input image.
- the number of channels of the input image can be determined based on the format of the input image or the down-sampling method of its color components. Taking FIG. 7 as an example, for an input image in the YUV444 format, since the amount of data of each color component is the same, the data of such an input image can remain three-channel. For an input image in the YUV422 format, since its color component U and color component V are down-sampled in the horizontal direction, as shown in FIG. 7, the color component U and the color component V can be spliced in the horizontal direction so that the data of the input image is dual-channel data.
- for an input image in which the color component U and color component V are down-sampled in both the horizontal and vertical directions, the data amount of the color component U and color component V is reduced relative to the color component Y in both directions. Therefore, to facilitate training, as shown in FIG. 7, the data of the color component U and the color component V can be stitched together and then combined with the color component Y (for example, placed below the color component Y), so that the data of the input image is single-channel data.
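The two splicing schemes can be sketched with array operations; the function names and the exact placement of the spliced U/V block (side by side, then below Y) follow the FIG. 7 description but are otherwise illustrative:

```python
import numpy as np

def pack_yuv422(y, u, v):
    """YUV422: U and V are half-width; splice them side by side so the
    result is two channels of the same H x W size as Y."""
    uv = np.concatenate([u, v], axis=1)        # H x W
    return np.stack([y, uv], axis=0)           # 2 x H x W

def pack_yuv420(y, u, v):
    """YUV420: U and V are half-width and half-height; splice them side
    by side and place the result below Y, giving a single channel."""
    uv = np.concatenate([u, v], axis=1)        # (H/2) x W
    packed = np.concatenate([y, uv], axis=0)   # (3H/2) x W
    return packed[np.newaxis]                  # 1 x (3H/2) x W
```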
- a certain number of candidate coding parameters can be set in advance.
- the embodiment of the present application does not specifically limit the number and selection method of candidate coding parameters, which may be chosen based on experience or experiments. For example, multiple sets of coding parameters may first be obtained based on experience; then, optionally, these sets may be randomly modified to generate modified coding parameters; then, the peak signal-to-noise ratio (PSNR) or another evaluation method may be used to evaluate the coding performance of these coding parameters, and the coding parameters with the best coding performance are selected as candidate coding parameters.
- the number of candidate coding parameters may be set according to actual needs, or may be directly set to a fixed value. For example, the number of candidate coding parameters may be set to 27.
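The PSNR-based screening of candidates can be sketched as follows; `encode_decode` is a placeholder for whatever codec round trip is in use, and the ranking-and-keep step is an illustrative reading of the selection procedure:

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio between an image and its reconstruction."""
    mse = np.mean((original.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

def pick_candidates(image, param_sets, encode_decode, keep=27):
    """Rank parameter sets by the PSNR of their encode/decode round trip
    and keep the best `keep` of them as candidate coding parameters."""
    scored = [(psnr(image, encode_decode(image, p)), p) for p in param_sets]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [p for _, p in scored[:keep]]
```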
- the probability in FIG. 8 may be a 27×1 vector, which corresponds to the 27 groups of candidate coding parameters one by one.
- the value of each element of the vector can be a number between 0 and 1, which is used to indicate the probability that a set of candidate coding parameters corresponding to the element is the optimal coding parameter.
- the probability, output by the neural network model, that each candidate coding parameter is the optimal coding parameter can be called the real value output by the neural network model.
- the element corresponding to the optimal encoding parameters of the input image can be set to 1 and the elements corresponding to the remaining encoding parameters set to 0, to obtain the theoretical value output by the neural network.
- the parameters of the neural network model can be adjusted according to the deviation between the real value output from the neural network model and the theoretical value, so that the real value output by the neural network model is as close to the theoretical value as possible, thereby implementing the training of the neural network model.
- the training process of the neural network model can be seen as a process of continuous iteration, so that the real value and the theoretical value of the output of the neural network model are continuously approaching.
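The theoretical (one-hot) target and the deviation driving the iteration can be sketched as follows; the choice of 27 candidates follows the example above, and the cross-entropy form shown is one standard definition:

```python
import numpy as np

n = 27  # number of candidate parameter sets, per the example above

def one_hot_target(optimal_index, n):
    """Theoretical output: 1 at the optimal parameter set, 0 elsewhere."""
    y = np.zeros(n)
    y[optimal_index] = 1.0
    return y

def cross_entropy(y, y_pred, eps=1e-12):
    """Deviation between the theoretical output y and the model's real
    output y_pred; training adjusts the model to drive this toward zero."""
    return -np.sum(y * np.log(y_pred + eps))

y = one_hot_target(5, n)
y_pred = np.full(n, 1.0 / n)       # an untrained model's uniform guess
print(cross_entropy(y, y_pred))    # approximately ln(27) ≈ 3.296
```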
- the termination of the iterative process can be determined by a loss function (or cost function).
- the embodiment of the present application does not specifically limit the type of loss function used in the training process of the neural network model; it may be a mean squared error (MSE) function or a cross-entropy function.
- the cross-entropy function can be defined as follows:
- y ′ is the actual output
- y is the ideal output
- n is the number of candidate encoding parameters.
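The formula itself is omitted from this text. A standard cross-entropy form consistent with the definitions of y′ (actual output), y (ideal output), and n (number of candidate encoding parameters), given here as an assumption rather than the patent's exact formula, is:

```latex
C = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \ln y'_i + (1 - y_i) \ln\left(1 - y'_i\right) \right]
```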
- the resolution of the input images used for training can be selected according to actual needs; images with a fixed resolution can be selected, or images with different resolutions can be selected.
- the input images can have multiple resolutions. These input images can be input into the neural network model for training randomly, or in a certain order.
- the input image may include a first image set and a second image set.
- Each image in the first image set has the same resolution, and the resolution of the images in the second image set is different from the resolution of the images in the first image set.
- the images in the second image set may have the same or different resolutions, which is not limited in this embodiment of the present application.
- step 92 the neural network model is trained using the images in the first image set to obtain the parameters of the neural network model.
- the training phase corresponding to this step can be called the single-resolution image training phase; its purpose is to quickly train a neural network model based on images of a single resolution.
- step 94 the images in the second image set are used to modify the parameters of the neural network model.
- the training phase corresponding to this step can be called a multi-resolution image training phase, and the purpose is to fine-tune the parameters of the neural network model generated in step 92, so that the neural network model can be used to process images of other resolutions.
- the two-step training method shown in FIG. 9 can make the parameters of the neural network model quickly converge, thereby improving the training efficiency of the neural network model.
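The two-step schedule can be sketched as follows. `train_step` is a placeholder for one optimization step on one sample, and the smaller learning rate in the second stage is an assumption (the source only says the parameters are "modified"):

```python
def train_two_stage(model, first_set, second_set, train_step,
                    lr=1e-3, fine_tune_lr=1e-4):
    """Stage 1: train on the fixed-resolution first image set.
    Stage 2: fine-tune the resulting parameters on the mixed-resolution
    second image set, so the model can handle other resolutions."""
    for image, target in first_set:
        train_step(model, image, target, lr)
    for image, target in second_set:
        train_step(model, image, target, fine_tune_lr)
    return model
```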
- FIG. 10 is a schematic structural diagram of an image encoding device according to an embodiment of the present application.
- the image encoding device 1000 includes a memory 1010 and a processor 1020.
- the memory 1010 may be used to store a program.
- the processor 1020 may be configured to execute a program stored in the memory to perform the following operations: obtaining an image to be encoded; obtaining the optimal encoding parameter corresponding to the image to be encoded according to the image to be encoded by using a trained neural network model; and encoding the image to be encoded according to the optimal encoding parameter corresponding to the image to be encoded.
- the neural network model is used to perform feature extraction on the image to be encoded to obtain a feature vector; and according to the feature vector, determine an optimal encoding parameter corresponding to the image to be encoded.
- determining the optimal encoding parameter corresponding to the image to be encoded according to the feature vector may include: generating an output vector according to the feature vector, where the output vector represents the probability that each of multiple preset candidate encoding parameters is the optimal encoding parameter; and, based on these probabilities, selecting the candidate encoding parameter with the highest probability as the optimal encoding parameter corresponding to the image to be encoded.
- the processor 1020 may be further configured to perform the following operation: training the neural network model according to an input image whose optimal encoding parameters are known.
- the training process of the neural network model may use a cross-entropy function as a loss function.
- the data of the input image may be normalized data.
- the data of the input image may be dual-channel data, and one of the two-channel data may include data corresponding to two color components of the input image.
- the data of the input image may be single-channel data, and the single-channel data may include data corresponding to each color component of the input image.
- the data of the input image may be data in YUV or RGB format.
- the input image may include a first image set and a second image set, each image in the first image set has the same resolution, and the resolution of the images in the second image set is different from The resolution of the images in the first image set, and the training of the neural network model according to the input image whose optimal encoding parameters are known may include: using the images in the first image set to train the neural network The network model is trained to obtain the parameters of the neural network model; the images in the second image set are used to modify the parameters of the neural network model.
- the encoding of the image to be encoded according to the optimal encoding parameter corresponding to the image to be encoded may include: using a standard or non-standard encoder to encode the image to be encoded according to the optimal encoding parameter corresponding to the image to be encoded.
- the optimal encoding parameter corresponding to the image to be encoded may include at least one of the following: a parameter for indicating a transformation manner of the image to be encoded, a parameter for indicating a quantization manner of the image to be encoded, and a parameter for indicating an entropy encoding manner of the image to be encoded.
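The three parameter categories listed above can be thought of as one configuration object chosen from a finite candidate set. The sketch below is purely illustrative: the field names, value sets, and candidate list are assumptions, not the patent's actual parameter space.

```python
from dataclasses import dataclass

# Hypothetical container for one candidate encoding parameter; the field
# names and values are illustrative, not taken from the patent.
@dataclass(frozen=True)
class EncodingParams:
    transform: str       # transformation manner, e.g. "DCT" or "DWT"
    quantization: int    # index of a preset quantization manner/table
    entropy_coding: str  # entropy encoding manner, e.g. "huffman"

# The neural network chooses among a finite set of such candidates.
candidates = [
    EncodingParams("DCT", 0, "huffman"),
    EncodingParams("DCT", 1, "arithmetic"),
    EncodingParams("DWT", 0, "arithmetic"),
]
best = candidates[2]  # e.g. the index with the highest predicted probability
```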
- the computer program product includes one or more computer instructions.
- the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
- the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wire (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (such as by infrared, radio, or microwave).
- the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media.
- the available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
- the disclosed systems, devices, and methods may be implemented in other ways.
- the device embodiments described above are merely illustrative.
- the division of the units is only a division by logical function.
- in actual implementation, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
- the mutual coupling, direct coupling, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
- the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
Abstract
Provided are an image encoding method and apparatus. The method comprises: acquiring an image to be encoded; obtaining, according to the image to be encoded and by means of a trained neural network model, an optimal encoding parameter corresponding to the image to be encoded; and encoding the image to be encoded according to the optimal encoding parameter corresponding to the image to be encoded. The method automatically selects, by using a trained neural network model, the optimal encoding parameter corresponding to the image to be encoded, and then encodes the image to be encoded by using that optimal encoding parameter. This approach takes the differences between images to be encoded fully into account and can select the best-matched encoding parameter for each image to be encoded, thereby improving the encoding quality of the image to be encoded.
Description
Copyright Statement
The content disclosed in this patent document contains material which is subject to copyright protection. The copyright is owned by the copyright owner. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the official records and archives of the Patent and Trademark Office.
The present application relates to the field of image encoding and decoding, and more particularly, to an image encoding method and apparatus.
With the rapid development of image and video technology, how to improve the quality of image encoding and decoding has become a topic of wide concern.
For image encoding and decoding quality, the selection of encoding parameters is particularly critical. Traditional encoding and decoding technologies usually encode the image to be encoded with fixed encoding parameters, or select the encoding parameters of the image to be encoded manually, resulting in poor encoding quality of the image to be encoded.
Summary of the Invention
The present application provides an image encoding method and apparatus, which can improve the encoding quality of an image to be encoded.
According to a first aspect, an image encoding method is provided, including: acquiring an image to be encoded; obtaining, according to the image to be encoded and through a trained neural network model, an optimal encoding parameter corresponding to the image to be encoded; and encoding the image to be encoded according to the optimal encoding parameter corresponding to the image to be encoded.
According to a second aspect, an image encoding apparatus is provided, including: a memory for storing a program; and a processor for executing the program stored in the memory to perform the following operations: acquiring an image to be encoded; obtaining, according to the image to be encoded and through a trained neural network model, an optimal encoding parameter corresponding to the image to be encoded; and encoding the image to be encoded according to the optimal encoding parameter corresponding to the image to be encoded.
According to a third aspect, an image encoding apparatus is provided, including modules for performing the method of the first aspect.
According to a fourth aspect, a computer-readable storage medium is provided, having stored thereon instructions for performing the method of the first aspect.
According to a fifth aspect, a computer program product is provided, including instructions for performing the method of the first aspect.
This application first uses a trained neural network model to automatically select the optimal encoding parameter corresponding to the image to be encoded, and then encodes the image to be encoded with that optimal encoding parameter. This image encoding manner fully considers the differences between the images to be encoded and selects, for each image to be encoded, the encoding parameter that best matches it, thereby improving the encoding quality of the image to be encoded.
FIG. 1 is a schematic diagram of a conventional image encoding process.
FIG. 2 is a schematic flowchart of an image encoding method according to an embodiment of the present application.
FIG. 3 is a schematic diagram of an image encoding process provided by an embodiment of the present application.
FIG. 4 is an example diagram of the manner in which a neural network model provided by an embodiment of the present application processes an image to be encoded.
FIG. 5 is a schematic structural diagram of a neural network model according to an embodiment of the present application.
FIG. 6 is a schematic diagram of an implementation process of spatial pyramid pooling provided by an embodiment of the present application.
FIG. 7 is an example diagram of input image formats and the corresponding configurations of the number of channels provided by an embodiment of the present application.
FIG. 8 is a schematic diagram of a training process of a neural network model according to an embodiment of the present application.
FIG. 9 is a schematic flowchart of the training steps of a neural network model according to an embodiment of the present application.
FIG. 10 is a schematic structural diagram of an image encoding apparatus according to an embodiment of the present application.
The embodiments of the present application are applicable to standard or non-standard image or video encoders, for example, encoders conforming to standards such as JPEG (Joint Photographic Experts Group), JPEG2000, H.264, and H.265.
To facilitate understanding, the conventional image encoding process (or video encoding process) is briefly introduced first with reference to FIG. 1.
As shown in FIG. 1, after receiving an image to be encoded, the conventional image encoding process usually includes processes such as transform 12, quantization 14, and entropy encoding 16, and finally outputs a code stream of the image to be encoded. The decoding end usually decodes the received code stream according to the inverse of the above process to recover the image information before encoding.
The encoding parameters mentioned in the embodiments of the present application may refer to any parameters used in the encoding process, and may include, for example, one or more of the following: a parameter for indicating the transformation manner of the image to be encoded (or a parameter related to the transform process), a parameter for indicating the quantization manner of the image to be encoded (or a parameter related to the quantization process), and a parameter for indicating the entropy encoding manner of the image to be encoded (or a parameter related to the entropy encoding process).
The parameters related to the transform process may include, for example, parameters indicating the transform type and/or transform precision. The transform type may include the discrete cosine transform (DCT), the discrete wavelet transform (DWT), and the like.
The parameters related to the quantization process may include, for example, a parameter indicating how the quantization parameter is selected and a parameter indicating how the quantization table is designed.
The parameters related to the entropy encoding process may include, for example, a parameter indicating the entropy encoding method, a parameter indicating the estimation of the probability distribution used for entropy encoding, and the like. The entropy encoding method may be, for example, Shannon coding, Huffman coding, or arithmetic coding.
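Of the entropy coding methods named above, Huffman coding is the simplest to sketch. The minimal implementation below builds a prefix-free code from symbol frequencies with a heap; it is a textbook illustration only, not the coding-table construction of any particular codec (real encoders typically use canonical code tables).

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman code (symbol -> bit string) from a symbol sequence."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, tie-breaker, {symbol: partial code}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)  # two least-frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

code = huffman_code("aaaabbc")
encoded = "".join(code[s] for s in "aaaabbc")
# The most frequent symbol "a" gets the shortest codeword.
```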
In the conventional image encoding process, the encoding parameters generally take fixed values; that is, regardless of whether the images to be encoded are the same, they are all encoded with uniform encoding parameters, and no suitable encoding parameters are selected according to the differences between individual images, so the universality of the encoding parameters is poor. Some vendors have proposed optimization schemes for encoding parameters, but most such schemes optimize the encoding parameters manually. Manual optimization of encoding parameters places high demands on the professional skill of the encoding personnel, and is time-consuming, labor-intensive, and complex to implement.
Considering the differences between images, the embodiments of the present application start from the perspective of deep learning and propose an image encoding scheme that uses a neural network model to adaptively select the optimal encoding parameter for the image to be encoded according to the characteristics of the image itself, so as to improve the encoding quality of the image. To facilitate understanding, before the image encoding manner provided by the embodiments of the present application is described in detail, the concepts of deep learning and neural network models are briefly introduced.
Deep learning originated from research on neural networks. In the 1960s, inspired by neuroscience research on the structure of the human brain, artificial neural networks were proposed to simulate the process by which the human brain processes data, in order to give machines human-like intelligence.
In the mid-1980s, the back propagation (BP) algorithm was proposed, providing a way to learn neural network models with multiple hidden layers and enabling rapid development of neural network research. However, since traditional neural networks were basically fully connected networks, they had too many parameters when the input dimensionality was large and were difficult to train. For this reason, research on neural networks for high-dimensional data stagnated for a time. With the introduction of neural network models such as the convolutional neural network (CNN), the problem that neural network models had too many parameters and were difficult to train was solved, and neural network models have since been applied in more and more fields.
At present, neural network models based on deep learning are widely used in the image field, for example in object detection and face recognition, and have achieved great success. Compared with traditional image detection algorithms, deep-learning-based neural network models do not require features to be selected manually; instead, the neural network model is trained to extract image features, and the extracted features are then used for subsequent decision-making, thereby implementing functions such as image classification and recognition.
The embodiments of the present application use a neural network model to optimize the encoding parameters of an image, so as to improve the encoding quality of the image.
The image encoding method provided by the embodiments of the present application is described in detail below with reference to FIG. 2.
In step 22, an image to be encoded is acquired.
The embodiments of the present application do not specifically limit the image format of the image to be encoded, which may be an image in YUV format or an image in RGB format.
In step 24, according to the image to be encoded, an optimal encoding parameter corresponding to the image to be encoded is obtained through the trained neural network model.
The above neural network model may be used to indicate the mapping relationship between an image to be encoded and its corresponding optimal encoding parameter. For example, the neural network model may directly output the optimal encoding parameter corresponding to the image to be encoded. As another example, the neural network model may output the probability that each of a plurality of preset candidate encoding parameters is the optimal encoding parameter, and a parameter selection module (or parameter selection step) may then select the candidate encoding parameter with the highest probability as the optimal encoding parameter corresponding to the image to be encoded.
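The probability-plus-selection variant above can be sketched in a few lines: normalize the model's output vector into probabilities and take the argmax. The candidate names and logit values below are placeholders, not the patent's actual parameter set.

```python
import numpy as np

# Selection step after the network: pick the candidate encoding parameter
# with the highest predicted probability.
def select_optimal(logits, candidates):
    """Softmax over the model's output vector, then argmax selection."""
    e = np.exp(logits - np.max(logits))  # numerically stable softmax
    probs = e / e.sum()
    return candidates[int(np.argmax(probs))], probs

candidates = ["mode_A", "mode_B", "mode_C", "mode_D"]  # placeholder names
best, probs = select_optimal(np.array([0.2, 1.7, 0.4, -0.3]), candidates)
# best is "mode_B", the highest-probability candidate
```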
The embodiments of the present application do not specifically limit the structure of the neural network model, which may be, for example, a CNN, a recurrent neural network (RNN), or a fully convolutional network (FCN). The embodiments of the present application also do not specifically limit the training manner of the neural network model; detailed examples are given below in combination with specific embodiments and are not elaborated here.
In step 26, the image to be encoded is encoded according to the optimal encoding parameter corresponding to the image to be encoded.
In the above step, the encoding of the image to be encoded may be implemented by a standard encoder (such as an encoder supporting the JPEG or H.264 standard) or by a non-standard encoder, which is not limited in the embodiments of the present application.
The embodiments of the present application first use the trained neural network model to automatically select the optimal encoding parameter corresponding to the image to be encoded, and then encode the image to be encoded with that optimal encoding parameter. This image encoding manner fully considers the differences between the images to be encoded and selects, for each image to be encoded, the encoding parameter that best matches it, thereby improving the encoding quality of the image to be encoded.
For an encoder, it is usually difficult to establish the mapping relationship between the image to be encoded and the optimal encoding parameter through theoretical derivation. The embodiments of the present application adopt a deep-learning-based approach and use a trained neural network model to optimize the encoding parameters of each image to be encoded, thereby establishing the mapping relationship between the image to be encoded and the optimal encoding parameter, solving the problem of poor universality of traditional encoding parameters, and optimizing the encoding framework.
The embodiments of the present application do not specifically limit when steps 22 to 26 are performed.
Optionally, as one implementation, steps 22 to 26 may be performed online (that is, steps 22 to 26 are performed once for each input frame of the image to be encoded). This implementation is described in detail below with reference to FIG. 3.
Referring to FIG. 3, when an image to be encoded is received, it may be input into the neural network model and the encoder respectively. The neural network model may calculate, according to the characteristics of the image to be encoded, the probability (or probability estimate) that each preset candidate encoding parameter is the optimal encoding parameter, and output the calculated probabilities to a selection module. The selection module may select the candidate encoding parameter with the highest probability as the optimal encoding parameter corresponding to the image to be encoded and send it to the encoder. The encoder may be a standard encoder or a non-standard encoder. The encoder may encode the image to be encoded according to the optimal encoding parameter corresponding to the image to be encoded to obtain a code stream of the image. The encoder may then send the code stream to the decoding end, where a decoder decodes the image to obtain the decoded image.
Optionally, as another implementation, steps 22 to 24 may be performed offline, and step 26 may then be performed online. For example, assuming that the image to be encoded comprises a group of images, a target image (such as the first frame) may be selected from this group, and the optimal encoding parameter corresponding to the target image may be determined offline using steps 22 to 24. The encoding parameters of the encoder may then be set to the optimal encoding parameter corresponding to the target image, and each frame of the group may be encoded online. The advantage of this implementation is that there is no need to adjust the structure of the encoder or the encoding flow, so it is highly compatible with conventional encoding methods.
The structure of the neural network model, or the operations it needs to perform, can be arranged flexibly as required, which is not limited in the embodiments of the present application. The structure of the neural network model and the operations it needs to perform are described below with detailed examples in combination with specific embodiments.
Optionally, the neural network model may be used to perform the operations shown in FIG. 4:
Step 42: Perform feature extraction on the image to be encoded to obtain a feature vector.
There are many possible ways to extract the features of the image to be encoded. As shown in FIG. 5, the neural network model may include multiple convolution and down-sampling layers (or convolutional layers) for extracting the feature vector of the image to be encoded. Alternatively, the features of the image to be encoded may be extracted in other manners; for example, traditional feature extraction methods such as pyramid decomposition or principal component analysis (PCA) may be used. In some embodiments, traditional feature extraction may also be combined with convolution-based feature extraction. After the feature extraction and down-sampling operations, a feature vector of the image to be encoded is obtained, which can be regarded as a high-dimensional feature representation of the image to be encoded.
Step 44: Determine, according to the feature vector, the optimal encoding parameter corresponding to the image to be encoded.
Step 44 can be implemented in various ways.
Optionally, as one embodiment, if the size (or resolution) of the image to be encoded is fixed, the feature vector may be input directly into the fully connected layer of the neural network model, and the fully connected layer may then compute the probability that each candidate encoding parameter is the optimal encoding parameter, thereby determining the optimal encoding parameter corresponding to the image to be encoded.
Optionally, as another embodiment, assume that the neural network model is formed of convolution and down-sampling layers and a fully connected layer, as shown in FIG. 5. Since the convolution and down-sampling layers can accept an image to be encoded of any size (or resolution), if the size of the image to be encoded is not fixed, the size of the feature map output by the convolution and down-sampling layers is also not fixed. In this case, as shown in FIG. 5, a spatial pyramid pooling layer may be placed between the convolutional layers and the fully connected layer to process the feature map output by the convolution and down-sampling layers into a feature vector of fixed dimensionality. This fixed-dimensional feature vector is then input into the fully connected layer to obtain the probability that each candidate encoding parameter is the optimal encoding parameter, thereby determining the optimal encoding parameter corresponding to the image to be encoded. The introduction of the spatial pyramid pooling layer enables the neural network model provided by the embodiments of the present application to handle images to be encoded of any size (or resolution). Of course, the feature map output by the convolutional layers may also be converted into a feature vector of fixed dimensionality in other ways, which is not limited in the embodiments of the present application; the use of the spatial pyramid pooling layer, however, better avoids the loss of key information in the image to be encoded.
The embodiments of the present application do not limit the specific implementation of the spatial pyramid pooling layer. One possible implementation is given below with reference to FIG. 6.
As shown in FIG. 6, for the feature map output by the convolution and down-sampling layers (also called the high-dimensional hidden feature map), the spatial pyramid pooling layer may partition it at the three scales shown in FIG. 6 and then extract one feature from each block, so that the output feature dimensionality is the same for a feature map of any size. This effectively satisfies the requirement of the subsequent fully connected layer for an input feature vector of fixed dimensionality.
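The partition-and-pool operation above can be sketched in NumPy. The specific pyramid scales (1×1, 2×2, 4×4 here) and the use of max pooling are assumptions for illustration, since the excerpt does not specify the exact scales of FIG. 6; the point is that any input size yields a fixed 1 + 4 + 16 = 21 features per channel.

```python
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Minimal SPP sketch: (channels, height, width) -> fixed-length vector."""
    c, h, w = feature_map.shape
    out = []
    for n in levels:
        # Block boundaries cover the whole map, even when h or w is not
        # divisible by n.
        hs = np.linspace(0, h, n + 1, dtype=int)
        ws = np.linspace(0, w, n + 1, dtype=int)
        for i in range(n):
            for j in range(n):
                block = feature_map[:, hs[i]:hs[i + 1], ws[j]:ws[j + 1]]
                out.append(block.max(axis=(1, 2)))  # one feature per channel
    return np.concatenate(out)

# Feature maps of different sizes produce vectors of the same length,
# which is exactly what the fully connected layer requires.
v1 = spatial_pyramid_pool(np.random.rand(8, 13, 17))
v2 = spatial_pyramid_pool(np.random.rand(8, 32, 32))
```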
The fully connected layer may use the input feature vector to generate the probability that each candidate encoding parameter is the optimal encoding parameter, so that the optimal encoding parameter of the image to be encoded can be determined. The fully connected layer effectively maps the hidden features of the image to be encoded into the space of the encoder's optimal encoding parameters (or optimal encoding parameter modes), thereby producing the optimal encoding parameter of the image to be encoded.
The above mainly describes how the neural network model is used. Before being used, the neural network model usually needs to be trained to obtain its parameters. The training process of the neural network model is described below in combination with specific embodiments.
The training process of the neural network model may be performed on training samples. A training sample may be an input image whose optimal encoding parameter is known. The format of the input image may be RGB or YUV, which is not limited in the embodiments of the present application.
The data of the input image may be the original image data, or image data after preprocessing. For example, the data of the input image may be normalized data. Normalizing the input image data can improve the convergence performance and feature expression capability of the neural network model.
Taking an original image in YUV format (for example, YUV444, YUV422, or YUV420) as an example, each component of the original image f(x, y) may be normalized to obtain the input image.
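The concrete normalization formula is not reproduced in this excerpt. One common convention, shown below purely as an assumed example (not necessarily the patent's formula), is to scale 8-bit component samples to the range [0, 1]:

```python
import numpy as np

# Assumed example only: the excerpt does not reproduce the patent's exact
# normalization formula. Scaling integer samples to [0, 1] is one common
# choice for per-component normalization.
def normalize_component(component, bit_depth=8):
    """Map integer samples of one color component (Y, U, or V) to [0, 1]."""
    max_val = (1 << bit_depth) - 1  # 255 for 8-bit samples
    return component.astype(np.float32) / max_val

y = np.array([[0, 128, 255]], dtype=np.uint8)
y_norm = normalize_component(y)  # values 0.0 .. 1.0
```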
神经网络模型一般使用一个或多个通道(channel)对图像进行处理。以RGB格式的图像为例,神经网络模型通常采用三个通道分别对颜色分量R、G、B进行处理;以YUV格式的图像为例,神经网络模型通常采用三个通 道分别对颜色分量Y、U、V进行处理。Neural network models generally use one or more channels to process images. Taking an image in the RGB format as an example, a neural network model usually uses three channels to process the color components R, G, and B, respectively. Taking an image in the YUV format as an example, a neural network model usually uses three channels to separately color the components Y, G, and B. U, V for processing.
为了降低神经网络模型的训练复杂度,提高神经网络模型的收敛性能,可以将用于训练的输入图像的某些颜色分量的数据合并在一起,从而使得输入图像的数据从三通道数据降为双通道数据或单通道数据。例如,输入图像的数据可以为双通道数据,双通道数据之一包括输入图像的两个颜色分量对应的数据。又如,输入图像的数据可以为单通道数据,单通道数据包括输入图像的各个颜色分量对应的数据。In order to reduce the training complexity of the neural network model and improve its convergence performance, the data of some color components of the input image used for training can be merged together, so that the input image data is reduced from three-channel data to dual-channel or single-channel data. For example, the data of the input image may be dual-channel data, where one of the two channels includes the data corresponding to two color components of the input image. As another example, the data of the input image may be single-channel data, where the single channel includes the data corresponding to all color components of the input image.
输入图像的通道数量可以基于输入图像的格式或输入图像的颜色分量的下采样方式确定。以图7为例,对于YUV444格式的输入图像,由于其每个颜色分量的数据量相同,则此类输入图像的数据仍可以保持三个通道数据。对于YUV422格式的输入图像,由于其颜色分量U和颜色分量V在水平方向进行了下采样,因此,如图7所示,可以将颜色分量U和颜色分量V在水平方向进行拼接,使得输入图像的数据为双通道数据。对于YUV420格式的输入图像,由于其颜色分量U和颜色分量V在水平方向和垂直方向均进行了下采样,使得颜色分量U和颜色分量V在水平和垂直方向均下降为颜色分量Y的数据量的1/2,因此,为了方便训练,如图7所示,可以将颜色分量U和颜色分量V的数据拼接之后,与颜色分量Y合并(如置于颜色分量Y的下方),使得输入图像的数据为单通道数据。The number of channels of the input image can be determined based on the format of the input image or on how its color components are down-sampled. Taking FIG. 7 as an example: for an input image in YUV444 format, since each color component carries the same amount of data, the input image data can remain three-channel data. For an input image in YUV422 format, since the color components U and V are down-sampled in the horizontal direction, the components U and V can be spliced together horizontally, as shown in FIG. 7, so that the input image data becomes dual-channel data. For an input image in YUV420 format, the color components U and V are down-sampled in both the horizontal and vertical directions, so that each of them is reduced to 1/2 the size of the color component Y in both directions. Therefore, to facilitate training, the data of U and V can be spliced together and then merged with the color component Y (for example, placed below Y), as shown in FIG. 7, so that the input image data becomes single-channel data.
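The channel-packing described for FIG. 7 can be sketched as follows; this is an illustrative NumPy version assuming planar YUV420 input, not code from the embodiment:

```python
import numpy as np

def pack_yuv420_single_channel(y, u, v):
    """Pack planar YUV420 data into one single-channel array.

    For YUV420, U and V are each half the width and half the height
    of Y. Concatenating U and V side by side gives a block with the
    same width as Y and half its height; stacking that block below Y
    yields a single-channel input, as described for FIG. 7.
    """
    uv = np.concatenate([u, v], axis=1)      # horizontal splice: (H/2, W)
    return np.concatenate([y, uv], axis=0)   # placed below Y: (3H/2, W)

h, w = 4, 4
y = np.zeros((h, w), dtype=np.float32)
u = np.ones((h // 2, w // 2), dtype=np.float32)
v = np.full((h // 2, w // 2), 2.0, dtype=np.float32)
packed = pack_yuv420_single_channel(y, u, v)
print(packed.shape)  # (6, 4)
```

For YUV422, only the first concatenation (U next to V along the horizontal axis) would be applied, producing a second channel with the same shape as Y.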
下面结合图8,对神经网络模型的训练过程进行介绍。The following describes the training process of the neural network model with reference to FIG. 8.
首先,可以预先设置一定数量的候选编码参数。本申请实施例对候选编码参数的数量和选取方式不做具体限定,可以根据经验或者实验选取。例如,首先可以根据经验获得多组编码参数;然后,可选地,可以对该多组候选编码参数进行随机修正,生成修正后的编码参数;接着,可以利用峰值信噪比(peak signal to noise ratio,PSNR)或其他评价方式评价这些编码参数的编码性能,并从中选取编码性能最优的编码参数作为候选编码参数。候选编码参数的数量可以根据实际需要设定,也可以直接设定为固定值,例如,可以将候选编码参数的数量设定为27。First, a certain number of candidate encoding parameters can be set in advance. The embodiments of the present application do not specifically limit the number of candidate encoding parameters or how they are selected; they may be chosen based on experience or experiments. For example, multiple sets of encoding parameters may first be obtained empirically; then, optionally, the multiple sets of candidate encoding parameters may be randomly modified to generate corrected encoding parameters; next, the peak signal-to-noise ratio (PSNR) or another evaluation method may be used to evaluate the encoding performance of these parameters, and the parameters with the best encoding performance are selected as the candidate encoding parameters. The number of candidate encoding parameters may be set according to actual needs, or directly set to a fixed value; for example, it may be set to 27.
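A minimal sketch of the PSNR evaluation mentioned above, assuming 8-bit images (the embodiment allows PSNR or any other quality metric):

```python
import numpy as np

def psnr(original: np.ndarray, reconstructed: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two 8-bit images."""
    mse = np.mean((original.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.full((8, 8), 100, dtype=np.uint8)
rec = ref.copy()
rec[0, 0] = 110  # a single distorted pixel
print(round(psnr(ref, rec), 2))  # 46.19
```

Each candidate parameter set would be used to encode and reconstruct test images, and the sets yielding the highest PSNR retained.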
如图8所示,对于一张输入图像而言,可以将其输入神经网络模型,得到各候选编码参数为最优编码参数的概率。以候选编码参数的数量为27为例,则图8中的概率可以为27×1的向量,与27组候选编码参数一一对应。该向量的每个元素的取值可以为介于0到1之间的数,用于指示该元素对应 的一组候选编码参数为最优编码参数的概率。神经网络模型输出的各候选编码参数为最优编码参数的概率可以称为神经网络模型输出的真实值。As shown in FIG. 8, for an input image, it can be input to a neural network model to obtain the probability that each candidate encoding parameter is the optimal encoding parameter. Taking the number of candidate coding parameters as 27 as an example, the probability in FIG. 8 may be a 27 × 1 vector, which corresponds to the 27 groups of candidate coding parameters one by one. The value of each element of the vector can be a number between 0 and 1, which is used to indicate the probability that a set of candidate coding parameters corresponding to the element is the optimal coding parameter. The probability that each candidate coding parameter output by the neural network model is the optimal coding parameter can be called the true value output by the neural network model.
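The mapping from the 27×1 probability vector to a chosen candidate set can be sketched as follows; the logit values below are purely illustrative, not outputs of the patent's model:

```python
import numpy as np

# Illustrative only: a 27-element logit vector standing in for the
# network's raw output (real values would come from the trained model).
logits = np.zeros(27)
logits[5] = 2.0

# Softmax turns the logits into one probability per candidate parameter
# set; every element lies between 0 and 1 and the vector sums to 1.
probs = np.exp(logits) / np.exp(logits).sum()

best_index = int(np.argmax(probs))  # most probable candidate parameter set
print(best_index)  # 5
```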
由于预先知道该输入图像的最优编码参数,可以将该输入图像对应的最优编码参数的取值设置为1,其余编码参数的取值设置为0,得到神经网络输出的理论值。Since the optimal encoding parameters of the input image are known in advance, the values of the optimal encoding parameters corresponding to the input image can be set to 1 and the values of the remaining encoding parameters are set to 0 to obtain the theoretical value output by the neural network.
然后,可以根据神经网络模型输出的真实值与理论值之间的偏差调整神经网络模型的参数,使得神经网络模型输出的真实值尽量接近理论值,从而实现神经网络模型的训练。Then, the parameters of the neural network model can be adjusted according to the deviation between the real value output from the neural network model and the theoretical value, so that the real value output by the neural network model is as close to the theoretical value as possible, thereby implementing the training of the neural network model.
神经网络模型的训练过程可以看成是不断迭代的过程,使得神经网络模型输出的真实值与理论值不断逼近。迭代过程的终止可以由损失函数(或代价函数)决定。The training process of the neural network model can be seen as a process of continuous iteration, so that the real value and the theoretical value of the output of the neural network model are continuously approaching. The termination of the iterative process can be determined by a loss function (or cost function).
本申请实施例对神经网络模型的训练过程所采用的损失函数的类型不做具体限定,可以是最小平方误差(minimum squared-error,MSE)函数,也可以是交叉熵函数。The embodiment of the present application does not specifically limit the type of the loss function used in the training process of the neural network model, and may be a minimum squared-error (MSE) function or a cross-entropy function.
交叉熵函数可以定义如下:The cross-entropy function can be defined as follows (reconstructed here in its standard form, the original formula image not having survived extraction):

C = -∑_{i=1}^{n} y_i · log(y′_i)

其中,y′为实际输出,而y为理想输出,n为候选编码参数的数量。Among them, y′ is the actual output, y is the ideal output, and n is the number of candidate encoding parameters.
相比于MSE函数,交叉熵函数更容易收敛。Compared to the MSE function, the cross-entropy function converges more easily.
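Comparing the network output against the one-hot theoretical value with a cross-entropy loss can be sketched as follows (an illustrative NumPy version, not the embodiment's training code):

```python
import numpy as np

def cross_entropy(y_true: np.ndarray, y_pred: np.ndarray, eps: float = 1e-12) -> float:
    """Cross-entropy between the one-hot theoretical value and the
    network's output probabilities (eps guards against log(0))."""
    return float(-np.sum(y_true * np.log(y_pred + eps)))

n = 27
y_true = np.zeros(n)
y_true[3] = 1.0                           # candidate 3 is the known optimum

y_close = np.full(n, 0.001)
y_close[3] = 1.0 - 0.001 * (n - 1)        # nearly correct prediction
y_far = np.full(n, 1.0 / n)               # uninformative prediction

loss_close = cross_entropy(y_true, y_close)
loss_far = cross_entropy(y_true, y_far)
assert loss_close < loss_far              # training drives the loss down
```

Adjusting the model parameters to reduce this loss is exactly the iterative approach of the true output toward the theoretical output described above.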
用于训练的输入图像的分辨率可以根据实际需要选取,可以选取具有固定分辨率的图像,也可以选取具有不同分辨率的图像。The resolution of the input image used for training can be selected according to actual needs, an image with a fixed resolution can be selected, or an image with a different resolution can be selected.
为了使得训练出的神经网络模型能够支持多种分辨率的图像的编码参数的优化,输入图像可以选取为具有多种分辨率的图像。这些输入图像可以随机输入到神经网络模型中对其进行训练,也可以按照某种顺序输入到神经网络模型中对其进行训练。In order to enable the trained neural network model to support the optimization of encoding parameters of images with multiple resolutions, the input image can be selected as an image with multiple resolutions. These input images can be randomly input into the neural network model for training, or they can be input into the neural network model for training in a certain order.
例如,输入图像可以包括第一图像集合和第二图像集合。第一图像集合中的各图像具有相同的分辨率,第二图像集合中的图像的分辨率不同于第一图像集合中的图像的分辨率。第二图像集合中的图像可以具有相同或不同的分辨率,本申请实施例对此并不限定。For example, the input image may include a first image set and a second image set. Each image in the first image set has the same resolution, and the resolution of the images in the second image set is different from the resolution of the images in the first image set. The images in the second image set may have the same or different resolutions, which is not limited in this embodiment of the present application.
训练神经网络模型时,参见图9,可以分两步进行。When training a neural network model, see Figure 9, which can be performed in two steps.
在步骤92,采用第一图像集合中的图像对神经网络模型进行训练,得到 神经网络模型的参数。In step 92, the neural network model is trained using the images in the first image set to obtain the parameters of the neural network model.
该步骤对应的训练阶段可以称为单一分辨率图像训练阶段,目的是能够基于一种分辨率的图像快速训练出神经网络的模型。The training phase corresponding to this step can be called a single-resolution image training phase, the purpose is to be able to quickly train a neural network model based on a resolution image.
在步骤94,采用第二图像集合中的图像对神经网络模型的参数进行修正。In step 94, the images in the second image set are used to modify the parameters of the neural network model.
该步骤对应的训练阶段可以称为多分辨率图像训练阶段,目的是对步骤92生成的神经网络模型参数进行微调,使得神经网络模型可以用于对其他分辨率的图像进行处理。The training phase corresponding to this step can be called a multi-resolution image training phase, and the purpose is to fine-tune the parameters of the neural network model generated in step 92, so that the neural network model can be used to process images of other resolutions.
采用如图9所示的两步训练法,可以使得神经网络模型的参数快速收敛,从而可以提升神经网络模型的训练效率。The two-step training method shown in FIG. 9 can make the parameters of the neural network model quickly converge, thereby improving the training efficiency of the neural network model.
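The two-step schedule above can be sketched as follows. This is an illustrative skeleton only: the `train` function below is a trivial stand-in that fits a single scalar to a dummy target, not the patent's back-propagation; the resolutions, set sizes, and learning rates are assumptions.

```python
import numpy as np

def train(params, images, lr):
    """Illustrative stand-in for a training pass: nudges a scalar
    parameter toward a dummy per-image target. Real training would
    back-propagate a loss through the network instead."""
    for img in images:
        target = img.mean()                      # dummy supervision signal
        params = params + lr * (target - params)
    return params

rng = np.random.default_rng(42)

# Step 1 (single-resolution stage): all images share one resolution.
first_set = [rng.random((32, 32)) for _ in range(50)]
params = train(params=0.0, images=first_set, lr=0.1)

# Step 2 (multi-resolution stage): fine-tune on mixed resolutions
# with a smaller learning rate, so step 1's parameters are only adjusted.
second_set = ([rng.random((16, 16)) for _ in range(10)]
              + [rng.random((64, 64)) for _ in range(10)])
params = train(params=params, images=second_set, lr=0.01)
print(round(params, 3))
```

The smaller learning rate in the second stage mirrors the fine-tuning role of step 94: it adapts the model to other resolutions without discarding what step 92 learned.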
上文结合图1-图9,详细描述了本申请的方法实施例,下文结合图10,详细描述本申请的装置实施例。应理解,装置实施例与方法实施例对应,因此,未详细描述的部分可以参见前面各方法实施例。The method embodiments of the present application are described in detail above with reference to FIGS. 1 to 9, and the device embodiments of the present application are described in detail below with reference to FIG. 10. It should be understood that the device embodiments correspond to the method embodiments. Therefore, for the parts that are not described in detail, reference may be made to the foregoing method embodiments.
图10是本申请实施例提供的图像编码装置的示意性结构图。该图像编码装置1000包括存储器1010和处理器1020。FIG. 10 is a schematic structural diagram of an image encoding device according to an embodiment of the present application. The image encoding device 1000 includes a memory 1010 and a processor 1020.
存储器1010可用于存储程序。处理器1020可用于执行所述存储器中存储的程序,以执行如下操作:获取待编码图像;根据所述待编码图像,通过训练出的神经网络模型,得到所述待编码图像对应的最优编码参数;根据所述待编码图像对应的最优编码参数,对所述待编码图像进行编码。The memory 1010 may be used to store a program. The processor 1020 may be configured to execute the program stored in the memory to perform the following operations: obtaining an image to be encoded; obtaining, according to the image to be encoded and by using a trained neural network model, the optimal encoding parameter corresponding to the image to be encoded; and encoding the image to be encoded according to the optimal encoding parameter corresponding to the image to be encoded.
可选地,所述神经网络模型用于对所述待编码图像进行特征提取,得到特征向量;并根据所述特征向量,确定所述待编码图像对应的最优编码参数。Optionally, the neural network model is used to perform feature extraction on the image to be encoded to obtain a feature vector; and according to the feature vector, determine an optimal encoding parameter corresponding to the image to be encoded.
可选地,所述根据所述特征向量,确定所述待编码图像对应的最优编码参数可以包括:根据所述特征向量,生成输出向量,所述输出向量用于表示预先设定的多个候选编码参数各自为最优编码参数的概率;根据多个所述候选编码参数各自为最优编码参数的概率,从多个所述候选编码参数中选取概率最大的作为所述待编码图像对应的最优编码参数。Optionally, determining the optimal encoding parameter corresponding to the image to be encoded according to the feature vector may include: generating an output vector according to the feature vector, where the output vector indicates the probability that each of a plurality of preset candidate encoding parameters is the optimal encoding parameter; and selecting, according to those probabilities, the candidate encoding parameter with the highest probability as the optimal encoding parameter corresponding to the image to be encoded.
可选地,在所述获取待编码图像之前,所述处理器1020还可用于执行以下操作:根据最优编码参数已知的输入图像对所述神经网络模型进行训练。Optionally, before acquiring the image to be encoded, the processor 1020 may be further configured to perform the following operation: training the neural network model according to an input image whose optimal encoding parameters are known.
可选地,所述神经网络模型的训练过程可以采用交叉熵函数为损失函数。Optionally, the training process of the neural network model may use a cross-entropy function as a loss function.
可选地,所述输入图像的数据可以为经过归一化的数据。Optionally, the data of the input image may be normalized data.
可选地,所述输入图像的数据可以为双通道数据,所述双通道数据之一可以包括所述输入图像的两个颜色分量对应的数据。Optionally, the data of the input image may be dual-channel data, and one of the two-channel data may include data corresponding to two color components of the input image.
可选地,所述输入图像的数据可以为单通道数据,所述单通道数据可以包括所述输入图像的各个颜色分量对应的数据。Optionally, the data of the input image may be single-channel data, and the single-channel data may include data corresponding to each color component of the input image.
可选地,所述输入图像的数据可以为YUV或RGB格式的数据。Optionally, the data of the input image may be data in YUV or RGB format.
可选地,所述输入图像可以包括第一图像集合和第二图像集合,所述第一图像集合中的各图像具有相同的分辨率,所述第二图像集合中的图像的分辨率不同于所述第一图像集合中的图像的分辨率,所述根据最优编码参数已知的输入图像对所述神经网络模型进行训练可以包括:采用所述第一图像集合中的图像对所述神经网络模型进行训练,得到所述神经网络模型的参数;采用所述第二图像集合中的图像对所述神经网络模型的参数进行修正。Optionally, the input image may include a first image set and a second image set, where each image in the first image set has the same resolution and the resolution of the images in the second image set is different from the resolution of the images in the first image set. Training the neural network model according to an input image whose optimal encoding parameters are known may include: training the neural network model using the images in the first image set to obtain the parameters of the neural network model; and correcting the parameters of the neural network model using the images in the second image set.
可选地,所述根据所述待编码图像对应的最优编码参数,对所述待编码图像进行编码可以包括:根据所述待编码图像对应的最优编码参数,采用标准或非标准编码器对所述待编码图像进行编码。Optionally, encoding the image to be encoded according to the optimal encoding parameter corresponding to the image to be encoded may include: encoding the image to be encoded with a standard or non-standard encoder according to that optimal encoding parameter.
可选地,所述待编码图像对应的最优编码参数可以包括以下中的至少一种:用于指示所述待编码图像的变换方式的参数,用于指示所述待编码图像的量化方式的参数,以及用于指示所述待编码图像的熵编码方式的参数。Optionally, the optimal encoding parameter corresponding to the image to be encoded may include at least one of the following: a parameter indicating a transform manner of the image to be encoded, a parameter indicating a quantization manner of the image to be encoded, and a parameter indicating an entropy encoding manner of the image to be encoded.
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其他任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本发明实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(digital subscriber line,DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如数字视频光盘(digital video disc,DVD))、或者半导体介质(例如固态硬盘(solid state disk,SSD))等。In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any other combination. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present invention are wholly or partially generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be from a website site, computer, server, or data center Transmission by wire (such as coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (such as infrared, wireless, microwave, etc.) to another website site, computer, server, or data center. The computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, a data center, and the like that includes one or more available medium integration. 
The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)).
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各 示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。Those of ordinary skill in the art may realize that the units and algorithm steps of each example described in connection with the embodiments disclosed herein can be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Professional technicians can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of this application.
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are only schematic. For example, the division of the unit is only a logical function division. In actual implementation, there may be another division manner. For example, multiple units or components may be combined or Can be integrated into another system, or some features can be ignored or not implemented. In addition, the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, which may be electrical, mechanical or other forms.
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each of the units may exist separately physically, or two or more units may be integrated into one unit.
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。The above is only a specific implementation of this application, but the scope of protection of this application is not limited to this. Any person skilled in the art can easily think of changes or replacements within the technical scope disclosed in this application. It should be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
Claims (26)
- 一种图像编码方法,其特征在于,包括:An image encoding method, comprising:获取待编码图像;obtaining an image to be encoded;根据所述待编码图像,通过训练出的神经网络模型,得到所述待编码图像对应的最优编码参数;obtaining, according to the image to be encoded and by using a trained neural network model, the optimal encoding parameter corresponding to the image to be encoded;根据所述待编码图像对应的最优编码参数,对所述待编码图像进行编码。and encoding the image to be encoded according to the optimal encoding parameter corresponding to the image to be encoded.
- 根据权利要求1所述的方法,其特征在于,所述神经网络模型用于对所述待编码图像进行特征提取,得到特征向量;并根据所述特征向量,确定所述待编码图像对应的最优编码参数。The method according to claim 1, wherein the neural network model is used to perform feature extraction on the image to be encoded to obtain a feature vector, and to determine, according to the feature vector, the optimal encoding parameter corresponding to the image to be encoded.
- 根据权利要求2所述的方法,其特征在于,所述根据所述特征向量,确定所述待编码图像对应的最优编码参数,包括:The method according to claim 2, wherein the determining an optimal encoding parameter corresponding to the image to be encoded according to the feature vector comprises:根据所述特征向量,生成输出向量,所述输出向量用于表示预先设定的多个候选编码参数各自为最优编码参数的概率;Generating an output vector according to the feature vector, where the output vector is used to indicate a probability that each of a plurality of preset candidate coding parameters is an optimal coding parameter;根据多个所述候选编码参数各自为最优编码参数的概率,从多个所述候选编码参数中选取概率最大的作为所述待编码图像对应的最优编码参数。According to the probability that each of the plurality of candidate encoding parameters is an optimal encoding parameter, an optimal encoding parameter corresponding to the image to be encoded is selected from among the plurality of candidate encoding parameters with a highest probability.
- 根据权利要求1-3中任一项所述的方法,其特征在于,在所述获取待编码图像之前,所述方法还包括:The method according to any one of claims 1-3, wherein before the acquiring an image to be encoded, the method further comprises:根据最优编码参数已知的输入图像对所述神经网络模型进行训练。The neural network model is trained according to an input image whose optimal encoding parameters are known.
- 根据权利要求4所述的方法,其特征在于,所述神经网络模型的训练过程采用交叉熵函数为损失函数。The method according to claim 4, wherein the training process of the neural network model uses a cross-entropy function as a loss function.
- 根据权利要求4或5所述的方法,其特征在于,所述输入图像的数据为经过归一化的数据。The method according to claim 4 or 5, wherein the data of the input image is normalized data.
- 根据权利要求4-6中任一项所述的方法,其特征在于,所述输入图像的数据为双通道数据,所述双通道数据之一包括所述输入图像的两个颜色分量对应的数据。The method according to any one of claims 4-6, wherein the data of the input image is dual-channel data, and one of the two-channel data includes data corresponding to two color components of the input image .
- 根据权利要求4-6中任一项所述的方法,其特征在于,所述输入图像的数据为单通道数据,所述单通道数据包括所述输入图像的各个颜色分量对应的数据。The method according to any one of claims 4 to 6, wherein the data of the input image is single-channel data, and the single-channel data includes data corresponding to each color component of the input image.
- 根据权利要求7或8所述的方法,其特征在于,所述输入图像的数据为YUV或RGB格式的数据。The method according to claim 7 or 8, wherein the data of the input image is data in YUV or RGB format.
- 根据权利要求4-9中任一项所述的方法,其特征在于,所述输入图像包括第一图像集合和第二图像集合,所述第一图像集合中的各图像具有相同的分辨率,所述第二图像集合中的图像的分辨率不同于所述第一图像集合中的图像的分辨率,The method according to any one of claims 4-9, wherein the input image includes a first image set and a second image set, each image in the first image set has the same resolution, and the resolution of the images in the second image set is different from the resolution of the images in the first image set,所述根据最优编码参数已知的输入图像对所述神经网络模型进行训练,包括:wherein training the neural network model according to an input image whose optimal encoding parameters are known includes:采用所述第一图像集合中的图像对所述神经网络模型进行训练,得到所述神经网络模型的参数;training the neural network model using the images in the first image set to obtain the parameters of the neural network model;采用所述第二图像集合中的图像对所述神经网络模型的参数进行修正。and correcting the parameters of the neural network model using the images in the second image set.
- 根据权利要求1-10中任一项所述的方法,其特征在于,所述根据所述待编码图像对应的最优编码参数,对所述待编码图像进行编码,包括:The method according to any one of claims 1 to 10, wherein the encoding the image to be encoded according to an optimal encoding parameter corresponding to the image to be encoded comprises:根据所述待编码图像对应的最优编码参数,采用标准或非标准编码器对所述待编码图像进行编码。According to an optimal encoding parameter corresponding to the image to be encoded, the standard or non-standard encoder is used to encode the image to be encoded.
- 根据权利要求1-11中任一项所述的方法,其特征在于,所述待编码图像对应的最优编码参数包括以下中的至少一种:用于指示所述待编码图像的变换方式的参数,用于指示所述待编码图像的量化方式的参数,以及用于指示所述待编码图像的熵编码方式的参数。The method according to any one of claims 1-11, wherein the optimal encoding parameter corresponding to the image to be encoded includes at least one of the following: a parameter indicating a transform manner of the image to be encoded, a parameter indicating a quantization manner of the image to be encoded, and a parameter indicating an entropy encoding manner of the image to be encoded.
- 一种图像编码装置,其特征在于,包括:An image encoding device, comprising:存储器,用于存储程序;Memory for storing programs;处理器,用于执行所述存储器中存储的程序,以执行如下操作:A processor, configured to execute a program stored in the memory to perform the following operations:获取待编码图像;Obtaining an image to be encoded;根据所述待编码图像,通过训练出的神经网络模型,得到所述待编码图像对应的最优编码参数;Obtaining the optimal encoding parameters corresponding to the image to be encoded according to the trained neural network model according to the image to be encoded;根据所述待编码图像对应的最优编码参数,对所述待编码图像进行编码。And encoding the image to be encoded according to an optimal encoding parameter corresponding to the image to be encoded.
- 根据权利要求13所述的图像编码装置,其特征在于,所述神经网络模型用于对所述待编码图像进行特征提取,得到特征向量;并根据所述特征向量,确定所述待编码图像对应的最优编码参数。The image encoding device according to claim 13, wherein the neural network model is used to perform feature extraction on the image to be encoded to obtain a feature vector, and to determine, according to the feature vector, the optimal encoding parameter corresponding to the image to be encoded.
- 根据权利要求14所述的图像编码装置,其特征在于,所述根据所述特征向量,确定所述待编码图像对应的最优编码参数,包括:The image encoding device according to claim 14, wherein the determining an optimal encoding parameter corresponding to the image to be encoded according to the feature vector comprises:根据所述特征向量,生成输出向量,所述输出向量用于表示预先设定的多个候选编码参数各自为最优编码参数的概率;Generating an output vector according to the feature vector, where the output vector is used to indicate a probability that each of a plurality of preset candidate coding parameters is an optimal coding parameter;根据多个所述候选编码参数各自为最优编码参数的概率,从多个所述候 选编码参数中选取概率最大的作为所述待编码图像对应的最优编码参数。According to the probability that each of the plurality of candidate encoding parameters is an optimal encoding parameter, an optimal encoding parameter corresponding to the image to be encoded is selected from among the plurality of candidate encoding parameters with a highest probability.
- 根据权利要求13-15中任一项所述的图像编码装置,其特征在于,在所述获取待编码图像之前,所述处理器还用于执行以下操作:The image encoding device according to any one of claims 13-15, wherein before the acquiring an image to be encoded, the processor is further configured to perform the following operations:根据最优编码参数已知的输入图像对所述神经网络模型进行训练。The neural network model is trained according to an input image whose optimal encoding parameters are known.
- 根据权利要求16所述的图像编码装置,其特征在于,所述神经网络模型的训练过程采用交叉熵函数为损失函数。The image coding device according to claim 16, wherein the training process of the neural network model uses a cross-entropy function as a loss function.
- 根据权利要求16或17所述的图像编码装置,其特征在于,所述输入图像的数据为经过归一化的数据。The image encoding device according to claim 16 or 17, wherein the data of the input image is normalized data.
- 根据权利要求16-18中任一项所述的图像编码装置,其特征在于,所述输入图像的数据为双通道数据,所述双通道数据之一包括所述输入图像的两个颜色分量对应的数据。The image encoding device according to any one of claims 16-18, wherein the data of the input image is dual-channel data, and one of the two channels includes the data corresponding to two color components of the input image.
- 根据权利要求16-18中任一项所述的图像编码装置,其特征在于,所述输入图像的数据为单通道数据,所述单通道数据包括所述输入图像的各个颜色分量对应的数据。The image encoding device according to any one of claims 16 to 18, wherein the data of the input image is single-channel data, and the single-channel data includes data corresponding to each color component of the input image.
- 根据权利要求19或20所述的图像编码装置,其特征在于,所述输入图像的数据为YUV或RGB格式的数据。The image encoding device according to claim 19 or 20, wherein the data of the input image is data in YUV or RGB format.
- 根据权利要求16-21中任一项所述的图像编码装置,其特征在于,所述输入图像包括第一图像集合和第二图像集合,所述第一图像集合中的各图像具有相同的分辨率,所述第二图像集合中的图像的分辨率不同于所述第一图像集合中的图像的分辨率,The image encoding device according to any one of claims 16-21, wherein the input image includes a first image set and a second image set, each image in the first image set has the same resolution, and the resolution of the images in the second image set is different from the resolution of the images in the first image set,所述根据最优编码参数已知的输入图像对所述神经网络模型进行训练,包括:wherein training the neural network model according to an input image whose optimal encoding parameters are known includes:采用所述第一图像集合中的图像对所述神经网络模型进行训练,得到所述神经网络模型的参数;training the neural network model using the images in the first image set to obtain the parameters of the neural network model;采用所述第二图像集合中的图像对所述神经网络模型的参数进行修正。and correcting the parameters of the neural network model using the images in the second image set.
- 根据权利要求13-22中任一项所述的图像编码装置,其特征在于,所述根据所述待编码图像对应的最优编码参数,对所述待编码图像进行编码,包括:The image encoding device according to any one of claims 13 to 22, wherein the encoding the image to be encoded according to an optimal encoding parameter corresponding to the image to be encoded, comprises:根据所述待编码图像对应的最优编码参数,采用标准或非标准编码器对所述待编码图像进行编码。According to an optimal encoding parameter corresponding to the image to be encoded, the standard or non-standard encoder is used to encode the image to be encoded.
- 根据权利要求13-23中任一项所述的图像编码装置,其特征在于,所述待编码图像对应的最优编码参数包括以下中的至少一种:用于指示所述待编码图像的变换方式的参数,用于指示所述待编码图像的量化方式的参数,以及用于指示所述待编码图像的熵编码方式的参数。The image encoding device according to any one of claims 13-23, wherein the optimal encoding parameter corresponding to the image to be encoded includes at least one of the following: a parameter indicating a transform manner of the image to be encoded, a parameter indicating a quantization manner of the image to be encoded, and a parameter indicating an entropy encoding manner of the image to be encoded.
- 一种计算机可读存储介质,其特征在于,其上存储有用于执行如权利要求1-12中任一项所述的方法的指令。A computer-readable storage medium, characterized in that it stores instructions for performing the method according to any one of claims 1-12.
- 一种计算机程序产品,其特征在于,包括用于执行如权利要求1-12中任一项所述的方法的指令。A computer program product, comprising instructions for performing a method according to any one of claims 1-12.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/104022 WO2020047756A1 (en) | 2018-09-04 | 2018-09-04 | Image encoding method and apparatus |
CN201880037859.6A CN110870310A (en) | 2018-09-04 | 2018-09-04 | Image encoding method and apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/104022 WO2020047756A1 (en) | 2018-09-04 | 2018-09-04 | Image encoding method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020047756A1 true WO2020047756A1 (en) | 2020-03-12 |
Family
ID=69651651
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/104022 WO2020047756A1 (en) | 2018-09-04 | 2018-09-04 | Image encoding method and apparatus |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110870310A (en) |
WO (1) | WO2020047756A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114501026A (en) * | 2022-02-17 | 2022-05-13 | 北京百度网讯科技有限公司 | Video encoding method, device, equipment and storage medium |
CN116506622A (en) * | 2023-06-26 | 2023-07-28 | 瀚博半导体(上海)有限公司 | Model training method and video coding parameter optimization method and device |
WO2023169190A1 (en) * | 2022-03-07 | 2023-09-14 | 华为技术有限公司 | Encoding and decoding method, and electronic device |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111405285A (en) * | 2020-03-27 | 2020-07-10 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Method and apparatus for compressing image |
CN114731406B (en) * | 2020-12-04 | 2024-10-11 | SZ DJI Technology Co., Ltd. | Encoding method, decoding method, encoding device, and decoding device |
CN115412731B (en) * | 2021-05-11 | 2024-08-23 | Beijing Zitiao Network Technology Co., Ltd. | Video processing method, device, equipment and storage medium |
CN115883831A (en) * | 2021-08-05 | 2023-03-31 | Huawei Technologies Co., Ltd. | Encoding and decoding method and device |
CN114302425B (en) * | 2021-12-21 | 2024-06-04 | Shenzhen TCL New Technology Co., Ltd. | Equipment network distribution method and device, storage medium and electronic equipment |
CN114745556B (en) * | 2022-02-07 | 2024-04-02 | Zhejiang Smart Video Security Innovation Center Co., Ltd. | Encoding method, encoding device, digital retina system, electronic device, and storage medium |
CN115050093B (en) * | 2022-05-23 | 2024-05-31 | Shandong University | Cross-view gait recognition method based on a staged multi-level pyramid |
2018
- 2018-09-04 CN CN201880037859.6A patent/CN110870310A/en active Pending
- 2018-09-04 WO PCT/CN2018/104022 patent/WO2020047756A1/en active Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101924943A (en) * | 2010-08-27 | 2010-12-22 | Guo Min | Real-time low-bit-rate video transcoding method based on H.264 |
KR20170059040A (en) * | 2015-11-19 | 2017-05-30 | Korea Electronics Technology Institute | Optimal mode decision unit of video encoder and video encoding method using the optimal mode decision |
US20170264902A1 (en) * | 2016-03-09 | 2017-09-14 | Sony Corporation | System and method for video processing based on quantization parameter |
CN107609549A (en) * | 2017-09-20 | 2018-01-19 | Beijing University of Technology | Text detection method for certificate images in natural scenes |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114501026A (en) * | 2022-02-17 | 2022-05-13 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Video encoding method, device, equipment and storage medium |
CN114501026B (en) * | 2022-02-17 | 2023-04-14 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Video coding method, device, equipment and storage medium |
WO2023169190A1 (en) * | 2022-03-07 | 2023-09-14 | Huawei Technologies Co., Ltd. | Encoding and decoding method, and electronic device |
CN116506622A (en) * | 2023-06-26 | 2023-07-28 | Hanbo Semiconductor (Shanghai) Co., Ltd. | Model training method and video coding parameter optimization method and device |
CN116506622B (en) * | 2023-06-26 | 2023-09-08 | Hanbo Semiconductor (Shanghai) Co., Ltd. | Model training method and video coding parameter optimization method and device |
Also Published As
Publication number | Publication date |
---|---|
CN110870310A (en) | 2020-03-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020047756A1 (en) | Image encoding method and apparatus | |
KR102285738B1 (en) | Method and apparatus for assessing subjective quality of a video | |
CN110798690B (en) | Video decoding method, and method, device and equipment for training loop filtering model | |
KR101377021B1 (en) | Encoding device and method, decoding device and method, and transmission system | |
WO2018150083A1 (en) | A method and technical equipment for video processing | |
TWI590662B (en) | Decoder and method | |
KR20220145405A (en) | An image encoding/decoding method and apparatus for performing feature quantization/inverse quantization, and a recording medium for storing a bitstream | |
US11893761B2 (en) | Image processing apparatus and method | |
US20230362378A1 (en) | Video coding method and apparatus | |
WO2018120019A1 (en) | Compression/decompression apparatus and system for use with neural network data | |
WO2023279961A1 (en) | Video image encoding method and apparatus, and video image decoding method and apparatus | |
CN116547969A (en) | Processing method of chroma subsampling format in image decoding based on machine learning | |
WO2023050720A1 (en) | Image processing method, image processing apparatus, and model training method | |
CN118872266A (en) | Video decoding method based on multi-mode processing | |
CN114222127A (en) | Video coding method, video decoding method and device | |
CN108805943B (en) | Image transcoding method and device | |
US20210092403A1 (en) | Object manipulation video conference compression | |
US20220335560A1 (en) | Watermark-Based Image Reconstruction | |
Mohammadi et al. | Perceptual impact of the loss function on deep-learning image coding performance | |
US20230171435A1 (en) | Image encoding, decoding method and device, coder-decoder | |
CN114245126B (en) | Depth feature map compression method based on texture cooperation | |
WO2023133888A1 (en) | Image processing method and apparatus, remote control device, system, and storage medium | |
EP4360059A1 (en) | Methods and apparatuses for encoding/decoding an image or a video | |
CN114600166A (en) | Image processing method, image processing apparatus, and storage medium | |
CN111988621A (en) | Video processor training method and device, video processing device and video processing method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18932527; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 18932527; Country of ref document: EP; Kind code of ref document: A1 |