WO2020181554A1 - Method for determining a prediction value, decoder, and computer storage medium - Google Patents

Method for determining a prediction value, decoder, and computer storage medium

Info

Publication number
WO2020181554A1
WO2020181554A1 (PCT/CN2019/078160; CN2019078160W)
Authority
WO
WIPO (PCT)
Prior art keywords
image block
reference image
pixel matrix
input value
matrix
Prior art date
Application number
PCT/CN2019/078160
Other languages
English (en)
French (fr)
Inventor
周益民
程学理
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司 filed Critical Oppo广东移动通信有限公司
Priority to PCT/CN2019/078160 priority Critical patent/WO2020181554A1/zh
Priority to CN201980093336.8A priority patent/CN113490953A/zh
Publication of WO2020181554A1 publication Critical patent/WO2020181554A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/587Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal sub-sampling or interpolation, e.g. decimation or subsequent interpolation of pictures in a video sequence
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding

Definitions

  • the embodiments of the present application relate to the technical field of inter-frame prediction in video decoding, and in particular, to a method for determining a prediction value, a decoder, and a computer storage medium.
  • In existing video coding standards, inter-frame prediction makes full use of the strong temporal correlation between video frames to compress video images, and is widely used in the compression codecs of ordinary TV, conference TV, video telephony, and high-definition TV.
  • To remove temporal redundancy as far as possible, the core techniques are motion estimation (ME) and motion compensation (MC). In temporally adjacent or nearby encoded and reconstructed images, i.e., the reference images, the encoder searches for the best matching block of the image block to be encoded, uses it as the reference image block of the image block to be encoded, computes the residual between the reference image block and the image block to be encoded, and then generates a bit stream for transmission through transform, quantization, entropy coding and other processes.
  • Since video content is generally dynamic, the current image block to be encoded usually cannot find, in the reference image, a reference pixel block whose pixel values match exactly, so a certain residual naturally exists between it and the best matching block selected by the encoder.
  • Moreover, the reference image is an encoded image rather than the source image, and because of quantization there is a certain distortion between the encoded image and the source image. The residual between the reference image block and the image block to be coded is therefore further amplified, causing the encoder to consume more bits to encode the prediction residual information.
  • In view of this, the embodiments of the present application are intended to provide a method for determining a predicted value, a decoder, and a computer storage medium, which can improve the decoding efficiency of the decoder.
  • an embodiment of the present application provides a method for determining a prediction value.
  • The method is applied to a decoder, and the method includes: obtaining a pixel matrix of a reference image block of an image block to be decoded; determining an input value according to the pixel matrix of the reference image block; and inputting the input value into a preset neural network to obtain a predicted value of the image block to be decoded.
  • the input value is the pixel matrix of the reference image block.
  • Determining the input value according to the pixel matrix of the reference image block includes: obtaining a pixel matrix of an adjacent reference image block of the reference image block; and determining the input value according to the pixel matrix of the reference image block and the pixel matrix of the adjacent reference image block.
  • the input value is a pixel matrix composed of the pixel matrix of the reference image block and the pixel matrix of the adjacent reference image block arranged in relative positions.
  • Determining the input value according to the pixel matrix of the reference image block and the pixel matrix of the adjacent reference image block includes:
  • processing the pixel matrix of the adjacent reference image block according to a preset interpolation method to obtain an interpolated pixel matrix of the adjacent reference image block; and determining, as the input value of the neural network, a pixel matrix composed of the pixel matrix of the reference image block and the interpolated pixel matrix of the adjacent reference image block arranged according to their relative positions.
  • an embodiment of the present application provides a decoder, and the decoder includes:
  • The obtaining module is configured to obtain the pixel matrix of the reference image block of the image block to be decoded; the determining module is configured to determine the input value according to the pixel matrix of the reference image block; and the processing module is configured to input the input value into the preset neural network to obtain the predicted value of the image block to be decoded.
  • the input value is the pixel matrix of the reference image block.
  • The determining module includes:
  • an obtaining sub-module configured to obtain the pixel matrix of the adjacent reference image block of the reference image block; and a determining sub-module configured to determine the input value according to the pixel matrix of the reference image block and the pixel matrix of the adjacent reference image block.
  • the input value is a pixel matrix composed of the pixel matrix of the reference image block and the pixel matrix of the adjacent reference image block arranged in relative positions.
  • The determining sub-module is specifically configured to:
  • process the pixel matrix of the adjacent reference image block according to a preset interpolation method to obtain an interpolated pixel matrix of the adjacent reference image block; and determine, as the input value of the neural network, a pixel matrix composed of the pixel matrix of the reference image block and the interpolated pixel matrix of the adjacent reference image block arranged according to their relative positions.
  • An embodiment of the present application provides a decoder, and the decoder includes: a processor and a storage medium storing instructions executable by the processor, where the storage medium depends on the processor to perform operations through a communication bus, and when the instructions are executed by the processor, the method for determining a predicted value described in the first aspect is performed.
  • An embodiment of the present application provides a computer storage medium storing executable instructions which, when executed by one or more processors, cause the processor(s) to perform the method for determining a predicted value described in the first aspect.
  • the embodiment of the application provides a method, a decoder, and a computer storage medium for determining a predicted value.
  • the method is applied to a decoder.
  • The method includes: obtaining a pixel matrix of a reference image block of an image block to be decoded; determining an input value according to the pixel matrix of the reference image block; and inputting the input value into a preset neural network to obtain the predicted value of the image block to be decoded. That is to say, in the embodiment of the present application, the pixel matrix of the reference image block of the image block to be decoded is first obtained, and the pixel matrix of the reference image block is then processed by the preset neural network.
  • In this way, the neural network is used to obtain the predicted value of the image block to be encoded, so that the predicted value is closer to the pixel matrix of the image block to be encoded, which reduces the bit stream of the prediction residual and thus improves the efficiency of video image coding and decoding.
  • FIG. 1 is a schematic flowchart of an optional method for determining a predicted value according to an embodiment of the application
  • Figure 2 is a schematic diagram of the arrangement of image blocks to be decoded
  • FIG. 3 is a schematic flowchart of another optional method for determining a predicted value according to an embodiment of the application
  • FIG. 4 is a schematic diagram of the arrangement of an optional image block to be encoded and a reference image block provided by an embodiment of this application;
  • FIG. 5 is a schematic structural diagram of an optional neural network provided by an embodiment of this application.
  • FIG. 6 is a first structural diagram of a decoder provided by an embodiment of this application.
  • FIG. 7 is a second structural diagram of a decoder provided by an embodiment of this application.
  • FIG. 1 is a schematic flowchart of an optional method for determining a predicted value provided by an embodiment of the present application. As shown in FIG. 1, the method for determining the predicted value may include: S101: Obtain the pixel matrix of the reference image block of the image block to be decoded.
  • In video image coding and decoding, the encoder can use ME, MC, vector prediction and other techniques to select the best temporal reference image block from the reconstructed reference image, determine the prediction residual of the image block to be encoded using the reference image block and the image block to be encoded, and transmit the prediction residual to the decoder; the decoder then uses the selected reference image block and the prediction residual to decode the real image block.
  • Accordingly, for the image block to be decoded, the decoder can use ME, MC, vector prediction and other techniques to select the reference image block from the reconstructed reference image and obtain the pixel matrix of the reference image block.
  • The pixel matrix of the reference image block is used to determine the predicted value of the image block to be decoded, so that the real image block is decoded from the predicted value and the prediction residual.
  • Figure 2 is a schematic diagram of the arrangement of image blocks to be decoded. As shown in Figure 2, the diagonally striped area represents the image blocks that have already been decoded. During decoding, the decoder processes the image blocks in order (each row from left to right). In Fig. 2, after the lower-left image block has been decoded, the next image block after the lower-left image block is the image block to be decoded (the blank square in Fig. 2).
  • the pixel matrix of the reference image block may be the pixel matrix of the chrominance value of the reference image block, or may be the pixel matrix of the luminance value of the reference image block, which is not specifically limited in the embodiment of the present application.
  • S102: Determine the input value according to the pixel matrix of the reference image block. The decoder uses a preset neural network to determine the predicted value of the image block to be decoded; therefore, in order to obtain the predicted value of the image block to be decoded, the input value of the neural network needs to be determined first.
  • the input value may be the pixel matrix of the reference image block.
  • Specifically, the decoder directly uses the pixel matrix of the reference image block as the input value of the neural network. For example, the pixel matrix of the reference image block is an N×N matrix; this matrix is input into the neural network, and processing the matrix with the neural network yields the predicted value of the image block to be encoded.
  • FIG. 3 is a schematic flowchart of another optional method for determining a predicted value provided in an embodiment of this application.
  • S102 may include:
  • S301: Obtain the pixel matrix of an adjacent reference image block of the reference image block. S302: Determine the input value according to the pixel matrix of the reference image block and the pixel matrix of the adjacent reference image block.
  • The decoder not only obtains the pixel matrix of the reference image block, but also uses ME, MC, vector prediction and other techniques to obtain, from the reference image, the pixel matrices of the adjacent reference image blocks of the reference image block.
  • When the reference image block is a non-boundary reference image block, all reference image blocks adjacent to it are obtained, for example the reference image blocks above, below, to the left, to the right, at the upper left, upper right, lower left, and lower right of the reference image block. In this way, the pixel matrices of the adjacent reference image blocks of the reference image block can be obtained.
  • After the decoder determines the reference image block, it records the pixel distance between the reference image block and the image block to be decoded, that is, the motion vector (MV) information. Using whole-pixel (integer-pel) motion search, the decoder can obtain the pixel matrices of the 8 adjacent reference image blocks according to the MV.
  • In this way, after the pixel matrix of the reference image block and the pixel matrices of the adjacent reference image blocks are obtained, the input value of the neural network can be determined from the pixel matrix of the reference image block and the pixel matrices of the adjacent reference image blocks.
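  • As an illustration of the integer-pel fetch just described, the following Python/NumPy sketch returns the reference block addressed by a whole-pixel MV; the picture layout, the sample type, and the function and variable names are assumptions of this sketch rather than part of the described embodiment, and clipping at picture boundaries is omitted.

    import numpy as np

    def fetch_reference_block(ref_picture, block_y, block_x, mv, n):
        """Return the N x N reference block addressed by an integer-pel MV.

        ref_picture: 2-D array of reconstructed samples (luma or chroma);
        block_y, block_x: top-left position of the image block to be decoded;
        mv: (dy, dx) motion vector in whole pixels; n: block size N.
        """
        ref_y, ref_x = block_y + mv[0], block_x + mv[1]
        return ref_picture[ref_y:ref_y + n, ref_x:ref_x + n]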
  • In an optional embodiment, the input value is a pixel matrix composed of the pixel matrix of the reference image block and the pixel matrices of the adjacent reference image blocks arranged according to their relative positions.
  • For example, the pixel matrix of the reference image block is an N×N pixel matrix and the pixel matrix of each adjacent reference image block is an N×N pixel matrix; the pixel matrix composed according to the relative position relationship is then a 3N×3N pixel matrix, and this 3N×3N pixel matrix is used as the input value of the neural network.
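  • For the non-boundary, integer-pel case, the 3N×3N arrangement described above can be sketched as follows; again the picture layout, indexing, and names are assumptions of this sketch.

    import numpy as np

    def assemble_input(ref_picture, ref_y, ref_x, n):
        """Arrange the reference block and its 8 integer-pel neighbours into
        the 3N x 3N input matrix (non-boundary case). ref_y, ref_x are the
        top-left corner of the reference block in the reference picture."""
        rows = []
        for dy in (-n, 0, n):                      # above / same row / below
            rows.append([ref_picture[ref_y + dy:ref_y + dy + n,
                                     ref_x + dx:ref_x + dx + n]
                         for dx in (-n, 0, n)])    # left / centre / right
        return np.block(rows)                      # 3N x 3N pixel matrix

    # Example with N = 8: a 24 x 24 input centred on the reference block.
    # picture = np.arange(64 * 64).reshape(64, 64)
    # x = assemble_input(picture, ref_y=16, ref_x=24, n=8)   # x.shape == (24, 24)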
  • S302 may include:
  • the pixel matrix of the adjacent reference image block is processed to obtain the pixel matrix of the adjacent reference image block after interpolation;
  • the pixel matrix composed of the pixel matrix of the reference image block and the pixel matrix of the neighboring reference image block after interpolation is arranged according to the relative position, and the pixel matrix is determined as the input value of the neural network.
  • The preset interpolation method may include linear interpolation, bilinear interpolation, and cubic interpolation, which is not specifically limited in the embodiment of the present application.
  • Specifically, if the pixel matrices of the 8 adjacent reference image blocks were used directly as the input value of the neural network, the pixel information would not be smooth and the generated image would show visible boundary artifacts. To remove these artifacts, the decoder uses sub-pixel-precision motion search; that is, the decoder processes the pixel matrices of the 8 adjacent reference image blocks with the preset interpolation method to obtain the interpolated pixel matrices of the adjacent reference image blocks. Finally, the pixel matrix composed of the pixel matrix of the reference image block and the interpolated pixel matrices of the adjacent reference image blocks arranged according to their relative positions is determined as the input value of the neural network.
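  • As one hedged illustration of the interpolation step, the sketch below samples a neighbouring block at a fractional (sub-pixel) offset with plain bilinear weighting, which is one of the interpolation methods named above; the actual filter taps used by a codec are not specified here, and the function name and argument layout are assumptions.

    import numpy as np

    def bilinear_subpel_block(ref_picture, top, left, n, frac_y, frac_x):
        """Sample an N x N block at the sub-pixel position
        (top + frac_y, left + frac_x), with 0 <= frac_y, frac_x < 1."""
        a = ref_picture[top:top + n,         left:left + n        ].astype(np.float64)
        b = ref_picture[top:top + n,         left + 1:left + n + 1].astype(np.float64)
        c = ref_picture[top + 1:top + n + 1, left:left + n        ].astype(np.float64)
        d = ref_picture[top + 1:top + n + 1, left + 1:left + n + 1].astype(np.float64)
        out = ((1 - frac_y) * (1 - frac_x) * a + (1 - frac_y) * frac_x * b
               + frac_y * (1 - frac_x) * c + frac_y * frac_x * d)
        return np.rint(out).astype(ref_picture.dtype)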
  • Fig. 4 is a schematic diagram of the arrangement of an optional image block to be encoded and reference image blocks provided by an embodiment of the application. In Fig. 4, the smallest square represents one pixel, and each square formed by 16 pixels represents an image block to be decoded.
  • During decoding, ME, MC, vector prediction and other techniques are first used to obtain the reference image block of the image block to be decoded from the reference image, and the pixel distance between the reference image block and the image block to be decoded, namely the MV information, is recorded. If whole-pixel motion search is used, the pixel information of the reference image block and of its 8 neighbouring blocks (above, below, left, right, upper left, upper right, lower left, and lower right) can be obtained directly from the reference image according to the MV.
  • When the decoder selects the reference image block with sub-pixel-precision motion search, the 8 adjacent reference image blocks around it need to be interpolated at the corresponding sub-pixel positions before being re-assembled at their relative positions as the input of the neural network. When the reference image block is a boundary image block, the embodiment of the present application does not perform interpolation on it.
  • In Fig. 4, the start of the arrow is the image block to be decoded and the end of the arrow is the reference image block of the image block to be decoded. In the reference image, all reference image blocks adjacent to the reference image block are the adjacent reference image blocks of the reference image block.
  • S103: Input the input value into the preset neural network to obtain the predicted value of the image block to be decoded.
  • After the input value of the neural network is determined, it is input into the neural network. The input value may be the pixel matrix of the reference image block; or a pixel matrix composed of the pixel matrix of the reference image block and the pixel matrices of the adjacent reference image blocks arranged according to their relative positions; or a pixel matrix composed of the pixel matrix of the reference image block and the interpolated pixel matrices of the adjacent reference image blocks arranged according to their relative positions.
  • Depending on the input value, the processing of the input value by the neural network can be implemented in two ways:
  • S103 may include:
  • Input the input value into the neural network, and sequentially perform normalization operation, convolution operation, feature extraction, denormalization operation and addition operation on the input value to obtain the predicted value of the image block to be decoded.
  • There are two cases for determining the predicted value of the image block to be decoded, where the pixel matrix of the image block to be decoded is an N×N matrix.
  • In the first case, the input value is the pixel matrix of the reference image block, i.e., a matrix with the same dimensions as the image block to be decoded, for example an N×N matrix.
  • In the other case, the input value is a pixel matrix composed of the pixel matrix of the reference image block and the pixel matrices of the adjacent reference image blocks arranged according to their relative positions, or of the pixel matrix of the reference image block and the interpolated pixel matrices of the adjacent reference image blocks arranged according to their relative positions, for example a 3N×3N matrix. For matrices of these different dimensions, the predicted value of the image block to be decoded can be determined in the following ways:
  • S103 may include:
  • Normalize the input value to obtain a normalized input matrix; perform a convolution operation on the normalized input matrix according to the preset convolution kernel to obtain a matrix after the convolution operation; perform feature extraction on the matrix after the convolution operation to obtain a residual matrix; add the residual matrix to the normalized input matrix to obtain an added matrix; and denormalize the added matrix to obtain the predicted value of the image block to be decoded.
  • Taking the pixel matrix of the image block to be decoded as an N×N matrix as an example, the obtained pixel matrix of the reference image block is also an N×N matrix. The pixel matrix of the reference image block (the N×N matrix) is first normalized to obtain the normalized input matrix; then the normalized input matrix is convolved with the preset convolution kernel; next, feature extraction is performed on the convolved matrix using the residual (Res) layer to obtain the residual matrix; the residual matrix is then added to the normalized input matrix; and finally the added matrix is denormalized to obtain the predicted value of the image block to be decoded. In this way, the predicted value of the image block to be decoded is an N×N matrix.
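  • The five steps of this N×N case can be summarised in the sketch below; conv_layers and res_layer are placeholders for the preset convolution kernels and the Res feature-extraction layer, and scaling sample values by 255 for the normalization and clipping the result are assumptions of this sketch.

    import numpy as np

    def predict_from_reference(ref_block, conv_layers, res_layer):
        """N x N case: normalize -> convolve -> extract residual features ->
        add back the normalized input -> denormalize."""
        x = ref_block.astype(np.float64) / 255.0      # normalization (assumed /255)
        features = conv_layers(x)                     # convolution operation(s)
        residual = res_layer(features)                # residual matrix, N x N
        added = residual + x                          # addition with the normalized input
        pred = np.clip(added * 255.0, 0.0, 255.0)     # denormalization (assumed)
        return np.rint(pred).astype(ref_block.dtype)  # predicted value of the block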
  • S103 may include:
  • Normalize the input value to obtain a normalized input matrix; normalize the pixel matrix of the reference image block to obtain a normalized pixel matrix of the reference image block; perform a convolution operation on the normalized input matrix according to the preset convolution kernel to obtain a matrix after the convolution operation;
  • scale the matrix after the convolution operation according to the preset scaling convolutional layer to obtain a pixel matrix of a preset dimension, where the preset dimension is the same as the dimension of the pixel matrix of the reference image block; perform feature extraction on the matrix of the preset dimension to obtain a residual matrix; add the residual matrix to the normalized pixel matrix of the reference image block to obtain an added matrix; and denormalize the added matrix to obtain the predicted value of the image block to be decoded.
  • Still taking the pixel matrix of the image block to be decoded as an N×N matrix as an example, the obtained pixel matrix of the reference image block is also an N×N matrix, and the pixel matrices of the adjacent reference image blocks, or the interpolated pixel matrices of the adjacent reference image blocks, are all N×N matrices; the input value is therefore a 3N×3N matrix, and this 3N×3N matrix is input into the neural network.
  • the neural network is a residual network (ResNet, Residual Network) based on a convolutional neural network.
  • FIG. 5 is a schematic structural diagram of an optional neural network provided by an embodiment of the application. Referring to FIG. 5, the neural network may include 4 convolutional layers with kernels of different sizes and depths, a scaling convolutional layer, and a Res layer; Table 1 in the description below gives the network configuration details of the neural network in FIG. 5.
  • First, the 3N×3N matrix composed of the pixel matrix of the reference image block and the pixel matrices of the adjacent reference image blocks, or of the interpolated pixel matrices of the adjacent reference image blocks, is input into the neural network as the input value (Input).
  • The 3N×3N matrix is normalized to obtain the normalized input matrix, and the pixel matrix of the reference image block is normalized to obtain the normalized pixel matrix of the reference image block. The 3N×3N matrix is then convolved by the convolution kernels of layers 01 to 04, each of which is a convolution followed by a Leaky ReLU activation, where Leaky ReLU is an activation function and alpha is a parameter of the activation function.
  • After the convolved matrix is obtained, the decoder uses the layer-05 convolution kernel (corresponding to the preset scaling convolutional layer) to perform a dimension-reducing convolution on the convolved matrix, i.e., scaling, to obtain an N×N matrix (corresponding to the matrix of the preset dimension), and then performs feature extraction with the Res layer to obtain the residual matrix.
  • The residual matrix is added to the normalized pixel matrix of the reference image block to obtain the added matrix, and the added matrix is then denormalized to obtain the predicted value of the image block to be decoded, which serves as the output value (Output) of the neural network.
  • Since the decoder partitions a picture into pixel blocks of different sizes such as 8×8 and 16×16, the pixel values of image blocks of different sizes differ noticeably.
  • The pixels of luminance image blocks and the pixels of chrominance image blocks also differ considerably in texture characteristics. Therefore, for luminance and chrominance image blocks of different sizes, different network parameters can be trained to ensure better predicted values.
  • In practical applications, once the neural network is ported into the encoder and the decoder, the predicted value can be computed after the encoder and the decoder select the best matching block. After the predicted value is obtained, for the encoder, the prediction residual passed from the inter-frame prediction module to the subsequent modules needs to be replaced by the difference between the predicted value and the image block to be coded.
  • That is, compared with using the difference between the reference image block and the image block to be decoded as the prediction residual, the encoder uses the difference between the determined predicted value and the image block to be decoded as the prediction residual, so the prediction residual can be transmitted with a smaller bit stream, which improves coding and decoding efficiency.
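  • The following toy comparison illustrates why a refined prediction tends to shrink the residual; the random blocks and noise levels are invented for illustration only, and a smaller residual energy typically, though not always, translates into fewer coded bits.

    import numpy as np

    rng = np.random.default_rng(0)
    block_to_code = rng.integers(0, 256, (8, 8)).astype(np.int64)
    reference = block_to_code + rng.integers(-12, 13, (8, 8))      # matched reference block
    nn_prediction = block_to_code + rng.integers(-4, 5, (8, 8))    # refined (network) prediction

    res_ref = block_to_code - reference        # residual against the reference block
    res_pred = block_to_code - nn_prediction   # residual against the refined prediction
    print(int((res_ref ** 2).sum()), int((res_pred ** 2).sum()))   # lower energy is cheaper to code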
  • Similarly to the decoder side, when the encoder encodes the current block, it needs to feed the reconstructed reference image block and its adjacent reference image blocks into the corresponding neural network to compute the predicted value, and then subtract the prediction residual from the predicted value to obtain the pixels of the currently encoded image block, thereby completing the encoding of the currently encoded image block and ensuring consistency between encoding and decoding.
  • the embodiment of the present application provides a method for determining a predicted value.
  • The method includes: obtaining a pixel matrix of a reference image block of an image block to be decoded; determining an input value according to the pixel matrix of the reference image block; and inputting the input value into a preset neural network to obtain the predicted value of the image block to be decoded.
  • That is to say, in the embodiment of the present application, the pixel matrix of the reference image block of the image block to be decoded is first obtained, and the pixel matrix of the reference image block is then processed by the preset neural network.
  • In this way, the neural network is used to obtain the predicted value of the image block to be coded, so that the predicted value is closer to the pixel matrix of the image block to be coded, which reduces the bit stream of the prediction residual and improves the efficiency of video image coding and decoding.
  • FIG. 6 is a structural schematic diagram 1 of a decoder provided by an embodiment of this application.
  • the decoder may include:
  • the obtaining module 61 is configured to obtain the pixel matrix of the reference image block of the image block to be decoded
  • the determining module 62 is configured to determine the input value according to the pixel matrix of the reference image block
  • the processing module 63 is used to input the input value into the preset neural network to obtain the predicted value of the image block to be decoded.
  • the input value may be the pixel matrix of the reference image block.
  • the determining module 62 includes:
  • the determining sub-module is used to determine the input value according to the pixel matrix of the reference image block and the pixel matrix of the adjacent reference image block.
  • the input value may be a pixel matrix composed of the pixel matrix of the reference image block and the pixel matrix of the adjacent reference image block arranged according to relative positions.
  • the sub-module is determined, which is specifically used for:
  • the pixel matrix of the adjacent reference image block is processed to obtain the pixel matrix of the adjacent reference image block after interpolation;
  • the pixel matrix composed of the pixel matrix of the reference image block and the pixel matrix of the neighboring reference image block after interpolation is arranged according to the relative position, and the pixel matrix is determined as the input value of the neural network.
  • a "unit” may be a part of a circuit, a part of a processor, a part of a program, or software, etc., of course, may also be a module, or may be non-modular.
  • constituent units in this embodiment may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be realized in the form of hardware or software function module.
  • If the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of this embodiment essentially, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product.
  • The computer software product is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method described in this embodiment.
  • the aforementioned storage media include: U disk, mobile hard disk, read only memory (Read Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk and other media that can store program codes.
  • FIG. 7 is a second structural diagram of a decoder provided by an embodiment of the application. As shown in FIG. 7, an embodiment of the present application provides a decoder 700.
  • The decoder 700 includes a processor 71 and a storage medium 72 storing instructions executable by the processor 71.
  • The storage medium 72 depends on the processor 71 to perform operations through the communication bus 73.
  • When the instructions are executed by the processor 71, the method for determining a predicted value of the first embodiment is performed.
  • the communication bus 73 is used to implement connection and communication between these components.
  • In addition to a data bus, the communication bus 73 also includes a power bus, a control bus, and a status signal bus.
  • various buses are marked as the communication bus 73 in FIG. 7.
  • An embodiment of the present application provides a computer storage medium that stores executable instructions which, when executed by one or more processors, cause the processors to perform the method for determining a predicted value described in one or more of the above embodiments.
  • the memory in the embodiment of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), and electrically available Erase programmable read-only memory (Electrically EPROM, EEPROM) or flash memory.
  • the volatile memory may be a random access memory (Random Access Memory, RAM), which is used as an external cache.
  • By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM).
  • the processor may be an integrated circuit chip with signal processing capabilities.
  • the steps of the above method can be completed by hardware integrated logic circuits in the processor or instructions in the form of software.
  • The above-mentioned processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the methods, steps, and logical block diagrams disclosed in the embodiments of the present application can be implemented or executed.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a mature storage medium in the field such as random access memory, flash memory, read-only memory, programmable read-only memory, or electrically erasable programmable memory, registers.
  • the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
  • the embodiments described herein can be implemented by hardware, software, firmware, middleware, microcode, or a combination thereof.
  • For hardware implementation, the processing unit can be implemented in one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described in this application, or a combination thereof.
  • the technology described herein can be implemented by modules (such as procedures, functions, etc.) that perform the functions described herein.
  • the software codes can be stored in the memory and executed by the processor.
  • the memory can be implemented in the processor or external to the processor.
  • Through the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
  • Based on this understanding, the technical solution of this application essentially, or the part that contributes to the existing technology, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions to enable a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the method described in each embodiment of the present application.
  • In the embodiments of the present application, the pixel matrix of the reference image block of the image block to be decoded is first obtained, the input value is determined according to the pixel matrix of the reference image block, and the input value is input into the preset neural network to obtain the predicted value of the image block to be decoded. That is to say, the pixel matrix of the reference image block of the image block to be decoded is first obtained, and the pixel matrix of the reference image block is then processed by the preset neural network.
  • In this way, the neural network is used to obtain the predicted value of the image block to be coded, so that the predicted value is closer to the pixel matrix of the image block to be coded, which reduces the bit stream of the prediction residual and thus improves the efficiency of video image coding and decoding.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Embodiments of the present application provide a method for determining a prediction value, a decoder, and a computer storage medium. The method is applied in a decoder and includes: obtaining a pixel matrix of a reference image block of an image block to be decoded; determining an input value according to the pixel matrix of the reference image block; and inputting the input value into a preset neural network to obtain a prediction value of the image block to be decoded. That is to say, in the embodiments of the present application, the pixel matrix of the reference image block of the image block to be decoded is first obtained, and the pixel matrix of the reference image block is then processed by the preset neural network. In this way, the neural network is used to obtain the prediction value of the image block to be coded, so that the prediction value is closer to the pixel matrix of the image block to be coded, which reduces the bit stream of the prediction residual and thus improves the efficiency of video image coding and decoding.

Description

Method for determining a prediction value, decoder, and computer storage medium
Technical Field
The embodiments of the present application relate to the technical field of inter-frame prediction in video decoding, and in particular to a method for determining a prediction value, a decoder, and a computer storage medium.
Background
In existing video coding standards, inter-frame prediction makes full use of the strong temporal correlation between video frames to compress video images, and is widely used in the compression codecs of ordinary TV, conference TV, video telephony, and high-definition TV.
To remove temporal redundancy as much as possible, the core techniques are motion estimation (ME) and motion compensation (MC). In temporally adjacent or nearby encoded and reconstructed images, i.e., the reference images, the encoder searches for the best matching block of the image block to be encoded, takes it as the reference image block of the current image block to be encoded, computes the residual between the reference image block and the image block to be encoded, and then generates a bit stream for transmission through transform, quantization, entropy coding and other processes. Since video content is generally dynamic, the current image block to be encoded usually cannot find, in the reference image, a reference pixel block whose pixel values match exactly, so a certain residual naturally exists between it and the best matching block selected by the encoder. In addition, the reference image is an encoded image rather than the source image; because of quantization, there is a certain distortion between the encoded image and the source image. As a result, the residual between the reference image block and the image block to be encoded is further amplified, and the encoder has to spend more bits to encode the prediction residual information.
Existing encoders filter the optimal reference image block with mathematical models such as MC, optical flow, and weighted prediction to reduce the prediction residual and remove temporal redundancy as far as possible. However, since these are all empirical models, the residual between the predicted optimal reference image block and the current coded image block is still of a high order of magnitude, which affects the coding and decoding efficiency of video images. It can thus be seen that the prediction values determined in existing video image coding and decoding lead to low coding and decoding efficiency.
Summary
In view of this, the embodiments of the present application are intended to provide a method for determining a prediction value, a decoder, and a computer storage medium, which can improve the decoding efficiency of a decoder.
The technical solutions of the embodiments of the present application may be implemented as follows:
In a first aspect, an embodiment of the present application provides a method for determining a prediction value. The method is applied in a decoder and includes:
obtaining a pixel matrix of a reference image block of an image block to be decoded; determining an input value according to the pixel matrix of the reference image block; and inputting the input value into a preset neural network to obtain a prediction value of the image block to be decoded.
In the above solution, the input value is the pixel matrix of the reference image block.
In the above solution, determining the input value according to the pixel matrix of the reference image block includes:
obtaining a pixel matrix of an adjacent reference image block of the reference image block; and determining the input value according to the pixel matrix of the reference image block and the pixel matrix of the adjacent reference image block.
In the above solution, the input value is a pixel matrix composed of the pixel matrix of the reference image block and the pixel matrix of the adjacent reference image block arranged according to their relative positions.
In the above solution, determining the input value according to the pixel matrix of the reference image block and the pixel matrix of the adjacent reference image block includes:
processing the pixel matrix of the adjacent reference image block according to a preset interpolation method to obtain an interpolated pixel matrix of the adjacent reference image block; and determining, as the input value of the neural network, a pixel matrix composed of the pixel matrix of the reference image block and the interpolated pixel matrix of the adjacent reference image block arranged according to their relative positions.
In a second aspect, an embodiment of the present application provides a decoder, and the decoder includes:
an obtaining module configured to obtain a pixel matrix of a reference image block of an image block to be decoded; a determining module configured to determine an input value according to the pixel matrix of the reference image block; and a processing module configured to input the input value into a preset neural network to obtain a prediction value of the image block to be decoded.
In the above solution, the input value is the pixel matrix of the reference image block.
In the above solution, the determining module includes:
an obtaining sub-module configured to obtain a pixel matrix of an adjacent reference image block of the reference image block; and
a determining sub-module configured to determine the input value according to the pixel matrix of the reference image block and the pixel matrix of the adjacent reference image block.
In the above solution, the input value is a pixel matrix composed of the pixel matrix of the reference image block and the pixel matrix of the adjacent reference image block arranged according to their relative positions.
In the above solution, the determining sub-module is specifically configured to:
process the pixel matrix of the adjacent reference image block according to a preset interpolation method to obtain an interpolated pixel matrix of the adjacent reference image block; and
determine, as the input value of the neural network, a pixel matrix composed of the pixel matrix of the reference image block and the interpolated pixel matrix of the adjacent reference image block arranged according to their relative positions.
In a third aspect, an embodiment of the present application provides a decoder, and the decoder includes:
a processor and a storage medium storing instructions executable by the processor, where the storage medium depends on the processor to perform operations through a communication bus, and when the instructions are executed by the processor, the method for determining a prediction value described in the first aspect is performed.
In a fourth aspect, an embodiment of the present application provides a computer storage medium storing executable instructions which, when executed by one or more processors, cause the processor(s) to perform the method for determining a prediction value described in the first aspect.
The embodiments of the present application provide a method for determining a prediction value, a decoder, and a computer storage medium. The method is applied in a decoder and includes: obtaining a pixel matrix of a reference image block of an image block to be decoded; determining an input value according to the pixel matrix of the reference image block; and inputting the input value into a preset neural network to obtain a prediction value of the image block to be decoded. That is to say, in the embodiments of the present application, the pixel matrix of the reference image block of the image block to be decoded is first obtained, and the pixel matrix of the reference image block is then processed by the preset neural network. In this way, the neural network is used to obtain the prediction value of the image block to be coded, so that the prediction value is closer to the pixel matrix of the image block to be coded, which reduces the bit stream of the prediction residual and thus improves the efficiency of video image coding and decoding.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of an optional method for determining a prediction value according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the arrangement of image blocks to be decoded;
FIG. 3 is a schematic flowchart of another optional method for determining a prediction value according to an embodiment of the present application;
FIG. 4 is a schematic diagram of the arrangement of an optional image block to be encoded and reference image blocks according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an optional neural network according to an embodiment of the present application;
FIG. 6 is a first schematic structural diagram of a decoder according to an embodiment of the present application;
FIG. 7 is a second schematic structural diagram of a decoder according to an embodiment of the present application.
Detailed Description
To understand the features and technical content of the embodiments of the present application in more detail, the implementation of the embodiments of the present application is described below with reference to the accompanying drawings. The drawings are for reference and illustration only and are not intended to limit the embodiments of the present application.
An embodiment of the present application provides a method for determining a prediction value, which is applied in a decoder. FIG. 1 is a schematic flowchart of an optional method for determining a prediction value according to an embodiment of the present application. Referring to FIG. 1, the method for determining a prediction value may include:
S101: Obtain a pixel matrix of a reference image block of an image block to be decoded.
In video image coding and decoding, the encoder can use techniques such as ME, MC, and vector prediction to select the best temporal reference image block from a reconstructed reference image, determine the prediction residual of the image block to be encoded using the reference image block and the image block to be encoded, and transmit the prediction residual to the decoder; the decoder then decodes the real image block using the selected reference image block and the prediction residual.
It can be seen that if the prediction residual is large, the encoder needs to spend more bits to encode it, which affects coding and decoding efficiency.
Because existing prediction residuals are of a high order of magnitude, coding and decoding efficiency is low. To improve the efficiency of video image coding and decoding, first, for the image block to be decoded, the decoder can use techniques such as ME, MC, and vector prediction to select a reference image block from the reconstructed reference image and obtain the pixel matrix of the reference image block, and then use the pixel matrix of the reference image block to determine the prediction value of the image block to be decoded, so that the real image block is decoded from the prediction value and the prediction residual.
FIG. 2 is a schematic diagram of the arrangement of image blocks to be decoded. As shown in FIG. 2, the diagonally striped area represents image blocks that have already been decoded. During decoding, the decoder processes the image blocks in order (each row from left to right). In FIG. 2, after the lower-left image block has been decoded, the next image block after the lower-left image block is the image block to be decoded (the blank square in FIG. 2).
It should be noted here that the pixel matrix of the reference image block may be a pixel matrix of chrominance values of the reference image block, or a pixel matrix of luminance values of the reference image block, which is not specifically limited in the embodiments of the present application.
S102: Determine an input value according to the pixel matrix of the reference image block.
To determine the prediction value of the image block to be decoded, the decoder uses a preset neural network; therefore, in order to obtain the prediction value of the image block to be decoded, the input value of the neural network needs to be determined first.
The input value of the neural network may be determined in one or more of the following ways:
To determine the input value of the neural network, in an optional embodiment, the input value may be the pixel matrix of the reference image block.
Specifically, the decoder directly uses the pixel matrix of the reference image block as the input value of the neural network. For example, the pixel matrix of the reference image block is an N×N matrix; this matrix is input into the neural network, and processing the matrix with the neural network yields the prediction value of the image block to be coded.
To determine the input value of the neural network, in an optional embodiment, FIG. 3 is a schematic flowchart of another optional method for determining a prediction value according to an embodiment of the present application. Referring to FIG. 3, S102 may include:
S301: Obtain a pixel matrix of an adjacent reference image block of the reference image block.
S302: Determine the input value according to the pixel matrix of the reference image block and the pixel matrix of the adjacent reference image block.
Specifically, the decoder not only obtains the pixel matrix of the reference image block, but also uses techniques such as ME, MC, and vector prediction to obtain, from the reference image, the pixel matrices of the adjacent reference image blocks of the reference image block. In practice, when the reference image block is a non-boundary reference image block, all reference image blocks adjacent to it are obtained, for example the reference image blocks above, below, to the left, to the right, at the upper left, upper right, lower left, and lower right of the reference image block. In this way, the pixel matrices of the adjacent reference image blocks of the reference image block can be obtained.
Here, after the decoder determines the reference image block, it records the pixel distance between the reference image block and the image block to be decoded, i.e., the motion vector (MV) information. Using whole-pixel motion search, the decoder can obtain the pixel matrices of the 8 adjacent reference image blocks according to the MV. In this way, after the pixel matrix of the reference image block and the pixel matrices of the adjacent reference image blocks are obtained, the input value of the neural network can be determined from them.
To determine the input value of the neural network, in an optional embodiment, in S302, the input value may be a pixel matrix composed of the pixel matrix of the reference image block and the pixel matrices of the adjacent reference image blocks arranged according to their relative positions.
For example, if the pixel matrix of the reference image block is an N×N pixel matrix and the pixel matrix of each adjacent reference image block is an N×N pixel matrix, the pixel matrix composed according to the relative position relationship is a 3N×3N pixel matrix, and this 3N×3N pixel matrix is used as the input value of the neural network.
To determine the input value of the neural network, in an optional embodiment, S302 may include:
processing the pixel matrices of the adjacent reference image blocks according to a preset interpolation method to obtain interpolated pixel matrices of the adjacent reference image blocks; and
determining, as the input value of the neural network, a pixel matrix composed of the pixel matrix of the reference image block and the interpolated pixel matrices of the adjacent reference image blocks arranged according to their relative positions.
The preset interpolation method may include linear interpolation, bilinear interpolation, and cubic interpolation, which is not specifically limited in the embodiments of the present application.
Specifically, if the pixel matrices of the 8 adjacent reference image blocks were used directly as the input value of the neural network, the pixel information would not be smooth and the generated image would show visible boundary artifacts. To remove these boundary artifacts, the decoder uses sub-pixel-precision motion search; that is, the decoder processes the pixel matrices of the 8 adjacent reference image blocks with the preset interpolation method to obtain the interpolated pixel matrices of the adjacent reference image blocks, and finally determines, as the input value of the neural network, the pixel matrix composed of the pixel matrix of the reference image block and the interpolated pixel matrices of the adjacent reference image blocks arranged according to their relative positions.
FIG. 4 is a schematic diagram of the arrangement of an optional image block to be encoded and reference image blocks according to an embodiment of the present application. As shown in FIG. 4, the smallest square represents one pixel, and each square formed by 16 pixels represents an image block to be decoded. During decoding, techniques such as ME, MC, and vector prediction are first used to obtain the reference image block of the image block to be decoded from the reference image, and the pixel distance between the reference image block and the image block to be decoded, i.e., the MV information, is recorded. If whole-pixel motion search is used, the pixel information of the reference image block and of its 8 neighbouring blocks (above, below, left, right, upper left, upper right, lower left, and lower right) can be obtained directly from the reference image according to the MV.
When the decoder selects the reference image block using sub-pixel-precision motion search, directly taking the 8 adjacent reference image blocks around the reference image block as the input of the neural network would make the pixel information non-smooth and produce visible boundary artifacts in the generated image. Therefore, for a prediction pixel block with sub-pixel precision, the 8 adjacent reference image blocks around it need to be interpolated at the corresponding sub-pixel positions and re-assembled at their relative positions as the input of the neural network. In practice, when the reference image block is a boundary image block, the embodiments of the present application do not perform interpolation on it.
In FIG. 4, the start of the arrow is the image block to be decoded, and the end of the arrow is the reference image block of the image block to be decoded. In the reference image, all reference image blocks adjacent to the reference image block are the adjacent reference image blocks of the reference image block.
S103: Input the input value into the preset neural network to obtain the prediction value of the image block to be decoded.
After the input value of the neural network is determined in S102, the input value is input into the neural network. The input value may be the pixel matrix of the reference image block; or a pixel matrix composed of the pixel matrix of the reference image block and the pixel matrices of the adjacent reference image blocks arranged according to their relative positions; or a pixel matrix composed of the pixel matrix of the reference image block and the interpolated pixel matrices of the adjacent reference image blocks arranged according to their relative positions. Depending on the input value, the processing of the input value by the neural network can be implemented in two ways. In an optional embodiment, S103 may include:
inputting the input value into the neural network, and sequentially performing a normalization operation, a convolution operation, feature extraction, a denormalization operation, and an addition operation on the input value to obtain the prediction value of the image block to be decoded.
Specifically, there are two cases for determining the prediction value of the image block to be decoded. When the pixel matrix of the image block to be decoded is an N×N matrix, in the first case the input value is the pixel matrix of the reference image block, i.e., a matrix with the same dimensions as the image block to be decoded, for example an N×N matrix.
In the other case, the input value is a pixel matrix composed of the pixel matrix of the reference image block and the pixel matrices of the adjacent reference image blocks arranged according to their relative positions, or a pixel matrix composed of the pixel matrix of the reference image block and the interpolated pixel matrices of the adjacent reference image blocks arranged according to their relative positions, for example a 3N×3N matrix. For matrices of these different dimensions, the prediction value of the image block to be decoded can be determined as follows:
When the input value is the pixel matrix of the reference image block, in order to process the input value with the neural network, in an optional embodiment, S103 may include:
normalizing the input value to obtain a normalized input matrix;
performing a convolution operation on the normalized input matrix according to a preset convolution kernel to obtain a matrix after the convolution operation;
performing feature extraction on the matrix after the convolution operation to obtain a residual matrix;
adding the residual matrix to the normalized input matrix to obtain an added matrix; and
denormalizing the added matrix to obtain the prediction value of the image block to be decoded.
Here, taking the pixel matrix of the image block to be decoded as an N×N matrix as an example, the obtained pixel matrix of the reference image block is also an N×N matrix. The pixel matrix of the reference image block (the N×N matrix) is first normalized to obtain the normalized input matrix; then the normalized input matrix is convolved with the preset convolution kernel; next, feature extraction is performed on the convolved matrix using a residual (Res) layer to obtain the residual matrix; the residual matrix is then added to the normalized input matrix; and finally the added matrix is denormalized to obtain the prediction value of the image block to be decoded. In this way, the prediction value of the image block to be decoded is an N×N matrix.
When the input value is a pixel matrix composed of the pixel matrix of the reference image block and the pixel matrices of the adjacent reference image blocks arranged according to their relative positions, or a pixel matrix composed of the pixel matrix of the reference image block and the interpolated pixel matrices of the adjacent reference image blocks arranged according to their relative positions, in order to process the input value with the neural network, in an optional embodiment, S103 may include:
normalizing the input value to obtain a normalized input matrix;
normalizing the pixel matrix of the reference image block to obtain a normalized pixel matrix of the reference image block;
performing a convolution operation on the normalized input matrix according to a preset convolution kernel to obtain a matrix after the convolution operation;
scaling the matrix after the convolution operation according to a preset scaling convolutional layer to obtain a pixel matrix of a preset dimension, where the preset dimension is the same as the dimension of the pixel matrix of the reference image block;
performing feature extraction on the matrix of the preset dimension to obtain a residual matrix;
adding the residual matrix to the normalized pixel matrix of the reference image block to obtain an added matrix; and
denormalizing the added matrix to obtain the prediction value of the image block to be decoded.
Still taking the pixel matrix of the image block to be decoded as an N×N matrix as an example, the obtained pixel matrix of the reference image block is also an N×N matrix, and the pixel matrices of the adjacent reference image blocks, or the interpolated pixel matrices of the adjacent reference image blocks, are all N×N matrices; the input value is therefore a 3N×3N matrix, and this 3N×3N matrix is input into the neural network.
The neural network is a residual network (ResNet) based on a convolutional neural network. FIG. 5 is a schematic structural diagram of an optional neural network according to an embodiment of the present application. As shown in FIG. 5, the neural network may include 4 convolutional layers with kernels of different sizes and depths, a scaling convolutional layer, and a Res layer; Table 1 below gives the network configuration details of the neural network in FIG. 5:
Table 1. Network configuration details
Layer 01: convolution 32×5×5, stride=1, Leaky ReLU (alpha=0.5)
Layer 02: convolution 64×5×5, stride=1, Leaky ReLU (alpha=0.5)
Layer 03: convolution 64×3×3, stride=1, Leaky ReLU (alpha=0.5)
Layer 04: convolution 64×3×3, stride=1, Leaky ReLU (alpha=0.5)
Layer 05 (scaling convolutional layer): convolution 16×5×5, stride=3, ReLU
Res layer: convolution 1×3×3, stride=1, tanh
First, the 3N×3N matrix composed of the pixel matrix of the reference image block and the pixel matrices of the adjacent reference image blocks, or of the interpolated pixel matrices of the adjacent reference image blocks, is input into the neural network as the input value (Input). The 3N×3N matrix is normalized to obtain the normalized input matrix, and the pixel matrix of the reference image block is normalized to obtain the normalized pixel matrix of the reference image block. Then, the 3N×3N matrix is convolved by the convolution kernels of layers 01, 02, 03, and 04, where the layer-01 convolution kernel may be: convolution (32×5×5, stride=1) + Leaky ReLU (alpha=0.5); the layer-02 convolution kernel may be: convolution (64×5×5, stride=1) + Leaky ReLU (alpha=0.5); the layer-03 convolution kernel may be: convolution (64×3×3, stride=1) + Leaky ReLU (alpha=0.5); and the layer-04 convolution kernel may be: convolution (64×3×3, stride=1) + Leaky ReLU (alpha=0.5). Here, stride denotes the convolution stride, Leaky ReLU is an activation function, and alpha is a parameter of the activation function.
After the convolved matrix is obtained, the decoder uses the layer-05 convolution kernel (corresponding to the preset scaling convolutional layer) to perform a dimension-reducing convolution on the convolved matrix, i.e., scaling, to obtain an N×N matrix (corresponding to the matrix of the preset dimension). The layer-05 convolution kernel may be: convolution (16×5×5, stride=3) + ReLU. The decoder then performs feature extraction with the convolution kernel of the Res layer of the neural network to obtain the residual matrix; the convolution kernel of the Res layer may be: convolution (1×3×3, stride=1) + tanh, where tanh is a hyperbolic function.
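As a hedged illustration only, the layer specification above can be written out as the following PyTorch sketch; the padding values (chosen so that a 1×3N×3N input yields an N×N output), the /255 normalization, and the class and variable names are assumptions of this sketch and are not stated in the application.

    import torch
    import torch.nn as nn

    class PredictionNet(nn.Module):
        """Sketch of the network of FIG. 5 / Table 1."""

        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 32, 5, stride=1, padding=2), nn.LeakyReLU(0.5),   # layer 01
                nn.Conv2d(32, 64, 5, stride=1, padding=2), nn.LeakyReLU(0.5),  # layer 02
                nn.Conv2d(64, 64, 3, stride=1, padding=1), nn.LeakyReLU(0.5),  # layer 03
                nn.Conv2d(64, 64, 3, stride=1, padding=1), nn.LeakyReLU(0.5),  # layer 04
            )
            # layer 05: scaling convolutional layer, maps 3N x 3N down to N x N
            self.scale = nn.Sequential(nn.Conv2d(64, 16, 5, stride=3, padding=1), nn.ReLU())
            # Res layer: 1 x 3 x 3 convolution + tanh, produces the residual matrix
            self.res = nn.Sequential(nn.Conv2d(16, 1, 3, stride=1, padding=1), nn.Tanh())

        def forward(self, big_input, ref_block):
            """big_input: (B, 1, 3N, 3N) assembled matrix; ref_block: (B, 1, N, N)."""
            x = big_input / 255.0                 # normalized input matrix (assumed /255)
            r = ref_block / 255.0                 # normalized reference-block matrix
            residual = self.res(self.scale(self.features(x)))
            return (residual + r) * 255.0         # add, then denormalize -> prediction

    # Example with N = 8:
    # net = PredictionNet()
    # pred = net(torch.rand(1, 1, 24, 24) * 255, torch.rand(1, 1, 8, 8) * 255)
    # pred.shape  ->  torch.Size([1, 1, 8, 8])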
After the residual matrix is obtained, the residual matrix is added to the normalized pixel matrix of the reference image block to obtain the added matrix, and the added matrix is then denormalized to obtain the prediction value of the image block to be decoded, which serves as the output value (Output) of the neural network.
It should be noted that, since the decoder partitions a picture into pixel blocks of different sizes such as 8×8 and 16×16, the pixel values of image blocks of different sizes differ noticeably, and the pixels of luminance image blocks and those of chrominance image blocks also differ considerably in texture characteristics. Therefore, for luminance and chrominance image blocks of different sizes, different network parameters can be trained to ensure better prediction values.
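Purely as an illustration of this per-size, per-component training, one could keep a separately trained parameter set per case, for example as below; the dictionary keys, the weight-loading lines, and the reuse of the PredictionNet sketch above are assumptions, not part of the described embodiment.

    import torch

    # Reuses the PredictionNet class from the sketch after Table 1 above.
    nets = {
        (8, "luma"): PredictionNet(),
        (8, "chroma"): PredictionNet(),
        (16, "luma"): PredictionNet(),
        (16, "chroma"): PredictionNet(),
    }
    # Hypothetical per-case weights could then be loaded and selected as:
    # nets[(block_size, component)].load_state_dict(torch.load(weights_path))
    # prediction = nets[(16, "luma")](big_input, ref_block)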
In practical applications, once the neural network is ported into the encoder and the decoder, the prediction value can be computed after the encoder and the decoder select the best matching block. After the prediction value is obtained, for the encoder, the prediction residual passed from the inter-frame prediction module to the subsequent modules needs to be replaced by the difference between the prediction value and the image block to be encoded.
That is, compared with using the difference between the reference image block and the image block to be decoded as the prediction residual, the encoder uses the difference between the determined prediction value and the image block to be decoded as the prediction residual, so the prediction residual can be transmitted with a smaller bit stream, which improves coding and decoding efficiency.
Here, similarly to the decoder side, when the encoder encodes the current block, it needs to feed the reconstructed reference image block and its adjacent reference image blocks into the corresponding neural network to compute the prediction value, and then subtract the prediction residual from the prediction value to obtain the pixels of the currently coded image block, thereby completing the encoding of the currently coded image block and ensuring consistency between encoding and decoding.
Through this example, the inter-frame prediction technique is improved: the reference image block selected by the decoder and its adjacent reference image blocks are used to generate the prediction value of the image block to be decoded, which increases the similarity between the reference image block and the image block to be encoded, reduces the prediction residual, and thus improves coding and decoding performance.
An embodiment of the present application provides a method for determining a prediction value. The method includes: obtaining a pixel matrix of a reference image block of an image block to be decoded; determining an input value according to the pixel matrix of the reference image block; and inputting the input value into a preset neural network to obtain a prediction value of the image block to be decoded. That is to say, in the embodiment of the present application, the pixel matrix of the reference image block of the image block to be decoded is first obtained, and the pixel matrix of the reference image block is then processed by the preset neural network. In this way, the neural network is used to obtain the prediction value of the image block to be coded, so that the prediction value is closer to the pixel matrix of the image block to be coded, which reduces the bit stream of the prediction residual and thus improves the efficiency of video image coding and decoding.
Based on the same inventive concept as the foregoing embodiments, refer to FIG. 6, which is a first schematic structural diagram of a decoder according to an embodiment of the present application. The decoder may include:
an obtaining module 61 configured to obtain a pixel matrix of a reference image block of an image block to be decoded;
a determining module 62 configured to determine an input value according to the pixel matrix of the reference image block; and
a processing module 63 configured to input the input value into a preset neural network to obtain a prediction value of the image block to be decoded.
In the above solution, the input value may be the pixel matrix of the reference image block.
In the above solution, the determining module 62 includes:
an obtaining sub-module configured to obtain a pixel matrix of an adjacent reference image block of the reference image block; and
a determining sub-module configured to determine the input value according to the pixel matrix of the reference image block and the pixel matrix of the adjacent reference image block.
In the above solution, the input value may be a pixel matrix composed of the pixel matrix of the reference image block and the pixel matrix of the adjacent reference image block arranged according to their relative positions.
In the above solution, the determining sub-module is specifically configured to:
process the pixel matrix of the adjacent reference image block according to a preset interpolation method to obtain an interpolated pixel matrix of the adjacent reference image block; and
determine, as the input value of the neural network, a pixel matrix composed of the pixel matrix of the reference image block and the interpolated pixel matrix of the adjacent reference image block arranged according to their relative positions.
It can be understood that, in this embodiment, a "unit" may be part of a circuit, part of a processor, part of a program or software, and so on; it may of course also be a module, and it may also be non-modular.
In addition, the constituent units in this embodiment may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional module.
If the integrated unit is implemented in the form of a software functional module and is not sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this embodiment essentially, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the method described in this embodiment. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media capable of storing program code.
FIG. 7 is a second schematic structural diagram of a decoder according to an embodiment of the present application. As shown in FIG. 7, an embodiment of the present application provides a decoder 700,
which includes a processor 71 and a storage medium 72 storing instructions executable by the processor 71. The storage medium 72 depends on the processor 71 to perform operations through a communication bus 73, and when the instructions are executed by the processor 71, the method for determining a prediction value of the first embodiment is performed.
It should be noted that, in practical applications, the components in the terminal are coupled together through the communication bus 73. It can be understood that the communication bus 73 is used to implement connection and communication between these components. In addition to a data bus, the communication bus 73 also includes a power bus, a control bus, and a status signal bus. However, for clarity of description, the various buses are all labeled as the communication bus 73 in FIG. 7.
An embodiment of the present application provides a computer storage medium storing executable instructions which, when executed by one or more processors, cause the processor(s) to perform the method for determining a prediction value described in one or more of the above embodiments.
It can be understood that the memory in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The memories of the systems and methods described herein are intended to include, but are not limited to, these and any other suitable types of memory.
The processor may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The above processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
It can be understood that the embodiments described herein may be implemented by hardware, software, firmware, middleware, microcode, or a combination thereof. For hardware implementation, the processing unit may be implemented in one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described in this application, or a combination thereof.
For software implementation, the techniques described herein may be implemented by modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software code may be stored in a memory and executed by a processor. The memory may be implemented in the processor or outside the processor.
It should be noted that, herein, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or apparatus that includes the element.
The serial numbers of the above embodiments of the present application are for description only and do not represent the superiority or inferiority of the embodiments.
Through the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application essentially, or the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above specific implementations. The above specific implementations are merely illustrative rather than restrictive. Inspired by the present application, those of ordinary skill in the art can devise many other forms without departing from the purpose of the present application and the scope protected by the claims, all of which fall within the protection of the present application.
Industrial Applicability
In the embodiments of the present application, the pixel matrix of the reference image block of the image block to be decoded is first obtained, the input value is determined according to the pixel matrix of the reference image block, and the input value is input into the preset neural network to obtain the prediction value of the image block to be decoded. That is to say, the pixel matrix of the reference image block of the image block to be decoded is first obtained, and the pixel matrix of the reference image block is then processed by the preset neural network. In this way, the neural network is used to obtain the prediction value of the image block to be coded, so that the prediction value is closer to the pixel matrix of the image block to be coded, which reduces the bit stream of the prediction residual and thus improves the efficiency of video image coding and decoding.

Claims (12)

  • 1. A method for determining a prediction value, wherein the method is applied in a decoder and the method comprises:
    obtaining a pixel matrix of a reference image block of an image block to be decoded;
    determining an input value according to the pixel matrix of the reference image block; and
    inputting the input value into a preset neural network to obtain a prediction value of the image block to be decoded.
  • 2. The method according to claim 1, wherein the input value is the pixel matrix of the reference image block.
  • 3. The method according to claim 1, wherein determining the input value according to the pixel matrix of the reference image block comprises:
    obtaining a pixel matrix of an adjacent reference image block of the reference image block; and
    determining the input value according to the pixel matrix of the reference image block and the pixel matrix of the adjacent reference image block.
  • 4. The method according to claim 3, wherein the input value is a pixel matrix composed of the pixel matrix of the reference image block and the pixel matrix of the adjacent reference image block arranged according to their relative positions.
  • 5. The method according to claim 3, wherein determining the input value according to the pixel matrix of the reference image block and the pixel matrix of the adjacent reference image block comprises:
    processing the pixel matrix of the adjacent reference image block according to a preset interpolation method to obtain an interpolated pixel matrix of the adjacent reference image block; and
    determining, as the input value of the neural network, a pixel matrix composed of the pixel matrix of the reference image block and the interpolated pixel matrix of the adjacent reference image block arranged according to their relative positions.
  • 6. A decoder, wherein the decoder comprises:
    an obtaining module configured to obtain a pixel matrix of a reference image block of an image block to be decoded;
    a determining module configured to determine an input value according to the pixel matrix of the reference image block; and
    a processing module configured to input the input value into a preset neural network to obtain a prediction value of the image block to be decoded.
  • 7. The decoder according to claim 6, wherein the input value is the pixel matrix of the reference image block.
  • 8. The decoder according to claim 6, wherein the determining module comprises:
    an obtaining sub-module configured to obtain a pixel matrix of an adjacent reference image block of the reference image block; and
    a determining sub-module configured to determine the input value according to the pixel matrix of the reference image block and the pixel matrix of the adjacent reference image block.
  • 9. The decoder according to claim 8, wherein the input value is a pixel matrix composed of the pixel matrix of the reference image block and the pixel matrix of the adjacent reference image block arranged according to their relative positions.
  • 10. The decoder according to claim 8, wherein the determining sub-module is specifically configured to:
    process the pixel matrix of the adjacent reference image block according to a preset interpolation method to obtain an interpolated pixel matrix of the adjacent reference image block; and
    determine, as the input value of the neural network, a pixel matrix composed of the pixel matrix of the reference image block and the interpolated pixel matrix of the adjacent reference image block arranged according to their relative positions.
  • 11. A decoder, wherein the decoder comprises:
    a processor and a storage medium storing instructions executable by the processor, wherein the storage medium depends on the processor to perform operations through a communication bus, and when the instructions are executed by the processor, the method for determining a prediction value according to any one of claims 1 to 5 is performed.
  • 12. A computer storage medium, wherein the computer storage medium stores executable instructions which, when executed by one or more processors, cause the processor(s) to perform the method for determining a prediction value according to any one of claims 1 to 5.
PCT/CN2019/078160 2019-03-14 2019-03-14 Method for determining a prediction value, decoder, and computer storage medium WO2020181554A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2019/078160 WO2020181554A1 (zh) 2019-03-14 2019-03-14 Method for determining a prediction value, decoder, and computer storage medium
CN201980093336.8A CN113490953A (zh) 2019-03-14 2019-03-14 Method for determining a prediction value, decoder, and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/078160 WO2020181554A1 (zh) 2019-03-14 2019-03-14 Method for determining a prediction value, decoder, and computer storage medium

Publications (1)

Publication Number Publication Date
WO2020181554A1 true WO2020181554A1 (zh) 2020-09-17

Family

ID=72427772

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/078160 WO2020181554A1 (zh) 2019-03-14 2019-03-14 Method for determining a prediction value, decoder, and computer storage medium

Country Status (2)

Country Link
CN (1) CN113490953A (zh)
WO (1) WO2020181554A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114363632A (zh) * 2021-12-10 2022-04-15 浙江大华技术股份有限公司 Intra prediction method, encoding/decoding method, codec, system, electronic device, and storage medium
WO2024055525A1 (zh) * 2022-09-16 2024-03-21 苏州元脑智能科技有限公司 Method, apparatus, and device for storing video image data, and readable medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107925762A (zh) * 2015-09-03 2018-04-17 联发科技股份有限公司 Neural-network-based video coding and decoding processing method and apparatus
CN108833925A (zh) * 2018-07-19 2018-11-16 哈尔滨工业大学 Deep-neural-network-based inter-frame prediction method in a hybrid video coding system
WO2019031410A1 (ja) * 2017-08-10 2019-02-14 シャープ株式会社 Image filter device, image decoding device, and image encoding device
US10223614B1 (en) * 2018-09-04 2019-03-05 StradVision, Inc. Learning method, learning device for detecting lane through classification of lane candidate pixels and testing method, testing device using the same

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107925762A (zh) * 2015-09-03 2018-04-17 联发科技股份有限公司 Neural-network-based video coding and decoding processing method and apparatus
WO2019031410A1 (ja) * 2017-08-10 2019-02-14 シャープ株式会社 Image filter device, image decoding device, and image encoding device
CN108833925A (zh) * 2018-07-19 2018-11-16 哈尔滨工业大学 Deep-neural-network-based inter-frame prediction method in a hybrid video coding system
US10223614B1 (en) * 2018-09-04 2019-03-05 StradVision, Inc. Learning method, learning device for detecting lane through classification of lane candidate pixels and testing method, testing device using the same

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114363632A (zh) * 2021-12-10 2022-04-15 浙江大华技术股份有限公司 Intra prediction method, encoding/decoding method, codec, system, electronic device, and storage medium
WO2024055525A1 (zh) * 2022-09-16 2024-03-21 苏州元脑智能科技有限公司 Method, apparatus, and device for storing video image data, and readable medium

Also Published As

Publication number Publication date
CN113490953A (zh) 2021-10-08

Similar Documents

Publication Publication Date Title
US20160373767A1 (en) Encoding and Decoding Methods and Apparatuses
CN110166771B (zh) 视频编码方法、装置、计算机设备和存储介质
WO2019157717A1 (zh) 运动补偿的方法、装置和计算机系统
WO2021203394A1 (zh) 环路滤波的方法与装置
WO2021134706A1 (zh) 环路滤波的方法与装置
KR20210042355A (ko) 비디오 이미지 성분의 예측 방법, 장치 및 컴퓨터 저장 매체
CN115606179A (zh) 用于使用学习的下采样特征进行图像和视频编码的基于学习的下采样的cnn滤波器
WO2020192034A1 (zh) 滤波方法及装置、计算机存储介质
WO2020181554A1 (zh) 预测值的确定方法、解码器以及计算机存储介质
WO2021056433A1 (zh) 预测值的确定方法、解码器以及计算机存储介质
WO2020192085A1 (zh) 图像预测方法、编码器、解码器以及存储介质
WO2018040869A1 (zh) 一种帧间预测编码方法及装置
CN110913219A (zh) 一种视频帧预测方法、装置及终端设备
WO2022266955A1 (zh) 图像解码及处理方法、装置及设备
JP2022553594A (ja) インター予測方法および装置、機器、記憶媒体
WO2020181474A1 (zh) 预测值的确定方法、编码器以及计算机存储介质
WO2020192084A1 (zh) 图像预测方法、编码器、解码器以及存储介质
CN110830806A (zh) 一种视频帧预测方法、装置及终端设备
US11659187B2 (en) Method for determining prediction direction, decoder, and computer storage medium
CN112313950A (zh) 视频图像分量的预测方法、装置及计算机存储介质
US20220046231A1 (en) Video encoding/decoding method and device
WO2024113311A1 (zh) 编解码方法、编解码器、码流以及存储介质
WO2022246809A1 (zh) 编解码方法、码流、编码器、解码器以及存储介质
WO2020192180A1 (zh) 图像分量的预测方法、编码器、解码器及计算机存储介质
WO2023197192A1 (zh) 编解码方法、装置、编码设备、解码设备以及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19919501

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19919501

Country of ref document: EP

Kind code of ref document: A1