CN110708559A - Image processing method, device and storage medium - Google Patents

Publication number
CN110708559A
Authority
CN
China
Prior art keywords
image block
training image
training
pixel points
target image
Legal status
Granted
Application number
CN201910829305.6A
Other languages
Chinese (zh)
Other versions
CN110708559B (en)
Inventor
张元尊
郑云飞
闻兴
于冰
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN201910829305.6A
Publication of CN110708559A
Application granted
Publication of CN110708559B
Legal status: Active
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/11 Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • H04N19/182 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
    • H04N19/186 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/59 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
    • H04N19/593 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present disclosure relates to an image processing method, apparatus, and storage medium. The method comprises: for each target image block to be predicted in a target image frame, obtaining reference pixels of the target image block; obtaining a prediction result of the target image block according to the reference pixels, a preset matrix, and a pre-trained offset corresponding to the target image block; and obtaining a predicted image frame corresponding to the target image frame according to the prediction result of each target image block. The method thereby improves the accuracy of the offset used in intra-frame prediction during image processing, and in turn the accuracy of the intra-frame prediction results.

Description

Image processing method, device and storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image processing method and apparatus, and a storage medium.
Background
In general, the luminance and chrominance values of two adjacent pixels in an image are relatively close; that is, color changes gradually rather than abruptly jumping to a completely different color. Video coding exploits this correlation for compression. Intra-frame prediction uses the spatial correlation of video, predicting the current pixels from adjacent, already-coded pixels in the same frame, so as to effectively remove spatial redundancy.
The intra prediction technique MIP (Matrix Weighted Intra Prediction) originated from neural-network-based intra prediction, in which a multi-layer neural network predicts the pixel values of the current image block from adjacent pixels. Because that approach is too complex, a trade-off led to an intra prediction technique based on a linear affine transformation. Specifically, for a W × H block in the current image frame, the input of MIP is the W pixels above the block and the H pixels to its left, and the original neural-network prediction can be simplified into a matrix operation for the current intra prediction mode plus an offset.
However, in the related art, MIP only provides a framework for intra-frame prediction and does not specify how the offset is obtained. When the offset is inaccurate, the accuracy of the intra-frame prediction result suffers, and ultimately so does the accuracy of the coding result based on MIP intra-frame prediction.
Disclosure of Invention
The present disclosure provides an image processing method, apparatus and storage medium, to at least solve the problem in the related art that an inaccurate offset degrades the accuracy of intra prediction results. The technical solutions of the disclosure are as follows:
According to a first aspect of the embodiments of the present disclosure, there is provided an image processing method, including:
for each target image block to be predicted in a target image frame, obtaining reference pixels of the target image block;
obtaining a prediction result of the target image block according to the reference pixels, a preset matrix, and a pre-trained offset corresponding to the target image block;
and obtaining a predicted image frame corresponding to the target image frame according to the prediction result of each target image block.
Optionally, the offset is obtained by training on: training image blocks whose coding residuals have been obtained through an existing intra-frame prediction mode, the W + n pixels above each training image block, the H + m pixels to its left, and the j pixels at its upper-left corner. W is the width of the training image block, H is its height, n, m and j are positive integers, and the training image block has the same size as the target image block.
Optionally, before the step of obtaining the prediction result of the target image block according to the reference pixels, the preset matrix, and the pre-trained offset corresponding to the target image block, the method further includes:
obtaining coding residuals of a plurality of training image blocks through an existing intra-frame prediction mode;
and obtaining the offset according to the coding residuals, the W + n pixels above each training image block, the H + m pixels to its left, and the j pixels at its upper-left corner.
Optionally, the step of obtaining the offset according to the coding residuals, the W + n pixels above the training image block, the H + m pixels to its left, and the j pixels at its upper-left corner includes:
training a preset machine learning model on the coding residual of the training image block, the W + n pixels above it, the H + m pixels to its left, the j pixels at its upper-left corner, and the intra-frame prediction mode of the training image block, until the 2-norm (L2 norm) between the output of the machine learning model and the coding residual satisfies a first preset threshold;
and taking the output of the machine learning model as the offset corresponding to that intra-frame prediction mode.
Optionally, the step of obtaining the offset according to the coding residuals, the W + n pixels above the training image block, the H + m pixels to its left, and the j pixels at its upper-left corner includes:
training a preset machine learning model on the coding residual of the training image block, the W + n pixels above it, the H + m pixels to its left, and the j pixels at its upper-left corner, until the 2-norm (L2 norm) between the output of the machine learning model and the coding residual satisfies a second preset threshold;
and taking the output of the machine learning model as the offset shared by all intra-frame prediction modes corresponding to the training image block.
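The two training variants above can be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed training procedure: the "preset machine learning model" is assumed to be a linear model fitted by least squares (`np.linalg.lstsq`), each sample's features are assumed to be its W + n top, H + m left, and j corner pixels concatenated, the mode input of the first variant is assumed to be handled by grouping samples per mode, and the names `train_offset_model`, `predict_offset`, and `train_offsets` are hypothetical.

```python
import numpy as np
from collections import defaultdict


def train_offset_model(features, residuals, threshold):
    """Fit residual ~= [features, 1] @ weights by least squares (assumed model).

    features:  (N, d) array, one row of concatenated reference pixels per block
    residuals: (N, p) array, one flattened coding residual per block
    Returns (weights, met); `met` says whether the 2-norm between the model
    output and the coding residuals satisfies the preset threshold.
    """
    x = np.hstack([features, np.ones((features.shape[0], 1))])  # bias column
    weights, *_ = np.linalg.lstsq(x, residuals, rcond=None)
    error = np.linalg.norm(x @ weights - residuals)
    return weights, bool(error <= threshold)


def predict_offset(weights, feature_row):
    """Model output for one block, used as that block's offset."""
    return np.append(feature_row, 1.0) @ weights


def train_offsets(samples, threshold, per_mode=True):
    """samples: iterable of (mode, feature_vector, residual_vector) tuples.

    per_mode=True  -> one model per intra prediction mode (first variant)
    per_mode=False -> one model shared by all modes (second variant), key None
    """
    groups = defaultdict(lambda: ([], []))
    for mode, feat, res in samples:
        key = mode if per_mode else None
        groups[key][0].append(feat)
        groups[key][1].append(res)
    return {
        key: train_offset_model(np.array(f, dtype=float), np.array(r, dtype=float), threshold)
        for key, (f, r) in groups.items()
    }
```

A closed-form fit keeps the sketch deterministic; an iterative model trained "until the threshold is met" would replace `lstsq` with a loop that checks the same 2-norm after each update.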
Optionally, n equals W, m equals H, and j is 1; the W + n pixels above the training image block lie in one pixel row, the H + m pixels to its left lie in one pixel column, and both the pixel row and the pixel column are adjacent to the training image block.
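For the optional configuration n = W, m = H, j = 1, the training reference pixels of a block can be gathered from a frame as sketched below. The function name `training_reference` is hypothetical, and the sketch assumes every reference pixel lies inside the frame; a real codec would substitute or pad unavailable samples.

```python
import numpy as np


def training_reference(frame, x0, y0, w, h, n=None, m=None):
    """Return (top, left, corner) training reference pixels for the w x h block
    whose top-left pixel is frame[y0, x0].

    Defaults follow the configuration n = W, m = H, j = 1:
    top    -> W + n pixels in the row directly above the block
    left   -> H + m pixels in the column directly left of the block
    corner -> the single pixel at the upper-left corner
    """
    n = w if n is None else n
    m = h if m is None else m
    rows, cols = frame.shape
    # Assumption: all reference pixels are available (no padding logic here).
    if x0 < 1 or y0 < 1 or x0 + w + n > cols or y0 + h + m > rows:
        raise ValueError("reference pixels fall outside the frame")
    top = frame[y0 - 1, x0:x0 + w + n]
    left = frame[y0:y0 + h + m, x0 - 1]
    corner = frame[y0 - 1, x0 - 1:x0]  # j = 1 pixel
    return top, left, corner
```

With n = W and m = H this yields 2W + 2H + 1 pixels per block, compared with the W + H pixels MIP itself uses.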
According to a second aspect of the embodiments of the present disclosure, there is provided an image processing apparatus, including:
a reference pixel acquisition module, configured to obtain, for each target image block to be predicted in a target image frame, reference pixels of the target image block;
an intra-frame prediction module, configured to obtain a prediction result of the target image block according to the reference pixels, a preset matrix, and a pre-trained offset corresponding to the target image block;
and a predicted image frame acquisition module, configured to obtain a predicted image frame corresponding to the target image frame according to the prediction result of each target image block.
Optionally, the offset is obtained by training on: training image blocks whose coding residuals have been obtained through an existing intra-frame prediction mode, the W + n pixels above each training image block, the H + m pixels to its left, and the j pixels at its upper-left corner. W is the width of the training image block, H is its height, n, m and j are positive integers, and the training image block has the same size as the target image block.
Optionally, the apparatus further comprises:
a coding residual acquisition module, configured to obtain coding residuals of a plurality of the training image blocks through an existing intra prediction mode;
and an offset acquisition module, configured to obtain the offset according to the coding residuals, the W + n pixels above the training image block, the H + m pixels to its left, and the j pixels at its upper-left corner.
Optionally, the offset acquisition module includes:
a first model training sub-module, configured to train a preset machine learning model on the coding residual of the training image block, the W + n pixels above it, the H + m pixels to its left, the j pixels at its upper-left corner, and the intra-frame prediction mode of the training image block, until the 2-norm between the output of the machine learning model and the coding residual satisfies a first preset threshold;
a first offset acquisition sub-module, configured to take the output of the machine learning model as the offset corresponding to the intra prediction mode.
Optionally, the offset acquisition module includes:
a second model training sub-module, configured to train a preset machine learning model on the coding residual of the training image block, the W + n pixels above it, the H + m pixels to its left, and the j pixels at its upper-left corner, until the 2-norm between the output of the machine learning model and the coding residual satisfies a second preset threshold;
and a second offset acquisition sub-module, configured to take the output of the machine learning model as the offset shared by all intra-frame prediction modes corresponding to the training image block.
Optionally, n equals W, m equals H, and j is 1; the W + n pixels above the training image block lie in one pixel row, the H + m pixels to its left lie in one pixel column, and both the pixel row and the pixel column are adjacent to the training image block.
According to a third aspect of the embodiments of the present disclosure, there is provided an image processing apparatus including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement any one of the image processing methods as described above.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a storage medium storing instructions that, when executed by a processor of an image processing apparatus, enable the image processing apparatus to perform any one of the image processing methods described above.
According to a fifth aspect of the embodiments of the present disclosure, there is provided a computer program product whose instructions, when executed by a processor of an image processing apparatus, enable the image processing apparatus to perform any one of the image processing methods described above.
The technical solutions provided by the embodiments of the present disclosure bring at least the following beneficial effects:
In the embodiments of the present disclosure, for each target image block to be predicted in a target image frame, reference pixels of the target image block are obtained; a prediction result of the target image block is obtained according to the reference pixels, a preset matrix, and a pre-trained offset corresponding to the target image block; and a predicted image frame corresponding to the target image frame is obtained according to the prediction result of each target image block. This improves the accuracy of the offset used in intra-frame prediction and hence the intra-frame prediction accuracy, reduces the amount of coded data, and improves coding performance.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a first flowchart illustrating an image processing method according to an exemplary embodiment.
Fig. 2 is a schematic diagram illustrating the MIP process according to an exemplary embodiment.
Fig. 3A is a first schematic diagram illustrating a downsampling process according to an exemplary embodiment.
Fig. 3B is a second schematic diagram illustrating a downsampling process according to an exemplary embodiment.
Fig. 4 is a schematic diagram illustrating the interpolation process of MIP according to an exemplary embodiment.
FIG. 5 is a second flowchart illustrating an image processing method according to an exemplary embodiment.
FIG. 6 is a schematic diagram illustrating the reference pixels of a test image block according to an exemplary embodiment.
Fig. 7 is a first block diagram illustrating an image processing apparatus according to an exemplary embodiment.
Fig. 8 is a second block diagram illustrating an image processing apparatus according to an exemplary embodiment.
FIG. 9 is a block diagram illustrating an apparatus in accordance with an example embodiment.
FIG. 10 is a block diagram illustrating an apparatus in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
As shown in Fig. 1, an image processing method according to an exemplary embodiment may include the following steps:
In step S11, for each target image block to be predicted in the target image frame, reference pixels of the target image block are obtained.
In step S12, a prediction result of the target image block is obtained according to the reference pixels, a preset matrix, and a pre-trained offset corresponding to the target image block.
In step S13, a predicted image frame corresponding to the target image frame is obtained according to the prediction result of each target image block.
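Steps S11 to S13 can be sketched as a loop over blocks. This is a minimal sketch under assumptions: `predict_frame` and the `predict_block` callable are hypothetical, neighbours are read from the input frame rather than from reconstructed pixels as a real encoder would do, and unavailable neighbours at the frame edge are padded with an assumed constant (128, mid-grey for 8-bit video).

```python
import numpy as np
from typing import Callable

# Hypothetical block predictor: (top, left) reference pixels -> (bh, bw) block.
BlockPredictor = Callable[[np.ndarray, np.ndarray], np.ndarray]


def predict_frame(frame: np.ndarray, bw: int, bh: int,
                  predict_block: BlockPredictor, pad_value: float = 128.0) -> np.ndarray:
    """Predict every bw x bh block of `frame` and assemble the predicted frame."""
    h, w = frame.shape
    if h % bh or w % bw:
        raise ValueError("frame size must be a multiple of the block size")
    # One padded row on top and one padded column on the left stand in for
    # unavailable neighbours at the frame edge.
    padded = np.pad(frame.astype(float), ((1, 0), (1, 0)), constant_values=pad_value)
    pred = np.empty((h, w))
    for y0 in range(0, h, bh):
        for x0 in range(0, w, bw):
            # S11: reference pixels; padded[r + 1, c + 1] == frame[r, c]
            top = padded[y0, x0 + 1:x0 + 1 + bw]
            left = padded[y0 + 1:y0 + 1 + bh, x0]
            # S12: prediction result of this target image block
            pred[y0:y0 + bh, x0:x0 + bw] = predict_block(top, left)
    # S13: the assembled predicted image frame
    return pred
```

Any block predictor with this signature, for example one built from the MIP steps described below, can be plugged in.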
As mentioned above, for a W × H image block to be predicted in an image frame, the input of MIP is the W pixels above the image block and the H pixels to its left. As shown in Fig. 2, the general MIP framework includes: averaging (downsampling) the W pixels above and the H pixels to the left of the image block to obtain sample points (4 or 8); multiplying the sample points by a matrix and adding an offset to obtain a partial set of predicted values (for example, a matrix of size 4×4, 4×8, 8×4, or 8×8); and performing bilinear interpolation on the partial predicted values to obtain the final luminance prediction. In other words, to predict a target image block of size W × H, the reference pixels are the W pixels above and the H pixels to the left of the target image block, and they can be obtained in the same way as in conventional intra-frame prediction. These (W + H) pixels then yield the final prediction of the target image block through three steps: averaging, affine transformation, and upsampling. Finally, a predicted image frame corresponding to the target image frame is obtained according to the prediction result of each target image block.
Therefore, in the embodiment of the present disclosure, the reference pixels of each target image block to be predicted in the target image frame may likewise be obtained first. Specifically, assuming that the target image block has size W × H, the W pixels above and the H pixels to the left of the target image block may be obtained as reference pixels. In addition, depending on the prediction requirements, the target image block may include the luminance component, chrominance components, and so on of each pixel; accordingly, in the subsequent prediction process, the luminance and/or chrominance components of the internal pixels of the target image block are predicted from the luminance and/or chrominance components of the reference pixels.
The main purpose of reference pixel pre-processing (averaging/downsampling) is to normalize the number of reference pixels. As shown in Figs. 3A and 3B, the reference pixels of a 4 × 4 target image block may be reduced to 4 pixels, and those of target image blocks of other sizes to 8 pixels. That is, the input long boundary reference pixels $\mathrm{bdry}^{top}$ and $\mathrm{bdry}^{left}$ are converted, according to the coding unit size, into short boundary reference pixels $\mathrm{bdry}^{top}_{red}$ and $\mathrm{bdry}^{left}_{red}$, which reduces the amount of computation and the storage space for model parameters during prediction. The averaged left and top short reference pixels are then concatenated into a vector $\mathrm{bdry}_{red}$.
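The averaging step can be sketched as follows. This is a minimal sketch under assumptions: the 4 and 8 reduced samples are assumed to be split evenly between the two sides (2 + 2 for 4 × 4, 4 + 4 otherwise), the concatenation order top-then-left is an assumption, plain floating-point means are used instead of integer rounding, and the names `reduce_boundary` and `downsample_reference` are hypothetical.

```python
import numpy as np


def reduce_boundary(bdry, out_size):
    """Average groups of consecutive samples so that len(bdry) -> out_size."""
    bdry = np.asarray(bdry, dtype=float)
    if len(bdry) % out_size:
        raise ValueError("boundary length must be a multiple of out_size")
    return bdry.reshape(out_size, len(bdry) // out_size).mean(axis=1)


def downsample_reference(bdry_top, bdry_left, w, h):
    """Build bdry_red from the W top and H left reference pixels."""
    # 4x4 blocks: 2 samples per side (4 in total); otherwise 4 per side (8 in total)
    per_side = 2 if (w == 4 and h == 4) else 4
    top_red = reduce_boundary(bdry_top[:w], per_side)
    left_red = reduce_boundary(bdry_left[:h], per_side)
    return np.concatenate([top_red, left_red])  # assumed order: top, then left
```

For a 4 × 4 block with top pixels 1, 3, 5, 7 and left pixels 2, 4, 6, 8, this produces 2, 6, 3, 7.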
MIP prediction pixels are generated by matrix-weighting the averaged reference pixels and then adding an offset, i.e., by a linear affine transformation. The result is a downsampled prediction signal $\mathrm{pred}_{red}$ of the target prediction, of size $W_{red} \times H_{red}$. The affine transformation that generates this prediction signal can be written as $\mathrm{pred}_{red} = A_k \cdot \mathrm{bdry}_{red} + b_k$, where $A_k$ and $b_k$ are, respectively, the pre-trained matrix and offset corresponding to intra prediction mode $k$.
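The affine step is a single matrix-vector product plus an offset. The sketch below, with the hypothetical name `affine_predict`, assumes $A_k$ has shape $(W_{red} H_{red}) \times \mathrm{len}(\mathrm{bdry}_{red})$, $b_k$ has $W_{red} H_{red}$ entries, and the result is laid out row by row; the actual trained matrices are not given in the text.

```python
import numpy as np


def affine_predict(bdry_red, A_k, b_k, w_red, h_red):
    """pred_red = A_k . bdry_red + b_k, reshaped row-major to (H_red, W_red)."""
    bdry_red = np.asarray(bdry_red, dtype=float)
    A_k = np.asarray(A_k, dtype=float)
    b_k = np.asarray(b_k, dtype=float)
    n_out = w_red * h_red
    if A_k.shape != (n_out, bdry_red.size) or b_k.shape != (n_out,):
        raise ValueError("A_k must be (W_red*H_red, len(bdry_red)) and b_k (W_red*H_red,)")
    return (A_k @ bdry_red + b_k).reshape(h_red, w_red)
```

The offset $b_k$ shifts every reduced prediction sample independently of the boundary, which is why an inaccurate $b_k$ biases the whole block.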
Since the downsampled prediction signal of the target image block is generated by the linear affine transformation, the remaining predicted values of the target image block can be obtained by linear interpolation of $\mathrm{pred}_{red}$. Depending on the size of the target image block, horizontal interpolation, vertical interpolation, or interpolation in both directions may be required. If interpolation is required in both directions and W < H for the target image block, horizontal interpolation is performed first and vertical interpolation second; otherwise, vertical interpolation is performed first and horizontal interpolation second. Fig. 4 is a schematic diagram illustrating linear interpolation based on the downsampled prediction signal of an 8 × 8 target image block.
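The interpolation order rule can be sketched with `np.interp`. This is a simplified sketch: each reduced sample is assumed to sit at the last position of its upsampling interval, positions before the first reduced sample are clamped to it (a full implementation may instead anchor the interpolation on the reference boundary samples), and the names `upsample_axis` and `upsample_block` are hypothetical.

```python
import numpy as np


def upsample_axis(pred, out_len, axis):
    """Linearly interpolate `pred` along `axis` up to `out_len` samples.

    Reduced sample i is placed at full-resolution position (i + 1) * f - 1;
    earlier positions are clamped to the first reduced sample (simplification).
    """
    r = pred.shape[axis]
    if r == out_len:
        return pred
    if out_len % r:
        raise ValueError("out_len must be a multiple of the reduced size")
    f = out_len // r
    xp = (np.arange(r) + 1) * f - 1
    x = np.arange(out_len)
    return np.apply_along_axis(lambda v: np.interp(x, xp, v), axis, pred)


def upsample_block(pred_red, w, h):
    """Upsample pred_red (shape H_red x W_red) to the full W x H block."""
    pred_red = np.asarray(pred_red, dtype=float)
    if w < h:
        # W < H: horizontal interpolation first, then vertical
        out = upsample_axis(pred_red, w, axis=1)
        return upsample_axis(out, h, axis=0)
    # otherwise: vertical interpolation first, then horizontal
    out = upsample_axis(pred_red, h, axis=0)
    return upsample_axis(out, w, axis=1)
```

When a dimension of $\mathrm{pred}_{red}$ already matches the block, `upsample_axis` returns it unchanged, which covers the one-direction cases.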
One $A_k$ and one $b_k$ together constitute one MIP mode. Depending on the image block size, MIP intra prediction is subdivided into the following intra prediction modes:
Group 0: 4×4 blocks, with 35 MIP intra prediction modes;
Group 1: 8×4, 4×8, and 8×8 blocks, with 19 MIP intra prediction modes;
Group 2: all other blocks, with 11 MIP intra prediction modes.
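The grouping above maps directly to a lookup. The function name `mip_size_group` is hypothetical; the group boundaries and mode counts are the ones listed in the text.

```python
from typing import Tuple


def mip_size_group(w: int, h: int) -> Tuple[int, int]:
    """Return (group index, number of MIP intra prediction modes) for a W x H block."""
    if (w, h) == (4, 4):
        return 0, 35
    if (w, h) in {(8, 4), (4, 8), (8, 8)}:
        return 1, 19
    return 2, 11
```

Because each mode is one ($A_k$, $b_k$) pair, the group also tells how many offsets must be available for a given block size.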
However, in practical applications, the $A_k$ and $b_k$ corresponding to the different MIP intra prediction modes are not precisely defined, so an inaccurately set offset makes the prediction result inaccurate, which in turn degrades the encoding and decoding of the image frame. The offset may be fixed per block size for each current intra prediction mode, or it may be computed adaptively from the current prediction mode and the block size.
MIP intra prediction as described above uses only the W pixels above and the H pixels to the left of the image block. However, nearby pixels are correlated, so intra prediction can benefit from using a few more of them. Therefore, in the embodiment of the present disclosure, to improve the accuracy of the offset used in MIP intra prediction, the offset may be obtained by pre-training, and the prediction result of the target image block is then obtained according to the reference pixels, the preset matrix, and the pre-trained offset corresponding to the target image block. The offset training method may be preset as required, and the embodiment of the present disclosure does not limit it.
For example, the intra-frame prediction offset may be trained from training image blocks whose coding residuals have been obtained through an existing intra-frame prediction mode, together with the training reference pixels corresponding to each training image block. The training reference pixels may be chosen as required, and the embodiment of the present disclosure does not limit them.
Optionally, in this embodiment of the present disclosure, the offset is obtained by training on: training image blocks whose coding residuals have been obtained through an existing intra-frame prediction mode, the W + n pixels above each training image block, the H + m pixels to its left, and the j pixels at its upper-left corner. W is the width of the training image block, H is its height, n, m and j are positive integers, and the training image block has the same size as the target image block.
The specific values of n, m, and j may be preset as required, and the embodiment of the present disclosure does not limit them. In addition, of the W + n pixels above, the n extra pixels need not lie in the same row as the W pixels, although all W + n pixels may also lie in one row; correspondingly, of the H + m pixels on the left, the m extra pixels need not lie in the same column as the H pixels, although all H + m pixels may also lie in one column. The arrangement may be preset as required, and the embodiment of the present disclosure does not limit it. The coding residual of a training image block can be understood as the difference between the original pixel matrix of the training image block and the prediction of that block obtained through the existing intra-frame prediction mode.
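The coding residual definition can be sketched as "original minus prediction". The text does not name the existing intra prediction mode, so this sketch assumes a simple DC mode (mean of the W top and H left reference pixels, without integer rounding) purely for illustration; `dc_prediction` and `coding_residual` are hypothetical names.

```python
import numpy as np


def dc_prediction(top, left, w, h):
    """Assumed stand-in for the existing intra mode: DC = mean of W top + H left pixels."""
    dc = (np.sum(top[:w]) + np.sum(left[:h])) / (w + h)
    return np.full((h, w), dc, dtype=float)


def coding_residual(block, top, left):
    """Coding residual = original pixel matrix - prediction of the block."""
    block = np.asarray(block, dtype=float)
    h, w = block.shape
    pred = dc_prediction(np.asarray(top, dtype=float), np.asarray(left, dtype=float), w, h)
    return block - pred
```

Residuals produced this way, one per training block, are the targets the offset model is fitted against.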
For example, for the W + n pixels above: the W pixels may be the pixels $P_W$ located above, and adjacent to, the W topmost internal pixels of the target image block, and the other n pixels may be any n pixels above and adjacent to $P_W$; alternatively, all W + n pixels may lie in the pixel row above and adjacent to the target image block, where the W pixels are the pixels $P_W$ above and adjacent to the W topmost internal pixels, and the n pixels lie on the extension of $P_W$.
Moreover, in the embodiments of the present disclosure, the relationship between the offset and the coding residual, the W + n pixels above, the H + m pixels to the left and the j upper-left pixels of each training image block may be preset as required, and is not limited by the embodiments of the present disclosure.
For example, a preset machine learning model can be trained on the coding residuals of a plurality of training image blocks together with the W + n pixels above, the H + m pixels to the left and the j upper-left pixels of each block, and the training result is then used as the offset. Depending on the machine learning model used, the training result may consist of one or more values, so one or more offsets may be obtained. When several offsets are obtained for the same image block size, prediction can be performed with each offset in the subsequent application, and the one giving the best result is taken as the final prediction result.
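Choosing among several offsets for the same block size can be sketched as follows. This is a minimal illustration, assuming the matrix part of the prediction (A_k · bdry_red) is already computed and that "best effect" is measured by the sum of squared errors against the original block; a real encoder would more likely use a rate-distortion cost. The function name `pick_best_offset` is hypothetical.

```python
import numpy as np


def pick_best_offset(base_pred, candidate_offsets, original):
    """Try every candidate offset and keep the one whose prediction is closest
    to the original block.

    base_pred: matrix part of the MIP prediction (A_k * bdry_red), shape (H, W)
    candidate_offsets: list of offset arrays, each broadcastable to (H, W)
    original: original pixel values of the block, shape (H, W)
    Returns (index of the chosen offset, corresponding prediction).
    """
    original = np.asarray(original, dtype=np.float64)
    best_idx, best_pred, best_cost = -1, None, np.inf
    for idx, offset in enumerate(candidate_offsets):
        pred = base_pred + offset
        # Sum of squared errors stands in for the encoder's real cost function.
        cost = float(np.sum((original - pred) ** 2))
        if cost < best_cost:
            best_idx, best_pred, best_cost = idx, pred, cost
    return best_idx, best_pred


# Usage: three hypothetical constant offsets for a 4x4 block.
base = np.zeros((4, 4))
orig = np.full((4, 4), 3.0)
idx, pred = pick_best_offset(
    base, [np.zeros((4, 4)), np.full((4, 4), 2.5), np.full((4, 4), 5.0)], orig
)
print(idx)  # 1
```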
In the embodiments of the present disclosure, for each target image block to be predicted in a target image frame, the reference pixels of the block are obtained; the prediction result of the target image block is obtained from the reference pixels, a preset matrix and a pre-trained offset corresponding to the target image block; and the predicted image frame corresponding to the target image frame is obtained from the prediction results of all target image blocks. This improves the accuracy of the MIP offset and hence of intra prediction, which reduces the amount of data to be coded and improves coding efficiency.
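The per-block prediction pred = A_k · bdry_red + b_k can be illustrated for a 4 × 4 block. This is a floating-point sketch, assuming the 4 × 4 MIP configuration in which each side's 4 reference pixels are averaged down to 2 reduced boundary samples and a 16 × 4 matrix produces all 16 predicted samples directly; the real MIP matrices, integer arithmetic and upsampling for larger blocks are omitted, and the matrix and offset values below are hypothetical.

```python
import numpy as np


def reduce_boundary(samples, out_len):
    """Average consecutive groups of boundary samples down to out_len values."""
    samples = np.asarray(samples, dtype=np.float64)
    return samples.reshape(out_len, -1).mean(axis=1)


def mip_predict_4x4(top, left, matrix, offset, bit_depth=8):
    """Floating-point sketch of pred = A_k * bdry_red + b_k for a 4x4 block.

    top, left: the 4 reference pixels above / to the left of the block
    matrix: A_k with shape (16, 4); offset: b_k with shape (16,)
    """
    bdry_red = np.concatenate([reduce_boundary(top, 2), reduce_boundary(left, 2)])
    pred = matrix @ bdry_red + offset
    # Round and clip to the valid sample range for the given bit depth.
    pred = np.clip(np.rint(pred), 0, (1 << bit_depth) - 1)
    return pred.reshape(4, 4)


# Hypothetical A_k that simply averages the reduced boundary, plus a flat offset.
A_k = np.full((16, 4), 0.25)
b_k = np.full(16, 3.0)
pred = mip_predict_4x4([100, 100, 100, 100], [100, 100, 100, 100], A_k, b_k)
print(pred[0, 0])  # 103.0
```

A trained offset enters only through `offset`, so swapping in a better b_k changes the prediction without touching the matrix.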
Moreover, the offset is trained from the coding residual of a training image block obtained with an existing intra prediction mode, together with the W + n pixels above, the H + m pixels to the left and the j upper-left pixels of that block, where W and H are the width and height of the training image block, n, m and j are positive integers, and the training image block has the same size as the target image block. This further improves the accuracy of the offset and of the intra prediction result.
Referring to fig. 5, in the embodiment of the present disclosure, before step S12, the method may further include:
Step S14: obtaining the coding residuals of a plurality of training image blocks using an existing intra prediction mode.
Step S15: obtaining the offset according to the coding residuals and the W + n pixels above, the H + m pixels to the left and the j upper-left pixels of each training image block.
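Step S14 can be sketched with DC prediction standing in for the "existing intra prediction mode", which the text leaves open. This is an assumption for illustration: the DC value here is a rounded floating-point mean rather than the codec's integer formula, and the function names are hypothetical.

```python
import numpy as np


def dc_predict(top, left, w, h):
    """Simplified DC intra prediction: every sample is the rounded mean of the
    W reference pixels above and the H reference pixels to the left."""
    dc = (np.sum(top[:w]) + np.sum(left[:h])) / (w + h)
    return np.full((h, w), np.rint(dc))


def coding_residual(block, top, left):
    """Step S14 sketch: residual d = X - X' of one training block."""
    block = np.asarray(block, dtype=np.float64)
    h, w = block.shape
    return block - dc_predict(np.asarray(top), np.asarray(left), w, h)


d = coding_residual(np.full((4, 4), 10), top=[8, 8, 8, 8], left=[8, 8, 8, 8])
print(d[0, 0])  # 2.0
```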
As described above, in the embodiments of the present disclosure, the offset used for intra prediction of the target image block may be obtained in advance. Further, as described above, blocks of different sizes use different sets of MIP intra prediction modes, and each MIP intra prediction mode has its own offset.
Therefore, to obtain the offset used for intra prediction of a target image block, the selected training image blocks may have the same size as the target image block. Of course, the training image blocks may also differ in size from the target image block; this is not limited by the embodiments of the present disclosure.
First, the coding residuals of a plurality of training image blocks are obtained using an existing intra prediction mode, which may be any available conventional intra prediction mode preset as required; this is not limited by the present disclosure. The training image blocks may all use the same intra prediction mode during training, or different ones; this is not limited by the embodiments of the present disclosure either.
The W + n pixels above, the H + m pixels to the left and the j pixels at the upper-left corner of a training image block all refer to the pixels at those positions relative to the block within the image frame that contains the training image block.
Optionally, in an embodiment of the present disclosure, step S15 may further include:
Step A1: training a preset machine learning model on the coding residual of each training image block, the W + n pixels above it, the H + m pixels to its left, the j upper-left pixels and the intra prediction mode of the block, until the L2 norm (two-norm) between the output of the machine learning model and the coding residual meets a first preset threshold.
Step A2: using the output of the machine learning model as the offset corresponding to that intra prediction mode.
As described above, in practical applications there are 65 MIP intra prediction modes in total across the different block sizes, and each MIP intra prediction mode corresponds to one matrix and one offset. That is, a target image block may be predicted with any of several different MIP intra prediction modes.
Therefore, to improve the accuracy of the offset, the offsets corresponding to the different MIP intra prediction modes can be obtained through training in advance. In this case, the training data for the machine learning model also includes the intra prediction mode of each training image block, i.e., its MIP intra prediction mode.
For example, to train the offsets of the 35 MIP intra prediction modes for 4 × 4 image blocks, a plurality of 4 × 4 training image blocks can be obtained for each MIP intra prediction mode, and a machine learning model is trained to fit the coding residuals of those blocks from the W + n pixels above, the H + m pixels to the left and the j upper-left pixels of each training image block, together with the corresponding MIP intra prediction mode, yielding the offset for that mode. The MIP intra prediction mode used as training data may be represented by its identifier: for example, if the mode currently used as training data is MIP intra prediction mode 5, the value "5" can be used as the corresponding training input.
Moreover, in the embodiments of the present disclosure, the offsets of the MIP intra prediction modes for image blocks of sizes other than that of the target image block may also be trained in advance. In that case, a plurality of training image blocks is obtained for each MIP intra prediction mode and training is performed separately for each mode, giving the offset for each mode. The training result may contain several offsets for a given MIP intra prediction mode; in subsequent use, prediction can then be performed with each of them and the best prediction result selected. Conversely, training may yield no offset for a given MIP intra prediction mode, in which case the offset of another MIP intra prediction mode for image blocks of the same size may be used for prediction.
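The lookup rule described above, several offsets per mode or a fallback to another mode of the same block size, can be sketched with a plain dictionary. This is a hypothetical illustration: the table layout is an assumption, the strings stand in for offset arrays, and since the text does not say which other mode to fall back on, the sketch simply takes the first mode in the given order that has a trained offset.

```python
def lookup_offsets(offset_table, block_size, mode, modes_for_size):
    """Return the list of trained offsets to try for (block_size, mode).

    offset_table: dict mapping (block_size, mode) -> list of trained offsets
    modes_for_size: the MIP modes available for this block size; if `mode` has
    no trained offset, the first other mode in this list that has one is used.
    """
    candidates = offset_table.get((block_size, mode))
    if candidates:
        return candidates
    for other in modes_for_size:
        fallback = offset_table.get((block_size, other))
        if other != mode and fallback:
            return fallback
    raise KeyError(f"no trained offset for block size {block_size}")


table = {((4, 4), 0): ["b0"], ((4, 4), 2): ["b2_a", "b2_b"]}
print(lookup_offsets(table, (4, 4), 2, range(35)))  # ['b2_a', 'b2_b']
print(lookup_offsets(table, (4, 4), 7, range(35)))  # ['b0']
```

When a list holds more than one offset, each can be tried and the best prediction kept, as described above.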
In practice, the coding residual d is the difference between the original coding block X and the prediction X' obtained with a conventional intra prediction mode, that is, $X = X' + d$, while the MIP prediction is $\mathrm{pred} = A_k \cdot \mathrm{bdry}_{\mathrm{red}} + b_k$. The original MIP prediction can be simplified into a conventional intra prediction plus an offset: the term $A_k \cdot \mathrm{bdry}_{\mathrm{red}}$ is replaced by the conventional prediction $X'$. Then

$$X - \mathrm{pred} = X' + d - (X' + b_k) = d - b_k,$$

so the closer the offset $b_k$ is to the coding residual $d$, the closer the MIP prediction is to the original coding block, the smaller the residual left after MIP-based prediction, and the less data needs to be coded subsequently.
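A small numeric check of the identity X − pred = d − b_k, using made-up sample values for a 4-sample block, shows how an offset close to d shrinks the remaining residual.

```python
import numpy as np

# Toy 4-sample block (values are made up for illustration).
X = np.array([120.0, 124.0, 130.0, 133.0])       # original block X
X_conv = np.array([118.0, 118.0, 125.0, 125.0])  # conventional prediction X'
d = X - X_conv                                   # coding residual, X = X' + d


def residual_energy(b_k):
    """Squared L2 norm of X - pred, with A_k * bdry_red replaced by X'."""
    pred = X_conv + b_k
    return float(np.sum((X - pred) ** 2))


print(residual_energy(np.zeros(4)))                     # 129.0
print(residual_energy(np.array([2.0, 5.0, 5.0, 7.0])))  # 2.0
```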
Therefore, in the embodiments of the present disclosure, a first preset threshold may be set for the distance, i.e., the L2 norm, between the training output of the machine learning model and the coding residual obtained with the existing intra prediction mode. When the L2 norm between the model output and the coding residuals of the corresponding training image blocks satisfies the first preset threshold, the current output of the model can be used as the offset for the intra prediction mode being trained. The first preset threshold may be preset as required and is not limited by the embodiments of the present disclosure.
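Steps A1 and A2 can be sketched with a linear model trained by gradient descent. The text does not specify the model, so several choices here are assumptions: the model is linear in the centred, scaled reference pixels; the mode is taken into account by training one model per mode identifier; "meets the threshold" is read as the mean per-block L2 norm falling to or below it; and because the inputs are centred, the mean model output over the training blocks equals the bias vector, which is returned as the offset. All function names are hypothetical.

```python
import numpy as np


def train_offset(refs, residuals, threshold, max_iters=5000):
    """Fit residual ~ x @ weights + bias by gradient descent, where x are the
    centred, scaled reference pixels of each training block.

    refs: (N, R) reference pixels per block (W+n above, H+m left, j corner)
    residuals: (N, S) flattened coding residuals of the same blocks
    Training stops once the mean L2 norm between the model output and the
    residuals is <= threshold, or after max_iters steps.
    """
    refs = np.asarray(refs, dtype=np.float64)
    residuals = np.asarray(residuals, dtype=np.float64)
    x = (refs - refs.mean(axis=0)) / 255.0
    n = x.shape[0]
    weights = np.zeros((x.shape[1], residuals.shape[1]))
    bias = np.zeros(residuals.shape[1])
    # Step size 1/L, where L bounds the curvature of the squared-error loss.
    lr = 1.0 / max(1.0, np.linalg.norm(x, 2) ** 2 / n)
    for _ in range(max_iters):
        err = x @ weights + bias - residuals
        if np.mean(np.linalg.norm(err, axis=1)) <= threshold:
            break
        weights -= lr * (x.T @ err) / n
        bias -= lr * err.mean(axis=0)
    # x is centred, so the mean model output over the training blocks is `bias`.
    return bias.copy()


def train_per_mode(refs, residuals, modes, threshold):
    """Steps A1/A2 sketch: train one model per MIP mode id and keep its offset."""
    refs, residuals, modes = map(np.asarray, (refs, residuals, modes))
    return {int(k): train_offset(refs[modes == k], residuals[modes == k], threshold)
            for k in np.unique(modes)}


# Usage with synthetic data: 17 = 2W + 2H + 1 reference pixels for 4x4 blocks.
rng = np.random.default_rng(0)
refs = rng.integers(0, 256, size=(100, 17))
modes = np.repeat([0, 5], 50)
residuals = np.where(modes[:, None] == 0, 2.0, -3.0) * np.ones((100, 16))
offsets = train_per_mode(refs, residuals, modes, threshold=0.1)
```

Centring the inputs decouples the bias from the weights, which keeps the "output used as offset" a single well-defined vector per mode.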
Optionally, in an embodiment of the present disclosure, step S15 may further include:
Step B1: training a preset machine learning model on the coding residual of each training image block, the W + n pixels above it, the H + m pixels to its left and the j upper-left pixels, until the L2 norm between the output of the machine learning model and the coding residual meets a second preset threshold.
Step B2: using the output of the machine learning model as the offset for all intra prediction modes applicable to the training image block.
Alternatively, in this embodiment of the present disclosure, the offset may be trained without distinguishing between MIP intra prediction modes. The training data then includes the coding residual of each training image block and its W + n pixels above, H + m pixels to the left and j upper-left pixels, but not its MIP intra prediction mode. A preset machine learning model is trained on such data from a plurality of training image blocks until the L2 norm between the model output and the coding residuals of the corresponding blocks satisfies a second preset threshold, and the current output of the model is then used as the offset for all intra prediction modes applicable to those training image blocks.
The second preset threshold may be the same as or different from the first preset threshold, and may be preset according to a requirement, which is not limited in this embodiment of the present disclosure.
As described above, the set of MIP intra prediction modes depends on the block size. If the intra prediction mode of each training image block is not input during training, but the sizes of all training image blocks belong to the same group, the training result can be used as the offset of every MIP intra prediction mode in the group to which the training image blocks belong.
For example, in the embodiments of the present disclosure, the offsets of the 35 MIP intra prediction modes of Group 0 can be obtained by training on 4 × 4 training image blocks; the offsets of the 19 MIP intra prediction modes of Group 1 by training on 8 × 4, 4 × 8 and/or 8 × 8 training image blocks; and the offsets of the 11 MIP intra prediction modes of Group 2 by training on training image blocks of the other sizes.
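The mode-agnostic variant (steps B1 and B2) and the size-to-group mapping above can be sketched together. This is an illustrative assumption rather than the method itself: the trained model is replaced by its closed-form least-squares stand-in, a constant vector equal to the mean residual, and residuals of different sizes within one group are assumed to have already been brought to a common length, which the text does not describe. The names are hypothetical.

```python
import numpy as np

# Number of MIP modes per group, as listed in the text.
MODES_PER_GROUP = {0: 35, 1: 19, 2: 11}


def size_group(w, h):
    """Group 0: 4x4; Group 1: 8x4, 4x8, 8x8; Group 2: every other size."""
    if (w, h) == (4, 4):
        return 0
    if (w, h) in {(8, 4), (4, 8), (8, 8)}:
        return 1
    return 2


def shared_offsets(residuals_by_size):
    """Steps B1/B2 sketch: one mode-agnostic offset per group, copied to every mode.

    residuals_by_size: dict mapping (w, h) -> (N, S) array of residuals; all sizes
    in one group are assumed to have been brought to the same length S.
    """
    pooled = {}
    for (w, h), res in residuals_by_size.items():
        pooled.setdefault(size_group(w, h), []).append(np.asarray(res, dtype=np.float64))
    table = {}
    for group, chunks in pooled.items():
        # Least-squares optimum of a constant model: the mean residual.
        offset = np.concatenate(chunks).mean(axis=0)
        for mode in range(MODES_PER_GROUP[group]):
            table[(group, mode)] = offset
    return table


table = shared_offsets({
    (4, 4): np.ones((3, 16)),
    (8, 8): np.full((2, 16), 2.0),
    (4, 8): np.full((2, 16), 4.0),
})
print(len(table))  # 54
```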
Of course, in the embodiments of the present disclosure, an offset obtained by a single training run may also be used as the offset of every MIP intra prediction mode in all groups. In that case, the training image blocks may include blocks of every size type in the three groups above, or only blocks of the size types in some of the groups; this may be preset as required and is not limited by the embodiments of the present disclosure.
The machine learning model may be chosen as required, and the same or different models may be used when training the offsets for different block sizes or different MIP intra prediction modes; this is not limited by the embodiments of the present disclosure.
Optionally, in the embodiments of the present disclosure, n equals W, m equals H, and j is 1; the W + n pixels above the training image block lie in the same pixel row, the H + m pixels to its left lie in the same pixel column, and both this row and this column are adjacent to the training image block.
In the embodiments of the present disclosure, to balance the accuracy of the trained offset against training efficiency, integer multiples of the width and height of each training image block can be used as the numbers of reference pixels. The original MIP intra prediction uses as many reference pixels above the block as its width and as many to its left as its height; when training the offset, more reference pixels than this may be used. Moreover, pixels closer to the image block have greater reference value. Therefore, in the embodiments of the present disclosure, it is preferable that n equals the width W of the corresponding training image block, m equals its height H, and j is 1. That is, the 2W pixels above the training image block, the 2H pixels to its left and the 1 pixel at its upper-left corner are taken as the reference pixels in the training data, where the 2W pixels above lie in the same pixel row, the 2H pixels to the left lie in the same pixel column, and that row and column are both adjacent to the training image block.
Fig. 6 is a schematic diagram of the reference pixels of a training image block. The light grey region represents the training image block. Above it, the dark region of width W together with its extension of length W to the right forms the 2W pixels above the training image block; to its left, the dark region of height H together with its extension of height H below forms the 2H pixels to the left of the training image block. In addition, there is 1 pixel at the upper-left corner of the training image block.
Accordingly, when the W + n pixels above the training image block are obtained, the W pixels are those adjacent to the topmost pixels inside the training image block, and the other n pixels can be the n pixels in the same row immediately to the right of them. Likewise, when the H + m pixels to the left of the training image block are obtained, the H pixels are those adjacent to the leftmost pixels inside the training image block, and the other m pixels can be the m pixels in the same column immediately below them.
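The preferred reference layout (n = W, m = H, j = 1, as in Fig. 6) can be gathered from a frame as below. This is a minimal sketch, assuming the frame is a 2-D NumPy array indexed [row, column] and that all reference pixels lie inside the frame; no padding of unavailable pixels is done, and the function name is hypothetical.

```python
import numpy as np


def training_refs(frame, x, y, w, h, n=None, m=None):
    """Gather the W+n pixels above, H+m pixels to the left and the single
    (j = 1) upper-left corner pixel of the block at column x, row y.

    Defaults follow the preferred layout n = W and m = H: the above row and the
    left column touch the block and extend to the right and downward.
    """
    n = w if n is None else n
    m = h if m is None else m
    rows, cols = frame.shape
    if y < 1 or x < 1 or x + w + n > cols or y + h + m > rows:
        raise ValueError("reference pixels fall outside the frame")
    above = frame[y - 1, x:x + w + n]
    left = frame[y:y + h + m, x - 1]
    corner = frame[y - 1, x - 1:x]
    return np.concatenate([above, left, corner])


frame = np.arange(100).reshape(10, 10)
refs = training_refs(frame, x=1, y=1, w=4, h=4)
print(refs.size)  # 17
```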
In the embodiments of the present disclosure, the coding residuals of a plurality of training image blocks may be obtained with an existing intra prediction mode, and the offset obtained from the coding residuals and the W + n pixels above, H + m pixels to the left and j upper-left pixels of each training image block. Specifically, a preset machine learning model may be trained on the coding residual, those reference pixels and the intra prediction mode of each training image block until the L2 norm between the model output and the coding residual meets a first preset threshold, and the model output taken as the offset for that intra prediction mode; and/or a preset machine learning model may be trained on the coding residual and those reference pixels alone until the L2 norm between the model output and the coding residual meets a second preset threshold, and the model output taken as the offset for all intra prediction modes applicable to the training image block. This further improves the accuracy of the offset and thus of intra prediction.
In addition, in the embodiments of the present disclosure, n equals W, m equals H, and j is 1; the W + n pixels above the training image block lie in the same pixel row, the H + m pixels to its left lie in the same pixel column, and both are adjacent to the training image block. This reduces the amount of training data while improving the accuracy of the offset, and thereby improves the efficiency of offset training.
Fig. 7 is a block diagram illustrating an image processing apparatus according to an exemplary embodiment. Referring to fig. 7, the apparatus includes a reference pixel point obtaining module 21, an intra prediction module 22, and a predicted image frame obtaining module 23.
The reference pixel point obtaining module 21 is configured to obtain, for each target image block to be predicted in the target image frame, the reference pixels of the target image block.
The intra prediction module 22 is configured to perform prediction according to the reference pixels, a preset matrix and a pre-trained offset corresponding to the target image block, to obtain the prediction result of the target image block.
The predicted image frame obtaining module 23 is configured to obtain the predicted image frame corresponding to the target image frame according to the prediction results of all target image blocks.
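The three modules can be pictured as a simple pipeline. The sketch below is a hypothetical illustration only: each function mirrors one module, every 4 × 4 block is predicted from neighbouring pixels of the input frame (a real encoder would use reconstructed pixels), border blocks see a mid-grey value in place of missing neighbours, MIP's boundary averaging is omitted, and the matrix and offset are made up.

```python
import numpy as np


def get_reference_pixels(padded, bx, by, size):
    """Module 21 sketch: `size` pixels above and to the left of the block at (bx, by).
    `padded` is the frame with one extra row on top and one extra column on the left."""
    top = padded[by, bx + 1:bx + 1 + size]
    left = padded[by + 1:by + 1 + size, bx]
    return top, left


def intra_predict(top, left, matrix, offset, size):
    """Module 22 sketch: matrix times boundary vector plus trained offset."""
    bdry = np.concatenate([top, left])
    return (matrix @ bdry + offset).reshape(size, size)


def predict_frame(frame, matrix, offset, size=4, bit_depth=8):
    """Module 23 sketch: assemble the predicted frame block by block.
    Assumes the frame dimensions are multiples of `size`."""
    h, w = frame.shape
    # Border blocks see the mid-grey value 1 << (bit_depth - 1) as their missing neighbours.
    padded = np.full((h + 1, w + 1), float(1 << (bit_depth - 1)))
    padded[1:, 1:] = frame
    out = np.zeros((h, w))
    for by in range(0, h, size):
        for bx in range(0, w, size):
            top, left = get_reference_pixels(padded, bx, by, size)
            out[by:by + size, bx:bx + size] = intra_predict(top, left, matrix, offset, size)
    return np.clip(np.rint(out), 0, (1 << bit_depth) - 1)


A = np.full((16, 8), 1 / 8)   # hypothetical matrix: average of the 8 boundary pixels
b = np.zeros(16)              # hypothetical trained offset
pred = predict_frame(np.full((8, 8), 100.0), A, b)
print(pred[0, 0], pred[0, 4], pred[4, 4])  # 128.0 114.0 100.0
```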
Optionally, in this embodiment of the present disclosure, the offset is trained from the coding residual of a training image block, obtained with an existing intra prediction mode, together with the W + n pixels above the training image block, the H + m pixels to its left, and the j pixels at its upper-left corner. W is the width of the training image block, H is its height, n, m and j are positive integers, and the training image block has the same size as the target image block.
In the embodiments of the present disclosure, for each target image block to be predicted in a target image frame, the reference pixels of the block are obtained; the prediction result of the target image block is obtained from the reference pixels, a preset matrix and a pre-trained offset corresponding to the target image block; and the predicted image frame corresponding to the target image frame is obtained from the prediction results of all target image blocks. This improves the accuracy of the MIP offset and hence of intra prediction, which reduces the amount of data to be coded and improves coding efficiency.
The offset is trained from the coding residual of a training image block obtained with an existing intra prediction mode, together with the W + n pixels above, the H + m pixels to the left and the j upper-left pixels of that block, where W and H are the width and height of the training image block, n, m and j are positive integers, and the training image block has the same size as the target image block. Referring to fig. 8, the image processing apparatus may further include:
a coding residual acquiring module 24, configured to acquire the coding residuals of a plurality of training image blocks using an existing intra prediction mode;
an offset obtaining module 25, configured to obtain the offset according to the coding residuals and the W + n pixels above, the H + m pixels to the left and the j upper-left pixels of each training image block.
Optionally, in this embodiment of the present disclosure, the offset obtaining module 25 may further include:
a first model training sub-module, configured to train a preset machine learning model on the coding residual of each training image block, the W + n pixels above it, the H + m pixels to its left, the j upper-left pixels and the intra prediction mode of the block, until the L2 norm between the output of the machine learning model and the coding residual meets a first preset threshold;
a first offset obtaining sub-module, configured to take the output of the machine learning model as the offset corresponding to the intra prediction mode.
Optionally, in this embodiment of the present disclosure, the offset obtaining module 25 may further include:
a second model training sub-module, configured to train a preset machine learning model on the coding residual of each training image block, the W + n pixels above it, the H + m pixels to its left and the j upper-left pixels, until the L2 norm between the output of the machine learning model and the coding residual meets a second preset threshold;
a second offset acquisition sub-module, configured to take the output of the machine learning model as the offset for all intra prediction modes applicable to the training image block.
Optionally, in the embodiments of the present disclosure, n equals W, m equals H, and j is 1; the W + n pixels above the training image block lie in the same pixel row, the H + m pixels to its left lie in the same pixel column, and both this row and this column are adjacent to the training image block.
In the embodiments of the present disclosure, the coding residuals of a plurality of training image blocks may be obtained with an existing intra prediction mode, and the offset obtained from the coding residuals and the W + n pixels above, H + m pixels to the left and j upper-left pixels of each training image block. Specifically, a preset machine learning model may be trained on the coding residual, those reference pixels and the intra prediction mode of each training image block until the L2 norm between the model output and the coding residual meets a first preset threshold, and the model output taken as the offset for that intra prediction mode; and/or a preset machine learning model may be trained on the coding residual and those reference pixels alone until the L2 norm between the model output and the coding residual meets a second preset threshold, and the model output taken as the offset for all intra prediction modes applicable to the training image block. This further improves the accuracy of the offset and thus of intra prediction.
In addition, in the embodiments of the present disclosure, n equals W, m equals H, and j is 1; the W + n pixels above the training image block lie in the same pixel row, the H + m pixels to its left lie in the same pixel column, and both are adjacent to the training image block. This reduces the amount of training data while improving the accuracy of the offset, and thereby improves the efficiency of offset training.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 9 is a block diagram illustrating an apparatus 300 for image processing according to an example embodiment. For example, the apparatus 300 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 9, the apparatus 300 may include one or more of the following components: a processing component 302, a memory 304, a power component 306, a multimedia component 308, an audio component 310, an input/output (I/O) interface 312, a sensor component 314, and a communication component 316.
The processing component 302 generally controls overall operation of the apparatus 300, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 302 may include one or more processors 320 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 302 can include one or more modules that facilitate interaction between the processing component 302 and other components. For example, the processing component 302 may include a multimedia module to facilitate interaction between the multimedia component 308 and the processing component 302.
The memory 304 is configured to store various types of data to support operations at the device 300. Examples of such data include instructions for any application or method operating on device 300, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 304 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 306 provides power to the various components of the apparatus 300. The power component 306 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 300.
The multimedia component 308 includes a screen that provides an output interface between the device 300 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 308 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 300 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 310 is configured to output and/or input audio signals. For example, audio component 310 includes a Microphone (MIC) configured to receive external audio signals when apparatus 300 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 304 or transmitted via the communication component 316. In some embodiments, audio component 310 also includes a speaker for outputting audio signals.
The I/O interface 312 provides an interface between the processing component 302 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 314 includes one or more sensors for providing various aspects of status assessment for the device 300. For example, sensor assembly 314 may detect an open/closed state of device 300, the relative positioning of components, such as a display and keypad of apparatus 300, the change in position of apparatus 300 or a component of apparatus 300, the presence or absence of user contact with apparatus 300, the orientation or acceleration/deceleration of apparatus 300, and the change in temperature of apparatus 300. Sensor assembly 314 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 314 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 314 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 316 is configured to facilitate wired or wireless communication between the apparatus 300 and other devices. The apparatus 300 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 316 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 316 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 300 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a storage medium comprising instructions is also provided, for example the memory 304 comprising instructions, which are executable by the processor 320 of the apparatus 300 to perform the method described above. Optionally, the storage medium may be a non-transitory computer-readable storage medium, such as a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, or an optical data storage device.
Fig. 10 is a block diagram illustrating an apparatus 400 for image processing according to an example embodiment. For example, the apparatus 400 may be provided as a server. Referring to fig. 10, apparatus 400 includes a processing component 422, which further includes one or more processors, and memory resources, represented by memory 432, for storing instructions, such as applications, that are executable by processing component 422. The application programs stored in memory 432 may include one or more modules that each correspond to a set of instructions. Further, the processing component 422 is configured to execute instructions to perform the image processing method described above.
The apparatus 400 may also include a power component 426 configured to perform power management of the apparatus 400, a wired or wireless network interface 450 configured to connect the apparatus 400 to a network, and an input/output (I/O) interface 458. The apparatus 400 may operate based on an operating system stored in the memory 432, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. An image processing method, comprising:
for each target image block to be predicted in a target image frame, acquiring reference pixel points of the target image block;
acquiring a prediction result of the target image block according to the reference pixel points, a preset matrix and a pre-trained offset corresponding to the target image block;
and acquiring a predicted image frame corresponding to the target image frame according to the prediction result of each target image block.
2. The method according to claim 1, wherein the offset is obtained by training on the coding residual of a training image block, the coding residual being obtained by an existing intra prediction mode, together with W + n pixel points above the training image block, H + m pixel points on the left side of the training image block, and j pixel points at the upper-left corner of the training image block; W is the width of the training image block, H is the height of the training image block, n, m and j are positive integers, and the training image block has the same size as the target image block.
3. The method according to claim 2, wherein before the step of acquiring the prediction result of the target image block according to the reference pixel points, the preset matrix and the pre-trained offset corresponding to the target image block, the method further comprises:
obtaining coding residuals of a plurality of training image blocks under an existing intra-frame prediction mode;
and acquiring the offset according to the coding residuals, the W + n pixel points above each training image block, the H + m pixel points on the left side of each training image block, and the j pixel points at the upper-left corner of each training image block.
4. The method according to claim 3, wherein the step of acquiring the offset according to the coding residuals, the W + n pixel points above the training image block, the H + m pixel points on the left side of the training image block, and the j pixel points at the upper-left corner of the training image block comprises:
training a preset machine learning model with the coding residual of the training image block, the W + n pixel points above the training image block, the H + m pixel points on its left side, the j pixel points at its upper-left corner, and the intra-frame prediction mode of the training image block, until the 2-norm between the output of the machine learning model and the coding residual satisfies a first preset threshold;
and taking the output of the machine learning model as the offset corresponding to that intra-frame prediction mode.
5. The method according to claim 3, wherein the step of acquiring the offset according to the coding residuals, the W + n pixel points above the training image block, the H + m pixel points on the left side of the training image block, and the j pixel points at the upper-left corner of the training image block comprises:
training a preset machine learning model with the coding residual of the training image block, the W + n pixel points above the training image block, the H + m pixel points on its left side, and the j pixel points at its upper-left corner, until the 2-norm between the output of the machine learning model and the coding residual satisfies a second preset threshold;
and taking the output of the machine learning model as the offset corresponding to all intra-frame prediction modes of the training image block.
6. The method according to any one of claims 2 to 5, wherein n equals W, m equals H, and j equals 1; the W + n pixel points above the training image block are located in the same pixel row, the H + m pixel points on the left side of the training image block are located in the same pixel column, and the pixel row and the pixel column are both adjacent to the training image block.
7. An image processing apparatus characterized by comprising:
a reference pixel point acquisition module, configured to acquire, for each target image block to be predicted in a target image frame, reference pixel points of the target image block;
an intra-frame prediction module, configured to acquire a prediction result of the target image block according to the reference pixel points, a preset matrix and a pre-trained offset corresponding to the target image block;
and a predicted image frame acquisition module, configured to acquire a predicted image frame corresponding to the target image frame according to the prediction result of each target image block.
8. The apparatus according to claim 7, wherein the offset is obtained by training on the coding residual of a training image block, the coding residual being obtained by an existing intra prediction mode, together with W + n pixel points above the training image block, H + m pixel points on the left side of the training image block, and j pixel points at the upper-left corner of the training image block; W is the width of the training image block, H is the height of the training image block, n, m and j are positive integers, and the training image block has the same size as the target image block.
9. An image processing apparatus characterized by comprising:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the instructions to implement the image processing method of any one of claims 1 to 6.
10. A storage medium storing instructions which, when executed by a processor of an image processing apparatus, enable the image processing apparatus to perform the image processing method according to any one of claims 1 to 6.
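The prediction step of claims 1, 2 and 6 can be sketched as follows. This is a minimal, hypothetical Python illustration, not the applicant's implementation. It assumes the reference pixels are read from an already reconstructed frame `recon`, that out-of-frame positions are padded with the 8-bit mid-grey value 128, that the preset matrix A has W·H rows and 2W + 2H + 1 columns (the claim 6 layout with n = W, m = H, j = 1), and that the pre-trained offset b is a vector of W·H values, so each block is predicted as clip(A·r + b) for the reference vector r. For simplicity every block reads its references from the same `recon` frame; a real codec would update the reconstruction block by block. The names `reference_pixels`, `predict_block` and `predict_frame` are illustrative only.

```python
from typing import List, Sequence

Frame = List[List[int]]


def reference_pixels(recon: Frame, x0: int, y0: int, w: int, h: int) -> List[int]:
    """Gather 2W above, 2H left and 1 upper-left reference pixels (n=W, m=H, j=1)."""
    rows, cols = len(recon), len(recon[0])

    def px(x: int, y: int) -> int:
        # Assumption: positions outside the frame are padded with 8-bit mid-grey.
        return recon[y][x] if 0 <= y < rows and 0 <= x < cols else 128

    above = [px(x0 + i, y0 - 1) for i in range(2 * w)]  # row directly above the block
    left = [px(x0 - 1, y0 + k) for k in range(2 * h)]   # column directly left of the block
    corner = [px(x0 - 1, y0 - 1)]                       # upper-left corner pixel
    return above + left + corner


def predict_block(refs: Sequence[int], matrix: Sequence[Sequence[float]],
                  offset: Sequence[float], w: int, h: int,
                  bit_depth: int = 8) -> List[List[int]]:
    """Return clip(A @ r + b) reshaped to an h x w block."""
    if len(matrix) != w * h or len(offset) != w * h:
        raise ValueError("matrix and offset need one row/entry per block pixel")
    if any(len(row) != len(refs) for row in matrix):
        raise ValueError("matrix width must equal the number of reference pixels")
    max_val = (1 << bit_depth) - 1
    flat = []
    for row, b in zip(matrix, offset):
        value = sum(a * r for a, r in zip(row, refs)) + b   # affine prediction A.r + b
        flat.append(min(max(round(value), 0), max_val))      # clip to the sample range
    return [flat[y * w:(y + 1) * w] for y in range(h)]


def predict_frame(recon: Frame, w: int, h: int,
                  matrix: Sequence[Sequence[float]],
                  offset: Sequence[float]) -> Frame:
    """Predict every w x h block and assemble the predicted frame."""
    rows, cols = len(recon), len(recon[0])
    if rows % h or cols % w:
        raise ValueError("sketch assumes the frame is tiled exactly by blocks")
    pred = [[0] * cols for _ in range(rows)]
    for y0 in range(0, rows, h):
        for x0 in range(0, cols, w):
            refs = reference_pixels(recon, x0, y0, w, h)
            block = predict_block(refs, matrix, offset, w, h)
            for dy in range(h):
                pred[y0 + dy][x0:x0 + w] = block[dy]
    return pred
```

For example, with 4 × 4 blocks (17 reference pixels), an averaging matrix whose entries are all 1/17 and an offset of 2 for every pixel, a uniform frame of value 128 is predicted as 130 everywhere. A trained matrix and offset would of course be far less trivial.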
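The offset training of claims 3 to 5 can likewise be sketched. The claims do not specify the machine learning model, so this sketch assumes a hypothetical linear model, offset = G·(r/255) + c, fitted by plain gradient descent on the mean squared error against the coding residuals. Training stops once the mean 2-norm between the model output and the residual is at or below the threshold (reading "meets a preset threshold" as "≤" is an assumption) or after a fixed number of epochs. With `per_mode=True` one model is trained per intra prediction mode (claim 4); with `per_mode=False` all modes share a single model (claim 5). The class and function names, the learning rate and the epoch limit are illustrative assumptions.

```python
import math
from collections import defaultdict
from collections.abc import Hashable
from typing import Dict, List, Sequence, Tuple

# One training sample: (reference pixels, intra prediction mode, coding residual).
Sample = Tuple[Sequence[int], Hashable, Sequence[float]]


class LinearOffsetModel:
    """Hypothetical offset model: offset = G . (refs / 255) + c."""

    def __init__(self, num_refs: int, block_size: int) -> None:
        self.g = [[0.0] * num_refs for _ in range(block_size)]
        self.c = [0.0] * block_size

    def __call__(self, refs: Sequence[int]) -> List[float]:
        x = [r / 255.0 for r in refs]  # normalise 8-bit samples to [0, 1]
        return [sum(gk * xk for gk, xk in zip(row, x)) + ci
                for row, ci in zip(self.g, self.c)]


def mean_l2(model: LinearOffsetModel, samples: Sequence[Sample]) -> float:
    """Mean 2-norm between the model output and the coding residual."""
    total = 0.0
    for refs, _, residual in samples:
        out = model(refs)
        total += math.sqrt(sum((o - r) ** 2 for o, r in zip(out, residual)))
    return total / len(samples)


def fit(samples: Sequence[Sample], threshold: float, lr: float = 0.05,
        max_epochs: int = 2000) -> LinearOffsetModel:
    """Gradient descent on the mean squared error; stop once mean_l2 <= threshold."""
    num_refs, size = len(samples[0][0]), len(samples[0][2])
    model = LinearOffsetModel(num_refs, size)
    for _ in range(max_epochs):
        if mean_l2(model, samples) <= threshold:
            break
        grad_g = [[0.0] * num_refs for _ in range(size)]
        grad_c = [0.0] * size
        for refs, _, residual in samples:
            x = [r / 255.0 for r in refs]
            out = model(refs)
            for i in range(size):
                e = 2.0 * (out[i] - residual[i]) / len(samples)  # d(MSE)/d(out_i)
                grad_c[i] += e
                for k in range(num_refs):
                    grad_g[i][k] += e * x[k]
        for i in range(size):
            model.c[i] -= lr * grad_c[i]
            for k in range(num_refs):
                model.g[i][k] -= lr * grad_g[i][k]
    return model


def train_offsets(samples: Sequence[Sample], threshold: float,
                  per_mode: bool = True) -> Dict[Hashable, LinearOffsetModel]:
    """per_mode=True: one model per intra mode (claim 4); False: one shared model (claim 5)."""
    if not per_mode:
        return {"all": fit(samples, threshold)}
    groups: Dict[Hashable, List[Sample]] = defaultdict(list)
    for sample in samples:
        groups[sample[1]].append(sample)
    return {mode: fit(group, threshold) for mode, group in groups.items()}
```

If two modes see identical reference pixels but produce different residuals, a single shared model (claim 5) can only learn their average and may never reach the threshold, whereas per-mode models (claim 4) can fit each residual separately at the cost of storing one offset per mode.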
Application CN201910829305.6A, priority date 2019-09-03, filed 2019-09-03: Image processing method, device and storage medium (status: Active; CN110708559B)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201910829305.6A (CN110708559B) | 2019-09-03 | 2019-09-03 | Image processing method, device and storage medium

Publications (2)

Publication Number | Publication Date
CN110708559A | 2020-01-17
CN110708559B | 2022-03-25

Family

ID=69193536

Country Status (1)

Country | Link
CN (1) | CN110708559B

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant