CN110708559A

CN110708559A - Image processing method, device and storage medium

Info

Publication number: CN110708559A
Application number: CN201910829305.6A
Authority: CN
Inventors: 张元尊; 郑云飞; 闻兴; 于冰
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2019-09-03
Filing date: 2019-09-03
Publication date: 2020-01-17
Anticipated expiration: 2039-09-03
Also published as: CN110708559B

Abstract

The present disclosure relates to an image processing method, apparatus, and storage medium. The method comprises: for each target image block to be predicted in a target image frame, acquiring reference pixel points of the target image block; acquiring a prediction result of the target image block according to the reference pixel points, a preset matrix, and a pre-trained offset corresponding to the target image block; and acquiring a predicted image frame corresponding to the target image frame according to the prediction result of each target image block. The method thereby improves the accuracy of the offset used in intra-frame prediction during image processing, and in turn the accuracy of the intra-frame prediction result.

Description

Image processing method, device and storage medium

Technical Field

The present disclosure relates to the field of image processing technologies, and in particular, to an image processing method and apparatus, and a storage medium.

Background

In general, the luminance and chrominance values of two adjacent pixels in an image are relatively close; that is, color changes gradually rather than jumping abruptly to a completely different color. Video coding exploits this correlation for compression. Intra-frame prediction uses the spatial correlation of video, predicting the current pixel from adjacent, already-coded pixels in the same frame, so as to effectively remove spatial redundancy.

The intra prediction technique MIP (Matrix Weighted Intra Prediction) originates from neural-network-based intra prediction, in which a multi-layer neural network predicts the pixel values of the current image block from adjacent pixels. Because the complexity of that prediction mode is too high, an intra-frame prediction technique based on a linear affine transformation was developed as a trade-off. Specifically, for a W × H image block in the current image frame, the input of MIP is the W pixels above the block and the H pixels to its left, and the original MIP prediction mode can be simplified to the current intra prediction mode plus an offset.

However, in the related art, MIP only provides a framework for intra-frame prediction and does not specify how the offset is obtained. When the offset is inaccurate, the accuracy of the intra-frame prediction result suffers, and ultimately so does the accuracy of the encoding result based on MIP intra-frame prediction.

Disclosure of Invention

The present disclosure provides an image processing method, apparatus and storage medium to at least solve the problem in the related art that the accuracy of an intra prediction result is affected due to an inaccurate offset. The technical scheme of the disclosure is as follows:

according to a first aspect of embodiments of the present disclosure, there is provided an image processing method, including:

for each target image block to be predicted in a target image frame, acquiring reference pixel points of the target image block;

acquiring a prediction result of the target image block according to the reference pixel point, a preset matrix and a pre-trained offset corresponding to the target image block;

and acquiring a predicted image frame corresponding to the target image frame according to the prediction result of each target image block.

Optionally, the offset is obtained by training on training image blocks whose coding residuals are obtained through an existing intra-frame prediction mode, together with the W + n pixel points above each training image block, the H + m pixel points on its left side, and the j pixel points at its upper-left corner; W is the width of the training image block, H is the height of the training image block, n, m and j are positive integers, and the size of the training image block is the same as that of the target image block.

Optionally, before the step of obtaining the prediction result of the target image block according to the reference pixel point, a preset matrix, and a pre-trained offset corresponding to the target image block, the method further includes:

obtaining coding residual errors of a plurality of training image blocks in an existing intra-frame prediction mode;

and acquiring the offset according to the coding residual, W + n pixel points above the training image block, H + m pixel points on the left side of the training image block, and j pixel points on the upper left corner of the training image block.

Optionally, the step of obtaining the offset according to the encoded residual, the W + n pixel points above the training image block, the H + m pixel points on the left side of the training image block, and the j pixel points on the upper left corner of the training image block includes:

training a preset machine learning model according to the coding residual of the training image block, the W + n pixel points above the training image block, the H + m pixel points on the left side, the j pixel points at the upper-left corner, and the intra-frame prediction mode of the training image block, until the L2 norm between the output of the machine learning model and the coding residual meets a first preset threshold;

and taking the output of the machine learning model as the offset corresponding to the intra-frame prediction mode.

Optionally, the step of obtaining the offset according to the coding residual, the W + n pixel points above the training image block, the H + m pixel points on the left side of the training image block, and the j pixel points at the upper-left corner of the training image block includes:

training a preset machine learning model with the coding residual of the training image block, the W + n pixel points above the training image block, the H + m pixel points on the left side, and the j pixel points at the upper-left corner, until the L2 norm between the output of the machine learning model and the coding residual meets a second preset threshold;

and taking the output of the machine learning model as the offset corresponding to all intra-frame prediction modes corresponding to the training image block.

Optionally, the value of n equals W, the value of m equals H, and j is 1; the W + n pixel points above the training image block are located in the same pixel row, the H + m pixel points on the left side of the training image block are located in the same pixel column, and the pixel row and the pixel column are both adjacent to the training image block.

According to a second aspect of the embodiments of the present disclosure, there is provided an image processing apparatus including:

the reference pixel point acquisition module is configured to execute the steps of acquiring reference pixel points of target image blocks aiming at each target image block to be predicted in a target image frame;

the intra-frame prediction module is configured to execute the prediction of the target image block according to the reference pixel point, a preset matrix and a pre-trained offset corresponding to the target image block;

and the predicted image frame acquisition module is configured to acquire a predicted image frame corresponding to the target image frame according to the prediction result of each target image block.

Optionally, the apparatus further comprises:

a coding residual obtaining module configured to obtain coding residuals of a plurality of the training image blocks by an existing intra prediction manner;

and the offset acquisition module is configured to acquire the offset according to the coding residual, W + n pixel points above the training image block, H + m pixel points on the left side of the training image block, and j pixel points on the upper left corner of the training image block.

Optionally, the offset obtaining module includes:

a first model training sub-module configured to train a preset machine learning model according to the coding residual of the training image block, the W + n pixel points above the training image block, the H + m pixel points on the left side, the j pixel points at the upper-left corner, and the intra-frame prediction mode of the training image block, until the L2 norm between the output of the machine learning model and the coding residual meets a first preset threshold;

a first offset obtaining sub-module configured to perform an operation of taking an output of the machine learning model as an offset corresponding to the intra prediction mode.

Optionally, the offset obtaining module includes:

a second model training sub-module configured to train a preset machine learning model with the coding residual of the training image block, the W + n pixel points above the training image block, the H + m pixel points on the left side, and the j pixel points at the upper-left corner, until the L2 norm between the output of the machine learning model and the coding residual meets a second preset threshold;

and a second offset obtaining sub-module configured to take the output of the machine learning model as the offset corresponding to all intra-frame prediction modes corresponding to the training image block.

According to a third aspect of the embodiments of the present disclosure, there is provided an image processing apparatus including:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement any one of the image processing methods as described above.

According to a fourth aspect of the embodiments of the present disclosure, there is provided a storage medium having instructions that, when executed by a processor of an image processing apparatus, enable the image processing apparatus to perform any one of the image processing methods as described above.

According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product, which, when the instructions in the storage medium are executed by a processor of an image processing apparatus, enables the image processing apparatus to perform any one of the image processing methods as described above.

The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:

in the embodiment of the disclosure, a reference pixel point of each target image block to be predicted in a target image frame is obtained; acquiring a prediction result of the target image block according to the reference pixel point, a preset matrix and a pre-trained offset corresponding to the target image block; and acquiring a predicted image frame corresponding to the target image frame according to the prediction result of each target image block. Therefore, the offset accuracy in intra-frame prediction is improved, the intra-frame prediction accuracy is improved, the coded data quantity is reduced, and the coding effect is improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.

FIG. 1 is a first flowchart illustrating an image processing method according to an exemplary embodiment.

Fig. 2 is a diagram illustrating the MIP process according to an exemplary embodiment.

Fig. 3A is a first schematic diagram illustrating a downsampling process according to an exemplary embodiment.

Fig. 3B is a second schematic diagram illustrating a downsampling process according to an exemplary embodiment.

Fig. 4 is a diagram illustrating the interpolation process of MIP according to an exemplary embodiment.

FIG. 5 is a second flowchart illustrating an image processing method according to an exemplary embodiment.

FIG. 6 is a diagram illustrating a reference pixel of a test image block according to an exemplary embodiment.

Fig. 7 is one of block diagrams illustrating an image processing apparatus according to an exemplary embodiment.

Fig. 8 is a second block diagram of an image processing apparatus according to an exemplary embodiment.

FIG. 9 is a block diagram illustrating an apparatus in accordance with an example embodiment.

FIG. 10 is a block diagram illustrating an apparatus in accordance with an example embodiment.

Detailed Description

In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

Fig. 1 is a flowchart illustrating an image processing method according to an exemplary embodiment. As shown in Fig. 1, the method may include the following steps:

In step S11, for each target image block to be predicted in the target image frame, reference pixel points of the target image block are obtained.

In step S12, a prediction result of the target image block is obtained according to the reference pixel points, a preset matrix, and a pre-trained offset corresponding to the target image block.

In step S13, a predicted image frame corresponding to the target image frame is obtained according to the prediction result of each target image block.

As mentioned above, for a W × H image block to be predicted in an image frame, the input of MIP is the W pixels above the image block and the H pixels on its left side. As shown by way of example in Fig. 2, the general framework of MIP includes: averaging (downsampling) the W pixel points above the image block and the H pixel points on the left side to obtain sampling points (4 or 8); performing a matrix-vector multiplication on the sampling points and adding an offset to obtain partial prediction values (for example, matrices of size 4x4, 4x8, 8x4, 8x8, and the like); and performing bilinear interpolation on the partial prediction values to obtain the final luminance prediction. First, to predict a target image block of size W × H, the reference pixel points are the W pixel points above and the H pixel points to the left of the target image block, which can be obtained in the same way as in conventional intra-frame prediction. Next, these (W + H) pixels yield the final predicted value of the target image block through three steps: averaging, affine transformation, and upsampling. Finally, the predicted image frame corresponding to the target image frame is obtained according to the prediction result of each target image block.

Therefore, in the embodiment of the present disclosure, for each target image block to be predicted in the target image frame, the reference pixel points of the target image block are obtained first. Specifically, assuming that the size of the target image block is W × H, the W pixels above the target image block and the H pixels on its left side may be obtained as reference pixels. In addition, depending on the prediction requirements, the target image block may include the luminance component, the chrominance component, and the like of each pixel point; accordingly, in the subsequent prediction process, the luminance component and/or chrominance component of the pixel points inside the target image block are predicted from the luminance component and/or chrominance component of the reference pixel points.

The main purpose of the reference pixel pre-processing (averaging/downsampling) is to normalize the size of the reference pixels. As shown in Figs. 3A and 3B, the reference pixel points of a 4 × 4 target image block may be normalized to 4 pixels, and those of a target image block of any other size may be normalized to 8 pixels. That is, the long input reference boundaries $bdry^{top}$ and $bdry^{left}$ are converted, according to the coding unit size, into the short reference boundaries $bdry^{top}_{red}$ and $bdry^{left}_{red}$, so as to reduce the amount of computation in the prediction process and the storage space for model parameters. The averaged left and top short reference boundaries are then concatenated (concat) into a vector $bdry_{red}$.
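The averaging step can be sketched roughly as follows. This is an illustration only: the function names `reduce_boundary` and `build_bdry_red` are hypothetical, the split into 2 samples per side for 4 × 4 blocks and 4 per side otherwise is an assumption consistent with the 4-or-8 totals stated above, and the top-then-left concatenation order and round-to-nearest integer averaging are likewise assumptions.

```python
def reduce_boundary(boundary, reduced_len):
    """Average consecutive groups of boundary pixels down to reduced_len samples."""
    if len(boundary) % reduced_len != 0:
        raise ValueError("boundary length must be a multiple of reduced_len")
    group = len(boundary) // reduced_len
    reduced = []
    for i in range(reduced_len):
        chunk = boundary[i * group:(i + 1) * group]
        # Integer average, rounded to nearest (assumed rounding rule).
        reduced.append((sum(chunk) + group // 2) // group)
    return reduced


def build_bdry_red(top, left):
    """Concatenate the reduced top and left boundaries into one vector."""
    width, height = len(top), len(left)
    # Assumption: 2 + 2 samples for 4x4 blocks, 4 + 4 samples otherwise.
    per_side = 2 if (width == 4 and height == 4) else 4
    return reduce_boundary(top, per_side) + reduce_boundary(left, per_side)


# A 4x4 block: 4 top pixels and 4 left pixels collapse to 4 samples in total.
bdry_red_4x4 = build_bdry_red([10, 12, 20, 22], [30, 30, 40, 44])
```

Keeping the reduced vector at a fixed length of 4 or 8 is what allows one matrix per mode to serve every block size within a group.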

The MIP prediction pixels are generated by matrix-weighting the averaged reference pixels and then adding an offset, i.e. a linear affine transformation. What is predicted is the downsampled prediction signal $pred_{red}$ of the target image block, of size $W_{red} \times H_{red}$. The affine transformation that generates the prediction signal can be expressed as $A_k \cdot bdry_{red} + b_k = pred_{red}$, where $A_k$ and $b_k$ are respectively the pre-trained matrix and offset corresponding to intra prediction mode k.
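As a rough illustration of the affine step, the sketch below computes the reduced prediction as a matrix-vector product plus an offset and reshapes it into a block. The names `mip_affine` and `to_block` and the toy matrix values are hypothetical; a real codec would additionally apply fixed-point scaling and clipping, which are omitted here.

```python
def mip_affine(matrix, bdry_red, offset):
    """Return pred_red = A_k * bdry_red + b_k as a flat list."""
    if len(matrix) != len(offset):
        raise ValueError("matrix rows and offset length must match")
    pred = []
    for row, b in zip(matrix, offset):
        if len(row) != len(bdry_red):
            raise ValueError("matrix columns and bdry_red length must match")
        # One output sample: weighted sum of the reduced boundary plus offset.
        pred.append(sum(a * x for a, x in zip(row, bdry_red)) + b)
    return pred


def to_block(flat, w_red, h_red):
    """Reshape a flat prediction into h_red rows of w_red samples."""
    if len(flat) != w_red * h_red:
        raise ValueError("flat length must equal w_red * h_red")
    return [flat[r * w_red:(r + 1) * w_red] for r in range(h_red)]


# Toy values only: a 2-sample boundary mapped to a 2x2 reduced prediction.
A_k = [[1, 0], [0, 1], [1, 1], [2, 0]]
b_k = [1, 1, 1, 1]
pred_red = to_block(mip_affine(A_k, [2, 3], b_k), 2, 2)
```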

The linear affine transformation generates only the downsampled prediction signal of the target image block; the remaining prediction values of the target image block can be obtained by applying any available linear interpolation to $pred_{red}$. Depending on the size of the target image block, horizontal interpolation, vertical interpolation, or interpolation in both directions may be required. If interpolation is required in both directions, then when W < H for the target image block, interpolation is performed first in the horizontal direction and then in the vertical direction; otherwise, vertical interpolation is performed first, followed by horizontal interpolation. Fig. 4 is a schematic diagram illustrating linear interpolation based on the downsampled prediction signal of an 8 × 8 target image block.
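The ordering rule and a one-dimensional interpolation pass can be sketched as below. The function names are hypothetical, and placing each known sample at the end of its group with the neighboring reference pixel as the starting value is an assumption about the sample layout, not something the text above specifies.

```python
def interpolation_order(width, height):
    """Return the order of the two passes when both directions are needed."""
    if width < height:
        return ("horizontal", "vertical")
    return ("vertical", "horizontal")


def upsample_1d(samples, factor, start_ref):
    """Linearly interpolate so each known sample lands at the end of its group."""
    out = []
    prev = start_ref
    for cur in samples:
        for step in range(1, factor + 1):
            # Blend the previous and current known values, rounded to nearest.
            out.append(((factor - step) * prev + step * cur + factor // 2) // factor)
        prev = cur
    return out


# Two known samples expanded by a factor of 2, starting from a reference of 0.
row = upsample_1d([4, 8], 2, 0)
```

Applying `upsample_1d` to every row and then to every column (or the reverse, per `interpolation_order`) gives a separable bilinear upsampling.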

One $A_k$ and one $b_k$ together constitute a MIP mode. According to the size of the image block, MIP intra prediction modes can be subdivided into the following groups:

Group 0: 4x4 image blocks, comprising 35 MIP intra prediction modes;

Group 1: 8x4, 4x8, and 8x8 image blocks, comprising 19 MIP intra prediction modes;

Group 2: all other image blocks, comprising 11 MIP intra prediction modes.
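The grouping above amounts to a small lookup, sketched here with hypothetical names (`mip_group`, `MIP_MODE_COUNT`); the mode counts are taken directly from the list.

```python
# Number of MIP intra prediction modes per size group, as listed above.
MIP_MODE_COUNT = {0: 35, 1: 19, 2: 11}


def mip_group(width, height):
    """Map a block size to its MIP size group (0, 1, or 2)."""
    if (width, height) == (4, 4):
        return 0
    if (width, height) in {(8, 4), (4, 8), (8, 8)}:
        return 1
    return 2


def mode_count(width, height):
    """Number of MIP modes available for a block of the given size."""
    return MIP_MODE_COUNT[mip_group(width, height)]
```

For instance, `mode_count(16, 16)` falls into Group 2, and the three groups together account for 35 + 19 + 11 = 65 modes.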

However, in practical applications, the exact $A_k$ and $b_k$ corresponding to the different MIP intra-frame prediction modes are not defined. An inaccurately set offset therefore leads to an inaccurate prediction result, which in turn degrades the encoding and decoding of the image frame. The offset may be fixed for each block size under each current intra prediction mode, or may be adaptively calculated according to the current prediction mode and the block size.

MIP intra prediction as described above takes only the W pixels above the image block and the H pixels to its left. However, nearby pixels are correlated, so taking a few more pixels allows better intra-frame prediction. Therefore, in the embodiment of the present disclosure, in order to improve the accuracy of the offset used in MIP intra-frame prediction, the offset may be obtained through pre-training, and the prediction result of the target image block is then obtained according to the reference pixel points, the preset matrix, and the pre-trained offset corresponding to the target image block. The offset training method may be preset as required, and the embodiment of the present disclosure is not limited in this respect.

For example, the intra-frame prediction offset may be obtained by training on the coding residuals of training image blocks, obtained through an existing intra-frame prediction mode, together with the training reference pixel points corresponding to each training image block. The training reference pixel points corresponding to the training image blocks may be set as required, and the embodiment of the present disclosure is not limited in this respect.

Optionally, in this embodiment of the present disclosure, the offset is obtained by training on training image blocks whose coding residuals are obtained through an existing intra-frame prediction mode, together with the W + n pixel points above each training image block, the H + m pixel points on its left side, and the j pixel points at its upper-left corner; W is the width of the training image block, H is the height of the training image block, n, m and j are positive integers, and the size of the training image block is the same as that of the target image block.

The specific values of n, m, and j may be preset as required, and the embodiment of the present disclosure is not limited in this respect. In addition, among the W + n pixel points above, the n pixel points may or may not be in the same row as the W pixel points; correspondingly, among the H + m pixel points on the left, the m pixel points may or may not be in the same column as the H pixel points. The arrangement may be preset as required, and the embodiment of the present disclosure is not limited in this respect. The coding residual of a training image block can be understood as the difference between the original matrix of the training image block and the prediction result of the training image block obtained by the existing intra-frame prediction mode.

For example, taking the upper W + n pixels: the W pixels may be the W pixels $P_W$ located above the target image block and adjacent to its top W internal pixels, and the other n pixels may be any n pixels located above $P_W$ and adjacent to $P_W$; alternatively, all W + n pixels may be located in the pixel row above and adjacent to the target image block, where the W pixels are the W pixels $P_W$ above and adjacent to the top W internal pixels of the target image block, and the n pixels lie on the extension of $P_W$.

Moreover, in the embodiment of the present disclosure, the relationship between the offset and the coding residual of each training image block, the W + n pixel points above, the H + m pixel points on the left side, and the j pixel points on the upper left corner may be preset according to the requirement, and the embodiment of the present disclosure is not limited.

For example, a preset machine learning model can be trained with the coding residuals of a plurality of training image blocks, the W + n pixel points above, the H + m pixel points on the left side, and the j pixel points at the upper-left corner, and the training result is then used as the offset. Furthermore, depending on the machine learning model chosen, the training result may be one or more values, and accordingly one or more offsets may be obtained. When multiple offsets are obtained for the same image block size, prediction can be performed with each offset in the subsequent application process, and the result with the best effect is taken as the final prediction result.
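Selecting among several candidate offsets might look like the sketch below. The text does not define "best effect"; the sum of squared errors against the original block is used here purely as an assumed encoder-side criterion, and `best_offset` is a hypothetical name. Blocks are flattened to lists for brevity.

```python
def best_offset(original, base_prediction, candidates):
    """Pick the candidate offset whose prediction is closest to the original.

    original and base_prediction are flat lists of equal length; each
    candidate is an offset of the same length. Closeness is measured by the
    sum of squared errors (an assumed criterion, not taken from the text).
    """
    def sse(offset):
        return sum(
            (o - (p + b)) ** 2
            for o, p, b in zip(original, base_prediction, offset)
        )

    return min(candidates, key=sse)


# Three candidate offsets for a two-sample toy block.
chosen = best_offset([10, 12], [8, 9], [[0, 0], [2, 3], [5, 5]])
```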

In the embodiment of the disclosure, a reference pixel point of each target image block to be predicted in a target image frame is obtained; acquiring a prediction result of the target image block according to the reference pixel point, a preset matrix and a pre-trained offset corresponding to the target image block; and acquiring a predicted image frame corresponding to the target image frame according to the prediction result of each target image block. Therefore, the offset accuracy of the MIP is improved, the intra-frame prediction accuracy is improved, the coding data quantity is reduced, and the coding effect is improved.

Moreover, the offset is obtained by training on training image blocks whose coding residuals are obtained through an existing intra-frame prediction mode, together with the W + n pixel points above each training image block, the H + m pixel points on its left side, and the j pixel points at its upper-left corner; W is the width of the training image block, H is the height of the training image block, n, m and j are positive integers, and the size of the training image block is the same as that of the target image block. This can further improve the accuracy of the intra prediction offset and of the intra prediction result.

Referring to fig. 5, in the embodiment of the present disclosure, before step S12, the method may further include:

Step S14: obtaining the coding residuals of a plurality of training image blocks through an existing intra-frame prediction mode.

Step S15: obtaining the offset according to the coding residuals, the W + n pixel points above the training image block, the H + m pixel points on the left side of the training image block, and the j pixel points at the upper-left corner of the training image block.

As described above, in the embodiment of the present disclosure, the offset at the time of intra prediction of the target image block may be acquired in advance. Further, as described above, tiles of different sizes correspond to different MIP intra prediction modes, and the different MIP intra prediction modes correspond to respective offsets.

Therefore, in the embodiment of the present disclosure, in order to obtain the offset used for intra prediction of a target image block, the selected training image blocks may have the same size as the target image block. Of course, the size of the training image blocks may also differ from that of the target image block, which is not limited in the embodiment of the present disclosure.

First, the coding residuals of a plurality of training image blocks need to be obtained through an existing intra prediction mode. The existing intra prediction mode may be any available conventional intra prediction mode and may be preset as required, which is not limited in this disclosure. Moreover, the intra prediction modes used by the individual training image blocks during training may be the same or different, and the embodiment of the present disclosure is not limited in this respect.
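As one concrete possibility, the residual of a training block could be computed against a simple DC prediction. DC prediction is chosen here only as a stand-in for "an existing intra prediction mode"; the text leaves the mode open, and the names `dc_predict` and `coding_residual` are hypothetical.

```python
def dc_predict(top, left, width, height):
    """Fill the block with the rounded mean of the top and left neighbors."""
    total = sum(top) + sum(left)
    count = len(top) + len(left)
    dc = (total + count // 2) // count
    return [[dc] * width for _ in range(height)]


def coding_residual(original, prediction):
    """Element-wise difference: original block minus predicted block."""
    return [
        [o - p for o, p in zip(orig_row, pred_row)]
        for orig_row, pred_row in zip(original, prediction)
    ]


# A 2x2 training block with two neighbors above and two to the left.
prediction = dc_predict([10, 10], [14, 14], 2, 2)
residual = coding_residual([[12, 13], [11, 12]], prediction)
```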

The W + n pixel points above the training image block can be understood as the W + n pixel points above the training image block in the image frame where the training image block is located, correspondingly, the H + m pixel points on the left side of the training image block can be understood as the H + m pixel points on the left side of the training image block in the image frame where the training image block is located, and the j pixel points on the upper left corner of the training image block can be understood as the j pixel points on the upper left corner of the training image block in the image frame where the training image block is located.
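A minimal sketch of collecting these reference pixels from a frame follows, assuming the optional layout described earlier in which all W + n pixels lie in the adjacent row above, all H + m pixels lie in the adjacent column to the left, and j = 1. The function name `gather_reference` and the row-major list-of-lists frame representation are illustrative assumptions.

```python
def gather_reference(frame, x0, y0, width, height, n, m):
    """Collect extended reference pixels for the block whose top-left is (x0, y0).

    frame is a list of rows. Returns (above, left, corner), where above has
    W + n pixels, left has H + m pixels, and corner holds the single
    upper-left pixel (j = 1).
    """
    if x0 < 1 or y0 < 1:
        raise ValueError("block needs a row above and a column to its left")
    if x0 + width + n > len(frame[0]) or y0 + height + m > len(frame):
        raise ValueError("extended reference runs past the frame edge")
    above = frame[y0 - 1][x0:x0 + width + n]
    left = [frame[y][x0 - 1] for y in range(y0, y0 + height + m)]
    corner = [frame[y0 - 1][x0 - 1]]
    return above, left, corner


# Toy 6x6 frame whose pixel value encodes its position as 10 * y + x.
frame = [[10 * y + x for x in range(6)] for y in range(6)]
above, left, corner = gather_reference(frame, 1, 1, 2, 2, n=2, m=2)
```

With n = W and m = H, as in the optional configuration, the reference row and column are each twice as long as the block edge they border.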

Optionally, in an embodiment of the present disclosure, the step S15 further may include:

Step A1: training a preset machine learning model according to the coding residual of the training image block, the W + n pixel points above the training image block, the H + m pixel points on the left side, the j pixel points at the upper-left corner, and the intra-frame prediction mode of the training image block, until the L2 norm between the output of the machine learning model and the coding residual meets a first preset threshold.

Step A2: using the output of the machine learning model as the offset corresponding to the intra prediction mode.

As described above, in practical applications there are 65 MIP intra prediction modes across the different image block sizes, and each MIP intra prediction mode corresponds to one matrix-offset pair. That is, the target image block may also correspond to a plurality of different MIP intra prediction modes.

Therefore, in the embodiment of the present disclosure, in order to improve the accuracy of the offset, offsets corresponding to different MIP intra prediction modes can be obtained through training in advance. Then the training data in training the machine learning model at this time may also include the intra-prediction mode of the training image block, i.e., the MIP intra-prediction mode.

For example, if we want to train the offsets of 35 MIP intra prediction modes corresponding to 4 × 4 image blocks, we can obtain multiple 4 × 4 training image blocks for each MIP intra prediction mode, train a machine learning model with W + n pixel points above each training image block, H + m pixel points on the left side, j pixel points on the upper left corner, and a corresponding MIP intra prediction mode, and thus obtain the offsets corresponding to the corresponding MIP intra prediction modes. Also, the MIP intra-prediction mode that is the training data of the machine learning model may be an identification of the corresponding MIP intra-prediction mode, for example, if the MIP intra-prediction mode that is currently used as the training data is the MIP intra-prediction mode type 5, then "5" may be used as the training data corresponding to the corresponding MIP intra-prediction mode.
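The per-mode bookkeeping can be sketched as below. To stay self-contained, the "model" is reduced to the per-mode mean of the residuals, which is the constant vector minimizing the summed squared L2 distance to those residuals; this is a deliberate simplification that ignores the reference pixels, not the machine learning model of the disclosure. The function name and the flat-list residual format are hypothetical.

```python
from collections import defaultdict


def train_offsets_per_mode(samples):
    """Group residuals by MIP mode identifier and fit one offset per mode.

    samples is an iterable of (mode_id, residual) pairs, where residual is a
    flat list. The fitted offset is the element-wise mean residual, used
    here as a stand-in for a trained model's output.
    """
    grouped = defaultdict(list)
    for mode_id, residual in samples:
        grouped[mode_id].append(residual)
    offsets = {}
    for mode_id, residuals in grouped.items():
        size = len(residuals[0])
        offsets[mode_id] = [
            sum(r[i] for r in residuals) / len(residuals) for i in range(size)
        ]
    return offsets


# Mode identifier 5 has two training residuals, mode identifier 7 has one.
offsets = train_offsets_per_mode([(5, [2, 4]), (5, [4, 8]), (7, [1, 1])])
```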

Moreover, in the embodiment of the present disclosure, the offsets of the MIP intra prediction modes corresponding to image blocks of other sizes than the target image block may also be trained in advance. Then, a plurality of training image blocks corresponding to each MIP intra-frame prediction mode can be respectively obtained, and further, training is respectively performed, so as to obtain an offset corresponding to each MIP intra-frame prediction mode. Moreover, a plurality of offsets corresponding to a certain MIP intra-frame prediction mode may exist in the obtained training result, and in the subsequent application process, prediction may be performed based on each offset, and an optimal value in the prediction result may be taken. Or, there may be no offset corresponding to a certain MIP intra prediction mode obtained through training, and then, the offset of another MIP intra prediction mode corresponding to the image block with the same size may be referred to for prediction.
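The fallback for a mode without a trained offset can be sketched as follows. The text only says that another mode of the same block size may be referred to; choosing the mode with the nearest identifier is an assumption made for illustration, as are the name `lookup_offset` and the `(group, mode)` dictionary keys.

```python
def lookup_offset(offsets, group, mode):
    """Return the offset for (group, mode), borrowing from the same group if absent.

    offsets maps (size_group, mode_id) to an offset. When the requested mode
    has no trained offset, the mode in the same size group with the nearest
    identifier is used instead (assumed tie-break: the smaller identifier).
    """
    if (group, mode) in offsets:
        return offsets[(group, mode)]
    same_group = [m for (g, m) in offsets if g == group]
    if not same_group:
        raise KeyError(f"no trained offsets for size group {group}")
    nearest = min(same_group, key=lambda m: (abs(m - mode), m))
    return offsets[(group, nearest)]


trained = {(0, 3): [1, 1], (0, 9): [2, 2], (1, 4): [7, 7]}
borrowed = lookup_offset(trained, 0, 8)
```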

In practical applications, the coding residual d is the difference between the original coding block X and the prediction result X' obtained by the existing intra prediction mode, that is, $X = X' + d$. The MIP prediction result is $pred = A_k \cdot bdry_{red} + b_k$, and the original MIP prediction mode can be simplified to the existing intra prediction mode plus an offset; that is, $A_k \cdot bdry_{red}$ above is taken to be the prediction result X' of the current intra prediction mode. In that case $X - pred = X' + d - (X' + b_k) = d - b_k$. In other words, the closer the offset $b_k$ is to the coding residual d, the closer the MIP prediction result is to the original coding block, the smaller the coding residual based on MIP prediction, and the smaller the amount of data that subsequently needs to be coded.

Therefore, in the embodiment of the present disclosure, a first preset threshold may be set for the distance, that is, the 2-norm (L2 norm), between the training result output by the machine learning model and the coding residual obtained based on an existing intra coding mode. When the 2-norm between the output of the machine learning model and the coding residual of the corresponding training image block satisfies the first preset threshold, the current output of the machine learning model may be used as the offset corresponding to the currently trained intra prediction mode. The first preset threshold may be preset according to requirements, and the embodiment of the present disclosure is not limited thereto.
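The stopping rule described above can be sketched as a training loop that ends once the 2-norm falls to the threshold. This is a minimal sketch under loud assumptions: the stand-in "model" is just a learnable offset vector that ignores the reference pixel points and the mode identifier, the target is the element-wise mean of the training residuals, and the names `train_offset`, `lr` and `max_steps` are hypothetical. The disclosure leaves the actual machine learning model open.

```python
import math


def l2_distance(a, b):
    """2-norm of the difference between two equally sized flat vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train_offset(residuals, threshold, lr=0.5, max_steps=1000):
    """Fit one offset vector to a list of flattened coding residuals.

    Assumption: the target is the element-wise mean residual, and the
    'model' is a bare offset vector updated by a gradient-style step.
    Training stops when the 2-norm to the target meets `threshold`
    (or when `max_steps` is exhausted).
    """
    n = len(residuals[0])
    target = [sum(r[i] for r in residuals) / len(residuals) for i in range(n)]
    offset = [0.0] * n
    for _ in range(max_steps):
        if l2_distance(offset, target) <= threshold:
            break
        # Move the current output a fraction `lr` of the way to the target.
        offset = [o + lr * (t - o) for o, t in zip(offset, target)]
    return offset


residuals = [[2, 1, 1, -1], [4, 1, 3, -1]]
offset = train_offset(residuals, threshold=0.1)
```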

B1, training a preset machine learning model with the coding residual of the training image block, the W + n pixel points above the training image block, the H + m pixel points on its left side and the j pixel points at its upper left corner, until the 2-norm between the output of the machine learning model and the coding residual meets a second preset threshold.

B2, using the output of the machine learning model as the offset corresponding to all intra prediction modes corresponding to the training image block.

In addition, in this embodiment of the present disclosure, the offset may be trained without distinguishing between different MIP intra prediction modes. In this case the training data may include the coding residual of a training image block, the W + n pixel points above the training image block, the H + m pixel points on its left side and the j pixel points at its upper left corner, and need not include the MIP intra prediction mode of the training image block. A preset machine learning model may then be trained with the training data of a plurality of training image blocks until the 2-norm between the output of the machine learning model and the coding residual of the corresponding training image block satisfies a second preset threshold, at which point the current output of the machine learning model may be used as the offset corresponding to all intra prediction modes corresponding to that training image block.

The second preset threshold may be the same as or different from the first preset threshold, and may be preset according to a requirement, which is not limited in this embodiment of the present disclosure.
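The difference between per-mode training and mode-agnostic training can be sketched by how the training samples are grouped. In this illustration the trained model is replaced by a plain element-wise mean of the residuals, which is an assumption made only to keep the sketch short; the function names are hypothetical.

```python
from collections import defaultdict


def mean_vector(vectors):
    """Element-wise mean of equally sized flat vectors (stand-in for training)."""
    n = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(n)]


def offsets_per_mode(samples):
    """Per-mode variant: `samples` is a list of (mode_id, residual) pairs and
    each mode gets an offset derived only from its own training blocks."""
    by_mode = defaultdict(list)
    for mode, residual in samples:
        by_mode[mode].append(residual)
    return {mode: mean_vector(rs) for mode, rs in by_mode.items()}


def shared_offset(samples, modes):
    """Mode-agnostic variant: the mode id is ignored during 'training' and
    the single resulting offset is assigned to every mode in `modes`."""
    offset = mean_vector([residual for _, residual in samples])
    return {mode: offset for mode in modes}


samples = [(5, [2, 2]), (5, [4, 0]), (7, [0, 0])]
per_mode = offsets_per_mode(samples)
shared = shared_offset(samples, modes=[5, 7])
```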

As described above, the available MIP intra prediction modes differ according to the size of the image block. However, if the intra prediction mode of each training image block is not input when the machine learning model is trained, and the sizes of the training image blocks all belong to the same group (Group), the training result of the machine learning model can be used as the offset of every MIP intra prediction mode included in the group corresponding to the training image blocks.

For example, in the embodiment of the present disclosure, the offsets of the 35 MIP intra prediction modes corresponding to Group 0 can be obtained by training on 4 × 4 training image blocks; the offsets of the 19 MIP intra prediction modes corresponding to Group 1 can be obtained by training on 8 × 4, and/or 4 × 8, and/or 8 × 8 training image blocks; and the offsets of the 11 MIP intra prediction modes corresponding to Group 2 can be obtained by training on training image blocks of other sizes.
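The size-to-group mapping in this example can be written as a small lookup. The group numbers, block sizes and mode counts are taken from the paragraph above; the names `mip_group` and `MODES_PER_GROUP` are hypothetical.

```python
# Number of MIP intra prediction modes per group, as listed above.
MODES_PER_GROUP = {0: 35, 1: 19, 2: 11}


def mip_group(width, height):
    """Return the group an image block of the given size belongs to."""
    if (width, height) == (4, 4):
        return 0
    if (width, height) in {(8, 4), (4, 8), (8, 8)}:
        return 1
    return 2  # all other sizes


group = mip_group(8, 4)
mode_count = MODES_PER_GROUP[group]
```

An offset trained without the mode input on blocks of one group would then be stored once and reused for all `mode_count` modes of that group.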

Of course, in the embodiment of the present disclosure, an offset obtained by a single training may also be used as the offset of every MIP intra prediction mode of all groups. The training image blocks in this case may include training image blocks of the different sizes in all three groups above, or only training image blocks of the different sizes in some of the groups. This may be preset according to requirements, and the embodiment of the present disclosure is not limited thereto.

The machine learning model can be set according to requirements, and the same or different machine learning models can be used when training the offsets corresponding to different sizes or different MIP intra prediction modes; the embodiment of the present disclosure is not limited in this respect.

Optionally, in this disclosure, the value of n is the same as W, the value of m is the same as H, and j is 1; the W + n pixel points above the training image block are located in the same pixel row, the H + m pixel points on the left side of the training image block are located in the same pixel column, and the pixel row and the pixel column are both adjacent to the training image block.

In the embodiment of the present disclosure, in order to balance the accuracy of the trained offset against training efficiency, the reference pixel points taken for each training image block may number an integer multiple of the block's width and an integer multiple of its height. Since the original MIP intra prediction takes as reference pixel points one block width of pixel points above the image block and one block height of pixel points on its left side, more reference pixel points may be taken when training the offset than in the original MIP intra prediction. In addition, since pixel points closer to the image block have greater reference value, it is preferable in the embodiment of the present disclosure that the value of n is the same as the width W of the corresponding training image block, the value of m is the same as the height H of the corresponding training image block, and j is 1. That is, the 2W pixel points above the training image block, the 2H pixel points on its left side and the 1 pixel point at its upper left corner may be taken as the reference pixel points in the training data, where the 2W pixel points above the training image block are located in the same pixel row, the 2H pixel points on the left side are located in the same pixel column, and that pixel row and pixel column are both adjacent to the training image block.

Fig. 6 is a schematic diagram of the reference pixel points of a training image block. The light grey region represents the training image block. The dark region of width W above the training image block, together with its extension of length W to the right, represents the 2W pixel points above the training image block; the dark region of height H on the left side of the training image block, together with its extension of height H below, represents the 2H pixel points on the left side of the training image block. In addition, there is 1 pixel point at the upper left corner of the training image block.

Therefore, when the W + n pixel points above the training image block are obtained, W of them are the W pixel points adjacent to the top row of pixel points in the training image block, and the other n pixel points can be the n pixel points located in the same row as those W pixel points and adjacent to them on the right; correspondingly, when the H + m pixel points on the left side of the training image block are obtained, H of them are the H pixel points adjacent to the leftmost column of pixel points in the training image block, and the other m pixel points can be the m pixel points located in the same column as those H pixel points and adjacent to them below.
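The layout of the reference pixel points described above can be sketched as index arithmetic on a frame. This is only an illustration: the frame is assumed to be a list of rows, `reference_pixels` is a hypothetical helper, j is fixed to 1, and blocks whose reference pixel points would fall outside the frame are simply rejected (how such cases are handled is not covered here).

```python
def reference_pixels(frame, x0, y0, W, H, n=None, m=None):
    """Collect reference pixel points for a W x H block.

    frame  -- 2D list indexed as frame[row][col]
    x0, y0 -- column and row of the block's top-left pixel point
    n, m   -- extra pixel points to the right and below; default n = W, m = H
    Returns (above, left, corner) with W + n, H + m and 1 entries.
    """
    n = W if n is None else n
    m = H if m is None else m
    if x0 < 1 or y0 < 1:
        raise ValueError("block must not touch the top or left frame border")
    if x0 + W + n > len(frame[0]) or y0 + H + m > len(frame):
        raise ValueError("reference pixel points fall outside the frame")
    # One row directly above the block, extended n pixel points to the right.
    above = frame[y0 - 1][x0:x0 + W + n]
    # One column directly left of the block, extended m pixel points downward.
    left = [frame[y][x0 - 1] for y in range(y0, y0 + H + m)]
    # The single pixel point at the upper left corner (j = 1).
    corner = frame[y0 - 1][x0 - 1]
    return above, left, corner


# 6 x 6 toy frame whose value encodes its position: row * 10 + col.
frame = [[r * 10 + c for c in range(6)] for r in range(6)]
above, left, corner = reference_pixels(frame, x0=1, y0=1, W=2, H=2)
```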

In the embodiment of the present disclosure, the coding residuals of a plurality of training image blocks may be obtained by an existing intra prediction mode, and the offset may be obtained according to the coding residual, the W + n pixel points above the training image block, the H + m pixel points on the left side of the training image block, and the j pixel points at the upper left corner of the training image block. Specifically, a preset machine learning model may be trained according to the coding residual of the training image block, the W + n pixel points above the training image block, the H + m pixel points on its left side, the j pixel points at its upper left corner and the intra prediction mode of the training image block, until the 2-norm between the output of the machine learning model and the coding residual meets a first preset threshold, and the output of the machine learning model is taken as the offset corresponding to that intra prediction mode. And/or, a preset machine learning model may be trained with the coding residual of the training image block, the W + n pixel points above the training image block, the H + m pixel points on its left side and the j pixel points at its upper left corner, until the 2-norm between the output of the machine learning model and the coding residual meets a second preset threshold, and the output of the machine learning model is taken as the offset corresponding to all intra prediction modes corresponding to the training image block. Therefore, the accuracy of the offset used in intra prediction can be further improved, and with it the accuracy of intra prediction.

In addition, in the embodiment of the present disclosure, the value of n is the same as W, the value of m is the same as H, and j is 1; the W + n pixel points above the training image block are located in the same pixel row, the H + m pixel points on the left side of the training image block are located in the same pixel column, and the pixel row and the pixel column are both adjacent to the training image block. In this way, the amount of data used for model training can be reduced while the accuracy of the offset is improved, thereby improving the efficiency of offset training.

Fig. 7 is a block diagram illustrating an image processing apparatus according to an exemplary embodiment. Referring to fig. 7, the apparatus includes a reference pixel point obtaining module 21, an intra prediction module 22, and a predicted image frame obtaining module 23.

The reference pixel point obtaining module 21 is configured to obtain, for each target image block to be predicted in the target image frame, the reference pixel points of the target image block.

The intra prediction module 22 is configured to obtain a prediction result of the target image block according to the reference pixel points, a preset matrix and a pre-trained offset corresponding to the target image block.

The predicted image frame obtaining module 23 is configured to obtain a predicted image frame corresponding to the target image frame according to the prediction result of each target image block.
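The division of work among the three modules can be sketched as plain functions. This is a simplified illustration, not the apparatus itself: the prediction is reduced to one matrix-vector product plus an offset on flattened data, any boundary reduction or upsampling steps of MIP are left out, and the names `predict_block`, `predict_frame` and `get_reference` are hypothetical.

```python
def predict_block(ref, matrix, offset):
    """Role of module 22: pred[i] = sum_j matrix[i][j] * ref[j] + offset[i]."""
    return [sum(a * r for a, r in zip(row, ref)) + b
            for row, b in zip(matrix, offset)]


def predict_frame(block_positions, get_reference, matrix, offset):
    """Role of modules 21 and 23: fetch reference pixel points for each block
    (via the caller-supplied `get_reference`) and collect the per-block
    predictions into one mapping that stands in for the predicted frame."""
    return {pos: predict_block(get_reference(pos), matrix, offset)
            for pos in block_positions}


# Toy data: two reference pixel points per block, two predicted samples.
matrix = [[0.5, 0.5], [1, 0]]
offset = [1, -1]
refs = {(0, 0): [10, 20], (0, 2): [30, 50]}
predicted = predict_frame(refs.keys(), refs.get, matrix, offset)
```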

The offset is obtained by training on the coding residual of a training image block, the coding residual being obtained through an existing intra prediction mode, together with the W + n pixel points above the training image block, the H + m pixel points on the left side of the training image block, and the j pixel points at the upper left corner of the training image block; W is the width of the training image block, H is the height of the training image block, n, m and j are positive integers, and the size of the training image block is the same as that of the target image block. Referring to fig. 8, the image processing apparatus may further include:

The coding residual acquiring module 24 is configured to acquire the coding residuals of a plurality of training image blocks by using the existing intra prediction mode.

The offset obtaining module 25 is configured to obtain the offset according to the coding residual, the W + n pixel points above the training image block, the H + m pixel points on the left side of the training image block, and the j pixel points at the upper left corner of the training image block.

Optionally, in this embodiment of the present disclosure, the offset obtaining module 25 may further include:

Optionally, in this disclosure, the value of n is the same as W, the value of m is the same as H, and j is 1; the W + n pixel points above the training image block are located in the same pixel row, the H + m pixel points on the left side of the training image block are located in the same pixel column, and the pixel row and the pixel column are both adjacent to the training image block.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Fig. 9 is a block diagram illustrating an apparatus 300 for image processing according to an example embodiment. For example, the apparatus 300 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.

Referring to fig. 9, the apparatus 300 may include one or more of the following components: a processing component 302, a memory 304, a power component 306, a multimedia component 308, an audio component 310, an input/output (I/O) interface 312, a sensor component 314, and a communication component 316.

The processing component 302 generally controls overall operation of the device 300, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 302 may include one or more processors 320 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 302 can include one or more modules that facilitate interaction between the processing component 302 and other components. For example, the processing component 302 may include a multimedia module to facilitate interaction between the multimedia component 308 and the processing component 302.

The memory 304 is configured to store various types of data to support operations at the device 300. Examples of such data include instructions for any application or method operating on device 300, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 304 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

The power supply component 306 provides power to the various components of the device 300. The power components 306 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 300.

The multimedia component 308 includes a screen that provides an output interface between the device 300 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 308 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 300 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.

The audio component 310 is configured to output and/or input audio signals. For example, audio component 310 includes a Microphone (MIC) configured to receive external audio signals when apparatus 300 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 304 or transmitted via the communication component 316. In some embodiments, audio component 310 also includes a speaker for outputting audio signals.

The I/O interface 312 provides an interface between the processing component 302 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor assembly 314 includes one or more sensors for providing various aspects of status assessment for the device 300. For example, sensor assembly 314 may detect an open/closed state of device 300, the relative positioning of components, such as a display and keypad of apparatus 300, the change in position of apparatus 300 or a component of apparatus 300, the presence or absence of user contact with apparatus 300, the orientation or acceleration/deceleration of apparatus 300, and the change in temperature of apparatus 300. Sensor assembly 314 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 314 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 314 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 316 is configured to facilitate wired or wireless communication between the apparatus 300 and other devices. The apparatus 300 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 316 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 316 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the apparatus 300 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.

In an exemplary embodiment, a storage medium comprising instructions, such as the memory 304 comprising instructions, executable by the processor 320 of the apparatus 300 to perform the method described above is also provided. Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

Fig. 10 is a block diagram illustrating an apparatus 400 for image processing according to an example embodiment. For example, the apparatus 400 may be provided as a server. Referring to fig. 10, apparatus 400 includes a processing component 422, which further includes one or more processors, and memory resources, represented by memory 432, for storing instructions, such as applications, that are executable by processing component 422. The application programs stored in memory 432 may include one or more modules that each correspond to a set of instructions. Further, the processing component 422 is configured to execute instructions to perform the image processing method described above.

The apparatus 400 may also include a power component 426 configured to perform power management of the apparatus 400, a wired or wireless network interface 450 configured to connect the apparatus 400 to a network, and an input/output (I/O) interface 458. The apparatus 400 may operate based on an operating system stored in the memory 432, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. An image processing method, comprising:

2. The method according to claim 1, wherein the offset is obtained by training on a coding residual of a training image block, the coding residual being obtained by an existing intra prediction mode, and on W + n pixel points above the training image block, H + m pixel points on the left side of the training image block, and j pixel points at the upper left corner of the training image block; W is the width of the training image block, H is the height of the training image block, n, m and j are positive integers, and the size of the training image block is the same as that of the target image block.

3. The method according to claim 2, wherein before the step of obtaining the prediction result of the target image block according to the reference pixel, a preset matrix, and a pre-trained offset corresponding to the target image block, the method further comprises:

4. The method of claim 3, wherein the step of obtaining the offset according to the encoded residual, and W + n pixels above the training image block, H + m pixels on the left side of the training image block, and j pixels at the top left corner of the training image block comprises:

5. The method of claim 3, wherein the step of obtaining the offset according to the encoded residual, and W + n pixels above the training image block, H + m pixels on the left side of the training image block, and j pixels at the top left corner of the training image block comprises:

6. The method according to any one of claims 2 to 5, wherein n has the same value as W, m has the same value as H, and j is 1; the W + n pixel points above the training image block are located in the same pixel row, the H + m pixel points on the left side of the training image block are located in the same pixel column, and the pixel row and the pixel column are both adjacent to the training image block.

7. An image processing apparatus characterized by comprising:

8. The apparatus of claim 7, wherein the offset is obtained by training on a coding residual of a training image block, the coding residual being obtained by an existing intra prediction mode, and on W + n pixel points above the training image block, H + m pixel points on the left side of the training image block, and j pixel points at the upper left corner of the training image block; W is the width of the training image block, H is the height of the training image block, n, m and j are positive integers, and the size of the training image block is the same as that of the target image block.

9. An image processing apparatus characterized by comprising:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the image processing method of any one of claims 1 to 6.

10. A storage medium in which instructions, when executed by a processor of an image processing apparatus, enable the image processing apparatus to perform the image processing method according to any one of claims 1 to 6.