CN118368443B - Transformation method applied to video and image processing - Google Patents
Transformation method applied to video and image processing
- Publication number
- CN118368443B (application CN202410776166.6A)
- Authority
- CN
- China
- Prior art keywords
- video
- transformation
- image
- data
- error
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The invention relates to a transformation method applied to video and image processing, comprising: constructing and loading a video and image adaptive error prediction model, wherein the model is trained based on historical video and image transformation data; preprocessing the input video and image data to prepare it for the basic DCT/IDCT transform steps; performing the basic DCT/IDCT transform steps while recording key intermediate variables at each stage of the transform; running the adaptive error prediction model to predict the error range of each step from the current transform characteristics and the historical data; and, after all transform steps are completed, performing post-processing: after the forward transform, an integer matrix and right-shift operations with fixed shift parameters are applied, and the output is adjusted through addition, multiplication and function operations.
Description
Technical Field
The invention belongs to the field of video processing, and particularly relates to a transformation method applied to video and image processing.
Background
Video and image processing technology plays a vital role in modern communication, entertainment, surveillance, medical diagnostics and many fields of scientific research. With the rapid development of the digital age, the demand for efficient, high-quality video and image encoding and decoding keeps increasing, requiring processing algorithms that maintain high compression efficiency without sacrificing too much visual quality. Among image and video compression technologies, the Discrete Cosine Transform (DCT) and its inverse (Inverse Discrete Cosine Transform, IDCT) are core components of a series of international standards such as JPEG and MPEG, owing to their excellent energy-compaction properties.
However, the conventional DCT/IDCT method faces several significant technical challenges in practical applications. First, the DCT is prone to accumulating errors during the transformation from the spatial domain to the frequency domain, especially in successive multi-level transform operations; these errors mainly result from quantization, limited-precision numerical operations (e.g., addition, subtraction, and shift operations), and approximations in the algorithmic implementation. The accumulated errors can significantly reduce the visual quality and compression efficiency of the reconstructed image, especially in application scenarios sensitive to image details and edges. Second, conventional DCT/IDCT algorithms often lack adaptivity: they cannot be flexibly adjusted to achieve optimal performance in the face of different types of video content (e.g., high dynamic range video, video under low-light conditions, or video containing fast-motion scenes) and different image quality requirements.
Furthermore, the handling of accumulated errors in the prior art mostly relies on post-processing techniques such as deblocking filtering, which are passive countermeasures against errors rather than active prevention and accurate correction. It is therefore particularly urgent to develop a novel DCT/IDCT method that can actively predict and compensate for errors during the transform while keeping the algorithmic complexity under control.
Disclosure of Invention
The present invention is directed to a transformation method applied to video and image processing, so as to solve the problems set forth in the background art.
In order to solve the technical problems, the invention provides the following technical scheme:
The transformation method applied to video and image processing comprises: constructing and loading a video and image adaptive error prediction model, wherein the model is trained based on historical video and image transformation data; preprocessing the input video and image data to prepare it for the basic DCT/IDCT transform steps; performing the basic DCT/IDCT transform steps while recording key intermediate variables at each stage of the transform;
running the adaptive error prediction model, and predicting the error range of each step according to the current transform characteristics and the historical data;
after all transform steps are completed, performing post-processing: after the forward transform, an integer matrix and right-shift operations with fixed shift parameters are applied, and the output is adjusted through addition, multiplication and function operations.
Further, the construction of the video and image adaptive error prediction model is specifically as follows: collecting a plurality of DCT/IDCT transformation data from historical video and image processing instances, the video and image data comprising intermediate results of the transformation, final output, and error measurements;
extracting features of video and image data, wherein the features of the video and image data comprise pixel differences before and after transformation;
normalizing the video and image data characteristics to eliminate dimension influence;
selecting a polynomial regression model to construct the video and image adaptive error prediction model, and estimating its parameters by the least squares method so that the model optimally fits the error distribution in the training data;
cross-validating the video and image adaptive error prediction model: dividing the video and image processing data into a training set and a verification set, adjusting the model according to the cross-validation results, and tuning the parameters of the optimization algorithm until satisfactory prediction performance is obtained; the optimized mathematical model is converted into algorithm code, so that it is convenient to load and call in the actual video and image processing flow.
Further, the video and image adaptive error prediction model is:

$$e = \beta_0 + \beta_1 x_{1,\mathrm{norm}} + \beta_2 x_{2,\mathrm{norm}} + \beta_3 x_{1,\mathrm{norm}}^{2} + \beta_4 x_{2,\mathrm{norm}}^{2}$$

the corresponding least-squares parameter estimate is:

$$\hat{\beta} = (P^{\top}P)^{-1}P^{\top}E$$

wherein P is the design matrix whose i-th row is $(1,\, x_{i1,\mathrm{norm}},\, x_{i2,\mathrm{norm}},\, x_{i1,\mathrm{norm}}^{2},\, x_{i2,\mathrm{norm}}^{2})$ and E is the error vector.
Further, the preprocessing flow is as follows:
adjusting the video frames and images to the target processing size; for the video in the video and image data, ensuring the frame rate of all frames is consistent and performing frame interpolation or frame decimation as necessary; applying a denoising algorithm to reduce image noise in the video and image data; adjusting the brightness and contrast of the image and performing gamma correction, while optimizing the data distribution so that the image is better suited to the DCT; dividing the video frames and images into a plurality of blocks of fixed-format pixels;
for a video sequence, performing inter-frame alignment to ensure motion consistency between consecutive frames and reduce motion-induced misprediction; converting the image data to the required precision format, such as integer or floating point, to match the requirements of the DCT/IDCT algorithm.
Further, predicting the error range of each step according to the current transform characteristics and the historical data is specifically: extracting transform-related features from the video frame or image block currently to be processed and normalizing the data; combining the historical transform data, selecting the historical cases closest to the current data based on feature similarity as the reference basis for error prediction, the similarity being determined by a distance metric; training a polynomial regression model with the integrated historical data and features, the model aiming to learn the error patterns in the historical data and to predict the error distribution likely to occur in the current transform step;
taking the normalized features of the current data as input and feeding them into the trained error prediction model;
error range prediction: according to the input features, the model outputs a possible error range or specific error value for each step of the current transform; the predicted error range is analyzed to determine the transform steps or intermediate variables with the most significant influence on the final output, error compensation is performed for these first, and a compensation strategy is dynamically formulated based on the predicted error, including but not limited to adjusting coefficients, directly adding compensation values at specific positions, or adjusting the calculation order.
Further, the basic DCT/IDCT transform steps comprise: performing a one-dimensional inverse transform on the video and image, selecting integer arithmetic, and applying multiplication coefficients to the video and image data to obtain one-dimensional inverse-transformed video and image output data;
performing post-processing of the inverse transform, applying a right-shift operation to the one-dimensional inverse transform result;
preprocessing the forward transform of the video and image, applying a left-shift operation to the input video and image data with fixed shift parameters;
and performing a one-dimensional forward transform on the video and image, selecting integer arithmetic and applying multiplication coefficients, to obtain one-dimensional forward-transformed output data, the forward transform being processed in the direction opposite to the one-dimensional inverse transform.
The beneficial effects are that: the transformation method applied to video and image processing provided by the application achieves effective control of accumulated errors in the DCT/IDCT transform process by introducing the adaptive error prediction model and the dynamic compensation mechanism, and markedly enhances the quality and efficiency of video and image processing; the specific technical effects are as follows:
Remarkably improves visual quality and compression efficiency: the error range of each transformation step is accurately predicted through the self-adaptive error prediction model, and error compensation is pertinently implemented, so that accumulated errors common in multi-stage transformation are effectively restrained, the visual quality of a reconstructed video and an image is remarkably improved, meanwhile, higher compression efficiency is maintained, and information loss is reduced.
Enhancing the adaptability and robustness of the algorithm: the method can dynamically adjust the compensation strategy according to the characteristics of the input video and the image, can flexibly adapt to the high dynamic range video, the low illumination environment or the fast motion scene, ensures that the optimal processing effect can be achieved under various conditions, and improves the generality and the robustness of the algorithm.
Optimizing computing resources and processing speed: although the error prediction and compensation steps are added, due to the adoption of efficient algorithms such as a polynomial regression model, unnecessary calculation is reduced through preprocessing and intelligent adjustment of intermediate variables, the overall calculation complexity is controlled, the high efficiency of the processing flow is maintained, and the method is beneficial to real-time video processing and application of a large-scale data set.
Intelligent feedback and continuous optimization: through an intelligent feedback mechanism, model parameters and a compensation strategy are continuously adjusted according to the deviation between actual output and an expected result, continuous self-optimization of algorithm performance is realized, stability and accuracy of long-term operation are ensured, and the requirement of manual intervention is reduced.
Promoting standardization and compatibility: the technical scheme optimizes on the basis of not changing the existing DCT/IDCT basic architecture, ensures compatibility with existing video coding and decoding standards (such as JPEG, MPEG and the like), facilitates seamless integration and application within existing technology frameworks, and accelerates the promotion and popularization of the technology.
Drawings
Fig. 1 is a flow chart of a transformation method applied to video and image processing.
Detailed Description
The technical solutions of the embodiments of the present invention will be clearly and completely described below in conjunction with the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The application discloses a transformation method applied to video and image processing, referring to fig. 1, comprising the steps of: S1, constructing and loading a video and image adaptive error prediction model, wherein the model is trained based on historical video and image transformation data, including, but not limited to, linear test and randomly generated test data sets; the construction of the video and image adaptive error prediction model is specifically as follows: collecting a plurality of DCT/IDCT transformation data from historical video and image processing instances, the video and image data comprising intermediate results of the transformation, final outputs, and error measurements;
extracting features of video and image data, wherein the features of the video and image data comprise pixel differences before and after transformation;
normalizing the video and image data characteristics to eliminate dimension influence;
selecting a polynomial regression model to construct the video and image adaptive error prediction model, and estimating its parameters by the least squares method so that the model optimally fits the error distribution in the training data;
cross-validating the video and image adaptive error prediction model: dividing the video and image processing data into a training set and a verification set, adjusting the model according to the cross-validation results, and tuning the parameters of the optimization algorithm until satisfactory prediction performance is obtained; the optimized mathematical model is converted into algorithm code, so that it is convenient to load and call in the actual video and image processing flow.
The general error e can be predicted from a polynomial combination of m features in the form:

$$e = \sum_{j=0}^{K-1} \beta_j\, \phi_j\!\left(x_{1,\mathrm{norm}}, \ldots, x_{m,\mathrm{norm}}\right), \qquad \phi_0 \equiv 1$$

where d is the highest order of the polynomial, the $\phi_j$ range over all polynomial terms of the m normalized features up to order d (individual powers and interaction terms), $K = \binom{m+d}{d}$ is the total number of model parameters (including the constant term), and $\beta_0, \ldots, \beta_{K-1}$ are the model parameters to be solved.

Model parameters are estimated using least squares, i.e. by minimizing the residual sum of squares:

$$\hat{\beta} = \arg\min_{\beta}\ \sum_{i=1}^{N}\Big(e_i - \sum_{j=0}^{K-1}\beta_j P_{ij}\Big)^{2}$$

where N is the number of samples and $P_{ij}$ is the value of the i-th sample at the j-th feature polynomial term (including the individual powers and interaction terms of $X_{i,\mathrm{norm}}$).
The data set is divided into a training set and a validation set; after the parameters are estimated on the training set, model performance is evaluated on the validation set, for example by the mean squared error (MSE) or the coefficient of determination. According to the validation results, the polynomial order d and the model parameters β are adjusted through grid search, random search or gradient descent until the model performance meets the preset standard.
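To make the fitting and tuning procedure concrete, here is a minimal numpy sketch that fits the polynomial model by least squares and grid-searches the order d by validation MSE. The function names, the synthetic data, and the candidate orders are illustrative assumptions, not details from the patent.

```python
import numpy as np

def design_matrix(X, d):
    # Columns: constant term, then each normalized feature raised to powers 1..d
    # (interaction terms omitted, matching the two-feature example below).
    cols = [np.ones(len(X))]
    for p in range(1, d + 1):
        for j in range(X.shape[1]):
            cols.append(X[:, j] ** p)
    return np.column_stack(cols)

def fit_least_squares(P, e):
    # Solves min_beta ||P beta - e||^2; lstsq is used for numerical stability.
    beta, *_ = np.linalg.lstsq(P, e, rcond=None)
    return beta

def select_order(X_tr, e_tr, X_val, e_val, orders=(1, 2, 3)):
    # Grid search over the polynomial order d, scored by validation MSE.
    best = None
    for d in orders:
        beta = fit_least_squares(design_matrix(X_tr, d), e_tr)
        mse = float(np.mean((e_val - design_matrix(X_val, d) @ beta) ** 2))
        if best is None or mse < best[0]:
            best = (mse, d, beta)
    return best  # (validation MSE, chosen order d, fitted parameters)

rng = np.random.default_rng(0)
X = rng.random((300, 2))                        # two normalized features
e = 0.2 * X[:, 0] + 0.05 * X[:, 1] ** 2 + 0.01 * rng.standard_normal(300)
mse, d, beta = select_order(X[:200], e[:200], X[200:], e[200:])
```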
Specific mathematical formula example
Two features $x_{1,\mathrm{norm}}$ and $x_{2,\mathrm{norm}}$ are considered in the application. The second-order polynomial model (d = 2), without interaction terms, is:

$$e = \beta_0 + \beta_1 x_{1,\mathrm{norm}} + \beta_2 x_{2,\mathrm{norm}} + \beta_3 x_{1,\mathrm{norm}}^{2} + \beta_4 x_{2,\mathrm{norm}}^{2}$$

The corresponding least-squares parameter estimate is:

$$\hat{\beta} = (P^{\top}P)^{-1}P^{\top}E$$

wherein P is the design matrix whose i-th row is $(1,\, x_{i1,\mathrm{norm}},\, x_{i2,\mathrm{norm}},\, x_{i1,\mathrm{norm}}^{2},\, x_{i2,\mathrm{norm}}^{2})$ and E is the error vector.
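A worked example of this two-feature, second-order case, again as an illustrative sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((200, 2))                         # normalized features (x1, x2)
e = 0.1 * X[:, 0] + 0.05 * X[:, 1] ** 2 + 0.01 * rng.standard_normal(200)

# Design matrix P: i-th row is (1, x_i1, x_i2, x_i1^2, x_i2^2)
P = np.column_stack([np.ones(len(X)), X[:, 0], X[:, 1],
                     X[:, 0] ** 2, X[:, 1] ** 2])

# Closed-form least squares: beta_hat = (P^T P)^(-1) P^T E
beta_hat = np.linalg.solve(P.T @ P, P.T @ e)
print(beta_hat)  # [beta0, beta1, beta2, beta3, beta4]
```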
S2, preprocessing the input video and image data and preparing it for the basic DCT/IDCT transform steps; the basic DCT/IDCT transform steps include: performing a one-dimensional inverse transform on the video and image, selecting integer arithmetic, and applying multiplication coefficients to the video and image data to obtain one-dimensional inverse-transformed video and image output data;
performing post-processing of the inverse transform, applying a right-shift operation to the one-dimensional inverse transform result;
preprocessing the forward transform of the video and image, applying a left-shift operation to the input video and image data with fixed shift parameters;
and performing a one-dimensional forward transform on the video and image, selecting integer arithmetic and applying multiplication coefficients, to obtain one-dimensional forward-transformed output data, the forward transform being processed in the direction opposite to the one-dimensional inverse transform.
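A minimal sketch of these integer transform steps: left-shift pre-scaling, integer multiplication by fixed coefficients, and rounded right-shift post-scaling with fixed shift parameters. The 4-point coefficient matrix is the well-known H.264/HEVC-style core transform, used here only as a stand-in, and the shift values are illustrative; the patent does not disclose concrete coefficients or shift parameters.

```python
import numpy as np

C = np.array([[64,  64,  64,  64],
              [83,  36, -36, -83],
              [64, -64, -64,  64],
              [36, -83,  83, -36]], dtype=np.int64)

PRE_SHIFT, FWD_SHIFT, INV_SHIFT = 4, 11, 7  # fixed shift parameters (assumed)

def rshift_round(v, s):
    # Right shift with a rounding offset added first
    return (v + (1 << (s - 1))) >> s

def forward_1d(x):
    x = np.asarray(x, dtype=np.int64) << PRE_SHIFT  # forward pre-processing: left shift
    return rshift_round(C @ x, FWD_SHIFT)           # integer multiply, then right shift

def inverse_1d(y):
    # The transpose runs the butterfly in the direction opposite to the forward pass
    return rshift_round(C.T @ np.asarray(y, dtype=np.int64), INV_SHIFT)

x = np.array([10, 20, 30, 40])
print(inverse_1d(forward_1d(x)))  # recovers x up to rounding
```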
The detailed preprocessing flow in S2:
adjusting the video frames and images to the target processing size, typically a multiple of 8, to ensure compatibility with the DCT/IDCT block size;
for the video in the video and image data, ensuring the frame rate of all frames is consistent and performing frame interpolation or frame decimation as necessary; applying a denoising algorithm (such as median filtering, bilateral filtering or non-local means denoising) to reduce image noise in the video and image data and improve the signal-to-noise ratio;
The brightness and the contrast of the image are adjusted, so that the overall visual effect of the image is more uniform, and the details of dark parts and bright parts are enhanced; gamma correction is carried out on the image, so that display consistency on different devices is ensured, and meanwhile, data distribution is optimized to be more suitable for DCT transformation; dividing the video frames and images into blocks of pixels of a fixed format (e.g., 8 x 8);
For a video sequence, inter-frame alignment is carried out, so that the motion consistency between continuous frames is ensured, and the misprediction caused by motion is reduced; the image data is converted to the required precision format, such as integer or floating point, to match the requirements of the DCT/IDCT algorithm.
Through the preprocessing step, the input video and image data are optimized to the state most suitable for DCT/IDCT transformation, and meanwhile, high-quality input is provided for the self-adaptive error prediction model, so that the high efficiency and accuracy of the whole transformation process are ensured.
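A minimal sketch of this preprocessing flow, assuming an 8x8 block size; the helper name, the 3x3 median filter, and the gamma value are illustrative choices, not specified by the patent.

```python
import numpy as np
from scipy.ndimage import median_filter

BLOCK = 8

def preprocess(frame, gamma=2.2):
    # Pad height/width up to a multiple of 8 so every block is complete
    h, w = frame.shape
    ph, pw = (-h) % BLOCK, (-w) % BLOCK
    frame = np.pad(frame, ((0, ph), (0, pw)), mode="edge")
    # Denoise with a 3x3 median filter
    frame = median_filter(frame, size=3)
    # Gamma correction on normalized intensities, then back to 8-bit range
    frame = (255.0 * (frame / 255.0) ** (1.0 / gamma)).astype(np.int16)
    # Split into 8x8 blocks ready for the DCT
    H, W = frame.shape
    return (frame.reshape(H // BLOCK, BLOCK, W // BLOCK, BLOCK)
                 .swapaxes(1, 2)
                 .reshape(-1, BLOCK, BLOCK))

blocks = preprocess(np.random.randint(0, 256, (37, 50)))
print(blocks.shape)  # (N, 8, 8)
```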
S3, performing DCT/IDCT transformation basic steps, and simultaneously recording key intermediate variables in each level of transformation;
In the present solution, the "key intermediate variables" refer to variables that are generated during the execution of the DCT/IDCT transform and have a direct influence on the final transform result. These variables are typically temporary stored data during execution of the transformation algorithm that reflect the critical state of the data change before and after the transformation. For 8x8 Discrete Cosine Transform (DCT) and Inverse Discrete Cosine Transform (IDCT), key intermediate variables include, but are not limited to, the following:
Key intermediate variables in the DCT transform
Coefficient matrix: when performing the DCT, a set of fixed orthogonal transform coefficient matrices is first used, which defines how the spatial domain is mapped to the frequency domain. While these are not intermediate variables in the traditional sense, they are the core of the transform computation, and the subsequent adaptive error compensation involves fine-tuning these coefficients.
Frequency-domain coefficients: the coefficient matrix generated after each 8x8 pixel block undergoes the DCT, in particular the direct-current component (DC coefficient) and the alternating-current components (AC coefficients); these are the key variables that directly reflect the transform result.
Quantization results: if a quantization operation is performed, the quantized coefficients are also important intermediate variables, since the quantization process introduces nonlinear errors that the subsequent error compensation mechanism must take into account.
Key intermediate variables in the IDCT transform
Inverse-quantization coefficients: if quantization followed the DCT, the coefficients or quantization table used in the inverse quantization process are key variables that determine how well the dequantized data is recovered.
Inverse-transform coefficients: the IDCT likewise uses a set of fixed inverse transform coefficient matrices, which are the basis for recovering the spatial-domain signal.
Reconstructed pixel matrix: the pixels produced by the IDCT but not yet post-processed (e.g., deblocked); these pixel values relate directly to image quality.
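As an illustration, a sketch of an 8x8 DCT that records the key intermediate variables listed above (basis matrix, DC/AC coefficients, quantization result); the flat quantization step of 16 is an assumption.

```python
import numpy as np

N = 8
k, n = np.mgrid[0:N, 0:N]
A = np.sqrt(2.0 / N) * np.cos((2 * n + 1) * k * np.pi / (2 * N))
A[0, :] = np.sqrt(1.0 / N)          # orthogonal DCT-II basis (coefficient) matrix

def dct2_with_log(block, qstep=16):
    log = {"basis": A}
    coeffs = A @ block @ A.T        # frequency-domain coefficients
    log["dc"] = coeffs[0, 0]        # DC coefficient
    log["ac"] = coeffs.copy()       # AC coefficients (DC zeroed out)
    log["ac"][0, 0] = 0.0
    log["quantized"] = np.round(coeffs / qstep)   # quantization result
    return coeffs, log

block = np.random.randint(0, 256, (8, 8)).astype(float) - 128.0
coeffs, log = dct2_with_log(block)
# Inverse path: dequantize with the same step, then reconstruct the pixels
recon = A.T @ (log["quantized"] * 16) @ A
```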
Intermediate variable adjustment involved in adaptive error compensation
Prediction error: the potential error distribution of the current transform step, as predicted by the adaptive error prediction model. These predicted error values are the basis for adjusting the intermediate variables.
Compensation value: the specific compensation value or coefficient-adjustment scheme computed for the predicted error. For example, for some key coefficients (e.g., the DC coefficient), a compensation value is added directly to correct the predicted error; for other coefficients, the multiplication coefficients used in the transform or inverse transform are adjusted.
Intermediate calculations: temporary variables in a particular transform stage or calculation step, adjusted according to the error compensation strategy to reduce accumulated errors.
The method comprises the following specific steps
Error prediction: for the current transform block, predict the error produced at each step using the error prediction model trained on historical data.
Intermediate variable identification: determine which intermediate variables (e.g., specific coefficients, quantization results) have the greatest impact on the final output and are most susceptible to error accumulation.
Compensation calculation: compute the compensation required for each key intermediate variable from the predicted error. This involves nontrivial calculation logic, such as adjusting specific elements of the DCT/IDCT coefficient matrix according to the predicted error distribution.
Compensation execution: after each stage of the transform, apply the computed compensation values by modifying the values of these intermediate variables directly or by adjusting the parameters of subsequent calculation steps.
Iterative optimization: through an intelligent feedback mechanism, continuously fine-tune the error prediction model and the compensation strategy according to the deviation between the actual output and the expected result, achieving continuous optimization.
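A minimal sketch of this predict-rank-compensate loop; `predict_errors` stands in for the trained polynomial regression model, and all names and the top-k heuristic are illustrative assumptions, not the patent's prescribed logic.

```python
import numpy as np

def compensate_block(coeffs, predict_errors, top_k=4):
    err = predict_errors(coeffs)                 # 1) predicted error per coefficient
    idx = np.argsort(np.abs(err).ravel())[::-1]  # 2) rank variables by impact
    comp = np.zeros_like(coeffs)
    for flat in idx[:top_k]:                     # 3) compensation for the top-k variables
        comp.ravel()[flat] = -err.ravel()[flat]
    return coeffs + comp, comp                   # 4) apply the compensation values

def feedback_update(model_params, actual, expected, lr=0.01):
    # 5) iterative optimization: nudge model parameters toward the observed deviation
    return model_params - lr * (actual - expected).mean()

coeffs = np.random.randn(8, 8)
corrected, comp = compensate_block(coeffs, lambda c: 0.01 * c)  # dummy predictor
```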
S4, running an adaptive error prediction model, and predicting an error range of each step according to the current transformation characteristics and historical data;
Features related to the transform are extracted from the current video frame or image block to be processed and the data is normalized. Combining the historical transform data, the historical cases closest to the current data are selected based on feature similarity and used as the reference basis for error prediction; similarity is determined with a distance metric (e.g., Euclidean distance or cosine similarity). A polynomial regression model is trained with the integrated historical data and features; the model aims to learn the error patterns in the historical data and to predict the error distribution likely to occur in the current transform step. Model parameters are tuned through methods such as cross-validation, grid search or gradient descent to optimize prediction performance.
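A small sketch of the similarity-based retrieval step, ranking historical cases by Euclidean distance or cosine similarity; array names are illustrative.

```python
import numpy as np

def nearest_cases(current, history, k=5, metric="euclidean"):
    # current: (m,) normalized feature vector; history: (N, m) matrix of cases
    if metric == "euclidean":
        d = np.linalg.norm(history - current, axis=1)
        order = np.argsort(d)                    # smallest distance = most similar
    else:                                        # cosine similarity
        num = history @ current
        den = np.linalg.norm(history, axis=1) * np.linalg.norm(current) + 1e-12
        order = np.argsort(num / den)[::-1]      # largest similarity first
    return order[:k]

hist = np.random.rand(100, 6)                    # 100 historical feature vectors
cur = np.random.rand(6)
print(nearest_cases(cur, hist, k=3))             # indices of the 3 closest cases
```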
Taking the normalized characteristics of the current data as input, and sending the normalized characteristics into a trained error prediction model;
Error range prediction: according to the input features, the model outputs a possible error range or specific error value for each step of the current transform; these predictions reflect the expected error distribution given the historical data and the current data characteristics.
The predicted error range is analyzed to determine which transform steps or intermediate variables affect the final output most significantly and therefore need error compensation first; a compensation strategy is dynamically formulated based on the predicted error, including but not limited to adjusting coefficients, directly adding compensation values at specific positions, or adjusting the calculation order.
That is, the intermediate variables are fine-tuned according to the prediction error, including adjusting coefficients or directly adding compensation values at specific positions.
Through this series of steps, the technical scheme can accurately and dynamically predict the error range from the specific conditions of the current transform and past experience, and take compensating measures, so that accumulated errors are effectively suppressed and the quality and efficiency of video and image processing are improved.
After all transform steps are completed, post-processing is performed: after the forward transform, an integer matrix and right-shift operations with fixed shift parameters are applied, and the output is adjusted through addition, multiplication and function operations.
Through the steps, the scheme not only solves the problem of precision and calculation error accumulation in the prior art, but also improves the adaptability and robustness of the algorithm through an intelligent self-adaptive mechanism, so that the method becomes a more reliable and efficient transformation method in the field of video and image processing.
In summary, the technical scheme of the application solves the core problems of the traditional DCT/IDCT method through a series of innovative improvements, improves the comprehensive performance of video and image processing through intelligent means, and provides powerful technical support for the development of the related field.
It should be noted that, in the present specification, the embodiments are described progressively; identical and similar parts of the embodiments refer to one another, and each embodiment mainly describes what differs from the others. In particular, for the apparatus and system embodiments, since they are substantially similar to the method embodiments, the description is relatively brief; refer to the description of the method embodiments for the relevant parts. The apparatus and system embodiments described above are merely illustrative: units described as separate may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement the invention without undue burden.
The above description is only one specific embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered by the present application.
Claims (4)
1. A transformation method applied to video and image processing, characterized by comprising: constructing and loading a video and image adaptive error prediction model, wherein the model is trained based on historical video and image transformation data; preprocessing the input video and image data to prepare it for the basic DCT/IDCT transform steps; performing the basic DCT/IDCT transform steps while recording key intermediate variables at each stage of the transform;
running the adaptive error prediction model, and predicting the error range of each step according to the current transform characteristics and the historical data;
after all transform steps are completed, performing post-processing: after the forward transform, an integer matrix and right-shift operations with fixed shift parameters are applied, and the output is adjusted through addition, multiplication and function operations; the construction of the video and image adaptive error prediction model is specifically as follows: collecting a plurality of DCT/IDCT transformation data from historical video and image processing instances, the video and image data comprising intermediate results of the transformation, final outputs, and error measurements;
extracting features of video and image data, wherein the features of the video and image data comprise pixel differences before and after transformation;
normalizing the video and image data characteristics to eliminate dimension influence;
selecting a polynomial regression model to construct the video and image adaptive error prediction model, and estimating its parameters by the least squares method so that the model optimally fits the error distribution in the training data;
cross-validating the video and image adaptive error prediction model: dividing the video and image processing data into a training set and a verification set, adjusting the model according to the cross-validation results, and tuning the parameters of the optimization algorithm until satisfactory prediction performance is obtained; the optimized mathematical model is converted into algorithm code, so that it is convenient to load and call in the actual video and image processing flow.
2. The transformation method applied to video and image processing according to claim 1, wherein the preprocessing flow:
adjusting the video frames and images to the target processing size; for the video in the video and image data, ensuring the frame rate of all frames is consistent and performing frame interpolation or frame decimation processing; applying a denoising algorithm to reduce image noise in the video and image data; adjusting the brightness and contrast of the image and performing gamma correction, while optimizing the data distribution so that the image is better suited to the DCT; dividing the video frames and images into a plurality of blocks of fixed-format pixels;
For a video sequence, inter-frame alignment is carried out, so that the motion consistency between continuous frames is ensured, and the misprediction caused by motion is reduced; the image data is converted to the required precision format, such as integer or floating point, to match the requirements of the DCT/IDCT algorithm.
3. The transformation method applied to video and image processing according to claim 1, wherein predicting the error range of each step according to the current transform characteristics and the historical data is specifically: extracting transform-related features from the video frame or image block currently to be processed and normalizing the data; combining the historical transform data, selecting the historical cases closest to the current data based on feature similarity as the reference basis for error prediction, the similarity being determined by a distance metric; training a polynomial regression model with the integrated historical data and features, the model aiming to learn the error patterns in the historical data and to predict the error distribution likely to occur in the current transform step;
taking the normalized features of the current data as input and feeding them into the trained error prediction model;
error range prediction: according to the input features, the model outputs a possible error range or specific error value for each step of the current transform; the predicted error range is analyzed to determine the transform steps or intermediate variables with the most significant influence on the final output, error compensation is performed for these first, and a compensation strategy is dynamically formulated based on the predicted error, comprising adjusting coefficients, directly adding compensation values at specific positions, or adjusting the calculation order.
4. The transformation method applied to video and image processing according to claim 1, wherein the basic DCT/IDCT transform steps comprise: performing a one-dimensional inverse transform on the video and image, selecting integer arithmetic, and applying multiplication coefficients to the video and image data to obtain one-dimensional inverse-transformed video and image output data;
performing post-processing of the inverse transform, applying a right-shift operation to the one-dimensional inverse transform result;
preprocessing the forward transform of the video and image, applying a left-shift operation to the input video and image data with fixed shift parameters;
and performing a one-dimensional forward transform on the video and image, selecting integer arithmetic and applying multiplication coefficients, to obtain one-dimensional forward-transformed output data, the forward transform being processed in the direction opposite to the one-dimensional inverse transform.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410776166.6A CN118368443B (en) | 2024-06-17 | 2024-06-17 | Transformation method applied to video and image processing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410776166.6A CN118368443B (en) | 2024-06-17 | 2024-06-17 | Transformation method applied to video and image processing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118368443A CN118368443A (en) | 2024-07-19 |
CN118368443B true CN118368443B (en) | 2024-08-30 |
Family
ID=91884839
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410776166.6A Active CN118368443B (en) | 2024-06-17 | 2024-06-17 | Transformation method applied to video and image processing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118368443B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111819852A (en) * | 2018-03-07 | 2020-10-23 | 华为技术有限公司 | Method and apparatus for residual symbol prediction in transform domain |
CN112911289A (en) * | 2021-05-10 | 2021-06-04 | 杭州雄迈集成电路技术股份有限公司 | DCT/IDCT transformation optimization method and system |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101106714A (en) * | 2007-07-29 | 2008-01-16 | 浙江大学 | Conversion method for video and image processing |
US8885701B2 (en) * | 2010-09-08 | 2014-11-11 | Samsung Electronics Co., Ltd. | Low complexity transform coding using adaptive DCT/DST for intra-prediction |
CN104320668B (en) * | 2014-10-31 | 2017-08-01 | 上海交通大学 | HEVC/H.265 dct transform and the SIMD optimization methods of inverse transformation |
WO2017023829A1 (en) * | 2015-07-31 | 2017-02-09 | Stc.Unm | System and methods for joint and adaptive control of rate, quality, and computational complexity for video coding and video delivery |
JP6470474B2 (en) * | 2015-12-09 | 2019-02-13 | ドルビー ラボラトリーズ ライセンシング コーポレイション | Low-computation lookup table construction with reduced interpolation error |
KR102257016B1 (en) * | 2018-11-27 | 2021-05-26 | 에스케이텔레콤 주식회사 | No-reference quality assessment method and apparatus |
- 2024-06-17: CN application CN202410776166.6A, patent CN118368443B (en), status: Active
Also Published As
Publication number | Publication date |
---|---|
CN118368443A (en) | 2024-07-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |