CN116095333A - Image compression method, device, equipment and storage medium - Google Patents

Image compression method, device, equipment and storage medium

Info

Publication number
CN116095333A
CN116095333A
Authority
CN
China
Prior art keywords
coefficients
coefficient
row
column
frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310068109.8A
Other languages
Chinese (zh)
Inventor
郭莉娜
王岩
王园园
秦红伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Sensetime Intelligent Technology Co Ltd
Original Assignee
Shanghai Sensetime Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Sensetime Intelligent Technology Co Ltd filed Critical Shanghai Sensetime Intelligent Technology Co Ltd
Priority to CN202310068109.8A
Publication of CN116095333A
Legal status: Pending


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/186 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
    • H04N19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • H04N19/625 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using discrete cosine transform [DCT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Discrete Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Embodiments of the present application disclose an image compression method, apparatus, device, and storage medium. The method includes: acquiring DCT coefficients of at least one color component corresponding to each frame of image in a video sequence to be compressed; rearranging and grouping the DCT coefficients of each color component by frequency order to obtain a high-frequency coefficient matrix and a low-frequency coefficient matrix; performing context prediction and coding on the high-frequency coefficient matrix and the low-frequency coefficient matrix of each color component, respectively, to obtain first coding information corresponding to the high-frequency coefficient matrix and second coding information corresponding to the low-frequency coefficient matrix; and determining target compressed data corresponding to each frame of image based on the first coding information and the second coding information of the at least one color component.

Description

Image compression method, device, equipment and storage medium
Technical Field
The present disclosure relates to, but is not limited to, the field of image processing technologies, and in particular to an image compression method, apparatus, device, and storage medium.
Background
JPEG (Joint Photographic Experts Group) is the first international image compression standard, and JPEG images are widely stored in data centers, cloud storage, network file systems, and other storage systems. However, owing to the performance limitations of the JPEG algorithm, most of these images are under-compressed and still carry a large amount of redundant information.
To save storage and bandwidth resources, some related technologies introduce lossless compression on top of JPEG, shrinking the JPEG file while guaranteeing that the original file remains lossless. These techniques typically design predictors, context models, and the like through feature engineering to improve the compression ratio of lossless compression.
Disclosure of Invention
In view of this, embodiments of the present application at least provide an image compression method, apparatus, device, and storage medium. The technical scheme of the embodiment of the application is realized as follows:
in one aspect, an embodiment of the present application provides an image compression method, including:
acquiring DCT coefficients of at least one color component corresponding to each frame of image in a video sequence to be compressed; rearranging and grouping the DCT coefficients of each color component by frequency order to obtain a high-frequency coefficient matrix and a low-frequency coefficient matrix; performing context prediction and coding on the high-frequency coefficient matrix and the low-frequency coefficient matrix of each color component, respectively, to obtain first coding information corresponding to the high-frequency coefficient matrix and second coding information corresponding to the low-frequency coefficient matrix; and determining target compressed data corresponding to each frame of image based on the first coding information and the second coding information of the at least one color component.
In some embodiments, rearranging and grouping the DCT coefficients of each color component by frequency order to obtain the high-frequency coefficient matrix and the low-frequency coefficient matrix includes: for the DCT coefficients of each color component, extracting coefficients of the same frequency to form the spatial dimensions and stacking coefficients of different frequencies along the channel dimension, to obtain multichannel DCT coefficients; and splitting the multichannel DCT coefficients along the channel dimension by frequency order to obtain the high-frequency coefficient matrix and the low-frequency coefficient matrix.
In the above embodiment, the rearrangement and grouping of the original DCT coefficients of each color component is realized through spatial-dimension extraction and channel-dimension splitting, so that the correlation of the DCT coefficients across space and channels can subsequently be fully exploited.
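The extract-and-split step above can be sketched in a few lines of NumPy. The raster frequency ordering within a block and the cut-off `num_low` between the low- and high-frequency groups are illustrative assumptions here, not choices fixed by the patent:

```python
import numpy as np

def rearrange_dct(dct_blocks, num_low=16):
    """Rearrange blocked DCT coefficients into a multichannel tensor.

    dct_blocks: array of shape (H, W) whose 8x8 tiles are DCT blocks
    (H and W are multiples of 8). Coefficients at the same frequency
    across all blocks form one spatial map; the 64 frequencies form
    the channel dimension. The first num_low channels (raster order,
    an assumed ordering) become the low-frequency group, the rest the
    high-frequency group.
    """
    H, W = dct_blocks.shape
    h, w = H // 8, W // 8
    # (h, 8, w, 8) -> (u, v, block_row, block_col) -> (64, h, w)
    multi = (dct_blocks.reshape(h, 8, w, 8)
                       .transpose(1, 3, 0, 2)
                       .reshape(64, h, w))
    return multi[:num_low], multi[num_low:]

dct = np.arange(16 * 16, dtype=np.float32).reshape(16, 16)  # four 8x8 blocks
low, high = rearrange_dct(dct, num_low=16)
print(low.shape, high.shape)  # (16, 2, 2) (48, 2, 2)
```

Each channel of `multi` is a small image made of one frequency taken from every block, which is exactly the spatial-dimension extraction the embodiment describes.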
In some embodiments, coefficients of the same frequency in the DCT coefficients form a spatial dimension, coefficients of different frequencies can form a channel dimension, and the performing context prediction and encoding on the high frequency coefficient matrix and the low frequency coefficient matrix of each color component respectively to obtain first encoded information corresponding to the high frequency coefficient matrix and second encoded information corresponding to the low frequency coefficient matrix includes: acquiring a priori probability distribution parameter estimated for the DCT coefficient of each color component; performing context prediction and coding on the high-frequency coefficient matrix in the space dimension by using the prior probability distribution parameters to obtain first coding information corresponding to the high-frequency coefficient matrix; and carrying out context prediction on the low-frequency coefficient matrix in the space dimension and the channel dimension by using the prior probability distribution parameters to obtain second coding information corresponding to the low-frequency coefficient matrix.
In this embodiment, considering that the high-frequency information content is small, the high-frequency coefficient matrix is partitioned only in the spatial dimension before autoregressive context modeling, which preserves the parallelism of the model and improves speed; meanwhile, since the low-frequency information is rich, the low-frequency coefficient matrix is partitioned in both the spatial and channel dimensions before autoregressive modeling, so that the correlation of the DCT coefficients across space and channels can be fully exploited and redundant information eliminated.
In some embodiments, performing context prediction and encoding on the high-frequency coefficient matrix in the spatial dimension by using the prior probability distribution parameters to obtain the first encoded information corresponding to the high-frequency coefficient matrix includes: splitting the high-frequency coefficient matrix in the spatial dimension to obtain M row coefficients; and predicting and encoding the M row coefficients with the prior probability distribution parameters in a multistage spatial autoregressive manner to obtain the first encoded information corresponding to the high-frequency coefficient matrix, wherein the prediction of the M-th row coefficient depends on the preceding M-1 row coefficients.
In the above embodiment, the spatial dimension of the high-frequency coefficient matrix is first disassembled, then the result of the last prediction is used as the context of the next prediction process, and probability distribution prediction and coding of each row of coefficients in the high-frequency coefficient matrix are sequentially performed in an autoregressive prediction mode, so that effective compression of the high-frequency coefficient matrix is realized, and the first coding information of the high-frequency coefficient matrix is obtained.
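The multistage spatial autoregression described above can be sketched as follows. `predict` is a hypothetical stand-in for the learned context model (the patent does not specify its architecture), and the array shapes are illustrative only:

```python
import numpy as np

def spatial_autoregressive_encode(rows, prior, predict):
    """Multistage spatial autoregression over the M row coefficients
    split from the high-frequency matrix (a sketch).

    Row 1 is predicted from the prior probability distribution
    parameters alone; row i+1 is predicted from the i-th fusion
    parameter, i.e. the channel concatenation of rows 1..i with the
    prior. Each row i would then be entropy-coded under preds[i-1].
    """
    preds = [predict(prior)]  # 1st predicted value
    for i in range(1, len(rows)):
        fusion = np.concatenate(rows[:i] + [prior], axis=0)  # i-th fusion parameter
        preds.append(predict(fusion))  # (i+1)-th predicted value
    return preds

rows = [np.full((1, 2, 2), r, dtype=np.float32) for r in range(3)]
prior = np.zeros((1, 2, 2), dtype=np.float32)
preds = spatial_autoregressive_encode(rows, prior, predict=lambda c: float(c.mean()))
print(preds)  # [0.0, 0.0, 0.333...]
```

The loop makes the dependency explicit: the distribution for row i+1 is a function of all earlier rows, which is what "the prediction of the M-th row coefficient depends on the preceding M-1 row coefficients" states.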
In some embodiments, predicting and encoding the M row coefficients with the prior probability distribution parameters in a multistage spatial autoregressive manner to obtain the first encoded information corresponding to the high-frequency coefficient matrix includes: predicting the 1st row coefficient of the M row coefficients based on the prior probability distribution parameters to obtain a 1st predicted value; channel-concatenating the first i row coefficients of the M row coefficients with the prior probability distribution parameters to obtain an i-th fusion parameter, where i is an integer from 1 to M-1; predicting the (i+1)-th row coefficient based on the i-th fusion parameter to obtain an (i+1)-th predicted value; and encoding each i-th row coefficient based on the i-th predicted value to obtain the first encoded information corresponding to the high-frequency coefficient matrix.
In the above embodiment, the line coefficient before the spatial position of the current line coefficient is used as the context of the current line coefficient, and the current line coefficient is predicted by using the prior probability distribution parameters output by other modules, so as to complete the context modeling of the prediction process, and then each line coefficient is encoded to obtain the first encoding information corresponding to the high-frequency coefficient matrix. Therefore, the high-frequency coefficient matrix is disassembled to divide the context and conduct autoregressive prediction, the parallelism of the model can be ensured, and the speed is improved.
In some embodiments, performing context prediction and encoding on the low-frequency coefficient matrix in the spatial and channel dimensions by using the prior probability distribution parameters to obtain the second encoded information corresponding to the low-frequency coefficient matrix includes: splitting the low-frequency coefficient matrix in the spatial dimension to obtain N row coefficients; predicting and encoding the N row coefficients based on the prior probability distribution parameters to obtain encoded information corresponding to the N row coefficients; splitting each of the N row coefficients in the channel dimension to obtain K column coefficients for each row coefficient; predicting and encoding the K column coefficients of each row coefficient in a multistage channel autoregressive manner to obtain encoded information corresponding to the K column coefficients; and determining the second encoded information corresponding to the low-frequency coefficient matrix based on the encoded information corresponding to the N row coefficients and the encoded information corresponding to the K column coefficients of each row coefficient.
In the above embodiment, considering that the low-frequency coefficients carry rich information, the low-frequency coefficient matrix is partitioned in both the spatial and channel dimensions, and autoregressive modeling is then performed on the K column coefficients split from each row coefficient, so that the correlation of DCT coefficients across space and channels is fully exploited and redundant information in the low-frequency coefficient matrix is eliminated.
In some embodiments, predicting and encoding the N row coefficients based on the prior probability distribution parameters to obtain the encoded information corresponding to the N row coefficients includes: predicting the 1st row coefficient of the N row coefficients based on the prior probability distribution parameters to obtain an initial predicted value corresponding to the 1st row coefficient; concatenating the 1st row coefficient of the N row coefficients with the prior probability distribution parameters to obtain a shared distribution parameter; predicting each of the remaining row coefficients based on the shared distribution parameter to obtain initial predicted values corresponding to those row coefficients; and encoding each of the N row coefficients based on its initial predicted value to obtain the encoded information corresponding to the N row coefficients.
In the above embodiment, the 1st row coefficient of the N row coefficients is predicted directly from the prior probability distribution parameters, while the remaining row coefficients are predicted from the combination of the 1st row coefficient and the prior probability distribution parameters; each of the N row coefficients is then encoded with its initial predicted value, which preserves the parallelism of the model and improves compression efficiency.
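The key design choice here, in contrast to the row-by-row chain used for the high-frequency matrix, is that one shared distribution parameter serves rows 2..N, so those predictions need not wait on one another. A minimal sketch, with `predict` again a hypothetical stand-in for the learned model:

```python
import numpy as np

def predict_rows(rows, prior, predict):
    """Parallel prediction of the N low-frequency row coefficients
    (a sketch). Row 1 is predicted from the prior alone; a shared
    distribution parameter is formed by concatenating row 1 with the
    prior, and every remaining row is predicted from that shared
    parameter independently, so rows 2..N can run in parallel.
    """
    initial = [predict(prior)]                         # row 1
    shared = np.concatenate([rows[0], prior], axis=0)  # shared distribution parameter
    initial += [predict(shared) for _ in rows[1:]]     # rows 2..N, no chaining
    return initial

rows = [np.full((1, 2, 2), r, dtype=np.float32) for r in range(4)]
prior = np.ones((1, 2, 2), dtype=np.float32)
initial = predict_rows(rows, prior, predict=lambda c: float(c.mean()))
print(initial)  # [1.0, 0.5, 0.5, 0.5]
```

Because the list comprehension over `rows[1:]` has no inter-row dependency, it maps directly onto a batched forward pass in a real model.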
In some embodiments, the N row coefficients are, in order of spatial location, {C1, C2, C3, …, CN}, and predicting and encoding the K column coefficients of each row coefficient in a multistage channel autoregressive manner to obtain the encoded information corresponding to the K column coefficients includes: predicting, based on the initial predicted value corresponding to the row coefficient C1, the 1st of the K column coefficients split from C1 to obtain a corresponding 1st optimized predicted value; channel-concatenating the first j of the K column coefficients split from C1 with the initial predicted value corresponding to C1 to obtain a j-th channel parameter, where j is an integer from 1 to K-1; predicting the (j+1)-th of the K column coefficients split from C1 based on the j-th channel parameter to obtain a (j+1)-th optimized predicted value; and encoding each j-th of the K column coefficients split from C1 based on the j-th optimized predicted value to obtain the encoded information of the K column coefficients split from C1.
In the above embodiment, for the K column coefficients split from the 1st row coefficient C1 of the low-frequency coefficient matrix, the 1st column coefficient is first predicted using the initial predicted value of C1, and each previous prediction result then serves as the context for the next, so that the remaining column coefficients are predicted in turn; the encoded information of the K column coefficients split from C1 is thus obtained by multistage channel autoregressive prediction and encoding.
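Structurally this mirrors the spatial autoregression, but the chain now runs over the K channel splits of a single row. A sketch under the same assumptions (`predict` is a hypothetical stand-in for the learned model, shapes illustrative):

```python
import numpy as np

def channel_autoregressive(cols, init_pred, predict):
    """Multistage channel autoregression over the K column
    coefficients split from row C1 (a sketch). Column 1 is predicted
    from C1's initial predicted value alone; column j+1 is predicted
    from the j-th channel parameter, i.e. the channel concatenation
    of columns 1..j with that initial predicted value.
    """
    opt = [predict(init_pred)]  # 1st optimized predicted value
    for j in range(1, len(cols)):
        chan = np.concatenate(cols[:j] + [init_pred], axis=0)  # j-th channel parameter
        opt.append(predict(chan))  # (j+1)-th optimized predicted value
    # column j would then be entropy-coded under opt[j-1]
    return opt

cols = [np.full((1, 2), c, dtype=np.float32) for c in range(3)]
init_pred = np.zeros((1, 2), dtype=np.float32)
opt = channel_autoregressive(cols, init_pred, predict=lambda c: float(c.mean()))
print(len(opt))  # 3
```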
In some embodiments, the N row coefficients are, in order of spatial location, {C1, C2, C3, …, CN}, and predicting and encoding the K column coefficients of each row coefficient in a multistage channel autoregressive manner to obtain the encoded information corresponding to the K column coefficients includes: predicting, based on the initial predicted value corresponding to each of the row coefficients {C2, C3, …, CN}, the 1st column coefficient split from the same row coefficient to obtain a 1st optimized predicted value corresponding to that 1st column coefficient; determining, based on the initial predicted value corresponding to each of the row coefficients {C2, C3, …, CN} and the first t column coefficients of the same row coefficient, the context information of the (t+1)-th column coefficient of that row coefficient, where t is an integer from 1 to K-1; predicting the (t+1)-th column coefficient of the same row coefficient based on the context information to obtain a (t+1)-th optimized predicted value corresponding to the (t+1)-th column coefficient; and encoding each t-th column coefficient of the same row coefficient based on the corresponding t-th optimized predicted value to obtain the encoded information of the K column coefficients split from each of the row coefficients {C2, C3, …, CN}.
In the above embodiment, for the K column coefficients split from each of the row coefficients {C2, C3, …, CN} of the low-frequency coefficient matrix, the 1st column coefficient is first predicted using the initial predicted value of that row coefficient; the context information of the (t+1)-th column coefficient is then determined and the remaining column coefficients are predicted in turn, so that the encoded information of the K column coefficients split from each of {C2, C3, …, CN} is obtained by multistage channel autoregressive prediction and encoding.
In some embodiments, determining, based on the initial predicted value corresponding to each of the row coefficients {C2, C3, …, CN} and the first t column coefficients of the same row coefficient, the context information of the (t+1)-th column coefficient of each row coefficient includes: channel-splicing the t-th column coefficients of the row coefficients {C2, C3, …, CN} to obtain a combined coefficient corresponding to the t-th column; and determining the context information of the (t+1)-th column coefficient of each row coefficient based on the initial predicted value corresponding to that row coefficient and the combined coefficients corresponding to the first t columns.
In the above embodiment, the same-column coefficients of the row coefficients {C2, C3, …, CN} are first channel-spliced to obtain the combined coefficient of the corresponding column; by combining the initial predicted value of each row coefficient with the combined coefficients of the first t columns, the context information of the (t+1)-th column coefficient of the same row coefficient is determined, so that the (t+1)-th column coefficient can subsequently be predicted accurately from this context information.
In another aspect, an embodiment of the present application provides an image compression apparatus, including:
the acquisition module is used for acquiring DCT coefficients of at least one color component corresponding to each frame of image in the video sequence to be compressed;
the grouping module is used for rearranging and grouping the DCT coefficients of each color component by frequency order to obtain a high-frequency coefficient matrix and a low-frequency coefficient matrix;
the prediction module is used for respectively carrying out context prediction and encoding on the high-frequency coefficient matrix and the low-frequency coefficient matrix of each color component to obtain first encoding information corresponding to the high-frequency coefficient matrix and second encoding information corresponding to the low-frequency coefficient matrix;
and the determining module is used for determining target compressed data corresponding to each frame of image based on the first coding information and the second coding information of the at least one color component.
In yet another aspect, embodiments of the present application provide a computer device including a memory and a processor, the memory storing a computer program executable on the processor, the processor implementing some or all of the steps of the above method when the program is executed.
In yet another aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon a computer program that, when executed by a processor, performs some or all of the steps of the above-described method.
In the embodiments of the present application, DCT coefficients of at least one color component corresponding to each frame of image in a video sequence to be compressed are first acquired; the DCT coefficients of each color component are then rearranged and grouped by frequency order to obtain a high-frequency coefficient matrix and a low-frequency coefficient matrix; context prediction and coding are performed on the high-frequency coefficient matrix and the low-frequency coefficient matrix of each color component, respectively, to obtain first coding information corresponding to the high-frequency coefficient matrix and second coding information corresponding to the low-frequency coefficient matrix; finally, target compressed data corresponding to each frame of image is determined based on the first coding information and the second coding information of the at least one color component. In this way, by repartitioning the DCT coefficients of at least one color component in each frame of image and predicting the groups separately, the computation of the model is allocated reasonably and effectively; meanwhile, the correlation among DCT coefficients of different frequencies is fully exploited, redundant information can be eliminated, and the compression ratio is markedly improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the aspects of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and, together with the description, serve to explain the technical aspects of the application.
Fig. 1 is an optional flowchart of an image compression method according to an embodiment of the present application;
fig. 2 is an optional flowchart of an image compression method according to an embodiment of the present application;
fig. 3 is an optional flowchart of an image compression method according to an embodiment of the present application;
FIG. 4 is a logic flow diagram for constructing a context model for lossless compression of JPEG images provided by an embodiment of the present application;
FIG. 5A is a schematic diagram illustrating a process for reordering and grouping DCT coefficients according to one embodiment of the present application;
fig. 5B is a schematic diagram of a block process of the high-frequency coefficient matrix in a spatial dimension according to an embodiment of the present application;
FIG. 5C is a flowchart illustrating a prediction method of a high-frequency coefficient matrix according to an embodiment of the present application;
fig. 5D is a schematic diagram of a block process of the low-frequency coefficient matrix in a spatial dimension according to an embodiment of the present application;
FIG. 5E is a flowchart illustrating a prediction method of a low-frequency coefficient matrix according to an embodiment of the present application;
FIG. 5F is a flow chart of predicting the 1 st row coefficient disassembled by the low frequency coefficient matrix in the channel dimension according to the embodiment of the present application;
FIG. 5G is a flowchart of predicting the channel dimension of the remaining line coefficients disassembled by the low frequency coefficient matrix according to the embodiment of the present application;
fig. 6 is a schematic diagram of a composition structure of an image compression apparatus according to an embodiment of the present application;
fig. 7 is a schematic hardware entity diagram of a computer device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application are further elaborated below in conjunction with the accompanying drawings and examples, which should not be construed as limiting the present application, and all other embodiments obtained by those skilled in the art without making inventive efforts are within the scope of protection of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict. The term "first/second/third" is merely to distinguish similar objects and does not represent a specific ordering of objects, it being understood that the "first/second/third" may be interchanged with a specific order or sequence, as permitted, to enable embodiments of the present application described herein to be practiced otherwise than as illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing the present application only and is not intended to be limiting of the present application. Before further elaborating on the embodiments of the present application, the terms and terms related to the embodiments of the present application are explained first, and the terms and terms related to the embodiments of the present application are applicable to the following explanation.
Image compression is an image processing technique that reduces the amount of data by removing redundancy in the original image data, thereby saving storage space and bandwidth resources to some extent. Image compression can be divided into lossy compression and lossless compression. Lossy compression transforms the image into a transform domain and then quantizes it to reduce the number of non-zero values, thereby compressing the image. Lossless compression generally increases image redundancy through pixel rearrangement and similar methods, and then removes that redundancy more thoroughly through encoding, thereby achieving image compression.
JPEG can produce smaller files for photographic (or similar) images because it employs a lossy coding method tailored to them, suited to low contrast, smooth color transitions, noise, and irregular structure. In JPEG compression, the luminance value matrix of the input image is first divided into non-overlapping 8×8 blocks. A discrete cosine transform (DCT) is applied to each block, yielding an 8×8 matrix of DCT coefficients in which the low-frequency components are concentrated in the upper-left corner and the high-frequency components in the lower-right corner. The DCT coefficients are quantized according to an 8×8 quantization step-size matrix (also called a quantization table; each table is characterized by a quality factor), and the resulting quantized DCT coefficients are encoded and written into the JPEG file. Since the DCT is applied to each 8×8 block, 64 spatial frequencies are obtained after the transform: 1 direct-current (DC) frequency and 63 alternating-current (AC) frequencies.
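As a concrete illustration of the blockwise transform and quantization just described, the following NumPy sketch applies an orthonormal 2-D DCT to one level-shifted 8×8 block and quantizes it. The flat step size of 16 is an assumed stand-in for a real JPEG quantization table:

```python
import numpy as np

def dct2(block):
    """Orthonormal 2-D DCT-II of a square block via its basis matrix:
    C[u, i] = s(u) * cos(pi * (2i + 1) * u / (2N))."""
    N = block.shape[0]
    k = np.arange(N)
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * N))
    C[0] /= np.sqrt(2.0)
    return C @ block @ C.T

block = np.arange(64, dtype=np.float64).reshape(8, 8) - 128.0  # level-shifted samples
coeffs = dct2(block)                     # 64 spatial frequencies; DC term at [0, 0]
q_table = np.full((8, 8), 16.0)          # assumed (non-standard) quantization table
quantized = np.round(coeffs / q_table)   # coefficients a JPEG file would store
print(quantized[0, 0])                   # -48.0 (quantized DC term)
```

With the orthonormal scaling, the DC coefficient equals the block sum divided by 8, so the structure "1 DC frequency plus 63 AC frequencies" falls directly out of the transform.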
To save storage and bandwidth resources, some technical schemes introduce lossless compression on top of JPEG, shrinking the JPEG file while guaranteeing that the original file remains lossless. Such schemes include Lepton, PackJPG, MozJPEG, JPEGrescan, and JPEG XL. They typically design predictors, context models, and the like through feature engineering to improve the compression ratio of lossless compression.
In recent years, end-to-end image compression based on deep learning has made great progress. Unlike the feature-engineering approach, an end-to-end deep-neural-network method can jointly optimize network parameters, avoiding complex hand-designed algorithms, while the network learns to remove redundant information in the image more intelligently. The context model implemented by a neural network is thus an effective module for achieving high compression ratios in such techniques. However, these context models are designed for the RGB (Red-Green-Blue) domain of lossless formats such as PNG (Portable Network Graphics); they do not account for the characteristics of JPEG images and are not directly applicable to deep-learning-based lossless compression of JPEG images.
Embodiments of the present application provide an image compression method that may be performed by a processor of a computer device. The computer device may be a server, a notebook computer, a tablet computer, a desktop computer, a smart television, a set-top box, a mobile device (such as a mobile phone, a portable video player, a personal digital assistant, a dedicated messaging device, and a portable game device) or the like with image compression capability. Fig. 1 is a schematic flowchart of an alternative image compression method according to an embodiment of the present application, as shown in fig. 1, the method includes the following steps S110 to S140:
step S110, DCT coefficients of at least one color component corresponding to each frame image in a video sequence to be compressed are obtained;
here, each frame image in the video sequence is a JPEG image, and the at least one color component includes at least one of a chrominance Cb channel component, a saturation Cr channel component, and a luminance Y channel component. In case intra-frame compression of the video sequence is required, the initial DCT coefficients of at least one color component are extracted from the code stream of each frame image to perform a subsequent recompression process. For example, DCT coefficients of the luminance component are extracted from the code stream of each frame image; these DCT coefficients are obtained by dividing the luminance Y channel of the JPEG image into 8×8 pixel-value blocks and performing a discrete cosine transform on each sub-block.
It should be noted that JPEG image compression is a block-based image compression scheme, and the JPEG file stores the coefficients of the DCT transform. The DCT coefficients are obtained by dividing each of the three channels of luminance Y, chrominance Cb, and saturation Cr in the original image into 8×8 pixel-value blocks and performing a discrete cosine transform on each pixel-value block, i.e., each 8×8 pixel-value block yields 64 DCT coefficients.
Step S120, for the DCT coefficient of each color component, rearrangement grouping is carried out according to the frequency order, so as to obtain a high-frequency coefficient matrix and a low-frequency coefficient matrix;
here, the DCT coefficients of the Y-channel component obtained through the DCT transform can be subjected to zig-zag scanning to determine the zig-zag ordering, i.e., the positions in the DCT coefficient matrices corresponding to different pixel-value blocks are reordered. Then, based on the zig-zag ordering, the DCT coefficients are regrouped by frequency into 64 frequency channels, where the DCT coefficients in each frequency channel share the same frequency. The 64 frequency channels are divided into 2 groups: the frequency channels whose relative positions are ordered later form the high-frequency coefficient matrix, and the frequency channels whose relative positions are ordered earlier form the low-frequency coefficient matrix.
It can be understood that the DCT coefficients of a JPEG image are characterized by rich low-frequency information and sparse high-frequency information, so the core idea of the embodiment of the present application is to perform high-frequency and low-frequency grouping according to information richness. Since the resolution of the input image is [h, w], coefficients of the same frequency in the DCT coefficients of each color-channel component are extracted together to form the spatial dimensions, and DCT coefficients of different frequencies form the channel dimension; after the processed DCT coefficients are arranged by frequency, i.e., regrouped into 64 frequency channels, the resolution becomes [h/8, w/8, 64]. The division manner of the high-frequency and low-frequency groups is not limited in this embodiment of the present application. In one possible implementation, the low-frequency coefficient matrix corresponds to the first 36 frequency channels and can be understood as a three-dimensional matrix of [h/8, w/8, 36]; the high-frequency coefficient matrix corresponds to the last 28 frequency channels and can be understood as a three-dimensional matrix of [h/8, w/8, 28]; and the DCT coefficients within each [h/8, w/8] two-dimensional matrix correspond to the same frequency.
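The zig-zag reordering and the 36/28 frequency grouping described above can be sketched as follows. This is a minimal NumPy illustration under the split described in the text; `zigzag_indices` and `regroup` are hypothetical helper names, not from the patent.

```python
import numpy as np

def zigzag_indices(n: int = 8):
    """JPEG zig-zag scan order for an n x n block of DCT coefficients."""
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else -p[0]))

def regroup(blocks: np.ndarray, n_low: int = 36):
    """blocks: [H/8, W/8, 8, 8] DCT coefficients -> (low, high) matrices.

    Coefficients at the same zig-zag position across all blocks form one
    frequency channel; the first n_low channels are the low-frequency group.
    """
    order = zigzag_indices(blocks.shape[-1])
    channels = np.stack([blocks[..., i, j] for (i, j) in order], axis=-1)
    return channels[..., :n_low], channels[..., n_low:]

# 16x16 image -> a 2x2 grid of 8x8 blocks
blocks = np.random.default_rng(0).normal(size=(2, 2, 8, 8))
low, high = regroup(blocks)   # [h/8, w/8, 36] and [h/8, w/8, 28]
```

The sort key reproduces the standard zig-zag: diagonals are visited in order of i+j, alternating direction on odd and even diagonals.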
step S130, respectively carrying out context prediction and coding on the high-frequency coefficient matrix and the low-frequency coefficient matrix of each color component to obtain first coding information corresponding to the high-frequency coefficient matrix and second coding information corresponding to the low-frequency coefficient matrix;
Here, the first encoded information is a compressed bit stream of DCT coefficients in a high frequency coefficient matrix, and the second encoded information is a compressed bit stream of DCT coefficients in a low frequency coefficient matrix. In practice, a probability mass function (Probability Mass Function, PMF) of the frequency corresponding to each DCT coefficient is first predicted, and then the corresponding DCT coefficient is entropy encoded based on the probability mass function to obtain a compressed bit stream of DCT coefficients, thereby reducing the storage size of color components (e.g., Y-channel components).
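The role of the predicted probability mass function can be illustrated with the ideal entropy-coding bound: an entropy coder spends about -log2 p(x) bits per symbol, so a sharper predicted PMF directly translates into a shorter compressed bit stream. The PMF values and coefficient sequence below are invented purely for illustration.

```python
import numpy as np

def ideal_code_length_bits(symbols, pmf):
    """Ideal entropy-coded length in bits: -sum(log2 p(symbol)).

    An arithmetic coder approaches this bound, which is why better PMF
    prediction reduces the storage size of a color component.
    """
    probs = np.array([pmf[s] for s in symbols])
    return float(-np.sum(np.log2(probs)))

# Toy PMF over quantized DCT coefficient values (assumed, for illustration):
pmf = {0: 0.7, 1: 0.1, -1: 0.1, 2: 0.05, -2: 0.05}
coeffs = [0, 0, 1, 0, -1, 0, 0, 2]
bits = ideal_code_length_bits(coeffs, pmf)
```

A PMF that matches the data (here, zeros are common) costs fewer bits than a uniform one over the same alphabet, which is the gain the context model is after.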
Because the amount of high-frequency information is small, the high-frequency coefficient matrix can be partitioned in the spatial dimension, and context prediction and encoding are performed for each spatial partition to obtain the first encoded information corresponding to the high-frequency coefficient matrix. Because the low-frequency information is rich, partitioning is performed in both the spatial dimension and the channel dimension, and the second encoded information corresponding to the low-frequency coefficient matrix is then obtained through autoregressive modeling and encoding; in this way, the correlation of DCT coefficients across the spatial and channel dimensions can be fully mined and redundant information eliminated, thereby remarkably improving the compression rate.
Step S140, determining target compressed data corresponding to each frame of image based on the first encoding information and the second encoding information of the at least one color component.
Here, after the first encoded information corresponding to the high frequency coefficient matrix and the second encoded information corresponding to the low frequency coefficient matrix are obtained, a compressed bit stream of each color component is obtained, and the encoded information of the DCT coefficients in the three color components is combined, so that the target compressed data of each frame image in the video sequence can be determined. The coding information obtaining manners of the DCT coefficients of different color components may be the same (for example, the manners described above), or may be different (for example, the DCT coefficient coding information of the Y channel component is obtained by using the embodiment of the present application, and the DCT coefficient coding information of other channel components is obtained by using a general coding technique in the field).
In the embodiment of the present application, first, DCT coefficients of at least one color component corresponding to each frame image in a video sequence to be compressed are obtained; then, the DCT coefficients of each color component are rearranged and grouped according to frequency order to obtain a high-frequency coefficient matrix and a low-frequency coefficient matrix; context prediction and encoding are performed on the high-frequency coefficient matrix and the low-frequency coefficient matrix of each color component respectively to obtain the first encoded information corresponding to the high-frequency coefficient matrix and the second encoded information corresponding to the low-frequency coefficient matrix; finally, the target compressed data corresponding to each frame image is determined based on the first encoded information and the second encoded information of the at least one color component. In this way, exploiting the characteristic that the DCT coefficients have rich low-frequency information and sparse high-frequency information, the DCT coefficients of at least one color component in each frame image are re-divided and predicted separately, so that the calculation amount of the model is distributed reasonably and effectively; meanwhile, the correlation among DCT coefficients of different frequencies is fully mined, redundant information can be eliminated, and the compression rate is remarkably improved.
In some embodiments, the step S120 may include the following steps 121 to 122:
step 121, extracting coefficients with the same frequency in the DCT coefficients to form a space dimension, and coefficients with different frequencies to form a channel dimension for the DCT coefficients of each color component to obtain multi-channel DCT coefficients;
here, the multi-channel DCT coefficients are 64-channel DCT coefficients. In an implementation, the DCT coefficient matrix of each color component, with original resolution [h, w], is first straightened along the channel dimension; the resulting multi-channel DCT coefficients have resolution [h/8, w/8, 64]. For example, for the multi-channel DCT coefficients converted from an original resolution of 16×16, the DCT coefficients of each channel have a spatial size of 2×2.
Since each 8×8 pixel-value block of the input image is subjected to a DCT transform during JPEG compression, a DCT coefficient matrix of size 8×8×1 is obtained; each DCT coefficient matrix includes 64 positions, numbered 0 to 63, corresponding to 64 coefficients of different frequencies. Coefficients at the same relative position in different DCT coefficient matrices have the same frequency, and coefficients at different relative positions have different frequencies; e.g., the coefficients at position 0 of every DCT coefficient matrix have the same frequency, the coefficients at position 1 have the same frequency, and so on.
And 122, dividing the multichannel DCT coefficients in the channel dimension according to the frequency order to obtain the high-frequency coefficient matrix and the low-frequency coefficient matrix.
Here, the multi-channel DCT coefficients are rearranged in the channel dimension; the high-frequency coefficient matrix, corresponding to the last 28 channels, can be understood as a three-dimensional matrix of [h/8, w/8, 28], and the low-frequency coefficient matrix, corresponding to the first 36 channels, can be understood as a three-dimensional matrix of [h/8, w/8, 36]. The DCT coefficients within each [h/8, w/8] two-dimensional matrix correspond to the same frequency, but the DCT coefficients themselves take different values.
In the above embodiment, considering that the code-rate share of the low-frequency DCT coefficients of a color component (such as the Y-channel component) is far greater than that of the high-frequency DCT coefficients, the rearrangement and grouping of the original DCT coefficients is realized through spatial-dimension extraction and channel-dimension splitting, so that the correlation of DCT coefficients between space and channel can be fully mined.
In some embodiments, the coefficients of the same frequency in the DCT coefficients form a spatial dimension, and the coefficients of different frequencies form a channel dimension, and the step S130 may include the following steps S131 to S133:
step S131, obtaining estimated prior probability distribution parameters of the DCT coefficient of each color component;
Here, the prior probability distribution parameter (hyper_y) may be a preliminary probability distribution estimate for each color channel output by a pre-trained entropy parameter prediction module (Entropy Parameters).
In some embodiments, after upsampling, the DCT coefficients of the color components are fused and input, together with the encoded prior information Yprior corresponding to each color component, to the entropy parameter prediction module, so that the entropy parameter prediction module outputs the prior probability distribution parameter corresponding to each color component.
Step S132, performing context prediction and coding on the high-frequency coefficient matrix in the space dimension by using the prior probability distribution parameter to obtain first coding information corresponding to the high-frequency coefficient matrix;
Here, the DCT coefficients at different spatial positions, i.e., different rows, in the high-frequency coefficient matrix are first predicted by a trained convolutional neural network to obtain the probability estimation score corresponding to each row of DCT coefficients; then, based on the probability estimation score, the DCT coefficients of the i-th row (i being any row) are entropy encoded to obtain the compressed bit stream of the i-th row of DCT coefficients; finally, the first encoded information of the whole high-frequency coefficient matrix is determined.
And S133, carrying out context prediction and coding on the low-frequency coefficient matrix in the space dimension and the channel dimension respectively by utilizing the prior probability distribution parameters to obtain second coding information corresponding to the low-frequency coefficient matrix.
Here, the DCT coefficients at different spatial positions, i.e., different rows, in the low-frequency coefficient matrix are first predicted by a trained convolutional neural network; the DCT coefficients at different channel positions, i.e., different columns, within the same spatial position are then further predicted to obtain the probability estimation score of the DCT coefficients in each row and each column of the low-frequency coefficient matrix. Next, based on the probability estimation scores, the DCT coefficients of each row and each column are entropy encoded to obtain their compressed bit streams, and finally the second encoded information of the whole low-frequency coefficient matrix is determined.
In the above embodiment, considering the small amount of high-frequency information, the high-frequency coefficient matrix is predicted row by row in the spatial dimension and the autoregressive context is then modeled, which guarantees the parallelism of the model and improves speed; meanwhile, since the low-frequency information is rich, prediction is performed in both the spatial and channel dimensions before autoregressive modeling, so that the correlation of DCT coefficients between space and channel can be fully mined and redundant information eliminated.
Fig. 2 is a schematic flowchart of an alternative image compression method according to an embodiment of the present application, as shown in fig. 2, the step S132 includes the following steps S210 to S250:
step S210, in the space dimension, disassembling the high-frequency coefficient matrix to obtain M row coefficients;
here, M is a positive integer; the specific value may be set according to the actual situation, which is not specifically limited in this embodiment of the present application. Taking the DCT coefficient matrix converted from a Y component with a resolution of 32×32 as an example, the corresponding high-frequency coefficient matrix is [4, 4, 28]. The positions of the high-frequency coefficient matrix in the spatial dimension are marked 1, 2, 3, 4 within each 2×2 region, positions carrying the same mark are grouped together, and a Row Split is performed per group, yielding four row coefficients of the DCT coefficient matrix: r^(1), r^(2), r^(3), r^(4), where each row coefficient contains DCT coefficients of size 2×2×28.
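The 2×2-region row split described above can be sketched with NumPy strided slicing (a toy illustration under the 32×32 Y-component example; not the patent's implementation):

```python
import numpy as np

# High-frequency matrix for a 32x32 Y component: spatial 4x4, 28 channels
x = np.arange(4 * 4 * 28).reshape(4, 4, 28)

# Label each position inside every 2x2 spatial region 1..4 and gather
# same-labelled positions: a space-to-depth style "row split" giving
# four row coefficients r1..r4, each of spatial size 2x2 with 28 channels
rows = [x[i // 2::2, i % 2::2, :] for i in range(4)]
```

Each slice picks one of the four positions of every 2×2 region, so the four row coefficients together cover the whole matrix exactly once.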
Step S220, based on the prior probability distribution parameter, predicting the 1 st line coefficient in the M line coefficients to obtain a 1 st predicted value;
here, the 1st row coefficient among the M row coefficients is r^(1) described above, i.e., the DCT coefficients at the spatial positions marked 1. In practice, the prior probability distribution parameters are input into a trained convolutional neural network, which outputs the 1st predicted value corresponding to the 1st row coefficient.
Step S230, performing channel connection on the first i row coefficients in the M row coefficients and the prior probability distribution parameters to obtain an ith fusion parameter;
here, i is an integer of 1 or more and M or less; for example, performing channel connection on the 1 st row coefficient and the prior probability distribution parameter to obtain a 1 st fusion parameter; and carrying out channel connection on the 1 st row coefficient, the 2 nd row coefficient and the prior probability distribution parameter to obtain the 2 nd fusion parameter.
It should be noted that since the M row coefficients are obtained by disassembling the high-frequency coefficient matrix, the resolution of each row coefficient is correspondingly reduced, and a resolution alignment operation is required before channel connection with the prior probability distribution parameters. Since the prior probability distribution parameters have the same resolution as the original Y component, their resolution is 2 times that of each disassembled row coefficient; therefore each row coefficient is upsampled by a factor of 2 before channel connection with the prior probability distribution parameters.
Step S240, predicting the (i+1) th row coefficient based on the (i) th fusion parameter to obtain an (i+1) th predicted value;
here, since the i-th fusion parameter includes the first i-th line coefficient, the i+1-th line coefficient is predicted by the i-th fusion parameter, that is, the i+1-th line coefficient is predicted depending on the first i-th line coefficient. The prediction can be performed by a deep learning mode, for example, the ith fusion parameter and the label of the ith+1th row coefficient are input into a trained convolutional neural network, so that the ith+1th predicted value is obtained through prediction. In this way, the line coefficient before the space position of the current line coefficient is used as the context information of the current line coefficient, and the current line coefficient is predicted by using the prior probability distribution parameters output by other modules, so that the context modeling of the prediction process is completed.
Step S250, coding the ith row coefficient based on the ith predicted value to obtain first coding information corresponding to the high-frequency coefficient matrix.
Here, each line of coefficients is encoded to obtain first encoded information corresponding to the high-frequency coefficient matrix. The encoding may be entropy encoding, arithmetic encoding, which are commonly used in the field of image encoding, for example, encoding by an arithmetic encoder, and the specific encoding process is not described in detail here.
In some embodiments, each row of coefficients in the high-frequency coefficient matrix is predicted and encoded one by one to obtain the first encoded information: after prediction and encoding of the 1st row coefficient r^(1) are completed, it serves as the context for the 2nd row coefficient r^(2), which is then predicted and encoded, and so on. In other embodiments, the first encoded information is obtained by encoding each row coefficient of the high-frequency coefficient matrix after prediction has been completed for all row coefficients.
In the embodiment of the application, the high-frequency coefficient matrix is firstly disassembled in space dimension, then the result of the last prediction is used as the context of the next prediction process, probability distribution prediction and coding of each row of coefficients in the high-frequency coefficient matrix are sequentially performed in an autoregressive prediction mode, and therefore effective compression of the high-frequency coefficient matrix is achieved, and first coding information of the high-frequency coefficient matrix is obtained. Therefore, the high-frequency coefficient matrix is disassembled to divide the context and conduct autoregressive prediction, the parallelism of the model can be ensured, and the speed is improved.
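The autoregressive row-by-row scheme of steps S210 to S250 can be sketched as follows. This is a stand-in illustration: `predict` replaces the trained convolutional neural network, rows are flattened to 1-D arrays, and the residual stands in for what an entropy coder would actually emit; none of these simplifications come from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 4
rows = [rng.normal(size=8) for _ in range(M)]  # M row coefficients (toy 1-D stand-ins)
prior = rng.normal(size=8)                     # prior probability distribution parameters

def predict(fusion: np.ndarray) -> np.ndarray:
    """Stand-in for the trained CNN predictor (hypothetical): maps the
    channel-concatenated context to a per-element prediction."""
    return fusion.reshape(-1, 8).mean(axis=0)

encoded = []
context = prior                                   # row 1 is predicted from the prior alone
for i in range(M):
    pred = predict(context)
    residual = rows[i] - pred                     # an entropy coder would code rows[i]
    encoded.append(residual)                      # under the predicted distribution
    context = np.concatenate([context, rows[i]])  # rows 1..i join the context for row i+1
```

The loop makes the autoregressive structure explicit: the i-th fusion parameter is the channel concatenation of the prior with the first i rows, and each new row is predicted from it before being encoded.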
Fig. 3 is a schematic flowchart of an alternative image compression method according to an embodiment of the present application, as shown in fig. 3, the step S133 includes the following steps S310 to S350:
step S310, disassembling the low-frequency coefficient matrix in the space dimension to obtain N row coefficients;
here, N is a positive integer different from M. The implementation of disassembling the low-frequency coefficient matrix to obtain N row coefficients is similar to that for the high-frequency coefficient matrix; reference may be made to the related description in the above embodiments.
Step S320, based on the prior probability distribution parameters, predicting and encoding the N row coefficients to obtain encoding information corresponding to the N row coefficients;
in some embodiments, using the prior probability distribution parameters, the first N-1 row coefficients are taken as context information of the N-th row coefficient, and the N row coefficients are predicted and encoded in a multi-stage spatial autoregressive manner, thereby obtaining the encoded information corresponding to the N row coefficients.
In this way, probability distribution of each line of coefficients in the low-frequency coefficient matrix is predicted and encoded in sequence in an autoregressive prediction mode, so that the low-frequency coefficient matrix is effectively compressed in the space dimension.
In some embodiments, based on the prior probability distribution parameter, predicting a 1 st line coefficient in the N line coefficients to obtain an initial predicted value corresponding to the 1 st line coefficient; connecting the 1 st row coefficient in the N row coefficients with the prior probability distribution parameters to obtain shared distribution parameters; based on the sharing distribution parameters, predicting other row coefficients except the 1 st row coefficient in the N row coefficients respectively to obtain initial predicted values corresponding to the other row coefficients respectively; and respectively encoding the same line coefficient based on the initial predicted value corresponding to each line coefficient in the N line coefficients to obtain encoding information corresponding to the N line coefficients.
In this way, the 1st row coefficient among the N row coefficients is predicted directly using the prior probability distribution parameters, and the row coefficients other than the 1st are predicted by combining the first j row coefficients with the prior probability distribution parameters; the respective initial predicted value of each of the N row coefficients is then used for encoding, which improves compression efficiency while ensuring the parallelism of the model.
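The shared-parameter variant above can be sketched as follows: row 1 is predicted from the prior alone, the shared distribution parameters are formed once by connecting row 1 with the prior, and rows 2..N are then predicted in parallel from those shared parameters. `predict` is a hypothetical stand-in for the trained network; rows are 1-D arrays for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4
rows = [rng.normal(size=8) for _ in range(N)]  # N row coefficients (toy stand-ins)
prior = rng.normal(size=8)                     # prior probability distribution parameters

def predict(ctx: np.ndarray) -> np.ndarray:
    """Stand-in for the trained predictor (hypothetical)."""
    return ctx.reshape(-1, 8).mean(axis=0)

pred_1 = predict(prior)                        # row 1: predicted from the prior alone
shared = np.concatenate([rows[0], prior])      # shared distribution parameters
# rows 2..N share one context, so their predictions can run in parallel
preds = [pred_1] + [predict(shared) for _ in rows[1:]]
```

Because rows 2..N all consume the same shared parameters, their predictions need no sequential dependency, which is what preserves model parallelism.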
Step S330, for each of the N row coefficients, performing disassembly in the channel dimension to obtain K column coefficients corresponding to each row coefficient;
here, for example, for each row coefficient of size 2×2×36 in the DCT coefficient matrix, column disassembly is performed in the channel dimension, resulting in K column coefficients per row coefficient: c^(i)_1, c^(i)_2, …, c^(i)_K, where i = 1, 2, 3, 4. The number of columns K in each row coefficient and the number of channels contained in each column are the same across row coefficients; the specific values may be set according to the actual situation, which is not specifically limited in this embodiment.
Step S340, predicting K column coefficients corresponding to each row coefficient in a multistage channel autoregressive manner to obtain coding information corresponding to the K column coefficients;
Here, for the K column coefficients corresponding to each row coefficient in the low-frequency coefficient matrix, the 1st column is predicted directly from the initial predicted value of the corresponding row coefficient, for example through a convolutional neural network; for any remaining column j, columns 1 through j-1 serve as context information to determine the probability estimation score corresponding to that column and encode it, thereby realizing autoregression along the channel dimension.
Step S350, determining second coding information corresponding to the low-frequency coefficient matrix based on the coding information corresponding to the N row coefficients and the coding information corresponding to the K column coefficients in each row coefficient.
Here, according to the correlation of the DCT coefficients between the space and the channel, the coding information of N row coefficients in the low-frequency coefficient matrix acquired in the space dimension and the coding information corresponding to K column coefficients in each row coefficient acquired in the channel dimension are combined, so as to obtain more accurate and more effective second coding information.
In the embodiment of the application, the low-frequency coefficient matrix is partitioned in the space dimension and the channel dimension in consideration of the abundance of the low-frequency coefficient information, and then autoregressive modeling is performed on K column coefficients disassembled by each row of coefficients, so that the correlation of DCT coefficients between the space and the channel is fully excavated, and redundant information in the low-frequency coefficient matrix is eliminated.
In some embodiments, the N row coefficients are {C_1, C_2, C_3, …, C_N} in order of spatial location. For row coefficient C_1, the step S340 includes the following steps S341 to S344:
Step S341, based on the initial predicted value corresponding to the row coefficient C_1, predicting the 1st column coefficient among the K column coefficients disassembled from C_1 to obtain the corresponding 1st optimized predicted value;
Here, the initial predicted value corresponding to the row coefficient C_1 is obtained by direct prediction through a convolutional neural network based on the acquired prior probability distribution parameters; the initial predicted value corresponding to C_1 and the 1st column coefficient are then input into a trained channel prediction model to obtain the 1st optimized predicted value of the 1st column coefficient.
Step S342, performing channel connection on the first j column coefficients among the K column coefficients disassembled from C_1 and the initial predicted value corresponding to the row coefficient C_1 to obtain the j-th channel parameter;
Here, j is an integer greater than or equal to 1 and less than or equal to K. For example, channel connection of the 1st column coefficient with the initial predicted value corresponding to C_1 yields the 1st channel parameter; channel connection of the 1st column coefficient, the 2nd column coefficient, and the initial predicted value corresponding to C_1 yields the 2nd channel parameter.
Step S343, based on the j-th channel parameter, predicting the (j+1)-th column coefficient among the K column coefficients disassembled from C_1 to obtain the (j+1)-th optimized predicted value;
here, since the j-th channel parameter contains the first j column coefficients, the j+1th column coefficients are predicted by the j-th channel parameter, that is, the j+1th column coefficients are predicted depending on the first j column coefficients. In this way, the column coefficient before the current column coefficient in the same row coefficient is used as the context information of the current column coefficient in the channel dimension, and the current column coefficient is predicted by combining the initial predicted value corresponding to the same row coefficient, so that the context modeling of the prediction process is completed.
Step S344, based on the j-th optimized predicted value, encoding the j-th column coefficient among the K column coefficients disassembled from C_1 to obtain the encoded information of the K column coefficients disassembled from C_1.
In some embodiments, after the respective optimized predicted values for all column coefficients are obtained, the K column coefficients in the 1st row coefficient C_1 are each encoded, so that the encoded information of every column of the 1st row coefficient of the low-frequency coefficient matrix in the channel dimension can be obtained at the same time, improving the parallelism of the model.
In some embodiments, after each jth column coefficient is obtained, the jth column coefficient is encoded first, and then the jth column coefficient and the previous j-1 column coefficient are taken as the context information of the jth+1th column coefficient, and the jth+1th column coefficient is predicted and encoded, so that the encoding information of the 1 st row coefficient in the low-frequency coefficient matrix in each column of the channel dimension is sequentially obtained.
In the above embodiment, for the K column coefficients disassembled from the 1st row coefficient C_1 of the low-frequency coefficient matrix, the initial predicted value of C_1 is first used to predict the 1st column coefficient; the result of each prediction then serves as the context for the next prediction, and the column coefficients other than the 1st among the K column coefficients are predicted in turn, thereby obtaining the encoded information of the K column coefficients disassembled from C_1 by means of multi-stage channel autoregression.
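The multi-stage channel autoregression over the columns of C_1 can be sketched as follows. `predict` is a hypothetical stand-in for the trained channel prediction model, and the initial prediction and column coefficients are random placeholders; only the context-growing loop structure reflects the steps described in the text.

```python
import numpy as np

rng = np.random.default_rng(2)
K = 4
cols = [rng.normal(size=(2, 2, 9)) for _ in range(K)]  # C1 split into K column coefficients
init_pred = rng.normal(size=(2, 2, 9))                 # initial predicted value for C1 (assumed)

def predict(ctx: np.ndarray) -> np.ndarray:
    """Stand-in for the trained channel prediction model (hypothetical)."""
    return ctx.mean(axis=-1, keepdims=True).repeat(9, axis=-1)

opt_preds = []
context = init_pred                     # column 1 is predicted from init_pred alone
for j in range(K):
    opt_preds.append(predict(context))  # (j+1)-th optimized predicted value
    # columns 1..j join the context (channel connection) for column j+1
    context = np.concatenate([context, cols[j]], axis=-1)
```

Each iteration widens the channel context, mirroring how the j-th channel parameter is the channel connection of the initial prediction with the first j column coefficients.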
In some embodiments, the N row coefficients are {C_1, C_2, C_3, …, C_N} in order of spatial location. For the row coefficients {C_2, C_3, …, C_N}, the step S340 includes the following steps S345 to S348:
Step S345, based on the initial predicted values corresponding to the row coefficients {C_2, C_3, …, C_N}, predicting the 1st column coefficient disassembled from each corresponding row coefficient to obtain the 1st optimized predicted value corresponding to each disassembled 1st column coefficient;
Here, the initial predicted value corresponding to each row coefficient in {C_2, C_3, …, C_N} is obtained by direct prediction through a convolutional neural network based on the acquired prior probability distribution parameters; the initial predicted value corresponding to row coefficient C_i (i being 2, 3, …, N) and the 1st column coefficient of C_i are then input into a trained channel prediction model to obtain the 1st optimized predicted value of that column coefficient.
Assume that the initial predicted values for the row coefficients {C_2, C_3, …, C_N} are, in turn, pri^(2), pri^(3), …, pri^(N). Then pri^(2) is used to predict the 1st column coefficient R_21 disassembled from row coefficient C_2, obtaining the 1st optimized predicted value of R_21; pri^(3) is used to predict the 1st column coefficient R_31 disassembled from row coefficient C_3, obtaining the 1st optimized predicted value of R_31; and so on, until the 1st column coefficients disassembled from all row coefficients in {C_2, C_3, …, C_N} have been predicted.
Step S346, determining the context information of the (t+1)-th column coefficient in each row coefficient based on the initial predicted value corresponding to each row coefficient in {C_2, C_3, …, C_N} and the first t column coefficients in the same row coefficient;
In some embodiments, the first t column coefficients in each row coefficient are channel-spliced and then spliced with the initial predicted value of the corresponding row coefficient to obtain the context information of the (t+1)th column coefficient in that row coefficient. The prediction of the (t+1)th column coefficient thus depends on both the initial predicted value of its row coefficient and the first t column coefficients, which ensures accurate prediction of the (t+1)th column coefficient while reducing redundant information in the channel dimension.
In some embodiments, the t-th column coefficients of the row coefficients {C2, C3, …, CN} are channel-spliced to obtain the combination coefficient corresponding to the t-th column, where t is an integer of 1 or more and K-1 or less; the context information of the (t+1)th column coefficient in each row coefficient is then determined based on the initial predicted value corresponding to that row coefficient and the combination coefficients corresponding to the first t columns.
In this way, the same column of coefficients across the row coefficients {C2, C3, …, CN} is first channel-spliced to obtain the combination coefficient of the corresponding column; by combining the initial predicted value of each row coefficient with the combination coefficients of the first t columns, the context information of the (t+1)th column coefficient in that row coefficient can be determined, so that the (t+1)th column coefficient is accurately predicted while redundant information in both the spatial and channel dimensions is reduced.
Step S347, based on the context information, predicting a (t+1) th column coefficient in the same row coefficient to obtain a (t+1) th optimized predicted value corresponding to the (t+1) th column coefficient;
here, the context information and the (t+1) -th column coefficient are input into the trained channel prediction model, and the (t+1) -th optimized prediction value corresponding to the (t+1) -th column coefficient is output.
Step S348, based on each obtained t-th optimized predicted value, encode the t-th column coefficient in the same row coefficient, so as to obtain the coding information of the K column coefficients split from each of the row coefficients {C2, C3, …, CN}.
In the embodiment of the present application, for the K column coefficients split from each of the row coefficients {C2, C3, …, CN} in the low-frequency coefficient matrix, the initial predicted value of each row coefficient is first used to predict its 1st column coefficient; the context information of the (t+1)th column coefficient in the same row coefficient is then determined, and the column coefficients other than the 1st are predicted in turn, so that the coding information of the K column coefficients split from each of {C2, C3, …, CN} is obtained by multi-stage channel autoregression.
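As a minimal illustration of the context construction in steps S345 and S346, the numpy sketch below splices each row's initial predicted value with the combination coefficients of the columns coded so far. This is a hypothetical sketch, not the application's actual network code: the function name `context_for_column`, the tensor shapes, and the use of plain arrays in place of the channel prediction model's real inputs are all assumptions for illustration.

```python
import numpy as np

def context_for_column(initial_preds, combos):
    """Build the context of the (t+1)-th column coefficient for rows C2..CN.

    initial_preds : per-row initial predicted values pri(2)..pri(N).
    combos        : the t "combination coefficients" already formed, where
                    entry s is the channel splice of every row's s-th column.
    Each row's context is its initial predicted value channel-spliced with
    the combination coefficients of the first t columns.
    """
    contexts = []
    for pri in initial_preds:
        contexts.append(np.concatenate([pri] + combos, axis=-1))
    return contexts
```

With t = 0 (no columns coded yet), the context reduces to the initial predicted value alone, matching the prediction of the 1st column coefficients in step S345.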
The above image compression method is described below in connection with a specific embodiment. It should be noted, however, that the specific embodiment is only intended to better illustrate the present application and does not constitute an undue limitation on it.
In the JPEG code stream, the proportion of the code rate taken by the Y component is far greater than that of the Cb or Cr component, and the proportion taken by the Y component's low-frequency DCT coefficients is far greater than that of its high-frequency DCT coefficients. Based on this unevenly distributed data characteristic of the JPEG code stream, the embodiment of the present application provides a data-driven context model for the Y component that fully eliminates redundant information among coefficients.
Fig. 4 is a logic flow diagram for constructing a context model for lossless compression of a JPEG image according to an embodiment of the present application. As shown in Fig. 4, the method includes the following steps S410 to S430:
step S410, the DCT coefficients of the Y component in the JPEG image are rearranged and grouped according to the frequency, and a high-frequency coefficient matrix and a low-frequency coefficient matrix are obtained;
the DCT coefficients of the Y component are recombined into 64 channels according to the frequency, and the frequencies corresponding to the coefficients in each channel are the same; the 64 channels are then divided into 2 groups, a high frequency coefficient matrix (comprising 28 channels) and a low frequency coefficient matrix (comprising 36 channels), respectively.
Fig. 5A is a schematic diagram of the process of rearranging and grouping DCT coefficients according to an embodiment of the present application. As shown in Fig. 5A, taking a Y component with a resolution of 16×16 as an example (the numbers represent positions in the grid), each 8×8 block has 64 DCT coefficients, which are numbered 0 to 63 by zigzag ordering; DCT coefficients at the same position number correspond to the same frequency. Coefficients of the same frequency are extracted from the initial DCT coefficients of the Y component to form the spatial dimension, and DCT coefficients of different frequencies form the channel dimension, yielding a 64-channel matrix in which the DCT coefficients of each channel are 2×2 in the spatial dimension, arranged by frequency. The DCT coefficients of the 28 highest-frequency channels are grouped into the high-frequency coefficient matrix, and those of the 36 lowest-frequency channels into the low-frequency coefficient matrix.
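The rearrangement described above can be sketched in a few lines of numpy. This is a hedged illustration under assumptions: the names `rearrange_dct` and `split_bands` are hypothetical, the input is assumed to already hold blockwise 8×8 DCT coefficients, and the zigzag table follows the standard JPEG ordering.

```python
import numpy as np

def rearrange_dct(y_dct, block=8):
    """Group blockwise DCT coefficients of an H x W Y component by frequency.

    Returns an array of shape (H//block, W//block, block*block) where
    channel k collects the k-th (zigzag-ordered) coefficient of every block.
    """
    h, w = y_dct.shape
    # zigzag table for an 8x8 block: diagonals of constant i+j,
    # alternating traversal direction (standard JPEG ordering)
    zz = sorted(((i, j) for i in range(block) for j in range(block)),
                key=lambda p: (p[0] + p[1],
                               p[0] if (p[0] + p[1]) % 2 else p[1]))
    blocks = y_dct.reshape(h // block, block,
                           w // block, block).transpose(0, 2, 1, 3)
    return np.stack([blocks[:, :, i, j] for (i, j) in zz], axis=-1)

def split_bands(channels, n_low=36):
    """Split the 64 frequency channels: first 36 low, remaining 28 high."""
    return channels[..., :n_low], channels[..., n_low:]
```

For a 16×16 component the result is a 2×2 spatial grid with 64 channels, matching the [2, 2, 64] layout described for Fig. 5A.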
Step S420, the high-frequency coefficient matrix is segmented in the space dimension, and then autoregressive context modeling is carried out;
Here, since the amount of high-frequency information is small, blocking only needs to be performed in the spatial dimension. Fig. 5B is a schematic diagram of the blocking of the high-frequency coefficient matrix in the spatial dimension according to an embodiment of the present application. As shown in Fig. 5B, taking a matrix of shape [4, 4, 28] as an example (the numbers in the grid represent positions), every 2×2 area of the high-frequency coefficient matrix is labeled 1, 2, 3, 4 in sequence in the spatial dimension, and the elements sharing the same label are gathered into new small blocks, so that the high-frequency coefficient matrix is divided into 4 spatial blocks r(1), r(2), r(3), r(4). The dashed lines indicate the dependency between the spatial blocks: the prediction of the i-th block (i being 2, 3 or 4) depends on the encoding/decoding result of the (i-1)-th block.
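The 2×2 spatial labeling can be realized as simple strided slicing. In this sketch, which of the four positions inside each 2×2 cell receives which label is an assumption, and `spatial_split` is a hypothetical name used for illustration only.

```python
import numpy as np

def spatial_split(x):
    """Split an [H, W, C] matrix into 4 spatial blocks by 2x2 position.

    Elements occupying the same (row % 2, col % 2) position inside every
    2x2 cell are grouped, giving 4 blocks of shape [H//2, W//2, C].
    """
    r1 = x[0::2, 0::2, :]  # label 1: top-left of each 2x2 cell (assumed)
    r2 = x[0::2, 1::2, :]  # label 2: top-right
    r3 = x[1::2, 0::2, :]  # label 3: bottom-left
    r4 = x[1::2, 1::2, :]  # label 4: bottom-right
    return r1, r2, r3, r4
```

Each block keeps the full 28 channels, so a [4, 4, 28] matrix becomes four [2, 2, 28] blocks, as in Fig. 5B.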
Fig. 5C is a schematic flowchart of the prediction of the high-frequency coefficient matrix according to an embodiment of the present application. As shown in Fig. 5C, the solid lines represent the encoding data flow: the input high-frequency coefficient matrix is first spatially blocked to obtain the 4 row coefficients r(1), r(2), r(3), r(4); meanwhile, the prior probability distribution parameters (hyper_y) are obtained from other modules and resolution-aligned. Since r(1) has no context information, it is predicted using the prior probability distribution parameters alone: the aligned parameters are input directly into the convolutional neural network P for prediction, and the predicted value is encoded by an arithmetic encoder (Arithmetic Encoder, AE). r(1) then serves as context information to help predict r(2): the aligned prior probability distribution parameters are connected with r(1) and input into the convolutional neural network P for prediction and encoding. After r(2) is encoded, r(1) and r(2) together serve as context information to help predict and encode r(3), and so on, until all the row coefficients divided from the high-frequency coefficient matrix have been encoded/decoded.
The dashed lines represent the decoding data flow. First, the prior probability distribution parameters are obtained from other modules and resolution-aligned; the aligned parameters are then predicted by the convolutional neural network P to obtain parameter predicted values, and an arithmetic decoder (Arithmetic Decoder, AD) decodes the AE-encoded information of r(1) with these parameter predicted values, obtaining r(1). Next, r(1) is connected with the aligned prior probability distribution parameters and input into the convolutional neural network P for prediction and decoding, obtaining r(2); then r(2) is connected with the aligned parameters, input into P, and decoded to obtain r(3); finally, r(3) is connected with the aligned parameters, input into P, and decoded to obtain r(4), so that all the blocks divided from the high-frequency coefficient matrix are decoded. Finally, r(1), r(2), r(3), r(4) are restored to obtain the high-frequency coefficient matrix. In this way, blocking is performed only in the spatial dimension before autoregressive context modeling, which preserves the parallelism of the model and improves its speed.
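The encoding-side flow of Fig. 5C can be sketched as a generic autoregressive loop. The convolutional network P and the arithmetic encoder are replaced by caller-supplied stand-ins (`predict`, `encode`), since the application's actual models are not specified here; `encode_high_freq` is a hypothetical name.

```python
import numpy as np

def encode_high_freq(blocks, prior, predict, encode):
    """Multi-stage spatial autoregression over blocks r(1)..r(M).

    blocks : list of spatial blocks (arrays of identical shape).
    prior  : resolution-aligned prior distribution parameters.
    predict: stand-in for the convolutional network P (context -> params).
    encode : stand-in for the arithmetic encoder ((block, params) -> bits).
    Each block is predicted from the prior plus all previously coded
    blocks, so the decoder can rebuild the same contexts in the same order.
    """
    coded = []
    context = [prior]
    for r in blocks:
        params = predict(np.concatenate(context, axis=-1))  # channel fusion
        coded.append(encode(r, params))
        context.append(r)  # a coded block joins the context for the next
    return coded
```

The decoding loop mirrors this: it predicts with the same contexts and feeds each decoded block back in before decoding the next.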
Step S430, the low-frequency coefficient matrix is blocked in the spatial dimension and the channel dimension, and then autoregressive context modeling is performed.
Here, since the low-frequency information is rich, blocking needs to be performed in both the spatial dimension and the channel dimension. Fig. 5D is a schematic diagram of the blocking of the low-frequency coefficient matrix in the spatial dimension according to an embodiment of the present application. As shown in Fig. 5D, the low-frequency coefficient matrix is first divided, similarly to the high-frequency coefficient matrix, into 4 spatial blocks (row coefficients) r(1), r(2), r(3), r(4) in the spatial dimension, where r(2), r(3), r(4) all depend on the encoding/decoding result of r(1). Each row coefficient is then further divided by channel; for example, r(1) is channel-blocked to obtain K column coefficients, denoted r(1)_1, r(1)_2, …, r(1)_K.
Fig. 5E is a schematic flowchart of the prediction of the low-frequency coefficient matrix according to an embodiment of the present application. As shown in Fig. 5E, prediction proceeds spatially first and then by channel: the correlation between the different row coefficients r(1), r(2), r(3), r(4) is explored by a spatial regression model 55, and the correlation between channels within the spatial block represented by each row coefficient is modeled by a first model 551 or a second model 552.
As the 1st row coefficient split in the spatial dimension, r(1) has no spatial context, so the resolution-aligned prior probability distribution parameters are processed by the convolutional neural network P alone to obtain an initial predicted value pri(1), after which the first model 551 performs channel regression prediction on r(1). Then r(1) is channel-connected with the prior probability distribution parameters and input into the convolutional neural network P, which predicts the remaining row coefficients r(2), r(3), r(4) simultaneously, yielding the corresponding initial predicted values pri(2), pri(3), pri(4).
Fig. 5F is a schematic flowchart of the prediction, in the channel dimension, of the 1st row coefficient split from the low-frequency coefficient matrix according to an embodiment of the present application. As shown in Fig. 5F, the first model 551 performs channel regression prediction on the 1st row coefficient r(1) of the low-frequency coefficient matrix. The 1st row coefficient r(1) is split into K column coefficients r(1)_1, r(1)_2, …, r(1)_K in the channel dimension. Since r(1)_1 has no context information, it is predicted relying only on the initial predicted value pri(1) of the 1st row coefficient; r(1)_1 then serves as context information to help predict r(1)_2, r(1)_1 and r(1)_2 together serve as context information to help predict r(1)_3, and so on, until all the column coefficients split from r(1) in the channel dimension have been encoded/decoded.
Fig. 5G is a schematic flowchart of the prediction, in the channel dimension, of the remaining row coefficients split from the low-frequency coefficient matrix according to an embodiment of the present application. As shown in Fig. 5G, the second model 552 performs channel regression prediction on the remaining row coefficients r(i), i = 2, 3, 4. Taking K as 4 as an example, the 1st column coefficients r(2)_1, r(3)_1, r(4)_1 split from the row coefficients r(2), r(3), r(4) are first predicted relying only on pri(2), pri(3), pri(4), respectively. After r(2)_1, r(3)_1, r(4)_1 are encoded/decoded, they are channel-spliced into the combination coefficient corresponding to column 1, which is then spliced with pri(2), pri(3), pri(4) respectively to obtain the context information of the 2nd column coefficients r(2)_2, r(3)_2, r(4)_2 in each row coefficient, and the 2nd column coefficients are predicted accordingly. Then r(2)_2, r(3)_2, r(4)_2 are channel-spliced into the combination coefficient corresponding to column 2, and the initial predicted values together with the combination coefficients of columns 1 and 2 are used to predict the 3rd column coefficients r(2)_3, r(3)_3, r(4)_3, and so on, until all K column coefficients split from r(2), r(3), r(4) in the channel dimension have been encoded/decoded.
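The column-by-column procedure of Fig. 5G can be sketched as a nested loop. Here `encode_remaining_rows` is a hypothetical name, and simple callables stand in for the second model 552 and the arithmetic coder; the real models are not specified in this sketch.

```python
import numpy as np

def encode_remaining_rows(rows_cols, pris, predict, encode):
    """Channel autoregression over rows r(2)..r(4), column by column.

    rows_cols: rows_cols[i][t] is the (t+1)-th column coefficient of row i.
    pris     : initial predictions pri(2)..pri(4), one per row.
    Column 1 of every row is predicted in parallel from its pri alone;
    each later column additionally sees the channel splice of all rows'
    earlier columns (the per-column "combination coefficients").
    """
    n_rows, n_cols = len(rows_cols), len(rows_cols[0])
    coded = [[] for _ in range(n_rows)]
    combos = []  # combination coefficient of each finished column
    for t in range(n_cols):
        for i in range(n_rows):
            ctx = np.concatenate([pris[i]] + combos, axis=-1)
            coded[i].append(encode(rows_cols[i][t], predict(ctx)))
        # splice column t of every row into the next combination coefficient
        combos.append(np.concatenate([rows_cols[i][t]
                                      for i in range(n_rows)], axis=-1))
    return coded
```

Because all rows share the same combination coefficients, the rows of a given column can be processed in parallel, with autoregression only across columns.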
In the embodiment of the present application, the DCT coefficients are divided into high-frequency and low-frequency groups according to frequency, each group is further blocked according to the amount of information it contains, and different networks are then used for prediction. The embodiment realizes the context model with a neural network in a data-driven manner; it can serve as a submodule of deep-learning-based JPEG lossless compression and helps improve the compression rate.
According to the data characteristics of uneven distribution of the JPEG code stream, the embodiment of the application provides a data-driven context model for the Y component, redundant information among coefficients is fully eliminated, and compared with the context model designed in a characteristic engineering mode, the compression rate of JPEG lossless compression is remarkably improved. In addition, the context model adjusts the context division and prediction modes, reasonably and effectively distributes the calculated amount of the model, and obviously improves the running speed of the context model on the premise of not obviously influencing the compression rate.
The image compression method provided by the embodiment of the present application can be used at least in scenarios such as data centers and cloud storage. In these scenarios, massive stored and newly added JPEG files occupy a large amount of storage and bandwidth resources, greatly driving up the cost of data storage and transmission; the image compression method provided by the embodiment of the present application can significantly reduce this storage and bandwidth occupation while keeping the JPEG files completely lossless.
Based on the foregoing embodiments, the embodiments of the present application provide an image compression apparatus. Each module included in the apparatus, and each sub-module and unit included in each module, may be implemented by a processor in a computer device, or of course by specific logic circuits. In implementation, the processor may be a central processing unit (Central Processing Unit, CPU), a microprocessor (Microprocessor Unit, MPU), a digital signal processor (Digital Signal Processor, DSP), a field programmable gate array (Field Programmable Gate Array, FPGA), or the like.
Fig. 6 is a schematic structural diagram of an image compression apparatus according to an embodiment of the present application, as shown in fig. 6, the apparatus 600 includes an obtaining module 610, a grouping module 620, a predicting module 630, and a determining module 640, where:
the acquiring module 610 is configured to acquire a DCT coefficient of at least one color component corresponding to each frame of image in the video sequence to be compressed;
the grouping module 620 is configured to reorder and group the DCT coefficients of each color component according to frequency order, so as to obtain a high-frequency coefficient matrix and a low-frequency coefficient matrix;
the prediction module 630 is configured to perform context prediction on the high-frequency coefficient matrix and the low-frequency coefficient matrix of each color component, so as to obtain first coding information corresponding to the high-frequency coefficient matrix and second coding information corresponding to the low-frequency coefficient matrix;
the determining module 640 is configured to determine target compressed data corresponding to each frame of image based on the first encoding information and the second encoding information of the at least one color component.
In some possible embodiments, the grouping module 620 includes: the extraction submodule is used for extracting the DCT coefficients of each color component, wherein coefficients with the same frequency in the DCT coefficients form a space dimension, and coefficients with different frequencies form a channel dimension to obtain multichannel DCT coefficients; and the grouping sub-module is used for dividing the multichannel DCT coefficients in the channel dimension according to the frequency order to obtain the high-frequency coefficient matrix and the low-frequency coefficient matrix.
In some possible embodiments, coefficients of the same frequency among the DCT coefficients form the spatial dimension and coefficients of different frequencies form the channel dimension, and the prediction module 630 includes: the acquisition submodule is used for acquiring prior probability distribution parameters estimated for the DCT coefficient of each color component; the first prediction submodule is used for carrying out context prediction and coding on the high-frequency coefficient matrix in the space dimension by utilizing the prior probability distribution parameter to obtain first coding information corresponding to the high-frequency coefficient matrix; and the second prediction submodule is used for carrying out context prediction and coding on the low-frequency coefficient matrix in the space dimension and the channel dimension respectively by utilizing the prior probability distribution parameter to obtain second coding information corresponding to the low-frequency coefficient matrix.
In some possible embodiments, the first prediction submodule includes: the first splitting unit is used for splitting the high-frequency coefficient matrix in the space dimension to obtain M row coefficients; the first prediction unit is used for predicting and encoding the M row coefficients in a multistage spatial autoregressive mode by utilizing the prior probability distribution parameters to obtain first encoding information corresponding to the high-frequency coefficient matrix; wherein the prediction process of the Mth row coefficient depends on the first M-1 row coefficients.
In some possible embodiments, the first prediction unit includes: the first prediction subunit is used for predicting the 1 st line coefficient in the M line coefficients based on the prior probability distribution parameter to obtain a 1 st predicted value; the first connection subunit is used for carrying out channel connection on the first i row coefficients in the M row coefficients and the prior probability distribution parameters to obtain an ith fusion parameter; i is an integer of 1 or more and M or less; the second prediction subunit is used for predicting the (i+1) th row coefficient based on the (i) th fusion parameter to obtain an (i+1) th predicted value; and the first coding subunit is used for respectively coding the ith row coefficient based on the ith predicted value to obtain first coding information corresponding to the high-frequency coefficient matrix.
In some possible embodiments, the second prediction submodule includes: the second splitting unit is used for splitting the low-frequency coefficient matrix in the space dimension to obtain N row coefficients; the second prediction unit is used for predicting and encoding the N line coefficients based on the prior probability distribution parameters to obtain encoding information corresponding to the N line coefficients; the third splitting unit is used for sequentially splitting the j-th row coefficient in the N row coefficients in the channel dimension to obtain K column coefficients in each row coefficient; j is an integer of 1 or more and N or less; the third prediction unit is used for predicting and encoding K column coefficients corresponding to each row coefficient in a multistage channel autoregressive mode to obtain encoded information corresponding to the K column coefficients; and the determining unit is used for determining second coding information corresponding to the low-frequency coefficient matrix based on the coding information corresponding to the N row coefficients and the coding information corresponding to the K column coefficients in each row coefficient.
In some possible embodiments, the second prediction unit includes: the first prediction subunit is used for predicting the 1 st line coefficient in the N line coefficients based on the prior probability distribution parameter to obtain an initial predicted value corresponding to the 1 st line coefficient; a first connection subunit, configured to connect the 1 st row coefficient of the N row coefficients with the prior probability distribution parameter to obtain a shared distribution parameter; the second prediction subunit is configured to predict, based on the shared distribution parameter, other line coefficients except the 1 st line coefficient in the N line coefficients, so as to obtain initial predicted values corresponding to the other line coefficients respectively; and the first coding subunit is used for respectively coding the same line coefficient based on the initial predicted value corresponding to each line coefficient in the N line coefficients to obtain coding information corresponding to the N line coefficients.
In some possible embodiments, the first prediction subunit is further configured to predict, based on the prior probability distribution parameter, a 1 st line coefficient of the N line coefficients, to obtain an initial predicted value corresponding to the 1 st line coefficient; connecting the 1 st row coefficient in the N row coefficients with the prior probability distribution parameters to obtain shared distribution parameters; based on the sharing distribution parameters, predicting the row coefficients except the 1 st row coefficient in the N row coefficients respectively to obtain initial predicted values corresponding to the same row coefficient respectively; and respectively encoding the same row coefficient based on the initial predicted value corresponding to each row coefficient to obtain encoding information corresponding to the N row coefficients.
In some possible embodiments, the N row coefficients, ordered by spatial position, are {C1, C2, C3, …, CN}, and the third prediction unit includes: a third prediction subunit, configured to predict, based on the initial predicted value corresponding to the row coefficient C1, the 1st column coefficient among the K column coefficients split from C1, so as to obtain the corresponding 1st optimized predicted value; a second connection subunit, configured to channel-connect the first j column coefficients among the K column coefficients split from C1 with the initial predicted value corresponding to C1 to obtain the j-th channel parameter, j being an integer of 1 or more and K or less; a fourth prediction subunit, configured to predict, based on the j-th channel parameter, the (j+1)-th column coefficient among the K column coefficients split from C1, so as to obtain the (j+1)-th optimized predicted value; and a second encoding subunit, configured to encode, based on each j-th optimized predicted value, the j-th column coefficient among the K column coefficients split from C1, so as to obtain the coding information of the K column coefficients split from C1.
In some possible embodiments, the N row coefficients, ordered by spatial position, are {C1, C2, C3, …, CN}, and the third prediction unit further includes: a fifth prediction subunit, configured to predict, based on the initial predicted value corresponding to each row coefficient in {C2, C3, …, CN}, the 1st column coefficient split from the corresponding row coefficient, so as to obtain the 1st optimized predicted value corresponding to that column coefficient; a determining subunit, configured to determine the context information of the (t+1)-th column coefficient in the same row coefficient based on the initial predicted value corresponding to each row coefficient in {C2, C3, …, CN} and the first t column coefficients in the same row coefficient, t being an integer of 1 or more and K-1 or less; a sixth prediction subunit, configured to predict, based on the context information, the (t+1)-th column coefficient in the same row coefficient, so as to obtain the (t+1)-th optimized predicted value corresponding to that column coefficient; and a third encoding subunit, configured to encode, based on each obtained t-th optimized predicted value, the t-th column coefficient in the same row coefficient, so as to obtain the coding information of the K column coefficients split from each of the row coefficients {C2, C3, …, CN}.
In some possible embodiments, the determining subunit is further configured to channel-splice the t-th column coefficients of the row coefficients {C2, C3, …, CN} to obtain the combination coefficient corresponding to the t-th column, and to determine the context information of the (t+1)-th column coefficient in each row coefficient based on the initial predicted value corresponding to that row coefficient and the combination coefficients corresponding to the first t columns.
The description of the apparatus embodiments above is similar to that of the method embodiments above, with similar advantageous effects as the method embodiments. In some embodiments, functions or modules included in the apparatus provided by the embodiments of the present disclosure may be used to perform the methods described in the method embodiments, and for technical details not disclosed in the apparatus embodiments of the present application, please understand with reference to the description of the method embodiments of the present application.
In the embodiment of the present application, if the image compression method is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application, or the portions thereof that contribute to the related art, may essentially be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (Read Only Memory, ROM), a magnetic disk, an optical disc, or other media capable of storing program code. Thus, the embodiments of the present application are not limited to any specific hardware, software, or firmware, or to any combination of hardware, software, and firmware.
The embodiment of the application provides a computer device, which comprises a memory and a processor, wherein the memory stores a computer program capable of running on the processor, and the processor executes the program to realize part or all of the steps of the method.
Embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs some or all of the steps of the above-described method. The computer readable storage medium may be transitory or non-transitory.
Embodiments of the present application provide a computer program comprising computer readable code which, when run in a computer device, performs some or all of the steps for implementing the above method.
Embodiments of the present application provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program which, when read and executed by a computer, performs some or all of the steps of the above-described method. The computer program product may be realized in particular by means of hardware, software or a combination thereof. In some embodiments, the computer program product is embodied as a computer storage medium, in other embodiments the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
It should be noted here that: the above description of various embodiments is intended to emphasize the differences between the various embodiments, the same or similar features being referred to each other. The above description of apparatus, storage medium, computer program and computer program product embodiments is similar to that of method embodiments described above, with similar advantageous effects as the method embodiments. For technical details not disclosed in the embodiments of the apparatus, storage medium, computer program and computer program product of the present application, please refer to the description of the method embodiments of the present application.
It should be noted that, fig. 7 is a schematic diagram of a hardware entity of a computer device in the embodiment of the present application, as shown in fig. 7, the hardware entity of the computer device 700 includes: a processor 701, a communication interface 702, and a memory 703, wherein: the processor 701 generally controls the overall operation of the computer device 700. Communication interface 702 may enable the computer device to communicate with other terminals or servers over a network.
The memory 703 is configured to store instructions and applications executable by the processor 701, and may also cache data (e.g., image data, audio data, voice communication data, and video communication data) to be processed or already processed by the modules in the processor 701 and the computer device 700. It may be implemented by flash memory (FLASH) or random access memory (RAM). Data are transferred among the processor 701, the communication interface 702, and the memory 703 via the bus 704.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should also be understood that, in the various embodiments of the present application, the sequence numbers of the steps/processes above do not imply an order of execution; the execution order of each step/process should be determined by its function and internal logic, and the sequence numbers should not constitute any limitation on the implementation of the embodiments of the present application. The numbering of the foregoing embodiments is for description only and does not represent the relative merits of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The device embodiments described above are only illustrative. For example, the division of the units is only a logical functional division; in practice there may be other divisions, such as: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms. The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may all be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit; the integrated units may be implemented in hardware, or in hardware plus software functional units. Those of ordinary skill in the art will appreciate that all or part of the steps for implementing the above method embodiments may be performed by hardware under the control of program instructions. The foregoing program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments; the aforementioned storage medium includes media that can store program code, such as a removable storage device, a read-only memory (ROM), a magnetic disk, or an optical disk.
Alternatively, if implemented in the form of software functional units and sold or used as a stand-alone product, the integrated units described above may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, or the part contributing to the related art, may be embodied in the form of a software product stored in a storage medium and including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes media that can store program code, such as a removable storage device, a ROM, a magnetic disk, or an optical disk.
The foregoing is merely an embodiment of the present application, but the scope of protection of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed in the present application shall fall within the scope of protection of the present application.

Claims (13)

1. An image compression method, the method comprising:
obtaining discrete cosine transform (DCT) coefficients of at least one color component corresponding to each frame of image in a video sequence to be compressed;
rearranging and grouping the DCT coefficients of the color components in frequency order to obtain a high-frequency coefficient matrix and a low-frequency coefficient matrix;
respectively carrying out context prediction and coding on the high-frequency coefficient matrix and the low-frequency coefficient matrix of the color component to obtain first coding information corresponding to the high-frequency coefficient matrix and second coding information corresponding to the low-frequency coefficient matrix;
and determining target compressed data corresponding to each frame of image based on the first coding information and the second coding information of the at least one color component.
2. The method according to claim 1, wherein the rearranging and grouping of the DCT coefficients of each color component in frequency order to obtain the high-frequency coefficient matrix and the low-frequency coefficient matrix comprises:
for the DCT coefficients of each color component, extracting coefficients of the same frequency to form a spatial dimension, and arranging coefficients of different frequencies along a channel dimension, to obtain multichannel DCT coefficients;
and dividing the multichannel DCT coefficients along the channel dimension in frequency order to obtain the high-frequency coefficient matrix and the low-frequency coefficient matrix.
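The rearrangement in claim 2 can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the array shapes, the use of row-major frequency ordering (rather than, say, a zigzag scan), and the split point `num_low` are all choices made here for concreteness.

```python
import numpy as np

def rearrange_dct(blocks, num_low=16):
    """Regroup 8x8 block DCT coefficients into a multichannel tensor.

    blocks: array of shape (block_rows, block_cols, 8, 8) holding the DCT
    coefficients of one color component, one 8x8 block per entry.
    Coefficients of the same frequency (the same position inside every
    8x8 block) form one spatial map; the 64 frequencies become the
    channel dimension.  The channel axis is then split in frequency
    order into a low-frequency part and a high-frequency part.
    """
    h, w = blocks.shape[:2]
    # (h, w, 8, 8) -> (h, w, 64): channel k holds frequency k of every block
    multi = blocks.reshape(h, w, 64)
    low = multi[..., :num_low]    # low-frequency coefficient matrix
    high = multi[..., num_low:]   # high-frequency coefficient matrix
    return low, high
```

For a component of, say, 2 x 3 blocks, this yields a low-frequency tensor of shape (2, 3, 16) and a high-frequency tensor of shape (2, 3, 48).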
3. The method according to claim 1 or 2, wherein coefficients of the same frequency in the DCT coefficients form a spatial dimension and coefficients of different frequencies form a channel dimension, and wherein the performing context prediction and encoding on the high-frequency coefficient matrix and the low-frequency coefficient matrix of each color component, respectively, to obtain first encoded information corresponding to the high-frequency coefficient matrix and second encoded information corresponding to the low-frequency coefficient matrix comprises:
acquiring prior probability distribution parameters estimated for the DCT coefficients of each color component;
performing context prediction and encoding on the high-frequency coefficient matrix in the spatial dimension by using the prior probability distribution parameters to obtain the first encoded information corresponding to the high-frequency coefficient matrix;
and performing context prediction and encoding on the low-frequency coefficient matrix in the spatial dimension and the channel dimension, respectively, by using the prior probability distribution parameters to obtain the second encoded information corresponding to the low-frequency coefficient matrix.
4. The method of claim 3, wherein performing context prediction and encoding on the high frequency coefficient matrix in the spatial dimension by using the prior probability distribution parameter to obtain the first encoded information corresponding to the high frequency coefficient matrix includes:
in the spatial dimension, disassembling the high-frequency coefficient matrix to obtain M row coefficients, M being a positive integer;
predicting and encoding the M row coefficients by using the prior probability distribution parameters in a multistage spatial autoregressive mode to obtain first encoding information corresponding to the high-frequency coefficient matrix; wherein the prediction process of the Mth row coefficient depends on the first M-1 row coefficients.
5. The method according to claim 4, wherein predicting and encoding the M row coefficients by using the prior probability distribution parameter through a multi-level spatial autoregressive manner to obtain the first encoded information corresponding to the high frequency coefficient matrix includes:
based on the prior probability distribution parameters, predicting a 1 st row coefficient in the M row coefficients to obtain a 1 st predicted value;
performing channel connection on the first i row coefficients among the M row coefficients and the prior probability distribution parameters to obtain an i-th fusion parameter, i being an integer greater than or equal to 1 and less than M;
predicting the (i+1) th row coefficient based on the (i) th fusion parameter to obtain an (i+1) th predicted value;
and respectively encoding the ith row coefficient based on the ith predicted value to obtain first encoded information corresponding to the high-frequency coefficient matrix.
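The row-wise (spatial) autoregression of claims 4-5 can be sketched as below. `predict_fn` is a hypothetical stand-in for the learned context model, which the claims do not specify; the sketch only shows the dependency structure, in which row 1 is predicted from the prior alone and each later row from the channel connection of all earlier rows with the prior.

```python
import numpy as np

def spatial_autoregressive_predict(rows, prior, predict_fn):
    """Multistage spatial autoregression over M row coefficients.

    rows: list of M 1-D arrays (the row coefficients of the high-frequency
    matrix); prior: the estimated prior probability-distribution
    parameters; predict_fn: placeholder for the context model, mapping
    fused parameters to a predicted value.
    """
    preds = [predict_fn(prior)]  # 1st predicted value: prior only
    for i in range(1, len(rows)):
        # i-th fusion parameter: first i rows concatenated with the prior,
        # so the prediction of row i+1 depends on all preceding rows.
        fused = np.concatenate([np.concatenate(rows[:i]), prior])
        preds.append(predict_fn(fused))  # (i+1)-th predicted value
    return preds
```

Each row coefficient is then encoded against its own predicted value, which is what makes the scheme causal: the decoder can rebuild the same predictions row by row.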
6. The method according to any one of claims 3 to 5, wherein said performing context prediction and coding on the low frequency coefficient matrix in the spatial dimension and the channel dimension respectively by using the prior probability distribution parameter to obtain second coding information corresponding to the low frequency coefficient matrix includes:
disassembling the low-frequency coefficient matrix in the spatial dimension to obtain N row coefficients, N being a positive integer different from M;
based on the prior probability distribution parameters, predicting and encoding the N row coefficients to obtain encoding information corresponding to the N row coefficients;
for each row coefficient in the N row coefficients, disassembling in the channel dimension to obtain K column coefficients corresponding to each row coefficient;
predicting and encoding K column coefficients corresponding to each row coefficient in a multistage channel autoregressive mode to obtain encoded information corresponding to the K column coefficients;
and determining second coding information corresponding to the low-frequency coefficient matrix based on the coding information corresponding to the N row coefficients and the coding information corresponding to the K column coefficients in each row coefficient.
7. The method according to claim 6, wherein predicting and encoding the N line coefficients based on the prior probability distribution parameter to obtain encoded information corresponding to the N line coefficients includes:
based on the prior probability distribution parameters, predicting the 1st row coefficient among the N row coefficients to obtain an initial predicted value corresponding to the 1st row coefficient;
connecting the 1 st row coefficient in the N row coefficients with the prior probability distribution parameters to obtain shared distribution parameters;
based on the sharing distribution parameters, predicting other row coefficients except the 1 st row coefficient in the N row coefficients respectively to obtain initial predicted values corresponding to the other row coefficients respectively;
and respectively encoding the same line coefficient based on the initial predicted value corresponding to each line coefficient in the N line coefficients to obtain encoding information corresponding to the N line coefficients.
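Claim 7's shared-parameter scheme can be sketched as follows (assumptions as before: `predict_fn` is a hypothetical stand-in for the context model). The point of the structure is that rows 2..N are not chained to one another; they all draw on one shared distribution parameter built from row 1 and the prior.

```python
import numpy as np

def predict_low_freq_rows(rows, prior, predict_fn):
    """Predict the N low-frequency row coefficients per claim 7 (sketch).

    Row 1 is predicted from the prior alone; a shared distribution
    parameter is then formed by connecting row 1 with the prior, and
    every remaining row is predicted from that single shared parameter.
    """
    preds = [predict_fn(prior)]               # initial value for row 1
    shared = np.concatenate([rows[0], prior])  # shared distribution parameters
    for _ in rows[1:]:
        preds.append(predict_fn(shared))       # rows 2..N, all from shared params
    return preds
```

Compared with the fully chained spatial autoregression used for the high-frequency matrix, this costs only one extra dependency step regardless of N.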
8. The method according to claim 6 or 7, wherein the N row coefficients are, in order of spatial position, {C1, C2, C3, …, CN}, and wherein the predicting and encoding K column coefficients corresponding to each row coefficient in a multistage channel autoregressive manner to obtain encoded information corresponding to the K column coefficients comprises:
based on the initial predicted value corresponding to the row coefficient C1, predicting the 1st column coefficient among the K column coefficients disassembled from C1 to obtain the corresponding 1st optimized predicted value;
performing channel connection on the first j column coefficients among the K column coefficients disassembled from C1 and the initial predicted value corresponding to C1 to obtain a j-th channel parameter, j being an integer greater than or equal to 1 and less than or equal to K-1;
based on the j-th channel parameter, predicting the (j+1)-th column coefficient among the K column coefficients disassembled from C1 to obtain a (j+1)-th optimized predicted value;
and encoding each j-th column coefficient among the K column coefficients disassembled from C1 based on the corresponding j-th optimized predicted value to obtain the encoded information of the K column coefficients disassembled from C1.
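The channel autoregression over the K column coefficients of one row (claim 8) has the same causal shape as the spatial case, just along the channel axis. A minimal sketch, again with `predict_fn` as a hypothetical stand-in for the learned model:

```python
def channel_autoregressive(columns, init_pred, predict_fn):
    """Multistage channel autoregression over K column coefficients
    of a single row coefficient (claim 8, sketch).

    columns: the K column coefficients; init_pred: the row's initial
    predicted value; predict_fn: placeholder context model over a list
    of conditioning values.
    """
    # 1st optimized predicted value: from the initial prediction alone
    opt = [predict_fn([init_pred])]
    for j in range(1, len(columns)):
        # j-th channel parameter: first j columns plus the initial value
        channel_param = columns[:j] + [init_pred]
        opt.append(predict_fn(channel_param))  # (j+1)-th optimized value
    return opt
```

Each column is then encoded against its optimized predicted value, mirroring the row-level encoding step.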
9. The method according to any one of claims 6 to 8, wherein the N row coefficients are, in order of spatial position, {C1, C2, C3, …, CN}, and wherein the predicting and encoding the K column coefficients corresponding to each row coefficient in a multistage channel autoregressive manner to obtain encoded information corresponding to the K column coefficients further comprises:
based on the initial predicted value corresponding to each row coefficient in {C2, C3, …, CN}, predicting the 1st column coefficient disassembled from the same row coefficient to obtain the 1st optimized predicted value corresponding to that 1st column coefficient;
determining, based on the initial predicted value corresponding to each row coefficient in {C2, C3, …, CN} and the first t column coefficients in the same row coefficient, the context information of the (t+1)-th column coefficient in the same row coefficient, t being an integer greater than or equal to 1 and less than or equal to K-1;
based on the context information, predicting the (t+1)-th column coefficient in the same row coefficient to obtain the (t+1)-th optimized predicted value corresponding to the (t+1)-th column coefficient;
and encoding each t-th column coefficient in the same row coefficient based on the obtained t-th optimized predicted value to obtain the encoded information of the K column coefficients disassembled from each row coefficient in {C2, C3, …, CN}.
10. The method according to claim 9, wherein the determining, based on the initial predicted value corresponding to each row coefficient in {C2, C3, …, CN} and the first t column coefficients in the same row coefficient, the context information of the (t+1)-th column coefficient in each row coefficient comprises:
performing channel splicing on the t-th column coefficient of each row coefficient in {C2, C3, …, CN} to obtain a combination coefficient corresponding to the t-th column;
and determining the context information of the (t+1)-th column coefficient in each row coefficient based on the initial predicted value corresponding to each row coefficient and the combination coefficients corresponding to the first t columns in each row coefficient.
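The context construction of claims 9-10 can be sketched as below. This is an assumption-laden illustration: the representation of the context as a dict and the flat-list data layout are choices made here, not details from the patent.

```python
def column_context(rows_columns, init_preds, t):
    """Context for the (t+1)-th column of rows C2..CN (claim 10, sketch).

    rows_columns: per-row lists of K column coefficients for C2..CN;
    init_preds: the initial predicted value of each of those rows.
    """
    # Combination coefficient for each of the first t columns:
    # channel-splice column s across all rows C2..CN.
    combos = [[cols[s] for cols in rows_columns] for s in range(t)]
    # Context for the (t+1)-th column of each row: the row's initial
    # predicted value together with the combination coefficients.
    return [{"init": init, "combos": combos} for init in init_preds]
```

Splicing the same column across rows lets the model exploit cross-row correlation at a given frequency, on top of the within-row channel autoregression.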
11. An image compression apparatus, the apparatus comprising:
the acquisition module is used for acquiring DCT coefficients of at least one color component corresponding to each frame of image in the video sequence to be compressed;
the grouping module is used for rearranging and grouping the DCT coefficients of each color component in frequency order to obtain a high-frequency coefficient matrix and a low-frequency coefficient matrix;
the prediction module is used for respectively performing context prediction and encoding on the high-frequency coefficient matrix and the low-frequency coefficient matrix of each color component to obtain first encoded information corresponding to the high-frequency coefficient matrix and second encoded information corresponding to the low-frequency coefficient matrix;
and the determining module is used for determining target compressed data corresponding to each frame of image based on the first coding information and the second coding information of the at least one color component.
12. A computer device comprising a memory and a processor, the memory storing a computer program executable on the processor, characterized in that the processor implements the steps of the method of any one of claims 1 to 10 when executing the program.
13. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 10.
CN202310068109.8A 2023-01-12 2023-01-12 Image compression method, device, equipment and storage medium Pending CN116095333A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310068109.8A CN116095333A (en) 2023-01-12 2023-01-12 Image compression method, device, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN116095333A true CN116095333A (en) 2023-05-09

Family

ID=86209980




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination