CN113196779B - Method and device for compressing video clip


Info

Publication number
CN113196779B
Authority
CN
China
Prior art keywords
ics
frame
channel
data
image block
Prior art date
Legal status: Active
Application number
CN201980066142.9A
Other languages
Chinese (zh)
Other versions
CN113196779A (en)
Inventor
戴维·J·白瑞迪
严雪飞
张卫平
于长志
Current Assignee
Wuxi Ankedi Intelligent Technology Co ltd
Original Assignee
Wuxi Ankedi Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Wuxi Ankedi Intelligent Technology Co ltd filed Critical Wuxi Ankedi Intelligent Technology Co ltd
Publication of CN113196779A publication Critical patent/CN113196779A/en
Application granted granted Critical
Publication of CN113196779B publication Critical patent/CN113196779B/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50: ... using predictive coding
    • H04N 19/503: ... using predictive coding involving temporal prediction
    • H04N 19/51: Motion estimation or motion compensation
    • H04N 19/59: ... using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
    • H04N 19/593: ... using predictive coding involving spatial prediction techniques

Abstract

The invention discloses a compression method, a compression device, and a video recording system. The compression method comprises the following steps: reading multiple sets of original pixel values from a camera head, where each set of original pixel values corresponds to one frame of a video clip; performing intra-frame compression by compressing each set of original pixel values into an intra-frame compressed sample (ICS) frame using a compression kernel, where the ICS frames include a first ICS frame and several remaining ICS frames (R-ICS frames) occurring after the first ICS frame, the compression kernel has Ncomp ICS channels, and Ncomp is an integer not less than 1; quantizing the first ICS frame and quantizing the R-ICS frames into QR-ICS frames, where the quantized first ICS frame comprises Ncomp single-channel quantized first ICS frames, each corresponding to one ICS channel of the quantized first ICS frame, and each QR-ICS frame comprises Ncomp sub-QR-ICS frames, each corresponding to one ICS channel of the QR-ICS frame; for each sub-QR-ICS frame corresponding to each ICS channel, performing image block matching subtraction against the single-channel quantized first ICS frame of the same ICS channel and generating an image-block-subtracted ICS frame, where one or more motion vectors correspond to one sub-QR-ICS frame, each motion vector represents the relative positioning between a matching image block in the sub-QR-ICS frame and a reference image block in the single-channel quantized first ICS frame of that ICS channel, and the motion vectors determined for one ICS channel are shared by the other ICS channels; for the image-block-subtracted ICS frames corresponding to each ICS channel, grouping the image-block-subtracted ICS frames into stacks, where each stack comprises a preset number of image-block-subtracted ICS frames; and, for the image-block-subtracted ICS frames in each stack corresponding to each ICS channel, determining shared data representing similar data among the image-block-subtracted ICS frames, and determining stack residual frames based on the shared data.

Description

Method and device for compressing video clip
Technical Field
The present invention relates to a method and apparatus for compressing video clips, and more particularly to a low-power method of video clip compression.
Background
The general function of a camera is to convert parallel optical data into a compressed, continuous electronic format in order to transmit or store information. In some embodiments, the parallel optical data may correspond to a video clip.
A video clip is composed of a plurality of frames, and those frames may have high resolution, which can slow down data transmission. This means the video clip must be compressed in some way before its data is transmitted.
Operating power is a critical limit on a camera's pixel capacity. A conventional camera includes a focal plane and a "system on a chip" image processing platform. An Image Signal Processing (ISP) chip performs a variety of image processing functions, such as demosaicing and non-uniformity correction, including image compression.
Standard compression algorithms are typically implemented in dedicated hardware circuits to improve power efficiency and speed, yet these algorithms still require significant power and memory resources. In general, the ISP chip also requires power and chip area for image processing. ISP chips typically use more power than image-sensor capture and data readout, because each pixel requires multiple digital processing steps on average.
Reconstructed image quality is another measure of a compression method.
Based on the above, it may be desirable to employ a method that takes operating power, reconstruction quality, and similar factors into account.
Disclosure of Invention
One aspect of the present disclosure relates to a method of compressing a video clip. The compression method may include one or more of the following operations: reading multiple sets of original pixel values from a camera head, each set of original pixel values corresponding to one frame of the video clip; performing intra-frame compression by compressing each set of original pixel values into an intra-frame compressed sample (ICS) frame using a compression kernel, where the ICS frames include a first ICS frame and several remaining ICS frames (R-ICS frames) occurring after the first ICS frame, the compression kernel has Ncomp ICS channels, and Ncomp is an integer not less than 1; quantizing the first ICS frame and quantizing the R-ICS frames into QR-ICS frames, where the quantized first ICS frame comprises Ncomp single-channel quantized first ICS frames, each corresponding to one ICS channel of the quantized first ICS frame, and each QR-ICS frame comprises Ncomp sub-QR-ICS frames, each corresponding to one ICS channel of the QR-ICS frame; for each sub-QR-ICS frame corresponding to each ICS channel, performing image block matching subtraction against the single-channel quantized first ICS frame of that ICS channel and generating an image-block-subtracted ICS frame, where one or more motion vectors correspond to one sub-QR-ICS frame, each motion vector represents the relative positioning between a matching image block in the sub-QR-ICS frame and a reference image block in the single-channel quantized first ICS frame of the same ICS channel, and the motion vectors determined for one ICS channel are shared by the other ICS channels; for the image-block-subtracted ICS frames corresponding to each ICS channel, grouping the image-block-subtracted ICS frames into stacks, where each stack comprises a preset number of image-block-subtracted ICS frames; and, for the image-block-subtracted ICS frames in each stack corresponding to each ICS channel, determining shared data representing similar data among the image-block-subtracted ICS frames, and determining stack residual frames from the shared data.
In some embodiments, compressing each set of original pixel values into an ICS frame using the compression kernel comprises: for each frame of the video clip, compressing each grouped portion of the original pixel values into an integer using the compression kernel, where each grouped portion of the original pixel values corresponds to one segment of the frame.
In some embodiments, performing image block matching subtraction on the sub-QR-ICS frames corresponding to each ICS channel against the quantized first ICS frame comprises: for each sub-QR-ICS frame corresponding to one ICS channel, performing motion prediction based on image block matching between the image blocks of the sub-QR-ICS frame and the single-channel quantized first ICS frame of that ICS channel, where the sub-QR-ICS frame is divided into a plurality of image blocks, with neither gaps nor overlaps, to facilitate image block searching in the single-channel quantized first ICS frame; and determining an image-block-subtracted ICS frame by subtracting one or more matching image blocks from each sub-QR-ICS frame.
In some embodiments, performing motion prediction based on image block matching between the image blocks of a sub-QR-ICS frame and the single-channel quantized first ICS frame of the corresponding ICS channel includes: defining the sub-QR-ICS frame corresponding to an ICS channel as a matching frame, defining the single-channel quantized first ICS frame corresponding to that ICS channel as a search frame, and performing a hierarchical image block search in the search frame, where the matching frame is divided into a plurality of associated image blocks, and the hierarchical image block search comprises: for each associated image block in the matching frame, performing an image block search in the search frame with a step-size area, where the step-size area is a preset integer not less than 1; during the image block search in the search frame, computing the squared difference between each image block in the search frame and the associated image block in the matching frame; if the lowest squared difference is smaller than a preset threshold, determining the target image block in the search frame having the lowest squared difference as a reference image block, and determining the associated image block as a matching image block; if the lowest squared difference is not smaller than the preset threshold and the image block area of the associated image block is larger than a preset minimum image block area, defining the associated image block as a matching frame, defining the target image block as a search frame, and performing the hierarchical image block search in that search frame; and repeating the hierarchical image block search until a reference image block whose squared difference is smaller than the preset threshold is found for the associated image block, or until the image block area of the associated image block is not larger than the preset minimum image block area.
In some embodiments, each stack contains the same number of image-block-subtracted ICS frames.
In some embodiments, determining shared data and determining a stack residual frame for the image-block-subtracted ICS frames in each stack corresponding to each ICS channel comprises: for each stack corresponding to the ICS channel, determining shared data by convolving the image-block-subtracted ICS frames in the stack with a first kernel; for each stack corresponding to the ICS channel, determining quantized shared data by quantizing the values in the shared data to integers of a preset bit width; for each stack corresponding to the ICS channel, rescaling the quantized shared data to RQ shared data; for each stack corresponding to the ICS channel, reshaping the RQ shared data into RRQ shared data by deconvolving with a second kernel; and, for each image-block-subtracted ICS frame in each stack corresponding to the ICS channel, determining a stack residual frame by subtracting the RRQ shared data from the image-block-subtracted ICS frame.
In some embodiments, determining shared data and determining a stack residual frame from the shared data for the image-block-subtracted ICS frames in each stack corresponding to each ICS channel comprises: for each stack corresponding to the ICS channel, computing, with weighted summation parameters, a weighted sum of the values at the same position in the image-block-subtracted ICS frames, thereby compressing the image-block-subtracted ICS frames into a weighted-sum frame; for each stack corresponding to the ICS channel, determining shared data by convolving the weighted-sum frame with a first kernel; for each stack corresponding to the ICS channel, determining quantized shared data by quantizing the values in the shared data to integers of a preset bit width; for each stack corresponding to the ICS channel, rescaling the quantized shared data to RQ shared data; for each stack corresponding to the ICS channel, reshaping the RQ shared data into RRQ shared data by deconvolving with a second kernel; and, for each image-block-subtracted ICS frame in each stack corresponding to the ICS channel, determining a stack residual frame by subtracting the RRQ shared data from the image-block-subtracted ICS frame.
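As a small illustration of the weighted-summation step only (not of the full pipeline), the following numpy sketch collapses one stack into a weighted-sum frame; all shapes and weights here are assumptions for the example:

```python
import numpy as np

# Hypothetical sketch: Nfp image-block-subtracted ICS frames in one stack
# are collapsed into a single weighted-sum frame before the first-kernel
# convolution. The weights stand in for learned weighted summation parameters.
Nfp, Nx, Ny = 3, 256, 480
stack = np.random.randn(Nfp, Nx, Ny)
w = np.array([0.5, 0.3, 0.2])                        # assumed weights
weighted_sum_frame = np.tensordot(w, stack, axes=1)  # shape [Nx, Ny]
assert weighted_sum_frame.shape == (Nx, Ny)
```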
In some embodiments, the method further comprises: for each stack corresponding to the ICS channel, compressing each stack residual frame with a third kernel; for each stack corresponding to the ICS channel, determining quantized compressed stack residual frames by quantizing the values in each compressed stack residual frame to integers of a preset bit width; and performing entropy encoding operations, respectively, on the quantized shared data corresponding to each ICS channel, the quantized compressed stack residual frames corresponding to each ICS channel, and the motion vectors in each stack shared by the ICS channels, where the entropy-encoded quantized shared data corresponding to each ICS channel, the entropy-encoded quantized compressed stack residual frames corresponding to each ICS channel, and the entropy-encoded motion vectors in each stack shared by the ICS channels are stored for decoding; the entropy encoding operation is based on a global data dictionary, which is built from a large amount of data of the same type.
In some embodiments, the method further comprises: for each stack corresponding to the ICS channel, performing entropy decoding operations on the entropy-encoded quantized shared data, the entropy-encoded quantized compressed stack residual frames, and their corresponding entropy-encoded motion vectors; for each stack corresponding to the ICS channel, rescaling each quantized compressed stack residual frame to an RQ compressed stack residual frame; for each stack corresponding to the ICS channel, decompressing each RQ compressed stack residual frame into a first decompressed ICS frame by deconvolving with a fourth kernel; for each stack corresponding to the ICS channel, reshaping the RQ shared data into RRQ shared data by deconvolving with the second kernel; for each first decompressed ICS frame in each stack corresponding to the ICS channel, determining a second decompressed ICS frame by adding to the first decompressed ICS frame the RRQ shared data and the one or more matching image blocks given by the stored motion vectors for that ICS channel; for the second decompressed ICS frames in each stack, determining a third decompressed ICS frame by superimposing the corresponding second decompressed ICS frames of all ICS channels; and, for the third decompressed ICS frame in each stack, performing intra-frame decompression with a decompression kernel and feeding the result into a Quality Improvement Neural Network (QINN) to determine a reconstructed frame; where the first through fourth kernels are shared by the stacks corresponding to the ICS channel.
In some embodiments, a time module includes the first through fourth kernels, or the time module includes the first through fourth kernels and the weighted summation parameters, and the parameters in the compression kernel, the time module, the decompression kernel, and the QINN are determined by sample-based training, the sample-based training process including: reading multiple sets of original pixel value samples, where each set of original pixel value samples corresponds to one frame; performing intra-frame compression by compressing each set of original pixel value samples into an ICS frame sample with an initial compression kernel, where the ICS frame samples include a first ICS frame sample and a number of R-ICS frame samples located after the first ICS frame sample, the initial compression kernel has Ncomp ICS channels, and Ncomp is an integer not less than 1; quantizing the first ICS frame sample and quantizing the R-ICS frame samples into QR-ICS frame samples, where the quantized first ICS frame sample comprises Ncomp single-channel quantized first ICS frame samples, each corresponding to one ICS channel of the quantized first ICS frame sample, and each QR-ICS frame sample comprises Ncomp sub-QR-ICS frame samples, each corresponding to one ICS channel of the QR-ICS frame sample; for the sub-QR-ICS frame samples corresponding to each ICS channel, performing image block matching subtraction against the single-channel quantized first ICS frame sample and generating image-block-subtracted ICS frame samples, where one or more motion vectors correspond to one sub-QR-ICS frame sample, each motion vector represents the relative positioning between a matching image block in the sub-QR-ICS frame sample and a reference image block in the single-channel quantized first ICS frame sample, and the motion vectors determined for one ICS channel are shared by the other ICS channels; for the image-block-subtracted ICS frame samples corresponding to each ICS channel, grouping the image-block-subtracted ICS frame samples into stacks, where each stack comprises a preset number of image-block-subtracted ICS frame samples; for the image-block-subtracted ICS frame samples in each stack corresponding to each ICS channel, determining shared data samples, compressed stack residual frame samples, and first decompressed ICS frame samples with an initial time module, where the initial time module includes a fifth kernel, a sixth kernel, a seventh kernel, and an eighth kernel, or includes the fifth through eighth kernels and initial weighted summation parameters; for each stack, determining reconstructed frame samples using an initial decompression kernel and an initial QINN; training the initial compression kernel and the initial QINN into an intermediate compression kernel and an intermediate QINN; and training the parameters of the initial time module by multi-graph joint loss training.
In some embodiments, training the parameters of the initial time module by multi-graph joint loss training includes: determining four computation graphs, each representing a process of applying the initial time module: a first computation graph G1 representing the process with both the first and second quantization points kept, a second computation graph G2 representing the process with only the first quantization point kept, a third computation graph G3 representing the process with only the second quantization point kept, and a fourth computation graph G4 representing the process with no quantization point kept; where the first quantization point represents quantized output data of the fifth kernel and the second quantization point represents quantized output data of the sixth kernel; and determining three optimization processes run in sequence during iterative training; where the first optimization process is set to train the parameters before the first quantization point to minimize a first total loss comprising DA_E from the first quantization point of G1, DA_E from the second quantization point of G3, and the reconstruction loss from G4; the second optimization process is set to train the parameters between the first quantization point and the second quantization point to minimize a second total loss comprising DA_E from the second quantization point of G1 and the reconstruction loss from G2; and the third optimization process is set to train the parameters after the second quantization point to minimize a third total loss comprising the reconstruction loss from G1; where DA_E denotes a differentiable approximation of entropy; the fifth through eighth kernels may be trained into fifth through eighth intermediate kernels by iteratively running the first, second, and third optimization processes to train the parameters in the initial time module, where the parameters of the fifth through eighth intermediate kernels are floating point numbers.
In some embodiments, the first computation graph G1 includes: determining data T2 by inputting data T1 into the first convolution layer with parameters Para(bQ1), where data T1 corresponds to the image-block-subtracted ICS frame samples of all stacks corresponding to each channel; determining data T2_Q by quantizing data T2 at the first quantization point; determining data T3 from data T2_Q, where in the process from T2_Q to T3 the trained parameters include a first deconvolution layer and a second convolution layer with parameters Para(aQ1, bQ2), and the process further includes a rescaling operation before the first deconvolution layer and a subtraction operation after the second convolution layer; determining data T3_Q by quantizing data T3 at the second quantization point, where data T3_Q corresponds to the quantized compressed stack residual frames; and determining data T4 from data T3_Q, where in the process from T3_Q to T4 the trained parameters include a second deconvolution layer with parameters Para(aQ2), and the process further includes a rescaling operation before the second deconvolution layer; where Para(bQ1) denotes the parameters in the fifth kernel, or the weighted summation parameters together with the parameters in the fifth kernel; Para(aQ1, bQ2) denotes the parameters in the sixth and seventh kernels; and Para(aQ2) denotes the parameters in the eighth kernel.
In some embodiments, the second computation graph G2 includes: determining data T2 by inputting data T1 into the first convolution layer with parameters Para(bQ1); determining data T2_Q by quantizing data T2 at the first quantization point; determining data T3 from data T2_Q, where in the process from T2_Q to T3 the trained parameters include a first deconvolution layer and a second convolution layer with parameters Para(aQ1, bQ2); and determining data T4(2) from data T3, where in the process from T3 to T4(2) the trained parameters include a second deconvolution layer with parameters Para(aQ2).
In some embodiments, the third computation graph G3 includes: determining data T2 by inputting data T1 into the first convolution layer with parameters Para(bQ1); determining data T3(3) from data T2, where in the process from T2 to T3(3) the trained parameters include a first deconvolution layer and a second convolution layer with parameters Para(aQ1, bQ2), and the process further includes a subtraction operation after the second convolution layer; determining data T3_Q(3) by quantizing data T3(3) at the second quantization point; and determining data T4(3) from data T3_Q(3), where in the process from T3_Q(3) to T4(3) the trained parameters include a second deconvolution layer with parameters Para(aQ2), and the process further includes a rescaling operation before the second deconvolution layer.
In some embodiments, the fourth computation graph G4 includes: determining data T2 by inputting data T1 into the first convolution layer with parameters Para(bQ1); determining data T3(4) from data T2, where in the process from T2 to T3(4) the trained parameters include a first deconvolution layer and a second convolution layer with parameters Para(aQ1, bQ2), and the process further includes a subtraction operation after the second convolution layer; and determining data T4(4) from data T3(4), where in the process from T3(4) to T4(4) the trained parameters include a second deconvolution layer with parameters Para(aQ2).
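To make the interplay of the four graphs and the three optimization processes concrete, here is a toy PyTorch sketch. It is not the patent's architecture: the layer shapes, the entropy surrogate da_e, and the straight-through quantizer are placeholder assumptions, and only the structure (G1 to G4 sharing parameters, three optimizers stepped in sequence) follows the text:

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(3, 8, 3, padding=1)               # stands in for the fifth kernel
deconv1 = nn.ConvTranspose2d(8, 8, 3, padding=1)    # sixth kernel analogue
conv2 = nn.Conv2d(8, 8, 3, padding=1)               # seventh kernel analogue
deconv2 = nn.ConvTranspose2d(8, 3, 3, padding=1)    # eighth kernel analogue

def quant(t):
    # Straight-through quantization point: round in the forward pass,
    # identity gradient in the backward pass.
    return t + (torch.round(t) - t).detach()

def da_e(t):
    # Toy stand-in for DA_E, a differentiable approximation of entropy.
    return t.abs().mean()

def run(x, q1, q2):
    t2 = conv1(x)
    t2q = quant(t2) if q1 else t2          # first quantization point
    t3 = conv2(deconv1(t2q))
    t3q = quant(t3) if q2 else t3          # second quantization point
    t4 = deconv2(t3q)
    return t2q, t3q, t4

x = torch.randn(1, 3, 32, 32)
opt1 = torch.optim.Adam(conv1.parameters())                            # before Q1
opt2 = torch.optim.Adam([*deconv1.parameters(), *conv2.parameters()])  # Q1..Q2
opt3 = torch.optim.Adam(deconv2.parameters())                          # after Q2

for _ in range(100):
    # First process: DA_E at Q1 of G1, DA_E at Q2 of G3, reconstruction of G4.
    g1, g3, g4 = run(x, True, True), run(x, False, True), run(x, False, False)
    loss1 = da_e(g1[0]) + da_e(g3[1]) + (g4[2] - x).pow(2).mean()
    opt1.zero_grad(); loss1.backward(); opt1.step()

    # Second process: DA_E at Q2 of G1 plus reconstruction loss of G2.
    g1, g2 = run(x, True, True), run(x, True, False)
    loss2 = da_e(g1[1]) + (g2[2] - x).pow(2).mean()
    opt2.zero_grad(); loss2.backward(); opt2.step()

    # Third process: reconstruction loss of G1.
    g1 = run(x, True, True)
    loss3 = (g1[2] - x).pow(2).mean()
    opt3.zero_grad(); loss3.backward(); opt3.step()
```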
In some embodiments, the method further comprises: determining the first kernel by performing an integer transformation on the parameters in the fifth intermediate kernel, determining the second kernel by performing an integer transformation on the parameters in the sixth intermediate kernel, and determining the third kernel by performing an integer transformation on the parameters in the seventh intermediate kernel; determining the fourth kernel by fine-tuning the parameters in the eighth intermediate kernel; and determining the compression kernel and the QINN by fine-tuning the parameters in the intermediate compression kernel and the intermediate QINN.
Another aspect of the present disclosure relates to an apparatus for compressing a video clip, comprising: a readout unit configured to read out multiple sets of original pixel values from a camera head; and a processor configured to perform multi-frame compression of a video clip, where the compression comprises: performing intra-frame compression by compressing each set of original pixel values into an intra-frame compressed sample (ICS) frame using a compression kernel, where the ICS frames comprise a first ICS frame and several R-ICS frames occurring after the first ICS frame, the compression kernel has Ncomp ICS channels, and Ncomp is an integer not less than 1; quantizing the first ICS frame and quantizing the R-ICS frames into QR-ICS frames, where the quantized first ICS frame comprises Ncomp single-channel quantized first ICS frames, each corresponding to one ICS channel of the quantized first ICS frame, and each QR-ICS frame comprises Ncomp sub-QR-ICS frames, each corresponding to one ICS channel of the QR-ICS frame; for the sub-QR-ICS frames corresponding to each ICS channel, performing image block matching subtraction against the single-channel quantized first ICS frame of that ICS channel and generating image-block-subtracted ICS frames, where one or more motion vectors correspond to one sub-QR-ICS frame, each motion vector represents the relative positioning between a matching image block in the sub-QR-ICS frame and a reference image block in the single-channel quantized first ICS frame, and the motion vectors determined for one ICS channel are shared by the other ICS channels; for the image-block-subtracted ICS frames corresponding to each ICS channel, grouping the image-block-subtracted ICS frames into stacks, where each stack comprises a preset number of image-block-subtracted ICS frames; and, for the image-block-subtracted ICS frames in each stack corresponding to each ICS channel, determining shared data representing similar data among the image-block-subtracted ICS frames, and determining stack residual frames from the shared data; where each set of original pixel values corresponds to one frame of the video clip.
Additional features will be set forth in part in the description that follows, and in part will become apparent to those skilled in the art upon examination of that description or may be learned by practicing the exemplary embodiments described herein. The features of the present invention may be realized and attained by practice or use of the various aspects of the methodology discussed in the detailed examples that follow.
Drawings
The present invention will be further described in exemplary embodiments. These exemplary embodiments will be described in detail with reference to the accompanying drawings. These embodiments are non-limiting exemplary embodiments in which like reference numerals represent similar structures throughout the several views of the drawings. Wherein:
FIG. 1 illustrates an example of an original raw Bayer picture, according to some embodiments of the invention;
FIG. 2 illustrates a method of intra-frame compression according to some embodiments of the invention;
FIG. 3 illustrates a convolution process as described in 204 according to some embodiments of the invention;
FIG. 4 illustrates one example of an intra-frame compression process using frame policy one as described in 204, according to some embodiments of the invention;
FIG. 5 is an example of a compression kernel in a raw Bayer data single-layer convolution 2D compression process according to some embodiments of the invention;
FIG. 6 is the integer array of shape [256,480,4] obtained after the input pixel values have been compressed using a compression kernel;
FIG. 7 illustrates a method of determining shared data after an intra-frame compression process according to some embodiments of the invention;
FIG. 8 illustrates an exemplary process of FIG. 7 according to some embodiments of the invention;
FIG. 9 illustrates an exemplary image block matching subtraction method according to some embodiments of the invention;
FIG. 10 is an exemplary image block matching based motion prediction method according to some embodiments of the invention;
FIG. 11 is an exemplary shared data based compression method according to some embodiments of the invention;
FIG. 12 illustrates an exemplary process of FIG. 11 in accordance with some embodiments of the invention;
FIG. 13 illustrates a method of reconstructing a video segment according to some embodiments of the present invention;
FIG. 14 illustrates an exemplary process of FIG. 13 in accordance with some embodiments of the invention;
FIG. 15 illustrates a sample-based training method according to some embodiments of the invention;
FIG. 16 illustrates an exemplary multi-graph joint loss training in accordance with some embodiments of the invention;
FIG. 17 illustrates an exemplary process of the first computational graph G1, according to some embodiments of the invention;
FIG. 18 illustrates an exemplary process of the second computational graph G2 according to some embodiments of the invention;
FIG. 19 illustrates an exemplary process of the third computation graph G3 according to some embodiments of the invention;
FIG. 20 illustrates an exemplary process of the fourth computational graph G4, according to some embodiments of the invention;
FIG. 21 illustrates exemplary four computation graphs, according to some embodiments of the invention;
FIG. 22 illustrates an exemplary time module determination method according to some embodiments of the invention;
FIG. 23 is a schematic view of a compression device according to some embodiments of the present invention;
FIG. 24 is a schematic diagram of a video recording system according to some embodiments of the present invention.
Detailed Description
In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant disclosure. It will be apparent, however, to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, systems, components, and/or circuits have been described at a relatively high-level, without detail, in order to avoid unnecessarily obscuring aspects of the present invention. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present invention. Thus, the present invention is not limited to the embodiments shown, but is to be accorded the widest scope consistent with the claims.
It will be understood that the terms "system," "engine," "unit," "module," and/or "block" as used herein are a way of distinguishing, in ascending order, different components, elements, components, parts, or assemblies at different levels. However, these terms may be substituted by other expressions if the same object can be achieved.
Generally, the words "module," "unit," or "block" as used herein refer to logic embodied in hardware or firmware, or to a collection of software instructions. The modules, units, or blocks described herein may be implemented as software and/or hardware and may be stored in any type of non-transitory computer-readable medium or other storage device. In some embodiments, software modules/units/blocks may be compiled and linked into an executable program. It should be understood that software modules may be callable from other modules/units/blocks or from themselves, and/or may be invoked in response to detected events or interrupts. Software modules/units/blocks configured for execution on a computing device may be provided on a computer-readable medium, such as a compact disc, a digital video disc, a flash drive, a magnetic disk, or any other tangible medium, or as a digital download (and may be originally stored in a compressed or installable format that requires installation, decompression, or decryption prior to execution). Such software code may be stored, in part or in whole, on a storage device of the executing computing device, for execution by the computing device. Software instructions may be embedded in firmware, such as an EPROM. It should also be understood that hardware modules/units/blocks may be included in connected logic components, such as gates and flip-flops, and/or may include programmable units, such as programmable gate arrays or processors. The modules/units/blocks or computing device functions described herein may be implemented as software modules/units/blocks, but may also be represented in hardware or firmware. Generally, a module/unit/block described herein refers to a logical module/unit/block that may be combined with other modules/units/blocks or divided into sub-modules/sub-units/sub-blocks, regardless of their physical organization or storage. The description may apply to the system, the engine, or a portion thereof.
It will be understood that when an element, engine, module, or block is referred to as being "on," "connected to," or "coupled to" another element, engine, module, or block, it can be directly on, connected, or coupled to the other element, engine, module, or block, or an intervening element, engine, module, or block may be present, unless the context clearly dictates otherwise. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
These and other features of the present invention, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description with reference to the accompanying drawings, all of which form a part of this disclosure. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the disclosure. It is understood that the drawings are not to scale.
The terminology used herein is for the purpose of describing particular examples and embodiments only and is not intended to be limiting. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this disclosure, specify the presence of integers, devices, acts, features, steps, elements, operations, and/or components, but do not preclude the presence or addition of one or more other integers, devices, acts, features, steps, elements, operations, components, and/or groups thereof.
The invention discloses a compression method, a compression device, and a video recording system, which will be described in detail in the following examples.
In some embodiments, light received by the camera may be read out by an Image Signal Processing (ISP) chip as raw Bayer data, and the camera reads out the raw Bayer data as either a parallel stream or a serial stream. Fig. 1 shows an example of a raw Bayer picture according to some embodiments of the present invention. As shown in fig. 1, the raw Bayer picture has shape [2048, 3840], and each pixel has a corresponding pixel value. The pixel values may be read out sequentially after the camera head captures the frame.
For each electronic sensor array, the readout data from one focal plane can be presented in raster format, meaning that it is read out sequentially in rows. The input pixel values (raw Bayer data) have shape [2048, 3840]. In some embodiments, the raw Bayer picture of shape [2048, 3840] is compressed into an integer array of shape [256, 480, 4], as will be described with reference to FIG. 6. Pixels may also correspond to different colors, typically red, green, and blue, but the color values are typically mosaiced at the sensor, so that a given pixel corresponds to a given color.
Pixel values in a conventional camera may need to be buffered for demosaicing, image processing (non-uniformity correction, color space conversion, de-noising, sharpening, white balance and black level adjustment, etc.), and compression. Since compression is performed as a two-dimensional transform, many lines (typically 8) must be buffered, and each pixel value must be accumulated in several transform buffers. Furthermore, scaling of pixel values by a quantization matrix and compression (Huffman) coding must be performed on each image information set.
In some embodiments, intra-frame compression may be performed on the raw Bayer data stream. FIG. 2 illustrates a method of intra-frame compression, according to some embodiments of the invention. In some embodiments, the method of intra-frame compression may be performed in an electronic device connected to the camera head.
In 202, multiple sets of raw pixel values may be read out sequentially from one camera head. In some embodiments, the raw pixel values may be read out sequentially as raw Bayer data. For example, a camera head may capture a video clip, producing multiple sets of raw pixel values, each set corresponding to one frame of the video clip. Each pixel may be represented by a pixel value, and the pixel values may be transmitted in binary form.
At 204, intra-frame compression may be performed by compressing each set of raw pixel values into one intra-frame compressed sample (ICS) frame with a compression kernel. For each frame, the compression may reduce each portion of the set of original pixel values to an integer with the compression kernel, where each portion corresponds to one segment of the frame. In some embodiments, a segment may be a 2D image block or a 1D segment, as described in the frame strategies below. In some embodiments, the ICS frames include a first ICS frame and several remaining ICS frames (R-ICS frames) occurring after the first ICS frame. In some embodiments, the compression kernel may have Ncomp ICS channels, where Ncomp is an integer not less than 1.
In some embodiments, the elements in the compression kernel may be integers, which is convenient for hardware implementations such as FPGAs. For example, the elements in a compression kernel may be binary numbers with a bit width of 12, 10, 8, 6, 4, or 2 bits. When an element is a 2-bit binary number, the element may be -1 or +1, or the element may be 0 or 1.
Frame strategy one
In some embodiments, a set of original pixel values may correspond to the pixels in a frame, and the compression kernel may be a 2D kernel; in this case the frame may be divided into a plurality of 2D image blocks, where a 2D image block and the 2D kernel have the same size. For example, if the 2D kernel has size [k_x, k_y], a frame of shape [N_X, N_Y] can be divided into [N_x, N_y] 2D image blocks, where N_x = N_X / k_x and N_y = N_Y / k_y. The pixel values corresponding to the pixels in a particular image block, of shape [k_x, k_y, 1], may be multiplied by a kernel of shape [k_x, k_y, Ncomp], so that the pixel values in the 2D image block are compressed into Ncomp numbers (Ncomp is a manually defined preset integer). Finally, the original pixel values of the input frame may be compressed into COMP, an array of dimensions [N_x, N_y, Ncomp], where Ncomp represents the number of ICS channels of COMP. The intra-frame compression process may be a 2D convolution operation, as shown in the following equation:
COMP[x, y, k] = Σ_i Σ_j P[(x-1)·k_x + i, (y-1)·k_y + j] · K[i, j, k]
where P denotes the input pixel values, K denotes the compression kernel, the indexes i and j iterate from 1 to k_x and from 1 to k_y respectively, and the index k runs from 1 to Ncomp.
The compression ratio, ignoring the difference between the bit width of the input pixel values (original pixel values are typically 8 or 10 bits) and the bit width of the compressed digital array (8 bits), can be expressed as Ncomp / (k_x · k_y). In some embodiments, various [k_x, k_y, Ncomp] can be set to achieve different compression ratios. For example, 2D kernels of [16,16,16], [8,8,4], [16,16,8], and [16,16,1] achieve compression ratios of 1/16, 1/16, 1/32, and 1/256, respectively.
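A minimal numpy sketch of frame strategy one follows; the frame contents, the random integer kernel, and the dtypes are assumptions for the example, not values required by the method:

```python
import numpy as np

def intra_frame_compress_2d(frame, kernel):
    """Frame strategy one: non-overlapping 2D block compression.

    frame  : [N_X, N_Y] raw pixel values (e.g. raw Bayer data)
    kernel : [k_x, k_y, Ncomp] integer compression kernel
    returns: [N_X // k_x, N_Y // k_y, Ncomp] array COMP
    """
    kx, ky, ncomp = kernel.shape
    NX, NY = frame.shape
    nx, ny = NX // kx, NY // ky
    # Split the frame into non-overlapping [k_x, k_y] image blocks.
    blocks = frame[:nx * kx, :ny * ky].reshape(nx, kx, ny, ky).transpose(0, 2, 1, 3)
    # Each block is reduced to Ncomp integers: an element-wise
    # product-and-sum against each of the Ncomp kernel planes.
    return np.einsum('xyij,ijk->xyk', blocks, kernel)

# With the shapes used in the text, a [2048, 3840] raw Bayer frame and an
# [8, 8, 4] kernel give a [256, 480, 4] integer array (compare FIG. 6).
frame = np.random.randint(0, 1024, size=(2048, 3840), dtype=np.int64)
kernel = np.random.randint(-7, 8, size=(8, 8, 4), dtype=np.int64)
comp = intra_frame_compress_2d(frame, kernel)
assert comp.shape == (256, 480, 4)
```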
Frame strategy two
In some embodiments, a set of original pixel values may correspond to the pixels in a frame, and the compression kernel may be a 1D kernel. Since the pixels in a frame are transmitted sequentially, the pixel sequence corresponding to the frame may be divided into segments. The 1D compression kernel may be an integer vector of the same size as a segment, and the intra-frame compression process may be a 1D convolution operation in which the 1D kernel combines the pixels of one 1D segment of the frame into one integer.
In some embodiments, each component of the integer vector may be -1 or +1. In some embodiments, each component of the integer vector may be 0 or 1. For example, 16 incoming pixel values may be combined into one number using a length-16 integer vector [0,1,0,0,1,0, ... 1]. As another example, 16 incoming pixel values may be combined into one number using a length-16 integer vector [-1,1,-1,1,-1,1,-1, ... 1].
In particular, the sequence may be divided row by row. Various 1D kernels (1D integer vectors) may be used, including shapes [128,1,4] and [32,1,4], and a combination of different 1D convolution kernels for different rows of the raw Bayer data can be used to control the overall compression ratio of a frame.
This way of partitioning the pixel sequence uses a smaller buffer size than partitioning into 2D image blocks, because incoming pixel values can be processed segment by segment, and pixel values from different lines/segments do not need to be buffered.
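A corresponding sketch of frame strategy two, again with assumed data; each length-16 segment collapses to a single integer:

```python
import numpy as np

def intra_frame_compress_1d(row, kernel_1d):
    """Frame strategy two: compress a row of incoming pixels segment by
    segment with a 1D integer kernel (no multi-line buffering needed).

    row       : 1D array of raw pixel values from one readout line
    kernel_1d : 1D integer vector, e.g. 16 elements of -1/+1
    returns   : one integer per length-len(kernel_1d) segment
    """
    seg = len(kernel_1d)
    n_segments = len(row) // seg
    segments = row[:n_segments * seg].reshape(n_segments, seg)
    return segments @ kernel_1d

# A length-16 +/-1 vector combines every 16 incoming pixels into one number.
kernel_1d = np.array([-1, 1] * 8)
row = np.random.randint(0, 1024, size=3840)
print(intra_frame_compress_1d(row, kernel_1d).shape)  # (240,)
```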
As described above, the original pixel value corresponding to an image block or a segment may be compressed into an integer, and the set of original pixel values may be compressed into a plurality of integers. These integers may be stored or buffered during compression or decompression.
It should be noted that the intra-frame compression process is presented merely to provide illustration and is not intended to limit the scope of the invention. Many variations and modifications in the technology will suggest themselves to those skilled in the art having the benefit of this disclosure. However, such changes and modifications do not depart from the scope of the present invention. For example, compression kernels having other bit widths may also be applied to compress the original pixel values.
Fig. 3 shows a convolution process described in 204, according to some embodiments of the invention. As shown in fig. 3, a pixel array of size 4 x 4 can be compressed into an integer using a compression kernel. The compression kernel may also have a size of 4 x 4.
In some embodiments, the compressed data (ICS frame) may be quantized into an n-bit integer for storage and/or transmission. For example, the first ICS frame may be quantized to a quantized first ICS frame, and the R-ICS frame may be quantized to a quantized R-ICS frame (QR-ICS frame). In some embodiments, the quantized first ICS frames may include Ncomp single-channel quantized first ICS frames, and each single-channel quantized first ICS frame may correspond to one ICS channel of the quantized first ICS frame; and each QR-ICS frame may include Ncomp sub-QR-ICS frames, and each sub-QR-ICS frame may correspond to one ICS lane of the QR-ICS frame.
The quantization process in the present invention is much simpler than that of JPEG. The output of the convolution operation can be scaled (reduced in bit width) to fit the range of 8-bit integers. The entropy coding therefore does not require complex computations to reduce quality loss; although the quantization loss directly affects the overall quality of compression/decompression, the overall quality remains highly similar to that of JPEG.
Moreover, the intra-frame compression process is itself simple and, as illustrated in 204, provides a way to reduce the necessary buffer size. To apply one 2D convolution kernel of size [4, 4, 1] to an image block of pixels of shape [4, 4], it is not necessary to buffer all pixels (16 in total) and then perform one element-wise product-and-sum. Instead, the pixels may be processed line by line with the appropriate kernel weight elements as they are read in, and the output values (the digital array) kept in a buffer until a single convolution operation is completed. After each convolution operation, the buffered numbers may be written out to memory and the buffer cleared.
When the above method is performed, for an incoming raw Bayer picture of size [N_X, N_Y] processed with a 2D convolution kernel of dimensions [k_x, k_y, Ncomp], the necessary buffer size is k_x rows of raw Bayer pixels.
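A sketch of the line-by-line accumulation described in 204, under the assumption that one output-row buffer of partial sums is kept per block row:

```python
import numpy as np

def compress_streaming(rows, kernel):
    """Process raw pixels line by line: multiply each incoming row by the
    matching kernel row and accumulate into a per-block output buffer,
    writing the buffer out once a full block row is complete.

    rows   : iterable of 1D pixel rows of length N_Y
    kernel : [k_x, k_y, Ncomp] integer kernel
    yields : [N_Y // k_y, Ncomp] output rows, one per k_x input rows
    """
    kx, ky, ncomp = kernel.shape
    buf = None
    for r, row in enumerate(rows):
        ny = len(row) // ky
        if buf is None:
            buf = np.zeros((ny, ncomp), dtype=np.int64)
        # Accumulate this pixel row against kernel row (r mod k_x).
        segs = row[:ny * ky].reshape(ny, ky)
        buf += segs @ kernel[r % kx]          # [ky] x [ky, Ncomp]
        if (r + 1) % kx == 0:                 # block row complete
            yield buf
            buf = np.zeros((ny, ncomp), dtype=np.int64)
```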
Fig. 4 illustrates an example of an intra-frame compression process using frame strategy one, as described at 204, according to some embodiments of the invention. As shown in fig. 4, a frame may be compressed image block by image block. In some embodiments, the frame may be processed by hardware such as an FPGA: the convolution kernel is applied to each image block of pixels, moving to the next image block with neither overlap nor gap, until the last image block is reached. Image block 1 in fig. 4 represents an image block that has already been processed, while image block 2 represents the image block being processed.
Fig. 5 is an example of a compression kernel in a single-layer 2D convolutional compression of raw Bayer data, according to some embodiments of the invention. A single-layer 2D convolution operation may be used to compress raw Bayer data of shape [N_x, N_y]. The compression kernel has shape [k_x, k_y, Ncomp] = [8, 8, 4], and the kernel is quantized from a trained floating-point-weighted neural network into 4-bit signed integers in the range [-7, 7]. The kernel is shown in FIG. 5, where the 4 planes each represent an [8, 8] matrix. Each plane represents one [8, 8, i] slice of the compression kernel (i = 0, 1, 2, 3).
FIG. 6 shows the integer array of shape [256, 480, 4] obtained after the input pixel values of FIG. 1 have been compressed using the compression kernel. The intra-frame compression may be performed with the compression kernel shown in fig. 5. The i-th plane shows the matrix [256, 480, i], where the index i = 0, 1, 2, 3.
Fig. 7 illustrates a method of shared data determination after an intra-frame compression process, according to some embodiments of the invention. This shared data determination method may result in further compression of the transmitted/buffered data.
As depicted in fig. 2, many frames in a video segment may be compressed into ICS frames. The ICS frame may include a first ICS frame and several remaining ICS frames (R-ICS frames) that occur after the first ICS frame.
At 702, for the sub-QR-ICS frames corresponding to each ICS channel, image block matching subtraction is performed against the single-channel quantized first ICS frame of the same ICS channel, and image-block-subtracted ICS frames are generated. In an image-block-subtracted ICS frame, there may be one or more matching image blocks associated with the quantized first ICS frame, and for each matching image block there is a motion vector relating it to a reference image block in the first ICS frame. A motion vector represents the relative positioning between a matching image block in a sub-QR-ICS frame and a reference image block in the single-channel quantized first ICS frame. In some embodiments, the motion vectors may be shared by the ICS channels; therefore, for each QR-ICS frame, the motion vectors need to be determined only once for all ICS channels.
At 704, for the image-block-subtracted ICS frames corresponding to each ICS channel, the image-block-subtracted ICS frames are grouped into stacks, where each stack includes a preset number of image-block-subtracted ICS frames. In some embodiments, the stacks contain equal numbers of image-block-subtracted ICS frames. For example, a video clip consisting of 100 frames may be compressed into 100 ICS frames (a first ICS frame and 99 R-ICS frames), and quantization may be performed on the 100 ICS frames (yielding the quantized first ICS frame and 99 QR-ICS frames); then, for each ICS channel, image block matching subtraction may be performed on the 99 sub-QR-ICS frames against the single-channel quantized first ICS frame, generating 99 image-block-subtracted ICS frames, which may be grouped into 33 stacks (33 · Ncomp stacks in total), where each stack includes 3 image-block-subtracted ICS frames.
At 706, for an ICS frame of the subtracted image block in each stack corresponding to each ICS channel, shared data is determined, and a stack residual frame is determined based on the shared data. The shared data may represent similar data in an ICS frame of the subtracted image block corresponding to the ICS channel.
FIG. 8 illustrates an exemplary process of FIG. 7, according to some embodiments of the invention.
As shown in fig. 8, a frame of shape [N_x, N_y] may be compressed into COMP of size [N_x, N_y, Ncomp]. COMP may then be quantized to COMP_Q. Image block matching subtraction may be performed on COMP_Q, generating an image-block-subtracted frame COMP_SMP of size [N_x, N_y, Ncomp], where one image-block-subtracted frame includes Ncomp image-block-subtracted ICS frames (processed ICS channel by ICS channel, as described in fig. 7). Then, for each ICS channel, the image-block-subtracted ICS frames (COMP_SMP in each ICS channel, of size [N_x, N_y]) may be grouped into stacks, with Nfp image-block-subtracted ICS frames per stack. The pixel value data in each stack may be denoted COMP_SMPS.
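A small numpy sketch of this grouping, using the shapes of the 100-frame example above (all values assumed):

```python
import numpy as np

# 99 image-block-subtracted ICS frames of shape [Nx, Ny, Ncomp], grouped
# per ICS channel into stacks of Nfp frames (the COMP_SMPS of Fig. 8).
n_frames, Nx, Ny, Ncomp, Nfp = 99, 256, 480, 4, 3
comp_smp = np.zeros((n_frames, Nx, Ny, Ncomp), dtype=np.int16)

n_stacks = n_frames // Nfp   # 33 stacks per ICS channel, 33 * Ncomp total
for ch in range(Ncomp):
    comp_smps = comp_smp[:, :, :, ch].reshape(n_stacks, Nfp, Nx, Ny)
    # comp_smps[s] holds the Nfp frames of stack s for this ICS channel.
    assert comp_smps.shape == (33, 3, 256, 480)
```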
FIG. 9 illustrates an exemplary method of image block matching subtraction, according to some embodiments of the invention. The image block matching subtraction may be performed in one sub-QR-ICS frame associated with the first ICS frame quantized in a single channel.
In 902, for each sub-QR-ICS frame corresponding to one ICS channel, motion prediction based on image block matching is performed between the image blocks of the sub-QR-ICS frame and the single-channel quantized first ICS frame of that ICS channel, where the sub-QR-ICS frame is divided into image blocks, with neither gaps nor overlaps, for performing the image block search in the single-channel quantized first ICS frame.
At 904, for each sub-QR-ICS frame, an image-block-subtracted ICS frame is determined by subtracting the matching image blocks from the sub-QR-ICS frame. Finally, the image-block-subtracted ICS frames corresponding to the R-ICS frames can be determined.
In some embodiments, image block matching based motion prediction may be performed based on hierarchical image block search. Fig. 10 is an exemplary method for motion prediction based on image block matching according to some embodiments of the present invention.
In 1002, a sub-QR-ICS frame corresponding to an ICS channel may be defined as a matching frame and a single-channel quantized first ICS frame corresponding to the ICS channel may be defined as a search frame. At 1004, a hierarchical image block search is performed in a search frame. The matching frame is divided into associated image blocks.
For each associated image block in the matching frame, steps 1006 to 1012 below may describe the contents of the hierarchical image block search, and the hierarchical image block search may be performed in each associated image block.
For an associated image block, the image block search may be performed in the search frame with an area of one step. In some embodiments, the step size area is a predetermined integer no less than 1. In some embodiments, during an image block search in a search frame, a squared difference between each image block in the search frame and an associated image block in a matching frame may be determined. Finally, for the associated image block, a plurality of squared differences corresponding to each image block in the search frame may be determined.
At 1006, for an associated image block, it may be determined whether the minimum squared difference is less than a preset threshold, where the minimum squared difference is the smallest of the plurality of squared differences.
At 1008, if the minimum squared difference is less than the preset threshold, the target image block in the search frame having the minimum squared difference may be determined as a reference image block, and the associated image block may be determined as a matching image block.
At 1010, if the minimum squared difference is not less than the preset threshold, it may be determined whether the image block area of the associated image block is greater than a preset minimum image block area.
At 1012, if the image block area of the associated image block is greater than the preset minimum image block area, the associated image block may be defined as a matching frame and the target image block may be defined as a search frame, and the process returns to 1004 for another round of hierarchical image block search.
For an associated image block, once a reference image block is determined, or once the image block area of the associated image block is no longer greater than the preset minimum image block area, the image block search may be ended. Steps 1006 to 1012 are performed for all associated image blocks. Finally, one or more matching image blocks in the sub-QR-ICS frame may be determined, and for each matching image block, a corresponding motion vector may be determined.
Since an associated image block may be further divided, the image block area decreases continuously during the hierarchical image block search. In some embodiments, the step-size area and the image block area of the associated image block may be positively correlated: when the associated image block has a larger image block area, the step-size area may be larger, and when the associated image block has a smaller image block area, the step-size area may be smaller. In general, the step-size area is a preset integer not less than 2, and it may drop to 1 only when the associated image block cannot be divided any further. A minimal sketch of this search is given below.
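The following NumPy sketch illustrates the hierarchical search; the exhaustive squared-difference scan, the quadrant subdivision, and the step-halving rule are simplifying assumptions, and the threshold, minimum area, and initial step are hypothetical values:

```python
import numpy as np

def block_search(blk, search_frame, step):
    """Scan the search frame with the given step-size area; return the
    minimum squared difference and the top-left corner of the best block."""
    bh, bw = blk.shape
    best_ssd, best_pos = np.inf, (0, 0)
    for y in range(0, search_frame.shape[0] - bh + 1, step):
        for x in range(0, search_frame.shape[1] - bw + 1, step):
            ssd = np.sum((search_frame[y:y+bh, x:x+bw] - blk) ** 2)
            if ssd < best_ssd:
                best_ssd, best_pos = ssd, (y, x)
    return best_ssd, best_pos

def hierarchical_match(blk, search_frame, threshold, min_area, step=4):
    """Return a list of (block, position, ssd) matches for one associated
    image block, recursing into sub-blocks when no match is good enough."""
    ssd, pos = block_search(blk, search_frame, step)
    bh, bw = blk.shape
    if ssd < threshold or bh * bw <= min_area:
        return [(blk, pos, ssd)]   # reference block found, or cannot shrink
    # the associated block becomes the matching frame and the best target
    # block becomes the search frame; the step-size area shrinks with it
    y, x = pos
    target = search_frame[y:y+bh, x:x+bw]
    matches = []
    for sy in (0, bh // 2):
        for sx in (0, bw // 2):
            sub = blk[sy:sy + bh // 2, sx:sx + bw // 2]
            matches += hierarchical_match(sub, target, threshold,
                                          min_area, max(1, step // 2))
    return matches

# toy usage with random frames and assumed search settings:
matches = hierarchical_match(np.random.rand(16, 16), np.random.rand(64, 64),
                             threshold=1.0, min_area=4)
```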
FIG. 11 is an exemplary shared data based compression method, according to some embodiments of the invention. In particular, FIG. 11 shows one process of step 706.
At 1102, for each stack corresponding to an ICS channel, the image-block-subtracted ICS frames in the ICS channel may be convolved with a first kernel to determine shared data, which may represent similar data in the image-block-subtracted ICS frames of one stack.
At 1104, for each stack corresponding to the ICS channel, quantized shared data may be determined by quantizing the values in the shared data to integers having a preset bit width using a first scaling factor. In some embodiments, the quantization may comprise two steps. First, the values in the shared data may be scaled (reducing their bit width) using the first scaling factor so as to fit the range of an n-bit integer; for example, a value in the shared data may be multiplied by the first scaling factor, which may be an integer or a true fraction. Second, the scaled values of the shared data may be rounded to integers.
In 1106, for each stack corresponding to the ICS channel, the quantized shared data may be rescaled into rescaled quantized shared data (RQ shared data). In some embodiments, the quantized shared data may be rescaled by dividing its values by the first scaling factor. In some embodiments, this quantize-rescale round trip may incur a quantization loss. For example, with a first scaling factor of 1/2, a value of 23 in the shared data is scaled to 11.5 and then rounded to the integer 12; rescaling 12 with the first scaling factor 1/2 (i.e., dividing by 1/2) yields 24, so a quantization loss arises between the rescaled value 24 and the original value 23 (errors of convolution and deconvolution are not counted here).
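The worked round trip above can be reproduced in a few lines (the factor 1/2 and the value 23 are the ones from the text):

```python
scale = 0.5                       # first scaling factor (a true fraction here)
value = 23
quantized = round(value * scale)  # 23 * 0.5 = 11.5  ->  12
rescaled = quantized / scale      # 12 / 0.5 = 24.0
loss = rescaled - value           # quantization loss of 1
print(quantized, rescaled, loss)  # 12 24.0 1.0
```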
In 1108, a deconvolution may be performed using a second kernel to reshape the RQ shared data into reshaped RQ shared data (RRQ shared data).
At 1110, for each stack corresponding to the ICS channel, the RRQ shared data may be subtracted from the ICS frame from which the image block was subtracted, thereby determining a stack residual frame.
In some embodiments, the method of fig. 11 may be further simplified: the shared data in each stack corresponding to each ICS channel can be determined by convolving a weighted-sum frame with the first kernel. The weighted-sum frame may be determined by a weighted summation, using the weighted-sum parameters, of the values at the same position across the image-block-subtracted ICS frames. Convolving one weighted-sum frame requires far less energy and memory than convolving all of the image-block-subtracted ICS frames, as sketched below.
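A sketch of this simplification, with placeholder weights and a placeholder kernel (the trained first kernel is strided, which the slicing below imitates):

```python
import numpy as np
from scipy.signal import convolve2d

Nx, Ny, Nfp, ksm = 64, 64, 8, 4
stack = np.random.rand(Nx, Ny, Nfp)      # image-block-subtracted frames
weights = np.full(Nfp, 1.0 / Nfp)        # weighted-sum parameters (assumed)
weighted_sum = np.tensordot(stack, weights, axes=([2], [0]))   # [Nx, Ny]
first_kernel = np.random.rand(ksm, ksm)  # stand-in for one trained channel
# one strided convolution of the weighted-sum frame instead of Nfp of them:
shared = convolve2d(weighted_sum, first_kernel, mode='valid')[::ksm, ::ksm]
# shared approximates one channel of the spatially downsampled shared data
```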
In some embodiments, the stack residual frames may be further compressed and quantized. For each stack corresponding to the ICS channel, a third kernel may be used to compress each stack residual frame in the stack. Then, for each stack corresponding to the ICS channel, a second scaling factor may be used to quantize the values in each compressed stack residual frame to integers having a preset bit width, thereby determining quantized compressed stack residual frames. In some embodiments, the quantization process may be the same as in 1104. In some embodiments, the second scaling factor used in quantization may or may not be identical to the first scaling factor.
In some embodiments, entropy encoding may be performed on the quantized shared data after step 1104, before the rescaling described in 1106. In some embodiments, entropy encoding may be performed on the quantized compressed stack residual frames once they have been determined, and the entropy-encoded quantized compressed stack residual frames may be stored for decoding. In some embodiments, entropy encoding may be performed on the motion vectors shared among the ICS channels. Entropy encoding is an operation performed prior to transmission or storage, and the entropy-encoded quantized shared data corresponding to each ICS channel, the entropy-encoded quantized compressed stack residual frames corresponding to each ICS channel, and the entropy-encoded motion vectors of each stack shared among the ICS channels are stored for decoding/decompression. In some embodiments, the quantized first ICS frame may also be stored for adding back the matching image blocks when reconstructing the frames of the video segment.
Entropy encoding may be performed based on a global data dictionary that is pre-constructed from a large amount of data of the same type. In some embodiments, the global data dictionary may be determined using the construction process of a Huffman coding codebook.
First, a large amount of data of the same type may be used to construct a general dictionary suitable for that data type. In some embodiments, the same type of data may be quantized shared data, quantized compressed stack residual frames, motion vectors, etc.; although such data originates from different frames, it shares a statistically similar value distribution (much as different Gaussian distribution peaks overlap or lie close to each other). The global data dictionary may then be used to encode the values when entropy encoding input data of that type.
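A minimal sketch of such a global dictionary, with a Huffman codebook built once from a sample corpus and then reused for later inputs of the same type (the corpus and values below are toy data; values outside the dictionary would need an escape code, omitted here):

```python
import heapq
from collections import Counter

def build_global_dictionary(corpus):
    """Build a Huffman code table from a large sample of same-type values
    (e.g. many quantized shared-data arrays)."""
    freq = Counter(corpus)
    heap = [[n, i, sym] for i, (sym, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    codes = {sym: '' for sym in freq}
    while len(heap) > 1:
        a = heapq.heappop(heap)   # two least-frequent nodes
        b = heapq.heappop(heap)
        for sym in a[2:]:
            codes[sym] = '0' + codes[sym]
        for sym in b[2:]:
            codes[sym] = '1' + codes[sym]
        heapq.heappush(heap, [a[0] + b[0], a[1], *a[2:], *b[2:]])
    return codes

# new data drawn from the same distribution as the corpus compresses well:
table = build_global_dictionary([0, 0, 0, 1, 1, 2, -1, 0, 1, 0])
encoded = ''.join(table[v] for v in [0, 1, 0, -1])
```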
Fig. 12 illustrates an exemplary process (also including the quantization and entropy encoding processes) of fig. 11, according to some embodiments of the invention. As shown in fig. 12, COMP_SMPS (of size [Nx, Ny, Nfp]) can be convolved with a first kernel (of size [ksmx, ksmy, Nfp, ncomp_sm]) to determine the shared data SMem (of size [Nx/ksmx, Ny/ksmy, ncomp_sm]). SMem can be quantized to SMem_Q. SMem_Q may then be rescaled to SMem_Q_rsc (of size [Nx/ksmx, Ny/ksmy, ncomp_sm]), and a second kernel (of size [ksmx, ksmy, ncomp_sm]) is used to deconvolve SMem_Q_rsc, so that SMem_Q_rsc can be reshaped into SMem_rs (of size [Nx, Ny]). Then, in each ICS channel and each stack, SMem_rs can be subtracted from each image-block-subtracted ICS frame COMP_SMPS_i, thereby determining a stack residual frame COMP_sres_i. A third kernel (of size [kresx, kresy, ncomp_res]) may be used to further compress each stack residual frame into comp_sres_i (of size [Nx/kresx, Ny/kresy, ncomp_res]). Finally, comp_sres_i can be quantized to comp_sres_i_Q, and comp_sres_i_Q may be entropy-encoded into comp_sres_i_Q_EC.
Fig. 13 illustrates a method of reconstructing a video segment, according to some embodiments of the invention.
In 1302, entropy decoding is performed, for each stack corresponding to the ICS channel, on the entropy-encoded quantized shared data, the entropy-encoded quantized compressed stack residual frames, and their corresponding entropy-encoded motion vectors. In some embodiments, entropy encoding may be performed prior to transmission, since compression and decompression may be performed on different sides; when the decompressing side receives the entropy-encoded data, entropy decoding is performed first.
In 1304, for each stack corresponding to the ICS channel, each quantized compressed stack residual frame may be rescaled into an RQ compressed stack residual frame, and the quantized shared data may be rescaled into RQ shared data. In some embodiments, as described in 1106, the RQ shared data may be determined based on the first scaling factor. The values of each quantized compressed stack residual frame may be divided by the second scaling factor to determine each RQ compressed stack residual frame.
At 1306, for each stack corresponding to the ICS channel, a deconvolution may be performed using a fourth kernel to decompress each RQ compressed stack residual frame into a first decompressed ICS frame.
At 1308, for each stack corresponding to the ICS channel, deconvolution may be performed using the second kernel to reshape the RQ shared data into RRQ shared data.
In 1310, for each first decompressed ICS frame in each stack corresponding to the ICS channel, its corresponding RRQ shared data and one or more of its corresponding matching image blocks with stored motion vectors may be added to the first decompressed ICS frame to determine a second decompressed ICS frame.
In 1312, for every second decompressed ICS frame in each stack, the second decompressed ICS frames corresponding to all ICS channels may be stacked together, thereby determining a third decompressed ICS frame. The third decompressed ICS frame may be an ICS channel stack of the second decompressed ICS frame. The third decompressed ICS frame may correspond to one R-ICS frame, and there may be a preset number of third decompressed ICS frames in each stack.
At 1314, for every third decompressed ICS frame in each stack, intra-frame decompression may be performed on the third decompressed ICS frame using a decompression kernel and a neural network for quality improvement (QINN) to determine one reconstructed frame. In some embodiments, the use of the decompression kernel can be viewed as two steps: a rescaling step and a decompressing step. The rescaling step corresponds to the quantization performed after intra-frame compression as described in fig. 3, and the rescaling may be performed based on the same scaling factor as in fig. 3. In some embodiments, step 1314 may thus be divided into two steps: rescaling the third decompressed ICS frame into a rescaled third decompressed ICS frame, and decompressing the rescaled frame into one reconstructed frame.
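A shape-level sketch of this two-step view, where the channel average stands in for the decompression kernel and the QINN, and the scaling factor is an assumed value:

```python
import numpy as np

Nx, Ny, Ncomp = 64, 64, 4
scale = 0.125   # assumed intra-frame scaling factor (the one from fig. 3)
third = np.random.randint(-16, 16, (Nx, Ny, Ncomp)).astype(np.float64)
third_rsc = third / scale        # step 1: rescaling with the same factor
recon = third_rsc.mean(axis=-1)  # step 2 stand-in: the decompression kernel
                                 # and QINN collapse the channels to [Nx, Ny]
```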
In some embodiments, the stacks corresponding to one ICS channel may share the first through fourth kernels. Further, there may be one set of first through fourth kernels corresponding to each ICS channel, and each set of first through fourth kernels may be determined by sample-based training.
Frames in one stack may thus be reconstructed with the help of their corresponding RQ shared data; furthermore, all the frames of a video segment may be reconstructed in the same way.
FIG. 14 illustrates an exemplary process of fig. 13, according to some embodiments of the invention. As shown in fig. 14, the quantized compressed stack residual frame comp_sres_i_Q may be rescaled to comp_sres_i_rsc (of size [Nx/kresx, Ny/kresy, Ncomp_res]), and a fourth kernel (of size [kresx, kresy, Ncomp_res]) may be used to decompress comp_sres_i_rsc into COMP_sres_i_D, where COMP_sres_i_D has size [Nx, Ny]. SMem_rs may then be added to COMP_sres_i_D to determine COMP_D, where COMP_D has size [Nx, Ny]. The determination of SMem_rs is based on entropy decoding (using the global data dictionary), rescaling (using the first scaling factor), and reshaping (using the second kernel). Then, a reconstructed frame can be determined by deconvolving COMP_D_all_chans, the set of values in all ICS channels (of size [Nx, Ny, Ncomp]), using a decompression kernel (of size [kx, ky, Ncomp]) and a quality-improvement neural network (QINN); the reconstructed frame has size [Nx, Ny].
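The decode path of fig. 14 might be sketched at the shape level as follows; the nearest-neighbour upsampling stands in for the fourth-kernel deconvolution, and all sizes and the second scaling factor are assumed values:

```python
import numpy as np

Nx, Ny, kres, Ncomp_res = 64, 64, 4, 2
second_scale = 0.25                         # second scaling factor (assumed)
comp_sres_i_Q = np.random.randint(-8, 8, (Nx // kres, Ny // kres, Ncomp_res))
comp_sres_i_rsc = comp_sres_i_Q / second_scale   # rescaled residual frame
# stand-in for the fourth-kernel deconvolution: upsample and merge channels
COMP_sres_i_D = np.kron(comp_sres_i_rsc.sum(-1), np.ones((kres, kres)))
SMem_rs = np.zeros((Nx, Ny))          # reshaped RQ shared data (placeholder)
COMP_D = COMP_sres_i_D + SMem_rs      # [Nx, Ny], per ICS channel
# the stored matching image blocks are then added back at the positions given
# by the decoded motion vectors (omitted), all channels are stacked, and the
# stack is intra-frame decompressed into the reconstructed frame
```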
In some embodiments, one time module includes the first through fourth kernels for each of the Ncomp ICS channels, or the time module includes the first through fourth kernels and the weighted-sum parameters. In some embodiments, the parameters in the compression kernel, the time module, the decompression kernel, and the QINN may be determined by sample-based training. Further, the time modules for all ICS channels can be determined by sample-based training.
FIG. 15 illustrates a sample-based training method, according to some embodiments of the invention.
At 1502, multiple sets of raw pixel value samples can be read out, where each set of raw pixel value samples corresponds to a frame. Multiple sets of raw pixel value samples may be used for sample-based training.
At 1504, intra-frame compression may be performed using an initial compression kernel to compress each set of raw pixel value samples into ICS frame samples, wherein the ICS frame samples comprise a first ICS frame sample and R-ICS frame samples.
In 1506, the first ICS frame sample may be quantized into a quantized first ICS frame sample, and the R-ICS frame samples may be quantized into QR-ICS frame samples. In some embodiments, the quantized first ICS frame sample may include Ncomp single-channel quantized first ICS frame samples, and each single-channel quantized first ICS frame sample may correspond to one ICS channel of the quantized first ICS frame sample; each QR-ICS frame sample may include Ncomp sub-QR-ICS frame samples, and each sub-QR-ICS frame sample may correspond to one ICS channel of the QR-ICS frame sample.
At 1508, for each sub-QR-ICS frame sample corresponding to an ICS channel, performing image block matching subtraction in the sub-QR-ICS frame sample associated with the first ICS frame sample quantized in the single channel, and generating ICS frame samples of the subtracted image block, wherein one or more motion vectors correspond to one sub-QR-ICS frame sample, and one motion vector represents a relative positioning between one matching image block in the one sub-QR-ICS frame sample and one reference image block in the first ICS frame sample quantized in the single channel, and wherein the motion vector in the one ICS channel is shared by other ICS channels.
At 1510, for the image-block-subtracted ICS frame samples corresponding to each ICS channel, the image-block-subtracted ICS frame samples can be grouped into stacks, where each stack includes a preset number of image-block-subtracted ICS frame samples.
At 1512, for the image-block-subtracted ICS frame samples in each stack corresponding to each ICS channel, an initial time module may be used to determine the shared data samples, the compressed stack residual frame samples, and the first decompressed ICS frame samples, where the initial time module includes a fifth kernel, a sixth kernel, a seventh kernel, and an eighth kernel, or includes the fifth through eighth kernels and initial weighted-sum parameters.
At 1514, for each stack, reconstructed frame samples may be determined using an initial decompression core and an initial QINN.
At 1516, the initial compression kernel may be trained as a compression kernel, and the initial decompression kernel and the initial QINN may be trained as an intermediate decompression kernel and an intermediate QINN.
At 1518, the parameters of the initial time module may be trained via multi-graph joint loss training.
FIG. 16 illustrates an exemplary multi-graph joint loss training, according to some embodiments of the invention.
At 1602, four computation graphs may be determined. The four graphs correspond to processes using the initial time module. In some embodiments, the first computation graph G1 represents a process retaining the first and second quantization points, the second computation graph G2 represents a process retaining only the first quantization point, the third computation graph G3 represents a process retaining only the second quantization point, and the fourth computation graph G4 represents a process retaining no quantization point. In some embodiments, the first quantization point represents quantized output data of the fifth kernel and the second quantization point represents quantized output data of the seventh kernel. The four computation graphs are also shown in fig. 21, according to some embodiments of the present invention.
At 1604, three optimizers may be determined and run in a sequential manner during the iterative training. In some embodiments, the first optimizer is set to train the parameters before the first quantization point to minimize a first total loss, which includes DA_E from the first quantization point of G1, DA_E from the second quantization point of G3, and the reconstruction loss from G4. In some embodiments, the second optimizer is set to train the parameters between the first and second quantization points to minimize a second total loss, which includes DA_E from the second quantization point of G1 and the reconstruction loss from G2. In some embodiments, the third optimizer is set to train the parameters after the second quantization point to minimize a third total loss, which includes the reconstruction loss from G1. Here, DA_E denotes a differentiable approximation of entropy. A schematic sketch of this schedule follows.
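The sketch below uses PyTorch for illustration; the layer shapes, the straight-through rounding at the quantization points, and the log1p proxy standing in for DA_E are assumptions for the example, not the patent's trained configuration:

```python
import torch
import torch.nn as nn

Nfp, C, k = 8, 4, 4
pre  = nn.Conv2d(Nfp, C, k, stride=k)                 # before quantization point 1
mid  = nn.Sequential(nn.ConvTranspose2d(C, Nfp, k, stride=k),
                     nn.Conv2d(Nfp, C, k, stride=k))  # between the two points
post = nn.ConvTranspose2d(C, Nfp, k, stride=k)        # after quantization point 2
opts = [torch.optim.Adam(m.parameters(), lr=1e-4) for m in (pre, mid, post)]

def q(t):     # straight-through rounding, so gradients pass a quantization point
    return t + (torch.round(t) - t).detach()

def da_e(t):  # crude differentiable entropy proxy standing in for DA_E
    return torch.log1p(t.abs()).mean()

def run(x, q1, q2):  # one computation graph, with optional quantization points
    t2 = pre(x);  t2 = q(t2) if q1 else t2
    t3 = mid(t2); t3 = q(t3) if q2 else t3
    return t2, t3, post(t3)

mse = nn.MSELoss()
x = torch.randn(1, Nfp, 64, 64)   # stand-in for one stack of frame samples
for step in range(100):
    # optimizer 1: parameters before point 1 (uses G1, G3, G4)
    t2, _, _ = run(x, True, True)
    _, t3, _ = run(x, False, True)
    _, _, t4 = run(x, False, False)
    loss = da_e(t2) + da_e(t3) + mse(t4, x)
    opts[0].zero_grad(); loss.backward(); opts[0].step()
    # optimizer 2: parameters between the points (uses G1, G2)
    _, t3, _ = run(x, True, True)
    _, _, t4 = run(x, True, False)
    loss = da_e(t3) + mse(t4, x)
    opts[1].zero_grad(); loss.backward(); opts[1].step()
    # optimizer 3: parameters after point 2 (uses G1)
    _, _, t4 = run(x, True, True)
    loss = mse(t4, x)
    opts[2].zero_grad(); loss.backward(); opts[2].step()
```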
In 1606, the first, second, and third optimizers may be iteratively run to train the parameters in the initial time module, whereby the fifth through eighth kernels may be trained into fifth through eighth intermediate kernels. In some embodiments, the parameters in the fifth through eighth intermediate kernels are floating point numbers. In some embodiments, the weighted-sum parameters may also be trained.
Fig. 17 illustrates an exemplary process of the first computation graph G1, according to some embodiments of the invention.
At 1702, data T1 may be input into a first convolution layer using parameters Para (bQ1), thereby determining data T2. In some embodiments, the data T1 is the input of the initial time module, which corresponds to the image-block-subtracted ICS frame samples of all the stacks. In some embodiments, Para (bQ1) is a parameter in the fifth kernel, or Para (bQ1) is a weighted-sum parameter and a parameter in the fifth kernel.
In 1704, the data T2 at the first quantization point may be quantized, thereby determining data T2_Q.
In 1706, data T3 may be determined based on data T2_Q. In some embodiments, in the process from data T2_Q to T3, the trained parameters may include a first deconvolution layer and a second convolution layer with parameters Para (aQ1, bQ2). In some embodiments, Para (aQ1, bQ2) are parameters in the sixth and seventh kernels. The process may further include a rescaling operation before the first deconvolution layer and a subtraction operation after the second convolution layer.
At 1708, the data T3 at the second quantization point may be quantized, thereby determining data T3_Q. In some embodiments, the data T3_Q may correspond to a quantized compressed stack residual frame.
In 1710, data T4 may be determined based on the data T3_Q. In some embodiments, in the process from data T3_Q to T4, the trained parameters may include a second deconvolution layer with parameters Para (aQ2). In some embodiments, Para (aQ2) is a parameter in the eighth kernel. The process may further include a rescaling operation prior to the second deconvolution layer.
As described above, the processing in fig. 17 may correspond to that described in fig. 11 to 13.
Fig. 18 illustrates an exemplary process of the second computation graph G2, according to some embodiments of the invention.
At 1802, data T1 may be input into the first convolution layer using parameters Para (bQ1), thereby determining data T2.
At 1804, the data T2 at the first quantization point may be quantized, thereby determining data T2_Q.
In 1806, data T3 may be determined based on the data T2_Q. In some embodiments, in the process from data T2_Q to T3, the trained parameters may include a first deconvolution layer and a second convolution layer, occurring in sequence, with parameters Para (aQ1, bQ2). As in step 1706, the process also includes a rescaling operation before the first deconvolution layer and a subtraction operation after the second convolution layer.
In 1808, data T4(2) may be determined based on the data T3. In some embodiments, in the process from data T3 to T4(2), the trained parameters may include a second deconvolution layer with parameters Para (aQ2). Since the data T3 has not been quantized here, the rescaling operation described in 1710 is not required.
FIG. 19 illustrates an exemplary process of the third computation graph G3, according to some embodiments of the invention.
At 1902, data T1 may be input into the first convolution layer using parameters Para (bQ1), thereby determining data T2.
In 1904, data T3(3) may be determined based on data T2. In some embodiments, in the process from data T2 to T3(3), the trained parameters may include a first deconvolution layer and a second convolution layer, occurring in sequence, with parameters Para (aQ1, bQ2). In some embodiments, the process may further include a subtraction operation after the second convolution layer. Since the data T2 has not been quantized here, the rescaling operation described in 1706 is not required.
In 1906, the data T3(3) at the second quantization point may be quantized, thereby determining data T3_Q(3).
In 1908, data T4(3) may be determined based on data T3_Q(3). In some embodiments, in the process from data T3_Q(3) to T4(3), the trained parameters include a second deconvolution layer with parameters Para (aQ2). In some embodiments, the process may further include a rescaling operation prior to the second deconvolution layer.
Fig. 20 illustrates an exemplary process of the fourth computation graph G4, according to some embodiments of the invention.
In 2002, data T1 may be input into the first convolution layer using parameters Para (bQ1), thereby determining data T2.
In 2004, data T3(4) may be determined based on data T2. In some embodiments, in the process from data T2 to T3(4), the trained parameters may include a first deconvolution layer and a second convolution layer, occurring in sequence, with parameters Para (aQ1, bQ2). In some embodiments, the process may further include a subtraction operation after the second convolution layer.
In 2006, data T4(4) may be determined based on data T3(4). In some embodiments, in the process from data T3(4) to T4(4), the trained parameters may include a second deconvolution layer with parameters Para (aQ2).
As described above in figs. 15 to 20, the fifth through eighth kernels may be trained into the fifth through eighth intermediate kernels. In some embodiments, the fifth through eighth intermediate kernels may be further processed to determine the time module.
FIG. 22 illustrates an exemplary time module determination method, according to some embodiments of the invention.
At 2202, the parameters in the fifth intermediate kernel may be integerized to determine the first kernel, the parameters in the sixth intermediate kernel may be integerized to determine the second kernel, and the parameters in the seventh intermediate kernel may be integerized to determine the third kernel.
At 2204, the parameters in the eighth intermediate kernel may be fine-tuned to determine the fourth kernel.
In some embodiments, the parameters in the first through third kernels are integers, while the parameters in the fourth kernel may still be floating point numbers.
At 2206, the parameters in the intermediate decompression kernel and the intermediate QINN may be fine-tuned to determine the decompression kernel and the QINN.
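A small sketch of this finalization; whether a scale is applied before rounding is an assumption here:

```python
import numpy as np

# Kernels 5-7 are rounded to integer kernels 1-3, while kernel 8 (and the
# decompression kernel / QINN) keep floating-point weights, fine-tuned only.
def integerize(weights, scale=64):   # scale before rounding is an assumption
    return np.rint(weights * scale).astype(np.int32)

fifth_intermediate = np.random.randn(4, 4).astype(np.float32)
first_kernel = integerize(fifth_intermediate)             # integer parameters
fourth_kernel = np.random.randn(4, 4).astype(np.float32)  # stays float
```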
FIG. 23 is an exemplary diagram of a compression device, according to some embodiments of the invention. As shown in fig. 23, the compression device 2300 may include a readout unit 2310, a compression unit 2320, and a memory 2330. In some embodiments, the compression device 2300 may be configured to compress raw Bayer data, corresponding to one video clip, from the focal plane (sensor array) of one camera.
The readout unit 2310 may be set to sequentially read out a plurality of sets of original pixel values. In some embodiments, each set of original pixel values may correspond to a frame of a video segment.
The compression unit 2320 may be configured to perform the compression processes described in fig. 2, 7 to 12, wherein intra-frame compression and shared data based compression may be performed.
FIG. 24 is an exemplary diagram of a video recording system according to some embodiments of the present invention. The video recording system 2400 may include a compression module 2410, a memory 2420, and a decompression module 2430.
The compression module 2410 may be configured to read out sets of raw pixel values in order by a readout unit 2411, and may be configured to perform a compression operation by a compression unit 2412. It should be noted that the compression module may be the same as the compression device described in fig. 23.
The memory 2420 may be configured to store the quantized first ICS frame, the entropy encoded quantized shared data corresponding to each ICS channel, the entropy encoded quantized compressed stack residual frame corresponding to each ICS channel, and the entropy encoded motion vector in each stack shared in the ICS channels. In some embodiments, the stored data may be used to reconstruct a plurality of frames in a video clip.
The decompression module 2430 can be configured to reconstruct a plurality of frames of a video segment. The reconstruction operation may be performed as shown in fig. 13 to 14.
The compression and decompression processes have been described in the method portion of this disclosure, and the compression device and the video recording system are therefore not described in further depth here.
Having thus described the basic concept, it may become rather apparent to those skilled in the art, having read the present detailed disclosure, that the foregoing detailed disclosure is intended to be presented by way of example only, and not by way of limitation. Various alterations, improvements, and modifications will occur to those skilled in the art, though not expressly stated herein. Such alterations, improvements, and modifications are intended to be suggested by this disclosure, and are within the spirit and scope of the exemplary embodiments of the disclosure.
Furthermore, certain terminology has been used to describe embodiments of the invention. For example, the terms "one embodiment," "an embodiment," and/or "some embodiments" mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various portions of this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures or characteristics may be combined together in one or more embodiments of the invention.
Moreover, those skilled in the art will understand that various aspects of the invention may be illustrated and described in any of a number of patentable classes or contexts, including any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof. Accordingly, various aspects of the present invention may be implemented entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.), or in a combination of software and hardware, which implementations may be generally referred to herein as "blocks," "modules," "engines," "units," "components," or "systems." Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable media having computer readable program code embodied therein.
Furthermore, the order in which the elements or sequences of processes are described, or the use of numbers, letters, or other designations therefore, is not intended to limit the claimed processes and methods to any order except as may be specified in the claims. While the foregoing disclosure discusses, by way of various examples, what are presently considered to be various useful embodiments of the present invention, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover modifications and equivalents within the spirit and scope of the disclosed embodiments. For example, although implementations of the various components described above may be embodied in a hardware device, they may also be implemented as a software-only solution, e.g., installation on an existing processing device or mobile device.
Also, it should be appreciated that in the foregoing description of embodiments of the invention, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed subject matter requires more features than are expressly recited in each claim. Rather, inventive embodiments lie in less than all features of a single foregoing disclosed embodiment.

Claims (17)

1. A method of video clip compression, comprising:
reading out a plurality of sets of original pixel values from a camera head, wherein each set of original pixel values corresponds to a frame of a video clip;
intra-frame compression by compressing each set of original pixel values into an intra-frame compressed sample (ICS) frame using a compression core, wherein the ICS frame includes a first ICS frame and several remaining ICS frames (R-ICS frames) occurring after the first ICS frame, and wherein the compression core has Ncomp ICS channels, and Ncomp is an integer not less than 1;
quantizing the first ICS frame and quantizing the R-ICS frame into a QR-ICS frame, wherein the quantized first ICS frame comprises Ncomp single-channel quantized first ICS frames, and each single-channel quantized first ICS frame corresponds to one ICS channel of the quantized first ICS frame; wherein each QR-ICS frame comprises Ncomp sub-QR-ICS frames, and each sub-QR-ICS frame corresponds to one ICS channel of the QR-ICS frame;
for the sub-QR-ICS frame corresponding to each ICS channel, performing image block matching subtraction in the sub-QR-ICS frame associated with the first single-channel quantized ICS frame corresponding to the ICS channel, and generating the ICS frame with the image blocks subtracted, wherein one or more motion vectors correspond to one sub-QR-ICS frame, and the motion vectors represent the relative positioning between the matched image blocks in the sub-QR-ICS frame corresponding to the ICS channel and the reference image blocks in the first single-channel quantized ICS frame corresponding to the ICS channel, wherein the motion vectors corresponding to one ICS channel are shared by other ICS channels;
for the ICS frames of the subtracted image blocks corresponding to each ICS channel, combining the ICS frames of the subtracted image blocks into stacks, wherein each stack comprises a preset number of ICS frames of the subtracted image blocks;
for an ICS frame of the subtracted image block in each stack corresponding to each ICS channel, shared data representing similar data in the ICS frame of the subtracted image block is determined, and a stack residual frame is determined based on the shared data.
2. The method of claim 1, wherein intra-frame compressing by compressing each set of original pixel values into a compressed frame using a compression kernel comprises:
for each frame of the video segment, each portion of the set of original pixel values is compressed into an integer using a compression kernel, wherein each portion of the set of original pixel values corresponds to a segment of the frame.
3. The method of claim 1, wherein performing image block match subtraction in the QR-ICS frame associated with the quantized first ICS frame for the sub-QR-ICS frame corresponding to each ICS channel comprises:
for each sub-QR-ICS frame corresponding to one ICS channel, performing motion prediction based on image block matching on an image block of the sub-QR-ICS frame associated with a first single-channel quantized ICS frame corresponding to the ICS channel, wherein the sub-QR-ICS frame is divided into image blocks for performing image block search in the first single-channel quantized ICS frame, and the image blocks in the sub-QR-ICS-frame have neither gaps nor overlaps;
for each sub-QR-ICS frame, determining an ICS frame from which the image blocks are subtracted by subtracting one or more matching image blocks from the sub-QR-ICS frame.
4. The method of claim 3, wherein for each sub-QR-ICS frame corresponding to one ICS channel, the performing motion prediction based on the matching image block on the image block of the sub-QR-ICS frame associated with the first ICS frame quantized in a single channel corresponding to the ICS channel comprises:
defining a sub-QR-ICS frame corresponding to an ICS channel as a matching frame, defining a first ICS frame of single-channel quantization corresponding to the ICS channel as a search frame, and performing hierarchical image block search in the search frame, wherein the matching frame is divided into associated image blocks and the hierarchical image block search comprises:
for each associated image block in the matching frame, performing an image block search in the search frame using a step-size area, wherein the step-size area is a preset integer not less than 1;
when searching for image blocks in the search frame, calculating the squared difference between each image block in the search frame and the associated image block in the matching frame;
if the lowest squared difference is smaller than a preset threshold value, determining the target image block having the lowest squared difference as a reference image block, and determining the associated image block as a matching image block;
if the lowest square difference is not smaller than a preset threshold value and if the image block area of the associated image block is larger than a preset minimum image block area, defining the associated image block as a matching frame, defining the target image block as a search frame, and performing hierarchical image block search in the search frame;
and repeating the hierarchical image block search until a reference image block corresponding to the associated image block is found, wherein the squared difference between the two is smaller than the preset threshold value, or until the image block area of the associated image block is not larger than the preset minimum image block area.
5. The method according to claim 1, wherein the number of ICS frames in each stack, from which an image block is subtracted, is equal.
6. The method according to claim 1, wherein for each ICS frame of a subtracted image block in each stack corresponding to each ICS channel, determining shared data, and for each ICS frame of a subtracted image block, determining one stack residual frame based on the shared data comprises:
for each stack corresponding to the ICS channel, convolving the ICS frames of the subtracted image blocks in each stack by using a first kernel, thereby determining shared data;
for each stack corresponding to the ICS channel, quantizing a value in the shared data into an integer having a preset bit width, thereby determining quantized shared data for each stack;
for each stack corresponding to the ICS channel, rescaling the quantized shared data into RQ shared data;
for each stack corresponding to the ICS channel, performing deconvolution by using a second kernel, so as to remold the RQ shared data into RRQ shared data;
and for each ICS frame of the subtracted image blocks in each stack corresponding to the ICS channel, subtracting the RRQ shared data from the ICS frame of the subtracted image blocks, thereby determining a stack residual frame.
7. The method according to claim 1, wherein for an ICS frame of a subtracted image block in each stack corresponding to each ICS channel, determining shared data, and determining a stack residual frame based on the shared data comprises:
for each stack corresponding to the ICS channel, performing weighted summation on values at the same position in the ICS frame of the subtracted image block by using a weighted summation parameter, so as to compress the ICS frame of the subtracted image block into a weighted summation frame;
for each stack corresponding to the ICS channel, convolving the weighted sum frame using a first kernel, thereby determining shared data;
for each stack corresponding to the ICS channel, quantizing the value in the shared data into an integer with a preset bit width, thereby determining quantized shared data;
for each stack corresponding to the ICS channel, rescaling the quantized shared data into RQ shared data;
for each stack corresponding to the ICS channel, performing deconvolution by using a second kernel, so as to remold the RQ shared data into RRQ shared data;
for each of the image-block-subtracted ICS frames in each stack to which the ICS channel corresponds, the stack residual frame is determined by subtracting the RRQ shared data from the image-block-subtracted ICS frame.
8. The method according to claim 6 or 7, wherein the method further comprises:
for each stack corresponding to the ICS channel, compressing each stack residual frame using a third kernel;
for each stack corresponding to the ICS channel, quantizing the value in each compressed stack residual frame into an integer with a preset bit width, thereby determining a quantized and compressed stack residual frame;
entropy encoding the quantized shared data corresponding to each ICS channel, the quantized compressed stack residual frame corresponding to each ICS channel, and the motion vector in each stack shared in the ICS channel, respectively;
wherein the entropy encoded quantized shared data corresponding to each ICS channel, the entropy encoded quantized compressed stack residual frame corresponding to each ICS channel, and the entropy encoded motion vector in each stack shared in the ICS channel are stored for decoding;
wherein the entropy coding is an operation based on a global data dictionary, and the global data dictionary is constructed in advance based on a large amount of data of the same type.
9. The method of claim 8, wherein the method further comprises:
for each stack to which the ICS channel corresponds, performing entropy decoding on the entropy-encoded quantized shared data, the entropy-encoded quantized compressed stack residual frame, and its corresponding entropy-encoded motion vector;
for each stack corresponding to the ICS channel, rescaling each quantized compressed stack residual frame to an RQ compressed stack residual frame;
for each stack corresponding to the ICS channel, performing deconvolution using a fourth kernel, thereby decompressing each RQ compressed stack residual frame into a first decompressed ICS frame;
for each stack corresponding to the ICS channel, performing deconvolution by using a second kernel, so as to remold the RQ shared data into RRQ shared data;
for each first decompressed ICS frame in each stack corresponding to the ICS channel, adding the RRQ shared data and its corresponding one or more matching image blocks with stored motion vectors to the first decompressed ICS frame, thereby determining a second decompressed ICS frame;
for each second decompressed ICS frame in each stack, stacking the second decompressed ICS frames corresponding to all ICS channels together, thereby determining a third decompressed ICS frame;
for every third decompressed ICS frame in each stack, performing intra-frame decompression on the third decompressed ICS frame using a decompression kernel and a neural network for Quality Improvement (QINN), thereby determining a reconstructed frame;
wherein the first kernel to the fourth kernel are shared by the stacks corresponding to the ICS channels.
10. The method of claim 9, wherein the parameters in the compression kernel, the time module, the decompression kernel, and the QINN are determined by sample-based training, wherein the time module comprises the first through fourth kernels, or the time module comprises the first through fourth kernels and a weighted summation parameter, the sample-based training comprising:
reading out a plurality of groups of original pixel value samples, wherein each group of original pixel value samples corresponds to one frame;
compressing each set of raw pixel value samples into ICS frame samples by using an initial compression kernel so as to perform intra-frame compression, wherein the ICS frame samples comprise a first ICS frame sample and a plurality of R-ICS frame samples appearing after the first ICS frame, and the initial compression kernel has Ncomp ICS channels, and Ncomp is an integer not less than 1;
quantizing the first ICS frame samples and quantizing the R-ICS frame samples into QR-ICS frame samples, wherein the quantized first ICS frame samples include Ncomp single-channel quantized first ICS frame samples, and each single-channel quantized first ICS frame sample corresponds to one ICS channel of the quantized first ICS frame samples, wherein each QR-ICS frame sample includes Ncomp sub-QR-ICS frame samples, and each sub-QR-ICS frame sample corresponds to one ICS channel of the QR-ICS frame samples;
for the sub-QR-ICS frame sample corresponding to each ICS channel, performing image block matching deduction in the sub-QR-ICS frame sample associated with the first ICS frame sample subjected to single-channel quantization, and generating an ICS frame sample subjected to deduction of the image block, wherein one or more motion vectors correspond to one sub-QR-ICS frame sample, the motion vectors represent the relative positioning between the matched image block in the sub-QR-ICS frame sample and a reference image block in the first ICS frame sample subjected to single-channel quantization, and the motion vector corresponding to one ICS channel is shared by other ICS channels;
for the image-block-subtracted ICS frame samples corresponding to each ICS channel, combining the image-block-subtracted ICS frame samples into stacks, wherein each stack comprises a preset number of image-block-subtracted ICS frame samples;
for an ICS frame of a subtracted image block in each stack corresponding to each ICS channel, determining shared data samples and compressed stack residual frame samples and a first decompressed ICS frame sample using an initial time module, wherein the initial time module includes a fifth core, a sixth core, a seventh core, and an eighth core, or the time module includes fifth to eighth cores and an initial weighted summation parameter;
for each stack, determining reconstructed frame samples using an initial decompression kernel and an initial QINN;
training an initial compression kernel as a compression kernel, and training an initial decompression kernel and an initial QINN as an intermediate decompression kernel and an intermediate QINN;
the parameters of the initial time module are trained by multi-graph joint loss training.
11. The method of claim 10, wherein training the parameters of the initial time module through multi-graph joint loss training comprises:
determining 4 computation graphs, wherein the 4 graphs are processes using the initial time module, wherein a first computation graph G1 represents a process retaining first and second quantization points, a second computation graph G2 represents a process retaining the first quantization point, a third computation graph G3 represents a process retaining the second quantization point, and a fourth computation graph G4 represents a process retaining no quantization point;
wherein the first quantization point represents quantized output data of the fifth kernel and the second quantization point represents quantized output data of the seventh kernel;
determining three optimization treatments in sequence in the process of iterative training;
wherein the first optimization process is set to train parameters before the first quantization point to minimize a first total loss comprising DA _ E from the first quantization point of G1, DA _ E from the second quantization point of G3, and reconstruction loss from G4;
wherein the second optimization process is set to train parameters between the first quantization point and the second quantization point to minimize a second total loss, the second total loss comprising DA _ E from the second quantization point of G1 and the reconstruction loss from G2;
wherein the third optimization process is set to train parameters after the second quantization point to minimize a third total loss, the third total loss comprising the reconstruction loss from G1;
where DA_E represents a differentiable approximation of entropy;
the fifth through eighth kernels may be trained as fifth through eighth intermediate kernels by iteratively running the first, second, and third optimization processes to train the parameters in the initial time module, wherein the parameters of the fifth through eighth intermediate kernels are floating point numbers.
12. The method of claim 11, wherein the first graph G1 includes:
inputting data T1 into the first convolution layer using a parameter Para (bQ1), thereby determining data T2, wherein data T1 corresponds to the image-block-subtracted ICS frame samples of all stacks corresponding to each channel;
quantizing the data T2 at the first quantization point, thereby determining data T2_Q;
determining data T3 from data T2_Q, wherein in the process from data T2_Q to T3, the trained parameters include a first deconvolution layer and a second convolution layer using parameters Para (aQ1, bQ2), wherein the process further includes a rescaling operation before the first deconvolution layer and a subtraction operation after the second convolution layer;
quantizing the data T3 at the second quantization point, thereby determining data T3_Q, wherein the data T3_Q corresponds to a quantized compressed stack residual frame;
determining data T4 from data T3_Q, wherein in the process from data T3_Q to T4, the trained parameters include a second deconvolution layer using parameters Para (aQ2), wherein the process further includes a rescaling operation prior to the second deconvolution layer;
wherein Para (bQ1) is a parameter in the fifth kernel, or Para (bQ1) is a weighted-sum parameter and a parameter in the fifth kernel;
wherein Para (aQ1, bQ2) are parameters in the sixth and seventh kernels, and Para (aQ2) is a parameter in the eighth kernel.
13. The method of claim 12, wherein the second graph G2 includes:
inputting data T1 into the first convolution layer using a parameter Para (bQ1), thereby determining data T2;
quantizing the data T2 at the first quantization point, thereby determining data T2_Q;
determining data T3 based on data T2_Q, wherein in the process from data T2_Q to T3, the trained parameters include a first deconvolution layer and a second convolution layer using parameters Para (aQ1, bQ2);
determining data T4(2) based on data T3, wherein in the process from data T3 to T4(2), the trained parameters include a second deconvolution layer using parameters Para (aQ2).
14. The method of claim 13, wherein the third graph G3 includes:
inputting data T1 into the first convolutional layer using a parameter Para (bQ1), thereby determining data T2;
determining data T3(3) from the data T2; wherein in the process from data T2 to T3(3), the trained parameters include a first deconvolution layer and a second convolution layer using parameter Para (aQ1, bQ2), wherein the process further includes a subtraction operation after the second convolution layer;
quantizing the data T3(3) at the second quantization point, thereby determining data T3_Q(3);
determining data T4(3) from data T3_Q(3), wherein in the process from data T3_Q(3) to T4(3), the trained parameters include a second deconvolution layer using parameters Para (aQ2), wherein the process further includes a rescaling operation prior to the second deconvolution layer.
15. The method of claim 14, wherein the fourth graph G4 includes:
inputting data T1 into the first convolutional layer using a parameter Para (bQ1), thereby determining data T2;
determining T3(4) from the data T2, wherein in the process from data T2 to T3(4), the trained parameters include a first deconvolution layer and a second convolution layer using parameters Para (aQ1, bQ2), wherein the process further includes a subtraction operation after the second convolution layer;
data T4(4) is determined from data T3(4), wherein in the process from data T3(4) to T4(4), the trained parameters include a second deconvolution layer using parameters Para (aQ 2).
16. The method of claim 11, further comprising:
determining a first kernel by performing an integer transformation of the parameter in the fifth intermediate kernel, determining a second kernel by performing an integer transformation of the parameter in the sixth intermediate kernel, and determining a third kernel by performing an integer transformation of the parameter in the seventh intermediate kernel;
determining a fourth kernel by fine tuning parameters in the eighth intermediate kernel;
the compressed kernels and QINN are determined by fine-tuning parameters in the intermediate compressed kernels and intermediate QINN.
17. An apparatus for video segment compression, comprising:
a readout unit, wherein the readout unit is configured to read out a plurality of sets of original pixel values from a camera head;
a processor, wherein the processor is configured to perform compression on a plurality of frames in a video segment, wherein the compression process comprises:
performing intra-frame compression by compressing each set of original pixel values into an intra-frame compressed sample (ICS) frame using a compression kernel, wherein the ICS frame includes a first ICS frame and several remaining ICS frames (R-ICS frames) occurring after the first ICS frame, and wherein the compression kernel has Ncomp ICS channels, and Ncomp is an integer not less than 1;
quantizing the first ICS frame and quantizing the R-ICS frame into QR-ICS frames, wherein the quantized first ICS frame comprises Ncomp single-channel quantized first ICS frames, and each single-channel quantized first ICS frame corresponds to one ICS channel of the quantized first ICS frame, wherein each QR-ICS frame comprises Ncomp sub-QR-ICS frames, and each sub-QR-ICS frame corresponds to one ICS channel of the QR-ICS frame;
for each sub-QR-ICS frame corresponding to each ICS channel, performing matched image block subtraction in the sub-QR-ICS frame associated with the first single-channel quantized ICS frame corresponding to the ICS channel, and generating the ICS frame of the subtracted image blocks, wherein one or more motion vectors correspond to one sub-QR-ICS frame, and the motion vectors represent the relative positioning between the matched image block in the sub-QR-ICS frame and the reference image block in the first single-channel quantized ICS frame, and the motion vector corresponding to one ICS channel is shared by other ICS channels;
for the ICS frames of the subtracted image blocks corresponding to each ICS channel, forming stacks of the ICS frames of the subtracted image blocks, wherein each stack comprises a preset number of ICS frames of the subtracted image blocks;
for an ICS frame of the subtracted image block in each stack corresponding to each ICS channel, determining shared data, wherein the shared data represents similar data between the ICS frames of the subtracted image block, and determining a stack residual frame based on the shared data;
wherein each set of original pixel values corresponds to a frame of the video segment.
CN201980066142.9A 2019-10-10 2019-10-10 Method and device for compressing video clip Active CN113196779B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/110470 WO2021068175A1 (en) 2019-10-10 2019-10-10 Method and apparatus for video clip compression

Publications (2)

Publication Number Publication Date
CN113196779A (en) 2021-07-30
CN113196779B (en) 2022-05-20

Family

ID=75436925

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980066142.9A Active CN113196779B (en) 2019-10-10 2019-10-10 Method and device for compressing video clip

Country Status (2)

Country Link
CN (1) CN113196779B (en)
WO (1) WO2021068175A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108022270A (en) * 2016-11-03 2018-05-11 Adobe Inc. Image patch matching using probability-based sampling based on prediction
CN108632625A (en) * 2017-03-21 2018-10-09 Huawei Technologies Co., Ltd. Video encoding method, video decoding method, and related devices
CN109587502A (en) * 2018-12-29 2019-04-05 Shenzhen Wangxin Technology Co., Ltd. Frame data compression method, apparatus, device, and computer-readable storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2647202A1 (en) * 2010-12-01 2013-10-09 iMinds Method and device for correlation channel estimation
US20140212046A1 (en) * 2013-01-31 2014-07-31 Sony Corporation Bit depth reduction techniques for low complexity image patch matching


Also Published As

Publication number Publication date
WO2021068175A1 (en) 2021-04-15
CN113196779A (en) 2021-07-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right
Denomination of invention: Method and device for video clip compression
Effective date of registration: 20230817
Granted publication date: 20220520
Pledgee: Jiangsu Jiangyin Rural Commercial Bank Co.,Ltd. Wuxi Branch
Pledgor: Wuxi ankedi Intelligent Technology Co.,Ltd.
Registration number: Y2023980052623