WO2019237659A1 - Blind compressive sampling method and apparatus, and imaging system


Info

Publication number
WO2019237659A1
Authority
WO
WIPO (PCT)
Prior art keywords
pixel values
groups
frames
weight tensor
group
Prior art date
Application number
PCT/CN2018/117040
Other languages
English (en)
Inventor
David Jones Brady
Xuefei YAN
Jianqiang Wang
Chao Huang
Zian Li
Yunfu DENG
Original Assignee
Suzhou Aqueti Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Suzhou Aqueti Technology Co., Ltd.
Priority to CN201880094416.0A priority Critical patent/CN112470472B/zh
Priority to CN201980038408.9A priority patent/CN112425158B/zh
Priority to PCT/CN2019/073703 priority patent/WO2019237753A1/fr
Publication of WO2019237659A1 publication Critical patent/WO2019237659A1/fr

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 25/00 Circuitry of solid-state image sensors [SSIS]; Control thereof
    • H04N 25/40 Extracting pixel data from image sensors by controlling scanning circuits, e.g. by modifying the number of pixels sampled or to be sampled
    • H04N 25/44 Extracting pixel data from image sensors by controlling scanning circuits, e.g. by modifying the number of pixels sampled or to be sampled by partially reading an SSIS array
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/124 Quantisation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 25/00 Circuitry of solid-state image sensors [SSIS]; Control thereof
    • H04N 25/70 SSIS architectures; Circuits associated therewith
    • H04N 25/71 Charge-coupled device [CCD] sensors; Charge-transfer registers specially adapted for CCD sensors
    • H04N 25/75 Circuitry for providing, modifying or processing image signals from the pixel array
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/45 Cameras or camera modules comprising electronic image sensors; Control thereof for generating image signals from two or more image sensors being of different type or operating in different modes, e.g. with a CMOS sensor for moving images in combination with a charge-coupled device [CCD] for still images

Definitions

  • the present invention relates to a blind compressive sampling method and apparatus, and to an imaging method and system, particularly to a method for low-power imaging.
  • the general function of a camera is to transform parallel optical data into compressed serial electronic formats for transmission or information storage.
  • a parallel camera (including an array of cameras), which uses multiple focal planes to capture data in parallel, is increasingly popular because it allows parallel capture and processing of image data fields.
  • Each camera in the array may capture a different area in the field of view or may capture diverse color, exposure or focal range components.
  • a parallel camera may capture 100-1,000 megapixels per "frame". This data rate considerably strains system communication, processing and storage resources.
  • focal plane operation at video rates may require 1 nanojoule per captured pixel. Processing per pixel to create the read-out compressed data stream typically requires 5-10x more power than pixel capture, e.g. ~10 nanojoules per pixel.
  • every pixel must be processed because all pixels are used in producing the visualized image.
  • initial image display and analysis may not involve display of every captured pixel.
  • scenes are analyzed in compressed or low-resolution format prior to analysis of high-resolution components. Since many pixels may never be observed, it may be more efficient to delay image processing until one knows which pixels need analysis.
  • a conventional camera consists of a focal plane and a “system on chip” image processing platform.
  • the image signal processing (ISP) chip implements various image processing functions, such as demosaicing, nonuniformity correction, etc. as well as image compression.
  • ISP chips typically implement color demosaicing and other image processing algorithms that also require power and chip area. In view of the multiple digital steps per pixel required in these processes, ISP chips commonly use substantially more power than image sensor capture and read-out.
  • Compressive measurement is a mechanism for substantially reducing camera head power.
  • compressive measurement uses physical layer coding to project image data onto coded patterns that can be computationally inverted to recover the original image. Such methods may be used in spectral imaging, as disclosed in US7336353B2, or in temporal imaging, as disclosed in Llull, P., Liao, X., Yuan, X., Yang, J., Kittle, D., Carin, L., ... & Brady, D.J. (2013), "Coded aperture compressive temporal imaging," Optics Express, 21(9), 10526-10545.
  • the blind compressive sampling method may include one or more of the following operations.
  • a plurality of groups of raw pixel values may be read out in sequence from one or more imaging sensor arrays.
  • Each group of raw pixel values may be compressed into an integer with a compression weight tensor.
  • a plurality of integers corresponding to the plurality of groups of raw pixel values may be stored.
  • Each group of raw pixel values may correspond to a portion of one or more frames captured by the one or more imaging sensor arrays, and the plurality of integers may be used for reconstructing the one or more frames.
  • each element in the compression weight tensor may be an integer.
  • bit lengths of elements in the compression weight tensor may be 12-bit, 8-bit, 6-bit, 4-bit or 2-bit; when the elements are 2-bit binaries, each element in the compression weight tensor may be -1/+1 or 0/1.
  • a group of raw pixel values may correspond to pixels in a 2D patch of a frame.
  • compressing the group of raw pixel values into an integer with a compression weight tensor may comprise: performing a 2D convolution operation on pixels in the group with a 2D kernel.
  • a group of raw pixel values may correspond to pixels in a 1D segment of a frame.
  • compressing the group of raw pixel values into an integer with a compression weight tensor may comprise: performing a 1D convolution operation on pixels in the group with one or more 1D kernels.
  • a group of raw pixel values may correspond to pixels from multi-frames, and each pixel may be from a same location of the multi-frames.
  • compressing the group of raw pixel values into an integer with a compression weight tensor may comprise: performing a weighted summation of the group of raw pixel values.
  • weight values for different locations of a frame within the multi-frames may correspond to a 1D kernel or a 2D kernel, and the 1D or 2D kernels of the multi-frames may be orthogonal or quasi-orthogonal to each other.
  • the multi-frames may be captured by a parallel camera, and each frame may be captured by a camera within the parallel camera.
  • the method may further include one or more of the following operations.
  • the plurality of integers may be decompressed into a plurality of groups of PQI output pixel values with a decompression weight tensor, wherein the decompression weight tensor may have a same dimension as the compression weight tensor.
  • a plurality of groups of output pixel values may be obtained by inputting the plurality of groups of PQI output pixel values into a QINN.
  • parameters of the decompression weight tensor and the QINN may be determined by sample-based training.
  • the sample-based training process may include the following operations.
  • a plurality of groups of sample pixel values may be read out in sequence from one or more imaging sensor arrays.
  • a first plurality of sample integers may be determined by compressing each group of sample pixel values into an integer with a first compression weight tensor.
  • a first plurality of groups of PQI output pixel values may be determined by carrying out an inverse operation on the first plurality of sample integers with a first decompression weight tensor.
  • a first plurality of groups of output pixel values may be determined by inputting the first plurality of groups of PQI output pixel values into a first QINN.
  • parameters in the first decompression weight tensor and parameters in the first QINN may be tuned based on machine learning by minimizing the quality loss between the plurality of groups of sample pixel values and the first plurality of groups of output pixel values.
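The tuning described in the steps above can be sketched as follows. This is a simplified, illustrative model only: a single linear decompression matrix stands in for the decompression weight tensor plus QINN, and all names and shapes are assumptions, not the disclosure's actual implementation.

```python
import numpy as np

# Simplified sketch: learn a linear decompression matrix D by gradient
# descent on the mean square difference between sample pixel groups X and
# their reconstructions from the compressed measurements X @ C.T.
# (The disclosure trains a decompression weight tensor plus a QINN; the
# linear-only model here is an illustrative stand-in.)
rng = np.random.default_rng(0)
n, m = 16, 4                                   # group size, compressed size
C = rng.choice([-1.0, 1.0], size=(m, n))       # fixed +/-1 compression tensor
D = rng.normal(scale=0.1, size=(n, m))         # decompression weights to train

X = rng.random((1000, n))                      # groups of sample pixel values
losses = []
for _ in range(500):
    Y = X @ C.T                                # compress each group
    Xhat = Y @ D.T                             # decompress (reconstruct)
    err = Xhat - X
    losses.append(np.mean(err ** 2))           # mean square quality loss
    D -= 0.01 * 2 * (err.T @ Y) / X.size       # gradient step on D
```

The quality loss decreases monotonically for a small enough step size, mirroring the "tuning by minimizing the quality loss" described above.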
  • the first compression weight tensor may be Gold codes, quasi-orthogonal codes or quasi-random codes.
  • parameters of the compression weight tensor may be determined by sample-based training together with the parameters of the decompression weight tensor and the QINN; the training process may include the following operations.
  • a plurality of groups of sample pixel values may be read out in sequence from one or more imaging sensor arrays.
  • a first plurality of sample floating-numbers may be determined by compressing each group of sample pixel values with a second compression weight tensor.
  • a second plurality of groups of PQI output pixel values may be determined by carrying out an inverse operation on the first plurality of sample floating-numbers with a second decompression weight tensor.
  • the second compression weight tensor may be tuned into a floating-number weight tensor by adjusting its parameters via machine learning, minimizing the quality loss between the plurality of groups of sample pixel values and the second plurality of groups of PQI output pixel values.
  • the method may further include the following operations.
  • An integerized compression weight tensor may be determined by integerizing parameters in the floating-number weight tensor.
  • a second plurality of sample integers may be determined by compressing each group of sample pixel values into an integer with the integerized compression weight tensor.
  • a third plurality of groups of PQI output pixel values may be determined by carrying out an inverse operation on the second plurality of sample integers with the second decompression weight tensor.
  • a second plurality of groups of output pixel values may be determined by inputting the third plurality of groups of PQI output pixel values into a second QINN. Parameters in the second decompression weight tensor and parameters in the second QINN may be tuned based on machine learning by minimizing the quality loss between the plurality of groups of sample pixel values and the second plurality of groups of output pixel values.
  • the minimization of the quality loss may include one of the following operations: minimizing the mean square difference between the plurality of groups of sample pixel values and the first plurality of groups of output pixel values, minimizing the mean square difference between the plurality of groups of sample pixel values and the second plurality of groups of PQI output pixel values, or minimizing the mean square difference between the plurality of groups of sample pixel values and the second plurality of groups of output pixel values.
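The integerization step in the operations above is not pinned down in the disclosure. A minimal sketch, assuming a simple sign-based mapping of the trained floating-number weights to 2-bit -1/+1 values (one plausible scheme among several):

```python
import numpy as np

# Hypothetical integerization scheme (the disclosure does not fix one):
# map each trained floating-number weight to -1 or +1 by its sign,
# yielding a 2-bit integerized compression weight tensor suitable for
# hardware such as an FPGA.
float_weights = np.array([[0.8, -0.3],
                          [-1.2, 0.05]])
int_weights = np.where(float_weights >= 0, 1, -1)
```

After integerization, the decompression weight tensor and QINN would be re-tuned against the integerized tensor, as the operations above describe.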
  • each group of raw pixel values may correspond to a portion of multi-frames captured in sequence by an imaging sensor array, and pixels in a frame in the multi-frames may be compressed with a certain compression ratio by a compression weight tensor.
  • pixels in the multi-frames may be compressed with a same compression ratio.
  • pixels in the first frame and pixels in the last frame of the multi-frames may be compressed with a first compression ratio.
  • pixels in one or more frames between the first frame and the last frame may be compressed with a second compression ratio, wherein the first compression ratio is larger than the second compression ratio.
  • the method may further include the following operations.
  • pixel values in a 3D entity may be tuned by inputting the 3D entity into a third QINN, wherein the 3D entity is obtained by stacking the plurality of groups of output pixel values corresponding to the multi-frames together.
  • a blind compressive sampling apparatus including a reading-out unit configured to read out a plurality of groups of raw pixel values in sequence from one or more imaging sensor arrays, a compressing unit configured to compress each group of raw pixel values into an integer with a compression weight tensor, and a storage configured to store a plurality of integers corresponding to the plurality of groups of raw pixel values.
  • Each group of raw pixel values may correspond to a portion of one or more frames captured by the one or more imaging sensor arrays, and the plurality of integers may be used for reconstructing the one or more frames.
  • each element in the compression weight tensor may be an integer.
  • bit lengths of elements in the compression weight tensor may be 12-bit, 8-bit, 6-bit, 4-bit or 2-bit; when the elements are 2-bit binaries, each element in the compression weight tensor may be -1/+1 or 0/1.
  • a group of raw pixel values may correspond to pixels in a 2D patch of a frame.
  • the compressing unit may be configured to perform a 2D convolution operation on pixels in the group with a 2D kernel.
  • a group of raw pixel values may correspond to pixels in a 1D segment of a frame, and the compressing unit may be configured to perform a 1D convolution operation on pixels in the group with one or more 1D kernels.
  • a group of raw pixel values may correspond to pixels from multi-frames, and each pixel may be from a same location of the multi-frames, and the compressing unit may be configured to perform a weighted summation of the group of raw pixel values.
  • weight values of different locations of a frame within the multi-frames may correspond to a 1D kernel, and the 1D kernels of the multi-frames may be orthogonal or quasi-orthogonal to each other.
  • the multi-frames may be captured by a parallel camera, and each frame may be captured by a camera within the parallel camera.
  • Yet another aspect of the present disclosure is directed to an imaging system including a compressing module configured to compress a plurality of groups of raw pixel values from one or more imaging sensor array into a plurality of integers and a decompressing module configured to decompress the plurality of integers into a plurality of groups of output pixel values.
  • the compressing module may include a reading-out unit configured to read out a plurality of groups of raw pixel values in sequence from one or more imaging sensor arrays, a compressing unit configured to compress each group of raw pixel values into an integer with a compression weight tensor, and a storage configured to store a plurality of integers corresponding to the plurality of groups of raw pixel values.
  • Each group of raw pixel values may correspond to a portion of one or more frames captured by the one or more imaging sensor arrays, and the plurality of integers may be used for reconstructing the one or more frames.
  • FIG. 1A is an exemplary schematic diagram of a parallel camera.
  • FIG. 1B shows a pixel arrangement of a frame according to some embodiments of the disclosure.
  • FIG. 2 illustrates a blind compressive sampling method according to some embodiments of the present disclosure.
  • FIG. 3A shows a convolution process as described in 204 according to some embodiments of the present disclosure.
  • FIG. 3B illustrates a compression process of frame strategy one as described in 204 according to some embodiments of the present disclosure.
  • FIG. 4 is an example of four weight tensors during a single-layer convolutional 2D compression of raw-bayer data from the parallel camera according to some embodiments of the present disclosure.
  • FIG. 5 is an example of original and reconstructed raw-bayer pictures according to some embodiments of the present disclosure.
  • FIG. 6 is an integer array of the shape [256, 480, 4] after compression of the input pixel values shown in FIG. 5 (bottom panel) according to some embodiments of the present disclosure.
  • FIG. 7 shows demosaiced RGB pictures from the original raw-bayer picture and the reconstructed raw-bayer picture according to some embodiments of the present disclosure.
  • FIG. 8 illustrates a decompression method for the plurality of integers after the compression process according to some embodiments of the present disclosure.
  • FIG. 9 shows an exemplary frame compressed from two frames based on frame strategy three described in FIG. 2 according to some embodiments of the present disclosure.
  • FIG. 10A-D show exemplary separation processes of the two frames described in FIG. 9 according to some embodiments of the present disclosure.
  • FIG. 11 illustrates a method for training a decompression weight tensor and a QINN according to some embodiments of the present disclosure.
  • FIG. 12 illustrates a method for training a floating-number weight tensor according to some embodiments of the present disclosure.
  • FIG. 13 shows a method for training parameters in a decompression weight tensor and parameters in a QINN according to some embodiments of the present disclosure.
  • FIG. 14 is an exemplary diagram of a video compression strategy relative to the video compression standard H.265 according to some embodiments of the present disclosure.
  • FIG. 15 is an exemplary diagram of a blind compressive sampling apparatus according to some embodiments of the present disclosure.
  • FIG. 16 is an exemplary diagram of an imaging system according to some embodiments of the present disclosure.
  • The terms "system," "engine," "unit," "module," and/or "block" used herein are one way to distinguish different components, elements, parts, sections or assemblies at different levels in ascending order. However, the terms may be replaced by other expressions that achieve the same purpose.
  • The terms "module," "unit," or "block," as used herein, refer to logic embodied in hardware or firmware, or to a collection of software instructions.
  • a module, a unit, or a block described herein may be implemented as software and/or hardware and may be stored in any type of non-transitory computer-readable medium or other storage device.
  • a software module/unit/block may be compiled and linked into an executable program. It will be appreciated that software modules can be callable from other modules/units/blocks or from themselves, and/or may be invoked in response to detected events or interrupts.
  • Software modules/units/blocks configured for execution on computing devices may be provided on a computer-readable medium, such as a compact disc, a digital video disc, a flash drive, a magnetic disc, or any other tangible medium, or as a digital download (and can be originally stored in a compressed or installable format that needs installation, decompression, or decryption prior to execution) .
  • Such software code may be stored, partially or fully, on a storage device of the executing computing device, for execution by the computing device.
  • Software instructions may be embedded in firmware, such as an EPROM.
  • modules/units/blocks may be included in connected logic components, such as gates and flip-flops, and/or may be comprised of programmable units, such as programmable gate arrays or processors.
  • the modules/units/blocks or computing device functionality described herein may be implemented as software modules/units/blocks, but may also be represented in hardware or firmware.
  • the modules/units/blocks described herein refer to logical modules/units/blocks that may be combined with other modules/units/blocks or divided into sub-modules/sub-units/sub-blocks despite their physical organization or storage. The description may be applicable to a system, an engine, or a portion thereof.
  • the present disclosure provided herein relates to a blind compressive sampling method, apparatus and an imaging system. Detailed descriptions will be illustrated in the following embodiments.
  • FIG. 1A is an exemplary schematic diagram of a parallel camera.
  • the parallel camera may include an array of cameras, and the array of cameras may receive light corresponding to a scene at an image field. Light received by the array of cameras may be read out as raw-bayer data by an image signal processing (ISP) chip as shown in FIG. 1A.
  • the parallel camera array may consist of an array of "microcameras". Each of the cylindrical objects in the picture corresponds to a microcamera (including a lens and an electronic sensor array).
  • the microcameras may point in different directions and observe different fields of view. In some embodiments, many different arrangements for the microcameras may be possible.
  • ring arrangements enabling 360-degree field-of-view capture are popular for capturing virtual reality media.
  • stereo arrangements are popular for capturing 3D perspective.
  • microcameras with different color sensitivity or different temporal sampling (as in M. Shankar, N.P. Pitsianis, and D.J. Brady, "Compressive video sensors using multichannel imagers," Appl. Opt. 49(10), B9–B17, 2010) are also of interest.
  • the array of microcameras may also capture different ranges or focal depths using diverse focal lengths or focus settings. In each case, raw-bayer data must be read-out of the parallel camera using parallel or serial streaming as shown in FIG. 1B.
  • FIG. 1B shows a pixel arrangement of a frame according to some embodiments of the disclosure.
  • a frame may include 49 pixels and each pixel may have a corresponding pixel value.
  • the pixel values may be read out in sequence by the camera head after capturing the frame.
  • the read-out data from a focal plane may be in a raster format, meaning rows are read out in sequence.
  • Pixel values, represented as x_{i,j}, are read out in order as x_{1,j} for j from 1 to 7, then x_{2,j}, etc.
  • Typical focal planes have many more than 49 pixels; i and j may each extend to several thousand.
  • frames are also read out sequentially in time, so the pixel values in general are indexed by three values, x_{i,j,k}, with k corresponding to the frame number. Pixels also correspond to different colors, typically red, green and blue, but the color values are typically mosaicked across the sensor so that a given pixel corresponds to a given known color.
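The raster read-out order described above can be sketched as follows (a toy 7x7 frame; the value encoding is purely illustrative):

```python
# Sketch of raster read-out: pixel values of a 7x7 frame stream out row by
# row, i.e. x_{1,j} for j = 1..7, then x_{2,j}, and so on.
def raster_readout(frame):
    """Yield pixel values in raster (row-major) order."""
    for row in frame:
        for value in row:
            yield value

# x_{i,j} encoded as 10*i + j so the read-out order is easy to see.
frame = [[10 * i + j for j in range(1, 8)] for i in range(1, 8)]
stream = list(raster_readout(frame))   # 49 values: 11..17, then 21..27, ...
```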
  • Pixel values in conventional cameras may be buffered for demosaicing, image processing (nonuniformity correction, color space transformation, denoising, sharpening, white balance and black level adjustment, etc. ) and compression. Since compression is implemented as a two-dimensional transformation, multiple rows (typically 8) must be buffered and each pixel value must be accumulated in several transformation buffers. In addition, division of pixel values by quantization matrices and compressive (Huffman) coding must be implemented on each image block.
  • the present disclosure presents a method/strategy for immediate compression of raw-bayer data streams with minimal processing.
  • the overall power in the camera heads (of the microcameras) can be reduced to a level comparable to the sensor read-out power. This can be achieved by minimizing the amount of buffered pixel values and performing simple numerical operations per pixel.
  • the present compression/coding methods/strategies may apply compact kernel convolution and coding. These methods constitute "blind compression," or "compressive sampling," in that the read-out signal is immediately multiplexed in linear transformations of pixel data, without analysis of the content of the pixel data. Compressive sampling has most commonly used decompression strategies based on convex optimization, but more recently neural-network-based decompressive inference has proved effective, as discussed for example in Mousavi, Ali, Ankit B. Patel, and Richard G. Baraniuk, "A deep learning approach to structured signal recovery," 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), IEEE, 2015.
  • JPEG2000 transforms localized spatial, spectral and/or temporal blocks in the image data cube onto a sparsifying basis.
  • JPEG consists of a block-based discrete cosine transformation along with coefficient thresholding and quantization, followed by lossless sequence compression.
  • Video standards add temporal frame-differential analysis to this approach; JPEG2000 extends it using multiscale transformations over still-localized bases. These approaches are based on the fact that pixels within a localized region contain correlated data, which enables compression. Such pixel maps may be sparsely represented by transformation onto a different basis.
  • The difference between JPEG2000 and the blind compressive sampling approach may be explained as follows. First, consider two numbers, A and B. In communicating A or B, a certain amount of information is transmitted based on the probability entropy implicit in each number. If the mutual information of A and B is nonzero, then communication of A+B may transmit a different amount of information than A-B. JPEG2000 compression relies on this fact: in images, if A and B are adjacent pixels, A+B almost always involves more information than A-B. At an edge, A-B may communicate more, so JPEG2000 sends values of A-B only at points where this value is large.
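The A+B versus A-B argument can be checked numerically. The scan line below is synthetic and purely illustrative:

```python
import numpy as np

# For adjacent pixels A and B of a smooth image row, the differences A - B
# cluster near zero (few bits to code), while the sums A + B spread over a
# wide range - the property difference-based transform coding exploits.
row = np.sin(np.linspace(0.0, 3.0, 256)) * 127.0 + 128.0   # smooth scan line
A, B = row[:-1], row[1:]                                   # adjacent pairs
spread_diff = np.std(A - B)   # small: neighbors are highly correlated
spread_sum = np.std(A + B)    # large: sums inherit the full signal range
```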
  • blind compressive sampling, in contrast, combines pixel values in a one-dimensional (1D) or 2D shape with a pre-set compression weight tensor into an array of numbers.
  • the compression process can also be viewed as carrying out convolutional operations as used in a convolutional neural network.
  • the components/values/elements of the weight tensor may be restricted to integers for convenient application using hardware such as FPGA.
  • the compression ratio can be calculated as the ratio of the total bit count of the array of numbers to the total bit count of the input pixel values from the parallel camera.
  • a pre-set decompression weight tensor and a pre-set deep neural network may be used to decompress the array of numbers from the compression process back to raw data (with certain loss of quality compared with the original) .
  • the decompressed raw data may further go through the necessary processes, including demosaicing to the usual RGB or YUV pictures or video frames.
  • FIG. 2 illustrates a blind compressive sampling method according to some embodiments of the present disclosure.
  • the compressing method may be implemented by the camera head.
  • a plurality of groups of raw pixel values may be read out in sequence from one or more imaging sensor arrays.
  • the raw pixel values may be read out in sequence as raw-bayer data.
  • a camera head may capture a frame comprising a plurality of pixels. Each pixel may be represented by a pixel value. The pixel value may be transmitted in a form of binary.
  • an imaging sensor array may be an electronic sensor array and the raw pixel values may be the input pixel values as described in FIG. 1A.
  • the plurality of groups of raw pixel values may be compressed with one of three frame strategies which will be described below.
  • each group of raw pixel values may be compressed into an integer with a compression weight tensor.
  • each group of raw pixel values corresponds to a portion of one or more frames captured by the one or more imaging sensor arrays.
  • elements in the weight tensor may be integers for easy application on hardware such as FPGA.
  • the elements in the compression weight tensor may be binaries, and the bit lengths of the elements may be 12-bit, 10-bit, 8-bit, 6-bit, 4-bit or 2-bit. Further, when the elements are 2-bit binaries, the elements may be -1 or +1; or the elements may be 0 or 1.
  • a group of raw pixel values may correspond to pixels in a 2D patch of a frame, and the compression weight tensor may be a 2D kernel.
  • the 2D patch and the 2D kernel may have a same dimension.
  • the 2D kernel may have a dimension [k_x, k_y].
  • the frame of shape [N_x, N_y, 1] may be divided into [N_x/k_x, N_y/k_y] patches.
  • Pixel values corresponding to the pixels in a certain patch of shape [k_x, k_y, 1] may be multiplied by a 2D kernel of shape [k_x, k_y, 1, ncomp], and the pixel values in the 2D patch may be compressed into ncomp numbers (ncomp is a preset integer defined manually).
  • the input pixel values of the frame may be compressed into an array of integers (ncomp numbers per patch).
  • the compression process may be a 2D convolution operation described by equation (1) below:
  • Output(k) = Σ_{i,j} pixel_{i,j} · weight_{i,j,k}  (1), where indices i and j loop through k_x and k_y respectively, and index k runs from 1 to ncomp.
  • the compression ratio, without considering the difference between the bit length of input pixel values (raw pixel values are usually 8-bit or 10-bit) and the bit length of the array of numbers after compression (8-bit), can be expressed as ncomp/(k_x*k_y).
  • different compression ratios can be achieved by using various settings of [k x , k y , ncomp] .
  • different compression ratios such as 1/16, 1/16, 1/32 and 1/256 can be achieved with 2D kernels [16, 16, 16], [8, 8, 4], [16, 16, 8] and [16, 16, 1] respectively.
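A sketch of this patch-wise 2D compression follows. The frame size and the ±1-valued kernel are illustrative assumptions; the kernel's color-channel axis is dropped for brevity:

```python
import numpy as np

# Frame strategy one as a sketch: a [N_x, N_y] frame is split into
# [k_x, k_y] patches, and each patch is reduced to ncomp integers by a
# +/-1-valued 2D kernel of shape [k_x, k_y, ncomp], as in equation (1).
rng = np.random.default_rng(1)
Nx, Ny, kx, ky, ncomp = 64, 64, 16, 16, 8     # target ratio 8/256 = 1/32
frame = rng.integers(0, 256, size=(Nx, Ny))   # raw pixel values
kernel = rng.choice([-1, 1], size=(kx, ky, ncomp))

# Group pixels into [Nx/kx, Ny/ky] patches of shape [kx, ky].
patches = frame.reshape(Nx // kx, kx, Ny // ky, ky).transpose(0, 2, 1, 3)
# Equation (1): Output(k) = sum_{i,j} pixel_{i,j} * weight_{i,j,k}
out = np.einsum('pqij,ijk->pqk', patches, kernel)
ratio = out.size / frame.size                 # = ncomp / (kx * ky)
```

With [16, 16, 8] the ratio works out to 8/256 = 1/32, matching the examples above.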
  • a group of raw pixel values may correspond to pixels in a 1D segment of a frame.
  • the sequence of pixels corresponding to the frame may be divided into segments.
  • the weight tensor may be an integer vector with the same dimension as the segment, and the compression process may be a 1D convolution operation that combines pixels in the 1D segment of the frame into an integer with a 1D kernel or a 1D integer vector.
  • each element in the integer vector may be -1 or +1. In some embodiments, each element in the integer vector may be 0 or 1.
  • 16 incoming pixel values may be combined together using an integer vector [0, 1, 0, 0, 1, 0, ... 1] with length of 16 into one number.
  • 16 incoming pixel values may be combined together using an integer vector [-1, 1, -1, -1, 1, -1, ... 1] with length of 16 into one number.
  • the sequence may be divided row by row.
  • Various 1D kernels or 1D integer vectors have been developed, including [128, 1, 4] , [32, 1, 4] .
  • combinations of different convolutional-1D kernels for different rows in the raw-bayer data may be used to control the total compression ratio of a picture/frame.
  • This way of dividing the sequence of pixels requires a smaller buffer than the 2D-patch division, because incoming pixel values can be processed segment by segment and pixel values from different rows/segments do not need to be buffered.
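A minimal numpy sketch of this 1D-segment compression, assuming a length-16 +1/-1 integer vector (the specific code shown is illustrative):

```python
import numpy as np

def compress_1d(row, code):
    """Combine each len(code)-pixel segment of a row into one number."""
    seg = len(code)
    usable = len(row) // seg * seg
    # each segment is reduced to one weighted sum (a 1D convolution with stride seg)
    return row[:usable].reshape(-1, seg) @ code

row = np.arange(32, dtype=float)
code = np.tile([1, -1], 8)      # a +1/-1 integer vector of length 16
print(compress_1d(row, code))   # one number per 16-pixel segment
```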
  • a group of raw pixel values may correspond to pixels from multi-frames, and each pixel is from a same location of the multi-frames.
  • the weight tensor may include weight values for the pixel values in the multi-frames. For example, there may be a first frame, a second frame and a third frame, and each frame has a dimension of [16, 16, 1] , and there may be a first location, a second location, a third location, ..., and a sixteenth pixel location in each frame. For each pixel location in a frame, there may be a corresponding weight value.
  • Weight values for pixel locations in a frame may correspond to a 1D kernel or a 2D kernel. In some embodiments, the 1D kernels or 2D kernels between the multi-frames are orthogonal or quasi-orthogonal to each other.
  • x ijk l refers to the ij th spatial pixel value at time k from the l th microcamera.
  • the compression process may consist of compressing with a weight tensor c: g ijk = Σ l (c ijk l × x ijk l )
  • c ijk l is a weight value corresponding to the ij th spatial pixel value at time k for the l th microcamera
  • c ijk l may be typically drawn from +1 or -1.
  • other weight values may be also considered.
  • c ijk l may be typically drawn from 0 or 1.
  • c ijk l may be typically drawn from 12-bit, 10-bit, 8-bit, 6-bit, 4-bit or binary values.
  • the sequence of weight values for a given microcamera may be selected to be orthogonal or quasi-orthogonal to that for other microcameras, using for example the Gold codes common to CDMA, but quasi-random or other codes may also be used.
  • the multi-frames can be compressed into one frame.
  • the multi-frames x ijk l may be recovered from g ijk with a decompression weight tensor and a deep learning neural network.
  • the decompression weight tensor may have a same dimension as the compression weight tensor. Details of the decompression weight tensor and the deep learning neural network may be seen in FIG. 8 and FIG. 14-16.
  • the frame strategy three may be used for multi-frames captured by microcameras at the same time in a parallel camera, while the frame strategy one or the frame strategy two may be applied to compress raw pixel values in multi-frames captured in time sequence.
  • the frame strategy three may be applied together with one of the frame strategy one and the frame strategy two to compress video frames. Video strategies will also be discussed in this disclosure.
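The multi-frame combination of frame strategy three can be sketched as follows, dropping the time index for brevity; the frame count, random code values and numpy representation are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
frames = rng.random((4, 16, 16))               # 4 frames, e.g. one per microcamera
codes = rng.choice([-1, 1], size=(4, 16, 16))  # a +1/-1 code c per frame

# g_ij = sum over l of c_ij^l * x_ij^l : four frames collapse into one frame
g = np.sum(codes * frames, axis=0)
print(g.shape)   # (16, 16)
```

Decompression would then recover the individual frames from g using the (quasi-)orthogonality of the codes plus a neural network, as described below.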
  • a plurality of integers corresponding to the plurality of groups of raw pixel values may be stored. As described in 204, each group of raw pixel values may be compressed into an integer, and the plurality of groups of raw pixel values may be compressed into a plurality of integers. The plurality of integers may be stored or buffered during the compression/decompression process.
  • FIG. 3A shows a convolution process as described in 204 according to some embodiments of the present disclosure.
  • an array of pixels with dimension of 4*4 may be compressed into one integer with a weight tensor.
  • the weight tensor may also have a dimension of 4*4.
  • the compressed data may be quantized to 8-bit integers for storage and/or transmission.
  • the quantization process in the current invention is much simpler than that of JPEG2000.
  • the output of the convolutional operation may be scaled (its bit length reduced) to fit the range of 8-bit integers. Although there are no complicated entropy-coding calculations to reduce quality loss, and the quantization loss therefore directly affects the overall quality of compression/decompression, the process still achieves overall quality similar to that of JPEG2000.
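The scaling of convolution outputs into the 8-bit range can be sketched as a simple linear quantizer. The helper name quantize_8bit and the min/max scaling rule are illustrative assumptions, not the exact scaling of the disclosure:

```python
import numpy as np

def quantize_8bit(x):
    """Linearly scale values into the 8-bit integer range [0, 255]."""
    lo, hi = float(x.min()), float(x.max())
    scale = 255.0 / (hi - lo) if hi > lo else 1.0
    q = np.round((x - lo) * scale).astype(np.uint8)
    return q, lo, scale          # lo and scale let decompression invert the map

vals = np.array([-3.2, 0.0, 7.9, 100.5])
q, lo, scale = quantize_8bit(vals)
print(q.dtype, int(q.min()), int(q.max()))   # uint8 0 255
```

The quantization error is bounded by half a quantization step, which is the loss referred to above.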
  • Although the compression process is already simple, the necessary buffer size can be further reduced, as illustrated in 204.
  • When applying a convolutional-2D kernel with dimension [4, 4, 1] to a patch of pixels in shape [4, 4] , one does not need to put all the pixels (16 in total) in the buffer and carry out the elementwise product and summation at once. Instead, one can process the pixels with the proper kernel weight elements row by row as the pixels are read in, and keep the output values (an array of numbers) in the buffer until a single convolutional operation is finished. After each convolutional operation, the buffered numbers can be output to storage, and the buffer can be cleared.
  • the necessary buffer size is [k x , k y , ncomp*N x /k x ] when the method above is carried out.
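The row-by-row buffering scheme above can be sketched as follows; only one patch-row of partial sums is held while pixel rows stream in (the square-kernel assumption and function name are illustrative):

```python
import numpy as np

def compress_streaming(frame, kernel):
    """Patch-wise compression that consumes the frame one row at a time,
    keeping only a single patch-row of partial sums in the buffer."""
    ny, nx = frame.shape
    k, _, ncomp = kernel.shape                  # square [k, k, ncomp] kernel
    out = []
    acc = np.zeros((nx // k, ncomp))            # partial sums for one patch row
    for r in range(ny):
        row = frame[r, : nx // k * k].reshape(nx // k, k)
        acc += row @ kernel[r % k]              # add this row's contribution
        if r % k == k - 1:                      # patch row finished: flush
            out.append(acc)
            acc = np.zeros((nx // k, ncomp))
    return np.stack(out)                        # shape [ny // k, nx // k, ncomp]

frame = np.arange(32 * 32, dtype=float).reshape(32, 32)
kernel = np.ones((16, 16, 1))
print(compress_streaming(frame, kernel).shape)  # (2, 2, 1)
```

The result equals the full patch-wise convolution, but at no point are k rows of raw pixels buffered at once.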
  • FIG. 3B illustrates a compression process of frame strategy one as described in 204 according to some embodiments of the present disclosure.
  • a frame may be compressed patch by patch.
  • the frame may be processed by hardware such as an FPGA; the convolutional kernel is applied to each patch of pixels and moves to the next patch, with neither overlap nor gap, until the last one.
  • patch 1 shown in FIG. 3B represents the patch that has been processed, and patch 2 represents the patch being processed.
  • FIG. 4 is an example of four weight tensors during a single-layer convolutional 2D compression process of raw-bayer data according to some embodiments of the present disclosure.
  • a single-layer convolutional-2D operation may be implemented to compress a [N x , N y ] -pixel raw-bayer data.
  • FIG. 5 is an example of original and reconstructed raw-bayer picture according to some embodiments of the present disclosure. Metrics of PSNR (Peak Signal to Noise Ratio) and SSIM (Structural Similarity) are shown on the top of the figure.
  • PSNR Peak Signal to Noise Ratio
  • SSIM Structural Similarity
  • the input pixel values (raw-bayer data) and reconstructed raw-bayer picture are presented in the bottom and top panels of Fig. 5, respectively.
  • the [2048, 3840] original raw-bayer picture is compressed to an integer array of the shape [256, 480, 4] which will be described in Fig. 6.
  • the compression ratio is 1/16 (not considering the extra compression due to bit length reduction) .
  • the bottom panel presents the original raw-bayer picture (comprised by the input pixel values)
  • the top panel presents the reconstructed raw-bayer picture which is from the input pixel values going through the compression and decompression processes.
  • the decompression process will be discussed in FIG. 8.
  • the reconstructed raw-bayer picture has to be demosaiced to be a usual RGB or YUV format, and the demosaic step can be done after the decompression.
  • FIG. 6 is an integer array of the shape [256, 480, 4] after compression of the input pixel values shown in FIG. 5 (bottom panel) according to some embodiments of the present disclosure.
  • the compression may be implemented with the weight tensor shown in FIG. 4.
  • FIG. 7 shows demosaiced RGB pictures from the original raw-bayer picture and the reconstructed raw-bayer picture according to some embodiments of the present disclosure.
  • the bottom panel presents the demosaiced RGB picture from the original raw-bayer picture
  • the top panel presents the demosaiced RGB picture from the reconstructed raw-bayer picture. Both demosaiced RGB pictures, from the original raw-bayer picture and from the reconstructed raw-bayer picture, are shown as monochrome images in FIG. 7.
  • FIG. 8 illustrates a decompression method for the plurality of integers after the compression process according to some embodiments of the present disclosure.
  • the decompression method may be performed by the camera head or by a server remote from the camera head.
  • the plurality of integers may be decompressed into a plurality of groups of Pre-Quality-Improvement (PQI) output pixel values.
  • the decompression process may be implemented with a decompression weight tensor, wherein the decompression weight tensor may have the same dimension as the compression weight tensor.
  • parameters in the decompression weight tensor may be tuned based on machine learning.
  • a plurality of groups of output pixel values may be determined by inputting the plurality of groups of PQI output pixel values into a neural network for quality improvements (QINN) .
  • the plurality of groups of output pixel values may correspond to the reconstructed raw-bayer picture in FIG. 5.
  • the neural network for quality improvements may be a deep convolutional neural network.
  • the QINN may be made of three parts: the pre-inception module, the inception module and the post-inception module.
  • the pre-inception module may use two different kernels in parallel to carry out convolutional operations without changing the [N x , N y ] dimension of the input data, but combines the inter-pixel information into a 3D entity of dimension [N x , N y , NC] .
  • NC may be, for example, 96.
  • the inception module may combine two different convolutional kernels and may be intended to further mix the information provided by the pre-inception module.
  • a 3D entity of the same dimension as the input may be produced by the convolutional operations. This 3D entity may be added to the input in an elementwise manner, for the purpose to form a Residual Network, to produce the output.
  • Both the input and output of the inception module may be in dimension [N x , N y , NC] . Because of this design, a number of inception modules can be stacked together to form a deep neural network, while the Residual-Network design keeps the network easy to train.
  • the post-inception module may use a convolutional operation to combine the output from the final inception module and the output from the pre-inception module. Another convolutional operation may be put at the end to make the final output in shape [N x , N y ] as the raw-bayer picture.
  • 8 inception modules may be stacked, in addition to the pre- and post-inception modules, to form the QINN.
  • parameters (or values) in the decompression weight tensor/the QINN may be trained.
  • the sample training method may be also seen in FIG. 11-13.
  • the plurality of groups of output pixel values may be further demosaiced by the traditional methods, such as function “demosaic” in Matlab.
  • another QINN can be used for the demosaiced RGB picture so the quality of the final RGB output is optimized.
  • the input and output of this QINN for RGB pictures are in dimension [N x , N y , 3] .
  • the quality of the compression has been measured by directly comparing the original raw-bayer data with the plurality of groups of output pixel values (after going through compression and decompression) , and comparing the RGB output from them.
  • the quality comparison has been tested using the quantities PSNR and SSIM for the RGB output according to this disclosure and the RGB output from JPEG2000 under the same compression ratios (using the non-compressed RGB output as ground truth). It has been found that Frame-Strategy one of the current invention is close in quality to JPEG2000 under 1/16 compression (PSNR and SSIM according to this disclosure are relatively less than 1% lower than JPEG2000's).
  • the quality of Frame-Strategy two is about 2.5% (row by row) or 5% (segments with random numbers of pixel values) lower, relatively, than that of Frame-Strategy one.
  • the computation for compression in the current invention involves only multiplying integers from the weight tensor with the incoming pixels and summing over certain groups; it is much simpler than the processes JPEG2000 has to go through, namely demosaicing, Discrete Cosine Transformation, entropy coding and JPEG2000-specific quantization.
  • the current invention takes much less power than JPEG2000 for compression.
  • Different strategies in the current invention involve different amounts of computation, but they all consume much less power than JPEG2000.
  • decompression weight tensors with different parameters/values may be obtained based on machine learning for different compression weight tensors.
  • FIG. 9 shows an exemplary frame compressed from two frames based on the frame strategy three described in FIG. 2 according to some embodiments of the present disclosure.
  • CDMA-based separation/reconstruction of frames drawn from the Modified National Institute of Standards and Technology (MNIST) database may be analyzed.
  • Each frame may have 28 by 28 pixels.
  • Two frames were added pairwise after one of the frames had been multiplied by a quasi-random code with code values drawn from +1 or -1.
  • the first layer is a dense network transforming the 28 by 28 compressed input frame into 28 by 28 by 2 layer frames. Three convolutional network layers were then used to restore the frames to their original form.
  • FIG. 10A-D show exemplary separation processes of the two frames described in FIG. 9 according to some embodiments of the present disclosure.
  • the upper left image is the compressed multiplexed frame that would be received at the decompression system.
  • the two center images are the images restored from the compressed data.
  • the two rightmost images are the original images from the MNIST data set. While deeper networks may be used and more frames may be multiplexed together, the separation of frames shown in FIG. 10A-D demonstrates the functionality of the compressive sampling method.
  • FIG. 11 illustrates a method for training a decompression weight tensor and a QINN according to some embodiments of the present disclosure.
  • the trained decompression weight tensor and the trained QINN may be used for the CDMA compression/decompression process.
  • a plurality of groups of sample pixel values may be read out in sequence from one or more imaging sensor array.
  • a group of sample pixel values may correspond to pixels from multi-frames, and each pixel is from a same location of the multi-frames.
  • the plurality of groups of sample pixel values may correspond to pixels from each location of the multi-frames.
  • a first plurality of sample integers may be determined by compressing each group of sample pixel values into an integer with a first compression weight tensor.
  • the multi-frames may be compressed into and may be represented by one frame.
  • Each pixel value at a location of the one frame may be a weighted summation of the pixel values at the same location of the multi-frames.
  • weight values for pixel locations in a frame may correspond to a 1D kernel or a 2D kernel.
  • the 1D or 2D kernels of the multi-frames are orthogonal or quasi-orthogonal to each other.
  • the first compression weight tensor may include the 1D or 2D kernels.
  • the first compression weight tensor may use the Gold codes common to CDMA, but quasi-random or other codes may also be used.
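Orthogonality between two codes simply means a zero dot product, which can be checked directly; the two ±1 vectors below are toy examples for illustration, not actual Gold codes:

```python
import numpy as np

c1 = np.array([1, -1] * 8)           # a length-16 +1/-1 code
c2 = np.array([1, 1, -1, -1] * 4)    # a second length-16 +1/-1 code

# zero dot product -> the two codes are orthogonal, so contributions
# coded with c1 can be separated from those coded with c2
print(int(c1 @ c2))   # 0
```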
  • a first plurality of groups of PQI output pixel values may be determined by carrying out an inverse-operation to the first plurality of sample integers with a first decompression weight tensor.
  • the first decompression weight tensor may have a same dimension as the first compression weight tensor, and parameters in the first decompression weight tensor may be trained.
  • a first plurality of groups of output pixel values may be determined by inputting the first plurality of groups of PQI output pixel values into a first QINN.
  • the first QINN may be a deep convolutional neural network. And parameters of the first QINN may also be trained.
  • parameters in the first decompression weight tensor and parameters in the first QINN may be tuned based on machine learning by minimizing the quality loss between the plurality of groups of sample pixel values and the first plurality of groups of output pixel values.
  • the quality loss may be defined as the mean square difference between the plurality of groups of sample pixel values and the first plurality of groups of output pixel values.
  • decompression weight tensors with different parameters/values may be obtained based on machine learning for different compression weight tensors.
  • FIG. 12 illustrates a method for training a floating-number weight tensor according to some embodiments of the present disclosure.
  • a plurality of groups of sample pixel values may be read out in sequence from one or more imaging sensor array.
  • a group of sample pixel values may correspond to pixels in a 2D patch or a 1D segment of a frame.
  • the plurality of groups of sample pixel values here may be different from those described in FIG. 11: the sample pixel values in FIG. 12 may correspond to frames in a time series from one camera module of a parallel camera, while those in FIG. 11 may correspond to frames captured by multiple camera modules of a parallel camera at the same time.
  • a first plurality of sample floating-numbers may be determined by compressing each group of sample pixel values with a second compression weight tensor. With this step, pixel values in a 2D patch or a 1D segment in a frame may be compressed into a floating-number pixel value with a compression ratio.
  • the second compression weight tensor may be a 2D kernel, or one or more 1D kernels, wherein elements in the 2D kernel or elements in the one or more 1D kernel are floating-numbers.
  • a second plurality of groups of PQI output pixel values may be determined by carrying out an inverse-operation to the first plurality of sample floating-numbers with a second decompression weight tensor.
  • the second decompression weight tensor may have a same dimension as the second compression weight tensor.
  • the second compression weight tensor may be tuned into a floating-number weight tensor by adjusting its parameters via machine learning, minimizing the quality loss between the plurality of groups of sample pixel values and the second plurality of groups of PQI output pixel values.
  • the quality loss may be defined as the mean square difference between the plurality of groups of sample pixel values and the second plurality of groups of PQI output pixel values.
  • FIG. 13 shows a method for training parameters in a decompression weight tensor and parameters in a QINN according to some embodiments of the present disclosure.
  • an integerized compression weight tensor may be determined by integerizing parameters/values in the floating-number weight tensor.
  • the weight tensor may include a 2D kernel, or one or more 1D kernels. Values of elements in the 2D kernel, or one or more 1D kernels may be integerized to obtain an integerized compression weight tensor.
  • a second plurality of sample integers may be determined by compressing each group of sample pixel values into an integer with the integerized compression weight tensor.
  • the integerized compression weight tensor may be implemented on an FPGA.
  • a third plurality of groups of PQI output pixel values may be determined by carrying out an inverse-operation to the second plurality of sample integers with the second decompression weight tensor.
  • the second decompression weight tensor may be the same one used in 1204.
  • a second plurality of groups of output pixel values may be determined by inputting the third plurality of groups of PQI output pixel values into a second QINN.
  • the second QINN may be different from the first QINN described in FIG. 11, as the first QINN used in decompression process is designed for the CDMA decompression process.
  • parameters in the second decompression weight tensor and parameters in the second QINN may be tuned based on machine learning by minimizing the quality loss between the plurality of groups of sample pixel values and the second plurality of groups of output pixel values.
  • the quality loss may be defined as the mean square difference between the plurality of groups of sample pixel values and the second plurality of groups of output pixel values.
  • the trained second decompression weight tensor and the trained second QINN may be applied together with the integerized compression weight tensor.
  • the decompression process is provided only for illustration purpose, and not intended to limit the scope of the present disclosure.
  • multiple variations and modifications may be made under the teachings of the present disclosure.
  • those variations and modifications do not depart from the scope of the present disclosure.
  • the parameters/values in the floating-number weight tensor may be integerized with any known integerization method.
  • the frame strategy three may be applied together with one of frame strategy one or frame strategy two to compress/decompress frames in video.
  • a same frame strategy with different compression ratio may be applied to compress/decompress frames in video. Considering only a fixed angle/field of view captured by an imaging sensor array, only frame strategy one or frame strategy two need to be discussed.
  • a group of raw pixel values may correspond to a portion of multi-frames captured in time sequence by an imaging sensor array.
  • the group of raw pixel values may correspond to a portion of a frame in the multi-frames.
  • the group of raw pixel values may correspond to pixels in a 2D patch or a 1D segment of a frame in the multi-frames.
  • pixels in the first and the last frame may be compressed with a first compression ratio
  • pixels in one or more frames between the first frame and the last frame may be compressed with a second compression ratio, wherein the first compression ratio is larger than the second compression ratio.
  • the number of multi-frames may be 10.
  • the 1 st frame and the 10 th frame may be compressed with a not-so-low compression ratio (such as 1/16)
  • the frames 2-9 may be compressed with a much lower compression ratio (such as 1/256) .
  • Each frame in the multi-frames may be compressed using frame strategy one or frame strategy two; after decompression with the corresponding decompression tensors, the reconstructed multi-frames may be input into a video QINN.
  • the 10 frames may be stacked together as a 3D entity, and this 3D entity may be fine-tuned by a post-processing neural network to improve the quality of decompression.
  • the middle frames are able to use the information from the adjacent frames and their quality can thus be improved.
  • the average compression ratio of this strategy for a full video can be calculated as 1/96.
  • the 1 st frame is under 1/16 compression and the 2 nd to the 9 th frames are under 1/256 compression
  • the 10 th frame under 1/16 may be considered as the 1 st frame of a group two, so information of the 10 th frame in the group one may be used for both the group one and the group two.
  • each 1/16 frame (except the 1 st frame in group one and the 10 th frame in the last group) may be used by two adjacent groups, so each group effectively counts 1 frame at 1/16 and 8 frames at 1/256; averaging over these 9 frames gives (1/16 + 8/256) /9 = 1/96.
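The averaging above can be checked directly:

```python
# Each group of 10 frames shares its last 1/16-compressed frame with the
# next group, so per group: 1 frame at 1/16 and 8 frames at 1/256,
# averaged over the 9 distinct frames each group contributes.
avg = (1 / 16 + 8 / 256) / 9
print(avg)   # the average compression ratio, equal to 1/96
```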
  • the number of multi-frames may be 10.
  • the motion-interpolation information obtained from frames 1 and 10 is combined with the compressed information of frames 2-9.
  • the rest of the strategy is the same as video strategy one.
  • each frame in the multi-frames may be compressed under a same compression ratio, instead of using different compression ratios for different frames.
  • the rest of the strategy is the same as video strategy one.
  • the frame strategy three may also be applied in the video strategy to compress/decompress multi-frames captured by microcameras in time sequence.
  • These frame strategies may be adaptive sampling strategies, which means that frame-content analysis may be implemented during the compression process.
  • the kernels used in the compression process may be adapted to media type and/or image content.
  • adaptation can be implemented by analyzing blind compressed data from one or more frames of multi-frames. The data load and computational complexity per pixel are reduced relative to conventional methods in proportion to the compression ratio and the adaptation rate (the number of frames considered per adaptation).
  • frame-content analysis strategies may be carried out with much less computation and power consumption, and may further reduce the size of the compressed data and/or improve the quality of decompression.
  • the plurality of groups of sample pixel values may correspond to one or more sample frames, and the one or more sample frames may correspond to a certain type or may include a certain content.
  • a decompression weight tensor and a QINN may be tuned.
  • a compression weight tensor may be tuned together with the decompression weight tensor and the QINN.
  • each group of compression/decompression settings may correspond to a type/content
  • each group of compression/decompression settings may include a compression weight tensor, a decompression weight tensor, and a QINN specifically trained for the corresponding type/content.
  • pixel values in each frame of multi-frames may be blindly compressed into an array of integers first by a general compression weight tensor, and pixel values in the multi-frames may be compressed into multi-arrays of integers. At least one of the multi-arrays of integers corresponding to the multi-frames may be analyzed by a classification neural network to identify the type/content of the scene (s) in the multi-frames.
  • a compression weight tensor, a decompression weight tensor and a QINN may be determined in the database corresponding to the type/content, and the compression weight tensor, the decompression weight tensor and the QINN may be specifically applied for compressing/decompressing the multi-frames.
  • the pixel values in the multi-frames may be compressed twice by the general compression weight tensor and the type-specific compression weight tensor respectively, which means the pixel values in the multi-frames may be compressed again by the compression weight tensor specifically for the type/content after identifying the type/content of the multi-frames.
  • the type-specific decompression weight tensor and the type-specific QINN may be applied to decompress the multi-arrays of integers compressed by the type-specific compression weight tensor.
  • a decompression weight tensor and a QINN may be determined in the database corresponding to the type/content, and the decompression weight tensor and the QINN may be specifically applied for decompressing the multi-arrays of integers compressed by the general compression weight tensor. Compared with the compressed frame content strategy one, the compressed frame content strategy two may compress the multi-frames only once.
  • the reconstructed raw-bayer picture has to be demosaiced to be a usual RGB or YUV format, and the demosaic step can be done after the decompression. And motion analysis may be performed after the demosaic step. Conventional motion-analysis may be carried out on pixels of the RGB format frames and may consume significant computation and power.
  • the inter-frame-motion analysis in the present disclosure may be performed on compressed data after the compressed frame content strategy. It may significantly reduce the amount of information being saved.
  • pixel values of a frame may be compressed into an array of integers.
  • motion-analysis may be performed by analyzing multi-arrays of integers (compressed data) corresponding to the multi-frames.
  • the multi-arrays of integers corresponding to the multi-frames may share the same content with certain motion between frames
  • the details of the content may be saved only for the first, the last or a middle frame, and a series of motion vectors and residual contents may be saved for the other frames. By saving the details of the content once (for the first, the last or the middle frame), the motion vectors and residual contents for the other frames take much less space than saving all the contents for all the frames.
  • FIG. 14 is an exemplary diagram of a video compression strategy compared with the video compression strategy H.265, according to some embodiments of the present disclosure. In some embodiments, one of the frame strategies in FIG. 2 and one of the video strategies in FIG. 13 may be applied in the video compression strategy described here.
  • FIG. 14A is an exemplary diagram of the traditional video compression strategy H.265.
  • video source may be input into the encoding/decoding process.
  • the video source may be the RGB data after demosaicing the raw-bayer data corresponding to multi-frames captured by a parallel camera in time sequence.
  • the partition step may break each frame in the multi-frames into coding blocks, and the predict (subtract) step may do interframe differencing.
  • the transform step may do analysis on a sparse basis, and then the compressed HEVC video (compressed video shown in FIG. 14A) may be transmitted after the entropy encode step.
  • Video output may be obtained after an entropy decode step, an inverse transform step, a predict (add) step and a reconstruct step.
  • FIG. 14B is a video compression strategy according to some embodiments of the present disclosure. As shown in FIG. 14B, compared to the video strategy H.265, our video compression strategy may insert a compressive sampling step between the partition step and the predict (subtract) step. In some embodiments, one of the frame strategies described in FIG. 2 may be applied in the compressive sampling step. Further, one of the video strategies described in FIG. 13 may also be applied together with a frame strategy in the compressive sampling step. The predict (subtract) step may do interframe differencing. The transform step may do analysis on a sparse basis, and then the compressed HEVC video (compressed video shown in FIG. 14B) may be transmitted after the entropy encode step.
  • Video output may be obtained after an entropy decode step, an inverse transform step, a predict (add) step and a neural reconstruction step.
  • the predict (add) step may be a prediction NN and may be merged with the neural reconstruction step. Due to the compressive sampling at the video source, the data load and power in the rest of the encoding/decoding process can be substantially reduced.
  • FIG. 15 is an exemplary diagram of a blind compressive sampling apparatus according to some embodiments of the present disclosure.
  • the blind compressive sampling apparatus may include a reading-out unit 1510, a compressing unit 1520 and a storage 1530.
  • the blind compressive sampling apparatus 1500 may be configured to compress raw-bayer data from focal planes (sensor arrays) of a parallel camera.
  • the reading-out unit 1510 may be configured to read out a plurality of groups of raw pixel values in sequence from one or more imaging sensor array.
  • each group of raw pixel values may correspond to a portion of one or more frames captured by the one or more imaging sensor array.
  • a group of raw pixel values may correspond to pixels in a 2D patch or a 1D segment of a frame.
  • a group of raw pixel values may correspond to pixels from multi-frames, and each pixel is from a same location of the multi-frames.
  • the multi-frames may be captured by multi camera modules in a parallel camera at a same time.
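The three grouping options above (a 2D patch, a 1D segment, and a same-location group drawn across several frames) can be sketched as follows; the array shapes and indices are illustrative assumptions, not values from the disclosure.

```python
# Illustrative sketch of the three ways a group of raw pixel values
# may be formed, using three toy 6x6 frames.
import numpy as np

frames = np.arange(3 * 6 * 6).reshape(3, 6, 6)  # 3 toy frames, 6x6 each

group_2d_patch = frames[0, 0:2, 0:2].ravel()  # 2x2 patch of frame 0
group_1d_segment = frames[0, 0, 0:4]          # 4-pixel segment of a row
group_cross_frame = frames[:, 2, 3]           # pixel (2, 3) of every frame
```

The cross-frame group is the case where the frames come from different camera modules of a parallel camera captured at the same instant.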
  • The compressing unit 1520 may be configured to compress each group of raw pixel values into an integer with a compression weight tensor.
  • The compression weight tensor may be a 2D kernel, or one or more 1D kernels.
  • Each element in the compression weight tensor may be an integer.
  • The bit length of each element in the weight tensor may be 12, 8, 6, 4 or 2 bits.
  • In some embodiments, each element in the compression weight tensor may be -1/+1 or 0/1.
  • For example, 16 incoming pixel values may be combined into one number using an integer vector [0, 1, 0, 0, 1, 0, ... 1] with a length of 16.
  • Alternatively, 16 incoming pixel values may be combined into one number using an integer vector [-1, 1, -1, -1, 1, -1, ... 1] with a length of 16.
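The two examples above amount to a dot product between the group and the weight vector. A minimal sketch, using weight vectors of our own choosing (the disclosure does not specify particular vectors):

```python
# Combining 16 incoming pixel values into one integer with a binary
# (0/1) or bipolar (-1/+1) weight vector of length 16.
import numpy as np

pixels = np.arange(16)             # 16 incoming pixel values (toy data)
w01 = np.array([0, 1] * 8)         # illustrative 0/1 vector, length 16
wpm = np.where(w01 == 1, 1, -1)    # illustrative -1/+1 vector

measurement_01 = int(pixels @ w01)  # keep-or-drop combination
measurement_pm = int(pixels @ wpm)  # signed combination
```

Either way, 16 raw values collapse to a single integer, which is what the storage unit records.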
  • Compressing a group of pixel values into one number may be a convolution operation.
  • Alternatively, compressing a group of pixel values into one number may be a weighted summation operation. The weight values at different locations of a frame in the multiple frames may correspond to a 1D kernel, and the 1D kernels of the multiple frames may be orthogonal or quasi-orthogonal to each other.
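A hedged sketch of the weighted-summation case with mutually orthogonal 1D kernels. The use of rows of a 4x4 Hadamard-style matrix, and the 4-element group size, are our own illustrative assumptions; the disclosure only requires that the kernels be orthogonal or quasi-orthogonal.

```python
# Weighted summation of a pixel group with one of several mutually
# orthogonal 1D kernels (rows of a Hadamard-style matrix).
import numpy as np

kernels = np.array([[ 1,  1,  1,  1],
                    [ 1, -1,  1, -1],
                    [ 1,  1, -1, -1],
                    [ 1, -1, -1,  1]])
# Orthogonality check: distinct rows have zero dot product, so the
# Gram matrix is a multiple of the identity.
gram = kernels @ kernels.T

group = np.array([10, 20, 30, 40])     # toy 4-pixel group
measurement = int(group @ kernels[1])  # weighted summation with kernel 1
```

Orthogonal kernels keep the measurements taken with different kernels from interfering with one another, which simplifies later reconstruction.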
  • The storage 1530 may be configured to store a plurality of integers corresponding to the plurality of groups of raw pixel values.
  • The plurality of integers may be used for reconstructing the one or more frames.
  • FIG. 16 is an exemplary diagram of an imaging system according to some embodiments of the present disclosure.
  • the imaging system 1600 may include a compressing module 1610 and a decompressing module 1620.
  • The compressing module 1610 may be configured to compress a plurality of groups of raw pixel values from one or more imaging sensor arrays into a plurality of integers. The compressing module 1610 may have the same function as the blind compressive sampling apparatus 1500.
  • The compressing module 1610 may include a read-out unit 1611, a compressing unit 1612 and a storage 1613.
  • The read-out unit 1611 may be configured to read out a plurality of groups of raw pixel values in sequence from one or more imaging sensor arrays.
  • The compressing unit 1612 may be configured to compress each group of raw pixel values into an integer with a compression weight tensor.
  • The storage 1613 may be configured to store a plurality of integers corresponding to the plurality of groups of raw pixel values.
  • Each group of raw pixel values may correspond to a portion of one or more frames captured by the one or more imaging sensor arrays, and the plurality of integers may be used for reconstructing the one or more frames.
  • The decompressing module 1620 may be configured to decompress the plurality of integers into a plurality of groups of output pixel values. In some embodiments, the decompressing module 1620 may perform the decompression method described in FIG. 8.
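The disclosure reconstructs frames with a neural reconstruction step (the method of FIG. 8, not reproduced here). As a stand-in, the sketch below shows only the linear part such a decompressor could build on: given several integer measurements of one group taken with known weight vectors, a least-squares solve recovers an estimate of the group. The 12-measurement/16-pixel sizes and the random 0/1 weights are our own assumptions.

```python
# Linear stand-in for the decompressing module: recover a pixel group
# from integer measurements taken with known 0/1 weight vectors.
import numpy as np

rng = np.random.default_rng(1)
W = rng.integers(0, 2, size=(12, 16))  # 12 measurement vectors, 0/1 weights
group = rng.integers(0, 256, size=16)  # true 16-pixel group (unknown to us)
y = W @ group                          # stored integer measurements

# Minimum-norm least-squares estimate of the group from y and W.
estimate, *_ = np.linalg.lstsq(W.astype(float), y.astype(float), rcond=None)
# With fewer measurements than pixels the system is underdetermined; a
# learned prior (the neural reconstruction step) closes that gap.
```

The solve reproduces the measurements exactly, but picking the right group among all consistent ones is precisely what the neural reconstruction step is for.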
  • Aspects of the present disclosure may be illustrated and described herein in any of a number of patentable classes or contexts, including any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of the present disclosure may be implemented entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.) or in a combination of software and hardware, all of which may generally be referred to herein as a “block,” “module,” “engine,” “unit,” “component,” or “system.” Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable media having computer readable program code embodied thereon.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Studio Devices (AREA)
  • Image Processing (AREA)

Abstract

The present disclosure relates to a blind compressive sampling method and apparatus, and an imaging system. The blind compressive sampling method includes the steps of: sequentially reading out a plurality of groups of raw pixel values from one or more imaging sensor arrays; compressing each group of raw pixel values into an integer with a compression weight tensor; and storing a plurality of integers corresponding to the plurality of groups of raw pixel values. Each group of raw pixel values corresponds to a portion of one or more frames captured by the one or more imaging sensor arrays, and the plurality of integers are used for reconstructing the one or more frames.
PCT/CN2018/117040 2018-06-11 2018-11-22 Procédé et appareil d'échantillonnage par compression aveugle, et système d'imagerie WO2019237659A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201880094416.0A CN112470472B (zh) 2018-06-11 2018-11-22 盲压缩采样方法、装置及成像系统
CN201980038408.9A CN112425158B (zh) 2018-06-11 2019-01-29 一种监视相机系统及降低监视相机系统功耗的方法
PCT/CN2019/073703 WO2019237753A1 (fr) 2018-06-11 2019-01-29 Système de caméra de surveillance et procédé permettant de réduire la consommation d'énergie de ce dernier

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862683293P 2018-06-11 2018-06-11
US62/683,293 2018-06-11

Publications (1)

Publication Number Publication Date
WO2019237659A1 true WO2019237659A1 (fr) 2019-12-19

Family

ID=68764362

Family Applications (3)

Application Number Title Priority Date Filing Date
PCT/CN2018/117040 WO2019237659A1 (fr) 2018-06-11 2018-11-22 Procédé et appareil d'échantillonnage par compression aveugle, et système d'imagerie
PCT/CN2019/073703 WO2019237753A1 (fr) 2018-06-11 2019-01-29 Système de caméra de surveillance et procédé permettant de réduire la consommation d'énergie de ce dernier
PCT/US2019/036602 WO2019241285A1 (fr) 2018-06-11 2019-06-11 Compression à division de code pour caméras matricielles

Family Applications After (2)

Application Number Title Priority Date Filing Date
PCT/CN2019/073703 WO2019237753A1 (fr) 2018-06-11 2019-01-29 Système de caméra de surveillance et procédé permettant de réduire la consommation d'énergie de ce dernier
PCT/US2019/036602 WO2019241285A1 (fr) 2018-06-11 2019-06-11 Compression à division de code pour caméras matricielles

Country Status (3)

Country Link
US (1) US10944923B2 (fr)
CN (2) CN112470472B (fr)
WO (3) WO2019237659A1 (fr)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3997863B1 (fr) * 2019-07-12 2023-09-06 University College Cork-National University of Ireland Cork Procédé et système pour effectuer une détection d'image optique à grande vitesse
CN111432169B (zh) * 2019-12-25 2021-11-30 杭州海康威视数字技术股份有限公司 视频传输方法、装置、设备和系统
CN115660971B (zh) * 2022-10-08 2024-02-23 镕铭微电子(济南)有限公司 一种基于深度学习硬件加速器实现usm锐化的方法

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060133502A1 (en) * 2004-11-30 2006-06-22 Yung-Lyul Lee Image down-sampling transcoding method and device
CN101710993A (zh) * 2009-11-30 2010-05-19 北京大学 基于块的自适应超分辨率视频处理方法及系统
US20110026819A1 (en) * 2009-07-28 2011-02-03 Samsung Electronics Co., Ltd. Apparatus, method, and medium of encoding and decoding image data using sampling
CN103700074A (zh) * 2013-12-23 2014-04-02 电子科技大学 基于离散余弦变换系数分布的自适应压缩感知采样方法
CN106254879A (zh) * 2016-08-31 2016-12-21 广州精点计算机科技有限公司 一种应用自编码神经网络的有损图像压缩方法
CN107155110A (zh) * 2017-06-14 2017-09-12 福建帝视信息科技有限公司 一种基于超分辨率技术的图片压缩方法

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB9317604D0 (en) 1993-08-24 1993-10-06 Philips Electronics Uk Ltd Receiver for ds-cdma signals
US5896368A (en) * 1995-05-01 1999-04-20 Telefonaktiebolaget Lm Ericsson Multi-code compressed mode DS-CDMA systems and methods
JP3901514B2 (ja) * 2001-12-27 2007-04-04 富士通株式会社 画像圧縮方法、その復元方法及びそのプログラム
US7283231B2 (en) 2004-07-20 2007-10-16 Duke University Compressive sampling and signal inference
US7336353B2 (en) 2005-10-17 2008-02-26 Duke University Coding and modulation for hyperspectral imaging
US9182228B2 (en) * 2006-02-13 2015-11-10 Sony Corporation Multi-lens array system and method
US20080123750A1 (en) * 2006-11-29 2008-05-29 Michael Bronstein Parallel deblocking filter for H.264 video codec
US20090160970A1 (en) * 2007-12-20 2009-06-25 Fredlund John R Remote determination of image-acquisition settings and opportunities
CN102104768A (zh) * 2009-12-22 2011-06-22 乐金电子(中国)研究开发中心有限公司 图像监控方法、主控装置及系统
US8254646B2 (en) * 2010-01-25 2012-08-28 Apple Inc. Image preprocessing
CN102238366A (zh) * 2010-04-26 2011-11-09 鸿富锦精密工业(深圳)有限公司 实现影像追踪监控的摄影机及方法
CN102271251B (zh) * 2010-06-02 2013-01-16 华晶科技股份有限公司 无失真的图像压缩方法
CN102447884A (zh) * 2010-10-14 2012-05-09 鸿富锦精密工业(深圳)有限公司 网络摄像机解析度自动调整系统及方法
US9686560B2 (en) * 2015-02-23 2017-06-20 Teledyne Dalsa, Inc. Lossless data compression and decompression apparatus, system, and method
CN106454229A (zh) * 2016-09-27 2017-02-22 成都理想境界科技有限公司 一种监测方法、摄像装置和图像处理设备和监测系统
CN107197260B (zh) * 2017-06-12 2019-09-13 清华大学深圳研究生院 基于卷积神经网络的视频编码后置滤波方法
CN107995494B (zh) * 2017-12-12 2019-11-22 珠海全志科技股份有限公司 视频图像数据的压缩方法与解压方法、计算机装置、计算机可读存储介质

Also Published As

Publication number Publication date
WO2019241285A1 (fr) 2019-12-19
WO2019237753A1 (fr) 2019-12-19
US10944923B2 (en) 2021-03-09
CN112425158A (zh) 2021-02-26
CN112470472A (zh) 2021-03-09
CN112425158B (zh) 2023-05-23
US20190379845A1 (en) 2019-12-12
CN112470472B (zh) 2023-03-24

Similar Documents

Publication Publication Date Title
US11025907B2 (en) Receptive-field-conforming convolution models for video coding
US5412427A (en) Electronic camera utilizing image compression feedback for improved color processing
CN110612722B (zh) 对数字光场图像编码和解码的方法和设备
JP2002516540A (ja) 知覚的に無損失の画像をもたらす2次元離散ウェーブレット変換に基づくカラー画像の圧縮
WO2019237659A1 (fr) Procédé et appareil d'échantillonnage par compression aveugle, et système d'imagerie
US9398273B2 (en) Imaging system, imaging apparatus, and imaging method
EP3743855A1 (fr) Modèles de convolution conformes à un champ de réception pour un codage vidéo
US7194129B1 (en) Method and system for color space conversion of patterned color images
WO2022266955A1 (fr) Procédé et appareil de décodage d'images, procédé et appareil de traitement d'images, et dispositif
CN107257493A (zh) 处理图像/视频数据的方法及装置
KR20220090559A (ko) 이미지 프로세서
Lee et al. Lossless compression of CFA sampled image using decorrelated Mallat wavelet packet decomposition
Yan et al. Compressive sampling for array cameras
CN116074538A (zh) 图像编码设备及其控制方法和计算机可读存储介质
CN109819251B (zh) 一种脉冲阵列信号的编解码方法
US7313272B2 (en) Image compression apparatus and image processing system
Trifan et al. A survey on lossless compression of Bayer color filter array images
CN114979711A (zh) 音视频或图像分层压缩方法和装置
CN113170160B (zh) 用于计算机视觉分析的ics帧变换方法和装置
WO2021068175A1 (fr) Procédé et appareil de compression de séquence vidéo
CN110971913B (zh) 一种基于填充Y通道的Bayer图像压缩方法
CN115150370B (zh) 一种图像处理的方法
US20220256127A1 (en) Image encoding apparatus, method for controlling the same, and non-transitory computer-readable storage medium
Mukati Light field coding and processing for view sequences
JP2023070055A (ja) 画像符号化装置及びその制御方法及びプログラム

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18922711

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18922711

Country of ref document: EP

Kind code of ref document: A1