CN113170160A - ICS frame transformation method and device for computer vision analysis - Google Patents


Info

Publication number
CN113170160A
CN113170160A (application CN201980066175.3A)
Authority
CN
China
Prior art keywords
frame
ics
training
sample
frames
Prior art date
Legal status
Granted
Application number
CN201980066175.3A
Other languages
Chinese (zh)
Other versions
CN113170160B (en)
Inventor
David J. Brady (戴维·J·白瑞迪)
Xuefei Yan (严雪飞)
Yulin Jiang (姜玉林)
Current Assignee
Wuxi Ankedi Intelligent Technology Co ltd
Original Assignee
Wuxi Ankedi Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Wuxi Ankedi Intelligent Technology Co ltd filed Critical Wuxi Ankedi Intelligent Technology Co ltd
Publication of CN113170160A publication Critical patent/CN113170160A/en
Application granted granted Critical
Publication of CN113170160B publication Critical patent/CN113170160B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY; H04 ELECTRIC COMMUNICATION TECHNIQUE; H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/182 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being a pixel
    • H04N19/186 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being a colour or a chrominance component
    • H04N19/44 Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H04N19/59 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
    • H04N19/593 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
    • H04N23/843 Camera processing pipelines; Demosaicing, e.g. interpolating colour pixel values
    • H04N23/88 Camera processing pipelines; Colour balance, e.g. white-balance circuits or colour temperature control

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses an ICS frame transformation method and device. The method comprises the following steps: reading one or more ICS frames of size [NX/kx, NY/ky, Ncomp]; determining one or more transformed ICS frames of size [NX/kx, NY/ky, 3] by linearly transforming the one or more ICS frames using parameters in a 2D array of size [Ncomp, 3]; and outputting the one or more transformed ICS frames to a neural network for computer vision analysis. The one or more ICS frames are determined by performing intra-frame compression on one or more raw Bayer frames using a compression kernel, where the one or more raw Bayer frames are captured by one or more camera heads. Each raw Bayer frame has a size [NX, NY] and the compression kernel has a size [kx, ky, Ncomp], where NX, NY, kx, ky, NX/kx, NY/ky, and Ncomp are positive integers and Ncomp represents the number of ICS channels of the compression kernel.

Description

ICS frame transformation method and device for computer vision analysis
Technical Field
The invention relates to computer vision analysis, in particular to a method for processing data in a compression process of computer vision analysis.
Background
The general function of a camera is to transform optical data into a compressed, continuous electronic format for transmission or storage. The optical data may correspond to one or more raw Bayer frames. Raw Bayer frames generally have high resolution but transmit slowly, so a compression method must be applied before transmission, and a corresponding decompression operation may then be required.
A conventional camera includes a focal plane and a system-on-chip image processing stage. The chip performs demosaicing, white balance adjustment, color mixing adjustment, gamma correction, compression, and decompression in sequence, and a frame or image can only be viewed or analyzed after reconstruction (decompression).
Typically, Neural Networks (NNs) are used for Computer Vision (CV) analysis. Computer vision analysis may be object detection/classification, face recognition, etc., and frames (primarily in RGB format) are input into the NNs for NN applications (object detection/classification, face recognition, etc.) or for NN training.
Most existing visual analytics NNs have small input XY plane sizes, such as [416, 416], and high resolution de-mosaiced frames must be down-sampled (and in most cases subsequently zero-padded) before being input to the NNs.
Power consumption and computation are a critical limitation on camera pixel capacity, and that power and computation budget is consumed by demosaicing, white balance adjustment, color mixing adjustment, gamma correction, compression, decompression, and downsampling. Methods are therefore needed that reduce the power consumption and computational load while avoiding the need to retrain existing computer vision analysis NNs on the market with large amounts of newly labeled training data.
Disclosure of Invention
One aspect of the invention discloses an ICS frame transformation method for computer vision analysis. The transformation method of the ICS frame comprises one or more of the following operations: one or more ICS frames of size [ NX/kx, NY/ky, Ncomp ] can be read. By linearly transforming one or more ICS frames using parameters in a 2D array of size [ Ncomp,3], one or more transformed ICS frames of size [ NX/kx, NY/ky,3] can be determined. One or more transformed ICS frames can be output to a neural network for computer vision analysis. In some embodiments, the one or more ICS frames may be determined by performing intra-frame compression on one or more raw Bayer frames using a compression kernel, where the one or more raw Bayer frames may be captured by one or more camera heads. In some embodiments, the size of each raw Bayer frame may be [ NX, NY ], and the size of the compression core may be [ kx, ky, Ncomp ], and NX, NY, kx, ky, NX/kx, NY/ky, and Ncomp may be positive integers, and Ncomp represents the number of ICS channels of the compression core.
In some embodiments, for each ICS frame, the transformed ICS frame to which it corresponds may be determined by summing the pixel values at the same XY plane position in Ncomp ICS channels, with weighting factors in the three 1D vectors [ Ncomp, j ] of the 2D array, where j is 0,1, and 2.
In some embodiments, the parameters in the 2D array may be determined based on sample training, and the sample training comprises: reading one or more first sample raw Bayer frames, wherein each first sample raw Bayer frame has a size [NX, NY]; determining one or more demosaiced first sample raw Bayer frames by performing demosaicing on each first sample raw Bayer frame, wherein each demosaiced first sample raw Bayer frame has a size [NX, NY, 3]; determining one or more transform training marker frames by performing downsampling on each demosaiced first sample raw Bayer frame, wherein each transform training marker frame has a size [NX/kx, NY/ky, 3]; determining one or more first sample ICS frames by performing intra-frame compression on each first sample raw Bayer frame using the compression kernel; determining one or more first sample transformed ICS frames of size [NX/kx, NY/ky, 3] by linearly transforming the one or more first sample ICS frames using initial parameters in the 2D array; and determining the parameters in the 2D array by adjusting the initial parameters in the 2D array to minimize the total training loss between the one or more first sample transformed ICS frames and the corresponding one or more transform training marker frames.
in some embodiments, each first sample transformed ICS frame corresponds to a transformation training marker frame, and the training penalty for an ICS frame of a first sample transform is the mean square error between the ICS frame of the first sample transform and its corresponding transformation training marker frame, and the total training penalty between the ICS frame of one or more first sample transforms and its corresponding transformation training marker frame or frames is the sum of the individual training penalty values.
In some embodiments, intra-frame compression comprises: for each raw Bayer frame, each group of pixel values in the raw Bayer frame is compressed into an integer using a compression kernel, wherein the pixels in each raw Bayer frame are divided into a plurality of groups, and each group of pixels corresponds to a 2D or 1D raw pixel array of the raw Bayer frame.
In some embodiments, the compression kernel may be determined based on sample training, and the sample training includes: reading out one or more second sample raw Bayer frames, wherein each second sample raw Bayer frame has a size [NX, NY]; for each second sample raw Bayer frame, determining an ICS training marker frame corresponding to it by performing a linear transformation on the R, B, Gb and Gr pixels in the second sample raw Bayer frame and combining them together, wherein each ICS training marker frame has a size [NX', NY', Nlabel]; determining one or more second sample ICS frames of size [NX/kx, NY/ky, Ncomp] by performing intra-frame compression on each second sample raw Bayer frame using an initial compression kernel, wherein the size of the initial compression kernel is [kx, ky, Ncomp]; determining one or more second sample decompressed ICS frames of size [NX', NY', Nlabel] by performing decompression on each second sample ICS frame using an initial decompression kernel of size [kx', ky', Ncomp, Nlabel], wherein kx' = NX'*kx/NX and ky' = NY'*ky/NY; and training the initial compression kernel based on the one or more ICS training marker frames, thereby determining the compression kernel; wherein NX', NY', kx', ky' and Nlabel are all positive integers.
In some embodiments, training an initial compression kernel based on one or more ICS training marker frames, such that determining the compression kernel may comprise: adjusting parameters in the initial compression kernel based on machine learning to minimize overall quality loss between the one or more second sample decompressed ICS frames and their corresponding one or more ICS training marker frames, thereby determining a floating point number compression kernel; the compression kernel is determined by integer-quantizing parameters in the floating-point compression kernel.
In some embodiments, each second-sample decompressed ICS frame corresponds to one ICS training marker frame, and the quality loss of the second-sample decompressed ICS frame is a mean-square error between the second-sample decompressed ICS frame and its corresponding ICS training marker frame, and the total quality loss between the one or more second-sample decompressed ICS frames and the one or more ICS training marker frames is a sum of the individual quality loss values.
In some embodiments, the method further comprises: determining one or more second-sample intermediate ICS frames by performing intra-frame compression on each second-sample raw Bayer frame using the compression kernel; determining one or more second-sample intermediate decompressed ICS frames by performing decompression on each second-sample intermediate ICS frame using an initial decompression kernel; and adjusting parameters in the initial decompression kernel based on machine learning to minimize the total quality loss between the one or more second-sample intermediate decompressed ICS frames and their corresponding one or more ICS training marker frames, thereby determining the decompression kernel.
In some embodiments, each second-sample-intermediate-decompressed ICS frame corresponds to one ICS training marker frame, and the quality loss of the second-sample-intermediate-decompressed ICS frame is a mean-square error between the second-sample-intermediate-decompressed ICS frame and its corresponding ICS training marker frame, and the total quality loss between the one or more second-sample-intermediate-decompressed ICS frames and the one or more ICS training marker frames is a sum of the respective quality loss values.
In some embodiments, the method further comprises: determining one or more second-sample intermediate ICS frames by performing intra-frame compression on each first sample raw Bayer frame using the compression kernel; determining one or more second-sample intermediate decompressed ICS frames by performing decompression on each second-sample intermediate ICS frame using the initial decompression kernel; determining one or more second-sample reconstructed frames by inputting each second-sample intermediate decompressed ICS frame into an initial QINN; and adjusting parameters in the initial decompression kernel and the initial QINN based on machine learning to minimize the total quality loss between the one or more second-sample reconstructed frames and the one or more ICS training marker frames, thereby determining the decompression kernel and the QINN.
In some embodiments, each second-sample reconstructed frame corresponds to one ICS training marker frame, and the quality loss of one second-sample reconstructed ICS frame is a mean-square error between the second-sample reconstructed ICS frame and its corresponding ICS training marker frame, and the total quality loss between the one or more second-sample reconstructed ICS frames and the one or more ICS training marker frames is a sum of individual quality loss values.
In another aspect of the invention, an ICS frame transformation apparatus for computer vision analysis is disclosed, comprising a readout module, a processor, and an output port. The readout module may be configured to read out one or more raw Bayer frames, and each raw Bayer frame may have a size [NX, NY]. The processor may be configured to determine one or more transformed ICS frames by linearly transforming one or more ICS frames using parameters in a 2D array of size [Ncomp, 3]. The output port may be configured to output the one or more transformed ICS frames to a neural network for computer vision analysis. In some embodiments, the one or more ICS frames may be determined by performing intra-frame compression on the one or more raw Bayer frames using a compression kernel, where the one or more raw Bayer frames may be captured by one or more camera heads. In some embodiments, the size of each raw Bayer frame may be [NX, NY], the size of the compression kernel may be [kx, ky, Ncomp], and NX, NY, kx, ky, NX/kx, NY/ky, and Ncomp may be positive integers, with Ncomp representing the number of ICS channels of the compression kernel.
In some embodiments, for each ICS frame, the processor determines the transformed ICS frame to which it corresponds by summing the pixel values at the same XY plane position in Ncomp ICS channels, with weighting factors in three 1D vectors [ Ncomp, j ] of the 2D array, where j is 0,1, and 2.
In some embodiments, the parameters in the 2D array may be determined based on sample training, and the sample training comprises: reading one or more first sample raw Bayer frames, wherein each first sample raw Bayer frame has a size [ NX, NY ]; determining first sample raw Bayer frames of one or more demosaics by performing demosaicing on each first sample raw Bayer frame, wherein the first sample raw Bayer frames of each demosaic are each [ NX, NY,3] in size; determining one or more transformed training token frames by performing downsampling on the first sample raw Bayer frame of each demosaic, wherein each transformed training token frame is of a size [ NX/kx, NY/ky,3 ]; determining one or more first sample ICS frames by performing intra-frame compression on each first sample raw Bayer frame using a compression kernel; determining one or more first sample-transformed ICS-frames of size [ NX/kx, NY/ky,3] by linearly transforming the one or more first sample ICS-frames using initial parameters in a 2D array of size [ Ncomp,3 ]; adjusting initial parameters in the 2D array to minimize overall training loss between the ICS frame of the one or more first sample transforms and its corresponding one or more transformed training marker frames, thereby determining parameters in the 2D array.
In some embodiments, each first sample transformed ICS frame corresponds to a transformation training marker frame, and the training penalty for an ICS frame of a first sample transform is the mean square error between the ICS frame of the first sample transform and its corresponding transformation training marker frame, and the total training penalty between the ICS frame of one or more first sample transforms and its corresponding transformation training marker frame or frames is the sum of the individual training penalty values.
In some embodiments, intra-frame compression comprises: for each raw Bayer frame, each group of pixel values in the raw Bayer frame is compressed into an integer using a compression kernel, wherein the pixels in each raw Bayer frame are divided into a plurality of groups, and each group of pixels corresponds to a 2D or 1D raw pixel array of the raw Bayer frame.
In some embodiments, the compression kernel may be determined based on sample training, including: reading out one or more second sample raw Bayer frames, wherein each second sample raw Bayer frame has a size [NX, NY]; for each second sample raw Bayer frame, determining an ICS training marker frame corresponding thereto by performing a linear transformation on the R, B, Gb and Gr pixels in the second sample raw Bayer frame and combining them together, wherein each ICS training marker frame has a size [NX', NY', Nlabel]; performing intra-frame compression on each second sample raw Bayer frame using an initial compression kernel to determine one or more second sample ICS frames of size [NX/kx, NY/ky, Ncomp], wherein the initial compression kernel is of size [kx, ky, Ncomp]; determining one or more second sample decompressed ICS frames of size [NX', NY', Nlabel] by performing decompression on each second sample ICS frame using an initial decompression kernel of size [kx', ky', Ncomp, Nlabel], where kx' = NX'*kx/NX and ky' = NY'*ky/NY; and training the initial compression kernel based on the one or more ICS training marker frames, thereby determining the compression kernel; wherein NX', NY', kx', ky' and Nlabel are all positive integers.
In some embodiments, training the initial compression kernel based on the one or more ICS training marker frames to determine the compression kernel may comprise: adjusting parameters in the initial compression kernel based on machine learning to minimize the total quality loss between the one or more second-sample decompressed ICS frames and their corresponding one or more ICS training marker frames, thereby determining a floating-point compression kernel; and determining the compression kernel by integer-quantizing the parameters in the floating-point compression kernel.
In some embodiments, each second-sample decompressed ICS frame corresponds to one ICS training marker frame, and the quality loss of one second-sample decompressed ICS frame is the mean square error between the second-sample decompressed ICS frame and its corresponding ICS training marker frame, and the total quality loss between the one or more second-sample decompressed ICS frames and the one or more ICS training marker frames is the sum of the individual quality loss values.
Additional features will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art from that description or may be learned by practice of the exemplary embodiments described herein. The features of the present invention may be realized and attained by practicing or using the various aspects of the methods, means, and combinations discussed in the detailed examples below.
Drawings
The present invention will be further described in exemplary embodiments. These exemplary embodiments will be described in detail with reference to the accompanying drawings. These embodiments are non-limiting exemplary embodiments in which like reference numerals represent similar structures throughout the several views of the drawings. Wherein:
fig. 1 illustrates an example of an original raw Bayer frame, according to some embodiments of the invention.
FIG. 2 illustrates methods of intra-frame compression and decompression according to some embodiments of the invention.
FIG. 3 illustrates the convolution process described in 204 according to some embodiments of the invention.
Fig. 4 illustrates an example of an intra-frame compression process employing the frame strategy described at 204, according to some embodiments of the invention.
Fig. 5 is an example of a compression kernel during single-layer convolution 2D compression of raw Bayer data according to an embodiment of the present invention.
FIG. 6 shows an integer array of shape [256,480,4] obtained by compressing the input pixel values with the compression kernel.
FIG. 7 illustrates a method of transforming ICS frames for computer vision analysis, in accordance with some embodiments of the present invention.
FIG. 8 illustrates an example of a transformed ICS frame, according to some embodiments of the invention.
FIG. 9 illustrates computer vision analysis results in RGB format of the stack of transformed ICS frames in FIG. 8, in accordance with some embodiments of the present invention.
FIG. 10 illustrates an example of a method of training parameters in a 2D array according to some embodiments of the invention.
FIG. 11 illustrates two widely used formats of an information set matrix according to some embodiments of the invention.
FIG. 12 illustrates an example of a pre-training method of a compression kernel according to some embodiments of the invention.
FIG. 13 illustrates an example of a training method of a compression kernel according to some embodiments of the invention.
Fig. 14 is an example of a training method for a decompression core according to some embodiments of the invention.
FIG. 15 illustrates another training method of a decompression core according to some embodiments of the invention.
FIG. 16 illustrates an ICS frame transformation apparatus for computer vision analysis.
Detailed Description
In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant disclosure. It will be apparent, however, to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, systems, components, and/or circuits have been described at a relatively high-level, without detail, in order to avoid unnecessarily obscuring aspects of the present invention. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present invention. Thus, the present invention is not limited to the embodiments shown, but is to be accorded the widest scope consistent with the claims.
It will be understood that the terms "system," "engine," "unit," "module," and/or "block" as used herein are a way of distinguishing different components, elements, parts, sections, or assemblies at different levels in ascending order. However, these terms may be replaced by other expressions if the same purpose can be achieved.
Generally, the words "module," "unit," or "block" as used herein refer to logic embodied in hardware or firmware, or to a collection of software instructions. The modules, units, or blocks described herein may be implemented as software and/or hardware and may be stored in any type of non-transitory computer-readable medium or other storage device. In some embodiments, software modules/units/blocks may be compiled and linked into an executable program. It should be understood that software modules may be callable from other modules/units/blocks or from themselves, and/or may be invoked in response to detected events or interrupts. Software modules/units/blocks configured for execution on a computing device may be provided on a computer-readable medium, such as a compact disc, a digital video disc, a flash drive, a magnetic disk, or any other tangible medium, or as a digital download (and may be originally stored in a compressed or installable format that requires installation, decompression, or decryption before execution). Such software code may be stored, in part or in whole, on a storage device of the executing computing device, for execution by the computing device. Software instructions may be embedded in firmware, such as an EPROM. It should also be understood that hardware modules/units/blocks may be included in connected logic components, such as gates and flip-flops, and/or may include programmable units, such as programmable gate arrays or processors. The modules/units/blocks or computing device functionality described herein may be implemented as software modules/units/blocks, but may also be represented in hardware or firmware. Generally, a module/unit/block described herein refers to a logical module/unit/block that may be combined with other modules/units/blocks or divided into sub-modules/sub-units/sub-blocks despite their physical organization or storage. The description may apply to a system, an engine, or a portion thereof.
It will be understood that when an element, engine, module or block is referred to as being "on," "connected to" or "coupled to" another element, engine, module or block, it can be directly on, connected or coupled to the other element, engine, module or block, or an intervening element, engine, module or block may be present, unless the context clearly dictates otherwise. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
These and other features of the present invention, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description with reference to the accompanying drawings, all of which form a part of this disclosure. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the disclosure. It is understood that the drawings are not to scale.
The terminology used herein is for the purpose of describing particular examples and embodiments only and is not intended to be limiting. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this disclosure, specify the presence of integers, devices, acts, features, steps, elements, operations, and/or components, but do not preclude the presence or addition of one or more other integers, devices, acts, features, steps, elements, operations, components, and/or groups thereof.
The invention discloses a computer vision analysis method and a computer vision analysis device. This will be described in detail in the following examples. As will be described in fig. 1-6, the computer vision analysis method will be applied with a new compression/decompression method. The new compression/decompression method, in which demosaicing and other operations are performed after compression and decompression, is different from the existing methods described in the background.
In one camera, light received by the camera may be read out by one chip as raw Bayer data. As shown in fig. 1, raw Bayer data must be read from the camera by using either a parallel stream or a serial stream. Fig. 1 shows an example of a raw Bayer frame corresponding to raw Bayer data, according to some embodiments of the invention. As shown in fig. 1, the original Bayer frame has a shape of [2048,3840], and each pixel may have a pixel value (original pixel value) corresponding thereto. After capturing the frame, the raw pixel values may be read out by the chip in sequence.
For an electronic sensor array of a camera, the data read out from one focal plane may be in raster format, meaning that rows are read out sequentially. The shape of the input pixel value (original pixel value) is [2048,3840 ]. As will be described in FIG. 6, in some embodiments, [2048,3840] raw Bayer frames may be compressed into an array of integers of the shape [256,480,4 ]. The pixels also correspond to different colors, typically red, green and blue, but the color values are typically mosaiced as they pass through the sensor, so a given pixel corresponds to a given known color.
Intra-frame compression may be performed on the raw Bayer data stream. Fig. 2 illustrates a method of intra frame compression/decompression, according to some embodiments of the invention.
In 202, sets of raw pixel values may be read out sequentially from the camera head. In some embodiments, the raw pixel values may be read out sequentially as raw Bayer data. For example, the camera head may capture a raw Bayer frame that includes multiple sets of raw pixel values, where each set of raw pixel values corresponds to a 2D or 1D array of raw pixels of the raw Bayer frame.
At 204, intra-frame compression may be performed by compressing each set of raw pixel values into an integer using a compression kernel, so that the raw Bayer frame is finally compressed into an ICS frame. In some embodiments, the compression kernel may have Ncomp ICS channels, and Ncomp may be an integer no less than 1. The compression process is described in the frame strategies below.
In some embodiments, the elements in the compression core may be integers to facilitate application to hardware, such as a Field Programmable Gate Array (FPGA). For example, elements in a compression core may be binary, and the bit width of an element may be 12 bits, 10 bits, 8 bits, 6 bits, 4 bits, or 2 bits. Further, when the element is a binary of two bits, the element may be-1 or +1, or the element may be 0 or 1.
Frame strategy one
In the raw Bayer frame, a set of raw pixel values may correspond to the pixels in a 2D patch of the raw Bayer frame, and the compression kernel may be a 2D kernel, where the raw Bayer frame may be divided into a plurality of 2D patches. A 2D patch and the 2D kernel have the same XY size. For example, the 2D kernel may have a size [kx, ky, Ncomp], and a raw Bayer frame of shape [NX, NY] can be divided into [Nx, Ny] 2D patches, where Nx = NX/kx and Ny = NY/ky. The pixel values corresponding to the pixels in one patch, of shape [kx, ky, 1], may be compressed by the kernel of shape [kx, ky, Ncomp] into Ncomp numbers (Ncomp is a manually defined preset integer). Finally, the input raw pixel values of the frame can be compressed into COMP, where COMP is an integer array of size [Nx, Ny, Ncomp], and Ncomp may represent the number of ICS channels of COMP. The intra-frame compression process may be a 2D convolution operation, as shown in equation (1) below:
COMP[ix, iy, k] = sum_{i=0}^{kx-1} sum_{j=0}^{ky-1} kernel[i, j, k] * Bayer[ix*kx + i, iy*ky + j]    (1)
where the indices i and j run over kx and ky respectively, and the index k is a number from 0 to Ncomp-1.
Without considering the difference between the bit width of the input pixel values (the original pixel values are either 8 bits or 10 bits) and the bit width of the compressed number array (8 bits), the compression rate may be expressed as Ncomp/(kx*ky). Different compression rates can be achieved through various settings of [kx, ky, Ncomp]. For example, compression rates of 1/16, 1/16, 1/32, and 1/256 may be achieved by using the 2D kernels [16,16,16], [8,8,4], [16,16,8], and [16,16,1], respectively.
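As an illustration of frame strategy one, the following is a minimal Python/numpy sketch of equation (1): a raw Bayer frame is cut into non-overlapping [kx, ky] patches and each patch is collapsed into Ncomp integers. It is only a sketch of the described operation; the function name and the random test data are hypothetical.

```python
import numpy as np

def intra_frame_compress(bayer, kernel):
    """Patch-wise 2D compression (equation (1)).

    bayer  : [NX, NY] raw Bayer frame
    kernel : [kx, ky, Ncomp] compression kernel
    returns an [NX/kx, NY/ky, Ncomp] ICS frame (COMP)
    """
    NX, NY = bayer.shape
    kx, ky, ncomp = kernel.shape
    nx, ny = NX // kx, NY // ky
    # split the frame into non-overlapping [kx, ky] patches
    patches = bayer.reshape(nx, kx, ny, ky).transpose(0, 2, 1, 3)   # [nx, ny, kx, ky]
    # weighted sum of every patch against every ICS channel of the kernel
    return np.einsum('xyij,ijk->xyk', patches, kernel)

# Example: a 2048x3840 frame and an [8, 8, 3] binary (+/-1) kernel -> a [256, 480, 3] ICS frame
bayer = np.random.randint(0, 1024, size=(2048, 3840))
kernel = np.random.choice([-1, 1], size=(8, 8, 3))
ics = intra_frame_compress(bayer, kernel)
print(ics.shape)   # (256, 480, 3)
```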
Frame strategy two
In the raw Bayer frame, a set of raw pixel values may correspond to pixels in a 1D segment of the raw Bayer frame, and the compression kernel may be a 1D kernel in which the raw Bayer frame may be divided into a plurality of 1D segments. The 1D segment and the 1D kernel may have one and the same size. In some embodiments, each element in the compression core may be-1 or + 1; or each element in the compression core may be a 0 or a 1. For example, 16 incoming pixel values (1D original pixel array) may be combined into one number using a length-16 compression kernel [0,1,0,0,1,0, … 1 ]. In yet another example, 16 incoming pixel values may be combined into one number using a length-16 compression kernel [ -1,1, -1,1, -1,1, -1, … 1 ].
In particular, the sequence may be divided line by line. Various 1D compression kernels have been developed, including [128,1,4], [32,1,4 ]. Also, a combination of different convolution 1D kernels for different rows in the raw Bayer data can be used to control the overall compression ratio of one picture/one frame.
This way of splitting the pixel sequence uses a smaller buffer area than the way of splitting the 2D patch, since pixel values from different lines/segments do not need to be buffered, while incoming pixel values can be handled as segments.
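A corresponding sketch for frame strategy two, where each incoming line is split into 1D segments and every segment is reduced to a single number; the kernel values shown are only examples of the 0/1 form mentioned above.

```python
import numpy as np

def compress_1d_segments(bayer, kernel_1d):
    """Row-wise 1D compression: every length-L segment of a row becomes one number.

    bayer     : [NX, NY] raw Bayer frame, read out row by row
    kernel_1d : length-L 1D compression kernel (e.g. 0/1 or -1/+1 entries)
    """
    kernel_1d = np.asarray(kernel_1d)
    L = kernel_1d.size
    NX, NY = bayer.shape
    segments = bayer.reshape(NX, NY // L, L)      # split each row into 1D segments
    return segments @ kernel_1d                    # dot product per segment

row_kernel = np.tile([0, 1], 8)                    # a length-16 kernel [0,1,0,1,...]
bayer = np.random.randint(0, 1024, size=(2048, 3840))
out = compress_1d_segments(bayer, row_kernel)      # 16 incoming pixel values -> 1 number
print(out.shape)                                   # (2048, 240)
```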
As described above, each set of raw pixel values may be compressed into an integer, and the raw Bayer frame may be compressed into a large number of integers. During compression/decompression, a large number of integers may be stored or buffered.
At 206, decompression may be performed using a decompression kernel to determine a decompressed ICS frame. In some embodiments, decompression may perform a deconvolution on the integers of the ICS frame. In some embodiments, quantization and entropy coding may be performed after compression, meaning that entropy decoding, rescaling, and integer formation (rescaling and integer formation correspond to the quantization operation) may be performed prior to decompression.
It is noted that the intra-frame compression process is presented for illustration only and is not intended to limit the scope of the present invention. Many variations and modifications will be apparent to those of ordinary skill in the art in light of the teachings of this invention. However, such changes and modifications may not depart from the scope of the present invention. For example, the original pixel values may also be compressed using compression kernels having other bit widths.
Fig. 3 shows a convolution process, as described in 204, according to some embodiments of the present invention. As shown in fig. 3, an array of pixel values of size 4 x 4 may be compressed into an integer using a compression kernel. The compression kernel may also be 4 x 4 in size.
Although the process of intra-frame compression is simple, it provides a way to reduce the area of the necessary buffers, as illustrated in 204. To apply one convolutional 2D kernel of size [4,4,1] to one pixel patch of shape [4,4], it is not necessary to buffer all 16 pixels and then perform an element-wise multiplication and summation. Instead, as each pixel is read, it can be weighted by the corresponding kernel element row by row and the output values (the number array) accumulated in the buffer until the single convolution operation is completed. After each convolution operation, the buffered numbers may be written out to memory and the buffer cleared.
For a convolutional 2D kernel of size [kx, ky, Ncomp] applied to a raw Bayer frame of size [NX, NY], when the above method is implemented, the necessary buffer area is kx rows of original Bayer pixels.
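A sketch of this row-streaming scheme, assuming rows arrive one at a time from the readout: only a per-patch-column accumulator (rather than a full patch buffer) is kept while a band of kx rows streams through. The function and variable names are illustrative only.

```python
import numpy as np

def stream_compress(bayer, kernel):
    """Streaming version of the patch compression: accumulate row by row.

    bayer  : [NX, NY] frame whose rows are processed as they are read out
    kernel : [kx, ky, Ncomp] integer compression kernel
    """
    NX, NY = bayer.shape
    kx, ky, ncomp = kernel.shape
    nx, ny = NX // kx, NY // ky
    comp = np.zeros((nx, ny, ncomp), dtype=np.int64)
    acc = np.zeros((ny, ncomp), dtype=np.int64)    # one partial sum per patch column and channel
    for row in range(NX):
        i = row % kx                               # row index inside the current patch band
        line = bayer[row].reshape(ny, ky)          # split the incoming line into patch columns
        acc += line @ kernel[i]                    # weight by the i-th kernel row and accumulate
        if i == kx - 1:                            # band of kx rows finished: flush
            comp[row // kx] = acc
            acc[:] = 0
    return comp
```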
Fig. 4 illustrates an example of an intra-frame compression process using the frame strategy described in 204, according to some embodiments of the invention. As shown in fig. 4, the compression of one frame may be performed patch by patch. In some embodiments, the frame may be processed by hardware such as a Field Programmable Gate Array (FPGA), with the convolution kernel applied to each patch of pixels and with no overlap or gap when moving to the next patch, until the last patch position is reached. Patch 1 shown in fig. 4 represents a patch that has already been processed, while patch 2 represents the patch currently being processed.
Fig. 5 is an example of a compression kernel used in single-layer convolutional 2D compression of a raw Bayer frame, according to some embodiments of the invention. A single-layer convolutional 2D operation may be performed to compress an [NX, NY]-pixel raw Bayer frame (e.g., FIG. 1), and the shape of the compression kernel is [kx, ky, Ncomp] = [8,8,3]. The compression kernel is shown in FIG. 5, where the 3 planes each represent an [8,8] matrix. Each plane represents one component [8,8,i] of the compression kernel (i = 0,1,2), i.e., each plane represents one channel of the compression kernel.
FIG. 6 is the integer array of shape [256,480,3] obtained by compressing the original pixel values of FIG. 1 using the compression kernel of FIG. 5. The i-th plane shows the matrix [256,480,i], where the index i is 0,1,2.
In some embodiments, the compressed integer array may be reconstructed using a decompression kernel (or further using a QINN). Demosaicing and other operations, such as white balance adjustment, color mixing adjustment, and gamma correction, may then be performed after reconstruction. Since most existing computer vision analysis requires frames in RGB format, and frames in RGB format are obtained by a demosaicing operation (after decompression), the power and computation expended before computer vision analysis involve compression, decompression, demosaicing, and downsampling (computer vision analysis NNs typically have a small input XY plane size, for the reasons described in the background). A common approach to computer vision analysis is to use existing computer vision analysis NNs and to input frames after decompression, demosaicing, and the other operations. Another approach is to design one's own computer vision analysis NNs, which requires training those NNs with large amounts of newly labeled training data to reach the performance already achieved by existing NNs on the market, and this can be costly. Decompression may be viewed as an upsampling operation, in the sense that a downsampling operation must then be performed on the demosaiced frames before they are input into NNs for computer vision analysis. Therefore, to reduce power consumption and computation, the decompression and downsampling steps may be skipped.
FIG. 7 illustrates an ICS frame transformation method for computer vision analysis, according to some embodiments of the present invention.
At 702, one or more ICS frames can be read, where each ICS frame has a size [NX/kx, NY/ky, Ncomp]. In some embodiments, the one or more ICS frames may be determined as described in 204 of FIG. 2, where a compression kernel of size [kx, ky, Ncomp] may be used to compress each raw Bayer frame of size [NX, NY] into an ICS frame of size [NX/kx, NY/ky, Ncomp].
In 704, one or more ICS frames may be linearly transformed using parameters in a 2D array of size [ Ncomp,3], thereby determining one or more transformed ICS frames of size [ NX/kx, NY/ky,3 ].
In some embodiments, for each ICS frame, the linear transformation may add the pixel values at the same XY-plane position in the Ncomp ICS channels using the weighting factors in the three 1D vectors [Ncomp, j] of the 2D array, where j is 0, 1, and 2. A pixel of the transformed ICS frame can be determined as shown in equation (2) below:
RGB_trans[i_x, i_y, j] = sum_{i=0}^{Ncomp-1} trans_w[i, j] * COMP[i_x, i_y, i]    (2)
where j is the RGB channel index of RGB_trans, and j is a number from 0 to 2; i is the ICS channel index, and i is a number from 0 to Ncomp-1; COMP is the ICS frame and trans_w is the parameter array (the 2D array); i_x is the x-axis index of a pixel, and i_x is a number from 0 to NX/kx - 1; and i_y is the y-axis index of a pixel, and i_y is a number from 0 to NY/ky - 1.
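Equation (2) is a single matrix product per pixel, so the whole transform reduces to one tensor-matrix multiplication. A short sketch follows; the names and test shapes are hypothetical.

```python
import numpy as np

def transform_ics(comp, trans_w):
    """Apply equation (2): map the Ncomp ICS channels to 3 pseudo-RGB channels.

    comp    : [NX/kx, NY/ky, Ncomp] ICS frame (COMP)
    trans_w : [Ncomp, 3] parameter array
    returns RGB_trans of shape [NX/kx, NY/ky, 3]
    """
    return comp @ trans_w

ics = np.random.randn(256, 480, 3)          # an ICS frame with Ncomp = 3
trans_w = np.random.randn(3, 3)             # parameters learned as in FIG. 10
rgb_trans = transform_ics(ics, trans_w)     # fed to the CV network with no downsampling
print(rgb_trans.shape)                      # (256, 480, 3)
```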
An exemplary transformed ICS frame is shown in FIG. 8. As shown in FIG. 8, the 3 channels of the transformed ICS frame, representing RGB channels, may be used for computer vision analysis; further, the RGB format of the transformed ICS frame may be determined by stacking the 3 channels in FIG. 8. And in FIG. 9, a computer vision analysis result is determined by feeding the RGB format of the transformed ICS frame into Yolo_v3 (an existing computer vision analysis NN).
At 706, the one or more transformed ICS frames are output to a neural network for computer vision analysis. In some embodiments, one or more transformed ICS frames may be used directly for computer vision analysis without downsampling.
It is noted that NX, NY, kx, ky, NX/kx, NY/ky, and Ncomp are all positive integers, and Ncomp represents the number of ICS channels of the compression core.
In order for each transformed ICS frame to be sufficiently well representative of its corresponding RGB format for computer vision analysis, parameters in the 2D array may be determined based on sample training, as will be described in FIG. 10. FIG. 10 illustrates an exemplary training method for parameters in a 2D array, according to some embodiments of the invention.
In 1002, one or more first raw Bayer frame samples may be read out, where each first raw Bayer frame sample has a size [NX, NY].
In 1004, one or more demosaiced first raw Bayer frame samples may be determined by performing demosaicing on each first raw Bayer frame sample, where each demosaiced first raw Bayer frame sample has a size [NX, NY, 3].
A raw Bayer frame can be viewed as a tiling of several information group matrices of size [2,2], with no gaps or overlaps between the matrices, where each information group matrix contains 1 R, 1 B, and 2 G pixels: the number of G pixels is doubled by convention (the convention stems from the fact that the human eye is most sensitive to green among all colors). FIG. 11 shows two widely used formats of the information group matrix, according to some embodiments of the present invention.
There are many ways to perform demosaicing, and a basic one is based on spatial interpolation. First, for the R (or B) pixels, a sub-frame composed purely of pixels of that type can be extracted without changing the spatial relationship between those pixels; the X (Y) size of this sub-frame is half that of the original Bayer frame. This small monochrome image (sub-frame) is then upsampled using a conventional interpolation method to obtain an R (or B) monochrome image with the same XY size as the original Bayer frame. Second, the Gb and Gr pixels are used to replace the R and B pixels nearest to them, and after a fixed-area interpolation a monochrome green image with the same XY size as the original Bayer frame is obtained. Third, the three monochrome images (red, green, blue) are stacked together to form a visible RGB image with the same XY size as the original Bayer frame.
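A simplified sketch of the spatial-interpolation demosaic just described, assuming an RGGB layout and using nearest-neighbour upsampling in place of a more elaborate interpolation; it is meant only to make the three steps concrete.

```python
import numpy as np

def demosaic_simple(bayer):
    """Very simple RGGB demosaic: extract sub-frames, average the greens,
    and upsample each monochrome image back to the full [NX, NY] grid."""
    r  = bayer[0::2, 0::2].astype(float)   # [NX/2, NY/2] sub-frames
    gr = bayer[0::2, 1::2].astype(float)
    gb = bayer[1::2, 0::2].astype(float)
    b  = bayer[1::2, 1::2].astype(float)
    g  = (gr + gb) / 2.0
    up = lambda c: np.repeat(np.repeat(c, 2, axis=0), 2, axis=1)   # nearest-neighbour upsample
    return np.stack([up(r), up(g), up(b)], axis=-1)                # [NX, NY, 3] RGB image
```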
At 1006, downsampling may be performed on each demosaiced first raw Bayer frame sample to determine one or more transform training marker frames, wherein each transform training marker frame has a size [NX/kx, NY/ky, 3]. In some embodiments, the downsampling may be performed using an established downsampling method, or a custom downsampling method may be devised.
At 1008, intra-frame compression may be performed on each first raw Bayer frame sample using a compression kernel, thereby determining one or more first ICS frame samples. In some embodiments, intra-frame compression may be like the method in fig. 2.
At 1010, the one or more first ICS frame samples may be linearly transformed using initial parameters in a 2D array of size [Ncomp, 3] to determine one or more first transformed ICS frame samples of size [NX/kx, NY/ky, 3]. The linear transformation may be as in 704.
At 1012, initial parameters in the 2D array may be fine-tuned to minimize overall training loss between the ICS frame samples of the one or more first transforms and their corresponding one or more transform training marker frames, thereby determining parameters in the 2D array.
Each first transformed ICS frame sample corresponds to a transform training marker frame, and the training loss for the first transformed ICS frame sample is the mean square error between the first transformed ICS frame sample and its corresponding transform training marker frame, and the total training loss between one or more first transformed ICS frame samples and their corresponding one or more transform training marker frames is the sum of the individual training loss values.
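Because the transform in 1010 is linear and the loss in 1012 is a mean-square error, the adjustment of the 2D-array parameters can be written as an ordinary least-squares fit over all pixels of all training pairs. The sketch below assumes the ICS frame samples and transform training marker frames are already available as numpy arrays; an iterative gradient-based fit would serve equally well.

```python
import numpy as np

def fit_transform_params(ics_frames, marker_frames):
    """Find the [Ncomp, 3] array minimizing the total MSE between transformed
    ICS frame samples and their transform training marker frames.

    ics_frames    : list of [NX/kx, NY/ky, Ncomp] first ICS frame samples
    marker_frames : list of [NX/kx, NY/ky, 3] transform training marker frames
    """
    X = np.concatenate([f.reshape(-1, f.shape[-1]) for f in ics_frames])     # [P, Ncomp]
    Y = np.concatenate([m.reshape(-1, 3) for m in marker_frames])            # [P, 3]
    trans_w, *_ = np.linalg.lstsq(X, Y, rcond=None)                          # [Ncomp, 3]
    return trans_w
```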
The illustration in fig. 10 is provided to enable any person skilled in the art to make and use the invention. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present invention. For example, the processes in FIG. 10 may change order. In particular, steps 1004 and 1006 may be performed after step 1008 or step 1010.
In some embodiments, the compression kernel may be determined based on sample training. FIG. 12 illustrates an exemplary pre-training method of compressing kernels, according to some embodiments of the invention.
In 1202, one or more second raw Bayer frame samples may be read out, where each second raw Bayer frame sample has a size [NX, NY]. In some embodiments, the training samples in FIG. 12 and the training samples in FIG. 10 may overlap. For example, at least one of the one or more first raw Bayer frame samples may also appear among the one or more second raw Bayer frame samples, or the one or more first raw Bayer frame samples may be the same as the one or more second raw Bayer frame samples. In some embodiments, the training samples in FIG. 12 and the training samples in FIG. 10 do not overlap; for example, none of the first raw Bayer frame samples appear among the second raw Bayer frame samples.
At 1204, for each second raw Bayer frame sample, an ICS training mark frame corresponding thereto may be determined by performing a linear transformation on and combining R, B, Gb, and Gr pixels in the second raw Bayer frame sample, where each ICS training mark frame has a size [ NX ', NY', Nlabel ], and NX, NY, NX ', NY', and Nlabel are positive integers.
It is worth noting that the operation on the second raw Bayer frame sample (performing a linear transformation on the R, B, Gb, and Gr pixels and combining them) is a high-level generalization of demosaicing. In some cases, demosaicing may be used to determine one or more ICS training marker frames. For example, for a raw Bayer frame, the following steps may be used to determine the ICS training marker frame corresponding to it: first, the R, B, Gr, and Gb pixels can be extracted from the raw Bayer frame as four 2D arrays, each of size [NX/2, NY/2]; second, Gr and Gb may be combined into G = (Gr + Gb)/2 of size [NX/2, NY/2]; third, a typical linear RGB-to-YUV transformation may be applied to the stacked RGB 2D array of size [NX/2, NY/2, 3], and the transformed stacked 2D array of size [NX/2, NY/2, 3] is the corresponding ICS training marker frame. Finally, the size of the decompression kernel may be [kx/2, ky/2, Ncomp, 3].
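A sketch of this marker-frame construction for an RGGB frame: extract the four sub-planes, average the greens, and apply a linear RGB-to-YUV map. The BT.601-style coefficients below are used purely as an example of a "typical" transform.

```python
import numpy as np

def make_ics_training_marker(bayer):
    """Build an [NX/2, NY/2, 3] ICS training marker frame from an RGGB raw Bayer frame."""
    r  = bayer[0::2, 0::2].astype(float)
    gr = bayer[0::2, 1::2].astype(float)
    gb = bayer[1::2, 0::2].astype(float)
    b  = bayer[1::2, 1::2].astype(float)
    g  = (gr + gb) / 2.0                                  # G = (Gr + Gb) / 2
    rgb = np.stack([r, g, b], axis=-1)                    # stacked RGB 2D array
    rgb2yuv = np.array([[ 0.299,  0.587,  0.114],         # example BT.601-style weights
                        [-0.169, -0.331,  0.500],
                        [ 0.500, -0.419, -0.081]])
    return rgb @ rgb2yuv.T                                # [NX/2, NY/2, 3] YUV marker frame
```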
At 1206, intra-frame compression may be performed on each second raw Bayer frame sample using an initial compression kernel of size [ kx, ky, Ncomp ] to determine one or more second ICS frame samples of size [ NX/kx, NY/ky, Ncomp ].
In 1208, each second ICS frame sample may be decompressed using an initial decompression kernel of size [kx', ky', Ncomp, Nlabel] to determine one or more second decompressed ICS frame samples of size [NX', NY', Nlabel], where kx' = NX'*kx/NX and ky' = NY'*ky/NY. In some embodiments, decompression may be a deconvolution operation using the decompression kernel. One or more second decompressed ICS frame samples with the same size ([NX', NY', Nlabel]) as the one or more ICS training marker frames may be determined using a decompression kernel of size [kx', ky', Ncomp, Nlabel].
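The decompression in 1208 can be sketched as a non-overlapping transposed convolution: every [Ncomp] ICS pixel is expanded by the [kx', ky', Ncomp, Nlabel] kernel into a [kx', ky', Nlabel] block. The names below are illustrative only.

```python
import numpy as np

def decompress(comp, dkernel):
    """Expand an [NX/kx, NY/ky, Ncomp] ICS frame into an [NX', NY', Nlabel] frame.

    dkernel : [kx', ky', Ncomp, Nlabel] decompression kernel (non-overlapping blocks)
    """
    nx, ny, ncomp = comp.shape
    kxp, kyp, _, nlabel = dkernel.shape
    blocks = np.einsum('xyc,ijcl->xiyjl', comp, dkernel)   # [nx, kx', ny, ky', Nlabel]
    return blocks.reshape(nx * kxp, ny * kyp, nlabel)
```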
At 1210, an initial compression kernel may be trained based on one or more ICS training marker frames to determine a compression kernel. The training process may be as described in fig. 13.
The illustration in fig. 12 is provided to enable any person skilled in the art to make and use the invention. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present invention. For example, the processes in FIG. 12 may change order. In particular, step 1204 may be performed after step 1206 or step 1208.
FIG. 13 illustrates an exemplary training method for a compression kernel, according to some embodiments of the invention.
In 1302, parameters in the initial compression kernel can be fine-tuned based on machine learning to minimize an overall loss of quality between the one or more second decompressed ICS frame samples and the one or more ICS training marker frames to determine a floating point number compression kernel. Each second decompressed ICS frame sample corresponds to an ICS training marker frame, and the quality loss of one second decompressed ICS frame sample is the mean square error between the second decompressed ICS frame sample and its corresponding ICS training marker frame, and the total quality loss between one or more second decompressed ICS frame samples and its corresponding ICS training marker frame or frames is the sum of the individual quality loss values.
At 1304, a compression kernel may be determined by integer-quantizing parameters in the floating-point compression kernel.
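One simple way to realize the integer quantization in 1304 is a symmetric scale-and-round of the trained floating-point kernel to a chosen bit width; the following is a sketch of one possible scheme, not necessarily the one used in practice.

```python
import numpy as np

def quantize_kernel(float_kernel, bit_width=8):
    """Map a trained floating-point compression kernel onto signed integers."""
    qmax = 2 ** (bit_width - 1) - 1
    scale = qmax / np.max(np.abs(float_kernel))
    return np.clip(np.round(float_kernel * scale), -qmax - 1, qmax).astype(np.int32)
```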
As shown in fig. 14, the decompression core may be determined based on further training. FIG. 14 is an exemplary training method for decompression cores, according to some embodiments of the invention.
At 1402, intra-frame compression may be performed on each second raw Bayer frame sample using a compression kernel, thereby determining one or more second intermediate ICS frame samples. The compression kernel originates at step 1304.
At 1404, decompression may be performed on each second intermediate ICS frame sample using the initial decompression kernel to determine one or more second intermediate decompressed ICS frame samples. And in 1406, parameters in the initial decompression kernel can be fine-tuned based on machine learning to minimize the total quality loss between the one or more second intermediate decompressed frame samples and their corresponding one or more ICS training marker frames, thereby determining the decompression kernel.
Each second intermediately decompressed ICS frame sample corresponds to an ICS training marker frame, the quality loss of one second intermediately decompressed ICS frame sample is the mean square error between the second intermediately decompressed ICS frame sample and its corresponding ICS training marker frame, and the total quality loss between the one or more second intermediately decompressed ICS frame samples and the one or more ICS training marker frames is the sum of the individual quality loss values.
In some embodiments, a quality improvement neural network (QINN) may be applied to reduce the quality loss introduced during compression and decompression. FIG. 15 shows another training method for the decompression kernel, according to some embodiments of the present invention.
At 1502, intra-frame compression may be performed on each first raw Bayer frame sample using a compression kernel to determine one or more second intermediate ICS frame samples. The compression kernel here is the one determined at step 1304.
At 1504, decompression may be performed on each second intermediate ICS frame sample using the initial decompression kernel to determine one or more second intermediate decompressed ICS frame samples. At 1506, each second intermediate decompressed ICS frame sample may be input into an initial QINN to determine one or more second reconstructed frame samples. At 1508, parameters in the initial decompression kernel and the initial QINN may be fine-tuned based on machine learning to minimize the overall quality loss between the one or more second reconstructed frame samples and their corresponding one or more ICS training marker frames, thereby determining the decompression kernel and the QINN.
Each second reconstructed ICS frame sample corresponds to one ICS training marker frame; the quality loss of a second reconstructed ICS frame sample is the mean square error between that sample and its corresponding ICS training marker frame; and the overall quality loss between the one or more second reconstructed ICS frame samples and the one or more ICS training marker frames is the sum of the individual quality loss values.
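The disclosure does not define the QINN architecture; as an assumed example only, a small residual convolutional network of the following form could serve as the quality-improvement stage, with its parameters optimized jointly with the decompression kernel as described at 1508:

import torch.nn as nn

class QINN(nn.Module):
    # Assumed, illustrative quality-improvement network: a shallow residual CNN
    # that refines each decompressed ICS frame sample.
    def __init__(self, channels=3, width=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, width, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(width, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        # Output = decompressed frame plus a learned correction.
        return x + self.net(x)

In such a setup, both the decompression parameters and the QINN's parameters would be passed to the optimizer so that step 1508 can fine-tune them together.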
FIG. 16 illustrates an ICS frame transformation apparatus for computer vision analysis. As shown in fig. 16, the apparatus may include a readout module 1610, a processor 1620, and an output port 1630.
The readout module 1610 may be configured to read out one or more raw Bayer frames, each of size [NX, NY]. The processor 1620 may be configured to determine one or more transformed ICS frames by linearly transforming one or more ICS frames using parameters in a 2D array of size [Ncomp, 3]. The output port 1630 may be configured to output the one or more transformed ICS frames to a neural network for computer vision analysis.
In some embodiments, intra-frame compression may be performed, using a compression kernel, on one or more raw Bayer frames captured by one or more camera heads to determine one or more ICS frames. In some embodiments, each raw Bayer frame is of size [NX, NY], and the compression kernel is of size [kx, ky, Ncomp]. NX, NY, kx, ky, NX/kx, NY/ky, and Ncomp are all positive integers, and Ncomp represents the number of ICS channels of the compression kernel.
In some embodiments, for each ICS frame, the processor determines its corresponding transformed ICS frame by adding together the pixel values at the same XY-plane position in the Ncomp ICS channels using the weighting factors in the three 1D vectors [Ncomp, j] of the 2D array, where j = 0, 1, and 2.
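A minimal NumPy sketch of this per-pixel transform is shown below; the array contents are random placeholders and the sizes are assumptions, but the operation is the weighted sum over the Ncomp ICS channels described above:

import numpy as np

Ncomp = 8
ics_frame = np.random.rand(480, 270, Ncomp)    # an ICS frame of size [NX/kx, NY/ky, Ncomp]
weights = np.random.rand(Ncomp, 3)             # the 2D array [Ncomp, 3]; its columns are the
                                               # three 1D vectors [Ncomp, j], j = 0, 1, 2

# For every XY position, add the Ncomp channel values weighted by column j to obtain
# output channel j, giving a transformed ICS frame of size [NX/kx, NY/ky, 3].
transformed = ics_frame @ weights
print(transformed.shape)                       # (480, 270, 3)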
In some embodiments, the parameters in the 2D array may be determined based on sample training, and the sample training process may be the same as that described in FIG. 10. In some embodiments, the intra-frame compression process may be the same as that described in FIGS. 1-6; likewise, the compression kernel is determined based on sample training, and the sample training process may be the same as that described in FIGS. 12-15.
Having thus described the basic concepts, it may become apparent to those skilled in the art upon reading this detailed disclosure that the foregoing detailed disclosure is intended to be presented by way of example only, and not by way of limitation. Various alterations, improvements, and modifications will occur to those skilled in the art, though not expressly stated herein. Such alterations, improvements, and modifications are intended to be suggested by this disclosure, and are within the spirit and scope of the exemplary embodiments of the disclosure.
Furthermore, certain terminology has been used to describe embodiments of the invention. For example, the terms "one embodiment," "an embodiment," and/or "some embodiments" mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various portions of this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures or characteristics may be combined together in one or more embodiments of the invention.
Moreover, those skilled in the art will understand that various aspects of the invention may be illustrated and described in any of a number of patentable classes or contexts, including any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof. Accordingly, various aspects of the present invention may be implemented entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.), or in a combination of software and hardware, which implementations may be generally referred to herein as "blocks," "modules," "engines," "units," "components," or "systems." Furthermore, aspects of the present invention may take the form of a computer processing product embodied in one or more computer readable media having computer readable processing code embodied therein.
Furthermore, the order in which the elements or sequences of processes are described, or the use of numbers, letters, or other designations therefor, is not intended to limit the claimed processes and methods to any order except as may be specified in the claims. While the foregoing disclosure discusses, by way of various examples, what are presently considered to be various useful embodiments of the present invention, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover modifications and equivalents within the spirit and scope of the disclosed embodiments. For example, although implementations of the various components described above may be embodied in a hardware device, they may also be implemented as a software-only solution, e.g., installation on an existing processing device or mobile device.
Also, it should be appreciated that in the foregoing description of embodiments of the invention, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed subject matter requires more features than are expressly recited in each claim. Rather, inventive embodiments lie in less than all features of a single foregoing disclosed embodiment.

Claims (20)

1. An ICS frame transformation method for computer vision analysis, comprising:
reading one or more ICS frames of size [ NX/kx, NY/ky, Ncomp ];
linearly transforming one or more ICS frames using parameters in a 2D array of size [ Ncomp,3], determining one or more transformed ICS frames of size [ NX/kx, NY/ky,3 ];
outputting the one or more transformed ICS frames to a neural network for computer vision analysis;
wherein intra-frame compression is performed on one or more raw Bayer frames captured by one or more camera heads by using a compression kernel to determine one or more ICS frames;
the size of each original Bayer frame is [ NX, NY ], and the size of a compression kernel is [ kx, ky, Ncomp ];
NX, NY, kx, ky, NX/kx, NY/ky, and Ncomp are all positive integers, and Ncomp represents the number of ICS channels of the compression kernel.
2. The method of claim 1, wherein determining one or more transformed ICS frames of size [ NX/kx, NY/ky,3] by linearly transforming one or more ICS frames using parameters in a 2D array of size [ Ncomp,3] comprises:
for each ICS frame, the pixel values at the same XY plane position in Ncomp ICS channels are added using weighting factors in 3 1D vectors [ Ncomp, j ] of the 2D array, where j is 0,1, and 2.
3. The method of claim 1 or 2, wherein the parameters in the 2D array are determined based on sample training, the sample training comprising:
reading out one or more first raw Bayer frame samples, wherein the size of each first raw Bayer frame sample is [ NX, NY ];
determining one or more demosaiced first raw Bayer frame samples by performing demosaicing on each first raw Bayer frame sample, wherein each demosaiced first raw Bayer frame sample is of size [NX, NY, 3];
determining one or more transformed training marker frames by performing downsampling on each demosaiced first raw Bayer frame sample, wherein each transformed training marker frame has a size [NX/kx, NY/ky, 3];
determining one or more first ICS frame samples by performing intra-frame compression on each first raw Bayer frame sample using a compression kernel;
linearly transforming one or more first ICS frame samples using initial parameters in a 2D array of size [ Ncomp,3], thereby determining one or more first transformed ICS frame samples of size [ NX/kx, NY/ky,3 ];
fine-tuning initial parameters in the 2D array, and determining parameters in the 2D array by minimizing the overall training loss between the one or more first transformed ICS frame samples and their corresponding one or more transformed training marker frames.
4. The method of claim 3, wherein each first transformed ICS frame sample corresponds to a transformed training marker frame, and a training loss for a first transformed ICS frame sample is a mean square error between the first transformed ICS frame sample and its corresponding transformed training marker frame, and an overall training loss between one or more first transformed ICS frame samples and their corresponding transformed training marker frame or frames is a sum of individual training loss values.
5. The method of claim 1, wherein the process of intra-frame compression comprises:
for each raw Bayer frame, each group of pixel values in the raw Bayer frame is compressed into an integer using a compression kernel, wherein the pixels in each raw Bayer frame are divided into a plurality of groups, and each group of pixels corresponds to one 2D or 1D raw pixel array in the raw Bayer frame.
6. The method of claim 1, 3 or 5, wherein the compression kernel is determined based on sample training, the sample training comprising:
reading out one or more second original Bayer frame samples, wherein each second original Bayer frame sample has a size [ NX, NY ];
for each second raw Bayer frame sample, determining an ICS training marker frame corresponding thereto by performing a linear transformation on the R, B, Gb, and Gr pixels and combining them together, wherein the size of each ICS training marker frame is [NX', NY', Nlabel];
determining one or more second ICS frame samples of size [ NX/kx, NY/ky, Ncomp ] by performing intra-frame compression on each second raw Bayer frame sample using an initial compression kernel of size [ kx, ky, Ncomp ];
determining one or more second decompressed ICS frame samples of size [NX', NY', Nlabel] by performing decompression on each second ICS frame sample using an initial decompression kernel of size [kx', ky', Ncomp, Nlabel], wherein kx' = NX'×kx/NX and ky' = NY'×ky/NY;
training an initial compression kernel based on the one or more ICS training marker frames, thereby determining a compression kernel;
wherein NX ', NY', kx ', ky', and Nlabel are all positive integers.
7. The method of claim 6, wherein training the initial compression kernel based on the one or more ICS training marker frames to determine the compression kernel comprises:
fine-tuning parameters in the initial compression kernel based on machine learning, determining a floating point number compression kernel by minimizing overall quality loss between one or more second decompressed ICS frame samples and their corresponding one or more ICS training marker frames;
the compression kernel is determined by integer-quantizing parameters in the floating-point compression kernel.
8. The method of claim 7, wherein each second decompressed ICS frame sample corresponds to an ICS training marker frame, and the quality loss of the second decompressed ICS frame sample is a mean square error between the second decompressed ICS frame sample and its corresponding ICS training marker frame, and the total quality loss between the one or more second decompressed ICS frame samples and the one or more ICS training marker frames is a sum of individual quality loss values.
9. The method of claim 7, wherein the method further comprises:
performing intra-frame compression on each second raw Bayer frame sample using a compression kernel, thereby determining one or more second intermediate ICS frame samples;
performing decompression on each second intermediate ICS frame sample using an initial decompression core, thereby determining one or more second intermediate decompressed ICS frame samples;
fine-tuning parameters in the initial decompression kernel based on machine learning, the decompression kernel being determined by minimizing a loss of quality between the one or more second intermediate decompressed ICS frame samples and their corresponding one or more ICS training marker frames.
10. The method of claim 9, wherein each second intermediate decompressed ICS frame sample corresponds to an ICS training marker frame, and the quality loss of a second intermediate decompressed ICS frame sample is a mean square error between the second intermediate decompressed ICS frame sample and its corresponding ICS training marker frame, and the total quality loss between one or more second intermediate decompressed ICS frame samples and one or more ICS training marker frames is a sum of individual quality loss values.
11. The method of claim 7, wherein the method further comprises:
performing intra-frame compression on each first raw Bayer frame sample using a compression kernel, thereby determining one or more second intermediate ICS frame samples;
performing decompression on each second intermediate ICS frame sample using an initial decompression core, thereby determining one or more second intermediate decompressed ICS frame samples;
determining one or more second reconstructed frame samples by inputting each second intermediate decompressed ICS frame sample into the initial QINN;
parameters in the initial decompression kernel and the initial QINN are fine-tuned based on machine learning, the decompression kernel and the QINN being determined by minimizing the overall loss of quality between the one or more second reconstructed frame samples and the one or more ICS training marker frames.
12. The method of claim 11, wherein each second reconstructed ICS frame sample corresponds to an ICS training marker frame, the quality loss for the second reconstructed ICS frame sample is a mean square error between the second reconstructed ICS frame sample and its corresponding ICS training marker frame, and the total quality loss between the one or more second reconstructed ICS frame samples and the one or more ICS training marker frames is a sum of individual quality loss values.
13. An ICS frame transformation apparatus for computer vision analysis, comprising:
a readout module configured to readout one or more raw Bayer frames, and each raw Bayer frame is of size [ NX, NY ];
a processor configured to determine one or more transformed ICS frames by linearly transforming the one or more ICS frames using parameters in a 2D array of size [ Ncomp,3 ];
an output port configured to output the one or more transformed ICS frames to a neural network for computer vision analysis;
wherein intra-frame compression is performed on the one or more raw Bayer frames using a compression kernel, thereby determining one or more ICS frames, wherein the one or more raw Bayer frames are captured by the one or more camera heads;
wherein the size of each raw Bayer frame is [ NX, NY ], and the size of the compression kernel is [ kx, ky, Ncomp ];
where NX, NY, kx, ky, NX/kx, NY/ky, and Ncomp are positive integers, and Ncomp represents the number of ICS channels of the compression kernel.
14. The apparatus of claim 13, wherein for each ICS frame, the processor adds together pixel values at the same XY plane position in Ncomp ICS channels using weighting factors in 3 1D vectors [ Ncomp, j ] of the 2D array, where j is 0,1, and 2, to determine a transformed ICS frame corresponding thereto.
15. The apparatus of claim 13 or 14, wherein the parameters in the 2D array are determined based on sample training, the sample training comprising:
reading out one or more first raw Bayer frame samples, wherein each first raw Bayer frame sample has a size [ NX, NY ];
determining one or more demosaiced first raw Bayer frame samples by performing demosaicing on each first raw Bayer frame sample, wherein each demosaiced first raw Bayer frame sample is of size [NX, NY, 3];
determining one or more transformed training marker frames by performing downsampling on each demosaiced first raw Bayer frame sample, wherein each transformed training marker frame has a size [NX/kx, NY/ky, 3];
determining one or more first ICS frame samples by performing intra-frame compression on each first raw Bayer frame sample using a compression kernel;
determining one or more first transformed ICS-frame samples by linearly transforming the one or more first ICS-frame samples using initial parameters in a 2D array of size [ Ncomp,3 ];
fine-tuning initial parameters in the 2D array, and determining parameters in the 2D array by minimizing the overall training loss between the one or more first transformed ICS frame samples and the one or more transformed training marker frames to which they correspond.
16. The apparatus of claim 15, wherein each first transformed ICS frame sample corresponds to a transformed training marker frame, and the training loss of the first transformed ICS frame sample is a mean square error between the first ICS frame sample and its corresponding transformed training marker frame, and the total training loss between the one or more first transformed ICS frame samples and their corresponding one or more transformed training marker frames is a sum of individual training loss values.
16. The apparatus of claim 15, wherein each first transformed ICS frame sample corresponds to a transformed training marker frame, and the training loss of the first transformed ICS frame sample is a mean square error between the first transformed ICS frame sample and its corresponding transformed training marker frame, and the total training loss between the one or more first transformed ICS frame samples and their corresponding one or more transformed training marker frames is a sum of individual training loss values.
for each raw Bayer frame, each group of pixel values in the raw Bayer frame is compressed into an integer using a compression kernel, wherein the pixels in each raw Bayer frame are divided into a plurality of groups, and each group of pixels corresponds to a 2D or 1D raw pixel array in the raw Bayer frame.
18. The apparatus of claim 13, 15 or 17, wherein the compression kernel is determined based on sample training, the sample training comprising:
reading out one or more second original Bayer frame samples, wherein each second original Bayer frame sample has a size [ NX, NY ];
for each second raw Bayer frame sample, performing linear transformation on R, B, Gb and Gr pixels in the second raw Bayer frame sample and combining the pixels together to determine an ICS training mark frame corresponding to the pixel, wherein the size of each ICS training mark frame is [ NX ', NY', Nlabel ];
performing intra-frame compression on each second raw Bayer frame sample using an initial compression kernel, the initial compression kernel having a size [ kx, ky, Ncomp ], thereby determining one or more second ICS frame samples having a size [ NX/kx, NY/ky, Ncomp ];
performing decompression on each second ICS frame sample using an initial decompression kernel of size [kx', ky', Ncomp, Nlabel], thereby determining one or more second decompressed ICS frame samples, wherein kx' = NX'×kx/NX and ky' = NY'×ky/NY;
training an initial compression kernel based on the one or more ICS training marker frames, thereby determining a compression kernel;
wherein NX ', NY', kx ', ky', and Nlabel are positive integers.
19. The apparatus of claim 18, wherein training the initial compression kernel based on the one or more ICS training marker frames to determine the compression kernel comprises:
fine-tuning parameters in the initial compression kernel based on machine learning, determining a floating point compression kernel by minimizing overall quality loss between one or more second decompressed ICS frame samples and their corresponding one or more ICS training marker frames;
the compression kernel is determined by integer-quantizing parameters in the floating-point compression kernel.
20. The apparatus of claim 19, wherein each second decompressed ICS frame sample corresponds to an ICS training marker frame, the quality loss for the second decompressed ICS frame sample is a mean square error between the second decompressed ICS frame sample and its corresponding ICS training marker frame, and an overall quality loss between the one or more second decompressed ICS frame samples and the one or more ICS training marker frames is a sum of individual quality loss values.
CN201980066175.3A 2019-11-21 2019-11-21 ICS frame transformation method and device for computer vision analysis Active CN113170160B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/120031 WO2021097771A1 (en) 2019-11-21 2019-11-21 Ics-frame transformation method and apparatus for cv analysis

Publications (2)

Publication Number Publication Date
CN113170160A true CN113170160A (en) 2021-07-23
CN113170160B CN113170160B (en) 2022-06-14

Family

ID=75980323

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980066175.3A Active CN113170160B (en) 2019-11-21 2019-11-21 ICS frame transformation method and device for computer vision analysis

Country Status (2)

Country Link
CN (1) CN113170160B (en)
WO (1) WO2021097771A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060133502A1 (en) * 2004-11-30 2006-06-22 Yung-Lyul Lee Image down-sampling transcoding method and device
CN104519361A (en) * 2014-12-12 2015-04-15 天津大学 Video steganography analysis method based on space-time domain local binary pattern
CN105791854A (en) * 2016-03-09 2016-07-20 中国人民武装警察部队工程大学 Singular value modification video steganographic algorithm based on combination with improved matrix coding
CN107197297A (en) * 2017-06-14 2017-09-22 中国科学院信息工程研究所 A kind of video steganalysis method of the detection based on DCT coefficient steganography
CN109635791A (en) * 2019-01-28 2019-04-16 深圳大学 A kind of video evidence collecting method based on deep learning
US20190313114A1 (en) * 2018-04-06 2019-10-10 Qatar University System of video steganalysis and a method of using the same
CN110457996A (en) * 2019-06-26 2019-11-15 广东外语外贸大学南国商学院 Moving Objects in Video Sequences based on VGG-11 convolutional neural networks distorts evidence collecting method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106101713B (en) * 2016-07-06 2018-10-09 武汉大学 A kind of video steganalysis method based on the optimal calibration of window

Also Published As

Publication number Publication date
WO2021097771A1 (en) 2021-05-27
CN113170160B (en) 2022-06-14

Similar Documents

Publication Publication Date Title
US20210012537A1 (en) Loop filter apparatus and image decoding apparatus
US10582168B2 (en) Green image data processing
CN109842799B (en) Intra-frame prediction method and device of color components and computer equipment
WO2012164896A1 (en) Image processing device, image processing method, and digital camera
WO2021228513A1 (en) Learned downsampling based cnn filter for image and video coding using learned downsampling feature
CN102143322A (en) Image capturing apparatus and control method thereof
CN112425158B (en) Monitoring camera system and method for reducing power consumption of monitoring camera system
WO2022061879A1 (en) Image processing method, apparatus and system, and computer-readable storage medium
JP2023528641A (en) Adaptive image enhancement using inter-channel correlation information
EP4094443A1 (en) Global skip connection based cnn filter for image and video coding
WO2020007990A1 (en) Compression of a raw image
US7194129B1 (en) Method and system for color space conversion of patterned color images
US20130027584A1 (en) Method and apparatus for frame rotation in the jpeg compressed domain
CN113170160B (en) ICS frame transformation method and device for computer vision analysis
CN114788280A (en) Video coding and decoding method and device
Chung et al. Novel and Optimal Luma Modification-Based Chroma Downsampling for Bayer Color Filter Array Images
CN110572652B (en) Static image processing method and device
Korhonen Improving image fidelity by luma-assisted chroma subsampling
US20040207737A1 (en) Image compression apparatus and image processing system
CN108989820B (en) Data compression method and device adopting respective corresponding chroma sampling formats at all stages
CN113196779B (en) Method and device for compressing video clip
CN106296754B (en) Show data compression method and display data processing system
US8630487B2 (en) Image processing apparatus and method
US20220256127A1 (en) Image encoding apparatus, method for controlling the same, and non-transitory computer-readable storage medium
CN117670793A (en) Rail damage detection method and device based on edge detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right
Denomination of invention: ICS frame transformation method and device for computer vision analysis
Effective date of registration: 20230817
Granted publication date: 20220614
Pledgee: Jiangsu Jiangyin Rural Commercial Bank Co.,Ltd. Wuxi Branch
Pledgor: Wuxi ankedi Intelligent Technology Co.,Ltd.
Registration number: Y2023980052623