CN117897956A - Context-based event camera lossless image compression
- Publication number: CN117897956A (application CN202280059339.1A)
- Authority: CN (China)
- Legal status: Pending
Classifications
- H04N19/60: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
- H03M7/30: Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/40: Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
Abstract
The invention provides a method for efficiently storing and encoding an event stream (3) from an event camera (1): the event stream (3) is converted into event frames (8), and either a combined event frame (11) is generated from a plurality of event frames (8) to be processed as an image, or the spatial information (5) and polarity information (6) from the plurality of event frames (8) are stored in separately optimized data structures, the spatial information (5) being stored in a single event map image (15) and the polarity information (6) being merged into a polarity vector (16). The stored event data is encoded using a context-based lossless encoding method, and only the pixel locations at which at least one event (4) occurs are encoded, according to a category index (20) based on the number of detected events and an event frame index (23) representing the positions of the events in the event frame stack.
Description
Technical Field
The present invention relates generally to the field of digital image processing, and more particularly to the encoding of event data captured by event cameras.
Background
An event camera, also known as a neuromorphic camera, silicon retina or dynamic vision sensor, is an imaging sensor that responds to local brightness changes. An event camera does not capture images using a shutter as conventional (frame) cameras do. Instead, each pixel within the event camera operates independently and asynchronously, reporting a change in brightness as it occurs and otherwise remaining silent.
During operation, each pixel stores a reference brightness level and continually compares it to the current brightness level. If the difference in brightness exceeds a threshold, the pixel resets its reference level and generates an event: a discrete data packet containing the pixel address at which the brightness changed, a timestamp, and the polarity (increase or decrease) of the instantaneous illuminance measurement. The event camera thus outputs an asynchronous event stream triggered by scene lighting changes.
It follows that the event stream delivered by the pixel event sensors contains only information about changing objects, and no information about uniform surfaces or a stationary background.
Such an event camera has significant advantages over conventional (color) cameras, namely high dynamic range and low latency. In particular, event cameras can offer very high temporal resolution, because asynchronous events can be triggered at a minimum timestamp distance of 10⁻⁶ seconds, which corresponds to up to 10⁶ frames per second (fps). With these advantages, event cameras can be applied efficiently in technical fields such as object recognition, autonomous vehicles and robotics.
In some cases, such as fast-moving textured scenes, millions of events are generated per second. To handle such busy scenarios, existing event processing may require a significant amount of parallel computation.
One existing event stream coding scheme is a pulse coding method, the idea of which comes from the state-of-the-art schemes proposed for coding large amounts of data, namely compression using the spatial and temporal characteristics of the event location information. This method includes an adaptive macro-cube partition structure with an address-priority mode and a time-priority mode: asynchronous events are packed into macro-cubes that encode the spatial, temporal and polarity information, respectively. This approach therefore encodes all of the original events that occur within a time interval; however, not all events are necessary for some applications, for which a lossy scheme is more desirable. Furthermore, increasing the distance between events and accumulating event information may degrade the performance of the pulse coding method, which is undesirable. In other words, the scheme is only suited to processing the original asynchronous data, not to encoding event frames.
Another encoding method is a time-aggregation-based lossless video encoding algorithm for neuromorphic vision sensor data, i.e., encoding a sequence of events by accumulating events over a time interval. The sequence of events generates two separate frames, one associated with an increase in luminous intensity and the other with a decrease, i.e., based on the positive and negative polarities of the events, respectively. The two frames are then concatenated into a "superframe" consisting of a "positive" frame on the left and a "negative" frame on the right. Finally, the superframe is encoded using a High Efficiency Video Coding (HEVC) codec. For high-resolution sensors (e.g., 640×480, 1280×720), the captured events are sparsely distributed in the frame, and HEVC cannot encode superframes efficiently, because video codecs are designed to encode color-format information, a type of information quite different from event counts. Thus, one major drawback of this approach is that its performance depends on the performance of the selected video codec. Another disadvantage is that it does not provide a lossless scheme when all events and all of their corresponding information have to be encoded by the event data codec.
Given the ever-higher frame rates that event cameras can achieve, and hence the very large raw representations of event sequences, there is an increasing need for a method or system that efficiently encodes the raw event data received from such cameras, in particular a scheme suitable for lossless event data compression.
Disclosure of Invention
In light of the foregoing, described herein are systems and methods for efficient data processing for event cameras. Such systems and methods accumulate events received from an event camera over a period of time in the form of an event stream, convert the asynchronous events into event frames, and encode the event frames efficiently and losslessly for further processing by other applications.
The above and other objects are achieved by the features of the independent claims, which disclose a new context-based lossless image compression codec for event camera sensors and a new lossless coding framework for event data, in which spatial information is encoded into packets of a plurality of event frames and polarity information is encoded by traversing the spatial information. The disclosed method provides: a new event data representation in which spatial information and polarity information are encoded separately; a new strategy for encoding event spatial information using a binary map indicating the locations in the event frames where at least one event occurred, the number of events, and a corresponding event frame index; a new event-frame-based polarity encoding algorithm; and a new sparse coding mode (SCM) activated under a specific event sparsity constraint.
According to a first aspect, there is provided a method comprising: receiving an event stream from an event camera; converting the event stream into a plurality of event frames; and generating a combined event frame by combining the plurality of event frames. The merging of the plurality of event frames is accomplished by converting the event frame symbols assigned to the pixels of the event frames into combined event frame symbols for the corresponding pixels of the combined event frame, based on the spatial information, timestamp and polarity information of each event in the event stream.
Generating a combined event frame according to the method provides a more efficient representation of the event stream data, because information from multiple event frames is consolidated into a single image format; the combined event frame can then be encoded as a single image using a lossless image compression codec, or multiple combined event frames can be collected into a video sequence and processed further using a video codec.
In a possible implementation of the first aspect, the event frame symbols are combined into a combined event frame symbol according to the following formula:
CEF = kⁿ⁻¹·EF_{i-(n-1)} + … + k²·EF_{i-2} + k¹·EF_{i-1} + EF_i
where CEF represents the combined event frame symbol, EF represents an event frame symbol, k is the number of event frame symbols, and n is the number of event frames to be combined into the combined event frame. Using this mapping formula, the symbols of the event frames can be combined efficiently into combined event frame symbols.
In another possible implementation of the first aspect, the number of event frame symbols is k = 3, the polarity information indicating: a positive polarity event, where an increase in brightness is detected at the corresponding pixel of the event frame; a negative polarity event, where a decrease in brightness is detected at the corresponding pixel of the event frame; or that no event is detected at the corresponding pixel of the event frame for the corresponding sub-period. In this implementation, the number of possible combined event frame symbols is 3ⁿ, where n is the number of event frames to be combined into the combined event frame. The inventors have found that these specific parameters provide a very efficient way to merge multiple event frames.
In another possible implementation of the first aspect, the number of event frames to be combined into a combined event frame is n = 5, and the event frame symbols are combined into a combined event frame symbol according to the following formula:
CEF = 3⁴·EF_{i-4} + 3³·EF_{i-3} + 3²·EF_{i-2} + 3¹·EF_{i-1} + EF_i
the inventors have found that these specific parameters provide a very efficient way to merge multiple event frames.
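By way of illustration only, a minimal Python sketch of this base-k merge is given below; it is not part of the patent disclosure, and the symbol values {0, 1, 2} and the frame ordering simply follow the description above:

```python
import numpy as np

def combine_event_frames(frames: np.ndarray, k: int = 3) -> np.ndarray:
    """frames: (n, H, W) array of event-frame symbols in {0, ..., k-1}.
    Returns an (H, W) array of combined symbols in {0, ..., k**n - 1}."""
    n = frames.shape[0]
    # Weights k^(n-1), ..., k^1, k^0: EF_i (the most recent frame) gets weight 1.
    weights = k ** np.arange(n - 1, -1, -1)
    return (weights[:, None, None] * frames).sum(axis=0)

# n = 5, k = 3: all 3**5 = 243 combined symbols fit in a single 8-bit pixel.
frames = np.random.randint(0, 3, size=(5, 4, 4))
cef = combine_event_frames(frames)
assert cef.max() < 3 ** 5 <= 256
```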
In another possible implementation of the first aspect, the method further comprises encoding a plurality of subsequent combined event frames, either by collecting the combined event frames into a video sequence and encoding them using any video coding standard, such as High Efficiency Video Coding (HEVC), or by encoding each combined event frame separately as a raw image using any lossless image compression codec, such as the Context-based Adaptive Lossless Image Codec (CALIC) or the Free Lossless Image Format (FLIF). This allows further processing or storage of the combined event frames.
According to a second aspect, there is provided a method comprising: receiving an event stream from an event camera; converting the event stream into a plurality of event frames by merging the spatial information and polarity information of the detected events into event frames based on the respective timestamps of the events; and storing the spatial information and the polarity information from the plurality of event frames in separate data structures optimized for the respective information type to be stored.
Storing event frame data in separate data structures optimized for the respective information types not only provides an efficient way of storing the data, but also supports the use of different coding schemes to further encode the different types of information (spatial and polarity) extracted from the event frames. The proposed method provides improved coding performance compared to state-of-the-art methods designed for lossless video coding and lossless image compression. Furthermore, the proposed method also provides improved performance when all asynchronous events received from the sensor are losslessly encoded.
In a possible implementation of the second aspect, storing the spatial information comprises merging the spatial information from the plurality of event frames into a single event map image, stored as a set of image bit planes, which provides an efficient data structure both for storage and for further encoding.
In another possible implementation of the second aspect, the spatial information contained in the event map image is stored using the following combination: a binary map comprising binary map symbols assigned to each pixel to indicate the locations at which at least one event occurred; a category index representing the number nrEv of events that occurred; and an event frame index indicating the particular individual event frames in which the events occurred. Such a spatial event data representation supports an efficient encoding method, because only pixel locations at which at least one event is indicated in the event frames are encoded, i.e., pixel locations without events are ignored, thereby reducing the data size and shortening the data processing time.
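For concreteness, the following Python sketch derives the three structures from a stack of event frames; the code is an assumption-laden reading of the description, not reference code from the patent, and the bit ordering of the event frame index follows the worked example given later (EF_1 at the least significant bit):

```python
import numpy as np

def spatial_structures(frames: np.ndarray):
    """frames: (n, H, W) array that is nonzero where an event occurred.
    Returns the binary map, the category index (nrEv), and the event frame index."""
    occurred = frames != 0
    binary_map = occurred.any(axis=0).astype(np.uint8)   # at least one event
    category = occurred.sum(axis=0)                      # nrEv per pixel
    n = frames.shape[0]
    bits = (1 << np.arange(n)).astype(np.int64)          # EF_1 -> bit 0, EF_2 -> bit 1, ...
    ef_index = (bits[:, None, None] * occurred).sum(axis=0)
    return binary_map, category, ef_index
```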
In another possible implementation of the second aspect, the binary map is encoded using a template context model, wherein for each pixel of the binary map, the causal neighborhood of the corresponding binary map symbol is first determined and used to calculate the context index. The code length required to encode the binary map is then estimated using different numbers of neighbors, collected in a specific order from the causal neighborhood, to determine the optimal model order. Each binary map symbol is encoded based on its respective causal neighborhood: the optimal model order is encoded, and the binary map is then encoded traversing it in raster scan order, thereby providing a more efficient encoding method for event frame spatial information than prior art encoding methods.
In one embodiment, the optimal model order is determined using a model with a maximum order of m causal neighbors, where m is in the range of 1 to 18. In the example shown in fig. 6, m=18. The inventors have found that these specific parameters provide a very efficient method for determining the optimal model order.
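A hedged sketch of such a template-context lookup is shown below; the exact 18-pixel causal template is defined in fig. 6 of the patent, so the offsets used here are merely a plausible stand-in:

```python
import numpy as np

# Causal offsets (dy, dx) in raster-scan order, nearest neighbors first.
# This ordering and shape are assumptions; fig. 6 fixes the actual template.
CAUSAL_OFFSETS = [(0, -1), (-1, 0), (-1, -1), (-1, 1), (0, -2), (-1, -2),
                  (-2, 0), (-2, -1), (-2, 1), (-1, 2), (-2, -2), (-2, 2)]

def context_index(bmap: np.ndarray, y: int, x: int, order: int) -> int:
    """Pack the first `order` causal neighbors of pixel (y, x) into an integer
    context index; neighbors outside the image are treated as 0."""
    H, W = bmap.shape
    ctx = 0
    for dy, dx in CAUSAL_OFFSETS[:order]:
        ny, nx = y + dy, x + dx
        bit = int(bmap[ny, nx]) if 0 <= ny < H and 0 <= nx < W else 0
        ctx = (ctx << 1) | bit
    return ctx
```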
In another possible implementation of the second aspect, the category index is represented by category index symbols in an alphabet, each category index symbol comprising p bit planes. Encoding the category index comprises encoding the category index symbols bit plane by bit plane: the first bit plane (denoted BP_1) is encoded in raster scan order using a template context model, the context being calculated for each bit of the category index symbol from the respective causal neighborhood on the first bit plane; each subsequent bit plane (denoted BP_i) is then encoded, the respective context being determined using the causal neighborhood of BP_i and a corresponding context template from at least one previous bit plane BP_{i-1}. This provides a more efficient coding method for event frame spatial information than prior art coding methods.
In one embodiment, the number p of bit planes of the category index symbols is determined as p = ⌈log₂(n)⌉, where ⌈·⌉ denotes the ceiling operator.
In another possible implementation of the second aspect, for each subsequent bit plane BP_i having at least two previous bit planes, the respective context is determined using context templates from BP_{i-1} and BP_{i-2}. The inventors have found that using the previous two bit planes provides a very efficient way of encoding the category index.
In one embodiment, for each subsequent bit plane, the respective causal neighborhood has length order0, and the context template from its previous bit plane BP_{i-1} has length order1. The inventors have found that using these lengths provides a very efficient way of encoding the category index.
In another possible implementation of the second aspect, the event frame index comprises event frame index symbols representing the individual event frames in which any event occurred at the corresponding pixel, and each event frame index is encoded according to the category index of the respective pixel, each category of the category index representing a total number nrEv of events detected at the respective pixel. The event frame index is encoded as follows: first, the alphabet is divided into n sub-alphabets and a sub-alphabet is associated with each category; then, the event frame index symbols are remapped to remapped symbols of the corresponding sub-alphabet based on the corresponding category index; and finally, category symbols are associated according to the corresponding category index. Using sub-alphabets assigned to each category provides a very efficient method for event frame index encoding.
In one embodiment, the alphabet comprises 2ⁿ possible symbols, where n is the number of encoded event frames. In the example of n = 8, the alphabet comprises 256 symbols.
In another possible implementation of the second aspect, the sub-alphabets comprise different numbers of symbols, and each sub-alphabet is associated with a category based on the number of symbols required to remap the event frame index symbols of the respective category. The use of such sub-alphabets provides a very efficient method for event frame index encoding.
In another possible implementation of the second aspect, each event frame index comprises C(n, nrEv) possible symbol combinations, where n represents the number of event frames and nrEv corresponds to the category index representing the total number of events occurring in the n event frames; each sub-alphabet thus comprises all C(n, nrEv) possible symbols. The inventors have found that these specific parameters provide a very efficient method for event frame index encoding.
In another possible implementation of the second aspect, the remapped event frame index symbols and the associated category symbols are merged into n category vectors, the number n of category vectors corresponding to the n event frames, wherein the first n − 1 category vectors are encoded using an adaptive Markov model and the last category vector is associated with the deterministic case in which an event occurs in every event frame, yielding a very efficient event frame index encoding method.
In one embodiment, the number of event frames is n = 8, wherein the sub-alphabets comprise 8, 28, 56 or 70 symbols, respectively, and to determine the optimal order of the adaptive Markov model, the maximum order is set to: 5 for sub-alphabets with 8 symbols, 3 for sub-alphabets with 28 symbols, and 2 for sub-alphabets with 56 or 70 symbols. The inventors have found that these specific parameters provide a very efficient method for event frame index encoding.
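The sub-alphabet construction can be illustrated with a short sketch; the enumeration order within each sub-alphabet is an assumption, since the patent only fixes the sub-alphabet sizes C(8, nrEv):

```python
from math import comb

def build_sub_alphabets(n: int):
    """Split the nonzero event-frame-index symbols into sub-alphabets by nrEv
    (the popcount). Symbol 0 is excluded: pixels without events are not coded."""
    subs = {k: [] for k in range(1, n + 1)}
    for s in range(1, 1 << n):
        subs[bin(s).count("1")].append(s)
    return subs

subs = build_sub_alphabets(8)
assert [len(subs[k]) for k in (1, 2, 3, 4)] == [comb(8, 1), comb(8, 2), comb(8, 3), comb(8, 4)]
# Remapping: a symbol's remapped value is its rank within its sub-alphabet.
rank = subs[2].index(0b00000101)   # events in EF1 and EF3 -> category nrEv = 2
```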
In another possible implementation of the second aspect, storing the polarity information from the plurality of event frames comprises merging the polarity information from each event frame into a polarity vector comprising binary polarity symbols determined based on the brightness changes of the detected events, resulting in a very efficient method for encoding the polarity information from the event frames.
In another possible implementation of the second aspect, the polarity symbols from each polarity vector of the plurality of event frames are concatenated into a concatenated polarity vector, resulting in a very efficient representation of the polarity information from the event frames.
In one embodiment, the polarity symbols from each polarity vector are concatenated into a concatenated polarity vector on condition that the total number of events associated with the respective event map image is below a threshold number of events TNrEv. In one example, TNrEv = 150. This conditional concatenation of the polarity information into a concatenated polarity vector ensures that the encoding algorithm achieves optimal efficiency.
In another possible implementation of the second aspect, the polarity information from the plurality of event frames is stored in a plurality of corresponding polarity vectors by traversing the spatial information of the event frames bit plane by bit plane or event frame by event frame, thereby ensuring that the polarity data encoding achieves optimal efficiency.
In another possible implementation of the second aspect, the polarity vectors correspond to the event frames EF_{i=1→n} received over the n sub-periods 9, wherein each PV_i corresponds to EF_i and is either encoded as a vector using adaptive Markov modeling, or encoded by traversing the event map image using the binary map and encoding each polarity symbol of PV_i using a template context model, the respective context of a polarity symbol in PV_i being determined using the corresponding causal neighborhood from the current event frame EF_i, with at least one previous event frame EF_{i-1} serving as the respective context template; for each event frame EF_i having at least two previous event frames, the respective context of the polarity symbols in PV_i is determined using the context templates from the two previous event frames EF_{i-1} and EF_{i-2}. This results in a very efficient method of encoding the polarity information from the event frames.
In one embodiment, to encode each PV_i, the context index is calculated based on three parameters: the optimal model order order0 corresponding to the causal neighborhood, and the optimal model orders order1 and order2 corresponding to the context templates. In one embodiment, for each event frame having at least two previous event frames, the maximum model orders searched for order0, order1 and order2 are set to 7, 6 and 3, respectively. In an embodiment of an event frame having only one previous event frame, the maximum model orders searched for order0 and order1 are 7 and 6, respectively, and order2 is not used. In one embodiment, for each event frame without a previous event frame, the maximum model order searched for order0 is set to 10, and order1 and order2 are not used. The inventors have found that using these model orders provides a very efficient method for encoding polarity information from event frames.
In an embodiment where the polarity vector is encoded as a vector using adaptive Markov modeling, (order0, order1, order2) = (0, 0, 0) is encoded to indicate that the adaptive Markov modeling method has been selected, thereby ensuring that the polarity data encoding achieves optimal efficiency.
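For illustration, a minimal adaptive Markov model over a binary polarity vector might look as follows; the Laplace-smoothed counts are an assumption, since the patent does not specify the probability update rule:

```python
from collections import defaultdict
from math import log2

def markov_code_length(bits, order: int) -> float:
    """Adaptive order-k model over a binary vector: per-context Laplace counts,
    returning the ideal arithmetic-coding cost in bits."""
    counts = defaultdict(lambda: [1, 1])
    total = 0.0
    for i, b in enumerate(bits):
        ctx = tuple(bits[max(0, i - order):i])
        c0, c1 = counts[ctx]
        total += -log2((c1 if b else c0) / (c0 + c1))
        counts[ctx][b] += 1
    return total

pv = [1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1]
best_order = min(range(4), key=lambda k: markov_code_length(pv, k))
```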
In another possible implementation of the second aspect, the spatial information and polarity information from the n event frames are encoded using a sparse coding mode (SCM), wherein if the total number N of events detected within the plurality of sub-periods is below an event threshold, the sparse coding mode is activated and at least one of the spatial information and the polarity information is encoded using a lower-complexity coding method; otherwise, the sparse coding mode is not activated. Sparse events within the sub-periods 9 can be encoded efficiently using such a sparse coding mode.
In another possible implementation of the second aspect, in response to the determined total number of events being below a first event threshold, N < ET₁, the SCM is activated and the spatial information is encoded as follows: one bit is always encoded to indicate whether the SCM is activated or deactivated; if the SCM is activated, N is encoded using log₂(ET₁) bits, and for each event e_i the spatial information is encoded as follows: x_i is encoded using log₂(H) bits; y_i is encoded using log₂(W) bits; and the event frame index is encoded using log₂(n) bits, where H is the height of the event camera sensor 1 and W is the width of the event camera sensor 1. In one embodiment, based on the resolution W×H of the event camera 1, the first event threshold is in the range 10 < ET₁ < 50. In one example, ET₁ = 20. The inventors have found that these specific parameters provide a very efficient method for encoding event frame spatial information.
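The resulting SCM bit budget for the spatial information can be tallied directly; applying a ceiling to each log₂ term is an assumption made here for concreteness:

```python
from math import ceil, log2

def scm_spatial_bits(N: int, H: int, W: int, n: int, ET1: int = 20) -> int:
    """Bit budget for SCM spatial coding: 1 flag bit, log2(ET1) bits for N,
    then log2(H) + log2(W) + log2(n) bits per event."""
    if N >= ET1:
        return 1                                   # SCM off: only the flag bit
    per_event = ceil(log2(H)) + ceil(log2(W)) + ceil(log2(n))
    return 1 + ceil(log2(ET1)) + N * per_event

# 640x480 sensor, n = 8 frames, N = 5 events: 1 + 5 + 5 * (9 + 10 + 3) = 116 bits
print(scm_spatial_bits(5, H=480, W=640, n=8))
```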
In another possible implementation of the second aspect, in response to the determined total number of events being below a second event threshold, N < ET₂, the SCM is activated and the polarity information is encoded as follows: the polarity symbols from each polarity vector of the plurality of event frames are concatenated into a single concatenated polarity vector, and the concatenated polarity vector is encoded using a 0-order Markov model. In one embodiment, the second event threshold is preferably in the range 100 < ET₂ < 200, more preferably ET₂ = 150. The inventors have found that these specific parameters provide a very efficient method for encoding event frame polarity information.
In another possible implementation of the second aspect, the number n of event frames is in the range of 1 to 32, depending on the length of the sub-period 9. In one embodiment, the length of the sub-period 9 is very short and the number of event frames is in the range of 16 to 32. In another embodiment, the length of the sub-period is longer and the number of event frames is in the range of 1 to 8. In one example, the number of event frames is n = 8. The inventors have found that these particular event frame ranges for these particular sub-period lengths provide a very efficient method for event frame encoding.
According to a third aspect, there is provided a computer-based system, the system comprising: the event camera is used for recording an event stream; a processor coupled to the storage device for converting the event stream into a plurality of event frames; the storage device includes instructions that, when executed by the processor, cause the computer-based system to merge the information in the plurality of event frames into a combined event frame by performing the method according to any possible implementation of the first aspect. The resulting system is extremely efficient for merging multiple event frames.
According to a fourth aspect, there is provided a computer-based system comprising: the event camera is used for recording an event stream; a processor coupled to the storage device for converting the event stream into a plurality of event frames; the storage device includes instructions that, when executed by the processor, cause the computer-based system to process the information from the plurality of event frames according to a method of any possible implementation of the second aspect. The resulting system is extremely efficient for processing event data from multiple event frames.
According to a fifth aspect, there is provided a non-transitory computer readable medium having stored therein program instructions which, when executed by a processor, cause the processor to perform a method according to any one of the possible implementations of the first aspect. The resulting computer-readable medium is highly efficient for merging multiple event frames.
According to a sixth aspect, there is provided a non-transitory computer readable medium having stored therein program instructions which, when executed by a processor, cause the processor to perform a method according to any one of the possible implementations of the second aspect. The resulting computer-readable medium is highly efficient for processing event data from multiple event frames.
These and other aspects are apparent from and will be elucidated with reference to one or more embodiments described hereinafter.
Drawings
In the following detailed portion of the invention, aspects, embodiments and implementations will be explained in more detail with reference to example embodiments shown in the drawings in which:
FIG. 1 illustrates an example of receiving an event stream from an event camera, according to an embodiment of the invention;
FIG. 2 illustrates an example of converting an event stream into a plurality of event frames according to an embodiment of the present invention;
FIG. 3 illustrates steps for generating a combined event frame from a plurality of event frames, according to an example of an embodiment of the present invention;
FIG. 4 illustrates steps for extracting spatial information and polarity information from a plurality of event frames and storing in separate data structures, according to another example of an embodiment of the present invention;
FIG. 5 illustrates a step of storing spatial information from an event map image, according to another example of an embodiment of the present invention;
FIG. 6 shows a schematic diagram of determining a causal neighborhood and neighborhood order (on the left) in a current bit plane and a template context and neighborhood order (on the right) from a subsequent bit plane for a current pixel, according to another example of an embodiment of the invention;
FIG. 7 shows a schematic diagram of an encoded polarity vector according to another example of an embodiment of the invention;
FIG. 8 shows a schematic diagram of encoding an event frame index based on a corresponding category index, according to another example of an embodiment of the invention;
fig. 9 shows another example of remapping event frame index symbols to remapped symbols for the first category index, where nrEv = 1, according to an embodiment of the invention;
fig. 10 shows a flow chart of applying a sparse coding mode (SCM) to encode spatial information and polarity information from an event frame, according to another example of an embodiment of the invention;
FIG. 11 shows compression results obtained on a training dataset using the lossless compression method for three values of the sub-period length, according to an example of an embodiment of the present invention;
FIG. 12A shows the relative compression results on the training dataset at the smallest possible sub-period length of 10⁻⁶ seconds;
FIG. 12B shows the event density results for each event sequence in the training dataset, in mega-events per second, at the smallest possible sub-period length of 10⁻⁶ seconds, according to an embodiment of the invention.
Detailed Description
In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the related invention. However, it will be understood by those skilled in the art that the present invention may be practiced without such details. In other instances, well known methods, procedures, systems, components, and/or circuits have not been described in detail as not to unnecessarily obscure aspects of the present invention.
Fig. 1 shows an example of receiving an event stream 3 from an event camera 1 according to the invention. The event camera 1 may be mounted on an autonomous vehicle and comprises a plurality of pixel sensors 2, each pixel sensor 2 being adapted to independently detect an event 4 representing a change in brightness in the captured scene. An event 4 in the event stream 3 comprises the spatial information 5 of the pixel sensor 2 that detected the event 4, the timestamp 7 of the event 4, and the polarity information 6 of the event 4, depending on the brightness change detected by the pixel sensor 2. The event stream 3 in this example is received within a first period of time, which in turn is divided into a plurality of sub-periods 9. The length of these sub-periods 9 may be chosen according to the data frequency required by the application and may even be as short as 10⁻⁶ seconds, which corresponds to up to 10⁶ frames per second (fps).
For example, the number of event frames 8 may be in the range of 1 to 32, depending on the length of the sub-period 9.
Where the length of the sub-period 9 is relatively short (e.g., 10⁻⁶ seconds), the number of event frames 8 is in the range of 16 to 32. In another example, where the length of the sub-period 9 is longer, the number of event frames 8 is in the range of 1 to 8. In a particular example, the number of event frames 8 is n = 8.
In some applications, the input data is preferably a stream of events 3, {e_i}_{i=1,2,…,N}, collected over the spatio-temporal neighborhood of a time interval, which is then processed to produce an output. In general, however, event data is typically used as an image, where the asynchronous event sequence is divided into the spatio-temporal neighborhoods of the sub-periods 9 (i.e., time volumes).
Fig. 2 shows how the event stream 3 received over the first period is converted into event frames 8 by dividing the first period into a plurality of sub-periods 9, as described above. One event frame (EF) 8 is generated for each time volume of size W×H×Δ (where H is the event camera sensor height, W is the event camera sensor width, and Δ is the length of the sub-period 9). The per-pixel event polarity information 6 is first summed, and the pixel polarity is then set to the sign of the sum, i.e., an event frame symbol 10 is assigned to each pixel of the event frame 8, thereby accumulating the events 4 occurring at the same pixel location (x, y) and yielding the final event frame 8 shown on the right.
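A minimal sketch of this accumulation step is given below (the array layout and the ±1 polarity encoding are assumptions, not taken from the patent):

```python
import numpy as np

def make_event_frame(events, H: int, W: int) -> np.ndarray:
    """events: iterable of (x, y, polarity) with polarity in {+1, -1}, all with
    timestamps inside one sub-period. Per-pixel polarities are summed and only
    the sign is kept, giving event-frame symbols in {-1, 0, +1}."""
    acc = np.zeros((H, W), dtype=np.int32)
    for x, y, pol in events:
        acc[y, x] += pol
    return np.sign(acc).astype(np.int8)

ef = make_event_frame([(2, 3, +1), (2, 3, +1), (5, 1, -1)], H=8, W=8)
assert ef[3, 2] == 1 and ef[1, 5] == -1
```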
In one example, the original event frame 8 contains 3 symbols that can be represented using 2 bits per pixel. However, such a representation is inefficient. The proposed solution to improve the event data representation is to combine the information found in multiple event frames 8 into a single combined event frame 11.
Fig. 3 shows an example of generating a combined event frame 11 by merging a plurality of event frames 8 according to the invention. The merging of the event frames 8 is done by converting all event frame symbols 10 assigned to the corresponding pixels of the event frames 8 into combined event frame symbols 12 for the corresponding pixels of the combined event frame 11.
The event frame symbols 10 are combined into a combined event frame symbol 12 according to the following formula:
CEF = kⁿ⁻¹·EF_{i-(n-1)} + … + k²·EF_{i-2} + k¹·EF_{i-1} + EF_i
where CEF denotes the combined event frame symbol 12, EF denotes the event frame symbol 10, k is the number of event frame symbols 10, and n is the number of event frames 8 to be combined into the combined event frame 11.
In one example, the number of event frame symbols 10 is k=3, e.g., {0,1,2}, indicating that polarity information 6 is one of:
positive polarity event 4, i.e. an increase in brightness is detected at the corresponding pixel of event frame 8;
negative polarity event 4, i.e. a decrease in brightness is detected at the corresponding pixel of event frame 8; or alternatively
No event 4 is detected at the corresponding pixel of the event frame 8 within the corresponding sub-period 9.
In this example, the number of possible combined event frame symbols 12 is 3ⁿ, where n is the number of event frames 8 to be combined into the combined event frame 11.
In one example, as shown in fig. 3, the number of event frames 8 to be combined into a combined event frame 11 is n=5, and the event frame symbols 10 are combined into a combined event frame symbol 12 according to the following formula:
CEF = 3⁴·EF_{i-4} + 3³·EF_{i-3} + 3²·EF_{i-2} + 3¹·EF_{i-1} + EF_i
however, the number n of event frames 8 may not be 5. For example, the number n of event frames 8 may be less than 5, but this may result in coding inefficiency. On the other hand, if the number n of event frames 8 is greater than 5, then the combined event frame 11 will use more than 8 bits to represent the combined event frame symbol 12, which is also undesirable because image and video codecs are typically designed to accept an 8-bit data representation as input.
Finally, as shown in fig. 3, the combined event frames 11 may be encoded further by collecting the combined event frames 11 into a video sequence 13 and encoding them using any video coding standard, e.g., a High Efficiency Video Coding (HEVC) codec. According to alternative examples, each combined event frame 11 may be encoded individually as a single raw image 14 using any lossless image compression codec, such as a Context-based Adaptive Lossless Image Codec (CALIC) or a Free Lossless Image Format (FLIF) codec.
Fig. 4 shows the steps of extracting spatial information 5 and polarity information 6 from a plurality of event frames 8 and storing in separate data structures according to another example of the invention.
In this example, the initial steps of receiving the event stream 3 from the event camera 1 and converting the event stream 3 received over the first period of time into a plurality of event frames 8 corresponding to the selected sub-periods 9 are performed as described above, i.e., the spatial information 5 and polarity information 6 of the detected events 4 are merged into the event frames 8 based on the respective timestamps 7 of the events 4. After the event frames 8 are generated, the spatial information 5 and the polarity information 6 are stored in separate data structures, as described below.
In this example, storing the spatial information 5 includes merging the spatial information 5 from the plurality of event frames 8 into a single event map image 15 stored as a set of image bit planes 25; storing polarity information 6 includes incorporating polarity information 6 from each event frame 8 into a polarity vector 16, the polarity vector 16 including binary polarity symbols 17 determined based on brightness changes in the detected event 4.
The polarity vector 16 may be generated by traversing the spatial information 5 of the event frame 8, e.g. bit-plane by bit-plane or event frame by event frame.
As shown in fig. 4, the polarity symbols 17 from each polarity vector 16 may in turn be concatenated further into a concatenated polarity vector 32. This concatenation into the concatenated polarity vector 32 is conditional on the total number of events associated with the respective event map image 15 being below a threshold number of events, such as the second event threshold ET₂, which is explained in more detail below in connection with the sparse coding mode. The threshold number of events is calculated from the total number of events in all of the merged event frames 8. In one example, explained later, the second event threshold is ET₂ = 150: with fewer than 150 events, the polarity symbols 17 from the polarity vectors 16 are concatenated into the concatenated polarity vector 32, which is encoded by activating the SCM for the polarity information.
Fig. 5 shows steps of storing space information 5 according to an example of the invention. As shown, spatial information 5 included in the event map image 15 is stored using a combination of binary map 18, category index 20, and event frame index 23.
Binary map 18 includes a binary map symbol 19 assigned to each pixel of binary map 18 to indicate where at least one event 4 has occurred in any corresponding pixel of the plurality of event frames 8.
The category index 20 represents the number nrEv of events 4 that occurred at the corresponding pixel of the binary map 18 indicated by the binary map symbol 19. The category index 20 is represented using category index symbols 21, each category index symbol 21 comprising p bit planes.
In one example, the number p of bit planes of the category index symbols 21 is determined as p = ⌈log₂(n)⌉, where ⌈·⌉ denotes the ceiling operator. For example, if there are n = 8 event frames, p = 3 bit planes are used to represent the category index symbols 21, because ⌈log₂(8)⌉ = 3. If more event frames 8 are used, more than 3 bit planes are used to represent the number of events.
Finally, the event frame index 23 represents the individual event frames 8, among the n event frames 8, in which an event 4 occurred at the corresponding pixel of the binary map 18 indicated by the binary map symbol 19, i.e., the positions of these individual event frames 8 in the event frame stack. The event frame index 23 may comprise event frame index symbols 24 representing each event frame 8, among the plurality of event frames 8, in which any event 4 occurred.
In the example shown, the number of events 4 occurring at the corresponding pixel of the binary map 18 is nrEv = 2, represented in the event frame index 23 as the binary (radix-2) number (00000101)₂, indicating that events occurred in the first and third event frames 8, EF_1 and EF_3.
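This worked example can be verified directly, assuming (as the example implies) that EF_1 maps to the least significant bit:

```python
idx = 0b00000101                      # event frame index from the example above
frames_with_events = [i + 1 for i in range(8) if (idx >> i) & 1]
assert frames_with_events == [1, 3]   # events in EF1 and EF3, i.e. nrEv = 2
```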
Fig. 6 to 9 show compression methods for encoding separately stored spatial information 5 and polarity information 6, as described above.
In order to encode different types of spatial information and polarity information, a new encoding method is developed, which will be described below.
The binary map 18 is encoded using a template context model, where for each pixel of the binary map 18, the causal neighborhood 26 of the corresponding binary map symbol 19 is determined. The left side of FIG. 6 shows an example of the causal neighborhood and neighborhood order for the current bit plane 25, BP_i, used when encoding polarity information. The binary map 18 comprises binary map symbols 19 in the alphabet {0, 1}, which can be represented using a single bit plane. The binary map 18 therefore has only one bit plane 25, and so it uses only a causal neighborhood 26, i.e., only pixels that have already been decoded. Next, the context index of the binary distribution model used to encode the current binary map symbol 19 is calculated using the causal neighborhood 26 and the optimal model order order0 that has been found.
Specifically, different numbers of neighbors, collected in the order of the causal neighborhood 26, are used to estimate the code length needed to encode the binary map 18, in order to determine the optimal model order order0. Finally, each binary map symbol 19 is encoded based on the context computed at the optimal order using the neighbors found in the causal neighborhood 26 of the current pixel; the optimal model order found is encoded and then used to encode the binary map 18, traversed in raster scan order.
In one example, the optimal model order is determined using a model with a maximum order of m causal neighbors, where m is in the range of 1 to 18. In the example shown in fig. 6, m = 18, i.e., 18 causal neighbors are used to calculate the context index of the binary distribution model used to encode the current binary map symbol 19.
As described above, the category index 20 is represented using category index symbols 21, each category index symbol 21 comprising p bit planes 25. The category index 20 represents the number nrEv of events 4, i.e., the number of '1's in the binary representation of the corresponding symbol in the event map image 15, using p-bit symbols in the alphabet {0, 1, …, 2ᵖ − 1}; the symbols are encoded bit plane by bit plane.
Similar to the binary map 18, the first bit plane 25 (denoted BP_1) is encoded in raster scan order: the first bit in the representation of each category index symbol 21 is encoded using the template context model described above, based on the context calculated for each bit of the category index symbol 21, determined from the corresponding causal neighborhood 26 on the first bit plane 25.
Thereafter, each subsequent bit plane BP_i is encoded using the causal neighborhood 26 of the current bit plane BP_i together with the corresponding context template 27 determined from at least one previous bit plane, as illustrated in FIG. 6 for the current bit plane BP_i and the previous bit plane BP_{i-1}.
For each subsequent bit plane having at least two previous bit planes 25, the corresponding context may be determined using the context templates 27 from BP_{i-1} and BP_{i-2}. This is similar to the polarity information encoding shown in FIG. 7, which is explained below.
In one example, for each subsequent bit plane 25, the corresponding causal neighborhood 26 has length order0, and the context template 27 from its previous bit plane BP_{i-1} has length order1.
Thus, the second bit plane (denoted BP_2) is encoded using the causal neighborhood 26 (of length order0) from BP_2 and the context template 27 (of length order1) from BP_1 to form the context. Similarly, the third bit plane (denoted BP_3) is encoded using the causal neighborhood 26 from BP_3 and the context templates 27 from BP_2 and BP_1 to form the context.
FIG. 7 shows how the polarity vectors 16 of the corresponding event frames 8 received over the n sub-periods 9, EF_{i=1→n}, are encoded, where each polarity vector 16, PV_i, corresponds to EF_i. As described above, the polarity vector 16 comprises binary polarity symbols 17 and may be encoded as a vector using adaptive Markov modeling (e.g., with maximum order 14).
Alternatively, the polarity vector 16 may also be encoded by traversing the event map image 15 using the binary map 18 and encoding each polarity symbol 17 using a template context model, as shown in FIGS. 6 and 7, where the context of a polarity symbol in PV_i is determined using the corresponding causal neighborhood 26 from the current event frame EF_i, with at least one previous event frame EF_{i-1} serving as the corresponding context template 27; for each event frame EF_i having at least two previous event frames 8, the context of the polarity symbols in PV_i is determined using the context templates 27 from the two previous event frames EF_{i-1} and EF_{i-2}.
Thus, when encoding a polarity vector 16, the causal neighborhood 26 contains polarity information 6, and when encoding the event map image 15, the causal neighborhood 26 contains spatial information 5. Although both types of information use a causal neighborhood 26, each neighborhood is populated with the corresponding type of information (spatial or polarity).
For encoding each PV_i, the corresponding causal neighborhood 26 may have an optimal model order order0, and the corresponding context templates 27 may have corresponding optimal model orders order1 and order2.
In the example of each event frame 8 having at least two previous event frames 8, the maximum model orders of order0, order1 and order2 are set to 7, 6 and 3, respectively. For an event frame 8 having only one previous event frame 8, the maximum model orders of order0 and order1 may be 7 and 6, respectively, in which case order2 is not used. For each event frame 8 without a previous event frame 8, the maximum model order of order0 may be set to 10, with order1 and order2 not used.
In the example where the polarity vector 16 is encoded as a vector using adaptive Markov modeling, (order0, order1, order2) = (0, 0, 0) is encoded to indicate that the adaptive Markov modeling method has been selected.
Fig. 8 shows a schematic diagram of the encoding of an event frame index 23 according to an example of the invention. As described above, the event frame index 23 comprises event frame index symbols 24 representing each event frame 8, among the plurality of event frames 8, in which any event 4 occurred.
The encoding of each event frame index 23 depends on the category index 20 of the respective pixel, each category of the category index 20 representing the total number nrEv of events 4 detected at the respective pixel across the plurality of event frames 8. To this end, as shown in FIG. 8, the alphabet 28 of 2ⁿ symbols is divided into n sub-alphabets 29, n being the number of encoded event frames 8. A sub-alphabet 29 is then associated with each category, and the event frame index symbols 24 are remapped to the remapped symbols 30 of the corresponding sub-alphabet 29 based on the corresponding category index 20 (nrEv), as shown in FIG. 8. Finally, the category symbols 22 associated with the corresponding category index 20 are represented using the binary representation of nrEv − 1 (using p bit planes).
In one example, the alphabet 28 comprises 2ⁿ possible symbols, n being the number of encoded event frames. In the example shown in fig. 8, n = 8, i.e., the alphabet 28 comprises 256 symbols. Thus, as shown, the sub-alphabets 29 comprise different numbers of symbols, each sub-alphabet 29 being associated with a category based on the number of symbols required to remap the event frame index symbols 24 of the corresponding category.
As shown in FIG. 8, each event frame index 23 may take one of C(n, nrEv) possible symbol combinations, n representing the number of event frames 8 and nrEv corresponding to the category index 20 representing the total number of events 4 occurring in the n event frames 8. Each sub-alphabet 29 thus comprises all C(n, nrEv) possible symbols, as explained in more detail below.
In the example shown, the number of event frames 8 is n = 8. In practice this means that when eight event frames (EF) 8 are stored, eight categories are found in the event map image 15, i.e., 1, 2, …, 8 events may occur at the current pixel position. The EF index 23 is encoded with an alphabet associated with each category. The first category (nrEv = 1) indicates that one event 4 occurred in the EF stack; the symbols {1, 2¹, 2², …, 2⁷} in the alphabet 28 are therefore remapped to an alphabet of C(8, 1) = 8 symbols ({0, 1, 2, …, 7}), and the category symbol is computed as nrEv − 1 using p bit planes, i.e., associated with 0₁₀ = 000₂. The second category (nrEv = 2) indicates that two events 4 occurred in the EF stack; the symbols in the alphabet 28 whose binary representation contains 2 bits set to '1' (e.g., {3, 5, …, 192}) are therefore remapped to an alphabet of C(8, 2) = 28 symbols and associated with the category symbol 1₁₀ = 001₂. The third category (nrEv = 3) indicates that three events 4 occurred in the EF stack; the symbols in the alphabet 28 whose binary representation contains 3 bits set to '1' (e.g., {7, 11, …, 224}) are remapped to an alphabet of C(8, 3) = 56 symbols and associated with the category symbol 2₁₀ = 010₂. In the same way, the fourth, fifth, sixth and seventh categories remap the symbols in the alphabet 28 to alphabets of C(8, 4) = 70, C(8, 5) = 56, C(8, 6) = 28 and C(8, 7) = 8 symbols, associated with the category symbols 3₁₀ = 011₂, 4₁₀ = 100₂, 5₁₀ = 101₂ and 6₁₀ = 110₂, respectively. Finally, the eighth category indicates that an event occurred in every EF, which is a deterministic case since all bits are set to '1'; it is associated with the symbol 7₁₀ = 111₂.
According to another example (not shown), if n = 10 event frames 8 are chosen for encoding, the alphabet 28 comprises 2¹⁰ = 1024 symbols and the total number of events 4 has an upper limit of nrEv = 10, so p = 4 bit planes are needed to encode the category index 20, and each sub-alphabet 29 has C(10, nrEv) symbols.
In summary, the number of bit planes encoded in this method comprises: 1 bit plane for the binary map 18; p = ⌈log₂(n)⌉ bit planes for the category index 20; and n bit planes for the polarity information, i.e., 1 + p + n bit planes in total. If n = 8, this means a total of 1 + 3 + 8 = 12 bit planes; if n = 4, a total of 1 + 2 + 4 = 7 bit planes.
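This tally is easy to check:

```python
from math import ceil, log2

def total_bit_planes(n: int) -> int:
    """1 bit plane (binary map) + p = ceil(log2(n)) (category index) + n (polarity)."""
    return 1 + ceil(log2(n)) + n

assert total_bit_planes(8) == 12
assert total_bit_planes(4) == 7
```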
Fig. 9 illustrates steps for remapping a first class of event frame index symbols 24 to remap symbols 30 according to an example of the invention.
The remapped symbols 30 and the associated category symbols 22 are then merged into n category vectors 31, the number n of category vectors 31 corresponding to the number n of event frames 8. The first n − 1 category vectors 31 may be encoded using an adaptive Markov model; the last category vector 31 is associated with the deterministic case, i.e., an event 4 occurred in every event frame 8. To determine the optimal order of the adaptive Markov model, the maximum order searched is set to: 5 for sub-alphabets 29 with 8 symbols, 3 for sub-alphabets 29 with 28 symbols, and 2 for sub-alphabets 29 with 56 or 70 symbols. Experiments have shown that most of the symbols are collected by the first 2 to 3 categories, which demonstrates the effectiveness of the proposed method.
Fig. 10 shows a flow chart of applying a sparse coding mode (SCM) to encode the spatial information 5 and polarity information 6 from the event frames 8, according to another example of the invention. When the selected length Δ of the sub-period 9 is very small, e.g., only Δ = 10⁻⁶ seconds, only a small number of events 4 occur in each event frame 8. In this case, the proposed algorithm may be too complex to use and may yield a higher bitrate. Therefore, the SCM is activated according to the number N of events in the event map image 15.
Specifically, if the total number N of events 4 detected within the plurality of sub-periods 9 is below the event threshold, the sparse coding mode is activated and at least one of the spatial information 5 or the polarity information 6 is encoded using a lower-complexity coding method. Otherwise, i.e. if the total number N of events 4 detected within the plurality of sub-periods 9 is above the event threshold, the sparse coding mode is not activated and at least one of the spatial information 5 or the polarity information 6 is encoded according to the above-described method.
The initial steps 101 to 103 are similar or identical to the steps described above: in step 101, an event stream 3 is received from the event camera 1; in step 102, the event stream 3 is converted into a plurality of event frames 8; and in step 103, the spatial information 5 and polarity information 6 are extracted from the event frames 8 and stored in separate data structures.
Subsequently, in response to the determined total number of events 4 being below a first event threshold (N < ET₁), a sparse coding mode is activated in step 104 to encode the spatial information 5. In this encoding process, one bit is always used to indicate whether the sparse coding mode is on or off; if the sparse coding mode is on, N is encoded using log₂(ET₁) bits at step 105. Finally, in step 106, for each event 4 e_i, the spatial information 5 is encoded as follows in step 107: x_i is encoded using log₂(H) bits; y_i is encoded using log₂(W) bits; and the event frame index 23 is encoded using log₂(n) bits, where H is the height of the event camera sensor 1 and W is the width of the event camera sensor 1.
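The SCM bit budget for the spatial information 5 follows directly from these steps: one mode bit, log₂(ET₁) bits for N, and per event the coordinate and EF-index bits. The C sketch below tabulates this cost, taking the ceiling of each logarithm as an integer bitstream would; the function names and the worked numbers are illustrative assumptions.

    /* ceil(log2(x)) for x >= 1. */
    static unsigned clog2(unsigned x) {
        unsigned p = 0;
        while ((1u << p) < x) p++;
        return p;
    }

    /* Bit cost of encoding the spatial information 5 in sparse coding mode:
     * 1 mode bit + ceil(log2(ET1)) bits for N, plus per event ceil(log2(H))
     * bits for x_i, ceil(log2(W)) bits for y_i and ceil(log2(n)) bits for
     * the event frame index 23. */
    unsigned scm_spatial_bits(unsigned N, unsigned ET1,
                              unsigned W, unsigned H, unsigned n) {
        return 1u + clog2(ET1) + N * (clog2(H) + clog2(W) + clog2(n));
    }
    /* e.g. N = 15, ET1 = 20, a 640x480 sensor and n = 8:
     * 1 + 5 + 15 * (9 + 10 + 3) = 336 bits. */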
Based on the resolution W×H of the event camera, the first event threshold is chosen in the range 10 < ET₁ < 50. In one example, ET₁ = 20.
The encoding of the polarity information 6 likewise depends on whether the sparse coding mode is activated. Specifically, in the parallel step 108, in response to the determined total number of events 4 being below a second event threshold (N < ET₂), the SCM is activated to encode the polarity information 6. At step 109, the polarity symbols 17 from each polarity vector 16 are concatenated into a single concatenated polarity vector 32 (as described above); in step 110, the concatenated polarity vector 32 is encoded using an order-0 Markov model.
The second event threshold lies in the range 100 < ET₂ < 200; more preferably, ET₂ = 150.
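The order-0 Markov model of step 110 reduces to a single pair of adaptive counts over the binary polarity symbols 17. The C sketch below estimates the code length an arithmetic coder driven by such a model would achieve; the Laplace smoothing is an assumed detail, not taken from the description above.

    #include <math.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Estimated order-0 adaptive code length (in bits) of the concatenated
     * polarity vector 32, whose entries are binary polarity symbols 17. */
    double polarity_order0_bits(const uint8_t *pol, size_t len) {
        unsigned c[2] = { 0, 0 };
        double bits = 0.0;
        for (size_t i = 0; i < len; i++) {
            double p = (c[pol[i]] + 1.0) / (c[0] + c[1] + 2.0);
            bits -= log2(p);
            c[pol[i]]++;
        }
        return bits;
    }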
Figs. 11 and 12 show lossless compression results on the training dataset, obtained using different codecs, according to an example of the present invention.
The experimental evaluation of the encoding method was performed using the ETH_Training dataset, which contains 82 event sequences at a 640×480 event camera resolution; for details, see M. Gehrig, W. Aarents, D. Gehrig and D. Scaramuzza, "DSEC: A Stereo Event Camera Dataset for Driving Scenarios," IEEE Robotics and Automation Letters, vol. 6, no. 3, pp. 4947–4954, Jul. 2021. The encoding method explained above is implemented in the C language and losslessly encodes sequences of 8 Event Frames (EF). Four frame rates were studied: (i) Δ = 5.555 milliseconds (180 fps); (ii) Δ = 1 millisecond (1000 fps); (iii) Δ = 0.1 milliseconds (10000 fps); and (iv) Δ = 10⁻⁶ seconds (1000000 fps), i.e. all events acquired by the sensor are collected by the EFs. For Δ = 5.555 milliseconds, 1 millisecond and 0.1 milliseconds, the raw data size is reported using a representation of 2 bits per EF pixel. For Δ = 10⁻⁶ seconds, the raw data size is reported using an 8-byte (B) representation of each event, as given in the sensor specification. For Δ = 5.555 milliseconds the entire event sequences are encoded; for Δ = 1 millisecond, 0.1 milliseconds and 10⁻⁶ seconds, the first 20 seconds, 2 seconds and 20 milliseconds of each event sequence are encoded, respectively.
The performance of the proposed method is compared with the following state-of-the-art methods applicable to lossless encoding of events represented as combined event frames 11: (1) the HEVC standard, implemented using FFmpeg; (2) the Context Adaptive Lossless Image Codec (CALIC); and (3) the Free Lossless Image Format (FLIF) codec. The lossless compression results are compared using the following metrics: (a) the Compression Ratio (CR), defined as the ratio between the compressed size and the original data size, and (b) the Relative Compression (RC), defined as the ratio between the compressed size and the size achieved by the proposed method.
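To make the raw-size accounting and the two comparison metrics concrete, a short C sketch is given below (the helper names are illustrative); note that with the definitions above, lower CR and RC values are better.

    #include <stdint.h>

    /* Raw size in bytes of the frame-based representation: 2 bits per EF pixel. */
    double raw_size_frames_bytes(unsigned W, unsigned H, uint64_t num_frames) {
        return (double)W * H * (double)num_frames * 2.0 / 8.0;
    }

    /* Raw size in bytes of the asynchronous representation: 8 bytes per event. */
    double raw_size_events_bytes(uint64_t num_events) {
        return 8.0 * (double)num_events;
    }

    /* Compression Ratio and Relative Compression as defined above. */
    double compression_ratio(double compressed, double original) {
        return compressed / original;
    }
    double relative_compression(double compressed, double proposed) {
        return compressed / proposed;
    }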
Fig. 11 shows the compression results on the ETH_Training dataset. The Relative Compression (RC) results for different sub-period 9 lengths are shown in subgraphs (a) Δ = 5.555 milliseconds, (b) Δ = 1 millisecond and (c) Δ = 0.1 milliseconds. The Compression Ratio (CR) results for the same sub-period 9 lengths are shown in subgraphs (d) Δ = 5.555 milliseconds, (e) Δ = 1 millisecond and (f) Δ = 0.1 milliseconds. The proposed method provides an average performance improvement of 20.66% compared with FLIF, and of up to 70.68% compared with HEVC. In the case of Δ = 0.1 milliseconds, the proposed representation is up to 274 times more efficient than the asynchronous raw data.
Fig. 12A shows the lossless compression results on the ETH_Training dataset for sub-periods 9 of length Δ = 10⁻⁶ seconds. Fig. 12B shows the event density of each event sequence in the ETH_Training dataset for the same sub-period 9 length of Δ = 10⁻⁶ seconds, in megaevents per second (Mev/s). The results show that the proposed representation is 5.8 times more efficient than the sensor representation. The performance of the proposed method correlates with the event density of each sequence, since in ETH_Training a large number of megaevents per second are captured from a fast-moving car.
In summary, the disclosed method provides a context-based, efficient lossless image codec for encoding event data, and a more efficient way of storing event data. The proposed coding method first encodes the locations where at least one event occurs and then encodes the EF index within the EF stack. The experimental evaluation using four frame rates showed an average performance improvement of 20.66% compared with FLIF and of 70.68% compared with HEVC. When all events are collected by the event frames, the proposed representation is 5.8 times more efficient than the sensor's event representation.
Although not shown in the drawings, the present invention also extends to a computer-based system comprising: the event camera 1 as described in the above examples, for recording an event stream 3; and at least one processor coupled to a storage device and configured to convert the event stream 3 into a plurality of event frames 8, as described above. The storage device may further include instructions that, when executed by the processor, cause the computer-based system to merge information from the plurality of event frames 8 into a combined event frame 11, or to process information from the plurality of event frames 8, by performing a method according to one of the examples described above.
Accordingly, the processor may process images and/or data related to one or more functions described in the present disclosure. In some embodiments, the processor may include a central processing unit (CPU), an application-specific integrated circuit (ASIC), an application-specific instruction-set processor (ASIP), a graphics processing unit (GPU), a physics processing unit (PPU), a digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic device (PLD), a controller, a microcontroller unit, a reduced instruction-set computer (RISC), a microprocessor, or the like, or any combination thereof.
In some embodiments, the processor may also be used to control a display device of the computer-based system (not shown) to display any raw or encoded data received from the event camera 1. The display may include a liquid crystal display (LCD), a light-emitting diode (LED) based display, a flat panel display or curved screen, a rollable or flexible display panel, a cathode ray tube (CRT), or a combination thereof.
In some embodiments, the processor may also be used to control an input device (not shown) to receive user input. The input device may be a keyboard, touch screen, mouse, remote control, wearable device, etc. or a combination thereof. The input device may include a keyboard, a touch screen (e.g., with haptic feedback, etc.), voice input, eye-tracking input, a brain monitoring system, or any other similar input mechanism. Input information received through the input device may be transferred to the processor for further processing. Other types of input devices may include cursor control devices, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to a processor, etc., and for controlling cursor movement on a display device.
The storage device may be used to store data acquired directly from the event camera 1 and any other cameras; and/or data processed from the processor. In some embodiments, the storage device may store images received from the respective cameras and/or processed images received from the processor in different formats, including: bmp, jpg, png, tiff, gif, pcx, tga, exif, fpx, svg, psd, cdr, pcd, dxf, ufo, eps, ai, raw, WMF, etc., or any combination thereof. In some embodiments, the storage device may store an algorithm to be applied in the processor, such as the encoding algorithm described in any of the examples above. In some embodiments, the storage device may include a mass storage device, a removable storage device, volatile read-write memory, read-only memory (ROM), and the like, or any combination thereof. Exemplary mass storage devices may include magnetic disks, optical disks, solid state disks, and the like.
Although not explicitly shown, the storage device and the processor may also be implemented as part of a server in data connection with the event camera 1, or of a client device (e.g., an autonomous vehicle) connected to the event camera 1 via a network, which may send the event stream to the server via the network. The server may then run the encoding algorithm described above. In some embodiments, the network may be any type of wired or wireless network, or a combination thereof. For example, the network may include a wired network, a fiber-optic network, a telecommunications network, an intranet, the Internet, a local area network (LAN), a wide area network (WAN), a wireless local area network (WLAN), a metropolitan area network (MAN), a public switched telephone network (PSTN), a Bluetooth network, a ZigBee network, a near field communication (NFC) network, and the like, or any combination thereof.
Various aspects and implementations have been described herein in connection with various embodiments. However, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed subject matter, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. A computer program may be stored or distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems.
Reference signs used in the claims shall not be construed as limiting the scope.
Claims (27)
1. A method, the method comprising:
receiving an event stream (3) from an event camera (1); wherein the event stream (3) comprises a plurality of events (4), and each event comprises spatial information (5), a timestamp (7) and polarity information (6) associated with a brightness change;
-converting said event stream (3) received over a first period of time into a plurality of event frames (8) by: dividing the first period into a plurality of sub-periods (9), each sub-period (9) corresponding to an event frame (8); -assigning an event frame symbol (10) to each pixel of each event frame (8), each event frame symbol (10) representing the presence and polarity information (6) of an event (4) detected within a respective sub-period (9) and being assigned based on the spatial information (5) and a timestamp (7) of the detected event (4);
-generating a combined event frame (11) by merging a plurality of event frames (8), the merging of the plurality of event frames (8) comprising converting all the event frame symbols (10) assigned to corresponding pixels of the plurality of event frames (8) into a combined event frame symbol (12) of the corresponding pixels of the combined event frame (11).
2. The method according to claim 1, characterized in that the event frame symbols (10) are combined into a combined event frame symbol (12) according to the following formula:
CEF = k^(n−1)·EF_(i−(n−1)) + … + k^2·EF_(i−2) + k^1·EF_(i−1) + EF_i,
wherein CEF denotes the combined event frame symbol (12), EF denotes an event frame symbol (10), k is the number of event frame symbols (10), and n is the number of event frames (8) to be combined into the combined event frame (11).
3. The method according to claim 2, characterized in that the number of event frame symbols (10) is k = 3, representing the polarity information (6) as:
a positive polarity event (4), wherein an increase in brightness is detected at a respective pixel of the event frame (8);
-a negative polarity event (4), wherein a decrease in brightness is detected at a respective pixel of the event frame (8); or alternatively
-no event (4) is detected at a respective pixel of the event frame (8) within a respective sub-period (9);
the number of combined event frame symbols (12) is 3^n, where n is the number of event frames (8) to be combined into the combined event frame (11).
4. A method according to claim 3, characterized in that the number of event frames (8) to be combined into a combined event frame (11) is n = 5, the event frame symbols (10) being combined into a combined event frame symbol (12) according to the following formula:
CEF = 3^4·EF_(i−4) + 3^3·EF_(i−3) + 3^2·EF_(i−2) + 3^1·EF_(i−1) + EF_i
5. The method according to any one of claims 1 to 4, further comprising encoding a plurality of subsequent combined event frames (11) by:
collecting a plurality of combined event frames (11) in a video sequence (13) and encoding said video sequence (13) using a video coding standard, such as High Efficiency Video Coding (HEVC); or alternatively
Each combined event frame (11) is encoded into a single original image (14) using a lossless image compression codec, such as a Context-based Adaptive Lossless Image Codec (CALIC) or a Free Lossless Image Format (FLIF) codec.
6. A method, the method comprising:
-receiving (101) an event stream (3) from an event camera (1); the event stream (3) comprises a plurality of events (4), and each event (4) comprises spatial information (5), a timestamp (7) and polarity information (6) associated with a brightness change;
-converting (102) the event stream (3) received over a first period of time into a plurality of event frames (8) by: dividing the first period into a plurality of sub-periods (9), each sub-period (9) corresponding to an event frame (8); -merging the spatial information (5) and polarity information (6) of the detected event (4) into an event frame (8) based on the respective timestamp (7) of the event (4);
-storing (103) the spatial information (5) and the polarity information (6) of the event (4) from the plurality of event frames (8) in separate data structures optimized for the respective information type to be stored.
7. The method according to claim 6, wherein storing the spatial information (5) comprises merging spatial information (5) from the plurality of event frames (8) into a single event map image (15) stored as a set of image bit planes (25).
8. The method according to claim 7, characterized in that the spatial information (5) included in the event map image (15) is stored using a combination of:
-a binary map (18) comprising binary map symbols (19) assigned to each pixel of said binary map (18) to indicate the position in a corresponding pixel of any one of said plurality of event frames (8) at which at least one event (4) has occurred;
a category index (20) representing the number nrEv of events (4) occurring at respective pixels of said binary map (18) indicated by binary map symbols (19);
an event frame index (23) representing a single event frame (8) of the n event frames (8) in which any event (4) has occurred at a corresponding pixel of the binary map (18) indicated by a binary map symbol (19).
9. The method according to claim 8, characterized in that the binary map (18) is encoded using a template context model;
for each pixel of the binary map (18), determining a causal neighborhood (26) of a corresponding binary map symbol;
estimating the code length required to encode the binary map (18) for different numbers of neighbouring symbols collected from the causal neighborhood (26), thereby determining an optimal model order;
the optimal model order is encoded and then used to encode the binary map (18), traversed in raster scan order, wherein each binary map symbol (19) is encoded based on a distribution model associated with a context index calculated using the optimal-order neighborhood found within its respective causal neighborhood (26).
10. The method according to claim 8 or 9, characterized in that the category index (20) is represented using category index symbols (21) in an alphabet (28), each category index symbol (21) comprising p bit planes (25); encoding the category index (20) comprises encoding the category index symbols (21) bit plane by bit plane:
encoding the first bit plane (25) of the category index (20), denoted B_(p−1), using a template context model in which the context of each bit of a category index symbol (21) is determined by its corresponding causal neighborhood (26) on the first bit plane (25), traversed in raster scan order;
encoding each subsequent bit plane (25), denoted B_j, j = p−2, …, 0, wherein a context template (27) comprising the causal neighborhood (26) on the current bit plane B_j and at least one bit from its previous bit plane(s) is used to determine the respective context.
11. The method according to claim 10, characterized in that, for each subsequent bit plane (25) B_j having at least two previous bit planes (25), the respective context is determined using a context template (27) comprising bits from the previous two bit planes (25) B_(j+1) and B_(j+2).
12. The method according to any of claims 8 to 11, wherein the event frame index (23) comprises an event frame index symbol (24) representing a single event frame (8) of the plurality of event frames (8) in which any event (4) occurred at a respective pixel;
each event frame index (23) is encoded according to the category index (20) of the respective pixel, the category index (20) representing, for each category, the total number nrEv of events (4) detected at the respective pixel during the plurality of event frames (8);
dividing the alphabet (28) into n sub-alphabets (29), and associating a sub-alphabet (29) with each category;
remapping the event frame index symbols (24) to remapped symbols (30) of the respective sub-alphabets (29) based on the respective category indices (20);
-associating category symbols (22) based on the respective category indices (20).
13. The method of claim 12, wherein the sub-alphabets (29) include different numbers of symbols, and each sub-alphabet (29) is associated with a category based on a number of symbols required to remap the event frame index symbols (24) for the corresponding category.
14. The method according to claim 13, characterized in that each event frame index (23) comprises one of C(n, nrEv) symbol combinations, where n represents the number of event frames (8) and nrEv corresponds to said category index (20) representing the total number of events (4) occurring in said n event frames (8); each sub-alphabet (29) thus comprises all C(n, nrEv) symbols.
15. The method according to any of claims 12 to 14, characterized in that the remapped symbols (30) and the associated category symbols (22) are combined into n category vectors (31), the number n of category vectors (31) corresponding to the number n of event frames (8), the first n−1 category vectors (31) being encoded using an adaptive Markov model, and the last category vector (31) being associated with a deterministic case wherein an event (4) occurs in each event frame (8).
16. The method according to any one of claims 6 to 15, wherein storing the polarity information (6) from the plurality of event frames (8) comprises merging polarity information (6) from each event frame (8) into a polarity vector (16) comprising binary polarity symbols (17) determined based on a brightness increase or decrease during a detected event (4).
17. The method according to claim 16, characterized by concatenating the polarity symbols (17) from each polarity vector (16) of the plurality of event frames (8) into a concatenated polarity vector (32).
18. The method according to claim 16, characterized in that the polarity information (6) from the n event frames (8) is stored in n respective polarity vectors PV_i (16) by traversing the spatial information (5) of the event frames (8) bit plane by bit plane or event frame by event frame.
19. The method according to claim 18, characterized in that the n polarity vectors (16) PV_i are encoded sequentially, following the order of the respective event frames (8) EF_i, i = 1, …, n, received over the n sub-periods (9), each polarity vector (16) PV_i corresponding to the respective event frame (8) EF_i and being encoded by:
encoding the vector using an adaptive Markov model; or alternatively
traversing the event map image (15) using the binary map (18) and encoding each polarity symbol (17) using a template context model, wherein the context of each PV_i symbol is determined using its corresponding causal neighborhood (26) of polarity symbols (17) from the current event frame (8) EF_i, polarity symbols from at least one previous event frame (8) EF_(i−1) being used in the corresponding context template (27); for each event frame (8) EF_i having at least two previous event frames (8), the context of the current polarity symbol (17) of the current polarity vector (16) PV_i is determined using a context template (27) comprising polarity symbols from the previous two event frames (8) EF_(i−1) and EF_(i−2).
20. The method according to any of claims 6 to 19, characterized in that the spatial information (5) and the polarity information (6) from the n event frames (8) are encoded using a Sparse Coding Mode (SCM), wherein, if the total number N of events (4) detected within the n sub-periods (9) is below an event threshold, the SCM is activated and at least one of the spatial information (5) or the polarity information (6) is encoded using a lower-complexity coding method; otherwise, the SCM is not activated and at least one of the spatial information (5) or the polarity information (6) is encoded according to the method of any of claims 6 to 19.
21. The method according to claim 20, characterized in that, in response to determining that the total number of events (4) is below a first event threshold (N < ET₁), a sparse coding mode is activated (104) to encode the spatial information (5), as follows:
one bit is always used to indicate whether the sparse coding mode is on or off;
if the sparse coding mode is on, then:
N is encoded using log₂(ET₁) bits in step (105);
for each event (4) e_i, the spatial information (5) is encoded in step (106) as follows: x_i is encoded using log₂(H) bits; y_i is encoded using log₂(W) bits; and the event frame index (23) is encoded using log₂(n) bits in step (107), where H is the height of the event camera sensor (1) and W is the width of the event camera sensor (1).
22. The method according to claim 20 or 21, characterized in that, in response to determining that the total number of events (4) is below a second event threshold (N < ET₂), a sparse coding mode is activated in step (108) to encode the polarity information (6), as follows:
the polarity symbols (17) from each polarity vector (16) of the plurality of event frames (8) are concatenated in step (109) into a single concatenated polarity vector (32);
the concatenated polarity vector (32) is encoded using an order-0 Markov model in step (110).
23. The method according to any one of claims 6 to 22, characterized in that the number of event frames (8) is in the range of 1 to 32 and depends on the length Δ of the sub-period (9).
24. A computer-based system, comprising:
an event camera (1) for recording an event stream (3);
a processor coupled to the storage device for converting the event stream (3) into a plurality of event frames (8);
the storage device comprising instructions that, when executed by the processor, cause the computer-based system to incorporate information in the plurality of event frames (8) into a combined event frame (11) by performing the method according to any of claims 1 to 5.
25. A computer-based system, comprising:
an event camera (1) for recording an event stream (3);
a processor coupled to the storage device for converting the event stream (3) into a plurality of event frames (8);
the storage device includes instructions that, when executed by the processor, cause the computer-based system to process information from the plurality of event frames (8) according to the method of any one of claims 6 to 23.
26. A non-transitory computer readable medium having program instructions stored thereon, which when executed by a processor, cause the processor to perform the method according to any of claims 1 to 5.
27. A non-transitory computer readable medium having program instructions stored thereon, which when executed by a processor, cause the processor to perform the method according to any of claims 6 to 23.