WO2020115571A1 - A system and method for video compression using key frames and sums of absolute differences

Info

Publication number: WO2020115571A1
Application number: PCT/IB2019/055387
Authority: WO (WIPO, PCT)
Prior art keywords: frames, video, key, SAD, key frame
Other languages: French (fr)
Inventors: Anirudha Shrikant Kurhade, Shrikrishna Patwardhan
Original assignee: KPIT Technologies Limited
Application filed by KPIT Technologies Limited
Publication of WO2020115571A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/172: using adaptive coding characterised by the coding unit, the unit being an image region that is a picture, frame or field
    • H04N19/132: using adaptive coding characterised by sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N19/137: using adaptive coding characterised by motion inside a coding unit, e.g. average field, frame or block difference
    • H04N19/46: embedding additional information in the video signal during the compression process
    • H04N19/87: using pre-processing or post-processing involving scene cut or scene change detection in combination with video compression


Abstract

According to an embodiment, a system for compressing a video receives a series of video frames and identifies one or more key frames from the series of video frames based on a pitch. The system computes a sum of absolute differences (SAD) value associated with each key frame of the one or more key frames to determine a dynamic relevance index (DRI) value associated with each key frame of the one or more key frames. Further, the system generates a compressed video by identifying a set of video frames from the series of video frames, wherein each frame of the set of video frames is identified using the determined DRI value associated with at least one key frame of the one or more key frames.

Description

A SYSTEM AND METHOD FOR VIDEO COMPRESSION USING KEY FRAMES
AND SUMS OF ABSOLUTE DIFFERENCES
TECHNICAL FIELD
[0001] The present invention relates to a system and method for video compression and, in particular, to video compression based on relevant information in the video frames and the annotation thereof.
BACKGROUND
[0002] With the rapid development of internet technology, video communication has become very popular. Raw video takes up a large amount of memory, and its transmission requires significant time. Further, the transfer of uncompressed video over a network requires very high bandwidth. To overcome these issues, videos are compressed before transmission. Video compression is the technique of reducing and removing redundant video data so that a video consumes less memory and can be transmitted over a network with reduced bandwidth. Video compression plays an important role in many applications such as television, surveillance, operating theatres, unmanned aerial vehicles, etc.
[0003] In one exemplary application, a video compression technique is used to compress video of the surroundings captured by a camera mounted on an autonomous vehicle. Autonomous vehicles are fitted with systems that control the manoeuvring of the vehicle without requiring human intervention. Such vehicles use a variety of sensors, such as cameras, lidar, radar, a global positioning system (GPS), etc., for monitoring the surrounding environment. The control system in the vehicle uses this sensory information to identify an appropriate navigation path and obstacles in the path, and to cruise accordingly. In this case, the decision making is based on a pre-programmed rule set in the controller.
[0004] Nowadays, in many autonomous vehicles, supervised machine learning approaches such as deep learning are used for vehicle autonomy. Such a system is capable of making its own decisions based on the surrounding environment it encounters. These systems require human-annotated training data, which can be used to train the system by providing samples of desired performance. A camera-based sensor is used to capture video of the surrounding environment. The video data is then sent to an annotation tool for annotation of the different objects visible in the video. The annotated video is further used in the deep learning phase.
[0005] However, these videos often contain a large amount of repetitive data, i.e. frames that contribute little or no new information about the objects in front of the vehicle. This repetitive data increases the video file size, the processing time required for video annotation, and the training time required for the deep learning phase.
[0006] Currently, different techniques are used to reduce this redundant data in a video. In one technique, the FFmpeg tool is used for new scene detection. The tool incorporates a sum of absolute differences (SAD) calculation for every frame of the video to identify a new scene. However, since information changes gradually across consecutive frames, the SAD index between two consecutive frames cannot represent the change of relevant information effectively: in consecutive frames, the change of pixel intensities due to a relevant change is sometimes suppressed by changes due to noise.
[0007] In another technique, contemporary approaches for foreground-background separation, such as classical frame differencing, mean filtering and Gaussian models, are used for new scene detection when the camera is stationary. But when the camera is moving, these techniques do not give insightful results, since the background too is in relevant motion with respect to the moving camera.
[0008] Also, currently, the complete video is used in the annotation phase, in which all the frames are annotated and the redundant frames must be deleted manually. This increases the processing time for the video. Further, identifying and eliminating the unwanted frames manually is extremely tedious.
[0009] Therefore, there is a need for a system and method for intelligent video compression that reduces the redundant frames in a video. There is also a need for a system that dynamically reduces the redundant frames based on the relevant information, and for a system that reduces the processing time for annotation as well as the training time required for a neural network model during the deep learning phase.
OBJECTS OF THE DISCLOSURE
[0010] It is an object of the present disclosure to provide a system and method for video compression based on relevant information in the video frames.
[0011] Yet another object of the present disclosure is to provide a system and method for video compression by dynamically reducing the redundant frames of the video.
[0012] Yet another object of the present disclosure is to provide a system and method for video compression for selecting frames from video data based on the relevance index.
[0013] Yet another object of the present disclosure is to provide a system and method for video compression without affecting the frame quality and crucial information of the video.
[0014] Yet another object of the present disclosure is to provide annotation of the compressed video.
[0015] Yet another object of the present disclosure is to reduce the processing time required for annotation.
[0016] Yet another object of the present disclosure is to reduce the training time required for a neural network model during the deep learning phase.
[0017] Yet another object of the present disclosure is to reduce the overall cost of the system.
[0018] Still another object of the present disclosure is to provide a fast and adaptive video compression.
SUMMARY
[0019] This summary is provided to introduce simplified concepts of a system and method for video compression, which are further described below in the Detailed Description. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended for use in determining/limiting the scope of the claimed subject matter.
[0020] The present invention relates to a system and method for video compression and, in particular, to video compression based on relevant information in the video frames.
[0021] An aspect of the present disclosure relates to a system for compressing a video, the system comprising: an input unit comprising an image sensor for capturing the video comprising a series of video frames; and a processing unit comprising a processor coupled with a memory, the memory storing instructions executable by the processor to: receive the series of video frames and identify one or more key frames from the series of video frames based on a pitch; compute a sum of absolute differences (SAD) value associated with each key frame of the one or more key frames to determine a dynamic relevance index (DRI) value associated with each key frame of the one or more key frames; and generate a compressed video by identifying a set of video frames from the series of video frames, wherein each frame of the set of video frames is identified using the determined DRI value associated with at least one key frame of the one or more key frames.
[0022] In an embodiment, on computing the SAD value associated with each key frame of the one or more key frames, the processor normalizes the SAD value of the key frame by dividing the SAD value by the second highest SAD value determined from the SAD values associated with the one or more key frames.
[0023] In an embodiment, on normalization of the SAD value of each key frame of the one or more key frames, the processor determines the DRI value of each key frame based on any or a combination of the SAD value of the key frame, the median of the SAD values of the one or more key frames, and the standard deviation of the SAD values of the one or more key frames.
[0024] In an embodiment, the system is implemented in a vehicle for compressing the video captured by the image sensor and pertaining to the surroundings of the vehicle.
[0025] In an embodiment, on identifying the one or more key frames, the processor performs any or a combination of blurring, contrast enhancement and pixel scaling operations on each key frame of the one or more key frames.
[0026] In an embodiment, the processor annotates the set of video frames of the compressed video while tracking relevant objects in the original video.
[0027] In an embodiment, the processor dynamically reduces a plurality of redundant frames from the series of frames based on relevant information to identify the set of video frames.
[0028] Another aspect of the present disclosure relates to a method for compressing a video, carried out according to instructions stored in a computer, comprising: receiving a video comprising a series of video frames captured by an image sensor and identifying one or more key frames from the series of video frames based on a pitch; computing a sum of absolute differences (SAD) value associated with each key frame of the one or more key frames to determine a dynamic relevance index (DRI) value associated with each key frame of the one or more key frames; and generating a compressed video by identifying a set of video frames from the series of video frames, wherein each frame of the set of video frames is identified using the determined DRI value associated with at least one key frame of the one or more key frames.
[0029] Various objects, features, aspects and advantages of the present disclosure will become more apparent from the following detailed description of preferred embodiments, along with the accompanying drawing figures in which like numerals represent like features.
[0030] Within the scope of this application it is expressly envisaged that the various aspects, embodiments, examples and alternatives set out in the preceding paragraphs, in the claims and/or in the following description and drawings, and in particular the individual features thereof, may be taken independently or in any combination. Features described in connection with one embodiment are applicable to all embodiments, unless such features are incompatible.
BRIEF DESCRIPTION OF THE DRAWINGS
[0031] The accompanying drawings are included to provide a further understanding of the present invention and are incorporated in and constitute a part of this specification. The drawings illustrate exemplary embodiments of the present invention and, together with the description, serve to explain the principles of the present invention.
[0032] FIG. 1 illustrates architecture of a system for video compression to illustrate its overall working in accordance with an embodiment of the present disclosure.
[0033] FIG. 2 illustrates exemplary modules of a processing unit in accordance with an embodiment of the present invention.
[0034] FIG. 3 illustrates an exemplary graph for DRI computation for different values of the factor in accordance with an embodiment of the present invention.
[0035] FIG. 4 illustrates an exemplary flowchart of video compression in accordance with an embodiment of the present invention.
DETAILED DESCRIPTION
[0036] The following is a detailed description of embodiments of the disclosure depicted in the accompanying drawings. The embodiments are in such detail as to clearly communicate the disclosure. However, the amount of detail offered is not intended to limit the anticipated variations of embodiments; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present disclosure as defined by the appended claims.
[0037] In the following description, numerous specific details are set forth in order to provide a thorough understanding of embodiments of the present invention. It will be apparent to one skilled in the art that embodiments of the present invention may be practiced without some of these specific details.
[0038] Embodiments of the present invention include various steps, which will be described below. The steps may be performed by hardware components or may be embodied in machine-executable instructions, which may be used to cause a general-purpose or special- purpose processor programmed with the instructions to perform the steps. Alternatively, steps may be performed by a combination of hardware, software, and firmware and/or by human operators.
[0039] Various methods described herein may be practiced by combining one or more machine-readable storage media containing the code according to the present invention with appropriate standard computer hardware to execute the code contained therein. An apparatus for practicing various embodiments of the present invention may involve one or more computers (or one or more processors within a single computer) and storage systems containing or having network access to computer program(s) coded in accordance with various methods described herein, and the method steps of the invention could be accomplished by modules, routines, subroutines, or subparts of a computer program product.
[0040] If the specification states a component or feature "may", "can", "could", or "might" be included or have a characteristic, that particular component or feature is not required to be included or have the characteristic.
[0041] As used in the description herein and throughout the claims that follow, the meaning of "a," "an," and "the" includes plural reference unless the context clearly dictates otherwise. Also, as used in the description herein, the meaning of "in" includes "in" and "on" unless the context clearly dictates otherwise.
[0042] Exemplary embodiments will now be described more fully hereinafter with reference to the accompanying drawings, in which exemplary embodiments are shown. These exemplary embodiments are provided only for illustrative purposes and so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those of ordinary skill in the art. The invention disclosed may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Various modifications will be clear to persons skilled in the art. The general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the invention. Moreover, all statements herein reciting embodiments of the invention, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future (i.e., any elements developed that perform the same function, regardless of structure). Also, the terminology and phraseology used is for the purpose of describing exemplary embodiments and should not be considered limiting. Thus, the present invention is to be accorded the widest scope encompassing numerous alternatives, modifications and equivalents consistent with the principles and features disclosed. For purpose of clarity, details relating to technical material that is known in the technical fields related to the invention have not been described in detail so as not to unnecessarily obscure the present invention.
[0043] Thus, for example, it will be appreciated by those of ordinary skill in the art that the diagrams, schematics, illustrations, and the like represent conceptual views or processes illustrating systems and methods embodying this invention. The functions of the various elements shown in the figures may be provided using dedicated hardware as well as hardware capable of executing associated software. Similarly, any switches shown in the figures are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the entity implementing this invention. Those of ordinary skill in the art further understand that the exemplary hardware, software, processes, methods, and/or operating systems described herein are for illustrative purposes and, thus, are not intended to be limited to any particular named element.
[0044] Embodiments of the present invention may be provided as a computer program product, which may include a machine-readable storage medium tangibly embodying thereon instructions, which may be used to program a computer (or other electronic devices) to perform a process. The term "machine-readable storage medium" or "computer-readable storage medium" includes, but is not limited to, fixed (hard) drives, magnetic tape, floppy diskettes, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, and semiconductor memories, such as ROMs, random access memories (RAMs), programmable read-only memories (PROMs), erasable PROMs (EPROMs), electrically erasable PROMs (EEPROMs), flash memory, magnetic or optical cards, or other types of media/machine-readable media suitable for storing electronic instructions (e.g., computer programming code, such as software or firmware). A machine-readable medium may include a non-transitory medium in which data may be stored and that does not include carrier waves and/or transitory electronic signals propagating wirelessly or over wired connections. Examples of a non-transitory medium may include, but are not limited to, a magnetic disk or tape, optical storage media such as compact disk (CD) or digital versatile disk (DVD), flash memory, or memory devices. A computer program product may include code and/or machine-executable instructions that may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, etc.
[0045] Furthermore, embodiments may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks (e.g., a computer-program product) may be stored in a machine-readable medium. A processor(s) may perform the necessary tasks.
[0046] Systems depicted in some of the figures may be provided in various configurations. In some embodiments, the systems may be configured as a distributed system where one or more components of the system are distributed across one or more networks in a cloud computing system.
[0047] Each of the appended claims defines a separate invention, which for infringement purposes is recognized as including equivalents to the various elements or limitations specified in the claims. Depending on the context, all references below to the "invention" may in some cases refer to certain specific embodiments only. In other cases it will be recognized that references to the "invention" will refer to subject matter recited in one or more, but not necessarily all, of the claims.
[0048] All methods described herein may be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”) provided with respect to certain embodiments herein is intended merely to better illuminate the invention and does not pose a limitation on the scope of the invention otherwise claimed. No language in the specification should be construed as indicating any non-claimed element essential to the practice of the invention.
[0049] Various terms as used herein are shown below. To the extent a term used in a claim is not defined below, it should be given the broadest definition persons in the pertinent art have given that term as reflected in printed publications and issued patents at the time of filing.
[0050] This summary is provided to introduce simplified concepts of a system and method for video compression, which are further described below in the Detailed Description. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended for use in determining/limiting the scope of the claimed subject matter.
[0051] The present invention relates to a system and method for video compression and, in particular, to video compression based on relevant information in the video frames.
[0052] An aspect of the present disclosure relates to a system for compressing a video, the system comprising: an input unit comprising an image sensor for capturing the video comprising a series of video frames; and a processing unit comprising a processor coupled with a memory, the memory storing instructions executable by the processor to: receive the series of video frames and identify one or more key frames from the series of video frames based on a pitch; compute a sum of absolute differences (SAD) value associated with each key frame of the one or more key frames to determine a dynamic relevance index (DRI) value associated with each key frame of the one or more key frames; and generate a compressed video by identifying a set of video frames from the series of video frames, wherein each frame of the set of video frames is identified using the determined DRI value associated with at least one key frame of the one or more key frames.
[0053] In an embodiment, on computing the SAD value associated with each key frame of the one or more key frames, the processor normalizes the SAD value of the key frame by dividing the SAD value by the second highest SAD value determined from the SAD values associated with the one or more key frames.
[0054] In an embodiment, on normalization of the SAD value of each key frame of the one or more key frames, the processor determines the DRI value of each key frame based on any or a combination of the SAD value of the key frame, the median of the SAD values of the one or more key frames, and the standard deviation of the SAD values of the one or more key frames.
[0055] In an embodiment, the system is implemented in a vehicle for compressing the video captured by the image sensor and pertaining to the surroundings of the vehicle.
[0056] In an embodiment, on identifying the one or more key frames, the processor performs any or a combination of blurring, contrast enhancement and pixel scaling operations on each key frame of the one or more key frames.
[0057] In an embodiment, the processor annotates the set of video frames of the compressed video while tracking relevant objects in the original video.
[0058] In an embodiment, the processor dynamically reduces a plurality of redundant frames from the series of frames based on relevant information to identify the set of video frames.
[0059] Another aspect of the present disclosure relates to a method for compressing a video, carried out according to instructions stored in a computer, comprising: receiving a video comprising a series of video frames captured by an image sensor and identifying one or more key frames from the series of video frames based on a pitch; computing a sum of absolute differences (SAD) value associated with each key frame of the one or more key frames to determine a dynamic relevance index (DRI) value associated with each key frame of the one or more key frames; and generating a compressed video by identifying a set of video frames from the series of video frames, wherein each frame of the set of video frames is identified using the determined DRI value associated with at least one key frame of the one or more key frames.
[0060] In one exemplary embodiment, the system and method of the present invention are used for an automobile application. The present invention is used to compress videos captured by any vision-based detection unit mounted on a vehicle. The system of the present invention compresses the video by eliminating redundant data from it. The annotated compressed video may further be transferred to train a neural network model. The trained neural network model can be transferred to the vehicle, which then executes automated functionalities including, but not limited to, collision avoidance, automated braking, etc.
[0061] FIG. 1 illustrates architecture of a system for video compression to illustrate its overall working in accordance with an embodiment of the present disclosure.
[0062] According to an embodiment, a system for video compression comprises an input unit 102, a processing unit 104 and an annotation tool 106. The input unit 102 may comprise an image sensor or a camera configured to capture a video comprising a series of video frames. In an implementation, the input unit 102 may be configured in a vehicle (interchangeably referred to as the host vehicle, hereinafter) to capture video of the surroundings of the vehicle. For example, the image sensor or the camera may be placed below the rear-view mirror of the host vehicle. The processing unit 104 may comprise a processor and a memory, and/or may be integrated with existing systems and controls of the host vehicle to form an advanced driver assistance system (ADAS), or to augment an existing ADAS. For instance, signals generated by the processing unit 104 may be sent to the engine control unit (ECU) of the host vehicle and may aid in emergency braking of the host vehicle. Further, the annotation tool 106 may be coupled with an output unit that may be a display device or any other audio-visual device.
[0063] According to an embodiment, the captured surrounding video may be transferred to the processor 103 through any means including, but not limited to, wired transfer, wireless transfer, etc. The processor could be any type of device capable of processing electronic instructions, including but not limited to a host computer, application-specific integrated circuits (ASICs), a local server, a cloud server, a smart device, etc.
[0064] In an embodiment, during key frame identification 108, the processing unit 104 receives the series of video frames and identifies one or more key frames from the series of video frames based on a pitch. Further, the processing unit 104 may perform any or a combination of blurring, contrast enhancement and pixel scaling operations on each key frame of the one or more key frames.
[0065] In an embodiment, during key frame processing 110, the processing unit 104 computes a sum of absolute differences (SAD) value associated with each key frame of the one or more key frames to determine a dynamic relevance index (DRI) value associated with each key frame of the one or more key frames. On computing the SAD value associated with each key frame, the processing unit 104 may normalize the SAD value of the key frame by dividing it by the second highest SAD value determined from the SAD values associated with the one or more key frames. Further, on normalization of the SAD value of each key frame, the processing unit 104 may determine the DRI value of each key frame based on any or a combination of the SAD value of the key frame, the median of the SAD values of the one or more key frames, and the standard deviation of the SAD values of the one or more key frames.
[0066] In an embodiment, during compressed video generation 112, the processing unit 104 generates a compressed video by identifying a set of video frames from the series of video frames, wherein each frame of the set of video frames is identified using the determined DRI value associated with at least one key frame of the one or more key frames.
[0067] In an embodiment, the annotation tool 106 annotates the set of video frames of the compressed video while tracking relevant objects in the original video.
[0068] FIG. 2 illustrates exemplary modules of a processing unit 104 in accordance with an embodiment of the present disclosure.
[0069] In an aspect, the processing unit 104 may comprise one or more processor(s) 202. The one or more processor(s) 202 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, logic circuitries, and/or any devices that manipulate data based on operational instructions. Among other capabilities, the one or more processor(s) 202 are configured to fetch and execute computer-readable instructions stored in a memory 206 of the processing unit 104. The memory 206 may store one or more computer-readable instructions or routines, which may be fetched and executed to create or share the data units over a network service. The memory 206 may comprise any non-transitory storage device including, for example, volatile memory such as RAM, or non-volatile memory such as EPROM, flash memory, and the like.
[0070] The processing unit 104 may also comprise an interface(s) 204. The interface(s) 204 may comprise a variety of interfaces, for example, interfaces for data input and output devices, referred to as I/O devices, storage devices, and the like. The interface(s) 204 may facilitate communication of the processing unit 104 with various devices coupled to the processing unit 104, such as the input unit 102 and the output unit 106. The interface(s) 204 may also provide a communication pathway for one or more components of the processing unit 104. Examples of such components include, but are not limited to, processing module(s) 208 and database 210.
[0071] The processing module(s) 208 may be implemented as a combination of hardware and programming (for example, programmable instructions) to implement one or more functionalities of the processing module(s) 208. In examples described herein, such combinations of hardware and programming may be implemented in several different ways. For example, the programming for the processing module(s) 208 may be processor-executable instructions stored on a non-transitory machine-readable storage medium, and the hardware for the processing module(s) 208 may comprise a processing resource (for example, one or more processors) to execute such instructions. In the present examples, the machine-readable storage medium may store instructions that, when executed by the processing resource, implement the processing module(s) 208. In such examples, the processing unit 104 may comprise the machine-readable storage medium storing the instructions and the processing resource to execute the instructions, or the machine-readable storage medium may be separate from but accessible to the processing unit 104 and the processing resource. In other examples, the processing module(s) 208 may be implemented by electronic circuitry.
[0072] The database 210 may comprise data that is either stored or generated as a result of functionalities implemented by any of the components of the processing module(s) 208.
[0073] In an exemplary embodiment, the processing module(s) 208 may comprise a key frame identification module 212, a key frame processing module 214, a compressed video generation module 216, a video annotation module 218 and other modules 220.
[0074] It would be appreciated that the modules being described are only exemplary modules and any other module or sub-module may be included as part of the system 100 or the processing unit 104. These modules too may be merged or divided into super-modules or sub-modules as may be configured.
Key Frame Identification Module 212
[0075] In an embodiment, the key frame identification module 212 receives a video comprising a series of video frames from one or more image sensors of an input unit, which may be implemented in a vehicle for compressing video pertaining to the surroundings of the vehicle. Further, the key frame identification module 212 identifies one or more key frames from the series of video frames based on a pitch.
[0076] According to an embodiment, the video comprises a series of frames, each at a time instance (t). Frames separated by 'pitch' time-steps, starting from the first frame, are called key frames. The pitch is the difference between two frame ids and is kept constant for the determination of key frames. The template for each key frame is the frame from the input video at time instance (t - pitch) relative to the respective key frame, except for the first key frame, whose template is an image with all pixel values equal to zero.
[0077] According to an example, the pitch is selected to be 12. It is to be noted that this pitch value is exemplary only and may be changed as per the requirements of the application. The entire video is segmented considering this pitch value, so that the resulting output of the module 212 comprises key frames separated by the pitch value, i.e. 12.
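By way of illustration only, the following is a minimal Python sketch of this pitch-based key frame identification, assuming the decoded video is available as a list of grayscale NumPy arrays. The function name identify_key_frames and its signature are illustrative and do not appear in the patent.

```python
import numpy as np

def identify_key_frames(frames, pitch=12):
    """Return (key_frame, template) pairs for a list of grayscale frames.

    Key frames are the frames at indices 0, pitch, 2*pitch, ...; the template
    of each key frame is the frame `pitch` steps earlier, except for the first
    key frame, whose template is an all-zeros image of the same shape.
    """
    pairs = []
    for i in range(0, len(frames), pitch):
        if i == 0:
            template = np.zeros_like(frames[0])  # all-zero template for the first key frame
        else:
            template = frames[i - pitch]
        pairs.append((frames[i], template))
    return pairs
```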
Key Frame Processing Module 214
[0078] In an embodiment, the key frame processing module 214 computes a sum of absolute differences (SAD) value associated with each key frame of the one or more key frames to determine a dynamic relevance index (DRI) value associated with each key frame of the one or more key frames.
[0079] According to an embodiment, the key frame processing module 214 processes the identified key frames by blurring, contrast enhancement and pixel scaling of each key frame. Further, those skilled in the art would appreciate that SAD is a standard method in image processing for calculating how closely an image matches a template. In the context of the present example, the SAD values are computed between the identified key frames and their respective templates. According to an example, if the pitch is set to 1, the SAD does not give a good estimate of whether the objects in the frames have moved significantly, whether new objects have appeared, or whether some objects have left the frame: with a pitch of 1, there is an arbitrarily small and gradual change between frames and hence the SAD values are almost identical. To overcome this drawback, the pitch may be selected to be greater than 1. A pitch greater than 1 is enough for obtaining a scene change index that is robust to noise; however, the scene change index is still static. Therefore, for compressing videos dynamically based on whether a portion of the video contributes new information or not, it is crucial to define a DRI.
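As an illustrative sketch only, the preprocessing and SAD computation could look as follows in Python with OpenCV. The blur kernel size and the contrast gain are assumed values chosen for this example; the patent does not specify them, and equivalent blurring, contrast enhancement and pixel scaling operations could be substituted.

```python
import cv2
import numpy as np

def preprocess(frame):
    """Blur, enhance contrast and scale pixel values of a grayscale uint8 frame."""
    blurred = cv2.GaussianBlur(frame, (5, 5), 0)          # suppress pixel-level noise
    contrasted = cv2.convertScaleAbs(blurred, alpha=1.3)  # simple linear contrast gain
    return contrasted.astype(np.float32) / 255.0          # pixel scaling to [0, 1]

def sad(key_frame, template):
    """Sum of absolute differences between a preprocessed key frame and its template."""
    return float(np.abs(preprocess(key_frame) - preprocess(template)).sum())
```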
[0080] According to an embodiment, a DRI is calculated for the identified key frames. Once the SAD has been calculated for all key frames, the SAD values are normalized by dividing every SAD value by the second highest SAD value. The first frame always has the highest SAD value in a typical video because its template is an all-zeros matrix; this value is very high and would skew the distribution of normalized SAD values towards the lower end of the spectrum, which is undesirable. Normalizing by the second highest SAD value avoids this.
[0081] In an embodiment, on normalization of the SAD value of each key frame of the one or more key frames, the processor determines the DRI value of each key frame based on any or a combination of the SAD value of said key frame, the median of the SAD values of the one or more key frames, and the standard deviation of the SAD values of the one or more key frames. After normalization, the median (medianSAD) and standard deviation (stdDev) of the normalized SAD values are computed. Further, the key frames may be divided into at least three sets based on their SAD values, as below (a code sketch of this rule follows the list):
• For frames with normalized SAD values less than (medianSAD - 2*stdDev), the DRI of such key frames is set to zero.
• For frames with normalized SAD values between (medianSAD - 2*stdDev) and (medianSAD + stdDev), the DRI of the key frames is calculated as:
DRI = SAD + factor
The factor may be considered to be zero. However, if the overall compression rate needs to be scaled globally, the factor may be changed to a value greater than zero to decrease the compression, or to a value less than zero to increase the compression.
• For frames with normalized SAD values above (medianSAD + stdDev), the DRI of the key frames may be set to 1.
Further, all DRI values are clipped between 0 and 1. Hence, if a DRI value is less than 0, it is set to 0, and if a DRI value is greater than 1, it is set to 1.
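A minimal sketch of this normalization and piecewise DRI rule, assuming at least two key frames and vectorized NumPy operations (the name compute_dri is illustrative):

```python
import numpy as np

def compute_dri(sad_values, factor=0.0):
    """Map raw per-key-frame SAD values to DRI values clipped to [0, 1]."""
    sads = np.asarray(sad_values, dtype=np.float64)
    # Normalize by the second highest SAD: the first key frame's SAD is inflated
    # by its all-zeros template and would otherwise skew the distribution.
    norm = sads / np.sort(sads)[-2]

    median_sad = np.median(norm)
    std_dev = np.std(norm)

    dri = np.empty_like(norm)
    low = norm < (median_sad - 2 * std_dev)    # near-redundant key frames
    high = norm > (median_sad + std_dev)       # highly dynamic key frames
    mid = ~(low | high)
    dri[low] = 0.0
    dri[high] = 1.0
    dri[mid] = norm[mid] + factor              # DRI = SAD + factor
    return np.clip(dri, 0.0, 1.0)              # clip all DRI values to [0, 1]
```

A factor above zero shifts the mid-range DRI values up (less compression), while a negative factor shifts them down (more compression), matching the dashed curves of FIG. 3 described below.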
[0082] FIG. 3 illustrates an exemplary graph of DRI computation for different values of the factor according to an embodiment of the present disclosure. As illustrated, the graph represents the following:
• Solid black line is DRI calculation with factor = 0
• Dashed line 1 is DRI calculation with factor > 0 (reduced compression)
• Dashed line 2 is DRI calculation with factor < 0 (increased compression)
[0083] According to an embodiment, changing the pitch size does not by itself change the compression factor. If, for example, the pitch size is increased from 12 to 24, the increased pitch size does not necessarily imply increased compression. If, with pitch size 12, the SAD between frames 0 and 12 is 0.5 and that between frames 12 and 24 is also 0.5, then 12 frames are selected between frames 0 and 24 (50% of the frames). If, with pitch size 24, the SAD between frames 0 and 24 is 0.7, approximately 17 frames are selected; if the SAD is 0.2, approximately 5 frames are selected. There is thus no fixed relation between SAD and pitch size. Hence, in order to avoid missing any crucial frame, the DRI of each key frame is computed.
Compressed Video Generation Module 216
[0084] In an embodiment, the compressed video generation module 216 generates a compressed video by identifying a set of video frames from the series of video frames, wherein each frame of the set of video frames is identified using the determined DRI value associated with at least one key frame of the one or more key frames. According to an example, once the DRI values are computed for each key frame, some frames from the set of (pitch - 1) previous frames are selected, such that the fraction of selected frames from the previous (pitch - 1) frames is proportional to the DRI of the key frame. The DRI signifies the fraction of frames to be selected from the (pitch - 1) frames that appear before the key frame. This is performed for all key frames except the first frame, for which no previous frames are selected since they do not exist.
[0085] To synthesize the compressed video, all remaining frames are dropped from the input video except the selected frames and the key frames. According to an example, if the pitch is set to 2 and an input video has frames [1, 2, 3, 4, 5, 6], then the key frames selected according to the pitch are [1, 3, 5]. Suppose the DRIs for key frames [1, 3, 5] are [1.0, 0.0, 1.0] respectively. In this case, a number of frames proportional to the DRI is selected between two key frames. For example, for key frame 3 the DRI is 0.0, which signifies that all frames between key frames 1 and 3 are eliminated; hence, frame 2 is eliminated. Similarly, the DRI for key frame 5 is 1.0, which signifies that all frames between key frames 3 and 5 are selected; in other words, frame 4 will be part of the compressed video. Accordingly, the compressed video contains the frames [1, 3, 4, 5]. The above-mentioned technique therefore helps in identifying redundant frames and eliminating them from the compressed video.
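The frame-selection step can be sketched as below. The patent fixes only the number of frames selected before each key frame (proportional to its DRI); taking the frames immediately preceding the key frame is an assumption of this sketch.

```python
def select_frames(num_frames, pitch, dri_values):
    """Return sorted 0-based indices of the frames kept in the compressed video."""
    keep = set()
    for key_idx, dri in zip(range(0, num_frames, pitch), dri_values):
        keep.add(key_idx)                        # key frames are always kept
        if key_idx == 0:
            continue                             # the first key frame has no preceding frames
        n_select = round(dri * (pitch - 1))      # fraction of preceding frames ~ DRI
        for offset in range(1, n_select + 1):
            keep.add(key_idx - offset)
    return sorted(keep)

# Worked example from the text: 6 frames, pitch 2, DRIs [1.0, 0.0, 1.0] for
# key frames [1, 3, 5] (1-based) -> compressed video [1, 3, 4, 5].
print([i + 1 for i in select_frames(6, 2, [1.0, 0.0, 1.0])])   # prints [1, 3, 4, 5]
```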
Video Annotation Module 218
[0086] In an exemplary embodiment, the video annotation module 218 annotates the set of video frames of the compressed video while tracking relevant objects in the original video. Those skilled in the art would appreciate that the video annotation module 218 may be part of a separate annotation tool; however, for ease of explanation, the video annotation module 218 is considered part of the processing unit 104. According to an implementation, the compressed video is further sent to the annotation tool by any means including, but not limited to, wired transfer, wireless transfer, etc. However, if the compressed video is used directly in the annotation phase with standard object tracking algorithms like KCF or Median Flow, these algorithms are not robust to considerably large displacements of the objects across frames. In an exemplary embodiment, if a 4 × 4 pixel box moves from location (0, 0) to location (6, 6), the tracking algorithm might not track the object at all, or might give an erroneous track. This might happen in the compressed video if two intermediate frames are dropped in which the object was at locations (2, 2) and (4, 4) in the said input video. In this case, the standard tracking algorithms give superior performance on the said input video compared with the said compressed video. In order to overcome these drawbacks, the entire video is used for object tracking, but the video annotation module 218 displays only the frames of the compressed video for annotation. The video annotation module 218 assesses the video frame-by-frame and sends every frame for object tracking, but it displays only the frames of the compressed video for annotation purposes. The dropped frames are used by the tracking algorithm to interpolate and track the objects properly.
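Conceptually, this annotation-time protocol amounts to the following sketch: every frame of the original video is fed to the object tracker so that it can interpolate across dropped frames, while only the frames of the compressed video are displayed for annotation. The track and display callbacks are placeholders for any concrete tracker (e.g. OpenCV's KCF) and annotation UI; their interfaces here are assumptions of this sketch.

```python
def annotate_compressed(frames, selected_indices, track, display):
    """Track on every frame; display only compressed-video frames for annotation.

    `track(frame)` is assumed to update the tracker state and return the current
    object boxes; `display(frame, boxes)` is assumed to show a frame to the
    annotator. Both are hypothetical interfaces for this sketch.
    """
    selected = set(selected_indices)
    for idx, frame in enumerate(frames):
        boxes = track(frame)            # every frame goes to the tracker
        if idx in selected:
            display(frame, boxes)       # only compressed-video frames are annotated
```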
Supplementary Modules 220
[0087] In an aspect, the supplementary modules 220 implement functionalities that supplement applications or functions performed by the system 100, the processing unit 104 or the processing module(s) 208. According to an exemplary implementation, the supplementary module(s) 220 may receive the annotated compressed video from the video annotation module 218 to train a neural network model. Further, the trained neural network model can be utilized to execute automated functionalities of a vehicle including, but not limited to, collision avoidance, autonomous parking, automated braking, etc.
[0088] Although the proposed system has been elaborated above to include all the main modules, it is entirely possible that actual implementations include only a part of the proposed modules, a combination thereof, or a division thereof into sub-modules, in various combinations across multiple devices that may be operatively coupled with each other, including in the cloud. Further, the modules may be configured in any sequence to achieve the objectives elaborated. Also, it may be appreciated that the proposed system may be configured in a computing device or across a plurality of computing devices operatively connected with each other, wherein a computing device may be any of a computer, a smart device, an Internet-enabled mobile device and the like. Therefore, all possible modifications, implementations and embodiments of where and how the proposed system is configured are well within the scope of the present invention.
[0089] FIG. 4 illustrates an exemplary flowchart of video compression in accordance with an embodiment of the present invention.
[0090] In an aspect, the proposed method may be described in general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types. The method can also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, computer executable instructions may be located in both local and remote computer storage media, including memory storage devices.
[0091] The order in which the method is described is not intended to be construed as a limitation, and any number of the described method blocks may be combined in any order to implement the method or alternate methods. Additionally, individual blocks may be deleted from the method without departing from the spirit and scope of the subject matter described herein. Furthermore, the method may be implemented in any suitable hardware, software, firmware, or combination thereof. However, for ease of explanation, in the embodiments described below, the method may be considered to be implemented in the above described system.
[0092] In an embodiment, a method for compressing a video comprises, at block 402, receiving a video comprising a series of video frames captured by an image sensor and identifying one or more key frames from the series of video frames based on a pitch. The method further comprises, at block 404, computing a SAD value associated with each key frame of the one or more key frames to determine a DRI value associated with each key frame of the one or more key frames. Furthermore, the method comprises, at block 406, generating a compressed video by identifying a set of video frames from the series of video frames, wherein each frame of the set of video frames is identified using the determined DRI value associated with at least one key frame of the one or more key frames.
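By way of illustration only, the following Python sketch shows one possible realization of blocks 402-406, assuming grayscale frames held as NumPy arrays. The exact DRI formula of the invention is not reproduced here; the combination of the normalized SAD with the median and standard deviation below, and the keep_threshold parameter, are placeholders consistent with the statistics named in claims 2 and 3.

import numpy as np

def sad(a, b):
    # Sum of absolute differences between two equally sized grayscale frames.
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def compress(frames, pitch=5, keep_threshold=0.5):
    # Block 402: pick every `pitch`-th frame of the input as a key frame.
    key_idx = list(range(0, len(frames), pitch))
    # Block 404: SAD between consecutive key frames.
    sads = np.array([sad(frames[i], frames[j])
                     for i, j in zip(key_idx, key_idx[1:])], dtype=np.float64)
    if len(sads) < 2:
        return list(range(len(frames)))   # too short to analyze; keep all
    # Claim 2: normalize each SAD by the second highest SAD value.
    second_highest = np.sort(sads)[-2]
    norm = sads / max(second_highest, 1.0)
    # Claim 3 (placeholder form): relate each normalized SAD to the median
    # and standard deviation of all normalized SADs to obtain a DRI value.
    dri = (norm - np.median(norm)) / (np.std(norm) + 1e-9)
    # Block 406: keep the frames that follow a key frame whose DRI indicates
    # enough dynamic content; drop the rest as redundant.
    kept = []
    for k, (start, stop) in enumerate(zip(key_idx, key_idx[1:])):
        if dri[k] >= keep_threshold:
            kept.extend(range(start, stop))
    return kept

In this sketch, the segment between two key frames is kept or dropped as a whole; the actual selection granularity and DRI computation are governed by the embodiments described above.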
Exemplary Test Results
[0093] In an exemplary test, 26 videos, recorded on streets and highways, were used for testing the performance of the system. The videos were saved in the AVI format for calculation of the compression percentage. The factor in the DRI computation is assumed to be 0 for this analysis.
Compression Results:
[0094] Different videos were compressed by different amounts by the system, depending on the total amount of dynamic content in the videos. In the best case, a video was compressed by 71%, and in the worst case, a video was compressed by 37%.
[0095] Compression results are shown in the following table:
[Table: per-video compression results; reproduced as image imgf000022_0001 in the original publication.]
[0096] The compression percentage measured in memory differs from that measured in number of frames because the memory overhead required by the AVI format remains unchanged in both cases, compressed and uncompressed. Thus, it can be concluded that the system was capable of compressing videos by approximately 50%.
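A toy calculation, with all numbers invented for illustration, shows this container-overhead effect:

# Invented numbers; only the relationship between the two figures matters.
frames_in, frames_out = 1000, 500             # 50% of frames dropped
bytes_per_frame, avi_overhead = 20_000, 500_000

size_in = frames_in * bytes_per_frame + avi_overhead    # 20,500,000 bytes
size_out = frames_out * bytes_per_frame + avi_overhead  # 10,500,000 bytes

frame_compression = 100 * (1 - frames_out / frames_in)  # 50.0%
memory_compression = 100 * (1 - size_out / size_in)     # about 48.8%

Because the fixed overhead appears in both file sizes, the memory-based figure always trails the frame-count figure slightly.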
Annotation Cost Analysis / Deep Learning Accuracy Analysis:
[0097] Further, to check the effect of the video compression on the accuracy of a typical deep learning algorithm, the accuracy of a deep learning model trained with the uncompressed versions of the 26 videos was compared with the accuracy of the same model trained with the compressed versions of the same 26 videos. The compression done by the system had no adverse effect on the accuracy of the deep learning models: precision in both cases was approximately 51%. The system thus gave a 50% compression performance without affecting the quality of the relevant information content. This also demonstrates that the system precisely drops the frames that would not add to the learning of AI/deep learning systems, thereby saving a substantial amount of the time required for annotation.
[0098] Although the present invention is described with respect to the autonomous vehicle industry, one skilled in the art would envision that the present invention is not limited to the autonomous vehicle industry but may be used in any industrial application where relevance-based video compression is required. For example, the system and method of the present disclosure may also be used for video compression in industries such as, but not limited to, surveillance, film, education, etc.
[0099] Those skilled in the art would appreciate that the embodiments of the present disclosure relate to a system and method for fast and adaptive video compression. Embodiments of the present disclosure dynamically reduce the redundant frames of the video and provide a compressed video without affecting the frame quality and crucial information of the video. Further, embodiments herein provide annotation of the compressed video, thereby reducing the time required for annotation as well as the training time required for a neural network model during the deep learning phase.
[00100] Moreover, in interpreting both the specification and the claims, all terms should be interpreted in the broadest possible manner consistent with the context. In particular, the terms "comprises" and "comprising" should be interpreted as referring to elements, components, or steps in a non-exclusive manner, indicating that the referenced elements, components, or steps may be present, or utilized, or combined with other elements, components, or steps that are not expressly referenced. Where the specification or claims refer to at least one of something selected from the group consisting of A, B, C, ... and N, the text should be interpreted as requiring only one element from the group, not A plus N, or B plus N, etc.
[00101] While some embodiments of the present disclosure have been illustrated and described, those are completely exemplary in nature. The disclosure is not limited to the embodiments as elaborated herein only and it would be apparent to those skilled in the art that numerous modifications besides those already described are possible without departing from the inventive concepts herein. All such modifications, changes, variations, substitutions, and equivalents are completely within the scope of the present disclosure. The inventive subject matter, therefore, is not to be restricted except in the spirit of the appended claims.
ADVANTAGES OF THE PRESENT DISCLOSURE
[00102] The present disclosure provides a system and method for video compression based on relevant information in the video frames.
[00103] The present disclosure provides a system and method for video compression by dynamically reducing the redundant frames of the video.
[00104] The present disclosure provides a system and method for video compression for selecting frames from video data based on the relevance index.
[00105] The present disclosure provides a system and method for video compression without affecting the frame quality and crucial information of the video.
[00106] The present disclosure provides annotation of the compressed video.
[00107] The present disclosure provides a system and method for dynamically eliminating redundant frames based on relevant information.
[00108] The present disclosure provides a system and method for video compression that reduces the processing time required for annotation.
[00109] The present disclosure provides a system and method for video compression that reduces the training time required for a neural network model during the deep learning phase.
[00110] The present disclosure provides a system and method for video compression that is cost effective.
[00111] The present disclosure provides a system and method for fast and adaptive video compression.

Claims

We claim:
1. A system for compressing a video, said system comprising:
an input unit comprising an image sensor for capturing the video comprising a series of video frames;
a processing unit comprising a processor coupled with a memory, the memory storing instructions executable by the processor to:
receive the series of video frames and identify one or more key frames from the series of video frames based on a pitch;
compute sum of absolute differences (SAD) value associated with each key frame of the one or more key frames to determine dynamic relevance index (DRI) value associated with each key frame of the one or more key frames; and
generate a compressed video by identifying a set of video frames from the series of video frames, wherein each frame of the set of video frames is identified using the determined DRI value associated with at least one key frame of the one or more key frames.
2. The system of claim 1, wherein on computing the SAD value associated with each key frame of the one or more key frames, the processor normalizes the SAD value of said key frame by dividing said SAD value by the second highest SAD value determined from the SAD values associated with the one or more key frames.
3. The system of claim 2, wherein on normalization of the SAD value of each key frame of the one or more key frames, the processor determines DRI value of each key frame based on any or a combination of said SAD value of said key frame, median of the SAD values of the one or more key frames and standard deviation of the SAD values of the one or more key frames.
4. The system of claim 1, wherein the system is implemented in a vehicle for compressing the video captured by the image sensor and pertaining to surroundings of the vehicle.
5. The system of claim 1, wherein on identifying the one or more key frames, the processor performs any or a combination of blurring, contrast enhancement operations and pixel scaling on each key frame of the one or more key frames.
6. The system of claim 1, wherein the processor annotates the set of video frames of the compressed video.
7. The system of claim 1, wherein the processor dynamically reduces a plurality of redundant frames from the series of video frames based on relevant information to identify the set of video frames.
8. A method for compressing a video, carried out according to instructions stored in a computer, comprising:
receiving a video comprising a series of video frames captured by an image sensor and identifying one or more key frames from the series of video frames based on a pitch;
computing sum of absolute differences (SAD) value associated with each key frame of the one or more key frames to determine dynamic relevance index (DRI) value associated with each key frame of the one or more key frames; and
generating a compressed video by identifying a set of video frames from the series of video frames, wherein each frame of the set of video frames is identified using the determined DRI value associated with at least one key frame of the one or more key frames.
9. The method of claim 8, wherein on computing the SAD value associated with each key frame of the one or more key frames, the method further comprises normalizing the SAD value of said key frame by dividing said SAD value by the second highest SAD value determined from the SAD values associated with the one or more key frames.
10. The method of claim 9, wherein on normalizing the SAD value of each key frame of the one or more key frames, the method further comprises determining DRI value of each key frame based on any or a combination of said SAD value of said key frame, median of the SAD values of the one or more key frames and standard deviation of the SAD values of the one or more key frames.
PCT/IB2019/055387 2018-12-07 2019-06-26 A system and method for video compression using key frames and sums of absolute differences WO2020115571A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN201821046274 2018-12-07
IN201821046274 2018-12-07

Publications (1)

Publication Number Publication Date
WO2020115571A1 true WO2020115571A1 (en) 2020-06-11

Family

ID=67614597

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2019/055387 WO2020115571A1 (en) 2018-12-07 2019-06-26 A system and method for video compression using key frames and sums of absolute differences

Country Status (1)

Country Link
WO (1) WO2020115571A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6549643B1 (en) * 1999-11-30 2003-04-15 Siemens Corporate Research, Inc. System and method for selecting key-frames of video data
US8126276B2 (en) * 2001-02-21 2012-02-28 International Business Machines Corporation Business method for selectable semantic codec pairs for very low data-rate video transmission
US20170272755A1 (en) * 2016-03-18 2017-09-21 Microsoft Technology Licensing, Llc Opportunistic frame dropping for variable-frame-rate encoding

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HUNG-MING WANG ET AL: "Depth maps interpolation from existing pairs of keyframes and depth maps for 3D video generation", IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS. ISCAS 2010 - 30 MAY-2 JUNE 2010 - PARIS, FRANCE, IEEE, US, 30 May 2010 (2010-05-30), pages 3248 - 3251, XP031725152, ISBN: 978-1-4244-5308-5 *
KOPRINSKA I ET AL: "Temporal video segmentation: A survey", SIGNAL PROCESSING. IMAGE COMMUNICATION, ELSEVIER SCIENCE PUBLISHERS, AMSTERDAM, NL, vol. 16, no. 5, 1 January 2001 (2001-01-01), pages 477 - 500, XP004224651, ISSN: 0923-5965, DOI: 10.1016/S0923-5965(00)00011-4 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113542771A (en) * 2021-07-15 2021-10-22 广东电网有限责任公司中山供电局 Video high-efficiency compression processing method based on content weight
CN113596473A (en) * 2021-07-28 2021-11-02 浙江大华技术股份有限公司 Video compression method and device
CN115375587A (en) * 2022-10-24 2022-11-22 北京实创上地科技有限公司 Video processing method and server
CN115375587B (en) * 2022-10-24 2023-03-10 北京实创上地科技有限公司 Video processing method and server

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19752745

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19752745

Country of ref document: EP

Kind code of ref document: A1
