WO2023237919A1 - Cascade stages priority-based processing - Google Patents

Cascade stages priority-based processing

Info

Publication number
WO2023237919A1
WO2023237919A1 (PCT/IB2022/057387)
Authority
WO
WIPO (PCT)
Prior art keywords
video
frames
video frames
processing
frame
Application number
PCT/IB2022/057387
Other languages
French (fr)
Inventor
Harald Gustafsson
Ahmed Ali-Eldin HASSAN
Siddharth AMIN
Dean ATWINE
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Application filed by Telefonaktiebolaget Lm Ericsson (Publ) filed Critical Telefonaktiebolaget Lm Ericsson (Publ)
Publication of WO2023237919A1 publication Critical patent/WO2023237919A1/en

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 - Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 - Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/234 - Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N 21/23418 - Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/96 - Management of image or video recognition tasks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/50 - Context or environment of the image
    • G06V 20/52 - Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • Embodiments of the invention relate to the field of processing; and more specifically, to cascade stages priority-based processing.
  • filtration can lower the amount of processing done for one stream, but it can then also lead to fewer frames being processed than are necessary.
  • having fewer frames is positive since the batch can be finished in a shorter duration and then the next batch can be processed.
  • the stream processing can be interleaved between streams, and hence the capacity freed when one stream skips many frames cannot be utilized without doing some cross-stream management.
  • a method including receiving, during a time window, video frames from video streams; deriving a number of video frames that can be processed by a video processing application of the video processing system based on available video processing resources; determining that the derived number of video frames that can be processed by the video processing application is less than the received plurality of video frames; performing a first ranking of the video frames based on, for each of the video streams, a set of one or more priority scores assigned to a set of one or more video frames of that stream received in a prior time window, where the first ranking includes a first set of one or more of the video frames that are selected for processing by the video processing application, a second set of one or more of the video frames selected for further preprocessing in a set of one or more preprocessing stages, and a third set of one or more of the video frames that are not selected for processing by the video processing application; performing the set of one or more preprocessing stages to assign a priority score to each of the second set of one or more of the video frames; performing a second ranking based on the assigned priority scores to select video frames for processing; and processing the selected video frames at the video processing application.
  • Deriving the number of video frames that can be processed by the video processing application may include multiplying an estimated number of frames per second that can be processed by the video processing application by the time window.
  • the selected video frames for processing at the video processing application may not exceed the derived number of video frames that can be processed by a video processing application.
  • the set of one or more preprocessing stages may include a difference detection stage that determines, for each of the second set of one or more of the video frames, a difference detection against a background frame, and where video frames that have a relatively higher difference have a higher priority value compared to video frames with a relatively lower difference.
  • the set of one or more preprocessing stages may include a difference detection stage that determines, for each of the second set of one or more of the video frames, a difference calculation against a previous frame of the video stream of that video frame, and where video frames that have a relatively higher difference have a higher priority value compared to video frames with a relatively lower difference.
  • the set of one or more preprocessing stages may include an object detection stage that determines whether, for each of the second set of one or more of the video frames, that video frame includes an object of interest, where video frames that include the object of interest have a higher priority value than video frames that do not include the object of interest.
  • the method may further include storing, for each of the streams corresponding to the second set of one or more of the video frames, the priority score assigned to that video frame. The priority score associated with each of the video streams may move to the middle over time.
  • one or more embodiments of a non-transitory computer-readable medium or distributed media containing computer-executable program instructions or code portions stored thereon are disclosed for performing one or more embodiments of the methods of the present invention when executed by a processor entity of an apparatus, an electronic device, or other computing device. Further features of the various embodiments are as claimed in the dependent claims.
  • Figure 1 is a block diagram that illustrates a video processing system prioritizing video frames of multiple video streams for application processing according to an embodiment.
  • Figure 2 is a flow diagram for a video processing system prioritizing video frames of multiple video streams for application processing according to an embodiment.
  • FIG. 3 is a block diagram of a host in accordance with various aspects described herein.
  • Figure 4 is a block diagram illustrating a virtualization environment in which functions implemented by some embodiments may be virtualized.
  • an elastic method handles low-latency real-time video processing of multiple video streams in a single video processing system where the video processing system does not have enough computational resources (e.g., memory, compute power, etc.) to process all the frames of the multiple video streams at a given time.
  • the elastic method includes continuously prioritizing which video frames of the multiple video streams to process and dropping those frames with relatively lower priority if there are not enough resources to process all the frames.
  • video frames that include humans may be prioritized over video frames that do not include humans.
  • video processing application includes detecting moving vehicles such as a car or truck
  • video frames that include movement of vehicles may be prioritized over video frames that do not.
  • the prioritization of video frames is based on the content of the video frames such as whether the video frames include an item of interest and/or indicate movement.
  • the video frames that are of interest to the video processing application are sometimes referred herein as frames of interest.
  • within the frames of interest there can be frames that are prioritized for processing over other frames due to their content.
  • all frames that include a human may be of interest, and of those frames those that include a human and a robot may be prioritized over those that do not include a robot.
  • a video processing system receives frames from multiple streams for a time window and prioritizes the frames for further processing at a video processing application.
  • the frames are ranked by priority and processed by the video processing system according to the ranking until the predicted available resources are used.
  • the video processing system may not always have enough resources to process all the frames. If, at a given time, the video processing system does not have enough available resources to process the video frames at that time, the frames are prioritized for processing in a cascade of rankings to determine which of the frames are processed until the predicted available resources are used.
  • the ranking is made from a derived ranking of previous frame(s) from the same stream, together with one or more preprocessing analyses that derive an indication of ranking.
  • the one or more preprocessing analyses may include performing a difference detection against previous frames (ranking frames with more difference higher than frames with less difference) and/or detecting whether a specific object is in the frame (ranking frames that contain that object higher than frames that do not). In each stage, a frame can be prioritized out (not selected for processing).
  • Embodiments may provide one or more of the following technical advantage(s).
  • Embodiments allow for the over-subscription of processing resources while still extracting high-priority information from the video processing. This increases overall processor utilization and minimizes the overall needed compute capacity. Further, the queues are mostly empty since down-prioritized frames may be dropped. This lowers cost and energy consumption during normal operations. It also offers a good processing base when acting as a redundant resource during a fault of another video processing resource: for example, when an availability zone (with a second video processor) experiences faults, the work can be handled by this video processing system with only increased contention on priority.
  • FIG. 1 is a block diagram that illustrates a video processing system prioritizing video frames of multiple video streams for application processing according to an embodiment.
  • the video processing system 100 receives frames from the streams 101-108. The frames are decoded and may be received from a graphics processing unit (GPU) or other video processing decoding unit.
  • the video processing system 100 can be implemented on a central processing unit (CPU), a GPU, or other processing unit, and/or be a distributed system that is implemented on multiple processing units that may be in separate physical devices. Further, the video processing system 100 may be implemented in a containerized system that has shared computing resources with other containers or software.
  • the processing system 100 receives the frame 112 from the stream 102, frame 113 from the stream 103, frame 115 from the stream 105, frame 116 from the stream 106, frame 117 from the stream 107, and frame 118 from the stream 108.
  • the frame 114 from the stream 104 is not received for processing because it is outside of the frame collection time window 130. Also, there is no frame received from the stream 101.
  • the video processing system 100 derives the number of frames that can be processed by the video processing application based on the available video processing resources. For example, the estimated number of frames per second (FPS) that can be processed by the video processing application 137 is multiplied by the frame collection time window 130 to determine the number of frames that can be processed by the video processing application 137.
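The derivation above is a simple multiplication of the estimated rate by the window length; a minimal sketch (the function name and types are illustrative, not from the patent):

```python
def frames_processable(estimated_fps: float, window_seconds: float) -> int:
    """Derive how many frames the video processing application can handle
    in one frame-collection time window: estimated FPS x window length."""
    return int(estimated_fps * window_seconds)

# For example, an application benchmarked at 30 FPS with a 0.5 s
# collection window has a budget of 15 frames per window.
budget = frames_processable(30.0, 0.5)
```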
  • the estimated number of FPS that can be processed may be based on benchmarking of the computational task. For example, a relatively large deep neural network (DNN) may have a relatively small FPS processing rate.
  • the estimated number of FPS that can be processed may be a static value or dynamically updated based on previous processing.
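One way to update the estimate dynamically based on previous processing is an exponential moving average over observed rates; this particular update rule is an assumption, as the text does not specify one:

```python
def update_fps_estimate(current_estimate: float, observed_fps: float,
                        alpha: float = 0.2) -> float:
    """Blend the latest observed processing rate into the running FPS
    estimate; alpha weights recent observations (value is illustrative)."""
    return (1 - alpha) * current_estimate + alpha * observed_fps
```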
  • the video processing system 100 ranks the frames for processing in cascading stages to select the frames for processing by the video processing application.
  • the video processing system 100 performs a first ranking, a stream level ranking 132, for each of the frames received during the frame collection time window 130.
  • the stream level ranking 132 is derived based on, for each stream, the priority score assigned to previous frame(s) of that stream received during a previous frame collection time window.
  • the priority score for a frame is described later herein in subsequent ranking stages. By way of example, if a previous frame was determined to have a high priority score (e.g., the frame included an object of interest), it is likely that a subsequent frame will also be determined to have a high priority score (e.g., include the same object of interest).
  • the stream level ranking 132 ranks the frames received during the frame collection time window 130 into multiple categories such as a high rank category, a medium rank category, and a low rank category.
  • Frames in the high rank category are processed by the video processing application 137.
  • Frames in the medium rank category are further evaluated for ranking.
  • Frames in the low rank category are not processed by the video processing application 137.
  • the frames in the process ranking group 140 correspond to a high rank
  • the frames in the evaluate ranking group 142 correspond to a medium rank
  • the frames in the drop ranking group 144 correspond to a low rank.
  • the ranking of frames is relative.
  • the number of frames in the process ranking group (the highest-ranking category) cannot exceed the number of frames that can be processed by the video processing application 137 for that time window.
  • a rule-based solution can be used that requires a relative or absolute priority difference compared to the highest-prioritized frame that would otherwise not be processed. For example, if the processing system 100 receives four frames, the video processing application 137 has available resources to process three frames, and frame number four has a priority of 0.7, then any frames above 0.7 + delta (with delta being either 0.7 * a relative factor, or an absolute factor) are grouped into the high rank category and the others are grouped into the medium rank category or the low rank category.
  • the medium rank category may be assigned at most X frames relative to the available slots (available slots meaning the total number of frames that can be processed minus the frames assigned to the high rank category). Using the above example, if two slots are assigned with frames to be processed and one slot remains, then the medium rank category should include either a relative or an absolute number of frames (e.g., 1 * 2 or 1 + 2 frames), and any other frame is put in the low rank category. Although this example is rule-based, an artificial intelligence or machine learning algorithm may instead be used to decide frame priorities.
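The rule-based grouping in the two bullets above can be sketched as follows; the function name, the (frame_id, priority) tuple layout, and the factor of two for sizing the medium category are illustrative assumptions:

```python
def stream_level_rank(frames, budget, rel_factor=0.1, medium_factor=2):
    """Group (frame_id, priority) pairs into high / medium / low
    categories. Frames whose priority exceeds, by a relative delta,
    the best frame that misses the processing budget go to the high
    category; up to medium_factor times the remaining open slots go
    to medium for further preprocessing; the rest are low (dropped)."""
    ordered = sorted(frames, key=lambda f: f[1], reverse=True)
    if len(ordered) <= budget:
        return ordered, [], []        # everything fits: process all
    cutoff = ordered[budget][1]       # best frame that misses the budget
    delta = cutoff * rel_factor       # relative delta; could be absolute
    high = [f for f in ordered[:budget] if f[1] > cutoff + delta]
    remaining = [f for f in ordered if f not in high]
    open_slots = budget - len(high)
    medium = remaining[:open_slots * medium_factor]
    low = remaining[len(medium):]
    return high, medium, low
```

With four frames, a budget of three, and a lowest priority of 0.7 (the example above), only the frames clearing 0.7 + delta land in the high category; the rest fill the medium category up to its bound.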
  • the frames included in the highest rank category have priority scores that are high enough that further preprocessing analysis (e.g., processing in the preprocessing difference detection 133 and/or the preprocessing object detection 135) will not substantially change the priority score for the frame.
  • the frames included in the medium rank category have priority scores that need further evaluation in one or more preprocessing stages. After the ranking is complete, there will be no frames included in the medium rank (either the frames will be assigned to the high ranking group and processed or assigned to the low ranking group and dropped).
  • the lowest rank category includes frame(s) that have priority scores that are such that any further preprocessing analysis will not substantially increase the priority scores.
  • the frame of that stream is assigned to the medium rank category.
  • each update step may include a fixed amount of change of priority. For example, if the stream-level priority is over the middle, the predefined amount may be subtracted from the priority score; and if the stream-level priority is under the middle, the predefined amount may be added to the priority score.
  • the update step may increase with the time since updated (e.g., a constant multiplied with the time difference).
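A sketch of the update described in the two bullets above; the [0, 1] score range and the default step size are assumptions. The time-scaled variant would multiply the step by the time since the last update:

```python
def decay_toward_middle(score, step=0.05, middle=0.5):
    """Move a stored stream-level priority score one fixed step toward
    the middle value, without overshooting it."""
    if score > middle:
        return max(middle, score - step)
    if score < middle:
        return min(middle, score + step)
    return score
```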
  • the frame(s) in the medium rank category are evaluated in one or more preprocessing stages and assigned a priority score.
  • Each preprocessing stage has a specific analysis such as a difference detection or an object detection. Such analysis is followed by a priority calculation and ranking.
  • a preprocessing stage includes performing one or more difference detection stages.
  • a difference detection stage may include an image difference calculation made against a background frame (without dynamic objects) that gives information on whether there is activity in the frame.
  • the difference calculation includes obtaining a background frame in grayscale format, converting the current frame into grayscale, taking the absolute difference between the frames on a pixel level, thresholding the difference on a pixel level to derive a binary value per pixel to remove noise, and determining the amount of difference (e.g., the ratio of pixels above the threshold to the total number of pixels). If the amount of difference is below a threshold, then the frame priority is set lower, which indicates that there may not be activity in the frame. If the amount of difference is above the threshold, then the frame priority is set at a medium level, possibly derived from the amount of difference. Alternatively, the priority is derived as a function of the difference.
  • the difference calculation includes obtaining the previous frame, taking the absolute difference between the frames on a pixel level, thresholding the difference on a pixel level (e.g., on one or more of the color components such as RGB or YUV) to derive a binary value per pixel to remove noise, and determining the amount of difference (e.g., the ratio of pixels above the threshold to the total number of pixels). If the amount of difference is below a threshold, then the frame priority is set lower, which indicates that there may not be activity in the frame. If the amount of difference is above the threshold, then the frame priority is set at a medium level, possibly derived from the amount of difference. Alternatively, the priority is derived as a function of the difference.
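Both difference-detection variants reduce to the same per-pixel computation against a reference frame (background frame or previous frame). A pure-Python sketch, with frames modeled as flat lists of grayscale pixel values and all threshold values illustrative:

```python
def frame_difference_ratio(frame, reference, pixel_threshold=25):
    """Fraction of pixels whose absolute grayscale difference from a
    reference frame (background or previous frame) exceeds a noise
    threshold. Frames are equal-length flat lists of 0-255 values."""
    changed = sum(1 for p, r in zip(frame, reference)
                  if abs(p - r) > pixel_threshold)
    return changed / len(frame)

def difference_priority(ratio, activity_threshold=0.02, low_priority=0.1):
    """Map the difference ratio to a priority score: below the activity
    threshold the frame gets a low priority (likely no activity);
    otherwise a medium priority derived from the amount of difference."""
    if ratio < activity_threshold:
        return low_priority
    return min(1.0, 0.5 + ratio)
```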
  • the video processing system 100 performs the preprocessing difference detection stage 133 on the frames 113, 116, and 117. Each of the frames 113, 116, and 117 have a priority score that is updated at the end of this stage.
  • the video processing system 100 determines that the frame 117 is moved from the medium ranking category to the low-ranking category and the frames 113 and 116 remain in the medium ranking category. That is, the frame 117 in this example has a lower priority score compared to the frames 113 and 116. The frame 117 is dropped and is not processed by the video processing system 100.
  • a preprocessing stage includes performing a Boolean object detection to determine whether an object of interest is included in the frame.
  • the objects of interest depend on the context of the video processing application. For instance, an object of interest may be a human, a robot, a vehicle, wildlife, etc. There may be multiple objects of interest that can be detected in a frame. If an object of interest is detected in the frame, the priority of the frame is increased. If an object of interest is not detected in the frame, the priority of the frame is decreased. Alternatively, the priority value is changed as a function of whether the object of interest is included in the frame.
  • the object detection may include using a machine learning object detection algorithm that uses a neural net. For example, the object detection may use an ML based image/object classification algorithm.
  • the image classification algorithm includes transforming the input image pixels into an array of values between zero and one (typically related to the grayscale level of each pixel), running the transformed input through a fully connected neural network layer with fewer output signals (e.g., 128), and then through a fully connected neural network layer with one output per object class. Each output may then correspond to the probability of that class being present in the image.
  • the weights used in the neural network layer can be learned by supervised learning using images with known object classes.
  • a preprocessing stage includes performing an occlusion detection that lowers the priority of frames that have content that is occluding an area of interest. For example, if a frame includes a vehicle parked in front of a camera obscuring the area of interest, that frame may have its priority score lowered.
  • a preprocessing stage includes determining a distance to the center of the frame for an object of interest and assigning a priority score depending on that distance. For example, a frame that includes an object of interest towards the center of the frame may have a higher priority score than a frame in which the object of interest is at an edge of the frame.
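A sketch of the distance-to-center scoring; the linear falloff is an assumption, as the text only requires that priority decrease with distance from the center:

```python
import math

def center_priority(obj_x, obj_y, frame_width, frame_height):
    """Priority in [0, 1] that decreases linearly with the distance of
    a detected object from the frame center: an object at the center
    scores 1.0, an object in a corner scores 0.0."""
    cx, cy = frame_width / 2, frame_height / 2
    dist = math.hypot(obj_x - cx, obj_y - cy)
    max_dist = math.hypot(cx, cy)  # distance from center to a corner
    if max_dist == 0:
        return 1.0
    return 1.0 - dist / max_dist
```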
  • the video processing system 100 performs the preprocessing object detection stage on the frames 113 and 116.
  • Each of the frames 113 and 116 has a priority score that is updated at the end of this stage.
  • the video processing system 100 determines that the frame 116 is moved from the medium ranking category to the low-ranking category and the frame 113 moves from the medium ranking to the high-ranking category. That is, the frame 116 in this example has a lower priority score compared to the frame 113.
  • the frame 116 is dropped and is not processed by the video processing application 137.
  • the frames that have been ranked as high priority are processed by the video processing application 137.
  • the specific processing done in the video processing application 137 depends on the context of the use of the video solution.
  • the video processing application 137 may analyze the frames 112, 115, and 113, find a human, and calculate a real-world position for the human.
  • the video processing application 137 includes positioning fiducial markers, transport robots, forklifts, vehicles, or other object(s)
  • the video processing application 137 analyzes the frames 112, 115, and 113 to find the object(s) and calculate a real-world position for the object(s).
  • Figure 1 shows an order of preprocessing stages
  • the order and/or number of stages may differ in different embodiments.
  • the object detection stage can be performed prior to a difference detection stage.
  • any mix of preprocessing stages may be performed in an embodiment.
  • Figure 2 is a flow diagram for a video processing system prioritizing video frames of multiple video streams for application processing according to an embodiment.
  • the operations described in Figure 2 are described with reference to the exemplary embodiment of Figure 1. However, the operations of Figure 2 can be performed by embodiments different from that of Figure 1, and the embodiment of Figure 1 can perform operations different from that of Figure 2.
  • the video processing system 100 receives multiple video frames from multiple video streams during a time window.
  • the video processing system 100 derives the number of frames that can be processed by the video processing application 137 based on the available video processing resources. For example, the estimated number of FPS that can be processed by the video processing application 137 is multiplied by the frame collection time window 130 to determine the number of frames that can be processed by the video processing application 137.
  • the estimated number of FPS that can be processed may be based on benchmarking of the computational task. For example, a relatively large deep neural network (DNN) may have a relatively small FPS processing rate.
  • the estimated number of FPS that can be processed may be a static value or dynamically updated based on previous processing.
  • the video processing system 100 determines whether the number of frames received is lower than the derived number of frames that can be processed by the video processing application 137. If it is, then operation 225 is performed where the video processing application 137 processes all the frames received in that time window. In an embodiment, the priority scoring and/or ranking of these frames is not performed in this situation. In another embodiment, even though all frames are processed in this situation, each frame is assigned a priority score and/or ranked for use in future rankings and/or other uses (e.g., load balancing, scaling). If, however, the number of frames received is not lower than the derived number of frames that can be processed by the video processing application 137, then operation 230 is performed.
  • the video processing system 100 performs a first ranking (e.g., a stream level ranking 132) for the frames received during the frame collection time window 130.
  • the stream level ranking 132 is derived based on, for each stream, the priority score assigned to previous frame(s) of that stream. By way of example, if a previous frame was determined to have a high priority score (e.g., the frame included an object of interest), it is likely that a subsequent frame will also be determined to have a high priority score (e.g., include the same object of interest).
  • the stream level ranking 132 ranks the frames received during the frame collection time window 130 into multiple categories such as a high rank category, a medium rank category, and a low rank category. Frames in the high rank category are processed by the video processing application 137. Frames in the medium rank category are further evaluated for ranking in one or more preprocessing stages. Frames in the low rank category are not processed by the video processing application 137.
  • the video processing system 100 performs a set of one or more preprocessing stages to assign a priority score to each of the frames that are ranked in the medium rank category.
  • the one or more preprocessing stages may include: performing a difference calculation for each frame against a background frame and assigning/updating a priority score for the frame based on the result; performing a difference calculation for each frame against a previous frame of the stream and assigning/updating a priority score for the frame based on the result; performing an object detection to determine whether a set of one or more object(s) of interest are included in the frame and assigning/updating a priority score for the frame based on the result; performing an occlusion detection to detect whether an object is obscuring an area of interest in the frame and assigning/updating a priority score for the frame based on the result; and/or determining a distance to the center of an object of interest in the frame and assigning/updating a priority score for the frame based on the result.
  • a ranking of frames is done at the end of the preprocessing stage(s) based on the priority score assigned to the frames.
  • the total number of frames selected for processing at the video processing application 137 does not exceed the derived number of frames that can be processed by the video processing application 137.
  • the video processing application 137 processes the selected frames.
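Putting the operations above together, the cascade of preprocessing stages with per-stage drop-out and a final budget-limited selection might look like this; the stage functions, drop threshold, and data layout are all illustrative assumptions:

```python
def cascade_select(frames, budget, stages, drop_threshold=0.2):
    """Run (frame_id, priority) pairs through preprocessing stages in
    order. Each stage returns an updated priority for a frame, and
    frames falling below the drop threshold are prioritized out. The
    highest-priority survivors, up to the budget, are selected for
    processing by the video processing application."""
    scored = dict(frames)
    for stage in stages:
        scored = {fid: stage(fid, p) for fid, p in scored.items()}
        scored = {fid: p for fid, p in scored.items() if p >= drop_threshold}
    ranked = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
    return [fid for fid, _ in ranked[:budget]]

# A toy stage that raises the priority of one frame, standing in for a
# real difference-detection or object-detection stage.
def detect_stage(frame_id, priority):
    return priority + 0.3 if frame_id == "a" else priority
```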
  • FIG. 3 is a block diagram of a host 300 in accordance with various aspects described herein.
  • the host 300 may be or comprise various combinations of hardware and/or software, including a standalone server, a blade server, a cloud-implemented server, a distributed server, a virtual machine, container, or processing resources in a server farm.
  • the host 300 may provide (alone or along with other devices such as other hosts) one or more services such as the cascade stages priority-based processing described herein.
  • the host 300 includes processing circuitry 302 that is operatively coupled via a bus 304 to an input/output interface 306, a network interface 308, a power source 310, and a memory 312. Other components may be included in other embodiments.
  • the processing circuitry 302 is configured to process instructions and data and may be configured to implement any sequential state machine operative to execute instructions stored as machine-readable computer programs in the memory 312.
  • the processing circuitry 302 may be implemented as one or more hardware-implemented state machines (e.g., in discrete logic, field-programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), etc.); programmable logic together with appropriate firmware; one or more stored computer programs, general-purpose processors, such as a microprocessor or digital signal processor (DSP), together with appropriate software; or any combination of the above.
  • the processing circuitry 302 may include multiple central processing units (CPUs).
  • the input/output interface 306 may be configured to provide an interface or interfaces to an input device, output device, or one or more input and/or output devices.
  • the power source 310 is structured as a battery or battery pack. Other types of power sources, such as an external power source (e.g., an electricity outlet), photovoltaic device, or power cell, may be used.
  • the power source 310 may further include power circuitry for delivering power from the power source 310 itself, and/or an external power source, to the various parts of the host 300 via input circuitry or an interface such as an electrical power cable. Power circuitry may perform any formatting, converting, or other modification to the power from the power source 310 to make the power suitable for the respective components of the host 300 to which power is supplied.
  • the processing circuitry 302 may be configured to communicate with an access network or other network using the network interface 308.
  • the network interface 308 may comprise one or more communication subsystems.
  • the network interface 308 may include one or more transceivers used to communicate wirelessly or through a wired connection with other network devices across a network.
  • the memory 312 may be or be configured to include memory such as random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), magnetic disks, optical disks, hard disks, removable cartridges, flash drives, and so forth.
  • the memory 312 may include one or more computer programs including one or more host application programs 314 and data 316. Embodiments of the host 300 may utilize only a subset of the components shown.
  • the host application programs 314 may be implemented in a container-based architecture. As an example, the host application programs 314 may provide functionality for the cascade stages priority-based processing described herein.
  • the memory 312 may be configured to include a number of physical drive units, such as redundant array of independent disks (RAID), flash memory, USB flash drive, external hard disk drive, thumb drive, pen drive, key drive, high-density digital versatile disc (HD-DVD) optical disc drive, internal hard disk drive, Blu-Ray optical disc drive, holographic digital data storage (HDDS) optical disc drive, external mini-dual in-line memory module (DIMM), synchronous dynamic random access memory (SDRAM), external micro-DIMM SDRAM, other memory, or any combination thereof.
  • the memory 312 may allow the host 300 to access instructions, host application programs and the like, stored on transitory or non-transitory memory media, to off-load data, or to upload data.
  • An article of manufacture, such as one utilizing a communication system may be tangibly embodied as or in the memory 312, which may be or comprise a device-readable storage medium.
  • FIG. 4 is a block diagram illustrating a virtualization environment 400 in which functions implemented by some embodiments may be virtualized.
  • virtualizing means creating virtual versions of apparatuses or devices which may include virtualizing hardware platforms, storage devices and networking resources.
  • virtualization can be applied to any device described herein, or components thereof, and relates to an implementation in which at least a portion of the functionality is implemented as one or more virtual components.
  • Some or all of the functions described herein may be implemented as virtual components executed by one or more virtual machines (VMs) implemented in one or more virtual environments 400 hosted by one or more hardware nodes, such as a hardware computing device that operates as a host.
  • Applications 402 (which may alternatively be called software instances, virtual appliances, network functions, virtual nodes, virtual network functions, etc.) are run in the virtualization environment 400 to implement some of the features, functions, and/or benefits of some of the embodiments disclosed herein.
  • Hardware 404 includes processing circuitry, memory that stores software and/or instructions executable by hardware processing circuitry, and/or other hardware devices as described herein, such as a network interface, input/output interface, and so forth.
  • Software may be executed by the processing circuitry to instantiate one or more virtualization layers 406 (also referred to as hypervisors or virtual machine monitors (VMMs)), provide VMs 408a and 408b (one or more of which may be generally referred to as VMs 408), and/or perform any of the functions, features and/or benefits described in relation with some embodiments described herein.
  • the virtualization layer 406 may present a virtual operating platform that appears like networking hardware to the VMs 408.
  • the VMs 408 comprise virtual processing, virtual memory, virtual networking or interface and virtual storage, and may be run by a corresponding virtualization layer 406.
  • Different embodiments of the instance of a virtual appliance 402 may be implemented on one or more of VMs 408, and the implementations may be made in different ways.
  • Hardware 404 may be implemented in a standalone network node with generic or specific components. Hardware 404 may implement some functions via virtualization.
  • hardware 404 may be part of a larger cluster of hardware (e.g., such as in a data center) where many hardware nodes work together and are managed via management and orchestration 410, which, among others, oversees lifecycle management of applications 402.
  • while the computing devices described herein (e.g., hosts) may include the illustrated combination of hardware components, other embodiments may comprise computing devices with different combinations of components. It is to be understood that these computing devices may comprise any suitable combination of hardware and/or software needed to perform the tasks, features, functions and methods disclosed herein.
  • Determining, calculating, obtaining or similar operations described herein may be performed by processing circuitry, which may process information by, for example, converting the obtained information into other information, comparing the obtained information or converted information to information stored in the network node, and/or performing one or more operations based on the obtained information or converted information, and as a result of said processing making a determination.
  • computing devices may comprise multiple different physical components that make up a single illustrated component, and functionality may be partitioned between separate components.
  • a communication interface may be configured to include any of the components described herein, and/or the functionality of the components may be partitioned between the processing circuitry and the communication interface.
  • non-computationally intensive functions of any of such components may be implemented in software or firmware and computationally intensive functions may be implemented in hardware.
  • some or all of the functionality may be provided by processing circuitry executing instructions stored in memory, which in certain embodiments may be a computer program product in the form of a non-transitory computer-readable storage medium.
  • some or all of the functionality may be provided by the processing circuitry without executing instructions stored on a separate or discrete device-readable storage medium, such as in a hard-wired manner.
  • the processing circuitry can be configured to perform the described functionality. The benefits provided by such functionality are not limited to the processing circuitry alone or to other components of the computing device, but are enjoyed by the computing device as a whole, and/or by end users and a wireless network generally.
  • deriving the number of video frames that can be processed by the video processing application (137) includes multiplying an estimated number of frames per second that can be processed by the video processing application (137) by the time window.
  • the set of one or more preprocessing stages includes a difference detection stage that determines, for each of the second set of one or more of the plurality of video frames (113, 116, 117), a difference detection against a background frame, and wherein video frames that have a relatively higher difference have a higher priority value compared to video frames with a relatively lower difference.
  • the set of one or more preprocessing stages includes a difference detection stage that determines, for each of the second set of one or more of the plurality of video frames (113, 116, 117), a difference calculation against a previous frame of the video stream of that video frame, and wherein video frames that have a relatively higher difference have a higher priority value compared to video frames with a relatively lower difference.
  • the set of one or more preprocessing stages includes an object detection stage that determines whether, for each of the second set of one or more of the plurality of video frames (113, 116, 117), that video frame includes an object of interest, wherein video frames that include the object of interest have a higher priority value than video frames that do not include the object of interest.
  • a host for prioritizing video frames of multiple video streams for application processing comprising: processing circuitry configured to perform any of the steps of any of the Group A embodiments; and power supply circuitry configured to supply power to the processing circuitry.
  • a non-transitory computer-readable storage medium that provides instructions that, if executed by a processor, will cause said processor to perform any of the steps of any of the Group A embodiments.
  • An electronic device for prioritizing video frames of multiple video streams for application processing comprising: a processor; and a non-transitory computer-readable storage medium that provides instructions that, if executed by the processor, cause the electronic device to perform any of the steps of any of the Group A embodiments.
  • a machine-readable medium comprising computer program code which when executed by an electronic device carries out any of the steps of any of the Group A embodiments.
  • a system for prioritizing video frames of multiple video streams for application processing comprising: a non-transitory computer-readable storage medium that provides instructions that, if executed by a processor, will cause said processor to perform any of the steps of any of the Group A embodiments.

Abstract

Cascade stages priority-based processing is described. A video processing system receives video frames from multiple video streams during a time window. The number of frames that can be processed by a video processing application is derived based on available processing resources. If this number is less than the received number of video frames, a first ranking of the video frames is performed based on priority scores assigned to frame(s) of the streams received in a prior time window. The first ranking includes frame(s) selected for processing by the application, frame(s) selected for further preprocessing including assigning a priority score to these frame(s), and frame(s) that are dropped. A second ranking of these frames is performed based on these assigned priority scores. The second ranking includes selecting at least one video frame for processing by the application and unselected frames are dropped. The application processes the selected frames.

Description

CASCADE STAGES PRIORITY-BASED PROCESSING
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Application No. 63/349,939, filed June 7, 2022, which is hereby incorporated by reference.
TECHNICAL FIELD
[0002] Embodiments of the invention relate to the field of processing; and more specifically, to cascade stages priority-based processing.
BACKGROUND ART
[0003] There exist methods to speed up artificial intelligence (AI)/machine learning (ML) analysis of video content. One such method is described in the article “NoScope: Optimizing Neural Network Queries Over Video at Scale”, by Kang et al., Proceedings of the VLDB Endowment, 10(11), 2017. The idea is to have a cascade of analysis stages that filter out frames from further processing, for example by detecting whether there is any motion relative to previous frames, or by running a specialized detector for a specific object. This is a filtration technique for a single video stream and is designed to sequentially process video files quickly.
[0004] There currently exist certain challenge(s). Previous work only handles one stream, but for real-time processing of multiple streams, single-stream filtration may not be enough. Although filtration can lower the amount of processing done for one stream, it can also lead to frames that are necessary being dropped. For batch or off-line processing, having fewer frames is positive since the batch can be finished in a shorter duration and the next batch can then be processed. However, for real-time processing of multiple streams, the stream processing can be interleaved between streams, and hence when one stream skips many frames the freed capacity cannot be utilized without doing some cross-stream management.
SUMMARY OF THE INVENTION
[0005] Cascade stages priority-based processing is described. In one aspect, a method is performed including receiving, during a time window, video frames from video streams; deriving a number of video frames that can be processed by a video processing application of a video processing system based on available video processing resources; determining that the derived number of video frames that can be processed by the video processing application is less than the number of received video frames; performing a first ranking of the video frames based on, for each of the video streams, a set of one or more priority scores assigned to a set of one or more video frames of that stream received in a prior time window, where the first ranking includes a first set of one or more of the video frames that are selected for processing by the video processing application, a second set of one or more of the video frames selected for further preprocessing in a set of one or more preprocessing stages, and a third set of one or more of the video frames that are not selected for processing by the video processing application; performing the set of one or more preprocessing stages to assign a priority score to each of the second set of one or more of the video frames; performing a second ranking of the second set of one or more of the video frames based on the priority score assigned to each of the second set of one or more of the video frames, where the second ranking includes selecting at least one of the second set of one or more of the video frames for processing by the video processing application, and where remaining ones of the second set of one or more of the video frames are not selected for processing by the video processing application; and processing, at the video processing application, the selected video frames.
[0006] Deriving the number of video frames that can be processed by the video processing application may include multiplying an estimated number of frames per second that can be processed by the video processing application by the time window. The selected video frames for processing at the video processing application may not exceed the derived number of video frames that can be processed by a video processing application. The set of one or more preprocessing stages may include a difference detection stage that determines, for each of the second set of one or more of the video frames, a difference detection against a background frame, and where video frames that have a relatively higher difference have a higher priority value compared to video frames with a relatively lower difference. The set of one or more preprocessing stages may include a difference detection stage that determines, for each of the second set of one or more of the video frames, a difference calculation against a previous frame of the video stream of that video frame, and where video frames that have a relatively higher difference have a higher priority value compared to video frames with a relatively lower difference. The set of one or more preprocessing stages may include an object detection stage that determines whether, for each of the second set of one or more of the video frames, that video frame includes an object of interest, where video frames that include the object of interest have a higher priority value than video frames that do not include the object of interest. The method may further include storing, for each of the streams corresponding to the second set of one or more of the video frames, the priority score assigned to that video frame. The priority score associated with each of the video streams may move to the middle over time.
[0007] In further aspects, one or more embodiments of a non-transitory computer-readable medium or distributed media containing computer-executable program instructions or code portions stored thereon are disclosed for performing one or more embodiments of the methods of the present invention when executed by a processor entity of an apparatus, an electronic device, or other computing device. Further features of the various embodiments are as claimed in the dependent claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] The invention may best be understood by referring to the following description and accompanying drawings that are used to illustrate embodiments of the invention. In the drawings:
[0009] Figure 1 is a block diagram that illustrates a video processing system prioritizing video frames of multiple video streams for application processing according to an embodiment.
[0010] Figure 2 is a flow diagram for a video processing system prioritizing video frames of multiple video streams for application processing according to an embodiment.
[0011] Figure 3 is a block diagram of a host in accordance with various aspects described herein.
[0012] Figure 4 is a block diagram illustrating a virtualization environment in which functions implemented by some embodiments may be virtualized.
DETAILED DESCRIPTION
[0013] Certain aspects of the disclosure and their embodiments may provide solutions to these or other challenges. In an embodiment, an elastic method handles low-latency real-time video processing of multiple video streams in a single video processing system where the video processing system does not have enough computational resources (e.g., memory, compute power, etc.) to process all the frames of the multiple video streams at a given time. The elastic method includes continuously prioritizing which video frames of the multiple video streams to process and dropping those frames with relatively lower priority if there are not enough resources to process all the frames.
[0014] It is probable that at a certain time, some of the frames are not of interest and other frames are of interest. At another time, other frames from the same streams might have a different probability of being of interest. For example, with three cameras filming parts of a corridor and a person walking along it, the priority may shift between frames from different streams with the position of the person. It is assumed that the video processor is oversubscribed and thus cannot handle full processing on all streams all the time. Prioritization of frames is based on pre-analysis of the current frames and analysis priority data from past frames, according to an embodiment.
[0015] The criteria for prioritizing the video frames may be different depending on the context of the video processing application that uses those frames. For instance, if the video processing application includes detecting humans in the video, video frames that include humans may be prioritized over video frames that do not include humans. As another example, if the video processing application includes detecting moving vehicles such as a car or truck, video frames that include movement of vehicles may be prioritized over video frames that do not. In an example embodiment, the prioritization of video frames is based on the content of the video frames such as whether the video frames include an item of interest and/or indicate movement. The video frames that are of interest to the video processing application are sometimes referred to herein as frames of interest. Further, in some embodiments, within the frames of interest there can be frames that are prioritized for processing over other frames due to their content. As an example, all frames that include a human may be of interest, and of those frames those that include a human and a robot may be prioritized over those that do not include a robot.
[0016] A video processing system receives frames from multiple streams for a time window and prioritizes the frames for further processing at a video processing application. The frames are ranked by priority and processed by the video processing system according to the ranking until the predicted available resources are used. The video processing system does not have enough resources to process all the frames all the time. If, at a given time, the video processing system does not have enough available resources to process the video frames at that time, the frames are prioritized for processing in a cascade of rankings to determine which of the frames are processed until the predicted available resources are used.
[0017] In an embodiment, the ranking is made by a derived ranking of previous frame(s) from the same stream; and one or more preprocessing analysis deriving indication of ranking. The one or more preprocessing analysis may include performing a difference detection from previous frames (ranking those frames with more difference higher than those frames with less difference) and/or detecting if a specific object is in the frame (ranking those frames with that object higher than those frames that do not have that object). In each stage, a frame can be prioritized out (not selected for processing).
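The cascade of [0016] and [0017] can be illustrated with a short sketch. All function names here are hypothetical, and the fixed high/medium split and the halve-per-stage rule are simplifying assumptions; [0025] below describes a delta-based grouping rule instead.

```python
def cascade_select(frames, capacity, stream_scores, stages):
    """Select frames for one collection window (illustrative sketch).

    frames: list of (stream_id, frame) pairs received this window
    stream_scores: stream_id -> priority score carried over from past frames
    stages: ordered preprocessing functions, each scoring a (stream_id, frame)
    """
    if len(frames) <= capacity:
        return frames                       # enough resources: process everything
    # First ranking: stream-level scores derived from previously analyzed
    # frames. Streams without a score default to the middle (see [0027]).
    ranked = sorted(frames, key=lambda f: stream_scores.get(f[0], 0.5), reverse=True)
    selected = ranked[: capacity // 2]      # high rank: straight to the application
    candidates = ranked[capacity // 2 : capacity * 2]   # medium rank: evaluate
    # Cascade of preprocessing stages; each stage can prioritize frames out.
    for stage in stages:
        candidates = sorted(candidates, key=stage, reverse=True)
        candidates = candidates[: max(1, len(candidates) // 2)]
    return selected + candidates[: capacity - len(selected)]
```

A stream with a high past score is processed without further preprocessing, while medium-ranked frames are rescored stage by stage until the remaining capacity is filled; everything else is dropped.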
[0018] Certain embodiments may provide one or more of the following technical advantage(s). Embodiments allow for the over-subscription of processing resources while maintaining high priority information to be extracted using the video processing. This increases the overall processor utilization and minimizes the overall needed compute capacity. Further, the queues are mostly empty since down-prioritized frames may be dropped. This lowers cost and energy consumption during normal operations. Also, this offers a good processing base when acting as a redundant resource during a fault of another video processing resource: for example, when an availability zone (with a second video processor) experiences faults, the work can be handled by this video processing system with only an increased contention on priority.
ADDITIONAL EXPLANATION
[0019] Some of the embodiments contemplated herein will now be described more fully with reference to the accompanying drawings. Embodiments are provided by way of example to convey the scope of the subject matter to those skilled in the art.
[0020] Figure 1 is a block diagram that illustrates a video processing system prioritizing video frames of multiple video streams for application processing according to an embodiment. The video processing system 100 receives frames from the streams 101-108. The frames are decoded and may be received from a graphics processing unit (GPU) or other video processing decoding unit. The video processing system 100 can be implemented on a central processing unit (CPU), a GPU, or other processing unit, and/or be a distributed system that is implemented on multiple processing units that may be in separate physical devices. Further, the video processing system 100 may be implemented in a containerized system that has shared computing resources with other containers or software.
[0021] Although eight streams are shown, the number is exemplary and there may be fewer or more streams. During the frame collection time window 130, the processing system 100 receives the frame 112 from the stream 102, frame 113 from the stream 103, frame 115 from the stream 105, frame 116 from the stream 106, frame 117 from the stream 107, and frame 118 from the stream 108. The frame 114 from the stream 104 is not received for processing because it is outside of the frame collection time window 130. Also, there is no frame received from the stream 101.
[0022] The video processing system 100 derives the number of frames that can be processed by the video processing application based on the available video processing resources. For example, the estimated number of frames per second (FPS) that can be processed by the video processing application 137 is multiplied by the frame collection time window 130 to determine the number of frames that can be processed by the video processing application 137. The estimated number of FPS that can be processed may be based on benchmarking of the computational task. For example, a relatively large deep neural network (DNN) may have a relatively small FPS processing rate. The estimated number of FPS that can be processed may be a static value or dynamically updated based on previous processing.
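As a minimal illustration of the derivation in [0022] (the function names are hypothetical, and the moving-average update is only one assumed way to "dynamically update" the estimate from previous processing):

```python
def derive_frame_capacity(estimated_fps: float, window_seconds: float) -> int:
    # Frames the video processing application can handle in one collection window.
    return int(estimated_fps * window_seconds)

def update_fps_estimate(prev_estimate: float, observed_fps: float,
                        alpha: float = 0.2) -> float:
    # Exponential moving average over observed throughput; the smoothing
    # factor alpha is an illustrative assumption.
    return (1 - alpha) * prev_estimate + alpha * observed_fps

derive_frame_capacity(12.0, 0.5)  # a 12-FPS model with a 0.5 s window -> 6 frames
```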
[0023] If the number of frames received during the frame collection time window 130 is less than the estimated number of frames that can be processed by the video processing application, all the frames are processed by the video processing application regardless of priority. In an embodiment, the priority scoring and/or ranking of these frames is not performed in this situation. In another embodiment, even though all frames are processed in this situation, each frame is assigned a priority score and/or ranked for use in future rankings and/or other uses (e.g., load balancing, scaling). If the number of frames received during the frame collection time window 130 is not less than the estimated number of frames that can be processed by the video processing application, then the video processing system 100 ranks the frames for processing in cascading stages to select the frames for processing by the video processing application.
[0024] The video processing system 100 performs a first ranking, a stream level ranking 132, for each of the frames received during the frame collection time window 130. The stream level ranking 132 is derived based on, for each stream, the priority score assigned to previous frame(s) of that stream received during a previous frame collection time window. The priority score for a frame is described later herein in subsequent ranking stages. By way of example, if a previous frame was determined to have a high priority score (e.g., the frame included an object of interest), it is likely that a subsequent frame will also be determined to have a high priority score (e.g., include the same object of interest). The stream level ranking 132 ranks the frames received during the frame collection time window 130 into multiple categories such as a high rank category, a medium rank category, and a low rank category. Frames in the high rank category are processed by the video processing application 137. Frames in the medium rank category are further evaluated for ranking. Frames in the low rank category are not processed by the video processing application 137. As shown in Figure 1, the frames in the process ranking group 140 correspond to a high rank, the frames in the evaluate ranking group 142 correspond to a medium rank, and the frames in the drop ranking group 144 correspond to a low rank.
[0025] In an embodiment, the ranking of frames is relative. The number of frames in the process ranking group (the highest-ranking category) cannot exceed the number of frames that can be processed by the video processing application 137 for that time window. A rule-based solution can be used to have a relative or absolute priority difference compared to the highest prioritized frame that would otherwise not be processed. For example, if the processing system 100 receives four frames and the video processing application 137 has available resources to process three frames and frame number four has a priority of 0.7, then any frame above 0.7 + delta (with delta either 0.7 * a relative factor, or an absolute factor) is grouped into the high rank category and the others are grouped into the medium rank category or the low rank category. The medium rank category may be assigned at most X frames compared to the available slots (available slots meaning the total number of frames that can be processed minus the frames that are assigned to the high rank category). Using the above example, if there are two slots assigned with frames to be processed and one slot remaining, then the medium rank category should either include a relative or absolute number of frames (e.g., 1 * 2, or 1 + 2 frames) and any other frame is put in the low-ranking category. Although this example is rule-based, an artificial intelligence or machine learning algorithm may be used to decide frame priorities.
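The rule-based grouping of [0025] could be sketched as follows; the relative delta factor and the medium-category multiplier are illustrative parameters, and the function name is hypothetical.

```python
def rank_frames(frames, capacity, rel_factor=0.1, medium_per_slot=2):
    """Group (frame_id, priority) pairs into high/medium/low rank categories.

    The high category holds frames whose priority exceeds the priority of the
    highest frame that would otherwise not fit, plus a relative delta; the
    medium category is capped at medium_per_slot frames per remaining slot.
    """
    ordered = sorted(frames, key=lambda f: f[1], reverse=True)
    if len(ordered) <= capacity:
        return ordered, [], []          # everything fits: process all frames
    # Priority of the highest-ranked frame that would NOT be processed.
    cutoff = ordered[capacity][1]
    threshold = cutoff + cutoff * rel_factor   # relative delta rule
    high = [f for f in ordered if f[1] > threshold][:capacity]
    rest = [f for f in ordered if f not in high]
    slots_left = capacity - len(high)
    medium = rest[: slots_left * medium_per_slot]   # evaluated further
    low = rest[len(medium):]                        # dropped
    return high, medium, low
```

With seven frames and capacity for three, a frame whose priority clearly exceeds the cutoff goes straight to processing, a bounded number of mid-priority frames proceed to the preprocessing stages, and the remainder are dropped.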
[0026] With respect to the stream level ranking 132, the frames included in the highest rank category have priority scores that are high enough that further preprocessing analysis (e.g., processing in the preprocessing difference detection 133 and/or the preprocessing object detection 135) will not substantially change the priority score for the frame. The frames included in the medium rank category have priority scores that need further evaluation in one or more preprocessing stages. After the ranking is complete, there will be no frames included in the medium rank (either the frames will be assigned to the high ranking group and processed or assigned to the low ranking group and dropped). The lowest rank category includes frame(s) that have priority scores that are such that any further preprocessing analysis will not substantially increase the priority scores.
[0027] In an embodiment, if a stream does not have a stream level priority score, the frame of that stream is assigned to the medium rank category.
[0028] In an embodiment, the stream level priority moves toward the middle over time. For example, each update step may include a fixed amount of change of priority. For instance, if the stream level priority is over the middle, a predefined amount may be subtracted from the priority score; and if the stream level priority is under the middle, the predefined amount may be added to the priority score. Alternatively, the update step may increase with the time since the last update (e.g., a constant multiplied by the time difference).
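A sketch of the decay in [0028], assuming a 0-to-1 priority scale with 0.5 as the middle (the scale, the step size, and the function name are all illustrative assumptions):

```python
MIDDLE = 0.5   # assumed mid-point of the priority scale
STEP = 0.05    # fixed per-update change; alternatively RATE * time_since_update

def decay_toward_middle(priority: float, step: float = STEP) -> float:
    # Move the stored stream-level priority a fixed amount toward the middle,
    # clamping so it never overshoots past the mid-point.
    if priority > MIDDLE:
        return max(MIDDLE, priority - step)
    if priority < MIDDLE:
        return min(MIDDLE, priority + step)
    return priority
```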
[0029] With respect to Figure 1, as a result of the stream level ranking 132, the frames 112 and 115 have been assigned to the process ranking group 140 (the high-ranking category), the frames 113, 116, and 117 have been assigned to the evaluate ranking group 142 (the medium ranking category), and the frame 118 has been assigned to the drop ranking group 144 (the low ranking category). Accordingly, the frame 118 is dropped and is not processed by the video processing application 137.
[0030] The frame(s) in the medium rank category (the evaluate ranking group 142) are evaluated in one or more preprocessing stages and assigned a priority score. Each preprocessing stage has a specific analysis such as a difference detection or an object detection. Such analysis is followed by a priority calculation and ranking.
[0031] In an embodiment, a preprocessing stage includes performing one or more difference detection stages. For instance, a difference detection stage may include an image difference calculation made against a background frame (without dynamic objects) that gives information on whether there is activity in the frame. In an embodiment, the difference calculation includes obtaining a background frame in grayscale format, converting the current frame into grayscale, taking the absolute difference between the frames on a pixel level, thresholding the difference on a pixel level to derive a binary value per pixel to remove noise, and determining the amount of difference (e.g., the ratio of pixels above the threshold to the total number of pixels). If the amount of difference is below a threshold, then the frame priority is set lower, which indicates that there may not be activity in the frame. If the amount of the difference is above the threshold, then the frame priority is set at a medium level, possibly derived from the amount of difference. Alternatively, the priority is derived as a function of the difference.
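The background-difference calculation of [0031] maps naturally onto array operations. This NumPy sketch uses illustrative threshold values and an assumed mapping from difference ratio to priority; the function names are hypothetical.

```python
import numpy as np

def difference_ratio(frame_gray: np.ndarray, background_gray: np.ndarray,
                     pixel_threshold: int = 25) -> float:
    # Absolute per-pixel difference against the background frame, thresholded
    # to a binary value per pixel to remove noise; the result is the ratio of
    # changed pixels to the total number of pixels.
    diff = np.abs(frame_gray.astype(np.int16) - background_gray.astype(np.int16))
    return float((diff > pixel_threshold).mean())

def priority_from_difference(ratio: float, activity_threshold: float = 0.02) -> float:
    if ratio < activity_threshold:
        return 0.1                 # likely no activity in the frame: low priority
    return min(1.0, 0.5 + ratio)   # medium priority derived from the difference
```

The difference calculation against a previous frame described in [0032] follows the same shape, with the previous frame (and optionally per-channel thresholds) in place of the background frame.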
[0032] As another example of a difference detection stage, an image difference calculation is made against a previous frame with a known preprocessing analysis result. In an embodiment, the difference calculation includes obtaining the previous frame, taking the absolute difference between the frames on a pixel level, thresholding the difference on a pixel level (e.g., on one or more of the color components such as RGB or YUV) to derive a binary value per pixel to remove noise, and determining the amount of difference (e.g., the ratio of pixels above the threshold to the total number of pixels). If the amount of difference is below a threshold, then the frame priority is set lower, which indicates that there may not be activity in the frame. If the amount of the difference is above the threshold, then the frame priority is set at a medium level, possibly derived from the amount of difference. Alternatively, the priority is derived as a function of the difference.
[0033] In the example of Figure 1, the video processing system 100 performs the preprocessing difference detection stage 133 on the frames 113, 116, and 117. Each of the frames 113, 116, and 117 has a priority score that is updated at the end of this stage. At the frame level ranking 134, the video processing system 100 determines that the frame 117 is moved from the medium ranking category to the low-ranking category and the frames 113 and 116 remain in the medium ranking category. That is, the frame 117 in this example has a lower priority score compared to the frames 113 and 116. The frame 117 is dropped and is not processed by the video processing system 100.
[0034] In an embodiment, a preprocessing stage includes performing a Boolean object detection to determine whether an object of interest is included in the frame. The objects of interest depend on the context of the video processing application. For instance, an object of interest may be a human, a robot, a vehicle, wildlife, etc. There may be multiple objects of interest that can be detected in a frame. If the object of interest is detected in the frame, the priority of the frame is increased. If the object of interest is not detected in the frame, the priority of the frame is decreased. Alternatively, the priority value is changed as a function of whether the object of interest is included in the frame. The object detection may include using a machine learning object detection algorithm that uses a neural net. For example, the object detection may use an ML based image/object classification algorithm. In an example, the image classification algorithm includes transforming the input image pixels to an array of values between zero and one, typically related to the grayscale level of the pixel, and running the transformed input through a fully connected neural network layer with fewer output signals (e.g., 128), then through a fully connected neural network layer with one output per object class. This output may then correspond to the probability of that class being present in the image. The weights used in the neural network layer can be learned by supervised learning using images with known object classes.
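The classification step described above can be sketched as a plain NumPy forward pass; the layer sizes, the ReLU and softmax choices, the detection threshold, and the random weights are illustrative assumptions; in practice the weights would be learned by supervised training on images with known object classes:

```python
import numpy as np

rng = np.random.default_rng(0)

def classify(image, w1, b1, w2, b2):
    """Toy classifier of the kind described: flatten grayscale pixels to
    values in [0, 1], pass through a fully connected layer with fewer
    outputs (here 128), then a layer with one output per object class."""
    x = image.reshape(-1) / 255.0        # pixels as values between zero and one
    h = np.maximum(0.0, x @ w1 + b1)     # hidden fully connected layer (ReLU)
    logits = h @ w2 + b2                 # one output per object class
    e = np.exp(logits - logits.max())
    return e / e.sum()                   # probability of each class being present

def contains_object(image, params, class_index, threshold=0.5):
    """Boolean object detection: is the class probability above a threshold?"""
    return classify(image, *params)[class_index] > threshold

# Hypothetical 8x8 grayscale input and randomly initialised weights.
n_pixels, hidden, n_classes = 64, 128, 3
params = (rng.normal(0, 0.1, (n_pixels, hidden)), np.zeros(hidden),
          rng.normal(0, 0.1, (hidden, n_classes)), np.zeros(n_classes))
probs = classify(rng.integers(0, 256, (8, 8)), *params)
assert probs.shape == (n_classes,) and abs(probs.sum() - 1.0) < 1e-9
```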
[0035] In an embodiment, a preprocessing stage includes performing an occlusion detection that lowers the priority of frames that have content that is occluding an area of interest. For example, if a frame includes a vehicle parked in front of a camera obscuring the area of interest, that frame may have its priority score lowered.
[0036] In an embodiment, a preprocessing stage includes determining a distance to the center of the frame for an object of interest in the frame and assigning a priority score depending on that distance. For example, a frame that includes an object of interest towards the center of the frame may have a higher priority score than a frame that includes the object of interest on an edge of the frame.
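A sketch of such distance-to-center scoring, under the assumption that a detected object is represented by its pixel-coordinate center and that the priority falls off linearly from the frame center toward the corners:

```python
import math

def center_distance_priority(obj_center, frame_size):
    """Priority that decreases as the detected object moves away from the
    frame center. obj_center is (x, y) in pixels; frame_size is (w, h)."""
    cx, cy = frame_size[0] / 2.0, frame_size[1] / 2.0
    dist = math.hypot(obj_center[0] - cx, obj_center[1] - cy)
    max_dist = math.hypot(cx, cy)        # distance from the center to a corner
    return 1.0 - dist / max_dist         # 1.0 at the center, 0.0 at a corner

# An object at the center of a 640x480 frame outranks one near an edge.
assert center_distance_priority((320, 240), (640, 480)) == 1.0
assert center_distance_priority((320, 240), (640, 480)) > center_distance_priority((10, 10), (640, 480))
```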
[0037] In the example of Figure 1, the video processing system 100 performs the preprocessing object detection stage on the frames 113 and 116. Each of the frames 113 and 116 has a priority score that is updated at the end of this stage. At the frame level ranking 136, the video processing system 100 determines that the frame 116 is moved from the medium ranking category to the low-ranking category and the frame 113 is moved from the medium ranking category to the high-ranking category. That is, the frame 116 in this example has a lower priority score compared to the frame 113. The frame 116 is dropped and is not processed by the video processing application 137.
[0038] The frames that have been ranked as high priority (the frames 112, 115, and 113) are processed by the video processing application 137. The specific processing done in the video processing application 137 depends on the context of the use of the video solution. By way of example, if the video processing application 137 includes positioning humans in a floorplan, the video processing application 137 may analyze the frames 112, 115, and 113, find a human, and calculate a real-world position for the human. As another example, if the video processing application 137 includes positioning fiducial markers, transport robots, forklifts, vehicles, or other object(s), the video processing application 137 analyzes the frames 112, 115, and 113 to find the object(s) and calculate a real-world position for the object(s).

[0039] Although Figure 1 shows an order of preprocessing stages, the order and/or number of stages can be done differently in different embodiments. For example, the object detection stage can be performed prior to a difference detection stage. Further, any mix of preprocessing stages may be performed in an embodiment.
[0040] Although an example of video processing has been described, some embodiments of the solution may be used with other types of work items such as analyzing real-time logs/traces and fraud detection.
[0041] Figure 2 is a flow diagram for a video processing system prioritizing video frames of multiple video streams for application processing according to an embodiment. The operations described in Figure 2 are described with reference to the exemplary embodiment of Figure 1. However, the operations of Figure 2 can be performed by embodiments different from that of Figure 1, and the embodiment of Figure 1 can perform operations different from that of Figure 2.
[0042] At operation 210, the video processing system 100 receives multiple video frames from multiple video streams during a time window. Next, at operation 215, the video processing system 100 derives the number of frames that can be processed by the video processing application 137 based on the available video processing resources. For example, the estimated number of FPS that can be processed by the video processing application 137 is multiplied by the frame collection time window 130 to determine the number of frames that can be processed by the video processing application 137. The estimated number of FPS that can be processed may be based on benchmarking of the computational task. For example, a relatively large deep neural network (DNN) may have a relatively small FPS processing rate. The estimated number of FPS that can be processed may be a static value or dynamically updated based on previous processing.
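The capacity derivation described above can be sketched as follows; the FPS figures and the exponential-moving-average update rule are illustrative assumptions:

```python
def frames_processable(estimated_fps, window_seconds):
    """Derived number of frames the video processing application can handle
    in one frame collection time window."""
    return int(estimated_fps * window_seconds)

def update_fps_estimate(current_estimate, measured_fps, alpha=0.2):
    """Optional dynamic update: blend the benchmark-based estimate with the
    rate measured during previous processing (exponential moving average)."""
    return (1.0 - alpha) * current_estimate + alpha * measured_fps

# A hypothetical application benchmarked at 12 FPS with a 0.5 s collection window:
assert frames_processable(12, 0.5) == 6
# After observing 20 FPS during the last window, the estimate drifts upward:
assert abs(update_fps_estimate(10.0, 20.0) - 12.0) < 1e-9
```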
[0043] Next, at operation 220, the video processing system 100 determines whether the number of frames received is lower than the derived number of frames that can be processed by the video processing application 137. If it is, then operation 225 is performed where the video processing application 137 processes all the frames received in that time window. In an embodiment, the priority scoring and/or ranking of these frames is not performed in this situation. In another embodiment, even though all frames are processed in this situation, each frame is assigned a priority score and/or ranked for use in future rankings and/or other uses (e.g., load balancing, scaling). If, however, the number of frames received is not lower than the derived number of frames that can be processed by the video processing application 137, then operation 230 is performed.

[0044] At operation 230, the video processing system 100 performs a first ranking (e.g., a stream level ranking 132) for the frames received during the frame collection time window 130. The stream level ranking 132 is derived based on, for each stream, the priority score assigned to previous frame(s) of that stream. By way of example, if a previous frame was determined to have a high priority score (e.g., the frame included an object of interest), it is likely that a subsequent frame will also be determined to have a high priority score (e.g., include the same object of interest). The stream level ranking 132 ranks the frames received during the frame collection time window 130 into multiple categories such as a high rank category, a medium rank category, and a low rank category. Frames in the high rank category are processed by the video processing application 137. Frames in the medium rank category are further evaluated for ranking in one or more preprocessing stages. Frames in the low rank category are not processed by the video processing application 137.
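The stream level ranking can be sketched as follows; the score thresholds for the high and low categories, and the default score for streams with no prior history, are illustrative assumptions:

```python
def stream_level_rank(frames, prior_scores, high=0.7, low=0.3):
    """First ranking: place each new frame in a high/medium/low category
    based on the priority score assigned to previous frames of the same
    stream. frames maps stream id -> frame; prior_scores maps stream id ->
    last known priority score (a middle default for unscored streams)."""
    ranked = {"high": [], "medium": [], "low": []}
    for stream_id, frame in frames.items():
        score = prior_scores.get(stream_id, 0.5)
        if score >= high:
            ranked["high"].append(frame)    # processed by the application
        elif score <= low:
            ranked["low"].append(frame)     # not processed
        else:
            ranked["medium"].append(frame)  # evaluated in preprocessing stages
    return ranked

ranked = stream_level_rank({"cam1": "f112", "cam2": "f113", "cam3": "f118"},
                           {"cam1": 0.9, "cam2": 0.5, "cam3": 0.1})
assert ranked == {"high": ["f112"], "medium": ["f113"], "low": ["f118"]}
```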
[0045] Next, at operation 235, the video processing system 100 performs a set of one or more preprocessing stages to assign a priority score to each of the frames that are ranked in the medium rank category. The one or more preprocessing stages may include: performing a difference calculation for each frame against a background frame and assigning/updating a priority score for the frame based on the result, performing a difference calculation for each frame against a previous frame of the stream and assigning/updating a priority score for the frame based on the result, performing an object detection to determine whether a set of one or more object(s) of interest are included in the frame and assigning/updating a priority score for the frame based on the result, performing an occlusion detection to detect whether an object is obscuring an area of interest in the frame and assigning/updating a priority score for the frame based on the result, and/or determining a distance to the center of the frame for an object of interest in the frame and assigning/updating a priority score for the frame based on the result. One or more of these stages may be performed. After each stage, in an embodiment, the frames may be ranked, and frames that are in a low category are not processed by the video processing application 137 while frames that are in a medium category may be further processed in a subsequent preprocessing stage.
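The cascade of preprocessing stages with drop-out between stages can be sketched as follows; the stage functions and the low-category cutoff are illustrative assumptions:

```python
def cascade(frames_with_scores, stages, low=0.3):
    """Run medium-ranked frames through preprocessing stages in sequence.
    After each stage, frames whose updated score falls into the low
    category are dropped and skip all remaining (more expensive) stages."""
    survivors = dict(frames_with_scores)   # frame id -> priority score
    for stage in stages:
        survivors = {fid: stage(fid, score) for fid, score in survivors.items()}
        survivors = {fid: s for fid, s in survivors.items() if s > low}
    return survivors

# Two hypothetical stages: a difference stage that halves the score of
# frame "f117" (little change detected), then an object detection stage
# that boosts "f113" (object of interest found) and lowers the rest.
diff_stage = lambda fid, s: s * 0.5 if fid == "f117" else s
obj_stage = lambda fid, s: s + 0.4 if fid == "f113" else s * 0.5
out = cascade({"f113": 0.5, "f116": 0.5, "f117": 0.5}, [diff_stage, obj_stage])
assert "f117" not in out                   # dropped after the difference stage
assert list(out) == ["f113"]               # only f113 survives both stages
```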
[0046] At operation 240, a ranking of frames is done at the end of the preprocessing stage(s) based on the priority score assigned to the frames. The total number of frames selected for processing at the video processing application 137 does not exceed the derived number of frames that can be processed by the video processing application 137. Next, at operation 245, the video processing application 137 processes the selected frames.
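The final selection, capped at the derived capacity, can be sketched as:

```python
def select_for_processing(scored_frames, capacity):
    """Final ranking: keep the highest-scoring frames, never exceeding the
    derived number of frames the application can process in the window."""
    ordered = sorted(scored_frames, key=lambda fs: fs[1], reverse=True)
    return [frame for frame, _ in ordered[:capacity]]

# With capacity for two frames, the two highest-priority frames are kept.
assert select_for_processing([("f112", 0.9), ("f113", 0.8), ("f116", 0.2)], 2) == ["f112", "f113"]
```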
[0047] Figure 3 is a block diagram of a host 300 in accordance with various aspects described herein. As used herein, the host 300 may be or comprise various combinations of hardware and/or software, including a standalone server, a blade server, a cloud-implemented server, a distributed server, a virtual machine, container, or processing resources in a server farm. The host 300 may provide (alone or along with other devices such as other hosts) one or more services such as the cascade stages priority-based processing described herein.
[0048] The host 300 includes processing circuitry 302 that is operatively coupled via a bus 304 to an input/output interface 306, a network interface 308, a power source 310, and a memory 312. Other components may be included in other embodiments. The processing circuitry 302 is configured to process instructions and data and may be configured to implement any sequential state machine operative to execute instructions stored as machine-readable computer programs in the memory 312. The processing circuitry 302 may be implemented as one or more hardware-implemented state machines (e.g., in discrete logic, field-programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), etc.); programmable logic together with appropriate firmware; one or more stored computer programs, general-purpose processors, such as a microprocessor or digital signal processor (DSP), together with appropriate software; or any combination of the above. For example, the processing circuitry 302 may include multiple central processing units (CPUs).
[0049] In the example, the input/output interface 306 may be configured to provide an interface or interfaces to an input device, output device, or one or more input and/or output devices.
[0050] In some embodiments, the power source 310 is structured as a battery or battery pack. Other types of power sources, such as an external power source (e.g., an electricity outlet), photovoltaic device, or power cell, may be used. The power source 310 may further include power circuitry for delivering power from the power source 310 itself, and/or an external power source, to the various parts of the host 300 via input circuitry or an interface such as an electrical power cable. Power circuitry may perform any formatting, converting, or other modification to the power from the power source 310 to make the power suitable for the respective components of the host 300 to which power is supplied.
[0051] The processing circuitry 302 may be configured to communicate with an access network or other network using the network interface 308. The network interface 308 may comprise one or more communication subsystems. The network interface 308 may include one or more transceivers used to communicate wirelessly or through a wired connection with other network devices across a network.
[0052] The memory 312 may be or be configured to include memory such as random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), magnetic disks, optical disks, hard disks, removable cartridges, flash drives, and so forth. The memory 312 may include one or more computer programs including one or more host application programs 314 and data 316. Embodiments of the host 300 may utilize only a subset of the components shown. The host application programs 314 may be implemented in a container-based architecture. As an example, the host application programs 314 may provide functionality for the cascade stages priority-based processing described herein.

[0053] The memory 312 may be configured to include a number of physical drive units, such as redundant array of independent disks (RAID), flash memory, USB flash drive, external hard disk drive, thumb drive, pen drive, key drive, high-density digital versatile disc (HD-DVD) optical disc drive, internal hard disk drive, Blu-Ray optical disc drive, holographic digital data storage (HDDS) optical disc drive, external mini-dual in-line memory module (DIMM), synchronous dynamic random access memory (SDRAM), external micro-DIMM SDRAM, other memory, or any combination thereof. The memory 312 may allow the host 300 to access instructions, host application programs and the like, stored on transitory or non-transitory memory media, to off-load data, or to upload data. An article of manufacture, such as one utilizing a communication system, may be tangibly embodied as or in the memory 312, which may be or comprise a device-readable storage medium.
[0054] Figure 4 is a block diagram illustrating a virtualization environment 400 in which functions implemented by some embodiments may be virtualized. In the present context, virtualizing means creating virtual versions of apparatuses or devices, which may include virtualizing hardware platforms, storage devices, and networking resources. As used herein, virtualization can be applied to any device described herein, or components thereof, and relates to an implementation in which at least a portion of the functionality is implemented as one or more virtual components. Some or all the functions described herein may be implemented as virtual components executed by one or more virtual machines (VMs) implemented in one or more virtual environments 400 hosted by one or more hardware nodes, such as a hardware computing device that operates as a host.
[0055] Applications 402 (which may alternatively be called software instances, virtual appliances, network functions, virtual nodes, virtual network functions, etc.) are run in the virtualization environment 400 to implement some of the features, functions, and/or benefits of some of the embodiments disclosed herein.
[0056] Hardware 404 includes processing circuitry, memory that stores software and/or instructions executable by hardware processing circuitry, and/or other hardware devices as described herein, such as a network interface, input/output interface, and so forth. Software may be executed by the processing circuitry to instantiate one or more virtualization layers 406 (also referred to as hypervisors or virtual machine monitors (VMMs)), provide VMs 408a and 408b (one or more of which may be generally referred to as VMs 408), and/or perform any of the functions, features and/or benefits described in relation with some embodiments described herein. The virtualization layer 406 may present a virtual operating platform that appears like networking hardware to the VMs 408.
[0057] The VMs 408 comprise virtual processing, virtual memory, virtual networking or interface and virtual storage, and may be run by a corresponding virtualization layer 406. Different embodiments of the instance of a virtual appliance 402 may be implemented on one or more of VMs 408, and the implementations may be made in different ways.
[0058] Hardware 404 may be implemented in a standalone network node with generic or specific components. Hardware 404 may implement some functions via virtualization.
Alternatively, hardware 404 may be part of a larger cluster of hardware (e.g., such as in a data center) where many hardware nodes work together and are managed via management and orchestration 410, which, among other things, oversees lifecycle management of applications 402.

[0059] Although the computing devices described herein (e.g., hosts) may include the illustrated combination of hardware components, other embodiments may comprise computing devices with different combinations of components. It is to be understood that these computing devices may comprise any suitable combination of hardware and/or software needed to perform the tasks, features, functions and methods disclosed herein. Determining, calculating, obtaining or similar operations described herein may be performed by processing circuitry, which may process information by, for example, converting the obtained information into other information, comparing the obtained information or converted information to information stored in the network node, and/or performing one or more operations based on the obtained information or converted information, and as a result of said processing making a determination. Moreover, while components are depicted as single boxes located within a larger box, or nested within multiple boxes, in practice, computing devices may comprise multiple different physical components that make up a single illustrated component, and functionality may be partitioned between separate components. For example, a communication interface may be configured to include any of the components described herein, and/or the functionality of the components may be partitioned between the processing circuitry and the communication interface. In another example, non-computationally intensive functions of any of such components may be implemented in software or firmware and computationally intensive functions may be implemented in hardware.
[0060] In certain embodiments, some or all the functionality described herein may be provided by processing circuitry executing instructions stored in memory, which in certain embodiments may be a computer program product in the form of a non-transitory computer-readable storage medium. In alternative embodiments, some or all of the functionality may be provided by the processing circuitry without executing instructions stored on a separate or discrete device-readable storage medium, such as in a hard-wired manner. In any of those particular embodiments, whether executing instructions stored on a non-transitory computer-readable storage medium or not, the processing circuitry can be configured to perform the described functionality. The benefits provided by such functionality are not limited to the processing circuitry alone or to other components of the computing device, but are enjoyed by the computing device as a whole, and/or by end users and a wireless network generally.
EMBODIMENTS
Group A Embodiments
1. A method performed by a video processing system (100), the method comprising: receiving, during a time window, a plurality of video frames (112, 113, 115, 116, 117, 118) from a plurality of video streams (102, 103, 105, 106, 107, 108); deriving a number of video frames that can be processed by a video processing application (137) of the video processing system (100) based on available video processing resources; determining that the derived number of video frames that can be processed by the video processing application (137) is less than the received plurality of video frames (112, 113, 115, 116, 117, 118); performing a first ranking of the plurality of video frames (112, 113, 115, 116, 117, 118) based on, for each of the plurality of video streams (102, 103, 105, 106, 107, 108), a set of one or more priority scores assigned to a set of one or more video frames of that stream received in a prior time window, wherein the first ranking includes a first set of one or more of the plurality of video frames (112, 115) that are selected for processing by the video processing application (137), a second set of one or more of the plurality of video frames (113, 116, 117) selected for further preprocessing in a set of one or more preprocessing stages (133, 135), and a third set of one or more of the plurality of video frames (118) that are not selected for processing by the video processing application (137); performing the set of one or more preprocessing stages (133, 135) to assign a priority score to each of the second set of one or more of the plurality of video frames (113, 116, 117); performing a second ranking of the second set of one or more of the plurality of video frames (113, 116, 117) based on the priority score assigned to each of the second set of one or more of the plurality of video frames (113, 116, 117), wherein the second ranking includes selecting at least one of the second set of one or more of the plurality of video frames (113) for processing by 
the video processing application (137), and wherein remaining ones of the second set of one or more of the plurality of video frames (116, 117) are not selected for processing by the video processing application (137); and processing, at the video processing application (137), the selected video frames (112, 115, 113).

2. The method of embodiment 1, wherein deriving the number of video frames that can be processed by the video processing application (137) includes multiplying an estimated number of frames per second that can be processed by the video processing application (137) by the time window.

3. The method of any previous embodiment, wherein the selected video frames (112, 115, 113) for processing at the video processing application (137) do not exceed the derived number of video frames that can be processed by the video processing application (137).

4. The method of any previous embodiment, wherein the set of one or more preprocessing stages includes a difference detection stage that determines, for each of the second set of one or more of the plurality of video frames (113, 116, 117), a difference detection against a background frame, and wherein video frames that have a relatively higher difference have a higher priority value compared to video frames with a relatively lower difference.

5. The method of any previous embodiment, wherein the set of one or more preprocessing stages includes a difference detection stage that determines, for each of the second set of one or more of the plurality of video frames (113, 116, 117), a difference calculation against a previous frame of the video stream of that video frame, and wherein video frames that have a relatively higher difference have a higher priority value compared to video frames with a relatively lower difference.

6. The method of any previous embodiment, wherein the set of one or more preprocessing stages includes an object detection stage that determines whether, for each of the second set of one or more of the plurality of video frames (113, 116, 117), that video frame includes an object of interest, wherein video frames that include the object of interest have a higher priority value than video frames that do not include the object of interest.
7. The method of any previous embodiment, further comprising storing, for each of the streams corresponding to the second set of one or more of the plurality of video frames, the priority score assigned to that video frame.
8. The method of any previous embodiment, wherein the priority score associated with each of the plurality of video streams moves to the middle over time.
Group B Embodiments
9. A host for prioritizing video frames of multiple video streams for application processing, comprising: processing circuitry configured to perform any of the steps of any of the Group A embodiments; and power supply circuitry configured to supply power to the processing circuitry.
10. A non-transitory computer-readable storage medium that provides instructions that, if executed by a processor, will cause said processor to perform any of the steps of any of the Group A embodiments.
11. An electronic device for prioritizing video frames of multiple video streams for application processing, comprising: a processor; and a non-transitory computer-readable storage medium that provides instructions that, if executed by the processor, cause the electronic device to perform any of the steps of any of the Group A embodiments.
12. A machine-readable medium comprising computer program code which when executed by an electronic device carries out any of the steps of any of the Group A embodiments.
13. A system for prioritizing video frames of multiple video streams for application processing, comprising: a non-transitory computer-readable storage medium that provides instructions that, if executed by a processor, will cause said processor to perform any of the steps of any of the Group A embodiments.

REFERENCES
Kang et al., “NoScope: Optimizing Neural Network Queries Over Video at Scale,” Proceedings of the VLDB Endowment, 10(11), 2017.
Thuan, Do, “Evolution of YOLO Algorithm and YOLOv5: The State-of-the-Art Object Detection Algorithm,” Bachelor’s Thesis, Oulu University of Applied Sciences, 2021.

CLAIMS

What is claimed is:
1. A method performed by a video processing system (100), the method comprising: receiving, during a time window, a plurality of video frames (112, 113, 115, 116, 117, 118) from a plurality of video streams (102, 103, 105, 106, 107, 108); deriving a number of video frames that can be processed by a video processing application (137) of the video processing system (100) based on available video processing resources; determining that the derived number of video frames that can be processed by the video processing application (137) is less than the received plurality of video frames (112, 113, 115, 116, 117, 118); performing a first ranking of the plurality of video frames (112, 113, 115, 116, 117, 118) based on, for each of the plurality of video streams (102, 103, 105, 106, 107, 108), a set of one or more priority scores assigned to a set of one or more video frames of that stream received in a prior time window, wherein the first ranking includes a first set of one or more of the plurality of video frames (112, 115) that are selected for processing by the video processing application (137), a second set of one or more of the plurality of video frames (113, 116, 117) selected for further preprocessing in a set of one or more preprocessing stages (133, 135), and a third set of one or more of the plurality of video frames (118) that are not selected for processing by the video processing application (137); performing the set of one or more preprocessing stages (133, 135) to assign a priority score to each of the second set of one or more of the plurality of video frames (113, 116, 117); performing a second ranking of the second set of one or more of the plurality of video frames (113, 116, 117) based on the priority score assigned to each of the second set of one or more of the plurality of video frames (113, 116, 117), wherein the second ranking includes selecting at least one of the second set of one or more of the plurality of video frames (113) for processing by 
the video processing application (137), and wherein remaining ones of the second set of one or more of the plurality of video frames (116, 117) are not selected for processing by the video processing application (137); and processing, at the video processing application (137), the selected video frames (112, 115, 113).
2. The method of claim 1, wherein deriving the number of video frames that can be processed by the video processing application (137) includes multiplying an estimated number of frames per second that can be processed by the video processing application (137) by the time window.
3. The method of any previous claim, wherein the selected video frames (112, 115, 113) for processing at the video processing application (137) do not exceed the derived number of video frames that can be processed by the video processing application (137).
4. The method of any previous claim, wherein the set of one or more preprocessing stages includes a difference detection stage that determines, for each of the second set of one or more of the plurality of video frames (113, 116, 117), a difference detection against a background frame, and wherein video frames that have a relatively higher difference have a higher priority value compared to video frames with a relatively lower difference.
5. The method of any previous claim, wherein the set of one or more preprocessing stages includes a difference detection stage that determines, for each of the second set of one or more of the plurality of video frames (113, 116, 117), a difference calculation against a previous frame of the video stream of that video frame, and wherein video frames that have a relatively higher difference have a higher priority value compared to video frames with a relatively lower difference.
6. The method of any previous claim, wherein the set of one or more preprocessing stages includes an object detection stage that determines whether, for each of the second set of one or more of the plurality of video frames (113, 116, 117), that video frame includes an object of interest, wherein video frames that include the object of interest have a higher priority value than video frames that do not include the object of interest.
7. The method of any previous claim, further comprising storing, for each of the streams corresponding to the second set of one or more of the plurality of video frames, the priority score assigned to that video frame.
8. The method of any previous claim, wherein the priority score associated with each of the plurality of video streams moves to the middle over time.
9. A non-transitory computer-readable storage medium that provides instructions that, if executed by a processor, will cause said processor to perform any of the steps of any of claims 1-8.
10. An electronic device for prioritizing video frames of multiple video streams for application processing, comprising: a processor; and a non-transitory computer-readable storage medium that provides instructions that, if executed by the processor, cause the electronic device to perform any of the steps of any of claims 1-8.
11. A machine-readable medium comprising computer program code which when executed by an electronic device carries out any of the steps of any of claims 1-8.
12. A system for prioritizing video frames of multiple video streams for application processing, comprising: a non-transitory computer-readable storage medium that provides instructions that, if executed by a processor, will cause said processor to perform any of the steps of any of claims 1-8.
PCT/IB2022/057387 2022-06-07 2022-08-08 Cascade stages priority-based processing WO2023237919A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263349939P 2022-06-07 2022-06-07
US63/349,939 2022-06-07

Publications (1)

Publication Number Publication Date
WO2023237919A1 (en) 2023-12-14

Family

ID=83283218

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2022/057387 WO2023237919A1 (en) 2022-06-07 2022-08-08 Cascade stages priority-based processing

Country Status (1)

Country Link
WO (1) WO2023237919A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080184245A1 (en) * 2007-01-30 2008-07-31 March Networks Corporation Method and system for task-based video analytics processing
US20110188701A1 (en) * 2010-02-01 2011-08-04 International Business Machines Corporation Optimizing video stream processing
US20180052711A1 (en) * 2015-03-13 2018-02-22 Hangzhou Hikvision Digital Technology Co., Ltd. Method and system for scheduling video analysis tasks


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KANG ET AL.: "NoScope: Optimizing Neural Network Queries Over Video at Scale", PROCEEDINGS OF THE VLDB ENDOWMENT, vol. 10, no. 11, 2017, XP055571451, DOI: 10.14778/3137628.3137664
THUAN, DO: "Bachelor's Thesis", 2021, OULU UNIVERSITY OF APPLIED SCIENCES, article "Evolution of yolo algorithm and yolov5: the state-of-the-art object detection algorithm"

Similar Documents

Publication Publication Date Title
CN106557778B (en) General object detection method and device, data processing device and terminal equipment
US20180150746A1 (en) Feature Set Determining Method and Apparatus
US8756174B2 (en) Forward feature selection for support vector machines
US11501162B2 (en) Device for classifying data
EP4080416A1 (en) Adaptive search method and apparatus for neural network
US11538237B2 (en) Utilizing artificial intelligence to generate and update a root cause analysis classification model
CN111160469A (en) Active learning method of target detection system
US9299034B2 (en) Predicting change-points in continuous time-series data using two classifier stages
Hu et al. On exploring image resizing for optimizing criticality-based machine perception
US10997748B2 (en) Machine learning model development with unsupervised image selection
US20120057751A1 (en) Particle Tracking Methods
CA3135954A1 (en) Code generation for deployment of a machine learning model
WO2023237919A1 (en) Cascade stages priority-based processing
US20230031755A1 (en) Generative adversarial network for processing and generating images and label maps
CN112099889A (en) Information display method, equipment, device and storage medium
CN116521350A (en) ETL scheduling method and device based on deep learning algorithm
KR101623113B1 (en) Apparatus and method for learning and classification of decision tree
CN115984671A (en) Model online updating method and device, electronic equipment and readable storage medium
CN111428886B (en) Method and device for adaptively updating deep learning model of fault diagnosis
CN107316313A (en) Scene Segmentation and equipment
CN111783655A (en) Image processing method and device, electronic equipment and storage medium
US20230111043A1 (en) Determining a fit-for-purpose rating for a target process automation
CN114020355B (en) Object loading method and device based on cache space
KR102407263B1 (en) Neuromorphic Memory Management System and Data Operation Method thereof
US20230176762A1 (en) Object storage system, migration control device, and migration control method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22769364

Country of ref document: EP

Kind code of ref document: A1