US20180082130A1 - Foreground detector for video analytics system - Google Patents
- Publication number: US20180082130A1 (application US15/582,524)
- Authority: US (United States)
- Prior art keywords: foreground, background, pixel, pixels, video
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06K9/00771
- G06K9/00711
- G06K9/6267
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/136—Segmentation; Edge detection involving thresholding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/54—Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
Definitions
- Embodiments of the invention generally relate to techniques for analyzing digital images. More specifically, embodiments presented herein provide a variety of techniques for effectively and efficiently segmenting foreground and background elements in a stream of video frames captured by a video camera trained on a scene.
- Video analytics generally refers to applications that evaluate digital image data, and a variety of approaches have been developed to programmatically evaluate a video stream. For example, some video analytics systems may be configured to detect a set of pre-defined patterns in a video stream. Many video analytics applications generate a background model to evaluate a video stream.
- A background model generally represents static elements of a scene within the field of view of a video camera. For example, consider a video camera trained on a stretch of roadway. In such a case, the background would include the roadway surface, the medians, any guard rails or other safety devices, and traffic control devices visible to the camera.
- The background model may include an expected (or predicted) pixel value (e.g., an RGB or grayscale value) for each pixel of the scene when the background is visible to the camera.
- That is, the background model provides a predicted image of the scene in which no activity is occurring (e.g., an empty roadway). Conversely, vehicles traveling on the roadway (and any other person or thing engaging in some activity) occlude the background when visible to the camera and represent scene foreground objects.
- A background model needs to support segmenting scene foreground and background at or near the frame rate of the video analytics system. That is, the system should be able to segment foreground from background for each frame (or every N frames) dynamically while processing a live video feed.
- However, the video channel may be noisy or include compression artifacts.
- Further, the nature of the scene itself can make it difficult to generate and maintain an accurate background model. For example, ambient lighting levels can change suddenly, resulting in large groups of pixels being misclassified as depicting foreground. In these cases, it becomes difficult to classify any given pixel from frame to frame as depicting background or foreground (e.g., due to pixel color fluctuations caused by camera noise or lighting changes).
- A background model also needs to respond to gradual changes in scene lighting.
- Further, some elements of a scene that would preferably be categorized as background can be detected as foreground objects, e.g., a traffic light changing from green to yellow to red or an elevator door opening and closing.
- Such changes can result in elements of the traffic light (as captured in pixel data) being incorrectly classified as depicting scene foreground.
- Other examples of a dynamic background include periodic motion, such as a waterfall, ocean waves, or tree branches bending in a breeze. While these changes in the scene are visually apparent as changes in pixel color from frame to frame, they should not result in the pixels being classified as elements of scene foreground. Further, as objects enter the scene, they may effectively become part of the scene background (e.g., when a car parks in a parking spot). Because other components in a video analytics system may track each foreground object from frame to frame, such false or stale foreground objects waste processing resources and can disrupt other analytics components that rely on an accurate segmentation of scene foreground and background.
- One embodiment includes a computer-implemented method for generating a background model of a scene depicted in a sequence of video frames captured by a video camera.
- This method may include receiving a video frame, wherein the video frame includes one or more appearance values for each of a plurality of pixels and classifying each pixel as depicting either foreground or background by comparing the one or more appearance values of each pixel to a background model of the scene.
- This method may also include performing one or more context based evaluations on one or more of the pixels classified as depicting foreground, wherein the context based evaluations selectively reclassify one or more of the pixels as depicting foreground or background based on the classification of other pixels in the video frame as depicting either foreground or background.
- Other embodiments include, without limitation, a computer-readable medium that includes instructions that enable a processing unit to implement one or more aspects of the disclosed methods, as well as a system having a processor, memory, and application programs configured to implement one or more aspects of the disclosed methods.
- FIG. 1 illustrates components of a video analytics system, according to one embodiment of the invention.
- FIG. 2 further illustrates components of the video analytics system shown in FIG. 1, according to one embodiment.
- FIG. 3 illustrates a method for segmenting scene foreground and background using a combined pixel-based and context-based evaluation, according to one embodiment.
- FIG. 4 illustrates a method for generating a candidate BG/FG mask using a pixel based component, according to one embodiment of the invention.
- FIG. 5 illustrates examples of image data generated for a current frame, according to one embodiment.
- FIG. 6 illustrates a method for refining the candidate BG/FG mask using a context based component, according to one embodiment of the invention.
- FIG. 7 illustrates a method to update a background model, according to one embodiment of the invention.
- FIG. 8 illustrates an example of a computing server that includes a video analytics system, according to one embodiment of the invention.
- Embodiments of the invention presented herein provide a robust background detector for a video analytics system. More specifically, embodiments of the invention provide techniques for generating and maintaining a background model from image data provided by a video stream.
- The background detector is generally configured to generate a background model of a scene captured in recorded video.
- The background model includes a background image, which has color channel (e.g., RGB) values or grayscale brightness values for each pixel in the background image.
- For each new frame, the background detector evaluates pixels to determine whether, in that frame, a pixel depicts an element of background or foreground.
- The background detector may create a background/foreground (BG/FG) mask corresponding to the frame.
- For example, the BG/FG mask may depict every foreground pixel as white (e.g., 255 in an 8-bit grayscale) and every background pixel as black (e.g., 0 in an 8-bit grayscale).
- The background detector evaluates a frame using a pixel based component and a context based component.
- The pixel based component compares each pixel in the current frame with the corresponding pixel in the background model. Based on a distance between the two pixels, the background detector classifies the pixel as depicting either foreground or background.
- The background model may also include a mean and variance determined, per pixel, from the observed distribution of color values received for a given pixel (e.g., distributions determined per color channel for each pixel).
- In such a case, the distance between a pixel in a current frame and the corresponding pixel in the background model may be determined as a measure of the distance between the pixel color values (e.g., each of the R, G, B values) and the observed distributions, such as a Mahalanobis distance. If the distance exceeds a predefined threshold, the pixel based component sets the pixel as depicting foreground. Otherwise, the pixel based component sets the pixel as depicting background.
- However, a constant BG/FG threshold may be ineffective for determining whether a given pixel depicts foreground or background. Accordingly, as described below, the threshold may be updated dynamically using a camera noise model. After performing the distance comparisons, the pixel based component provides a candidate BG/FG mask, where each pixel has an assigned background or foreground state.
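As a hedged illustration of the pixel based test, the sketch below classifies one pixel against a per-channel mean and variance. It assumes the color channels are independent (a diagonal covariance), so the Mahalanobis distance reduces to a variance-normalized Euclidean distance, and it uses the 255/0 mask convention described above. The names `mahalanobis_diag` and `classify_pixel`, the variance floor `eps`, and the example threshold are hypothetical.

```python
from math import sqrt

FOREGROUND, BACKGROUND = 255, 0  # 8-bit BG/FG mask values


def mahalanobis_diag(rgb, mean, var, eps=1e-6):
    """Distance of an RGB sample from a per-channel Gaussian model.

    Assumes independent channels (diagonal covariance), so the Mahalanobis
    distance reduces to a variance-normalized Euclidean distance. eps guards
    against a zero variance.
    """
    return sqrt(sum((x - m) ** 2 / max(v, eps) for x, m, v in zip(rgb, mean, var)))


def classify_pixel(rgb, mean, var, threshold):
    """Return FOREGROUND if the sample lies farther than threshold from the model."""
    if mahalanobis_diag(rgb, mean, var) > threshold:
        return FOREGROUND
    return BACKGROUND


# Usage: one pixel close to its background mean, one far from it.
mean, var = (120.0, 110.0, 100.0), (16.0, 16.0, 16.0)
print(classify_pixel((122, 111, 99), mean, var, threshold=3.0))  # -> 0
print(classify_pixel((200, 40, 30), mean, var, threshold=3.0))   # -> 255
```

A full detector would apply this test to every pixel of the frame to build the candidate BG/FG mask, with a per-pixel (possibly dynamic) threshold.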
- The context based component then refines the candidate BG/FG mask.
- The context based component may perform a series of morphological operations on pixels assigned as foreground. For example, it may dilate foreground pixels in the candidate BG/FG mask. Doing so converts nearby pixels assigned as background to foreground, particularly where multiple dilated pixels overlap a pixel classified as background.
- Similarly, the context based component may erode foreground pixels. Doing so may convert small areas of foreground to background, e.g., in cases where the foreground assignment was the result of a compression or noise artifact.
- The resulting BG/FG mask is used to identify contiguous regions of foreground in the current frame. Each such region represents a foreground object, or “blob,” in the current frame.
- The context based component compares foreground objects in the current frame (each represented by a defined region of pixels) with the corresponding region of pixels in the background image, i.e., the mean image maintained by the background model. For example, the context based component may determine a normalized cross-correlation between the two groups of pixels. If the pixels classified as foreground have changed in a similar manner relative to one another (i.e., the pixel colors have all shifted in a similar manner), then the context based component may reclassify the entire blob as background. Doing so may address lighting changes that result in a region being misclassified as foreground, without the computational expense of maintaining multi-state background models. The remaining foreground objects are treated as “true” foreground by the video analytics system.
- After identifying a group of foreground objects in the current frame, the video analytics system updates the background model based on the pixel values of the current frame and on the determination of foreground and background. For pixels in the current frame assigned as depicting background, the color channel values of such pixels are used to incrementally update the mean and variance of the corresponding pixels in the background model. In one embodiment, exponential weights are used to give more weight to recent background samples than to older samples. That is, the color values of the most recent frames make a greater contribution to the pixel values of the background model.
- In addition, the color values of pixels classified as depicting foreground are absorbed into the background model. That is, with each frame, the raw color values of each foreground pixel contribute to an update of the background model for that frame. Specifically, the values of a pixel classified as foreground are used to update the mean associated with that pixel in the background model, but the variance remains unchanged. For pixels in the current frame detected as foreground, the pixel value is absorbed into the mean of the corresponding pixel in the background model at a rate based on an observed likelihood (i.e., a frequency) of that pixel being classified as foreground over a recent-history window and on a set of user-specified parameters. The higher the frequency at which a pixel is classified as foreground, the lower the absorption rate.
- Thus, foreground objects may be said to be slowly absorbed into the background model. Once absorbed, pixels in subsequent frames that are now classified as background are used to update both the mean and the variance of the corresponding pixels in the background model. Doing so allows the background model to “pull” elements of foreground into background, preventing “stale” foreground objects from interfering with the video analytics system.
- For example, when a car enters a parking lot, the video analytics system may classify the pixels in which the car appears as depicting foreground. While the car moves through the parking lot, particularly along a roadway where other cars frequently appear as well, the absorption rate is lower relative to other regions within the camera's field of view (e.g., regions depicting a parking stall). When the car pulls into a parking stall and stops moving, the color values of its pixels are pulled into the background more rapidly, as such pixels may have a low frequency of being classified as foreground. That is, the mean of the pixels depicting the car may change more quickly to absorb the car as a new part of the background state.
- Identifying foreground and background using both pixel based and context based evaluations provides an effective technique for segmenting scene foreground from background in a video stream. Further, this approach can scale to process large numbers of camera feeds simultaneously, e.g., using parallel processing architectures. Further still, incrementally updating the mean and variance of pixels in the background model, absorbing foreground pixels into the background model via an absorption window, and dynamically updating the background/foreground thresholds used by the pixel based component collectively allow the video analytics system to respond effectively and efficiently to changes in a scene without overly increasing computational complexity. Thus, embodiments presented herein can detect scene foreground and background within the constraints required to process a video feed in real time for a large number of cameras.
- FIG. 1 illustrates a network computing environment 100, according to one embodiment of the invention.
- As shown, the network computing environment 100 includes a video camera 105, a network 110, and a server computer system 115.
- The network 110 may transmit video data recorded by the video camera 105 to the server system 115.
- Alternatively, the video camera 105 could be connected to the server system 115 directly (e.g., via USB or another form of connecting cable).
- Network 110 receives video data (e.g., video stream(s), video images, or the like) from the video camera 105.
- The server system 115 could also receive a video stream from other input sources, e.g., a VCR, DVR, DVD, computer, web-cam device, or the like.
- For example, the video camera 105 may be one of multiple video surveillance cameras 105 used to monitor an enterprise campus.
- In such a case, each video camera 105 would be trained on a certain area (e.g., a parking lot, a roadway, a building entrance, etc.).
- Further, each video camera 105 would provide a streaming video feed analyzed independently by the server system 115.
- The area visible to the video camera 105 is referred to as the “scene.”
- The video camera 105 may be configured to record the scene as a sequence of individual video frames at a specified frame rate (e.g., 24 frames per second), where each frame includes a fixed number of pixels (e.g., 320×240).
- Each pixel of each frame may specify a color value (e.g., an RGB value) or a grayscale value (e.g., a radiance value between 0 and 255).
- The video stream may be encoded using known formats, e.g., MPEG2, MJPEG, MPEG4, H.263, H.264, and the like.
- The server system 115 includes video analytics components 120 (e.g., hardware and software applications) used to analyze the video stream received from the video camera 105.
- The video analytics components 120 may be configured to classify foreground objects and to derive metadata describing the appearance, actions, and/or interactions of such objects (based on changes in pixel color values from frame to frame).
- The resulting video analytics metadata may be used for a variety of applications.
- For example, the output of the video analytics components 120 may be supplied to a machine-learning engine 125.
- The machine-learning engine 125 may be configured to evaluate, observe, learn, and remember details regarding events (and types of events) that occur within the scene. When observations differ from the learned behavior, the system can generate an alert.
- The video analytics component 120 may normalize the metadata derived from observations of foreground objects into numerical values (e.g., values falling within a range from 0 to 1 with respect to a given data type).
- For example, the metadata could include values for multiple features of each foreground object (e.g., values for height and width in pixels, color, shape, appearance features, etc.).
- Each value type could be modeled as a statistical distribution between 0 and 1.
- The video analytics component 120 then packages the resulting normalized values as a feature vector.
- The resulting feature vector of each foreground object is then provided to the machine learning components 125 for each frame.
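A minimal sketch of the normalization and packaging step, assuming simple min-max normalization with fixed per-feature ranges. The feature names and ranges (based on the 320×240 frame size mentioned earlier) are hypothetical, and the sketch does not model each value type as a statistical distribution, as the text suggests it could.

```python
def normalize(value, lo, hi):
    """Min-max normalize value into [0, 1], clamping values outside [lo, hi]."""
    if hi <= lo:
        raise ValueError("hi must exceed lo")
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


# HYPOTHETICAL per-feature ranges for a 320x240 frame.
FEATURE_RANGES = {"width": (0, 320), "height": (0, 240), "area": (0, 320 * 240)}


def feature_vector(object_id, raw):
    """Package normalized features (in sorted key order) together with the object ID."""
    return object_id, [normalize(raw[k], *FEATURE_RANGES[k]) for k in sorted(FEATURE_RANGES)]


print(feature_vector(7, {"width": 80, "height": 120, "area": 6000}))
# -> (7, [0.078125, 0.5, 0.25])   (order: area, height, width)
```

Using a fixed key order keeps each position in the vector tied to the same feature from frame to frame, which downstream clustering relies on.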
- In one embodiment, the machine learning components 125 include a neuro-linguistic module that performs neural network-based linguistic analysis of the feature vectors. To generate the model, the machine learning components 125 receive normalized data values and organize the vectors into clusters. Further, the neuro-linguistic module may assign a symbol, e.g., a letter, to each cluster that reaches some measure of statistical significance. From the letters, the neuro-linguistic module builds a dictionary of observed combinations of symbols, i.e., words, based on a statistical distribution of symbols identified in the input data. Specifically, the neuro-linguistic module may identify patterns of symbols in the input data at different frequencies of occurrence, up to a maximum word size (e.g., 5 letters).
- The most frequently observed words provide a dictionary of words corresponding to the video stream.
- Using words from the dictionary, the neuro-linguistic module generates phrases based on probabilistic relationships of each word occurring in sequence relative to other words, up to a maximum phrase length. For example, the neuro-linguistic module may identify a relationship between a given three-letter word that frequently appears in sequence with a given four-letter word, and so on.
- The resulting syntax allows the machine learning components 125 to learn, identify, and recognize patterns of behavior without the aid or guidance of predefined activities.
- Thus, the machine learning components 125 learn patterns by generalizing input and building memories of what is observed. Over time, the machine learning components 125 use these memories to distinguish between normal and anomalous behavior reflected in observed data.
- For instance, the neuro-linguistic module builds letters, words (nouns, adjectives, verbs, etc.), and phrases, and estimates an “unusualness score” for each identified letter, word, or phrase.
- The unusualness score indicates how infrequently the letter, word, or phrase has occurred relative to past observations.
- The behavior recognition system may use the unusualness scores both to identify and to measure how unusual a current syntax is relative to a stable model of symbols (i.e., letters), a stable model of words built from the symbols (i.e., the dictionary), and a stable model of phrases built from the words (i.e., the syntax), collectively the neuro-linguistic model.
- The neuro-linguistic module may decay, reinforce, and generate letters, words, and syntax phrases over time.
- That is, the neuro-linguistic module “learns on-line” as new data is received and occurrences increase, decrease, or appear.
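To make the dictionary and unusualness ideas concrete, the sketch below counts every contiguous run of symbols up to the maximum word size (5 letters) in a symbol stream, and scores a word by how rarely it occurs relative to the most common word of the same length. It assumes the clustering step has already mapped feature vectors to single-character letters; the scoring rule and function names are hypothetical stand-ins, and the neuro-linguistic module described above is considerably richer (significance tests, phrases, decay, and reinforcement).

```python
from collections import Counter


def count_words(symbols, max_word_size=5):
    """Count every contiguous run of symbols (a candidate word) up to max_word_size."""
    counts = Counter()
    for start in range(len(symbols)):
        for size in range(1, max_word_size + 1):
            if start + size > len(symbols):
                break
            counts[symbols[start:start + size]] += 1
    return counts


def unusualness(word, counts):
    """Hypothetical score in [0, 1]: 0 for the most common word of its length, 1 if unseen."""
    peers = [c for w, c in counts.items() if len(w) == len(word)]
    if not peers:
        return 1.0
    return 1.0 - counts[word] / max(peers)


# Usage: a symbol stream in which "AB" recurs and "BC" appears once.
counts = count_words("ABABABC")
print(counts["AB"], counts["BA"], counts["BC"])              # -> 3 2 1
print(unusualness("AB", counts), unusualness("CA", counts))  # -> 0.0 1.0
```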
- The video analytics component 120 and machine-learning components 125 both process video data in real time.
- However, the time scales for processing information by the video analytics component 120 and the machine-learning component 125 may differ.
- For example, the video analytics component 120 processes video data frame by frame, while the machine-learning component 125 processes data every N frames.
- FIG. 1 illustrates merely one possible arrangement of a network computing environment 100 that includes a video analytics component 120.
- For example, although the video camera 105 is shown connected to the server system 115 via the network 110, the video camera 105 could also be connected directly to the server system 115.
- Further, various components and modules of the server system 115 may be implemented in other systems.
- For example, the video analytics component 120 could be implemented as part of a video input device (e.g., as a firmware component integrated with a video camera 105).
- In such a case, the output of the video camera 105 may be provided to the machine learning components 125 on the server 115.
- Similarly, the output from the video analytics component 120 and machine-learning component 125 may be supplied to other computer systems.
- Also, the video analytics component 120 and machine learning component 125 may process video from multiple input sources (i.e., from multiple cameras).
- A feed monitor 135 running on a client system 130 provides an application used to monitor and control streaming feeds evaluated by the video analytics component 120 and/or the machine learning component 125.
- FIG. 2 further illustrates the video analytics component 105 first shown in FIG. 1, according to one embodiment.
- As shown, the video analytics component 105 includes a background/foreground (BG/FG) segmentation component 220, a background model 230, a tracker component 250, and a micro-feature (MF) classifier 255.
- Image 205 represents an incoming frame of video received from a video camera.
- The background model 230 includes per-pixel data 240.
- The per-pixel data 240 includes a color value 242, a mean and variance 244, and a foreground frequency 246 for each pixel in the background model 230.
- The image 205 provides color channel values (e.g., RGB values) for each pixel in a frame of streaming video.
- The BG/FG segmentation component 220 generates a BG/FG mask 210 identifying which pixels depict foreground and which pixels depict background in the image 205, based on the background model 230.
- The BG/FG segmentation component 220 also outputs a background image 215 and image metadata 260, and updates the background model 230.
- In one embodiment, the BG/FG segmentation component 220 evaluates the image 205 using both a pixel based detector 222 and a context based detector 224.
- The pixel based detector 222 evaluates each pixel in the image 205 relative to the corresponding pixel in the background model 230.
- Specifically, the pixel based detector 222 determines a measure of distance between the pixel in the image 205 and the corresponding pixel 242 in the background model.
- The distance may be determined as a Mahalanobis distance, per color channel. Of course, other distance measures could be used or developed for a particular case.
- That is, the pixel based detector 222 determines a distance between the pixel in the image and the observed distribution of values for that pixel in the red, green, and blue color channels maintained by the background model 230.
- The per-pixel data 240 includes a mean and variance 244 for each color distribution used in calculating the Mahalanobis distance.
- The distance is compared to a threshold to determine whether to classify the pixel as depicting foreground or background (at least according to the evaluation done by the pixel based detector 222).
- In one embodiment, the threshold is determined as a dynamic value updated based on a camera noise model, defined as follows:
- The mean_gray value provides a gray-level representation of the mean (R, G, B) values 244 for that pixel. This equation converts the three-channel color value to a grayscale (luminance) value using the above constant coefficients.
- The mean_gray value may then be used to compute the dynamic threshold as follows:
- $$\mathit{threshold} = \mathit{min\_threshold} + (\mathit{max\_threshold} - \mathit{min\_threshold}) \times \frac{\mathit{mean\_gray}}{255} \qquad \text{(Eq. 2)}$$
- The thresholds could also be defined differently for different regions of the scene, and other formulas for computing a threshold could be used as well. Of course, a static threshold could be used in some cases.
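A minimal sketch of Eq. 2, assuming the thresholds share units with the distance computed by the pixel based detector. The gray-level conversion uses the common ITU-R BT.601 luma weights (0.299, 0.587, 0.114) as an assumption; the coefficients in the patent's own equation may differ. The function names and example threshold values are hypothetical.

```python
def mean_gray(mean_rgb):
    """Gray-level value of a pixel's mean color.

    ASSUMPTION: ITU-R BT.601 luma weights stand in for the patent's coefficients.
    """
    r, g, b = mean_rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def dynamic_threshold(mean_rgb, min_threshold, max_threshold):
    """Eq. 2: interpolate between min_threshold and max_threshold by brightness."""
    return min_threshold + (max_threshold - min_threshold) * (mean_gray(mean_rgb) / 255.0)


# Usage: thresholds for a dark, a mid-gray, and a bright background pixel.
for rgb in [(0, 0, 0), (128, 128, 128), (255, 255, 255)]:
    print(rgb, round(dynamic_threshold(rgb, 2.0, 5.0), 3))
```

Under Eq. 2, brighter background pixels require a larger distance before being classified as foreground, which is the behavior the camera noise model is meant to provide.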
- The output of the pixel based detector 222 is a candidate background image specifying a classification of each pixel in image 205 as depicting scene foreground or background, relative to the background model 230.
- The context based detector 224 uses the state of neighboring pixels, i.e., the context of a pixel, to refine the candidate background/foreground image. Stated differently, the context based detector 224 selectively reclassifies one or more of the pixels as depicting foreground or background based on the classifications of other pixels in the image 205 as depicting either foreground or background. In one embodiment, the context based detector 224 evaluates the pixels in the candidate background/foreground image and refines the assignment of pixels assigned as foreground and background.
- For example, the context based detector 224 may perform a series of morphological operations on pixels assigned as foreground. For instance, the context based detector 224 may perform a dilation operation on foreground pixels, expanding the foreground regions. Pixels classified as background, but covered by a threshold number of dilated foreground pixels, are converted to foreground. While the threshold can be as low as one, a dilation window of 5×5 has proven to be effective. Doing so may convert small “islands” of background surrounded by foreground pixels to foreground.
- The context based detector 224 may also perform erosion operations on pixels classified as foreground (or converted to foreground by dilation operations). Specifically, the erosion operation may convert small connected groups of pixels assigned as foreground to background. For example, the erosion operation may convert any foreground blob having X pixels or fewer to background. While the threshold can be as low as one, an erosion window of 3×3 has proven to be effective. Doing so helps prevent camera noise or compression artifacts from creating small foreground objects.
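The dilation and erosion steps can be sketched on a 0/255 mask as follows. This is a minimal pure-Python illustration, not the patent's implementation: the defaults mirror the 5×5 dilation window (with a count threshold of one) and 3×3 erosion window mentioned above, the border policy (skipping out-of-image pixels) is an assumption, and the function names are hypothetical. The erosion shown is standard binary erosion; removing blobs of X pixels or fewer, as also described above, is a related size filter that would need a connected-component pass.

```python
FG, BG = 255, 0  # 8-bit BG/FG mask values


def window_values(mask, r, c, size):
    """Yield mask values in a size x size window centered on (r, c).

    The window is clipped at the image border: pixels outside the image are
    skipped (an assumption; other border policies are possible).
    """
    half = size // 2
    for y in range(max(0, r - half), min(len(mask), r + half + 1)):
        for x in range(max(0, c - half), min(len(mask[0]), c + half + 1)):
            yield mask[y][x]


def dilate(mask, size=5, min_count=1):
    """Convert a BG pixel to FG when at least min_count FG pixels fall in its window."""
    out = [row[:] for row in mask]
    for r, row in enumerate(mask):
        for c, v in enumerate(row):
            if v == BG and sum(w == FG for w in window_values(mask, r, c, size)) >= min_count:
                out[r][c] = FG
    return out


def erode(mask, size=3):
    """Keep a FG pixel only if every in-image pixel in its window is also FG."""
    out = [row[:] for row in mask]
    for r, row in enumerate(mask):
        for c, v in enumerate(row):
            if v == FG and not all(w == FG for w in window_values(mask, r, c, size)):
                out[r][c] = BG
    return out


# Usage: a single foreground pixel in a 7x7 mask.
mask = [[BG] * 7 for _ in range(7)]
mask[3][3] = FG
print(sum(v == FG for row in dilate(mask, size=3) for v in row))  # -> 9
print(sum(v == FG for row in erode(mask) for v in row))           # -> 0
```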
- After performing the morphological operations, the context based detector 224 identifies contiguous regions of pixels that remain classified as foreground. Such regions may be referred to as “foreground blobs” or “foreground objects.” Each foreground blob is then compared with the corresponding region in the background image 215.
- The background image 215 is composed of the RGB values 242 for each pixel in the background model 230. Stated differently, the background image 215 presents the scene as though only scene background is visible to the camera, i.e., an “empty stage.”
- In one embodiment, the context based detector 224 performs a normalized cross-correlation operation to compare a foreground object in the image 205 with the corresponding region of pixels in the background image 215.
- If the two regions are highly correlated, the region classified as foreground may be the result of lighting changes in the scene rather than the presence of a foreground object.
- The normalized cross-correlation yields values in the range [−1.0, 1.0], and values above 0.9 have proven to be an effective threshold indicating high correlation.
- In such a case, the context based detector 224 may classify the pixels included in that foreground object as depicting background in the BG/FG mask.
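A hedged sketch of the blob comparison: it groups foreground pixels into 4-connected blobs, computes the normalized cross-correlation between each blob's intensities in the current frame and in the background image, and reclassifies blobs above the 0.9 threshold as background. Using single-channel (gray) intensities, 4-connectivity, and the names `label_blobs`, `ncc`, and `suppress_lighting_blobs` are assumptions for illustration.

```python
from collections import deque
from math import sqrt

FG, BG = 255, 0  # 8-bit BG/FG mask values


def label_blobs(mask):
    """Group FG pixels into 4-connected blobs; returns a list of (row, col) lists."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != FG or seen[r][c]:
                continue
            seen[r][c] = True
            blob, queue = [], deque([(r, c)])
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] == FG and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs


def ncc(a, b):
    """Normalized cross-correlation of two equal-length sequences, in [-1, 1]."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    denom = sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0:
        return 0.0  # a flat region has no defined correlation; treat as uncorrelated
    return sum(x * y for x, y in zip(da, db)) / denom


def suppress_lighting_blobs(mask, frame_gray, bg_gray, threshold=0.9):
    """Reclassify as BG every blob whose intensities track the background image."""
    out = [row[:] for row in mask]
    for blob in label_blobs(mask):
        a = [frame_gray[y][x] for y, x in blob]
        b = [bg_gray[y][x] for y, x in blob]
        if ncc(a, b) > threshold:
            for y, x in blob:
                out[y][x] = BG
    return out


# Usage: the left blob is the background brightened (2*bg + 5); the right blob is not.
mask = [[FG, FG, BG, BG], [FG, FG, BG, FG], [BG, BG, BG, FG]]
bg_gray = [[10, 20, 0, 0], [30, 40, 0, 50], [0, 0, 0, 60]]
frame_gray = [[25, 45, 0, 0], [65, 85, 0, 90], [0, 0, 0, 30]]
result = suppress_lighting_blobs(mask, frame_gray, bg_gray)
print(result)  # -> [[0, 0, 0, 0], [0, 0, 0, 255], [0, 0, 0, 255]]
```

Because normalized cross-correlation subtracts each region's mean and divides by its spread, a uniform brightening or contrast change (as in the left blob) leaves the score at 1.0, whereas an object replacing the background texture generally does not.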
- The resulting BG/FG mask 210 identifies the final determination of foreground and background for the image 205.
- The BG/FG segmentation component 220 may then update the background model 230 based on the image 205 and the resulting BG/FG mask 210.
- For each pixel classified as background, the RGB values for the corresponding pixel in the image are used to incrementally update the mean and variance 244 in the background model 230.
- In one embodiment, exponential weights are used to give more weight to the more recent values of that pixel.
- A pseudo-code example of updating the mean and variance of the background model for the red, green, and blue color channels is given below:
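As a hedged illustration (not the patent's own pseudo-code), an exponentially weighted update of a background pixel's per-channel mean and variance can be written in the standard incremental form below. The learning rate `alpha` and the function name are hypothetical, and the sketch assumes a single fixed rate shared by all channels.

```python
def update_background_pixel(mean, var, rgb, alpha=0.05):
    """Exponentially weighted update of per-channel mean and variance.

    alpha is a hypothetical learning rate in (0, 1]; larger values give recent
    frames more weight. Per channel:
        diff  = x - mean
        mean' = mean + alpha * diff
        var'  = (1 - alpha) * (var + alpha * diff**2)
    """
    new_mean, new_var = [], []
    for m, v, x in zip(mean, var, rgb):
        diff = x - m
        new_mean.append(m + alpha * diff)
        new_var.append((1.0 - alpha) * (v + alpha * diff * diff))
    return tuple(new_mean), tuple(new_var)


# Usage: a background pixel observed over three frames.
mean, var = (100.0, 100.0, 100.0), (10.0, 10.0, 10.0)
for frame_rgb in [(110, 100, 90)] * 3:
    mean, var = update_background_pixel(mean, var, frame_rgb, alpha=0.1)
print(mean, var)
```

Each sample's influence decays by a factor of (1 - alpha) per subsequent frame, so no history buffer is needed; only the current mean and variance are stored per pixel.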
- After updating the mean and variance 244 of each pixel classified as background, the BG/FG segmentation component 220 also updates the background model 230 based on the color channel values of each pixel classified as foreground.
- In one embodiment, foreground objects are slowly absorbed into the background, based on the frequency at which the pixels of that foreground object are classified as foreground (i.e., based on the FG frequency 246).
- For example, when a car parks within the scene, the color values of the pixels depicting the car are “absorbed” into the background model.
- However, the color values of pixels depicting the car are used to update only the mean 244 of the corresponding pixels in the background model 230.
- An absorption factor (i.e., the rate at which the pixel color values of a foreground object are absorbed into the background) may be derived from the observed likelihood of a pixel being classified as foreground.
- A pseudo-code example of absorbing the color values of a foreground pixel into the mean of the background model is given below:
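A hedged sketch of foreground absorption: only the mean moves toward the foreground sample, and the absorption rate falls as the pixel's foreground frequency over a recent-history window rises. The linear rule `max_rate * (1 - frequency)`, the window length of 10, and the parameter `max_rate` are assumptions standing in for the user-specified parameters mentioned above; the patent's own pseudo-code may differ.

```python
from collections import deque


def fg_frequency(history):
    """Fraction of frames in the recent-history window classified as foreground."""
    return sum(history) / len(history) if history else 0.0


def absorb_foreground(mean, rgb, history, max_rate=0.05):
    """Pull the background mean toward a foreground sample; the variance is untouched.

    ASSUMPTION: the absorption rate falls linearly with foreground frequency,
    and max_rate is a hypothetical user-specified parameter.
    """
    rate = max_rate * (1.0 - fg_frequency(history))
    return tuple(m + rate * (x - m) for m, x in zip(mean, rgb))


# Usage: a frequently-foreground pixel (busy roadway) vs. a rarely-foreground one.
busy = deque([True] * 9 + [False], maxlen=10)
quiet = deque([True] + [False] * 9, maxlen=10)
mean, car = (100.0, 100.0, 100.0), (200, 40, 40)
print(absorb_foreground(mean, car, busy))   # mean barely moves
print(absorb_foreground(mean, car, quiet))  # mean moves noticeably toward the car
```

In a running system, each pixel's history would be appended with its final BG/FG classification every frame; the window length then determines how long a region's past activity (e.g., a busy roadway versus a rarely occupied parking stall) continues to slow absorption.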
- The tracker 250 provides a component configured to identify and track foreground objects from one frame to the next.
- For example, the tracker 250 may use optical flow, contour, or feature based tracking methods to identify an object across a sequence of frames. That is, the tracker 250 may receive the foreground blobs from the image 205 and generate computational models for each blob.
- The tracker 250 may receive each successive frame of raw video (i.e., image 205) along with the BG/FG mask 210 and attempt to track the motion of, for example, a car depicted by a given foreground patch as it moves through the scene.
- The tracker 250 provides continuity to other elements of the video analytics component 105 by tracking the car from frame to frame. Over time, the tracker 250 builds a trajectory of a foreground object as it appears, moves through the scene, and eventually exits (or is absorbed into the background). The resulting trajectories may be evaluated by a variety of tools, e.g., to first learn expected patterns of trajectories, and subsequently to identify unusual trajectories (e.g., a car going the wrong way) or unusual interactions between trajectories (e.g., two cars colliding).
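The source names optical flow, contour, and feature based tracking without committing to one. The sketch below swaps in a much simpler technique, greedy nearest-centroid matching, only to show how per-frame blobs can be linked into object IDs and trajectories. The class `CentroidTracker`, the `max_distance` gate, and the greedy matching rule are assumptions, not the tracker 250 itself.

```python
import math
from itertools import count

Point = tuple[float, float]


def centroid(blob: list[tuple[int, int]]) -> Point:
    """Centroid of a blob given as a list of (x, y) pixel coordinates."""
    xs = [x for x, _ in blob]
    ys = [y for _, y in blob]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


class CentroidTracker:
    """Greedy nearest-centroid tracker (hypothetical, simplified stand-in)."""

    def __init__(self, max_distance: float = 50.0):
        self.max_distance = max_distance  # assumed gating distance in pixels
        self.trajectories: dict[int, list[Point]] = {}  # object ID -> centroid history
        self._active: dict[int, Point] = {}  # IDs seen in the previous frame
        self._ids = count(1)

    def update(self, blobs: list[list[tuple[int, int]]]) -> dict[int, Point]:
        """Assign each foreground blob in the current frame to an object ID."""
        unmatched = set(self._active)
        assigned: dict[int, Point] = {}
        for blob in blobs:
            c = centroid(blob)
            best = min(unmatched, key=lambda i: math.dist(c, self._active[i]), default=None)
            if best is not None and math.dist(c, self._active[best]) <= self.max_distance:
                unmatched.discard(best)
                obj_id = best
            else:
                obj_id = next(self._ids)  # treat as a new object entering the scene
                self.trajectories[obj_id] = []
            self.trajectories[obj_id].append(c)
            assigned[obj_id] = c
        # IDs still in `unmatched` are treated as having exited the scene.
        self._active = assigned
        return assigned
```

This sketch drops an ID as soon as it goes unmatched for a single frame; a practical tracker would tolerate brief occlusions before ending a trajectory.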
- the MF classifier 255 may also calculate a variety of kinematic and/or appearance features of a foreground object, e.g., size, height, width, area (in pixels), reflectivity, shininess, rigidity, speed, velocity, etc.
- the resulting values may be normalized to a value between 0 and 1, packaged as a feature vector (along with an object ID), and output by the video analytics component 105 as image metadata 260.
- the video analytics component 105 can repeat the process for the next frame of video.
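As a rough illustration of the normalization step, the sketch below min-max scales a few raw features into the range 0 to 1 and packages them with an object ID. The feature names, the ranges in `FEATURE_RANGES`, and the clamping of out-of-range values are hypothetical choices; this passage does not specify how the values are normalized.

```python
from dataclasses import dataclass

# Hypothetical (min, max) ranges for a 320x240 camera; real ranges would be
# chosen or learned per deployment.
FEATURE_RANGES: dict[str, tuple[float, float]] = {
    "height": (0.0, 240.0),
    "width": (0.0, 320.0),
    "area": (0.0, 320.0 * 240.0),
    "speed": (0.0, 50.0),
}


@dataclass
class FeatureVector:
    object_id: int
    values: list[float]  # ordered as in FEATURE_RANGES


def normalize(value: float, lo: float, hi: float) -> float:
    """Min-max scale value into [0, 1], clamping values outside [lo, hi]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


def package_features(object_id: int, raw: dict[str, float],
                     ranges: dict[str, tuple[float, float]] = FEATURE_RANGES) -> FeatureVector:
    """Build a normalized feature vector for one foreground object."""
    return FeatureVector(object_id, [normalize(raw[name], *ranges[name]) for name in ranges])
```

For example, an object 120 pixels tall and 80 pixels wide maps to 0.5 and 0.25 under these assumed ranges.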
- FIG. 3 illustrates a method 300 for segmenting scene foreground and background using a combined pixel-based and context-based evaluation, according to one embodiment.
- the method 300 begins at step 305, where the video analytics component 105 receives image data, e.g., a current frame of video from a streaming camera feed.
- the pixel based detector determines a distance between each pixel in the image data and a corresponding pixel in the background model.
- the distance may be determined as a measure of a distance between the pixel color values (e.g., each of the R, G, B values) and the observed distribution of color values for that pixel (i.e., a Mahalanobis distance).
- Each pixel with a distance from the background model that exceeds a dynamic threshold is classified as depicting foreground, resulting in a candidate background image (i.e., a BG/FG mask).
- the context based detector refines the candidate background image using morphological operations to dilate and erode foreground pixels. After performing such operations, the context based detector identifies each remaining foreground blob and compares it to the corresponding region in the background image maintained by the background model (i.e., an image made up of the predicted background color value for each pixel). Following step 315, the final classification of each pixel as background or foreground provides the BG/FG mask for that frame.
- the video analytics component updates the background model based on the pixel color values of the image data received at step 305 and the classification of each pixel as depicting foreground or background.
- for each pixel classified as background, the mean and variance of the distribution of color values for each color channel are updated.
- the incremental update approach set forth above may be used.
- for each pixel classified as foreground, only the mean of each color channel distribution in the background model is updated.
- the dynamic absorption window approach discussed above may be used.
- FIG. 4 illustrates a method 400 for generating a candidate BG/FG mask using a pixel based component, according to one embodiment of the invention.
- the method begins at step 405 where the pixel based component computes a distance (e.g., a Mahalanobis distance) between a current pixel and a corresponding pixel model maintained by the background model.
- the pixel based component compares this distance to a dynamic threshold determined for the current pixel.
- the pixel based component determines whether the distance exceeds the threshold. If not, at step 420, the pixel is classified as depicting background. Otherwise, the pixel is classified as depicting foreground.
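Steps 405 through 420 can be sketched as follows, assuming the color channels are treated as independent so that the Mahalanobis distance reduces to a variance-weighted Euclidean distance (a diagonal covariance). The per-pixel thresholds are passed in as an input because the camera noise model that makes them dynamic is not detailed in this passage; the function names and the grid layout are illustrative.

```python
import math

RGB = tuple[float, float, float]


def mahalanobis_diag(rgb: RGB, mean: RGB, var: RGB, eps: float = 1e-6) -> float:
    """Mahalanobis distance with a diagonal covariance (independent R, G, B channels)."""
    return math.sqrt(sum((x - m) ** 2 / max(v, eps) for x, m, v in zip(rgb, mean, var)))


def candidate_mask(frame: list[list[RGB]], means: list[list[RGB]],
                   variances: list[list[RGB]], thresholds: list[list[float]]) -> list[list[int]]:
    """Classify each pixel: 1 = foreground (distance > threshold), 0 = background.

    All inputs are H x W grids. `thresholds` holds one value per pixel and is
    assumed to come from a separate dynamic-threshold model (not shown).
    """
    mask = []
    for y, row in enumerate(frame):
        mask_row = []
        for x, rgb in enumerate(row):
            d = mahalanobis_diag(rgb, means[y][x], variances[y][x])
            mask_row.append(1 if d > thresholds[y][x] else 0)
        mask.append(mask_row)
    return mask
```

Each pixel is evaluated independently of its neighbors, which is what makes this stage a natural fit for evaluating many pixels simultaneously on a GPU.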
- FIG. 5 illustrates a candidate background image 510 generated by the pixel based detector from a raw image 505 and a background model 530.
- each pixel classified as depicting foreground is represented by a black dot.
- Three relatively dense regions of pixels can be observed, along with a number of other small regions of one or two foreground pixels.
- FIG. 6 illustrates a method 600 for refining the candidate BG/FG mask using a context based component, according to one embodiment of the invention.
- the method 600 begins at step 605, where the context based detector performs morphological operations to dilate and erode pixels classified as foreground. As described, dilating foreground pixels may convert neighboring pixels that were classified as background into foreground, while eroding foreground pixels may convert small, isolated patches of foreground into background.
- image 515 of FIG. 5 shows candidate BG/FG image 510 after being refined using the morphological operations.
- the BG/FG segmentation component identifies foreground blobs that remain after the morphological operations of step 605 (e.g., regions 517, 518, and 519 of image 515).
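A minimal pure-Python sketch of step 605 and the blob identification that follows is shown below. It uses a 3×3 square structuring element, treats pixels outside the frame as background (so blobs touching the border lose their edge pixels), applies a closing (dilate, then erode) followed by an opening (erode, then dilate), and groups foreground pixels into 8-connected blobs. The structuring element, the order of operations, and the connectivity are assumptions; the source states only that foreground pixels are dilated and eroded.

```python
from collections import deque

Mask = list[list[int]]


def _window(mask: Mask, y: int, x: int):
    """Yield the 3x3 neighborhood of (y, x); pixels outside the frame count as background (0)."""
    h, w = len(mask), len(mask[0])
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            yy, xx = y + dy, x + dx
            yield mask[yy][xx] if 0 <= yy < h and 0 <= xx < w else 0


def dilate(mask: Mask) -> Mask:
    """A pixel becomes foreground if any pixel in its 3x3 neighborhood is foreground."""
    return [[1 if any(_window(mask, y, x)) else 0 for x in range(len(mask[0]))]
            for y in range(len(mask))]


def erode(mask: Mask) -> Mask:
    """A pixel stays foreground only if its whole 3x3 neighborhood is foreground."""
    return [[1 if all(_window(mask, y, x)) else 0 for x in range(len(mask[0]))]
            for y in range(len(mask))]


def refine(mask: Mask) -> Mask:
    """Closing fills small gaps inside blobs; the following opening removes isolated specks."""
    closed = erode(dilate(mask))
    return dilate(erode(closed))


def label_blobs(mask: Mask) -> list[list[tuple[int, int]]]:
    """Group foreground pixels into 8-connected blobs of (x, y) coordinates."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            blob, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:  # breadth-first flood fill
                cy, cx = queue.popleft()
                blob.append((cx, cy))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            blobs.append(blob)
    return blobs
```

With a 3×3 block of foreground plus one isolated foreground pixel, `label_blobs` finds two blobs before refinement and only the block after `refine`, mirroring how small noise patches are dropped before the context based comparison.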
- a loop begins to evaluate each foreground blob.
- the context based component compares a region of the video frame defined by a current foreground blob with a corresponding region in the background image.
- normalized cross correlation techniques may be used to determine how correlated the changes in pixel values in the current image are relative to the corresponding pixels in the background image.
- at step 620, if a correlation threshold is satisfied, the pixels in that region are reclassified as depicting background in the current frame, and the initial assignment of foreground is presumed to be the result of a lighting change. Otherwise, the pixels in the foreground region under consideration remain classified as foreground. For example, foreground regions 518 and 519 (which each correspond to a car in the raw image 505) would be expected to have relatively weak normalized cross correlation scores and remain as foreground regions 521 and 522 in the final BG/FG mask 520. However, region 517 would be expected to have a relatively high normalized cross correlation score and be reclassified as background. This result is depicted in the final BG/FG mask 520, which includes foreground objects 521 and 522, corresponding to regions 518 and 519 in image 515, while the pixels of region 517 have been reclassified as background.
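The per-blob comparison and reclassification described above can be sketched with a zero-mean normalized cross correlation computed over each blob's pixels in the current frame and in the background image. A uniform lighting change scales or shifts the pixel values together and scores near 1, while a newly arrived object typically scores low or negative. The use of grayscale grids, the 0.9 threshold, and the function names are assumptions for illustration.

```python
import math
from typing import Sequence


def ncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-mean normalized cross correlation of two equal-length sequences, in [-1, 1]."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    # A flat region has zero variance; treat it as uncorrelated (it stays foreground).
    return num / den if den > 0 else 0.0


def reclassify_blobs(frame, background, mask, blobs, threshold=0.9):
    """Return a copy of `mask` in which strongly correlated blobs are set to background.

    frame, background: H x W grayscale grids (background = mean image of the model).
    mask: H x W grid with 1 = foreground, 0 = background.
    blobs: lists of (x, y) coordinates of foreground pixels.
    threshold: assumed correlation threshold.
    """
    out = [row[:] for row in mask]
    for blob in blobs:
        current = [frame[y][x] for x, y in blob]
        expected = [background[y][x] for x, y in blob]
        if ncc(current, expected) >= threshold:
            # Pixel values moved together, e.g. a lighting change: reclassify as background.
            for x, y in blob:
                out[y][x] = 0
    return out
```

A region whose brightness doubled scores 1.0 and is returned to background, while a region whose pattern changed (as when a car covers it) keeps its foreground label.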
- FIG. 7 illustrates a method 700 for updating a background model, according to one embodiment of the invention.
- the method 700 may be performed after a current video frame of a video stream has been segmented into background and foreground using the techniques discussed above.
- the method 700 begins at step 705, where the mean and variance of each pixel classified as depicting background are updated based on the color channel values of the pixel in the current frame.
- the background model may maintain a distribution of pixel values for each pixel, along with a mean and variance related to the distribution.
- the update to the mean and variance may use exponential weights to give more weight to more recent background samples than older samples.
- a loop begins to absorb a portion of each foreground pixel into the background using a dynamic absorption factor.
- the system determines a frequency at which the current pixel has been classified as foreground. Note that the frequency may be determined relative to a configured window of past frames, e.g., the previous 1000 frames.
- the system determines a dynamic absorption factor as described above. Again, the more frequently the pixel is classified as foreground (i.e., the more active that region of the frame), the lower the absorption rate.
- the system updates the mean for the pixel in the background model based on the color value of the pixel in the current frame and the absorption factor.
- at step 730, if additional foreground pixels remain to be absorbed into the background model, the method returns to step 710 and the loop repeats for the next foreground pixel. Otherwise, the method 700 ends.
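A per-pixel sketch of the absorption loop in method 700 follows. It keeps a sliding window of recent classifications (1000 frames by default, matching the example above), derives the foreground frequency from that window, and moves only the background mean toward the current color, leaving the variance unchanged. The linear mapping from frequency to absorption factor and the `max_rate` cap are assumptions; this passage states only that the factor decreases as the foreground frequency increases.

```python
from collections import deque


class ForegroundAbsorber:
    """Absorbs foreground color values into one pixel's background mean (hypothetical helper)."""

    def __init__(self, window: int = 1000, max_rate: float = 0.01):
        self.history: deque[int] = deque(maxlen=window)  # 1 = foreground, 0 = background
        self.max_rate = max_rate  # assumed upper bound on the absorption factor
        self._fg_count = 0

    def record(self, is_foreground: bool) -> None:
        """Add the current frame's classification, evicting the oldest one if the window is full."""
        if len(self.history) == self.history.maxlen:
            self._fg_count -= self.history[0]
        value = 1 if is_foreground else 0
        self.history.append(value)
        self._fg_count += value

    def frequency(self) -> float:
        """Fraction of frames in the window in which this pixel was foreground."""
        return self._fg_count / len(self.history) if self.history else 0.0

    def absorption_factor(self) -> float:
        """Assumed linear mapping: the more often the pixel is foreground, the slower it is absorbed."""
        return self.max_rate * (1.0 - self.frequency())

    def absorb(self, mean: list[float], rgb: tuple[float, ...]) -> list[float]:
        """Move the background mean toward the foreground color; the variance is not touched."""
        a = self.absorption_factor()
        return [m + a * (x - m) for m, x in zip(mean, rgb)]
```

One consequence of this simple design is that a stationary object gradually raises its own pixels' frequency within the window, so absorption slows the longer it stays; an implementation could counter that with a different mapping or window.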
- FIG. 8 illustrates an example of a computing server 800 which includes a video analytics component, according to one embodiment of the invention.
- the computing system 800 includes, without limitation, a central processing unit (CPU) 805, a graphics processing unit (GPU) 806, a network interface 815, a memory 820, and storage 830, each connected to a bus 817.
- the computing system 800 may also include an I/O device interface 810 connecting I/O devices 812 (e.g., keyboard, display and mouse devices) to the computing system 800.
- the computing elements shown in computing system 800 may correspond to a physical computing system (e.g., a system in a data center) or may be a virtual computing instance executing within a computing cloud.
- the CPU 805 retrieves and executes programming instructions stored in the memory 820, and stores and retrieves application data residing in the storage 830.
- the interconnect 817 is used to transmit programming instructions and application data between the CPU 805, I/O device interface 810, storage 830, network interface 815, and memory 820.
- CPU 805 is included to be representative of a single CPU, multiple CPUs, a single CPU having multiple processing cores, and the like.
- the memory 820 is generally included to be representative of a random access memory.
- the storage 830 may be a disk drive storage device. Although shown as a single unit, the storage 830 may be a combination of fixed and/or removable storage devices, such as fixed disc drives, removable memory cards, optical storage, network attached storage (NAS), or a storage area-network (SAN).
- the graphics processing unit (GPU) 806 is a specialized integrated circuit designed to accelerate the creation of images in a frame buffer intended for output to a display. GPUs are very efficient at manipulating computer graphics and are generally more effective than general-purpose CPUs for algorithms that process large blocks of data in parallel.
- components of the BG/FG segmentation component discussed above may be implemented to process frames of a video stream in parallel on the GPU 806.
- the pixel based detector may be configured to generate the candidate background image using the GPU 806 to evaluate multiple pixels simultaneously.
- the context based component may perform the normalized cross correlations on the GPU 806, as well as use the GPU 806 to update multiple pixels in the background model simultaneously (based on the classification of pixels as depicting foreground and background).
- other components of the video analytics system may be implemented to execute on GPU 806 as well.
- the micro feature classifier may determine the kinematics or appearance features of a foreground object using GPU 806 as well as generate elements of the feature vector in parallel.
- the video analytics component 822 may use GPU accelerated computing to accelerate the process of segmenting scene foreground and background in frames of streaming video. The resulting efficiency in processing pixel data on GPU 806 allows the server 800 to scale to support multiple camera feeds using a smaller hardware footprint.
- the memory 820 includes a video analytics component 822 and a current frame 824, and the storage 830 includes a background model 832 and a background image 834.
- the video analytics component 822 may be configured to segment scene foreground from background in the current frame 824, and more generally in frames of streaming video, using both a pixel based detector and a context based detector.
- the video analytics component 822 may maintain an accurate background model 832 by incrementally updating the mean and variance for pixels in the background model 832, absorbing foreground pixels into the background model 832 via an absorption window, and dynamically updating background/foreground thresholds used by the pixel based component.
- the background image 834 provides a representation of the scene absent any foreground objects; it may change over time (1) as elements of scene foreground are absorbed by the background model 832 and (2) as background illumination gradually changes. Together, these update mechanisms ensure that the video analytics component 822 can effectively and efficiently respond to changes in a scene without overly increasing computational complexity.
- using both a pixel based detector and a context based detector provides an effective technique for segmenting scene foreground from background in a video stream. Further, this approach can scale to process large numbers of camera feeds simultaneously, e.g., using parallel processing architectures.
- aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
- the computer readable medium may be a computer readable signal medium or a computer readable storage medium.
- a computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
- more specific examples of a computer readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
- a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus or device.
- each block in the flowchart or block diagrams may represent a module, segment or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).
- the functions noted in the block may occur out of the order noted in the figures.
- two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
- Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
- Embodiments of the invention may be provided to end users through a cloud computing infrastructure.
- Cloud computing generally refers to the provision of scalable computing resources as a service over a network.
- Cloud computing may be defined as a computing capability that provides an abstraction between the computing resource and its underlying technical architecture (e.g., servers, storage, networks), enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction.
- cloud computing allows a user to access virtual computing resources (e.g., storage, data, applications, and even complete virtualized computing systems) in “the cloud,” without regard for the underlying physical systems (or locations of those systems) used to provide the computing resources.
- a virtual server instance in a computing cloud could be configured to execute the video analytics components to process a streaming camera feed (or feeds).
- the computing resources could be scaled as needed as multiple camera feeds are added.
Abstract
Description
- This application is a continuation of International Patent Application PCT/US2015/058025, filed on Oct. 29, 2015, which is a continuation of U.S. patent application Ser. No. 14/526,756, filed on Oct. 29, 2014 (now U.S. Pat. No. 9,349,054); the entirety of each of the aforementioned applications is hereby incorporated by reference.
- Embodiments of the invention generally relate to techniques for analyzing digital images. More specifically, embodiments presented herein provide a variety of techniques for effectively and efficiently segmenting foreground and background elements in a stream of video frames captured by a camera trained on a scene.
- Video analytics generally refers to applications that evaluate digital image data, and a variety of approaches have been developed to programmatically evaluate a video stream. For example, some video analytics systems may be configured to detect a set of pre-defined patterns in a video stream. Many video analytics applications generate a background model to evaluate a video stream. A background model generally represents static elements of a scene within a field-of-view of a video camera. For example, consider a video camera trained on a stretch of roadway. In such a case, the background would include the roadway surface, the medians, any guard rails or other safety devices, and traffic control devices, etc., visible to the camera. The background model may include an expected (or predicted) pixel value (e.g., an RGB or grey scale value) for each pixel of the scene when the background is visible to the camera. The background model provides a predicted image of the scene in which no activity is occurring (e.g., an empty roadway). Conversely, vehicles traveling on the roadway (and any other person or thing engaging in some activity) occlude the background when visible to the camera and represent scene foreground objects.
- To process a live camera feed, a background model needs to segment scene foreground and background at or near the frame rate of the video analytics system. That is, a video analytics system should be able to segment foreground from background for each frame (or every N frames) dynamically while processing a live video feed.
- However, a variety of challenges arise in generating a background model. For example, the video channel may be noisy or include compression artifacts. In addition, the nature of the scene itself can make it difficult to generate and maintain an accurate background model. For example, ambient lighting levels can change suddenly, resulting in large groups of pixels being misclassified as depicting foreground. In these cases, it becomes difficult to classify any given pixel from frame to frame as depicting background or foreground (e.g., due to pixel color fluctuations caused by camera noise or lighting changes). A background model also needs to respond to gradual changes in scene lighting.
- Similarly, some elements of a scene that would preferably be categorized as background can be detected as foreground objects, e.g., a traffic light changing from green to yellow to red or an elevator door opening and closing. The changes can result in elements of the traffic light (as captured in pixel data) being incorrectly classified as depicting scene foreground. Other examples of a dynamic background include periodic motion such as a scene trained on a waterfall or ocean waves or tree branches bending in a breeze. While these changes in the scene are visually apparent as changes in pixel color from frame-to-frame, they should not result in the pixels being classified as elements of scene foreground. Further, as objects enter the scene, they may, effectively, become part of the scene background (e.g., when a car parks in a parking spot). Because other components in a video analytics system may track each foreground object from frame to frame, such false or stale foreground objects waste processing resources and can disrupt other analytics components which rely on an accurate segmentation of scene foreground and background.
- One approach to modeling such scenes is to create a complex background model which supports multiple background states per pixel. However, doing so results in a background model whose processing requirements scale with the complexity of the scene. This limits the ability of a video analytics system to analyze a large number of camera feeds in parallel.
- One embodiment includes a computer-implemented method for generating a background model of a scene depicted in a sequence of video frames captured by a video camera. This method may include receiving a video frame, wherein the video frame includes one or more appearance values for each of a plurality of pixels, and classifying each pixel as depicting either foreground or background by comparing the one or more appearance values of each pixel to a background model of the scene. This method may also include performing one or more context based evaluations on one or more of the pixels classified as depicting foreground, wherein the context based evaluations selectively reclassify one or more of the pixels as depicting foreground or background based on the classification of other pixels in the video frame as depicting either foreground or background.
- Other embodiments include, without limitation, a computer-readable medium that includes instructions that enable a processing unit to implement one or more aspects of the disclosed methods as well as a system having a processor, memory, and application programs configured to implement one or more aspects of the disclosed methods.
- So that the manner in which the above recited features, advantages, and objects of the present invention are attained and can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to the embodiments illustrated in the appended drawings.
- It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.
- FIG. 1 illustrates components of a video analytics system, according to one embodiment of the invention.
- FIG. 2 further illustrates components of the video analytics system shown in FIG. 1, according to one embodiment.
- FIG. 3 illustrates a method for segmenting scene foreground and background using a combined pixel-based and context-based evaluation, according to one embodiment.
- FIG. 4 illustrates a method for generating a candidate BG/FG mask using a pixel based component, according to one embodiment of the invention.
- FIG. 5 illustrates examples of image data generated for a current frame, according to one embodiment.
- FIG. 6 illustrates a method for refining the candidate BG/FG mask using a context based component, according to one embodiment of the invention.
- FIG. 7 illustrates a method to update a background model, according to one embodiment of the invention.
- FIG. 8 illustrates an example of a computing server which includes a video analytics system, according to one embodiment of the invention.
- Embodiments of the invention presented herein provide a robust background detector for a video analytics system. More specifically, embodiments of the invention provide techniques for generating and maintaining a background model from image data provided by a video stream. As described below, the background detector is generally configured to generate a background model of a scene captured in recorded video. The background model includes a background image which has color channel (e.g., RGB) values or grayscale brightness values for each pixel in the background image. When a new frame is received, the background detector evaluates pixels to determine whether, in that frame, a pixel depicts an element of background or foreground. Once determined, the background detector may create a background/foreground (BG/FG) mask corresponding to the frame. For example, the BG/FG mask may depict every pixel of foreground as white (e.g., 255 in an 8-bit grayscale) and every pixel of background as black (e.g., 0 in an 8-bit grayscale).
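The white/black convention for the BG/FG mask maps directly to an 8-bit image. The sketch below converts a 0/1 mask to grayscale bytes and wraps them in a binary PGM (P5) header so the mask can be opened in common image viewers; the PGM container and the function names are illustrative choices, not part of the source.

```python
def mask_to_gray8(mask: list[list[int]]) -> bytes:
    """Foreground (1) -> 255 (white), background (0) -> 0 (black), in row-major order."""
    return bytes(255 if v else 0 for row in mask for v in row)


def mask_to_pgm(mask: list[list[int]]) -> bytes:
    """Encode the mask as a binary PGM (P5) image with a maximum gray value of 255."""
    height, width = len(mask), len(mask[0])
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + mask_to_gray8(mask)
```

For example, `mask_to_pgm([[0, 1], [1, 0]])` yields the header `P5\n2 2\n255\n` followed by the pixel bytes 0, 255, 255, 0, which can be written to a `.pgm` file in binary mode.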
- In one embodiment, to generate the BG/FG mask, the background detector evaluates a frame using a pixel based component and a context based component. The pixel based component compares each pixel in the current frame with a corresponding pixel in the background model. Based on a distance between the two pixels, the background detector assigns the pixel as depicting either foreground or background. In addition to a pixel color value, the background model may also include a mean and variance determined, per pixel, based on the observed distribution of color values received for a given pixel (e.g., distributions determined per color channel for each pixel). In such a case, the distance between a pixel in a current frame and the corresponding pixel in the background model may be determined as a measure of a distance between the pixel color values (e.g., each of the R, G, B values) and the observed distributions, such as a Mahalanobis distance. If the distance exceeds a pre-defined threshold, then the pixel based component sets the pixel as depicting foreground. Otherwise, the pixel based component sets the pixel as depicting background. Note that a constant FG/BG threshold may be ineffective for determining whether a given pixel depicts foreground or background. Accordingly, as described below, the threshold may be updated dynamically using a camera noise model. After performing the distance comparisons, the pixel based component provides a candidate BG/FG mask, where each pixel has an assigned background or foreground state.
- After the pixel based evaluation, the context based component refines the candidate background/foreground image. In one embodiment, the context based component may perform a series of morphological operations on each pixel assigned as foreground. For example, the context based component may dilate foreground pixels in the candidate BG/FG mask. Doing so converts nearby pixels assigned as background to foreground, particularly where multiple dilated pixels overlap on a pixel classified as background. In addition, the context based component may erode foreground pixels. Doing so may convert small areas of foreground to background, e.g., in cases where the foreground assignment was the result of a compression or noise artifact.
- After performing the morphological operations, the FG/BG mask is used to identify contiguous regions of foreground in the current frame. Each such region represents a foreground object or “blob” in the current frame. In one embodiment, the context based component compares foreground objects in the current frame (each represented by a defined region of pixels) with the corresponding region of pixels in the background image, i.e., the mean image maintained by the background model. For example, the context based component may determine a normalized cross-correlation between the groups of pixels. If pixels classified as foreground have changed in a similar manner relative to one another (i.e., the pixel colors have all shifted in a similar manner), then the context based component may reclassify the entire blob as background. Doing so may address issues of lighting changes that result in a region being misclassified as foreground, without the computational expense of maintaining multi-state background models. The remaining foreground objects are treated as “true” foreground by the video analytics system.
- After identifying a group of foreground objects in the current frame, the video analytics system updates the background model based on the pixel values of the current frame and on the determination of foreground and background. For pixels in the current frame assigned as depicting background, the color channel values of such pixels are used to incrementally update the mean and variance of the corresponding pixels in the background model. In one embodiment, exponential weights are used in order to give more weight to more recent background samples than older samples. That is, the color values of the most recent frames make a greater contribution to the pixel values of the background model.
- In one embodiment, the color values of pixels classified as depicting foreground are absorbed into the background model. That is, with each frame, the raw color values of each foreground pixel contribute to an update of the background model for that frame. Specifically, the values of a pixel classified as foreground are used to update the mean associated with that pixel in the background model, but the variance remains unchanged. For pixels in the current frame detected as foreground, the pixel value is absorbed into the mean of the corresponding pixel in the background model based on an observed likelihood (i.e., a frequency) of that pixel being classified as foreground over a recent-history window and a set of user-specified parameters. The higher the frequency at which a pixel is classified as foreground, the lower the absorption rate.
- Over a number of frames, if a foreground object remains relatively stationary, the mean will eventually shift to the point where the pixel is no longer classified as foreground. Thus, foreground objects may be said to be slowly absorbed into the background model. Once absorbed, pixels in subsequent frames now classified as background are used to update both the mean and the variance of the corresponding pixels in the background model. Doing so allows the background model to “pull” elements of foreground into background, preventing “stale” foreground objects from interfering with the video analytics system.
- For example, assume a camera is trained on a parking lot. When a car appears, the video analytics system may classify pixels in which the car appears as depicting foreground. While the car moves in the parking lot, particularly in a roadway where other cars frequently appear as well, the absorption rate is lower relative to other regions within the field-of-view of the camera (e.g., regions depicting a parking stall). When the car pulls into a parking stall and stops moving, the color values of the pixels are pulled into the background more rapidly, because such pixels may have a low frequency of being classified as foreground. That is, the mean of the pixels depicting the car may change more quickly to absorb the car as a new part of the background state.
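To give a feel for the time scales in the parking-lot example, assume (as a simplification not stated in the source) that the absorption factor a stays constant while the car is parked and that each frame moves the background mean m toward the car's color x by m_{t+1} = m_t + a (x - m_t). The remaining gap then shrinks geometrically. The two values of a and the 24 frames-per-second rate below are illustrative.

```latex
\begin{aligned}
x - m_{t+1} &= (1 - a)\,(x - m_t)
  \quad\Longrightarrow\quad x - m_n = (1 - a)^n\,(x - m_0), \\
n_{90\%} &= \frac{\ln 0.1}{\ln(1 - a)}, \\
a = 0.01 &: \; n_{90\%} \approx 229 \text{ frames} \approx 9.5\ \text{s at 24 fps}, \\
a = 0.001 &: \; n_{90\%} \approx 2301 \text{ frames} \approx 96\ \text{s at 24 fps}.
\end{aligned}
```

In practice a pixel is reclassified as background once its distance from the mean falls below the threshold, which can happen before 90% of the gap is closed; the point is only that a tenfold larger absorption factor (a rarely active stall rather than a busy lane) absorbs the parked car roughly ten times sooner.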
- Advantageously, identifying foreground and background using both pixel based and context based evaluations provides an effective technique for segmenting scene foreground from background in a video stream. Further, this approach can scale to process large numbers of camera feeds simultaneously, e.g., using parallel processing architectures. Further still, the approaches for incrementally updating the mean and variance for pixels in the background model, absorbing foreground pixels into the background model via an absorption window, and dynamically updating background/foreground thresholds used by the pixel based component collectively ensure that the video analytics system can effectively and efficiently respond to changes in a scene without overly increasing computational complexity. Thus, embodiments presented herein can detect scene foreground and background within the constraints required to process a video feed in real time for a large number of cameras.
FIG. 1 illustrates anetwork computing environment 100, according to one embodiment of the invention. As shown, thenetwork computing environment 100 includes avideo camera 105, anetwork 110, and aserver computer system 115. Thenetwork 110 may transmit video data recorded by thevideo camera 105 to theserver system 115. Of course, thevideo camera 105 could be connected to theserver system 115 directly (e.g., via USB or other form of connecting cable).Network 110 receives video data (e.g., video stream(s), video images, or the like) from thevideo camera 105. In addition to a live feed provided by thevideo camera 105, theserver system 115 could also receive a video stream from other input sources, e.g., a VCR, DVR, DVD, computer, web-cam device, or the like. - As an example, assume the
video camera 105 is one of multiple video surveillance cameras 105 used to monitor an enterprise campus. In such a case, each video camera 105 would be trained on a certain area (e.g., a parking lot, a roadway, a building entrance, etc.), and each video camera 105 would provide a streaming video feed analyzed independently by the server system 115. Generally, the area visible to the video camera 105 is referred to as the “scene.” The video camera 105 may be configured to record the scene as a sequence of individual video frames at a specified frame rate (e.g., 24 frames per second), where each frame includes a fixed number of pixels (e.g., 320×240). Each pixel of each frame may specify a color value (e.g., an RGB value) or a grayscale value (e.g., a radiance value between 0 and 255). Further, the video stream may be encoded using known formats, e.g., MPEG2, MJPEG, MPEG4, H.263, H.264, and the like. - In one embodiment, the
server system 115 includes video analytics components 120 (e.g., hardware and software applications) used to analyze the video stream received from the video camera 105. In addition to segmenting scene foreground from background, the video analytics components 120 may be configured to classify foreground objects and to derive metadata describing the appearance, actions, and/or interactions of such objects (based on changes in pixel color values from frame to frame). The resulting video analytics metadata may be used for a variety of applications. For example, in one embodiment, the output of the video analytics components 120 may be supplied to a machine-learning engine 125. In turn, the machine-learning engine 125 may be configured to evaluate, observe, learn, and remember details regarding events (and types of events) that occur within the scene. When observations differ from the learned behavior, the system can generate an alert. - In one embodiment, the
video analytics component 120 may normalize the metadata derived from observations of foreground objects into numerical values (e.g., values falling within a range from 0 to 1 with respect to a given data type). For example, the metadata could include values for multiple features of each foreground object (e.g., values for height and width in pixels, color, shape, appearance features, etc.). In turn, each value type could be modeled as a statistical distribution between 0 and 1. The video analytics component 120 then packages the resulting normalized values as a feature vector. The resulting feature vector for each foreground object is then provided to the machine learning components 125 for each frame. - In one embodiment, the
machine learning components 125 include a neuro-linguistic module that performs neural network-based linguistic analysis of the feature vectors. To generate the model, the machine learning components 125 receive normalized data values and organize the vectors into clusters. Further, the neuro-linguistic module may assign a symbol (e.g., a letter) to each cluster that reaches some measure of statistical significance. From the letters, the neuro-linguistic module builds a dictionary of observed combinations of symbols, i.e., words, based on a statistical distribution of symbols identified in the input data. Specifically, the neuro-linguistic module may identify patterns of symbols in the input data at different frequencies of occurrence, up to a maximum word size (e.g., 5 letters). The most frequently observed words (e.g., the top 20) provide a dictionary of words corresponding to the video stream. Using words from the dictionary, the neuro-linguistic module generates phrases based on probabilistic relationships of each word occurring in sequence relative to other words, up to a maximum phrase length. For example, the neuro-linguistic module may identify that a given three-letter word frequently appears in sequence with a given four-letter word, and so on. The resulting syntax allows the machine learning components 125 to learn, identify, and recognize patterns of behavior without the aid or guidance of predefined activities. Thus, unlike a rules-based video surveillance system, which relies on predefined patterns to identify or search for in a video stream, the machine learning components 125 learn patterns by generalizing input and building memories of what is observed. Over time, the machine learning components 125 use these memories to distinguish between normal and anomalous behavior reflected in observed data.
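The dictionary-building step can be illustrated with a minimal sketch. It assumes, hypothetically, that letters arrive as a flat sequence of cluster symbols and that a "word" is any contiguous run of two to max_word_len letters ranked by raw occurrence count; the function name build_dictionary and its parameters are illustrative, and the neural network-based analysis itself is not modeled.

```python
from collections import Counter


def build_dictionary(letters, min_word_len=2, max_word_len=5, top_k=20):
    """Return the top_k most frequent "words" found in a stream of letters.

    Hypothetical sketch: a word is any contiguous run of min_word_len to
    max_word_len letters, and words are ranked by raw occurrence count.
    """
    counts = Counter()
    for n in range(min_word_len, max_word_len + 1):
        for start in range(len(letters) - n + 1):
            counts[tuple(letters[start:start + n])] += 1
    # most_common() orders words by descending count.
    return [word for word, _ in counts.most_common(top_k)]


# Letters assigned to clusters of feature vectors over a stretch of frames.
dictionary = build_dictionary(list("ABABABCAB"), max_word_len=3, top_k=3)
```

Counting every n-gram up to the maximum word size is the simplest possible reading of "patterns of symbols at different frequencies of occurrence"; a production module would presumably also require statistical significance before admitting a word.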
- For instance, the neuro-linguistic module builds letters, words (nouns, adjectives, verbs, etc.), and phrases, and estimates an “unusualness score” for each identified letter, word, or phrase. The unusualness score (for a letter, word, or phrase observed in input data) indicates how infrequently the letter, word, or phrase has occurred relative to past observations. Thus, the behavior recognition system may use the unusualness scores to both identify and measure how unusual a current syntax is relative to a stable model of symbols (i.e., letters), a stable model of words built from the symbols (i.e., the dictionary), and a stable model of phrases built from the words (i.e., the syntax), collectively the neuro-linguistic model. In addition, as the neuro-linguistic module receives more input data, it may decay, reinforce, and generate letters, words, and syntax phrases over time. In machine learning parlance, the neuro-linguistic module “learns online” as new data is received and occurrences increase, decrease, or appear.
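As an illustration only, one way to realize an unusualness score that decays, reinforces, and learns online is an exponentially decayed frequency table. The class UnusualnessModel, its decay parameter, and the score formula 1 - count / peak_count are all assumptions made for this sketch; they are not the module's actual scoring.

```python
class UnusualnessModel:
    """Hypothetical online frequency model with exponential decay.

    Every observation reinforces its item, and all counts decay by `decay`
    per observation, so items that stop occurring gradually fade.
    """

    def __init__(self, decay=0.99):
        self.decay = decay
        self.counts = {}

    def observe(self, item):
        for key in self.counts:
            self.counts[key] *= self.decay  # decay existing items
        self.counts[item] = self.counts.get(item, 0.0) + 1.0  # reinforce or create

    def unusualness(self, item):
        """0.0 for the most frequent item, 1.0 for an item never observed."""
        if not self.counts:
            return 1.0
        peak = max(self.counts.values())
        return 1.0 - self.counts.get(item, 0.0) / peak


model = UnusualnessModel(decay=1.0)  # decay disabled to keep the example simple
for _ in range(9):
    model.observe("car enters lot")
model.observe("car drives wrong way")
```

Scoring relative to the most frequent item keeps the value in [0, 1] regardless of how much data has been seen, which is one plausible way to make scores comparable as the model keeps learning.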
- In general, the
video analytics component 120 and machine-learning components 125 both process video data in real time. However, the time scales at which the video analytics component 120 and the machine-learning component 125 process information may differ. For example, in one embodiment, the video analytics component 120 processes video data frame by frame, while the machine-learning component 125 processes data every N frames. - Note, however,
FIG. 1 illustrates merely one possible arrangement of a network computing environment 100 that includes a video analytics component 120. For example, although the video camera 105 is shown connected to the server system 115 via the network 110, the video camera 105 could also be connected directly to the server system 115. Further, various components and modules of the server system 115 may be implemented in other systems. For example, the video analytics component 120 could be implemented as part of a video input device (e.g., as a firmware component integrated with a video camera 105). In such a case, the output of the video camera 105 may be provided to the machine learning components 125 on the server 115. Similarly, the output from the video analytics component 120 and machine-learning component 125 may be supplied to other computer systems. For example, the video analytics component 120 and machine learning component 125 may process video from multiple input sources (i.e., from multiple cameras). In such a case, a feed monitor 135 running on a client system 130 provides an application used to monitor and control streaming feeds evaluated by the video analytics component 120 and/or the machine learning component 125.
FIG. 2 further illustrates the video analytics component 120 first shown in FIG. 1, according to one embodiment. As shown, the video analytics component 120 includes a background/foreground (BG/FG) segmentation component 220, a background model 230, a tracker component 250, and a micro-feature (MF) classifier 255. Image 205 represents an incoming frame of video received from a video camera. As also shown, the background model 230 includes per-pixel data 240. Specifically, the per-pixel data 240 includes a color value 242, a mean and variance 244, and a foreground frequency 246 for each pixel in the background model 230. - The
image 205 provides color channel values (e.g., RGB values) for each pixel in a frame of streaming video. Once the image is received, the background/foreground (BG/FG) segmentation component 220 generates a BG/FG mask 210 identifying which pixels in the image 205 depict foreground and which depict background, based on the background model 230. In addition, the BG/FG segmentation component 220 outputs a background image 215 and image metadata 260, and updates the background model 230. - In one embodiment, to generate the BG/
FG mask 210 for the current streaming video frame (i.e., for image 205), the BG/FG segmentation component 220 evaluates the image 205 using both a pixel based detector 222 and a context based detector 224. The pixel based detector 222 evaluates each pixel in the image 205 relative to the corresponding pixel in the background model 230. Specifically, the pixel based detector 222 determines a measure of distance between the pixel in the image 205 and the corresponding pixel 242 in the background model. In one embodiment, the distance may be determined as a Mahalanobis distance, per color channel. Of course, other distance measures could be used or developed for a particular case. In embodiments using the Mahalanobis distance, the pixel based detector 222 determines the distance between the pixel in the image and the observed distribution of values for that pixel in the red, green, and blue color channels maintained by the background model 230. To support this, the per-pixel data 240 includes a mean and variance 244 for each color distribution used in calculating the Mahalanobis distance. - The distance is compared to a threshold to determine whether to classify the pixel as depicting foreground or background (at least according to the evaluation done by the pixel based detector 222). In one embodiment, the threshold is a dynamic value updated based on a camera noise model, defined as follows:
$$\text{mean\_gray} = 0.299 \cdot \text{red\_mean} + 0.587 \cdot \text{green\_mean} + 0.114 \cdot \text{blue\_mean} \qquad \text{(Eq. 1)}$$
- The mean_gray value provides a gray-level representation of the mean (R, G, B) values stored for that pixel in the mean and variance 244. Eq. 1 converts the three-channel color value to a grayscale (luminance) value using the fixed coefficients above, which are the ITU-R BT.601 luma weights. The mean_gray value may then be used to compute the dynamic threshold as follows:
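$$\text{threshold} = \text{min\_threshold} + (\text{max\_threshold} - \text{min\_threshold}) \cdot \frac{\text{mean\_gray}}{255} \qquad \text{(Eq. 2)}$$
- Because mean_gray lies between 0 and 255, the dynamic threshold ranges from min_threshold for a black pixel to max_threshold for a white pixel, so brighter pixels tolerate larger deviations before being classified as foreground. While the min_threshold and max_threshold values may be set as a matter of user preference, values of min_threshold=16 and max_threshold=80 have proven effective in some cases. Further, the thresholds could be defined differently for different regions of the scene, and other formulas for computing a threshold could be used as well. Of course, a static threshold could be used in some cases.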
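The pixel based evaluation (Eq. 1, Eq. 2, and the distance test) can be sketched as follows. This is a minimal sketch under stated assumptions: the per-channel Mahalanobis terms are combined as a diagonal-covariance Mahalanobis distance, Eq. 2 is read with mean_gray normalized by 255, and the distance is compared directly against the threshold, since the text does not say how the distance is scaled relative to min_threshold and max_threshold. The function names are hypothetical.

```python
import math


def mean_gray(mean_rgb):
    """Eq. 1: gray-level (luma) value of a pixel's mean (R, G, B) color."""
    r, g, b = mean_rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def dynamic_threshold(mean_rgb, min_threshold=16.0, max_threshold=80.0):
    """Eq. 2, reading mean_gray as normalized by 255 (an assumption), so the
    result lies between min_threshold and max_threshold."""
    return min_threshold + (max_threshold - min_threshold) * mean_gray(mean_rgb) / 255.0


def mahalanobis(pixel, mean_rgb, var_rgb, eps=1e-6):
    """Distance of a pixel from its background distribution, assuming the
    R, G, B channels are independent (diagonal covariance)."""
    return math.sqrt(sum((x - m) ** 2 / (v + eps)
                         for x, m, v in zip(pixel, mean_rgb, var_rgb)))


def is_foreground(pixel, mean_rgb, var_rgb):
    """Pixel based classification: foreground if the distance exceeds the threshold."""
    return mahalanobis(pixel, mean_rgb, var_rgb) > dynamic_threshold(mean_rgb)


# A background pixel with mean (100, 100, 100) and variance 4 per channel.
bg_mean, bg_var = (100, 100, 100), (4.0, 4.0, 4.0)
print(is_foreground((104, 100, 100), bg_mean, bg_var))  # -> False
print(is_foreground((200, 200, 200), bg_mean, bg_var))  # -> True
```

Each pixel is classified independently of its neighbors, which is what makes this step easy to run over many pixels in parallel.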
- The output of the pixel based
detector 222 is a candidate background image specifying a classification of each pixel in image 205 as depicting scene foreground or background, relative to the background model 230. The context based detector 224 uses the state of a pixel's neighboring pixels, i.e., its context, to refine the candidate background/foreground image. Stated differently, the context based detector 224 selectively reclassifies one or more of the pixels as depicting foreground or background based on whether other pixels in the image 205 are classified as depicting foreground or background. In one embodiment, the context based detector 224 evaluates the pixels in the candidate background/foreground image and refines the assignment of pixels to foreground and background. More specifically, the context based detector 224 may perform a series of morphological operations on pixels assigned as foreground. For example, the context based detector 224 may perform a dilation operation on foreground pixels, expanding the foreground regions. Pixels classified as background but covered by at least a threshold number of foreground pixels within the dilation window are converted to foreground. The threshold can be as low as one, and a 5×5 dilation window has proven effective. Doing so may convert small “islands” of background surrounded by foreground pixels to foreground. - In addition, the context based detector may perform erosion operations on pixels classified as foreground (or converted to foreground by dilation operations). Specifically, the erosion operation may convert small connected groups of pixels assigned as foreground to background. For example, the erosion operation may convert any foreground blob having X pixels or fewer to background. The threshold can be as low as one, and a 3×3 erosion window has proven effective. Doing so helps prevent camera noise or compression artifacts from creating small foreground objects.
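A minimal pure-Python sketch of the two morphological steps on a boolean mask follows. Assumptions: dilation turns a background pixel into foreground when at least min_count foreground pixels fall inside its size×size window, and erosion is the textbook binary erosion (a foreground pixel survives only if every in-bounds pixel in its window is foreground). Textbook erosion removes blobs smaller than the window but also trims one pixel from larger blobs; the blob-size variant described above would instead need connected-component labeling. The function names are illustrative.

```python
def _window(mask, r, c, half):
    """Yield the in-bounds mask values in the window centered on (r, c)."""
    rows, cols = len(mask), len(mask[0])
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                yield mask[rr][cc]


def dilate(mask, size=5, min_count=1):
    """Background pixels with >= min_count foreground pixels in their window become foreground."""
    half = size // 2
    return [[mask[r][c] or sum(_window(mask, r, c, half)) >= min_count
             for c in range(len(mask[0]))]
            for r in range(len(mask))]


def erode(mask, size=3):
    """Foreground pixels survive only if their whole (in-bounds) window is foreground."""
    half = size // 2
    return [[mask[r][c] and all(_window(mask, r, c, half))
             for c in range(len(mask[0]))]
            for r in range(len(mask))]


# A 5x5 foreground block with a one-pixel background "island" at its center.
block = [[1 <= r <= 5 and 1 <= c <= 5 for c in range(7)] for r in range(7)]
block[3][3] = False
filled = dilate(block)

# A single isolated foreground pixel, e.g., camera noise.
speck = [[False] * 7 for _ in range(7)]
speck[3][3] = True
cleaned = erode(speck)
```

Note that the order matters: dilating first grows an isolated speck into a 5×5 block that a 3×3 erosion only shrinks to 3×3, which is one reason a blob-size filter may be preferred over plain erosion.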
- After performing the morphological operations, the context based detector identifies contiguous regions of pixels that remain classified as foreground. Such regions may be referred to as “foreground blobs” or “foreground objects.” Each foreground blob is then compared with the corresponding region in the
background image 215. The background image 215 is composed of the RGB values 242 for each pixel in the background model. Stated differently, the background image 215 presents the scene as though only scene background were visible to the camera, i.e., an “empty stage.” In one embodiment, the context based detector 224 performs a normalized cross-correlation operation to compare a foreground object in the image 205 with the corresponding region of pixels in the background image 215. If the two regions are highly correlated, the region classified as foreground may be the result of a lighting change in the scene rather than the presence of a foreground object. (Normalized cross-correlation yields values in the range [−1.0, 1.0], and a threshold of 0.9 has proven effective for indicating high correlation.) In that case, the context based detector 224 may classify the pixels included in that foreground object as depicting background in the BG/FG mask. - After performing the operations of the pixel based
detector 222 and context baseddetector 224, the resulting BG/FG mask 210 identifies the final determination of foreground and background for theimage 205. In addition, the BG/FG segmentation component 220 may update thebackground model 230 based on the image 204 and the resulting BG/FG mask 210. For each pixel in BG/FG mask 210 classified as background, the RGB values for the corresponding pixel in the image are used to incrementally update the mean andvariance 244 inbackground model 230. In particular, exponential weights are used in order to give more weight to recent values of that pixel in previous frames. A pseudo-code example of updating the background model of the mean and variance for a red, blue, and green color channel is given below: -
TABLE I Incremental update of mean and variance for background (BG) pixels

```c
/* raw, bg_mean, bg_variance: float[3] holding the R, G, B channels */
for (int i = 0; i < 3; ++i) {
    float diff = raw[i] - bg_mean[i];
    float incr = alpha * diff;
    bg_mean[i] = bg_mean[i] + incr;
    bg_variance[i] = (1 - alpha) * (bg_variance[i] + diff * incr);
}
```
Here, alpha is a user-defined parameter defining an exponential weight for the background samples. While the value may be set as a matter of preference, a value of 0.02 has proven effective. - After updating the mean and
variance 244 of each pixel classified as background, the BG/FG segmentation component 220 also updates the background model 230 based on the color channel values of each pixel classified as foreground. As noted above, foreground objects are slowly absorbed into the background, based on the frequency at which the pixels of that foreground object are classified as foreground (i.e., based on FG frequency 246). For example, assume a camera is trained on a parking lot. When a car parks (and after any passengers have emerged), the color values of the pixels depicting the car are “absorbed” into the background model. Specifically, in one embodiment, the color values of pixels depicting the car are used to update only the mean 244 of the corresponding pixels in the background model 230. An absorption factor (i.e., the rate at which the pixel color values of a foreground object are absorbed into the background) may be derived from the observed likelihood of a pixel being classified as foreground. A pseudo-code example of absorbing a foreground pixel into the mean of the red, green, and blue color channels is given below:
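TABLE II Absorption of foreground (FG) pixels within background (BG) model

```c
/* Fraction of recent samples in which this pixel was classified as foreground. */
float likelihood = (float)fg_samples / (float)total_samples;
if (likelihood > max_likelihood)
    likelihood = max_likelihood;  /* keeps bg_window within [min_window, max_window] */
float bg_window = min_window + (likelihood / max_likelihood) * (max_window - min_window);
float absorb_rate = 1.0f / (beta * bg_window);
for (int i = 0; i < 3; ++i) {
    /* Move the mean toward the foreground color; the variance is left unchanged. */
    bg_mean[i] += absorb_rate * (raw[i] - bg_mean[i]);
}
```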
In this example, min_window (e.g., 10 seconds), max_window (e.g., 600 seconds), max_likelihood (e.g., 0.25), and beta (e.g., 10) are user-defined parameters. Note that as the FG likelihood increases, the BG window increases and FG absorption decreases. Stated differently, the absorption factor is inversely related to the foreground frequency. - Additionally, while the BG/FG segmentation component 220 determines a BG/FG mask and a collection of foreground objects independently for each frame, the
tracker 250 provides a component configured to identify and track foreground objects from one frame to the next. For example, the tracker 250 may use optical flow, contour, or feature-based tracking methods to identify an object across a sequence of frames. That is, the tracker 250 may receive the foreground blobs from the image 205 and generate a computational model for each blob. For example, the tracker 250 may receive each successive frame of raw video (i.e., image 205) along with the BG/FG mask 210 and attempt to track the motion of, for example, a car depicted by a given foreground blob as it moves through the scene. That is, the tracker 250 provides continuity to other elements of the video analytics component 120 by tracking the car from frame to frame. Over time, the tracker 250 builds a trajectory of a foreground object as it appears, moves through the scene, and eventually exits (or is absorbed into the background). The resulting trajectories may be evaluated by a variety of tools, e.g., to first learn expected patterns of trajectories and subsequently to identify unusual trajectories (e.g., a car going the wrong way) or unusual interactions between trajectories (e.g., two cars colliding). - The
MF classifier 255 may also calculate a variety of kinematic and/or appearance features of a foreground object, e.g., size, height, width, area (in pixels), reflectivity, shininess, rigidity, speed, velocity, etc. In one embodiment, the resulting values may be normalized to a value between 0 and 1, packaged as a feature vector (along with an object ID), and output by the video analytics component 120 as image metadata 260. - After evaluating the
image 205 to derive the BG/FG mask 210 and updating the background model 230, the video analytics component 120 can repeat the process for the next frame of video.
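For illustration, a hypothetical sketch of how a few size features of one foreground blob could be normalized to [0, 1] and packaged with an object ID, as the MF classifier 255 is described as doing. Normalizing by the frame dimensions is an assumption; the text does not say how each feature is scaled, and kinematic features such as speed would also require the tracker's trajectory.

```python
def blob_features(object_id, pixels, frame_width, frame_height):
    """Hypothetical micro-feature vector for one foreground blob.

    pixels: non-empty list of (row, col) coordinates belonging to the blob.
    Each feature is divided by a frame dimension so that it falls in [0, 1].
    """
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    height = max(rows) - min(rows) + 1  # bounding-box height in pixels
    width = max(cols) - min(cols) + 1   # bounding-box width in pixels
    area = len(pixels)                  # blob area in pixels
    return {
        "object_id": object_id,
        "features": [
            height / frame_height,
            width / frame_width,
            area / (frame_width * frame_height),
        ],
    }


# A 2x2 blob in a 320x240 frame.
vector = blob_features(7, [(10, 20), (10, 21), (11, 20), (11, 21)], 320, 240)
```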
FIG. 3 illustrates a method 300 for segmenting scene foreground and background using a combined pixel-based and context-based evaluation, according to one embodiment. As shown, the method 300 begins at step 305, where the video analytics component 120 receives image data, e.g., a current frame of video from a streaming camera feed. At step 310, the pixel based detector determines a distance between each pixel in the image data and the corresponding pixel in the background model. As noted, in one embodiment, the distance may be determined as a measure of the distance between the pixel color values (e.g., each of the R, G, B values) and the observed distribution of color values for that pixel (i.e., a Mahalanobis distance). Each pixel whose distance from the background model exceeds a dynamic threshold is classified as depicting foreground, resulting in a candidate background image (i.e., a candidate BG/FG mask). - At
step 315, the context based detector refines the candidate background image using morphological operations to dilate and erode foreground pixels. After performing such operations, the context based detector identifies each remaining foreground blob and compares it to the corresponding region in the background image maintained by the background model (i.e., an image made up of the predicted background color value for each pixel). Following step 315, the final classification of each pixel as background or foreground provides a BG/FG mask for that frame. At step 320, the video analytics component updates the background model based on the pixel color values of the image data received at step 305 and the classification of each pixel as depicting foreground or background. As noted, for pixels classified as background, the mean and variance of the distribution of color values for each color channel are updated. For example, the incremental update approach set forth above may be used. In addition, for each pixel classified as foreground, the mean of each color channel distribution in the background model is updated. For example, the dynamic absorption window approach discussed above may be used.
FIG. 4 illustrates a method 400 for generating a candidate BG/FG mask using a pixel based component, according to one embodiment of the invention. As shown, the method begins at step 405, where the pixel based component computes a distance (e.g., a Mahalanobis distance) between a current pixel and the corresponding pixel model maintained by the background model. At step 410, the pixel based component compares this distance to a dynamic threshold determined for the current pixel. At step 415, the pixel based component determines whether the distance exceeds the threshold. If not, at step 420, the pixel is classified as depicting background. Otherwise, the pixel is classified as depicting foreground. The method 400 ends after each pixel is evaluated and classified as depicting scene foreground or background. FIG. 5 illustrates a candidate background image 510 generated by the pixel based detector from a raw image 505 and a background model 530. As shown, each pixel classified as depicting foreground is represented by a black dot. Three relatively dense regions of pixels can be observed, along with a number of other small regions of one or two foreground pixels. - After generating the candidate BG/FG image, the context based detector refines this image to determine a final BG/
FG mask 520. FIG. 6 illustrates a method 600 for refining the candidate BG/FG mask using a context based component, according to one embodiment of the invention. As shown, the method 600 begins at step 605, where the context based detector performs morphological operations to dilate and erode pixels classified as foreground. As described, dilating foreground pixels may convert neighboring pixels that were classified as background to foreground, and eroding foreground pixels may convert small, isolated patches of foreground into background. For example, image 515 of FIG. 5 shows the candidate BG/FG image 510 after being refined using the morphological operations. As can be seen in image 515, dilating the three dense regions of foreground pixels in image 510 results in three contiguous regions 517, 518, and 519. - At
step 610, the BG/FG segmentation component identifies the foreground blobs that remain after the morphological operations of step 605 (e.g., regions 517, 518, and 519). A loop then begins to evaluate each foreground blob. At step 615, the context based component compares the region of the video frame defined by the current foreground blob with the corresponding region in the background image. As noted, normalized cross-correlation techniques may be used to determine how correlated the pixel values in the current image are with the corresponding pixels in the background image. At step 620, if a correlation threshold is satisfied, the pixels in that region are reclassified as depicting background in the current frame, and the initial assignment of foreground is presumed to be the result of a lighting change. Otherwise, the pixels in the foreground region under consideration remain classified as foreground. For example, foreground regions 518 and 519 (which each correspond to a car in the raw image 505) would be expected to have a relatively weak normalized cross-correlation score and remain as foreground regions 521 and 522 in the final BG/FG mask 520. However, region 517 would be expected to have a relatively high normalized cross-correlation score and be reclassified as background. This result is depicted in the final BG/FG mask 520, which includes foreground objects 521 and 522, corresponding to regions 518 and 519 of image 515, while the pixels of region 517 have been reclassified as background.
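Steps 610 through 620 can be sketched as connected-component labeling followed by a normalized cross-correlation test per blob. This minimal sketch assumes single-channel (grayscale) frames stored as nested lists, 4-connectivity for blobs, and the 0.9 threshold mentioned above; the function names are hypothetical. A blob whose pixels are constant in either image has an undefined correlation, which this sketch treats as 0.0 so that the blob stays foreground.

```python
import math
from collections import deque


def foreground_blobs(mask):
    """Label 4-connected foreground regions; return one list of (row, col) per blob."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, blob = deque([(r, c)]), []
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs


def ncc(a, b):
    """Normalized cross-correlation of two equal-length intensity lists, in [-1, 1]."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    return num / den if den > 0 else 0.0  # constant region: treat as uncorrelated


def suppress_lighting_changes(mask, frame, background, threshold=0.9):
    """Reclassify as background every blob that correlates strongly with the background image."""
    refined = [row[:] for row in mask]
    for blob in foreground_blobs(mask):
        current = [frame[y][x] for y, x in blob]
        expected = [background[y][x] for y, x in blob]
        if ncc(current, expected) > threshold:
            for y, x in blob:
                refined[y][x] = False
    return refined


# Blob A is a brightened copy of the background (a lighting change);
# blob B reverses the background pattern (a real object).
background = [[10 * y + x for x in range(6)] for y in range(4)]
frame = [row[:] for row in background]
blob_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
blob_b = [(0, 4), (0, 5), (1, 4), (1, 5)]
for y, x in blob_a:
    frame[y][x] = 2 * background[y][x] + 5
for y, x in blob_b:
    frame[y][x] = 19 - background[y][x]
mask = [[(y, x) in blob_a + blob_b for x in range(6)] for y in range(4)]
refined = suppress_lighting_changes(mask, frame, background)
```

Because NCC subtracts each region's mean and divides by its spread, a uniform brightening or a gain change leaves the score near 1.0, which is why it can separate lighting changes from genuine objects.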
FIG. 7 illustrates a method 700 for updating a background model, according to one embodiment of the invention. The method 700 may be performed after a current video frame of a video stream has been segmented into background and foreground using the techniques discussed above. The method 700 begins at step 705, where the mean and variance of each pixel classified as depicting background are updated based on the color channel values of the pixel in the current frame. As noted, the background model may maintain a distribution of pixel values for each pixel, along with a mean and variance related to the distribution. Further, the update to the mean and variance may use exponential weights to give more weight to recent background samples than to older samples. - Following
step 705, a loop begins to absorb a portion of each foreground pixel into the background using a dynamic absorption factor. First, at step 710, the system determines a frequency at which the current pixel has been classified as foreground. Note that the frequency may be determined relative to a configured window of past frames, e.g., over the previous 1000 frames. At step 715, the system determines a dynamic absorption factor as described above. Again, the more frequently the pixel is classified as foreground (i.e., the more active that region of the frame), the lower the absorption rate. At step 720, the system updates the mean for the pixel in the background model based on the color value of the pixel in the current frame and the absorption factor. Again, this update leaves the variance in the background model unchanged. At step 730, if additional foreground pixels remain to be absorbed into the background model, the method returns to step 710 and the loop repeats for the next foreground pixel. Otherwise, the method 700 ends.
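Method 700 can be sketched per pixel by combining the update rules of TABLE I and TABLE II. In this hypothetical PixelModel class, the foreground frequency is tracked over a sliding window of past frames (the deque length stands in for the configured window, e.g., 1000 frames), and min_window and max_window are treated as plain numbers, since the text does not specify how their units of seconds translate into a per-frame rate. Defaults follow the example values given above.

```python
from collections import deque


class PixelModel:
    """Hypothetical per-pixel background model: R, G, B means and variances
    plus a sliding window of recent foreground/background labels."""

    def __init__(self, mean, variance, history=1000):
        self.mean = list(mean)
        self.variance = list(variance)
        self.fg_history = deque(maxlen=history)  # 1 = foreground, 0 = background

    def update(self, raw, is_foreground, alpha=0.02, min_window=10.0,
               max_window=600.0, max_likelihood=0.25, beta=10.0):
        self.fg_history.append(1 if is_foreground else 0)
        if not is_foreground:
            # Step 705 / TABLE I: exponentially weighted mean and variance.
            for i in range(3):
                diff = raw[i] - self.mean[i]
                incr = alpha * diff
                self.mean[i] += incr
                self.variance[i] = (1 - alpha) * (self.variance[i] + diff * incr)
            return
        # Steps 710-720 / TABLE II: foreground frequency -> absorption rate.
        likelihood = min(sum(self.fg_history) / len(self.fg_history), max_likelihood)
        bg_window = min_window + (likelihood / max_likelihood) * (max_window - min_window)
        absorb_rate = 1.0 / (beta * bg_window)
        for i in range(3):
            # Only the mean moves toward the foreground color; variance is unchanged.
            self.mean[i] += absorb_rate * (raw[i] - self.mean[i])


# A rarely active pixel (e.g., an empty parking stall) vs. a busy one (a roadway).
quiet = PixelModel((0, 0, 0), (4.0, 4.0, 4.0), history=100)
for _ in range(99):
    quiet.update((0, 0, 0), False)
quiet.update((255, 255, 255), True)  # likelihood 0.01 -> short window, fast absorption

busy = PixelModel((0, 0, 0), (4.0, 4.0, 4.0), history=100)
busy.update((255, 255, 255), True)   # likelihood 1.0, capped at 0.25 -> slow absorption
```

In this usage, the quiet pixel's mean moves roughly eighteen times further toward the new color than the busy pixel's, mirroring the parking-stall versus roadway example.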
FIG. 8 illustrates an example computing server 800 that includes a video analytics component, according to one embodiment of the invention. As shown, the computing system 800 includes, without limitation, a central processing unit (CPU) 805, a graphics processing unit (GPU) 806, a network interface 815, a memory 820, and storage 830, each connected to a bus 817. The computing system 800 may also include an I/O device interface 810 connecting I/O devices 812 (e.g., keyboard, display, and mouse devices) to the computing system 800. Further, in the context of this disclosure, the computing elements shown in computing system 800 may correspond to a physical computing system (e.g., a system in a data center) or may be a virtual computing instance executing within a computing cloud. - The
CPU 805 retrieves and executes programming instructions stored in the memory 820, and stores and retrieves application data residing in the storage 830. The interconnect 817 is used to transmit programming instructions and application data between the CPU 805, I/O devices interface 810, storage 830, network interface 815, and memory 820. Note that CPU 805 is included to be representative of a single CPU, multiple CPUs, a single CPU having multiple processing cores, and the like. The memory 820 is generally included to be representative of a random access memory. The storage 830 may be a disk drive storage device. Although shown as a single unit, the storage 830 may be a combination of fixed and/or removable storage devices, such as fixed disc drives, removable memory cards, optical storage, network attached storage (NAS), or a storage area network (SAN). - The graphics
processing unit (GPU) 806 is a specialized integrated circuit designed to accelerate the creation of images in a frame buffer intended for output to a display. GPUs are very efficient at manipulating computer graphics and are generally more effective than general-purpose CPUs for algorithms that process large blocks of data in parallel. - In one embodiment, components of the BG/FG segmentation component discussed above may be implemented to process frames of a video stream in parallel on
GPU 806. For example, the pixel based detector may be configured to generate the candidate background image using the GPU 806 to evaluate multiple pixels simultaneously. Similarly, the context based component may perform the normalized cross-correlations on the GPU 806, and the GPU 806 may also be used to update multiple pixels in the background model simultaneously (based on the classification of pixels as depicting foreground and background). Further, in addition to evaluating blocks of pixel data of the background image, background model, and raw image on the GPU 806 in parallel, other components of the video analytics system may be implemented to execute on GPU 806 as well. For example, the micro-feature classifier may use the GPU 806 to determine the kinematic or appearance features of a foreground object and to generate elements of the feature vector in parallel. More generally, the video analytics component 822 may use GPU-accelerated computing to speed up segmenting scene foreground and background in frames of streaming video. The resulting efficiency in processing pixel data on GPU 806 allows the server 800 to scale to support multiple camera feeds using a smaller hardware footprint. - Illustratively, the
memory 820 includes a video analytics component 822 and a current frame 824, and the storage 830 includes a background model 832 and a background image 834. As discussed above, the video analytics component 822 may be configured to segment scene foreground from background in the current frame 824 using both a pixel based detector and a context based detector. Further, the video analytics component 822 may maintain an accurate background model 832 by incrementally updating the mean and variance for pixels in the background model 832, absorbing foreground pixels into the background model 832 via an absorption window, and dynamically updating the background/foreground thresholds used by the pixel based component. Additionally, the background image 834 provides a representation of the scene absent any foreground objects; this representation may change over time (1) as elements of scene foreground are absorbed by the background model 832 and (2) as background illumination gradually changes. Together, these techniques ensure that the video analytics component 822 can respond effectively and efficiently to changes in a scene without unduly increasing computational complexity. - Advantageously, using both a pixel based detector and a context based detector provides an effective technique for segmenting scene foreground from background in a video stream. Further, this approach can scale to process large numbers of camera feeds simultaneously, e.g., using parallel processing architectures.
- In the preceding, reference is made to embodiments of the invention. However, the invention is not limited to specific described embodiments. Instead, any combination of the following features and elements, whether related to different embodiments or not, is contemplated to implement and practice the invention. Furthermore, although embodiments of the invention may achieve advantages over other possible solutions and/or over the prior art, whether or not a particular advantage is achieved by a given embodiment is not limiting of the invention. Thus, the following aspects, features, embodiments and advantages are merely illustrative and are not considered elements or limitations of the appended claims except where explicitly recited in a claim(s). Likewise, reference to “the invention” shall not be construed as a generalization of any inventive subject matter disclosed herein and shall not be considered to be an element or limitation of the appended claims except where explicitly recited in a claim(s).
- Aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
- Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a computer readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the current context, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
- The flowchart and block diagrams in the Figures illustrate the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
- Embodiments of the invention may be provided to end users through a cloud computing infrastructure. Cloud computing generally refers to the provision of scalable computing resources as a service over a network. More formally, cloud computing may be defined as a computing capability that provides an abstraction between the computing resource and its underlying technical architecture (e.g., servers, storage, networks), enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction. Thus, cloud computing allows a user to access virtual computing resources (e.g., storage, data, applications, and even complete virtualized computing systems) in “the cloud,” without regard for the underlying physical systems (or locations of those systems) used to provide the computing resources.
- Users can access any of the computing resources that reside in the cloud at any time, from anywhere across the Internet. For example, in the context of this disclosure, a virtual server instance in a computing cloud could be configured to execute the video analytics components to process a streaming camera feed (or feeds). In such a case, the computing resources could be scaled as needed as multiple camera feeds are added.
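Scaling across camera feeds works because each feed keeps its own, independent background state, so feeds can be dispatched to separate workers. The sketch below is a hypothetical illustration of that idea only: `count_foreground_per_frame` uses a simple running-average background with a fixed threshold (a stand-in, not the detector described here), and `process_feeds` fans feeds out with Python's standard `concurrent.futures.ThreadPoolExecutor`. Because pure-Python pixel loops are limited by the interpreter lock, a real deployment would more likely use processes, native code, or separate server instances.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

Frame = List[float]


def count_foreground_per_frame(frames: List[Frame], alpha: float = 0.1,
                               thresh: float = 25.0) -> List[int]:
    """Hypothetical per-feed worker: count foreground pixels in each frame."""
    background: List[float] = []
    counts: List[int] = []
    for frame in frames:
        if not background:
            # First frame of this feed initializes its private background.
            background = [float(x) for x in frame]
            counts.append(0)
            continue
        fg = 0
        for i, x in enumerate(frame):
            if abs(x - background[i]) > thresh:
                fg += 1
            else:
                # Only background pixels update the running average.
                background[i] += alpha * (x - background[i])
        counts.append(fg)
    return counts


def process_feeds(feeds: Dict[str, List[Frame]], workers: int = 4) -> Dict[str, List[int]]:
    """Run each camera feed on its own worker; feeds share no state."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {cam: pool.submit(count_foreground_per_frame, frames)
                   for cam, frames in feeds.items()}
        return {cam: fut.result() for cam, fut in futures.items()}


print(process_feeds({"cam-a": [[0, 0], [0, 100]], "cam-b": [[50, 50], [50, 50]]}))
# {'cam-a': [0, 1], 'cam-b': [0, 0]}
```

Adding a camera in this sketch only adds another entry to `feeds`; capacity grows by raising `workers` or by moving feeds to additional machines.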
- While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.
Claims (2)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/582,524 US10303955B2 (en) | 2014-10-29 | 2017-04-28 | Foreground detector for video analytics system |
US16/385,732 US10872243B2 (en) | 2014-10-29 | 2019-04-16 | Foreground detector for video analytics system |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/526,756 US9349054B1 (en) | 2014-10-29 | 2014-10-29 | Foreground detector for video analytics system |
PCT/US2015/058025 WO2016069881A1 (en) | 2014-10-29 | 2015-10-29 | Foreground detector for video analytics system |
US15/582,524 US10303955B2 (en) | 2014-10-29 | 2017-04-28 | Foreground detector for video analytics system |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2015/058025 Continuation WO2016069881A1 (en) | 2014-10-29 | 2015-10-29 | Foreground detector for video analytics system |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/385,732 Continuation US10872243B2 (en) | 2014-10-29 | 2019-04-16 | Foreground detector for video analytics system |
Publications (2)
Publication Number | Publication Date |
---|---|
US20180082130A1 (en) | 2018-03-22 |
US10303955B2 (en) | 2019-05-28 |
Family ID: 55852998
Family Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/526,756 Active US9349054B1 (en) | 2014-10-29 | 2014-10-29 | Foreground detector for video analytics system |
US15/582,524 Active US10303955B2 (en) | 2014-10-29 | 2017-04-28 | Foreground detector for video analytics system |
US16/385,732 Active US10872243B2 (en) | 2014-10-29 | 2019-04-16 | Foreground detector for video analytics system |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/526,756 Active US9349054B1 (en) | 2014-10-29 | 2014-10-29 | Foreground detector for video analytics system |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/385,732 Active US10872243B2 (en) | 2014-10-29 | 2019-04-16 | Foreground detector for video analytics system |
Country Status (2)
Country | Link |
---|---|
US (3) | US9349054B1 (en) |
WO (1) | WO2016069881A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190095741A1 (en) * | 2015-03-05 | 2019-03-28 | Canon Kabushiki Kaisha | Region-of-interest detection apparatus, region-of-interest detection method, and recording medium |
US10282847B2 (en) * | 2016-07-29 | 2019-05-07 | Otis Elevator Company | Monitoring system of a passenger conveyor and monitoring method thereof |
US10373340B2 (en) * | 2014-10-29 | 2019-08-06 | Omni Ai, Inc. | Background foreground model with dynamic absorption window and incremental update for background model thresholds |
US10872243B2 (en) | 2014-10-29 | 2020-12-22 | Intellective Ai, Inc. | Foreground detector for video analytics system |
Families Citing this family (42)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10839203B1 (en) | 2016-12-27 | 2020-11-17 | Amazon Technologies, Inc. | Recognizing and tracking poses using digital imagery captured from multiple fields of view |
US11334836B2 (en) | 2017-01-04 | 2022-05-17 | MSM Holdings Pte Ltd | System and method for analyzing media for talent discovery |
US10496949B2 (en) | 2017-01-04 | 2019-12-03 | Christopher Zoumalan | Compositions and methods for treating cutaneous conditions |
CN110114801B (en) * | 2017-01-23 | 2022-09-20 | 富士通株式会社 | Image foreground detection device and method and electronic equipment |
GB2560177A (en) | 2017-03-01 | 2018-09-05 | Thirdeye Labs Ltd | Training a computational neural network |
GB2560387B (en) | 2017-03-10 | 2022-03-09 | Standard Cognition Corp | Action identification using neural networks |
US10699421B1 (en) | 2017-03-29 | 2020-06-30 | Amazon Technologies, Inc. | Tracking objects in three-dimensional space using calibrated visual cameras and depth cameras |
JP6930179B2 (en) * | 2017-03-30 | 2021-09-01 | 富士通株式会社 | Learning equipment, learning methods and learning programs |
CN108875759B (en) * | 2017-05-10 | 2022-05-24 | 华为技术有限公司 | Image processing method and device and server |
US10133933B1 (en) | 2017-08-07 | 2018-11-20 | Standard Cognition, Corp | Item put and take detection using image recognition |
US11250376B2 (en) | 2017-08-07 | 2022-02-15 | Standard Cognition, Corp | Product correlation analysis using deep learning |
US11023850B2 (en) | 2017-08-07 | 2021-06-01 | Standard Cognition, Corp. | Realtime inventory location management using deep learning |
US10650545B2 (en) | 2017-08-07 | 2020-05-12 | Standard Cognition, Corp. | Systems and methods to check-in shoppers in a cashier-less store |
US10853965B2 (en) | 2017-08-07 | 2020-12-01 | Standard Cognition, Corp | Directional impression analysis using deep learning |
US11200692B2 (en) | 2017-08-07 | 2021-12-14 | Standard Cognition, Corp | Systems and methods to check-in shoppers in a cashier-less store |
US11232687B2 (en) | 2017-08-07 | 2022-01-25 | Standard Cognition, Corp | Deep learning-based shopper statuses in a cashier-less store |
US10474991B2 (en) | 2017-08-07 | 2019-11-12 | Standard Cognition, Corp. | Deep learning-based store realograms |
US10445694B2 (en) | 2017-08-07 | 2019-10-15 | Standard Cognition, Corp. | Realtime inventory tracking using deep learning |
US10127438B1 (en) * | 2017-08-07 | 2018-11-13 | Standard Cognition, Corp | Predicting inventory events using semantic diffing |
US10474988B2 (en) | 2017-08-07 | 2019-11-12 | Standard Cognition, Corp. | Predicting inventory events using foreground/background processing |
US11232294B1 (en) | 2017-09-27 | 2022-01-25 | Amazon Technologies, Inc. | Generating tracklets from digital imagery |
CN109598741A (en) * | 2017-09-30 | 2019-04-09 | 佳能株式会社 | Image processing apparatus and method and monitoring system |
US11284041B1 (en) | 2017-12-13 | 2022-03-22 | Amazon Technologies, Inc. | Associating items with actors based on digital imagery |
US11074453B2 (en) | 2018-01-31 | 2021-07-27 | Hewlett Packard Enterprise Development Lp | Video active region batching |
US11468698B1 (en) | 2018-06-28 | 2022-10-11 | Amazon Technologies, Inc. | Associating events with actors using digital imagery and machine learning |
US11468681B1 (en) | 2018-06-28 | 2022-10-11 | Amazon Technologies, Inc. | Associating events with actors using digital imagery and machine learning |
US11482045B1 (en) | 2018-06-28 | 2022-10-25 | Amazon Technologies, Inc. | Associating events with actors using digital imagery and machine learning |
US20200099961A1 (en) * | 2018-09-24 | 2020-03-26 | Dice Corporation | Networked video management system |
US11783707B2 (en) | 2018-10-09 | 2023-10-10 | Ford Global Technologies, Llc | Vehicle path planning |
US10795773B2 (en) * | 2018-11-12 | 2020-10-06 | Eagle Eye Networks, Inc | Persistent video camera and method of operation |
US10924669B2 (en) * | 2018-11-12 | 2021-02-16 | Eagle Eye Networks, Inc. | Persistent video camera and method of operation |
US11232575B2 (en) | 2019-04-18 | 2022-01-25 | Standard Cognition, Corp | Systems and methods for deep learning-based subject persistence |
US11460851B2 (en) | 2019-05-24 | 2022-10-04 | Ford Global Technologies, Llc | Eccentricity image fusion |
US11521494B2 (en) | 2019-06-11 | 2022-12-06 | Ford Global Technologies, Llc | Vehicle eccentricity mapping |
US11662741B2 (en) * | 2019-06-28 | 2023-05-30 | Ford Global Technologies, Llc | Vehicle visual odometry |
CN111311573B (en) * | 2020-02-12 | 2024-01-30 | 贵州理工学院 | Branch determination method and device and electronic equipment |
US11398094B1 (en) | 2020-04-06 | 2022-07-26 | Amazon Technologies, Inc. | Locally and globally locating actors by digital cameras and machine learning |
US11443516B1 (en) | 2020-04-06 | 2022-09-13 | Amazon Technologies, Inc. | Locally and globally locating actors by digital cameras and machine learning |
US11303853B2 (en) | 2020-06-26 | 2022-04-12 | Standard Cognition, Corp. | Systems and methods for automated design of camera placement and cameras arrangements for autonomous checkout |
US11361468B2 (en) | 2020-06-26 | 2022-06-14 | Standard Cognition, Corp. | Systems and methods for automated recalibration of sensors for autonomous checkout |
US20220417495A1 (en) * | 2021-06-28 | 2022-12-29 | Gentex Corporation | Stale video detection |
CN116546180B (en) * | 2022-11-21 | 2024-02-23 | 马凯翔 | Naked eye suspension 3D video generation method, device, equipment and storage medium |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5748775A (en) * | 1994-03-09 | 1998-05-05 | Nippon Telegraph And Telephone Corporation | Method and apparatus for moving object extraction based on background subtraction |
US5751378A (en) * | 1996-09-27 | 1998-05-12 | General Instrument Corporation | Scene change detector for digital video |
US6263088B1 (en) * | 1997-06-19 | 2001-07-17 | Ncr Corporation | System and method for tracking movement of objects in a scene |
US6570608B1 (en) * | 1998-09-30 | 2003-05-27 | Texas Instruments Incorporated | System and method for detecting interactions of people and vehicles |
US6661918B1 (en) * | 1998-12-04 | 2003-12-09 | Interval Research Corporation | Background estimation and segmentation based on range and color |
US6674877B1 (en) * | 2000-02-03 | 2004-01-06 | Microsoft Corporation | System and method for visually tracking occluded objects in real time |
US7136525B1 (en) * | 1999-09-20 | 2006-11-14 | Microsoft Corporation | System and method for background maintenance of an image sequence |
US7227893B1 (en) * | 2002-08-22 | 2007-06-05 | Xlabs Holdings, Llc | Application-specific object-based segmentation and recognition system |
US7929730B2 (en) * | 2007-10-29 | 2011-04-19 | Industrial Technology Research Institute | Method and system for object detection and tracking |
US8073197B2 (en) * | 2005-03-17 | 2011-12-06 | British Telecommunications Public Limited Company | Method of tracking objects in a video sequence |
US8094943B2 (en) * | 2007-09-27 | 2012-01-10 | Behavioral Recognition Systems, Inc. | Background-foreground module for video analysis system |
US8358834B2 (en) * | 2009-08-18 | 2013-01-22 | Behavioral Recognition Systems | Background model for complex and dynamic scenes |
US8456528B2 (en) * | 2007-03-20 | 2013-06-04 | International Business Machines Corporation | System and method for managing the interaction of object detection and tracking systems in video surveillance |
Family Cites Families (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060165386A1 (en) | 2002-01-08 | 2006-07-27 | Cernium, Inc. | Object selective video recording |
US6999600B2 (en) | 2003-01-30 | 2006-02-14 | Objectvideo, Inc. | Video scene background maintenance using change detection and classification |
US7418134B2 (en) | 2003-05-12 | 2008-08-26 | Princeton University | Method and apparatus for foreground segmentation of video sequences |
US7224735B2 (en) | 2003-05-21 | 2007-05-29 | Mitsubishi Electronic Research Laboratories, Inc. | Adaptive background image updating |
US7127083B2 (en) | 2003-11-17 | 2006-10-24 | Vidient Systems, Inc. | Video surveillance system with object detection and probability scoring based on object class |
WO2005122094A1 (en) | 2004-06-14 | 2005-12-22 | Agency For Science, Technology And Research | Method for detecting desired objects in a highly dynamic environment by a monitoring system |
US20060018516A1 (en) | 2004-07-22 | 2006-01-26 | Masoud Osama T | Monitoring activity using video information |
US7620266B2 (en) | 2005-01-20 | 2009-11-17 | International Business Machines Corporation | Robust and efficient foreground analysis for real-time video surveillance |
JP2009533778A (en) | 2006-04-17 | 2009-09-17 | オブジェクトビデオ インコーポレイテッド | Video segmentation using statistical pixel modeling |
US8467570B2 (en) | 2006-06-14 | 2013-06-18 | Honeywell International Inc. | Tracking system with fused motion and object detection |
US7916944B2 (en) | 2007-01-31 | 2011-03-29 | Fuji Xerox Co., Ltd. | System and method for feature level foreground segmentation |
US8086036B2 (en) | 2007-03-26 | 2011-12-27 | International Business Machines Corporation | Approach for resolving occlusions, splits and merges in video images |
US8200011B2 (en) | 2007-09-27 | 2012-06-12 | Behavioral Recognition Systems, Inc. | Context processor for video analysis system |
US8301577B2 (en) | 2008-11-21 | 2012-10-30 | National Yunlin University Of Science And Technology | Intelligent monitoring system for establishing reliable background information in a complex image environment |
US9373055B2 (en) | 2008-12-16 | 2016-06-21 | Behavioral Recognition Systems, Inc. | Hierarchical sudden illumination change detection using radiance consistency within a spatial neighborhood |
RU2484531C2 (en) | 2009-01-22 | 2013-06-10 | Государственное научное учреждение центральный научно-исследовательский и опытно-конструкторский институт робототехники и технической кибернетики (ЦНИИ РТК) | Apparatus for processing video information of security alarm system |
US8285046B2 (en) | 2009-02-18 | 2012-10-09 | Behavioral Recognition Systems, Inc. | Adaptive update of background pixel thresholds using sudden illumination change detection |
US20110043689A1 (en) | 2009-08-18 | 2011-02-24 | Wesley Kenneth Cobb | Field-of-view change detection |
US8218819B2 (en) | 2009-09-01 | 2012-07-10 | Behavioral Recognition Systems, Inc. | Foreground object detection in a video surveillance system |
US8218818B2 (en) | 2009-09-01 | 2012-07-10 | Behavioral Recognition Systems, Inc. | Foreground object tracking |
US8280158B2 (en) * | 2009-10-05 | 2012-10-02 | Fuji Xerox Co., Ltd. | Systems and methods for indexing presentation videos |
JP5713790B2 (en) * | 2011-05-09 | 2015-05-07 | キヤノン株式会社 | Image processing apparatus, image processing method, and program |
US8903167B2 (en) * | 2011-05-12 | 2014-12-02 | Microsoft Corporation | Synthesizing training samples for object recognition |
US20130279598A1 (en) * | 2011-10-14 | 2013-10-24 | Ryan G. Gomes | Method and Apparatus For Video Compression of Stationary Scenes |
US8666117B2 (en) | 2012-04-06 | 2014-03-04 | Xerox Corporation | Video-based system and method for detecting exclusion zone infractions |
US9111353B2 (en) | 2012-06-29 | 2015-08-18 | Behavioral Recognition Systems, Inc. | Adaptive illuminance filter in a video analysis system |
US9317908B2 (en) | 2012-06-29 | 2016-04-19 | Behavioral Recognition System, Inc. | Automatic gain control filter in a video analysis system |
US9113143B2 (en) | 2012-06-29 | 2015-08-18 | Behavioral Recognition Systems, Inc. | Detecting and responding to an out-of-focus camera in a video analytics system |
US9412025B2 (en) | 2012-11-28 | 2016-08-09 | Siemens Schweiz Ag | Systems and methods to classify moving airplanes in airports |
US9158985B2 (en) | 2014-03-03 | 2015-10-13 | Xerox Corporation | Method and apparatus for processing image of scene of interest |
US9349054B1 (en) | 2014-10-29 | 2016-05-24 | Behavioral Recognition Systems, Inc. | Foreground detector for video analytics system |
- 2014-10-29: US US14/526,756, patent US9349054B1 (Active)
- 2015-10-29: WO PCT/US2015/058025, WO2016069881A1 (Application Filing)
- 2017-04-28: US US15/582,524, patent US10303955B2 (Active)
- 2019-04-16: US US16/385,732, patent US10872243B2 (Active)
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10373340B2 (en) * | 2014-10-29 | 2019-08-06 | Omni Ai, Inc. | Background foreground model with dynamic absorption window and incremental update for background model thresholds |
US10872243B2 (en) | 2014-10-29 | 2020-12-22 | Intellective Ai, Inc. | Foreground detector for video analytics system |
US10916039B2 (en) | 2014-10-29 | 2021-02-09 | Intellective Ai, Inc. | Background foreground model with dynamic absorption window and incremental update for background model thresholds |
US20190095741A1 (en) * | 2015-03-05 | 2019-03-28 | Canon Kabushiki Kaisha | Region-of-interest detection apparatus, region-of-interest detection method, and recording medium |
US10748023B2 (en) * | 2015-03-05 | 2020-08-18 | Canon Kabushiki Kaisha | Region-of-interest detection apparatus, region-of-interest detection method, and recording medium |
US10282847B2 (en) * | 2016-07-29 | 2019-05-07 | Otis Elevator Company | Monitoring system of a passenger conveyor and monitoring method thereof |
Also Published As
Publication number | Publication date |
---|---|
WO2016069881A1 (en) | 2016-05-06 |
US10303955B2 (en) | 2019-05-28 |
US20160125245A1 (en) | 2016-05-05 |
US20190311204A1 (en) | 2019-10-10 |
US9349054B1 (en) | 2016-05-24 |
US10872243B2 (en) | 2020-12-22 |
Similar Documents
Publication | Title |
---|---|
US10872243B2 (en) | Foreground detector for video analytics system |
US10916039B2 (en) | Background foreground model with dynamic absorption window and incremental update for background model thresholds |
US9471844B2 (en) | Dynamic absorption window for foreground background detector |
US10679315B2 (en) | Detected object tracker for a video analytics system |
Braham et al. | Semantic background subtraction |
US9959630B2 (en) | Background model for complex and dynamic scenes |
Martel-Brisson et al. | Learning and removing cast shadows through a multidistribution approach |
US20170083764A1 (en) | Detected object tracker for a video analytics system |
US20170083766A1 (en) | Detected object tracker for a video analytics system |
WO2016069902A9 (en) | Background foreground model with dynamic absorbtion window and incremental update for background model thresholds |
US20170083765A1 (en) | Detected object tracker for a video analytics system |
KR101646000B1 (en) | Surveillance System And Method To Object And Area |
WO2017053822A1 (en) | Detected object tracker for a video analytics system |
JP2016095701A (en) | Image processor, image processing method, and program |
KR101600617B1 (en) | Method for detecting human in image frame |
Agrawal et al. | An improved Gaussian Mixture Method based background subtraction model for moving object detection in outdoor scene |
Sharma et al. | A survey on moving object detection methods in video surveillance |
CN111027482A (en) | Behavior analysis method and device based on motion vector segmentation analysis |
Zin et al. | Background modeling using special type of Markov Chain |
Aung et al. | Foreground objects segmentation in videos with improved codebook model |
Batra et al. | Application Of ADNN For Background Subtraction In Smart Surveillance System |
Appiah et al. | Binary object recognition system on FPGA with bSOM |
Ramamoorthy et al. | Intelligent video surveillance system using background subtraction technique and its analysis |
Li et al. | Joint optimization of background subtraction and object detection for night surveillance |
Vidya et al. | Video Surveillance System for Security Applications |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: PEPPERWOOD FUND II, LP, TEXAS. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:GIANT GRAY, INC.;REEL/FRAME:043147/0402. Effective date: 20170131 |
AS | Assignment | Owner name: OMNI AI, INC., TEXAS. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:PEPPERWOOD FUND II, LP;REEL/FRAME:043147/0445. Effective date: 20170201 |
AS | Assignment | Owner name: BEHAVIORAL RECOGNITION SYSTEMS, INC., TEXAS. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SAITWAL, KISHOR ADINATH;RISINGER, LON;COBB, WESLEY KENNETH;REEL/FRAME:043147/0289. Effective date: 20141027 |
AS | Assignment | Owner name: GIANT GRAY, INC., TEXAS. Free format text: CHANGE OF NAME;ASSIGNOR:BEHAVIORAL RECOGNITION SYSTEMS, INC.;REEL/FRAME:043380/0722. Effective date: 20160321 |
STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
AS | Assignment | Owner name: INTELLECTIVE AI, INC., TEXAS. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:OMNI AI, INC.;REEL/FRAME:052216/0585. Effective date: 20200124 |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY. Year of fee payment: 4 |