WO2016201683A1 - Cloud platform with multi camera synchronization - Google Patents

Cloud platform with multi camera synchronization

Info

Publication number
WO2016201683A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
event
determining
features
location
Application number
PCT/CN2015/081836
Other languages
French (fr)
Inventor
Song CAO
Genquan DUAN
Original Assignee
Wizr
Application filed by Wizr
Priority to PCT/CN2015/081836
Priority to EP15895256.4A (EP3311334A4)
Publication of WO2016201683A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/44 Event detection
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/47 Detecting features for summarising video content
    • G06V20/48 Matching video sequences
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Definitions

  • the invention is related to a video processing system.
  • the present invention relates to a system described in the disclosure and/or shown in the drawings.
  • the present invention relates to a method as described in the disclosure and/or shown in the drawings.
  • the present invention relates to a method comprising: determining a first feature description of an event in a first video; determining a first location of the event in the first video; determining a first time stamp of the first video; determining a second feature description of an event in a second video; determining a second location of the event in the second video; determining a second time stamp of the second video; and determining a correlation value based on the first location, the second location, the first time stamp, the second time stamp, the first feature description, and the second feature description.
  • the present invention relates to a video processing system comprising: one or more video storage locations storing a plurality of videos recorded from a plurality of video cameras; one or more video processors communicatively coupled with the one or more video storage locations, the one or more video processors configured to: determine a first feature description of an event in a first video; determine a first location of the event in the first video; determine a first time stamp of the first video; determine a second feature description of an event in a second video; determine a second location of the event in the second video; determine a second time stamp of the second video; and determine a correlation value based on the first location, the second location, the first time stamp, the second time stamp, the first feature description, and the second feature description.
  • the present invention relates to a method comprising: segmenting a video spanning a first duration of a scene into a plurality of video clips having a specific time period; creating a visual dictionary of the scene from the plurality of video clips; determining features within the video clips; ranking the video clips based on the features; and creating a summarization video from the highest ranked video clips having a second duration that is significantly shorter than the first duration.
  • the present invention relates to a method comprising: track an event of interest in a specific number of frames within a video; determine the location of the event within each of the specific number of frames; and estimate a location of the event in a subsequent frame within the video based on the location of the event within each of the specific number of frames.
  • the present invention relates to a method comprising: determining that an event has occurred in a video; determining event features; determining whether the event is a false alarm event based on a comparison of the event features with event features in a false alarm database; and notifying a user about the event.
  • the present invention relates to a method comprising: determining that an event has occurred in a video; determining event features; determining whether the event is a false alarm event based on a comparison of the event features with event features in a false alarm database; notifying a user about the event; receiving an indication from the user that the event is a false alarm; and updating the false alarm database with the event features using a machine learning algorithm.
  • the present invention relates to a method for video processing comprising: converting the video to a second video that has at least one of a lower resolution and a lower frame rate; and processing the video.
  • the present invention relates to a method comprising: determining features and/or events in a video; determining foreground regions of at least a portion of the video that includes the features and/or events; determining a region of interest within one or more frames of the at least a portion of the video; and detecting the presence of a human within the region of interest within the at least a portion of the video.
  • the present invention relates to a method comprising: determining features and/or events in a video; determining foreground regions of at least a portion of the video that includes the features and/or events; determining a region of interest within one or more frames of the at least a portion of the video; detecting the presence of a human within the region of interest within the at least a portion of the video; and determining whether the features and/or events represent a false alarm.
  • Figure 1 illustrates a block diagram of a system 100 for multi-camera synchronization.
  • Figure 2 is a flowchart of an example process for determining a similarity between two events recorded by one or more cameras according to some embodiments.
  • Figure 3 is a flowchart of an example process for summarizing video from a camera according to some embodiments.
  • Figure 4 is a flowchart of an example process for predicting the future location of an event in a video frame and/or outputting a control value to a camera to track the event according to some embodiments.
  • Figure 5 shows an illustrative computational system for performing functionality to facilitate implementation of embodiments described herein.
  • Figure 6 is a flowchart of an example process for processing videos according to some embodiments.
  • Figure 7 is a flowchart of an example process for event filtering according to some embodiments.
  • Systems and methods are disclosed for multi-camera synchronization of events. Systems and methods are also disclosed for creating summarization videos. Systems and methods are also disclosed for predicting the future location of events within a video frame.
  • FIG. 1 illustrates a block diagram of a system 100 that may be used in various embodiments.
  • the system 100 may include a plurality of cameras: camera 120, camera 121, and camera 122. While three cameras are shown, any number of cameras may be included.
  • These cameras may include any type of video camera such as, for example, a wireless video camera, a black and white video camera, surveillance video camera, portable cameras, battery powered cameras, CCTV cameras, Wi-Fi enabled cameras, smartphones, smart devices, tablets, computers, GoPro cameras, wearable cameras, etc.
  • the cameras may be positioned anywhere such as, for example, within the same geographic location, in separate geographic locations, positioned to record portions of the same scene, positioned to record different portions of the same scene, etc.
  • the cameras may be owned and/or operated by different users, organizations, companies, entities, etc.
  • the cameras may be coupled with the network 115.
  • the network 115 may, for example, include the Internet, a telephonic network, a wireless telephone network, a 3G network, etc.
  • the network may include multiple networks, connections, servers, switches, routers, connections, etc. that may enable the transfer of data.
  • the network 115 may be or may include the Internet.
  • the network may include one or more LAN, WAN, WLAN, MAN, SAN, PAN, EPN, and/or VPN.
  • one or more of the cameras may be coupled with a base station, digital video recorder, or a controller that is then coupled with the network 115.
  • the system 100 may also include video data storage 105 and/or a video processor 110.
  • the video data storage 105 and the video processor 110 may be coupled together via a dedicated communication channel that is separate from, or part of, the network 115.
  • the video data storage 105 and the video processor 110 may share data via the network 115.
  • the video data storage 105 and the video processor 110 may be part of the same system or systems.
  • the video data storage 105 may include one or more remote or local data storage locations such as, for example, a cloud storage location, a remote storage location, etc.
  • the video data storage 105 may store video files recorded by one or more of camera 120, camera 121, and camera 122.
  • the video files may be stored in any video format such as, for example, mpeg, avi, etc.
  • video files from the cameras may be transferred to the video data storage 105 using any data transfer protocol such as, for example, HTTP live streaming (HLS) , real time streaming protocol (RTSP) , Real Time Messaging Protocol (RTMP) , HTTP Dynamic Streaming (HDS) , Smooth Streaming, Dynamic Streaming over HTTP, HTML5, Shoutcast, etc.
  • the video data storage 105 may store user identified event data reported by one or more individuals.
  • the user identified event data may be used, for example, to train the video processor 110 to capture feature events.
  • a video file may be recorded and stored in memory located at a user location prior to being transmitted to the video data storage 105. In some embodiments, a video file may be recorded by the camera and streamed directly to the video data storage 105.
  • the video processor 110 may include one or more local and/or remote servers that may be used to perform data processing on videos stored in the video data storage 105. In some embodiments, the video processor 110 may execute one or more algorithms on one or more video files stored in the video storage location. In some embodiments, the video processor 110 may execute a plurality of algorithms in parallel on a plurality of video files stored within the video data storage 105. In some embodiments, the video processor 110 may include a plurality of processors (or servers) that each execute one or more algorithms on one or more video files stored in video data storage 105. In some embodiments, the video processor 110 may include one or more of the components of computational system 500 shown in Fig. 5.
  • Figure 2 is a flowchart of an example process 200 for determining a similarity between two events recorded by one or more cameras according to some embodiments.
  • One or more steps of the process 200 may be implemented, in some embodiments, by one or more components of system 100 of Figure 1, such as video processor 110.
  • Although illustrated as discrete blocks, various blocks may be divided into additional blocks, combined into fewer blocks, or eliminated, depending on the desired implementation.
  • Process 200 begins at block 205 where a set of videos stored in a storage location such as, for example, video data storage 105, may be monitored by one or more processors such as, for example, video processor 110.
  • the set of videos may be stored in different local and/or remote storage locations.
  • the separate storage locations may include local storage locations and/or remote storage locations relative to the video processor 110 that can be accessed by the video processor 110 directly and/or through the network 115.
  • a video Vi may be stored in the storage location that was recorded by camera Ci, where i is a number between 1 and the total number of videos stored in the storage location and/or the number of cameras.
  • the video Vi may include a plurality of events Eij, where j represents the number of events within a given video Vi.
  • at block 210 it may be determined that an event Ei1 has occurred in a video Vi recorded by camera Ci.
  • low level events may be detected in block 210 such as, for example, motion detection events.
  • a feature detection algorithm may be used to determine that event Ei1 has occurred.
  • the feature description can be determined using a low level detection algorithm.
  • block 210 may occur in conjunction with block 215.
  • An event may include any number or type of occurrences captured by a video camera and stored in a video file.
  • An event may include, for example, a person moving through a scene, a car or an object moving through a scene, a particular face entering the scene, a face, a shadow, animals entering the scene, an automobile entering or leaving the scene, etc.
  • a feature description fi1 can be determined for the event Ei1.
  • the feature description fi1 may be determined using a feature detector algorithm such as, for example, SURF, SIFT, GLOH, HOG, Affine shape adaptation, Harris affine, Hessian affine, etc.
  • the feature description can be determined using a high level detection algorithm.
  • Various other feature detector algorithms may be used.
  • the feature description fi1 may be saved in the video storage location such as, for example, as metadata associated with the video Vi.
  • the location li1 of the feature fi1 may be determined.
  • the location li1 of the feature fi1 may be determined in the scene or in the camera frame.
  • the location li1 may be represented in pixels.
  • the feature fi1 may cover a number of pixels within a scene; the location li1 may then be determined from the center of the feature fi1.
  • the location li1 may be saved in the video storage location such as, for example, as metadata associated with the video Vi.
  • a time stamp ti1 of the feature may be determined.
  • the timestamp may be an absolute time relative to some standard time.
  • the time stamp ti1 may be saved in the video storage location such as, for example, as metadata associated with the video Vi.
  • a similarity measure may be determined for the event Ei1 relative to another event Emn.
  • the similarity measure may be determined from the following equation: δ(Ei1, Emn) = ω1·Σ(fi1 - fmn) + ω2·d(li1, lmn) + ω3·|ti1 - tmn|,
  • where ω1, ω2, and ω3 are constants that may be fixed for each camera, 1 ≤ m ≤ the number of videos, 1 ≤ n ≤ the number of features in video Vm, and d(li1, lmn) represents the Manhattan distance between the location li1 and location lmn.
  • the events may be considered to be correlated if the similarity between the events is less than a threshold: δ(Ei1, Emn) < T, where T is a threshold value.
  • the threshold value T may be, for example, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, etc. If the event Ei1 and the event Emn are correlated, the two events may capture the same occurrence from the two different cameras Ci and Cm.
  • the process 200 may be repeated for each video in the storage location or for specific videos in the storage location.
  • block 235 may determine the similarity between two events if the two videos or portions of the two videos are captured within a time window and/or if the two videos or portions of the two videos capture scenes that are within a specific geographic region relative to one another. For example, if an event E11 is found in a video file recorded by camera 120 and camera 121 is physically close to camera 120, and/or camera 121 recorded a video file that is temporally close to event E11, then process 200 may correlate event E11 with events recorded with the video file produced by camera 121.
  • FIG 3 is a flowchart of an example process 300 for summarizing video from a camera according to some embodiments.
  • One or more steps of the process 300 may be implemented, in some embodiments, by one or more components of system 100 of Figure 1, such as video processor 110.
  • Although illustrated as discrete blocks, various blocks may be divided into additional blocks, combined into fewer blocks, or eliminated, depending on the desired implementation.
  • Process 300 begins at block 305 where video such as, for example, a video file from a camera, is segmented into a plurality of clips.
  • one or more videos representing all the video recorded from a camera during a time period may be segmented into a plurality of clips of a short duration.
  • the time period may be forty-eight, thirty-six, twenty-four, twelve, six, one, etc. hours.
  • the short duration may be sixty, thirty, fifteen, ten, five, one, etc. seconds.
  • segmented video clips of the short duration may be created as individual files.
  • segmented video clips may be created virtually by identifying the beginning and end points of virtual video clips within the original video files.
  • a visual dictionary may be created from the plurality of video clips.
  • the visual dictionary may be built using an unsupervised method such as, for example, using k-means to cluster the segmented video clips.
  • Each class center of the segmented video clips may be labeled as a concept in the visual dictionary.
  • the visual dictionary may identify the visual features within the plurality of video clips.
  • the visual dictionary may include faces, unique faces, background features, objects, animals, movement trajectories, color schemes, shading, brightness, etc.
  • the visual dictionary may be created from the video clips or other video files of the scene recorded by the same camera.
  • the visual dictionary may identify features and/or occurrences that differ from the background.
  • the video clip may be converted into features using the visual dictionary.
  • the visual dictionary may represent a plurality of video concepts. These concepts may include, for example, features, actions, events, etc., such as, for example, walking, running, jumping, standing, automobile, color, animals, numbers, names, faces, etc.
  • the video clip may be compared with the visual dictionary, and a concept confidence score may be associated with the video clip based on the correlation or comparison of the video clip with concepts in the visual dictionary. These confidence scores may be associated with the video clip as descriptions and/or features of the video clip.
  • the video clips may be ranked based on the features. For example, video clips with more unique features may be ranked higher than video clips with fewer unique features. As another example, video clips showing the background scene or mostly the background scene may be ranked lower. In some embodiments, video clips may be sorted based on the number of features or unique features in each video clip.
  • a summarization video clip may be created from the highest ranked video clips.
  • the summarization video clip may have a specific length such as, for example, five, two, one, etc. minutes.
  • the summarization video clip may be created, for example, with portions of each video clip ranked above a certain threshold or including a specific number of features.
  • the summarization video clip may be created from a specific number of video clips. For example, for a one minute summarization video clip the top fifteen highest ranked video clips may be used. As another example, for a one minute summarization video clip the seven highest ranked video clips may be used.
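  • For illustration only, process 300 might be realized along the lines of the following sketch. It uses an average HSV color histogram as the per-clip feature, k-means cluster centers as a stand-in visual dictionary, and distance from the most common (background-like) cluster as the ranking rule; these choices, together with the clip length, output length, codec, and file paths, are assumptions rather than the patented method.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def clip_feature(frames):
    """Average normalized HSV color histogram over a clip (stand-in clip feature)."""
    hists = []
    for frame in frames:
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [16, 16], [0, 180, 0, 256])
        hists.append(cv2.normalize(hist, None).flatten())
    return np.mean(hists, axis=0)

def summarize(video_path, clip_seconds=10, n_concepts=8, top_k=6, out_path="summary.avi"):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    clip_len = int(fps * clip_seconds)

    # Segment the recording into fixed-length clips (block 305).
    clips, frames = [], []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
        if len(frames) == clip_len:
            clips.append(frames)
            frames = []
    cap.release()
    if not clips:
        return

    # Cluster clip features; the cluster centers act as a crude visual dictionary.
    feats = np.array([clip_feature(c) for c in clips])
    km = KMeans(n_clusters=min(n_concepts, len(clips)), n_init=10).fit(feats)

    # Rank clips by distance from the largest ("background-like") cluster center.
    background = np.bincount(km.labels_).argmax()
    scores = np.linalg.norm(feats - km.cluster_centers_[background], axis=1)
    top = np.argsort(scores)[::-1][:top_k]

    # Concatenate the top-ranked clips, kept in chronological order.
    height, width = clips[0][0].shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"MJPG"), fps, (width, height))
    for idx in sorted(top):
        for frame in clips[idx]:
            writer.write(frame)
    writer.release()
```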
  • Figure 4 is a flowchart of an example process 400 for predicting the future location of an event in a video frame and/or outputting a control value to a camera to track the event according to some embodiments.
  • One or more steps of the process 400 may be implemented, in some embodiments, by one or more components of system 100 of Figure 1, such as video processor 110, camera 120, camera 121, and/or camera 122.
  • various blocks may be divided into additional blocks, combined into fewer blocks, or eliminated, depending on the desired implementation.
  • Process 400 begins at block 405 where an event (or feature) is tracked through a specific number of frames.
  • An event can be determined, for example, in a manner similar to what is described in conjunction with block 210 or block 215 of process 200 shown in Figure 2.
  • the specific number of frames, for example, may include 2, 3, 4, 5, 6, 8, 10, 12, 15, 20, 25, or 30 frames.
  • the location of the event may be determined in at least some of the specific number of frames.
  • the location of the event can be determined, for example, in a manner similar to what is described in conjunction with block 220 of process 200 shown in Figure 2.
  • the location of the event, for example, may include a vertical position, a horizontal position, a width, and/or a height of the event within a frame in pixel dimensions.
  • the location of the event in a subsequent frame such as, for example, frame i+n, where n is any number greater than 1, may be estimated.
  • the location of the event in the subsequent frame may be determined based on the trajectory of the event in the specific number of frames. For example, the rate of change of the event through the previous specific number of frames can be estimated and then used to estimate the location of the event in the subsequent frame.
  • Various other techniques may be used to track events.
  • one or more shift values between the event location and the center of the frame in the subsequent frames can be determined based on the estimate found in block 415.
  • a camera instruction may be output that specifies a camera movement that centers the event in subsequent frames. In response, for example, the camera may be moved to center the event in the frame.
  • the estimate of the location of the event in a subsequent frame may be made for a frame corresponding to the delay. For example, if the delay is five frames, then the estimate may be made for an additional frame plus the delay frames: i+6 frames.
  • the detected region r_{i+n, j} can be compared with the estimated r_{i+n, j}. If r_{i+n, j} and the estimated r_{i+n, j} are found to be similar, then in future frames a region of interest defined by r_{i+n, j} can be determined, and subsequent feature detections can occur within this region of interest.
  • the region of interest may include a buffer of a few pixels surrounding r_{i+n, j}.
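  • A minimal sketch of this prediction step is shown below. It assumes the event's bounding-box centers over the last few frames are already available, uses a constant-velocity estimate for frame i+n, and computes the pixel shift needed to center the event; issuing the actual pan/tilt command is left out since it depends on the camera's control API.

```python
import numpy as np

def predict_location(recent_centers, n_ahead=1):
    """Estimate the event's (x, y) center n_ahead frames into the future.

    recent_centers: list of (x, y) pixel centers from the last few frames,
    oldest first. Uses the average frame-to-frame displacement
    (a constant-velocity assumption).
    """
    pts = np.asarray(recent_centers, dtype=float)
    velocity = np.mean(np.diff(pts, axis=0), axis=0)  # average per-frame motion
    return pts[-1] + n_ahead * velocity

def center_shift(predicted_center, frame_size):
    """Pixel shift needed to bring the predicted location to the frame center."""
    frame_center = np.array([frame_size[0] / 2.0, frame_size[1] / 2.0])
    return frame_center - np.asarray(predicted_center)

# Example: five observed centers, predicted six frames ahead (one frame plus a
# five-frame control delay, as in the i+6 example above), then the centering shift.
centers = [(100, 200), (110, 202), (121, 205), (130, 208), (141, 210)]
predicted = predict_location(centers, n_ahead=6)
print(predicted)
print(center_shift(predicted, frame_size=(1280, 720)))
```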
  • video processing may be sped up by decreasing the data size of a video.
  • a video may be converted into a second video by compressing the video, decreasing the resolution of the video, lowering the frame rate, or some combination of these.
  • a video with a 20 frame per second frame rate may be converted to a video with a 2 frame per second frame rate.
  • an uncompressed video may be compressed using any number of video compression techniques.
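  • As one possible realization, a video might be reduced before processing roughly as follows; the target frame rate, scale factor, codec, and file paths are placeholder choices.

```python
import cv2

def downscale_video(src_path, dst_path, out_fps=2.0, scale=0.5):
    """Write a lower-frame-rate, lower-resolution copy of a video."""
    cap = cv2.VideoCapture(src_path)
    src_fps = cap.get(cv2.CAP_PROP_FPS) or 20.0
    step = max(1, int(round(src_fps / out_fps)))  # keep every Nth frame

    writer = None
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            small = cv2.resize(frame, None, fx=scale, fy=scale)
            if writer is None:
                h, w = small.shape[:2]
                writer = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"MJPG"),
                                         out_fps, (w, h))
            writer.write(small)
        index += 1
    cap.release()
    if writer is not None:
        writer.release()
```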
  • the computational system 500 (or processing unit) illustrated in Figure 5 can be used to perform and/or control operation of any of the embodiments described herein.
  • the computational system 500 can be used alone or in conjunction with other components.
  • the computational system 500 can be used to perform any calculation, solve any equation, perform any identification, and/or make any determination described here.
  • the computational system 500 may include any or all of the hardware elements shown in the figure and described herein.
  • the computational system 500 may include hardware elements that can be electrically coupled via a bus 505 (or may otherwise be in communication, as appropriate) .
  • the hardware elements can include one or more processors 510, including, without limitation, one or more general-purpose processors and/or one or more special-purpose processors (such as digital signal processing chips, graphics acceleration chips, and/or the like) ; one or more input devices 515, which can include, without limitation, a mouse, a keyboard, and/or the like; and one or more output devices 520, which can include, without limitation, a display device, a printer, and/or the like.
  • the computational system 500 may further include (and/or be in communication with) one or more storage devices 525, which can include, without limitation, local and/or network-accessible storage and/or can include, without limitation, a disk drive, a drive array, an optical storage device, a solid-state storage device, such as random access memory ( “RAM” ) and/or read-only memory ( “ROM” ) , which can be programmable, flash-updateable, and/or the like.
  • the computational system 500 might also include a communications subsystem 530, which can include, without limitation, a modem, a network card (wireless or wired) , an infrared communication device, a wireless communication device, and/or chipset (such as a device, a 802.6 device, a Wi-Fi device, a WiMAX device, cellular communication facilities, etc. ) , and/or the like.
  • the communications subsystem 530 may permit data to be exchanged with a network (such as the network described below, to name one example) and/or any other devices described herein.
  • the computational system 500 will further include a working memory 535, which can include a RAM or ROM device, as described above.
  • the computational system 500 also can include software elements, shown as being currently located within the working memory 535, including an operating system 540 and/or other code, such as one or more application programs 545, which may include computer programs of the invention, and/or may be designed to implement methods of the invention and/or configure systems of the invention, as described herein.
  • one or more procedures described with respect to the method (s) discussed above might be implemented as code and/or instructions executable by a computer (and/or a processor within a computer) .
  • a set of these instructions and/or codes might be stored on a computer-readable storage medium, such as the storage device(s) 525 described above.
  • the storage medium might be incorporated within the computational system 500 or in communication with the computational system 500.
  • the storage medium might be separate from the computational system 500 (e.g., a removable medium, such as a compact disc, etc. ) , and/or provided in an installation package, such that the storage medium can be used to program a general-purpose computer with the instructions/code stored thereon.
  • These instructions might take the form of executable code, which is executable by the computational system 500 and/or might take the form of source and/or installable code, which, upon compilation and/or installation on the computational system 500 (e.g. , using any of a variety of generally available compilers, installation programs, compression/decompression utilities, etc. ) , then takes the form of executable code.
  • FIG. 6 is a flowchart of an example process 600 for processing videos according to some embodiments.
  • One or more steps of the process 600 may be implemented, in some embodiments, by one or more components of system 100 of Figure 1, such as video processor 110.
  • Although illustrated as discrete blocks, various blocks may be divided into additional blocks, combined into fewer blocks, or eliminated, depending on the desired implementation.
  • an algorithm such as, for example, the grabcut algorithm may be used to segment the region of interest.
  • the algorithm may estimate the color distribution of the target object identified within the bounding box and/or that of the background using a Gaussian mixture model.
  • the color distribution of the target object may then be used to construct a Markov random field over the pixel labels, with an energy function that prefers connected regions having the same label, and running a graph cut based optimization to infer their values. This process may be repeated a number of times until convergence. The result may provide a mask that blocks out the background.
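  • OpenCV ships a GrabCut implementation that can be invoked roughly as sketched below; the bounding box is a placeholder for a user- or detector-supplied region, and the iteration count is arbitrary.

```python
import cv2
import numpy as np

def foreground_mask(frame, box, iterations=5):
    """Run GrabCut inside a bounding box and return a 0/1 foreground mask.

    frame: BGR image; box: (x, y, width, height) around the target object.
    """
    mask = np.zeros(frame.shape[:2], np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)  # background GMM state
    fgd_model = np.zeros((1, 65), np.float64)  # foreground GMM state
    cv2.grabCut(frame, mask, box, bgd_model, fgd_model,
                iterations, cv2.GC_INIT_WITH_RECT)
    # Pixels labeled definite or probable foreground become 1, the rest 0.
    return np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype(np.uint8)
```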
  • human potential size may be determined by placing a graphical image of a human within the scene.
  • the graphical image of the human may be scaled by the user until the human is approximately the size of a human recorded within the scene. Based on the amount of scaling, a typical human size may be determined. This may be repeated multiple times throughout the scene.
  • a background model may be created from the first number of frames using the image mean and/or covariance.
  • the background model may be created, for example, using a Gaussian Mixture Model (GMM) .
  • the input for example, can be a single frame from the video.
  • the output for example, can be a motion probability and/or a motion mask.
  • the whole background model is an online updating model.
  • the model can keep updating the background parameters to handle changes of viewpoint or illumination such as, for example, as the time of day changes and/or as the camera moves, pans, and/or tilts.
  • the model parameters (for example, the component means and covariances) can be updated using a maximum likelihood algorithm such as, for example, an EM algorithm, when new frames are analyzed and/or added to the model.
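  • A comparable online-updating Gaussian mixture background model is available in OpenCV; the sketch below uses it as a stand-in for the model described above, taking single frames as input and producing a motion mask. The history length, variance threshold, and input path are arbitrary choices.

```python
import cv2

# History length and variance threshold are illustrative settings.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                detectShadows=True)

cap = cv2.VideoCapture("input.avi")  # placeholder path
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # The mixture parameters are updated online as each new frame is applied,
    # which helps absorb gradual illumination or viewpoint changes.
    motion_mask = subtractor.apply(frame)
    moving_pixels = cv2.countNonZero(motion_mask)
cap.release()
```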
  • object tracking through the video frame can be performed by a tracking module at block 615.
  • five frames (or any number of frames) can be input into the tracking module and a predicted candidate region may be output.
  • the object tracking module can be used to determine, estimate, and/or predict regions in a frame representing a human (a candidate region) .
  • a detection module may be used.
  • a frame may be input into the detection module with a region of interest defined.
  • a detection response may be output.
  • the frame may be converted into a grayscale image.
  • Associated Pairing Comparison Features may be extracted from the grayscale image.
  • the APCF may pair comparisons of color and gradient orientation in granular space.
  • the APCF feature may provide a chain of weak classifiers. Each classifier may provide a determination for the current extracted APCF feature such as, for example, the current region.
  • APCF features in the whole image can be extracted.
  • a sliding window can scan the whole frame.
  • for a detector M, a chain of weak classifiers will be used in each sliding window to determine whether the area contains a human.
  • each sliding window needs to pass every weak classifier in order for the area to be decided as a human.
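  • The APCF classifier chain itself is not a stock library component, so the sketch below substitutes OpenCV's default HOG pedestrian detector to show the same kind of sliding-window human detection restricted to a region of interest; the window stride, padding, and scale step are arbitrary settings.

```python
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_people(frame, roi):
    """Detect people inside roi = (x, y, w, h) and return boxes in frame coordinates."""
    x, y, w, h = roi
    gray = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
    # detectMultiScale slides a detection window over the region at several scales.
    boxes, weights = hog.detectMultiScale(gray, winStride=(8, 8),
                                          padding=(8, 8), scale=1.05)
    return [(bx + x, by + y, bw, bh) for (bx, by, bw, bh) in boxes]
```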
  • false alarm data may be input into a self-learning module.
  • the false alarm data may be collected from user input regarding various inputs, videos, and/or features.
  • a SURF feature (or any other feature detector) may be used to describe the false alarm samples.
  • the feature may be computed to match with the candidate SURF features; for example, the test image can be compared with the false alarm image. If the matched points are over a threshold value T, which indicates that the new detection result is very similar to a sample in the false alarm database, then candidate ai can be labeled as a false alarm.
  • the threshold value T can be applied to the ratio of matched points divided by the total points.
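  • One way this matching step could look is sketched below. Because SURF is only available in OpenCV's contrib build, ORB keypoints are used here as a stand-in descriptor, and the 0.5 match-ratio threshold is an arbitrary placeholder for T.

```python
import cv2

orb = cv2.ORB_create(nfeatures=500)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def _gray(img):
    """Ensure a single-channel image for keypoint extraction."""
    return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) if img.ndim == 3 else img

def is_false_alarm(candidate_img, false_alarm_img, ratio_threshold=0.5):
    """Label the candidate a false alarm if enough keypoints match a stored sample."""
    kp1, des1 = orb.detectAndCompute(_gray(candidate_img), None)
    kp2, des2 = orb.detectAndCompute(_gray(false_alarm_img), None)
    if des1 is None or des2 is None or len(kp1) == 0:
        return False
    matches = matcher.match(des1, des2)
    # Ratio of matched keypoints to the candidate's total keypoints.
    return len(matches) / float(len(kp1)) > ratio_threshold
```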
  • video processing may be spread among a plurality of servers located in the cloud or run as a cloud computing process. For example, different aspects, steps, or blocks of a video processing algorithm may occur on different servers. Alternatively or additionally, video processing for different videos may occur at different servers in the cloud.
  • each video frame of a video may include metadata.
  • the video may be processed for event and/or object detection. If an event or an object occurs within the video then metadata associated with the video may include details about the object or the event.
  • the metadata may be saved with the video or as a standalone file.
  • the metadata may include the time, the number of people in the scene, the height of one or more persons, the weight of one or more persons, the number of cars in the scene, the color of one or more cars in the scene, the license plate of one or more cars in the scene, the identity of one or more persons in the scene, facial recognition data for one or more persons in the scene, object identifiers for various objects in the scene, the color of objects in the scene, the type of objects within the scene, the number of objects in the scene, the video quality, the lighting quality, the trajectory of an object in the scene, etc.
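  • Such metadata could be carried in a simple sidecar file; the JSON layout and field values below are only one hypothetical encoding of the fields listed above.

```python
import json

event_metadata = {
    "video": "cam120_clip_0001.avi",       # placeholder file name
    "frame": 1042,
    "time": "2016-01-01T12:00:00Z",        # placeholder timestamp
    "people_count": 2,
    "person_heights_px": [310, 295],
    "cars": [{"color": "red", "license_plate": None}],
    "objects": [{"type": "dog", "color": "brown"}],
    "video_quality": "720p",
    "lighting": "daylight",
    "trajectory": [[104, 220], [131, 224], [160, 229]],
}

# Save the metadata as a standalone file alongside the video.
with open("cam120_clip_0001.json", "w") as fh:
    json.dump(event_metadata, fh, indent=2)
```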
  • feature detection may occur within a user specified region of interest.
  • a user may draw a bounding box in one or more frames of a scene recorded by a camera.
  • the bounding box may define the region of interest within which events may be identified and/or tracked.
  • a user may specify two or more points within a frame and a polygonal shaped bounding box may be created based on these points.
  • the user may have the option to specify whether the bounding box bounds the region of interest or should be excluded from the region of interest.
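  • A user-drawn region of this kind can be turned into a pixel mask before detection, for example as follows; the polygon vertices and frame size are placeholders for user input.

```python
import cv2
import numpy as np

def roi_mask(frame_shape, points, include=True):
    """Build a mask from user-specified polygon points.

    frame_shape: (height, width) of the frame.
    points: list of (x, y) vertices of the user's bounding polygon.
    include: True if detection should happen inside the polygon,
             False if the polygon should be excluded from the region of interest.
    """
    mask = np.zeros(frame_shape[:2], dtype=np.uint8)
    cv2.fillPoly(mask, [np.array(points, dtype=np.int32)], 255)
    return mask if include else cv2.bitwise_not(mask)

# Example: only analyze pixels inside a user-drawn quadrilateral.
mask = roi_mask((720, 1280), [(100, 100), (600, 120), (620, 500), (90, 480)])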
  • FIG. 7 is a flowchart of an example process 700 for event filtering according to some embodiments.
  • One or more steps of the process 700 may be implemented, in some embodiments, by one or more components of system 100 of Figure 1, such as video processor 110. Although illustrated as discrete blocks, various blocks may be divided into additional blocks, combined into fewer blocks, or eliminated, depending on the desired implementation.
  • the process 700 begins at block 705.
  • one or more videos may be monitored.
  • the videos may be monitored by a computer system such as, for example, video processor 110.
  • the videos may be monitored using one or more processes distributed across the Internet.
  • the one or more videos may include a video stream from a video camera or a video file stored in memory.
  • the one or more videos may have any file type.
  • an event can be detected to have occurred in the one or more videos.
  • the event may include a person moving through a scene, a car or an object moving through a scene, one or more faces being detected, a particular face leaving or entering the scene, a face, a shadow, animals entering the scene, an automobile entering or leaving the scene, etc.
  • the event may be detected using any number of algorithms such as, for example, SURF, SIFT, GLOH, HOG, Affine shape adaptation, Harris affine, Hessian affine, etc.
  • the event may be detected using a high level detection algorithm.
  • an event description may be created that includes various event data.
  • the data may include data about the scene and/or data about objects in the scene such as, for example, object colors, object speed, object velocity, object vectors, object trajectories, object positions, object types, object characteristics, etc.
  • a detected object may be a person.
  • the event data may include data about the person such as, for example, the hair color, height, name, facial features, etc.
  • the event data may include the time the event starts and the time the event stops. This data may be saved as metadata with the video.
  • a new video clip may be created that includes the event.
  • the new video clip may include video from the start of the event to the end of the event.
  • background and/or foreground filtering within the video may occur at some time during the execution of process 700.
  • if an event has been detected, then process 700 proceeds to block 715. If an event has not been detected, then process 700 returns to block 705.
  • a false alarm event may be an event that has event data similar to event data in the false alarm database.
  • the events data in the false alarm database may include data created using machine learning based on user input and/or other input. For example, the event data found in block 710 may be compared with data in the false alarm database.
  • if the event is determined to be a false alarm, then process 700 may return to block 705; if the event is not a false alarm, then process 700 proceeds to block 725.
  • a user may be notified.
  • the user may be notified using an electronic message such as, for example, a text message, an SMS message, a push notification, an alarm, a phone call, etc.
  • a push notification may be sent to a smart device (e.g., a smart phone, a tablet, a phablet, etc. ) .
  • an app executing on the smart device may notify the user that an event has occurred.
  • the notification may include event data describing the type of event.
  • the notification may also indicate the location where the event occurred or the camera that recorded the event.
  • the user may be provided with an interface to indicate that the event was a false alarm.
  • an app executing on the user's smart device may present the user with the option to indicate that the event is a false alarm.
  • the app may present a video clip that includes the event to the user along with a button that would allow the user to indicate that the event is a false alarm. If a user indication has not been received, then process 700 returns to block 705. If a user indication has been received, then process 700 proceeds to block 735.
  • the event data and/or the video clip including the event may be used to update the false alarm database and process 700 may then return to block 705.
  • machine learning techniques may be used to update the false alarm database.
  • machine learning techniques may be used in conjunction with the event data and/or the video clip to update the false alarm database.
  • machine learning (or self-learning) algorithms may be used to add new false alarms to the database and/or eliminate redundant false alarms. Redundant false alarms, for example, may include false alarms associated with the same face, the same facial features, the same body size, the same color of a car, etc.
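  • Putting these pieces together, the false alarm database might behave like the small in-memory store sketched below, where feature vectors of user-confirmed false alarms are added and near-duplicates are skipped; a production system would persist the data and use a learned model rather than this simple distance rule, so the class, method names, and thresholds here are assumptions.

```python
import numpy as np

class FalseAlarmDB:
    """Toy false alarm store keyed on event feature vectors."""

    def __init__(self, match_dist=0.5, dedup_dist=0.1):
        self.samples = []            # list of 1-D feature vectors
        self.match_dist = match_dist
        self.dedup_dist = dedup_dist

    def is_false_alarm(self, features):
        """An event is filtered if it is close to any stored false alarm sample."""
        features = np.asarray(features, dtype=float)
        return any(np.linalg.norm(features - s) < self.match_dist for s in self.samples)

    def add_user_feedback(self, features):
        """Called when the user marks a notified event as a false alarm (block 735)."""
        features = np.asarray(features, dtype=float)
        if not any(np.linalg.norm(features - s) < self.dedup_dist for s in self.samples):
            self.samples.append(features)

def handle_event(db, features, notify):
    """Check the event against the database; notify only if it is not a known false alarm."""
    if db.is_false_alarm(features):
        return False
    notify(features)
    return True
```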
  • Process 700 may be used to filter any number of false alarms from any number of videos.
  • the one or more videos being monitored at block 705 may be a video stream of a doorstep scene (or any other location).
  • An event may be detected at block 710 when a person enters the scene.
  • Event data may include data that indicates the position of the person in the scene, the size of the person, facial data, the time the event occurs, etc.
  • the event data may include whether the face is recognized and/or the identity of the face.
  • process 700 moves to block 730 and an indication can be sent to the user, for example, through an app executing on their smartphone.
  • the user can then visually determine whether the face is known by manually indicating as much through the user interface of the smartphone.
  • the facial data may then be used to train the false alarm database.
  • process 700 may determine whether a car of specific make, model, color, and/or with certain license plates is a known car that has entered a scene and depending on the data in the false alarm database the user may be notified.
  • process 700 may determine whether an animal has entered a scene and depending on the data in the false alarm database the user may be notified.
  • process 700 may determine whether a person has entered the scene between specific hours.
  • process 700 may determine whether a certain number of people are found within a scene.
  • video processing such as, for example, process 700, may be sped up by decreasing the data size of the video being processed.
  • a video may be converted into a second video by compressing the video, decreasing the resolution of the video, lowering the frame rate, or some combination of these.
  • a video with a 70 frame per second frame rate may be converted to a video with a 7 frame per second frame rate.
  • an uncompressed video may be compressed using any number of video compression techniques.
  • the user may indicate that the event is an important event that they would like to receive notifications about. For example, if the video shows a strange individual milling about the user's home during late hours, the user may indicate that they would like to be notified about such an event. This information may be used by the machine learning algorithm to ensure that such an event is not considered a false alarm and/or that the user is notified about the occurrence of such an event or a similar event in the future.
  • video processing may be spread among a plurality of servers located in the cloud or run as a cloud computing process. For example, different aspects, steps, or blocks of a video processing algorithm may occur on different servers. Alternatively or additionally, video processing for different videos may occur at different servers in the cloud.
  • each video frame of a video may include metadata.
  • the video may be processed for event and/or object detection. If an event or an object occurs within the video then metadata associated with the video may include details about the object or the event.
  • the metadata may be saved with the video or as a standalone file.
  • a computing device can include any suitable arrangement of components that provides a result conditioned on one or more inputs.
  • Suitable computing devices include multipurpose microprocessor-based computer systems accessing stored software that programs or configures the computing system from a general-purpose computing apparatus to a specialized computing apparatus implementing one or more embodiments of the present subject matter. Any suitable programming, scripting, or other type of language or combinations of languages may be used to implement the teachings contained herein in software to be used in programming or configuring a computing device.
  • Embodiments of the methods disclosed herein may be performed in the operation of such computing devices.
  • the order of the blocks presented in the examples above can be varied—for example, blocks can be re-ordered, combined, and/or broken into sub-blocks. Certain blocks or processes can be performed in parallel.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The current invention discloses a method comprising: determining a first feature description of an event in a first video; determining a first location of the event in the first video; determining a first time stamp of the first video; determining a second feature description of an event in a second video; determining a second location of the event in the second video; determining a second time stamp of the second video; and determining a correlation value based on the first location, the second location, the first time stamp, the second time stamp, the first feature description, and the second feature description.

Description

CLOUD PLATFORM WITH MULTI CAMERA SYNCHRONIZATION
TECHNICAL FIELD
The invention is related to a video processing system.
SUMMARY OF INVENTION
The present invention relates to a system described in the disclosure and/or shown in the drawings.
The present invention relates to a method as described in the disclosure and/or shown in the drawings.
The present invention relates to a method comprising: determining a first feature description of an event in a first video; determining a first location of the event in the first video; determining a first time stamp of the first video; determining a second feature description of an event in a second video; determining a second location of the event in the second video; determining a second time stamp of the second video; and determining a correlation value based on the first location, the second location, the first time stamp, the second time stamp, the first feature description, and the second feature description.
The present invention relates to a video processing system comprising: one or more video storage locations storing a plurality of videos recorded from a plurality of video cameras; one or more video processors communicatively coupled with the one or more video storage locations, the one or more video processors configured to: determine a first feature description of an event in a first video; determine a first location of the event in the first video; determine a first time stamp of the first video; determine a second feature description of an event in a second video; determine a second location of the event in the second video; determine a second time stamp of the second video; and determine a correlation value based on the first location, the second location, the first time stamp, the second time stamp, the first feature description, and the second feature description.
The present invention relates to a method comprising: segmenting a video spanning a first duration of a scene into a plurality of video clips having a specific time period; creating a visual dictionary of the scene from the plurality of video clips; determining features within the video clips; ranking the video clips based on the features; and creating a summarization video from the highest ranked video clips having a second duration that is significantly shorter than the first duration.
The present invention relates to a method comprising: track an event of interest in a specific number of frames within a video; determine the location of the event within each of the specific number of frames; and estimate a location of the event in a subsequent frame within the video based on the location of the event within each of the specific number of frames.
The present invention relates to a method comprising: determining that an event has occurred in a video; determining event features; determining whether the event is a false alarm event based on a comparison of the event features with event features in a false alarm database; and notifying a user about the event.
The present invention relates to a method comprising: determining that an event has occurred in a video; determining event features; determining whether the event is a false alarm event based on a comparison of the event features with event features in a false alarm database; notifying a user about the event; receiving an indication from the user that the event is a false alarm; and updating the false alarm database with the event features using a machine learning algorithm.
The present invention relates to a method for video processing comprising: converting the video to a second video that has at least one of a lower resolution and a lower frame rate; and processing the video.
The present invention relates to a method comprising: determining features and/or events in a video; determining foreground regions of at least a portion of the video that includes the features and/or events; determining a region of interest within one or more frames of the at least a portion of the video; and detecting the presence of a human within the region of interest within the at least a portion of the video.
The present invention relates to a method comprising: determining features and/or events in a video; determining foreground regions of at least a portion of the video that includes the features and/or events; determining a region of interest within one or more frames of the at least a portion of the video; detecting the presence of a human within the region of interest within the at least a portion of the video; and determining whether the features and/or events represent a false alarm.
BRIEF DESCRIPTION OF ACCOMPANYING DRAWING
These and other features, aspects, and advantages of the present disclosure are better understood when the following Detailed Description is read with reference to the accompanying drawings.
Figure 1 illustrates a block diagram of a system 100 for multi-camera synchronization.
Figure 2 is a flowchart of an example process for determining a similarity between two events recorded by one or more cameras according to some embodiments.
Figure 3 is a flowchart of an example process for summarizing video from a camera according to some embodiments.
Figure 4 is a flowchart of an example process for predicting the future location of an event in a video frame and/or outputting a control value to a camera to track the event according to some embodiments.
Figure 5 shows an illustrative computational system for performing functionality to facilitate implementation of embodiments described herein.
Figure 6 is a flowchart of an example process for processing videos according to some embodiments.
Figure 7 is a flowchart of an example process for event filtering according to some embodiments.
DESCRIPTION OF PREFERRED EMBODIMENT
Systems and methods are disclosed for multi-camera synchronization of events. Systems and methods are also disclosed for creating summarization videos. Systems and methods are also disclosed for predicting the future location of events within a video frame.
Figure 1 illustrates a block diagram of a system 100 that may be used in various embodiments. The system 100 may include a plurality of cameras: camera 120, camera 121, and camera 122. While three cameras are shown, any number of cameras may be included. These cameras may include any type of video camera such as, for example, a wireless video camera, a black and white video camera, surveillance video camera, portable cameras, battery powered cameras, CCTV cameras, Wi-Fi enabled cameras, smartphones, smart devices, tablets, computers, GoPro cameras, wearable cameras, etc. The cameras may be positioned anywhere such as, for example, within the same geographic location, in separate geographic locations, positioned to record portions of the same scene, positioned to record different portions of the same scene, etc. In some embodiments, the cameras may be owned and/or operated by different users, organizations, companies, entities, etc.
The cameras may be coupled with the network 115. The network 115 may, for example, include the Internet, a telephonic network, a wireless telephone network, a 3G network, etc. In some embodiments, the network may include multiple networks, connections, servers, switches, routers, connections, etc. that may enable the transfer of data. In some embodiments, the network 115 may be or may include the Internet. In some embodiments, the network may include one or more LAN, WAN, WLAN, MAN, SAN, PAN, EPN, and/or VPN.
In some embodiments, one or more of the cameras may be coupled with a base station, digital video recorder, or a controller that is then coupled with the network 115.
The system 100 may also include video data storage 105 and/or a video processor 110. In some embodiments, the video data storage 105 and the video processor 110 may be coupled together via a dedicated communication channel that is separate from, or part of, the network 115. In some embodiments, the video data storage 105 and the video processor 110 may share data via the network 115. In some embodiments, the video data storage 105 and the video processor 110 may be part of the same system or systems.
In some embodiments, the video data storage 105 may include one or more remote or local data storage locations such as, for example, a cloud storage location, a remote storage location, etc.
In some embodiments, the video data storage 105 may store video files recorded by one or more of camera 120, camera 121, and camera 122. In some embodiments, the video files may be stored in any video format such as, for example, mpeg, avi, etc. In some embodiments, video files from the cameras may be transferred to the video data storage 105 using any data transfer protocol such as, for example, HTTP live streaming (HLS) , real time streaming protocol (RTSP) , Real Time Messaging Protocol (RTMP) , HTTP Dynamic Streaming (HDS) , Smooth Streaming, Dynamic Streaming over HTTP, HTML5, Shoutcast, etc.
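As one possible way of ingesting such a stream, the sketch below pulls frames from a camera over RTSP with OpenCV; the stream URL is a placeholder, and the step that forwards frames to the video data storage 105 is left abstract.

```python
import cv2

# Placeholder RTSP URL; a real camera would supply its own address and credentials.
STREAM_URL = "rtsp://camera.example.local/stream1"

cap = cv2.VideoCapture(STREAM_URL)
try:
    while True:
        ok, frame = cap.read()
        if not ok:
            break  # stream ended or dropped
        # Here the frame could be appended to a file or uploaded to video data storage 105.
finally:
    cap.release()
```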
In some embodiments, the video data storage 105 may store user identified event data reported by one or more individuals. The user identified event data may be used, for example, to train the video processor 110 to capture feature events.
In some embodiments, a video file may be recorded and stored in memory located at a user location prior to being transmitted to the video data storage 105. In some embodiments, a video file may be recorded by the camera and streamed directly to the video data storage 105.
In some embodiments, the video processor 110 may include one or more local and/or remote servers that may be used to perform data processing on videos stored in the video data storage 105. In some embodiments, the video processor 110 may execute one or more algorithms on one or more video files stored in the video storage location. In some embodiments, the video processor 110 may execute a plurality of algorithms in parallel on a plurality of video files stored within the video data storage 105. In some embodiments, the video processor 110 may include a plurality of processors (or servers) that each execute one or more algorithms on one or more video files stored in video data storage 105. In some embodiments, the video processor 110 may include one or more of the components of computational system 500 shown in Fig. 5.
Figure 2 is a flowchart of an example process 200 for determining a similarity between two events recorded by one or more cameras according to some embodiments. One or more steps of the process 200 may be implemented, in some embodiments, by one or more components of system 100 of Figure 1, such as video processor 110. Although illustrated as discrete blocks, various blocks may be divided into additional blocks, combined into fewer blocks, or eliminated, depending on the desired implementation.
Process 200 begins at block 205 where a set of videos stored in a storage location such as, for example, video data storage 105, may be monitored by one or more processors such as, for example, video processor 110. In some embodiments, the set of videos may be stored in different local and/or remote storage locations. The separate storage locations, for example, may include local storage locations and/or remote storage locations relative to the video processor 110 that can be accessed by the video processor 110 directly and/or through the network 115.
A video Vi recorded by camera Ci may be stored in the storage location, where i is a number between 1 and the total number of videos stored in the storage location and/or the number of cameras. The video Vi may include a plurality of events Eij, where j indexes the events within a given video Vi.
At block 210 it may be determined that an event Ei1 has occurred in a video Vi recorded by camera Ci. In some embodiments, low level events may be detected in block 210 such as, for example, motion detection events. In some embodiments, a feature detection algorithm may be used to determine that event Ei1 has occurred. In some embodiments, the feature description can be determined using a low level detection algorithm. In some embodiments, block 210 may occur in conjunction with block 215.
An event may include any number or type of occurrences captured by a video camera and stored in a video file. An event may include, for example, a person moving through a scene, a car or an object moving through a scene, a particular face entering the scene, a face, a shadow, animals entering the scene, an automobile entering or leaving the scene, etc.
If an event has not been found as determined by block 210, then the process 200 returns to block 205. If an event has been found, then the process 200 proceeds to block 215.
At block 215 a feature description fi1 can be determined for the event Ei1. The feature description fi1 may be determined using a feature detector algorithm such as, for example, SURF, SIFT, GLOH, HOG, Affine shape adaptation, Harris affine, Hessian affine, etc. In some embodiments, the feature description can be determined using a high level detection algorithm. Various other feature detector algorithms may be used. In some embodiments, the feature description fi1 may be saved in the video storage location such as, for example, as metadata associated with the video Vi.
At block 220 the location li1 of the feature fi1 may be determined. The location li1 of the feature fi1 may be determined in the scene or in the camera frame. The location li1, for example, may be represented in pixels. In some embodiments, the feature fi1 may cover a number of pixels within a scene; the location li1 may then be determined from the center of the feature fi1. In some embodiments, the location li1 may be saved in the video storage location such as, for example, as metadata associated with the video Vi.
At block 225 a time stamp ti1 of the feature may be determined. In some embodiments, the time stamp may be an absolute time relative to some standard time. In some embodiments, the time stamp ti1 may be saved in the video storage location such as, for example, as metadata associated with the video Vi.
At block 230 a similarity measure may be determined for the event Ei1 relative to another event Emn. In some embodiments, the similarity measure may be determined from the following equation:
δ(Ei1, Emn) = ω1·Σ(fi1 − fmn) + ω2·d(li1, lmn) + ω3·|ti1 − tmn|,
where ω1, ω2, and ω3 are constants that may be fixed for each camera, 1 ≤ m ≤ the number of videos, 1 ≤ n ≤ the number of features in video Vm, and d(li1, lmn) represents the Manhattan distance between the location li1 and the location lmn. ω1, for example, may be equal to the inverse of n: ω1 = 1/n. ω2, for example, may be a value between 0 and 10. In a specific example, ω2 = 1. ω3, for example, may be a value between 0 and 10. In a specific example, ω3 = 1.
At block 235 it can be determined whether the event Ei1 and the event Emn are related. For example, the events may be considered to be correlated if the similarity between the events is less than a threshold: δ(Ei1, Emn) < T, where T is a threshold value. For example, the threshold value T = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, etc. If the event Ei1 and the event Emn are correlated, the two events may capture the same occurrence from the two different cameras Ci and Cm.
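The following sketch illustrates one possible implementation of the similarity measure and the threshold test of blocks 230 and 235, assuming the feature descriptions are fixed-length numeric vectors and the locations are (x, y) pixel coordinates; the weight and threshold values are only examples.

```python
# Sketch of the similarity measure from block 230. The weights w1, w2, w3 and
# the threshold T are the per-camera constants described above.
import numpy as np

def similarity(f1, f2, l1, l2, t1, t2, w1=1.0, w2=1.0, w3=1.0):
    feature_term = np.sum(np.asarray(f1) - np.asarray(f2))   # sum of feature differences
    location_term = abs(l1[0] - l2[0]) + abs(l1[1] - l2[1])   # Manhattan distance d(l1, l2)
    time_term = abs(t1 - t2)                                  # |t1 - t2|
    return w1 * feature_term + w2 * location_term + w3 * time_term

def are_correlated(f1, f2, l1, l2, t1, t2, T=0.5, **weights):
    # Events are treated as capturing the same occurrence when the similarity
    # measure falls below the threshold T.
    return similarity(f1, f2, l1, l2, t1, t2, **weights) < T
```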
The process 200 may be repeated for each video in the storage location or for specific videos in the storage location.
In some embodiments a greedy method may be used to find events to be correlated. In some embodiments, block 235 may determine the similarity between two events if the two videos or portions of the two videos are captured within a time window and/or if the two videos or portions of the two videos capture scenes that are within a specific geographic region relative to one another. For example, if an event E11 is found in a video file recorded by camera 120, and camera 121 is physically close to camera 120 and/or camera 121 recorded a video file that is temporally close to event E11, then process 200 may correlate event E11 with events recorded in the video file produced by camera 121.
Figure 3 is a flowchart of an example process 300 for summarizing video from a camera according to some embodiments. One or more steps of the process 300 may be implemented, in some embodiments, by one or more components of system 100 of Figure 1, such as video processor 110. Although illustrated as discrete blocks, various blocks may be divided into additional blocks, combined into fewer blocks, or eliminated, depending on the desired implementation.
Process 300 begins at block 305 where video such as, for example, a video file from a camera, is segmented into a plurality of clips. For example, one or more videos representing all the video recorded from a camera during a time period may be segmented into a plurality of clips of a short duration. In some embodiments, the time period may be forty-eight, thirty-six, twenty-four, twelve, six, one, etc. hours. In some embodiments, the short duration may be sixty, thirty, fifteen, ten, five, one, etc. seconds. In some embodiments, segmented video clips of the short duration may be created as individual files. In some embodiments, segmented video clips may be created virtually by identifying the beginning and end points of virtual video clips within the original video files.
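A minimal sketch of block 305, assuming OpenCV is available and the video is a local file; it produces virtual clips as (start frame, end frame) index pairs rather than separate files, and the 30-second clip length is only an example.

```python
# Sketch of block 305: split a recording into short "virtual" clips by
# computing start/end frame indices rather than writing new files.
import cv2

def virtual_clips(video_path, clip_seconds=30):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0          # fall back if fps is unknown
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    cap.release()
    frames_per_clip = int(fps * clip_seconds)
    return [(start, min(start + frames_per_clip, total_frames))
            for start in range(0, total_frames, frames_per_clip)]
```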
At block 310 a visual dictionary may be created from the plurality of video clips. In some embodiments, the visual dictionary may be built using an unsupervised method such as, for example, k-means clustering of the segmented video clips. Each cluster center of the segmented video clips may be labeled as a concept in the visual dictionary.
The visual dictionary may identify the visual features within the plurality of video clips. The visual dictionary may include faces, unique faces, background features, objects, animals, movement trajectories, color schemes, shading, brightness, etc. In some embodiments, the visual dictionary may be created from the video clips or other video files scene recorded by the same camera. In some embodiments, the visual dictionary may identify features and/or occurrences that differ from the background.
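One way blocks 310 and 315 might be sketched, assuming each clip has already been reduced to a fixed-length descriptor vector (the disclosure does not prescribe a particular descriptor) and using scikit-learn's k-means; the confidence scoring shown is one of many possible choices.

```python
# Sketch of block 310: cluster per-clip descriptor vectors with k-means and
# treat each cluster center as a "concept" in the visual dictionary.
import numpy as np
from sklearn.cluster import KMeans

def build_visual_dictionary(clip_descriptors, n_concepts=50):
    X = np.asarray(clip_descriptors, dtype=np.float32)
    kmeans = KMeans(n_clusters=n_concepts, n_init=10, random_state=0).fit(X)
    return kmeans.cluster_centers_  # one row per concept

def concept_confidences(descriptor, dictionary):
    # One possible confidence score per concept (block 315): inverse distance
    # to each cluster center, normalized so the scores sum to one.
    d = np.linalg.norm(dictionary - np.asarray(descriptor, dtype=np.float32), axis=1)
    conf = 1.0 / (d + 1e-6)
    return conf / conf.sum()
```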
At block 315 the video clip may be converted into features using the visual dictionary. The visual dictionary, for example, may represent a plurality of video concepts. These concepts may include, for example, features, actions, events, etc., such as, for example, walking, running, jumping, standing, automobile, color, animals, numbers, names, faces, etc. The video clip may be compared with the visual dictionary and a concept confidence score may be associated with the video clip based on the correlation or comparison of the video clip with concepts in the visual dictionary. These confidence scores may be associated with the video clip as descriptions and/or features of the video clip.
At block 320 the video clips may be ranked based on the features. For example, video clips with more unique features may be ranked higher than video clips with fewer unique features. As another example, video clips showing the background scene or mostly the background scene may be ranked lower. In some embodiments, video clips may be sorted based on the number of features or unique features in each video clip.
At block 325 a summarization video clip may be created from the highest ranked video clips. In some embodiments, the summarization video clip may have a specific length such as, for example, five, two, one, etc. minutes. The summarization video clip may be created, for example, with portions of each video clip ranked above a certain threshold or including a specific number of features. As another example, the summarization video clip may be created from a specific number of video clips. For example, for a one minute summarization video clip the fifteen highest ranked video clips may be used. As another example, for a one minute summarization video clip the seven highest ranked video clips may be used.
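A sketch of blocks 320 and 325, assuming the clips are (start frame, end frame) ranges within a single source video and that each clip already has a ranking score; the codec and the top-fifteen cutoff are example choices.

```python
# Sketch of blocks 320-325: take the highest ranked clips and concatenate
# them into a single summarization video.
import cv2

def write_summary(video_path, ranked_clips, out_path="summary.mp4", top_k=15):
    # ranked_clips: list of ((start_frame, end_frame), score), highest score first
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for (start, end), _score in ranked_clips[:top_k]:
        cap.set(cv2.CAP_PROP_POS_FRAMES, start)     # jump to the clip start
        for _ in range(end - start):
            ok, frame = cap.read()
            if not ok:
                break
            writer.write(frame)
    cap.release()
    writer.release()
```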
Figure 4 is a flowchart of an example process 400 for predicting the future location of an event in video frame and/or outputting a control value to a camera to track the event according to some embodiments. One or more steps of the process 400 may be implemented, in some embodiments, by one or more components of system 100 of Figure 1, such as video processor 110, camera 120, camera 121, and/or camera 122. Although illustrated as discrete blocks, various blocks may be divided into additional blocks, combined into fewer blocks, or eliminated, depending on the desired implementation.
Process 400 begins at block 405 where an event (or feature) is tracked through a specific number of frames. An event can be determined, for example, in a manner similar to what is described in conjunction with block 210 or block 215 of process 200 shown in Figure 2. The specific number of frames, for example, may include 2, 3, 4, 5, 6, 8, 10, 12, 15, 20, 25, or 30 frames.
At block 410 the location of the event may be determined in at least some of the specific number of frames. The location of the event can be determined, for example, in a manner similar to what is described in conjunction with block 220 of process 200 shown in Figure 2. The location of the event, for example, may include a vertical position, a horizontal position, a width, and/or a height of the event within a frame in pixel dimensions. The location may be written as ri,j = (x, y, w, h), where i represents the frame, j represents the camera, x represents the horizontal location of the event, y represents the vertical location of the event, w represents the width of the event, and h represents the height of the event.
At block 415 the location of the event in a subsequent frame such as, for example, frame i+n, where n is any number greater than 1, may be estimated. The location of the event in the subsequent frame may be determined based on the trajectory of the event in the specific number of frames. For example, the rate of change of the event through the previous specific number of frames can be estimated and then used to estimate the location of the event in the subsequent frame. Various other techniques may be used to track events.
At block 420 one or more shift values between the event location and the center of the frame in the subsequent frames can be determined based on the estimate found in block 415. At block 425 a camera instruction may be output that specifies a camera movement that centers the event in subsequent frames. In response, for example, the camera may be moved to center the event in the frame.
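The following sketch illustrates blocks 415 through 425 under the assumption of a simple linear (constant rate of change) model of the track; translating the resulting pixel shift into an actual pan/tilt instruction is camera-specific and is not shown.

```python
# Sketch of blocks 415-425: linearly extrapolate an event's (x, y, w, h)
# track n frames ahead and compute the pixel shift needed to center it.
import numpy as np

def predict_location(track, n=1):
    # track: list of (x, y, w, h) for the most recent frames, oldest first;
    # at least two entries are needed to estimate a rate of change.
    track = np.asarray(track, dtype=np.float32)
    rate = (track[-1] - track[0]) / (len(track) - 1)   # average per-frame change
    return track[-1] + n * rate                        # predicted (x, y, w, h)

def shift_to_center(predicted, frame_width, frame_height):
    x, y, w, h = predicted
    event_center = np.array([x + w / 2.0, y + h / 2.0])
    frame_center = np.array([frame_width / 2.0, frame_height / 2.0])
    return frame_center - event_center  # (dx, dy) shift in pixels
```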
In some embodiments, there may be a delay between estimating the event in a subsequent frame and the response of the camera to a movement instruction. In such embodiments, the estimate of the location of the event in a subsequent frame may be made for a frame corresponding to the delay. For example, if the delay is five frames, then the estimate may be made for an additional frame plus the delay frames: i+6 frames.
In some embodiments, ri+n,j can be compared with the estimated ri+n,j. If ri+n,j and the estimated ri+n,j are found to be similar, then in future frames a region of interest defined by ri+n,j can be determined and subsequent feature detections can occur within this region of interest. In some embodiments, the region of interest may include a buffer of a few pixels surrounding ri+n,j.
In some embodiments, video processing may be sped up by decreasing the data size of a video. For example, a video may be converted into a second video by compressing the video, decreasing the resolution of the video, lowering the frame rate, or some combination of these. For example, a video with a 20 frame per second frame rate may be converted to a video with a 2 frame per second frame rate. As another example, an uncompressed video may be compressed using any number of video compression techniques.
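A sketch of this data-size reduction, assuming OpenCV; keeping every tenth frame approximates the 20 fps to 2 fps example above, and the half-size resolution is only illustrative.

```python
# Sketch: re-encode a video at a lower frame rate and resolution before analysis.
import cv2

def downsample(video_path, out_path, keep_every=10, scale=0.5):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 20.0
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH) * scale)
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT) * scale)
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps / keep_every, (w, h))
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % keep_every == 0:          # keep every Nth frame only
            writer.write(cv2.resize(frame, (w, h)))
        index += 1
    cap.release()
    writer.release()
```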
The computational system 500 (or processing unit) illustrated in Figure 5 can be used to perform and/or control operation of any of the embodiments described herein. For example, the computational system 500 can be used alone or in conjunction with other components. As another example, the computational system 500 can be used to perform any calculation, solve any equation, perform any identification, and/or make any determination described here.
The computational system 500 may include any or all of the hardware elements shown in the figure and described herein. The computational system 500 may include hardware elements that can be electrically coupled via a bus 505 (or may otherwise be in communication, as appropriate) . The hardware elements can include one or more processors 510, including, without limitation, one or more general-purpose processors and/or one or more special-purpose processors (such as digital signal processing chips, graphics acceleration chips, and/or the like) ; one or more input devices 515, which can include, without limitation, a mouse, a keyboard, and/or the like; and one or more output devices 520, which can include, without limitation, a display device, a printer, and/or the like.
The computational system 500 may further include (and/or be in communication with) one or more storage devices 525, which can include, without limitation, local and/or network-accessible storage and/or can include, without limitation, a disk drive, a drive array, an optical storage device, a solid-state storage device, such as random access memory ("RAM") and/or read-only memory ("ROM"), which can be programmable, flash-updateable, and/or the like. The computational system 500 might also include a communications subsystem 530, which can include, without limitation, a modem, a network card (wireless or wired), an infrared communication device, a wireless communication device, and/or chipset (such as a Bluetooth™ device, an 802.6 device, a Wi-Fi device, a WiMAX device, cellular communication facilities, etc.), and/or the like. The communications subsystem 530 may permit data to be exchanged with a network (such as the network described below, to name one example) and/or any other devices described herein. In many embodiments, the computational system 500 will further include a working memory 535, which can include a RAM or ROM device, as described above.
The computational system 500 also can include software elements, shown as being currently located within the working memory 535, including an operating system 540  and/or other code, such as one or more application programs 545, which may include computer programs of the invention, and/or may be designed to implement methods of the invention and/or configure systems of the invention, as described herein. For example, one or more procedures described with respect to the method (s) discussed above might be implemented as code and/or instructions executable by a computer (and/or a processor within a computer) . A set of these instructions and/or codes might be stored on a computer-readable storage medium, such as the storage device (s) 525 described above.
In some cases, the storage medium might be incorporated within the computational system 500 or in communication with the computational system 500. In other embodiments, the storage medium might be separate from the computational system 500 (e.g., a removable medium, such as a compact disc, etc. ) , and/or provided in an installation package, such that the storage medium can be used to program a general-purpose computer with the instructions/code stored thereon. These instructions might take the form of executable code, which is executable by the computational system 500 and/or might take the form of source and/or installable code, which, upon compilation and/or installation on the computational system 500 (e.g. , using any of a variety of generally available compilers, installation programs, compression/decompression utilities, etc. ) , then takes the form of executable code.
Figure 6 is a flowchart of an example process 600 for processing videos according to some embodiments. One or more steps of the process 600 may be implemented, in some embodiments, by one or more components of system 100 of Figure 1, such as video processor 110. Although illustrated as discrete blocks, various blocks may be divided into additional blocks, combined into fewer blocks, or eliminated, depending on the desired implementation.
At block 605, in the user module, a user may indicate foreground regions within the scene recorded in a video. In some embodiments, the user may indicate multiple regions within the scene that are considered foreground regions. In some embodiments, the user may do this graphically by drawing foreground areas within a scene. For example, a user may create a window by clicking or touching two areas within one or more frames of the video. The video frame may then be segmented into regions of interest using any process known in the art.
In some embodiments, an algorithm such as, for example, the GrabCut algorithm may be used to segment the region of interest. For example, the algorithm may estimate the color distribution of the target object identified within the bounding box and/or that of the background using a Gaussian mixture model. The color distribution of the target object may then be used to construct a Markov random field over the pixel labels, with an energy function that prefers connected regions having the same label, and a graph cut based optimization may be run to infer their values. This process may be repeated a number of times until convergence. The result may provide a mask that blocks out the background.
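A minimal sketch of this segmentation step using OpenCV's GrabCut implementation, which performs the Gaussian-mixture color modeling and graph cut iterations internally; the rectangle is assumed to come from the user's input at block 605.

```python
# Sketch: run GrabCut inside a user-drawn bounding box to obtain a foreground mask.
import cv2
import numpy as np

def foreground_mask(frame, rect, iterations=5):
    # rect: (x, y, w, h) bounding box drawn by the user
    mask = np.zeros(frame.shape[:2], dtype=np.uint8)
    bgd_model = np.zeros((1, 65), dtype=np.float64)
    fgd_model = np.zeros((1, 65), dtype=np.float64)
    cv2.grabCut(frame, mask, rect, bgd_model, fgd_model,
                iterations, cv2.GC_INIT_WITH_RECT)
    # Definite or probable foreground pixels become 1, the rest 0.
    return np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype(np.uint8)
```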
In some embodiments, at block 605, the potential size of a human may be determined by placing a graphical image of a human within the scene. The graphical image of the human may be scaled by the user until it is approximately the size of a human recorded within the scene. Based on the amount of scaling, a typical human size may be determined. This may be repeated multiple times throughout the scene.
At block 610 a background model may be created from the first number of frames using the image mean and/or covariance. The background model may be created, for example, using a Gaussian Mixture Model (GMM). The input, for example, can be a single frame from the video. The output, for example, can be a motion probability and/or a motion mask.
The background model may be initialized as M. For each input frame, f, the difference between f and the background model M can be calculated. Three Gaussians, g1, g2, g3, can be used to describe the three color channels of the input frame. Since each Gaussian will output a probability, the probability of the Gaussian mixture is the sum of these three probabilities: p(i, j) = g1 + g2 + g3, which can describe a region or pixel (i, j) as foreground or background. Each Gaussian may be described as g(I, μ, σ), where I represents the image, μ represents the mean, and σ represents the covariance.
In some embodiments, the whole background model is an online updating model. The model can keep updating the background parameters to handle changes in viewpoint or illumination such as, for example, as the time of day changes and/or as the camera moves, pans, and/or tilts. In some embodiments, μ and σ can be updated using a maximum likelihood algorithm such as, for example, an EM algorithm, when new frames are analyzed and/or added to the model.
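As a sketch of block 610, OpenCV's MOG2 background subtractor can stand in for the per-channel Gaussian mixture model described above, since it maintains an online-updating mixture model per pixel; the history and threshold values are only examples.

```python
# Sketch: an online-updating Gaussian mixture background model via OpenCV's
# MOG2 subtractor, used here as a stand-in for the model described above.
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                detectShadows=False)

def motion_mask(frame, learning_rate=-1):
    # learning_rate = -1 lets OpenCV choose how quickly the model adapts to
    # viewpoint or illumination changes; a small positive value fixes it.
    return subtractor.apply(frame, learningRate=learning_rate)
```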
In some embodiments, object tracking through the video frame can be performed by a tracking module at block 615. In some embodiments, five frames (or any number of frames) can be input into the tracking module and a predicted candidate region may be output. In some embodiments, the object tracking module can be used to  determine, estimate, and/or predict regions in a frame representing a human (a candidate region) .
In some embodiments, the detection area can be limited. Assuming detection results rij exist for frames fi, where i = 1, ..., 5, the recent detection responses may be stored in a buffer. Each detection response may be represented as rij = (x, y, w, h). If the detection results rij and ri'j' are similar, then there is a high probability that this is a region of interest. In some embodiments, an object detector may only be used in the predicted region.
At block 620 one or more humans may be detected in a video frame of the video. In some embodiments, a detection module may be used. In some embodiments, a frame may be input into the detection module with a region of interest defined. A detection response may be output. In some embodiments, the frame may be converted into a grayscale image. Associated Pairing Comparison Features (APCF) may be extracted from the grayscale image. The APCF may pair comparisons of color and gradient orientation in granular space. The APCF feature may provide a chain of weak classifiers. Each classifier may provide a determination for the current extracted APCF feature such as, for example, the current region.
For example, when locating a human in a frame or image, APCF features can be extracted from the whole image. After this, a sliding window can scan the whole frame. Assuming a detector M has already been trained, a chain of weak classifiers may be used within each sliding window to determine whether the window contains a human. In detail, each sliding window must pass every weak classifier in the chain before the window is decided to contain a human.
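APCF detectors are not available in common libraries, so the sketch below uses OpenCV's HOG plus linear SVM people detector as a stand-in for the trained classifier chain, with the sliding-window scan restricted to the predicted region of interest from the tracking module.

```python
# Sketch of block 620 using OpenCV's HOG people detector as a stand-in for
# the APCF classifier chain. The scan is limited to the region of interest.
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_humans(frame, roi=None):
    # roi: optional (x, y, w, h) region of interest from the tracking module
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    x0, y0 = 0, 0
    if roi is not None:
        x, y, w, h = roi
        gray = gray[y:y + h, x:x + w]
        x0, y0 = x, y
    boxes, weights = hog.detectMultiScale(gray, winStride=(8, 8))
    # Map detections back into full-frame coordinates.
    return [(x0 + x, y0 + y, w, h) for (x, y, w, h) in boxes]
```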
At block 625 false alarm events can be learned. In some embodiments, false alarm data may be input into a self-learning module. The false alarm data, for example, may be collected from user input regarding various inputs, videos, and/or features. For example, a SURF feature detector (or any other feature detector) may be used to compute features of the candidate ai. The computed features may be matched against the candidate SURF features; for example, the test image can be compared with the false alarm image. If the number of matched points is over a threshold value, T, indicating that the new detection results are similar to a sample in the false alarm data, then candidate ai can be labeled as a false alarm. In some embodiments, the value compared against the threshold, T, can be calculated as the ratio of matched points divided by the total number of points.
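A sketch of the matching step in block 625, using ORB features as a stand-in for SURF (SURF is only available in OpenCV's contrib modules); the candidate is labeled a false alarm when the ratio of matched keypoints to total keypoints exceeds a threshold, and the distance and threshold values are only examples.

```python
# Sketch: compare a candidate detection against a stored false-alarm image by
# matching local features and thresholding the ratio of matched keypoints.
import cv2

orb = cv2.ORB_create()
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def is_false_alarm(candidate_img, false_alarm_img, threshold=0.3, max_distance=40):
    kp1, des1 = orb.detectAndCompute(candidate_img, None)
    kp2, des2 = orb.detectAndCompute(false_alarm_img, None)
    if des1 is None or des2 is None or len(kp1) == 0:
        return False
    matches = [m for m in matcher.match(des1, des2) if m.distance < max_distance]
    ratio = len(matches) / float(len(kp1))   # matched points over total points
    return ratio > threshold
```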
In some embodiments, video processing may be spread among a plurality of servers located in the cloud or in a cloud computing process. For example, different aspects, steps, or blocks of a video processing algorithm may occur on different servers. Alternatively or additionally, video processing for different videos may occur at different servers in the cloud.
In some embodiments, each video frame of a video may include metadata. For example, the video may be processed for event and/or object detection. If an event or an object occurs within the video then metadata associated with the video may include details about the object or the event. The metadata may be saved with the video or as a standalone file. The metadata, for example, may include the time, the number of people in the scene, the height of one or more persons, the weight of one or more persons, the number of cars in the scene, the color of one or more cars in the scene, the license plate of one or more cars in the scene, the identity of one or more persons in the scene, facial recognition data for one or more persons in the scene, object identifiers for various objects in the scene, the color of objects in the scene, the type of objects within the scene, the number of objects in the scene, the video quality, the lighting quality, the trajectory of an object in the scene, etc.
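A minimal example of storing such metadata as a standalone sidecar file, assuming JSON as the container; the field names are illustrative only.

```python
# Sketch: write per-video event metadata to a JSON sidecar file next to the video.
import json

def save_metadata(video_path, events):
    metadata = {
        "video": video_path,
        "events": events,  # e.g. [{"time": "...", "people": 2, "objects": ["car"]}]
    }
    with open(video_path + ".meta.json", "w") as fh:
        json.dump(metadata, fh, indent=2)
```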
In some embodiments, feature detection may occur within a user specified region of interest. For example, a user may draw a bounding box in one or more frames of a scene recorded by a camera. The bounding box may define the region of interest within which events may be identified and/or tracked. In some embodiments, a user may specify two or more points within a frame and a polygonal shaped bounding box may be created based on these points. In some embodiments, the user may have the option to specify whether the bounding box bounds the region of interest or should be excluded from the region of interest.
Figure 7 is a flowchart of an example process 700 for event filtering according to some embodiments. One or more steps of the process 700 may be implemented, in some embodiments, by one or more components of system 100 of Figure 1, such as video processor 110. Although illustrated as discrete blocks, various blocks may be divided into additional blocks, combined into fewer blocks, or eliminated, depending on the desired implementation.
The process 700 begins at block 705. At block 705 one or more videos may be monitored. In some embodiments, the videos may be monitored by a computer system such as, for example, video processor 110. In some embodiments, the videos may be monitored using one or more processes distributed across the Internet. In some embodiments, the one or more videos may include a video stream from a video  camera or a video file stored in memory. In some embodiments, the one or more videos may have any file type.
At block 710 an event can be detected to have occurred in the one or more videos. The event may include a person moving through a scene, a car or an object moving through a scene, one or more faces being detected, a particular face leaving or entering the scene, a face, a shadow, animals entering the scene, an automobile entering or leaving the scene, etc. In some embodiments, the event may be detected using any number of algorithms such as, for example, SURF, SIFT, GLOH, HOG, Affine shape adaptation, Harris affine, Hessian affine, etc. In some embodiments, the event may be detected using a high level detection algorithm.
When an event is detected an event description may be created that includes various event data. The data may include data about the scene and/or data about objects in the scene such as, for example, object colors, object speed, object velocity, object vectors, object trajectories, object positions, object types, object characteristics, etc. In some embodiments, a detected object may be a person. In some embodiments, the event data may include data about the person such as, for example, the hair color, height, name, facial features, etc. The event data may include the time the event starts and the time the event stops. This data may be saved as metadata with the video.
In some embodiments, when an event is detected a new video clip may be created that includes the event. For example, the new video clip may include video from the start of the event to the end of the event.
In some embodiments, background and/or foreground filtering within the video may occur at some time during the execution of process 700.
If an event has been detected, then process 700 proceeds to block 715. If an event has not been detected, then process 700 returns to block 705.
At block 715 it can be determined whether the event is a false alarm event. In some embodiments, a false alarm event may be an event that has event data similar to event data in the false alarm database. The event data in the false alarm database may include data created using machine learning based on user input and/or other input. For example, the event data found in block 710 may be compared with data in the false alarm database.
If a false alarm event has been detected, then process 700 returns to block 705. If a false alarm event has not been detected, then process 700 proceeds to block 725.
At block 725 a user may be notified. For example, the user may be notified using an electronic message such as, for example, a text message, an SMS message, a push notification, an alarm, a phone call, etc. In some embodiments, a push notification may be sent to a smart device (e.g., a smart phone, a tablet, a phablet, etc. ) . In response, an app executing on the smart device may notify the user that an event has occurred. In some embodiments, the notification may include event data describing the type of event. In some embodiments, the notification may also indicate the location where the event occurred or the camera that recorded the event.
At block 730 the user may be provided with an interface to indicate that the event was a false alarm. For example, an app executing on the user's smart device (or an application executing on a computer) may present the user with the option to indicate that the event is a false alarm. For example, the app may present a video clip that includes the event to the user along with a button that would allow the user to indicate that the event is a false alarm. If a user indication has not been received, then process 700 returns to block 705. If a user indication has been received, then process 700 proceeds to block 735.
At block 735 the event data and/or the video clip including the event may be used to update the false alarm database and process 700 may then return to block 705. In some embodiments, machine learning techniques may be used to update the false alarm database. For example, machine learning techniques may be used in conjunction with the event data and/or the video clip to update the false alarm database. As another example, machine learning (or self-learning) algorithms may be used to add new false alarms to the database and/or eliminate redundant false alarms. Redundant false alarms, for example, may include false alarms associated with the same face, the same facial features, the same body size, the same color of a car, etc.
Process 700 may be used to filter any number of false alarms from any number of videos. For example, the one or more videos being monitored at block 705 may be a video stream of a doorstep scene (or any other location). An event may be detected at block 710 when a person enters the scene. Event data may include data that indicates the position of the person in the scene, the size of the person, facial data, the time the event occurs, etc. The event data may include whether the face is recognized and/or the identity of the face. At block 720 it can be determined that the event is a false alarm when the facial data is compared with facial data in the false alarm database. If there is a match indicating that the face is known, then the event is a false alarm and process 700 returns back to block 705. Alternatively, if the facial data does not match facial data in the false alarm database, then process 700 moves to block 730 and an indication can be sent to the user, for example, through an app executing on their smartphone. The user can then visually determine whether the face is known and manually indicate as much through the user interface of the smartphone. The facial data may then be used to train the false alarm database.
In other examples process 700 may determine whether a car of specific make, model, color, and/or with certain license plates is a known car that has entered a scene and depending on the data in the false alarm database the user may be notified.
In other examples process 700 may determine whether an animal has entered a scene and depending on the data in the false alarm database the user may be notified.
In other examples process 700 may determine whether a person has entered the scene between specific hours.
In other examples process 700 may determine whether a certain number of people are found within a scene.
In some embodiments, video processing such as, for example, process 700, may be sped up by decreasing the data size of the video being processed. For example, a video may be converted into a second video by compressing the video, decreasing the resolution of the video, lowering the frame rate, or some combination of these. For example, a video with a 70 frame per second frame rate may be converted to a video with a 7 frame per second frame rate. As another example, an uncompressed video may be compressed using any number of video compression techniques.
In some embodiments, at block 730 the user may indicate that the event is an important event that they would like to receive notifications about. For example, if the video shows a strange individual milling about the user's home during late hours, the user may indicate that they would like to be notified about such an event. This information may be used by the machine learning algorithm to ensure that such an event is not considered a false alarm and/or that the user is notified about the occurrence of such an event or a similar event in the future.
In some embodiments, video processing may be spread among a plurality of servers located in the cloud or in a cloud computing process. For example, different aspects, steps, or blocks of a video processing algorithm may occur on different servers. Alternatively or additionally, video processing for different videos may occur at different servers in the cloud.
In some embodiments, each video frame of a video may include metadata. For example, the video may be processed for event and/or object detection. If an event or an object occurs within the video then metadata associated with the video may include details about the object or the event. The metadata may be saved with the video or as a standalone file. The metadata, for example, may include the time, the number of people in the scene, the height of one or more persons, the weight of one or more persons, the number of cars in the scene, the color of one or more cars in the scene, the license plate of one or more cars in the scene, the identity of one or more persons in the scene, facial recognition data for one or more persons in the scene, object identifiers for various objects in the scene, the color of objects in the scene, the type of objects within the scene, the number of objects in the scene, the video quality, the lighting quality, the trajectory of an object in the scene, etc.
The term “substantially” means within 5% or 10% of the value referred to or within manufacturing tolerances.
Numerous specific details are set forth herein to provide a thorough understanding of the claimed subject matter. However, those skilled in the art will understand that the claimed subject matter may be practiced without these specific details. In other instances, methods, apparatuses, or systems that would be known by one of ordinary skill have not been described in detail so as not to obscure claimed subject matter.
Some portions are presented in terms of algorithms or symbolic representations of operations on data bits or binary digital signals stored within a computing system memory, such as a computer memory. These algorithmic descriptions or representations are examples of techniques used by those of ordinary skill in the data processing art to convey the substance of their work to others skilled in the art. An algorithm is a self-consistent sequence of operations or similar processing leading to a desired result. In this context, operations or processing involves physical manipulation of physical quantities. Typically, although not necessarily, such quantities may take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, or otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to such signals  as bits, data, values, elements, symbols, characters, terms, numbers, numerals, or the like. It should be understood, however, that all of these and similar terms are to be associated with appropriate physical quantities and are merely convenient labels. Unless specifically stated otherwise, it is appreciated that throughout this specification discussions utilizing terms such as “processing, ” “computing, ” “calculating, ” “determining, ” and “identifying” or the like refer to actions or processes of a computing device, such as one or more computers or a similar electronic computing device or devices, that manipulate or transform data represented as physical, electronic, or magnetic quantities within memories, registers, or other information storage devices, transmission devices, or display devices of the computing platform.
The system or systems discussed herein are not limited to any particular hardware architecture or configuration. A computing device can include any suitable arrangement of components that provides a result conditioned on one or more inputs. Suitable computing devices include multipurpose microprocessor-based computer systems accessing stored software that programs or configures the computing system from a general-purpose computing apparatus to a specialized computing apparatus implementing one or more embodiments of the present subject matter. Any suitable programming, scripting, or other type of language or combinations of languages may be used to implement the teachings contained herein in software to be used in programming or configuring a computing device.
Embodiments of the methods disclosed herein may be performed in the operation of such computing devices. The order of the blocks presented in the examples above can be varied—for example, blocks can be re-ordered, combined, and/or broken into sub-blocks. Certain blocks or processes can be performed in parallel.
The use of “adapted to” or “configured to” herein is meant as open and inclusive language that does not foreclose devices adapted to or configured to perform additional tasks or steps. Additionally, the use of “based on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Headings, lists, and numbering included herein are for ease of explanation only and are not meant to be limiting.
While the present subject matter has been described in detail with respect to specific embodiments thereof, it will be appreciated that those skilled in the art, upon attaining an understanding of the foregoing, may readily produce alterations to, variations of, and equivalents to such embodiments. Accordingly, it should be understood that the present disclosure has been presented for purposes of example rather than limitation, and does not preclude inclusion of such modifications, variations, and/or additions to the present subject matter as would be readily apparent to one of ordinary skill in the art.

Claims (15)

  1. A system described in the disclosure and/or shown in the drawings.
  2. A method as described in the disclosure and/or shown in the drawings.
  3. A method comprising:
    determining a first feature description of an event in a first video;
    determining a first location of the event in the first video;
    determining a first time stamp of the first video;
    determining a second feature description of an event in a second video;
    determining a second location of the event in the second video;
    determining a second time stamp of the second video; and
    determining a correlation value based on the first location, the second location, the first time stamp, the second time stamp, the first feature description, and the second feature description.
  4. The method according to claim 3, wherein the correlation value is determined from:
    δ(E1, E2) = ω1·Σ(f1 − f2) + ω2·d(l1, l2) + ω3·|t1 − t2|,
    where ω1 is a constant, ω2 is a constant, and ω3 is a constant, 1 ≤ m ≤ the number of videos, 1 ≤ n ≤ the number of features in video Vm, and d(l1, l2) represents the distance between the first location and the second location.
  5. A video processing system comprising:
    one or more video storage locations storing a plurality of videos recorded from a plurality of video cameras;
    one or more video processors communicatively coupled with the one or more video storage locations, the one or more video processors configured to:
    determine a first feature description of an event in a first video;
    determine a first location of the event in the first video;
    determine a first time stamp of the first video;
    determine a second feature description of an event in a second video;
    determine a second location of the event in the second video;
    determine a second time stamp of the second video; and
    determine a correlation value based on the first location, the second location, the  first time stamp, the second time stamp, the first feature description, and the second feature description.
  6. A method comprising:
    segmenting a video spanning a first duration of a scene into a plurality of video clips having a specific time period;
    creating a visual dictionary of the scene from the plurality of video clips;
    determining features within the video clips;
    ranking the video clips based on the features; and
    creating a summarization video from the highest ranked video clips having a second duration that is significantly shorter than the first duration.
  7. The method according to claim 6, wherein the first duration is longer than twelve hours and the second duration is shorter than five minutes.
  8. The method according to claim 6, wherein the specific time period comprises less than sixty seconds.
  9. A method comprising:
    tracking an event of interest in a specific number of frames within a video;
    determining the location of the event within each of the specific number of frames; and
    estimating a location of the event in a subsequent frame within the video based on the location of the event within each of the specific number of frames.
  10. The method according to claim 9, further comprising:
    determining a shift between the center of the frame and the estimated location of the event; and
    outputting the shift to the camera.
  11. A method comprising:
    determining that an event has occurred in a video;
    determining event features;
    determining whether the event is a false alarm event based on a comparison of the event features with event features in a false alarm database; and
    notifying a user about the event.
  12. A method comprising:
    determining that an event has occurred in a video;
    determining event features;
    determining whether the event is a false alarm event based on a comparison of the event features with event features in a false alarm database;
    notifying a user about the event;
    receiving an indication from the user that the event is a false alarm; and
    updating the false alarm database with the event features using a machine learning algorithm.
  13. A method for video processing comprising:
    converting the video to a second video that has at least one of a lower resolution and a lower frame rate; and
    processing the video.
  14. A method comprising:
    determining features and/or events in a video;
    determining foreground regions of at least a portion of the video that includes the features and/or events;
    determining a region of interest within one or more frames of the at least a portion of the video; and
    detecting the presence of a human within the region of interest within the at least a portion of the video.
  15. A method comprising:
    determining features and/or events in a video;
    determining foreground regions of at least a portion of the video that includes the features and/or events;
    determining a region of interest within one or more frames of the at least a portion of the video;
    detecting the presence of a human within the region of interest within the at least a portion of the video; and
    determining whether the features and/or events represent a false alarm.
PCT/CN2015/081836 2015-06-18 2015-06-18 Cloud platform with multi camera synchronization WO2016201683A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2015/081836 WO2016201683A1 (en) 2015-06-18 2015-06-18 Cloud platform with multi camera synchronization
EP15895256.4A EP3311334A4 (en) 2015-06-18 2015-06-18 Cloud platform with multi camera synchronization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2015/081836 WO2016201683A1 (en) 2015-06-18 2015-06-18 Cloud platform with multi camera synchronization

Publications (1)

Publication Number Publication Date
WO2016201683A1 true WO2016201683A1 (en) 2016-12-22

Family

ID=57544724

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/081836 WO2016201683A1 (en) 2015-06-18 2015-06-18 Cloud platform with multi camera synchronization

Country Status (2)

Country Link
EP (1) EP3311334A4 (en)
WO (1) WO2016201683A1 (en)



Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8665333B1 (en) * 2007-01-30 2014-03-04 Videomining Corporation Method and system for optimizing the observation and annotation of complex human behavior from video sources
US8300953B2 (en) * 2008-06-05 2012-10-30 Apple Inc. Categorization of digital media based on media characteristics
EP2995079A4 (en) * 2013-05-10 2017-08-23 Robert Bosch GmbH System and method for object and event identification using multiple cameras

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040143434A1 (en) * 2003-01-17 2004-07-22 Ajay Divakaran Audio-Assisted segmentation and browsing of news videos
US8237792B2 (en) * 2009-12-18 2012-08-07 Toyota Motor Engineering & Manufacturing North America, Inc. Method and system for describing and organizing image data
CN103220550A (en) * 2012-01-19 2013-07-24 华为技术有限公司 Video conversion method and device
CN103839373A (en) * 2013-03-11 2014-06-04 成都百威讯科技有限责任公司 Sudden abnormal event intelligent identification alarm device and system
CN103942542A (en) * 2014-04-18 2014-07-23 重庆卓美华视光电有限公司 Human eye tracking method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3311334A4 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3405889A4 (en) * 2016-01-21 2019-08-28 Wizr LLC Cloud platform with multi camera synchronization
CN109936677A (en) * 2017-12-15 2019-06-25 浙江舜宇智能光学技术有限公司 Video synchronization method applied to more mesh cameras
CN109936677B (en) * 2017-12-15 2021-07-27 浙江舜宇智能光学技术有限公司 Video synchronization method applied to multi-view camera
US11501455B2 (en) 2018-10-26 2022-11-15 7-Eleven, Inc. System and method for position tracking using edge computing
WO2021034864A1 (en) * 2019-08-21 2021-02-25 XNOR.ai, Inc. Detection of moment of perception
US10997730B2 (en) 2019-08-21 2021-05-04 XNOR.ai, Inc. Detection of moment of perception
US11816876B2 (en) 2019-08-21 2023-11-14 Apple Inc. Detection of moment of perception
WO2021081332A1 (en) * 2019-10-25 2021-04-29 7-Eleven, Inc. Tracking positions using a scalable position tracking system
US11288518B2 (en) 2019-10-25 2022-03-29 7-Eleven, Inc. Tracking positions using a scalable position tracking system
US11275953B2 (en) 2019-10-25 2022-03-15 7-Eleven, Inc. Tracking positions using a scalable position tracking system
US11580748B2 (en) 2019-10-25 2023-02-14 7-Eleven, Inc. Tracking positions using a scalable position tracking system
US11580749B2 (en) 2019-10-25 2023-02-14 7-Eleven, Inc. Tracking positions using a scalable position tracking system
US11244463B2 (en) 2019-10-25 2022-02-08 7-Eleven, Inc. Scalable position tracking system for tracking position in large spaces
US11823396B2 (en) 2019-10-25 2023-11-21 7-Eleven, Inc. Scalable position tracking system for tracking position in large spaces
JP7420932B2 (en) 2019-10-25 2024-01-23 セブン-イレブン インコーポレイテッド Tracking location using a scalable location tracking system

Also Published As

Publication number Publication date
EP3311334A1 (en) 2018-04-25
EP3311334A4 (en) 2019-08-07

Similar Documents

Publication Publication Date Title
US10489660B2 (en) Video processing with object identification
WO2016201683A1 (en) Cloud platform with multi camera synchronization
US10395385B2 (en) Using object re-identification in video surveillance
US10282617B2 (en) Methods and systems for performing sleeping object detection and tracking in video analytics
US20190304102A1 (en) Memory efficient blob based object classification in video analytics
US20190034734A1 (en) Object classification using machine learning and object tracking
US20170193810A1 (en) Video event detection and notification
US20180144476A1 (en) Cascaded-time-scale background modeling
IL261696A (en) System and method for training object classifier by machine learning
US10410059B2 (en) Cloud platform with multi camera synchronization
US10152630B2 (en) Methods and systems of performing blob filtering in video analytics
US20180268563A1 (en) Methods and systems for performing sleeping object detection in video analytics
CN110795595A (en) Video structured storage method, device, equipment and medium based on edge calculation
US10140554B2 (en) Video processing
CN112633313B (en) Bad information identification method of network terminal and local area network terminal equipment
CN109564686A (en) The method and system of the motion model for object tracing device is updated in video analysis
US11954880B2 (en) Video processing
US20190370553A1 (en) Filtering of false positives using an object size model
KR102042397B1 (en) syntax-based method of producing heat-map for compressed video
WO2022228325A1 (en) Behavior detection method, electronic device, and computer readable storage medium
US20210287051A1 (en) Methods and systems for recognizing object using machine learning model
CN112085025B (en) Object segmentation method, device and equipment
US20190373165A1 (en) Learning to switch, tune, and retrain ai models
CN111062337B (en) People stream direction detection method and device, storage medium and electronic equipment
KR102375541B1 (en) Apparatus for Providing Artificial Intelligence Service with structured consistency loss and Driving Method Thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15895256

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE