WO2017027212A1 - Machine vision feature-tracking system - Google Patents


Info

Publication number
WO2017027212A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
feature
video
region
candidate
Prior art date
Application number
PCT/US2016/044143
Other languages
French (fr)
Inventor
Zhiwei Xiong
Wenjun Zeng
Chi Su
Original Assignee
Microsoft Technology Licensing, Llc
Priority date
Filing date
Publication date
Application filed by Microsoft Technology Licensing, Llc filed Critical Microsoft Technology Licensing, Llc
Publication of WO2017027212A1 publication Critical patent/WO2017027212A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person
    • G06T 2207/30201 Face

Definitions

  • Machine vision techniques for automatically identifying objects or features in digital images and video are used in a wide variety of applications. Visible features in video frames, such as faces or license plates, may move, rotate, or become shadowed or otherwise obscured, e.g., as movement of the features (e.g., objects) is captured over multiple frames of the video. Such occurrences can reduce the accuracy of machine-vision algorithms such as object location, orientation detection, and human-face detection and tracking.
  • a computing system can produce a candidate feature-location map based at least in part on a frame of a video and a prior feature-location map.
  • the computing system can determine a candidate feature region based at least in part on the candidate feature-location map.
  • the computing system can then locate a feature at the candidate feature region in the frame of the video by comparing a histogram of a test region in the frame of the video to a histogram of a corresponding region in a prior frame of the video.
  • the test region can be determined based at least in part on the candidate feature region.
  • features can be tracked over a video from an image sensor and an indication of tracked features can be presented, e.g., via an electronic display.
  • FIG. 1 is a block diagram depicting an example environment for implementing feature tracking in videos as described herein.
  • FIG. 2 is a graphical representation of example frames of video in which features are tracked.
  • FIG. 3 shows example histograms of regions of image data according to an example of feature tracking as described herein.
  • FIG. 4 is a dataflow diagram depicting example module interactions during feature tracking.
  • FIG. 5 is a flow diagram that illustrates an example process for tracking features in video frames.
  • FIG. 6 is a flow diagram that illustrates an example process for tracking and detecting features in video frames.
  • FIG. 7 is a flow diagram that illustrates an example process for tracking features in video frames across visibility reductions.
  • FIG. 8 is a block diagram depicting an example computing device configured to participate in feature tracking according to various examples described herein.
  • Examples described herein provide techniques and constructs to track features in digital videos. These techniques can enable feature tracking across multiple frames of a video with increased speed, increased precision, reduced compute time, and/or reduced memory requirements. This tracking can also permit additional techniques to be performed, such as obscuring or highlighting features in the video such as faces, or selecting relevant portions of the video for further analysis, e.g., character recognition of license plates.
  • Some examples described herein provide improved performance compared to conventional tracking algorithms. Some prior schemes detect features anew in each frame of the video. Some examples herein detect features in one out of a group of multiple frames and then track the features between frames, reducing the computational demands of tracking. Some examples permit tracking features even when the features are not present in some frames, e.g., because they are obscured behind other features.
  • video refers to a temporal sequence of digital images.
  • the images can be regularly spaced in time, e.g., 29.97, 30, or 60 frames per second (fps), or can be irregularly spaced in time.
  • Video can be stored and manipulated in, e.g., an uncompressed form or a compressed form.
  • Example compression techniques can include, but are not limited to, those described in the Portable Network Graphics (PNG), Joint Photographic Experts Group (JPEG), Motion JPEG, and Moving Picture Experts Group (MPEG) standards.
  • FIG. 1 shows an example environment 100 in which examples of feature- tracking systems can operate or in which feature tracking methods such as described below can be performed.
  • various devices and/or components of environment 100 include a computing device 102, depicted as a server computer.
  • Computing device 102 represents any type of device that can receive and process video content.
  • Computing device 102 can be implemented as, for example, but without limitation, an Internet-enabled television, a television set-top box, a game console, a desktop computer, a laptop computer, a tablet computer, or a smartphone.
  • Different devices or types of devices can have different uses for feature-tracking data.
  • controllers for robots or industrial machines can use feature-tracking information from a video of their workspaces to determine the location of workpieces during movement or a work step.
  • Surveillance or other video-recording systems operating, e.g., in public spaces, can use feature-tracking data to highlight suspects in a video or to obscure people's faces to protect their privacy.
  • computing device 102 includes, or is communicatively connected with, an image source 104, depicted as a camera.
  • Image source 104 can include one or more computing device(s) or other systems configured to provide video frames 106.
  • image source 104 provides a video of a scene 108.
  • the video has multiple video frames 106.
  • Scene 108 includes a subject 110, in this example, a person. This example scene 108 is for purposes of illustration and is not limiting.
  • computing device 102 or image source 104 can include one or more sensors 112, e.g., configured to capture video frames 106 or otherwise provide video frames 106 or data that can be processed into video frames 106.
  • image sensor 112 can be configured to provide a video having a plurality of frames 106.
  • Example image sensors can include front- and rear-facing cameras of a smartphone, a light sensor (e.g., a CdS photoresistor or a phototransistor), a still imager (e.g., a charge-coupled device (CCD), a complementary metal-oxide-semiconductor (CMOS) sensor, etc.), a video imager (e.g., CCD or CMOS), a fingerprint reader, a retinal scanner, an iris scanner, a computed-radiography scanner, or the like.
  • Some example sensors 112 can include visible-light image sensors (e.g., λ ∈ [400 nm, 700 nm]) or infrared-light image sensors (e.g., λ ∈ [700 nm, 15 μm] or λ ∈ [700 nm, 1 mm]).
  • Individual one(s) of one or more sensor(s) 112 can be configured to output sensor data corresponding to at least one physical property, e.g., a physical property of an environment of the device, such as ambient light or a scene image or video frame.
  • the image data representing the subject 110 in one or more of the video frame(s) 106 has a feature 114, in this example, the person's face.
  • other example features can include, but are not limited to, identification signs, e.g., license plates or parking permits on vehicles, boat registration markers on hulls or sails, aircraft tail numbers (e.g., "N38290" or "B-8888"), street-address numbers displayed on buildings or mailboxes, patterns on articles of clothing, e.g., cartoon faces, grotesques, or corporate or government logos on T-shirts, or articles of clothing, accessories, or vehicles having colors or patterns readily distinguishable from the background of the scene 108, e.g., a red umbrella, a black hat, or a striped dress.
  • Various aspects herein track faces or other features 114 across multiple video frames 106.
  • Computing device 102 includes at least one processor 116 and a memory 118 configured to store, e.g., video frame(s) 106 or other video data being processed.
  • Memory 118 can also store one or more of a detecting component 120, a tracking component 122, and a reporting component 124, each executable on the processor 116.
  • the components 120, 122, or 124 can include, e.g., modules stored on computer-readable media such as computer storage media (discussed below) and having thereon computer-executable instructions.
  • the details of example computing device 102 can be representative of other computing devices 102 or of image source(s) 104. However, individual ones of computing devices 102 or image sources 104 can include additional or alternative hardware and/or software components.
  • Detecting component 120 can be configured to locate features 114, e.g., faces, in one or more video frames 106.
  • Tracking component 122 can be configured to track features 114 across multiple video frames 106, e.g., to determine how a given feature has moved between two consecutive frames, or to activate or deactivate trackers for specific features based on shadowing, occlusion, or other changes in the visibility of the features.
  • Reporting component 124 can be configured to, e.g., provide a visual representation of a located feature, such as a face, in frame(s) 106 of a video.
  • reporting component 124 can be configured to render at least some of the frames 106 and tracking indications for display via a display device 126.
  • Display device 126 can include an organic light-emitting-diode (OLED) display, a liquid- crystal display (LCD), a cathode-ray tube (CRT), or another type of visual display.
  • Display device 126 can be a component of a touchscreen, or can include a touchscreen.
  • reporting component 124 can provide, via display device 126, visual representations 128 of video frames or feature-tracking data, as discussed below.
  • reporting component 124 can present for display one or more frames 106 of the video and visual representations 128 of the at least some of the tracked ones of the located features.
  • Example visual representations 128(1)-128(3) (individually or collectively referred to herein with reference 128) are shown in FIG. 1.
  • Representations 128 can represent possible graphical displays of tracking and video information, and can represent internal data structures discussed below.
  • Time can be represented, e.g., in wall-clock time (hours, minutes or seconds), frames or fields (e.g., as in timecodes standardized by the Society of Motion Picture and Television Engineers, SMPTE), or any combination thereof.
  • a tracker 130(1) is shown highlighting a feature of interest, in this example, the face of subject 110.
  • a tracker is a data structure representing a particular feature across one or more of the video frames 106, and can be graphically represented as a highlight such as an outline (illustrated) or as an obscurant such as a darkened or blurred area. Other example types of highlights and obscurants are discussed below.
  • Subject 110 has moved further to the right of the frame 106 (to the subject's left), and is beginning to move out of the frame 106.
  • detecting component 120, tracking component 122, and reporting component 124 can each be components of an operating system (not shown) or otherwise stored or executed locally on computing device 102 or image source 104.
  • one or more of the system components can be implemented as part of a distributed system, for example, as part of a cloud-based service or as components of another application.
  • image source 104 can be embodied in a smartphone and computing device 102 can be embodied in a cloud or other hosted feature-tracking service.
  • image source 104 is embodied in or connected to computing device 102.
  • image source 104 and computing device 102 can be embodied in a smartphone, tablet, MICROSOFT SURFACE, APPLE IPAD, or other device configured to capture a video and track features therein or display tracking information of features therein. This can advantageously permit tracking features even when not connected to a network.
  • a processor (not shown) of image source 104 can communicate with computing device 102 via a network 132, such as the Internet.
  • Computing device 102 can host one or more of detecting component 120, tracking component 122, and reporting component 124, and can exchange data with image source 104 via network 132 to perform processing described herein.
  • Network 132 can include a cable television network, radio frequency (RF), microwave, satellite, and/or data network, such as the Internet, and can also support wired or wireless media using any format and/or protocol, such as broadcast, unicast, or multicast. Additionally, network 132 can be any type of network, wired or wireless, using any type of network topology and any network communication protocol, and can be represented or otherwise implemented as a combination of two or more networks.
  • FIG. 2 shows a graphical representation 200 of example frames 106 of video, e.g., of scene 108, in which features are tracked, e.g., as discussed herein.
  • FIG. 2 shows representations of four frames 202(1)-202(4) (individually or collectively referred to herein with reference 202). Dotted or short-dashed rectangles represent trackers 204(1)-204(3) (individually or collectively referred to herein with reference 204). Trackers 204(1)-204(3) correspond to respective tracked regions of the frames 202.
  • the frames 202 show a total of three distinct people, namely subjects 206(1)-206(3) (individually or collectively referred to herein with reference 206). For purposes of exposition, and without limitation, the trackers 204 are numbered corresponding to the three subjects 206.
  • Trackers 204(1) and 204(2) highlight and represent the detected or tracked features, in this example, the faces of subjects 206(1) and 206(2), respectively.
  • the initial locations of trackers 204(1) and 204(2) can be determined, e.g., by the detecting component 120.
  • Tracking component 122 has determined a new position of tracker 204(2) to correspond to the new (tracked) location of the face of subject 206(2). However, the face of subject 206(1) is obscured. Tracking component 122 has detected the obscuration and has designated tracker 204(1) as an out-frame tracker 208, represented graphically below frame 202(2) for purposes of explanation. That is, the feature being tracked by tracker 204(1) is not visible in frame 202(2).
  • the term "in-frame tracker" refers to a tracker that is not an out-frame tracker 208.
  • Inset 210 shows an enlarged view of the faces of subjects 206(1) and 206(2) in frame 202(2).
  • Tracker 204(2) indicates the tracked area of the face of subject 206(2).
  • Candidate feature region 212 shows a region determined, e.g., by tracking component 122, as corresponding to the tracked region of tracker 204(1) in frame 202(1).
  • Tracking component 122 can locate the feature 114 of interest in candidate feature region 212, or determine that the feature 114 of interest is not visible in candidate feature region 212, by comparing a histogram of a test region in frame 202(2) of the video, e.g., the candidate feature region 212, to a histogram of a corresponding region in a prior frame 202(1) of the video.
  • tracking component 122 has determined histogram 214 as an example histogram of the tracked region of tracker 204(1) in frame 202(1).
  • histogram 214 shows percentages of black ("K"), brown ("Br"), and flesh tone ("Fl") colors in the tracked region of tracker 204(1), e.g., 25% K, 15% Br, and 60% Fl.
  • histograms include binned data of various color components of pixels in a region of a frame, e.g., as discussed below with reference to Eq. (2).
  • the specific colors that are typically represented in a histogram correspond to the color space (e.g., specific amounts of red, green, and blue in the RGB color space).
  • Black, brown, and flesh tone colors are used in this example, merely to simplify the explanation. In an example implementation, however, the brown, black, and flesh tones are actually represented as combinations of, e.g., red, green, and blue color information or hue, saturation, and value information.
  • tracking component 122 has determined histogram 216 of candidate feature region 212 in frame 202(2).
  • subject 206(1) has a relatively smaller amount of black hair and subject 206(2) has a relatively larger amount of brown hair.
  • histogram 216 is different from histogram 214. In this example, histogram 216 has 20% K, 50% Br, and 30% Fl.
  • the tracking component 122 can compare histograms, e.g., by computing differences or distances between histograms. In the illustrated example, tracking component 122 determines a distance (or difference, and likewise throughout) between histogram 214 and histogram 216. Tracking component 122 determines that the distance is greater than a selected threshold, and therefore that candidate feature region 212 does not correspond to the tracked feature 114, in this example, the face of subject 206(1). Tracking component 122 therefore marks or otherwise designates tracker 204(1) in frame 202(2) as an out-frame tracker 208, as shown.
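  • As a concrete, non-authoritative sketch of the histogram comparison described in this example (the hue/saturation bin counts, the Bhattacharyya distance, and the threshold value are illustrative assumptions, not taken from this publication), a candidate region's histogram can be compared against the stored histogram of the tracked region, and the tracker designated out-frame when the distance exceeds the threshold:

```python
import cv2
import numpy as np

def region_histogram(frame_bgr, region, bins=(16, 16)):
    """Hue/saturation histogram of a rectangular region given as (x, y, w, h)."""
    x, y, w, h = region
    hsv = cv2.cvtColor(frame_bgr[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, list(bins), [0, 180, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def is_out_frame(tracked_hist, candidate_hist, threshold=0.5):
    """Designate the tracker out-frame when the histogram distance is too large."""
    # Bhattacharyya distance: 0 for identical histograms, 1 for no overlap.
    distance = cv2.compareHist(tracked_hist, candidate_hist,
                               cv2.HISTCMP_BHATTACHARYYA)
    return distance > threshold
```

In this sketch, a distance above the threshold corresponds to the situation in frame 202(2), where tracker 204(1) is designated an out-frame tracker 208.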
  • Tracking component 122 has determined an updated location of tracker 204(2) corresponding to the tracked location of the face of subject 206(2).
  • the detecting component 120 can detect features (e.g., faces) in frame 202(3).
  • the detecting component 120 detects the face of subject 206(1), no longer obscured behind subject 206(2).
  • the tracking component 122 can compare the newly-detected face of subject 206(1) with in-frame trackers, out-frame trackers 208, or any combination thereof.
  • the tracking component 122 can compare the stored location information of out-frame tracker(s) 208 such as tracker 204(1) with location information of detected feature(s). Tracking component 122 can additionally or alternatively compare histograms of newly-discovered features 114 with histograms of in-frame trackers or of out-frame trackers 208. In the illustrated example, the tracking component 122 has determined that tracker 204(1) corresponds to the detected face of subject 206(1) in frame 202(3). In some examples, the tracking component 122 can additionally or alternatively track features based at least in part on test regions different from the tracked regions.
  • clothing can vary more person-to-person than flesh color
  • using clothing can improve the accuracy of tracking faces, e.g., when one person crosses in front of another ("face crossing"), e.g., as shown in frames 202(1)-202(3), or when a face disappears and then reappears ("occlusion"), e.g., as shown in frames 202(2) and 202(3).
  • This can also provide improved accuracy in, e.g., tracking license plates of vehicles.
  • vehicle color and bumper sticker color and pattern can vary more vehicle-to-vehicle than license plate color.
  • tracking based at least in part on image data of portions of the bumper of a vehicle in addition to (or instead of) image data of portions of the license plate of the vehicle can permit more accurately tracking multiple vehicles in a scene 108.
  • only one test region is described here in addition to the tracked region.
  • any number of test regions can be used for a given tracked region.
  • the number of test regions for a given tracked region can vary between frames 202.
  • the tracking component 122 can determine test regions based on candidate feature regions such as candidate feature region 212, on tracked regions such as for trackers 204, or any combination thereof.
  • the tracking component 122 can determine a test region 218 in frame 202(1) based at least in part on the tracked region, e.g., of tracker 204(1).
  • the tracking component 122 can also determine respective test regions 220(1) and 220(2) in frame 202(3) based at least in part on the tracked regions of trackers 204(1) and 204(2) in frame 202(3), e.g., the locations of respective detected faces of subjects 206(1), 206(2). Histograms of these regions can be used in tracking, as is discussed below with reference to FIG. 3.
  • the areas shown by trackers 204(1) and 204(2) in frames 202(1)-202(4) can be, e.g., candidate feature regions or test regions as discussed herein with reference to tracking component 122. For example, if subjects 206(1) and 206(2) had very similar skin tones and hair color, test regions 220(1) and 220(2), corresponding to the subjects' clothing, could be used to distinguish the subjects from one another.
  • the detecting component 120 can locate a new feature, e.g., the face of newly-appeared subject 206(3), in frame 202(4).
  • the tracking component 122 can determine, e.g., using comparisons of locations, histograms, or other data described herein, that the new feature does not correspond to an existing tracker, e.g., an out-frame tracker 208, and can thus allocate a new tracker 204(3) initialized with the detection results.
  • the visual representation of tracker 204(3) is shown as an obscurant shading the face of subject 206(3). Other examples of obscurants are discussed herein.
  • tracking component 122 can determine a correspondence between detected features 114 and trackers 204 using assignment algorithms. Examples of detection of features are discussed above with reference to frames 202(1) and 202(4). Assignment algorithms, such as the Hungarian algorithm or other dynamic-programming or combinatorial optimization algorithms, use a score, e.g., a difference or distance, between a given tracker (in-frame or out-frame) and a given detected feature. For example, the score can be the histogram distance between image data in a tracked region and image data in a region holding a detected feature; the overlap ratio, discussed below with reference to Eq. (3), between a tracker and a feature; or a combination thereof, e.g., a weighted sum.
  • Assignment algorithms can determine a correspondence between a given set of detected features and a given set of trackers (e.g., in-frame, out-frame, or both) so that the score is mathematically maximized (e.g., for goodness-of-fit scores) or minimized (e.g., for distance scores).
  • Other mathematical optimization algorithms can also or alternatively be used, e.g., gradient-descent algorithms.
  • Other algorithms can be used, e.g., partitioning the image into regions and performing assignment or other testing on trackers and detected features in the same region.
  • both in-frame and out-frame trackers can be provided to the assignment algorithm (or other mathematical optimization technique).
  • the algorithm, e.g., executed on processor 116, can determine a mapping between the trackers and the detected features. The mapping may indicate that some trackers do not correspond to any detected feature. Those trackers can then, in response to such a mapping, be designated as out-frame trackers 208.
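  • As a rough sketch of how such an assignment step could be implemented (the cost construction, the gating threshold, and the use of SciPy are illustrative assumptions; the publication does not prescribe a particular library), the Hungarian algorithm can map trackers to detected features by minimizing a total histogram-distance score:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_trackers_to_detections(tracker_hists, detection_hists,
                                  hist_distance, max_cost=0.6):
    """Map trackers to detections by minimizing total histogram distance.

    Returns (matches, unmatched_tracker_indices); unmatched trackers can be
    designated out-frame trackers.
    """
    cost = np.array([[hist_distance(t, d) for d in detection_hists]
                     for t in tracker_hists])
    rows, cols = linear_sum_assignment(cost)      # Hungarian algorithm
    matches, unmatched = [], set(range(len(tracker_hists)))
    for r, c in zip(rows, cols):
        if cost[r, c] <= max_cost:                # reject implausible pairings
            matches.append((r, c))
            unmatched.discard(r)
    return matches, unmatched
```

A combined score, e.g., a weighted sum of histogram distance and (one minus) overlap ratio, could be substituted for the pure histogram distance in the cost matrix.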
  • FIG. 3 shows example histograms 300 of example test regions discussed above with reference to FIG. 2.
  • tracking component 122 can determine test region 218 based at least in part on, e.g., location information or other properties of tracker 204(1) in frame 202(1) or the detected feature represented by tracker 204(1) in frame 202(1).
  • Tracking component 122 can also determine test regions 220(1) and 220(2) based at least in part on, e.g., location information or other properties of trackers 204(1) and 204(2) in frame 202(3) or the detected features represented by trackers 204(1) and 204(2) in frame 202(3).
  • subject 206(1) is wearing a white (“Wh”) shirt and subject 206(2) is wearing a red (“R”) dress.
  • the illustrated respective histograms 302(1) and 302(2) of second test regions 220(1) and 220(2) show 5% R/95% Wh and 100% R/0% Wh, respectively.
  • Histogram 304 of test region 218 shows 0% R/100% Wh.
  • the example histograms are shown for purposes of explanation and without limitation.
  • a histogram can include equally sized hue bins centered on [0°, 15°, ..., 345°].
  • Red can be represented, in this example, as histogram peaks of hue near 0°, saturation near 100%, and value, e.g., 50% or higher.
  • White can be represented as pixels with histogram peaks of saturation near 0% and value near 100%.
  • the tracking component 122 can compute histogram distances and determine that, since the distance between the histogram 304 (corresponding to test region 218) and histogram 302(1) (corresponding to test region 220(1)) is smaller than the distance between histogram 304 and histogram 302(2) (corresponding to test region 220(2)), tracker 204(1) should be associated with a region above test region 220(1) rather than with a region above test region 220(2). That is, the feature being tracked by tracker 204(1) in frame 202(3) corresponds to an area near test region 220(1) rather than to an area near test region 220(2), as shown. The result, in the illustrated example, is that tracker 204(1) remains associated with subject 206(1) even when subject 206(1) is obscured (frame 202(2)) and then reappears (frame 202(3)).
  • the tracking component 122 can locate the feature at the candidate feature region (indicated by tracker 204(2)) in the frame 202(3) of the video by comparing a histogram 302(1) of the second test region 220(1) in the frame 202(3) of the video to histogram 304 of a corresponding test region 218 in a prior frame 202(1) of the video.
  • the tracking component 122 can locate the feature if histograms of the test regions correspond, or if histograms of the candidate regions correspond, or if histograms of both the test and candidate regions correspond, or if a majority of the histograms (candidate and any test region(s)) correspond, or if a selected number of histograms correspond, or any combination thereof. In some examples, the tracking component 122 can locate the feature if no histogram, or a minority of histograms, or fewer than a selected number of histograms fail to correspond or have distances exceeding a selected threshold, or any combination thereof, or any combination of any of the items in this paragraph.
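  • One of the combination rules described above can be sketched as a simple vote over histogram distances (the majority rule and the threshold value are only one of the listed possibilities, chosen for illustration):

```python
def feature_located(region_distances, threshold=0.5):
    """Decide whether the feature is located at the candidate region.

    `region_distances` holds the histogram distances of the candidate region
    and of any test region(s) against their prior-frame counterparts; here the
    feature is located if a majority of those regions correspond (distance
    within the threshold).
    """
    votes = [d <= threshold for d in region_distances]
    return sum(votes) > len(votes) / 2
```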
  • FIG. 4 is a dataflow diagram 400 illustrating example interactions between components illustrated in FIG. 1, and showing example modules of tracking component 122.
  • the modules of the tracking component 122 e.g., stored in memory 118, can include one or more modules, e.g., shell modules, or application programming interface (API) modules, which are illustrated as a tracking module 402 and an updating module 404.
  • image source 104 provides video frame(s) 106 to be processed, e.g., by detecting component 120 or tracking component 122.
  • the tracking component 122 can track features 114 in frames 106 of video, e.g., using tracking information or color histograms of candidate regions.
  • tracking component 122 is configured to update tracking results from a Kanade-Lucas-Tomasi (KLT) tracker based at least in part on histograms of image content of selected regions of frames 106, as described herein.
  • the detecting component 120 can be configured to locate feature(s) 114 in a first frame 106 of the video, or other frame(s) of the video. In some examples, the detecting component 120 can be configured to produce a detected feature- location map 406 based at least in part on the frame 106 of the video.
  • the detected feature-location map 406 can include, e.g., locations or bounding rectangles or other bounding areas of one or more detected features 114 in the frame 106. This can be done, e.g., using algorithms implemented in MICROSOFT FACESDK.
  • Such algorithms can include, but are not limited to, dynamic cascades, boost classifiers, boosting chain learning, neural -network classification, support vector machine (SVM) classification, or Bayesian classification.
  • An example algorithm is the Viola-Jones detector using adaptive boost (AdaBoost) learning of cascaded sub-detectors using rectangular Haar-like features.
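  • A minimal, hedged sketch of such a detection step, using OpenCV's bundled Viola-Jones Haar cascade as an assumed stand-in for the detection algorithms named above (e.g., MICROSOFT FACESDK); the cascade file and parameters are illustrative, not taken from this publication:

```python
import cv2

# Pretrained Haar cascade shipped with OpenCV; an assumption standing in for
# the cascaded sub-detectors described in the text.
_FACE_CASCADE = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_feature_regions(frame_bgr):
    """Return a detected feature-location map as a list of (x, y, w, h) boxes."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return list(_FACE_CASCADE.detectMultiScale(gray, scaleFactor=1.1,
                                               minNeighbors=5))
```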
  • the tracking module 402 can be configured to produce a candidate feature-location map 408 based at least in part on a subsequent frame 106 of a video and the detected feature-location map 406.
  • the feature can correspond to at least some image data of the frame of the video.
  • the tracking module 402 can produce the candidate feature-location map 408 further based on a previous frame 106 of the video, a corresponding previous feature-location map, e.g., prior-frame feature-location map 410 (e.g., provided by the detecting component 120 or the tracking module 402), or a combination thereof. This can be done, e.g., using a KLT or other tracking technique, e.g., the Scaled Fast Compressive Tracking (SFCT) or Struck tracker. For example, KLT can be applied to the previous frame 106 and the subsequent frame 106.
  • the candidate feature-location map 408 can include, e.g., zero, one, or more candidate areas for individual features located in the detected feature-location map 406.
  • the tracking module 402 can use a tracker, such as KLT, that is robust to scale change. This can permit tracking features 114 as subjects 110 approach or recede from the camera.
  • the detected feature-location map 406 corresponds to a first video frame and the candidate feature-location map 408 corresponds to a second video frame immediately following the first video frame.
  • the detected feature-location map 406 is a prior feature-location map corresponding to a prior video frame arranged in the video immediately before the frame of the video for which the candidate feature-location map 408 is determined. Examples of such configurations are discussed above with respect to frames 202(1) and 202(2), FIG. 2.
  • the tracking module 402 uses a KLT tracker.
  • the KLT tracker uses mathematical optimization to determine a region in a later frame corresponding to a region in an earlier frame. The sum of squared intensity differences between pixels of the region in the later frame and pixels of a selected region of an earlier frame is mathematically minimized. The region of the later frame is then the tracked equivalent of the region of the earlier frame.
  • the region of the earlier frame e.g., a first frame, can include image data of a feature 114, e.g., a face.
  • the region in the earlier frame and the corresponding region of the later frame can be the regions in the respective frames for a tracker 130 of the feature 114.
  • the region in the later frame can be an example of a candidate feature-location map 408.
  • KLT or another tracker can be computed based at least in part on a selected subset of the points in a region of the earlier frame.
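  • A minimal sketch of the kind of KLT tracking step described above, using OpenCV's pyramidal Lucas-Kanade implementation (the window size, pyramid depth, and the (x, y) point format are illustrative assumptions):

```python
import cv2
import numpy as np

def klt_track_points(prev_gray, next_gray, points):
    """Track points from an earlier grayscale frame to a later one."""
    pts = np.asarray(points, dtype=np.float32).reshape(-1, 1, 2)
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, pts, None, winSize=(21, 21), maxLevel=3)
    ok = status.reshape(-1) == 1          # keep only successfully tracked points
    return pts.reshape(-1, 2)[ok], next_pts.reshape(-1, 2)[ok]
```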
  • the tracking module 402 can be configured to track located features over subsequent second frames of the video. For example, in a sequence of frames 1, 2, 3, ..., the tracking module 402 can use KLT or another tracking technique to track features from frame 1 to frame 2, then from frame 2 to frame 3, and so on.
  • the tracking module 402 can be configured to determine that a histogram of a first-feature region of a particular one of the second frames does not correspond to a histogram of a first-feature region of the first frame.
  • the tracking module 402 can record an indication that a tracked feature corresponding to the first-feature region has moved out of the frame (e.g., out of at least one of the second frames). This is discussed above in more detail with reference to out-frame trackers 208. This can permit tracking features even when they are obscured, e.g., for a small number of frames.
  • the updating module 404 can be configured to determine candidate feature points based at least in part on the candidate feature-location map 408. For example, the updating module 404 can uniformly or non-uniformly select a predetermined number of points, e.g., a 10×10 grid, from a given region in the candidate feature-location map 408. In some examples, the updating module 404 can select a predetermined number of points of the given region, the selected points having relatively high-magnitude gradients within the region, e.g., the n points having the highest-magnitude gradients within the region, for integer n.
  • the updating module 404 can select points having local maxima of curvature, e.g., as measured by the determinant of the Hessian matrix or an approximation of the Hessian matrix using, e.g., Haar filters.
  • the updating module 404 can select points by decomposing and downsampling using, e.g., Haar wavelets, and selecting points having selected magnitudes in the decomposed images, e.g., within a certain percentage of the highest-magnitude value after wavelet transformation.
  • the updating module 404 can apply a tracker such as KLT, or use results from the tracking module 402, to track the selected points bidirectionally, e.g., from a first frame to a second frame, and then from the tracked point in the second frame back to the first frame.
  • the starting point and the point in the first frame after bidirectional tracking may not be the same point; the distance between them is referred to as the "forward-backward error" (FBE).
  • the updating module 404 can determine the FBE for individual ones of the selected points.
  • the updating module 404 can then determine a representative FBE value, e.g., the median or mean of at least some of the determined FBEs, or all of the determined FBEs, or a selected percentage of any of those. In some examples, the updating module 404 can then determine, as the candidate feature points, one or more of the points meeting selected criteria, e.g., having respective FBEs less than the determined representative FBE value. This can advantageously reduce noise in tracking due, e.g., to changes in shadows or lighting between one frame and the next.
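  • The forward-backward error filtering described above can be sketched as follows (a non-authoritative illustration; the median cutoff follows the description, while the OpenCV calls and point format are assumptions):

```python
import cv2
import numpy as np

def candidate_points_by_fbe(prev_gray, next_gray, points):
    """Bidirectionally track points and keep those with below-median FBE."""
    pts = np.asarray(points, dtype=np.float32).reshape(-1, 1, 2)
    fwd, st_f, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None)
    back, st_b, _ = cv2.calcOpticalFlowPyrLK(next_gray, prev_gray, fwd, None)
    # FBE: distance between the starting point and the point obtained after
    # tracking forward to the later frame and then back to the earlier frame.
    fbe = np.linalg.norm((pts - back).reshape(-1, 2), axis=1)
    ok = (st_f.reshape(-1) == 1) & (st_b.reshape(-1) == 1)
    keep = ok & (fbe < np.median(fbe[ok]))
    return pts.reshape(-1, 2)[keep], fwd.reshape(-1, 2)[keep]
```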
  • the updating module 404 can be configured to determine a candidate feature region based at least in part on the determined candidate feature points. In some examples, the updating module 404 can determine, for one or more of the candidate feature points, a point-displacement vector between the point in the earlier frame and the tracked point in the later frame. The updating module 404 can then determine a region-displacement vector as, e.g., a median or mean of one or more of the point-displacement vectors. In some examples, the updating module 404 can determine, for pairs of the candidate feature points, a pair-scale factor as the quotient of the distance between the points in the pair in the earlier frame and the tracked points in the pair in the later frame.
  • the updating module 404 can then determine a region-scale factor as, e.g., a median or mean of one or more of the pair-scale factors. In some examples, the updating module 404 can then apply the region-displacement vector and region-scale factor to a region of the prior-frame feature-location map 410 to determine a corresponding region in the candidate feature-location map 408.
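  • A sketch of updating a tracked region from the candidate feature points, using the median point displacement and the median pairwise scale factor described above (the (x, y, w, h) region format and the direction of the scale quotient are assumptions):

```python
import numpy as np
from itertools import combinations

def update_region(region, prev_pts, next_pts):
    """Apply the region-displacement vector and region-scale factor to a region."""
    prev_pts = np.asarray(prev_pts, dtype=float)
    next_pts = np.asarray(next_pts, dtype=float)
    dx, dy = np.median(next_pts - prev_pts, axis=0)        # region displacement
    ratios = [np.linalg.norm(next_pts[i] - next_pts[j]) /
              np.linalg.norm(prev_pts[i] - prev_pts[j])
              for i, j in combinations(range(len(prev_pts)), 2)]
    scale = float(np.median(ratios)) if ratios else 1.0    # region-scale factor
    x, y, w, h = region
    cx, cy = x + w / 2 + dx, y + h / 2 + dy                # move the center
    w, h = w * scale, h * scale                            # rescale about it
    return (cx - w / 2, cy - h / 2, w, h)
```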
  • the updating module 404 can be configured to determine a candidate feature region based at least in part on FBE values. For example, for any particular candidate feature region, if the median (or mean or other representative) FBE is greater than a selected threshold, the updating module 404 can, e.g., disregard that candidate feature region or consider a tracker 130 corresponding to that candidate feature region to be an out-frame tracker, as discussed above with reference to out-frame trackers 208, FIG. 2. This can permit detecting tracking failures, e.g., due to occlusion of a face feature by an umbrella, or of a license-plate feature by a building.
  • the updating module 404 can be configured to locate a feature 114 at (or at least partly in, on, within, over, or under, and likewise throughout) the candidate feature region in the frame of the video by comparing a histogram of a test region in the frame of the video to a histogram of a corresponding region in a prior frame of the video. This can be done, e.g., as discussed above with reference to histograms 214 and 216 shown in FIG. 2, and histograms 302(1), 302(2), and 304 shown in FIG. 3.
  • the updating module 404 can be configured to locate the feature 114 if a histogram of image data of a portion, defined by the candidate feature region, of a video frame 106 is within a selected distance of a previously-determined image-data histogram corresponding to the feature 114. Locating a feature based at least in part on histogram comparisons can improve the robustness of the tracker to drifting over a sequence of frames.
  • distances between histograms can be computed using various metrics, including but not limited to, χ² (chi-squared), Kolmogorov-Smirnov (KS), Cramer/von Mises (CvM), Quadratic Form (QF), earth-mover's (Wasserstein-Rubinstein-Mallows) distance, Kullback-Leibler (KL), symmetrized KL, Jeffrey divergence (JD), Euclidean distance in a feature space having as many dimensions as bins of the histogram, or Overton's histogram intersection.
  • the distance between two histograms H_1 and H_2 is computed as in Eq. (1), where L is the number of bins in the histograms and H̄_x is the mean of histogram x.
  • two histograms, e.g., a detected histogram and a tracked candidate histogram, are determined to correspond if d(H_1, H_2) ≤ T_H for a selected threshold T_H.
  • a histogram H_t is formed by computing respective histograms of individual color components, e.g., a hue histogram, a saturation histogram, and a value histogram. Each histogram can be computed using values of the corresponding color components of the image.
  • the histogram H_t can then be formed by concatenating the computed respective histograms. For example, given hue histogram [h_1, ..., h_n], saturation histogram [s_1, ..., s_n], and value histogram [v_1, ..., v_n], H_t can be formed as in Eq. (2): H_t = [h_1, ..., h_n, s_1, ..., s_n, v_1, ..., v_n].
  • hue, saturation, and value components can alternatively be interleaved (e.g., h_1, s_1, v_1, h_2, ...), or part of the histogram (e.g., a particular component or range) can be interleaved and part not interleaved, or any combination thereof.
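  • A sketch of Eq. (2) and one plausible reading of Eq. (1) follows (the bin count and the normalized-correlation form of the distance are assumptions consistent with the "number of bins" and "mean of histogram" wording, not a verbatim reproduction of the equations):

```python
import cv2
import numpy as np

def hsv_histogram(frame_bgr, region, bins=16):
    """Concatenated H, S, and V histograms, i.e. H_t = [h_1..h_n, s_1..s_n, v_1..v_n]."""
    x, y, w, h = region
    hsv = cv2.cvtColor(frame_bgr[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
    parts = [cv2.calcHist([hsv], [c], None, [bins],
                          [0, 180 if c == 0 else 256]) for c in range(3)]
    hist = np.concatenate([p.flatten() for p in parts])
    return hist / (hist.sum() + 1e-9)

def histogram_distance(h1, h2):
    """Correlation-style distance: 0 when identical, larger as histograms diverge."""
    a, b = h1 - h1.mean(), h2 - h2.mean()
    corr = float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
    return 1.0 - corr
```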
  • the tracking component 122 determines that the candidate feature region 212 is not associated with the detected region indicated by tracker 204(1). In some of these examples, in response to this determination or other histogram-match or histogram-distance criteria, the tracking component 122 selects, as a region corresponding to an out-frame tracker 208, the tracked candidate feature region 212.
  • the tracking component 122 can determine that a tracker 204 is an out-frame tracker 208 if, e.g., no candidate-region histogram in a current frame matches or is within a selected distance of a prior-frame histogram associated with that tracker 204.
  • the updating module 404 can be configured to determine the test region based at least in part on the candidate feature region.
  • the test region can be equal in size to, or larger or smaller than, the candidate feature region, and can have corner(s) or a centroid overlying that of the candidate feature region or spaced apart therefrom.
  • the test region can overlap with the candidate feature region or be separate therefrom.
  • the test region can be determined, e.g., based on a stored spatial relationship.
  • the test region is or includes the candidate feature region.
  • the feature is a face.
  • the test region has a different size or location than the candidate feature region.
  • two regions are considered to correspond based on color distances.
  • two regions can be considered to correspond if the respective predominant colors in those regions (e.g., the most frequently occurring colors or color bins) are within a specified distance in a color space, e.g., a CIELAB ΔE* of ≤ 1.0.
  • Other example color-difference metrics include the L2 (Euclidean) norm, hue-angle difference, per-component or multi-component differences (e.g., CIELAB ΔL*, ΔC*, Δh*, Δa*, or Δb*), Hamming distance between digital representations of one or more components in a color representation or color space, and CIE 1976 UCS Δu′v′.
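  • For reference, the CIE76 color difference is simply the Euclidean (L2) distance between two CIELAB triples; a minimal sketch:

```python
import numpy as np

def delta_e_cie76(lab1, lab2):
    """CIE76 ΔE*: Euclidean distance between two (L*, a*, b*) triples."""
    return float(np.linalg.norm(np.asarray(lab1, float) - np.asarray(lab2, float)))

# Two nearly identical colors differ by well under 1.0.
assert delta_e_cie76((50.0, 10.0, 10.0), (50.0, 10.4, 10.3)) < 1.0
```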
  • the updating module is further configured to determine the test region based at least in part on a stored spatial relationship between the candidate feature region and the test region.
  • the stored spatial relationship can be a proximal disjoint relationship specifying the test region is spaced apart from the candidate feature region, e.g., by a predetermined amount in a predetermined direction. This can permit locating the feature using nearby image data.
  • the stored spatial relationship can be represented in scale-invariant terms, e.g., percentage of the size of the feature region. This can permit tracking based at least in part on test regions even when the features are changing in size or scale, e.g., because a subject 110 is moving closer to the camera or farther from the camera, e.g., image sensor 112.
  • Using test region(s) having proximal disjoint relationship(s) with the candidate feature regions permits using image data near features 114 to distinguish and track those features 114. Further examples are discussed above with reference to test regions 220(1) and 220(2), FIG. 2.
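  • A sketch of deriving a test region from a candidate feature region via a stored, scale-invariant spatial relationship (the specific fractions below, which place a clothing-sized test region just below a face-sized candidate region, are illustrative assumptions):

```python
def test_region_from(candidate, rel=(0.0, 1.25, 1.0, 0.75)):
    """Derive a test region from a candidate feature region (x, y, w, h).

    `rel` = (dx, dy, rw, rh) is a stored spatial relationship expressed in
    scale-invariant terms, i.e., as fractions of the candidate's width/height,
    so the relationship holds as the feature grows or shrinks in the frame.
    """
    x, y, w, h = candidate
    dx, dy, rw, rh = rel
    return (x + dx * w, y + dy * h, rw * w, rh * h)
```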
  • histograms as described herein can be computed in one or more color spaces, such as those defined by the International Commission on Illumination (CIE), the National Television System Committee (NTSC), or the International Electrotechnical Commission (IEC).
  • Example color spaces that can be used include CIE L*a*b* (CIELAB), CIE L*u*v* (CIELUV, 1976), CIE YuV (the CIE 1976 uniform coordinate scale, UCS, plus luminance Y), YIQ (NTSC 1953), sRGB (IEC 61966-2-1:1999), linear RGB, or other color spaces.
  • the histograms can be computed or otherwise determined in a color space having at least hue and colorfulness dimensions.
  • Examples of such color spaces include, but are not limited to, Hue/Saturation/Value (HSV), Hue/Saturation/Lightness (HSL), Hue/Saturation/Intensity (HSI), and CIE L*C*h*.
  • the images can include respective color components corresponding to the dimensions of the color space.
  • the tracking component 122 is configured to compute a distance, e.g., in a selected color space, between the histogram of a first-feature region in one frame and a histogram of a first-feature region in an earlier frame.
  • the tracking component 122 can further be configured to determine those histograms do not correspond in response to the computed distance exceeding a selected threshold, e.g., ΔE* > 1.0.
  • a feature-tracking system can include a display device 126.
  • reporting component 124 can be configured to provide a visual representation of the located feature in the frame of the video for display.
  • reporting component 124 can be configured to provide the visual representation via the display device 126. Examples are discussed above with reference to representations 128 in FIG. 1.
  • the reporting component 124 can be configured to present for display frames of the video and visual representations of at least some of the detected and/or tracked features.
  • the visual representations can include at least one obscurant.
  • Example obscurants can include, e.g., a darkened (e.g., black) area, a lightened (e.g., white) area, an area in which image data is spatially distorted, or an area in which image data is blurred.
  • Using obscurants can reduce the likelihood of visually identifying the feature when viewing a representation 128, e.g., to protect privacy of people walking by a traffic camera.
  • the visual representations can include at least one highlight.
  • Example highlights can include, e.g., an outline of the feature or of an area including the feature (e.g., trackers 130 as graphically represented in FIG. 1), a lightened or darkened background underlying or overlaying the detected feature (e.g., a semi-transparent colored highlight overlaid on image data of the feature), or an arrow or other shape pointing at or otherwise indicating the feature.
  • a representation 128 can include any combination of obscurants or highlights for respective tracked features.
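  • A minimal sketch of rendering an obscurant or a highlight for a tracked region (the blur kernel size and outline color are arbitrary choices for illustration):

```python
import cv2

def draw_obscurant(frame_bgr, region, ksize=31):
    """Blur the image data inside a tracked region (x, y, w, h) in place."""
    x, y, w, h = [int(v) for v in region]
    roi = frame_bgr[y:y + h, x:x + w]
    frame_bgr[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (ksize, ksize), 0)

def draw_highlight(frame_bgr, region, color=(0, 255, 0)):
    """Outline a tracked region as a highlight."""
    x, y, w, h = [int(v) for v in region]
    cv2.rectangle(frame_bgr, (x, y), (x + w, y + h), color, 2)
```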
  • the reporting component 124 can be configured to provide an indication of at least some of the detected and/or tracked features.
  • the reporting component 124 can be configured to present representations 128 of features 114 detected by the detecting component 120, features 114 tracked by the tracking module 402 or the updating module 404, or any combination thereof.
  • detection highlights and tracking highlights can be presented in different colors or line styles to permit readily distinguishing them in a representation 128.
  • FIGs. 5-7 illustrate example processes for tracking features in video frames.
  • the methods are illustrated as sets of operations shown as discrete blocks.
  • the methods can be implemented in any suitable hardware, software, firmware, or combination thereof.
  • functions shown in FIGs. 5-7 can be implemented on or otherwise embodied in one or more computing devices 102 or image sources 104, e.g., using software running on such devices.
  • the operations represent computer-executable instructions that, when executed by one or more processors, cause one or more processors to perform the recited operations.
  • the operations represent logic functions implemented in circuitry, e.g., datapath-control and finite-state-machine sequencing functions.
  • FIG. 5 illustrates an example process 500 for tracking features in video frames.
  • one or more features are detected in a first frame.
  • detecting component 120 detects one or more features and corresponding detected first-frame feature regions within a first frame of a video.
  • the video can include the first frame and a later second frame.
  • the video frames can be received and processed in any order. Processing steps described below can be interleaved or executed in any order unless otherwise specified.
  • detection (block 502) can be performed for the first frame after both the first and second frames have been received.
  • one or more of the detected features are tracked in the second frame.
  • tracking module 402 determines corresponding tracked candidate second-frame feature regions in the second frame, based at least in part on the first frame, the second frame, and the detected first-frame feature regions.
  • block 504 includes comparing at least some image data of the second frame with at least some image data of the first frame.
  • tracking module 402 can use a tracking algorithm such as KLT to determine a candidate second-frame feature region by finding a mathematical best fit between the image data of the first frame in a first-frame feature region and image data of the second frame.
  • block 504 includes comparing at least some image data of the second frame with at least some image data of the first frame and at least some image data of a frame subsequent to the second frame.
  • Bidirectional ("bidi") tracking can permit more accurately tracking features across sets of three or more frames, e.g., for analysis of recorded video or other applications in which multiple frames can be received before features are identified in the frames.
  • image source 104 or computing device 102 can rearrange frames, e.g., in a sequence such as a sequence of I-, P-, and B-pictures specified in one or more MPEG video standards, e.g., MPEG-2, MPEG-4 part 2, or MPEG-4 part 10. This can permit processing frames having timestamps later in the video before frames having timestamps earlier in the video, e.g., for bidi tracking.
  • histograms of the tracked candidate second-frame feature regions are compared with histograms of the detected first-frame feature regions. For example, updating module 404 determines the histograms in a color space having at least hue and colorfulness dimensions. In some examples, updating module 404 compares the histograms by computing distances between histograms of the tracked candidate second-frame feature regions and histograms of the detected first-frame feature regions.
  • histograms are computed for test regions instead of or in addition to tracked candidate second-frame feature regions or detected first-frame feature regions. This can be done, e.g., as discussed above with reference to histograms 302(1), 302(2), and 304 shown in FIG. 3, which can be used independently of, or in conjunction with, histograms 214 and 216 shown in FIG. 2.
  • the tracked candidate second-frame feature regions for which the comparison indicates at least a threshold degree of similarity can be selected as tracked second-frame feature regions.
  • updating module 404 removes a candidate second-frame feature region from consideration as a tracked second-frame feature region if a distance between the histogram of that candidate region and the histogram of the detected region exceeds a selected threshold.
  • FIG. 6 illustrates an example process 600 for tracking and detecting features in frames of a video, e.g., using a computing device such as computing device 102.
  • Blocks 502-508 of FIG. 6 can be performed, e.g., as described above with reference to FIG. 5.
  • Block 508 can be followed by block 602.
  • the frames of video include a third frame later than the second frame.
  • the frames of video include a fourth frame later than the third frame.
  • the labels "first," "second," "third," and "fourth" are used for clarity of exposition and are not limiting except as expressly noted.
  • operations described herein with reference to the first, third, and fourth frames can be used without a second frame. Such examples can include, e.g., detecting features in the first frame, e.g., frame 202(1) of FIG. 2, and determining that one or more features are not visible in the second frame, e.g., frame 202(2) of FIG. 2.
  • tracking component 122 determines one or more tracked candidate third-frame feature regions of the third frame based at least in part on the second frame, the third frame, and one or more tracked second-frame feature regions including the tracked second-frame feature region.
  • the comparing is repeated with respect to the third frame.
  • tracking component 122 compares histograms between candidate regions in the third frame and detected regions in the first frame. In some examples, tracking component 122 compares histograms between candidate regions in the third frame and tracked regions in the second frame.
  • the selecting (block 508) is repeated with respect to the third frame to determine one or more tracked candidate third-frame feature regions.
  • the operations in blocks 602-606 can be performed as described above with reference to FIGS. 2-5, e.g., with reference to frame 202(2) or 202(3), FIG. 2 or with reference to FIG. 3.
  • some trackers 204 may not be selected and may be flagged as out-frame trackers 208, and likewise throughout.
  • the detecting (block 502) is repeated with respect to the third frame to determine one or more detected third-frame feature regions. This can be done, e.g., as discussed above with reference to tracker 204(1) in frame 202(3) of FIG. 2.
  • block 610 includes computing respective overlap ratios for pair-wise combinations of the candidate third-frame feature regions and the detected third-frame feature regions.
  • the overlap ratio R(R T , R D ) of the region R T of a tracked feature and the region R D of a detected feature can be computed as in Eq. (3):
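  • Eq. (3) itself is not reproduced here; a common choice for such an overlap ratio is intersection-over-union, sketched below as an assumption rather than the publication's exact formula:

```python
def overlap_ratio(r_t, r_d):
    """Intersection-over-union of a tracked box R_T and a detected box R_D,
    each given as (x, y, w, h). (One common overlap ratio; Eq. (3) may differ.)"""
    ax, ay, aw, ah = r_t
    bx, by, bw, bh = r_d
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```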
  • the tracked candidate third-frame feature regions not associated with a detected third-frame feature region are selected as out-frame regions. This can be done, e.g., as described above with reference to frame 202(2) of FIG. 2. In some examples, block 612 can be followed by block 614 and/or block 620.
  • the tracking (block 504) is repeated with respect to the fourth frame.
  • the selecting (block 508) is repeated with respect to the fourth frame to determine one or more tracked candidate fourth-frame feature regions.
  • the operations in blocks 614-618 can be performed as described above with reference to FIGS. 2-5, e.g., with reference to frame 202(3) shown in FIG. 2.
  • the detecting (block 502) is repeated with respect to the fourth frame to determine one or more detected fourth-frame feature regions. This can be done, e.g., as discussed above with reference to frames 202(3) or 202(4) of FIG. 2.
  • At block 622, at least some of the tracked candidate fourth-frame feature regions can be associated with corresponding out-frame regions. This can be done, e.g., as described above with reference to tracker 204(1) in frame 202(3), shown in FIG. 2. As noted above, this can permit tracking features even when they are obscured for, e.g., fewer than a selected number of frames. In some examples, at least some of the tracked candidate fourth-frame feature regions can be associated with corresponding detected fourth-frame feature regions.
  • FIG. 7 illustrates an example process 700 for tracking features in frames of a video, e.g., across occlusion events or other reductions in visibility of a feature.
  • Blocks 502 and 504 of FIG. 7 can be performed as discussed above with reference to FIG. 5.
  • Blocks 502 and 504, and other blocks shown in FIG. 7, can be performed for one or more frames of video.
  • At decision block 702, it is determined whether any trackers have moved out-frame, e.g., by failure to associate with a detected feature. This can be done, e.g., as discussed above with reference to block 610 of FIG. 6, or as discussed above with reference to histograms 214 and 216 shown in FIG. 2.
  • If it is determined that there are out-frame trackers (the "Yes" branch from decision block 702), then at block 704, information of the out-frame trackers is recorded. This can be done, e.g., as discussed above with reference to frame 202(2) of FIG. 2. In some examples, block 704 can be followed by block 708, by block 714, or by blocks 708 and 714 in that order, e.g., directly or with intervening blocks such as decision block 710.
  • If it is determined that there are no out-frame trackers (the "No" branch from decision block 702), then at decision block 706, it is determined whether a detection interval has expired. For example, tracking component 122 counts frames of video processed since the most recent frame for which detection was performed, e.g., as discussed with reference to block 502 or block 708. Every frame or every n frames, tracking component 122 compares the count to a stored threshold TG to determine whether the detection interval has expired. Tracking component 122 resets the counter to 0, in this example, in block 502 and in block 708.
  • Example detection intervals can include, but are not limited to, a selected amount of wall-clock time, a selected number of frames in which tracking has been performed, or a selected number of frames of video received. In some examples, the detection interval is 30 frames, or as many frames as constitute one second of the video. In some examples, tracking (e.g., KLT) is performed every frame and detection (e.g., FACESDK) is performed only every n frames, for detection interval n.
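A minimal sketch of this interleaving is shown below. It assumes hypothetical detect_features and track_frame callables standing in for the detector (e.g., a FACESDK-style face detector) and the per-frame tracker (e.g., KLT), and it omits the tracker/detection association steps described elsewhere in this document.

```python
DETECTION_INTERVAL = 30          # frames between full detections (example value)

def process_video(frames, detect_features, track_frame):
    """Run tracking every frame and detection only every DETECTION_INTERVAL frames.

    `detect_features` and `track_frame` are placeholders for a face detector
    and a per-frame (e.g., KLT-style) tracker.
    """
    frames_since_detection = 0
    trackers = []
    for frame in frames:
        if not trackers or frames_since_detection >= DETECTION_INTERVAL:
            trackers = detect_features(frame)      # detection resets the interval counter
            frames_since_detection = 0
        else:
            trackers = track_frame(frame, trackers)
            frames_since_detection += 1
        yield frame, trackers
```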
  • At block 708, features can be detected in the video. This can be done, e.g., as discussed above with reference to detecting component 120 and block 502. For example, as discussed above with reference to frames 202(3) and 202(4), previously-obscured features can reappear in a frame or new features can appear in a frame.
  • At decision block 710, it is determined whether any features were detected at block 708. This can be done as discussed above with reference to detecting component 120.
  • At decision block 712, tracking component 122 determines whether there are any in-frame trackers, e.g., tracker 204(2) of FIG. 2. If there are in-frame trackers (the "Yes" branch from decision block 712), processing continues as described above with reference to block 504. On the other hand, if there are no active in-frame trackers (the "No" branch from decision block 712), then processing continues as described above with reference to block 708. Accordingly, detection is repeated across multiple video frames until there are features in the video frames to be tracked.
  • one or more in-frame or out-frame trackers are updated, e.g., to be associated with the detected features. This can be done, e.g., as described above with reference to blocks 610 or 622 of FIG. 6.
  • tracking component 122 can delete expired trackers.
  • tracking component 122 discards out-frame trackers 208 that have not been updated for a predetermined interval, such as a number of frames, an elapsed time, or another interval described above, e.g., 10 frames, 30 frames, or 1 s. Tracking component 122 in these examples removes trackers for features that have not reappeared, e.g., faces of people who have left the field of view of the sensor 112.
  • a tracker may drift over several frames so that it tracks a static portion of the background (e.g., a wall) instead of the feature.
  • tracking component 122 discards in-frame trackers that have not been updated for a selected number TA of frames, e.g., 50 frames or 100 frames. Tracking component 122 in these examples removes trackers that, for example, have drifted to the background.
  • Tracking component 122 can determine that an in-frame tracker has not been updated if the tracker has not moved more than a selected distance over the course of the TA frames, or if the tracker has not been resized by more than a selected absolute percentage area change (e.g., ⁇ 5%) over the course of the TA frames.
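The following sketch illustrates one way such a staleness test could be written; the function name, the pixel-distance threshold, and the use of a per-tracker region history are assumptions for illustration only.

```python
import math

def tracker_is_stale(history, ta=100, min_move=5.0, max_area_change=0.05):
    """Return True if a tracker has effectively not updated over the last `ta` frames.

    `history` is a list of (x, y, w, h) regions, one per frame, newest last.
    The thresholds (5 px of movement, 5% area change) are illustrative values.
    """
    if len(history) < ta:
        return False
    old, new = history[-ta], history[-1]
    dx = (new[0] + new[2] / 2) - (old[0] + old[2] / 2)
    dy = (new[1] + new[3] / 2) - (old[1] + old[3] / 2)
    moved = math.hypot(dx, dy) > min_move
    old_area, new_area = old[2] * old[3], new[2] * new[3]
    resized = abs(new_area - old_area) / old_area > max_area_change
    return not (moved or resized)
```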
  • FIG. 8 illustrates select components of an example computing device 102.
  • computing device 102 includes one or more processor(s) 116, a memory 118, input/output (I/O) interfaces 802, and/or communications interface(s) 804.
  • Memory 118 can be implemented as any combination of various types of memory components, e.g., computer-readable media or computer storage media components. Examples of possible memory components include a random access memory (RAM), a disk drive, a mass storage component, and a non-volatile memory (e.g., ROM, Flash, EPROM, EEPROM, etc.). Alternative implementations of computing device 102 can include a range of processing and memory capabilities.
  • full-resource computing devices can be implemented with substantial memory and processing resources, including a disk drive to store content for replay by the viewer.
  • Low-resource computing devices can have limited processing and memory capabilities, such as a limited amount of RAM, no disk drive, and limited processing capabilities.
  • Processor(s) 116 process various instructions to control the operation of computing device 102 and to communicate with other electronic and computing devices.
  • the processor(s) 116 can be configured to execute modules of a plurality of modules, discussed below, on the memory 118.
  • the computer-executable instructions stored on the memory 118 can, upon execution, configure a computer such as a computing device 102 (e.g., a computing device 102 or image source 104) to perform operations described herein with reference to, e.g., detecting component 120, tracking component 122, reporting component 124, or modules of any of those.
  • the modules stored in the memory 118 can include instructions that, when executed by the one or more processor(s) 116, cause the one or more processor(s) 116 to perform operations described herein.
  • the memory 118 stores various information and/or data, including, for example, a detecting component 120, a tracking component 122 (including, e.g., tracking module 402 and updating module 404, FIG. 4), a reporting component 124, and optionally an operating system 806 and/or one or more other applications 808.
  • Functionality described as associated with the illustrated components or modules can be combined to be performed by a smaller number of components or modules, or can be split and performed by a larger number of components or modules.
  • the other applications 808 can include, for example, an Internet browser that includes video capabilities, a media player application, a video editing application, a video streaming application, a television viewing application, and so on.
  • computer-executable instructions of detecting component 120, tracking component 122, reporting component 124, and applications 808 stored in one or more computer-readable media when executed on processor 116 of computing device 102, direct computing device 102 to perform functions listed herein with respect to the relevant components in memory 118.
  • memory 118 includes a data store 810.
  • data store 810 can include a first memory for storing one or more video frame(s) 812, e.g., video frames 106 of FIG. 1.
  • Individual video frames can include, e.g., data of a plurality of pixels.
  • the data of individual pixels can include respective values for individual color channels (planes), e.g., red (R), green (G), and blue (B) color channels; luma (Y), blue chroma (Cb), and red chroma (Cr) color channels; or other color organizations.
  • data store 810 can store one or more feature-location map(s) 814, e.g., holding the locations of detected features (e.g., detected feature-location map 406 of FIG. 4), prior-frame features (e.g., prior-frame feature-location map 410 of FIG. 4), or candidate features (e.g., candidate feature-location map 408 of FIG. 4).
  • data store 810 can store parameters such as TH, described above with reference to Eq. (1), or other parameter(s) described herein, e.g., with reference to the illustrative results given below.
  • Communication interface(s) 804 enable computing device 102 to communicate with other computing devices, and can represent other means by which computing device 102 can receive video content.
  • communication interface(s) 804 can represent connections via which computing device 102 can receive video content, e.g., via a particular universal resource locator (URL).
  • the communications interface 804 can include, but is not limited to, a transceiver for Ethernet, cellular (3G, 4G, or other), WI-FI, ultra-wideband (UWB), BLUETOOTH, satellite, or other wireless transmissions.
  • the communications interface 804 can include a wired I/O interface, such as an Ethernet interface, a serial interface, a Universal Serial Bus (USB) interface, an INFINIBAND interface, or other wired interfaces.
  • Video or frames 106 thereof can additionally or alternatively be received via I/O interface 802.
  • I/O interface 802 can include or be communicatively connected with, for example, one or more tuners, video-capture devices, video encoders, or format converters, enabling computing device 102 to receive and store video.
  • I/O interface 802 can include or be communicatively connected with one or more sensor(s) 112, as described above with reference to FIG. 1.
  • Sensor(s) 112 can be configured to capture video frames 106, e.g., of a scene 108, as discussed above.
  • I/O interface 802 can additionally or alternatively include, or be communicatively connected with, for example, a display device 126, enabling computing device 102 to present video content.
  • I/O interface 802 provides signals to a television or other display device that displays the video data, e.g., as discussed above with reference to FIG. 1.
  • I/O interface 802 can additionally or alternatively include, or be communicatively connected with, for example, a user-operable input device 816 (graphically represented as a gamepad), enabling a user to, e.g., direct computing device 102 to track specific features.
  • the user-operable input device 816 can also be usable to control playback of video, e.g., via fast-forward, rewind, play, pause, and stop functions.
  • Reporting component 124 can adjust display of the video on display device 126 based at least in part on inputs received via the user-operable input device 816. This can permit use of the system, e.g., in a real-time security or monitoring context.
  • detecting component 120 when executed by processor(s) 116 (and likewise throughout), produces a feature-location map based at least in part on a first frame of the video. This can be as described above with reference to FIG. 4, e.g., using FACESDK algorithms such as those described above.
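As an illustrative stand-in for the detector (the FACESDK algorithms themselves are not shown here), the sketch below produces a simple feature-location map as a list of face regions using OpenCV's bundled Haar cascade; any comparable face detector could be substituted.

```python
import cv2

def detect_feature_regions(frame_bgr):
    """Produce a simple feature-location map: a list of (x, y, w, h) face regions.

    Uses OpenCV's bundled Haar cascade as a stand-in for the detector
    referenced in the text; any face detector could be substituted.
    """
    cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    detector = cv2.CascadeClassifier(cascade_path)
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [tuple(face) for face in faces]
```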
  • tracking component 122 produces a candidate feature- location map based at least in part on a frame of a video and a prior feature-location map, e.g., from detecting component 120.
  • the feature corresponds to at least some image data of the frame of the video. This can be as described above with reference to tracking module 402 of FIG. 4.
  • the tracking component 122 determines candidate feature points based at least in part on the candidate feature-location map, determines a candidate feature region based at least in part on the determined candidate feature points, and locates a feature at the candidate feature region in the frame of the video by comparing a histogram of a test region in the frame of the video to a histogram of a corresponding region in a prior frame of the video, wherein the test region is determined based at least in part on the candidate feature region.
  • This can be as described above with reference to updating module 404 of FIG. 4.
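A rough sketch of this tracking-then-region step is shown below, using OpenCV's pyramidal Lucas-Kanade optical flow as a KLT-style point tracker; seeding points inside the prior region and taking the bounding box of the tracked points are illustrative choices, not necessarily those of the disclosed tracking and updating modules.

```python
import cv2
import numpy as np

def track_candidate_region(prev_gray, curr_gray, prev_region):
    """Track a feature region from the prior frame into the current frame (KLT-style).

    Points are seeded inside the prior feature region, propagated with
    pyramidal Lucas-Kanade optical flow, and the candidate feature region is
    taken as the bounding box of the successfully tracked points.
    """
    x, y, w, h = prev_region
    mask = np.zeros_like(prev_gray)
    mask[y:y + h, x:x + w] = 255
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=50,
                                  qualityLevel=0.01, minDistance=3, mask=mask)
    if pts is None:
        return None
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    good = new_pts[status.ravel() == 1].reshape(-1, 2)
    if len(good) == 0:
        return None
    x0, y0 = good.min(axis=0)
    x1, y1 = good.max(axis=0)
    return int(x0), int(y0), int(x1 - x0), int(y1 - y0)
```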
  • reporting component 124 provides a visual representation of the located feature in the frame of the video, e.g., via a graphical user interface. This can be as described above with reference to FIGS. 1 and 4.
  • the computing device 102 can include tracking component 122 but not include sensor 112.
  • the computing device 102 representing image source 104 can include sensor 112 but not implement tracking component 122.
  • the computing device 102 can include sensor 112 and implement tracking component 122.
  • a feature-tracking service system such as computing device 102 or an image-capturing system such as image source 104 can implement detecting component 120 or reporting component 124.
  • a processor 116 can represent a hybrid device, such as a device from ALTERA or XILINX that includes a CPU core embedded in an FPGA fabric.
  • Example hardware logic components that processor 116 can include are Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), System-on-a-Chip systems (SoCs), Complex Programmable Logic Devices (CPLDs), and Digital Signal Processors (DSPs).
  • processor 116 can be or include one or more single-core processors, multi-core processors, central processing units (CPUs), graphics processing units (GPUs), general-purpose GPUs (GPGPUs), or hardware logic components configured, e.g., via specialized programming from modules or APIs, to perform functions described herein.
  • a system bus 818 typically connects the various components within computing device 102.
  • a system bus 818 can be implemented as one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, or a local bus using any of a variety of bus architectures.
  • bus architectures can include an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnects (PCI) bus, e.g., a Mezzanine bus.
  • Any of the components illustrated in FIG. 8 can be in hardware, software, or a combination of hardware and software. Further, any of the components illustrated in FIG. 8, e.g., memory 118, can be implemented using any form of computer-readable media that is accessible by computing device 102, either locally or remotely, including over a network 132.
  • Computer-readable media includes two types of computer-readable media, namely computer storage media and communications media.
  • Computer storage media includes tangible storage units such as volatile memory, nonvolatile memory, and/or other persistent and/or auxiliary computer storage media, removable and non-removable computer storage media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data.
  • Computer storage media includes tangible or physical forms of media included in a device or hardware component that is part of a device or external to a device, including, but not limited to, random-access memory (RAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), phase change memory (PRAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory, compact disc read-only memory (CD-ROM), digital versatile disks (DVDs), optical cards or other optical storage media, magnetic cassettes, magnetic tape, magnetic disk storage, magnetic cards or other magnetic storage devices or media, solid-state memory devices, storage arrays, network attached storage, storage area networks, hosted computer storage or memories, storage devices, and/or storage media that can be used to store and maintain information for access by a computing device 102 or image source 104.
  • communication media can embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, or other transmission mechanism.
  • computer storage media does not include communication media.
  • memory 118 can be or include computer storage media.
  • SFCT and Struck trackers were tested on two different video sequences, as was a tracker according to various aspects herein (example). Parameters for the example tracker were determined empirically, and included detection at least every 50 frames, discarding of out-frame trackers that have been out-of-frame for 55 frames, discarding of in-frame trackers that have not been updated for 100 frames, a minimum overlap ratio (Eq. (3)) of 0.3, a threshold median FBE of 10, beyond which a tracker is considered to be out-frame, and a threshold histogram distance of 0.5, beyond which a tracker is considered to be out-frame.
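For illustration only, these empirically chosen values could be collected in a configuration object such as the following; the field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TrackerConfig:
    """Example parameter set matching the values reported for the tested configuration."""
    detection_interval: int = 50         # detect at least every 50 frames
    out_frame_expiry: int = 55           # discard out-frame trackers after 55 frames out of frame
    in_frame_stale_expiry: int = 100     # discard in-frame trackers not updated for 100 frames
    min_overlap_ratio: float = 0.3       # Eq. (3) threshold for tracker/detection association
    max_median_fbe: float = 10.0         # median FBE beyond which a tracker is considered out-frame
    max_histogram_distance: float = 0.5  # histogram distance beyond which a tracker is out-frame
```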
  • the tested example configuration provided comparable or improved success rates compared to SFCT and Struck for both sequences, and provided improved precision rates compared to SFCT and Struck for location error thresholds of deviation from ground truth above about 20.
  • the tested example configuration also provided faster results than SFCT or Struck, as shown in Table 1.
  • Table 1 shows per-frame processing times in milliseconds. As shown, the tested example configuration was able to process over 100 fps on a computing system including an INTEL CORE i7 3.40 GHz CPU.
  • a system comprising: a processing unit; a computer-readable medium (CRM) operably coupled to the processor; a tracking module, stored in the CRM and executable by the processing unit to produce a candidate feature-location map based at least in part on a frame of a video and a prior feature-location map, wherein the candidate feature-location map indicates candidate locations of one or more features and individual ones of the features correspond to at least some image data of the frame of the video; and an updating module, stored in the CRM and executable by the processing unit to: determine a candidate feature region based at least in part on the candidate feature-location map; and locate a feature at the candidate feature region in the frame of the video by comparing a histogram of a test region in the frame of the video to a histogram of a corresponding region in a prior frame of the video, wherein the test region is determined based at least in part on the candidate feature region.
  • [0124] B A system as paragraph A recites, further comprising a detecting component, stored in the CRM and executable by the processing unit to produce the prior feature-location map based at least in part on the prior frame of the video.
  • [0125] C A system as paragraph A or B recites, wherein the prior feature-location map corresponds to the prior frame of the video and the prior frame of the video is arranged in the video immediately before the frame of the video.
  • test region includes the candidate feature region and the feature is a face.
  • E A system as any of paragraphs A-D recites, wherein the updating module is further executable by the processing unit to: determine a second test region based at least in part on the candidate feature region or the test region; and locate the feature at the candidate feature region in the frame of the video by comparing a histogram of the second test region in the frame of the video to a histogram of a corresponding region in the prior frame of the video.
  • G A system as paragraph F recites, wherein the spatial relationship is a proximal disjoint relationship.
  • H A system as any of paragraphs A-G recites, further comprising a display device and a reporting component, stored in the CRM and executable by the processing unit to provide a visual representation of the located feature in the frame of the video via the display device.
  • J A method, comprising: detecting a feature in a first frame of a plurality of frames of video, the plurality of frames of video including the first frame and a later second frame; determining a detected first-frame feature region of the first frame corresponding to the detected feature; determining a tracked candidate second-frame feature region of the second frame based at least in part on the first frame, the second frame, and the detected first-frame feature region; comparing a histogram of the tracked candidate second-frame feature region with a histogram of the detected first-frame feature region; and selecting as a tracked second-frame feature region the tracked candidate second-frame feature region in response to the comparison indicating at least a threshold degree of similarity.
  • K A method as paragraph J recites, wherein the comparing includes computing a distance between the histogram of the tracked candidate second-frame feature region and the histogram of the detected first-frame feature region.
  • L A method as paragraph K recites, further comprising determining the histogram in a color space having at least hue and colorfulness dimensions.
  • M A method as any of paragraphs J-L recites, wherein determining the tracked candidate second-frame feature region comprises applying a Kanade-Lucas- Tomasi tracker to the first frame and the second frame.
  • N A method as any of paragraphs J-M recites, wherein the plurality of frames of video includes a third frame later than the second frame and wherein the method further comprises: determining one or more tracked candidate third-frame feature regions of the third frame based at least in part on the second frame, the third frame, and one or more tracked second-frame feature regions including the tracked second-frame feature region; detecting one or more features in the third frame; determining respective detected third-frame feature regions of the third frame corresponding to the detected features in the third frame; associating at least some of the tracked candidate third-frame feature regions with ones of the detected third-frame feature regions; and selecting as out-frame regions ones of the tracked candidate third-frame feature regions not associated with ones of the detected third-frame feature regions.
  • a method as paragraph N recites, wherein the plurality of frames of video includes a fourth frame later than the third frame and wherein the method further comprises: repeating the detecting with respect to the fourth frame to determine a detected fourth-frame feature region; and associating one of the out-frame regions with the detected fourth-frame feature region.
  • determining the tracked candidate second-frame feature regions comprises comparing at least some image data of the second frame with at least some image data of the first frame and at least some image data of a frame subsequent to the second frame.
  • R A system, comprising: means for detecting a feature in a first frame of a plurality of frames of video, the plurality of frames of video including the first frame and a later second frame; means for determining a detected first-frame feature region of the first frame corresponding to the detected feature; means for determining a tracked candidate second-frame feature region of the second frame based at least in part on the first frame, the second frame, and the detected first-frame feature region; means for comparing a histogram of the tracked candidate second-frame feature region with a histogram of the detected first-frame feature region; and means for selecting as a tracked second-frame feature region the tracked candidate second-frame feature region in response to the comparison indicating at least a threshold degree of similarity.
  • S A system as paragraph R recites, wherein the means for comparing includes means for computing a distance between the histogram of the tracked candidate second-frame feature region and the histogram of the detected first-frame feature region.
  • T A system as paragraph S recites, further comprising means for determining the histogram in a color space having at least hue and colorfulness dimensions.
  • V A system as any of paragraphs R-U recites, wherein the plurality of frames of video includes a third frame later than the second frame and wherein the system further comprises: means for determining one or more tracked candidate third-frame feature regions of the third frame based at least in part on the second frame, the third frame, and one or more tracked second-frame feature regions including the tracked second-frame feature region; means for detecting one or more features in the third frame; means for determining respective detected third-frame feature regions of the third frame corresponding to the detected features in the third frame; means for associating at least some of the tracked candidate third-frame feature regions with ones of the detected third-frame feature regions; and means for selecting as out-frame regions ones of the tracked candidate third-frame feature regions not associated with ones of the detected third-frame feature regions.
  • W A system as paragraph V recites, wherein the plurality of frames of video includes a fourth frame later than the third frame and wherein the system further comprises: means for repeating the detecting with respect to the fourth frame to determine a detected fourth-frame feature region; and means for associating one of the out-frame regions with the detected fourth-frame feature region.
  • X A system as any of paragraphs R-W recites, wherein the means for determining the tracked candidate second-frame feature regions comprises means for comparing at least some image data of the second frame with at least some image data of the first frame and at least some image data of a frame subsequent to the second frame.
  • a system comprising: a processing unit; a computer-readable medium (CRM) operably coupled to the processor; a detecting component, stored in the CRM and executable by the processing unit to locate features in a first frame of a video; a tracking component, stored in the CRM and executable by the processing unit to track ones of the located features over a subsequent plurality of second frames of the video, wherein the tracking component is configured to record an indication that a first feature of the tracked ones of the located features has moved out of a particular frame of the plurality of second frames if a histogram of a first-feature region of the particular frame does not correspond to a histogram of a first-feature region of the first frame; and a reporting component, stored in the CRM and executable by the processing unit to provide an indication of at least some of the tracked ones of the located features.
  • AA A system as paragraph Z recites, wherein the reporting component is further executable by the processing unit to present for display one or more frames of the video and visual representations of the at least some of the tracked ones of the located features.
  • AC A system as any of paragraphs Z-AB recites, wherein the tracking component is further executable by the processing unit to: compute a distance between the histogram of a first-feature region of the particular frame and the histogram of the first-feature region of the first frame; and determine that the histogram of the first-feature region of the particular frame and the histogram of the first-feature region of the first frame do not correspond to each other in response to the computed distance exceeding a selected threshold.
  • AD A system as any of paragraphs Z-AC recites, further comprising an image sensor configured to provide one or more of the frames of the video.
  • AE A system as any of paragraphs Z-AD recites, wherein the located features include one or more faces or identification signs.
  • AF A computer-readable medium, e.g., a computer storage medium, having thereon computer-executable instructions, the computer-executable instructions upon execution configuring a computer to perform operations as any of paragraphs J-Q recites.
  • a device comprising: a processor; and a computer-readable medium, e.g., a computer storage medium, having thereon computer-executable instructions, the computer-executable instructions upon execution by the processor configuring the device to perform operations as any of paragraphs J-Q recites.
  • AH A system comprising: means for processing; and means for storing having thereon computer-executable instructions, the computer-executable instructions including means to configure the device to carry out a method as any of paragraphs J-Q recites.
  • Video analysis techniques described herein can provide feature tracking, e.g., face tracking, using reduced processing time and memory consumption compared to prior schemes. This can provide the ability to use feature-tracking data in a wider variety of contexts, such as real-time highlighting or obscuration of faces or license plates of interest visible in a video.
  • the order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations can be executed in any order, combined in any order, subdivided into multiple sub-operations, and/or executed in parallel to implement the described processes.
  • the described processes can be performed by resources associated with one or more computing device(s) 102, such as one or more internal or external CPUs or GPUs, and/or one or more pieces of hardware logic such as FPGAs, DSPs, or other types of accelerators.
  • the methods and processes described above can be embodied in, and fully automated via, software code modules executed by one or more general purpose computers or processors.
  • the code modules can be stored in any type of computer- readable storage medium or other computer storage device. Some or all of the methods can alternatively be embodied in specialized computer hardware.
  • Conditional language such as, among others, "can," "could," "might" or "may," unless specifically stated otherwise, is understood within the context to present that certain examples include, while other examples do not include, certain features, elements and/or steps. Thus, such conditional language is not generally intended to imply that certain features, elements and/or steps are in any way required for one or more examples or that one or more examples necessarily include logic for deciding, with or without user input or prompting, whether certain features, elements and/or steps are included or are to be performed in any particular example. Conjunctive language such as the phrase "at least one of X, Y or Z," unless specifically stated otherwise, is to be understood to present that an item, term, etc. can be X, Y, or Z, or a combination thereof.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Techniques and constructs can locate features in a frame of a video. A tracking module can produce a candidate feature-location map based at least in part on a frame of a video and a prior feature-location map. The feature can correspond to at least some image data of the frame of the video. An updating module can determine a candidate feature region based at least in part on the candidate feature-location map and locate a feature at the candidate feature region by comparing a histogram of a test region in the frame of the video to a histogram of a corresponding region in a prior frame of the video. In some examples, the test region is determined based at least in part on the candidate feature region. An image sensor can provide one or more of the frames of the video. Tracked-feature representations can be displayed.

Description

MACHINE VISION FEATURE-TRACKING SYSTEM
BACKGROUND
[0001] Machine vision techniques for automatically identifying objects or features in digital images and video, are used for a wide variety of uses. Visible features in video frames, such as faces or license plates, may move, rotate, or become shadowed or otherwise obscured, e.g., as movement of the features (e.g., objects) is captured over multiple frames of the video. Such occurrences can reduce the accuracy of machine- vision algorithms such as object location, orientation detection, human-face detection and tracking.
SUMMARY
[0002] This disclosure describes systems, methods, and computer-readable media for tracking features over multiple frames of a video. As used herein, "tracking" refers to determining the location in a subsequent frame of a feature detected in a prior frame. In some examples, a computing system can produce a candidate feature-location map based at least in part on a frame of a video and a prior feature-location map. The computing system can determine a candidate feature region based at least in part on the candidate feature-location map. The computing system can then locate a feature at the candidate feature region in the frame of the video by comparing a histogram of a test region in the frame of the video to a histogram of a corresponding region in a prior frame of the video. According to example techniques described herein, the test region can be determined based at least in part on the candidate feature region. According to example techniques described herein, features can be tracked over a video from an image sensor and an indication of tracked features can be presented, e.g., via an electronic display.
[0003] This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. The term "techniques," for instance, can refer to systems, methods, computer-readable instructions, modules, algorithms, hardware logic, and/or operations as permitted by the context described above and throughout the document.
BRIEF DESCRIPTION OF THE DRAWINGS
[0004] The same numbers are used throughout the drawings to reference like features and components. The drawings are not necessarily to scale. [0005] FIG. 1 is a block diagram depicting an example environment for implementing feature tracking in videos as described herein.
[0006] FIG. 2 is a graphical representation of example frames of video in which features are tracked.
[0007] FIG. 3 shows example histograms of regions of image data according to an example of feature tracking as described herein.
[0008] FIG. 4 is a dataflow diagram depicting example module interactions during feature tracking.
[0009] FIG. 5 is a flow diagram that illustrates an example process for tracking features in video frames.
[0010] FIG. 6 is a flow diagram that illustrates an example process for tracking and detecting features in video frames.
[0011] FIG. 7 is a flow diagram that illustrates an example process for tracking features in video frames across visibility reductions.
[0012] FIG. 8 is a block diagram depicting an example computing device configured to participate in feature tracking according to various examples described herein.
DETAILED DESCRIPTION
OVERVIEW
[0013] Examples described herein provide techniques and constructs to track features in digital videos. These techniques can enable feature tracking across multiple frames of a video with increased speed, increased precision, reduced compute time, and/or reduced memory requirements. This tracking can also permit additional techniques to be performed, such as obscuring or highlighting features in the video such as faces, or selecting relevant portions of the video for further analysis, e.g., character recognition of license plates.
[0014] Some examples described herein provide improved performance compared to conventional tracking algorithms. Some prior schemes detect features anew in each frame of the video. Some examples herein detect features in one out of a group of multiple frames and then track the features between frames, reducing the computational demands of tracking. Some examples permit tracking features even when the features are not present in some frames, e.g., because they are obscured behind other features.
[0015] As used herein, the term "video" refers to a temporal sequence of digital images. The images can be regularly spaced in time, e.g., 29.97, 30, or 60 frames per second (fps), or can be irregularly spaced in time. Video can be stored and manipulated in, e.g., an uncompressed form or a compressed form. Example compression techniques can include, but are not limited to, those described in the Portable Network Graphics (PNG), Joint Photographic Experts Group (JPEG), Motion JPEG, and Moving Picture Experts Group (MPEG) standards.
[0016] Some example scenarios and example techniques for feature tracking are presented in greater detail in the following description of the figures.
ILLUSTRATIVE ENVIRONMENT
[0017] FIG. 1 shows an example environment 100 in which examples of feature- tracking systems can operate or in which feature tracking methods such as described below can be performed. In the illustrated example, various devices and/or components of environment 100 include a computing device 102, depicted as a server computer. Computing device 102 represents any type of device that can receive and process video content. Computing device 102 can be implemented as, for example, but without limitation, an Internet-enabled television, a television set-top box, a game console, a desktop computer, a laptop computer, a tablet computer, or a smartphone. Different devices or types of devices can have different uses for feature-tracking data. For example, controllers for robots or industrial machines can use feature-tracking information from a video of their workspaces to determine the location of workpieces during movement or a work step. Surveillance or other video-recording systems operating, e.g., in public spaces, can use feature-tracking data to highlight suspects in a video or to obscure people's faces to protect their privacy.
[0018] In the illustrated example, computing device 102 includes, or is communicatively connected with, an image source 104, depicted as a camera. Image source 104 can include one or more computing device(s) or other systems configured to provide video frames 106. In the illustrated example, image source 104 provides a video of a scene 108. The video has multiple video frames 106. Scene 108 includes a subject 110, in this example, a person. This example scene 108 is for purposes of illustration and is not limiting.
[0019] In some examples, computing device 102 or image source 104 can include one or more sensors 112, e.g., configured to capture video frames 106 or otherwise provide video frames 106 or data that can be processed into video frames 106. For example, image sensor 112 can be configured to provide a video having a plurality of frames 106. Example image sensors can include front- and rear-facing cameras of a smartphone, a light sensor (e.g., a CdS photoresistor or a phototransistor), a still imager (e.g., a charge-coupled device (CCD), a complementary metal-oxide-semiconductor (CMOS) sensor, etc.), a video imager (e.g., CCD or CMOS), a fingerprint reader, a retinal scanner, an iris scanner, a computed-radiography scanner, or the like. Some example sensors 112 can include visible-light image sensors (e.g., λ ∈ [400 nm, 700 nm]) or infrared-light image sensors (e.g., λ ∈ [700 nm, 15 μm] or λ ∈ [700 nm, 1 mm]). Individual one(s) of one or more sensor(s) 112 can be configured to output sensor data corresponding to at least one physical property, e.g., a physical property of an environment of the device, such as ambient light or a scene image or video frame.
[0020] The image data representing the subject 110 in one or more of the video frame(s) 106 has a feature 114, in this example, the person's face. In addition to faces, other example features can include, but are not limited to, identification signs, e.g., license plates or parking permits on vehicles, boat registration markers on hulls or sails, aircraft tail numbers (e.g., "N38290" or "B-8888"), street-address numbers displayed on buildings or mailboxes, patterns on articles of clothing, e.g., cartoon faces, grotesques, or corporate or government logos on T-shirts, or articles of clothing, accessories, or vehicles having colors or patterns readily distinguishable from the background of the scene 108, e.g., a red umbrella, a black hat, or a striped dress. Various aspects herein track faces or other features 114 across multiple video frames 106.
[0021] Computing device 102 includes at least one processor 116 and a memory 118 configured to store, e.g., video frame(s) 106 or other video data being processed. Memory 118 can also store one or more of a detecting component 120, a tracking component 122 and a reporting component 124 stored in the memory 118 and executable on the processor 116. The components 120, 122, or 124 can include, e.g., modules stored on computer-readable media such as computer storage media (discussed below) and having thereon computer-executable instructions. The details of example computing device 102 can be representative of other computing devices 102 or of image source(s) 104. However, individual ones of computing devices 102 or image sources 104 can include additional or alternative hardware and/or software components.
[0022] Detecting component 120 (or modules thereof, and likewise throughout) can be configured to locate features 114, e.g., faces, in one or more video frames 106. Tracking component 122 can be configured to track features 114 across multiple video frames 106, e.g., to determine how a given feature has moved between two consecutive frames, or to activate or deactivate trackers for specific features based on shadowing, occlusion, or other changes in the visibility of the features. Reporting component 124 can be configured to, e.g., provide a visual representation of a located feature, such as a face, in frame(s) 106 of a video. In some examples, reporting component 124 can be configured to render at least some of the frames 106 and tracking indications for display via a display device 126. Display device 126 can include an organic light-emitting-diode (OLED) display, a liquid-crystal display (LCD), a cathode-ray tube (CRT), or another type of visual display. Display device 126 can be a component of a touchscreen, or can include a touchscreen.
[0023] In some examples, reporting component 124 can provide, via display device 126, visual representations 128 of video frames or feature-tracking data, as discussed below. In some examples, reporting component 124 can present for display one or more frames 106 of the video and visual representations 128 of the at least some of the tracked ones of the located features. Example visual representations 128(1)— 128(3) (individually or collectively referred to herein with reference 128) are shown in FIG. 1. Representations 128 can represent possible graphical displays of tracking and video information, and can represent internal data structures discussed below.
[0024] Representation 128(1) represents a video frame 106 at a first time t = 0. The video frame at t = 0 includes image data of a human subject 110. Time can be represented, e.g., in wall-clock time (hours, minutes or seconds), frames or fields (e.g., as in timecodes standardized by the Society of Motion Picture and Television Engineers, SMPTE), or any combination thereof. A tracker 130(1) is shown highlighting a feature of interest, in this example, the face of subject 110. As described herein, a tracker is a data structure representing a particular feature across one or more of the video frames 106, and can be graphically represented as a highlight such as an outline (illustrated) or as an obscurant such as a darkened or blurred area. Other example types of highlights and obscurants are discussed below.
[0025] Representation 128(2) represents a video frame 106 at a later time t = 1. Relative to the video frame at time t = 0, the subject 110 has moved to the right of the frame 106 (to the subject's left). Similarly, tracker 130(2) has moved to the right of the frame 106 in order to indicate the location of the feature across the video frames 106 at times t = 0, 1.
[0026] Representation 128(3) represents a video frame 106 at a later time t = 2. Subject 110 has moved further to the right of the frame 106 (to the subject's left), and is beginning to move out of the frame 106. Tracker 130(3) has also moved further to the right of the frame 106 in order to indicate the location of the feature across the video frames 106 at times t = 0, 1, 2. Even though the feature is partially obscured by the edge of the frame 106, tracker 130(3) is still able to track the feature in this example.
[0027] In the illustrated example environment 100, detecting component 120, tracking component 122, and reporting component 124 can each be components of an operating system (not shown) or otherwise stored or executed locally on computing device 102 or image source 104. However, in other examples one or more of the system components can be implemented as part of a distributed system, for example, as part of a cloud-based service or as components of another application. For example, image source 104 can be embodied in a smartphone and computing device 102 can be embodied in a cloud or other hosted feature-tracking service.
[0028] In some examples, image source 104 is embodied in or connected to computing device 102. For example, image source 104 and computing device 102 can be embodied in a smartphone, tablet, MICROSOFT SURFACE, APPLE IPAD, or other device configured to capture a video and track features therein or display tracking information of features therein. This can advantageously permit tracking features even when not connected to a network. In some examples, a processor (not shown) of image source 104 can communicate with computing device 102 via a network 132, such as the Internet. Computing device 102 can host one or more of detecting component 120, tracking component 122, and reporting component 124, and can exchange data with image source 104 via network 132 to perform processing described herein. Network 132 can include a cable television network, radio frequency (RF), microwave, satellite, and/or data network, such as the Internet, and can also support wired or wireless media using any format and/or protocol, such as broadcast, unicast, or multicast. Additionally, network 132 can be any type of network, wired or wireless, using any type of network topology and any network communication protocol, and can be represented or otherwise implemented as a combination of two or more networks.
ILLUSTRATIVE PROCESSING
[0029] FIG. 2 shows a graphical representation 200 of example frames 106 of video, e.g., of scene 108, in which features are tracked, e.g., as discussed herein. FIG. 2 shows representations of four frames 202(1)-202(4) (individually or collectively referred to herein with reference 202). Dotted or short-dashed rectangles represent trackers 204(1)-204(3) (individually or collectively referred to herein with reference 204). Trackers 204(1)-204(3) correspond to respective tracked regions of the frames 202. The frames 202 show a total of three distinct people, namely subjects 206(1)-206(3) (individually or collectively referred to herein with reference 206). For purposes of exposition, and without limitation, the trackers 204 are numbered corresponding to the three subjects 206.
[0030] Between frames 202(1) and 202(2), subject 206(1) (depicted as a man) moves towards the right of the frame and subject 206(2) (depicted as a woman) moves towards the left of the frame, closer to the camera than subject 206(1). Trackers 204(1) and 204(2), respectively, highlight and represent the detected or tracked features, in this example, the faces of subjects 206(1) and 206(2), respectively. The initial locations of trackers 204(1) and 204(2) can be determined, e.g., by the detecting component 120.
[0031] In frame 202(2), subject 206(2) has moved in front of subject 206(1) (from the camera's point of view). Tracking component 122 has determined a new position of tracker 204(2) to correspond to the new (tracked) location of the face of subject 206(2). However, the face of subject 206(1) is obscured. Tracking component 122 has detected the obscuration and has designated tracker 204(1) as an out-frame tracker 208, represented graphically below frame 202(2) for purposes of explanation. That is, the feature being tracked by tracker 204(1) is not visible in frame 202(2). As used herein, the term "in-frame tracker" refers to a tracker that is not an out-frame tracker 208.
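For illustration, the per-feature state implied by this description (a tracked region, a reference histogram, an in-frame/out-frame flag, and an age counter used for expiry) could be represented as follows; the field names are assumptions, not the patent's data structure.

```python
from dataclasses import dataclass
from typing import Optional, Tuple
import numpy as np

@dataclass
class Tracker:
    """Minimal per-feature tracker state suggested by the description (names are illustrative)."""
    region: Tuple[int, int, int, int]          # (x, y, w, h) of the tracked feature
    reference_histogram: np.ndarray            # histogram of the feature when last seen
    out_frame: bool = False                    # True once the feature is no longer visible
    frames_since_update: int = 0               # used to expire stale or out-frame trackers
    test_region: Optional[Tuple[int, int, int, int]] = None  # e.g., a clothing region
```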
[0032] Inset 210 shows an enlarged view of the faces of subjects 206(1) and 206(2) in frame 202(2). Tracker 204(2) indicates the tracked area of the face of subject 206(2). Candidate feature region 212 shows a region determined, e.g., by tracking component 122, as corresponding to the tracked region of tracker 204(1) in frame 202(1). Tracking component 122 can locate the feature 114 of interest in candidate feature region 212, or determine that the feature 114 of interest is not visible in candidate feature region 212, by comparing a histogram of a test region in frame 202(2) of the video, e.g., the candidate feature region 212, to a histogram of a corresponding region in a prior frame 202(1) of the video.
[0033] In this example, tracking component 122 has determined histogram 214 as an example histogram of the tracked region of tracker 204(1) in frame 202(1). For clarity of explanation, and without limitation, histogram 214 shows percentages of black ("K"), brown ("Br"), and flesh tone ("Fl") colors in the tracked region of tracker 204(1), e.g., 25% K, 15% Br, and 60% Fl. In various examples, histograms include binned data of various color components of pixels in a region of a frame, e.g., as discussed below with reference to Eq. (2). The specific colors that are typically represented in a histogram correspond to the color space (e.g., specific amounts of red, green, and blue in the RGB color space). Black, brown, and flesh tone colors are used in this example, merely to simplify the explanation. In an example implementation, however, the brown, black, and flesh tones are actually represented as combinations of, e.g., red, green, and blue color information or hue, saturation, and value information. Also in this example, tracking component 122 has determined histogram 216 of candidate feature region 212 in frame 202(2). In the illustrated example, subject 206(1) has a relatively smaller amount of black hair and subject 206(2) has a relatively larger amount of brown hair. As shown, because candidate feature region 212 includes some of the brown hair of subject 206(2), histogram 216 is different from histogram 214. In this example, histogram 216 has 20% K, 50% Br, and 30% Fl.
[0034] The tracking component 122 can compare histograms, e.g., by computing differences or distances between histograms. In the illustrated example, tracking component 122 determines a distance (or difference, and likewise throughout) between histogram 214 and histogram 216. Tracking component 122 determines that the distance is greater than a selected threshold, and therefore that candidate feature region 212 does not correspond to the tracked feature 114, in this example, the face of subject 206(1). Tracking component 122 therefore marks or otherwise designates tracker 204(1) in frame 202(2) as an out-frame tracker 208, as shown.
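The sketch below shows one way such a histogram comparison could be implemented, using a hue/saturation histogram and the Bhattacharyya distance, with the 0.5 threshold mentioned elsewhere in this document as an example value; the distance measure and bin counts are illustrative assumptions.

```python
import cv2
import numpy as np

def region_histogram(frame_bgr, region, bins=(16, 16)):
    """Hue/saturation histogram of a rectangular region, normalized to unit sum."""
    x, y, w, h = region
    hsv = cv2.cvtColor(frame_bgr[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, list(bins), [0, 180, 0, 256])
    return cv2.normalize(hist, hist, alpha=1.0, norm_type=cv2.NORM_L1).flatten()

def feature_visible(prev_hist, candidate_hist, max_distance=0.5):
    """Compare histograms; treat the tracker as out-frame if they differ too much.

    Bhattacharyya distance is used here as one reasonable distance measure;
    the 0.5 threshold mirrors the example threshold given in the text.
    """
    d = cv2.compareHist(prev_hist.astype(np.float32),
                        candidate_hist.astype(np.float32),
                        cv2.HISTCMP_BHATTACHARYYA)
    return d <= max_distance
```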
[0035] Between frames 202(2) and 202(3), subject 206(2) has moved to the left of subject 206(1) (from the camera's point of view). Tracking component 122 has determined an updated location of tracker 204(2) corresponding to the tracked location of the face of subject 206(2). In some examples, since there was an out-frame tracker 208 in prior frame 202(2), the detecting component 120 can detect features (e.g., faces) in frame 202(3). The detecting component 120, in the illustrated example, detects the face of subject 206(1), no longer obscured behind subject 206(2). The tracking component 122 can compare the newly-detected face of subject 206(1) with in-frame trackers, out-frame trackers 208, or any combination thereof. For example, the tracking component 122 can compare the stored location information of out-frame tracker(s) 208 such as tracker 204(1) with location information of detected feature(s). Tracking component 122 can additionally or alternatively compare histograms of newly-discovered features 114 with histograms of in-frame trackers or of out-frame trackers 208. In the illustrated example, the tracking component 122 has determined that tracker 204(1) corresponds to the detected face of subject 206(1) in frame 202(3).
[0036] In some examples, the tracking component 122 can additionally or alternatively track features based at least in part on test regions different from the tracked regions. This can permit, e.g., distinguishing between subjects' faces using image data of the clothing the subjects are wearing instead of, or in addition to, image data of the subjects' faces. Since clothing can vary more person-to-person than flesh color, using clothing can improve the accuracy of tracking faces, e.g., when one person crosses in front of another ("face crossing"), e.g., as shown in frames 202(1)-202(3), or when a face disappears and then reappears ("occlusion"), e.g., as shown in frames 202(2) and 202(3). This can also provide improved accuracy in, e.g., tracking license plates of vehicles. For example, vehicle color and bumper sticker color and pattern can vary more vehicle-to-vehicle than license plate color. Accordingly, tracking based at least in part on image data of portions of the bumper of a vehicle in addition to (or instead of) image data of portions of the license plate of the vehicle can permit more accurately tracking multiple vehicles in a scene 108. For clarity of explanation, only one test region is described here in addition to the tracked region. However, any number of test regions can be used for a given tracked region. The number of test regions for a given tracked region can vary between frames 202. The tracking component 122 can determine test regions based on candidate feature regions such as candidate feature region 212, on tracked regions such as for trackers 204, or any combination thereof.
[0037] In the illustrated example, the tracking component 122 can determine a test region 218 in frame 202(1) based at least in part on the tracked region, e.g., of tracker 204(1). The tracking component 122 can also determine respective test regions 220(1) and 220(2) in frame 202(3) based at least in part on the tracked regions of trackers 204(1) and 204(2) in frame 202(3), e.g., the locations of respective detected faces of subjects 206(1), 206(2). Histograms of these regions can be used in tracking, as is discussed below with reference to FIG. 3. The areas shown by trackers 204(1) and 204(2) in frames 202(1)-202(4) can be, e.g., candidate feature regions or test regions as discussed herein with reference to tracking component 122. For example, if subjects 206(1) and 206(2) had very similar skin tones and hair color, test regions 220(1) and 220(2), corresponding to the subjects' clothing, could be used to distinguish the subjects from one another.
[0038] In frame 202(4), a new subject 206(3) has appeared in the scene 108. The detecting component 120 can locate a new feature, e.g., the face of newly-appeared subject 206(3), in frame 202(4). The tracking component 122 can determine, e.g., using comparisons of locations, histograms, or other data described herein, that the new feature does not correspond to an existing tracker, e.g., an out-frame tracker 208, and can thus allocate a new tracker 204(3) initialized with the detection results. In this example, for purposes of illustration, the visual representation of tracker 204(3) is shown as an obscurant shading the face of subject 206(3). Other examples of obscurants are discussed herein.
[0039] In some examples, tracking component 122 can determine a correspondence between detected features 114 and trackers 204 using assignment algorithms. Examples of detection of features are discussed above with reference to frames 202(1) and 202(4). Assignment algorithms, such as the Hungarian algorithm or other dynamic-programming or combinatorial optimization algorithms, use a score, e.g., a difference or distance, between a given tracker (in-frame or out-frame) and a given detected feature. For example, the score can be the histogram distance between image data in a tracked region and image data in a region holding a detected feature; the overlap ratio, discussed below with reference to Eq. (3), between a tracker and a feature; or a combination thereof, e.g., a weighted sum.
[0040] Assignment algorithms can determine a correspondence between a given set of detected features and a given set of trackers (e.g., in-frame, out-frame, or both) so that the score is mathematically maximized (e.g., for goodness-of-fit scores) or minimized (e.g., for distance scores). Other mathematical optimization algorithms can also or alternatively be used, e.g., gradient-descent algorithms. Other algorithms can be used, e.g., partitioning the image into regions and performing assignment or other testing on trackers and detected features in the same region.
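As an illustration only (not the claimed implementation), the assignment step can be sketched with an off-the-shelf solver. The sketch below assumes SciPy's Hungarian-algorithm solver is available and that pairwise distance scores between trackers and detected features have already been computed, e.g., as histogram distances or as one minus the overlap ratio of Eq. (3); the max_cost threshold for rejecting poor matches is an assumed parameter.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_trackers_to_detections(cost, max_cost=0.5):
    """Assign trackers (rows) to detected features (columns).

    cost: 2-D array of distance scores; lower means a better match.
    max_cost: assumed threshold above which a pairing is rejected, leaving the
              tracker unmatched (a candidate out-frame tracker).
    Returns (matches, unmatched_trackers, unmatched_detections).
    """
    cost = np.asarray(cost, dtype=float)
    rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm, minimizes total cost
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_cost]
    matched_r = {r for r, _ in matches}
    matched_c = {c for _, c in matches}
    unmatched_trackers = [r for r in range(cost.shape[0]) if r not in matched_r]
    unmatched_detections = [c for c in range(cost.shape[1]) if c not in matched_c]
    return matches, unmatched_trackers, unmatched_detections

# Example: two trackers, three detections; the third detection gets a new tracker.
scores = [[0.1, 0.9, 0.8],
          [0.7, 0.2, 0.9]]
print(assign_trackers_to_detections(scores))
```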
[0041] In some examples, both in-frame and out-frame trackers can be provided to the assignment algorithm (or other mathematical optimization technique). The algorithm, e.g., executed on processor 116, can determine a mapping between the trackers and the detected features. The mapping may indicate that some trackers do not correspond to any detected feature. Those trackers can then, in response to such a mapping, be designated as out-frame trackers 208.
[0042] FIG. 3 shows example histograms 300 of example test regions discussed above with reference to FIG. 2. As discussed above, in some examples, test regions (e.g., clothing regions) other than the feature regions (e.g., face regions) are used in tracking the features 114, e.g., across obscurations. With reference to both FIGS. 2 and 3, in frame 202(1), tracking component 122 can determine test region 218 based at least in part on, e.g., location information or other properties of tracker 204(1) in frame 202(1) or the detected feature represented by tracker 204(1) in frame 202(1). Tracking component 122 can also determine test regions 220(1) and 220(2) based at least in part on, e.g., location information or other properties of trackers 204(1) and 204(2) in frame 202(3) or the detected features represented by trackers 204(1) and 204(2) in frame 202(3).
[0043] In the illustrated example, subject 206(1) is wearing a white ("Wh") shirt and subject 206(2) is wearing a red ("R") dress. Accordingly, the illustrated respective histograms 302(1) and 302(2) of second test regions 220(1) and 220(2) show 5% R/95% Wh and 100% R/0% Wh, respectively. Histogram 304 of test region 218 shows 0% R/100% Wh. As noted above with reference to FIG. 2, the example histograms are shown for purposes of explanation and without limitation. In an example, instead of a simple red/white binning, a histogram can include equally sized hue bins centered on [0°, 15°, ..., 360°), equally sized saturation bins centered on [0%, 10%, ..., 100%], and equally sized value bins centered on [0%, 10%, ..., 100%]. Red can be represented, in this example, as histogram peaks of hue near 0°, saturation near 100%, and value, e.g., 50% or higher. White can be represented as pixels with histogram peaks of saturation near 0% and value near 100%.
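A minimal sketch of the bin layout just described, assuming hue is supplied in degrees and saturation and value as percentages (that input representation is an assumption; any hue/colorfulness color space discussed below could be used):

```python
import numpy as np

def hsv_histograms(hue_deg, sat_pct, val_pct):
    """Per-component histograms with the bin layout described above:
    24 hue bins centered every 15 degrees, and 11 saturation and value bins
    centered every 10%. Inputs are flat arrays: hue in [0, 360) degrees,
    saturation and value in [0, 100] percent (an assumed representation)."""
    # Hue bins centered on 0, 15, ..., 345 degrees: shift by half a bin and wrap.
    hue = (np.asarray(hue_deg, float) + 7.5) % 360.0
    h_hist, _ = np.histogram(hue, bins=24, range=(0.0, 360.0))
    # Saturation/value bins centered on 0%, 10%, ..., 100%: edges every 10%, shifted by 5%.
    edges = np.arange(-5.0, 106.0, 10.0)
    s_hist, _ = np.histogram(sat_pct, bins=edges)
    v_hist, _ = np.histogram(val_pct, bins=edges)
    return h_hist, s_hist, v_hist

# A saturated red pixel: hue 0 deg, S=100%, V=60%.
h, s, v = hsv_histograms([0.0], [100.0], [60.0])
print(h.argmax(), s.argmax(), v.argmax())  # bin indices 0, 10, 6
```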
[0044] The tracking component 122 can compute histogram distances and determine that, since the distance between the histogram 304 (corresponding to test region 218) and histogram 302(1) (corresponding to test region 220(1)) is smaller than the distance between histogram 304 and histogram 302(2) (corresponding to test region 220(2)), tracker 204(1) should be associated with a region above test region 220(1) rather than with a region above test region 220(2). That is, the feature being tracked by tracker 204(1) in frame 202(3) corresponds to an area near test region 220(1) rather than to an area near test region 220(2), as shown. The result, in the illustrated example, is that tracker 204(1) remains associated with subject 206(1) even when subject 206(1) is obscured (frame 202(2)) and then reappears (frame 202(3)).
[0045] Accordingly, in some examples, the tracking component 122 can locate the feature at the candidate feature region (indicated by tracker 204(2)) in the frame 202(3) of the video by comparing a histogram 302(1) of the second test region 220(1) in the frame 202(3) of the video to histogram 304 of a corresponding test region 218 in a prior frame 202(1) of the video.
[0046] In some examples, the tracking component 122 can locate the feature if histograms of the test regions correspond, or if histograms of the candidate regions correspond, or if histograms of both the test and candidate regions correspond, or if a majority of the histograms (candidate and any test region(s)) correspond, or if a selected number of histograms correspond, or any combination thereof. In some examples, the tracking component 122 can locate the feature if no histogram, or a minority of histograms, or fewer than a selected number of histograms fail to correspond or have distances exceeding a selected threshold, or any combination thereof, or any combination of any of the items in this paragraph.
[0047] FIG. 4 is a dataflow diagram 400 illustrating example interactions between components illustrated in FIG. 1, and showing example modules of tracking component 122. The modules of the tracking component 122, e.g., stored in memory 118, can include one or more modules, e.g., shell modules, or application programming interface (API) modules, which are illustrated as a tracking module 402 and an updating module 404. In some examples, image source 104 provides video frame(s) 106 to be processed, e.g., by detecting component 120 or tracking component 122. The tracking component 122 can track features 114 in frames 106 of video, e.g., using tracking information or color histograms of candidate regions. In some examples, tracking component 122 is configured to update tracking results from a Kanade-Lucas-Tomasi (KLT) tracker based at least in part on histograms of image content of selected regions of frames 106, as described herein.
[0048] In some examples, the detecting component 120 can be configured to locate feature(s) 114 in a first frame 106 of the video, or other frame(s) of the video. In some examples, the detecting component 120 can be configured to produce a detected feature-location map 406 based at least in part on the frame 106 of the video. The detected feature-location map 406 can include, e.g., locations or bounding rectangles or other bounding areas of one or more detected features 114 in the frame 106. This can be done, e.g., using algorithms implemented in MICROSOFT FACESDK. Such algorithms can include, but are not limited to, dynamic cascades, boost classifiers, boosting chain learning, neural-network classification, support vector machine (SVM) classification, or Bayesian classification. An example algorithm is the Viola-Jones detector using adaptive boost (AdaBoost) learning of cascaded sub-detectors using rectangular Haar-like features. In some examples, the tracking module 402 can be configured to produce a candidate feature-location map 408 based at least in part on a subsequent frame 106 of a video and the detected feature-location map 406. The feature can correspond to at least some image data of the frame of the video. The tracking module 402 can produce the candidate feature-location map 408 further based on a previous frame 106 of the video, a corresponding previous feature-location map, e.g., prior-frame feature-location map 410 (e.g., provided by the detecting component 120 or the tracking module 402), or a combination thereof. This can be done, e.g., using KLT or another tracking technique, e.g., the Scaled Fast Compressive Tracking (SFCT) or Struck tracker. For example, KLT can be applied to the previous frame 106 and the subsequent frame 106. The candidate feature-location map 408 can include, e.g., zero, one, or more candidate areas for individual features located in the detected feature-location map 406. In some examples, the tracking module 402 can use a tracker, such as KLT, that is robust to scale change. This can permit tracking features 114 as subjects 110 approach or recede from the camera.
[0049] In some examples, the detected feature-location map 406 corresponds to a first video frame and the candidate feature-location map 408 corresponds to a second video frame immediately following the first video frame. In some examples, the detected feature-location map 406 is a prior feature-location map corresponding to a prior video frame arranged in the video immediately before the frame of the video for which the candidate feature-location map 408 is determined. Examples of such configurations are discussed above with respect to frames 202(1) and 202(2), FIG. 2.
[0050] In some examples, the tracking module 402 uses a KLT tracker. The KLT tracker, in some examples, uses mathematical optimization to determine a region in a later frame corresponding to a region in an earlier frame. The sum of squared intensity differences between pixels of the region in the later frame and pixels of a selected region of an earlier frame is mathematically minimized. The region of the later frame is then the tracked equivalent of the region of the earlier frame. For example, the region of the earlier frame, e.g., a first frame, can include image data of a feature 114, e.g., a face. The region in the earlier frame and the corresponding region of the later frame, e.g., determined via mathematical minimization according to the KLT algorithm, can be the regions in the respective frames for a tracker 130 of the feature 114. The region in the later frame can be an example of a candidate feature-location map 408. In some examples, KLT or another tracker can be computed based at least in part on a selected subset of the points in a region of the earlier frame.
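The criterion described above can be illustrated with a simplified sketch. A production KLT tracker minimizes the sum of squared intensity differences using image gradients and an iterative (Gauss-Newton style) update, often over image pyramids; the brute-force translational search below is only a stand-in that makes the minimized quantity explicit. The search radius and box format are assumptions.

```python
import numpy as np

def ssd_track(prev_gray, next_gray, box, search=8):
    """Find, in next_gray, the translation of the patch defined by `box` in
    prev_gray that minimizes the sum of squared intensity differences (the
    criterion a KLT-style tracker optimizes). box = (x, y, w, h) in pixels;
    `search` is an assumed search radius for this exhaustive stand-in."""
    x, y, w, h = box
    patch = prev_gray[y:y + h, x:x + w].astype(float)
    best, best_xy = None, (x, y)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ny, nx = y + dy, x + dx
            if ny < 0 or nx < 0 or ny + h > next_gray.shape[0] or nx + w > next_gray.shape[1]:
                continue  # candidate region falls outside the frame
            cand = next_gray[ny:ny + h, nx:nx + w].astype(float)
            ssd = np.sum((cand - patch) ** 2)
            if best is None or ssd < best:
                best, best_xy = ssd, (nx, ny)
    return best_xy, best

# Toy example: a bright square shifted 3 pixels to the right between frames.
f1 = np.zeros((40, 40)); f1[10:20, 10:20] = 255
f2 = np.zeros((40, 40)); f2[10:20, 13:23] = 255
print(ssd_track(f1, f2, (10, 10, 10, 10)))  # best offset near (13, 10)
```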
[0051] In some examples, the tracking module 402 can be configured to track located features over subsequent second frames of the video. For example, in a sequence of frames 1, 2, 3, ..., the tracking module 402 can use KLT or another tracking technique to track features from frame 1 to frame 2, then from frame 2 to frame 3, and so on. The tracking module 402 can be configured to determine that a histogram of a first-feature region of a particular one of the second frames does not correspond to a histogram of a first-feature region of the first frame. In response, the tracking module 402 can record an indication that a tracked feature corresponding to the first-feature region has moved out of the frame (e.g., out of at least one of the second frames). This is discussed above in more detail with reference to out-frame trackers 208. This can permit tracking features even when they are obscured, e.g., for a small number of frames.
[0052] In some examples, the updating module 404 can be configured to determine candidate feature points based at least in part on the candidate feature-location map 408. For example, the updating module 404 can uniformly or non-uniformly select a predetermined number of points, e.g., a 10×10 grid, from a given region in the candidate feature-location map 408. In some examples, the updating module 404 can select a predetermined number of points of the given region, the selected points having relatively high-magnitude gradients within the region, e.g., the n points having the highest-magnitude gradients within the region, for an integer n. In some examples, the updating module 404 can select points having local maxima of curvature, e.g., as measured by the determinant of the Hessian matrix or an approximation of the Hessian matrix using, e.g., Haar filters. In some examples, the updating module 404 can select points by decomposing and downsampling using, e.g., Haar wavelets, and selecting points having selected magnitudes in the decomposed images, e.g., within a certain percentage of the highest-magnitude value after wavelet transformation.
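Two of the selection strategies mentioned above are sketched below under the assumption that the region is an axis-aligned box (x, y, w, h) in a grayscale image; the grid size and point count are illustrative values.

```python
import numpy as np

def grid_points(box, nx=10, ny=10):
    """Uniformly sample an nx-by-ny grid of (x, y) points inside box = (x, y, w, h)."""
    x, y, w, h = box
    xs = x + (np.arange(nx) + 0.5) * w / nx
    ys = y + (np.arange(ny) + 0.5) * h / ny
    return [(float(px), float(py)) for py in ys for px in xs]

def top_gradient_points(gray, box, n=50):
    """Select the n points with the highest gradient magnitude inside the region."""
    x, y, w, h = box
    region = gray[y:y + h, x:x + w].astype(float)
    gy, gx = np.gradient(region)                    # gradients along rows and columns
    mag = np.hypot(gx, gy)
    idx = np.argsort(mag, axis=None)[::-1][:n]      # flat indices of the strongest gradients
    ry, rx = np.unravel_index(idx, mag.shape)
    return [(float(x + cx), float(y + cy)) for cy, cx in zip(ry, rx)]

img = np.zeros((60, 60)); img[20:40, 20:40] = 255   # a square with strong edges
print(len(grid_points((20, 20, 20, 20))), len(top_gradient_points(img, (15, 15, 30, 30), n=10)))
```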
[0053] In some examples, the updating module 404 can apply a tracker such as KLT, or use results from the tracking module 402, to track the selected points bidirectionally, e.g., from a first frame to a second frame, and then from the tracked point in the second frame back to the first frame. The starting point and the point in the first frame after bidirectional tracking may not be the same point; the distance between them is referred to as the "forward-backward error" (FBE). In some examples, the updating module 404 can determine the FBE for individual ones of the selected points. The updating module 404 can then determine a representative FBE value, e.g., the median or mean of at least some of the determined FBEs, or all of the determined FBEs, or a selected percentage of any of those. In some examples, the updating module 404 can then determine, as the candidate feature points, one or more of the points meeting selected criteria, e.g., having respective FBEs less than the determined representative FBE value. This can advantageously reduce noise in tracking due, e.g., to changes in shadows or lighting between one frame and the next.
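A sketch of the forward-backward check, assuming OpenCV's pyramidal Lucas-Kanade point tracker and using the median FBE as the representative value; this is an illustration under those assumptions, not the claimed implementation.

```python
import numpy as np
import cv2

def forward_backward_filter(prev_gray, next_gray, points):
    """Track points frame t -> t+1 -> t and keep those whose forward-backward
    error (FBE) is no greater than the median FBE, as described above.
    `points` is an (N, 2) array of (x, y) locations in prev_gray (8-bit images)."""
    p0 = np.asarray(points, np.float32).reshape(-1, 1, 2)
    p1, st1, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, p0, None)   # forward
    p0b, st2, _ = cv2.calcOpticalFlowPyrLK(next_gray, prev_gray, p1, None)  # backward
    fbe = np.linalg.norm(p0.reshape(-1, 2) - p0b.reshape(-1, 2), axis=1)
    ok = (st1.ravel() == 1) & (st2.ravel() == 1)
    med = float(np.median(fbe[ok])) if ok.any() else float("inf")
    keep = ok & (fbe <= med)
    return p0.reshape(-1, 2)[keep], p1.reshape(-1, 2)[keep], med

# Toy pair of frames: a bright square shifted 3 pixels to the right.
f1 = np.zeros((60, 60), np.uint8); f1[20:40, 20:40] = 255
f2 = np.zeros((60, 60), np.uint8); f2[20:40, 23:43] = 255
pts = np.array([[25, 25], [30, 30], [35, 35]], np.float32)
print(forward_backward_filter(f1, f2, pts)[2])  # representative (median) FBE
```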
[0054] In some examples, the updating module 404 can be configured to determine a candidate feature region based at least in part on the determined candidate feature points. In some examples, the updating module 404 can determine, for one or more of the candidate feature points, a point-displacement vector between the point in the earlier frame and the tracked point in the later frame. The updating module 404 can then determine a region-displacement vector as, e.g., a median or mean of one or more of the point-displacement vectors. In some examples, the updating module 404 can determine, for pairs of the candidate feature points, a pair-scale factor as the quotient of the distance between the points in the pair in the earlier frame and the distance between the tracked points in the pair in the later frame. The updating module 404 can then determine a region-scale factor as, e.g., a median or mean of one or more of the pair-scale factors. In some examples, the updating module 404 can then apply the region-displacement vector and region-scale factor to a region of the prior-frame feature-location map 410 to determine a corresponding region in the candidate feature-location map 408.
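A sketch of the region update from tracked points follows. The pair-scale factor above is defined as a quotient of pair separations; the sketch uses the later-to-earlier ratio (a convention assumption) so that applying it to the prior region enlarges the region when the feature grows, and uses the median as the representative statistic.

```python
import numpy as np
from itertools import combinations

def update_region(prev_pts, next_pts, box):
    """Update a tracked region (x, y, w, h) from matching point locations in the
    earlier frame (prev_pts) and the later frame (next_pts), both (N, 2) arrays."""
    prev_pts = np.asarray(prev_pts, float)
    next_pts = np.asarray(next_pts, float)
    disp = np.median(next_pts - prev_pts, axis=0)        # region-displacement vector
    ratios = []
    for i, j in combinations(range(len(prev_pts)), 2):
        d_prev = np.linalg.norm(prev_pts[i] - prev_pts[j])
        d_next = np.linalg.norm(next_pts[i] - next_pts[j])
        if d_prev > 1e-6:
            ratios.append(d_next / d_prev)               # pair-scale factor (later/earlier)
    scale = float(np.median(ratios)) if ratios else 1.0  # region-scale factor
    x, y, w, h = box
    cx, cy = x + w / 2.0 + disp[0], y + h / 2.0 + disp[1]  # move the box center
    nw, nh = w * scale, h * scale                          # resize about the center
    return (cx - nw / 2.0, cy - nh / 2.0, nw, nh)

prev = [(10, 10), (20, 10), (10, 20), (20, 20)]
next_ = [(17, 12), (29, 12), (17, 24), (29, 24)]           # shifted and about 1.2x larger
print(update_region(prev, next_, (8, 8, 14, 14)))
```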
[0055] In some examples, the updating module 404 can be configured to determine a candidate feature region based at least in part on FBE values. For example, for any particular candidate feature region, if the median (or mean or other representative) FBE is greater than a selected threshold, the updating module 404 can, e.g., disregard that candidate feature region or consider a tracker 130 corresponding to that candidate feature region to be an out-frame tracker, as discussed above with reference to out-frame trackers 208, FIG. 2. This can permit detecting tracking failures, e.g., due to occlusion of a face feature by an umbrella, or of a license-plate feature by a building.
[0056] In some examples, the updating module 404 can be configured to locate a feature 114 at (or at least partly in, on, within, over, or under, and likewise throughout) the candidate feature region in the frame of the video by comparing a histogram of a test region in the frame of the video to a histogram of a corresponding region in a prior frame of the video. This can be done, e.g., as discussed above with reference to histograms 214 and 216 shown in FIG. 2, and histograms 302(1), 302(2), and 304 shown in FIG. 3. In some examples, the updating module 404 can be configured to locate the feature 114 if a histogram of image data of a portion, defined by the candidate feature region, of a video frame 106 is within a selected distance of a previously-determined image-data histogram corresponding to the feature 114. Locating a feature based at least in part on histogram comparisons can improve the robustness of the tracker to drifting over a sequence of frames.
[0057] In some examples, distances between histograms can be computed using various metrics, including but not limited to, χ² (chi-squared), Kolmogorov-Smirnov (KS), Cramer/von Mises (CvM), Quadratic Form (QF), earth-mover's (Wasserstein-Rubinstein-Mallows) distance, Kullback-Leibler (KL), symmetrized KL, Jeffrey divergence (JD), Euclidean distance in a feature space having as many dimensions as bins of the histogram, or Overton's histogram intersection.
[0058] In some examples, the distance between two histograms H1 and H2 is computed as in Eq. (1):
d1(H1, H2) = sqrt( 1 - (1 / sqrt(H̄1 · H̄2 · L²)) · Σ_(i=1..L) sqrt(H1(i) · H2(i)) )    (1)
where L is the number of bins in the histograms and H̄x is the mean of histogram Hx. In some examples, two histograms, e.g., a detected histogram and a tracked candidate histogram, are determined to correspond if d1(H1, H2) < TH for a selected threshold TH. In some examples, a histogram Hi is formed by computing respective histograms of individual color components, e.g., a hue histogram, a saturation histogram, and a value histogram. Each histogram can be computed using values of the corresponding color components of the image. The histogram Hi can then be formed by concatenating the computed respective histograms. For example, given hue histogram [h1, ..., hn], saturation histogram [s1, ..., sn], and value histogram [v1, ..., vn], Hi can be formed as in Eq. (2):
Hi = [h1, ..., hn, s1, ..., sn, v1, ..., vn]    (2)
The hue, saturation, and value components can alternatively be interleaved (e.g., h1, s1, v1, h2, ...), or part of the histogram (e.g., a particular component or range) can be interleaved and part not interleaved, or any combination thereof.
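For illustration, the sketch below concatenates per-component histograms as in Eq. (2) and computes a Bhattacharyya-style distance of the form shown in Eq. (1); the threshold TH = 0.5 is taken from the illustrative results later in this document and is an adjustable parameter.

```python
import numpy as np

def concat_histogram(h_hist, s_hist, v_hist):
    """Form Hi as in Eq. (2): concatenate the hue, saturation, and value
    histograms and normalize so the entries sum to 1."""
    h = np.concatenate([h_hist, s_hist, v_hist]).astype(float)
    return h / max(h.sum(), 1e-12)

def histogram_distance(h1, h2):
    """Bhattacharyya-style distance of the form of Eq. (1); small values mean
    similar histograms."""
    h1, h2 = np.asarray(h1, float), np.asarray(h2, float)
    L = h1.size
    denom = np.sqrt(h1.mean() * h2.mean()) * L
    inner = np.sum(np.sqrt(h1 * h2)) / max(denom, 1e-12)
    return float(np.sqrt(max(0.0, 1.0 - inner)))

TH = 0.5  # assumed correspondence threshold, as in the illustrative results below
a = concat_histogram([2, 0], [0, 2], [1, 1])
b = concat_histogram([2, 0], [0, 2], [1, 1])
print(histogram_distance(a, b) < TH)  # identical histograms correspond (distance 0)
```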
[0059] Referring to the example shown in FIG. 2, since the distance between histogram 216 and histogram 214 exceeds a selected threshold TH, the tracking component 122 determines that the candidate feature region 212 is not associated with the detected region indicated by tracker 204(1). In some of these examples, in response to this determination or other histogram-match or histogram-distance criteria, the tracking component 122 selects, as a region corresponding to an out-frame tracker 208, the tracked candidate feature region 212. In some examples, the tracking component 122 can determine that a tracker 204 is an out-frame tracker 208 if, e.g., no candidate-region histogram in a current frame matches or is within a selected distance of a prior-frame histogram associated with that tracker 204.
[0060] In some examples, the updating module 404 can be configured to determine the test region based at least in part on the candidate feature region. For example, the test region can be equal in size to, or larger or smaller than, the candidate feature region, and can have corner(s) or a centroid overlying that of the candidate feature region or spaced apart therefrom. In some examples, the test region can overlap with the candidate feature region or be separate therefrom. The test region can be determined, e.g., based on a stored spatial relationship. In some examples, the test region is or includes the candidate feature region. In some of these examples or other examples, the feature is a face. In some examples, the test region has a different size or location than the candidate feature region. Some examples of determining test regions are discussed above with reference to test regions 220(1) and 220(2), FIG. 2.
[0061] In some examples, two regions are considered to correspond based on color distances. For example, two regions can be considered to correspond if the respective predominant colors in those regions (e.g., the most frequently occurring colors or color bins) are within a specified distance in a color space, e.g., a CIELAB ΔE* of <1.0. Other example color difference metrics include the L2 norm ‖·‖2, i.e., the Euclidean distance in a color space (e.g., L*a*b* or RGB), the L1 norm ‖·‖1 (Manhattan distance), and other Lp norms in a color space. Other example color difference metrics include hue-angle difference, per-component or multi-component differences (e.g., CIELAB ΔL*, ΔC*, Δh*, Δa*, or Δb*), Hamming distance between digital representations of one or more components in a color representation or color space, and CIE 1976 UCS Δu′v′.
[0062] In some examples, the updating module is further configured to determine the test region based at least in part on a stored spatial relationship between the candidate feature region and the test region. For example, the stored spatial relationship can be a proximal disjoint relationship specifying the test region is spaced apart from the candidate feature region, e.g., by a predetermined amount in a predetermined direction. This can permit locating the feature using nearby image data. The stored spatial relationship can be represented in scale-invariant terms, e.g., percentage of the size of the feature region. This can permit tracking based at least in part on test regions even when the features are changing in size or scale, e.g., because a subject 110 is moving closer to the camera or farther from the camera, e.g., image sensor 112.
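A sketch of deriving a test region from a stored, scale-invariant spatial relationship follows; the particular offsets and sizes (a region just below a face, sized relative to the face box) are illustrative assumptions.

```python
def test_region_from_feature(box, rel=(0.0, 1.2, 1.0, 1.5)):
    """Derive a test region from a candidate feature region using a stored,
    scale-invariant spatial relationship.

    box = (x, y, w, h) of the feature (e.g., a face).
    rel = (dx, dy, sw, sh): offset of the test region's top-left corner in units
    of the feature's width/height, and the test region's size as multiples of
    the feature's size. The default places the region just below the feature
    (a proximal disjoint relationship, e.g., over the subject's clothing);
    the specific numbers are illustrative assumptions.
    """
    x, y, w, h = box
    dx, dy, sw, sh = rel
    return (x + dx * w, y + dy * h, sw * w, sh * h)

# A 40x40 face box yields a 40x60 clothing region starting below the face.
print(test_region_from_feature((100, 80, 40, 40)))
```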
[0063] For example, when the feature 114 is a face, it may be difficult to differentiate faces based on color, since human skin color spans a limited range. However, clothing tends to vary between people and tends to be near faces and spaced apart from faces. Similarly, bumper stickers tend to be near license plates but spaced apart therefrom. Using test region(s) having proximal disjoint relationship(s) with the candidate feature regions permits using image data near features 114 to distinguish and track those features 114. Further examples are discussed above with reference to test regions 220(1) and 220(2), FIG. 2.
[0064] In some examples, histograms as described herein can be computed in one or more color spaces, such as those defined by the International Commission on Illumination (CIE), the National Television System Committee (NTSC), or the International Electrotechnical Commission (IEC). Example color spaces that can be used include CIE L*a*b* (CIELAB), CIE L*u*v* (CIELUV, 1976), CIE Yu′v′ (the CIE 1976 uniform coordinate scale, UCS, plus luminance Y), YIQ (NTSC 1953), sRGB (IEC 61966-2-1:1999), linear RGB, or other color spaces. In some examples, the histograms can be computed or otherwise determined in a color space having at least hue and colorfulness dimensions. Examples of such color spaces include but are not limited to Hue/Saturation/Value (HSV), Hue/Saturation/Lightness (HSL), Hue/Saturation/Intensity (HSI), and CIE L*C*h*. For example, the images can include respective color components corresponding to the dimensions of the color space.
[0065] In some examples, the tracking component 122 is configured to compute a distance, e.g., in a selected color space, between the histogram of a first-feature region in one frame and a histogram of a first-feature region in an earlier frame. The tracking component 122 can further be configured to determine that those histograms do not correspond in response to the computed distance exceeding a selected threshold, e.g., ΔE* > 1.0.
[0066] In some examples, a feature-tracking system can include a display device 126. In some examples, reporting component 124 can be configured to provide a visual representation of the located feature in the frame of the video for display. In some examples, reporting component 124 can be configured to provide the visual representation via the display device 126. Examples are discussed above with reference to representations 128 in FIG. 1. In some examples, the reporting component 124 can be configured to present for display frames of the video and visual representations of at least some of the detected and/or tracked features.
[0067] In some examples, the visual representations can include at least one obscurant. Example obscurants can include, e.g., a darkened (e.g., black) area, a lightened (e.g., white) area, an area in which image data is spatially distorted, or an area in which image data is blurred. Using obscurants can reduce the likelihood of visually identifying the feature when viewing a representation 128, e.g., to protect privacy of people walking by a traffic camera.
[0068] In some examples, the visual representations can include at least one highlight. Example highlights can include, e.g., an outline of the feature or of an area including the feature (e.g., trackers 130 as graphically represented in FIG. 1), a lightened or darkened background underlying or overlaying the detected feature (e.g., a semi-transparent colored highlight overlaid on image data of the feature), or an arrow or other shape pointing at or otherwise indicating the feature. In some examples, a representation 128 can include any combination of obscurants or highlights for respective tracked features.
[0069] In some examples, the reporting component 124 can be configured to provide an indication of at least some of the detected and/or tracked features. For example, the reporting component 124 can be configured to present representations 128 of features 114 detected by the detecting component 120, features 114 tracked by the tracking module 402 or the updating module 404, or any combination thereof. For example, detection highlights and tracking highlights can be presented in different colors or line styles to permit readily distinguishing them in a representation 128.
ILLUSTRATIVE PROCESSES
[0070] FIGs. 5-7 illustrate example processes for tracking features in video frames. The methods are illustrated as sets of operations shown as discrete blocks. The methods can be implemented in any suitable hardware, software, firmware, or combination thereof. For example, functions shown in FIGs. 5-7 can be implemented on or otherwise embodied in one or more computing devices 102 or image sources 104, e.g., using software running on such devices. In the context of software, the operations represent computer-executable instructions that, when executed by one or more processors, cause one or more processors to perform the recited operations. In the context of hardware, the operations represent logic functions implemented in circuitry, e.g., datapath-control and finite-state-machine sequencing functions. [0071] The order in which the operations are described is not to be construed as a limitation, and any number of the described operations can be combined in any order and/or in parallel to implement each process. For clarity of explanation, reference is made to various components and data items shown in FIGs. 1-4 that can carry out or participate in the steps of the exemplary methods. It should be noted, however, that other components can be used; that is, exemplary methods shown in FIGs. 5-7 are not limited to being carried out by the identified components.
[0072] FIG. 5 illustrates an example process 500 for tracking features in video frames.
[0073] At block 502, one or more features are detected in a first frame. For example, detecting component 120 detects one or more features and corresponding detected first-frame feature regions within a first frame of a video. The video can include the first frame and a later second frame. The video frames can be received and processed in any order. Processing steps described below can be interleaved or executed in any order unless otherwise specified. For example, detection (block 502) can be performed for the first frame after both the first and second frames have been received.
[0074] At block 504, one or more of the detected features are tracked in the second frame. For example, tracking module 402 determines corresponding tracked candidate second-frame feature regions in the second frame, based at least in part on the first frame, the second frame, and the detected first-frame feature regions. In some examples, block 504 includes comparing at least some image data of the second frame with at least some image data of the first frame. In some examples, tracking module 402 can use a tracking algorithm such as KLT to determine a candidate second-frame feature region by finding a mathematical best fit between the image data of the first frame in a first-frame feature region and image data of the second frame.
[0075] In some examples, block 504 includes comparing at least some image data of the second frame with at least some image data of the first frame and at least some image data of a frame subsequent to the second frame. This is referred to herein as "bidi" tracking. Bidi tracking can permit more accurately tracking features across sets of three or more frames, e.g., for analysis of recorded video or other applications in which multiple frames can be received before features are identified in the frames. In some examples, image source 104 or computing device 102 can rearrange frames, e.g., in a sequence such as a sequence of I-, P-, and B-pictures specified in one or more MPEG video standards, e.g., MPEG-2, MPEG-4 part 2, or MPEG-4 part 10. This can permit processing frames having timestamps later in the video before frames having timestamps earlier in the video, e.g., for bidi tracking.
[0076] At block 506, histograms of the tracked candidate second-frame feature regions are compared with histograms of the detected first-frame feature regions. For example, updating module 404 determines the histograms in a color space having at least hue and colorfulness dimensions. In some examples, updating module 404 compares the histograms by computing distances between histograms of the tracked candidate second-frame feature regions and histograms of the detected first-frame feature regions.
[0077] In some examples, at block 506, histograms are computed for test regions instead of or in addition to tracked candidate second-frame feature regions or detected first-frame feature regions. This can be done, e.g., as discussed above with reference to histograms 302(1), 302(2), and 304 shown in FIG. 3, which can be used independently of, or in conjunction with, histograms 214 and 216 shown in FIG. 2.
[0078] At block 508, the tracked candidate second-frame feature regions for which the comparison indicates at least a threshold degree of similarity can be selected as tracked second-frame feature regions. For example, updating module 404 removes a candidate second-frame feature region from consideration as a tracked second-frame feature region if a distance between the histogram of that candidate region and the histogram of the detected region exceeds a selected threshold.
[0079] FIG. 6 illustrates an example process 600 for tracking and detecting features in frames of a video, e.g., using a computing device such as computing device 102.
Blocks 502-508 of FIG. 6 can be performed, e.g., as described above with reference to
FIG. 5. Block 508 can be followed by block 602.
[0080] In some examples, the frames of video include a third frame later than the second frame. In some examples, the frames of video include a fourth frame later than the third frame. The labels "first," "second," "third," and "fourth" are used for clarity of exposition and are not limiting except as expressly noted. In some examples, operations described herein with reference to the first, third, and fourth frames can be used without a second frame. Such examples can include, e.g., detecting features in the first frame, e.g., frame 202(1) of FIG. 2, determining that one or more features are not visible in the second frame, e.g., frame 202(2), FIG. 2, and determining that at least some of the one or more features are visible in the third and fourth frames, e.g., frames 202(3) and 202(4) of FIG. 2. [0081] At block 602, the tracking (block 504) is repeated with respect to the third frame. In some examples, tracking component 122 determines one or more tracked candidate third-frame feature regions of the third frame based at least in part on the second frame, the third frame, and one or more tracked second-frame feature regions including the tracked second-frame feature region.
[0082] At block 604, the comparing (block 506) is repeated with respect to the third frame. In some examples, tracking component 122 compares histograms between candidate regions in the third frame and detected regions in the first frame. In some examples, tracking component 122 compares histograms between candidate regions in the third frame and tracked regions in the second frame.
[0083] At block 606, the selecting (block 508) is repeated with respect to the third frame to determine one or more tracked candidate third-frame feature regions. The operations in blocks 602-606 can be performed as described above with reference to FIGS. 2-5, e.g., with reference to frame 202(2) or 202(3), FIG. 2 or with reference to FIG. 3. As noted above, some trackers 204 may not be selected and may be flagged as out-frame trackers 208, and likewise throughout.
[0084] At block 608, the detecting (block 502) is repeated with respect to the third frame to determine one or more detected third-frame feature regions. This can be done, e.g., as discussed above with reference to tracker 204(1) in frame 202(3) of FIG. 2.
[0085] At block 610, at least some of the tracked candidate third-frame feature regions are associated with detected third-frame feature regions. This can be done, e.g., as described above with reference to frame 202(3) of FIG. 2, and the association of out-frame tracker 204(1) with subject 206(1), e.g., based at least in part on histograms in test region 220(1) and test region 218.
[0086] In some examples, block 610 includes computing respective overlap ratios for pair-wise combinations of the candidate third-frame feature regions and the detected third-frame feature regions. The overlap ratio R(RT, RD) of the region RT of a tracked feature and the region RD of a detected feature can be computed as in Eq. (3):
R(RT, RD) = |RT ∩ RD| / |RT ∪ RD|    (3)
where RT and RD are the sets of pixel locations in the respective regions and |·| represents set cardinality.
[0087] At block 612, the tracked candidate third-frame feature regions not associated with a detected third-frame feature region are selected as out-frame regions. This can be done, e.g., as described above with reference to frame 202(2) of FIG. 2. In some examples, block 612 can be followed by block 614 and/or block 620.
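For axis-aligned bounding boxes, the overlap ratio of Eq. (3) can be sketched as intersection area over union area (the union denominator is the form assumed here):

```python
def overlap_ratio(box_t, box_d):
    """Overlap ratio of two axis-aligned regions given as (x, y, w, h), computed
    as intersection area over union area (the form of Eq. (3) assumed here)."""
    xt, yt, wt, ht = box_t
    xd, yd, wd, hd = box_d
    ix = max(0.0, min(xt + wt, xd + wd) - max(xt, xd))   # width of the intersection
    iy = max(0.0, min(yt + ht, yd + hd) - max(yt, yd))   # height of the intersection
    inter = ix * iy
    union = wt * ht + wd * hd - inter
    return inter / union if union > 0 else 0.0

# Two partially overlapping boxes; a ratio above, e.g., 0.3 could count as a match.
print(overlap_ratio((0, 0, 10, 10), (5, 0, 10, 10)))  # 50/150 = 0.333...
```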
[0088] At block 614, the tracking (block 504) is repeated with respect to the fourth frame.
[0089] At block 616, the comparing (block 506) is repeated with respect to the fourth frame.
[0090] At block 618, the selecting (block 508) is repeated with respect to the fourth frame to determine one or more tracked candidate fourth-frame feature regions. The operations in blocks 614-618 can be performed as described above with reference to FIGS. 2-5, e.g., with reference to frame 202(3) shown in FIG. 2.
[0091] At block 620, the detecting (block 502) is repeated with respect to the fourth frame to determine one or more detected fourth-frame feature regions. This can be done, e.g., as discussed above with reference to frames 202(3) or 202(4) of FIG. 2.
[0092] At block 622, at least some of the tracked candidate fourth-frame feature regions can be associated with corresponding out-frame regions. This can be done, e.g., as described above with reference to tracker 204(1) in frame 202(3), shown in FIG. 2. As noted above, this can permit tracking features even when they are obscured for, e.g., fewer than a selected number of frames. In some examples, at least some of the tracked candidate fourth-frame feature regions can be associated with corresponding detected fourth-frame feature regions.
[0093] FIG. 7 illustrates an example process 700 for tracking features in frames of a video, e.g., across occlusion events or other reductions in visibility of a feature. Blocks 502 and 504 of FIG. 7 can be performed as discussed above with reference to FIG. 5. Blocks 502 and 504, and other blocks shown in FIG. 7, can be performed for one or more frames of video.
[0094] At decision block 702, it is determined whether any trackers have moved out-frame, e.g., by failure to associate with a detected feature. This can be done, e.g., as discussed above with reference to block 610 of FIG. 6, or as discussed above with reference to histograms 214 and 216 shown in FIG. 2.
[0095] If it is determined that there are out-frame trackers (the "Yes" branch from decision block 702), then at block 704, information of out-frame trackers is recorded. This can be done, e.g., as discussed above with reference to frame 202(2) of FIG. 2. In some examples, block 704 can be followed by block 708, by block 714, or by blocks 708 and 714 in that order, e.g., directly or with intervening blocks such as decision block 710.
[0096] If it is determined that there are no out-frame trackers (the "No" branch from decision block 702), then at decision block 706, it is determined whether a detection interval has expired. For example, tracking component 122 counts frames of video processed since the most recent frame for which detection was performed, e.g., as discussed with reference to block 502 or block 708. Every frame or every n frames, tracking component 122 compares the count to a stored detection-interval threshold TG to determine whether the detection interval has expired. Tracking component 122 resets the counter to 0, in this example, in block 502 and in block 708.
[0097] Example detection intervals can include, but are not limited to, a selected amount of wall-clock time, a selected number of frames in which tracking has been performed, or a selected number of frames of video received. In some examples, the detection interval is 30 frames, or as many frames as constitute one second of the video. In some examples, tracking (e.g., KLT) is performed every frame and detection (e.g., FACESDK) is performed only every n frames, for detection interval n.
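A sketch of this control flow, with the detector and tracker supplied as callables; the association of detections with existing trackers (FIG. 6) is reduced to a placeholder, and the interval of 30 frames is one of the example values above.

```python
def process_video(frames, detect, track, interval=30):
    """Sketch of the FIG. 7 control flow under simplifying assumptions.

    frames:   iterable of video frames.
    detect:   callable(frame) -> list of detected feature regions.
    track:    callable(prev_frame, frame, trackers) -> (in_frame, out_frame) tracker lists.
    interval: detection interval in frames (e.g., 30 frames, about one second of video).
    """
    trackers, out_frame, prev = [], [], None
    since_detection = 0
    for frame in frames:
        run_detection = (prev is None or out_frame or not trackers
                         or since_detection >= interval)
        if run_detection:
            detections = detect(frame)
            # A full implementation would associate detections with existing in-frame
            # and out-frame trackers here (e.g., the assignment step of FIG. 6); this
            # placeholder simply restarts the tracker list from the detections.
            trackers = list(detections)
            out_frame = []
            since_detection = 0
        else:
            trackers, out_frame = track(prev, frame, trackers)
        since_detection += 1
        prev = frame
        yield frame, trackers
```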
[0098] If it is determined that the detection interval has not expired (the "No" branch from decision block 706), then processing continues as described above with reference to block 504.
[0099] On the other hand, if it is determined that the detection interval has expired (the "Yes" branch from decision block 706), then at block 708, features can be detected in the video. This can be done, e.g., as discussed above with reference to detecting component 120 and block 502. For example, as discussed above with reference to frames 202(3) and 202(4), previously-obscured features can reappear in a frame or new features can appear in a frame.
[0100] At decision block 710, it is determined whether any features were detected at block 708. This can be done as discussed above with reference to detecting component 120.
[0101] If no features are detected (the "No" branch from decision block 710), then at decision block 712, tracking component 122 determines whether there are any in-frame trackers, e.g., tracker 204(2) of FIG. 2. If there are in-frame trackers (the "Yes" branch from decision block 712), processing continues as described above with reference to block 504. On the other hand, if there are no active in-frame trackers (the "No" branch from decision block 712), then processing continues as described above with reference to block 708. Accordingly, detection is repeated across multiple video frames until there are features in the video frames to be tracked.
[0102] Referring back to decision block 710, if features are detected (the "Yes" branch from decision block 710), then at block 714, one or more in-frame or out-frame trackers are updated, e.g., to be associated with the detected features. This can be done, e.g., as described above with reference to blocks 610 or 622 of FIG. 6.
[0103] In some examples, at block 716, tracking component 122 can delete expired trackers. In some examples, tracking component 122 discards out-frame trackers 208 that have not been updated for a predetermined interval such as a number of frames, elapsed time, or other time described above, e.g., 10 frames, 30 frames, or 1 s. Tracking component 122 in these examples removes trackers for features that have not reappeared, e.g., faces of people who have left the field of view of the sensor 112.
[0104] In some examples, e.g., in which features are similar in color to the background, a tracker may drift over several frames onto a static portion of the background (e.g., a wall) instead of the feature. In some examples, at block 716, tracking component 122 discards in-frame trackers that have not been updated for a selected number TA of frames, e.g., 50 frames or 100 frames. Tracking component 122 in these examples removes trackers that, for example, have drifted to the background. Tracking component 122 can determine that an in-frame tracker has not been updated if the tracker has not moved more than a selected distance over the course of the TA frames, or if the tracker has not been resized by more than a selected absolute percentage area change (e.g., ±5%) over the course of the TA frames.
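A sketch of the expiry logic described above; the tracker bookkeeping fields and the numeric thresholds are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class TrackerState:
    """Minimal bookkeeping assumed for the expiry checks (not a full tracker)."""
    out_frame: bool    # True if the tracker is currently an out-frame tracker
    frames_out: int    # frames since the tracker last matched a detection
    moved_px: float    # distance moved over the last TA frames, in pixels
    area_change: float # relative area change over the last TA frames (0.03 = 3%)

def prune_trackers(trackers, out_frame_limit=30, min_move_px=5.0, min_area_change=0.05):
    """Discard expired trackers: out-frame trackers that have not reappeared within
    out_frame_limit frames, and in-frame trackers that appear stuck on static
    background (have not moved more than min_move_px, or have not been resized by
    more than +/-min_area_change, over the TA-frame window). Thresholds are
    illustrative."""
    kept = []
    for t in trackers:
        if t.out_frame and t.frames_out > out_frame_limit:
            continue   # the feature never reappeared; drop the tracker
        if not t.out_frame and (t.moved_px <= min_move_px
                                or abs(t.area_change) <= min_area_change):
            continue   # likely drifted onto a static background region
        kept.append(t)
    return kept

# Example: one stale out-frame tracker is dropped, one live in-frame tracker kept.
print(len(prune_trackers([TrackerState(True, 40, 0.0, 0.0),
                          TrackerState(False, 0, 12.0, 0.2)])))
```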
ILLUSTRATIVE COMPONENTS
[0105] FIG. 8 illustrates select components of an example computing device 102. In the illustrated example, computing device 102 includes one or more processor(s) 116, a memory 118, input/output (I/O) interfaces 802, and/or communications interface(s) 804. Memory 118 can be implemented as any combination of various types of memory components, e.g., computer-readable media or computer storage media components. Examples of possible memory components include a random access memory (RAM), a disk drive, a mass storage component, and a non-volatile memory (e.g., ROM, Flash, EPROM, EEPROM, etc.). Alternative implementations of computing device 102 can include a range of processing and memory capabilities. For example, full-resource computing devices can be implemented with substantial memory and processing resources, including a disk drive to store content for replay by the viewer. Low-resource computing devices, however, can have limited processing and memory capabilities, such as a limited amount of RAM, no disk drive, and limited processing capabilities.
[0106] Processor(s) 116 process various instructions to control the operation of computing device 102 and to communicate with other electronic and computing devices. For example, the processor(s) 116 can be configured to execute modules of a plurality of modules, discussed below, on the memory 118. In some examples, the computer- executable instructions stored on the memory 118 can, upon execution, configure a computer such as a computing device 102 (e.g., a computing device 102 or image source 104) to perform operations described herein with reference to, e.g., detecting component 120, tracking component 122, reporting component 124, or modules of any of those. The modules stored in the memory 118 can include instructions that, when executed by the one or more processor(s) 116, cause the one or more processor(s) 116 to perform operations described herein.
[0107] The memory 118 stores various information and/or data, including, for example, a detecting component 120, a tracking component 122 (including, e.g., tracking module 402 and updating module 404, FIG. 4), a reporting component 124, and optionally an operating system 806 and/or one or more other applications 808. Functionality described as associated with the illustrated components or modules can be combined to be performed by a fewer number of components or modules or can be split and performed by a larger number of components or modules. The other applications 808 can include, for example, an Internet browser that includes video capabilities, a media player application, a video editing application, a video streaming application, a television viewing application, and so on. In some examples, computer-executable instructions of detecting component 120, tracking component 122, reporting component 124, and applications 808 stored in one or more computer-readable media (e.g., memory 118), when executed on processor 116 of computing device 102, direct computing device 102 to perform functions listed herein with respect to the relevant components in memory 118.
[0108] In the illustrated example, memory 118 includes a data store 810. In some examples, data store 810 can include a first memory for storing one or more video frame(s) 812, e.g., video frames 106 of FIG. 1. Individual video frames can include, e.g., data of a plurality of pixels. The data of individual pixels can include respective values for individual color channels (planes), e.g., red (R), green (G), and blue (B) color channels; luma (Y), blue chroma (Cb), and red chroma (Cr) color channels; or other color organizations. In some examples, data store 810 can store one or more feature-location map(s) 814, e.g., holding the locations of detected features (e.g., detected feature-location map 406 of FIG. 4), prior-frame features (e.g., prior-frame feature-location map 410 of FIG. 4), or candidate features (e.g., candidate feature-location map 408 of FIG. 4). In some examples, data store 810 can store parameters such as TH, described above with reference to Eq. (1), or other parameter(s) described herein, e.g., with reference to the illustrative results given below.
[0109] Communication interface(s) 804 enable computing device 102 to communicate with other computing devices, and can represent other means by which computing device 102 can receive video content. For example, in an environment that supports transmission of video content over an IP network, communication interface(s) 804 can represent connections via which computing device 102 can receive video content, e.g., via a particular uniform resource locator (URL). In some examples, the communications interface 804 can include, but is not limited to, a transceiver for Ethernet, cellular (3G, 4G, or other), WI-FI, ultra-wideband (UWB), BLUETOOTH, satellite, or other wireless transmissions. The communications interface 804 can include a wired I/O interface, such as an Ethernet interface, a serial interface, a Universal Serial Bus (USB) interface, an INFINIBAND interface, or other wired interfaces.
[0110] Video or frames 106 thereof can additionally or alternatively be received via I/O interface 802. I/O interface 802 can include or be communicatively connected with, for example, one or more tuners, video-capture devices, video encoders, or format converters, enabling computing device 102 to receive and store video. In some examples, I/O interface 802 can include or be communicatively connected with one or more sensor(s) 112, as described above with reference to FIG. 1. Sensor(s) 112 can be configured to capture video frames 106, e.g., of a scene 108, as discussed above.
[0111] I/O interface 802 can additionally or alternatively include, or be communicatively connected with, for example, a display device 126, enabling computing device 102 to present video content. In example implementations, I/O interface 802 provides signals to a television or other display device that displays the video data, e.g., as discussed above with reference to FIG. 1.
[0112] I/O interface 802 can additionally or alternatively include, or be communicatively connected with, for example, a user-operable input device 816 (graphically represented as a gamepad), enabling a user to, e.g., direct computing device 102 to track specific features. The user-operable input device 816 can also be usable to control playback of video, e.g., via fast-forward, rewind, play, pause, and stop functions. Reporting component 124 can adjust display of the video on display device 126 based at least in part on inputs received via the user-operable input device 816. This can permit use of the system, e.g., in a real-time security or monitoring context.
[0113] In some examples, detecting component 120, when executed by processor(s) 116 (and likewise throughout), produces a feature-location map based at least in part on a first frame of the video. This can be as described above with reference to FIG. 4, e.g., using FACESDK algorithms such as those described above.
[0114] In some examples, tracking component 122 produces a candidate feature-location map based at least in part on a frame of a video and a prior feature-location map, e.g., from detecting component 120. The feature corresponds to at least some image data of the frame of the video. This can be as described above with reference to tracking module 402 of FIG. 4. The tracking component 122 then determines candidate feature points based at least in part on the candidate feature-location map, determines a candidate feature region based at least in part on the determined candidate feature points, and locates a feature at the candidate feature region in the frame of the video by comparing a histogram of a test region in the frame of the video to a histogram of a corresponding region in a prior frame of the video, wherein the test region is determined based at least in part on the candidate feature region. This can be as described above with reference to updating module 404 of FIG. 4.
[0115] In some examples, reporting component 124 provides a visual representation of the located feature in the frame of the video, e.g., via a graphical user interface. This can be as described above with reference to FIGS. 1 and 4.
[0116] In some examples, e.g., of a computing device 102 providing a feature tracking service, the computing device 102 can include tracking component 122 but not include sensor 112. In some examples, e.g., of an image source 104 such as a smartphone making use of a feature tracking service, the computing device 102 representing image source 104 can include sensor 112 but not implement tracking component 122. In some examples, e.g., of a computing device 102 or image source 104 implementing both a feature tracking service and the use thereof, the computing device 102 can include sensor 112 and implement tracking component 122. In some examples, a feature-tracking service system such as computing device 102 or an image-capturing system such as image source 104 can implement detecting component 120 or reporting component 124.
[0117] Although shown separately, some of the components of computing device 102 can be implemented together in a single hardware device, such as in a Field-Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), Application-specific Standard Product (ASSP), System-On-a-Chip system (SoC), Complex Programmable Logic Device (CPLD), Digital Signal Processor (DSP), or other type of customizable processor. For example, a processor 116 can represent a hybrid device, such as a device from ALTERA or XILINX that includes a CPU core embedded in an FPGA fabric. These or other hardware logic components can operate independently or, in some instances, can be driven by a CPU. In some examples, processor 116 can be or include one or more single-core processors, multi-core processors, central processing units (CPUs), graphics processing units (GPUs), general-purpose GPUs (GPGPUs), or hardware logic components configured, e.g., via specialized programming from modules or APIs, to perform functions described herein.
[0118] Additionally, a system bus 818 typically connects the various components within computing device 102. A system bus 818 can be implemented as one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, or a local bus using any of a variety of bus architectures. By way of example, such architectures can include an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnects (PCI) bus, e.g., a Mezzanine bus.
[0119] Any of the components illustrated in FIG. 8 can be in hardware, software, or a combination of hardware and software. Further, any of the components illustrated in FIG. 8, e.g., memory 118, can be implemented using any form of computer-readable media that is accessible by computing device 102, either locally or remotely, including over a network 132. Computer-readable media includes two types of computer-readable media, namely computer storage media and communications media. Computer storage media (e.g., a computer storage medium) includes tangible storage units such as volatile memory, nonvolatile memory, and/or other persistent and/or auxiliary computer storage media, removable and non-removable computer storage media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media includes tangible or physical forms of media included in a device or hardware component that is part of a device or external to a device, including, but not limited to, random-access memory (RAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), phase change memory (PRAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory, compact disc read-only memory (CD-ROM), digital versatile disks (DVDs), optical cards or other optical storage media, magnetic cassettes, magnetic tape, magnetic disk storage, magnetic cards or other magnetic storage devices or media, solid-state memory devices, storage arrays, network attached storage, storage area networks, hosted computer storage or memories, storage devices, and/or storage media that can be used to store and maintain information for access by a computing device 102 or image source 104.
[0120] In contrast to computer storage media, communication media can embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, or other transmission mechanism. As defined herein, computer storage media does not include communication media. In some examples, memory 118 can be or include computer storage media.
ILLUSTRATIVE RESULTS
[0121] SFCT and Struck trackers (comparative) were tested on two different video sequences, as was a tracker according to various aspects herein (example). Parameters for the example tracker were determined empirically, and included detection at least every 50 frames, discarding of out-frame trackers that have been out-of-frame for 55 frames, discarding of in-frame trackers that have not been updated for 100 frames, a minimum overlap ratio (Eq. (3)) of 0.3, a threshold median FBE of 10, beyond which a tracker is considered to be out-frame, and a threshold histogram distance of 0.5, beyond which a tracker is considered to be out-frame. The tested example configuration provided comparable or improved success rates compared to SFCT and Struck for both sequences, and provided improved precision rates compared to SFCT and Struck for location error thresholds of deviation from ground truth above about 20. The tested example configuration also provided faster results than SFCT or Struck, as shown in Table 1.
[0122] Table 1 shows per-frame processing times in milliseconds. As shown, the tested example configuration was able to process over 100 fps on a computing system including an INTEL CORE i7 3.40 GHz CPU.
Tracker SFCT (Comparative) Struck (Comparative) Example
Video 1 14.68 56.26 6.62
Video 2 17.41 51.13 9.47
Table 1
EXAMPLE CLAUSES
[0123] A: A system comprising: a processing unit; a computer-readable medium (CRM) operably coupled to the processor; a tracking module, stored in the CRM and executable by the processing unit to produce a candidate feature-location map based at least in part on a frame of a video and a prior feature-location map, wherein the candidate feature-location map indicates candidate locations of one or more features and individual ones of the features correspond to at least some image data of the frame of the video; and an updating module, stored in the CRM and executable by the processing unit to: determine a candidate feature region based at least in part on the candidate feature-location map; and locate a feature at the candidate feature region in the frame of the video by comparing a histogram of a test region in the frame of the video to a histogram of a corresponding region in a prior frame of the video, wherein the test region is determined based at least in part on the candidate feature region.
[0124] B: A system as paragraph A recites, further comprising a detecting component, stored in the CRM and executable by the processing unit to produce the prior feature-location map based at least in part on the prior frame of the video.
[0125] C: A system as paragraph A or B recites, wherein the prior feature-location map corresponds to the prior frame of the video and the prior frame of the video is arranged in the video immediately before the frame of the video.
[0126] D: A system as any of paragraphs A-C recites, wherein the test region includes the candidate feature region and the feature is a face.
[0127] E: A system as any of paragraphs A-D recites, wherein the updating module is further executable by the processing unit to: determine a second test region based at least in part on the candidate feature region or the test region; and locate the feature at the candidate feature region in the frame of the video by comparing a histogram of the second test region in the frame of the video to a histogram of a corresponding region in the prior frame of the video.
[0128] F: A system as any of paragraphs A-E recites, wherein the updating module is further executable by the processing unit to determine the test region based at least in part on a spatial relationship between the candidate feature region and the test region.
[0129] G: A system as paragraph F recites, wherein the spatial relationship is a proximal disjoint relationship.
[0130] H: A system as any of paragraphs A-G recites, further comprising a display device and a reporting component, stored in the CRM and executable by the processing unit to provide a visual representation of the located feature in the frame of the video via the display device.
[0131] I: A system as any of paragraphs A-H recites, wherein the one or more features include one or more faces or identification signs.
[0132] J: A method, comprising: detecting a feature in a first frame of a plurality of frames of video, the plurality of frames of video including the first frame and a later second frame; determining a detected first-frame feature region of the first frame corresponding to the detected feature; determining a tracked candidate second-frame feature region of the second frame based at least in part on the first frame, the second frame, and the detected first-frame feature region; comparing a histogram of the tracked candidate second-frame feature region with a histogram of the detected first-frame feature region; and selecting as a tracked second-frame feature region the tracked candidate second-frame feature region in response to the comparison indicating at least a threshold degree of similarity.
[0133] K: A method as paragraph J recites, wherein the comparing includes computing a distance between the histogram of the tracked candidate second-frame feature region and the histogram of the detected first-frame feature region.
[0134] L: A method as paragraph K recites, further comprising determining the histogram in a color space having at least hue and colorfulness dimensions.
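By way of non-limiting illustration, one color space having hue and colorfulness dimensions is HSV, with saturation standing in for colorfulness; the sketch below (assuming OpenCV and NumPy are available, and choosing the Bhattacharyya measure purely for illustration) builds a hue-saturation histogram for an image region and computes a distance between two such histograms:

```python
# Sketch: 2-D hue/saturation histogram of a BGR image patch and a distance
# between two patches' histograms (Bhattacharyya distance as one example metric).
import cv2
import numpy as np


def hue_sat_histogram(patch_bgr: np.ndarray) -> np.ndarray:
    hsv = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [30, 32], [0, 180, 0, 256])
    return cv2.normalize(hist, hist).flatten()


def histogram_distance(patch_a: np.ndarray, patch_b: np.ndarray) -> float:
    return float(cv2.compareHist(hue_sat_histogram(patch_a),
                                 hue_sat_histogram(patch_b),
                                 cv2.HISTCMP_BHATTACHARYYA))
```

Under such a distance, a tracked candidate region could be accepted when its distance to the detected first-frame region falls below a selected threshold, e.g., the 0.5 value used in the tested configuration.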
[0135] M: A method as any of paragraphs J-L recites, wherein determining the tracked candidate second-frame feature region comprises applying a Kanade-Lucas-Tomasi tracker to the first frame and the second frame.
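As a non-limiting sketch of one way to obtain a tracked candidate second-frame feature region with a Kanade-Lucas-Tomasi tracker (assuming OpenCV; the point-selection parameters and the bounding-box refit are assumptions of this illustration, not requirements of paragraph M):

```python
# Sketch: seed corner points inside the first-frame feature region, track them
# into the second frame with pyramidal Lucas-Kanade optical flow, and take the
# bounding box of the surviving points as the tracked candidate region.
import cv2
import numpy as np


def track_region(first_gray, second_gray, region):
    x, y, w, h = region
    mask = np.zeros_like(first_gray)
    mask[y:y + h, x:x + w] = 255
    pts = cv2.goodFeaturesToTrack(first_gray, maxCorners=50,
                                  qualityLevel=0.01, minDistance=3, mask=mask)
    if pts is None:
        return None
    new_pts, status, _err = cv2.calcOpticalFlowPyrLK(first_gray, second_gray, pts, None)
    good = new_pts[status.ravel() == 1].reshape(-1, 2)
    if len(good) == 0:
        return None
    x0, y0 = good.min(axis=0)
    x1, y1 = good.max(axis=0)
    return (int(x0), int(y0), int(x1 - x0), int(y1 - y0))
```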
[0136] N: A method as any of paragraphs J-M recites, wherein the plurality of frames of video includes a third frame later than the second frame and wherein the method further comprises: determining one or more tracked candidate third-frame feature regions of the third frame based at least in part on the second frame, the third frame, and one or more tracked second-frame feature regions including the tracked second-frame feature region; detecting one or more features in the third frame; determining respective detected third-frame feature regions of the third frame corresponding to the detected features in the third frame; associating at least some of the tracked candidate third-frame feature regions with ones of the detected third-frame feature regions; and selecting as out-frame regions ones of the tracked candidate third-frame feature regions not associated with ones of the detected third-frame feature regions.
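The association and out-frame selection of paragraph N could, for example, be realized with a greedy overlap test between tracked candidate regions and detected regions. The sketch below uses an intersection-over-union overlap ratio and a 0.3 minimum purely as assumptions consistent with the parameters reported under ILLUSTRATIVE RESULTS; Eq. (3) itself is not reproduced here.

```python
# Sketch: greedily match tracked candidate regions to detected regions by
# overlap ratio; tracked regions left unmatched are selected as out-frame regions.
def overlap_ratio(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def associate(tracked, detected, min_overlap=0.3):
    matched, out_frame = {}, []
    free = list(range(len(detected)))
    for ti, t in enumerate(tracked):
        best = max(free, key=lambda di: overlap_ratio(t, detected[di]), default=None)
        if best is not None and overlap_ratio(t, detected[best]) >= min_overlap:
            matched[ti] = best
            free.remove(best)
        else:
            out_frame.append(ti)
    return matched, out_frame
```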
[0137] O: A method as paragraph N recites, wherein the plurality of frames of video includes a fourth frame later than the third frame and wherein the method further comprises: repeating the detecting with respect to the fourth frame to determine a detected fourth-frame feature region; and associating one of the out-frame regions with the detected fourth-frame feature region.
[0138] P: A method as any of paragraphs J-O recites, wherein determining the tracked candidate second-frame feature regions comprises comparing at least some image data of the second frame with at least some image data of the first frame and at least some image data of a frame subsequent to the second frame.
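One plausible reading of the comparison in paragraph P is a forward-backward (round-trip) consistency check on tracked points; the sketch below (assuming OpenCV, and using the median forward-backward error as the summary statistic, consistent with the threshold median FBE mentioned under ILLUSTRATIVE RESULTS) is offered under that assumption only:

```python
# Sketch: track points forward from the first frame to a later frame and then
# backward again; the median round-trip displacement serves as the median FBE.
import cv2
import numpy as np


def median_fb_error(first_gray, later_gray, pts):
    fwd, st1, _ = cv2.calcOpticalFlowPyrLK(first_gray, later_gray, pts, None)
    back, st2, _ = cv2.calcOpticalFlowPyrLK(later_gray, first_gray, fwd, None)
    ok = (st1.ravel() == 1) & (st2.ravel() == 1)
    if not ok.any():
        return float("inf")
    err = np.linalg.norm((pts - back).reshape(-1, 2), axis=1)
    return float(np.median(err[ok]))
```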
[0139] Q: A method as any of paragraphs J-P recites, wherein the feature includes one or more faces or identification signs.
[0140] R: A system, comprising: means for detecting a feature in a first frame of a plurality of frames of video, the plurality of frames of video including the first frame and a later second frame; means for determining a detected first-frame feature region of the first frame corresponding to the detected feature; means for determining a tracked candidate second-frame feature region of the second frame based at least in part on the first frame, the second frame, and the detected first-frame feature region; means for comparing a histogram of the tracked candidate second-frame feature region with a histogram of the detected first-frame feature region; and means for selecting as a tracked second-frame feature region the tracked candidate second-frame feature region in response to the comparison indicating at least a threshold degree of similarity.
[0141] S: A system as paragraph R recites, wherein the means for comparing includes means for computing a distance between the histogram of the tracked candidate second-frame feature region and the histogram of the detected first-frame feature region.
[0142] T: A system as paragraph S recites, further comprising means for determining the histogram in a color space having at least hue and colorfulness dimensions.
[0143] U: A system as any of paragraphs R-T recites, wherein means for determining the tracked candidate second-frame feature region comprises means for applying a Kanade-Lucas-Tomasi tracker to the first frame and the second frame.
[0144] V: A system as any of paragraphs R-U recites, wherein the plurality of frames of video includes a third frame later than the second frame and wherein the system further comprises: means for determining one or more tracked candidate third-frame feature regions of the third frame based at least in part on the second frame, the third frame, and one or more tracked second-frame feature regions including the tracked second-frame feature region; means for detecting one or more features in the third frame; means for determining respective detected third-frame feature regions of the third frame corresponding to the detected features in the third frame; means for associating at least some of the tracked candidate third-frame feature regions with ones of the detected third-frame feature regions; and means for selecting as out-frame regions ones of the tracked candidate third-frame feature regions not associated with ones of the detected third-frame feature regions.
[0145] W: A system as paragraph V recites, wherein the plurality of frames of video includes a fourth frame later than the third frame and wherein the system further comprises: means for repeating the detecting with respect to the fourth frame to determine a detected fourth-frame feature region; and means for associating one of the out-frame regions with the detected fourth-frame feature region.
[0146] X: A system as any of paragraphs R-W recites, wherein the means for determining the tracked candidate second-frame feature regions comprises means for comparing at least some image data of the second frame with at least some image data of the first frame and at least some image data of a frame subsequent to the second frame.
[0147] Y: A system as any of paragraphs R-X recites, wherein the feature includes one or more faces or identification signs.
[0148] Z: A system comprising: a processing unit; a computer-readable medium (CRM) operably coupled to the processing unit; a detecting component, stored in the CRM and executable by the processing unit to locate features in a first frame of a video; a tracking component, stored in the CRM and executable by the processing unit to track ones of the located features over a subsequent plurality of second frames of the video, wherein the tracking component is configured to record an indication that a first feature of the tracked ones of the located features has moved out of a particular frame of the plurality of second frames if a histogram of a first-feature region of the particular frame does not correspond to a histogram of a first-feature region of the first frame; and a reporting component, stored in the CRM and executable by the processing unit to provide an indication of at least some of the tracked ones of the located features.
[0149] AA: A system as paragraph Z recites, wherein the reporting component is further executable by the processing unit to present for display one or more frames of the video and visual representations of the at least some of the tracked ones of the located features.
[0150] AB: A system as paragraph AA recites, wherein the visual representations include at least one obscurant or at least one highlight.
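By way of non-limiting illustration of paragraph AB (assuming OpenCV; the kernel size, color, and line thickness are arbitrary choices of this sketch, not features of the disclosed reporting component), a highlight could be rendered as a rectangle and an obscurant as a blur applied to the tracked feature region:

```python
# Sketch: draw a highlight (rectangle) or an obscurant (Gaussian blur) over a
# tracked feature region before the frame is presented for display.
import cv2


def render_feature(frame, region, obscure=False):
    x, y, w, h = region
    if obscure:
        frame[y:y + h, x:x + w] = cv2.GaussianBlur(frame[y:y + h, x:x + w], (31, 31), 0)
    else:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    return frame
```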
[0151] AC: A system as any of paragraphs Z-AB recites, wherein the tracking component is further executable by the processing unit to: compute a distance between the histogram of a first-feature region of the particular frame and the histogram of the first-feature region of the first frame; and determine that the histogram of the first-feature region of the particular frame and the histogram of the first-feature region of the first frame do not correspond to each other in response to the computed distance exceeding a selected threshold.
[0152] AD: A system as any of paragraphs Z-AC recites, further comprising an image sensor configured to provide one or more of the frames of the video.
[0153] AE: A system as any of paragraphs Z-AD recites, wherein the located features include one or more faces or identification signs.
[0154] AF: A computer-readable medium, e.g., a computer storage medium, having thereon computer-executable instructions, the computer-executable instructions upon execution configuring a computer to perform operations as any of paragraphs J-Q recites.
[0155] AG: A device comprising: a processor; and a computer-readable medium, e.g., a computer storage medium, having thereon computer-executable instructions, the computer-executable instructions upon execution by the processor configuring the device to perform operations as any of paragraphs J-Q recites.
[0156] AH: A system comprising: means for processing; and means for storing having thereon computer-executable instructions, the computer-executable instructions including means to configure the device to carry out a method as any of paragraphs J-Q recites.
CONCLUSION
[0157] Video analysis techniques described herein can provide feature tracking, e.g., face tracking, using reduced processing time and memory consumption compared to prior schemes. This can provide the ability to use feature-tracking data in a wider variety of contexts, such as real-time highlighting or obscuration of faces or license plates of interest visible in a video.
[0158] Although detection and tracking of features has been described in language specific to structural features and/or methodological steps, it is to be understood that the invention defined in the appended claims is not necessarily limited to the specific features or steps described. Rather, the specific features and steps are disclosed as preferred forms of implementing the claimed invention.
[0159] The operations of the example processes are illustrated in individual blocks and summarized with reference to those blocks. The processes are illustrated as logical flows of blocks, each block of which can represent one or more operations that can be implemented in hardware, software, or a combination thereof. In the context of software, the operations represent computer-executable instructions stored on one or more computer-readable media that, when executed by one or more processors, enable the one or more processors to perform the recited operations. Generally, computer-executable instructions include routines, programs, objects, modules, components, data structures, and the like that perform particular functions or implement particular abstract data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations can be executed in any order, combined in any order, subdivided into multiple sub-operations, and/or executed in parallel to implement the described processes. The described processes can be performed by resources associated with one or more computing device(s) 102, such as one or more internal or external CPUs or GPUs, and/or one or more pieces of hardware logic such as FPGAs, DSPs, or other types of accelerators.
[0160] The methods and processes described above can be embodied in, and fully automated via, software code modules executed by one or more general purpose computers or processors. The code modules can be stored in any type of computer-readable storage medium or other computer storage device. Some or all of the methods can alternatively be embodied in specialized computer hardware.
[0161] Conditional language such as, among others, "can," "could," "might" or "may," unless specifically stated otherwise, is understood within the context to present that certain examples include, while other examples do not include, certain features, elements and/or steps. Thus, such conditional language is not generally intended to imply that certain features, elements and/or steps are in any way required for one or more examples or that one or more examples necessarily include logic for deciding, with or without user input or prompting, whether certain features, elements and/or steps are included or are to be performed in any particular example. Conjunctive language such as the phrase "at least one of X, Y or Z," unless specifically stated otherwise, is to be understood to present that an item, term, etc. can be either X, Y, or Z, or a combination thereof.
[0162] Any routine descriptions, elements or blocks in the flow diagrams described herein and/or depicted in the attached figures should be understood as potentially representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or elements in the routine. Alternate implementations are included within the scope of the examples described herein in which elements or functions can be deleted, or executed out of order from that shown or discussed, including substantially synchronously or in reverse order, depending on the functionality involved as would be understood by those skilled in the art. It should be emphasized that many variations and modifications can be made to the above-described examples, the elements of which are to be understood as being among other acceptable examples. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims.

Claims

1. A system comprising:
a processing unit;
a computer-readable medium (CRM) operably coupled to the processing unit;
a tracking module, stored in the CRM and executable by the processing unit to produce a candidate feature-location map based at least in part on a frame of a video and a prior feature-location map, wherein the candidate feature-location map indicates candidate locations of one or more features and individual ones of the features correspond to at least some image data of the frame of the video; and
an updating module, stored in the CRM and executable by the processing unit to: determine a candidate feature region based at least in part on the candidate feature-location map; and
locate a feature at the candidate feature region in the frame of the video by comparing a histogram of a test region in the frame of the video to a histogram of a corresponding region in a prior frame of the video, wherein the test region is determined based at least in part on the candidate feature region.
2. A system as claim 1 recites, further comprising a detecting component, stored in the CRM and executable by the processing unit to produce the prior feature-location map based at least in part on the prior frame of the video.
3. A system as either claim 1 or 2 recites, wherein the test region includes the candidate feature region and the feature is a face.
4. A system as any one of claims 1-3 recites, wherein the updating module is further executable by the processing unit to:
determine a second test region based at least in part on the candidate feature region or the test region; and
locate the feature at the candidate feature region in the frame of the video by comparing a histogram of the second test region in the frame of the video to a histogram of a corresponding region in the prior frame of the video.
5. A system as any one of claims 1-4 recites, wherein the updating module is further executable by the processing unit to determine the test region based at least in part on a spatial relationship between the candidate feature region and the test region.
6. A system as any one of claims 1-5 recites, further comprising a reporting component stored in the CRM and executable by the processing unit to provide a visual representation of the located feature in the frame of the video for display.
7. A method, comprising:
detecting a feature in a first frame of a plurality of frames of video, the plurality of frames of video including the first frame and a later second frame;
determining a detected first-frame feature region of the first frame corresponding to the detected feature;
determining a tracked candidate second-frame feature region of the second frame based at least in part on the first frame, the second frame, and the detected first-frame feature region;
comparing a histogram of the tracked candidate second-frame feature region with a histogram of the detected first-frame feature region; and
selecting as a tracked second-frame feature region the tracked candidate second-frame feature region in response to the comparison indicating at least a threshold degree of similarity.
8. A method as claim 7 recites, wherein the comparing includes computing a distance between the histogram of the tracked candidate second-frame feature region and the histogram of the detected first-frame feature region.
9. A method as either claim 7 or 8 recites, wherein determining the tracked candidate second-frame feature region comprises applying a Kanade-Lucas-Tomasi tracker to the first frame and the second frame.
10. A method as claim 9 recites, wherein the plurality of frames of video includes a third frame later than the second frame and wherein the method further comprises:
determining one or more tracked candidate third-frame feature regions of the third frame based at least in part on the second frame, the third frame, and one or more tracked second-frame feature regions including the tracked second-frame feature region;
detecting one or more features in the third frame;
determining respective detected third-frame feature regions of the third frame corresponding to the detected features in the third frame;
associating at least some of the tracked candidate third-frame feature regions with ones of the detected third-frame feature regions; and
selecting as out-frame regions ones of the tracked candidate third-frame feature regions not associated with ones of the detected third-frame feature regions.
11. A method as claim 10 recites, wherein the plurality of frames of video includes a fourth frame later than the third frame and wherein the method further comprises:
repeating the detecting with respect to the fourth frame to determine a detected fourth-frame feature region; and
associating one of the out-frame regions with the detected fourth-frame feature region.
12. A system comprising:
a processing unit;
a computer-readable medium (CRM) operably coupled to the processing unit;
a detecting component, stored in the CRM and executable by the processing unit to locate features in a first frame of a video;
a tracking component, stored in the CRM and executable by the processing unit to track ones of the located features over a subsequent plurality of second frames of the video, wherein the tracking component is configured to record an indication that a first feature of the tracked ones of the located features has moved out of a particular frame of the plurality of second frames if a histogram of a first-feature region of the particular frame does not correspond to a histogram of a first-feature region of the first frame; and
a reporting component, stored in the CRM and executable by the processing unit to provide an indication of at least some of the tracked ones of the located features.
13. A system as claim 12 recites, wherein the reporting component is further executable by the processing unit to present for display one or more frames of the video and visual representations of the at least some of the tracked ones of the located features.
14. A system as either claim 12 or 13 recites, wherein the tracking component is further executable by the processing unit to:
compute a distance between the histogram of a first-feature region of the particular frame and the histogram of the first-feature region of the first frame; and
determine that the histogram of the first-feature region of the particular frame and the histogram of the first-feature region of the first frame do not correspond to each other in response to the computed distance exceeding a selected threshold.
15. A system as any one of claims 12-14 recites, further comprising an image sensor configured to provide one or more of the frames of the video.
PCT/US2016/044143 2015-08-13 2016-07-27 Machine vision feature-tracking system WO2017027212A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510496055.0A CN106469443B (en) 2015-08-13 2015-08-13 Machine vision feature tracking system
CN201510496055.0 2015-08-13

Publications (1)

Publication Number Publication Date
WO2017027212A1 true WO2017027212A1 (en) 2017-02-16

Family

ID=56682259

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2016/044143 WO2017027212A1 (en) 2015-08-13 2016-07-27 Machine vision feature-tracking system

Country Status (2)

Country Link
CN (1) CN106469443B (en)
WO (1) WO2017027212A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6961363B2 (en) * 2017-03-06 2021-11-05 キヤノン株式会社 Information processing system, information processing method and program
CN108304755B (en) * 2017-03-08 2021-05-18 腾讯科技(深圳)有限公司 Training method and device of neural network model for image processing
US11023761B2 (en) * 2017-11-06 2021-06-01 EagleSens Systems Corporation Accurate ROI extraction aided by object tracking
US10572723B2 (en) * 2017-12-07 2020-02-25 Futurewei Technologies, Inc. Activity detection by joint human and object detection and tracking
WO2020001759A1 (en) * 2018-06-27 2020-01-02 Telefonaktiebolaget Lm Ericsson (Publ) Object tracking in real-time applications
CN109360199B (en) * 2018-10-15 2021-10-08 南京工业大学 Blind detection method of image repetition region based on Watherstein histogram Euclidean measurement
CN113228626B (en) * 2018-12-29 2023-04-07 浙江大华技术股份有限公司 Video monitoring system and method
CN110110670B (en) * 2019-05-09 2022-03-25 杭州电子科技大学 Data association method in pedestrian tracking based on Wasserstein measurement
US11087162B2 (en) * 2019-08-01 2021-08-10 Nvidia Corporation Determining relative regions of interest in images using object detection
CN111507355B (en) * 2020-04-17 2023-08-22 北京百度网讯科技有限公司 Character recognition method, device, equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104392461B (en) * 2014-12-17 2017-07-11 中山大学 A kind of video tracing method based on textural characteristics
CN104637062A (en) * 2015-02-17 2015-05-20 海南大学 Target tracking method based on particle filter integrating color and SURF (speeded up robust feature)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110142282A1 (en) * 2009-12-14 2011-06-16 Indian Institute Of Technology Bombay Visual object tracking with scale and orientation adaptation
WO2012141663A1 (en) * 2011-04-13 2012-10-18 Alptekin Temizel A method for individual tracking of multiple objects

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
BASIT ABDUL ET AL: "Fast target redetection for CAMSHIFT using back-projection and histogram matching", 2014 INTERNATIONAL CONFERENCE ON COMPUTER VISION THEORY AND APPLICATIONS (VISAPP), SCITEPRESS, vol. 3, 5 January 2014 (2014-01-05), pages 507 - 514, XP032792264 *
KENI BERNARDIN ET AL: "Automatic Person Detection and Tracking using Fuzzy Controlled Active Cameras", CVPR '07. IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION; 18-23 JUNE 2007; MINNEAPOLIS, MN, USA, IEEE, PISCATAWAY, NJ, USA, 1 June 2007 (2007-06-01), pages 1 - 8, XP031114732, ISBN: 978-1-4244-1179-5 *
LAN XIAOSONG ET AL: "A super-fast online face tracking system for video surveillance", 2016 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), IEEE, 22 May 2016 (2016-05-22), pages 1998 - 2001, XP032941974, DOI: 10.1109/ISCAS.2016.7538968 *
MARTIJN LIEM ET AL: "A hybrid algorithm for tracking and following people using a robotic dog", HUMAN-ROBOT INTERACTION (HRI), 2008 3RD ACM/IEEE INTERNATIONAL CONFERENCE ON, IEEE, 12 March 2008 (2008-03-12), pages 185 - 192, XP032209238, ISBN: 978-1-60558-017-3, DOI: 10.1145/1349822.1349847 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111612813A (en) * 2019-02-26 2020-09-01 北京海益同展信息科技有限公司 Face tracking method and device
US20220358661A1 (en) * 2019-07-05 2022-11-10 Nec Corporation Method, apparatus and non-transitory computer readable medium
WO2021135064A1 (en) * 2020-01-03 2021-07-08 平安科技(深圳)有限公司 Facial recognition method and apparatus, and computer device and storage medium
US20220080974A1 (en) * 2020-09-17 2022-03-17 Hyundai Motor Company Vehicle and method of controlling the same
US11713044B2 (en) * 2020-09-17 2023-08-01 Hyundai Motor Company Vehicle for estimation a state of the other vehicle using reference point of the other vehicle, and method of controlling the vehicle

Also Published As

Publication number Publication date
CN106469443A (en) 2017-03-01
CN106469443B (en) 2020-01-21

Similar Documents

Publication Publication Date Title
WO2017027212A1 (en) Machine vision feature-tracking system
US11037308B2 (en) Intelligent method for viewing surveillance videos with improved efficiency
US20200250840A1 (en) Shadow detection method and system for surveillance video image, and shadow removing method
US7123769B2 (en) Shot boundary detection
US9390511B2 (en) Temporally coherent segmentation of RGBt volumes with aid of noisy or incomplete auxiliary data
US10621730B2 (en) Missing feet recovery of a human object from an image sequence based on ground plane detection
EP2795904B1 (en) Method and system for color adjustment
WO2022179251A1 (en) Image processing method and apparatus, electronic device, and storage medium
CN106251348B (en) Self-adaptive multi-cue fusion background subtraction method for depth camera
Kim et al. Content-preserving video stitching method for multi-camera systems
CA2964966C (en) Video stabilization system and method
JP6138949B2 (en) Video processing device using difference camera
US20130027550A1 (en) Method and device for video surveillance
US11044399B2 (en) Video surveillance system
US7835552B2 (en) Image capturing apparatus and face area extraction method
TW201530495A (en) Method for tracking moving object and electronic apparatus using the same
JPWO2017029784A1 (en) Image registration system, method and recording medium
Low et al. Frame Based Object Detection--An Application for Traffic Monitoring
US9002056B2 (en) Image processing method and apparatus
CN108737814B (en) Video shot detection method based on dynamic mode decomposition
AU2011311387B2 (en) Image processing
Malathi et al. Multiple camera-based codebooks for object detection under sudden illumination change
US10306153B2 (en) Imaging apparatus, image sensor, and image processor
Chien et al. A 3D hand tracking design for gesture control in complex environments
KR20160024767A (en) Method for shot boundary detection, and image processing apparatus and method implementing the same method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16750562

Country of ref document: EP

Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16750562

Country of ref document: EP

Kind code of ref document: A1