CN106469443B - Machine vision feature tracking system - Google Patents


Info

Publication number
CN106469443B
CN106469443B
Authority
CN
China
Prior art keywords
frame
feature
region
video
candidate
Prior art date
Legal status
Active
Application number
CN201510496055.0A
Other languages
Chinese (zh)
Other versions
CN106469443A (en)
Inventor
Zhiwei Xiong (熊志伟)
W. Zeng (W·曾)
Chi Su (苏驰)
Current Assignee
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Technology Licensing LLC
Priority date
Filing date
Publication date
Application filed by Microsoft Technology Licensing LLC
Priority to CN201510496055.0A
Priority to PCT/US2016/044143 (WO2017027212A1)
Publication of CN106469443A
Application granted
Publication of CN106469443B

Classifications

    • G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments (G: Physics; G06: Computing; Calculating or Counting; G06T: Image data processing or generation, in general; G06T 7/00: Image analysis; G06T 7/20: Analysis of motion)
    • G06T 2207/10024: Color image (G06T 2207/00: Indexing scheme for image analysis or image enhancement; G06T 2207/10: Image acquisition modality)
    • G06T 2207/10028: Range image; Depth image; 3D point clouds (G06T 2207/10: Image acquisition modality)
    • G06T 2207/30196: Human being; Person (G06T 2207/30: Subject of image; Context of image processing)
    • G06T 2207/30201: Face (G06T 2207/30196: Human being; Person)

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A machine vision feature tracking system is described. Techniques and constructs are capable of locating features in frames of a video. The tracking module may generate a candidate feature location map based at least in part on the frames of the video and a previous feature location map. The features may correspond to at least some image data of the frame of the video. The update module may determine a candidate feature region based at least in part on the candidate feature location map and locate the feature at the candidate feature region by comparing the histogram of the test region in the frame of the video to the histogram of the corresponding region in the previous frame of the video. In some examples, the test region is determined based at least in part on the candidate feature region. The image sensor may provide one or more of the frames of the video. The tracked feature representation may be displayed.

Description

Machine vision feature tracking system
Technical Field
The present application relates to tracking systems, and more particularly to machine vision feature tracking systems.
Background
Machine vision techniques for automatically identifying objects or features in digital images and videos are used for a wide variety of purposes. Visible features in video frames, such as faces or license plates, may move, rotate, or become shadowed or otherwise occluded as they are captured over multiple frames of the video, for example, as the features (e.g., objects) move. Such events may reduce the accuracy of machine vision algorithms such as object localization, orientation detection, face detection, and tracking.
Disclosure of Invention
Systems, methods, and computer-readable media for tracking features over multiple frames of a video are described. As used herein, "tracking" refers to determining, in a subsequent frame, the location of a feature detected in a previous frame. In some examples, a computing system may generate a candidate feature location map based at least in part on a frame of a video and a previous feature location map. The computing system may determine candidate feature regions based at least in part on the candidate feature location map. The computing system may then locate features at the candidate feature regions in the frame of the video by comparing the histogram of the test region in the frame of the video to the histogram of the corresponding region in the previous frame of the video. According to example techniques described herein, a test region may be determined based at least in part on a candidate feature region. According to example techniques described herein, features may be tracked on video from an image sensor and an indication of the tracked features may be presented, for example, via an electronic display.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. For example, the term "technique" may refer to systems, methods, computer-readable instructions, modules, algorithms, hardware logic, and/or operations as permitted by the context described above and throughout the document.
Drawings
The same numbers are used throughout the drawings to reference like features and components. The figures are not necessarily to scale.
FIG. 1 is a block diagram depicting an example environment for implementing feature tracking in videos described herein.
FIG. 2 is a graphical representation of an example frame of a video in which features are tracked.
FIG. 3 shows an example histogram of an image data region according to an example of feature tracking described herein.
FIG. 4 is a data flow diagram depicting example module interactions during feature tracking.
FIG. 5 is a flow diagram illustrating an example process for tracking features in video frames.
FIG. 6 is a flow diagram illustrating an example process for tracking and detecting features in video frames.
FIG. 7 is a flow diagram illustrating an example process for tracking features in a video frame across visibility reductions.
FIG. 8 is a block diagram depicting an example computing device configured to participate in feature tracking according to examples described herein.
Detailed Description
Overview
Examples described herein provide techniques and constructs to track features in digital video. These techniques may enable feature tracking across multiple frames of a video with increased speed, increased accuracy, reduced computation time, and/or reduced memory requirements. The tracking may also allow additional techniques to be performed, such as obscuring or highlighting features in the video (such as faces) or selecting relevant portions of the video for further analysis, e.g., character recognition of a license plate.
Some examples described herein provide improved performance over conventional tracking algorithms. Some existing schemes re-detect features in each frame of the video. Some examples herein detect features in one frame in a grouping of multiple frames and then track features between frames, thereby reducing computational requirements for tracking. Some examples allow features to be tracked even if they do not appear in some frames (e.g., because they are occluded behind other features).
As used herein, the term "video" refers to a time series of digital images. The images may be regularly spaced in time (e.g., 29.97, 30, or 60 frames per second (fps)) or may be irregularly spaced in time. Video may be stored and manipulated, for example, in uncompressed form or compressed form. Example compression techniques may include, but are not limited to, those described in the Portable Network Graphics (PNG), Joint Photographic Experts Group (JPEG), motion JPEG, and Motion Picture Experts Group (MPEG) standards.
Some example scenarios and example techniques for feature tracking are presented in more detail in the following description with respect to the figures.
Illustrative Environment
FIG. 1 shows an example environment 100 in which examples of a feature tracking system may operate or in which a feature tracking method such as that described below may be performed. In the illustrated example, various devices and/or components of environment 100 include computing device 102, depicted as a server computer. Computing device 102 represents any type of device that can receive and process video content. By way of example and not limitation, computing device 102 may be implemented as an internet-enabled television, a television set-top box, a gaming console, a desktop computer, a laptop computer, a tablet computer, or a smartphone. Different devices or different types of devices may have different uses for the feature tracking data. For example, controllers of robots or industrial machines may use feature tracking information from videos of their workplace to determine the position of a workpiece during a movement or work step. Surveillance or other video recording systems operating in public spaces, for example, may use feature tracking data to highlight suspicious objects or obscure the faces of people in videos to protect their privacy.
In the illustrated example, the computing device 102 includes or is communicatively connected with an image source 104 depicted as a camera. The image source 104 may include one or more computing devices or other systems configured to provide video frames 106. In the illustrated example, the image source 104 provides video of a scene 108. The video has a plurality of video frames 106. Scene 108 includes a subject 110, in this example a person. This example scenario 108 is for illustration purposes, not limitation.
In some examples, the computing device 102 or the image source 104 may include, for example, one or more sensors 112 configured to capture video frames 106 or otherwise provide video frames 106 or data that can be processed into video frames 106. For example, the image sensor 112 may be configured to provide video having a plurality of frames 106. Example image sensors may include smartphone front and back cameras, light sensors (e.g., CdS photoresistors or phototransistors), photo imagers (e.g., Charge Coupled Devices (CCDs), Complementary Metal Oxide Semiconductor (CMOS) sensors, etc.), video imagers (e.g., CCDs or CMOS), fingerprint readers, retina scanners, iris scanners, computed radiography scanners, and so forth. Some example sensors 112 may include visible light image sensors (e.g., λ ∈ [400 nm (nanometers), 700 nm]) or infrared light image sensors (e.g., λ ∈ [700 nm, 15 μm], or λ ∈ [700 nm, 1 mm]). Each of the one or more sensors 112 may be configured to output sensor data corresponding to at least one physical characteristic (e.g., a physical characteristic of the environment of the device), such as ambient light or a scene image or a video frame.
The image data representing the subject 110 in one or more of the video frame(s) 106 has a feature 114, in this example a human face. In addition to faces, other example features may include, but are not limited to, identification indicia (e.g., a license plate or parking pass on a vehicle, a boat registration mark on a boat hull or sail, an airplane tail number (e.g., "N38290" or "B-8888")), a street address number displayed on a building or mailbox, a pattern on clothing (e.g., a cartoon face, a fanciful pattern, or a business or government logo on a T-shirt), or clothing, accessories, or vehicles having a color or pattern that is readily distinguishable from the background of the scene 108, such as a red umbrella, a black hat, or a striped skirt. Various aspects herein track a face or other feature 114 across multiple video frames 106.
The computing device 102 includes at least one processor 116 and a memory 118 configured to store, for example, processed video frame(s) (106) or other video data. The memory 118 may also store one or more of the following: a detection component 120, a tracking component 122, and a reporting component 124 stored in the memory 118 and executable on the processor 116. The components 120, 122, or 124 may include, for example, modules stored on a computer-readable medium, such as a computer storage medium (discussed below), and having computer-executable instructions thereon. The details of the example computing device 102 may be representative of other computing devices 102 or the image source 104. However, each of the computing device 102 or the image source 104 may include additional or alternative hardware and/or software components.
The detection component 120 (or modules thereof, as such herein) can be configured to locate a feature 114 (e.g., a face) in one or more video frames 106. The tracking component 122 can be configured to track the features 114 across multiple video frames 106, for example, to determine how a given feature moves between two consecutive frames, or to activate or deactivate a tracker for a particular feature based on shadows, occlusions, or other changes in the visibility of the feature. The reporting component 124 may be configured to provide, for example, a visual representation of the features, such as faces, located in the frame(s) 106 of the video. In some examples, the reporting component 124 may be configured to render at least some of the frames 106 and the tracking indications for display via the display device 126. The display device 126 may include an Organic Light Emitting Diode (OLED) display, a Liquid Crystal Display (LCD), a Cathode Ray Tube (CRT), or another type of visual display. The display device 126 may be a component of, or may include, a touch screen.
In some examples, the reporting component 124 may provide a visual representation 128 of the video frames or feature tracking data via a display device 126, as discussed below. In some examples, the reporting component 124 may present for display a visual representation 128 of one or more frames 106 of the video and at least some of the tracked ones of the located features. Example visual representations 128(1)-128(3) (individually or collectively referred to herein by reference numeral 128) are shown in FIG. 1. The representation 128 may represent a possible graphical display of tracking and video information and may represent internal data structures discussed below.
Representation 128(1) represents the video frame 106 at a first time t=0. The video frame at t=0 includes image data of the human subject 110. Time may be represented, for example, as wall clock time (hours, minutes, or seconds), frames or fields (e.g., as represented by a time code standardized by the Society of Motion Picture and Television Engineers (SMPTE)), or any combination thereof. A tracker 130(1) is shown highlighting the feature of interest (in this example the face of the subject 110). As described herein, a tracker is a data structure that represents a particular feature across one or more of the video frames 106, and may be represented graphically as a highlight (such as an outline (shown)) or as a blur (such as a darkened or blurred region). Other example types of highlighting and obfuscation are discussed below.
Representation 128(2) represents the video frame 106 at a later time t=1. With respect to the video frame at time t=0, the subject 110 has moved to the right of the frame 106 (to the subject's left). Similarly, tracker 130(2) has moved to the right of frame 106 to indicate the location of the feature across the video frames 106 at times t=0 and t=1.
Representation 128(3) represents the video frame 106 at a later time t=2. The subject 110 has moved further to the right of the frame 106 (to the subject's left) and has begun moving out of the frame 106. Tracker 130(3) also moves further to the right of frame 106 to indicate the location of the feature across the video frames 106 at times t=0, 1, and 2. The tracker 130(3) is able to track the feature in this example even though the feature is partially occluded by the edge of the frame 106.
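As one illustration of the kind of state such a tracker might carry across frames, the following is a minimal Python sketch; the field names and types are assumptions for illustration and are not the patent's data structure:

    # Hypothetical sketch of per-feature tracker state; the fields shown are illustrative only.
    from dataclasses import dataclass
    from typing import Optional, Tuple
    import numpy as np

    @dataclass
    class Tracker:
        tracker_id: int
        region: Tuple[int, int, int, int]          # (x, y, width, height) of the tracked region
        histogram: Optional[np.ndarray] = None     # color histogram of the tracked region
        in_frame: bool = True                      # False marks an out-of-frame tracker (cf. 208)
        last_seen_frame: int = 0                   # index of the last frame where the feature was located

    # Example: a tracker created from a detection in frame 0.
    t = Tracker(tracker_id=1, region=(120, 40, 64, 64), last_seen_frame=0)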
In the illustrated example environment 100, the detection component 120, tracking component 122, and reporting component 124 can each be a component of an operating system (not shown) or otherwise stored or executed locally on the computing device 102 or image source 104. However, in other examples, one or more of these system components may be implemented as part of a distributed system, such as part of a cloud-based service or a component of another application. For example, the image source 104 may be embodied in a smartphone and the computing device 102 can be embodied in a cloud or other hosted feature tracking service.
In some examples, the image source 104 is embodied in the computing device 102 or connected to the computing device 102. For example, the image source 104 and the computing device 102 may be embodied in a smart phone, tablet, MICROSOFT SURFACE, APPLE IPAD, or other device configured to capture video and track features therein or display tracking information for features therein. This can advantageously allow tracking features even if not connected to a network. In some examples, a processor (not shown) of the image source 104 may communicate with the computing device 102 via a network 132, such as the internet. The computing device 102 may host one or more of the detection component 120, tracking component 122, and reporting component 124, and may exchange data with the image source 104 via the network 132 to perform the processing described herein. Network 132 may include a cable television network, Radio Frequency (RF) microwave, satellite, and/or data network such as the internet, and may also support wired or wireless media using any format and/or protocol, such as broadcast, unicast, or multicast. Additionally, the network 132 may be any type of network (wired or wireless) using any type of network topology and any network communication protocol, and may be represented or otherwise implemented as a combination of two or more networks.
Illustrative processing
FIG. 2 shows an image representation 200 of an example frame 106 of a video of, e.g., a scene 108, in which features are tracked, e.g., as discussed herein. FIG. 2 shows representations of four frames 202(1)-202(4) (individually or collectively referred to herein by reference numeral 202). The dotted or short-dashed rectangles represent trackers 204(1)-204(3) (individually or collectively referred to herein by reference numeral 204). Trackers 204(1)-204(3) correspond to respective tracked regions of the frames 202. The frames 202 show a total of three different people, namely subjects 206(1)-206(3) (individually or collectively referred to herein by reference numeral 206). For purposes of illustration, and not limitation, the trackers 204 are numbered to correspond to the three subjects 206.
Between frames 202(1) and 202(2), subject 206(1) (depicted as a man) moves to the right of the frame and subject 206(2) (depicted as a woman) moves to the left of the frame, closer to the camera than subject 206 (1). Trackers 204(1) and 204(2) highlight and represent detected or tracked features, in this example faces of subjects 206(1) and 206(2), respectively. The initial positions of the trackers 204(1) and 204(2) may be determined, for example, by the detection component 120.
In frame 202(2), the subject 206(2) has moved in front of the subject 206(1) (from the perspective of the camera). The tracking component 122 determines that the new location of the tracker 204(2) corresponds to the new (tracked) location of the face of the subject 206(2). However, the face of the subject 206(1) is occluded. The tracking component 122 detects this occlusion and designates the tracker 204(1) as an out-of-frame tracker 208, graphically represented below the frame 202(2) for purposes of explanation. That is, the feature tracked by tracker 204(1) is not visible in frame 202(2). As used herein, the term "in-frame tracker" refers to a tracker that is not an out-of-frame tracker 208.
Inset 210 shows an enlarged view of the faces of subjects 206(1) and 206(2) in frame 202 (2). Tracker 204(2) indicates the tracked portion of the face of subject 206 (2). The candidate feature region 212 displays regions that are determined, for example, by the tracking component 122, to correspond to the tracked regions of the tracker 204(1) in frame 202 (1). The tracking component 122 may locate the feature of interest 114 in the candidate feature region 212 by comparing the histogram of the test region (e.g., candidate feature region 212) in the frame 202(2) of the video to the histogram of the corresponding region in the previous frame 202(1) of the video, or determine that the feature of interest 114 is not visible in the candidate feature region 212.
In this example, the tracking component 122 determines the histogram 214 as an example histogram of the tracked region of the tracker 204(1) in frame 202(1). For clarity of explanation, and not limitation, histogram 214 shows the percentages of black ("K"), brown ("Br"), and skin tone ("Fl") in the tracked region of tracker 204(1), e.g., 25% K, 15% Br, and 60% Fl. In various examples, the histogram includes binned data of various color components of pixels in a region of the frame, e.g., as discussed below with reference to equation (2). The particular colors represented in the histogram typically correspond to a color space (e.g., a particular number of red, green, and blue colors in an RGB color space). Black, brown, and skin tones are used in this example only to simplify the explanation. However, in an example implementation, the brown, black, and skin tones are actually represented as, for example, red, green, and blue information or a combination of hue, saturation, and value information. Also in this example, the tracking component 122 determines a histogram 216 of the candidate feature region 212 in frame 202(2). In the example shown, subject 206(1) has a relatively small amount of black hair, and subject 206(2) has a relatively large amount of brown hair. As shown, the histogram 216 is different from the histogram 214 because the candidate feature region 212 includes some of the brown hair of the subject 206(2). In this example, histogram 216 has 20% K, 50% Br, and 30% Fl.
The tracking component 122 can compare the histograms, for example, by calculating differences or distances between the histograms. In the illustrated example, the tracking component 122 determines a distance (or difference, as used herein) between the histogram 214 and the histogram 216. The tracking component 122 determines that the distance is greater than the selected threshold and thus that the candidate feature region 212 does not correspond to the tracked feature 114, in this example the face of the subject 206 (1). The tracking component 122 thereby marks or otherwise designates the tracker 204(1) in frame 202(2) as the out-of-frame tracker 208, as shown.
Between frames 202(2) and 202(3), the subject 206(2) has moved to the left (from the camera's perspective) of the subject 206 (1). The tracking component 122 determines an updated position of the tracker 204(2) corresponding to the tracked position of the face of the subject 206 (2). In some examples, the detection component 120 may detect features (e.g., faces) in the frame 202(3) due to the presence of the out-of-frame tracker 208 in the previous frame 202 (2). In the illustrated example, the detection component 120 detects the face of the subject 206(1) that is no longer occluded behind the subject 206 (2). The tracking component 122 may compare the newly detected face of the subject 206(1) to the intra-frame tracker, the out-of-frame tracker 208, or any combination thereof. For example, the tracking component 122 can compare the stored location information of the out-of-frame tracker(s) 208, such as tracker 204(1), to the location information of the detected feature(s). The tracking component 122 may additionally or alternatively compare the histogram of the newly found feature 114 to the histogram of the intra-frame tracker or the extra-frame tracker 208. In the illustrated example, the tracking component 122 determines that the tracker 204(1) corresponds to the detected face of the subject 206(1) in frame 202 (3).
In some examples, the tracking component 122 may additionally or alternatively track features based at least in part on test regions other than the region being tracked. This may allow, for example, image data of a garment worn by the subject to be used instead of, or in addition to, image data of the subject's face to distinguish between the subject's faces. Since clothing may vary more person-to-person than skin color, using clothing may improve the accuracy of tracking a face, for example, when one person passes in front of another ("face-interlacing") (e.g., as shown in frames 202(1) -202 (3)) or when a face disappears and then reappears ("occlusion") (e.g., as shown in frames 202(2) and 202 (3)). This may also provide for increased accuracy in tracking the license plate of the vehicle, for example. For example, vehicle colors and bumper sticker colors and patterns may vary more from vehicle to vehicle than license plate colors. Thus, tracking based at least in part on image data of portions of a bumper of a vehicle in addition to (or instead of) image data of portions of a license plate of the vehicle may allow for more accurate tracking of multiple vehicles in the scene 108. For clarity of explanation, only one test area is described herein, except for the area being tracked. However, any number of test regions may be used for a given tracked region. The number of test regions for a given tracked region may vary between frames 202. The tracking component 122 may determine the test region based on a candidate feature region (such as candidate feature region 212), based on a tracked region (such as for tracker 204), or any combination thereof.
In the illustrated example, the tracking component 122 can determine the test area 218 in frame 202(1) based at least in part on the tracked area, e.g., tracker 204 (1). The tracking component 122 may also determine respective test regions 220(1) and 220(2) in the frame 202(3) based at least in part on the tracked regions of the trackers 204(1) and 204(2) in the frame 202(3) (e.g., the locations of the respective detected faces of the subjects 206(1), 206 (2)). The histograms of these regions may be used in tracking, as discussed below with reference to fig. 3. The portions of frames 202(1) -202 (4) displayed by trackers 204(1) and 204(2) may be, for example, candidate feature regions or test regions as discussed herein with reference to tracking component 122. For example, if subjects 206(1) and 206(2) have very similar skin and hair colors, then test areas 220(1) and 220(2) corresponding to the subject's clothing may be used to distinguish subjects from each other.
In frame 202(4), a new subject 206(3) appears in the scene 108. The detection component 120 locates new features in frame 202(4), such as the face of the newly appearing subject 206 (3). The tracking component 122 may determine that the new feature does not correspond to an existing tracker (e.g., the out-of-frame tracker 208), for example, using the comparison of position, histogram, or other data described herein, and may thereby assign a new tracker 204(3) that is initialized with the detection results. In this example, for purposes of illustration, the visual representation of tracker 204(3) is shown as a blur that shadows the face of subject 206 (3). Other examples of obfuscation are discussed herein.
In some examples, the tracking component 122 may use an assignment algorithm to determine a correspondence between the detected features 114 and the trackers 204. Examples of detection of features are discussed above with reference to frames 202(1) and 202(4). An assignment algorithm, such as the Hungarian algorithm or other dynamic programming or combinatorial optimization algorithms, uses scores, e.g., the difference or distance between a given tracker (in-frame or out-of-frame) and a given detected feature. For example, the score may be a histogram distance between image data in the tracked region and image data in the region containing the detected feature; the rate of overlap between the tracker and the feature, discussed below with reference to equation (3); or a combination thereof, e.g., a weighted sum.
The assignment algorithm may determine a correspondence between a given set of detected features and a given set of trackers (e.g., intra-frame, out-of-frame, or both) such that the score is mathematically maximized (e.g., for a fitness score) or minimized (e.g., for a distance score). Other mathematical optimization algorithms (e.g., gradient descent algorithms) may also or alternatively be used. Other algorithms may be used, such as partitioning the image into regions and performing assignments or other tests on trackers and detected features in the same region.
In some examples, both the intra-frame tracker and the out-of-frame tracker are provided to an allocation algorithm (or other mathematical optimization technique). An algorithm executing on the processor 116, for example, may determine a mapping between the tracker and the detected features. The mapping may indicate that some trackers do not correspond to any detected features. In response to such mapping, these trackers may then be designated as out-of-frame trackers 208.
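As one concrete illustration of such an assignment step, the following minimal Python sketch uses the Hungarian algorithm as implemented in SciPy's linear_sum_assignment; the particular cost definition (a weighted sum of histogram distance and one minus the overlap rate) and the thresholds are assumptions for illustration, not the patent's exact score:

    # Hypothetical matching of detections to trackers; the cost weighting and cutoff are illustrative.
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def match_detections_to_trackers(hist_dist, overlap, w_hist=0.5, w_overlap=0.5, max_cost=0.8):
        """hist_dist and overlap are (num_trackers x num_detections) matrices."""
        cost = w_hist * hist_dist + w_overlap * (1.0 - overlap)
        rows, cols = linear_sum_assignment(cost)      # Hungarian algorithm: minimizes total cost
        matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_cost]
        matched_trackers = {r for r, _ in matches}
        unmatched_trackers = set(range(cost.shape[0])) - matched_trackers      # may become out-of-frame
        unmatched_detections = set(range(cost.shape[1])) - {c for _, c in matches}  # may start new trackers
        return matches, unmatched_trackers, unmatched_detections

    # Example with two trackers and two detections.
    hd = np.array([[0.1, 0.9], [0.8, 0.2]])
    ov = np.array([[0.7, 0.0], [0.1, 0.6]])
    print(match_detections_to_trackers(hd, ov))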
FIG. 3 shows an example histogram 300 of the example test area discussed above with reference to FIG. 2. As discussed above, in some examples, a test region (e.g., a clothing region) is used in the cross-occlusion tracking feature 114 instead of a feature region (e.g., a face region), for example. Referring to both fig. 2 and 3, in frame 202(1), the tracking component 122 may determine the test area 218 based at least in part on, for example, the location information or other characteristics of the tracker 204(1) in frame 202(1) or the detected features represented by the tracker 204(1) in frame 202 (1). The tracking component 122 may also determine the test areas 220(1) and 220(2) based at least in part on, for example, the location information or other characteristics of the trackers 204(1) and 204(2) in frame 202(3) or the detected features represented by the trackers 204(1) and 204(2) in frame 202 (3).
In the example shown, subject 206(1) wears a white ("Wh") shirt and subject 206(2) wears a red ("R") skirt. Thus, the illustrated respective histograms 302(1) and 302(2) of the second test areas 220(1) and 220(2) show 5% R / 95% Wh and 100% R / 0% Wh, respectively. Histogram 304 of test area 218 shows 0% R / 100% Wh. As mentioned above with reference to FIG. 2, the example histograms are shown for purposes of explanation, not limitation. In one example, instead of simple red/white bins, the histogram may include equal-sized hue bins centered at 0°, 15°, ..., 360°, equal-sized saturation bins centered at 0%, 10%, ..., 100%, and equal-sized value bins centered at 0%, 10%, ..., 100%. In this example, red may be represented as a histogram peak at a hue near 0°, a saturation near 100%, and a value of, e.g., 50% or higher. White can be represented as pixels with a histogram peak near 0% saturation and near 100% value.
The tracking component 122 may calculate the histogram distance and determine that the tracker 204(1) should be associated with an area above the test area 220(1) rather than an area above the test area 220(2) because the distance between the histogram 304 (corresponding to the test area 218) and the histogram 302(1) (corresponding to the test area 220(1)) is less than the distance between the histogram 304 and the histogram 302(2) (corresponding to the test area 220 (2)). That is, the features tracked by tracker 204(1) in frame 202(3) correspond to portions near test area 220(1), but not to portions near test area 220(2), as shown. In the example shown, the result is that even if the subject 206(1) is occluded (frame 202(2)) and subsequently re-appears (frame 202(3)), the tracker 204(1) remains associated with the subject 206 (1).
Thus, in some examples, the tracking component 122 may locate the feature (indicated by the tracker 204 (2)) at the candidate feature region in the frame 202(3) of the video by comparing the histogram 302(1) of the second test region 220(1) in the frame 202(3) of the video with the histogram 304 of the corresponding test region 218 in the previous frame 202(1) of the video.
In some examples, the tracking component 122 may locate a feature if the histograms of the test regions correspond, or if the histograms of the candidate regions correspond, or if the histograms of both the test regions and the candidate regions correspond, or if the histograms of the majority of the histograms (the candidate region and any of the test region (s)) correspond, or if a selected number of the histograms correspond, or any combination thereof. In some examples, the tracking component 122 may locate a feature if no histogram, or a few histograms, or less than a selected number of histograms do not correspond or have a distance that exceeds a selected threshold, or any combination thereof, or any combination of any of the items in this paragraph.
FIG. 4 is a data flow diagram 400 illustrating example interactions between the components shown in FIG. 1 and showing example modules of the tracking component 122. The modules of the tracking component 122 (e.g., stored in the memory 118) may include one or more modules, such as a shell module or an Application Programming Interface (API) module, which are shown as a tracking module 402 and an update module 404. In some examples, the image source 104 provides the video frame(s) 106 to be processed by, for example, the detection component 120 or the tracking component 122. The tracking component 122 may track the features 114 in the frames 106 of the video, for example, using the tracking information or a color histogram of the candidate region. In some examples, the tracking component 122 is configured to update the tracking results from a Kanade-Lucas-Tomasi (KLT) tracker based at least in part on a histogram of image content of the selected regions of the frame 106, as described herein.
In some examples, the detection component 120 may be configured to locate the feature(s) 114 in the first frame 106 of the video or other frame(s) of the video. In some examples, the detection component 120 may be configured to generate the detected feature location map 406 based at least in part on the frames 106 of the video. The detected feature location map 406 may include, for example, the location or bounding rectangle or other bounding portion of one or more detected features 114 in the frame 106. This can be achieved, for example, using an algorithm implemented in the MICROSOFT Face SDK. Such algorithms may include, but are not limited to, dynamic cascades, boosted classifiers, boosting chain learning, neural network classification, Support Vector Machine (SVM) classifiers, or Bayesian classification. One example algorithm is the Viola-Jones detector, which uses adaptive boosting (AdaBoost) to cascade sub-detectors based on rectangular Haar-like features. In some examples, the tracking module 402 may be configured to generate a candidate feature location map 408 based at least in part on the subsequent frames 106 of the video and the detected feature location map 406. The features may correspond to at least some image data of the frame of the video. The tracking module 402 may further generate the candidate feature location map 408 based on a previous frame 106 of the video, a corresponding previous feature location map, such as the previous frame feature location map 410 (e.g., provided by the detection component 120 or the tracking module 402), or a combination thereof. This may be accomplished, for example, using KLT or another tracking technique, such as a Scaled Fast Compressive Tracking (SFCT) tracker or a Struck tracker. For example, KLT may be applied to the previous frame 106 and the subsequent frame 106. The candidate feature location map 408 may include, for example, zero, one, or more candidate portions for each feature located in the detected feature location map 406. In some examples, tracking module 402 may use a tracker that is robust to scale changes (such as KLT). This may allow the feature 114 to be tracked as the subject 110 approaches or moves away from the camera.
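A minimal OpenCV-based Python sketch of this detect-then-track step is shown below; it assumes a Haar-cascade face detector standing in for the detection algorithms named above and pyramidal Lucas-Kanade (calcOpticalFlowPyrLK) standing in for the KLT tracker, with illustrative grid sampling and parameters:

    # Hypothetical detect-then-track sketch; detector choice, grid sampling, and parameters are illustrative.
    import cv2
    import numpy as np

    def detect_feature_regions(frame_bgr):
        """Detected feature location map: a list of (x, y, w, h) face regions."""
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
        return [tuple(r) for r in cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)]

    def track_region_klt(prev_bgr, next_bgr, region, grid=10):
        """Candidate feature location map entry: points of `region` tracked into the next frame."""
        x, y, w, h = region
        xs = np.linspace(x, x + w, grid)
        ys = np.linspace(y, y + h, grid)
        pts = np.array([[px, py] for py in ys for px in xs], dtype=np.float32).reshape(-1, 1, 2)
        prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
        next_gray = cv2.cvtColor(next_bgr, cv2.COLOR_BGR2GRAY)
        tracked, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None,
                                                      winSize=(21, 21), maxLevel=3)
        ok = status.reshape(-1) == 1
        return pts.reshape(-1, 2)[ok], tracked.reshape(-1, 2)[ok]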
In some examples, the detected feature location map 406 corresponds to a first video frame, and the candidate feature location map 408 corresponds to a second video frame that immediately follows the first video frame. In some examples, the detected feature location map 406 is a previous feature location map corresponding to a previous video frame arranged in the video immediately preceding the frame of the video for which the candidate feature location map 408 was determined. Examples of such configurations are discussed above with reference to frames 202(1) and 202(2) of fig. 2.
In some examples, tracking module 402 uses a KLT tracker. In some examples, the KLT tracker uses mathematical optimization to determine regions in later frames that correspond to regions in earlier frames. The sum of the squared intensity differences between the pixels of a region in the later frame and the pixels of the selected region of the earlier frame is mathematically minimized. That region of the later frame is then the tracked equivalent of the region of the earlier frame. For example, the region of an earlier frame (e.g., the first frame) may include image data of a feature 114 (e.g., a face). This region in the earlier frame and the corresponding region of the later frame (e.g., determined via mathematical minimization according to the KLT algorithm) may be the region of the tracker 130 of the feature 114 in the respective frame. The region in the later frame may be an example of the candidate feature location map 408. In some examples, the KLT or another tracker may be calculated based at least in part on a selected subset of points in a region of an earlier frame.
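Stated compactly, and under the simplifying assumption that the tracker estimates a pure translation d of a window W between frame t and frame t+1, the minimization described above can be written as:

    d* = argmin over d of  Σ_{x ∈ W} [ I_{t+1}(x + d) − I_t(x) ]²

where I_t(x) is the image intensity at pixel location x in frame t. In practice, KLT solves this by gradient-based optimization and may also estimate affine deformation; the translation-only form is shown here only for illustration.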
In some examples, the tracking module 402 may be configured to track the located feature across subsequent second frames of the video. For example, in the sequence of frames 1, 2, ..., the tracking module 402 may use KLT or another tracking technique to track features from frame 1 to frame 2, then from frame 2 to frame 3, and so on. The tracking module 402 may be configured to determine that the histogram of the first feature region of a particular one of the second frames does not correspond to the histogram of the first feature region of the first frame. In response, the tracking module 402 may record an indication that the tracked feature corresponding to the first feature region has moved out of the frame (e.g., out of at least one of the second frames). This is discussed in more detail above with respect to the out-of-frame tracker 208. This may allow features to be tracked even if they are occluded (e.g., for a small number of frames).
In some examples, the update module 404 may be configured to determine candidate feature points based at least in part on the candidate feature location map 408. For example, the update module 404 may uniformly or non-uniformly select a predetermined number of points, such as a 10 x 10 grid, from a given region in the candidate feature location map 408. In some examples, the update module 404 may select a predetermined number of points for a given region, the selected points having relatively high-magnitude gradients within the region, e.g., for an integer n, the n points having the highest-magnitude gradients within the region. In some examples, the update module 404 may select points having local maxima of curvature, e.g., as measured by a Hessian matrix or by the determinant of an approximation of the Hessian matrix computed using Haar filters. In some examples, the update module 404 may select the points by, for example, decomposing and downsampling using a Haar wavelet and selecting points in the decomposed image that have a selected amplitude (e.g., within a certain percentage of the highest amplitude value after the wavelet transform).
In some examples, update module 404 may apply a tracker (such as KLT) or use results from tracking module 402 to bidirectionally track selected points, e.g., from a first frame to a second frame, and then back from the tracked points in the second frame to the first frame. The starting point and the corresponding point in the first frame after bidirectional tracking may not be the same point; the distance between them is called the forward-backward error (FBE). In some examples, the update module 404 may determine FBEs for various ones of the selected points. The update module 404 may then determine a representative FBE value, such as a median or average of at least some of the determined FBEs, or of all of the determined FBEs, or of a selected percentage of any of these. In some examples, the update module 404 may then determine one or more of the points that satisfy a selected rule (e.g., have a respective FBE that is less than the determined representative FBE value) as candidate feature points. This may advantageously reduce noise in the tracking, e.g., due to changes in shading or illumination between one frame and the next.
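A minimal Python sketch of this bidirectional check is shown below; it assumes pyramidal Lucas-Kanade for both tracking directions and a below-median FBE cutoff, with illustrative parameters:

    # Hypothetical forward-backward error (FBE) filtering; the cutoff rule is illustrative.
    import cv2
    import numpy as np

    def filter_points_by_fbe(prev_gray, next_gray, pts):
        """pts: float32 array of shape (N, 1, 2). Returns point pairs whose FBE is below the median."""
        fwd, st1, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None)   # frame 1 -> frame 2
        back, st2, _ = cv2.calcOpticalFlowPyrLK(next_gray, prev_gray, fwd, None)  # frame 2 -> frame 1
        fbe = np.linalg.norm((pts - back).reshape(-1, 2), axis=1)                 # round-trip distance
        ok = (st1.reshape(-1) == 1) & (st2.reshape(-1) == 1)
        keep = ok & (fbe < np.median(fbe[ok]))                                    # keep below-median FBE
        return pts.reshape(-1, 2)[keep], fwd.reshape(-1, 2)[keep]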
In some examples, the update module 404 may be configured to determine candidate feature regions based at least in part on the determined candidate feature points. In some examples, the update module 404 may determine, for one or more of the candidate feature points, a point displacement vector between the point in the earlier frame and the tracked point in the later frame. The update module 404 may then determine a region displacement vector as, for example, a median or average of one or more of the point displacement vectors. In some examples, the update module 404 may determine a comparative scale factor for each pair of candidate feature points as the quotient of the distance between the tracked points of the pair in the later frame and the distance between the points of the pair in the earlier frame. The update module 404 may then determine a region scaling factor as, for example, a median or average of one or more of the scale factors. In some examples, the update module 404 may then apply the region displacement vector and the region scaling factor to a region of the previous frame feature location map 410 to determine a corresponding region in the candidate feature location map 408.
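A minimal Python sketch of updating a region from the surviving point pairs is shown below; the median-based displacement and scale estimates are one of the options described above, and the helper name is illustrative:

    # Hypothetical region update from tracked candidate feature points; assumes matched (N, 2) point arrays.
    import numpy as np

    def update_region(region, pts_prev, pts_next):
        """region: (x, y, w, h); pts_prev/pts_next: matched points in the earlier and later frames."""
        disp = np.median(pts_next - pts_prev, axis=0)                    # region displacement vector
        i, j = np.triu_indices(len(pts_prev), k=1)                       # all point pairs
        if len(i):
            d_prev = np.linalg.norm(pts_prev[i] - pts_prev[j], axis=1)
            d_next = np.linalg.norm(pts_next[i] - pts_next[j], axis=1)
            valid = d_prev > 0
            scale = float(np.median(d_next[valid] / d_prev[valid])) if valid.any() else 1.0
        else:
            scale = 1.0                                                  # too few points: keep the old scale
        x, y, w, h = region
        cx, cy = x + w / 2 + disp[0], y + h / 2 + disp[1]                # shift the region center
        w2, h2 = w * scale, h * scale                                    # rescale the region
        return (cx - w2 / 2, cy - h2 / 2, w2, h2)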
In some examples, the update module 404 may be configured to determine candidate feature regions based at least in part on the FBE values. For example, for any particular candidate feature region, if the median (or average or other representative) FBE is greater than a selected threshold, the update module 404 may, for example, ignore that candidate feature region or consider the tracker 130 corresponding to that candidate feature region to be an out-of-frame tracker, as discussed above with reference to the out-of-frame tracker 208 of fig. 2. This may allow detection of tracking failures, such as occlusion of facial features due to an umbrella or occlusion of license plate features due to a building.
In some examples, the update module 404 may be configured to locate the feature 114 at the candidate feature region in the frame of the video (or at least partially in, on, within, over, or near the candidate feature region, and likewise throughout) by comparing the histogram of the test region in the frame of the video to the histogram of the corresponding region in the previous frame of the video. This may be accomplished, for example, as discussed above with reference to histograms 214 and 216 shown in FIG. 2 and histograms 302(1), 302(2), and 304 shown in FIG. 3. In some examples, the update module 404 may be configured to locate the feature 114 if a histogram of image data of a portion of the video frame 106 defined by the candidate feature region is within a selected distance of a previously determined image data histogram corresponding to the feature 114. Locating features based at least in part on histogram comparison may improve the robustness of the tracker to drift across a sequence of frames.
In some examples, the distance between histograms may be calculated using various metrics including, but not limited to, χ2 (chi-squared), Kolmogorov-Smirnov (KS), Cramér/von Mises (CvM), Quadratic Form (QF), Earth Mover's (Wasserstein-Rubinstein-Mallows) distance, Kullback-Leibler (KL), symmetric KL, Jeffrey Divergence (JD), Euclidean distance in a feature space having as many dimensions as the histogram has bins, or inverted histogram intersection.
In some examples, the distance between two histograms H1 and H2 is calculated as in equation (1):

d(H1, H2) = sqrt( 1 − (1 / sqrt(H̄1 · H̄2 · L²)) · Σ_i sqrt( H1(i) · H2(i) ) )    (1)

where L is the number of bins in each histogram and H̄k = (1/L) Σ_j Hk(j)
is the average of histogram Hk. In some examples, if d(H1, H2) < TH for a selected threshold TH, then the two histograms (e.g., the detected histogram and the tracked candidate histogram) are determined to correspond. In some examples, histogram Hi is formed by calculating respective histograms (e.g., a hue histogram, a saturation histogram, and a value histogram) for each color component. Each component histogram may be calculated using the values of the corresponding color component of the image. Histogram Hi may then be formed by concatenating the calculated histograms. For example, given a hue histogram [h1, ..., hn], a saturation histogram [s1, ..., sn], and a value histogram [v1, ..., vn], Hi may be formed as in equation (2):
Hi = [h1, ..., hn, s1, ..., sn, v1, ..., vn]    (2)
the hue, saturation, and value components may alternatively be interleaved (e.g., h)1,s1,v1,h2… …), or a portion of the histogram (e.g., a particular component or range) may be interleaved and a portion not interleaved, or any combination thereof.
Referring to the example shown in FIG. 2, because the distance between histogram 216 and histogram 214 exceeds a selected threshold TH, the tracking component 122 determines that the candidate feature region 212 is not associated with the detected region indicated by tracker 204(1). In some of these examples, in response to this determination or to other histogram-matching or histogram-distance criteria, the tracking component 122 designates the tracked candidate feature region 212 as the region corresponding to the out-of-frame tracker 208. In some examples, the tracking component 122 may determine that a tracker 204 is an out-of-frame tracker 208 if, for example, no candidate-region histogram in the current frame matches the previous-frame histogram associated with the tracker 204 or is within a selected distance of that histogram.
In some examples, the update module 404 may be configured to determine the test region based at least in part on the candidate feature region. For example, the test region may be equal to or greater than or less than the candidate feature region in size, and may have corner(s) or centroid(s) that cover or are spaced apart from the candidate feature region. In some examples, the test region may be overlaid with or spaced apart from the candidate feature region. The test area may be determined, for example, based on the stored spatial relationship. In some examples, the test region is or includes a candidate feature region. In some of these examples or other examples, the feature is a face. In some examples, the test region has a different size or location than the candidate feature region. Some examples of determining the test area are discussed above with reference to test areas 220(1) and 220(2) of fig. 2.
In some examples, two regions are considered to correspond based on color distance. For example, two regions may be considered to correspond if the respective dominant colors in those regions (e.g., the most frequently occurring colors or color bins) are within a specified distance in a color space (e.g., CIELAB ΔE ≤ 1.0). Other example color-difference metrics include the L2 norm ||·||2 (i.e., the Euclidean distance in a color space such as L*a*b* or RGB), the L1 norm ||·||1 (Manhattan distance), and other Lp norms in a color space. Still other example color-difference metrics include the difference in hue angle between numerical representations of one or more components in a color space, per-component or multi-component differences (e.g., CIELAB ΔL*, ΔC*, Δh, Δa*, or Δb*), the Hamming distance, and CIE 1976 UCS Δu′v′.
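For the CIELAB ΔE criterion mentioned above, the CIE76 form is simply the Euclidean distance between two colors already expressed in L*a*b*; a minimal sketch follows (the example color values are arbitrary):

    # CIE76 color difference between two CIELAB colors; example values are arbitrary.
    import math

    def delta_e76(lab1, lab2):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(lab1, lab2)))

    print(delta_e76((52.0, 42.5, 20.1), (52.4, 42.9, 20.5)))   # ~0.69, i.e. within a ΔE <= 1.0 tolerance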
In some examples, the update module is further configured to determine the test region based at least in part on a stored spatial relationship between the candidate feature region and the test region. For example, the stored spatial relationship may be a proximate, disjoint relationship in which the test region is specified as separated from the candidate feature region by a predetermined amount, e.g., in a predetermined direction. This may allow the feature to be located using nearby image data. The stored spatial relationship may be represented in scale-invariant terms (e.g., as a percentage of the size of the feature region). This may allow for tracking based at least in part on the test region even if the feature changes in size or scale (e.g., due to the subject 110 moving closer to the camera (e.g., image sensor 112) or farther from the camera).
For example, when the feature 114 is a face, it may be difficult to distinguish faces based on color because human skin tones span a limited range of colors. However, clothing tends to vary between people and tends to be close to, but spaced from, the face. Similarly, bumper stickers tend to be close to, but spaced from, the license plate. Using test region(s) having a proximate, disjoint relationship with the candidate feature region(s) allows such features 114 to be distinguished and tracked using image data near the features 114. Other examples are discussed above with reference to test areas 220(1) and 220(2) of FIG. 2.
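A minimal Python sketch of deriving such a test region from a face region using a stored, scale-invariant spatial relationship is shown below; the particular offsets (a clothing-sized region roughly one face-height below the face) are assumptions for illustration:

    # Hypothetical scale-invariant test-region placement below a face region; offsets are illustrative.
    def clothing_test_region(face_region, dx_frac=0.0, dy_frac=1.2, w_frac=1.5, h_frac=1.0):
        """face_region: (x, y, w, h). Fractions are relative to the face size, so the
        stored relationship holds as the subject moves toward or away from the camera."""
        x, y, w, h = face_region
        cx = x + w / 2 + dx_frac * w          # horizontal center of the test region
        ty = y + dy_frac * h                  # top of the test region, below the face
        tw, th = w_frac * w, h_frac * h
        return (cx - tw / 2, ty, tw, th)

    print(clothing_test_region((120, 40, 64, 64)))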
In some examples, histograms as described herein may be calculated in one or more color spaces, such as those defined by the International Commission on Illumination (CIE), the National Television System Committee (NTSC), or the International Electrotechnical Commission (IEC). Example color spaces that may be used include CIE L*a*b* (CIELAB), CIE L*u*v* (CIELUV, 1976), CIE Yu′v′ (CIE 1976 uniform chromaticity scale, UCS, plus luminance Y), YIQ (NTSC 1953), sRGB (IEC 61966-2-1:1999), linear RGB, or other color spaces. In some examples, the histogram may be calculated or otherwise determined in a color space having at least hue and chroma dimensions. Examples of such color spaces include, but are not limited to, hue/saturation/value (HSV), hue/saturation/lightness (HSL), hue/saturation/intensity (HSI), and CIE L*C*h. For example, an image may include respective color components corresponding to various dimensions of a color space.
In some examples, the tracking component 122 is configured to calculate a distance between the histogram of the first feature region in one frame and the histogram of the first feature region in an earlier frame, for example, in the selected color space. The tracking component 122 can also be configured to determine those histograms do not correspond in response to the calculated distance exceeding a selected threshold (e.g., Δ E ≧ 1.0).
In some examples, the feature tracking system may include a display device 126. In some examples, the reporting component 124 may be configured to provide a visual representation of the features located in the frames of the video for display. In some examples, the reporting component 124 may be configured to provide a visual representation via the display device 126. Examples are discussed above with reference to representation 128 in FIG. 1. In some examples, the reporting component 124 may be configured to present for display frames of the video and a visual representation of at least some of the detected and/or tracked features.
In some examples, the visual representation may include at least one blur. Examples of such blurring may include, for example, portions that are darkened (e.g., blacked out), portions that are lightened (e.g., whited out), portions in which image data is spatially distorted, or portions in which image data is blurred. Using blurring may reduce the likelihood of visually identifying a feature when viewing representation 128, for example, to protect the privacy of a person walking past a traffic camera.
In some examples, the visual representation may include at least one highlight. Example highlights may include, for example, an outline of a feature or an outline of a portion including a feature (e.g., tracker 130 as graphically represented in fig. 1), a brightened or darkened background under or overlaying the detected feature (e.g., a semi-transparent colored highlight overlaying image data of the feature), or an arrow or other shape pointing or otherwise indicating the feature. In some examples, representation 128 may include any combination of blurring or highlighting for each tracked feature.
In some examples, the reporting component 124 may be configured to provide an indication of at least some of the detected and/or tracked features. For example, the reporting component 124 can be configured to present a representation 128 of the features 114 detected by the detection component 120, the features 114 tracked by the tracking module 402 or the update module 404, or any combination thereof. For example, the detection highlighting and the tracking highlighting may be presented with different colors or line patterns to allow them to be easily distinguished in the representation 128.
Illustrative Process
FIGS. 5-7 illustrate example processes for tracking features in video frames. The processes are illustrated as sets of operations shown as discrete blocks. The processes can be implemented in any suitable hardware, software, firmware, or combination thereof. For example, the functionality shown in FIGS. 5-7 may be implemented on or otherwise embodied in one or more of the computing device 102 or the image source 104 (e.g., using software running on such a device). In the context of software, the operations represent computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the recited operations. In the context of hardware, the operations represent logical functions implemented in circuitry, such as data path control and finite state machine sequencing functions.
The order in which the operations are described is not intended to be construed as a limitation, and any number of the operations described can be combined in any order and/or in parallel to implement each process. For clarity of explanation, reference is made to various components and data items shown in FIGS. 1-4 that may perform or participate in the steps of the exemplary method. However, it should be noted that other components may be used; that is, the exemplary methods illustrated in FIGS. 5-7 are not limited to execution by the identified components.
FIG. 5 illustrates an example process 500 for tracking features in video frames.
At block 502, one or more features are detected in a first frame. For example, the detection component 120 detects one or more features within a first frame of the video and a corresponding detected first frame feature region. The video may include a first frame followed by a second frame. Video frames may be received and processed in any order. The process steps described below may be interleaved or performed in any order unless otherwise specified. For example, the detecting (block 502) may be performed for the first frame after both the first and second frames are received.
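By way of illustration only, and not as a requirement of this disclosure, block 502 might be realized with an off-the-shelf face detector. The following sketch uses OpenCV's Haar-cascade detector; the video path, cascade file, and parameter values are assumptions for the example.

    # Minimal sketch of block 502: detect candidate feature (face) regions in
    # the first frame of the video. The detector, cascade file, and parameters
    # are illustrative; the disclosure does not mandate a particular detector.
    import cv2

    video = cv2.VideoCapture("input.mp4")      # assumed video source
    ok, first_frame = video.read()
    assert ok, "could not read the first frame"

    gray = cv2.cvtColor(first_frame, cv2.COLOR_BGR2GRAY)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    # Each detected first frame feature region is (x, y, width, height).
    first_frame_feature_regions = detector.detectMultiScale(
        gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30))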
At block 504, one or more of the detected features are tracked in a second frame. For example, the tracking module 402 determines a corresponding tracked candidate second frame feature region in the second frame based at least in part on the first frame, the second frame, and the detected first frame feature region. In some examples, block 504 includes comparing at least some image data of the second frame to at least some image data of the first frame. In some examples, the tracking module 402 may use a tracking algorithm (such as KLT) to determine candidate second frame feature regions by finding a mathematically best fit between the image data of the first frame and the image data of the second frame in the first frame feature region.
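A minimal sketch of one possible realization of block 504 follows, assuming each feature region is an axis-aligned rectangle: corner points sampled inside the detected first frame feature region are propagated to the second frame with pyramidal Lucas-Kanade optical flow (a KLT-style tracker), and the median point displacement is applied to the region. The function name and parameter values are illustrative only.

    # Sketch of block 504: propagate a detected first frame region into the
    # second frame with a KLT-style tracker (pyramidal Lucas-Kanade flow).
    import cv2
    import numpy as np

    def track_region_klt(prev_gray, next_gray, region):
        x, y, w, h = region
        mask = np.zeros_like(prev_gray)
        mask[y:y + h, x:x + w] = 255
        pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=50,
                                      qualityLevel=0.01, minDistance=3,
                                      mask=mask)
        if pts is None:
            return None                      # nothing trackable in the region
        new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray,
                                                      pts, None)
        good_old = pts[status.flatten() == 1]
        good_new = new_pts[status.flatten() == 1]
        if len(good_new) == 0:
            return None
        # Median displacement keeps a few bad correspondences from dominating.
        dx, dy = np.median(good_new - good_old, axis=0).ravel()
        return (int(round(x + dx)), int(round(y + dy)), w, h)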
In some examples, block 504 includes comparing at least some image data of the second frame with at least some image data of the first frame and at least some image data of a frame subsequent to the second frame. This is referred to herein as "two-way" tracking. Bidirectional tracking may allow features to be more accurately tracked across a set of three or more frames, for example, for analysis of recorded video or other applications where multiple frames can be received before a feature is identified in a frame. In some examples, the image source 104 or the computing device 102 may rearrange the frames into, for example, a sequence, such as a sequence of I-, P-, and B-pictures specified in one or more MPEG video standards (e.g., MPEG-2, MPEG-4 part 2, or MPEG-4 part 10). This may allow frames with later timestamps in the video to be processed before frames with earlier timestamps in the video, e.g., for bi-directional tracking.
At block 506, the histogram of the tracked candidate second frame feature region is compared to the histogram of the detected first frame feature region. For example, the update module 404 determines a histogram in a color space having at least hue and chroma dimensions. In some examples, the update module 404 compares the histograms by calculating a distance between the histogram of the candidate second frame feature region being tracked and the histogram of the detected first frame feature region.
In some examples, at block 506, a histogram of the test region is computed instead of or in addition to the candidate second frame feature region being tracked or the detected first frame feature region. This may be accomplished, for example, as discussed above with reference to histograms 302(1), 302(2), and 304 shown in fig. 3 (which may be used separately or in conjunction with histograms 214 and 216 shown in fig. 2).
At block 508, the candidate second frame feature regions for which the comparison indicates at least a threshold degree of similarity may be selected as the tracked second frame feature regions. For example, the update module 404 does not consider the candidate second frame feature region as a tracked second frame feature region if: the distance between the histogram of the candidate region and the histogram of the detected region exceeds a selected threshold.
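The following sketch shows one way blocks 506 and 508 might be implemented, assuming HSV hue and saturation as the color space having hue and chroma dimensions, the Bhattacharyya distance as the histogram distance, and 0.5 as the threshold (a value taken from the illustrative results below); all of these choices are assumptions, not requirements.

    # Sketch of blocks 506-508: compare hue/saturation histograms of the
    # tracked candidate region and the detected region, and accept the
    # candidate only if the histogram distance is small enough.
    import cv2

    def region_histogram(frame_bgr, region, bins=(30, 32)):
        x, y, w, h = region
        patch = cv2.cvtColor(frame_bgr[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([patch], [0, 1], None, list(bins),
                            [0, 180, 0, 256])    # hue and saturation planes
        return cv2.normalize(hist, hist)

    def accept_candidate(first_frame, detected_region,
                         second_frame, candidate_region, max_distance=0.5):
        h_detected = region_histogram(first_frame, detected_region)
        h_candidate = region_histogram(second_frame, candidate_region)
        distance = cv2.compareHist(h_detected, h_candidate,
                                   cv2.HISTCMP_BHATTACHARYYA)
        return distance <= max_distance          # similar enough to keep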
Fig. 6 illustrates an example process 600 for tracking and detecting features in frames of a video, e.g., using a computing device, such as computing device 102. Block 502-508 of fig. 6 may be performed, for example, as described above with reference to fig. 5. Block 508 may be followed by block 602.
In some examples, the frames of the video include a third frame later than the second frame. In some examples, the frames of the video include a fourth frame later than the third frame. The labels "first", "second", "third" and "fourth" are used for clarity of presentation and are not limiting unless explicitly stated. In some examples, the operations described herein with respect to the first, third, and fourth frames may be used without the second frame. Such examples may include, for example, detecting a feature in a first frame (e.g., frame 202(1) of fig. 2), determining that one or more features are not visible in a second frame (e.g., frame 202(2) of fig. 2), and determining that at least some of the one or more features are visible in third and fourth frames (e.g., frames 202(3) and 202(4) of fig. 2).
At block 602, tracking (block 504) is repeated for a third frame. In some examples, the tracking component 122 determines one or more tracked candidate third frame feature regions of the third frame based at least in part on the second frame, the third frame, and the one or more tracked second frame feature regions including the tracked second frame feature regions.
At block 604, the comparison (block 506) is repeated for the third frame. In some examples, the tracking component 122 compares the histogram between the candidate region in the third frame and the detected region in the first frame. In some examples, the tracking component 122 compares the histogram between the candidate region in the third frame and the tracked region in the second frame.
At block 606, the selection (block 508) is repeated for the third frame to determine one or more tracked candidate third frame feature regions. The operations in blocks 602-606 may be performed as described above with reference to fig. 2-5 (e.g., with reference to frames 202(2) or 202(3) of fig. 2 or with reference to fig. 3). As mentioned above, some trackers 204 may not be selected and may be labeled as out-of-frame trackers 208, as discussed herein.
At block 608, the detecting (block 502) is repeated for a third frame to determine one or more detected third frame feature regions. This may be accomplished, for example, as discussed above with reference to tracker 204(1) in frame 202(3) of fig. 2.
At block 610, at least some of the tracked candidate third frame feature regions are associated with detected third frame feature regions. This may be accomplished, for example, as described above with reference to frame 202(3) of fig. 2 and, for example, based at least in part on the association of the out-of-frame tracker 204(1) and the subject 206(1) of the histograms in the test area 220(1) and the test area 218.
In some examples, block 610 includes calculating respective overlap ratios for pairwise combinations of candidate third frame feature regions and detected third frame feature regions. The overlap ratio r(R_T, R_D) of a tracked feature region R_T and a detected feature region R_D can be calculated as in equation (3):

    r(R_T, R_D) = |R_T ∩ R_D| / |R_T ∪ R_D|        (3)

where R_T and R_D are the sets of pixel locations in the respective regions and |·| denotes set cardinality.
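For axis-aligned rectangular regions, the set cardinalities in equation (3) reduce to areas, so the ratio can be computed without enumerating pixels. A minimal sketch follows; representing a region as an (x, y, width, height) tuple, and reading equation (3) as an intersection-over-union ratio, are assumptions consistent with the reconstruction above.

    # Sketch of the overlap ratio of equation (3) for axis-aligned rectangles.
    def overlap_ratio(region_t, region_d):
        xt, yt, wt, ht = region_t
        xd, yd, wd, hd = region_d
        iw = max(0, min(xt + wt, xd + wd) - max(xt, xd))   # intersection width
        ih = max(0, min(yt + ht, yd + hd) - max(yt, yd))   # intersection height
        inter = iw * ih
        union = wt * ht + wd * hd - inter
        return inter / union if union > 0 else 0.0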
At block 612, a tracked candidate third frame feature region that is not associated with the detected third frame feature region is selected as an out-of-frame region. This may be accomplished, for example, as described above with reference to frame 202(2) of fig. 2. In some examples, block 612 may be followed by block 614 and/or block 620.
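Blocks 610 and 612 can be realized together as an assignment over the pairwise overlap ratios. The sketch below uses a simple greedy assignment with a 0.3 minimum overlap (a value from the illustrative results below); a more elaborate allocation algorithm, such as the Hungarian method, could be substituted. It relies on the overlap_ratio sketch given above.

    # Sketch of blocks 610-612: associate tracked candidate regions with
    # detected regions by overlap ratio; unassociated tracked regions are
    # selected as out-of-frame regions.
    def associate_regions(tracked, detected, min_overlap=0.3):
        pairs = sorted(((overlap_ratio(t, d), ti, di)
                        for ti, t in enumerate(tracked)
                        for di, d in enumerate(detected)),
                       reverse=True)
        matched_t, matched_d, associations = set(), set(), {}
        for ratio, ti, di in pairs:
            if ratio < min_overlap:
                break                      # remaining pairs overlap even less
            if ti in matched_t or di in matched_d:
                continue
            associations[ti] = di
            matched_t.add(ti)
            matched_d.add(di)
        out_of_frame = [ti for ti in range(len(tracked)) if ti not in matched_t]
        return associations, out_of_frame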
At block 614, tracking (block 504) is repeated for the fourth frame.
At block 616, the comparison (block 506) is repeated for the fourth frame.
At block 618, the selection (block 508) is repeated for the fourth frame to determine one or more tracked candidate fourth frame feature regions. The operations in block 614-618 may be performed as described above with reference to fig. 2-5 (e.g., with reference to frame 202(3) shown in fig. 2).
At block 620, the detecting (block 502) is repeated for a fourth frame to determine one or more detected fourth frame feature regions. This may be accomplished, for example, as discussed above with reference to frames 202(3) or 202(4) of fig. 2.
At block 622, at least some of the tracked candidate fourth frame feature regions are associated with corresponding out-of-frame regions. This may be accomplished, for example, as described above with reference to tracker 204(1) in frame 202(3) shown in fig. 2. As mentioned above, this may allow features to be tracked even if they are occluded (e.g., for less than a selected number of frames). In some examples, at least some of the tracked candidate fourth frame feature regions may be associated with corresponding detected fourth frame feature regions.
Fig. 7 illustrates an example process 700 for tracking features in frames of a video, e.g., across occlusion events or other reductions in the visibility of features. Blocks 502 and 504 of fig. 7 may be performed as described above with reference to fig. 5. Blocks 502 and 504 and other blocks shown in fig. 7 may be performed for one or more frames of a video.
At decision block 702, a determination is made whether any trackers have moved out of the frame, for example, due to not being associated with a detected feature. This may be accomplished, for example, as discussed above with reference to block 610 of fig. 6 or as discussed above with reference to histograms 214 and 216 shown in fig. 2.
If it is determined that an out-of-frame tracker exists ("yes" branch from decision block 702), then at block 704, information for the out-of-frame tracker is recorded. This may be accomplished, for example, as discussed above with reference to frame 202(2) of fig. 2. In some examples, block 704 may be followed by block 708, by block 714, or by blocks 708 and 714 in that order, either directly or through an intermediate block such as decision block 710.
If it is determined that there is no out-of-frame tracker ("no" branch from decision block 702), at decision block 706, a determination is made whether the detection slot has expired. For example, the tracking component 122 counts the video frames that have been processed since the last frame for which detection was performed, as discussed with reference to block 502 or block 708. Every frame, or every n frames, the tracking component 122 compares the count to a threshold T_G to determine whether the detection slot has expired. In this example, the tracking component 122 resets the counter to 0 at block 502 and at block 708.
Example detection slots may include, but are not limited to, a selected amount of wall clock time, a selected number of frames in which tracking has been performed, or a selected number of frames of received video. In some examples, the detection slot is 30 frames, or the number of frames constituting 1 second of the video. In some examples, tracking (e.g., KLT) is performed every frame and detection (e.g., FaceSDK) is performed only every n frames (for a detection slot of n frames).
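A simplified sketch of this control flow is shown below: tracking runs on every frame, while detection runs only when the detection slot expires or when no in-frame trackers remain. The detect and track callables stand in for blocks 502/708 and 504, the 30-frame slot is illustrative, and the association step of block 714 is omitted for brevity.

    # Simplified per-frame loop implied by FIG. 7. Detection runs every
    # DETECTION_SLOT frames (or when no trackers remain); tracking runs on
    # the other frames.
    DETECTION_SLOT = 30                          # frames; illustrative value

    def process_video(frames, detect, track):
        trackers = []
        frames_since_detection = DETECTION_SLOT  # force detection on frame 1
        prev_frame = None
        for frame in frames:
            if frames_since_detection >= DETECTION_SLOT or not trackers:
                trackers = detect(frame)         # block 502 / 708
                frames_since_detection = 0
            else:
                trackers = track(prev_frame, frame, trackers)   # block 504
                frames_since_detection += 1
            prev_frame = frame
        return trackers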
If a determination is made that the detection slot has not expired ("no" branch from decision block 706), processing continues as described above with reference to block 504.
On the other hand, if it is determined that the detection time slot has expired ("yes" branch from decision block 706), then at block 708, features may be detected in the video. This can be accomplished, for example, as discussed above with reference to detection component 120 and block 502. For example, as discussed above with reference to frames 202(3) and 202(4), previously occluded features may reappear in the frame or new features may appear in the frame.
At decision block 710, a determination is made whether any features are detected at block 708. This can be accomplished as discussed above with reference to detection component 120.
If no features are detected ("no" branch from decision block 710), at decision block 712, the tracking component 122 determines whether any in-frame trackers, such as tracker 204(2) of FIG. 2, are present. If an in-frame tracker is present (the "yes" branch from decision block 712), processing continues as described above with reference to block 504. On the other hand, if there is no active in-frame tracker ("no" branch from decision block 712), processing continues as described above with reference to block 708. Thus, detection is repeated across multiple video frames until there are features to be tracked in the video frames.
Referring back to decision block 710, if a feature is detected ("yes" branch from decision block 710), at block 714, one or more in-frame or out-of-frame trackers are updated, e.g., to be associated with the detected feature. This may be accomplished, for example, as described above with reference to blocks 610 or 622 of fig. 6.
In some examples, the tracking component 122 may delete expired trackers at block 716. In some examples, the tracking component 122 discards out-of-frame trackers 208 that have not been updated for a predetermined time slot (such as a number of frames or an amount of elapsed time as described above, e.g., 10 frames, 30 frames, or 1 second). The tracking component 122 in these examples removes trackers for features that do not reappear (e.g., the face of a person who has left the field of view of the sensor 112).
In some examples, such as where the features are similar in color to the background, a tracker may drift over several frames to track stationary portions of the background (e.g., walls) instead of the features. In some examples, at block 716, the tracking component 122 discards in-frame trackers that have not been updated for a selected number T_A of frames (e.g., 50 frames or 100 frames). The tracking component 122 in these examples removes, for example, trackers that have drifted to the background. The tracking component 122 can determine that an in-frame tracker has not been updated if the tracker has not moved more than a selected distance during the T_A frames, or has not been resized by more than a selected absolute percentage (e.g., ±5%) during the T_A frames.
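One possible realization of the pruning at block 716 is sketched below. The tracker fields and the frame-count limits (55 frames for out-of-frame trackers and 100 frames for stale in-frame trackers, taken from the illustrative results below) are assumptions for the example.

    # Sketch of block 716: prune expired trackers. A tracker's
    # last_update_frame is assumed to advance only when the tracker moves
    # more than a selected distance or is resized by more than a selected
    # percentage.
    OUT_OF_FRAME_LIMIT = 55       # frames an out-of-frame tracker may wait
    STALE_IN_FRAME_LIMIT = 100    # frames an in-frame tracker may sit unchanged

    def prune_trackers(trackers, current_frame_index):
        kept = []
        for t in trackers:
            idle = current_frame_index - t.last_update_frame
            if t.out_of_frame and idle > OUT_OF_FRAME_LIMIT:
                continue          # the feature never reappeared
            if not t.out_of_frame and idle > STALE_IN_FRAME_LIMIT:
                continue          # the tracker likely drifted to the background
            kept.append(t)
        return kept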
Illustrative Components
Fig. 8 illustrates select components of an example computing device 102, 800. In the illustrated example, computing device 102 includes one or more processors 116, memory 118, input/output (I/O) interfaces 802, and/or communication interfaces 804. The memory 118 can be implemented as any combination of various types of memory components, such as a computer-readable medium or computer storage media component. Examples of possible memory components include random access memory (RAM), disk drives, mass storage components, and non-volatile memory (e.g., ROM, flash memory, EPROM, EEPROM, etc.). Alternative implementations of the computing device 102 may include a range of processing and memory capabilities. For example, a full-resource computing device may be implemented with a large amount of memory and processing resources, including a disk drive for storing content for playback by a viewer. However, low-resource computing devices may have limited processing and memory capabilities, such as a limited amount of RAM, no disk drive, and limited processing power.
The processor 116 processes various instructions to control the operation of the computing device 102 and to communicate with other electronic and computing devices. For example, the processor(s) 116 may be configured to execute one or more of a plurality of modules stored in the memory 118, as discussed below. In some examples, computer-executable instructions stored on the memory 118, upon execution, may configure a computer, such as the computing device 102 (e.g., the computing device 102 or the image source 104), to perform the operations described herein with respect to, for example, the detection component 120, the tracking component 122, the reporting component 124, or modules of any of these. The modules stored in the memory 118 may include instructions that, when executed by the one or more processing units 116, cause the one or more processors 116 to perform the operations described herein.
The memory 118 stores various information and/or data including, for example, the detection component 120, the tracking component 122 (including, for example, the tracking module 402 and the update module 404 of fig. 4), the reporting component 124, and optionally an operating system 806 and/or one or more other applications 808. The functions described in association with the illustrated components or modules may be combined to be performed by a fewer number of components or modules or may be divided and performed by a greater number of components or modules. Other applications 808 may include, for example, an internet browser including video capabilities, a media player application, a video editing application, a video streaming application, a television viewing application, and so forth. In some examples, computer executable instructions of the detection component 120, tracking component 122, reporting component 124, and application 808 stored in one or more computer readable media (e.g., memory 118), when executed on the processor 116 of the computing device 102, instruct the computing device 102 to perform the functions listed herein with reference to the relevant components in memory 118.
In the illustrated example, the memory 118 includes a data store 810. In some examples, the data store 810 may include a first memory to store one or more video frames 812 (e.g., the video frames 106 of fig. 1). Each video frame may include data for a plurality of pixels, for example. The data for each pixel may include respective values for each color channel (plane), e.g., red (R), green (G), and blue (B) color channels; luminance (Y), blue chrominance (Cb), and red chrominance (Cr) color channels; or other color organization. In some examples, the data store 810 may store one or more feature location maps 814 that maintain, for example, the location of the detected feature (e.g., the detected feature location map 406 of fig. 4), the location of a previous frame feature (e.g., the previous frame feature location map 410 of fig. 4), or the location of a candidate feature (e.g., the candidate feature location map 408 of fig. 4). In some examples, data store 810 may store parameters such as T_H described above with reference to equation (1), or other parameter(s) described herein, e.g., with reference to the illustrative results given below.
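By way of example only, a feature location map 814 held in the data store 810 might be represented in memory as sketched below; the type and field names are illustrative and not part of this disclosure.

    # One possible in-memory layout for a feature location map 814.
    from dataclasses import dataclass, field
    from typing import Dict, Tuple

    @dataclass
    class FeatureLocationMap:
        frame_index: int
        # feature id -> (x, y, width, height) region in pixel coordinates
        regions: Dict[int, Tuple[int, int, int, int]] = field(default_factory=dict)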
The communication interface 804 enables the computing device 102 to communicate with other computing devices and may represent other devices that the computing device 102 may use to receive video content. For example, in an environment that supports the transmission of video content over an IP network, the communication interface 804 may represent a connection via which the computing device 102 can receive the video content, such as by way of a particular Uniform Resource Locator (URL). In some examples, communication interface 804 may include, but is not limited to, a transceiver for ethernet, cellular (3G, 4G, or other), WI-FI, Ultra Wideband (UWB), BLUETOOTH, satellite, or other wireless transmissions. The communication interface 804 may include a wired I/O interface, such as an ethernet interface, a serial interface, a Universal Serial Bus (USB) interface, an INFINIBAND interface, or other wired interface.
The video or frames of video 106 may additionally or alternatively be received via the I/O interface 802. The I/O interface 802 may include or be communicatively connected to, for example, one or more tuners, video capture devices, video encoders, or format converters to enable the computing device 102 to receive and store video. In some examples, I/O interface 802 may include or be communicatively connected with one or more sensors 112, as described above with reference to fig. 1. The sensor(s) 112 may be configured to capture, for example, the video frames 106 of the scene 108, as discussed above.
The I/O interface 802 may additionally or alternatively include or be communicatively connected with, for example, a display device 126 to enable the computing device 102 to present video content. In an example implementation, the I/O interface 802 provides signals to a television or other display device that displays video data, such as discussed above with respect to fig. 1.
The I/O interface 802 may additionally or alternatively include or be communicatively connected with, for example, a user-operable input device 816 (graphically represented as a game pad) to enable a user to, for example, instruct the computing device 102 to track a particular feature. User-operable input devices 816 may also be used to control playback of video, for example, via fast forward, playback, play, pause, and stop functions. The reporting component 124 can adjust the display of video on the display device 126 based at least in part on input received via the user-operable input device 816. This may allow the system to be used, for example, in a real-time security or surveillance context.
In some examples, the detection component 120, when executed by the processor(s) 116 (and likewise for the other components described herein), generates the feature location map based at least in part on the first frame of the video. This may be as described above with reference to fig. 4, for example using a FaceSDK algorithm such as those described above.
In some examples, the tracking component 122 generates candidate feature location maps based at least in part on, for example, frames of the video from the detection component 120 and previous feature location maps. The features correspond to at least some image data of a frame of the video. This may be as described above with reference to the tracking module 402 of fig. 4. The tracking component 122 then determines candidate feature points based at least in part on the candidate feature location map, determines candidate feature regions based at least in part on the determined candidate feature points, and locates features at the candidate feature regions in a frame of the video by comparing a histogram of a test region in the frame of the video with a histogram of a corresponding region in a previous frame of the video, wherein the test region is determined based at least in part on the candidate feature regions. This may be as described above with reference to the update module 404 of fig. 4.
In some examples, the reporting component 124 provides a visual representation of the features located in the frames of the video, e.g., via a graphical user interface. This may be as described above with reference to fig. 1 and 4.
In some examples, such as a computing device 102 that provides feature tracking services, the computing device 102 may include the tracking component 122 but not the sensor 112. In some examples of an image source 104, such as, for example, a smartphone using a feature tracking service, the computing device 102 representing the image source 104 may include the sensor 112, but does not implement the tracking component 122. In some examples, such as the computing device 102 or the image source 104 implementing both the feature tracking service and the use thereof, the computing device 102 may include the sensor 112 and implement the tracking component 122. In some examples, a feature tracking service system, such as the computing device 102 or an image capture system (such as the image source 104), may implement the detection component 120 or the reporting component 124.
Although shown separately, some of the various components of the computing device 102 may be implemented together in a single hardware device, such as in a Field Programmable Gate Array (FPGA), Application Specific Integrated Circuit (ASIC), Application Specific Standard Product (ASSP), system on a chip (SOC), Complex Programmable Logic Device (CPLD), Digital Signal Processor (DSP), or other type of customizable processor. For example, the processor 116 may represent a hybrid device, such as a device from ALTERA or XILINX that includes a CPU core embedded in an FPGA fabric. These or other hardware logic components may operate independently, or in some instances, may be driven by a CPU. In some examples, the processor 116 may be or include one or more single-core processors, multi-core processors, Central Processing Units (CPUs), Graphics Processing Units (GPUs), general-purpose GPUs (GPGPUs), or hardware logic components configured to perform the functions described herein, e.g., via special-purpose programming from a module or API.
Additionally, a system bus 818 typically connects the various components within the computing device 102. The system bus 818 can be implemented as any one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, or a local bus using any of a variety of bus architectures. By way of example, such architectures can include an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus, also known as a Mezzanine bus.
Any of the components shown in FIG. 8 may be hardware, software, or a combination of hardware and software. Moreover, any of the components shown in FIG. 8 (e.g., memory 118) may be implemented using any form of computer-readable media that is accessible locally or remotely (including over network 132) by computing device 102. Computer-readable media includes both types of computer-readable media, namely computer storage media and communication media. Computer storage media (e.g., a computer storage medium) includes tangible storage units, such as volatile memory, non-volatile memory, and/or other persistent and/or secondary computer storage media, removable and non-removable computer storage media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes media in tangible or physical form that is included in or is part of a device or a hardware component external to a device, including but not limited to: Random Access Memory (RAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), phase change memory (PRAM), Read Only Memory (ROM), Erasable Programmable Read Only Memory (EPROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD), optical cards or other optical storage media, magnetic cassettes, magnetic tape, magnetic disk storage, magnetic cards or other magnetic storage devices or media, solid state memory devices, storage arrays, network attached storage, storage area networks, hosted computer storage or memory, storage devices, and/or storage media that can be used to store and maintain information for access by the computing device 102 or the image source 104.
In contrast to computer storage media, communication media may embody computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, or other transport mechanism. As defined herein, computer storage media does not include communication media. In some examples, memory 118 may be or include computer storage media.
Illustrative results
A tracker according to various aspects herein (the "example" tracker) was tested against SFCT and Struck trackers (the "comparable" trackers) on two different video sequences. The parameters of the example tracker were determined empirically and included detection at least every 50 frames, dropping out-of-frame trackers that had been out of the frame for 55 frames, dropping in-frame trackers that had not been updated for 100 frames, a minimum overlap ratio of 0.3 (equation (3)), a threshold median FBE value of 10 above which a tracker is considered out-of-frame, and a threshold histogram distance of 0.5 above which a tracker is considered out-of-frame. The example configuration tested provided comparable or improved success rates for both sequences compared to SFCT and Struck, and improved accuracy rates for position error thresholds above about 20 from ground truth. The example configuration tested also provided faster results than either SFCT or Struck, as shown in Table 1.
Table 1 shows the processing time per frame in milliseconds. As shown, the example configuration tested was capable of processing over 100 fps on a computing system including an INTEL CORE i7 3.40 GHz CPU.
[Table 1: processing time per frame, in milliseconds, for the example tracker, SFCT, and Struck (presented as an image in the original).]
Example clauses
A: A system, comprising: a processing unit; a Computer Readable Medium (CRM) operably coupled to the processor; a tracking module stored in the CRM and executable by the processing unit to generate a candidate feature location map based at least in part on a frame of the video and a previous feature location map, wherein the candidate feature location map indicates candidate locations of one or more features and respective ones of the features correspond to at least some image data of the frame of the video; and an update module stored in the CRM and executable by the processing unit to: determine a candidate feature region based at least in part on the candidate feature location map; and locate a feature at a candidate feature region in a frame of the video by comparing a histogram of a test region in the frame of the video to a histogram of a corresponding region in a previous frame of the video, wherein the test region is determined based at least in part on the candidate feature region.
B: The system as recited in paragraph A, further comprising a detection component stored in the CRM and executable by the processing unit to generate a previous feature location map based at least in part on a previous frame of the video.
C: The system as recited in paragraph A or B, wherein the previous feature location map corresponds to a previous frame of the video, and the previous frame of the video is arranged in the video immediately before the frame of the video.
D: The system as recited in any of paragraphs A-C, wherein the test region comprises the candidate feature region and the feature is a face.
E: The system as recited in any of paragraphs A-D, wherein the update module is further executable by the processing unit to: determine a second test region based at least in part on the candidate feature region or the test region; and locate a feature at the candidate feature region in the frame of the video by comparing the histogram of the second test region in the frame of the video with the histogram of the corresponding region in the previous frame of the video.
F: The system as recited in any of paragraphs A-E, wherein the update module is further executable by the processing unit to determine the test region based at least in part on a spatial relationship between the candidate feature region and the test region.
G: the system as referenced in paragraph F, wherein the spatial relationship is a near-end disjoint relationship.
H: a system as recited in any of paragraphs a-G, further comprising a display device and a reporting component stored in the CRM and executable by the processing unit to provide a visual representation of a feature located in a frame of the video via the display device.
I: a system as recited in any of paragraphs a-H, wherein the one or more features comprise one or more faces or identification symbols.
J: a method, comprising: detecting a feature in a first frame of a plurality of frames of a video, the plurality of frames of the video including the first frame followed by a second frame; determining a detected first frame feature region of the first frame corresponding to the detected feature; determining a tracked candidate second frame feature region of the second frame based at least in part on the first frame, the second frame, and the detected first frame feature region; comparing the histogram of the tracked candidate second frame feature region with the histogram of the detected first frame feature region; and selecting the tracked candidate second frame feature region as the tracked second frame feature region in response to the comparison indicating at least a threshold degree of similarity.
K: a method as recited in paragraph J, wherein the comparing comprises calculating a distance between the histogram of the candidate second frame feature region being tracked and the histogram of the detected first frame feature region.
L: The method as recited in paragraph K, further comprising determining a histogram in a color space having at least hue and chroma dimensions.
M: the method as recited in any of paragraphs J-L, wherein determining candidate second frame feature regions that are tracked comprises applying a Kanade-Lucas-Tomasi tracker to the first frame and the second frame.
N: a method as recited in any of paragraphs J-M, wherein the plurality of frames of the video includes a third frame that is later than the second frame, and wherein the method further comprises: determining one or more tracked candidate third frame feature regions of the third frame based at least in part on the second frame, the third frame, and the one or more tracked second frame feature regions comprising the tracked second frame feature regions; detecting one or more features in the third frame; determining respective detected third frame feature regions of the third frame corresponding to the detected features in the third frame; associating at least some of the tracked candidate third frame feature regions with ones of the detected third frame feature regions; and selecting each of the tracked candidate third-frame feature regions that is not associated with each of the detected third-frame feature regions as an out-of-frame region.
O: a method as recited in paragraph N, wherein the plurality of frames of the video includes a fourth frame that is later than the third frame, and wherein the method further comprises: repeating the detecting for a fourth frame to determine a fourth frame feature region that is detected; and associating one of the out-of-frame regions with the fourth frame feature region that is detected.
P: The method as recited in any of paragraphs J-O, wherein determining candidate second frame feature regions that are tracked comprises comparing at least some image data of the second frame with at least some image data of the first frame and at least some image data of a frame subsequent to the second frame.
Q: a method as recited in any of paragraphs J-P, wherein the features include one or more faces or identification symbols.
R: A system, comprising: means for detecting a feature in a first frame of a plurality of frames of a video, the plurality of frames of the video including the first frame followed by a second frame; means for determining a detected first frame feature region of the first frame corresponding to the detected feature; means for determining a candidate tracked second frame feature region of the second frame based at least in part on the first frame, the second frame, and the detected first frame feature region; means for comparing the histogram of the candidate second frame feature region being tracked with the histogram of the detected first frame feature region; and means for selecting the tracked candidate second frame feature region as the tracked second frame feature region in response to the comparison indicating at least a threshold degree of similarity.
S: a system as recited in paragraph R, wherein the means for comparing comprises means for calculating a distance between the histogram of the candidate second frame feature region being tracked and the histogram of the feature region of the first frame being detected.
T: The system as recited in paragraph S, further comprising means for determining a histogram in a color space having at least hue and chroma dimensions.
U: the system as recited in any of the paragraphs R-T, wherein the means for determining candidate second frame feature regions that are tracked comprises means for applying a Kanade-Lucas-Tomasi tracker to the first frame and the second frame.
V: the system as recited in any of paragraphs R-U, wherein the plurality of frames of the video includes a third frame later than the second frame, and wherein the system further comprises: means for determining one or more tracked candidate third frame feature regions of the third frame based at least in part on the second frame, the third frame, and the one or more tracked second frame feature regions comprising the tracked second frame feature regions; means for detecting one or more features in the third frame; means for determining respective detected third frame feature regions of the third frame corresponding to the detected features in the third frame; means for associating at least some of the tracked candidate third frame feature regions with a plurality of the detected third frame feature regions; and means for selecting each of the tracked candidate third frame feature regions that are not associated with each of the detected third frame feature regions as an out-of-frame region.
W: a system as paragraph V refers to, wherein the plurality of frames of the video includes a fourth frame later than the third frame, and wherein the system further comprises: means for repeating the detecting for a fourth frame to determine a fourth frame feature region detected; and means for associating one of the out-of-frame regions with the fourth frame feature region that is detected.
X: the system as recited in any of paragraphs R-W, wherein the means for determining candidate second frame feature regions that are tracked comprises means for comparing at least some image data of the second frame with at least some image data of the first frame and at least some image data of a frame subsequent to the second frame.
Y: a system as recited in any of paragraphs R-X, wherein the features comprise one or more faces or identifiers.
Z: a system, comprising: a processing unit; a computer readable medium operatively coupled to the processor; a detection component stored in the CRM and executable by the processing unit to locate a feature in a first frame of the video; a tracking component stored in the CRM and executable by the processing unit to track each of the located features over a subsequent plurality of second frames of the video, wherein the tracking component is configured to record an indication that a first feature of each of the located features that is tracked has moved out of a particular frame of the plurality of second frames if: the histogram of the first feature region of the particular frame does not correspond to the histogram of the first feature region of the first frame; and a reporting component stored in the CRM and executable by the processing unit to provide an indication of at least some of the tracked ones of the located features.
AA: a system as referenced by paragraph Z, wherein the reporting component is further executable by the processing unit to present for display a visual representation of one or more frames of the video and at least some of the tracked ones of the located features.
AB: the system as recited in paragraph AA, wherein the visual representation includes at least one blur or at least one highlight.
AC: the system as recited in any of paragraphs Z-AB, wherein the tracking module is further executable by the processing unit to: calculating a distance between the histogram of the first feature region of the specific frame and the histogram of the first feature region of the first frame; and determining that the histogram of the first feature region of the specific frame and the histogram of the first feature region of the first frame do not correspond to each other in response to the calculated distance exceeding the selected threshold.
AD: a system as recited in any of paragraphs Z-AC, further comprising an image sensor configured to provide one or more of the frames of video.
AE: a system as recited in any of paragraphs Z-AD, wherein the located features comprise one or more faces or identifiers.
AF: A computer-readable medium (e.g., a computer storage medium) having thereon computer-executable instructions that, when executed, configure a computer to perform the operations described in any of paragraphs J-Q.
AG: An apparatus, comprising: a processor; and a computer-readable medium (e.g., a computer storage medium) having thereon computer-executable instructions that, when executed by the processor, configure the device to perform the operations recited in any of paragraphs J-Q.
AH: A system, comprising: means for processing; and means for storing thereon computer-executable instructions comprising means for configuring the system to perform the method as described in any of paragraphs J-Q.
Conclusion
The video analysis techniques described herein may provide feature tracking, such as face tracking, using reduced processing time and memory consumption compared to existing approaches. This may provide the ability to use feature tracking data in a wide variety of contexts, such as highlighting or blurring a face or license plate of interest visible in the video in real-time.
Although the detection and tracking features have been described in language specific to structural features and/or methodological steps, it is to be understood that the invention defined in the appended claims is not necessarily limited to the specific features or steps described. Rather, the specific features and steps are disclosed as preferred forms of implementing the claimed invention.
The operations of the example processes are illustrated in separate blocks and are summarized with reference to these blocks. The processes are illustrated as a flow of logical blocks, each of which may represent one or more operations that may be implemented in hardware, software, or a combination thereof. In the context of software, the operations represent computer-executable instructions stored on one or more computer-readable media that, when executed by one or more processors, cause the one or more processors to perform the recited operations. Generally, computer-executable instructions include routines, programs, objects, modules, components, data structures, and so forth that perform particular functions or implement particular abstract data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations can be performed in any order, combined in any order, sub-divided into multiple sub-operations, and/or performed in parallel to implement the described processes. The described processes may be performed by one or more segments of hardware logic (such as FPGAs, DSPs, or other types of accelerators) and/or resources associated with one or more computing devices 102 (such as one or more internal or external CPUs or GPUs).
The methods and processes described above may be embodied in, and fully automated via, software code modules executed by one or more general-purpose computers or processors. These code modules may be stored in any type of computer-executable storage media or other computer storage device. Some or all of the methods may alternatively be embodied in dedicated computer hardware.
Unless specifically stated otherwise, conditional language (such as "can," "might," or "may") is understood in context to mean that particular examples include, while other examples do not include, particular features, elements, and/or steps. Thus, such conditional language is not generally intended to imply that particular features, elements, and/or steps are required for one or more examples, or that one or more examples necessarily include logic for deciding, with or without user input or prompting, whether particular features, elements, and/or steps are to be included or performed in any particular example. Unless specifically stated otherwise, conjunctive language (such as the phrase "at least one of X, Y, or Z") is understood to indicate that an item, term, etc. can be X, Y, or Z, or a combination thereof.
Any routine descriptions, elements, or blocks in the flow charts described herein and/or in the accompanying drawings should be understood as potentially representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or elements in the routine. Alternative implementations are included within the scope of the examples described herein in which elements or functions may be deleted or executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art. It should be emphasized that many variations and modifications may be made to the above-described examples, the elements of which are to be understood as being among other acceptable examples. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims.

Claims (10)

1. A system for feature tracking, comprising:
a processing unit;
a computer readable medium CRM operatively coupled to the processor;
a detection component stored in the CRM and executable by the processing unit to detect a feature in a first frame of a video;
a tracking module stored in the CRM and executable by the processing unit to generate a candidate feature location map based at least in part on a frame of a video and a previous feature location map, wherein the candidate feature location map indicates candidate locations of one or more candidate features and each of the candidate features corresponds to at least some image data of the frame of the video; and
an update module stored in the CRM and executable by the processing unit to:
determining a candidate feature region based at least in part on the candidate feature location map; and
locating a feature at the candidate feature region in the frame of the video by comparing a histogram of a test region in the frame of the video to a histogram of a corresponding region in a previous frame of the video, wherein the test region is determined based at least in part on the candidate feature region;
wherein the tracking module determines a correspondence between a detected feature and a candidate feature using an allocation algorithm to determine a mapping between the detected feature and the candidate feature to determine whether the candidate feature is in-frame or out-of-frame.
2. The system of claim 1, wherein the update module is further executable by the processing unit to:
determining a second test region based at least in part on the candidate feature region or the test region; and
locating a feature at the candidate feature region in the frame of the video by comparing a histogram of the second test region in the frame of the video to a histogram of a corresponding region in the previous frame of the video.
3. The system of claim 1, wherein the update module is further executable by the processing unit to determine the test region based at least in part on a spatial relationship between the candidate feature region and the test region.
4. A method for feature tracking, comprising:
detecting a feature in a first frame of a plurality of frames of a video, the plurality of frames of the video including the first frame followed by a second frame;
determining a detected first frame feature region of the first frame corresponding to the detected feature;
determining a tracked candidate second frame feature region of the second frame based at least in part on the first frame, the second frame, and the detected first frame feature region;
comparing the histogram of the tracked candidate second frame feature region with the histogram of the detected first frame feature region;
determining a correspondence between a detected feature and a candidate feature using an allocation algorithm to determine a mapping between the detected feature and the candidate feature to determine whether the candidate feature is in-frame or out-of-frame; and
selecting the tracked candidate second frame feature region as a tracked second frame feature region in response to the comparison indicating at least a threshold degree of similarity.
5. The method of claim 4, wherein the comparing comprises calculating a distance between the histogram of the tracked candidate second frame feature region and the histogram of the detected first frame feature region.
6. The method of claim 4, wherein determining the candidate tracked second frame feature region comprises applying a Kanade-Lucas-Tomasi tracker to the first frame and the second frame.
7. A system for feature tracking, comprising:
a processing unit;
a computer readable medium CRM operatively coupled to the processor;
a detection component stored in the CRM and executable by the processing unit to detect a feature in a first frame of a video;
a tracking component stored in the CRM and executable by the processing unit to track each of the located features over a subsequent plurality of second frames of the video, wherein the tracking component is configured to record an indication that a first feature of the tracked each of the located features has moved out of a particular frame of the plurality of second frames if: the histogram of the first feature region of the particular frame does not correspond to the histogram of the first feature region of the first frame; and
a reporting component stored in the CRM and executable by the processing unit to provide an indication of at least some of the various ones of the located features that are tracked;
wherein the tracking component determines a correspondence between a detected feature and a candidate feature using an assignment algorithm to determine a mapping between the detected feature and the candidate feature to determine whether the candidate feature is in-frame or out-of-frame.
8. The system of claim 7, wherein the reporting component is further executable by the processing unit to present for display one or more frames of the video and a visual representation of at least some of the tracked ones of the located features.
9. The system of claim 7, wherein the tracking component is further executable by the processing unit to:
calculating a distance between a histogram of a first feature region of the specific frame and a histogram of the first feature region of the first frame; and
determining that the histogram of the first feature region of the particular frame and the histogram of the first feature region of the first frame do not correspond to each other in response to the calculated distance exceeding the selected threshold.
10. The system of claim 7, further comprising an image sensor configured to provide one or more of the frames of the video.
CN201510496055.0A 2015-08-13 2015-08-13 Machine vision feature tracking system Active CN106469443B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201510496055.0A CN106469443B (en) 2015-08-13 2015-08-13 Machine vision feature tracking system
PCT/US2016/044143 WO2017027212A1 (en) 2015-08-13 2016-07-27 Machine vision feature-tracking system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510496055.0A CN106469443B (en) 2015-08-13 2015-08-13 Machine vision feature tracking system

Publications (2)

Publication Number Publication Date
CN106469443A CN106469443A (en) 2017-03-01
CN106469443B true CN106469443B (en) 2020-01-21

Family

ID=56682259

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510496055.0A Active CN106469443B (en) 2015-08-13 2015-08-13 Machine vision feature tracking system

Country Status (2)

Country Link
CN (1) CN106469443B (en)
WO (1) WO2017027212A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6961363B2 (en) * 2017-03-06 2021-11-05 キヤノン株式会社 Information processing system, information processing method and program
CN108304755B (en) * 2017-03-08 2021-05-18 腾讯科技(深圳)有限公司 Training method and device of neural network model for image processing
US11023761B2 (en) * 2017-11-06 2021-06-01 EagleSens Systems Corporation Accurate ROI extraction aided by object tracking
US10572723B2 (en) * 2017-12-07 2020-02-25 Futurewei Technologies, Inc. Activity detection by joint human and object detection and tracking
US11694346B2 (en) * 2018-06-27 2023-07-04 Telefonaktiebolaget Lm Ericsson (Publ) Object tracking in real-time applications
CN109360199B (en) * 2018-10-15 2021-10-08 南京工业大学 Blind detection method of image repetition region based on Watherstein histogram Euclidean measurement
WO2020133330A1 (en) * 2018-12-29 2020-07-02 Zhejiang Dahua Technology Co., Ltd. Systems and methods for video surveillance
CN111612813A (en) * 2019-02-26 2020-09-01 北京海益同展信息科技有限公司 Face tracking method and device
CN110110670B (en) * 2019-05-09 2022-03-25 杭州电子科技大学 Data association method in pedestrian tracking based on Wasserstein measurement
US11087162B2 (en) * 2019-08-01 2021-08-10 Nvidia Corporation Determining relative regions of interest in images using object detection
CN111209845A (en) * 2020-01-03 2020-05-29 平安科技(深圳)有限公司 Face recognition method and device, computer equipment and storage medium
CN111507355B (en) * 2020-04-17 2023-08-22 北京百度网讯科技有限公司 Character recognition method, device, equipment and storage medium
KR20220037531A (en) * 2020-09-17 2022-03-25 현대자동차주식회사 Vehicle and controlling method of vehicle

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012141663A1 (en) * 2011-04-13 2012-10-18 Alptekin Temizel A method for individual tracking of multiple objects
CN104392461A (en) * 2014-12-17 2015-03-04 中山大学 Video tracking method based on texture features
CN104637062A (en) * 2015-02-17 2015-05-20 海南大学 Target tracking method based on particle filter integrating color and SURF (speeded up robust feature)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8335348B2 (en) * 2009-12-14 2012-12-18 Indian Institute Of Technology Bombay Visual object tracking with scale and orientation adaptation

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012141663A1 (en) * 2011-04-13 2012-10-18 Alptekin Temizel A method for individual tracking of multiple objects
CN104392461A (en) * 2014-12-17 2015-03-04 中山大学 Video tracking method based on texture features
CN104637062A (en) * 2015-02-17 2015-05-20 海南大学 Target tracking method based on particle filter integrating color and SURF (speeded up robust feature)

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A Hybrid Algorithm for Tracking and Following People using a Robotic Dog; pp. 185-192; HRI '08 Proceedings of the 3rd ACM/IEEE International Conference on Human-Robot Interaction; 2008-03-15; pp. 185-192 *
Automatic Person Detection and Tracking using Fuzzy Controlled Active Cameras; Keni Bernardin et al.; 2007 IEEE Conference on Computer Vision and Pattern Recognition; 2007-07-16; pp. 1-8 *

Also Published As

Publication number Publication date
CN106469443A (en) 2017-03-01
WO2017027212A1 (en) 2017-02-16

Similar Documents

Publication Publication Date Title
CN106469443B (en) Machine vision feature tracking system
Crabb et al. Real-time foreground segmentation via range and color imaging
US9426449B2 (en) Depth map generation from a monoscopic image based on combined depth cues
US9179071B2 (en) Electronic device and image selection method thereof
US8553931B2 (en) System and method for adaptively defining a region of interest for motion analysis in digital video
US9256324B2 (en) Interactive operation method of electronic apparatus
US9390511B2 (en) Temporally coherent segmentation of RGBt volumes with aid of noisy or incomplete auxiliary data
US20160379078A1 (en) Apparatus for and method of processing image based on object region
Shen et al. Real-time and robust compressive background subtraction for embedded camera networks
Benedek et al. Study on color space selection for detecting cast shadows in video surveillance
US10621730B2 (en) Missing feet recovery of a human object from an image sequence based on ground plane detection
Zoidi et al. Stereo object tracking with fusion of texture, color and disparity information
CN109819206B (en) Object tracking method based on image, system thereof and computer readable storage medium
CA2964966C (en) Video stabilization system and method
Mohatta et al. Robust hand gestural interaction for smartphone based AR/VR applications
Muddamsetty et al. A performance evaluation of fusion techniques for spatio-temporal saliency detection in dynamic scenes
US20150078617A1 (en) Mobile terminal and method for generating control command using marker attached to finger
KR20140051082A (en) Image processing device using difference camera
Almomani et al. Segtrack: A novel tracking system with improved object segmentation
Braham et al. A physically motivated pixel-based model for background subtraction in 3D images
JP6598952B2 (en) Image processing apparatus and method, and monitoring system
CN113505760B (en) Target detection method, device, related equipment and computer readable storage medium
Garje et al. Optical flow based violence detection in video surveillance
US9002056B2 (en) Image processing method and apparatus
US20220398810A1 (en) Method for Generating an Augmented Video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant