WO2013019743A2 - Apparatus and methods for object recognition using a genetically-defined feature space transform - Google Patents


Info

Publication number
WO2013019743A2
Authority
WO
WIPO (PCT)
Prior art keywords
feature space
transform
sensor
interest
compact
Prior art date
Application number
PCT/US2012/048881
Other languages
French (fr)
Other versions
WO2013019743A3 (en)
Inventor
Daniel Riley SCHUPP
Lue ANDRIE-HER
Original Assignee
Reconrobotics, Inc.
Priority date
Filing date
Publication date
Application filed by Reconrobotics, Inc.
Publication of WO2013019743A2
Publication of WO2013019743A3


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/2111 Selection of the most significant subset of features by using evolutionary computational techniques, e.g. genetic algorithms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2132 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on discrimination criteria, e.g. discriminant analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Definitions

  • The present invention relates to apparatus and methods for object classification from visual information using a feature space transform that is genetically defined off-line and then implemented on a low- and locally-powered, low-cost, basic image processing platform (e.g., adapted from prior-generation cellular phones), as well as to apparatus and methods for generating the feature space transform; that is, within the field of machine vision, the efficient classification of specific pre-defined objects in near real-time, in situ, is a significant problem addressed and solved herein.
  • In a genetic algorithm ("GA"), a group of strings encodes candidate solutions to an optimization problem and, over generations, evolves toward better solutions. Traditionally, solutions are represented in binary as strings of 0s and 1s, but other encodings are also possible.
  • the evolution usually starts from a population of randomly generated individuals and happens in generations. In each generation, the fitness of every individual in the population is evaluated, multiple individuals are selected from the current population (based on their fitness), and modified (recombined, crossed-over, and possibly randomly mutated) to form a new population. The new population is then used in the next iteration of the application of the GA. Commonly, the GA can be selected to cease operating when a maximum number of generations have been produced or a satisfactory fitness level has been reached for the population. If the GA has terminated due to a maximum number of generations, a satisfactory solution may or may not have been reached.
  • a typical genetic algorithm requires: a genetic representation of the solution domain, and a fitness function to evaluate the solution domain.
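  • As a concrete illustration only (not the specific implementation disclosed herein), a minimal classical GA loop of the kind just described can be sketched as follows; the binary encoding, truncation selection, one-point crossover, and parameter values are illustrative assumptions.

```python
import random

def evolve(fitness, length=32, pop_size=50, generations=100, mutation_rate=0.01):
    """Minimal classical GA: binary-string individuals, truncation selection,
    one-point crossover, and per-bit random mutation."""
    population = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]                 # keep the fitter half
        children = []
        while len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, length)             # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if random.random() < mutation_rate else bit
                     for bit in child]                    # random mutation
            children.append(child)
        population = children
    return max(population, key=fitness)

# Toy usage: maximize the number of 1s in the string (the "one-max" problem).
best = evolve(fitness=sum)
```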
  • GA: Genetic Algorithm.
  • Chromosome: a component of the solution in an individual, or a series of genes.
  • Alleles: the set of possible states a single gene can take; can be numerical or symbolic (for example, 152 or "addition").
  • Biological equivalent: Adenine, Thymine, Cytosine, Guanine in DNA.
  • Fitness Function: a function that explicitly defines what constitutes a desirable solution; this function is used to evaluate and deliver the fitness.
  • Feature Space: an abstract space in which the features of an object are more readily quantifiable.
  • Basis Vectors: the set of vectors which define a coordinate system; all other vectors can be defined as a combination of these vectors.
  • Blobs: the representation of an object as a binary image (black and white).
  • the instant disclosure relates to use of genetic algorithms that are individually trained extensively a priori to identify specific objects or features of such specific objects (e.g., man-made apparatus such as vehicles, buildings, signs, flags, and the like and natural-objects such as animals and the like).
  • the resulting feature recognition set typically includes a very compact algorithm that can be implemented in or on a small reconnaissance vehicle having limited processing capability, for example.
  • the operative algorithms can be grouped or utilized to identify diverse and distinct objects (e.g., two, five, ten, or more) from the vehicle.
  • the feature recognition set or system can be deployed as a compact self-contained unit and mounted to a structure having a desired field of view.
  • For the objects of interest, which can include a weapon, a person, a vehicle, and/or a particular building, a single feature recognition set (on the vehicle or mounted in a stationary location) can alternatively apply the diverse algorithms upon a scene of interest and distinguish, with a high degree of specificity and sensitivity, which of the plurality of objects is present.
  • the conclusions drawn from the algorithm can be wirelessly transmitted for appropriate follow-up, stored to a memory location, trigger a remote or local alert or alarm, and/or can be quickly acted upon by the vehicle.
  • the GA is inseparable from the object recognition process.
  • Because the feature space transform is the system and the method of distinguishing objects, and because the feature space transform is fully evolved through the GA, it would be incorrect to say that the GA plays anything but a core role in the system.
  • The GA in the '07 patent application assists the object classification, while the GA in the proposed system is the object classification.
  • Object recognition utilizing machine or robotic vision has been a difficult capability to achieve. With man-made objects the task is very cumbersome, for one reason, due to the large variety of objects capable of being classified or identified. Organic objects are even more difficult due to the infinite variability of not only the objects themselves but also the infinite variety of positions and deformations they can assume.
  • the system according to this disclosure requires a video stream of a scene, either from a live camera or from a saved file, as well as hardware for video processing and the (optimized) feature space transform.
  • the system also requires potentially separate hardware for genetic training of the feature space transform.
  • the video must be initially processed to extract images and blobs. This allows a human to discern between the different kinds of objects that are desired for classification.
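  • The disclosure does not prescribe a particular blob-extraction method; one plausible sketch, assuming OpenCV background subtraction and contour extraction (OpenCV 4.x call signatures), is:

```python
import cv2

# Assumed preprocessing sketch: background subtraction followed by contour
# extraction to isolate candidate object "blobs" from each video frame.
subtractor = cv2.createBackgroundSubtractorMOG2()

def extract_blobs(frame, min_area=500):
    mask = subtractor.apply(frame)                        # foreground mask
    _, mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    blobs = []
    for contour in contours:
        if cv2.contourArea(contour) < min_area:           # discard noise specks
            continue
        x, y, w, h = cv2.boundingRect(contour)
        blobs.append(mask[y:y + h, x:x + w])              # binary (black/white) blob crop
    return blobs
```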
  • the GA generates feature space transforms using random numbers (similar to the concept of genetic mutation).
  • The GA maintains a population of individuals; in this case, an individual consists of a single transform.
  • the fitness function must evaluate the transform for all training data, and then analyze the representations of the data.
  • Higher fitness scores are awarded for a tighter grouping within each class, and for greater distances between the classes (for instance, bikers and pedestrians should have substantially different scores from each other, but all bikers should have a similar representation).
  • the feature space transform itself is a dot product of the representation of the image with transformation matrices (and thus resolves down to a single point value in a grid).
  • the matrix is simply a series of coefficients which are optimized by the GA to produce significant shifts in the feature space representation for defining features of an object's blob.
  • the blob representation is multiplied element-by- element by the transformation matrices, and the result is summed for each matrix.
  • Each transformation matrix provides a dimension for the feature space.
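  • A minimal sketch of the multiply-and-sum (dot product) operation described above, assuming numpy arrays and illustrative shapes (a 50x50 blob and one 50x50 coefficient matrix per feature-space dimension):

```python
import numpy as np

def feature_space_point(blob, matrices):
    """Project a standardized blob into the feature space: each transformation
    matrix is multiplied element-by-element with the blob and the products are
    summed, yielding one coordinate (dimension) per matrix."""
    return np.array([np.sum(blob * m) for m in matrices])

# Illustrative shapes only: a 50x50 binary blob and two 50x50 coefficient
# matrices give a two-dimensional point in the feature space.
blob = np.random.randint(0, 2, (50, 50)).astype(float)
matrices = [np.random.randn(50, 50), np.random.randn(50, 50)]
point = feature_space_point(blob, matrices)               # e.g., array([x, y])
```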
  • the genetic algorithm needs to evolve the solutions through a series of generations, in which the solutions are combined through genetic crossover, and compete to provide the stress necessary to guide the evolution of the transform.
  • the algorithm can keep a human user up to date, by calculating the percentage accuracy on the test data.
  • the transform can be easily incorporated into an image processing system.
  • the system simply performs the feature space transform, and checks the representation against the known values of the objects to be identified.
  • a GA develops solutions which act as the transform to a feature space.
  • a feature space is a space in which the coordinate system is fundamentally different from the usual Cartesian or polar spatial coordinate system.
  • a classic example of a commonly used space is the Fourier domain.
  • a time-varying sinusoid that stretches to infinity becomes an impulse (essentially, a point) in frequency (Fourier) space.
  • a feature space is characterized not by a shift from spatial and temporal domains to a frequency domain, but rather a shift from a two-dimensional representation (the "blob" of an object) to a point in a custom-defined feature space, which brings out distinctions between different types of objects.
  • By finding the feature space transform that is most effective at bringing out differences between classes of objects, the feature space which most lends itself to the problem is defined.
  • An alternate explanation of the feature space takes the point of view of vector mathematics and the dot product. The multiply and sum method that is employed by the transform is the same as a numerical dot product.
  • a dot product is referred to as a projection, because it projects one vector onto another. In essence, by projecting the image of the object onto the feature space the features that differentiate it from the other objects are thrown into stark contrast.
  • the solutions in the population are the same size as the sample images (a standard size is picked, and the blob of an object must be extracted from an image and resized to that standard size).
  • a series of example images is fed through each solution, and their resulting scores are represented by points on a graph.
  • the fitness of a solution is determined by the tightness of the grouping, and distance between groupings of objects (the first priority is to remove overlaps between groups).
  • the best solution would have a series of very tightly knit groupings (one for each class) that are many standard deviations away from each other.
  • With the form of the solution defined, it is simple to generate a population of solutions with associated fitness scores.
  • the form of the fitness function allows the algorithm to evaluate the solutions, promoting solutions that perform well, while removing solutions that perform poorly.
  • This algorithm will yield a genetically-defined feature space transform.
  • the feature space transform per the present disclosure runs on a compact, very low power, and computationally inexpensive platform (e.g., it can be implemented with prior-generation cell phone image processing firmware), and is simple to implement.
  • the winning or best solution is included in the image processing code as a predefined matrix.
  • When an object is detected, it can be isolated and resized to the predetermined standard size.
  • the object's blob representation is then simply multiplied by the feature space transform element-by-element, and the result is summed together. This is the numerical definition of a dot product, which yields a single point representing the blob in the feature space.
  • By knowing information about the feature space gleaned from the genetic algorithm itself, the recognition engine can identify which grouping the point belongs to. For example, the x/y bounds of the person-on-a-bike class or the vehicle class are known, so if a point falls within those bounds, it is classified as a person on a bike or a vehicle, respectively.
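  • A minimal sketch of this run-time check, assuming a two-dimensional feature space (two matrices) and hypothetical class bounds standing in for the groupings learned during training:

```python
import numpy as np

# Hypothetical x/y bounds (x_min, x_max, y_min, y_max) gleaned from the
# training groupings; "matrices" would be the winning chromosome's matrices.
CLASS_BOUNDS = {
    "person on a bike": (-3.0, -1.0, 0.5, 2.5),
    "pedestrian":       (1.0, 3.0, -2.0, 0.0),
}

def classify(blob, matrices, bounds=CLASS_BOUNDS):
    x, y = (np.sum(blob * m) for m in matrices)           # one dot product per dimension
    for label, (x0, x1, y0, y1) in bounds.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return label
    return None                                           # indeterminate / not of interest
```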
  • An assumption that can further reduce the complexity and computational overhead is that in some, if not most or all, object recognition environments or situations the sensor, or camera, will have a predetermined perspective (e.g., an oblique angle of, say, forty-five degrees (45°) for an aerial vehicle), so the recognition engine only needs to be trained with images of an object of interest from that perspective in order to effectively and efficiently classify same.
  • the recognition engine is susceptible of utilizing diverse optics including electro-optics, standard magnifying optics for small-scale objects of interest, and the like by simply coupling the optics of choice to the sensing or sensor portion of the system.
  • FIG. 1 depicts in flow chart form the pre-processing (or training) phase, captioned as the "In House Processing” and the field processing (or simply, the processing) phase of generating a suitable GA and then applying it in the field in a compact, efficient manner, respectively.
  • FIG. 2 depicts in annotated flow chart form the training phase for a genetically- defined feature space transform using the previously generated GA, according to the instant disclosure.
  • FIG. 3 is an annotated flow diagram illustrating the basic connections and some of the operations of the embedded processor of the instant disclosure.
  • FIGS. 4A and 4B depict in a relative scale an image captured that originally includes 229x209 pixels (following background / foreground filtering, or segmentation) and then is resized to a 50x50 pixel thumbnail matrix (or 2500 element vector) for processing and analysis by the feature space transform, respectively.
  • FIG. 5 illustrates two distinct classes of objects; namely, a pedestrian and a person on a recumbent bicycle (A and B) applied via a suitably trained feature space transform (or chromosome) as seen at C to provide an X-Y plot showing the grouping and separation between these distinct objects (and also incidentally including several automobiles as outside the desired feature spaces of the pedestrian and the person on a recumbent bicycle).
  • FIG. 6 depicts in grey-scale for patent purposes a pair of chromosomes (#1 and #2) which upon close examination reveal significant differences therebetween.
  • FIG. 7 illustrates, in a histogram format, a plot of a "bike fit" and a "human fit" for a given chromosome (herein chromosome #17), reflecting the statistically significant separation therebetween.
  • FIG. 8 is a photograph of a stationary-mounted field system according to the instant disclosure, wherein the field-based system was provided with relatively heavy- duty battery power (including heat transfer fins for warmer days) and was trained to classify pedestrians, bikers, and cars.
  • FIG. 9 depicts an aerial vehicle having a sensor near or on its nose and illustrating the oblique angle, theta (θ) (used to reduce the complexity of object recognition herein) for this particular application, wherein the vehicle is assumed to have a downward stare at θ, periodic or persistent, equaling approximately 45 degrees (45°) from horizontal or from the vehicle's present course.
  • FIG. 10 depicts a low-to-the-ground reconnaissance vehicle, such as those designed, manufactured, and distributed by ReconRobotics, Inc. of Edina, Minn. (U.S.A.), having a generally upward stare (angle θ) to identify objects of interest from its typically relatively lower perspective during tactical operation.
  • the present disclosure relates to systems, components, methods, and use of genetically-defined feature space transforms for rapid, efficient, and low-power object recognition in diverse environments and of diverse objects.
  • the feature space transform can be formulated via evolution in advance of deployment of the resulting feature space transform on computationally simple and efficient platforms.
  • The former is referred to as a pre-processing phase and the latter a processing phase.
  • the pre-processing phase can utilize any arbitrary computational platform including user interfaces therefor.
  • Such platforms can include diverse interfaces that include one or more displays (e.g., standard 2D or single-, or multi-, touch-screen displays) so the operator can monitor, interrupt, or alter variables toward progress during the pre-processing phase, for example.
  • Manual data entry can occur via known techniques such as capacitive-touch (e.g., using a stylus, manual digit, or the like), mouse-, keyboard-, or voice-activated and -driven input pathways.
  • Data response and retrieval from such platforms can include one or more of visual display, auditory, and tactile feedback to the operator, as is well known in the art.
  • the sensor can be mounted in a stationary location and used to monitor and classify one or more types of objects passing through its field of view in a persistent stare configuration.
  • The sensor can be articulated to periodically sample a first field of view and at least a second field of view.
  • the ideally complementary fields of view can then be stitched together to provide a virtual, and larger, field of view.
  • Two or more discrete object recognition engines and accompanying sensors, whether operating at the same or different wavelengths, can also be utilized.
  • a dedicated recognition engine operating in one discrete portion of the electromagnetic spectrum could share all or a portion of its field of view with a second (or more) recognition engine(s) monitoring a different portion of the electromagnetic spectrum.
  • the location could utilize pre-existing structures (manmade and natural) or could be mounted upon a specially erected structure at a desired location.
  • the sensor can be mounted to a moving vehicle traversing air, water, land, and space.
  • The types of sensors deployed for the processing phase include nearly every type that is pixel-based, or capable of being represented as pixelated data, presently known and later developed (e.g., visible spectrum, infrared, ultraviolet, microwave, radiofrequency, millimeter wave, and could also include acoustic sensors as well).
  • the compact physical dimensions and the very low power requirements allow the system to operate for extended periods in the processing mode using just a local battery source.
  • the battery source can comprise a primary or secondary (rechargeable) battery and the power requirements when implemented on an essentially "smart phone” platform are on the order of five volts (5V) to about twelve volts (12V) with a secondary battery providing 1000 mAh to about 2000 mAh (which could be configured as a primary battery cell).
  • the present real-time, field-based recognition engine can operate reliably coupled to a manually-deployed UAV having a total lift-off weight of between about five pounds (5#) and about ten pounds (10#) or so.
  • the field-based recognition engine can be hosted by other very lightweight UAV reconnaissance vehicles without diminishing the performance thereof.
  • the recognition engine as described, depicted, and claimed herein will not inhibit the performance in terms of loitering time above a potentially hostile environment while at the same time providing specific object identification and classification capabilities to the UAV (and thus, via telemetry, rapid response from other theatre-based assets).
  • The selected feature space transform will continuously process incoming images from the sensor, and when the specific object is identified (after processing the imagery) as a single point resulting from a dot product mathematical operation during image processing, the coordinates of said point can be sent via telemetry, saved to memory, and/or otherwise communicated to a remote location.
  • a visual, audio, tactile, or other type of alert can be generated generally or specifically relating to the type or class of the object just identified and classified.
  • more than one object can be accurately identified by a single suitably trained or genetically-defined feature space transform assuming adequate separation between classes and relatively tight grouping for each class is obtainable from the transform.
  • The operational transform is trained using images having a unique perspective. That is, if the intended object or objects used for training will be imaged by a sensor coupled to a manned aircraft or unmanned aerial vehicle, an assumption for training and operation would include an oblique angle of, for example, 45 degrees (~45°) downward. If the sensor were coupled to a small reconnaissance vehicle, such as one of the vehicles produced by ReconRobotics, Inc. of Edina, Minnesota, U.S.A., the angle used for training would be from a low-to-the-ground location (e.g., perhaps 15°-20° upward). If used on a submersible, the angle could be arbitrary, depending on whether shallow- or deep-water reconnaissance was planned and the types and likely locations of objects of interest to be interrogated relative to the submersible vehicle (or fixed submerged location).
  • the Object is the vector representation of an object and the [Genome] is a set of vectors (N number) the same length as the Object.
  • the [Score] is a set of N numbers. It can be thought of as a much shorter vector in the feature space.
  • the fitness function is used to determine the usefulness of a given solution. The highest possible fitness score is sought by the genetic algorithm.
  • the fitness function takes the general form of:
  • the genetic algorithm applies the feature space transform of each solution in the population to every piece of test data.
  • the fitness of the solution is calculated using the fitness function, and the population is ordered by fitness. Solutions with higher fitness scores are promoted and combined.
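  • The general form referenced above is not reproduced here; one plausible formulation consistent with the stated criteria (tight intra-class grouping, large inter-class separation), with the exact scoring expression being an assumption, is:

```python
import numpy as np
from itertools import combinations

def fitness(genome, training_data):
    """Illustrative fitness: reward large distances between class centroids in
    the feature space and penalize spread within each class.  `genome` is a set
    of N vectors the same length as the vectorized objects; `training_data`
    maps a class label to a list of object vectors."""
    scores = {label: np.array([[np.dot(obj, g) for g in genome] for obj in objs])
              for label, objs in training_data.items()}
    centroids = {label: pts.mean(axis=0) for label, pts in scores.items()}
    spread = sum(pts.std(axis=0).sum() for pts in scores.values())
    separation = sum(np.linalg.norm(centroids[a] - centroids[b])
                     for a, b in combinations(centroids, 2))
    return separation / (1.0 + spread)                    # higher is fitter
```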
  • FIG. 1 depicts in flow chart form 100 the pre-processing (or training) phase, captioned as the "In House Processing” 102 and the field processing (or simply, the processing) phase 104 of generating a suitable GA to generate a qualifying feature space transform and then applying the feature space transform in the field in a compact, efficient manner, respectively.
  • In the pre-processing phase 102, at least two or a longer series of training images of an object of interest to later be accurately classified are applied through a genetic program 108, which then provides a candidate GA 110 that passes to an evaluation phase 112, and the fitness relative to serving as an operational GA driver for a specific feature space transform is evaluated relative to other generated GAs 110.
  • The fitness of the GA 110 is either adequate or optimum or not (at 114); in the former case the GA 110 passes to the field processing (or processing) phase for use in the field in a compact, computationally efficient embedded controller (as described and depicted herein). In the latter case at least another generation of in-house training occurs in an effort to reach an optimum GA for the given object of interest.
  • One aspect of the pre-processing (or in-house) training phase is that for certain scenarios an assumption is made as to the likely perspective of the object of interest relative to the sensor that captures the images in real time in the field.
  • the assumption might include a nominal 45 degree (45°) inclination or oblique angle between sensor and object thus greatly reducing the other possible perspectives that need to be considered and reducing the complexity of both the training and the recognition operations. If this assumption needs to be refined or modified the original real time images can be processed and then post-processed (or additionally processed) using, for example, an affine transformation or algorithm. This might be beneficial if the original assumption were a nominal 45 degree (45°) inclination or oblique angle and the actual angle were on the order of near 90 degrees (90°) for example.
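  • A sketch of such a post-processing step, assuming an OpenCV affine warp and hypothetical correspondence points:

```python
import cv2
import numpy as np

def reproject(image, src_pts, dst_pts):
    """Warp an image captured at one oblique angle toward the perspective
    assumed during training, using an affine transform defined by three
    corresponding points."""
    matrix = cv2.getAffineTransform(np.float32(src_pts), np.float32(dst_pts))
    h, w = image.shape[:2]
    return cv2.warpAffine(image, matrix, (w, h))

# Hypothetical correspondences chosen to squash a near-90-degree (top-down)
# view toward the nominal 45-degree training perspective.
src = [(0, 0), (100, 0), (0, 100)]
dst = [(0, 0), (100, 0), (20, 70)]
```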
  • the feature space transform is inserted into a field processing routine 104, desirably disposed in an efficient, low-overhead environment (e.g., both low power, compact electronic files, and lightweight).
  • An image 116 is acquired by the sensor and a so-called blob extraction technique is applied (at 118) that can include various techniques for isolating the object of interest from background items deemed noise, and from some foreground items, if needed.
  • a resulting blob image 120 is thus produced and it is applied to the feature space transform for rapid processing.
  • The feature space transform can provide several details about the object, such as locating its centroid and outer peripheral edges, and after normalizing the image (e.g., to a nominal size like 50x50 pixels, or a 2500-element matrix) performs a mathematical dot product calculation that has a single value as an output.
  • This single value is mapped to a graphical depiction (e.g., a grid) of previously identified and verified objects of interest (at 122) and given the single value's location relative to the grid, an identification declaration occurs (at 124).
  • the identification declaration 124 could be affirmative, negative, or indeterminate. If affirmative, then the object of interest has been located with a high degree of confidence resulting in several possibilities for follow- through.
  • the identification could be merely stored in memory for later review, analysis, or cataloging (perhaps in a stationary sensor environment tracking pedestrian and vehicle traffic) or it could initiate an aggressive response to intercept, interdict, or disable the object of interest (in either a stationary sensor or a roving sensor scenario), or it could trigger an alarm or notice locally or remotely.
  • FIG. 2 depicts in more detail than FIG. 1, and in annotated flow chart form 200, the pre-processing (or training) phase for a genetically-defined feature space transform according to the instant disclosure.
  • A series of training images are gathered and organized by type of object of interest. As noted above, these training images 202 are most useful if they include a perspective relative to an anticipated sensor location vis-a-vis the object of interest so as to simplify the problem and provide a relatively finite solution space for the object of interest.
  • the images might include various objects but the training phase 200 can be utilized for diverse objects including manmade and naturally-occurring.
  • The classification engine is trained using the GA 110 by iteratively generating a population (1), applying a fitness function for individuals within the population (2), selecting the best or optimum to "breed" for a next generation (3), breeding the next generation of the population (4), and repeating (beginning at (2)) until a best individual feature space transform emerges (at 5).
  • a number of considerations are or can be included in processing at item 3 of 204.
  • A standardized-size training image and resulting standardized chromosome are used for processing, herein a nominal 50x50 pixel array, although, of course, other sizes, standardized or not, can be used.
  • a general fitness function algorithm per this disclosure includes applying a mathematical dot product operation of a training image to each chromosome individually to generate a vector having the same length as the number of chromosomes. Then grouping together the dot product vectors for all training images by classification type. Subsequently determining the distances between each classification type.
  • the classification groups determine fitness score (e.g., an individual with classification groups that are farther apart in distance relative to other classification groups has a higher fitness score).
  • FIG. 3 is an annotated flow diagram 300 illustrating the basic connections and some of the operations of the embedded processor (or computer system) 306 and 306' of the instant disclosure.
  • a camera 302 or other sensor, captures within its field of view a portion of a scene of interest.
  • the camera 302 can comprise any device for acquiring an image but herein comprises a charge-coupled-device (CCD) which is a pixel-based imaging platform.
  • The camera 302 can thus operate in any of several ranges of the electromagnetic spectrum. As noted hereinabove, the camera, or sensor, 302 can operate in the ultraviolet, infrared, millimeter-wave, visible, microwave, or other portion of the electromagnetic spectrum.
  • the type of sensor 302 implemented typically relates to the expected real-time environment in which the sensor 302 will operate (e.g., if mounted to the front of an automobile, truck, or bus it might utilize fog-piercing millimeter-wave sensing circuitry and the objects of interest might include retro-reflective street signs).
  • The camera 302 couples via universal serial bus 304 to the embedded computer system 306 (operating the annotated processes shown at 306'), although the manner of coupling between the camera 302 and the embedded computer system 306 can be accomplished by other means, including wireless telemetry for example.
  • the embedded computer system 306 is supplied power via battery supply 308 which provides portability, although in the stationary environment scenario the system 306 could be hard-wired via a transformer to a standard source of electrical power from a power grid.
  • several procedures are performed, as annotated box 306' implies.
  • an image from the camera 302 is filtered and resized (herein nominally to a 50x50 pixel array) to bring the object therein to the fore.
  • the object is then processed by the best individual from the genetic algorithm (i.e., the feature space transform) by multiplying the (foreground) object's 50x50 array, or matrix, by said best individual to generate a classification vector.
  • If the classification vector lies within the bounds of a known classification type, then the foreground object is classified as one of the objects of interest that are sought to be identified.
  • the thusly classified objects can be saved to storage (per FIG. 3) and/or can be sent via wireless telemetry to one or more remote locations for follow-up or can trigger a more or less immediate local response.
  • a notice or alert signal can be dispatched for follow-up.
  • a notice can comprise a simple visual indication (e.g., a display-based light signal or "flag" or similar), a relay of the image that provided the successful identification, an auditory alert, a tactile alert, and the like.
  • a warning alarm or spotlight can be rapidly activated illuminating the object of interest automatically while also recording the image(s) to a memory location and/or sending an alert signal to remote location(s) if desired.
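  • Pulling the run-time steps together, a sketch of the embedded loop, reusing the illustrative extract_blobs() and classify() helpers sketched above, with dispatch() standing in for storage, telemetry, or a local alert:

```python
import cv2

def dispatch(label, thumb):
    print("classified:", label)                           # placeholder for save/telemetry/alert

def run(camera_index=0, matrices=None, bounds=None):
    """Sketch of the embedded run-time loop: capture a frame, isolate and
    standardize the foreground object, classify it via the genetically-defined
    transform, and dispatch the result."""
    cap = cv2.VideoCapture(camera_index)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        for blob in extract_blobs(frame):
            thumb = cv2.resize(blob, (50, 50)).astype(float)   # standardize size
            label = classify(thumb, matrices, bounds)
            if label is not None:
                dispatch(label, thumb)
```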
  • the system 306 can be coupled to systems for confirming locations (e.g., GPS coordinates) and include time- stamp, or "clock," features for helping to determine the location and time an object of interest was identified.
  • FIGS. 4A and 4B depict in a relative scale the same captured image 400, which originally includes 229x209 pixels 402 (following background / foreground filtering, or segmentation) and is then resized to a 50x50 pixel thumbnail matrix 404 (or 2,500 element vector) for processing and analysis by the feature space transform, respectively.
  • FIGS. 4A and 4B are intended to provide a sense of the computational economy provided by standardizing the processed images.
  • processing the image at the original resolution of 229x209 pixels would produce a 47,861 element vector (which is almost twenty times (20x) the size of the thumbnail matrixes employed herein).
  • The inventors recognize, however, that due to the resizing or standardizing of the captured images, the relative scale, or actual sizes, of the objects recorded is unavailable.
  • the GA and the resulting feature space transforms provided according to the instant disclosure are specific and sensitive enough to overcome this minor obstacle.
  • manual intervention could be applied or the results ignored following additional image processing before declaring a successful identification of an object of interest.
  • FIG. 5 illustrates at 500 two distinct classes of objects; namely, a pedestrian 502 and a person on a recumbent bicycle 504 (A and B) applied via a suitably trained feature space transform (or chromosome) 506 as seen at C to provide an X-Y plot 508 showing the grouping and separation between these distinct objects 502, 504 (and also incidentally including several automobiles, not depicted, as outside the desired feature spaces of the pedestrian 502 and the person on a recumbent bicycle 504).
  • A person can appreciate that the classifications for objects 502, 504 (and the automobiles) are well defined as being closely grouped (not overlapping) and separated from one another.
  • FIG. 6 depicts in grey-scale for patent purposes (although originally captured in color for ease of review and comparison) a pair of chromosomes (#1 and #2) which upon close examination reveal significant differences therebetween.
  • Chromosomes #1 and #2 represent the feature space that pedestrian 502 (of FIGS. 5A-D) and biker 504 (of FIGS. 5A-D) will be projected onto via the dot operator.
  • FIG. 7 illustrates, in a histogram format, a plot of a "bike fit" and a "human fit" for a given chromosome (herein chromosome #17), reflecting the statistically significant separation therebetween. While nominal overlap occurs, it is not sufficient to deter use of chromosome #17 in the real-time, field-based, processing operations.
  • FIG. 8 is a photograph 800 of a stationary-mounted field system 802 according to the instant disclosure, wherein the field-based system 802 was provided with relatively heavy-duty battery power (including heat transfer fins for warmer days, although not shown as they are coupled to an upper portion of system 802) and was trained to classify pedestrians, bikers, and cars along a well-traveled pathway.
  • the system 802 can be operated in a persistent stare or a periodic image capture mode and can be used to classify objects of interest and/or to simply indicate that the original scene within the field of view 804 of the system 802 has changed (i.e., basic perimeter surveillance).
  • The training image would be the scene within the field of view 804 and the GA would generate a feature space transform for the unpopulated scene in the field of view 804. If a captured image did not provide a positive classification, one could reasonably conclude that something has entered the scene or changed the field of view 804, and appropriate follow-up can be promptly initiated (e.g., personnel dispatched, images saved to memory, illumination and/or alarms activated, etc.). As shown by dashed lines in FIG. 8, the field of view 804 can be used to advantage in that objects within the field of view 804 will always be seen from a perspective at an oblique angle from horizontal, namely theta (θ), and thus the training images should also always have a similar perspective.
  • FIG. 9 depicts an aerial vehicle 900 having a sensor 902 near or on its nose and illustrating the oblique angle, theta (θ) (used to reduce the complexity of object recognition herein) for this particular application, wherein the vehicle is assumed to have a downward stare at θ, periodic or persistent, equaling approximately 45 degrees (45°) from horizontal or from the vehicle's present course.
  • As the aerial vehicle 900 progresses, the sensor (or camera) 902 thus images a discrete field of view 904 and captures images thereof.
  • the images are processed per the real-time (field based) processing described herein and can be captured at any arbitrary rate as long as the resolution of the resulting images is not compromised.
  • Although the aerial vehicle 900 depicted in FIG. 9 appears to be a relatively large unmanned aerial vehicle (UAV), the real-time processing system is intended to operate with equal, if not enhanced, accuracy on a UAV that is deployed in the field, typically by having its flight initiated by being manually deployed into the air. That said, the systems and techniques described herein can be utilized by any arbitrary air vehicle (including space-based, or orbital, platforms) provided the optics used to capture the images are able to adequately resolve the objects they encounter.
  • the sensor or camera 902 can be fixed or articulated, if desired.
  • the processing system can be coupled to existing imaging systems of an air vehicle with a maximum of economy for incidental real-time processing of images.
  • FIG. 10 depicts a low-to-the-ground reconnaissance vehicle 1000, such as those designed, manufactured, and distributed by ReconRobotics, Inc. of Edina, Minn. (U.S.A.), having a generally upward stare (depicted by dashed lines and the angle θ therebetween) to identify objects of interest in its field of view 1002 from its typically relatively lower perspective during tactical operation.
  • the vehicle 1000 includes real-time, field-based feature space transforms operating on the upwardly looking perspective of the vehicle 1000 that was trained to identify handguns (1004).
  • the perspective used to train and produce the transform would most efficiently use an upwardly looking perspective as well.
  • the systems can be used in conjunction with other, perhaps less computationally efficient, systems; for example, in the realm of facial recognition the systems herein can be used to generically determine the presence of "human faces” and that information could be used to trigger a more computationally expensive effort to specifically identify the person(s) within the image(s).
  • The systems hereof can of course be used in an image post-processing effort as well, in a compact, low power platform, to review previously acquired images from diverse sources very, very rapidly.
  • the sample rate-to-conclusion/classification for a given object of interest can exceed the ability to place a "new" image into the real-time system.
  • The object of interest can comprise a unique visual identifier (e.g., ground markings, flags, retro-reflective signs, etc.) that is used by a vehicle to return to a designated location.
  • An unmanned aerial vehicle could utilize the teachings herein to aggressively and affirmatively image, locate, and identify diverse objects of interest during a given mission and subsequently use the same recognition engine to confirm the designated location and return there and land.
  • a specific and unique sign or structure can be used to trigger a vehicle to stop imaging and drop ordnance or the like.
  • the vehicle could image instruction signs that it interprets as a specific action to take at that location.
  • Example 1 Using at least two images of an object of interest in the foreground of an image, performing at least one of: segmentation of the foreground from the background of said images, scaling the thus-isolated object of interest, using the thus-isolated object of interest as a training element for a genetic algorithm adapted to generate a set of candidate feature space transforms for said object of interest, and locating an optimum or relatively-best feature space transform for use in a real-time object classification process, wherein the real-time object classification process includes extracting a so-called "blob" of data points, representative in the abstract of the object of interest, from a newly acquired image from the field, and applying the optimized or relatively-best feature space transform to categorize and classify the object from the newly acquired image.
  • Example 2 According to Example 1 wherein the scaling involves an arbitrary number of imaging pixels in a rectangular configuration, such as 50 x 50 pixels.
  • Example 3 According to Example 1, wherein the at least two images of the object of interest are taken from an angle and/or elevation characteristic of a vehicle (such as an aerial vehicle, a terrestrial vehicle, a submerged or submersible vehicle, or a space- based vehicle) and the vehicle's typical spatial relationship relative to the object of interest.
  • Example 4 According to Example 1, further including iteratively applying a fitness function to the candidate feature space transforms and initially selecting only those feature space transforms that provide close grouping of a single class of objects and/or reasonable separation among different classes of objects relative to other of the feature space transforms.
  • Example 5 According to Example 4, further including finally selecting a single feature space transform that meets at least one of superior object identification or superior class identification relative to other feature space transforms.
  • Example 6 further including loading the finally selected feature space transform onto an embedded controller coupled to an imaging device so that an output signal from the controller and imaging device is provided, and programming the embedded controller to perform mathematical dot product calculations on the output signal so a single value relating to the identity of the object of interest is provided that relates to the class of the object of interest.
  • Example 7 further including, based on the single value, identifying a first object and then performing the procedure again for at least a second object of interest.
  • Example 8 According to Example 6, wherein the embedded controller comprises an OMAP processor (version 3 models 34x, 35x, and 36x, or the like) running ARM architecture.
  • Example 9 A run-time embedded computing platform for an object classification system, comprising: a mounting location disposed one of on or in a structure and affording access to a desired field of view from said structure; a sensor disposed, configured, and mechanically coupled to access the desired field of view; a compact, locally- and low-powered computer system coupled to the mounting location and to the sensor; and a genetically-defined feature space transform operably coupled to the computer system and adapted to receive an extracted object from an image from the sensor, apply the genetically-defined feature space transform thereto, and categorize and determine whether the extracted object is an object of interest.
  • Example 10 A run-time embedded computing platform for an object classification system according to Example 9, wherein the computer system further comprises one of: a telemetry circuit for sending information from the computing platform to remote locations and a removable local storage medium.
  • Example 11 A run-time embedded computing platform for an object classification system according to Example 9 or Example 10, wherein the structure comprises an aerial vehicle, a terrestrial vehicle, and a stationary location.
  • Example 12 A run-time embedded computing platform for an object classification system according to Examples 9 through 11, wherein the feature space transform performs a mathematical dot product calculation to produce a single point for comparison to other known points for the same object of interest.
  • Example 13 A run-time embedded computing platform for an object classification system according to Examples 9 through 12, wherein the sensor comprises a visible light sensor.
  • Example 14 A run-time embedded computing platform for an object classification system according to Examples 9 through 13, wherein the extracted object is extracted using background and foreground filtering and/or segmentation techniques.
  • Example 15 A run-time embedded computing platform for an object classification system according to Examples 9 through 14, wherein the compact, locally- and low-powered computer system comprises an OMAP version 3 processor running an ARM-based instruction set.
  • Example 16 A compact, low- and self-powered field-based object classification system comprising: means for distinguishing a specific object optically using a sensor of arbitrary radiation sensitivity, based on the specific object's representation in a feature space that has been defined by a genetically tailored feature space transform in advance of a field-based distinguishing procedure; and wherein the feature space transform is trained and defined to distinguish the specific object posed within a field of view by filtering an image space, rendering the specific object to a standard compact image size, and creating a chromosome image of the standard compact image size; wherein the genetic algorithm develops a transform for said feature space which can be implemented conveniently in a variety of computationally-simple environments.
  • Example 17 A system according to Example 16, wherein at least one of the computationally-simple environments comprises an object recognition engine operating within an embedded system.
  • Example 18 A system according to Example 16 or Example 17, wherein the genetic algorithm develops the transform over between about two thousand (2,000) and over ten thousand (10,000) discrete generations or until the transform meets preselected criteria of a fitness function.
  • Example 19 A system according to Examples 16 through 18, wherein the arbitrary radiation sensor includes at least one of the following: a visible-wave radiation sensor, a laser radiation sensor, a microwave radiation sensor, a millimeter wave radiation sensor, and an acoustic radiation sensor.
  • Example 20 A system for object recognition utilizing genetically-defined feature space transforms, the system comprising: a pre-processing system for genetic training of the transform, comprising: an input port adapted to receive data defining a two or more dimensional image; an image analyzing module configured to receive the data and to extract the areas of interest from the image to create training images; a processing module that receives the training images and generates a genetic algorithm based on a genetic program; an output port to send a completed genetic feature space transform;
  • Example 21 A processing system for application of the genetic feature space transform, comprising: an input port to take in the genetically-defined transform matrix; and a processor to perform a dot product of the transform to create a vectorized value, wherein the vectorized value is subsequently compared utilizing a fitness function to various categories and is then classified, and wherein the output comprises a classified object determined by pre-set categories.
  • Example 22 A non-transitory computer-readable media providing instructions, comprising: instructions for receiving via an input port a genetically-defined feature space transform matrix for a specific object to be identified; instructions for applying via a processor a dot product calculation of the genetically-defined feature space transform to create a vectorized value; instructions for comparing the vectorized value via a fitness function to various categories; instructions for classifying the value of the feature space transform based on tightness of a grouping of the vectorized value relative to other vectorized values for the same specific object to be identified; and instructions for determining if the output qualifies to be classified as the object to be identified per pre-set categories.
  • Example 23 A pre-processing and training subsystem for generating genetically-defined feature space transforms for particular objects of interest, comprising: a computer system adapted to receive and process training images of specific objects of interest via a system input port; a genetic program coupled to the computer system and adapted to generate populations of genetic algorithms for assessment in producing a suitable genetic algorithm; means for applying the training images to the populations of GAs and producing feature space transforms; means for evaluating the fitness of the feature space transforms and, when a threshold of fitness is met or exceeded, generating an output feature space transform for real-time field processing on a compact, self- and low-powered computer system disposed in one of a stationary location, on an aerial vehicle, and on a terrestrial vehicle.
  • Example 24 A pre-processing and training subsystem according to Example 23, wherein the training images comprise a video stream of images.
  • Example 25 A pre-processing and training subsystem according to Example 23 or Example 24, wherein the training images depict more than one type of object of interest.
  • Example 26 A pre-processing and training subsystem according to Examples 23 through 25, wherein the aerial vehicle comprises an unmanned aerial vehicle.
  • Example 27. A pre-processing and training subsystem according to Examples 23 through 26, wherein the stationary location comprises one of a manmade structure and a natural structure.
  • Example 28 A pre-processing and training subsystem according to Examples 23 through 27, wherein the system generates genetically-defined feature space transforms for two different applications.
  • Example 29 A run-time, compact, low- and locally-powered contained subsystem for classifying objects in real time in the field, comprising: a standard embedded processor running a relatively commonly-available image processing architecture; a sensor coupled to the processor for one of substantially constantly and periodically capturing images in the field within a field of view of the sensor; a local source of power coupled to the processor; a feature space transform coupled to the processor and defined by a genetic algorithm to classify at least one object of interest appearing in the captured images, subsequent to filtering and standardizing the image to a predetermined array size, and to produce as an output signal a single coordinate point following a mathematical dot product calculation on an output signal line in the event that a pre-defined object of interest appears in one or more of the captured images.
  • Example 30 A run-time, compact, low- and locally-powered contained subsystem for classifying objects in real time in the field according to Example 29, wherein the relatively commonly-available image processing architecture comprises components from previous generations of commercially available so-called smart phones.
  • Example 31 A run-time, compact, low- and locally-powered contained subsystem for classifying objects in real time in the field according to Example 29 or Example 30, wherein the relatively commonly-available image processing architecture comprises components such as one of the generations of an Open Multimedia Applications Platform (OMAP), and the OMAP is coupled to and capable of running at least one of the ARM instruction set variants.
  • Example 32 A run-time, compact, low- and locally-powered contained subsystem for classifying objects in real time in the field according to Example 29 through Example 31, wherein the feature space transform is trained to identify and classify at least two distinct objects of interest.
  • Example 33 A run-time, compact, low- and locally-powered contained subsystem for classifying objects in real time in the field according to Example 29 through Example 32, wherein the output signal further comprises an alert signal to at least one remote location.
  • Example 34 A run-time, compact, low- and locally-powered contained subsystem for classifying objects in real time in the field according to Example 33, wherein the alert signal comprises one of a visual signal, an audible signal, and a tactile signal.
  • Example 35 A run-time, compact, low- and locally-powered contained subsystem for classifying objects in real time in the field according to Example 34, wherein the alert signal originates and can be perceived from a location or vehicle near the subsystem.
  • Example 36 A method of generating a genetically-defined feature space transform, comprising: a. exposing an image processing engine to at least two views of a common object of interest, wherein the at least two views substantially share an oblique perspective angle; b. processing the at least two views with a genetic algorithm to produce a plurality of individuals for later processing of generations, if needed; c. generating a plurality of candidate feature space transforms; and d. evaluating the candidate feature space transforms using a fitness function and if an optimum or supra-threshold fitness value does not result, repeating b. through d. but in the event that an optimum or a supra-threshold does result, then e. saving to memory the feature space transform for use in one of a real-time field-based environment and a controlled, interior environment.
  • Example 37 A method according to Example 36, wherein the at least two views depict one of: an aerial vehicle; a motorized weapon; a flag; an animal; a building; and a terrestrial vehicle.
  • Example 38 A method for training and deploying a reconnaissance vehicle having a compact, low- and locally-powered object classification engine coupled thereto, in a potentially hostile environment, comprising: generating a compact optimized feature space transform from a series of training images of each of at least one object of interest that is expected or might be encountered in a potentially hostile environment, wherein the series of training images share a perspective derived from a type, velocity, and/or size of a reconnaissance vehicle; deploying the reconnaissance vehicle into view of the potentially hostile environment with the compact optimized feature space transform operably coupled to a compact, low- and locally-powered object classification engine further coupled to a sensor having a field of view including the potentially hostile environment; and gathering and processing discrete images of a sequence of images of the potentially hostile environment with the classification engine and providing an output signal when an object of interest is successfully classified and identified.
  • Example 39 A method according to Example 38, wherein the output signal at least one of: i) is stored to a memory location and ii) triggers at least one of the following actions: activating a local and/or remote visual, audio, or tactile alarm signal; sending global positioning system coordinates to a remote location for follow-up; and one of storing in memory or sending a thumbnail image of the object of interest via wired or wireless telemetry to at least one remote location.
  • Example 40 A method according to Example 38 or Example 39, wherein the sensor is sensitive to visible light.
  • Example 41 A method according to Example 38 through Example 40, wherein the reconnaissance vehicle comprises one of a terrestrial vehicle wherein the series of training images share a generally upwardly-looking perspective and an aerial vehicle wherein the series of training images share a generally downward-looking perspective.
  • Example 42 A method according to Example 41, wherein the terrestrial vehicle can be deployed via a manual throwing motion to a desired initial position within or proximate the potentially hostile environment.
  • Example 43 A method according to Example 41, wherein the aerial vehicle can be deployed via a manual throwing motion to launch the aerial vehicle toward the potentially hostile environment, and wherein the aerial vehicle can one of autonomously navigate toward the hostile environment or be remotely controlled toward the hostile environment.
  • Example 44 A method according to Example 43, further comprising: deploying at least one structure having a unique perspective when viewed from a sensor coupled to the deployed aerial vehicle at or near a desired landing location or terminal location; guiding or steering the aerial vehicle so the sensor's field of view includes the at least one structure; operating the classification engine to identify the at least one structure; and landing the aerial vehicle at or near the desired landing location or terminal location.
  • Example 45 A method according to Example 38 or 39, wherein the sensor comprises one of: a millimeter wave sensor, an ultraviolet sensor, and an infrared sensor.
  • Example 46 A method according to Example 38 or 39, wherein the output signal is conveyed to one or more aerial vehicles for action vis-a-vis the object of interest.
  • Example 47 A method according to Example 38 or 39, wherein the output signal is conveyed to ground-based troops for action vis-a-vis the object of interest.
  • Example 48 A method according to Example 38 or 39, wherein the potentially hostile environment comprises one of: a portion of a building, at least one room within a building, and an entire building.

Abstract

The instant disclosure relates to use of genetic algorithms that produce feature space transforms that are individually trained/evaluated extensively a priori to identify specific objects or features of such specific objects (e.g., man-made structures such as vehicles, buildings, flags, and the like and natural objects such as human facial features, animals, and the like). The resulting optimized feature space transform can be implemented as a very compact algorithm operating on a small reconnaissance vehicle having limited processing capability, for example. The feature space transform can be utilized to identify diverse and distinct objects (e.g., two, five, ten, or more) from a single vehicle (or stationary location). Thus, if the objects of interest include a weapon, a vehicle, and/or a particular building, the single vehicle can alternatively apply the single feature space transform upon a scene of interest and distinguish with a high degree of specificity and sensitivity which of the plurality of objects is present. The conclusions drawn from the transform can be wirelessly transmitted for appropriate follow-up, stored to a memory location, and/or quickly acted upon by the vehicle.

Description

APPARATUS AND METHODS FOR OBJECT RECOGNITION USING A GENETICALLY-DEFINED FEATURE SPACE TRANSFORM
CROSS REFERENCE TO RELATED APPLICATION
The present application claims the benefit of and priority to U.S. provisional patent application no. 61/513,279 captioned, "Method for Object Recognition Using a Genetically-defined Feature Space Transform," and filed 29 July 2011, the contents of which are fully incorporated herein by reference.
FIELD OF THE INVENTION
The present invention relates to apparatus and methods for object classification from visual information using a feature space transform genetically-defined off-line and then implementing the feature space transform on a low- and locally-powered low cost basic image processing platform (e.g., adapted from prior generation cellular phones) as well as apparatus and methods for generating the feature space transform; that is, within the field of machine vision, the efficient classification of specific pre-defined objects in near real-time in-situ is a significant problem addressed and solved herein.
BACKGROUND OF THE INVENTION
Existing solutions to the problem of efficient classification of objects in-situ require large amounts of computing power and/or time for execution, precluding implementation on a low-cost embedded system. The automation of this process in an efficient manner has applications everywhere from passive statistics-gathering systems to quality control systems and vision-driven control systems, among others.
In a genetic algorithm ("GA"), a group of strings encode candidate solutions to an optimization problem that over generations evolves toward better solutions. Traditionally, solutions are represented in binary as strings of 0s and 1s, but other encodings are also possible. The evolution usually starts from a population of randomly generated individuals and happens in generations. In each generation, the fitness of every individual in the population is evaluated, multiple individuals are selected from the current population (based on their fitness), and modified (recombined, crossed-over, and possibly randomly mutated) to form a new population. The new population is then used in the next iteration of the application of the GA. Commonly, the GA can be selected to cease operating when a maximum number of generations have been produced or a satisfactory fitness level has been reached for the population. If the GA has terminated due to a maximum number of generations, a satisfactory solution may or may not have been reached.
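By way of illustration only, the generational loop just described might be sketched as follows; the population size, selection scheme, and the random_individual, fitness, crossover, and mutate helpers are placeholders assumed for the sketch, not particulars of the present disclosure.

```python
# A minimal sketch of the generic GA loop described above (illustrative only;
# the helper callables and numeric parameters are assumptions).
import random

def run_ga(random_individual, fitness, crossover, mutate,
           pop_size=50, max_generations=1000, target_fitness=None):
    population = [random_individual() for _ in range(pop_size)]
    for _ in range(max_generations):
        ranked = sorted(population, key=fitness, reverse=True)
        if target_fitness is not None and fitness(ranked[0]) >= target_fitness:
            return ranked[0]                      # satisfactory fitness reached
        parents = ranked[:pop_size // 2]          # selection based on fitness
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            children.append(mutate(crossover(a, b)))  # recombination and mutation
        population = parents + children
    return max(population, key=fitness)           # maximum generations reached
```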
A typical genetic algorithm requires: a genetic representation of the solution domain, and a fitness function to evaluate the solution domain.
As used herein the following terms (in italics) are applied with the following definitions: Genetic Algorithm (GA): an algorithm that uses concepts from genetics and evolution to deliver a solution.
Individual: a competing solution in a genetic algorithm.
Chromosome: a component of the solution in an individual or a series of genes.
Alleles: the set of possible states a single gene can take and can be numerical or symbolic (for example: 152 or "addition"). Biological equivalent: Adenine, Thymine, Cytosine, Guanine in DNA.
Fitness: a measure of the ability of an individual in the genetic population to solve the desired problem and is yielded by a fitness function (defined below).
Fitness Function: a function that explicitly defines what constitutes a desirable solution. This function is used to evaluate and deliver the fitness.
Feature Space: an abstract space in which the features of an object are more readily quantifiable.
Basis Vectors: the set of vectors which define a coordinate system - all other vectors can be defined as a combination of these vectors.
Blobs: the representation of an object as a binary image (black and white).
SUMMARY OF THE INVENTION
The instant disclosure relates to use of genetic algorithms that are individually trained extensively a priori to identify specific objects or features of such specific objects (e.g., man-made apparatus such as vehicles, buildings, signs, flags, and the like and natural objects such as animals and the like). The resulting feature recognition set typically includes a very compact algorithm that can be implemented in or on a small reconnaissance vehicle having limited processing capability, for example. The operative algorithms can be grouped or utilized to identify diverse and distinct objects (e.g., two, five, ten, or more) from the vehicle. Alternatively, the feature recognition set or system can be deployed as a compact self-contained unit and mounted to a structure having a desired field of view. Thus, if the objects of interest include a weapon, a person, a vehicle, and/or a particular building, the single feature recognition set (on the vehicle or mounted in a stationary location) can alternatively apply the diverse algorithms upon a scene of interest and distinguish with a high degree of specificity and sensitivity which of the plurality of objects is present. The conclusions drawn from the algorithm can be wirelessly transmitted for appropriate follow-up, stored to a memory location, trigger a remote or local alert or alarm, and/or be quickly acted upon by the vehicle.
Incorporated herein by reference is U.S. application Pub. No. 2006/024107 A1 (the '107 application) which provides some insight into the space that the present disclosure expands. That is, the concepts in the '107 application are somewhat similar to the solution described, depicted and claimed herein, in that they are both object recognition techniques which make use of a GA at some point, but there are several key differences. The role of the genetic algorithm is completely different in the two systems. In the '107 application (the contents of which are incorporated herein), the GA is used only to generate weights for features found through other methods. The system was classifying objects before the implementation of the GA, which was added only to improve distinguishing capability.
In the instant system, the GA is inseparable from the object recognition process. As the feature space transform is the system and the method of distinguishing objects, and the feature space transform is fully evolved through the GA, it would be incorrect to say that the GA plays anything but a core role in the system. In essence, the GA in the '107 patent application assists the object classification, while the GA in the proposed system is the object classification. Object recognition utilizing machine or robotic vision has been a difficult capability to achieve. With man-made objects the task is very cumbersome due in part to the large variety of objects capable of being classified or identified. Organic objects are even more difficult due to the infinite variability of not only the objects themselves but also the infinite variety in positions and deformations of them. If given infinite computing power, infinite memory space, and infinite speed, the task of object identification would still be a tough problem to tackle. With the limitations imposed by the real world, especially in the field of mobile robotics, power, speed, and memory space are at a premium. This has left high-level image processing capabilities such as object identification far out of reach for all but specialized laboratory environments. This is less than ideal for many applications, such as robots, security cameras, remote monitoring, and others.
The system according to this disclosure requires a video stream of a scene, either from a live camera or from a saved file, as well as hardware for video processing and the (optimized) feature space transform. The system also requires potentially separate hardware for genetic training of the feature space transform. The video must be initially processed to extract images and blobs. This allows a human to discern between the different kinds of objects that are desired for classification.
Armed with training data, the GA generates feature space transforms using random numbers (similar to the concept of genetic mutation). A population of individuals (in this case, an individual consists of a single transform) is formed, and evaluated for fitness. The fitness function must evaluate the transform for all training data, and then analyze the representations of the data. Higher fitness scores are awarded for a tighter grouping within each class, and for greater distances between the classes (for instance, bikers and pedestrians should have substantially different scores from each other, but all bikers should have a similar representation).
The feature space transform itself is a dot product of the representation of the image with transformation matrices (and thus resolves down to a single point value in a grid). The matrix is simply a series of coefficients which are optimized by the GA to produce significant shifts in the feature space representation for defining features of an object's blob. To apply the transform, the blob representation is multiplied element-by-element by the transformation matrices, and the result is summed for each matrix. Each transformation matrix provides a dimension for the feature space.
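A short sketch of that multiply-and-sum operation follows; the 50x50 blob size and the use of three transformation matrices (a three-dimensional feature space) are assumptions made for illustration.

```python
# Sketch of applying the feature space transform: each transformation matrix
# is multiplied element-by-element with the blob and summed, giving one
# feature space coordinate per matrix (a numerical dot product).
import numpy as np

def apply_transform(blob, matrices):
    """blob: 2-D array (standardized object blob); matrices: arrays of the same shape."""
    return np.array([np.sum(blob * m) for m in matrices])

blob = np.random.randint(0, 2, (50, 50))               # stand-in binary blob
matrices = [np.random.rand(50, 50) for _ in range(3)]  # three matrices -> 3-D feature space
point = apply_transform(blob, matrices)                 # single point in the feature space
```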
The genetic algorithm needs to evolve the solutions through a series of generations, in which the solutions are combined through genetic crossover, and compete to provide the stress necessary to guide the evolution of the transform. The algorithm can keep a human user up to date, by calculating the percentage accuracy on the test data.
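One way the crossover and mutation referred to above could act directly on the coefficient matrices is sketched below; the uniform crossover mask, mutation rate, and noise scale are assumptions, not requirements of the disclosure.

```python
# Sketch of genetic operators on an individual (a list of coefficient
# matrices): uniform crossover mixes parents' coefficients, and mutation
# perturbs a small random fraction of them.
import numpy as np

def crossover(parent_a, parent_b):
    child = []
    for ma, mb in zip(parent_a, parent_b):
        mask = np.random.rand(*ma.shape) < 0.5   # pick each coefficient from either parent
        child.append(np.where(mask, ma, mb))
    return child

def mutate(individual, rate=0.01, scale=0.1):
    mutated = []
    for m in individual:
        hits = np.random.rand(*m.shape) < rate   # roughly 1% of coefficients mutate
        mutated.append(m + hits * np.random.randn(*m.shape) * scale)
    return mutated
```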
Once the GA has sufficiently optimized the transform, the transform can be easily incorporated into an image processing system. At the point where object recognition becomes desirable, the system simply performs the feature space transform and checks the representation against the known values of the objects to be identified.
To develop a genetically-defined feature space transform, a GA must be used. This class of algorithms uses concepts of natural selection and competition to evolve a solution to a problem. The problem and solution parameters must be defined so the algorithm is capable of assessing the fitness of the potential solutions in the population. By iterating the process of competition and genetic combination, the overall quality of the solutions increases.
In the case of this feature space transform, in one embodiment a GA develops solutions which act as the transform to a feature space. A feature space is a space in which the coordinate system is fundamentally different from the usual Cartesian or polar spatial coordinate system. A classic example of a commonly used space is the Fourier domain. A time-varying sinusoid that stretches to infinity becomes an impulse (essentially, a point) in frequency (Fourier) space.
In one embodiment a feature space is characterized not by a shift from spatial and temporal domains to a frequency domain, but rather a shift from a two-dimensional representation (the "blob" of an object) to a point in a custom-defined feature space, which brings out distinctions between different types of objects. By defining the feature space transform that is most effective at bringing out differences between classes of objects, the feature space which most lends itself to the problem is defined. An alternate explanation of the feature space takes the point of view of vector mathematics and the dot product. The multiply and sum method that is employed by the transform is the same as a numerical dot product. In geometry, a dot product is referred to as a projection, because it projects one vector onto another. In essence, by projecting the image of the object onto the feature space the features that differentiate it from the other objects are thrown into stark contrast.
The solutions in the population are the same size as the sample images (a standard size is picked, and the blob of an object must be extracted from an image and resized to that standard size). A series of example images is fed through each solution, and their resulting scores are represented by points on a graph. The fitness of a solution is determined by the tightness of the grouping, and distance between groupings of objects (the first priority is to remove overlaps between groups). The best solution would have a series of very tightly knit groupings (one for each class) that are many standard deviations away from each other. With the form of the solution defined, it is simple to generate a population of solutions with associated fitness scores. The form of the fitness function allows the algorithm to evaluate the solutions, promoting solutions that perform well, while removing solutions that perform poorly. This algorithm will yield a genetically-defined feature space transform.

The feature space transform per the present disclosure runs on a compact, very low power, and computationally inexpensive platform (e.g., it can be implemented with prior-generation cell phone image processing firmware), and is simple to implement. The winning or best solution is included in the image processing code as a predefined matrix. When an object is detected, it can be isolated and resized to the predetermined standard size. The object's blob representation is then simply multiplied by the feature space transform element-by-element, and the result is summed together. This is the numerical definition of a dot product, which yields a single point representing the blob in the feature space. By knowing information about the feature space gleaned from the genetic algorithm itself, we can identify which grouping the point belongs to. For example, the x/y bounds of the person on a bike class or a vehicle are known, so if a point falls within those bounds, it is classified as a person on a bike or the vehicle, respectively.

An assumption that can further reduce the complexity and computational overhead is that in some, if not most or all, object recognition environments or situations the sensor, or camera, will have a predetermined perspective (e.g., an oblique angle of say forty-five degrees (45°) for an aerial vehicle) so the recognition engine only needs to be trained with images of an object of interest from that perspective in order to effectively and efficiently classify same.
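As a sketch of the bounds check just described, a projected point can simply be tested against each class's known feature space bounds; the class names and numeric bounds below are hypothetical placeholders, and a two-dimensional feature space is assumed.

```python
# Sketch of classification by known feature space bounds (the bounds would be
# gleaned from the genetic algorithm during training).
import numpy as np

CLASS_BOUNDS = {
    "person on a bike": (np.array([10.0, -5.0]), np.array([14.0, -1.0])),  # (low, high)
    "vehicle":          (np.array([2.0, 6.0]),   np.array([5.0, 9.0])),
}

def classify_by_bounds(point, bounds=CLASS_BOUNDS):
    for label, (low, high) in bounds.items():
        if np.all(point >= low) and np.all(point <= high):
            return label
    return None  # point falls outside all known groupings (indeterminate)
```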
Also, the recognition engine is susceptible of utilizing diverse optics including electro-optics, standard magnifying optics for small-scale objects of interest, and the like by simply coupling the optics of choice to the sensing or sensor portion of the system. The foregoing and other aspects, advantages, and capabilities will be further described and depicted hereinbelow as the foregoing is not intended as limiting but rather illustrative of some aspects of the teaching of the instant disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 depicts in flow chart form the pre-processing (or training) phase, captioned as the "In House Processing" and the field processing (or simply, the processing) phase of generating a suitable GA and then applying it in the field in a compact, efficient manner, respectively.
FIG. 2 depicts in annotated flow chart form the training phase for a genetically-defined feature space transform using the previously generated GA, according to the instant disclosure.
FIG. 3 is an annotated flow diagram illustrating the basic connections and some of the operations of the embedded processor of the instant disclosure.
FIGS. 4A and 4B depict in a relative scale an image captured that originally includes 229x209 pixels (following background / foreground filtering, or segmentation) and then is resized to a 50x50 pixel thumbnail matrix (or 2500 element vector) for processing and analysis by the feature space transform, respectively.
FIGS. 5 (including A, B, C, and D) illustrate two distinct classes of objects; namely, a pedestrian and a person on a recumbent bicycle (A and B) applied via a suitably trained feature space transform (or chromosome) as seen at C to provide an X-Y plot showing the grouping and separation between these distinct objects (and also incidentally including several automobiles as outside the desired feature spaces of the pedestrian and the person on a recumbent bicycle).
FIG. 6 depicts in grey-scale for patent purposes a pair of chromosomes (#1 and #2) which upon close examination reveal significant differences therebetween (but which in practice are typically colored with high contrast for ease of viewing).
FIG. 7 illustrates in a histogram format a plot of a "bike fit" and a "human fit" for a given chromosome (herein chromosome #17), reflecting the statistically significant separation therebetween.
FIG. 8 is a photograph of a stationary-mounted field system according to the instant disclosure, wherein the field-based system was provided with relatively heavy-duty battery power (including heat transfer fins for warmer days) and was trained to classify pedestrians, bikers, and cars.
FIG. 9 depicts an aerial vehicle having a sensor near or on its nose and illustrating the oblique angle, theta or Θ (used to reduce the complexity of object recognition herein) for this particular application wherein the vehicle is assumed to have a downward stare at Θ, periodic or persistent, equaling approximately 45 degrees (45°) from horizontal or from the vehicle's present course.
FIG. 10 depicts a low-to-the-ground reconnaissance vehicle, such as those designed, manufactured, and distributed by ReconRobotics, Inc. of Edina, Minn. (U.S.A.) having a generally upward stare (angle Θ) to identify objects of interest from its typically relatively lower perspective during tactical operation.
DETAILED DESCRIPTION OF THE ILLUSTRATED EMBODIMENTS
The present disclosure relates to systems, components, methods, and use of genetically-defined feature space transforms for rapid, efficient, and low-power object recognition in diverse environments and of diverse objects. The feature space transform can be formulated via evolution in advance of deployment of the resulting feature space transform on computationally simple and efficient platforms. The former is referred to as a pre-processing phase and the latter a processing phase. The pre-processing phase can utilize any arbitrary computational platform including user interfaces therefor. Such platforms can include diverse interfaces that include one or more displays (e.g., standard 2D or single-, or multi-, touch-screen displays) so the operator can monitor, interrupt, or alter variables toward progress during the pre-processing phase, for example. Manual data entry can occur via known techniques such as capacitive-touch (e.g., using a stylus, manual digit, or the like), mouse-, keyboard-, or voice-activated and -driven input pathways. Data response and retrieval from such platforms can include one or more of visual display, auditory, and tactile feedback to the operator, as is well known in the art. The sensor can be mounted in a stationary location and used to monitor and classify one or more types of objects passing through its field of view in a persistent stare configuration. If a greater field of view is desirable the sensor can be articulated to periodically sample a first field of view and at least a second field of view. The arguably complementary fields of view can then be stitched together to provide a virtual, and larger, field of view. Along the same lines, of course, two or more discrete object recognition engines (and accompanying sensors, whether operating at the same or different wavelengths) can be ganged together to provide persistent stare over said larger field of view with the resulting identification and classification of objects passing therethrough aggregated for review and potential follow-up. In addition, while perhaps a bit redundant, a dedicated recognition engine operating in one discrete portion of the electromagnetic spectrum could share all or a portion of its field of view with a second (or more) recognition engine(s) monitoring a different portion of the electromagnetic spectrum.
The location could utilize pre-existing structures (manmade and natural) or could be mounted upon a specially erected structure at a desired location. The sensor can be mounted to a moving vehicle traversing air, water, land, and space.
The type of sensors deployed for the processing phase includes nearly every type of pixel-based sensor, or sensor capable of being represented as pixelated data, presently known and later developed (e.g., visible spectrum, infrared, ultraviolet, microwave, radiofrequency, millimeter wave, and could also include acoustic sensors as well). The compact physical dimensions and the very low power requirements allow the system to operate for extended periods in the processing mode using just a local battery source. The battery source can comprise a primary or secondary (rechargeable) battery and the power requirements when implemented on an essentially "smart phone" platform are on the order of five volts (5V) to about twelve volts (12V) with a secondary battery providing 1000 mAh to about 2000 mAh (which could be configured as a primary battery cell). Of course, more than one battery may be used but the design goals including a lightweight, compact, and low-cost platform do not require more than one battery (primary or secondary cell). In one implementation the present real-time, field-based recognition engine can operate reliably coupled to a manually-deployed UAV having a total lift-off weight of between about five pounds (5#) and about ten pounds (10#) or so. In another implementation the field-based recognition engine can be hosted by other very lightweight UAV reconnaissance vehicles without diminishing the performance thereof. The recognition engine as described, depicted, and claimed herein will not inhibit the performance in terms of loitering time above a potentially hostile environment while at the same time providing specific object identification and classification capabilities to the UAV (and thus, via telemetry, rapid response from other theatre-based assets). Once trained to identify a specific object, the selected feature space transform will continuously process incoming images from the sensor and, when the specific object is identified (after processing the imagery), a single point results from a dot product mathematical operation during image processing; the coordinates of said point can be sent via telemetry, saved to memory, and/or otherwise communicated to a remote location. At the remote location a visual, audio, tactile, or other type of alert can be generated generally or specifically relating to the type or class of the object just identified and classified.
As noted, more than one object can be accurately identified by a single suitably trained or genetically-defined feature space transform assuming adequate separation between classes and relatively tight grouping for each class is obtainable from the transform.
Also, as noted briefly above, contributing to the efficiency of the training and later real-time processing operation of the operational transform is that it is trained using images having a unique perspective. That is, if the intended object or objects used for training will be imaged by a sensor coupled to a manned aircraft or unmanned aerial vehicle, an assumption for training and operation would include an oblique angle of, for example, 45 degrees (~45°) downward. If the sensor were coupled to a small reconnaissance vehicle, such as one of the vehicles produced by ReconRobotics, Inc. of Edina, Minnesota, U.S.A., the angle used for training would be from a low-to-the-ground location (e.g., perhaps 15°-20° upward). If used on a submersible, the angle could be arbitrary depending on whether shallow- or deep-water reconnaissance was planned and the types and likely locations of objects of interest to be interrogated relative to the submersible vehicle (or fixed submerged location).
The genetically-defined feature space transform per this disclosure can be described by the following equation:
[Genome] · Object = [Score]
Where the Object is the vector representation of an object and the [Genome] is a set of vectors (N number) the same length as the Object. The [Score] is a set of N numbers. It can be thought of as a much shorter vector in the feature space.
The fitness function is used to determine the usefulness of a given solution. The highest possible fitness score is sought by the genetic algorithm. The fitness function takes the general form of:
fitness = distance(centroid_1, centroid_2) / variance_1
Put simply, it is the normalized distance between clouds of classifications. Manhattan (city block) or Euclidean distance can be used. The distance is normalized by the variance of the first cloud. This function assigns higher scores to solutions that yield clouds that are far away (cars and humans should look different), yet with low variance (all humans should yield similar results for optimal confidence).
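A sketch of that normalized-distance measure for one pair of classification clouds follows; Euclidean distance and a summed per-dimension variance are assumptions made here for concreteness.

```python
# Sketch of the fitness measure: distance between class centroids, normalized
# by the variance of the first class's cloud of feature space points.
import numpy as np

def pair_fitness(cloud_a, cloud_b):
    """cloud_a, cloud_b: arrays of feature space points, one row per training sample."""
    distance = np.linalg.norm(cloud_a.mean(axis=0) - cloud_b.mean(axis=0))
    variance = cloud_a.var(axis=0).sum() + 1e-9   # small epsilon avoids divide-by-zero
    return distance / variance                     # far apart and tightly grouped -> high fitness
```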
In each generation, the genetic algorithm applies the feature space transform of each solution in the population to every piece of test data. The fitness of the solution is calculated using the fitness function, and the population is ordered by fitness. Solutions with higher fitness scores are promoted and combined.
Once a solution has been selected for use in the field, implementation is almost trivial. The transform is applied (in the case of a 50x50 pixel thumbnail, this is a 2500 element dot product). The resulting Score is compared to the known classes using the same technique as the fitness function. The fitness to each class, distance(score, centroid_j) / variance_j, is calculated, and the minimum value (the class that has the closest centroid) is selected. In practice, this involves a handful of subtractions and a single division (negligible).
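The same comparison at run time might look like the following sketch; the centroids and variances would be recorded from the training data, and the values shown are placeholders.

```python
# Sketch of run-time classification: compute the normalized distance from the
# Score to each known class centroid and pick the nearest class.
import numpy as np

CENTROIDS = {"pedestrian": np.array([3.2, 7.1, 1.0]),
             "biker":      np.array([12.0, -3.4, 5.5])}   # placeholder values
VARIANCES = {"pedestrian": 0.8, "biker": 1.1}

def classify(score):
    return min(CENTROIDS,
               key=lambda c: np.linalg.norm(score - CENTROIDS[c]) / VARIANCES[c])
```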
The dimensionality of the transform will now be briefly addressed. To clarify the length of the [Genome] and [Score] objects above, a person could imagine how much easier it is to identify an object in three dimensions than two dimensions. The same applies to the transform here, and the lengths of the [Genome] and [Score] objects are the same as the number of dimensions in the feature space. The feature space can be extrapolated past three dimensions, and doing so will further improve the ability of the transform to distinguish between objects. In practice, three dimensions tend to be sufficient.
Turning now to FIG. 1, which depicts in flow chart form 100 the pre-processing (or training) phase, captioned as the "In House Processing" 102, and the field processing (or simply, the processing) phase 104 of generating a suitable GA to generate a qualifying feature space transform and then applying the feature space transform in the field in a compact, efficient manner, respectively. In the pre-processing phase 102 at least two or a longer series of training images of an object of interest to later be accurately classified are applied through a genetic program 108 which then provides a candidate GA 110 that passes to an evaluation phase 112 where the fitness relative to serving as an operational GA driver for a specific feature space transform is evaluated relative to other generated GAs 110. The fitness of the GA 110 is either adequate or optimum or not (at 114); in the former case the GA 110 passes to the field processing (or processing) phase for use in the field in a compact, computationally efficient embedded controller (as described and depicted herein). In the latter case at least another generation of in-house training occurs in an effort to reach an optimum GA for the given object of interest. One aspect of the pre-processing (or in-house) training phase is that for certain scenarios an assumption is made as to the likely perspective of the object of interest relative to the sensor that captures the images in real time in the field. For example, for an aerial vehicle the assumption might include a nominal 45 degree (45°) inclination or oblique angle between sensor and object, thus greatly reducing the other possible perspectives that need to be considered and reducing the complexity of both the training and the recognition operations. If this assumption needs to be refined or modified the original real time images can be processed and then post-processed (or additionally processed) using, for example, an affine transformation or algorithm. This might be beneficial if the original assumption were a nominal 45 degree (45°) inclination or oblique angle and the actual angle were on the order of near 90 degrees (90°), for example. Once the GA 110 has been used to define an optimized feature space transform during the training or pre-processing phase 102, the feature space transform is inserted into a field processing routine 104, desirably disposed in an efficient, low-overhead environment (e.g., low power, compact electronic files, and lightweight). In this environment an image 116 is acquired by the sensor and a so-called blob extraction technique applied (at 118) that can include various techniques for isolating the object of interest from background items, deemed noise, and some foreground items, if needed. A resulting blob image 120 is thus produced and it is applied to the feature space transform for rapid processing. The feature space transform can provide several details about the object, such as locating its centroid and outer peripheral edges, and, after normalizing the image (e.g., to a nominal size like 50x50 pixels or a 2500 pixel matrix), performs a mathematical dot product calculation that has as an output a single value. This single value is mapped to a graphical depiction (e.g., a grid) of previously identified and verified objects of interest (at 122) and, given the single value's location relative to the grid, an identification declaration occurs (at 124). The identification declaration 124 could be affirmative, negative, or indeterminate.
If affirmative, then the object of interest has been located with a high degree of confidence resulting in several possibilities for follow-through. For example, the identification could be merely stored in memory for later review, analysis, or cataloging (perhaps in a stationary sensor environment tracking pedestrian and vehicle traffic) or it could initiate an aggressive response to intercept, interdict, or disable the object of interest (in either a stationary sensor or a roving sensor scenario), or it could trigger an alarm or notice locally or remotely.
FIG. 2 depicts in more detail than FIG. 1, and in annotated flow chart form 200, the pre-processing (or training) phase for a genetically-defined feature space transform according to the instant disclosure. At 202, like 106 of FIG. 1, a series of training images are gathered and organized by type of object of interest. As noted above, these training images 202 are most useful if they include a perspective relative to an anticipated sensor location vis-a-vis the object of interest so as to simplify and provide a relatively finite solution space for the object of interest. The images might include various objects but the training phase 200 can be utilized for diverse objects including manmade and naturally-occurring. As shown at 204 the classification engine is trained using the GA 110 by iteratively generating a population (1), applying a fitness function for individuals within the population (2), selecting the best or optimum to "breed" for a next generation (3), breeding the next generation of the population (4), and repeating (beginning at (2)) until a best individual feature space transform emerges (at 5).
As shown in the annotated item 206 of FIG. 2, a number of considerations are or can be included in processing at item 3 of 204. Thus, a standardized size training image and resulting standardized chromosome are used for processing (herein a nominal 50x50 pixel array, although, of course, other sizes, standardized or not, can be used). A general fitness function algorithm per this disclosure includes applying a mathematical dot product operation of a training image to each chromosome individually to generate a vector having the same length as the number of chromosomes; then grouping together the dot product vectors for all training images by classification type; subsequently determining the distances between each classification type; and then using the classification groups to determine a fitness score (e.g., an individual with classification groups that are farther apart in distance relative to other classification groups has a higher fitness score). Following the processing of 204 (including 206) the best individual (i.e., feature space transform) is saved for usage in the field application (during the processing phase).
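Tying the steps of item 206 together, the fitness of a single individual over a labeled training set might be scored as in the sketch below; the (blob, label) data layout and the summed pairwise scoring are assumptions made for illustration.

```python
# Sketch of item 3 of 204/206: project every training image through the
# individual's chromosomes, group the resulting vectors by classification
# type, and reward individuals whose groups are far apart yet tightly packed.
import numpy as np
from collections import defaultdict
from itertools import combinations

def evaluate_individual(chromosomes, training_data):
    """chromosomes: list of 50x50 arrays; training_data: iterable of (blob, label) pairs."""
    groups = defaultdict(list)
    for blob, label in training_data:
        point = np.array([np.sum(blob * c) for c in chromosomes])  # dot product per chromosome
        groups[label].append(point)
    clouds = {label: np.vstack(points) for label, points in groups.items()}
    score = 0.0
    for a, b in combinations(clouds, 2):                 # every pair of classification groups
        distance = np.linalg.norm(clouds[a].mean(axis=0) - clouds[b].mean(axis=0))
        score += distance / (clouds[a].var(axis=0).sum() + 1e-9)
    return score
```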
FIG. 3 is an annotated flow diagram 300 illustrating the basic connections and some of the operations of the embedded processor (or computer system) 306 and 306' of the instant disclosure. In the field or in real-time processing (also simply processing elsewhere herein) a camera 302, or other sensor, captures within its field of view a portion of a scene of interest. The camera 302 can comprise any device for acquiring an image but herein comprises a charge-coupled-device (CCD) which is a pixel-based imaging platform. The camera 302 can thus operate in any of several ranges of the electromagnetic spectrum. As noted hereinabove the camera, or sensor, 302 can operate in the ultraviolet, infrared, millimeter wave, microwave, visible, or other portion of the electromagnetic spectrum. The type of sensor 302 implemented typically relates to the expected real-time environment in which the sensor 302 will operate (e.g., if mounted to the front of an automobile, truck, or bus it might utilize fog-piercing millimeter-wave sensing circuitry and the objects of interest might include retro-reflective street signs). The camera 302 couples via universal serial bus 304 to the embedded computer system 306 (operating the annotated processes shown at 306'), although the manner of coupling between the camera 302 and the embedded computer system 306 can be accomplished by other means, including wireless telemetry for example. The embedded computer system 306 is supplied power via battery supply 308 which provides portability, although in the stationary environment scenario the system 306 could be hard-wired via a transformer to a standard source of electrical power from a power grid. Within the embedded computer system 306 several procedures are performed, as annotated box 306' implies. Thus, an image from the camera 302 is filtered and resized (herein nominally to a 50x50 pixel array) to bring the object therein to the fore. The object is then processed by the best individual from the genetic algorithm (i.e., the feature space transform) by multiplying the (foreground) object's 50x50 array, or matrix, by said best individual to generate a classification vector. Subsequently, assuming that the classification vector lies within the bounds of a known classification type, the foreground object is classified as one of the objects of interest that are sought to be identified. The thusly classified objects can be saved to storage (per FIG. 3) and/or can be sent via wireless telemetry to one or more remote locations for follow-up or can trigger a more or less immediate local response.
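The run-time flow of FIG. 3 might be sketched as below; the grayscale camera frames, the OpenCV-based segmentation and resizing, the file names, and the numeric placeholders are assumptions for illustration rather than particulars of the disclosure.

```python
# Hypothetical run-time loop for the embedded system of FIG. 3: segment the
# foreground from a grayscale frame, standardize it to 50x50, apply the stored
# best individual (feature space transform), and report the nearest class.
import cv2
import numpy as np

CHROMOSOMES = [np.load(f"chromosome_{i}.npy") for i in range(3)]  # assumed files from training
CENTROIDS = {"pedestrian": np.array([3.2, 7.1, 1.0]),
             "biker":      np.array([12.0, -3.4, 5.5])}           # placeholder values
VARIANCES = {"pedestrian": 0.8, "biker": 1.1}

def process_frame(frame, background):
    """frame, background: 2-D grayscale images of identical size."""
    diff = cv2.absdiff(frame, background)                  # crude background/foreground filtering
    _, blob = cv2.threshold(diff, 40, 1, cv2.THRESH_BINARY)
    thumb = cv2.resize(blob.astype(np.float32), (50, 50))  # standardize to the 50x50 array
    score = np.array([np.sum(thumb * c) for c in CHROMOSOMES])
    label = min(CENTROIDS,
                key=lambda c: np.linalg.norm(score - CENTROIDS[c]) / VARIANCES[c])
    return label, score                                    # label can be stored or sent via telemetry
```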
In addition, upon successful identification of an object of interest a notice or alert signal can be dispatched for follow-up. Such a notice can comprise a simple visual indication (e.g., a display-based light signal or "flag" or similar), a relay of the image that provided the successful identification, an auditory alert, a tactile alert, and the like. Also, especially for perimeter surveillance in a stationary environment, a warning alarm or spotlight can be rapidly activated illuminating the object of interest automatically while also recording the image(s) to a memory location and/or sending an alert signal to remote location(s) if desired. In either the stationary or roving environment, the system 306 can be coupled to systems for confirming locations (e.g., GPS coordinates) and include time-stamp, or "clock," features for helping to determine the location and time an object of interest was identified.

FIGS. 4A and 4B depict in a relative scale the same image 400 captured that originally includes 229x209 pixels 402 (following background / foreground filtering, or segmentation) and then is resized to a 50x50 pixel thumbnail matrix 404 (or 2,500 element vector) for processing and analysis by the feature space transform, respectively. FIGS. 4A and 4B are intended to provide a sense of the computational economy provided by standardizing the processed images. For example, processing the image at the original resolution of 229x209 pixels would produce a 47,861 element vector (which is almost twenty times (20x) the size of the thumbnail matrixes employed herein). The inventors recognize, however, that due to the resizing or standardizing of the captured images the relative scale, or actual sizes, of the objects recorded is unavailable. However, the GA and the resulting feature space transforms provided according to the instant disclosure are specific and sensitive enough to overcome this minor obstacle. Of course, in the event of conflict between, for example, two similarly shaped or characterized but differently-sized objects, manual intervention could be applied or the results ignored following additional image processing before declaring a successful identification of an object of interest.
FIG. 5 (A, B, C, and D) illustrates at 500 two distinct classes of objects; namely, a pedestrian 502 and a person on a recumbent bicycle 504 (A and B) applied via a suitably trained feature space transform (or chromosome) 506 as seen at C to provide an X-Y plot 508 showing the grouping and separation between these distinct objects 502, 504 (and also incidentally including several automobiles, not depicted, as outside the desired feature spaces of the pedestrian 502 and the person on a recumbent bicycle 504). In assessing the plot 508 a person can appreciate that the classifications for objects 502, 504 (and the automobiles) are well defined as being closely grouped (not overlapping) and separated from one another.
FIG. 6 depicts in grey-scale for patent purposes (although originally captured in color for ease of review and comparison) a pair of chromosomes (#1 and #2) which upon close examination reveal significant differences therebetween. Chromosomes #1 and #2 represent the feature space that pedestrian 502 (of FIGS. 5A-D) and biker 504 (of FIGS. 5A-D) will be projected onto via the dot operator.
FIG. 7 illustrates in a histogram format a plot of a "bike fit" and a "human fit" for a given chromosome (herein chromosome #17), reflecting the statistically significant separation therebetween. While nominal overlap occurs it is not adequate to defer use of chromosome #17 in the real-time, field-based, processing operations.
FIG. 8 is a photograph 800 of a stationary-mounted field system 802 according to the instant disclosure, wherein the field-based system 802 was provided with relatively heavy-duty battery power (including heat transfer fins for warmer days, although not shown as they are coupled to an upper portion of system 802) and was trained to classify pedestrians, bikers, and cars along a well-traveled pathway. The system 802 can be operated in a persistent stare or a periodic image capture mode and can be used to classify objects of interest and/or to simply indicate that the original scene within the field of view 804 of the system 802 has changed (i.e., basic perimeter surveillance). That is, the training image would be the scene within the field of view 804 and the GA would generate a feature space transform for the unpopulated scene in the field of view 804. If an image captured did not provide a positive classification one could reasonably conclude that something has entered the scene or changed the field of view 804 and appropriate follow-up can be promptly initiated (e.g., personnel dispatched, images saved to memory, illumination and/or alarms activated, etc.). As shown by dashed lines in FIG. 8, the field of view 804 can be used to advantage in that objects within the field of view 804 will always come from a perspective oblique angle from horizontal, namely theta (Θ), and thus the training images should also always have a similar perspective. One can readily appreciate the advantages to this assumption in that without the assumption virtually infinite perspectives would have to be considered in making an accurate, rapid, and precise classification of an object (which is contrary to the low-power, lightweight, compact, and computationally simple structures and techniques taught by this disclosure).

FIG. 9 depicts an aerial vehicle 900 having a sensor 902 near or on its nose and illustrating the oblique angle, theta or Θ (used to reduce the complexity of object recognition herein) for this particular application wherein the vehicle is assumed to have a downward stare at Θ, periodic or persistent, equaling approximately 45 degrees (45°) from horizontal or from the vehicle's present course. As the aerial vehicle 900 progresses the sensor (or camera) 902 thus images a discrete field of view 904 and captures images thereof. The images are processed per the real-time (field based) processing described herein and can be captured at any arbitrary rate as long as the resolution of the resulting images is not compromised. While the aerial vehicle 900 depicted in FIG. 9 appears to be a relatively large unmanned aerial vehicle (UAV), the real-time processing system is intended to operate with equal, if not enhanced, accuracy on a UAV that is deployed in the field, typically by having its flight initiated by being manually deployed into the air. That said, the systems and techniques described herein can be utilized by any arbitrary air vehicle (including space-based, or orbital, platforms) provided the optics used to capture the images are able to adequately resolve the objects they encounter. In the air vehicle environment, as with the stationary environment, the sensor or camera 902 can be fixed or articulated, if desired. The processing system can be coupled to existing imaging systems of an air vehicle with a maximum of economy for incidental real-time processing of images.
FIG. 10 depicts a low-to-the-ground reconnaissance vehicle 1000, such as those designed, manufactured, and distributed by ReconRobotics, Inc. of Edina, Minn. (U.S.A.) having a generally upward stare (depicted by dashed lines and the angle Θ there between) to identify objects of interest in its field of view 1002 from its typically relatively lower perspective during tactical operation. In FIG. 10 the vehicle 1000 includes real-time, field-based feature space transforms operating on the upwardly looking perspective of the vehicle 1000 that was trained to identify handguns (1004). Thus the perspective used to train and produce the transform would most efficiently use an upwardly looking perspective as well.
While the disclosure has depicted or described primarily air- and land-based vehicles and stationary locations for the innovative systems, they can be implemented on submersible vehicles or on submerged structures, if desired. In addition, the systems can be used in conjunction with other, perhaps less computationally efficient, systems; for example, in the realm of facial recognition the systems herein can be used to generically determine the presence of "human faces" and that information could be used to trigger a more computationally expensive effort to specifically identify the person(s) within the image(s). The systems hereof can of course be used in an image post-processing effort as well in a compact, low power platform to review previously acquired images from diverse sources very, very rapidly. Anecdotally, the sample rate-to-conclusion/classification for a given object of interest can exceed the ability to place a "new" image into the real-time system. In one implementation, the object of interest can comprise a unique visual identifier (e.g., ground markings, flags, retro-reflective signs, etc.) that is used by a vehicle to return to a designated location. For example an unmanned aerial vehicle could utilize the teachings herein to aggressively and affirmatively image, locate, and identify diverse objects of interest during a given mission and subsequently use the same recognition engine to confirm the designated location and return there and land. Alternatively a specific and unique sign or structure can be used to trigger a vehicle to stop imaging and drop ordnance or the like. Also, in the event of a loss of telemetry to (but not control of) a vehicle, the vehicle could image instruction signs that it interprets as a specific action to take at that location.
EXAMPLES
The following numbered Examples are intended to further illustrate but not limit the scope of the present disclosure and should not be construed as impacting the following claims.
Example 1. Using at least two images of an object of interest in the foreground of an image, performing at least one of: segmentation of the foreground from the background of said images, scaling the thus isolated object of interest and using the thus isolated object of interest as a training element for a genetic algorithm adapted to generate a set of candidate feature space transforms for said object of interest, and locating an optimum or relatively-best feature space transform for use in a real-time object classification process, wherein the real-time object classification process includes extracting a so-called "blob" of data points from a newly acquired image from the field, representative in the abstract of the object of interest, and applying the optimized or relatively-best feature space transform to categorize and classify the object from the newly acquired image.
Example 2. According to Example 1 wherein the scaling involves an arbitrary number of imaging pixels in a rectangular configuration, such as 50 x 50 pixels.
Example 3. According to Example 1, wherein the at least two images of the object of interest are taken from an angle and/or elevation characteristic of a vehicle (such as an aerial vehicle, a terrestrial vehicle, a submerged or submersible vehicle, or a space-based vehicle) and the vehicle's typical spatial relationship relative to the object of interest.
Example 4. According to Example 1, further including iteratively applying a fitness function to the candidate feature space transforms and initially selecting only those feature space transforms that provide close grouping of a single class of objects and/or reasonable separation among different classes of objects relative to other of the feature space transforms.
Example 5. According to Example 4, further including finally selecting a single feature space transform that meets at least one of superior object identification or superior class identification relative to other feature space transforms.
Example 6. According to Example 5, further including loading the finally selected feature space transform onto an embedded controller coupled to an imaging device so that an output signal from the controller and imaging device is provided, and programming the embedded controller to perform mathematical dot product calculations on the output signal so a single value relating to the identity of the object of interest is provided that relates to the class of the object of interest.
Example 7. According to Example 6, further including, based on the single value a first object is identified and then performing the procedure again for at least a second object of interest.
Example 8. According to Example 6, wherein the embedded controller comprises an OMAP processor (version 3 models 34x, 35x, and 36x, or the like) running ARM architecture.
Example 9. A run-time embedded computing platform for an object classification system, comprising: a mounting location disposed one of on or in a structure and affording access to a desired field of view from said structure; a sensor disposed, configured, and mechanically coupled to access the desired field of view; a compact, locally- and low-powered computer system coupled to the mounting location and to the sensor; and a genetically-defined feature space transform operably coupled to the computer system and adapted to receive an extracted object from an image from the sensor, apply the genetically-defined feature space transform thereto, and categorize and determine whether the extracted object is an object of interest.
Example 10. A run-time embedded computing platform for an object classification system according to Example 9, wherein the computer system further comprises one of: a telemetry circuit for sending information from the computing platform to remote locations and a removable local storage medium.
Example 11. A run-time embedded computing platform for an object classification system according to Example 9 or Example 10, wherein the structure comprises an aerial vehicle, a terrestrial vehicle, and a stationary location. Example 12. A run-time embedded computing platform for an object classification system according to Examples 9 through 11, wherein the feature space transform performs a mathematical dot product calculation to produce a single point for comparison to other known points for the same object of interest.
Example 13. A run-time embedded computing platform for an object classification system according to Examples 9 through 12, wherein the sensor comprises a visible light sensor.
Example 14. A run-time embedded computing platform for an object classification system according to Examples 9 through 13, wherein the extracted object is extracted using background and foreground filtering and/or segmentation techniques.
Example 15. A run-time embedded computing platform for an object classification system according to Examples 9 through 14, wherein the compact, locally- and low-powered computer system comprises an OMAP version 3 processor running an ARM-based instruction set.
Example 16. A compact, low- and self-powered field-based object classification system comprising: means for distinguishing a specific object optically using a sensor of arbitrary radiation sensitivity, based on the specific object's representation in a feature space that has been defined by a genetically tailored feature space transform in advance of a field-based distinguishing procedure; and wherein the feature space transform is trained and defined to distinguish the specific object posed within a field of view by filtering an image space, rendering the specific object to a standard compact image size, and creating a chromosome image of the standard compact image size; wherein the genetic algorithm develops a transform for said feature space which can be implemented conveniently in a variety of computationally-simple environments.
Example 17. A system according to Example 16, wherein at least one of the computationally-simple environments comprises an object recognition engine operating within an embedded system.
Example 18. A system according to Example 16 or Example 17, wherein the genetic algorithm develops the transform over a range of about two thousand (2,000) to over ten thousand (10,000) discrete generations, or until the transform meets preselected criteria of a fitness function.
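A hedged sketch of the generation loop implied by Example 18, stopping at a fitness threshold or a generation cap; the population size, mutation scale, and threshold value are assumptions, and the fitness function is supplied by the caller (Python/NumPy):

# Sketch: evolve a population of candidate transform vectors until a
# fitness threshold is reached or the generation cap is hit.
import numpy as np

def evolve_transform(fitness, dim, pop_size=50, max_generations=10_000,
                     fitness_threshold=0.95, mutation_scale=0.05, seed=0):
    rng = np.random.default_rng(seed)
    population = rng.standard_normal((pop_size, dim))
    best = population[0]
    for generation in range(max_generations):
        scores = np.array([fitness(ind) for ind in population])
        best = population[np.argmax(scores)]
        if scores.max() >= fitness_threshold:
            break
        # Keep the fittest half, refill by mutating copies of the survivors.
        survivors = population[np.argsort(scores)[pop_size // 2:]]
        children = survivors + mutation_scale * rng.standard_normal(survivors.shape)
        population = np.vstack([survivors, children])
    return best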
Example 19. A system according to Examples 16 through 18, wherein the arbitrary radiation sensor includes at least one of the following: a visible-wave radiation sensor, a laser radiation sensor, a microwave radiation sensor, a millimeter wave radiation sensor, and an acoustic radiation sensor.
Example 20. A system for object recognition utilizing genetically-defined feature space transforms, the system comprising: a pre-processing system for genetic training of the transform, comprising: an input port adapted to receive data defining a two or more dimensional image; an image analyzing module configured to receive the data and to extract the areas of interest from the image to create training images; a processing module that receives the training images and generates a genetic algorithm based on a genetic program; and an output port to send a completed genetic feature space transform.
Example 21. A processing system for application of the genetic feature space transform, comprising: an input port to take in the genetically-defined transform matrix; and a processor to perform a dot product of the transform to create a vectorized value, wherein the vectorized value is subsequently compared utilizing a fitness function to various categories and is then classified, and wherein the output comprises a classified object determined by pre-set categories.
Example 22. A non-transitory computer-readable media providing instructions, comprising: instructions for receiving via an input port a genetically-defined feature space transform matrix for a specific object to be identified; instructions for applying via a processor a dot product calculation of the genetically-defined feature space transform to create a vectorized value; instructions for comparing the vectorized value via a fitness function to various categories; instructions for classifying the value of the feature space transform based on tightness of a grouping of the vectorized value relative to other vectorized values for the same specific object to be identified; and instructions for determining if the output qualifies to be classified as the object to be identified per pre-set categories.
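A non-authoritative sketch of the "tightness of a grouping" comparison recited in Example 22: the value produced by the transform is compared against previously observed values for each category and accepted only if it falls tightly within a category's grouping. The tolerance and the per-category statistics are illustrative assumptions (Python/NumPy):

# Sketch: accept the value only if it lies within a tight grouping of
# previously observed values for some category.
import numpy as np

def classify_by_grouping(value: float, category_values: dict, tolerance: float = 2.0):
    """Return the category whose grouping the value falls inside, else None."""
    best_label, best_score = None, np.inf
    for label, samples in category_values.items():
        samples = np.asarray(samples, dtype=np.float64)
        mean, std = samples.mean(), samples.std() + 1e-9
        score = abs(value - mean) / std   # distance in units of the group's spread
        if score < tolerance and score < best_score:
            best_label, best_score = label, score
    return best_label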
Example 23. A pre-processing and training subsystem for generating genetically-defined feature space transforms for particular objects of interest, comprising: a computer system adapted to receive and process training images of specific objects of interest via a system input port; a genetic program coupled to the computer system and adapted to generate populations of genetic algorithms for assessment in producing a suitable genetic algorithm; means for applying the training images to the populations of GAs and producing feature space transforms; and means for evaluating the fitness of the feature space transforms and, when a threshold of fitness is met or exceeded, generating an output feature space transform for real-time field processing on a compact, self- and low-powered computer system disposed in one of a stationary location, on an aerial vehicle, and on a terrestrial vehicle.
Example 24. A pre-processing and training subsystem according to Example 23, wherein the training images comprise a video stream of images.
Example 25. A pre-processing and training subsystem according to Example 23 or Example 24, wherein the training images depict more than one type of object of interest.
Example 26. A pre-processing and training subsystem according to Examples 23 through 25, wherein the aerial vehicle comprises an unmanned aerial vehicle.

Example 27. A pre-processing and training subsystem according to Examples 23 through 26, wherein the stationary location comprises one of a manmade structure and a natural structure.
Example 28. A pre-processing and training subsystem according to Examples 23 through 27, wherein the system generates genetically-defined feature space transforms for two different applications.
Example 29. A run-time, compact, low- and locally-powered contained subsystem for classifying objects in real time in the field, comprising: a standard embedded processor running a relatively commonly-available image processing architecture; a sensor coupled to the processor for one of substantially constantly and periodically capturing images in the field within a field of view of the sensor; a local source of power coupled to the processor; a feature space transform coupled to the processor and defined by a genetic algorithm to classify at least one object of interest appearing in the captured images, subsequent to filtering and standardizing the image to a predetermined array size, and to produce as an output signal a single coordinate point following a mathematical dot product calculation on an output signal line in the event that a pre-defined object of interest appears in one or more of the captured images.

Example 30. A run-time, compact, low- and locally-powered contained subsystem for classifying objects in real time in the field according to Example 29, wherein the relatively commonly-available image processing architecture comprises components from previous generations of commercially available so-called smart phones.
Example 31. A run-time, compact, low- and locally-powered contained subsystem for classifying objects in real time in the field according to Example 29 or Example 30, wherein the relatively commonly-available image processing architecture comprises components such as one of the generations of an Open Multimedia Applications Platform (OMAP), and the OMAP is coupled to and capable of running at least one of the ARM instruction set variants.
Example 32. A run-time, compact, low- and locally-powered contained subsystem for classifying objects in real time in the field according to Example 29 through Example 31, wherein the feature space transform is trained to identify and classify at least two distinct objects of interest.
Example 33. A run-time, compact, low- and locally-powered contained subsystem for classifying objects in real time in the field according to Example 29 through Example 32, wherein the output signal further comprises an alert signal to at least one remote location.
Example 34. A run-time, compact, low- and locally-powered contained subsystem for classifying objects in real time in the field according to Example 33, wherein the alert signal comprises one of a visual signal, an audible signal, and a tactile signal.

Example 35. A run-time, compact, low- and locally-powered contained subsystem for classifying objects in real time in the field according to Example 34, wherein the alert signal originates and can be perceived from a location or vehicle near the subsystem.
Example 36. A method of generating a genetically-defined feature space transform, comprising: a. exposing an image processing engine to at least two views of a common object of interest, wherein the at least two views substantially share an oblique perspective angle; b. processing the at least two views with a genetic algorithm to produce a plurality of individuals for later processing of generations, if needed; c. generating a plurality of candidate feature space transforms; and d. evaluating the candidate feature space transforms using a fitness function and if an optimum or supra-threshold fitness value does not result, repeating b. through d. but in the event that an optimum or a supra-threshold does result, then e. saving to memory the feature space transform for use in one of a real-time field-based environment and a controlled, interior environment.
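One possible (assumed, not claimed) form of the fitness function used in step d. of Example 36: project chromosome vectors of the object of interest and of background examples through a candidate transform and score how separated and tightly grouped the two projected sets are, in the spirit of a Fisher criterion; the data arrays and the criterion itself are illustrative assumptions (Python/NumPy):

# Sketch: Fisher-style fitness for a candidate transform vector.
import numpy as np

def separation_fitness(candidate: np.ndarray,
                       object_chromosomes: np.ndarray,
                       background_chromosomes: np.ndarray) -> float:
    """Higher is better: large between-class gap, small within-class spread."""
    obj = object_chromosomes @ candidate        # projected object examples
    bkg = background_chromosomes @ candidate    # projected background examples
    between = (obj.mean() - bkg.mean()) ** 2
    within = obj.var() + bkg.var() + 1e-9
    return float(between / within)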
Example 37. A method according to Example 36, wherein the at least two views depict one of: an aerial vehicle; a motorized weapon; a flag; an animal; a building; and a terrestrial vehicle.
Example 38. A method for training and deploying a reconnaissance vehicle, having a compact, low- and locally-powered object classification engine coupled thereto, in a potentially hostile environment, comprising: generating a compact optimized feature space transform from a series of training images of each of at least one object of interest that is expected or might be encountered in a potentially hostile environment, wherein the series of training images share a perspective derived from a type, velocity, and/or size of a reconnaissance vehicle; deploying the reconnaissance vehicle into view of the potentially hostile environment with the compact optimized feature space transform operably coupled to a compact, low- and locally-powered object classification engine further coupled to a sensor having a field of view including the potentially hostile environment; and gathering and processing discrete images of a sequence of images of the potentially hostile environment with the classification engine and providing an output signal when an object of interest is successfully classified and identified.
Example 39. A method according to Example 38, wherein the output signal at least one of: i) is stored to a memory location and ii) triggers at least one of the following actions: activating a local and/or remote visual, audio, or tactile alarm signal; sending global positioning system coordinates to a remote location for follow-up; and one of storing in memory or sending a thumbnail image of the object of interest via wired or wireless telemetry to at least one remote location.
Example 40. A method according to Example 38 or Example 39, wherein the sensor is sensitive to visible light.
Example 41. A method according to Example 38 through Example 40, wherein the reconnaissance vehicle comprises one of a terrestrial vehicle wherein the series of training images share a generally upwardly-looking perspective and an aerial vehicle wherein the series of training images share a generally downward-looking perspective.
Example 42. A method according to Example 41, wherein the terrestrial vehicle can be deployed via a manual throwing motion to a desired initial position within or proximate the potentially hostile environment.
Example 43. A method according to Example 41, wherein the aerial vehicle can be deployed via a manual throwing motion that launches the aerial vehicle toward the potentially hostile environment, and wherein the aerial vehicle can one of autonomously navigate toward the hostile environment or be remotely controlled toward the hostile environment.
Example 44. A method according to Example 43, further comprising: deploying at least one structure having a unique perspective when viewed from a sensor coupled to the deployed aerial vehicle at or near a desired landing location or terminal location; guiding or steering the aerial vehicle so the sensor's field of view includes the at least one structure; operating the classification engine to identify the at least one structure; and landing the aerial vehicle at or near the desired landing location or terminal location.
Example 45. A method according to Example 38 or 39, wherein the sensor comprises one of: a millimeter wave sensor, an ultraviolet sensor, and an infrared sensor.
Example 46. A method according to Example 38 or 39, wherein the output signal is conveyed to one or more aerial vehicles for action vis-a-vis the object of interest.
Example 47. A method according to Example 38 or 39, wherein the output signal is conveyed to ground-based troops for action vis-a-vis the object of interest.
Example 48. A method according to Example 38 or 39, wherein the potentially hostile environment comprises one of: a portion of a building, at least one room within a building, and an entire building.
While this disclosure includes various specific embodiments, depictions, and examples, they are not intended as limiting but rather as illustrative of the scope and breadth thereof; for those of skill in the art, other minor or trivial modifications or substitutions can be made without departing from the true scope of the disclosure, as more fully set forth in the appended claims.

Claims

1. A run-time embedded computing platform for an object classification system, comprising: a mounting location disposed one of on or in a structure and affording access to a desired field of view from said structure; a sensor disposed, configured, and mechanically coupled to access the desired field of view; a compact, locally- and low-powered computer system coupled to the mounting location and to the sensor; and a genetically-defined feature space transform operably coupled to the computer system and adapted to receive an extracted object from an image from the sensor, apply the genetically-defined feature space transform thereto, and categorize and determine whether the extracted object is an object of interest.
2. A run-time embedded computing platform for an object classification system according to claim 1, wherein the computer system further comprises one of: a telemetry circuit for sending information from the computing platform to remote locations and a removable local storage medium.
3. A run-time embedded computing platform for an object classification system according to claim 1 or claim 2, wherein the structure comprises one of an aerial vehicle, a terrestrial vehicle, and a stationary location.
4. A run-time embedded computing platform for an object classification system according to claims 1 through 3, wherein the feature space transform performs a mathematical dot product calculation to produce a single point for comparison to other known points for the same object of interest.
5. A run-time embedded computing platform for an object classification system according to claims 1 through 4, wherein the sensor comprises a visible light sensor.
6. A run-time embedded computing platform for an object classification system according to claims 1 through 5, wherein the extracted object is extracted using background and foreground filtering and/or segmentation techniques.
7. A run-time embedded computing platform for an object classification system according to claims 1 through 6, wherein the compact, locally- and low-powered computer system comprises an OMAP version 3 processor running an ARM-based instruction set.
8. A compact, low- and self-powered field-based object classification system comprising: means for distinguishing a specific object optically using a sensor of arbitrary radiation sensitivity, based on the specific object's representation in a feature space that has been defined by a genetically tailored feature space transform in advance of a field-based distinguishing procedure; and wherein the feature space transform is trained and defined to distinguish the specific object posed within a field of view by filtering an image space, rendering the specific object to a standard compact image size, and creating a chromosome image of the standard compact image size; wherein the genetic algorithm develops a transform for said feature space which can be implemented conveniently in a variety of computationally-simple environments.
9. A system according to claim 8, wherein at least one of the computationally-simple environments comprises an object recognition engine operating within an embedded system.
10. A system according to claim 8 or claim 9, wherein the genetic algorithm develops the transform over a range of about two thousand (2,000) to over ten thousand (10,000) discrete generations, or until the transform meets preselected criteria of a fitness function.
11. A system according to claims 8 through 10, wherein the arbitrary radiation sensor includes at least one of the following: a visible-wave radiation sensor, a laser radiation sensor, a microwave radiation sensor, a millimeter wave radiation sensor, and an acoustic radiation sensor.
12. A system for object recognition utilizing genetically-defined feature space transforms, the system comprising: a pre-processing system for genetic training of the transform, comprising: an input port adapted to receive data defining a two or more dimensional image; an image analyzing module configured to receive the data and to extract the areas of interest from the image to create training images; a processing module that receives the training images and generates a genetic algorithm based on a genetic program; and an output port to send a completed genetic feature space transform.
13. A processing system for application of the genetic feature space transform, comprising: an input port to take in the genetically-defined transform matrix; and a processor to perform a dot product of the transform to create a vectorized value, wherein the vectorized value is subsequently compared utilizing a fitness function to various categories and is then classified, and wherein the output comprises a classified object determined by pre-set categories.
14. A non-transitory computer-readable media providing instructions, comprising: instructions for receiving via an input port a genetically-defined feature space transform matrix for a specific object to be identified; instructions for applying via a processor a dot product calculation of the genetically-defined feature space transform to create a vectorized value; instructions for comparing the vectorized value via a fitness function to various categories; instructions for classifying the value of the feature space transform based on tightness of a grouping of the vectorized value relative to other vectorized values for the same specific object to be identified; and instructions for determining if the output qualifies to be classified as the object to be identified per pre-set categories.
15. A pre-processing and training subsystem for generating genetically-defined feature space transforms for particular objects of interest, comprising: a computer system adapted to receive and process training images of specific objects of interest via a system input port; a genetic program coupled to the computer system and adapted to generate populations of genetic algorithms for assessment in producing a suitable genetic algorithm; means for applying the training images to the populations of GAs and producing feature space transforms; and means for evaluating the fitness of the feature space transforms and, when a threshold of fitness is met or exceeded, generating an output feature space transform for real-time field processing on a compact, self- and low-powered computer system disposed in one of a stationary location, on an aerial vehicle, and on a terrestrial vehicle.
16. A pre-processing and training subsystem according to claim 15, wherein the training images comprise a video stream of images.
17. A pre-processing and training subsystem according to claim 15 or claim 16, wherein the training images depict more than one type of object of interest.
18. A pre-processing and training subsystem according to claims 15 through 17, wherein the aerial vehicle comprises an unmanned aerial vehicle.
19. A pre-processing and training subsystem according to claims 15 through 18, wherein the stationary location comprises one of a manmade structure and a natural structure.
20. A pre-processing and training subsystem according to claims 15 through 19, wherein the system generates genetically-defined feature space transforms for two different applications.
21. A run-time, compact, low- and locally-powered contained subsystem for classifying objects in real time in the field, comprising: a standard embedded processor running a relatively commonly-available image processing architecture; a sensor coupled to the processor for one of substantially constantly and periodically capturing images in the field within a field of view of the sensor; a local source of power coupled to the processor; a feature space transform coupled to the processor and defined by a genetic algorithm to classify at least one object of interest appearing in the captured images, subsequent to filtering and standardizing the image to a predetermined array size, and to produce as an output signal a single coordinate point following a mathematical dot product calculation on an output signal line in the event that a pre-defined object of interest appears in one or more of the captured images.
22. A run-time, compact, low- and locally-powered contained subsystem for classifying objects in real time in the field according to claim 21, wherein the relatively commonly-available image processing architecture comprises components from previous generations of commercially available so-called smart phones.
23. A run-time, compact, low- and locally-powered contained subsystem for classifying objects in real time in the field according to claim 21 or claim 22, wherein the relatively commonly-available image processing architecture comprises components such as one of the generations of an Open Multimedia Applications Platform (OMAP), and the OMAP is coupled to and capable of running at least one of the ARM instruction set variants.
24. A run-time, compact, low- and locally-powered contained subsystem for classifying objects in real time in the field according to claim 21 through claim 23, wherein the feature space transform is trained to identify and classify at least two distinct objects of interest.
25. A run-time, compact, low- and locally-powered contained subsystem for classifying objects in real time in the field according to claim 21 through claim 24, wherein the output signal further comprises an alert signal to at least one remote location.
26. A run-time, compact, low- and locally-powered contained subsystem for classifying objects in real time in the field according to claim 25, wherein the alert signal comprises one of a visual signal, an audible signal, and a tactile signal.
27. A run-time, compact, low- and locally-powered contained subsystem for classifying objects in real time in the field according to claim 26, wherein the alert signal originates and can be perceived from a location or vehicle near the subsystem.
28. A method of generating a genetically-defined feature space transform, comprising: a. exposing an image processing engine to at least two views of a common object of interest, wherein the at least two views substantially share an oblique perspective angle; b. processing the at least two views with a genetic algorithm to produce a plurality of individuals for later processing of generations, if needed; c. generating a plurality of candidate feature space transforms; and d. evaluating the candidate feature space transforms using a fitness function and if an optimum or supra-threshold fitness value does not result, repeating b. through d. but in the event that an optimum or a supra-threshold does result, then e. saving to memory the feature space transform for use in one of a real-time field-based environment and a controlled, interior environment.
29. A method according to claim 28, wherein the at least two views depict one of: an aerial vehicle; a motorized weapon; a flag; an animal; a building; and a terrestrial vehicle.
30. A method for training and deploying a reconnaissance vehicle, having a compact, low- and locally-powered object classification engine coupled thereto, in a potentially hostile environment, comprising: generating a compact optimized feature space transform from a series of training images of each of at least one object of interest that is expected or might be encountered in a potentially hostile environment, wherein the series of training images share a perspective derived from a type, velocity, and/or size of a reconnaissance vehicle; deploying the reconnaissance vehicle into view of the potentially hostile environment with the compact optimized feature space transform operably coupled to a compact, low- and locally-powered object classification engine further coupled to a sensor having a field of view including the potentially hostile environment; and gathering and processing discrete images of a sequence of images of the potentially hostile environment with the classification engine and providing an output signal when an object of interest is successfully classified and identified.
31. A method according to claim 30, wherein the output signal at least one of: i) is stored to a memory location and ii) triggers at least one of the following actions: activating a local and/or remote visual, audio, or tactile alarm signal; sending global positioning system coordinates to a remote location for follow-up; and one of storing in memory or sending a thumbnail image of the object of interest via wired or wireless telemetry to at least one remote location.
32. A method according to claim 30 or claim 31, wherein the sensor is sensitive to visible light.
33. A method according to claim 30 through claim 32, wherein the reconnaissance vehicle comprises one of a terrestrial vehicle wherein the series of training images share a generally upwardly-looking perspective and an aerial vehicle wherein the series of training images share a generally downward-looking perspective.
34. A method according to claim 33, wherein the terrestrial vehicle can be deployed via a manual throwing motion to a desired initial position within or proximate the potentially hostile environment.
35. A method according to claim 33, wherein the aerial vehicle can be deployed via a manual throwing motion that launches the aerial vehicle toward the potentially hostile environment, and wherein the aerial vehicle can one of autonomously navigate toward the hostile environment or be remotely controlled toward the hostile environment.
36. A method according to claim 35, further comprising: deploying at least one structure having a unique perspective when viewed from a sensor coupled to the deployed aerial vehicle at or near a desired landing location or terminal location; guiding or steering the aerial vehicle so the sensor's field of view includes the at least one structure; operating the classification engine to identify the at least one structure; and landing the aerial vehicle at or near the desired landing location or terminal location.
37. A method according to claim 30 or claim 31, wherein the sensor comprises one of: a millimeter wave sensor, an ultraviolet sensor, and an infrared sensor.
38. A method according to claim 30 or claim 31, wherein the output signal is conveyed to one or more aerial vehicles for action vis-a-vis the object of interest.
39. A method according to claim 30 or claim 31, wherein the output signal is conveyed to ground-based troops for action vis-a-vis the object of interest.
40. A method according to claim 30 or claim 31, wherein the potentially hostile environment comprises one of: a portion of a building, at least one room within a building, and an entire building.
PCT/US2012/048881 2011-07-29 2012-07-30 Apparatus and methods for object recognition using a genetically-defined feature space transform WO2013019743A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201161513279P 2011-07-29 2011-07-29
US61/513,279 2011-07-29

Publications (2)

Publication Number Publication Date
WO2013019743A2 true WO2013019743A2 (en) 2013-02-07
WO2013019743A3 WO2013019743A3 (en) 2013-04-25

Family

ID=47629877

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2012/048881 WO2013019743A2 (en) 2011-07-29 2012-07-30 Apparatus and methods for object recognition using a genetically-defined feature space transform

Country Status (1)

Country Link
WO (1) WO2013019743A2 (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050276459A1 (en) * 2004-06-09 2005-12-15 Andrew Eames Method and apparatus for configuring and testing a machine vision detector
US20070242856A1 (en) * 2004-06-28 2007-10-18 Canon Kabushiki Kaisha Object Recognition Method and Apparatus Therefor
US20070081729A1 (en) * 2005-10-06 2007-04-12 Sony Corporation Image processing apparatus
US20100172555A1 (en) * 2007-04-18 2010-07-08 The University Of Tokyo Feature quantity selection method, feature quantity selection apparatus, image classification method, image classification apparatus, computer program, and recording medium
US20090313192A1 (en) * 2008-06-11 2009-12-17 Aaron Keith Baughman Evolutionary facial feature selection

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8860682B1 (en) 2013-04-22 2014-10-14 Cypress Semiconductor Corporation Hardware de-convolution block for multi-phase scanning
US9377493B2 (en) 2013-04-22 2016-06-28 Parade Technologies, Ltd. Hardware de-convolution block for multi-phase scanning
CN109478241A (en) * 2016-05-13 2019-03-15 努门塔公司 The reasoning and study of input data based on sensorimotor
CN109478241B (en) * 2016-05-13 2022-04-12 努门塔公司 Computer-implemented method of performing inference, storage medium, and computing device
US10824911B2 (en) * 2016-06-08 2020-11-03 Gopro, Inc. Combining independent solutions to an image or video processing task

Also Published As

Publication number Publication date
WO2013019743A3 (en) 2013-04-25


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12819683

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12819683

Country of ref document: EP

Kind code of ref document: A2