EP1425709A2

EP1425709A2 - Model-based object classification and target recognition

Info

Publication number: EP1425709A2
Application number: EP02769950A
Authority: EP
Inventors: Henrik Baur; Helmuth Eggers; Lars KRÜGER; Rainer Ott
Original assignee: EADS Deutschland GmbH
Current assignee: Airbus Defence and Space GmbH
Priority date: 2001-09-15
Filing date: 2002-09-16
Publication date: 2004-06-09
Also published as: CA2460615A1; US8005261B2; WO2003025843B1; DE10145608A1; WO2003025843A2; CA2460615C; WO2003025843A3; US20040267682A1; DE10145608B4

Abstract

The invention relates to a method for the model-based classification and/or target recognition of an object, which comprises the following steps: recording an image of an object, defining a characteristic feature represented by an object part, defining at least one condition which is linked to said characteristic feature and which indicates the applicability thereof, and carrying out the classification and/or target recognition of the object by recording said characteristic feature if the condition indicates the applicability of the characteristic feature.

Description

Model-based object classification and target recognition

The present invention relates generally to model-based object classification and target recognition, and in particular to the structure and processing of models for object classification and position determination.

All previously known methods from the prior art that use explicit geometry models for matching extract only a few features from the input data at the same time. There are multiple reasons for this.

On the one hand, it is difficult to merge different features in such a way that the same output values have the same meaning. On the other hand, there are purely practical reasons, which are explained in more detail in the following two sections.

Furthermore, the rules for when a feature of a model is to be checked are either as firmly programmed as the feature itself or are determined from the geometry of the object.

The previously known systems, including those of D.G. Lowe in "Fitting Parametrized Three-Dimensional Models To Images", IEEE Transact. on Pattern Analysis and Machine Intelligence, Vol. 13, No. 5, 1991, of L. Stephan et al. in "Portable, scalable architecture for model-based FLIR ATR and SAR / FLIR fusion", Proc. of SPIE, Vol. 3718, Automatic Target Recognition IX, Aug. 1999, and those described in EP-A-622 750, generally have a fixed arrangement of image processing and in particular a fixed arrangement of preprocessing.

According to these known systems, the image is first read in, then preprocessed, and then the matching is carried out. As a result, in the known systems either every preprocessing whose results are contained in any model must be carried out, or permanently implemented tests must be carried out to circumvent this.

It is therefore an object of the present invention to provide a method for object classification and target recognition which minimizes the computation effort required and is at the same time more robust.

Another object of the present invention is to provide a method for object classification and target recognition that minimizes the number of preprocessing steps.

These objects, as well as other objects apparent from the description and figures below, are achieved by a method according to the appended claims.

Embodiments of the invention are explained in more detail with reference to the drawings, in which:

Fig. 1 shows the process of object detection at the highest level;

Fig. 2 shows the detailed flow of the matching block of Fig. 1;

Fig. 3 shows an image acquired in the image recording block of Fig. 1;

Fig. 4 shows a region (ROI) surrounding the sought objects, which region consists of a rectangular partial section of the image of Fig. 3; and

Figs. 5a to 5e show, using the example of the edge receptor, how the feature request works.

The present invention is based on the insight that certain features are only visible from particular views. For example, the windows of the cargo hold doors of helicopters are visible only from the side, but not from other angles. The same applies to the lighting conditions, which allow the detection of cargo hold doors or other elements of helicopters (such as wheels, carried loads, etc.) only under certain circumstances. Therefore, according to the present invention, at least one feature to be recognized is linked to at least one condition or at least one rule. Of course, it is possible to link a large number of features with respective specific conditions and / or to associate several conditions with a single feature to be recognized. Under these circumstances, only those features whose linked condition is fulfilled have to be extracted from the image. In other words, object classification and / or target recognition does not have to be carried out for a cargo hold door which, given the position of the helicopter, cannot be visible to the camera.

According to the invention, a way was found to store various features (e.g. edges, surface areas, hot spots) in the model in a simple and consistent manner and to extract these features efficiently.

If further features are to be extracted in the known image processing systems of the aforementioned prior art, their calls, including parameter transfer, have to be programmed explicitly for each application or model. Depending on the system, this can be more or less complex. Such a rigid sequence, consisting of the acquisition of an image, the segmentation of the acquired image and the preprocessing of the segmented image, is known from EP-A-622 750.

In accordance with the present invention, each feature that can be recognized is provided with a condition that determines its applicability. The algorithm of this condition is freely programmable and not only limited to the geometry of the object. The condition can also examine, for example, the distance of the object to be recognized from the camera, the lighting conditions (e.g. contrast), speed, height, relative position, etc.
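Purely by way of illustration, the linking of each feature to a freely programmable condition might be sketched in Python as follows. The feature names, the context fields (`aspect_deg`, `contrast`, `distance_m`) and all thresholds are assumptions chosen for the example and are not taken from the description.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical scene context; every field name here is an illustrative assumption.
Context = Dict[str, float]


@dataclass
class Feature:
    name: str
    condition: Callable[[Context], bool]  # freely programmable applicability test


def applicable_features(features: List[Feature], ctx: Context) -> List[str]:
    """Return the names of the features whose condition holds in this context."""
    return [f.name for f in features if f.condition(ctx)]


FEATURES = [
    # Assumed to be visible only in a roughly side-on view (aspect angle near 90 degrees).
    Feature("cargo_door_window", lambda c: abs(c["aspect_deg"] - 90.0) < 30.0),
    # Assumed to need sufficient contrast and a sufficiently close object.
    Feature("wheel", lambda c: c["contrast"] > 0.2 and c["distance_m"] < 2000.0),
    # A feature without restrictions: its condition is always fulfilled.
    Feature("hot_spot", lambda c: True),
]

side_view = {"aspect_deg": 85.0, "contrast": 0.5, "distance_m": 800.0}
front_view = {"aspect_deg": 5.0, "contrast": 0.1, "distance_m": 800.0}

print(applicable_features(FEATURES, side_view))
print(applicable_features(FEATURES, front_view))
```

In this sketch only the features returned by `applicable_features` would be extracted from the image, so a feature that cannot be visible in the current context neither costs computation nor lowers the evaluation of the model.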

By taking one or more of the conditions into account, the redundant work caused by the "invisibility" or "non-detection" of a feature can be omitted and the method made more robust according to the invention at the same time, since missing features do not lead to a poorer evaluation of the model.

According to a further particularly preferred aspect of the present invention, each feature that fulfills a condition and is therefore required in a preprocessing of a sub-step of the image processing is requested by this sub-step. The order of preprocessing and the algorithm of the sub-step are stored in the model (e.g. as the number of a function in a list of available functions). This avoids the unnecessary work in a rigid arrangement of image acquisition, preprocessing and classification / localization.

Since different sub-steps may require the same preprocessing (e.g. the features "left edge" and "right edge" of an object both require the preprocessing "edge image"), or partial results from lower-level preprocessing may serve as inputs for higher-level preprocessing (e.g. an edge image and a wavelet decomposition of the filtered original image; local wavelet bases allow the local properties of a function to be examined efficiently), all "reusable" preprocessing steps are stored in the order of their creation, starting with the original image. If a certain preprocessing is required, the image processing issues a "request" for this preprocessing together with all of its previous stages, starting with the original.

The request is handled by carrying out the preprocessing and storing and providing the result or, if the result is already available, by providing the stored result without recalculation. Thus, as already mentioned, existing preprocessing steps or chains of preprocessing steps can be retrieved quickly from a buffer memory (cache). If, for example, preprocessing 1 is carried out for a feature A, and preprocessing 1, 2 and 3 are required for a further feature B, the buffered preprocessing 1 of feature A can be accessed according to the invention, thus reducing the processing time.
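The request handling described above can be illustrated by a minimal Python sketch. It assumes that each preprocessing is identified by a number and that results are stored under the chain of step numbers that produced them; the class name `PreprocessingCache`, the `computed` log and the toy step functions are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


class PreprocessingCache:
    """Stores each preprocessing result under the chain of steps that produced it."""

    def __init__(self, original, steps: Dict[int, Callable]):
        self.steps = steps  # hypothetical registry: step number -> function
        self.results: Dict[Tuple[int, ...], object] = {(): original}
        self.computed: List[Tuple[int, ...]] = []  # chains that were really calculated

    def request(self, chain: Tuple[int, ...]):
        """Return the result of applying `chain` to the original, reusing stored stages."""
        if chain not in self.results:
            previous = self.request(chain[:-1])  # all previous stages come first
            self.results[chain] = self.steps[chain[-1]](previous)
            self.computed.append(chain)
        return self.results[chain]


# Toy stand-ins for preprocessing 1, 2 and 3, operating on a list of numbers.
steps = {
    1: lambda v: [x + 1 for x in v],
    2: lambda v: [x * 2 for x in v],
    3: lambda v: [x - 3 for x in v],
}
cache = PreprocessingCache([1, 2, 3], steps)
feature_a = cache.request((1,))       # feature A needs preprocessing 1
feature_b = cache.request((1, 2, 3))  # feature B needs 1, 2 and 3; 1 is reused
print(cache.computed)
```

Preprocessing 1 appears only once in `cache.computed`, although both requests depend on it.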

These steps make it possible to extract all the features required for the recognition of an object (after a corresponding normalization) and to feed them into the recognition process. The method is therefore no longer restricted to a small number of features for reasons of speed or maintenance. Of course, the preprocessing of the system according to the invention also requires computation time, but only the calculations that are absolutely necessary are carried out, since each preprocessing has to be carried out only once. This allows various features to be extracted as long as the total time of all preprocessing does not exceed the maximum runtime.

The method for preprocessing described above can be implemented according to the invention independently of the insight that certain features are only visible from particular views. In other words, the present preprocessing can be carried out independently of the link to one of the specific conditions, although the combination of the two aspects is particularly advantageous in terms of the computational effort and robustness of the system. The method for preprocessing according to the invention is particularly advantageous compared to the prior art. The method presented by D.G. Lowe in "Fitting Parametrized Three-Dimensional Models To Images", IEEE Transact. on Pattern Analysis and Machine Intelligence, Vol. 13, No. 5, 1991, recognizes the sought object by means of edges. These edges are expressed as parameterized curves, and the free parameters (spatial position and internal degrees of freedom) are determined by an approximation method. The method is relevant in that it stores geometric preprocessing in a cache. However, the cache of the known Lowe method relates only to visibility conditions, whereas the cache or buffer memory according to the invention is not limited in the type of preprocessing. The visibility conditions are also determined solely from the geometry of the object and cannot be freely selected. Otherwise, the Lowe method is a typical representative of methods with permanently implemented preprocessing.

The method according to L. Stephan et al. ("Portable, scalable architecture for model-based FLIR ATR and SAR / FLIR fusion", Proc. of SPIE, Vol. 3718, Automatic Target Recognition IX, Aug. 1999) extracts unspecified features from the radar images (SAR) as well as edges from the infrared images (FLIR images). A separate hypothesis is created with each of these features, and these hypotheses are finally merged. The entire preprocessing is implemented in a fixed order in the system; only the geometry models to be found are interchangeable. EP-A-622 750 specifies the exact nature and sequence of the preprocessing.

A currently particularly preferred embodiment of the invention will now be explained with reference to the accompanying FIGS. 1 to 5e. This embodiment can be modified in a manner well known to those skilled in the art and it is in no way intended to limit the scope of the invention to the example below. Rather, the scope of protection is determined by the features of the claims and their equivalents.

FIG. 1 shows a sequence of object recognition at the top level. In step 1, the acquisition of the image with a camera, loading of a stored image or generation of a VR image takes place in the image recording block. An image acquired in the image recording block of FIG. 1 is shown by way of example in FIG. 3.

In step 2 (ROI creation), a simple and quick rough detection of the object in the image takes place, i.e. the specification of a rectangular region that largely encloses the sought objects. The abbreviation ROI (Region Of Interest) denotes this region surrounding the sought objects, which is shown in FIG. 4. Methods for determining such an ROI are known per se. These include threshold value methods, pixel classification, etc. The ROI currently formed must also be assigned to an ROI from the last image.

In step 3, a decision is made as to whether the object in the region of interest has been provided with an ROI for the first time or not. This step is necessary because there are no hypotheses to be tested yet that are assigned to the ROI and therefore the hypotheses cannot yet be checked.

If the decision in step 3 is "yes", the hypothesis initialization takes place in step 4. Here, one or more 7-tuples are assigned to an ROI. Each 7-tuple consists of the type of object (e.g. a model number; in the case of helicopters, 1 = Hind, 2 = Helix, 3 = Bell Ranger, etc.) and the six degrees of freedom estimated under the assumption of this model class. The initial values of the six degrees of freedom can be generated, for example, by systematic testing. If the decision in step 3 is "no", the hypothesis update is carried out in step 5. For an existing hypothesis, the new position resulting from the movement of the object in space must be matched to the position of the object in the image. For this purpose, a movement prediction known in the prior art is carried out by means of a tracker (for example a Kalman filter).
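The 7-tuple and its initialization by systematic testing might, for example, be represented as in the following Python sketch. The field names, the use of degrees, the coarse yaw grid and the fixed initial position are assumptions made for illustration; the description does not specify how the six degrees of freedom are parameterized or sampled.

```python
from itertools import product
from typing import List, NamedTuple, Sequence, Tuple


class Hypothesis(NamedTuple):
    model: int    # object type, e.g. 1 = Hind, 2 = Helix, 3 = Bell Ranger
    x: float      # translation (three degrees of freedom)
    y: float
    z: float
    roll: float   # rotation in degrees (three degrees of freedom)
    pitch: float
    yaw: float


def initial_hypotheses(models: Sequence[int],
                       position: Tuple[float, float, float],
                       yaw_step_deg: float = 90.0) -> List[Hypothesis]:
    """Systematically try every model class at several yaw angles (assumed coarse grid)."""
    count = int(360.0 / yaw_step_deg)
    yaws = [i * yaw_step_deg for i in range(count)]
    x, y, z = position
    return [Hypothesis(m, x, y, z, 0.0, 0.0, yaw) for m, yaw in product(models, yaws)]


hyps = initial_hypotheses([1, 2, 3], position=(0.0, 0.0, 500.0))
print(len(hyps), hyps[0])
```

Each resulting hypothesis would then be assigned to the ROI and refined or rejected in the subsequent matching and classification steps.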

In step 5 of FIG. 1, the matching described in detail with reference to FIG. 2 takes place.

In step 6 of FIG. 1, the 2D-3D pose estimation is implemented. From the change in position of the receptors and the assumed position of the receptors in space (from hypothesis), the change in position of the object in space can be estimated using the 2D-3D pose estimation. Methods for this are known in the prior art (see e.g. Haralick: Pose Estimation from Corresponding Point Data, IEEE Transactions on Systems, Man and Cybernetics, Vol. 19, No. 6, Nov./Dec. 1989).

In step 7 (block "Better") in FIG. 1, the quality of the model is determined. This is necessary because the matching violates the rigidity property of the object. The pose estimation and re-projection ensure rigidity, since errors in individual receptors are averaged out and a single pose (6 degrees of freedom) is created for all receptors. A further matching in the same image is useful in order to obtain the best possible result, i.e. to achieve the smallest possible error between the hypothesis and the image. In the event of deterioration (or only a very small improvement), it is therefore assumed that the optimum has already been reached.
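The decision in the "Better" block can be read as the termination test of an iteration. The following Python sketch shows one possible form, assuming that matching and pose estimation are available as a single function and that the quality is expressed as an error value; the function names, `min_gain` and `max_iter` are hypothetical, and the toy stand-ins at the end merely make the sketch runnable.

```python
def refine(hypothesis, match_and_estimate, error, min_gain=1e-3, max_iter=20):
    """Repeat matching and pose estimation until the error stops improving."""
    best, best_err = hypothesis, error(hypothesis)
    for _ in range(max_iter):
        candidate = match_and_estimate(best)
        cand_err = error(candidate)
        if best_err - cand_err < min_gain:
            # Deterioration or only a very small improvement: keep the previous result.
            break
        best, best_err = candidate, cand_err
    return best, best_err


# Toy stand-ins (assumptions): the "pose" is a single number whose true value is 10.
def toy_step(pose):
    return pose + 0.5 * (10.0 - pose)


def toy_error(pose):
    return abs(10.0 - pose)


pose, err = refine(0.0, toy_step, toy_error)
print(pose, err)
```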

In step 8 of FIG. 1 ("Classification" block), all hypotheses of an ROI, in particular their quality values, are evaluated. The classification results either in a decision for a certain class and pose (by selecting or combining pose values of different hypotheses) or in the information that the object cannot be assigned to a known class.

In step 9 of FIG. 1, the class, quality and orientation are evaluated. The information from the classification can be displayed to the user in various ways (e.g. location and class as an overlay in the image) or actions can be derived directly (e.g. triggering a weapon). This can be determined after each picture or at larger, regular intervals or when certain quality thresholds or classifications are exceeded or fallen below.

The details of the matching are explained with reference to FIG. 2.

In step 10 of FIG. 2, rules are checked. The rule of each receptor is evaluated and, depending on the result, the receptor is or is not adopted in the 2D representation (graph). Since different rules, which may process any information to arrive at the rule result, can exist for different applications, a geometrically motivated rule function is used here to describe how the method operates. It should be noted that the parameters of the rule function need not be limited to the geometry of the object and its current pose. If available, other information (e.g. position of the sun, horizon line, friend-foe positions, radio beacons, time) can also contribute to the rule result.

The rule function of the "Vector Angle" rule contains three parameters that are stored in the model: a, b and x. Its result is r.

The rule function itself has the following form:

\[
\cos\beta = -\frac{(R\,x)\cdot z}{\lVert R\,x\rVert\,\lVert z\rVert},
\qquad
r = \begin{cases}
1 & \beta < a \\
1 - \dfrac{\beta - a}{b} & a \le \beta \le a + b \\
0 & \beta > a + b
\end{cases}
\]

The vector z is the unit vector in the z direction (viewing direction of the camera). Matrix R is the rotation matrix from the hypothesis that rotates the model from its original position (parallel to the camera coordinate system) to its current view. x is a vector that describes the average direction of view from the object to the outside (e.g. the outward normal of a surface).

If r supplies a value other than 0, the receptor is adopted in the 2D representation. The values between 0 and 1 are available for further evaluation, but are currently not in use.
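A minimal Python sketch of the "Vector Angle" rule is given below. It assumes that the angle between the rotated outward direction R x and the reversed viewing direction -z is compared against a and a + b, that a and b are given in radians, and that r falls linearly from 1 to 0 in between; the function names and the guard for b = 0 are additions for the purpose of the example.

```python
import math


def mat_vec(R, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]


def vector_angle_rule(R, x, a, b, z=(0.0, 0.0, 1.0)):
    """Return the rule result r in [0, 1]; a and b are angles in radians (assumption)."""
    rx = mat_vec(R, x)
    norm = math.sqrt(sum(c * c for c in rx)) * math.sqrt(sum(c * c for c in z))
    cos_beta = -sum(p * q for p, q in zip(rx, z)) / norm
    beta = math.acos(max(-1.0, min(1.0, cos_beta)))  # clamp against rounding errors
    if beta < a:
        return 1.0
    if b > 0.0 and beta <= a + b:
        return 1.0 - (beta - a) / b
    return 0.0


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
a = b = math.pi / 3

facing = vector_angle_rule(IDENTITY, (0.0, 0.0, -1.0), a, b)   # surface faces the camera
sideways = vector_angle_rule(IDENTITY, (1.0, 0.0, 0.0), a, b)  # surface seen edge-on
away = vector_angle_rule(IDENTITY, (0.0, 0.0, 1.0), a, b)      # surface faces away
print(facing, sideways, away)
```

A receptor would be adopted in the 2D representation whenever the returned value is non-zero.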

In step 11 of FIG. 2, the projection of the receptors is carried out.

Step 11 is performed separately (and possibly in parallel) for each receptor that is found in the graph. First, the receptor reference point $p^{3}$ is projected into the image matrix as $p^{2}$:

\[
p^{2} = P\,(R\,p^{3} + t)
\]

Matrix R is the above-mentioned rotation matrix, and t is the vector from the origin of the camera coordinate system to the origin of the model coordinate system in the scene (translation vector). Matrix P is the projection matrix or camera model:

\[
P = \begin{pmatrix}
f\,f_{sx} & 0 & 0 \\
0 & f\,f_{sy} & 0 \\
0 & 0 & 1
\end{pmatrix}
\]

f is the focal length of the camera, and $f_{sx}$ and $f_{sy}$ are the resolution of the camera in pixels per mm. $p^{2}$ is a homogeneous vector (u, v and scaling) in pixels relative to the principal point of the camera. It is converted accordingly into the pixel coordinates x and y.

The receptor projection function is then called up, which projects the receptor-specific data. An example of this is an edge receptor, the start and end points of which are defined in 3-D on the model and are projected into the image matrix using this function in the same way as the reference point.
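The projection of the reference point and of the end points of an edge receptor can be sketched in Python as follows. The sketch assumes a pinhole camera of the form p2 = P (R p3 + t) with P = diag(f * f_sx, f * f_sy, 1), where f is the focal length in mm and f_sx, f_sy are the resolutions in pixels per mm; the function names and the numerical values are illustrative only.

```python
def project(p3, R, t, f, f_sx, f_sy):
    """Project a 3-D model point to pixel coordinates relative to the principal point."""
    # Rotate and translate the model point into the camera coordinate system.
    cam = [sum(R[i][j] * p3[j] for j in range(3)) + t[i] for i in range(3)]
    # Apply the camera model; (u, v, w) is the homogeneous image vector.
    u, v, w = f * f_sx * cam[0], f * f_sy * cam[1], cam[2]
    return (u / w, v / w)


def project_edge(start, end, R, t, f, f_sx, f_sy):
    """Project the 3-D start and end points of an edge receptor in the same way."""
    return project(start, R, t, f, f_sx, f_sy), project(end, R, t, f, f_sx, f_sy)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = (0.0, 0.0, 100.0)  # object 100 length units in front of the camera

point = project((2.0, 1.0, 0.0), IDENTITY, t, f=50.0, f_sx=10.0, f_sy=10.0)
edge = project_edge((0.0, 0.0, 0.0), (4.0, 0.0, 0.0), IDENTITY, t, 50.0, 10.0, 10.0)
print(point, edge)
```

Adding the pixel position of the principal point to the result would give the pixel coordinates x and y in the image matrix.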

The 3D points are stored in step 12. A list of hypothesis points is created in 3-D, one or more points per receptor being saved in a defined order. The receptor reference point of each receptor can always be found in the list, additional points are optional. The edge receptor also saves the start and end points.

In step 13 the graph generation is implemented. If this is necessary for the following matching process, a graph is generated from the point cloud of the points projected into the image matrix by tessellation. The method used is known and described in the following article: Watson, D.F., 1981, Computing the n-dimensional Delaunay tessellation with application to Voronoi polytopes: The Computer J., 24 (2), p. 167-172.

In step 14, the 2D matching is carried out, either using the Elastic Graph Matching method by Prof. v. d. Malsburg or another method with a similar objective. Such a method, which has special properties related to the tracking of the object, was implemented by us. The method must find the best possible location of the sought feature near the start position, and a trade-off between feature quality and deviation from the given graph configuration is desirable. In this step it is therefore necessary to scan the image in some way with the application function of the receptor. The match quality of the application function is assigned to each scanned position so that the most favorable position can be determined.
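The scanning described here might look as follows in a very reduced Python sketch. It is not Elastic Graph Matching; it only illustrates, for a single receptor, a trade-off between match quality and deviation from the start position. The square search window, the quadratic penalty and the parameters `radius` and `weight` are assumptions.

```python
def best_position(quality, start, radius=2, weight=0.1):
    """Scan a square window around `start` and return the best (position, score).

    score = match quality - weight * squared offset from the start position
    """
    sx, sy = start
    best, best_score = start, float("-inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            pos = (sx + dx, sy + dy)
            score = quality(pos) - weight * (dx * dx + dy * dy)
            if score > best_score:
                best, best_score = pos, score
    return best, best_score


# Toy application function (assumption): a perfect match at pixel (12, 10) only.
def toy_quality(pos):
    return 1.0 if pos == (12, 10) else 0.0


near, near_score = best_position(toy_quality, start=(10, 10), weight=0.1)
stay, stay_score = best_position(toy_quality, start=(10, 10), weight=0.3)
print(near, stay)
```

With a small weight the receptor moves to the better match two pixels away; with a larger weight the deviation costs more than the match gains, and the receptor stays at the start position.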

The example of the edge receptor now shows how the feature request works. For this purpose, its algorithm is given as pseudocode:

    req = root of the preprocessing tree                                  (5.a)
    req = request(req, edge image, threshold = 10, sigma = 1)             (5.b)
    req = request(req, distance image, maximum distance = 100)            (5.c)
    image = image from tree(req)                                          (5.d)
    determine chamfer distance along the line(image, line)                (5.e)

From the image acquisition (block 1) to the beginning of 5b, the preprocessing cache is only occupied with the original image.

According to the pseudo code 5a (see Fig. 5.a), the pointer req is placed on the root of the tree.

In the request (5.b) (see FIG. 5b), it is determined that no node of the type edge image with the above-mentioned parameters exists yet. The node is then generated by means of the registered routine for calculating an edge image.

(5.c) generates the distance image in the same way (see FIG. 5c). (5.d) reads the image from req and (5.e) calculates the quality of the feature by determining the mean distance (in pixels) to an image edge. The values are taken directly from the edge image. For this purpose, reference is made to FIGS. 5d and 5e.

When estimating the next position, the tree iterator (req) is put back to the root in (5.a) and moved on in (5.b) and (5.c) without calculation.
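The request sequence (5.a) to (5.e) can be made concrete with a small runnable Python sketch. The tree structure follows the description; everything else is an assumption made to keep the example short: images are lists of rows, the edge detector is a simple horizontal intensity-step test (the smoothing parameter sigma is omitted), the distance image uses a brute-force city-block distance, and the names `Node`, `ROUTINES`, `CALLS` and `request` are hypothetical.

```python
class Node:
    """One entry of the preprocessing tree: an image plus its derived images."""

    def __init__(self, image):
        self.image = image
        self.children = {}  # (kind, parameters) -> Node


ROUTINES = {}  # hypothetical registry: kind -> routine(image, **parameters)
CALLS = []     # records which routines were really executed


def request(node, kind, **params):
    """Return the child node for (kind, params), computing it only if it is missing."""
    key = (kind, tuple(sorted(params.items())))
    if key not in node.children:
        CALLS.append(kind)
        node.children[key] = Node(ROUTINES[kind](node.image, **params))
    return node.children[key]


def edge_image(img, threshold):
    """Mark pixels whose horizontal intensity step exceeds the threshold (toy detector)."""
    width = len(img[0])
    return [[1 if x + 1 < width and abs(row[x + 1] - row[x]) > threshold else 0
             for x in range(width)] for row in img]


def distance_image(edges, max_distance):
    """City-block distance to the nearest edge pixel, capped at max_distance."""
    pts = [(x, y) for y, row in enumerate(edges) for x, v in enumerate(row) if v]
    return [[min([abs(x - ex) + abs(y - ey) for ex, ey in pts] + [max_distance])
             for x in range(len(edges[0]))] for y in range(len(edges))]


ROUTINES.update(edge_image=edge_image, distance_image=distance_image)


def edge_receptor_quality(root, line):
    req = root                                               # (5.a)
    req = request(req, "edge_image", threshold=10)           # (5.b)
    req = request(req, "distance_image", max_distance=100)   # (5.c)
    image = req.image                                        # (5.d)
    return sum(image[y][x] for x, y in line) / len(line)     # (5.e) mean distance


# Toy original image: dark left half, bright right half, i.e. one vertical edge.
original = [[0, 0, 0, 100, 100, 100] for _ in range(4)]
root = Node(original)

on_edge = edge_receptor_quality(root, [(2, y) for y in range(4)])
off_edge = edge_receptor_quality(root, [(4, y) for y in range(4)])
print(on_edge, off_edge, CALLS)
```

The second call walks the same path through the tree without any recalculation, so each routine appears only once in `CALLS`. A smaller mean distance indicates a better fit of the projected line to an image edge.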

Other receptors that are stored in the model can expand this tree, as the free space on the right side of FIG. 5e is intended to indicate.

The 2D points are stored in step 15 of FIG. 2. The points $p^{2}$ obtained after the matching step are stored in a list in the same order as in step 12. It is important to ensure that the two lists remain synchronized so that there are no inconsistencies in the matching.

Claims

1. A method for model-based classification and / or target recognition of an object, which comprises the following steps: a) recording an image of an object; b) determining a feature that is part of the object; c) determining at least one condition associated with the feature and indicating the applicability of the feature; and d) performing the classification and / or target recognition of the object by the detection of the feature if the condition indicates the applicability of the feature.

2. The method of claim 1, wherein step b) comprises determining a plurality of features, wherein step c) includes determining at least one condition for each of the features, and wherein step d) comprises the classification and / or target recognition of the object by detecting the plurality of features.

3. The method of claim 1 or 2, wherein the algorithm of the at least one condition is freely programmable.

4. The method according to one or more of claims 1-3, wherein the condition is selected from a group consisting of: geometry of the object, distance of the object from a camera, lighting conditions, contrast, speed of the object, height of the object, and relative position of the object to a camera.

5. The method according to one or more of claims 1-4, further comprising at least one preprocessing step for the detection of a specific feature, wherein, before the preprocessing for the specific feature, a check is carried out to determine whether that preprocessing has already been carried out in connection with another feature and, if so, the preprocessing of the other feature is used for the specific feature.

6. The method according to claim 5, wherein the preprocessing carried out is stored in a cache memory.

7. The method of claim 5 or 6, wherein the feature is "left edge" or "right edge" of an object, and each of these features includes preprocessing "edge image".

8. The method according to one or more of claims 5-7, wherein all reusable preprocessing steps are stored in the order in which they were created.

9. The method according to one or more of claims 6-8, wherein the cache is not limited in the type of preprocessing.