EP2441047A1

EP2441047A1 - Method and device for the real-time tracking of objects in an image sequence in the presence of an optical blur

Info

Publication number: EP2441047A1
Application number: EP10734231A
Authority: EP
Inventors: Nicolas Livet; Thomas Pasquier
Original assignee: Total Immersion
Current assignee: Total Immersion
Priority date: 2009-06-08
Filing date: 2010-06-04
Publication date: 2012-04-18
Also published as: FR2946446A1; FR2946446B1; WO2010142895A1

Abstract

The invention relates in particular to a method and a device for the real-time tracking of a representation of objects in an image sequence, at least one image of said sequence comprising an optical blur effect. After a representation of the objects has been identified in a first image of the sequence, the identified representation is tracked (200) in a second image following the first one according to a first object tracking mode using a keyframe. When blur is detected (205) in a third image distinct from the first one, the identified representation of the objects is tracked (230) in the third image according to a second object tracking mode distinct from said first tracking mode. The pose of the objects in the third image is then estimated according to the identified representation of the objects in the third image.

Description

Method and device for the real-time tracking of objects in a sequence of images in the presence of optical blur

The present invention relates to the combination of real and virtual images in real time, in an augmented reality system, and more particularly to a method and a device for tracking objects in real time in a sequence of images comprising blurred images.

Augmented reality is intended to insert one or more virtual objects into the images of a video stream. Depending on the type of application, the position and orientation of these virtual objects can be determined by data external to the scene represented by the images, for example coordinates derived directly from a game scenario, or by data related to certain elements of this scene, for example the coordinates of a particular point of the scene such as the hand of a player or an element of the scenery. When the position and the orientation are determined by data related to certain elements of this scene, it may be necessary to track these elements according to the movements of the camera or the movements of these elements themselves in the scene. The operations of tracking elements and inserting virtual objects into the real images can be executed by separate computers or by the same computer.

The purpose of the tracking algorithms used in this context is to find very accurately, in a real scene, the pose, that is to say the position and orientation, of an object whose geometry information is generally available or, equivalently, to retrieve the extrinsic position and orientation parameters of a camera filming this object, by means of, for example, image analysis.

There are several ways to track an object in a sequence of images, that is, in a video stream. Generally, tracking algorithms, also called target tracking algorithms, use a marker that can be visual or rely on other means such as sensors, preferably wireless sensors of the radio-frequency or infrared type. Alternatively, some algorithms use shape recognition to track a particular element in an image stream.

The Ecole Polytechnique Fédérale de Lausanne has developed a visual tracking algorithm that does not use a marker and whose originality lies in the pairing of particular points between the current image of a video stream and both a key image, called a keyframe, obtained at the initialization of the system and a keyframe updated during the execution of the visual tracking. The principle of this algorithm is described for example in the article entitled "Fusing Online and Offline Information for Stable 3D Tracking in Real Time" - Luca Vacchetti, Vincent Lepetit, Pascal Fua - IEEE Transactions on Pattern Analysis and Machine Intelligence 2004.

The objective of this visual tracking algorithm is to find, in a real scene, the pose of an object whose three-dimensional (3D) mesh is available as a 3D model or, equivalently, to find the extrinsic position and orientation parameters of a camera filming this stationary object, by means of image analysis.

The current image is compared here with one or more recorded keyframes in order to find a large number of matches, or pairings, between these pairs of images and thereby estimate the pose of the object. To this end, a keyframe is composed of two elements: an image captured from the video stream and a pose (orientation and position) of the real object appearing in this image. The keyframes are images extracted from the video stream in which the object to be tracked has been placed manually using a pointing device such as a mouse. Keyframes preferably characterize the pose of the same object in several images. They are created and recorded "offline", that is to say outside the steady state of the tracking application. It is worth noting that for planar targets or objects, for example a magazine, these keyframes can be generated directly from an available image of the object, for example in JPEG or bitmap format.

Each offline keyframe includes an image in which the object is present, a pose that characterizes the location of that object, and a number of points of interest that characterize the object in the image. Points of interest are, for example, constructed from a Harris point detector or from a SURF (Speeded-Up Robust Features), SIFT (Scale-Invariant Feature Transform) or YAPE (Yet Another Point Extractor) detector. They represent locations with high values of directional gradients in the image, together with a description of the variation of the image in the vicinity of these points.

Before the tracking application is started, one or more keyframes must be determined offline. They are generally images extracted from the video stream, which contain the object to be tracked and which are associated with a position and an orientation of the three-dimensional model of this object. For this, an operator visually matches a wireframe model to the real object. The manual preparation phase thus consists in finding a first estimate of the pose of the object in an image extracted from the video stream, which amounts to formalizing the initial affine transformation $T_{p \to c}$, the transformation matrix from the frame of reference associated with the tracked object to the frame of reference attached to the camera. The use of this model makes it possible to establish the link between the coordinates of the points of the three-dimensional model of the object expressed in the frame of reference of the object and the coordinates of these points in the frame of reference of the camera. For the tracking of planar objects, it is important to note that, equivalently, a single image can be used to construct an offline keyframe.

During the initialization of the tracking application, the offline keyframes are processed in order to position points of interest according to the parameters chosen when launching the application. These parameters are specified empirically for each type of use of the application and allow the detection and matching application to be adapted so as to obtain a better estimate of the pose of the object according to the characteristics of the real environment. Then, when the real object in the current image is in a pose close to the pose of this same object in one of the offline keyframes, the number of matches becomes large. It is then possible to find the affine transformation that registers the three-dimensional model of the object on the real object.

When such a match has been found, the tracking algorithm enters its steady state. The movements of the object are tracked from one image to the next and any drift is compensated using the information contained in the offline keyframe retained during initialization. It should be noted that, for the sake of precision, this offline keyframe can be reprojected using the estimated pose of the previous image. This reprojection makes it possible to have a keyframe that contains a representation of the object similar to that of the current image, and can thus allow the algorithm to operate with points of interest and descriptors that are not robust to rotations.

The tracking application thus combines two distinct types of algorithm: a detection of points of interest, for example a modified version of Harris point detection or a detection of SIFT or SURF points, and a technique for reprojecting the points of interest positioned on the three-dimensional model onto the image plane. This reprojection makes it possible to predict the result of a spatial transformation from one image to the next, both extracted from the video stream. Combined, these two algorithms allow robust tracking of an object with six degrees of freedom.

In general, a point $p$ of the image is the projection of a point $P$ of the real scene, with $p \sim P_I \cdot P_E \cdot T_{p \to c} \cdot P$, where $P_I$ is the matrix of the intrinsic parameters of the camera, i.e. its focal length, the center of the image and the offset, $P_E$ is the matrix of the extrinsic parameters of the camera, i.e. the position of the camera in real space, and $T_{p \to c}$ is the transformation matrix from the frame of reference associated with the tracked object to the frame of reference attached to the camera. Only the position of the object relative to the camera is considered here, which amounts to placing the frame of reference of the real scene at the optical center of the camera. This results in the relation $p \sim P_I \cdot T_{p \to c} \cdot P$. Since the matrix $P_I$ is known, the tracking problem therefore consists in determining the matrix $T_{p \to c}$, i.e. the position and orientation of the object with respect to the frame of reference of the camera.
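As an illustration of the projection relation above, the following minimal sketch projects one 3D point of the object onto the image. It assumes a pinhole camera without distortion; the names `P_I` and `T_pc`, the matrix layouts and all numerical values are hypothetical choices made for the example and do not come from the described method.

```python
def mat_vec(m, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def project(P_I, T_pc, P):
    """Project a 3D point P (object frame) to pixel coordinates.

    P_I:  3x3 intrinsic matrix (assumed layout: focal lengths, image center).
    T_pc: 3x4 matrix [R | t] mapping object coordinates to camera coordinates.
    """
    P_h = list(P) + [1.0]             # homogeneous coordinates of the object point
    X_cam = mat_vec(T_pc, P_h)        # point expressed in the camera frame
    x = mat_vec(P_I, X_cam)           # homogeneous image point, p ~ P_I . T . P
    return (x[0] / x[2], x[1] / x[2])


# Hypothetical values: 800-pixel focal length, 640x480 image, no skew.
P_I = [[800.0, 0.0, 320.0],
       [0.0, 800.0, 240.0],
       [0.0, 0.0, 1.0]]
# Object frame aligned with the camera frame, 2 units in front of the camera.
T_pc = [[1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 2.0]]

u, v = project(P_I, T_pc, (0.1, -0.05, 0.0))
```

Tracking amounts to the inverse problem: given several observed pixel positions and the matching 3D points of the model, find the `T_pc` that best explains them.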

To do this, a so-called "error minimization" algorithm is used to find the best solution for the estimate of $T_{p \to c}$ by using the set of three-dimensional correspondences on the geometric model and two-dimensional (2D) correspondences in the current image and in the keyframe. For example, a RANSAC (RANdom SAmple Consensus) or PROSAC (PROgressive SAmple Consensus) algorithm, which eliminates measurement errors (erroneous 2D/3D correspondences), can be combined with a Levenberg-Marquardt algorithm to converge quickly to an optimal solution that reduces the reprojection error.
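The idea of rejecting erroneous correspondences before refining the estimate can be sketched on a deliberately simplified problem. The example below estimates a 2D translation rather than a six-degree-of-freedom pose, and replaces the Levenberg-Marquardt refinement by a plain mean over the inliers; the function name, the threshold and the data are assumptions made for illustration only.

```python
import random


def ransac_translation(pairs, threshold=1.0, iterations=50, seed=0):
    """Estimate a 2D translation from point pairs that contain outliers.

    pairs: list of ((x, y), (x2, y2)) correspondences.
    Returns the refined translation and the list of inlier pairs.
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # Minimal sample: one correspondence fully defines a translation.
        (x, y), (x2, y2) = rng.choice(pairs)
        tx, ty = x2 - x, y2 - y
        inliers = [
            pair for pair in pairs
            if abs(pair[1][0] - pair[0][0] - tx) <= threshold
            and abs(pair[1][1] - pair[0][1] - ty) <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refinement on the consensus set (a plain mean here, standing in for
    # the non-linear minimization needed for a full pose).
    n = len(best_inliers)
    tx = sum(q[0] - p[0] for p, q in best_inliers) / n
    ty = sum(q[1] - p[1] for p, q in best_inliers) / n
    return (tx, ty), best_inliers


# Seven consistent matches (true shift of (5, -3)) and three wrong ones.
good = [((float(i), float(2 * i)), (i + 5.0, 2 * i - 3.0)) for i in range(7)]
bad = [((0.0, 0.0), (40.0, 9.0)),
       ((1.0, 1.0), (-20.0, 30.0)),
       ((2.0, 5.0), (2.0, 60.0))]
translation, inliers = ransac_translation(good + bad)
```

The three wrong matches each agree only with themselves, so the consensus set is formed by the seven consistent ones and the refinement is not polluted by the outliers.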

The applicant has developed a visual tracking algorithm for objects that does not use a marker and whose originality lies in the pairing of particular points between the current (and previous) image of a video stream and a set of keyframes obtained automatically when the system is initialized. Such an algorithm is described in particular in French patent application FR 2 911 707. This algorithm makes it possible, in a first step, to identify the object positioned in front of the camera and then to initialize the process of tracking the object completely automatically, without positioning constraints. This algorithm makes it possible in particular to recognize and track a large number of objects present at the same time in a video stream, and thus allows the identification and tracking of targets or objects in a real scene. These objects can have different geometries and various colorimetric aspects. By way of non-limiting example, they may be textured trays, faces, clothes, natural scenes, television studios or buildings.

However, when the error measurement becomes too large, that is, when the number of matches between the current image and the current keyframe becomes too small, the tracking stalls (the estimate of the pose of the object is considered to be no longer sufficiently coherent) and a new initialization phase is necessary. Moreover, a generally accepted limit of object tracking systems is the difficulty of adapting them to applications belonging to a so-called "general public" context. Indeed, the main implementation constraints of these solutions for such applications are, in particular, a limited amount of memory and computing power. In addition, these systems generally require the use of high-quality cameras rather than low-cost cameras such as those supplied with laptop personal computers (PCs) or webcams. These low-cost cameras are often equipped with optics of variable quality and are thus very sensitive to external lighting conditions. They often require long exposure times.

In such conditions, rapid movements of the camera, i.e. of the image sensor, and/or of the objects present in the real scene often cause optical blur effects, known as motion blur.

While the use of professional cameras can significantly reduce these effects of blur, these cameras are nevertheless sensitive to the rapid movements of objects on the scene, such as a football hit by a player.

This blurring phenomenon frequently leads to stalls of the tracking applications used.

To counter this phenomenon, it is possible to use image stabilization systems. In the field of photography, various approaches have been developed. In particular, there are stabilizers that equip digital photography devices, in particular reflex cameras. Two kinds of stabilizer are mainly used: the optical stabilizer and the digital stabilizer. They are particularly effective in low light conditions or when the exposure time is deliberately long.

The principle of an optical stabilizer is to link the optical group with an accelerometer type sensor to detect the movements of the camera and slightly move this group accordingly to counteract the movements of the camera.

Digital stabilizers work by changing the framing of the photograph in the image from the sensor. This approach requires the use of a sensor whose resolution is greater than that of the image. Detection of the movements of the camera can be achieved by the use of a gyro accelerometer or by image analysis.

However, these optical or digital stabilization approaches do not meet the needs of tracking algorithms for augmented reality applications. In fact, most webcams and "general public" cameras do not include accelerometer-type sensors. In addition, the use of an oversized sensor reduces the size of the overall image and ultimately allows the image to be stabilized only for low-amplitude movements. In the context of object tracking systems, not only are the movements of the camera large but, in addition, the objects present in front of the camera can move independently. This type of movement, localized in the image, cannot be detected by a global approach such as that offered by a stabilizer.

However, there is an approach in the field of image analysis, initially proposed by Jianbo Shi and Carlo Tomasi ("Good Features to Track", IEEE CVPR 1994) and called the "KLT feature tracker", which makes it possible to track characteristic points in a sequence of images and to estimate an optical flow, that is to say the displacements of pixels (picture elements) between two images. This method thus aims to find the pixel $v$ in an image $J$ that appears most similar to a pixel $u$ of an image $I$, thereby estimating the displacement $d$ of this pixel between the two images. In other words, the coordinates of the pixel $v$ can be expressed as follows: $v = u + d = [u_x + d_x, u_y + d_y]$.

The affine movement of a sub-window between two images $I$ and $J$ can be described by the relation $J(Ax + d) = I(x)$, where $x$ represents the coordinates of a point in the sub-window with respect to the center of this sub-window, so that point $x$ moves to $Ax + d$ in the second image, $A$ being a 2×2 matrix describing the affine deformation of the sub-window.

The approach aims to minimize the following function, which describes the residual error between two regions that belong to the two images $I$ and $J$:

$$\epsilon = \iint_W \left[ J(Ax + d) - I(x) \right]^2 w(x) \, dx$$

where $W$ describes the neighborhood around $x$ and $w(x)$ represents a weighting function such as a Gaussian.
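The residual error described above can be illustrated numerically. The sketch below makes strong simplifying assumptions that are not part of the method: the matrix $A$ is taken to be the identity (pure translation), displacements are integers, and the minimum is found by exhaustive search rather than by the iterative minimization used in practice. All function names and the synthetic images are hypothetical.

```python
import math


def residual(I, J, center, d, half=2, sigma=1.5):
    """Weighted squared difference between a window of I and the window of J
    shifted by d = (dx, dy), with a Gaussian weighting function."""
    cx, cy = center
    dx, dy = d
    total = 0.0
    for y in range(-half, half + 1):
        for x in range(-half, half + 1):
            w = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            diff = J[cy + y + dy][cx + x + dx] - I[cy + y][cx + x]
            total += w * diff * diff
    return total


def best_displacement(I, J, center, search=2):
    """Exhaustive search for the displacement with the smallest residual."""
    candidates = [(dx, dy) for dy in range(-search, search + 1)
                  for dx in range(-search, search + 1)]
    return min(candidates, key=lambda d: residual(I, J, center, d))


def make_image(size, spot):
    """Synthetic image: a smooth bright spot centered at spot = (x, y)."""
    sx, sy = spot
    return [[math.exp(-((x - sx) ** 2 + (y - sy) ** 2) / 8.0)
             for x in range(size)] for y in range(size)]


I = make_image(16, (7, 7))
J = make_image(16, (9, 6))      # same spot moved by (+2, -1)
d = best_displacement(I, J, (7, 7))
```

Because the whole window around the point is compared, and not a single pixel, the estimate degrades gracefully when the image content is smoothed, which is the property exploited later for blurred images.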

To be effective, this tracking of characteristic points must, however, be coupled to a point-of-interest detector in an initial image. For the implementation of this preliminary step, it is necessary to search for image areas that have a high frequency signature. The points of interest are thus located, in the initial image, on the pixels which have high values of second derivatives on their neighborhood.

An implementation of the search for and tracking of these descriptors is available in the public library known as OpenCV (Open Source Computer Vision), originally developed by Intel. This implementation notably uses a pyramid of subsampled images in order to increase the robustness of the solution to changes of scale when the size of the object in the image varies greatly.
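The notion of a pyramid of subsampled images can be sketched as follows. This is a minimal illustration that halves the resolution by averaging 2x2 blocks; this averaging is an assumption made for brevity and is not claimed to be the filtering used by the library mentioned above. The function names are hypothetical.

```python
def downsample(image):
    """Halve the resolution by averaging each 2x2 block of pixels."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [[(image[2 * y][2 * x] + image[2 * y][2 * x + 1]
              + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def build_pyramid(image, levels):
    """Return [full resolution, 1/2, 1/4, ...] with `levels` entries."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


# Synthetic 8x8 gradient image used only to exercise the functions.
image = [[float(x + y) for x in range(8)] for y in range(8)]
pyramid = build_pyramid(image, 3)
sizes = [(len(level), len(level[0])) for level in pyramid]
```

A displacement of several pixels at full resolution shrinks by a factor of two at each level, so a search started on the coarsest level and refined on the finer ones can follow larger movements with a small window.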

Such a solution for tracking characteristic elements, also called template matching, makes it possible to follow points of interest by using a portion of the image around the position of each point, which makes the repeatability of these points of interest more robust to the effects of blur.

However, the implementation of this solution presents significant constraints. First of all, it only makes it possible to estimate the movements of pixels in an image, that is to say with two degrees of freedom, and not to estimate the pose of an object present in the image according to six degrees of freedom. In addition, computing such matches in successive images is costly in processing time. Finally, the position of the tracked points drifts rapidly over time, especially when the texture in the image has similar areas close to each other. It is also accepted that this type of tracking of characteristic elements causes local drift phenomena which, over time, introduce inaccuracies in their position in the image.

Other approaches for determining and correcting optical blur in an image exist. These are, for example, techniques based on contour detection, also called edge detection. However, they are often not robust because they require the presence of marked contours. In addition, these contours tend to disappear with radial-type blurring, which results from a rotational movement about the viewing axis of the camera or from a roll-type rotational movement of the object in the scene.

Still other approaches aim at estimating, for each pixel of an image, the direction of movement (optical flow). By way of example, it is possible to transform a so-called "spatial" image into the frequency domain by means of a Fourier transform. Such a method is described in particular in the thesis entitled "Visual Motion Estimation based on Motion Blur Interpretation" by Ioannis Rekleitis (1995). However, these approaches are often computationally expensive and therefore difficult to apply in a real-time context for consumer applications. Moreover, they do not provide information that can easily be exploited by an object tracking method.

The invention solves at least one of the problems discussed above.

The subject of the invention is thus a method for tracking, in real time, a representation of at least one object in a sequence of images, at least one image of said sequence of images comprising at least one optical blur effect, said method comprising the following steps:

identifying a representation of said at least one object in a first image of said sequence of images;

tracking said identified representation of said at least one object in a second image of said plurality of images, said second image following said first image, according to a first object tracking mode using a keyframe;

detecting blur in a third image of said image sequence, distinct from said first image;

tracking said identified representation of said at least one object in said third image, according to a second object tracking mode, distinct from said first tracking mode; and,

estimating the pose of said object in said third image according to said identified representation of said at least one object in said third image.

The method according to the invention thus makes it possible to track in real time one or more real objects in a sequence of images, some of whose images comprise an optical blur effect, local or global, while optimizing the necessary resources.

According to a particular embodiment, said step of tracking said identified representation of said at least one object in said second image comprises a step of determining correspondences between a plurality of points of interest of said second image and a corresponding keyframe, said blur detection step comprising a step of comparing the number of correspondences between said plurality of points of interest of said second image and said corresponding keyframe with a threshold.

The method according to the invention thus makes it possible to reuse the calculations made for the tracking of representations of real objects for the purpose of detecting blur effects.

Still according to a particular embodiment, said step of tracking said identified representation of said at least one object in a third image comprises a step of searching for characteristic points in said first or second image, the pose of said at least one object being at least partially determined by reprojection of said characteristic points onto a three-dimensional model of said at least one object. The method according to the invention thus makes it possible to refine the tracking of real objects.

Still according to a particular embodiment, said step of tracking said identified representation of said at least one object in a third image comprises a step of searching for characteristic points in a keyframe corresponding to said third image, the pose of said at least one object being at least partially determined by reprojection of said characteristic points onto a three-dimensional model of said at least one object. The method according to the invention thus makes it possible to refine the tracking of real objects.

Advantageously, said step of tracking said identified representation of said at least one object in said second image comprises a step of determining a plurality of points of interest in said first and second images, said points of interest being identified as Harris points or SURF, SIFT or YAPE points. Similarly, said step of tracking said identified representation of said at least one object in said second image preferably comprises a step of determining a plurality of points of interest in said first or second image and in a corresponding keyframe, said points of interest being identified as Harris points or SURF, SIFT or YAPE points.

Still according to a particular embodiment, the method is recursively applied to several images of said plurality of images to improve the tracking of real objects.

The invention also relates to a computer program comprising instructions adapted to the implementation of each of the steps of the method described above when said program is executed on a computer, as well as to information storage means, removable or not, partially or completely readable by a computer or a microprocessor, comprising code instructions of a computer program for performing each of the steps of this method.

The invention also relates to a device comprising means adapted to the implementation of each of the steps of the method described above.

The advantages provided by this computer program, these storage means and this device are similar to those mentioned above.

Other advantages, aims and features of the present invention will emerge from the detailed description which follows, given by way of non-limiting example, with reference to the accompanying drawings in which:

FIG. 1, comprising FIGS. 1a, 1b, 1c and 1d, schematically illustrates different types of blur that may appear in an image;

FIG. 2 schematically illustrates an example of an algorithm combining motion tracking and blur detection to enable objects to be tracked despite the presence of global or local blur in one or more images of a sequence of images in which objects are followed;

FIG. 3 presents a first embodiment of the algorithm illustrated in FIG. 2;

FIG. 4 illustrates the extraction of the 2D/3D correspondences between a current image and a 3D model by using the tracking of characteristic elements that are robust to blur between a current image and the image preceding it in the sequence; and,

FIG. 5 illustrates an exemplary device adapted to implement the invention or a part of the invention.

The aim of the invention is the robust and rapid tracking of one or more objects, in real time, in image sequences that may exhibit temporal optical blur effects. An algorithm for identifying and tracking objects, such as the one developed by the company Total Immersion, is combined here with an image feature tracking algorithm that is more robust to motion blur, in order to solve the problem of the stalls that can occur in the presence of blur.

As mentioned above, these stalls can be frequent when low quality cameras are used or when movements of real objects in front of the camera are fast. They are most often the consequence of a series of images, generally over a specific period, which exhibit an optical blur effect.

Image blur effects are generally either "global", most often caused by rapid movements of the camera, more specifically of the image sensor, or "local", caused by the rapid movement of objects present in the field of view.

Figure 1, including Figures 1a, 1b, 1c and 1d, schematically illustrates different types of blur that may appear in an image.

Figure 1a is a schematic representation of an image 100-1 from a sequence of images, for example a video stream from a camera incorporating an image sensor. The image 100-1 here represents a scene 105 in which the objects 110, 115 and 120 are placed. These objects are static here and the camera from which the image 100-1 is derived is stable. The image 100-1 therefore does not exhibit any blur.

Figure 1b shows an image 100-2 similar to image 100-1, from the same camera. However, during the capture of the image 100-2, the sensor moved, causing a global blur in the image.

Figure 1c shows an image 100-3 similar to image 100-1, from the same camera. However, during the capture of the image 100-3, the object 120 moved rapidly along the translation axis 125, thus causing a local directional blur in the image.

Figure 1d shows an image 100-4 similar to image 100-1, from the same camera. However, during the capture of the image 100-4, the object 120 moved rapidly about the axis of rotation 130, thus causing a local radial, or rotational, blur in the image.

FIG. 2 schematically illustrates an example of an algorithm combining motion tracking and blur detection to enable tracking of objects despite the presence of global or local blur in one or more images of a sequence of images in which the objects are followed. The algorithm illustrated here is implemented on each of the images of the sequence, sequentially.

As illustrated, a first step here is to detect the presence of the object or objects to be tracked in the images and to track them (step 200). The tracking mode used here is, for example, a standard object tracking algorithm in steady state (the initialization phase, automatic or not, having been performed previously), using so-called "stable" descriptors such as Harris, SIFT, SURF or YAPE descriptors. The steady state indicates that one or more objects are detected and tracked in the sequence of images from the camera. In this case, the pose of an object is precisely determined in each of the images successively output by the image sensor.

Recursive pairings, consisting of determining the corresponding points in successive images, step by step, can be used in this standard tracking mode using the characteristic points of the previous image.

The same goes for the pairings between points of a current image and keyframes. If these two types of pairings are used, the tracking mode is called "hybrid". In this case, the pairings determined between a current image and key images are added to the determined pairings between the current image and the previous image to evaluate the pose.

Recursive pairings are particularly robust to vibration effects, while keyframe matches help to avoid recursive pairing drifts. The use of these two types of pairings thus allows a more robust and stable visual tracking.

These pairings thus give the correspondences between the coordinates of points of an image and the coordinates of the corresponding points of the three-dimensional geometric model associated with the tracked object. They are advantageously used to estimate the pose of an object in the current image according to the pose of the object in the previous image and/or in the keyframe used.

A next step is to detect the possible presence of blur in the image being processed (step 205), that is to say to detect fast movements of objects in the scene or camera shake. In other words, if one or more objects are present and tracked in the field of the camera, an optical blur detection step is performed, systematically or not. This detection is a measurement that makes it possible to determine the presence of optical blur in the current image or in a series of images. It can be based, for example, on the variation in the number of matches between the points of interest used in the standard object tracking mode. If this variation is greater than a predetermined threshold, for a given tracked object or for all tracked objects, the presence of blur is detected.
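The detection based on the variation in the number of matches can be sketched as follows. This is a minimal illustration: the function name, the use of a relative drop and the value of `max_drop_ratio` are assumptions, since the text only specifies that the variation is compared with a predetermined threshold.

```python
def blur_detected(previous_matches, current_matches, max_drop_ratio=0.5):
    """Flag blur when the match count falls sharply between two images.

    max_drop_ratio is a hypothetical tuning value: the fraction of the
    previous match count that may be lost before blur is assumed.
    """
    if previous_matches <= 0:
        return False                       # nothing to compare against
    drop = previous_matches - current_matches
    return drop / previous_matches > max_drop_ratio


# Match counts reported by the standard tracking mode for successive images.
counts = [120, 115, 118, 40, 35, 110]
flags = [blur_detected(a, b) for a, b in zip(counts, counts[1:])]
```

In this example only the transition from 118 to 40 matches is flagged. A variation test of this kind marks the onset of blur; deciding that blur persists in the following image (35 matches) would require an additional criterion, such as comparing the absolute number of matches with a threshold.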

Advantageously, this step is performed only under certain conditions (step 210), for example by using motion sensors such as accelerometers or gyroscopes, for the case of camera shake, or following the loss of relevant information, especially when a fall in the number of matches between the points of interest used in the standard object tracking mode is observed. In the latter case, steps 205 and 210 are combined.

If it is not necessary to detect the presence of optical blur in the image, the algorithm continues in a conventional manner (step 200).

The step of measuring blur in a sequence of images is important because it makes it possible to determine the moment at which conventional object tracking is no longer suitable and may quickly lead to a stall.

Following the optical blur detection step, a test is performed to determine if the image contains an optical blur (step 215). If the measurement is negative, that is to say if no optical blur is detected in the processed image, the algorithm continues in a conventional manner (step 200). If not, a new object tracking mode is used to track objects in blurred images. If the presence of optical blur in the image is detected, a next step is to determine whether the object tracking mode used for tracking objects in blurred images is initialized (step 220).

If this mode has not been initialized, it is initialized (step 225). The initialization consists in particular in creating the information required by a method for tracking characteristic elements that are robust to blur in a sequence of blurred images, in particular in detecting characteristic elements that are robust to blur in the images. This step may, in some implementations, be performed "offline" at the launch of the application, especially when these blur-robust characteristic elements are built directly on offline keyframes.

The mode of tracking characteristic elements that are robust to blur in a sequence of blurred images is then implemented (step 230). By way of illustration, such a tracking mode can be based on the use of KLT-type descriptors or on the tracking of lines of strong gradients, as previously described. Advantageously, these two solutions are combined to obtain a more robust result.

In this step, at least a portion of the so-called stable descriptors used in the conventional type object tracking (step 200) is replaced by the descriptors determined during the initialization phase of the object tracking mode used to enable the tracking of objects in fuzzy images, more robust to "local" and "global" optical blur effects.

When the optical blur disappears (steps 205 and 215), the standard tracking mode is used again (step 200). Otherwise, the object tracking mode used to track objects in blurred images is maintained (step 230).
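By way of illustration only, the switching between the two tracking modes of FIG. 2 can be sketched as a small state machine. The function name `next_mode`, the mode labels and the boolean inputs are hypothetical and do not come from the description. The sketch also assumes that the initialization of step 225 is kept once it has been done.

```python
STANDARD = "standard"        # step 200: conventional keyframe-based tracking
BLUR_ROBUST = "blur_robust"  # step 230: tracking of blur-robust elements


def next_mode(blur_detected, blur_mode_initialized):
    """Return (mode, initialized) to use for the image being processed.

    blur_detected stands for the outcome of the test of step 215.
    Assumption: once done, the initialization (step 225) is kept; an
    implementation may instead redo it after each return to step 200.
    """
    if not blur_detected:
        # No blur: conventional tracking is used, or resumed (step 200).
        return STANDARD, blur_mode_initialized
    # Blur detected: initialize the blur-robust mode if needed
    # (steps 220 and 225), then use it (step 230).
    return BLUR_ROBUST, True
```

Under these assumptions, the caller runs the initialization only when the returned flag changes from false to true.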

According to a first particular embodiment of the algorithm illustrated in FIG. 2, an object tracking algorithm comprising an object identification step, an initialization step depending on the object or objects present in the optical field of the camera, and a step of tracking these objects is combined with a KLT-type characteristic point tracking algorithm, advantageously adapted to the context of tracking objects in a sequence of blurred images. An operator for detecting optical blur in an image is extracted directly from the tracking algorithm.

Figure 3 partially illustrates this first embodiment for tracking objects in a current image 300.

A first step is to identify, or detect, the object or objects to be tracked that are present in the field of the camera and to initialize the tracking of these objects (step 305). This step implements a known algorithm, such as that developed by the company Total Immersion, presented above, which uses a database containing a large number of descriptors, for example points of interest and descriptors of the Harris, SIFT, SURF or YAPE type, belonging to a large number of referenced objects 310.

These descriptors are preferably organized into classification trees such as binary decision trees (see for example the article "Keypoint Recognition using Randomized Trees", V. Lepetit and P. Fua, EPFL, 2006) or according to structures with multiple ramifications, also called fern-like decision trees (see for example the article "Fast Keypoint Recognition using Random Ferns", M. Ozuysal, P. Fua and V. Lepetit), allowing a simple and fast classification by comparing image intensities around a point of interest, so that one or more objects can be identified quickly and robustly in the current image.
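By way of illustration only, the principle of a fern-type classification by intensity comparisons can be sketched as follows. This is a minimal sketch and not the implementation of the cited articles: the function names, the layout of the probability tables and the flat representation of the image patch are assumptions made for the example.

```python
import math


def fern_index(patch, tests):
    """Encode a patch as an integer built from binary intensity comparisons.

    patch: flat sequence of intensities around a point of interest.
    tests: list of (i, j) index pairs; each pair yields one bit,
           set when patch[i] < patch[j].
    """
    index = 0
    for i, j in tests:
        index = (index << 1) | (1 if patch[i] < patch[j] else 0)
    return index


def classify(patch, ferns, tables):
    """Return the class with the highest summed log-probability.

    ferns:  list of test lists, one per fern.
    tables: tables[f][index][cls] is a probability learned offline for
            fern f (hypothetical layout; all values must be > 0).
    """
    n_classes = len(tables[0][0])
    scores = [0.0] * n_classes
    for tests, table in zip(ferns, tables):
        probs = table[fern_index(patch, tests)]
        for cls in range(n_classes):
            scores[cls] += math.log(probs[cls])
    return max(range(n_classes), key=scores.__getitem__)
```

Each fern only requires a few comparisons and one table lookup per point of interest, which is what makes this kind of classification fast.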

This detection step also estimates an approximate pose of the recognized objects in the image in order to simplify the initialization step. This estimation also makes it possible to create a so-called current key image, referenced 315, which is then used in the object tracking method.

If at least one object has been recognized, the current keyframe 315 is used to initialize the tracking system. During this initialization, points of interest, for example Harris points, are calculated on the current key image 315 to be used in the tracking of the identified object or objects. After being initialized, the object tracking method is started (step 320). This method is here a "hybrid" method that uses a correlation operator, for example a correlation operator of the ZNCC type (Zero-mean Normalized Cross-Correlation), for determining matches between the current image 300 and the current key image 315 and between the current image 300 and the previous image 325, preceding the current image in the sequence of images. This set of correspondences is then used to determine the pose (position and orientation) of the tracked objects. It should be noted here that the more numerous these points are and the more precise their positions, the more precise the result of the pose estimation.
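By way of illustration only, a ZNCC score between two image patches can be computed as in the following minimal sketch. The function name `zncc` and the representation of a patch as a flat sequence of intensities are assumptions made for the example.

```python
import math


def zncc(patch_a, patch_b):
    """Zero-mean normalized cross-correlation of two equal-size patches.

    Patches are flat sequences of pixel intensities. The score lies in
    [-1, 1]; it equals 1 when the patches are identical up to a positive
    gain and an offset, which makes it tolerant to lighting changes.
    """
    if len(patch_a) != len(patch_b) or not patch_a:
        raise ValueError("patches must be non-empty and of equal size")
    n = len(patch_a)
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    da = [a - mean_a for a in patch_a]
    db = [b - mean_b for b in patch_b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    # A constant patch has zero variance: the score is undefined, return 0.
    return num / den if den > 0 else 0.0
```

A match between two points of interest would typically be accepted when the score of their surrounding patches exceeds a chosen threshold; the value of such a threshold is not given in the description.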

A next step is to determine whether the current image contains an optical blur effect (step 330). According to the embodiment described here, the two sets of matches between the current image 300 and the previous image 325 and between the current image 300 and the current key image 315 are used as an indicator of the quality of the current image. When the number of such matches drops substantially and falls below a threshold, it is considered that at least a portion of the image contains an optical blur (step 335). Such a threshold may be predetermined or dynamically determined. It is important to note that a substantial drop in the number of matches may also occur when the object partially disappears from the image. However, in this case, the number of points often remains large and the number of matches decreases gradually over the processed image sequence.

If the number of these matches remains greater than the threshold, the tracking of the objects continues in a standard way (step 320).

If, on the contrary, the number of these pairings drops substantially and becomes less than a threshold, a particular mode of monitoring, here the KLT point tracking algorithm, is initialized (step 340).
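By way of illustration only, a dynamically determined threshold on the number of matches could be sketched as follows. The class name, the parameters `ratio`, `window` and `min_matches`, and the use of a sliding mean are assumptions made for the example; the description does not specify how the threshold is computed.

```python
from collections import deque


class MatchDropDetector:
    """Flags a probable blur when the number of matches drops sharply.

    Assumptions (not from the description): the threshold is a fraction
    `ratio` of the mean match count over the last `window` images that
    were not flagged, with an absolute floor `min_matches`.
    """

    def __init__(self, ratio=0.5, window=5, min_matches=10):
        self.ratio = ratio
        self.min_matches = min_matches
        self.history = deque(maxlen=window)

    def is_blurred(self, n_matches):
        if self.history:
            mean = sum(self.history) / len(self.history)
            threshold = max(self.min_matches, self.ratio * mean)
        else:
            threshold = self.min_matches
        blurred = n_matches < threshold
        if not blurred:
            # Only images considered sharp update the reference level.
            self.history.append(n_matches)
        return blurred
```

Because the reference level follows recent images, a gradual decrease in the number of matches, as when an object slowly leaves the field of the camera, is less likely to be flagged than a sudden drop.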

In this step, the previous image 325 and the previous pose resulting from the tracking algorithm are used to search for characteristic elements to follow that are robust to blur. The previous image is a priori not blurred, since the blurred image detector (step 330) found a sufficiently large number of matches on this image. The characteristic elements to follow, robust to blur and called KLT characteristics, are sought in this previous image by estimating the second derivatives for each pixel of the image. When these second derivatives are large, that is to say greater than a predetermined threshold, in at least one of the two main directions, the pixel is considered to characterize a point of interest robust to blur. These points are stored (reference 345). Then, knowing the pose of the object in the previous image 325 and knowing the geometric model 400 of the object, it is possible to estimate the reprojection of these KLT characteristics and to extract precise 3D coordinates on the geometric model 400 of the object.
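By way of illustration only, the selection of pixels by thresholding second derivatives can be sketched as follows. This minimal sketch follows the wording of the description literally and makes two assumptions: the two main directions are taken to be the image axes, and the second derivatives are approximated by central finite differences. The function name `blur_robust_points` is hypothetical.

```python
def blur_robust_points(image, threshold):
    """Return the (row, col) pixels whose second derivative is large.

    image: list of rows of intensities. A pixel is kept when the absolute
    second derivative exceeds `threshold` along at least one image axis.
    Border pixels are skipped because the central difference needs both
    neighbours.
    """
    points = []
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[r]) - 1):
            centre = image[r][c]
            d2_col = image[r][c - 1] - 2 * centre + image[r][c + 1]
            d2_row = image[r - 1][c] - 2 * centre + image[r + 1][c]
            if abs(d2_col) > threshold or abs(d2_row) > threshold:
                points.append((r, c))
    return points
```

In practice the image would usually be smoothed before differentiation to limit the effect of noise; this choice is not discussed in the description.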

In a next step (step 350), the correspondences of the KLT characteristics of the previous image 345 are searched in the current image 300. This characteristic element tracking method as described in the state of the art makes it possible to follow points on successive images. It is particularly robust in identifying pixel movements in different portions of the overall image. The correspondences as illustrated in FIG. 4 are then obtained (reference 355).

FIG. 4 illustrates the extraction of the 2D/3D correspondences between the current image and the 3D model 400 by using the KLT characteristic tracking between the previous image 325 and the current image 300. It thus shows that knowing the 2D/3D correspondences between the previous image 325 and the 3D geometric model 400, and constructing the 2D/2D pairings between the current image and the previous image, allows the extraction of 2D/3D correspondences between the current image and the 3D geometric model. These new correspondences make it possible, as previously described, to estimate the pose of the object in the current image. It should be noted that, in FIG. 4, the previous image 325 may be replaced by a keyframe. This figure thus describes the recursive current image / previous image pairings as well as the current image / key image pairings.
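By way of illustration only, the chaining of correspondences shown in FIG. 4 amounts to composing two mappings. In the following minimal sketch, the function name and the use of integer feature identifiers as dictionary keys are assumptions made for the example.

```python
def chain_correspondences(prev_2d_to_3d, prev_to_curr_2d):
    """Compose 2D/3D matches of the previous image with 2D/2D matches.

    prev_2d_to_3d:   {feature_id: (X, Y, Z)} obtained by reprojecting the
                     features of the previous image onto the 3D model.
    prev_to_curr_2d: {feature_id: (u, v)} positions of the same features
                     as tracked in the current image.
    Returns a list of ((u, v), (X, Y, Z)) pairs for the current image.
    """
    return [
        (prev_to_curr_2d[fid], prev_2d_to_3d[fid])
        for fid in sorted(prev_2d_to_3d)
        if fid in prev_to_curr_2d  # features lost by the tracker are dropped
    ]
```

The resulting list is the input expected by a pose estimation step operating on 2D/3D matches.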

A next step (step 360) is to track objects using the KLT characteristics previously calculated. This step consists in particular in using the recursive correspondences between previous and current images in order to extract a list of matches between the image plane and the geometric model of the object. These matches are known because in step 340, the characteristic elements of the previous image have been reprojected on the geometric model of the object.

Then, matching the KLT characteristics between the points of the current and previous images makes it possible to match the characteristic elements of the current image with geometric points on the model of the object. Finally, a classic minimization algorithm is used to estimate the pose of the object in the current image. It can for example be a Levenberg-Marquardt type approach, combined with a RANSAC algorithm that eliminates bad 2D / 3D matches.
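By way of illustration only, the role played by RANSAC in eliminating bad matches can be shown on a deliberately simplified model. The following sketch estimates a 2D translation instead of a full pose (position and orientation), and the final refinement reduces to a mean rather than a Levenberg-Marquardt minimization. The function name, the tolerance, the number of iterations and the seed are assumptions made for the example.

```python
import random


def ransac_translation(pairs, tol=2.0, iterations=50, seed=0):
    """Estimate a 2D translation from point pairs while rejecting outliers.

    pairs: list of ((x, y), (u, v)) with (u, v) close to (x + tx, y + ty)
           for correct matches.
    Returns ((tx, ty), inlier_pairs).
    """
    if not pairs:
        raise ValueError("at least one pair is required")
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # Minimal sample: a single pair fully determines a translation.
        (x, y), (u, v) = rng.choice(pairs)
        tx, ty = u - x, v - y
        inliers = [
            p for p in pairs
            if abs(p[1][0] - p[0][0] - tx) <= tol
            and abs(p[1][1] - p[0][1] - ty) <= tol
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit on the consensus set (least squares reduces to a mean here).
    n = len(best_inliers)
    tx = sum(u - x for (x, _), (u, _) in best_inliers) / n
    ty = sum(v - y for (_, y), (_, v) in best_inliers) / n
    return (tx, ty), best_inliers
```

For a real pose, the minimal sample would contain several 2D/3D matches and the refit on the consensus set would be the nonlinear minimization mentioned above.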

When the number of matches of KLT characteristics is insufficient (step 365) with respect to a predetermined threshold or determined dynamically, it is considered that the object is no longer present in the field of the camera. In this case, the object tracking method is then stalled and a new object detection phase is performed (steps 365 and 305) to detect objects that are potentially in the field of the camera.

It should be noted here that, when the previous and current images contain an optical blur effect, the initialization step (step 340) is not repeated, as illustrated by the dashed arrow between blocks 335 and 350.

According to a second particular embodiment of the algorithm illustrated in FIG. 2, the initialization of the tracking of characteristic elements in a sequence of images comprising an optical blur effect (step 340 of FIG. 3) is advantageously replaced by an initialization that is independent of the previous image. In this case, the KLT characteristics used for the initialization are not estimated on the previous image but by using the current key image, reprojected according to the previous pose, estimated during the tracking step on the previous image. In this way, the KLT characteristics tracked in the current image are similar to those of this reprojected keyframe, which allows a faster detection rate in the successive images of the image sequence. This second embodiment makes it possible to avoid possible errors linked to an erroneous pose estimate on the previous image, as well as possible occlusion problems of the object, for example when the hand of a user passes in front of a real object. It should be noted that the two described embodiments can be combined in order to obtain more robust object tracking results. However, such a combination increases the computational cost.

A device adapted to implement the invention or a part of the invention is illustrated in Figure 5. The device shown is preferably a standard device, for example a personal computer.

The device 500 here comprises an internal communication bus 505 to which are connected:

- a central processing unit or microprocessor 510 (CPU, Central Processing Unit);

- a read-only memory 515 (ROM, Read Only Memory), which may include the programs necessary for the implementation of the invention;

- a random access memory or cache memory 520 (RAM, Random Access Memory) comprising registers adapted to record variables and parameters created and modified during the execution of the aforementioned programs; and

- a communication interface 540 adapted to transmit and receive data to and from a communication network.

The device 500 also preferably has the following elements:

- a hard disk 525 which may comprise the aforementioned programs and data processed or to be processed according to the invention; and

- a memory card reader 530 adapted to receive a memory card 535 and to read or write to it data processed or to be processed according to the invention.

The internal communication bus allows communication and interoperability between the various elements included in the device 500 or connected to it.

The representation of the internal bus is not limiting and, in particular, the microprocessor is capable of communicating instructions to any element of the device 500 directly or via another element of the device 500.

The executable code of each program enabling the programmable device to implement the processes according to the invention can be stored, for example, in the hard disk 525 or in the read-only memory 515.

According to one variant, the memory card 535 may contain data as well as the executable code of the aforementioned programs which, once read by the device 500, is stored in the hard disk 525.

According to another variant, the executable code of the programs can be received, at least partially, through the communication interface 540, to be stored in the same manner as described above.

More generally, the program or programs may be loaded into one of the storage means of the device 500 before being executed.

The microprocessor 510 will control and direct the execution of the instructions or portions of software code of the program or programs according to the invention, which instructions are stored in the hard disk 525 or in the read-only memory 515 or else in the other storage elements mentioned above. When powering on, the program or programs that are stored in a non-volatile memory, for example the hard disk 525 or the read-only memory 515, are transferred into the RAM 520, which then contains the executable code of the program or programs according to the invention, as well as registers for storing the variables and parameters necessary for the implementation of the invention.

Naturally, to meet specific needs, a person skilled in the field of the invention may apply modifications to the foregoing description.

Claims

1. A method of tracking a representation of at least one object in a sequence of images, in real time, at least one image of said sequence of images comprising at least one optical blur effect, which method is characterized in that it comprises the following steps:

identifying (305) a representation of said at least one object in a first image of said image sequence;

tracking (200, 320) said identified representation of said at least one object in a second image of said plurality of images, said second image following said first image, according to a first object tracking mode using a keyframe;

detecting (205, 330) blur in a third image of said image sequence, distinct from said first image;

tracking (230, 360) said identified representation of said at least one object in said third image, according to a second object tracking mode, distinct from said first tracking mode; and

estimating the pose of said object in said third image according to said identified representation of said at least one object in said third image.

2. The method of claim 1 wherein said step of tracking said identified representation of said at least one object in said second image includes a step of determining matches between a plurality of points of interest of said second image and a corresponding keyframe, said blur detecting step comprising a step of comparing the number of matches between said plurality of points of interest of said second image and said corresponding keyframe with a threshold.

3. The method of claim 1 or claim 2 wherein said step of tracking said identified representation of said at least one object in a third image comprises a step of searching for characteristic points in said first or second image, the pose of said at least one object being at least partially determined by reprojection of said characteristic points onto a three-dimensional model of said at least one object.

4. Method according to any one of the preceding claims wherein said step of tracking said identified representation of said at least one object in a third image comprises a step of searching for characteristic points in a key image corresponding to said third image, the pose of said at least one object being at least partially determined by reprojection of said characteristic points on a three-dimensional model of said at least one object.

5. The method according to any one of the preceding claims, wherein said step of tracking said identified representation of said at least one object in said second image comprises a step of determining a plurality of points of interest in said first and second images, said points of interest being identified as Harris points or SURF, SIFT or YAPE points.

6. The method of any preceding claim wherein said step of tracking said identified representation of said at least one object in said second image includes a step of determining a plurality of points of interest in said first or second image and in a corresponding keyframe, said points of interest being identified as Harris points or SURF, SIFT or YAPE points.

7. The method of any one of the preceding claims recursively applied to multiple images of said plurality of images.

8. Computer program comprising instructions adapted to the implementation of each of the steps of the method according to any one of the preceding claims when said program is executed on a computer.

9. Information storage medium, removable or not, partially or completely readable by a computer or a microprocessor comprising code instructions of a computer program for carrying out each of the steps of the method according to any one of claims 1 to 7.

10. Device comprising means adapted to the implementation of each of the steps of the method according to any one of claims 1 to 7.