WO2012107696A1

WO2012107696A1 - Methods, device and computer programs for recognising shapes, in real time, by means of an appliance including limited resources

Info

Publication number: WO2012107696A1
Application number: PCT/FR2012/050290
Authority: WO
Inventors: Nicolas Livet; Jérémy CHAMOUX
Original assignee: Total Immersion
Priority date: 2011-02-11
Filing date: 2012-02-09
Publication date: 2012-08-16
Also published as: FR2971601A1; FR2971601B1

Abstract

The invention relates in particular to the real-time identification of shapes in an information-processing device having limited resources. After a point of interest has been detected (605) in an image and an image portion has been extracted according to its position, a feature of said image portion is determined (610, 615). A signature is then generated (620) from the image portion and the determined feature, using a predetermined statistical model. A signature is then identified (625) in a classification structure comprising a plurality of signatures, each of which is associated with a category. This signature is identified according to the generated signature, and the category associated with the identified signature is determined. A shape is then identified according to the identified category.

Description

Methods, apparatus and computer programs for pattern recognition, in real time, using a device comprising limited resources

The present invention relates to the recognition of shapes in images, in particular in image sequences, and more particularly to methods, devices and computer programs for real-time pattern recognition using an apparatus having limited resources, such as a mobile communication terminal. The invention can in particular be used in augmented reality applications.

Augmented reality aims to insert one or more virtual objects into the images of a video stream. Depending on the type of application, the position and orientation of these virtual objects can be determined by data external to the scene represented by the images, for example coordinates taken directly from a game scenario, or by data related to certain elements of this scene, for example the coordinates of a particular point of the scene such as a player's hand or a decorative element. In general, the inserted virtual objects correspond to real elements present in the scene. It may thus be necessary to perform a prior step of robust identification and recognition of these real objects, also called the pattern recognition step, especially when these objects are numerous.

When the position and orientation of the virtual objects are determined by data related to certain elements of the scene, it may be necessary to track these elements as the camera moves or as the elements themselves move in the scene. The operations of tracking elements and of inserting virtual objects into the real images can be executed by separate computers or by the same computer.

The objective of visual tracking or sensor-based tracking algorithms is to find very accurately, in a real scene, the pose, i.e. the position and orientation, of an object whose geometry is known or, equivalently, to retrieve, by image analysis, the extrinsic position and orientation parameters of a camera filming this object.

The applicant has developed robust visual tracking algorithms for markerless objects, whose originality lies in matching particular points between the current image of a video stream and a set of keyframes obtained automatically when the system is initialized, as described, for example, in French patent application No. 1059541. Although these algorithms are robust and fast, an object must nevertheless be recognized among a large number of objects before such a tracking algorithm, linked to the recognized object, can be applied.

The identification of objects, or pattern recognition, consists in identifying, in a given image, the presence of one or more objects previously characterized by learning. Typically, portions of this image, obtained from an image sensor such as a video camera or still camera, are compared with elements of a set of image portions stored in a database that characterize parts of the objects to be identified. Such image portions may correspond to sets of pixels (picture elements) located around points of interest. Once several image portions have been identified, an object can be identified by matching.

The matching is advantageously treated as a classification problem in which each class, or category, corresponds to a subset of all the possible views of an image portion around a point of interest. Several representations of the different objects potentially present in the real scene under consideration are thus constructed during a learning phase, to be compared later with a current image from the image sensor.

A particularly efficient classification model, in terms of processing time, is based on multi-branched structures. It uses an offline learning phase, which aims to memorize characteristics of image portions representative of points of interest, and an online detection phase, which aims to compare characteristics of a portion of a current image with the previously memorized characteristics in order to identify points of interest.

More specifically, a set F comprises J structures F_j, each structure F_j comprising D binary tests f_d such that

$$f_d = \begin{cases} 1 & \text{if } I(P, m_1) < I(P, m_2) \\ 0 & \text{otherwise} \end{cases}$$

where m_1 and m_2 are pixels whose positions are chosen at random when the test is created, P is the image portion considered and I represents a pixel intensity. Each structure F_j thus allows a binary signature {f_{1,j}, f_{2,j}, ..., f_{D,j}} to be created from a given image portion, preferably converted to grayscale. Each binary signature can then be represented by a value k_j obtained by concatenating the component binary values and converting the result to an integer.

Figure 1, comprising Figures 1a to 1d, illustrates the classification principle based on multi-branched structures. FIG. 1a represents an image portion comprising, for the sake of clarity, 5 × 5 pixels. FIG. 1b represents a set of structures F_j, each structure comprising a set of binary tests TB_{i,j} on pixel values p_{x,y}, which yield binary values that, combined, form a binary signature VB_j, from which an index k_j can be determined, represented as a binary number or as an integer. Thus, by way of illustration, the structure F_1, whose first binary test compares the pixels p_{U,1} and p_{V,1}, yields here the binary signature VB_1 = {0, 1, 0, 0, 1, 1, 0, 1, 0, 1}, whose corresponding index is k_1 = 309.
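The binary tests and index computation described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the helper names are hypothetical, each test is assumed to be "1 if I(P, m1) < I(P, m2), else 0" (the usual convention for such random tests), and the first test is assumed to give the most significant bit, which is the ordering that maps VB_1 above to k_1 = 309.

```python
import random

def make_structure(num_tests, patch_size, rng):
    """Hypothetical helper: draw D random pixel pairs (m1, m2) for one structure F_j."""
    def pixel():
        return (rng.randrange(patch_size), rng.randrange(patch_size))
    return [(pixel(), pixel()) for _ in range(num_tests)]

def binary_signature(patch, structure):
    """Apply each binary test: 1 if I(P, m1) < I(P, m2), else 0."""
    return [1 if patch[r1][c1] < patch[r2][c2] else 0
            for (r1, c1), (r2, c2) in structure]

def signature_index(bits):
    """Concatenate the bits (first test = most significant bit) into the integer k_j."""
    k = 0
    for bit in bits:
        k = (k << 1) | bit
    return k

rng = random.Random(0)
patch = [[rng.randrange(256) for _ in range(32)] for _ in range(32)]  # synthetic 32x32 grayscale patch
structure = make_structure(num_tests=10, patch_size=32, rng=rng)
bits = binary_signature(patch, structure)
print(bits, signature_index(bits))
print(signature_index([0, 1, 0, 0, 1, 1, 0, 1, 0, 1]))  # VB_1 from FIG. 1b -> 309
```

The pixel pairs are drawn once, when the structure is created, and then reused unchanged for every image portion, so that a given index k_j always refers to the same outcome of the same tests.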

During the learning phase, a table is constructed in order to assign probabilities of membership in categories to all the signatures representative of image portions containing the points of interest considered (a category is here associated with each point of interest). During this learning phase, a large number of image portions are synthesized from images representing the objects to be recognized. Typically, an image portion is a block of 32x32 pixels located around a point of interest. If the object can be considered planar, the views are synthesized by affine transformation. Alternatively, if the object has arbitrary geometry and a textured three-dimensional model of it is available, the views are synthesized using conventional rendering techniques, for example with the OpenGL or Direct3D technologies.

For these purposes, points of interest are detected in the available images of the objects to be detected using, for example, a point-of-interest detector such as HARRIS, FAST or MSER. Image portions, such as 32x32-pixel blocks, are then extracted around the identified points of interest. Several scale levels of the images can be used: a Gaussian pyramid of images can thus be constructed, with for example a factor of two between levels, so that image portions whose representations are very close to or very far from the camera used can be classified.
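A Gaussian pyramid with a factor of two between levels can be sketched as below. The 5-tap binomial kernel (a common approximation of a Gaussian) and the reflected borders are assumptions; the text only specifies the factor of two.

```python
import numpy as np

# 5-tap binomial approximation of a Gaussian kernel (an assumption; any Gaussian kernel works)
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0

def gaussian_blur(image):
    """Separable blur with reflected borders; the output has the same shape as the input."""
    h, w = image.shape
    padded = np.pad(image.astype(float), 2, mode="reflect")
    rows = sum(KERNEL[i] * padded[:, i:i + w] for i in range(5))   # horizontal pass
    return sum(KERNEL[i] * rows[i:i + h, :] for i in range(5))     # vertical pass

def gaussian_pyramid(image, levels):
    """Level 0 is the input; each next level is blurred, then subsampled by a factor of two."""
    pyramid = [image.astype(float)]
    for _ in range(1, levels):
        pyramid.append(gaussian_blur(pyramid[-1])[::2, ::2])
    return pyramid

image = np.random.default_rng(0).integers(0, 256, size=(480, 640))
print([level.shape for level in gaussian_pyramid(image, levels=4)])
# [(480, 640), (240, 320), (120, 160), (60, 80)]
```

Points of interest can then be detected at every level, and a fixed-size portion (for example 32x32 pixels) extracted at a coarse level covers a larger area of the original image.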

The extracted image portions are small and can be processed with homographic transformations. A front view of the target object is used to generate a set of new affine views. An affine transformation can be decomposed into two parameterized rotation matrices and a scale matrix, the scale typically ranging from 0.6 to 1.5.
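One common way to write this decomposition, used for example in the random-ferns literature and assumed here, is A = R(α) R(-β) S R(β), with S = diag(λ1, λ2). The angle names α and β and their sampling ranges are illustrative assumptions; only the 0.6 to 1.5 scale range comes from the text.

```python
import numpy as np

def rotation(angle):
    """2x2 rotation matrix."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def affine_matrix(alpha, beta, lambda1, lambda2):
    """A = R(alpha) R(-beta) diag(lambda1, lambda2) R(beta)."""
    return rotation(alpha) @ rotation(-beta) @ np.diag([lambda1, lambda2]) @ rotation(beta)

def random_affine(rng, scale_range=(0.6, 1.5)):
    """Draw one random affine deformation for synthesizing a new view of a patch."""
    alpha = rng.uniform(-np.pi, np.pi)
    beta = rng.uniform(-np.pi, np.pi)
    lambda1, lambda2 = rng.uniform(scale_range[0], scale_range[1], size=2)
    return affine_matrix(alpha, beta, lambda1, lambda2)

rng = np.random.default_rng(0)
A = random_affine(rng)
print(A, np.linalg.det(A))  # det(A) = lambda1 * lambda2, so it lies in [0.36, 2.25]
```

Applying A to pixel coordinates taken relative to the patch center, with any image-warping routine, gives one synthesized view. The rotation R(β) fixes the axes along which the two scales λ1 and λ2 stretch the patch, and R(α) sets the final in-plane rotation.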

These image portions may be smoothed using a Gaussian filter (e.g. a 5x5 or 7x7 filter) in order to improve robustness to the noise and/or blur that may be present in the captured image. Noise can also be simulated in the image portions used.

Thus, for each point of interest detected, a set of image portions can be obtained corresponding to different possible views of a part of an object to be identified. The number of image portions per point of interest is, for example, between five thousand and ten thousand views. These views are used to construct a classification model.

The learning phase consists in constructing this classification model from the views available for each point of interest, i.e. for each class c_i.

It aims to estimate, for each structure F_j, the probability distribution over the classes. For these purposes, the binary signature {f_1, f_2, ..., f_D} of each image portion associated with a class c_i is computed for each structure F_j. This signature is converted into an integer by concatenating the binary test results and converting the resulting bit sequence into a number. This number is an index k_j, between 0 and 2^D - 1, into a table representing the probability distribution. Once the index k_j corresponding to a signature has been computed, the value stored in the table at this index, for the category to which the image portion belongs, is incremented by one. When all the views of all the classes have been processed, this table, of dimension 2^D × N (N being the number of classes to be distinguished), contains the probability distribution for each of the classes c_i. The elements of each column of the table, i.e. of each category c, are then normalized so that they sum to 1. In other words, all the elements are normalized to satisfy the relation

$$\sum_{k=0}^{2^D - 1} P_{k,c} = 1, \qquad \text{with} \quad P_{k,c} = P(y = k \mid C = c)$$

where P_{k,c} represents the probability that the signature corresponding to index k belongs to category c, and P(y = k | C = c) is a likelihood function constructed here from observations.

FIG. 1c illustrates such a table, in which each row corresponds to a binary signature represented by an index k_j and each column corresponds to a predetermined category c_i. Each value stored in the table represents the probability that the image portion from which the signature represented by index k_j was obtained corresponds to category c_i. In other words, row k_j represents the distribution of membership probabilities of the image portion from which the signature represented by index k_j was obtained.
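The counting and normalization described above can be sketched for a single structure F_j as follows; one such table is built per structure, i.e. J times. The function name is hypothetical. Practical implementations often start every cell from a small prior count so that no probability is exactly zero; the text does not mention this, so the sketch leaves it out.

```python
import numpy as np

def learn_table(indices, labels, num_tests, num_classes):
    """Build one 2^D x N table: count views per (index k_j, class c), then normalize columns.

    indices[v] is the index k_j obtained for training view v by one structure F_j,
    labels[v] is the class c of that view.
    """
    counts = np.zeros((2 ** num_tests, num_classes))
    np.add.at(counts, (np.asarray(indices), np.asarray(labels)), 1.0)  # increment by one per view
    column_sums = counts.sum(axis=0, keepdims=True)
    # Normalize each class column so that sum_k P[k, c] = 1 (classes without views stay at 0).
    return np.divide(counts, column_sums, out=np.zeros_like(counts), where=column_sums > 0)

table = learn_table(indices=[0, 0, 1, 3], labels=[0, 0, 0, 1], num_tests=2, num_classes=2)
print(table)
```

With these four toy views, column 0 becomes [2/3, 1/3, 0, 0] and column 1 becomes [0, 0, 0, 1]. `np.add.at` is used instead of `counts[indices, labels] += 1` because the latter would count repeated (k, c) pairs only once.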

The learning is repeated for each of the structures F_j, which leads to 2^D × N × J probability values, i.e. a memory occupancy of 51 KB (kilobytes) per class c_i if J = 50 and D = 10. Thus, if two hundred classes are recorded for each object, the memory occupancy is 10.2 MB (megabytes) per object.
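The figures above are consistent with storing one byte per probability value. This is an inference from the numbers, not something the text states:

$$2^{D} \times J = 2^{10} \times 50 = 51\,200 \ \text{values per class} \;\approx\; 51\ \text{KB}$$

$$51\,200 \times 200\ \text{classes} = 10\,240\,000\ \text{bytes} \;\approx\; 10.2\ \text{MB per object}$$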

The detection phase consists in identifying, where applicable, a class c_i for a given image portion as a function of the binary responses of the structures F_j. For these purposes, the same procedure as in the learning phase is applied: image portions surrounding points of interest previously detected in a current image extracted from a video stream are obtained. For each image portion, J binary signatures are obtained and converted into indices k_j referencing values in the probability distribution tables determined during the learning phase. These values, once converted into logarithmic values, can then be summed to obtain the distribution of the probabilities that the processed image portion belongs to each of the classes used.

FIG. 1d schematically illustrates the method for classifying a given image portion denoted P1. As illustrated, a first step consists in determining the binary signatures VB_j using the predetermined structures F_j. Indices k_j are then obtained from the binary signatures VB_j by concatenating and converting their bits. The indices k_j are then used to look up, in the tables determined during the learning phase, the probability distributions, denoted DistProb_j, of membership of the processed image portion in the classes used. These distributions are then added together to obtain an average distribution (DistProbMoy), from which a class c can be isolated, for example the class with the highest probability value, provided this value exceeds a predetermined threshold.
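The classification step can be sketched as follows, assuming the per-structure tables described above. Averaging in the log domain (a geometric mean), the `eps` guard against log(0) and comparing the threshold with the exponential of that mean are all assumptions made for this illustration; the function name is hypothetical.

```python
import numpy as np

def classify(indices, tables, min_probability, eps=1e-6):
    """Combine the J per-structure distributions and return the best class, or None.

    indices[j] is the index k_j produced by structure F_j; tables[j] is its 2^D x N table.
    """
    # Sum log-probabilities (eps avoids log(0) for empty table cells).
    log_sum = sum(np.log(tables[j][k] + eps) for j, k in enumerate(indices))
    mean_log = log_sum / len(indices)          # average distribution, in the log domain
    best = int(np.argmax(mean_log))
    # Accept the class only if its (geometric-mean) probability exceeds the threshold.
    return best if np.exp(mean_log[best]) > min_probability else None

# Two toy structures with D = 1 (2 indices) and N = 2 classes.
tables = [np.array([[0.7, 0.1], [0.3, 0.9]]), np.array([[0.6, 0.2], [0.4, 0.8]])]
print(classify([0, 0], tables, min_probability=0.5))  # -> 0
print(classify([1, 1], tables, min_probability=0.5))  # -> 1
print(classify([1, 0], tables, min_probability=0.5))  # -> None (both classes score about 0.42)
```

Working with logarithms turns the product of J probabilities into a sum, which avoids numerical underflow when J is large (for example J = 50).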

The matches found between points of the current image and points of the learned images, that is to say the classification of the points of interest used, are then used to identify an object and estimate its pose (position and orientation) in the reference frame of the camera used, with standard techniques, for example techniques minimizing the reprojection error.

Because such a classification method typically offers a recognition rate only of the order of 80%, it is preferable to use estimation techniques that are robust to the false matches that may appear during the detection phase. A RANSAC algorithm (RANdom SAmple Consensus) can, for example, be used to make a first estimate, rejecting false matches and estimating a first pose of the object in the image. A nonlinear error minimization method of the Levenberg-Marquardt type can then be used to refine this first estimate.
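To illustrate how RANSAC rejects false matches, the sketch below fits a 2D affine transform between matched points rather than a full 3D pose, which is a deliberate simplification. The sample size of three, the iteration count and the pixel tolerance are assumptions. The final least-squares refit on the inliers is a linear stand-in for the Levenberg-Marquardt refinement, not an implementation of it.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2x3 affine A such that dst ≈ src @ A[:, :2].T + A[:, 2]."""
    X = np.hstack([src, np.ones((len(src), 1))])
    A_t, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return A_t.T

def apply_affine(A, pts):
    return pts @ A[:, :2].T + A[:, 2]

def ransac_affine(src, dst, iterations=200, tol=2.0, seed=0):
    """Keep the model supported by the most matches, then refit it on those inliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iterations):
        sample = rng.choice(len(src), size=3, replace=False)   # minimal sample for an affine map
        A = fit_affine(src[sample], dst[sample])
        inliers = np.linalg.norm(apply_affine(A, src) - dst, axis=1) < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        return None, best_inliers
    return fit_affine(src[best_inliers], dst[best_inliers]), best_inliers

rng = np.random.default_rng(1)
src = rng.uniform(0, 100, size=(50, 2))
true_A = np.array([[0.9, -0.2, 5.0], [0.1, 1.1, -3.0]])
dst = apply_affine(true_A, src)
dst[:15] += 50.0                      # 15 deliberately wrong matches
A, inliers = ransac_affine(src, dst)
print(np.round(A, 3), inliers.sum())  # recovers true_A with 35 inliers
```

A plain least-squares fit on all 50 matches would be pulled towards the wrong matches. RANSAC instead scores each minimal-sample hypothesis by how many matches it explains, and only then refits on the matches it keeps.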

While such a method gives satisfactory results for a small number of objects, it requires particularly large memory resources, making it difficult or even impossible to implement in devices with limited resources such as mobile communication terminals, for example smartphones. In addition, this method does not allow a large number of objects to be detected, which requires the object database used to be updated frequently according to the scenario implemented.

The invention solves at least one of the problems discussed above.

The invention thus relates to a method for the real-time identification, for an information processing device, of at least one shape of which at least a partial representation is present in an image, the method comprising the following steps:

detecting at least one point of interest in said image and extracting at least one image portion of said image according to the position of said at least one point of interest;

determining at least one characteristic of said at least one image portion;

generating at least one signature according to said at least one image portion and said at least one characteristic, said at least one signature being obtained from at least one predetermined statistical model;

identifying at least one signature in a classification structure comprising a plurality of signatures, each signature of said plurality of signatures being associated with a category, said at least one identified signature being identified according to said at least one generated signature, and identifying a category associated with said at least one identified signature, said at least one form being identified according to said identified category.

The method according to the invention thus makes it possible to efficiently recognize a large number of shapes, for example several thousand, while limiting the storage resources needed to store characteristics of the shapes to be recognized. The method also makes it possible to estimate a pose of the identified shapes, which in particular facilitates the tracking of these shapes in subsequent images.

According to a particular embodiment, said at least one statistical model is a multi-branched model based on structures using random binary tests, said at least one signature being generated as a function of at least one result of at least one of said binary tests. The method according to the invention is thus simple to implement and efficient, and does not require significant computing resources.

Advantageously, said step of determining at least one characteristic of said at least one image portion comprises a step of determining a main direction of said at least one image portion. The method according to the invention thus makes it possible to reduce the amount of redundant information used to identify shapes.

Said at least one binary test, whose at least one result is used to generate said at least one signature, is preferably determined according to said main direction. The method according to the invention thus reduces the processing applied to the data used to identify shapes, since some operations can be determined in advance.

Still according to a particular embodiment, said step of determining at least one characteristic of said at least one image portion comprises a step of determining a sign of said at least one image portion, said sign being determined according to a variation of intensity in said at least one image portion. The method of the invention thus optimizes the amount of information used to identify shapes as well as the processing associated with the data used to identify shapes. Said intensity variation is preferably estimated along said main direction. Advantageously, the method further comprises a step of selecting said at least one statistical model according to said sign.

Still according to a particular embodiment, said classification structure is a classification tree of the kd-tree, spill-tree, k-means-tree or vocabulary-tree type.

The method preferably further comprises a step of validating said identified category, said validation step comprising a step of comparing a difference between said at least one identified signature and said at least one generated signature with a predetermined threshold, in order to improve shape identification performance and limit the processing of the data used to identify shapes.
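A kd-tree is one of the classification structures listed above. The following pure-Python sketch, with hypothetical names, stores signatures as numeric tuples together with their categories, finds the stored signature nearest to a generated one, and then applies the threshold-based validation step. Using the Euclidean distance as the "difference" between signatures is an assumption.

```python
import math

def build_kdtree(points, depth=0):
    """points: list of (signature_tuple, category). Returns a nested dict, or None if empty."""
    if not points:
        return None
    axis = depth % len(points[0][0])                 # cycle through signature components
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build_kdtree(points[:mid], depth + 1),
            "right": build_kdtree(points[mid + 1:], depth + 1)}

def nearest(node, query, best=None):
    """Return (squared distance, (signature, category)) of the stored signature nearest to query."""
    if node is None:
        return best
    sig = node["point"][0]
    d2 = sum((a - b) ** 2 for a, b in zip(sig, query))
    if best is None or d2 < best[0]:
        best = (d2, node["point"])
    diff = query[node["axis"]] - sig[node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    if diff ** 2 < best[0]:          # the splitting plane is closer than the best match so far
        best = nearest(far, query, best)
    return best

def identify_category(tree, generated_signature, max_distance):
    """Validation step: reject the match if the two signatures differ by more than the threshold."""
    d2, (_, category) = nearest(tree, generated_signature)
    return category if math.sqrt(d2) <= max_distance else None

stored = [((0.0, 0.0), "corner_A"), ((10.0, 0.0), "corner_B"),
          ((0.0, 10.0), "corner_C"), ((10.0, 10.0), "corner_D")]
tree = build_kdtree(stored)
print(identify_category(tree, (9.0, 1.0), max_distance=3.0))  # -> corner_B
print(identify_category(tree, (5.0, 5.0), max_distance=3.0))  # -> None (nearest is about 7.07 away)
```

The tree prunes a whole subtree whenever its splitting plane lies farther away than the best match found so far, which is what keeps the lookup fast when several thousand signatures are stored.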

Advantageously, the method further comprises a step of validating said identified category, said validation step comprising a step of reprojecting a model of said at least one identified shape according to an estimated pose of said at least one identified shape, so as to improve shape identification performance.

The invention also relates to a method of constructing a statistical model for the real-time identification, for an information processing device, of at least one shape of which at least a partial representation is present in an image, the method comprising the following steps:

selecting at least one image and detecting at least one point of interest in said selected image and extracting at least one image portion of said selected image according to the position of said at least one point of interest;

generating a plurality of image portions by deformation of said at least one image portion;

determining at least one characteristic of said at least one image portion and each image portion of said plurality of image portions; and,

constructing a statistical model associating image portions, chosen from said at least one image portion and said plurality of image portions, with a class associated with said at least one detected point of interest, according to characteristics of said at least one image portion and/or said plurality of image portions.

The method according to the invention thus makes it possible to learn data that subsequently allow a large number of shapes, for example several thousand, to be recognized efficiently, while limiting the storage resources needed to store characteristics of the shapes to be recognized. The method also makes it possible to simplify and optimize any subsequent learning by limiting the number of steps to be performed.

Advantageously, said deformation is an affine or projective deformation.

According to a particular embodiment, said step of determining at least one characteristic of said at least one image portion and of each image portion of said plurality of image portions comprises a step of determining a main direction of said at least one image portion and of each image portion of said plurality of image portions. The method according to the invention thus reduces the amount of redundant information used to identify shapes.

Still according to a particular embodiment, said step of determining at least one characteristic of said at least one image portion and each image portion of said plurality of image portions comprises a step of determining a sign of said at least one image portion and each image portion of said plurality of image portions, said sign being determined according to a variation of intensity in the corresponding image portion.

The method of the invention thus optimizes the amount of information used to identify shapes as well as the processing associated with the data used to identify shapes. Said sign is preferably determined according to a variation of intensity related to said main direction.

Still according to a particular embodiment, the method further comprises a step of modifying said statistical model aimed at deleting data relating to a class similar to another class. The method according to the invention thus makes it possible to improve the identification of shapes while limiting the amount of data necessary for it.

The invention also relates to a method of constructing a classification structure comprising a plurality of signatures, each signature of said plurality of signatures being associated with a category, for the real-time identification, for an information processing device, of at least one shape of which at least a partial representation is present in an image, the method comprising the following steps:

selecting at least one image comprising at least a partial representation of said at least one shape, detecting at least one point of interest in said at least one image and extracting at least one image portion from said at least one image according to the position of said at least one point of interest;

determining at least one characteristic of said at least one image portion;

generating at least one signature according to said at least one image portion and said at least one characteristic, said at least one signature being obtained from at least one predetermined statistical model;

creating said classification structure from said at least one signature and a category associated with said at least one point of interest and said at least one shape.

The method according to the invention thus allows new shapes to be identified to be added to a pattern identification system without the initial learning phase being linked to the shapes to be identified subsequently.

The invention also relates to a computer program comprising instructions adapted to implement each of the steps of the method described above when said program is executed on a computer, and to a device comprising means adapted to implement each of the steps of the method described above. The advantages of this computer program and this device are similar to those discussed above.

Other advantages, aims and features of the present invention will emerge from the detailed description which follows, given by way of non-limiting example, with reference to the accompanying drawings in which:

FIG. 1, comprising FIGS. 1a to 1d, illustrates the classification principle based on the use of structures with multiple branches;

FIG. 2 illustrates general steps of an algorithm according to the invention;

FIG. 3 illustrates an example of an algorithm for creating a signature generator, according to the invention;

FIG. 4, comprising FIGS. 4a and 4b, illustrates an orientation calculation for image portions;

FIG. 5 illustrates an example of an algorithm for learning objects to be identified;

FIG. 6 illustrates an example of an algorithm for identifying objects, in real time, in images;

FIG. 7 schematically illustrates the classification of an image portion associated with a point of interest to enable the identification of objects, in real time, in an image; and,

FIG. 8 illustrates an exemplary information processing device adapted to implement the invention.

In general, the invention allows many objects to be identified in images of a video stream, regardless of the angle from which these objects are observed and of their scale, and independently of the internal parameters of the image capture device, in an apparatus with limited computing and storage resources. For these purposes, points of interest are identified in the processed images and the image portions around these points are analyzed. A first analysis aims to characterize an image portion by calculating a signature, using a first database, while a second analysis aims to identify the image portion corresponding to the obtained signature, using a second database.

Figure 2 illustrates general steps of an algorithm according to the invention.

As illustrated, a first step here relates to the creation of a signature generator (step 200). This step aims in particular to create a database making it possible to calculate, from the pixels of an image portion, a signature that can be used to identify the image portion corresponding to that signature. Step 200 is described in more detail with reference to FIG. 3.

In a next step (step 205), learning is carried out from representations of the objects to be identified. This step aims in particular to establish links between image portion signatures and the categories used to identify objects. Step 205 is described in more detail with reference to FIG. 5.

Finally, once a signature generator has been created and the objects to be identified have been learned, representations of objects can be identified in images (step 210). Step 210 is described in more detail with reference to FIG. 6.

In other words, a first phase, performed offline, essentially consists in creating a tool for generating signatures from image portions. The creation of this signature generator is preferably independent of the objects to be recognized later. A signature generator is thus generic and can be used to recognize a large number of objects, including objects not yet identified. A second phase, also offline, essentially consists in learning the objects to be identified. This phase is repeated, at least partially, whenever new objects need to be identified. Finally, the third phase, online, aims to process images in order to characterize points of interest and identify objects.

The signature of an image portion is here determined using a multi-branched classification algorithm as described above, in particular with reference to FIG. 1. However, whereas the classes used in a multi-branched classification algorithm typically aim to identify particular points of interest of the objects to be identified, i.e. categories, the classes used here are arbitrary and distinct from one another, and are intended to produce distinctive signatures.

Figure 3 illustrates an example algorithm for creating a signature generator, according to the invention.

As indicated above, the signature generator is here created from a set of images 300, chosen, for example, at random. Although the content of these images is unimportant, they must nevertheless contain points of interest similar to those likely to characterize an object to be identified. The images used to create the signature generator are thus typically images representing arbitrary objects.

A first step is to detect points of interest (step 305). The points of interest are detected, for example, with a standard algorithm such as HARRIS or FAST. An image portion is then extracted around each detected point of interest, for example a 32x32-pixel portion approximately centered on the point of interest considered.

The number of points of interest detected over all the images, for example 500, must be greater than the length of the signatures to be generated. Furthermore, the points of interest are preferably chosen away from the image edges to avoid possible edge effects, especially if a reprojection operation is applied.
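Extracting 32x32 portions while keeping away from the image edges can be sketched as follows. The detection itself (HARRIS, FAST) is not shown: the points are assumed to be given as integer (row, col) positions, and using a margin of exactly half a patch is an assumption (a wider margin could be kept when a reprojection step follows).

```python
import numpy as np

def extract_patches(image, points, size=32):
    """Extract size x size patches approximately centered on (row, col) points.

    Points too close to the border to hold a full patch are skipped, which avoids edge effects.
    """
    half = size // 2
    height, width = image.shape[:2]
    patches, kept = [], []
    for row, col in points:
        if half <= row <= height - half and half <= col <= width - half:
            patches.append(image[row - half:row + half, col - half:col + half])
            kept.append((row, col))
    return patches, kept

image = np.zeros((100, 100), dtype=np.uint8)
patches, kept = extract_patches(image, [(5, 5), (50, 50), (84, 84), (85, 85)])
print(kept, patches[0].shape)  # -> [(50, 50), (84, 84)] (32, 32)
```

Because 32 is even, each patch spans 16 pixels on one side of the point and 15 on the other, which is why the text speaks of a portion "approximately" centered on the point of interest.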

It should be noted that the algorithm described with reference to FIG. 3 can be implemented in different ways. It is possible to detect all the points of interest of all the images, then extract an image portion for each point of interest of each image and process each of these image portions. It is also possible to select a first image, detect a first point of interest, extract an image portion and process it before detecting a new point of interest in the same image, repeating these steps and then moving on to the next image. Whatever the order, the steps are similar. Therefore, for the sake of clarity, not all embodiments are described here; only the processing of an image portion associated with a point of interest of an image is described below. The same applies to the other algorithms described below.

In a following step, a set of image portions is generated from the image portion considered (step 310) by rotation, scaling, noise addition, etc. These image portions are advantageously projective views, because affine views poorly represent what can be perceived in images extracted from a camera's video stream. Indeed, the image portions used here can be relatively large, for example 32x32 pixels, and projective deformations modify the appearance of a visible object in an image plane. This is especially the case for image portions corresponding to points of interest detected at low scale levels, because they contain a large number of pixels relative to the total size of the image of the object.

It is therefore preferable to take these deformations into account, as well as the potential variations in the fields of view of common cameras. For these purposes, a simplified model is preferably used that takes into account neither radial and tangential distortions nor possible variations in the coordinates of the optical center. Thus, only the field of view, through the focal parameter f, and the pose of the object are taken into account here.

The parameters used to obtain image portions from an image portion extracted around a point of interest are therefore the focal length f, the distance Z between the object and the camera, and the rotations about the x-axis (θ), the y-axis (ψ) and the axis of the image sensor (φ). The projective transformation P can then be defined by the following standard relation:

$$P = K \cdot T$$

where K represents the matrix containing the intrinsic parameters of the camera and T the homogeneous matrix corresponding to the geometric transformation defined by the position and orientation of the object.

As an illustration, the angles θ and ψ can be varied randomly in the range [-π, +π], the angle φ in the range [0, 2π], the distance Z in the range [0.7×Z, 1.5×Z] and the focal length in the range [800, 1000] (expressed in pixels for a resolution of 640x480, for standard cameras).
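The simplified projection model P = K · T and the random sampling of its parameters can be sketched as below. Several points are assumptions made for this illustration: the optical center is fixed at the center of a 640x480 image, the rotations are composed as Rz @ Ry @ Rx, and the in-plane rotation (about the sensor axis) is the angle drawn from [0, 2π], because the symbols in the source text are partly garbled. The patch is modeled as a small square on the plane z = 0 of the object.

```python
import numpy as np

def intrinsic_matrix(f, cx=320.0, cy=240.0):
    """Simplified K: square pixels, no skew, no distortion, fixed optical center (assumed image center)."""
    return np.array([[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]])

def rotation_xyz(ax, ay, az):
    """R = Rz @ Ry @ Rx for rotations about the x-axis, the y-axis and the sensor axis."""
    c, s = np.cos(ax), np.sin(ax)
    rx = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
    c, s = np.cos(ay), np.sin(ay)
    ry = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    c, s = np.cos(az), np.sin(az)
    rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return rz @ ry @ rx

def projection_matrix(f, ax, ay, az, z):
    """P = K . T, with T = [R | t] and the object placed at distance z along the optical axis."""
    t = np.array([[0.0], [0.0], [z]])
    return intrinsic_matrix(f) @ np.hstack([rotation_xyz(ax, ay, az), t])

def project(P, points_3d):
    """Project (n, 3) object points to (n, 2) pixel coordinates."""
    homog = np.hstack([points_3d, np.ones((len(points_3d), 1))])
    image_h = homog @ P.T
    return image_h[:, :2] / image_h[:, 2:3]

def random_projection(rng, z0):
    """Draw parameters in the example ranges given in the text."""
    return projection_matrix(f=rng.uniform(800.0, 1000.0),
                             ax=rng.uniform(-np.pi, np.pi),
                             ay=rng.uniform(-np.pi, np.pi),
                             az=rng.uniform(0.0, 2.0 * np.pi),
                             z=rng.uniform(0.7 * z0, 1.5 * z0))

corners = np.array([[-16.0, -16.0, 0.0], [16.0, -16.0, 0.0], [16.0, 16.0, 0.0], [-16.0, 16.0, 0.0]])
P_front = projection_matrix(f=800.0, ax=0.0, ay=0.0, az=0.0, z=800.0)
print(project(P_front, corners))   # fronto-parallel at z = f: corners at (304, 224) to (336, 256)
P_random = random_projection(np.random.default_rng(0), z0=800.0)
print(project(P_random, corners))  # corners of one randomly deformed view
```

For points on the plane z = 0, the 3x3 matrix formed by columns 0, 1 and 3 of P is the homography that maps the patch to its synthesized view; warping the pixels with it, using any image-warping routine, produces the new image portion.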

The image portions created from this model are then advantageously smoothed using, for example, a 5x5 or 7x7 Gaussian filter to improve robustness to noise and/or blur. In addition, noise can be added to these image portions by noise simulation to further improve robustness. The resulting set of image portions, preferably smoothed and slightly noisy, corresponds to different possible views of the point of interest. It is desirable to generate a large number of image portions for each point of interest, for example 5,000 or 10,000.
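The smoothing and noise simulation can be sketched as follows. The text gives only the filter size (5x5 or 7x7). The standard deviation of the Gaussian, the additive zero-mean Gaussian noise model, its level and the clipping to the 8-bit range are assumptions.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Normalized 1D Gaussian kernel; sigma is an assumed value, the text only gives the size."""
    x = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return k / k.sum()

def smooth(patch, size=5, sigma=1.0):
    """size x size Gaussian filtering, applied separably with reflected borders (same output shape)."""
    k = gaussian_kernel(size, sigma)
    r = size // 2
    h, w = patch.shape
    padded = np.pad(patch.astype(float), r, mode="reflect")
    rows = sum(k[i] * padded[:, i:i + w] for i in range(size))   # horizontal pass
    return sum(k[i] * rows[i:i + h, :] for i in range(size))     # vertical pass

def smooth_and_add_noise(patch, rng, size=5, sigma=1.0, noise_std=2.0):
    """Smooth, add zero-mean Gaussian noise and clip to the 8-bit range (noise model assumed)."""
    noisy = smooth(patch, size, sigma) + rng.normal(0.0, noise_std, size=patch.shape)
    return np.clip(noisy, 0.0, 255.0)

rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(32, 32))
views = [smooth_and_add_noise(patch, rng, size=7, sigma=1.5) for _ in range(5)]
print(views[0].shape, views[0].min() >= 0.0, views[0].max() <= 255.0)
```

A 2D Gaussian kernel is the outer product of two 1D kernels, so filtering rows and then columns is equivalent to applying the full 5x5 or 7x7 filter, at a lower cost.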

After a set of image portions has been determined for each point of interest, an orientation, or principal direction, about the axis of the image sensor (typically the camera axis), that is to say the angle φ, is determined for each image portion (step 315). Determining a principal direction for each image portion allows these image portions to be normalized and therefore reduces their variability, which in turn reduces the amount of memory needed to store reference data. As already noted, this normalization is performed during the offline phase, to create the signature generator and to learn the objects to be identified, as well as during the online phase to identify objects.

In an article titled "Multiple Target Localization At Over 100 FPS", Taylor and Drummond propose an efficient and robust approach for finding the orientation of an image portion around a point of interest in the image plane. Sixteen pixels sampled on a circle around the position of the point of interest are used to compute gradients, as vectors, between opposite pixels. The orientation of the image portion is then defined as the sum of these gradient vectors.

FIG. 4a illustrates this orientation computation. Sixteen points are determined on a circle around a point of interest (placed at the center of the figure). Comparing the intensities of these sixteen pixels in opposite pairs yields gradient vectors (here, eight vectors represented by dotted arrows). The sum of these vectors gives an orientation vector (represented by the solid arrow) for the image portion associated with the central point of interest. Advantageously, to improve the stability of the orientation computation with respect to scale variations and the other orientation parameters, the P × P image portion is bilinearly resampled into a P/2 × P/2 image portion before the computation described above is applied.
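The orientation computation can be sketched as follows. This is an illustrative reading of the description, not the authors' code. It assumes that the sixteen pixels form the radius-3 Bresenham circle also used by FAST, that each opposite pair contributes its intensity difference along the unit vector joining the two pixels, and that the P × P to P/2 × P/2 bilinear resampling samples each output pixel midway between four input pixels, which reduces to averaging 2 × 2 blocks. Angles are in radians, in image coordinates (y pointing down).

```python
import math

# Radius-3 Bresenham circle; entry i + 8 is diametrically opposite entry i.
CIRCLE16 = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
            (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def downsample_half(patch):
    """P x P -> P/2 x P/2 bilinear resampling at the centre of each 2 x 2 block."""
    n = len(patch) // 2
    return [[(patch[2 * y][2 * x] + patch[2 * y][2 * x + 1]
              + patch[2 * y + 1][2 * x] + patch[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(n)] for y in range(n)]


def orientation(patch):
    """Angle of the sum of the 8 gradient vectors between opposite circle pixels."""
    c = len(patch) // 2
    gx = gy = 0.0
    for dx, dy in CIRCLE16[:8]:
        # Intensity difference between the pixel at +(dx, dy) and its opposite.
        diff = patch[c + dy][c + dx] - patch[c - dy][c - dx]
        norm = math.hypot(dx, dy)
        gx += diff * dx / norm
        gy += diff * dy / norm
    return math.atan2(gy, gx)
```

Typical use is `orientation(downsample_half(patch))` on a 32 × 32 patch; a patch that brightens towards increasing x yields an angle close to 0.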

While the orientation thus calculated can be used to resample the pixels of the considered image portion according to the calculated orientation, it is preferably used to create or select a set of binary tests.

The possible orientations are sampled, for example, as integer values from 0 to 31. This discretization makes it possible to precompute each of the binary tests of the multi-branched structure used and to store them in memory. The orientation associated with an image portion can thus be used at no additional computational cost.
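Mapping a continuous orientation to one of the 32 discrete values can be sketched as follows. The sketch assumes orientations in radians and bins centered on the values q × 2π/Q; centering the bins is an assumption, and flooring instead of rounding would also be a valid discretization.

```python
import math

Q = 32  # number of discrete orientations, as in the example of the text


def orientation_index(angle, n_bins=Q):
    """Index in 0..n_bins-1 of the discrete orientation q * 2*pi / n_bins closest to angle."""
    step = 2.0 * math.pi / n_bins
    return int(round((angle % (2.0 * math.pi)) / step)) % n_bins
```

Since a test set is precomputed for each index, the returned value can be used directly to select the binary tests, without resampling the patch.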

Computing the orientation reduces the orientation space to two orientations, namely those about the two axes perpendicular to the optical axis (the abscissa and ordinate axes), that is to say those that alter the appearance of the image portion to a lesser extent.

In a next step, an image portion sign is advantageously determined (step 320). It has been observed that the image portions around a point of interest have an important characteristic related to the local variation of intensity between central and peripheral pixels. This variation is most pronounced between the central pixels and the peripheral pixels lying in the direction given by the orientation of the image portion, as described above. Two types of points of interest can therefore be distinguished, depending on whether the intensity variation is positive or negative.

It is assumed here that a point of interest is given a positive sign when the intensity at the center of the image portion is lower than that of the peripheral pixels in the main direction; otherwise, the sign is negative. The opposite convention can of course be used. In practice, when HARRIS points are used, the sign can be determined by comparing the mean intensity of the central pixels (for example the 9 × 9 central pixels) with the mean intensity of the peripheral pixels in the main direction.
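A minimal sketch of this sign test for a 32 × 32 patch follows. The 9 × 9 central block comes from the text; how the "peripheral pixels in the main direction" are sampled is not specified, so the sketch assumes pixels along a ray from the center in the main direction, between radii 8 and 14. Those radii and the function name are hypothetical.

```python
import math


def patch_sign(patch, angle, centre_half=4, r_min=8, r_max=14):
    """+1 if the centre is darker than the periphery along the main direction, else -1."""
    c = len(patch) // 2
    # Mean of the (2 * centre_half + 1)^2 central pixels (9 x 9 by default).
    centre = [patch[y][x]
              for y in range(c - centre_half, c + centre_half + 1)
              for x in range(c - centre_half, c + centre_half + 1)]
    # Peripheral pixels sampled along the ray in the main direction (assumption).
    ca, sa = math.cos(angle), math.sin(angle)
    periphery = [patch[int(round(c + r * sa))][int(round(c + r * ca))]
                 for r in range(r_min, r_max + 1)]
    centre_mean = sum(centre) / len(centre)
    periphery_mean = sum(periphery) / len(periphery)
    return 1 if centre_mean < periphery_mean else -1
```

A dark blob on a bright background is thus positive, and a bright blob on a dark background negative, under the convention stated above.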

When FAST points of interest are used, the sign is given directly by construction, because FAST points are detected through a significant intensity difference between the central pixel of the image portion and the pixels on a circle around it. In the FAST method, a Bresenham circle, for example with a 16-pixel perimeter, is built around each pixel of the image. If k contiguous pixels of this circle (k typically equal to 9, 10, 11 or 12) either all have an intensity greater than the central pixel or all have an intensity lower than it, the central pixel is considered a point of interest. It is then sufficient to compare the intensity of the central pixel with that of these contiguous pixels.
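The segment test and the sign it yields can be sketched as follows. The sketch assumes an intensity margin `threshold` (standard in FAST, though the text simply says "greater" or "lower") and uses k = 9; the function names are hypothetical, and the point must lie at least 3 pixels from the image border.

```python
# Radius-3 Bresenham circle of 16 pixels, in order around the centre.
CIRCLE16 = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
            (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def _has_run(flags, k):
    """True if at least k consecutive flags are set, wrapping around the circle."""
    run = 0
    for flag in flags + flags:  # doubling the list handles the wrap-around
        run = run + 1 if flag else 0
        if run >= k:
            return True
    return False


def fast_signed(img, x, y, threshold=20, k=9):
    """Segment test at (x, y): +1 (centre darker), -1 (centre brighter), 0 (no point)."""
    centre = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE16]
    if _has_run([v > centre + threshold for v in ring], k):
        return 1   # the arc is brighter, so the centre is darker: positive sign
    if _has_run([v < centre - threshold for v in ring], k):
        return -1  # the arc is darker, so the centre is brighter: negative sign
    return 0
```

The sign therefore costs nothing beyond the detection itself: it is simply which of the two arc conditions fired.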

A first use of these signed points of interest concerns the final step, in which a generated signature is compared with reference signatures. In this case, the signatures of the points of interest to be recognized are split into two groups: negative points and positive points. This halves the processing time of the comparison phase and reduces the size of the structures used.

A second use, which can be combined with the first, is to modify the structure of the database used to generate signatures. The N classes used to construct the signature generator are split into two groups, preferably of equal size (N/2). Depending on the sign of the image portion considered, one database or the other is used. The generated signature therefore has size N/2, which optimizes memory occupancy and halves computation time with equivalent robustness.

A probability distribution table is then constructed for each structure F_j of the multi-branched structure used (step 325). To this end, if the binary tests of the structures F_j have not yet been defined, they are defined, preferably randomly. In other words, for each structure F_j, a set of binary tests is created, and an orientation, predetermined or randomly chosen, is associated with each test of this set. Corresponding sets of tests are then created for each possible orientation (orientations being taken about the center of the image portions), that is to say for each sampled value between 0 and Q−1, each orientation d_q being defined by d_q = q × 2π/Q, where q is the orientation index between 0 and Q−1. For example, if a test f_i compares a pixel p_u with a pixel p_v and is associated with direction d_0, Q−1 further tests must be created for it, namely tests f_i^q associated with directions d_q (q ranging from 1 to Q−1). Each test f_i^q compares pixels p'_u and p'_v in orientation d_q, the positions of p'_u and p'_v corresponding to those of p_u and p_v, respectively, after a rotation of d_q − d_0. FIG. 4b illustrates an example in which the rotation from d_0 to d_q' is π/2, that is to say q' = Q/4, Q being a multiple of 4 here.
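The precomputation of rotated tests can be sketched as follows, assuming 32 × 32 patches, rotation about the patch center (16, 16), pixel pairs drawn inside a disc of radius 12 so that every rotated pixel stays inside the patch, and a test result of 1 when I(p'_u) < I(p'_v). These choices and the function names are hypothetical.

```python
import math
import random


def random_tests(n, centre=16.0, radius=12.0, rng=None):
    """n random pixel pairs (p_u, p_v) inside a disc, defined for direction d_0."""
    rng = rng or random.Random(0)

    def point():
        while True:
            x, y = rng.uniform(-radius, radius), rng.uniform(-radius, radius)
            if x * x + y * y <= radius * radius:
                return (int(round(centre + x)), int(round(centre + y)))

    return [(point(), point()) for _ in range(n)]


def precompute_rotated_tests(tests, q_count=32, centre=16.0):
    """table[q][i] = pixel pair of test i rotated by d_q - d_0 = q * 2*pi / Q."""
    table = []
    for q in range(q_count):
        d = q * 2.0 * math.pi / q_count
        c, s = math.cos(d), math.sin(d)
        rotated = []
        for pu, pv in tests:
            pair = []
            for x, y in (pu, pv):
                dx, dy = x - centre, y - centre
                pair.append((int(round(centre + c * dx - s * dy)),
                             int(round(centre + s * dx + c * dy))))
            rotated.append(tuple(pair))
        table.append(rotated)
    return table


def run_tests(patch, table, q):
    """Binary results VB for orientation index q: 1 if I(p'_u) < I(p'_v), else 0."""
    return [1 if patch[yu][xu] < patch[yv][xv] else 0
            for (xu, yu), (xv, yv) in table[q]]
```

With Q = 32, index 8 corresponds to the π/2 rotation of FIG. 4b. The results keep the order of the direction-0 tests, so they can be packed into the same index k_j whatever the orientation.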

Thus, for each structure F_j, there is a table giving the probability distribution of class membership, the structure defining a set of binary tests for a set of directions. The classes are, for example, random and distinct values associated with each point of interest considered in this first offline phase.

When an image portion is processed, its main direction is determined as well as, if appropriate, its sign.

A set of binary values is then computed for each structure F_j using the binary tests associated with the principal direction previously determined. The test results are then used to determine an index k_j, as described above with reference to FIG. 1. Note that the index k_j is computed as if the results came from the binary tests associated with a predetermined direction, for example direction 0, each rotated test taking the place of its unrotated counterpart. The entry at this index (initialized to zero when the corresponding probability distribution table is created) is then incremented by one, in the column of the class of the point of interest considered. As indicated above, the table used is the one associated with the structure considered and, if an image portion sign is used, with the sign of the image portion considered.

All image portions of the points of interest considered are processed in this way. The probability distribution tables are then normalized one by one so that the elements of each column sum to 1, as indicated above.
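The counting and normalization can be sketched as follows. Each table has 2^D rows (indices k_j) and one column per class, as described above. The optional `prior`, a constant added to every count, is an assumption borrowed from common fern implementations to avoid zero probabilities; `prior=0.0` follows the text literally. The function names are hypothetical.

```python
def fern_index(bits):
    """Pack the binary results VB_j of one structure into an index k_j."""
    k = 0
    for b in bits:
        k = (k << 1) | int(b)
    return k


def train_tables(samples, n_structures, depth, n_classes, prior=0.0):
    """samples: iterable of (bit lists, one per structure, class id).

    Returns tables[j][k][c], i.e. P(index k | class c) for structure F_j.
    """
    counts = [[[prior] * n_classes for _ in range(2 ** depth)]
              for _ in range(n_structures)]
    for bits_per_structure, cls in samples:
        for j, bits in enumerate(bits_per_structure):
            counts[j][fern_index(bits)][cls] += 1
    tables = []
    for table in counts:
        # Normalize each column (class) so that it sums to 1.
        col_sums = [sum(row[c] for row in table) for c in range(n_classes)]
        tables.append([[row[c] / col_sums[c] if col_sums[c] else 0.0
                        for c in range(n_classes)] for row in table])
    return tables
```

In use, each training patch contributes one bit list per structure, obtained from the tests of its quantized orientation, together with the class of its point of interest.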

The probability distribution tables are here stored in the database 330 which is itself preferably stored in a non-volatile memory.

In a next step (step 335), the probability distribution tables are analyzed to delete data deemed unnecessary, for example redundant data such as data associated with similar classes.

During this step, each of the M classes built (M being, for example, set by a user and greater than N) is tested in order to remove the most similar classes.

To this end, a new set of image portions (projective or affine) can be generated for each class. These image portions are then tested to identify the class they belong to (as described with reference to Figure 1), by generating a signature and looking in this signature for the class with the most discriminating response. The classes with the most errors are considered similar to one or more other existing classes. These similar classes are deleted one by one, optionally under the constraint of keeping as many classes for positive-sign points of interest as for negative-sign points of interest.

The probability distribution tables thus determined can be used to generate signatures characterizing an image portion. Such signatures consist of the cumulative probability distributions associated with the binary test results of the structures F_j. By way of illustration, from a set of structures F_j associated with binary tests f_i, an image portion yields a set of binary values VB_j corresponding to a set of indexes k_j. Each index k_j determines a probability distribution. The probability distributions thus obtained are then combined, for example by cumulation, to determine a signature.
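Signature generation by cumulation can be sketched as follows, with tables laid out as `tables[j][k][c]`. The logarithmic variant follows the "cumulative logarithmic distributions" mentioned for step 520; the small `eps` guarding against log(0) is an assumption, as is the function name.

```python
import math


def signature(tables, indices, log=True, eps=1e-6):
    """Accumulate, class by class, the distributions tables[j][k_j] of all structures."""
    n_classes = len(tables[0][0])
    sig = [0.0] * n_classes
    for table, k in zip(tables, indices):
        for c, p in enumerate(table[k]):
            sig[c] += math.log(p + eps) if log else p
    return sig
```

Each component of the result scores one class, so the signature length equals the number of classes (N, or N/2 when signed points are used).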

Such signatures characterize the degree of similarity of an image portion to a set of classes. Because these classes were learned from many deformed views of each point of interest, the signatures are invariant to orientation, scale and focal length and, more generally, to affine and projective deformations.

The use of such a signature generator therefore yields significant memory savings. It is no longer necessary to store 2^D × J values for each point of interest, which represents approximately 100 MB for two thousand points of interest (with J = 50, D = 10 and one byte per value). Building the signature generator requires memory for 2^D × J bytes per class, i.e. between 5 MB and 15 MB for 100 to 300 points of interest. Although this generator has a fixed cost, in particular in memory occupancy and processing, it reduces the variable storage cost associated with the representations of the objects to be recognized.
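The orders of magnitude above can be checked with a short computation, assuming one byte per stored value and 1 MB = 10^6 bytes. The function name is hypothetical.

```python
def table_bytes(d, j, n_tables, bytes_per_value=1):
    """Memory for n_tables sets of J distribution tables with 2^D entries each."""
    return (2 ** d) * j * n_tables * bytes_per_value


if __name__ == "__main__":
    print(table_bytes(10, 50, 2000) / 1e6)  # direct storage, 2,000 points: 102.4
    print(table_bytes(10, 50, 100) / 1e6)   # generator, 100 classes: 5.12
    print(table_bytes(10, 50, 300) / 1e6)   # generator, 300 classes: 15.36
```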

Moreover, the signature generator does not need to be modified to learn to identify new objects.

FIG. 5 illustrates an example algorithm for learning the objects to be identified. As stated above, this second phase is performed offline and must be repeated each time a new object is to be identified.

Learning is performed here from a set of images 500 representing, at least partially, one or more objects to be identified.

A first step is to detect points of interest (step 505). As in step 305, described with reference to FIG. 3, the points of interest are detected, for example, with a standard algorithm such as HARRIS or FAST.

Such detection can be performed at different scale levels of a Gaussian pyramid built from the initial image, to improve recognition of objects seen at different distances. Typically, two hundred points of interest are detected per object, evenly distributed over two or three scale levels.

An image portion is again extracted around each detected point of interest, for example a block of 32 × 32 pixels approximately centered on the point of interest.

After the image portions of the detected points of interest have been extracted, within the limit of the required number of points of interest, a principal direction about the axis of the image sensor (typically the camera axis), that is to say the angle φ, is determined for each image portion (step 510). As described above, determining a principal direction for each image portion allows these image portions to be normalized, which reduces the amount of memory needed to store reference data and also improves the quality of the classification.

The main orientation is determined here as in step 315 of Figure 3. Sixteen pixels sampled on a circle around the position of a point of interest are used to compute gradients, as vectors, between opposite pixels. The orientation of the image portion is then defined as the sum of these gradient vectors, as illustrated in FIG. 4a.

Again, to improve the stability of the orientation computation with respect to scale variations and other orientation parameters, the P × P image portion is preferably bilinearly resampled into a P/2 × P/2 image portion before the computation described above is applied.

The set of possible orientations is sampled with the same values as in the first phase described above, for example from 0 to 31. The orientation thus computed is preferably used to select a set of binary tests.

In a next step, an image portion sign is advantageously determined (step 515). The sign is computed here as in step 320, described with reference to Figure 3.

A signature of the processed image portion is then generated from the predetermined probability distribution tables 330 (step 520). It consists of the cumulative logarithmic probability distributions associated with the binary test results of the structures F_j used. A set of binary values VB_j is thus obtained from the set of structures F_j, which are associated with binary tests f_i.

As described above, the binary values VB_j define indexes k_j which, in turn, yield probability distributions.

The probability distributions thus obtained are then combined, for example by cumulation, to determine a signature. Note that the binary tests used for a given structure are selected according to the previously determined orientation of the processed image portion, so that the image portion does not need to be resampled according to its orientation.

A signature is thus generated for each previously detected point of interest.

Each signature is then stored with a reference, or category (cat), of the corresponding point of interest of the object to be identified (step 525).

During this phase, it is not necessary to generate several image portions for each detected point of interest, so this phase can be performed very quickly even when the number of objects is large. Nevertheless, when the application allows it, several image portions can be generated for each point of interest and an average signature retained, to improve the quality of learning.

An optional step for improving the signature generator can take place here. It consists in reducing the size of the generated signatures. To this end, a large number of generated signatures is used to evaluate the contribution of each signature class to the classification step. The signature classes that vary least over all the image portions tested are deleted one by one. This process is repeated until the best classes are obtained while retaining, if necessary, the constraint that half of the signature classes relate to positive points of interest and the other half to negative points of interest.
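A minimal sketch of this reduction follows. It assumes that "varies least" is measured by the variance of each signature component over the test signatures, that both sign groups start with equal size, and that balance is kept by always removing from the larger group. These choices and the function names are hypothetical.

```python
def variance(values):
    """Population variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def prune_signature_classes(signatures, class_signs, target_size):
    """Indices of the signature classes kept after greedy removal.

    signatures: list of signatures (lists of equal length).
    class_signs[c]: +1 or -1, the sign group of signature class c.
    """
    keep = list(range(len(class_signs)))
    var = {c: variance([s[c] for s in signatures]) for c in keep}
    while len(keep) > target_size:
        pos = [c for c in keep if class_signs[c] > 0]
        neg = [c for c in keep if class_signs[c] < 0]
        group = pos if len(pos) >= len(neg) else neg  # keep the two halves balanced
        keep.remove(min(group, key=lambda c: var[c]))
    return keep
```

A class whose score barely changes from one image portion to the next carries little discriminating information, which is why it is the first candidate for removal.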

A characteristic signature is obtained for each point of interest of each object to be recognized. These signatures are then placed in a classification structure dedicated to the classification task, stored here in database 530. Standard structures such as those known as "kd-tree", "spill-tree", "kmean-tree" or "vocabulary-tree" can be used to classify and compare the signatures. These structures can be stored in non-volatile memory or rebuilt when the application starts.

When an image portion sign is used, two structures are built: one to classify the signatures of positive image portions and the other to classify the signatures of negative image portions.

In addition, the three-dimensional coordinates of each detected point of interest are estimated. They can be obtained, in particular, from the geometry of the object and its position in the image used.

The information considered relevant here, in particular the two-dimensional and three-dimensional coordinates of the points of interest, the signatures of the points of interest, the indices of the corresponding objects and the classification structures, is advantageously stored in non-volatile memory, for example a hard disk or a memory card, to avoid having to determine it again later.

For example, if a signature has size one hundred and about a thousand objects must be classified, each represented by two hundred points of interest, the signatures occupy about 20 MB (1,000 objects × 200 points/object × signature size of 100 bytes = 20 MB). Adding the memory occupied by the signature generator (5 MB) gives a total of about 25 MB of data needed to run the object recognition application.
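The budget above can be reproduced with a short computation, assuming one byte per signature component and 1 MB = 10^6 bytes. The function name and default values simply restate the example.

```python
def recognition_memory_mb(n_objects=1000, points_per_object=200,
                          signature_bytes=100, generator_mb=5.0):
    """(signature storage, total including the signature generator), in MB."""
    signatures_mb = n_objects * points_per_object * signature_bytes / 1e6
    return signatures_mb, signatures_mb + generator_mb
```

Because the generator term is fixed, only the signature term grows as objects are added: under the same assumptions, 10,000 objects would need about 200 MB of signatures.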

Once learning is complete, objects can be identified in images in real time.

FIG. 6 illustrates an example of an algorithm for identifying objects, in real time, in images.

As indicated above, this third phase is performed online in order to detect one or more objects in a current image 600.

A first step is to detect points of interest (step 605). As with step 305, described with reference to FIG. 3, the points of interest are, for example, detected with a standard algorithm, for example an algorithm known as HARRIS or FAST. The maximum number of points of interest to be detected is preferably predetermined. It is, for example, equal to one hundred.

An image portion is again extracted around each detected point of interest, for example a block of 32x32 pixels, approximately centered on the point of interest considered.

After the image portions of the detected points of interest have been extracted, within the limit of the required number of points of interest, a principal direction about the axis of the image sensor (typically the camera axis), that is to say the angle φ, is determined for each image portion (step 610). As previously described, determining a principal direction for each image portion allows these image portions to be normalized and therefore reduces the amount of memory required to store reference data.

The main orientation is determined here as in step 315 of Figure 3. Sixteen pixels sampled on a circle around the position of a point of interest are used to compute gradients, as vectors, between opposite pixels. The orientation of the image portion is then defined as the sum of these gradient vectors, as illustrated in FIG. 4a.

Again, to improve the stability of the orientation computation with respect to scale variations and other orientation parameters, the P × P image portion is preferably bilinearly resampled into a P/2 × P/2 image portion before the computation described above is applied.

The set of possible orientations is sampled with the same values as in the first and second phases described above, for example from 0 to 31. The orientation thus computed is preferably used to select a set of binary tests.

In a next step, an image portion sign is advantageously determined (step 615). The sign is computed here as in step 320, described with reference to FIG. 3.

A signature of the processed image portion is then generated from the previously determined probability distribution tables 330 (step 620). It consists of the cumulative probability distributions associated with the results of the binary tests of the structures F_j used. A set of binary values VB_j is thus obtained from the set of structures F_j, which are associated with binary tests f_i. As described above, the binary values VB_j define indexes k_j which, in turn, yield probability distributions. The probability distributions thus obtained are then combined, for example by cumulation, to determine a signature. Note that the binary tests used for a given structure are selected according to the previously determined orientation of the processed image portion, so that the image portion does not need to be resampled according to its orientation.

A signature is thus generated for each previously detected point of interest. These signatures are then compared (step 625) with the signatures previously placed in the dedicated structures during the classification stage (classification structure stored here in database 530). For each generated signature, the closest signature gives the category (cat) of the corresponding point of interest.

Again, using the sign of the point of interest avoids comparing two fundamentally different points (positive versus negative).

In such a method, each point of interest of the current image is matched with a classified point by comparing the signatures of the image portions associated with these points. The least relevant matches can be removed using a distance between the signatures of the two points and a predetermined tolerance threshold.
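The matching step can be sketched as follows. A linear scan stands in for the kd-tree or similar structure named above, which would only accelerate the search. The Euclidean distance and the layout of `database` (one list of `(signature, category)` pairs per sign) are assumptions, not details from the text.

```python
import math


def l2(a, b):
    """Euclidean distance between two signatures."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match(query_sig, query_sign, database, max_distance):
    """Category of the closest stored signature of the same sign, or None.

    database: dict mapping +1 / -1 to a list of (signature, category) pairs.
    """
    best_cat, best_d = None, float("inf")
    for sig, cat in database.get(query_sign, []):  # same-sign candidates only
        d = l2(query_sig, sig)
        if d < best_d:
            best_cat, best_d = cat, d
    # Matches farther than the tolerance threshold are discarded.
    return best_cat if best_d <= max_distance else None
```

Restricting the search to the query's sign group halves the number of candidates, in line with the first use of signed points described earlier.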

The object with the most matches with points of the current image is considered a potentially recognized object. A final geometric validation step, for example using a RANSAC- or PROSAC-type algorithm, is then preferably performed to check the consistency of the matches found. If the number of validated matches exceeds a predetermined threshold, for example ten, the index of the identified object is returned by the application. Otherwise, the next objects in a list of potentially recognized objects can be tested.
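The voting and validation loop can be sketched as follows. Each category is assumed to be an `(object_id, model_point)` pair. The geometric check is passed in as a callback, hypothetically named `validate`, that returns the number of consistent matches; a RANSAC or PROSAC pose estimate would go there and is not implemented here.

```python
from collections import Counter


def recognise(matches, validate, min_inliers=10):
    """Return the first candidate object whose validated matches exceed min_inliers.

    matches: list of (image_point, (object_id, model_point)).
    validate(object_id, pairs) -> number of geometrically consistent pairs.
    """
    votes = Counter(obj for _, (obj, _) in matches)
    for obj, _ in votes.most_common():  # most-matched objects first
        pairs = [(ip, mp) for ip, (o, mp) in matches if o == obj]
        if validate(obj, pairs) > min_inliers:
            return obj
    return None
```

Ordering candidates by vote count means the costly geometric check usually runs only once per frame.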

The method as a whole can classify and recognize several thousand objects in real time. As an illustration, on a smartphone, the features of a thousand objects can be stored and these objects recognized in less than one hundred milliseconds, depending on the capabilities of the device.

Note that the recognition process can be spread over several consecutive or nearly consecutive images of a video stream by splitting the classified signatures into groups. For example, the first group can contain a thousand objects to be recognized and be processed in image 1, the second group, also containing a thousand objects, can be processed in image 2, and so on. Ten thousand objects can thus be recognized on a mobile device in less than one second.
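The group scheduling can be sketched as follows. The sketch splits a list of object identifiers into groups of 1,000 and picks one group per frame in round-robin order. The round-robin policy and the function names are assumptions; frames are numbered from 0 here.

```python
def split_into_groups(object_ids, group_size):
    """Consecutive groups of at most group_size object identifiers."""
    return [object_ids[i:i + group_size] for i in range(0, len(object_ids), group_size)]


def group_for_frame(frame_index, groups):
    """Group of classified signatures to search in the given frame (round robin)."""
    return groups[frame_index % len(groups)]
```

With ten groups and under 100 ms per frame, every object is searched at least once within about one second, which matches the figure given above.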

It is also worth noting that the principle of the invention applies equally to image portions previously converted to gray levels (for example by combining the red, green and blue components) and to image portions corresponding to points of interest detected on one of the components.
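One way of combining the components into gray levels can be sketched as follows. The text does not specify the weights; the ITU-R BT.601 luma weights used here are an assumption, and the function name is hypothetical.

```python
def to_gray(rgb_patch):
    """Gray-level patch from rows of (r, g, b) tuples, using BT.601 luma weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb_patch]
```

For the variant that works on a single component, the same pipeline can instead be fed one channel, for example `[[g for (_, g, _) in row] for row in rgb_patch]`.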

Figure 7 schematically illustrates the classification of a given image portion, denoted PI and associated with a point of interest, used to identify objects in an image in real time.

A first step is to determine binary values from structures comprising predetermined binary tests. These binary values are then used to find, in the tables determined during a learning phase (database 330), the probability distributions, denoted DistProb_j, of membership of the processed image portion in the classes used. These probability distributions are then combined, for example by accumulation, to obtain a signature (Signature). Note that neither the probability distributions DistProb_j nor the signature Signature have any meaning in themselves.

The signature obtained from the processed image portion is then compared with signatures previously stored in a classification structure (here belonging to database 530) in order to find the signature closest to it and thus associate a category cat with the point of interest considered.

FIG. 8 illustrates an example information processing device that can be used to implement the invention, at least partially, in particular the steps described with reference to FIGS. 2, 3, 5 and 6. The device 800 is, for example, a smartphone, a personal digital assistant or a microcomputer.

The device 800 preferably comprises a communication bus 802 to which are connected:

a central processing unit or microprocessor 804 (CPU, Central Processing Unit);

a ROM 806 (Read Only Memory) which may include the operating system and programs such as "Prog";

a RAM or cache memory 808 (RAM, Random Access Memory) having registers adapted to record variables and parameters created and modified during the execution of the aforementioned programs;

a video acquisition card 810 connected to a camera 812; and a graphics card 814 connected to a screen or a projector 816.

Optionally, the device 800 can also have the following elements:

a hard disk 820 which may include the aforementioned "Prog" programs and data processed or to be processed according to the invention;

a keyboard 822 and a mouse 824 or any other pointing device such as an optical pen, a touch screen or a remote control, enabling the user to interact with the programs according to the invention, in particular during installation and/or initialization;

a communication interface 826 connected to a distributed communication network 828, for example the Internet network, the interface being able to transmit and receive data; and,

a memory card reader (not shown) adapted to read or write data processed or to be processed according to the invention.

The communication bus allows communication and interoperability between the various elements included in the device 800 or connected to it. The representation of the bus is not limiting and, in particular, the central unit is able to communicate instructions to any element of the device 800 directly or via another element of the device 800.

The executable code of each program enabling the programmable device to implement the processes according to the invention can be stored, for example, in the hard disk 820 or in the read-only memory 806. According to one variant, the executable code of the programs may be received via the communication network 828, via the interface 826, to be stored in the same manner as that described previously.

More generally, the program(s) may be loaded into one of the storage means of the device 800 before being executed.

The central unit 804 controls and directs the execution of the instructions or portions of software code of the program or programs according to the invention, which are stored in the hard disk 820, in the read-only memory 806 or in the other aforementioned storage elements. On power-up, the program or programs stored in a non-volatile memory, for example the hard disk 820 or the read-only memory 806, are transferred into the random access memory 808, which then contains the executable code of the program or programs according to the invention, as well as registers for storing the variables and parameters necessary for implementing the invention.

It should be noted that the communication apparatus comprising the device according to the invention can also be a programmed apparatus. This apparatus then contains the code of the computer program or programs, for example fixed in an application-specific integrated circuit (ASIC).

Naturally, to meet specific needs, a person skilled in the field of the invention may apply modifications to the foregoing description. In particular, although the invention has mainly been described in the context of augmented reality applications, it can be used to trigger specific actions such as opening Internet pages or playing video or sound files. It can also be used, for example, to locate a user in a real environment.

Claims

1. A method for real-time identification of at least one form, at least a partial representation of which is present in an image, for an information processing device, said method being characterized in that it comprises the following steps:

detecting (605) at least one point of interest in said image and extracting at least one image portion of said image according to the position of said at least one point of interest;

determining (610, 615) at least one characteristic of said at least one image portion;

generating (620) at least one signature according to said at least one image portion and said at least one characteristic, said at least one signature being obtained from at least one previously determined statistical model;

identifying (625) at least one signature in a classification structure comprising a plurality of signatures, each signature of said plurality of signatures being associated with a category, said at least one identified signature being identified according to said at least one generated signature, and identifying a category associated with said at least one identified signature, said at least one form being identified according to said identified category.

2. The method of claim 1 wherein said at least one statistical model is a multi-branched model based on structures using random binary tests, said at least one signature being generated based on at least one result of said binary tests.

3. The method of claim 1 or claim 2 wherein said step of determining at least one characteristic of said at least one image portion comprises a step of determining (610) a main direction of said at least one image portion.

4. Method according to claim 3, dependent on claim 2, wherein said at least one binary test of which said at least one result is used to generate said at least one signature is determined according to said main direction.

5. The method of claim 3 or claim 4 wherein said step of determining at least one feature of said at least one image portion comprises a step of determining a sign of said at least one image portion, said sign being determined according to a variation of intensity in said at least one image portion.

6. The method of claim 5 wherein said intensity variation is estimated according to said main direction.

7. The method of claim 5 or claim 6 further comprising a step of selecting said at least one statistical model according to said sign.

8. The method according to any one of claims 1 to 7, wherein said classification structure is a classification tree of the k-d tree, spill tree, k-means tree or vocabulary tree type.
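Of the structures listed in claim 8, the k-d tree is the simplest to sketch. The example below builds a k-d tree over numeric signature vectors, each stored with its category, and performs an exact nearest-neighbour search under squared Euclidean distance. The entry layout and function names are assumptions; a real-time system would more likely use an approximate search (for example spill trees or a bounded number of leaf visits).

```python
def build_kdtree(entries, depth=0):
    """entries: list of (vector, category). Returns a nested dict or None."""
    if not entries:
        return None
    axis = depth % len(entries[0][0])
    entries = sorted(entries, key=lambda e: e[0][axis])
    mid = len(entries) // 2
    return {
        "entry": entries[mid],
        "axis": axis,
        "left": build_kdtree(entries[:mid], depth + 1),
        "right": build_kdtree(entries[mid + 1:], depth + 1),
    }


def nearest(node, query, best=None):
    """Exact nearest neighbour; returns (squared distance, vector, category)."""
    if node is None:
        return best
    vector, category = node["entry"]
    dist = sum((a - b) ** 2 for a, b in zip(vector, query))
    if best is None or dist < best[0]:
        best = (dist, vector, category)
    axis = node["axis"]
    diff = query[axis] - vector[axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    if diff ** 2 < best[0]:   # the far side may still hold a closer entry
        best = nearest(far, query, best)
    return best
```

The category of the returned entry is the category identified for the generated signature.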

9. Method according to any one of claims 1 to 8, further comprising a step of validating said identified category, said validation step comprising a step of comparing a difference between said at least one identified signature and said at least one generated signature with a predetermined threshold.
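The validation of claim 9 compares a difference between two signatures with a predetermined threshold. The sketch assumes the difference is the sum of absolute differences between numeric signatures; both that measure and the threshold value are illustrative assumptions.

```python
def validate_match(identified_signature, generated_signature, threshold):
    """Accept the identified category only if the signatures are close enough.

    Assumed difference measure: sum of absolute component differences.
    """
    if len(identified_signature) != len(generated_signature):
        raise ValueError("signatures must have the same length")
    difference = sum(abs(a - b)
                     for a, b in zip(identified_signature, generated_signature))
    return difference <= threshold
```

Because the nearest stored signature is always found, even for points that belong to no known form, this check filters out matches that are nearest but still far.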

10. Method according to any one of claims 1 to 8, further comprising a step of validating said identified category, said validation step comprising a step of reprojecting a model of said at least one identified form according to an estimated pose of said at least one identified form.
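Claim 10 validates the category by reprojecting a model of the form with its estimated pose. A minimal sketch, assuming a pinhole camera with focal length `focal` and principal point `(cx, cy)`, a pose given as a 3x3 rotation matrix and a translation vector, and a hypothetical mean-error tolerance in pixels, could be:

```python
import math


def project(point, rotation, translation, focal, cx, cy):
    """Pinhole projection of a 3-D model point with pose (R, t)."""
    x, y, z = [sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
               for i in range(3)]
    return (focal * x / z + cx, focal * y / z + cy)


def validate_by_reprojection(model_points, image_points, rotation, translation,
                             focal, cx, cy, max_error=3.0):
    """Accept the pose if the mean reprojection error is below max_error."""
    errors = []
    for model_pt, image_pt in zip(model_points, image_points):
        u, v = project(model_pt, rotation, translation, focal, cx, cy)
        errors.append(math.hypot(u - image_pt[0], v - image_pt[1]))
    if not errors:
        return False   # nothing to check against
    return sum(errors) / len(errors) < max_error
```

The 2-D points are those detected in the current image and matched to the model points through the identified categories.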

11. A method of constructing a statistical model for the real-time identification, by an information processing device, of at least one form of which at least a partial representation is present in an image, the method being characterized in that it comprises the following steps:

selecting at least one image and detecting (305) at least one point of interest in said selected image and extracting at least one image portion of said selected image according to the position of said at least one point of interest;

generating (310) a plurality of image portions by deformation of said at least one image portion;

determining (315, 320) at least one characteristic of said at least one image portion and each image portion of said plurality of image portions; and,

constructing (325) a statistical model associating image portions, chosen from said at least one image portion and said plurality of image portions, with a class associated with said at least one detected point of interest according to characteristics of said at least one image portion and/or said plurality of image portions.
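Claim 11 builds a statistical model that associates image portions with the class of their point of interest. Assuming a random-fern model, one common way to do this is to count, for each class and each fern, how often each leaf is reached by that class's training patches, then normalise with add-one smoothing. The sketch below does that and adds a naive-Bayes classifier for illustration; the structures and names are assumptions, not the patent's implementation.

```python
import math


def fern_indices(patch, ferns):
    """Leaf index of each fern: binary pixel comparisons packed into an int."""
    indices = []
    for fern in ferns:
        index = 0
        for (r1, c1), (r2, c2) in fern:
            index = (index << 1) | (1 if patch[r1][c1] < patch[r2][c2] else 0)
        indices.append(index)
    return indices


def train_model(patches_by_class, ferns):
    """Estimate P(leaf | class) for every fern, with add-one smoothing."""
    model = {}
    for label, patches in patches_by_class.items():
        per_fern = []
        for fern in ferns:
            counts = [1] * (2 ** len(fern))          # uniform prior
            for patch in patches:
                counts[fern_indices(patch, [fern])[0]] += 1
            total = sum(counts)
            per_fern.append([c / total for c in counts])
        model[label] = per_fern
    return model


def classify(model, patch, ferns):
    """Class maximising the sum of log-probabilities over ferns."""
    leaves = fern_indices(patch, ferns)
    return max(model, key=lambda label: sum(
        math.log(model[label][f][leaf]) for f, leaf in enumerate(leaves)))
```

In this sketch, each class would receive the original patch around its point of interest together with the deformed copies generated from it.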

12. The method of claim 11, wherein said deformation is an affine or projective deformation.
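The deformation of claim 12 can be illustrated with an affine warp about the patch centre. The sketch assumes the transform is a rotation composed with a scale-and-shear matrix, uses inverse mapping with nearest-neighbour sampling, and fills pixels that fall outside the patch with 0. The parameter ranges in `training_views` are hypothetical. A projective deformation would replace the 2x2 matrix with a 3x3 homography.

```python
import math
import random


def affine_warp(patch, angle, scale, shear=0.0):
    """Warp a square patch about its centre with A = R(angle) @ [[s, sh], [0, s]]."""
    size = len(patch)
    centre = (size - 1) / 2.0
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    a, b = cos_a * scale, cos_a * shear - sin_a * scale
    c, d = sin_a * scale, sin_a * shear + cos_a * scale
    det = a * d - b * c                      # equals scale**2, non-zero if scale != 0
    out = [[0] * size for _ in range(size)]
    for r in range(size):
        for col in range(size):
            x, y = col - centre, r - centre
            # Inverse mapping: find the source pixel for each output pixel.
            sx = (d * x - b * y) / det + centre
            sy = (-c * x + a * y) / det + centre
            si, sj = int(round(sy)), int(round(sx))
            if 0 <= si < size and 0 <= sj < size:
                out[r][col] = patch[si][sj]
    return out


def training_views(patch, num_views, seed=0):
    """Generate randomly deformed copies of a patch for training."""
    rng = random.Random(seed)
    return [affine_warp(patch, rng.uniform(-math.pi, math.pi),
                        rng.uniform(0.7, 1.3), rng.uniform(-0.2, 0.2))
            for _ in range(num_views)]
```

Generating views synthetically means a single reference image is enough to cover a range of viewpoints.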

13. The method of claim 11 or claim 12, wherein said step of determining at least one characteristic of said at least one image portion and of each image portion of said plurality of image portions comprises a step of determining (610) a main direction of said at least one image portion and of each image portion of said plurality of image portions.

14. The method of claim 13, wherein said step of determining at least one characteristic of said at least one image portion and of each image portion of said plurality of image portions comprises a step of determining a sign of said at least one image portion and of each image portion of said plurality of image portions, said sign being determined according to an intensity variation in the corresponding image portion.

15. The method of claim 14 wherein said sign is determined according to a variation of intensity related to said main direction.

16. The method of any one of claims 10 to 15, further comprising a step of modifying said statistical model to delete data relating to a class similar to another class.
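Claim 16 removes data for classes that are too similar to another class, for example points on a repetitive texture that the model cannot tell apart. Assuming the model stores, per class, one leaf-probability list per fern, a minimal sketch keeps classes in insertion order and drops any class whose mean L1 distance to an already kept class falls below a hypothetical threshold. The similarity measure is an assumption.

```python
def remove_similar_classes(model, threshold):
    """Drop classes whose leaf distributions are too close to a kept class.

    model: {label: [per-fern probability list, ...]}. Distance is the mean
    L1 distance over ferns (an assumed similarity measure).
    """
    kept = {}
    for label, dists in model.items():
        too_close = any(
            sum(sum(abs(p - q) for p, q in zip(df, kf))
                for df, kf in zip(dists, kept_dists)) / len(dists) < threshold
            for kept_dists in kept.values())
        if not too_close:
            kept[label] = dists
    return kept
```

Removing ambiguous classes both shrinks the model and avoids matches that could not be resolved reliably anyway.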

17. A method of constructing a classification structure comprising a plurality of signatures, each signature of said plurality of signatures being associated with a category, for the real-time identification, by an information processing device, of at least one form of which at least a partial representation is present in an image, the method being characterized in that it comprises the following steps:

selecting at least one image comprising at least a partial representation of said at least one form, detecting (505) at least one point of interest in said at least one image and extracting at least one image portion of said at least one image according to the position of said at least one point of interest;

determining (510, 515) at least one characteristic of said at least one image portion;

generating (520) at least one signature according to said at least one image portion and said at least one characteristic, said at least one signature being obtained from at least one predetermined statistical model;

creating (525) said classification structure from said at least one signature and a category associated with said at least one point of interest and said at least one form.

18. Computer program comprising instructions adapted to the implementation of each of the steps of the method according to any one of the preceding claims when said program is executed on a computer.

19. Device comprising means adapted to the implementation of each of the steps of the method according to any one of claims 1 to 17.