EP2441048A1 - Methods and devices for identifying real objects, tracking the representation of these objects and for augmented reality, in a sequence of images, in client-server mode - Google Patents
Methods and devices for identifying real objects, tracking the representation of these objects and for augmented reality, in a sequence of images, in client-server mode
- Publication number
- EP2441048A1 (application EP10734232A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- image
- real object
- objects
- representation
- real
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G06T2200/16—Indexing scheme for image data processing or generation, in general involving adaptation to the client's capabilities
Definitions
- The present invention relates to the combination of real and virtual images in real time, in an augmented reality system, and more particularly to methods and devices for identifying real objects, for tracking the representation of these objects and for augmented reality, in a sequence of images, in client-server mode, in particular to allow the implementation of augmented reality applications in remote devices, notably mobile devices with limited resources.
- Augmented reality is intended to insert one or more virtual objects in the images of a video stream.
- The position and orientation of these virtual objects can be determined by data external to the scene represented by the images, for example coordinates directly derived from a game scenario, or by data related to certain elements of this scene, for example the coordinates of a particular point of the scene such as the hand of a player or a decorative element.
- the virtual objects inserted correspond to real elements present in the scene. Thus, it may be necessary to perform a preliminary step of identifying or recognizing these real objects present in the scene, especially when these objects are numerous.
- When the position and orientation of the virtual objects are determined by data related to certain elements of the scene, it may be necessary to track these elements according to the movements of the camera or the movements of these elements themselves in the scene.
- The operations of tracking elements and of inserting virtual objects into the real images can be executed by separate computers or by the same computer.
- The objective of visual tracking or sensor-based tracking algorithms is to find, very accurately, in a real scene, the pose, i.e. the position and the orientation, of an object whose geometry information is available or, equivalently, to find the extrinsic position and orientation parameters of a camera filming this object, by means of image analysis.
- Tracking algorithms, also called target tracking algorithms, may use a marker, which can be visual, or other means such as sensors, preferably wireless sensors of radio-frequency or infrared type.
- some algorithms use pattern recognition to track a particular element in an image stream.
- The École Polytechnique Fédérale de Lausanne (EPFL) has developed a visual tracking algorithm that does not use a marker and whose originality lies in the matching of particular points between the current image of a video stream and a keyframe obtained at the initialization of the system, as well as a keyframe updated during the execution of the visual tracking.
- The principle of this algorithm is described, for example, in the article entitled "Fusing Online and Offline Information for Stable 3D Tracking in Real Time" - Luca Vacchetti, Vincent Lepetit, Pascal Fua - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004.
- The objective of this visual tracking algorithm is to find, in a real scene, the pose of an object whose three-dimensional mesh is available in the form of a 3D model or, equivalently, to find the extrinsic position and orientation parameters of a camera filming this object, again by means of image analysis.
- The current image is here compared with one or more recorded keyframes in order to find a large number of matches between these pairs of images and thus estimate the pose of the object.
- a keyframe is composed of two elements: a captured image of the video stream and a pose (orientation and position) of the real object appearing in this image.
- The keyframes are images extracted from the video stream in which the object to be tracked has been placed manually through the use of a pointing device such as a mouse. Keyframes preferably characterize the pose of the same object in several images. They are created and saved "offline", that is to say outside the steady state of the tracking application. It is interesting to note that, for targets or objects of planar type, for example a magazine, these keyframes can be generated directly from an available image of the object, for example in JPEG or bitmap format. In the case of non-planar targets, for example a face, it is possible to generate a set of keyframes from a textured three-dimensional model of the object.
- Each offline keyframe includes an image in which the object is present, a pose characterizing the location of that object, and a number of points of interest that characterize the object in the image.
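By way of illustration, a minimal Python sketch of such a keyframe container follows; the field names and the 4x4 pose representation are assumptions made for the example, not taken from the text:

```python
from dataclasses import dataclass, field
from typing import List, Tuple
import numpy as np

@dataclass
class OfflineKeyframe:
    """Hypothetical container for an offline keyframe: a captured image,
    the pose of the real object in that image, and the points of
    interest characterizing the object (field names are illustrative)."""
    image: np.ndarray                     # grayscale capture from the video stream
    pose: np.ndarray                      # 4x4 object-to-camera transform T_p->c
    keypoints: List[Tuple[float, float]]  # (u, v) points of interest in the image
    patches: List[np.ndarray] = field(default_factory=list)  # windows around each point
```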
- Points of interest are, for example, built from a Harris point detector or from detectors of SURF (Speeded-Up Robust Features), SIFT (Scale-Invariant Feature Transform), YAPE (Yet Another Point Extractor) or MSER (Maximally Stable Extremal Regions) type, and represent locations with high values of directional gradients in the image, together with a description of the variation of the image in the vicinity of these points.
- These points of interest may also be associated with descriptors that describe the neighborhood of a point. These descriptors can take the form of image portions (Harris) or of gradient values in a neighborhood of the point.
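For illustration only, a minimal Python sketch of this detection step, using OpenCV's SIFT implementation as a stand-in for the detectors named above (any of Harris, SURF, YAPE or MSER could take its place):

```python
import cv2

def detect_points_of_interest(gray_image):
    """Detect points of interest and compute descriptors on a grayscale
    image; SIFT is chosen here as one of the detectors cited in the text."""
    detector = cv2.SIFT_create()
    keypoints, descriptors = detector.detectAndCompute(gray_image, None)
    return keypoints, descriptors
```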
- The manual preparation phase thus consists in finding a first estimate of the pose of the object in an image extracted from the video stream, which amounts to formalizing the initial affine transformation T_{p→c}, the transition matrix from the reference frame associated with the tracked object to the reference frame attached to the camera.
- The offline keyframes are processed in order to position points of interest according to the parameters chosen when launching the application. These parameters are specified empirically for each type of use and allow the detection and matching application to be adapted so as to obtain a better quality of estimation of the pose of the object according to the characteristics of the real environment. Then, when the real object in the current image is in a pose that is close to the pose of the same object in one of the offline keyframes, the number of matches becomes large. It is then possible to find the affine transformation making it possible to align the three-dimensional model of the object with the real object. When such a match has been found, the tracking algorithm goes into steady state.
- The movements of the object are tracked from one image to the next, and drifts are compensated using the information contained in the offline keyframe retained during initialization.
- this offline keyframe can be reprojected using the estimated pose of the previous image.
- This reprojection thus makes it possible to have a key image that contains a representation of the object similar to that of the current image and can thus allow the algorithm to operate with points of interest and descriptors that are not robust to rotations.
- The tracking application thus combines two distinct types of algorithms: a detection of points of interest, for example a modified version of Harris point detection or detection of SIFT or SURF points, and a technique for reprojecting the points of interest positioned on the three-dimensional model onto the image plane.
- A point p of the image is the projection of a point P of the real scene, with p ∝ P_I · P_E · T_{p→c} · P, where:
- P_I is the matrix of the intrinsic parameters of the camera, i.e. its focal length, the image center and the offset;
- P_E is the matrix of the extrinsic parameters of the camera, that is to say the position of the camera in real space; and
- T_{p→c} is the transition matrix from the reference frame associated with the tracked object to the reference frame attached to the camera.
- The tracking problem therefore consists in determining the matrix T_{p→c}, i.e. the position and the orientation of the object with respect to the reference frame of the camera.
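By way of illustration, a minimal Python sketch of this projection, under the assumption that P_E and T_{p→c} are represented as 4x4 homogeneous transforms and P_I as a 3x3 intrinsic matrix K:

```python
import numpy as np

def project_point(P, K, P_E, T_pc):
    """Project a 3D object point P to pixel coordinates following
    p ~ P_I . P_E . T_p->c . P; K is the 3x3 intrinsic matrix P_I,
    P_E and T_pc are assumed to be 4x4 homogeneous transforms."""
    P_h = np.append(np.asarray(P, float), 1.0)  # homogeneous object point
    P_cam = (P_E @ T_pc @ P_h)[:3]              # point in the camera frame
    uvw = K @ P_cam                             # apply the intrinsic parameters
    return uvw[:2] / uvw[2]                     # perspective division
```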
- Error minimization is used in order to find the best solution for the estimation of T_{p→c}, using the set of correspondences between three-dimensional points of the geometric model and two-dimensional (2D) points in the current image and in the keyframe.
- An algorithm of RANSAC (RANdom SAmple Consensus) or PROSAC (PROgressive SAmple Consensus) type can be used for this purpose.
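As an illustration of such a robust estimation, a short Python sketch follows; OpenCV's solvePnPRansac is used here as a stand-in for the RANSAC/PROSAC scheme, not as the implementation described in the text:

```python
import numpy as np
import cv2

def estimate_pose_ransac(points_3d, points_2d, K):
    """Estimate the object pose from 3D model points and their 2D matches
    in the current image, rejecting outlier correspondences with RANSAC."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(points_3d, np.float32),
        np.asarray(points_2d, np.float32),
        K, distCoeffs=None)
    return (rvec, tvec, inliers) if ok else None
```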
- The applicant has developed a visual tracking algorithm for objects that does not use a marker and whose originality lies in the matching of particular points between the current (and previous) image of a video stream and a set of keyframes obtained automatically when the system is started.
- Such an algorithm is in particular described in the French patent application FR 2 911 707.
- This algorithm makes it possible, in a first step, to identify the object positioned in front of the camera and then to initialize, completely automatically and without positioning constraints, the process of tracking the object.
- This algorithm makes it possible in particular to identify and track a large number of objects present at the same time in a video stream, and thus allows the identification and tracking of targets or objects in a real scene. It is particularly robust to occlusions and makes it possible to track objects whose 2D projections in the image overlap.
- These objects can be of different geometries and have various colorimetric aspects. By way of non-limiting example, they may be textured planes, faces, clothes, natural scenes, television studios or buildings.
- These object tracking implementations in a client-server mode have certain limitations. In particular, they can only identify an object whose representation is present in an image against a very limited number of candidate objects.
- The processing load on the servers quickly becomes very high when a large number of clients transmit requests. For example, in order to implement an augmented reality application, the server must receive all the images of the video stream of its clients, insert synthetic objects, and retransmit this video stream to the clients.
- the invention solves at least one of the problems discussed above.
- The subject of the invention is thus a method for computer identification of at least one real object whose representation is present in at least one image, in real time, in a client-server architecture, this method comprising the following steps: - acquisition of said at least one image;
- the method according to the invention thus makes it possible to identify in real time real objects present in the field of a camera while benefiting from a client-server architecture. According to this method, the resources necessary for the identification of real objects are distributed between the client and the server while limiting the volume of data exchanged.
- Said step of determining at least one singular element of said at least one image is repeated for a plurality of images, said at least one item of information relating to said at least one singular element being estimated according to at least one singular element determined in several images of said plurality of images, in order to improve the recognition of real objects.
- The method further comprises a step of transmitting to said server information relating to the pose and/or stability of the device implementing said step of acquiring said at least one image and/or to the nature of said at least one real object, in order to accelerate and/or optimize the identification of real objects.
- The method further comprises a step of receiving at least one item of data making it possible to recognize a representation of said at least one real object in at least one second image distinct from said at least one image, called at least one first image, in response to said transmission, in order to track said real object.
- The method according to the invention can thus be used to track identified objects in real time.
- The method preferably further comprises a step of receiving at least one piece of data to be inserted into an image comprising a representation of said identified real object, and a step of inserting said at least one piece of data into said image comprising said representation of said identified real object.
- the method thus allows the implementation of augmented reality applications.
- The invention also relates to a method for computer identification of at least one real object whose representation is present in at least one image, in real time, in a client-server type architecture, this method comprising the following steps,
- the method according to the invention thus makes it possible to identify in real time real objects present in the field of a camera while benefiting from a client-server architecture.
- the resources necessary for the identification of real objects are distributed between the client and the server while limiting the volume of data exchanged. It also allows a centralized update, in real time, of the database used to identify real objects.
- said creation step comprises a step of developing a decision structure to accelerate the identification of real objects.
- The method further comprises a step of receiving at least one item of complementary information, said at least one item of complementary information being used in said identification step and relating to the pose and/or stability of the device acquiring said at least one image and/or to the nature of said at least one real object, in order to accelerate and/or optimize the identification of real objects.
- Said at least one item of information received relates to at least two singular elements of at least one image, said step of identifying said at least one real object comprising a step of analyzing the relative position of said at least two singular elements, in order to improve the identification of real objects among a large number of real objects.
- The method further comprises a step of estimating the pose of said at least one identified real object, said at least one item of information relating to said identification of said at least one real object comprising said estimated pose, in order to facilitate the identification of real objects in other images.
- The method preferably further comprises a step of transmitting at least one item of information relating to said at least one identified real object, making it possible to recognize a representation of said at least one real object in at least one second image distinct from said at least one image, called first image, in order to track said real object.
- The method further comprises a step of determining at least one characteristic datum of at least one representation of said at least one identified real object, said at least one item of information relating to said identified real object comprising said characteristic datum, in order to improve the tracking of identified real objects.
- the method further comprises a service activation step associated with said at least one identified real object.
- the method further comprises a step of transmitting at least one piece of data to be inserted in an image comprising a representation of said real object identified to allow the implementation of augmented reality applications.
- said at least one information relating to at least one singular element of an image comprises an image portion relative to said singular point.
- the invention also relates to a computer program comprising instructions adapted to the implementation of each of the steps of the method described above when said program is executed on a computer and a device comprising means adapted to the implementation of each step of this process.
- FIG. 1 illustrates an example of an environment in which the invention can be implemented
- FIG. 2 illustrates certain steps executed in a client and in a server to implement the invention
- FIG. 3 illustrates an example of determination of the data used to identify a planar object in images
- FIG. 4 illustrates a particular implementation of the algorithm illustrated in FIG. 2;
- FIG. 5 illustrates an exemplary device adapted to implement the invention or a part of the invention.
- FIG. 1 illustrates an example of environment 100 in which the invention can be implemented.
- the servers 105-1 and 105-2, generically referenced 105 are here accessed by the clients 110-1 to 110-3, generically referenced 110, via a network 115, for example the Internet network.
- The clients 110 are, for example, mobile devices such as mobile phones, personal digital assistants (PDAs), and portable personal computers (laptops or notebooks).
- The invention can also be implemented with fixed devices, including standard personal (desktop) computers.
- The connections between the clients 110-1 and 110-2 and the network 115 are here wireless connections, for example GSM (Global System for Mobile Communications) or WiFi connections, while the connections between the client 110-3 and the network 115, as well as between the servers 105-1 and 105-2, are wired connections, for example Ethernet connections.
- The servers 105 are used to compute and store data making it possible to identify real objects from descriptors, to identify these objects from descriptors (during a steady-state phase), to create and store media associated with these objects, used to augment real images according to the pose of these objects, and to generate data allowing the clients 110 to track these objects in a sequence of images (during a steady-state phase).
- The mobile devices are essentially used to acquire image sequences, to detect descriptors in these images and, possibly, to send them, to track representations of objects in these images according to received data, and to insert received media into these images according to the pose of the detected objects. It is observed that the expression server here designates a server or a group of servers.
- Figure 2 illustrates more precisely some steps performed in a client 200 and in a server 205 to implement the invention.
- a first step, executed in the server 205, is to create data for identifying the representation of objects in images (step 210).
- This step is advantageously performed offline during a learning phase. However, it can also be performed, at least partially, during a steady state phase to allow the dynamic addition of new information related to new objects or not.
- This step is repeated for each object to be identified later. It is, in principle, performed only once per object. Nevertheless, it can be repeated, in particular, when the object is modified or its identification can be improved.
- This data can be organized according to classification trees such as binary decision trees (see for example the article "Keypoint Recognition using Randomized Trees", V. Lepetit and P. Fua, EPFL, 2006) or according to structures with multiple ramifications, also called ferns-type decision trees (see for example the article "Fast Keypoint Recognition using Random Ferns", M. Ozuysal, P. Fua and V. Lepetit).
- Another step is to create media associated with the objects that can be identified (step 220). These media are here stored in the database 215. They represent, for example, text, sounds, still or moving images and/or three-dimensional models, textured or not, animated or not.
- Step 220 is advantageously performed offline, during a learning phase, in the server 205. Nevertheless, this step may also occur during the steady state of the application in order to avoid a service outage. Moreover, steps 210 and/or 220 may be performed in a remote server so as not to disturb the operation of the server 205. It should be noted here that steps 210 and 220 may be executed one after the other, in either order, or in parallel. These two steps make it possible to create a database that is used during the steady state of the augmented reality application.
- a first step of the augmented reality application is to acquire, from the client 200, an image (step 225).
- an image sensor associated with the "client” device enables the acquisition of images in real time with a frequency and a resolution that depend on the technical characteristics of the device.
- the captured images are then preferably converted to grayscale in order to improve the robustness, in particular to the variations in brightness and the unequal technical characteristics of the image sensors, and to simplify the calculations.
- Images acquired and possibly converted can also be sub-sampled and filtered, for example with a "low-pass” filter, to optimize calculation times and reduce the noise present in the image.
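A minimal sketch of this preprocessing, in Python with OpenCV (the scale factor and kernel size are illustrative choices, not values from the text):

```python
import cv2

def preprocess_frame(frame, scale=0.5):
    """Convert a captured frame to grayscale, sub-sample it, and apply a
    low-pass (Gaussian) filter to reduce noise and computation time."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, None, fx=scale, fy=scale,
                       interpolation=cv2.INTER_AREA)
    return cv2.GaussianBlur(small, (5, 5), 0)
```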
- the image acquired and converted is called the current image.
- In a next step, implemented in the client 200, the points of interest of the current image are detected in order to construct descriptors from them.
- A point-of-interest (keypoint) detector is used for this purpose. This may be a Harris point detector or a detector of MSER, FAST, SURF, SIFT or YAPE type.
- descriptors are constructed to allow for matching with points identified on reference objects.
- the descriptors are preferably determined on the "client" device 200 to prevent the transmission of images and to optimize the computing times of the servers.
- the determined descriptors or portions of extracted images are then transmitted to the server as illustrated by the arrow 235 to allow the identification of objects on the current image from this information.
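A hypothetical sketch of this client-side transmission follows; the wire format (JSON over a TCP socket, with a frame tag) is an assumption made for illustration, as the text does not specify a protocol:

```python
import base64
import json
import socket

def send_descriptors(host, port, frame_id, points):
    """Send the descriptors (here, extracted image portions) and their
    (u, v) positions for one image to the server; the message layout is
    hypothetical."""
    message = {
        "frame": frame_id,  # tag separating descriptors of successive images
        "points": [
            {"u": u, "v": v,
             "patch": base64.b64encode(patch.tobytes()).decode("ascii")}
            for (u, v, patch) in points
        ],
    }
    with socket.create_connection((host, port)) as sock:
        sock.sendall(json.dumps(message).encode("utf-8") + b"\n")
```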
- an object identification step is performed (step 240).
- The server uses these data, in whole or in part, to estimate and associate with them a probability of belonging to one of the objects registered in the system's database.
- An object is identified, for example, as soon as the estimated probability reaches a predetermined threshold.
- Object identification may end as soon as an object, or a predetermined number of objects, is identified, or may continue until all received data have been processed.
- When an object is identified, the server 205 generates information for tracking this object (step 245). Depending on the tracking algorithm used in the "client" device 200 and the type of object recognized, this may be an identifier of the object (for example a string of characters), the pose of the object, an identification of the number of the image in which the object was found (called a timecode), an image containing the object and its associated pose, or a set of descriptors describing the object, for example 2D/3D correspondences with an associated image portion, together with its geometry. As indicated by the arrow 250, this information is transmitted to the client 200.
- an object tracking algorithm implemented in the "client” device may be initialized to begin tracking the identified objects (step 255).
- This first information is, for example, the pose of the object in the image and its geometry.
- the information received afterwards, including the descriptors, is used "on the fly” by the object tracking algorithm to improve the results. Thus, it is not necessary for the client 200 to receive all the tracking information from the server 205 to begin tracking an object.
- When the server 205 has identified one or more objects, it transmits to the client 200 the media associated with the identified objects.
- When objects are tracked in a sequence of images, in the "client" device, it is possible to enrich these images (step 265). In other words, it is then possible to correlate the display of synthetic objects with the pose of the real objects present in the scene.
- These media may take the form of text superimposed on the video stream, images, video streams to be displayed, for example, on 3D elements present in the scene, audio streams and/or animated synthetic 3D objects.
- object identification can be achieved using decision trees, such as binary decision trees or an equivalent ferns approach.
- Registered objects can have geometries close to a plane or any 3D geometry. If the objects are substantially planar, the subset of views is generated using affine transformations. If the objects have any 3D geometry, a projective model can be used to generate the subset of views. This process makes it possible to obtain the index of the objects represented in the current image, but also to estimate their pose with six degrees of freedom (position and orientation).
- A large number of views are synthesized from at least one image of the target object. Points of interest are extracted from each of these views. If the object is substantially planar, the views are synthesized by applying an affine transformation to the image. If, on the other hand, the object is of arbitrary geometry, the views are synthesized from a textured 3D model of it using conventional rendering techniques, for example OpenGL (Open Graphics Library, originally developed by Silicon Graphics Inc.) or Direct3D (developed by Microsoft).
- Points of interest k are detected on an available image of the object, for example with a YAPE detector, and a small image portion around each point, or window, for example a block of 32x32 pixels, is extracted.
- The image portions surrounding the points of interest k are assumed to be close to a plane and can therefore be processed with homographic transformations.
- a front view of the target object is sufficient to generate the subset of the new views.
- New views are generated by applying an affine transformation of the form A = R_θ · R_φ^{-1} · S · R_φ, where R_θ and R_φ are two rotation matrices respectively parameterized by the angles θ and φ, and S = diag(λ1, λ2) is a scaling matrix.
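A sketch of this view synthesis for a substantially planar object, in Python (in practice the angles and scales would be drawn randomly; here they are parameters of the example):

```python
import numpy as np
import cv2

def rotation_2d(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def synthesize_view(front_view, theta, phi, lambda1, lambda2):
    """Warp the front view of a planar object by the affine transformation
    A = R_theta . R_phi^-1 . S . R_phi with S = diag(lambda1, lambda2)."""
    A = (rotation_2d(theta) @ np.linalg.inv(rotation_2d(phi))
         @ np.diag([lambda1, lambda2]) @ rotation_2d(phi))
    M = np.hstack([A, np.zeros((2, 1))])   # 2x3 matrix expected by warpAffine
    h, w = front_view.shape[:2]
    return cv2.warpAffine(front_view, M, (w, h))
```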
- If the object is more complex, it is possible to use a 3D model built automatically, for example with a three-dimensional scanner, or a 3D model obtained using a modeler or 3D reconstruction software based on multiple images of the same object, such as ImageModeler from RealViz (ImageModeler is a trademark).
- This template is then used to generate new views with standard rendering techniques. When all views are generated, a set of image portions corresponding to different views is obtained for each point of interest.
- the most representative points of interest are selected.
- The points of interest k should have a high probability P(k) of being found in an image if they are visible, considering perspective distortions and the presence of noise.
- Here, T is the geometric transformation used to synthesize a new view; T is an affine or projective transformation.
- The probability P(k) can be estimated simply by counting the number of times the point is found, and the set of points of interest and associated image portions is then constructed using only the points of interest having a probability P(k) greater than a predetermined threshold.
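By way of illustration, a minimal sketch of this selection by counting; it assumes the re-detections have already been tallied per point, and the threshold value is arbitrary:

```python
def select_stable_points(detection_counts, n_views, threshold=0.5):
    """Keep only the points of interest k whose estimated probability
    P(k) = (number of views where k is found) / (number of views)
    exceeds a predetermined threshold."""
    return [k for k, count in detection_counts.items()
            if count / n_views > threshold]
```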
- A decision tree here contains nodes that represent a simple test to be performed in order to partition the space of image portions to be classified. This test consists, for example, in comparing the intensity I of two pixels (m1 and m2) belonging to a portion of an image, preferably smoothed by a Gaussian filter. Thus, if the intensity of the pixel m1, denoted I(m1), is greater than the intensity of the pixel m2, denoted I(m2), a first branch of the tree, starting from the considered node, is selected. In the opposite case, if I(m1) is less than I(m2), the other branch of the node is selected.
- Each leaf contains an estimate of the probability distribution, determined a posteriori, over all the classes, based on the training data.
- A new image portion is classified by making it descend the tree, performing on it the elementary test at each node by considering different pixels m1 and m2, preferably randomly chosen.
- A probability of belonging to a class is then assigned to it. This probability depends on the probability distribution stored in the leaf, which is updated during the learning phase.
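A short sketch of this descent, for illustration; the node layout (a dict holding a pixel pair and two children) is an assumed representation, not that of the cited articles:

```python
def classify_patch(tree, patch):
    """Walk a binary decision tree with a smoothed image portion: at each
    node compare the intensities of two pixels m1 and m2 and branch
    accordingly; the reached leaf holds a class probability distribution."""
    node = tree
    while "distribution" not in node:
        (y1, x1), (y2, x2) = node["pixel_pair"]
        node = node["left"] if patch[y1, x1] > patch[y2, x2] else node["right"]
    return node["distribution"]   # posterior probabilities over the classes
```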
- A single tree cannot usually describe all possible tests and may lead to insufficient results.
- Several trees are constructed from a subset of image portions randomly drawn from the set of image portions corresponding to the points of interest found. Each of the trees thus describes its own partition of the space of image portions.
- the responses of each tree are combined to classify a portion of the image.
- the trees preferably have a limited depth, for example 15.
- The matches found in this real-time classification are used to estimate a planar homography using standard minimization techniques.
- A RANSAC type algorithm can, for example, be used to make a first estimate of the affine transformation in order, in particular, to reject false matches and to estimate a first pose of the object in the image. A Levenberg-Marquardt nonlinear minimization can then be used to refine this first estimate. It is important to note that a large number of objects can be registered in the system (tens of thousands). A statistical consequence is that some objects of this recognition database may have similarities and cause errors during the probability estimation phase. In order to remedy this problem, the geometric information on the relative position of the points of interest described above is advantageously used in order to choose the best candidate among the potential identifications.
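For illustration, a sketch of this estimation chain with OpenCV; cv2.findHomography's RANSAC stage plays the role of the first robust estimate, and OpenCV internally refines the inlier fit, in the spirit of the Levenberg-Marquardt step described above:

```python
import numpy as np
import cv2

def estimate_planar_pose(model_points, image_points):
    """Estimate the homography between matched points of a planar object
    and the current image, rejecting false matches with RANSAC."""
    H, inlier_mask = cv2.findHomography(
        np.asarray(model_points, np.float32),
        np.asarray(image_points, np.float32),
        cv2.RANSAC, ransacReprojThreshold=3.0)
    return H, inlier_mask   # inlier_mask flags the matches kept as inliers
```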
- the data used to identify objects comprise pairs of points of interest and portions of images, organized in the form of decision trees.
- Figure 3 illustrates an example of determining data used to identify a planar object in images.
- The representation 300 of a game card is here converted to reduce its dynamic range.
- the color image 300 is converted to the grayscale image 305. As indicated above, such a conversion notably makes it possible to improve the robustness of the identification of objects.
- the points of interest of the image 305 are then determined according to a standard algorithm implementing, for example, a Harris detector which makes it possible to find, in particular, point 310.
- Affine transformations and/or other types of transformations, in particular degradations, are then applied to the image 305.
- the image 305 as well as the resulting images of these transformations are then used for the extraction of portions of images around the determined points of interest. Examples of such image portions extracted around point 310 are shown in Figure 3 with references 315-1 through 315-4. It is thus possible to extract all the portions of images (reference
- an object tracking algorithm comprising a step of identifying objects using binary type decision trees or ferns structures is combined with a scene augmentation algorithm according to a client-server mode.
- Such an embodiment is illustrated in Figure 4, between a client 400 and a server 405. It corresponds to a particular implementation of the algorithm illustrated in Figure 2.
- a first step, performed in the server 405, is to create data for identifying the representation of objects in an image (step 410). As indicated above, this step, advantageously performed offline during a learning phase, is repeated for each object to be identified later.
- The data created here, points of interest associated with portions of images, are organized into binary decision trees or into multi-branch structures of ferns type, the main difference between these two approaches residing in the construction of the model used for the probability search.
- This classification in the form of decision structures allows fast matching with points of interest of a current image and associated image portions received by the server.
- The image acquisition and, possibly, conversion step implemented in the "client" device 400, making it possible to obtain a current image (step 415), is similar to step 225 described previously.
- In a next step (step 420), implemented in the client 400, the points of interest of the current image are detected in order to construct descriptors from them.
- points of interest of YAPE type are used.
- the associated descriptors correspond to the image portions around these points.
- Alternatively, a Harris, MSER, FAST, SURF or SIFT point-of-interest detector can be used.
- The descriptors, as well as their position (u, v) in the current image, are transmitted, preferably as a continuous stream, to the server 405, as indicated by the arrow 425. Descriptors of several successive images can be sent; the descriptors corresponding to different images are advantageously separated by tags indicating this.
- When the server 405 begins to receive these descriptors and their associated position in the image, it proceeds, for each of them, to a matching phase to identify objects (step 430). For each object contained in the database of objects to be recognized, it updates a probability measure. When the probability measure is sufficiently high for at least one given object, the identification phase ends and the identifier of the recognized object is transmitted to the client, as indicated by the arrow 435.
- the "client” device 400 transmits descriptors as long as no object is recognized, that is to say as long as it does not receive object identifiers (step 440).
- A list of matches between points of interest of the current image and points of interest determined during the learning is obtained. Since the three-dimensional coordinates of the points determined during the learning are known (because the pose of the object in the learning image is known), it is possible to obtain a list of correspondences between the points of interest of the current image (two-dimensional coordinates) and those of the object in its own space (three-dimensional coordinates). It is then possible to deduce a pose of the object in the current image according to conventional methods such as those described above. This estimated pose is then transmitted to the client 400, as indicated by the arrow 450, together with the description of the geometry of this object, for example a wireframe model in the case of a non-planar object, or the two dimensions, length and width, in the case of a substantially planar object.
- the server 405 creates and transmits, as indicated by the arrow 465, a signature of the identified objects (step 460).
- To this end, the server builds this keyframe.
- The key image can advantageously be represented by a list of points of interest, each containing two planar coordinates (u, v), three coordinates (x, y, z) in the reference frame of the object, and an image portion around this point. Only this information can be transmitted to the client, without it being necessary to transmit the key image in its entirety.
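A hypothetical encoding of such a point list is sketched below; the JSON layout is an assumption made for the example, and only the content (the (u, v) coordinates, the (x, y, z) coordinates and the image portion) follows the text:

```python
import base64
import json

def keyframe_to_message(points):
    """Serialize a key image as a list of points of interest, each with its
    two image coordinates, its three object-frame coordinates, and the
    image portion around it, for transmission to the client."""
    return json.dumps([
        {"uv": [u, v], "xyz": [x, y, z],
         "patch": base64.b64encode(patch.tobytes()).decode("ascii")}
        for (u, v), (x, y, z), patch in points
    ])
```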
- If the implemented object tracking algorithm does not use a keyframe (see for example the approach described by D. Wagner in the document "Pose Tracking from Natural Features on Mobile Phones"), it is possible to send directly the part of the database containing the classification of the identified object. While such a solution has a significant cost in terms of the volume of data to be transmitted, of the order of one megabyte, it allows the client to use this data to reinitialize more easily when the object tracking stalls.
- Yet another solution is to recreate on the "client” device, the binary type decision trees or the ferns-type multi-branch structures as a function of a reference image transmitted by the server.
- While such a solution requires computing resources in the "client" device, it makes it possible to reduce considerably the volume of data exchanged between clients and servers and to facilitate the reinitialization of the object tracking algorithm in case of a stall.
- The tracking algorithm is here implemented by the "client" device (step 455). It may be based on the use of one or more keyframes received from the server or on the use of decision trees received from the server or built by the client. It should be noted that when one or more objects are identified, it is no longer necessary for the client to recognize an object placed in front of the camera. Nevertheless, the identification algorithm can be automatically called, for example, when no object has been tracked for a predetermined time or when the user indicates that he wishes to search for a new object, for example by using a button or the touch screen of the device.
- An augmented reality algorithm is used (step 470). It may be, for example, the DfusionAR software (DfusionAR is a trademark of Total Immersion), which performs this operation according to the characteristics of the device, including the 3D rendering application layers (e.g. DirectX or OpenGL).
- The algorithm described with reference to Figures 2 and 4 thus makes it possible to limit the transfer of data between clients and servers to the data necessary for the recognition, the object tracking and the augmentation of the image sequence, that is to say the video stream. It is not required to transmit all the images.
- The data may be transmitted incrementally: the identification of the object can be performed iteratively on the image portions considered, and the object tracking can begin as soon as enough descriptors are received. The same is true for the display of synthetic objects, which can be refined as the client receives the geometric and texture data.
- Information concerning the current image can also be transmitted to the server to simplify the recognition step.
- it may be the current image itself or a histogram representing object classification probabilities.
- In particular, it is possible to transmit to the server a set of contours obtained after the application of filters, for example of the "high-pass" type, in order to improve the rate of correct identification of the objects present in the recognition database 215.
- The user can choose to recognize and track objects of the "monument" type.
- The user can then connect to a group of servers, called a cluster, which gathers data relating to historical monuments.
- A location function, for example a GPS (Global Positioning System) function, can also be used to identify the approximate position of the user.
- Such information, transmitted to the server, can facilitate the step of identifying monuments or other buildings in the field of vision. This approach makes it possible to easily superimpose information on the video stream from the sensor, but also to locate the user more finely. Indeed, in this case, it is not only possible to find the position of the user but also to determine his orientation in relation to the world he is looking at.
- the invention also allows the quick identification and tracking of objects of simpler geometry, for example movie posters or album covers.
- Different servers can be deployed to provide services tailored to different types of requests.
- identifying and tracking objects does not require sending image sequences between clients and servers.
- The only image sequences that may need to be transmitted are those corresponding to the media used to augment the real scenes.
- conventional compression formats using, for example, the H.263, H.264 or MPEG4 codecs, may allow bandwidth gain.
- Other compression means may be used for sending audio data useful for scene augmentation.
- The SIP (Session Initiation Protocol) application layer can, for example, be used for sending data between clients and servers.
- IP (Internet Protocol) type networks can be used, in combination with telecommunication networks such as GSM and GPRS (General Packet Radio Service).
- 3G, 3G + connections for example based on UMTS technology (Universal Mobile Telecommunications System in English terminology), WiFi, LAN or Internet can be used to ensure communication between customers and customers.
- For example, in a museum, it is possible to use a set of mobile clients, for example of PDA type, connected to a remote server that gathers the information necessary for the recognition and tracking of all the paintings and sculptures visible to visitors. In this case, augmented reality makes it possible to enrich the information given to visitors on the history of a work of art.
- A device adapted to implement the invention or a part of the invention is illustrated in Figure 5.
- the device shown is preferably a standard device, for example a personal computer.
- the device 500 here comprises an internal communication bus 505 to which are connected:
- a central processing unit or microprocessor 510 (CPU, Central Processing Unit);
- a read-only memory 515 (ROM, Read Only Memory) that may include the programs necessary for the implementation of the invention;
- a random access memory or cache memory 520 (RAM, Random Access Memory) comprising registers adapted to record variables and parameters created and modified during the execution of the aforementioned programs;
- a communication interface 540 adapted to transmit and receive data to and from a communication network.
- the device 500 also preferably has the following elements:
- a hard disk 525 which may comprise the aforementioned programs and data processed or to be processed according to the invention.
- a memory card reader 530 adapted to receive a memory card 535 and to read or write to it data processed or to be processed according to the invention.
- the internal communication bus allows communication and interoperability between the various elements included in the device 500 or connected to it.
- the representation of the internal bus is not limiting and, in particular, the microprocessor is capable of communicating instructions to any element of the device 500 directly or via another element of the device 500.
- the executable code of each program enabling the programmable device to implement the processes according to the invention can be stored, for example, in the hard disk 525 or in the read-only memory 515.
- the memory card 535 may contain data as well as the executable code of the aforementioned programs which, once read by the device 500, is stored in the hard disk 525.
- the executable code of the programs may be received, at least partially, through the first communication interface 540, to be stored in the same manner as described above.
- program or programs may be loaded into one of the storage means of the device 500 before being executed.
- the microprocessor 510 will control and direct the execution of the instructions or portions of software code of the program or programs according to the invention, which instructions are stored in the hard disk 525 or in the read-only memory 515 or else in the other storage elements mentioned above.
- the program or programs that are stored in a non-volatile memory for example the hard disk 525 or the read-only memory 515, are transferred into the RAM 520 which then contains the executable code of the program or programs according to the invention, as well as registers for storing the variables and parameters necessary for the implementation of the invention.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Processing Or Creating Images (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR0902762A FR2946439A1 (fr) | 2009-06-08 | 2009-06-08 | Procedes et dispositifs d'identification d'objets reels, de suivi de la representation de ces objets et de realite augmentee, dans une sequence d'images, en mode client-serveur |
PCT/FR2010/051105 WO2010142896A1 (fr) | 2009-06-08 | 2010-06-04 | Procédés et dispositifs d'identification d'objets réels, de suivi de la représentation de ces objets et de réalité augmentée, dans une séquence d'images, en mode client-serveur |
Publications (1)
Publication Number | Publication Date |
---|---|
EP2441048A1 true EP2441048A1 (de) | 2012-04-18 |
Family
ID=41796144
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP10734232A Withdrawn EP2441048A1 (de) | 2009-06-08 | 2010-06-04 | Verfahren und vorrichtungen zur erkennung echter objekte, zur nachverfolgung der darstellung dieser objekte und für erweiterte realität bei einer bildsequenz in einem client-server-modus |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP2441048A1 (de) |
FR (1) | FR2946439A1 (de) |
WO (1) | WO2010142896A1 (de) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101487944B1 (ko) | 2010-02-24 | 2015-01-30 | 아이피플렉 홀딩스 코포레이션 | 시각 장애인들을 지원하는 증강 현실 파노라마 |
US20150070347A1 (en) * | 2011-08-18 | 2015-03-12 | Layar B.V. | Computer-vision based augmented reality system |
US9946963B2 (en) | 2013-03-01 | 2018-04-17 | Layar B.V. | Barcode visualization in augmented reality |
US20160189419A1 (en) * | 2013-08-09 | 2016-06-30 | Sweep3D Corporation | Systems and methods for generating data indicative of a three-dimensional representation of a scene |
FR3026534B1 (fr) * | 2014-09-25 | 2019-06-21 | Worou CHABI | Generation d'un film d'animation personnalise |
CN105069754B (zh) * | 2015-08-05 | 2018-06-26 | 意科赛特数码科技(江苏)有限公司 | 基于在图像上无标记增强现实的系统和方法 |
FR3066850B1 (fr) * | 2017-05-24 | 2019-06-14 | Peugeot Citroen Automobiles Sa | Procede de visualisation en trois dimensions de l’environnement d’un vehicule |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2008134901A1 (en) * | 2007-05-08 | 2008-11-13 | Eidgenössische Technische Zürich | Method and system for image-based information retrieval |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR2911707B1 (fr) | 2007-01-22 | 2009-07-10 | Total Immersion Sa | Procede et dispositifs de realite augmentee utilisant un suivi automatique, en temps reel, d'objets geometriques planaires textures, sans marqueur, dans un flux video. |
-
2009
- 2009-06-08 FR FR0902762A patent/FR2946439A1/fr not_active Withdrawn
-
2010
- 2010-06-04 EP EP10734232A patent/EP2441048A1/de not_active Withdrawn
- 2010-06-04 WO PCT/FR2010/051105 patent/WO2010142896A1/fr active Application Filing
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2008134901A1 (en) * | 2007-05-08 | 2008-11-13 | Eidgenössische Technische Zürich | Method and system for image-based information retrieval |
Non-Patent Citations (1)
Title |
---|
See also references of WO2010142896A1 * |
Also Published As
Publication number | Publication date |
---|---|
FR2946439A1 (fr) | 2010-12-10 |
WO2010142896A1 (fr) | 2010-12-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP2491532B1 (de) | Verfahren, computerprogramm, und vorrichtung für echtzeit-hybridverfolgung von objektdarstellungen in einer bildfolge | |
EP2455916B1 (de) | Auf nichtstarrem Tracking basierende Mensch-Maschine-Schnittstelle | |
EP3707676B1 (de) | Verfahren zur schätzung der installation einer kamera im referenzrahmen einer dreidimensionalen szene, vorrichtung, system mit erweiterter realität und zugehöriges computerprogramm | |
WO2010142896A1 (fr) | Procédés et dispositifs d'identification d'objets réels, de suivi de la représentation de ces objets et de réalité augmentée, dans une séquence d'images, en mode client-serveur | |
EP2828834B1 (de) | Modell und verfahren zur herstellung fotorealistischer 3d-modelle | |
FR2933218A1 (fr) | Procede et dispositif permettant de detecter en temps reel des interactions entre un utilisateur et une scene de realite augmentee | |
FR2911707A1 (fr) | Procede et dispositifs de realite augmentee utilisant un suivi automatique, en temps reel, d'objets geometriques planaires textures, sans marqueur, dans un flux video. | |
FR2913128A1 (fr) | Procede et dispositif de determination de la pose d'un objet tridimensionnel dans une image et procede et dispositif de creation d'au moins une image cle | |
WO2018185104A1 (fr) | Procede d'estimation de pose, dispositif, systeme et programme d'ordinateur associes | |
FR2983607A1 (fr) | Procede et dispositif de suivi d'un objet dans une sequence d'au moins deux images | |
EP4033399B1 (de) | It-gerät und verfahren zur schätzung der dichte einer menschenmenge | |
FR2911708A1 (fr) | Procede et dispositif de creation d'au moins deux images cles correspondant a un objet tridimensionnel. | |
EP2441046A2 (de) | Verfahren und vorrichtung zur kalibrierung eines bildsensors mit einem echt-zeit-system zur verfolgung von objekten in einer bildfolge | |
Czúni et al. | The use of IMUs for video object retrieval in lightweight devices | |
FR2946446A1 (fr) | Procede et dispositif de suivi d'objets en temps reel dans une sequence d'images en presence de flou optique | |
WO2021123209A1 (fr) | Procédé de segmentation d'une image d'entrée représentant un document comportant des informations structurées | |
CA3230088A1 (fr) | Procede de mise en relation d'une image candidate avec une image de reference | |
WO2012107696A1 (fr) | Procédés, dispositif et programmes d'ordinateur pour la reconnaissance de formes, en temps réel, à l'aide d'un appareil comprenant des ressources limitées | |
FR3129759A1 (fr) | Procédé d’aide au positionnement d’un objet par rapport à une zone cible d’une matrice de pixels d’une caméra, dispositif et programme d’ordinateur correspondants | |
FR2939541A1 (fr) | Procede de classification automatique de sequences d'images | |
FR2896605A1 (fr) | Procede et systeme d'identification et de poursuite automatique d'un objet dans un film pour la videosurveillance et le controle d'acces |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 20111229 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO SE SI SK SM TR |
|
DAX | Request for extension of the european patent (deleted) | ||
REG | Reference to a national code |
Ref country code: HK Ref legal event code: DE Ref document number: 1164515 Country of ref document: HK |
|
17Q | First examination report despatched |
Effective date: 20130827 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
|
18D | Application deemed to be withdrawn |
Effective date: 20140103 |
|
REG | Reference to a national code |
Ref country code: HK Ref legal event code: WD Ref document number: 1164515 Country of ref document: HK |