EP2374073A1 - System for searching visual information

Info

Publication number: EP2374073A1
Application number: EP09771343A
Authority: EP
Inventors: Younes Bennani; Mustapha Lebbah; Nistor Grozavu; Hamid Benhadda
Original assignee: Thales SA; Universite Sorbonne Paris Nord Paris 13
Current assignee: Thales SA; Universite Sorbonne Paris Nord Paris 13
Priority date: 2008-12-10
Filing date: 2009-12-09
Publication date: 2011-10-12
Also published as: CN102369525B; FR2939537A1; FR2939537B1; US20120131026A1; CN102369525A; US8666898B2; WO2010066774A1

Abstract

The invention relates to a system for searching information within a database (1) of a large size, including a processor (2) and inputs/outputs, said system being characterised in that the processor includes at least one of the following elements: a first module E1 for extracting descriptors associated with each object in the database, and for generating a table containing the objects and the value of a descriptor associated with an object for the descriptors selected for representing the object; a second module E2 for applying several classification algorithms SOMi to each of the tables T_Tk resulting from the module E1 in order to allocate to each object Oi a class number xij for each applied SOMi algorithm, for each descriptor category; a third module E3 adapted for merging the results from module E2 in order to determine a class number associated with an object Oi for each descriptor type Tk; a fourth module E4 for searching for which column SOMi of a table T_SI is closest to the column obtained during the first merging of step E3 and for selecting the closest SOMi map contained in the table, or best SOMi map; and a fifth module E5 adapted for merging the best SOMi maps and for applying an algorithm for searching for the best map SOMf to be transmitted to a display means (5).

Description

SYSTEM FOR SEARCHING VISUAL INFORMATION

The present invention relates to a system and a method for performing a visual information search of objects within a large multi-modal database (images, videos, signals, documents, etc.).

The invention relates generally to the visualization of images, of texts when the database consists of AFP dispatches, for example, or of audio signals when the database contains communication recordings, for example. More generally, the system according to the invention is used in the field of visual information search over multi-modal data, and makes browsing and searching in databases faster thanks to a better structuring of the database into homogeneous classes of its objects.

In the following description, the invention is illustrated by way of example on image data from the Wikipedia database. The term image is used to designate an image in a database, the image being described by several descriptors or attributes, such as its texture, its color, the text associated with this image, and so on.

The term "best map" is used to define a map with a high quality index. This index is calculated between the consensus partition obtained in step E5 described below and the initial maps obtained in step E2. It is possible to use different indices according to the descriptors extracted in step E1: correlation, purity index, rank index, etc.

The term "large size" refers to the two dimensions of a database (rows = observations and columns = variables), the number of rows being of the order of several million images and the number of columns of the order of several thousand. A SOM map is a map known to those skilled in the art: a self-adaptive or self-organizing map, which is a class of artificial neural network based on unsupervised learning methods. It is often referred to by the English term self-organizing map (SOM), or as a Kohonen map. The function of the algorithm implemented by a map is notably to classify objects.

The mass of data collected each day keeps growing. At present, studies estimate that the amount of information in the world doubles every twenty months. The Web and digital libraries are giving rise to new challenges in the areas of databases (DB) and information retrieval (IR) within these databases. In many applications, it becomes important, if not necessary, to facilitate access to information by means of a Web navigation aid system or a system for assisting in the formulation of queries for database searches, and to filter, adapt and customize this information.

The prior art discloses various systems and techniques for searching images or information. Conventional image search methods are generally based on principles related to linguistic indexing techniques (keywords), i.e. on a textual pre-annotation associated with the images, without taking into account content information or structural description such as texture, color, density, shape, latent contours, etc., when searching images in a database. Most methods use only the keywords associated with images to perform the classification. In addition, they use classification techniques such as the moving-centers algorithm known as "k-means", where the number of classes to be found and the initial (moving) centers of these classes must be defined arbitrarily. Such techniques make the results unstable with respect to the initial settings (sensitivity of the algorithms to their starting points). Other methods use other parameters such as color or texture separately, without combining them, and do not go back to refine the results obtained.

The idea of the present invention consists in particular in providing a method and a system to meet the expectations of users and to solve such issues as:

• How to access multidimensional data or a set of data as quickly as possible, in a large database of multimodal data (signals, speech, image, video, documents, etc.)?

• How to organize the archiving of a large multimodal database and thus allow quick access when searching for an object in this database and offering several answers with increasing degrees of relevance to the query?

• How to synthesize the multimodal database in a form of cartography summarizing its content?

• How to improve the process and the query responses by taking into account possible interactions with a user?

The invention relates to an information retrieval system within a large database, comprising a processor and inputs/outputs, said system being characterized in that said processor comprises at least the following elements:

• A first module E1 adapted to extract the descriptors associated with each object of the database, and to construct a table containing the objects and the value of a descriptor associated with an object, for the descriptors chosen for the representation of the object,

• A second module E2 adapted to apply several classification algorithms SOMi to each of the tables T_Tk resulting from the module E1, in order to assign to each object Oi a class number xij, for each algorithm SOMi applied, for each category of descriptors,

• A third module E3 adapted to merge the results from the module E2 in order to determine, for each type of descriptor Tk, a class number associated with an object Oi,

• A fourth module E4 adapted to find which column SOMi of a table T_SI is closest to the column obtained during the first merge of step E3, and to select the closest SOMi map contained in the table T_SI, or best map SOMi,

• A fifth module E5 adapted to merge the "best maps" SOMi, and to apply an algorithm searching for the best map, to be transmitted to a display means.

The fusion algorithm used is, for example, a relational analysis algorithm. The object is an image and the extracted attributes are chosen from the following list: texture, color.

After the merging step performed in the module E3, the method goes back to search for the best map of type 1, then the best map of type N, etc. According to another embodiment, after the merging step performed in the module E3, the method takes K results from the merge and merges them using a merging technique, to obtain a single compromise partition used to find the best map for a type of descriptor.

Other features and advantages of the present invention will become more apparent on reading a non-limiting example of embodiment with reference to the figures, which represent:

• FIGS. 1A and 1B, an example of a system structure allowing the implementation of the method according to the invention, and a synoptic view of the different phases implemented by the method,

• FIG. 2, a representation of the sequence of the various steps implemented by the method according to the invention,

• FIG. 3, the preprocessing and feature extraction steps,

• FIG. 4, the steps of classification (by multiple algorithms) and recoding of the data by category,

• FIG. 5, the merging of the different classification results and the consensus search, and

• FIG. 6, the final merge, the search for the final consensus and the selection of the best map for navigating and retrieving information in a large database.

In order to better understand the object of the present invention, the following example is given for a problem of searching for and quickly accessing visual information in a database of images described by a set of numeric descriptors (color descriptors, texture descriptors, etc.) and textual descriptors (several thousand words extracted from web pages). The term object is used in this example to designate an image in the database. The terms objects and data are used interchangeably to designate an element of a database. Thus, FIG. 1A schematizes an example of a system according to the invention which comprises the database 1 containing a large number of images, in which the information must be sought. The image database 1 is connected to a processor 2 comprising different modules adapted to implement the steps E1 to E5, including a relational analysis module and one or more self-organizing map modules, better known by the English abbreviation SOM (Self-Organizing Map). The database 1 and the processor 2 are for example implemented in a recognition system comprising inputs/outputs 3, 4. The output 4 can be connected to a man-machine interface, which allows for example the display of the results 5 and/or the entry of different types of requests by an operator 6.

FIG. 1B is a summary of the different steps of the method, detailed in the following figures. The method takes as input large databases 1 of images, signals, documents, or other data. The first step E1 consists of extracting characteristics or descriptors associated with the stored objects or data. The second step E2 consists in reducing the dimension of the description space of the objects, by recoding each data item, for each type of descriptor k, by as many numbers as SOM algorithms used for the recoding, each number corresponding to the class to which this data item is assigned by the algorithm SOMi. A recoding of the data in the spaces of the selected classifications is thus obtained. During the third step E3, the method merges the classifications by implementing a consensus search algorithm, which can be carried out by relational analysis, a method known to those skilled in the art that will not be detailed. In a fourth step E4, the method then recodes the data resulting from the first merge, and the fifth step E5 consists in selecting the best self-adaptive map, or SOM, for simplified and fast viewing and navigation within the database.

FIG. 2 represents the sequence of steps E1 to E5 executed in the modules E1 to E5, which are detailed in FIGS. 3 to 6.

FIG. 3 details the steps performed by the processor 2 for the first phase. The data in the large database is transformed by suitable preprocessing to extract features or attributes relating to each descriptor (color, texture, etc.) for each data item. The objects contained in the database are referenced Oi. These objects Oi are thus described by a set of descriptors, for example of K types. At the end of this step, the processor has at its disposal K tables T_Tk of data or elements zij, each composed of N rows, N being the number of data items Oi contained in the database 1, and of a variable number of columns. The number of columns varies depending on the type of descriptor (or indicator), since the attributes used to describe a descriptor are not necessarily the same: the number of attributes describing the color descriptor is not necessarily the same as the number describing the texture descriptor, for example. An element zij of the table corresponds to a value obtained by the extraction step. For example, if the set of colors is taken as the type of descriptor, each attribute corresponds to a color and the element zij is the value associated with a given color for the data item Oi.

The results of the first step E1 are thus in the form of K tables T_Tk of data zij, which are subsequently segmented using several unsupervised automatic classification algorithms (FIG. 4), better known by the English name SOM. These algorithms can be variants of the SOM algorithm. Each table T_Tk of data zij coming from a type of descriptor k is segmented by several algorithms SOMi. The number of algorithms SOMi applied to each table T_Tk is chosen by the user, and may vary from one descriptor to another. The algorithms applied may likewise differ, or be the same, from one descriptor to another.
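As a rough illustration of the output of step E1, the sketch below builds one table per descriptor type, each with N rows and a descriptor-specific number of columns. The extractor functions `color_histogram` and `texture_stats`, the toy gray-level lists and the helper `build_descriptor_tables` are hypothetical; they only stand in for whatever preprocessing is actually used.

```python
from statistics import mean, pstdev


def color_histogram(pixels, bins=4):
    """Hypothetical color descriptor: normalized histogram of levels 0..255."""
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // 256, bins - 1)] += 1
    return [c / len(pixels) for c in counts]


def texture_stats(pixels):
    """Hypothetical texture descriptor: mean and standard deviation of the levels."""
    return [mean(pixels), pstdev(pixels)]


def build_descriptor_tables(objects, extractors):
    """Return {descriptor name: table}; each table has one row per object Oi."""
    return {name: [extract(obj) for obj in objects]
            for name, extract in extractors.items()}


# Three toy "images", each reduced to a flat list of gray levels (assumption).
objects = [[0, 10, 200, 255], [128, 130, 127, 129], [5, 5, 250, 250]]
tables = build_descriptor_tables(
    objects, {"color": color_histogram, "texture": texture_stats})
```

Here `tables["color"]` has N = 3 rows and 4 columns while `tables["texture"]` has 3 rows and 2 columns, which mirrors the variable number of columns per descriptor type.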

FIG. 4 details the application of several algorithms SOMi to the K tables T_Tk of FIG. 3.

The application of several algorithms SOMi generates classifications that reduce the dimension of the data space (the size of the space in which searching and navigation are done) and provide a categorical coding of smaller dimension. Thus, initially, there may be hundreds of attributes (or columns) describing each of the K descriptors (or indicators). After application of the different algorithms SOMi, there are as many columns as algorithms SOMi applied, a number much smaller than the number of initial attributes, hence the reduction. The objects Oi of a table are then described by their category (or class) numbers for each algorithm: an element xij of the table corresponds to the number of the class to which the object Oi belongs after application of the algorithm j. A map is a simplified view of all the images in the database. If the map takes the form of a two-dimensional 13x13 grid, there are 169 images representative of the whole original database, which amounts to 169 classes. Each of the 169 images of the map hides (or represents) several other images of the database, and all the images hidden (or represented) by image number n have the number n, i.e. belong to class n. The result of the classification step, when the SOM algorithm is applied, is a two-dimensional topological map in which each referent object is considered as a neuron represented by a prototype vector of the same dimension as the data. Each algorithm SOMi thus yields a two-dimensional topological map T_SI, and each neuron (or element) of the map has a number that identifies all the data xij represented by this neuron, where S is an index designating the application of a SOM algorithm and I the number of algorithms used.

During step E2, after having reduced the dimension of the description space by using several algorithms SOMi, the method recodes each element of the tables. This is illustrated in FIG. 4. The example given in this figure shows the application of several algorithms SOMi to each data item Oi, corresponding to different topographic classifications. For each element of a table T_SI and each classification from 1 to C, 1 to P, 1 to D in the figure (SOM1, ..., SOMi, with i = C or P or D), the method recodes the elements xij of the table, for each type of descriptor from 1 to K. This recoding consists of representing each data item Oi by a vector having as many components as SOM algorithms used. A component xij obtained by recoding is the number of the class to which the data item Oi belongs in the map SOMj; it is also the number of the prototype neuron closest to this data item in the original description space. At the end of this recoding, the processor has, for each type of descriptor k, a data table T_SI with a number of rows always equal to the number N of objects Oi of the database, and a number of columns equal to the number of algorithms SOMi applied for that type of descriptor. In the figure this is illustrated by the dimensions NxC for the table T_Sc, NxD for the table T_Sp and NxE for the table T_SD, with C, D and E representing the number of SOM algorithms used for each type of descriptor.
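The recoding described above can be sketched as follows, assuming the maps have already been trained and that each map is available simply as a list of prototype vectors. The names `nearest_prototype` and `recode`, the toy data and the use of squared Euclidean distance are assumptions made for illustration; class numbers are 0-based here.

```python
def nearest_prototype(vector, prototypes):
    """Index of the prototype (neuron) closest to `vector` (squared Euclidean)."""
    def sq_dist(proto):
        return sum((v - p) ** 2 for v, p in zip(vector, proto))
    return min(range(len(prototypes)), key=lambda n: sq_dist(prototypes[n]))


def recode(table, som_prototypes):
    """Recode each row of a descriptor table by one class number per SOM.

    `som_prototypes` is a list of already trained maps, each given as a list
    of prototype vectors (training itself is outside this sketch).
    """
    return [[nearest_prototype(row, protos) for protos in som_prototypes]
            for row in table]


# Toy descriptor table: N = 4 objects, 2 attributes.
table = [[0.0, 0.1], [0.9, 1.0], [0.1, 0.0], [1.0, 0.8]]
# Two hypothetical trained maps with 2 and 3 neurons respectively.
som_1 = [[0.0, 0.0], [1.0, 1.0]]
som_2 = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
recoded = recode(table, [som_1, som_2])  # N x 2 table of class numbers
```

With these toy values, `recoded` is `[[0, 0], [1, 2], [0, 0], [1, 2]]`: each object is now described by as many class numbers as there are maps, instead of by its original attributes.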

FIG. 5 details the steps implemented during the first merge of the SOMi classifications. This step implements a relational analysis algorithm known to those skilled in the art or, more generally, any type of merging algorithm offering functions similar to those of relational analysis. For each descriptor k, the data table T'k is subjected to a relational analysis in order to determine a consensus between the different classification results produced by the SOMi. This is the first step of merging the objects in the method. This step can also be seen as a meta-classification whose final result is a compromise classification. The principle of relational analysis is to find a result consistent with the majority of the opinions expressed. In the present example the opinions are classifications, and the result of the relational analysis is therefore to put in the same class all the images or objects of the database that were put together in the same class by the majority of the SOMi algorithms used. Thus, in FIG. 5, the recoded data contained in the first table T'_Sc for each SOMi, for i varying from 1 to C, for example, are subjected to a relational analysis algorithm which gives the classification as close as possible to all the classifications obtained by the algorithms SOMi, i = 1, ..., C, applied to each type of descriptor. A two-column table is thus obtained, the first column designating the objects Oi and the second column the cluster number, i.e. the class to which the object belongs. The letters AR denote the relational analysis operation applied to a table. At this stage, two variants are possible:

1) the first variant consists, for each descriptor, in going back to the initial SOMi using the consensus obtained by the relational analysis: the distances between the compromise partition obtained by the AR and each of the partitions obtained by the applied SOMs are measured. This makes it possible to select the best SOMi map per type of descriptor and to use these maps for browsing and searching information, in order to respond to requests from users who specify the type of descriptor they are interested in.

2) the second variant consists in merging the results obtained by the AR for all the descriptors and then going back to find the best maps (as before). The advantage of this approach is to find, for each type of descriptor, the best map that takes the other descriptors into account.

Finding the best map therefore amounts to finding the partition closest to (or most correlated with) the "compromise" partition found by the relational analysis. Several mathematical indicators, known to those skilled in the art, exist in the scientific literature for calculating this correlation.

From the results of the first merge, a second recoding of the data is performed. FIG. 6 is an illustration of step E5. After finding the best SOMi map of each descriptor, there are K maps SOMi (one for each descriptor), considered as the K best maps. These K maps are subjected to the relational analysis, which looks for the compromise partition between all the partitions relating to the K SOMs.

After this second recoding, the processor has an NxK data table T_NK, with N the number of objects in the database and K the number of descriptors chosen in the first step E1. The data of this new table T_NK are in a form similar to that of the tables obtained in FIG. 5: the number of columns is equal to K, and each column is simply one of the columns obtained in FIG. 5 (the best of the C SOMs for descriptor 1, the best of the D SOMs for descriptor 2, etc.). The data are subjected to a relational analysis algorithm to merge them and search for a global consensus. The consensus found by this algorithm represents a final classification and makes it possible to select the best SOMi map, or map SOMf, which will be used as an interface with the end user for browsing and searching information. The advantage of this last merge is to obtain the best classification of the database, regardless of the type of descriptor. At this point, a user can obtain a real-time response to a request without having to specify a descriptor type.
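One possible way to measure how close each SOM partition is to the compromise partition is sketched below. The text does not fix the indicator; the Rand index used here (the fraction of object pairs on which two partitions agree) is only one common choice, and the names `rand_index` and `best_map` as well as the toy values are assumptions.

```python
from itertools import combinations


def rand_index(part_a, part_b):
    """Fraction of object pairs on which two partitions agree (together/apart)."""
    pairs = list(combinations(range(len(part_a)), 2))
    agree = sum((part_a[i] == part_a[j]) == (part_b[i] == part_b[j])
                for i, j in pairs)
    return agree / len(pairs)


def best_map(recoded, consensus):
    """Return (index of the SOM column closest to the consensus, all scores)."""
    n_maps = len(recoded[0])
    scores = [rand_index([row[j] for row in recoded], consensus)
              for j in range(n_maps)]
    return max(range(n_maps), key=scores.__getitem__), scores


# Toy recoded table: 5 objects, 3 SOMs (class numbers are arbitrary labels).
recoded = [[0, 0, 1],
           [0, 0, 0],
           [1, 0, 1],
           [1, 1, 0],
           [1, 1, 1]]
consensus = [0, 0, 1, 1, 1]  # compromise partition, assumed already computed
best, scores = best_map(recoded, consensus)
```

In this toy case the first column reproduces the compromise partition exactly, so `best` is 0. The pairwise comparison is quadratic in the number of objects, so for a database of millions of images a sampled or contingency-table formulation would be needed.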
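The majority principle behind the merge can be illustrated with the simplified sketch below. It is not the relational analysis algorithm itself, which the text does not detail: here two objects are linked when they share a class in a strict majority of the columns, and the linked groups are closed transitively with a union-find structure so that the result is a valid partition. The function name `majority_consensus` and the toy table are assumptions.

```python
from itertools import combinations


def majority_consensus(recoded):
    """Group objects that share a class in a strict majority of the columns.

    Pairs linked by the majority rule are merged with a union-find structure,
    so the result is always a valid partition (transitive closure).
    """
    n, m = len(recoded), len(recoded[0])
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        together = sum(recoded[i][c] == recoded[j][c] for c in range(m))
        if 2 * together > m:
            parent[find(i)] = find(j)

    # Relabel the groups 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


# N x K table: one column per descriptor (its best map), toy values.
table_nk = [[0, 2, 1],
            [0, 2, 0],
            [1, 2, 1],
            [1, 0, 0],
            [1, 0, 0]]
final_partition = majority_consensus(table_nk)
```

For this table the result is `[0, 0, 0, 1, 1]`. Note that the second and third objects end up together only through the first one, which shows the limit of the transitive shortcut; an actual consensus search would arbitrate such cases through its optimization criterion.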

Another feature that can be added to the system is interactivity between the system and the end user, to improve or refine the classes obtained by the method. When a user makes a request, he obtains one or more classes corresponding to his request. He may then remove one or more data items (or images) that he considers misclassified and submit them to the system, which looks for the best possible class for each of them. The user can thus indicate whether or not a piece of information is relevant to his request. This interaction with the user results in an update and a refinement of the classification. Based on the topological properties of the SOM map, the system reclassifies the information according to the user's reaction. This allows an active evolution of the system. For example, a user viewing a given image class may decide that one or more of the images in this class do not correspond to it; this image (or these images) is then proposed to the system so that it classifies it in the most relevant class.

In summary, the system according to the invention makes it possible to classify and visualize multi-modal data of very high dimensionality in a space of low dimensionality, or partitioning space, without a priori information on the number of groups. The first contribution consists in defining the merging problem as a meta-classification problem in a space of categorical variables, solved by an automatic classification technique (relational analysis). The second contribution is to deal with the merging problem in a modular, cooperative and evolving way: the method evolves with respect to the data and with respect to users or experts. A process of backtracking and refinement of the results of the global classification is introduced in the modular merging process. The use of unsupervised connectionist methods as a means of data recoding (quantization) and of relational analysis as a merging method allows hierarchical visualization of the classification results with several levels of detail. The effectiveness of this method is illustrated on a problem of searching for and quickly accessing visual information in a database of images described by a set of numerical descriptors (color descriptors and texture descriptors) and textual descriptors (several thousand words extracted from web pages).
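The text does not specify how the system finds a better class for an image rejected by the user. One simple possibility, given purely as an assumption, is to reassign the image to the closest prototype of the map once the rejected class is excluded. The names `reassign`, `prototypes`, `classes` and `image_vectors` are hypothetical.

```python
def reassign(vector, prototypes, rejected):
    """Return the class of the closest prototype, ignoring the rejected classes."""
    def sq_dist(n):
        return sum((v - p) ** 2 for v, p in zip(vector, prototypes[n]))
    candidates = [n for n in range(len(prototypes)) if n not in rejected]
    return min(candidates, key=sq_dist)


prototypes = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]  # hypothetical trained map
classes = {"img_7": 0}                              # current assignment
image_vectors = {"img_7": [0.2, 0.3]}

# The user indicates that img_7 does not belong to class 0.
classes["img_7"] = reassign(image_vectors["img_7"], prototypes, rejected={0})
```

Here the image moves to class 1, the next closest neuron. A variant closer to the topological properties mentioned in the text would restrict the candidates to the grid neighbours of the rejected neuron.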

Claims

1 - System for searching information in a large database (1), comprising a processor (2) and inputs/outputs (3, 4), said system being characterized in that said processor (2) has at least the following elements:

• A first module E1 adapted to extract the descriptors associated with each object of the database, and to construct a table containing the objects and the value of a descriptor associated with an object, for the descriptors chosen for the representation of the object,

• A second module E2 adapted to apply several classification algorithms SOMi to each of the tables T_Tk from the module E1, in order to assign to each object Oi a class number xij, for each algorithm SOMi applied, for each category of descriptors,

• A third module E3 adapted to merge the results from the module E2 in order to determine, for each type of descriptor Tk, a class number associated with an object Oi,

• A fourth module E4 adapted to find which column SOMi of a table T_SI is closest to the column obtained during the first merge of step E3, and to select the closest SOMi map contained in the table T_SI, or best map SOMi,

• A fifth module E5 adapted to merge the "best maps" SOMi, and to apply an algorithm searching for the best map SOMf to be transmitted to a display means (5).

2 - System according to claim 1, characterized in that the fusion algorithm is a relational analysis algorithm.

3 - System according to one of claims 1 to 2, characterized in that the object is an image and in that the extracted attributes are selected from the following list: texture, color.

4 - System according to claim 1, characterized in that after the merging step performed in the module E3, the method goes back to search for the best map of type 1, then the best map of type N, etc.

5 - System according to claim 1, characterized in that after the merging step performed within the module E3, the method takes K results from the merge and merges them using a fusion technique, so as to obtain a single compromise partition used to find the best map for a descriptor type.