EP4165519A1

EP4165519A1 - Method and system for merging information

Info

Publication number: EP4165519A1
Application number: EP20731485.7A
Authority: EP
Inventors: Kilian VASNIER; Sylvain GATEPAILLE; Valérian JUSTINE
Original assignee: Airbus Defence and Space SAS
Current assignee: Airbus Defence and Space SAS
Priority date: 2019-06-14
Filing date: 2020-06-12
Publication date: 2023-04-19
Also published as: US20220374464A1; FR3097346B1; FR3097346A1; WO2020249719A1

Abstract

According to the invention, instances of individuals are generated by ontology alignment using information from a variety of sources. In order to perform a merging of information aimed at merging the instances of individuals that correspond to a single individual, a data-processing system performs the following steps: generating the instances of individuals using an ontology which defines, for each property of each instance of an individual, an evolution model to be applied to said property, the evolution model representing the evolution of the reliability of said property over time in relation to the variability of said property over time; performing the merging of information by comparing, two-by-two, the generated instances of individuals with instances of individuals stored in a knowledge base, performing, for each shared property, a calculation of similarity distance by applying at least the evolution model defined for said property, so as to define a coefficient of confidence for each property in order to decide whether or not to merge said instances of individuals; and updating the knowledge base with the instances of individuals resulting from the merging of information. The effectiveness of the merging of information is thus improved.

Description

DESCRIPTION

TITLE: PROCESS AND SYSTEM FOR MERGING INFORMATION

TECHNICAL FIELD

The technical field of the present invention is that of information fusion methods and systems. It is also that of situational awareness methods and systems which are used to detect abnormal behavior of individuals (vehicle, person, etc.) and which are based on such information fusion methods and systems.

STATE OF THE PRIOR ART

Many fields and activities have an interest in information fusion: medicine, the environment, air and maritime traffic surveillance, military security, and so on. What they have in common is the need to manage dynamic systems in real time with a multitude of data which must be summarized in a single operational picture in order to allow a better understanding of situations, which is known as “situation awareness”.

The information to be processed to establish such an operational picture can come from various sources. Two categories of information can be distinguished: so-called "hard" information and "flexible" ("soft") information. Hard information provides a quantitative evaluation of elements and comes from physical sensors (camera, microphone, radar, etc.). Flexible information comes from an extraction of linguistic content (observer report, text, phone call, etc.) allowing a qualitative assessment of elements and of possible relationships between them. In other words, hard information is precise information that can most often be reduced to a numerical value, whereas flexible information is often difficult to reduce to a numerical value: understanding it requires knowledge of the context in which it was acquired, and it is difficult to use when isolated from the environment in which it was collected.

Information fusion involves several steps, the two main ones being (1) a calculation of similarity distance between the different items of information available, even though this information is of a varied nature, and (2) the association of this information, or not, depending on the result of the similarity calculation. The objective here is to detect whether various items of information received concern the same individual or not. The term “individual” is understood in the broad sense in the field of information fusion, namely a separate unit (entity) in a domain of interpretation (person, vehicle, object, group, etc.).

The information fusion solutions in the literature make a strict comparison between the properties of individuals detected in the information received at a given point in time, regardless of the time difference between the points in time when the information in question was generated. For example, when a maritime surveillance system attempts to compare information relating to a vessel observed three days ago with information relating to a vessel observed more recently in order to determine whether it is the same individual or not, the identity of the captain is at that time more reliable information than the respective positions of these vessels. The approach used is therefore an obstacle to the automation of information fusion processes, which then need, from an operational point of view, human intervention to ensure that a similarity detected between items of information is indeed a correlation and not a mere coincidence with no basis in reality.

It is therefore desirable to overcome these drawbacks of the prior art. In particular, it is desirable to provide a solution which, in information fusion, reduces the number of false positives and increases the number of true negatives. More generally, it is desirable to provide a more efficient information fusion solution. It is in particular desirable to provide a solution which, in the context of situation management based on information fusion, limits the intervention of a human operator in deciding whether the information presented to him is duplicated or whether said information does indeed relate to separate individuals.

DISCLOSURE OF THE INVENTION

An object of the present invention is to provide a method for processing information which originates from various sources and from which instances of individuals are generated by ontology alignment, the method comprising an information fusion aimed at merging the instances of individuals which correspond to the same individual, the method being implemented by a data processing system, characterized in that the method comprises the following steps: generating the instances of individuals using an ontology which defines, for each property of each instance of an individual, an evolution model to be applied to said property, the evolution model representing the evolution of the reliability of said property over time in relation to the variability over time of said property; performing the information fusion by comparing in pairs the generated instances of individuals with instances of individuals stored in a knowledge base, by performing for each shared property a similarity distance calculation applying at least the evolution model defined for said property, so as to define a confidence coefficient for each property in order to decide whether or not to merge said instances of individuals; and updating the knowledge base with the instances of individuals resulting from the information fusion. Thus, the information fusion is efficient, because it limits the extent to which properties are taken into account according to their variability over the time which separates two observations (the instants at which the information concerned was captured).

According to a particular embodiment, each evolution model is of one type among the following three possible types: constant, for the properties which do not change over time; predictive, for properties which can be estimated over a certain limited period of time or with a certain uncertainty which evolves over time; and circumstantial, for properties whose evolution over time depends on the occurrence of an event. Thus, the properties are associated with evolution models adapted to different types of property variability.

According to a particular embodiment, the circumstantial model of evolution is exponentially decreasing. Thus, even with a roughly defined exponential decay accentuation time factor, properties subject to sporadic events influencing the variability of said properties are easily taken into account.

According to a particular embodiment, each instance of an individual which results from the merger of two other instances of an individual retains only one value for each property among those available in said other instances of an individual, and the value retained depends on the evolution model with which said property is associated. Thus, the information fusion is refined.

According to a particular embodiment: in the case of constant evolution models, the value retained is that having the best precision; in the case of predictive evolution models, the value retained is the most recent; and in the case of circumstantial evolution models, the value retained is that showing the highest confidence coefficient according to the following system of equations:

$$\gamma_1 = \lambda_1 \, e^{-(t_2 - t_1)/\tau}, \qquad \gamma_2 = \lambda_2$$

where the index "1" represents the oldest information and the index "2" represents the most recent information, where $\lambda$ is the coefficient representative of the reliability of the source having carried out the capture of the information considered, $\tau$ is an exponential decay accentuation time factor, and $t$ represents the instant of capture of the information considered.
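The retention rules of this embodiment can be sketched in Python as follows. This is only an illustrative sketch, not the claimed implementation: the `Observation` record, its `precision` field (taken here as an error bound, smaller meaning more precise) and the `retain` function are hypothetical names, and the circumstantial branch assumes the exponential-decay form $\gamma_1 = \lambda_1 e^{-(t_2 - t_1)/\tau}$, $\gamma_2 = \lambda_2$.

```python
import math
from dataclasses import dataclass
from typing import Any


@dataclass
class Observation:
    value: Any
    t: float                 # capture instant, in any consistent time unit
    reliability: float       # lambda: source reliability in [0, 1]
    precision: float = 0.0   # hypothetical error bound; smaller is more precise


def retain(model: str, a: Observation, b: Observation, tau: float = 1.0) -> Observation:
    """Pick the single value kept for a property when two instances are merged."""
    old, new = (a, b) if a.t <= b.t else (b, a)
    if model == "constant":
        # Keep the most precise value; on a tie, keep the most recent one.
        return old if old.precision < new.precision else new
    if model == "predictive":
        # Keep the most recent value.
        return new
    if model == "circumstantial":
        # The older value's confidence decays with the time separating the captures.
        gamma_old = old.reliability * math.exp(-(new.t - old.t) / tau)
        gamma_new = new.reliability
        return old if gamma_old > gamma_new else new
    raise ValueError(f"unknown evolution model: {model}")


# Captain identity (circumstantial), times in months, tau = 16 months.
reliable_old = Observation("Smith", t=0.0, reliability=0.9)
weak_recent = Observation("Jones", t=1.0, reliability=0.5)
weak_late = Observation("Jones", t=48.0, reliability=0.5)
kept_short_gap = retain("circumstantial", reliable_old, weak_recent, tau=16.0)
kept_long_gap = retain("circumstantial", reliable_old, weak_late, tau=16.0)
```

With a one-month gap the older but more reliable observation still wins; after 48 months (three times the time factor) its confidence has decayed below that of the recent observation, so the recent value is kept.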

According to a particular embodiment, the method further comprises the following step: exploiting the results obtained by the information fusion in a situation management system, and detecting abnormal behavior of individuals using a set of predefined rules, or a situation ontology model, together with the instances of individuals resulting from the information fusion. Thus, the intervention of a human operator in deciding whether the information presented to him is duplicated or whether said information does indeed relate to distinct individuals is limited.

According to a particular embodiment, the similarity distance calculation by applying at least the evolution model is aggregated with at least one other similarity calculation. Thus, the information fusion is refined.

According to a particular embodiment, the similarity calculations are weighted. Thus, the merging of information can be easily personalized for a specific use case (maritime surveillance, etc.).

According to a particular embodiment, one said other similarity distance calculation is a taxonomic similarity distance calculation and another said other similarity distance calculation is a range-domain similarity distance calculation.

According to a particular embodiment, the similarity distance calculation applying at least the evolution model also applies a reliability coefficient of the sources having captured the information considered. Thus, more credit can easily be given to information from reliable sources.

According to a particular embodiment, the information to be processed is flexible information and / or hard information. Thus, information fusion is effective regardless of the nature, hard or flexible, of the information collected.

The invention also relates to a computer program, which can be stored on a medium and / or downloaded from a communication network, in order to be read by a processor. This computer program includes instructions for implementing the above-mentioned method in any of its embodiments, when said program is executed by the processor. The invention also relates to an information storage medium storing such a computer program.

The invention also relates to a system for processing information which originates from various sources and from which instances of individuals are generated by ontology alignment, the information processing system comprising electronic circuitry implementing an information fusion aimed at merging the instances of individuals which correspond to the same individual, characterized in that the electronic circuitry implements: means for generating the instances of individuals using an ontology which defines, for each property of each instance of an individual, an evolution model to be applied to said property, the evolution model representing the evolution of the reliability of said property over time in relation to the variability over time of said property; means for performing the information fusion by comparing in pairs the generated instances of individuals with instances of individuals stored in a knowledge base, by performing for each shared property a similarity distance calculation applying at least the evolution model defined for said property, so as to define a confidence coefficient for each property in order to decide whether or not to merge said instances of individuals; and means for updating the knowledge base with the instances of individuals resulting from the information fusion.

BRIEF DESCRIPTION OF THE DRAWINGS

The characteristics of the invention mentioned above, as well as others, will emerge more clearly on reading the following description of at least one exemplary embodiment, said description being given in relation to the accompanying drawings, among which:

[Fig. 1] schematically illustrates an information processing method implementing the present invention;

[Fig. 2] schematically illustrates an example of a hardware arrangement of an information processing system in which the present invention can be implemented;

[Fig. 3] schematically illustrates an example of the hardware arrangement of a control unit used in the information processing system;

[Fig. 4A] schematically illustrates a first example of a model of the evolution over time of a coefficient of confidence of a property of an instance of an individual;

[Fig. 4B] schematically illustrates a second example of a model of the evolution over time of a coefficient of confidence of a property of an individual instance; and

[Fig. 5] schematically illustrates a mechanism for calculating the distance of similarity between two instances of individuals, in a particular embodiment.

DETAILED DESCRIPTION OF EMBODIMENTS

Fig. 1 schematically illustrates an information processing method implementing the present invention. The method is implemented by an information processing system, an example of a hardware arrangement of which is detailed below in relation to FIG. 2.

In a step S101, the information processing system collects information. The collection is multi-source: the information collected comes from sources of various types and capacities. Each item of information collected is either of the hard information type or of the flexible information type. Multi-source collection involves collecting information from sources relevant to the targeted use case of information fusion. In this regard, reference may be made in particular to the document “Characterization of hard and soft sources of information: A practical illustration” by Anne-Laure Jousselme et al., 17th International Conference on Information Fusion, 2014.

Hard information is obtained from sources such as physical sensors. By the nature of the sensors which produce it, this information is structured, in a raw data format. Soft information is linked to human activity (social media, websites, official reports from a community or organization, etc.) and is usually very voluminous and unstructured. The extraction of flexible information is then based on a linguistic and semantic analysis of the content. Soft information is therefore considered subjective, while hard information is considered objective.

In the case of hard information, collection is carried out directly from physical sensors, or from databases collecting information from these physical sensors, sometimes after processing has been applied. In the field of maritime surveillance, reference may be made in particular to the databases accessible on the GISIS website ("Global Integrated Shipping Information System", https://gisis.imo.org) of the International Maritime Organization, on the website of the Paris MoU organization (“Paris Memorandum of Understanding on port State control”, https://www.parismou.org/) in charge of monitoring maritime and port activities in Western Europe, or even to InterPol databases.

In the case of flexible sources, collection is usually done from websites or social media, such as Facebook (registered trademark) or Twitter (registered trademark). Open source intelligence platforms can also provide information resulting from one or more processing operations (translation, transcription, extraction, etc.) applied to pre-collected information, which makes it possible to derive from it information on so-called individuals of interest (e.g., person, place, organization, event, equipment).

The information collected can thus come from intelligence of human origin (designated by the term HUMINT, for “Human Intelligence”), from intelligence of open source origin (designated by the term OSINT, for “Open Source Intelligence”), from a maritime website, from RSS (“Really Simple Syndication”) feed syndication, from an automatic identification system AIS (“Automatic Identification System”) for ships, from maritime databases, from radar information (designated by the term RADINT, for "Radar Intelligence") with potentially different types of radar, from information of electromagnetic origin (designated by the term SIGINT, for "Signal Intelligence") such as detections of vessel radar activity or analysis of mobile telephony signals, and from image source information (designated by the term IMINT, for “Image Intelligence”) such as images captured by satellites or drones.

Collection therefore makes it possible to obtain a set of hard and / or flexible information that concerns individuals. Information about these individuals is extracted from data available from various sources. The extraction can be done at the level of the source itself, so that the information processing system obtains in step S 101 information already “digested” (eg, recognition of a shape of a vessel in a video image sequence). The extraction can, as a variant, be done at the level of the information processing system, which then receives raw data from the source in question to be digested.

In a step S102, the information processing system performs an ontology alignment ("ontology matching" in English).

An ontology is a representation of the information of a system that defines the types of individuals of this system with their categories, properties and relationships between these individuals for a specific operational use case (maritime surveillance, for example). The ontology thus makes it possible to have a single representation of information which is compatible with both hard and soft sources.

Any individual identified and extracted at the end of the information collection is instantiated, to then feed relevant information into a situation monitoring system. Likewise, any property linked to this individual and extracted from the corresponding collected information is instantiated. Note that a property is either a literal (also called an "attribute"), such as for example the length of a ship, or a relation of an individual with another individual, such as for example the relation between a ship and its captain. However, when the property is not present in the collected information in question, the property in question is not instantiated. Thus, an individual extracted from collected information can be totally or partially instantiated.

For example, in the case of maritime surveillance, an ontology can define an individual of type "ship", with several properties (e.g., name of the ship, owner, date of observation, size, position, speed, IMO number (“International Maritime Organization number”), etc.). From information coming from a first source (e.g., the AIS automatic identification system), an instance (also referred to as an object) of an individual representing this vessel can be created with a literal instance for the IMO number, observation date, position and speed, but not for the name of the vessel, the owner and the size, which are not part of the information contained in the messages of AIS automatic identification systems. From another source of information, such as a vessel watch list for sensitive areas of the world, an instance of an individual representing that vessel can be created with a literal instance for the IMO number, the vessel name and the shipowner, but without a literal instance for speed, position and date of observation. It should also be noted that, in the field of information fusion, the fact that an instance of an individual does not include an instance of one or more particular literals can already be information in itself. Not instantiating a property in an instance of an individual, rather than using a default value for this property, avoids mistakenly detecting a correlation between two instances of individuals because of a property which would have been set by default for one and / or the other of these instances.

Ontology alignment therefore consists of a total or partial instantiation of all individuals, with their properties and relationships, detected in the information collected, by inheriting the definitions provided by the ontology considered.

The information collected can already be assigned, at the time of collection, to an ontology or not. The information processing system can also use an existing ontology with the information collected, or use its own ontology adapted to the use case (e.g., maritime surveillance). When the information source already provides an ontology, a transcription of the ontology provided by said information source into an ontology adapted to the use case (e.g., maritime surveillance) can be performed. When no ontology is provided by the information source, the instantiation of detected individuals relies directly on the ontology appropriate to the use case. For the purposes of the invention, the ontology adapted to the use case comprises parameters necessary for the establishment of evolution models in association with the instantiated properties.

To apply the appropriate evolution model to each instantiated property, an appropriate ontology must be used. This ontology comes from expertise making it possible to determine which model describes the evolution over time of each defined property and its variability and, in particular, to correctly parameterize the evolution model accordingly (e.g., the time factor $\tau$ as presented below). The more a property is subject to variations over time, the less reliable this property is considered in information fusion. Each property is then associated with: a value; an evolution model accompanied by one or more configuration parameters of said evolution model; preferably, a piece of information on the reliability of the information source that allowed the instantiation of the property in question; and information representative of an observation instant (i.e., the moment when the value of the property was obtained by the information source). A classical ontology describes a property only by its value and its observation time, as well as possibly by the reliability of the information source. But here, each property is completed by an evolution model which represents the evolution of the reliability of said property over time in relation to the variability over time of said property. The term “reliability” is understood to mean the degree of confidence that the information processing system may have in a property value to decide whether or not to merge instances of individuals, in view of its variability over the period between the instants of capture of the information from which said instances of individuals are extracted.
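As an illustration of the property description above, the following Python sketch models a property as a value, an evolution model with its parameters, an optional source reliability and an observation instant, and an instance of an individual as a partial set of such properties. All class, field and key names (`PropertyInstance`, `IndividualInstance`, `shared_properties`, `"imo_number"`, and so on) and all values are hypothetical; the text does not prescribe any particular data layout.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Optional


class EvolutionModel(Enum):
    CONSTANT = "constant"
    PREDICTIVE = "predictive"
    CIRCUMSTANTIAL = "circumstantial"


@dataclass
class PropertyInstance:
    value: Any
    model: EvolutionModel
    observed_at: float                                  # observation instant
    model_params: Dict[str, float] = field(default_factory=dict)
    source_reliability: Optional[float] = None          # optional, in [0, 1]


@dataclass
class IndividualInstance:
    kind: str
    # Only properties actually present in the collected information are
    # instantiated; missing ones are absent rather than given a default value.
    properties: Dict[str, PropertyInstance] = field(default_factory=dict)


def shared_properties(a: IndividualInstance, b: IndividualInstance) -> List[str]:
    """Names of the properties instantiated in both instances."""
    return sorted(a.properties.keys() & b.properties.keys())


# Placeholder values for illustration only.
ais = IndividualInstance("ship", {
    "imo_number": PropertyInstance("IMO-PLACEHOLDER", EvolutionModel.CONSTANT, 100.0),
    "position": PropertyInstance((43.1, 5.9), EvolutionModel.PREDICTIVE, 100.0),
    "speed": PropertyInstance(12.5, EvolutionModel.PREDICTIVE, 100.0),
})
watch_list = IndividualInstance("ship", {
    "imo_number": PropertyInstance("IMO-PLACEHOLDER", EvolutionModel.CONSTANT, 40.0),
    "name": PropertyInstance("EXAMPLE", EvolutionModel.CIRCUMSTANTIAL, 40.0,
                             model_params={"tau": 16.0}),
})
common = shared_properties(ais, watch_list)
```

Here only the IMO number is present in both instances, so it is the only property on which a similarity distance would be computed for this pair.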

In a step S103, the information processing system performs an update of a knowledge base KB 205. It should be noted that knowledge bases are distinguished from simple databases. An explanation is given in the document “Knowledge Base Support for Decision Making Using Fusion Techniques in a C2 Environment”, Amanda Vizedom et al, Proceedings of the 4th International Conference on Information Fusion, International Society of Information Fusion, 2001, where it is indicated that the distinction between knowledge bases and databases is based on the distinction between general knowledge and specific data. A knowledge base is optimized for storing general, potentially complex knowledge of the type that can be instantiated. A database, on the other hand, usually does not have the means to represent general principles, but is optimized to store very specific data, such as lists of elements and attributes. The added value of knowledge bases lies in the fact that they constitute the basis of a reasoning in which new information is deduced from what is already known. This goes beyond finding data. Reasoning with a knowledge base involves applying and combining general knowledge to draw conclusions that are implicit in, but not explicitly contained in, the original information. This knowledge-based reasoning enables diagnosis, monitoring, and general response to queries to a depth not possible with a simple database.

The instances of individuals generated during the ontology alignment in step S102 are therefore stored in the knowledge base KB 205, structured according to the ontology used to describe the individuals instantiated from the various information collected in step S101 (with the parameters necessary for setting up the evolution models).

In a step S 104, the information processing system performs an information merging operation. Information fusion is based on calculations of similarity distance between instances of individuals, and more precisely of similarity distances between properties of these instances of individuals. The similarity distance between two instances of individuals is a metric defining to what extent the instantiated individuals are similar or different, and even defining to what extent it is possible to decide whether these individuals are similar or different.

The information fusion operation performed here takes into account evolution models, associated with each possible property of individuals according to the ontology applied in step S102. These evolution models make it possible to take into account the temporal dimension of the properties of individuals and their respective variabilities in the information fusion operation.

Thus, step S104 mainly comprises two sub-steps: a sub-step S 1041 where similarity distance calculations are performed by applying the evolution models, for each property of each instance of an individual to be considered; and a data association sub-step S 1042, where the instances of individuals corresponding to the same individuals are associated, or according to the terminology applicable in the field, merged.

Some uncertainty as to the reliability of the information collected exists, because of the time elapsed between collections of information, in relation to the variability of the properties observed, and potentially because of the reliability of the information source itself (e.g., accuracy of a sensor used to retrieve this information). Given that, in the knowledge base KB 205, the properties of an instance of an individual may have been obtained from different information sources (due to information fusion), it is necessary to consider the temporal dimension of this uncertainty at the level of the properties of instances of individuals and not at the level of the individuals themselves. In addition, each property evolves over time in a different way. It is then proposed, in the calculations of similarity distances, to associate a weighting with each property of an instance of an individual. This weighting corresponds to the uncertainty inherent in said property with respect to its collection method and to an evolution model corresponding to the estimated evolution over time of the variability of said property. The resulting weighting should express the fact that the more uncertain a property, the less impact it should have on similarity distance calculations, since information fusion cannot rely on this property to decide whether or not the two instances of individuals considered correspond to the same individual. For example, in the field of maritime surveillance, if the position of a ship observed ten minutes ago is compared with the position of a ship observed 4 days ago, it is not possible to know whether these two ships are one and the same or not, because in 4 days the possible changes in the position of a ship are too vast for this to be a reliable criterion for comparison.

Conversely, as the length of a vessel does not change, comparing a vessel length observation from a year ago with an observation from a day ago is reliable in trying to determine whether it is the same ship or not.
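The effect of such a per-property weighting can be illustrated with the short Python sketch below. It assumes, purely for illustration, that per-property distances lie in [0, 1] and that they are combined by a confidence-weighted mean; the text at this point does not specify the aggregation formula, and the function name `weighted_distance` is hypothetical.

```python
from typing import Dict, Optional


def weighted_distance(distances: Dict[str, float],
                      confidences: Dict[str, float]) -> Optional[float]:
    """Confidence-weighted mean of per-property distances (assumed aggregation).

    Returns None when no shared property is reliable enough to decide anything.
    """
    total_weight = sum(confidences[p] for p in distances)
    if total_weight == 0:
        return None
    return sum(confidences[p] * distances[p] for p in distances) / total_weight


# Same length, very different positions observed 4 days apart: the position
# has lost all confidence, so it no longer affects the result.
d_stale_position = weighted_distance(
    {"length": 0.0, "position": 1.0},
    {"length": 1.0, "position": 0.0},
)

# Same comparison with positions observed minutes apart: the position is still
# fairly reliable, so the mismatch pushes the instances apart.
d_fresh_position = weighted_distance(
    {"length": 0.0, "position": 1.0},
    {"length": 1.0, "position": 0.5},
)
```

In the first case the distance is 0 (the ship lengths agree and the stale position is ignored); in the second the position disagreement contributes one third of the total.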

It is therefore taken into account here that each property of an individual does not necessarily evolve in the same way as another property of that individual. For example, the length of a ship is not likely to change, while its position is. Separate evolution models therefore represent these differences in the evolution of properties over time and therefore of the confidence to be given to these properties for the fusion of information as a function of the times of observation of the property in question.

Consider an instance of an individual $O$ comprising a set of properties $P$. For each property $p \in P$, $\gamma_p$ represents a confidence coefficient defined as follows:

$$\gamma_p = \lambda_p \cdot \mu_p$$

where $\lambda_p$ is an optional coefficient representative of the reliability of the information source that made it possible to obtain the instance of the property $p$ considered and $\mu_p$ is the evolution model applicable to the property $p$ considered.

In the case of hard information sources, $\lambda_p$ is preferably equal to $1 - \varepsilon_s$, where $\varepsilon_s$ is the error rate of the information source. In the case of flexible information sources, $\lambda_p$ is preferably equal to the F-measure, also called F-score. The domain of $\gamma_p$ is $D = [0, 1] \subset \mathbb{R}$, as for $\lambda_p$ and $\mu_p$. A confidence coefficient (or weight or score) equal to "1" means that the property is considered very reliable for performing a similarity distance calculation and, conversely, a confidence coefficient (or weight or score) of zero means that the property is too uncertain to be taken into account in the calculation of similarity distance. Note that a transposition into the domain $D = [-1, 1]$ is possible, where a confidence coefficient (or weight or score) equal to "1" designates a very reliable property for performing a similarity distance calculation, a confidence coefficient (or weight or score) equal to "-1" designates a property too uncertain to be taken into account in the calculation of similarity distance, and a confidence coefficient (or weight or score) equal to "0" reveals an inability to decide on the reliability of the property in question.
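A minimal Python sketch of these coefficients is given below. The function names are hypothetical, and the F-measure is assumed here to be the balanced F1 score computed from a precision and a recall, which is the most common reading of "F-score"; the text does not say which variant is intended.

```python
def hard_source_reliability(error_rate: float) -> float:
    """Source reliability of a hard source: 1 minus the sensor error rate."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie in [0, 1]")
    return 1.0 - error_rate


def soft_source_reliability(precision: float, recall: float) -> float:
    """Source reliability of a flexible source, taken as the balanced F1 score."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def confidence(source_reliability: float, evolution_value: float) -> float:
    """Confidence coefficient: source reliability times evolution-model value."""
    return source_reliability * evolution_value


lam_radar = hard_source_reliability(0.05)          # sensor with a 5 % error rate
lam_report = soft_source_reliability(0.8, 0.6)     # text extractor, P=0.8, R=0.6
gamma_length = confidence(lam_radar, 1.0)          # constant model: value 1
```

Since both factors lie in [0, 1], their product also lies in [0, 1], consistent with the domain stated above.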

The models of evolution are preferably of three possible types: constant; predictive; and circumstantial.

The constant evolution model is associated with properties $p$ which do not change over time, such as the length of a ship. A representation of a particular embodiment is provided in FIG. 4A, where it appears that the confidence coefficient $\gamma_p$ is equal to the coefficient $\lambda_p$ ($\mu_p$ being here equal to 1).

Unlike the constant evolution model, the predictive evolution model evolves over time and is therefore associated with properties $p$ which evolve over time. In the case of maritime surveillance, properties $p$ which correspond to the predictive evolution model are, for example, the speed of a ship, its position and its heading. The values of these properties $p$ can be estimated (i.e., predicted) over a certain period of time (a limited period, beyond which the variability of the property $p$ considered is such that its reliability is zero) or with a certain uncertainty that evolves over time. For example, knowing the position of a ship and the direction of its movement, it is easy to predict the area the ship will be in in the near future (e.g., a few minutes later). In the case of predictive evolution models, the evolution is predictable, in particular thanks to mathematical tools. Such tools are commonly used, in particular to estimate a change in the position or speed of a physical object. Kalman filters or particle filters (also known as sequential Monte Carlo methods) are preferred examples. By their very nature, predictive evolution models incorporate a notion of a confidence coefficient, often in the form of a covariance matrix. Thus, in these cases, it is the comparison of the properties according to the predictive evolution model which directly integrates not only a predicted value but also the possible error on the prediction. This is the case, for example, with the Mahalanobis distance.
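To make the role of the covariance concrete, here is a small Python sketch of a Mahalanobis distance between an observed position and a predicted one. It is not a Kalman or particle filter: the `predict` function is a deliberately simplified stand-in (constant velocity, with an assumed process noise `q` added to the diagonal of the covariance per unit of time), and all names and numbers are hypothetical.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]
Mat2 = Tuple[Tuple[float, float], Tuple[float, float]]


def predict(pos: Vec2, vel: Vec2, cov: Mat2, dt: float, q: float) -> Tuple[Vec2, Mat2]:
    """Constant-velocity prediction; uncertainty grows linearly with dt (assumption)."""
    predicted = (pos[0] + vel[0] * dt, pos[1] + vel[1] * dt)
    (a, b), (c, d) = cov
    return predicted, ((a + q * dt, b), (c, d + q * dt))


def mahalanobis(x: Vec2, mean: Vec2, cov: Mat2) -> float:
    """Mahalanobis distance in 2D, using the closed-form inverse of a 2x2 matrix."""
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    squared = (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det
    return math.sqrt(squared)


identity: Mat2 = ((1.0, 0.0), (0.0, 1.0))

# Short gap: prediction after 3 time units, observation 2 units off.
pred_short, cov_short = predict((0.0, 0.0), (1.0, 0.0), identity, dt=3.0, q=1.0)
d_short = mahalanobis((5.0, 0.0), pred_short, cov_short)

# Long gap: same 2-unit offset, but the prediction is much less certain.
pred_long, cov_long = predict((0.0, 0.0), (1.0, 0.0), identity, dt=15.0, q=1.0)
d_long = mahalanobis((17.0, 0.0), pred_long, cov_long)
```

The same 2-unit offset yields a smaller distance after the longer gap, because the larger predicted covariance makes the offset less significant: the error on the prediction is built into the comparison itself.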

The circumstantial evolution model is associated with properties p whose evolution over time depends on the occurrence of an event. In the literature, such a concept is defined as a rare stochastic event insofar as this type of event has a more or less low probability of occurring. The properties p associated with the circumstantial evolution model are therefore subject to modification following a specific unforeseeable event. For example, in the case of maritime surveillance, circumstantial properties are the identity of the master or the flag of a vessel, which may change when the vessel in question changes owners. Another example is the location of the vessel, which can change considerably over time. Location is here to be distinguished from position. Position is a set of geographic coordinates, while a vessel's location is the name of the place (e.g., Mediterranean Sea) where the vessel is located.

The difficulty with circumstantial evolution models is to define the probability of such an event occurring and to find an adequate way to represent it. While other models could be used, exponential decay models appear to be a suitable approach. A representation of one embodiment is provided in FIG. 4B, where it appears that the confidence coefficient g _r is defined as follows:

$$g_r = l_r \cdot e^{-t/\tau}$$

where τ is a time factor making it possible to accentuate or flatten the curve of the exponential decay function. As time passes, the confidence coefficient g _r gradually decreases. Note that the maximum value of the confidence coefficient g _r is here equal to the coefficient l _r , when t = 0. The time factor τ can be determined empirically and/or statistically, from business knowledge. Typically, at 3τ, the property is considered to have changed, and the confidence coefficient should then be practically zero. If it is known from experience that the captain of a military ship is replaced every 4 years, then: τ = (4 years) / 3 = 16 months. This type of approach, even with a roughly defined time factor τ, significantly improves information fusion processes.
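By way of a non-limiting illustration, the exponentially decreasing confidence coefficient can be sketched as follows, assuming the form l _r multiplied by exp(-t / time factor) that is consistent with the description above (maximum equal to l _r at t = 0, practically zero at three time factors). The function and constant names are hypothetical.

```python
import math


def circumstantial_confidence(source_reliability, elapsed, time_factor):
    """Confidence after `elapsed` time units: l_r * exp(-elapsed / time_factor)."""
    if time_factor <= 0:
        raise ValueError("time_factor must be strictly positive")
    if elapsed < 0:
        raise ValueError("elapsed must not be negative")
    return source_reliability * math.exp(-elapsed / time_factor)


# Example from the description: a captain replaced every 4 years (48 months)
# gives a time factor of 48 / 3 = 16 months.
TAU_MONTHS = 48 / 3
```

With a source reliability of 0.9, the confidence after 48 months is 0.9 multiplied by exp(-3), i.e., less than 0.05, which matches the statement that the property is considered to have changed after three time factors.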

The similarity distance DS(I _j , I _k ) between two instances of individuals I _j and I _k is then an averaged sum of the weighted similarity distances of each property p common to the two instances of individuals I _j and I _k , each similarity distance being weighted by the confidence coefficient of the property p concerned, and is calculated in sub-step S 1041.
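By way of a non-limiting illustration, the averaged sum of weighted similarity distances can be sketched as follows. Dividing by the number of common properties is an assumption (dividing by the sum of the weights would be another possible reading), and the function name and dictionary-based inputs are hypothetical.

```python
def evolution_similarity(distances, confidences):
    """Average of per-property similarity distances weighted by confidence.

    distances:   {property name: normalized similarity distance}
    confidences: {property name: confidence coefficient}
    Only properties present in both mappings are taken into account.
    """
    common = distances.keys() & confidences.keys()
    if not common:
        raise ValueError("no common property to compare")
    weighted = sum(confidences[p] * distances[p] for p in common)
    return weighted / len(common)
```

For example, with distances {"name": 0.2, "flag": 1.0} and confidences {"name": 1.0, "flag": 0.5}, the result is (0.2 + 0.5) / 2.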

There is a wide variety of possible similarity distance calculations depending on the type of property to be compared. For example, a similarity distance calculation for a textual property can be obtained using the Levenshtein distance (also called "edit distance"), which is a metric for measuring the difference between two sequences of text. In this case, the Levenshtein distance represents the minimum number of character editing operations to be carried out in order to transform a first word, or a first sequence of words, into a second word, or respectively a second sequence of words. According to another example of textual similarity distance calculation, the Hamming distance (which is an upper bound of the Levenshtein distance) is used. The Hamming distance makes it possible to quantify the differences between two sequences of symbols or characters of the same length. Other, numerical, similarity distance calculations can be used to compare, for example, two speeds or two values of any other physical property.

Normalization aims to ensure that the results of similarity distance calculations can then be used and compared together despite their heterogeneity and despite being based on different distance calculations. The purpose of normalization is to bound the result within an interval, usually between 0 and 1. Typically, the results of distance calculations are close to 0 when there is no difference. For example, to normalize the Levenshtein or Hamming distance, it suffices to divide the result of the similarity distance calculation by the sum of the character length of the first sequence and the character length of the second sequence.
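By way of a non-limiting illustration, the Levenshtein distance, the Hamming distance and the normalization by the sum of the two sequence lengths can be sketched as follows. The function names are hypothetical, and returning 0.0 for two empty sequences is an assumption made to avoid a division by zero.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


def normalized_levenshtein(a: str, b: str) -> float:
    """Levenshtein distance divided by the sum of the two lengths."""
    total = len(a) + len(b)
    return levenshtein(a, b) / total if total else 0.0
```

Since the Levenshtein distance never exceeds the length of the longer sequence, dividing by the sum of the two lengths always yields a value between 0 and 1.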

For a more precise overall result, the normalization can be transposed between -1 and 1. The normalization is then made between 0 and 1, then the result of this normalization is subtracted from 1. Thus, 1 represents the similarity and -1 represents the dissimilarity.

This similarity distance calculation by property p common to the instances of individuals considered can be aggregated with other similarity distance calculations, as detailed below in relation to FIG. 5, in order to obtain an aggregated similarity distance which is then used to decide whether or not to merge the instances of individuals I _j and I _k .

In substep S 1042, the information processing system performs a data association operation from the similarity distances calculated in substep S 1041. Data association is a heuristic for deciding whether or not two instances of individuals must be merged, given the similarity distance value (score) between these two instances of individuals. The instances of individuals resulting from the collection of information and at least a subset of those already present in the knowledge base KB 205 are analyzed in pairs to determine whether they correspond to the same individual and whether they must therefore be merged. In this regard, reference may be made to the document: “Systemic Test and Evaluation of a Hard + Soft Information Fusion Framework: Challenges and Current Approaches”, Geoff Gross et al., 17th International Conference on Information Fusion, 2014.

The information merging operation of step S 104 therefore consists, as far as possible, of merging instances of individuals which represent the same individual. Preferably, the individual instance which results from the merger of two original individual instances retains only one value for each property among those available in said original individual instances. The retained value depends on the evolution model with which the considered property is associated.

In the case of constant evolution models, the retained value is the one described by the information source (e.g., sensor) which has the best precision among the sources from which the individual instances considered were extracted (this is known because the ontology holds information on the accuracy of the source which observed the property).

In the case of predictive evolution models, the conserved value is the most recent.

In the case of circumstantial evolution models, the conserved value is that showing the highest confidence coefficient according to the following system of equations:

$$\begin{cases} g_1 = l_1 \cdot e^{-(t_2 - t_1)/\tau} \\ g_2 = l_2 \end{cases}$$

where the index "1" represents the oldest information and the index "2" represents the most recent information, where l is the optional coefficient representative of the reliability of the source that performed the capture (or observation) of the information considered, τ is the time factor of the circumstantial evolution model as defined above, and t represents the instant of capture (or observation) of the information considered.
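By way of a non-limiting illustration, the choice of the value retained after a merger, depending on the evolution model of the property, can be sketched as follows. The class and function names are hypothetical. For the circumstantial model, the sketch assumes that the confidence of the oldest information decays exponentially over the time elapsed between the two capture instants, while the most recent information keeps the reliability of its source; ties are resolved in favor of the most recent information, which is also an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class Observation:
    value: object
    captured_at: float               # capture instant, any consistent time unit
    source_reliability: float = 1.0  # coefficient l (optional in the text)
    source_precision: float = 0.0    # hypothetical field: higher is more precise


def select_value(model, obs_a, obs_b, time_factor=None):
    """Return the value kept when merging two observations of one property."""
    if model == "constant":
        # Keep the value described by the most precise source.
        return max(obs_a, obs_b, key=lambda o: o.source_precision).value
    older, newer = sorted((obs_a, obs_b), key=lambda o: o.captured_at)
    if model == "predictive":
        return newer.value
    if model == "circumstantial":
        if time_factor is None or time_factor <= 0:
            raise ValueError("a strictly positive time_factor is required")
        elapsed = newer.captured_at - older.captured_at
        g_old = older.source_reliability * math.exp(-elapsed / time_factor)
        g_new = newer.source_reliability
        return older.value if g_old > g_new else newer.value
    raise ValueError(f"unknown evolution model: {model}")
```

With this sketch, an old observation from a very reliable source can still prevail over a recent observation from an unreliable source, provided that little time has elapsed relative to the time factor.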

In a step S 105, the information processing system performs a new update of the knowledge base KB 205. After the information merging has been performed, each new individual instance resulting from the information merging is stored in the knowledge base KB 205. Since the similarity distance was sufficiently small to allow the association of data between at least one pair of instances of individuals, the instances of individuals (and therefore their properties) can be merged to generate an "augmented" instance for this individual. This new instance can then in turn be associated with one or more other instances during a new iteration of the information fusion operation. The instances of individuals which have allowed the fusion of information and the instance of individual generated by the fusion of information are therefore all kept in the knowledge base KB 205 and are linked to each other therein. As a variant, the instances of individuals that were used to create a merged individual instance are not kept in the knowledge base KB 205.

In a step S 106, a situational awareness system uses the results obtained during the information merging operations carried out in step S 105 and represents these results in the form of synthetic views, in order to facilitate the detection of abnormal behavior. Such situational awareness systems are well known in the field of maritime surveillance and/or civil security, and are generally operated by regional, national or international organizations responsible for monitoring a given geographical area. The situation monitoring system is integrated into, or connected to, the information processing system.

Such situational awareness systems implement sets of predefined rules exploiting the results obtained in step S 105 to detect individuals (ship, etc.) with abnormal behavior compared to a behavior defined as standard in view of the type of the individual considered, and to generate an alert if necessary, which is for example displayed to the operator. Such rule-based mechanisms are well known in the literature through expert systems. In another example of the means implemented by a situational awareness system to detect abnormal behavior and assess threats, situational ontology models are used to characterize types of behavior. One such example of the use of situation ontology is described in the document "Improving Maritime Situational Awareness by Fusing Sensor Information and Intelligence", van den Broek et al., International Conference on Information Fusion, 2011.

Such situational awareness systems generally include one or more common operational views (or "Common Operational Picture, COP" in English) made up of synthetic graphical and/or tabular views presenting the results of the information fusion together with those obtained by other means. For example, the situational awareness system comprises, in a graphical interface, a geographical view of the monitored area with a background map or an aerial image or both superimposed. Vessels in the monitored area are superimposed in the geographic view by an icon and a label giving the vessel's identification information. A displacement vector, or a trajectory, can also be presented for each vessel on the geographic view. In this same graphical interface, the situation monitoring system can also include a tabular or graphical view presenting the alerts generated following the exploitation of the results of the information fusion. These alerts can be presented to a human operator according to a color code according to the severity and/or the urgency of the situation, potentially accompanied by a visual and/or audible warning signal.

In particular, it is recognized in the literature that a human being is capable of correlating up to 7 distinct levels of information in order to obtain operationally exploitable information. Furthermore, the information fusion approaches of the state of the art tend to increase the correlation spaces, but remain limited to the properties of individuals whose time dimension does not enter into the similarity distance calculation. By applying the information fusion techniques of the state of the art to several hundred instances of individuals coming from various hard and/or flexible information sources and representative of only 5 real individuals, it is possible that the information processing system can only reduce the number of instances of individuals after merging of information to about twenty, in particular because of the failure to take into account the time dimension of the properties. There are therefore around twenty instances of individuals reported in the situation, for which the human operator must himself distinguish whether they are duplicates or distinct individuals. However, the greater the number of properties of an individual, the more difficult it is for a human operator to reduce situational awareness to the observation of 5 real individuals and to make a safe and rapid decision if necessary.

One of the advantages obtained by using the results of the information fusion resulting from the method of the invention in a situational awareness system is therefore to offer a correlation space between items of information that is much larger than that which a human operator is able to apprehend manually, that is to say by cognitive capacities alone, with or without the help of the information fusion methods of the state of the art, so as to eliminate duplicates before display and to offer improved and more automated situational awareness. This allows the human operator to focus on situational interpretation and situational decision making, rather than on residual and manual correlation operations.

In a particular embodiment, the graphical interface also presents means of representing the history of information mergers carried out automatically during the implementation of the method and saved progressively in the knowledge base KB 205.

It should be noted that the use of the results of the fusion of information as described in step S106 is not however limited to the examples of situation management and to the examples of modes of representation mentioned above.

Fig. 2 schematically illustrates an example of a hardware arrangement of an information processing system in which the present invention can be implemented. The information processing system is for example a maritime surveillance system MSS (“Maritime Surveillance System” in English) 250. In the maritime surveillance use case, the information collected concerns any vessel present at sea in a predefined geographic area (e.g., all seas and oceans around the world). Sources have recovered partial or redundant information on ships. This information must be correlated so that it can be completed and merged in order to better understand the behavior of all these ships. The result of the information fusion is a descriptive list of vessels containing more complete and non-redundant information, which allows efficient work on the information retrieved, which is impossible without precise correlation of the information collected. Evolution models provide this precision by taking into account the temporal evolution of the properties of the instances of individuals resulting from the collection of information and more particularly the variability of these properties over time. The units (or modules) shown in the example arrangement of Fig. 2 achieve this result.

The information processing system comprises a collection unit DC (“Data Collector”) 201, in charge of recovering information from a set 200 of various information sources S1, S2, S3, S4, regardless of whether the sources in question provide hard or soft information. The collection unit DC 201 has the behavior already described in relation to step S 101.

The collection unit DC 201 can also include direct access to existing databases containing hard and/or flexible information which comes from various sources and which has been previously collected by another means. Thus, the information processing system is capable of interconnecting with a distributed database system originating from distinct actors and authorities.

The information processing system further comprises an ontology alignment unit OM (“Ontology Matching”) 202, which has the behavior already described in relation to step S 102.

The information processing system further comprises an input-output unit KIO (“Knowledge Input / Output” in English) 203 in charge of ensuring the access, in input and output, to the knowledge base KB 205. In other words, the input-output unit KIO 203 provides access to the knowledge base KB 205.

The information processing system further comprises an information fusion unit IF ("Information Fusion" in English) 204, which has the behavior already described in relation to step S 104.

As already mentioned in relation to step S 106, the information processing system preferably further comprises a situation monitoring system. The situation monitoring system then comprises a trigger unit TRIGG (“Trigger” in English) 207 and a graphical user interface GUI (“Graphical User Interface” in English) 208. The trigger unit TRIGG 207 is in charge of raising alerts on abnormal behavior detected as a result of information merging. The graphical interface GUI 208 is configured to graphically represent alerts on abnormal behavior detected as a result of information merging, as well as individuals related to these alerts.

The information processing system further comprises a control unit CTRL 206 in charge of coordinating, for example by means of a data bus 310, the various units of the information processing system, so as to implement the behavior already described in relation to FIG. 1.

As described below in relation to FIG. 3, each of the collection unit DC 201, the ontology alignment unit OM 202, the input-output unit KIO 203 and the information fusion unit IF 204 can be implemented in hardware form, for example using an electronic component (“chip”) or a set of electronic components (“chipset” in English); or else be produced in software form and implemented by a processor executing the corresponding computer program instructions. The same goes for the trigger unit TRIGG 207 and the graphical interface GUI 208.

Fig. 3 schematically illustrates an example of a hardware arrangement of the control unit CTRL 206 of the information processing system.

The example of the hardware architecture presented comprises, connected by a communication bus 310: a processor CPU 301; a random access memory RAM (“Random Access Memory” in English) 302; a ROM (“Read Only Memory”) 303 or a Flash memory; a storage unit or a storage medium drive, such as an SD ("Secure Digital") card reader or an HDD ("Hard Disk Drive") 304; and at least one I/O interface 305.

CPU 301 is capable of executing instructions loaded into RAM 302 from ROM 303, external memory (such as an SD card), storage media (such as a hard disk HDD), or a communication network. Upon power-up, the CPU 301 is able to read instructions from RAM 302 and execute them. These instructions form a computer program causing the CPU 301 to implement some or all of the algorithms and steps described here.

Thus, all or part of the algorithms and steps described here can be implemented in software form by executing a set of instructions by a programmable machine, such as a DSP (“Digital Signal Processor”) or a microcontroller or a processor. All or part of the algorithms and steps described here can also be implemented in hardware form by a machine or a dedicated component, such as an FPGA (“Field-Programmable Gate Array”) or an ASIC (“Application-Specific Integrated Circuit” in English). Thus, the information processing system comprises electronic circuitry adapted and configured to implement the algorithms and steps described here.

Fig. 5 schematically illustrates a mechanism for calculating the similarity distance between two instances of individuals, in a particular embodiment in which a similarity distance calculation based on the evolution models is aggregated with at least one other similarity distance calculation. In the similarity distance calculations, the instances of individuals are compared in pairs, e.g., instances of individuals 01 and 02 are injected as input (I) of the similarity distance calculation. A first similarity distance is calculated using a taxonomic similarity distance calculation module TS (“Taxonomy Similarity”) 501. The instances of individuals 01 and 02 are instances of classes in the ontology considered. The taxonomic similarity distance calculation compares the positions of the classes of instances of individuals 01 and 02. In the considered ontology, the classes and properties are hierarchical and this hierarchy can be represented by a graph. For example, a class (node) "Submarine" and a class (node) "Boat" both inherit from a class (node) "Ship" which itself inherits from a class (node) "Vehicle", and the “Aircraft” and “Land Vehicles” classes (nodes) also inherit from the “Vehicle” class (node), and so on. A distance between two graph nodes can be calculated by counting the number of edges of the shortest path between the nodes considered in the graph. The taxonomic similarity measure also takes into account another criterion to represent depth in the ontological hierarchy. This depth criterion is often represented by the lowest (most specific) common sub-denominator of the two instances of individuals 01 and 02, i.e., their most specific common ancestor class. The Wu and Palmer similarity distance calculation method can be used to meet these criteria.

The taxonomic similarity distance TS (01; 02) is here defined from the distance which separates the two classes C1 and C2 of the instances of individuals 01 and 02 from the root R of the hierarchy and from the distance which separates their lowest common sub-denominator C0 from the root R of the hierarchy, according to the following formula:

$$TS(01; 02) = \frac{2 \cdot d(R; C0)}{d(R; C0; C1) + d(R; C0; C2)}$$

where d (R; C0) is the distance which separates the class C0 from the root R of the hierarchy, d (R; C0; C1) is the distance which separates the class C1 from the root R passing through the class C0 and d (R; C0; C2) is the distance separating class C2 from the root R passing through class C0. Reference may be made to the document “Verb Semantics and Lexical Selection”, Z. Wu and M. Palmer, Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, 1994.
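By way of a non-limiting illustration, a Wu and Palmer style calculation on a small class hierarchy can be sketched as follows. The hierarchy, including the intermediate "Ship" class, and the function names are hypothetical. Distances are counted in edges from the root, as described above, so two classes whose only common ancestor is the root obtain a value of 0.

```python
# Hypothetical class hierarchy: each class maps to its parent (None for the root).
PARENT = {
    "Vehicle": None,
    "Ship": "Vehicle",
    "Aircraft": "Vehicle",
    "Submarine": "Ship",
    "Boat": "Ship",
}


def path_to_root(cls):
    """Classes from cls up to the root, cls first and root last."""
    path = [cls]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def wu_palmer(c1, c2):
    """2 * d(R; C0) / (d(R; C0; C1) + d(R; C0; C2)), distances in edges."""
    p1, p2 = path_to_root(c1), path_to_root(c2)
    ancestors2 = set(p2)
    c0 = next(c for c in p1 if c in ancestors2)  # most specific common ancestor
    d_root_c0 = len(path_to_root(c0)) - 1
    d1 = d_root_c0 + p1.index(c0)  # root -> C0 -> C1
    d2 = d_root_c0 + p2.index(c0)  # root -> C0 -> C2
    if d1 + d2 == 0:
        return 1.0  # both classes are the root itself
    return 2.0 * d_root_c0 / (d1 + d2)
```

In this sketch, "Submarine" and "Boat" share the ancestor "Ship" at depth 1 and each lies at depth 2, giving 2 x 1 / (2 + 2) = 0.5, whereas "Submarine" and "Aircraft" only share the root and give 0.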

The same principle applies to determine the distance between properties in the hierarchy defined by the ontology considered.

A second similarity distance is calculated using a domain and range similarity distance calculation module DRS ("Domain and Range Similarity" in English) 502. The domain and range similarity distance calculation DRS compares the number of fields (properties) shared by the two classes C1 and C2 to which the two instances of individuals 01 and 02 belong respectively, normalized by their total number of fields. The ontology is in fact preferentially not limited to the hierarchical structure of concepts in the form of classes, but also includes domain and range definitions within the properties, as shown by the following system of equations. Thus, the similarity distance calculation between classes involves the comparison of properties which appear in common in the considered instances of these classes.

where OPD(C) (C = C1 or C2) represents the set of relation-type properties which have class C in the domain definition of a first subject, and |OPD(C)| represents the cardinality of this set; OPR(C) represents the set of relation-type properties that have class C in the range definition of a second subject, and |OPR(C)| represents the cardinality of that set; DPD(C) represents the set of literal-type properties that have class C in their domain definition, and |DPD(C)| the cardinality of this set.

The domain and range similarity distance DRS is then obtained as follows:

The taxonomic similarity distance calculation TS and the domain and range similarity distance calculation DRS are addressed in particular in the document “Semantic Decision Support for Information Fusion Applications”, A. Bellenger, PhD Thesis, Institut National des Sciences Appliquées de Rouen, 2013, more particularly in section 7.2.1.1 “Semantic Similarity regarding the Terminology of the Ontology”.

A third similarity distance is calculated using a similarity distance calculation module based on the evolution models MoES (“Model of Evolution-based Similarity”) 503. As already indicated, the similarity distance based on the evolution models MoES between instances of individuals 01 and 02 is an averaged sum of the weighted similarity distances of each property p common to the two instances of individuals 01 and 02.

The first, second and third similarity distances are then combined by an aggregator module AGG 504, in order to produce at the output (O) of the similarity distance calculation a similarity distance SD ("Similarity Distance" in English) between instances of individuals 01 and 02. Preferably, the aggregator module AGG 504 applies respective weights to the first, second and third similarity distances, in order to give more or less importance to each of them and to normalize the result. The weights respectively assigned to the first, second and third similarity distances are defined as a function of the application framework considered. The ontology can thus, for example, give greater weight to the taxonomic similarity distance TS compared to the similarity distance based on the evolution models MoES and to the domain and range similarity distance DRS.
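By way of a non-limiting illustration, the weighted aggregation of the three similarity distances can be sketched as follows. The use of a weighted mean (division by the sum of the weights in order to normalize the result) and the function name are assumptions; the text only states that respective weights are applied and that the result is normalized.

```python
def aggregate(ts, drs, moes, weights=(1.0, 1.0, 1.0)):
    """Weighted mean of the TS, DRS and MoES similarity distances."""
    w_ts, w_drs, w_moes = weights
    total = w_ts + w_drs + w_moes
    if min(weights) < 0 or total == 0:
        raise ValueError("weights must be non-negative and not all zero")
    return (w_ts * ts + w_drs * drs + w_moes * moes) / total
```

For example, weights of (2, 1, 1) give the taxonomic similarity distance TS twice the importance of each of the other two.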

The mechanism for calculating the similarity distance between two instances of individuals has been presented in Fig. 5 in modular form. The modules in question can be hardware modules or software modules. Further, the similarity distance calculation mechanism shown in FIG. 5 is also representative of a method including steps of calculating the first, second and third similarity distances, and the corresponding aggregation, as described above.

Claims

1. A method of processing information which comes from various sources and from which instances of individuals are generated by ontology alignment, the method of processing information comprising a fusion of information aiming to merge the instances of individuals which correspond to the same individual, the method being implemented by a data processing system, characterized in that the method comprises the following steps:

- generate the instances of individuals using an ontology which defines, for each property of each instance of individual, an evolution model to be applied to said property, the evolution model representing the evolution of the reliability of said property over time in relation to the variability over time of said property;

- perform the fusion of information by comparing in pairs the instances of individuals generated with instances of individuals stored in a knowledge base, by performing for each common property a similarity distance calculation by applying at least the evolution model defined for said property, so as to define a confidence coefficient for each property to decide whether or not to merge said instances of individuals; and

- update the knowledge base with instances of individuals resulting from the merging of information.

2. The method of claim 1, wherein each evolution model is of one type among the following three possible types:

- constant, for properties that do not change over time;

- predictive, for properties which can be estimated over a certain limited period of time or with a certain uncertainty which changes over time; and

- circumstantial, for properties whose evolution over time depends on the occurrence of an event.

3. Method according to claim 2, wherein the circumstantial evolution model is exponentially decreasing.

4. Method according to any one of claims 1 to 3, wherein each individual instance which results from the merger of two other individual instances retains only one value for each property among those available in said other individual instances, and the conserved value depends on the evolution model with which said property is associated.

5. The method of claim 4, wherein:

- in the case of constant evolution models, the value kept is the one with the best precision;

- in the case of predictive evolution models, the retained value is the most recent; and

- in the case of circumstantial evolution models, the conserved value is that showing the highest confidence coefficient according to the following system of equations:

$$\begin{cases} g_1 = l_1 \cdot e^{-(t_2 - t_1)/\tau} \\ g_2 = l_2 \end{cases}$$

where the index "1" represents the oldest information and the index "2" represents the most recent information, where l is the coefficient representative of a reliability of the source having performed the capture of the information considered, τ is an exponential decay accentuation time factor, and t represents the capture instant of the information considered.

6. Method according to any one of claims 1 to 5, further comprising the following step:

- exploit the results obtained by the fusion of information in a situation management system, and detect abnormal behavior of individuals thanks to a set of predefined rules, or to a situation ontology model, and to the instances of individuals resulting from the fusion of information.

7. A method according to any one of claims 1 to 6, wherein the similarity distance calculation by applying at least the evolution model is aggregated with at least one other similarity calculation.

8. The method of claim 7, wherein the similarity calculations are weighted.

9. The method of claim 7 or 8, wherein said other similarity distance calculation is a taxonomic similarity distance calculation and/or a domain and range similarity distance calculation.

10. Method according to any one of claims 1 to 9, wherein the calculation of the similarity distance by applying at least the evolution model applies a reliability coefficient of the sources having captured the information considered.

11. Method according to any one of claims 1 to 10, wherein the information to be processed is soft information and / or hard information.

12. A computer program product comprising instructions for implementing, by a processor, the method according to any one of claims 1 to 11, when said program is executed by said processor.

13. Information storage medium storing a computer program comprising instructions for implementing, by a processor, the method according to any one of claims 1 to 11, when said program is read and executed by said processor.

14. System for processing information which comes from various sources and from which instances of individuals are generated by ontology alignment, the information processing system comprising electronic circuitry implementing an information fusion aiming to merge the instances of individuals which correspond to the same individual, characterized in that the electronic circuitry implements:

- means for generating the instances of individuals using an ontology which defines, for each property of each instance of an individual, an evolution model to be applied to said property, the evolution model representing the evolution of the reliability of said property over time in relation to the variability over time of said property;

- means for performing the fusion of information by comparing in pairs the instances of individuals generated with instances of individuals stored in a knowledge base, by performing for each common property a similarity distance calculation by applying at least the evolution model defined for said property, so as to define a confidence coefficient for each property to decide whether or not to merge said instances of individuals; and

- means for updating the knowledge base with the instances of individuals resulting from the fusion of information.