FR3087555A1

FR3087555A1 - DEVICE FOR AUTOMATIC TEXT PROCESSING BY COMPUTER

Info

Publication number: FR3087555A1
Application number: FR1859660A
Authority: FR
Inventors: Ambroise Cade; Henri Faucher De Corn
Original assignee: Meremind
Current assignee: Meremind
Priority date: 2018-10-18
Filing date: 2018-10-18
Publication date: 2020-04-24
Also published as: WO2020079109A1

Abstract

A device for automatic text processing by computer receives a sentence split into tokens and repeatedly applies a rectifier, a matcher and a combiner to produce a tree structure that describes the syntactic and semantic links of the sentence on the basis of a probabilistic non-monotonic logic.

Description

Device for automatic text processing by computer

The invention relates to the field of automatic text processing by computer, and more particularly to the field of natural language processing (NLP).

Natural language processing has two main branches: linguistic methods and methods based on machine learning.

The methods of the first branch are based on the formal theories of language derived from Chomsky's work.

However, this branch has only been implemented through largely manual and complex heuristics, and has never produced a computer application giving satisfactory results for general use.

The foundations of artificial intelligence based on machine learning were laid in the 1960s.

Over the last five years, driven by technological developments and by the explosion in the quantity of data available for machine training (ML for machine learning, DL for deep learning and NN for neural networks), this field has grown exponentially.

In general, machine training relies on a machine determining a statistical model on the basis of a training procedure, whose parameters are set by the person who programs the training, and on the basis of a training set.

In practice, this means that the designer controls the principles of the training, its parameters and the data on which it is based, but not the result, which is called the inference model.

Thus, once training is complete, it is the inference model that is used to make predictions on the input data to be processed, and the designer can only regard it as a black box.

Applied to NLP, these methods rely mainly on the vectorization of words, with models such as Bag-of-words or Word2Vec, and on treating these vectors as a machine learning problem.

These models also rely on a pipeline in which syntactic analysis is followed by semantic analysis.

Sentences are thus first parsed syntactically, and a meaning is then overlaid on the result.

Nevertheless, the "black box" nature of the inference models produced is at odds with the objective pursued in NLP.

Language carries meaning; that is its very foundation.

Moreover, this meaning is not expressed by syntax alone or by semantics alone, but by a combination of the two.

For these reasons, and contrary to what one might expect, the solutions of this second branch also rely on a large number of heuristics, which have the additional drawback of lacking any perceptible meaning or logic, because they are created to satisfy an inference engine whose internal organization is not understood.

There is therefore a need for an automatic text processing device that is stable, functional in the most general sense of the term, and that does not rely exclusively on heuristics.

The invention improves the situation.

To this end, the invention proposes a device for automatic text processing by computer, comprising a memory arranged to receive:
- text data to be analyzed in the form of tokens, each comprising a character string and a unique token identifier,
- a concept database associating character strings with concept identifiers, at least some of the concept identifiers being associated with one another,
- model lexical construction data and model structural construction data, each comprising one or more conditions of application to a feature and one or more conclusions constituting elements to be applied to a feature, and
- an observation database associating at least two concept identifiers, a relation type and an observation value indicating the probability that the relation type holds between the at least two concept identifiers.

The device is arranged to work repeatedly on a transient comprising lexical features and structural features produced by applying model lexical constructions and model structural constructions. The transient is initialized with lexical features comprising, for each token, the concept identifier that has the highest frequency in the concept database among those associated with the character string of the token.
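The data held in memory and the initialization of the transient can be illustrated with a minimal Python sketch. The class names, the dictionary layouts, the sample concepts and the function `init_transient` are all assumptions made for illustration; the text does not specify any storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    token_id: int  # unique token identifier
    text: str      # character string of the token


@dataclass
class LexicalFeature:
    token_id: int
    concept_id: str


# Hypothetical concept database: character string -> {concept identifier: frequency}.
CONCEPTS = {
    "bank": {"bank.finance": 120, "bank.river": 35},
    "river": {"river.water": 80},
}

# Hypothetical observation database:
# (concept identifier, relation type, concept identifier) -> observation value.
OBSERVATIONS = {
    ("bank.river", "near", "river.water"): 0.9,
    ("bank.finance", "near", "river.water"): 0.1,
}


def init_transient(tokens, concepts=CONCEPTS):
    """Build the initial transient: one lexical feature per token, carrying
    the most frequent concept identifier associated with the token's string."""
    transient = []
    for token in tokens:
        candidates = concepts.get(token.text, {})
        if not candidates:
            continue  # assumption: unknown strings are skipped in this sketch
        most_frequent = max(candidates, key=candidates.get)
        transient.append(LexicalFeature(token.token_id, most_frequent))
    return transient


tokens = [Token(0, "bank"), Token(1, "river")]
transient = init_transient(tokens)
```

In this toy data the token "bank" starts with the concept `bank.finance` because it is the most frequent one; it is the role of the rectifier to revise such initial choices using the observations.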

This device further comprises:
- a rectifier arranged to determine, for each lexical feature of a transient, a list of concept identifiers associated with the concept identifier of that lexical feature, to determine a set of observations corresponding to the concept identifiers of the lists thus determined, to apply a non-monotonic probabilistic logic inference engine to determine the concept identifier of each list such that the observation values associated with these concept identifiers minimize a cost function defined by applying a multivalued logic operator to one or more rules drawn from the content of the features of the transient and instantiated with the corresponding observation values, and to replace the concept identifiers of the lexical features of the transient with the concept identifiers thus determined,
- a matcher arranged to determine, among the model structural constructions, those whose condition or conditions apply to one or more of the features of the transient, and to return the list of structural constructions together with the feature or features to which their conditions apply, the device being further arranged to rank the structural constructions associated with each feature by frequency of use, the first being the structural construction to be applied to the transient and the others forming a list of options, and
- a combiner arranged to execute sequentially the selection of a structural construction to be applied to the transient, the storage of a copy of the transient with the list of options associated with that structural construction, and the application of that structural construction to the feature or features of the transient with which the matcher associated it, and to repeat this sequential execution on the transient thus modified with the next structural construction to be applied to the transient.
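The matching and ranking step can be sketched as follows. This is an illustrative sketch only: the `Construction` class, the representation of a feature as a dictionary, the `usage_count` field and the sample constructions are assumptions, not elements disclosed in the text.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Construction:
    name: str
    condition: Callable[[dict], bool]  # condition of application to a feature
    usage_count: int                   # assumed measure of frequency of use


def match(constructions, features):
    """For each feature, collect the constructions whose condition applies,
    rank them by frequency of use, and return (feature index, construction
    to apply, list of options)."""
    plan = []
    for index, feature in enumerate(features):
        applicable = [c for c in constructions if c.condition(feature)]
        if not applicable:
            continue
        applicable.sort(key=lambda c: c.usage_count, reverse=True)
        plan.append((index, applicable[0], applicable[1:]))
    return plan


# Hypothetical model structural constructions.
constructions = [
    Construction("noun_compound", lambda f: f.get("pos") == "noun", 10),
    Construction("det_noun", lambda f: f.get("pos") == "noun", 50),
    Construction("verb_object", lambda f: f.get("pos") == "verb", 30),
]
features = [{"pos": "noun"}, {"pos": "adj"}]
plan = match(constructions, features)
```

Here the only matched feature is the noun: `det_noun` is ranked first because it is used most often, and `noun_compound` is kept as an option for later backtracking.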

This device is further arranged to determine, after the execution of the combiner, whether the features of the transient produced define a tree in which all the nodes are connected to one another and which has no cycle, to return this tree if so, and otherwise to repeat the execution of the rectifier, the matcher and the combiner on the last transient produced. It is further arranged, when the matcher returns no structural construction to apply to the transient, to replace the current transient with the most recent copy of the transient, and to execute the combiner with the first construction of the list of options as the structural construction to be applied to the transient.
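The termination test (all nodes connected, no cycle) can be sketched in Python as below. The function name `is_tree` and the representation of the transient as a set of nodes plus a list of undirected links are assumptions made for this example. The sketch uses the standard fact that an undirected graph with n nodes is a tree exactly when it is connected and has n - 1 edges.

```python
def is_tree(nodes, edges):
    """Return True if the undirected graph (nodes, edges) is connected and acyclic."""
    nodes = set(nodes)
    if not nodes:
        return False
    # A tree on n nodes has exactly n - 1 edges.
    if len(edges) != len(nodes) - 1:
        return False
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        if a not in nodes or b not in nodes or a == b:
            return False  # dangling link or self-loop
        adjacency[a].add(b)
        adjacency[b].add(a)
    # Depth-first traversal to check that every node is reachable.
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == nodes
```

A transient whose links form a chain over all of its features passes the test; adding a link that closes a loop, or leaving one feature unattached, makes it fail and triggers a further round of processing.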

This device is particularly advantageous because it solves the problems described above.

It relies entirely on the application of an algorithm whose rules are tied to linguistics.

In various variants, the device may have one or more of the following features:
- the non-monotonic probabilistic logic inference engine comprises an optimizer using the alternating direction method of multipliers,
- the device further comprises a filter arranged to determine, for each structural construction to be applied and its associated list of options, a set of rules, to determine a set of observations from the set of rules and from the concept identifier or identifiers associated with the feature to which the structural construction is to be applied, and to apply the non-monotonic probabilistic logic inference engine with the set of rules and the set of observations in order to determine the structural construction to be applied to the transient and the list of options,
- the rectifier defines frequency rules from lexical features of the transient, neighborhood rules from the concept identifiers of the lexical features of the transient and the concept identifiers associated with them in the concept database within a chosen distance, and structural rules drawn from semantic attributes of non-lexical features of the transient that link two lexical features together,
- the combiner is arranged to store only the copy of the transient with the list of options associated with the structural construction and, when the matcher returns no structural construction to apply to the transient, the device replaces the current transient with the most recent copy of the transient, executes the combiner with the first construction of the list of options as the structural construction to be applied to the transient, and then repeats the application of the rectifier, the matcher and the combiner on the resulting transient,
- the combiner is arranged to store the remaining structural constructions to be applied to the transient together with the copy of the transient and the list of options associated with the structural construction and, when the matcher returns no structural construction to apply to the transient, the device replaces the current transient with the most recent copy of the transient, executes the combiner with the first construction of the list of options as the structural construction to be applied to the transient, followed by the remaining structural constructions, and then repeats the application of the rectifier, the matcher and the combiner on the resulting transient,
- the device is arranged to analyze the returned tree and to produce a semantic graph whose nodes are formed by the lexical features and their concept identifiers, and whose links are defined by the semantic attributes of the non-lexical features that link two lexical features together, and
- the inference engine is arranged to apply a multivalued logic operator chosen from the group comprising the Łukasiewicz t-norm, the minimum t-norm and the Hamacher product.
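The three multivalued logic operators named above have standard definitions on truth values in [0, 1], shown in the sketch below. The `rule_cost` function is an assumption added for illustration: it uses a hinge-style "distance to satisfaction" for a rule of the form body implies head, which is one common way to turn such rules into a cost, and is not stated in the text to be the cost function of the invention.

```python
from functools import reduce


def lukasiewicz(a, b):
    """Lukasiewicz t-norm: max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)


def minimum(a, b):
    """Minimum (Goedel) t-norm: min(a, b)."""
    return min(a, b)


def hamacher_product(a, b):
    """Hamacher product: ab / (a + b - ab), defined as 0 when a = b = 0."""
    if a == 0.0 and b == 0.0:
        return 0.0
    return (a * b) / (a + b - a * b)


def rule_cost(body_values, head_value, t_norm=lukasiewicz):
    """Assumed cost of a rule 'body -> head': how far the truth of the body
    (conjunction of its non-empty list of values under the chosen t-norm)
    exceeds the truth of the head."""
    body = reduce(t_norm, body_values)
    return max(0.0, body - head_value)


# Example: two observation values in the body, one in the head.
cost = rule_cost([0.9, 0.8], 0.5)
```

With the Łukasiewicz t-norm, a body with values 0.9 and 0.8 has truth 0.7, so a head valued 0.5 leaves a cost of about 0.2, while a head valued 0.9 fully satisfies the rule and costs nothing.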

The invention also relates to a computer-implemented method for automatic text processing, comprising the following operations:
a) receiving text data to be analyzed in the form of tokens, each comprising a character string and a unique token identifier, a concept database associating character strings with concept identifiers, at least some of the concept identifiers being associated with one another, model lexical construction data and model structural construction data, each comprising one or more conditions of application to a feature and one or more conclusions constituting elements to be applied to a feature, and an observation database associating at least two concept identifiers, a relation type and an observation value indicating the probability that the relation type holds between the at least two concept identifiers,
b) initializing a transient, which may comprise lexical features and structural features produced by applying model lexical constructions and model structural constructions, with lexical features comprising, for each token, the concept identifier that has the highest frequency in the concept database among those associated with the character string of the token,
c) working repeatedly on the transient by repeating the following successive operations:
c1) determining, for each lexical feature of a transient, a list of concept identifiers associated with the concept identifier of that lexical feature, determining a set of observations corresponding to the concept identifiers of the lists thus determined, applying a non-monotonic probabilistic logic inference engine to determine the concept identifier of each list such that the observation values associated with these concept identifiers minimize a cost function defined by applying a multivalued logic operator to one or more rules drawn from the content of the features of the transient and instantiated with the corresponding observation values, and replacing the concept identifiers of the lexical features of the transient with the concept identifiers thus determined,
c2) determining, among the model structural constructions, those whose condition or conditions apply to one or more of the features of the transient, and returning the list of structural constructions together with the feature or features to which their conditions apply,
c3) ranking the structural constructions associated with each feature by frequency of use, the first being the structural construction to be applied to the transient and the others forming a list of options,
c4) executing sequentially the selection of a structural construction to be applied to the transient, the storage of a copy of the transient with the list of options associated with that structural construction, and the application of that structural construction to the feature or features of the transient with which it was associated by the matcher, and repeating this sequential execution on the transient thus modified with the next structural construction to be applied to the transient,
c5) determining, after the execution of the combiner, whether the features of the transient produced define a tree in which all the nodes are connected to one another and which has no cycle, returning this tree if so, and otherwise repeating operations c1) to c5) on the current transient,
c6) if operation c2) returns no structural construction to apply to the transient, replacing the current transient with the most recent copy of the transient, and executing operation c5) with the first construction of the list of options as the structural construction to be applied to the transient.
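The overall control flow of operations c1) to c6), including the return to a stored copy when no construction matches, can be sketched as a generic loop. Everything here is a hypothetical illustration: the function `parse`, its callable parameters, the `max_rounds` safety bound and the toy integer "transient" used in the example are assumptions. On backtracking, this sketch applies the first stored option to the restored copy and then performs the tree test, which is one possible reading of operation c6).

```python
import copy


def parse(transient, rectify, match, apply_construction, is_tree, max_rounds=100):
    """Repeat rectify / match / combine until the transient forms a tree,
    returning to the most recent stored copy when nothing matches."""
    saved = []  # stack of [copy of transient, target feature, remaining options]
    for _ in range(max_rounds):
        transient = rectify(transient)               # c1
        plan = match(transient)                      # c2 and c3
        if plan:
            for target, chosen, options in plan:     # c4
                saved.append([copy.deepcopy(transient), target, list(options)])
                transient = apply_construction(transient, target, chosen)
        else:                                        # c6
            while saved and not saved[-1][2]:
                saved.pop()                          # no option left at this point
            if not saved:
                return None                          # nothing left to try
            snapshot, target, options = saved[-1]
            chosen = options.pop(0)
            transient = apply_construction(copy.deepcopy(snapshot), target, chosen)
        if is_tree(transient):                       # c5
            return transient
    return None


# Toy stand-in: the "transient" is an integer, constructions add to it,
# the goal is to reach exactly 5, and nothing matches once 5 is exceeded.
result = parse(
    0,
    rectify=lambda t: t,
    match=lambda t: [(None, "+3", ["+2"])] if t < 5 else [],
    apply_construction=lambda t, target, c: t + int(c),
    is_tree=lambda t: t == 5,
)
```

In the toy run the preferred construction "+3" is applied twice and overshoots to 6, where nothing matches; the loop then restores the copy holding 3, applies the stored option "+2" and reaches 5.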

In various variants, the method may have one or more of the following characteristics:
- applying the non-monotonic probabilistic logic inference engine comprises applying an optimizer using the alternating direction method of multipliers (ADMM),
- the method further comprises, between operation c3) and operation c4): c7) determining, for each structural construction to be applied and its associated list of options, a set of rules, determining a set of observations from the set of rules and from the concept identifier(s) associated with the characteristic to which the structural construction is to be applied, and applying the non-monotonic probabilistic logic inference engine with the set of rules and the set of observations in order to determine the structural construction to be applied to the transient and the list of options,
- operation c1) comprises defining frequency rules from lexical characteristics of the transient, neighborhood rules from the concept identifiers of the lexical characteristics of the transient and from the concept identifiers associated with them in the concept database within a chosen distance, and structural rules drawn from semantic attributes of non-lexical characteristics of the transient that link two lexical characteristics together,
- operation c4) stores only the copy of the transient with the list of options associated with the structural construction, and, after execution of operation c7), operations c1) to c7) are repeated on the current transient,
- operation c4) stores the remaining structural constructions to be applied to the transient together with the copy of the transient with the list of options associated with the structural construction, and, after execution of operation c7), operation c5) is applied with the remaining structural constructions, then the application of the rectifier is repeated, then operations c1) to c7) are repeated on the current transient, and
- the method further comprises the following operation: d) analyzing the tree returned by operation c6) and producing a semantic graph whose nodes are formed by the lexical characteristics and their concept identifiers, and whose links are defined by the semantic attributes of non-lexical characteristics linking two lexical characteristics together.
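Purely as an illustration, operation d) might be sketched as follows in Python. The dictionary layout (keys "id", "concept", "links" and "sense"), the placeholder concept identifiers and the name build_semantic_graph are all hypothetical and are not taken from the description.

```python
def build_semantic_graph(lexical, structural):
    """Return (nodes, links) for a semantic graph.

    lexical: list of dicts with hypothetical keys "id" and "concept".
    structural: list of dicts with hypothetical keys "links" (a pair of
    lexical identifiers) and "sense" (the semantic attribute).
    """
    # Each node is a lexical characteristic with its concept identifier.
    nodes = {item["id"]: item["concept"] for item in lexical}
    links = []
    for item in structural:
        source, target = item["links"]
        # Keep only links whose two ends are lexical characteristics.
        if source in nodes and target in nodes:
            links.append((source, item["sense"], target))
    return nodes, links


# Usage with made-up placeholder values.
lexical = [{"id": 1, "concept": "C_ball"}, {"id": 2, "concept": "C_red"}]
structural = [{"links": (1, 2), "sense": "attribute"}]
nodes, links = build_semantic_graph(lexical, structural)
```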

Other characteristics and advantages of the invention will become more apparent on reading the following description, based on examples given by way of illustration and without limitation, with reference to the drawings, in which:
- Figure 1 shows a schematic view of a device according to the invention,
- Figure 2 shows an example of implementation of an automatic text processing function by the device of Figure 1,
- Figures 3 to 7 show examples of implementation of operations of Figure 2,
- Figures 8 to 17 illustrate processing steps of the function of Figure 2 on a simplified example,
- Figure 18 shows a tree returned by the device according to the invention, and
- Figure 19 shows a semantic graph derived from the tree of Figure 18.

The drawings and the description below essentially contain elements of a definite nature. They may therefore serve not only to make the present invention better understood, but also to contribute to its definition, where appropriate.

This description may involve elements subject to protection by author's rights and/or copyright. The rights holder has no objection to the identical reproduction by anyone of this patent document or of its description, as it appears in official records. Otherwise, the holder reserves all rights.

Figure 1 shows a schematic view of a device according to the invention.

The device 2 comprises a memory 4, a rectifier 6, a pairer 8, a filter 10, a combiner 12 and a validator 14.

In the context of the invention, the memory 4 may be any type of data storage suitable for receiving digital data: hard disk, solid-state drive (SSD), flash memory in any form, random access memory, magnetic disk, storage distributed locally or in the cloud, etc. The data computed by the device may be stored on any type of memory similar to the memory 4, or on the memory 4 itself. These data may be erased after the device has performed its tasks, or kept.

In the example described here, the memory 4 receives permanent data, which are used to operate the device 2 and may be enriched over successive executions. In the example described here, the memory 4 also receives temporary data, or working data, which are generated for the needs of a given execution of the device 2 and are not kept after that execution. The permanent data and the working data may be stored on the same memory 4 or on separate memories.

The permanent data comprise a database of semantic concepts based at least in part on the WordNet database (see https://wordnet.princeton.edu/) for the English language (other databases may be used for other languages).

The permanent data also contain, non-exhaustively, the following data, which are defined in detail below:
- model lexical constructions for a given language, that is, a condition based on the lexical category of a word (its part of speech) and a conclusion that characterizes the lexical characteristic by its character string, its part of speech, its unique token identifier and a concept identifier,
- model structural constructions for a given language, that is, one or more conditions that link a lexical or structural characteristic to another lexical or structural characteristic and reflect the syntactic structure of the given language, such as whether an adjective precedes or follows the noun, or the structure of a subordinate clause of cause, etc.,
- filtering rules, each of which formulates a rule associated with a structural construction, in order to resolve a lexical ambiguity by means of semantics, and
- a database of observations, which are quadruplets associating a relation type, two concept identifiers, and an observation value between 0 and 1 (0 meaning that the suggested relation type is false, 1 that it is true, and intermediate values expressing a probability that it is true).

For example, for the relation type Est Un(A,B) ("is a"), the value of the observation Est Un(Einstein, man) would be 1, the value of the observation Est Un(Einstein, dog) would be 0, and that of Est Un(Einstein, genius) would be 0.95.

Thus, following a closed-world assumption, if two concepts are not linked, the observations linking them are initialized to 0.
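The observation quadruplets and the closed-world default described above could be represented, for instance, as in the following Python sketch. The class name ObservationStore and its methods are hypothetical and serve only to illustrate the idea; the description does not specify how the observation database is stored.

```python
class ObservationStore:
    """Maps (relation type, concept A, concept B) to a value in [0, 1]."""

    def __init__(self):
        self._values = {}

    def add(self, relation, concept_a, concept_b, value):
        if not 0.0 <= value <= 1.0:
            raise ValueError("observation value must lie between 0 and 1")
        self._values[(relation, concept_a, concept_b)] = value

    def get(self, relation, concept_a, concept_b):
        # Closed-world assumption: a relation that was never recorded
        # is treated as false, i.e. its observation value is 0.
        return self._values.get((relation, concept_a, concept_b), 0.0)


store = ObservationStore()
store.add("Est Un", "Einstein", "man", 1.0)
store.add("Est Un", "Einstein", "genius", 0.95)
```

With this layout, querying Est Un(Einstein, dog) returns 0 without any explicit entry having been stored for it.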

This observation database can be supplemented from the content of the concept database in order to deduce logical relations between concepts.

Observations could link more than two concept identifiers. Such a relation can nevertheless be reformulated as a set of several observations linking the concept identifiers in pairs.

In the context of the invention, the rectifier 6, the pairer 8, the filter 10, the combiner 12 and the validator 14 are elements that access the memory 4 directly or indirectly. They may be implemented as suitable computer code executed on one or more processors. "Processor" should be understood to mean any processor suited to the computations described below. Such a processor may be produced in any known manner, in the form of a microprocessor for a personal computer, a dedicated chip of the FPGA or SoC ("system on chip") type, a computing resource on a grid, a microcontroller, or any other form capable of providing the computing power required for the embodiment described below. One or more of these elements may also be implemented as specialized electronic circuits such as an ASIC. A combination of processors and electronic circuits may also be envisaged.

Figure 2 shows an example of implementation of an automatic text processing function by the device of Figure 1.

The role of the function of Figure 2 is fundamental. It is a loop that calls the rectifier 6, the pairer 8, the filter 10, the combiner 12 and the validator 14 in order to progressively build a semantic graph representing the meaning of the sentence received as input.

With this function, the device 2 thus provides a fundamental building block for NLP: because the function is almost free of heuristics, it offers a repeatable way of creating a language sub-layer that gives machines access to the semantic understanding of texts.

The device 2 finds a particularly effective application in the field of question answering. It first analyzes a text to establish its semantics, then analyzes a question in the same way in order to relate it to those semantics and provide the answer. More generally, the device 2 relies on an existing database of semantic concepts but is able to enrich it through its own operation, unlike methods based on machine learning.

The function of Figure 2 begins with an operation 200, in which a function Init() is executed. An example of implementation of the function Init() is explained with reference to Figure 3. The role of the function Init() is to initialize the loop of Figure 2, and in particular to initialize the main object that is modified by the loops in order to obtain the semantic graph that is the result. This object is here called a "transient", because it evolves many times, very quickly, in order to generate the semantic graph.

As will be seen below, a transient is made up of characteristics. Each characteristic may relate either to a particular token, in which case it is called a lexical characteristic, or to a relation between two lexical characteristics, in which case it is called a structural characteristic. A structural characteristic may itself relate to the relation between other structural characteristics, which makes it possible to build complex semantic meanings.
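As a minimal sketch, and not as the actual data model of the device 2, a transient and its two kinds of characteristics could be represented as follows in Python. The class names, the field names and the relation labels "modifier" and "determiner" are assumptions introduced for the illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class LexicalFeature:
    """A characteristic relating to one particular token."""
    token_id: int
    text: str
    part_of_speech: Optional[str] = None
    concept_id: Optional[str] = None


@dataclass
class StructuralFeature:
    """A characteristic relating two other characteristics."""
    relation: str
    left: "Feature"
    right: "Feature"


Feature = Union[LexicalFeature, StructuralFeature]


@dataclass
class Transient:
    features: List[Feature] = field(default_factory=list)


def lexical_leaves(feature):
    """Collect the lexical characteristics reachable from a characteristic."""
    if isinstance(feature, LexicalFeature):
        return [feature]
    return lexical_leaves(feature.left) + lexical_leaves(feature.right)


le = LexicalFeature(0, "le")
ballon = LexicalFeature(1, "ballon")
rouge = LexicalFeature(2, "rouge")
noun_group = StructuralFeature("modifier", ballon, rouge)
# A structural characteristic may itself relate to a structural one.
determined = StructuralFeature("determiner", le, noun_group)
transient = Transient([le, ballon, rouge, noun_group, determined])
```

Allowing a structural characteristic to point at another structural characteristic is what lets nested groupings be expressed in this sketch.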

This way of working also makes it possible to build the semantic graph both syntactically and semantically, which respects the nature of language.

Finally, it also makes it possible to handle multilingual sentences. If part of a sentence is not resolved in a given language, it can be analyzed with the characteristics of another language in order to identify, in a second language, a sub-part of the sentence that gives a semantic meaning to a sentence that had none in the first language. This is entirely new in the field of NLP, and is entirely out of reach of solutions based on machine learning.

The characteristics, whether lexical or structural, result from the application of linguistic objects called constructions. These objects, well known in the field of linguistic analysis, have until now never found an effective computer application.

Constructions rely on a condition(s)/conclusion(s) pair. In other words, a construction is an object that behaves as follows: if lexical or structural characteristics satisfy the condition(s), then the conclusion(s) of the construction are applied to them. Thus, each time a loop finds constructions that apply to the characteristics of the transient, it completes those characteristics, or creates one or more additional characteristics in the transient if they do not yet exist.
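The condition/conclusion behavior described above can be illustrated by the following simplified Python sketch. It assumes, for the sake of the example, that characteristics are plain dictionaries of fields and that a construction holds one dictionary of conditions and one of conclusions; the function names are hypothetical, and the creation of new characteristics is left out.

```python
def matches(feature, conditions):
    """True if every condition field has the required value in the feature."""
    return all(feature.get(key) == value for key, value in conditions.items())


def apply_construction(transient, construction):
    """Complete every matching feature with the conclusions.

    Returns the number of features that were completed.
    """
    completed = 0
    for feature in transient:
        if matches(feature, construction["conditions"]):
            feature.update(construction["conclusions"])
            completed += 1
    return completed


transient = [
    {"Chaîne": "ballon", "CatégorieLexicale": "nom"},
    {"Chaîne": "rouge", "CatégorieLexicale": "adjectif"},
]
construction = {
    "conditions": {"CatégorieLexicale": "nom"},
    "conclusions": {"ClasseSémantique": "identifier"},
}
count = apply_construction(transient, construction)
```

Here only the first characteristic satisfies the condition, so only it receives the conclusion field.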

The semantic graph is thus built step by step, starting from the words and assigning them the meaning they have in the sentence, then grouping them into noun groups and verb groups, then noun phrases and verb phrases, etc., until the sentence is fully defined.

As will be seen, the function of Figure 2 is particularly powerful because it nevertheless makes it possible, during its execution, to re-evaluate the meaning associated with a word and to propagate the consequence of this change throughout the semantic graph.

The conditions and conclusions of constructions are made up of fields whose values are fixed or variable.

The conditions typically relate to one or more fields of one or more of the following types:
- Frontières (boundaries), which uses predicates of the same name to define the bounds of a group of elements,
- SousUnités (sub-units), which receives a list of elements designated by the element,
- CatégorieLexicale (lexical category), which defines a lexical category (for example noun, verb, common noun, proper noun, adverb, article, adjective, etc.),
- ClasseLexicale (lexical class), which defines a lexical class (for example transitive verb, intransitive verb, auxiliary, determiner, etc.),
- CatégoriePhrasale (phrasal category), which defines groups of words (for example noun phrase, verb phrase, etc.), and
- CatégorieClausale (clausal category), which defines a clause category and makes it possible to group together groups of words, for example noun phrases, verb phrases, etc.

The conclusions typically relate to one or more fields of one or more of the following types:
- Référent (referent), which defines a variable identifier uniting several elements at the same level,
- Args, which defines an argument constituting links between elements,
- Parent, which defines the parent element of the current element in the structure,
- Sens (sense), which defines a relational sense (for example a temporal relation: simultaneous, temporal reference point, etc.), and
- ClasseSémantique (semantic class), which defines a semantic class.

The conditions and conclusions may have many other fields, such as Forme (relating to an attribute form), Chaîne (a character string), Précède (a predicate linking two elements), Suit (a predicate linking two elements), Interrogation (indicating that a group of elements includes an interrogative verb, pronoun or adverb), Préposition, TypeDePhrase (for example noun phrase, etc.), Nombre, Date, Personne, ValenceSémantique (for example actor and its element identifier, etc.), ValenceSyntactique (for example subject or object and its element identifier), FonctionSyntaxique (for example adjective, auxiliary, verb, etc.), Temps, Voix (active or passive), FormePassive (yes or no), etc.

For example, the construction of the "Composé" (compound) type may be defined as follows:

    {
      "id": "noms-composés",
      "score": 0.0,
      "type": "Phrasal",
      "description": "règles de noms composés",
      "catégorie": "en",
      "groupe": "stage1",
      "constructionClass": "Classic",
      "locks": [
        {
          "nom": "?noun1",
          "comprehension": {
            "nom": "?noun1",
            "map": { "CatégorieLexicale": … }
          }
        },
        {
          "nom": "?noun2",
          "comprehension": {
            "nom": "?noun2",
            "map": { "CatégorieLexicale": "[nom]" }
          }
        },
        {
          "nom": "?compound",
          "comprehension": {
            "nom": "?compound",
            "map": {
              "Forme": {
                "nom": "?v2",
                "map": { "Accolé": "[[?noun1 ?noun2]]" }
              }
            }
          }
        }
      ],
      "conclusions": [
        {
          "nom": "?compound",
          "map": {
            "Accord": "?v5",
            "Args": "[?v9,?v7]",
            "Référent": "?ref",
            "ClasseSémantique": "identifier",
            "Parent": "?v10",
            "Sens": "[[attribute, ?v9, ?v6]]",
            "SousUnités": "[?noun1,?noun2]//union",
            "CatégorieLexicale": "[nom,composé]"
          }
        }
      ]
    }

In this construction, the "?" indicates a variable, which may appear both in the conditions and in the conclusions, serve to define how the construction is applied, etc.

It should also be noted that the function of Figure 2 is quasi-recursive in nature. This makes it harder to understand than a conventional sequential function, and should be kept in mind when reading what follows. This is why Figures 8 to 17 are provided. They do not in themselves teach the technique of the device 2, but they help in understanding how it explores all the possibilities in order to establish the semantic graph.
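Returning to the "?" variables of the construction given above, the way a variable bound in a condition can be reused in a conclusion may be illustrated by the following Python sketch. It is a deliberately reduced model: fields are flat dictionaries, and the functions bind, match_fields and substitute are hypothetical names that do not appear in the description.

```python
def bind(pattern, value, bindings):
    """Match one pattern against a value; return extended bindings or None."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            # A variable already bound must keep the same value.
            return bindings if bindings[pattern] == value else None
        return {**bindings, pattern: value}
    return bindings if pattern == value else None


def match_fields(pattern_map, feature, bindings=None):
    """Match every field of a condition against a feature."""
    bindings = {} if bindings is None else bindings
    for key, pattern in pattern_map.items():
        if key not in feature:
            return None
        bindings = bind(pattern, feature[key], bindings)
        if bindings is None:
            return None
    return bindings


def substitute(template_map, bindings):
    """Replace bound variables in a conclusion by their values."""
    return {
        key: bindings.get(value, value) if isinstance(value, str) else value
        for key, value in template_map.items()
    }


feature = {"CatégorieLexicale": "nom", "Chaîne": "ballon"}
condition = {"CatégorieLexicale": "nom", "Chaîne": "?s"}
conclusion = {"Référent": "?s", "ClasseSémantique": "identifier"}
bindings = match_fields(condition, feature)
result = substitute(conclusion, bindings)
```

In this sketch the variable "?s" is bound to the string of the matched token by the condition, and the same value is then written into the conclusion.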

The function Init() begins with an operation 300, in which an array Tk[] is initialized by a function Lex(). The function Lex() performs the lexical analysis of a sentence received as input by the device 2, and returns an array Tk[] in which the sentence is split into normalized tokens. For each token, the array Tk[] stores the corresponding character string and a unique identifier of that token in the sentence. This result is also stored in an array Tst Stack[], which is described below.

The function Lex() implements a lexical analyzer to produce a sequence of normalized tokens. Normalization refers to the fact that some words can be written in several forms (for example contractions in English), or that some characters must be removed or grouped. The lexical analyzer thus performs one or more of the following functions:
- cleaning up troublesome characters (footnote markers, special characters, etc.),
- splitting a text into sentences (using delimiters such as the period, the exclamation mark, etc.),
- grouping special expressions together (dates, etc.),
- expanding contracted words (for example "don't" becomes "do not"),
- splitting the sentence into normalized tokens for processing by the device 2.

L'analyseur lexical ne fait pas l'objet de l'invention et l'homme du métier connaît plusieurs solutions dans l'état de l'art pour le mettre en oeuvre.The lexical analyzer is not the subject of the invention and a person skilled in the art knows several solutions in the state of the art for implementing it.
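By way of illustration only, the normalization steps listed above can be sketched as follows. The description leaves the lexical analyzer to the state of the art, so everything here is an assumption: the names `lex` and `CONTRACTIONS`, the tiny contraction table, the lowercasing, and the use of the token position as its unique identifier.

```python
import re

# Hypothetical contraction table; a real analyzer would use a far larger one.
CONTRACTIONS = {"don't": "do not", "can't": "can not", "it's": "it is"}

def lex(sentence):
    """Return a list of (token_id, token_string) pairs for one sentence."""
    text = sentence.lower()  # assumption: tokens are case-normalized
    # Expand contracted words before tokenizing.
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    # Keep runs of word characters only; punctuation is dropped.
    words = re.findall(r"\w+", text)
    # The position in the sentence serves as the unique token identifier.
    return list(enumerate(words))

tk = lex("I don't hold the ball.")
```

In this sketch `tk` plays the role of the array Tk[]: each entry pairs an identifier with a normalized string.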

Then, in an operation 310, form predicates are initialized by a function SFP() which receives the array Tk[] as its argument.

The SFP() function takes the array Tk[] and produces predicates describing the positions of the tokens relative to one another.

Thus, for two tokens [ballon][rouge] ("red ball"), the SFP() function creates a predicate of type Accolé([ballon],[rouge]) ("adjacent") and a predicate of type Précède([ballon],[rouge]) ("precedes").

These predicates therefore indicate that the token [ballon] is adjacent to the token [rouge] and directly precedes it.

In the example described here, the Précède() predicate is generated for all the tokens downstream of an upstream token.

The SFP() function is also arranged to generate predicates of type Frontières() ("boundaries"), which indicate the start and end indices of a chain of several tokens.

The set of predicates thus produced is stored in memory 4, and is accessed to determine whether conditions apply, as described below.
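As a minimal sketch of how such positional predicates might be generated, assume that tokens arrive as (identifier, string) pairs in sentence order and that a predicate is represented as a plain tuple. The function name `form_predicates` and the tuple layout are hypothetical. The Frontières() predicate is left out because the description does not specify its arguments.

```python
def form_predicates(tokens):
    """tokens: list of (token_id, string) in sentence order.
    Returns a set of predicate tuples (name, left_id, right_id)."""
    preds = set()
    ids = [tid for tid, _ in tokens]
    for i, a in enumerate(ids):
        # Adjacent: a is immediately followed by the next token.
        if i + 1 < len(ids):
            preds.add(("Accolé", a, ids[i + 1]))
        # Precedes: generated for every token downstream of a.
        for b in ids[i + 1:]:
            preds.add(("Précède", a, b))
    return preds

preds = form_predicates([(0, "le"), (1, "ballon"), (2, "rouge")])
```

For a sentence of n tokens this produces n-1 adjacency predicates and n(n-1)/2 precedence predicates.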

A loop is then launched to analyze each lexical characteristic of the transient in order to initialize the concept identifiers.

At this stage, and for the last time in the loop, a characteristic of the transient can be equated with a concept. Thus, in an operation 320, the transient Tst is popped, and in an operation 330, a function Find() determines, for the current token, the syntactic identifier that has the highest frequency in the concept database of memory 4, and stores it in an array Ltk[].

For example, if the token is "rouge" (red), the Find() function returns the syntactic identifier associated with the adjective "rouge" rather than the one associated with the color "rouge", because the word is used more often as an adjective than as a noun.

Simultaneously, the Find() function assigns the most frequent concept identifier among the concept identifiers associated with this syntactic identifier.

Finally, the other syntactic identifiers are stored as options in an array OC[]. These options will be stored in the array Tst Stack[] in an operation 350 described below.
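The frequency-based selection just described can be sketched as follows. The layout of the concept database is not given in the description, so the dictionary structure, the frequencies, and the identifiers ("ADJ", "NOUN", "red_property", and so on) are purely hypothetical.

```python
# Hypothetical lexicon: token -> {syntactic_id: (frequency, {concept_id: frequency})}
LEXICON = {
    "rouge": {
        "ADJ": (0.8, {"red_property": 0.9, "red_political": 0.1}),
        "NOUN": (0.2, {"red_color": 1.0}),
    }
}

def find(token, lexicon):
    """Return (best_syntactic_id, best_concept_id, other_syntactic_ids)."""
    entries = lexicon[token]
    # Rank the syntactic identifiers by decreasing frequency.
    ranked = sorted(entries, key=lambda s: entries[s][0], reverse=True)
    best = ranked[0]
    # Among the concepts attached to the winner, keep the most frequent one.
    concepts = entries[best][1]
    best_concept = max(concepts, key=concepts.get)
    # The remaining syntactic identifiers are kept as options.
    return best, best_concept, ranked[1:]

result = find("rouge", LEXICON)
```

The third element of the result corresponds to the options that the description stores for later backtracking.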

When all the tokens have been processed in this way, the array Ltk[] is supplied as an argument to a function LexConstr() in an operation 340.

The LexConstr() function returns a list of lexical constructions Cstr[] which will be used to initialize the lexical characteristics of the transient.

This is done in an operation 350 in which a function Merge() receives the list of lexical constructions Cstr[] from operation 340 and the transient Tst, and combines them.

Here again, since this is the first operation, the combination is guaranteed to succeed: the condition of each lexical construction in the list Cstr[] is necessarily fulfilled by a characteristic of the transient Tst, since the constructions were chosen specifically for that purpose.

As indicated above, optional lexical characteristics are generated and stored in an array Tst Stack[].

These options can be explored when a problem is identified.

This is handled in particular by a function Bck() in an operation 290.

This will be described in more detail with the description of combiner 12 in connection with Figure 7.

On output, the transient Tst is therefore initialized with the lexical characteristics corresponding to each token, together with the unique token identifier and the concept identifier determined to be the most probable.

The predicates of type Accolé() and Précède() are also stored for later use, and the function ends in an operation 399.

After the initialization operation 200, the loop of Figure 2 begins with an operation 205 in which a function Max() determines whether an exit condition tied to an excessive number of loop executions is met.

This avoids getting stuck in an overly long computation loop (for example beyond 1000 iterations).

When this condition is met, the function of Figure 2 ends in an operation 299 with an error.

As a variant, operation 205 can be omitted.

Then a new loop begins.

This loop begins in an operation 210 with the execution of a function Sem().

In the example described here, the Sem() function is implemented by rectifier 6, and Figure 4 shows an exemplary embodiment.

Generally speaking, the purpose of the Sem() function is to analyze the current transient, which has just been enriched by the previous loop iteration, and to see whether it would be appropriate to change one or more of the concept identifiers of the lexical characteristics in view of the structural characteristics of the transient.

In other words, the Sem() function "shakes" the bag of concept identifiers available for each lexical characteristic, in order to determine whether another concept would give more meaning, from a semantic point of view, to the sentence described by the transient at this stage.

Thus, in an operation 400, rectifier 6 creates an array Concept[] which collects all the concept identifiers of the lexical characteristics of the current transient Tst.

Then, in an operation 410, a function Extrap() determines, for each of these concept identifiers, the list of concept identifiers linked to it in the concept database, and stores each list in an entry of an array Candid[].

On the basis of the array Candid[], a function Obsety() collects all the observations related to each concept of each list in Candid[] and groups them in an array Obs[] in an operation 420.

Finally, in an operation 430, a function Infer() uses the arrays Candid[] and Obs[] to modify the current transient, and the function ends with an operation 499.

More precisely, the Infer() function uses a non-monotonic probabilistic logical inference engine which computes a multitude of cost functions from the observations in Obs[], one for each combination formed by taking one concept identifier from each list in Candid[].

In other words, combining the lists produces a combinatorial set of concept-identifier assignments, and the observations associated with each concept identifier are used to compute a cost function for each of them.
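To make the size of this search space concrete, the sketch below enumerates every combination of one concept identifier per list and keeps the cheapest. It is a naive brute-force search shown for illustration, not the inference engine of the description. The candidate lists, the concept names, and the toy cost function are all hypothetical.

```python
from itertools import product

def best_assignment(candid, cost):
    """Brute-force search: try one concept per list and keep the cheapest."""
    return min(product(*candid), key=cost)

# Hypothetical candidate lists, one per lexical characteristic.
candid = [["fred_person", "fred_brand"], ["hold_grasp", "hold_delay"]]

def toy_cost(combo):
    """Toy cost, lower is better: only one pairing is treated as coherent."""
    return 0.0 if combo == ("fred_person", "hold_grasp") else 1.0

n_combinations = len(list(product(*candid)))
best = best_assignment(candid, toy_cost)
```

The number of combinations is the product of the list lengths, so it grows exponentially with the number of lexical characteristics. Exhaustive enumeration of this kind quickly becomes impractical for real sentences.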

The cost function is built by deriving a plurality of rules from the transient.

These rules are then evaluated on the basis of the observations by applying a many-valued logic operator which makes it possible to linearize the problem.

In the preferred version of the invention, this operator is the Łukasiewicz t-norm.

As a variant, the operator could be the minimum t-norm or the Hamacher product.
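For reference, the three operators named above have the following standard definitions on truth values in [0, 1]. The function names are chosen here for illustration. The Hamacher product is taken to be 0 when both arguments are 0, which is the usual convention.

```python
def lukasiewicz(a, b):
    """Łukasiewicz t-norm: max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)

def minimum(a, b):
    """Minimum (Gödel) t-norm."""
    return min(a, b)

def hamacher_product(a, b):
    """Hamacher product: ab / (a + b - ab), with the value 0 at a = b = 0."""
    if a == 0.0 and b == 0.0:
        return 0.0
    return (a * b) / (a + b - a * b)
```

All three reduce to the classical conjunction on the values 0 and 1. The Łukasiewicz t-norm is piecewise linear in its arguments, which is consistent with the remark that the operator allows the problem to be linearized.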

These rules fall into three categories.

The first category comprises so-called frequency rules.

They are based on the frequency of the concept identifier associated with the lexical characteristics present in the transient.

These rules are expressed in the form CaractLex(Groupe(Car)) => Concept(Groupe(Car)).

There is one rule per group of concept identifiers produced by the Extrap() function.

The second category comprises so-called neighborhood rules, which are based on the links between the concept identifiers of the lexical characteristics in the concept database.

For these rules, the concept database is explored starting from each group of concept identifiers, looking for "neighboring" concept identifiers in the other groups of concept identifiers, within a chosen distance.

For example, if a first group contains the concept "Fred" and a second group contains the concept "hold", then these concepts are at a distance of two concepts in the concept database.

Indeed, "Fred" is a proper noun associated with the concept "human being", and the concept "human being" is itself linked to the capability "hold", since human beings hold objects.

Thus, when a link is found between two concept identifiers of two distinct groups within the chosen distance in the concept database, a neighborhood rule is created.

These rules are expressed in the form:
CaractLex(Groupe(Car1)) & CaractLex(Groupe(Car2)) & Lien(Groupe(Car1), Groupe(Car2)) => Concept(Groupe(Car2))

Finally, the third category comprises so-called structural rules, because they derive from the semantic links established between the lexical characteristics within the transient.

These rules are therefore drawn from the semantic attributes of the characteristics produced by the structural constructions that link two lexical characteristics together.

For example, if it has been identified that the lexical characteristic associated with the string "Fred" is linked to the lexical characteristic associated with the string "holds" by a semantic attribute of type "actor", then a corresponding rule replaces the second-category rule which linked these two lexical characteristics.

These rules are therefore expressed in the form:
CaractLex(Groupe(Car1)) & CaractLex(Groupe(Car2)) & AttSem(Groupe(Car1), Groupe(Car2)) => Concept(Groupe(Car2))

The cost function instantiates these rules with the observations selected according to the combination of concept identifiers of each group produced by the Extrap() function.

Optimizing this cost function determines the combination of one concept identifier from each list that offers the best semantics for the current transient, whose lexical characteristics are then updated with the new concept identifiers considered more relevant.

It therefore appears that, when the transient contains only lexical constructions, there are only first-category and second-category rules, and the cost function is based on the co-occurrence of the concept identifiers in the concept database.

Then, as the structural constructions add semantic links between the lexical characteristics in the transient, third-category rules, which are much more discriminating, are introduced into the cost function and constrain it strongly.
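The description does not spell out how a rule of the form "body => head" contributes to the cost. One common approach with Łukasiewicz logic, assumed here, is to charge each rule its "distance to satisfaction": the amount by which the truth of the body exceeds the truth of the head. The function names, the rule weights, and the truth values below are hypothetical.

```python
def rule_distance(body, head):
    """Distance to satisfaction of (b1 & ... & bn) => head in Łukasiewicz logic.
    body: list of truth values in [0, 1]; head: truth value in [0, 1]."""
    # n-ary Łukasiewicz conjunction of the body atoms.
    conj = max(0.0, sum(body) - (len(body) - 1))
    # The rule is fully satisfied (distance 0) when head >= conj.
    return max(0.0, conj - head)

def total_cost(rules):
    """rules: iterable of (weight, body_truths, head_truth)."""
    return sum(w * rule_distance(b, h) for w, b, h in rules)

# A structural-style rule with three body atoms and a weakly supported head,
# plus a frequency-style rule that is already satisfied.
cost = total_cost([(2.0, [1.0, 1.0, 1.0], 0.5), (1.0, [0.3], 1.0)])
```

Each term is piecewise linear in the truth values, so a cost built this way is convex. A rule with a larger weight constrains the result more strongly, in the manner described above for third-category rules.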

The Applicant has discovered that using a non-monotonic probabilistic logical inference engine makes it possible, for the first time, to obtain a satisfactory result when implementing a method based on linguistic constructions.

Indeed, the Sem() function, through the semantic adjustment it provides at each execution of the loop, is fundamental to obtaining a favorable result.

The Applicant has further discovered that it is particularly advantageous to use an inference engine including an optimizer that uses the alternating direction method of multipliers (ADMM).

Indeed, using such an optimizer reduces computation time by linearizing the problem, whereas the underlying problem is of NP type, i.e. a combination of all the variants of each list with one another, multiplied by the number of observations for each member of each list.
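The mechanics of ADMM can be shown on a deliberately small problem. The sketch below does not reproduce the objective of the description; it applies scaled-form ADMM to a one-variable toy problem, minimizing 0.5*(x - a)^2 + lam*|z| subject to x = z, chosen only because each update has a closed form. All names and parameter values are illustrative assumptions.

```python
def soft_threshold(v, k):
    """Proximal operator of k * |z|."""
    if v > k:
        return v - k
    if v < -k:
        return v + k
    return 0.0

def admm_scalar(a, lam, rho=1.0, iters=100):
    """Minimize 0.5*(x - a)**2 + lam*|z| subject to x == z (scaled-form ADMM)."""
    x = z = u = 0.0
    for _ in range(iters):
        x = (a + rho * (z - u)) / (1.0 + rho)  # x-update: quadratic term
        z = soft_threshold(x + u, lam / rho)   # z-update: absolute-value term
        u = u + x - z                          # dual update on the constraint x == z
    return z

solution = admm_scalar(3.0, 1.0)
```

The objective is split into two parts that are each easy to minimize on their own, and the dual variable `u` gradually forces the two copies of the variable to agree. For a = 3 and lam = 1 the iterates approach 2, the known closed-form minimizer of this toy problem.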

Once the semantic adjustment has been made, the loop continues with an inner loop which tests each of the model structural constructions against all the characteristics of the transient and determines which ones are likely to apply.

To do this, in an operation 220 an array ConStr[] of model structural constructions is popped, and in an operation 230 the structural construction c obtained from operation 220 is tested against all the characteristics of the current transient in a function Match().

The Match() function is executed by matcher 8, and Figure 5 shows an example implementation of this function.

Generally speaking, the Match() function analyzes each of the conditions of the structural construction c and builds the n-tuples of characteristics that satisfy the conditions of the structural construction.

Thus, in an operation 500, the list L[] of the conditions of construction c is retrieved by means of a function Locks().

Then a loop is started in which this list is popped in an operation 510 and each characteristic of the current transient is compared with the current condition.

To do this, the transient is popped in an operation 520 and the corresponding characteristic f is compared with the current condition l in an operation 530.

If characteristic f does not satisfy condition l, the next characteristic is tested by repeating operation 520.

If characteristic f satisfies the condition, a function AddFt() is executed in an operation 540.

The AddFt() function adds to an array m[] all the groups of characteristics which satisfy a condition of the structural construction.

Thus, when a characteristic satisfies condition l, the AddFt() function determines which groups in the array m[] are compatible with this characteristic in view of all the conditions of the structural construction, and adds characteristic f to all the compatible groups of characteristics.

After operation 540, or if the test of operation 530 is negative, the loop resumes with the next characteristic in operation 520.

When all the characteristics have been tested for the current condition l, the loop is repeated with the next condition by repeating operation 510.

When all the conditions have been tested, in an operation 550, a function Rem() prunes the array m[], checking the groups of characteristics produced and keeping only those that are complete, i.e. that fulfil all the conditions of the structural construction.

These groups therefore form n-tuples of characteristics to which the structural construction can be applied, and the function then ends in an operation 599.

It should be noted that the function of Figure 5 can be implemented in many ways.

For example, the loop could be arranged so as to skip the testing of a characteristic as soon as it is detected that it fails a condition, for example by testing the value associated with it in the array m[] at the start of the loop.

Other variants may be envisaged.
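One possible realization of the matching loop is sketched below, with simplifying assumptions. Each condition is a test on a single characteristic, a characteristic is a (string, category) pair, and a characteristic may fill only one slot of a group. Groups that cannot be extended are discarded as soon as a condition fails, rather than in a final pruning step. The function and variable names are hypothetical.

```python
def match(conditions, features):
    """conditions: list of functions feature -> bool.
    Returns the complete groups, one feature per condition, as tuples."""
    groups = [()]
    for cond in conditions:
        extended = []
        for group in groups:
            for f in features:
                # A feature may fill only one slot of a given group.
                if f not in group and cond(f):
                    extended.append(group + (f,))
        # Groups that found no feature for this condition are dropped here.
        groups = extended
    return groups

features = [("Fred", "noun"), ("holds", "verb"), ("ball", "noun")]
is_noun = lambda f: f[1] == "noun"
is_verb = lambda f: f[1] == "verb"
tuples = match([is_noun, is_verb], features)
```

With these inputs the result contains two candidate pairs, one pairing "Fred" with "holds" and one pairing "ball" with "holds". Choosing between them is left to later steps.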

The characteristics that match construction c are then stored in an array Con[] in an operation 235, and the loop resumes with operation 210.

Once all the model structural constructions have been tested, the array Con[] is tested in an operation 240.

If the array Con[] contains constructions, operations 250 to 270 analyze these constructions, select the most relevant ones, and establish a list of options in case the selected constructions lead to a dead end in subsequent loops.

If this array is empty, no structural construction can be applied to the characteristics of the transient.

Since this test is inside a loop, this means that the semantic graph has not been fully resolved.

The fact that the array Con[] is empty indicates that the semantic graph cannot be completed.

The options established in the previous loop or loops must therefore be explored.

This is done in an operation 290.

In operation 250, a function Ord() processes the array Con[] and produces two arrays, C2M[] and OC[].

The array C2M[] contains the list of the most probable structural constructions, while the array OC[] contains the list of options.

More precisely, the function Ord() initializes a first list by removing the first tuple from the array Con[].

It then iterates over all the other tuples of the array Con[] and, each time a tuple concerns a characteristic of the first tuple, inserts that tuple into the first list and removes it from the array Con[].

Once all the tuples have been traversed, the operation is repeated on the remainder of the array Con[] until it is empty.

The result is an array C2M[] containing the tuples that were used to generate the lists, and an array OC[] containing the tuples that were progressively added to each list as options for the corresponding tuple of the array C2M[].
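Purely by way of illustration, the grouping performed by the function Ord() may be sketched as follows in Python. The tuple layout (a construction name paired with the identifiers of the characteristics it applies to), the representation of OC[] as one list of options per entry of C2M[], and the name ord_constructions are assumptions made for this sketch; the input is assumed to be already ordered by decreasing frequency of use.

```python
def ord_constructions(con):
    """Split matched constructions into primary choices (C2M) and options (OC).

    `con` is a list of (construction, characteristics) tuples. This layout is
    an assumption of the sketch, not something stated in the description.
    """
    remaining = list(con)
    c2m, oc = [], []
    while remaining:
        head = remaining.pop(0)          # first tuple starts a new list
        c2m.append(head)
        options, rest = [], []
        for item in remaining:
            # A tuple "concerns a characteristic of the first tuple" when the
            # two share at least one characteristic identifier (assumption).
            if set(item[1]) & set(head[1]):
                options.append(item)
            else:
                rest.append(item)
        oc.append(options)
        remaining = rest                 # repeat on what is left of Con[]
    return c2m, oc


con = [
    ("NP", ("c1",)),
    ("VP", ("c2",)),
    ("AdjP", ("c1", "c3")),
    ("Clause", ("c2",)),
]
c2m, oc = ord_constructions(con)
```

In this example, "AdjP" becomes an option for "NP" because both concern the characteristic "c1", and "Clause" becomes an option for "VP".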

In operation 260, the arrays C2M[] and OC[] are processed in a function Filt() by the filter 10 in order to determine whether there is reason to think that the frequency-based choice was not the right one.

More precisely, the arrays C2M[] and OC[] reflect a situation in which several structural constructions correspond to the same characteristic.

In other words, there is a lexical ambiguity, and the function Filt() attempts to resolve it by semantic analysis.

Figure 6 shows an example implementation of the function Filt().

The function Filt() is a loop that analyzes each conclusion of the array C2M[] by popping it in an operation 610.

Then, in an operation 620, a function Opt() generates an array N[] that receives the current conclusion c and all the options corresponding to it in the array OC[].

Then, in an operation 620, a function Rules() generates an array R[] of analysis rules.

For example, when two nominal groups are separated by a comma, it is necessary to determine whether they form a list or an apposition.

To this end, the structural constructions corresponding to the list on the one hand and to the apposition on the other are translated into two rules, which are stored in the array R[] together with the concept identifiers attached to the characteristic concerned.

In the example described here, these rules are drawn from the filtering rules in the memory 4.

More precisely, the rules define predicates between characteristics.

However, if no rule exists for any of the constructions of the array N[], nothing is inserted into the array R[] for the construction c.

Then, in an operation 630, an array of observations Obs[] is generated by a function Observ() in order to determine the observations relating to the concept identifier of the characteristic concerned by the construction c and to each of the rules of the array R[].

Finally, in an operation 640, the inference engine is applied again to the array R[] and the array Obs[].

If the array R[] is empty, nothing is done and the order established by the arrays C2M[] and OC[] is maintained.

Otherwise, the inference engine determines which of the constructions is the most semantically relevant.

The result is an array Con2[] of the constructions to be applied to the current transient and an array OC[] of options.

When all the constructions of the array C2M[] have been processed, the function ends in an operation 699.

As a variant, the function Filt() could be omitted.
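Purely by way of illustration, the control flow of the function Filt() may be sketched as follows in Python. The callables rules_for, observe and infer are hypothetical stand-ins for the functions Rules(), Observ() and the inference engine, whose internals are not reproduced here; the way the remaining candidates are returned as options is likewise an assumption of this sketch.

```python
def filt(c2m, oc, rules_for, observe, infer):
    """Re-examine each frequency-based choice using semantic rules.

    rules_for(n) -> list of rules (may be empty)
    observe(c, rules) -> list of observations
    infer(n, rules, obs) -> the element of n judged most relevant
    All three callables are hypothetical placeholders.
    """
    con2, oc2 = [], []
    for c, options in zip(c2m, oc):      # operation 610: take each choice in turn
        n = [c] + list(options)          # operation 620: choice plus its options
        rules = rules_for(n)             # array R[] of analysis rules
        if not rules:
            # No rule applies: the frequency-based order is kept unchanged.
            con2.append(c)
            oc2.append(list(options))
            continue
        obs = observe(c, rules)          # operation 630: array Obs[]
        best = infer(n, rules, obs)      # operation 640: inference engine
        idx = n.index(best)
        con2.append(best)
        oc2.append(n[:idx] + n[idx + 1:])  # the other candidates become options
    return con2, oc2


# Toy stand-ins: a rule exists only when there is an actual ambiguity,
# and the "engine" simply prefers the last candidate.
con2, oc2 = filt(
    ["list", "np"],
    [["apposition"], []],
    rules_for=lambda n: ["rule"] if len(n) > 1 else [],
    observe=lambda c, rules: [0.9],
    infer=lambda n, rules, obs: n[-1],
)
```

With these toy stand-ins, the ambiguous pair is reordered so that "apposition" is applied and "list" is kept as an option, while the unambiguous "np" is left untouched.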

Once the function Filt() has been executed, the selected constructions of the array Con2[] are applied to the current transient in an operation 270 by the combiner 12 in a function Merge().

Figure 7 shows an example implementation of the function Merge().

Here again, the array Con2[] is popped in an operation 700.

Then, in an operation 710, the array OC[] is also popped in order to store the corresponding options.

Then, in an operation 720, the current transient is stored together with the options of operation 710.

This operation is crucial because it is what allows the options to be explored as completely and efficiently as possible in operation 290.

A transient coupled with a list of construction options is therefore generated before each application of a construction, providing as many fallback solutions in case of failure.

In other words, the transients inserted into the array Tst Stack[] in operation 720 all differ from one another, since the transient is modified in operation 730 between two executions of operation 720.

Thus, the array Tst Stack[] contains the detail of all the constructions applied to the transient, one by one, ordered in time by the very nature of the loop.

Finally, in an operation 730, the conclusions of the construction popped in operation 700 are applied to the current transient.

If these conclusions apply to an existing characteristic, that characteristic is updated.

Otherwise, a new characteristic is created in the transient.

When all the constructions have been applied to the transient, the function Merge() ends in an operation 799.
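Purely by way of illustration, operations 700 to 730 may be sketched as follows in Python. The transient is modelled here as a dictionary mapping a characteristic identifier to a dictionary of attributes, and each construction as a dictionary with a "conclusions" entry; both layouts, and the name merge, are assumptions made for this sketch.

```python
import copy


def merge(transient, con2, oc, tst_stack):
    """Apply the chosen constructions one by one, saving a snapshot each time."""
    con2, oc = list(con2), list(oc)
    while con2:
        construction = con2.pop(0)       # operation 700
        options = oc.pop(0)              # operation 710
        # Operation 720: store the transient as it is *before* the
        # construction is applied, together with the fallback options.
        tst_stack.append((copy.deepcopy(transient), options))
        # Operation 730: apply the conclusions of the construction.
        for key, attrs in construction["conclusions"].items():
            if key in transient:
                transient[key].update(attrs)   # existing characteristic updated
            else:
                transient[key] = dict(attrs)   # new characteristic created
    return transient


stack = []
result = merge(
    {"t1": {"lex": "Fred"}},
    [
        {"conclusions": {"t1": {"cat": "NP"}}},
        {"conclusions": {"np1": {"head": "t1"}}},
    ],
    [["optA"], []],
    stack,
)
```

Because a deep copy is taken before each application, every snapshot on the stack differs from the previous one, which is the property relied upon when options are explored after a failure.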

Once the function Merge() has completed, an operation 280 uses a function Goal() to determine whether the sentence has been fully resolved and whether the semantic graph is complete.

To do so, the function Goal() determines whether all the tokens have been assigned a concept identifier, whether all the characteristics together define a tree whose branches are all connected to one another (that is, whether every characteristic of the tree can be reached from every other characteristic), and finally whether the generated structure contains no cycle.

If so, the function of Figure 2 ends in operation 299.

Otherwise, the loop resumes at operation 205 to process the current transient.
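Purely by way of illustration, the three checks performed by the function Goal() may be sketched as follows in Python. The inputs (a mapping from token to concept identifier, a list of characteristic identifiers and a list of undirected links between them) are assumptions made for this sketch, and every link is assumed to refer to known characteristics.

```python
def goal(token_concepts, nodes, edges):
    """Return True when the structure is a complete, connected, acyclic tree."""
    # 1. Every token must carry a concept identifier.
    if any(concept is None for concept in token_concepts.values()):
        return False
    nodes = set(nodes)
    if not nodes:
        return False
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    # 2. Every characteristic must be reachable from every other one.
    start = next(iter(nodes))
    seen, todo = {start}, [start]
    while todo:
        current = todo.pop()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                todo.append(neighbour)
    if seen != nodes:
        return False
    # 3. A connected undirected graph has no cycle exactly when it has
    #    one link fewer than it has nodes.
    return len(edges) == len(nodes) - 1


tokens = {"Fred": "c1", "holds": "c2", "a": "c3", "small": "c4", "match": "c5"}
words = list(tokens)
links = [("Fred", "holds"), ("holds", "match"), ("small", "match"), ("a", "match")]
resolved = goal(tokens, words, links)
```

Here the concept identifiers "c1" to "c5" are placeholders; adding a fifth link, or removing one of the four, makes the check fail.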

Operation 290 consists in popping the array Tst Stack[] populated by the executions of operation 720, and in resuming the loop at operation 270 using the first option.
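Purely by way of illustration, operation 290 may be sketched as follows in Python, with the stack holding pairs made of a saved transient and its list of options. Skipping saved transients that have no option left, and returning the unused options alongside the chosen one, are assumptions made for this sketch.

```python
def backtrack(tst_stack):
    """Pop saved transients until one still offers an untried option.

    Returns (transient, first_option, other_options), or None when the
    stack is exhausted. The stack is modified in place.
    """
    while tst_stack:
        transient, options = tst_stack.pop()   # most recent snapshot first
        if options:
            return transient, options[0], options[1:]
    return None


stack = [({"step": 1}, ["o1", "o2"]), ({"step": 2}, [])]
choice = backtrack(stack)
```

In this example the most recent snapshot has no option, so the sketch falls back to the earlier one and resumes with its first option "o1".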

From the foregoing, it appears that the function of Figure 2 is a systematic algorithm based on known data and almost devoid of heuristics (the filtering rules could be described as heuristic, but they are optional).

This demonstrates the repeatability of the processing performed by the device 2.

Moreover, the processing performed by the device 2 is both syntactic, through the application of the structural constructions, and semantic, through the use of the rectifier 6 and the filter 10.

It is this entirely new approach, made possible by the use of a non-monotonic probabilistic logic inference engine, that makes it possible to apply a construction-based model and that produces a semantic understanding.

Figures 8 to 17 are given by way of example to help better understand the operation of the loop of Figure 2, and in particular operation 290, on the sentence "Fred holds a small match".

In a first step shown in Figure 8, the device 2 initializes the transient with the lexical constructions in the order of their frequency in the concept database: "holds, verb", "small, adjective", "a, construction article", "match, verb" and "Fred, proper noun".

As a variant, the lexical constructions could be ordered by their position in the sentence, or in an order chosen so as to determine a degree of confidence in the choice of the first lexical constructions and to place toward the end of the transient the lexical constructions for which the degree of confidence is lowest.

In the next loop, shown in Figure 9, an adjectival phrase has been created above "small", a noun phrase above "Fred", and two verb phrases above "holds" and "match" respectively.

In the loop shown in Figure 10, a clause links the noun phrase of "Fred" and the verb phrase of "holds", and the subsequent loops then fail.

This failure propagates back through the transients of the array Tst Stack[] until it is determined that the error lies in the lexical construction of "match".

Since "Fred" ranks lower than "match", the transient is reduced in Figure 11: "match, common noun" and "Fred, proper noun" are then tested as an option, and Figures 12 to 15 show the development of the constructions on this basis.

Finally, in Figure 16 the verb phrase is completed, and the structure is closed with an affirmative clause shown in Figure 17.

To better understand the rules of the first, second and third categories, they are set out below for the example of Figures 16 and 17.

In Figure 16, a single semantic link is established between the lexical characteristics: the attribute link between "small" and "match".

Thus, the rule set produced is:
CaractLex(Groupe(Fred)) => Concept(Groupe(Fred))
CaractLex(Groupe(hold)) => Concept(Groupe(hold))
CaractLex(Groupe(a)) => Concept(Groupe(a))
CaractLex(Groupe(small)) => Concept(Groupe(small))
CaractLex(Groupe(match)) => Concept(Groupe(match))
CaractLex(Groupe(Fred)) & CaractLex(Groupe(hold)) & Lien(Groupe(Fred), Groupe(hold)) => Concept(Groupe(hold))
CaractLex(Groupe(hold)) & CaractLex(Groupe(match)) & Lien(Groupe(hold), Groupe(match)) => Concept(Groupe(match))
CaractLex(Groupe(small)) & CaractLex(Groupe(match)) & Attrib(Groupe(small), Groupe(match)) => Concept(Groupe(match))

In Figure 17, two further semantic links are established between the lexical characteristics: the actor link from "Fred" to "hold" and the theme link between "hold" and "match".

Thus, two rules of the third category are added and replace the second-category rules of Figure 16:
CaractLex(Groupe(Fred)) & CaractLex(Groupe(hold)) & Acteur(Groupe(Fred), Groupe(hold)) => Concept(Groupe(hold))
CaractLex(Groupe(hold)) & CaractLex(Groupe(match)) & Thème(Groupe(hold), Groupe(match)) => Concept(Groupe(match))

Figure 18 shows the complete transient associated with the example of Figure 17, and Figure 19 shows the corresponding semantic graph.
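Purely by way of illustration, the rules listed above for Figures 16 and 17 share a common shape that may be generated mechanically, as sketched below in Python. The function name build_rules and the representation of a link as a (predicate, source, target) triple are assumptions made for this sketch; the predicate names are those appearing in the rules above.

```python
def build_rules(groups, links):
    """Produce one rule per lexical group, then one rule per link."""
    # One rule per lexical characteristic.
    rules = [f"CaractLex(Groupe({g})) => Concept(Groupe({g}))" for g in groups]
    # One rule per link between two lexical characteristics; the conclusion
    # bears on the target of the link, as in the rules listed above.
    for predicate, source, target in links:
        rules.append(
            f"CaractLex(Groupe({source})) & CaractLex(Groupe({target})) & "
            f"{predicate}(Groupe({source}), Groupe({target})) "
            f"=> Concept(Groupe({target}))"
        )
    return rules


rules = build_rules(
    ["Fred", "hold", "a", "small", "match"],
    [
        ("Acteur", "Fred", "hold"),
        ("Thème", "hold", "match"),
        ("Attrib", "small", "match"),
    ],
)
```

Called with the links of Figure 17, the sketch yields the five single-group rules followed by the Acteur, Thème and Attrib rules.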

It thus appears that the lexical characteristics are linked to one another by semantic attributes in non-lexical characteristics.

This is reflected in Figure 18 by the linking of elements of type ?argXX and ?refYY, or of type ?argXX and ?argYY.

Thus, the following links appear: Actor ?arg27 ?ref28, Theme ?arg27 ?ref28, Attr ?arg15 ?arg17, NonIdentifiableReferent ?arg 2 ?arg17.

Figure 19 reflects these semantic links and makes it possible to represent the semantic graph that describes the meaning of the sentence, as produced by the device 2.
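Purely by way of illustration, links of this kind may be collected into a labelled graph as sketched below in Python, using three of the links listed above. Reading each link as "label, source, target" and storing the graph as a dictionary of outgoing links are assumptions made for this sketch.

```python
from collections import defaultdict


def links_to_graph(links):
    """Turn 'Label ?source ?target' strings into a labelled adjacency map."""
    graph = defaultdict(list)
    for line in links:
        label, source, target = line.split()
        graph[source].append((label, target))
    return dict(graph)


graph = links_to_graph([
    "Actor ?arg27 ?ref28",
    "Theme ?arg27 ?ref28",
    "Attr ?arg15 ?arg17",
])
```

The resulting map can then be queried by element, for example to list every link leaving ?arg27.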

The device 2 thus produces a final transient with a tree structure that contains all the links, both syntactic and semantic, of the entire sentence.

This tree makes it possible to produce a semantic graph that gives the meaning of the sentence.

This is truly fundamental, since it makes it possible to create automatically, without human intervention, a semantic description layer for a text, which thereby becomes queryable.

Moreover, by the very nature of the device 2, this semantic description layer can be enriched incrementally, by supplying new sentences, without having to redo all the training.

In addition, this layer is queryable and makes it possible to analyze what the device 2 understands.

Claims

1. Device for automatic text processing by computer, comprising a memory (4) arranged to receive text data to be analyzed in the form of tokens each comprising a character string and a unique token identifier, a concept database associating character strings and concept identifiers, at least some of the concept identifiers being associated with one another, model lexical construction data and model structural construction data, each comprising one or more conditions of application to a characteristic and one or more conclusions constituting elements to be applied to a characteristic, and an observation database associating at least two concept identifiers, a type of relation and an observation value indicating a probability of veracity of the type of relation between the at least two concept identifiers,
the device (2) being arranged to work repeatedly on a transient comprising lexical characteristics and structural characteristics produced by applying model lexical constructions and model structural constructions, the transient being initialized with lexical characteristics comprising, for each token, the concept identifier whose frequency is highest in the concept database and which is associated with the character string of the token,
the device (2) further comprising:
a rectifier (6) arranged to determine, for each lexical characteristic of a transient, a list of concept identifiers associated with the concept identifier of this lexical characteristic, to determine a set of observations corresponding to the concept identifiers of the lists thus determined, to apply a non-monotonic probabilistic logic inference engine to determine the concept identifier of each list such that the observation values associated with these concept identifiers minimize a cost function defined by applying a multivalued logic operator to one or more rules drawn from the content of the characteristics of the transient and instantiated with the corresponding observation values, and to replace the concept identifiers of the lexical characteristics of the transient with the concept identifiers thus determined,
a matcher (8) arranged to determine, among the model structural constructions, those whose condition or conditions apply to one or more of the characteristics of the transient, and to return the list of structural constructions with the characteristic or characteristics to which their conditions apply, the device (2) being further arranged to rank the structural constructions associated with each characteristic by frequency of use, the first structural construction being the one to be applied to the transient and the others forming a list of options, and
a combiner (12) arranged to perform sequentially the selection of a structural construction to be applied to the transient, the storage of a copy of the transient with the list of options associated with the structural construction, and the application of the structural construction to be applied to the transient to the characteristic or characteristics of the transient with which this structural construction has been associated by the matcher (8), and to repeat this sequential execution on the transient thus modified with the next structural construction to be applied to the transient,
the device (2) being further arranged to determine, after the execution of the combiner (12), whether the characteristics of the transient produced define a tree in which all the nodes are connected to one another and which has no cycle, to return this tree if so, and otherwise to repeat the execution of the rectifier (6), the matcher (8) and the combiner (12) on the last transient produced,
the device (2) being further arranged, when the matcher (8) returns no structural construction to be applied to the transient, to replace the current transient with the most recent copy of the transient, and to execute the combiner (12) with the first construction of the list of options as the structural construction to be applied to the transient.

2. Device according to claim 1, in which the non-monotonic probabilistic logic inference engine comprises an optimizer using the alternating direction method of multipliers algorithm.

3. Device according to claim 1 or 2, further comprising a filter (10) arranged to determine, for each structural construction to be applied and the associated list of options, a set of rules, to determine a set of observations from the set of rules and from the concept identifier or identifiers associated with the characteristic to which the structural construction to be applied is to be applied, and to apply the non-monotonic probabilistic logic inference engine with the set of rules and the set of observations to determine the structural construction to be applied to the transient and the list of options.

4. Device according to one of the preceding claims, wherein the rectifier (6) defines frequency rules from lexical characteristics of the transient, neighborhood rules from the concept identifiers of the lexical characteristics of the transient and the concept identifiers which are associated with them in the concept database at a chosen distance, and structural rules drawn from semantic attributes of non-lexical characteristics of the transient linking two lexical characteristics together.

5. Device according to one of claims 1 to 4, wherein the combiner (12) is arranged to store only the copy of the transient with the list of options associated with the structural construction, and wherein the device is arranged, when the matcher (8) returns no structural construction to be applied to the transient, to replace the current transient with the most recent copy of the transient, to execute the combiner (12) with the first construction of the list of options as the structural construction to be applied to the transient, and then to repeat the application of the rectifier (6), the matcher (8) and the combiner (12) with the resulting transient.

6. Device according to one of claims 1 to 4, wherein the combiner (12) is arranged to store the remaining structural constructions to be applied to the transient together with the copy of the transient with the list of options associated with the structural construction, and wherein the device is arranged, when the matcher (8) returns no structural construction to be applied to the transient, to replace the current transient with the most recent copy of the transient, to execute the combiner (12) with the first construction of the list of options as the structural construction to be applied to the transient together with the remaining structural constructions, and then to repeat the application of the rectifier (6), the matcher (8) and the combiner (12) with the resulting transient.

7. Device according to one of the preceding claims, arranged to analyze the returned tree and to produce a semantic graph whose nodes are formed by the lexical characteristics and their concept identifiers, and whose links are defined by the semantic attributes of non-lexical characteristics linking two lexical characteristics together.

8. Device according to one of the preceding claims, in which the inference engine is arranged to apply a multivalued logic operator chosen from the group comprising the Lukasiewicz t-norm, the minimum t-norm and the Hamacher product.

9. A computer-implemented automatic text processing method, comprising the following operations:
a) receiving text data to be analyzed in the form of tokens each comprising a character string and a unique token identifier, a concept database associating character strings and concept identifiers, at least some of the concept identifiers being associated with one another, model lexical construction data and model structural construction data, each comprising one or more conditions of application to a characteristic and one or more conclusions constituting elements to be applied to a characteristic, and an observation database associating at least two concept identifiers, a type of relation and an observation value indicating a probability of veracity of the type of relation between the at least two concept identifiers,
b) initializing a transient, capable of comprising lexical characteristics and structural characteristics produced by applying model lexical constructions and model structural constructions, with lexical characteristics comprising, for each token, the concept identifier whose frequency is highest in the concept database and which is associated with the character string of the token,
c) working repeatedly on the transient by repeating the following successive operations:
c1) determining, for each lexical characteristic of a transient, a list of concept identifiers associated with the concept identifier of this lexical characteristic, determining a set of observations corresponding to the concept identifiers of the lists thus determined, applying a non-monotonic probabilistic logic inference engine to determine the concept identifier of each list such that the observation values associated with these concept identifiers minimize a cost function defined by applying a multivalued logic operator to one or more rules drawn from the content of the characteristics of the transient and instantiated with the corresponding observation values, and replacing the concept identifiers of the lexical characteristics of the transient with the concept identifiers thus determined,
c2) determining, among the model structural constructions, those whose condition or conditions apply to one or more of the characteristics of the transient, and returning the list of structural constructions with the characteristic or characteristics to which their conditions apply,
c3) ranking the structural constructions associated with each characteristic by frequency of use, the first structural construction being the one to be applied to the transient and the others forming a list of options, and
c4) sequentially performing the selection of a structural construction to be applied to the transient, the storage of a copy of the transient with the list of options associated with the structural construction, and the application of the structural construction to be applied to the transient to the characteristic or characteristics of the transient with which this structural construction has been associated by the matcher (8), and repeating this sequential execution on the transient thus modified with the next structural construction to be applied to the transient,
c5) determining, after the execution of the combiner (12), whether the characteristics of the transient produced define a tree in which all the nodes are connected to one another and which has no cycle, returning this tree if so, and otherwise repeating operations c1) to c5) on the current transient,
c6) if operation c2) returns no structural construction to be applied to the transient, replacing the current transient with the most recent copy of the transient, and executing operation c5) with the first construction of the list of options as the structural construction to be applied to the transient.

10. The method of claim 9, wherein the application of the non-monotonic probabilistic logic inference engine comprises the application of an optimizer using the alternating direction method of multipliers (ADMM).
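
As a generic illustration of the alternating direction method of multipliers, the sketch below minimizes a weighted squared distance to a few target values while keeping the result inside [0, 1], the range of a truth value. The cost function, the function name `admm_box_least_squares` and its parameters are assumptions chosen to keep the example small; the claim does not specify the cost function at this level of detail, and this is not presented as the optimizer of the invention.

```python
def admm_box_least_squares(targets, weights, rho=1.0, iterations=200):
    """Minimize sum_i w_i * (x - t_i)**2 subject to 0 <= x <= 1 using ADMM.

    The problem is split as f(x) + g(z) with the constraint x = z, where f is
    the weighted squared distance and g is the indicator of the interval [0, 1].
    `u` is the scaled dual variable. All names here are illustrative.
    """
    total = sum(weights)
    mean = sum(w * t for w, t in zip(weights, targets)) / total
    x = z = u = 0.0
    for _ in range(iterations):
        # x-update: closed-form minimizer of f(x) + (rho / 2) * (x - z + u)**2
        x = (2.0 * total * mean + rho * (z - u)) / (2.0 * total + rho)
        # z-update: projection of x + u onto the interval [0, 1]
        z = min(1.0, max(0.0, x + u))
        # dual update: accumulate the remaining disagreement between x and z
        u += x - z
    return z
```

When the weighted mean of the targets lies inside [0, 1] the iteration converges to that mean; when it lies outside, the result is clipped to the nearest bound.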

11. The method of claim 9 or 10, further comprising, between operation c3) and operation c4): c7) determining, for each structural construction to be applied and the associated list of options, a set of rules, determining a set of observations from the set of rules and the concept identifier or identifiers associated with the characteristic to which the structural construction is to be applied, and applying the non-monotonic probabilistic logic inference engine with the set of rules and the set of observations to determine the structural construction to be applied to the transient and the list of options.

12. Method according to one of claims 9 to 11, wherein operation c) comprises the definition of frequency rules from the lexical characteristics of the transient, of neighborhood rules from the concept identifiers of the lexical characteristics of the transient and from the concept identifiers associated with them in the concept database at a selected distance, and of structural rules derived from semantic attributes of non-lexical characteristics of the transient linking two lexical characteristics to one another.

13. Method according to one of claims 9 to 12, wherein operation c4) stores only the copy of the transient with the list of options associated with the structural construction, and wherein, after the execution of operation c7), operations c1) to c7) are repeated on the current transient.
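
The save-and-backtrack behavior described by operations c2) to c6), in the variant where only a copy of the transient and its list of options are stored, can be sketched on a toy grammar. Everything below is a hypothetical simplification: the transient is a flat list of labelled nodes, a structural construction is a tuple `(left, right, result, frequency)` that merges two adjacent nodes, and the tree test is reduced to "a single node remains". The inference engine of operations c1) and c7) is left out.

```python
def applicable(transient, rules):
    """Toy c2) and c3): find matching rules, most frequently used first."""
    found = []
    for i in range(len(transient) - 1):
        for left, right, result, freq in rules:
            if transient[i][0] == left and transient[i + 1][0] == right:
                found.append((freq, i, result))
    found.sort(key=lambda match: -match[0])
    return found


def parse(tokens, rules):
    """Return a nested tuple describing the tree, or None if no parse exists."""
    transient = [(label,) for label in tokens]
    saved = []       # stack of (copy of transient, remaining options)
    options = None   # None means "recompute the options for this transient"
    while True:
        if len(transient) == 1:          # toy stand-in for the tree test of c5)
            return transient[0]
        if options is None:
            options = applicable(transient, rules)
        if not options:                  # toy c6): nothing applies, go back
            if not saved:
                return None
            transient, options = saved.pop()
            continue
        (_, i, result), rest = options[0], options[1:]
        saved.append((list(transient), rest))    # toy c4): keep copy + options
        merged = (result, transient[i], transient[i + 1])
        transient = transient[:i] + [merged] + transient[i + 2:]
        options = None
```

For the tokens `["D", "N", "V"]` and the rules `("N", "V", "S", 5)`, `("D", "N", "NP", 3)` and `("NP", "V", "S", 2)`, the most frequent rule is tried first and leads to a dead end; the sketch then restores the saved copy and applies the next option, which yields a single tree.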

14. Method according to one of claims 9 to 12, wherein operation c4) stores the remaining structural constructions to be applied to the transient along with the copy of the transient and the list of options associated with the structural construction, and wherein, after the execution of operation c7), operation c5) is applied with the remaining structural constructions, the application of the rectifier (6) is then repeated, and operations c1) to c7) are then repeated on the current transient.

15. Method according to one of claims 9 to 14, further comprising the following operation: d) analyzing the tree returned by operation c6) and producing a semantic graph whose nodes are formed by the lexical characteristics and their concept identifiers, and whose links are defined by the semantic attributes of the non-lexical characteristics linking two lexical characteristics to one another.
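
Operation d) can be pictured with a short sketch that turns lexical and non-lexical characteristics into a labelled graph. The dictionary keys used here (`token_id`, `concept_id`, `links`, `semantic_attribute`) and the function name `build_semantic_graph` are hypothetical; the claims do not prescribe a data layout.

```python
def build_semantic_graph(lexical, structural):
    """Build (nodes, edges) from characteristics given as plain dictionaries.

    Hypothetical layout:
      lexical:    {"token_id": ..., "concept_id": ...}
      structural: {"links": (token_id, token_id), "semantic_attribute": ...}
    """
    # One node per lexical characteristic, carrying its concept identifier.
    nodes = {lex["token_id"]: lex["concept_id"] for lex in lexical}
    edges = []
    for characteristic in structural:
        source, target = characteristic["links"]
        # Keep only links whose two ends are lexical characteristics.
        if source in nodes and target in nodes:
            edges.append((source, characteristic["semantic_attribute"], target))
    return nodes, edges
```

For example, two lexical characteristics for "cat" and "sleeps" joined by a non-lexical characteristic whose semantic attribute is "agent" produce a graph with two nodes and one labelled link.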