EP1880314A1 - Device and method for semantic analysis of documents by constituting n-ary and semantic trees (Dispositif et procede d'analyse semantique de documents par constitution d'arbres n-aire et semantique)
- Publication number
- EP1880314A1 (EP06764601A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- tree
- semantic
- verbal
- structural
- document
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
Definitions
- the invention relates to the field of automated document analysis and the use of the results of such analyses.
- by "document" is meant here a set of data representing known or recognizable characters. It may in particular be a text consisting of an ordered sequence of verbal entities, such as words, groups of words, numbers or alphanumeric groups.
- by "analysis" is meant any type of verification intended to determine whether a document has a meaning, possibly taking into account its context.
- the term "use of the results" is understood to mean any operation or method that can be applied to an analyzed document, for example with a view to a translation, possibly simultaneous, or with a view to filtering information (for example as part of an e-mail management), or for orthographic and / or grammatical correction, or for transcription of a voice dictation, or for the generation of texts (such as abstracts), or for search, by means of a search engine, of textual information accessible in private or public network servers (such as the Internet).
- the purpose of the invention is therefore to improve the situation, and in particular to allow the correct interpretation of a document by an automatic evaluation of the role played by each of the verbal entities (or words) which compose this document (such as a text) on the syntactic, semantic and contextual levels. It proposes for this purpose a device for semantic analysis of documents, including a structural and semantic database and a document interpreter that determines whether a document makes sense using the database.
- The document interpreter comprises (i) an n-ary tree manager responsible for constituting a structural n-ary tree from a decomposition of a document to be analyzed into an ordered sequence of verbal entities and from structural and/or semantic constraints, the structural n-ary tree comprising a root node, formed of a primary governing verbal entity, and structures each formed of a secondary subordinate verbal entity and attached either directly or indirectly to the root node by a link provided with at least one connectional characteristic, a secondary subordinate verbal entity being able in turn to become a governing verbal entity; and (ii) a semantic tree manager responsible for determining, at least from the structural n-ary tree and the database, categorizing entities of object type and act type activated by certain nodes of the n-ary tree, in order to construct a semantic tree whose principal nodes consist of object and act categorizing entities, are linked by semantic relations arising from the n-ary tree's connectional characteristics, and are associated with attributes that are a function of the characteristics of the other nodes of the n-ary tree.
- the semantic tree manager is responsible for transforming each n-ary (structural) tree into a semantic tree, firstly, by extracting categorizing entities from it, secondly, by creating semantic links between the extracted categorizing entities from the interpretation of the structural links that connect the verbal entities that activated these categorizing entities, and thirdly, by assigning to each extracted categorizing entity a list of at least one attribute according to a model (or pattern) defined by a lexicon.
- semantic constraints specific to each structural class that is to say semantic compatibility relationships that exploit the generic semantic features, and / or lexical functions, and / or
- a constraint is a regulated link defining a connectional characteristic.
- by "object-type categorizing entity" is meant an abstraction obtained (essentially) by categorization of objects from the real world (such as a table, a star or a rose) or of abstract notions that function as metaphors for real objects (for example feelings), generally referenced by nouns (the converse is not necessarily true).
- by "act-type categorizing entity" is meant an abstraction obtained (essentially) by categorization of actions of the real world (such as going or moving) or of abstract notions that function as metaphors of real actions (such as thinking or loving), which can be referenced either (preferentially) by verbs, or by substantives, or by any other structural category according to a process specific to the language in question. In the case of substantives, the construction of the semantic tree requires an additional step of applying a lexical function to transform the substantival structure into a verbal structure, this lexical function being part of the definition of the substantive concerned; as an example, the expression "the displacement of the table" is transformed into the expression "move the table".
- a document is considered to have at least one meaning once it has been possible to constitute a semantic tree from its verbal entities.
- the device according to the invention may comprise other characteristics that can be taken separately or in combination, and in particular:
- its document interpreter can comprise a binary tree manager responsible for constituting a structural binary tree from the decomposition of a document into an ordered sequence of verbal entities and from structural and/or semantic constraints, this structural binary tree comprising leaves, each associated with a verbal entity of the sequence and constituting one of the two child nodes attached to a parent node, and a root node, constituting a parent node and associated with all or part of the verbal entities of the sequence.
- the n-ary tree manager is responsible for building each n-ary tree from a structural binary tree;
- its document interpreter can include a decomposition module responsible for decomposing each set of data defining a document to be analyzed into an ordered sequence of verbal entities;
- its document interpreter may comprise a semantic analyzer responsible for determining the semantic compatibility relationships between main object-type nodes and / or act-type main nodes of at least one semantic tree;
- its semantic analyzer can be responsible for determining relationships between the main nodes of at least one semantic tree among spatial, temporal, causal, anaphoric and cataphoric relationships; its semantic analyzer may be responsible for performing a diagnosis relating to the analysis of a document, and for delivering a message representative of the result of this diagnosis.
- This diagnostic message specifies the nature of the problems encountered during the analysis of the document.
- the invention also proposes a method of semantic analysis of documents consisting of:
- the structural n-ary tree comprising a root node formed of a primary governing verbal entity and structures each formed of a secondary subordinate verbal entity and attached either directly or indirectly to the root node by a link provided with at least one connectional characteristic, a secondary subordinate verbal entity being able in turn to become a governing verbal entity; and in determining, at least from the structural n-ary tree and from data stored in a structural and semantic database, categorizing entities of object type and act type activated by certain nodes of the n-ary tree, in order to construct a semantic tree provided with principal nodes consisting of object and act categorizing entities, linked by semantic relations resulting from the n-ary tree's connectional characteristics and associated with attributes that are a function of the characteristics of the other nodes of the n-ary tree and of their respective links.
- the method according to the invention may comprise other characteristics that can be taken separately or in combination, and in particular:
- a structural binary tree can be constituted from the decomposition of a document into an ordered sequence of verbal entities and from structural and/or semantic constraints, the structural binary tree comprising leaves, each associated with a verbal entity of the sequence and constituting one of the two child nodes attached to a parent node, and a root node, constituting a parent node and associated with all or part of the verbal entities of the sequence.
- each n-ary tree is constituted from a structural binary tree;
- each set of data defining a document to be analyzed can be broken down into an ordered sequence of verbal entities
- semantic compatibility relationships between principal nodes of the object type and / or principal nodes of the act type of at least one semantic tree can be determined
- relations between principal nodes of at least one semantic tree, chosen from spatial, temporal, causal, anaphoric, and cataphoric relationships, can be determined; after a semantic tree has been constituted, it is possible to perform a diagnosis relating to the analysis of a document and then to deliver a message representative of the result of the diagnosis.
- This diagnostic message specifies the nature of the problems encountered during the analysis of the document.
- It can for example include information representative of the difficulties encountered during the analysis of a document, and/or the different possibilities of interpretation of a sentence, and/or at least one unknown word, and/or at least one grammatical error, and/or at least one construction defect, and/or at least one nonsense, and/or a list of unresolved ambiguities.
- FIG. 1 very schematically and functionally illustrates an exemplary embodiment of a device for semantic analysis of documents according to the invention
- FIG. 2 schematically illustrates the main steps of an exemplary algorithm for decomposing a document into verbal entities
- FIG. 3 schematically illustrates the main steps of an exemplary algorithm for constructing a structural binary tree from a document decomposition into verbal entities
- FIG. 4 schematically illustrates the main steps of an exemplary algorithm for constructing a structural n-ary tree from a structural binary tree
- FIG. 5 is a non-limiting tree diagram illustrating schematically relations between different types, subtypes and sub-subtypes of categorizing entities
- FIG. 6 schematically illustrates the main steps of an exemplary algorithm for constructing a semantic tree from a structural n-ary tree
- FIG. 7 schematically illustrates an example of a structural binary tree
- FIG. 8 schematically illustrates an example of a n-ary structural tree resulting from the structural binary tree of FIG. 7,
- FIG. 9 schematically illustrates an example of a semantic tree resulting from the n-ary structural tree of FIG. 8;
- FIG. 10 schematically illustrates causal and anaphoric relations in another semantic tree example;
- FIGS. 11A and 11B respectively diagrammatically illustrate another example of a n-ary structural tree and the associated semantic tree in the case of a chronological management;
- FIG. 12 schematically illustrates a temporal relation between two semantic tree examples;
- FIG. 13 schematically illustrates temporal and anaphoric relations between two other semantic tree examples,
- FIG. 14 schematically illustrates semantic compatibility relationships between verbal entities of another semantic tree example
- FIG. 15 schematically illustrates the principal relationships between nodes associated with substantives (NO) and nodes associated with adjectives (NA), and the main concepts attached to them (especially metrics), and
- FIG. 16 schematically illustrates the main relationships between categorizing entities object and act, and the main concepts attached thereto.
- the object of the invention is, in particular, to structure the meaning of the information contained in a document to be analyzed, in particular by means of a model for automatically removing at least a part of the ambiguities and polysemies inherent in natural-language documents.
- the device D is dedicated to the automatic removal of ambiguities and polysemies of text type documents.
- Such a device D can for example be installed in a computer or in one or more application servers, of which it uses certain resources, including computing resources (CPU).
- An analysis device D according to the invention comprises at least one structural and semantic database BD and a document interpreter ID.
- the database BD also called lexicon (or referential-lexicon), includes words (or verbal entities) to which are assigned syntactic and semantic properties as well as composition rules (or links). Properties and links (or rules) are data used to construct categorizing (or conceptual) entities of the act and object type.
- the definitions of categorizing entities have been given previously. Moreover, here we mean by "property" an abstraction obtained by categorization of notions of a defining nature, based on sets of values, generally referenced by nouns (such as color or size). A value is by definition an abstraction directly referenced by an adjective and necessarily linked to a property. Values can usually be associated with quantitative ("objective") and/or qualitative ("subjective") scales, as will be discussed later when the notion of metrics is introduced.
- Words are classified into structural classes of verbs, nouns, adjectives, adverbs and structuring words. All these classes can be subdivided for example into subclasses, sub-subclasses, and so on (as will be seen later with reference to Figure 15 where the NO A , NO U and the like are subclasses of the class of substantives).
- a categorizing entity is responsible for freely generating ambiguous meaning by association with other categorizing entities, under the control of properties that limit their freedom to respect a controlled syntactic and semantic structure.
- the links are responsible for controlling the properties through pragmatic overdeterminations (weak or strong pragmatic constraints), either from the document (text) itself, or from the general context.
- by "weak pragmatic constraints" we mean here the fact that no general reference framework in an open context can exhaust all the possibilities of interpretation of a textual message.
- Links provide flexibility to the process of removing ambiguity (or disambiguation) by enabling or disabling certain property rules as needed, for example by privileging semantics over syntax when an ungrammatical sentence is clearly meaningful. Their role in disambiguation is essential.
- the database BD can be subdivided into a general database BD1 and a specialized database BD2.
- the general database BD1, also called the general lexicon, has entries, typically several tens of thousands (for example 80 000), which define inflected forms (typically several hundred thousand, for example more than 300 000), provided with data reflecting weak pragmatic constraints, which intervene notably in the disambiguation of the intrinsic meaning of a text while preserving its general polysemy.
- the specialized database BD2, also known as the specialized lexicon, contains data reflecting linguistic features of a specific context (strong pragmatic constraints) that make it possible to limit the general polysemy of the messages in order to extract one or more relevant, locally interpreted meanings. The more detailed the context definition, the simpler the final interpretation is to achieve.
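The interplay between the general lexicon BD1 and the specialized lexicon BD2 can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the entries, the sense labels, the imagined "navigation" context and the function name `candidate_senses` do not come from the patent, which does not describe the storage format of the database.

```python
# Hypothetical data: neither the entries nor the sense labels come from the patent.
GENERAL_LEXICON = {  # stands in for BD1 (weak pragmatic constraints)
    "glace": {"ice", "ice cream", "mirror"},
    "brise": {"breeze", "breaks"},
}
SPECIALIZED_LEXICON = {  # stands in for BD2, for an imagined "navigation" context
    "glace": {"ice"},
}


def candidate_senses(word, general=GENERAL_LEXICON, specialized=SPECIALIZED_LEXICON):
    """Return the senses of `word`, narrowed by the specialized lexicon when it knows the word."""
    senses = set(general.get(word, set()))
    if word in specialized:
        narrowed = senses & specialized[word]
        # Fall back to the general senses if the context would rule out everything.
        return narrowed or senses
    return senses
```

In this sketch a word unknown to the specialized lexicon keeps its full general polysemy, which mirrors the idea that BD2 only restricts interpretations where the context says something.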
- the semantic properties are organized according to a taxonomy and distributed on the axes of three multidimensional primary reference frames: the material real, the intentional and the contextual. They are independent of the classical syntaxes, which integrate the semantic parameters only slightly. Therefore they are not specific to a particular language.
- Each multidimensional primary reference frame has axes of decomposition of semantic properties and its own composition logic.
- the logics associated with the three primary reference frames are of the modal type.
- Categorizing entities are dynamic objects in a six-dimensional linguistic universe with algebra in multimodal logic.
- a set of compatibility rules between properties govern the interactions between categorizing entities.
- the document interpreter ID is responsible for determining whether a document has meaning, using the database BD and processing functions implementing a mathematical model which will be discussed later. It comprises at least one n-ary tree manager GAN and a semantic tree manager GAS, as well as possibly a semantic analyzer AS.
- the n-ary tree manager GAN is responsible for building, using its processing functions and the database BD, a structural n-ary tree from a decomposition of a document to be analyzed into an ordered sequence of verbal entities (or words, or groups of words, or alphanumeric groups) and from structural and/or semantic constraints chosen and defined in the database BD.
- the ordered sequences of verbal entities are for example provided by a document decomposition module MD which, as in the example illustrated in FIG. 1, can be part of the device D. But this is not required. Indeed, when the device D does not include a document decomposition module MD, the sequences can be provided directly by external equipment.
- the document decomposition module MD is responsible, when it exists, for breaking down each set of data, which defines a document (such as a text) into an ordered sequence of verbal entities to be analyzed.
- a document such as a text
- the language and its syntax structure are not identified. The latter if it is not given, is identified in the next step. However, it may be considered at this stage to determine separators specific to a given language, as for example for Chinese.
- Each structural n-ary tree constructed by the n-ary tree manager GAN includes a root node associated with a so-called primary governing verbal entity, and structures each formed of a so-called secondary subordinate verbal entity and attached either directly or indirectly to the root node by a link provided with at least one connectional characteristic.
- the establishment of a link (identified by its connectional characteristic(s)) in a binary structural tree is done by applying the structural and/or semantic constraints provided by the associated connection potentials, using the elementary data from the database (or lexicon) BD1 or BD2 for the two verbal entities concerned.
- Some secondary subordinate verbal entities may in turn become governing verbal entities.
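As an illustration of the structure just described, the following sketch models a structural n-ary tree whose links carry a connectional characteristic. The class names, the labels "subject", "object" and "determiner", and the choice of "brise" as root are hypothetical; they are not taken from the patent or from its FIG. 8.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Link:
    characteristic: str       # hypothetical label of the connectional characteristic
    target: "StructuralNode"  # the subordinate verbal entity reached by this link


@dataclass
class StructuralNode:
    entity: str                                   # the verbal entity held by the node
    links: List[Link] = field(default_factory=list)

    def attach(self, characteristic, entity):
        """Attach a subordinate verbal entity and return its node."""
        child = StructuralNode(entity)
        self.links.append(Link(characteristic, child))
        return child

    def is_governing(self):
        """A node governs as soon as at least one subordinate is attached to it."""
        return bool(self.links)


def depth_first(node, depth=0):
    """Yield (depth, entity) pairs in depth-first order."""
    yield depth, node.entity
    for link in node.links:
        yield from depth_first(link.target, depth + 1)


# Assumed reading of a sample sentence; the root is the primary governing entity.
root = StructuralNode("brise")
subject = root.attach("subject", "petite")
subject.attach("determiner", "La")
obj = root.attach("object", "glace")
obj.attach("determiner", "la")
```

Here `subject` is a secondary subordinate entity that has itself become a governing entity, because a determiner is attached to it. The number of links per node is unconstrained, which is what makes the tree n-ary.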
- Each structural n-ary tree can be constructed from a binary tree, itself constructed from an ordered sequence of verbal entities, possibly provided by the decomposition module MD.
- the document interpreter ID comprises, as illustrated in FIG. 1, a structural binary tree manager GAB.
- the latter is responsible for recomposing each ordered sequence of verbal entities that it receives into a structural binary tree. More precisely, as will be seen below, two adjacent nodes come into composition to form a new node, knowing that initially only leaves are available.
- a structural binary tree comprises a root node which represents the set of verbal entities of a sentence (or portion of sentence) to be processed, and which constitutes a parent node for two child nodes resulting from its binary decomposition. According to the number of verbal entities that a child node comprises, it constitutes either a leaf of the binary tree, or a father node decomposable in its turn, in a binary way, into two child nodes.
- the binary decomposition of the root node gives two child nodes that can in turn be parent nodes that can be decomposed in a binary way, and so on until each leaf of the binary tree is occupied by a verbal entity (word) of the (portion of) sentence being processed.
- This binary decomposition is done according to structural and / or semantic constraints stored in the database BD.
- the user of the device D does not intervene at this stage. His intervention is, where applicable, limited to the definition of local rules that override certain general rules (such as prohibiting the application of rules of agreement in gender).
- the GAB binary tree manager and / or the GAN n-ary tree manager may have a function of identifying the lexical units (or verbal entities) specific to the language used to write (or dictate) a document, to highlight lexical ambiguities.
- the semantic tree manager GAS is responsible for determining object and act type categorizing entities from the structural n-ary tree and data stored in the database BD.
- an object-type categorizing entity is an abstraction obtained (essentially) by categorization of real-world objects or abstract notions that function as metaphors for real objects, generally referenced by nouns.
- a categorizing entity of the act type is an abstraction obtained (essentially) by categorization of real-world actions or abstract notions that function as metaphors for real actions that can be referenced either (preferentially) by verbs, or by substantives (in this case the construction of the semantic tree requires an additional step of applying a lexical function to transform the substantival structure into a verbal structure).
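The additional step of applying a lexical function to a substantive can be sketched as a simple rewrite. This is only a toy: the table `LEXICAL_FUNCTIONS`, the function name `verbalize` and the "the X of Y" pattern are assumptions, and the single entry reproduces the one example given in the text.

```python
# Hypothetical table: each substantive carries the verb produced by its lexical function.
LEXICAL_FUNCTIONS = {"displacement": "move"}


def verbalize(phrase):
    """Rewrite 'the <substantive> of <complement>' as '<verb> <complement>' when possible."""
    words = phrase.split()
    if (
        len(words) >= 4
        and words[0].lower() == "the"
        and words[2] == "of"
        and words[1] in LEXICAL_FUNCTIONS
    ):
        return " ".join([LEXICAL_FUNCTIONS[words[1]]] + words[3:])
    # Leave the phrase untouched when no lexical function applies.
    return phrase
```

With this table, `verbalize("the displacement of the table")` gives `"move the table"`, and any phrase that does not match the pattern is returned unchanged.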
- the semantic tree manager GAS can, in certain situations, use the information contained in one or more other n-ary trees corresponding to other sentences of the same document to constitute a semantic tree. This is particularly the case in the presence of ambiguities of the anaphor or cataphor type.
- Each semantic tree is made up of main nodes that are each associated with at least one categorizing entity of the object type or of the act type, which is activated by certain nodes of the n-ary tree, and which are linked by semantic relations originating from connective features of the n-ary tree and associated attributes that are a function of the characteristics of the other nodes of the n-ary tree and their respective links.
- the semantic analyzer AS is responsible for determining the semantic compatibility relationships between the main object-type nodes and / or the act-type main nodes of at least one semantic tree. Semantic compatibility relationships exploit semantic features. For example, only a "human”, which is an object-type categorizing entity, can "think", which is an act-type categorizing entity.
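A semantic compatibility check of this kind can be sketched with sets of semantic features. The feature names and both tables are hypothetical, and the patent does not state that compatibility is computed as a subset test; this is one simple way to express the "only a human can think" example.

```python
# Hypothetical semantic features carried by object-type categorizing entities.
OBJECT_FEATURES = {
    "human": {"animate", "human"},
    "table": {"inanimate"},
}

# Hypothetical features that an act-type categorizing entity requires of its object.
ACT_REQUIREMENTS = {
    "think": {"human"},
    "move": set(),
}


def compatible(obj, act):
    """True when the object carries every feature required by the act."""
    return ACT_REQUIREMENTS[act] <= OBJECT_FEATURES[obj]
```

Under these assumed tables, `compatible("human", "think")` holds while `compatible("table", "think")` does not, which is the kind of violation the semantic analyzer AS could report as a nonsense.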
- the semantic analyzer AS is a document analysis diagnostic tool. It can for example specify what difficulties were encountered during the analysis of a document (or sentence) and / or different possibilities of interpreting a sentence and / or unknown words and / or grammatical errors (for example, disregarded rules of agreement) and / or construction defects and / or nonsense (eg unsatisfied semantic compatibility rules) and / or ambiguities that could not be resolved.
- thanks to the diagnostics, it is for example possible to classify messages, or to resolve a problematic situation (by application of a local rule or by identification of a lack of information preventing complete comprehension of a message), or to explain why a message is considered "incomprehensible".
- automated actions may be undertaken.
- the various elements composing the document interpreter ID use processing functions that implement a mathematical model.
- the latter is based on several algorithms that operate on the links that are provided with at least one connectional characteristic and that are established between structures formed of a secondary subordinate verbal entity and a root node. More precisely, these algorithms exploit the properties of the entries of the database BD, previously transformed into categorizing entities whose data and links constitute the properties.
- the categorizing entities constitute varieties distributed along axes grouped into three different primary reference frames. Varieties can interact and combine via lexical, syntactic, semantic, and pragmatic composition rules in a six-dimensional linguistic universe.
- Groups can be likened to syntagms with syntactic and semantic properties. They inherit new, higher-order composition possibilities which allow the creation of secondary linguistic graphs, or super groups, which correspond roughly to informative sentences, possibly accompanied by a diagnosis, for example in the form of a classification as "comprehensible information", "questionable information",
- the mathematical model makes the data freely interact with each other under the sole control of the compatibility rules of their respective properties.
- hypotheses are explored and reduced, for example by means of a method of reduction of hypotheses inspired by the modal system called "S4" of
- Compatibility rules are of first and second level; they make it possible to remove as early as possible the different types of first-level ambiguities that can appear in an ordered sequence of verbal entities (or sentence).
- the super group can then be related to the original sentence (or document) for the exploitation of the structured information it contains. For example, we can compare a super group with super reference groups (defining pre-parameterized filters, possibly derived from an analysis of questions - in natural language - posed by users or by other texts). You can also perform operations on groups of super groups, such as distance calculations or consistency checks. One or more super groups can also be used to extract specific information, such as summaries. One or more super groups can still be used to generate new messages.
- the decomposition module MD receives a document to be analyzed. This is for example a text in natural language.
- the decomposition module MD determines (reads) the first character of the document.
- the decomposition module MD performs a test to determine if the character read is the last of the document.
- the decomposition module MD performs a new test in a step 30 to determine whether the character read is a separator. If it is not, in a step 40 the decomposition module MD adds this character to the word being composed, then returns to step 10 in order to restart the steps of the algorithm with the next character of the document. On the other hand, if the character read is a separator, the decomposition module MD performs a new test in a step 50 to determine whether the character read is the last one of a word being composed. If so, in a step 60 the decomposition module MD identifies the word that has just been composed, then stores the word in a buffer before returning to step 10 in order to restart the steps of the algorithm with the next character of the document.
- Otherwise, the decomposition module MD creates, in a step 55, a level which materializes a hyphen, then moves on to step 60.
- the separators are either word separators (which actually leads to step 60), or separators of text units of different logical levels, nested within each other, such as segments, sentences, paragraphs, or chapters.
- This sample algorithm is applied to each character of a document up to the last.
- This decomposition algorithm thus provides an ordered sequence of verbal entities consisting respectively of words, groups of words, numbers or alphanumeric groups, generally separated by separators, and whose meaning must be analyzed.
- the implementation of the document decomposition algorithm can be done by means of a transducer, for example constructed in the form of a finite state machine which optimizes both the required memory space and the performances.
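The character-by-character decomposition can be sketched as follows. This is a simplified illustration rather than the flowchart of FIG. 2: the separator sets and the name `decompose` are assumptions, text-unit separators are simply discarded instead of opening nested levels, and the hyphen level of step 55 is not modeled (a hyphen stays inside its word).

```python
# Assumed separator sets; the patent leaves them language-dependent.
WORD_SEPARATORS = set(" \t\n")
UNIT_SEPARATORS = set(".,;:!?")


def decompose(text):
    """Split `text` into an ordered list of verbal entities."""
    entities = []
    current = []  # characters of the word being composed
    for ch in text:
        if ch in WORD_SEPARATORS or ch in UNIT_SEPARATORS:
            # A separator closes the word being composed, if there is one.
            if current:
                entities.append("".join(current))
                current = []
        else:
            current.append(ch)
    # The last character of the document may end a word without a separator.
    if current:
        entities.append("".join(current))
    return entities
```

The loop keeps only one word buffer and reads each character once, which is the same single-pass behavior one would expect from the finite-state transducer mentioned above.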
- the meaning analysis of an ordered sequence of verbal entities preferably begins with the constitution of a structural binary tree for each sentence of the document.
- the entire ordered sequence of verbal entities is used to construct a binary tree.
- each portion of the ordered sequence of verbal entities, which corresponds to a sentence is used to construct a binary tree.
- the binary tree manager GAB receives an ordered sequence of verbal entities.
- This sequence is for example provided by the document decomposition module MD, which implements a decomposition algorithm of the type described above. But this is not mandatory. In fact, when the device D does not include a document decomposition module MD, the sequences can be supplied directly to the binary tree manager GAB by an external device.
- the binary tree manager GAB initializes the structural binary tree to be built.
- the binary tree manager GAB for example sets to zero (0) the value of a parent node counter i of the structural binary tree.
- each other father node (i> 0) of the binary tree represents the result of a part of the binary decomposition of verbal entities that occupy their own father node.
- the binary decomposition of the root node gives two child nodes which can in turn be parent nodes that can be decomposed in a binary way, and so on until each leaf of the binary tree is occupied by a verbal entity (word) of the processed sentence.
- the binary decomposition is done according to structural and / or semantic constraints stored in the database BD.
- the binary tree manager GAB starts the analysis of the pointed parent node i by setting to zero (0) the value of a child node counter j of the structural binary tree. It then decomposes the verbal entities of the pointed parent node i into two parts j and j' (not represented).
- the binary tree manager GAB performs a test to determine whether the pointed part j, resulting from the decomposition of the pointed node i, satisfies one or more chosen structural and/or semantic constraints. If it does not, it proceeds to step 140. Otherwise, in a step 135 the binary tree manager GAB defines a new (connection) node within the binary tree in order to assign it to the pointed part j, then proceeds to step 140. This new node j is then a child node of the pointed parent node i.
- in step 140 the binary tree manager GAB performs a test to determine whether the pointed part j that has just been processed is the last part resulting from the decomposition of the pointed parent node i. If it is not, in a step 150 the binary tree manager GAB increments the counter j by one unit, then returns to step 130. On the other hand, if the pointed part j that has just been processed is the last part resulting from the decomposition of the pointed parent node i, the binary tree manager GAB performs another test in a step 160 to determine whether there are other nodes i to process. If there are, in a step 170 the binary tree manager GAB increments the counter i by one unit, then returns to step 120.
- Otherwise, the binary tree manager GAB performs another test in a step 180 to determine whether the last iteration performed in step 135 created new nodes, and thus new connection possibilities that need to be explored. If it did not, the structural binary tree is constituted and the binary tree construction algorithm ends in C. On the other hand, if a further iteration must be performed, the binary tree manager GAB returns to step 110.
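The idea that two adjacent nodes compose into a new node under structural constraints, starting from leaves only, can be sketched with a greedy bottom-up merge. This is a deliberately swapped-in simplification and not the flowchart of FIG. 3: it commits to the first applicable rule and does not explore alternative hypotheses. The category names, the rule table and the lexicon are hypothetical.

```python
# Hypothetical structural classes for each verbal entity (one reading only).
LEXICON = {"La": "DET", "petite": "N", "brise": "V", "la": "DET", "glace": "N"}

# Hypothetical composition constraints: which adjacent categories may merge, and into what.
RULES = {("DET", "N"): "NP", ("NP", "V"): "CL", ("CL", "NP"): "S"}


def build_binary_tree(entities, lexicon=LEXICON, rules=RULES):
    """Merge adjacent nodes pairwise until one root remains; return None on failure."""
    # Initially only leaves are available: (category, subtree) pairs.
    nodes = [(lexicon[word], word) for word in entities]
    merged = True
    while len(nodes) > 1 and merged:
        merged = False
        for i in range(len(nodes) - 1):
            key = (nodes[i][0], nodes[i + 1][0])
            if key in rules:
                # Replace the two adjacent nodes by their parent node.
                nodes[i:i + 2] = [(rules[key], (nodes[i][1], nodes[i + 1][1]))]
                merged = True
                break
    return nodes[0][1] if len(nodes) == 1 else None
```

Each merge creates new adjacency possibilities, so the loop restarts from the left after every successful composition, loosely echoing the return to an earlier step when new nodes have been created. A fuller implementation would keep several competing hypotheses instead of one.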
- An example of a structural binary tree corresponding to the French sentence “La petite brise la glace” is illustrated in Figure 7. In this example, the root node corresponds to the entire sentence “La petite brise la glace”.
- a first child node of the root node includes the words “La petite brise”, while the second child node of the root node includes the words “la glace”.
- the first child node (“La petite brise”) is then a father node for its two child nodes associated respectively with the words “brise” and “La petite”.
- the child node associated with the word “brise” is a leaf of the binary tree that can no longer be decomposed.
- the child node associated with the words “La petite” is then a father node for its two child nodes associated respectively with the words “La” and “petite”.
- the child nodes associated respectively with the words “La” and “petite” are leaves of the binary tree that can no longer be decomposed.
- the second child node (“la glace”) is a father node for its two child nodes respectively associated with the words “la” and “glace”.
- the child nodes associated respectively with the words “la” and “glace” are leaves of the binary tree that can no longer be decomposed.
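The decomposition of the example sentence (“La petite brise la glace” in the French original) can be illustrated by a minimal sketch. It is not the iterative procedure of steps 110 to 180: it is a recursive simplification that keeps a single decomposition, and the table `ALLOWED_SPLITS` is a hypothetical stand-in for the structural and/or semantic constraints stored in the database BD.

```python
# Hypothetical stand-in for the constraints stored in BD: for each span of
# verbal entities, the position at which the span may be split in two parts.
ALLOWED_SPLITS = {
    ("La", "petite", "brise", "la", "glace"): 3,
    ("La", "petite", "brise"): 2,
    ("La", "petite"): 1,
    ("la", "glace"): 1,
}


def decompose(words):
    """Build a binary tree (nested dicts) by splitting spans in two parts."""
    node = {"words": words, "children": []}
    cut = ALLOWED_SPLITS.get(words)
    if cut is not None:  # the constraints accept a decomposition of this span
        node["children"] = [decompose(words[:cut]), decompose(words[cut:])]
    return node


def leaves(node):
    """Return the leaf words of the tree, from left to right."""
    if not node["children"]:
        return list(node["words"])
    return [w for child in node["children"] for w in leaves(child)]


tree = decompose(("La", "petite", "brise", "la", "glace"))
print([child["words"] for child in tree["children"]])
print(leaves(tree))
```

A span with no entry in the table is treated as a leaf, which mirrors the nodes that "can no longer be decomposed" in the example.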
- n-ary tree: a tree in which the decomposition of a father node leads to any number of child nodes, a number which may vary from one father node to another.
- Reference is made to FIG. 4 to describe the main steps of an exemplary algorithm for constituting a structural n-ary tree.
- This algorithm is implemented by the n-ary tree manager GAN of the device D according to the invention.
- the n-ary tree manager GAN is fed with binary trees by the binary tree manager GAB of the device D, but this is not mandatory. Indeed, it can be envisaged that the n-ary tree manager GAN is fed with binary trees by external equipment, or that it is arranged to construct an n-ary tree directly from an ordered sequence of verbal entities, and therefore without having to build a binary tree beforehand.
- the n-ary tree manager GAN receives the description of a binary tree, for example provided by the structural binary tree constitution algorithm described above.
- the n-ary tree manager GAN initializes the structural n-ary tree to be constructed. It creates a first (current) node C in the n-ary tree, which becomes its root node, and sets a node index counter i of the associated binary tree to zero. It is important to note that each node of an n-ary tree is associated with a single verbal entity (or word) coming from a leaf of the binary tree, unlike the binary tree, which has intermediate nodes associated with several verbal entities (or words).
- the n-ary tree manager GAN takes a node of index i in the binary tree, then in a step 220 it performs a test to determine whether this node of index i is a leaf of the binary tree.
- the n-ary tree manager GAN performs a test in a step 230 to determine whether the node of index i is of the governing type (R) or of the subordinate type (S).
- the n-ary tree manager GAN associates with the current node C the leaf node of index i of the binary tree, and this current node C is then considered the father of at least one child node of the n-ary tree. It is recalled that each father node of a binary tree systematically corresponds to a governing child node and a subordinate child node. Consequently, the two leaf nodes of each intermediate father node of a binary tree can be linked to one another to form, within the associated n-ary tree, a structure in which the governing child node is attached to the corresponding subordinate child node by a link that can be associated with the connectional characteristics of their father node.
- the root node of the n-ary tree can only be a governing leaf node which is attached, directly or indirectly, to the root node of the associated binary tree by one or more intermediate nodes of exclusively governing type. In other words, this root node comes from an exclusively governing lineage.
- the n-ary tree manager GAN proceeds to a step 270.
- the n-ary tree manager GAN thus connects (reattaches) the subordinate node (S) of index i to the corresponding governing node (R), by means of a link associated with the connectional characteristics of their father node. Then, the n-ary tree manager GAN proceeds to step 270.
- the n-ary tree manager GAN begins by creating a new branch B in the n-ary tree under construction, and then assigns the properties of the node of index i to this branch B. Then, it connects (or attaches) the upper end (sup(B)) of the branch B to the current node C, and creates a new node N that it connects (or attaches) to the lower end (inf(B)) of the branch B. Finally, the n-ary tree manager GAN replaces the current node C with the node N that it has just created, before going on to step 270.
- the n-ary tree manager GAN performs a test to determine whether the node of index i being processed is the last node of the binary tree to be processed. If this is the case, then the structural n-ary tree is constituted and the n-ary tree construction algorithm ends in D. On the other hand, if the node of index i being processed is not the last node of the binary tree to be processed, in a step 280 the n-ary tree manager GAN increments the value of the index i by one, then returns to step 210 with the next node of the binary tree. All the nodes of the binary tree are thus processed one after the other, starting from the root node.
- the root node of the n-ary tree is the verb “brise”, which is the only governing leaf node of the binary tree that comes from an exclusively governing lineage. In most cases, the root node of the n-ary tree is the main verb of the parsed sentence.
- a first structure is composed of the nodes “La” and “petite”, which are respectively the governing and subordinate leaf nodes of the intermediate node associated with the verbal entities “La petite” in the binary tree.
- the leaf node “La” is here governing, so it is attached to the root node “brise”.
- the leaf node “petite” is here subordinate and attached to the associated governing node
- a second structure is composed of the nodes “la” and “glace”, which are respectively the subordinate and governing leaf nodes of the intermediate node associated with the verbal entities “la glace” in the binary tree.
- the leaf node “glace” being here governing, it is therefore attached to the root node “brise”.
- the leaf node “la” is here subordinate and attached to the associated governing node “glace” by a link associated with the connectional characteristics of their father node (“la glace”) within the binary tree.
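The passage from the binary tree to the n-ary tree can be sketched as follows for the example sentence (“La petite brise la glace” in the French original). This is a recursive simplification, not the indexed procedure of steps 210 to 280; the class names `BinNode` and `NaryNode` and the function `to_nary` are hypothetical, and the governing/subordinate marking of each child is assumed to be already known.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BinNode:
    words: str
    governing: Optional["BinNode"] = None    # governing child (R), None for a leaf
    subordinate: Optional["BinNode"] = None  # subordinate child (S), None for a leaf


@dataclass
class NaryNode:
    word: str
    children: List["NaryNode"] = field(default_factory=list)


def to_nary(node: BinNode) -> NaryNode:
    """Each n-ary node carries a single word taken from a leaf of the binary tree."""
    if node.governing is None:  # leaf of the binary tree
        return NaryNode(node.words)
    head = to_nary(node.governing)                   # follow the governing lineage
    head.children.append(to_nary(node.subordinate))  # attach the subordinate under it
    return head


binary = BinNode(
    "La petite brise la glace",
    governing=BinNode(
        "La petite brise",
        governing=BinNode("brise"),
        subordinate=BinNode(
            "La petite", governing=BinNode("La"), subordinate=BinNode("petite")
        ),
    ),
    subordinate=BinNode(
        "la glace", governing=BinNode("glace"), subordinate=BinNode("la")
    ),
)

root = to_nary(binary)
print(root.word, [child.word for child in root.children])
```

With this marking, the root is the leaf reached through an exclusively governing lineage, and the governing leaves “La” and “glace” end up attached to it, each carrying its own subordinate leaf.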
- semantic tree: a tree that includes only categorizing entities (of object or act type) with their properties, which are necessary to understand the meaning of the sentence (or document), given its context.
- Categorizing entities are the first level of decomposition of a taxonomy (or ontology). All categorizing entities fall into one or other of their subtypes (or subclasses).
- FIG. 5 shows a nonlimiting example of a tree diagram describing various types, subtypes and sub-sub-types of categorizing entities. More precisely, in this example, categorizing entities of the “act” type group two subtypes (or subclasses) of categorizing entities called “event” and “defining”, which respectively group two sub-sub-types (or sub-sub-classes) of categorizing entities called “action” and “event” on the one hand, and “definition” and “modalization” on the other hand. Categorizing entities of the “object” type group two subtypes (or subclasses) of categorizing entities called “individual” and “place”.
- FIG. 6 describes the main steps of an exemplary algorithm for constituting a semantic tree.
- This algorithm is implemented by the semantic tree manager GAS of the device D according to the invention. It may be preceded by a possible application of a lexical function intended to normalize the structural n-ary tree in order to eliminate any "stylistic" peculiarities that may be detrimental to its semantic analysis.
- the semantic tree manager GAS receives the description of an n-ary tree, for example provided by the structural n-ary tree construction algorithm described above.
- the semantic tree manager GAS extracts from the structural n-ary tree the verbal entity subtended by the highest-ranking categorizing entity in the n-ary tree (usually associated with its root node), which constitutes the root of the semantic tree.
- the semantic tree manager GAS performs a test to determine if the verbal entity corresponds to an act.
- the semantic tree manager GAS proceeds to a step 320. If this is not the case, the semantic tree manager GAS creates, in a step 315, a support verb defining an act, then it goes to step 320.
- in step 320, the semantic tree manager GAS initializes the semantic tree. Then, it inserts the act into a chronological list of acts, which may already include other acts listed in the sentence being analyzed and/or in previous sentences of the document being analyzed.
- This list is, for example, in the form of a table built up progressively and stored in a memory. Then, the semantic tree manager GAS instantiates a semantic structure.
- the lexicon provides a semantic tree pattern for the categorizing entity (object or act) whose
- a pattern comprises, on the one hand, a semantic connection model (of the same nature as certain lexical functions) which makes it possible to transform the actantial schema of a verbal entity into a semantic (sub)tree, as shown schematically, by way of example, in Figures 8 and 9, and, on the other hand, a list of properties (or attributes), as shown schematically in Figure 9.
- the semantic tree manager GAS extracts the next node from the n-ary tree, and in a step 340 it performs a test to determine if the verbal entity associated with this extracted node activates an object.
- object: in accordance with the definition given above, the word “object” must be understood here in its broadest and most common sense, extended to abstract objects such as feelings and representations, and not in the specialized and restrictive sense it has in computer science.
- the semantic tree manager GAS inserts this object in the semantic tree. Then, it inserts the object into a list (or universe) of objects, which may already contain other objects listed in the sentence being analyzed and/or in previous sentences of the document being analyzed. This list is, for example, in the form of a table built up progressively and stored in a memory. Then, the semantic tree manager GAS instantiates the semantic structure (as indicated above). The semantic tree manager GAS then proceeds to a step 410.
- in a step 360, the semantic tree manager GAS performs a new test to determine whether properties (or connectional features) are associated with this verbal entity.
- the semantic tree manager GAS identifies an owner object. More specifically, a categorizing entity of the “property” type, which does not operate autonomously (unless it is a meta-object) and which necessarily characterizes an object, has been identified. This object, which is called the “owner”, is identified either directly, through a connection (ordinary or anaphoric) that connects it to the property (as for example in the expression “the color of the sky” or “its color”), or (more rarely, when there is no apparent connection) by going through the list of objects instantiated by the analyzed text in search of an object that has the property in question (which can be a source of anomalies when there is none or when several are possible).
- the semantic tree manager GAS assigns a value to the object.
- the value(s) associated with the property is (are) identified directly by searching among the subordinate nodes for those that are in adjectival connection
- the semantic tree manager GAS then proceeds to step 410. If the result of the test carried out in step 360 indicates that the verbal entity is not associated with a property, then in a step 380 the semantic tree manager GAS performs a new test to determine whether a modalization is possible. Modalization is carried by verbs such as “can” or “want”, on the one hand, and “think (that)” or “believe (that)”, on the other hand. These verbs do not activate acts (unlike the verbs “think” or “believe” when used absolutely) but modify the interpretation of the act to which they are attached. Thus, the expression “I can go” does not have the same value as the expression “I'm going”, but in both cases the semantic head is the verb “to go”. Similarly, the expression “Peter thinks we do not write enough” does not have the same value as the expression “we do not write enough”, the semantic head nevertheless being the verb “to write” in both cases.
- the semantic tree manager GAS identifies an owner act in a step 390.
- the procedure for identifying an owner act is similar to that of an owner object presented above (but applied to an act).
- the semantic tree manager GAS assigns a modalization to the owner act.
- the semantic tree manager GAS then proceeds to step 410.
- in a step 400, the semantic tree manager GAS considers that there is an anomaly. We are then in the presence of a node that cannot be attached in any way to the semantic tree being created.
- the semantic tree manager GAS then proceeds to step 410.
- the semantic tree manager GAS performs a test to determine whether the node of the n-ary tree that has just been analyzed is the last node of said n-ary tree. If this is the case, then the semantic tree is constituted and the semantic tree construction algorithm ends in E. On the other hand, if the node of the n-ary tree that has just been analyzed is not the last node of said n-ary tree, then the semantic tree manager GAS returns to step 330 to begin analyzing the next node of the n-ary tree. All the nodes of the n-ary tree are thus analyzed one after the other.
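The succession of tests described above (act or support verb, then object, property, modalization or anomaly for each following node) can be summarized by a minimal sketch. Everything in it is an assumption made for illustration: the function `build_semantic_tree`, the toy lexicon, the `(word, linked_to)` encoding of the n-ary tree, and the fact that an owner is only looked up through a direct connection. Value assignment and the search through the whole object list are left out.

```python
def build_semantic_tree(nodes, lexicon):
    """nodes: list of (word, linked_to) pairs, root first (hypothetical encoding)."""
    tree = {"acts": [], "objects": [], "properties": {},
            "modalizations": {}, "anomalies": []}
    root_word = nodes[0][0]
    if lexicon.get(root_word) == "act":
        tree["acts"].append(root_word)
        remaining = nodes[1:]
    else:
        tree["acts"].append("<support verb>")  # cf. step 315: create a support verb
        remaining = nodes
    for word, linked_to in remaining:
        kind = lexicon.get(word)
        if kind == "object":
            tree["objects"].append(word)       # also recorded in the object list
        elif kind == "property" and linked_to in tree["objects"]:
            tree["properties"].setdefault(linked_to, []).append(word)
        elif kind == "modal":
            owner = linked_to if linked_to in tree["acts"] else tree["acts"][0]
            tree["modalizations"].setdefault(owner, []).append(word)
        else:
            tree["anomalies"].append(word)     # cf. step 400: node cannot be attached
    return tree


# Toy lexicon and toy input, both invented for this sketch.
lexicon = {"see": "act", "I": "object", "can": "modal",
           "sky": "object", "color": "property"}
nodes = [("see", None), ("I", "see"), ("can", "see"),
         ("sky", "see"), ("color", "sky"), ("zzz", "see")]
result = build_semantic_tree(nodes, lexicon)
print(result)
```

The toy input is ordered so that an owner object is met before its property; a real traversal would need the list-based owner search that the text describes as the rarer case.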
- the root node of the semantic tree is the verb “to break” (“briser”), which comes from the word “brise” of the n-ary tree of Figure 8.
- This word “brise” indeed has two very different meanings: the verb “to break” conjugated in the present tense (and thus an act corresponding to the answer “now” to the question “when?”, it remaining to be determined whether the word “now” concerns the time of the speech or is defined by the speech), and the noun “breeze”, which designates a light, cool wind.
- the word “petite” is an adjective attached, because of its position, to a subject of the verb “to break”, which is here represented by the word “La”, which is therefore an anaphoric pronoun denoting a feminine common noun introduced in a previous sentence.
- “La” here is a determiner whose role is, on the one hand, to confirm the substantive status of the verbal entity that it accompanies (thus making it possible to substantivize, for example, adjectives or verbs), and, on the other hand, to provide information as to the existence of the associated object.
- the adjective “petite” therefore constitutes a main node of the object type
- This object node x is associated with two properties, one of a feminine gender (referenced F in FIG. 9) and one of size (referenced as small in FIG. 9).
- the semantic tree illustrated in FIG. 9 is therefore the result of the disambiguation relating to the two branches attached to the word “brise” of FIG. 8.
- this semantic tree does not make it possible to remove the other ambiguity relating to the interpretation of the sentence, mentioned above.
- additional analyzes of the contextual type must be performed by the semantic analyzer AS of the device D.
- this other ambiguity can only be removed by a cotextual analysis with respect to the preceding and/or following sentences of the analyzed document, or by a contextual (that is to say pragmatic) analysis.
- cotext: the text surrounding a sentence being analyzed
- context: the environment (in the broad sense) in which a text is produced and/or received.
- These complementary analyses are mainly aimed at handling anaphora and cataphora. They are done by determining, within the tables (or lists) of objects and acts, the words that have no semantic identity of their own, such as pronouns. In other words, we search among the stored words for those that can serve as antecedents.
- the semantic tree on the left corresponds to the part of the sentence “The client has called”.
- the main nodes of this semantic tree are “call” and “client”.
- the word “to call” is the main verb and therefore the act, while the word “client” is a substantive, subject of the verb “to call”, and therefore an object.
- the semantic tree on the right corresponds to the part of the sentence “he received his invoice late”.
- the main nodes of this semantic tree are “receive”, “he” and “invoice”.
- the word “to receive” is the main verb and therefore the act, while the word “he” is a pronoun, subject of the verb “to receive”, and therefore an object, and the word “invoice” is a substantive, direct object of the verb “to receive”, and thus an object.
- the separator “:” is here equivalent to “because”, so that there is a causal relationship between the two parts of the sentence.
- the anaphoric “he” can only refer to the word “client”. Indeed, in the list of instantiated objects of the analyzed document, only the word “client” fulfills the conditions of structural and semantic compatibility (masculine singular substantive, semantically compatible with the first actant (or subject) of the verb “to receive”, which is the word “he”). There is therefore an anaphoric relationship between the words “client” and “he”.
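The search for an antecedent in the list of instantiated objects can be sketched as a simple filter on structural and semantic compatibility. The feature names (`gender`, `number`, `types`, `required_type`) and the function `find_antecedents` are hypothetical, as is the presence of “invoice” in the list at the time of the search (it is marked feminine here because the French word “facture” is feminine).

```python
def find_antecedents(pronoun, objects):
    """Return the words of the listed objects compatible with the pronoun."""
    return [
        obj["word"]
        for obj in objects
        if obj["gender"] == pronoun["gender"]        # structural compatibility
        and obj["number"] == pronoun["number"]
        and pronoun["required_type"] in obj["types"]  # semantic compatibility
    ]


# Hypothetical object list of the analyzed document.
object_list = [
    {"word": "client", "gender": "m", "number": "sg", "types": {"person"}},
    {"word": "invoice", "gender": "f", "number": "sg", "types": {"document"}},
]

# "he": masculine singular, and assumed here to need a person as the first
# actant (subject) of the verb "to receive".
he = {"word": "he", "gender": "m", "number": "sg", "required_type": "person"}

candidates = find_antecedents(he, object_list)
print(candidates)
```

A single candidate gives the anaphoric relationship; an empty or multiple result would correspond to the anomalies mentioned earlier for owner identification.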
- Figs. 11A and 11B illustrate a structural n-ary tree and the associated semantic tree, which correspond to the sentence “The invoice arrived after the due date”.
- Ambiguities are here materialized in the structural n-ary tree by stylized T's placed at the level of the words “after” and “due date”, and materializing a translation function applied to the word on the right by the word on the left.
- the word to the left of a stylized T is necessarily a translative: it is a grammatical word that is able to change the structural category of the word to the right of the same stylized T. For example, in the sentence “I take the red” (speaking of a garment), the determiner “the” shifts “red” from its original category of adjective to that of substantive, implying that there must be, on the semantic plane, an object compatible with the color red that answers the question asked.
- Figure 12 illustrates two semantic trees corresponding to two parts of a sentence separated by the separator “,” (comma). This sentence is “While X is A, Y is B”. This example materializes the temporal relation between the two actions respectively carried out by X and Y. More precisely, the analysis of the two semantic trees and of the tables of acts and objects associated with the analyzed document makes it possible to understand that the action A takes place in a time interval I and that the action B takes place in a time interval I' which is included in I.
- Figure 13 illustrates two semantic trees corresponding to two parts of the same sentence. This sentence is “Peter lost the book I gave him”.
- a main node belonging to a semantic tree can be an act or an object derived from the structural classes verb and substantive.
- some nodes of a structural n-ary tree may not satisfy this constraint. This is particularly the case of the word “red” in the sentence “I take the red”.
- since the word “red” is here an adjective, it cannot directly create a main node in the semantic tree. In principle, it can only be a value of a property (the color) of the object substantive to which it relates.
- a complementary analysis, parallel to the one used to resolve anaphora and cataphora, must therefore be carried out. This additional analysis consists in determining, among the objects already listed in the object list of the document being analyzed, the one or those having a property of the same type as that associated with the problem word. In the example, this property is the color.
- the semantic constraints carried by the main verb, here the verb “to take”, are then applied to the objects retained from the list.
- the "red” value is then assigned to the "color” property of the compatible object which then constitutes a node allowed to be integrated into the semantic tree of the sentence to which it belongs.
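The complementary analysis for “I take the red” can be sketched in two filters: keep the listed objects that have a property of the right type, then apply the constraints carried by the main verb. The object list, the `type` field, the set of types accepted by “to take” and the function `assign_adjective` are all hypothetical.

```python
def assign_adjective(value, prop, accepted_types, objects):
    """Assign `value` to property `prop` of the single compatible object, if any."""
    candidates = [
        obj for obj in objects
        if prop in obj["properties"]       # object has a property of the same type
        and obj["type"] in accepted_types  # object satisfies the verb's constraints
    ]
    if len(candidates) != 1:
        return None  # no compatible object, or ambiguity between several
    candidates[0]["properties"][prop] = value
    return candidates[0]["name"]


# Hypothetical object list built from earlier sentences of the document.
objects = [
    {"name": "sweater", "type": "garment", "properties": {"color": None, "size": None}},
    {"name": "sky", "type": "place", "properties": {"color": None}},
    {"name": "meeting", "type": "event", "properties": {"date": None}},
]

# Assumption: "to take" is taken here to accept garments as its object.
owner = assign_adjective("red", "color", {"garment"}, objects)
print(owner, objects[0]["properties"])
```

Here “sky” has a color but is rejected by the assumed constraint of the verb, so the value lands on the only remaining object.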
- Figure 13 illustrates an n-ary tree corresponding to the sentence “Increase the volume of the base xx of yy Go”. Ambiguities are here materialized in the structural n-ary tree by stylized T's placed at the level of the words “base” and “Go” (for “Giga octet”, i.e. gigabyte).
- a first semantic pre-analysis shows that the semantically relevant words, that is to say those that pertain to the modeled environment, are here “increase”, “volume”, “base”, “xx”, “Go” and “yy”.
- Semantic compatibility relationships are, for example, governed by two types of compatibility rules called $C_1$ and $C_2$.
- Compatibility rules of type $C_1$ apply to two nodes that are in direct connection, that is to say whose connection (or attachment) does not include an intermediate node. This is for example the case of words
- connection may, however, include
- a first group concerns the compatibility based on the actantial/semantic schemas that the language allows to be degraded by replacing an object node (NO), such as a substantive, by another object node compatible with the first in the context of metrics.
- NO: object node
- An actantial schema (or connection potential) describes the set of connections (hence the expression “connection potential”) that a verbal entity is likely to accept, as well as their conditions of realization.
- Each potential connection is identified by a connectional characteristic. Examples are those referenced Act1, Act2 and Det in Figure 8.
- each potential connection comprises a variable number of structural and/or semantic constraints (for example, the potential connection Act1 of a verb can only be filled by a noun that is compatible in number and gender, and semantically compatible).
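An actantial schema can be pictured as a list of connection potentials, each with its own constraints. The sketch below checks whether a verbal entity can fill a potential such as Act1. The classes `VerbalEntity` and `ConnectionPotential`, their fields and the semantic type names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class VerbalEntity:
    word: str
    category: str       # structural category, e.g. "noun" or "verb"
    number: str         # "sg" or "pl"
    gender: str         # "m" or "f"
    semantic_type: str  # hypothetical semantic class, e.g. "person"


@dataclass(frozen=True)
class ConnectionPotential:
    name: str                      # connectional characteristic, e.g. "Act1"
    category: str                  # structural constraint
    number: str                    # agreement constraint
    semantic_types: FrozenSet[str]  # semantic constraint
    gender: Optional[str] = None   # None means no gender constraint


def can_fill(potential: ConnectionPotential, entity: VerbalEntity) -> bool:
    """True when the entity satisfies every constraint of the potential."""
    return (
        entity.category == potential.category
        and entity.number == potential.number
        and (potential.gender is None or entity.gender == potential.gender)
        and entity.semantic_type in potential.semantic_types
    )


# Assumed Act1 potential of a singular verb whose subject must be a person.
act1 = ConnectionPotential("Act1", "noun", "sg", frozenset({"person"}))
client = VerbalEntity("client", "noun", "sg", "m", "person")
invoices = VerbalEntity("invoices", "noun", "pl", "f", "document")
print(can_fill(act1, client), can_fill(act1, invoices))
```

Keeping the constraints as data rather than code is one way to let a lexicon supply a different schema for each verbal entity.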
- a second group concerns metric-based compatibility, which privileges the connection of a node associated with a substantive (NO) to a node associated with an adjective (NA), including nodes associated with real nouns (denoted $NO_0$) and the other nodes associated with unit nouns (denoted $NO_U$).
- Compatibility rules of type $C_2$ apply to two nodes in indirect connection, that is to say whose connection (or attachment) passes through at least one other node. This is for example the case of the words “volume” and “Go”.
- a metric is defined by the set of values it admits associated with a unit (as well as its multiples and subdivisions).
- NA belongs to one or more metrics
- $P_A$: the list of all the properties (independently of the objects they define) with which these metrics can be associated
- NO is defined by a set of properties $P_0$, each of which is linked to a metric.
- Quantitative metrics are usually described intensionally, as a subset satisfying a condition, such as belonging to the set of positive integers or decimals.
- Qualitative metrics are usually described in extension as a set of discrete values, such as color (red, green, yellow, blue, orange, ...) or beauty (beautiful, ugly, ... ).
- Quantitative metrics are also distinguishable from qualitative metrics because they allow an order relation (values can be ranked, which is not the case for purely qualitative metrics), and usually involve the notion of unit (except in the case of enumeration).
- a special status must be provided for units and percentages.
- the units answer the problem of enumeration (creating a category of the absolute), while the percentages make it possible to create relative scales independent of any unit.
- an exact quantitative metric may correspond to a scale of intensity between -25 and +25
- an approximate quantitative metric may be defined by discrete values of adjectives such as large, medium, and small.
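The distinction between metrics described intensionally (by a condition) and in extension (by a set of discrete values) can be captured with a single membership predicate. The `Metric` class, its fields and the sample metrics below are hypothetical; the intensity bounds and the adjective and color values are those quoted in the text.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Metric:
    name: str
    accepts: Callable[[object], bool]  # membership test for a candidate value
    ordered: bool                      # True when values can be ranked
    unit: Optional[str] = None         # None for unit-less metrics


def is_number(value) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


# Exact quantitative metric, described intensionally by a condition.
intensity = Metric("intensity", lambda v: is_number(v) and -25 <= v <= 25, ordered=True)

# Approximate quantitative metric: discrete adjectives that can still be ranked.
SIZES = ("small", "medium", "large")
size = Metric("size", lambda v: v in SIZES, ordered=True)

# Qualitative metric, described in extension, with no order relation.
COLORS = frozenset({"red", "green", "yellow", "blue", "orange"})
color = Metric("color", lambda v: v in COLORS, ordered=False)

print(intensity.accepts(10), intensity.accepts(40), size.accepts("small"), color.accepts("red"))
```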
- the word “create” (act) is compatible with the word “base” (object), which is a real noun ($NO_0$) defined by properties such as identifier, volume, content, server, etc.
- the word “empty” is an adjective (NA) that is governed by the word “base”, and must therefore be assigned as a value to one of the properties of the word “base” ($NO_0$).
- the identifier property has the particularity of having no precise metric: any word, existing or invented, can be used. It follows that using words of everyday language as identifiers is strongly discouraged, which provides a first clue for removing the previous ambiguity. An unknown word placed in the right place in a structural tree is a suitable candidate. A second clue is provided by the absence of capital letters.
- the analysis can be completed by reducing the word “empty” to a number.
- the metric associated with the content property includes, at least, $\{\text{empty}, \text{full}\} \cup [0, 100]$ ...
- the word “will” is a verb [of complement] of information
- the word “volume” is a property, since the word “son” (“its”) refers to an object defined elsewhere (anaphoric connection)
- the word “Go” comes under the very special category of units, which are necessarily associated with a quantitative metric
- the word “3” is a numerical adjective (NA) that can belong to all quantitative metrics compatible with positive integers.
- the word “3” can therefore be assigned as a value to the word “volume” provided that the intersection $M_{\text{volume}} \cap M_{\text{Go}} \cap M_{3}$ (where $M_x$ represents the set of all the metrics that can be associated with x) contains one and only one element. Otherwise, the assignment is either impossible, if the intersection is empty, or ambiguous, if there are several solutions.
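The three possible outcomes of the intersection test (a single metric, an empty intersection, several solutions) can be sketched with plain sets. The table `METRICS_OF` and the metric names in it are invented for the illustration; only the rule itself comes from the text.

```python
# Hypothetical sets M_x of the metrics that can be associated with each word.
METRICS_OF = {
    "volume": {"storage size", "sound level"},
    "Go": {"storage size"},
    "3": {"storage size", "sound level", "count"},
    "hours": {"duration"},
}


def resolve_metric(*words):
    """Intersect the metric sets of the given words and classify the outcome."""
    common = set.intersection(*(METRICS_OF[word] for word in words))
    if len(common) == 1:
        return "ok", next(iter(common))
    if not common:
        return "impossible", None
    return "ambiguous", sorted(common)


print(resolve_metric("volume", "Go", "3"))
print(resolve_metric("volume", "3"))
print(resolve_metric("volume", "hours", "3"))
```

Without the unit, the value “3” remains ambiguous between the metrics assumed here for “volume”; the unit is what narrows the intersection to one element.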
- metrics can provide information. This is for example the case of the phrase "I want to increase my laptop by two hours".
- FIG. 15 schematically represents (and summarizes) the principal relations between nodes associated with nouns (NO) and nodes associated with adjectives (NA), and the notions related thereto, in particular the metrics, the units, and the constraints (or rules) $C_1$ used to prohibit all triplets (identifier, valuation, measure) that are not valid.
- FIG. 16 schematically represents (and summarizes) the principal relations between the categorizing entities of object and act type, and the related notions, notably the circumstances, the modalizations, the properties, the values and the metrics.
- the device D for semantic analysis of documents according to the invention can be realized in the form of electronic circuits, software (or computer) modules, or a combination of circuits and software.
- the semantic document analysis device D can be used in any application that needs a reliable separation of correctly analyzed texts or messages from those that are not, and an accurate diagnosis that is easy to use for texts or messages incorrectly analyzed.
- a first application relates to electronic mail management tools (or equipment), for example of the e-mail type.
- the device D can indeed be used to filter information by determining whether the message which contains this information satisfies a set of semantic criteria.
- the device D will continue to react positively via its filter, which is irrelevant since the filter provides at least the information required by the super reference group.
- the super reference groups can be created from the synthesis of the results of the analysis of a corpus of reference messages, which spares the user responsible for designing the filters from having to learn knowledge specific to the application; it is enough for this user to have sufficient command of the natural language to be able to draw up the corpus concerned. It is also possible to juxtapose several filters within a single device D or within parallel devices D, and to couple this (these) device(s) to an interface adapted to routing, so as to constitute an e-mail manager.
- a second application concerns orthographic and / or grammatical tools (or equipment).
- the device D can indeed make it possible, on the one hand, to identify the grammatical errors which generally result from a bad application of the rules of syntax, then to identify the rule that was not respected and to propose a correction, and, on the other hand, to identify the unknown words by separating the proper nouns from the barbarisms, then to propose compatible words for the latter.
- the device D actually makes it possible to answer the question “which words, when substituted for a faulty word, are likely to remove an ambiguity or an error?”.
- a third application relates to voice dictation tools (or equipment).
- the device can indeed make it possible to choose one of several solutions proposed by a voice recognition engine.
- a fourth application concerns tools (or equipment) for generating text.
- the device D can indeed collaborate with a text generator which is based, for example, on the theory called “Meaning ⇔ Text” (“Sens-Texte”, or TST).
- a fifth application relates to the tools (or equipment) for generating summaries.
- the first is to create from scratch a new text that constitutes a digest of the original, with a variable "compression ratio" (but generally high).
- the second is to extract, based on criteria defined by a user, relevant sections of an original text.
- the device D can calculate thematic results if it is coupled to a hierarchy function and in the presence of linguistic markers.
- a sixth application concerns search engines.
- the search for textual information may consist of searching either for factual information, materialized by a question such as “what is the value of ...?”, or for texts relating to a theme or a predefined subject.
- the device D can indeed ensure, in the case of the factual research, an adequate semantic indexing allowing to directly produce a response.
- DBMS: database management system
- the device D can also make it possible, in the case of the search for themed texts, to make distance calculations from thematic results, and then to propose a list of relevant documents according to said calculations. This type of operation could be enriched by the implementation of an accuracy rate.
- a seventh application concerns multilingual translators.
- the device D can provide a semantic analysis of text, fast and reliable, to remove the ambiguities of translation. Only a use of the totality of the information present in a text can indeed guarantee a relevant translation, that is to say a translation respecting as much as possible the meaning conveyed by the original text.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR0504765A FR2885712B1 (fr) | 2005-05-12 | 2005-05-12 | Device and method for semantic analysis of documents by constructing n-ary and semantic trees |
PCT/FR2006/001055 WO2006120352A1 (fr) | 2005-05-12 | 2006-05-11 | Device and method for semantic analysis of documents by constructing n-ary and semantic trees |
Publications (1)
Publication Number | Publication Date |
---|---|
EP1880314A1 (fr) | 2008-01-23 |
Family
ID=35124726
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP06764601A Ceased EP1880314A1 (fr) | 2005-05-12 | 2006-05-11 | Device and method for semantic analysis of documents by constructing n-ary and semantic trees |
Country Status (4)
Country | Link |
---|---|
US (1) | US7856438B2 (fr) |
EP (1) | EP1880314A1 (fr) |
FR (1) | FR2885712B1 (fr) |
WO (1) | WO2006120352A1 (fr) |
Families Citing this family (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7395425B2 (en) * | 2001-03-29 | 2008-07-01 | Matsushita Electric Industrial Co., Ltd. | Data protection system that protects data by encrypting the data |
WO2010018473A2 (fr) * | 2008-07-17 | 2010-02-18 | Talisma Corporation Private Ltd. | Method for sending an SMS (short message service) campaign to an associated recipient by selecting the base recipient |
US8527353B2 (en) * | 2008-09-16 | 2013-09-03 | Yahoo! Inc. | Method and apparatus for administering a bidding language for online advertising |
EP2359263A4 (fr) * | 2008-12-19 | 2018-01-03 | EntIT Software LLC | Method and computer program product for document information selection |
CN101510221B (zh) * | 2009-02-17 | 2012-05-30 | 北京大学 | Query statement analysis method and system for information retrieval |
US8880537B2 (en) | 2009-10-19 | 2014-11-04 | Gil Fuchs | System and method for use of semantic understanding in storage, searching and providing of data or other content information |
US9230258B2 (en) | 2010-04-01 | 2016-01-05 | International Business Machines Corporation | Space and time for entity resolution |
WO2011158066A1 (fr) * | 2010-06-16 | 2011-12-22 | Sony Ericsson Mobile Communications Ab | User-based semantic metadata for text messages |
JP5849960B2 (ja) * | 2010-10-21 | 2016-02-03 | 日本電気株式会社 | Entailment determination device, method, and program |
US9002859B1 (en) | 2010-12-17 | 2015-04-07 | Moonshadow Mobile, Inc. | Systems and methods for high-speed searching and filtering of large datasets |
CA2823839A1 (fr) * | 2011-01-10 | 2012-07-19 | Roy W. Ward | Systems and methods for high-speed searching and filtering of large datasets |
US9171054B1 (en) | 2012-01-04 | 2015-10-27 | Moonshadow Mobile, Inc. | Systems and methods for high-speed searching and filtering of large datasets |
US8990204B1 (en) | 2012-01-17 | 2015-03-24 | Roy W. Ward | Processing and storage of spatial data |
US10387780B2 (en) | 2012-08-14 | 2019-08-20 | International Business Machines Corporation | Context accumulation based on properties of entity features |
US9270451B2 (en) | 2013-10-03 | 2016-02-23 | Globalfoundries Inc. | Privacy enhanced spatial analytics |
CN104142917B (zh) * | 2014-05-21 | 2018-05-01 | 北京师范大学 | Method and system for constructing a hierarchical semantic tree for language understanding |
US10122805B2 (en) | 2015-06-30 | 2018-11-06 | International Business Machines Corporation | Identification of collaborating and gathering entities |
US10521411B2 (en) | 2016-08-10 | 2019-12-31 | Moonshadow Mobile, Inc. | Systems, methods, and data structures for high-speed searching or filtering of large datasets |
US10528665B2 (en) * | 2017-01-11 | 2020-01-07 | Satyanarayana Krishnamurthy | System and method for natural language generation |
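CN108334497A (zh) * | 2018-02-06 | 2018-07-27 | 北京航空航天大学 | Method and apparatus for automatically generating text |
CN109815490B (zh) * | 2019-01-04 | 2023-11-14 | 平安科技(深圳)有限公司 | Text analysis method, apparatus, device, and storage medium |
CN110085290A (zh) * | 2019-04-01 | 2019-08-02 | 东华大学 | Method for building a semantic tree model of mammography reports supporting heterogeneous information integration |
CN110647662B (zh) * | 2019-08-03 | 2022-10-14 | 电子科技大学 | Semantics-based multimodal spatiotemporal data association method |
CN110660128B (zh) * | 2019-09-23 | 2023-08-11 | 云南电网有限责任公司电力科学研究院 | Three-dimensional semantic scene reconstruction method based on a generative adversarial network |
CN111709250B (zh) * | 2020-06-11 | 2022-05-06 | 北京百度网讯科技有限公司 | Method, apparatus, electronic device, and storage medium for information processing |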
US11194966B1 (en) * | 2020-06-30 | 2021-12-07 | International Business Machines Corporation | Management of concepts and intents in conversational systems |
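CN111931503B (zh) * | 2020-08-04 | 2024-01-26 | 腾讯科技(深圳)有限公司 | Information extraction method and apparatus, device, and computer-readable storage medium |
CN112492313B (zh) * | 2020-11-22 | 2021-09-17 | 复旦大学 | Picture transmission system based on a generative adversarial network |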
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
SE466029B (sv) * | 1989-03-06 | 1991-12-02 | Ibm Svenska Ab | Device and method for analysis of natural language in a computer-based information processing system |
JP3266246B2 (ja) * | 1990-06-15 | 2002-03-18 | インターナシヨナル・ビジネス・マシーンズ・コーポレーシヨン | Natural language analysis apparatus and method, and method for constructing a knowledge base for natural language analysis |
US6076051A (en) * | 1997-03-07 | 2000-06-13 | Microsoft Corporation | Information retrieval utilizing semantic representation of text |
US6721697B1 (en) * | 1999-10-18 | 2004-04-13 | Sony Corporation | Method and system for reducing lexical ambiguity |
IL142421A0 (en) * | 2001-04-03 | 2002-03-10 | Linguistic Agents Ltd | Linguistic agent system |
2005
- 2005-05-12 FR: application FR0504765A, publication FR2885712B1 (fr), not active (Expired - Fee Related)
2006
- 2006-05-11 EP: application EP06764601A, publication EP1880314A1 (fr), not active (Ceased)
- 2006-05-11 US: application US11/920,186, publication US7856438B2 (en), not active (Expired - Fee Related)
- 2006-05-11 WO: application PCT/FR2006/001055, publication WO2006120352A1 (fr), active (Application Filing)
Non-Patent Citations (1)
Title |
---|
See references of WO2006120352A1 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
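CN110245708A (zh) * | 2019-06-18 | 2019-09-17 | 山东浪潮人工智能研究院有限公司 | Method and apparatus for generating technical document term explanations based on a GAN network |
CN110245708B (zh) * | 2019-06-18 | 2021-05-18 | 浪潮集团有限公司 | Method and apparatus for generating technical document term explanations based on a GAN network |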
Also Published As
Publication number | Publication date |
---|---|
FR2885712B1 (fr) | 2007-07-13 |
US7856438B2 (en) | 2010-12-21 |
US20090077113A1 (en) | 2009-03-19 |
WO2006120352A1 (fr) | 2006-11-16 |
FR2885712A1 (fr) | 2006-11-17 |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
17P | Request for examination filed | Effective date: 20071110 |
AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR |
17Q | First examination report despatched | Effective date: 20080414 |
DAX | Request for extension of the European patent (deleted) | |
RAP1 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: LA SOCIETE HUMAN KNOWLEDGE |
RIN1 | Information on inventor provided before grant (corrected) | Inventor name: PRIGNITZ, HERMANN; Inventor name: FIDAALI, KABIRE |
REG | Reference to a national code | Ref country code: DE; Ref legal event code: R003 |
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED |
18R | Application refused | Effective date: 20160513 |