CN112667819A - Entity description reasoning knowledge base construction and reasoning evidence quantitative information acquisition method and device - Google Patents

Publication number
CN112667819A
Authority
CN
China
Prior art keywords
entity
description
reasoning
knowledge base
event
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011435544.2A
Other languages
Chinese (zh)
Inventor
刘焕勇
刘张宇
邹志龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Data Horizon Guangzhou Technology Co ltd
Original Assignee
Data Horizon Guangzhou Technology Co ltd
Application filed by Data Horizon Guangzhou Technology Co ltd
Priority to CN202011435544.2A
Publication of CN112667819A
Legal status: Pending

Abstract

The invention relates to a method and a device for constructing an entity description reasoning knowledge base and acquiring quantitative information on reasoning evidence. The method comprises: establishing an entity description reasoning knowledge base from large-scale unstructured open text, the knowledge base comprising an entity description knowledge base and an entity association conduction library; and, for an input event or event description list, or an event pair or event description pair list, searching the entity description reasoning knowledge base through entity linking and returning the reasoning evidence and conduction strength between events. The invention broadens the scope of existing logical reasoning knowledge bases and can improve their logical reasoning capability; it can be applied flexibly to different reasoning scenarios such as single-event reasoning and event-pair reasoning, and the acquisition mode can be changed quickly according to actual requirements.

Description

Entity description reasoning knowledge base construction and reasoning evidence quantitative information acquisition method and device
Technical Field
The invention relates to a method and a device for constructing an entity description reasoning knowledge base and acquiring quantitative information on reasoning evidence. It belongs to the field of natural language processing and specifically concerns the construction of a particular class of knowledge base and its application to reasoning.
Background
Knowledge reasoning is an advanced stage of artificial intelligence. Applying machine reasoning to existing knowledge in order to complete decision-making in a limited domain can generate economic benefit while substantially reducing manual labor. For example, reasoning over known knowledge and discovering knowledge along event-driven conduction paths can assist business reasoning and decision-making, give early warning of unknown risks in intelligent research, and support the monitoring and control of public opinion about companies in public opinion analysis.
Accomplishing this reasoning work depends on two core points: 1) a large-scale knowledge base with reasoning capability as the basic data resource; and 2) a human-friendly, credible, and interpretable way of presenting knowledge reasoning during the reasoning process. A large-scale knowledge base with reasoning description capability can be formed by extracting logical knowledge from large-scale open text; such a knowledge base is a class of common knowledge that has logical description capability and is composed of logical reasoning factors. A credible and interpretable reasoning mode places requirements on the reasoning process: the reasoning path must be transparent, credible, and quantifiable, and must be easy for humans to understand.
Limited by the current state of natural language processing technology, the construction and application of logical reasoning knowledge bases for event-oriented reasoning currently has the following defects:
1. The dimensions of existing logical knowledge bases are relatively limited. Existing logical reasoning knowledge bases focus mainly on event evolution logic (such as causal evolution and conditional evolution), industry chain conduction logic, and related extensions; other types of logical reasoning knowledge bases remain to be explored.
2. Existing event reasoning lacks quantitative conduction data. Quantifying conduction in the current event reasoning process depends to a great extent on quantifiable data indicators, yet most such data are not open, which causes data distortion. The weight of an event inference can instead be derived from the numerical and ranking statements in entity descriptions. In terms of logical extension, an important advantage of entity description knowledge is that it "frees the data": the descriptions of an entity contain a large number of statistical conclusions about data.
3. Existing event reasoning lacks a display of conduction evidence. Current interpretable displays of event reasoning draw each inferred node between events, and the edges between the nodes, directly as graph elements, without effectively showing the source of the evidence that supports each conduction edge. This omission makes it hard to guarantee the credibility and reliability of the reasoning.
4. The mining and application of entity logic has not been fully explored. Entity description knowledge is an important class of logical reasoning factor and application object. Real-world descriptions of specific entities hide a large number of potential reasoning clues; they are another very effective reasoning factor that can fill gaps in existing logic and further broaden the extension of reasoning logic. For example, Chile is described as the country with the world's largest known lithium reserves and the largest lithium mining output, as the world's largest copper producer, and as China's largest supplier of refined copper; such descriptions reflect Chile's key position in the lithium and copper supply chains.
Disclosure of Invention
In view of the current situation and problems in the prior art, the invention aims to provide a method and a device for constructing an entity description reasoning knowledge base and acquiring quantitative information on event reasoning evidence.
The invention consists of an entity description reasoning knowledge base construction module and an event reasoning evidence quantitative information acquisition module. The main technical scheme applies natural language processing to extract entity description knowledge from expressions with explicit introductory descriptions in large-scale unstructured open text; quantifies, by quantization and statistical means, the explicit conduction associations between entities and their associated entities; records and stores the association descriptions as conduction evidence; and finally forms a large-scale entity description reasoning knowledge base. Based on this knowledge base, reasoning evidence and associated reasoning information carrying quantitative information are acquired automatically for a given input event or event pair list through event entity linking and matching.
The technical scheme adopted by the invention is as follows:
A method for constructing an entity description reasoning knowledge base and acquiring reasoning evidence quantitative information comprises the following steps:
establishing an entity description reasoning knowledge base from large-scale unstructured open text, wherein the entity description reasoning knowledge base comprises an entity description knowledge base and an entity association conduction library;
and, for an input event or event description list, or an event pair or event description pair list, searching the entity description reasoning knowledge base through entity linking, and returning the reasoning evidence and conduction strength between events.
Further, the entity description knowledge base is established by the following steps:
preprocessing the input text to form a paragraph set and a sentence set;
performing pattern matching on the paragraph set and the sentence set using an entity description tag word list to form an entity description candidate paragraph set and candidate sentence set;
training, by a sequence labeling method, a recognition model for entities and entity concept descriptions on each sentence in the candidate paragraph set and candidate sentence set, so as to learn the features of entities and their descriptions at the character level;
and using the trained recognition model to obtain <entity body, entity description> 2-tuples and <entity body, descriptor, entity description> triples, which form the entity description knowledge base.
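The first two steps above (preprocessing and trigger-word pattern matching) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names and the two English trigger words are hypothetical stand-ins for the patent's Chinese tag word list, the delimiters follow the Detailed Description (line feeds for paragraphs; semicolons, periods, and exclamation marks for sentences), and the conversion of traditional to simplified characters is omitted.

```python
import re

# Hypothetical stand-in for the patent's entity description tag word list.
TRIGGER_WORDS = ["is a", "known as"]


def preprocess(text):
    """Split raw text into a paragraph set and a sentence set."""
    # Paragraphs are separated by line feeds; empty lines are dropped.
    paragraphs = [p.strip() for p in text.split("\n") if p.strip()]
    sentences = []
    for paragraph in paragraphs:
        # Split right after a semicolon, period, or exclamation mark
        # (ASCII and full-width forms), keeping the delimiter.
        parts = re.split(r"(?<=[;.!；。！])", paragraph)
        sentences.extend(s.strip() for s in parts if s.strip())
    return paragraphs, sentences


def select_candidates(units, triggers=TRIGGER_WORDS):
    """Keep only the paragraphs or sentences that contain a trigger word."""
    return [u for u in units if any(t in u for t in triggers)]


text = ("Australian forest fire is a common natural disaster. It spread fast!\n"
        "Chile exports copper.")
paragraphs, sentences = preprocess(text)
candidates = select_candidates(sentences)
```

Plain substring matching is used here for brevity; a production matcher would likely need language-specific tokenization.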
Further, sequence labeling refers to outputting, for a given text string, a label for each character; the sequence labeling uses the BIO scheme, in which B-X marks the beginning of an entity, I-X marks the inside of an entity, and O marks irrelevant characters, and the labels to be predicted fall into 7 types: B-Entity, I-Entity, B-Trigger, I-Trigger, B-Desc, I-Desc and O.
Further, the entity association conduction library is established by the following steps:
acquiring the entity descriptions in the entity description knowledge base;
identifying the associated entities in each entity description and the association description between each associated entity and the entity body, to form <entity body, association description, associated entity> triples;
quantifying the association description between the entity body and the associated entity, the result serving as the strength of the association conducted from the entity body to the associated entity;
and forming a set of <entity body, association description, associated entity, association strength> quadruples, which constitutes the entity association conduction library.
Furthermore, the association description between the entity body and the associated entity is quantified by means of an emotion intensity word list, a degree adverb list, and a quantization rule base.
Further, reasoning evidence quantitative information for a single event description is acquired by the following steps:
for an input single event description or event description list, performing entity recognition and entity linking against the entities in the constructed entity description reasoning knowledge base, thereby mapping and associating the event description with the knowledge base;
based on the association result and the position of the current entity in the entity description reasoning knowledge base, traversing and expanding by a walk method up to a set walk depth, to obtain the associated entities together with the conduction strength and reasoning evidence of each conduction between entities.
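The single-event procedure above can be sketched as a depth-limited breadth-first walk over the conduction library. This is an illustrative sketch under stated assumptions: the graph layout, the function names, and the toy edges (including their strengths and evidence sentences) are invented for the example, and simple substring matching stands in for the entity recognition and linking step.

```python
from collections import deque

# Toy conduction library: entity -> [(associated entity, strength, evidence)].
# All entries are invented for illustration.
GRAPH = {
    "Chile": [("copper", 1.5, "Chile is the world's largest copper producer.")],
    "copper": [("cable", 0.5, "Copper is a main raw material of cable.")],
    "cable": [],
}


def link_entities(event, graph):
    """Assumed linker: a knowledge base entity is linked if its name occurs in the event text."""
    return [entity for entity in graph if entity.lower() in event.lower()]


def walk(graph, start, max_depth):
    """Collect conduction edges reachable from `start` within `max_depth` hops."""
    results, seen = [], {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # walk depth reached; do not expand further
        for target, strength, evidence in graph.get(node, []):
            results.append((node, target, strength, evidence))
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return results


linked = link_entities("Earthquake hits Chile", GRAPH)
edges = [edge for entity in linked for edge in walk(GRAPH, entity, max_depth=2)]
```

Each returned edge carries its strength and the sentence that serves as evidence, so the caller can display both alongside the inferred entity.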
Further, reasoning evidence quantitative information for an event description pair is acquired by the following steps:
for an input event description pair or event description pair list, performing entity recognition and entity linking against the entities in the constructed entity description reasoning knowledge base, thereby mapping and associating the event descriptions with the knowledge base and forming a head event entity link set and a tail event entity link set;
combining the head event entity link set with the tail event entity link set to form <head event entity, tail event entity> binary subgraphs, and performing multi-hop subgraph matching of these binary subgraphs in the entity description reasoning knowledge base to obtain the hit associated conduction subgraphs;
analyzing the associated conduction subgraphs by breadth-first and depth-first traversal respectively;
and assembling the associated conduction paths and conduction weights produced during the analysis of the associated conduction subgraphs, returning the description sentence on each conduction edge as the reasoning evidence and the weight on that edge as the conduction strength.
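The event-pair procedure above can be sketched as a bounded path search between linked head and tail entities. The sketch below uses only a depth-first enumeration (the text also mentions breadth-first traversal); the graph layout, function names, and toy edges are assumptions made for illustration. Per-edge weights are returned as-is, because the text does not specify how weights along a multi-hop path should be combined.

```python
# Toy conduction library: entity -> [(associated entity, strength, evidence)].
# All entries are invented for illustration.
GRAPH = {
    "Chile": [("copper", 1.5, "Chile is the world's largest copper producer.")],
    "copper": [("cable", 0.5, "Copper is a main raw material of cable.")],
    "cable": [],
}


def find_paths(graph, head, tail, max_hops):
    """Depth-first enumeration of simple paths from head to tail with at most max_hops edges."""
    paths = []

    def dfs(node, edges, visited):
        if node == tail and edges:
            paths.append(list(edges))
            return
        if len(edges) == max_hops:
            return
        for target, strength, evidence in graph.get(node, []):
            if target not in visited:
                edges.append((node, target, strength, evidence))
                dfs(target, edges, visited | {target})
                edges.pop()

    dfs(head, [], {head})
    return paths


def infer_pair(graph, head_entities, tail_entities, max_hops=2):
    """Assemble conduction paths, per-edge weights, and evidence for every head/tail entity pair."""
    results = []
    for head in head_entities:
        for tail in tail_entities:
            for path in find_paths(graph, head, tail, max_hops):
                results.append({
                    "path": [head] + [edge[1] for edge in path],
                    "weights": [edge[2] for edge in path],
                    "evidence": [edge[3] for edge in path],
                })
    return results


hits = infer_pair(GRAPH, ["Chile"], ["cable"])
```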
A device for constructing an entity description reasoning knowledge base and acquiring reasoning evidence quantitative information by the above method comprises:
an entity description reasoning knowledge base construction module, used for establishing an entity description reasoning knowledge base from large-scale unstructured open text, the entity description reasoning knowledge base comprising an entity description knowledge base and an entity association conduction library;
and a reasoning evidence quantitative information acquisition module, used for searching the entity description reasoning knowledge base through entity linking for an input event or event description list, or an event pair or event description pair list, and returning the reasoning evidence and conduction strength between events.
Compared with the prior art, the invention has the following advantages:
1. The invention provides a method for constructing an entity description reasoning knowledge base, which further broadens the scope of existing logical reasoning knowledge bases and can further improve their logical reasoning capability.
2. The invention provides an entity description extraction method that combines explicit description indicator words with sequence-labeling deep learning fusing three models; experimental evaluation shows that it achieves high extraction precision, recall, and F value.
3. The invention provides an entity description reasoning knowledge base which can, to a certain extent, alleviate the lack of quantitative conduction data and the lack of displayed conduction evidence in event reasoning.
4. The invention provides a device for acquiring event reasoning evidence quantitative information based on the entity description reasoning knowledge base; it can be applied flexibly to different reasoning scenarios such as single-event reasoning and event-pair reasoning, and its acquisition mode can be changed quickly according to actual requirements.
Drawings
Fig. 1 is a system configuration diagram.
FIG. 2 is a diagram of entity description inference knowledge base construction.
Fig. 3 is a diagram of an event reasoning evidence quantitative information acquisition process.
FIG. 4 is a detailed flow diagram of entity description extraction.
Fig. 5 is a schematic flow chart of the construction process of the entity association conduction library.
Fig. 6 is a flow chart of acquiring reasoning evidence quantitative information for a single event description.
Fig. 7 is a flow chart of acquiring reasoning evidence quantitative information for an event description pair.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, the present invention shall be described in further detail with reference to the following detailed description and accompanying drawings.
The invention mainly comprises an entity description reasoning knowledge base construction module and an event reasoning evidence quantitative information acquisition module, as shown in fig. 1. The technical flow of each module is as follows:
Entity description reasoning knowledge base construction
1. Definition of the entity description reasoning knowledge base concept
The entity description reasoning knowledge base is a common knowledge base with logical reasoning capability. Its nodes are two kinds of core knowledge, entities and entity association description information, and its edges are the quantifiable conduction relations between entities and associated entities. It is formed by extracting descriptions with explicit introductory character, and it consists of two sub-bases: an entity description knowledge base and an entity association conduction library. Wherein:
The entity description knowledge base refers to a knowledge set composed of <entity body, entity description> 2-tuples or <entity body, descriptor, entity description> triples obtained by structured extraction.
The entity association conduction library refers to a knowledge set composed of <entity body, association description, associated entity> triples or <entity body, association description, associated entity, association strength> quadruples.
The entity refers to an object discussed in the actual description text. It has no fixed word-formation structure and can be of different linguistic types, such as a verb, a verb phrase, or a short sentence.
The description with explicit introductory character refers to actual descriptive knowledge about a specific entity in a real scene. It covers the entity's objective attributes and special characteristics, including definitions, attributes, and features, and describes the entity directly or indirectly. In an expression of the form "Australian forest fire is a common natural disaster", "a common natural disaster" is a description of "Australian forest fire".
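The tuple shapes defined above can be written down as simple record types. The following is a minimal sketch; the class names and field names are illustrative assumptions, not identifiers from the patent, and the strength value in the example is invented.

```python
from typing import NamedTuple


class EntityDescription(NamedTuple):
    """<entity body, descriptor, entity description>; leave descriptor empty for the 2-tuple form."""
    entity: str
    descriptor: str
    description: str


class AssociationEdge(NamedTuple):
    """<entity body, association description, associated entity, association strength>."""
    entity: str
    association_description: str
    associated_entity: str
    strength: float


desc = EntityDescription("Australian forest fire", "is", "a common natural disaster")
edge = AssociationEdge(
    "Chile",
    "Chile is the world's largest copper producer",
    "copper",
    1.5,  # invented strength value for illustration
)
```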
2. Construction of an entity description knowledge base
The idea behind constructing the entity description knowledge base is to set entity description trigger words and, by means of a hybrid deep learning approach, perform entity body recognition, description body recognition, and related processing on open unstructured text, outputting <entity body, entity description> 2-tuples and <entity body, indicator word, entity description> triples to form the entity description knowledge base. As shown in fig. 4, the specific steps are as follows (steps 1 and 2 correspond to "text preprocessing" in fig. 4):
1) For the input text, remove illegal characters such as whitespace characters, and convert traditional Chinese characters into simplified ones.
2) Split the text on line feed characters to form a paragraph set, and split it into sentences on semicolons, periods, and exclamation marks to form a sentence set.
3) Perform pattern matching on the paragraph set and the sentence set using the entity description tag word list to form the entity description candidate paragraph set and candidate sentence set. The entity description tag word list consists of trigger words that were summarized by observing the characteristics of descriptive sentences and that frequently appear in them. It contains 67 terms, including name, self-sealing, self-name, called, viewed, referred, called, appellated, named, commonly, alias, change, chemical work, namely, name, getting name, change, call, name, humming, self-name, cannability, nomination, namely, becoming name, finger, meaning, repulsing, being, expressing, particularly, being, called, being, named, called, name, equal, calculating, ending, name in beauty, self-name, self-sealing, self-name, called, viewed, like, known, named, called, viewed, named, called.
4) Following the sequence labeling approach, train a recognition model for entities and entity concept descriptions on each sentence in the candidate paragraph set and candidate sentence set, learning the features of entities and their descriptions at the character level. Sequence labeling refers to outputting, for a given text string, a label for each character. The specific steps are as follows:
A. tagging of data
The labeling strategy used by the invention is the BIO scheme: B-X marks the beginning of an entity, I-X marks the inside of an entity, and O marks irrelevant characters. Compared with the five-tag BIOES scheme, the main advantage of the three-tag BIO scheme is that it supports character-by-character labeling and requires no word segmentation of the data. Because the extraction task is to extract the entities in a sentence together with their description information, X can take the values "Entity", "Trigger", and "Desc" (entity concept description information). The labels to be predicted therefore fall into 7 types: "B-Entity", "I-Entity", "B-Trigger", "I-Trigger", "B-Desc", "I-Desc", and "O". The labeling specification of the entity concept description extraction task is explained below using the Australian forest fire example.
For example: "The Australian forest fire is a common natural disaster."
The data needs to be labeled in the form: "Australian forest fire [Entity]" "is [Trigger]" "a common natural disaster [Desc]".
The serialized data format is as follows: each line consists of a character and its corresponding label, the label set follows BIO, and sentences are separated by an empty line. The result after serialization is:
TABLE 1 Text sequence and data tag format (label runs over the 18 characters of the example sentence, in order)
Character positions   Text span (English gloss)          Labels
1-8                   Australian forest fire (Entity)    B-Entity, then I-Entity x 7
9                     is (Trigger)                       B-Trigger
10                    one character outside any span     O
11-17                 common natural disaster (Desc)     B-Desc, then I-Desc x 6
18                    final character                    O
After the model is trained, the index positions of the labels "B-Entity", "I-Entity", "B-Trigger", "I-Trigger", "B-Desc", "I-Desc", and "O" are extracted from its predictions according to the label specification, and the corresponding characters are then looked up in the original sentence by index position and spliced together.
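The splicing step described above can be sketched as a small BIO decoder. This is a minimal illustration with an assumed function name; the example uses a short English string labeled character by character in place of the patent's Chinese sentence.

```python
def decode_bio(chars, labels):
    """Splice labeled characters back into (type, text) spans following the BIO scheme."""
    spans = []
    current_type, current_chars = None, []
    for ch, label in zip(chars, labels):
        if label.startswith("B-"):
            # A new span begins; close any span that is still open.
            if current_type:
                spans.append((current_type, "".join(current_chars)))
            current_type, current_chars = label[2:], [ch]
        elif label.startswith("I-") and current_type == label[2:]:
            current_chars.append(ch)
        else:
            # "O", or an I- label that does not continue the open span:
            # close the open span and drop this character.
            if current_type:
                spans.append((current_type, "".join(current_chars)))
            current_type, current_chars = None, []
    if current_type:
        spans.append((current_type, "".join(current_chars)))
    return spans


chars = list("Fire is bad.")
labels = (["B-Entity"] + ["I-Entity"] * 3 + ["O"]
          + ["B-Trigger", "I-Trigger", "O"]
          + ["B-Desc", "I-Desc", "I-Desc", "O"])
spans = decode_bio(chars, labels)
```

Dropping a stray I- label is one possible policy; a stricter decoder could instead treat it as the start of a new span.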
B. Training and use of models
The invention uses an approach based on a pre-trained language model with fine-tuning; specifically, the BERT Chinese pre-trained model chinese_L-12_H-768_A-12 is adopted as the pre-trained model. Character-level embeddings trained on large-scale text are fed into a BiLSTM model to capture more inter-character dependencies; paragraphs are feature-encoded through BERT and the BiLSTM, and labels are then predicted by CRF (conditional random field) decoding to identify the boundaries of entities and description bodies. The problem of pairing entities with entity description information is handled by a method that combines rules with classification.
C. Combination of entity descriptions
After the sequence labeling model is applied, a sentence may contain several entity parts and description parts. The algorithm combines them according to a pairing principle; the matching and combination rules are shown in Table 2.
Table 2 Entity description tuple extraction rules
[Table 2 is provided as an image in the original publication: Figure BDA0002821117960000071]
D. Evaluation of entity description extraction
A total of 10813 data items were labeled in the sequence labeling format by a combination of automatic and manual labeling and were divided into a training set, a validation set, and a test set in the ratio 8:1:1. The evaluation metrics are Precision, Recall, and the F1 value, defined as follows:
$$\mathrm{Precision} = \frac{T_P}{T_P + F_P}$$
$$\mathrm{Recall} = \frac{T_P}{T_P + F_N}$$
$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
where $T_P$ is the number of samples the model identified correctly, $F_P$ is the number of samples in which the model incorrectly identified a non-relevant entity as a relevant entity, and $F_N$ is the number of samples containing a relevant entity that the model failed to identify.
Finally, the token-level (character-label-level) test results of the model on the 1172 test samples are shown in the following table:
TABLE 3 Model test results
Item                                 Precision   Recall    F1
Entity (Entity)                      97.62%      96.47%    97.04
Description tag word (Trigger)       95.51%      97.70%    96.59
Entity concept description (Desc)    88.42%      94.38%    91.30
Token                                93.66%      96.17%    94.90
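As a quick consistency check, the F1 column can be recomputed from the reported precision and recall as their harmonic mean. The snippet below is a small worked example; the function and variable names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale, here percent)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs in percent, copied from Table 3.
rows = {
    "Entity": (97.62, 96.47),
    "Trigger": (95.51, 97.70),
    "Desc": (88.42, 94.38),
    "Token": (93.66, 96.17),
}
f1_by_item = {name: round(f1_score(p, r), 2) for name, (p, r) in rows.items()}
for name, value in f1_by_item.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, the recomputed values agree with the F1 column of the table.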
In addition, extraction over 49,079,726 open texts yielded 8,904,569 high-quality entity description triples. Retaining the description tag word (Trigger) gives <entity body, descriptor, entity description> triples; omitting it gives <entity body, entity description> 2-tuples.
3. Construction of entity association conduction library
The idea behind constructing the entity association conduction library is as follows. For each entity description in the entity description knowledge base, the associated entities in the description and the association description between each of them and the entity body are identified. Using a set of association descriptions, an association strength calculation method is designed to quantify the association between the entity body and the associated entity; the result serves as the conduction association strength from the entity body to the associated entity, and the association description serves as the conduction evidence. Finally, entity normalization and end-to-end processing are carried out to form a large-scale entity association conduction library. The construction process is shown in fig. 5. The algorithm comprises the following steps:
1) Associated entity description acquisition. This step obtains the entity description sentences from the entity description triples extracted in the entity description knowledge base construction part.
2) Associated entity recognition. This step performs entity recognition on the entity description sentences by means of an external domain entity library to obtain the associated entities. The external domain entity library is composed of names of goods and other tradable items, and includes:
A. Sets of bulk commodity futures names traded on the bulk commodity futures exchange, the Shanghai futures exchange, the Zhengzhou futures exchange, and the like.
B. Name sets of tradable companies, stocks, funds, industries, concept stocks, and the like in the major sectors of the financial field.
C. Name sets of the products, companies, industries, people, and the like at each node of the industry chain.
D. The commodity name set in the commodity name list of the National Bureau of Statistics.
The entity of each extracted entity description triple, the entity description sentence, and the identified associated entity are then combined to form an <entity body, association description, associated entity> triple.
3) Conduction association quantization. This step calculates the strength of the association description by means of an emotion intensity word list, a degree adverb list, and a quantization rule base; the result serves as the conduction association strength between the entity and the associated entity. Wherein:
A. The emotion intensity word list contains emotion words with explicit intensity scores in the format (emotion word, intensity), such as (important, 1.0), (main, 1.0), (general, 0.5), and (small, 0.1). These words effectively represent the intensity expressed by a description sentence.
B. The degree adverb list is a set of adverbs with a clear degree-modifying meaning, together with their strength information, in the format (degree adverb, strength), such as (extraordinary, 2.0), (extreme, 2.0), and (most, 2.0). These words effectively represent the key points and emphasis in a description.
C. The quantization rule base is a set of rules for weighting the emotion intensity words, the degree adverbs, and the length of the description sentence. The rules take three factors into account (emotion, degree, and description length) and set correlation coefficients for the calculation. The calculation is as follows:
a) Let N be the total number of words in the sentence, weight(X) the strength score of X, count(X) the number of occurrences of X, and Score(entity, associated entity) the total score;
b) with α and β as the correlation coefficients,
$$\mathrm{Score}(\text{entity}, \text{associated entity}) = \frac{\alpha \times \mathrm{count}(\text{emotion word}) \times \mathrm{weight}(\text{emotion word}) + \beta \times \mathrm{count}(\text{degree adverb}) \times \mathrm{weight}(\text{degree adverb})}{N}$$
4) Quantitative statistics updating. This step performs operations 1) to 3) on all entity description triples in the entity description knowledge base; for entities and associated entities with the same content, the score is the accumulation of the strength scores of all identical triples.
5) Entity association conduction library formation. The scores updated by quantitative statistics are used to form the set of <entity body, association description, associated entity, association strength> quadruples, which completes the construction of the entity association conduction library.
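Steps 3) to 5) above can be sketched as follows. This is one possible reading of the scoring rule, under stated assumptions: the weights of all matched emotion words and degree adverbs in a description are summed (which equals count times weight when each word is considered separately), the coefficients alpha and beta default to 1.0 because the text does not give their values, descriptions are tokenized by whitespace, and the function names are illustrative. The word lists reuse the example entries given in the text.

```python
from collections import defaultdict

# Example entries taken from the word lists described in the text.
EMOTION_WORDS = {"important": 1.0, "main": 1.0, "general": 0.5, "small": 0.1}
DEGREE_ADVERBS = {"extraordinary": 2.0, "extreme": 2.0, "most": 2.0}


def association_score(tokens, alpha=1.0, beta=1.0):
    """Length-normalized weighted sum of emotion-word and degree-adverb strengths."""
    n = len(tokens)
    if n == 0:
        return 0.0
    emotion = sum(EMOTION_WORDS.get(t, 0.0) for t in tokens)
    degree = sum(DEGREE_ADVERBS.get(t, 0.0) for t in tokens)
    return (alpha * emotion + beta * degree) / n


def build_conduction_library(triples):
    """Accumulate scores over identical (entity, associated entity) pairs into quadruples."""
    totals = defaultdict(float)
    evidence = defaultdict(list)
    for entity, description, associated in triples:
        key = (entity, associated)
        totals[key] += association_score(description.lower().split())
        evidence[key].append(description)
    return [(e, evidence[(e, a)], a, totals[(e, a)]) for (e, a) in totals]


triples = [
    ("Chile", "Chile is the most important copper supplier", "copper"),
    ("Chile", "Chile is a main copper producer", "copper"),
]
library = build_conduction_library(triples)
```

Here the first description scores (1.0 + 2.0) / 7 and the second 1.0 / 6; their sum becomes the association strength of the single (Chile, copper) quadruple, with both sentences kept as evidence.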
Event reasoning evidence quantitative information acquisition
Event reasoning evidence quantitative information acquisition consists of two components, entity linking and evidence quantitative information acquisition, as shown in fig. 3. The idea is that, for an input event or event description list, or an event pair or event description pair list, a search is carried out in the constructed entity description reasoning knowledge base by means of entity linking, and the reasoning evidence and conduction strength between events are returned. Wherein,
The event or event description refers to an expression of an objectively occurring event, usually a linguistic unit with a predicate-object or predicate structure.
The reasoning evidence refers to the supporting description of a conduction edge between events, in the form of the statement and the source document that explicitly record the conduction relation.
The conduction strength refers to the weight of the conduction edge between events, namely the sum of the conduction association strengths, taken from the entity association conduction library, between every entity linked in one event and every entity linked in the other. For example, given event A and event B, with entities e1 and e2 linked in event A and entities e3 and e4 linked in event B, the conduction strength between events A and B is the sum of the association strengths of e1 with e3, e1 with e4, e2 with e3, and e2 with e4.
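The e1 to e4 example can be written out as a short sketch. It assumes the entity association conduction library is available as a dictionary keyed by (entity, associated entity), and that a pair with no recorded association contributes 0; both are illustrative assumptions, and `conduction_strength` is a hypothetical name.

```python
from itertools import product


def conduction_strength(head_entities, tail_entities, strength):
    """Sum the association strengths over all head/tail entity pairs.

    strength: dict mapping (entity, associated_entity) -> association strength.
    Assumption: pairs missing from the library count as 0.
    """
    return sum(
        strength.get((head, tail), 0.0)
        for head, tail in product(head_entities, tail_entities)
    )


# Hypothetical association strengths for the four entity pairs.
strength = {
    ("e1", "e3"): 0.5,
    ("e1", "e4"): 0.25,
    ("e2", "e3"): 0.125,
    ("e2", "e4"): 0.125,
}
total = conduction_strength(["e1", "e2"], ["e3", "e4"], strength)
```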
1. Single event description reasoning evidence quantitative information acquisition
Single event description reasoning evidence quantitative information acquisition is shown in fig. 6. The idea is that, for an input single event description or event description list, entity recognition and entity linking are performed using the entities in the constructed entity description reasoning knowledge base, so that the event description is mapped to and associated with the knowledge base. Based on the association result and the position of the current entity in the knowledge base, a walking method is used to traverse and expand according to the set walk depth, obtaining the associated entities together with the conduction strength and the reasoning evidence of the conduction between entities. The algorithm comprises the following steps:
1) Event description acquisition. The event description input by the user is acquired and its type is determined; if it is a list, the list is traversed.
2) Entity linking. Entity recognition and entity linking are performed using the constructed entity description reasoning knowledge base, and the event description is mapped to and associated with the knowledge base.
3) Walk depth and breadth configuration. After the event description has been linked to the entities in the entity description reasoning knowledge base, the depth and breadth of the walk are set dynamically according to business needs, so as to limit the set of associated entity names to be inferred.
4) Reasoning knowledge base walk. According to the configured walk depth and breadth, a walk is performed in the constructed entity description reasoning knowledge base, and the associated conduction paths and conduction weights encountered are recorded.
5) Reasoning evidence quantitative information acquisition. After the walk, the recorded associated conduction paths and conduction weights are assembled, and the associated entity information at the different path depths and path breadths is returned. The description sentence on which a conduction edge is located serves as the evidence information (the "reasoning evidence" in fig. 6), and the weight information on the edge serves as the quantitative data (the "conduction strength" in fig. 6).
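Steps 3) to 5) can be illustrated with a bounded walk over a small graph. This is a sketch under stated assumptions: the knowledge base is represented as an adjacency dictionary whose edges carry a weight and an evidence sentence, "breadth" is interpreted as keeping only the strongest edges per entity, and entities already on the current path are not revisited. None of these choices is specified by the source; the function `walk` and the sample graph are hypothetical.

```python
def walk(graph, linked_entities, max_depth=2, max_breadth=2):
    """Expand from the linked entities, level by level.

    graph: dict mapping entity -> list of (neighbor, weight, evidence_sentence).
    Returns one record per traversed conduction edge.
    """
    results = []
    frontier = [(entity, (entity,)) for entity in linked_entities]
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for entity, path in frontier:
            # Drop edges that would revisit an entity on this path,
            # then keep the strongest max_breadth edges.
            edges = [e for e in graph.get(entity, []) if e[0] not in path]
            edges.sort(key=lambda e: e[1], reverse=True)
            for neighbor, weight, sentence in edges[:max_breadth]:
                new_path = path + (neighbor,)
                results.append({
                    "depth": depth,
                    "path": new_path,
                    "strength": weight,    # conduction strength of this edge
                    "evidence": sentence,  # reasoning evidence for this edge
                })
                next_frontier.append((neighbor, new_path))
        frontier = next_frontier
    return results


# Hypothetical knowledge base fragment.
graph = {
    "oil": [
        ("airline", 0.8, "Rising oil prices increase airline costs."),
        ("plastics", 0.5, "Oil is a feedstock for plastics."),
        ("shipping", 0.3, "Oil prices affect shipping rates."),
    ],
    "airline": [("tourism", 0.6, "Airline fares influence tourism demand.")],
}
hits = walk(graph, ["oil"], max_depth=2, max_breadth=2)
```

With a breadth of 2, the weakest edge from "oil" is pruned, and the second level is reached only through "airline".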
2. Dual event description pair reasoning evidence quantitative information acquisition
Dual event description pair reasoning evidence quantitative information acquisition is shown in fig. 7. The idea is that, for an input single event description pair or event description pair list, entity recognition and entity linking are performed using the entities in the constructed entity description reasoning knowledge base, so that the event descriptions are mapped to and associated with the knowledge base. An entity subgraph is then constructed from the head and tail entities, subgraph multi-hop matching is performed in the constructed entity description reasoning knowledge base, the matching result is analyzed, and finally the quantified evidence data are output. The algorithm comprises the following steps:
1) Event description acquisition. The event description input by the user is acquired and its type is determined; if it is a list, the list is traversed.
2) Entity linking. Entity recognition and entity linking are performed using the entities in the constructed entity description reasoning knowledge base, and the event descriptions are mapped to and associated with the knowledge base, forming a head event entity link set and a tail event entity link set respectively.
3) Subgraph multi-hop matching. The entities of the head event entity link set and the tail event entity link set formed during entity linking are combined pairwise to form <head event entity, tail event entity> binary subgraphs. A maximum interval number between matches is set, where a single hop counts as level one, two hops as level two, and so on. The binary subgraphs are matched in the entity description reasoning knowledge base to obtain the associated conduction subgraphs that are hit. The maximum interval number defaults to 1, at which the conduction capability is strongest.
4) Associated subgraph analysis. The associated conduction subgraphs found through subgraph multi-hop matching are analyzed using breadth-first and depth-first traversal respectively, and the associated information data on each conduction path are output.
5) Reasoning evidence quantitative information acquisition. The associated conduction paths and conduction weights formed during associated subgraph analysis are assembled. The description sentence on which a conduction edge is located is returned as the evidence information (the "reasoning evidence" in fig. 7), and the weight information on the edge is returned as the quantitative data (the "conduction strength" in fig. 7).
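The pair matching in steps 3) to 5) can be sketched as a bounded path search between each head entity and each tail entity. The sketch assumes the same adjacency-dictionary representation as an illustration device, interprets the maximum interval number as a maximum hop count (`max_hops`, default 1), and uses a depth-first enumeration only; the breadth-first variant mentioned in step 4) is omitted. The function `match_pairs` and the sample graph are hypothetical.

```python
from itertools import product


def match_pairs(graph, head_entities, tail_entities, max_hops=1):
    """Find conduction paths from each head entity to each tail entity.

    graph: dict mapping entity -> list of (neighbor, weight, evidence_sentence).
    Returns one record per path of at most max_hops edges.
    """
    hits = []
    for head, tail in product(head_entities, tail_entities):
        stack = [(head, (head,), [])]  # (current node, visited path, edges so far)
        while stack:
            node, path, edges = stack.pop()
            for neighbor, weight, sentence in graph.get(node, []):
                if neighbor in path:
                    continue  # do not revisit entities on this path
                new_edges = edges + [(node, neighbor, weight, sentence)]
                if neighbor == tail:
                    hits.append({
                        "pair": (head, tail),
                        "hops": len(new_edges),
                        "edges": new_edges,  # each edge carries strength and evidence
                    })
                elif len(new_edges) < max_hops:
                    stack.append((neighbor, path + (neighbor,), new_edges))
    return hits


# Hypothetical knowledge base fragment.
graph = {
    "e1": [("e3", 0.5, "e1 supplies e3."), ("e2", 0.25, "e1 owns e2.")],
    "e2": [("e3", 0.125, "e2 sells to e3.")],
}
direct = match_pairs(graph, ["e1"], ["e3"])
two_hop = match_pairs(graph, ["e1"], ["e3"], max_hops=2)
```

With the default of one hop only the direct edge e1 to e3 is returned; raising `max_hops` to 2 also returns the indirect path through e2.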
Based on the same inventive concept, another embodiment of the present invention provides an entity description reasoning knowledge base constructing and reasoning evidence quantifying information obtaining apparatus using the above method, comprising:
an entity description reasoning knowledge base building module, used for building an entity description reasoning knowledge base from large-scale unstructured open text, the entity description reasoning knowledge base comprising an entity description knowledge base and an entity association conduction library;
and a reasoning evidence quantitative information acquisition module, used for searching the entity description reasoning knowledge base through entity linking for an input event or event description list, or event pair or event description pair list, and returning the reasoning evidence and conduction strength between events.
Based on the same inventive concept, another embodiment of the present invention provides an electronic device (computer, server, smartphone, etc.) comprising a memory storing a computer program configured to be executed by the processor, and a processor, the computer program comprising instructions for performing the steps of the inventive method.
Based on the same inventive concept, another embodiment of the present invention provides a computer-readable storage medium (e.g., ROM/RAM, magnetic disk, optical disk) storing a computer program, which when executed by a computer, performs the steps of the inventive method.
The foregoing disclosure of the specific embodiments of the present invention and the accompanying drawings is directed to an understanding of the present invention and its implementation, and it will be appreciated by those skilled in the art that various alternatives, modifications, and variations may be made without departing from the spirit and scope of the invention. The present invention should not be limited to the disclosure of the embodiments and drawings in the specification, and the scope of the present invention is defined by the scope of the claims.

Claims (10)

1. An entity description reasoning knowledge base construction and reasoning evidence quantitative information acquisition method, characterized by comprising the following steps:
establishing an entity description reasoning knowledge base from large-scale unstructured open text, wherein the entity description reasoning knowledge base comprises an entity description knowledge base and an entity association conduction library;
and, for an input event or event description list, or event pair or event description pair list, searching the entity description reasoning knowledge base through entity linking, and returning the reasoning evidence and conduction strength between events.
2. The method of claim 1, wherein the entity description knowledge base is created by:
preprocessing input texts to form a paragraph set and a sentence set;
performing pattern matching on the paragraph set and the sentence set by using an entity description tag word list to form an entity description candidate paragraph set and candidate sentence set;
training, by a sequence labeling method, a recognition model for entities and entity concept descriptions on each sentence in the candidate paragraph set and candidate sentence set, so as to learn the features of entities and their descriptions at the word level;
and obtaining <entity body, entity description> pairs and <entity body, descriptor, entity description> triples by using the trained recognition model to form the entity description knowledge base.
3. The method of claim 2, wherein the sequence labeling outputs a label for each character of a given text string; the sequence labeling uses the BIO labeling scheme, in which the beginning of an entity is labeled B-X, the inside of an entity is labeled I-X, and irrelevant characters are labeled O, and the labels to be predicted are divided into 7 types: B-Entity, I-Entity, B-Trigger, I-Trigger, B-Desc, I-Desc and O.
4. The method of claim 1, wherein the entity association conduction library is created by:
acquiring entity description in an entity description knowledge base;
identifying an associated entity in the entity description and an association description between the associated entity and the entity body to form an <entity body, association description, associated entity> triple;
quantifying the association description between the entity body and the associated entity, the result being used as the strength with which the association is conducted from the entity body to the associated entity;
and forming a set of <entity body, association description, associated entity, association strength> quadruples, namely the entity association conduction library.
5. The method of claim 4, wherein the quantifying of the description of the association between the entity body and the associated entity is performed by means of an emotion intensity vocabulary, a degree vocabulary, and a quantification rule base.
6. The method according to claim 1, characterized in that the following steps are adopted to obtain single event description reasoning evidence quantitative information:
for an input single event description or event description list, performing entity recognition and entity linking using the entities in the constructed entity description reasoning knowledge base, so that the event description is mapped to and associated with the entity description reasoning knowledge base;
based on the association result and the position of the current entity in the entity description reasoning knowledge base, traversing and expanding according to the set walk depth by a walking method to obtain the associated entities together with the conduction strength and the reasoning evidence of the conduction between entities.
7. The method of claim 1, wherein the following steps are taken to obtain dual event description pair reasoning evidence quantitative information:
for an input single event description pair or event description pair list, performing entity recognition and entity linking using the entities in the constructed entity description reasoning knowledge base, so that the event descriptions are mapped to and associated with the entity description reasoning knowledge base, forming a head event entity link set and a tail event entity link set;
combining the entities of the head event entity link set and the tail event entity link set pairwise to form <head event entity, tail event entity> binary subgraphs, and performing subgraph multi-hop matching of the binary subgraphs in the entity description reasoning knowledge base to obtain the associated conduction subgraphs that are hit;
analyzing the associated conduction subgraphs using breadth-first and depth-first traversal respectively;
and assembling the associated conduction paths and conduction weights formed during the analysis of the associated conduction subgraphs, returning the description sentence on which a conduction edge is located as the reasoning evidence and the weight information on the edge as the conduction strength.
8. An entity description reasoning knowledge base construction and reasoning evidence quantitative information acquisition device adopting the method of any one of claims 1 to 7, characterized by comprising:
an entity description reasoning knowledge base building module, used for building an entity description reasoning knowledge base from large-scale unstructured open text, the entity description reasoning knowledge base comprising an entity description knowledge base and an entity association conduction library;
and a reasoning evidence quantitative information acquisition module, used for searching the entity description reasoning knowledge base through entity linking for an input event or event description list, or event pair or event description pair list, and returning the reasoning evidence and conduction strength between events.
9. An electronic apparatus, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the method of any of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a computer, implements the method of any one of claims 1 to 7.
CN202011435544.2A 2020-12-07 2020-12-07 Entity description reasoning knowledge base construction and reasoning evidence quantitative information acquisition method and device Pending CN112667819A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011435544.2A CN112667819A (en) 2020-12-07 2020-12-07 Entity description reasoning knowledge base construction and reasoning evidence quantitative information acquisition method and device

Publications (1)

Publication Number Publication Date
CN112667819A true CN112667819A (en) 2021-04-16

Family

ID=75401779

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011435544.2A Pending CN112667819A (en) 2020-12-07 2020-12-07 Entity description reasoning knowledge base construction and reasoning evidence quantitative information acquisition method and device

Country Status (1)

Country Link
CN (1) CN112667819A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120131055A1 (en) * 2009-04-09 2012-05-24 Sigram Schindler Beteiligungsgesellschaft Mbh Fstp expert system
CN109255034A (en) * 2018-08-08 2019-01-22 数据地平线(广州)科技有限公司 A kind of domain knowledge map construction method based on industrial chain
CN109635171A (en) * 2018-12-13 2019-04-16 成都索贝数码科技股份有限公司 A kind of fusion reasoning system and method for news program intelligent label
CN110275959A (en) * 2019-05-22 2019-09-24 广东工业大学 A kind of Fast Learning method towards large-scale knowledge base
CN110674840A (en) * 2019-08-22 2020-01-10 中国司法大数据研究院有限公司 Multi-party evidence association model construction method based on Bayesian network and evidence chain extraction method and device
CN110968700A (en) * 2019-11-01 2020-04-07 数地科技(北京)有限公司 Domain event map construction method and device fusing multi-class affairs and entity knowledge
CN111767368A (en) * 2020-05-27 2020-10-13 重庆邮电大学 Question-answer knowledge graph construction method based on entity link and storage medium
CN111859966A (en) * 2020-06-12 2020-10-30 中国科学院信息工程研究所 Method for generating labeling corpus facing network threat intelligence and electronic device
CN111737400A (en) * 2020-06-15 2020-10-02 上海理想信息产业(集团)有限公司 Knowledge reasoning-based big data service tag expansion method and system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DONGYEOP KANG ET AL.: "Detecting and Explaining Causes From Text For a Time Series Event", 《HTTP://ARXIV.ORG/ABS/1707.08852》, 27 July 2017 (2017-07-27), pages 1 - 10 *
朱福勇;刘雅迪;高帆;王凯;: "基于图谱融合的人工智能司法数据库构建研究", 扬州大学学报(人文社会科学版), no. 06, 30 June 2019 (2019-06-30), pages 90 - 97 *
王元卓;贾岩涛;刘大伟;靳小龙;程学旗;: "基于开放网络知识的信息检索与数据挖掘", 计算机研究与发展, no. 02, 15 February 2015 (2015-02-15), pages 198 - 216 *
谭晓;张志强;: "知识图谱研究进展及其前沿主题分析", 图书与情报, no. 02, 20 April 2020 (2020-04-20), pages 56 - 69 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255353A (en) * 2021-05-31 2021-08-13 中国科学院计算技术研究所厦门数据智能研究院 Entity standardization method
CN116049326A (en) * 2022-12-22 2023-05-02 广州奥咨达医疗器械技术股份有限公司 Medical instrument knowledge base construction method, electronic equipment and storage medium
CN116049326B (en) * 2022-12-22 2024-03-08 广州奥咨达医疗器械技术股份有限公司 Medical instrument knowledge base construction method, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination