CN111159328A - Information knowledge fusion system and method - Google Patents

Publication number
CN111159328A
Authority
CN
China
Prior art keywords
knowledge
entity
new
existing
attribute
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911142507.XA
Other languages
Chinese (zh)
Inventor
李德启
谢彬
吴剑涛
姜鑫
牛硕硕
刘太林
邱定
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 32 Research Institute
Original Assignee
CETC 32 Research Institute
Application filed by CETC 32 Research Institute
Priority to CN201911142507.XA
Publication of CN111159328A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367: Ontology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31: Indexing; Data structures therefor; Storage structures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an information knowledge fusion system and method, comprising: a text information and knowledge base fusion module, which integrates the knowledge extracted from text into the existing knowledge base; and a knowledge base and knowledge base fusion module, which integrates the knowledge in a new knowledge base into the existing knowledge base. The proposed method for calculating information-source credibility markedly improves the efficiency of that calculation, and the new multi-source knowledge base fusion method based on a dynamic partition index markedly improves the computational efficiency of knowledge fusion.

Description

Information knowledge fusion system and method
Technical Field
The invention belongs to the technical field of text analysis, and particularly relates to an intelligence knowledge fusion system and method.
Background
Knowledge fusion is an important step in building a knowledge graph. The data used to construct a knowledge graph is often multi-source and heterogeneous, so knowledge fusion faces many open problems. This scheme addresses the following two main problems:
1. the credibility calculation for network intelligence and its computational efficiency;
2. the efficiency of multi-source knowledge base fusion.
For the first problem, this scheme computes the credibility of network intelligence text using an RNN over a variable-length time series: posts are grouped by a chosen time interval, and each group is then used as one unit of the time series for training.
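The grouping of posts into time-series units can be sketched as follows. This is only an illustration under stated assumptions: the function name `group_posts_by_interval`, the `(timestamp, text)` tuple layout, and the choice to skip empty intervals (which is what makes the series length vary here) are hypothetical and are not taken from the patent text.

```python
from collections import defaultdict

def group_posts_by_interval(posts, interval_seconds):
    """Bucket (timestamp, text) posts into consecutive time intervals.

    Returns the non-empty buckets in time order. Skipping empty intervals
    is an assumption of this sketch; it yields a variable-length sequence.
    """
    if not posts:
        return []
    start = min(ts for ts, _ in posts)
    buckets = defaultdict(list)
    for ts, text in sorted(posts):
        # Integer division assigns each post to its interval index.
        buckets[(ts - start) // interval_seconds].append(text)
    return [buckets[i] for i in sorted(buckets)]

# Hypothetical posts: timestamps in seconds, one-minute intervals.
posts = [(0, "a"), (30, "b"), (70, "c"), (400, "d")]
series = group_posts_by_interval(posts, 60)
```

Each element of `series` would then be encoded (for example as a text feature vector) and fed to the RNN as one time step; that encoding is outside this sketch.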
For the second problem, this scheme proposes a partition index technique: during knowledge base alignment an index is built by pruning away the entity pairs that cannot be similar, so that similar entity pairs are assigned, as far as possible, to the same block or blocks and become candidate pairs. The final alignment is performed only on the candidate pairs, which improves matching efficiency. To account for the influence of both attributes and relations on the index partitions, this scheme partitions with a dynamic indexing method.
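The general idea of a partition (blocking) index can be sketched as below. This is a simplified, static illustration and not the dynamic index of this scheme: the key function `blocking_keys` (lowercase name tokens) and the entity dictionaries with `id` and `name` fields are assumptions made for the example.

```python
from collections import defaultdict
from itertools import product

def blocking_keys(entity):
    """Hypothetical key function: lowercase tokens of the entity name."""
    return set(entity["name"].lower().split())

def candidate_pairs(kb_new, kb_existing):
    """Return (new_id, existing_id) pairs that share at least one block key."""
    # Each block holds the ids from the new KB and from the existing KB.
    blocks = defaultdict(lambda: ([], []))
    for e in kb_new:
        for k in blocking_keys(e):
            blocks[k][0].append(e["id"])
    for e in kb_existing:
        for k in blocking_keys(e):
            blocks[k][1].append(e["id"])
    pairs = set()
    for left, right in blocks.values():
        # Only entities inside the same block are compared later.
        pairs.update(product(left, right))
    return pairs

kb_new = [{"id": "n1", "name": "Acme Corp"}, {"id": "n2", "name": "Globex"}]
kb_existing = [{"id": "e1", "name": "ACME Corporation"}, {"id": "e2", "name": "Initech"}]
pairs = candidate_pairs(kb_new, kb_existing)
```

In this toy input, exhaustive matching would compare 2 x 2 = 4 pairs, while the index leaves a single candidate pair for the alignment step.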
A review of the patent literature shows that the related prior art has the following shortcomings:
1. It does not consider how to fuse knowledge extracted from text; it considers only fusion between two knowledge bases and ignores the problem of calculating information-source credibility. This scheme provides a method and process for extracting knowledge from text and fusing it with a knowledge base, together with an efficient credibility calculation algorithm, thereby enabling incremental updating of the knowledge base, which is of fundamental significance for the incremental construction of a knowledge graph.
2. Multi-source knowledge base fusion uses clustering methods, which are very time-consuming when the knowledge base is large (on the order of tens of millions of entries); the time cost of the KNN clustering algorithm is proportional to the square of the knowledge base size, which makes it very inefficient in practice. The dynamic partition index technique can markedly improve the efficiency of entity matching during fusion.
Related search result 1:
Application (patent) No. CN201910025114, titled: Knowledge fusion method, device, computer equipment and storage medium
The application relates to the technical field of knowledge graphs, in particular to a knowledge fusion method, a knowledge fusion device, computer equipment and a storage medium, wherein the method comprises the following steps: acquiring a plurality of knowledge data from a knowledge data source; extracting the entity data in any knowledge data and vectorizing it to generate a multi-dimensional word vector; reducing the dimension of the multi-dimensional word vector to obtain a two-dimensional word vector, transposing the two-dimensional word vector and multiplying it with the original two-dimensional word vector to obtain an entity data matrix, whose elements are the vectorized entity data; acquiring an attribute value of the real attribute data; and inputting the elements of the entity data matrix and the attribute values of the real attribute data into a credibility recognition model, obtaining the credibility of the knowledge data from the model output, and fusing after comparing the credibility with a preset credibility threshold. The method and the device realize effective fusion of a plurality of attributes of the same entity.
Comparison of technical points:
1. Credibility calculation method: that patent does mention credibility calculation. Its credibility method compares against the entities in the knowledge base, whereas the method proposed here calculates the credibility of an information source directly from the context comments on that source. The approach proposed here is more objective: knowledge entered into the knowledge base earlier cannot be assumed to be more true or accurate, so the knowledge base is not suitable as a reference standard.
2. Efficiency of entity matching: that method uses K-means clustering, which has high computational complexity, and does not consider efficiency in practical applications. When two relatively large knowledge bases are fused, entity matching can be very time-consuming. The dynamic partition indexing method proposed here would markedly improve efficiency.
Related search result 2:
Application (patent) No. CN201810443980, titled: Knowledge fusion method based on multi-source data
The invention provides a knowledge fusion method based on multi-source data. When fusing entity data from multiple sources, the attributes of each data source are first normalized, including synonymous attribute mapping and unified conversion of the numerical units of attribute values, so that attribute differences have less influence on the subsequent entity comparison; the entities are then aggregated into blocks based on entity names and entity attributes, so that only entities from different sources within the same block are treated as candidate matching pairs, which avoids comparing all entities across the two data sources and reduces computational complexity; finally, for the candidate entity pairs, the similarity between entities is calculated with an entity alignment algorithm, entity pairs in different sources that describe the same real-world object are matched, equivalence links are established for the same entity across data sources, and entity attributes are merged, while entities unique to one data source can be added directly to the knowledge base.
Comparison of technical points:
1. Fusion of intelligence text and knowledge base: that method does not address credibility calculation for information sources, and cannot handle the processing and selection of conflicting entity attribute names and entity attribute values.
Related search result 3:
Application (patent) No. CN201710117723, titled: Knowledge fusion method in commodity field
The invention provides a knowledge fusion method for the commodity field, which comprises: obtaining the commodity data to be processed; mapping each attribute into a multi-dimensional word vector using the Word2Vector method; calculating the similarity between every two attributes from the word-vector distance of their attribute values; and fusing any two attributes whose similarity exceeds a preset threshold into the same attribute type, yielding a commodity data set with fused attributes. The method is said to give a better knowledge fusion effect: it collects a large amount of commodity data for training, greatly reduces the occurrence of unknown words when fusing commodity attributes, and continues to work even if individual attribute values are missing, so it is more practical than traditional methods that rely on an external knowledge base; it is suitable for commodity knowledge fusion not only in the e-commerce field but also in other fields, and thus provides better knowledge-based services.
Comparison of technical points:
1. Fusion of intelligence text and knowledge base: that method does not address credibility calculation for information sources, and cannot handle the processing and selection of conflicting entity attribute names and entity attribute values.
Disclosure of Invention
In view of the defects in the prior art, the object of the invention is to provide an intelligence knowledge fusion system and method.
The invention provides an intelligence knowledge fusion system, which comprises:
a text information and knowledge base fusion module: integrating the knowledge extracted from the text into the existing knowledge base;
a knowledge base and knowledge base fusion module: integrating the knowledge in the new knowledge base into the existing knowledge base.
Preferably, the text information and knowledge base fusion module includes:
a referent clustering and disambiguation module: clustering and normalizing the different referring expressions of the same entity in the text;
an entity association knowledge base module: linking the entity names in the text to the corresponding entities in the knowledge base;
an RNN-based network intelligence text credibility analysis module: analyzing and calculating the credibility of intelligence text from the network to obtain a credibility analysis result;
a first knowledge warehousing and updating module: according to the output credibility analysis result, if the credibility is higher than a preset value, the knowledge is considered worth entering into the knowledge base and the new knowledge is stored; otherwise, the new knowledge is considered to conflict with the existing knowledge, and manual intervention is needed to choose whether to keep the knowledge in the existing knowledge base or to update it with the new knowledge.
Preferably, analyzing and calculating the credibility of intelligence text from the network means: collecting the related network media comments on the knowledge extracted from the intelligence text, converting the comments into input units of a recurrent neural network (RNN), modeling them with the neural network, and inferring the credibility.
Preferably, the first knowledge warehousing and updating module saves the new entity knowledge or updates the existing entity knowledge with it: if the new entity knowledge is not in the existing knowledge base, it is stored in the knowledge base and its information source is added; if the entity already exists in the knowledge base, any attribute that the existing entry lacks is added; if an attribute already exists in the existing entity, the choice between the attribute of the new entity and that of the old entity is made according to the credibility analysis result.
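The warehousing and updating rules above can be sketched as follows. This is an illustration only, under several assumptions that the text does not state: the knowledge base is a plain dictionary, each stored attribute value carries the credibility of the source it came from, and a conflicting attribute is resolved by keeping the value with the higher credibility. The function name `fuse_entity` and the default threshold of 0.5 are likewise hypothetical choices for the example.

```python
def fuse_entity(kb, entity_id, attributes, source, credibility, threshold=0.5):
    """Merge one extracted entity into kb.

    kb maps entity id -> {"sources": [...], "attrs": {name: (value, credibility)}}.
    Returns False when the credibility is not above the threshold, in which
    case nothing is stored and the caller should route the item to manual review.
    """
    if credibility <= threshold:
        return False
    # New entity: create the entry; existing entity: reuse it.
    entry = kb.setdefault(entity_id, {"sources": [], "attrs": {}})
    if source not in entry["sources"]:
        entry["sources"].append(source)
    for name, value in attributes.items():
        old = entry["attrs"].get(name)
        # Add missing attributes; on conflict keep the more credible value
        # (assumed selection rule, see the lead-in).
        if old is None or credibility > old[1]:
            entry["attrs"][name] = (value, credibility)
    return True

kb = {}
first = fuse_entity(kb, "ship-1", {"length_m": 150}, "site-a", 0.8)
second = fuse_entity(kb, "ship-1", {"length_m": 155, "flag": "X"}, "site-b", 0.6)
third = fuse_entity(kb, "ship-1", {"length_m": 160}, "site-c", 0.4)
```

After these calls the existing `length_m` value is retained because its source was more credible, the previously missing `flag` attribute is added, and the third item is left for manual review.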
Preferably, the knowledge base and knowledge base fusion module includes:
a dynamic partition index module: pruning the entity pairs in the knowledge bases that cannot be similar, building an index for knowledge base alignment, using a dynamic indexing method to assign similar entity pairs to one or more blocks so that they become candidate pairs, and finally performing alignment on the candidate pairs;
a feature matching module based on a structural similarity function: calculating the similarity scores between attributes, that is, measuring the similarity of an entity pair by the ratio of the intersection to the union of the two entities' neighbor sets, to obtain the attribute similarity score;
a cross-knowledge-base entity alignment sub-module: treating entity alignment as a classification problem that decides, from the attribute similarity score, whether an entity pair to be matched is a match; if the attribute similarity score is higher than a preset value, the pair is regarded as the same entity, the entities are aligned, and knowledge fusion is performed;
a second knowledge warehousing and updating module: saving the new entity knowledge or updating the existing entity knowledge with the new entity knowledge.
Preferably, in the second knowledge warehousing and updating module: if the new entity knowledge is not in the existing knowledge base, it is stored in the knowledge base and its information source is added; if the entity already exists in the knowledge base, any attribute that the existing entry lacks is added; if an attribute already exists in the existing entity, the choice between the attributes of the new entity and those of the old entity is made according to the attribute similarity score.
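The ratio of intersection to union described for feature matching is the Jaccard coefficient, and the threshold test can be sketched with it as below. The names `neighbor_jaccard` and `align`, the threshold of 0.5, and the assumption that neighbor identifiers are already comparable across the two knowledge bases are all hypothetical and chosen only for illustration.

```python
def neighbor_jaccard(neighbors_a, neighbors_b):
    """Ratio of the intersection to the union of two neighbor sets."""
    a, b = set(neighbors_a), set(neighbors_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def align(candidates, neighbors_new, neighbors_existing, threshold=0.5):
    """Keep the candidate pairs whose similarity score exceeds the threshold."""
    matches = []
    for new_id, old_id in candidates:
        score = neighbor_jaccard(neighbors_new[new_id], neighbors_existing[old_id])
        if score > threshold:
            # Pairs above the preset value are treated as the same entity.
            matches.append((new_id, old_id, score))
    return matches

neighbors_new = {"n1": {"paris", "france", "europe"}}
neighbors_existing = {
    "e1": {"paris", "france", "europe", "seine"},
    "e2": {"berlin"},
}
matches = align([("n1", "e1"), ("n1", "e2")], neighbors_new, neighbors_existing)
```

Here `n1` and `e1` share three of four distinct neighbors, so their score is 0.75 and they are aligned, while `n1` and `e2` share none and are rejected.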
The intelligence knowledge fusion method provided by the invention comprises the following steps:
a text information and knowledge base fusion step: integrating the knowledge extracted from the text into the existing knowledge base;
a knowledge base and knowledge base fusion step: integrating the knowledge in the new knowledge base into the existing knowledge base.
Preferably, the text information and knowledge base fusion step comprises:
a referent clustering and disambiguation step: clustering and normalizing the different referring expressions of the same entity in the text;
an entity association knowledge base step: linking the entity names in the text to the corresponding entities in the knowledge base;
an RNN-based network intelligence text credibility analysis step: analyzing and calculating the credibility of intelligence text from the network to obtain a credibility analysis result;
a first knowledge warehousing and updating step: according to the output credibility analysis result, if the credibility is higher than a preset value, the knowledge is considered worth entering into the knowledge base and the new knowledge is stored; otherwise, the new knowledge is considered to conflict with the existing knowledge, and manual intervention is needed to choose whether to keep the knowledge in the existing knowledge base or to update it with the new knowledge;
analyzing and calculating the credibility of intelligence text from the network means: collecting the related network media comments on the knowledge extracted from the intelligence text, and converting the comments into input units of a recurrent neural network (RNN) for neural network modeling and credibility inference;
the first knowledge warehousing and updating step comprises: saving the new entity knowledge or updating the existing entity knowledge with it; if the new entity knowledge is not in the existing knowledge base, it is stored in the knowledge base and its information source is added; if the entity already exists in the knowledge base, any attribute that the existing entry lacks is added; if an attribute already exists in the existing entity, the choice between the attribute of the new entity and that of the old entity is made according to the credibility analysis result.
Preferably, the knowledge base and knowledge base fusion step comprises:
a dynamic partition indexing step: pruning the entity pairs in the knowledge bases that cannot be similar, building an index for knowledge base alignment, using a dynamic indexing method to assign similar entity pairs to one or more blocks so that they become candidate pairs, and finally performing alignment on the candidate pairs;
a feature matching step based on a structural similarity function: calculating the similarity scores between attributes, that is, measuring the similarity of an entity pair by the ratio of the intersection to the union of the two entities' neighbor sets, to obtain the attribute similarity score;
a cross-knowledge-base entity alignment sub-step: treating entity alignment as a classification problem that decides, from the attribute similarity score, whether an entity pair to be matched is a match; if the attribute similarity score is higher than a preset value, the pair is regarded as the same entity, the entities are aligned, and knowledge fusion is performed;
a second knowledge warehousing and updating step: saving the new entity knowledge or updating the existing entity knowledge with the new entity knowledge.
Preferably, in the second knowledge warehousing and updating step: if the new entity knowledge is not in the existing knowledge base, it is stored in the knowledge base and its information source is added; if the entity already exists in the knowledge base, any attribute that the existing entry lacks is added; if an attribute already exists in the existing entity, the choice between the attributes of the new entity and those of the old entity is made according to the attribute similarity score.
Compared with the prior art, the invention has the following beneficial effects:
1) A scheme for fusing intelligence text with a knowledge base is provided; the prior art covers only fusion between knowledge bases and lacks direct fusion of text with a knowledge base.
2) The proposed method for calculating information-source credibility markedly improves the efficiency of that calculation.
3) The new multi-source knowledge base fusion method based on the dynamic partition index markedly improves the computational efficiency of knowledge fusion.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a schematic diagram of knowledge fusion (shaded portion) in the construction of a knowledge graph provided by the present invention.
FIG. 2 is a schematic flow diagram of the referent clustering and disambiguation sub-module provided by the present invention.
FIG. 3 is a schematic diagram of the network structure of the referent pair encoder and its data flow provided by the present invention.
FIG. 4 is a schematic diagram of the network structure of the "cluster pair" encoder and its data flow provided by the present invention.
FIG. 5 is a detailed flow diagram of knowledge base alignment provided by the present invention.
FIG. 6 is a schematic diagram of a model network structure for constructing a time series based on LSTM provided by the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention. All such changes and modifications fall within the scope of the present invention.
The invention provides an intelligence knowledge fusion system, which comprises:
a text information and knowledge base fusion module: integrating the knowledge extracted from the text into the existing knowledge base;
a knowledge base and knowledge base fusion module: integrating the knowledge in the new knowledge base into the existing knowledge base.
Specifically, the text information and knowledge base fusion module includes:
a referent clustering and disambiguation module: clustering and normalizing the different referring expressions of the same entity in the text;
an entity association knowledge base module: linking the entity names in the text to the corresponding entities in the knowledge base;
an RNN-based network intelligence text credibility analysis module: analyzing and calculating the credibility of intelligence text from the network to obtain a credibility analysis result;
a first knowledge warehousing and updating module: according to the output credibility analysis result, if the credibility is higher than a preset value, the knowledge is considered worth entering into the knowledge base and the new knowledge is stored; otherwise, the new knowledge is considered to conflict with the existing knowledge, and manual intervention is needed to choose whether to keep the knowledge in the existing knowledge base or to update it with the new knowledge.
Specifically, analyzing and calculating the credibility of intelligence text from the network means: collecting the related network media comments on the knowledge extracted from the intelligence text, converting the comments into input units of a recurrent neural network (RNN), modeling them with the neural network, and inferring the credibility.
Specifically, the first knowledge warehousing and updating module saves the new entity knowledge or updates the existing entity knowledge with it: if the new entity knowledge is not in the existing knowledge base, it is stored in the knowledge base and its information source is added; if the entity already exists in the knowledge base, any attribute that the existing entry lacks is added; if an attribute already exists in the existing entity, the choice between the attribute of the new entity and that of the old entity is made according to the credibility analysis result.
Specifically, the knowledge base and knowledge base fusion module includes:
a dynamic partition index module: pruning the entity pairs in the knowledge bases that cannot be similar, building an index for knowledge base alignment, using a dynamic indexing method to assign similar entity pairs to one or more blocks so that they become candidate pairs, and finally performing alignment on the candidate pairs;
a feature matching module based on a structural similarity function: calculating the similarity scores between attributes, that is, measuring the similarity of an entity pair by the ratio of the intersection to the union of the two entities' neighbor sets, to obtain the attribute similarity score;
a cross-knowledge-base entity alignment sub-module: treating entity alignment as a classification problem that decides, from the attribute similarity score, whether an entity pair to be matched is a match; if the attribute similarity score is higher than a preset value, the pair is regarded as the same entity, the entities are aligned, and knowledge fusion is performed;
a second knowledge warehousing and updating module: saving the new entity knowledge or updating the existing entity knowledge with the new entity knowledge.
Specifically, in the second knowledge warehousing and updating module: if the new entity knowledge is not in the existing knowledge base, it is stored in the knowledge base and its information source is added; if the entity already exists in the knowledge base, any attribute that the existing entry lacks is added; if an attribute already exists in the existing entity, the choice between the attributes of the new entity and those of the old entity is made according to the attribute similarity score.
The intelligence knowledge fusion system provided by the invention can be realized through the step flow of the intelligence knowledge fusion method provided by the invention. The person skilled in the art can understand the intelligence knowledge fusion method as a preferred example of the intelligence knowledge fusion system.
The intelligence knowledge fusion method provided by the invention comprises the following steps:
a text information and knowledge base fusion step: integrating the knowledge extracted from the text into the existing knowledge base;
a knowledge base and knowledge base fusion step: integrating the knowledge in the new knowledge base into the existing knowledge base.
Specifically, the text information and knowledge base fusion step comprises:
a referent clustering and disambiguation step: clustering and normalizing the different referring expressions of the same entity in the text;
an entity association knowledge base step: linking the entity names in the text to the corresponding entities in the knowledge base;
an RNN-based network intelligence text credibility analysis step: analyzing and calculating the credibility of intelligence text from the network to obtain a credibility analysis result;
a first knowledge warehousing and updating step: according to the output credibility analysis result, if the credibility is higher than a preset value, the knowledge is considered worth entering into the knowledge base and the new knowledge is stored; otherwise, the new knowledge is considered to conflict with the existing knowledge, and manual intervention is needed to choose whether to keep the knowledge in the existing knowledge base or to update it with the new knowledge;
analyzing and calculating the credibility of intelligence text from the network means: collecting the related network media comments on the knowledge extracted from the intelligence text, and converting the comments into input units of a recurrent neural network (RNN) for neural network modeling and credibility inference;
the first knowledge warehousing and updating step comprises: saving the new entity knowledge or updating the existing entity knowledge with it; if the new entity knowledge is not in the existing knowledge base, it is stored in the knowledge base and its information source is added; if the entity already exists in the knowledge base, any attribute that the existing entry lacks is added; if an attribute already exists in the existing entity, the choice between the attribute of the new entity and that of the old entity is made according to the credibility analysis result.
Specifically, the knowledge base and knowledge base fusion step comprises:
a dynamic partition indexing step: pruning the entity pairs in the knowledge bases that cannot be similar, building an index for knowledge base alignment, using a dynamic indexing method to assign similar entity pairs to one or more blocks so that they become candidate pairs, and finally performing alignment on the candidate pairs;
a feature matching step based on a structural similarity function: calculating the similarity scores between attributes, that is, measuring the similarity of an entity pair by the ratio of the intersection to the union of the two entities' neighbor sets, to obtain the attribute similarity score;
a cross-knowledge-base entity alignment sub-step: treating entity alignment as a classification problem that decides, from the attribute similarity score, whether an entity pair to be matched is a match; if the attribute similarity score is higher than a preset value, the pair is regarded as the same entity, the entities are aligned, and knowledge fusion is performed;
a second knowledge warehousing and updating step: saving the new entity knowledge or updating the existing entity knowledge with the new entity knowledge.
Specifically, in the second knowledge warehousing and updating step: if the new entity knowledge is not in the existing knowledge base, it is stored in the knowledge base and its information source is added; if the entity already exists in the knowledge base, any attribute that the existing entry lacks is added; if an attribute already exists in the existing entity, the choice between the attributes of the new entity and those of the old entity is made according to the attribute similarity score.
The present invention will be described more specifically below with reference to preferred examples.
Preferred example 1:
The structure of the whole system is shown in FIG. 1 (knowledge fusion in knowledge graph construction).
The system comprises the following modules:
1) A text information and knowledge base fusion module, which fuses the knowledge extracted from text into the existing knowledge base.
2) A knowledge base and knowledge base fusion module, which fuses the knowledge in a new knowledge base into the existing knowledge base.
Wherein: the text information and knowledge base fusion module comprises the following sub-modules:
1) A referent clustering and disambiguation sub-module based on deep reinforcement learning (see FIGS. 2, 3 and 4), which clusters and normalizes different referents of the same entity in the text.
2) An entity association knowledge base sub-module, which links the entity names in the text to the corresponding entities in the knowledge base.
3) An RNN-based network intelligence text credibility analysis sub-module, which analyzes and calculates the credibility of intelligence text from the network. Specifically, network media comments related to the extracted intelligence text knowledge are collected and converted into RNN input units, and neural network modeling is performed to infer the credibility. The analysis result determines whether the knowledge extracted from the network intelligence text is worth entering into the knowledge base in the next step: if the credibility is higher than 50%, the knowledge is worth entering into the knowledge base. If the knowledge conflicts with existing knowledge, for example because entity attributes are inconsistent, manual intervention is needed to choose whether to keep the knowledge in the existing knowledge base or adopt the new knowledge.
4) An entity knowledge warehousing and updating sub-module, which stores new entity knowledge or updates the existing entity knowledge with the new entity knowledge. If the new entity knowledge is not in the existing library, it is stored in the library and its information source is added; if the entity already exists in the library, the attributes that the existing entity lacks are added to the library; if an attribute already exists in the existing entity, the choice between the attribute of the new entity and the attribute of the old entity is made according to the credibility.
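The warehousing and updating rules of sub-modules 3) and 4) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Entity` class, the dictionary-based knowledge base, the `store_or_update` function and the `resolve_conflict` callback (a stand-in for manual intervention or a credibility-based choice) are all hypothetical names, and rejecting knowledge at or below the 50% threshold is an assumed reading of the text.

```python
from dataclasses import dataclass, field

CREDIBILITY_THRESHOLD = 0.5  # "higher than 50%" in the description


@dataclass
class Entity:
    name: str
    attributes: dict = field(default_factory=dict)
    sources: list = field(default_factory=list)


def store_or_update(kb, new, credibility, source, resolve_conflict):
    """Merge `new` into `kb`, a dict mapping entity name -> Entity.

    `resolve_conflict(key, old_value, new_value)` is a hypothetical hook
    standing in for manual intervention or a credibility-based choice.
    """
    if credibility <= CREDIBILITY_THRESHOLD:
        return "rejected"  # assumption: low-credibility knowledge is not stored
    existing = kb.get(new.name)
    if existing is None:
        # Entity not in the library: store it and record the information source.
        kb[new.name] = Entity(new.name, dict(new.attributes), [source])
        return "inserted"
    for key, value in new.attributes.items():
        if key not in existing.attributes:
            # Attribute missing from the existing entity: add it.
            existing.attributes[key] = value
        elif existing.attributes[key] != value:
            # Conflicting attribute: delegate the choice to the callback.
            existing.attributes[key] = resolve_conflict(
                key, existing.attributes[key], value
            )
    if source not in existing.sources:
        existing.sources.append(source)
    return "updated"
```

Passing `lambda key, old, new: old` as `resolve_conflict` keeps the existing value on every conflict; a real system would replace it with a review queue or a comparison of source credibility.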
The knowledge base and knowledge base fusion module comprises the following sub-modules:
1) A dynamic partition index sub-module. The purpose of this sub-module is to improve entity matching efficiency and reduce computational complexity in the knowledge fusion process. An index is established during knowledge base alignment (see FIG. 5): entity pairs that cannot be similar are pruned, so that similar entity pairs are assigned as far as possible to the same block or blocks and become candidate pairs, and the final alignment is performed only on the candidate pairs, which improves matching efficiency. In this scheme, a dynamic index method is used for partitioning so that the influence of both attributes and relations on the index partitions is taken into account. The method constructs index key-value pairs from the categories, the instances and the word size of prior alignments in the knowledge base, and recursively creates sub-partitions according to the index key-value pairs until the size of each partition is smaller than a specified threshold or every index key-value pair has been used.
2) A feature matching sub-module based on a structural similarity function, which assists entity alignment by calculating similarity scores between attributes. Entity alignment is performed attribute by attribute. The similarity of an entity pair is measured as the ratio of the intersection to the union of the neighbor sets of the two entities.
3) A cross-knowledge-base entity alignment sub-module, which treats entity alignment as a classification problem that decides, according to the attribute similarity score, whether the entity pair to be matched is a match. When the attribute similarity exceeds 50%, the two entities are considered to be the same entity, and they are aligned and their knowledge fused.
4) A knowledge warehousing and updating sub-module, which has the same function as the knowledge warehousing and updating sub-module in the text information and knowledge base fusion module.
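The structural similarity of sub-module 2) and the threshold decision of sub-module 3) can be illustrated with a short sketch. The ratio of intersection to union is the Jaccard coefficient; the function names and the representation of neighbors as sets of identifiers are assumptions made for illustration.

```python
def neighbor_jaccard(neighbors_a, neighbors_b):
    """Ratio of the intersection to the union of two neighbor sets."""
    a, b = set(neighbors_a), set(neighbors_b)
    union = a | b
    if not union:
        return 0.0  # assumption: two entities without neighbors score zero
    return len(a & b) / len(union)


def is_same_entity(neighbors_a, neighbors_b, threshold=0.5):
    """Treat the pair as one entity when the score exceeds the threshold."""
    return neighbor_jaccard(neighbors_a, neighbors_b) > threshold
```

For example, neighbor sets {a, b, c} and {a, b, c, d} share three of four distinct neighbors, giving a score of 0.75, which exceeds the 50% threshold mentioned in the description.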
Example 1:
(1) neural network model based on LSTM and variable-length time series
The neural network model based on the LSTM and the variable-length time sequence is used for calculating the reliability of the network intelligence text and improving the efficiency of the information source reliability calculation method.
1) Basic flow
The present solution uses an RNN-based model to classify network media events as trusted or untrusted. First, network media posts (e.g., on Twitter) are converted into a continuous variable-length time series; an RNN structure with a single-layer LSTM cell is then used for credibility classification.
2) Problem statement
Social text such as an individual microblog post is generally short and carries limited information. A statement about an event, by contrast, usually involves many related posts. The truth of an individual post is therefore not the concern here; instead, the statement about an event is judged using the aggregate information of all related posts. The main task of the scheme is thus to predict the truth of an event rather than of each post.
Define a given set of events E = {E_i}, where each event E_i = {(m_{i,j}, t_{i,j})} ideally consists of all related posts m_{i,j}, each with timestamp t_{i,j}. The goal is to determine whether each event is true or false.
3) Variable length time series
In a direct model, each post would be one input instance, and the RNN would model a time series whose length equals the number of related posts. However, a popular event may have thousands of posts, while only a single output layer at the last time step of an event is used to judge whether the event is true or false. Back-propagating the loss of only the last step through such a large number of time steps is computationally expensive and inefficient. For this reason, posts are grouped by time interval, and each group is treated as one unit of the time series.
The denser posting periods in the information propagation process should be taken into account, so the number of time intervals in the high-frequency period is used to approximate the RNN reference length. To this end, the scheme uses a purpose-built algorithm to select a suitable interval length. First, the entire timeline is divided equally into N time intervals, where N is the reference RNN sequence length. The empty intervals are then deleted to obtain the set U of non-empty intervals (i.e., each interval in U contains at least one post), and from U the set of consecutive intervals covering the longest time span is selected (Figure BDA0002281321740000111). If the number of intervals in the selected set (Figure BDA0002281321740000112) is less than N and greater than the number obtained in the previous round, the timeline is divided again with each interval half as long as in the previous round; otherwise the selected set (Figure BDA0002281321740000113) is returned as the output. Note that the number of time intervals generated by this partitioning method will be close to N; it may differ between events, but all intervals within one event have equal length.
4) Algorithm
Figure BDA0002281321740000114
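Since the algorithm itself is only available as a figure, the following is a hedged sketch of the interval-selection procedure as described in the preceding text, not a transcription of the figure. The function names `longest_run` and `choose_interval`, the use of numeric timestamps, and the requirement of at least two distinct timestamps are assumptions.

```python
def longest_run(occupied):
    """Longest run of consecutive indices in a sorted list of integers."""
    best, current = [], []
    for idx in occupied:
        if current and idx == current[-1] + 1:
            current.append(idx)
        else:
            current = [idx]
        if len(current) > len(best):
            best = list(current)
    return best


def choose_interval(timestamps, n_ref):
    """Return (interval_length, indices of the selected consecutive intervals)."""
    ts = sorted(timestamps)
    start, span = ts[0], ts[-1] - ts[0]
    if span <= 0:
        raise ValueError("need at least two distinct timestamps")
    length = span / n_ref  # first round: N equal intervals
    prev_count = 0
    while True:
        n_intervals = round(span / length)
        # Non-empty intervals; the last post is clamped into the final interval.
        occupied = sorted(
            {min(int((t - start) / length), n_intervals - 1) for t in ts}
        )
        run = longest_run(occupied)
        if len(run) < n_ref and len(run) > prev_count:
            prev_count = len(run)
            length /= 2  # halve the interval and partition again
        else:
            return length, run
```

The loop terminates because each halving requires the selected run to grow strictly while staying below N, so at most N halvings can occur.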
5) Model structure
Based on the time series constructed above, the recurrent units of the RNN fit the time intervals naturally. In each time interval, the tf-idf values of the words are used as input. Unimportant words are removed and only the K words with the highest tf-idf values are kept, so the input dimension is K. The output of the unit at the last time step is passed through the softmax function to compute the probability that the event is true or false. To obtain a low-dimensional representation, an embedding layer is added between the input layer and the hidden layer of the LSTM, as shown in FIG. 6.
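The construction of the K-dimensional input per time interval can be sketched as follows. This is an illustration under assumptions: each time interval is treated as one document of tokens, tf is the relative term frequency, idf is log(n / df), and words are ranked by their highest tf-idf value over all intervals. The description does not specify these details, and `interval_tfidf` is a hypothetical name.

```python
import math
from collections import Counter


def interval_tfidf(interval_docs, k):
    """Build K-dimensional tf-idf vectors, one per time interval.

    `interval_docs` is a list of token lists, one list per interval.
    Returns (vocabulary of at most k words, list of vectors).
    """
    n = len(interval_docs)
    counts = [Counter(tokens) for tokens in interval_docs]
    # Document frequency: number of intervals in which each word occurs.
    df = Counter(word for c in counts for word in c)

    def tfidf(c, word):
        total = sum(c.values())
        if total == 0:
            return 0.0
        return (c[word] / total) * math.log(n / df[word])

    # Assumption: rank each word by its best tf-idf value over all intervals.
    best = {w: max(tfidf(c, w) for c in counts) for w in df}
    vocab = sorted(best, key=lambda w: (-best[w], w))[:k]
    vectors = [[tfidf(c, w) for w in vocab] for c in counts]
    return vocab, vectors
```

Each returned vector would then be fed to the embedding layer as the input for one time step.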
Training objective: the scheme uses two labels, where [1,0] indicates that the event information is true and [0,1] indicates that it is false. For each training instance (each event), the goal is to minimize the squared difference between the predicted and true values:
Figure BDA0002281321740000121
where g_c and p_c denote the true value and the predicted value respectively, and c denotes the label. θ_i denotes the model parameters; the rightmost term is the L2 regularization penalty, which is used to prevent overfitting.
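Because the objective is only available as a figure, the sketch below shows one plausible form consistent with the text: a squared error summed over the labels plus an L2 penalty over the parameters. The weighting factor `l2_weight` and the function name `event_loss` are assumptions; the exact coefficients in the original formula are not recoverable from the text.

```python
def event_loss(true_label, predicted, parameters, l2_weight=1.0):
    """Squared error over labels plus an L2 penalty over model parameters.

    `parameters` is a list of flat parameter lists (one per theta_i).
    `l2_weight` is a hypothetical regularization coefficient.
    """
    squared_error = sum((g - p) ** 2 for g, p in zip(true_label, predicted))
    l2_penalty = sum(w * w for theta in parameters for w in theta)
    return squared_error + l2_weight * l2_penalty
```

For a true event labeled [1, 0] and a prediction of [0.8, 0.2], the squared-error part is 0.04 + 0.04 = 0.08 before the penalty is added.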
Model training: the model is trained with the back-propagation algorithm, and the parameters are updated with the AdaGrad algorithm. The vocabulary size is set to K = 5000, the reference RNN sequence length to N = 50, the number of hidden units to 100, and the learning rate to 0.5.
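For reference, the standard AdaGrad update can be sketched as below, using the learning rate of 0.5 stated above. The function name, the flat-list parameter layout and the stabilizing constant `eps` are assumptions for illustration; the patent does not give these details.

```python
import math


def adagrad_step(params, grads, accum, learning_rate=0.5, eps=1e-8):
    """One in-place AdaGrad update over flat lists of floats.

    `accum` holds the running sum of squared gradients per parameter.
    """
    for i, g in enumerate(grads):
        accum[i] += g * g
        params[i] -= learning_rate * g / (math.sqrt(accum[i]) + eps)
    return params, accum
```

Because the accumulated squared gradients only grow, the effective step size of each parameter shrinks over time, which is why a comparatively large base learning rate such as 0.5 can be workable with this optimizer.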
(2) Dynamic partition indexing
The dynamic partition index is introduced to partition and index the knowledge base when the knowledge base is aligned, so as to accelerate the matching process of the entity pair.
An index is a structure that sorts the values of one or more columns in a database table so that specific information in the table can be accessed quickly. Establishing an index during knowledge base alignment means pruning the entity pairs in the knowledge base that cannot be similar, so that similar entity pairs are assigned as far as possible to the same block or blocks and become candidate pairs; the final alignment is performed only on the candidate pairs, which improves matching efficiency.
A key problem is the selection of the index key. The index key is a function of one or more attributes of the entities in the knowledge base; the entity set to be matched is partitioned by the values of this function so that the blocks contain all matching entity pairs while generating as few candidate pairs as possible. The selection of the index key must take three factors into account:
Quality of attribute values. A missing or wrong value of an attribute used as the index key may place the entity in the wrong block and thereby affect the alignment result, so the attribute values used as index keys should be as complete and correct as possible.
Distribution of attribute values. For a given number of entities, skewed attribute values make the number of candidate pairs in some blocks far larger than in others and increase the total number of comparisons, whereas uniformly distributed attribute values generate the fewest candidate pairs. The distribution of the attribute values should therefore be as uniform as possible.
Trade-off between the number and the size of blocks. An index that generates a small number of large blocks reduces the probability of losing potentially matching entity pairs but produces more candidate pairs; a large number of small blocks reduces the number of candidate pairs but may lose more potential matches. The indexing scheme should therefore partition as finely as possible while losing as few potential matches as possible.
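The effect of block-size distribution on the number of comparisons can be checked with a small calculation. The sketch assumes that every pair of entities inside a block becomes a candidate pair, so a block of n entities yields n(n-1)/2 pairs; `candidate_pairs` is a hypothetical helper.

```python
def candidate_pairs(block_sizes):
    """Total candidate pairs when all entities within a block are compared."""
    return sum(n * (n - 1) // 2 for n in block_sizes)


# 100 entities under three partitionings:
uniform = candidate_pairs([25, 25, 25, 25])  # 1200 pairs
skewed = candidate_pairs([70, 10, 10, 10])   # 2550 pairs
single = candidate_pairs([100])              # 4950 pairs
```

With the same number of blocks, the skewed partition produces more than twice as many candidate pairs as the uniform one, which illustrates why uniformly distributed key values are preferred.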
In the scheme, in order to simultaneously consider the influence of the attributes and the relations on the index partitions, a dynamic index method is adopted for partitioning.
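The recursive partitioning described for the dynamic partition index sub-module (split by successive index keys until a block is smaller than a threshold or all keys are used) can be sketched as follows. The function name, the representation of index keys as a list of key functions, and the example keys are assumptions for illustration only.

```python
from collections import defaultdict


def dynamic_partition(entities, key_funcs, max_size):
    """Recursively split `entities` into blocks using successive index keys.

    Splitting stops when a block has fewer than `max_size` entities or
    when every key function has been used.
    """
    if len(entities) < max_size or not key_funcs:
        return [entities]
    first, rest = key_funcs[0], key_funcs[1:]
    groups = defaultdict(list)
    for entity in entities:
        groups[first(entity)].append(entity)
    blocks = []
    for group in groups.values():
        blocks.extend(dynamic_partition(group, rest, max_size))
    return blocks


entities = [
    {"type": "person", "name": "Ann"},
    {"type": "person", "name": "Al"},
    {"type": "person", "name": "Bob"},
    {"type": "city", "name": "Rome"},
]
# Hypothetical keys: entity category first, then the first letter of the name.
keys = [lambda e: e["type"], lambda e: e["name"][0]]
blocks = dynamic_partition(entities, keys, max_size=3)
```

Only entities that end up in the same block would be compared during alignment, so the choice and order of key functions directly controls the trade-off between candidate pairs and lost matches discussed above.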
In the description of the present application, it is to be understood that the terms "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience in describing the present application and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and thus, should not be construed as limiting the present application.
Those skilled in the art will appreciate that, in addition to implementing the systems, apparatus, and various modules thereof provided by the present invention in purely computer readable program code, the same procedures can be implemented entirely by logically programming method steps such that the systems, apparatus, and various modules thereof are provided in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system, the device and the modules thereof provided by the present invention can be considered as a hardware component, and the modules included in the system, the device and the modules thereof for implementing various programs can also be considered as structures in the hardware component; modules for performing various functions may also be considered to be both software programs for performing the methods and structures within hardware components.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (10)

1. An intelligence knowledge fusion system, comprising:
a text information and knowledge base fusion module: integrating the knowledge extracted from the text into the existing knowledge base;
a knowledge base and knowledge base fusion module: integrating the knowledge in the new knowledge base into the existing knowledge base.
2. The intelligence knowledge fusion system of claim 1, wherein the text information and knowledge base fusion module comprises:
a referent clustering and disambiguation module: clustering and normalizing different referents of the same entity in the text;
an entity association knowledge base module: connecting the entity names in the text to corresponding entities in a knowledge base;
an RNN-based network intelligence text credibility analysis module: analyzing and calculating the credibility of the intelligence text from the network to obtain a credibility analysis result;
a first knowledge warehousing and updating module: according to the output credibility analysis result, if the credibility is higher than a preset value, the new knowledge is considered worth entering into the knowledge base and is stored; otherwise, the new knowledge is considered to conflict with the existing knowledge, and manual intervention is needed to choose whether to keep the knowledge in the existing knowledge base or update the existing knowledge with the new knowledge.
3. The intelligence knowledge fusion system of claim 2, wherein the analysis and calculation of the credibility of the intelligence text from the network comprises: collecting network media comments related to the extracted intelligence text knowledge, converting the comments into input units of a recurrent neural network (RNN), and performing neural network modeling to infer the credibility.
4. The intelligence knowledge fusion system of claim 1, wherein the first knowledge warehousing and updating module: saves the new entity knowledge or updates the existing entity knowledge with the new entity knowledge; if the new entity knowledge is not in the existing library, stores it in the library and adds its information source; if the entity already exists in the library, adds to the library any attributes that the existing entity lacks; if an attribute already exists in the existing entity, chooses between the attribute of the new entity and the attribute of the old entity according to the credibility analysis result.
5. The intelligence knowledge fusion system of claim 1, wherein the knowledge base and knowledge base fusion module comprises:
a dynamic partition index module: filtering out, through pruning, the entity pairs in the knowledge base that cannot be similar, establishing indexes during knowledge base alignment, assigning similar entity pairs to one or more blocks as candidate pairs by means of a dynamic index method, and finally performing alignment only on the candidate pairs;
a feature matching module based on a structural similarity function: calculating similarity scores between attributes, namely measuring the similarity of an entity pair as the ratio of the intersection to the union of the neighbor sets of the two entities, to obtain the attribute similarity score;
a cross-knowledge-base entity alignment sub-module: treating entity alignment as a classification problem that decides, according to the attribute similarity score, whether the entity pair to be matched is a match; if the attribute similarity score is higher than a preset value, the entity pair is regarded as the same entity, and the entities are then aligned and their knowledge fused;
a second knowledge warehousing and updating module: saving the new entity knowledge or updating the existing entity knowledge with the new entity knowledge.
6. The intelligence knowledge fusion system of claim 5, wherein the second knowledge warehousing and updating module: if the new entity knowledge is not in the existing library, stores it in the library and adds its information source; if the entity already exists in the library, adds to the library any attributes that the existing entity lacks; if an attribute already exists in the existing entity, chooses between the attribute of the new entity and the attribute of the old entity according to the attribute similarity score.
7. An intelligence knowledge fusion method, comprising:
a text information and knowledge base fusion step: integrating the knowledge extracted from the text into the existing knowledge base;
a knowledge base and knowledge base fusion step: integrating the knowledge in the new knowledge base into the existing knowledge base.
8. The intelligence knowledge fusion method according to claim 7, wherein the text information and knowledge base fusion step comprises:
a referent clustering and disambiguation step: clustering and normalizing different referents of the same entity in the text;
an entity association knowledge base step: connecting the entity names in the text to corresponding entities in a knowledge base;
an RNN-based network intelligence text credibility analysis step: analyzing and calculating the credibility of the intelligence text from the network to obtain a credibility analysis result;
a first knowledge warehousing and updating step: according to the output credibility analysis result, if the credibility is higher than a preset value, the new knowledge is considered worth entering into the knowledge base and is stored; otherwise, the new knowledge is considered to conflict with the existing knowledge, and manual intervention is needed to choose whether to keep the knowledge in the existing knowledge base or update the existing knowledge with the new knowledge;
the analysis and calculation of the credibility of the intelligence text from the network comprises: collecting network media comments related to the extracted intelligence text knowledge, converting the comments into input units of a recurrent neural network (RNN), and performing neural network modeling to infer the credibility;
the first knowledge warehousing and updating step comprises: saving the new entity knowledge or updating the existing entity knowledge with the new entity knowledge; if the new entity knowledge is not in the existing library, storing it in the library and adding its information source; if the entity already exists in the library, adding to the library any attributes that the existing entity lacks; if an attribute already exists in the existing entity, choosing between the attribute of the new entity and the attribute of the old entity according to the credibility analysis result.
9. The intelligence knowledge fusion method of claim 7, wherein the knowledge base and knowledge base fusion step comprises:
a dynamic partition indexing step: filtering out, through pruning, the entity pairs in the knowledge base that cannot be similar, establishing indexes during knowledge base alignment, assigning similar entity pairs to one or more blocks as candidate pairs by means of a dynamic index method, and finally performing alignment only on the candidate pairs;
a feature matching step based on a structural similarity function: calculating similarity scores between attributes, namely measuring the similarity of an entity pair as the ratio of the intersection to the union of the neighbor sets of the two entities, to obtain the attribute similarity score;
a cross-knowledge-base entity alignment sub-step: treating entity alignment as a classification problem that decides, according to the attribute similarity score, whether the entity pair to be matched is a match; if the attribute similarity score is higher than a preset value, the entity pair is regarded as the same entity, and the entities are then aligned and their knowledge fused;
a second knowledge warehousing and updating step: saving the new entity knowledge or updating the existing entity knowledge with the new entity knowledge.
10. The intelligence knowledge fusion method according to claim 9, wherein the second knowledge warehousing and updating step comprises: if the new entity knowledge is not in the existing library, storing it in the library and adding its information source; if the entity already exists in the library, adding to the library any attributes that the existing entity lacks; if an attribute already exists in the existing entity, choosing between the attribute of the new entity and the attribute of the old entity according to the attribute similarity score.
CN201911142507.XA 2019-11-20 2019-11-20 Information knowledge fusion system and method Pending CN111159328A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911142507.XA CN111159328A (en) 2019-11-20 2019-11-20 Information knowledge fusion system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911142507.XA CN111159328A (en) 2019-11-20 2019-11-20 Information knowledge fusion system and method

Publications (1)

Publication Number Publication Date
CN111159328A true CN111159328A (en) 2020-05-15

Family

ID=70556034

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911142507.XA Pending CN111159328A (en) 2019-11-20 2019-11-20 Information knowledge fusion system and method

Country Status (1)

Country Link
CN (1) CN111159328A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112905806A (en) * 2021-03-25 2021-06-04 哈尔滨工业大学 Knowledge graph materialized view generator and generation method based on reinforcement learning
CN115033716A (en) * 2022-08-10 2022-09-09 深圳市人马互动科技有限公司 General self-learning system and self-learning method based on same

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017123168A (en) * 2016-01-05 2017-07-13 富士通株式会社 Method for making entity mention in short text associated with entity in semantic knowledge base, and device
CN109886294A (en) * 2019-01-11 2019-06-14 平安科技(深圳)有限公司 Knowledge fusion method, apparatus, computer equipment and storage medium
CN110457486A (en) * 2019-07-05 2019-11-15 中国人民解放军战略支援部队信息工程大学 The people entities alignment schemes and device of knowledge based map

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
PELHANS: "知识图谱入门(六)知识融合", 《知乎》 *
吴小华等: "基于Self-Attention和 Bi-LSTM 的中文短文本情感分析", 《中文信息学报》 *
熊晶: "《甲骨学知识图谱构建方法研究》", 31 January 2019 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200515