CN111680488A - Cross-language entity alignment method based on knowledge graph multi-view information - Google Patents

Cross-language entity alignment method based on knowledge graph multi-view information

Info

Publication number
CN111680488A
CN111680488A (application CN202010512003.9A)
Authority
CN
China
Prior art keywords: entity, language, vector, text, description
Prior art date
Legal status
Granted
Application number
CN202010512003.9A
Other languages
Chinese (zh)
Other versions
CN111680488B (en)
Inventor
鲁伟明 (Lu Weiming)
徐玮 (Xu Wei)
吴飞 (Wu Fei)
庄越挺 (Zhuang Yueting)
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202010512003.9A
Publication of CN111680488A
Application granted
Publication of CN111680488B
Status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/10: Text processing
    • G06F 40/189: Automatic justification
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367: Ontology
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a cross-language entity alignment method based on knowledge graph multi-view information. First, a structure graph and a text graph are constructed from the relation triples and entity description texts of the two languages' knowledge graphs, and a two-layer graph convolutional network encodes each entity's structure-view and text-view vector representations. Then, a bidirectional long short-term memory network encodes a description-view vector representation from the entity description texts and cross-language corpora. Finally, the cross-language aligned entity pairs are computed by combining the vector distances of paired entities under the three views with a weighted sum. The invention realizes cross-language entity alignment between knowledge graphs, optimizes entity vector representations based on multi-view structural and textual information, and improves the accuracy of cross-language entity alignment.

Description

Cross-language entity alignment method based on knowledge graph multi-view information
Technical Field
The invention relates to a cross-language entity alignment method based on knowledge graph multi-view information, and in particular to a technique that realizes cross-language entity alignment from knowledge graph structure and text information using graph convolutional neural networks.
Background
With the rapid development of the internet and the explosion of online information, information must be structured so that it can be further analyzed, exploited, and applied to various tasks and scenarios; knowledge graphs emerged to meet this need. A knowledge graph is essentially a large-scale semantic network: a structured knowledge base that formally describes objects of the real world and the relations among them. Entity alignment determines whether entities with different names, or entities from different sources, refer to the same unique object in the real world. Multilingual knowledge graphs generally contain some cross-language entity links that indicate known entity alignments; starting from these known entity pairs, cross-language entity alignment techniques can discover more alignment relations, enriching the information of the knowledge graph and supporting downstream cross-language tasks.
For the cross-language entity alignment task, traditional academic approaches include rule- and similarity-based methods and machine learning methods. With the introduction of deep learning and its steady progress in the field of natural language processing, entity alignment methods based on entity embeddings and deep neural networks have become mainstream. However, most of these methods rely only on the structured data of the knowledge graph, typically comparing and computing over attribute triples and relation triples, and cannot effectively exploit text information to optimize entity alignment.
Disclosure of Invention
The invention aims to encode knowledge graph entity representations from multiple views using the structural and textual information of cross-language knowledge graphs, thereby improving cross-language entity alignment.
The purpose of the invention is realized by the following technical scheme: a cross-language entity alignment method based on knowledge graph multi-view information encodes entity structure vectors, entity text vectors, and entity description vectors, computes the distances between entities, and finds cross-language aligned entity pairs. The method comprises the following steps:
1) Entity structure vector encoding based on relation triples: a structure graph is constructed for each of the two languages' knowledge graphs from its relation triples. The structure graph takes entities as nodes and places an edge between any two entities that share a relation; the edge weights are computed from the relations between the entities, forming the adjacency matrix of the structure graph. A two-layer graph convolutional network is trained on the constructed structure graph, continuously updating each entity's vector representation from the entity itself and the encodings of its neighboring entities. The graph convolutional networks of the two knowledge graphs share their weight matrices. The entity structure vector representation is optimized with a triplet loss over the pre-aligned cross-language entity pairs S and positive and negative example entity pairs.
2) Entity text vector encoding based on entity description information: the two languages' knowledge graphs are merged, and a unified text graph is built from the entities and their description texts. The text graph has two types of nodes, entity nodes and word nodes from the entity descriptions, and three types of edges: "entity-descriptor" edges, "descriptor-descriptor" edges within a single language, and "descriptor-descriptor" edges across languages. A weight is computed for each type of edge, forming the adjacency matrix. A two-layer graph convolutional network is trained on the constructed text graph, and the entity text vector representation is optimized with a triplet loss over the pre-aligned cross-language entity pairs S and positive and negative example entity pairs.
3) Entity description vector encoding based on entity description information and cross-language corpora: cross-lingually aligned word vectors are pre-trained with BilBOWA on the two languages' monolingual corpora and a cross-language parallel corpus; the sequence of word vectors of each entity description is then fed as input to a bidirectional long short-term memory network (BiLSTM), which encodes the description into an entity description vector. The network is optimized by minimizing the distances between the description vectors of the pre-aligned cross-language entity pairs S, yielding the final description vectors of all entities.
4) Computing cross-language aligned entity pairs from the multi-view entity vectors: for each entity in one language's knowledge graph, every entity of the other language's knowledge graph serves as a candidate; the distance between the entity and each candidate is computed from the entity structure vectors, entity text vectors, and entity description vectors obtained in steps 1), 2), and 3); the candidates are sorted by distance in ascending order, and the pair with the minimum distance is selected as an aligned entity pair.
Further, in step 1), the weight computation of the adjacency matrix A and the entity vector computation and loss function of the graph convolutional network are as follows:
1.1) Weight computation of the adjacency matrix A: for entities $e_i$ and $e_j$, the weight $a_{ij} \in A$ between them is computed as:

$$\mathrm{fun}(r) = \frac{\#\mathrm{Head\_Entities\_of\_}r}{\#\mathrm{Triples\_of\_}r}, \qquad \mathrm{ifun}(r) = \frac{\#\mathrm{Tail\_Entities\_of\_}r}{\#\mathrm{Triples\_of\_}r}$$

$$a_{ij} = \sum_{\langle e_i,\, r,\, e_j \rangle \in G} \mathrm{ifun}(r) + \sum_{\langle e_j,\, r,\, e_i \rangle \in G} \mathrm{fun}(r)$$

where fun(r) and ifun(r) are the influence scores of relation r in the forward and reverse directions respectively, G is the knowledge graph, #Triples_of_r is the number of relation triples involving relation r, and #Head_Entities_of_r and #Tail_Entities_of_r are the numbers of head entities and tail entities involved in the triples of relation r.
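To make this weighting scheme concrete, the following is a minimal Python sketch, assuming the relation triples are given as (head, relation, tail) tuples of IDs; the function names are illustrative and not part of the patent text.

```python
from collections import defaultdict

def relation_scores(triples):
    """Compute fun(r) and ifun(r) for every relation r from a list of
    (head, relation, tail) triples."""
    heads, tails, n_triples = defaultdict(set), defaultdict(set), defaultdict(int)
    for h, r, t in triples:
        heads[r].add(h)
        tails[r].add(t)
        n_triples[r] += 1
    fun = {r: len(heads[r]) / n_triples[r] for r in n_triples}
    ifun = {r: len(tails[r]) / n_triples[r] for r in n_triples}
    return fun, ifun

def adjacency_weights(triples):
    """a_ij accumulates ifun(r) along each forward triple <e_i, r, e_j>
    and fun(r) along each reverse triple <e_j, r, e_i>."""
    fun, ifun = relation_scores(triples)
    a = defaultdict(float)
    for h, r, t in triples:
        a[(h, t)] += ifun[r]
        a[(t, h)] += fun[r]
    return a
```

For example, a one-to-one relation has fun(r) = ifun(r) = 1, so the edge it induces carries full weight in both directions, while many-to-many relations contribute weaker edges.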
1.2) Entity vector computation in the graph convolutional network: the input of the graph convolutional network is the entity structure feature matrix $X_s \in \mathbb{R}^{n \times d_s}$, obtained by random initialization, where n is the total number of entities and $d_s$ is the dimension of the entity structure feature vectors. The overall computation of the structure graph's graph convolutional network is:

$$H_s = \sigma\!\left(\hat{A}\,\sigma\!\left(\hat{A}\, X_s\, W_s^{(0)}\right) W_s^{(1)}\right), \qquad \hat{A} = \tilde{D}^{-\frac{1}{2}}\, \tilde{A}\, \tilde{D}^{-\frac{1}{2}}$$

where $\tilde{A} = A + I$ adds an identity matrix of matching dimension to the adjacency matrix A so that each entity's own information is included, and $\tilde{D}$ is the diagonal node degree matrix of $\tilde{A}$. The weight matrices $W_s^{(0)}$ and $W_s^{(1)}$ are diagonal matrices, and the activation function σ is ReLU(·) = max(0, ·).
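As a concrete illustration of this propagation rule, here is a NumPy sketch of the two-layer forward pass. Dense matrices are used for clarity, and the randomly initialized weight matrices (diagonal, per the description above) are assumed to be supplied by the caller.

```python
import numpy as np

def normalize_adjacency(a):
    """Return D~^(-1/2) (A + I) D~^(-1/2) for a dense adjacency matrix A."""
    a_tilde = a + np.eye(a.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_tilde.sum(axis=1)))
    return d_inv_sqrt @ a_tilde @ d_inv_sqrt

def gcn_forward(a, x, w0, w1):
    """Two-layer GCN: H = ReLU(A^ . ReLU(A^ X W0) W1), where A^ is the
    symmetrically normalized adjacency with self-loops."""
    a_hat = normalize_adjacency(a)
    relu = lambda m: np.maximum(m, 0.0)
    return relu(a_hat @ relu(a_hat @ x @ w0) @ w1)
```

In practice a sparse representation of the adjacency matrix would be used for large graphs; the dense form above only serves to show the computation.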
1.3) Loss function: each entity pair $p = (e_1, e_2) \in S$ gives a positive example pair distance; negative example pairs $p' = (e'_1, e'_2) \in S'_p$ are constructed by randomly replacing entity $e_1$ or $e_2$, where $S'_p$ is the set of negative example entity pairs. The following objective function is minimized:

$$L_s = \sum_{p \in S} \sum_{p' \in S'_p} \left[\, f_s(p) + \gamma_s - f_s(p') \,\right]_+$$

where $f_s(p) = \lVert h_s(e_1) - h_s(e_2) \rVert_1$ is the entity distance scoring function, computing the Manhattan distance between entity structure vectors; $h_s(e_1)$ and $h_s(e_2)$ are the structure vectors of entities $e_1$ and $e_2$, $[x]_+ = \max(0, x)$, and $\gamma_s$ is the margin between structure vectors.
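A minimal PyTorch sketch of this margin loss follows; it assumes each row of neg_pairs is a corrupted copy of the corresponding row of pos_pairs, and the default margin value is illustrative only.

```python
import torch

def alignment_margin_loss(h, pos_pairs, neg_pairs, gamma=3.0):
    """Margin loss sum([f(p) + gamma - f(p')]_+), where f is the L1
    (Manhattan) distance between the two entities' vectors in h.
    pos_pairs and neg_pairs are (k, 2) index tensors into h; row i of
    neg_pairs is a negative example derived from row i of pos_pairs."""
    def f(pairs):
        return (h[pairs[:, 0]] - h[pairs[:, 1]]).abs().sum(dim=1)
    return torch.clamp(f(pos_pairs) + gamma - f(neg_pairs), min=0.0).sum()
```

The same loss shape is reused for the text vectors in step 2.3) and the description vectors in step 3.3), with only the vector table h and margin changed.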
Further, in step 2), before the knowledge graphs are merged, the entity description information is preprocessed: illegal characters are filtered out, the text is segmented into words, stop words are removed, and words whose frequency in the corpus is too low are filtered out.
Further, in step 2), the weight computation of the adjacency matrix A and the entity vector computation and loss function of the graph convolutional network are as follows:
2.1) Weight computation of the adjacency matrix A: the weights of the three edge types and of the text graph adjacency matrix are computed as follows:
2.1.1) "Entity-descriptor" edges:
For the edges formed between an entity and its description words, the weight is the term frequency-inverse document frequency (TF-IDF):

$$\mathrm{TF}(t, d) = \frac{n_{t,d}}{\sum_{t' \in d} n_{t',d}}, \qquad \mathrm{IDF}(t) = \log \frac{|D|}{|\{ d \in D : t \in d \}|}$$

$$\mathrm{TFIDF}(t, d) = \mathrm{TF}(t, d) \times \mathrm{IDF}(t)$$

where TF(t, d) is the frequency with which the word t appears in entity description d, $n_{t,d}$ is the number of occurrences of word t in d, $\sum_{t' \in d} n_{t',d}$ is the total number of words in d, IDF(t) is the inverse document frequency of word t in the entity description set D, |D| is the total number of entity descriptions in the set, and $|\{ d \in D : t \in d \}|$ is the number of entity descriptions containing word t.
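The following Python sketch computes these entity-to-word edge weights; it assumes descriptions are already tokenized and maps each (entity, word) pair to its TF-IDF value.

```python
import math
from collections import Counter

def tfidf_edge_weights(descriptions):
    """descriptions: dict mapping entity id -> list of description tokens.
    Returns dict mapping (entity, word) -> TF-IDF edge weight."""
    doc_freq = Counter()
    for tokens in descriptions.values():
        doc_freq.update(set(tokens))          # document frequency per word
    n_docs = len(descriptions)
    weights = {}
    for ent, tokens in descriptions.items():
        counts = Counter(tokens)
        total = len(tokens)
        for word, n in counts.items():
            tf = n / total
            idf = math.log(n_docs / doc_freq[word])
            weights[(ent, word)] = tf * idf
    return weights
```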
2.1.2) Single-language "descriptor-descriptor" edges:
For the edges formed between description words of a single language, global word co-occurrence is first counted with a sliding window, and the weight of two words is then their pointwise mutual information (PMI). For any two words i and j:

$$\mathrm{PMI}(i, j) = \log \frac{p(i, j)}{p(i)\, p(j)}, \qquad p(i, j) = \frac{\#W(i, j)}{\#W}, \qquad p(i) = \frac{\#W(i)}{\#W}$$

where #W is the number of sliding windows over all entity description corpora, #W(i) is the number of sliding windows containing word i, and #W(i, j) is the number of sliding windows containing both word i and word j.
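A Python sketch of this sliding-window PMI computation is given below; the window size of 20 is an illustrative assumption, and only positive PMI values are kept as edge weights, in line with common practice for text-graph construction.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_edge_weights(corpus, window=20):
    """corpus: list of token lists (all entity descriptions of one language).
    Counts sliding windows, then PMI(i, j) = log(p(i, j) / (p(i) p(j)))."""
    n_windows, word_count, pair_count = 0, Counter(), Counter()
    for tokens in corpus:
        spans = [tokens[k:k + window]
                 for k in range(max(1, len(tokens) - window + 1))]
        for span in spans:
            n_windows += 1
            uniq = set(span)
            word_count.update(uniq)
            pair_count.update(frozenset(p) for p in combinations(sorted(uniq), 2))
    weights = {}
    for pair, n_ij in pair_count.items():
        i, j = tuple(pair)
        pmi = math.log(n_ij * n_windows / (word_count[i] * word_count[j]))
        if pmi > 0:                            # keep only positive associations
            weights[(i, j)] = pmi
    return weights
```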
2.1.3) Cross-language "descriptor-descriptor" edge:
For the edges formed between description words across languages, the pre-aligned cross-language entity pairs S are used: each word in an entity's description text is paired with every word in its aligned entity's description, and the weight of each resulting descriptor pair is its frequency among the descriptor pairs formed by all aligned entity pairs, which strengthens the cross-language information. This method is referred to herein as X-DF (Cross Document Frequency).
For words i and j drawn from the two knowledge graphs' entity descriptions respectively, the weight is:

$$\mathrm{XDF}(i, j) = \frac{\mathrm{count}(i, j)}{\mathrm{count}(D)}$$

where count(i, j) is the number of word pairs formed by words i and j over the text descriptions of all aligned entity pairs, and count(D) is the total number of word pairs formed by the text descriptions of all aligned entity pairs.
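The X-DF computation can be sketched in Python as follows, assuming the seed alignments and tokenized descriptions are given; counting each word once per description is an assumption of this sketch.

```python
from collections import Counter

def xdf_edge_weights(aligned_pairs, desc_a, desc_b):
    """aligned_pairs: list of (entity_a, entity_b) seed alignments;
    desc_a / desc_b map entity ids to description token lists in each
    language. Returns XDF(i, j) = count(i, j) / count(D): the frequency
    of the cross-language word pair (i, j) among all word pairs induced
    by the seed alignments."""
    pair_count, total = Counter(), 0
    for ea, eb in aligned_pairs:
        for wi in set(desc_a[ea]):
            for wj in set(desc_b[eb]):
                pair_count[(wi, wj)] += 1
                total += 1
    return {pair: n / total for pair, n in pair_count.items()}
```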
2.1.4) Weight computation of the text graph adjacency matrix:

$$A_{ij} = \begin{cases} \mathrm{TFIDF}(i, j) & \text{if } i \text{ is an entity and } j \text{ is a description word} \\ \mathrm{PMI}(i, j) & \text{if } i, j \text{ are words of the same language and } \mathrm{PMI}(i, j) > 0 \\ \mathrm{XDF}(i, j) & \text{if } i, j \text{ are words of different languages} \\ 0 & \text{otherwise} \end{cases}$$
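Putting the three weight dictionaries together, a sketch of assembling the full text-graph adjacency matrix follows; it assumes entity IDs are integers 0..n-1 and that word_index maps each word to an offset 0..m-1 (entities occupy the first n rows, words the remaining m).

```python
import numpy as np

def build_text_adjacency(n_entities, word_index, tfidf, pmi, xdf):
    """Assemble the symmetric (n+m) x (n+m) text-graph adjacency matrix
    from the three edge-weight dictionaries of steps 2.1.1)-2.1.3)."""
    m = len(word_index)
    a = np.zeros((n_entities + m, n_entities + m))
    for (ent, word), w in tfidf.items():      # entity-descriptor edges
        i, j = ent, n_entities + word_index[word]
        a[i, j] = a[j, i] = w
    for (wi, wj), w in pmi.items():           # single-language word-word edges
        i, j = n_entities + word_index[wi], n_entities + word_index[wj]
        a[i, j] = a[j, i] = w
    for (wi, wj), w in xdf.items():           # cross-language word-word edges
        i, j = n_entities + word_index[wi], n_entities + word_index[wj]
        a[i, j] = a[j, i] = w
    return a
```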
2.2) Entity vector computation in the graph convolutional network: the input of the graph convolutional network is the entity text feature matrix $X_t \in \mathbb{R}^{(n+m) \times d_t}$, obtained by random initialization, where n is the total number of entities, m is the total number of words, and $d_t$ is the dimension of the entity text feature vectors. The overall computation of the text graph's graph convolutional network is analogous to step 1.2):

$$H_t = \sigma\!\left(\hat{A}\,\sigma\!\left(\hat{A}\, X_t\, W_t^{(0)}\right) W_t^{(1)}\right), \qquad \hat{A} = \tilde{D}^{-\frac{1}{2}}\, \tilde{A}\, \tilde{D}^{-\frac{1}{2}}$$

where $\tilde{A} = A + I$ adds an identity matrix of matching dimension to the adjacency matrix A so that each node's own information is included, and $\tilde{D}$ is the diagonal node degree matrix of $\tilde{A}$. The weight matrices $W_t^{(0)}$ and $W_t^{(1)}$ are diagonal matrices, and the activation function σ is ReLU(·) = max(0, ·).
2.3) Loss function: each entity pair $p = (e_1, e_2) \in S$ gives a positive example pair distance; negative example pairs $p' = (e'_1, e'_2) \in S'_p$ are constructed by randomly replacing entity $e_1$ or $e_2$, where $S'_p$ is the set of negative example entity pairs. The following objective function is minimized:

$$L_t = \sum_{p \in S} \sum_{p' \in S'_p} \left[\, f_t(p) + \gamma_t - f_t(p') \,\right]_+$$

where $f_t(p) = \lVert h_t(e_1) - h_t(e_2) \rVert_1$ is the entity distance scoring function, computing the Manhattan distance between entity text vectors; $h_t(e_1)$ and $h_t(e_2)$ are the text vectors of entities $e_1$ and $e_2$, and $\gamma_t$ is the margin between text vectors.
Further, step 3) specifically comprises the following sub-steps:
3.1) Corpus processing: existing cross-language parallel corpora can be used directly; alternatively, partial corpora can be extracted from the monolingual corpora and turned into cross-language parallel corpora with a translation tool. The cross-language parallel corpus is processed into sentence-aligned form, and operations such as punctuation filtering and stop-word removal are applied to the corpus.
3.2) Pre-training cross-language word vectors: cross-language word vector representations are trained with the cross-language word embedding model BilBOWA on the two languages' monolingual corpora and the sentence-aligned parallel corpus.
3.3) Entity description vector encoding: the entity description corresponds to a sequence of pre-trained word vectors $\{ w_1, w_2, \ldots, w_{|s|} \}$, where |s| is the total number of words in the entity description and $d_d$ is the entity description vector dimension. A BiLSTM is trained to minimize the distance between aligned entities' vectors, yielding the vector representation of the entity description:

$$\overrightarrow{h_t} = \overrightarrow{\mathrm{LSTM}}\!\left(w_t, \overrightarrow{h_{t-1}}\right), \qquad \overleftarrow{h_t} = \overleftarrow{\mathrm{LSTM}}\!\left(w_t, \overleftarrow{h_{t+1}}\right)$$

$$h_t = \left[\overrightarrow{h_t};\, \overleftarrow{h_t}\right], \qquad h_d = \frac{1}{|s|} \sum_{t=1}^{|s|} h_t$$

where $h_t$ is the vector corresponding to the t-th word of the text description; averaging the vector representations of all words yields the entity description vector $h_d$.
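A minimal PyTorch sketch of this encoder follows; the class name is illustrative, and it assumes the concatenated forward/backward hidden size (2 x hidden_dim) corresponds to the description vector dimension $d_d$.

```python
import torch
import torch.nn as nn

class DescriptionEncoder(nn.Module):
    """Encodes an entity description (a sequence of pre-trained
    cross-lingual word vectors) with a BiLSTM, then mean-pools the
    per-word hidden states into a single description vector h_d."""
    def __init__(self, word_dim, hidden_dim):
        super().__init__()
        self.lstm = nn.LSTM(word_dim, hidden_dim,
                            batch_first=True, bidirectional=True)

    def forward(self, word_vecs):        # (batch, seq_len, word_dim)
        h, _ = self.lstm(word_vecs)      # (batch, seq_len, 2 * hidden_dim)
        return h.mean(dim=1)             # (batch, 2 * hidden_dim)
```

The resulting description vectors are then trained with the same margin loss as the structure and text views, as shown below.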
Each entity pair $p = (e_1, e_2) \in S$ gives a positive example pair distance; negative example pairs $p' = (e'_1, e'_2) \in S'_p$ are constructed by randomly replacing entity $e_1$ or $e_2$, where $S'_p$ is the set of negative example entity pairs. The following objective function is minimized:

$$L_d = \sum_{p \in S} \sum_{p' \in S'_p} \left[\, f_d(p) + \gamma_d - f_d(p') \,\right]_+$$

where $f_d(p) = \lVert h_d(e_1) - h_d(e_2) \rVert_1$ is the entity distance scoring function, computing the Manhattan distance between entity description vectors; $h_d(e_1)$ and $h_d(e_2)$ are the description vectors of entities $e_1$ and $e_2$, and $\gamma_d$ is the margin between description vectors.
Further, in step 4), the distance between entity pairs is computed as follows:
For an entity pair $p = (e_1, e_2)$ from the two different knowledge graphs, the distance is:

$$D(p) = \alpha\, \frac{f_s(p)}{d_s} + \beta\, \frac{f_t(p)}{d_t} + (1 - \alpha - \beta)\, \frac{f_d(p)}{d_d}$$

where $d_s$, $d_t$, and $d_d$ are the dimensions of the entity structure vector, the entity text vector, and the entity description vector respectively, and α and β are hyper-parameters used to weigh the three component distances.
If only entity structure vectors and entity text vectors are used, the distance of an entity pair $p = (e_1, e_2)$ from the two different knowledge graphs is:

$$D(p) = \alpha\, \frac{f_s(p)}{d_s} + (1 - \alpha)\, \frac{f_t(p)}{d_t}$$

where α is a hyper-parameter used to weigh the two component distances.
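The combined distance and the nearest-neighbour selection of step 4) can be sketched in Python as follows; the values alpha=0.4 and beta=0.4 are illustrative, not prescribed by the patent.

```python
import numpy as np

def pair_distance(hs, ht, hd, e1, e2, alpha=0.4, beta=0.4):
    """Weighted multi-view distance between entity e1 of graph 1 and
    entity e2 of graph 2. hs, ht, hd are (H1, H2) pairs of vector
    matrices for the structure, text, and description views; each term
    is an L1 distance normalized by that view's vector dimension."""
    def view(h):
        h1, h2 = h
        return np.abs(h1[e1] - h2[e2]).sum() / h1.shape[1]
    return alpha * view(hs) + beta * view(ht) + (1 - alpha - beta) * view(hd)

def align_entities(hs, ht, hd, n1, n2, **kw):
    """Brute-force nearest-neighbour alignment: for each entity of
    graph 1, pick the graph-2 entity at minimum combined distance."""
    return [min(range(n2),
                key=lambda e2: pair_distance(hs, ht, hd, e1, e2, **kw))
            for e1 in range(n1)]
```

For large graphs the exhaustive candidate scan would in practice be replaced by an approximate nearest-neighbour index, but the ranking criterion is the one above.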
Compared with the prior art, the method has the following beneficial effects:
1. The method provides a graph-convolution-network-based model that encodes entity structure and text to capture cross-language information. By designing suitable node and edge weights it constructs a structure graph and a text graph, optimizes the entity vector encodings with graph convolutional networks, and improves the accuracy of cross-language entity alignment.
2. The method encodes semantic vectors of entity descriptions through cross-language word vector pre-training and a bidirectional long short-term memory network, further enriching the encoding of entity text information and improving the cross-language entity alignment results.
3. The method achieves good results with little training data, and its improvement over other methods grows as more training data is provided.
Drawings
FIG. 1 is a flow chart of the steps of the present invention;
FIG. 2 is a diagram of an overall model of the present invention;
FIG. 3 is a diagram of a knowledge-graph structure and textual information in accordance with one embodiment of the present invention;
FIG. 4 is a diagram of the entity structure vector encoding model according to an embodiment of the present invention;
FIG. 5 is a diagram of an entity text vector coding model according to an embodiment of the present invention;
FIG. 6 is a graph of experimental results of an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and specific examples.
As shown in fig. 1, the method for aligning cross-language entities based on knowledge-graph multi-view information provided by the invention comprises the following steps:
1) Entity structure vector encoding based on relation triples: a structure graph is constructed for each of the two languages' knowledge graphs from its relation triples. The structure graph takes entities as nodes and places an edge between any two entities that share a relation; the edge weights are computed from the relations between the entities, forming the adjacency matrix of the structure graph. A two-layer graph convolutional network is trained on the constructed structure graph, and the graph convolutional networks of the two knowledge graphs share their weight matrices. The entity structure vector representation is optimized with a triplet loss over the pre-aligned cross-language entity pairs S and positive and negative example entity pairs.
2) Entity text vector encoding based on entity description information: the two languages' knowledge graphs are merged, and a unified text graph is built from the entities and their description texts. The text graph has two types of nodes, entity nodes and word nodes from the entity descriptions, and three types of edges: "entity-descriptor" edges, "descriptor-descriptor" edges within a single language, and "descriptor-descriptor" edges across languages. A weight is computed for each type of edge, forming the adjacency matrix. A two-layer graph convolutional network is trained on the constructed text graph, and the entity text vector representation is optimized with a triplet loss over the pre-aligned cross-language entity pairs S and positive and negative example entity pairs.
3) Entity description vector encoding based on entity description information and cross-language corpora: cross-lingually aligned word vectors are pre-trained with BilBOWA on the two languages' monolingual corpora and a cross-language parallel corpus; the sequence of word vectors of each entity description is then fed as input to a bidirectional long short-term memory network, which encodes the description into an entity description vector. The network is optimized by minimizing the distances between the description vectors of the pre-aligned cross-language entity pairs S, yielding the final description vectors of all entities.
4) Computing cross-language aligned entity pairs from the multi-view entity vectors: for each entity in one language's knowledge graph, every entity of the other language's knowledge graph serves as a candidate; the distance between the entity and each candidate is computed from the entity structure vectors, entity text vectors, and entity description vectors obtained in steps 1), 2), and 3); the candidates are sorted by distance in ascending order, and the pair with the minimum distance is selected as an aligned entity pair.
Further, in step 1), the weight computation of the adjacency matrix A and the entity vector computation and loss function of the graph convolutional network are as follows:
1.1) Weight computation of the adjacency matrix A: for entities $e_i$ and $e_j$, the weight $a_{ij} \in A$ between them is computed as:

$$\mathrm{fun}(r) = \frac{\#\mathrm{Head\_Entities\_of\_}r}{\#\mathrm{Triples\_of\_}r}, \qquad \mathrm{ifun}(r) = \frac{\#\mathrm{Tail\_Entities\_of\_}r}{\#\mathrm{Triples\_of\_}r}$$

$$a_{ij} = \sum_{\langle e_i,\, r,\, e_j \rangle \in G} \mathrm{ifun}(r) + \sum_{\langle e_j,\, r,\, e_i \rangle \in G} \mathrm{fun}(r)$$

where fun(r) and ifun(r) are the influence scores of relation r in the forward and reverse directions respectively, G is the knowledge graph, #Triples_of_r is the number of relation triples involving relation r, and #Head_Entities_of_r and #Tail_Entities_of_r are the numbers of head entities and tail entities involved in the triples of relation r.
1.2) Entity vector computation in the graph convolutional network: the input of the graph convolutional network is the entity structure feature matrix $X_s \in \mathbb{R}^{n \times d_s}$, obtained by random initialization, where n is the total number of entities and $d_s$ is the dimension of the entity structure feature vectors. The overall computation of the structure graph's graph convolutional network is:

$$H_s = \sigma\!\left(\hat{A}\,\sigma\!\left(\hat{A}\, X_s\, W_s^{(0)}\right) W_s^{(1)}\right), \qquad \hat{A} = \tilde{D}^{-\frac{1}{2}}\, \tilde{A}\, \tilde{D}^{-\frac{1}{2}}$$

where $\tilde{A} = A + I$ adds an identity matrix of matching dimension to the adjacency matrix A so that each entity's own information is included, and $\tilde{D}$ is the diagonal node degree matrix of $\tilde{A}$. The weight matrices $W_s^{(0)}$ and $W_s^{(1)}$ are diagonal matrices, and the activation function σ is ReLU(·) = max(0, ·).
1.3) Loss function: each entity pair $p = (e_1, e_2) \in S$ gives a positive example pair distance; negative example pairs $p' = (e'_1, e'_2) \in S'_p$ are constructed by randomly replacing entity $e_1$ or $e_2$, where $S'_p$ is the set of negative example entity pairs. The following objective function is minimized:

$$L_s = \sum_{p \in S} \sum_{p' \in S'_p} \left[\, f_s(p) + \gamma_s - f_s(p') \,\right]_+$$

where $f_s(p) = \lVert h_s(e_1) - h_s(e_2) \rVert_1$ is the entity distance scoring function, computing the Manhattan distance between entity structure vectors; $h_s(e_1)$ and $h_s(e_2)$ are the structure vectors of entities $e_1$ and $e_2$, and $\gamma_s$ is the margin between structure vectors.
Further, in step 2), before the knowledge graphs are merged, the entity description information is preprocessed: illegal characters are filtered out, the text is segmented into words, stop words are removed, and words whose frequency in the corpus is too low are filtered out. The weight computation of the adjacency matrix A and the entity vector computation and loss function of the graph convolutional network are as follows:
2.1) Weight computation of the adjacency matrix A: the weights of the three edge types and of the text graph adjacency matrix are computed as follows:
2.1.1) "Entity-descriptor" edges:
For the edges formed between an entity and its description words, the weight is the term frequency-inverse document frequency (TF-IDF):

$$\mathrm{TF}(t, d) = \frac{n_{t,d}}{\sum_{t' \in d} n_{t',d}}, \qquad \mathrm{IDF}(t) = \log \frac{|D|}{|\{ d \in D : t \in d \}|}$$

$$\mathrm{TFIDF}(t, d) = \mathrm{TF}(t, d) \times \mathrm{IDF}(t)$$

where TF(t, d) is the frequency with which the word t appears in entity description d, $n_{t,d}$ is the number of occurrences of word t in d, $\sum_{t' \in d} n_{t',d}$ is the total number of words in d, IDF(t) is the inverse document frequency of word t in the entity description set D, |D| is the total number of entity descriptions in the set, and $|\{ d \in D : t \in d \}|$ is the number of entity descriptions containing word t.
2.1.2) Single-language "descriptor-descriptor" edges:
For the edges formed between description words of a single language, global word co-occurrence is first counted with a sliding window, and the weight of two words is then their pointwise mutual information (PMI). For any two words i and j:

$$\mathrm{PMI}(i, j) = \log \frac{p(i, j)}{p(i)\, p(j)}, \qquad p(i, j) = \frac{\#W(i, j)}{\#W}, \qquad p(i) = \frac{\#W(i)}{\#W}$$

where #W is the number of sliding windows over all entity description corpora, #W(i) is the number of sliding windows containing word i, and #W(i, j) is the number of sliding windows containing both word i and word j.
2.1.3) Cross-language "descriptor-descriptor" edge:
For the edges formed between description words across languages, the pre-aligned cross-language entity pairs S are used: each word in an entity's description text is paired with every word in its aligned entity's description, and the weight of each resulting descriptor pair is its frequency among the descriptor pairs formed by all aligned entity pairs, which strengthens the cross-language information. This method is referred to herein as X-DF (Cross Document Frequency).
For words i and j drawn from the two knowledge graphs' entity descriptions respectively, the weight is:

$$\mathrm{XDF}(i, j) = \frac{\mathrm{count}(i, j)}{\mathrm{count}(D)}$$

where count(i, j) is the number of word pairs formed by words i and j over the text descriptions of all aligned entity pairs, and count(D) is the total number of word pairs formed by the text descriptions of all aligned entity pairs.
2.1.4) Weight computation of the text graph adjacency matrix:

$$A_{ij} = \begin{cases} \mathrm{TFIDF}(i, j) & \text{if } i \text{ is an entity and } j \text{ is a description word} \\ \mathrm{PMI}(i, j) & \text{if } i, j \text{ are words of the same language and } \mathrm{PMI}(i, j) > 0 \\ \mathrm{XDF}(i, j) & \text{if } i, j \text{ are words of different languages} \\ 0 & \text{otherwise} \end{cases}$$
2.2) Entity vector computation in the graph convolutional network: the input of the graph convolutional network is the entity text feature matrix $X_t \in \mathbb{R}^{(n+m) \times d_t}$, obtained by random initialization, where n is the total number of entities, m is the total number of words, and $d_t$ is the dimension of the entity text feature vectors. The overall computation of the text graph's graph convolutional network is analogous to step 1.2):

$$H_t = \sigma\!\left(\hat{A}\,\sigma\!\left(\hat{A}\, X_t\, W_t^{(0)}\right) W_t^{(1)}\right), \qquad \hat{A} = \tilde{D}^{-\frac{1}{2}}\, \tilde{A}\, \tilde{D}^{-\frac{1}{2}}$$

where $\tilde{A} = A + I$ adds an identity matrix of matching dimension to the adjacency matrix A so that each node's own information is included, and $\tilde{D}$ is the diagonal node degree matrix of $\tilde{A}$. The weight matrices $W_t^{(0)}$ and $W_t^{(1)}$ are diagonal matrices, and the activation function σ is ReLU(·) = max(0, ·).
2.3) Loss function: each entity pair $p = (e_1, e_2) \in S$ gives a positive example pair distance; negative example pairs $p' = (e'_1, e'_2) \in S'_p$ are constructed by randomly replacing entity $e_1$ or $e_2$, where $S'_p$ is the set of negative example entity pairs. The following objective function is minimized:

$$L_t = \sum_{p \in S} \sum_{p' \in S'_p} \left[\, f_t(p) + \gamma_t - f_t(p') \,\right]_+$$

where $f_t(p) = \lVert h_t(e_1) - h_t(e_2) \rVert_1$ is the entity distance scoring function, computing the Manhattan distance between entity text vectors; $h_t(e_1)$ and $h_t(e_2)$ are the text vectors of entities $e_1$ and $e_2$, and $\gamma_t$ is the margin between text vectors.
Further, step 3) specifically comprises the following sub-steps:
3.1) Corpus processing: existing cross-language parallel corpora can be used directly; alternatively, partial corpora can be extracted from the monolingual corpora and turned into cross-language parallel corpora with a translation tool. The cross-language parallel corpus is processed into sentence-aligned form, and operations such as punctuation filtering and stop-word removal are applied to the corpus.
3.2) Pre-training cross-language word vectors: cross-language word vector representations are trained with the cross-language word embedding model BilBOWA on the two languages' monolingual corpora and the sentence-aligned parallel corpus.
3.3) Entity description vector encoding: the entity description corresponds to a sequence of pre-trained word vectors $\{ w_1, w_2, \ldots, w_{|s|} \}$, where |s| is the total number of words in the entity description and $d_d$ is the entity description vector dimension. A BiLSTM is trained to minimize the distance between aligned entities' vectors, yielding the vector representation of the entity description:

$$\overrightarrow{h_t} = \overrightarrow{\mathrm{LSTM}}\!\left(w_t, \overrightarrow{h_{t-1}}\right), \qquad \overleftarrow{h_t} = \overleftarrow{\mathrm{LSTM}}\!\left(w_t, \overleftarrow{h_{t+1}}\right)$$

$$h_t = \left[\overrightarrow{h_t};\, \overleftarrow{h_t}\right], \qquad h_d = \frac{1}{|s|} \sum_{t=1}^{|s|} h_t$$

where $h_t$ is the vector corresponding to the t-th word of the text description; averaging the vector representations of all words yields the entity description vector $h_d$.
Each entity pair $p = (e_1, e_2) \in S$ gives a positive example pair distance; negative example pairs $p' = (e'_1, e'_2) \in S'_p$ are constructed by randomly replacing entity $e_1$ or $e_2$, where $S'_p$ is the set of negative example entity pairs. The following objective function is minimized:

$$L_d = \sum_{p \in S} \sum_{p' \in S'_p} \left[\, f_d(p) + \gamma_d - f_d(p') \,\right]_+$$

where $f_d(p) = \lVert h_d(e_1) - h_d(e_2) \rVert_1$ is the entity distance scoring function, computing the Manhattan distance between entity description vectors; $h_d(e_1)$ and $h_d(e_2)$ are the description vectors of entities $e_1$ and $e_2$, and $\gamma_d$ is the margin between description vectors.
Further, in step 4), the distance between entity pairs is computed as follows:
For an entity pair $p = (e_1, e_2)$ from the two different knowledge graphs, the distance is:

$$D(p) = \alpha\, \frac{f_s(p)}{d_s} + \beta\, \frac{f_t(p)}{d_t} + (1 - \alpha - \beta)\, \frac{f_d(p)}{d_d}$$

where $d_s$, $d_t$, and $d_d$ are the dimensions of the entity structure vector, the entity text vector, and the entity description vector respectively, and α and β are hyper-parameters used to weigh the three component distances.
If only entity structure vectors and entity text vectors are used, the distance of an entity pair $p = (e_1, e_2)$ from the two different knowledge graphs is:

$$D(p) = \alpha\, \frac{f_s(p)}{d_s} + (1 - \alpha)\, \frac{f_t(p)}{d_t}$$

where α is a hyper-parameter used to weigh the two component distances.
Examples
As shown in Fig. 3, an example of the method is given; the specific steps of this implementation are described in detail below in conjunction with the method of the present invention (the flow is shown in Fig. 1 and the model in Fig. 2), as follows:
(1) Entity structure vector encoding based on relation triples: a structure graph is constructed for each of the two languages' knowledge graphs from its relation triples. The structure graph takes entities as nodes (for example, the Chinese- and English-language entities for "Batman"), forms edges between related entities (for example, "Batman"-"Superman" within each language's graph), and computes the edge weights from the relations between the entities, forming the adjacency matrix of the graph. As shown in Fig. 4, a two-layer graph convolutional network is trained on the constructed structure graph, and the graph convolutional networks of the two knowledge graphs share their weight matrices. The entity structure vector representation is optimized with a triplet loss over the pre-aligned cross-language entity pairs and positive and negative example entity pairs.
(2) Entity text vector encoding based on entity description information: the entity description information is preprocessed, filtering illegal characters, segmenting words, removing stop words, and filtering words whose frequency in the corpus is too low. The two languages' knowledge graphs are merged, and a unified text graph is built from the entities and their description texts. The text graph has two types of nodes, entity nodes (such as the Chinese- and English-language entities for "Batman") and word nodes from the entity descriptions (such as the Chinese and English words for "DC Comics"), and three types of edges: "entity-descriptor" edges (e.g., "Batman"-"hero"), single-language "descriptor-descriptor" edges (e.g., "DC Comics"-"Batman"), and cross-language "descriptor-descriptor" edges (e.g., the Chinese word for "DC Comics" paired with the English "DC Comics"). A weight is computed for each type of edge, forming the adjacency matrix. As shown in Fig. 5, a two-layer graph convolutional network is trained on the constructed text graph, and the entity text vector representation is optimized with a triplet loss over the pre-aligned cross-language entity pairs S and positive and negative example entity pairs.
(3) Entity description vector encoding based on entity description information and cross-language corpora: cross-lingually aligned word vectors are pre-trained with BilBOWA on the two languages' monolingual corpora and a cross-language parallel corpus; the sequence of word vectors of each entity description is then fed as input to a bidirectional long short-term memory network, which encodes the description into an entity description vector. The network is optimized by minimizing the distances between the description vectors of the pre-aligned cross-language entity pairs S, yielding the final description vectors of all entities.
(4) Computing cross-language aligned entity pairs from the multi-view entity vectors: for each entity in one language's knowledge graph, every entity of the other language's knowledge graph serves as a candidate; the distance between the entity and each candidate is computed from the entity structure vector, entity text vector, and entity description vector (each 100-dimensional), and the pair with the minimum distance is selected as an aligned entity pair, finally obtaining the aligned pair of the Chinese and English "Batman" entities.
The cross-language entity alignment results of this example are shown in Table 1; the model of this method is denoted STGCN. SE, TE, and DE denote entity structure encoding, entity text encoding, and entity description encoding respectively. The evaluation metric Hits@k is the probability that the aligned entity is hit within the top k candidates when finding aligned entities for all entities of the current language. The final experimental results exceed those of the other methods shown on the Chinese dataset of the public dataset DBP15K, reaching an alignment accuracy of 56.1%.
TABLE 1. Cross-language entity alignment experimental results (the table is provided as an image in the original publication)
When entity structure encoding and entity text encoding are adopted, Fig. 6 shows the results for different proportions of pre-aligned entity pairs on the English dataset of DBP15K; compared with other methods, the best results are achieved across data amounts from small to large, and the advantage is greater when the data amount is large.
The above-described embodiments are intended to illustrate rather than to limit the invention, and any modifications and variations of the present invention are within the spirit of the invention and the scope of the appended claims.

Claims (7)

1. A cross-language entity alignment method based on knowledge graph multi-view information is characterized by comprising the following steps:
1) Entity structure vector encoding based on relation triples: a structure graph is constructed for each of the two languages' knowledge graphs from its relation triples; the structure graph takes entities as nodes and places an edge between any two entities that share a relation; the edge weights are computed from the relations between the entities, forming the adjacency matrix of the graph; a two-layer graph convolutional network is trained on the constructed structure graph, continuously updating each entity's vector representation from the entity itself and the encodings of its neighboring entities; the graph convolutional networks of the two knowledge graphs share their weight matrices; the entity structure vector representation is optimized with a triplet loss over the pre-aligned cross-language entity pairs S and positive and negative example entity pairs.
2) Entity text vector encoding based on entity description information: the two languages' knowledge graphs are merged and a unified text graph is built from the entities and their description texts; the text graph has two types of nodes, entity nodes and word nodes from the entity descriptions, and three types of edges: "entity-descriptor" edges, single-language "descriptor-descriptor" edges, and cross-language "descriptor-descriptor" edges; a weight is computed for each type of edge, forming the adjacency matrix; a two-layer graph convolutional network is trained on the constructed text graph, and the entity text vector representation is optimized with a triplet loss over the pre-aligned cross-language entity pairs S and positive and negative example entity pairs.
3) Entity description vector encoding based on entity description information and cross-language corpora: cross-lingually aligned word vectors are pre-trained with BilBOWA on the two languages' monolingual corpora and a cross-language parallel corpus; the sequence of word vectors of each entity description is then fed as input to a bidirectional long short-term memory network (BiLSTM), which encodes the description into an entity description vector; the network is optimized by minimizing the distances between the description vectors of the pre-aligned cross-language entity pairs S, yielding the final description vectors of all entities.
4) Computing cross-language aligned entity pairs from the multi-view entity vectors: for each entity in one language's knowledge graph, every entity of the other language's knowledge graph serves as a candidate; the distance between the entity and each candidate is computed from the entity structure vectors, entity text vectors, and entity description vectors obtained in steps 1), 2), and 3); the candidates are sorted by distance in ascending order, and the pair with the minimum distance is selected as an aligned entity pair.
2. The method according to claim 1, wherein in step 1), the weight computation of the adjacency matrix A and the entity vector computation and loss function of the graph convolutional network are as follows:
1.1) Weight computation of the adjacency matrix A: for entities $e_i$ and $e_j$, the weight $a_{ij} \in A$ between them is computed as:

$$\mathrm{fun}(r) = \frac{\#\mathrm{Head\_Entities\_of\_}r}{\#\mathrm{Triples\_of\_}r}, \qquad \mathrm{ifun}(r) = \frac{\#\mathrm{Tail\_Entities\_of\_}r}{\#\mathrm{Triples\_of\_}r}$$

$$a_{ij} = \sum_{\langle e_i,\, r,\, e_j \rangle \in G} \mathrm{ifun}(r) + \sum_{\langle e_j,\, r,\, e_i \rangle \in G} \mathrm{fun}(r)$$

where fun(r) and ifun(r) are the influence scores of relation r in the forward and reverse directions respectively, G is the knowledge graph, #Triples_of_r is the number of relation triples involving relation r, and #Head_Entities_of_r and #Tail_Entities_of_r are the numbers of head entities and tail entities involved in the triples of relation r.
1.2) Entity vector computation in the graph convolutional network: the input of the graph convolutional network is the entity structure feature matrix $X_s \in \mathbb{R}^{n \times d_s}$, obtained by random initialization, where n is the total number of entities and $d_s$ is the dimension of the entity structure feature vectors; the overall computation of the structure graph's graph convolutional network is:

$$H_s = \sigma\!\left(\hat{A}\,\sigma\!\left(\hat{A}\, X_s\, W_s^{(0)}\right) W_s^{(1)}\right), \qquad \hat{A} = \tilde{D}^{-\frac{1}{2}}\, \tilde{A}\, \tilde{D}^{-\frac{1}{2}}$$

where $\tilde{A} = A + I$ adds an identity matrix of matching dimension to the adjacency matrix A so that each entity's own information is included, and $\tilde{D}$ is the diagonal node degree matrix of $\tilde{A}$; the weight matrices $W_s^{(0)}$ and $W_s^{(1)}$ are diagonal matrices, and the activation function σ is ReLU(·) = max(0, ·).
1.3) Loss function: each entity pair $p = (e_1, e_2) \in S$ gives a positive example pair distance; negative example pairs $p' = (e'_1, e'_2) \in S'_p$ are constructed by randomly replacing entity $e_1$ or $e_2$, where $S'_p$ is the set of negative example entity pairs; the following objective function is minimized:

$$L_s = \sum_{p \in S} \sum_{p' \in S'_p} \left[\, f_s(p) + \gamma_s - f_s(p') \,\right]_+$$

where $f_s(p) = \lVert h_s(e_1) - h_s(e_2) \rVert_1$ is the entity distance scoring function, computing the Manhattan distance between entity structure vectors; $h_s(e_1)$ and $h_s(e_2)$ are the structure vectors of entities $e_1$ and $e_2$; $\gamma_s$ is the margin between structure vectors.
3. The method as claimed in claim 1, wherein in step 2), before the knowledge graphs are merged, the entity description information is preprocessed: illegal characters are filtered out, the text is segmented into words, stop words are removed, and words whose frequency in the corpus is too low are filtered out.
4. The method according to claim 1, wherein in step 2), the weight computation of the adjacency matrix A and the entity vector computation and loss function of the graph convolutional network are as follows:
2.1) Weight computation of the adjacency matrix A: the weights of the three edge types and of the text graph adjacency matrix are computed as follows:
2.1.1) "Entity-descriptor" edges:
For the edges formed between an entity and its description words, the weight is the term frequency-inverse document frequency (TF-IDF):

$$\mathrm{TF}(t, d) = \frac{n_{t,d}}{\sum_{t' \in d} n_{t',d}}, \qquad \mathrm{IDF}(t) = \log \frac{|D|}{|\{ d \in D : t \in d \}|}$$

$$\mathrm{TFIDF}(t, d) = \mathrm{TF}(t, d) \times \mathrm{IDF}(t)$$

where TF(t, d) is the frequency with which the word t appears in entity description d, $n_{t,d}$ is the number of occurrences of word t in d, $\sum_{t' \in d} n_{t',d}$ is the total number of words in d, IDF(t) is the inverse document frequency of word t in the entity description set D, |D| is the total number of entity descriptions in the set, and $|\{ d \in D : t \in d \}|$ is the number of entity descriptions containing word t.
2.1.2) Single-language "descriptor-descriptor" edges:
For the edges formed between description words of a single language, global word co-occurrence is first counted with a sliding window, and the weight of two words is then their pointwise mutual information (PMI). For any two words i and j:

$$\mathrm{PMI}(i, j) = \log \frac{p(i, j)}{p(i)\, p(j)}, \qquad p(i, j) = \frac{\#W(i, j)}{\#W}, \qquad p(i) = \frac{\#W(i)}{\#W}$$

where #W is the number of sliding windows over all entity description corpora, #W(i) is the number of sliding windows containing word i, and #W(i, j) is the number of sliding windows containing both word i and word j.
2.1.3) Cross-language "descriptor-descriptor" edges:
For the edges formed between description words across languages, the pre-aligned cross-language entity pairs S are used: each word in an entity's description text is paired with every word in its aligned entity's description, and the weight of each resulting descriptor pair is its frequency among the descriptor pairs formed by all aligned entity pairs, which strengthens the cross-language information; for words i and j drawn from the two knowledge graphs' entity descriptions respectively, the weight is:

$$\mathrm{XDF}(i, j) = \frac{\mathrm{count}(i, j)}{\mathrm{count}(D)}$$

where count(i, j) is the number of word pairs formed by words i and j over the text descriptions of all aligned entity pairs, and count(D) is the total number of word pairs formed by the text descriptions of all aligned entity pairs.
2.1.4) Weight computation of the text graph adjacency matrix:

$$A_{ij} = \begin{cases} \mathrm{TFIDF}(i, j) & \text{if } i \text{ is an entity and } j \text{ is a description word} \\ \mathrm{PMI}(i, j) & \text{if } i, j \text{ are words of the same language and } \mathrm{PMI}(i, j) > 0 \\ \mathrm{XDF}(i, j) & \text{if } i, j \text{ are words of different languages} \\ 0 & \text{otherwise} \end{cases}$$
2.2) Entity vector computation in the graph convolutional network: the input of the graph convolutional network is the entity text feature matrix $X_t \in \mathbb{R}^{(n+m) \times d_t}$, obtained by random initialization, where n is the total number of entities, m is the total number of words, and $d_t$ is the dimension of the entity text feature vectors; the overall computation of the text graph's graph convolutional network is:

$$H_t = \sigma\!\left(\hat{A}\,\sigma\!\left(\hat{A}\, X_t\, W_t^{(0)}\right) W_t^{(1)}\right), \qquad \hat{A} = \tilde{D}^{-\frac{1}{2}}\, \tilde{A}\, \tilde{D}^{-\frac{1}{2}}$$

where $\tilde{A} = A + I$ adds an identity matrix of matching dimension to the adjacency matrix A so that each node's own information is included, and $\tilde{D}$ is the diagonal node degree matrix of $\tilde{A}$; the weight matrices $W_t^{(0)}$ and $W_t^{(1)}$ are diagonal matrices, and the activation function σ is ReLU(·) = max(0, ·).
2.3) Loss function: each entity pair $p = (e_1, e_2) \in S$ gives a positive example pair distance; negative example pairs $p' = (e'_1, e'_2) \in S'_p$ are constructed by randomly replacing entity $e_1$ or $e_2$, where $S'_p$ is the set of negative example entity pairs; the following objective function is minimized:

$$L_t = \sum_{p \in S} \sum_{p' \in S'_p} \left[\, f_t(p) + \gamma_t - f_t(p') \,\right]_+$$

where $f_t(p) = \lVert h_t(e_1) - h_t(e_2) \rVert_1$ is the entity distance scoring function, computing the Manhattan distance between entity text vectors; $h_t(e_1)$ and $h_t(e_2)$ are the text vectors of entities $e_1$ and $e_2$; $\gamma_t$ is the margin between text vectors.
5. The method for aligning cross-language entities based on knowledge graph multi-view information according to claim 1, wherein step 3) comprises the following sub-steps:
3.1) Corpus processing: the cross-language parallel corpus is processed into sentence-aligned form.
3.2) Pre-training cross-language word vectors: cross-language word vector representations are trained with the cross-language word embedding model BilBOWA on the two languages' monolingual corpora and the sentence-aligned parallel corpus.
3.3) Entity description vector encoding: the entity description corresponds to a sequence of pre-trained word vectors $\{ w_1, w_2, \ldots, w_{|s|} \}$, where |s| is the total number of words in the entity description and $d_d$ is the entity description vector dimension; a BiLSTM is trained to minimize the distance between aligned entities' vectors, yielding the vector representation of the entity description:

$$\overrightarrow{h_t} = \overrightarrow{\mathrm{LSTM}}\!\left(w_t, \overrightarrow{h_{t-1}}\right), \qquad \overleftarrow{h_t} = \overleftarrow{\mathrm{LSTM}}\!\left(w_t, \overleftarrow{h_{t+1}}\right)$$

$$h_t = \left[\overrightarrow{h_t};\, \overleftarrow{h_t}\right], \qquad h_d = \frac{1}{|s|} \sum_{t=1}^{|s|} h_t$$

where $h_t$ is the vector corresponding to the t-th word of the text description; averaging the vector representations of all words yields the entity description vector $h_d$.
Each entity pair $p = (e_1, e_2) \in S$ gives a positive example pair distance; negative example pairs $p' = (e'_1, e'_2) \in S'_p$ are constructed by randomly replacing entity $e_1$ or $e_2$, where $S'_p$ is the set of negative example entity pairs; the following objective function is minimized:

$$L_d = \sum_{p \in S} \sum_{p' \in S'_p} \left[\, f_d(p) + \gamma_d - f_d(p') \,\right]_+$$

where $f_d(p) = \lVert h_d(e_1) - h_d(e_2) \rVert_1$ is the entity distance scoring function, computing the Manhattan distance between entity description vectors; $h_d(e_1)$ and $h_d(e_2)$ are the description vectors of entities $e_1$ and $e_2$; $\gamma_d$ is the margin between description vectors.
6. The method according to claim 5, wherein in step 3.1), existing cross-language parallel corpora can be used directly, or partial corpora can be extracted from monolingual corpora and turned into cross-language parallel corpora with a translation tool; the cross-language parallel corpus is processed into sentence-aligned form, operations such as punctuation filtering and stop-word removal are applied to the corpus, and cross-language word vector pre-training is then performed.
7. The method according to claim 1, wherein in step 4), the distance between entity pairs is computed as follows:
For an entity pair $p = (e_1, e_2)$ from the two different knowledge graphs, the distance is:

$$D(p) = \alpha\, \frac{f_s(p)}{d_s} + \beta\, \frac{f_t(p)}{d_t} + (1 - \alpha - \beta)\, \frac{f_d(p)}{d_d}$$

where $d_s$, $d_t$, and $d_d$ are the dimensions of the entity structure vector, the entity text vector, and the entity description vector respectively, and α and β are hyper-parameters used to weigh the three component distances;
if only entity structure vectors and entity text vectors are used, the distance of an entity pair $p = (e_1, e_2)$ from the two different knowledge graphs is:

$$D(p) = \alpha\, \frac{f_s(p)}{d_s} + (1 - \alpha)\, \frac{f_t(p)}{d_t}$$

where α is a hyper-parameter used to weigh the two component distances.
CN202010512003.9A 2020-06-08 2020-06-08 Cross-language entity alignment method based on knowledge graph multi-view information Active CN111680488B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010512003.9A CN111680488B (en) 2020-06-08 2020-06-08 Cross-language entity alignment method based on knowledge graph multi-view information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010512003.9A CN111680488B (en) 2020-06-08 2020-06-08 Cross-language entity alignment method based on knowledge graph multi-view information

Publications (2)

Publication Number Publication Date
CN111680488A true CN111680488A (en) 2020-09-18
CN111680488B CN111680488B (en) 2023-07-21

Family

ID=72453997

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010512003.9A Active CN111680488B (en) 2020-06-08 2020-06-08 Cross-language entity alignment method based on knowledge graph multi-view information

Country Status (1)

Country Link
CN (1) CN111680488B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112287123A (en) * 2020-11-19 2021-01-29 国网湖南省电力有限公司 Entity alignment method and device based on edge type attention mechanism
CN112287126A (en) * 2020-12-24 2021-01-29 中国人民解放军国防科技大学 Entity alignment method and device suitable for multi-mode knowledge graph
CN112380864A (en) * 2020-11-03 2021-02-19 广西大学 Text triple labeling sample enhancement method based on translation
CN113487088A (en) * 2021-07-06 2021-10-08 哈尔滨工业大学(深圳) Traffic prediction method and device based on dynamic space-time diagram convolution attention model
CN113987121A (en) * 2021-10-21 2022-01-28 泰康保险集团股份有限公司 Question-answer processing method, device, equipment and readable medium of multi-language reasoning model
CN114357114A (en) * 2022-01-04 2022-04-15 新华智云科技有限公司 Entity cleaning method and system based on unsupervised learning
CN114896394A (en) * 2022-04-18 2022-08-12 桂林电子科技大学 Event trigger detection and classification method based on multi-language pre-training model
CN115795060A (en) * 2023-02-06 2023-03-14 吉奥时空信息技术股份有限公司 Entity alignment method based on knowledge enhancement
CN117435714A (en) * 2023-12-20 2024-01-23 湖南紫薇垣信息系统有限公司 Knowledge graph-based database and middleware problem intelligent diagnosis system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110232186A (en) * 2019-05-20 2019-09-13 浙江大学 The knowledge mapping for merging entity description, stratification type and text relation information indicates learning method
CN110704576A (en) * 2019-09-30 2020-01-17 北京邮电大学 Text-based entity relationship extraction method and device
CN110955780A (en) * 2019-10-12 2020-04-03 中国人民解放军国防科技大学 Entity alignment method for knowledge graph

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110232186A (en) * 2019-05-20 2019-09-13 浙江大学 The knowledge mapping for merging entity description, stratification type and text relation information indicates learning method
CN110704576A (en) * 2019-09-30 2020-01-17 北京邮电大学 Text-based entity relationship extraction method and device
CN110955780A (en) * 2019-10-12 2020-04-03 中国人民解放军国防科技大学 Entity alignment method for knowledge graph

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
HONG YANG et al.: "Guiding Cross-lingual Entity Alignment via Adversarial Knowledge Embedding"
ZHANG Hong, WU Fei, et al.: "A Cross-media Retrieval Method Based on Content Correlation"
YANG Qian: "Research on Multi-granularity Relation Linking Technology in Knowledge Graphs"
WANG Weiwei; WANG Zhigang; PAN Liangming; LIU Yang; ZHANG Jiangtao: "Research on the Construction of a Bilingual Film and Television Knowledge Graph"
SU Jialin; WANG Yuanzhuo; JIN Xiaolong; CHENG Xueqi: "Entity Alignment Method with Adaptive Attribute Selection"

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112380864A (en) * 2020-11-03 2021-02-19 广西大学 Text triple labeling sample enhancement method based on translation
CN112287123A (en) * 2020-11-19 2021-01-29 国网湖南省电力有限公司 Entity alignment method and device based on edge type attention mechanism
CN112287123B (en) * 2020-11-19 2022-02-22 国网湖南省电力有限公司 Entity alignment method and device based on edge type attention mechanism
CN112287126B (en) * 2020-12-24 2021-03-19 中国人民解放军国防科技大学 Entity alignment method and device suitable for multi-mode knowledge graph
CN112287126A (en) * 2020-12-24 2021-01-29 中国人民解放军国防科技大学 Entity alignment method and device suitable for multi-mode knowledge graph
CN113487088A (en) * 2021-07-06 2021-10-08 哈尔滨工业大学(深圳) Traffic prediction method and device based on dynamic space-time diagram convolution attention model
CN113987121A (en) * 2021-10-21 2022-01-28 泰康保险集团股份有限公司 Question-answer processing method, device, equipment and readable medium of multi-language reasoning model
CN114357114A (en) * 2022-01-04 2022-04-15 新华智云科技有限公司 Entity cleaning method and system based on unsupervised learning
CN114896394A (en) * 2022-04-18 2022-08-12 桂林电子科技大学 Event trigger detection and classification method based on multi-language pre-training model
CN114896394B (en) * 2022-04-18 2024-04-05 桂林电子科技大学 Event trigger word detection and classification method based on multilingual pre-training model
CN115795060A (en) * 2023-02-06 2023-03-14 吉奥时空信息技术股份有限公司 Entity alignment method based on knowledge enhancement
CN117435714A (en) * 2023-12-20 2024-01-23 湖南紫薇垣信息系统有限公司 Knowledge graph-based database and middleware problem intelligent diagnosis system
CN117435714B (en) * 2023-12-20 2024-03-08 湖南紫薇垣信息系统有限公司 Knowledge graph-based database and middleware problem intelligent diagnosis system

Also Published As

Publication number Publication date
CN111680488B (en) 2023-07-21

Similar Documents

Publication Publication Date Title
CN111680488B (en) Cross-language entity alignment method based on knowledge graph multi-view information
CN108197111B (en) Text automatic summarization method based on fusion semantic clustering
WO2019080863A1 (en) Text sentiment classification method, storage medium and computer
Alkhatlan et al. Word sense disambiguation for arabic exploiting arabic wordnet and word embedding
CN109446333A (en) A kind of method that realizing Chinese Text Categorization and relevant device
CN112417854A (en) Chinese document abstraction type abstract method
CN109101490B (en) Factual implicit emotion recognition method and system based on fusion feature representation
Panda Developing an efficient text pre-processing method with sparse generative Naive Bayes for text mining
CN107180026A (en) The event phrase learning method and device of a kind of word-based embedded Semantic mapping
Ouyang et al. Spatial pyramid pooling mechanism in 3D convolutional network for sentence-level classification
CN114969304A (en) Case public opinion multi-document generation type abstract method based on element graph attention
CN111144410A (en) Cross-modal image semantic extraction method, system, device and medium
Errami et al. Sentiment Analysis onMoroccan Dialect based on ML and Social Media Content Detection
CN112163089A (en) Military high-technology text classification method and system fusing named entity recognition
Aggarwal et al. " Did you really mean what you said?": Sarcasm Detection in Hindi-English Code-Mixed Data using Bilingual Word Embeddings
Jin et al. Multi-label sentiment analysis base on BERT with modified TF-IDF
Jia et al. Attention in character-based BiLSTM-CRF for Chinese named entity recognition
CN117251524A (en) Short text classification method based on multi-strategy fusion
Basu et al. Multimodal sentiment analysis of# metoo tweets using focal loss (grand challenge)
CN115759119A (en) Financial text emotion analysis method, system, medium and equipment
Kumar et al. A reliable technique for sentiment analysis on tweets via machine learning and bert
Nabil et al. Cufe at semeval-2016 task 4: A gated recurrent model for sentiment classification
Sarhan et al. Arabic relation extraction: A survey
Uddin et al. Extracting severe negative sentence pattern from bangla data via long short-term memory neural network
Dongjie et al. Multimodal knowledge learning for named entity disambiguation

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant