CN107545033B - Knowledge base entity classification calculation method based on representation learning


Publication number
CN107545033B
Authority
CN
China
Prior art keywords
entity, word, category, representing, entities
Legal status
Active
Application number
CN201710608234.8A
Other languages
Chinese (zh)
Other versions
CN107545033A (en)
Inventor
李涓子
侯磊
金海龙
张鹏
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN201710608234.8A priority Critical patent/CN107545033B/en
Publication of CN107545033A publication Critical patent/CN107545033A/en
Application granted granted Critical
Publication of CN107545033B publication Critical patent/CN107545033B/en

Landscapes

  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a computing method for knowledge base entity classification based on representation learning, in the fields of text classification and knowledge base completion. The method comprises the following steps: constructing co-occurrence networks containing different levels of information for the entities in a knowledge base, encoding the co-occurrence information among words, entities, and categories into word-word, entity-word, category-word, and entity-category networks; learning vector representations of entities and categories over the constructed co-occurrence networks using a network-based representation learning method; learning mapping matrices for entities and categories from the learned vector representations using a learning-to-rank algorithm, so that semantically related entities and categories are close to each other in a semantic space; and automatically assigning categories to the entities in the knowledge base with a top-down search to obtain a category path. The method of the invention helps solve the problems of existing entity classification methods.

Description

Knowledge base entity classification calculation method based on representation learning
Technical Field
The invention relates to the technical field of text classification and knowledge base completion, in particular to a knowledge base entity classification calculation method based on representation learning.
Background
This section is intended to introduce the reader to various aspects of art that may be related to various aspects of the present invention, and is believed to provide the reader with useful background information to facilitate a better understanding of the various aspects of the present invention. Accordingly, it should be understood that the description in this section is for purposes of illustration and is not an admission of prior art.
In recent years, knowledge bases have attracted increasing research interest. Most existing knowledge bases are incomplete, and many researchers are dedicated to knowledge base completion. Assigning categories to entities in a knowledge base is an important completion task. Entity category information plays a very important role in a knowledge base and benefits tasks such as question answering, recommendation, and relation extraction. The current main research direction is to assign fine-grained categories to entities, as fine-grained categories provide richer semantic information.
Existing research generally classifies entities in a knowledge base with multi-class machine learning algorithms, i.e., the entity classification task in a knowledge base is treated as a traditional text classification problem in natural language processing. The main steps are to define features based on the knowledge base and then predict categories with a traditional multi-class algorithm. In recent years, representation learning has developed rapidly and greatly helps the entity classification task: a common approach is to define features for entities and for categories separately, and then map both into the same semantic space, enabling inference of entity categories with good results.
However, existing entity classification algorithms face two major problems. First, it is difficult to design effective features for entities in a knowledge base: unlike entities appearing in running context, knowledge base entities carry little contextual semantic information, yet they come with rich textual and structural information, so they need to be represented in a reasonable way. Second, the hierarchical relationship among categories is not fully considered: the categories in a knowledge base form a tree whose structure carries information, and existing methods do not fully exploit the hierarchy of the classification tree.
Disclosure of Invention
The technical problem to be solved is how to provide a computing method for knowledge base entity classification based on representation learning.
Aiming at the defects of the prior art, the invention provides a computing method for knowledge base entity classification based on representation learning, which can better solve the problems of existing knowledge base entity classification methods.
In a first aspect, the present invention provides a computing method for knowledge base entity classification based on representation learning, comprising the steps of:
A: for entities in a knowledge base with given category labels, constructing four co-occurrence networks (word-word, entity-word, category-word, and entity-category) and integrating the semantic information into the 4 heterogeneous co-occurrence networks;
B: based on the 4 heterogeneous co-occurrence networks, learning a vector representation of each entity and each category with a network-based representation learning algorithm;
C: based on the vector representations of entities and categories, learning the mapping matrices of entities and categories with a learning-to-rank algorithm, and mapping entities and categories into the same semantic space;
D: computing the similarity between entities and categories from the vector representations and the mapping matrices, and assigning category paths to unlabeled entities with a top-down search.
Optionally, the step A includes:
A1: constructing the word-word co-occurrence network G_ww, which describes word-level co-occurrence information in the entity descriptions and is formally denoted G_ww = (V, E_ww); each node represents a word, and the weight ω_ij on an edge represents the number of co-occurrences of the two words in the text;
A2: constructing the entity-word co-occurrence network G_ew, a bipartite graph composed of entities and words, formally denoted G_ew = (𝓔 ∪ V, E_ew); the weight ω_ij on an edge represents the number of occurrences of word w_j in the text description of entity e_i;
A3: constructing the category-word co-occurrence network G_tw, a bipartite graph composed of categories and words, formally denoted G_tw = (𝓣 ∪ V, E_tw); the weight ω_ij on an edge represents the number of occurrences of word w_j under category t_i;
A4: constructing the entity-category co-occurrence network G_et, a bipartite graph composed of entities and categories, formally denoted G_et = (𝓔 ∪ 𝓣, E_et); there is an edge (ω_ij = 1) between entity e_i and category t_j if and only if entity e_i belongs to category t_j;
wherein ω_ij represents the weight on an edge; w_i represents a word; t_i represents a category; e_i represents an entity; e_i (bold) represents the vector representation of entity e_i; t_i (bold) represents the vector representation of category t_i; 𝓔 denotes the set of all entities and 𝓣 the set of all categories.
Optionally, the step B includes the steps of:
based on the 4 heterogeneous co-occurrence networks G_ww, G_ew, G_tw and G_et obtained, learning a vector representation of each entity e_i and category t_j using the PTE algorithm;
B1: for any bipartite graph G = (V_A ∪ V_B, E), where V_A and V_B are disjoint node sets and E is the set of edges, defining the conditional probability that v_j ∈ V_B generates v_i ∈ V_A as:

p(v_i | v_j) = exp(u_i · u_j) / Σ_{i′ ∈ V_A} exp(u_{i′} · u_j)

where u_i and u_j are the vector representations of v_i and v_j; for any v_j ∈ V_B, this defines a conditional distribution p(·|v_j) over all nodes in V_A;
B2: based on the conditional distribution defined in B1 for each node, for every v_j ∈ V_B, making the conditional distribution p(·|v_j) approach the empirical distribution p̂(·|v_j), the closeness of the two distributions being measured by the KL divergence:

O = Σ_j λ_j · KL( p̂(·|v_j) ‖ p(·|v_j) )

where λ_j = Σ_i w_ij denotes the degree of node v_j and the empirical distribution is computed as p̂(v_i | v_j) = w_ij / λ_j; the objective function simplifies to O = −Σ_{(i,j)∈E} w_ij log p(v_i | v_j);
B3: based on the objective function defined in B2, defining a corresponding objective function O_ww, O_ew, O_et and O_tw for each bipartite graph defined in A, and summing the objective functions:

O_n = O_ww + O_ew + O_et + O_tw

which is optimized jointly to obtain the vector representation of each entity and category, E_emb = {e_i} and T_emb = {t_i}.
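The simplification in B2 follows by expanding the KL divergence and dropping the term that does not depend on the model; a short sketch of that step, using λ_j · p̂(v_i|v_j) = w_ij:

```latex
O = \sum_{j} \lambda_j \, \mathrm{KL}\!\left(\hat p(\cdot\,|\,v_j) \,\middle\|\, p(\cdot\,|\,v_j)\right)
  = \sum_{j} \lambda_j \sum_{i} \hat p(v_i|v_j) \log \frac{\hat p(v_i|v_j)}{p(v_i|v_j)}
  = \underbrace{\sum_{(i,j)\in E} w_{ij} \log \hat p(v_i|v_j)}_{\text{constant in the model}}
    \;-\; \sum_{(i,j)\in E} w_{ij} \log p(v_i|v_j)
```

Minimizing O is therefore equivalent to minimizing the simplified objective in B2.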
Optionally, the step C includes the steps of:
C1: defining precedence relationships between pairs of categories;
C2: based on the precedence relationships between categories defined in C1, learning the mapping matrices of entities and categories, and mapping entities and categories into the same semantic space, in which semantically related entities and categories are close to each other:

Φ_e(e_i) = U · e_i
Φ_t(t_j) = V · t_j

wherein U denotes the mapping (projection) matrix for entity vectors; Φ_e(e_i) denotes the projection of entity vector e_i, computed with the projection matrix U; V here denotes the mapping (projection) matrix for category vectors; Φ_t(t_j) denotes the projection of category vector t_j, computed with the projection matrix V; s(e_i, t_j) denotes the similarity between entity e_i and category t_j; the network symbols (G_ww, G_ew, G_tw, G_et and their node and edge sets) are as defined in step A and in the symbol list following step D.
Optionally, in the step C2, the precedence relationship between the two categories includes a first (ancestor) order, whose defining inequality appears in the original only as an equation image; here l(t_i, t_j) denotes the distance between categories t_i and t_j in the classification tree. The objective function based on the first precedence relationship, likewise given in the original as equation images, sums a ranking loss over each category t_k on the category path p(e) of an entity e and its ancestor nodes A(t_k), where a weighting function maps the rank to a floating-point weight and s(e, t_k) denotes the inner product of Φ_e(e) and Φ_t(t_k).
Optionally, in the step C2, the precedence relationship between the two categories further includes a second (sibling) order, likewise given in the original as equation images, where S(t_k) denotes the sibling nodes of category t_k. Summing over all entities with category label information yields the overall objective function:

[overall objective function given in the original as an equation image]

The objective function is solved with the stochastic gradient descent (SGD) algorithm, learning the mapping matrices U and V of entities and categories.
Optionally, in the step D, the category paths of unlabeled entities are predicted with a top-down search strategy, based on the vector representations of entities and categories obtained in step B and the mapping matrices obtained in step C.
Optionally, in the step D, starting from the root node of the classification tree, the best-matching category for the current entity is found at each level by computing the similarity between the entity and the categories, and the search recurses until it terminates at a leaf node or the similarity falls below a threshold, the similarity between an entity and a category being computed as:

s(e_i, t_j) = Φ_e(e_i) · Φ_t(t_j)

Computing the similarity uses the vector representations of entities and categories (e_i and t_j) and the mapping matrices of entities and categories (U and V); the whole procedure is a top-down search, and the predicted result naturally forms a category path, meeting the requirements of the fine-grained entity classification task.
According to the technical scheme above, the computing method for knowledge base entity classification based on representation learning provided by the invention constructs information networks from the text descriptions of entities and then learns low-dimensional dense vector representations of entities and categories from the networks, so that no features need to be defined manually for entities, effectively solving the entity representation problem; using a learning-to-rank algorithm with precedence relationships defined between pairs of categories, entities and categories are mapped into the same semantic space, fully considering the hierarchical relationships among categories and effectively solving the hierarchical classification problem. On the one hand, the method starts from large-scale text, constructs networks containing different kinds of information, and obtains vector representations of entities and categories with a representation learning algorithm, without manually defined features, effectively addressing the difficulty of representing entities in a knowledge base. On the other hand, a learning-to-rank algorithm maps entities and categories into the same semantic space through the defined precedence relationships between categories, enabling top-down category inference; the hierarchical relationships among categories are thus effectively incorporated into the model, which suits the hierarchical classification problem.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, a brief description will be given below of the drawings required for the description of the embodiments or the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flow chart illustrating a method for computing knowledge base entity classification based on representation learning according to an embodiment of the present invention;
FIG. 2 is a flow chart of a computing method for knowledge base entity classification based on representation learning according to another embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1 and fig. 2, the invention provides flow diagrams of the computing method for knowledge base entity classification based on representation learning. As shown in fig. 1, the method includes:
Step A: constructing 4 heterogeneous co-occurrence networks, namely the word-word, entity-word, category-word (type-word) and entity-category (entity-type) co-occurrence networks, each of which can be regarded as a bipartite graph.
The step A specifically comprises the following steps:
A1: constructing the word-word co-occurrence network G_ww, which describes word-level co-occurrence information in the entity descriptions and is formally denoted G_ww = (V, E_ww); each node represents a word, and the weight ω_ij on an edge represents the number of co-occurrences of the two words in the text (within a given co-occurrence window).
A2: constructing the entity-word co-occurrence network G_ew, a bipartite graph composed of entities and words, formally denoted G_ew = (𝓔 ∪ V, E_ew); the weight ω_ij on an edge represents the number of occurrences of word w_j in the text description of entity e_i.
A3: constructing the category-word co-occurrence network G_tw, a bipartite graph composed of categories and words, formally denoted G_tw = (𝓣 ∪ V, E_tw); the weight ω_ij on an edge represents the number of occurrences of word w_j under category t_i. Specifically, the occurrences of w_j in the text description of each entity under category t_i are counted and summed, giving the total number of occurrences of w_j over all entities under t_i.
A4: constructing the entity-category co-occurrence network G_et, a bipartite graph composed of entities and categories, formally denoted G_et = (𝓔 ∪ 𝓣, E_et); there is an edge (ω_ij = 1) between entity e_i and category t_j if and only if entity e_i belongs to category t_j.
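To make step A concrete, the following is a minimal Python sketch of the four-network construction. The input format (each entity as a tuple (entity_id, category_path, tokens)) and all names are illustrative assumptions, not taken from the patent.

```python
from collections import defaultdict

def build_networks(entities, window=5):
    """Build the four heterogeneous co-occurrence networks as edge-weight dicts.
    `entities`: iterable of (entity_id, category_path, tokens); this input
    layout is assumed for illustration."""
    G_ww = defaultdict(int)  # (word, word) -> co-occurrence count (A1)
    G_ew = defaultdict(int)  # (entity, word) -> occurrence count (A2)
    G_tw = defaultdict(int)  # (category, word) -> occurrence count (A3)
    G_et = {}                # (entity, category) -> 1 (A4)

    for eid, categories, tokens in entities:
        for i, w in enumerate(tokens):
            # A1: words co-occurring within the given window
            for w2 in tokens[i + 1:i + 1 + window]:
                G_ww[(w, w2)] += 1
                G_ww[(w2, w)] += 1
            # A2: word occurrences in this entity's description
            G_ew[(eid, w)] += 1
        for t in categories:
            # A3: sum word counts over all entities under category t
            for w in tokens:
                G_tw[(t, w)] += 1
            # A4: edge of weight 1 iff the entity belongs to the category
            G_et[(eid, t)] = 1
    return G_ww, G_ew, G_tw, G_et
```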
Step B: based on the 4 heterogeneous co-occurrence networks G_ww, G_ew, G_tw and G_et obtained in step A, learning the vector representation of each entity e_i and category t_j with a network-based representation learning algorithm, such that semantically similar entities have similar representations and semantically similar categories have similar representations.
The step B specifically comprises the following steps:
B1: for any bipartite graph G = (V_A ∪ V_B, E), where V_A and V_B are disjoint node sets and E is the set of edges, the conditional probability that v_j ∈ V_B generates v_i ∈ V_A is defined as:

p(v_i | v_j) = exp(u_i · u_j) / Σ_{i′ ∈ V_A} exp(u_{i′} · u_j)

where u_i and u_j are the vector representations of v_i and v_j. For any v_j ∈ V_B, this defines a conditional distribution p(·|v_j) over all nodes in V_A.
B2: based on the conditional distribution defined in B1 for each node, for every v_j ∈ V_B, the conditional distribution p(·|v_j) is made to approach the empirical distribution p̂(·|v_j), the closeness of the two distributions being measured by the KL divergence:

O = Σ_j λ_j · KL( p̂(·|v_j) ‖ p(·|v_j) )

where λ_j = Σ_i w_ij denotes the degree of node v_j and the empirical distribution is computed as p̂(v_i | v_j) = w_ij / λ_j. The objective function simplifies to O = −Σ_{(i,j)∈E} w_ij log p(v_i | v_j).
B3: based on the objective function defined in B2, a corresponding objective function O_ww, O_ew, O_et and O_tw is defined for each bipartite graph defined in A, and the objective functions are summed:

O_n = O_ww + O_ew + O_et + O_tw

which is optimized jointly to obtain the vector representation of each entity and category, E_emb = {e_i} and T_emb = {t_i}.
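The joint optimization of O_n can be sketched as follows: repeatedly pick one of the four networks, sample an edge with probability proportional to its weight, and take a stochastic gradient step. The full softmax in p(v_i|v_j) is approximated here with negative sampling, a standard device for this family of objectives that the patent does not spell out; the embedding tables are assumed to be NumPy arrays indexed by integer node ids.

```python
import numpy as np

def edge_step(edges, probs, emb_src, emb_dst, k=5, lr=0.025):
    """One stochastic step on O = -sum_(i,j) w_ij * log p(v_i | v_j) for a
    single bipartite network, with the softmax over the target side replaced
    by k negative samples. `edges` is a list of (i, j) node-id pairs and
    `probs` their weights normalized to sum to 1; both layouts are assumed."""
    i, j = edges[np.random.choice(len(edges), p=probs)]
    u_i = emb_src[i].copy()                 # freeze u_i for this step
    grad_i = np.zeros_like(u_i)
    # one positive target plus k uniformly drawn negatives
    targets = [(j, 1.0)] + [(np.random.randint(len(emb_dst)), 0.0)
                            for _ in range(k)]
    for t, label in targets:
        score = 1.0 / (1.0 + np.exp(-u_i @ emb_dst[t]))  # sigmoid(u_i . u_t)
        grad_i += (label - score) * emb_dst[t]
        emb_dst[t] += lr * (label - score) * u_i
    emb_src[i] += lr * grad_i

# Joint optimization of O_n alternates such steps over G_ww, G_ew, G_tw and
# G_et, sharing the word table across the first three networks and the entity
# and category tables with G_et.
```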
Step C: based on the vector representations of entities and categories obtained in step B, learning the mapping matrices of entities and categories with a learning-to-rank algorithm, and mapping entities and categories into the same semantic space, in which semantically similar entities and categories are also close to each other.
The step C specifically comprises the following steps:
C1: defining precedence relationships between pairs of categories. First, in the category path corresponding to an entity, a more specific category is closer to the entity than a more general one; this is called the ancestor order. Second, the correct category is closer to the entity than its sibling categories in the classification tree; this is called the sibling order.
C2: based on the priority relationship between the categories defined by C1, learning the mapping matrix of the entities and the categories, and mapping the entities and the categories into the same semantic space, wherein the semantically related entities and categories are also close to each other in the semantic space:
Φe(ei)=U·ei
Φt(tj)=V·tj
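A minimal sketch of one learning-to-rank update for step C: for an entity e and a pair of categories in which t_pos should take precedence over t_neg (an ancestor under the first order, a sibling under the second), a margin violation pushes the projections apart. The exact rank-weighted objective appears in the original only as equation images, so a plain hinge loss stands in for it here.

```python
import numpy as np

def rank_step(e_vec, pos_vec, neg_vec, U, V, margin=1.0, lr=0.01):
    """One SGD update enforcing s(e, t_pos) > s(e, t_neg) + margin, where
    s(e, t) = (U @ e) . (V @ t); the hinge loss is an assumed stand-in for
    the patent's rank-weighted objective."""
    pe = U @ e_vec                          # Phi_e(e)
    s_pos = pe @ (V @ pos_vec)              # s(e, t_pos)
    s_neg = pe @ (V @ neg_vec)              # s(e, t_neg)
    if margin - s_pos + s_neg > 0:          # precedence violated
        # gradient of the hinge w.r.t. U and V (in-place updates)
        U += lr * (np.outer(V @ pos_vec, e_vec) - np.outer(V @ neg_vec, e_vec))
        V += lr * (np.outer(pe, pos_vec) - np.outer(pe, neg_vec))
```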
step D: and B, predicting the category path of the unmarked entity by adopting a top-down search strategy based on the vector representation of the entity and the category obtained in the step B and the mapping matrix obtained in the step C. Starting from a root node of the classification tree, finding the class which is the most matched with the current entity in each layer by calculating the similarity between the entity and the class, and recursively searching until the similarity is terminated at a leaf node or is lower than a certain threshold, wherein the similarity between the entity and the class is calculated in the following way:
s(ei,tj)=Φe(ei)·Φt(tj)
vector representation of entities and categories is used when calculating similarity (e)iAnd tj) And entity and category mapping matrices (U and V), the whole process is a top-down search process, and a category path is naturally formed by a predicted result.
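Step D thus reduces to a greedy descent of the classification tree; below is a sketch under assumed containers (`children`, a dict mapping a category to its child categories, and `cat_vec`, a dict mapping a category to its embedding; neither name is from the patent). Inputs are assumed to be NumPy vectors and matrices.

```python
def predict_path(e_vec, children, cat_vec, root, U, V, threshold=0.0):
    """Greedy top-down search: at each level keep the child category with the
    highest similarity s(e, t) = Phi_e(e) . Phi_t(t), stopping at a leaf or
    when the best similarity drops below the threshold."""
    pe = U @ e_vec                          # Phi_e(e), computed once
    path, node = [], root
    while children.get(node):
        best = max(children[node], key=lambda t: pe @ (V @ cat_vec[t]))
        if pe @ (V @ cat_vec[best]) < threshold:
            break
        path.append(best)
        node = best
    return path
```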
The following describes the symbols used in the formulas of the present invention:
ω_ij broadly refers to the weight on an edge (subscripts unrestricted).
w_i broadly refers to a word (subscripts unrestricted).
e_i broadly refers to an entity (subscripts unrestricted).
t_i broadly refers to a category (subscripts unrestricted).
e_i (bold) broadly refers to the vector representation of entity e_i (subscripts unrestricted).
t_i (bold) broadly refers to the vector representation of category t_i (subscripts unrestricted).
G_ww denotes the word-word co-occurrence network.
V denotes the set of all words.
E_ww denotes the set of edges in the word-word co-occurrence network.
G_ew denotes the entity-word co-occurrence network.
𝓔 denotes the set of all entities.
E_ew denotes the set of edges in the entity-word co-occurrence network.
G_tw denotes the category-word co-occurrence network.
𝓣 denotes the set of all categories.
E_tw denotes the set of edges in the category-word co-occurrence network.
G_et denotes the entity-category co-occurrence network.
E_et denotes the set of edges in the entity-category co-occurrence network.
G generally denotes a bipartite graph; V_A and V_B are the two disjoint node sets in graph G, and E is the set of edges in graph G.
p(v_i | v_j) denotes the conditional probability that node v_j in V_B generates node v_i in V_A.
u_i and u_j denote the vector representations of v_i and v_j.
exp is the exponential function.
p(·|v_j) denotes the conditional distribution, over all nodes in V_A, generated by node v_j in V_B.
p̂(·|v_j) denotes the empirical distribution corresponding to p(·|v_j).
O_ww, O_ew, O_et and O_tw denote the objective functions of the network representation learning method on the word-word network G_ww, the entity-word network G_ew, the entity-category network G_et and the category-word network G_tw, respectively.
O_n denotes the overall objective function of the network representation learning method over the four heterogeneous networks.
U denotes the mapping (projection) matrix for entity vectors.
Φ_e(e_i) denotes the projection of entity vector e_i, computed with the projection matrix U.
V denotes the mapping (projection) matrix for category vectors (the letter V is reused; context distinguishes it from the word set).
Φ_t(t_j) denotes the projection of category vector t_j, computed with the projection matrix V.
s(e_i, t_j) denotes the similarity between entity e_i and category t_j.
Experiments were carried out with the method of the invention; the specific procedure is as follows:
1. Data sets. The data sets are constructed from the DBpedia classification tree and the text descriptions in Wikipedia: each Wikipedia entry has a unique category path (a path in the corresponding DBpedia classification tree), and the Wikipedia text serves as the text description of each entity. A total of 3 data sets were constructed: (1) the full text of each Wikipedia entry as the entity's text description; (2) the abstract section of each Wikipedia entry as the entity's text description; (3) the stemmed text of each entry as the entity's text description. The statistics of the data sets are shown in Table 1.
Table 1. Statistics of the data sets

Data set     Full text     Abstract     Stemmed
Types        451           451          450
Entities     3,087,751     2,536,198    2,847,568
Words        31,752        17,451       25,430
G_et edges   7,757,347     6,340,495    7,190,233
G_ew edges   418,527,303   247,165,283  334,632,976
G_tw edges   6,743,100     3,184,492    4,730,374
G_ww edges   377,267,923   147,490,406  224,829,203
For the full-text data set, low-frequency words are filtered with a threshold of 1500; for the abstract data set, the threshold is 1000. The data are split into training and test sets at a ratio of 80/20.
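A sketch of this preprocessing, reading the threshold as a minimum word frequency (an interpretation) and reusing the illustrative entity-tuple format from step A:

```python
from collections import Counter
import random

def preprocess(entities, min_count, test_ratio=0.2, seed=42):
    """Drop words occurring fewer than `min_count` times (1500 for full text,
    1000 for abstracts) and split the entities 80/20 into train/test."""
    freq = Counter(w for _, _, tokens in entities for w in tokens)
    kept = [(eid, cats, [w for w in tokens if freq[w] >= min_count])
            for eid, cats, tokens in entities]
    random.Random(seed).shuffle(kept)
    cut = int(len(kept) * (1 - test_ratio))
    return kept[:cut], kept[cut:]
```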
2. Experimental setup. As in previous work, Strict-F1, Mi-F1 and Ma-F1 are used to evaluate the results. The comparison methods are the Tipalo, SDType, FIGMENT, CUTE and CE/HCE models, plus ablation experiments on the method itself. The first 4 are traditional entity classification algorithms, while CE/HCE is an entity classification algorithm based on representation learning. The ablation experiments test the contribution of the word-word network.
3. Results and analysis of the experiments
With the data sets and experimental setup above, the method of the present disclosure (denoted EFHET) was tested on each data set and compared with the mainstream methods above. Table 2 shows the entity classification evaluation results: on every data set, EFHET is clearly superior to the comparison methods under all 3 evaluation metrics, demonstrating the accuracy and stability of the disclosed method.
Table 2. Entity classification results on the knowledge base
[Table 2 appears in the original only as an image; it reports Strict-F1, Mi-F1 and Ma-F1 for each method on each data set.]
Analysis of the experimental results. First, the EFHET method performs better than several popular entity classification algorithms, mainly because it exploits more structural information: in the network-based representation learning, semantically related entities obtain similar representations and semantically close categories obtain similar representations; in learning the mapping matrices of entities and categories, the defined precedence relationships between pairs of categories build a bridge between entities and categories, giving strong discriminative power during classification, hence the better results.
In addition, EFHET has significant advantages over CE/HCE, which is also a representation-learning-based method. The main reason is that CE/HCE depends on entity pairs appearing in context; unlike the co-occurrence relationships between words, such entity co-occurrence relationships are very sparse and noisy, and the small data volume hurts the experimental results. EFHET instead starts from large-scale text and uses only plain textual information at the word level, so its data scale is larger and its results naturally better.
Finally, the network ablation shows that the word-word network helps the final results. For example, two synonymous words (in the original Chinese, two distinct words that both translate as "computer") are similar and obtain similar representations during representation learning. The two words may each often appear with different entities, but those entities are likely to obtain similar representations because of the relationship between the two words. The word-word network thus alleviates the synonym problem to a certain extent and improves the final results.
In summary, the computing method for knowledge base entity classification based on representation learning provided by the invention constructs information networks from the text descriptions of entities and learns low-dimensional dense vector representations of entities and categories from the networks, without manually defining features for entities, effectively solving the entity representation problem; using a learning-to-rank algorithm with precedence relationships defined between pairs of categories, entities and categories are mapped into the same semantic space, fully considering the hierarchical relationships among categories and effectively solving the hierarchical classification problem. The method starts from large-scale text, constructs networks containing different kinds of information, and obtains vector representations of entities and categories with a representation learning algorithm, without manually defined features, effectively addressing the difficulty of representing entities in a knowledge base. It further adopts a learning-to-rank algorithm that maps entities and categories into the same semantic space through the defined precedence relationships, enabling top-down category inference; the hierarchical relationships among categories are thus effectively incorporated into the model, which suits the hierarchical classification problem.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element. The terms "upper", "lower", and the like, indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience in describing the present invention and simplifying the description, but do not indicate or imply that the referred devices or elements must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention. Unless expressly stated or limited otherwise, the terms "mounted," "connected," and "connected" are intended to be inclusive and mean, for example, that they may be fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In the description of the present invention, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description. Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present invention is not limited to any single aspect, nor is it limited to any single embodiment, nor is it limited to any combination and/or permutation of these aspects and/or embodiments. Moreover, each aspect and/or embodiment of the present invention may be utilized alone or in combination with one or more other aspects and/or embodiments thereof.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being included in the following claims and description.

Claims (8)

1. A method for computing knowledge base entity classification based on representation learning, comprising:
A: for entities in a knowledge base with given category labels, constructing four co-occurrence networks (word-word, entity-word, category-word, and entity-category) and integrating the semantic information into the 4 heterogeneous co-occurrence networks;
B: based on the 4 heterogeneous co-occurrence networks, learning a vector representation of each entity and each category with a network-based representation learning algorithm;
C: based on the vector representations of entities and categories, learning the mapping matrices of entities and categories with a learning-to-rank algorithm, and mapping entities and categories into the same semantic space;
D: computing the similarity between entities and categories from the vector representations and the mapping matrices, and assigning category paths to unlabeled entities with a top-down search.
2. The method of claim 1, wherein step A comprises:
A1: constructing the word-word co-occurrence network G_ww, which describes word-level co-occurrence information in the entity descriptions and is formally denoted G_ww = (V, E_ww), where each node represents a word and the weight ω_ij on an edge represents the number of co-occurrences of the two words in the text;
A2: constructing the entity-word co-occurrence network G_ew, a bipartite graph composed of entities and words, formally denoted G_ew = (𝓔 ∪ V, E_ew), where the weight ω_ij on an edge represents the number of occurrences of word w_j in the text description of entity e_i;
A3: constructing the category-word co-occurrence network G_tw, a bipartite graph composed of categories and words, formally denoted G_tw = (𝓣 ∪ V, E_tw), where the weight ω_ij on an edge represents the number of occurrences of word w_j under category t_i;
A4: constructing the entity-category co-occurrence network G_et, a bipartite graph composed of entities and categories, formally denoted G_et = (𝓔 ∪ 𝓣, E_et), where there is an edge (ω_ij = 1) between entity e_i and category t_j if and only if entity e_i belongs to category t_j;
wherein ω_ij represents the weight on an edge; w_i represents a word; t_i represents a category; e_i represents an entity; G_ww denotes the word-word co-occurrence network; V denotes the set of all words; E_ww denotes the set of edges in the word-word co-occurrence network; G_ew denotes the entity-word co-occurrence network; 𝓔 denotes the set of all entities; E_ew denotes the set of edges in the entity-word co-occurrence network; G_tw denotes the category-word co-occurrence network; 𝓣 denotes the set of all categories; E_tw denotes the set of edges in the category-word co-occurrence network; G_et denotes the entity-category co-occurrence network; and E_et denotes the set of edges in the entity-category co-occurrence network.
3. The method of claim 1, wherein the step B comprises the steps of:
based on the 4 heterogeneous co-occurrence networks G_ww, G_ew, G_tw and G_et obtained, learning a vector representation of each entity e_i and category t_j using the PTE algorithm;
B1: for any bipartite graph G = (V_A ∪ V_B, E), where V_A and V_B are disjoint node sets and E is the set of edges, defining the conditional probability that v_j ∈ V_B generates v_i ∈ V_A as:

p(v_i | v_j) = exp(u_i · u_j) / Σ_{i′ ∈ V_A} exp(u_{i′} · u_j)

where u_i and u_j are the vector representations of v_i and v_j; for any v_j ∈ V_B, this defines a conditional distribution p(·|v_j) over all nodes in V_A;
B2: based on the conditional distribution defined in B1 for each node, for every v_j ∈ V_B, making the conditional distribution p(·|v_j) approach the empirical distribution p̂(·|v_j), the closeness of the two distributions being measured by the KL divergence:

O = Σ_j λ_j · KL( p̂(·|v_j) ‖ p(·|v_j) )

where λ_j = Σ_i w_ij denotes the degree of node v_j and the empirical distribution is computed as p̂(v_i | v_j) = w_ij / λ_j; the objective function simplifies to O = −Σ_{(i,j)∈E} w_ij log(p(v_i | v_j));
B3: based on the objective function defined in B2, defining a corresponding objective function O_ww, O_ew, O_et and O_tw for each bipartite graph defined in step A, and summing the objective functions:

O_n = O_ww + O_ew + O_et + O_tw

which is optimized jointly to obtain the vector representation of each entity and category, E_emb = {e_i} and T_emb = {t_i};
wherein O_ww, O_ew, O_et and O_tw respectively denote the objective functions of the network representation learning method on the word-word network G_ww, the entity-word network G_ew, the entity-category network G_et and the category-word network G_tw; ω_ij denotes the weight on an edge; j denotes the index of a node distinct from i; i′ ranges over the indices of the nodes in the set V_A (or V_B); and V_A and V_B denote the two disjoint node sets in graph G, A denoting the first node set and B the second.
4. The method of claim 1, wherein the step C comprises the steps of:
C1: defining precedence relationships between pairs of categories;
C2: based on the precedence relationships between categories defined in C1, learning the mapping matrices of entities and categories, and mapping entities and categories into the same semantic space, in which semantically related entities and categories are also close to each other:

Φ_e(e_i) = U · e_i
Φ_t(t_j) = C · t_j

wherein G_ww denotes the word-word co-occurrence network; V denotes the set of all words; E_ww denotes the set of edges in the word-word co-occurrence network; G_ew denotes the entity-word co-occurrence network; 𝓔 denotes the set of all entities; E_ew denotes the set of edges in the entity-word co-occurrence network; G_tw denotes the category-word co-occurrence network; 𝓣 denotes the set of all categories; E_tw denotes the set of edges in the category-word co-occurrence network; G_et denotes the entity-category co-occurrence network; E_et denotes the set of edges in the entity-category co-occurrence network; G denotes a bipartite graph, V_A and V_B being the two disjoint node sets in graph G and E the set of edges in graph G; p(v_i | v_j) denotes the conditional probability that node v_j in V_B generates node v_i in V_A; u_i and u_j denote the vector representations of v_i and v_j; exp is the exponential function; p(·|v_j) denotes the conditional distribution, over all nodes in V_A, generated by node v_j in V_B; p̂(·|v_j) denotes the empirical distribution corresponding to p(·|v_j); O_ww, O_ew, O_et and O_tw respectively denote the objective functions of the network representation learning method on the word-word network G_ww, the entity-word network G_ew, the entity-category network G_et and the category-word network G_tw, and O_n denotes the overall objective function over the four heterogeneous networks; U denotes the mapping (projection) matrix for entity vectors; Φ_e(e_i) denotes the projection of entity vector e_i, computed with the projection matrix U; C denotes the mapping (projection) matrix for category vectors; Φ_t(t_j) denotes the projection of category vector t_j, computed with C; and s(e_i, t_j) denotes the similarity between entity e_i and category t_j.
5. The method according to claim 4, wherein in the step C2, the precedence relationship between two categories includes a first (ancestor) order, whose defining inequality is given in the original as an equation image, wherein l(t_i, t_j) denotes the distance between categories t_i and t_j in the classification tree; the objective function based on the first precedence relationship, given in the original as equation images, sums a ranking loss over each category t_k on the category path p(e) of an entity and its ancestor nodes A(t_k), wherein a weighting function maps the rank to a floating-point weight, s(e, t_k) denotes the inner product of Φ_e(e) and Φ_t(t_k), and root denotes the root node of the classification tree.
6. The method according to claim 4, wherein in the step C2, the precedence relationship between two categories further includes a second (sibling) order, given in the original as equation images, wherein S(t_k) denotes the sibling nodes of category t_k, p(e) denotes the category path of entity e, s(e, t_k) denotes the inner product of Φ_e(e) and Φ_t(t_k), and t_k′ denotes an ancestor or sibling category of any category of entity e; summing over all entities with category label information yields the overall objective function (given in the original as an equation image), which is solved with the stochastic gradient descent (SGD) algorithm to learn the mapping matrices U and C of entities and categories.
7. The method according to claim 1, wherein in step D, the category paths of unlabeled entities are predicted with a top-down search strategy, based on the vector representations of entities and categories obtained in step B and the mapping matrices obtained in step C.
8. The method according to claim 1, wherein in step D, starting from the root node of the classification tree, the best-matching category for the current entity is found at each level by computing the similarity between the entity and the categories, the search recursing until it terminates at a leaf node or the similarity falls below a threshold, wherein the similarity between an entity and a category is computed as:

s(e_i, t_j) = Φ_e(e_i) · Φ_t(t_j)

wherein Φ_e(e_i) denotes the mapping function of entity vector e_i and Φ_t(t_j) denotes the mapping function of category vector t_j;
computing the similarity uses the vector representations e_i and t_j of the entity and the category and the mapping matrices U and C of entities and categories; the whole procedure is a top-down search, and the predicted result naturally forms a category path, meeting the requirement of the fine-grained entity classification task.
CN201710608234.8A 2017-07-24 2017-07-24 Knowledge base entity classification calculation method based on representation learning Active CN107545033B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710608234.8A CN107545033B (en) 2017-07-24 2017-07-24 Knowledge base entity classification calculation method based on representation learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710608234.8A CN107545033B (en) 2017-07-24 2017-07-24 Knowledge base entity classification calculation method based on representation learning

Publications (2)

Publication Number Publication Date
CN107545033A CN107545033A (en) 2018-01-05
CN107545033B true CN107545033B (en) 2020-12-01

Family

ID=60970776

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710608234.8A Active CN107545033B (en) 2017-07-24 2017-07-24 Knowledge base entity classification calculation method based on representation learning

Country Status (1)

Country Link
CN (1) CN107545033B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108228877B (en) * 2018-01-22 2020-08-04 北京师范大学 Knowledge base completion method and device based on learning sorting algorithm
CN112487195B (en) * 2019-09-12 2023-06-27 医渡云(北京)技术有限公司 Entity ordering method, entity ordering device, entity ordering medium and electronic equipment
CN111259215B (en) * 2020-02-14 2023-06-27 北京百度网讯科技有限公司 Multi-mode-based topic classification method, device, equipment and storage medium
CN111522959B (en) * 2020-07-03 2021-05-28 科大讯飞(苏州)科技有限公司 Entity classification method, system and computer readable storage medium
CN112699676B (en) * 2020-12-31 2024-04-12 中国农业银行股份有限公司 Address similarity relation generation method and device
CN114781471B (en) * 2021-06-02 2022-12-27 清华大学 Entity record matching method and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102591988A (en) * 2012-01-16 2012-07-18 宋胜利 Short text classification method based on semantic graphs
US8990200B1 (en) * 2009-10-02 2015-03-24 Flipboard, Inc. Topical search system
CN104462253A (en) * 2014-11-20 2015-03-25 武汉数为科技有限公司 Topic detection or tracking method for network text big data
CN104615687A (en) * 2015-01-22 2015-05-13 中国科学院计算技术研究所 Entity fine granularity classifying method and system for knowledge base updating
CN104915397A (en) * 2015-05-28 2015-09-16 国家计算机网络与信息安全管理中心 Method and device for predicting microblog propagation tendencies
CN105824802A (en) * 2016-03-31 2016-08-03 清华大学 Method and device for acquiring knowledge graph vectoring expression
CN106909622A (en) * 2017-01-20 2017-06-30 中国科学院计算技术研究所 Knowledge mapping vector representation method, knowledge mapping relation inference method and system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101339551B (en) * 2007-07-05 2013-01-30 日电(中国)有限公司 Natural language query demand extension equipment and its method
CN102750316B (en) * 2012-04-25 2015-10-28 北京航空航天大学 Based on the conceptual relation label abstracting method of semantic co-occurrence patterns
US9292797B2 (en) * 2012-12-14 2016-03-22 International Business Machines Corporation Semi-supervised data integration model for named entity classification
CN103699663B (en) * 2013-12-27 2017-02-08 中国科学院自动化研究所 Hot event mining method based on large-scale knowledge base
CN106919689B (en) * 2017-03-03 2018-05-11 中国科学技术信息研究所 Professional domain knowledge mapping dynamic fixing method based on definitions blocks of knowledge

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8990200B1 (en) * 2009-10-02 2015-03-24 Flipboard, Inc. Topical search system
CN102591988A (en) * 2012-01-16 2012-07-18 宋胜利 Short text classification method based on semantic graphs
CN104462253A (en) * 2014-11-20 2015-03-25 武汉数为科技有限公司 Topic detection or tracking method for network text big data
CN104615687A (en) * 2015-01-22 2015-05-13 中国科学院计算技术研究所 Entity fine granularity classifying method and system for knowledge base updating
CN104915397A (en) * 2015-05-28 2015-09-16 国家计算机网络与信息安全管理中心 Method and device for predicting microblog propagation tendencies
CN105824802A (en) * 2016-03-31 2016-08-03 清华大学 Method and device for acquiring knowledge graph vectoring expression
CN106909622A (en) * 2017-01-20 2017-06-30 中国科学院计算技术研究所 Knowledge mapping vector representation method, knowledge mapping relation inference method and system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Measuring the Influence from User-Generated Content to News via Cross-dependence Topic Modeling; Lei Hou et al.; International Conference on Database Systems for Advanced Applications; 2015-04-09; pp. 125-141 *
Research on aggregation of semantically annotated knowledge resources based on research hotspots (基于研究热点的语义标注知识资源聚合研究); 崔娜娜 et al.; 《情报探索》; 2016-05-15; No. 5; pp. 127-134 *
A semantic representation method for Chinese text oriented to text classification (面向文本分类的中文文本语义表示方法); 宋胜利 et al.; 《西安电子科技大学学报(自然科学版)》 (Journal of Xidian University, Natural Science Edition); 2012-11-16; Vol. 40, No. 2; pp. 89-97, 129 *

Also Published As

Publication number Publication date
CN107545033A (en) 2018-01-05

Similar Documents

Publication Publication Date Title
CN107545033B (en) Knowledge base entity classification calculation method based on representation learning
CN111488734B (en) Emotional feature representation learning system and method based on global interaction and syntactic dependency
WO2023000574A1 (en) Model training method, apparatus and device, and readable storage medium
CN109670039B (en) Semi-supervised e-commerce comment emotion analysis method based on three-part graph and cluster analysis
CN111291556B (en) Chinese entity relation extraction method based on character and word feature fusion of entity meaning item
CN108681557B (en) Short text topic discovery method and system based on self-expansion representation and similar bidirectional constraint
CN107220311B (en) Text representation method for modeling by utilizing local embedded topics
Huang et al. Large-scale heterogeneous feature embedding
CN111191466A (en) Homonymous author disambiguation method based on network characterization and semantic characterization
CN108470025A (en) Partial-Topic probability generates regularization own coding text and is embedded in representation method
Zarei et al. Detecting community structure in complex networks using genetic algorithm based on object migrating automata
Gao et al. Clustering algorithms for detecting functional modules in protein interaction networks
Lan et al. Benchmarking of computational methods for predicting circRNA-disease associations
Dong et al. Predicting protein complexes using a supervised learning method combined with local structural information
CN113539372A (en) Efficient prediction method for LncRNA and disease association relation
CN117349494A (en) Graph classification method, system, medium and equipment for space graph convolution neural network
Sun et al. Graph embedding with rich information through heterogeneous network
CN113392334B (en) False comment detection method in cold start environment
Xiao et al. Non-local attention learning on large heterogeneous information networks
CN116991986B (en) Language model light weight method, device, computer equipment and storage medium
CN116825234B (en) Multi-mode information fusion medicine molecule activity prediction method and electronic equipment
Azondekon Modeling the Complexity and Dynamics of the Malaria Research Collaboration Network in Benin, West Africa: papers indexed in the Web Of Science (1996—2016)
Li et al. Learning diffusion on global graph: A PDE-directed approach for feature detection on geometric shapes
CN113850811B (en) Three-dimensional point cloud instance segmentation method based on multi-scale clustering and mask scoring
LIU et al. Community detection in networks based on information bottleneck clustering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant