CN112632291A - Inductive atlas characterization method for ontology concept enhancement - Google Patents

Inductive atlas characterization method for ontology concept enhancement Download PDF

Info

Publication number
CN112632291A
Authority
CN
China
Prior art keywords
entity
instance
concept
ontology
instance entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011534627.7A
Other languages
Chinese (zh)
Other versions
CN112632291B (en)
Inventor
徐童
任超
张乐
高子彭
杜逸超
陈恩红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC
Priority to CN202011534627.7A
Publication of CN112632291A
Application granted
Publication of CN112632291B
Status: Active (granted)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an ontology-concept-enhanced inductive knowledge graph characterization method. A double-layer attention mechanism generates ontology concept representations that contain rich information and effectively improve the embedding of newly added entities; a relation-determined attention mechanism fuses the multiple concepts corresponding to a newly added entity into an entity template vector; the template vector then further fuses the personalized features provided by the newly added entity's neighbors to produce the entity's vector representation, which ultimately improves the effect of the graph completion task.

Description

Inductive atlas characterization method for ontology concept enhancement
Technical Field
The invention relates to the field of knowledge graph representation learning in natural language processing, and in particular to an ontology-concept-enhanced inductive knowledge graph characterization method.
Background
A knowledge graph contains a large number of instance entity triples of the form (head entity, relation, tail entity), each of which represents one piece of knowledge. Knowledge graphs play an increasingly important role in tasks such as information retrieval, question answering and recommendation, and their range of application keeps expanding. However, existing knowledge graphs are generally incomplete: a large number of relationships between entities are still missing from the graph. The graph completion task based on representation learning aims to learn vector representations of entities and relations and to predict the missing relationships between entities from those vectors. Conventional transductive representation learning methods assume that all test entities are visible during the training phase. In real scenarios, however, a graph keeps being enriched after it is constructed, so new entities continuously appear. A transductive method must retrain on the whole graph to obtain representations of newly added entities, which is inefficient and resource-consuming.
Inductive graph representation learning methods therefore aim to generate representations of newly added entities inductively, saving resources and meeting real-time computation requirements. There are currently only a few related technical schemes and research results for inductive representation learning of newly added entities; representative published technologies include: CN202010809387.0, a local training method for knowledge graph representation learning, which obtains an initialized representation of a newly added entity from the vector representations of the graph's original entities and relations according to the TransE model and then fine-tunes it; and CN201911380039.X, an anchor-based knowledge graph representation learning method, which uses text information as the semantic basis of newly added entities and trains it together with the relevant local knowledge of the existing graph.
The prior art can be divided into two categories: (1) methods based on the internal neighbors of the new entity (for example, patent CN202010809387.0), which generally use a graph convolutional neural network to fuse the new entity's internal neighbors and then inductively generate its representation; and (2) methods based on description information of the new entity (for example, patent CN201911380039.X), which generally use the new entity's text or image descriptions and obtain its vector representation with text or image embedding tools.
However, for methods of type (1), the representation of the new entity generated by a simple fusion algorithm is often not accurate enough, because the new entity's internal neighbors are sparse and heterogeneous. For methods of type (2), the quality of the characterization depends heavily on the quality of the description information, and in practical applications high-quality description information that meets the requirements is difficult to obtain.
Disclosure of Invention
The invention aims to provide an ontology-concept-enhanced inductive graph characterization method that generates the characterization of a new entity inductively, making the characterization more accurate and efficient and improving the accuracy of the downstream graph completion task.
The purpose of the invention is realized by the following technical scheme:
an ontology concept enhanced inductive atlas characterization method, comprising:
constructing a network model; given a triple containing newly added instance entities, and given the ontology concepts and the neighbor instance entity set of each newly added instance entity in the triple, the network model generates a representation for each ontology concept of each newly added instance entity through a double-layer attention mechanism; a template representation of the newly added instance entity is generated from the representations of all its ontology concepts and the triple containing it, and the final characterization vector of the newly added instance entity is generated by further combining its neighbor instance entity set; the validity of the triple containing the newly added instance entities is evaluated from the final characterization vectors of all newly added instance entities; if the validity requirement is met, the triple is added to the knowledge graph.
According to the technical scheme provided by the invention, the double-layer attention mechanism generates ontology concept representations that contain rich information and effectively improve the embedding of newly added entities; a relation-determined attention mechanism fuses the multiple concepts corresponding to a newly added entity into an entity template vector; the template vector then further fuses the personalized features provided by the newly added entity's neighbors to produce the entity's vector representation, which ultimately improves the effect of the graph completion task.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of the ontology-concept-enhanced inductive graph characterization method provided by an embodiment of the present invention;
FIG. 2 is a model diagram of the ontology-concept-enhanced inductive graph characterization method provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art can derive from these embodiments without creative effort fall within the protection scope of the present invention.
Both prior-art methods ignore a piece of information important to the newly added entity: its ontology concept. The knowledge graph contains instance entities and their corresponding ontology concepts. On the one hand, instance entities provide rich detail information for the corresponding ontology concepts; on the other hand, an ontology concept provides basic summary information for its instance entities, which is particularly important for newly added entities. An ontology concept can serve as a basic template of a newly added entity and provide it with a relatively precise position range in the vector space. The embodiment of the invention therefore provides an ontology-concept-enhanced inductive graph characterization method which, as shown in FIG. 1, mainly comprises the following steps:
constructing a network model as shown in FIG. 2; given a triple containing newly added instance entities, and given the ontology concepts and the neighbor instance entity set of each newly added instance entity in the triple, the network model generates a representation for each ontology concept of each newly added instance entity through a double-layer attention mechanism; a template representation of the newly added instance entity is generated from the representations of all its ontology concepts and the triple containing it, and the final characterization vector of the newly added instance entity is generated by further combining its neighbor instance entity set; the validity of the triple containing the newly added instance entities is evaluated from the final characterization vectors of all newly added instance entities; if the validity requirement is met, the triple is added to the knowledge graph to improve its completeness.
The above describes the main principle of the scheme. The network model first needs to be trained and its parameters estimated, and it is then used for the prediction task.
For ease of understanding, the model training and parameter estimation process and then the prediction task are described below in light of the principles above.
Firstly, arranging and preprocessing basic data.
Before model training and parameter estimation, the basic data needs to be collected and organized, and the relevant data then preprocessed. A preferred embodiment is as follows:
1. Basic data organization.
In the embodiment of the invention, the basic data is a knowledge graph containing ontology concept information, which mainly includes three kinds of data: instance entity triples, representing relationships between instance entities; ontology concept triples, representing meta-relationships between ontology concepts; and instance entity-concept pairs, representing the correspondence between an instance entity and the ontology concept to which it belongs. These data are usually in text form.
In the embodiment of the present invention, the meta-relationships between ontology concepts reflect the associations between different concepts, such as (city, at_location, state). The special meta-relationship subclass_of, as in (city, subclass_of, place), expresses the parent-child relationship between ontology concepts; the other meta-relationships (non-subclass_of) are general meta-relationships. For a concept such as city, its parent concepts and child concepts can be obtained by organizing the graph according to the subclass_of relation, and its general neighbor concepts (e.g., state) can be obtained from the other general meta-relationships (e.g., at_location).
To facilitate model training, the graph is organized as follows: for each instance entity, collect its neighbor instance entity set and its corresponding ontology concept set; for each ontology concept, collect its child concept set, parent concept set, general neighbor concept set and corresponding instance entity set.
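As an illustration of this organization step, the following minimal Python sketch builds the look-up sets from the three kinds of records; the toy triples and all names are invented examples, not data or identifiers from the patent.

```python
# Minimal sketch of the graph organization step; toy data, illustrative names.
from collections import defaultdict

instance_triples = [("JayChou", "sang", "BlueAndWhitePorcelain")]
concept_triples = [("city", "at_location", "state"), ("city", "subclass_of", "place")]
entity_concept_pairs = [("JayChou", "singer"), ("BlueAndWhitePorcelain", "song")]

neighbors_of = defaultdict(set)          # instance entity -> {(relation, neighbor)}
concepts_of = defaultdict(set)           # instance entity -> {ontology concepts}
parents_of = defaultdict(set)            # concept -> {parent concepts} via subclass_of
children_of = defaultdict(set)           # concept -> {child concepts}
general_neighbors_of = defaultdict(set)  # concept -> {(meta-relation, concept)}
instances_of = defaultdict(set)          # concept -> {instance entities}

for h, r, t in instance_triples:
    neighbors_of[h].add((r, t))
    neighbors_of[t].add((r, h))
for c1, meta, c2 in concept_triples:
    if meta == "subclass_of":            # special meta-relation: parent/child
        parents_of[c1].add(c2)
        children_of[c2].add(c1)
    else:                                # general meta-relations, e.g. at_location
        general_neighbors_of[c1].add((meta, c2))
for e, c in entity_concept_pairs:
    concepts_of[e].add(c)
    instances_of[c].add(e)
```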
2. Data preprocessing.
In the embodiment of the invention, data preprocessing operates on the instance entity triples: negative samples are constructed so that the model can be trained. The instance entity triples in the knowledge graph are taken as positive samples; for each positive sample a negative sample is generated by randomly replacing the head entity or the tail entity of the triple, forming a positive/negative sample pair, and model training and parameter estimation are carried out on a number of such pairs.
In the embodiment of the invention, only one of the head instance entity and the tail instance entity is replaced for a positive triple, and the replacement instance entity is selected at random from the graph.
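A minimal sketch of this negative-sample construction, reusing the illustrative entity names from above; the helper name is an assumption.

```python
# Corrupt either the head or the tail of a positive triple to build a negative sample.
import random

def corrupt(triple, all_entities):
    h, r, t = triple
    e = random.choice(all_entities)
    while e in (h, t):                    # avoid reproducing the positive triple
        e = random.choice(all_entities)
    if random.random() < 0.5:
        return (e, r, t)                  # replace the head instance entity
    return (h, r, e)                      # replace the tail instance entity

positives = [("JayChou", "sang", "BlueAndWhitePorcelain")]
entities = ["JayChou", "BlueAndWhitePorcelain", "JackieChan", "ProjectA"]
sample_pairs = [(p, corrupt(p, entities)) for p in positives]   # positive/negative pairs
```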
Secondly, model training and parameter estimation.
The model training and parameter estimation mainly comprises three parts:
the first part is the characterization of the concept of computational ontology: recording any instance entity (including a head instance entity and a tail instance entity) in the positive and negative sample pairs as a target instance entity, extracting a corresponding ontology concept set, and modeling a local hierarchy by using a double-layer attention mechanism for each ontology concept, wherein the ontology concept is associated with multiple types of nodes, and the nodes comprise: other ontological concepts and instance entities; firstly, for each type of node, fusing node information of the same type by using a node level attention mechanism, thereby obtaining information representation of each node type; then, the type-level attention mechanism is used to aggregate the information tokens of the respective types, resulting in a token of the ontology concept.
The second part computes the final characterization vector of the target instance entity: according to the relation in the instance entity triple, an attention mechanism determined by the relation combines the representations of all ontology concepts corresponding to the target instance entity into a template representation of the target instance entity; a gate mechanism then combines this template representation with the target instance entity's neighbor instance entity set to generate its final characterization vector.
The third part computes the loss from the scores: using a scoring function and the final characterization vectors of all instance entities in a positive sample and its negative sample, the scores of the positive and negative samples are calculated, and a loss function built from these scores is used to estimate the parameters of the model.
The preferred embodiments of the three sections described above are as follows:
1. Computing the representation of an ontology concept.
In the embodiment of the invention, a double-layer attention mechanism models the local hierarchical structure of each ontology concept, so that this local structural information is fully exploited when computing the concept's representation. Each ontology concept is typically associated with four types of information: parent concepts, child concepts, general neighbor concepts and instance entities. (Parent, child and general neighbor nodes are themselves ontology concepts, while instance entities are not; in the knowledge graph, concepts and entities are all nodes of different types and relations are edges, and an ontology concept c has these four types of neighbor nodes.) For each node type, a node-level attention mechanism fuses the set of nodes of that type, giving an information representation of that type; a type-level attention mechanism then aggregates the information representations of the types, giving the representation of the ontology concept.
In the embodiment of the invention, the node-level attention mechanism expresses the fact that, within the same type, different neighbor nodes contribute with different weights to the representation of the target concept. For example, among the instance entities of the concept "singer", a top-ranked singer such as "Jay Chou" may be more representative than other singers. The nodes of the same type t are therefore treated as a group N_t^c, and for each node in the group a node-level attention value is derived as follows:

s_i^t = σ(u_t [W_t c || W_t n_i^t])

α_i^t = exp(s_i^t) / Σ_{n_j^t ∈ N_t^c} exp(s_j^t)

wherein t ∈ {1, 2, 3, 4} indexes the four types of neighbor node information, namely the parent concept set, child concept set, general neighbor concept set and instance entity set of the ontology concept obtained in the knowledge graph organization stage; c ∈ R^d is the original representation of the ontology concept c, with d the dimension of the characterization vectors; n_i^t is the characterization vector of the i-th node of type t; N_t^c is the set of nodes of type t of the ontology concept c; the symbol || denotes concatenation; σ is the LeakyReLU function; and W_t and u_t are training parameters related to t.

When n_i^t is a parent concept, a child concept or an instance entity, its vector is the corresponding concept or entity representation. When n_i^t is a general neighbor concept, an associated meta-relationship must also be taken into account: following the basic assumption of TransE, the transformation n_i^t = c_j + r_m is used to express the influence of the meta-relation, where c_j is the characterization vector of the general neighbor concept and r_m is the characterization vector of the meta-relation. All entity, concept and relation characterization vectors are randomly initialized and adjusted as parameters during training.

With the node-level attention values α_i^t computed as above, the neighbor nodes of the same type are fused into the information representation of that type:

m_t = Σ_{n_i^t ∈ N_t^c} α_i^t n_i^t
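As an illustration, the following is a minimal numpy sketch of this node-level attention step, assuming the LeakyReLU-scored concatenation followed by a softmax shown above; the function names, parameter shapes and random toy inputs are illustrative, not the patent's actual implementation.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def node_level_attention(c, nodes_t, W_t, u_t):
    """c: (d,) concept vector; nodes_t: (k, d) same-type neighbor vectors;
    W_t: (d, d) and u_t: (2d,) are the type-specific training parameters."""
    scores = np.array([leaky_relu(u_t @ np.concatenate([W_t @ c, W_t @ n]))
                       for n in nodes_t])
    alpha = softmax(scores)            # node-level attention values alpha_i^t
    return alpha @ nodes_t             # fused information representation m_t

d = 8
rng = np.random.default_rng(0)
c = rng.normal(size=d)                 # original representation of concept "singer"
instances = rng.normal(size=(3, d))    # characterizations of its instance entities
m_instance = node_level_attention(c, instances,
                                  rng.normal(size=(d, d)), rng.normal(size=2 * d))
```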
after obtaining each type of information representation, the embodiment of the present invention proposes a type-level attention mechanism to fuse the various type representations, thereby obtaining a final concept representation containing rich information. The type-level attention mechanism considers that different types of information have different influences on representing the target concept, and for the ontology concept c, the type-level attention mechanism is used for calculating type-level attention values of each type
Figure BDA0002852967700000063
Expressed as:
Figure BDA0002852967700000064
Figure BDA0002852967700000065
wherein,
Figure BDA0002852967700000066
and
Figure BDA0002852967700000067
are training parameters.
Finally, aggregating the information representations of the various types to obtain a representation of the ontology concept, which is expressed as:
Figure BDA0002852967700000068
where c' represents a characterization of the ontology concept c by a two-layer attention mechanism.
The type-level attention value is implicitly shared by all nodes of the same type, which promotes information sharing among those nodes. The double-layer attention mechanism thus learns attention values at a finer granularity and, to a certain extent, improves the interpretability of the model.
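A corresponding minimal sketch of the type-level step, under the same assumption about the form of the score; W_p and q_vec stand in for the training parameters and the inputs are random toy values.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def type_level_attention(c, m_types, W_p, q_vec):
    """c: (d,) concept vector; m_types: (4, d) per-type information representations."""
    scores = np.array([leaky_relu(q_vec @ np.concatenate([W_p @ c, W_p @ m]))
                       for m in m_types])
    beta = softmax(scores)             # type-level attention values beta_t
    return beta @ m_types              # final concept representation c'

d = 8
rng = np.random.default_rng(1)
c_prime = type_level_attention(rng.normal(size=d),          # concept c
                               rng.normal(size=(4, d)),     # m_1..m_4 from node level
                               rng.normal(size=(d, d)),     # W_p
                               rng.normal(size=2 * d))      # q_vec
```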
2. Computing the final characterization vector of the target instance entity.
The first part modeled the local two-level hierarchy of the ontology concepts and produced concept representations that contain rich information. This part designs an inductive entity characterization scheme based on those concept representations.
Existing structure-based inductive entity characterization methods obtain the representation of a target instance entity by aggregating its neighbor instance entities. Such approaches are of limited effect for newly added entities, because the neighbor instance entities of a newly added entity are typically very sparse and the instance entities in the knowledge graph are heterogeneous. To solve this problem, the invention proposes a template-refinement strategy to represent target entities (including newly added instance entities) inductively. For a target instance entity, its corresponding ontology concepts are used in addition to its limited, heterogeneous neighbor entities. In the embodiment of the invention, the ontology concept is regarded as describing the basic outline of the current instance entity, while the neighbor instance entities of the target instance entity provide personalized features for it. For example, given that a target instance entity has the ontology concept "singer", the approximate position of the instance entity in the vector space is already known; given its neighbor instance entities (for example, songs such as "Blue and White Porcelain"), it can further be distinguished from the other instance entities belonging to "singer".
Note that a target instance entity may correspond to several ontology concepts. For example, the instance entity "Jackie Chan" corresponds to the ontology concepts "actor", "husband" and so on. However, for a given instance entity triple ("Jackie Chan", "starred in", "Project A"), the relation "starred in" indicates that the ontology concept "actor" is more important here.
Therefore, the embodiment of the invention measures the influence of the different concepts with an attention mechanism determined by the relation, and then generates the template characterization vector of the target entity. First, an attention value γ_i is computed with the relation-determined attention mechanism:

s_i = LeakyReLU(u [W_4 c'_i || W_5 r])

γ_i = exp(s_i) / Σ_{c_j ∈ C_e} exp(s_j)

wherein c'_i is the representation of the i-th ontology concept c_i of the target instance entity, r is the characterization vector of the relation in the instance entity triple to which the target instance entity belongs, C_e is the set of ontology concepts corresponding to the target instance entity, and u, W_4 and W_5 are training parameters.

Then the template representation of the target instance entity is computed as:

t_e = Σ_{c_i ∈ C_e} γ_i c'_i
Here t_e, the template representation of the target instance entity, describes the entity's summary information; with this reference, a characterization of the target instance entity is obtained without excessive error. The neighbor instance entities of the target instance entity then provide personalized information, yielding a more accurate representation. In the embodiment of the invention, each neighbor entity is considered to reveal some aspect of the target entity's unique features, which can be formalized as a scaling of certain dimensions of the template representation determined by that neighbor. A gate mechanism is therefore used so that each neighbor entity can weight all dimensions of the template:

n_i = h_i + r_i

y_i = tanh(U t_e + V n_i)

e' = (1 / |N_e|) Σ_{n_i ∈ N_e} y_i ⊙ t_e

wherein n_i is built from the i-th neighbor of the target instance entity, with h_i and r_i the neighboring head-entity characterization vector and relation characterization vector respectively; N_e is the set of neighbor instance entities and |N_e| the number of its elements; U and V are training parameters; ⊙ denotes element-wise multiplication; and e' is the final characterization vector of the target instance entity.
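Putting the second part together, the following minimal numpy sketch builds the template with the relation-determined attention and then applies the gate mechanism; the averaged gated combination in the last step, the fallback when there are no neighbors, and all parameter and function names are illustrative assumptions.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def entity_representation(concept_vecs, r, neighbor_pairs, W4, W5, u, U, V):
    """concept_vecs: (m, d) representations c'_i of the target entity's concepts;
    r: (d,) relation vector of the triple being scored;
    neighbor_pairs: list of (h_i, r_i) vectors for the entity's neighbors."""
    # Relation-determined attention over the entity's ontology concepts.
    s = np.array([leaky_relu(u @ np.concatenate([W4 @ c, W5 @ r]))
                  for c in concept_vecs])
    gamma = softmax(s)
    t_e = gamma @ concept_vecs              # template representation t_e
    if not neighbor_pairs:                  # no neighbors: fall back to the template
        return t_e
    # Gate mechanism: every neighbor re-weights all dimensions of the template.
    gated = []
    for h_i, r_i in neighbor_pairs:
        n_i = h_i + r_i
        y_i = np.tanh(U @ t_e + V @ n_i)
        gated.append(y_i * t_e)
    return np.mean(gated, axis=0)           # final characterization vector e'

d = 8
rng = np.random.default_rng(2)
e_vec = entity_representation(rng.normal(size=(2, d)), rng.normal(size=d),
                              [(rng.normal(size=d), rng.normal(size=d))],
                              rng.normal(size=(d, d)), rng.normal(size=(d, d)),
                              rng.normal(size=2 * d),
                              rng.normal(size=(d, d)), rng.normal(size=(d, d)))
```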
3. Using TransE as the scoring function and computing the loss.
In the embodiment of the invention, TransE is used as the scoring function. For a positive sample (h, r, t), the basic idea of TransE is to regard the relation r as a translation from the head instance entity h to the tail instance entity t, so that the corresponding characterization vectors satisfy h + r ≈ t. Here h and t denote the final characterization vectors computed by the scheme above with the head instance entity h and the tail instance entity t as target instance entities, and r is the characterization vector of the relation r (obtained in the conventional way). The scoring function is then:

f(h, r, t) = ||h + r - t||

The scoring function estimates how likely the instance entity triple is to be a positive sample, i.e. the validity of the triple: the smaller the value of f, the more likely the triple is valid.
For the positive sample (h, r, t), the negative sample obtained in the preprocessing step is denoted (h', r, t'), and its score is computed in the same way:

f(h', r, t') = ||h' + r - t'||

Only one of h' and t' in the negative sample is replaced; the other is the same as the instance entity in the positive sample. The vectors h' and t' are also computed according to the scheme above.
A loss function is constructed from the scores of each positive/negative sample pair:

L = Σ_{n=1}^{N} ( [f(h, r, t)]_+ + [τ - f(h', r, t')]_+ )

where N is the number of positive/negative sample pairs, τ is a hyperparameter, and [x]_+ = max(0, x). This loss function treats positive and negative samples independently: the scores of positive samples are pushed towards 0, and the scores of negative triples are pushed to be greater than or equal to τ.
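A minimal sketch of the TransE score and of a margin-style loss matching the behaviour described above (positive scores pushed towards 0, negative scores pushed to at least τ); the exact form of the loss is an assumption, and all names are illustrative.

```python
import numpy as np

def transe_score(h, r, t, norm=1):
    """f(h, r, t) = ||h + r - t|| with the l1 (or l2) norm."""
    return np.linalg.norm(h + r - t, ord=norm)

def pairwise_loss(pos_vecs, neg_vecs, tau=1.0):
    """pos_vecs / neg_vecs: matched lists of (h, r, t) characterization vectors."""
    total = 0.0
    for (h, r, t), (h2, r2, t2) in zip(pos_vecs, neg_vecs):
        total += max(0.0, transe_score(h, r, t))             # positive scores -> 0
        total += max(0.0, tau - transe_score(h2, r2, t2))    # negative scores >= tau
    return total
```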
With the trained parameters, for an ontology concept the influence of its neighbor nodes on the representation of the target concept is measured at two levels (node level and type level); for a target entity, its basic information is obtained from its ontology concepts and its personalized features are obtained by refinement from its neighbor entities. The target instance entity characterization vector obtained in this way is more accurate.
And thirdly, applying the model to a prediction task.
After the network model has been trained, given a triple containing newly added instance entities, together with the ontology concepts and neighbor instance entity sets of the newly added instance entities, the trained network model inductively computes the final characterization vectors of the newly added instance entities and evaluates the validity of the given triple. The calculation details of this stage are the same as those of model training and parameter estimation and are therefore not repeated.
In the embodiment of the invention, the triple containing the newly added instance entities consists of the newly added head instance entity h, the newly added tail instance entity t and the relation between them; this triple is not contained in the original knowledge graph.
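A brief sketch of this prediction step under an assumed score threshold: the characterization vectors would come from the trained model as above, and the triple is added to the knowledge graph only if its score is low enough.

```python
import numpy as np

def is_valid_triple(h_vec, r_vec, t_vec, threshold=2.0):
    """Accept the triple when its TransE score is below the (illustrative) threshold."""
    return np.linalg.norm(h_vec + r_vec - t_vec, ord=1) < threshold

# usage: h_vec / t_vec come from the inductive characterization above, r_vec from
# the learned relation embeddings; if is_valid_triple(...) returns True, the triple
# may be added to the knowledge graph.
```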
Through the above description of the embodiments, it is clear to those skilled in the art that the above embodiments can be implemented by software, and can also be implemented by software plus a necessary general hardware platform. With this understanding, the technical solutions of the embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.), and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments of the present invention.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (9)

1. An ontology concept enhanced inductive atlas characterization method is characterized by comprising the following steps:
constructing a network model; given a triple containing newly added instance entities, and given the ontology concepts and the neighbor instance entity set of each newly added instance entity in the triple, the network model generates a representation for each ontology concept of each newly added instance entity through a double-layer attention mechanism; a template representation of the newly added instance entity is generated from the representations of all its ontology concepts and the triple containing it, and the final characterization vector of the newly added instance entity is generated by further combining its neighbor instance entity set; the validity of the triple containing the newly added instance entities is evaluated from the final characterization vectors of all newly added instance entities; and if the validity requirement is met, the triple is added to the knowledge graph.
2. The method of claim 1, wherein in the training phase of the network model, a knowledge graph containing ontology concept information is organized, the knowledge graph containing three kinds of data: instance entity triples, representing relationships between instance entities; ontology concept triples, representing meta-relationships between ontology concepts; and instance entity-concept pairs, representing the correspondence between an instance entity and the ontology concept to which it belongs; preprocessing is then carried out: the instance entity triples in the knowledge graph are taken as positive samples, a negative sample is generated for each positive sample by randomly replacing the head entity or the tail entity of the instance entity triple to form a positive/negative sample pair, and model training and parameter estimation are performed on a number of such pairs, comprising:
a first part: taking any instance entity in a positive/negative sample pair as a target instance entity and extracting its corresponding ontology concept set; for each ontology concept, modeling its local hierarchy with a double-layer attention mechanism, the ontology concept being associated with multiple types of nodes, including other ontology concepts and instance entities; first, for each node type, fusing the node information of the same type with a node-level attention mechanism to obtain the information representation of each node type; then aggregating the information representations of the types with a type-level attention mechanism to obtain the representation of the ontology concept;
a second part: according to the relation in the instance entity triple, combining the representations of all ontology concepts corresponding to the target instance entity with a relation-determined attention mechanism to generate a template representation of the target instance entity, and generating the final characterization vector of the target instance entity with a gate mechanism that combines the template representation of the target instance entity and its neighbor instance entity set;
and a third part: calculating the scores of the positive sample and the negative sample with a scoring function and the final characterization vectors of all instance entities in the positive and negative samples, and constructing a loss function from the scores to estimate the parameters of the model.
3. The method according to claim 2, wherein after the knowledge graph is organized, a neighbor instance entity set and a corresponding ontology concept set are obtained for each instance entity, and a child concept set, a parent concept set, a general neighbor concept set and a corresponding instance entity set are obtained for each ontology concept.
4. The method according to claim 2, wherein in the first part, fusing node information of the same type for each ontology concept by using the node-level attention mechanism to obtain the information representation of each type comprises:

treating the nodes of the same type t as a group N_t^c, and deriving for each node in the group a node-level attention value α_i^t as:

s_i^t = σ(u_t [W_t c || W_t n_i^t])

α_i^t = exp(s_i^t) / Σ_{n_j^t ∈ N_t^c} exp(s_j^t)

wherein t ∈ {1, 2, 3, 4} indexes the four types of node information, namely the parent concept set, child concept set, general neighbor concept set and instance entity set of the ontology concept obtained in the knowledge graph organization stage; c ∈ R^d is the original representation of the ontology concept c, with d the dimension of the characterization vectors; n_i^t is the characterization vector of the i-th node of type t; N_t^c is the set of nodes of type t of the ontology concept c; the symbol || denotes concatenation; σ is the LeakyReLU function; and W_t and u_t are training parameters related to t;

and then fusing the node information of the same type to obtain the information representation of each node type:

m_t = Σ_{n_i^t ∈ N_t^c} α_i^t n_i^t
5. The method according to claim 2 or 4, wherein aggregating the information representations of the respective types by using the type-level attention mechanism to obtain the representation of the ontology concept comprises:

calculating the type-level attention value β_t of each type with the type-level attention mechanism:

b_t = σ(q [W' c || W' m_t])

β_t = exp(b_t) / Σ_{t'=1}^{4} exp(b_{t'})

wherein t ∈ {1, 2, 3, 4} indexes the four types of node information, namely the parent concept set, child concept set, general neighbor concept set and instance entity set of the ontology concept obtained in the knowledge graph organization stage; m_t is the information representation of type t; c ∈ R^d is the original representation of the ontology concept c, with d the dimension of the characterization vectors; σ is the LeakyReLU function; and W' and q are training parameters;

and aggregating the information representations of all types to obtain the representation of the ontology concept:

c' = Σ_{t=1}^{4} β_t m_t

wherein c' denotes the representation of the ontology concept c obtained through the double-layer attention mechanism.
6. The method according to claim 2, wherein in the second part, generating the template representation of the target instance entity by using the relation-determined attention mechanism to combine the representations of all ontology concepts corresponding to the target instance entity comprises:

first, calculating an attention value γ_i with the relation-determined attention mechanism:

s_i = LeakyReLU(u [W_4 c'_i || W_5 r])

γ_i = exp(s_i) / Σ_{c_j ∈ C_e} exp(s_j)

wherein c'_i is the representation of the i-th ontology concept c_i of the target instance entity; r is the characterization vector of the relation in the instance entity triple to which the target instance entity belongs; C_e is the set of ontology concepts corresponding to the target instance entity; u, W_4 and W_5 are training parameters; and d is the dimension of the characterization vectors;

then calculating the template representation of the target instance entity as:

t_e = Σ_{c_i ∈ C_e} γ_i c'_i

wherein t_e is the template representation of the target instance entity.
7. The method according to claim 2 or 6, wherein a gate mechanism is adopted to generate the final characterization vector of the target instance entity from the template representation of the target instance entity and the set of neighbor instance entities of the target instance entity, expressed as:

n_i = h_i + r_i

y_i = tanh(U t_e + V n_i)

e' = (1 / |N_e|) Σ_{n_i ∈ N_e} y_i ⊙ t_e

wherein n_i is built from the i-th neighbor of the target instance entity, with h_i and r_i the neighboring head-entity characterization vector and relation characterization vector respectively; N_e is the set of neighbor instance entities and |N_e| the number of its elements; U and V are training parameters; d is the dimension of the characterization vectors; ⊙ denotes element-wise multiplication; t_e is the template representation of the target instance entity; and e' is the final characterization vector of the target instance entity.
8. The method according to claim 2, wherein in the third part, TransE is used as the scoring function: for a positive sample (h, r, t), the relation r is regarded as a translation from the head instance entity h to the tail instance entity t, so that the corresponding characterization vectors satisfy h + r ≈ t; the scoring function is then:

f(h, r, t) = ||h + r - t||

wherein || · || denotes the l1 or l2 norm; h and t respectively denote the final characterization vectors of the head instance entity h and the tail instance entity t in the positive sample; and r is the characterization vector of the relation r;

for the negative sample (h', r, t'), the score is calculated in the same way:

f(h', r, t') = ||h' + r - t'||

wherein only one of h' and t' in the negative sample is replaced and the other is the same as the instance entity in the positive sample; h' and t' respectively denote the final characterization vectors of the head instance entity h' and the tail instance entity t' in the negative sample;

and the final loss function is:

L = Σ_{n=1}^{N} ( [f(h, r, t)]_+ + [τ - f(h', r, t')]_+ )

wherein N is the number of positive/negative sample pairs, τ is a hyperparameter, and [x]_+ = max(0, x).
9. The method of claim 2, wherein after training the network model, a triplet including the newly added instance entity is given, and the ontology concept and the set of neighboring instance entities of the newly added instance entity are given, so that a final characterization vector and a validity evaluation result of the newly added instance entity can be obtained by using the trained network model.
CN202011534627.7A 2020-12-23 2020-12-23 Generalized atlas characterization method with enhanced ontology concept Active CN112632291B (en)

Publications (2)

Publication Number  Publication Date
CN112632291A  2021-04-09
CN112632291B  2024-02-23



Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant