CN112100380A - Generation type zero sample prediction method based on knowledge graph - Google Patents

Generation type zero sample prediction method based on knowledge graph

Info

Publication number
CN112100380A
Authority
CN
China
Prior art keywords
category
knowledge
graph
class
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010973420.3A
Other languages
Chinese (zh)
Other versions
CN112100380B (en)
Inventor
陈华钧
耿玉霞
陈卓
叶志权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202010973420.3A priority Critical patent/CN112100380B/en
Publication of CN112100380A publication Critical patent/CN112100380A/en
Application granted granted Critical
Publication of CN112100380B publication Critical patent/CN112100380B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The invention discloses a generative zero-sample prediction method based on a knowledge graph, comprising the following steps: constructing a knowledge graph that fuses multiple kinds of semantic information, with the hierarchically structured categories as category nodes and with the attribute descriptions, text descriptions and external knowledge connected to the categories as additional nodes; encoding the semantic information of the knowledge graph with a graph neural network algorithm to generate category vector representations; and using the generated category vector representations as the input of a generative model to generate samples for the categories, for learning and prediction by a zero-sample learning algorithm. By constructing a knowledge graph that fuses multiple kinds of semantic information and, based on it, generating samples with richer features and more inter-class discrimination for each invisible class, the prediction problem for samples of invisible classes is better solved.

Description

Generation type zero sample prediction method based on knowledge graph
Technical Field
The invention relates to the field of generative zero-sample learning, and in particular to a generative zero-sample prediction method based on a knowledge graph.
Background
Zero-shot Learning (ZSL) is an important branch of transfer learning, mainly used to handle the problem of missing samples in supervised learning. Typical supervised learning requires manually labeled training samples to guide the machine learning model in extracting features, and labeling samples often requires enormous manpower and financial resources; in classification problems in particular, hundreds or thousands of training samples must be labeled by hand whenever new classes appear. This heavy labeling workload makes the model difficult to generalize.
Zero-sample learning techniques handle the learning and prediction problem when training samples are missing: semantic prior knowledge shared among sample labels is used to transfer the sample features the model has learned from training samples with known labels to new, unknown labels that lack training samples, thereby handling the sample prediction problem for the new labels. In recent years, ZSL and related algorithms have been widely applied in image classification, text classification, relation classification and other fields. In this task, classes with existing training samples are generally called visible classes (i.e., seen in the training data set), and classes whose training samples are missing are called invisible classes (i.e., unseen in the training data set).
Taking image classification as an example, a typical ZSL algorithm assumes that "semantically similar classes (i.e., classes close in the semantic space) also have similar visual features (i.e., samples close in the sample space)". Some ZSL algorithms therefore learn a mapping function that projects sample features and semantic features into the same vector space and predict the class of a sample by nearest-neighbor search in that space. However, because training samples of the invisible classes are missing, only samples of the visible classes participate in training, which easily biases the model at prediction time: samples of invisible classes are predicted as labels of visible classes, especially when the sample space contains both visible and invisible classes. To solve this problem, some ZSL algorithms generate samples of the invisible classes with a generative model, such as a generative adversarial network (GAN), using the semantic prior information of the classes. Generative zero-sample learning converts zero-sample learning into conventional supervised learning by generating training samples for the invisible classes, thereby effectively addressing the missing-sample problem in zero-sample learning.
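As a minimal illustration (not part of the patent), the mapping-plus-nearest-neighbor scheme described above can be sketched as follows, assuming a learned linear mapping W from the image-feature space into the semantic space; all names are illustrative:

```python
import numpy as np

def nearest_neighbor_predict(image_feat, class_embeddings, W):
    """Project an image feature into the semantic space with a learned
    mapping W and return the label of the closest class embedding."""
    projected = W @ image_feat                      # map sample -> semantic space
    labels = list(class_embeddings.keys())
    vecs = np.stack([class_embeddings[c] for c in labels])
    # cosine similarity between the projected feature and every class vector
    sims = vecs @ projected / (
        np.linalg.norm(vecs, axis=1) * np.linalg.norm(projected) + 1e-12)
    return labels[int(np.argmax(sims))]
```

If class_embeddings contains both visible and invisible classes but W was fitted only on visible-class samples, the bias toward visible classes described above appears.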
However, most generative zero-sample learning methods use only a single kind of semantic prior information when generating samples, such as the attribute descriptions of classes, the hierarchical structure of classes, or the text descriptions of classes. The attributes of a class describe its semantic features in detail, including visual features such as color and shape as well as non-visual features (e.g., the habitat of an animal class). However, the same attribute may behave differently in different classes: when distinguishing the two animals "zebra" and "pig", for instance, the shared attribute "tail" looks different in each. The hierarchical structure of classes defines the taxonomy to which the classes belong, e.g., "horse" and "zebra" both belong to the equine family; however, because the two animals sit at the same level of the hierarchy, this semantic information carries no discriminative power between them. The text descriptions of classes provide detailed descriptions, such as "the tiger is a large feline with sharp hearing, night vision, freely retractable claws and robust canine teeth, and black vertical stripes on its coat". However, such descriptions contain much noise, and extracting the useful information from them is difficult.
Disclosure of Invention
In view of the above, an object of the present invention is to provide a generative zero-sample prediction method based on a knowledge graph, which constructs a knowledge graph fusing multiple kinds of semantic information and, based on it, generates samples with richer features and more inter-class discrimination for each invisible class, so as to better solve the prediction problem for samples of invisible classes.
To achieve the above object, the invention provides the following technical solution:
A generative zero-sample prediction method based on a knowledge graph comprises the following steps:
constructing a knowledge graph that fuses multiple kinds of semantic information, with the hierarchically structured categories as category nodes and with the attribute descriptions, text descriptions and external knowledge connected to the categories as additional nodes;
encoding the semantic information of the knowledge graph with a graph neural network algorithm to generate category vector representations;
using the generated category vector representations as the input of a generative model to generate samples for the categories, for learning and prediction of the zero-sample classes.
In the knowledge-graph-based generative zero-sample prediction method, multiple kinds of semantic information are fused so that their characteristics complement one another: for example, the attribute descriptions gain the constraints of the category hierarchy, and the category hierarchy gains discriminative attributes. On this basis, the method further fuses external knowledge bases such as ConceptNet and DBpedia to introduce richer category semantic information, so that, combined with a generative model, samples with richer features and more inter-class discrimination can be generated for each invisible class.
Preferably, when the knowledge graph is constructed, a hierarchical skeleton structure is built from the hypernym-hyponym relations contained in a lexical knowledge base, where each category serves as a category node corresponding to one word, and different category nodes are connected by the subclass relation according to the semantic structure of the lexical knowledge base;
the attribute descriptions and text descriptions of the categories are taken as additional nodes and connected to the category nodes, where each category is connected to its labeled attribute descriptions by the contains-attribute relation and to its description text by the has-description relation.
Preferably, when the knowledge graph is constructed, the categories are aligned with entities in an external knowledge base; the external knowledge of those entities is taken as additional nodes and connected to the categories by the contains-external-knowledge relation. The external knowledge is fused as follows: the category keywords are aligned with entities of the external knowledge base using an existing tool or platform; based on the aligned entities, the external knowledge corresponding to each entity (i.e., category) is queried with an existing API or query tool; the queried external knowledge is then suitably combined and connected to the category nodes in the current graph by the contains-external-knowledge relation.
Preferably, encoding the semantic information of the knowledge graph with a graph neural network algorithm and generating the category vector representations comprises:
dividing the knowledge graph into a plurality of subgraphs according to the relations, where the relations comprise the subclass, contains-attribute, has-description and contains-external-knowledge relations (a minimal split is sketched after this list);
and encoding each subgraph with a graph neural network to obtain category sub-vector representations that fuse each kind of semantic information, then concatenating all category sub-vector representations to obtain the category vector representations.
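A minimal sketch of the relation-based split described above; the triple list and relation names are illustrative placeholders for the actual graph:

```python
from collections import defaultdict

# Hypothetical triples; relation names follow the four relation types in the text.
triples = [
    ("zebra", "subclass_of", "equine"),
    ("zebra", "has_attribute", "striped"),
    ("zebra", "has_description", "zebra_text"),
    ("zebra", "has_external_knowledge", "zebra_conceptnet"),
]

def split_by_relation(triples):
    """Group triples into one subgraph (edge list) per relation type."""
    subgraphs = defaultdict(list)
    for head, rel, tail in triples:
        subgraphs[rel].append((head, tail))
    return subgraphs

subgraphs = split_by_relation(triples)   # keys: the four relation types
```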
In the invention, when the different subgraphs are encoded with the graph neural network algorithm, information is propagated between nodes and the semantic information of different nodes is fused, so that a semantic representation of each node is obtained in the vector space. Before a subgraph is encoded with the graph neural network, every category node and additional node in it is initialized with word vectors. Specifically, a word vector algorithm such as word2vec or GloVe is trained on a Wikipedia corpus to obtain pre-trained word vectors, the corpus covering the vocabulary involved in the categories, attributes and text descriptions. Each node is then represented from these pre-trained word vectors: a node containing a single word, such as "horse", is initialized with the word vector of that word; a node containing multiple words, such as "long tail", or a text-description node (typically a sentence) is initialized with the average of the vectors of the words it contains.
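A small sketch of this initialization, assuming pre-trained GloVe vectors loaded through the gensim downloader (the patent instead trains word2vec/GloVe on a Wikipedia corpus); names and the vector dimension are illustrative:

```python
import numpy as np
import gensim.downloader as api

# Pre-trained word vectors; any word2vec/GloVe model trained on Wikipedia would do.
wv = api.load("glove-wiki-gigaword-100")

def init_node(text, dim=100):
    """Initialize a node as the average of the word vectors of its tokens;
    a single-word node ("horse") reduces to that word's own vector."""
    tokens = [t for t in text.lower().split() if t in wv]
    if not tokens:
        return np.zeros(dim)                 # fallback for out-of-vocabulary nodes
    return np.mean([wv[t] for t in tokens], axis=0)

init_node("horse")        # single-word category node
init_node("long tail")    # multi-word attribute node -> averaged vector
```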
The lexical knowledge base is WordNet, and the external knowledge bases are ConceptNet and DBpedia.
Preferably, the generative model is built on a generative adversarial network. With the category vector representation as input, the generative model combines it with random noise drawn from a given distribution to generate the sample features of the category, which are then used for learning and prediction by the zero-sample learning algorithm.
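As an illustration only, a minimal PyTorch sketch of such a conditional generator is given below; the layer sizes, the noise dimension and the output feature dimension (2048, matching a typical pre-trained CNN feature extractor) are assumptions, not values from the patent:

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Generate class-conditioned sample features from [class vector ; noise]."""
    def __init__(self, class_dim, noise_dim=128, feat_dim=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(class_dim + noise_dim, 4096),
            nn.LeakyReLU(0.2),
            nn.Linear(4096, feat_dim),
            nn.ReLU(),                      # CNN image features are non-negative
        )

    def forward(self, class_vec, noise):
        # concatenate the category vector with Gaussian noise, then map to features
        return self.net(torch.cat([class_vec, noise], dim=1))

# Usage sketch: noise drawn from a Gaussian distribution, as in the text.
# g = ConditionalGenerator(class_dim=400)
# feats = g(class_vecs, torch.randn(class_vecs.size(0), 128))
```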
Compared with the prior art, the beneficial effects of the invention include at least:
(1) the method fuses the existing category semantic information (category attribute descriptions, the category hierarchy and category text descriptions) in a knowledge graph so that they complement one another: the attribute descriptions introduce discriminative semantic features into the hierarchy, the hierarchy adds category-level constraints to the attribute descriptions, and the fusion of the three kinds of semantic information carries more semantic information than a single, somewhat noisy text description. In addition, the invention links the categories to external knowledge bases to introduce further external knowledge into the graph; the proposed knowledge graph therefore contains more comprehensive category semantic prior knowledge, based on which the generative model can generate richer sample features;
(2) unlike existing knowledge graph representation learning methods (i.e., graph encoding methods such as TransE), the invention uses a graph neural network algorithm to fuse the semantic information in the graph and map it to the vector space, while initializing the representation of each node with pre-trained word vectors;
(3) unlike existing generative zero-sample prediction methods, which rely on a single kind of category semantic information and on model optimizers or complex networks, the invention uses a knowledge graph containing rich semantic information as the input of the generative model and, with a basic generative model framework, generates richer sample features for the invisible classes while achieving higher classification accuracy in the test scenarios of the zero-sample learning algorithm.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic flow chart of the knowledge-graph-based generative zero-sample prediction method according to an embodiment;
FIG. 2 is a schematic diagram of a skeleton structure of a knowledge graph constructed in an animal image classification scene;
FIG. 3 is a schematic diagram of class-attribute and class-text description constructed in an animal image classification scene;
FIG. 4 is a schematic diagram of the graph for the animal image classification scene after fusion with an external knowledge base.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the detailed description and specific examples, while indicating the scope of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
The knowledge-graph-based generative zero-sample prediction method can be used in application scenarios lacking training samples, including but not limited to image classification, text classification and relation classification, using the rich inter-category semantic information contained in the knowledge graph to solve the learning and prediction problem for zero-sample classes. This embodiment takes zero-sample animal image classification as an example and demonstrates the superiority of the algorithm by testing image classification performance in the zero-sample scenario. As shown in FIG. 1, when the knowledge-graph-based generative zero-sample prediction method is applied to animal image classification, it comprises the following steps:
step 1, fusing the attribute descriptions, hierarchical structure, text descriptions and external knowledge of the categories in a knowledge graph to model the rich semantic prior knowledge among the categories;
step 2, encoding the semantic information in the graph with a graph neural network algorithm and representing it in a vector space;
step 3, using the category vector representations obtained from the graph encoding as the input of a generative model to generate samples with rich features for learning and prediction by a zero-sample learning algorithm.
In step (1), the skeleton structure of the knowledge graph is built first. Each known animal category corresponds to a synset in WordNet; every category (both visible and invisible) is regarded as a category node, and the categories are connected by the subclass relation according to the hypernym-hyponym structure in WordNet. FIG. 2 is a schematic diagram of the constructed skeleton structure; the complete skeleton covers all animal categories.
Next, the attribute descriptions and text descriptions of the categories are obtained and connected to the category nodes as additional nodes.
The text description of a category is obtained from Wikidata: the Wikidata entity ID corresponding to the category is first queried through the MediaWiki API, and the text description of the entity (category) is then retrieved by ID through a Python Wikidata toolkit.
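A minimal sketch of this two-step lookup, using plain HTTP requests against the public Wikidata/MediaWiki API rather than the Python Wikidata toolkit mentioned above; the function name and error handling are illustrative:

```python
import requests

API = "https://www.wikidata.org/w/api.php"

def wikidata_description(category, lang="en"):
    """Look up the Wikidata entity ID for a category name, then fetch the
    entity's textual description for that language."""
    search = requests.get(API, params={
        "action": "wbsearchentities", "search": category,
        "language": lang, "format": "json"}).json()
    if not search.get("search"):
        return None
    qid = search["search"][0]["id"]                  # take the top-ranked entity match
    entity = requests.get(API, params={
        "action": "wbgetentities", "ids": qid,
        "props": "descriptions", "languages": lang, "format": "json"}).json()
    return entity["entities"][qid]["descriptions"].get(lang, {}).get("value")
```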
The attribute descriptions of the categories are obtained from Wikipedia and manually labeled to the categories they belong to. Specifically, the Wikipedia entry corresponding to each animal category is located, the animal description text in the entry is crawled with a tool, and a set of attribute-related words is extracted from it to build an attribute table. Based on this attribute table, volunteers are invited to label attributes for the animal categories: given 25 reference images per category, each volunteer labels 3-6 attributes for each category according to the attribute table; each category is labeled in the same way by 5 volunteers, and the final labels are obtained by the majority-vote principle.
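The majority-vote aggregation over the five volunteers' labels could look like the following sketch; the exact threshold (more than half of the annotators) is an assumption:

```python
from collections import Counter

def aggregate_attributes(annotations, n_annotators=5):
    """Keep an attribute if a majority of annotators assigned it to the class.
    `annotations` is a list of attribute sets, one per volunteer."""
    counts = Counter(attr for labels in annotations for attr in labels)
    return {attr for attr, n in counts.items() if n > n_annotators / 2}

# Example: 5 volunteers labeling the class "tiger" (attribute names illustrative).
votes = [{"striped", "carnivore"}, {"striped"}, {"striped", "big"},
         {"striped", "carnivore"}, {"carnivore"}]
aggregate_attributes(votes)   # -> {"striped", "carnivore"}
```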
After the attribute descriptions and text descriptions are obtained, the attributes and texts are connected as additional nodes to the corresponding category nodes through the contains-attribute and has-description relations; FIG. 3 shows the attribute description and text description information corresponding to the category "tiger".
Finally, in addition to the existing category semantic information, the proposed knowledge graph introduces knowledge about the categories from an external knowledge base by linking the categories to entities of that knowledge base. Take linking to the external knowledge base ConceptNet as an example: the concepts stored in ConceptNet that are connected to the current category through specific relations are queried via the ConceptNet REST API. Because the relations and concepts in ConceptNet are expressed in natural language, when the knowledge from ConceptNet is merged, each connecting relation is spliced with the connected concept to form an additional node, which is then connected to the current category node through the contains-external-knowledge relation. FIG. 4 shows the external knowledge associated with the category "dog" and the final fusion result.
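A hedged sketch of this lookup against the public ConceptNet REST API; the relation whitelist and the result formatting are assumptions, since the text only states that concepts connected by specific relations are retrieved and spliced with the relation name:

```python
import requests

def conceptnet_knowledge(category, relations=("IsA", "CapableOf", "HasA"), limit=20):
    """Query ConceptNet for edges attached to the category and splice the
    relation label with the connected concept, giving one additional node each."""
    term = category.lower().replace(" ", "_")
    resp = requests.get(f"http://api.conceptnet.io/c/en/{term}",
                        params={"limit": limit}).json()
    nodes = []
    for edge in resp.get("edges", []):
        # keep outgoing edges whose relation is in the (assumed) whitelist
        if edge["rel"]["label"] in relations and edge["start"]["@id"].startswith(f"/c/en/{term}"):
            nodes.append(f'{edge["rel"]["label"]} {edge["end"]["label"]}')
    return nodes

# conceptnet_knowledge("dog") might yield nodes such as "IsA pet" or "CapableOf bark".
```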
This completes the construction of a knowledge graph containing rich category semantic priors; the graph contains category, attribute, text-description and external-knowledge nodes, connected by the subclass, contains-attribute, has-description and contains-external-knowledge relations.
In step (2), based on the constructed knowledge graph, the graph is split into subgraphs of different views according to the different relations, and the nodes of each subgraph are then fused and mapped to a vector space using a graph neural network algorithm such as the Graph Auto-Encoder (GAE). Specifically, according to the four relation types (subclass, contains-attribute, has-description and contains-external-knowledge), the graph is split into a subgraph $G_s$ for the subclass relation, a subgraph $G_a$ for the contains-attribute relation, a subgraph $G_t$ for the has-description relation and a subgraph $G_e$ for the contains-external-knowledge relation. For each subgraph, an unsupervised GAE is used to obtain a representation of every node. Taking subgraph $G_a$ as an example, for each node $i$ in the graph, the GAE first aggregates the features of its neighboring nodes with a graph convolution layer to obtain an updated representation of the node:

$$h_i^{(l)} = \sigma\Big(W^{(l)} \sum_{j \in N_i \cup \{i\}} h_j^{(l-1)} + b^{(l)}\Big)$$

where $h_i^{(l-1)}$ and $h_i^{(l)}$ denote the representation of the current node in the current graph $G_a$ before the $l$-th convolution layer and the updated representation after the $l$-th layer's convolutional aggregation, $N_i$ denotes the neighboring nodes of the current node $i$ (in $G_a$, the neighbors of each category node are its labeled attribute nodes), and $W^{(l)}$ and $b^{(l)}$ denote the weight matrix and bias of the $l$-th convolution layer. Through multiple convolution layers, the final vector representation $z_i^a$ of node $i$ is obtained.
Note that in order to enrich the semantic information of the nodes themselves, the present invention proposes to initialize the nodes using word vectors, so the input of the first layer convolution operation is a word vector representation of each node.
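A minimal PyTorch sketch of the per-subgraph encoder is given below, assuming a normalized dense adjacency matrix with self-loops and word-vector node features as the first-layer input; the GAE's inner-product edge-reconstruction decoder used for unsupervised training is omitted, and the layer sizes are illustrative:

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph-convolution layer: aggregate neighbor features, then apply
    the layer's weight matrix W^(l), bias b^(l) and a nonlinearity."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)     # holds W^(l) and b^(l)

    def forward(self, h, adj):
        # adj: normalized adjacency of one subgraph (self-loops included)
        # h:   node representations h^(l-1); word vectors at the first layer
        return torch.relu(self.linear(adj @ h))

class GAEEncoder(nn.Module):
    """Two-layer encoder; the reconstruction decoder used to train the GAE
    without supervision is omitted in this sketch."""
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.l1, self.l2 = GCNLayer(in_dim, hid_dim), GCNLayer(hid_dim, out_dim)

    def forward(self, h, adj):
        return self.l2(self.l1(h, adj), adj)          # rows are the z_i of the subgraph
```

Each of the four subgraphs would be encoded with its own such encoder, and the rows corresponding to category nodes are the per-view category vectors that are concatenated below.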
Similarly, encoding the subgraphs $G_s$, $G_t$ and $G_e$ with the GAE algorithm yields the vector representations $z_i^s$, $z_i^t$ and $z_i^e$ of node $i$, respectively.
in step (3), considering that different subgraphs all contain class nodes, different subgraphs can obtain vector representations of class nodes from different views after GAE coding, for example, for class node c, vector representations can be obtained after different subgraphs are coded
Figure BDA0002684913420000096
And
Figure BDA0002684913420000097
in order to fuse the class representations from different views, the invention proposes to concatenate the vectors to obtain a final class vector representation:
Figure BDA0002684913420000098
at this time, gcContains semantic prior knowledge about the class c from different views as described in the knowledge-graph.
The generated category vector representation can be used as the input of a generative model built on a generative adversarial network (GAN) to generate sample features for the corresponding category, converting zero-sample learning into a conventional supervised learning model so that test samples of the invisible classes can be classified. Specifically, the GAN consists of a generator and a discriminator. The generator takes the category vector representation and random noise drawn from a given distribution as input and produces samples (image features) of the corresponding category; its training loss $\mathcal{L}_G$ is defined as

$$\mathcal{L}_G = -\,\mathbb{E}\big[D(\tilde{x})\big] \;-\; \lambda\,\mathbb{E}\big[\log p(c \mid \tilde{x})\big]$$

where $\tilde{x} = G(g_c, z)$ denotes a generated sample and $z$ denotes a random noise vector sampled from a Gaussian distribution. The first term of the loss is the loss term of the Wasserstein GAN (a GAN variant); the second term is a common softmax classification loss that classifies the generated samples so as to ensure their inter-class discrimination, where $c$ denotes the class, $\lambda$ is the corresponding weight coefficient, and $p(c \mid \tilde{x})$ is the predicted probability that sample $\tilde{x}$ belongs to class $c$. Note that the generator produces image features (for example, features extracted with a pre-trained convolutional neural network) rather than images themselves: experience from related work shows that generating raw images is harder and gives poorer zero-sample prediction results, and test images must in any case have their features extracted before classification, so generating sample features directly is the better choice.
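A sketch of the generator loss just defined, with the Wasserstein term and a λ-weighted softmax classification term; the softmax classifier, the unconditional discriminator and the default weight value are assumptions consistent with, but not specified by, the text:

```python
import torch.nn.functional as F

def generator_loss(D, classifier, fake_feats, labels, lam=0.01):
    """L_G = -E[D(x_tilde)] - lambda * E[log p(c | x_tilde)], as defined above."""
    wgan_term = -D(fake_feats).mean()                            # Wasserstein GAN term
    cls_term = F.cross_entropy(classifier(fake_feats), labels)   # softmax classification term
    return wgan_term + lam * cls_term
```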
The main role of the discriminator is to distinguish real samples from generated samples; its loss function $\mathcal{L}_D$ is defined as

$$\mathcal{L}_D = \mathbb{E}\big[D(x)\big] \;-\; \mathbb{E}\big[D(\tilde{x})\big] \;-\; \beta\,\mathbb{E}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big]$$

where the first two terms represent the Wasserstein distance between the generated samples and the real samples, the last term denotes the Lipschitz constraint term with $\beta$ the corresponding weight coefficient, and $\hat{x}$ obeys a Gaussian distribution.
The GAN model is trained iteratively: the generator is first fixed while the discriminator's loss is maximized, so that the discriminator learns to distinguish real samples from generated ones; the discriminator is then fixed while the generator's loss is minimized, so that the generator produces samples good enough to fool the discriminator. After multiple iterations, the model generates high-quality samples that the discriminator can no longer distinguish from real ones.
The GAN model is trained with the category vector representations and training samples of the visible classes. Once trained, the generator takes the category vector of an invisible class as input and generates high-quality samples that serve as training data for the subsequent algorithm; this transfer works because the visible and invisible classes are related in the semantic space (i.e., the knowledge graph). With the generated training samples, a classifier can be trained for each invisible class, so that the model can classify test samples of the invisible classes at test time.
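A hedged sketch of this iterative scheme; the number of discriminator updates per generator update and the penalty weight are illustrative, generator_loss is the sketch given above, and gradient_penalty is a hypothetical helper implementing the Lipschitz term:

```python
import torch

def train_step(G, D, classifier, opt_G, opt_D, real_feats, class_vecs, labels,
               noise_dim=128, d_steps=5, beta=10.0):
    """One iteration: several discriminator updates with G fixed, then one
    generator update with D fixed."""
    for _ in range(d_steps):
        noise = torch.randn(real_feats.size(0), noise_dim)
        fake = G(class_vecs, noise).detach()                   # generator is fixed here
        # Minimize the negative of L_D above, i.e. maximize the discriminator objective;
        # gradient_penalty(...) is a hypothetical helper for the Lipschitz term.
        d_loss = (D(fake).mean() - D(real_feats).mean()
                  + beta * gradient_penalty(D, real_feats, fake))
        opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    noise = torch.randn(real_feats.size(0), noise_dim)         # discriminator is fixed here
    fake = G(class_vecs, noise)
    g_loss = generator_loss(D, classifier, fake, labels)       # L_G from the sketch above
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()
```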
TABLE 1 (the quantitative comparison is reproduced as an image in the original publication)
Table 1 compares quantitative evaluation indices of the classification results of the method of the present invention and existing generative zero-sample methods on the same animal image dataset (80 classes and 77,173 images in total; 25 visible classes with about 1,300 training images each, and 55 invisible classes with no training images). Compared with existing methods that use text descriptions, attribute descriptions or the hierarchical structure alone as semantic prior knowledge (e.g., GAZSL, LisGAN), the present method generates samples with richer features and more inter-class discrimination, and therefore achieves higher classification accuracy in both the standard ZSL test scenario (test samples of invisible classes are classified only among the invisible classes) and the generalized ZSL test scenario (test samples are classified over both visible and invisible classes), demonstrating the superiority of the algorithm.
The above-mentioned embodiments are intended to illustrate the technical solutions and advantages of the present invention, and it should be understood that the above-mentioned embodiments are only the most preferred embodiments of the present invention, and are not intended to limit the present invention, and any modifications, additions, equivalents, etc. made within the scope of the principles of the present invention should be included in the scope of the present invention.

Claims (8)

1. A generative zero-sample prediction method based on a knowledge graph, characterized by comprising the following steps:
constructing a knowledge graph that fuses multiple kinds of semantic information, with the hierarchically structured categories as category nodes and with the attribute descriptions, text descriptions and external knowledge connected to the categories as additional nodes;
encoding the semantic information of the knowledge graph with a graph neural network algorithm to generate category vector representations;
using the generated category vector representations as the input of a generative model to generate samples for the categories, for learning and prediction of the zero-sample classes.
2. The knowledge-graph-based generative zero-sample prediction method of claim 1, wherein, when the knowledge graph is constructed, a hierarchical skeleton structure is built from the hypernym-hyponym relations contained in a lexical knowledge base, each category serving as a category node corresponding to one word, and different category nodes being connected by the subclass relation according to the semantic structure of the lexical knowledge base;
and the attribute descriptions and text descriptions of the categories are taken as additional nodes and connected to the category nodes, each category being connected to its labeled attribute descriptions by the contains-attribute relation and to its description text by the has-description relation.
3. The knowledge-graph-based generative zero-sample prediction method of claim 1 or 2, wherein the knowledge graph is constructed by aligning the categories with entities in an external knowledge base, taking the external knowledge of those entities as additional nodes, and connecting the external knowledge to the categories by the contains-external-knowledge relation.
4. The knowledge-graph-based generative zero-sample prediction method of claim 1, wherein encoding the semantic information of the knowledge graph with a graph neural network algorithm and generating the category vector representations comprises:
dividing the knowledge graph into a plurality of subgraphs according to the relations, the relations comprising the subclass, contains-attribute, has-description and contains-external-knowledge relations;
and encoding each subgraph with a graph neural network to obtain category sub-vector representations fusing each kind of semantic information, and concatenating all category sub-vector representations to obtain the category vector representations.
5. The knowledge-graph-based generative zero-sample prediction method of claim 4, wherein, before a subgraph is encoded with the graph neural network, every category node and additional node in the subgraph is initialized with word vectors.
6. The knowledge-graph-based generative zero-sample prediction method of claim 3, wherein the lexical knowledge base is WordNet and the external knowledge bases are ConceptNet and DBpedia.
7. The knowledge-graph-based generative zero-sample prediction method of claim 1, wherein the generative model is built on a generative adversarial network.
8. The knowledge-graph-based generative zero-sample prediction method of claim 1 or 7, wherein, with the category vector representation as input, the generative model combines it with random noise drawn from a given distribution to generate the sample features of the class, which are used for learning and prediction by the zero-sample learning algorithm.
CN202010973420.3A 2020-09-16 2020-09-16 Generation type zero sample prediction method based on knowledge graph Active CN112100380B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010973420.3A CN112100380B (en) 2020-09-16 2020-09-16 Generation type zero sample prediction method based on knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010973420.3A CN112100380B (en) 2020-09-16 2020-09-16 Generation type zero sample prediction method based on knowledge graph

Publications (2)

Publication Number Publication Date
CN112100380A 2020-12-18
CN112100380B (en) 2022-07-12

Family

ID=73759817

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010973420.3A Active CN112100380B (en) 2020-09-16 2020-09-16 Generation type zero sample prediction method based on knowledge graph

Country Status (1)

Country Link
CN (1) CN112100380B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160132648A1 (en) * 2014-11-06 2016-05-12 ezDI, LLC Data Processing System and Method for Computer-Assisted Coding of Natural Language Medical Text
CN110163258A (en) * 2019-04-24 2019-08-23 浙江大学 A kind of zero sample learning method and system reassigning mechanism based on semantic attribute attention
CN110334212A (en) * 2019-07-01 2019-10-15 南京审计大学 A kind of territoriality audit knowledge mapping construction method based on machine learning
CN111126218A (en) * 2019-12-12 2020-05-08 北京工业大学 Human behavior recognition method based on zero sample learning
CN111444305A (en) * 2020-03-19 2020-07-24 浙江大学 Multi-triple combined extraction method based on knowledge graph embedding

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112365429A (en) * 2020-12-21 2021-02-12 神思电子技术股份有限公司 Knowledge-driven image fuzzy region definition enhancement method
CN112365429B (en) * 2020-12-21 2022-07-22 神思电子技术股份有限公司 Knowledge-driven image fuzzy region definition enhancement method
CN112966676A (en) * 2021-02-04 2021-06-15 北京易道博识科技有限公司 Document key information extraction method based on zero sample learning
CN112966676B (en) * 2021-02-04 2023-10-20 北京易道博识科技有限公司 Document key information extraction method based on zero sample learning
CN113012770A (en) * 2021-03-17 2021-06-22 中南大学 Medicine-medicine interaction event prediction method, system, terminal and readable storage medium based on multi-modal deep neural network
CN113012770B (en) * 2021-03-17 2022-05-10 中南大学 Multi-modal deep neural network based prediction of drug-drug interaction events
CN113505701A (en) * 2021-07-12 2021-10-15 辽宁工程技术大学 Variational self-encoder zero sample image identification method combined with knowledge graph

Also Published As

Publication number Publication date
CN112100380B (en) 2022-07-12

Similar Documents

Publication Publication Date Title
CN112100380B (en) Generation type zero sample prediction method based on knowledge graph
CN110597735B (en) Software defect prediction method for open-source software defect feature deep learning
CN108399428B (en) Triple loss function design method based on trace ratio criterion
CN106778804B (en) Zero sample image classification method based on class attribute transfer learning
CN109685110B (en) Training method of image classification network, image classification method and device, and server
CN111488734A (en) Emotional feature representation learning system and method based on global interaction and syntactic dependency
WO2021139191A1 (en) Method for data labeling and apparatus for data labeling
CN111881290A (en) Distribution network multi-source grid entity fusion method based on weighted semantic similarity
CN109783666A (en) A kind of image scene map generation method based on iteration fining
CN111159385A (en) Template-free universal intelligent question-answering method based on dynamic knowledge graph
CN110826639B (en) Zero sample image classification method trained by full data
Lin et al. Deep structured scene parsing by learning with image descriptions
CN113051914A (en) Enterprise hidden label extraction method and device based on multi-feature dynamic portrait
CN113254507B (en) Intelligent construction and inventory method for data asset directory
CN110717090A (en) Network public praise evaluation method and system for scenic spots and electronic equipment
CN115526236A (en) Text network graph classification method based on multi-modal comparative learning
CN113627190A (en) Visualized data conversion method and device, computer equipment and storage medium
CN114064928A (en) Knowledge inference method, knowledge inference device, knowledge inference equipment and storage medium
Lonij et al. Open-world visual recognition using knowledge graphs
Roy et al. Diag2graph: Representing deep learning diagrams in research papers as knowledge graphs
CN113657473A (en) Web service classification method based on transfer learning
CN111950646A (en) Hierarchical knowledge model construction method and target identification method for electromagnetic image
US11875250B1 (en) Deep neural networks with semantically weighted loss functions
CN110413795A (en) A kind of professional knowledge map construction method of data-driven
CN113076490B (en) Case-related microblog object-level emotion classification method based on mixed node graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant