CN112183580A

CN112183580A - Small sample classification method based on dynamic knowledge path learning

Info

Publication number: CN112183580A
Application number: CN202010927478.4A
Authority: CN
Inventors: 廖清; 尹哲; 柴合言; 漆舒汉; 刘洋
Original assignee: Harbin Institute of Technology Shenzhen
Current assignee: Harbin Institute of Technology Shenzhen
Priority date: 2020-09-07
Filing date: 2020-09-07
Publication date: 2021-01-05
Anticipated expiration: 2040-09-07
Also published as: CN112183580B

Abstract

A small sample classification method based on dynamic knowledge path learning, comprising the following steps: in the knowledge selection stage based on knowledge graph, by forming auxiliary sets into knowledge graph, use small sample instances to find a learning path suitable for oneself in the knowledge graph; In the dynamic path generation stage of category constraints, a small sample instance selects the most relevant knowledge points in the knowledge graph to form a path, introduces the constraints of the path at the category level, obtains category commonality, and constrains the quality of the path by calculating the path loss; path-based knowledge In the learning and classification phase, the information carried by the most relevant knowledge points is sequentially extracted to enhance the feature expression of the target instance. On the category with the highest similarity, the cross-entropy loss is used to measure the classification loss, and a small-sample classification model is established by the weighted summation of the classification loss and the path loss.

Description

Small sample classification method based on dynamic knowledge path learning

Technical Field

The invention relates to a small sample classification method based on dynamic knowledge path learning, and belongs to the technical field of small sample classification.

Background

Deep learning has achieved good results in various fields, but the existing artificial intelligence depends on mass data training, the model generalization capability is poor, and the effect and the capability of rapidly expanding to a new task in the limited data field are unsatisfactory. To address this problem, a small sample Learning (FSL) problem is proposed. Small sample learning can help alleviate the difficulty of collecting large-scale supervised data or manual labeling, making artificial intelligence more convenient to use in an industrial environment, such as ResNet's classification accuracy on manually labeled ImageNet datasets has exceeded human, but on the other hand people can identify 30,000 classes, making it almost impossible for a machine to do a task. In contrast, small sample learning may help reduce the data collection effort for these data intensive applications, such as image classification, image retrieval, target tracking, gesture recognition, image captioning and visual problem solving, video event detection and language modeling, and so forth. Therefore, if the small sample learning architecture is used for solving the problems, the computing resources and labor cost can be greatly reduced, and the models and the algorithms can successfully solve the small sample learning problems and can achieve better effect under the condition of sufficient data. Secondly, in the fields of privacy, security, medical treatment and the like, supervision information is difficult or impossible to obtain, and small sample learning has gained much research on such tasks. In the medical field, the discovery of drugs often requires exploring the properties of molecules as the basis of new drugs, however, due to the possible toxicity and low activity of new molecules, there are generally few biological records and clinical experiments, which results in slow progress of research. In the recommendation field, the cold start problem has been troubling because there is not enough user support for a new system, leading to many algorithms based on user-commodity matrix decomposition failing, but it has become possible to learn models for such rare cases of samples through small samples.

Existing small sample learning frameworks are broadly divided into three categories, data-driven, model-driven, and algorithm-driven.

(1) And (3) data driving:

the data-driven approach makes more data through a priori knowledge to reduce estimation errors. It can be subdivided into two small directions, one using some transformations to assign a training set, and the other to obtain new data from other data sets. The former is typically data enhancement by some manual rules or to make the network learn how to transform the data set, such as common countermeasures to generate the network, e.g. simulating the data distribution while producing a large number of valid samples. Some of the methods enhance the picture by learning to change irrelevant background information in the picture, such as changing the sunlight of the picture in a target recognition task and modifying the scenery of the picture to achieve the purpose of increasing samples. The latter uses unsupervised data to enhance the expressiveness of the model, or adds similar data sets to enable the network to learn how to generate a good minimum experience risk device, for example, the method of adding unsupervised data sets generally uses a large unlabeled data set as a priori knowledge, and the key point is how to find data with the same label as that in the training set, and then adds the data into the training set to enhance the data, so that the change of the same kind of data can be increased to enable the model to have better generalization capability, and the skill is used in the semi-supervised prototype propagation graph network. The similarity data may bring some deviation and misleading because the similarity data is not directly designed for the target task, so that indistinguishable false data is generated on a data set with many data in a method based on a countermeasure generation network, and the mean and the variance on the data set are considered in the generation process, so that the generation process has more variability.

(2) And (3) driving an algorithm:

the algorithm drive considers that the search strategy in the assumed space is changed by using the prior knowledge, so that the optimal solution is better found, and the method can be roughly divided into three types: fine tuning already trained parameters, meta-learner, learning how to search. The first strategy stimulates the trend of storing the trained model parameters, so that the transfer learning becomes a more popular branch in the field of small sample learning. Thus learning how to adapt to new tasks becomes a new goal. The latter two strategies belong to the category of meta-learning, the former is to learn trained parameters in a plurality of tasks which are distributed in the same way through a meta-learner to be used as the initialization of a test task, and the latter directly applies the learned search steps or update rules to a new task.

(3) Driving a model:

the method tries to learn a proper feature embedding space where the feature embedding of pictures with the same label will be similar, while the features of different classes of pictures are exactly the opposite, and the final classification utilizes the nearest neighbor method. The twin network is classified by calculating similarity scores of pairs of picture inputs, and the matching network uses an attention mechanism and a memory unit to compare the similarity between the test sample and the support set sample. The prototype network uses the embedded mean value of the small sample class pictures as prototype expression of the class, and returns the prediction result by searching nearest neighbor. There are also methods to improve the depth metric of prototype networks or learning migratability by three clustering methods based on semi-supervision. Then, some scholars directly form a graph by similarity of the support set and the test sample, the graph is iterated and then directly classified by using the node characteristics after iteration, and some works adopt a closed label propagation mode to learn how to associate the test sample to the support set label by using a meta-learning mode in the relational graph, so that the test set label is directly obtained. There are also methods that use two-stage learning to add a priori knowledge, which is then used to assist in the subsequent small sample learning task.

A two-stage training mode is utilized, a model is trained in a large-scale data set to enable the model to have the capability of extracting features, a large data set is utilized as an auxiliary to provide some extra priori knowledge in the second stage, and a small sample data set is utilized for retraining so that the model can adapt to the conditions of few samples and many new tasks. The knowledge propagation processes of the two are global and are diverged out by the auxiliary set, and the direction propagation of the knowledge is not considered.

The existing small sample learning framework considers that training tasks are similar, however, actually dissimilar tasks bring negative migration to pollute the whole model, and the class center is difficult to extract from a small amount of data because extra knowledge is not learned; the directional transmissibility of knowledge learning is not considered, and the advantages brought by the front and back relations of knowledge learning are not utilized.

Disclosure of Invention

The invention provides a small sample classification method based on dynamic knowledge path learning, which solves the problems that the prior art is difficult to extract a class center from a small amount of data due to the fact that extra knowledge is not learned, and the other technologies utilizing global extra knowledge do not consider the benefits brought by knowledge learning directionality, often lack of interpretability and poor learning effect, and has the following specific technical scheme:

a small sample classification method based on dynamic knowledge path learning is characterized in that: the method comprises the following steps:

in the knowledge selection stage based on the knowledge graph, a small sample example is used for searching a learning path suitable for the user in the knowledge graph by combining the auxiliary set into the knowledge graph;

in a dynamic path generation stage based on category constraint, selecting the most relevant knowledge points in a knowledge graph to form a path by a small sample example, introducing the constraint of the path at a category level to obtain category commonality, and constraining the path quality by calculating path loss;

and in the knowledge learning and classification stage based on the path, information carried by the most relevant knowledge points is extracted in sequence to enhance the feature expression of the target example, similarity calculation is carried out on the feature expression of each example in the query set and each example in the support set pairwise, so that the target examples are classified into the category with the highest similarity, then the classification loss is measured by using cross entropy loss, and a small sample classification model is established through the weighted summation of the classification loss and the path loss.

Preferably, in the knowledge selection stage based on the knowledge graph, the specific method for finding the learning path suitable for the small sample in the knowledge graph is as follows:

the nodes of the knowledge graph are directly composed of auxiliary set type prototypes, wherein the prototypes are knowledge points, and the nodes are connected by using the similarity between the auxiliary types as the weight of the edges; the similarity calculation between the knowledge graph nodes is shown by formula (1):

p and q in the formula (1) are auxiliary set class knowledge points, and s is a similarity measurement function and is defined as dot product similarity; the weight of the edge between every two nodes in the knowledge graph can be obtained through the function, so that a knowledge graph formed by the auxiliary set B is determined;

selecting an exclusive knowledge path for each small sample instance, setting the length of the knowledge path as T, and selecting a knowledge point from a knowledge graph formed by the auxiliary set B at each moment as a node on the path; in particular, a hidden state is set for one of the paths i, i.e. the path chosen by the ith instance

For calculating the probability of selecting a knowledge point, and the hidden state of the ith instance at time t is

After selecting the knowledge points, the knowledge points are updated to

Using hidden states

The attention calculation is carried out on all the knowledge points in the auxiliary set B, and the probability of selecting the knowledge point j is taken as the attention distribution

In the formula (2), W^T，W_hAnd W_vAre all linear transformation operations, the matrix can be transformed to the appropriate dimensions, | B | represents the number of knowledge points in the auxiliary graph,

representing the jth knowledge point in the auxiliary graph,

representing the preference degree of the ith instance to the jth knowledge point at the time t;

in the formula (3), the first and second groups,

the preference degree of the ith instance to the kth knowledge point at the time t is shown, and is specifically proposed here to obtain the preference degrees of all knowledge points through addition, and convert the preference degrees into a probability form through a formula (3), so as to obtain the probability of selecting the knowledge point j

Selecting the point with the maximum probability knowledge as the node of the path according to the formula (4), and recording the t-th node on the path as the node

That is, the ith instance picks the path node at time t as

Average knowledge point characteristic is introduced when hidden layer state at t +1 moment is calculated

The influence of deviation from the original problem in path searching is reduced, and finally, the hidden layer state is updated through a recurrent neural network:

furthermore, in the dynamic path generation phase based on the category constraint, the path acquisition is obtained by selecting the most relevant knowledge point from the knowledge graph for T times by the instance of the small sample task, and the path selected by the ith instance is marked as a path dⁱAnd the t-th node therein is marked as

The path chosen for the ith instance is therefore routed

Composition is carried out;

the path loss is calculated to constrain the path quality, and the calculation is as follows:

wherein o, u, v are indices used to represent any value within a range; the number of the examples under the query set and the number of the examples under the support set in the small sample task are respectively expressed by | Q | and | S |, y^o，y^vAnd y^uEach represents a label of example i, when i ═ o, v, u;

while

The average degree of attention of example i to each knowledge point from time 1 to t is shown. When i is o, it is expressed as u or v, similarly

All represent the average attention degree of a certain example to each knowledge point from 1 to t;

the method is used for indicating whether the knowledge points are selected at the time from 1 to t, and the value is increased along with the increase of the time step so as to increase the punishment on the distribution which does not meet the requirement.

Furthermore, in the path-based knowledge learning and classification stage, the information carried by the most relevant knowledge points can be sequentially extracted through the hidden layer states to enhance the feature expression of the target instance, and the nodes on the path and the hidden layer states updated through the recurrent neural network in the previous knowledge selection stage based on the knowledge graph are used as input by using a gating mechanism to update the corresponding hidden layer states:

wherein, W_r，W_z，

All represent linear transformation operations, and

intermediate states that are gating mechanisms, and finally hidden states are removed from

Is updated to

Sigma is an activation function for increasing the nonlinear characteristics of the intermediate state of the gating mechanism;

after gathering the information of the T knowledge points, the final hidden layer state is obtained

Obtaining the feature expression of the instance in the new space through an output network by combining the attention distribution of the knowledge selection stage based on the knowledge graph, and using p_i or q_gRespectively, the feature expression in the new space of the g-th instance from the support set or the i-th instance from the query set:

wherein

And

each set is used to represent the situation at each time step,

represents the average degree of attention to the knowledge points at each moment, and

representing the hidden layer state at each time;

and performing similarity calculation on the feature expression of each small sample example in the query set example and the support set so that the target examples are classified into the categories with the highest similarity, wherein the calculation formula is as follows:

wherein

The similarity of the g-th instance in the query set to the i-th instance in the support set is shown, and the same principle can be known

Pr(y＝c|q_g) Representing the probability of belonging to category c;

to express p in the formula_iTranspose of (y)_kHere again, the label of the kth instance in the supporting set is represented;

wherein W_sIs a learnable parameter, using Pr_gRepresenting the probability vectors for each category for the query set instance g,

then, the cross entropy loss is used for measuring the classification loss, and N represents the packet of the support setNumber of classes contained:

building a small sample classification model through the weighted summation of the classification loss and the path loss:

L＝λL₁+μL₂+νL₃ #(16)

wherein λ, μ and ν are hyper-parameters for controlling the weight of each loss function; the objective function can be used for guiding the model to search for a more reasonable knowledge path and improving the classification precision of the small samples.

The invention can simulate the sequentiality of knowledge learning and the unexplainable property of a previous model, convert the classification problem into the problems of knowledge selection and sequence determination, and avoid the influence caused by dissimilar tasks.

Drawings

Fig. 1 is a flowchart of the small sample classification method based on dynamic path knowledge learning according to the present invention.

FIG. 2 is a workflow diagram of the present invention based on dynamic path knowledge learning.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The invention provides a small sample classification method based on dynamic knowledge path learning, which is divided into three stages of knowledge selection based on a knowledge graph, dynamic path generation based on category constraint and knowledge learning based on a path, and respectively considers the learning process of forming a learning path after knowledge is selected and then learning, acquires the commonality of categories by learning matched knowledge for each small sample example, and reduces the influence caused by respective difference among samples.

As shown in fig. 1, a method for classifying small samples based on dynamic knowledge path learning includes the following steps:

1. in the knowledge selection stage based on the knowledge graph, a small sample example is used for searching a learning path suitable for the user in the knowledge graph by combining the auxiliary set into the knowledge graph; the specific method comprises the following steps:

After selecting the knowledge points, the knowledge points are updated to

Using hidden states

representing the jth knowledge point in the auxiliary graph,

in the formula (3), the first and second groups,

That is, the ith instance picks the path node at time t as

2. in a dynamic path generation stage based on category constraint, selecting the most relevant knowledge points in a knowledge graph to form a path by a small sample example, introducing the constraint of the path at a category level to obtain category commonality, and constraining the path quality by calculating path loss; the path is obtained by selecting the most relevant knowledge point from the knowledge graph T times by a small sample example, and the path selected by the ith example is marked as a path dⁱAnd the t-th node therein is marked as

The path chosen for the ith instance is therefore routed

Composition is carried out;

wherein o, u, v are indices used to represent any value within a range; the number of the examples under the query set and the number of the examples under the support set in the small sample task are respectively expressed by | Q | and | S |, y^o，y^vAnd y^uEach represents a label of example i, when i ═ o, v, u; KL is an abbreviation of Kullback-Leibler divergence;

the average attention degree of the example i to each knowledge point at the time 1-t is shown; when i is o, it is expressed as u or v, similarly

3. In the knowledge learning and classification stage based on the path, information carried by the most relevant knowledge points can be sequentially extracted through the hidden layer state to enhance the feature expression of the target example, and the nodes on the path and the hidden layer state updated through the recurrent neural network in the knowledge selection stage based on the knowledge graph are used as input to update the corresponding hidden layer state by utilizing a gating mechanism:

wherein, W_r，W_z，

All represent linear transformation operations, and

Is updated to

wherein

And

each set is used to represent the situation at each time step,

representing the hidden layer state at each time;

wherein

Pr(y＝c|q_g) Representing the probability of belonging to category c;

to express p in the formula_iTranspose of (y)_kAre still denoted hereinSupporting the tag of the kth instance;

then, the cross entropy loss is used to measure the classification loss, and N represents the number of classes contained in the support set:

L＝λL₁+μL₂+νL₃ #(16)

The invention provides a small sample classification method for dynamic path learning, which enables prior knowledge to be orderly transmitted among tasks, designs a loss function aiming at path generation and ensures the reasonability of path generation to the greatest extent.

As shown in fig. 2, the working flow of small sample learning based on the present invention is: and (3) forming a knowledge graph by the aid of similarity among the auxiliary sets, dynamically selecting a dedicated knowledge path for each small sample task instance when a small sample classification task is input, then learning a final feature expression according to the path, and performing feature classification on the basis of the feature expression to fulfill the purpose of small sample classification.

Although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that various changes in the embodiments and/or modifications of the invention can be made, and equivalents and modifications of some features of the invention can be made without departing from the spirit and scope of the invention.

Claims

1. a small sample classification method based on dynamic knowledge path learning, is characterized in that: step is as follows:

In the knowledge selection stage based on knowledge graph, the auxiliary set is formed into a knowledge graph, and a small sample instance is used to find a suitable learning path in the knowledge graph;

In the dynamic path generation stage based on category constraints, a small sample instance selects the most relevant knowledge points in the knowledge graph to form a path, introduces the constraints of the path at the category level, obtains the category commonality, and constrains the quality of the path by calculating the path loss;

In the path-based knowledge learning and classification stage, the information carried by the most relevant knowledge points is sequentially extracted to enhance the feature expression of the target instance, and the similarity calculation of each instance in the query set and the feature expression of each instance in the support set is performed. , so that the target instance is assigned to the category with the highest similarity, and then the cross entropy loss is used to measure the classification loss, and a small sample classification model is established by the weighted summation of the classification loss and the path loss.

2. A small sample classification method based on dynamic knowledge path learning according to claim 1, characterized in that: in the knowledge selection stage based on the knowledge graph, the small sample instance finds a learning path suitable for itself in the knowledge graph The specific method is:

The nodes of the knowledge graph are directly composed of the prototype of the auxiliary set category, and the prototype is the knowledge point, and the edge uses the similarity between the auxiliary categories as the weight of the edge to connect the nodes; the similarity between the knowledge graph nodes is calculated by the formula (1) shows:

In formula (1), p and q are the knowledge points of the auxiliary set category, and s is the similarity measurement function and is defined as the dot product similarity; through this function, the weight of the edge between two nodes in the knowledge graph can be obtained, so that Determine a knowledge graph composed of auxiliary set B;

Select an exclusive knowledge path for each small sample instance, set the length of the knowledge path as T, and select a knowledge point in the knowledge graph formed by the auxiliary set B at each moment as a node on the path; in particular, for One of the paths i is the path selected by the i-th instance, and a hidden state is set

It is used to calculate the probability of selecting knowledge points, and the hidden state of the i-th instance at time t is

After the knowledge point is selected, it will be updated as

Use hidden state

Perform attention calculation with all knowledge points in the auxiliary set B, and use the attention distribution as the probability of selecting knowledge point j

In formula (2), W ^T , W _h and W _v are all linear transformation operations, which can transform the matrix to the appropriate dimension, |B| represents the number of knowledge points in the auxiliary graph,

represents the jth knowledge point in the auxiliary graph,

Represents the preference of the i-th instance to the j-th knowledge point at time t;

In formula (3),

Represents the preference of the i-th instance to the k-th knowledge point at time t, and is converted into a probability form by formula (3), so as to obtain the probability of selecting knowledge point j

According to formula (4), the knowledge point with the highest probability is selected as the node of the path, then the t-th node on the path is denoted as

That is, the path node selected by the i-th instance at time t is

The average knowledge point feature is introduced when calculating the hidden layer state at time t+1

To reduce the impact of deviating from the original problem when searching for a path, and finally update the hidden layer state through a recurrent neural network:

3. A small sample classification method based on dynamic knowledge path learning according to claim 2, characterized in that: in the class constraint-based dynamic path generation stage, the path is obtained by the instance of the small sample task from the knowledge graph After selecting the most relevant knowledge point T times from the , the path selected by the i-th instance is denoted as the path d ⁱ , and the t-th node in it is denoted as

So the path picked for the ith instance is given by

composition;

The quality of the path is constrained by calculating the path loss, which is calculated as follows:

Among them, o, u, and v are indexes, which are used to represent any value in a range; |Q|, |S| are used to represent the number of instances in the query set and the number of instances in the support set in the small sample task, respectively. Number, y ^o , y ^v and y ^u all represent instance i, the label when i=o, v, u;

and

is the average degree of attention of instance i to each knowledge point from 1 to t; at this time i=o, in the same way, when i=u or v, it is expressed as

Both represent the average degree of attention paid to each knowledge point by a certain instance from 1 to t;

It is used to indicate whether the knowledge point is selected from 1 to t, and the value increases with the increase of time steps to increase the penalty for the distribution that does not meet the requirements.

4. A small sample classification method based on dynamic knowledge path learning according to claim 3, characterized in that: in the path-based knowledge learning and classification stage, the most relevant knowledge can be sequentially sorted through the hidden layer state The information carried by the point is extracted to enhance the feature expression of the target instance, and the node on the path and the state of the hidden layer updated by the recurrent neural network in the previous knowledge selection stage based on the knowledge graph are used as input to update the corresponding hidden layer. state:

Among them, W _r , W _z ,

both represent linear transformation operations, and

are the intermediate states of the gating mechanism, and finally the hidden state is changed from

update to

σ is the activation function, which is used to increase the nonlinear characteristics of the intermediate state of the gating mechanism;

After the aggregation of T knowledge point information, the final hidden layer state is obtained

Combined with the attention distribution of the knowledge selection stage based on knowledge graph, an output network is used to obtain the feature expression of the instance in the new space, and p _i or q _g are used to represent the gth instance from the support set or from the query. The feature representation of the i-th instance in the new space:

in

and

are a set used to represent the situation at each time step,

represents the average degree of attention to each knowledge point at each moment, and

Represents the state of the hidden layer at each moment;

The similarity calculation is performed between the query set instance and the feature expression of each small sample instance in the support set, so that the target instance is assigned to the category with the highest similarity. The calculation formula is as follows:

in

Represents the similarity of the g-th instance in the query set to the i-th instance in the support set. Similarly, it can be seen that

Pr(y=c|q _g ) represents the probability of belonging to category c;

In order to express the transpose of pi in the formula, y _k here still represents the label of the _k -th instance in the support set;

where W _s is a learnable parameter, and Pr _g is used to represent the probability vector of query set instance g for each category,

Then use the cross entropy loss to measure the classification loss, N represents the number of categories contained in the support set:

Build a few-shot classification model by weighted summation of classification loss and path loss:

L=λL ₁ +μL ₂ +νL ₃ # (16)

Among them, λ, μ and ν are hyperparameters used to control the weight of each loss function; using this objective function can guide the model to find a more reasonable knowledge path and improve the accuracy of small sample classification.