CN112183580A - Small sample classification method based on dynamic knowledge path learning - Google Patents

Small sample classification method based on dynamic knowledge path learning Download PDF

Info

Publication number
CN112183580A
CN112183580A CN202010927478.4A CN202010927478A CN112183580A CN 112183580 A CN112183580 A CN 112183580A CN 202010927478 A CN202010927478 A CN 202010927478A CN 112183580 A CN112183580 A CN 112183580A
Authority
CN
China
Prior art keywords
knowledge
path
instance
small sample
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010927478.4A
Other languages
Chinese (zh)
Other versions
CN112183580B (en
Inventor
廖清
尹哲
柴合言
漆舒汉
刘洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology Shenzhen
Original Assignee
Harbin Institute of Technology Shenzhen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology Shenzhen filed Critical Harbin Institute of Technology Shenzhen
Priority to CN202010927478.4A priority Critical patent/CN112183580B/en
Publication of CN112183580A publication Critical patent/CN112183580A/en
Application granted granted Critical
Publication of CN112183580B publication Critical patent/CN112183580B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Animal Behavior & Ethology (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

一种基于动态知识路径学习的小样本分类方法,包括如下步骤:基于知识图的知识挑选阶段,通过将辅助集组成知识图,用以小样本实例在知识图中寻找适合自己的学习路径;基于类别约束的动态路径生成阶段,小样本实例选择知识图中最相关的知识点组成路径,引入路径在类别级的约束,获得类别共性,通过计算路径损失来约束路径的好坏;基于路径的知识学习与分类阶段,顺序地将最相关的知识点所携带的信息提取出来增强目标实例的特征表达,查询集实例与支撑集中每一个小样本实例的特征表达进行相似度计算,使得目标实例分到相似度最高的类别上,然后使用交叉熵损失来衡量分类损失,通过分类损失和路径损失的加权求和建立小样本分类模型。

Figure 202010927478

A small sample classification method based on dynamic knowledge path learning, comprising the following steps: in the knowledge selection stage based on knowledge graph, by forming auxiliary sets into knowledge graph, use small sample instances to find a learning path suitable for oneself in the knowledge graph; In the dynamic path generation stage of category constraints, a small sample instance selects the most relevant knowledge points in the knowledge graph to form a path, introduces the constraints of the path at the category level, obtains category commonality, and constrains the quality of the path by calculating the path loss; path-based knowledge In the learning and classification phase, the information carried by the most relevant knowledge points is sequentially extracted to enhance the feature expression of the target instance. On the category with the highest similarity, the cross-entropy loss is used to measure the classification loss, and a small-sample classification model is established by the weighted summation of the classification loss and the path loss.

Figure 202010927478

Description

Small sample classification method based on dynamic knowledge path learning
Technical Field
The invention relates to a small sample classification method based on dynamic knowledge path learning, and belongs to the technical field of small sample classification.
Background
Deep learning has achieved good results in various fields, but the existing artificial intelligence depends on mass data training, the model generalization capability is poor, and the effect and the capability of rapidly expanding to a new task in the limited data field are unsatisfactory. To address this problem, a small sample Learning (FSL) problem is proposed. Small sample learning can help alleviate the difficulty of collecting large-scale supervised data or manual labeling, making artificial intelligence more convenient to use in an industrial environment, such as ResNet's classification accuracy on manually labeled ImageNet datasets has exceeded human, but on the other hand people can identify 30,000 classes, making it almost impossible for a machine to do a task. In contrast, small sample learning may help reduce the data collection effort for these data intensive applications, such as image classification, image retrieval, target tracking, gesture recognition, image captioning and visual problem solving, video event detection and language modeling, and so forth. Therefore, if the small sample learning architecture is used for solving the problems, the computing resources and labor cost can be greatly reduced, and the models and the algorithms can successfully solve the small sample learning problems and can achieve better effect under the condition of sufficient data. Secondly, in the fields of privacy, security, medical treatment and the like, supervision information is difficult or impossible to obtain, and small sample learning has gained much research on such tasks. In the medical field, the discovery of drugs often requires exploring the properties of molecules as the basis of new drugs, however, due to the possible toxicity and low activity of new molecules, there are generally few biological records and clinical experiments, which results in slow progress of research. In the recommendation field, the cold start problem has been troubling because there is not enough user support for a new system, leading to many algorithms based on user-commodity matrix decomposition failing, but it has become possible to learn models for such rare cases of samples through small samples.
Existing small sample learning frameworks are broadly divided into three categories, data-driven, model-driven, and algorithm-driven.
(1) And (3) data driving:
the data-driven approach makes more data through a priori knowledge to reduce estimation errors. It can be subdivided into two small directions, one using some transformations to assign a training set, and the other to obtain new data from other data sets. The former is typically data enhancement by some manual rules or to make the network learn how to transform the data set, such as common countermeasures to generate the network, e.g. simulating the data distribution while producing a large number of valid samples. Some of the methods enhance the picture by learning to change irrelevant background information in the picture, such as changing the sunlight of the picture in a target recognition task and modifying the scenery of the picture to achieve the purpose of increasing samples. The latter uses unsupervised data to enhance the expressiveness of the model, or adds similar data sets to enable the network to learn how to generate a good minimum experience risk device, for example, the method of adding unsupervised data sets generally uses a large unlabeled data set as a priori knowledge, and the key point is how to find data with the same label as that in the training set, and then adds the data into the training set to enhance the data, so that the change of the same kind of data can be increased to enable the model to have better generalization capability, and the skill is used in the semi-supervised prototype propagation graph network. The similarity data may bring some deviation and misleading because the similarity data is not directly designed for the target task, so that indistinguishable false data is generated on a data set with many data in a method based on a countermeasure generation network, and the mean and the variance on the data set are considered in the generation process, so that the generation process has more variability.
(2) And (3) driving an algorithm:
the algorithm drive considers that the search strategy in the assumed space is changed by using the prior knowledge, so that the optimal solution is better found, and the method can be roughly divided into three types: fine tuning already trained parameters, meta-learner, learning how to search. The first strategy stimulates the trend of storing the trained model parameters, so that the transfer learning becomes a more popular branch in the field of small sample learning. Thus learning how to adapt to new tasks becomes a new goal. The latter two strategies belong to the category of meta-learning, the former is to learn trained parameters in a plurality of tasks which are distributed in the same way through a meta-learner to be used as the initialization of a test task, and the latter directly applies the learned search steps or update rules to a new task.
(3) Driving a model:
the method tries to learn a proper feature embedding space where the feature embedding of pictures with the same label will be similar, while the features of different classes of pictures are exactly the opposite, and the final classification utilizes the nearest neighbor method. The twin network is classified by calculating similarity scores of pairs of picture inputs, and the matching network uses an attention mechanism and a memory unit to compare the similarity between the test sample and the support set sample. The prototype network uses the embedded mean value of the small sample class pictures as prototype expression of the class, and returns the prediction result by searching nearest neighbor. There are also methods to improve the depth metric of prototype networks or learning migratability by three clustering methods based on semi-supervision. Then, some scholars directly form a graph by similarity of the support set and the test sample, the graph is iterated and then directly classified by using the node characteristics after iteration, and some works adopt a closed label propagation mode to learn how to associate the test sample to the support set label by using a meta-learning mode in the relational graph, so that the test set label is directly obtained. There are also methods that use two-stage learning to add a priori knowledge, which is then used to assist in the subsequent small sample learning task.
A two-stage training mode is utilized, a model is trained in a large-scale data set to enable the model to have the capability of extracting features, a large data set is utilized as an auxiliary to provide some extra priori knowledge in the second stage, and a small sample data set is utilized for retraining so that the model can adapt to the conditions of few samples and many new tasks. The knowledge propagation processes of the two are global and are diverged out by the auxiliary set, and the direction propagation of the knowledge is not considered.
The existing small sample learning framework considers that training tasks are similar, however, actually dissimilar tasks bring negative migration to pollute the whole model, and the class center is difficult to extract from a small amount of data because extra knowledge is not learned; the directional transmissibility of knowledge learning is not considered, and the advantages brought by the front and back relations of knowledge learning are not utilized.
Disclosure of Invention
The invention provides a small sample classification method based on dynamic knowledge path learning, which solves the problems that the prior art is difficult to extract a class center from a small amount of data due to the fact that extra knowledge is not learned, and the other technologies utilizing global extra knowledge do not consider the benefits brought by knowledge learning directionality, often lack of interpretability and poor learning effect, and has the following specific technical scheme:
a small sample classification method based on dynamic knowledge path learning is characterized in that: the method comprises the following steps:
in the knowledge selection stage based on the knowledge graph, a small sample example is used for searching a learning path suitable for the user in the knowledge graph by combining the auxiliary set into the knowledge graph;
in a dynamic path generation stage based on category constraint, selecting the most relevant knowledge points in a knowledge graph to form a path by a small sample example, introducing the constraint of the path at a category level to obtain category commonality, and constraining the path quality by calculating path loss;
and in the knowledge learning and classification stage based on the path, information carried by the most relevant knowledge points is extracted in sequence to enhance the feature expression of the target example, similarity calculation is carried out on the feature expression of each example in the query set and each example in the support set pairwise, so that the target examples are classified into the category with the highest similarity, then the classification loss is measured by using cross entropy loss, and a small sample classification model is established through the weighted summation of the classification loss and the path loss.
Preferably, in the knowledge selection stage based on the knowledge graph, the specific method for finding the learning path suitable for the small sample in the knowledge graph is as follows:
the nodes of the knowledge graph are directly composed of auxiliary set type prototypes, wherein the prototypes are knowledge points, and the nodes are connected by using the similarity between the auxiliary types as the weight of the edges; the similarity calculation between the knowledge graph nodes is shown by formula (1):
Figure BDA0002668953790000031
p and q in the formula (1) are auxiliary set class knowledge points, and s is a similarity measurement function and is defined as dot product similarity; the weight of the edge between every two nodes in the knowledge graph can be obtained through the function, so that a knowledge graph formed by the auxiliary set B is determined;
selecting an exclusive knowledge path for each small sample instance, setting the length of the knowledge path as T, and selecting a knowledge point from a knowledge graph formed by the auxiliary set B at each moment as a node on the path; in particular, a hidden state is set for one of the paths i, i.e. the path chosen by the ith instance
Figure BDA0002668953790000041
For calculating the probability of selecting a knowledge point, and the hidden state of the ith instance at time t is
Figure BDA0002668953790000042
After selecting the knowledge points, the knowledge points are updated to
Figure BDA0002668953790000043
Using hidden states
Figure BDA0002668953790000044
The attention calculation is carried out on all the knowledge points in the auxiliary set B, and the probability of selecting the knowledge point j is taken as the attention distribution
Figure BDA0002668953790000045
Figure BDA0002668953790000046
Figure BDA0002668953790000047
In the formula (2), WT,WhAnd WvAre all linear transformation operations, the matrix can be transformed to the appropriate dimensions, | B | represents the number of knowledge points in the auxiliary graph,
Figure BDA0002668953790000048
representing the jth knowledge point in the auxiliary graph,
Figure BDA0002668953790000049
representing the preference degree of the ith instance to the jth knowledge point at the time t;
in the formula (3), the first and second groups,
Figure BDA00026689537900000410
the preference degree of the ith instance to the kth knowledge point at the time t is shown, and is specifically proposed here to obtain the preference degrees of all knowledge points through addition, and convert the preference degrees into a probability form through a formula (3), so as to obtain the probability of selecting the knowledge point j
Figure BDA00026689537900000411
Selecting the point with the maximum probability knowledge as the node of the path according to the formula (4), and recording the t-th node on the path as the node
Figure BDA00026689537900000412
That is, the ith instance picks the path node at time t as
Figure BDA00026689537900000413
Figure BDA00026689537900000414
Average knowledge point characteristic is introduced when hidden layer state at t +1 moment is calculated
Figure BDA00026689537900000415
The influence of deviation from the original problem in path searching is reduced, and finally, the hidden layer state is updated through a recurrent neural network:
Figure BDA00026689537900000416
furthermore, in the dynamic path generation phase based on the category constraint, the path acquisition is obtained by selecting the most relevant knowledge point from the knowledge graph for T times by the instance of the small sample task, and the path selected by the ith instance is marked as a path diAnd the t-th node therein is marked as
Figure BDA00026689537900000417
The path chosen for the ith instance is therefore routed
Figure BDA00026689537900000418
Composition is carried out;
the path loss is calculated to constrain the path quality, and the calculation is as follows:
Figure BDA0002668953790000051
Figure BDA0002668953790000052
wherein o, u, v are indices used to represent any value within a range; the number of the examples under the query set and the number of the examples under the support set in the small sample task are respectively expressed by | Q | and | S |, yo,yvAnd yuEach represents a label of example i, when i ═ o, v, u;
while
Figure BDA0002668953790000053
The average degree of attention of example i to each knowledge point from time 1 to t is shown. When i is o, it is expressed as u or v, similarly
Figure BDA0002668953790000054
All represent the average attention degree of a certain example to each knowledge point from 1 to t;
Figure BDA0002668953790000055
the method is used for indicating whether the knowledge points are selected at the time from 1 to t, and the value is increased along with the increase of the time step so as to increase the punishment on the distribution which does not meet the requirement.
Furthermore, in the path-based knowledge learning and classification stage, the information carried by the most relevant knowledge points can be sequentially extracted through the hidden layer states to enhance the feature expression of the target instance, and the nodes on the path and the hidden layer states updated through the recurrent neural network in the previous knowledge selection stage based on the knowledge graph are used as input by using a gating mechanism to update the corresponding hidden layer states:
Figure BDA0002668953790000056
Figure BDA0002668953790000057
Figure BDA0002668953790000058
Figure BDA0002668953790000059
wherein, Wr,Wz
Figure BDA00026689537900000510
All represent linear transformation operations, and
Figure BDA00026689537900000511
intermediate states that are gating mechanisms, and finally hidden states are removed from
Figure BDA00026689537900000512
Is updated to
Figure BDA00026689537900000513
Sigma is an activation function for increasing the nonlinear characteristics of the intermediate state of the gating mechanism;
after gathering the information of the T knowledge points, the final hidden layer state is obtained
Figure BDA00026689537900000514
Obtaining the feature expression of the instance in the new space through an output network by combining the attention distribution of the knowledge selection stage based on the knowledge graph, and using pi or qgRespectively, the feature expression in the new space of the g-th instance from the support set or the i-th instance from the query set:
Figure BDA0002668953790000061
wherein
Figure BDA0002668953790000062
And
Figure BDA0002668953790000063
each set is used to represent the situation at each time step,
Figure BDA0002668953790000064
represents the average degree of attention to the knowledge points at each moment, and
Figure BDA0002668953790000065
representing the hidden layer state at each time;
and performing similarity calculation on the feature expression of each small sample example in the query set example and the support set so that the target examples are classified into the categories with the highest similarity, wherein the calculation formula is as follows:
Figure BDA0002668953790000066
Figure BDA0002668953790000067
wherein
Figure BDA0002668953790000068
The similarity of the g-th instance in the query set to the i-th instance in the support set is shown, and the same principle can be known
Figure BDA0002668953790000069
Pr(y=c|qg) Representing the probability of belonging to category c;
Figure BDA00026689537900000610
to express p in the formulaiTranspose of (y)kHere again, the label of the kth instance in the supporting set is represented;
wherein WsIs a learnable parameter, using PrgRepresenting the probability vectors for each category for the query set instance g,
Figure BDA00026689537900000611
then, the cross entropy loss is used for measuring the classification loss, and N represents the packet of the support setNumber of classes contained:
Figure BDA00026689537900000612
building a small sample classification model through the weighted summation of the classification loss and the path loss:
L=λL1+μL2+νL3 #(16)
wherein λ, μ and ν are hyper-parameters for controlling the weight of each loss function; the objective function can be used for guiding the model to search for a more reasonable knowledge path and improving the classification precision of the small samples.
The invention can simulate the sequentiality of knowledge learning and the unexplainable property of a previous model, convert the classification problem into the problems of knowledge selection and sequence determination, and avoid the influence caused by dissimilar tasks.
Drawings
Fig. 1 is a flowchart of the small sample classification method based on dynamic path knowledge learning according to the present invention.
FIG. 2 is a workflow diagram of the present invention based on dynamic path knowledge learning.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a small sample classification method based on dynamic knowledge path learning, which is divided into three stages of knowledge selection based on a knowledge graph, dynamic path generation based on category constraint and knowledge learning based on a path, and respectively considers the learning process of forming a learning path after knowledge is selected and then learning, acquires the commonality of categories by learning matched knowledge for each small sample example, and reduces the influence caused by respective difference among samples.
As shown in fig. 1, a method for classifying small samples based on dynamic knowledge path learning includes the following steps:
1. in the knowledge selection stage based on the knowledge graph, a small sample example is used for searching a learning path suitable for the user in the knowledge graph by combining the auxiliary set into the knowledge graph; the specific method comprises the following steps:
the nodes of the knowledge graph are directly composed of auxiliary set type prototypes, wherein the prototypes are knowledge points, and the nodes are connected by using the similarity between the auxiliary types as the weight of the edges; the similarity calculation between the knowledge graph nodes is shown by formula (1):
Figure BDA0002668953790000071
p and q in the formula (1) are auxiliary set class knowledge points, and s is a similarity measurement function and is defined as dot product similarity; the weight of the edge between every two nodes in the knowledge graph can be obtained through the function, so that a knowledge graph formed by the auxiliary set B is determined;
selecting an exclusive knowledge path for each small sample instance, setting the length of the knowledge path as T, and selecting a knowledge point from a knowledge graph formed by the auxiliary set B at each moment as a node on the path; in particular, a hidden state is set for one of the paths i, i.e. the path chosen by the ith instance
Figure BDA0002668953790000072
For calculating the probability of selecting a knowledge point, and the hidden state of the ith instance at time t is
Figure BDA0002668953790000073
After selecting the knowledge points, the knowledge points are updated to
Figure BDA0002668953790000074
Using hidden states
Figure BDA0002668953790000075
The attention calculation is carried out on all the knowledge points in the auxiliary set B, and the probability of selecting the knowledge point j is taken as the attention distribution
Figure BDA0002668953790000076
Figure BDA0002668953790000077
Figure BDA0002668953790000081
In the formula (2), WT,WhAnd WvAre all linear transformation operations, the matrix can be transformed to the appropriate dimensions, | B | represents the number of knowledge points in the auxiliary graph,
Figure BDA0002668953790000082
representing the jth knowledge point in the auxiliary graph,
Figure BDA0002668953790000083
representing the preference degree of the ith instance to the jth knowledge point at the time t;
in the formula (3), the first and second groups,
Figure BDA0002668953790000084
the preference degree of the ith instance to the kth knowledge point at the time t is shown, and is specifically proposed here to obtain the preference degrees of all knowledge points through addition, and convert the preference degrees into a probability form through a formula (3), so as to obtain the probability of selecting the knowledge point j
Figure BDA0002668953790000085
Selecting the point with the maximum probability knowledge as the node of the path according to the formula (4), and recording the t-th node on the path as the node
Figure BDA0002668953790000086
That is, the ith instance picks the path node at time t as
Figure BDA0002668953790000087
Figure BDA0002668953790000088
Average knowledge point characteristic is introduced when hidden layer state at t +1 moment is calculated
Figure BDA0002668953790000089
The influence of deviation from the original problem in path searching is reduced, and finally, the hidden layer state is updated through a recurrent neural network:
Figure BDA00026689537900000810
2. in a dynamic path generation stage based on category constraint, selecting the most relevant knowledge points in a knowledge graph to form a path by a small sample example, introducing the constraint of the path at a category level to obtain category commonality, and constraining the path quality by calculating path loss; the path is obtained by selecting the most relevant knowledge point from the knowledge graph T times by a small sample example, and the path selected by the ith example is marked as a path diAnd the t-th node therein is marked as
Figure BDA00026689537900000811
The path chosen for the ith instance is therefore routed
Figure BDA00026689537900000812
Figure BDA00026689537900000813
Composition is carried out;
the path loss is calculated to constrain the path quality, and the calculation is as follows:
Figure BDA00026689537900000814
Figure BDA00026689537900000815
wherein o, u, v are indices used to represent any value within a range; the number of the examples under the query set and the number of the examples under the support set in the small sample task are respectively expressed by | Q | and | S |, yo,yvAnd yuEach represents a label of example i, when i ═ o, v, u; KL is an abbreviation of Kullback-Leibler divergence;
Figure BDA0002668953790000091
the average attention degree of the example i to each knowledge point at the time 1-t is shown; when i is o, it is expressed as u or v, similarly
Figure BDA0002668953790000092
All represent the average attention degree of a certain example to each knowledge point from 1 to t;
Figure BDA0002668953790000093
the method is used for indicating whether the knowledge points are selected at the time from 1 to t, and the value is increased along with the increase of the time step so as to increase the punishment on the distribution which does not meet the requirement.
3. In the knowledge learning and classification stage based on the path, information carried by the most relevant knowledge points can be sequentially extracted through the hidden layer state to enhance the feature expression of the target example, and the nodes on the path and the hidden layer state updated through the recurrent neural network in the knowledge selection stage based on the knowledge graph are used as input to update the corresponding hidden layer state by utilizing a gating mechanism:
Figure BDA0002668953790000094
Figure BDA0002668953790000095
Figure BDA0002668953790000096
Figure BDA0002668953790000097
wherein, Wr,Wz
Figure BDA0002668953790000098
All represent linear transformation operations, and
Figure BDA0002668953790000099
intermediate states that are gating mechanisms, and finally hidden states are removed from
Figure BDA00026689537900000910
Is updated to
Figure BDA00026689537900000911
Sigma is an activation function for increasing the nonlinear characteristics of the intermediate state of the gating mechanism;
after gathering the information of the T knowledge points, the final hidden layer state is obtained
Figure BDA00026689537900000912
Obtaining the feature expression of the instance in the new space through an output network by combining the attention distribution of the knowledge selection stage based on the knowledge graph, and using pi or qgRespectively, the feature expression in the new space of the g-th instance from the support set or the i-th instance from the query set:
Figure BDA00026689537900000913
wherein
Figure BDA00026689537900000914
And
Figure BDA00026689537900000915
each set is used to represent the situation at each time step,
Figure BDA00026689537900000916
represents the average degree of attention to the knowledge points at each moment, and
Figure BDA00026689537900000917
representing the hidden layer state at each time;
and performing similarity calculation on the feature expression of each small sample example in the query set example and the support set so that the target examples are classified into the categories with the highest similarity, wherein the calculation formula is as follows:
Figure BDA00026689537900000918
Figure BDA0002668953790000101
wherein
Figure BDA0002668953790000102
The similarity of the g-th instance in the query set to the i-th instance in the support set is shown, and the same principle can be known
Figure BDA0002668953790000103
Pr(y=c|qg) Representing the probability of belonging to category c;
Figure BDA0002668953790000104
to express p in the formulaiTranspose of (y)kAre still denoted hereinSupporting the tag of the kth instance;
wherein WsIs a learnable parameter, using PrgRepresenting the probability vectors for each category for the query set instance g,
Figure BDA0002668953790000105
then, the cross entropy loss is used to measure the classification loss, and N represents the number of classes contained in the support set:
Figure BDA0002668953790000106
building a small sample classification model through the weighted summation of the classification loss and the path loss:
L=λL1+μL2+νL3 #(16)
wherein λ, μ and ν are hyper-parameters for controlling the weight of each loss function; the objective function can be used for guiding the model to search for a more reasonable knowledge path and improving the classification precision of the small samples.
The invention provides a small sample classification method for dynamic path learning, which enables prior knowledge to be orderly transmitted among tasks, designs a loss function aiming at path generation and ensures the reasonability of path generation to the greatest extent.
As shown in fig. 2, the working flow of small sample learning based on the present invention is: and (3) forming a knowledge graph by the aid of similarity among the auxiliary sets, dynamically selecting a dedicated knowledge path for each small sample task instance when a small sample classification task is input, then learning a final feature expression according to the path, and performing feature classification on the basis of the feature expression to fulfill the purpose of small sample classification.
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that various changes in the embodiments and/or modifications of the invention can be made, and equivalents and modifications of some features of the invention can be made without departing from the spirit and scope of the invention.

Claims (4)

1.一种基于动态知识路径学习的小样本分类方法,其特征在于:步骤如下:1. a small sample classification method based on dynamic knowledge path learning, is characterized in that: step is as follows: 基于知识图的知识挑选阶段,通过将辅助集组成知识图,用以小样本实例在知识图中寻找适合自己的学习路径;In the knowledge selection stage based on knowledge graph, the auxiliary set is formed into a knowledge graph, and a small sample instance is used to find a suitable learning path in the knowledge graph; 基于类别约束的动态路径生成阶段,小样本实例选择知识图中最相关的知识点组成路径,引入路径在类别级的约束,获得类别共性,通过计算路径损失来约束路径的好坏;In the dynamic path generation stage based on category constraints, a small sample instance selects the most relevant knowledge points in the knowledge graph to form a path, introduces the constraints of the path at the category level, obtains the category commonality, and constrains the quality of the path by calculating the path loss; 基于路径的知识学习与分类阶段,顺序地将最相关的知识点所携带的信息提取出来增强目标实例的特征表达,查询集每一个实例与支撑集中每一个实例的特征表达两两进行相似度计算,使得目标实例分到相似度最高的类别上,然后使用交叉熵损失来衡量分类损失,通过分类损失和路径损失的加权求和建立小样本分类模型。In the path-based knowledge learning and classification stage, the information carried by the most relevant knowledge points is sequentially extracted to enhance the feature expression of the target instance, and the similarity calculation of each instance in the query set and the feature expression of each instance in the support set is performed. , so that the target instance is assigned to the category with the highest similarity, and then the cross entropy loss is used to measure the classification loss, and a small sample classification model is established by the weighted summation of the classification loss and the path loss. 2.根据权利要求1所述的一种基于动态知识路径学习的小样本分类方法,其特征在于:所述基于知识图的知识挑选阶段中,小样本实例在知识图中寻找适合自己的学习路径的具体方法为:2. A small sample classification method based on dynamic knowledge path learning according to claim 1, characterized in that: in the knowledge selection stage based on the knowledge graph, the small sample instance finds a learning path suitable for itself in the knowledge graph The specific method is: 知识图的节点直接由辅助集类别原型组成,原型即知识点,而边利用各辅助类别之间的相似度,作为边的权重将各节点进行连接;知识图节点之间的相似度计算由公式(1)所示:The nodes of the knowledge graph are directly composed of the prototype of the auxiliary set category, and the prototype is the knowledge point, and the edge uses the similarity between the auxiliary categories as the weight of the edge to connect the nodes; the similarity between the knowledge graph nodes is calculated by the formula (1) shows:
Figure FDA0002668953780000011
Figure FDA0002668953780000011
公式(1)中的p,q为辅助集类别知识点,s为相似度度量函数并将其定义为点积相似度;通过该函数可以得知识图中两两节点之间边的权重,从而确定出一个由辅助集B构成的知识图;In formula (1), p and q are the knowledge points of the auxiliary set category, and s is the similarity measurement function and is defined as the dot product similarity; through this function, the weight of the edge between two nodes in the knowledge graph can be obtained, so that Determine a knowledge graph composed of auxiliary set B; 为每一个小样本实例都挑选一条专属的知识路径,设知识路径长度为T,在每个时刻都在辅助集B构成的知识图中选取一个知识点作为路径上的结点;特别地,对于其中一条路径i即第i个实例所挑选的路径,设定一个隐藏状态
Figure FDA0002668953780000012
用于计算选取知识点的概率,而在时刻t第i个实例的隐藏状态为
Figure FDA0002668953780000013
选取知识点后将被更新为
Figure FDA0002668953780000014
Select an exclusive knowledge path for each small sample instance, set the length of the knowledge path as T, and select a knowledge point in the knowledge graph formed by the auxiliary set B at each moment as a node on the path; in particular, for One of the paths i is the path selected by the i-th instance, and a hidden state is set
Figure FDA0002668953780000012
It is used to calculate the probability of selecting knowledge points, and the hidden state of the i-th instance at time t is
Figure FDA0002668953780000013
After the knowledge point is selected, it will be updated as
Figure FDA0002668953780000014
利用隐藏状态
Figure FDA0002668953780000015
与辅助集B中所有知识点进行注意力计算,通过注意力分布作为选取知识点j的概率
Figure FDA0002668953780000016
Use hidden state
Figure FDA0002668953780000015
Perform attention calculation with all knowledge points in the auxiliary set B, and use the attention distribution as the probability of selecting knowledge point j
Figure FDA0002668953780000016
Figure FDA0002668953780000017
Figure FDA0002668953780000017
Figure FDA0002668953780000021
Figure FDA0002668953780000021
在公式(2)中,WT,Wh和Wv都是线性变换操作,可以将矩阵转换到合适的维度,|B|表示辅助图中知识点的数量,
Figure FDA0002668953780000022
表示辅助图中第j个知识点,
Figure FDA0002668953780000023
表示第i个实例在时刻t对第j个知识点的偏好程度;
In formula (2), W T , W h and W v are all linear transformation operations, which can transform the matrix to the appropriate dimension, |B| represents the number of knowledge points in the auxiliary graph,
Figure FDA0002668953780000022
represents the jth knowledge point in the auxiliary graph,
Figure FDA0002668953780000023
Represents the preference of the i-th instance to the j-th knowledge point at time t;
在公式(3)中,
Figure FDA0002668953780000024
表示第i个实例在时刻t对第k个知识点的偏好程度,通过公式(3)转换为概率形式,从而得到选取知识点j的概率
Figure FDA0002668953780000025
In formula (3),
Figure FDA0002668953780000024
Represents the preference of the i-th instance to the k-th knowledge point at time t, and is converted into a probability form by formula (3), so as to obtain the probability of selecting knowledge point j
Figure FDA0002668953780000025
通过公式(4)选取概率最大知识点作为路径的结点,则路径上的第t个结点,记为
Figure FDA0002668953780000026
即第i个实例在时刻t挑选的路径结点为
Figure FDA0002668953780000027
According to formula (4), the knowledge point with the highest probability is selected as the node of the path, then the t-th node on the path is denoted as
Figure FDA0002668953780000026
That is, the path node selected by the i-th instance at time t is
Figure FDA0002668953780000027
Figure FDA0002668953780000028
Figure FDA0002668953780000028
在计算t+1时刻的隐藏层状态时引入了平均知识点特征
Figure FDA0002668953780000029
来降低在搜索路径时偏离原问题的影响,最终通过循环神经网络更新隐藏层状态:
The average knowledge point feature is introduced when calculating the hidden layer state at time t+1
Figure FDA0002668953780000029
To reduce the impact of deviating from the original problem when searching for a path, and finally update the hidden layer state through a recurrent neural network:
Figure FDA00026689537800000210
Figure FDA00026689537800000210
3.根据权利要求2所述的一种基于动态知识路径学习的小样本分类方法,其特征在于:所述基于类别约束的动态路径生成阶段中,路径的获取由小样本任务的实例从知识图中挑选最相关的知识点T次后得到,对于第i个实例所挑选的路径,记为路径di,而其中的第t个结点记为
Figure FDA00026689537800000211
因此对于第i个实例挑选的路径由
Figure FDA00026689537800000212
组成;
3. A small sample classification method based on dynamic knowledge path learning according to claim 2, characterized in that: in the class constraint-based dynamic path generation stage, the path is obtained by the instance of the small sample task from the knowledge graph After selecting the most relevant knowledge point T times from the , the path selected by the i-th instance is denoted as the path d i , and the t-th node in it is denoted as
Figure FDA00026689537800000211
So the path picked for the ith instance is given by
Figure FDA00026689537800000212
composition;
通过计算路径损失来约束路径的好坏,计算如下:The quality of the path is constrained by calculating the path loss, which is calculated as follows:
Figure FDA00026689537800000213
Figure FDA00026689537800000213
Figure FDA00026689537800000214
Figure FDA00026689537800000214
其中o,u,v均为索引,用来代表一个范围内的任意一个值;用|Q|,|S|来分别表示小样本任务中查询集下实例的个数和支撑集下实例的个数,yo,yv和yu均表示实例i,在i=o,v,u时的标签;Among them, o, u, and v are indexes, which are used to represent any value in a range; |Q|, |S| are used to represent the number of instances in the query set and the number of instances in the support set in the small sample task, respectively. Number, y o , y v and y u all represent instance i, the label when i=o, v, u;
Figure FDA0002668953780000031
为实例i在1~t时刻对各知识点的平均注意程度;此时i=o,同理,当i=u或v时,被表示为
Figure FDA0002668953780000032
均表示某一个实例在1~t时刻对各知识点的平均注意程度;
and
Figure FDA0002668953780000031
is the average degree of attention of instance i to each knowledge point from 1 to t; at this time i=o, in the same way, when i=u or v, it is expressed as
Figure FDA0002668953780000032
Both represent the average degree of attention paid to each knowledge point by a certain instance from 1 to t;
Figure FDA0002668953780000033
用来表示知识点是否有在1~t时刻被挑选出来,且数值随着时间步的增加而增加,来增大对不符合需求的分布的惩罚。
Figure FDA0002668953780000033
It is used to indicate whether the knowledge point is selected from 1 to t, and the value increases with the increase of time steps to increase the penalty for the distribution that does not meet the requirements.
4.根据权利要求3所述的一种基于动态知识路径学习的小样本分类方法,其特征在于:所述基于路径的知识学习与分类阶段中,通过隐藏层状态能够顺序地将最相关的知识点所携带的信息提取出来增强目标实例的特征表达,利用门控机制将路径上的结点和之前基于知识图的知识挑选阶段中通过循环神经网络更新的隐藏层状态作为输入更新对应的隐藏层状态:4. A small sample classification method based on dynamic knowledge path learning according to claim 3, characterized in that: in the path-based knowledge learning and classification stage, the most relevant knowledge can be sequentially sorted through the hidden layer state The information carried by the point is extracted to enhance the feature expression of the target instance, and the node on the path and the state of the hidden layer updated by the recurrent neural network in the previous knowledge selection stage based on the knowledge graph are used as input to update the corresponding hidden layer. state:
Figure FDA0002668953780000034
Figure FDA0002668953780000034
Figure FDA0002668953780000035
Figure FDA0002668953780000035
Figure FDA0002668953780000036
Figure FDA0002668953780000036
Figure FDA0002668953780000037
Figure FDA0002668953780000037
其中,Wr,Wz
Figure FDA0002668953780000038
均表示线性变换操作,而
Figure FDA0002668953780000039
均为门控机制的中间状态,最终将隐藏状态从
Figure FDA00026689537800000310
更新为
Figure FDA00026689537800000311
σ为激活函数,用于增加门控机制的中间状态的非线性特征;
Among them, W r , W z ,
Figure FDA0002668953780000038
both represent linear transformation operations, and
Figure FDA0002668953780000039
are the intermediate states of the gating mechanism, and finally the hidden state is changed from
Figure FDA00026689537800000310
update to
Figure FDA00026689537800000311
σ is the activation function, which is used to increase the nonlinear characteristics of the intermediate state of the gating mechanism;
经过T个知识点信息的汇聚后,得到最终的隐藏层状态
Figure FDA00026689537800000312
结合基于知识图的知识挑选阶段的注意力分布,通过一个输出网络来得到该实例在新空间中的特征表达,用pior qg分别来表示来自于支撑集第g个实例或来自于查询集第i个实例在新空间中的特征表达:
After the aggregation of T knowledge point information, the final hidden layer state is obtained
Figure FDA00026689537800000312
Combined with the attention distribution of the knowledge selection stage based on knowledge graph, an output network is used to obtain the feature expression of the instance in the new space, and p i or q g are used to represent the gth instance from the support set or from the query. The feature representation of the i-th instance in the new space:
Figure FDA00026689537800000313
Figure FDA00026689537800000313
其中
Figure FDA00026689537800000314
Figure FDA00026689537800000315
均为一个集合用来表示每个时间步的情况,
Figure FDA00026689537800000316
表示每个时刻对各知识点的平均注意程度,而
Figure FDA00026689537800000317
表示各时刻的隐藏层状态;
in
Figure FDA00026689537800000314
and
Figure FDA00026689537800000315
are a set used to represent the situation at each time step,
Figure FDA00026689537800000316
represents the average degree of attention to each knowledge point at each moment, and
Figure FDA00026689537800000317
Represents the state of the hidden layer at each moment;
查询集实例与支撑集中每一个小样本实例的特征表达进行相似度计算,使得目标实例分到相似度最高的类别上,计算公式如下:The similarity calculation is performed between the query set instance and the feature expression of each small sample instance in the support set, so that the target instance is assigned to the category with the highest similarity. The calculation formula is as follows:
Figure FDA00026689537800000318
Figure FDA00026689537800000318
Figure FDA0002668953780000041
Figure FDA0002668953780000041
其中
Figure FDA0002668953780000042
表示查询集中第g个实例对支撑集中第i个实例的相似性,同理可知
Figure FDA0002668953780000043
Pr(y=c|qg)表示属于类别c的概率;
Figure FDA0002668953780000044
为在公式中表示pi的转置,yk在此仍然表示支撑集中第k个实例的标签;
in
Figure FDA0002668953780000042
Represents the similarity of the g-th instance in the query set to the i-th instance in the support set. Similarly, it can be seen that
Figure FDA0002668953780000043
Pr(y=c|q g ) represents the probability of belonging to category c;
Figure FDA0002668953780000044
In order to express the transpose of pi in the formula, y k here still represents the label of the k -th instance in the support set;
其中Ws是可学习的参数,用Prg表示查询集实例g对各类别的概率向量,
Figure FDA0002668953780000045
之后使用交叉熵损失来衡量分类损失,N表示支撑集所包含的类别数量:
where W s is a learnable parameter, and Pr g is used to represent the probability vector of query set instance g for each category,
Figure FDA0002668953780000045
Then use the cross entropy loss to measure the classification loss, N represents the number of categories contained in the support set:
Figure FDA0002668953780000046
Figure FDA0002668953780000046
通过分类损失和路径损失的加权求和建立小样本分类模型:Build a few-shot classification model by weighted summation of classification loss and path loss: L=λL1+μL2+νL3# (16)L=λL 1 +μL 2 +νL 3 # (16) 其中λ,μ和ν均为超参数,用来控制各损失函数的权重;利用该目标函数能够指导模型寻找更为合理的知识路径并提高小样本分类的精度。Among them, λ, μ and ν are hyperparameters used to control the weight of each loss function; using this objective function can guide the model to find a more reasonable knowledge path and improve the accuracy of small sample classification.
CN202010927478.4A 2020-09-07 2020-09-07 Small sample classification method based on dynamic knowledge path learning Active CN112183580B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010927478.4A CN112183580B (en) 2020-09-07 2020-09-07 Small sample classification method based on dynamic knowledge path learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010927478.4A CN112183580B (en) 2020-09-07 2020-09-07 Small sample classification method based on dynamic knowledge path learning

Publications (2)

Publication Number Publication Date
CN112183580A true CN112183580A (en) 2021-01-05
CN112183580B CN112183580B (en) 2021-08-10

Family

ID=73924858

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010927478.4A Active CN112183580B (en) 2020-09-07 2020-09-07 Small sample classification method based on dynamic knowledge path learning

Country Status (1)

Country Link
CN (1) CN112183580B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112800196A (en) * 2021-01-18 2021-05-14 北京明略软件系统有限公司 FAQ question-answer library matching method and system based on twin network
CN115100532A (en) * 2022-08-02 2022-09-23 北京卫星信息工程研究所 Small sample remote sensing image target detection method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107908650A (en) * 2017-10-12 2018-04-13 浙江大学 Knowledge train of thought method for auto constructing based on mass digital books
CN109934261A (en) * 2019-01-31 2019-06-25 中山大学 A knowledge-driven parameter propagation model and its few-shot learning method
CN111199242A (en) * 2019-12-18 2020-05-26 浙江工业大学 Image increment learning method based on dynamic correction vector
CN111222049A (en) * 2020-01-08 2020-06-02 东北大学 Top-k similarity searching method on semantically enhanced heterogeneous information network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107908650A (en) * 2017-10-12 2018-04-13 浙江大学 Knowledge train of thought method for auto constructing based on mass digital books
CN109934261A (en) * 2019-01-31 2019-06-25 中山大学 A knowledge-driven parameter propagation model and its few-shot learning method
CN111199242A (en) * 2019-12-18 2020-05-26 浙江工业大学 Image increment learning method based on dynamic correction vector
CN111222049A (en) * 2020-01-08 2020-06-02 东北大学 Top-k similarity searching method on semantically enhanced heterogeneous information network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HUAXIU等: ""Graph Few-shot Learning via Knowledge Transfer"", 《ARXIV》 *
WENHAN等: ""One-Shot Relational Learning for Knowledge Graphs"", 《PROCEEDINGS OF THE 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112800196A (en) * 2021-01-18 2021-05-14 北京明略软件系统有限公司 FAQ question-answer library matching method and system based on twin network
CN112800196B (en) * 2021-01-18 2024-03-01 南京明略科技有限公司 FAQ question-answering library matching method and system based on twin network
CN115100532A (en) * 2022-08-02 2022-09-23 北京卫星信息工程研究所 Small sample remote sensing image target detection method and system

Also Published As

Publication number Publication date
CN112183580B (en) 2021-08-10

Similar Documents

Publication Publication Date Title
Ma et al. SceneNet: Remote sensing scene classification deep learning network using multi-objective neural evolution architecture search
Xin et al. Semi-supervised person re-identification using multi-view clustering
CN109948425B (en) A pedestrian search method and device based on structure-aware self-attention and online instance aggregation and matching
Li et al. 2-D stochastic configuration networks for image data analytics
CN110909673B (en) Pedestrian re-identification method based on natural language description
Huang et al. Cost-effective training of deep cnns with active model adaptation
Liu et al. Incdet: In defense of elastic weight consolidation for incremental object detection
Wang et al. Relational deep learning: A deep latent variable model for link prediction
Yu et al. Unsupervised random forest indexing for fast action search
Cong et al. Self-supervised online metric learning with low rank constraint for scene categorization
CN112131967A (en) Remote sensing scene classification method based on multi-classifier anti-transfer learning
CN110163117B (en) Pedestrian re-identification method based on self-excitation discriminant feature learning
CN117992805B (en) Zero-shot cross-modal retrieval method and system based on tensor product graph fusion diffusion
Liu et al. Boosting semi-supervised face recognition with noise robustness
CN112183580B (en) Small sample classification method based on dynamic knowledge path learning
Menaga et al. Deep learning: a recent computing platform for multimedia information retrieval
Wang et al. Dynamic texture video classification using extreme learning machine
CN116208399A (en) Network malicious behavior detection method and device based on metagraph
Chen et al. Active one-shot learning by a deep Q-network strategy
CN114817581B (en) Cross-modal hash retrieval method based on fusion attention mechanism and DenseNet network
US20240221381A1 (en) Obtaining Custom Artificial Neural Network Architectures
Dou Research on personalized recommendation algorithm based on cluster analysis and artificial intelligence
Ou et al. Improving person re-identification by multi-task learning
US20210365794A1 (en) Discovering Novel Artificial Neural Network Architectures
Skublewska-Paszkowska et al. Recognition of tennis shots using convolutional neural networks based on three-dimensional data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
CB02 Change of applicant information
CB02 Change of applicant information

Address after: 518055 campus of Harbin Institute of technology, Shenzhen University Town, Taoyuan Street, Nanshan District, Shenzhen City, Guangdong Province

Applicant after: Harbin Institute of Technology,Shenzhen(Shenzhen Institute of science and technology innovation Harbin Institute of Technology)

Address before: 518055 campus of Harbin Institute of technology, Shenzhen University Town, Taoyuan Street, Nanshan District, Shenzhen City, Guangdong Province

Applicant before: HARBIN INSTITUTE OF TECHNOLOGY (SHENZHEN)

SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant