CN112183580A - Small sample classification method based on dynamic knowledge path learning - Google Patents
- Publication number
- CN112183580A (application number CN202010927478.4A)
- Authority
- CN
- China
- Prior art keywords
- knowledge
- path
- small sample
- instance
- learning
- Prior art date: 2020-09-07
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS > G06—COMPUTING; CALCULATING OR COUNTING > G06F—ELECTRIC DIGITAL DATA PROCESSING > G06F18/00—Pattern recognition > G06F18/20—Analysing > G06F18/24—Classification techniques > G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G—PHYSICS > G06—COMPUTING; CALCULATING OR COUNTING > G06F—ELECTRIC DIGITAL DATA PROCESSING > G06F16/00—Information retrieval; Database structures therefor; File system structures therefor > G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data > G06F16/36—Creation of semantic tools, e.g. ontology or thesauri > G06F16/367—Ontology
- G—PHYSICS > G06—COMPUTING; CALCULATING OR COUNTING > G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS > G06N3/00—Computing arrangements based on biological models > G06N3/02—Neural networks > G06N3/04—Architecture, e.g. interconnection topology > G06N3/044—Recurrent networks, e.g. Hopfield networks
- G—PHYSICS > G06—COMPUTING; CALCULATING OR COUNTING > G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS > G06N3/00—Computing arrangements based on biological models > G06N3/02—Neural networks > G06N3/04—Architecture, e.g. interconnection topology > G06N3/045—Combinations of networks
- G—PHYSICS > G06—COMPUTING; CALCULATING OR COUNTING > G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS > G06N3/00—Computing arrangements based on biological models > G06N3/02—Neural networks > G06N3/08—Learning methods
Abstract
A small sample classification method based on dynamic knowledge path learning comprises the following steps: in a knowledge selection stage based on a knowledge graph, the auxiliary set is organized into a knowledge graph, and each small sample instance searches the graph for a learning path suited to itself; in a dynamic path generation stage based on category constraints, each small sample instance selects the most relevant knowledge points in the knowledge graph to form a path, a category-level constraint on the paths is introduced to capture the commonality of each category, and path quality is constrained by calculating a path loss; in a path-based knowledge learning and classification stage, the information carried by the most relevant knowledge points is extracted in sequence to enhance the feature expression of the target instance, similarity is calculated between the feature expression of each query set instance and that of each support set instance, the target instance is assigned to the category with the highest similarity, the classification loss is measured with the cross entropy loss, and a small sample classification model is established through the weighted summation of the classification loss and the path loss.
Description
Technical Field
The invention relates to a small sample classification method based on dynamic knowledge path learning, and belongs to the technical field of small sample classification.
Background
Deep learning has achieved good results in many fields, but existing artificial intelligence depends on training with massive data, generalizes poorly, and its effectiveness and ability to expand rapidly to new tasks in data-limited domains are unsatisfactory. The small sample learning (few-shot learning, FSL) problem was proposed to address this. Small sample learning can alleviate the difficulty of collecting large-scale supervised data or labeling it manually, making artificial intelligence more practical in industrial settings: for example, the classification accuracy of ResNet on the manually labeled ImageNet dataset has exceeded that of humans, yet humans can recognize around 30,000 classes, a scale at which the labeling effort makes the equivalent machine task almost impossible. Small sample learning can likewise reduce the data collection effort of data-intensive applications such as image classification, image retrieval, target tracking, gesture recognition, image captioning and visual question answering, video event detection, and language modeling. A small sample learning architecture applied to these problems can therefore greatly reduce computing resources and labor cost, and models and algorithms that successfully solve the small sample learning problem can also achieve better results when data is sufficient. Furthermore, in fields such as privacy, security and medicine, supervision information is difficult or impossible to obtain, and small sample learning has attracted much research on such tasks. In the medical field, drug discovery often requires exploring the properties of molecules as the basis of new drugs; however, due to the possible toxicity and low activity of new molecules, there are generally few biological records and clinical experiments, which slows progress. In the recommendation field, the cold start problem has long been troublesome: a new system lacks sufficient user interactions, so many algorithms based on user-item matrix decomposition fail, but small sample learning makes it possible to learn models for such sample-scarce cases.
Existing small sample learning frameworks are broadly divided into three categories: data-driven, model-driven, and algorithm-driven.
(1) Data-driven:
The data-driven approach uses prior knowledge to create more data and thereby reduce estimation error. It can be subdivided into two directions: one applies transformations to augment the training set, and the other obtains new data from other datasets. The former typically performs data augmentation through manual rules, or lets the network learn how to transform the dataset; a common example is the generative adversarial network, which models the data distribution while producing a large number of valid samples. Some of these methods augment pictures by learning to change irrelevant background information, such as changing the sunlight in a picture or modifying its scenery in a target recognition task, so as to increase the number of samples. The latter direction uses unsupervised data to enhance the expressiveness of the model, or adds similar datasets so that the network learns to produce a good empirical risk minimizer. For example, methods that add unsupervised datasets generally use a large unlabeled dataset as prior knowledge; the key is to find data with the same labels as the training set and add it to the training set, which increases the variation within each class and gives the model better generalization ability, a technique used in the semi-supervised prototype propagation graph network. Because similar data is not designed directly for the target task, it may introduce bias and misleading signals; methods based on generative adversarial networks therefore generate indistinguishable synthetic data on data-rich datasets and take the mean and variance of the dataset into account during generation, giving the generation process more variability.
(2) Algorithm-driven:
The algorithm-driven view uses prior knowledge to change the search strategy in the hypothesis space so that the optimal solution is found more easily. It can be roughly divided into three types: fine-tuning already trained parameters, meta-learners, and learning how to search. The first strategy encourages storing trained model parameters, which has made transfer learning a popular branch of small sample learning; learning how to adapt to new tasks thus becomes a new goal. The latter two strategies belong to meta-learning: the former learns, through a meta-learner, parameters trained across many identically distributed tasks to serve as the initialization for a test task, while the latter directly applies learned search steps or update rules to a new task.
(3) Model-driven:
This approach tries to learn a suitable feature embedding space in which the embeddings of pictures with the same label are similar while the features of pictures from different classes are the opposite, with the final classification using the nearest neighbor method. The Siamese (twin) network classifies by computing similarity scores for pairs of picture inputs, and the matching network uses an attention mechanism and a memory unit to compare the similarity between the test sample and the support set samples. The prototype network uses the embedding mean of each small sample class's pictures as the prototype expression of the class and returns predictions by nearest neighbor search. Other methods improve the depth metric of prototype networks or learn transferability through three semi-supervised clustering methods. Some scholars directly build a graph from the similarities between the support set and the test sample, iterate over the graph, and classify using the node features after iteration; other work adopts closed-form label propagation on a relational graph, using meta-learning to learn how to associate the test sample with the support set labels and thus obtain test labels directly. There are also methods that use two-stage learning to add prior knowledge, which is then used to assist the subsequent small sample learning task.
In the two-stage training mode, a model is first trained on a large-scale dataset so that it acquires the ability to extract features; in the second stage the large dataset serves as an auxiliary source of extra prior knowledge, and the model is retrained with the small sample dataset so that it can adapt to conditions with few samples and many new tasks. In both stages the knowledge propagation is global, diverging outward from the auxiliary set, without considering the direction in which knowledge propagates.
Existing small sample learning frameworks assume that training tasks are similar; in practice, dissimilar tasks bring negative transfer that pollutes the whole model, and because no extra knowledge is learned, the class center is difficult to extract from a small amount of data. Nor do they consider the directional transferability of knowledge learning, so the benefits brought by the ordering of knowledge learning go unused.
Disclosure of Invention
The invention provides a small sample classification method based on dynamic knowledge path learning. It addresses the problems that the prior art, having learned no extra knowledge, struggles to extract a class center from a small amount of data, and that other technologies using global extra knowledge ignore the benefits of directional knowledge learning, often lack interpretability, and learn poorly. The specific technical scheme is as follows:
a small sample classification method based on dynamic knowledge path learning is characterized in that: the method comprises the following steps:
in the knowledge selection stage based on the knowledge graph, the auxiliary set is organized into a knowledge graph, and each small sample instance searches the knowledge graph for a learning path suited to itself;
in the dynamic path generation stage based on the category constraint, each small sample instance selects the most relevant knowledge points in the knowledge graph to form a path, a category-level constraint on the paths is introduced to obtain the category commonality, and path quality is constrained by calculating a path loss;
and in the path-based knowledge learning and classification stage, the information carried by the most relevant knowledge points is extracted in sequence to enhance the feature expression of the target instance, similarity is calculated pairwise between the feature expression of each instance in the query set and each instance in the support set so that the target instance is classified into the category with the highest similarity, the classification loss is then measured using the cross entropy loss, and a small sample classification model is established through the weighted summation of the classification loss and the path loss.
Preferably, in the knowledge selection stage based on the knowledge graph, the specific method for finding the learning path suited to each small sample instance in the knowledge graph is as follows:
the nodes of the knowledge graph are composed directly of the class prototypes of the auxiliary set, where the prototypes are the knowledge points, and the nodes are connected using the similarity between auxiliary classes as the weight of the edges; the similarity between knowledge graph nodes is calculated by formula (1):
p and q in formula (1) are auxiliary set class knowledge points, and s is a similarity measure function defined as the dot product similarity; the weight of the edge between every two nodes in the knowledge graph can be obtained through this function, thereby determining the knowledge graph formed by the auxiliary set B;
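The image of formula (1) was not preserved in this text; a minimal reconstruction consistent with the surrounding description, in which the edge weight between two class prototypes is their dot-product similarity, is:

$$w_{pq} = s(p, q) = p^{\top} q \tag{1}$$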
An exclusive knowledge path is selected for each small sample instance. The length of the knowledge path is set to T, and at each moment one knowledge point is selected from the knowledge graph formed by the auxiliary set B as a node on the path. In particular, a hidden state is set for each path i (the path chosen by the i-th instance) and used to calculate the probability of selecting a knowledge point; the hidden state of the i-th instance at time t is denoted h_i^t, and after a knowledge point is selected it is updated to h_i^{t+1};
using the hidden state h_i^t, attention is calculated over all knowledge points in the auxiliary set B, and the probability of selecting knowledge point j is taken from the attention distribution a_i^t:
in formula (2), W_T, W_h and W_v are all linear transformations that map the matrices to the appropriate dimensions, |B| represents the number of knowledge points in the auxiliary graph, b_j represents the j-th knowledge point in the auxiliary graph, and e_{i,j}^t represents the preference degree of the i-th instance for the j-th knowledge point at time t;
in formula (3), e_{i,k}^t represents the preference degree of the i-th instance for the k-th knowledge point at time t; the preference degrees of all knowledge points are summed in the denominator and converted into a probability form through formula (3), giving the probability a_{i,j}^t of selecting knowledge point j;
according to formula (4), the knowledge point with the maximum probability is selected as the node of the path, and the t-th node on the path is recorded as d_i^t; that is, the path node picked by the i-th instance at time t is d_i^t;
the average knowledge point feature is introduced when the hidden state at time t+1 is calculated, reducing the influence of drifting away from the original problem during path search, and the hidden state is finally updated through a recurrent neural network:
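The images of formulas (2) to (5) are likewise missing. A sketch of the additive attention, softmax normalization, greedy selection, and recurrent update that the surrounding text describes (the exact functional forms, and the concatenation with the average knowledge point feature $\bar{b}$, are assumptions) is:

$$e_{i,j}^{t} = W_T^{\top}\tanh\!\left(W_h h_i^{t} + W_v b_j\right),\quad j=1,\dots,|B| \tag{2}$$

$$a_{i,j}^{t} = \frac{\exp\!\left(e_{i,j}^{t}\right)}{\sum_{k=1}^{|B|}\exp\!\left(e_{i,k}^{t}\right)} \tag{3}$$

$$d_i^{t} = b_{j^{*}},\qquad j^{*} = \arg\max_{j} a_{i,j}^{t} \tag{4}$$

$$h_i^{t+1} = \mathrm{RNN}\!\left(h_i^{t},\ \left[d_i^{t};\ \bar{b}\right]\right) \tag{5}$$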
Furthermore, in the dynamic path generation stage based on the category constraint, a path is obtained by having each instance of the small sample task select the most relevant knowledge point from the knowledge graph T times; the path selected by the i-th instance is denoted d_i, its t-th node is denoted d_i^t, and the path chosen by the i-th instance is therefore composed of the nodes d_i^1, ..., d_i^T;
the path loss is calculated to constrain the path quality, and the calculation is as follows:
where o, u and v are indices that range over the instances; |Q| and |S| denote the number of instances in the query set and in the support set of the small sample task, respectively, and y_o, y_v and y_u denote the labels of instance i when i = o, v, u;
while ā_i^t denotes the average attention of instance i to each knowledge point from time 1 to t; when i is o, u or v, ā_o^t, ā_u^t and ā_v^t likewise all denote the average attention of the corresponding instance to each knowledge point from time 1 to t;
an indicator is used to record whether each knowledge point has been selected from time 1 to t, and its value increases as the time step grows so as to increase the penalty on distributions that do not meet the requirement.
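The images of formulas (6) to (8) are missing, and only the roles of the symbols survive in the text. A heavily hedged sketch of a category-level path loss consistent with those roles, using a KL divergence that pulls the averaged attention distributions of same-class instances together and pushes different classes apart, with a weight that grows with the time step (the pairing scheme, the signs, and the split into the $L_2$ and $L_3$ terms of formula (16) are all assumptions), is:

$$\bar{a}_i^{t} = \frac{1}{t}\sum_{\tau=1}^{t} a_i^{\tau} \tag{6}$$

$$L_2 = \sum_{t=1}^{T}\sum_{o=1}^{|Q|}\left[\sum_{\substack{v=1\\ y_v = y_o}}^{|S|} \mathrm{KL}\!\left(\bar{a}_o^{t}\,\middle\|\,\bar{a}_v^{t}\right) - \sum_{\substack{u=1\\ y_u \neq y_o}}^{|S|} \mathrm{KL}\!\left(\bar{a}_o^{t}\,\middle\|\,\bar{a}_u^{t}\right)\right] \tag{7}$$

$$L_3 = \sum_{t=1}^{T} t \cdot c_i^{t},\qquad c_i^{t}\in\{0,1\}\ \text{indicating an unmet selection requirement up to time } t \tag{8}$$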
Furthermore, in the path-based knowledge learning and classification stage, the information carried by the most relevant knowledge points can be extracted in sequence through the hidden states to enhance the feature expression of the target instance; using a gating mechanism, the nodes on the path and the hidden states updated by the recurrent neural network in the preceding knowledge selection stage based on the knowledge graph are taken as input to update the corresponding hidden states:
where W_r, W_z and the remaining weight matrices all represent linear transformations, the gated quantities are the intermediate states of the gating mechanism, and the hidden state is finally updated from h_i^t to ĥ_i^t; σ is an activation function used to increase the nonlinear characteristics of the intermediate states of the gating mechanism;
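The images of formulas (9) to (12) are missing. The description (reset and update maps $W_r$ and $W_z$, an activation $\sigma$, an intermediate state, and an update of the hidden state driven by the path node $d_i^t$) matches a GRU-style gate; a sketch under that assumption is:

$$r_i^{t} = \sigma\!\left(W_r\left[h_i^{t};\ d_i^{t}\right]\right) \tag{9}$$

$$z_i^{t} = \sigma\!\left(W_z\left[h_i^{t};\ d_i^{t}\right]\right) \tag{10}$$

$$\tilde{h}_i^{t} = \tanh\!\left(W\left[r_i^{t}\odot h_i^{t};\ d_i^{t}\right]\right) \tag{11}$$

$$\hat{h}_i^{t} = \left(1 - z_i^{t}\right)\odot h_i^{t} + z_i^{t}\odot \tilde{h}_i^{t} \tag{12}$$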
after gathering the information of the T knowledge points, the final hidden layer state is obtainedObtaining the feature expression of the instance in the new space through an output network by combining the attention distribution of the knowledge selection stage based on the knowledge graph, and using pi or qgRespectively, the feature expression in the new space of the g-th instance from the support set or the i-th instance from the query set:
whereinAndeach set is used to represent the situation at each time step,represents the average degree of attention to the knowledge points at each moment, andrepresenting the hidden layer state at each time;
Similarity is then calculated between the feature expression of each query set instance and that of each support set instance, so that the target instance is classified into the category with the highest similarity; the calculation formula is as follows:
where β_{g,i} denotes the similarity of the g-th instance in the query set to the i-th instance in the support set; likewise, Pr(y=c|q_g) denotes the probability of belonging to category c; p_i^T in the formula denotes the transpose of p_i, and y_k again denotes the label of the k-th instance in the support set;
where W_s is a learnable parameter and Pr_g denotes the probability vector over the categories for query set instance g; the cross entropy loss is then used to measure the classification loss, with N denoting the number of classes contained in the support set:
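The images of formulas (14) and (15) are missing. A sketch consistent with the described symbols, taking $\beta_{g,i} = p_i^{\top} W_s\, q_g$ as a bilinear similarity (the symbol $\beta$ and the per-class averaging of support similarities are assumptions), is:

$$\beta_{g,i} = p_i^{\top} W_s\, q_g,\qquad \Pr\!\left(y = c \mid q_g\right) = \frac{\exp\!\Big(\frac{1}{|S_c|}\sum_{i:\,y_i=c}\beta_{g,i}\Big)}{\sum_{c'=1}^{N}\exp\!\Big(\frac{1}{|S_{c'}|}\sum_{i:\,y_i=c'}\beta_{g,i}\Big)} \tag{14}$$

$$L_1 = -\frac{1}{|Q|}\sum_{g=1}^{|Q|}\sum_{c=1}^{N}\mathbb{1}\!\left[y_g = c\right]\log \Pr\!\left(y = c \mid q_g\right) \tag{15}$$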
building a small sample classification model through the weighted summation of the classification loss and the path loss:
L = λL_1 + μL_2 + νL_3 #(16)
wherein λ, μ and ν are hyper-parameters for controlling the weight of each loss function; the objective function can be used for guiding the model to search for a more reasonable knowledge path and improving the classification precision of the small samples.
The invention can model the sequential nature of knowledge learning, remedy the lack of interpretability of previous models, convert the classification problem into problems of knowledge selection and ordering, and avoid the influence caused by dissimilar tasks.
Drawings
Fig. 1 is a flowchart of the small sample classification method based on dynamic path knowledge learning according to the present invention.
FIG. 2 is a workflow diagram of the present invention based on dynamic path knowledge learning.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a small sample classification method based on dynamic knowledge path learning. The method is divided into three stages: knowledge selection based on a knowledge graph, dynamic path generation based on category constraints, and path-based knowledge learning. Together they model the learning process in which knowledge is first selected, a learning path is formed, and learning then proceeds along it; by learning matched knowledge for each small sample instance, the method acquires the commonality of the categories and reduces the influence of individual differences among samples.
As shown in fig. 1, a method for classifying small samples based on dynamic knowledge path learning includes the following steps:
1. In the knowledge selection stage based on the knowledge graph, the auxiliary set is organized into a knowledge graph, and each small sample instance searches the knowledge graph for a learning path suited to itself. The specific method is as follows:
the nodes of the knowledge graph are composed directly of the class prototypes of the auxiliary set, where the prototypes are the knowledge points, and the nodes are connected using the similarity between auxiliary classes as the weight of the edges; the similarity between knowledge graph nodes is calculated by formula (1):
p and q in formula (1) are auxiliary set class knowledge points, and s is a similarity measure function defined as the dot product similarity; the weight of the edge between every two nodes in the knowledge graph can be obtained through this function, thereby determining the knowledge graph formed by the auxiliary set B;
an exclusive knowledge path is selected for each small sample instance; the length of the knowledge path is set to T, and at each moment one knowledge point is selected from the knowledge graph formed by the auxiliary set B as a node on the path; in particular, a hidden state is set for each path i (the path chosen by the i-th instance) and used to calculate the probability of selecting a knowledge point; the hidden state of the i-th instance at time t is denoted h_i^t, and after a knowledge point is selected it is updated to h_i^{t+1};
using the hidden state h_i^t, attention is calculated over all knowledge points in the auxiliary set B, and the probability of selecting knowledge point j is taken from the attention distribution a_i^t:
in formula (2), W_T, W_h and W_v are all linear transformations that map the matrices to the appropriate dimensions, |B| represents the number of knowledge points in the auxiliary graph, b_j represents the j-th knowledge point in the auxiliary graph, and e_{i,j}^t represents the preference degree of the i-th instance for the j-th knowledge point at time t;
in formula (3), e_{i,k}^t represents the preference degree of the i-th instance for the k-th knowledge point at time t; the preference degrees of all knowledge points are summed in the denominator and converted into a probability form through formula (3), giving the probability a_{i,j}^t of selecting knowledge point j;
according to formula (4), the knowledge point with the maximum probability is selected as the node of the path, and the t-th node on the path is recorded as d_i^t; that is, the path node picked by the i-th instance at time t is d_i^t;
the average knowledge point feature is introduced when the hidden state at time t+1 is calculated, reducing the influence of drifting away from the original problem during path search, and the hidden state is finally updated through a recurrent neural network:
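As a concrete illustration of this stage, the following is a minimal PyTorch sketch of one selection step, matching formulas (2) to (5) as reconstructed earlier; the class name, the dimensions, and the GRUCell standing in for the recurrent neural network are assumptions, not the patent's text:

```python
import torch
import torch.nn.functional as F

# Minimal sketch of one knowledge-selection step (formulas (2)-(5) as
# reconstructed above); names, dimensions and the GRUCell are assumptions.
class KnowledgeSelector(torch.nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.w_t = torch.nn.Linear(dim, 1, bias=False)    # W_T in formula (2)
        self.w_h = torch.nn.Linear(dim, dim, bias=False)  # W_h in formula (2)
        self.w_v = torch.nn.Linear(dim, dim, bias=False)  # W_v in formula (2)
        self.rnn = torch.nn.GRUCell(2 * dim, dim)         # update of formula (5)

    def step(self, h, b):
        """h: (n, dim) hidden states, one per instance; b: (|B|, dim) knowledge points."""
        # additive attention scores e, shape (n, |B|), formula (2)
        e = self.w_t(torch.tanh(self.w_h(h).unsqueeze(1) + self.w_v(b))).squeeze(-1)
        a = F.softmax(e, dim=-1)             # attention distribution, formula (3)
        d = b[a.argmax(dim=-1)]              # most relevant knowledge point, formula (4)
        b_mean = b.mean(dim=0).expand_as(d)  # average knowledge point feature
        h_next = self.rnn(torch.cat([d, b_mean], dim=-1), h)  # formula (5)
        return h_next, a, d
```

Looping `step` over t = 1, ..., T and stacking the returned attention distributions and selected points yields each instance's knowledge path.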
2. In the dynamic path generation stage based on the category constraint, each small sample instance selects the most relevant knowledge points in the knowledge graph to form a path, a category-level constraint on the paths is introduced to obtain the category commonality, and path quality is constrained by calculating a path loss. The path is obtained by having each small sample instance select the most relevant knowledge point from the knowledge graph T times; the path selected by the i-th instance is denoted d_i, its t-th node is denoted d_i^t, and the path chosen by the i-th instance is therefore composed of the nodes d_i^1, ..., d_i^T;
the path loss is calculated to constrain the path quality, and the calculation is as follows:
where o, u and v are indices that range over the instances; |Q| and |S| denote the number of instances in the query set and in the support set of the small sample task, respectively, and y_o, y_v and y_u denote the labels of instance i when i = o, v, u; KL is an abbreviation of Kullback-Leibler divergence;
ā_i^t denotes the average attention of instance i to each knowledge point from time 1 to t; when i is o, u or v, ā_o^t, ā_u^t and ā_v^t likewise all denote the average attention of the corresponding instance to each knowledge point from time 1 to t;
an indicator is used to record whether each knowledge point has been selected from time 1 to t, and its value increases as the time step grows so as to increase the penalty on distributions that do not meet the requirement.
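A minimal sketch of such a path loss follows, under the same assumptions as the reconstruction of formulas (6) to (8) above: same-class attention distributions are pulled together and different-class ones pushed apart via KL divergence, weighted by the time step; the query-support pairing scheme is an assumption read off the text:

```python
import torch
import torch.nn.functional as F

# Hedged sketch of the category-constrained path loss; the exact formulas
# (6)-(8) are not preserved, so the pairing and weighting are assumptions.
def path_loss(attn, labels_q, labels_s):
    """attn: (T, n_q + n_s, |B|) attention per time step, query instances first."""
    T, n_q = attn.shape[0], labels_q.shape[0]
    loss = attn.new_zeros(())
    for t in range(1, T + 1):
        avg = attn[:t].mean(dim=0)          # averaged attention up to time t
        a_q, a_s = avg[:n_q], avg[n_q:]
        for o in range(n_q):
            for v in range(labels_s.shape[0]):
                # KL(a_q[o] || a_s[v]); kl_div takes log-probs as its first argument
                kl = F.kl_div(a_s[v].log(), a_q[o], reduction="sum")
                # pull same-class paths together, push different-class paths
                # apart, with a penalty that grows with the time step t
                loss = loss + t * (kl if labels_q[o] == labels_s[v] else -kl)
    return loss
```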
3. In the path-based knowledge learning and classification stage, the information carried by the most relevant knowledge points can be extracted in sequence through the hidden states to enhance the feature expression of the target instance; using a gating mechanism, the nodes on the path and the hidden states updated by the recurrent neural network in the knowledge selection stage based on the knowledge graph are taken as input to update the corresponding hidden states:
where W_r, W_z and the remaining weight matrices all represent linear transformations, the gated quantities are the intermediate states of the gating mechanism, and the hidden state is finally updated from h_i^t to ĥ_i^t; σ is an activation function used to increase the nonlinear characteristics of the intermediate states of the gating mechanism;
after gathering the information of the T knowledge points, the final hidden state ĥ_i^T is obtained; combined with the attention distributions from the knowledge selection stage based on the knowledge graph, the feature expression of the instance in the new space is obtained through an output network, with p_i and q_g denoting the feature expressions in the new space of the i-th instance from the support set and the g-th instance from the query set, respectively:
where the two sets collect the quantities at each time step: one represents the average attention to the knowledge points at each moment, and the other represents the hidden state at each moment;
Similarity is then calculated between the feature expression of each query set instance and that of each support set instance, so that the target instance is classified into the category with the highest similarity; the calculation formula is as follows:
where β_{g,i} denotes the similarity of the g-th instance in the query set to the i-th instance in the support set; likewise, Pr(y=c|q_g) denotes the probability of belonging to category c; p_i^T in the formula denotes the transpose of p_i, and y_k again denotes the label of the k-th instance in the support set;
where W_s is a learnable parameter and Pr_g denotes the probability vector over the categories for query set instance g; the cross entropy loss is then used to measure the classification loss, with N denoting the number of classes contained in the support set:
building a small sample classification model through the weighted summation of the classification loss and the path loss:
L = λL_1 + μL_2 + νL_3 #(16)
wherein λ, μ and ν are hyper-parameters for controlling the weight of each loss function; the objective function can be used for guiding the model to search for a more reasonable knowledge path and improving the classification precision of the small samples.
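A minimal sketch of the classification step and the combined objective of formula (16) follows, under the bilinear-similarity assumption used above; the function names, the per-class averaging, and the default weights are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

# Hedged sketch of the classification stage and the combined objective
# L = lambda*L1 + mu*L2 + nu*L3 of formula (16); names and the per-class
# aggregation of similarities are assumptions.
def classify(p_support, q_query, labels_s, w_s, n_classes):
    """p_support: (n_s, dim) and q_query: (n_q, dim) feature expressions."""
    beta = q_query @ w_s @ p_support.t()   # bilinear similarities, formula (14)
    # aggregate per class (assumes every class appears in the support set)
    scores = torch.stack(
        [beta[:, labels_s == c].mean(dim=1) for c in range(n_classes)], dim=1)
    return F.log_softmax(scores, dim=1)    # log Pr(y = c | q_g)

def total_loss(log_probs, labels_q, l2, l3, lam=1.0, mu=0.1, nu=0.1):
    l1 = F.nll_loss(log_probs, labels_q)   # cross entropy loss, formula (15)
    return lam * l1 + mu * l2 + nu * l3    # weighted summation, formula (16)
```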
The invention provides a small sample classification method based on dynamic path learning that enables prior knowledge to be transmitted among tasks in an orderly manner, designs a loss function specifically for path generation, and ensures the reasonableness of the generated paths to the greatest extent.
As shown in fig. 2, the workflow of small sample learning based on the present invention is: a knowledge graph is formed from the similarities among the auxiliary set classes; when a small sample classification task is input, a dedicated knowledge path is dynamically selected for each instance of the task, the final feature expression is then learned along the path, and classification is performed on that feature expression to accomplish small sample classification.
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that various changes in the embodiments and/or modifications of the invention can be made, and equivalents and modifications of some features of the invention can be made without departing from the spirit and scope of the invention.
Claims (4)
1. A small sample classification method based on dynamic knowledge path learning is characterized in that: the method comprises the following steps:
in the knowledge selection stage based on the knowledge graph, the auxiliary set is organized into a knowledge graph, and each small sample instance searches the knowledge graph for a learning path suited to itself;
in the dynamic path generation stage based on the category constraint, each small sample instance selects the most relevant knowledge points in the knowledge graph to form a path, a category-level constraint on the paths is introduced to obtain the category commonality, and path quality is constrained by calculating a path loss;
and in the path-based knowledge learning and classification stage, the information carried by the most relevant knowledge points is extracted in sequence to enhance the feature expression of the target instance, similarity is calculated pairwise between the feature expression of each instance in the query set and each instance in the support set so that the target instance is classified into the category with the highest similarity, the classification loss is then measured using the cross entropy loss, and a small sample classification model is established through the weighted summation of the classification loss and the path loss.
2. The method for classifying the small samples based on the dynamic knowledge path learning as claimed in claim 1, wherein: in the knowledge selection stage based on the knowledge graph, the specific method for searching the learning path suitable for the small sample example in the knowledge graph is as follows:
the nodes of the knowledge graph are composed directly of the class prototypes of the auxiliary set, where the prototypes are the knowledge points, and the nodes are connected using the similarity between auxiliary classes as the weight of the edges; the similarity between knowledge graph nodes is calculated by formula (1):
p and q in formula (1) are auxiliary set class knowledge points, and s is a similarity measure function defined as the dot product similarity; the weight of the edge between every two nodes in the knowledge graph can be obtained through this function, thereby determining the knowledge graph formed by the auxiliary set B;
an exclusive knowledge path is selected for each small sample instance; the length of the knowledge path is set to T, and at each moment one knowledge point is selected from the knowledge graph formed by the auxiliary set B as a node on the path; in particular, a hidden state is set for each path i (the path chosen by the i-th instance) and used to calculate the probability of selecting a knowledge point; the hidden state of the i-th instance at time t is denoted h_i^t, and after a knowledge point is selected it is updated to h_i^{t+1};
using the hidden state h_i^t, attention is calculated over all knowledge points in the auxiliary set B, and the probability of selecting knowledge point j is taken from the attention distribution a_i^t:
in formula (2), W_T, W_h and W_v are all linear transformations that map the matrices to the appropriate dimensions, |B| represents the number of knowledge points in the auxiliary graph, b_j represents the j-th knowledge point in the auxiliary graph, and e_{i,j}^t represents the preference degree of the i-th instance for the j-th knowledge point at time t;
in formula (3), e_{i,k}^t represents the preference degree of the i-th instance for the k-th knowledge point at time t, and the preference degrees are converted into a probability form through formula (3), giving the probability a_{i,j}^t of selecting knowledge point j;
according to formula (4), the knowledge point with the maximum probability is selected as the node of the path, and the t-th node on the path is recorded as d_i^t; that is, the path node picked by the i-th instance at time t is d_i^t;
the average knowledge point feature is introduced when the hidden state at time t+1 is calculated, reducing the influence of drifting away from the original problem during path search, and the hidden state is finally updated through a recurrent neural network:
3. The method for classifying the small samples based on the dynamic knowledge path learning as claimed in claim 2, wherein: in the dynamic path generation stage based on the category constraint, the path is obtained by having each instance of the small sample task select the most relevant knowledge point from the knowledge graph T times; the path selected by the i-th instance is denoted d_i, its t-th node is denoted d_i^t, and the path chosen by the i-th instance is therefore composed of the nodes d_i^1, ..., d_i^T;
the path loss is calculated to constrain the path quality, and the calculation is as follows:
where o, u and v are indices that range over the instances; |Q| and |S| denote the number of instances in the query set and in the support set of the small sample task, respectively, and y_o, y_v and y_u denote the labels of instance i when i = o, v, u;
while ā_i^t denotes the average attention of instance i to each knowledge point from time 1 to t; when i is o, u or v, ā_o^t, ā_u^t and ā_v^t likewise all denote the average attention of the corresponding instance to each knowledge point from time 1 to t;
4. The method for classifying the small samples based on the dynamic knowledge path learning as claimed in claim 3, wherein: in the path-based knowledge learning and classification stage, the information carried by the most relevant knowledge points can be extracted in sequence through the hidden states to enhance the feature expression of the target instance; using a gating mechanism, the nodes on the path and the hidden states updated by the recurrent neural network in the knowledge selection stage based on the knowledge graph are taken as input to update the corresponding hidden states:
where W_r, W_z and the remaining weight matrices all represent linear transformations, the gated quantities are the intermediate states of the gating mechanism, and the hidden state is finally updated from h_i^t to ĥ_i^t; σ is an activation function used to increase the nonlinear characteristics of the intermediate states of the gating mechanism;
after gathering the information of the T knowledge points, the final hidden state ĥ_i^T is obtained; combined with the attention distributions from the knowledge selection stage based on the knowledge graph, the feature expression of the instance in the new space is obtained through an output network, with p_i and q_g denoting the feature expressions in the new space of the i-th instance from the support set and the g-th instance from the query set, respectively:
where the two sets collect the quantities at each time step: one represents the average attention to the knowledge points at each moment, and the other represents the hidden state at each moment;
Similarity is then calculated between the feature expression of each query set instance and that of each support set instance, so that the target instance is classified into the category with the highest similarity; the calculation formula is as follows:
where β_{g,i} denotes the similarity of the g-th instance in the query set to the i-th instance in the support set; likewise, Pr(y=c|q_g) denotes the probability of belonging to category c; p_i^T in the formula denotes the transpose of p_i, and y_k again denotes the label of the k-th instance in the support set;
where W_s is a learnable parameter and Pr_g denotes the probability vector over the categories for query set instance g; the cross entropy loss is then used to measure the classification loss, with N denoting the number of classes contained in the support set:
building a small sample classification model through the weighted summation of the classification loss and the path loss:
L = λL_1 + μL_2 + νL_3 #(16)
wherein λ, μ and ν are hyper-parameters for controlling the weight of each loss function; the objective function can be used for guiding the model to search for a more reasonable knowledge path and improving the classification precision of the small samples.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010927478.4A CN112183580B (en) | 2020-09-07 | 2020-09-07 | Small sample classification method based on dynamic knowledge path learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010927478.4A CN112183580B (en) | 2020-09-07 | 2020-09-07 | Small sample classification method based on dynamic knowledge path learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112183580A true CN112183580A (en) | 2021-01-05 |
CN112183580B CN112183580B (en) | 2021-08-10 |
Family
ID=73924858
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010927478.4A Active CN112183580B (en) | 2020-09-07 | 2020-09-07 | Small sample classification method based on dynamic knowledge path learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112183580B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107908650A (en) * | 2017-10-12 | 2018-04-13 | 浙江大学 | Knowledge train of thought method for auto constructing based on mass digital books |
CN109934261A (en) * | 2019-01-31 | 2019-06-25 | 中山大学 | A kind of Knowledge driving parameter transformation model and its few sample learning method |
CN111199242A (en) * | 2019-12-18 | 2020-05-26 | 浙江工业大学 | Image increment learning method based on dynamic correction vector |
CN111222049A (en) * | 2020-01-08 | 2020-06-02 | 东北大学 | Top-k similarity searching method on semantically enhanced heterogeneous information network |
Non-Patent Citations (2)
Title |
---|
HUAXIU et al.: "Graph Few-shot Learning via Knowledge Transfer", ARXIV *
WENHAN et al.: "One-Shot Relational Learning for Knowledge Graphs", PROCEEDINGS OF THE 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112800196A (en) * | 2021-01-18 | 2021-05-14 | 北京明略软件系统有限公司 | FAQ question-answer library matching method and system based on twin network |
CN112800196B (en) * | 2021-01-18 | 2024-03-01 | 南京明略科技有限公司 | FAQ question-answering library matching method and system based on twin network |
CN115100532A (en) * | 2022-08-02 | 2022-09-23 | 北京卫星信息工程研究所 | Small sample remote sensing image target detection method and system |
Also Published As
Publication number | Publication date |
---|---|
CN112183580B (en) | 2021-08-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Li et al. | 2-D stochastic configuration networks for image data analytics | |
Huang et al. | Cost-effective training of deep cnns with active model adaptation | |
Wu et al. | A novel deep model with multi-loss and efficient training for person re-identification | |
EP3798917A1 (en) | Generative adversarial network (gan) for generating images | |
Yu et al. | Unsupervised random forest indexing for fast action search | |
CN111967294A (en) | Unsupervised domain self-adaptive pedestrian re-identification method | |
Chu et al. | Unsupervised temporal commonality discovery | |
CN110674323A (en) | Unsupervised cross-modal Hash retrieval method and system based on virtual label regression | |
CN112183580B (en) | Small sample classification method based on dynamic knowledge path learning | |
CN114333027B (en) | Cross-domain novel facial expression recognition method based on combined and alternate learning frames | |
CN117992805B (en) | Zero sample cross-modal retrieval method and system based on tensor product graph fusion diffusion | |
Gu et al. | Local optimality of self-organising neuro-fuzzy inference systems | |
Menaga et al. | Deep learning: a recent computing platform for multimedia information retrieval | |
CN110991500A (en) | Small sample multi-classification method based on nested integrated depth support vector machine | |
Wang et al. | A novel multiface recognition method with short training time and lightweight based on ABASNet and H-softmax | |
CN114973226B (en) | Training method for text recognition system in self-supervision contrast learning natural scene | |
CN116912624A (en) | Pseudo tag unsupervised data training method, device, equipment and medium | |
Ou et al. | Improving person re-identification by multi-task learning | |
CN115392474B (en) | Local perception graph representation learning method based on iterative optimization | |
CN116680578A (en) | Cross-modal model-based deep semantic understanding method | |
CN116092138A (en) | K neighbor graph iterative vein recognition method and system based on deep learning | |
Xue et al. | Fast and unsupervised neural architecture evolution for visual representation learning | |
Tian et al. | Modeling cardinality in image hashing | |
CN113887353A (en) | Visible light-infrared pedestrian re-identification method and system | |
CN113420821A (en) | Multi-label learning method based on local correlation of labels and features |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
CB02 | Change of applicant information |
Address after: 518055 Campus of Harbin Institute of Technology, Shenzhen University Town, Taoyuan Street, Nanshan District, Shenzhen City, Guangdong Province Applicant after: Harbin Institute of Technology, Shenzhen (Shenzhen Institute of Science and Technology Innovation, Harbin Institute of Technology) Address before: 518055 Campus of Harbin Institute of Technology, Shenzhen University Town, Taoyuan Street, Nanshan District, Shenzhen City, Guangdong Province Applicant before: HARBIN INSTITUTE OF TECHNOLOGY (SHENZHEN)
|
CB02 | Change of applicant information | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |