CN112183580A - Small sample classification method based on dynamic knowledge path learning - Google Patents

Small sample classification method based on dynamic knowledge path learning Download PDF

Info

Publication number
CN112183580A
CN112183580A CN202010927478.4A CN202010927478A CN112183580A CN 112183580 A CN112183580 A CN 112183580A CN 202010927478 A CN202010927478 A CN 202010927478A CN 112183580 A CN112183580 A CN 112183580A
Authority
CN
China
Prior art keywords
knowledge
path
small sample
instance
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010927478.4A
Other languages
Chinese (zh)
Other versions
CN112183580B (en
Inventor
廖清
尹哲
柴合言
漆舒汉
刘洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology filed Critical Shenzhen Graduate School Harbin Institute of Technology
Priority to CN202010927478.4A priority Critical patent/CN112183580B/en
Publication of CN112183580A publication Critical patent/CN112183580A/en
Application granted granted Critical
Publication of CN112183580B publication Critical patent/CN112183580B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Animal Behavior & Ethology (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

A small sample classification method based on dynamic knowledge path learning comprises the following steps: in the knowledge selection stage based on the knowledge graph, a small sample example is used for searching a learning path suitable for the user in the knowledge graph by combining the auxiliary set into the knowledge graph; in a dynamic path generation stage based on category constraint, selecting the most relevant knowledge points in a knowledge graph to form a path by a small sample example, introducing the constraint of the path at a category level to obtain category commonality, and constraining the path quality by calculating path loss; and in the knowledge learning and classification stage based on the path, information carried by the most relevant knowledge points is extracted in sequence to enhance the feature expression of the target example, similarity calculation is carried out on the feature expression of each small sample example in the query set example and the support set, so that the target example is classified into the category with the highest similarity, then the cross entropy loss is used for measuring the classification loss, and a small sample classification model is established through the weighted summation of the classification loss and the path loss.

Description

Small sample classification method based on dynamic knowledge path learning
Technical Field
The invention relates to a small sample classification method based on dynamic knowledge path learning, and belongs to the technical field of small sample classification.
Background
Deep learning has achieved good results in various fields, but the existing artificial intelligence depends on mass data training, the model generalization capability is poor, and the effect and the capability of rapidly expanding to a new task in the limited data field are unsatisfactory. To address this problem, a small sample Learning (FSL) problem is proposed. Small sample learning can help alleviate the difficulty of collecting large-scale supervised data or manual labeling, making artificial intelligence more convenient to use in an industrial environment, such as ResNet's classification accuracy on manually labeled ImageNet datasets has exceeded human, but on the other hand people can identify 30,000 classes, making it almost impossible for a machine to do a task. In contrast, small sample learning may help reduce the data collection effort for these data intensive applications, such as image classification, image retrieval, target tracking, gesture recognition, image captioning and visual problem solving, video event detection and language modeling, and so forth. Therefore, if the small sample learning architecture is used for solving the problems, the computing resources and labor cost can be greatly reduced, and the models and the algorithms can successfully solve the small sample learning problems and can achieve better effect under the condition of sufficient data. Secondly, in the fields of privacy, security, medical treatment and the like, supervision information is difficult or impossible to obtain, and small sample learning has gained much research on such tasks. In the medical field, the discovery of drugs often requires exploring the properties of molecules as the basis of new drugs, however, due to the possible toxicity and low activity of new molecules, there are generally few biological records and clinical experiments, which results in slow progress of research. In the recommendation field, the cold start problem has been troubling because there is not enough user support for a new system, leading to many algorithms based on user-commodity matrix decomposition failing, but it has become possible to learn models for such rare cases of samples through small samples.
Existing small sample learning frameworks are broadly divided into three categories, data-driven, model-driven, and algorithm-driven.
(1) And (3) data driving:
the data-driven approach makes more data through a priori knowledge to reduce estimation errors. It can be subdivided into two small directions, one using some transformations to assign a training set, and the other to obtain new data from other data sets. The former is typically data enhancement by some manual rules or to make the network learn how to transform the data set, such as common countermeasures to generate the network, e.g. simulating the data distribution while producing a large number of valid samples. Some of the methods enhance the picture by learning to change irrelevant background information in the picture, such as changing the sunlight of the picture in a target recognition task and modifying the scenery of the picture to achieve the purpose of increasing samples. The latter uses unsupervised data to enhance the expressiveness of the model, or adds similar data sets to enable the network to learn how to generate a good minimum experience risk device, for example, the method of adding unsupervised data sets generally uses a large unlabeled data set as a priori knowledge, and the key point is how to find data with the same label as that in the training set, and then adds the data into the training set to enhance the data, so that the change of the same kind of data can be increased to enable the model to have better generalization capability, and the skill is used in the semi-supervised prototype propagation graph network. The similarity data may bring some deviation and misleading because the similarity data is not directly designed for the target task, so that indistinguishable false data is generated on a data set with many data in a method based on a countermeasure generation network, and the mean and the variance on the data set are considered in the generation process, so that the generation process has more variability.
(2) And (3) driving an algorithm:
the algorithm drive considers that the search strategy in the assumed space is changed by using the prior knowledge, so that the optimal solution is better found, and the method can be roughly divided into three types: fine tuning already trained parameters, meta-learner, learning how to search. The first strategy stimulates the trend of storing the trained model parameters, so that the transfer learning becomes a more popular branch in the field of small sample learning. Thus learning how to adapt to new tasks becomes a new goal. The latter two strategies belong to the category of meta-learning, the former is to learn trained parameters in a plurality of tasks which are distributed in the same way through a meta-learner to be used as the initialization of a test task, and the latter directly applies the learned search steps or update rules to a new task.
(3) Driving a model:
the method tries to learn a proper feature embedding space where the feature embedding of pictures with the same label will be similar, while the features of different classes of pictures are exactly the opposite, and the final classification utilizes the nearest neighbor method. The twin network is classified by calculating similarity scores of pairs of picture inputs, and the matching network uses an attention mechanism and a memory unit to compare the similarity between the test sample and the support set sample. The prototype network uses the embedded mean value of the small sample class pictures as prototype expression of the class, and returns the prediction result by searching nearest neighbor. There are also methods to improve the depth metric of prototype networks or learning migratability by three clustering methods based on semi-supervision. Then, some scholars directly form a graph by similarity of the support set and the test sample, the graph is iterated and then directly classified by using the node characteristics after iteration, and some works adopt a closed label propagation mode to learn how to associate the test sample to the support set label by using a meta-learning mode in the relational graph, so that the test set label is directly obtained. There are also methods that use two-stage learning to add a priori knowledge, which is then used to assist in the subsequent small sample learning task.
A two-stage training mode is utilized, a model is trained in a large-scale data set to enable the model to have the capability of extracting features, a large data set is utilized as an auxiliary to provide some extra priori knowledge in the second stage, and a small sample data set is utilized for retraining so that the model can adapt to the conditions of few samples and many new tasks. The knowledge propagation processes of the two are global and are diverged out by the auxiliary set, and the direction propagation of the knowledge is not considered.
The existing small sample learning framework considers that training tasks are similar, however, actually dissimilar tasks bring negative migration to pollute the whole model, and the class center is difficult to extract from a small amount of data because extra knowledge is not learned; the directional transmissibility of knowledge learning is not considered, and the advantages brought by the front and back relations of knowledge learning are not utilized.
Disclosure of Invention
The invention provides a small sample classification method based on dynamic knowledge path learning, which solves the problems that the prior art is difficult to extract a class center from a small amount of data due to the fact that extra knowledge is not learned, and the other technologies utilizing global extra knowledge do not consider the benefits brought by knowledge learning directionality, often lack of interpretability and poor learning effect, and has the following specific technical scheme:
a small sample classification method based on dynamic knowledge path learning is characterized in that: the method comprises the following steps:
in the knowledge selection stage based on the knowledge graph, a small sample example is used for searching a learning path suitable for the user in the knowledge graph by combining the auxiliary set into the knowledge graph;
in a dynamic path generation stage based on category constraint, selecting the most relevant knowledge points in a knowledge graph to form a path by a small sample example, introducing the constraint of the path at a category level to obtain category commonality, and constraining the path quality by calculating path loss;
and in the knowledge learning and classification stage based on the path, information carried by the most relevant knowledge points is extracted in sequence to enhance the feature expression of the target example, similarity calculation is carried out on the feature expression of each example in the query set and each example in the support set pairwise, so that the target examples are classified into the category with the highest similarity, then the classification loss is measured by using cross entropy loss, and a small sample classification model is established through the weighted summation of the classification loss and the path loss.
Preferably, in the knowledge selection stage based on the knowledge graph, the specific method for finding the learning path suitable for the small sample in the knowledge graph is as follows:
the nodes of the knowledge graph are directly composed of auxiliary set type prototypes, wherein the prototypes are knowledge points, and the nodes are connected by using the similarity between the auxiliary types as the weight of the edges; the similarity calculation between the knowledge graph nodes is shown by formula (1):
Figure BDA0002668953790000031
p and q in the formula (1) are auxiliary set class knowledge points, and s is a similarity measurement function and is defined as dot product similarity; the weight of the edge between every two nodes in the knowledge graph can be obtained through the function, so that a knowledge graph formed by the auxiliary set B is determined;
selecting an exclusive knowledge path for each small sample instance, setting the length of the knowledge path as T, and selecting a knowledge point from a knowledge graph formed by the auxiliary set B at each moment as a node on the path; in particular, a hidden state is set for one of the paths i, i.e. the path chosen by the ith instance
Figure BDA0002668953790000041
For calculating the probability of selecting a knowledge point, and the hidden state of the ith instance at time t is
Figure BDA0002668953790000042
After selecting the knowledge points, the knowledge points are updated to
Figure BDA0002668953790000043
Using hidden states
Figure BDA0002668953790000044
The attention calculation is carried out on all the knowledge points in the auxiliary set B, and the probability of selecting the knowledge point j is taken as the attention distribution
Figure BDA0002668953790000045
Figure BDA0002668953790000046
Figure BDA0002668953790000047
In the formula (2), WT,WhAnd WvAre all linear transformation operations, the matrix can be transformed to the appropriate dimensions, | B | represents the number of knowledge points in the auxiliary graph,
Figure BDA0002668953790000048
representing the jth knowledge point in the auxiliary graph,
Figure BDA0002668953790000049
representing the preference degree of the ith instance to the jth knowledge point at the time t;
in the formula (3), the first and second groups,
Figure BDA00026689537900000410
the preference degree of the ith instance to the kth knowledge point at the time t is shown, and is specifically proposed here to obtain the preference degrees of all knowledge points through addition, and convert the preference degrees into a probability form through a formula (3), so as to obtain the probability of selecting the knowledge point j
Figure BDA00026689537900000411
Selecting the point with the maximum probability knowledge as the node of the path according to the formula (4), and recording the t-th node on the path as the node
Figure BDA00026689537900000412
That is, the ith instance picks the path node at time t as
Figure BDA00026689537900000413
Figure BDA00026689537900000414
Average knowledge point characteristic is introduced when hidden layer state at t +1 moment is calculated
Figure BDA00026689537900000415
The influence of deviation from the original problem in path searching is reduced, and finally, the hidden layer state is updated through a recurrent neural network:
Figure BDA00026689537900000416
furthermore, in the dynamic path generation phase based on the category constraint, the path acquisition is obtained by selecting the most relevant knowledge point from the knowledge graph for T times by the instance of the small sample task, and the path selected by the ith instance is marked as a path diAnd the t-th node therein is marked as
Figure BDA00026689537900000417
The path chosen for the ith instance is therefore routed
Figure BDA00026689537900000418
Composition is carried out;
the path loss is calculated to constrain the path quality, and the calculation is as follows:
Figure BDA0002668953790000051
Figure BDA0002668953790000052
wherein o, u, v are indices used to represent any value within a range; the number of the examples under the query set and the number of the examples under the support set in the small sample task are respectively expressed by | Q | and | S |, yo,yvAnd yuEach represents a label of example i, when i ═ o, v, u;
while
Figure BDA0002668953790000053
The average degree of attention of example i to each knowledge point from time 1 to t is shown. When i is o, it is expressed as u or v, similarly
Figure BDA0002668953790000054
All represent the average attention degree of a certain example to each knowledge point from 1 to t;
Figure BDA0002668953790000055
the method is used for indicating whether the knowledge points are selected at the time from 1 to t, and the value is increased along with the increase of the time step so as to increase the punishment on the distribution which does not meet the requirement.
Furthermore, in the path-based knowledge learning and classification stage, the information carried by the most relevant knowledge points can be sequentially extracted through the hidden layer states to enhance the feature expression of the target instance, and the nodes on the path and the hidden layer states updated through the recurrent neural network in the previous knowledge selection stage based on the knowledge graph are used as input by using a gating mechanism to update the corresponding hidden layer states:
Figure BDA0002668953790000056
Figure BDA0002668953790000057
Figure BDA0002668953790000058
Figure BDA0002668953790000059
wherein, Wr,Wz
Figure BDA00026689537900000510
All represent linear transformation operations, and
Figure BDA00026689537900000511
intermediate states that are gating mechanisms, and finally hidden states are removed from
Figure BDA00026689537900000512
Is updated to
Figure BDA00026689537900000513
Sigma is an activation function for increasing the nonlinear characteristics of the intermediate state of the gating mechanism;
after gathering the information of the T knowledge points, the final hidden layer state is obtained
Figure BDA00026689537900000514
Obtaining the feature expression of the instance in the new space through an output network by combining the attention distribution of the knowledge selection stage based on the knowledge graph, and using pi or qgRespectively, the feature expression in the new space of the g-th instance from the support set or the i-th instance from the query set:
Figure BDA0002668953790000061
wherein
Figure BDA0002668953790000062
And
Figure BDA0002668953790000063
each set is used to represent the situation at each time step,
Figure BDA0002668953790000064
represents the average degree of attention to the knowledge points at each moment, and
Figure BDA0002668953790000065
representing the hidden layer state at each time;
and performing similarity calculation on the feature expression of each small sample example in the query set example and the support set so that the target examples are classified into the categories with the highest similarity, wherein the calculation formula is as follows:
Figure BDA0002668953790000066
Figure BDA0002668953790000067
wherein
Figure BDA0002668953790000068
The similarity of the g-th instance in the query set to the i-th instance in the support set is shown, and the same principle can be known
Figure BDA0002668953790000069
Pr(y=c|qg) Representing the probability of belonging to category c;
Figure BDA00026689537900000610
to express p in the formulaiTranspose of (y)kHere again, the label of the kth instance in the supporting set is represented;
wherein WsIs a learnable parameter, using PrgRepresenting the probability vectors for each category for the query set instance g,
Figure BDA00026689537900000611
then, the cross entropy loss is used for measuring the classification loss, and N represents the packet of the support setNumber of classes contained:
Figure BDA00026689537900000612
building a small sample classification model through the weighted summation of the classification loss and the path loss:
L=λL1+μL2+νL3 #(16)
wherein λ, μ and ν are hyper-parameters for controlling the weight of each loss function; the objective function can be used for guiding the model to search for a more reasonable knowledge path and improving the classification precision of the small samples.
The invention can simulate the sequentiality of knowledge learning and the unexplainable property of a previous model, convert the classification problem into the problems of knowledge selection and sequence determination, and avoid the influence caused by dissimilar tasks.
Drawings
Fig. 1 is a flowchart of the small sample classification method based on dynamic path knowledge learning according to the present invention.
FIG. 2 is a workflow diagram of the present invention based on dynamic path knowledge learning.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a small sample classification method based on dynamic knowledge path learning, which is divided into three stages of knowledge selection based on a knowledge graph, dynamic path generation based on category constraint and knowledge learning based on a path, and respectively considers the learning process of forming a learning path after knowledge is selected and then learning, acquires the commonality of categories by learning matched knowledge for each small sample example, and reduces the influence caused by respective difference among samples.
As shown in fig. 1, a method for classifying small samples based on dynamic knowledge path learning includes the following steps:
1. in the knowledge selection stage based on the knowledge graph, a small sample example is used for searching a learning path suitable for the user in the knowledge graph by combining the auxiliary set into the knowledge graph; the specific method comprises the following steps:
the nodes of the knowledge graph are directly composed of auxiliary set type prototypes, wherein the prototypes are knowledge points, and the nodes are connected by using the similarity between the auxiliary types as the weight of the edges; the similarity calculation between the knowledge graph nodes is shown by formula (1):
Figure BDA0002668953790000071
p and q in the formula (1) are auxiliary set class knowledge points, and s is a similarity measurement function and is defined as dot product similarity; the weight of the edge between every two nodes in the knowledge graph can be obtained through the function, so that a knowledge graph formed by the auxiliary set B is determined;
selecting an exclusive knowledge path for each small sample instance, setting the length of the knowledge path as T, and selecting a knowledge point from a knowledge graph formed by the auxiliary set B at each moment as a node on the path; in particular, a hidden state is set for one of the paths i, i.e. the path chosen by the ith instance
Figure BDA0002668953790000072
For calculating the probability of selecting a knowledge point, and the hidden state of the ith instance at time t is
Figure BDA0002668953790000073
After selecting the knowledge points, the knowledge points are updated to
Figure BDA0002668953790000074
Using hidden states
Figure BDA0002668953790000075
The attention calculation is carried out on all the knowledge points in the auxiliary set B, and the probability of selecting the knowledge point j is taken as the attention distribution
Figure BDA0002668953790000076
Figure BDA0002668953790000077
Figure BDA0002668953790000081
In the formula (2), WT,WhAnd WvAre all linear transformation operations, the matrix can be transformed to the appropriate dimensions, | B | represents the number of knowledge points in the auxiliary graph,
Figure BDA0002668953790000082
representing the jth knowledge point in the auxiliary graph,
Figure BDA0002668953790000083
representing the preference degree of the ith instance to the jth knowledge point at the time t;
in the formula (3), the first and second groups,
Figure BDA0002668953790000084
the preference degree of the ith instance to the kth knowledge point at the time t is shown, and is specifically proposed here to obtain the preference degrees of all knowledge points through addition, and convert the preference degrees into a probability form through a formula (3), so as to obtain the probability of selecting the knowledge point j
Figure BDA0002668953790000085
Selecting the point with the maximum probability knowledge as the node of the path according to the formula (4), and recording the t-th node on the path as the node
Figure BDA0002668953790000086
That is, the ith instance picks the path node at time t as
Figure BDA0002668953790000087
Figure BDA0002668953790000088
Average knowledge point characteristic is introduced when hidden layer state at t +1 moment is calculated
Figure BDA0002668953790000089
The influence of deviation from the original problem in path searching is reduced, and finally, the hidden layer state is updated through a recurrent neural network:
Figure BDA00026689537900000810
2. in a dynamic path generation stage based on category constraint, selecting the most relevant knowledge points in a knowledge graph to form a path by a small sample example, introducing the constraint of the path at a category level to obtain category commonality, and constraining the path quality by calculating path loss; the path is obtained by selecting the most relevant knowledge point from the knowledge graph T times by a small sample example, and the path selected by the ith example is marked as a path diAnd the t-th node therein is marked as
Figure BDA00026689537900000811
The path chosen for the ith instance is therefore routed
Figure BDA00026689537900000812
Figure BDA00026689537900000813
Composition is carried out;
the path loss is calculated to constrain the path quality, and the calculation is as follows:
Figure BDA00026689537900000814
Figure BDA00026689537900000815
wherein o, u, v are indices used to represent any value within a range; the number of the examples under the query set and the number of the examples under the support set in the small sample task are respectively expressed by | Q | and | S |, yo,yvAnd yuEach represents a label of example i, when i ═ o, v, u; KL is an abbreviation of Kullback-Leibler divergence;
Figure BDA0002668953790000091
the average attention degree of the example i to each knowledge point at the time 1-t is shown; when i is o, it is expressed as u or v, similarly
Figure BDA0002668953790000092
All represent the average attention degree of a certain example to each knowledge point from 1 to t;
Figure BDA0002668953790000093
the method is used for indicating whether the knowledge points are selected at the time from 1 to t, and the value is increased along with the increase of the time step so as to increase the punishment on the distribution which does not meet the requirement.
3. In the knowledge learning and classification stage based on the path, information carried by the most relevant knowledge points can be sequentially extracted through the hidden layer state to enhance the feature expression of the target example, and the nodes on the path and the hidden layer state updated through the recurrent neural network in the knowledge selection stage based on the knowledge graph are used as input to update the corresponding hidden layer state by utilizing a gating mechanism:
Figure BDA0002668953790000094
Figure BDA0002668953790000095
Figure BDA0002668953790000096
Figure BDA0002668953790000097
wherein, Wr,Wz
Figure BDA0002668953790000098
All represent linear transformation operations, and
Figure BDA0002668953790000099
intermediate states that are gating mechanisms, and finally hidden states are removed from
Figure BDA00026689537900000910
Is updated to
Figure BDA00026689537900000911
Sigma is an activation function for increasing the nonlinear characteristics of the intermediate state of the gating mechanism;
after gathering the information of the T knowledge points, the final hidden layer state is obtained
Figure BDA00026689537900000912
Obtaining the feature expression of the instance in the new space through an output network by combining the attention distribution of the knowledge selection stage based on the knowledge graph, and using pi or qgRespectively, the feature expression in the new space of the g-th instance from the support set or the i-th instance from the query set:
Figure BDA00026689537900000913
wherein
Figure BDA00026689537900000914
And
Figure BDA00026689537900000915
each set is used to represent the situation at each time step,
Figure BDA00026689537900000916
represents the average degree of attention to the knowledge points at each moment, and
Figure BDA00026689537900000917
representing the hidden layer state at each time;
and performing similarity calculation on the feature expression of each small sample example in the query set example and the support set so that the target examples are classified into the categories with the highest similarity, wherein the calculation formula is as follows:
Figure BDA00026689537900000918
Figure BDA0002668953790000101
wherein
Figure BDA0002668953790000102
The similarity of the g-th instance in the query set to the i-th instance in the support set is shown, and the same principle can be known
Figure BDA0002668953790000103
Pr(y=c|qg) Representing the probability of belonging to category c;
Figure BDA0002668953790000104
to express p in the formulaiTranspose of (y)kAre still denoted hereinSupporting the tag of the kth instance;
wherein WsIs a learnable parameter, using PrgRepresenting the probability vectors for each category for the query set instance g,
Figure BDA0002668953790000105
then, the cross entropy loss is used to measure the classification loss, and N represents the number of classes contained in the support set:
Figure BDA0002668953790000106
building a small sample classification model through the weighted summation of the classification loss and the path loss:
L=λL1+μL2+νL3 #(16)
wherein λ, μ and ν are hyper-parameters for controlling the weight of each loss function; the objective function can be used for guiding the model to search for a more reasonable knowledge path and improving the classification precision of the small samples.
The invention provides a small sample classification method for dynamic path learning, which enables prior knowledge to be orderly transmitted among tasks, designs a loss function aiming at path generation and ensures the reasonability of path generation to the greatest extent.
As shown in fig. 2, the working flow of small sample learning based on the present invention is: and (3) forming a knowledge graph by the aid of similarity among the auxiliary sets, dynamically selecting a dedicated knowledge path for each small sample task instance when a small sample classification task is input, then learning a final feature expression according to the path, and performing feature classification on the basis of the feature expression to fulfill the purpose of small sample classification.
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that various changes in the embodiments and/or modifications of the invention can be made, and equivalents and modifications of some features of the invention can be made without departing from the spirit and scope of the invention.

Claims (4)

1. A small sample classification method based on dynamic knowledge path learning is characterized in that: the method comprises the following steps:
in the knowledge selection stage based on the knowledge graph, a small sample example is used for searching a learning path suitable for the user in the knowledge graph by combining the auxiliary set into the knowledge graph;
in a dynamic path generation stage based on category constraint, selecting the most relevant knowledge points in a knowledge graph to form a path by a small sample example, introducing the constraint of the path at a category level to obtain category commonality, and constraining the path quality by calculating path loss;
and in the knowledge learning and classification stage based on the path, information carried by the most relevant knowledge points is extracted in sequence to enhance the feature expression of the target example, similarity calculation is carried out on the feature expression of each example in the query set and each example in the support set pairwise, so that the target examples are classified into the category with the highest similarity, then the classification loss is measured by using cross entropy loss, and a small sample classification model is established through the weighted summation of the classification loss and the path loss.
2. The method for classifying the small samples based on the dynamic knowledge path learning as claimed in claim 1, wherein: in the knowledge selection stage based on the knowledge graph, the specific method for searching the learning path suitable for the small sample example in the knowledge graph is as follows:
the nodes of the knowledge graph are directly composed of auxiliary set type prototypes, wherein the prototypes are knowledge points, and the nodes are connected by using the similarity between the auxiliary types as the weight of the edges; the similarity calculation between the knowledge graph nodes is shown by formula (1):
Figure FDA0002668953780000011
p and q in the formula (1) are auxiliary set class knowledge points, and s is a similarity measurement function and is defined as dot product similarity; the weight of the edge between every two nodes in the knowledge graph can be obtained through the function, so that a knowledge graph formed by the auxiliary set B is determined;
selecting an exclusive knowledge path for each small sample instance, setting the length of the knowledge path as T, and selecting a knowledge point from a knowledge graph formed by the auxiliary set B at each moment as a node on the path; in particular, a hidden state is set for one of the paths i, i.e. the path chosen by the ith instance
Figure FDA0002668953780000012
For calculating the probability of selecting a knowledge point, and the hidden state of the ith instance at time t is
Figure FDA0002668953780000013
After selecting the knowledge points, the knowledge points are updated to
Figure FDA0002668953780000014
Using hidden states
Figure FDA0002668953780000015
The attention calculation is carried out on all the knowledge points in the auxiliary set B, and the probability of selecting the knowledge point j is taken as the attention distribution
Figure FDA0002668953780000016
Figure FDA0002668953780000017
Figure FDA0002668953780000021
In the formula (2), WT,WhAnd WvAll linear transformation operations, which convert the matrix into the appropriate dimensionsB | represents the number of knowledge points in the auxiliary graph,
Figure FDA0002668953780000022
representing the jth knowledge point in the auxiliary graph,
Figure FDA0002668953780000023
representing the preference degree of the ith instance to the jth knowledge point at the time t;
in the formula (3), the first and second groups,
Figure FDA0002668953780000024
the preference degree of the ith instance to the kth knowledge point at the moment t is expressed, and the ith instance is converted into a probability form through a formula (3), so that the probability of selecting the knowledge point j is obtained
Figure FDA0002668953780000025
Selecting the point with the maximum probability knowledge as the node of the path according to the formula (4), and recording the t-th node on the path as the node
Figure FDA0002668953780000026
That is, the ith instance picks the path node at time t as
Figure FDA0002668953780000027
Figure FDA0002668953780000028
Average knowledge point characteristic is introduced when hidden layer state at t +1 moment is calculated
Figure FDA0002668953780000029
The influence of deviation from the original problem in path searching is reduced, and finally, the hidden layer state is updated through a recurrent neural network:
Figure FDA00026689537800000210
3. the method for classifying the small samples based on the dynamic knowledge path learning as claimed in claim 2, wherein: in the dynamic path generation stage based on the category constraint, the path is obtained by selecting the most relevant knowledge point from the knowledge graph for T times by the instance of the small sample task, and the path selected by the ith instance is marked as a path diAnd the t-th node therein is marked as
Figure FDA00026689537800000211
The path chosen for the ith instance is therefore routed
Figure FDA00026689537800000212
Composition is carried out;
the path loss is calculated to constrain the path quality, and the calculation is as follows:
Figure FDA00026689537800000213
Figure FDA00026689537800000214
wherein o, u, v are indices used to represent any value within a range; the number of the examples under the query set and the number of the examples under the support set in the small sample task are respectively expressed by | Q | and | S |, yo,yvAnd yuEach represents a label of example i, when i ═ o, v, u;
while
Figure FDA0002668953780000031
The average attention degree of the example i to each knowledge point at the time 1-t is shown; when i is o, it is expressed as u or v, similarly
Figure FDA0002668953780000032
All represent the average attention degree of a certain example to each knowledge point from 1 to t;
Figure FDA0002668953780000033
the method is used for indicating whether the knowledge points are selected at the time from 1 to t, and the value is increased along with the increase of the time step so as to increase the punishment on the distribution which does not meet the requirement.
4. The method for classifying the small samples based on the dynamic knowledge path learning, according to claim 3, is characterized in that: in the knowledge learning and classification stage based on the path, information carried by the most relevant knowledge points can be sequentially extracted through hidden layer states to enhance the feature expression of a target example, and nodes on the path and hidden layer states updated through a recurrent neural network in the knowledge selection stage based on the knowledge graph are used as input to update the corresponding hidden layer states by utilizing a gating mechanism:
Figure FDA0002668953780000034
Figure FDA0002668953780000035
Figure FDA0002668953780000036
Figure FDA0002668953780000037
wherein, Wr,Wz
Figure FDA0002668953780000038
All represent linear transformation operations, and
Figure FDA0002668953780000039
intermediate states that are gating mechanisms, and finally hidden states are removed from
Figure FDA00026689537800000310
Is updated to
Figure FDA00026689537800000311
Sigma is an activation function for increasing the nonlinear characteristics of the intermediate state of the gating mechanism;
after gathering the information of the T knowledge points, the final hidden layer state is obtained
Figure FDA00026689537800000312
Obtaining the feature expression of the instance in the new space through an output network by combining the attention distribution of the knowledge selection stage based on the knowledge graph, and using pior qgRespectively, the feature expression in the new space of the g-th instance from the support set or the i-th instance from the query set:
Figure FDA00026689537800000313
wherein
Figure FDA00026689537800000314
And
Figure FDA00026689537800000315
each set is used to represent the situation at each time step,
Figure FDA00026689537800000316
represents the average degree of attention to the knowledge points at each moment, and
Figure FDA00026689537800000317
representing the hidden layer state at each time;
and performing similarity calculation on the feature expression of each small sample example in the query set example and the support set so that the target examples are classified into the categories with the highest similarity, wherein the calculation formula is as follows:
Figure FDA00026689537800000318
Figure FDA0002668953780000041
wherein
Figure FDA0002668953780000042
The similarity of the g-th instance in the query set to the i-th instance in the support set is shown, and the same principle can be known
Figure FDA0002668953780000043
Pr(y=c|qg) Representing the probability of belonging to category c;
Figure FDA0002668953780000044
to express p in the formulaiTranspose of (y)kHere again, the label of the kth instance in the supporting set is represented;
wherein WsIs a learnable parameter, using PrgRepresenting the probability vectors for each category for the query set instance g,
Figure FDA0002668953780000045
then, the cross entropy loss is used to measure the classification loss, and N represents the number of classes contained in the support set:
Figure FDA0002668953780000046
building a small sample classification model through the weighted summation of the classification loss and the path loss:
L=λL1+μL2+νL3# (16)
wherein λ, μ and ν are hyper-parameters for controlling the weight of each loss function; the objective function can be used for guiding the model to search for a more reasonable knowledge path and improving the classification precision of the small samples.
CN202010927478.4A 2020-09-07 2020-09-07 Small sample classification method based on dynamic knowledge path learning Active CN112183580B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010927478.4A CN112183580B (en) 2020-09-07 2020-09-07 Small sample classification method based on dynamic knowledge path learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010927478.4A CN112183580B (en) 2020-09-07 2020-09-07 Small sample classification method based on dynamic knowledge path learning

Publications (2)

Publication Number Publication Date
CN112183580A true CN112183580A (en) 2021-01-05
CN112183580B CN112183580B (en) 2021-08-10

Family

ID=73924858

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010927478.4A Active CN112183580B (en) 2020-09-07 2020-09-07 Small sample classification method based on dynamic knowledge path learning

Country Status (1)

Country Link
CN (1) CN112183580B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112800196A (en) * 2021-01-18 2021-05-14 北京明略软件系统有限公司 FAQ question-answer library matching method and system based on twin network
CN115100532A (en) * 2022-08-02 2022-09-23 北京卫星信息工程研究所 Small sample remote sensing image target detection method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107908650A (en) * 2017-10-12 2018-04-13 浙江大学 Knowledge train of thought method for auto constructing based on mass digital books
CN109934261A (en) * 2019-01-31 2019-06-25 中山大学 A kind of Knowledge driving parameter transformation model and its few sample learning method
CN111199242A (en) * 2019-12-18 2020-05-26 浙江工业大学 Image increment learning method based on dynamic correction vector
CN111222049A (en) * 2020-01-08 2020-06-02 东北大学 Top-k similarity searching method on semantically enhanced heterogeneous information network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107908650A (en) * 2017-10-12 2018-04-13 浙江大学 Knowledge train of thought method for auto constructing based on mass digital books
CN109934261A (en) * 2019-01-31 2019-06-25 中山大学 A kind of Knowledge driving parameter transformation model and its few sample learning method
CN111199242A (en) * 2019-12-18 2020-05-26 浙江工业大学 Image increment learning method based on dynamic correction vector
CN111222049A (en) * 2020-01-08 2020-06-02 东北大学 Top-k similarity searching method on semantically enhanced heterogeneous information network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HUAXIU等: ""Graph Few-shot Learning via Knowledge Transfer"", 《ARXIV》 *
WENHAN等: ""One-Shot Relational Learning for Knowledge Graphs"", 《PROCEEDINGS OF THE 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112800196A (en) * 2021-01-18 2021-05-14 北京明略软件系统有限公司 FAQ question-answer library matching method and system based on twin network
CN112800196B (en) * 2021-01-18 2024-03-01 南京明略科技有限公司 FAQ question-answering library matching method and system based on twin network
CN115100532A (en) * 2022-08-02 2022-09-23 北京卫星信息工程研究所 Small sample remote sensing image target detection method and system

Also Published As

Publication number Publication date
CN112183580B (en) 2021-08-10

Similar Documents

Publication Publication Date Title
Li et al. 2-D stochastic configuration networks for image data analytics
Huang et al. Cost-effective training of deep cnns with active model adaptation
Wu et al. A novel deep model with multi-loss and efficient training for person re-identification
EP3798917A1 (en) Generative adversarial network (gan) for generating images
Yu et al. Unsupervised random forest indexing for fast action search
CN111967294A (en) Unsupervised domain self-adaptive pedestrian re-identification method
Chu et al. Unsupervised temporal commonality discovery
CN110674323A (en) Unsupervised cross-modal Hash retrieval method and system based on virtual label regression
CN112183580B (en) Small sample classification method based on dynamic knowledge path learning
CN114333027B (en) Cross-domain novel facial expression recognition method based on combined and alternate learning frames
CN117992805B (en) Zero sample cross-modal retrieval method and system based on tensor product graph fusion diffusion
Gu et al. Local optimality of self-organising neuro-fuzzy inference systems
Menaga et al. Deep learning: a recent computing platform for multimedia information retrieval
CN110991500A (en) Small sample multi-classification method based on nested integrated depth support vector machine
Wang et al. A novel multiface recognition method with short training time and lightweight based on ABASNet and H-softmax
CN114973226B (en) Training method for text recognition system in self-supervision contrast learning natural scene
CN116912624A (en) Pseudo tag unsupervised data training method, device, equipment and medium
Ou et al. Improving person re-identification by multi-task learning
CN115392474B (en) Local perception graph representation learning method based on iterative optimization
CN116680578A (en) Cross-modal model-based deep semantic understanding method
CN116092138A (en) K neighbor graph iterative vein recognition method and system based on deep learning
Xue et al. Fast and unsupervised neural architecture evolution for visual representation learning
Tian et al. Modeling cardinality in image hashing
CN113887353A (en) Visible light-infrared pedestrian re-identification method and system
CN113420821A (en) Multi-label learning method based on local correlation of labels and features

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
CB02 Change of applicant information

Address after: 518055 campus of Harbin Institute of technology, Shenzhen University Town, Taoyuan Street, Nanshan District, Shenzhen City, Guangdong Province

Applicant after: Harbin Institute of Technology,Shenzhen(Shenzhen Institute of science and technology innovation Harbin Institute of Technology)

Address before: 518055 campus of Harbin Institute of technology, Shenzhen University Town, Taoyuan Street, Nanshan District, Shenzhen City, Guangdong Province

Applicant before: HARBIN INSTITUTE OF TECHNOLOGY (SHENZHEN)

CB02 Change of applicant information
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant