CN117688455B - Meta-task small sample classification method based on data quality and reinforcement learning - Google Patents
- Publication number: CN117688455B (application CN202410158075.6A)
- Authority: CN (China)
- Legal status: Active
Abstract
The application relates to a meta-task small-sample classification method based on data quality and reinforcement learning, which comprises the following steps: sampling the standardized training set and the standardized test set respectively to obtain a sampling training set and a sampling test set; combining the sampling training set and the sampling test set into a meta-learning task; normalizing each calculated weight to obtain a normalized weight corresponding to each task; determining the category of each task based on its normalized weight; performing meta-policy optimization on tasks of category 0 to obtain a first meta-policy and a first parameter; performing meta reinforcement learning on tasks of category 1 to obtain a second meta-policy and a second parameter; obtaining a mixed strategy based on the first and second meta-policies; obtaining a mixing parameter based on the first and second parameters; and constructing an objective function based on the mixed strategy and the mixing parameter, then maximizing the objective function until convergence to obtain a classification model.
Description
Technical Field
The application relates to the technical field of meta-task small sample classification, in particular to a meta-task small sample classification method based on data quality and reinforcement learning.
Background
Small-sample classification refers to training a model that can accurately classify new categories of data when only a small amount of labeled data is available. In the financial field, small-sample classification is an important problem because financial data is often high-dimensional, sparse, nonlinear, and dynamically changing, making it difficult to acquire enough labeled data to train an effective classification model. For example, in credit card fraud detection, the proportion of fraudulent transactions is very low and fraud techniques are continually updated, so a method is needed that can quickly identify new fraud patterns from small amounts of annotated data.
Meta-learning has great potential in this respect: it can learn a generic classifier from existing normal and fraudulent transaction data, and then quickly adapt and predict on newly emerging transaction data. However, meta-learning is also affected by data quality; if the data extracted for a meta-task is class-imbalanced or noisy, the meta-learning model struggles to obtain effective information from that meta-task, which harms its generalization on the target task. Therefore, a method is needed to evaluate the quality of the data extracted by each meta-task and to optimize the meta-task selection strategy, so that the model can better adapt to the target task.
Disclosure of Invention
Based on the above, and aiming at problems of traditional small-sample classification methods such as over-fitting, poor generalization capability, and sensitivity to noisy data, it is necessary to provide a small-sample classification method that can dynamically select and adjust the data distribution and number of meta-tasks according to the characteristics of the target task, so as to improve the classification performance of the meta-learning model on the target task; in particular, a meta-task small-sample classification method based on data quality and reinforcement learning.
The invention provides a meta-task small sample classification method based on data quality and reinforcement learning, which is used for classifying financial and scientific product data and comprises the following steps:
S1: acquiring feature vectors and category labels of financial and scientific product data, and constructing an original data set; the feature vector comprises names, descriptions, developers, technical foundations and types of financial and scientific products; the category labels comprise financial management, payment, lending, stock, billing, insurance, information and funds;
s2: dividing the original data set into a training set and a test set according to a proportion, and sampling the standardized training set and the standardized test set respectively to obtain a sampling training set and a sampling test set; combining the sampling training set and the sampling test set into a meta-learning task;
S3: calculating the weight of each task in the meta-learning task by adopting a data quality evaluation function, and normalizing each calculated weight to obtain a normalized weight corresponding to each task;
the tasks are binary classification tasks, comprising classification of the financial technology product data as financial management or payment type, as lending or stock type, as accounting or insurance type, and as information or fund type;
the data in each task are the financial technology product data corresponding to each of the two labels of that binary classification task;
s4: determining the category of each task based on the normalization weight corresponding to each task; the belonging categories of tasks include 0 and 1;
performing meta-policy optimization on the task with the category of 0 to obtain a first meta-policy and a first parameter;
performing meta reinforcement learning on the task with the category of 1 to obtain a second meta strategy and a second parameter;
S5: obtaining a mixed strategy based on the first meta-strategy and the second meta-strategy; obtaining a mixing parameter based on the first parameter and the second parameter;
S6: constructing an objective function based on the mixing strategy and the mixing parameters, and maximizing the objective function until convergence to obtain a classification model;
S7: and inputting the data of the financial and scientific products to be classified into the classification model to obtain a classification result.
The beneficial effects are that: the method uses data quality indicators to evaluate and select an adaptation algorithm for each meta-task, and then uses a reinforcement learning algorithm to dynamically assign different meta-tasks to different strategies, thereby improving the performance of small-sample classification.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for classifying small samples of meta-tasks based on data quality and reinforcement learning according to an embodiment of the application.
Detailed Description
In order that the above objects, features and advantages of the application will be readily understood, a more particular description of the application will be rendered by reference to the appended drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. The present application may be embodied in many other forms than described herein and similarly modified by those skilled in the art without departing from the spirit of the application, whereby the application is not limited to the specific embodiments disclosed below.
As shown in fig. 1, the present embodiment provides a meta-task small sample classification method based on data quality and reinforcement learning, which is used for classifying financial and scientific product data, and includes:
S1: acquiring feature vectors and category labels of financial and scientific product data, and constructing an original data set; the feature vector comprises names, descriptions, developers, technical foundations and types of financial and scientific products; the category labels include financial, payment, lending, stock, billing, insurance, information, funds.
S2: dividing the original data set into a training set and a test set according to a proportion, and sampling the standardized training set and the standardized test set respectively to obtain a sampling training set and a sampling test set; and combining the sampling training set and the sampling test set into a meta-learning task.
Specifically, the original dataset is divided into a training set and a test set according to a set proportion $p$; the training set is denoted $D_{\text{train}}$ and the test set $D_{\text{test}}$, and they satisfy:

$D_{\text{train}} \cup D_{\text{test}} = D,\quad D_{\text{train}} \cap D_{\text{test}} = \varnothing$;

$D = \{(x_i, y_i)\}_{i=1}^{N}$;

wherein $D$ represents the original dataset; $x_i$ represents the feature vector of the $i$-th financial technology product datum; $y_i$ represents the category label of the $i$-th financial technology product datum; and $N$ represents the total number of financial technology product data.
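As a minimal illustration of this proportional split (the function and variable names are ours, not the patent's), the division can be sketched in Python:

```python
# Hypothetical sketch of the S2 split: divide an original dataset D of
# (feature_vector, label) pairs into D_train / D_test by a set proportion p.
import random

def split_dataset(data, p, seed=0):
    """Shuffle `data` and split it into disjoint train/test sets by proportion p."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * p)
    return shuffled[:cut], shuffled[cut:]

data = [([float(i)], i % 2) for i in range(10)]
train, test = split_dataset(data, p=0.8)
```

The two halves are disjoint and together cover the original dataset, matching the set relations above.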
Further, standardizing the training set and the test set comprises: standardizing the feature vectors of the financial technology product data in both sets, using:

$\hat{x}_i = \dfrac{x_i - \mu}{\sigma}$;

$\mu = \dfrac{1}{N}\sum_{i=1}^{N} x_i$;

$\sigma = \sqrt{\dfrac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}$;

wherein $\hat{x}_i$ represents the standardized feature vector of the $i$-th financial technology product datum; $x_i$ represents the raw feature vector of the $i$-th financial technology product datum; $\mu$ represents the mean of the feature vectors; $\sigma$ represents the standard deviation of the feature vectors; and $N$ represents the total number of financial technology product data.
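The standardization above is ordinary z-score normalization; a small sketch, assuming the mean and standard deviation are taken per feature dimension over all $N$ vectors (names are illustrative):

```python
# Per-dimension z-score standardization: x_hat = (x - mu) / sigma.
import math

def standardize(vectors):
    n, d = len(vectors), len(vectors[0])
    mu = [sum(v[j] for v in vectors) / n for j in range(d)]
    sigma = [math.sqrt(sum((v[j] - mu[j]) ** 2 for v in vectors) / n)
             for j in range(d)]
    return [[(v[j] - mu[j]) / sigma[j] for j in range(d)] for v in vectors]

X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
Z = standardize(X)
```

After standardization each feature dimension has zero mean and unit variance.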
Still further, the sampled training set is expressed as:

$D^{s}_{\text{train}} = \{(\hat{x}_i, y_i)\}_{i=1}^{n}$;

the sampled test set is expressed as:

$D^{s}_{\text{test}} = \{(\hat{x}_i, y_i)\}_{i=1}^{n}$;

and the meta-learning task is expressed as:

$T = \{D^{s}_{\text{train}}, D^{s}_{\text{test}}\}$;

wherein $\hat{x}_i$ represents the standardized feature vector of the $i$-th financial technology product datum; $y_i$ represents the category label of the $i$-th financial technology product datum; and $n$ represents the number of sampled financial technology product data.
Further comprising: dividing each task in the training set into a support set and a query set; the support set is expressed as:

$S_u = \{(x_v, y_v)\}_{v=1}^{N_s}$;

the query set is expressed as:

$Q_u = \{(x_l, y_l)\}_{l=1}^{N_q}$;

and the support set and the query set satisfy:

$S_u \cup Q_u = T_u,\quad S_u \cap Q_u = \varnothing$;

wherein $x_v$ represents the feature vector of the $v$-th financial technology product datum and $y_v$ its category label; $N_s$ represents the total number of financial technology product data in the support set; $x_l$ represents the feature vector of the $l$-th financial technology product datum and $y_l$ its category label; $N_q$ represents the total number of financial technology product data in the query set; and $T_u$ represents the $u$-th task in the training set.
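A hedged sketch of task construction: sample data for two labels into one binary task, then divide it into disjoint support and query sets. The sampling scheme and names are assumptions for illustration, not the patent's method:

```python
# Build one binary meta-task from (feature, label) data and split it into
# a support set and a query set with S ∪ Q = T and S ∩ Q = ∅.
import random

def make_task(data, label_a, label_b, n_support, n_query, seed=0):
    rng = random.Random(seed)
    pool = [(x, y) for x, y in data if y in (label_a, label_b)]
    sample = rng.sample(pool, n_support + n_query)  # disjoint by construction
    return sample[:n_support], sample[n_support:]

data = [([float(i)], "payment" if i % 2 else "finance") for i in range(20)]
support, query = make_task(data, "finance", "payment", n_support=4, n_query=2)
```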
S3: and calculating the weight of each task in the meta-learning task by adopting a data quality evaluation function, and normalizing each calculated weight to obtain a normalized weight corresponding to each task.
Specifically, the weight of each task in the meta-learning tasks is calculated with a data quality evaluation function:

$w_k = f(D_k) = \dfrac{1}{\sigma_k}\cdot\dfrac{1}{E}\sum_{e=1}^{E} q_{k,e}$;

wherein $w_k$ represents the weight of the $k$-th task in the meta-learning tasks; $f(\cdot)$ represents the data quality assessment function; $D_k$ represents the set of all data of the $k$-th task in the meta-learning tasks; $q_{k,1}, \dots, q_{k,E}$ represent the quality of the 1st through $E$-th data of the $k$-th task; $E$ represents the total number of data of the $k$-th task; and $\sigma_k$ represents the standard deviation of all data of the $k$-th task, calculated as:

$\sigma_k = \sqrt{\dfrac{1}{E}\sum_{l=1}^{E}\left(d_{k,l} - \bar{d}_k\right)^2}$;

wherein $d_{k,l}$ represents the $l$-th datum of the $k$-th task in the meta-learning tasks and $\bar{d}_k$ represents the mean of the data of the $k$-th task.
Each calculated weight is normalized to obtain the normalized weight corresponding to each task:

$\hat{w}_k = \dfrac{w_k}{\sum_{j=1}^{K} w_j}$;

wherein $\hat{w}_k$ represents the normalized weight corresponding to the $k$-th task in the meta-learning tasks; $K$ represents the total number of tasks in the meta-learning tasks; and $w_j$ represents the weight of the $j$-th task in the meta-learning tasks.
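Since the exact form of $f(\cdot)$ is only described through its inputs, the sketch below assumes the task weight is the mean per-datum quality divided by the standard deviation of the task's data, followed by the normalization above; all names are illustrative:

```python
# Assumed data-quality weight: mean quality / std-dev of the task's data,
# then normalize weights to sum to 1 across tasks.
import math

def task_weight(qualities, data):
    mean_q = sum(qualities) / len(qualities)
    mean_d = sum(data) / len(data)
    sigma = math.sqrt(sum((d - mean_d) ** 2 for d in data) / len(data))
    return mean_q / sigma

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

w = [task_weight([0.9, 0.8], [1.0, 3.0]), task_weight([0.4, 0.6], [0.0, 4.0])]
w_hat = normalize(w)
```

A task with higher mean quality and lower spread receives a larger normalized weight.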
In this embodiment, the data of the financial technology product is related information of the financial technology product.
The tasks are binary classification tasks, comprising classification of the financial technology product data as financial management or payment type, as lending or stock type, as accounting or insurance type, and as information or fund type. In essence, each task splits a multi-label classification problem into binary classification problems, with each task focusing on identifying and distinguishing two specific categories of financial technology product data.
The data in each task are the financial technology product data corresponding to each of the two labels of that binary classification task.
S4: determining the category of each task based on the normalization weight corresponding to each task; the belonging categories of tasks include 0 and 1.
Specifically, based on the normalized weight corresponding to each task, the category to which each task belongs is determined as:

$c_k = \begin{cases} 0, & \hat{w}_k \geq \theta \\ 1, & \hat{w}_k < \theta \end{cases}$;

wherein $c_k$ represents the category of the $k$-th task in the meta-learning tasks, and $\theta$ is an adjustable threshold parameter used as the judging condition for task classification; in this embodiment the value of $\theta$ is 0.5, and different model users can adjust this parameter according to the actual situation.
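The thresholding step can be sketched as follows; the direction of the comparison (high-weight tasks going to category 0) is an assumption made for illustration:

```python
# Assign category 0 to tasks whose normalized weight reaches the adjustable
# threshold theta (0.5 in this embodiment), and category 1 otherwise.
def categorize(normalized_weights, theta=0.5):
    return [0 if w >= theta else 1 for w in normalized_weights]

cats = categorize([0.7, 0.2, 0.1])
```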
And performing meta-policy optimization on the task with the category of 0 to obtain a first meta-policy and a first parameter.
Further, the method comprises the following steps:

Step 1: initializing a meta-policy $\pi_\theta$, wherein $\theta$ is the meta-policy parameter;

Step 2: for each task $b$ belonging to category 0 in the meta-learning tasks, sampling an initial policy $\pi_{\theta_b}$ from the meta-policy, wherein $\theta_b = \theta + \epsilon$; $\theta_b$ represents the initial policy parameter and $\epsilon$ is a random perturbation;

Step 3: executing the initial policy $\pi_{\theta_b}$ on task $b$ and collecting first trajectory data $\tau_b$, where $\tau_b$ records the series of actions taken and the series of rewards obtained by the initial policy on task $b$; calculating a first cumulative reward based on the first trajectory data:

$R_1 = \sum_{t=0}^{T} r_t$;

wherein $R_1$ represents the first cumulative reward; $t$ represents a time step; and $r_t$ represents the reward obtained by the initial policy at time step $t$;

Step 4: updating the initial policy parameter according to the first trajectory data and the first cumulative reward to obtain a new policy $\pi_{\theta_b'}$, wherein $\theta_b' = \theta_b + \alpha \nabla_{\theta_b} R_1$; $\theta_b'$ represents the new policy parameter and $\alpha$ is a first learning rate;

Step 5: executing the new policy $\pi_{\theta_b'}$ on task $b$ and collecting second trajectory data $\tau_b'$; calculating a second cumulative reward based on the second trajectory data:

$R_2 = \sum_{t=0}^{T} r_t'$;

wherein $R_2$ represents the second cumulative reward and $r_t'$ represents the reward obtained by the new policy at time step $t$;

Step 6: updating the meta-policy parameter according to the second cumulative rewards of all tasks belonging to category 0 in the meta-learning tasks to obtain a new meta-policy $\pi_{\theta'}$, wherein $\theta' = \theta + \beta \sum_b \nabla_\theta R_2^{(b)}$; $\theta'$ represents the new meta-policy parameter and $\beta$ is a second learning rate;

Step 7: repeating steps 2-6 until convergence to obtain the first meta-policy and the first parameter.
And performing meta reinforcement learning on the task with the category of 1 to obtain a second meta strategy and a second parameter corresponding to the second meta strategy.
Further, the method comprises the following steps:
Step 1: initializing the parameter $\delta$, meta learning rate $\zeta$, and task learning rate $\eta$ of the meta reinforcement learning model;

Step 2: for each meta-training iteration, sampling a batch of tasks $G = \{G_1, \dots, G_M\}$ from the tasks of category 1 in the meta-learning tasks, wherein the $h$-th task $G_h$ comprises a class-imbalanced training set and a second test set;

Step 3: for the $h$-th task $G_h$, performing the following operations:

Step 3.1: using the current parameter $\delta$, performing a one-step policy-gradient update on the class-imbalanced training set to obtain a new parameter $\delta_h'$, wherein $\delta_h' = \delta + \eta \nabla_\delta J_{\text{imb}}(\delta)$, and $J_{\text{imb}}(\delta)$ is the policy objective function for class-imbalanced data tasks, with expression:

$J_{\text{imb}}(\delta) = \mathbb{E}_{(s,a)\sim D_{\text{imb}}}\left[\sum_{c=1}^{C} z_c\,\mathbb{I}[y = c]\,\log \pi_\delta(a \mid s)\, A_\delta(s, a)\right] + \lambda H(\pi_\delta)$;

$z_c = \dfrac{1}{p(c)}$;

$A_\delta(s, a) = r + \gamma V_\delta(s') - V_\delta(s)$;

wherein $D_{\text{imb}}$ represents the class-imbalanced training set; $(s, a)$ are data in the class-imbalanced training set; $\pi_\delta$ is the policy function based on parameter $\delta$; $A_\delta$ is the advantage function based on parameter $\delta$; $\lambda$ is a regularization coefficient; $H(\pi_\delta)$ is the entropy of the policy; $C$ represents the number of categories; $\mathbb{I}[\cdot]$ is the indicator function; $y$ is the true category corresponding to state $s$; $z_c$ is the weight of category $c$; $p(c)$ is the frequency of occurrence of category $c$ in the class-imbalanced training set; $a$ is an action; $o$ represents a constant factor in the reward function $r$ for adjusting the magnitude of the reward value; $\gamma$ represents a discount factor measuring the impact of future reward values on the current state-action pair; and $V_\delta$ represents the value function based on parameter $\delta$, estimating the expected return obtainable by policy $\pi_\delta$ in state $s$ or state $s'$;

$\pi_\delta(a \mid s) = \dfrac{\exp\left(Q_\delta(s, a) / T\right)}{\sum_{a'} \exp\left(Q_\delta(s, a') / T\right)}$;

wherein $Q_\delta$ represents the action-value function based on parameter $\delta$ and $T$ represents a temperature parameter controlling the randomness of the policy: the higher the temperature, the more the policy tends to select action $a$ or $a'$ at random; the lower the temperature, the more the policy tends toward greedy action selection;

$V_\delta(s) = T \log \sum_{a} \exp\left(Q_\delta(s, a) / T\right)$;
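The temperature-controlled policy of step 3.1 can be sketched directly: action probabilities are a softmax over action values divided by the temperature (function names are ours, for illustration):

```python
# Softmax policy with temperature T: higher T flattens the distribution
# (near-random action selection); lower T approaches greedy selection.
import math

def softmax_policy(q_values, temperature):
    exps = [math.exp(q / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

cool = softmax_policy([1.0, 2.0], temperature=0.1)   # near-greedy
warm = softmax_policy([1.0, 2.0], temperature=10.0)  # near-uniform
```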
Step 3.2: using the new parameter $\delta_h'$, evaluating the performance of the policy on the second test set to obtain a return value $J(\delta_h'; D^{(h)}_{\text{test}})$, wherein $D^{(h)}_{\text{test}}$ is the second test set and $J(\cdot)$ is the policy objective function;

Step 4: updating the parameter $\delta$ based on the return values of all tasks belonging to category 1 in the meta-learning tasks, the updating formula being:

$\delta' = \delta + \zeta\,\dfrac{1}{M}\sum_{h=1}^{M} \nabla_\delta J(\delta_h'; D^{(h)}_{\text{test}})$;

wherein $\delta'$ represents the updated parameter and $M$ represents the number of tasks sampled from the tasks belonging to category 1 in the meta-learning tasks;

Step 5: repeating steps 2-4 until convergence to obtain the second meta-policy and the second parameter corresponding to the second meta-policy.
S5: obtaining a mixed strategy based on the first meta-strategy and the second meta-strategy; and obtaining a mixing parameter based on the first parameter and the second parameter.
Specifically, based on the first meta-policy and the second meta-policy, the mixed strategy is obtained by the calculation formula:

$\pi_{\text{mix}}(a \mid s) = \kappa\,\pi_1(a \mid s) + (1 - \kappa)\,\pi_2(a \mid s)$;

wherein $\pi_{\text{mix}}$ represents the mixed strategy; $a$ is an action and $s$ is a state; $\kappa$ is a mixing coefficient; $\pi_1$ represents the first meta-policy; and $\pi_2$ represents the second meta-policy.

Based on the first parameter and the second parameter, the mixing parameter is obtained by the calculation formula:

$\phi_{\text{mix}} = [\phi_1, \phi_2]$;

wherein $\phi_{\text{mix}}$ represents the mixing parameter; $\phi_1$ represents the first parameter; $\phi_2$ represents the second parameter; and $[\cdot,\cdot]$ represents a concatenation operation.
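The two S5 operations can be sketched directly (names are illustrative; the policies are represented as action-probability lists and the parameters as flat vectors):

```python
# Mix two policies' action distributions with coefficient kappa, and
# concatenate their parameter vectors into the mixing parameter.
def mix_policies(pi1, pi2, kappa):
    return [kappa * p1 + (1 - kappa) * p2 for p1, p2 in zip(pi1, pi2)]

def mix_parameters(phi1, phi2):
    return phi1 + phi2  # list concatenation, i.e. [phi1, phi2]

pi_mix = mix_policies([0.9, 0.1], [0.3, 0.7], kappa=0.5)
phi_mix = mix_parameters([1.0, 2.0], [3.0])
```

A convex combination of two valid distributions is itself a valid distribution.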
S6: and constructing an objective function based on the mixing strategy and the mixing parameters, and maximizing the objective function until convergence to obtain a classification model.
Specifically, the method comprises the following steps:

Step 1: using the mixed strategy and the mixing parameter, evaluating the return function of each task in the meta-learning tasks and taking the average as the objective function:

$J(\kappa, \phi_{\text{mix}}) = \dfrac{1}{K}\sum_{j=1}^{K} R_j(\pi_j, \phi_j)$;

wherein $J$ represents the objective function; $\pi_j$ represents the strategy corresponding to the $j$-th task in the meta-learning tasks; $\phi_j$ represents the strategy parameter corresponding to the $j$-th task in the meta-learning tasks; $K$ represents the total number of tasks in the meta-learning tasks; and $R_j$ represents the return function of the $j$-th task, expressed as:

$R_j = y_j \log p_j + (1 - y_j)\log(1 - p_j)$;

wherein $y_j$ represents the true label of the $j$-th task in the meta-learning tasks, taking the value 0 or 1, and $p_j$ represents the prediction probability of the $j$-th task in the meta-learning tasks, with value range $[0, 1]$;

Step 2: maximizing the objective function $J$ by gradient ascent and updating the mixing coefficient and the mixing parameter:

$\kappa \leftarrow \kappa + \eta_3 \nabla_\kappa J,\quad \phi_{\text{mix}} \leftarrow \phi_{\text{mix}} + \eta_4 \nabla_{\phi_{\text{mix}}} J$;

wherein $\kappa$ represents the mixing coefficient, $\eta_3$ represents a third learning rate, and $\eta_4$ represents a fourth learning rate;

Step 3: repeating steps 1 and 2 until convergence; the model at convergence of the objective function is taken as the classification model.
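Under the reconstruction above (log-likelihood return, averaged objective, gradient ascent on the mixing coefficient), S6 can be sketched as follows; a finite difference stands in for $\nabla_\kappa J$, and the link between $\kappa$ and each task's prediction probability is an illustrative assumption:

```python
# Per-task return R_j = y*log(p) + (1-y)*log(1-p); objective J is the average
# over tasks; kappa is updated by gradient ascent (finite differences here).
import math

def task_return(y, p):
    return y * math.log(p) + (1 - y) * math.log(1 - p)

def objective(kappa, tasks):
    # Illustrative link: the mixing coefficient blends two candidate predictions.
    return sum(task_return(y, kappa * p1 + (1 - kappa) * p2)
               for y, p1, p2 in tasks) / len(tasks)

tasks = [(1, 0.9, 0.4), (0, 0.2, 0.6)]
kappa, lr, eps = 0.5, 0.1, 1e-6
for _ in range(100):
    g = (objective(kappa + eps, tasks) - objective(kappa - eps, tasks)) / (2 * eps)
    kappa = min(1.0, max(0.0, kappa + lr * g))  # keep kappa in [0, 1]
J = objective(kappa, tasks)
```

Here both tasks are predicted better by the first policy, so the ascent drives the mixing coefficient toward 1 and the objective improves over its starting value.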
S7: and inputting the data of the financial and scientific products to be classified into the classification model to obtain a classification result.
In this embodiment, the classification result is a classification label of the financial and scientific product data to be classified, which is predicted by the classification model.
The method provided by this embodiment uses data quality indicators to evaluate and select an adaptation algorithm for each meta-task, and then uses a reinforcement learning algorithm to dynamically assign different meta-tasks to different strategies, thereby improving the performance of small-sample classification. The method can evaluate the quality of the data extracted by each meta-task and optimize the meta-task selection strategy, so that the model can better adapt to the target task.
The technical features of the above-described embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above-described embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the claims. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.
Claims (4)
1. The meta-task small sample classification method based on data quality and reinforcement learning is used for classifying financial and scientific product data and is characterized by comprising the following steps of:
S1: acquiring feature vectors and category labels of financial and scientific product data, and constructing an original data set; the feature vector comprises names, descriptions, developers, technical foundations and types of financial and scientific products; the category labels comprise financial management, payment, mobile banking, lending, stock, accounting, insurance, information and funds;
s2: dividing the original data set into a training set and a test set according to a proportion, and sampling the standardized training set and the standardized test set respectively to obtain a sampling training set and a sampling test set; combining the sampling training set and the sampling test set into a meta-learning task;
s3: calculating the weight of each task in the meta-learning tasks by adopting a data quality evaluation function, and normalizing each calculated weight to obtain the normalized weight corresponding to each task; the calculation formula is:

$\hat{w}_k = \dfrac{w_k}{\sum_{j=1}^{K} w_j}$;

wherein $\hat{w}_k$ represents the normalized weight corresponding to the $k$-th task in the meta-learning tasks; $K$ represents the total number of tasks in the meta-learning tasks; and $w_j$ represents the weight of the $j$-th task in the meta-learning tasks;
the tasks are binary classification tasks, comprising classification of the financial technology product data as financial management or payment type, as lending or stock type, as accounting or insurance type, and as information or fund type;
the data in each task are the financial technology product data corresponding to each of the two labels of that binary classification task;
the calculation formula of the data quality evaluation function is:

$w_k = f(D_k) = \dfrac{1}{\sigma_k}\cdot\dfrac{1}{E}\sum_{e=1}^{E} q_{k,e}$;

wherein $w_k$ represents the weight of the $k$-th task in the meta-learning tasks; $f(\cdot)$ represents the data quality assessment function; $D_k$ represents the set of all data of the $k$-th task in the meta-learning tasks; $q_{k,1}, \dots, q_{k,E}$ represent the quality of the 1st through $E$-th data of the $k$-th task; $E$ represents the total number of data of the $k$-th task; and $\sigma_k$ represents the standard deviation of all data of the $k$-th task, calculated as:

$\sigma_k = \sqrt{\dfrac{1}{E}\sum_{l=1}^{E}\left(d_{k,l} - \bar{d}_k\right)^2}$;

wherein $d_{k,l}$ represents the $l$-th datum of the $k$-th task in the meta-learning tasks and $\bar{d}_k$ represents the mean of the data of the $k$-th task;
S4: determining the category of each task based on the normalized weight corresponding to each task, the categories of tasks including 0 and 1; the expression is:

$c_k = \begin{cases} 0, & \hat{w}_k \geq \theta \\ 1, & \hat{w}_k < \theta \end{cases}$;

wherein $c_k$ represents the category of the $k$-th task in the meta-learning tasks and $\theta$ is an adjustable threshold parameter;
performing meta-policy optimization on the task with the category of 0 to obtain a first meta-policy and a first parameter;
performing meta-policy optimization on the tasks with category 0 to obtain the first meta-policy and the first parameter comprises:

Step 1: initializing a meta-policy $\pi_\theta$, wherein $\theta$ is the meta-policy parameter;

Step 2: for each task $b$ belonging to category 0 in the meta-learning tasks, sampling an initial policy $\pi_{\theta_b}$ from the meta-policy, wherein $\theta_b = \theta + \epsilon$; $\theta_b$ represents the initial policy parameter and $\epsilon$ is a random perturbation;

Step 3: executing the initial policy $\pi_{\theta_b}$ on task $b$ and collecting first trajectory data $\tau_b$, where $\tau_b$ records the series of actions taken and the series of rewards obtained by the initial policy on task $b$; calculating a first cumulative reward based on the first trajectory data:

$R_1 = \sum_{t=0}^{T} r_t$;

wherein $R_1$ represents the first cumulative reward; $t$ represents a time step; and $r_t$ represents the reward obtained by the initial policy at time step $t$;

Step 4: updating the initial policy parameter according to the first trajectory data and the first cumulative reward to obtain a new policy $\pi_{\theta_b'}$, wherein $\theta_b' = \theta_b + \alpha \nabla_{\theta_b} R_1$; $\theta_b'$ represents the new policy parameter and $\alpha$ is a first learning rate;

Step 5: executing the new policy $\pi_{\theta_b'}$ on task $b$ and collecting second trajectory data $\tau_b'$; calculating a second cumulative reward based on the second trajectory data:

$R_2 = \sum_{t=0}^{T} r_t'$;

wherein $R_2$ represents the second cumulative reward and $r_t'$ represents the reward obtained by the new policy at time step $t$;

Step 6: updating the meta-policy parameter according to the second cumulative rewards of all tasks belonging to category 0 in the meta-learning tasks to obtain a new meta-policy $\pi_{\theta'}$, wherein $\theta' = \theta + \beta \sum_b \nabla_\theta R_2^{(b)}$; $\theta'$ represents the new meta-policy parameter and $\beta$ is a second learning rate;

Step 7: repeating steps 2-6 until convergence to obtain the first meta-policy and the first parameter;
performing meta reinforcement learning on the task with the category of 1 to obtain a second meta strategy and a second parameter;
performing meta reinforcement learning on the tasks with category 1 to obtain the second meta-policy and the second parameter comprises:

Step 1: initializing the parameter $\delta$, meta learning rate $\zeta$, and task learning rate $\eta$ of the meta reinforcement learning model;

Step 2: for each meta-training iteration, sampling a batch of tasks $G = \{G_1, \dots, G_M\}$ from the tasks of category 1 in the meta-learning tasks, wherein the $h$-th task $G_h$ comprises a class-imbalanced training set and a second test set;

Step 3: for the $h$-th task $G_h$, performing the following operations:

Step 3.1: using the current parameter $\delta$, performing a one-step policy-gradient update on the class-imbalanced training set to obtain a new parameter $\delta_h'$, wherein $\delta_h' = \delta + \eta \nabla_\delta J_{\text{imb}}(\delta)$, and $J_{\text{imb}}(\delta)$ is the policy objective function for class-imbalanced data tasks, with expression:

$J_{\text{imb}}(\delta) = \mathbb{E}_{(s,a)\sim D_{\text{imb}}}\left[\sum_{c=1}^{C} z_c\,\mathbb{I}[y = c]\,\log \pi_\delta(a \mid s)\, A_\delta(s, a)\right] + \lambda H(\pi_\delta)$;

$z_c = \dfrac{1}{p(c)}$;

$A_\delta(s, a) = r + \gamma V_\delta(s') - V_\delta(s)$;

$\pi_\delta(a \mid s) = \dfrac{\exp\left(Q_\delta(s, a) / T\right)}{\sum_{a'} \exp\left(Q_\delta(s, a') / T\right)}$;

$V_\delta(s) = T \log \sum_{a} \exp\left(Q_\delta(s, a) / T\right)$;

wherein $D_{\text{imb}}$ represents the class-imbalanced training set; $(s, a)$ are data in the class-imbalanced training set; $\pi_\delta$ is the policy function based on parameter $\delta$; $A_\delta$ is the advantage function based on parameter $\delta$; $\lambda$ is a regularization coefficient; $H(\pi_\delta)$ is the entropy of the policy; $C$ represents the number of categories; $\mathbb{I}[\cdot]$ is the indicator function; $y$ is the true category corresponding to state $s$; $z_c$ is the weight of category $c$; $p(c)$ is the frequency of occurrence of category $c$ in the class-imbalanced training set; $a$ is an action; $o$ represents a constant factor in the reward function $r$; $\gamma$ represents a discount factor; $V_\delta$ represents the value function based on parameter $\delta$, estimating the expected return obtainable by policy $\pi_\delta$ in state $s$ or state $s'$; $Q_\delta$ represents the action-value function based on parameter $\delta$; and $T$ represents a temperature parameter;
step 3.2: using the new parameter δ'_h, evaluating the performance of the policy on the second test set to obtain a return value J_{G_h,test}(δ'_h); wherein G_h,test is the second test set and J(·) is the policy objective function;
step 4: updating the parameter δ based on the return values of all tasks belonging to category 1 in the meta-learning task, the update formula being:

δ ← δ + ζ · (1/M) · Σ_{h=1}^{M} ∇_δ J_{G_h,test}(δ'_h);

wherein the left-hand δ represents the updated parameter; M represents the number of tasks sampled from the tasks belonging to category 1 in the meta-learning task;
Step 5: repeating steps 2-4 until convergence to obtain the second meta-strategy and the second parameter;
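Steps 1-5 follow the usual MAML-style pattern: a one-step inner update per sampled task, then a meta-update that averages test-set gradients over the batch. A non-limiting numeric sketch (first-order, with a quadratic toy objective standing in for the policy objective; all concrete function forms here are assumptions for illustration):

```python
def meta_train(tasks, delta, zeta=0.1, eta=0.1, iters=50):
    """tasks: list of (grad_train, grad_test) callables returning the
    gradient of the task objective at a given parameter value.
    Inner loop (step 3.1): one-step update per task.
    Outer loop (step 4): average test-set gradient at the adapted
    parameters updates the shared parameter delta."""
    for _ in range(iters):
        meta_grad = 0.0
        for grad_train, grad_test in tasks:
            adapted = delta + eta * grad_train(delta)  # one-step policy gradient
            meta_grad += grad_test(adapted)            # evaluate on the test set
        delta += zeta * meta_grad / len(tasks)         # meta-update over M tasks
    return delta

# Toy tasks: maximize -(x - target)^2, whose gradient is 2*(target - x).
def make_task(target):
    g = lambda x: 2.0 * (target - x)
    return (g, g)

tasks = [make_task(1.0), make_task(3.0)]
delta = meta_train(tasks, delta=0.0)   # converges toward 2.0, the task average
```

The meta-parameter settles between the per-task optima, which is the behaviour the shared first/second meta-strategy is meant to capture.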
s5: obtaining a mixed strategy based on the first meta-strategy and the second meta-strategy; the calculation formula is as follows:

π_mix(a|s) = w · π_1(a|s) + (1 − w) · π_2(a|s);

wherein π_mix represents the mixed strategy; a is an action and s is a state; w is the mixing coefficient; π_1 represents the first meta-strategy; π_2 represents the second meta-strategy;
obtaining a mixing parameter based on the first parameter and the second parameter; the calculation formula is as follows:

θ_mix = [θ_1, θ_2];

wherein θ_mix represents the mixing parameter; θ_1 represents the first parameter; θ_2 represents the second parameter; [·,·] represents a join (concatenation) operation;
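A non-limiting sketch of the two combinations above; the convex-combination form of the mixed strategy is an assumed reading of "mixing coefficient", while the concatenation of the parameters is stated explicitly:

```python
def mix_policy(pi1, pi2, w):
    """Mixed strategy: pi_mix(a|s) = w*pi1(a|s) + (1-w)*pi2(a|s).
    A convex combination of two distributions is itself a valid
    probability distribution over actions."""
    return lambda s: [w * p + (1.0 - w) * q for p, q in zip(pi1(s), pi2(s))]

def mix_params(theta1, theta2):
    """Mixing parameter: the join (concatenation) [theta1, theta2]."""
    return list(theta1) + list(theta2)

# Two hypothetical two-action policies (state ignored for simplicity).
pi1 = lambda s: [0.8, 0.2]
pi2 = lambda s: [0.4, 0.6]
pi_mix = mix_policy(pi1, pi2, w=0.5)
probs = pi_mix(None)                       # -> [0.6, 0.4]
theta_mix = mix_params([1.0, 2.0], [3.0])  # -> [1.0, 2.0, 3.0]
```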
S6: constructing an objective function based on the mixed strategy and the mixing parameter, and maximizing the objective function until convergence to obtain a classification model;
wherein constructing the objective function based on the mixed strategy and the mixing parameter and maximizing it until convergence to obtain the classification model comprises:
step 1: using the mixed strategy and the mixing parameter, evaluating the return function of each task in the meta-learning tasks, and calculating the average value as the objective function; the calculation formula is as follows:

J(w, θ_mix) = (1/K) · Σ_{j=1}^{K} R_j(π_j, θ_j);

wherein J represents the objective function; π_j represents the strategy corresponding to the j-th task in the meta-learning task; θ_j represents the strategy parameter corresponding to the j-th task in the meta-learning task; K represents the total number of tasks in the meta-learning task; R_j represents the return function of the j-th task in the meta-learning task, expressed as follows:

R_j = y_j · log p_j + (1 − y_j) · log(1 − p_j);

wherein y_j represents the true label of the j-th task in the meta-learning tasks, y_j taking the value 0 or 1; p_j represents the predictive probability of the j-th task in the meta-learning tasks, p_j having the value range [0, 1];
step 2: maximizing the objective function J by gradient ascent and updating the mixing coefficient and the mixing parameter, the expressions being as follows:

w ← w + ξ_3 · ∂J/∂w;
θ_mix ← θ_mix + ξ_4 · ∂J/∂θ_mix;

wherein w represents the mixing coefficient; ξ_3 represents the third learning rate; ξ_4 represents the fourth learning rate;
step 3: repeating steps 1 and 2 until convergence, and taking the converged objective function as the classification model;
S7: inputting the financial technology product data to be classified into the classification model to obtain a classification result.
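A non-limiting sketch of steps 1-2: averaging a per-task return and pushing the mixing coefficient uphill by gradient ascent. The log-likelihood form of the return and the finite-difference gradient are illustrative assumptions, not the patent's exact formulas:

```python
import math

def objective(w, tasks):
    """Average per-task return J = (1/K) * sum_j R_j, where R_j is taken
    here as the log-likelihood y*log(p) + (1-y)*log(1-p) of the mixed
    prediction p = w*p1 + (1-w)*p2 -- an assumed form of the return."""
    total = 0.0
    for y, p1, p2 in tasks:
        p = w * p1 + (1.0 - w) * p2
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(tasks)

def ascend(w, tasks, lr=0.1, steps=200, eps=1e-5):
    """Gradient ascent on the mixing coefficient; a central finite
    difference stands in for the analytic gradient, with w clipped
    to the open interval (0, 1)."""
    lo, hi = 1e-6, 1.0 - 1e-6
    for _ in range(steps):
        g = (objective(min(w + eps, hi), tasks)
             - objective(max(w - eps, lo), tasks)) / (2 * eps)
        w = min(max(w + lr * g, lo), hi)
    return w

# Two toy tasks where the first policy is right and the second is wrong,
# so the optimal mixing coefficient should move toward 1.
tasks = [(1, 0.9, 0.2), (0, 0.1, 0.8)]
w_opt = ascend(0.5, tasks)
```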
2. The meta-task small sample classification method based on data quality and reinforcement learning according to claim 1, wherein the original data set is divided into a training set and a test set according to a set proportion p, the training set being denoted D_train and the test set D_test, the training set and the test set satisfying:

D_train ∪ D_test = D;
D_train ∩ D_test = ∅;

wherein D represents the original dataset, D = {(x_i, y_i)}, i = 1, …, N; x_i represents the feature vector of the i-th financial technology product data; y_i represents the category label of the i-th financial technology product data; N represents the total number of financial technology product data items.
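The disjoint split of claim 2 can be sketched as follows (non-limiting; the shuffle-then-cut strategy and seed are illustrative choices):

```python
import random

def split_dataset(data, p, seed=0):
    """Split D into D_train and D_test at proportion p, so that
    D_train ∪ D_test = D and D_train ∩ D_test = ∅."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    cut = int(len(data) * p)
    train = [data[i] for i in idx[:cut]]
    test = [data[i] for i in idx[cut:]]
    return train, test

# Hypothetical (feature, label) pairs standing in for product data.
D = [(f"x{i}", i % 2) for i in range(10)]
D_train, D_test = split_dataset(D, p=0.8)  # 8 training, 2 test items
```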
3. The method for classifying meta-task small samples based on data quality and reinforcement learning according to claim 2, wherein normalizing the training set and the test set comprises: standardizing the feature vectors of the financial technology product data in the training set and in the test set, the standardization formulas being as follows:

x̂_i = (x_i − μ) / σ;
μ = (1/N) · Σ_{i=1}^{N} x_i;
σ = √( (1/N) · Σ_{i=1}^{N} (x_i − μ)² );

wherein x̂_i represents the standardized feature vector of the i-th financial technology product data; x_i represents the feature vector of the i-th financial technology product data; μ represents the mean of the feature vectors; σ represents the standard deviation of the feature vectors; N represents the total number of financial technology product data items.
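The z-score standardization of claim 3 can be sketched per feature dimension as follows (non-limiting; statistics are computed over all N samples, as the claim states):

```python
import math

def standardize(vectors):
    """Z-score standardization: x_hat = (x - mu) / sigma, with mu and
    sigma computed per feature over all N samples."""
    n, d = len(vectors), len(vectors[0])
    mu = [sum(v[j] for v in vectors) / n for j in range(d)]
    sigma = [math.sqrt(sum((v[j] - mu[j]) ** 2 for v in vectors) / n)
             for j in range(d)]
    standardized = [[(v[j] - mu[j]) / sigma[j] for j in range(d)]
                    for v in vectors]
    return standardized, mu, sigma

# Toy feature vectors; each standardized feature has mean 0.
X = [[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]]
Xs, mu, sigma = standardize(X)
```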
4. The meta-task small sample classification method based on data quality and reinforcement learning of claim 3, wherein the sampled training set is expressed as:

D'_train = {(x̂_i, y_i)}, i = 1, …, n;

the sampled test set is expressed as:

D'_test = {(x̂_i, y_i)}, i = 1, …, n;

the meta-learning task is expressed as:

T = {D'_train, D'_test};

wherein x̂_i represents the standardized feature vector of the i-th financial technology product data; y_i represents the category label of the i-th financial technology product data; n represents the number of sampled financial technology product data items.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410158075.6A CN117688455B (en) | 2024-02-04 | 2024-02-04 | Meta-task small sample classification method based on data quality and reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117688455A (en) | 2024-03-12
CN117688455B (en) | 2024-05-03
Family
ID=90139478
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410158075.6A Active CN117688455B (en) | 2024-02-04 | 2024-02-04 | Meta-task small sample classification method based on data quality and reinforcement learning |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109711433A (en) * | 2018-11-30 | 2019-05-03 | 东南大学 | A kind of fine grit classification method based on meta learning |
CN110569886A (en) * | 2019-08-20 | 2019-12-13 | 天津大学 | Image classification method for bidirectional channel attention element learning |
CN114119319A (en) * | 2021-11-24 | 2022-03-01 | 中建三局绿色产业投资有限公司 | Intelligent water affair management method and system based on distributed small sample algorithm |
CN114429009A (en) * | 2022-04-07 | 2022-05-03 | 中国石油大学(华东) | Small sample sucker-rod pump well working condition diagnosis method based on meta-migration learning |
CN117136360A (en) * | 2021-02-12 | 2023-11-28 | 硕动力公司 | System and method for task-oriented dialog security policy improvement |
CN117150359A (en) * | 2023-08-31 | 2023-12-01 | 华能阜新风力发电有限责任公司 | Small sample fault diagnosis method, system, device and medium based on model independent element learning |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109919317A (en) * | 2018-01-11 | 2019-06-21 | 华为技术有限公司 | A kind of machine learning model training method and device |
Non-Patent Citations (2)
Title |
---|
A Survey of Meta-Reinforcement Learning; Jacob Beck et al.; arXiv:2301.08028v1 [cs.LG]; 2023-01-19; pp. 1-53 *
Function optimization algorithm based on game-strategy reinforcement learning; Ye Jun et al.; Computer Engineering and Applications; 2005-05-01 (No. 17); pp. 67-68, 101 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11599939B2 (en) | System, method and computer program for underwriting and processing of loans using machine learning | |
Bahnsen et al. | Cost sensitive credit card fraud detection using Bayes minimum risk | |
US10579938B2 (en) | Real time autonomous archetype outlier analytics | |
CN104321794A (en) | A system and method using multi-dimensional rating to determine an entity's future commercial viability | |
Fan et al. | Improved ML-based technique for credit card scoring in internet financial risk control | |
CN115205011B (en) | Bank user portrait model generation method based on LSF-FC algorithm | |
CN111325344A (en) | Method and apparatus for evaluating model interpretation tools | |
CN112085593B (en) | Credit data mining method for small and medium enterprises | |
CN117688455B (en) | Meta-task small sample classification method based on data quality and reinforcement learning | |
CN112508684A (en) | Joint convolutional neural network-based collection risk rating method and system | |
Naik | Predicting credit risk for unsecured lending: A machine learning approach | |
Fikriya et al. | Support Vector Machine Predictive Analysis Implementation: Case Study of Tax Revenue in Government of South Lampung | |
Dzelihodzic et al. | Data Mining Techniques for Credit Risk Assessment Task | |
Lee et al. | Application of machine learning in credit risk scorecard | |
CN117094817B (en) | Credit risk control intelligent prediction method and system | |
Oh et al. | Developing time-based clustering neural networks to use change-point detection: Application to financial time series | |
Sarlija et al. | A Neural Network Classification of Credit Applicants in Consumer Credit Scoring. | |
CN113282886B (en) | Bank loan default judgment method based on logistic regression | |
CN113158230B (en) | Online classification method based on differential privacy | |
CN116843432B (en) | Anti-fraud method and device based on address text information | |
CN113554228B (en) | Training method of repayment rate prediction model and repayment rate prediction method | |
Li et al. | CUS-RF-Based Credit Card Fraud Detection with Imbalanced Data | |
CN118071482A (en) | Method for constructing retail credit risk prediction model and consumer credit business Scorebetad model | |
Yan et al. | Beyond classification and ranking: constrained optimization of the ROI | |
Davalos et al. | Deriving rules for forecasting air carrier financial stress and insolvency: A genetic algorithm approach |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||