CN117688455B - Meta-task small sample classification method based on data quality and reinforcement learning - Google Patents

Meta-task small sample classification method based on data quality and reinforcement learning

Info

Publication number
CN117688455B
CN117688455B
Authority
CN
China
Prior art keywords
task
meta
representing
learning
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410158075.6A
Other languages
Chinese (zh)
Other versions
CN117688455A (en)
Inventor
陈晓红
霍杨杰
徐雪松
张震
王煜坤
许冠英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiangjiang Laboratory
Original Assignee
Xiangjiang Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiangjiang Laboratory filed Critical Xiangjiang Laboratory
Priority to CN202410158075.6A priority Critical patent/CN117688455B/en
Publication of CN117688455A publication Critical patent/CN117688455A/en
Application granted granted Critical
Publication of CN117688455B publication Critical patent/CN117688455B/en


Abstract

The application relates to a meta-task small sample classification method based on data quality and reinforcement learning, comprising the following steps: sample the standardized training set and the standardized test set respectively to obtain a sampling training set and a sampling test set; combine the sampling training set and the sampling test set into a meta-learning task; calculate a weight for each task and normalize it to obtain a normalized weight per task; determine the category of each task from its normalized weight; perform meta-policy optimization on the tasks of category 0 to obtain a first meta-policy and a first parameter; perform meta reinforcement learning on the tasks of category 1 to obtain a second meta-policy and a second parameter; obtain a mixed policy from the first and second meta-policies and a mixed parameter from the first and second parameters; and construct an objective function from the mixed policy and mixed parameters and maximize it until convergence to obtain a classification model.

Description

Meta-task small sample classification method based on data quality and reinforcement learning
Technical Field
The application relates to the technical field of meta-task small sample classification, in particular to a meta-task small sample classification method based on data quality and reinforcement learning.
Background
Small sample classification refers to training a model that can accurately classify new categories of data when only a small amount of labeled data is available. In the financial field, small sample classification is an important problem: financial data is often high-dimensional, sparse, nonlinear, and dynamically changing, which makes it difficult to acquire enough labeled data to train an effective classification model. For example, in credit card fraud detection, the proportion of fraudulent transactions is very low and fraud techniques are continually updated, so a method is needed that can quickly identify new fraud patterns from small amounts of labeled data.
Meta-learning has great potential here: for example, it can learn a generic classifier from existing normal and fraudulent transaction data and then quickly adapt and predict on newly emerging transaction data. However, meta-learning is also affected by data quality. If the data sampled into a meta-task is class-imbalanced or noisy, the meta-learning model struggles to extract useful information from that meta-task, which degrades its generalization on the target task. A method is therefore needed that evaluates the quality of the data sampled into each meta-task and optimizes the meta-task selection strategy, so that the model adapts better to the target task.
Disclosure of Invention
Based on the above, and aiming at the problems of traditional small sample classification methods such as over-fitting, poor generalization, and sensitivity to noisy data, it is necessary to provide a small sample classification method that can dynamically select and adjust the data distribution and the number of meta-tasks according to the characteristics of the target task, so as to improve the classification performance of the meta-learning model on the target task; in particular, a meta-task small sample classification method based on data quality and reinforcement learning.
The invention provides a meta-task small sample classification method based on data quality and reinforcement learning, used for classifying financial technology (fintech) product data, comprising the following steps:
S1: acquire the feature vectors and category labels of fintech product data and construct an original data set; the feature vector comprises the name, description, developer, technical foundation, and type of a fintech product; the category labels comprise financial management, payment, lending, stock, accounting, insurance, information, and funds;
S2: divide the original data set into a training set and a test set according to a proportion, and sample the standardized training set and standardized test set respectively to obtain a sampling training set and a sampling test set; combine the sampling training set and the sampling test set into a meta-learning task;
S3: calculate the weight of each task in the meta-learning task using a data quality evaluation function, and normalize each calculated weight to obtain a normalized weight for each task;
Each task is a binary classification task; the tasks include classifying fintech product data as financial management or payment, as lending or stock, as accounting or insurance, and as information or funds;
The data in each task is the fintech product data corresponding to the two labels of that binary classification task;
S4: determine the category of each task based on its normalized weight; the category of a task is either 0 or 1;
Perform meta-policy optimization on the tasks of category 0 to obtain a first meta-policy and a first parameter;
Perform meta reinforcement learning on the tasks of category 1 to obtain a second meta-policy and a second parameter;
S5: obtain a mixed policy based on the first meta-policy and the second meta-policy, and a mixed parameter based on the first parameter and the second parameter;
S6: construct an objective function based on the mixed policy and the mixed parameters, and maximize the objective function until convergence to obtain a classification model;
S7: input the fintech product data to be classified into the classification model to obtain a classification result.
Beneficial effects: the method uses a data quality index to evaluate the suitable adaptation algorithm for each meta-task, and then uses a reinforcement learning algorithm to dynamically assign different meta-tasks to different policies, thereby improving small sample classification performance.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the application; other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a flow chart of a method for classifying small samples of meta-tasks based on data quality and reinforcement learning according to an embodiment of the application.
Detailed Description
In order that the above objects, features, and advantages of the application may be readily understood, a more particular description of the application is given below with reference to the appended drawings. In the following description, numerous specific details are set forth to provide a thorough understanding of the application. The application may, however, be embodied in many forms other than those described herein, and those skilled in the art can make similar modifications without departing from the spirit of the application; the application is therefore not limited to the specific embodiments disclosed below.
As shown in FIG. 1, the present embodiment provides a meta-task small sample classification method based on data quality and reinforcement learning, used for classifying fintech product data, comprising:
S1: acquire the feature vectors and category labels of fintech product data and construct an original data set; the feature vector comprises the name, description, developer, technical foundation, and type of a fintech product; the category labels include financial management, payment, lending, stock, accounting, insurance, information, and funds.
S2: divide the original data set into a training set and a test set according to a proportion, and sample the standardized training set and standardized test set respectively to obtain a sampling training set and a sampling test set; combine the sampling training set and the sampling test set into a meta-learning task.
Specifically, the original dataset is divided into a training set and a test set according to a set proportion p; the training set is denoted $D_{\text{train}}$ and the test set $D_{\text{test}}$, and they satisfy:

$$D_{\text{train}} \cup D_{\text{test}} = D,\qquad D_{\text{train}} \cap D_{\text{test}} = \emptyset,\qquad D = \{(x_i, y_i)\}_{i=1}^{N}$$

where $D$ represents the original dataset; $x_i$ represents the feature vector of the ith fintech product data item; $y_i$ represents its category label; and $N$ represents the total number of fintech product data items.
Further, standardizing the training set and the test set comprises standardizing the feature vectors of the fintech product data in both sets:

$$\hat{x}_i = \frac{x_i - \mu}{\sigma},\qquad \mu = \frac{1}{N}\sum_{i=1}^{N} x_i,\qquad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}$$

where $\hat{x}_i$ represents the standardized feature vector of the ith fintech product data item; $x_i$ its original feature vector; $\mu$ the mean of the feature vectors; $\sigma$ their standard deviation; and $N$ the total number of fintech product data items.
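For illustration, this standardization step can be sketched in Python; the helper below and its variable names are illustrative rather than taken from the patent, and it assumes the feature vectors are stacked row-wise in a matrix:

```python
import numpy as np

def standardize(features: np.ndarray):
    """Z-score standardization of feature vectors (rows = samples).

    Returns the standardized features together with the mean and standard
    deviation, so the same transform can be reused on the test set.
    """
    mu = features.mean(axis=0)            # per-feature mean
    sigma = features.std(axis=0) + 1e-8   # per-feature std (epsilon avoids /0)
    return (features - mu) / sigma, mu, sigma

# Usage: fit on the training set, reuse mu/sigma on the test set.
X_train = np.random.rand(100, 16)
X_test = np.random.rand(20, 16)
X_train_std, mu, sigma = standardize(X_train)
X_test_std = (X_test - mu) / sigma
```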
Still further, the sampling training set is expressed as $D_{\text{train}}^{s} = \{(\hat{x}_i, y_i)\}_{i=1}^{n}$, the sampling test set as $D_{\text{test}}^{s} = \{(\hat{x}_i, y_i)\}_{i=1}^{n}$, and the meta-learning task as

$$\mathcal{T} = D_{\text{train}}^{s} \cup D_{\text{test}}^{s}$$

where $\hat{x}_i$ represents the standardized feature vector of the ith fintech product data item; $y_i$ its category label; and $n$ the number of sampled fintech product data items.
The method further comprises dividing the tasks in the training set into a support set and a query set. The support set is expressed as $S_u = \{(x_v, y_v)\}_{v=1}^{N_s}$ and the query set as $Q_u = \{(x_l, y_l)\}_{l=1}^{N_q}$, and they satisfy:

$$S_u \cup Q_u = T_u,\qquad S_u \cap Q_u = \emptyset$$

where $x_v$ and $y_v$ represent the feature vector and category label of the vth fintech product data item, and $N_s$ the total number of fintech product data items in the support set; $x_l$ and $y_l$ represent the feature vector and category label of the lth fintech product data item, and $N_q$ the total number of fintech product data items in the query set; and $T_u$ represents the uth task in the training set.
S3: calculate the weight of each task in the meta-learning task using a data quality evaluation function, and normalize each calculated weight to obtain a normalized weight for each task.
Specifically, a data quality evaluation function is used to calculate the weight of each task in the meta-learning task:

$$w_k = f(D_k) = f\!\left(q_k^{(1)}, \ldots, q_k^{(E)}, \sigma_k\right)$$

where $w_k$ represents the weight of the kth task in the meta-learning task; $f(\cdot)$ represents the data quality evaluation function; $D_k$ represents the set of all data of the kth task; $q_k^{(1)}$ represents the quality of the 1st data item of the kth task and $q_k^{(e)}$ the quality of the eth data item; $E$ represents the total number of data items of the kth task; and $\sigma_k$ represents the standard deviation of all data of the kth task, calculated as:

$$\sigma_k = \sqrt{\frac{1}{E}\sum_{e=1}^{E}\left(d_k^{(e)} - \bar{d}_k\right)^2}$$

where $d_k^{(e)}$ represents the eth data item of the kth task in the meta-learning task and $\bar{d}_k$ the mean of that task's data.
Each calculated weight is then normalized to obtain the normalized weight of each task:

$$\hat{w}_k = \frac{w_k}{\sum_{j=1}^{K} w_j}$$

where $\hat{w}_k$ represents the normalized weight of the kth task in the meta-learning task; $K$ represents the total number of tasks in the meta-learning task; and $w_j$ represents the weight of the jth task.
In this embodiment, the fintech product data is the related information of a fintech product.
Each task is a binary classification task; the tasks include classifying fintech product data as financial management or payment, as lending or stock, as accounting or insurance, and as information or funds. In essence, each task splits a multi-label classification problem into per-task binary classification problems, with each task focused on identifying and distinguishing two specific categories of fintech product data.
The data in each task is the fintech product data corresponding to the two labels of that binary classification task.
S4: determine the category of each task based on its normalized weight; the category of a task is either 0 or 1.
Specifically, the category of each task is determined from its normalized weight as:

$$c_k = \begin{cases} 1, & \hat{w}_k \geq \tau \\ 0, & \hat{w}_k < \tau \end{cases}$$

where $c_k$ represents the category of the kth task in the meta-learning task, and $\tau$ is an adjustable threshold parameter used as the decision condition for task classification; in this embodiment its value is 0.5, and different users of the model can adjust it according to the actual situation.
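The weighting, normalization, and thresholding of S3-S4 can be sketched as follows. The aggregation inside `quality_weight` (mean per-item quality penalized by the task's standard deviation) is an assumption, since the exact form of f(·) is given only abstractly above; `item_quality` is a hypothetical per-item quality scorer:

```python
import numpy as np

def quality_weight(task_data: np.ndarray, item_quality) -> float:
    """Assumed data-quality function f: mean per-item quality, penalized
    by the spread (standard deviation) of the task's data."""
    q = np.array([item_quality(d) for d in task_data])
    sigma_k = task_data.std()          # standard deviation of the task's data
    return q.mean() / (1.0 + sigma_k)

def assign_categories(tasks, item_quality, tau=0.5):
    """Normalize weights across tasks and threshold into categories 0/1."""
    w = np.array([quality_weight(t, item_quality) for t in tasks])
    w_norm = w / w.sum()               # normalized weight per task
    return (w_norm >= tau).astype(int) # category c_k in {0, 1}
```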
Meta-policy optimization is then performed on the tasks of category 0 to obtain a first meta-policy and a first parameter.
Further, the method comprises the following steps:
Step 1: initialize a meta-policy $\pi_\theta$, where $\theta$ are the meta-policy parameters;
Step 2: for each task $b$ of category 0 in the meta-learning task, sample an initial policy $\pi_{\theta_b}$ from the meta-policy, where $\theta_b = \theta + \epsilon$; $\theta_b$ represents the initial policy parameters and $\epsilon$ is a random perturbation;
Step 3: execute the initial policy $\pi_{\theta_b}$ on task $b$ and collect first trajectory data $\tau_b$, where $\tau_b$ is a record of the series of actions taken and rewards received on task $b$ under the initial policy $\pi_{\theta_b}$; compute a first cumulative reward from the first trajectory data:

$$R_b = \sum_{t=0}^{T} r_t$$

where $R_b$ represents the first cumulative reward, $t$ the time step, and $r_t$ the reward received by the initial policy at time step $t$;
Step 4: update the initial policy parameters using the first trajectory data and the first cumulative reward to obtain a new policy $\pi_{\theta_b'}$, where $\theta_b' = \theta_b + \alpha \nabla_{\theta_b} R_b$; $\theta_b'$ represents the new policy parameters and $\alpha$ is the first learning rate;
Step 5: execute the new policy $\pi_{\theta_b'}$ on task $b$ and collect second trajectory data $\tau_b'$; compute a second cumulative reward from the second trajectory data:

$$R_b' = \sum_{t=0}^{T} r_t'$$

where $R_b'$ represents the second cumulative reward;
Step 6: update the meta-policy parameters using the second cumulative rewards of all category-0 tasks to obtain a new meta-policy $\pi_{\theta'}$, where $\theta' = \theta + \beta \sum_b \nabla_\theta R_b'$; $\theta'$ represents the new meta-policy parameters and $\beta$ is the second learning rate;
Step 7: repeat steps 2-6 until convergence to obtain the first meta-policy and the first parameter.
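A schematic of this meta-policy optimization loop (steps 1-7), with gradients of the cumulative reward approximated by finite differences for simplicity; `rollout_reward`, which stands in for executing a policy on a task and summing its rewards, is a hypothetical callable:

```python
import numpy as np

def meta_policy_optimization(tasks, rollout_reward, dim,
                             alpha=0.01, beta=0.001,
                             eps_scale=0.1, iters=100):
    """Sketch of steps 1-7: perturb the meta-policy per task, adapt it with
    one gradient step on the first cumulative reward, then update the
    meta-parameters from the second cumulative rewards.

    rollout_reward(theta, task) -> cumulative reward R = sum_t r_t.
    The meta-gradient is a first-order approximation (it ignores the
    inner-update Jacobian, as in first-order MAML)."""
    def grad(theta, task, h=1e-3):
        g = np.zeros_like(theta)
        for i in range(len(theta)):      # finite-difference gradient estimate
            e = np.zeros_like(theta)
            e[i] = h
            g[i] = (rollout_reward(theta + e, task)
                    - rollout_reward(theta - e, task)) / (2 * h)
        return g

    theta = np.zeros(dim)                           # step 1: meta-policy params
    for _ in range(iters):
        meta_grad = np.zeros(dim)
        for task in tasks:                          # category-0 tasks
            theta_b = theta + eps_scale * np.random.randn(dim)   # step 2
            g1 = grad(theta_b, task)                # steps 3-4: first reward
            theta_new = theta_b + alpha * g1        # adapted policy
            meta_grad += grad(theta_new, task)      # step 5: second reward
        theta = theta + beta * meta_grad            # step 6: meta update
    return theta
```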
Meta reinforcement learning is performed on the tasks of category 1 to obtain a second meta-policy and its corresponding second parameter.
Further, the method comprises the following steps:
Step 1: initialize the parameter $\delta$ of the meta reinforcement learning model, the meta learning rate $\zeta$, and the task learning rate $\eta$;
Step 2: for each meta-training iteration, sample a batch of tasks $G = \{G_1, \ldots, G_M\}$ from the category-1 tasks of the meta-learning task; the hth task $G_h$ comprises a class-imbalanced training set and a second test set;
Step 3: for the hth task $G_h$, perform the following operations:
Step 3.1: perform a one-step policy-gradient update on the class-imbalanced training set using the current parameter $\delta$ to obtain a new parameter $\delta_h'$:

$$\delta_h' = \delta + \eta \nabla_\delta J_{\text{imb}}(\delta)$$

where $J_{\text{imb}}(\delta)$ is the policy objective function for data tasks with class imbalance:

$$J_{\text{imb}}(\delta) = \mathbb{E}_{(s,y)\sim D_{\text{imb}}}\left[\sum_{c=1}^{C} z_c\, \mathbb{I}[y=c]\, \log \pi_\delta(a \mid s)\, A_\delta(s,a)\right] + \lambda H\!\left(\pi_\delta(\cdot \mid s)\right)$$

with

$$z_c = \frac{1}{P(c)},\qquad A_\delta(s,a) = r + \gamma V_\delta(s') - V_\delta(s),\qquad r = o \cdot \mathbb{I}[a=y]$$

where $D_{\text{imb}}$ represents the class-imbalanced training set and $(s, y)$ a data item in it, with $y$ the true category corresponding to state $s$; $\pi_\delta$ is the policy function based on the parameter $\delta$; $A_\delta$ is the advantage function based on $\delta$; $\lambda$ is a regularization coefficient; $H(\pi_\delta)$ is the entropy of the policy; $C$ represents the number of categories; $\mathbb{I}[\cdot]$ is the indicator function; $z_c$ is the weight of category $c$ and $P(c)$ is the frequency of occurrence of category $c$ in the class-imbalanced training set; $a$ is the action; $o$ represents a constant factor in the reward function $r$ for adjusting the magnitude of the reward value; $\gamma$ represents a discount factor measuring the influence of future reward values on the current state-action pair; and $V_\delta$ represents the value function based on $\delta$, estimating the expected return that policy $\pi_\delta$ can obtain in state $s$ or $s'$.
The policy is obtained from the action-value function $Q_\delta(s,a)$ based on $\delta$ through a softmax with temperature parameter $\bar{T}$:

$$\pi_\delta(a \mid s) = \frac{\exp\!\left(Q_\delta(s,a)/\bar{T}\right)}{\sum_{a'} \exp\!\left(Q_\delta(s,a')/\bar{T}\right)}$$

where $\bar{T}$ controls the randomness of the policy: the higher the temperature, the more the policy tends to select an action $a$ or $a'$ at random; the lower the temperature, the more greedily the policy selects actions.
Step 3.2: using new parametersEvaluating performance of the policy on the second test set to obtain a return value; Wherein/>For the second test set, J (·) is the policy objective function;
step 4: based on the report value updating parameter delta of all tasks belonging to the category 1 in the meta-learning task, the updating formula is as follows:
Wherein, Representing the updated parameters; m represents the number of tasks sampled from the tasks belonging to category 1 in the meta-learning task;
step 5: and (3) repeating the steps 2-4 until convergence, and obtaining a second element strategy and a second parameter corresponding to the second element strategy.
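A sketch of the class-imbalance handling inside the inner-loop objective, assuming the inverse-frequency weights $z_c = 1/P(c)$, the temperature softmax policy, and the entropy bonus described above, and omitting the advantage term for brevity; the exact loss shape is an assumption consistent with the listed symbols:

```python
import numpy as np

def softmax(logits, temp=1.0):
    """Temperature softmax: higher temp -> more random policy."""
    z = logits / temp
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

def imbalanced_policy_objective(logits, labels, num_classes,
                                lam=0.01, temp=1.0):
    """Assumed form of J_imb: inverse-frequency-weighted log-likelihood of
    the softmax policy plus an entropy regularizer lam * H(pi)."""
    pi = softmax(logits, temp)                        # pi(a|s)
    freq = np.bincount(labels, minlength=num_classes) / len(labels)
    z_c = 1.0 / np.maximum(freq, 1e-8)                # z_c = 1 / P(c)
    log_like = z_c[labels] * np.log(
        pi[np.arange(len(labels)), labels] + 1e-12)   # weighted true-class term
    entropy = -(pi * np.log(pi + 1e-12)).sum(axis=1)  # H(pi(.|s))
    return log_like.mean() + lam * entropy.mean()     # maximize this
```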
S5: obtain a mixed policy based on the first meta-policy and the second meta-policy, and a mixed parameter based on the first parameter and the second parameter.
Specifically, the mixed policy is obtained from the first and second meta-policies as:

$$\pi_{\text{mix}}(a \mid s) = \omega\, \pi_1(a \mid s) + (1 - \omega)\, \pi_2(a \mid s)$$

where $\pi_{\text{mix}}$ represents the mixed policy; $a$ is the action and $s$ the state; $\omega$ is the mixing coefficient; $\pi_1$ represents the first meta-policy; and $\pi_2$ represents the second meta-policy.
The mixed parameter is obtained from the first and second parameters as:

$$\theta_{\text{mix}} = [\theta_1, \theta_2]$$

where $\theta_{\text{mix}}$ represents the mixed parameter; $\theta_1$ represents the first parameter; $\theta_2$ represents the second parameter; and $[\cdot,\cdot]$ represents the concatenation operation.
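The mixing step itself is straightforward; a minimal sketch, assuming `pi1` and `pi2` map a state to a probability vector over actions:

```python
import numpy as np

def mixed_policy(pi1, pi2, omega):
    """pi_mix(a|s) = omega * pi1(a|s) + (1 - omega) * pi2(a|s)."""
    return lambda s: omega * pi1(s) + (1.0 - omega) * pi2(s)

def mixed_parameters(theta1: np.ndarray, theta2: np.ndarray) -> np.ndarray:
    """theta_mix = [theta1, theta2] (concatenation)."""
    return np.concatenate([theta1, theta2])
```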
S6: construct an objective function based on the mixed policy and the mixed parameters, and maximize the objective function until convergence to obtain a classification model.
Specifically, the method comprises the following steps:
Step 1: use the mixed policy and mixed parameters to evaluate the return function of each task in the meta-learning task, and take the average as the objective function:

$$J(\omega, \theta_{\text{mix}}) = \frac{1}{K}\sum_{j=1}^{K} R_j\!\left(\pi_j, \theta_j\right)$$

where $J(\omega, \theta_{\text{mix}})$ represents the objective function; $\pi_j$ represents the policy corresponding to the jth task in the meta-learning task and $\theta_j$ its policy parameters; $K$ represents the total number of tasks in the meta-learning task; and $R_j$ represents the return function of the jth task:

$$R_j = y_j \log p_j + (1 - y_j)\log(1 - p_j)$$

where $y_j$ represents the true label of the jth task, with value 0 or 1, and $p_j$ represents the prediction probability of the jth task, with value range $[0, 1]$;
Step 2: maximize the objective function $J(\omega, \theta_{\text{mix}})$ by gradient ascent and update the mixing coefficient and mixed parameters:

$$\omega \leftarrow \omega + \rho_1 \frac{\partial J}{\partial \omega},\qquad \theta_{\text{mix}} \leftarrow \theta_{\text{mix}} + \rho_2 \nabla_{\theta_{\text{mix}}} J$$

where $\omega$ represents the mixing coefficient, $\rho_1$ the third learning rate, and $\rho_2$ the fourth learning rate;
Step 3: repeat steps 1 and 2 until convergence, and take the model at convergence as the classification model.
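A sketch of the S6 optimization loop, assuming the averaged return J and its partial derivatives are supplied as callables (`returns`, `d_omega`, and `d_theta` are hypothetical, e.g. analytic gradients or autodiff); clipping ω to [0, 1] is an added safeguard, not from the patent:

```python
import numpy as np

def train_mixture(returns, d_omega, d_theta, omega, theta_mix,
                  rho1=0.01, rho2=0.01, iters=1000, tol=1e-6):
    """Gradient ascent on J(omega, theta_mix) until convergence.

    returns(omega, theta_mix) -> J, the mean return over the K tasks;
    d_omega / d_theta return its partial derivatives."""
    prev = -np.inf
    for _ in range(iters):
        J = returns(omega, theta_mix)
        if abs(J - prev) < tol:                           # convergence check
            break
        prev = J
        omega = omega + rho1 * d_omega(omega, theta_mix)  # third learning rate
        theta_mix = theta_mix + rho2 * d_theta(omega, theta_mix)  # fourth
        omega = float(np.clip(omega, 0.0, 1.0))           # keep coeff valid
    return omega, theta_mix
```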
S7: input the fintech product data to be classified into the classification model to obtain a classification result.
In this embodiment, the classification result is the category label of the fintech product data to be classified, as predicted by the classification model.
The method provided by this embodiment uses a data quality index to evaluate the suitable adaptation algorithm for each meta-task, and then uses a reinforcement learning algorithm to dynamically assign different meta-tasks to different policies, thereby improving small sample classification performance. The method can evaluate the quality of the data sampled into each meta-task and optimize the meta-task selection strategy, so that the model adapts better to the target task.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination of them that contains no contradiction should be considered within the scope of this description.
The above examples illustrate only a few embodiments of the application; their description is specific and detailed, but they are not to be construed as limiting the scope of the claims. It should be noted that those skilled in the art can make several variations and improvements without departing from the concept of the application, all of which fall within the scope of protection of the application. Accordingly, the scope of protection of the application is determined by the appended claims.

Claims (4)

1. A meta-task small sample classification method based on data quality and reinforcement learning, used for classifying financial technology (fintech) product data, characterized by comprising the following steps:
S1: acquire the feature vectors and category labels of fintech product data and construct an original data set; the feature vector comprises the name, description, developer, technical foundation, and type of a fintech product; the category labels comprise financial management, payment, mobile banking, lending, stock, accounting, insurance, information, and funds;
S2: divide the original data set into a training set and a test set according to a proportion, and sample the standardized training set and standardized test set respectively to obtain a sampling training set and a sampling test set; combine the sampling training set and the sampling test set into a meta-learning task;
S3: calculate the weight of each task in the meta-learning task using a data quality evaluation function, and normalize each calculated weight to obtain a normalized weight for each task:

$$\hat{w}_k = \frac{w_k}{\sum_{j=1}^{K} w_j}$$

where $\hat{w}_k$ represents the normalized weight of the kth task in the meta-learning task; $K$ represents the total number of tasks in the meta-learning task; and $w_j$ represents the weight of the jth task;
Each task is a binary classification task; the tasks include classifying fintech product data as financial management or payment, as lending or stock, as accounting or insurance, and as information or funds;
The data in each task is the fintech product data corresponding to the two labels of that binary classification task;
The data quality evaluation function is calculated as:

$$w_k = f(D_k) = f\!\left(q_k^{(1)}, \ldots, q_k^{(E)}, \sigma_k\right)$$

where $w_k$ represents the weight of the kth task in the meta-learning task; $f(\cdot)$ represents the data quality evaluation function; $D_k$ represents the set of all data of the kth task; $q_k^{(1)}$ represents the quality of the 1st data item of the kth task and $q_k^{(e)}$ the quality of the eth data item; $E$ represents the total number of data items of the kth task; and $\sigma_k$ represents the standard deviation of all data of the kth task, calculated as:

$$\sigma_k = \sqrt{\frac{1}{E}\sum_{e=1}^{E}\left(d_k^{(e)} - \bar{d}_k\right)^2}$$

where $d_k^{(e)}$ represents the eth data item of the kth task in the meta-learning task and $\bar{d}_k$ the mean of that task's data;
S4: determine the category of each task based on its normalized weight; the category of a task is either 0 or 1, and the expression is:

$$c_k = \begin{cases} 1, & \hat{w}_k \geq \tau \\ 0, & \hat{w}_k < \tau \end{cases}$$

where $c_k$ represents the category of the kth task in the meta-learning task and $\tau$ is an adjustable threshold parameter;
Perform meta-policy optimization on the tasks of category 0 to obtain a first meta-policy and a first parameter;
Performing meta-policy optimization on the tasks of category 0 to obtain the first meta-policy and the first parameter comprises:
Step 1: initialize a meta-policy $\pi_\theta$, where $\theta$ are the meta-policy parameters;
Step 2: for each task $b$ of category 0 in the meta-learning task, sample an initial policy $\pi_{\theta_b}$ from the meta-policy, where $\theta_b = \theta + \epsilon$; $\theta_b$ represents the initial policy parameters and $\epsilon$ is a random perturbation;
Step 3: execute the initial policy $\pi_{\theta_b}$ on task $b$ and collect first trajectory data $\tau_b$, where $\tau_b$ is a record of the series of actions taken and rewards received on task $b$ under the initial policy $\pi_{\theta_b}$; compute a first cumulative reward from the first trajectory data:

$$R_b = \sum_{t=0}^{T} r_t$$

where $R_b$ represents the first cumulative reward, $t$ the time step, and $r_t$ the reward received by the initial policy at time step $t$;
Step 4: update the initial policy parameters using the first trajectory data and the first cumulative reward to obtain a new policy $\pi_{\theta_b'}$, where $\theta_b' = \theta_b + \alpha \nabla_{\theta_b} R_b$; $\theta_b'$ represents the new policy parameters and $\alpha$ is the first learning rate;
Step 5: execute the new policy $\pi_{\theta_b'}$ on task $b$ and collect second trajectory data $\tau_b'$; compute a second cumulative reward from the second trajectory data:

$$R_b' = \sum_{t=0}^{T} r_t'$$

where $R_b'$ represents the second cumulative reward;
Step 6: update the meta-policy parameters using the second cumulative rewards of all category-0 tasks to obtain a new meta-policy $\pi_{\theta'}$, where $\theta' = \theta + \beta \sum_b \nabla_\theta R_b'$; $\theta'$ represents the new meta-policy parameters and $\beta$ is the second learning rate;
Step 7: repeat steps 2-6 until convergence to obtain the first meta-policy and the first parameter;
Perform meta reinforcement learning on the tasks of category 1 to obtain a second meta-policy and a second parameter;
Performing meta reinforcement learning on the tasks of category 1 to obtain the second meta-policy and the second parameter comprises:
Step 1: initialize the parameter $\delta$ of the meta reinforcement learning model, the meta learning rate $\zeta$, and the task learning rate $\eta$;
Step 2: for each meta-training iteration, sample a batch of tasks $G = \{G_1, \ldots, G_M\}$ from the category-1 tasks of the meta-learning task; the hth task $G_h$ comprises a class-imbalanced training set and a second test set;
Step 3: for the hth task $G_h$, perform the following operations:
Step 3.1: perform a one-step policy-gradient update on the class-imbalanced training set using the current parameter $\delta$ to obtain a new parameter $\delta_h'$:

$$\delta_h' = \delta + \eta \nabla_\delta J_{\text{imb}}(\delta)$$

where $J_{\text{imb}}(\delta)$ is the policy objective function for data tasks with class imbalance:

$$J_{\text{imb}}(\delta) = \mathbb{E}_{(s,y)\sim D_{\text{imb}}}\left[\sum_{c=1}^{C} z_c\, \mathbb{I}[y=c]\, \log \pi_\delta(a \mid s)\, A_\delta(s,a)\right] + \lambda H\!\left(\pi_\delta(\cdot \mid s)\right),\qquad z_c = \frac{1}{P(c)}$$

where $D_{\text{imb}}$ represents the class-imbalanced training set and $(s, y)$ a data item in it, with $y$ the true category corresponding to state $s$; $\pi_\delta$ is the policy function based on the parameter $\delta$; $A_\delta$ is the advantage function based on $\delta$; $\lambda$ is a regularization coefficient; $H(\pi_\delta)$ is the entropy of the policy; $C$ represents the number of categories; $\mathbb{I}[\cdot]$ is the indicator function; $z_c$ is the weight of category $c$ and $P(c)$ is the frequency of occurrence of category $c$ in the class-imbalanced training set; $a$ is the action; $o$ represents a constant factor in the reward function $r$; $\gamma$ represents a discount factor; $V_\delta$ represents the value function based on $\delta$, estimating the expected return that policy $\pi_\delta$ can obtain in state $s$ or $s'$; $Q_\delta(s,a)$ represents the action-value function based on $\delta$; and $\bar{T}$ represents a temperature parameter;
Step 3.2: use the new parameter $\delta_h'$ to evaluate the performance of the policy on the second test set to obtain a return value $R_h = J(\delta_h'; G_h^{\text{test}})$, where $G_h^{\text{test}}$ is the second test set and $J(\cdot)$ is the policy objective function;
Step 4: update the parameter $\delta$ based on the return values of all category-1 tasks in the meta-learning task:

$$\delta \leftarrow \delta + \zeta \cdot \frac{1}{M}\sum_{h=1}^{M} \nabla_\delta J\!\left(\delta_h'; G_h^{\text{test}}\right)$$

where the left-hand side represents the updated parameter and $M$ represents the number of tasks sampled from the category-1 tasks of the meta-learning task;
Step 5: repeat steps 2-4 until convergence to obtain the second meta-policy and the second parameter;
S5: obtain a mixed policy based on the first meta-policy and the second meta-policy:

$$\pi_{\text{mix}}(a \mid s) = \omega\, \pi_1(a \mid s) + (1 - \omega)\, \pi_2(a \mid s)$$

where $\pi_{\text{mix}}$ represents the mixed policy; $a$ is the action and $s$ the state; $\omega$ is the mixing coefficient; $\pi_1$ represents the first meta-policy; and $\pi_2$ represents the second meta-policy;
Obtain a mixed parameter based on the first parameter and the second parameter:

$$\theta_{\text{mix}} = [\theta_1, \theta_2]$$

where $\theta_{\text{mix}}$ represents the mixed parameter; $\theta_1$ represents the first parameter; $\theta_2$ represents the second parameter; and $[\cdot,\cdot]$ represents the concatenation operation;
S6: construct an objective function based on the mixed policy and the mixed parameters, and maximize the objective function until convergence to obtain a classification model;
Constructing the objective function based on the mixed policy and the mixed parameters and maximizing it until convergence to obtain the classification model comprises:
Step 1: use the mixed policy and mixed parameters to evaluate the return function of each task in the meta-learning task, and take the average as the objective function:

$$J(\omega, \theta_{\text{mix}}) = \frac{1}{K}\sum_{j=1}^{K} R_j\!\left(\pi_j, \theta_j\right)$$

where $J(\omega, \theta_{\text{mix}})$ represents the objective function; $\pi_j$ represents the policy corresponding to the jth task in the meta-learning task and $\theta_j$ its policy parameters; $K$ represents the total number of tasks in the meta-learning task; and $R_j$ represents the return function of the jth task:

$$R_j = y_j \log p_j + (1 - y_j)\log(1 - p_j)$$

where $y_j$ represents the true label of the jth task, with value 0 or 1, and $p_j$ represents the prediction probability of the jth task, with value range $[0, 1]$;
Step 2: maximize the objective function $J(\omega, \theta_{\text{mix}})$ by gradient ascent and update the mixing coefficient and mixed parameters:

$$\omega \leftarrow \omega + \rho_1 \frac{\partial J}{\partial \omega},\qquad \theta_{\text{mix}} \leftarrow \theta_{\text{mix}} + \rho_2 \nabla_{\theta_{\text{mix}}} J$$

where $\omega$ represents the mixing coefficient, $\rho_1$ the third learning rate, and $\rho_2$ the fourth learning rate;
Step 3: repeat steps 1 and 2 until convergence, and take the model at convergence as the classification model;
S7: input the fintech product data to be classified into the classification model to obtain a classification result.
2. The meta-task small sample classification method based on data quality and reinforcement learning according to claim 1, wherein the original data set is divided into a training set and a test set according to a set proportion p; the training set is denoted $D_{\text{train}}$ and the test set $D_{\text{test}}$, and they satisfy:

$$D_{\text{train}} \cup D_{\text{test}} = D,\qquad D_{\text{train}} \cap D_{\text{test}} = \emptyset,\qquad D = \{(x_i, y_i)\}_{i=1}^{N}$$

where $D$ represents the original data set; $x_i$ represents the feature vector of the ith fintech product data item; $y_i$ represents its category label; and $N$ represents the total number of fintech product data items.
3. The meta-task small sample classification method based on data quality and reinforcement learning according to claim 2, wherein standardizing the training set and the test set comprises standardizing the feature vectors of the fintech product data in both sets:

$$\hat{x}_i = \frac{x_i - \mu}{\sigma},\qquad \mu = \frac{1}{N}\sum_{i=1}^{N} x_i,\qquad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}$$

where $\hat{x}_i$ represents the standardized feature vector of the ith fintech product data item; $x_i$ its original feature vector; $\mu$ the mean of the feature vectors; $\sigma$ their standard deviation; and $N$ the total number of fintech product data items.
4. The meta-task small sample classification method based on data quality and reinforcement learning according to claim 3, wherein the sampling training set is expressed as $D_{\text{train}}^{s} = \{(\hat{x}_i, y_i)\}_{i=1}^{n}$, the sampling test set as $D_{\text{test}}^{s} = \{(\hat{x}_i, y_i)\}_{i=1}^{n}$, and the meta-learning task as $\mathcal{T} = D_{\text{train}}^{s} \cup D_{\text{test}}^{s}$, where $\hat{x}_i$ represents the standardized feature vector of the ith fintech product data item; $y_i$ represents its category label; and $n$ represents the number of sampled fintech product data items.
CN202410158075.6A 2024-02-04 2024-02-04 Meta-task small sample classification method based on data quality and reinforcement learning Active CN117688455B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410158075.6A CN117688455B (en) 2024-02-04 2024-02-04 Meta-task small sample classification method based on data quality and reinforcement learning


Publications (2)

Publication Number Publication Date
CN117688455A (en) 2024-03-12
CN117688455B (en) 2024-05-03

Family

ID=90139478

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410158075.6A Active CN117688455B (en) 2024-02-04 2024-02-04 Meta-task small sample classification method based on data quality and reinforcement learning

Country Status (1)

Country Link
CN (1) CN117688455B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109919317A (en) * 2018-01-11 2019-06-21 华为技术有限公司 Machine learning model training method and device

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109711433A (en) * 2018-11-30 2019-05-03 东南大学 Fine-grained classification method based on meta-learning
CN110569886A (en) * 2019-08-20 2019-12-13 天津大学 Image classification method using bidirectional channel attention meta-learning
CN117136360A (en) * 2021-02-12 2023-11-28 硕动力公司 System and method for task-oriented dialog security policy improvement
CN114119319A (en) * 2021-11-24 2022-03-01 中建三局绿色产业投资有限公司 Intelligent water affair management method and system based on a distributed small sample algorithm
CN114429009A (en) * 2022-04-07 2022-05-03 中国石油大学(华东) Small sample sucker-rod pump well working condition diagnosis method based on meta-transfer learning
CN117150359A (en) * 2023-08-31 2023-12-01 华能阜新风力发电有限责任公司 Small sample fault diagnosis method, system, device and medium based on model-agnostic meta-learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A Survey of Meta-Reinforcement Learning; Jacob Beck et al.; arXiv:2301.08028v1 [cs.LG]; 2023-01-19; pp. 1-53 *
Function optimization algorithm based on game-strategy reinforcement learning; Ye Jun et al.; Computer Engineering and Applications; 2005-05-01 (No. 17); pp. 67-68, 101 *

Also Published As

Publication number Publication date
CN117688455A (en) 2024-03-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant