CN117688455B - Meta-task small sample classification method based on data quality and reinforcement learning - Google Patents
- Publication number: CN117688455B (application CN202410158075.6A)
- Authority: CN (China)
- Legal status: Active
Abstract
The application relates to a meta-task small-sample classification method based on data quality and reinforcement learning, which comprises the following steps: sampling the standardized training set and the standardized test set respectively to obtain a sampling training set and a sampling test set; combining the sampling training set and the sampling test set into a meta-learning task; normalizing each calculated weight to obtain a normalized weight corresponding to each task; determining the category of each task based on its normalized weight; performing meta-policy optimization on tasks of category 0 to obtain a first meta-policy and a first parameter; performing meta reinforcement learning on tasks of category 1 to obtain a second meta-policy and a second parameter; obtaining a mixed strategy based on the first and second meta-policies; obtaining a mixing parameter based on the first and second parameters; and constructing an objective function based on the mixed strategy and the mixing parameter, then maximizing the objective function until convergence to obtain a classification model.
Description
Technical Field
The application relates to the technical field of meta-task small sample classification, in particular to a meta-task small sample classification method based on data quality and reinforcement learning.
Background
Small-sample classification refers to training a model that can accurately classify new categories of data when only a small amount of labeled data is available. In the financial field, small-sample classification is an important problem because financial data is often high-dimensional, sparse, nonlinear, and dynamically changing, making it difficult to acquire enough labeled data to train an effective classification model. For example, in credit card fraud detection, the proportion of fraudulent transactions is very low and fraud techniques are continually updated, so a method is needed that can quickly identify new fraud patterns from small amounts of annotated data.
Meta-learning has great potential in this respect: it can learn a generic classifier from existing normal and fraudulent transaction data, and then quickly adapt and predict on newly emerging transaction data. However, meta-learning is also affected by data quality; if the data extracted for a meta-task is class-imbalanced or noisy, the meta-learning model struggles to obtain effective information from that meta-task, which harms its generalization on the target task. Therefore, a method is needed to evaluate the quality of the data extracted by each meta-task and to optimize the meta-task selection strategy, so that the model can better adapt to the target task.
Disclosure of Invention
Based on the above, and aiming at problems of traditional small-sample classification methods such as over-fitting, poor generalization capability, and sensitivity to noisy data, it is necessary to provide a small-sample classification method that can dynamically select and adjust the data distribution and number of meta-tasks according to the characteristics of the target task, so as to improve the classification performance of the meta-learning model on the target task; in particular, a meta-task small-sample classification method based on data quality and reinforcement learning.
The invention provides a meta-task small sample classification method based on data quality and reinforcement learning, which is used for classifying financial and scientific product data and comprises the following steps:
S1: acquiring feature vectors and category labels of financial and scientific product data, and constructing an original data set; the feature vector comprises names, descriptions, developers, technical foundations and types of financial and scientific products; the category labels comprise financial management, payment, lending, stock, billing, insurance, information and funds;
s2: dividing the original data set into a training set and a test set according to a proportion, and sampling the standardized training set and the standardized test set respectively to obtain a sampling training set and a sampling test set; combining the sampling training set and the sampling test set into a meta-learning task;
S3: calculating the weight of each task in the meta-learning task by adopting a data quality evaluation function, and normalizing each calculated weight to obtain a normalized weight corresponding to each task;
the tasks are binary classification tasks, comprising classification of the financial technology product data as financial management or payment type, as lending or stock type, as accounting or insurance type, and as information or fund type;
the data in each task are the financial technology product data corresponding to each of the two labels of that binary classification task;
s4: determining the category of each task based on the normalization weight corresponding to each task; the belonging categories of tasks include 0 and 1;
performing meta-policy optimization on the task with the category of 0 to obtain a first meta-policy and a first parameter;
performing meta reinforcement learning on the task with the category of 1 to obtain a second meta strategy and a second parameter;
S5: obtaining a mixed strategy based on the first meta-strategy and the second meta-strategy; obtaining a mixing parameter based on the first parameter and the second parameter;
S6: constructing an objective function based on the mixing strategy and the mixing parameters, and maximizing the objective function until convergence to obtain a classification model;
S7: and inputting the data of the financial and scientific products to be classified into the classification model to obtain a classification result.
The beneficial effects are that: the method uses data quality indicators to evaluate and select an adaptation algorithm for each meta-task, and then uses a reinforcement learning algorithm to dynamically assign different meta-tasks to different strategies, thereby improving the performance of small-sample classification.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for classifying small samples of meta-tasks based on data quality and reinforcement learning according to an embodiment of the application.
Detailed Description
In order that the above objects, features and advantages of the application will be readily understood, a more particular description of the application will be rendered by reference to the appended drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. The present application may be embodied in many other forms than described herein and similarly modified by those skilled in the art without departing from the spirit of the application, whereby the application is not limited to the specific embodiments disclosed below.
As shown in fig. 1, the present embodiment provides a meta-task small sample classification method based on data quality and reinforcement learning, which is used for classifying financial and scientific product data, and includes:
S1: acquiring feature vectors and category labels of financial and scientific product data, and constructing an original data set; the feature vector comprises names, descriptions, developers, technical foundations and types of financial and scientific products; the category labels include financial, payment, lending, stock, billing, insurance, information, funds.
S2: dividing the original data set into a training set and a test set according to a proportion, and sampling the standardized training set and the standardized test set respectively to obtain a sampling training set and a sampling test set; and combining the sampling training set and the sampling test set into a meta-learning task.
Specifically, the original dataset is divided into a training set and a test set according to a set proportion $p$; the training set is denoted $D_{\text{train}}$ and the test set $D_{\text{test}}$, and they satisfy:

$D_{\text{train}} \cup D_{\text{test}} = D,\quad D_{\text{train}} \cap D_{\text{test}} = \varnothing$;

$D = \{(x_i, y_i)\}_{i=1}^{N}$;

wherein $D$ represents the original dataset; $x_i$ represents the feature vector of the $i$-th financial technology product datum; $y_i$ represents the category label of the $i$-th financial technology product datum; and $N$ represents the total number of financial technology product data.
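As a minimal illustration of this proportional split (the function and variable names are ours, not the patent's), the division can be sketched in Python:

```python
# Hypothetical sketch of the S2 split: divide an original dataset D of
# (feature_vector, label) pairs into D_train / D_test by a set proportion p.
import random

def split_dataset(data, p, seed=0):
    """Shuffle `data` and split it into disjoint train/test sets by proportion p."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * p)
    return shuffled[:cut], shuffled[cut:]

data = [([float(i)], i % 2) for i in range(10)]
train, test = split_dataset(data, p=0.8)
```

The two halves are disjoint and together cover the original dataset, matching the set relations above.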
Further, standardizing the training set and the test set comprises: standardizing the feature vectors of the financial technology product data in both sets, using:

$\hat{x}_i = \dfrac{x_i - \mu}{\sigma}$;

$\mu = \dfrac{1}{N}\sum_{i=1}^{N} x_i$;

$\sigma = \sqrt{\dfrac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}$;

wherein $\hat{x}_i$ represents the standardized feature vector of the $i$-th financial technology product datum; $x_i$ represents the raw feature vector of the $i$-th financial technology product datum; $\mu$ represents the mean of the feature vectors; $\sigma$ represents the standard deviation of the feature vectors; and $N$ represents the total number of financial technology product data.
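The standardization above is ordinary z-score normalization; a small sketch, assuming the mean and standard deviation are taken per feature dimension over all $N$ vectors (names are illustrative):

```python
# Per-dimension z-score standardization: x_hat = (x - mu) / sigma.
import math

def standardize(vectors):
    n, d = len(vectors), len(vectors[0])
    mu = [sum(v[j] for v in vectors) / n for j in range(d)]
    sigma = [math.sqrt(sum((v[j] - mu[j]) ** 2 for v in vectors) / n)
             for j in range(d)]
    return [[(v[j] - mu[j]) / sigma[j] for j in range(d)] for v in vectors]

X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
Z = standardize(X)
```

After standardization each feature dimension has zero mean and unit variance.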
Still further, the sampled training set is expressed as:

$D^{s}_{\text{train}} = \{(\hat{x}_i, y_i)\}_{i=1}^{n}$;

the sampled test set is expressed as:

$D^{s}_{\text{test}} = \{(\hat{x}_i, y_i)\}_{i=1}^{n}$;

and the meta-learning task is expressed as:

$T = \{D^{s}_{\text{train}}, D^{s}_{\text{test}}\}$;

wherein $\hat{x}_i$ represents the standardized feature vector of the $i$-th financial technology product datum; $y_i$ represents the category label of the $i$-th financial technology product datum; and $n$ represents the number of sampled financial technology product data.
Further comprising: dividing each task in the training set into a support set and a query set; the support set is expressed as:

$S_u = \{(x_v, y_v)\}_{v=1}^{N_s}$;

the query set is expressed as:

$Q_u = \{(x_l, y_l)\}_{l=1}^{N_q}$;

and the support set and the query set satisfy:

$S_u \cup Q_u = T_u,\quad S_u \cap Q_u = \varnothing$;

wherein $x_v$ represents the feature vector of the $v$-th financial technology product datum and $y_v$ its category label; $N_s$ represents the total number of financial technology product data in the support set; $x_l$ represents the feature vector of the $l$-th financial technology product datum and $y_l$ its category label; $N_q$ represents the total number of financial technology product data in the query set; and $T_u$ represents the $u$-th task in the training set.
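A hedged sketch of task construction: sample data for two labels into one binary task, then divide it into disjoint support and query sets. The sampling scheme and names are assumptions for illustration, not the patent's method:

```python
# Build one binary meta-task from (feature, label) data and split it into
# a support set and a query set with S ∪ Q = T and S ∩ Q = ∅.
import random

def make_task(data, label_a, label_b, n_support, n_query, seed=0):
    rng = random.Random(seed)
    pool = [(x, y) for x, y in data if y in (label_a, label_b)]
    sample = rng.sample(pool, n_support + n_query)  # disjoint by construction
    return sample[:n_support], sample[n_support:]

data = [([float(i)], "payment" if i % 2 else "finance") for i in range(20)]
support, query = make_task(data, "finance", "payment", n_support=4, n_query=2)
```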
S3: and calculating the weight of each task in the meta-learning task by adopting a data quality evaluation function, and normalizing each calculated weight to obtain a normalized weight corresponding to each task.
Specifically, the weight of each task in the meta-learning tasks is calculated with a data quality evaluation function:

$w_k = f(D_k) = \dfrac{1}{\sigma_k}\cdot\dfrac{1}{E}\sum_{e=1}^{E} q_{k,e}$;

wherein $w_k$ represents the weight of the $k$-th task in the meta-learning tasks; $f(\cdot)$ represents the data quality assessment function; $D_k$ represents the set of all data of the $k$-th task in the meta-learning tasks; $q_{k,1}, \dots, q_{k,E}$ represent the quality of the 1st through $E$-th data of the $k$-th task; $E$ represents the total number of data of the $k$-th task; and $\sigma_k$ represents the standard deviation of all data of the $k$-th task, calculated as:

$\sigma_k = \sqrt{\dfrac{1}{E}\sum_{l=1}^{E}\left(d_{k,l} - \bar{d}_k\right)^2}$;

wherein $d_{k,l}$ represents the $l$-th datum of the $k$-th task in the meta-learning tasks and $\bar{d}_k$ represents the mean of the data of the $k$-th task.
Each calculated weight is normalized to obtain the normalized weight corresponding to each task:

$\hat{w}_k = \dfrac{w_k}{\sum_{j=1}^{K} w_j}$;

wherein $\hat{w}_k$ represents the normalized weight corresponding to the $k$-th task in the meta-learning tasks; $K$ represents the total number of tasks in the meta-learning tasks; and $w_j$ represents the weight of the $j$-th task in the meta-learning tasks.
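Since the exact form of $f(\cdot)$ is only described through its inputs, the sketch below assumes the task weight is the mean per-datum quality divided by the standard deviation of the task's data, followed by the normalization above; all names are illustrative:

```python
# Assumed data-quality weight: mean quality / std-dev of the task's data,
# then normalize weights to sum to 1 across tasks.
import math

def task_weight(qualities, data):
    mean_q = sum(qualities) / len(qualities)
    mean_d = sum(data) / len(data)
    sigma = math.sqrt(sum((d - mean_d) ** 2 for d in data) / len(data))
    return mean_q / sigma

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

w = [task_weight([0.9, 0.8], [1.0, 3.0]), task_weight([0.4, 0.6], [0.0, 4.0])]
w_hat = normalize(w)
```

A task with higher mean quality and lower spread receives a larger normalized weight.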
In this embodiment, the data of the financial technology product is related information of the financial technology product.
The tasks are binary classification tasks, comprising classification of the financial technology product data as financial management or payment type, as lending or stock type, as accounting or insurance type, and as information or fund type. In essence, each task splits a multi-label classification problem into binary classification problems, with each task focusing on identifying and distinguishing two specific categories of financial technology product data.
The data in each task are the financial technology product data corresponding to each of the two labels of that binary classification task.
S4: determining the category of each task based on the normalization weight corresponding to each task; the belonging categories of tasks include 0 and 1.
Specifically, based on the normalized weight corresponding to each task, the category to which each task belongs is determined as:

$c_k = \begin{cases} 0, & \hat{w}_k \geq \theta \\ 1, & \hat{w}_k < \theta \end{cases}$;

wherein $c_k$ represents the category of the $k$-th task in the meta-learning tasks, and $\theta$ is an adjustable threshold parameter used as the judging condition for task classification; in this embodiment the value of $\theta$ is 0.5, and different model users can adjust this parameter according to the actual situation.
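The thresholding step can be sketched as follows; the direction of the comparison (high-weight tasks going to category 0) is an assumption made for illustration:

```python
# Assign category 0 to tasks whose normalized weight reaches the adjustable
# threshold theta (0.5 in this embodiment), and category 1 otherwise.
def categorize(normalized_weights, theta=0.5):
    return [0 if w >= theta else 1 for w in normalized_weights]

cats = categorize([0.7, 0.2, 0.1])
```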
And performing meta-policy optimization on the task with the category of 0 to obtain a first meta-policy and a first parameter.
Further, the method comprises the following steps:

Step 1: initializing a meta-policy $\pi_\theta$, wherein $\theta$ is the meta-policy parameter;

Step 2: for each task $b$ belonging to category 0 in the meta-learning tasks, sampling an initial policy $\pi_{\theta_b}$ from the meta-policy, wherein $\theta_b = \theta + \epsilon$; $\theta_b$ represents the initial policy parameter and $\epsilon$ is a random perturbation;

Step 3: executing the initial policy $\pi_{\theta_b}$ on task $b$ and collecting first trajectory data $\tau_b$, where $\tau_b$ records the series of actions taken and the series of rewards obtained by the initial policy on task $b$; calculating a first cumulative reward based on the first trajectory data:

$R_1 = \sum_{t=0}^{T} r_t$;

wherein $R_1$ represents the first cumulative reward; $t$ represents a time step; and $r_t$ represents the reward obtained by the initial policy at time step $t$;

Step 4: updating the initial policy parameter according to the first trajectory data and the first cumulative reward to obtain a new policy $\pi_{\theta_b'}$, wherein $\theta_b' = \theta_b + \alpha \nabla_{\theta_b} R_1$; $\theta_b'$ represents the new policy parameter and $\alpha$ is a first learning rate;

Step 5: executing the new policy $\pi_{\theta_b'}$ on task $b$ and collecting second trajectory data $\tau_b'$; calculating a second cumulative reward based on the second trajectory data:

$R_2 = \sum_{t=0}^{T} r_t'$;

wherein $R_2$ represents the second cumulative reward and $r_t'$ represents the reward obtained by the new policy at time step $t$;

Step 6: updating the meta-policy parameter according to the second cumulative rewards of all tasks belonging to category 0 in the meta-learning tasks to obtain a new meta-policy $\pi_{\theta'}$, wherein $\theta' = \theta + \beta \sum_b \nabla_\theta R_2^{(b)}$; $\theta'$ represents the new meta-policy parameter and $\beta$ is a second learning rate;

Step 7: repeating steps 2-6 until convergence to obtain the first meta-policy and the first parameter.
And performing meta reinforcement learning on the task with the category of 1 to obtain a second meta strategy and a second parameter corresponding to the second meta strategy.
Further, the method comprises the following steps:
Step 1: initializing the parameter $\delta$, meta learning rate $\zeta$, and task learning rate $\eta$ of the meta reinforcement learning model;

Step 2: for each meta-training iteration, sampling a batch of tasks $G = \{G_1, \dots, G_M\}$ from the tasks of category 1 in the meta-learning tasks, wherein the $h$-th task $G_h$ comprises a class-imbalanced training set and a second test set;

Step 3: for the $h$-th task $G_h$, performing the following operations:

Step 3.1: using the current parameter $\delta$, performing a one-step policy-gradient update on the class-imbalanced training set to obtain a new parameter $\delta_h'$, wherein $\delta_h' = \delta + \eta \nabla_\delta J_{\text{imb}}(\delta)$, and $J_{\text{imb}}(\delta)$ is the policy objective function for class-imbalanced data tasks, with expression:

$J_{\text{imb}}(\delta) = \mathbb{E}_{(s,a)\sim D_{\text{imb}}}\left[\sum_{c=1}^{C} z_c\,\mathbb{I}[y = c]\,\log \pi_\delta(a \mid s)\, A_\delta(s, a)\right] + \lambda H(\pi_\delta)$;

$z_c = \dfrac{1}{p(c)}$;

$A_\delta(s, a) = r + \gamma V_\delta(s') - V_\delta(s)$;

wherein $D_{\text{imb}}$ represents the class-imbalanced training set; $(s, a)$ are data in the class-imbalanced training set; $\pi_\delta$ is the policy function based on parameter $\delta$; $A_\delta$ is the advantage function based on parameter $\delta$; $\lambda$ is a regularization coefficient; $H(\pi_\delta)$ is the entropy of the policy; $C$ represents the number of categories; $\mathbb{I}[\cdot]$ is the indicator function; $y$ is the true category corresponding to state $s$; $z_c$ is the weight of category $c$; $p(c)$ is the frequency of occurrence of category $c$ in the class-imbalanced training set; $a$ is an action; $o$ represents a constant factor in the reward function $r$ for adjusting the magnitude of the reward value; $\gamma$ represents a discount factor measuring the impact of future reward values on the current state-action pair; and $V_\delta$ represents the value function based on parameter $\delta$, estimating the expected return obtainable by policy $\pi_\delta$ in state $s$ or state $s'$;

$\pi_\delta(a \mid s) = \dfrac{\exp\left(Q_\delta(s, a) / T\right)}{\sum_{a'} \exp\left(Q_\delta(s, a') / T\right)}$;

wherein $Q_\delta$ represents the action-value function based on parameter $\delta$ and $T$ represents a temperature parameter controlling the randomness of the policy: the higher the temperature, the more the policy tends to select action $a$ or $a'$ at random; the lower the temperature, the more the policy tends toward greedy action selection;

$V_\delta(s) = T \log \sum_{a} \exp\left(Q_\delta(s, a) / T\right)$;
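The temperature-controlled policy of step 3.1 can be sketched directly: action probabilities are a softmax over action values divided by the temperature (function names are ours, for illustration):

```python
# Softmax policy with temperature T: higher T flattens the distribution
# (near-random action selection); lower T approaches greedy selection.
import math

def softmax_policy(q_values, temperature):
    exps = [math.exp(q / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

cool = softmax_policy([1.0, 2.0], temperature=0.1)   # near-greedy
warm = softmax_policy([1.0, 2.0], temperature=10.0)  # near-uniform
```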
Step 3.2: using the new parameter $\delta_h'$, evaluating the performance of the policy on the second test set to obtain a return value $J(\delta_h'; D^{(h)}_{\text{test}})$, wherein $D^{(h)}_{\text{test}}$ is the second test set and $J(\cdot)$ is the policy objective function;

Step 4: updating the parameter $\delta$ based on the return values of all tasks belonging to category 1 in the meta-learning tasks, the updating formula being:

$\delta' = \delta + \zeta\,\dfrac{1}{M}\sum_{h=1}^{M} \nabla_\delta J(\delta_h'; D^{(h)}_{\text{test}})$;

wherein $\delta'$ represents the updated parameter and $M$ represents the number of tasks sampled from the tasks belonging to category 1 in the meta-learning tasks;

Step 5: repeating steps 2-4 until convergence to obtain the second meta-policy and the second parameter corresponding to the second meta-policy.
S5: obtaining a mixed strategy based on the first meta-strategy and the second meta-strategy; and obtaining a mixing parameter based on the first parameter and the second parameter.
Specifically, based on the first meta-policy and the second meta-policy, the mixed strategy is obtained by the calculation formula:

$\pi_{\text{mix}}(a \mid s) = \kappa\,\pi_1(a \mid s) + (1 - \kappa)\,\pi_2(a \mid s)$;

wherein $\pi_{\text{mix}}$ represents the mixed strategy; $a$ is an action and $s$ is a state; $\kappa$ is a mixing coefficient; $\pi_1$ represents the first meta-policy; and $\pi_2$ represents the second meta-policy.

Based on the first parameter and the second parameter, the mixing parameter is obtained by the calculation formula:

$\phi_{\text{mix}} = [\phi_1, \phi_2]$;

wherein $\phi_{\text{mix}}$ represents the mixing parameter; $\phi_1$ represents the first parameter; $\phi_2$ represents the second parameter; and $[\cdot,\cdot]$ represents a concatenation operation.
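The two S5 operations can be sketched directly (names are illustrative; the policies are represented as action-probability lists and the parameters as flat vectors):

```python
# Mix two policies' action distributions with coefficient kappa, and
# concatenate their parameter vectors into the mixing parameter.
def mix_policies(pi1, pi2, kappa):
    return [kappa * p1 + (1 - kappa) * p2 for p1, p2 in zip(pi1, pi2)]

def mix_parameters(phi1, phi2):
    return phi1 + phi2  # list concatenation, i.e. [phi1, phi2]

pi_mix = mix_policies([0.9, 0.1], [0.3, 0.7], kappa=0.5)
phi_mix = mix_parameters([1.0, 2.0], [3.0])
```

A convex combination of two valid distributions is itself a valid distribution.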
S6: and constructing an objective function based on the mixing strategy and the mixing parameters, and maximizing the objective function until convergence to obtain a classification model.
Specifically, the method comprises the following steps:

Step 1: using the mixed strategy and the mixing parameter, evaluating the return function of each task in the meta-learning tasks and taking the average as the objective function:

$J(\kappa, \phi_{\text{mix}}) = \dfrac{1}{K}\sum_{j=1}^{K} R_j(\pi_j, \phi_j)$;

wherein $J$ represents the objective function; $\pi_j$ represents the strategy corresponding to the $j$-th task in the meta-learning tasks; $\phi_j$ represents the strategy parameter corresponding to the $j$-th task in the meta-learning tasks; $K$ represents the total number of tasks in the meta-learning tasks; and $R_j$ represents the return function of the $j$-th task, expressed as:

$R_j = y_j \log p_j + (1 - y_j)\log(1 - p_j)$;

wherein $y_j$ represents the true label of the $j$-th task in the meta-learning tasks, taking the value 0 or 1, and $p_j$ represents the prediction probability of the $j$-th task in the meta-learning tasks, with value range $[0, 1]$;

Step 2: maximizing the objective function $J$ by gradient ascent and updating the mixing coefficient and the mixing parameter:

$\kappa \leftarrow \kappa + \eta_3 \nabla_\kappa J,\quad \phi_{\text{mix}} \leftarrow \phi_{\text{mix}} + \eta_4 \nabla_{\phi_{\text{mix}}} J$;

wherein $\kappa$ represents the mixing coefficient, $\eta_3$ represents a third learning rate, and $\eta_4$ represents a fourth learning rate;

Step 3: repeating steps 1 and 2 until convergence; the model at convergence of the objective function is taken as the classification model.
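Under the reconstruction above (log-likelihood return, averaged objective, gradient ascent on the mixing coefficient), S6 can be sketched as follows; a finite difference stands in for $\nabla_\kappa J$, and the link between $\kappa$ and each task's prediction probability is an illustrative assumption:

```python
# Per-task return R_j = y*log(p) + (1-y)*log(1-p); objective J is the average
# over tasks; kappa is updated by gradient ascent (finite differences here).
import math

def task_return(y, p):
    return y * math.log(p) + (1 - y) * math.log(1 - p)

def objective(kappa, tasks):
    # Illustrative link: the mixing coefficient blends two candidate predictions.
    return sum(task_return(y, kappa * p1 + (1 - kappa) * p2)
               for y, p1, p2 in tasks) / len(tasks)

tasks = [(1, 0.9, 0.4), (0, 0.2, 0.6)]
kappa, lr, eps = 0.5, 0.1, 1e-6
for _ in range(100):
    g = (objective(kappa + eps, tasks) - objective(kappa - eps, tasks)) / (2 * eps)
    kappa = min(1.0, max(0.0, kappa + lr * g))  # keep kappa in [0, 1]
J = objective(kappa, tasks)
```

Here both tasks are predicted better by the first policy, so the ascent drives the mixing coefficient toward 1 and the objective improves over its starting value.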
S7: and inputting the data of the financial and scientific products to be classified into the classification model to obtain a classification result.
In this embodiment, the classification result is a classification label of the financial and scientific product data to be classified, which is predicted by the classification model.
The method provided by this embodiment uses data quality indicators to evaluate and select an adaptation algorithm for each meta-task, and then uses a reinforcement learning algorithm to dynamically assign different meta-tasks to different strategies, thereby improving the performance of small-sample classification. The method can evaluate the quality of the data extracted by each meta-task and optimize the meta-task selection strategy, so that the model can better adapt to the target task.
The technical features of the above-described embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above-described embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the claims. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.
Claims (4)
1. The meta-task small sample classification method based on data quality and reinforcement learning is used for classifying financial and scientific product data and is characterized by comprising the following steps of:
S1: acquiring feature vectors and category labels of financial and scientific product data, and constructing an original data set; the feature vector comprises names, descriptions, developers, technical foundations and types of financial and scientific products; the category labels comprise financial management, payment, mobile banking, lending, stock, accounting, insurance, information and funds;
s2: dividing the original data set into a training set and a test set according to a proportion, and sampling the standardized training set and the standardized test set respectively to obtain a sampling training set and a sampling test set; combining the sampling training set and the sampling test set into a meta-learning task;
s3: calculating the weight of each task in the meta-learning tasks by adopting a data quality evaluation function, and normalizing each calculated weight to obtain the normalized weight corresponding to each task; the calculation formula is:

$\hat{w}_k = \dfrac{w_k}{\sum_{j=1}^{K} w_j}$;

wherein $\hat{w}_k$ represents the normalized weight corresponding to the $k$-th task in the meta-learning tasks; $K$ represents the total number of tasks in the meta-learning tasks; and $w_j$ represents the weight of the $j$-th task in the meta-learning tasks;
the tasks are binary classification tasks, comprising classification of the financial technology product data as financial management or payment type, as lending or stock type, as accounting or insurance type, and as information or fund type;
the data in each task are the financial technology product data corresponding to each of the two labels of that binary classification task;
the calculation formula of the data quality evaluation function is:

$w_k = f(D_k) = \dfrac{1}{\sigma_k}\cdot\dfrac{1}{E}\sum_{e=1}^{E} q_{k,e}$;

wherein $w_k$ represents the weight of the $k$-th task in the meta-learning tasks; $f(\cdot)$ represents the data quality assessment function; $D_k$ represents the set of all data of the $k$-th task in the meta-learning tasks; $q_{k,1}, \dots, q_{k,E}$ represent the quality of the 1st through $E$-th data of the $k$-th task; $E$ represents the total number of data of the $k$-th task; and $\sigma_k$ represents the standard deviation of all data of the $k$-th task, calculated as:

$\sigma_k = \sqrt{\dfrac{1}{E}\sum_{l=1}^{E}\left(d_{k,l} - \bar{d}_k\right)^2}$;

wherein $d_{k,l}$ represents the $l$-th datum of the $k$-th task in the meta-learning tasks and $\bar{d}_k$ represents the mean of the data of the $k$-th task;
S4: determining the category of each task based on the normalized weight corresponding to each task, the categories of tasks including 0 and 1; the expression is:

$c_k = \begin{cases} 0, & \hat{w}_k \geq \theta \\ 1, & \hat{w}_k < \theta \end{cases}$;

wherein $c_k$ represents the category of the $k$-th task in the meta-learning tasks and $\theta$ is an adjustable threshold parameter;
performing meta-policy optimization on the task with the category of 0 to obtain a first meta-policy and a first parameter;
performing meta-policy optimization on the tasks with category 0 to obtain the first meta-policy and the first parameter comprises:

Step 1: initializing a meta-policy $\pi_\theta$, wherein $\theta$ is the meta-policy parameter;

Step 2: for each task $b$ belonging to category 0 in the meta-learning tasks, sampling an initial policy $\pi_{\theta_b}$ from the meta-policy, wherein $\theta_b = \theta + \epsilon$; $\theta_b$ represents the initial policy parameter and $\epsilon$ is a random perturbation;

Step 3: executing the initial policy $\pi_{\theta_b}$ on task $b$ and collecting first trajectory data $\tau_b$, where $\tau_b$ records the series of actions taken and the series of rewards obtained by the initial policy on task $b$; calculating a first cumulative reward based on the first trajectory data:

$R_1 = \sum_{t=0}^{T} r_t$;

wherein $R_1$ represents the first cumulative reward; $t$ represents a time step; and $r_t$ represents the reward obtained by the initial policy at time step $t$;

Step 4: updating the initial policy parameter according to the first trajectory data and the first cumulative reward to obtain a new policy $\pi_{\theta_b'}$, wherein $\theta_b' = \theta_b + \alpha \nabla_{\theta_b} R_1$; $\theta_b'$ represents the new policy parameter and $\alpha$ is a first learning rate;

Step 5: executing the new policy $\pi_{\theta_b'}$ on task $b$ and collecting second trajectory data $\tau_b'$; calculating a second cumulative reward based on the second trajectory data:

$R_2 = \sum_{t=0}^{T} r_t'$;

wherein $R_2$ represents the second cumulative reward and $r_t'$ represents the reward obtained by the new policy at time step $t$;

Step 6: updating the meta-policy parameter according to the second cumulative rewards of all tasks belonging to category 0 in the meta-learning tasks to obtain a new meta-policy $\pi_{\theta'}$, wherein $\theta' = \theta + \beta \sum_b \nabla_\theta R_2^{(b)}$; $\theta'$ represents the new meta-policy parameter and $\beta$ is a second learning rate;

Step 7: repeating steps 2-6 until convergence to obtain the first meta-policy and the first parameter;
performing meta reinforcement learning on the task with the category of 1 to obtain a second meta strategy and a second parameter;
performing meta reinforcement learning on the tasks with category 1 to obtain the second meta-policy and the second parameter comprises:

Step 1: initializing the parameter $\delta$, meta learning rate $\zeta$, and task learning rate $\eta$ of the meta reinforcement learning model;

Step 2: for each meta-training iteration, sampling a batch of tasks $G = \{G_1, \dots, G_M\}$ from the tasks of category 1 in the meta-learning tasks, wherein the $h$-th task $G_h$ comprises a class-imbalanced training set and a second test set;

Step 3: for the $h$-th task $G_h$, performing the following operations:

Step 3.1: using the current parameter $\delta$, performing a one-step policy-gradient update on the class-imbalanced training set to obtain a new parameter $\delta_h'$, wherein $\delta_h' = \delta + \eta \nabla_\delta J_{\text{imb}}(\delta)$, and $J_{\text{imb}}(\delta)$ is the policy objective function for class-imbalanced data tasks, with expression:

$J_{\text{imb}}(\delta) = \mathbb{E}_{(s,a)\sim D_{\text{imb}}}\left[\sum_{c=1}^{C} z_c\,\mathbb{I}[y = c]\,\log \pi_\delta(a \mid s)\, A_\delta(s, a)\right] + \lambda H(\pi_\delta)$;

$z_c = \dfrac{1}{p(c)}$;

$A_\delta(s, a) = r + \gamma V_\delta(s') - V_\delta(s)$;

$\pi_\delta(a \mid s) = \dfrac{\exp\left(Q_\delta(s, a) / T\right)}{\sum_{a'} \exp\left(Q_\delta(s, a') / T\right)}$;

$V_\delta(s) = T \log \sum_{a} \exp\left(Q_\delta(s, a) / T\right)$;

wherein $D_{\text{imb}}$ represents the class-imbalanced training set; $(s, a)$ are data in the class-imbalanced training set; $\pi_\delta$ is the policy function based on parameter $\delta$; $A_\delta$ is the advantage function based on parameter $\delta$; $\lambda$ is a regularization coefficient; $H(\pi_\delta)$ is the entropy of the policy; $C$ represents the number of categories; $\mathbb{I}[\cdot]$ is the indicator function; $y$ is the true category corresponding to state $s$; $z_c$ is the weight of category $c$; $p(c)$ is the frequency of occurrence of category $c$ in the class-imbalanced training set; $a$ is an action; $o$ represents a constant factor in the reward function $r$; $\gamma$ represents a discount factor; $V_\delta$ represents the value function based on parameter $\delta$, estimating the expected return obtainable by policy $\pi_\delta$ in state $s$ or state $s'$; $Q_\delta$ represents the action-value function based on parameter $\delta$; and $T$ represents a temperature parameter;
step 3.2: using the new parameter δ'_h, evaluating the performance of the policy on the second test set to obtain a return value J_{G_h,test}(δ'_h); wherein G_h,test is the second test set and J(·) is the policy objective function;
step 4: updating the parameter δ based on the return values of all tasks belonging to category 1 in the meta-learning task, the update formula being:

δ ← δ + ζ · (1/M) · Σ_{h=1}^{M} ∇_δ J_{G_h,test}(δ'_h);

wherein the left-hand δ represents the updated parameter; M represents the number of tasks sampled from the tasks belonging to category 1 in the meta-learning task;
Step 5: repeating steps 2-4 until convergence to obtain the second meta-strategy and the second parameter;
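Steps 1-5 follow the usual MAML-style pattern: a one-step inner update per sampled task, then a meta-update that averages test-set gradients over the batch. A non-limiting numeric sketch (first-order, with a quadratic toy objective standing in for the policy objective; all concrete function forms here are assumptions for illustration):

```python
def meta_train(tasks, delta, zeta=0.1, eta=0.1, iters=50):
    """tasks: list of (grad_train, grad_test) callables returning the
    gradient of the task objective at a given parameter value.
    Inner loop (step 3.1): one-step update per task.
    Outer loop (step 4): average test-set gradient at the adapted
    parameters updates the shared parameter delta."""
    for _ in range(iters):
        meta_grad = 0.0
        for grad_train, grad_test in tasks:
            adapted = delta + eta * grad_train(delta)  # one-step policy gradient
            meta_grad += grad_test(adapted)            # evaluate on the test set
        delta += zeta * meta_grad / len(tasks)         # meta-update over M tasks
    return delta

# Toy tasks: maximize -(x - target)^2, whose gradient is 2*(target - x).
def make_task(target):
    g = lambda x: 2.0 * (target - x)
    return (g, g)

tasks = [make_task(1.0), make_task(3.0)]
delta = meta_train(tasks, delta=0.0)   # converges toward 2.0, the task average
```

The meta-parameter settles between the per-task optima, which is the behaviour the shared first/second meta-strategy is meant to capture.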
s5: obtaining a mixed strategy based on the first meta-strategy and the second meta-strategy; the calculation formula is as follows:

π_mix(a|s) = w · π_1(a|s) + (1 − w) · π_2(a|s);

wherein π_mix represents the mixed strategy; a is an action and s is a state; w is the mixing coefficient; π_1 represents the first meta-strategy; π_2 represents the second meta-strategy;
obtaining a mixing parameter based on the first parameter and the second parameter; the calculation formula is as follows:

θ_mix = [θ_1, θ_2];

wherein θ_mix represents the mixing parameter; θ_1 represents the first parameter; θ_2 represents the second parameter; [·,·] represents a join (concatenation) operation;
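A non-limiting sketch of the two combinations above; the convex-combination form of the mixed strategy is an assumed reading of "mixing coefficient", while the concatenation of the parameters is stated explicitly:

```python
def mix_policy(pi1, pi2, w):
    """Mixed strategy: pi_mix(a|s) = w*pi1(a|s) + (1-w)*pi2(a|s).
    A convex combination of two distributions is itself a valid
    probability distribution over actions."""
    return lambda s: [w * p + (1.0 - w) * q for p, q in zip(pi1(s), pi2(s))]

def mix_params(theta1, theta2):
    """Mixing parameter: the join (concatenation) [theta1, theta2]."""
    return list(theta1) + list(theta2)

# Two hypothetical two-action policies (state ignored for simplicity).
pi1 = lambda s: [0.8, 0.2]
pi2 = lambda s: [0.4, 0.6]
pi_mix = mix_policy(pi1, pi2, w=0.5)
probs = pi_mix(None)                       # -> [0.6, 0.4]
theta_mix = mix_params([1.0, 2.0], [3.0])  # -> [1.0, 2.0, 3.0]
```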
S6: constructing an objective function based on the mixed strategy and the mixing parameter, and maximizing the objective function until convergence to obtain a classification model;
wherein constructing the objective function based on the mixed strategy and the mixing parameter and maximizing it until convergence to obtain the classification model comprises:
step 1: using the mixed strategy and the mixing parameter, evaluating the return function of each task in the meta-learning tasks, and calculating the average value as the objective function; the calculation formula is as follows:

J(w, θ_mix) = (1/K) · Σ_{j=1}^{K} R_j(π_j, θ_j);

wherein J represents the objective function; π_j represents the strategy corresponding to the j-th task in the meta-learning task; θ_j represents the strategy parameter corresponding to the j-th task in the meta-learning task; K represents the total number of tasks in the meta-learning task; R_j represents the return function of the j-th task in the meta-learning task, expressed as follows:

R_j = y_j · log p_j + (1 − y_j) · log(1 − p_j);

wherein y_j represents the true label of the j-th task in the meta-learning tasks, y_j taking the value 0 or 1; p_j represents the predictive probability of the j-th task in the meta-learning tasks, p_j having the value range [0, 1];
step 2: maximizing the objective function J by gradient ascent and updating the mixing coefficient and the mixing parameter, the expressions being as follows:

w ← w + ξ_3 · ∂J/∂w;
θ_mix ← θ_mix + ξ_4 · ∂J/∂θ_mix;

wherein w represents the mixing coefficient; ξ_3 represents the third learning rate; ξ_4 represents the fourth learning rate;
step 3: repeating steps 1 and 2 until convergence, and taking the converged objective function as the classification model;
S7: inputting the financial technology product data to be classified into the classification model to obtain a classification result.
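A non-limiting sketch of steps 1-2: averaging a per-task return and pushing the mixing coefficient uphill by gradient ascent. The log-likelihood form of the return and the finite-difference gradient are illustrative assumptions, not the patent's exact formulas:

```python
import math

def objective(w, tasks):
    """Average per-task return J = (1/K) * sum_j R_j, where R_j is taken
    here as the log-likelihood y*log(p) + (1-y)*log(1-p) of the mixed
    prediction p = w*p1 + (1-w)*p2 -- an assumed form of the return."""
    total = 0.0
    for y, p1, p2 in tasks:
        p = w * p1 + (1.0 - w) * p2
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(tasks)

def ascend(w, tasks, lr=0.1, steps=200, eps=1e-5):
    """Gradient ascent on the mixing coefficient; a central finite
    difference stands in for the analytic gradient, with w clipped
    to the open interval (0, 1)."""
    lo, hi = 1e-6, 1.0 - 1e-6
    for _ in range(steps):
        g = (objective(min(w + eps, hi), tasks)
             - objective(max(w - eps, lo), tasks)) / (2 * eps)
        w = min(max(w + lr * g, lo), hi)
    return w

# Two toy tasks where the first policy is right and the second is wrong,
# so the optimal mixing coefficient should move toward 1.
tasks = [(1, 0.9, 0.2), (0, 0.1, 0.8)]
w_opt = ascend(0.5, tasks)
```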
2. The meta-task small sample classification method based on data quality and reinforcement learning according to claim 1, wherein the original data set is divided into a training set and a test set according to a set proportion p, the training set being denoted D_train and the test set D_test, the training set and the test set satisfying:

D_train ∪ D_test = D;
D_train ∩ D_test = ∅;

wherein D represents the original dataset, D = {(x_i, y_i)}, i = 1, …, N; x_i represents the feature vector of the i-th financial technology product data; y_i represents the category label of the i-th financial technology product data; N represents the total number of financial technology product data items.
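The disjoint split of claim 2 can be sketched as follows (non-limiting; the shuffle-then-cut strategy and seed are illustrative choices):

```python
import random

def split_dataset(data, p, seed=0):
    """Split D into D_train and D_test at proportion p, so that
    D_train ∪ D_test = D and D_train ∩ D_test = ∅."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    cut = int(len(data) * p)
    train = [data[i] for i in idx[:cut]]
    test = [data[i] for i in idx[cut:]]
    return train, test

# Hypothetical (feature, label) pairs standing in for product data.
D = [(f"x{i}", i % 2) for i in range(10)]
D_train, D_test = split_dataset(D, p=0.8)  # 8 training, 2 test items
```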
3. The method for classifying meta-task small samples based on data quality and reinforcement learning according to claim 2, wherein normalizing the training set and the test set comprises: standardizing the feature vectors of the financial technology product data in the training set and in the test set, the standardization formulas being as follows:

x̂_i = (x_i − μ) / σ;
μ = (1/N) · Σ_{i=1}^{N} x_i;
σ = √( (1/N) · Σ_{i=1}^{N} (x_i − μ)² );

wherein x̂_i represents the standardized feature vector of the i-th financial technology product data; x_i represents the feature vector of the i-th financial technology product data; μ represents the mean of the feature vectors; σ represents the standard deviation of the feature vectors; N represents the total number of financial technology product data items.
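The z-score standardization of claim 3 can be sketched per feature dimension as follows (non-limiting; statistics are computed over all N samples, as the claim states):

```python
import math

def standardize(vectors):
    """Z-score standardization: x_hat = (x - mu) / sigma, with mu and
    sigma computed per feature over all N samples."""
    n, d = len(vectors), len(vectors[0])
    mu = [sum(v[j] for v in vectors) / n for j in range(d)]
    sigma = [math.sqrt(sum((v[j] - mu[j]) ** 2 for v in vectors) / n)
             for j in range(d)]
    standardized = [[(v[j] - mu[j]) / sigma[j] for j in range(d)]
                    for v in vectors]
    return standardized, mu, sigma

# Toy feature vectors; each standardized feature has mean 0.
X = [[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]]
Xs, mu, sigma = standardize(X)
```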
4. The meta-task small sample classification method based on data quality and reinforcement learning of claim 3, wherein the sampled training set is expressed as:

D'_train = {(x̂_i, y_i)}, i = 1, …, n;

the sampled test set is expressed as:

D'_test = {(x̂_i, y_i)}, i = 1, …, n;

the meta-learning task is expressed as:

T = {D'_train, D'_test};

wherein x̂_i represents the standardized feature vector of the i-th financial technology product data; y_i represents the category label of the i-th financial technology product data; n represents the number of sampled financial technology product data items.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410158075.6A CN117688455B (en) | 2024-02-04 | 2024-02-04 | Meta-task small sample classification method based on data quality and reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117688455A (en) | 2024-03-12
CN117688455B (en) | 2024-05-03
Family
ID=90139478
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410158075.6A Active CN117688455B (en) | 2024-02-04 | 2024-02-04 | Meta-task small sample classification method based on data quality and reinforcement learning |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109711433A (en) * | 2018-11-30 | 2019-05-03 | 东南大学 | A kind of fine grit classification method based on meta learning |
CN110569886A (en) * | 2019-08-20 | 2019-12-13 | 天津大学 | Image classification method for bidirectional channel attention element learning |
CN114119319A (en) * | 2021-11-24 | 2022-03-01 | 中建三局绿色产业投资有限公司 | Intelligent water affair management method and system based on distributed small sample algorithm |
CN114429009A (en) * | 2022-04-07 | 2022-05-03 | 中国石油大学(华东) | Small sample sucker-rod pump well working condition diagnosis method based on meta-migration learning |
CN117136360A (en) * | 2021-02-12 | 2023-11-28 | 硕动力公司 | System and method for task-oriented dialog security policy improvement |
CN117150359A (en) * | 2023-08-31 | 2023-12-01 | 华能阜新风力发电有限责任公司 | Small sample fault diagnosis method, system, device and medium based on model independent element learning |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109919317A (en) * | 2018-01-11 | 2019-06-21 | 华为技术有限公司 | A kind of machine learning model training method and device |
Non-Patent Citations (2)
Title |
---|
A Survey of Meta-Reinforcement Learning; Jacob Beck et al.; arXiv:2301.08028v1 [cs.LG]; 2023-01-19; pp. 1-53 *
Function optimization algorithm based on game-strategy reinforcement learning; Ye Jun et al.; Computer Engineering and Applications; 2005-05-01 (No. 17); pp. 67-68, 101 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11599939B2 (en) | System, method and computer program for underwriting and processing of loans using machine learning | |
Bahnsen et al. | Cost sensitive credit card fraud detection using Bayes minimum risk | |
US10579938B2 (en) | Real time autonomous archetype outlier analytics | |
CN104321794A (en) | A system and method using multi-dimensional rating to determine an entity's future commercial viability | |
Fan et al. | Improved ML-based technique for credit card scoring in internet financial risk control | |
CN115205011B (en) | Bank user portrait model generation method based on LSF-FC algorithm | |
CN111325344A (en) | Method and apparatus for evaluating model interpretation tools | |
CN112085593B (en) | Credit data mining method for small and medium enterprises | |
CN117688455B (en) | Meta-task small sample classification method based on data quality and reinforcement learning | |
CN112508684A (en) | Joint convolutional neural network-based collection risk rating method and system | |
Naik | Predicting credit risk for unsecured lending: A machine learning approach | |
Fikriya et al. | Support Vector Machine Predictive Analysis Implementation: Case Study of Tax Revenue in Government of South Lampung | |
Dzelihodzic et al. | Data Mining Techniques for Credit Risk Assessment Task | |
Lee et al. | Application of machine learning in credit risk scorecard | |
CN117094817B (en) | Credit risk control intelligent prediction method and system | |
Oh et al. | Developing time-based clustering neural networks to use change-point detection: Application to financial time series | |
Sarlija et al. | A Neural Network Classification of Credit Applicants in Consumer Credit Scoring. | |
CN113282886B (en) | Bank loan default judgment method based on logistic regression | |
CN113158230B (en) | Online classification method based on differential privacy | |
CN116843432B (en) | Anti-fraud method and device based on address text information | |
CN113554228B (en) | Training method of repayment rate prediction model and repayment rate prediction method | |
Li et al. | CUS-RF-Based Credit Card Fraud Detection with Imbalanced Data | |
CN118071482A (en) | Method for constructing retail credit risk prediction model and consumer credit business Scorebetad model | |
Yan et al. | Beyond classification and ranking: constrained optimization of the ROI | |
Davalos et al. | Deriving rules for forecasting air carrier financial stress and insolvency: A genetic algorithm approach |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||