CN115423540B - Financial model knowledge distillation method and device based on reinforcement learning - Google Patents


Info

Publication number
CN115423540B
Authority
CN
China
Prior art keywords
enterprise
model
teacher
student
reasoning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211373039.9A
Other languages
Chinese (zh)
Other versions
CN115423540A (en)
Inventor
韩柳
胡雪枫
朱威
郑宇晟
唐镇坤
黄文辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Post Consumer Finance Co ltd
Original Assignee
China Post Consumer Finance Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Post Consumer Finance Co ltd filed Critical China Post Consumer Finance Co ltd
Priority to CN202211373039.9A priority Critical patent/CN115423540B/en
Publication of CN115423540A publication Critical patent/CN115423540A/en
Application granted granted Critical
Publication of CN115423540B publication Critical patent/CN115423540B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Accounting & Taxation (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Business, Economics & Management (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention relates to a financial model knowledge distillation method and device based on reinforcement learning, comprising the following steps: S1: designing models for an enterprise A and an enterprise B, and performing pre-training distillation and initialization on the student model of enterprise A; S2: deploying the pre-trained, distilled and initialized student model on a server of enterprise B and performing distillation training again; S3: performing inference prediction with the teacher inference model of enterprise A, and performing data enhancement on the student model of enterprise B with the inference results. The financial model knowledge distillation method and device based on reinforcement learning realize a cross-institution joint modeling scheme; the weak interpretability of the deep learning models used in knowledge distillation protects data privacy, and a customer segment with a high response rate can be obtained from the traffic-referral institutions a credit company relies on without revealing the credit company's risk-control strategy, thereby reducing marketing and customer-acquisition costs.

Description

Financial model knowledge distillation method and device based on reinforcement learning
Technical Field
The invention relates to the technical field of data enhancement in knowledge distillation training, and in particular to a financial model knowledge distillation method and device based on reinforcement learning.
Background
Customer referral between enterprises in the credit industry involves joint modeling, for which the following approaches may be adopted: desensitized data is taken out of the company, or federated learning techniques are used. Both encounter the following drawbacks:
(1) Desensitized data is limited to other-domain samples below the ten-thousand level for the counterparty's local modeling, and the effect is limited by the small volume of data that can be shared;
(2) For financial scenarios, there is no unified federated learning training platform trusted by all credit-industry institutions; moreover, the training and debugging cycle of federated learning is 2-6 times longer than that of local modeling, which hinders adoption;
(3) In cross-domain scenarios, the customer-segment labels of the financial company and the other-domain company differ significantly, and it is difficult for the financial company to select its risk customer segments while preserving privacy;
(4) When a financial scenario searches for a customer segment using risk-class features, the resulting marketing segment does not always have the willingness to borrow, so the conversion rate is low and operating and advertising costs rise.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a knowledge distillation method and device suitable for a credit company's financial customer models under cross-domain or same-domain cooperation, where the financial customer models cover response rate, post-default repayment willingness, application scorecards, behavior score classification, and the like.
In order to achieve the purpose of the invention, the invention provides a financial model knowledge distillation method based on reinforcement learning, which comprises the following steps:
s1: model design of enterprises A and enterprises B is carried out, and pre-training distillation and initialization are carried out on student models of the enterprises A;
s2: building the pre-trained, distilled and initialized student model in a server of an enterprise B, and performing distillation training again;
s3: and reasoning and forecasting are carried out through a teacher reasoning model of the enterprise A, and data enhancement is carried out on a student model of the enterprise B through a reasoning result.
Preferably, the model design of enterprise A in step S1 specifically includes:
the teacher model is designed as n layers of Transformers; each Transformer layer of the teacher model takes the hidden states of the previous layer as input and outputs the corresponding hidden states through multi-head attention.
Preferably, the model design of enterprise B in step S1 specifically includes:
a multi-head teacher model is designed for enterprise B; its teacher model is used only for distilling the local student model, and specifically includes a teacher model for risk-assessment classification.
Preferably, the model design of enterprise A in step S1 specifically further includes:
cleaning the behavior data of enterprise A, the behavior data specifically including the number of user clicks and page-element dwell time; when the volume of user behavior data is too large, statistical-level feature engineering is required.
Preferably, the specific steps of step S3 are:
performing inference prediction with the teacher inference model of enterprise A, and performing data enhancement on the student model of enterprise B in hard-label mode.
Preferably, the specific steps of step S2 include:
the pre-trained, distilled and initialized student model is deployed on a server of enterprise B; the student model of enterprise B is distillation-trained with the knowledge distilled from the teacher models of enterprise A and enterprise B, and the trained student model of enterprise B is deployed at enterprise A for prediction inference that balances enterprise B's approval pass rate and advertisement response rate.
Preferably, the specific step of performing distillation training again in step S2 is:
based on the Actor-Critic method, the student model of enterprise B is used as the Actor, which explores generated behavior sequences for different scenarios, the behavior sequences comprising financial risk strategies and marketing customer-acquisition strategies.
Preferably, the financial risk strategies and the marketing customer-acquisition strategies specifically include:
financial risk strategies: risk intervention strategies such as credit-limit decrease, credit-limit increase, rejection, and approval;
marketing customer-acquisition strategies: marketing strategies such as coupon issuance and promotional activities;
the financial risk strategies are used to probe enterprise B's local risk-control environment and collect status; the marketing customer-acquisition strategies are sent to enterprise A, which is responsible for collecting status.
Preferably, the teacher inference model of enterprise A and the teacher model of enterprise B are used as the Critic; the Critic scores the Actor's actions, a preset value is selected according to the resulting score, and the Critic and the Actor are updated simultaneously according to the score against the preset value.
Preferably, the invention also provides a financial model knowledge distillation device based on reinforcement learning, comprising:
a configuration module: used for designing the models of enterprise A and enterprise B;
a training module: used for performing pre-training distillation and initialization on the student model of enterprise A, deploying the pre-trained, distilled and initialized student model on a server of enterprise B, and performing distillation training again;
a data enhancement module: used for performing inference prediction with the teacher inference model of enterprise A, and performing data enhancement on the student model of enterprise B with the inference results.
The invention has the beneficial effects that: the financial model knowledge distillation method and device based on reinforcement learning realize a cross-institution joint modeling scheme; the weak interpretability of the deep learning models used in knowledge distillation protects data privacy, and a customer segment with a high response rate can be obtained from the traffic-referral institutions a credit company relies on without revealing the credit company's risk-control strategy, thereby reducing marketing and customer-acquisition costs.
Drawings
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of preferred embodiments of the invention, as illustrated in the accompanying drawings. Like reference numerals refer to like parts throughout the drawings, and the drawings are not necessarily drawn to scale, emphasis instead being placed upon illustrating the principles of the invention.
FIG. 1 is a schematic flow chart of a method and apparatus for distilling knowledge of financial models based on reinforcement learning according to an embodiment of the present invention;
fig. 2 is a schematic diagram illustrating specific steps of a financial model knowledge distillation method and apparatus based on reinforcement learning according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention are further described in detail below with reference to the drawings and specific embodiments, so that those skilled in the art can better understand and implement the present invention; however, the present invention is not limited to these embodiments.
Referring to fig. 1-2, an embodiment of the invention provides a financial model knowledge distillation method based on reinforcement learning, including the following steps:
s1: model design of an enterprise A (other domain enterprises) and an enterprise B (financial enterprises) is carried out, and pre-training distillation and initialization are carried out on student models of the enterprise A;
s2: building the pre-trained, distilled and initialized student model in a server of an enterprise B, and performing distillation training again;
s3: and carrying out reasoning prediction through a teacher reasoning model of the enterprise A, and carrying out data enhancement on a student model of the enterprise B through a reasoning result.
The beneficial effects of the invention are as follows: the financial model knowledge distillation method and device based on reinforcement learning adopt a teacher-student paradigm and realize a cross-institution joint modeling scheme; the weak interpretability of the deep learning models used in knowledge distillation protects data privacy, and a customer segment with a high response rate can be obtained from the traffic-referral institutions a credit company relies on without revealing the credit company's risk-control strategy, thereby reducing marketing and customer-acquisition costs.
Referring to fig. 1-2, in a preferred embodiment, the model design of enterprise A in step S1 specifically includes:
the teacher model is designed as n layers of Transformers; each Transformer layer of the teacher model takes the hidden states of the previous layer as input and outputs the corresponding hidden states through multi-head attention (for privacy, more than 12 attention heads are suggested; since the student model must also preserve privacy, its number of layers and attention heads should likewise be kept at a certain size).
Referring to fig. 1-2, in a preferred embodiment, the model design of enterprise B in step S1 specifically includes:
a multi-head teacher model is designed for enterprise B; its teacher model is used only for distilling the local student model, and specifically includes a teacher model for risk-assessment classification. Enterprise B, as the user or advertiser of the traffic customer segment, must account for financial risk-control attributes, so a teacher model for risk-assessment classification is built locally at enterprise B. The teacher model built by enterprise B is also a deep learning model and is used only for local student-model distillation, so its weak interpretability is not a concern, and it is not used as the real scorecard model.
So-called knowledge distillation: generally, a large model is often a single complex network or an ensemble of networks with good performance and generalization capability, while a small model has limited expressive capability because of its small size. The knowledge learned by the large model can therefore be used to guide the training of the small model, so that the small model attains performance comparable to the large model with a greatly reduced number of parameters, achieving model compression and acceleration.
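The teacher-to-student knowledge transfer described above is conventionally implemented as a temperature-softened distillation loss; a minimal sketch follows, where the temperature T and weight alpha are illustrative defaults, not values from the patent:

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T yields a softer distribution.
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, hard_label, T=2.0, alpha=0.5):
    """Weighted sum of a soft-target KL term (teacher knowledge) and a
    hard-label cross-entropy term (ground truth)."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 as is conventional to keep gradient magnitudes comparable.
    soft = (T * T) * sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    hard = -math.log(softmax(student_logits)[hard_label])
    return alpha * soft + (1.0 - alpha) * hard
```

When the student's logits already match the teacher's, the soft term vanishes and only the hard-label cross-entropy remains.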
Referring to fig. 1-2, in a further preferred embodiment, the model design of enterprise A in step S1 further includes:
cleaning the behavior data of enterprise A (enterprise A, as the advertising operator, must fully consider the characteristics of users on its platform), the behavior data specifically including, but not limited to, the number of user clicks and page-element dwell time; when the volume of user behavior data is too large, statistical-level feature engineering is required.
Referring to fig. 1-2, in a further preferred embodiment, the specific steps of step S3 are:
performing inference prediction with the teacher inference model of enterprise A, and performing data enhancement on the student model of enterprise B in hard-label mode. That is: the customer-segment behavior predicted by enterprise A's inference model reflects real response performance; conditional features can be added at the activation function during student-model training, and using the ground truth effectively reduces the possibility that the teacher model's errors are propagated to Net-S (the student network) during distillation.
Referring to fig. 1-2, in a preferred embodiment, the specific steps of step S2 include:
the pre-trained, distilled and initialized student model is deployed on a server of enterprise B; the student model of enterprise B is distillation-trained with the knowledge distilled from the teacher models of enterprise A and enterprise B, and the trained student model of enterprise B is deployed at enterprise A for prediction inference that balances enterprise B's approval pass rate and advertisement response rate.
The financial model knowledge distillation method and device based on reinforcement learning provided by the invention also have the following characteristics:
1. For enterprise B, the teacher model at enterprise A considers two factors in its design: it is a large model pre-trained on knowledge from enterprise A's domain, such as a recommendation model, using data such as user behavior data or local labels; in the initial phase, enterprise B negotiates with enterprise A the base customer-segment scope of the initial referral with respect to risk-related requirements.
2. The output of the teacher inference model is used as a data-enhancement means during student-model training;
3. Regarding data privacy, enterprise B is subject to financial regulation, and the student model uses a deep learning structure, so even if the student inference model is later deployed at enterprise A, the risk-scoring knowledge learned by enterprise B while training the student model is not exposed; conversely, in enterprise B's fine-tuning stage, the distilled knowledge of enterprise A's teacher model received by enterprise B cannot easily be reverse-engineered into enterprise A's raw data.
Referring to fig. 1-2, in a further preferred embodiment, the specific steps of performing distillation training again in step S2 are:
based on the Actor-Critic method, the student model of enterprise B is used as the Actor, which explores different generated behaviors for different scenarios (assuming the Actor explores N times, N behavior sequences are obtained); the behavior sequences comprise financial risk strategies and marketing customer-acquisition strategies.
Referring to fig. 1-2, in a preferred embodiment, the financial risk strategies and the marketing customer-acquisition strategies specifically include:
financial risk strategies: risk intervention strategies such as credit-limit decrease, credit-limit increase, rejection, and approval;
marketing customer-acquisition strategies: marketing strategies such as coupon issuance and promotional activities;
the financial risk strategies are used to probe enterprise B's local risk-control environment (for example, changing advertisement styles and offer rules) and collect status; the marketing customer-acquisition strategies are sent to enterprise A, which is responsible for collecting status.
Referring to fig. 1-2, in a preferred embodiment, the teacher inference model of enterprise A and the teacher model of enterprise B are used as the Critic; the Critic scores the Actor's actions (obtaining the latest status, reward, and td_error), a preset value is selected according to the resulting score (defined according to actual conditions), and the Critic and the Actor are updated simultaneously according to the score against the preset value, with the actor loss being log_prob * td_error.
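The Critic/Actor update described here can be sketched as a single tabular Actor-Critic step: the Critic's TD error plays the role of td_error, and the actor objective is log_prob * td_error as stated. The two-action softmax policy and all hyperparameters are illustrative assumptions:

```python
import math

def _softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def actor_critic_step(logits, action, value, reward, next_value,
                      gamma=0.99, lr_actor=0.1, lr_critic=0.5):
    """One Actor-Critic update: the Critic forms a TD error from the latest
    (status, reward), then both Critic and Actor are updated from it."""
    td_error = reward + gamma * next_value - value
    # Critic update: move the state value toward the TD target.
    new_value = value + lr_critic * td_error
    # Actor update: ascend log_prob * td_error;
    # d(log pi(action))/d logit_i = 1{i == action} - pi_i for a softmax policy.
    probs = _softmax(logits)
    new_logits = [z + lr_actor * td_error * ((1.0 if i == action else 0.0) - p)
                  for i, (z, p) in enumerate(zip(logits, probs))]
    return new_logits, new_value
```

A positive TD error (the Critic scored the action above its baseline) raises the probability of the explored action; a negative one lowers it.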
According to the financial model knowledge distillation method and device based on reinforcement learning, the number of action explorations cannot be large given the life cycle of a financial customer, so after the student model is preset, the Actor-Critic framework iterates 1-2 rounds every 2 weeks; meanwhile, because the commonly used BERT structure is adopted to design the teacher-student models, the learning rate of the Actor-Critic framework must be carefully controlled.
Referring to fig. 1-2, in a preferred embodiment, the present invention further provides an apparatus for distilling knowledge of financial models based on reinforcement learning, comprising:
a configuration module: used for designing the models of enterprise A and enterprise B;
a training module: used for performing pre-training distillation and initialization on the student model of enterprise A, deploying the pre-trained, distilled and initialized student model on a server of enterprise B, and performing distillation training again;
a data enhancement module: used for performing inference prediction with the teacher inference model of enterprise A, and performing data enhancement on the student model of enterprise B with the inference results.
The invention has the beneficial effects that: the financial model knowledge distillation method based on reinforcement learning realizes a cross-institution joint modeling scheme; the weak interpretability of the deep learning models used in knowledge distillation protects data privacy, and a customer segment with a high response rate can be obtained from the traffic-referral institutions a credit company relies on without revealing the credit company's risk-control strategy, thereby reducing marketing and customer-acquisition costs.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (6)

1. A financial model knowledge distillation method based on reinforcement learning, characterized by comprising the following steps:
S1: designing models for an enterprise A and an enterprise B, and performing pre-training distillation and initialization on the student model of enterprise A;
S2: deploying the pre-trained, distilled and initialized student model on a server of enterprise B, and performing distillation training again;
S3: performing inference prediction with the teacher inference model of enterprise A, and performing data enhancement on the student model of enterprise B with the inference results;
the specific steps of step S2 include:
deploying the pre-trained, distilled and initialized student model on a server of enterprise B, distillation-training the student model of enterprise B with the knowledge distilled from the teacher models of enterprise A and enterprise B, and deploying the trained student model of enterprise B at enterprise A for prediction inference that balances enterprise B's approval pass rate and advertisement response rate;
the specific step of performing distillation training again in step S2 is:
based on the Actor-Critic method, using the student model of enterprise B as the Actor, which explores generated behavior sequences for different scenarios, the behavior sequences comprising financial risk strategies and marketing customer-acquisition strategies; using the teacher inference model of enterprise A and the teacher model of enterprise B as the Critic, scoring the Actor's actions with the Critic, selecting a preset value according to the resulting score, and updating the Critic and the Actor simultaneously according to the score against the preset value;
the financial risk strategies and the marketing customer-acquisition strategies specifically include:
financial risk strategies: risk intervention strategies such as credit-limit decrease, credit-limit increase, rejection, and approval;
marketing customer-acquisition strategies: marketing strategies such as coupon issuance and promotional activities;
the financial risk strategies are used to probe enterprise B's local risk-control environment and collect status; the marketing customer-acquisition strategies are sent to enterprise A, which is responsible for collecting status.
2. The method of claim 1, wherein the model design of enterprise A in step S1 comprises:
designing the teacher model as n layers of Transformers, each Transformer layer of the teacher model taking the hidden states of the previous layer as input and outputting the corresponding hidden states through multi-head attention.
3. The method of claim 1, wherein the model design of enterprise B in step S1 comprises:
designing a multi-head teacher model for enterprise B, the teacher model being used only for distilling the local student model and specifically including a teacher model for risk-assessment classification.
4. The method of claim 1, wherein the model design of enterprise A in step S1 further comprises:
cleaning the behavior data of enterprise A, the behavior data specifically including the number of user clicks and page-element dwell time, wherein statistical-level feature engineering is required when the volume of user behavior data is too large.
5. The financial model knowledge distillation method of claim 1, wherein step S3 comprises the following steps:
performing inference prediction with the teacher inference model of enterprise A, and performing data enhancement on the student model of enterprise B in hard-label mode.
6. A financial model knowledge distillation apparatus based on reinforcement learning, comprising:
a configuration module: the method is used for carrying out model design on the enterprise A and the enterprise B;
a training module: pre-training, distilling and initializing the student models of the enterprise A, building the pre-trained, distilled and initialized student models in a server of the enterprise B, and performing distillation training again;
the data enhancement module: reasoning and predicting through a teacher reasoning model of the enterprise A, and performing data enhancement on a student model of the enterprise B through a reasoning result;
the training module comprises the following specific steps: building a pre-trained, distilled and initialized student model in a server of an enterprise B, carrying out distillation training on the student model of the enterprise B through distilled knowledge of teacher models of the enterprise A and the enterprise B, deploying the trained student model of the enterprise B in the enterprise A, and carrying out prediction reasoning considering batch approval rate and advertisement response rate of the enterprise B;
the specific steps of the training module further comprise: based on an Actor-Critic method, a student model of an enterprise B is used as an Actor, and the exploration of the generated behavior sequence is realized aiming at different scenes, wherein the behavior sequence comprises a financial risk class strategy and a marketing update class strategy; taking a teacher inference model of an enterprise A and a teacher model of an enterprise B as Critic, making behavior scores through the Critic based on action of Actor, selecting preset values according to the obtained scores, and updating the Critic and the Actor simultaneously according to the scores of the preset values;
the financial risk type strategy and the marketing pull-new type strategy specifically comprise the following steps:
financial risk class policy: derating, promoting, declining, and passing risk intervention strategies;
marketing pull new class strategy: designing a coupon issuing and activity free marketing strategy;
the financial risk strategy is used for stimulating the local wind control environment of the enterprise B and collecting status; the marketing pull class strategy is used for sending to the A enterprise and is responsible for collecting status.
CN202211373039.9A 2022-11-04 2022-11-04 Financial model knowledge distillation method and device based on reinforcement learning Active CN115423540B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211373039.9A CN115423540B (en) 2022-11-04 2022-11-04 Financial model knowledge distillation method and device based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211373039.9A CN115423540B (en) 2022-11-04 2022-11-04 Financial model knowledge distillation method and device based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN115423540A (en) 2022-12-02
CN115423540B (en) 2023-02-03

Family

ID=84208235

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211373039.9A Active CN115423540B (en) 2022-11-04 2022-11-04 Financial model knowledge distillation method and device based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN115423540B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021114974A1 (en) * 2019-12-14 2021-06-17 支付宝(杭州)信息技术有限公司 User risk assessment method and apparatus, electronic device, and storage medium
CN113887230A (en) * 2021-09-30 2022-01-04 北京熵简科技有限公司 Financial scene-oriented end-to-end natural language processing training framework and method
CN113947214A (en) * 2021-11-23 2022-01-18 湖南三湘银行股份有限公司 Client knowledge distillation-based federal learning implementation method
CN114787833A (en) * 2019-09-23 2022-07-22 普雷萨根私人有限公司 Distributed Artificial Intelligence (AI)/machine learning training system
CN114863092A (en) * 2022-04-29 2022-08-05 广州广电运通金融电子股份有限公司 Knowledge distillation-based federal target detection method and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11487944B1 (en) * 2019-12-09 2022-11-01 Asapp, Inc. System, method, and computer program for obtaining a unified named entity recognition model with the collective predictive capabilities of teacher models with different tag sets using marginal distillation
CN111767711B (en) * 2020-09-02 2020-12-08 之江实验室 Compression method and platform of pre-training language model based on knowledge distillation
US20220335303A1 (en) * 2021-04-16 2022-10-20 Md Akmal Haidar Methods, devices and media for improving knowledge distillation using intermediate representations

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
An Actor-Critic-Based Transfer Learning Framework for Experience-Driven Networking; Xu Zhiyuan et al.; IEEE/ACM Transactions on Networking; Feb. 28, 2021; vol. 29, no. 1; pp. 360-371 *
Diversity-driven knowledge distillation for financial trading using; Avraam Tsantekidis et al.; Neural Networks; Mar. 17, 2021; pp. 193-202 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant