CN115587535A - Model construction optimization method, device, storage medium, and program product

Info

Publication number: CN115587535A
Application number: CN202211204151.XA
Authority: CN (China)
Prior art keywords: model, data, projection, feature, coding
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 何元钦, 康焱
Assignee: WeBank Co Ltd
Application filed by WeBank Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/20 Design optimisation, verification or simulation
    • G06F 30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes

Abstract

The invention discloses a model construction optimization method, a device, a storage medium and a program product. In the method, the first participant equipment inputs each piece of first original feature data into a first feature projection model; receives each piece of second projection feature data sent by the second participant equipment and forms data pairs with each piece of first projection feature data; inputs the data pairs into a first coding model to obtain first coding feature data; and inputs the first coding feature data into a first classification model to obtain a classification result for each data pair, the classification result representing whether the projection feature data in the data pair correspond to the same sample. After at least one round of pre-training updates to each model, longitudinal federal learning is performed with the second participant equipment to obtain a target task model. The invention thus realizes model pre-training with unlabeled data, so that unlabeled data can participate in the longitudinal federal learning and the prediction accuracy of the model obtained by the longitudinal federal learning can be improved.

Description

Model construction optimization method, device, storage medium, and program product
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a model construction optimization method, device, storage medium, and program product.
Background
Longitudinal federal learning addresses the data-island problem faced when multiple parties jointly build a business model in the financial field. Its application scenario is one in which the users of the participants overlap heavily but their features overlap little: different participants hold information on different fields or from different angles for the same users. A typical situation is that, after user matching, the parties find that the proportion of overlapping users is high but the proportion of those users with labels is low, so few samples are actually usable for training; how to exploit the unlabeled data is therefore a problem of great practical significance. For example, in the field of risk control, a bank and an e-commerce institution can perform joint longitudinal federal learning on the different dimensions of user data each owns to obtain a model for predicting a user's repayment risk. However, the bank and the e-commerce institution have few labeled common users, that is, the amount of training data usable in the longitudinal federal learning process is small; the model cannot be trained sufficiently, and the accuracy of the repayment risk it predicts is consequently low.
Disclosure of Invention
The invention mainly aims to provide a model construction optimization method, a device, a storage medium and a program product that pre-train models with unlabeled data in a longitudinal federal learning scenario, so that the unlabeled data can participate in the longitudinal federal learning and the prediction accuracy of the model obtained by the longitudinal federal learning can be improved.
In order to achieve the above object, the present invention provides a model construction optimization method, which is applied to a first participant device participating in longitudinal federal learning, wherein the first participant device deploys a first feature projection model, a first coding model and a first classification model, and a second participant device participating in longitudinal federal learning deploys a second feature projection model, and the method includes the following steps:
respectively inputting the first original feature data of each sample in the first participant equipment into the first feature projection model for projection to obtain each piece of first projection feature data;
receiving each piece of second projection characteristic data sent by the second participant device and forming a data pair with each piece of first projection characteristic data, wherein each piece of second projection characteristic data is obtained by inputting second original characteristic data of each sample into the second characteristic projection model for projection by the second participant device;
inputting the data pair into the first coding model to be coded to obtain first coding feature data, wherein the data pair comprises a first data pair marked with a first label or a second data pair marked with a second label, the first data pair consists of a piece of first projection feature data and a piece of second projection feature data corresponding to the same sample, and the second data pair consists of a piece of first projection feature data and a piece of second projection feature data corresponding to different samples;
inputting the first coding feature data into the first classification model for classification to obtain a classification result corresponding to the data pair, wherein the classification result is used for representing whether two pieces of projection feature data in the data pair correspond to the same sample;
performing a round of pre-training updating on the first feature projection model, the first coding model and the first classification model according to an error between a corresponding label of the data pair and the classification result;
after at least one round of pre-training update is performed on each model, performing longitudinal federal learning with the second participant device based on the updated first feature projection model, first coding model and second feature projection model to obtain a target task model, wherein the second feature projection model is updated by the second participant device during the pre-training updates of the models.
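The following minimal sketch illustrates how these steps fit together in one pre-training round on the first participant device. It is an illustrative reading of the method, not the patent's implementation: PyTorch and linear layers stand in for the three models, and all names, shapes and the learning rate are assumptions.

```python
import torch
import torch.nn as nn

# Illustrative stand-ins; the patent leaves the concrete architectures open.
proj_model = nn.Linear(32, 16)                               # first feature projection model
coding_model = nn.Sequential(nn.Linear(32, 64), nn.ReLU())   # first coding model
classifier = nn.Linear(64, 1)                                # first classification model

optimizer = torch.optim.SGD(
    list(proj_model.parameters()) + list(coding_model.parameters())
    + list(classifier.parameters()), lr=0.01)

def pretrain_round(first_raw, second_proj, labels):
    """first_raw:   (B, 32) first original feature data held locally
    second_proj:    (B, 16) second projection feature data from the second participant
    labels:         (B, 1)  1.0 = first data pair (same sample), 0.0 = second data pair
    """
    first_proj = proj_model(first_raw)               # project the raw features
    pairs = torch.cat([first_proj, second_proj], 1)  # form the data pairs
    encoded = coding_model(pairs)                    # first coding feature data
    logits = classifier(encoded)                     # same-sample classification
    loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
    optimizer.zero_grad()
    loss.backward()                                  # one pre-training update
    optimizer.step()
    return loss.item()
```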
Optionally, the step of receiving each piece of second projection feature data sent by the second participant device and forming a data pair with each piece of the first projection feature data includes:
for a first target sample among the samples, calculating the sample distance between the first target sample and each of the other samples;
taking a preset number of samples with the smallest sample distance to the first target sample as second target samples, or taking the samples whose sample distance to the first target sample is smaller than a preset threshold as the second target samples;
adding the first projection feature data corresponding to each second target sample to the first projection feature data corresponding to the first target sample, so that the first target sample corresponds to a plurality of pieces of first projection feature data;
and combining each of the plurality of pieces of first projection feature data corresponding to the first target sample with the second projection feature data corresponding to the first target sample to form the first data pairs, and labeling each first data pair with the first label.
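A sketch of this neighbour-based pair expansion, under the assumption that projection features are row tensors and that Euclidean distance in projection space serves as the sample distance (one of several options the description allows); the function name and k are illustrative:

```python
import torch

def expand_first_pairs(first_proj, second_proj, k=3):
    """For each first target sample i, also pair the first projection features of
    its k nearest samples with second_proj[i], labelling all such pairs positive."""
    dist = torch.cdist(first_proj, first_proj)       # pairwise sample distances
    dist.fill_diagonal_(float("inf"))                # exclude the target sample itself
    neighbors = dist.topk(k, largest=False).indices  # k closest samples per row
    pairs, labels = [], []
    for i in range(first_proj.size(0)):
        variants = [first_proj[i]] + [first_proj[j] for j in neighbors[i]]
        for fp in variants:                          # several first-projection variants
            pairs.append(torch.cat([fp, second_proj[i]]))
            labels.append(1.0)                       # the first label: same sample
    return torch.stack(pairs), torch.tensor(labels)
```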
Optionally, when the first original feature data is table data including values of a corresponding sample under a plurality of data items, the first feature projection model includes a word list and a neural network, and the target original feature data is any one of the first original feature data;
the step of inputting the target original feature data into the first feature projection model for projection to obtain the first projection feature data corresponding to the target original feature data comprises:
selecting, from the embedded feature data recorded in the word list for the various values under a first data item, the embedded feature data corresponding to the value under the first data item in the target original feature data, as a first part of the projection feature data, wherein the first data item is a data item with discrete values;
inputting the values under each second data item in the target original feature data into the neural network for feature projection to obtain a second part of the projection feature data, wherein the second data items are data items with continuous values;
and obtaining the first projection feature data corresponding to the target original feature data according to the first part and the second part of the projection feature data.
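A compact sketch of such a projection model for table data, assuming PyTorch; the embedding table plays the role of the word list for discrete data items, and a linear layer stands in for the neural network for continuous data items (all sizes are illustrative):

```python
import torch
import torch.nn as nn

class TabularProjection(nn.Module):
    """Word list (embedding table) for discrete data items plus a small
    neural network for continuous data items."""
    def __init__(self, n_discrete_values, n_continuous, dim=8):
        super().__init__()
        self.word_list = nn.Embedding(n_discrete_values, dim)  # discrete values
        self.net = nn.Linear(n_continuous, dim)                # continuous values

    def forward(self, discrete_idx, continuous):
        part1 = self.word_list(discrete_idx)        # first part: embedding lookup
        part2 = self.net(continuous).unsqueeze(1)   # second part: NN projection
        return torch.cat([part1, part2], dim=1)     # combined projection feature data

model = TabularProjection(n_discrete_values=10, n_continuous=4)
out = model(torch.tensor([[2, 5]]), torch.randn(1, 4))  # shape (1, 3, 8)
```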
Optionally, when the first coding model is a coding model based on a self-attention mechanism, the first projection feature data and the second projection feature data each include a plurality of projection feature vectors, a target data pair is any one of the data pairs, and the step of inputting the target data pair into the first coding model for coding to obtain the first coding feature data corresponding to the target data pair includes:
forming a feature vector sequence by using each projection feature vector included in first target projection feature data and each projection feature vector included in second target projection feature data, wherein the first target projection feature data is the first projection feature data included in the target data pair, and the second target projection feature data is the second projection feature data included in the target data pair;
inputting the characteristic vector sequence into the first coding model for coding to obtain coding characteristic vectors corresponding to each vector in the characteristic vector sequence;
and obtaining the first coding feature data corresponding to the target data pair according to the coding feature vector.
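As an illustration of these three steps, the sketch below uses PyTorch's Transformer encoder (one of the self-attention encoders the description mentions) and mean-pools the per-position coding vectors into the pair's coding feature data; the pooling choice and all dimensions are assumptions:

```python
import torch
import torch.nn as nn

dim = 8
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=2, batch_first=True),
    num_layers=2)

first_proj = torch.randn(1, 3, dim)   # projection feature vectors of the target pair
second_proj = torch.randn(1, 4, dim)

sequence = torch.cat([first_proj, second_proj], dim=1)  # feature vector sequence
encoded = encoder(sequence)             # one coding feature vector per position
coding_feature = encoded.mean(dim=1)    # pooled first coding feature data
```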
Optionally, the second participant device further deploys a second coding model and a second classification model, and after the step of performing a round of pre-training update on the first feature projection model, the first coding model and the first classification model according to the error between the label corresponding to the data pair and the classification result, the method further includes:
before the next round of pre-training update is performed on each model, sending a first target coding layer in the updated first coding model to an aggregation device, so that the aggregation device aggregates the first target coding layer and a second target coding layer received from the second participant device to obtain a third target coding layer, wherein the first target coding layer is the nth coding layer in the updated first coding model, the second target coding layer is the nth coding layer in the updated second coding model, and n is greater than 1;
and receiving the third target coding layer sent by the aggregation device, and performing the next round of pre-training update on each model after replacing the first target coding layer in the first coding model with the third target coding layer.
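One plausible form of this layer exchange, assuming a FedAvg-style average on the aggregation device (the patent does not fix the aggregation rule) and simple two-layer coding models with identically shaped layers:

```python
import torch
import torch.nn as nn

def aggregate_layers(first_layer, second_layer):
    """On the aggregation device: average the nth-coding-layer parameters
    received from the two participants."""
    return {name: (first_layer[name] + second_layer[name]) / 2
            for name in first_layer}

first_coding = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8))   # first coding model
second_coding = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8))  # second coding model

first_target = first_coding[1].state_dict()    # nth coding layer, from participant 1
second_target = second_coding[1].state_dict()  # nth coding layer, from participant 2
third_target = aggregate_layers(first_target, second_target)
first_coding[1].load_state_dict(third_target)  # replace before the next round
```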
Optionally, after the step of inputting the first encoding characteristic data into the first classification model for classification to obtain a classification result corresponding to the data pair, the method further includes:
calculating an error between a corresponding label of the data pair and the classification result to obtain a first intermediate result for updating the second feature projection model;
and sending the first intermediate result to the second participant device, so that the second participant device performs a round of pre-training update on the second feature projection model by using the first intermediate result.
Optionally, the first participant device is further deployed with a prediction model, and the step of performing longitudinal federal learning with the second participant device based on the updated first feature projection model, the first coding model, and the second feature projection model to obtain a target task model includes:
inputting the first original feature data of each aligned sample in the first participant device into the updated first feature projection model for projection to obtain each piece of third projection feature data;
receiving each piece of fourth projection feature data sent by the second participant device, wherein the second participant device inputs second original feature data of each aligned sample into the updated second feature projection model for projection to obtain each piece of fourth projection feature data;
combining one piece of the third projection characteristic data and one piece of the fourth projection characteristic data corresponding to the same aligned sample, and inputting the combined data into the updated first coding model for coding to obtain second coding characteristic data corresponding to each aligned sample;
inputting each piece of second coding characteristic data into the prediction model respectively for prediction to obtain a prediction result corresponding to each aligned sample;
performing one round of longitudinal federal update on the first feature projection model, the first coding model and the prediction model according to the errors between the prediction labels corresponding to the aligned samples and the prediction results, and calculating a second intermediate result for updating the second feature projection model;
sending the second intermediate result to the second participant device, so that the second participant device performs a round of longitudinal federal update on the second feature projection model according to the second intermediate result;
and after at least one round of longitudinal federal update is carried out on each model, obtaining a target task model based on the updated first feature projection model, the second feature projection model, the first coding model and the prediction model.
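A sketch of one such longitudinal federal update round on the first participant device, reusing the PyTorch stand-ins from the pre-training sketch; the patent does not specify the exact content of the second intermediate result, so here it is assumed to be the gradient of the loss with respect to the received fourth projection feature data:

```python
import torch
import torch.nn as nn

proj_model = nn.Linear(32, 16)                               # pre-trained projection model
coding_model = nn.Sequential(nn.Linear(32, 64), nn.ReLU())   # pre-trained coding model
predictor = nn.Linear(64, 1)                                 # prediction model
optimizer = torch.optim.SGD(
    list(proj_model.parameters()) + list(coding_model.parameters())
    + list(predictor.parameters()), lr=0.01)

def federated_round(first_raw, fourth_proj, labels):
    """first_raw: aligned samples' first original feature data; fourth_proj:
    projection features received from the second participant; labels: prediction labels."""
    fourth_proj = fourth_proj.requires_grad_(True)
    third_proj = proj_model(first_raw)                 # third projection feature data
    combined = torch.cat([third_proj, fourth_proj], 1)
    encoded = coding_model(combined)                   # second coding feature data
    logits = predictor(encoded)                        # prediction results
    loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                   # one longitudinal federal update
    return fourth_proj.grad                            # assumed second intermediate result
```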
In addition, to achieve the above object, the present invention also provides a model construction optimization apparatus, including: a memory, a processor and a model building optimization program stored on the memory and executable on the processor, the model building optimization program when executed by the processor implementing the steps of the model building optimization method as described above.
Furthermore, to achieve the above object, the present invention further provides a computer readable storage medium having stored thereon a model building optimization program, which when executed by a processor implements the steps of the model building optimization method as described above.
Furthermore, to achieve the above object, the present invention also proposes a computer program product comprising a computer program which, when executed by a processor, implements the steps of the model building optimization method as described above.
In the invention, a first characteristic projection model, a first coding model and a first classification model are deployed in first participant equipment participating in longitudinal federated learning, and a second characteristic projection model is deployed in second participant equipment participating in longitudinal federated learning; the first participant equipment respectively inputs the first original characteristic data of each sample into the first characteristic projection model for projection to obtain each piece of first projection characteristic data; receiving each piece of second projection characteristic data sent by second participant equipment and forming a data pair with each piece of first projection characteristic data, wherein each piece of second projection characteristic data is obtained by inputting second original characteristic data of each sample into a second characteristic projection model by the second participant equipment for projection; inputting data pairs into a first coding model to be coded to obtain first coding characteristic data, wherein the data pairs comprise first data pairs marked with first labels or second data pairs marked with second labels, the first data pairs consist of a piece of first projection characteristic data and a piece of second projection characteristic data corresponding to the same sample, and the second data pairs consist of a piece of first projection characteristic data and a piece of second projection characteristic data corresponding to different samples; inputting the first coding feature data into a first classification model for classification to obtain a classification result corresponding to the data pair, wherein the classification result is used for representing whether two pieces of projection feature data in the data pair correspond to the same sample; performing one round of pre-training updating on the first characteristic projection model, the first coding model and the first classification model according to the error between the corresponding label and the classification result of the data pair; after at least one round of pre-training updating is carried out on each model, longitudinal federal learning is carried out on the first feature projection model, the first coding model and the second feature projection model which are updated and second participant equipment to obtain a target task model, wherein the second feature projection model is updated by the second participant equipment in the process of carrying out pre-training updating on each model.
Compared with performing longitudinal federal learning on a first feature projection model, a second feature projection model, a first coding model and a prediction model that are initialized randomly or according to manual experience, using only the labeled sample data of each participant, the invention adopts the pre-trained first feature projection model, second feature projection model and first coding model as the basis of the longitudinal federal learning. Because these models have learned some information related to the features of the samples, in the longitudinal federal learning stage their processing yields coding feature data that better help the prediction model obtain accurate prediction results, so the prediction accuracy of the target task model obtained by training can be improved, the duration of the longitudinal federal learning can be shortened, and the consumption of computing resources in the longitudinal federal learning stage can be reduced. In addition, the original feature data of unlabeled samples are applied to the longitudinal federal learning scenario to participate in model training, so the model prediction accuracy in the longitudinal federal learning scenario is improved on the basis of the original feature data of unlabeled samples. Moreover, in the pre-training process, no participant device directly sends the original feature data of its samples, so the privacy and security of each participant's data are also ensured.
Drawings
FIG. 1 is a schematic structural diagram of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a first embodiment of the model construction optimization method of the present invention;
FIG. 3 is a schematic diagram of a pre-training architecture according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a pre-training architecture according to an embodiment of the present invention.
The implementation, functional features and advantages of the present invention will be further described with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
As shown in fig. 1, fig. 1 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present invention.
It should be noted that, the model building optimization device in the embodiment of the present invention may be a smart phone, a personal computer, a server, and the like, and is not limited herein.
As shown in fig. 1, the model building optimization apparatus may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, a communication bus 1002. Wherein a communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may include a Display screen (Display), an input unit such as a Keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface, a wireless interface. The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory such as a disk memory. The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the configuration of the apparatus shown in FIG. 1 does not constitute a limitation of the model building optimization apparatus and may include more or fewer components than shown, or some components in combination, or a different arrangement of components.
As shown in FIG. 1, the memory 1005, which is a computer storage medium, may include an operating system, a network communication module, a user interface module, and a model construction optimization program. The operating system is a program that manages and controls the hardware and software resources of the device and supports the running of the model construction optimization program and other software or programs. In the device shown in FIG. 1, the user interface 1003 is mainly used for data communication with a client; the network interface 1004 is mainly used for establishing a communication connection with a server; and the processor 1001 may be configured to invoke the model construction optimization program stored in the memory 1005 and perform the operations described below in the embodiments of the model construction optimization method of the present invention.
Based on the structure, various embodiments of the model construction optimization method are provided.
Referring to fig. 2, fig. 2 is a schematic flow chart of a first embodiment of the model building optimization method of the present invention.
While a logical order is shown in the flow chart, in some cases the steps shown or described may be performed in an order different from that presented here. In this embodiment, the model construction optimization method is applied to the first participant device participating in the longitudinal federal learning. The first participant device is deployed at a first participant in the longitudinal federal learning, and the devices deployed at the other participants are referred to as second participant devices. The first participant device and the second participant device may each be a smart phone, a personal computer, a server or the like, which is not limited in this embodiment. In this embodiment, the model construction optimization method includes:
step S10, inputting the first original feature data of each sample in the first participant device into the first feature projection model respectively for projection to obtain each piece of first projection feature data;
participants participating in longitudinal federal learning generally have one participant (which may be referred to as a "data application") providing feature data and tag data and at least one participant (which may be referred to as a "data provider") providing only feature data (hereinafter referred to as "raw feature data" for distinction). In an application scenario of longitudinal federal learning, feature dimensions of original feature data of a data application party and a data provider are different, for example, images shot from different angles are acquired, and for example, acquired user data with different dimensions are acquired.
For convenience of description, the device deployed at the data application party is referred to as the data application device, and a device deployed at a data provider is referred to as a data provider device.
In a specific application scenario, a model task can be set as required, the model to be trained at each participant is designed according to the model task and deployed in that participant's device, and longitudinal federal learning is performed on the models to be trained to obtain the target task model. The model task refers to the purpose the model serves after modeling is completed, for example risk prediction or advertisement recommendation; that is, when the model task is risk prediction the trained target task model is used for risk prediction, and when the model task is advertisement recommendation the trained target task model is used for advertisement recommendation. The model task is not limited in this embodiment.
To facilitate pre-training the models to be trained with the original feature data of unlabeled samples, and thereby improve their training effect in the longitudinal federal learning phase, in this embodiment the models to be trained may include a feature projection model, a coding model and a prediction model deployed in the data application device, and a feature projection model deployed in each data provider device. The feature projection model in the data application device and the feature projection models in the data provider devices may be models of different types, or models of the same type with different structures.
The feature projection model is used for projecting the original feature data of a sample from the feature space to which they belong into another feature space, so that feature data (hereinafter referred to as "projection feature data") different in data form from the original feature data are obtained without changing the specific information contained in the original feature data. When the data application device or the data provider device projects its original feature data through its feature projection model and sends the resulting projection feature data to the other party's device, the other party's device can input the projection feature data into subsequent models for processing but cannot recover the corresponding original feature data, so the private information carried in the sample's original feature data is protected from being leaked to the other party. Which concrete model implements the feature projection model is not limited in this embodiment; it may be selected according to the data form of the original feature data. For example, when the original feature data is image data, the feature projection model may be a neural network whose parameters are updated during training; when the original feature data is text, the feature projection model may be a word list that records the embedded feature data (a vector or a matrix) corresponding to each word or phrase, and the embedded feature data may be updated during training.
The coding model is a model used to encode feature data so as to extract the information implicit in them that is useful for the model's prediction task. Which concrete model implements the coding model is not limited; for example, a multi-layer neural network may be used, or a self-attention model (e.g., the encoder in a Transformer model).
The model type of the prediction model can be set according to the prediction result required by the model task; for example, if the model task is binary classification the prediction model may be a binary classifier, and if the model task is multi-class classification the prediction model may be a multi-class classifier, the class to which a sample belongs being determined from the result output by the prediction model.
To facilitate pre-training the models to be trained with the original feature data of unlabeled samples, in this embodiment a classification model may further be deployed in the data application device. This classification model may be implemented as a binary classifier and is used to predict whether original feature data from the data application device and original feature data from a data provider device belong to the same sample; this classification task is used to pre-train the feature projection model and the coding model in the data application device, and may also pre-train the feature projection model in the data provider device. In one specific implementation, no additional model is deployed in the data provider device, in which case the feature projection model in the data provider device is pre-trained depending on the classification task in the data application device. Alternatively, the data provider device may also deploy a coding model and a classification model, the classification model likewise predicting whether original feature data from the data application device and from the data provider device belong to the same sample, and the feature projection model in the data provider device can then be pre-trained based on the data provider device's own classification task.
It should be noted that in the conventional longitudinal federal learning process, the parameters in the models are initialized randomly or according to manual experience, so the prediction accuracy of the model starts out low; with randomly initialized parameters it takes a long time of longitudinal federal training to bring the parameters to a state where the prediction accuracy is high, and if the overall direction of the initialized parameters is wrong, a model whose prediction accuracy meets the requirements may never be trained at all.
In this embodiment, it is proposed to pre-train the models before longitudinal federal learning. Pre-training refers to training the feature projection model and the coding model in the data application device and the feature projection model in the data provider device before formal federal learning is performed. That is, compared with performing longitudinal federal learning on models initialized randomly or according to manual experience using only the original feature data of labeled samples, this embodiment uses the pre-trained feature projection models and coding model as the basis of the longitudinal federal learning. Through pre-training, these models learn how to mine the links between the original feature data of the samples held by the participants; figuratively, instead of standing at the starting line, the pre-trained models have already run some distance from the starting line in the correct direction (the direction that improves prediction accuracy). In the longitudinal federal learning stage, feature projection is therefore performed with the pre-trained feature projection models and coding with the pre-trained coding model, which yields coding feature data that better help the prediction model produce accurate prediction results. This improves the training effect of the supervised task in the subsequent longitudinal federal learning stage, improves the prediction accuracy of the target task model obtained by training, shortens the duration of the longitudinal federal learning, and reduces the consumption of computing resources in the longitudinal federal learning stage.
In this embodiment, one of the participants of the longitudinal federal learning is referred to as a first participant, the participants other than the first participant are referred to as a second participant, the device for longitudinal federal learning deployed in the first participant is referred to as a first participant device, and the device for longitudinal federal learning deployed in the second participant is referred to as a second participant device. In this embodiment, the first participant device may be a data application device or a data provider device, and is not limited in this embodiment.
Hereinafter, the feature projection model, the coding model, and the classification model deployed in the first participant apparatus are referred to as a first feature projection model, a first coding model, and a first classification model, respectively, for distinction, and the feature projection model, the coding model, and the classification model deployed in the second participant apparatus are referred to as a second feature projection model, a second coding model, and a second classification model, respectively, for distinction.
It will be appreciated that, in particular embodiments, when the first participant device is the data application device, the second coding model and the second classification model need not be deployed in the second participant device; when the first participant device is a data provider device, the second coding model and the second classification model need to be deployed in the second participant device, but the pre-training of the first feature projection model and the first coding model in the first participant device need not rely on the second coding model and the second classification model in the second participant device.
In particular, the pre-training of the first feature projection model, the second feature projection model and the first coding model may comprise one or more rounds of updating of the respective models. The updating of the model in the pre-training phase is referred to as pre-training updating, and the updating of the model in the longitudinal federal learning phase is referred to as longitudinal federal updating for distinction. The following describes the pre-training process by taking a round of pre-training update as an example. In addition, updating the model refers to updating parameters in the model.
Before starting the pre-training, the parameters in the first feature projection model, the second feature projection model and the first coding model may be initialized randomly or empirically.
The first participant device may obtain, locally or remotely, the original feature data of each sample owned by the first participant (hereinafter referred to as "first original feature data" for distinction). One sample can correspond to at least one piece of sample data, and one piece of sample data comprises the sample's original feature data at each participant; the feature dimensions of the original feature data owned by the different participants differ and can be regarded as describing the sample from different angles. For example, in one embodiment one participant is a bank and the other is an e-commerce institution: the bank owns the user's (sample's) financial business data, such as deposit amount and loan amount, while the e-commerce institution owns the user's purchase record data, such as the types and amounts of commodities purchased, and the model task may be to predict the user's repayment risk using both the financial business data and the purchase record data.
The samples used in the pre-training phase may be aligned samples, i.e., samples common to all parties for which each party holds different feature data, or unaligned samples. Samples without label data are referred to as unlabeled samples; since the pre-training process does not require the samples' label data, the original feature data of unlabeled samples can be used for pre-training.
After the first participant device acquires the first original feature data of each sample, the first participant device may input each piece of the first original feature data to the first feature projection model for projection, so as to obtain projection feature data corresponding to each sample (hereinafter referred to as "first projection feature data" for distinction).
The data format of the first projection feature data may be determined according to the data format of the input data of the first coding model, and is not limited in this embodiment. For example, when the data form of the input data of the first coding model is a vector, the first projection feature data may be a vector, and when the data form of the input data of the first coding model is a vector sequence (composed of a plurality of vectors), the first projection feature data may be a vector sequence.
For example, in an embodiment, when the first raw feature data is image data, the first feature projection model may be implemented by using a neural network, and when the data form of the input data of the first coding model is a vector, the data form of the output data of the neural network may also be set as a vector, and the first participant device inputs the first raw feature data into the neural network, and outputs a vector, that is, the first projection feature data, through the action of each neuron of the neural network.
Step S20, receiving each piece of second projection feature data sent by the second participant device, and forming a data pair with each piece of the first projection feature data, where each piece of the second projection feature data is obtained by inputting second original feature data of each sample into the second feature projection model by the second participant device for projection;
the second participant device may obtain, either locally or remotely, raw feature data for each sample owned by the second participant (hereinafter referred to as "second raw feature data" to distinguish). It will be appreciated that the feature dimensions of the second raw feature data are different from the feature dimensions of the first raw feature data. After the second participant device acquires the second original feature data of each sample, the second participant device may input each piece of the second original feature data to the second feature projection model for projection, so as to obtain projection feature data corresponding to each sample (hereinafter referred to as "second projection feature data" for distinction). It should be noted that the intersection of each sample of the first party and each sample of the second party is not empty, that is, there is first projection feature data and second projection feature data from the same sample. The second characteristic projection model is different from the first characteristic projection model in model type, or parameters in the model are different when the same type of model is sampled. For example, when the first feature projection model and the first feature projection model are both neural networks, parameters in the two neural networks are different, and for example, when the first feature projection model and the second feature projection model both sample word lists, embedded feature data corresponding to the same word or word in the two word lists are different. The specific implementation manner of inputting the second original feature data into the second feature projection model for projection to obtain the second projection feature data may refer to the specific implementation manner of inputting the first original feature data into the first feature projection model for projection to obtain the first projection feature data, which is not described in this embodiment again.
After obtaining the second projection feature data corresponding to each sample, the second participant device may send each piece of the second projection feature data to the first participant device.
It can be understood that, since the second projection feature data are obtained by projecting the second original feature data through the second feature projection model, they still carry the information carried by the second original feature data, but their data form is changed compared with the second original feature data. The first participant device therefore cannot obtain the private information carried in the second original feature data by analyzing the second projection feature data, so the private data of the samples owned by each party are not leaked to other participants during pre-training, ensuring data security in the pre-training process.
Step S30, inputting the data pair into the first coding model to obtain first coding feature data, where the data pair includes a first data pair labeled with a first label or a second data pair labeled with a second label, the first data pair is composed of one piece of the first projection feature data and one piece of the second projection feature data corresponding to the same sample, and the second data pair is composed of one piece of the first projection feature data and one piece of the second projection feature data corresponding to different samples;
the first participant device may combine the first projection characteristic data and the second projection characteristic data to obtain a plurality of data pairs, where each data pair is composed of one piece of first projection characteristic data and one piece of second projection characteristic data. And dividing the data pair into a first data pair and a second data pair according to sample attribution of two pieces of projection feature data in the data pair, wherein the first projection feature data and the second projection feature data in the first data pair correspond to the same sample (namely from aligned samples), and the first projection feature data and the second projection feature data in the second data pair correspond to different samples (namely from unaligned samples). The first participant device labels the first data pair with a first label and labels the second data pair with a second label. Wherein the first label indicates that the two projection feature data in the data pair are from the same sample, and the second label indicates that the two projection feature data in the data pair are from different samples.
It should be noted that the first participant device and the second participant device may perform sample alignment in advance, for example, perform sample alignment by using an encrypted sample alignment technique, and determine aligned samples in each sample. The feature data of the aligned samples may be identified with the same ID, so that the first participant device can identify whether the feature data is from the same sample according to the ID of the feature data.
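In other words, once samples are aligned the labeling reduces to an ID comparison; a trivial sketch (the IDs and the label encoding are illustrative):

```python
def pair_label(first_id, second_id):
    """Aligned samples share the same ID, so a data pair gets the first label
    (same sample) exactly when the IDs of its two projection features match."""
    return 1.0 if first_id == second_id else 0.0

assert pair_label("u42", "u42") == 1.0  # first data pair (aligned sample)
assert pair_label("u42", "u17") == 0.0  # second data pair
```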
In a specific embodiment, there may be one or more first projection feature data and second projection feature data corresponding to one sample, and when one sample corresponds to multiple pieces of first projection feature data or multiple pieces of second projection feature data, the first projection feature data and the second projection feature data may be combined in a permutation and combination manner to obtain multiple first data pairs.
The manner of obtaining a plurality of pieces of first projection feature data corresponding to one sample is not limited here. For example, in an embodiment, the first participant device may obtain them by data enhancement: for a target sample, the first participant device performs data enhancement on the target sample's first original feature data to obtain multiple pieces of enhanced first original feature data, and inputs each enhanced piece into the first feature projection model for projection, obtaining multiple pieces of first projection feature data corresponding to the target sample. The data enhancement manner is not limited in this embodiment; for example, when the first original feature data is image data, the enhancement may be performed by rotating the image, as in the sketch below. Similarly, multiple pieces of second projection feature data corresponding to a sample may be obtained by the second participant device through data enhancement, which is not repeated here.
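For the image case, the rotation-based enhancement could look like the following sketch, assuming torchvision for the rotation; the angles and the linear projection model are illustrative:

```python
import torch
import torch.nn as nn
import torchvision.transforms.functional as TF

proj_model = nn.Linear(28 * 28, 16)  # illustrative neural-network projection model

def augmented_projections(image, angles=(0, 90, 180, 270)):
    """Rotate the image to obtain several enhanced copies of the first original
    feature data, then project each copy, so one sample yields multiple pieces
    of first projection feature data."""
    enhanced = [TF.rotate(image, float(a)) for a in angles]
    return torch.stack([proj_model(x.flatten()) for x in enhanced])

projections = augmented_projections(torch.randn(1, 28, 28))  # shape (4, 16)
```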
The first participant device inputs each data pair into the first coding model for coding, obtaining the coding feature data corresponding to each data pair (hereinafter referred to as "first coding feature data" for distinction).
In a specific embodiment, when the first participant device inputs a data pair into the first coding model, the first projection feature data and the second projection feature data in the pair may first be spliced, combined arithmetically, or otherwise processed into a form that matches the input of the first coding model. For example, in an embodiment where the projection feature data are vectors and the first coding model expects a vector of length n, the first projection feature data and the second projection feature data may be spliced or averaged with weights into one vector, which is then padded or clipped to length n before being input into the first coding model.
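The splicing-plus-padding variant of this preprocessing might look as follows (a sketch; zero padding and truncation are assumptions about the unspecified details):

```python
import torch

def to_model_input(first_proj, second_proj, n):
    """Splice the two projection vectors of a data pair, then zero-pad or clip
    the result to the coding model's input length n."""
    spliced = torch.cat([first_proj, second_proj])
    if spliced.numel() < n:
        spliced = torch.nn.functional.pad(spliced, (0, n - spliced.numel()))
    return spliced[:n]

x = to_model_input(torch.randn(10), torch.randn(7), n=16)  # length-16 vector
```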
Step S40, inputting the first coding feature data into the first classification model for classification to obtain the classification result corresponding to the data pair, wherein the classification result is used for representing whether the two pieces of projection feature data in the data pair correspond to the same sample;
after obtaining each piece of first encoding characteristic data, the first participant device may input each piece of first encoding characteristic data to the first classification model for classification, so as to obtain a classification result corresponding to each data pair. The classification result is used for representing whether two pieces of projection characteristic data in the corresponding data pair correspond to the same sample. It should be noted that, when the pre-training is just started, the classification result is not necessarily accurate, and the pre-training is performed for at least one round of pre-training update on each model, so that the error between the classification result of the data pair and the label marked by the data pair is smaller and smaller, that is, the classification result is more and more accurate, in this process, each model can learn how to accurately mine the relevant features between two pieces of projection feature data, and then accurately judge whether the two pieces of projection feature data come from the same sample based on the features.
Step S50, performing one round of pre-training updating on the first feature projection model, the first coding model and the first classification model according to the error between the corresponding label of the data pair and the classification result;
after obtaining the classification result corresponding to each data pair, the first participant device may calculate an error between the label of the data pair and the classification result, and update each parameter in the first feature projection model, the first coding model, and the first classification model to update the first feature projection model, the first coding model, and the first classification model for the purpose of reducing the error.
While the first participant device performs a round of pre-training update on the first feature projection model, the first coding model and the first classification model, the second participant device also performs at least one round of pre-training update on the second feature projection model. The manner in which the second participant device pre-trains the second feature projection model is not limited in this embodiment. For example, in an embodiment, a second coding model and a second classification model may be deployed in the second participant device, the first participant device sends the first projection feature data of each sample to the second participant device, and the second participant device updates the second feature projection model, the second coding model and the second classification model in the same manner as the first participant device, which is not repeated here.
The parameter updates may specifically be calculated by a gradient descent algorithm. Specifically, a loss function characterizing the error between the labels of the data pairs and the classification results may be computed; the loss function may be a cross-entropy loss function or a binary cross-entropy loss function, which is not limited in this embodiment. Gradient values of the loss function with respect to the parameters of the first feature projection model, the first coding model and the first classification model are then calculated, and those gradient values are used to update the corresponding parameters.
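A single manual gradient-descent step of the kind described, using the binary cross-entropy loss on one parameter tensor (the learning rate and shapes are illustrative):

```python
import torch

w = torch.randn(4, requires_grad=True)        # a parameter of one of the models
x, label = torch.randn(4), torch.tensor(1.0)  # one data pair's features and label

logit = w @ x
loss = torch.nn.functional.binary_cross_entropy_with_logits(logit, label)
loss.backward()                               # gradient of the loss w.r.t. w
with torch.no_grad():
    w -= 0.01 * w.grad                        # gradient-descent parameter update
```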
And step S60, after at least one round of pre-training update is performed on each model, performing longitudinal federal learning with the second participant device based on the updated first feature projection model, first coding model and second feature projection model to obtain a target task model, wherein the second feature projection model is updated by the second participant device in the process of pre-training updates to the models.
After a round of pre-training update is performed on the models (including the first feature projection model, the second feature projection model, the first coding model and the first classification model), whether the condition for ending pre-training is met is detected. If it is not met, the next round of pre-training update is performed on the basis of the updated models; if it is met, the first participant device may perform longitudinal federal learning with the second participant device based on the first feature projection model, first coding model and second feature projection model as updated in that round, to obtain the target task model.
The condition for ending pre-training may be that the number of pre-training update rounds reaches a preset number, that the duration of pre-training reaches a preset duration, that the error between the labels of the data pairs and the classification results converges, or the like.
The process of performing the longitudinal federal learning based on the updated first feature projection model, the updated first coding model and the updated second feature projection model may refer to a conventional longitudinal federal learning process, which is not described in detail in this embodiment.
In this embodiment, the principle by which the above process achieves the purpose of pre-training is analyzed as follows: in the longitudinal federal learning scenario, the original feature data of the same sample at different participants are correlated. Specifically, the original feature data of a sample at the various participants are often acquired in different scenarios and characterize different aspects of the sample, or different behaviors of the sample (when the sample is a user). The characteristics and behaviors of one sample across different scenarios are consistent, while those of different samples are discriminable. Therefore, by using the judgment of whether original feature data from different participants belong to the same sample as a pre-training task, the models can learn features with high discrimination during pre-training. For example, suppose sample 1 and sample 2, with labels 0 and 1 respectively, have original feature data at both participant A and participant B. Through pre-training, the model can to a certain extent distinguish that the original feature data of sample 1 at participant A and the original feature data of sample 2 at participant B do not belong to the same sample, which means the model has learned some information related to the features of the samples rather than having no feature extraction capability at all; that is, compared with a model initialized randomly or according to manual experience, the pre-trained model has already run some distance from the starting line in the correct direction. Using the pre-trained models for the subsequent longitudinal federal learning improves the training effect of the supervised modeling therein. Meanwhile, the large amount of original feature data of unaligned and unlabeled samples increases the samples available for training, which is another advantage of pre-training.
Therefore, compared with performing longitudinal federal learning, using only the labeled sample data of the participants, on a first feature projection model, a second feature projection model, a first coding model and a prediction model that are initialized randomly or according to manual experience, this embodiment uses the pre-trained first feature projection model, second feature projection model and first coding model as the basis of the longitudinal federal learning. Because these models have already learned some information related to the characteristics of the samples, their processing in the longitudinal federal learning stage can produce coding feature data that better help the prediction model obtain accurate prediction results, so that the prediction accuracy of the trained target task model is improved, the time required for the longitudinal federal learning is shortened, and the consumption of computing resources in the longitudinal federal learning stage is reduced. In addition, the raw feature data of unlabeled samples are brought into the longitudinal federal learning scenario to participate in model training, so that the model prediction accuracy in this scenario is improved on the basis of such data. Moreover, since no participant device directly sends the raw feature data of its samples during pre-training, the privacy and security of the data at each participant are also guaranteed.
Further, in an embodiment, the step S20 includes:
Step S201, for a first target sample among the samples, calculating the sample distance between the first target sample and each of the other samples, that is, the samples other than the first target sample;
In order to further improve the training efficiency and training effect of the pre-training stage, shorten the pre-training duration, and reduce the computing resource consumption of the pre-training stage, in this embodiment the amount of pre-training data may be expanded by increasing the number of first data pairs.
Specifically, taking one of the samples of the first participant device as an example (hereinafter referred to as the first target sample for distinction), the sample distance between the first target sample and each of the other samples, that is, the samples other than the first target sample, may be calculated.
The manner of calculating the sample distance between two samples is not limited in this embodiment. For example, in one embodiment, the similarity between the first original feature data of the two samples may be calculated as their sample distance; in another embodiment, the similarity between the first projection feature data of the two samples may be calculated as their sample distance. The way of calculating the similarity between feature data is likewise not limited in this embodiment; for example, when the feature data are vectors, a Euclidean distance, a Manhattan distance, or the like may be used.
Step S202, taking a preset number of samples with the smallest sample distance to the first target sample as second target samples, or taking the samples whose sample distance to the first target sample is smaller than a preset threshold as second target samples;
The first participant device takes, from among the other samples, the preset number of samples with the smallest sample distance to the first target sample as the second target samples. The preset number may be set as needed and is not limited in this embodiment; for example, with a preset number of 10, the 10 samples among the other samples closest to the first target sample are taken as the second target samples.
Alternatively, the first participant device may take, as the second target samples, those of the other samples whose sample distance to the first target sample is smaller than a preset threshold. The preset threshold may be set as needed and is not limited in this embodiment.
Step S203, adding the first projection feature data corresponding to each second target sample to the first projection feature data corresponding to the first target sample, to obtain multiple pieces of first projection feature data corresponding to the first target sample;
Because the sample distance between a second target sample and the first target sample is small, the second target sample can, to a certain extent, be regarded as the same sample as the first target sample, and its first projection feature data can be regarded as coming from the same sample. The first projection feature data corresponding to each second target sample is therefore added to the first projection feature data corresponding to the first target sample. That is, the first target sample then corresponds to multiple pieces of first projection feature data, one derived from the first original feature data of the first target sample itself and the others derived from the first original feature data of the second target samples.
Step S204, combining each of the multiple pieces of first projection feature data corresponding to the first target sample with the second projection feature data corresponding to the first target sample into first data pairs, and labeling each of these first data pairs with the first label.
The first participant device combines each piece of first projection feature data corresponding to the first target sample with the second projection feature data corresponding to the first target sample into a first data pair, and labels each such first data pair with the first label. In this way, the number of first data pairs labeled with the first label can be expanded, thereby improving the training efficiency and training effect of the pre-training stage, shortening the pre-training duration, and reducing the consumption of computing resources in that stage.
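As an illustration only, the following Python sketch shows one way steps S201 to S204 could be implemented. The function name, the use of Euclidean distance on the first projection feature data (one of the options named above), and encoding the first label as 1 are all assumptions made for the example, not requirements of the method.

```python
import numpy as np

def expand_first_data_pairs(first_proj, second_proj, k=10):
    # first_proj:  (N, d1) first projection feature data, one row per sample
    # second_proj: (N, d2) second projection feature data for the same samples
    pairs, labels = [], []
    for i in range(len(first_proj)):
        # S201: sample distance, here the Euclidean distance between
        # first projection feature data.
        dists = np.linalg.norm(first_proj - first_proj[i], axis=1)
        dists[i] = np.inf                  # exclude the first target sample itself
        # S202: the preset number of nearest samples become second target samples.
        neighbors = np.argsort(dists)[:k]
        # S203/S204: the target sample's own pair plus one pair per neighbor,
        # each labeled with the first label ("same sample").
        for j in (i, *neighbors):
            pairs.append((first_proj[j], second_proj[i]))
            labels.append(1)
    return pairs, labels
```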
Further, based on the first embodiment described above, a second embodiment of the model construction optimization method of the present invention is provided. In this embodiment, when the first original feature data is table data including the values of the corresponding sample under multiple data items, the data items may include both discrete-valued and continuous-valued data items, and how to set the first feature projection model for the two types of data items becomes a problem to be solved. For this problem, this embodiment uses a neural network combined with a vocabulary as the first feature projection model, and projects the first original feature data based on this model to obtain the first projection feature data. In addition, the second feature projection model may be implemented with reference to this embodiment of the first feature projection model.
For convenience of description, this embodiment is described by taking any one piece of the first original feature data as an example; for distinction, this piece of first original feature data is referred to as the target original feature data.
In step S10, inputting the target original feature data into the first feature projection model for projection to obtain the first projection feature data corresponding to the target original feature data includes:
Step S101, selecting, from the embedded feature data corresponding to the various values under the first data items in the vocabulary, the embedded feature data corresponding to the values under the first data items in the target original feature data as a first part of projection feature data, wherein the first data items are data items with discrete values;
Here, data items with discrete values are called first data items, and data items with continuous values are called second data items. A value being discrete means that the possible values are discontinuous, specific discrete values; for example, a gender data item is discrete-valued, while a deposit-amount data item is continuous-valued.
For each first data item, the vocabulary may store embedded feature data corresponding to each possible value of that first data item, with different values having different embedded feature data. Each piece of embedded feature data in the vocabulary may be initialized according to manual experience or initialized randomly; the updating of the first feature projection model during pre-training specifically includes updating each piece of embedded feature data. The data form of the embedded feature data is not limited in this embodiment; for example, when the input of the first coding model is a vector sequence, the embedded feature data may be vectors.
The first participant device may select, from the embedded feature data corresponding to the various values in the first data item in the vocabulary, embedded feature data corresponding to the value in the first data item in the target original feature data, and use the selected embedded feature data as the first partial projection feature data corresponding to the target original feature data. It can be understood that, when the embedded feature data is a vector and there are a plurality of first data items, the first part of the projection feature data is a vector sequence composed of embedded feature data corresponding to values of the respective first data items.
Step S102, inputting the values under each second data item in the target original feature data into the neural network for feature projection to obtain a second part of projection feature data, wherein the second data items are data items with continuous values;
The number of neurons in the neural network and their connection structure are not limited in this embodiment and may be set according to specific needs. The network parameters in the neural network may be initialized according to manual experience or initialized randomly; the updating of the first feature projection model during pre-training specifically includes updating each network parameter.
The first participant device can input the values under each second data item in the target original feature data to the neural network together for feature projection, and the obtained result is used as second part of projection feature data corresponding to the target original feature data.
The data format of the second part of projection feature data (that is, of the output data of the neural network) is not limited in this embodiment. For example, when the input of the first coding model is a vector sequence, the second part of projection feature data may itself be a vector sequence, or may be a single vector that the first participant device cuts into a vector sequence.
Step S103, obtaining the first projection feature data corresponding to the target original feature data according to the first part of projection feature data and the second part of projection feature data.
After the first participant device obtains the first part and the second part of projection feature data corresponding to the target original feature data, it may obtain the first projection feature data corresponding to the target original feature data from them, for example by splicing, computation, or other processing of the two parts; this embodiment does not limit the specific way in which the first projection feature data is obtained from the first part and the second part of projection feature data.
In an embodiment, when the data format of the input data of the first coding model is a vector sequence, the data formats of the first projection feature data and the second projection feature data may also be set to vector sequences; when a data pair is input into the first coding model, the two vector sequences of the first projection feature data and the second projection feature data in the data pair may be spliced and input into the first coding model. The length L of the vector sequence input to the first coding model may be set in advance as needed. The lengths of the two vector sequences of the first projection feature data and the second projection feature data are n1 and n2 respectively; n1 and n2 may be equal or unequal and may be set as required, subject to n1 + n2 being less than or equal to L. The first part of projection feature data and the second part of projection feature data may also take the form of vector sequences, and the first participant device splices them to obtain one vector sequence as the first projection feature data. Denote the number of first data items and the number of second data items by d_s and d_c respectively; the length of the vector sequence of the first part of projection feature data is then d_s. To keep the vector sequence dimensions balanced, the length K of the vector sequence of the second part of projection feature data should be d_c; however, considering that the length of the vector sequence of the first projection feature data is n1, K is capped accordingly (the formula appears as an image in the original; in effect K is chosen so that d_s + K does not exceed n1, i.e. K = min(d_c, n1 - d_s)). If the length of the vector sequence obtained by splicing the first part and the second part of projection feature data is smaller than n1, a set vector may be used for padding to obtain a vector sequence of length n1, which is used as the first projection feature data.
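To make the projection concrete, here is a minimal PyTorch sketch of such a first feature projection model, assuming one embedding table per discrete data item as the vocabulary, a single linear layer as the neural network for the continuous items, and K = min(d_c, n1 - d_s) as the cap discussed above; all names and layer choices are illustrative, not part of the method as claimed.

```python
import torch
import torch.nn as nn

class FeatureProjection(nn.Module):
    def __init__(self, vocab_sizes, d_cont, dim, n1):
        super().__init__()
        # Vocabulary: one embedding table per discrete (first) data item.
        self.vocab = nn.ModuleList([nn.Embedding(v, dim) for v in vocab_sizes])
        self.d_s = len(vocab_sizes)
        # Cap K so that d_s + K <= n1 (assumes n1 > d_s).
        self.K = min(d_cont, n1 - self.d_s)
        # Neural network for the continuous (second) data items.
        self.net = nn.Linear(d_cont, self.K * dim)
        self.pad = nn.Parameter(torch.zeros(dim))   # set vector used for padding
        self.n1, self.dim = n1, dim

    def forward(self, discrete, continuous):
        # discrete: (B, d_s) integer values; continuous: (B, d_cont) floats
        part1 = torch.stack([emb(discrete[:, i])
                             for i, emb in enumerate(self.vocab)], dim=1)
        part2 = self.net(continuous).view(-1, self.K, self.dim)
        seq = torch.cat([part1, part2], dim=1)      # splice the two parts
        if seq.size(1) < self.n1:                   # pad up to length n1
            pad = self.pad.expand(seq.size(0), self.n1 - seq.size(1), self.dim)
            seq = torch.cat([seq, pad], dim=1)
        return seq                                   # (B, n1, dim)
```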
Further, based on the first and/or second embodiments, a third embodiment of the model construction optimization method of the present invention is provided. In order to further improve the effect of the pre-training, that is, the ability of the pre-trained models to mine sample features, this embodiment adopts a coding model based on the self-attention mechanism as the first coding model; for example, the encoder in a Transformer model may be used as the first coding model. It should be noted that a coding model based on the self-attention mechanism can more effectively extract the information relating two input sequences, which makes it better suited to encoding the data pairs of the classification task in the pre-training stage, so that a better pre-training effect can be obtained.
When the first coding model is a coding model based on the self-attention mechanism, the data form of its input data may be set to a vector sequence of a certain length; inputting the vector sequence into the first coding model for coding yields coded vectors corresponding to the vectors in the sequence. Accordingly, the first projection feature data and the second projection feature data may each be a vector sequence including a plurality of vectors (hereinafter referred to as projection feature vectors for distinction).
For convenience of description, the present embodiment will be described by taking any one of the data pairs as an example, and the data pair will be referred to as a target data pair for the sake of distinction.
In the step S30, the step of inputting the target data pair into the first coding model to obtain the first coding feature data corresponding to the target data pair includes:
step S301, forming a feature vector sequence by using each projection feature vector included in first target projection feature data and each projection feature vector included in second target projection feature data, where the first target projection feature data is the first projection feature data included in the target data pair, and the second target projection feature data is the second projection feature data included in the target data pair;
the first projection feature data and the second projection feature data included in the target data pair are referred to as first target projection feature data and second target projection feature data, respectively, for distinction. The first participant device may combine each projection feature vector included in the first target projection feature data with each projection feature vector included in the second target projection feature data into a feature vector sequence. The combination is not limited in this embodiment. For example, assume that the first target projection feature data is represented as (v) 1 、v 2 、……、v n1 ) The second target projection feature data is represented as (v) 1 ’、v 2 ’、……、v n2 ') to a host; in one embodiment, the combined feature vector sequence may be (v) 1 、v 2 、……、v n1 、v 1 ’、v 2 ’、……、v n2 ') to a host; in another embodiment, a vector representing the start may be added in front of the first target projection feature data, a vector representing the separation may be added in the middle of the first target projection feature data and the second target projection feature data, and a vector representing the end may be added behind the second target projection feature data, for example, the combined feature vector sequence may be ([ CLS)]、v 1 、v 2 、……、v n1 、[PAD]、v 1 ’、v 2 ’、……、v n2 ’、[PAD])。
Step S302, inputting the characteristic vector sequence into the first coding model for coding to obtain coding characteristic vectors corresponding to each vector in the characteristic vector sequence;
The first participant device may input the combined feature vector sequence into the first coding model for coding to obtain the coded feature vectors corresponding to the respective vectors in the feature vector sequence. For example, when the feature vector sequence is represented as ([CLS], v_1, v_2, ..., v_n1, [PAD], v_1', v_2', ..., v_n2', [PAD]), coding yields the coded feature vectors corresponding to the respective vectors, represented as (E([CLS]), E(v_1), E(v_2), ..., E(v_n1), E([PAD]), E(v_1'), E(v_2'), ..., E(v_n2'), E([PAD])).
Step S303, obtaining the first encoding feature data corresponding to the target data pair according to the encoding feature vector.
The first participant device may obtain first encoding feature data corresponding to the target data pair according to the encoding feature vector. In a specific embodiment, the first participant device may obtain the first encoding feature data corresponding to the target data pair according to all the encoding feature vectors, or may obtain the first encoding feature data corresponding to the target data pair only by using one or a part of the encoding feature vectors. For example, in one embodiment, the first participant device may use the encoded feature vector corresponding to the first vector in the feature vector sequence as the first encoded feature data, such as E ([ CLS ]) as the first encoded feature data; in another embodiment, all the encoded feature vectors may be averaged, and the resulting vector may be used as the first encoded feature data.
For example, in one embodiment, as shown in fig. 3, the first participant device inputs raw feature data (the raw data in the figure) X into its feature projection model to obtain a feature vector sequence (v_1, v_2, ..., v_n); the second participant device inputs raw feature data X' into its feature projection model to obtain a feature vector sequence (v_1', v_2', ..., v_n2'). The first participant device splices (v_1, v_2, ..., v_n) and (v_1', v_2', ..., v_n2'), adds the vectors [CLS] and [SEP], and inputs the result into the coding model (in the figure, the encoder of a Transformer model) for coding; the coded vector corresponding to the first vector [CLS] (denoted [CLS] in the figure) is then input into the classification model for classification to obtain the classification result of whether the two sequences belong to the same user (sample).
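The flow on the first participant device can be sketched as follows in PyTorch. The hyperparameters, the learned [CLS]/[SEP] vectors, and the two-way linear classifier are illustrative assumptions rather than the only possible implementation.

```python
import torch
import torch.nn as nn

class PairClassifier(nn.Module):
    # First coding model (a Transformer encoder) plus the first
    # classification model operating on the [CLS] encoding.
    def __init__(self, dim=64, heads=4, layers=2):
        super().__init__()
        self.cls = nn.Parameter(torch.randn(1, 1, dim))   # [CLS] vector
        self.sep = nn.Parameter(torch.randn(1, 1, dim))   # [SEP] vector
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
        self.classifier = nn.Linear(dim, 2)   # same sample or not

    def forward(self, first_proj, second_proj):
        # first_proj: (B, n1, dim); second_proj: (B, n2, dim)
        b = first_proj.size(0)
        seq = torch.cat([self.cls.expand(b, -1, -1), first_proj,
                         self.sep.expand(b, -1, -1), second_proj], dim=1)
        encoded = self.encoder(seq)        # encode the spliced sequence
        cls_vec = encoded[:, 0]            # first coding feature data
        return self.classifier(cls_vec)    # classification result (logits)
```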
Further, based on the first, second and/or third embodiments, a fourth embodiment of the model construction optimization method of the present invention is provided. In this embodiment, a second coding model and a second classification model are additionally deployed on the second participant device, and the step S50 further includes:
Step S501, before the next round of pre-training update is performed on each model, sending the first target coding layer of the updated first coding model to an aggregation device, so that the aggregation device aggregates the first target coding layer with a second target coding layer received from the second participant device to obtain a third target coding layer, wherein the first target coding layer is the nth coding layer of the updated first coding model, the second target coding layer is the nth coding layer of the updated second coding model, and n is greater than 1;
in order to further improve the effect of the pre-training phase, in this embodiment, the second participant device may also deploy a second coding model and a second classification model, and the second participant device may also perform pre-training update on the second feature projection model, the second coding model, and the second classification model in the same manner as the first participant device.
Specifically, the first participant device may first send the first target coding layer in the updated first coding model to the aggregation device before performing the next round of pre-training update on each model. The aggregation device may be one of the first participant device and the second participant device, or may also be a third-party device other than the first participant device and the second participant device, which is not limited in this embodiment. The first target coding layer is an nth layer coding layer in the updated first coding model, and the first target coding layer is sent to the aggregation device, specifically, the model parameters in the first target coding layer may be sent to the aggregation device. Correspondingly, the second participant device may also send the second target coding layer in the updated second coding model to the aggregation device before performing the next round of pre-training update on each model, where the second target coding layer is the nth layer coding layer in the updated second coding model.
n is greater than 1, that is, the first participant device and the second participant device may send any coding layer of the respective coding models, except the first coding layer, to the aggregation device for aggregation, for example, send each coding layer except the first coding layer to the aggregation device for aggregation.
The aggregation device aggregates the received first target coding layer and the second target coding layer, where the aggregation may specifically be to average or weighted average model parameters in the first target coding layer and the second target coding layer, and an obtained result is referred to as a third target coding layer to show differentiation. The aggregation device sends the third target coding layer to the first participant device and the second participant device.
Step S502, receiving the third target coding layer sent by the aggregation device, and after updating the first target coding layer in the first coding model by using the third target coding layer, performing a next round of pre-training update on each model.
After receiving the third target coding layer sent by the aggregation device, the first participant device may update the first target coding layer in the first coding model by using the third target coding layer, that is, may update the model parameter corresponding to the first target coding layer in the first coding model by using the model parameter corresponding to the third target coding layer. After updating the first target encoding layer, the first participant device may perform a next round of pre-training update on each model. The second participant device may similarly update the second target coding layer in the second coding model using the third target coding layer, and then perform the next round of pre-training update on each model.
It should be noted that, in the first coding model, the closer a coding layer is to the input data, the more specific to that data the features it extracts are; conversely, the farther a coding layer is from the input data, the less data-specific its extracted features are. Using this property, this embodiment proposes that the coding layers far from the input data in the coding models of the participant devices (that is, the coding layers other than the first layer) can be aggregated, which further improves the ability to utilize unlabeled samples in the pre-training stage and thus further improves the pre-training effect.
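A minimal sketch of this aggregation, assuming the aggregation device simply averages the model parameters of the received target coding layers (a weighted average would work the same way) and that each participant's coding model exposes its stack of layers as encoder.layers, as the Transformer encoder sketched above does; all names are hypothetical.

```python
def aggregate_target_layer(state_a, state_b):
    # Aggregation device: average each model parameter of the nth coding
    # layers received from the two participants.
    return {name: (state_a[name] + state_b[name]) / 2 for name in state_a}

def sync_upper_layers(encoder, aggregated):
    # Participant side: `aggregated` maps a layer index n (n > 0, i.e. every
    # coding layer except the first) to its aggregated state dict, which is
    # loaded back before the next round of pre-training update.
    for n, layer in enumerate(encoder.layers):
        if n == 0:          # the first coding layer stays local
            continue
        layer.load_state_dict(aggregated[n])
```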
Further, in an embodiment, after the step S40, the method further includes:
Step A10, calculating, according to the error between the label corresponding to each data pair and the classification result, a first intermediate result for updating the second feature projection model;
step a20, sending the first intermediate result to the second participant device, so that the second participant device performs a round of pre-training update on the second feature projection model by using the first intermediate result.
In this embodiment, a method for the second participant device to pre-train and update the second feature projection model is provided, and this method can be applied to a scenario in which the second coding model and the second classification model are not deployed in the second participant device, and certainly, can also be applied to a scenario in which the second coding model and the second classification model are deployed in the second participant device.
Specifically, after obtaining the classification result corresponding to each data pair, the first participant device may calculate the error between the label of the data pair and the classification result, calculate from this error an intermediate result for updating the second feature projection model (hereinafter referred to as the first intermediate result for distinction), and send the first intermediate result to the second participant device, which uses it to perform a round of pre-training update on the second feature projection model. In more detail, a loss function characterizing the error between the labels of the data pairs and the classification results may be calculated; the loss function may be a cross-entropy loss function or a binary cross-entropy loss function, and is not specifically limited in this embodiment. The gradient of the loss function with respect to the second projection feature data is calculated and sent to the second participant device as the first intermediate result. The second participant device then uses a back-propagation algorithm to calculate, from the first intermediate result, the gradients of the loss function with respect to the model parameters in the second feature projection model, and updates those parameters accordingly, thereby completing one pre-training update of the second feature projection model.
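The following PyTorch sketch shows how the first intermediate result could be obtained with automatic differentiation, reusing the PairClassifier sketched earlier and assuming a cross-entropy loss; the received second projection feature data is treated as a leaf tensor so its gradient can be read off and returned.

```python
import torch.nn.functional as F

def pretrain_step(model, optimizer, first_proj, second_proj, labels):
    # second_proj arrives from the second participant; mark it as a leaf so
    # loss.backward() leaves its gradient in second_proj.grad. first_proj is
    # assumed to arrive still attached to the autograd graph of the first
    # feature projection model, whose parameters the optimizer also covers.
    second_proj = second_proj.detach().requires_grad_(True)

    logits = model(first_proj, second_proj)   # S30 + S40
    loss = F.cross_entropy(logits, labels)    # error between labels and results
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                          # S50: update the local models

    # A10/A20: the gradient w.r.t. the second projection feature data is the
    # first intermediate result; the second participant back-propagates it
    # through its own second feature projection model.
    return second_proj.grad
```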
Further, in an embodiment, when the second encoding model and the second classification model are deployed in the second participant device, the process of pre-training the first participant device and the second participant device may include the following two embodiments.
The first is training in alternating order. In the t-th round of pre-training update, after the first participant device obtains the second projection feature data sent by the second participant device, it constructs data pairs from that data and the first projection feature data computed in the current round, performs a round of pre-training update on the first feature projection model, the first coding model and the first classification model, and at the same time calculates the first intermediate result and sends it to the second participant device. The second participant device updates the second feature projection model according to the first intermediate result. The first participant device then computes first projection feature data with its updated first feature projection model and sends it to the second participant device. The second participant device constructs data pairs from this first projection feature data and the second projection feature data computed with its current second feature projection model, performs a round of pre-training update on the second feature projection model, the second coding model and the second classification model, and at the same time calculates an intermediate result for updating the first feature projection model and sends it to the first participant device, which updates the first feature projection model accordingly. This completes one round of pre-training update. The training effect of this approach is more stable.
The second is that each party starts its own training as soon as it obtains the other party's data. The first participant device and the second participant device each compute projection feature data with their current feature projection models and send it to the other party; each party constructs data pairs from the received projection feature data and its own projection feature data, updates its own models, and calculates the corresponding intermediate result and sends it back to the other party for updating that party's feature projection model. In this approach, neither party needs to wait for the other party's intermediate result, so the training efficiency is higher. A schematic sketch of the first, alternating-order variant is given below.
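Purely as an illustration of the orchestration, one alternating-order round could look like this; party_a and party_b are hypothetical wrappers around each device's models and its channel to the other device.

```python
def alternating_pretrain_round(party_a, party_b):
    proj_b = party_b.project()                    # B computes and sends projections
    grad_for_b = party_a.local_update(proj_b)     # A runs S10-S50 and A10
    party_b.apply_intermediate(grad_for_b)        # B updates its projection model

    proj_a = party_a.project()                    # A recomputes with updated model
    grad_for_a = party_b.local_update(proj_a)     # B updates its own three models
    party_a.apply_intermediate(grad_for_a)        # A updates its projection model
```

In the second variant, both project() calls would be issued first and the two local_update() calls would then run concurrently, which removes the waiting but lets each side train against slightly staler projections.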
In one embodiment, as shown in fig. 4, taking party A and party B as an example: party A inputs raw feature data (the raw data in the figure) X into its feature projection model to obtain a feature vector sequence (v_1, v_2, ..., v_n); party B inputs raw feature data X' into its feature projection model to obtain a feature vector sequence (v_1', v_2', ..., v_n2') and sends it to party A. Party A splices (v_1, v_2, ..., v_n) and (v_1', v_2', ..., v_n2'), adds the vectors [CLS] and [SEP], and inputs the result into the coding model (in the figure, the encoder of a Transformer model) for coding; the coded vector corresponding to the first vector [CLS] (denoted [CLS] in the figure) is input into the classification model for classification to obtain the classification result of whether the two sequences belong to the same user (sample). Party A calculates the gradient of the loss function with respect to (v_1', v_2', ..., v_n2') (shown as a gradient term in the figure) and returns it to party B, and party B uses this gradient to update its feature projection model. Party B also performs the same operations as party A. Party A and party B send the partial coding layers of the Transformer encoder far from the input data (denoted Transformer_up in the figure) to a third-party device for aggregation, and update their local coding layers with the aggregated coding layers.
Further, based on the first, second, third and/or fourth embodiments, a fifth embodiment of the model construction optimization method of the present invention is provided. In this embodiment, the longitudinal federal learning of step S60 includes:
step B10, inputting the first original feature data of each aligned sample in the first participant device into the updated first feature projection model for projection respectively to obtain each piece of third projection feature data;
In this embodiment, a specific implementation is provided in which the first participant device and the second participant device perform longitudinal federal learning based on the pre-trained models. Specifically, the first participant device and the second participant device may use the first and second original feature data of the aligned samples for the longitudinal federal learning. In this embodiment, the first participant device is the device of the participant that holds the label data of the aligned samples, and a prediction model for completing the model task of the longitudinal federal learning is also deployed on the first participant device. The implementation of the prediction model depends on the model task; for example, when the model task is to predict whether a user will be overdue on a repayment, the prediction model may be implemented as a binary classifier.
The longitudinal federal learning process can include one or more rounds of longitudinal federal update to the first feature projection model, the second feature projection model, the first coding model, and the prediction model. The following describes the process taking one round of longitudinal federal update as an example.
The first participant device inputs the first original feature data of each aligned sample to the first feature projection model updated through pre-training for projection, and the data obtained through projection is referred to as third projection feature data.
Step B20, receiving each piece of fourth projection feature data sent by the second participant device, wherein each piece of fourth projection feature data is obtained by the second participant device respectively inputting the second original feature data of each aligned sample into the updated second feature projection model for projection;
the second participant device also inputs the second original feature data of each aligned sample into the second feature projection model updated through pre-training, and performs projection, and the data obtained through projection is referred to as fourth projection feature data. The second participant device sends the fourth projected feature data to the first participant device.
Step B30, combining one piece of the third projection characteristic data and one piece of the fourth projection characteristic data corresponding to the same alignment sample, and inputting the combined data into the updated first coding model for coding to obtain second coding characteristic data corresponding to each alignment sample;
for each alignment sample, the first participant device combines the third projection feature data and the fourth projection feature data corresponding to the alignment sample, and inputs the combination into the first coding model after being pre-trained and updated to perform coding, so as to obtain coding feature data corresponding to the alignment sample (hereinafter referred to as second coding feature data for distinction). Each aligned sample may result in corresponding second encoding characteristic data.
It should be noted that the specific implementation manner of the first participant device inputting the third projection feature data and the fourth projection feature data into the first coding model in a combined manner is the same as the specific implementation manner of inputting the data pair obtained by combining the first projection feature data and the second projection feature data into the first coding model, and details are not repeated here.
Step B40, inputting each piece of second coding characteristic data into the prediction model respectively for prediction to obtain a prediction result corresponding to each aligned sample;
step B50, performing one round of longitudinal federal update on the first feature projection model, the first coding model and the prediction model according to the error between the prediction label corresponding to each aligned sample and the prediction result, and calculating to obtain a second intermediate result for updating the second feature projection model;
step B60, sending the second intermediate result to the second participant device, so that the second participant device performs a round of longitudinal federal update on the second feature projection model according to the second intermediate result;
and the first participant equipment respectively inputs each piece of second coding characteristic data into the prediction model for prediction, so that a prediction result corresponding to each aligned sample can be obtained.
And the first participant equipment carries out one round of longitudinal federal update on the first feature projection model, the first coding model and the prediction model according to the error between the prediction label (namely label data related to the model task) corresponding to each alignment sample and the prediction result. The first participant equipment further calculates an intermediate result (hereinafter referred to as a second intermediate result for showing distinction) for updating the second feature projection model according to an error between the prediction tag and the prediction result, sends the second intermediate result to the second participant equipment, and performs a round of longitudinal federal update on the second feature projection model by the second participant equipment according to the second intermediate result.
Specifically, the first participant device may calculate a loss function characterizing the error between the prediction labels of the aligned samples and the prediction results; the loss function may be a cross-entropy loss function or a binary cross-entropy loss function and is not specifically limited in this embodiment. The gradients of the loss function with respect to the parameters of the first feature projection model, the first coding model and the prediction model are calculated and used to update the corresponding parameters of these models. The gradient of the loss function with respect to the fourth projection feature data is calculated and sent to the second participant device as the second intermediate result. The second participant device then uses a gradient back-propagation algorithm to calculate, from the second intermediate result, the gradients of the loss function with respect to the model parameters of the second feature projection model, and updates those parameters with these gradients, thereby completing one round of longitudinal federal update of the second feature projection model.
And step B70, after at least one round of longitudinal federal update is carried out on each model, obtaining a target task model based on the updated first characteristic projection model, the updated second characteristic projection model, the updated first coding model and the updated prediction model.
After each model (including a first feature projection model, a second feature projection model, a first coding model and a prediction model) is subjected to a round of longitudinal federal update, whether a condition for ending longitudinal federal learning is met or not is detected; if the condition for finishing the longitudinal federal learning is not met, then the next round of longitudinal federal updating can be carried out on the basis of each updated model; if the condition for ending the longitudinal federal learning is met, the updated first feature projection model, the updated second feature projection model, the updated first coding model and the updated prediction model in the round can be used as target task models.
The condition for ending the longitudinal federal learning may be that the number of rounds of longitudinal federal update reaches a preset number, that the duration of the longitudinal federal learning reaches a preset duration, that the error between the prediction labels of the aligned samples and the prediction results converges, or the like.
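A compact sketch of one such round on the first participant device (steps B10 to B60), under the same assumptions as the earlier sketches and with the encoder taken to return the [CLS] encoding of the spliced sequence; first_raw is assumed to be the (discrete, continuous) inputs of the feature projection sketch above.

```python
import torch.nn.functional as F

def federated_update_round(proj_model, encoder, predictor, optimizer,
                           first_raw, fourth_proj, labels):
    # fourth_proj is received from the second participant; treat it as a leaf
    # so its gradient (the second intermediate result) can be returned.
    fourth_proj = fourth_proj.detach().requires_grad_(True)

    third_proj = proj_model(*first_raw)          # B10: project aligned samples
    encoded = encoder(third_proj, fourth_proj)   # B30: second coding feature data
    preds = predictor(encoded)                   # B40: prediction results

    loss = F.cross_entropy(preds, labels)        # error vs. prediction labels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                             # B50: update the local models

    return fourth_proj.grad                      # B60: second intermediate result
```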
Further, in an embodiment, the first participant may be a bank that possesses the business data generated when users (samples) transact business at the bank, which may specifically include deposit and withdrawal records, loan records, repayment records, and the like, and may also possess repayment risk labels of some users. The second participant may be an e-commerce institution that possesses the purchase record data generated when users purchase goods on its e-commerce platform, which may specifically include purchase amounts, payment methods, numbers of returns and exchanges, and the like. The first participant and the second participant may set a model task of predicting users' repayment risk based on the users' business data and purchase record data. A first feature projection model, a first coding model and a prediction model are deployed on the first participant device, and a second feature projection model is deployed on the second participant device; longitudinal federal learning is performed on the first feature projection model, the second feature projection model, the first coding model and the prediction model using the labeled business data and purchase record data of the common users, to obtain a repayment risk prediction model for predicting users' repayment risk. A first classification model is also deployed on the first participant device, and before the longitudinal federal learning is performed on the first feature projection model, the second feature projection model, the first coding model and the prediction model, the first participant device and the second participant device use the unlabeled business data (first original feature data) and purchase record data (second original feature data) of users to pre-train the first feature projection model, the second feature projection model and the first coding model. A round of the pre-training update process may include:
the first participant equipment inputs the service data of each user into the first characteristic projection model respectively to carry out projection to obtain each piece of first projection characteristic data;
the first participant equipment receives all the second projection characteristic data sent by the second participant equipment and forms data pairs with all the first projection characteristic data, wherein all the second projection characteristic data are obtained by respectively inputting the purchase record data of all the users into a second characteristic projection model by the second participant equipment for projection;
the first participant equipment inputs a data pair into a first coding model to be coded to obtain first coding characteristic data, wherein the data pair comprises a first data pair marked with a first label or a second data pair marked with a second label, the first data pair consists of a piece of first projection characteristic data and a piece of second projection characteristic data corresponding to the same sample, and the second data pair consists of a piece of first projection characteristic data and a piece of second projection characteristic data corresponding to different samples;
inputting the first coding feature data into the first classification model for classification to obtain a classification result corresponding to the data pair, wherein the classification result is used for representing whether two pieces of projection feature data in the data pair correspond to the same sample;
performing one round of pre-training updating on the first characteristic projection model, the first coding model and the first classification model according to the error between the corresponding label and the classification result of the data pair;
and after at least one round of pre-training updating is carried out on each model, longitudinal federal learning is carried out on the basis of the updated first feature projection model, the first coding model, the second feature projection model and the second participant equipment to obtain a repayment risk prediction model, wherein the second feature projection model is updated by the second participant equipment in the process of carrying out pre-training updating on each model.
In this embodiment, the pre-training of the first feature projection model, the second feature projection model and the first coding model is achieved using the unlabeled business data and purchase record data of the users common to the bank and the e-commerce institution. Compared with performing longitudinal federal learning, using only the labeled sample data held by the bank and the e-commerce institution, on a first feature projection model, a second feature projection model, a first coding model and a prediction model initialized randomly or according to manual experience, this embodiment uses the pre-trained first feature projection model, second feature projection model and first coding model as the basis of the longitudinal federal learning. Because the pre-trained models have learned how to mine information related to the users' characteristics, the situation can be pictured vividly: a feature projection model and coding model initialized randomly or according to manual experience stand on the starting line, whereas the pre-trained feature projection models and coding model have already moved some distance from the starting line in the correct direction (the direction of improving prediction accuracy). In the longitudinal federal learning stage they can therefore encode coding feature data that better help the prediction model obtain accurate prediction results, which improves the prediction accuracy of the repayment risk prediction model, helps shorten the longitudinal federal learning, and reduces the computing resources it consumes. In addition, the raw feature data of unlabeled samples participate in model training in the longitudinal federal learning scenario, so that the model prediction accuracy in this scenario is improved on the basis of such data. Moreover, since no participant device directly sends the raw feature data of its samples during pre-training, the privacy and security of the user data at the bank and the e-commerce institution are guaranteed.
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium on which a model construction optimization program is stored; when executed by a processor, the model construction optimization program implements the steps of the model construction optimization method as described above.
The invention also proposes a computer program product comprising a computer program which, when executed by a processor, implements the steps of the model building optimization method as described above.
The embodiments of the model construction optimization device, the computer-readable storage medium, and the computer program product of the present invention may refer to the embodiments of the model construction optimization method of the present invention, and are not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method of the foregoing embodiments may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but in many cases, the former is a better implementation. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A model construction optimization method is applied to a first participant device participating in longitudinal federal learning, the first participant device deploys a first feature projection model, a first coding model and a first classification model, and a second participant device participating in longitudinal federal learning deploys a second feature projection model, and the method comprises the following steps:
inputting first original feature data of each sample in the first participant device into the first feature projection model respectively for projection to obtain each piece of first projection feature data;
receiving each piece of second projection characteristic data sent by the second participant device and forming a data pair with each piece of first projection characteristic data, wherein each piece of second projection characteristic data is obtained by inputting second original characteristic data of each sample into the second characteristic projection model for projection by the second participant device;
inputting the data pair into the first coding model to be coded to obtain first coding feature data, wherein the data pair comprises a first data pair marked with a first label or a second data pair marked with a second label, the first data pair consists of one piece of first projection feature data and one piece of second projection feature data corresponding to the same sample, and the second data pair consists of one piece of first projection feature data and one piece of second projection feature data corresponding to different samples;
inputting the first coding feature data into the first classification model for classification to obtain a classification result corresponding to the data pair, wherein the classification result is used for representing whether two pieces of projection feature data in the data pair correspond to the same sample;
performing a round of pre-training update on the first feature projection model, the first coding model and the first classification model according to an error between a corresponding label of the data pair and the classification result;
and after at least one round of pre-training updating is carried out on each model, longitudinal federal learning is carried out on the first feature projection model, the first coding model, the second feature projection model and the second participant device after updating to obtain a target task model, wherein the second feature projection model is updated by the second participant device in the process of pre-training updating each model.
2. The model building optimization method of claim 1, wherein the step of receiving each piece of second projection feature data sent by the second participant device and forming a data pair with each piece of the first projection feature data comprises:
for a first target sample among the samples, calculating the sample distance between the first target sample and each of the other samples, that is, the samples other than the first target sample;
taking a preset number of samples with the smallest sample distance to the first target sample as second target samples, or taking the samples whose sample distance to the first target sample is smaller than a preset threshold as the second target samples;
newly adding the first projection characteristic data corresponding to each second target sample into first projection characteristic data corresponding to the first target sample to obtain a plurality of pieces of first projection characteristic data corresponding to the first target sample;
and respectively combining the plurality of pieces of first projection characteristic data corresponding to the first target sample and the second projection characteristic data corresponding to the first target sample into the first data pairs, and labeling the first label for each first data pair.
3. The model construction optimization method according to claim 1, wherein when the first raw feature data is tabular data including values of corresponding samples under a plurality of data items, the first feature projection model includes a vocabulary and a neural network, and the target raw feature data is any one of the first raw feature data;
the step of inputting the target original feature data into the first feature projection model for projection to obtain the first projection feature data corresponding to the target original feature data comprises the following steps:
selecting embedded characteristic data corresponding to values under the first data item in the target original characteristic data from embedded characteristic data corresponding to various values under the first data item in the word list as first part of projection characteristic data, wherein the first data item is a data item with discrete values;
inputting values under each second data item in the target original feature data into the neural network for feature projection to obtain second part of projection feature data, wherein the second data items are data items with continuous values;
and obtaining the first projection characteristic data corresponding to the target original characteristic data according to the first part of projection characteristic data and the second part of projection characteristic data.
4. The model construction optimization method of claim 1, wherein when the first coding model is a coding model based on the self-attention mechanism, the first projection feature data and the second projection feature data each include a plurality of projection feature vectors, a target data pair is any one of the data pairs, and the step of inputting the target data pair into the first coding model to obtain the first coding feature data corresponding to the target data pair comprises:
forming a feature vector sequence by using each projection feature vector included in first target projection feature data and each projection feature vector included in second target projection feature data, wherein the first target projection feature data is the first projection feature data included in the target data pair, and the second target projection feature data is the second projection feature data included in the target data pair;
inputting the characteristic vector sequence into the first coding model for coding to obtain coding characteristic vectors corresponding to each vector in the characteristic vector sequence;
and obtaining the first coding feature data corresponding to the target data pair according to the coding feature vector.
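Claim 4 concatenates the two parties' projection feature vectors into one sequence and encodes it with a self-attention model. A hedged sketch using PyTorch's standard Transformer encoder; mean pooling as the final reduction to a single coding feature vector is an assumption, since the claim leaves that step open:

```python
import torch
import torch.nn as nn

dim = 16
# Standard self-attention encoder standing in for the first coding model.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
    num_layers=2,
)

def encode_pair(first_vecs, second_vecs):
    # first_vecs: (batch, n1, dim); second_vecs: (batch, n2, dim)
    seq = torch.cat([first_vecs, second_vecs], dim=1)  # feature vector sequence
    coded = encoder(seq)        # one coding feature vector per sequence position
    return coded.mean(dim=1)    # pooled into the first coding feature data
```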
5. The model construction optimization method of claim 1, wherein a second coding model and a second classification model are further deployed on the second participant device, and the step of performing one round of pre-training updates on the first feature projection model, the first coding model, and the first classification model according to the error between the label corresponding to the data pair and the classification result further comprises:
before performing the next round of pre-training updates on each model, sending a first target coding layer in the updated first coding model to an aggregation device, so that the aggregation device aggregates the first target coding layer with a second target coding layer received from the second participant device to obtain a third target coding layer, wherein the first target coding layer is the n-th coding layer in the updated first coding model, the second target coding layer is the n-th coding layer in the updated second coding model, and n is greater than 1;
and receiving the third target coding layer sent by the aggregation device, updating the first target coding layer in the first coding model with the third target coding layer, and then performing the next round of pre-training updates on each model.
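Claim 5 has the aggregation device combine the two participants' n-th coding layers but does not fix the aggregation rule; a FedAvg-style parameter average is one common choice. A sketch under that assumption, with state-dict inputs and the load-back step shown as usage:

```python
import torch

def aggregate_layers(first_layer_state, second_layer_state):
    """Run on the aggregation device: average the n-th coding layer's
    parameters received from the two participants, producing the third
    target coding layer that is sent back to both."""
    return {
        key: (first_layer_state[key] + second_layer_state[key]) / 2
        for key in first_layer_state
    }

# Each participant then overwrites its own n-th layer before the next
# pre-training round (encoder and n are the local names assumed above):
# encoder.layers[n].load_state_dict(aggregate_layers(state_a, state_b))
```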
6. The model construction optimization method according to claim 1, wherein after the step of inputting the first coding feature data into the first classification model for classification to obtain the classification result corresponding to the data pair, the method further comprises:
calculating the error between the label corresponding to the data pair and the classification result to obtain a first intermediate result for updating the second feature projection model;
and sending the first intermediate result to the second participant device, so that the second participant device performs one round of pre-training updates on the second feature projection model using the first intermediate result.
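Claim 6 does not specify what the "first intermediate result" is, only that it is derived from the classification error and updates the second feature projection model; in split-learning style, the gradient of the loss with respect to the received projection features fits that description. A sketch under that assumption, with illustrative module and tensor names:

```python
import torch
import torch.nn.functional as F

def pretrain_step(encoder, classifier, first_proj, received_second_proj, pair_labels):
    # received_second_proj: second projection feature data sent by the
    # second participant, re-wrapped locally so its gradient can be read.
    second_proj = received_second_proj.detach().requires_grad_(True)
    coded = encoder(torch.cat([first_proj, second_proj], dim=-1))
    logits = classifier(coded).squeeze(-1)
    loss = F.binary_cross_entropy_with_logits(logits, pair_labels)
    loss.backward()  # fills grads of the local models and of second_proj
    # First intermediate result: the loss gradient w.r.t. the received
    # features, sent back so the second party can update its projection model.
    return second_proj.grad
```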
7. The model construction optimization method according to any one of claims 1 to 6, wherein a prediction model is further deployed on the first participant device, and the step of performing longitudinal federated learning with the second participant device based on the updated first feature projection model, first coding model, and second feature projection model to obtain a target task model comprises:
inputting the first original feature data of each aligned sample in the first participant device into the updated first feature projection model for projection to obtain pieces of third projection feature data;
receiving pieces of fourth projection feature data sent by the second participant device, wherein the second participant device obtains the fourth projection feature data by inputting the second original feature data of each aligned sample into the updated second feature projection model for projection;
combining the third projection feature data and the fourth projection feature data corresponding to the same aligned sample, and inputting the combination into the updated first coding model for coding to obtain the second coding feature data corresponding to each aligned sample;
inputting each piece of second coding feature data into the prediction model for prediction to obtain the prediction result corresponding to each aligned sample;
performing one round of longitudinal federated updates on the first feature projection model, the first coding model, and the prediction model according to the errors between the prediction labels corresponding to the aligned samples and the prediction results, and calculating a second intermediate result for updating the second feature projection model;
sending the second intermediate result to the second participant device, so that the second participant device performs one round of longitudinal federated updates on the second feature projection model according to the second intermediate result;
and after at least one round of longitudinal federated updates has been performed on each model, obtaining the target task model based on the updated first feature projection model, second feature projection model, first coding model, and prediction model.
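A compressed sketch of one longitudinal federated round from claim 7, as seen from the first participant device. The module names (projector, encoder, predictor), the binary prediction loss, and the use of a gradient as the second intermediate result are all assumptions layered on the claim's wording:

```python
import torch
import torch.nn.functional as F

def federated_round(projector, encoder, predictor, optimizer,
                    first_raw, received_fourth, labels):
    # received_fourth: fourth projection feature data for the aligned
    # samples, sent by the second participant; re-wrapped so its gradient
    # (the second intermediate result) can be extracted.
    fourth = received_fourth.detach().requires_grad_(True)
    third = projector(first_raw)                        # third projection feature data
    coded = encoder(torch.cat([third, fourth], dim=1))  # second coding feature data
    preds = predictor(coded).squeeze(-1)                # prediction per aligned sample
    loss = F.binary_cross_entropy_with_logits(preds, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()      # one longitudinal federated update of the local models
    return fourth.grad    # second intermediate result, sent to the second party
```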
8. A model construction optimization device, characterized in that the model construction optimization device comprises: a memory, a processor, and a model construction optimization program stored in the memory and executable on the processor, the model construction optimization program, when executed by the processor, implementing the steps of the model construction optimization method according to any one of claims 1 to 7.
9. A computer-readable storage medium having stored thereon a model construction optimization program which, when executed by a processor, implements the steps of the model construction optimization method according to any one of claims 1 to 7.
10. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the model construction optimization method according to any one of claims 1 to 7.
CN202211204151.XA 2022-09-29 2022-09-29 Model construction optimization method, device, storage medium, and program product Pending CN115587535A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211204151.XA CN115587535A (en) 2022-09-29 2022-09-29 Model construction optimization method, device, storage medium, and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211204151.XA CN115587535A (en) 2022-09-29 2022-09-29 Model construction optimization method, device, storage medium, and program product

Publications (1)

Publication Number Publication Date
CN115587535A true CN115587535A (en) 2023-01-10

Family

ID=84778841

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211204151.XA Pending CN115587535A (en) 2022-09-29 2022-09-29 Model construction optimization method, device, storage medium, and program product

Country Status (1)

Country Link
CN (1) CN115587535A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116402354A (en) * 2023-06-07 2023-07-07 北京京东乾石科技有限公司 Evaluation parameter determining method and device, medium and electronic equipment
CN116633704A (en) * 2023-07-25 2023-08-22 北京数牍科技有限公司 Graph calculation method and device
CN116633704B (en) * 2023-07-25 2023-10-31 北京数牍科技有限公司 Graph calculation method and device

Similar Documents

Publication Publication Date Title
US20220014807A1 (en) Method, apparatus, device and medium for generating captioning information of multimedia data
WO2020253775A1 (en) Method and system for realizing machine learning modeling process
CN115587535A (en) Model construction optimization method, device, storage medium, and program product
CN112418292B (en) Image quality evaluation method, device, computer equipment and storage medium
CN110555469A (en) Method and device for processing interactive sequence data
CN111241850B (en) Method and device for providing business model
CN111159358A (en) Multi-intention recognition training and using method and device
CN112463968A (en) Text classification method and device and electronic equipment
CN115099854A (en) Method for creating advertisement file, device, equipment, medium and product thereof
CN113887214B (en) Willingness presumption method based on artificial intelligence and related equipment thereof
CN116595066A (en) Data mining method, device, terminal equipment and medium
CN116108363A (en) Incomplete multi-view multi-label classification method and system based on label guidance
CN112950291B (en) Model deviation optimization method, device, equipment and computer readable medium
KR102502271B1 (en) Patent assessment method based on artificial intelligence
CN113032676B (en) Recommendation method and system based on micro-feedback
US11657271B2 (en) Game-theoretic frameworks for deep neural network rationalization
CN116524261A (en) Image classification method and product based on multi-mode small sample continuous learning
US20230342831A1 (en) Machine-learning recommendation system based on game theory
CN118035800A (en) Model training method, device, equipment and storage medium
CN115601620A (en) Feature fusion method and device, electronic equipment and computer readable storage medium
CN114898184A (en) Model training method, data processing method and device and electronic equipment
CN115099855A (en) Method for preparing advertising pattern creation model and device, equipment, medium and product thereof
CN115017362A (en) Data processing method, electronic device and storage medium
CN114529309A (en) Information auditing method and device, electronic equipment and computer readable medium
WO2022262603A1 (en) Method and apparatus for recommending multimedia resources, device, storage medium, and computer program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination