CN112580826A - Business model training method, device and system

Business model training method, device and system

Info

Publication number: CN112580826A (application number CN202110160640.9A)
Authority: CN (China)
Prior art keywords: sample data, local, member device, data, model
Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN112580826B
Inventors: 郑龙飞, 陈超超, 王力, 周俊
Assignee (current and original): Alipay Hangzhou Information Technology Co Ltd
Events: application filed by Alipay Hangzhou Information Technology Co Ltd; priority to CN202110160640.9A; publication of CN112580826A; application granted; publication of CN112580826B
Current legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

Embodiments of the present specification provide methods, apparatuses, and systems for training a business model via a first member device and at least two second member devices. Each second member device has a local business model and local sample data, and the local sample data is non-independent identically distributed (non-IID) data. Each second member device provides its local sample data distribution information to the first member device. The first member device determines an overall sample data probability distribution according to the local sample data distribution information of the second member devices and sends the overall sample data probability distribution to each second member device. Each second member device then determines extended sample data of each class of sample data from its local sample data according to the overall sample data probability distribution and a hyper-parameter, the extended sample data being used to extend the training sample data of the business model.

Description

Business model training method, device and system
Technical Field
Embodiments of the present disclosure relate generally to the field of artificial intelligence, and more particularly, to a method, an apparatus, and a system for training a business model.
Background
Machine learning techniques are widely applied in various business application scenarios. In a business application scenario, a machine learning model is used as a business model to perform various business prediction services, such as classification prediction, business risk prediction, and the like. In many cases, the business model requires model training using business data of multiple data owners. Multiple data owners (e.g., an e-commerce company, a courier company, and a bank) each own part of the training sample data used for training the business model. The data owners would like to use each other's data to train the business model jointly, but none of them wants to provide its own data to the other data owners, to prevent its data from being leaked.
In view of this situation, a business model training method capable of protecting data security is proposed, which can coordinate multiple data owners to train a business model while ensuring the data security of each data owner.
Disclosure of Invention
In view of the above, embodiments of the present disclosure provide a method, an apparatus, and a system for training a business model. When a business model is cooperatively trained by multiple data owners whose local data are non-independent and identically distributed (non-IID), sample data extension is performed on the local sample data of each data owner according to the overall sample data probability distribution, and the extended sample data is used to cooperatively train the business model, thereby improving the overall model performance of the trained business model.
According to an aspect of embodiments herein, there is provided a method for training a business model via a first member device and at least two second member devices, each second member device having a local business model and local sample data, the local sample data of the second member devices being non-independent identically distributed data, the method being applied to a second member device and comprising: providing local sample data distribution information to the first member device; receiving an overall sample data probability distribution from the first member device, the overall sample data probability distribution being determined by the first member device according to the local sample data distribution information received from each second member device; and determining extended sample data of each class of sample data from the local sample data according to the overall sample data probability distribution and a hyper-parameter, the extended sample data being used to extend the training sample data of the business model, wherein the hyper-parameter comprises at least a sample data extension ratio.
Optionally, in an example of the above aspect, the method may further include: performing federated learning together with at least the other second member devices, using the respective local sample data and extended sample data, to obtain an updated local business model at each second member device.
Optionally, in an example of the above aspect, performing federated learning together with at least the other second member devices using the respective local sample data and extended sample data to obtain the updated local business model at each second member device includes: performing federated learning together with at least the other second member devices using the respective shuffled local sample data and extended sample data, to obtain the updated local business model at each second member device.
Optionally, in an example of the above aspect, determining extended sample data of each class of sample data from the local sample data according to the overall sample data probability distribution and the hyper-parameter includes: determining a sample data extension number for each class of sample data according to the total number of local sample data, the overall sample data probability distribution, and the hyper-parameter; and extracting the extended sample data of each class of sample data from the local sample data according to the determined sample data extension number of that class.
Optionally, in an example of the above aspect, extracting the extended sample data of each class of sample data from the local sample data according to the determined sample data extension number of that class includes: randomly extracting the extended sample data of each class of sample data from the local sample data according to the determined sample data extension number of that class.
Optionally, in an example of the above aspect, the method may further include: determining model performance parameters of the updated local business model at the second member device; sending the determined model performance parameters to the first member device; and receiving an adjusted hyper-parameter from the first member device, the adjusted hyper-parameter being determined by the first member device according to an overall model performance parameter of the business model, the overall model performance parameter being determined according to the model performance parameters of each second member device, wherein the extended sample data determining step, the federated learning step, the model performance parameter determining step, the model performance parameter sending step, and the hyper-parameter receiving step are performed cyclically until a target overall model performance parameter is reached.
Optionally, in one example of the above aspect, the overall model performance parameter comprises an average model performance parameter.
Optionally, in one example of the above aspect, providing local sample data distribution information to the first member device comprises: providing the local sample data distribution information to the first member device in a secure aggregation manner.
Optionally, in one example of the above aspect, the secure aggregation comprises: secret-sharing-based secure aggregation; homomorphic-encryption-based secure aggregation; oblivious-transfer-based secure aggregation; garbled-circuit-based secure aggregation; or trusted-execution-environment-based secure aggregation.
Optionally, in one example of the above aspect, the overall sample data probability distribution comprises: a label-based overall sample data probability distribution; a feature-based overall sample data probability distribution; or an overall sample data probability distribution based on the number of connected edges.
According to another aspect of embodiments herein, there is provided a method for training a business model via a first member device and at least two second member devices, each second member device having a local business model and local sample data, the local sample data of the second member devices being non-independent identically distributed data, the method being applied to the first member device and comprising: receiving local sample data distribution information sent by each second member device; determining an overall sample data probability distribution according to the received local sample data distribution information of each second member device; and sending the overall sample data probability distribution to each second member device, wherein the overall sample data probability distribution is used by each second member device, together with the hyper-parameter at that second member device, to extract extended sample data of each class of sample data from its local sample data, the extended sample data being used to extend the training sample data of the business model, and the hyper-parameter comprising at least a sample data extension ratio.
Optionally, in an example of the above aspect, the method may further include performing the following steps cyclically until a target overall model performance parameter is reached: receiving, from each second member device, the model performance parameters of the updated local business model at that second member device; determining an overall model performance parameter of the business model according to the model performance parameters of each second member device; determining an adjusted hyper-parameter according to the overall model performance parameter; and sending the adjusted hyper-parameter to each second member device.
According to another aspect of embodiments herein, there is provided an apparatus for training a business model via a first member device and at least two second member devices, each second member device having a local business model and local sample data, the local sample data of the second member devices being non-independent identically distributed data, the apparatus being applied to a second member device and comprising: at least one processor, a memory coupled with the at least one processor, and a computer program stored in the memory, the at least one processor executing the computer program to implement: providing local sample data distribution information to the first member device; receiving an overall sample data probability distribution from the first member device, the overall sample data probability distribution being determined by the first member device according to the local sample data distribution information received from each second member device; and determining extended sample data of each class of sample data from the local sample data according to the overall sample data probability distribution and a hyper-parameter, the extended sample data being used to extend the training sample data of the business model, wherein the hyper-parameter comprises at least a sample data extension ratio.
Optionally, in one example of the above aspect, the at least one processor executes the computer program to implement: performing federated learning together with at least the other second member devices using the respective local sample data and extended sample data, to obtain an updated local business model at each second member device.
Optionally, in one example of the above aspect, the at least one processor executes the computer program to implement: performing federated learning together with at least the other second member devices using the respective shuffled local sample data and extended sample data, to obtain the updated local business model at each second member device.
Optionally, in one example of the above aspect, the at least one processor executes the computer program to implement: determining a sample data extension number for each class of sample data according to the total number of local sample data, the overall sample data probability distribution, and the hyper-parameter; and extracting the extended sample data of each class of sample data from the local sample data according to the determined sample data extension number of that class.
Optionally, in one example of the above aspect, the at least one processor executes the computer program to implement: randomly extracting the extended sample data of each class of sample data from the local sample data according to the determined sample data extension number of that class.
Optionally, in one example of the above aspect, the at least one processor executes the computer program to further implement: determining model performance parameters of the updated local business model at the second member device; sending the determined model performance parameters to the first member device; and receiving an adjusted hyper-parameter from the first member device, the adjusted hyper-parameter being determined by the first member device according to an overall model performance parameter of the business model, the overall model performance parameter being determined according to the model performance parameters of each second member device, wherein the extended sample data determining step, the federated learning step, the model performance parameter determining step, the model performance parameter sending step, and the hyper-parameter receiving step are performed cyclically until a target overall model performance parameter is reached.
Optionally, in one example of the above aspect, the at least one processor executes the computer program to implement: providing the local sample data distribution information to the first member device in a secure aggregation manner.
Optionally, in one example of the above aspect, the overall sample data probability distribution comprises: a label-based overall sample data probability distribution; a feature-based overall sample data probability distribution; or an overall sample data probability distribution based on the number of connected edges.
According to another aspect of embodiments herein, there is provided an apparatus for training a business model via a first member device and at least two second member devices, each second member device having a local business model and local sample data, the local sample data of the second member devices being non-independent identically distributed data, the apparatus being applied to the first member device and comprising: at least one processor, a memory coupled with the at least one processor, and a computer program stored in the memory, the at least one processor executing the computer program to implement: receiving local sample data distribution information sent by each second member device; determining an overall sample data probability distribution according to the received local sample data distribution information of each second member device; and sending the overall sample data probability distribution to each second member device, wherein the overall sample data probability distribution is used by each second member device, together with the hyper-parameter at that second member device, to extract extended sample data of each class of sample data from its local sample data, the extended sample data being used to extend the training sample data of the business model, and the hyper-parameter comprising at least a sample data extension ratio.
Optionally, in one example of the above aspect, the at least one processor executes the computer program to further implement performing the following steps cyclically until a target overall model performance parameter is reached: receiving, from each second member device, the model performance parameters of the updated local business model at that second member device; determining an overall model performance parameter of the business model according to the model performance parameters of each second member device; determining an adjusted hyper-parameter according to the overall model performance parameter; and sending the adjusted hyper-parameter to each second member device.
According to another aspect of embodiments herein, there is provided a business model training system, comprising: a first member device comprising an apparatus as described above; and at least two second member devices, each comprising an apparatus as described above, wherein each second member device has a local business model and local sample data, and the local sample data of the second member devices is non-independent identically distributed data.
According to another aspect of embodiments herein, there is provided a computer-readable storage medium storing a computer program executed by a processor to implement the business model training method performed on the second member device side.
According to another aspect of embodiments herein, there is provided a computer program product comprising a computer program executed by a processor to implement the business model training method performed on the second member device side.
According to another aspect of embodiments of the present specification, there is provided a computer-readable storage medium storing a computer program executed by a processor to implement the business model training method performed on the first member device side.
According to another aspect of embodiments of the present specification, there is provided a computer program product comprising a computer program executed by a processor to implement the business model training method performed on the first member device side.
Drawings
A further understanding of the nature and advantages of the present disclosure may be realized by reference to the following drawings. In the drawings, similar components or features may have the same reference numerals.
FIG. 1 illustrates an example schematic diagram of a conventional business model training system for federated learning of a business model.
FIG. 2 illustrates a schematic diagram of multiple data owners jointly training a business model with horizontally sliced, non-independent identically distributed private data sample sets.
FIG. 3 illustrates an architectural diagram of a business model training system in accordance with an embodiment of the present description.
FIG. 4 shows a flow diagram of a method for training a business model in accordance with an embodiment of the present description.
FIG. 5 illustrates an example schematic diagram of a horizontally sliced data sample set in accordance with an embodiment of the present description.
FIG. 6 illustrates an example schematic diagram of a vertically sliced data sample set in accordance with an embodiment of the present description.
Fig. 7 shows a flowchart of one implementation example of an extended sample data determination process according to an embodiment of the present specification.
Fig. 8 shows a block diagram of a business model training apparatus for training a business model on the second member device side according to an embodiment of the present specification.
Fig. 9 is a block diagram illustrating an implementation example of an expansion sample data determination unit on the second member device side according to an embodiment of the present specification.
Fig. 10 shows a block diagram of an implementation example of a hyperparameter processing unit on the second member device side according to an embodiment of the present description.
Fig. 11 shows a block diagram of a business model training apparatus for training a business model on the first member device side according to an embodiment of the present specification.
Fig. 12 shows a block diagram of an implementation example of a hyper-parameter processing unit of a first member device side according to an embodiment of the present description.
FIG. 13 illustrates a schematic diagram of a business model training apparatus for training business models based on a computer implemented on a second member device side, according to embodiments of the present description.
FIG. 14 illustrates a schematic diagram of a business model training apparatus for training business models based on a computer implemented on a first member device side in accordance with embodiments of the present description.
Detailed Description
The subject matter described herein will now be discussed with reference to example embodiments. It should be understood that these embodiments are discussed only to enable those skilled in the art to better understand and thereby implement the subject matter described herein, and are not intended to limit the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of elements discussed without departing from the scope of the disclosure. Various examples may omit, substitute, or add various procedures or components as needed. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with respect to some examples may also be combined in other examples.
As used herein, the term "include" and its variants mean open-ended inclusion, in the sense of "including, but not limited to". The term "based on" means "based at least in part on". The terms "one embodiment" and "an embodiment" mean "at least one embodiment". The term "another embodiment" means "at least one other embodiment". The terms "first," "second," and the like may refer to different or the same objects. Other definitions, whether explicit or implicit, may be included below. The definition of a term is consistent throughout the specification unless the context clearly dictates otherwise.
In this specification, the term "business model" refers to a machine learning model applied in a business scenario for business prediction services, such as machine learning models for classification prediction, business risk prediction, and the like. Examples of machine learning models may include, but are not limited to: linear regression models, logistic regression models, neural network models, decision tree models, support vector machines, and the like. Examples of Neural Network models may include, but are not limited to, Deep Neural Network (DNN) models, Convolutional Neural Network (CNN) models, BP Neural networks, and the like.
The specific implementation of the business model depends on the business scenario applied. For example, in an application scenario where the business model is applied to classify a user, the business model is implemented as a user classification model. Accordingly, the user characteristic data of the user to be classified can be subjected to user classification prediction according to the service model. In an application scenario where the business model is applied to business risk prediction for business transactions occurring on a business system, the business model is implemented as a business risk prediction model. Accordingly, business risk prediction can be performed on the business transaction characteristic data of the business transaction according to the business model.
In one example of the present specification, the training sample set used in the business model training scheme may be a training sample set consisting of horizontally sliced data sample sets. The term "horizontal slicing" refers to slicing a data sample set into multiple data subsets according to a module/function (or some specified rule), each data subset containing a portion of the data samples, where every data sample included in a data subset is a complete data sample, i.e., it includes all the feature data and the corresponding label value of that data sample, but different subsets contain different sample IDs. In this example, each data owner collects local data to form a local data sample set, and each piece of data contained in the local data sample set is a complete data sample. The local data sample sets obtained by all data owners jointly constitute, in a horizontally sliced manner, the training sample set of the business model, with each local data sample set serving as a training sample subset used to train the business model.
For example, taking two data owners as an example and assuming that the training sample set includes 100 data samples, each containing multiple feature values and a label value, with a horizontally sliced training sample set the data owned by the first data owner may be the first 30 data samples, and the data owned by the second data owner may be the remaining 70 data samples. In addition, the model structure of the business model at each data owner is the same as that of the business model to be trained, and the business models of the data owners jointly constitute the business model to be trained, i.e., each data owner holds part of the model parameters of the business model to be trained.
In another example of the present specification, the training sample set used in the business model training scheme may be a training sample set consisting of vertically sliced data sample sets. The term "vertical slicing" refers to slicing a data sample set into multiple data subsets according to a module/function (or some specified rule), each data subset containing part of the sample data of every data sample in the data sample set, with the partial sample data contained in all the data subsets together constituting the complete data sample and sharing the same sample ID. In one example, assume that there are two data owners, Alice and Bob, and that each data sample includes a label $y$ and feature data $x_1$ and $x_2$. After vertical slicing, the data owner Alice owns $y$ and $x_1$ of each data sample, and the data owner Bob owns $x_2$. In another example, assume that each data sample includes a label $y$ and feature data $x_1$, $x_2$, and $x_3$. After vertical slicing, the data owner Alice owns $y$, $x_1$, and $x_2$ of each data sample, and the data owner Bob owns $x_2$ and $x_3$. Besides these two examples, other possible cases exist, which are not listed one by one here.
For example, taking two data owners as an example, assuming that the training sample set includes 100 data samples, each of which contains a plurality of feature data and a tag value, in the case where the training sample set used is a training sample set composed of vertically sliced data sample sets, the data owned by the first data owner may be partial feature data and a tag value of each of the 100 data samples, and the data owned by the second data owner may be the remaining feature data of each of the 100 data samples.
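To make the two slicing modes concrete, the following minimal Python sketch (the array shapes and split points are invented for illustration) slices one training sample set horizontally by samples and vertically by features:

```python
import numpy as np

# Hypothetical training sample set: 100 samples, 4 feature columns, 1 label.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))      # feature data
y = rng.integers(0, 2, size=100)   # label values

# Horizontal slicing: each owner holds complete samples (all features plus
# the label) but disjoint sample IDs, e.g. the first 30 vs. the last 70 rows.
alice_horizontal = (X[:30], y[:30])
bob_horizontal = (X[30:], y[30:])

# Vertical slicing: both owners share the same sample IDs, but each holds
# only part of every sample, e.g. Alice holds the label and the first two
# feature columns while Bob holds the remaining feature columns.
alice_vertical = (X[:, :2], y)
bob_vertical = X[:, 2:]
```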
With the development of artificial intelligence technology, machine learning is widely applied in various business application scenarios, with machine learning models serving as business models for various business prediction services, such as classification prediction, business risk prediction, and the like. For example, business models are widely used in financial anti-fraud, recommendation systems, image recognition, and other fields. To achieve better model performance, more training data is needed to train the business model. In application fields such as healthcare and finance, different enterprises or institutions hold different data samples; if these data are used for joint training, the model accuracy of the business model will be greatly improved, bringing considerable economic benefit to the enterprises.
In view of the foregoing, a federated learning scheme has been proposed, in which multiple data owners jointly train a business model with the assistance of a server. FIG. 1 illustrates an example schematic diagram of a conventional business model training system 100 for federated learning of a business model.
As shown in FIG. 1, the business model training system 100 includes a server 110 and a plurality of data owners (clients) 120, such as data owner A 120-1, data owner B 120-2, and data owner C 120-3. The server 110 has a business model 10. When performing federated learning, the server 110 issues the business model 10 to each data owner, where it is used to perform model computation based on that data owner's local data, yielding a model prediction value at that data owner. Each data owner then determines gradient information for the business model based on the computed model prediction values and label values and provides the gradient information to the server 110. The server 110 updates the business model using the acquired gradient information.
In some examples, because the data owners' samples correspond to different users, different user regions, and different data acquisition time windows, the data sample sets of the data owners often have different feature distributions or label distributions during joint training and are not independent of each other; such data sample sets are called non-IID (non-independent and identically distributed) data sample sets.
The non-IID data sample set is illustrated here by the example of an uneven label distribution. Assume the CIFAR-10 picture dataset, which has ten classes of pictures. When multiple data owners train jointly, a given data owner may hold only one or a few classes of pictures; for example, data owner A contains only pictures of airplanes, data owner B contains only pictures of cars, and so on, resulting in an uneven distribution of sample labels among the data owners.
FIG. 2 illustrates a schematic diagram of multiple data owners jointly training a business model with horizontally sliced, non-independent identically distributed private data sample sets.
As shown in fig. 2, the private data sample sets held by the multiple data owners are horizontally sliced non-IID data. Each data owner has the same feature space (f1, f2, f3) but a data sample set with a different data distribution. In addition, the non-IID data of the respective data owners have different sample nodes, and the probability distributions of the sample node labels also differ.
When the private data held by each data owner is a non-IID data sample set, if each data owner performs joint model training using its local original sample data as the training sample set, the data distributions of the training sample sets differ across data owners (i.e., there are sample data differences). The batches of sample data (batch data) sampled from the training sample sets by the data owners at each step therefore differ considerably, the gradient descent directions of the trained business models diverge, and the overall performance of the trained business model is poor.
In view of the above, embodiments of the present specification propose a sample data extension scheme. In this scheme, each second member device has a local business model and local sample data, and the local sample data is non-independent identically distributed data. Each second member device provides its local sample data distribution information to the first member device. The first member device determines an overall sample data probability distribution according to the local sample data distribution information of each second member device and sends it to each second member device. Each second member device then determines extended sample data of each class of sample data from its local sample data according to the overall sample data probability distribution and the sample data extension ratio.
According to this sample data extension scheme, the local data of each data owner is extended according to the overall distribution characteristics of the sample data used for business model training, so that the sample data set used by each data owner to update its local business model better reflects those overall distribution characteristics. This reduces the differences among the data distributions of the data owners' sample data and improves the overall performance of the trained business model.
The following describes a method, an apparatus, and a system for training a business model according to an embodiment of the present specification in detail with reference to the accompanying drawings.
FIG. 3 shows an architectural diagram illustrating a system for training business models (hereinafter "business model training system 300") according to embodiments of the present specification.
As shown in FIG. 3, business model training system 300 includes a first member device 310 and at least two second member devices 320. In fig. 3, 3 second member devices 320-1 through 320-3 are shown. In other embodiments of the present description, more or fewer second member devices 320 may be included. The first member device 310 and the at least two second member devices 320 may communicate with each other over a network 330 such as, but not limited to, the internet or a local area network, etc.
In embodiments of the present description, first member device 310 may be a device or device side that does not deploy or maintain a business model. The second member device 320 may be a device or a device side for locally collecting data samples, such as a smart terminal device, a server device, or the like. In this specification, the term "second member device" and the terms "data owner" or "training participant" may be used interchangeably.
In this description, the local data for the second member devices 320-1 through 320-3 may include traffic data collected locally by the respective member devices. The business data may include characteristic data of the business object. Examples of business objects may include, but are not limited to, users, goods, events, or relationships. Accordingly, the business data may include, for example, but is not limited to, locally collected user characteristic data, commodity characteristic data, event characteristic data, or relationship characteristic data, such as user characteristic data, business process data, financial transaction data, commodity transaction data, medical health data, and the like. Business data can be applied to business models for model prediction, model training, and other suitable multiparty data joint processing, for example.
In this specification, the service data may include service data based on text data, image data, and/or voice data. Accordingly, the business model may be applied to business risk identification, business classification, or business decision, etc., based on text data, image data, and/or voice data. For example, the local data may be medical data collected by a hospital, and the business model may be used to perform disease examinations or disease diagnoses. Alternatively, the collected local data may include user characteristic data. Accordingly, the business model may be applied to business risk identification, business classification, business recommendation or business decision, etc. based on user characteristic data. Examples of business models may include, but are not limited to, face recognition models, disease diagnosis models, business risk prediction models, service recommendation models, and so forth.
In addition, a local business model is deployed on the second member device. In one example, the local business models at the respective second member devices have the same model structure. In another example, the model structure of the business model at each second member device may be determined from the data distribution characteristics of the local data present at that second member device, i.e., may have a different model structure. For example, where the business model is a graph neural network model, the business model at each second member device may include a feature vector representation sub-model, a normalization sub-model, and a unified discriminant model having different model structures. Alternatively, where the local data is a vertically sliced data set, the business models at the respective second member devices have different model structures.
In this specification, the local data possessed by each second member device collectively constitutes training sample data of the business model, and the local data possessed by each second member device is secret to the second member device and cannot be learned or completely learned by other second member devices.
In one practical application example, the first member device may be, for example, a server of a service provider or a service operator, such as a server of a third party payment platform for providing payment services. The respective second member device may be, for example, a data storage server or an intelligent terminal device of the service application party or the service application association party, such as a local data storage server or an intelligent terminal device of a different financial institution or medical institution.
In this description, first member device 310 and each second member device 320 may be any suitable electronic device with computing capabilities. The electronic devices include, but are not limited to: personal computers, server computers, workstations, desktop computers, laptop computers, notebook computers, mobile electronic devices, smart phones, tablet computers, cellular phones, Personal Digital Assistants (PDAs), handheld devices, messaging devices, wearable electronic devices, consumer electronic devices, and the like.
In addition, the first member device 310 and the second member devices 320-1, 320-2, and 320-3 each have a business model training apparatus. The business model training apparatuses at the first member device 310 and the second member devices 320-1, 320-2, and 320-3 may communicate via the network 330 for data interaction, thereby cooperatively performing the model training process for the business model. The operation and structure of the business model training apparatus will be described in detail below with reference to the accompanying drawings.
In some embodiments, network 330 may be any one or more of a wired network or a wireless network. Examples of network 330 may include, but are not limited to, a cable network, a fiber optic network, a telecommunications network, an intranet, the internet, a Local Area Network (LAN), a Wide Area Network (WAN), a Wireless Local Area Network (WLAN), a Metropolitan Area Network (MAN), a Public Switched Telephone Network (PSTN), a bluetooth network, a zigbee network (zigbee), Near Field Communication (NFC), an intra-device bus, an intra-device line, and the like, or any combination thereof.
FIG. 4 shows a flow diagram of a method 400 for training a business model in accordance with an embodiment of the present description. In the business model training method 400 shown in fig. 4, the data sample set owned by the plurality of second member devices may be a horizontally sliced data sample set or a vertically sliced data sample set.
FIG. 5 illustrates an example schematic diagram of a horizontally sliced data sample set in accordance with an embodiment of the present description. Fig. 5 shows two data owners, Alice and Bob. Each data sample in the data sample set owned by each of Alice and Bob is complete, i.e., each data sample includes the complete feature data ($x$) and label data ($y$). For example, Alice possesses a complete data sample $(x, y)$.
FIG. 6 shows an example schematic diagram of a vertically sliced data sample set according to an embodiment of the present description. Fig. 6 shows two data owners, Alice and Bob. Each of Alice and Bob owns part of the data of every data sample in the data sample set used for model training, and for each data sample, the parts owned by Alice and Bob combine to form the complete content of that sample. For example, assume that a data sample includes a label $y$ and feature data $x_1$ and $x_2$; after vertical slicing, the data owner Alice owns the label $y$ and the feature data $x_1$, and the data owner Bob owns the feature data $x_2$.
In one example of the present specification, the sample data distribution information may be label-based sample data distribution information, feature-based sample data distribution information, or sample data distribution information based on the number of connected edges (in the case where the sample data is graph data). In one example, the sample data distribution information may be, for example, a sample number statistic. Accordingly, the sample probability distribution may be a label-based, feature-based, or connected-edge-number-based sample probability distribution, i.e., a sample probability distribution determined from statistics computed over labels, features, or connected edges. The following description takes a label-based sample number statistic as the sample data distribution information; in other embodiments of the present description, feature-based sample data distribution information or sample data distribution information based on the number of connected edges may also be employed.
As shown in FIG. 4, at 401, at each second member device 320-1 through 320-3, a local business model at each second member device is initialized.
At 402, at each of the second member devices 320-1 through 320-3, the respective local sample data distribution information is obtained.
For example, in the case where the sample distribution information is a label-based sample number statistic, each second member device may perform statistics over the label of each piece of sample data, thereby obtaining a sample number statistic for each label. In one example, the label-based local sample number statistic at the $i$-th second member device may be represented as a sample data vector $C_i = (c_{i,1}, \dots, c_{i,k})$, where $i$ is the index of the second member device, $k$ is the number of labels in the label data, and element $c_{i,j}$ of the sample data vector $C_i$ indicates the number of sample data items labeled with label $j$ at the $i$-th second member device. Correspondingly, after each second member device performs label-based sample data statistics, the counted sample data can be classified by label, with sample data sharing the same label belonging to the same class, so that multiple classes of sample data are obtained at each second member device. Similarly, in the case where the sample distribution information is based on features or on the number of connected edges, sample data classification may be performed based on the features or the number of connected edges.
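As an illustration, the label-based statistic reduces to a per-label count; a minimal Python sketch (assuming integer labels 0 to k-1, with illustrative names):

```python
import numpy as np

def local_label_counts(labels: np.ndarray, num_labels: int) -> np.ndarray:
    """Sample data vector C_i: element j counts the local samples with label j."""
    return np.bincount(labels, minlength=num_labels)

# A second member device whose local samples carry only labels 0 and 1:
counts_i = local_label_counts(np.array([0, 0, 1, 0, 1]), num_labels=3)
# counts_i == array([3, 2, 0])
```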
At 403, the respective second member devices 320-1 to 320-3 provide respective local sample data distribution information to the first member device 310.
In one example, each second member device may provide tag-based local sample data distribution information to the first member device in a secure aggregation manner. Examples of the security aggregation may include, but are not limited to: secret sharing based security aggregation; secure aggregation based on homomorphic encryption; secure aggregation based on inadvertent transmissions; secure obfuscation-based aggregation; or a secure aggregation based on trusted execution environments. Further, in other examples of the present description, other suitable secure aggregation means may also be employed.
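As an illustration of the first listed option, the sketch below applies additive secret sharing to the count vectors: each device splits its vector into random shares, and only share sums ever leave a device, so the element-wise total of all local vectors can be reconstructed without revealing any individual vector. This is a simplified sketch of the general idea, not the concrete protocol of this embodiment:

```python
import numpy as np

PRIME = 2**61 - 1  # modulus for additive secret sharing

def share(vec, n_shares, rng):
    """Split an integer vector into n_shares additive shares mod PRIME."""
    shares = [rng.integers(0, PRIME, size=vec.shape) for _ in range(n_shares - 1)]
    shares.append((vec - sum(shares)) % PRIME)
    return shares

# Three devices share their local count vectors with one another; each device
# sums the shares it received and forwards only that partial sum, from which
# the first member device reconstructs the overall counts.
rng = np.random.default_rng(42)
counts = [np.array([3, 2, 0]), np.array([0, 5, 1]), np.array([2, 0, 4])]
all_shares = [share(c, len(counts), rng) for c in counts]
partial_sums = [sum(dev[d] for dev in all_shares) % PRIME for d in range(len(counts))]
overall_counts = sum(partial_sums) % PRIME  # array([5, 7, 5]), the element-wise total
```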
At 404, at the first member device 310, an overall sample data probability distribution is determined from local sample data distribution information received from the respective second member devices.
For example, in one example, at first member device 310, the label-based local sample number statistics $C_i$ received from the respective second member devices can be used to determine an overall sample data probability distribution $P = (p_1, \dots, p_k)$, where $p_j$ represents the probability of a sample being labeled with label $j$ among the total data samples used for business model training.
In one example, first, the sample number statistics $C_i$ of the second member devices are used to determine an overall sample number statistic $C = (c_1, \dots, c_k)$, where $c_j = \sum_i c_{i,j}$ represents the number of samples labeled with label $j$ among the total data samples. Then, based on the overall sample number statistic $C$, the overall sample data probability distribution $P = (p_1, \dots, p_k)$ is determined, where $p_j = c_j / n$ and $n = \sum_j c_j$ is the total number of samples.
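At the first member device this amounts to a summation followed by a normalization; a minimal sketch continuing the illustrative names above:

```python
import numpy as np

def overall_distribution(overall_counts: np.ndarray) -> np.ndarray:
    """P = (p_1, ..., p_k), with p_j = c_j / n and n the total sample count."""
    return overall_counts / overall_counts.sum()

P = overall_distribution(np.array([5, 7, 5]))
# P == array([0.294..., 0.411..., 0.294...])
```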
At 405, first member device 310 sends the overall sample data probability distribution $P$ to each of the second member devices 320-1 through 320-3.
At 406, at each second member device 320-1 to 320-3, extended sample data of each class of sample data is determined from the local sample data according to the overall sample data probability distribution and the hyper-parameter, where the hyper-parameter includes at least the sample data extension ratio. In the case where the sample data statistics are label-based, the class of a sample data item is determined by its label.
Fig. 7 shows a flowchart of one implementation example of an extended sample data determination process 700 according to an embodiment of the present specification.
As shown in fig. 7, at 710, at each second member device, the sample data extension number for each class of sample data is determined according to the total number of local sample data, the overall sample data probability distribution, and the hyper-parameter. For example, the sample data extension number of each class may be determined according to the total number of local sample data, the overall sample data probability distribution, and the sample data extension ratio.
For example, assume that the sample data extension ratio is $\alpha$, that the second member device $i$ has two labels (label 1 and label 2) with corresponding sample number statistics $c_{i,1}$ and $c_{i,2}$, and that the overall sample data probability distribution is $P = (p_1, p_2)$. Then the total sample data extension number at the second member device $i$ is $m_i = \alpha\,(c_{i,1} + c_{i,2})$, where the extension number for the class of sample data with label 1 is $m_{i,1} = \alpha\,(c_{i,1} + c_{i,2})\,p_1$ and the extension number for the class of sample data with label 2 is $m_{i,2} = \alpha\,(c_{i,1} + c_{i,2})\,p_2$.
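A minimal sketch of this computation using the notation above (rounding the products to integer counts is our assumption):

```python
import numpy as np

def expansion_numbers(local_counts: np.ndarray, P: np.ndarray, alpha: float) -> np.ndarray:
    """m_{i,j} = alpha * n_i * p_j, where n_i is the device's total local sample count."""
    n_i = local_counts.sum()
    return np.rint(alpha * n_i * P).astype(int)

m_i = expansion_numbers(np.array([30, 70]), P=np.array([0.5, 0.5]), alpha=0.2)
# m_i == array([10, 10]): 20 extension samples in total, split per the overall distribution
```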
At 720, each second member device extracts the extended sample data of each class of sample data from its local sample data according to the determined sample data extension number of that class.
For example, for the two classes of sample data above, the second member device $i$ may extract $m_{i,1}$ and $m_{i,2}$ samples, respectively, from its local (original) sample data set $D_i$, thereby obtaining the extended sample data of each class; the extended sample data of the various classes together form an extended sample data set $D_i'$.
Optionally, in one example, the sample data extraction process at each second member device may randomly extract the extended sample data of each class from the local sample data. In another example, the extraction process at each second member device may extract the extended sample data of each class from the local sample data according to a specified extraction mechanism.
In another example, an extended sample data extraction model may be trained in advance, and the local sample data, the overall sample data probability distribution, and the hyper-parameter are then provided to the extraction model to obtain the extended sample data of each class of sample data.
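A minimal sketch of the random extraction (sampling with replacement is our assumption, since a class may need more extension copies than it has local originals):

```python
import numpy as np

def extract_extension_set(X, y, m_per_class, rng):
    """Randomly draw m_per_class[j] extension samples from the local samples with label j."""
    idx = []
    for j, m_j in enumerate(m_per_class):
        pool = np.flatnonzero(y == j)          # local samples of class j
        if len(pool) > 0 and m_j > 0:
            idx.extend(rng.choice(pool, size=m_j, replace=True))
    idx = np.asarray(idx, dtype=int)
    return X[idx], y[idx]                      # the extended sample data set D_i'
```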
Returning to fig. 4, at 407, each second member device performs federated learning using its local sample data and extended sample data together, resulting in an updated local business model at each second member device. Optionally, in another example, each second member device may perform federated learning together with the first member device using its local sample data and extended sample data to obtain the updated local business model. Here, the federated learning process can be implemented using any applicable federated learning scheme in the art.
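The embodiment leaves the concrete federated learning scheme open. As one common instance, a FedAvg-style aggregation over the devices' locally trained parameters could look like the sketch below (this choice of scheme is an assumption, not the patent's prescription):

```python
def fedavg_aggregate(local_params, sample_counts):
    """Weighted average of per-device model parameters (FedAvg aggregation)."""
    total = sum(sample_counts)
    return sum(w * (n / total) for w, n in zip(local_params, sample_counts))

# In each round, every second member device would train on the union of its
# local and extended sample data and submit its parameters for aggregation.
```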
After each second member device obtains its updated local business model, at 408, the model performance parameters of the updated local business model are determined at each second member device. For example, each second member device may select some sample data not used for business model training from its local sample data, perform model prediction on the selected sample data using the updated local business model, and then evaluate the model performance parameters of the updated local business model according to the model prediction results.
At 409, each second member device sends the determined model performance parameters of its updated local business model to the first member device.
At 410, at the first member device, an overall model performance parameter of the business model is determined based on the model performance parameters of the respective second member devices. For example, the average of the model performance parameters of the second member devices may be calculated and used as the overall model performance parameter of the trained business model. The overall model performance parameter may also be evaluated from the per-device model performance parameters in other suitable ways. In one example, the overall model performance parameter includes an average model performance parameter, obtained by averaging the model performance parameters of the respective second member devices.
At 411, it is determined whether the determined overall model performance parameter reaches the target overall model performance parameter. For example, in one example, the target overall model performance parameter may be the maximum overall model performance parameter; when the overall model performance parameter is the average model performance parameter, the target may be the maximum average model performance parameter, and it is determined whether the calculated average model performance parameter reaches that maximum. In another example, the target overall model performance parameter may be a predetermined value set in advance.
If the target overall model performance parameter is reached, the process ends. If it is not reached, then at 412 the currently used hyper-parameter is adjusted at the first member device based on the determined overall model performance parameter.
In one example, the hyper-parameter $\alpha$ may be optimized according to the overall model performance parameters using a grid search method or another hyper-parameter optimization algorithm (Bayesian optimization, reinforcement learning, genetic algorithms, etc.).
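As an illustration of the grid search option, a minimal sketch in which the callback stands in for one full pass of steps 406 to 411:

```python
def grid_search_alpha(candidate_alphas, run_training_round):
    """Try each extension ratio and keep the one with the best overall performance."""
    best_alpha, best_score = None, float("-inf")
    for alpha in candidate_alphas:
        score = run_training_round(alpha)  # steps 406-411 with this alpha
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha

# Example: grid_search_alpha([0.1, 0.2, 0.5, 1.0], run_training_round)
```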
After the hyper-parameter adjustment is completed, the process returns to 406, sample data extension is performed again using the adjusted hyper-parameter as the current hyper-parameter, and the subsequent steps are performed. These steps are executed in a loop until the determined overall model performance parameter reaches the target overall model performance parameter.
Optionally, in some embodiments, before performing the federated learning, the respective local sample data and extended sample data may further be shuffled at each second member device. Federated learning is then performed, among the second member devices or among the second member devices and the first member device, using the shuffled local sample data and extended sample data, to obtain the updated local business model at each second member device.
Further, optionally, in some embodiments the method may omit some or all of operations 407 to 412. For example, if all of operations 407 to 412 are omitted, the result is a sample data expansion scheme for the training sample data set of the business model, and training sample expansion is then performed based on that scheme for subsequent business model training.
With this scheme, sample data expansion is performed on each data owner's local data according to the overall distribution characteristics of the sample data used for business model training, so that the sample data set each data owner uses to update its local business model better reflects that overall distribution. This reduces the differences between the data distributions of the individual data owners and improves the overall performance of the trained business model.
With this scheme, each second member device performs federated learning using its local sample data and extended sample data to obtain the updated local business model, so that business model training is achieved while the security of each second member device's local data is preserved.
In addition, with this scheme, the sample-expanded local data at each second member device is shuffled, so that batches of sample data selected from it are more balanced, which improves the overall performance of the trained business model.
In addition, with this scheme, the extended sample data of each type is randomly extracted from the local sample data, so that the extended sample data is randomized, which further strengthens the privacy protection of the local data at each second member device.
In addition, with this scheme, each second member device provides its local sample distribution information to the first member device via secure aggregation, which prevents the local sample distribution information of any second member device from being leaked to the other second member devices.
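For illustration, a minimal sketch of one such mode, secret-sharing-based secure aggregation of per-class sample counts, is given below. The share routing (each device sends one share to each party, and the first member device only ever sees per-party share sums) is a standard construction assumed here for concreteness, not a protocol fixed by the specification:

```python
import random

MOD = 2**61 - 1  # large prime modulus for additive secret sharing

def share_counts(counts, n_parties):
    """Split one device's per-class sample counts into n_parties additive
    shares that sum to the true counts mod MOD; any subset of fewer than
    n_parties shares reveals nothing about the counts."""
    shares = [[random.randrange(MOD) for _ in counts]
              for _ in range(n_parties - 1)]
    last = [(c - sum(s[j] for s in shares)) % MOD
            for j, c in enumerate(counts)]
    shares.append(last)
    return shares  # shares[i] is sent to party i

def reconstruct_totals(per_party_sums):
    """What the first member device computes: each party forwards the sum
    of the shares it received from all devices; adding those per-party sums
    recovers only the per-class totals, never any single device's counts."""
    return [sum(col) % MOD for col in zip(*per_party_sums)]
```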
Fig. 8 shows a block diagram of a business model training apparatus 800 for training a business model at the second member device side according to an embodiment of the present specification.
As shown in fig. 8, the business model training apparatus 800 includes a sample data distribution information providing unit 810, an overall sample data probability distribution receiving unit 820, an extended sample data determining unit 830, and a federated learning unit 840.
The sample data distribution information providing unit 810 is configured to provide local sample data distribution information to the first member device. The operation of the sample data distribution information providing unit 810 may refer to the operation described above with reference to 403 of fig. 4.
The overall sample data probability distribution receiving unit 820 is configured to receive an overall sample data probability distribution from the first member device, which is determined by the first member device from the local sample data distribution information received from the respective second member devices. The operation of the overall sample data probability distribution receiving unit 820 may refer to the operation described above with reference to 405 of fig. 4.
The extended sample data determining unit 830 is configured to determine extended sample data of each type of sample data from the local sample data according to the overall sample data probability distribution and the hyper-parameter, where the hyper-parameter at least includes a sample data expansion ratio. The operation of the extended sample data determining unit 830 may refer to the operation described above with reference to 406 of fig. 4.
The federated learning unit 840 is configured to perform federated learning, at least together with the other second member devices, using the respective local sample data and extended sample data, to obtain the updated local business model of each second member device. The operation of the federated learning unit 840 may refer to the operation described above with reference to 407 of fig. 4.
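The specification leaves the concrete federated-learning protocol open; as one common instantiation, a FedAvg-style aggregation step performed by such a unit might look like the following sketch (flat parameter lists are assumed for brevity):

```python
def federated_averaging(local_weights, sample_counts):
    """One FedAvg-style aggregation round: weight each device's model
    parameters (flat lists here) by its sample count and average them
    into the next shared model."""
    total = sum(sample_counts)
    averaged = [0.0] * len(local_weights[0])
    for weights, count in zip(local_weights, sample_counts):
        for j, w in enumerate(weights):
            averaged[j] += w * (count / total)
    return averaged
```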
Fig. 9 shows a block diagram of an implementation example of the extended sample data determination unit 830 according to an embodiment of the present specification. As shown in fig. 9, the extended sample data determination unit 830 includes an extended number determination module 831 and an extended sample data determination module 833.
The extended number determination module 831 is configured to determine the sample data expansion number of each type of sample data according to the total number of local sample data, the overall sample data probability distribution, and the hyper-parameter. The operation of the extended number determination module 831 may refer to the operation described above with reference to 710 of fig. 7.
The extended sample data determination module 833 is configured to extract the extended sample data of each type of sample data from the local sample data according to the determined sample data expansion number of each type of sample data. The operation of the extended sample data determination module 833 may refer to the operation described above with reference to 720 of fig. 7.
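A minimal sketch of modules 831 and 833 follows. The exact expansion-count formula is not fixed here, so the sketch assumes, purely for illustration, that the count for class k is the expansion ratio α times the class's overall probability times the local sample total, and that extraction samples with replacement:

```python
import random

def expansion_counts(n_local_total, overall_distribution, alpha):
    """Module 831 (sketch): per-class expansion counts from the local
    sample total, the overall per-class probability distribution, and the
    expansion-ratio hyper-parameter alpha. The formula is an illustrative
    assumption, not one fixed by the specification."""
    return {cls: int(round(alpha * p * n_local_total))
            for cls, p in overall_distribution.items()}

def draw_extended_samples(local_by_class, counts, seed=None):
    """Module 833 (sketch): randomly draw the requested number of samples
    of each class from the local data (with replacement, since a class may
    be locally scarce)."""
    rng = random.Random(seed)
    extended = []
    for cls, n in counts.items():
        pool = local_by_class.get(cls, [])
        if pool and n > 0:
            extended.extend(rng.choices(pool, k=n))
    return extended
```

Drawing with replacement is one design choice; drawing without replacement, capped at the locally available count, would serve equally well for the illustration.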
In addition, optionally, the business model training apparatus 800 may further include a hyper-parameter processing unit. The hyper-parameter processing unit is configured to perform processing related to hyper-parameter adjustment. In this case, the extended sample data determining unit 830, the federated learning unit 840, and the hyper-parameter processing unit cyclically perform operations until the determined overall model performance parameter reaches the target overall model performance parameter.
Fig. 10 shows a block diagram of an implementation example of the hyperparameter processing unit 1000 of the second member device side according to an embodiment of the present description. As shown in fig. 10, the hyper-parameter processing unit 1000 includes a model performance parameter determination module 1010, a model performance parameter transmission module 1020, and a hyper-parameter reception module 1030.
The model performance parameter determination module 1010 is configured to determine model performance parameters of the updated local business model at the second member device. The operation of the model performance parameter determination module 1010 may refer to the operation described above with reference to 408 of FIG. 4.
The model performance parameter sending module 1020 is configured to send the determined model performance parameters to the first member device. The operation of the model performance parameter sending module 1020 may refer to the operation described above with reference to 409 of fig. 4.
The hyper-parameter receiving module 1030 is configured to receive an adjusted hyper-parameter from a first member device, the adjusted hyper-parameter being determined by the first member device from an overall model performance parameter of the business model, the overall model performance parameter being determined from a model performance parameter of each second member device.
Furthermore, it is noted that the business model training apparatus 800 may not include the federated learning unit 840; in this case, the business model training apparatus 800 corresponds to a sample data expansion apparatus.
Fig. 11 shows a block diagram of a business model training apparatus 1100 for training a business model at the first member device side according to an embodiment of the present description. As shown in fig. 11, the business model training apparatus 1100 includes a sample data distribution information receiving unit 1110, an overall sample data probability distribution determining unit 1120, and an overall sample data probability distribution sending unit 1130.
The sample data distribution information receiving unit 1110 is configured to receive local sample data distribution information transmitted by the respective second member devices. The operation of the sample data distribution information receiving unit 1110 may refer to the operation described above with reference to 403 of fig. 4.
The overall sample data probability distribution determination unit 1120 is configured to determine an overall sample data probability distribution from the received local sample data distribution information of the respective second member devices. The operation of the overall sample data probability distribution determining unit 1120 may refer to the operation described above with reference to 404 of fig. 4.
The overall sample data probability distribution sending unit 1130 is configured to send the overall sample data probability distribution to each second member device. Each second member device uses the overall sample data probability distribution, together with its hyper-parameter, to extract extended sample data of each type of sample data from its local sample data; the extended sample data is used to extend the training sample data of the business model in federated learning, and the hyper-parameter at least includes a sample data expansion ratio.
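For illustration, a minimal sketch of the computation in unit 1120 follows, under the assumption that the local sample data distribution information arrives as per-class sample counts (for example, recovered via secure aggregation):

```python
def overall_distribution(per_device_counts):
    """Unit 1120 (sketch): combine the per-class sample counts reported by
    the second member devices into one overall probability distribution."""
    totals = {}
    for counts in per_device_counts:
        for cls, c in counts.items():
            totals[cls] = totals.get(cls, 0) + c
    grand_total = sum(totals.values())
    return {cls: c / grand_total for cls, c in totals.items()}
```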
In addition, optionally, the business model training apparatus 1100 may further include a hyper-parameter processing unit. The hyper-parameter processing unit is configured to perform a process related to hyper-parameter adjustment.
Fig. 12 shows a block diagram of an implementation example of a hyper-parameter processing unit 1200 at the first member device side according to an embodiment of the present description. As shown in fig. 12, the hyper-parameter processing unit 1200 includes a model performance parameter receiving module 1210, an overall model performance determining module 1220, a hyper-parameter determining module 1230, and a hyper-parameter transmitting module 1240. During model training, the model performance parameter receiving module 1210, the overall model performance determining module 1220, the hyper-parameter determining module 1230 and the hyper-parameter sending module 1240 perform operations in a loop until the determined overall model performance parameters reach the target overall model performance parameters.
Specifically, at each cycle, the model performance parameter receiving module 1210 is configured to receive from each second member device the updated model performance parameters of the local business model at each second member device. The operation of the model performance parameter receiving module 1210 may refer to the operation described above with reference to 409 of fig. 4.
The overall model performance determination module 1220 is configured to determine overall model performance parameters of the business model based on the model performance parameters of the respective second member devices. The operations of the overall model performance determination module 1220 may refer to the operations described above with reference to 410 of FIG. 4.
The hyper-parameter determination module 1230 is configured to determine the adjusted hyper-parameter based on the overall model performance parameters. The operation of the hyper-parameter determination module 1230 may refer to the operation described above with reference to 412 of FIG. 4.
The hyper-parameter sending module 1240 is configured to send the adjusted hyper-parameter to each second member device for use by each second member device to perform the next loop process.
A method for business model training, an apparatus for business model training, and a business model training system according to embodiments of the present specification are described above with reference to fig. 1 to 12. The above apparatus for training the business model can be implemented in hardware, in software, or in a combination of hardware and software.
Fig. 13 shows a schematic diagram of a computer-implemented business model training apparatus 1300 on the second member device side according to an embodiment of the present description. As shown in fig. 13, the business model training apparatus 1300 may include at least one processor 1310, storage (e.g., non-volatile storage) 1320, memory 1330, and a communication interface 1340, and the at least one processor 1310, the storage 1320, the memory 1330, and the communication interface 1340 are connected together via a bus 1360. The at least one processor 1310 executes at least one computer program (i.e., the above-described elements implemented in software) stored or encoded in the memory.
In one embodiment, a computer program is stored in the memory that, when executed, causes the at least one processor 1310 to: provide the local sample data distribution information to the first member device; receive an overall sample data probability distribution from the first member device, the overall sample data probability distribution being determined by the first member device according to the local sample data distribution information received from the respective second member devices; and determine extended sample data of each type of sample data from the local sample data according to the overall sample data probability distribution and the hyper-parameter, wherein the extended sample data is used to extend training sample data of the business model, and the hyper-parameter at least includes a sample data expansion ratio.
It should be appreciated that the computer programs stored in the memory, when executed, cause the at least one processor 1310 to perform the various operations and functions described above in connection with fig. 1-12 in the various embodiments of the present specification.
Fig. 14 shows a schematic diagram of a computer-implemented business model training apparatus 1400 on the first member device side according to an embodiment of the present description. As shown in fig. 14, the business model training apparatus 1400 may include at least one processor 1410, storage (e.g., non-volatile storage) 1420, memory 1430, and a communication interface 1440, and the at least one processor 1410, the storage 1420, the memory 1430, and the communication interface 1440 are connected together via a bus 1460. The at least one processor 1410 executes at least one computer program (i.e., the elements described above as being implemented in software) stored or encoded in the memory.
In one embodiment, a computer program is stored in the memory that, when executed, causes the at least one processor 1410 to: receive the local sample data distribution information sent by each second member device; determine an overall sample data probability distribution according to the received local sample data distribution information of the respective second member devices; and send the overall sample data probability distribution to each second member device, wherein the overall sample data probability distribution is used by each second member device, together with the hyper-parameter of each second member device, to extract extended sample data of each type of sample data from the local sample data, the extended sample data is used to extend the training sample data of the business model in federated learning, and the hyper-parameter at least includes a sample data expansion ratio.
It should be appreciated that the computer programs stored in the memory, when executed, cause the at least one processor 1410 to perform the various operations and functions described above in connection with fig. 1-12 in the various embodiments of the present specification.
According to one embodiment, a program product, such as a computer-readable medium (e.g., a non-transitory computer-readable medium), is provided. The computer-readable medium may carry a computer program (i.e., the elements described above as being implemented in software) that, when executed by a processor, causes the processor to perform the various operations and functions described above in connection with fig. 1 to 12 in the various embodiments of the present specification. Specifically, a system or apparatus equipped with a readable storage medium may be provided; the software program code implementing the functions of any of the above embodiments is stored on the readable storage medium, and a computer or processor of the system or apparatus reads out and executes the instructions stored therein.
In this case, the program code itself read from the readable medium can realize the functions of any of the above-described embodiments, and thus the computer-readable code and the readable storage medium storing the computer-readable code constitute a part of the present invention.
Examples of the readable storage medium include floppy disks, hard disks, magneto-optical disks, optical disks (e.g., CD-ROMs, CD-R, CD-RWs, DVD-ROMs, DVD-RAMs, DVD-RWs), magnetic tapes, nonvolatile memory cards, and ROMs. Alternatively, the program code may be downloaded from a server computer or from the cloud via a communications network.
According to one embodiment, a computer program product is provided that includes a computer program that, when executed by a processor, causes the processor to perform the various operations and functions described above in connection with fig. 1-12 in the various embodiments of the present specification.
It will be understood by those skilled in the art that various changes and modifications may be made in the above-disclosed embodiments without departing from the spirit of the invention. Accordingly, the scope of the invention should be determined from the following claims.
It should be noted that not all steps and units in the above flows and system structure diagrams are necessary, and some steps or units may be omitted according to actual needs. The execution order of the steps is not fixed, and can be determined as required. The apparatus structures described in the above embodiments may be physical structures or logical structures, that is, some units may be implemented by the same physical entity, or some units may be implemented by a plurality of physical entities, or some units may be implemented by some components in a plurality of independent devices.
In the above embodiments, the hardware units or modules may be implemented mechanically or electrically. For example, a hardware unit, module or processor may comprise permanently dedicated circuitry or logic (such as a dedicated processor, FPGA or ASIC) to perform the corresponding operations. The hardware units or processors may also include programmable logic or circuitry (e.g., a general purpose processor or other programmable processor) that may be temporarily configured by software to perform the corresponding operations. The specific implementation (mechanical, or dedicated permanent, or temporarily set) may be determined based on cost and time considerations.
The detailed description set forth above in connection with the appended drawings describes exemplary embodiments but does not represent all embodiments that may be practiced or fall within the scope of the claims. The term "exemplary" used throughout this specification means "serving as an example, instance, or illustration," and does not mean "preferred" or "advantageous" over other embodiments. The detailed description includes specific details for the purpose of providing an understanding of the described technology. However, the techniques may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the concepts of the described embodiments.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (27)

1. A method for training a business model via a first member device and at least two second member devices, each second member device having a local business model and local sample data, the local sample data of each second member device being non-independent and identically distributed (non-IID) data, the method being applied to a second member device and comprising:
providing local sample data distribution information to the first member device;
receiving an overall sample data probability distribution from the first member device, wherein the overall sample data probability distribution is determined by the first member device according to local sample data distribution information received from each second member device; and
determining extended sample data of each type of sample data from the local sample data according to the overall sample data probability distribution and a hyper-parameter, wherein the extended sample data is used to extend training sample data of the business model, and the hyper-parameter at least includes a sample data expansion ratio.
2. The method of claim 1, further comprising:
performing federated learning, at least together with other second member devices, using respective local sample data and extended sample data, to obtain an updated local business model of each second member device.
3. The method of claim 2, wherein performing federated learning, at least together with other second member devices, using the respective local sample data and the extended sample data to obtain the updated local business model of each second member device comprises:
performing federated learning, at least together with the other second member devices, using the local sample data and the extended sample data that have been shuffled, to obtain the updated local business model of each second member device.
4. The method of claim 1, wherein determining the extended sample data of each type of sample data from the local sample data according to the overall sample data probability distribution and the hyper-parameter comprises:
determining a sample data expansion number of each type of sample data according to the total number of the local sample data, the overall sample data probability distribution, and the hyper-parameter; and
extracting the extended sample data of each type of sample data from the local sample data according to the determined sample data expansion number of each type of sample data.
5. The method of claim 4, wherein extracting the extended sample data of each type of sample data from the local sample data according to the determined sample data expansion number of each type of sample data comprises:
randomly extracting the extended sample data of each type of sample data from the local sample data according to the determined sample data expansion number of each type of sample data.
6. The method of claim 2, further comprising:
determining model performance parameters of the updated local business model at the second member device;
sending the determined model performance parameters to the first member device; and
receiving an adjusted hyper-parameter from the first member device, the adjusted hyper-parameter being determined by the first member device according to an overall model performance parameter of the business model, the overall model performance parameter being determined according to the model performance parameters of the respective second member devices,
wherein the extended sample data determining step, the federated learning step, the model performance parameter determining step, the model performance parameter sending step, and the hyper-parameter receiving step are performed cyclically until the target overall model performance parameter is reached.
7. The method of claim 6, wherein the overall model performance parameter is an average model performance parameter.
8. The method of claim 1, wherein providing the local sample data distribution information to the first member device comprises:
providing the local sample data distribution information to the first member device by means of secure aggregation.
9. The method of claim 8, wherein the secure aggregation comprises:
secure aggregation based on secret sharing;
secure aggregation based on homomorphic encryption;
secure aggregation based on oblivious transfer;
secure aggregation based on garbled circuits; or
secure aggregation based on a trusted execution environment.
10. The method of any of claims 1 to 9, wherein the overall sample data probability distribution comprises:
an overall sample data probability distribution based on labels;
an overall sample data probability distribution based on features; or
an overall sample data probability distribution based on the number of connected edges.
11. A method for training a business model via a first member device and at least two second member devices, each second member device having a local business model and local sample data, the local sample data of each second member device being non-independent and identically distributed (non-IID) data, the method being applied to the first member device and comprising:
receiving local sample data distribution information sent by each second member device;
determining an overall sample data probability distribution according to the received local sample data distribution information of the respective second member devices; and
sending the overall sample data probability distribution to each second member device, wherein the overall sample data probability distribution is used by each second member device, together with the hyper-parameter of each second member device, to extract extended sample data of each type of sample data from the local sample data, the extended sample data is used to extend training sample data of the business model, and the hyper-parameter at least includes a sample data expansion ratio.
12. The method of claim 11, further comprising:
cyclically performing the following steps until the target overall model performance parameter is reached:
receiving, from each second member device, model performance parameters of the updated local business model at each second member device;
determining the overall model performance parameters of the business model according to the model performance parameters of each second member device;
determining the adjusted hyper-parameter according to the overall model performance parameter; and
sending the adjusted hyper-parameter to each second member device.
13. An apparatus for training a business model via a first member device and at least two second member devices, each second member device having a local business model and local sample data, the local sample data of each second member device being non-independent and identically distributed (non-IID) data, the apparatus being applied to a second member device and comprising:
at least one processor,
a memory coupled to the at least one processor, and
a computer program stored in the memory, the computer program being executable by the at least one processor to implement:
providing local sample data distribution information to the first member device;
receiving an overall sample data probability distribution from the first member device, wherein the overall sample data probability distribution is determined by the first member device according to local sample data distribution information received from each second member device;
and determining extended sample data of each type of sample data from the local sample data according to the overall sample data probability distribution and a hyper-parameter, wherein the extended sample data is used to extend training sample data of the business model, and the hyper-parameter at least includes a sample data expansion ratio.
14. The apparatus of claim 13, wherein the at least one processor executes the computer program to implement:
performing federated learning, at least together with other second member devices, using respective local sample data and extended sample data, to obtain an updated local business model of each second member device.
15. The apparatus of claim 14, wherein the at least one processor executes the computer program to implement:
performing federated learning, at least together with the other second member devices, using the local sample data and the extended sample data that have been shuffled, to obtain the updated local business model of each second member device.
16. The apparatus of claim 13, wherein the at least one processor executes the computer program to implement:
determining a sample data expansion number of each type of sample data according to the total number of the local sample data, the overall sample data probability distribution, and the hyper-parameter; and
extracting the extended sample data of each type of sample data from the local sample data according to the determined sample data expansion number of each type of sample data.
17. The apparatus of claim 16, wherein the at least one processor executes the computer program to implement:
randomly extracting the extended sample data of each type of sample data from the local sample data according to the determined sample data expansion number of each type of sample data.
18. The apparatus of claim 14, wherein the at least one processor executes the computer program to further implement:
determining model performance parameters of the updated local business model at the second member device;
sending the determined model performance parameters to the first member device; and
receiving an adjusted hyper-parameter from the first member device, the adjusted hyper-parameter being determined by the first member device according to an overall model performance parameter of the business model, the overall model performance parameter being determined according to the model performance parameters of the respective second member devices,
wherein the extended sample data determining step, the federated learning step, the model performance parameter determining step, the model performance parameter sending step, and the hyper-parameter receiving step are performed cyclically until a target overall model performance parameter is reached.
19. The apparatus of claim 13, wherein the at least one processor executes the computer program to implement:
providing the local sample data distribution information to the first member device by means of secure aggregation.
20. The apparatus of any of claims 13 to 19, wherein the overall sample data probability distribution comprises:
an overall sample data probability distribution based on labels;
an overall sample data probability distribution based on features; or
an overall sample data probability distribution based on the number of connected edges.
21. An apparatus for training a business model via a first member device and at least two second member devices, each second member device having a local business model and local sample data, the local sample data of each second member device being non-independent and identically distributed (non-IID) data, the apparatus being applied to the first member device and comprising:
at least one processor,
a memory coupled to the at least one processor, and
a computer program stored in the memory, the computer program being executable by the at least one processor to implement:
receiving local sample data distribution information sent by each second member device;
determining an overall sample data probability distribution according to the received local sample data distribution information of the respective second member devices; and
sending the overall sample data probability distribution to each second member device, wherein the overall sample data probability distribution is used by each second member device, together with the hyper-parameter of each second member device, to extract extended sample data of each type of sample data from the local sample data, the extended sample data is used to extend training sample data of the business model, and the hyper-parameter at least includes a sample data expansion ratio.
22. The apparatus of claim 21, wherein the at least one processor executes the computer program to further implement:
cyclically performing the following steps until the target overall model performance parameter is reached:
receiving, from each second member device, model performance parameters of the updated local business model at each second member device;
determining the overall model performance parameters of the business model according to the model performance parameters of each second member device;
determining the adjusted hyper-parameter according to the overall model performance parameter; and
sending the adjusted hyper-parameter to each second member device.
23. A business model training system, comprising:
a first member device comprising the apparatus of claim 21 or 22; and
at least two second member devices, each second member device comprising the apparatus according to any one of claims 13 to 20, each second member device having a local business model and local sample data, the local sample data of each second member device being non-independent and identically distributed (non-IID) data.
24. A computer-readable storage medium storing a computer program for execution by a processor to implement the method of any one of claims 1 to 10.
25. A computer program product comprising a computer program for execution by a processor to implement the method of any one of claims 1 to 10.
26. A computer-readable storage medium storing a computer program for execution by a processor to implement the method of claim 11 or 12.
27. A computer program product comprising a computer program for execution by a processor to implement the method of claim 11 or 12.