CN115936209A - Product purchase prediction method, system and controller based on horizontal federation - Google Patents

Info

Publication number
CN115936209A
Authority
CN
China
Prior art keywords
data
feature
quantiles
product purchase
coordinator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211534469.4A
Other languages
Chinese (zh)
Inventor
宁景文
钟焰涛
郑毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Resources Digital Technology Co Ltd
Original Assignee
China Resources Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Resources Digital Technology Co Ltd filed Critical China Resources Digital Technology Co Ltd
Priority to CN202211534469.4A priority Critical patent/CN115936209A/en
Publication of CN115936209A publication Critical patent/CN115936209A/en
Pending legal-status Critical Current

Abstract

The invention discloses a product purchase prediction method based on horizontal federation, a product purchase prediction system, a controller and a computer storage medium. In the method, a plurality of data parties perform random forest sampling on their respective preset data sets to obtain N data sampling sets, determine the best feature and the best split point according to the information gain generated by a coordinator, obtain the base learner corresponding to each of the N data sampling sets, and obtain a strong learner from the N base learners.

Description

Product purchase prediction method, system and controller based on horizontal federation
Technical Field
The application relates to the technical field of federated learning, and in particular to a product purchase prediction method, a product purchase prediction system, a controller and a computer storage medium based on horizontal federation.
Background
In the prior art, with the promulgation of the Personal Information Protection Law, the Data Security Law and the Cybersecurity Law, compliant use of data has become an urgent problem. In the field of artificial intelligence, data cooperation involves many data leakage scenarios. For example, when a central bank serves as the coordinator and other banks serve as data parties, predicting financial product purchases from user data carries a risk of data leakage. Federated learning has therefore emerged as a new technical route for meeting data compliance requirements.
However, in current scenarios of product purchase prediction based on federated learning, the basic strategy is an additive model combined with a forward stagewise algorithm: the true value is approached by repeatedly fitting residuals, and prediction accuracy is improved only by reducing bias. This strategy does not improve generalization capability, and the resulting model overfits easily.
Disclosure of Invention
The embodiments of the application provide a product purchase prediction method, a product purchase prediction system, a controller and a computer storage medium based on horizontal federation. The scheme of the application reduces generalization error from the perspective of variance: the samples of each data party are sampled with a random forest strategy so that each data party trains N base learners. This effectively reduces the variance of federated learning, improves its generalization capability, prevents overfitting, and improves the prediction accuracy of federated learning and of the product purchase prediction method.
In a first aspect, an embodiment of the present application provides a horizontal federation-based product purchase prediction method, which is applied to a product purchase prediction system, where the product purchase prediction system includes a coordinator and multiple data parties, and the horizontal federation-based product purchase prediction method includes:
the data parties carry out random forest sampling processing on respective preset data sets to obtain N data sampling sets;
each data party sends the number of samples M_i of its corresponding data sampling set, a local minimum value and a local maximum value to the coordinator;
the coordinator obtains a plurality of feature quantiles according to the numbers of samples M_i, the local minimum values and the local maximum values, and sends the feature quantiles to the data parties, wherein the feature quantiles include a plurality of quantiles for each of a plurality of features;
each data party performs feature binning according to the feature quantiles to obtain feature binning data and sends the feature binning data to the coordinator, so that the best feature and the best split point are determined according to the information gain that the coordinator generates from the feature binning data;
and each data party obtains the base learners corresponding to the N data sampling sets, and a strong learner is obtained from the N base learners.
In some embodiments, the preset data set corresponding to a data party includes M_i samples, and the data parties performing random forest sampling on their respective data sets to obtain N data sampling sets includes:
the data party performs N rounds of M_i random draws with replacement on the preset data set to obtain N data sampling sets, each containing M_i samples.
In some embodiments, the horizontal-federation-based product purchase prediction method further includes:
determining a target data party from the plurality of data parties;
performing random column sampling on the data set corresponding to the target data party to obtain a feature sequence;
and the target data party sends the feature sequence to the remaining data parties, so that the remaining data parties update their own feature sequences accordingly.
In some embodiments, the feature quantiles include a plurality of quantiles for each feature in the feature sequence, and the coordinator obtaining a plurality of feature quantiles according to the number of samples M_i, the local minimum value and the local maximum value includes:
the coordinator obtains a global minimum value and a global maximum value according to the local minimum value and the local maximum value;
the coordinator obtains a temporary quantile according to the global minimum and the global maximum;
the coordinator obtains a plurality of feature quantiles according to the temporary quantile and the number of samples M_i.
In some embodiments, the coordinator obtaining a plurality of feature quantiles according to the temporary quantile and the number of samples M_i includes:
the coordinator sends the temporary quantile to the data parties, so that each data party counts, according to the temporary quantile, the number n_ik of data items in its preset data set that are smaller than the temporary quantile;
the coordinator obtains the data counts n_ik and obtains a plurality of feature quantiles according to the data counts n_ik, a preset quantile percentage, the global minimum value and the global maximum value.
In some embodiments, the data parties performing feature binning according to the feature quantiles to obtain feature binning data and sending the feature binning data to the coordinator includes:
the data party computes derivatives for the samples in the data sampling set to obtain a plurality of first-order derivatives g_i and second-order derivatives h_i;
the data party bins and aggregates the first-order derivatives g_i and the second-order derivatives h_i according to the feature quantiles to obtain feature binning data G_v and H_v;
the data party sends the feature binning data G_v and H_v to the coordinator through a key exchange protocol to obtain the information gain generated by the coordinator, wherein the information gain corresponds to the best feature and the best split point.
In some embodiments, the strong learner is configured to obtain a prediction result according to a training task, and obtaining the strong learner from the N base learners includes:
in the case that the training task is regression, the prediction result is the expected value of the predicted values of the N base learners;
or, in the case that the training task is classification, the prediction result is the mode of the predicted values of the N base learners.
In a second aspect, the present application provides a product purchase prediction system, where the product purchase prediction system includes a coordinator and multiple data parties;
the coordinator is configured to receive, from each of the data parties, the number of samples M_i of the corresponding data sampling set, a local minimum value and a local maximum value; to obtain a plurality of feature quantiles according to the numbers of samples M_i, the local minimum values and the local maximum values; to send the feature quantiles to the data parties; and to generate information gain according to the feature binning data of the data parties, so that the data parties can determine the best feature and the best split point;
each data party is configured to perform random forest sampling on its preset data set to obtain N data sampling sets; to send the number of samples M_i, the local minimum value and the local maximum value of the corresponding data sampling set to the coordinator; to receive the feature quantiles sent by the coordinator; to perform feature binning according to the feature quantiles to obtain feature binning data and send the feature binning data to the coordinator; to determine the best feature and the best split point according to the information gain that the coordinator generates from the feature binning data; and thereby to obtain the base learners corresponding to the N data sampling sets and obtain a strong learner from the N base learners.
In a third aspect, an embodiment of the present application provides a controller, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the horizontal federation-based product purchase prediction method as described in any one of the embodiments in the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing computer-executable instructions for performing the horizontal-federation-based product purchase prediction method as described in any one of the embodiments of the first aspect.
The application has at least the following beneficial effects: a plurality of data parties perform random forest sampling on their respective preset data sets to obtain N data sampling sets, determine the best feature and the best split point according to the information gain generated by the coordinator, obtain the base learner corresponding to each of the N data sampling sets, and obtain a strong learner from the N base learners.
Drawings
FIG. 1 is a flowchart of a horizontal-federation-based product purchase prediction method according to an embodiment of the present application;
FIG. 2 is a flowchart of the random forest sampling process in a horizontal-federation-based product purchase prediction method according to another embodiment of the present application;
FIG. 3 is a flowchart of obtaining the feature sequence in a horizontal-federation-based product purchase prediction method according to another embodiment of the present application;
FIG. 4 is a flowchart of obtaining a plurality of feature quantiles in a horizontal-federation-based product purchase prediction method according to another embodiment of the present application;
FIG. 5 is another flowchart of obtaining a plurality of feature quantiles in a horizontal-federation-based product purchase prediction method according to another embodiment of the present application;
FIG. 6 is a flowchart of sending the feature binning data to the coordinator in a horizontal-federation-based product purchase prediction method according to another embodiment of the present application;
FIG. 7 is a flowchart of obtaining the strong learner from the N base learners in a horizontal-federation-based product purchase prediction method according to another embodiment of the present application;
FIG. 8 is an example diagram of a horizontal-federation-based product purchase prediction method according to another embodiment of the present application;
FIG. 9 is a schematic diagram of generating a strong learner by randomly sampling a sample set according to another embodiment of the present application;
FIG. 10 is a block diagram of a controller according to another embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In some embodiments, while functional block divisions are performed in system diagrams, with logical orders shown in the flowcharts, in some cases, the steps shown or described may be performed in an order different than the block divisions in the systems, or the flowcharts. The terms first, second and the like in the description and in the claims, and the accompanying drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
At present, in the field of ensemble learning within federated learning, the basic strategy of the secure tree model is an additive model combined with a forward stagewise algorithm: the true value is approached by repeatedly fitting residuals, and prediction accuracy is improved by reducing bias. In addition, although the encryption scheme of practical secure aggregation enables horizontal federated learning, the generalization capability of the SecureBoost algorithm of the secure tree model is not improved, and the model overfits easily.
In order to solve at least the above problems, the application discloses a product purchase prediction method based on horizontal federation, a product purchase prediction system, a controller and a computer storage medium. In the method, a plurality of data parties perform random forest sampling on their respective preset data sets to obtain N data sampling sets, determine the best feature and the best split point according to the information gain generated by the coordinator, obtain the base learner corresponding to each of the N data sampling sets, and obtain a strong learner from the N base learners.
Embodiments of the present application are further described below with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 is a flowchart of a horizontal federation-based product purchase forecasting method provided in an embodiment of a first aspect of the present application, and in some embodiments, the horizontal federation-based product purchase forecasting method is applied to a product purchase forecasting system, the product purchase forecasting system includes a coordinator and a plurality of data parties, and the horizontal federation-based product purchase forecasting method includes, but is not limited to, the following steps S110, S120, S130, S140, and S150;
step S110, a plurality of data parties perform random forest sampling processing on respective preset data sets to obtain N data sampling sets;
step S120, each of the data parties sends the number of samples M_i of its corresponding data sampling set, the local minimum value and the local maximum value to the coordinator;
step S130, the coordinator obtains a plurality of feature quantiles according to the numbers of samples M_i, the local minimum values and the local maximum values, and sends the feature quantiles to the data parties, wherein the feature quantiles include a plurality of quantiles for each of a plurality of features;
step S140, each data party performs feature binning according to the feature quantiles to obtain feature binning data, and sends the feature binning data to the coordinator, so that the best feature and the best split point are determined according to the information gain that the coordinator generates from the feature binning data;
step S150, each data party obtains the base learners corresponding to the N data sampling sets, and a strong learner is obtained from the N base learners.
In some embodiments, generalization error is reduced from the perspective of variance: the samples of each data party are sampled with a random forest strategy so that each data party trains N base learners, which effectively reduces the variance of federated learning, improves its generalization capability, prevents overfitting, and improves prediction accuracy.
In some embodiments, while a base learner is trained in the horizontal federation, the best feature and the best split point are obtained repeatedly. Each time, the data party splits the current node into two new nodes according to the best feature and the best split point and updates its local model. Splitting continues until the loss value is less than the tolerance or the maximum tree depth is reached, at which point the base learner is complete. In this way the base learners corresponding to the N data sampling sets are obtained.
In some embodiments, the technology belongs to federated learning within privacy computing and is used for privacy-preserving multi-party training. The horizontal-federation-based product purchase prediction method can be used to predict purchases of bank wealth-management products. For example, a central bank serves as the coordinator, and other banks, which hold the user data, serve as data parties. The features include deposit balance, product type, product amount, product term, purchase amount, number of views, viewing duration and the like, and the label takes the values ('purchase', 'unpurchased'). Each bank, as a data party, can build a model with the method, so that the banks make full use of one another's data.
Referring to fig. 2, fig. 2 is a flowchart of the random forest sampling process in a horizontal-federation-based product purchase prediction method according to another embodiment of the present application. In some embodiments, the preset data set corresponding to a data party includes M_i samples, and the data parties performing random forest sampling on their respective data sets to obtain N data sampling sets includes, but is not limited to, the following step S210;
step S210, the data party performs N rounds of M_i random draws with replacement on the preset data set to obtain N data sampling sets, each containing M_i samples.
In some embodiments, the data party performs row sampling on its preset data set: from its own training set of M_i samples, the data party draws M_i times at random with replacement to obtain a new sampling set of the same size as the training set. The probability that a given sample is selected in a single draw is 1/M_i, and the probability that it is selected at least once in M_i draws is:
$$1 - \left(1 - \frac{1}{M_i}\right)^{M_i} \approx 1 - \frac{1}{e} \approx 0.632$$
Row sampling is run with a parallelism of N to obtain N new sampling sets. Because the data in each new sampling set differ, the resulting decision trees also differ, which improves the generalization of the method. In addition, about 36.8% of the data are not selected in a given sampling set and can be used as a validation set to verify the prediction results of the application.
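The row sampling described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `bootstrap_sample_sets` and `out_of_bag_fraction` are hypothetical, rows are represented only by their indices, and the sizes M_i = 10,000 and N = 5 are arbitrary.

```python
import random


def bootstrap_sample_sets(num_samples, num_sets, seed=0):
    """Draw `num_sets` sampling sets, each of `num_samples` row indices.

    Every set is built from M_i draws with replacement out of M_i rows,
    so it has the same size as the party's original training set.
    """
    rng = random.Random(seed)
    return [
        [rng.randrange(num_samples) for _ in range(num_samples)]
        for _ in range(num_sets)
    ]


def out_of_bag_fraction(indices, num_samples):
    """Fraction of original rows that never appear in one sampling set."""
    return 1.0 - len(set(indices)) / num_samples


M_i, N = 10_000, 5  # illustrative sizes only (assumption)
sample_sets = bootstrap_sample_sets(M_i, N)
oob_fractions = [out_of_bag_fraction(s, M_i) for s in sample_sets]

# Probability that a given row is picked at least once in M_i draws.
p_selected = 1 - (1 - 1 / M_i) ** M_i
print(round(p_selected, 3), [round(f, 3) for f in oob_fractions])
```

The rows left out of a sampling set (roughly 1/e of the data for large M_i) are the ones that could serve as the validation data mentioned above.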
Column sampling: one data party is selected to perform random column sampling, and the resulting new feature sequence [f1, f2, …, fk] is transmitted to the other data parties so that they update their feature sequences, which improves generalization.
Referring to fig. 3, fig. 3 is a flowchart of a feature sequence acquired in a horizontal federation-based product purchase forecasting method according to another embodiment of the present application, where in some embodiments, the horizontal federation-based product purchase forecasting method includes, but is not limited to, the following steps S310, S320, and S330;
step S310, determining a target data party from a plurality of data parties;
step S320, random column sampling is performed on the data set corresponding to the target data party to obtain a feature sequence;
step S330, the target data party sends the feature sequence to the remaining data parties, so that the remaining data parties update their own feature sequences accordingly.
In some embodiments, column sampling is applied to the features of the data parties: one data party is selected to perform random column sampling, and the resulting new feature sequence [f1, f2, …, fk] is transmitted to the other data parties so that they update their feature sequences, which improves the generalization of the horizontal-federation-based product purchase prediction method.
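Steps S310 to S330 can be illustrated with a short Python sketch. The helper names and the feature names are hypothetical (the feature names loosely follow the banking example given earlier in the description), and the "transmission" between parties is simulated by passing a list.

```python
import random


def sample_feature_sequence(feature_names, k, seed=None):
    """Target data party: pick k distinct features at random (column sampling)."""
    rng = random.Random(seed)
    return rng.sample(list(feature_names), k)


def apply_feature_sequence(rows, feature_sequence):
    """Remaining data party: keep only the received features, in the received order."""
    return [{name: row[name] for name in feature_sequence} for row in rows]


# Hypothetical feature names, used for illustration only.
features = [
    "deposit_balance", "product_type", "product_amount", "product_term",
    "purchase_amount", "browse_count", "browse_duration",
]

# Step S320: the target party samples columns; step S330: it shares the result.
feature_sequence = sample_feature_sequence(features, k=4, seed=42)

# Another party updates its local view of the data accordingly.
other_party_rows = [{name: idx for idx, name in enumerate(features)}]
projected_rows = apply_feature_sequence(other_party_rows, feature_sequence)
print(feature_sequence, projected_rows)
```

Sharing one sequence from a single party keeps every party's feature columns aligned, which horizontal federation requires.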
Referring to fig. 4, fig. 4 is a flowchart of obtaining a plurality of feature quantiles in a horizontal-federation-based product purchase prediction method according to another embodiment of the present application. In some embodiments, the feature quantiles include a plurality of quantiles for each feature in the feature sequence, and the coordinator obtaining a plurality of feature quantiles according to the number of samples M_i, the local minimum value and the local maximum value includes, but is not limited to, the following steps S410, S420 and S430;
step S410, the coordinator obtains a global minimum value and a global maximum value according to the local minimum values and the local maximum values;
step S420, the coordinator obtains a temporary quantile according to the global minimum value and the global maximum value;
step S430, the coordinator obtains a plurality of feature quantiles according to the temporary quantile and the number of samples M_i.
In some embodiments, each data party sends its number of samples M_i to the coordinator, and the coordinator obtains the total number of samples
$$M = \sum_i M_i$$
For the k-th feature, each data party sends its local minimum value and local maximum value to the coordinator, and the coordinator computes the global minimum value and the global maximum value (Q_min, Q_max). The coordinator computes a temporary quantile Q according to Q = (Q_min + Q_max)/2. Here a feature quantile refers to a quantile value, and the first quantile percentage is obtained by taking the reciprocal of the preset number of quantiles. The resulting feature quantiles let all data parties bin their features with the same quantile values, in preparation for the subsequent horizontal federated learning computation.
In some embodiments, the relationship among the local minimum, local maximum, global minimum and global maximum is illustrated as follows. Data party 1 holds: user A, deposit 10 (local maximum); user B, deposit 5 (local minimum). Data party 2 holds: user C, deposit 15 (local maximum); user D, deposit 6 (local minimum). After aggregation at the coordinator, the result is: user C, deposit 15 (global maximum); user B, deposit 5 (global minimum).
It is conceivable that, in the secure tree model of the present application, sorted data are directly converted to random integers during data cleaning and can then participate in the computation directly; the quantiles finally computed for such data are essentially the result of a random ordering and carry no magnitude relationship.
Referring to fig. 5, fig. 5 is another flowchart of obtaining a plurality of feature quantiles in a horizontal-federation-based product purchase prediction method according to another embodiment of the present application. In some embodiments, the coordinator obtaining a plurality of feature quantiles according to the temporary quantile and the number of samples M_i includes, but is not limited to, the following steps S510 and S520;
step S510, the coordinator sends the temporary quantile to the data parties, so that each data party counts, according to the temporary quantile, the number n_ik of data items in its preset data set that are smaller than the temporary quantile;
step S520, the coordinator obtains the data counts n_ik and obtains a plurality of feature quantiles according to the data counts n_ik, the preset quantile percentage, the global minimum value and the global maximum value.
In some embodiments, the specific process of steps S510 and S520 is as follows. After the temporary quantile Q is computed, it is distributed to all data parties. Each data party counts the number n_ik of its values that are smaller than Q, and the counts are aggregated at the coordinator to obtain the fraction of all samples that fall below Q:
$$\frac{1}{M}\sum_i n_{ik}$$
If this fraction is greater than the preset v-th quantile percentage, Q_max = Q; if it is less, Q_min = Q. These steps are repeated until the fraction is equal or approximately equal to the preset quantile percentage, which gives the v-th quantile Q_v. The procedure is repeated to obtain all quantiles of all features, which are then synchronized to every data party. The aggregated fraction lies in (0, 1) so that it can be compared with the v-th quantile percentage, whereas Q is a concrete feature value that can range from negative infinity to positive infinity.
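The iterative quantile search described above behaves like a bisection between the global minimum and the global maximum. The following Python sketch shows one way to read it, under stated assumptions: the function name `federated_quantile`, the stopping tolerances and the use of plain lists to stand in for data parties are all hypothetical, and in a real deployment only the counts n_ik, the local extremes and the sample counts would leave each party.

```python
def federated_quantile(parties, percentage, tol=1e-9, max_iter=200):
    """Bisection search for the value below which `percentage` of all samples fall.

    `parties` is a list of lists, one list of feature values per data party.
    """
    total = sum(len(values) for values in parties)       # M = sum of M_i
    q_min = min(min(values) for values in parties)       # global minimum
    q_max = max(max(values) for values in parties)       # global maximum
    q = (q_min + q_max) / 2
    for _ in range(max_iter):
        q = (q_min + q_max) / 2                          # temporary quantile Q
        # Each party reports only its count n_ik of values smaller than Q.
        below = sum(sum(1 for x in values if x < q) for values in parties)
        fraction = below / total
        if abs(fraction - percentage) < 1e-12 or q_max - q_min < tol:
            break
        if fraction > percentage:
            q_max = q                                    # Q too large: shrink from above
        else:
            q_min = q                                    # Q too small: shrink from below
    return q


# Deposits from the two-party example in the description.
party_1 = [10, 5]
party_2 = [15, 6]
median = federated_quantile([party_1, party_2], 0.5)
first_quartile = federated_quantile([party_1, party_2], 0.25)
print(median, first_quartile)
```

With the four deposits 5, 6, 10 and 15, the first temporary quantile is (5 + 15)/2 = 10, and exactly half of the values lie below it, so the search for the 50% quantile stops immediately.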
Referring to fig. 6, fig. 6 is a flowchart of sending the feature binning data to the coordinator in a horizontal-federation-based product purchase prediction method according to another embodiment of the present application. In some embodiments, the data parties performing feature binning according to the feature quantiles to obtain feature binning data and sending the feature binning data to the coordinator includes, but is not limited to, the following steps S610, S620 and S630;
step S610, the data party computes derivatives for the samples in the data sampling set to obtain a plurality of first-order derivatives g_i and second-order derivatives h_i;
step S620, the data party bins and aggregates the first-order derivatives g_i and the second-order derivatives h_i according to the feature quantiles to obtain feature binning data G_v and H_v;
step S630, the data party sends the feature binning data G_v and H_v to the coordinator through a key exchange protocol to obtain the information gain generated by the coordinator, wherein the information gain corresponds to the best feature and the best split point.
In some embodiments, the N base learners are trained in parallel. Each data party computes the first-order derivative g_i and the second-order derivative h_i of each sample according to the configured loss function and the label value, aggregates them by bin into G_v and H_v, and initializes the root node.
In some embodiments, the first-order derivative g_i and the second-order derivative h_i are the first and second derivatives of the loss function, and they are binned according to the quantile values computed earlier. For example, for the first bin of the first feature, all g_i belonging to the first bin are summed to obtain G_1, and all h_i belonging to the first bin are summed to obtain H_1.
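The per-bin aggregation can be sketched in Python as below. The description does not fix the loss function, so the logistic loss used here is an assumption (it gives g = p - y and h = p(1 - p)); the function names, the zero initial score, the bin-boundary convention and the sample values are likewise hypothetical.

```python
import bisect
import math


def logistic_grad_hess(label, raw_score=0.0):
    """First and second derivative of the logistic loss w.r.t. the raw score.

    Assumption: the document only says "the set loss function"; logistic
    loss is chosen here for illustration.
    """
    p = 1.0 / (1.0 + math.exp(-raw_score))
    return p - label, p * (1.0 - p)


def bin_gradients(values, labels, quantiles):
    """Sum g_i and h_i per bin of one feature, giving G_v and H_v."""
    n_bins = len(quantiles) + 1
    G = [0.0] * n_bins
    H = [0.0] * n_bins
    for x, y in zip(values, labels):
        g, h = logistic_grad_hess(y)
        v = bisect.bisect_right(quantiles, x)  # index of the bin containing x
        G[v] += g
        H[v] += h
    return G, H


deposits = [5, 6, 10, 15]   # one feature held by a data party (illustrative)
labels = [0, 0, 1, 1]       # 1 = 'purchase', 0 = 'unpurchased'
G, H = bin_gradients(deposits, labels, quantiles=[5.625, 10.0])
print(G, H)
```

Because every party bins with the same quantile values, the per-bin sums from different parties can later be added element by element.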
In some embodiments, the data party sends the feature binning data G_v and H_v to the coordinator through a key exchange protocol as follows: the data parties generate symmetric AES keys according to the Diffie-Hellman key exchange protocol and generate masks pairwise according to the following formula, where j denotes the other data parties:
Figure BDA0003977038740000081
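The mask formula itself is not reproduced in this text, so the following Python sketch only illustrates the general idea of pairwise masking as used in secure aggregation schemes, and should not be read as the patent's exact construction. Assumptions: integer seeds stand in for the keys that Diffie-Hellman would establish, Python's `random` module stands in for a cryptographic generator (it is not secure), values are encoded as fixed-point integers modulo 2^64, and the party with the lower index adds the shared mask while the other subtracts it.

```python
import random

MODULUS = 2 ** 64
SCALE = 10 ** 6  # fixed-point scale for encoding real values (assumption)


def encode(x):
    return int(round(x * SCALE)) % MODULUS


def decode(z):
    if z >= MODULUS // 2:  # map the upper half back to negative numbers
        z -= MODULUS
    return z / SCALE


def pairwise_mask(seed, length):
    """Expand a shared seed into a mask vector (stand-in for a keyed PRG)."""
    rng = random.Random(seed)
    return [rng.randrange(MODULUS) for _ in range(length)]


def mask_vector(party, vector, shared_seeds):
    """Add the mask for each pair (party, j) with party < j, subtract it otherwise."""
    masked = [encode(x) for x in vector]
    for (a, b), seed in shared_seeds.items():
        if party not in (a, b):
            continue
        sign = 1 if party == a else -1
        for idx, m in enumerate(pairwise_mask(seed, len(vector))):
            masked[idx] = (masked[idx] + sign * m) % MODULUS
    return masked


def aggregate(masked_vectors):
    """Coordinator: summing the masked vectors cancels every pairwise mask."""
    length = len(masked_vectors[0])
    return [decode(sum(v[i] for v in masked_vectors) % MODULUS) for i in range(length)]


# Hypothetical per-bin sums G_v held by three data parties.
local_G = [[0.5, -1.0], [1.5, 0.25], [-0.75, 2.0]]
shared_seeds = {(0, 1): 11, (0, 2): 22, (1, 2): 33}  # one seed per pair (assumption)
masked_G = [mask_vector(i, g, shared_seeds) for i, g in enumerate(local_G)]
total_G = aggregate(masked_G)
print(total_G)
```

The coordinator sees only masked vectors, yet their sum equals the sum of the plaintext vectors because each mask is added once and subtracted once.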
In some embodiments, the information gain generated by the coordinator is obtained as follows: each data party sends its masked data G'_v to the coordinator, where the masked data are aggregated as
Figure BDA0003977038740000082
The coordinator then traverses all quantiles of all features and computes the information gain according to the following formula:
Figure BDA0003977038740000083
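The gain formula is not reproduced in this text. As a hedged illustration, the Python sketch below uses the second-order split gain that is standard for gradient-boosted trees of the XGBoost and SecureBoost family; whether the patent uses exactly this form, and the regularization parameters `lam` and `gamma`, are assumptions, as are the function names and the example sums.

```python
def split_gain(G_left, H_left, G_right, H_right, lam=1.0, gamma=0.0):
    """Standard second-order split gain (assumed form, see lead-in)."""
    def score(G, H):
        return G * G / (H + lam)

    parent = score(G_left + G_right, H_left + H_right)
    return 0.5 * (score(G_left, H_left) + score(G_right, H_right) - parent) - gamma


def best_split(G_bins, H_bins, lam=1.0, gamma=0.0):
    """Traverse every bin boundary of every feature and keep the largest gain.

    G_bins[f][v] and H_bins[f][v] are the aggregated sums for feature f, bin v.
    Returns (feature index, last bin on the left side, gain).
    """
    best = (None, None, float("-inf"))
    for f, (G, H) in enumerate(zip(G_bins, H_bins)):
        G_total, H_total = sum(G), sum(H)
        G_left = H_left = 0.0
        for v in range(len(G) - 1):
            G_left += G[v]
            H_left += H[v]
            gain = split_gain(G_left, H_left, G_total - G_left,
                              H_total - H_left, lam, gamma)
            if gain > best[2]:
                best = (f, v, gain)
    return best


# Hypothetical aggregated sums for two features with three bins each.
G_bins = [[0.5, 0.5, -1.0], [0.0, 0.5, -0.5]]
H_bins = [[0.25, 0.25, 0.5], [0.5, 0.25, 0.25]]
feature, bin_index, gain = best_split(G_bins, H_bins)
print(feature, bin_index, gain)
```

The feature and bin boundary with the largest gain play the role of the best feature and best split point that the coordinator reports back to the data parties.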
In some embodiments, after the data party has sent the feature binning data G_v and H_v to the coordinator through the key exchange protocol and obtained the information gain generated by the coordinator, which corresponds to the best feature and the best split point, the number of base learner training rounds boosting_round is set to 1 and the learning rate learning_rate is set to 1 on the basis of the secure tree model SecureBoost, and the optimal parameter, namely the predicted value, is obtained for each leaf node according to the following formula:
Figure BDA0003977038740000084
Because the error is not reduced by gradually fitting residuals, the original boosting procedure is run for only one round with a learning rate of 1, so that each tree is fitted in a single step. The generalization error is then reduced by combining the plurality of trees obtained by parallel training according to an ensemble strategy.
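For reference only, the leaf value commonly used in XGBoost and SecureBoost can be written as follows. This is an assumption about the general form and not a reproduction of the formula of the figure; the regularisation coefficient \lambda, the leaf sample set I_j and the initial prediction \hat{y}_i^{(0)} are symbols introduced here for illustration.

```latex
w_j^{*} = -\frac{G_j}{H_j + \lambda},
\qquad G_j = \sum_{i \in I_j} g_i,
\qquad H_j = \sum_{i \in I_j} h_i

% With a single boosting round and a learning rate of 1, the output of one
% base learner for a sample x_i that falls into leaf j is
\hat{y}_i = \hat{y}_i^{(0)} + 1 \cdot w_j^{*}
```

Under this form, a single round with a learning rate of 1 means the leaf value is applied in full rather than being shrunk and corrected by later rounds.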
Referring to fig. 7, fig. 7 is a flowchart of obtaining a strong learner according to the N base learners in the horizontal federation-based product purchase prediction method. In some embodiments, the strong learner is configured to obtain a prediction result according to a training task, and obtaining the strong learner according to the N base learners includes, but is not limited to, the following steps S710 and S720:
Step S710, in the case that the training task is regression training, the prediction result is the expected value of the predicted values of the N base learners;
Step S720, or, in the case that the training task is classification training, the prediction result is the mode of the predicted values of the N base learners.
In some embodiments, after the horizontal federation-based product purchase prediction method completes training of the N base learners, the expectation of the predicted values is taken if the training task is regression, and the mode of the predicted values is taken if the training task is classification, so that the prediction accuracy can be effectively improved.
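As a non-limiting illustration of combining the N base learners, the following Python sketch takes the mean of the predicted values for a regression task and the mode for a classification task. The function name aggregate_predictions and the task labels are assumptions made for illustration only.

```python
from collections import Counter
from statistics import mean


def aggregate_predictions(predictions, task):
    """Combine the predicted values of the N base learners for one sample."""
    if task == "regression":
        # Expected value of the predicted values.
        return mean(predictions)
    if task == "classification":
        # Mode of the predicted values; on a tie, the value seen first wins.
        return Counter(predictions).most_common(1)[0][0]
    raise ValueError(f"unknown task: {task}")


regression_result = aggregate_predictions([1.0, 2.0, 3.0], "regression")
classification_result = aggregate_predictions([1, 0, 1], "classification")
```

The tie-breaking behaviour shown here is a property of this sketch; the present application does not specify how ties are resolved.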
Referring to fig. 8, fig. 8 is an example diagram of a horizontal federation-based product purchase prediction method according to another embodiment of the present application. In some embodiments, the present application takes the data of each party as input, performs random sampling of the data and column sampling, performs horizontal quantile binning, and then starts horizontal federated training of the base learners. Generation of each base learner is completed by judging whether the loss value is less than a tolerance or whether a preset tree depth has been reached, and a federated learning model with high generalization ability is finally obtained through an ensemble strategy over the base learners, which effectively prevents overfitting. Random column sampling of the features of the data set may also reduce variance. Based on SecureBoost, boosting_round, namely the number of training rounds of the base learner, is set to 1 and learning_rate is set to 1; finally, the N base learners are trained in parallel and the expected value of their results is taken to further reduce the variance, thereby effectively preventing overfitting in federated training.
Referring to fig. 9, fig. 9 is a schematic diagram of generating a strong learner by randomly sampling sample sets according to another embodiment of the present application. In this embodiment, a plurality of data parties perform random forest sampling on their respective preset data sets to obtain N data sampling sets, determine the optimal feature and the optimal split point according to the information gain generated by the coordinator so as to obtain the base learners corresponding to the N data sampling sets, and obtain the strong learner according to the N base learners.
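As a non-limiting illustration of the random forest sampling performed by one data party, the following Python sketch draws N data sampling sets by sampling with replacement and selects a random subset of feature columns. The function names random_forest_sampling and sample_columns, the fixed seeds and the example data are assumptions made for illustration only.

```python
import numpy as np


def random_forest_sampling(X, y, n_sets, seed=0):
    """Draw n_sets bootstrap sets, each with as many rows as X (M_i rows)."""
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    sets = []
    for _ in range(n_sets):
        idx = rng.integers(0, m, size=m)  # M_i draws with replacement
        sets.append((X[idx], y[idx]))
    return sets


def sample_columns(n_features, k, seed=0):
    """Pick k distinct feature indices, returned in ascending order."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_features, size=k, replace=False))


X = np.arange(20.0).reshape(10, 2)   # 10 samples, 2 features
y = np.arange(10)
sets = random_forest_sampling(X, y, n_sets=3)
columns = sample_columns(n_features=5, k=3)
```

In a federated setting the selected column indices would have to be shared among the data parties so that every party trains on the same features.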
A second aspect of the present application provides a product purchase prediction system, which includes a coordinator and a plurality of data parties;
a coordinator, configured to receive the number of samples M_i, the local minimum and the local maximum of the data sampling set corresponding to each of the plurality of data parties, obtain a plurality of feature quantiles according to the number of samples M_i, the local minimum and the local maximum, send the plurality of feature quantiles to the plurality of data parties respectively, and generate information gain according to the feature binning data of the data parties so that the data parties determine the optimal feature and the optimal split point;
data parties, each configured to perform random forest sampling on its preset data set to obtain N data sampling sets, send the number of samples M_i, the local minimum and the local maximum of the corresponding data sampling set to the coordinator so as to obtain the plurality of feature quantiles sent by the coordinator, perform feature binning according to the feature quantiles to obtain feature binning data, send the feature binning data to the coordinator, determine the optimal feature and the optimal split point according to the information gain generated by the coordinator from the feature binning data, thereby obtain the base learners corresponding to the N data sampling sets, and obtain a strong learner according to the N base learners.
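As a non-limiting illustration of how a coordinator may obtain a feature quantile from sample counts and local extrema without seeing raw values, the following Python sketch uses a bisection search over a temporary quantile. The bisection rule, the class DataParty, the function federated_quantile and the parameters percentage, tol and max_iter are assumptions made for illustration; the present application only states that counts of data smaller than a temporary quantile are used together with the global minimum and maximum.

```python
class DataParty:
    """Holds one feature column locally and only answers aggregate queries."""

    def __init__(self, values):
        self._values = list(values)

    def local_stats(self):
        # Number of samples M_i, local minimum, local maximum.
        return len(self._values), min(self._values), max(self._values)

    def count_below(self, candidate):
        # Number n_ik of local values smaller than the temporary quantile.
        return sum(1 for x in self._values if x < candidate)


def federated_quantile(parties, percentage, tol=1e-6, max_iter=100):
    """Coordinator-side search for the value at the given percentage."""
    stats = [p.local_stats() for p in parties]
    total = sum(m for m, _, _ in stats)
    lo = min(s[1] for s in stats)          # global minimum
    hi = max(s[2] for s in stats)          # global maximum
    target = percentage * total
    for _ in range(max_iter):
        candidate = (lo + hi) / 2          # temporary quantile
        count = sum(p.count_below(candidate) for p in parties)
        if count < target:
            lo = candidate
        else:
            hi = candidate
        if hi - lo < tol:
            break
    return (lo + hi) / 2


parties = [DataParty([1, 2, 3, 4]), DataParty([5, 6, 7, 8, 9, 10])]
median = federated_quantile(parties, percentage=0.5)
```

Repeating the search for each desired percentage and each sampled feature would give the plurality of feature quantiles that the coordinator sends back to the data parties.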
In some embodiments, the product purchase prediction system conforms to an operating environment in which the horizontal federal-based product purchase prediction method of any one of the above embodiments operates, so that the product purchase prediction system has the functions and effects of the horizontal federal-based product purchase prediction method of any one of the above embodiments.
Referring to fig. 10, fig. 10 is a schematic structural diagram of a controller according to an embodiment of the present invention.
Some embodiments of the present invention provide a controller, the controller includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor when executing the computer program implements the horizontal federation-based product purchase forecasting method according to any one of the above embodiments, for example, executes the above-described method steps S110 to S150 in fig. 1, S210 in fig. 2, S310 to S330 in fig. 3, S410 to S430 in fig. 4, S510 to S520 in fig. 5, S610 to S630 in fig. 6, and S710 to S720 in fig. 7.
The controller 1000 according to the embodiment of the present invention includes one or more processors 1010 and memories 1020, and fig. 10 illustrates one processor 1010 and one memory 1020 as an example.
The processor 1010 and the memory 1020 may be connected by a bus or other means, such as the bus connection shown in FIG. 10.
The memory 1020, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. Further, the memory 1020 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 1020 may optionally include memory located remotely from the processor 1010, which may be connected to the controller 1000 via a network, examples of which include, but are not limited to, the internet, an intranet, a local area network, a mobile communications network, and combinations thereof.
In some embodiments, the processor, when executing the computer program, executes the horizontal federal based product purchase forecast method of any of the above embodiments at preset intervals.
Those skilled in the art will appreciate that the device configuration shown in fig. 10 does not constitute a limitation of the controller 1000, and may include more or fewer components than shown, or some components may be combined, or a different arrangement of components.
In the controller 1000 shown in fig. 10, the processor 1010 may be configured to invoke the program of the horizontal federation-based product purchase prediction method stored in the memory 1020, thereby implementing the horizontal federation-based product purchase prediction method.
Based on the hardware structure of the controller 1000, the embodiments of the product purchase prediction system of the present invention are proposed, and at the same time, the non-transitory software programs and instructions required to implement the horizontal federal-based product purchase prediction method of the embodiments are stored in the memory, and when executed by the processor, the horizontal federal-based product purchase prediction method of the embodiments is executed.
In addition, the embodiment of the invention also provides a product purchase prediction system, which comprises the controller.
In some embodiments, since the product purchase prediction system according to the embodiment of the present invention has the controller according to the above-described embodiment, and the controller according to the above-described embodiment is capable of executing the product purchase prediction method according to the above-described embodiment, a specific implementation and a technical effect of the product purchase prediction system according to the embodiment of the present invention may refer to a specific implementation and a technical effect of the product purchase prediction method according to the above-described embodiment.
An embodiment of the present invention further provides a computer-readable storage medium storing computer-executable instructions for performing the horizontal federation-based product purchase prediction method described above. For example, when executed by one or more processors, the instructions cause the one or more processors to perform the horizontal federation-based product purchase prediction method of the above method embodiments, for example, the method steps S110 to S150 in fig. 1, the method step S210 in fig. 2, the method steps S310 to S330 in fig. 3, the method steps S410 to S430 in fig. 4, the method steps S510 to S520 in fig. 5, the method steps S610 to S630 in fig. 6, and the method steps S710 to S720 in fig. 7 described above.
The above-described embodiments of the apparatus are merely illustrative, and the units illustrated as separate components may or may not be physically separate, may be located in one place, or may be distributed over a plurality of network nodes. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
One of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer readable storage media (or non-transitory media) and communication media (or transitory media). The term computer readable storage medium includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those skilled in the art. Computer-readable storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
While the preferred embodiments of the present invention have been described, the present invention is not limited to the above embodiments, and those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention, and such equivalent modifications or substitutions are to be included within the scope of the present invention defined by the claims.

Claims (10)

1. A horizontal federation-based product purchase prediction method, applied to a product purchase prediction system comprising a coordinator and a plurality of data parties, the horizontal federation-based product purchase prediction method comprising:
the data parties carry out random forest sampling processing on respective preset data sets to obtain N data sampling sets;
the data parties each send the number of samples M_i, the local minimum value and the local maximum value of the corresponding data sampling set to the coordinator;
the coordinator obtains a plurality of feature quantiles according to the number of samples M_i, the local minimum value and the local maximum value, and sends the plurality of feature quantiles to the plurality of data parties respectively, wherein the feature quantiles comprise a plurality of quantiles of a plurality of features;
the plurality of data parties respectively perform feature binning according to the feature quantiles to obtain feature binning data, and send the feature binning data to the coordinator, so as to determine an optimal feature and an optimal split point according to information gain generated by the coordinator from the feature binning data;
and the data parties respectively obtain N base learners corresponding to the data sampling sets, and obtain a strong learner according to the N base learners.
2. The horizontal federation-based product purchase prediction method of claim 1, wherein the preset data set corresponding to the data party comprises M_i samples, and the plurality of data parties carrying out random forest sampling processing on respective data sets to obtain N data sampling sets comprises:
the data party performs N rounds of M_i random samplings with replacement on the preset data set to obtain N data sampling sets each containing M_i samples.
3. The horizontal federation-based product purchase prediction method of claim 1 or 2, further comprising:
determining a target data party from the plurality of data parties;
performing random column sampling on the data set corresponding to the target data party to obtain a feature sequence;
and the target data party sending the feature sequence to the remaining data parties, so that the remaining data parties update their own feature sequences according to the received feature sequence.
4. The method according to claim 3, wherein the feature quantiles comprise a plurality of quantiles for each feature in the feature sequence, and the coordinator obtaining a plurality of feature quantiles according to the number of samples M_i, the local minimum value and the local maximum value comprises:
the coordinator obtains a global minimum value and a global maximum value according to the local minimum value and the local maximum value;
the coordinator obtains a temporary quantile according to the global minimum value and the global maximum value;
the coordinator obtains a plurality of feature quantiles according to the temporary quantile and the number of samples M_i.
5. The method of claim 4, wherein the coordinator obtaining a plurality of feature quantiles according to the temporary quantile and the number of samples M_i comprises:
the coordinator sends the temporary quantile to the plurality of data parties, so that each data party obtains, according to the temporary quantile, the number n_ik of data in its preset data set that are smaller than the temporary quantile;
the coordinator obtains the data numbers n_ik, and obtains a plurality of feature quantiles according to the data numbers n_ik, the predicted quantile percentage, the global minimum value and the global maximum value.
6. The horizontal federation-based product purchase prediction method of claim 1, wherein the plurality of data parties respectively performing feature binning according to the feature quantiles to obtain feature binning data and sending the feature binning data to the coordinator comprises:
the data party performs derivation on a plurality of samples in the data sampling set to obtain a plurality of first derivatives g_i and second derivatives h_i;
the data party performs binning and aggregation on the first derivatives g_i and the second derivatives h_i respectively according to the feature quantiles to obtain feature binning data G_v and H_v;
the data party sends the feature binning data G_v and H_v to the coordinator through a key exchange protocol to obtain the information gain generated by the coordinator, wherein the information gain corresponds to the optimal feature and the optimal split point.
7. The horizontal federation-based product purchase prediction method of claim 1, wherein the strong learner is configured to obtain a prediction result according to a training task, and obtaining the strong learner according to the N base learners comprises:
in the case that the training task is regression training, the prediction result is the expected value of the predicted values of the N base learners;
or, in the case that the training task is classification training, the prediction result is the mode of the predicted values of the N base learners.
8. A product purchase forecasting system, the product purchase forecasting system comprising a coordinator and a plurality of data parties;
the coordinator is configured to receive the number of samples M_i, the local minimum value and the local maximum value of the data sampling set corresponding to each of the plurality of data parties, obtain a plurality of feature quantiles according to the number of samples M_i, the local minimum value and the local maximum value, send the plurality of feature quantiles to the plurality of data parties respectively, and generate information gain according to the feature binning data of the plurality of data parties so that the plurality of data parties determine the optimal feature and the optimal split point;
the data parties are each configured to perform random forest sampling processing on a respective preset data set to obtain N data sampling sets, send the number of samples M_i, the local minimum value and the local maximum value of the corresponding data sampling set to the coordinator so as to obtain the plurality of feature quantiles sent by the coordinator, perform feature binning according to the feature quantiles to obtain feature binning data, send the feature binning data to the coordinator, determine the optimal feature and the optimal split point according to the information gain generated by the coordinator from the feature binning data, obtain N base learners corresponding to the data sampling sets, and obtain a strong learner according to the N base learners.
9. A controller comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the horizontal federation-based product purchase prediction method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium storing computer-executable instructions for performing the horizontal federation-based product purchase prediction method of any one of claims 1 to 7.
CN202211534469.4A 2022-12-02 2022-12-02 Product purchase prediction method, system and controller based on horizontal federation Pending CN115936209A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211534469.4A CN115936209A (en) 2022-12-02 2022-12-02 Product purchase prediction method, system and controller based on horizontal federation

Publications (1)

Publication Number Publication Date
CN115936209A true CN115936209A (en) 2023-04-07

Family

ID=86697286

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211534469.4A Pending CN115936209A (en) 2022-12-02 2022-12-02 Product purchase prediction method, system and controller based on horizontal federation

Country Status (1)

Country Link
CN (1) CN115936209A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination