CN113377625B - Method and device for data monitoring aiming at multi-party combined service prediction - Google Patents


Publication number
CN113377625B
Authority
CN
China
Legal status
Active
Application number
CN202110830690.3A
Other languages
Chinese (zh)
Other versions
CN113377625A
Inventor
吴庭丞
余超凡
余可丰
操顺德
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202110830690.3A
Publication of CN113377625A
Application granted
Publication of CN113377625B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/061 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using biological neurons, e.g. biological neurons connected to an integrated circuit
    • G06N3/08 Learning methods
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities

Abstract

The embodiments of this specification provide a method and a device for data monitoring for multi-party joint service prediction, in which a monitoring device and a plurality of service-party devices perform data monitoring on joint service prediction carried out with secure multi-party computation (MPC). Any one service-party device first obtains data to be predicted that contains its private data, then performs service prediction on that data through MPC-based data interaction among the service-party devices, based on the service prediction model held across those devices, so that it obtains a prediction result for the data to be predicted. It then adds the data to be predicted and the corresponding prediction result to a data set to be counted. When a preset statistical condition is met, the service-party device computes statistics over the not-yet-counted data in that data set to obtain statistical features that contain no private data and sends them to the monitoring device, which jointly processes the statistical features sent by the multiple service-party devices.

Description

Method and device for data monitoring aiming at multi-party combined service prediction
Technical Field
One or more embodiments of the present disclosure relate to the field of data processing technologies, and in particular, to a method and an apparatus for data monitoring for multi-party federated service prediction.
Background
With the development of artificial intelligence technology, neural networks have been gradually applied to the fields of risk assessment, speech recognition, face recognition, natural language processing, and the like. The neural network structure under different application scenarios is already relatively fixed, and more training data is needed to realize better model performance. In the fields of medical treatment, finance and the like, different enterprises or institutions have different business data, and the model accuracy can be greatly improved by performing joint training on the business data.
When a jointly trained model is used for online prediction, the model's input and output data need to be monitored and displayed in real time by a monitoring device, and saved for subsequent problem tracking and backtracking, to ensure that prediction results are correct and that the model runs normally. However, the business data owned by different enterprises or institutions usually contains a large amount of private data, so the input and output data of online prediction are strictly controlled and cannot leave the owner's domain.
Therefore, an improved scheme is desired that can monitor the model's input and output data when multiple parties jointly perform service prediction, while ensuring that the service parties' private data is not leaked.
Disclosure of Invention
One or more embodiments of this specification describe a method and an apparatus for data monitoring for multi-party federated service prediction, which monitor the model's input and output data during multi-party federated service prediction while ensuring that the service parties' private data is not revealed. The specific technical solutions are as follows.
In a first aspect, an embodiment provides a method for data monitoring for multi-party federated service prediction, in which a monitoring device and multiple service-party devices perform data monitoring on federated service prediction that uses secure multi-party computation (MPC). The method is performed by any one of the service-party devices and includes:
acquiring data to be predicted that contains private data of the service-party device;
performing, based on the service prediction model in the plurality of service-party devices, service prediction on the data to be predicted through MPC-based data interaction among the service-party devices, so that the service-party device obtains a prediction result for the data to be predicted;
adding the data to be predicted and the corresponding prediction result to a data set to be counted;
when a preset statistical condition is met, computing statistics over the not-yet-counted data in the data set to be counted to obtain statistical features that contain no private data; and
sending the statistical features to the monitoring device, so that the monitoring device jointly processes the statistical features sent by the plurality of service-party devices.
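The service-party steps above (buffer each prediction, check a statistical condition, summarize, send) can be sketched as a small in-process loop. This is a minimal sketch under stated assumptions: the MPC prediction is treated as already done (its result is simply passed in), `ServicePartyMonitor`, `condition` and `send` are hypothetical names, and the summary contents are illustrative rather than the patent's exact feature set.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# One buffered record: (feature values of the data to be predicted, prediction result).
Record = Tuple[List[float], float]


@dataclass
class ServicePartyMonitor:
    """Hypothetical service-party helper: buffers records and ships only aggregates."""
    condition: Callable[[int, float], bool]  # (pending record count, current time) -> run statistics?
    send: Callable[[dict], None]             # transport to the monitoring device (assumed)
    pending: List[Record] = field(default_factory=list)

    def on_prediction(self, features: Sequence[float], result: float, now: float) -> None:
        # The MPC prediction has already produced `result`; add the pair to the data set to be counted.
        self.pending.append((list(features), result))
        if self.condition(len(self.pending), now):
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        results = [r for _, r in self.pending]
        # Only aggregate values are placed in the message; no raw record is sent.
        summary = {
            "count": len(results),
            "result_mean": mean(results),
            "result_min": min(results),
            "result_max": max(results),
        }
        # Clearing the buffer marks these records as counted, so the next run
        # covers only not-yet-counted data.
        self.pending.clear()
        self.send(summary)
```

For example, `ServicePartyMonitor(condition=lambda n, now: n >= 3, send=print)` would print one summary after every third prediction. Passing the condition in as a callable keeps the count-based and time-based triggers interchangeable.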
In one embodiment, the statistical condition includes one or more of the following:
the number of data items in the data set to be counted reaches a preset count threshold;
the time elapsed since the last statistics run reaches a preset duration threshold.
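Because the embodiment allows one or both conditions, a natural reading is that statistics run as soon as any configured condition holds. The sketch below assumes exactly that OR semantics; the function name and parameters are hypothetical, and a threshold left as `None` is treated as not configured.

```python
from typing import Optional


def statistical_condition_met(pending_count: int,
                              now: float,
                              last_stat_time: float,
                              count_threshold: Optional[int] = None,
                              duration_threshold: Optional[float] = None) -> bool:
    """Return True if any configured trigger fires (hypothetical OR semantics)."""
    # Count trigger: enough records have accumulated in the data set to be counted.
    if count_threshold is not None and pending_count >= count_threshold:
        return True
    # Time trigger: enough time has passed since the last statistics run.
    if duration_threshold is not None and now - last_stat_time >= duration_threshold:
        return True
    return False
```

Combining both triggers bounds the batch size on busy days and the reporting delay on quiet ones.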
In one embodiment, computing statistics over the not-yet-counted data in the data set to be counted includes
performing one or more of the following:
computing statistics over the data to be predicted in the data set to be counted;
computing statistics over the prediction results in the data set to be counted;
and computing statistics over the correspondence between the data to be predicted and the prediction results in the data set to be counted.
In one embodiment, the statistical features include one or more of the following: maximum, minimum, mean, median, and bucketed data.
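The scalar statistical features listed above can be computed with Python's standard `statistics` module. A minimal sketch, assuming each feature item of the data to be predicted is a numeric column; the helper names are hypothetical.

```python
from statistics import mean, median
from typing import Dict, List, Sequence


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Max, min, mean and median of one monitored quantity."""
    if not values:
        raise ValueError("cannot summarize an empty batch")
    return {"max": max(values), "min": min(values),
            "mean": mean(values), "median": median(values)}


def summarize_features(rows: Sequence[Sequence[float]]) -> List[Dict[str, float]]:
    """Apply summarize() to each feature item (column) of the data to be predicted."""
    # zip(*rows) transposes the batch so that each tuple holds one feature item.
    return [summarize(column) for column in zip(*rows)]
```

`summarize` applies equally to a batch of prediction results; `summarize_features` handles the multi-column data to be predicted.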
In one embodiment, the statistical features include bucketed data, and computing statistics over the not-yet-counted data in the data set to be counted includes:
obtaining a plurality of bucket intervals for the not-yet-counted data;
determining how many items of the not-yet-counted data in the data set to be counted fall into each of the bucket intervals, to obtain the bucketed data;
where obtaining the plurality of bucket intervals for the not-yet-counted data includes:
when the not-yet-counted data is the data to be predicted, determining bucket intervals for the feature items contained in the data to be predicted;
when the not-yet-counted data is the prediction results, determining bucket intervals according to the values of the prediction results;
when the not-yet-counted data is the correspondence, determining bucket intervals for the feature items contained in the data to be predicted;
and where determining how many items of the not-yet-counted data fall into each of the bucket intervals includes:
when the not-yet-counted data is the correspondence, determining, based on the correspondences, the number of prediction results in each of the bucket intervals, to obtain the bucketed data.
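Bucketing can be sketched with the standard `bisect` module. This is a hypothetical sketch: the bucket edges are assumed to be fixed in advance (for example per feature item), and the correspondence statistic is interpreted here as a two-dimensional count table (feature bucket by prediction-result bucket), which is one possible reading of "the number of prediction results in each bucket interval based on the correspondences".

```python
from bisect import bisect_right
from typing import List, Sequence


def bucket_index(value: float, edges: Sequence[float]) -> int:
    """n sorted edges give n + 1 buckets: (-inf, e0), [e0, e1), ..., [e_last, +inf)."""
    return bisect_right(edges, value)


def bucket_counts(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Bucketed data for one feature item, or for the prediction results."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bucket_index(v, edges)] += 1
    return counts


def correspondence_counts(feature_values: Sequence[float],
                          results: Sequence[float],
                          feature_edges: Sequence[float],
                          result_edges: Sequence[float]) -> List[List[int]]:
    """table[i][j] = number of prediction results in result bucket j whose input
    falls in feature bucket i (an assumed interpretation of the correspondence)."""
    table = [[0] * (len(result_edges) + 1) for _ in range(len(feature_edges) + 1)]
    for x, y in zip(feature_values, results):
        table[bucket_index(x, feature_edges)][bucket_index(y, result_edges)] += 1
    return table
```

With age edges `[18, 30, 50]`, `bucket_counts([17, 18, 25, 45, 60], [18, 30, 50])` places the ages into four buckets. Only the counts, not the ages, would leave the device.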
In one embodiment, after the statistical features containing no private data are obtained, the method further includes:
acquiring historical statistical features;
and determining, by comparing the historical statistical features with the current statistical features, whether the joint service prediction is abnormal.
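The comparison method is not specified above. One common way to compare historical and current bucketed data is the population stability index (PSI); the sketch below swaps in PSI as an assumed technique, and the 0.2 alarm threshold is a widely used rule of thumb rather than a value from the source. Both histograms must use the same bucket intervals.

```python
import math
from typing import Sequence


def population_stability_index(hist_counts: Sequence[int],
                               cur_counts: Sequence[int],
                               eps: float = 1e-6) -> float:
    """PSI = sum over buckets of (cur% - hist%) * ln(cur% / hist%)."""
    if len(hist_counts) != len(cur_counts):
        raise ValueError("historical and current bucket layouts differ")
    h_total, c_total = sum(hist_counts), sum(cur_counts)
    if h_total == 0 or c_total == 0:
        raise ValueError("empty histogram")
    psi = 0.0
    for h, c in zip(hist_counts, cur_counts):
        # eps avoids log(0) and division by zero for empty buckets.
        hp = max(h / h_total, eps)
        cp = max(c / c_total, eps)
        psi += (cp - hp) * math.log(cp / hp)
    return psi


def is_abnormal(hist_counts: Sequence[int], cur_counts: Sequence[int],
                threshold: float = 0.2) -> bool:
    """Flag a distribution shift; the 0.2 threshold is an assumed rule of thumb."""
    return population_stability_index(hist_counts, cur_counts) > threshold
```

PSI depends only on bucket proportions, so batches of different sizes can be compared directly.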
In one embodiment, performing service prediction on the data to be predicted includes:
sending a prediction request to the other service-party devices; and performing service prediction on the data to be predicted after receiving feedback from the other service-party devices in response to the prediction request.
In one embodiment, the method further includes:
receiving prediction requests sent by other service-party devices;
and sending feedback on the prediction request to the other service-party devices, and performing, based on the service prediction model in the service-party devices, service prediction on the other devices' data to be predicted through MPC-based data interaction among the service-party devices.
In one embodiment, the method further includes:
determining request information related to prediction requests and adding it to the data set to be counted.
In one embodiment, the request information includes one or more of the following: the number of requests, the number of successful predictions, and the proportion of successful predictions.
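The request information above can be tracked with a small counter on the initiating device. A minimal sketch: `RequestStats` and `run_prediction_round` are hypothetical names, and treating a round as successful only when every peer acknowledged the request is an assumed simplification of the request/feedback exchange.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RequestStats:
    """Request information added to the data set to be counted (hypothetical)."""
    requests: int = 0
    successes: int = 0

    def record(self, succeeded: bool) -> None:
        self.requests += 1
        if succeeded:
            self.successes += 1

    @property
    def success_ratio(self) -> float:
        # Proportion of predictions successfully performed; 0.0 before any request.
        return self.successes / self.requests if self.requests else 0.0


def run_prediction_round(peer_acks: Sequence[bool], stats: RequestStats) -> bool:
    """Initiator side: proceed with the MPC prediction only if every peer fed back (assumed rule)."""
    ok = all(peer_acks)
    stats.record(ok)
    return ok
```

These counters contain no private data and can be sent with the other statistical features.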
In one embodiment, each of the service-party devices holds part of the model parameters of the service prediction model, and the partial model parameters of all the service-party devices, if hypothetically combined, form all the model parameters of the service prediction model;
or, the partial model parameters of the service-party devices together with the partial model parameters of a server, if hypothetically combined, form all the model parameters of the service prediction model. In that case, performing service prediction on the data to be predicted includes: performing service prediction on the data to be predicted through MPC-based data interaction among the service-party devices and the server, based on the service prediction model in the service-party devices and the server.
In a second aspect, an embodiment provides a method for data monitoring for multi-party federated service prediction, in which a monitoring device and a plurality of service-party devices perform data monitoring on federated service prediction that uses MPC. The method is performed by the monitoring device and includes:
receiving statistical features sent by the plurality of service-party devices, where the statistical features are obtained by computing statistics over data to be predicted and the corresponding prediction results and contain no private data, and the prediction results are obtained by performing service prediction on the data to be predicted through MPC-based data interaction among the service-party devices using their respective service prediction models;
and jointly processing the statistical features of the plurality of service-party devices to obtain a joint processing result.
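Joint processing on the monitoring device can be sketched as merging per-party summaries. This sketch assumes each party reports a count and a sum (so that a pooled mean can be recovered), a minimum, a maximum, and bucket counts over shared bucket intervals. `PartyStats` and `merge` are hypothetical names. Medians are deliberately left out because per-party medians cannot be merged exactly; the merged buckets can approximate a pooled median.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PartyStats:
    count: int            # number of records summarized by this party
    total: float          # sum of the monitored values (recovers the pooled mean)
    minimum: float
    maximum: float
    buckets: List[int]    # counts per bucket, same bucket intervals on every party


def merge(stats: Sequence[PartyStats]) -> PartyStats:
    """Combine per-party summaries into one pooled summary (hypothetical joint processing)."""
    if not stats:
        raise ValueError("no statistics to merge")
    n_buckets = len(stats[0].buckets)
    if any(len(s.buckets) != n_buckets for s in stats):
        raise ValueError("parties must use identical bucket intervals")
    return PartyStats(
        count=sum(s.count for s in stats),
        total=sum(s.total for s in stats),
        minimum=min(s.minimum for s in stats),
        maximum=max(s.maximum for s in stats),
        # Element-wise sum of the bucket counts across parties.
        buckets=[sum(col) for col in zip(*(s.buckets for s in stats))],
    )
```

For these statistics, the merged count, sum, minimum, maximum and bucket counts equal what statistics over the pooled raw data would give, without any raw record leaving a party.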
In one embodiment, after the statistical features of the plurality of service-party devices are jointly processed, relevant data in the joint processing result is used to determine a modification scheme for the service prediction models in the service-party devices.
In a third aspect, an embodiment provides a method for data monitoring for multi-party joint service prediction, in which a monitoring device and a plurality of service-party devices perform data monitoring on joint service prediction that uses MPC;
any one service-party device acquires data to be predicted that contains its private data; performs, based on the service prediction model in the plurality of service-party devices, service prediction on the data to be predicted through MPC-based data interaction among the service-party devices, so that it obtains a prediction result for the data to be predicted; adds the data to be predicted and the corresponding prediction result to a data set to be counted; when a preset statistical condition is met, computes statistics over the not-yet-counted data in the data set to be counted to obtain statistical features that contain no private data; and sends the statistical features to the monitoring device;
the monitoring device receives the statistical features sent by the plurality of service-party devices and jointly processes them.
In a fourth aspect, an embodiment provides an apparatus for data monitoring for multi-party federated service prediction, in which a monitoring device and a plurality of service-party devices perform data monitoring on federated service prediction that uses MPC. The apparatus is deployed in any one of the service-party devices and includes:
an acquisition module, configured to acquire data to be predicted that contains private data of the service-party device;
a prediction module, configured to perform, based on the service prediction model in the plurality of service-party devices, service prediction on the data to be predicted through MPC-based data interaction among the service-party devices, so that the service-party device obtains a prediction result for the data to be predicted;
an adding module, configured to add the data to be predicted and the corresponding prediction result to a data set to be counted;
a statistics module, configured to compute, when a preset statistical condition is met, statistics over the not-yet-counted data in the data set to be counted, to obtain statistical features that contain no private data;
and a sending module, configured to send the statistical features to the monitoring device, so that the monitoring device jointly processes the statistical features sent by the plurality of service-party devices.
In one embodiment, the statistical condition includes one or more of the following:
the number of data items in the data set to be counted reaches a preset count threshold;
the time elapsed since the last statistics run reaches a preset duration threshold.
In one embodiment, the statistics module is specifically configured to
perform one or more of the following:
computing statistics over the data to be predicted in the data set to be counted;
computing statistics over the prediction results in the data set to be counted;
and computing statistics over the correspondence between the data to be predicted and the prediction results in the data set to be counted.
In one embodiment, the statistical features include one or more of the following: maximum, minimum, mean, median, and bucketed data.
In one embodiment, the statistical features include bucketed data, and the statistics module is specifically configured to:
obtain a plurality of bucket intervals for the not-yet-counted data;
determine how many items of the not-yet-counted data in the data set to be counted fall into each of the bucket intervals, to obtain the bucketed data;
where, when obtaining the plurality of bucket intervals for the not-yet-counted data, the statistics module is configured to:
when the not-yet-counted data is the data to be predicted, determine bucket intervals for the feature items contained in the data to be predicted;
when the not-yet-counted data is the prediction results, determine bucket intervals according to the values of the prediction results;
when the not-yet-counted data is the correspondence, determine bucket intervals for the feature items contained in the data to be predicted;
and where, when determining how many items of the not-yet-counted data fall into each of the bucket intervals, the statistics module is configured to:
when the not-yet-counted data is the correspondence, determine, based on the correspondences, the number of prediction results in each of the bucket intervals, to obtain the bucketed data.
In a fifth aspect, an embodiment provides an apparatus for data monitoring for multi-party federated service prediction, in which a monitoring device and a plurality of service-party devices perform data monitoring on federated service prediction that uses MPC. The apparatus is deployed in the monitoring device and includes:
a receiving module, configured to receive statistical features sent by the plurality of service-party devices, where the statistical features are obtained by computing statistics over data to be predicted and the corresponding prediction results and contain no private data, and the prediction results are obtained by performing service prediction on the data to be predicted through MPC-based data interaction among the service-party devices using their respective service prediction models;
and a processing module, configured to jointly process the statistical features of the plurality of service-party devices to obtain a joint processing result.
In a sixth aspect, an embodiment provides a system for data monitoring for multi-party federated service prediction, including a monitoring device and a plurality of service-party devices that perform data monitoring on federated service prediction that uses MPC;
any service-party device is configured to acquire data to be predicted that contains its private data; perform, based on the service prediction model in the plurality of service-party devices, service prediction on the data to be predicted through MPC-based data interaction among the service-party devices, so that it obtains a prediction result for the data to be predicted; add the data to be predicted and the corresponding prediction result to a data set to be counted; when a preset statistical condition is met, compute statistics over the not-yet-counted data in the data set to be counted to obtain statistical features that contain no private data; and send the statistical features to the monitoring device;
the monitoring device is configured to receive the statistical features sent by the plurality of service-party devices and jointly process them.
In a seventh aspect, embodiments provide a computer-readable storage medium, on which a computer program is stored, which, when executed in a computer, causes the computer to perform the method of any one of the first to third aspects.
In an eighth aspect, an embodiment provides a computing device, including a memory and a processor, where the memory stores executable code, and the processor executes the executable code to implement the method of any one of the first to third aspects.
In the method and apparatus provided in the embodiments of this specification, after the service-party devices perform MPC-based data interaction and determine the prediction result for the data to be predicted, the data to be predicted (which contains private data) and the corresponding prediction result are added to a data set to be counted. When a preset statistical condition is met, the not-yet-counted data in that data set is summarized into statistical features that contain no private data, and these features are sent to the monitoring device. The monitoring device receives the statistical features from the multiple service-party devices and jointly processes them. The service-party devices therefore never send private data to the monitoring device. By jointly processing their statistical features, the monitoring device obtains a result consistent with what it would get by collecting the private data directly and computing statistics over it, while the private data never leaves each party's domain. Monitoring of the model's input and output data is thus achieved without leaking the service parties' private data.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 is a schematic diagram illustrating an implementation scenario of an embodiment disclosed herein;
FIG. 2 is a flowchart illustrating a method for data monitoring for multi-party federated service prediction according to an embodiment;
FIG. 3 is a schematic diagram of an implementation flow within a service-party device;
FIG. 4 is a schematic diagram comparing historical bucketed data with current bucketed data;
FIG. 5 is another schematic flowchart of a method for data monitoring for multi-party federated service prediction according to an embodiment;
FIG. 6 is a schematic block diagram of an apparatus for data monitoring for multi-party federated service prediction according to an embodiment;
FIG. 7 is another schematic block diagram of an apparatus for data monitoring for multi-party federated service prediction according to an embodiment;
FIG. 8 is a schematic block diagram of a system for data monitoring for multi-party federated service prediction according to an embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
FIG. 1 is a schematic view of an implementation scenario of an embodiment disclosed in this specification. The service-party devices each have their own service data, which is private data. A plurality of (two or more) service-party devices can jointly train a service prediction model on their existing service data through MPC-based data interaction. After the model is trained, the service-party devices perform MPC-based data interaction using the jointly trained model to carry out joint service prediction. Any service-party device can compute statistics over the input and output data of the joint service prediction to obtain statistical features containing no private data, and send them to the monitoring device. The monitoring device can then receive the statistical features of the multiple service-party devices and jointly process them to obtain a joint processing result for the service prediction data of those devices.
The service-party devices correspond to enterprises or institutions, each of which is a service party. For example, different banks, shopping malls, hospitals, or schools may each correspond to one service party. Each service party performs data processing, data transmission, and other operations through its service-party device. Each service party's service data may be accessed only within that party's intranet and cannot leave its domain. A service-party device may be implemented by any apparatus, device, platform, or device cluster with computing and processing capabilities.
The service data may include feature data of an object, where the object may be any object to be analyzed in the business, such as a user, a commodity, or an event; the service data may accordingly be user data, commodity data, event data, and so on. The service data may include a plurality of feature items, such as at least one of the following: basic attribute information, association information, interaction information, and historical behavior information of the object. For example, when the object is a user, the basic attribute information may include the user's gender, age, and income; the association information may include other users, companies, and regions associated with the user; the interaction information may include the user's clicks, views, and participation in activities on a website; and the historical behavior information may include the user's historical transactions, payments, and purchases. Service data is often private data of a service party and must be kept highly private and secure during processing.
Multiple service parties can jointly train the service prediction model on their respective sample data sets, which may include existing or historical service data. The service prediction model may be a model trained by machine learning to perform service prediction, including classification or regression, on the input service data of an object. For example, it may process input user data to determine whether the user is high-risk or to determine a risk value for the user, or it may perform anomaly detection on input service data.
In one model architecture for joint processing, the service prediction model may include n computation layers. Each service-party device has the structure of all n computation layers but holds only part of the model parameters of each layer; that is, each service-party device holds part of the model parameters of the service prediction model, and the partial model parameters of all the service-party devices, if hypothetically combined, form all the model parameters of the service prediction model. In this scenario, the structure of the n computation layers is deployed in every service-party device, and part of the model parameters of each computation layer is distributed to each service-party device. For any computation layer, the full model parameters of that layer equal the sum of the partial model parameters held by the service-party devices for that layer. The computation layers here may include the output layer.
Each service-party device can act as an input node of the service prediction model, with its service data serving as one piece of input data to the model.
In another model architecture for joint processing, the service prediction model may be divided into two parts, a first partial model and a second partial model. The first partial model may be deployed in a server and the second partial model in the service-party devices, and the output of one partial model serves as the input of the other: for example, the output of the first partial model may be the input of the second, or the output of the second may be the input of the first. That is, the partial model parameters of the service-party devices together with the partial model parameters of the server, if hypothetically combined, form all the model parameters of the service prediction model.
The computation layers of the first partial model are deployed on the server. The computation layers of the second partial model are deployed on the business-party devices, with the parameters of each such layer split across the devices: for each computation layer of the second partial model, its full model parameters equal the sum of the partial parameters held by the business-party devices for that layer.
In both architectures, when the neurons of a computation layer deployed on the business-party devices are evaluated, the devices exchange data with one another using MPC, each based on the data it owns, and thereby compute the neurons of that layer jointly.
Multi-party secure computation (MPC) is an established privacy-preserving technique for computations involving several parties; concrete implementations include homomorphic encryption, garbled circuits, oblivious transfer, and secret sharing. Using MPC, the business-party devices can compute reliably on private data when they jointly perform business prediction, without disclosing any party's private data.
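Of the techniques listed, secret sharing is the easiest to illustrate. The following is a minimal sketch of additive secret sharing over a prime field, with multiplication done via a Beaver triple. It assumes an honest-but-curious setting and a trusted dealer that pre-distributes the triple; the modulus and all function names are illustrative assumptions, not the protocol used in this specification.

```python
import random
from typing import List, Tuple

PRIME = 2**61 - 1  # assumed field modulus (a Mersenne prime)
Shares = List[int]


def share(secret: int, n: int, rng: random.Random) -> Shares:
    """Split a secret into n random shares that sum to it modulo PRIME."""
    shares = [rng.randrange(PRIME) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares


def reconstruct(shares: Shares) -> int:
    return sum(shares) % PRIME


def add(xs: Shares, ys: Shares) -> Shares:
    """Addition is local: each party adds its own two shares."""
    return [(x + y) % PRIME for x, y in zip(xs, ys)]


def deal_triple(n: int, rng: random.Random) -> Tuple[Shares, Shares, Shares]:
    """Trusted dealer (assumption): shares of random a, b and of c = a*b."""
    a, b = rng.randrange(PRIME), rng.randrange(PRIME)
    return share(a, n, rng), share(b, n, rng), share(a * b % PRIME, n, rng)


def multiply(xs: Shares, ys: Shares, triple: Tuple[Shares, Shares, Shares]) -> Shares:
    """Beaver multiplication: open d = x - a and e = y - b, then combine locally."""
    a_sh, b_sh, c_sh = triple
    # d and e are opened publicly; the random a and b mask x and y.
    d = reconstruct([(x - a) % PRIME for x, a in zip(xs, a_sh)])
    e = reconstruct([(y - b) % PRIME for y, b in zip(ys, b_sh)])
    # x*y = c + d*b + e*a + d*e; the public term d*e is added by one party only.
    zs = [(c + d * b + e * a) % PRIME for a, b, c in zip(a_sh, b_sh, c_sh)]
    zs[0] = (zs[0] + d * e) % PRIME
    return zs


if __name__ == "__main__":
    rng = random.Random(7)
    x, y = share(6, 2, rng), share(7, 2, rng)
    print(reconstruct(add(x, y)), reconstruct(multiply(x, y, deal_triple(2, rng))))  # 13 42
```

Additions, and multiplications by public constants, need no communication; each secret-by-secret multiplication costs one round of opening masked values, which is why evaluating a layer jointly involves repeated data interaction.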
The structure of the business prediction model deployed on the business-party devices is the same during model training and during business prediction. Compared with business prediction, training adds the following operations: after a business-party device inputs sample data into the model and obtains a prediction result for it through MPC-based joint processing, the device determines a prediction loss from the difference between the sample's label and the prediction result, and updates the model parameters using that loss through data interaction among the business-party devices. During business prediction, the business-party device acting as the prediction initiator inputs to-be-predicted data containing private data into the model and obtains its prediction result through MPC-based joint processing. The prediction result may indicate whether an object is a high-risk user, give the object's risk value, or give the probability that the object is abnormal; correspondingly, the label may be a tag indicating whether the object is a high-risk user, a labeled risk value, or a labeled probability of abnormality. In the last case, the business prediction model may take the specific form of an anomaly detection model.
During online prediction, the monitoring device needs to monitor and display the input and output data of the business-party devices and to track and trace back problems, so as to ensure that prediction results are correct and the model runs normally. However, under the business policy, both the input data of the business prediction model and the output data obtained from it are private data and cannot be exported in plaintext.
To monitor the model's input and output data in a multi-party joint business prediction scenario without leaking private data, an embodiment of this specification provides a data monitoring method. A business-party device obtains to-be-predicted data containing its own private data; based on the business prediction models on the business-party devices, the devices exchange data using MPC to perform business prediction on that data, and the initiating device obtains the prediction result. The device adds the to-be-predicted data and its prediction result to a to-be-counted data set. When a preset statistical condition is met, it computes statistics over the not-yet-counted data in that set to obtain statistical features that contain no private data, and sends those features to the monitoring device. On receiving the statistical features of the business-party devices, the monitoring device processes them jointly to obtain a joint processing result.
Thus the monitoring device never computes statistics directly on the business-party devices' input and output data. It only jointly processes the privacy-free statistical features sent by those devices, which still summarizes their input and output data while keeping each business party's private data secure.
This specification is described below with reference to specific embodiments.
Fig. 2 is a flowchart of a method for data monitoring for multi-party joint business prediction according to an embodiment. In this embodiment, a monitoring device C and multiple business-party devices monitor data for joint business prediction performed with MPC. The monitoring device may be any apparatus, platform, or device cluster with computing and processing capabilities. For ease of description, the example uses two business parties, A and B, whose data-processing devices are business-party device A and business-party device B respectively. Business parties and business-party devices correspond one-to-one; a business-party device performs the data computation, data transmission, and other processing for its business party. The monitoring device is labeled C. Business party A is taken as the initiator of online business prediction. The method includes steps S210 to S250 below.
In step S210, business-party device A obtains the to-be-predicted data Data1 of business party A, which contains private data.
The to-be-predicted data may be business data of business party A, such as data on a user of A who is to undergo risk assessment, data on an object of A whose risk value is to be predicted, or data on an object to undergo anomaly detection. In one example, business parties A and B are different banks. Business party A has a user 1 awaiting risk assessment; business-party device A obtains user 1's feature data as Data1, and predicting risk jointly with the other party makes the result more accurate. In another example, business parties A and B are an insurance company and a hospital respectively. Business party A has a user 2 awaiting anomaly detection; business-party device A obtains user 2's feature data as Data1, and joint anomaly detection lets business party A determine whether user 2 has information that does not meet the eligibility conditions.
The to-be-predicted data may be feature data of an object; it is business data and contains private data. Business-party device A may read the encrypted to-be-predicted data from business party A's high-availability storage and decrypt it to obtain Data1. Alternatively, Data1 may be stored directly on business-party device A, which then reads it from its own storage.
Data1 is the input data that business party A feeds into the business prediction model for business prediction, and it is data to be monitored.
In step S220, based on the business prediction models on the multiple business-party devices, business prediction is performed on Data1 through MPC-based data interaction among those devices, so that business-party device A obtains a prediction result Re1 for Data1.
In this step, the business prediction model has been jointly trained by the multiple parties and can be used for business prediction.
The business prediction model can use various architectures, at least two of which are described above. In any of them, the devices jointly evaluate the functions of the model's computation layers, performing multiple rounds of MPC-based data interaction on Data1, so that business-party device A obtains the prediction result Re1 of Data1.
When the architecture does not include a server, the business prediction model is distributed across the business-party devices, and data interaction takes place among those devices only. For example, business-party devices A and B each use their own partial model parameters, take Data1 as the model input, and perform multiple rounds of MPC-based data interaction to carry out the business prediction.
When the architecture includes a server, the server and the business-party devices each hold part of the model parameters; the partial parameters of the business-party devices and of the server, if hypothetically combined, make up all model parameters of the business prediction model. In this case, based on the business prediction models on the business-party devices and the server, business prediction is performed on Data1 through MPC-based data interaction among the business-party devices and the server, so that business-party device A obtains the prediction result Re1 of Data1; that is, the business-party devices and the server interact over multiple rounds. For example, business-party device A, business-party device B, and the server each use their own partial model parameters, take Data1 as the model input, and perform multiple rounds of MPC-based data interaction to carry out the business prediction.
To perform business prediction jointly with the other business-party devices, business party A, as the initiator, may send a prediction request to them. The request is initiated for Data1 and asks the other devices to predict Data1 jointly; it may carry an identifier of Data1 and an identifier of the task to be executed. The other business-party devices, meaning all devices other than business-party device A, are usually on networks different from A's, so the request may be transmitted over the public network.
On receiving the prediction request from business-party device A, another business-party device that determines it can perform the joint business prediction task may send feedback for the request to device A. For example, business-party device B may send such feedback to business-party device A.
When business-party device A receives the feedback of the other business-party devices for the prediction request, it performs business prediction on Data1. Once device A obtains the prediction result Re1, the joint business prediction task can be regarded as successful. If device A does not receive feedback from the other devices, it may abandon the business prediction of Data1, in which case this joint business prediction task fails.
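The request-and-feedback exchange in the preceding paragraphs can be sketched as follows. The message fields mirror the identifiers the text mentions (data identifier, task identifier); the transport is abstracted as hypothetical callables that return whether a peer acknowledged, and a peer that never answers is modeled as one that returns `False`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class PredictionRequest:
    data_id: str  # identifier of the to-be-predicted data, e.g. "Data1"
    task_id: str  # identifier of the joint prediction task to execute


def run_joint_prediction(
    request: PredictionRequest,
    peers: Dict[str, Callable[[PredictionRequest], bool]],
    joint_predict: Callable[[PredictionRequest], float],
) -> Optional[float]:
    """Send the request to every peer; predict only if all of them acknowledge.

    Returns the prediction result, or None when the task is abandoned.
    """
    feedback = {name: ask(request) for name, ask in peers.items()}
    if not all(feedback.values()):
        return None  # missing feedback: this joint prediction task fails
    return joint_predict(request)  # stands in for the MPC-based joint prediction
```

When the architecture includes a server, the server would simply be one more entry in `peers`.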
In one embodiment, another business-party device may become the initiator. For example, business-party device B may send business-party device A a prediction request initiated for B's to-be-predicted data Data2, asking device A to predict Data2 jointly. The request may carry an identifier of Data2 and an identifier of the task to be executed.
Business-party device A may receive prediction requests from other business-party devices. When it determines that it can perform the joint business prediction task, it sends feedback for the request to the requesting device and, based on the business prediction models on the business-party devices, performs business prediction on that device's to-be-predicted data (for example, Data2 of business-party device B) through MPC-based data interaction among the devices.
When the architecture includes a server, the targets of the prediction requests sent by business-party device A, and likewise by business-party device B, also include the server, so that the server and the business-party devices are all asked to perform joint business prediction on the to-be-predicted data.
The prediction result Re1 is the output data of the business prediction model; it is private data of business party A and is also data to be monitored by monitoring device C.
In step S230, business-party device A adds Data1 and its prediction result Re1 to the to-be-counted data set.
Business-party device A may also determine request information related to prediction requests and add it to the to-be-counted data set. The request information may include any of the following: the number of requests, the number of successful predictions, and the proportion of successful predictions.
The request information may cover prediction requests sent by business-party device A, prediction requests received by it, or both. For example, the number of requests may include the number of prediction requests sent by device A and/or the number received by it; the number of successful predictions may include the number of successful predictions initiated by device A and/or by other business-party devices; and the proportion of successful predictions may likewise be computed for predictions initiated by device A and/or by other devices. The request information may also include other information related to prediction requests.
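A minimal sketch of how a business-party device might keep the request information listed above, assuming plain in-memory counters; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RequestStats:
    """Hypothetical counters for prediction requests sent and received by one device."""
    sent: int = 0
    sent_succeeded: int = 0
    received: int = 0
    received_succeeded: int = 0

    def record_sent(self, succeeded: bool) -> None:
        self.sent += 1
        self.sent_succeeded += int(succeeded)

    def record_received(self, succeeded: bool) -> None:
        self.received += 1
        self.received_succeeded += int(succeeded)

    @staticmethod
    def _ratio(ok: int, total: int) -> float:
        return ok / total if total else 0.0

    def summary(self) -> dict:
        """Only counts and proportions are reported, never the request payloads."""
        return {
            "sent": self.sent,
            "sent_success_rate": self._ratio(self.sent_succeeded, self.sent),
            "received": self.received,
            "received_success_rate": self._ratio(self.received_succeeded, self.received),
        }
```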
In step S240, when a preset statistical condition is met, business-party device A computes statistics over the not-yet-counted data in the to-be-counted data set to obtain a statistical feature St1 that contains no private data, and sends St1 to monitoring device C, which receives it.
Steps S210 to S230 may be executed repeatedly, so that the to-be-counted data set accumulates multiple pieces of to-be-predicted data and their prediction results.
Fig. 3 is a schematic diagram of the processing flow inside a business-party device, which comprises an online prediction part and a data statistics part. In the online prediction part, the to-be-predicted data is input into the business prediction model, which outputs a prediction result. In the data statistics part, a feature statistics module records the features of the to-be-predicted data and the prediction result, and sends the resulting statistical features to the network where the monitoring device resides. The business party here can be any of the business parties.
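The data statistics part of Fig. 3 can be sketched as a small buffer that collects (to-be-predicted data, prediction result) records and, once a flush condition holds, replaces them with a summary before anything leaves the device. The class name and the three injected callables (`should_flush`, `summarize`, `send`) are hypothetical; in practice `send` would transmit to the monitoring device's network.

```python
from typing import Any, Callable, List, Sequence, Tuple

Record = Tuple[Sequence[float], float]  # (feature values of to-be-predicted data, prediction result)


class FeatureStatisticsModule:
    def __init__(
        self,
        should_flush: Callable[[int], bool],
        summarize: Callable[[List[Record]], Any],
        send: Callable[[Any], None],
    ) -> None:
        self._pending: List[Record] = []  # not-yet-counted records (private, stays local)
        self._should_flush = should_flush
        self._summarize = summarize
        self._send = send

    def add(self, features: Sequence[float], prediction: float) -> None:
        """Called by the online prediction part after each joint prediction."""
        self._pending.append((tuple(features), prediction))
        if self._should_flush(len(self._pending)):
            stats = self._summarize(self._pending)
            self._pending = []  # counted records are discarded, not forwarded
            self._send(stats)   # only the privacy-free summary leaves the device
```

Passing the condition and the summary in as functions keeps the buffering logic independent of which statistical condition and which statistical features a deployment chooses.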
Business-party device B may also initiate business prediction, repeating steps S210 to S230 so that its own to-be-counted data set accumulates multiple pieces of to-be-predicted data and their prediction results. When its preset statistical condition is met, device B computes statistics over the not-yet-counted data in its set to obtain a statistical feature St2 that contains no private data, and sends St2 to monitoring device C, which receives it. The statistical conditions of business-party devices A and B may be the same or different.
The statistical condition may be that the amount of not-yet-counted data in the to-be-counted data set reaches a preset count threshold, or that the time elapsed since the last statistics run reaches a preset duration threshold. Either condition may be used alone, or both together. When both are used, they can be combined with OR: the statistical condition is met when the amount of not-yet-counted data reaches the count threshold or the elapsed time reaches the duration threshold, and is not met only when neither threshold is reached. Alternatively, they can be combined with AND: the condition is met only when the amount of not-yet-counted data reaches the count threshold and the elapsed time reaches the duration threshold, and is not met if either is not reached.
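The OR and AND combinations can be written as a single predicate. This is a sketch; the parameter names, and the use of seconds as the time unit, are assumptions.

```python
def statistics_due(
    pending_count: int,
    seconds_since_last_run: float,
    count_threshold: float,
    duration_threshold: float,
    mode: str = "or",
) -> bool:
    """Return True when the preset statistical condition is met."""
    count_reached = pending_count >= count_threshold
    time_reached = seconds_since_last_run >= duration_threshold
    if mode == "or":
        return count_reached or time_reached
    if mode == "and":
        return count_reached and time_reached
    raise ValueError(f"unknown mode: {mode!r}")
```

Using only one of the two conditions corresponds to mode `"or"` with the other threshold set to `float("inf")`.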
Taking business-party device A as an example, statistics over the not-yet-counted data can be computed at different data levels: over the to-be-predicted data Data1 in the set, over the prediction results Re1 in the set, or over the correspondence between the pieces of to-be-predicted data Data1 and their prediction results Re1. Any one of these three approaches may be used, or any two or all three may be combined.
The statistical features may include any of the following: maximum, minimum, mean, median, and bucket data. For example, if the to-be-counted data set contains m1 pieces of to-be-predicted data, each with values for k1 feature items, the maximum, minimum, mean, or median of each feature item can be determined from its m1 values. Likewise, the m1 pieces of to-be-predicted data have m1 corresponding prediction results, from which the maximum, minimum, mean, or median can be determined. For example, if the prediction result is the probability that a user is high-risk, each piece of to-be-predicted data corresponds to one probability value, so the statistical features can be computed from the m1 probability values. The form of the prediction result may differ between specific scenarios.
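A sketch of the per-feature summary using only Python's standard `statistics` module; `rows` stands for the m1 pieces of to-be-predicted data with k1 feature values each, and the output contains only aggregates. The function name is hypothetical.

```python
from statistics import mean, median
from typing import Dict, List, Sequence


def feature_statistics(rows: Sequence[Sequence[float]]) -> List[Dict[str, float]]:
    """Return max/min/mean/median for each of the k1 feature columns of m1 rows."""
    columns = list(zip(*rows))  # transpose: one tuple of m1 values per feature item
    return [
        {"max": max(c), "min": min(c), "mean": mean(c), "median": median(c)}
        for c in columns
    ]
```

The prediction results can be summarized the same way by passing each result as a one-element row.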
When the statistical features include bucket data, computing statistics over the not-yet-counted data involves obtaining a set of bucket intervals for that data and counting how many not-yet-counted items in the to-be-counted data set fall into each interval; these counts form the bucket data.
Specifically, when the not-yet-counted data is the to-be-predicted data Data1, bucket intervals can be determined for the feature items it contains; if Data1 contains multiple feature items, bucket intervals can be determined for the values of each one. For example, for the feature item age, the bucket intervals may be 1 to 18, 19 to 30, 31 to 45, 46 to 60, and over 60 up to a preset maximum age max. Counting the pieces of Data1 that fall into each interval yields the bucket data.
When the not-yet-counted data is the prediction result Re1, bucket intervals can be determined from the values Re1 takes. For example, when the prediction result is a probability between 0 and 1, the bucket intervals may be 0 to 0.2, 0.3 to 0.6, and 0.7 to 1. Counting the prediction results in the to-be-counted data set that fall into each interval yields the bucket data. Because each piece of to-be-predicted data corresponds to one prediction result, the bucket data also gives the amount of to-be-predicted data in each interval; for example, when the to-be-predicted data is user data, the bucket data gives the number of users in each interval.
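Bucket counting reduces to locating each value among sorted split points. The sketch below assumes contiguous, right-closed buckets defined by split points, so split points [18, 30, 45, 60] reproduce the integer age buckets above (up to 18, 19 to 30, 31 to 45, 46 to 60, over 60), and [0.2, 0.6] give the probability buckets [0, 0.2], (0.2, 0.6], (0.6, 1]. Closing the small gaps between the example probability intervals is an assumption of this sketch, as is the function name.

```python
from bisect import bisect_left
from typing import Iterable, List, Sequence


def bucket_counts(values: Iterable[float], split_points: Sequence[float]) -> List[int]:
    """Count values per bucket.

    With sorted split points s_1 < ... < s_r there are r + 1 buckets:
    (-inf, s_1], (s_1, s_2], ..., (s_r, +inf). bisect_left returns the index
    of the first split point >= v, which is exactly v's bucket index.
    """
    counts = [0] * (len(split_points) + 1)
    for v in values:
        counts[bisect_left(split_points, v)] += 1
    return counts
```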
When the not-yet-counted data is the correspondence described above, bucket intervals can be determined for the feature items in the to-be-predicted data, in the same way as when the not-yet-counted data is the to-be-predicted data itself. The bucket data is then obtained by using the correspondences to count the prediction results that fall into each bucket interval.
Take age as the feature item, with continuous age values divided into 5 bucket intervals: 1 to 18, 19 to 30, 31 to 45, 46 to 60, and over 60 up to a preset maximum age max. Suppose there are m1 correspondences between to-be-predicted data and prediction results, covering m1 pieces of to-be-predicted data and m1 prediction results, and that prediction results fall into two classes, class 1 (for example, high risk) and class 2 (for example, low risk). The bucket interval of each piece of to-be-predicted data is determined from the user's age, and the correspondences then give the number of class 1 and class 2 results in each interval. In other words, one obtains the number of class 1 users and the number of class 2 users in each bucket interval, which is one form of bucket data for the prediction results.
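The class-by-age table described above can be sketched by extending right-closed bucketing to pairs of (age, predicted class). The class labels, split points, and function name are illustrative assumptions.

```python
from bisect import bisect_left
from typing import Dict, Iterable, List, Sequence, Tuple


def bucket_counts_by_class(
    pairs: Iterable[Tuple[float, str]],
    split_points: Sequence[float],
    classes: Sequence[str],
) -> Dict[str, List[int]]:
    """For each predicted class, count how many records fall into each feature bucket."""
    table = {c: [0] * (len(split_points) + 1) for c in classes}
    for feature_value, predicted_class in pairs:
        # bucket index: first split point >= value, i.e. right-closed buckets
        table[predicted_class][bisect_left(split_points, feature_value)] += 1
    return table
```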
To determine bucket intervals, business-party device A may obtain split points for a feature item in Data1, or for the values of the prediction result Re1, and derive the bucket intervals from those split points. The split points of a feature item may be determined from its values in the accumulated to-be-predicted data, or set empirically; likewise, the split points of the prediction result may be determined from its accumulated values, or set empirically.
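One way to derive split points from accumulated values, rather than setting them empirically, is equal-frequency (quantile) binning with Python's `statistics.quantiles`. This is a swapped-in technique for illustration; the text does not say how split points are derived from the accumulated data, and the function name is hypothetical.

```python
from statistics import quantiles
from typing import List, Sequence


def equal_frequency_split_points(history: Sequence[float], n_buckets: int) -> List[float]:
    """Split points that put roughly the same number of accumulated values in each bucket."""
    if n_buckets < 2 or len(history) < 2:
        return []  # not enough buckets or data to derive split points
    cuts = quantiles(history, n=n_buckets)  # default 'exclusive' method
    return sorted(set(cuts))  # collapse duplicates caused by repeated values
```

Collapsing duplicate cut points avoids zero-width buckets when many accumulated values are identical.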
After obtaining statistical features that contain no private data, business-party device A may retrieve historical statistical features and compare them with the current statistical features to determine whether the joint business prediction is abnormal. The historical statistical features may be the historical maximum, minimum, mean, median, or bucket data, and the current statistical features may correspondingly be the current maximum, minimum, mean, median, or bucket data.
Comparing the historical and current maximum, minimum, mean, or median means comparing one number with another, for example by computing their difference, their ratio, or the percentage change. The resulting difference, ratio, or percentage change is then compared with a corresponding preset threshold, and if it exceeds that threshold, the joint business prediction is determined to be abnormal.
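A sketch of the scalar comparison, assuming one threshold per method. The `ratio` variant is taken here as the larger of current/historical and historical/current so that both increases and decreases are caught; that symmetric choice is an assumption, as are the function names.

```python
def scalar_score(historical: float, current: float, method: str = "difference") -> float:
    """Compare a historical statistic with the current one."""
    if method == "difference":
        return abs(current - historical)
    if method == "ratio":
        if historical == 0 or current == 0:
            raise ValueError("ratio needs non-zero values")
        return max(current / historical, historical / current)
    if method == "percent":
        if historical == 0:
            raise ValueError("percentage change needs a non-zero historical value")
        return 100.0 * abs(current - historical) / abs(historical)
    raise ValueError(f"unknown method: {method!r}")


def is_abnormal(historical: float, current: float, threshold: float, method: str = "difference") -> bool:
    """Flag the joint prediction as abnormal when the score exceeds the preset threshold."""
    return scalar_score(historical, current, method) > threshold
```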
In another embodiment, the comparison result between a historical statistical feature H1 and the current statistical feature H2 may be computed as their difference multiplied by the natural logarithm of their product, for example by the following formula:
$$(H_1 - H_2)\cdot\ln(H_1 \cdot H_2)$$
Bucket data is distribution data and can be represented as a vector. To compare historical bucket data with current bucket data, corresponding vector elements can be compared, for example by difference, ratio, or percentage change, and the per-element comparison results are summed to give the comparison result for the two vectors; the summation may be weighted or unweighted. This result is then compared with a preset threshold, and if it exceeds the threshold, the joint business prediction is determined to be abnormal.
In one embodiment, the corresponding vector elements of the historical and current bucket data may also be compared using the following formula:
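The element-wise comparison of bucket vectors can be sketched with absolute differences and optional weights (unweighted when no weights are given). Absolute difference is one of the options the text lists; the function name is hypothetical.

```python
from typing import Optional, Sequence


def bucket_distance(
    historical: Sequence[float],
    current: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted sum of per-bucket absolute differences."""
    if len(historical) != len(current):
        raise ValueError("bucket vectors must have the same length")
    if weights is None:
        weights = [1.0] * len(historical)  # unweighted summation
    if len(weights) != len(historical):
        raise ValueError("one weight per bucket is required")
    return sum(w * abs(h - c) for w, h, c in zip(weights, historical, current))
```

Weights could, for example, emphasize buckets whose drift matters more to the business; any such weighting policy is an assumption rather than something the text specifies.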
$$\sum_i \left[(H_{1i} - H_{2i})\cdot\ln(H_{1i} \cdot H_{2i})\right]$$
where $H_{1i}$ is the i-th element of the historical bucket data, $H_{2i}$ is the i-th element of the current bucket data, and the sum runs over i.
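The formula can be implemented directly as written. Because the natural logarithm is undefined at zero, this sketch assumes every bucket is non-empty in both periods and rejects non-positive products; the function name is hypothetical. For a single element it reduces to the scalar form (H1 - H2)·ln(H1·H2).

```python
import math
from typing import Sequence


def log_product_score(historical: Sequence[float], current: Sequence[float]) -> float:
    """Compute sum_i (H1_i - H2_i) * ln(H1_i * H2_i), as stated in the text."""
    if len(historical) != len(current):
        raise ValueError("bucket vectors must have the same length")
    total = 0.0
    for h1, h2 in zip(historical, current):
        product = h1 * h2
        if product <= 0:
            raise ValueError("each bucket needs positive values in both periods")
        total += (h1 - h2) * math.log(product)
    return total
```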
In one embodiment, the Population Stability Index (PSI) between the historical statistical features and the current statistical features may be computed and used as the comparison result.
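The text does not spell out the PSI computation. In its conventional definition, each bucket's count is first converted to a proportion of its period's total, and PSI = sum_i (a_i - e_i)·ln(a_i / e_i), with a ratio (not a product) inside the logarithm, where e_i and a_i are the historical and current proportions. The sketch below follows that conventional definition; the small `eps` floor for empty buckets is an assumption.

```python
import math
from typing import Sequence


def psi(historical_counts: Sequence[float], current_counts: Sequence[float], eps: float = 1e-6) -> float:
    """Conventional Population Stability Index between two bucket-count vectors."""
    if len(historical_counts) != len(current_counts):
        raise ValueError("bucket vectors must have the same length")
    h_total, c_total = sum(historical_counts), sum(current_counts)
    if h_total <= 0 or c_total <= 0:
        raise ValueError("each period needs at least one counted record")
    score = 0.0
    for h, c in zip(historical_counts, current_counts):
        e = max(h / h_total, eps)  # historical (expected) proportion, floored to avoid ln(0)
        a = max(c / c_total, eps)  # current (actual) proportion
        score += (a - e) * math.log(a / e)
    return score
```

Each term is non-negative, since (a - e) and ln(a / e) always share a sign, so the index is zero only when the two distributions coincide.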
In one scenario, the to-be-predicted data is user data and is bucketed by age. Fig. 4 is a schematic diagram comparing historical bucket data with current bucket data: the horizontal axis shows 5 age bucket intervals, the vertical axis shows the number of users, and each interval records the number of users who took part in business prediction. Values in dashed boxes are current data and values in solid boxes are historical data. To compare the two, the absolute differences of the pairs 2 and 5, 11 and 16, 19 and 14, 12 and 10, and 2 and 3 are computed and summed to obtain the comparison result.
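Worked out, the Fig. 4 comparison gives:

```latex
|2-5| + |11-16| + |19-14| + |12-10| + |2-3| = 3 + 5 + 5 + 2 + 1 = 16
```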
In an embodiment, the historical statistical characteristics may be multiple ones, and the current statistical characteristics are compared with the historical statistical characteristics respectively to obtain comparison results, so that the comparison is more comprehensive. Alternatively, the historical statistical features may be obtained by fusing a plurality of accumulated statistical features.
And comparing the historical statistical characteristics with the statistical characteristics of the time, so that abnormal conditions in the joint service prediction can be found. For example, when the business prediction model is a medical prediction model for predicting whether a user is suitable for participating in a certain medical insurance, the data to be predicted is user data including characteristic items of the user, such as age, gender, previous diseases, blood pressure value, blood lipid value, current state, etc., the prediction result is a probability value suitable for the user to participate in the medical insurance, and the statistical period is 1 week. When the ages in the user data are counted, the age classification data of the week can be compared with the age classification data of the last week, and if the comparison result exceeds a preset threshold, the medical insurance prediction of the week can be considered to be abnormal.
When the business prediction model is a risk prediction model for predicting whether the user is a high-risk user in the financial field, the data to be predicted is user data which comprises the age, the gender, the overdue times of repayment, consumption records and the like of the user, the prediction result is the probability value of the user belonging to the high-risk user, and the statistical period is 1 day. For example, statistics may be performed on the number of overdue payments in the user data, and when a comparison result between the historical statistical characteristics and the current statistical characteristics exceeds a preset threshold, it may be determined that the risk prediction of a certain day is abnormal.
Step S250: monitoring device C receives the statistical features sent by the multiple business party devices and processes them jointly. For example, on receiving statistical feature St1 from business party device A and statistical feature St2 from business party device B, monitoring device C may process St1 and St2 jointly to obtain a joint processing result.
For example, the monitoring device may aggregate the statistical features of multiple business party devices to obtain statistical features at a coarser granularity than those of any single business party device. When the statistical feature is the maximum (or minimum, or median) of the data to be predicted, the monitoring device may determine an aggregated maximum (or minimum, or median) from the per-party maxima (or minima, or medians) sent by the business party devices; the aggregated maximum and minimum are exact, whereas a median combined from per-party medians is in general only an approximation. Likewise, the monitoring device may determine an aggregated mean from the per-party means sent by the business party devices, which is exact when each party's mean is weighted by the number of records it covers.
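A minimal sketch of this aggregation on the monitoring device follows. It assumes, beyond what the text states, that each business party device also reports the number of records behind its statistics, since a record-count-weighted average is needed for the aggregated mean to be exact. The median is deliberately omitted because it cannot in general be combined exactly from per-party medians. `PartyStats` and `aggregate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PartyStats:
    """Per-party summary of one numeric feature (hypothetical message layout)."""
    count: int      # number of records the party counted (assumed to be reported)
    maximum: float
    minimum: float
    mean: float


def aggregate(stats: List[PartyStats]) -> PartyStats:
    """Combine per-party summaries into one coarser-grained summary."""
    total = sum(s.count for s in stats)
    if total == 0:
        raise ValueError("no records to aggregate")
    return PartyStats(
        count=total,
        maximum=max(s.maximum for s in stats),  # exact: max of the per-party maxima
        minimum=min(s.minimum for s in stats),  # exact: min of the per-party minima
        # exact only because each mean is weighted by its party's record count
        mean=sum(s.count * s.mean for s in stats) / total,
    )


party_a = PartyStats(count=100, maximum=80.0, minimum=18.0, mean=40.0)
party_b = PartyStats(count=300, maximum=75.0, minimum=20.0, mean=30.0)
print(aggregate([party_a, party_b]))
# PartyStats(count=400, maximum=80.0, minimum=18.0, mean=32.5)
```

An unweighted average of the two means would give 35.0 here, which shows why the record counts matter when parties hold different amounts of data.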
In another embodiment, the monitoring device may jointly process the bucketed data of multiple business party devices using the population stability index (PSI) to obtain an overall model stability value. For example, the value can be calculated with the following formula:
W_{PSI} = \sum_i (A_i - B_i) \cdot \ln(A_i / B_i)
where W_{PSI} is the overall model stability value; i denotes the i-th of the plurality of bucket intervals; A_i is the sum of the current bucketed data of the business party devices for bucket i; B_i is the sum of the historical bucketed data of the business party devices for bucket i; the summation \sum_i runs over i; and \ln is the natural logarithm.
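The formula can be sketched as below. Following the standard PSI definition, this sketch normalizes the per-bucket sums A_i and B_i into proportions of their totals before applying the formula; the small floor `eps` that keeps empty buckets from producing ln(0) is a common practical choice and an assumption here, as are the function name and input layout (one list of bucket counts per business party device, all sharing the same bucket intervals).

```python
import math
from typing import Sequence


def overall_psi(current_by_party: Sequence[Sequence[int]],
                historical_by_party: Sequence[Sequence[int]],
                eps: float = 1e-6) -> float:
    """W_PSI = sum_i (A_i - B_i) * ln(A_i / B_i), with A_i, B_i as proportions."""
    n_buckets = len(current_by_party[0])
    all_parties = list(current_by_party) + list(historical_by_party)
    if any(len(p) != n_buckets for p in all_parties):
        raise ValueError("all parties must use the same bucket intervals")
    # Sum each bucket across parties: A for the current period, B for the history.
    a_counts = [sum(p[i] for p in current_by_party) for i in range(n_buckets)]
    b_counts = [sum(p[i] for p in historical_by_party) for i in range(n_buckets)]
    a_total, b_total = sum(a_counts), sum(b_counts)
    if a_total == 0 or b_total == 0:
        raise ValueError("both periods need at least one record")
    psi = 0.0
    for a, b in zip(a_counts, b_counts):
        a_i = max(a / a_total, eps)  # floor avoids log(0) for empty buckets
        b_i = max(b / b_total, eps)
        psi += (a_i - b_i) * math.log(a_i / b_i)
    return psi


# Two parties, two prediction-score buckets each.
current = [[30, 20], [20, 30]]      # summed: A = [50, 50]
historical = [[10, 40], [15, 35]]   # summed: B = [25, 75]
psi = overall_psi(current, historical)
print(round(psi, 4))  # 0.2747
```

Each term (A_i - B_i) ln(A_i / B_i) is non-negative, so W_PSI is 0 only when the current and historical distributions coincide and grows as they drift apart.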
After jointly processing the statistical features of the business party devices, monitoring device C obtains a joint processing result and can use the relevant data in it to determine a correction scheme for the business prediction models in those devices. When the joint processing result contains data indicating that prediction results are inaccurate, a correction scheme for the business prediction model can be determined from that data. In this way, monitoring device C gains an overall picture of prediction accuracy and can determine a suitable correction scheme from a joint analysis across the business party devices.
In this embodiment, the monitoring device can obtain statistical features of the model's inputs and outputs within each aggregation time interval, thereby achieving near-real-time (weak real-time) monitoring of the business prediction service. The embodiment also supports monitoring business data along multiple statistical dimensions and monitoring how the model is operating while protecting the security of customers' private data, thus better balancing the dual requirements of data security and data monitoring.
During execution of this embodiment, each of the business party devices may act as an initiator of business prediction, and predictions may be initiated at arbitrary times. The order of the steps in this embodiment is only a logical order and does not restrict the actual order of execution. For example, steps S210 to S230 may be performed multiple times, step S240 may be performed periodically, and step S250 may be performed after statistical features have been received from multiple business party devices.
Fig. 5 is another flowchart of a method for data monitoring for multi-party joint business prediction according to an embodiment. In this method, a monitoring device and a plurality of business party devices monitor the data of joint business prediction performed using secure multi-party computation (MPC). The method includes the following steps:
Step S510: any business party device obtains its data to be predicted, which contains private data; based on the business prediction models in the plurality of business party devices, it performs business prediction on the data to be predicted through MPC-based data interaction among the business party devices, so that the business party device obtains a prediction result for the data to be predicted; it adds the data to be predicted and the corresponding prediction result to a data set to be counted; when a preset statistical condition is met, it computes statistics over the uncounted data in the data set to be counted to obtain statistical features that contain no private data; and it sends the statistical features to the monitoring device.
Step S520: the monitoring device receives the statistical features sent by the business party devices and jointly processes them.
The embodiment of Fig. 5 is derived from the embodiment of Fig. 2; its implementation and description are the same, so reference may be made to the description of Fig. 2.
The foregoing describes certain embodiments of the present specification, and other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily have to be in the particular order shown or in sequential order to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Fig. 6 is a schematic block diagram of an apparatus for data monitoring for multi-party joint business prediction according to an embodiment. In this embodiment, a monitoring device and a plurality of business party devices monitor the data of joint business prediction performed using MPC, and the apparatus embodiment corresponds to the method embodiment shown in Fig. 2. The apparatus is deployed in any one of the business party devices and includes:
an obtaining module 610, configured to obtain the data to be predicted of the business party device, which contains private data;
a prediction module 620, configured to perform, based on the business prediction models in the plurality of business party devices, business prediction on the data to be predicted through MPC-based data interaction among the business party devices, so that the business party device obtains a prediction result for the data to be predicted;
an adding module 630, configured to add the data to be predicted and the corresponding prediction result to a data set to be counted;
a statistics module 640, configured to compute, when a preset statistical condition is met, statistics over the uncounted data in the data set to be counted to obtain statistical features that contain no private data;
a sending module 650, configured to send the statistical features to the monitoring device, so that the monitoring device jointly processes the statistical features sent by the plurality of business party devices.
In one embodiment, the statistical condition includes one or more of the following:
the number of records in the data set to be counted reaches a preset count threshold;
the time elapsed since the last statistics computation reaches a preset duration threshold.
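The two trigger conditions above can be sketched as a small buffer that plays the role of the data set to be counted. This is a hypothetical illustration: the class and method names are assumptions, and for simplicity the sketch hands over and drops records once they are counted instead of keeping them in the set with a "counted" mark. The clock is injectable so the example is deterministic.

```python
import time
from typing import Any, Callable, List, Tuple

Record = Tuple[Any, Any]  # (data to be predicted, prediction result)


class StatsBuffer:
    """Hypothetical 'data set to be counted' with a count trigger and a time trigger."""

    def __init__(self, count_threshold: int, interval_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.count_threshold = count_threshold
        self.interval_seconds = interval_seconds
        self._clock = clock
        self._uncounted: List[Record] = []
        self._last_stats_time = clock()

    def add(self, data: Any, prediction: Any) -> None:
        """Append one prediction to the uncounted part of the data set."""
        self._uncounted.append((data, prediction))

    def condition_met(self) -> bool:
        """True if either the count threshold or the duration threshold is reached."""
        enough_records = len(self._uncounted) >= self.count_threshold
        enough_time = self._clock() - self._last_stats_time >= self.interval_seconds
        return enough_records or enough_time

    def take_uncounted(self) -> List[Record]:
        """Hand over the uncounted records for statistics and restart the timer."""
        batch, self._uncounted = self._uncounted, []
        self._last_stats_time = self._clock()
        return batch


# Usage with a fake clock.
now = [0.0]
buf = StatsBuffer(count_threshold=3, interval_seconds=60.0, clock=lambda: now[0])
buf.add({"age": 35}, 0.7)
buf.add({"age": 52}, 0.2)
print(buf.condition_met())        # False: 2 records, 0 s elapsed
buf.add({"age": 41}, 0.9)
print(buf.condition_met())        # True: count threshold reached
print(len(buf.take_uncounted()))  # 3
now[0] = 61.0
print(buf.condition_met())        # True: duration threshold reached
```

Combining the two conditions with "or" means a busy party reports once it has enough records, while a quiet party still reports at least once per interval.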
In one embodiment, the statistics module 640 is specifically configured to
compute statistics in one or more of the following ways:
computing statistics over the data to be predicted in the data set to be counted;
computing statistics over the prediction results in the data set to be counted;
computing statistics over the correspondence between the data to be predicted in the data set to be counted and their prediction results.
In one embodiment, the statistical features include one or more of the following: maximum, minimum, mean, median, and bucketed data.
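The scalar features in this list can be computed locally with Python's standard `statistics` module, as in the minimal sketch below. The function name `summarize` is hypothetical; the same function could be applied to a feature item of the data to be predicted or to the prediction results, and its output contains only aggregate values, not individual records.

```python
import statistics
from typing import Dict, Sequence


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Compute max, min, mean, and median for one numeric item of a local batch."""
    if not values:
        raise ValueError("cannot summarize an empty batch")
    return {
        "max": max(values),
        "min": min(values),
        "mean": statistics.fmean(values),    # float mean (Python 3.8+)
        "median": statistics.median(values),  # averages the two middle values if even
    }


# Usage: the age item of four records in the data set to be counted.
print(summarize([30, 45, 22, 61]))
# {'max': 61, 'min': 22, 'mean': 39.5, 'median': 37.5}
```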
In one embodiment, the statistical features include bucketed data, and the statistics module 640 is specifically configured to:
obtain a plurality of bucket intervals for the uncounted data;
determine how many of the uncounted data in the data set to be counted fall into each of the bucket intervals, to obtain the bucketed data;
where, when obtaining the bucket intervals for the uncounted data, the statistics module 640:
determines bucket intervals for the feature items contained in the data to be predicted when the uncounted data is the data to be predicted;
determines bucket intervals according to the values of the prediction results when the uncounted data is the prediction results;
determines bucket intervals for the feature items contained in the data to be predicted when the uncounted data is the correspondence;
and where, when determining how many of the uncounted data fall into each bucket interval, the statistics module 640:
determines, when the uncounted data is the correspondence, the number of prediction results in each of the bucket intervals based on the correspondences, to obtain the bucketed data.
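The three bucketing cases above can be sketched with the standard `bisect` module. This is a hypothetical illustration: the bucket edges are assumed to be preset and shared by all parties, and for the correspondence case the sketch counts only prediction results at or above an assumed `positive_threshold` in each feature bucket (for example, the number of likely-eligible users per age bucket). The text does not specify which prediction results are counted, so that rule is an assumption.

```python
import bisect
from typing import List, Sequence


def bucketize(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Count values per bucket; sorted edges [e0, e1, ...] define the buckets
    (-inf, e0), [e0, e1), ..., [e_last, +inf)."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    return counts


def bucketize_correspondence(features: Sequence[float], predictions: Sequence[float],
                             edges: Sequence[float],
                             positive_threshold: float = 0.5) -> List[int]:
    """Bucket by a feature item, then count prediction results per feature bucket.

    Counting only predictions >= positive_threshold is an assumption of this sketch.
    """
    if len(features) != len(predictions):
        raise ValueError("features and predictions must be aligned")
    counts = [0] * (len(edges) + 1)
    for x, p in zip(features, predictions):
        if p >= positive_threshold:
            counts[bisect.bisect_right(edges, x)] += 1
    return counts


ages = [25, 35, 45, 55, 65, 30]
scores = [0.2, 0.4, 0.6, 0.8, 0.9, 0.1]
print(bucketize(ages, [30, 50]))                         # [1, 3, 2]  data to be predicted
print(bucketize(scores, [0.5]))                          # [3, 3]     prediction results
print(bucketize_correspondence(ages, scores, [30, 50]))  # [0, 1, 2]  correspondence
```

`bisect_right` places a value equal to an edge into the bucket that starts at that edge, which gives left-closed intervals such as [30, 50).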
In one embodiment, the apparatus further includes an anomaly module (not shown in the figure), configured to:
obtain historical statistical features after the statistical features that contain no private data have been obtained;
determine, based on a comparison between the historical statistical features and the statistical features, whether the joint business prediction is anomalous.
In one embodiment, the prediction module 620 is specifically configured to:
send a prediction request to the other business party devices;
perform business prediction on the data to be predicted upon receiving feedback from the other business party devices in response to the prediction request.
In one embodiment, the prediction module 620 is further configured to:
receive prediction requests sent by other business party devices;
send feedback on the prediction request to the other business party devices, and perform, based on the business prediction models in the business party devices, business prediction on the other business party devices' data to be predicted through MPC-based data interaction among the business party devices.
In one embodiment, the apparatus further includes a determining module (not shown in the figure), configured to:
determine request information related to prediction requests and add the request information to the data set to be counted.
In one embodiment, the request information includes one or more of the following: the number of requests, the number of successful predictions, and the proportion of successful predictions.
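The request information can be tracked with simple counters, as in the hypothetical sketch below. It assumes that a request counts as successful when the other business party devices sent feedback and the prediction was completed; the class and field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class RequestStats:
    """Hypothetical counters for request information in the data set to be counted."""
    requests: int = 0
    successes: int = 0

    def record(self, succeeded: bool) -> None:
        """Log one prediction request and whether prediction was performed."""
        self.requests += 1
        if succeeded:
            self.successes += 1

    @property
    def success_ratio(self) -> float:
        """Proportion of requests for which prediction was successfully performed."""
        return self.successes / self.requests if self.requests else 0.0


stats = RequestStats()
for feedback_received in [True, True, False, True]:  # one request got no feedback
    stats.record(feedback_received)
print(stats.requests, stats.successes, stats.success_ratio)  # 4 3 0.75
```

Like the other statistical features, these counters reveal nothing about the content of the data to be predicted, only about how the joint prediction service is behaving.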
In one embodiment, each of the business party devices holds part of the model parameters of the business prediction model, and the partial model parameters of the business party devices, if hypothetically combined, would make up all the model parameters of the business prediction model;
alternatively, the partial model parameters of the business party devices together with partial model parameters held by a server would, if hypothetically combined, make up all the model parameters of the business prediction model; in this case, the prediction module 620 is specifically configured to perform, based on the business prediction models in the business party devices and the server, business prediction on the data to be predicted through MPC-based data interaction among the business party devices and the server.
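How partial model parameters can jointly produce a prediction without being combined can be illustrated with a toy, single-process simulation of additive secret sharing. This is not the patent's protocol. It assumes a vertically partitioned logistic model in which each party holds the weights for its own features (a server holding partial parameters would simply be one more entry in `parties`); the ring size, fixed-point scale, and all names are assumptions. A real MPC deployment would also handle networking, stronger adversary models, and secure evaluation of the sigmoid.

```python
import math
import random
from typing import List, Sequence, Tuple

MOD = 2 ** 64    # ring size for additive shares (assumption)
SCALE = 10 ** 6  # fixed-point scale for real-valued scores (assumption)


def encode(x: float) -> int:
    """Fixed-point encode a real number into the ring Z_MOD."""
    return round(x * SCALE) % MOD


def decode(v: int) -> float:
    """Decode a ring element, treating the upper half of the ring as negative."""
    if v >= MOD // 2:
        v -= MOD
    return v / SCALE


def share(v: int, n: int, rng: random.Random) -> List[int]:
    """Split v into n additive shares that sum to v modulo MOD."""
    shares = [rng.randrange(MOD) for _ in range(n - 1)]
    shares.append((v - sum(shares)) % MOD)
    return shares


def joint_predict(parties: Sequence[Tuple[Sequence[float], Sequence[float]]],
                  rng: random.Random) -> float:
    """Toy logistic prediction; each party holds (its weights, its features)."""
    n = len(parties)
    # 1. Each party computes its partial score locally and secret-shares it.
    dealt = [share(encode(sum(w * x for w, x in zip(weights, feats))), n, rng)
             for weights, feats in parties]
    # 2. Party j adds up the j-th share received from every party.
    local_sums = [sum(d[j] for d in dealt) % MOD for j in range(n)]
    # 3. Only the total is opened. Individual shares look uniformly random, but the
    #    opened total can still leak information about other parties' partial scores.
    z = decode(sum(local_sums) % MOD)
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)
party_a = ([0.5], [2.0])   # weights and features held by party A
party_b = ([-0.5], [1.0])  # weights and features held by party B (or a server)
print(round(joint_predict([party_a, party_b], rng), 4))  # 0.6225
```

Neither party's weights leave its own process in this simulation; only random-looking shares are exchanged, which is the property the monitoring scheme relies on when it later reports statistics instead of raw data.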
Fig. 7 is another schematic block diagram of an apparatus for data monitoring for multi-party joint business prediction according to an embodiment. In this embodiment, a monitoring device and a plurality of business party devices monitor the data of joint business prediction performed using secure multi-party computation (MPC). This apparatus embodiment corresponds to the method embodiment shown in Fig. 2. The apparatus 700 is deployed in the monitoring device and includes:
a receiving module 710, configured to receive statistical features sent by the plurality of business party devices, where the statistical features are obtained by computing statistics over data to be predicted and the corresponding prediction results and contain no private data, and the prediction results are obtained by performing business prediction on the data to be predicted through MPC-based data interaction among the business party devices using their respective business prediction models;
a processing module 720, configured to jointly process the statistical features of the plurality of business party devices to obtain a joint processing result.
In one embodiment, the apparatus 700 further includes a correction module (not shown in the figure), configured to determine, after the statistical features of the business party devices have been jointly processed, a correction scheme for the business prediction models in those devices using relevant data in the joint processing result.
The apparatus embodiments above correspond to, and are derived from, the method embodiments and have the same technical effects; for details, reference may be made to the description of the corresponding method embodiments, which is not repeated here.
Fig. 8 is a schematic block diagram of a system for data monitoring for multi-party joint business prediction according to an embodiment. The system 800 includes a monitoring device 810 and a plurality of business party devices 820, which monitor the data of joint business prediction performed using secure multi-party computation (MPC);
any one of the business party devices 820 is configured to: obtain its data to be predicted, which contains private data; perform, based on the business prediction models in the business party devices 820, business prediction on the data to be predicted through MPC-based data interaction among the business party devices 820, so that the business party device 820 obtains a prediction result for the data to be predicted; add the data to be predicted and the corresponding prediction result to a data set to be counted; when a preset statistical condition is met, compute statistics over the uncounted data in the data set to be counted to obtain statistical features that contain no private data; and send the statistical features to the monitoring device 810;
the monitoring device 810 is configured to receive the statistical features sent by the business party devices 820 and jointly process the statistical features of the business party devices 820.
Embodiments of the present specification also provide a computer-readable storage medium storing a computer program which, when executed in a computer, causes the computer to perform the method described with reference to any one of Figs. 1 to 5.
Embodiments of the present specification also provide a computing device including a memory and a processor, where the memory stores executable code and the processor, when executing the executable code, implements the method described with reference to any one of Figs. 1 to 5.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the storage medium and the computing device embodiments, since they are substantially similar to the method embodiments, they are described relatively simply, and reference may be made to some descriptions of the method embodiments for relevant points.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in connection with the embodiments of the invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above-mentioned embodiments further describe the objects, technical solutions and advantages of the embodiments of the present invention in detail. It should be understood that the above description is only exemplary of the embodiments of the present invention, and is not intended to limit the scope of the present invention, and any modification, equivalent replacement, or improvement made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.

Claims (22)

1. A method for data monitoring for multi-party joint business prediction, in which a monitoring device and a plurality of business party devices monitor the data of joint business prediction performed using secure multi-party computation (MPC), the method being performed by any one of the business party devices and comprising:
obtaining data to be predicted of the business party device, the data containing private data;
performing, based on the business prediction models in the plurality of business party devices, business prediction on the data to be predicted through MPC-based data interaction among the business party devices, so that the business party device obtains a prediction result for the data to be predicted;
adding the data to be predicted and the corresponding prediction result to a data set to be counted;
when a preset statistical condition is met, computing statistics over the uncounted data in the data set to be counted to obtain statistical features that contain no private data; and
sending the statistical features to the monitoring device, so that the monitoring device jointly processes the statistical features sent by the plurality of business party devices.
2. The method of claim 1, wherein the statistical condition comprises one or more of the following:
the number of records in the data set to be counted reaches a preset count threshold;
the time elapsed since the last statistics computation reaches a preset duration threshold.
3. The method of claim 1, wherein computing statistics over the uncounted data in the data set to be counted comprises
computing statistics in one or more of the following ways:
computing statistics over the data to be predicted in the data set to be counted;
computing statistics over the prediction results in the data set to be counted;
computing statistics over the correspondence between the data to be predicted in the data set to be counted and their prediction results.
4. The method of claim 1 or 3, wherein the statistical features comprise one or more of the following: maximum, minimum, mean, median, and bucketed data.
5. The method of claim 3, wherein the statistical features comprise bucketed data, and computing statistics over the uncounted data in the data set to be counted comprises:
obtaining a plurality of bucket intervals for the uncounted data;
determining how many of the uncounted data in the data set to be counted fall into each of the bucket intervals, to obtain the bucketed data;
wherein obtaining the bucket intervals for the uncounted data comprises:
when the uncounted data is the data to be predicted, determining bucket intervals for the feature items contained in the data to be predicted;
when the uncounted data is the prediction results, determining bucket intervals according to the values of the prediction results;
when the uncounted data is the correspondence, determining bucket intervals for the feature items contained in the data to be predicted;
and determining how many of the uncounted data fall into each bucket interval comprises:
when the uncounted data is the correspondence, determining the number of prediction results in each of the bucket intervals based on the correspondences, to obtain the bucketed data.
6. The method of claim 1, further comprising, after obtaining the statistical features that contain no private data:
obtaining historical statistical features;
determining, based on a comparison between the historical statistical features and the statistical features, whether the joint business prediction is anomalous.
7. The method of claim 1, wherein performing business prediction on the data to be predicted comprises:
sending a prediction request to the other business party devices;
performing business prediction on the data to be predicted upon receiving feedback from the other business party devices in response to the prediction request.
8. The method of claim 7, further comprising:
receiving prediction requests sent by other business party devices;
sending feedback on the prediction request to the other business party devices, and performing, based on the business prediction models in the business party devices, business prediction on the other business party devices' data to be predicted through MPC-based data interaction among the business party devices.
9. The method of claim 7 or 8, further comprising:
determining request information related to prediction requests and adding the request information to the data set to be counted.
10. The method of claim 9, wherein the request information comprises one or more of the following: the number of requests, the number of successful predictions, and the proportion of successful predictions.
11. The method of claim 1, wherein each of the plurality of business party devices holds part of the model parameters of the business prediction model, and the partial model parameters of the business party devices, if hypothetically combined, would make up all the model parameters of the business prediction model;
or, the partial model parameters of the business party devices together with partial model parameters held by a server would, if hypothetically combined, make up all the model parameters of the business prediction model, and performing business prediction on the data to be predicted comprises: performing, based on the business prediction models in the business party devices and the server, business prediction on the data to be predicted through MPC-based data interaction among the business party devices and the server.
12. A method for data monitoring for multi-party joint business prediction, in which a monitoring device and a plurality of business party devices monitor the data of joint business prediction performed using secure multi-party computation (MPC), the method being performed by the monitoring device and comprising:
receiving statistical features sent by the plurality of business party devices, wherein the statistical features are obtained by computing statistics over data to be predicted and the corresponding prediction results and contain no private data, and the prediction results are obtained by performing business prediction on the data to be predicted through MPC-based data interaction among the business party devices using their respective business prediction models; and
jointly processing the statistical features of the plurality of business party devices to obtain a joint processing result.
13. A method for data monitoring for multi-party joint business prediction, in which a monitoring device and a plurality of business party devices monitor the data of joint business prediction performed using secure multi-party computation (MPC), wherein:
any one of the business party devices obtains its data to be predicted, which contains private data; performs, based on the business prediction models in the plurality of business party devices, business prediction on the data to be predicted through MPC-based data interaction among the business party devices, so that the business party device obtains a prediction result for the data to be predicted; adds the data to be predicted and the corresponding prediction result to a data set to be counted; when a preset statistical condition is met, computes statistics over the uncounted data in the data set to be counted to obtain statistical features that contain no private data; and sends the statistical features to the monitoring device;
the monitoring device receives the statistical features sent by the plurality of business party devices and jointly processes the statistical features of the plurality of business party devices.
14. The method of claim 13, further comprising:
after the statistical features of the plurality of business party devices have been jointly processed, determining a correction scheme for the business prediction models in the plurality of business party devices using relevant data in the joint processing result.
15. An apparatus for data monitoring for multi-party joint business prediction, in which a monitoring device and a plurality of business party devices monitor the data of joint business prediction performed using secure multi-party computation (MPC), the apparatus being deployed in any one of the business party devices and comprising:
an obtaining module, configured to obtain the data to be predicted of the business party device, which contains private data;
a prediction module, configured to perform, based on the business prediction models in the plurality of business party devices, business prediction on the data to be predicted through MPC-based data interaction among the business party devices, so that the business party device obtains a prediction result for the data to be predicted;
an adding module, configured to add the data to be predicted and the corresponding prediction result to a data set to be counted;
a statistics module, configured to compute, when a preset statistical condition is met, statistics over the uncounted data in the data set to be counted to obtain statistical features that contain no private data; and
a sending module, configured to send the statistical features to the monitoring device, so that the monitoring device jointly processes the statistical features sent by the plurality of business party devices.
16. The apparatus of claim 15, wherein the statistics module is specifically configured to
compute statistics in one or more of the following ways:
computing statistics over the data to be predicted in the data set to be counted;
computing statistics over the prediction results in the data set to be counted;
computing statistics over the correspondence between the data to be predicted in the data set to be counted and their prediction results.
17. The apparatus of claim 15 or 16, wherein the statistical features comprise one or more of the following: maximum, minimum, mean, median, and bucketed data.
18. The apparatus of claim 16, wherein the statistical features comprise bucketed data, and the statistics module is specifically configured to:
obtain a plurality of bucket intervals for the uncounted data;
determine how many of the uncounted data in the data set to be counted fall into each of the bucket intervals, to obtain the bucketed data;
wherein, when obtaining the bucket intervals for the uncounted data, the statistics module:
determines bucket intervals for the feature items contained in the data to be predicted when the uncounted data is the data to be predicted;
determines bucket intervals according to the values of the prediction results when the uncounted data is the prediction results;
determines bucket intervals for the feature items contained in the data to be predicted when the uncounted data is the correspondence;
and, when determining how many of the uncounted data fall into each bucket interval, the statistics module:
determines, when the uncounted data is the correspondence, the number of prediction results in each of the bucket intervals based on the correspondences, to obtain the bucketed data.
19. An apparatus for data monitoring for multi-party joint business prediction, in which a monitoring device and a plurality of business party devices monitor the data of joint business prediction performed using secure multi-party computation (MPC), the apparatus being deployed in the monitoring device and comprising:
a receiving module, configured to receive statistical features sent by the plurality of business party devices, wherein the statistical features are obtained by computing statistics over data to be predicted and the corresponding prediction results and contain no private data, and the prediction results are obtained by performing business prediction on the data to be predicted through MPC-based data interaction among the business party devices using their respective business prediction models; and
a processing module, configured to jointly process the statistical features of the plurality of business party devices to obtain a joint processing result.
20. A system for data monitoring for multi-party joint business prediction, comprising a monitoring device and a plurality of business party devices, which monitor the data of joint business prediction performed using secure multi-party computation (MPC), wherein:
any one of the business party devices is configured to: obtain its data to be predicted, which contains private data; perform, based on the business prediction models in the plurality of business party devices, business prediction on the data to be predicted through MPC-based data interaction among the business party devices, so that the business party device obtains a prediction result for the data to be predicted; add the data to be predicted and the corresponding prediction result to a data set to be counted; when a preset statistical condition is met, compute statistics over the uncounted data in the data set to be counted to obtain statistical features that contain no private data; and send the statistical features to the monitoring device;
the monitoring device is configured to receive the statistical features sent by the plurality of business party devices and jointly process the statistical features of the plurality of business party devices.
21. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-14.
22. A computing device comprising a memory having executable code stored therein and a processor that, when executing the executable code, implements the method of any of claims 1-14.
CN202110830690.3A 2021-07-22 2021-07-22 Method and device for data monitoring aiming at multi-party combined service prediction Active CN113377625B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110830690.3A CN113377625B (en) 2021-07-22 2021-07-22 Method and device for data monitoring aiming at multi-party combined service prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110830690.3A CN113377625B (en) 2021-07-22 2021-07-22 Method and device for data monitoring aiming at multi-party combined service prediction

Publications (2)

Publication Number Publication Date
CN113377625A 2021-09-10
CN113377625B 2022-05-17

Family

ID=77582725

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110830690.3A Active CN113377625B (en) 2021-07-22 2021-07-22 Method and device for data monitoring aiming at multi-party combined service prediction

Country Status (1)

Country Link
CN (1) CN113377625B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114756895B (en) * 2022-06-16 2022-08-26 深圳市洞见智慧科技有限公司 Hidden trace data verification method and system based on homomorphic encryption

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10528760B2 (en) * 2017-08-03 2020-01-07 Hrl Laboratories, Llc Privacy-preserving multi-client and cloud computation with application to secure navigation

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3779752A1 (en) * 2018-08-14 2021-02-17 Advanced New Technologies Co., Ltd. Secure multi-party computation method and apparatus, and electronic device
CN110942147A (en) * 2019-11-28 2020-03-31 支付宝(杭州)信息技术有限公司 Neural network model training and predicting method and device based on multi-party safety calculation
CN112241549A (en) * 2020-05-26 2021-01-19 中国银联股份有限公司 Secure privacy calculation method, server, system, and storage medium
CN112084520A (en) * 2020-09-18 2020-12-15 支付宝(杭州)信息技术有限公司 Method and device for protecting business prediction model of data privacy through joint training of two parties
CN112199706A (en) * 2020-10-26 2021-01-08 支付宝(杭州)信息技术有限公司 Tree model training method and business prediction method based on multi-party safety calculation
CN112148801A (en) * 2020-11-24 2020-12-29 支付宝(杭州)信息技术有限公司 Method and device for predicting business object by combining multiple parties for protecting data privacy
CN112560085A (en) * 2020-12-10 2021-03-26 支付宝(杭州)信息技术有限公司 Privacy protection method and device of business prediction model
CN112883387A (en) * 2021-01-29 2021-06-01 南京航空航天大学 Privacy protection method for machine-learning-oriented whole process

Also Published As

Publication number Publication date
CN113377625A (en) 2021-09-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant