CN112580106A - Multi-source data processing system and multi-source data processing method - Google Patents


Info

Publication number
CN112580106A
Authority
CN
China
Prior art keywords
attribute set
clients
execution
client
simulation
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110103428.9A
Other languages
Chinese (zh)
Inventor
任静涵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
E Capital Transfer Co ltd
Original Assignee
E Capital Transfer Co ltd
Application filed by E Capital Transfer Co ltd
Priority to CN202110103428.9A
Publication of CN112580106A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60: Protecting data
    • G06F21/62: Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218: Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245: Protecting personal data, e.g. for financial or medical purposes
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/04: Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428: Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H04L63/0442: Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload wherein the sending and receiving network entities apply asymmetric encryption, i.e. different keys for encryption and decryption
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols

Abstract

The invention relates to a multi-source data processing method and system. The method comprises: mixing a redundant attribute set into a key attribute set to form an initial attribute set; converting the format of the initial attribute set to obtain a to-be-processed attribute set in a predetermined format; selecting two or more clients as simulation execution clients and simulating their execution of the to-be-processed attribute set to obtain a simulation completion attribute set; sending the simulation completion attribute set, as noise, together with the to-be-processed attribute set to the remaining clients other than the simulation execution clients, which execute the to-be-processed attribute set to obtain an execution completion attribute set; and inputting the execution completion attribute set into a decision analysis model for calculation and analysis to obtain a decision analysis result. The invention allows information from multiple clients to be integrated into a decision analysis result while protecting both the private information of the clients and the privacy of the server's decision analysis model.

Description

Multi-source data processing system and multi-source data processing method
Technical Field
The invention relates to computer technology, in particular to a multi-source data processing system and a multi-source data processing method for data processing of multi-source data (namely data from a plurality of clients).
Background
In a securities-industry cloud service environment, the private data of multiple securities dealers sometimes has to be combined for comprehensive data processing. A method is therefore needed that can analyze and process data while protecting the private data of each dealer.
In addition, cloud service providers generally use their own decision analysis models for data analysis and processing, so the data analysis and processing method must also avoid revealing the cloud service provider's decision analysis model.
Disclosure of Invention
In view of the above problems, the present invention aims to provide a multi-source data processing system and a multi-source data processing method capable of processing multi-source data without revealing the private information of the multiple sources (i.e., the clients) and, at the same time, without revealing the private information of the server.
The multi-source data processing method of the present invention is characterized by being implemented by a server and a plurality of clients, the method comprising:
a redundancy adding step, in which the server mixes a redundant attribute set into a key attribute set to form an initial attribute set;
a format conversion step, in which the server converts the format of the initial attribute set to obtain a to-be-processed attribute set in a predetermined format;
a simulation execution step, in which the server selects two or more clients as simulation execution clients based on a first random algorithm, and simulates the execution of the to-be-processed attribute set by the simulation execution clients to obtain a simulation completion attribute set;
a real execution step, in which the server sends the simulation completion attribute set, as noise, together with the to-be-processed attribute set to the remaining clients other than the simulation execution clients among the plurality of clients, and the remaining clients execute the to-be-processed attribute set to obtain an execution completion attribute set; and
an analysis decision step, in which the server inputs the execution completion attribute set into a decision analysis model for calculation and analysis to obtain a decision analysis result.
Optionally, the decision analysis model performs gradual judgment according to the attribute of the execution completion attribute set, and finally obtains a decision result.
Optionally, a unique client identifier is preset for each client,
in the simulation execution step, two or more clients are selected as simulation execution clients from the client identification numbers of the plurality of clients by a first random algorithm.
Optionally, the first random algorithm comprises any one of:
a numerical probability algorithm, a Las Vegas algorithm, a Monte Carlo algorithm, and a Sherwood algorithm.
Optionally, in the format conversion step, the following format conversion is performed on the attribute fields in the initial attribute set:
for a discrete field, a question set is generated;
for a field with a linear attribute, a question set is generated after the field is discretized using a discretization technique.
Optionally, the real execution step comprises the following sub-steps:
substep 1: the server uses a second random algorithm to select one client from the remaining clients other than the simulation execution clients among the plurality of clients;
substep 2: the server sends the to-be-processed attribute set and the simulation completion attribute set to the selected client;
substep 3: the selected client executes the to-be-processed attribute set, adds the execution result to the to-be-processed attribute set, and returns the result to the server;
substep 4: the server repeats substeps 1-3 until all of the remaining clients have executed the to-be-processed attribute set, thereby obtaining the execution completion attribute set.
The multi-source data processing system of the present invention is characterized by comprising a server and a plurality of clients,
wherein the server comprises:
a redundancy adding module, configured to mix a redundant attribute set into a key attribute set to form an initial attribute set;
a format conversion module, configured to convert the format of the initial attribute set to obtain a to-be-processed attribute set in a predetermined format;
a simulation execution module, configured to select two or more clients as simulation execution clients based on a first random algorithm, and to simulate the execution of the to-be-processed attribute set by the simulation execution clients to obtain a simulation completion attribute set;
a first communication module, communicatively connected to the clients, configured to send the simulation completion attribute set, as noise, together with the to-be-processed attribute set to the remaining clients other than the simulation execution clients among the plurality of clients, and to accept a returned execution completion attribute set; and
an analysis decision module, configured to input the execution completion attribute set into a decision analysis model for calculation and analysis to obtain a decision analysis result.
Optionally, the decision analysis model performs step-by-step judgment according to the attributes of the execution completion attribute set, and finally obtains a decision result.
Each client comprises:
a second communication module, communicatively connected to the server, configured to receive the simulation completion attribute set and the to-be-processed attribute set from the server and to return the execution completion attribute set obtained by the execution module to the server; and
an execution module, configured to execute the to-be-processed attribute set and obtain the execution completion attribute set.
Optionally, a unique client identifier is preset for each client,
the simulation execution module selects two or more clients as simulation execution clients from the client identification numbers of the plurality of clients through a first random algorithm.
Optionally, the first random algorithm comprises any one of:
a numerical probability algorithm, a Las Vegas algorithm, a Monte Carlo algorithm, and a Sherwood algorithm.
Optionally, the format conversion module performs the following format conversion on the attribute fields in the initial attribute set:
for a discrete field, a question set is generated;
for a field with a linear attribute, a question set is generated after the field is discretized using a discretization technique.
The server of the present invention is a server for communicating with a plurality of clients, and includes:
the redundancy adding module is used for mixing a redundancy attribute set in the key attribute set as an initial attribute set;
the format conversion module is used for carrying out format conversion on the initial attribute set to obtain a to-be-processed attribute set with a preset format;
the simulation execution module is used for selecting two or more clients from the plurality of clients as simulation execution clients based on a first random algorithm, simulating the simulation execution clients to execute the attribute set to be processed and obtain a simulation completion attribute set;
a first communication module, configured to send the simulation completion attribute set, as noise, together with the to-be-processed attribute set to the remaining clients other than the simulation execution clients among the plurality of clients, and to accept a returned execution completion attribute set; and
an analysis decision module, configured to input the execution completion attribute set into a decision analysis model for calculation and analysis to obtain a decision analysis result.
The computer-readable medium of the present invention, on which a computer program is stored, is characterized in that,
the computer program, when executed by a processor, implements the multi-source data processing method described above.
The computer device of the present invention includes a storage module, a processor, and a computer program stored on the storage module and executable on the processor, and is characterized in that the processor implements the above-mentioned multi-source data processing method when executing the computer program.
As described above, the multi-source data processing system and the multi-source data processing method of the present invention provide a policy for protecting the private data of the clients: a partial redundant attribute set is added to the key attribute set that is input to the decision analysis model, the result is converted into a set of questions with yes or no answers, and the question set is passed to the participating clients in random order. Each client updates the answers in the question set according to its own private data, and the last client to finish sends the answers to the server, completing attribute collection for the same object. In this way the key attribute information held by all clients about the analysis object is obtained while the privacy of that information is preserved and the key attributes are not leaked. The server is likewise given a policy for protecting the decision analysis model: no client can obtain the server's decision analysis model, which protects the server's privacy.
Moreover, according to the multi-source data processing system and the multi-source data processing method of the present invention, the clients with the selected client identifiers are used as simulation execution clients, so that the real data of the other clients can be concealed.
Therefore, according to the invention, neither the private information of the multiple sources (namely the clients) nor the private information of the server (namely the decision analysis model) is disclosed, and, with private data protected in this way, the information from the multiple sources can be combined to obtain a final decision analysis result.
Drawings
FIG. 1 is a flow diagram illustrating a multi-source data processing method of the present invention.
FIG. 2 is a block diagram showing the architecture of a multi-source data processing system of the present invention.
FIG. 3 is a flow diagram illustrating a multi-source data processing method according to one embodiment of the invention.
Detailed Description
The following description is of some of the several embodiments of the invention and is intended to provide a basic understanding of the invention. It is not intended to identify key or critical elements of the invention or to delineate the scope of the invention.
For the purposes of brevity and explanation, the principles of the present invention are described herein with reference primarily to exemplary embodiments thereof. However, those skilled in the art will readily recognize that the same principles are equally applicable to all types of multi-source data processing systems and multi-source data processing methods, and that these same principles, as well as any such variations, may be implemented therein without departing from the true spirit and scope of the present patent application.
FIG. 1 is a flow diagram illustrating a multi-source data processing method of the present invention.
As shown in fig. 1, the multi-source data processing method of the present invention includes:
redundancy addition step S100: the server side mixes a redundant attribute set in the key attribute set as an initial attribute set;
format conversion step S200: the server side carries out format conversion on the initial attribute set to obtain a to-be-processed attribute set with a preset format;
simulation execution step S300: the server selects two or more clients as simulation execution clients based on a first random algorithm, and simulates the execution of the to-be-processed attribute set by the simulation execution clients to obtain a simulation completion attribute set;
real execution step S400: the server sends the simulation completion attribute set, as noise, together with the to-be-processed attribute set to the remaining clients other than the simulation execution clients, and the remaining clients execute the to-be-processed attribute set to obtain an execution completion attribute set; and
analysis decision step S500: the server inputs the execution completion attribute set into a decision analysis model for calculation and analysis to obtain a decision analysis result.
In the redundancy adding step S100, a redundant attribute set is mixed into the key attribute set as noise to form the initial attribute set. As a result, the clients cannot tell which attributes make up the key attribute set, which has the technical effect that the information the server actually intends to collect cannot be learned by the clients.
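The redundancy adding step can be illustrated with a minimal Python sketch. The attribute names, the `add_redundancy` helper, and the choice to shuffle the mixed set are assumptions made for illustration only; the patent does not specify how the redundant attributes are chosen or ordered.

```python
import random

def add_redundancy(key_attributes, redundant_pool, n_redundant, rng=None):
    """Mix n_redundant decoy attributes into the key attributes.

    Hypothetical helper: the selection and shuffling strategy is an
    assumption, not something the patent text prescribes.
    """
    rng = rng if rng is not None else random.Random()
    candidates = [a for a in redundant_pool if a not in key_attributes]
    decoys = rng.sample(candidates, n_redundant)
    initial = list(key_attributes) + decoys
    rng.shuffle(initial)  # so position does not reveal which attributes are key
    return initial

# Illustrative attribute names only.
key_attributes = ["risk_level", "account_type"]
redundant_pool = ["region", "branch_code", "signup_channel", "device_type"]
initial_attribute_set = add_redundancy(
    key_attributes, redundant_pool, 2, random.Random(0)
)
```

Only the server keeps `key_attributes`; the clients would see the mixed `initial_attribute_set` (after format conversion) and so cannot tell the key attributes from the decoys.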
The decision analysis model is established in the server and is only visible to the server, so that the decision analysis model can be prevented from being leaked to a plurality of clients serving as multiple sources.
Furthermore, a unique client identifier is preset and assigned to each client, and the client identifiers of the plurality of clients are recorded and stored in the server. In the simulation execution step S300, the server selects two or more client identifiers from the client identifiers of the plurality of clients through a first random algorithm, and uses the clients with the selected client identifiers as simulation execution clients.
Here, any one of the following may be employed as the first random algorithm: a numerical probability algorithm, a Las Vegas algorithm, a Monte Carlo algorithm, a Sherwood algorithm, and the like.
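As a sketch of how the simulation execution clients might be drawn from the stored client identifiers, the following Python uses plain uniform sampling as a stand-in for the first random algorithm. The function name `select_simulation_clients` and the use of uniform sampling are assumptions; the text names only classes of randomized algorithms, not a concrete procedure.

```python
import random

def select_simulation_clients(client_ids, k=2, rng=None):
    """Pick k >= 2 client identifiers to act as simulation execution clients.

    Returns (simulated_ids, remaining_ids). Uniform sampling is an assumed
    stand-in for the "first random algorithm".
    """
    if k < 2:
        raise ValueError("two or more simulation execution clients are required")
    if k >= len(client_ids):
        raise ValueError("at least one real client must remain")
    rng = rng if rng is not None else random.SystemRandom()
    simulated = set(rng.sample(list(client_ids), k))
    remaining = [c for c in client_ids if c not in simulated]
    return simulated, remaining

simulated_ids, remaining_ids = select_simulation_clients(
    range(1, 31), k=2, rng=random.Random(1)
)
```

The remaining identifiers are the clients that take part in the real execution step.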
In the format conversion step S200, the following format conversion is performed on the attribute fields in the initial attribute set:
for a discrete field, a question set is generated;
for a field with a linear attribute, a question set is generated after the field is discretized using a discretization technique.
For example, for a field A with a discrete attribute taking the values Value1, Value2, …, ValueN (where N is a natural number), the question set { If field A = Value1, If field A = Value2, …, If field A = ValueN } is generated. A field B with a linear attribute is first discretized using a discretization technique and is then processed in the same way as the discrete field A.
By performing the above format processing in the format conversion step S200, the original information is converted into a form that a computer can easily read and count, and the clients' own private data is not leaked, so that data processing is faster and data privacy is improved at the same time.
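The format conversion can be sketched in Python as follows. The question wording, the fixed cut points used to discretize the linear field, and the function names are all assumptions for illustration; the patent does not name a particular discretization technique.

```python
def questions_for_discrete(field, values):
    """One yes/no question per possible value of a discrete field."""
    return [f"{field} = {v}" for v in values]

def questions_for_linear(field, cut_points):
    """Discretize a linear field into bins at the given cut points
    (an assumed technique), then ask one yes/no question per bin."""
    edges = sorted(cut_points)
    questions = [f"{field} < {edges[0]}"]
    questions += [
        f"{edges[i]} <= {field} < {edges[i + 1]}" for i in range(len(edges) - 1)
    ]
    questions.append(f"{field} >= {edges[-1]}")
    return questions

def to_pending_attribute_set(questions):
    """Every question starts with answer 0 ("no")."""
    return {q: 0 for q in questions}

pending = to_pending_attribute_set(
    questions_for_discrete("A", ["Value1", "Value2"])
    + questions_for_linear("B", [10, 20])
)
```

Here `pending` maps each question to an answer flag, which is the predetermined format that the later steps update.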
The purpose of the simulation execution step S300 is as follows. If no simulation execution client were set, then once a first client executed the to-be-processed attribute set and the updated data were passed on to a second client, the second client could infer the first client's private information from the updated attribute set. Adding the simulation execution clients as noise therefore protects the private information of the first client that actually executes the to-be-processed attribute set.
The real execution step S400 includes the following sub-steps:
substep 1: the server uses a second random algorithm to select one client from the remaining clients other than the simulation execution clients among the plurality of clients;
substep 2: the server sends the to-be-processed attribute set and the simulation completion attribute set to the selected client;
substep 3: the selected client executes the to-be-processed attribute set, adds the execution result to the to-be-processed attribute set, and returns the result to the server; and
substep 4: the server repeats substeps 1-3 until all of the remaining clients have executed the to-be-processed attribute set, thereby obtaining the execution completion attribute set.
Here, the second random algorithm may be the same algorithm as the first random algorithm or may be a different algorithm.
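The four sub-steps can be sketched as a server-side loop. In this Python sketch, `client_execute` is a hypothetical stand-in for the network round trip to a client, `private_data` is made-up illustrative data, and uniform random choice is assumed for the second random algorithm.

```python
import random

def real_execution(pending, simulated_ids, remaining_ids, execute_on_client, rng=None):
    """Server-side loop over substeps 1-4 (illustrative sketch)."""
    rng = rng if rng is not None else random.Random()
    todo = list(remaining_ids)
    current = dict(pending)
    while todo:  # substep 4: repeat until every remaining client has executed
        client_id = todo.pop(rng.randrange(len(todo)))  # substep 1: random pick
        # substeps 2-3: send the attribute set plus noise, receive the update
        current = execute_on_client(client_id, dict(current), simulated_ids)
    return current

# Hypothetical client-side data: questions each client would answer "yes".
private_data = {5: {"q3"}, 8: {"q1"}}

def client_execute(client_id, pending, noise_ids):
    """Stand-in for a client: flip an answer to 1 where its data says yes."""
    for question in pending:
        if question in private_data.get(client_id, set()):
            pending[question] = 1
    return pending

execution_complete = real_execution(
    {"q1": 0, "q2": 0, "q3": 0}, {3, 24}, [5, 8], client_execute, random.Random(0)
)
```

Because answers are only ever switched from 0 to 1, the final attribute set does not depend on the order in which the clients are visited.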
In addition, data transmitted between the clients and the server may be sent in encrypted form, for example encrypted with an asymmetric algorithm such as RSA, DSA, or ECC.
In the analysis and decision step S500, the server inputs the execution completion attribute set to a decision analysis model for computational analysis and obtaining a decision analysis result, wherein the decision analysis model performs gradual judgment according to the attributes of the execution completion attribute set to finally obtain the decision result. Specifically, in decision making, a certain attribute value is used for judgment at an internal node of the tree, and a decision is made as to which branch node to enter according to a judgment result until a leaf node is reached to obtain a decision result. As an algorithm for decision making, ID3, C4.5, CART, etc. may be employed.
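The step-by-step judgment described above amounts to walking a decision tree from the root to a leaf. The Python sketch below uses a small hand-built tree with made-up questions and decisions; it does not show how such a tree would be learned with ID3, C4.5, or CART, and the class and function names are assumptions.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    decision: str

@dataclass
class Node:
    question: str                 # attribute tested at this internal node
    if_yes: Union["Node", Leaf]   # branch taken when the answer is 1
    if_no: Union["Node", Leaf]    # branch taken when the answer is 0

def decide(tree, answers):
    """Follow branches according to the collected answers until a leaf."""
    node = tree
    while isinstance(node, Node):
        node = node.if_yes if answers.get(node.question, 0) == 1 else node.if_no
    return node.decision

# Hand-built illustrative tree; questions and decisions are hypothetical.
tree = Node("q1", Node("q3", Leaf("approve"), Leaf("review")), Leaf("reject"))
result = decide(tree, {"q1": 1, "q2": 0, "q3": 1})
```

Note that the tree never tests `q2`: a redundant attribute can be collected from the clients and simply ignored by the model, which stays on the server.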
In addition, optionally, after the analysis decision step S500, the method can further include a step in which the server returns the decision analysis result to each client. This has the technical effect that the relevant results can be shared.
FIG. 2 is a block diagram showing the architecture of a multi-source data processing system of the present invention.
As shown in FIG. 2, the multi-source data processing system of the present invention comprises: a server 100 and a plurality of clients 200.
The server 100 includes:
a redundancy adding module 110, configured to mix a redundancy attribute set in the key attribute set as an initial attribute set;
a format conversion module 120, configured to perform format conversion on the initial attribute set to obtain a to-be-processed attribute set in a predetermined format;
a simulation execution module 130, configured to select two or more clients as simulation execution clients based on a first random algorithm, and simulate the simulation execution clients to execute the to-be-processed attribute set and obtain a simulation completion attribute set;
a first communication module 140, communicatively connected to the clients, for sending the simulation completion attribute set, as noise, together with the to-be-processed attribute set to the remaining clients other than the simulation execution clients among the plurality of clients, and for accepting a returned execution completion attribute set; and
an analysis decision module 150, configured to input the execution completion attribute set into a decision analysis model for calculation and analysis to obtain a decision analysis result.
Each of the plurality of clients 200 includes:
a second communication module 210, communicatively connected to the server 100, configured to receive the simulation completion attribute set and the to-be-processed attribute set from the server 100 and to return the execution completion attribute set obtained by an execution module 220 described below to the server 100; and
an execution module 220, configured to execute the to-be-processed attribute set and obtain the execution completion attribute set.
In the server 100, the redundancy adding module 110 mixes a redundant attribute set into the key attribute set as noise to form the initial attribute set. As a result, the plurality of clients 200 cannot tell which attributes make up the key attribute set, which has the technical effect that the information the server 100 actually intends to collect cannot be learned by the clients.
Furthermore, the decision analysis model is established by the analysis decision module 150 in the server 100 and is only visible to the server, which can ensure that the decision analysis model is not revealed to multiple clients as multiple sources.
In the analysis decision module 150, the server inputs the execution completion attribute set into a decision analysis model for calculation and analysis to obtain a decision analysis result, wherein the decision analysis model performs step-by-step judgment according to the attributes of the execution completion attribute set to finally obtain the decision result. Specifically, in decision making, an attribute value is tested at each internal node of the tree, and the test result determines which branch node to enter, until a leaf node is reached and the decision result is obtained. As an algorithm for decision making, ID3, C4.5, CART, etc. may be employed.
Furthermore, a unique client identifier is preset and assigned to each client, and the client identifiers of the plurality of clients are recorded and stored in the server. The simulation execution module 130 of the server 100 selects two or more client identifiers from the client identifiers of the plurality of clients through a first random algorithm, uses the clients with the selected client identifiers as simulation execution clients, and simulates their execution of the to-be-processed attribute set to obtain a simulation completion attribute set. The simulation completion attribute set is then sent, as noise, together with the to-be-processed attribute set to the remaining clients other than the simulation execution clients. The technical effect is that, even though the remaining clients receive the simulation completion attribute set, they cannot obtain real information from it, because it is produced by simulation on the server and is not real. Providing it to the remaining clients as noise is thus a technical means of concealing the real data.
Here, as the first random algorithm, a numerical probability algorithm, a Las Vegas algorithm, a Monte Carlo algorithm, or a Sherwood algorithm can be adopted. Selecting the clients by a random algorithm to simulate execution of the to-be-processed attribute set and obtain the simulation completion attribute set has the technical effect that manual selection is avoided and the finally obtained decision result is relatively accurate.
In the format conversion module 120, the following format conversion is performed on the attribute fields in the initial attribute set:
for a discrete field, a question set is generated;
for a field with a linear attribute, a question set is generated after the field is discretized using a discretization technique.
For example, for a field A with a discrete attribute taking the values Value1, Value2, …, ValueN (where N is a natural number), the question set { If field A = Value1, If field A = Value2, …, If field A = ValueN } is generated. A field B with a linear attribute is first discretized using a discretization technique and is then processed in the same way as the discrete field A.
By performing the above format processing in the format conversion module 120, the original information is converted into a form that a computer can easily read and count, and the clients' own private data is not leaked, so that data processing is faster and data privacy is improved at the same time.
Next, a multi-source data processing method according to an embodiment of the present invention will be described. The embodiment applies the multi-source data processing method to a securities-industry cloud service environment, and protects the privacy of the server's decision analysis model while protecting the data privacy of each client (securities dealer client).
FIG. 3 is a flow diagram illustrating a multi-source data processing method according to one embodiment of the invention.
As shown in fig. 3, step S1: the server mixes a partial redundant attribute set into the key attribute set that is input to the model;
step S2: the server converts the attribute set mixed with the partial redundant attribute set into a set of questions whose answers are yes or no, where a yes answer is recorded as 1 and a no answer as 0;
step S3: the client of each dealer has a unique identification number, such as the serial numbers 1, 2, 3, 4, …; the server uses a random algorithm to select the identification numbers of two or more dealer clients, for example 3 and 24, and simulates these dealers as having already been queried, that is, the dealer clients numbered 3 and 24 are the randomly chosen clients whose execution of the attribute set is simulated;
step S4: the server randomly selects another dealer identification number, for example dealer client 8, and sends it the message { "queried dealer": {3,24}, "object": Obj1, "question set": { question 1:0, question 2:0, question 3:0, … } };
step S5: after receiving the message, the dealer client with identification number 8 adds its own identification number to the "queried dealer" list and answers the questions in the question set in order, using its own data about Obj1; whenever an answer is yes, it updates the answer flag of the corresponding question. After the update, the message is { "queried dealer": {3,8,24}, "object": Obj1, "question set": { question 1:0, question 2:1, question 3:0, … } };
step S6: the dealer client with identification number 8 randomly selects a dealer client that is not in the "queried dealer" list and sends the message to it;
step S7: the last client feeds the message back to the server, that is, once every dealer other than those numbered 3 and 24 has answered, the last client sends the message to the server, and the server inputs the finally returned information into the decision analysis model to obtain a decision analysis result.
Here, the server completes the analysis decision about the object Obj1 by using the obtained key attribute set and the decision analysis model it has established itself. Optionally, after the analysis is completed, a set of analysis conclusions may be sent to the plurality of clients, where each conclusion includes conclusion information and the estimated probability that the conclusion is correct, for example, { "object": Obj1, "conclusion set": { (conclusion 1, 89%), (conclusion 2, 60%), (conclusion 3, 76%), … } }.
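The message flow of steps S3 to S7 can be sketched in Python as a single-process simulation. The dictionary keys, the `private_yes` data, and the function `run_chain` are assumptions for illustration; in the embodiment each hop is a message between separate dealer clients, not a local loop.

```python
import random

def run_chain(all_ids, questions, obj, private_yes, rng):
    """Simulate S3-S7: pre-mark simulated dealers, then hop between real ones."""
    simulated = rng.sample(list(all_ids), 2)  # S3: simulated "queried" dealers
    message = {
        "queried": sorted(simulated),
        "object": obj,
        "questions": {q: 0 for q in questions},
    }
    while True:
        candidates = [c for c in all_ids if c not in message["queried"]]
        if not candidates:
            return message  # S7: the last client returns the message to the server
        dealer = rng.choice(candidates)  # S4 (server) or S6 (previous client)
        # S5: the dealer records itself and updates answers from its own data
        message["queried"] = sorted(message["queried"] + [dealer])
        yes_answers = private_yes.get(dealer, {}).get(obj, set())
        for q in questions:
            if q in yes_answers:
                message["questions"][q] = 1

# Hypothetical private data: every dealer would answer "yes" to question 2.
all_ids = [1, 2, 3, 4, 5]
private_yes = {i: {"Obj1": {"question 2"}} for i in all_ids}
final_message = run_chain(
    all_ids, ["question 1", "question 2", "question 3"], "Obj1",
    private_yes, random.Random(42),
)
```

A dealer that receives the message sees only the accumulated answer flags and a "queried" list that already contains the simulated identifiers, so it cannot tell how many real dealers answered before it.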
As described above, the multi-source data processing system and the multi-source data processing method of this embodiment provide a policy that protects the private data of each client's dealer. A set of redundant attributes is mixed into the key attribute set that the decision analysis model takes as input, and the combined set is converted into a question set whose answers are yes or no. The question set is passed among the in-service clients in random order, each client updates the answers according to its own private data, and the last client to finish sends the result to the server, which completes attribute collection for the same object.
Thus, by adding redundant attributes to the key attributes that serve as input to the decision analysis model and converting them into a question set with yes or no answers, the server can obtain the key attribute information of all clients about the analysis object while keeping that information private and preventing the key attributes from being disclosed. The approach also protects the decision analysis model itself: no client can obtain the server's decision analysis model, so the server's privacy is protected as well.
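How the server might build the question set and later recover the key answers can be sketched as follows. The helper names (`discretize`, `build_question_set`, `extract_key_answers`), the threshold-style questions, and the example fields are assumptions made for illustration; the embodiment does not prescribe a particular discretization technique.

```python
import random


def discretize(field, thresholds):
    """Turn a continuous field into yes/no questions, one per threshold."""
    return [f"{field} >= {t}" for t in thresholds]


def build_question_set(key_questions, redundant_questions, rng):
    """Mix key and redundant questions and initialise every answer to 0.

    Shuffling hides which positions hold the key attributes.
    """
    questions = list(key_questions) + list(redundant_questions)
    rng.shuffle(questions)
    return {question: 0 for question in questions}


def extract_key_answers(answered, key_questions):
    """Server side: keep only the key answers before running the model."""
    return {question: answered[question] for question in key_questions}


rng = random.Random(0)
# Hypothetical fields; only the server knows which questions are key.
key_questions = discretize("holding ratio", [0.1, 0.5]) + ["is suspended"]
redundant_questions = ["redundant A", "redundant B"]
question_set = build_question_set(key_questions, redundant_questions, rng)
```

Clients see one flat list of yes/no questions, while the server keeps `key_questions` private and uses it to discard the redundant answers once the question set comes back.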
Further, in the multi-source data processing system and the multi-source data processing method of this embodiment, the clients corresponding to the selected client identifiers are treated as simulation execution clients (simulated completed queries). Because the simulation completion attribute set is produced by the server's simulation and is not real, providing it to the remaining clients as noise conceals the other, real data.
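The selection of simulation execution clients and the generation of a simulated set can be sketched as below. This rests on stated assumptions: the text does not say how the simulated answers are produced (the example message in step S4 shows all answers at 0), so the random placeholder answers, the function name `simulate_execution`, and the `count` parameter are hypothetical.

```python
import random


def simulate_execution(client_ids, questions, rng, count=2):
    """Pick `count` client IDs at random and fabricate their answer set.

    The answers are random placeholders (an assumption); the server knows
    they are not real and can treat them purely as noise.
    """
    ids = sorted(client_ids)
    if count < 2 or count > len(ids):
        raise ValueError("need at least two and at most all clients")
    simulated_ids = rng.sample(ids, count)
    simulated_set = {
        "queried dealer": set(simulated_ids),
        "problem set": {q: rng.randint(0, 1) for q in questions},
    }
    return simulated_ids, simulated_set
```

A real client that receives this set alongside the question set has no means, in this sketch, of telling the fabricated entries from answers given by actual dealers.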
The present invention also provides a computer-readable medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the above-described multi-source data processing method.
The present invention also provides a computer device comprising a storage module, a processor, and a computer program stored on the storage module and executable on the processor, characterized in that the processor implements the above-described multi-source data processing method when executing the computer program.
The above examples have mainly explained the multi-source data processing system and the multi-source data processing method of the present invention. Although only a few embodiments of the present invention have been described in detail, those skilled in the art will appreciate that the present invention may be embodied in many other forms without departing from the spirit or scope thereof. Accordingly, the present examples and embodiments are to be considered as illustrative and not restrictive, and various modifications and substitutions may be made therein without departing from the spirit and scope of the present invention as defined by the appended claims.

Claims (14)

1. A multi-source data processing method, comprising:
a redundancy adding step, in which the server mixes a redundant attribute set into a key attribute set to form an initial attribute set;
a format conversion step, in which the server converts the format of the initial attribute set to obtain a to-be-processed attribute set in a preset format;
a simulation execution step, in which the server selects two or more of the plurality of clients as simulation execution clients based on a first random algorithm, and simulates execution of the to-be-processed attribute set by the simulation execution clients to obtain a simulation completion attribute set;
a real execution step, in which the server sends the simulation completion attribute set, as noise, together with the to-be-processed attribute set to the remaining clients of the plurality of clients other than the simulation execution clients, and receives an execution completion attribute set from the remaining clients, wherein the execution completion attribute set is obtained by the remaining clients executing the to-be-processed attribute set; and
an analysis and decision step, in which the server inputs the execution completion attribute set into a decision analysis model for calculation and analysis to obtain a decision analysis result.
2. The multi-source data processing method of claim 1,
wherein the decision analysis model makes step-by-step judgments according to the attributes in the execution completion attribute set to finally obtain a decision result.
3. The multi-source data processing method of claim 1,
a unique client identifier is preset for each client,
in the simulation execution step, two or more clients are selected as simulation execution clients from the client identification numbers of the plurality of clients by a first random algorithm.
4. The multi-source data processing method of claim 3,
the first random algorithm comprises any one of:
a numerical probability algorithm, the Las Vegas algorithm, the Monte Carlo algorithm, and the Sherwood algorithm.
5. The multi-source data processing method of claim 1,
in the format conversion step, the following format conversion is performed on the attribute fields in the initial attribute set:
for discrete fields, a question set is generated;
for fields with linear attributes, a question set is generated after the field is discretized using a discretization technique.
6. The multi-source data processing method of claim 1,
the real execution step comprises the following substeps:
substep 1: the server uses a second random algorithm to select one client from the remaining clients of the plurality of clients other than the simulation execution clients;
substep 2: the server sends the to-be-processed attribute set and the simulation completion attribute set to the selected client;
substep 3: the selected client executes the to-be-processed attribute set, adds the execution result to the to-be-processed attribute set, and returns the result to the server;
substep 4: the server repeats substeps 1 to 3 until all of the remaining clients have executed the to-be-processed attribute set and an execution completion attribute set is obtained.
7. A server configured to communicate with a plurality of clients, comprising:
a redundancy adding module, configured to mix a redundant attribute set into a key attribute set to form an initial attribute set;
a format conversion module, configured to convert the format of the initial attribute set to obtain a to-be-processed attribute set in a preset format;
a simulation execution module, configured to select two or more of the plurality of clients as simulation execution clients based on a first random algorithm, and to simulate execution of the to-be-processed attribute set by the simulation execution clients to obtain a simulation completion attribute set;
a first communication module, configured to send the simulation completion attribute set, as noise, together with the to-be-processed attribute set to the remaining clients of the plurality of clients other than the simulation execution clients, and to receive a returned execution completion attribute set; and
an analysis decision module, configured to input the execution completion attribute set into a decision analysis model for calculation and analysis to obtain a decision analysis result.
8. The server according to claim 7,
wherein the decision analysis model makes step-by-step judgments according to the attributes in the execution completion attribute set to finally obtain a decision result.
9. The server according to claim 7,
a unique client identifier is preset for each client,
the simulation execution module selects two or more clients as simulation execution clients from the client identification numbers of the plurality of clients through a first random algorithm.
10. The server according to claim 9,
the first random algorithm comprises any one of:
a numerical probability algorithm, the Las Vegas algorithm, the Monte Carlo algorithm, and the Sherwood algorithm.
11. The server according to claim 7,
the format conversion module performs the following format conversion on the attribute fields in the initial attribute set:
for discrete fields, a question set is generated;
for fields with linear attributes, a question set is generated after the field is discretized using a discretization technique.
12. A multi-source data processing system, comprising: a server and a plurality of clients,
wherein the server comprises:
a redundancy adding module, configured to mix a redundant attribute set into a key attribute set to form an initial attribute set;
a format conversion module, configured to convert the format of the initial attribute set to obtain a to-be-processed attribute set in a preset format;
a simulation execution module, configured to select two or more clients as simulation execution clients based on a first random algorithm, and to simulate execution of the to-be-processed attribute set by the simulation execution clients to obtain a simulation completion attribute set;
a first communication module, communicatively connected to the clients, configured to send the simulation completion attribute set, as noise, together with the to-be-processed attribute set to the remaining clients of the plurality of clients other than the simulation execution clients, and to receive a returned execution completion attribute set; and
an analysis decision module, configured to input the execution completion attribute set into a decision analysis model for calculation and analysis to obtain a decision analysis result,
wherein each client comprises:
a second communication module, communicatively connected to the server, configured to receive the simulation completion attribute set and the to-be-processed attribute set from the server and to return to the server an execution completion attribute set obtained by an execution module; and
the execution module, configured to execute the to-be-processed attribute set and obtain the execution completion attribute set.
13. A computer-readable medium, having stored thereon a computer program,
wherein the computer program, when executed by a processor, implements the multi-source data processing method of any one of claims 1 to 6.
14. A computer device comprising a storage module, a processor and a computer program stored on the storage module and executable on the processor, wherein the processor implements the multi-source data processing method of any one of claims 1 to 6 when executing the computer program.
CN202110103428.9A 2021-01-26 2021-01-26 Multi-source data processing system and multi-source data processing method Pending CN112580106A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110103428.9A CN112580106A (en) 2021-01-26 2021-01-26 Multi-source data processing system and multi-source data processing method

Publications (1)

Publication Number Publication Date
CN112580106A true CN112580106A (en) 2021-03-30

Family

ID=75145688

Country Status (1)

Country Link
CN (1) CN112580106A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination