CN114329300B - Multi-party projection method based on data security and multi-party production data analysis method - Google Patents


Info

Publication number
CN114329300B
Authority
CN
China
Prior art keywords
projection
data
model
server
client
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN202210244755.0A
Other languages
Chinese (zh)
Other versions
CN114329300A (en
Inventor
夏佳志
林伟星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University filed Critical Central South University
Priority to CN202210244755.0A priority Critical patent/CN114329300B/en
Publication of CN114329300A publication Critical patent/CN114329300A/en
Application granted granted Critical
Publication of CN114329300B publication Critical patent/CN114329300B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a multi-party projection method based on data security, which comprises: acquiring a server and a set of clients; the server constructs a global model and an initial global dictionary and issues them to the clients; each client initializes a local model, obtains an other-party dictionary, trains a new local model, projects its local data, and uploads the local model and part of the projection results to the server; the server aggregates these to obtain a new global model and a new global dictionary and issues them to the clients; the above steps are repeated until the server obtains a final projection model; the server issues the projection model to the clients; each client projects its local data with the projection model and uploads the results to the server; and the server draws all received projection results on a scatter diagram to complete the multi-party projection. The invention also provides a multi-party production data analysis method comprising the multi-party projection method based on data security. The method has the advantages of good projection effect, high security and high efficiency.

Description

Multi-party projection method based on data security and multi-party production data analysis method
Technical Field
The invention belongs to the field of data processing, and particularly relates to a multi-party projection method and a multi-party production data analysis method based on data security.
Background
With the development of technology and the improvement of living standard of people, the intelligent big data technology is widely applied to the production and the life of people. The current data is generally high-dimensional data, so it is very important to process the high-dimensional data.
A high-dimensional data projection method is a commonly used data analysis method. The method projects high-dimensional data into a low-dimensional space, so that a data analyst is supported to analyze high-dimensional data features from a low-dimensional projection result. In the past, data analysts typically collected data from multiple data providers on a single device and then projected. However, with the increased awareness of people's privacy and the advent of privacy protection policies, it has become increasingly difficult to collect and share data, especially data with sensitive information, among data providers. Therefore, how to obtain the global data projection result on the premise of not collecting data of all parties becomes a common difficulty faced by the data analyst at present; and this problem is also known as the secure multiparty projection problem.
In order for the projection result to truly reflect the high-dimensional data distribution, the data proximity relation must be maintained. This presents two challenges for the secure multi-party projection problem. First, how to keep the cross-party data proximity relation in the projection result: maintaining a cross-party data proximity relation means that data which are neighbors in the high-dimensional space but are dispersed across parties are projected onto nearby locations in the low-dimensional space. Second, how to maintain the data proximity relation when the data are not independently and identically distributed. In the above scenario, the data of each party are usually not independently and identically distributed, and under this condition the projection results of the parties easily overlap, thereby destroying the data proximity relation.
The traditional projection method needs to gather the data together for projection, which does not meet the requirement of data confidentiality. Some existing projection methods can calculate a multi-party projection result. SMAP (Secure Multi-party Projection), a joint t-SNE (t-Distributed Stochastic Neighbor Embedding) projection method based on homomorphic encryption, can calculate a joint projection whose effect is consistent with that of single-party projection; however, the computational overhead added by homomorphic encryption is large, making the method difficult to put into practice. The MSDSNE (Multi-shot Decentralized Data Stochastic Neighbor Embedding) projection method performs joint t-SNE projection among data parties based on shared anchor data; however, the MSDSNE method has several limitations: first, it needs to share an additional data set, which does not meet the requirement of the problem; second, its projection effect is highly random; finally, it only approximately maintains the cross-party data proximity relation by using the additional data set as anchor points, so its ability to preserve that relation is limited by the size of the additional data set. Therefore, the current methods cannot effectively solve the secure multi-party projection problem.
Disclosure of Invention
The invention aims to provide a multi-party projection method based on data security, which has good projection effect, high security and high efficiency.
The invention also aims to provide a multi-party production data analysis method comprising the multi-party projection method based on data security.
The invention provides a multi-party projection method based on data security, which comprises the following steps:
S1. acquiring a server and a set of clients;
S2. the server constructs a global model and an initial global dictionary and issues the global model and the global dictionary to each client;
S3. each client initializes its current local model according to the received global model, and filters the global dictionary to obtain the other-party dictionary;
S4. each client trains a new local model according to the local model obtained in step S3 and the other-party dictionary, and uses the new local model as the current local model;
S5. each client uses the current local model obtained in step S4 to project its local data to obtain a projection result, and uploads the current local model and a randomly selected part of the projection result to the server;
S6. the server aggregates the received local models and projection results to obtain a new global model and a new global dictionary, and issues the new global model and the new global dictionary to each client;
S7. repeating steps S3-S6 until the set conditions are met, whereupon the server obtains a final projection model;
S8. the server sends the final projection model obtained in step S7 to each client;
S9. each client projects its local data with the received projection model, and uploads the projection result to the server;
S10. the server draws all the received projection results on a scatter diagram to complete the secure multi-party projection based on data security.
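The round structure of steps S1-S10 can be sketched as a single-process simulation. This is only an illustrative outline under stated assumptions, not the patented implementation: the linear 2-D `project` model, the no-op `train_local` placeholder, the function name `run_protocol`, and the parameter `m` (number of projection results each client uploads) are all hypothetical choices made for the example.

```python
import random

def project(model, x):
    """Project vector x to 2-D with a linear model (a 2 x dim weight matrix)."""
    return [sum(w * v for w, v in zip(row, x)) for row in model]

def train_local(model, data, other_dictionary):
    """Placeholder for step S4; a real client would optimize the model here."""
    return model

def run_protocol(clients, dim, rounds, m, seed=0):
    rng = random.Random(seed)
    # S2: the server builds a global model and an initial global dictionary.
    model = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(2)]
    dictionary = []
    for data in clients:
        dictionary.extend(rng.sample([project(model, x) for x in data], m))
    n = sum(len(data) for data in clients)
    for _ in range(rounds):  # S7: repeat S3-S6
        local_models, uploads = [], []
        for k, data in enumerate(clients):
            # S3: copy the global model; drop this client's own dictionary slice.
            local = [row[:] for row in model]
            other = dictionary[:k * m] + dictionary[(k + 1) * m:]
            local = train_local(local, data, other)          # S4
            projected = [project(local, x) for x in data]    # S5
            local_models.append(local)
            uploads.append(rng.sample(projected, m))
        # S6: weighted average of local models; dictionary rebuilt in client order.
        model = [
            [sum(len(d) / n * lm[r][c] for lm, d in zip(local_models, clients))
             for c in range(dim)]
            for r in range(2)
        ]
        dictionary = [p for upload in uploads for p in upload]
    # S8-S9: clients project all local data with the final model and upload it.
    results = [[project(model, x) for x in data] for data in clients]
    return model, dictionary, results  # S10 would draw `results` as a scatter plot

rng = random.Random(1)
clients = [[[rng.gauss(k, 1.0) for _ in range(3)] for _ in range(8)] for k in range(2)]
model, dictionary, results = run_protocol(clients, dim=3, rounds=2, m=4)
```

Only model parameters and sampled 2-D projection results cross the client boundary in this sketch; the raw high-dimensional vectors are read only inside the per-client loop.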
The server building the global model and the initial global dictionary in step S2 includes the following steps:
A. the server selects a model architecture and generates model parameters, constructs a global model and sends the global model to each client;
B. each client projects respective local data by adopting the received global model to obtain respective local projection results;
C. each client randomly extracts a part of its local projection result and uploads that part to the server;
D. and the server constructs an initial global dictionary according to the received projection result.
In step S3, each client initializes its current local model according to the received global model, and filters the global dictionary to obtain the other-party dictionary, which includes the following steps:
a. each client generates a local model according to the received topological structure and parameters of the global model;
b. and each client selects the interval where the projection data of the client is located in the received global dictionary according to the sequence number of the client, and constructs the data outside the interval as the dictionary of the other party.
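Step b can be illustrated with simple list slicing. The sketch below assumes, hypothetically, that the global dictionary is laid out in client-number order, that every client contributed exactly `m` entries, and that client numbers start at 0; the text itself only states that the interval is found from the client's sequence number.

```python
def split_global_dictionary(global_dictionary, client_index, m):
    """Return (own_slice, other_party_dictionary) for one client."""
    start = client_index * m
    end = start + m
    own = global_dictionary[start:end]
    # Everything outside the client's own interval forms the other-party dictionary.
    other = global_dictionary[:start] + global_dictionary[end:]
    return own, other

# Three clients, two dictionary entries each, tagged by client for readability.
global_dictionary = [("c0", 0), ("c0", 1), ("c1", 0), ("c1", 1), ("c2", 0), ("c2", 1)]
own, other = split_global_dictionary(global_dictionary, client_index=1, m=2)
```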
In step S4, each client trains a new local model according to the local model obtained in step S3 and the other-party dictionary, and uses the new local model as the current local model, which specifically includes the following steps:
(1) the client acquires a neighborhood graph of its local data and calculates a weighted graph from it;
(2) sampling data pairs according to the weight values of the edges in the weighted graph obtained in the step (1) and generating a training data set;
(3) projecting each data pair in the training data set by using the local model obtained in the step S3 to obtain a projection pair;
(4) randomly sampling n high-dimensional vectors for each projection pair obtained in the step (3) and setting the n high-dimensional vectors as non-neighbor vectors so as to calculate projection results of the n high-dimensional vectors;
(5) repeating the step (3) to the step (4) until a set condition is reached, and optimizing parameters of the local model by adopting a cross entropy loss function in the repeating process;
(6) and obtaining a final optimized new local model, and using the final optimized new local model as the current local model.
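Steps (2) and (4) amount to weighted edge sampling followed by random negative sampling. The following is a minimal sketch, assuming the weighted graph is given as a hypothetical list of `(i, j, weight)` triples; excluding the pair's own endpoints from the negatives is also an assumption, since the text only says the n vectors are sampled at random.

```python
import random

def sample_training_pairs(weighted_edges, num_pairs, rng):
    """Step (2): draw edges with probability proportional to edge weight."""
    edges = [(i, j) for i, j, _ in weighted_edges]
    weights = [w for _, _, w in weighted_edges]
    return rng.choices(edges, weights=weights, k=num_pairs)

def sample_negatives(pair, num_points, n, rng):
    """Step (4): pick n random indices to treat as non-neighbors of the pair."""
    i, j = pair
    candidates = [p for p in range(num_points) if p not in (i, j)]
    return rng.sample(candidates, n)

weighted_edges = [(0, 1, 0.9), (1, 2, 0.0), (2, 3, 0.4)]
rng = random.Random(0)
pairs = sample_training_pairs(weighted_edges, num_pairs=50, rng=rng)
negatives = sample_negatives(pairs[0], num_points=10, n=3, rng=rng)
```

An edge with weight zero is never drawn, so only points the graph considers neighbors end up as positive pairs.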
The cross entropy loss function in step (5) is Loss(X, Y, D), where X is the high-dimensional data, Y is the projection result, and D is the other-party projection dictionary.
[Formula image not reproduced: definition of Loss(X, Y, D)]
The loss contains a hyper-parameter that controls the repulsive force, and a term R(Y, D), the loss term that implements the other-party rejection strategy by introducing a repulsive force between the client's own projection results and the other parties' projection results.
[Formula image not reproduced: definition of R(Y, D)]
R(Y, D) is computed from the low-dimensional similarity between the i-th element Y_i of Y and the k-th element D_k of D.
[Formula image not reproduced: low-dimensional similarity between Y_i and D_k]
Here a and b are the parameters used by the UMAP (Uniform Manifold Approximation and Projection) algorithm when calculating low-dimensional similarity, preferably a = 1.93 and b = 0.79. CE(X, Y) is the difference between the projected distribution and the high-dimensional distribution of the data pairs.
[Formula image not reproduced: definition of CE(X, Y)]
CE(X, Y) is computed from two similarity functions of the UMAP algorithm: one between the i-th and j-th elements of the high-dimensional data X, and one between the i-th and j-th elements of the low-dimensional data Y. log is the natural logarithm (base e).
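For orientation, one plausible form of this loss is written out below. It is a reconstruction under assumptions, not the formulas of the original publication: the symbol names \lambda, p_{ij}, q_{ij} and q(Y_i, D_k) are introduced here for illustration, the similarity kernel is the standard UMAP low-dimensional kernel with parameters a and b, CE is taken to be the standard UMAP fuzzy cross entropy, and R is assumed to mirror the repulsive half of that cross entropy.

```latex
\begin{aligned}
\mathrm{Loss}(X, Y, D) &= CE(X, Y) + \lambda\, R(Y, D) \\
q(Y_i, D_k) &= \left(1 + a\,\lVert Y_i - D_k \rVert_2^{2b}\right)^{-1} \\
R(Y, D) &= -\sum_{i}\sum_{k} \log\bigl(1 - q(Y_i, D_k)\bigr) \\
CE(X, Y) &= \sum_{i \neq j} \left[ p_{ij}\,\log\frac{p_{ij}}{q_{ij}}
          + (1 - p_{ij})\,\log\frac{1 - p_{ij}}{1 - q_{ij}} \right]
\end{aligned}
```

Under this reading, setting \lambda = 0 removes the repulsion between a client's projections and the other-party dictionary, and larger \lambda pushes them further apart.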
In step S5, each client projects its local data with the current local model obtained in step S4 to obtain a projection result, randomly extracts, without repetition, a fixed-length subset of the projection result (with indices drawn according to the length of the projection result), and uploads this subset together with the current local model to the server.
The server in step S6 obtains a new global model and a new global dictionary by aggregation according to the received local model and projection result, and issues the new global model and the new global dictionary to each client, and specifically includes the following steps:
1) the server receives the local models and the projection results uploaded by the clients;
2) the server aggregates the local models of the clients received in step 1) with the federated averaging algorithm, so as to obtain a new global model;
3) the server combines the projection results of the clients received in the step 1) according to the numbering sequence of the clients, so as to obtain a new global dictionary.
In step 2), the local models are aggregated with the federated averaging algorithm using the following formula:
[Formula image not reproduced: federated averaging of the local model parameters]
In the formula, f(w) denotes the parameters of the aggregated model; n_k is the amount of data owned by the k-th client; n is the total amount of data; K is the number of clients; and the remaining symbol denotes the parameters of the k-th local model. That is, the aggregated parameters are the sum, over the K clients, of each local model's parameters weighted by n_k / n.
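The aggregation in steps 2) and 3) can be sketched as follows, assuming for illustration that each local model is flattened into a plain list of floats and that uploads are keyed by client number; the function names are hypothetical.

```python
def federated_average(local_params, data_counts):
    """Step 2): average parameter vectors, weighting client k by n_k / n."""
    n = sum(data_counts)
    return [
        sum(n_k / n * params[i] for params, n_k in zip(local_params, data_counts))
        for i in range(len(local_params[0]))
    ]

def merge_dictionary(uploads_by_client):
    """Step 3): concatenate uploaded projection results in client-number order."""
    merged = []
    for client_id in sorted(uploads_by_client):
        merged.extend(uploads_by_client[client_id])
    return merged

# Client 0 holds 1 sample, client 1 holds 3, so client 1 gets weight 0.75.
aggregated = federated_average([[1.0, 2.0], [3.0, 6.0]], data_counts=[1, 3])
new_dictionary = merge_dictionary({2: [[0.5, 0.5]], 1: [[0.1, 0.2]]})
```

Keeping the dictionary in client-number order is what lets each client later locate its own interval from its sequence number alone.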
The invention also provides a multi-party production data analysis method comprising the multi-party projection method based on data security, which comprises the following steps:
SA. the headquarters server is used as the server in the above multi-party projection method based on data security, and the data centers of the enterprise's factories are used as the clients in the above multi-party projection method based on data security;
SB. the data centers of the enterprise's factories and the headquarters server perform projection with the above multi-party projection method based on data security;
SC. the enterprise headquarters server draws all the received projection results on a scatter diagram to complete the secure multi-party projection based on data security;
SD. the headquarters personnel analyze the multi-party production data based on the scatter diagram obtained in step SC.
The multi-party projection method based on data security and the multi-party production data analysis method provided by the invention innovatively combine a federated learning framework with a deep dimensionality reduction method, can maintain the cross-party data proximity relation, and provide a new solution to the secure multi-party projection problem. The invention also provides a new technique for maintaining the data proximity relation when the data are not independently and identically distributed (non-IID), which effectively solves the projection overlap problem under that condition. Therefore, the method has the advantages of good projection effect, high security and high efficiency.
Drawings
FIG. 1 is a schematic flow chart of a projection method according to the present invention.
FIG. 2 is a graph comparing the performance of the method of the present invention with that of the existing MSDSNE method under IID (Independent and Identically Distributed) conditions.
FIG. 3 is a diagram showing the quantitative validation of the effectiveness of the other-party rejection strategy under NonIID (Non-Independent and Identically Distributed) conditions.
FIG. 4 is a schematic diagram of the qualitative validation of the effectiveness of the other-party rejection strategy on the small_washion data set with 2 clients under NonIID conditions.
FIG. 5 is a schematic method flow diagram of the analysis method of the present invention.
Detailed Description
FIG. 1 is a schematic flow chart of the method of the present invention: the invention provides a multi-party projection method based on data security, which comprises the following steps:
S1. acquiring a server and a set of clients;
S2. the server constructs a global model and an initial global dictionary and issues the global model and the global dictionary to each client; the method specifically comprises the following steps:
A. the server selects a model architecture, generates model parameters, constructs a global model and sends the global model to each client;
B. each client projects respective local data by adopting the received global model to obtain respective local projection results;
C. randomly extracting a part of local projection results of each client from each local projection result and uploading the part of the local projection results to a server;
D. the server constructs an initial global dictionary according to the received projection result;
S3. each client initializes its current local model according to the received global model, and filters the global dictionary to obtain the other-party dictionary; the method specifically comprises the following steps:
a. each client generates a local model according to the received topological structure and parameters of the global model;
b. each client locates, according to its own sequence number, the interval of the received global dictionary that holds its own projection data, and constructs the data outside that interval as the other-party dictionary;
S4. each client trains a new local model according to the local model obtained in step S3 and the other-party dictionary, and uses the new local model as the current local model; the method specifically comprises the following steps:
(1) the client acquires a neighborhood graph of its local data and calculates a weighted graph from it;
(2) sampling data pairs according to the weight values of the edges in the weighted graph obtained in the step (1) and generating a training data set;
(3) projecting each data pair in the training data set by using the local model obtained in the step S3 to obtain a projection pair;
(4) randomly sampling n high-dimensional vectors for each projection pair obtained in the step (3) and setting the n high-dimensional vectors as non-neighbor vectors so as to calculate the projection results of the n high-dimensional vectors;
(5) repeating the step (3) to the step (4) until a set condition is reached, and optimizing parameters of the local model by adopting a cross entropy loss function in the repeating process;
in a specific implementation, the cross entropy loss function Loss(X, Y, D) is adopted, where X is the high-dimensional data, Y is the projection result, and D is the other-party projection dictionary:
[Formula image not reproduced: definition of Loss(X, Y, D)]
The loss contains a hyper-parameter that controls the repulsive force, and a term R(Y, D), the loss term that implements the other-party rejection strategy by introducing a repulsive force between the client's own projection results and the other parties' projection results:
[Formula image not reproduced: definition of R(Y, D)]
R(Y, D) is computed from the low-dimensional similarity between the i-th element Y_i of Y and the k-th element D_k of D:
[Formula image not reproduced: low-dimensional similarity between Y_i and D_k]
Here a and b are the parameters used by the UMAP algorithm when calculating low-dimensional similarity, preferably a = 1.93 and b = 0.79. CE(X, Y) is the difference between the projected distribution and the high-dimensional distribution of the data pairs:
[Formula image not reproduced: definition of CE(X, Y)]
CE(X, Y) is computed from two similarity functions of the UMAP algorithm: one between the i-th and j-th elements of the high-dimensional data X, and one between the i-th and j-th elements of the low-dimensional data Y; log is the natural logarithm (base e);
(6) obtaining a new local model after final optimization, and taking the new local model as a current local model;
S5. each client uses the current local model obtained in step S4 to project its local data to obtain a projection result, and uploads the current local model and a randomly selected part of the projection result to the server; specifically, each client randomly extracts, without repetition, a fixed-length subset of the projection result, and uploads this subset together with the current local model to the server;
S6. the server aggregates the received local models and projection results to obtain a new global model and a new global dictionary, and issues the new global model and the new global dictionary to each client; the method specifically comprises the following steps:
1) the server receives the local models and the projection results uploaded by the clients;
2) the server aggregates the local models of the clients received in step 1) with the federated averaging algorithm, so as to obtain a new global model; specifically, the aggregation uses the following formula:
[Formula image not reproduced: federated averaging of the local model parameters]
In the formula, f(w) denotes the parameters of the aggregated model; n_k is the amount of data owned by the k-th client; n is the total amount of data; K is the number of clients; and the remaining symbol denotes the parameters of the k-th local model. That is, the aggregated parameters are the sum, over the K clients, of each local model's parameters weighted by n_k / n;
3) the server combines the projection results of the clients received in step 1) in client-number order, so as to obtain a new global dictionary;
S7. repeating steps S3-S6 until the set conditions are met, whereupon the server obtains a final projection model;
S8. the server sends the final projection model obtained in step S7 to each client;
S9. each client projects its local data with the received projection model, and uploads the projection result to the server;
S10. the server draws all the received projection results on a scatter diagram to complete the secure multi-party projection based on data security.
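The loss used in step S4 can be explored numerically with a small sketch. Everything below is an assumption-laden illustration rather than the formulas of the publication: the kernel is the standard UMAP low-dimensional similarity with the preferred a = 1.93 and b = 0.79, the cross entropy is the usual UMAP training form (constants that depend only on the high-dimensional similarity are dropped), the repulsion term is assumed to be the negative log of one minus the similarity, and the names `lam`, `eps` and all function names are hypothetical.

```python
import math

A, B = 1.93, 0.79  # preferred UMAP parameters stated in the text

def low_dim_similarity(u, v, a=A, b=B):
    """Standard UMAP kernel: 1 / (1 + a * ||u - v||^(2b))."""
    d2 = sum((x - y) ** 2 for x, y in zip(u, v))
    return 1.0 / (1.0 + a * d2 ** b)

def cross_entropy(pairs, Y, eps=1e-4):
    """pairs holds (i, j, p_ij), with p_ij the high-dimensional similarity."""
    total = 0.0
    for i, j, p in pairs:
        q = low_dim_similarity(Y[i], Y[j])
        total += -p * math.log(q + eps) - (1.0 - p) * math.log(1.0 - q + eps)
    return total

def repulsion_term(Y, D, eps=1e-4):
    """Assumed form of R(Y, D): large when own points sit on other-party points."""
    return -sum(math.log(1.0 - low_dim_similarity(y, d) + eps) for y in Y for d in D)

def loss(pairs, Y, D, lam):
    """Assumed combination: cross entropy plus lam times the repulsion term."""
    return cross_entropy(pairs, Y) + lam * repulsion_term(Y, D)

Y = [[0.0, 0.0], [0.2, 0.0]]
pairs = [(0, 1, 0.8)]
D_near = [[0.1, 0.0]]   # other-party point overlapping the client's projections
D_far = [[5.0, 0.0]]    # other-party point far away
overlap_penalty = repulsion_term(Y, D_near)
separated_penalty = repulsion_term(Y, D_far)
```

With `lam = 0` the function reduces to the plain cross entropy, which corresponds to running the method without the other-party rejection strategy.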
FIG. 2 is a graphical representation of the comparison of the performance of the method of the present invention with the existing MSDSNE method under IID conditions.
The performance of the method of the invention was compared with that of the existing MSDSNE method under IID conditions, as shown in fig. 2. The left graph (fig. 2 (a)) is the experimental result on the mnst_test data set, and the right graph (fig. 2 (b)) is the experimental result on the small_washion data set. umap and pumap (Parametric Uniform Manifold Approximation and Projection) are both centralized projection methods and were used as controls in this experiment. FP is the method of the invention. The MSDSNE method is the prior art mentioned in the background, and the percentages in parentheses indicate the proportion of shared data in the MSDSNE method. In fig. 2, performance is compared using KNN (K-Nearest Neighbors) classification accuracy and neighborhood preservation degree as indexes. As can be seen from the figure, the method of the invention is far superior to the existing MSDSNE method in both KNN classification accuracy and neighborhood preservation degree.
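The two kinds of index used in this comparison can be illustrated generically. The text does not give the exact definitions of the indexes, so the sketch below uses common textbook versions as an assumption: leave-one-out KNN classification accuracy on the projected points, and the average fraction of each point's k nearest high-dimensional neighbors that remain among its k nearest neighbors after projection. All function names are hypothetical.

```python
from collections import Counter

def nearest(points, i, k):
    """Indices of the k nearest neighbors of points[i] (squared Euclidean)."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(points[i], p)), j)
        for j, p in enumerate(points) if j != i
    )
    return [j for _, j in ranked[:k]]

def knn_accuracy(Y, labels, k):
    """Leave-one-out accuracy of a majority vote among k projected neighbors."""
    hits = 0
    for i in range(len(Y)):
        votes = Counter(labels[j] for j in nearest(Y, i, k))
        hits += votes.most_common(1)[0][0] == labels[i]
    return hits / len(Y)

def neighborhood_preservation(X, Y, k):
    """Mean share of high-dimensional k-neighbors kept in the projection."""
    shared = sum(
        len(set(nearest(X, i, k)) & set(nearest(Y, i, k))) for i in range(len(X))
    )
    return shared / (k * len(X))

# Two well-separated clusters; the "projection" simply drops the third coordinate.
X = [[0, 0, 0], [0.1, 0, 0], [0, 0.1, 0], [5, 5, 5], [5.1, 5, 5], [5, 5.1, 5]]
Y = [x[:2] for x in X]
labels = [0, 0, 0, 1, 1, 1]
accuracy = knn_accuracy(Y, labels, k=2)
preservation = neighborhood_preservation(X, Y, k=2)
```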
Fig. 3 is a diagram illustrating the quantitative validation of the effectiveness of the other projection exclusion strategy under the NonIID condition.
The effectiveness of the other-party rejection strategy was assessed quantitatively under NonIID conditions, as shown in fig. 3. The left graph (fig. 3 (a)) is the experimental result on the mnst_test data set, and the right graph (fig. 3 (b)) is the experimental result on the small_washion data set. LR denotes the label_ratio index, which represents KNN classification accuracy; IR denotes the index_ratio index, which represents the neighborhood preservation degree. FP is the method of the invention without the other-party rejection strategy (i.e., with the repulsion hyper-parameter in the loss function set as given in [formula image not reproduced]). FP (R) is the method of the invention with the other-party rejection strategy (i.e., with the repulsion hyper-parameter in the loss function set as given in [formula image not reproduced]). The MSDSNE method was used as a control. The six polylines in fig. 3 therefore correspond to the following: polyline 1 is the LR index of the MSDSNE method; polyline 2 is the LR index of the method of the invention without the other-party rejection strategy; polyline 3 is the LR index of the method of the invention with the other-party rejection strategy; polyline 4 is the IR index of the MSDSNE method; polyline 5 is the IR index of the method of the invention without the other-party rejection strategy; polyline 6 is the IR index of the method of the invention with the other-party rejection strategy. In fig. 3, polyline 3 is higher than polyline 2, which shows, in terms of KNN classification accuracy, that the other-party rejection strategy is effective in solving the projection overlap problem. Polyline 6 is higher than polyline 5, which shows that the neighborhood preservation degree is better maintained once the projection overlap problem is solved.
Fig. 4 is a schematic diagram of the qualitative validation of the effectiveness of the other-party rejection strategy on the small_washion data set with 2 clients under NonIID conditions.
The effectiveness of the other-party rejection strategy was evaluated qualitatively on the small_washion data set with 2 clients under NonIID conditions, as shown in fig. 4. Fig. 4 (a) shows the projection result of the centralized projection method pumap for comparison, where the LR (label_ratio) result is 98.2% and the IR (index_ratio) result is 22.3%. In fig. 4 (b), the method of the present invention without the other-party rejection strategy exhibits the projection overlap problem, with an LR (label_ratio) result of 76.8% and an IR (index_ratio) result of 17.5%. In fig. 4 (c), the method of the present invention with the other-party rejection strategy solves the projection overlap problem, with an LR (label_ratio) result of 100% and an IR (index_ratio) result of 22.5%; this demonstrates the effectiveness of the other-party rejection strategy in solving the projection overlap problem. Fig. 4 (d) shows the projection result of the MSDSNE method, which suffers from severe projection overlap, with an LR (label_ratio) result of 78.6% and an IR (index_ratio) result of 1.6%. This indicates that the MSDSNE method does not solve the projection overlap problem under NonIID conditions.
The difference between the method of the present invention (FP for short) and the SMAP method of the prior art is:
1. in the SMAP method, each data party needs to transmit encrypted data to two central servers, and the two central servers then cooperate to calculate the projection result. In the method of the invention (FP method), the data neither need to be encrypted nor leave the data party's local environment; only the model parameters and the projection results are transmitted to the central server. By contrast, the SMAP method still carries the risk that the encrypted data are cracked: if its two central servers collude, the encrypted data may be cracked. The method of the invention (FP method) carries no such risk of the original data being cracked. Although methods for inferring original data from model parameters currently exist, they are subject to many constraints and their inference results are poor.
The difference between the method (FP for short) of the invention and the MSDSNE method in the prior art is as follows:
1. the MSDSNE method needs to share an additional data set among data parties, and the method (FP method) does not need to share the additional data set; the shared data set does not accord with the current data privacy protection policy;
2. the projection effect of the method (FP method) of the invention is superior to that of the MSDSNE method: firstly, under the IID condition of data, the index of class separation degree or the index of proximity relation keeping degree is superior to the MSDSNE method; secondly, under the condition of data NonIID, the method (called FP for short) of the invention using the projection exclusion strategy of the other party can relieve the problem of projection overlap, so that the two indexes are improved.
FIG. 5 is a schematic flow chart of the analysis method of the present invention: the invention also provides a multi-party production data analysis method comprising the multi-party projection method based on data security, which comprises the following steps:
SA. the headquarters server is used as the server in the above multi-party projection method based on data security, and the data centers of the enterprise's factories are used as the clients in the above multi-party projection method based on data security;
SB. the data centers of the enterprise's factories and the headquarters server perform projection with the above multi-party projection method based on data security;
SC. the enterprise headquarters server draws all the received projection results on a scatter diagram to complete the secure multi-party projection based on data security;
SD. the headquarters personnel analyze the multi-party production data based on the scatter diagram obtained in step SC.
For example, the above multi-party production data analysis method can be used to analyze abnormal production data of an enterprise. More specifically, suppose the headquarters of a certain automobile manufacturing enterprise is at location A and the enterprise has a number of automobile manufacturing plants across the country, each operating independently and producing type-B automobiles for the enterprise. If a headquarters researcher needs to analyze abnormal data in the production process of the type-B automobile so as to optimize the production flow, the headquarters needs to be able to obtain the production process data of each plant.
The conventional method is to collect the production data of each production plant and have the headquarters perform the analysis. However, production data may be obtained by hackers during data transmission, and leaked production data may pose a significant hazard to the company. Instead, the headquarters of the automobile manufacturing enterprise and each production plant can adopt the multi-party production data analysis method provided by the invention: the headquarters server is used as the server, the data center of each production plant is used as a client, and the server and the clients jointly run the multi-party projection method based on data security provided by the invention, so that the headquarters obtains the data projection results of each production plant while ensuring data security and draws the results on a scatter diagram. The headquarters researchers can then analyze the multi-party production data according to the obtained scatter diagram, that is, analyze abnormal data in the production process of each production plant.
The data-security-based multi-party projection method provided by the invention is particularly applicable to the Internet industry and the industrial Internet of Things industry.
In the Internet industry, an Internet company may want to analyze the mobile phone browsing behavior patterns of its users, but a user's mobile phone browsing behavior information is private data and cannot leave the user's mobile phone. The Internet company can then use the method of the invention. In this application scenario, the server of the invention is the company server of the Internet company, and each user's mobile phone is a client. With the help of the Internet company's server, the users cooperatively train the projection model on their local mobile phone browsing behavior information. Finally, the Internet company obtains the projection results of the users' mobile phone browsing behavior, and can analyze the users' browsing behavior patterns from the clustering visible in the projection.
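The text leaves the reading of clusters to the analyst who views the projection. Purely as an illustration of how the clustering in a 2-D projection result might be labelled automatically, the following sketch groups points that are linked by chains of close neighbors. The function name and the `radius` parameter are hypothetical and are not part of the method.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def group_by_proximity(points: Sequence[Point], radius: float) -> List[int]:
    """Give the same label to points linked by a chain of neighbors closer than `radius`."""
    labels = [-1] * len(points)  # -1 means "not visited yet"
    next_label = 0
    for seed in range(len(points)):
        if labels[seed] != -1:
            continue
        labels[seed] = next_label
        stack = [seed]
        while stack:  # flood-fill the connected group that contains `seed`
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] == -1 and dist(points[i], points[j]) < radius:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Usage: two well-separated groups of projected browsing records.
labels = group_by_proximity([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)], radius=0.5)
```

The quadratic neighbor search keeps the sketch short; a spatial index would be the natural replacement for large projections.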
In the industrial Internet of Things scenario, the industrial Internet of Things delivers massive amounts of industrial data to the industrial chain at very high speed, so data-driven machine learning methods are widely applied in industrial manufacturing. In the industrial field, however, data resources cannot be shared among enterprises because of competition or user privacy. For example, a manufacturing company may want to analyze the production data of a product. The traditional approach is to aggregate the production data of multiple plants for analysis. However, the production data may be obtained through hacker attacks during transmission, and leaked production data can seriously damage the company's product marketing strategy. It is therefore very important to analyze the data while protecting the privacy of the enterprise's production data. In this scenario, the server of the invention is the server of the company headquarters, and each local plant data storage center is a client. The plant data storage centers jointly train the projection model with the help of the headquarters server. Finally, headquarters obtains the projection results of the product production data of every plant, and the company can use the abnormal data in the projection results to analyze problems in the production process in each region.
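The method itself ends with the scatter diagram and leaves the identification of abnormal data to headquarters personnel. As a hedged illustration of what such an analysis could look like when automated, the sketch below scores each projected point by its mean distance to its k nearest neighbors and flags points whose score is far above average. The scoring rule, the function name, and the parameters `k` and `z_threshold` are assumptions made for this example only.

```python
from math import dist
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def knn_outlier_flags(points: Sequence[Point], k: int = 3,
                      z_threshold: float = 2.0) -> List[bool]:
    """Flag points whose mean distance to their k nearest neighbors is unusually large."""
    scores = []
    for i, p in enumerate(points):
        neighbor_dists = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(mean(neighbor_dists[:k]))
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:  # all points equally isolated: nothing stands out
        return [False] * len(points)
    return [(s - mu) / sigma > z_threshold for s in scores]


# Usage: a 3 x 3 grid of ordinary records plus one isolated record.
grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
flags = knn_outlier_flags(grid + [(20.0, 20.0)])
```

Because only low-dimensional projection results are examined, this kind of check runs entirely on the data the server already holds after step S10.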

Claims (2)

1. A multi-party projection method based on data security, characterized by comprising the following steps:
S1, acquiring a server and a set of clients;
S2, the server constructs a global model and an initial global dictionary and issues the global model and the global dictionary to each client; constructing the global model and the initial global dictionary specifically comprises the following steps:
A. the server selects a model architecture, generates model parameters, constructs the global model and sends the global model to each client;
B. each client projects its own local data with the received global model to obtain its own local projection result;
C. each client randomly extracts a part of its local projection result and uploads that part to the server;
D. the server constructs the initial global dictionary from the received projection results;
S3, each client initializes its current local model according to the received global model, and filters the received global dictionary to obtain the other-party dictionary; specifically comprising the following steps:
a. each client generates a local model according to the topological structure and parameters of the received global model;
b. each client locates, according to its own client number, the interval of the received global dictionary in which its own projection data lie, and constructs the data outside that interval into the other-party dictionary;
S4, each client trains a new local model from the local model obtained in step S3 and the other-party dictionary, and takes the new local model as the current local model; specifically comprising the following steps:
(1) the client acquires a neighborhood graph of the local data and calculates a weighted graph from it;
(2) sampling data pairs according to the weights of the edges in the weighted graph obtained in step (1) and generating a training data set;
(3) projecting each data pair in the training data set with the local model obtained in step S3 to obtain a projection pair;
(4) for each projection pair obtained in step (3), randomly sampling n high-dimensional vectors, setting them as non-neighbor vectors, and calculating the projection results of the n high-dimensional vectors;
(5) repeating step (3) to step (4) until a set condition is reached, and optimizing the parameters of the local model with a cross-entropy loss function during the repetition; specifically, the following cross-entropy loss function is adopted:
[formula image DEST_PATH_IMAGE002]
in the formula, Loss(X, Y, D) is the cross-entropy loss function; X is the high-dimensional data; Y is the projection result; D is the other-party projection dictionary; [symbol image DEST_PATH_IMAGE004] is a hyper-parameter for controlling the repulsive force; R(Y, D) is the loss term that implements the other-party rejection strategy by introducing a repulsive force between the client's own projection results and the other parties' projection results, and
[formula image DEST_PATH_IMAGE006]
[symbol image DEST_PATH_IMAGE008] is the low-dimensional similarity between the i-th element of Y and the k-th element of D, and
[formula image DEST_PATH_IMAGE010]
a and b are the parameters of the UMAP algorithm used in calculating the low-dimensional similarity, a = 1.93, b = 0.79; Y_i is the i-th element of Y; D_k is the k-th element of D; CE(X, Y) is the difference between the projected distribution and the high-dimensional distribution of the data pairs, and
[formula image DEST_PATH_IMAGE012]
[symbol image DEST_PATH_IMAGE014] is the similarity function, calculated with the UMAP algorithm, between the i-th element and the j-th element of the high-dimensional data X; [symbol image DEST_PATH_IMAGE016] is the similarity function, calculated with the UMAP algorithm, between the i-th element and the j-th element of the low-dimensional data Y; log is the logarithm with base e;
(6) taking the new local model obtained after the final optimization as the current local model;
S5, each client projects its own local data with the current local model obtained in step S4 to obtain a projection result, and uploads the current local model and a randomly selected part of the projection result to the server; specifically, each client projects its own local data with the current local model obtained in step S4 to obtain a projection result, randomly extracts from it, without repetition and according to the length of the projection result, a fixed-length part of the projection result, and uploads this part together with the current local model to the server;
S6, the server aggregates the received local models and projection results into a new global model and a new global dictionary, and issues the new global model and the new global dictionary to each client; specifically comprising the following steps:
1) the server receives the local models and the projection results uploaded by the clients;
2) the server aggregates the local models of the clients received in step 1) with the federated averaging algorithm to obtain a new global model; specifically, the aggregation is performed with the following formula:
[formula image DEST_PATH_IMAGE018]
in the formula, f(w) is the parameters of the aggregated model; n_k is the amount of data owned by the k-th client; n is the total amount of data; [symbol image DEST_PATH_IMAGE020] is the parameters of the k-th local model; K is the number of clients;
3) the server combines the projection results of the clients received in step 1) in the order of the client numbers to obtain a new global dictionary;
S7, repeating steps S3 to S6 until a set condition is met, whereupon the server obtains the final projection model;
S8, the server issues the final projection model obtained in step S7 to each client;
S9, each client projects its own local data with the received projection model and uploads the projection result to the server;
S10, the server draws all received projection results on a scatter diagram, completing the data-security-based multi-party projection.
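The formula images of claim 1 are not reproduced in this text. Assuming they follow the standard UMAP and federated averaging forms that the surrounding definitions describe, they would read roughly as in the block below. The symbol names \lambda (repulsion hyper-parameter), q(Y_i, D_k), p_{ij}, q_{ij} and w_k (parameters of the k-th local model), and in particular the exact form of R(Y, D), are assumptions and are not taken from the original images.

```latex
% Hedged reconstruction; symbol names and the form of R(Y, D) are assumptions.
\begin{align*}
\mathrm{Loss}(X, Y, D) &= CE(X, Y) + \lambda\, R(Y, D) \\
R(Y, D) &= -\sum_{i} \sum_{k} \log\bigl(1 - q(Y_i, D_k)\bigr) \\
q(Y_i, D_k) &= \frac{1}{1 + a\,\lVert Y_i - D_k \rVert_2^{2b}},
  \qquad a = 1.93,\quad b = 0.79 \\
CE(X, Y) &= \sum_{i \neq j} \Bigl[\, p_{ij} \log\frac{p_{ij}}{q_{ij}}
  + (1 - p_{ij}) \log\frac{1 - p_{ij}}{1 - q_{ij}} \Bigr] \\
f(w) &= \sum_{k=1}^{K} \frac{n_k}{n}\, w_k
\end{align*}
```

Under this reading, R(Y, D) grows as a client's projected points Y_i approach the other parties' dictionary points D_k, which is how a repulsive force between the two sets of projection results would arise.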
2. A multi-party production data analysis method comprising the data-security-based multi-party projection method of claim 1, characterized by comprising the following steps:
SA. the headquarters server serves as the server in the data-security-based multi-party projection method, and the data centers of the enterprise's plants serve as the clients in the data-security-based multi-party projection method;
SB. the data centers of the enterprise's plants and the headquarters server perform projection with the data-security-based multi-party projection method;
SC. the enterprise headquarters server draws all received projection results on a scatter diagram, completing the data-security-based multi-party projection;
SD. headquarters personnel analyze the multi-party production data based on the scatter diagram obtained in step SC.
CN202210244755.0A 2022-03-14 2022-03-14 Multi-party projection method based on data security and multi-party production data analysis method Expired - Fee Related CN114329300B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210244755.0A CN114329300B (en) 2022-03-14 2022-03-14 Multi-party projection method based on data security and multi-party production data analysis method

Publications (2)

Publication Number Publication Date
CN114329300A (en) 2022-04-12
CN114329300B (en) 2022-05-20

Family

ID=81033209

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210244755.0A Expired - Fee Related CN114329300B (en) 2022-03-14 2022-03-14 Multi-party projection method based on data security and multi-party production data analysis method

Country Status (1)

Country Link
CN (1) CN114329300B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116049648B (en) * 2022-11-17 2023-08-04 北京东方通科技股份有限公司 Multiparty projection method and multiparty data analysis method based on data security

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9015847B1 (en) * 2014-05-06 2015-04-21 Synack, Inc. Computer system for distributed discovery of vulnerabilities in applications
CN110493256A (en) * 2019-09-04 2019-11-22 深圳供电局有限公司 Data transmission security authentication method and system based on edge calculations and vector projection
CN113850270A (en) * 2021-04-15 2021-12-28 北京大学 Semantic scene completion method and system based on point cloud-voxel aggregation network model
CN114140641A (en) * 2021-11-08 2022-03-04 江苏大学 Image classification-oriented multi-parameter self-adaptive heterogeneous parallel computing method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
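A knowledge graph representation method based on improved vector projection distance (一种基于改进向量投影距离的知识图谱表示方法); Li Xinchao (李鑫超) et al.; 《信息科技》; 2019-12-16; full text *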

Also Published As

Publication number Publication date
CN114329300A (en) 2022-04-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220520