CN116861152A - Tax data security graph neural network training method based on matrix decomposition - Google Patents
Tax data security graph neural network training method based on matrix decomposition
- Publication number
- CN116861152A CN202310795131.2A
- Authority
- CN
- China
- Prior art keywords
- matrix
- tax data
- decomposition
- graph
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/042—Knowledge-based neural networks; Logical representations of neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/10—Tax strategies
Abstract
The invention discloses a tax data security graph neural network training method based on matrix decomposition, comprising the following steps: first, secure eigenvalue decomposition is performed on the adjacency matrix of the tax data graph with the help of an external server, the resulting eigenvalues are divided into several parts, and each part is combined with the eigenvector matrix to generate several distributable adjacency matrices; second, differential privacy is applied to the feature matrix of the tax data graph; third, the tax data owner distributes the decomposed adjacency matrices and the differentially private feature matrix to the computing parties through a parameter server for model training; finally, the computing parties return their calculation results to the tax data owner, and the parameter server integrates and updates them to obtain the target model parameters. By securely decomposing the original tax data through topology secret sharing and eigenvalue decomposition of the adjacency matrix, the method allows tax data to be analyzed and modeled efficiently with external computing resources, improving analysis efficiency.
Description
Technical Field
The invention belongs to the technical field of graph privacy protection methods, and particularly relates to a tax data security graph neural network training method based on matrix decomposition.
Background
In recent years, with the rapid development of the national economy and the continued prosperity of the market economy, tax data has become increasingly complex. Tax data is often represented as graph-structured data that reflects both individual tax information and social relationship information. A graph neural network can therefore model the graph-structured data in tax data effectively and mine the information it contains in depth. Tax data modeling is the basic work of intelligent tax data processing and a key prerequisite for realizing tax big data, but the growing scale of tax data and the large amount of private information it contains hinder its analysis and use. Traditional data protection targets discrete data points and uses technical means to ensure that individual records cannot be identified and exploited; graph-structured data, however, contains not only node information but also rich and important topology information, which traditional means struggle to protect comprehensively. Unlike traditional data protection, current privacy protection research aims to make data "available but invisible", that is, to protect the private information in the data from disclosure without affecting the use of the data. For graph-structured data, privacy protection research focuses on protecting graph topology information to prevent sensitive information from leaking, which secures graph-structured data more strongly than traditional approaches.
In existing tax data modeling, the huge scale of tax data and the limited computing capacity of tax authorities mean that modeling tasks carried out by the tax authorities alone are often inefficient, so external computing resources are needed to improve efficiency. However, tax data contains a large amount of sensitive information whose exposure would have serious consequences, and it may not be processed directly with the computing power of an external institution. The tax data must therefore be processed securely to avoid disclosing private information, and the processed data must still support correct modeling. As the volume of taxpayer data grows and its content becomes more complex, how to escape the limits of local computation while guaranteeing tax data security, and how to train a graph neural network model on tax data efficiently with external computing power, has become an urgent problem; solving it is important for accelerating tax data processing and realizing tax big data.
At present, no related research provides a solution for privacy-preserving graph neural network training on tax data. Existing invention patents on tax data protection mainly include the following:
Document 1: Tax information processing method and system based on blockchain (202011290032.1)
Document 2: Enterprise batch clustering method and system based on multidimensional features (202211142876.0)
Document 1 discloses a blockchain-based tax information processing method and system in which the tax agency acts as the tax node of the blockchain and manages the chain. Channels are divided by business agency, the tax node is linked with the corresponding business agency nodes in each channel, and, subject to user authorization, the tax node broadcasts tax certificate information to the business agency nodes in the corresponding channel so that they can obtain it.
Document 2 discloses an enterprise batch clustering method and system based on multidimensional features. Tax data, news data, and public opinion data are collected for a number of target enterprises to be clustered in the tax field; the collected data is analyzed to generate feature data; a graph structure is constructed from the feature data; and the graph structure is fed to an optimal graph neural network clustering model to obtain the clustering result for the target enterprises.
In the above technical schemes, Document 1 focuses on the storage protection of tax data and uses blockchain technology to keep the data safe, but it does not consider how the protected data is to be used, and querying and using the data is inefficient. Document 2 models the tax data as a graph, on the premise that the tax data has already been collected, and analyzes it with a graph neural network. In practice, however, tax data processing is inefficient because of the limited computing power of tax institutions, and because the data is sensitive it cannot be handed directly to the computing power of an external institution for analysis. How to train a graph neural network model on tax data efficiently while ensuring the security of the tax data has therefore become a problem to be solved.
Disclosure of Invention
The invention aims to provide a tax data security graph neural network training method based on matrix decomposition. First, secure eigenvalue decomposition is performed on the adjacency matrix of the tax data graph with the help of an external server, the resulting eigenvalues are divided into several parts, and each part is combined with the eigenvector matrix to generate several distributable adjacency matrices; second, differential privacy is applied to the feature matrix of the tax data graph; third, the tax data owner distributes the decomposed adjacency matrices and the differentially private feature matrix to the computing parties through a parameter server for model training; finally, the computing parties return their calculation results to the tax data owner, and the parameter server integrates and updates them to obtain the target model parameters.
In order to achieve the above purpose, the invention adopts the following technical scheme:
A tax data security graph neural network training method based on matrix decomposition comprises the following steps:
first, secure eigenvalue decomposition is performed on the adjacency matrix of the tax data graph with the help of an external server, the resulting eigenvalues are divided into several parts, and each part is combined with the eigenvector matrix to generate several distributable partial secrets of the adjacency matrix; second, differential privacy is applied to the feature matrix of the tax data graph; third, the tax data owner distributes the partial secrets of the decomposed adjacency matrix and the differentially private feature matrix to the computing parties through a parameter server for model training; finally, the computing parties return their calculation results to the tax data owner, and the parameter server integrates and updates them to obtain the target model parameters.
The invention is further improved in that the method specifically comprises the following steps:
1) Adjacency matrix secret sharing based on eigenvalue decomposition
Secure eigenvalue decomposition is performed on the adjacency matrix of the tax data graph with the help of an external server; the eigenvalues are randomly divided into equal parts according to the number of computing parties, and the result of combining each part with the eigenvector matrix is a distributable partial secret of the adjacency matrix;
2) Feature matrix protection based on differential privacy
The feature matrix of the tax data graph is protected by differential privacy using the Laplace mechanism;
3) Model training and integration based on parameter server
The partial secrets of the decomposed adjacency matrix and the differentially private feature matrix are distributed to the computing parties; each computing party trains a graph convolutional neural network model on the data distributed to it, and the target model parameters are obtained by sending, collecting, and integrating model parameters through the parameter server.
The invention is further improved in that, in step 1), the adjacency matrix secret sharing based on eigenvalue decomposition comprises:
Step1: Secure matrix eigenvalue decomposition
For the adjacency matrix $A$ of the tax data graph, a sufficiently accurate numerical eigenvalue decomposition is obtained through repeated QR decomposition:
$$A_t = Q_t R_t, \qquad A_{t+1} = R_t Q_t$$
where $t$ is the iteration round and $Q_t$, $R_t$ are the QR decomposition results of $A_t$ in round $t$; after $k$ iterations, the eigenvalue diagonal matrix is $\Lambda = A_k$, the eigenvector matrix is $X = Q_1 \cdots Q_k$, and the original adjacency matrix is $A = X \Lambda X^{-1}$;
Step2: Topological secret sharing
The eigenvalues in the diagonal matrix $\Lambda$ are randomly divided into several groups, each held as a diagonal matrix. When dividing into two groups, the steps are as follows:
a random diagonal 0-1 matrix $S$ is generated, whose diagonal elements $S_{ii} \in \{0, 1\}$ are chosen at random and whose off-diagonal elements are all 0;
new diagonal matrices $\Lambda_1$, $\Lambda_2$ are generated as follows:
$$\Lambda_1 = S \times_h \Lambda, \qquad \Lambda_2 = (I_n - S) \times_h \Lambda$$
where $I_n$ is the $n$-dimensional identity matrix and $\times_h$ denotes the Hadamard product, i.e. element-wise multiplication of matrices;
using the newly generated diagonal matrices $\Lambda_1$, $\Lambda_2$, the new matrices $A_1$, $A_2$ are generated as follows:
$$A_1 = X \Lambda_1 X^{-1}, \qquad A_2 = X \Lambda_2 X^{-1}$$
By construction, $A_1$ and $A_2$ have the following properties, for any positive integer $m$:
$$A_1 + A_2 = A, \qquad A_1^m + A_2^m = A^m$$
In the GNN model, the graph topology is represented as an adjacency matrix, and powers of the adjacency matrix reflect the information transfer process of the GNN model.
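The two-group split described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `split_adjacency`, the 4-node toy graph, and the use of `numpy.linalg.eigh` (applicable because an undirected graph has a symmetric adjacency matrix, so that $X^{-1} = X^T$) are all introduced here for illustration, and the eigenvalues are assigned to parties by a random equal-sized partition.

```python
import numpy as np

def split_adjacency(A, num_parties=2, seed=0):
    """Split a symmetric adjacency matrix into eigenvalue-based shares (illustrative)."""
    rng = np.random.default_rng(seed)
    # Assumption: A is symmetric, so eigh returns an orthogonal eigenvector matrix X.
    eigvals, X = np.linalg.eigh(A)
    n = len(eigvals)
    # Randomly assign each eigenvalue index to exactly one party, in near-equal groups.
    groups = np.array_split(rng.permutation(n), num_parties)
    shares = []
    for idx in groups:
        mask = np.zeros(n)           # diagonal of the 0-1 selection matrix
        mask[idx] = 1.0
        lam_part = np.diag(mask * eigvals)
        shares.append(X @ lam_part @ X.T)   # X^{-1} = X^T for orthogonal X
    return shares

# Hypothetical toy graph: a triangle (nodes 0, 1, 2) with a pendant node 3.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A1, A2 = split_adjacency(A)
```

In this sketch the shares sum back to `A`, their squares sum to `A @ A`, and neither share is a 0-1 matrix on its own.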
The invention is further improved in that, in Step1 of step 1), a third-party server is used to decompose the adjacency matrix $A$ securely: the data owner generates a sparse random 0-1 matrix $P$, computes $A' = P A P^{-1}$, and uploads it to the third-party server; the third-party server computes the eigenvalue decomposition of $A'$ by the iterative procedure above and returns the results $X'$, $\Lambda$ to the data owner, where $A' = X' \Lambda X'^{-1}$; the data owner then computes $X = P^{-1} X'$ to obtain the matrix decomposition result.
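The masking round trip can be checked numerically with a small sketch. The source does not say how the sparse random 0-1 matrix $P$ is generated; a random permutation matrix is assumed here because it is sparse, 0-1 valued, and always invertible. `numpy.linalg.eigh` stands in for the server's iterative solver, and the toy graph is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy adjacency matrix held by the data owner.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
n = A.shape[0]

# Data owner: assumed mask P is a random permutation matrix, so P^{-1} = P^T.
P = np.eye(n)[rng.permutation(n)]
P_inv = P.T
A_masked = P @ A @ P_inv                 # A' = P A P^{-1}, uploaded to the server

# Third-party server: eigenvalue decomposition of the masked matrix only.
eigvals, X_masked = np.linalg.eigh(A_masked)   # A' = X' Lambda X'^{-1}

# Data owner: undo the mask on the eigenvector matrix, X = P^{-1} X'.
X = P_inv @ X_masked
A_rebuilt = X @ np.diag(eigvals) @ np.linalg.inv(X)
```

Because $A'$ is similar to $A$, the eigenvalues returned by the server are those of $A$, and only the eigenvector matrix needs to be unmasked.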
A further improvement of the invention is that, in Step2 of step 1), for a two-layer GCN, node embeddings are affected by neighbors within two hops; the neighbors within two hops are represented by the square $A^2$ of the adjacency matrix, which effectively indicates the connection relations of the graph and the information transfer between nodes. Let the number of nodes be $n$ and let the original adjacency matrix be decomposed into $k$ matrices; each decomposed matrix then contains $n/k$ eigenvalues and lacks $n - n/k$ eigenvalues. Even on the premise that all eigenvalues are obtained, the probability of arranging them correctly is
$$p = \frac{1}{(n - n/k)!}$$
A further improvement of the invention is that, when $n = 100$ and $k = 2$, $p \approx 3.3 \times 10^{-65}$.
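The quoted figure can be reproduced with a few lines of Python, assuming the probability takes the factorial form $p = 1/(n - n/k)!$, i.e. one correct ordering among all orderings of the missing eigenvalues. That form is an assumption made here; it is chosen because it matches the stated value for $n = 100$, $k = 2$. The function name is hypothetical.

```python
from math import factorial

def arrangement_probability(n, k):
    """Chance of guessing the order of the eigenvalues a party does not hold.

    Assumes each party holds n // k eigenvalues and that every ordering of the
    remaining n - n // k eigenvalues is equally likely.
    """
    missing = n - n // k
    return 1 / factorial(missing)

p = arrangement_probability(100, 2)   # 1 / 50!
```

For $n = 100$ and $k = 2$ this evaluates $1/50!$, which is on the order of $10^{-65}$.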
The invention is further improved in that, in step 2), the feature matrix protection based on differential privacy comprises:
Step1: Privacy budget and global sensitivity calculation
The Laplace mechanism is applied to protect the feature matrix $X$ of the tax data graph with differential privacy, and the global sensitivity $\Delta_f$ is calculated for the set privacy budget $\epsilon$:
$$\Delta_f = \max_{D, D'} \lvert h - h' \rvert$$
where $D$, $D'$ are a pair of adjacent datasets and $h$, $h'$ are the results of a random query on $D$ and $D'$ respectively; taking the noise scale to be $\Delta_f / \epsilon$, the Laplace noise to be added follows the distribution
$$\mathrm{Lap}\left(x \,\middle|\, \frac{\Delta_f}{\epsilon}\right) = \frac{\epsilon}{2 \Delta_f} \exp\left(-\frac{\epsilon \lvert x \rvert}{\Delta_f}\right)$$
The Laplace mechanism above satisfies $\epsilon$-differential privacy, namely:
$$\Pr[M(D) = y] \le e^{\epsilon} \Pr[M(D') = y]$$
where $M$ is the applied processing mechanism;
Step2: Noise injection
The Laplace noise generated in the previous step is added to the feature matrix $X$ of the tax data graph to obtain the privacy-protected feature matrix $X'$.
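The noise injection step can be sketched with NumPy's Laplace sampler. This is an illustrative sketch rather than the patent's code: the function name `laplace_protect`, the toy feature matrix, and the sensitivity value of 1.0 are assumptions, and the noise scale is taken to be $\Delta_f / \epsilon$ as in the standard Laplace mechanism.

```python
import numpy as np

def laplace_protect(X, epsilon, sensitivity, seed=0):
    """Add i.i.d. Laplace noise with scale sensitivity / epsilon to every entry of X."""
    rng = np.random.default_rng(seed)
    scale = sensitivity / epsilon          # larger budget epsilon means less noise
    noise = rng.laplace(loc=0.0, scale=scale, size=X.shape)
    return X + noise

# Hypothetical 3-node, 2-feature matrix; the sensitivity of 1.0 is an assumed value.
X = np.array([[1.0, 0.0],
              [0.5, 2.0],
              [3.0, 1.5]])
X_priv = laplace_protect(X, epsilon=10.0, sensitivity=1.0)
```

The seed is fixed only to make the sketch reproducible; a real deployment would draw fresh noise.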
The invention is further improved in that, in step 3), the model training and integration based on the parameter server comprises:
Step1: Data distribution
The adjacency matrix $A$ of the tax data graph is decomposed into $\{A_k\}$, $k = 1, 2, \ldots$, and the feature matrix $X$ of the tax data graph is processed with differential privacy to obtain $X'$; the data owner provides the privacy-protected data to the computing parties, and each party obtains $A_k$ and $X'$ as input to the GNN model;
Step2: Model training based on privacy-preserving data
A graph convolutional neural network model is selected for training. Computing party $k$ holds a two-layer GCN model locally and trains it on the data distributed to it; the first-layer inputs are the node feature matrix $X'$ and the adjacency matrix $A_k$, and after information transmission and aggregation the layer outputs the node hidden feature matrix:
$$H_{k,1} = f(A_k X' W_{k,1})$$
The second-layer inputs are the first-layer output $H_{k,1}$ and the adjacency matrix $A_k$, and the layer outputs the node hidden feature matrix $H_{k,2}$, used for node classification or other downstream tasks:
$$H_{k,2} = f(A_k H_{k,1} W_{k,2})$$
After training, computing party $k$ uploads the model parameters $W_{k,1}$, $W_{k,2}$ to the parameter server held by the data owner and pulls the updated model parameters from the parameter server; after collecting the model parameters uploaded by each participant, the parameter server integrates them by model averaging, as in distributed machine learning, to obtain the new model parameters.
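The two-layer forward pass $H_{k,1} = f(A_k X' W_{k,1})$, $H_{k,2} = f(A_k H_{k,1} W_{k,2})$ can be sketched as follows. The source does not name the activation $f$, so ReLU is assumed here; the function names, the random toy inputs, and the layer sizes are likewise hypothetical, and no training loop is shown.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def gcn_forward(A_k, X_priv, W1, W2, f=relu):
    """Forward pass of a two-layer GCN on one party's share A_k and features X'."""
    H1 = f(A_k @ X_priv @ W1)    # first layer: aggregate neighbours, then transform
    H2 = f(A_k @ H1 @ W2)        # second layer: reaches two-hop information
    return H1, H2

# Hypothetical toy sizes: 5 nodes, 8 features, 4 hidden units, 6 labels.
rng = np.random.default_rng(0)
n, d, hidden, classes = 5, 8, 4, 6
A_k = rng.normal(size=(n, n))            # stands in for one adjacency share
X_priv = rng.normal(size=(n, d))         # stands in for the noised feature matrix
W1 = rng.normal(size=(d, hidden))
W2 = rng.normal(size=(hidden, classes))
H1, H2 = gcn_forward(A_k, X_priv, W1, W2)
```

The shapes follow the matrix products: `H1` has one hidden vector per node and `H2` has one row of label scores per node.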
The invention has at least the following beneficial technical effects:
(1) The invention applies privacy protection separately to the adjacency matrix and the feature matrix of the tax data graph: topology secret sharing and eigenvalue decomposition of the adjacency matrix keep the graph topology from being learned by the computing parties, and differential privacy protects the node feature information.
(2) By securely decomposing the original tax data through topology secret sharing and eigenvalue decomposition of the adjacency matrix, the method allows tax data to be analyzed and modeled efficiently with external computing resources, improving analysis efficiency.
Drawings
Fig. 1 is a flow chart of an overall framework.
FIG. 2 is a flow chart of secret sharing of an adjacency matrix based on eigenvalue decomposition.
FIG. 3 is a flow chart of model training and integration based on a parameter server.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art. It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other. The invention will be described in detail below with reference to the drawings in connection with embodiments.
As shown in fig. 1, the tax data security graph neural network training method based on matrix decomposition provided by the invention comprises the following steps:
1) Adjacency matrix secret sharing based on eigenvalue decomposition
Secure eigenvalue decomposition is performed on the adjacency matrix of the tax data graph with the help of an external server; the eigenvalues are randomly divided into equal parts according to the number of computing parties, and the result of combining each part with the eigenvector matrix is a distributable partial secret of the adjacency matrix. The adjacency matrix secret sharing based on eigenvalue decomposition comprises:
Step1: Secure matrix eigenvalue decomposition
For the adjacency matrix $A$ of the tax data graph, a sufficiently accurate numerical eigenvalue decomposition is obtained through repeated QR decomposition:
$$A_t = Q_t R_t, \qquad A_{t+1} = R_t Q_t$$
where $t$ is the iteration round and $Q_t$, $R_t$ are the QR decomposition results of $A_t$ in round $t$; after $k$ iterations, the eigenvalue diagonal matrix is $\Lambda = A_k$, the eigenvector matrix is $X = Q_1 \cdots Q_k$, and the original adjacency matrix is $A = X \Lambda X^{-1}$;
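A minimal sketch of the iterated QR decomposition follows, assuming the standard unshifted update $A_t = Q_t R_t$, $A_{t+1} = R_t Q_t$. The function name, iteration count, and toy graph are hypothetical. The unshifted iteration is only guaranteed to converge to a diagonal matrix under conditions such as a symmetric input whose eigenvalues have distinct magnitudes, which holds for the toy graph used here but not for every adjacency matrix.

```python
import numpy as np

def qr_eigendecomposition(A, num_iters=500):
    """Unshifted QR iteration: factor A_t = Q_t R_t, then set A_{t+1} = R_t Q_t."""
    A_t = np.array(A, dtype=float)
    X = np.eye(A_t.shape[0])           # accumulates the product Q_1 Q_2 ... Q_t
    for _ in range(num_iters):
        Q, R = np.linalg.qr(A_t)       # factor the current iterate
        A_t = R @ Q                    # similar to A_t, so eigenvalues are preserved
        X = X @ Q
    eigenvalues = np.diag(A_t).copy()  # the diagonal approximates the eigenvalues
    return eigenvalues, X

# Hypothetical toy graph: a triangle (nodes 0, 1, 2) with a pendant node 3.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
eigenvalues, X = qr_eigendecomposition(A)
A_rebuilt = X @ np.diag(eigenvalues) @ X.T   # X is orthogonal, so X^{-1} = X^T
```

Production solvers use shifted and deflated variants of this iteration; the plain loop is shown only because it mirrors the update written in the text.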
Step2: topological secret sharing
For the obtained eigenvalue diagonal matrix Λ, randomly dividing eigenvalues into a plurality of groups in the form of a plurality of diagonal matrices, and when dividing into two groups, specifically comprising the following steps:
generating a random diagonal 01 matrix S, wherein the diagonal elements obey the following rules:
generating a new diagonal matrix Λ 1 、Λ 2 The method comprises the following steps:
wherein In represents an n-dimensional identity matrix, x h Representing the Hadamard product, i.e. the multiplication of the matrix corresponding elements;
using a newly generated diagonal matrix Λ 1 、Λ 2 The new matrices A1, A2 are generated as follows:
A 1 、A 2 has the following properties:
In the GNN model, the graph topology is represented as an adjacency matrix, and powers of the adjacency matrix reflect the information transfer process of the GNN model. A third-party server is used to decompose the adjacency matrix $A$ securely: the data owner generates a sparse random 0-1 matrix $P$, computes $A' = P A P^{-1}$, and uploads it to the third-party server; the third-party server computes the eigenvalue decomposition of $A'$ by the iterative procedure above and returns the results $X'$, $\Lambda$ to the data owner, where $A' = X' \Lambda X'^{-1}$; the data owner then computes $X = P^{-1} X'$ to obtain the matrix decomposition result.
For a two-layer GCN, node embeddings are affected by neighbors within two hops; the neighbors within two hops are represented by the square $A^2$ of the adjacency matrix, which effectively indicates the connection relations of the graph and the information transfer between nodes. Because of the properties of $A_1$ and $A_2$, the shares can be distributed to different computing parties to compute the powers, and the power of the original matrix is recovered by integrating their results.
From the computing party's point of view, on the other hand, the decomposed $A_1$, $A_2$ differ greatly from the original 0-1 adjacency matrix: they contain a large number of fractional values, and the existence of any edge cannot be recognized from them. A computing party can infer only part of the eigenvalues of the original adjacency matrix from the matrix allocated to it, which is insufficient to recover the original adjacency matrix. Even if a computing party obtained all the eigenvalues, it would still have to arrange them in the correct order to restore the original adjacency matrix, and the probability of a correct arrangement is very small.
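A short derivation shows why powers can be computed share by share. It assumes the two-group construction $\Lambda_1 = S \times_h \Lambda$, $\Lambda_2 = (I_n - S) \times_h \Lambda$ with $A_i = X \Lambda_i X^{-1}$, and uses the fact that a 0-1 diagonal $S$ gives $\Lambda_1 \Lambda_2 = 0$:

```latex
\begin{aligned}
A_1 + A_2 &= X(\Lambda_1 + \Lambda_2)X^{-1} = X \Lambda X^{-1} = A, \\
A_1 A_2 &= X \Lambda_1 X^{-1} X \Lambda_2 X^{-1} = X (\Lambda_1 \Lambda_2) X^{-1} = 0,
\qquad A_2 A_1 = 0 \text{ likewise}, \\
A^m &= (A_1 + A_2)^m = A_1^m + A_2^m \quad (m \ge 1),
\end{aligned}
```

since every mixed term in the expansion of $(A_1 + A_2)^m$ contains a factor $A_1 A_2$ or $A_2 A_1$. For $m = 2$ this gives $A^2 = A_1^2 + A_2^2$, the two-hop quantity discussed above.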
Let the number of nodes be $n$ and let the original adjacency matrix be decomposed into $k$ matrices; each decomposed matrix then contains $n/k$ eigenvalues and lacks $n - n/k$ eigenvalues. Even on the premise that all eigenvalues are obtained, the probability of arranging them correctly is $p = \frac{1}{(n - n/k)!}$. When not all eigenvalues are obtained, the probability that a computing party recovers the original adjacency matrix is much smaller than the above probability $p$.
2) Feature matrix protection based on differential privacy
The feature matrix of the tax data graph is protected by differential privacy using the Laplace mechanism. Feature matrix protection based on differential privacy comprises:
Step1: Privacy budget and global sensitivity calculation
The Laplace mechanism is applied to protect the feature matrix $X$ of the tax data graph with differential privacy, and the global sensitivity $\Delta_f$ is calculated for the set privacy budget $\epsilon$:
$$\Delta_f = \max_{D, D'} \lvert h - h' \rvert$$
where $D$, $D'$ are a pair of adjacent datasets and $h$, $h'$ are the results of a random query on $D$ and $D'$ respectively; taking the noise scale to be $\Delta_f / \epsilon$, the Laplace noise to be added follows the distribution
$$\mathrm{Lap}\left(x \,\middle|\, \frac{\Delta_f}{\epsilon}\right) = \frac{\epsilon}{2 \Delta_f} \exp\left(-\frac{\epsilon \lvert x \rvert}{\Delta_f}\right)$$
The Laplace mechanism above satisfies $\epsilon$-differential privacy, namely:
$$\Pr[M(D) = y] \le e^{\epsilon} \Pr[M(D') = y]$$
where $M$ is the applied processing mechanism;
Step2: Noise injection
The Laplace noise generated in the previous step is added to the feature matrix $X$ of the tax data graph to obtain the privacy-protected feature matrix $X'$.
3) Model training and integration based on parameter server
The partial secrets of the decomposed adjacency matrix and the differentially private feature matrix are distributed to the computing parties; each computing party trains a graph convolutional neural network model on the data distributed to it, and the target model parameters are obtained by sending, collecting, and integrating model parameters through the parameter server. As shown in fig. 3, the model training and integration based on the parameter server comprises:
Step1: Data distribution
The adjacency matrix $A$ of the tax data graph is decomposed into $\{A_k\}$, $k = 1, 2, \ldots$, and the feature matrix $X$ of the tax data graph is processed with differential privacy to obtain $X'$; the data owner provides the privacy-protected data to the computing parties, and each party obtains $A_k$ and $X'$ as input to the GNN model;
Step2: Model training based on privacy-preserving data
A graph convolutional neural network model is selected for training. Computing party $k$ holds a two-layer GCN model locally and trains it on the data distributed to it; the first-layer inputs are the node feature matrix $X'$ and the adjacency matrix $A_k$, and after information transmission and aggregation the layer outputs the node hidden feature matrix:
$$H_{k,1} = f(A_k X' W_{k,1})$$
The second-layer inputs are the first-layer output $H_{k,1}$ and the adjacency matrix $A_k$, and the layer outputs the node hidden feature matrix $H_{k,2}$, used for node classification or other downstream tasks:
$$H_{k,2} = f(A_k H_{k,1} W_{k,2})$$
After training, computing party $k$ uploads the model parameters $W_{k,1}$, $W_{k,2}$ to the parameter server held by the data owner and pulls the updated model parameters from the parameter server; after collecting the model parameters uploaded by each participant, the parameter server integrates them by model averaging, as in distributed machine learning, to obtain the new model parameters.
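The model-averaging step on the parameter server can be sketched as a plain element-wise mean over the parameters uploaded by the computing parties. The function name, the dictionary layout, and the unweighted mean are assumptions made for illustration; the source only states that model averaging is used.

```python
import numpy as np

def average_parameters(party_params):
    """Element-wise mean of each named weight matrix across all computing parties."""
    names = party_params[0].keys()
    return {name: np.mean([p[name] for p in party_params], axis=0)
            for name in names}

# Hypothetical uploads from two computing parties (tiny shapes for readability).
party_1 = {"W1": np.ones((2, 3)), "W2": np.zeros((3, 2))}
party_2 = {"W1": 3 * np.ones((2, 3)), "W2": 2 * np.ones((3, 2))}
new_params = average_parameters([party_1, party_2])
```

Each party would then pull `new_params` back as the starting point for its next round of local training.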
Examples
A local tax data graph from the national tax records of a certain region for 2017 to 2019 is selected; it contains 2786 nodes and 5728 edges, with a node feature dimension of 1289 and a label dimension of 6. The invention is described in further detail below with reference to the accompanying drawings, experimental examples, and embodiments. All techniques implemented on the basis of this disclosure fall within the scope of the invention.
As shown in fig. 1, in this implementation of the invention, the tax data privacy-preserving graph neural network training method based on matrix decomposition and differential privacy comprises the following steps:
step 1. Secret sharing of adjacency matrix based on eigenvalue decomposition
The tax data graph contains a large number of adjacency matrixes, and the adjacency matrixes can be effectively prevented from being presumed by a calculator through a secret sharing mode. The implementation process of secret sharing of the adjacency matrix is shown in fig. 2, and specifically includes the following steps:
S101, eigenvalue decomposition of the adjacency matrix
The data owner first generates locally a random 0-1 matrix P of size 2786×2786, then computes the masked adjacency matrix $A' = P A P^{-1}$ and uploads it to a third-party server. The third-party server performs eigenvalue decomposition on the uploaded masked adjacency matrix $A'$ and returns the decomposition result $X'$, $\Lambda$ to the data owner. The data owner then processes $X'$, $\Lambda$ locally to obtain the decomposition of the tax data graph adjacency matrix, $X = P^{-1} X'$ and $\Lambda$, so that $A = X \Lambda X^{-1}$.
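The masking round trip of S101 can be illustrated on a toy graph. The sketch below makes several assumptions that are not in the source: a random permutation matrix stands in for the random 0-1 matrix P (a permutation matrix is one 0-1 matrix that is guaranteed to be invertible), the toy adjacency matrix is symmetric so that `np.linalg.eigh` applies, and the unmasking uses $X = P^{-1} X'$, the form for which $A = X \Lambda X^{-1}$ holds.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6  # toy size; the embodiment uses 2786

# Toy symmetric adjacency matrix (undirected graph, no self-loops).
upper = np.triu(rng.integers(0, 2, size=(n, n)), k=1)
A = (upper + upper.T).astype(float)

# Data owner: mask A with an invertible 0-1 matrix P.
# Assumption: a random permutation matrix stands in for P.
P = np.eye(n)[rng.permutation(n)]
P_inv = np.linalg.inv(P)
A_masked = P @ A @ P_inv

# Third-party server: sees only the masked matrix and decomposes it.
# eigh is valid here because A_masked stays symmetric for a permutation P.
eigvals, X_masked = np.linalg.eigh(A_masked)
Lam = np.diag(eigvals)

# Data owner: unmask the eigenvector matrix and rebuild A as a check.
X = P_inv @ X_masked
A_rebuilt = X @ Lam @ np.linalg.inv(X)
```

For a general (non-symmetric or non-permutation) choice of P, `np.linalg.eig` would be needed in place of `eigh`.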
S102, topology secret sharing
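In this embodiment there are two computing parties, so $\Lambda$ is split by means of a randomly generated diagonal 0-1 matrix S of size 2786×2786 into $\Lambda_1 = S \times_h \Lambda$ and $\Lambda_2 = (I_{2786} - S) \times_h \Lambda$, where $\times_h$ is the Hadamard product, which yields the new matrices $A_1 = X \Lambda_1 X^{-1}$ and $A_2 = X \Lambda_2 X^{-1}$. $A_1$ and $A_2$ are sent to the two computing parties respectively.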
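A small NumPy sketch of this two-party split follows. The toy graph, the seed, and the use of `np.linalg.eigh` to obtain X and Λ directly (skipping the masked third-party step) are assumptions made to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6  # toy size; the embodiment uses 2786

# Toy symmetric adjacency matrix and its eigendecomposition A = X Lam X^{-1}.
upper = np.triu(rng.integers(0, 2, size=(n, n)), k=1)
A = (upper + upper.T).astype(float)
eigvals, X = np.linalg.eigh(A)
Lam = np.diag(eigvals)

# Random diagonal 0-1 matrix S: each eigenvalue is assigned to one party.
S = np.diag(rng.integers(0, 2, size=n).astype(float))
I_n = np.eye(n)

# Hadamard (element-wise) products select the two eigenvalue groups.
Lam1 = S * Lam
Lam2 = (I_n - S) * Lam

# Partial secrets sent to computing parties 1 and 2.
X_inv = np.linalg.inv(X)
A1 = X @ Lam1 @ X_inv
A2 = X @ Lam2 @ X_inv
```

In this sketch the two shares sum back to the original matrix, and neither share alone contains all of the eigenvalues.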
Step 2, feature matrix protection based on differential privacy
The feature matrix can be effectively protected simply by utilizing differential privacy.
Specifically, in this embodiment the privacy budget is set to $\epsilon = 10$; the corresponding Laplace noise is computed and added to the feature matrix X to obtain the privacy-protected feature matrix X′.
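The noise injection can be sketched as follows. The noise scale `sensitivity / epsilon` is the usual calibration of the Laplace mechanism and is an assumption here, as are the helper name `add_laplace_noise`, the sensitivity value of 1.0, and the toy feature matrix; the source states only the privacy budget of 10.

```python
import numpy as np

def add_laplace_noise(X, epsilon, sensitivity, rng):
    """Return X plus i.i.d. Laplace noise with scale sensitivity / epsilon."""
    scale = sensitivity / epsilon  # assumed standard calibration
    return X + rng.laplace(loc=0.0, scale=scale, size=X.shape)

rng = np.random.default_rng(42)
X = rng.random((5, 4))  # toy feature matrix (hypothetical values)
X_priv = add_laplace_noise(X, epsilon=10.0, sensitivity=1.0, rng=rng)
```

A larger privacy budget gives a smaller noise scale, so X′ stays closer to X at the cost of weaker protection.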
Step 3, model training and integration based on parameter server
With the secret-shared adjacency matrix and the differentially private feature matrix, a computing party cannot infer information in the original tax data graph during computation, and correct training of the model is then completed with the aid of the parameter server.
Specifically, in this embodiment computing party k trains a two-layer GCN model whose parameters are denoted $W_{k,1}$, $W_{k,2}$. Computing party k trains the model on its assigned data $A_k$ and $X'$: the first layer takes the node feature matrix $X'$ and the adjacency matrix $A_k$ as input and, after message passing and aggregation, outputs the node hidden feature matrix:
$H_{k,1} = f(A_k X' W_{k,1})$
The second layer takes the first-layer output $H_{k,1}$ and the adjacency matrix $A_k$ as input and outputs the node hidden feature matrix $H_{k,2}$, which may be used for node classification or other downstream tasks:
$H_{k,2} = f(A_k H_{k,1} W_{k,2})$
After training, $W_{k,1}$ and $W_{k,2}$ are uploaded to the parameter server. Once both computing parties have uploaded, the parameter server integrates the model parameters by model averaging and returns the updated parameters $W_1$, $W_2$ for the next iteration; the final model parameters are obtained after 100 iterations. The final model reaches an accuracy of 81.6% on the original tax data graph, against 84.1% for a model trained directly on the tax data graph, a drop of only 2.5 percentage points, while external computing power greatly speeds up modeling.
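The forward pass that each computing party evaluates can be written compactly. In the sketch below, ReLU is assumed for the activation f (the source does not name it), and the random matrices are stand-ins for one party's share $A_k$, the noised features $X'$, and the layer weights; only the label dimension of 6 is taken from the embodiment.

```python
import numpy as np

def relu(z):
    """Element-wise ReLU, assumed here as the activation f."""
    return np.maximum(z, 0.0)

def gcn_forward(A_k, X_priv, W_k1, W_k2, f=relu):
    """Two-layer forward pass: H1 = f(A_k X' W_k1), H2 = f(A_k H1 W_k2)."""
    H1 = f(A_k @ X_priv @ W_k1)
    H2 = f(A_k @ H1 @ W_k2)
    return H1, H2

rng = np.random.default_rng(0)
n, d, hidden, classes = 8, 5, 4, 6   # toy sizes; 6 matches the label dimension
A_k = rng.normal(size=(n, n))        # stand-in for one party's share A_k
X_priv = rng.normal(size=(n, d))     # stand-in for the noised features X'
W_k1 = rng.normal(size=(d, hidden))
W_k2 = rng.normal(size=(hidden, classes))
H1, H2 = gcn_forward(A_k, X_priv, W_k1, W_k2)
```

A real training loop would add a loss on `H2` and a gradient step on `W_k1`, `W_k2` before the upload to the parameter server; those parts are omitted here.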
It will be readily appreciated by those skilled in the art that the foregoing is merely illustrative of the present invention and is not intended to limit the invention, but any modifications, equivalents, improvements or the like which fall within the spirit and principles of the present invention are intended to be included within the scope of the present invention.
Claims (9)
1. A tax data security graph neural network training method based on matrix decomposition, characterized by comprising the following steps:
firstly, secure eigenvalue decomposition is performed on the adjacency matrix of the tax data graph using an external server, the resulting eigenvalues are divided into several parts, and each part is combined with the eigenvector matrix to generate several distributable partial secrets of the adjacency matrix; secondly, differential privacy is applied to the feature matrix of the tax data graph; thirdly, the tax data owner distributes the partial secrets of the decomposed adjacency matrix and the differentially private feature matrix to each computing party through a parameter server for model training; finally, the computing parties return their computation results to the tax data owner, and the target model parameters are obtained through integration and updating by the parameter server.
2. The tax data security graph neural network training method based on matrix decomposition according to claim 1, wherein the method specifically comprises the following steps:
1) Adjacency matrix secret sharing based on eigenvalue decomposition
The adjacency matrix of the tax data graph is subjected to secure eigenvalue decomposition by an external server; the eigenvalues are randomly and equally divided into as many parts as there are computing parties, and the result of operating each part with the eigenvector matrix is a distributable partial secret of the adjacency matrix;
2) Feature matrix protection based on differential privacy
The feature matrix of the tax data graph is protected with the Laplace mechanism of differential privacy;
3) Model training and integration based on parameter server
The partial secrets of the decomposed adjacency matrix and the differentially private feature matrix are distributed to each computing party; each computing party trains a graph convolutional network model on its distributed data, and the target model parameters are obtained by sending, collecting, and integrating model parameters through the parameter server.
3. The tax data security graph neural network training method based on matrix decomposition according to claim 2, wherein in step 1), the secret sharing of the adjacency matrix based on eigenvalue decomposition comprises:
step1: secure matrix eigenvalue decomposition
For the adjacency matrix A of the tax data graph, a sufficiently accurate numerical eigenvalue decomposition is obtained through multiple iterations of QR decomposition:
wherein t is the iteration round and $Q_t$, $R_t$ are the QR decomposition results of $A_t$ in round t; after k iterations, the eigenvalue diagonal matrix is $\Lambda = A_k$, the eigenvector matrix is $X = Q_1 \cdots Q_k$, and the original adjacency matrix is $A = X \Lambda X^{-1}$;
Step2: topological secret sharing
For the obtained eigenvalue diagonal matrix Λ, randomly dividing eigenvalues into a plurality of groups in the form of a plurality of diagonal matrices, and when dividing into two groups, specifically comprising the following steps:
generating a random diagonal 01 matrix S, wherein the diagonal elements obey the following rules:
generating new diagonal matrices $\Lambda_1$, $\Lambda_2$ as follows:
$\Lambda_1 = S \times_h \Lambda$, $\Lambda_2 = (I_n - S) \times_h \Lambda$
wherein $I_n$ denotes the n-dimensional identity matrix and $\times_h$ denotes the Hadamard product, i.e. element-wise multiplication of matrices;
using the newly generated diagonal matrices $\Lambda_1$, $\Lambda_2$ to generate new matrices $A_1$, $A_2$ as follows:
$A_1 = X \Lambda_1 X^{-1}$, $A_2 = X \Lambda_2 X^{-1}$
$A_1$, $A_2$ have the following properties:
in the GNN model, the graph topology is represented in the form of an adjacency matrix, and the power of the adjacency matrix can reflect the information transfer process of the GNN model.
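The iterative QR-based eigenvalue computation named in Step1 of this claim can be sketched as below. The recurrence $A_t = Q_t R_t$, $A_{t+1} = R_t Q_t$ is the standard unshifted QR iteration and is assumed here, since the extracted text does not show the formula; the function name `qr_eigen`, the iteration count, and the 3×3 test matrix are likewise illustrative. The sketch further assumes a symmetric matrix whose eigenvalues have distinct absolute values, which the plain iteration needs in order to converge to a diagonal matrix.

```python
import numpy as np

def qr_eigen(A, num_iters=500):
    """Unshifted QR iteration: factor A_t = Q_t R_t, then set A_{t+1} = R_t Q_t.

    Returns (Lam, X), where Lam holds the diagonal of the final iterate and
    X is the accumulated product Q_1 Q_2 ... of the orthogonal factors.
    """
    A_t = np.array(A, dtype=float)
    X = np.eye(A_t.shape[0])
    for _ in range(num_iters):
        Q, R = np.linalg.qr(A_t)
        A_t = R @ Q
        X = X @ Q
    return np.diag(np.diag(A_t)), X

# Small symmetric test matrix with eigenvalues 3 and 3 +/- sqrt(3).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
Lam, X = qr_eigen(A)
A_rebuilt = X @ Lam @ X.T  # X is orthogonal, so X^{-1} = X^T
```

Practical eigenvalue solvers add shifts and a preliminary reduction to speed up convergence; the plain loop is shown only because it mirrors the description most directly.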
4. The tax data security graph neural network training method based on matrix decomposition according to claim 3, wherein in Step1 of step 1), the secure decomposition of the adjacency matrix A is performed by means of a third-party server: the data owner generates a sparse random 0-1 matrix P, computes $A' = P A P^{-1}$ and uploads it to the third-party server; the third-party server computes the eigenvalue decomposition of $A'$ according to the iterative solution process and returns the result $X'$, $\Lambda$ to the data owner, where $A' = X' \Lambda X'^{-1}$; the data owner computes $X = P^{-1} X'$ to obtain the matrix decomposition result.
5. The tax data security graph neural network training method based on matrix decomposition according to claim 3, wherein in Step2 of step 1), for a two-layer GCN, the embedding of a node is affected by the neighbors within its two-hop range, and the neighbors within two hops are represented by the square $A^2$ of the adjacency matrix; $A^2$ effectively indicates the connection relations of the graph and the information transfer between nodes; denoting the number of nodes by n and decomposing the original adjacency matrix into k matrices, each decomposed matrix contains only part of the eigenvalues and lacks the rest, and p is the probability of a correct arrangement on the premise that all eigenvalues are obtained.
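The link between the two shares and the two-hop matrix $A^2$ can be made explicit with a short derivation. It assumes the two-party split $\Lambda_1 = S \times_h \Lambda$, $\Lambda_2 = (I_n - S) \times_h \Lambda$ with S a diagonal 0-1 matrix, as in the embodiment, and it follows from those definitions rather than from any formula shown in the extracted text.

```latex
\begin{aligned}
\Lambda_1 \Lambda_2 &= S (I_n - S) \Lambda^2 = 0
  && \text{(each diagonal entry of } S \text{ is 0 or 1)} \\
A_1 A_2 &= X \Lambda_1 X^{-1} X \Lambda_2 X^{-1} = X \Lambda_1 \Lambda_2 X^{-1} = 0,
  \qquad A_2 A_1 = 0 \text{ likewise} \\
A^2 &= (A_1 + A_2)^2 = A_1^2 + A_1 A_2 + A_2 A_1 + A_2^2 = A_1^2 + A_2^2
\end{aligned}
```

Under these assumptions, the two-hop information in $A^2$ is exactly the sum of the two-hop information computed from each share separately.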
6. The tax data security graph neural network training method based on matrix decomposition according to claim 5, wherein when n = 100 and k = 2, $p \approx 3.3 \times 10^{-6}$.
7. The tax data security graph neural network training method based on matrix decomposition according to claim 3, wherein in step 2), the feature matrix protection based on differential privacy comprises:
step1: privacy budget and global sensitivity calculation
Differential privacy protection is applied to the feature matrix X of the tax data graph using the Laplace mechanism, and the global sensitivity $\Delta_f$ is calculated for the set privacy budget $\epsilon$:
$\Delta_f = \max_{D,D'} \{ |h - h'| \}$
wherein D, D′ are a pair of adjacent datasets and h, h′ are the results of the query on D and D′ respectively; the distribution of the Laplace noise to be added is then set according to $\Delta_f$ and $\epsilon$;
step2: noise injection
The Laplace noise generated in the previous step is added to the feature matrix X of the tax data graph to obtain the privacy-protected feature matrix X′.
8. The tax data security graph neural network training method based on matrix decomposition according to claim 7, wherein in Step2 of step 2), the Laplace mechanism satisfies $\epsilon$-differential privacy, namely:
$\Pr[M(D) = y] \le e^{\epsilon} \Pr[M(D') = y]$
where M is the processing mechanism applied.
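The bound can be checked for a scalar query with a short worked derivation. It assumes that the mechanism adds Laplace noise of scale $b = \Delta_f / \epsilon$, the usual calibration, which the extracted text does not show explicitly; $p_{M(D)}$ denotes the density of the output on input D.

```latex
\frac{p_{M(D)}(y)}{p_{M(D')}(y)}
  = \frac{\exp(-|y - h| / b)}{\exp(-|y - h'| / b)}
  = \exp\!\left(\frac{|y - h'| - |y - h|}{b}\right)
  \le \exp\!\left(\frac{|h - h'|}{b}\right)
  \le \exp\!\left(\frac{\Delta_f}{b}\right)
  = e^{\epsilon}
```

The first inequality is the triangle inequality and the second uses the definition of $\Delta_f$ as the maximum of $|h - h'|$ over adjacent datasets.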
9. The tax data security graph neural network training method based on matrix decomposition according to claim 7, wherein in step 3), the model training and integration based on the parameter server comprises:
step1: data distribution
The adjacency matrix A of the tax data graph is decomposed into $\{A_k\}$, k = 1, 2, …, and the feature matrix X of the tax data graph is processed with differential privacy to obtain $X'$; the data owner provides the privacy-protected data to the computing parties, each of which obtains $A_k$ and $X'$ as input to the GNN model;
step2: model training based on privacy-preserving data
a graph convolutional network (GCN) model is selected for training, wherein computing party k holds a two-layer GCN model locally and trains it on the data distributed to it; the first layer takes the node feature matrix $X'$ and the adjacency matrix $A_k$ as input and, after message passing and aggregation, outputs the node hidden feature matrix:
$H_{k,1} = f(A_k X' W_{k,1})$
the second layer takes the first-layer output $H_{k,1}$ and the adjacency matrix $A_k$ as input and outputs the node hidden feature matrix $H_{k,2}$ for node classification or other downstream tasks;
$H_{k,2} = f(A_k H_{k,1} W_{k,2})$
after training, the computing party uploads the model parameters $W_{k,1}$, $W_{k,2}$ to the parameter server held by the data owner and pulls the updated model parameters from it; after collecting the model parameters uploaded by every participant, the parameter server integrates them by model averaging, as used in distributed machine learning, to obtain the new model parameters.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310795131.2A CN116861152A (en) | 2023-06-30 | 2023-06-30 | Tax data security graph neural network training method based on matrix decomposition |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116861152A true CN116861152A (en) | 2023-10-10 |
Family
ID=88224495
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310795131.2A Pending CN116861152A (en) | 2023-06-30 | 2023-06-30 | Tax data security graph neural network training method based on matrix decomposition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116861152A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117371046A (en) * | 2023-12-07 | 2024-01-09 | 清华大学 | Multi-party collaborative optimization-oriented data privacy enhancement method and device |
CN117371046B (en) * | 2023-12-07 | 2024-03-01 | 清华大学 | Multi-party collaborative optimization-oriented data privacy enhancement method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||