CN116861152A - Tax data security graph neural network training method based on matrix decomposition - Google Patents

Tax data security graph neural network training method based on matrix decomposition

Info

Publication number
CN116861152A
Authority
CN
China
Prior art keywords
matrix
tax data
graph
decomposition
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310795131.2A
Other languages
Chinese (zh)
Inventor
师斌
刘奥
张纪强
赵锐
潘天泽
董博
郑庆华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202310795131.2A
Publication of CN116861152A
Legal status: Pending


Classifications

    • G06F 17/16 — Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06F 21/6245 — Protecting personal data, e.g. for financial or medical purposes
    • G06N 3/042 — Knowledge-based neural networks; logical representations of neural networks
    • G06N 3/0464 — Convolutional networks [CNN, ConvNet]
    • G06N 3/08 — Learning methods
    • G06Q 40/10 — Tax strategies


Abstract

The invention discloses a tax data security graph neural network training method based on matrix decomposition, comprising the following steps. First, a secure eigenvalue decomposition of the adjacency matrix of the tax data graph is performed with the help of an external server; the obtained eigenvalues are split into several parts and combined with the eigenvector matrix to generate several distributable adjacency matrices. Second, differential privacy is applied to the feature matrix of the tax data graph. Third, the tax data owner distributes the decomposed adjacency matrices and the differentially private feature matrix to the computing parties through a parameter server for model training. Finally, the computing parties return their results to the tax data owner, and the parameter server integrates and updates them to obtain the target model parameters. By securely decomposing the original tax data through topology secret sharing and adjacency-matrix eigenvalue decomposition, the method enables efficient analysis and modeling of tax data with external computing resources and improves analysis efficiency.

Description

A tax data security graph neural network training method based on matrix decomposition

Technical field

The invention belongs to the technical field of graph privacy protection methods, and in particular relates to a tax data security graph neural network training method based on matrix decomposition.

Background

In recent years, with the rapid development of the national economy and the continued prosperity of the market economy, tax data has grown increasingly complex. Tax data is often represented as graph-structured data, reflecting both individual tax information and social relationships. Graph neural networks can therefore effectively model the graph-structured portion of tax data and deeply mine the information it contains. Tax data modeling is the foundation of intelligent tax data processing and a key prerequisite for realizing tax big data, but the growing scale of tax data and the large amount of private information it carries hinder its analysis and use. Traditional data protection focuses on discrete data points, using technical means to prevent individual records from being identified and exploited; graph-structured data, however, contains not only node information but also rich and important topological information that traditional protection methods cannot fully cover. Unlike traditional approaches, current privacy protection research aims to make data "usable but invisible", that is, to protect private information from leakage without impeding the use of the data.

For graph-structured data, privacy protection research focuses on protecting the graph topology to prevent the leakage of sensitive information, which safeguards graph-structured data more effectively than traditional methods. Existing tax data modeling is constrained both by the enormous scale of tax data and by the limited computing power of tax agencies: modeling carried out by the tax agency alone is often inefficient, and external computing resources are urgently needed to improve efficiency. At the same time, tax data contains a large amount of sensitive information whose exposure would have severe consequences, so the data may not be handed directly to external organizations for processing. On the one hand the tax data must be secured to avoid privacy leakage; on the other hand the secured data must still support correct modeling. As taxpayer data keeps growing in volume, scale, and complexity, how to escape the constraints of local computing power and efficiently train graph neural network models on tax data with external computing power, while keeping the data secure, has become a pressing problem, and solving it is of great significance for accelerating tax data processing and further realizing tax big data.

At present, no research has proposed a corresponding solution for privacy-preserving graph neural network training on tax data. The main related invention patents on tax data protection are:

Document 1: A blockchain-based tax information processing method and system (202011290032.1)

Document 2: An enterprise batch clustering method and system based on multi-dimensional features (202211142876.0)

Document 1 designs a blockchain-based tax information processing method and system. Tax agencies act as the tax nodes of the blockchain and manage its blocks, and separate channels are created for different business organizations, each channel linking a tax node with the corresponding business organization node. According to user authorization, the tax node broadcasts tax certificate information to the business organization nodes within the corresponding channel, so that those nodes obtain the tax certificate information.

Document 2 designs a multi-dimensional-feature-based batch clustering method and system for enterprises. It collects tax data, news data and public opinion data for multiple target enterprises in the tax field, parses the collected data to generate feature data, builds a graph structure from the feature data, and feeds the graph structure into an optimal graph neural network clustering model to obtain the clustering results of the target enterprises.

Among the above technical solutions, Document 1 focuses on the storage protection of tax data and applies blockchain technology to ensure data security, but it does not consider how the protected data is used, and querying the data is inefficient. Document 2 models the tax data graph on the premise that the tax data has already been collected and analyzes it with a graph neural network; although good analytical results are obtained, privacy protection of the tax data is not considered at any point, which may introduce security risks. In reality, the limited computing power of tax agencies makes the processing of existing tax data inefficient, while the sensitivity of tax data forbids directly borrowing the computing power of external organizations to analyze and process it. How to efficiently train a graph neural network model on tax data while keeping the data secure has therefore become an urgent problem.

Summary of the invention

The present invention aims to provide a tax data security graph neural network training method based on matrix decomposition. First, a secure eigenvalue decomposition of the adjacency matrix of the tax data graph is performed with the help of an external server; the obtained eigenvalues are split into several parts and combined with the eigenvector matrix to generate several distributable adjacency matrices. Second, differential privacy is applied to the feature matrix of the tax data graph. Third, the tax data owner distributes the decomposed adjacency matrices and the differentially private feature matrix to the computing parties through a parameter server for model training. Finally, the computing parties return their results to the tax data owner, and the parameter server integrates and updates them to obtain the target model parameters.

To achieve the above objects, the present invention adopts the following technical solutions:

A tax data security graph neural network training method based on matrix decomposition, comprising:

First, a secure eigenvalue decomposition of the adjacency matrix of the tax data graph is performed with the help of an external server; the obtained eigenvalues are split into several parts and combined with the eigenvector matrix to generate the partial secrets of several distributable adjacency matrices. Second, differential privacy is applied to the feature matrix of the tax data graph. Third, the tax data owner distributes the partial secrets of the decomposed adjacency matrix and the differentially private feature matrix to the computing parties through a parameter server for model training. Finally, the computing parties return their results to the tax data owner, and the parameter server integrates and updates them to obtain the target model parameters.

A further improvement of the present invention is that the method specifically comprises the following steps:

1) Adjacency matrix secret sharing based on eigenvalue decomposition

For the adjacency matrix of the tax data graph, a secure eigenvalue decomposition is performed with the help of an external server. The eigenvalues are randomly and evenly divided into as many shares as there are computing parties; combining each share of the eigenvalue decomposition with the eigenvector matrix yields a releasable partial secret of the adjacency matrix;

2) Feature matrix protection based on differential privacy

The feature matrix of the tax data graph is protected by a differential privacy method applying the Laplace mechanism;

3) Model training and integration based on the parameter server

The partial secrets of the decomposed adjacency matrix and the differentially private feature matrix are distributed to the computing parties. Each computing party trains a graph convolutional network model on the data assigned to it, and the parameter server sends, collects and integrates the model parameters to obtain the target model parameters.

A further improvement of the present invention is that in step 1), the adjacency matrix secret sharing based on eigenvalue decomposition comprises:

Step 1: Secure matrix eigenvalue decomposition

For the adjacency matrix A of the tax data graph, a sufficiently accurate numerical eigenvalue decomposition is obtained through repeated iterations of QR decomposition:

A_t = Q_t R_t,  A_{t+1} = R_t Q_t

where t is the iteration round and Q_t, R_t are the QR factors of A_t at round t. After k iterations, the eigenvalue diagonal matrix is Λ = A_k, the eigenvector matrix is X = Q_1 Q_2 … Q_k, and the original adjacency matrix satisfies A = X Λ X^{-1};
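The QR iteration described above can be sketched in NumPy. This is a minimal illustration under stated assumptions (a small symmetric toy adjacency matrix, a fixed iteration count, and `numpy.linalg.qr` as the QR routine), not the patented implementation:

```python
import numpy as np

def qr_eigendecomposition(A, iters=200):
    """QR algorithm: factor A_t = Q_t R_t, then set A_{t+1} = R_t Q_t.

    Returns (Lam, X) with A ~= X @ Lam @ inv(X) for a symmetric
    adjacency matrix A; X accumulates Q_1 Q_2 ... Q_k.
    """
    At = A.astype(float).copy()
    X = np.eye(A.shape[0])
    for _ in range(iters):
        Q, R = np.linalg.qr(At)
        At = R @ Q          # similarity transform: eigenvalues preserved
        X = X @ Q           # accumulate eigenvector matrix
    return np.diag(np.diag(At)), X

# toy adjacency matrix of a triangle graph (eigenvalues 2, -1, -1)
A = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])
Lam, X = qr_eigendecomposition(A)
assert np.allclose(X @ Lam @ np.linalg.inv(X), A, atol=1e-6)
```

Note that the unshifted QR iteration shown here converges when eigenvalue magnitudes are well separated; production code would use a shifted variant or a library eigensolver.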

Step 2: Topology secret sharing

For the obtained eigenvalue diagonal matrix Λ, the eigenvalues are randomly split into several groups, each stored as a diagonal matrix. When splitting into two groups, the specific steps are as follows:

Generate a random diagonal 0–1 matrix S, whose diagonal elements each take the value 0 or 1 at random.

Generate new diagonal matrices Λ_1 and Λ_2 as follows:

Λ_1 = S ×_h Λ,  Λ_2 = (I_n − S) ×_h Λ

where I_n denotes the n-dimensional identity matrix and ×_h denotes the Hadamard product, i.e. element-wise multiplication of corresponding matrix entries;

Using the newly generated diagonal matrices Λ_1 and Λ_2, generate new matrices A_1 and A_2 as follows:

A_1 = X Λ_1 X^{-1},  A_2 = X Λ_2 X^{-1}

A_1 and A_2 have the following properties:

A_1 + A_2 = A,  A_1^m + A_2^m = A^m (since Λ_1 and Λ_2 have disjoint diagonal support)

In a GNN model, the graph topology is represented as an adjacency matrix, and powers of the adjacency matrix reflect the message-passing process of the GNN model.
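The share generation and the additive/power properties of the two shares can be checked numerically. The sketch below is illustrative: it uses `numpy.linalg.eigh` in place of the QR routine, and the additive identities follow from Λ_1 + Λ_2 = Λ with disjoint diagonal support:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy symmetric adjacency matrix (triangle graph)
A = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])
n = A.shape[0]

# eigendecomposition A = X Lam X^{-1}
w, X = np.linalg.eigh(A)
Lam = np.diag(w)

# random diagonal 0-1 mask S splits the eigenvalues into two groups
S = np.diag(rng.integers(0, 2, size=n).astype(float))
Lam1 = S @ Lam                    # S x_h Lam (product = Hadamard for diagonals)
Lam2 = (np.eye(n) - S) @ Lam      # (I_n - S) x_h Lam

Xinv = np.linalg.inv(X)
A1 = X @ Lam1 @ Xinv
A2 = X @ Lam2 @ Xinv

# shares recombine additively, and the same holds for matrix powers
assert np.allclose(A1 + A2, A)
assert np.allclose(A1 @ A1 + A2 @ A2, A @ A)
```

Because the power identity holds, each computing party can raise its share to a power independently and the data owner recovers the power of the original adjacency matrix by summation.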

A further improvement of the present invention is that in Step 1 of step 1), a third-party server is used to decompose the adjacency matrix A securely: the data owner generates a sparse random 0–1 matrix P, computes A′ = P A P^{-1} and uploads it to the third-party server; the server computes the eigenvalue decomposition of A′ by the iterative procedure above and returns the results X′ and Λ to the data owner, with A′ = X′ Λ X′^{-1}; the data owner then computes X = P^{-1} X′ to obtain the matrix decomposition result.
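The masking step can be sketched as follows. The sketch assumes a permutation matrix as one concrete instance of the sparse invertible 0–1 matrix P (the patent does not fix a particular construction):

```python
import numpy as np

rng = np.random.default_rng(1)

# toy adjacency matrix of a 4-cycle
A = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])
n = A.shape[0]

# data owner: mask A with a random permutation matrix P (sparse 0-1, invertible)
P = np.eye(n)[rng.permutation(n)]
A_masked = P @ A @ np.linalg.inv(P)   # A' = P A P^{-1}, uploaded to the server

# third-party server: eigendecomposition of the masked matrix only
w, X_masked = np.linalg.eigh(A_masked)
Lam = np.diag(w)

# data owner: unmask the eigenvectors, X = P^{-1} X'
X = np.linalg.inv(P) @ X_masked
assert np.allclose(X @ Lam @ np.linalg.inv(X), A, atol=1e-8)
```

Since A′ is similar to A, the server learns the eigenvalues but sees only a relabeled topology; the owner alone can undo the masking.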

A further improvement of the present invention is that in Step 2 of step 1), for a two-layer GCN, a node's embedding is influenced by its neighbors within two hops, and these two-hop relations are expressed by the square of the adjacency matrix, A², which effectively captures the connectivity of the graph and the information transfer between nodes. Let n be the number of nodes and decompose the original adjacency matrix into k matrices; each decomposed matrix then contains n/k eigenvalues and lacks the remaining n − n/k eigenvalues. Even on the premise that all eigenvalues are obtained, the probability of arranging them correctly is p = 1/(n − n/k)!.

A further improvement of the present invention is that when n = 100 and k = 2, p ≈ 3.3×10⁻⁶⁵.
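The closed form of this probability is not fully legible in the source; however, the worked value p ≈ 3.3×10⁻⁶⁵ for n = 100, k = 2 matches 1/(n − n/k)! = 1/50!, so the check below assumes that form (an inference from the stated value, not the patent's verbatim formula):

```python
from math import factorial

def recovery_probability(n, k):
    """Probability of correctly ordering the n - n/k missing eigenvalues,
    assumed (from the worked value in the text) to be 1/(n - n/k)!."""
    missing = n - n // k
    return 1.0 / factorial(missing)

p = recovery_probability(100, 2)   # 1/50!
assert 3.2e-65 < p < 3.4e-65
```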

A further improvement of the present invention is that in step 2), the feature matrix protection based on differential privacy comprises:

Step 1: Privacy budget and global sensitivity calculation

The Laplace mechanism is applied to protect the feature matrix X of the tax data graph with differential privacy. Given the configured privacy budget ε, the global sensitivity Δ_f is computed as:

Δ_f = max_{D,D′} |h − h′|

where D and D′ are a pair of adjacent datasets, and h, h′ are the results of the same random query on D and D′ respectively. Let b = Δ_f/ε; the Laplace noise to be added follows the distribution:

p(x) = (1/(2b)) · exp(−|x|/b)

The above Laplace mechanism satisfies ε-differential privacy, i.e.:

Pr[M(D) = y] ≤ e^ε · Pr[M(D′) = y]

where M is the applied processing mechanism;

Step 2: Noise injection

The Laplace noise generated in the previous step is added to the feature matrix X of the tax data graph, yielding the privacy-protected feature matrix X′.
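The noise injection step amounts to standard entry-wise Laplace perturbation. A minimal sketch, assuming NumPy's `Generator.laplace` sampler and illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(2)

def laplace_privatize(X, epsilon, delta_f):
    """Add Laplace(0, delta_f / epsilon) noise to every entry of the
    feature matrix X (the standard Laplace mechanism)."""
    b = delta_f / epsilon
    return X + rng.laplace(loc=0.0, scale=b, size=X.shape)

X = rng.random((5, 3))                       # toy node-feature matrix
X_priv = laplace_privatize(X, epsilon=1.0, delta_f=1.0)
assert X_priv.shape == X.shape
```

A smaller ε (tighter privacy budget) increases the noise scale b and hence the perturbation applied to each feature.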

A further improvement of the present invention is that in step 3), the parameter-server-based model training and integration comprises:

Step 1: Data distribution

The adjacency matrix A of the tax data graph is decomposed into {A_k}, k = 1, 2, …, and the feature matrix X of the tax data graph is processed with differential privacy to obtain X′. The data owner provides the privacy-protected data to the computing parties; each computing party receives A_k and X′ as the input of its GNN model;

Step 2: Model training on privacy-protected data

A graph convolutional network (GCN) is chosen as the model to train. Computing party k holds a local two-layer GCN and trains it on the data assigned to it. The first layer takes the noised node feature matrix X′ and the adjacency matrix A_k as input and, after message passing and aggregation, outputs a hidden node feature matrix:

H_{k,1} = f(A_k X′ W_{k,1})

The second layer takes the first layer's output H_{k,1} and the adjacency matrix A_k as input, and outputs the hidden node feature matrix H_{k,2}, which is used for node classification or other downstream tasks:

H_{k,2} = f(A_k H_{k,1} W_{k,2})

After training, each computing party uploads its model parameters W_{k,1} and W_{k,2} to the parameter server held by the data owner, and at the same time pulls the updated model parameters from the parameter server. After collecting the model parameters uploaded by all participants, the parameter server integrates them by the model-averaging approach from distributed machine learning to obtain new model parameters.
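The forward pass and the server-side model averaging can be sketched together. This is a shape-level illustration only: ReLU is assumed for the activation f, the weights are random stand-ins rather than trained values, and the local training loop is omitted:

```python
import numpy as np

rng = np.random.default_rng(3)

def relu(x):
    return np.maximum(x, 0.0)

def gcn_forward(A_k, X_priv, W1, W2):
    """Two-layer GCN forward pass: H1 = f(A_k X' W1), H2 = f(A_k H1 W2)."""
    H1 = relu(A_k @ X_priv @ W1)
    return relu(A_k @ H1 @ W2)

# toy setup: two computing parties with shared layer shapes
n, d, h, c = 4, 3, 8, 2
X_priv = rng.random((n, d))                       # noised feature matrix
shares = [rng.random((n, n)) for _ in range(2)]   # stand-ins for A_1, A_2

# each party holds its own (W_{k,1}, W_{k,2}); after local training the
# parameter server averages the uploaded weights (model averaging)
party_params = [(rng.random((d, h)), rng.random((h, c))) for _ in shares]
W1_avg = np.mean([p[0] for p in party_params], axis=0)
W2_avg = np.mean([p[1] for p in party_params], axis=0)

H2 = gcn_forward(shares[0], X_priv, W1_avg, W2_avg)
assert H2.shape == (n, c)
```

Each party would then pull W1_avg and W2_avg back from the server and continue training on its own share.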

The present invention has at least the following beneficial technical effects:

(1) The invention applies privacy protection separately to the adjacency matrix and the feature matrix of the tax data graph. Topology secret sharing together with adjacency-matrix eigenvalue decomposition keeps the graph topology unknown to the computing parties, while differential privacy protects the node feature information; the invention thus guarantees the security of sensitive information throughout the entire process.

(2) Through topology secret sharing and adjacency-matrix eigenvalue decomposition, the invention securely decomposes the original tax data and then uses external computing resources to analyze and model the tax data efficiently, improving analysis efficiency.

Description of the drawings

Figure 1 is the flowchart of the overall framework.

Figure 2 is the flowchart of adjacency matrix secret sharing based on eigenvalue decomposition.

Figure 3 is the flowchart of parameter-server-based model training and integration.

Detailed description of the embodiments

Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be implemented in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art. It should be noted that, as long as there is no conflict, the embodiments of the present invention and the features within them may be combined with one another. The present invention is described in detail below with reference to the accompanying drawings and the embodiments.

As shown in Figure 1, the tax data security graph neural network training method based on matrix decomposition provided by the present invention comprises the following steps:

1) Adjacency matrix secret sharing based on eigenvalue decomposition

For the adjacency matrix of the tax data graph, a secure eigenvalue decomposition is performed with the help of an external server. The eigenvalues are randomly and evenly divided into as many shares as there are computing parties; combining each share of the eigenvalue decomposition with the eigenvector matrix yields a releasable partial secret of the adjacency matrix. The adjacency matrix secret sharing based on eigenvalue decomposition comprises:

Step 1: Secure matrix eigenvalue decomposition

For the adjacency matrix A of the tax data graph, a sufficiently accurate numerical eigenvalue decomposition is obtained through repeated iterations of QR decomposition:

A_t = Q_t R_t,  A_{t+1} = R_t Q_t

where t is the iteration round and Q_t, R_t are the QR factors of A_t at round t. After k iterations, the eigenvalue diagonal matrix is Λ = A_k, the eigenvector matrix is X = Q_1 Q_2 … Q_k, and the original adjacency matrix satisfies A = X Λ X^{-1};

Step 2: Topology secret sharing

For the obtained eigenvalue diagonal matrix Λ, the eigenvalues are randomly split into several groups, each stored as a diagonal matrix. When splitting into two groups, the specific steps are as follows:

Generate a random diagonal 0–1 matrix S, whose diagonal elements each take the value 0 or 1 at random.

Generate new diagonal matrices Λ_1 and Λ_2 as follows:

Λ_1 = S ×_h Λ,  Λ_2 = (I_n − S) ×_h Λ

where I_n denotes the n-dimensional identity matrix and ×_h denotes the Hadamard product, i.e. element-wise multiplication of corresponding matrix entries;

Using the newly generated diagonal matrices Λ_1 and Λ_2, generate new matrices A_1 and A_2 as follows:

A_1 = X Λ_1 X^{-1},  A_2 = X Λ_2 X^{-1}

A_1 and A_2 have the following properties:

A_1 + A_2 = A,  A_1^m + A_2^m = A^m (since Λ_1 and Λ_2 have disjoint diagonal support)

In a GNN model, the graph topology is represented as an adjacency matrix, and powers of the adjacency matrix reflect the message-passing process of the GNN model. A third-party server is used to decompose the adjacency matrix A securely: the data owner generates a sparse random 0–1 matrix P, computes A′ = P A P^{-1} and uploads it to the third-party server; the server computes the eigenvalue decomposition of A′ by the iterative procedure above and returns the results X′ and Λ to the data owner, with A′ = X′ Λ X′^{-1}; the data owner then computes X = P^{-1} X′ to obtain the matrix decomposition result. For a two-layer GCN, a node's embedding is influenced by its neighbors within two hops, and these two-hop relations are expressed by the square of the adjacency matrix, A², which effectively captures the connectivity of the graph and the information transfer between nodes. Owing to the properties of A_1 and A_2, they can be distributed to different computing parties to compute powers, and the power of the original matrix is obtained by collecting and integrating the results.

On the other hand, from a computing party's perspective, the decomposed matrices A_1 and A_2 are numerically very different from the original 0–1 adjacency matrix: they contain many fractional values, so the existence of any particular edge cannot be identified from the numbers. From its assigned matrix, a computing party can derive only part of the eigenvalues of the original adjacency matrix, which is insufficient to reconstruct it. Even if a computing party obtained all the eigenvalues, it would have to arrange them in the correct order to recover the original adjacency matrix, and the probability of a correct arrangement is extremely small. Let n be the number of nodes and decompose the original adjacency matrix into k matrices; each decomposed matrix then contains n/k eigenvalues and lacks the remaining n − n/k eigenvalues. On the premise that all eigenvalues are obtained, the probability of a correct arrangement is p = 1/(n − n/k)!. When not all eigenvalues are obtained, the probability of the computing party recovering the original adjacency matrix is far smaller than the above probability p.

2)基于差分隐私的特征矩阵保护2) Feature matrix protection based on differential privacy

对税务数据图的特征矩阵,利用差分隐私方法,应用拉普拉斯机制加以保护;基于差分隐私的特征矩阵保护包括:For the feature matrix of the tax data graph, the differential privacy method is used and the Laplacian mechanism is applied to protect it; the feature matrix protection based on differential privacy includes:

Step1:隐私预算及全局敏感度计算Step1: Privacy budget and global sensitivity calculation

应用拉普拉斯机制,对税务数据图的特征矩阵X进行差分隐私保护,根据设置的隐私预算∈,计算全局敏感度ΔfApply the Laplacian mechanism to perform differential privacy protection on the feature matrix X of the tax data graph, and calculate the global sensitivity Δ f according to the set privacy budget ∈:

Δf=maxD,D′{|h=h′|}Δ f =max D, D′ {|h=h′|}

其中D、D′为一对相邻数据，h、h′分别是针对D、D′的随机查询的结果；令b=Δf/∈，设置要添加的拉普拉斯噪声分布如下：Among them, D and D′ are a pair of adjacent datasets, and h, h′ are respectively the results of the random query on D and D′; let b=Δf/∈ and set the Laplacian noise distribution to be added as follows:

Lap(x|b)=(1/(2b))exp(-|x|/b)

上述拉普拉斯机制满足∈-差分隐私，即：The above Laplacian mechanism satisfies ∈-differential privacy, that is:

Pr[M(D)=y]≤e∈Pr[M(D′)=y]

其中M为所应用的处理机制;Where M is the processing mechanism applied;

Step2:噪声注入Step2: Noise injection

对税务数据图的特征矩阵X,插入上一步生成的拉普拉斯噪声,获得隐私保护的特征矩阵X′。Insert the Laplacian noise generated in the previous step into the feature matrix X of the tax data graph to obtain the privacy-preserving feature matrix X′.
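The noise-injection step can be sketched in NumPy as follows; the toy feature matrix, the privacy budget ∈=10 (as in the embodiment below) and the sensitivity Δf=1 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy feature matrix X of the tax data graph (5 nodes, 4 features), values in [0, 1].
X = rng.random((5, 4))

# Laplace mechanism: with privacy budget eps and global sensitivity delta_f,
# the added noise is drawn from Lap(b) with scale b = delta_f / eps.
eps = 10.0
delta_f = 1.0          # assumed sensitivity of a single feature query
b = delta_f / eps

noise = rng.laplace(loc=0.0, scale=b, size=X.shape)
X_priv = X + noise     # privacy-preserving feature matrix X'

print(X_priv.shape)
```

A smaller ∈ means a larger noise scale b and therefore stronger privacy at the cost of accuracy.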

3)基于参数服务器的模型训练与整合3) Model training and integration based on parameter server

将分解后的邻接矩阵的部分秘密和差分隐私后的特征矩阵分发给各计算方,各计算方基于分配的数据训练图卷积神经网络模型,通过参数服务器发送、收集和整合模型参数,获得目标模型参数。如图3所示,基于参数服务器的模型训练与整合包括:Distribute the partial secrets of the decomposed adjacency matrix and the differentially private feature matrix to each computing party. Each computing party trains the graph convolutional neural network model based on the distributed data, sends, collects and integrates model parameters through the parameter server to obtain the target model parameters. As shown in Figure 3, model training and integration based on parameter server include:

Step1:数据分配Step1: Data distribution

税务数据图的邻接矩阵A被分解为{Ak…}，k=1,2,…，税务数据图的特征矩阵X经过差分隐私处理得到X′；数据拥有方向计算方提供隐私保护的数据，每个计算方得到Ak和X′作为GNN模型的输入；The adjacency matrix A of the tax data graph is decomposed into {Ak…}, k=1,2,…, and the feature matrix X of the tax data graph is processed with differential privacy to obtain X′; the data owner provides the privacy-protected data to the computing parties, and each computing party receives Ak and X′ as input to the GNN model;

Step2:基于隐私保护数据的模型训练Step2: Model training based on privacy-preserving data

选择图卷积神经网络模型进行训练，计算方k本地拥有两层的GCN模型，在分配给自身的数据上进行训练；其中第一层输入是节点特征矩阵X′和邻接矩阵Ak，经过信息传递与聚合后输出节点隐藏特征矩阵：A graph convolutional neural network model is selected for training; computing party k locally holds a two-layer GCN model and trains it on the data assigned to it. The first-layer input is the node feature matrix X′ and the adjacency matrix Ak, and after message passing and aggregation it outputs the hidden node feature matrix:

Hk,1=f(AkX′Wk,1)H k,1 =f(A k X'W k,1 )

第二层输入是第一层的输出Hk,1与邻接矩阵Ak，输出节点隐藏特征矩阵Hk,2，用于节点分类或其他下游任务；The second-layer input is the first-layer output Hk,1 and the adjacency matrix Ak; the output hidden node feature matrix Hk,2 is used for node classification or other downstream tasks;

Hk,2=f(AkHk,1Wk,2)H k,2 =f(A k H k,1 W k,2 )

计算方在训练后向由数据拥有方持有的参数服务器上传模型参数Wk,1、Wk,2，同时拉取参数服务器更新后的模型参数；参数服务器在收集各个参与方上传的模型参数后，借助分布式机器学习方法中的模型平均方式对模型参数进行整合，从而获得新的模型参数。After training, the computing party uploads the model parameters Wk,1 and Wk,2 to the parameter server held by the data owner and pulls the updated model parameters from it; after collecting the model parameters uploaded by all participants, the parameter server integrates them by the model-averaging approach from distributed machine learning to obtain new model parameters.
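One party's two-layer forward pass Hk,1=f(AkX′Wk,1), Hk,2=f(AkHk,1Wk,2) can be sketched with NumPy as follows; the random toy data, the dimensions, and the choice of ReLU for the activation f are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h, c = 6, 4, 8, 3   # nodes, input features, hidden units, classes

def relu(x):
    return np.maximum(x, 0.0)

# Data assigned to computing party k: its adjacency share A_k and the
# differentially private feature matrix X'.
A_k = rng.random((n, n))
X_prime = rng.random((n, d))

# Local two-layer GCN parameters W_{k,1}, W_{k,2}.
W_k1 = rng.normal(scale=0.1, size=(d, h))
W_k2 = rng.normal(scale=0.1, size=(h, c))

# Layer 1: H_{k,1} = f(A_k X' W_{k,1});  Layer 2: H_{k,2} = f(A_k H_{k,1} W_{k,2}).
H_k1 = relu(A_k @ X_prime @ W_k1)
H_k2 = relu(A_k @ H_k1 @ W_k2)
print(H_k2.shape)
```

Each party runs this forward pass (and the corresponding backward pass) locally, so the raw graph never leaves the data owner.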

实施例Example

选取某地区国税中2017年至2019年的局部税务数据图,包含2786个节点,5728条边,节点特征维度为1289,标签维度为6。以下参照附图,结合实验案例及具体实施方式对本发明作进一步的详细描述。凡基于本发明内容所实现的技术均属于本发明的范围。Select the local tax data graph from 2017 to 2019 in the national tax of a certain region, which contains 2786 nodes, 5728 edges, the node feature dimension is 1289, and the label dimension is 6. The present invention will be described in further detail below with reference to the accompanying drawings, experimental cases and specific implementations. All technologies implemented based on the content of the present invention belong to the scope of the present invention.

如图1所示,本发明具体实施中,基于矩阵分解和差分隐私的税务数据隐私保护图神经网络训练方法包括以下步骤:As shown in Figure 1, in the specific implementation of the present invention, the tax data privacy protection graph neural network training method based on matrix decomposition and differential privacy includes the following steps:

步骤1.基于特征值分解的邻接矩阵秘密分享Step 1. Adjacency matrix secret sharing based on eigenvalue decomposition

税务数据图的拓扑信息包含在邻接矩阵中，通过秘密分享方式可以有效阻止计算方推测邻接矩阵。邻接矩阵秘密分享实施过程如图2，具体包括以下步骤：The topology of the tax data graph is contained in its adjacency matrix, and secret sharing effectively prevents the computing parties from inferring the adjacency matrix. The implementation of adjacency matrix secret sharing is shown in Figure 2 and specifically includes the following steps:

S101.邻接矩阵特征值分解S101. Adjacency matrix eigenvalue decomposition

本地首先生成大小为2786×2786的随机01矩阵P，然后计算遮蔽后的邻接矩阵A′=PAP-1并上传至第三方服务器。第三方服务器对上传的遮蔽邻接矩阵A′进行特征值分解，将分解结果X′、Λ传回本地。本地对分解结果进一步处理，得到税务数据图邻接矩阵的分解结果X=P-1X′与Λ。First, a random 01 matrix P of size 2786×2786 is generated locally, and the masked adjacency matrix A′=PAP-1 is computed and uploaded to the third-party server. The third-party server performs eigenvalue decomposition on the uploaded masked adjacency matrix A′ and sends the results X′ and Λ back. The decomposition results are further processed locally to obtain the decomposition of the tax data graph adjacency matrix, X=P-1X′ and Λ.

S102.拓扑秘密分享S102. Topology secret sharing

本实施例中具有两个计算方，因此借助随机生成的大小为2786×2786的对角01矩阵S，将Λ分解为Λ1=S×hΛ、Λ2=(I2786-S)×hΛ，并由此得到新矩阵A1=XΛ1X-1、A2=XΛ2X-1。将A1、A2分别分发给两个计算方。There are two computing parties in this embodiment, so with the help of a randomly generated diagonal 01 matrix S of size 2786×2786, Λ is decomposed into Λ1=S×hΛ and Λ2=(I2786-S)×hΛ, yielding the new matrices A1=XΛ1X-1 and A2=XΛ2X-1. A1 and A2 are distributed to the two computing parties respectively.
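The split Λ1=S×hΛ, Λ2=(I-S)×hΛ and the additive behavior of the resulting shares can be checked numerically. The NumPy sketch below assumes a small random symmetric 01 adjacency matrix in place of the 2786-node tax graph; because Λ1Λ2=0, the shares satisfy A1+A2=A and A1²+A2²=A², which is what lets each party compute powers of its share independently.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6

# Small symmetric 01 adjacency matrix and its eigendecomposition A = X Λ X^{-1}.
A = rng.integers(0, 2, size=(n, n))
A = np.triu(A, 1)
A = A + A.T
eigvals, X = np.linalg.eig(A)
Lam = np.diag(eigvals)

# Random diagonal 01 matrix S splits Λ into complementary halves:
# Λ1 = S ×h Λ, Λ2 = (I - S) ×h Λ  (×h = Hadamard, element-wise product).
S = np.diag(rng.integers(0, 2, size=n).astype(float))
Lam1 = S * Lam
Lam2 = (np.eye(n) - S) * Lam

X_inv = np.linalg.inv(X)
A1 = X @ Lam1 @ X_inv
A2 = X @ Lam2 @ X_inv

# Shares recombine additively, and powers can be computed per share,
# since Λ1 and Λ2 have disjoint diagonal supports (Λ1 Λ2 = 0).
print(np.allclose(A1 + A2, A, atol=1e-8))
print(np.allclose(A1 @ A1 + A2 @ A2, A @ A, atol=1e-8))
```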

步骤2.基于差分隐私的特征矩阵保护Step 2. Feature matrix protection based on differential privacy

利用差分隐私，可以简单有效地对特征矩阵加以保护。Using differential privacy, the feature matrix can be protected simply and effectively.

具体的,本实施例中令隐私预算∈=10,计算相应的拉普拉斯噪声并插入到特征矩阵X中获得隐私保护的特征矩阵X′。Specifically, in this embodiment, the privacy budget ∈ = 10, the corresponding Laplacian noise is calculated and inserted into the feature matrix X to obtain the privacy-protecting feature matrix X′.

步骤3.基于参数服务器的模型训练与整合Step 3. Model training and integration based on parameter server

利用邻接矩阵的秘密分享和差分隐私后的特征矩阵,计算方在计算过程中无法逆推原始税务数据图中的信息,再借助参数服务器,可以完成模型的正确训练。By utilizing the secret sharing of the adjacency matrix and the feature matrix after differential privacy, the calculation party cannot reversely deduce the information in the original tax data graph during the calculation process. With the help of the parameter server, the correct training of the model can be completed.

具体的，本实施例中，计算方k训练两层GCN模型，模型参数记为Wk,1、Wk,2。基于所分配的数据Ak和X′，计算方k训练模型，其中第一层输入是节点特征矩阵X′和邻接矩阵Ak，经过信息传递与聚合后输出节点隐藏特征矩阵：Specifically, in this embodiment, computing party k trains a two-layer GCN model whose parameters are denoted Wk,1 and Wk,2. Based on the assigned data Ak and X′, computing party k trains the model; the first-layer input is the node feature matrix X′ and the adjacency matrix Ak, and after message passing and aggregation it outputs the hidden node feature matrix:

Hk,1=f(AkX′Wk,1)H k,1 =f(A k X'W k,1 )

第二层输入是第一层的输出Hk,1与邻接矩阵Ak，输出节点隐藏特征矩阵Hk,2，可用于节点分类或其他下游任务。The second-layer input is the first-layer output Hk,1 and the adjacency matrix Ak; the output hidden node feature matrix Hk,2 can be used for node classification or other downstream tasks.

Hk,2=f(AkHk,1Wk,2)H k,2 =f(A k H k,1 W k,2 )

并在训练结束后将Wk,1、Wk,2上传参数服务器。两计算方都上传后，参数服务器利用模型平均方式整合模型参数，并将更新后的参数W1、W2再次下发给计算方以进行下次迭代，共迭代100次，获得最终的模型参数。最终模型在原税务数据图上的精确度达到81.6%，相比直接在税务数据图上训练所得模型84.1%的精确度仅下降2.5个百分点，但借助外部算力大大加快了建模速度。After training, Wk,1 and Wk,2 are uploaded to the parameter server. Once both computing parties have uploaded, the parameter server integrates the model parameters by model averaging and sends the updated parameters W1 and W2 back to the computing parties for the next iteration; after 100 iterations in total, the final model parameters are obtained. The accuracy of the final model on the original tax data graph reaches 81.6%, only 2.5 percentage points below the 84.1% accuracy of a model trained directly on the tax data graph, while modeling is greatly accelerated with the help of external computing power.
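The parameter server's model-averaging step can be sketched as follows; the parameter shapes and the two randomly initialized uploads are toy assumptions, and a real round would interleave this with the local GCN training described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Parameters W_{k,1}, W_{k,2} uploaded by the two computing parties after a round.
parties = [
    {"W1": rng.normal(size=(4, 8)), "W2": rng.normal(size=(8, 3))}
    for _ in range(2)
]

# Parameter server: model averaging, then redistribute for the next iteration.
def average(params_list):
    return {key: sum(p[key] for p in params_list) / len(params_list)
            for key in params_list[0]}

global_params = average(parties)
print(global_params["W1"].shape, global_params["W2"].shape)
```

Repeating upload, average, and download for the stated 100 iterations yields the final model parameters.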

本领域的技术人员容易理解，以上所述仅为本发明的方法实施例而已，并不用以限制本发明，凡在本发明的精神和原则之内所作的任何修改、等同替换和改进等，均应包含在本发明的保护范围之内。Those skilled in the art will readily understand that the above descriptions are only method embodiments of the present invention and are not intended to limit it; any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (9)

1.一种基于矩阵分解的税务数据安全图神经网络训练方法，其特征在于，包括：首先，对税务数据图的邻接矩阵利用外部服务器进行安全的特征值分解，并将获得的特征值分解结果分成多个部分，与特征向量矩阵做运算，生成多个可分发的邻接矩阵的部分秘密；其次，对税务数据图的特征矩阵进行差分隐私处理；再次，税务数据拥有者通过参数服务器将分解后的邻接矩阵的部分秘密与差分隐私后的特征矩阵分发给各计算方进行模型训练；最后，计算方将计算结果返回给税务数据拥有者，经过参数服务器整合更新获得目标模型参数。1. A tax data security graph neural network training method based on matrix decomposition, characterized by comprising: first, performing secure eigenvalue decomposition of the adjacency matrix of the tax data graph with the help of an external server, dividing the obtained eigenvalue decomposition result into multiple parts, and operating on them with the eigenvector matrix to generate partial secrets of multiple distributable adjacency matrices; second, applying differential privacy to the feature matrix of the tax data graph; third, the tax data owner distributes, through the parameter server, the partial secrets of the decomposed adjacency matrix and the differentially private feature matrix to the computing parties for model training; finally, the computing parties return their calculation results to the tax data owner, and the target model parameters are obtained through integration and updating by the parameter server. 2.根据权利要求1所述的一种基于矩阵分解的税务数据安全图神经网络训练方法，其特征在于，该方法具体包括以下步骤：2.
A tax data security graph neural network training method based on matrix decomposition according to claim 1, characterized in that the method specifically includes the following steps: 1)基于特征值分解的邻接矩阵秘密分享1) Adjacency matrix secret sharing based on eigenvalue decomposition 对税务数据图的邻接矩阵，借助外部服务器对其进行安全的特征值分解；根据计算方数量将特征值随机均等分成相应份数，特征值分解结果与特征向量矩阵的运算结果即为可发布的邻接矩阵的部分秘密；For the adjacency matrix of the tax data graph, secure eigenvalue decomposition is performed with the help of an external server; the eigenvalues are randomly and equally divided into a number of shares corresponding to the number of computing parties, and the results of operating on the eigenvalue shares with the eigenvector matrix are the distributable partial secrets of the adjacency matrix; 2)基于差分隐私的特征矩阵保护2) Feature matrix protection based on differential privacy 对税务数据图的特征矩阵，利用差分隐私方法，应用拉普拉斯机制加以保护；For the feature matrix of the tax data graph, a differential privacy method applying the Laplacian mechanism is used to protect it; 3)基于参数服务器的模型训练与整合3) Model training and integration based on parameter server 将分解后的邻接矩阵的部分秘密和差分隐私后的特征矩阵分发给各计算方，各计算方基于分配的数据训练图卷积神经网络模型，通过参数服务器发送、收集和整合模型参数，获得目标模型参数。The partial secrets of the decomposed adjacency matrix and the differentially private feature matrix are distributed to the computing parties; each computing party trains a graph convolutional neural network model on its assigned data, and model parameters are sent, collected and integrated through the parameter server to obtain the target model parameters. 3.
A tax data security graph neural network training method based on matrix decomposition according to claim 2, characterized in that, in step 1), the adjacency matrix secret sharing based on eigenvalue decomposition includes: Step1:安全的矩阵特征值分解Step1: Secure matrix eigenvalue decomposition 对税务数据图的邻接矩阵A，通过QR分解的多次迭代，获得足够精确的特征值分解数值解：At=QtRt，At+1=RtQt，其中t为迭代轮次，Qt、Rt分别是t轮次对At的QR分解结果；经过k次迭代后，特征值对角矩阵Λ=Ak，特征向量矩阵X=Q1…Qk，原邻接矩阵A=XΛX-1；For the adjacency matrix A of the tax data graph, a sufficiently accurate numerical eigenvalue decomposition is obtained through multiple iterations of QR decomposition: At=QtRt, At+1=RtQt, where t is the iteration round and Qt, Rt are the QR decomposition results of At in round t; after k iterations, the eigenvalue diagonal matrix Λ=Ak, the eigenvector matrix X=Q1…Qk, and the original adjacency matrix A=XΛX-1; Step2:拓扑秘密分享Step2: Topology secret sharing 对于获得的特征值对角矩阵Λ，以多个对角矩阵的形式将特征值随机分成多组，当分成两组时，具体步骤如下：For the obtained diagonal eigenvalue matrix Λ, the eigenvalues are randomly divided into multiple groups in the form of multiple diagonal matrices; when divided into two groups, the specific steps are as follows: 生成随机对角01矩阵S，其中对角元素独立地以相等概率取0或1；Generate a random diagonal 01 matrix S whose diagonal elements independently take the value 0 or 1 with equal probability; 生成新对角矩阵Λ1、Λ2，方法如下：Λ1=S×hΛ，Λ2=(In-S)×hΛ，其中In表示n维单位矩阵，×h表示哈达玛积，即矩阵对应元素相乘；Generate new diagonal matrices Λ1=S×hΛ and Λ2=(In-S)×hΛ, where In denotes the n-dimensional identity matrix and ×h denotes the Hadamard product, i.e., element-wise multiplication of matrices; 利用新生成的对角矩阵Λ1、Λ2，生成新矩阵A1=XΛ1X-1、A2=XΛ2X-1；Use the newly generated diagonal matrices Λ1 and Λ2 to generate the new matrices A1=XΛ1X-1 and A2=XΛ2X-1; A1、A2具有以下性质：A1+A2=A，且A1A2=A2A1=0，因此A1m+A2m=Am对任意正整数m成立；A1 and A2 have the following properties: A1+A2=A and A1A2=A2A1=0, so A1m+A2m=Am holds for any positive integer m; 在GNN模型中，图拓扑结构以邻接矩阵形式表示，邻接矩阵的乘方能够反映GNN模型的信息传递过程。In the GNN model, the graph topology is represented as an adjacency matrix, and powers of the adjacency matrix reflect the message-passing process of the GNN model.
4.根据权利要求3所述的一种基于矩阵分解的税务数据安全图神经网络训练方法，其特征在于，步骤1)的Step1中，通过设置第三方服务器进行对邻接矩阵A的安全分解：数据拥有方生成稀疏的随机01矩阵P，计算并向第三方服务器上传A′=PAP-1，第三方服务器按上述迭代求解过程计算A′的特征值分解并将计算结果X′、Λ返回给数据拥有方，有A′=X′ΛX′-1，数据拥有方计算X=P-1X′，得到矩阵分解结果。4. A tax data security graph neural network training method based on matrix decomposition according to claim 3, characterized in that, in Step1 of step 1), the secure decomposition of the adjacency matrix A is performed by setting up a third-party server: the data owner generates a sparse random 01 matrix P, computes and uploads A′=PAP-1 to the third-party server; the third-party server computes the eigenvalue decomposition of A′ by the above iterative procedure and returns the results X′ and Λ to the data owner, so that A′=X′ΛX′-1; the data owner computes X=P-1X′ to obtain the matrix decomposition result. 5.根据权利要求3所述的一种基于矩阵分解的税务数据安全图神经网络训练方法，其特征在于，步骤1)的Step2中，对于两层GCN，节点嵌入受其两跳范围内的邻居影响，两跳范围内的邻居用邻接矩阵的平方A2表示，A2能够有效表明图的连接关系和节点间的信息传递；记节点数为n，将原始邻接矩阵分解成k个矩阵，分解后的每个矩阵包含n/k个特征值，缺少其余n-n/k个特征值，在获得全部特征值的前提下，正确排列的概率为p。5. A tax data security graph neural network training method based on matrix decomposition according to claim 3, characterized in that, in Step2 of step 1), for a two-layer GCN, a node's embedding is influenced by its neighbors within two hops, which are represented by the square A2 of the adjacency matrix; A2 effectively describes the connectivity of the graph and the message passing between nodes; let the number of nodes be n and decompose the original adjacency matrix into k matrices, so that each decomposed matrix contains n/k eigenvalues and lacks the remaining n-n/k; on the premise that all eigenvalues are obtained, the probability of a correct arrangement is p. 6.根据权利要求5所述的一种基于矩阵分解的税务数据安全图神经网络训练方法，其特征在于，当n=100，k=2时，p≈3.3×10-6。6. A tax data security graph neural network training method based on matrix decomposition according to claim 5, characterized in that when n=100 and k=2, p≈3.3×10-6.
7.根据权利要求3所述的一种基于矩阵分解的税务数据安全图神经网络训练方法，其特征在于，步骤2)中，基于差分隐私的特征矩阵保护包括：7. A tax data security graph neural network training method based on matrix decomposition according to claim 3, characterized in that, in step 2), the feature matrix protection based on differential privacy includes: Step1:隐私预算及全局敏感度计算Step1: Privacy budget and global sensitivity calculation 应用拉普拉斯机制，对税务数据图的特征矩阵X进行差分隐私保护，根据设置的隐私预算∈，计算全局敏感度Δf：Δf=maxD,D′{|h-h′|}，其中D、D′为一对相邻数据，h、h′分别是针对D、D′的随机查询的结果；令b=Δf/∈，设置要添加的拉普拉斯噪声分布如下：Lap(x|b)=(1/(2b))exp(-|x|/b)；Apply the Laplacian mechanism to perform differential privacy protection on the feature matrix X of the tax data graph, and compute the global sensitivity Δf according to the set privacy budget ∈: Δf=maxD,D′{|h-h′|}, where D and D′ are a pair of adjacent datasets and h, h′ are respectively the results of the random query on D and D′; let b=Δf/∈ and set the Laplacian noise distribution to be added as Lap(x|b)=(1/(2b))exp(-|x|/b); Step2:噪声注入Step2: Noise injection 对税务数据图的特征矩阵X，插入上一步生成的拉普拉斯噪声，获得隐私保护的特征矩阵X′。Insert the Laplacian noise generated in the previous step into the feature matrix X of the tax data graph to obtain the privacy-preserving feature matrix X′. 8.根据权利要求7所述的一种基于矩阵分解的税务数据安全图神经网络训练方法，其特征在于，步骤2)的Step2中，拉普拉斯机制满足∈-差分隐私，即：Pr[M(D)=y]≤e∈Pr[M(D′)=y]，其中M为所应用的处理机制。8. A tax data security graph neural network training method based on matrix decomposition according to claim 7, characterized in that, in Step2 of step 2), the Laplacian mechanism satisfies ∈-differential privacy, that is, Pr[M(D)=y]≤e∈Pr[M(D′)=y], where M is the applied processing mechanism. 9.
A tax data security graph neural network training method based on matrix decomposition according to claim 7, characterized in that, in step 3), the parameter-server-based model training and integration includes: Step1:数据分配Step1: Data distribution 税务数据图的邻接矩阵A被分解为{Ak…}，k=1,2,…，税务数据图的特征矩阵X经过差分隐私处理得到X′；数据拥有方向计算方提供隐私保护的数据，每个计算方得到Ak和X′作为GNN模型的输入；The adjacency matrix A of the tax data graph is decomposed into {Ak…}, k=1,2,…, and the feature matrix X of the tax data graph is processed with differential privacy to obtain X′; the data owner provides the privacy-protected data to the computing parties, and each computing party receives Ak and X′ as input to the GNN model; Step2:基于隐私保护数据的模型训练Step2: Model training based on privacy-preserving data 选择图卷积神经网络模型进行训练，计算方k本地拥有两层的GCN模型，在分配给自身的数据上进行训练；其中第一层输入是节点特征矩阵X′和邻接矩阵Ak，经过信息传递与聚合后输出节点隐藏特征矩阵Hk,1=f(AkX′Wk,1)；A graph convolutional neural network model is selected for training; computing party k locally holds a two-layer GCN model and trains it on the data assigned to it; the first-layer input is the node feature matrix X′ and the adjacency matrix Ak, and after message passing and aggregation it outputs the hidden node feature matrix Hk,1=f(AkX′Wk,1); 第二层输入是第一层的输出Hk,1与邻接矩阵Ak，输出节点隐藏特征矩阵Hk,2=f(AkHk,1Wk,2)，用于节点分类或其他下游任务；The second-layer input is the first-layer output Hk,1 and the adjacency matrix Ak, and it outputs the hidden node feature matrix Hk,2=f(AkHk,1Wk,2), used for node classification or other downstream tasks; 计算方在训练后向由数据拥有方持有的参数服务器上传模型参数Wk,1、Wk,2，同时拉取参数服务器更新后的模型参数；参数服务器在收集各个参与方上传的模型参数后，借助分布式机器学习方法中的模型平均方式对模型参数进行整合，从而获得新的模型参数。After training, the computing party uploads the model parameters Wk,1 and Wk,2 to the parameter server held by the data owner and pulls the updated model parameters from it; after collecting the model parameters uploaded by all participants, the parameter server integrates them by the model-averaging approach from distributed machine learning to obtain new model parameters.
CN202310795131.2A 2023-06-30 2023-06-30 Tax data security graph neural network training method based on matrix decomposition Pending CN116861152A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310795131.2A CN116861152A (en) 2023-06-30 2023-06-30 Tax data security graph neural network training method based on matrix decomposition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310795131.2A CN116861152A (en) 2023-06-30 2023-06-30 Tax data security graph neural network training method based on matrix decomposition

Publications (1)

Publication Number Publication Date
CN116861152A true CN116861152A (en) 2023-10-10

Family

ID=88224495

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310795131.2A Pending CN116861152A (en) 2023-06-30 2023-06-30 Tax data security graph neural network training method based on matrix decomposition

Country Status (1)

Country Link
CN (1) CN116861152A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117371046A (en) * 2023-12-07 2024-01-09 清华大学 Multi-party collaborative optimization-oriented data privacy enhancement method and device
CN117371046B (en) * 2023-12-07 2024-03-01 清华大学 A data privacy enhancement method and device for multi-party collaborative optimization

Similar Documents

Publication Publication Date Title
Zhao et al. Inprivate digging: Enabling tree-based distributed data mining with differential privacy
CN108519981B (en) Cross-chain intelligent contract cooperation possibility evaluation method
Piao et al. Privacy-preserving governmental data publishing: A fog-computing-based differential privacy approach
Cheng et al. Staticgreedy: solving the scalability-accuracy dilemma in influence maximization
Chen et al. Differentially private transit data publication: a case study on the montreal transportation system
Amelkin et al. A distance measure for the analysis of polar opinion dynamics in social networks
CN112101577B (en) XGboost-based cross-sample federal learning and testing method, system, device and medium
CN114092729B (en) Heterogeneous electricity utilization data publishing method based on cluster anonymization and differential privacy protection
CN116861152A (en) Tax data security graph neural network training method based on matrix decomposition
CN116628360A (en) Social network histogram issuing method and device based on differential privacy
Kang et al. Enhanced privacy preserving for social networks relational data based on personalized differential privacy
WO2016106944A1 (en) Method for creating virtual human on mapreduce platform
CN116186780A (en) A privacy protection method and system based on noise perturbation in a collaborative learning scenario
El Mfadel et al. Notes on local and nonlocal intuitionistic fuzzy fractional boundary value problems with Caputo fractional derivatives
Jiao et al. A Differential Privacy Federated Learning Scheme Based on Adaptive Gaussian Noise.
CN116361759B (en) Intelligent compliance control method based on quantitative authority guidance
CN116150483A (en) Electronic certificate recommendation method, equipment and storage medium based on Bayesian network model
CN115994381A (en) Sensitive data identification method and system for project secret assessment
CN116015939A (en) Advanced persistent threat interpretation method based on atomic technology template
CN109522750A (en) A kind of new k anonymity realization method and system
CN107798249A (en) The dissemination method and terminal device of behavioral pattern data
Huang et al. A federated graph neural network framework for privacy-preserving personalization
CN117688591B (en) Encryption method and system for OFD format document
CN111882416A (en) A training method and related device for a risk prediction model
Liu et al. Protection of user data by differential privacy algorithms

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination