CN111724241B - Enterprise invoice virtual issuing detection method based on dynamic edge feature graph attention network - Google Patents


Info

Publication number: CN111724241B
Authority: CN (China)
Legal status: Active
Application number: CN202010507242.5A
Other languages: Chinese (zh)
Other versions: CN111724241A
Inventors: 董博, 王伊杨, 郑庆华, 高宇达, 阮建飞, 王嘉祥
Current assignee: Xian Jiaotong University
Original assignee: Xian Jiaotong University
Application filed by Xian Jiaotong University
Priority to CN202010507242.5A
Publication of CN111724241A
Application granted
Publication of CN111724241B


Classifications

    • G06Q40/10 Tax strategies
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/045 Combinations of networks
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G06Q40/123 Tax preparation or submission
    • G06Q50/26 Government or public services


Abstract

The invention discloses an enterprise invoice virtual issuing detection method based on a dynamic edge-feature graph attention network, comprising the following steps: first, a dynamic enterprise transaction network is constructed from tax-related data, and node features and edge features are extracted; second, the transaction network features of each time period are extracted with an edge-feature-enhanced graph attention network; third, time-series features are extracted with an LSTM, and an invoice virtual issuing detection model is constructed with a deep neural network; next, the network parameters are adjusted by training the invoice virtual issuing detection model; finally, the trained model performs invoice virtual issuing detection on tax payment data. The method does not depend on expert experience, analyzes the enterprise transaction network dynamically in combination with historical information, and takes the multidimensional features of enterprise transaction edges into account; as a result, it can detect different patterns of invoice virtual issuing behavior and improves detection accuracy.

Description

Enterprise invoice virtual issuing detection method based on dynamic edge feature graph attention network
Technical Field
The invention belongs to the technical field of tax inspection, and particularly relates to an enterprise invoice virtual issuing (i.e., false invoicing) detection method based on a graph attention network with dynamic edge features.
Background
With the development of the Golden Tax Project Phase III and "Internet + Taxation", tax authorities have accumulated massive amounts of tax-related data, which provide strong data support for invoice virtual issuing detection. However, because invoice virtual issuing behaviors are diverse and concealed, achieving efficient and accurate invoice virtual issuing detection from tax-related data remains an urgent problem.
The following documents describe related methods for detecting invoice virtual issuing using tax-related data:
Document 1. An invoice virtual issuing early warning method and system (201711457960.0);
Document 2. A method for detecting falsely issued value-added tax special invoices based on parallel loop detection (201710147850.8);
Document 3. An invoice virtual issuing identification method and system based on positive and unlabeled learning (201910636175.4).
Document 1 proposes an invoice virtual issuing early warning method. It builds dimension tables from the target enterprise's invoice sale information and ticket information, joins the invoice records in the sales invoice dataset with the invoice sale and ticket information in the dimension tables under preset conditions to form a result table, and extracts from the result table the top-amount invoices (invoices issued at or near the maximum face value) whose amount exceeds a threshold within a set time period, thereby providing early warning information on taxpayers suspected of virtual issuing.
Document 2 proposes a method for detecting falsely issued value-added tax special invoices based on parallel loop detection. It detects such invoices by loop detection, improves the loop detection procedure, and uses distributed parallel computing to distribute the calculation tasks across multiple computers in a cluster, thereby improving computational efficiency.
Document 3 proposes an invoice virtual issuing identification method and system based on positive and unlabeled learning, comprising the following steps: first, feature processing and encoding are applied to the basic information of taxpayers; second, basic features and network features are combined as the feature space, two classifiers are trained with a proposed cyclic multi-spy negative-example labeling method, and the intersection of all preliminary negative sample sets obtained by the two classifiers is taken as the final negative sample set; then, an invoice virtual issuing prediction model is built with a k-nearest-neighbor regression co-training algorithm, using the mined reliable negative samples and the positive samples as the training set; finally, the features of unlabeled enterprise samples are fed into the prediction model to identify whether each enterprise engages in invoice virtual issuing.
The methods described in the above documents mainly have the following problems: the methods of Documents 1 and 2 rely on expert experience and target specific invoice virtual issuing patterns, so they cannot cope with complex and diverse invoice virtual issuing behaviors; the method of Document 3 is based on a static enterprise relationship network and cannot dynamically analyze the invoice virtual issuing risk of tax-paying enterprises in combination with historical transaction information, and ignoring this information reduces detection accuracy.
Disclosure of Invention
To address the shortcomings of the above documents, the invention provides an enterprise invoice virtual issuing detection method based on a dynamic edge-feature graph attention network. First, a dynamic enterprise transaction network is constructed from tax-related data, and node features and edge features are extracted; second, the transaction network features of each time period are extracted with an edge-feature-enhanced graph attention network; third, the network features extracted by the edge-feature-enhanced graph attention network are fed into a Long Short-Term Memory (LSTM) network to obtain time-series features; then, an invoice virtual issuing detection model is constructed with a deep neural network, and its network parameters are adjusted by training; finally, the trained model performs invoice virtual issuing detection on target enterprises. Using the tax-related data of a subset of labeled enterprises, the invention builds an invoice virtual issuing detection model applicable to all tax-paying enterprises. It thereby addresses three problems of existing detection approaches: they depend on expert experience; they target specific virtual issuing patterns and cannot cope with complex and diverse virtual issuing behaviors; and they cannot dynamically analyze the virtual issuing risk of tax-paying enterprises in combination with historical transaction information.
The invention is realized by adopting the following technical scheme:
The enterprise invoice virtual issuing detection method based on a dynamic edge-feature graph attention network comprises: first, constructing a dynamic enterprise transaction network from tax-related data and extracting node features and edge features; second, extracting the transaction network features of each time period with an edge-feature-enhanced graph attention network; third, extracting time-series features with an LSTM and constructing an invoice virtual issuing detection model with a deep neural network; then, adjusting the network parameters by training the invoice virtual issuing detection model; and finally, performing invoice virtual issuing detection on tax payment data with the trained model.
In a further refinement, the method comprises the following specific implementation steps:
1) Constructing dynamic enterprise transaction network and extracting node characteristics and edge characteristics
An enterprise transaction network represents the transaction relationships among enterprises as a network structure. A dynamic transaction network is one whose transaction relationships change over time, so that its network structure changes accordingly;
(1) Construction of dynamic enterprise transaction network
Step 1. Determine the key fields: first, preprocess the invoice records and delete abnormal records; then extract from each invoice record the seller taxpayer's electronic archive number, the buyer taxpayer's electronic archive number, and the invoicing time. The seller and buyer electronic archive numbers represent nodes, and the invoicing time is used to partition the transaction network;
Step 2. Determine the time span and partition the transaction records: find the maximum and minimum values of the invoicing time field to determine the time range of invoice transactions, divide the whole time span into T equal parts, and obtain the transaction records of each of the T time periods;
Step 3. Construct the enterprise transaction network: based on undirected graph theory, the seller and buyer taxpayer electronic archive numbers in each time period are merged and de-duplicated to obtain the full node set. Let V denote the node set of the transaction network, i.e., the set of taxpayers, and let $v_i \in V$, $i = 1, \dots, N$, denote a tax-paying enterprise, where $N = |V|$ is the number of tax-paying enterprises in the transaction network. The transaction relationships between tax-paying enterprises are the edges of the transaction network and are denoted E. Because transaction relationships depend on time, $E_t$ denotes the enterprise transaction relationships at time t, where $E_t \in \mathbb{R}^{N \times N}$. The edge $e_{i,j,t} = (v_i, v_j, t) \in E$ indicates that enterprises $v_i$ and $v_j$ transacted at time t. $A_t \in \mathbb{R}^{N \times N}$ is the adjacency matrix at time t; it contains only the elements 0 and 1, where 1 indicates a transaction edge between two enterprises and 0 indicates none, i.e., $A_{i,j,t} = 1$ if $e_{i,j,t} \in E_t$ and $A_{i,j,t} = 0$ otherwise. The transaction network is thus represented as $G = (V, E)$;
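The three construction steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: each cleaned invoice record is assumed to be a hypothetical `(seller_id, buyer_id, invoice_time)` tuple, `build_dynamic_network` is an illustrative name, and a record falling exactly at the end of the time span is assigned to the last period.

```python
from datetime import datetime

import numpy as np


def build_dynamic_network(records, T):
    """Split cleaned invoice records into T equal time periods and build one
    undirected 0/1 adjacency matrix per period.

    records: list of (seller_id, buyer_id, invoice_time) tuples (hypothetical layout).
    Returns (ids, A): the de-duplicated node list V and an int8 array of shape
    (T, N, N) with A[t, i, j] = 1 iff enterprises i and j transacted in period t.
    """
    times = [r[2] for r in records]
    t_min, t_max = min(times), max(times)
    span = (t_max - t_min) / T                     # width of one period (timedelta)
    # Node set V: seller and buyer archive numbers merged and de-duplicated.
    ids = sorted({r[0] for r in records} | {r[1] for r in records})
    index = {v: k for k, v in enumerate(ids)}
    A = np.zeros((T, len(ids), len(ids)), dtype=np.int8)
    for seller, buyer, when in records:
        k = int((when - t_min) / span) if span.total_seconds() > 0 else 0
        t = min(k, T - 1)                          # t_max itself goes to the last period
        i, j = index[seller], index[buyer]
        A[t, i, j] = A[t, j, i] = 1                # undirected edge e_{i,j,t}
    return ids, A


# Example: three invoices over 2019, split into T = 2 periods.
example = [
    ("S1", "B1", datetime(2019, 1, 1)),
    ("S2", "B1", datetime(2019, 6, 1)),
    ("S1", "S2", datetime(2019, 12, 31)),
]
ids, A = build_dynamic_network(example, T=2)
```

Keeping one shared node list across all periods means every $A_t$ has the same N × N shape, which is what a per-period graph network followed by an LSTM needs.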
(2) Extracting node characteristics of enterprise transaction network
Step 1. Select indicators: relevant indicators are screened from taxpayer attribute information and taxpayer financial statements. First, the basic information of the taxpayer is extracted; then, business scale information and business object information are selected from the business scope section of the enterprise information; finally, general financial and tax indicators are selected;
Step 2. Preprocess features: the indicators selected in Step 1 comprise three parts. For categorical features, missing data are first completed, and each field is then one-hot encoded into vector form; for numerical features, missing values are first completed, and each field is then z-score standardized;
Step 3. Merge node features
All features are combined into a matrix in which each row is the feature vector of one node. The node features of the transaction network are denoted $X \in \mathbb{R}^{N \times d}$, where N is the number of nodes and d is the dimension of the node features, and $x_v \in \mathbb{R}^d$ denotes the feature vector of node v;
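A minimal NumPy sketch of the node-feature preprocessing and merging steps follows. The patent does not say how missing values are completed, so this sketch assumes that a missing categorical value becomes its own `MISSING` category and that missing numerical values are replaced by the column mean; the function name and input layout are hypothetical.

```python
import numpy as np


def preprocess_node_features(categorical, numerical):
    """Build the node feature matrix X (N x d).

    categorical: list of columns, each a list of N category labels (None = missing).
    numerical: N x k array-like of floats, NaN = missing.
    """
    blocks = []
    for col in categorical:
        filled = ["MISSING" if v is None else v for v in col]   # assumed completion rule
        levels = sorted(set(filled))
        # One-hot encode: one 0/1 column per category level.
        blocks.append(np.array([[1.0 if v == lv else 0.0 for lv in levels] for v in filled]))
    num = np.asarray(numerical, dtype=float)
    num = np.where(np.isnan(num), np.nanmean(num, axis=0), num)  # assumed mean imputation
    std = num.std(axis=0)
    std[std == 0] = 1.0                                          # avoid dividing by zero
    blocks.append((num - num.mean(axis=0)) / std)                # z-score standardization
    return np.hstack(blocks)                                     # one row per node


# Example: 3 nodes, one categorical field and one numerical field.
X = preprocess_node_features([["A", "B", None]], [[1.0], [3.0], [np.nan]])
```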
(3) Extracting edge features of enterprise transaction network
Step 1. Extract statistical features: features extracted with statistical methods reflect the basic attributes of the transaction sequence; the mean, variance, maximum, minimum, sum, and median of the invoice record fields are extracted;
Step 2. Extract transaction proportion features: the transaction proportion reflects how much the transactions between the two trading parties weigh in each party's total business, and is calculated as follows:
$a_i = \frac{e_{ij}}{\sum_{k} e_{ik}}, \qquad a_j = \frac{e_{ij}}{\sum_{k} e_{jk}}$
where $e_{ij}$ is the total transaction amount between the i-th and j-th nodes, $a_j$ is the proportion of the transaction amount between nodes i and j in the total transaction amount of node j, and likewise $a_i$ is the proportion of that amount in the total transaction amount of node i. The calculation first computes the total amount associated with an edge, then the respective total amounts of the buyer and the seller on that edge, and finally the ratio of the edge total to each of the two totals, yielding a 2-dimensional edge feature;
Step 3. Extract the buyer-seller tax burden feature: this feature reflects the tax burden of the transactions between the buyer and the seller, and is calculated as $t_{ij} / e_{ij}$,
where $t_{ij}$ is the total tax on transactions between the i-th and j-th nodes; that is, the transaction tax burden is the ratio of the total tax on an edge to the total amount on that edge;
Step 4. Merge edge features
The edge features extracted in Steps 1 to 3 are combined into tensor form. The edge features of the transaction network at time t are denoted $X^e_t \in \mathbb{R}^{N \times N \times p}$, where p is the dimension of the edge features; $x^e_{vu,t} \in \mathbb{R}^p$ is the feature vector of transaction edge (v, u) at time t, and $X_{p,t} \in \mathbb{R}^{N \times N}$ is one channel of the edge features at time t;
2) Extract the transaction network features of each time period with the edge-feature-enhanced graph attention network (hereinafter EGAT)
(1) Define the conversion function, the node similarity function, and the influence factor function
Step 1. Define the conversion function: g is a conversion function that transforms node features. At layer l, where l denotes the current layer of the EGAT network, $g^l$ transforms $X^l_t$, the input of the l-th graph attention layer at time t, using $W^l$, the parameter matrix learned by the l-th layer;
Step 2. Define the node similarity function: f computes the similarity between connected nodes and produces an N × N tensor.
Here $N_i$ denotes the set of neighbor nodes of node i, i.e., $j \in N_i$; the attention mechanism a is the weight matrix between the layers of a single-layer feedforward neural network; $\Vert$ denotes concatenation and T denotes transposition;
Step 3. Define the influence factor function: α is the influence factor function; it produces an N × N matrix giving, on a given channel, the influence of the surrounding nodes on each node.
It is computed from $X_{p,t}$, the p-th channel of the edge features;
(2) Propagate network features through the edge-feature-enhanced graph attention network
Each edge has multidimensional features, and each feature dimension forms a channel. The output of the p-th channel of the l-th layer is computed from the following quantities: $X^l_t$, the input of the l-th graph attention layer at time t; the p-th edge-feature channel input to the l-th layer at time t; g, the conversion function that transforms node features; and α, the influence factor function, which represents how each node on a channel is affected by its surrounding nodes. First, the input is transformed by the conversion function $g^l$; then the final output Z is obtained by aggregating the information of the surrounding nodes on that channel. After the output of each channel is obtained, a channel-based attention mechanism assigns each channel a different weight, and all outputs are aggregated to give the output at time t;
The channel weight β is obtained by passing the features of channel p through several convolutional layers to produce a single value and then applying softmax(·) over the values of all channels, so that β gives the weight of each channel; here softmax(·) is the activation function used for classification and conv(·) is a two-dimensional convolution;
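A NumPy sketch of one EGAT layer with channel attention follows. The per-channel output follows the description (transform with g, then aggregate neighbors with α on each channel). How the convolutional scoring and the final aggregation work is an assumption here: each channel output Z_p passes through the same hypothetical stack of 2-D convolution kernels, the result is averaged to one score, the scores are softmax-normalized into β, and the layer output is the β-weighted sum of the channel outputs.

```python
import numpy as np


def conv2d_valid(x, k):
    """2-D 'valid' cross-correlation, the operation deep-learning libraries call convolution."""
    kh, kw = k.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(x[r:r + kh, c:c + kw] * k)
    return out


def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()


def egat_layer(X, W, alphas, kernels):
    """One EGAT layer at time t.

    X: (N, d) node input; W: (d, F) learnable matrix of g;
    alphas: list of P (N, N) influence matrices, one per edge channel;
    kernels: hypothetical conv kernels used to score every channel.
    Returns (output, beta): the beta-weighted sum of channel outputs and the weights.
    """
    H = X @ W                                  # conversion function g
    Z = [a_p @ H for a_p in alphas]            # Z_p: neighbour aggregation on channel p
    scores = []
    for Z_p in Z:
        s = Z_p
        for k in kernels:                      # stacked convolutions ...
            s = conv2d_valid(s, k)
        scores.append(s.mean())                # ... reduced to one value per channel
    beta = softmax(np.array(scores))           # channel attention weights
    return sum(b * Z_p for b, Z_p in zip(beta, Z)), beta


# Example: two channels (uniform averaging and identity), identity W.
X = np.arange(6.0).reshape(3, 2)
out, beta = egat_layer(X, np.eye(2), [np.full((3, 3), 1 / 3), np.eye(3)], [np.full((2, 2), 0.25)])
```

Concatenating the channel outputs instead of summing them is an equally plausible reading of "aggregated"; the weighted sum keeps the output width independent of the number of edge channels.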
3) Acquisition of time series features by LSTM
An LSTM is a special type of RNN that can learn long-term dependencies. After the output at each time step is obtained through step 2), these outputs are fed into the LSTM to obtain temporal features:
$\mathrm{LSTM}(X_1, X_2, \dots, X_t)$
where $X_t$ denotes the output of the EGAT network at time t;
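The LSTM step can be sketched with a plain NumPy LSTM that runs over the period outputs $X_1, \dots, X_T$, treating each node as one row of the batch. The gate ordering, the zero initial state, and the use of the last hidden state as the time-series feature are assumptions for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_forward(seq, Wx, Wh, b):
    """Run an LSTM over EGAT outputs X_1..X_T, one sequence per node.

    seq: list of T arrays of shape (N, F); Wx: (F, 4H); Wh: (H, 4H); b: (4H,).
    Gates are packed as [input, forget, candidate, output] (assumed order).
    Returns the last hidden state (N, H), used here as the time-series feature.
    """
    N, H = seq[0].shape[0], Wh.shape[0]
    h, c = np.zeros((N, H)), np.zeros((N, H))
    for x_t in seq:
        z = x_t @ Wx + h @ Wh + b
        i, f, g, o = np.split(z, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # cell state carries long-term memory
        h = sigmoid(o) * np.tanh(c)
    return h


# Example: T = 4 periods, N = 5 nodes, F = 3 features, H = 2 hidden units.
rng = np.random.default_rng(0)
seq = [rng.normal(size=(5, 3)) for _ in range(4)]
h = lstm_forward(seq, rng.normal(size=(3, 8)), rng.normal(size=(2, 8)), np.zeros(8))
```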
4) Invoice virtual issuing detection with a deep neural network
The time-series features obtained in step 3) are fed into an invoice virtual issuing detection classifier, which detects whether a tax-paying enterprise engages in virtual issuing. The classifier has a fully connected deep neural network structure and is constructed and trained as follows:
(1) Construct the invoice virtual issuing detection classifier
The invoice virtual issuing detection classifier is a neural network model, and constructing it comprises the following steps:
Step 1. Determine the input layer of the classifier: the number of input-layer neurons equals the dimension of the temporal features obtained by the LSTM;
Step 2. Determine the output layer of the classifier: the output layer has 1 neuron, its activation function is softmax, and its output is a probability value in the interval [0, 1], denoted $p_i$;
Step 3. Determine the hidden layers of the classifier: the hidden layers form a fully connected network;
The output of step 3) is fed into the input layer of the deep neural network, giving the final classification result $p_i = \mathrm{FC}(\mathrm{LSTM}(X_1, X_2, \dots, X_t))$. When $p_i \ge 0.5$, invoice virtual issuing behavior is indicated; when $p_i < 0.5$, no invoice virtual issuing behavior is indicated;
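A minimal sketch of the classifier's forward pass and the 0.5 decision rule follows. One deliberate deviation: with a single output neuron, softmax always returns 1, so this sketch uses the logistic sigmoid, which is what a two-class softmax reduces to. The ReLU hidden activation and the `(W, b)` layer format are assumptions.

```python
import numpy as np


def classify(h, layers, threshold=0.5):
    """Fully connected classifier on LSTM time features h (N, H).

    layers: list of (W, b) pairs; the last pair maps to a single output unit.
    Returns (p, flags): p[i] = probability that enterprise i issues false invoices,
    flags[i] = p[i] >= threshold.
    """
    x = h
    for W, b in layers[:-1]:
        x = np.maximum(x @ W + b, 0.0)        # hidden layers, ReLU (assumption)
    W, b = layers[-1]
    logits = (x @ W + b).ravel()
    p = 1.0 / (1.0 + np.exp(-logits))       # sigmoid = two-class softmax
    return p, p >= threshold


# Example: one hidden unit, hand-picked weights.
layers = [(np.array([[1.0]]), np.array([0.0])), (np.array([[2.0]]), np.array([-0.5]))]
p, flags = classify(np.array([[1.0], [-1.0]]), layers)
```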
(2) Train the invoice virtual issuing detection model
Step 1. Initialize the neural network parameters
Proper initialization of the neural network parameters avoids vanishing gradients when the network is deep and speeds up training. It should satisfy two conditions: the activations of each layer do not saturate, and the activations of each layer are not 0. Xavier initialization helps reduce the vanishing-gradient problem so that signals can propagate deeper into the network, so the network parameters are initialized with Xavier initialization,
in which the range of the weight $W_{i,j}$ between individual neurons is determined by $n_{in}$, the input dimension of the layer containing the parameter, and $n_{out}$, the output dimension of that layer;
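For reference, the common uniform variant of Xavier (Glorot) initialization draws each weight from U(-limit, limit) with limit = sqrt(6 / (n_in + n_out)), which gives a weight variance of 2 / (n_in + n_out). The variant used in the patent is not shown, so the uniform form here is an assumption.

```python
import numpy as np


def xavier_uniform(n_in, n_out, rng=None):
    """Xavier/Glorot uniform initialization: W[i, j] ~ U(-limit, limit),
    limit = sqrt(6 / (n_in + n_out)), so Var(W[i, j]) = 2 / (n_in + n_out)."""
    rng = np.random.default_rng() if rng is None else rng
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))


# Example: a 64 -> 32 layer; limit = sqrt(6 / 96) = 0.25.
W = xavier_uniform(64, 32, np.random.default_rng(0))
```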
Step 2. Determine the optimization objective
The classifier is trained to classify tax data correctly. Its classification performance is measured by a loss function: the smaller the loss, the better the classifier. The output layer of the classifier uses the softmax(·) activation function, and the network is trained to minimize the cross-entropy loss,
where $y_i$ is the label of tax-paying enterprise i (1 for an enterprise with invoice virtual issuing, 0 otherwise) and $p_i$ is the output of the invoice virtual issuing detection classifier, i.e., the probability that tax-paying enterprise i engages in invoice virtual issuing;
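Written out for M enterprises, the binary cross-entropy averaged over samples is L = -(1/M) Σ_i [y_i log p_i + (1 - y_i) log(1 - p_i)]. Below is a small NumPy version of it; averaging (rather than summing) over samples and clipping p_i away from 0 and 1 to avoid log(0) are assumptions.

```python
import numpy as np


def cross_entropy(y, p, eps=1e-12):
    """Mean binary cross-entropy between labels y (1 = false invoicing, 0 = not)
    and predicted probabilities p."""
    p = np.clip(p, eps, 1.0 - eps)               # keep log() finite
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


# Example: an uninformative classifier (p = 0.5) costs log 2 per enterprise.
loss = cross_entropy(np.array([1.0, 0.0]), np.array([0.5, 0.5]))
```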
Step 3. Adjust the network parameters of the model with the back-propagation (BP) algorithm. The learning process consists of the forward propagation of signals and the back-propagation of errors:
a) In forward propagation, the time-series features of the input tax payment data enter at the input layer of the classifier, are processed layer by layer by the hidden layers, and reach the output layer. If the actual output of the output layer differs from the corresponding label value, the process switches to the error back-propagation stage;
b) In error back-propagation, the output error of the classifier is propagated backward layer by layer through the hidden layers to the input layer and distributed to all units of each layer, giving an error signal for each layer that serves as the basis for correcting the unit weights;
c) Forward propagation of signals and back-propagation of errors, with the weights of each layer adjusted on every pass, are repeated. This continual weight adjustment is the learning (training) process of the network, and it continues until the network output error falls to an acceptable level or a preset number of training iterations is reached;
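The forward-propagation and error back-propagation loop of steps a) to c) can be sketched in NumPy for a one-hidden-layer classifier. This is an illustrative assumption rather than the patent's network: tanh hidden units, a sigmoid output with the mean binary cross-entropy (for which the output-layer error is simply (p_i - y_i) / n), Xavier-uniform initialization, plain gradient descent, and stopping when the loss drops below `tol` or after `max_epochs` iterations. The function names are hypothetical.

```python
import numpy as np


def train(X, y, hidden=8, lr=0.5, max_epochs=2000, tol=0.05, seed=0):
    """Train a 1-hidden-layer classifier by forward propagation + error back-propagation."""
    rng = np.random.default_rng(seed)
    d, n = X.shape[1], len(y)
    lim1, lim2 = np.sqrt(6 / (d + hidden)), np.sqrt(6 / (hidden + 1))   # Xavier uniform
    W1, b1 = rng.uniform(-lim1, lim1, (d, hidden)), np.zeros(hidden)
    W2, b2 = rng.uniform(-lim2, lim2, (hidden, 1)), np.zeros(1)
    for epoch in range(max_epochs):
        # a) forward propagation: input -> hidden -> output probability
        a1 = np.tanh(X @ W1 + b1)
        p = 1 / (1 + np.exp(-(a1 @ W2 + b2).ravel()))
        loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
        if loss < tol:                            # error small enough: stop
            break
        # b) error back-propagation: output error, then hidden-layer error signal
        d_out = ((p - y) / n)[:, None]            # dLoss/dlogit for sigmoid + cross-entropy
        d_hid = (d_out @ W2.T) * (1 - a1 ** 2)    # derivative of tanh
        # c) weight adjustment by gradient descent
        W2 -= lr * a1.T @ d_out
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_hid
        b1 -= lr * d_hid.sum(axis=0)
    return (W1, b1, W2, b2), loss, epoch


def predict(params, X):
    W1, b1, W2, b2 = params
    return 1 / (1 + np.exp(-(np.tanh(X @ W1 + b1) @ W2 + b2).ravel()))


# Example: learn logical OR on four toy samples.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 1.0])
_, loss0, _ = train(X, y, max_epochs=1)           # loss before any update
params, loss, epochs = train(X, y, max_epochs=5000)
```

In practice an autodiff framework would compute these gradients; writing them out by hand shows how the output error is distributed back to the hidden units in step b).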
5) Enterprise invoice virtual issuing detection
The tax payment data to be examined are processed through steps 1), 2), and 3); the resulting time-series features are fed into the invoice virtual issuing detection classifier, and whether the enterprise engages in invoice virtual issuing is determined from the classifier's output.
The invention has at least the following beneficial technical effects:
The invention provides an enterprise invoice virtual issuing detection method based on a dynamic edge-feature graph attention network. It addresses the problems that existing detection approaches depend on expert experience, target specific virtual issuing patterns and therefore cannot cope with complex and diverse virtual issuing behaviors, and cannot dynamically analyze the virtual issuing risk of tax-paying enterprises in combination with historical transaction information. Compared with the prior art, the method has the following advantages:
1. Features of the inter-enterprise transaction network are extracted by machine learning to build the enterprise invoice virtual issuing detection model, so the method does not rely on expert experience and can cope with complex and diverse invoice virtual issuing behaviors;
2. The edge-enhanced graph attention network considers both the basic information of tax-paying enterprises and the multidimensional information of transaction edges, so more concealed invoice virtual issuing patterns can be mined;
3. By incorporating the historical transaction information of enterprises, the method captures the dynamic evolution of the enterprise transaction network and enables dynamic analysis of the invoice virtual issuing risk of tax-paying enterprises.
In summary, the invention introduces an enterprise invoice virtual issuing detection method based on a dynamic edge-feature graph attention network. First, the edge feature enhancement fully exploits the multidimensional edge features of the transaction network; second, the graph attention mechanism lets the model learn the global feature dependencies among enterprises in the transaction network; and third, the long short-term memory network learns the abnormal features of invoice virtual issuing from historical transaction information.
Drawings
Fig. 1 is a flow chart of the overall framework.
Fig. 2 is a flow chart of building the dynamic enterprise transaction network and extracting node features and edge features.
Fig. 3 is a flow chart of extracting the transaction network features of each time period using EGAT.
Fig. 4 is a schematic diagram of the temporal feature extraction network.
Fig. 5 is a schematic diagram of the overall structure of the enterprise invoice virtual issuing detection model.
Fig. 6 is a schematic diagram of the network structure of the invoice virtual issuing detection classifier.
Fig. 7 is a flow chart of determining the network parameters of the invoice virtual issuing detection model.
Detailed Description
To illustrate the technical scheme of the invention more clearly, the enterprise invoice virtual issuing detection method based on the dynamic edge-feature graph attention network is described in detail below with reference to the accompanying drawings and a specific embodiment.
In this embodiment, a Shaanxi province enterprise invoice virtual issuing detection model is established using Shaanxi province tax data consisting of partially labelled positive and negative samples. As shown in Fig. 1, the invention mainly includes the following steps:
Step 1, constructing a dynamic enterprise transaction network and extracting node features and edge features
Because the Shaanxi provincial tax data contain a large amount of text and cannot be used directly for invoice virtual issuing detection based on an enterprise transaction network, the enterprise transaction network must be constructed first. Fig. 2 is a flow chart of constructing the dynamic enterprise transaction network and extracting node features and edge features. The specific steps for constructing the dynamic Shaanxi tax-paying enterprise transaction network and extracting its node features and edge features are as follows:
S201, constructing a dynamic enterprise transaction network
First, the Shaanxi province invoice records are preprocessed to delete abnormal records. Then the seller's taxpayer electronic file number, the buyer's taxpayer electronic file number and the invoicing time are extracted from each invoice record; the two taxpayer electronic file numbers identify the nodes, and the invoicing time is used to partition the transaction network.
The maximum and minimum of the invoicing time field are found to determine the time range of the invoice transactions. This time span is divided into T equal parts, giving the transaction records of T time periods.
Treating the network as an undirected graph, the seller and buyer taxpayer electronic file numbers in each time period are merged and deduplicated to obtain the full node set. $V$ denotes the node set of the Shaanxi province transaction network, i.e., the set of Shaanxi tax-paying enterprises, and $v_i \in V$, $i = 1, \dots, n$, denotes one tax-paying enterprise, where $n = |V|$ is the number of tax-paying enterprises in the network. The transaction relations among the enterprises form the edges of the network, denoted $E$. Because transaction relations vary with time, $E_t \in \mathbb{R}^{n \times n}$ denotes the enterprise transaction relations at time $t$, and $e_{i,j,t} = (v_i, v_j, t) \in E$ denotes the edge created when enterprises $v_i$ and $v_j$ transact at time $t$. The adjacency matrix at time $t$ is $A_t \in \mathbb{R}^{n \times n}$; its entries are only 0 or 1, where 1 indicates a transaction edge between two enterprises and 0 indicates none, i.e., $A_{i,j,t} = 1$ if $e_{i,j,t} \in E_t$ and $A_{i,j,t} = 0$ otherwise. The transaction network is thus represented by $G = (V, E)$.
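As a minimal sketch of S201, assuming each cleaned invoice record has been reduced to a `(seller_id, buyer_id, invoice_time)` tuple with string IDs and a numeric time (for example, days since some epoch), the following code merges and deduplicates the two ID columns into the node set $V$, splits the time span into $T$ equal periods, and fills one symmetric 0/1 adjacency matrix $A_t$ per period. The function name and record layout are hypothetical.

```python
import numpy as np

def build_dynamic_network(records, T):
    """records: list of (seller_id, buyer_id, invoice_time); T: number of periods.
    Returns the node index map and an array A of shape (T, n, n) with A[t] = A_t."""
    # Merge and deduplicate seller and buyer IDs to obtain the node set V
    ids = sorted({r[0] for r in records} | {r[1] for r in records})
    index = {v: i for i, v in enumerate(ids)}
    n = len(ids)
    # Split [t_min, t_max] into T equal periods
    times = [r[2] for r in records]
    t_min, t_max = min(times), max(times)
    width = (t_max - t_min) / T or 1.0  # guard against a zero-length span
    A = np.zeros((T, n, n), dtype=np.int8)
    for seller, buyer, ts in records:
        k = min(int((ts - t_min) / width), T - 1)  # t_max falls in the last period
        i, j = index[seller], index[buyer]
        A[k, i, j] = A[k, j, i] = 1  # undirected transaction edge
    return index, A
```

The last period is closed on the right so that the record with the maximum invoicing time is not dropped.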
S202, extracting node features of the enterprise transaction network
Based on the dynamic Shaanxi tax-paying enterprise transaction network constructed in S201, relevant indexes are screened from taxpayer attribute information, taxpayer financial statements and similar sources: first, the basic information of the taxpayer is extracted; then, business scale information and business counterpart information are selected from the enterprise's business scope information; finally, some general financial and tax indexes are selected. The selected indexes fall into three types: numerical qualitative, numerical quantitative, and textual qualitative features. The indexes selected in this embodiment include registration type, taxpayer status, business scope, number of employees, total investment, sales and the like. Numerical qualitative features are first imputed, and each field is then one-hot encoded into vector form; numerical quantitative features are first imputed for missing values, and each field is then z-score normalized.
All features are combined into a matrix in which each row holds the features of one node. $X \in \mathbb{R}^{n \times d}$ denotes the node features of the Shaanxi province transaction network, where $n$ is the number of Shaanxi tax-paying enterprises and $d$ is the feature dimension of each enterprise, and $x_v \in \mathbb{R}^{d}$ is the feature vector of enterprise $v$.
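The preprocessing in S202 can be sketched as below. This assumes qualitative fields arrive as lists of strings with `None` for missing values and quantitative fields as lists of numbers with `None` for gaps. Imputing categories with a `"missing"` level and numbers with the column mean are assumed strategies, since the text only says the data are completed; the helper names are hypothetical.

```python
import numpy as np

def one_hot(column, fill="missing"):
    """Impute missing categories with a `fill` level, then one-hot encode one field."""
    values = [fill if v is None else v for v in column]
    categories = sorted(set(values))
    out = np.zeros((len(values), len(categories)))
    for row, v in enumerate(values):
        out[row, categories.index(v)] = 1.0
    return out

def zscore(column):
    """Impute missing numbers with the column mean (assumed), then z-score normalize."""
    x = np.array([np.nan if v is None else v for v in column], dtype=float)
    x[np.isnan(x)] = np.nanmean(x)
    std = x.std()
    return ((x - x.mean()) / (std if std > 0 else 1.0)).reshape(-1, 1)

def build_node_features(categorical_cols, numeric_cols):
    """Stack all encoded fields column-wise: X has one row per enterprise (n x d)."""
    blocks = [one_hot(c) for c in categorical_cols] + [zscore(c) for c in numeric_cols]
    return np.hstack(blocks)
```

For example, one registration-type column and one employee-count column for three enterprises yield a $3 \times 3$ matrix: two one-hot columns plus one standardized column.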
S203, extracting edge features of the enterprise transaction network
Using statistical methods, the mean, variance, maximum, minimum, sum, median and similar statistics of the Shaanxi province invoice record fields are extracted as features.
Transaction proportion features are extracted. They reflect the weight of a specific transaction in the business of each of the two trading parties, and are calculated as:
$$a_i = \frac{e_{ij}}{\sum_{k \in N_i} e_{ik}}, \qquad a_j = \frac{e_{ij}}{\sum_{k \in N_j} e_{jk}}$$
where $e_{ij}$ is the total amount of the transactions between enterprise $i$ and enterprise $j$, $N_i$ and $N_j$ are the sets of trading partners of enterprises $i$ and $j$, $a_j$ is the proportion of that amount in the total transaction amount of enterprise $j$, and likewise $a_i$ is its proportion in the total transaction amount of enterprise $i$. The calculation first computes the total amount of the edge, then the respective total amounts of the two parties, and finally the proportion of the edge amount in the seller's total and in the buyer's total, giving a 2-dimensional edge feature.
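A sketch of the transaction proportion features, assuming the per-edge totals $e_{ij}$ for one time period are already held in a symmetric $n \times n$ NumPy array `amount` (zero where no transaction occurred). Each enterprise's total is its row sum; the function name is hypothetical.

```python
import numpy as np

def transaction_proportion(amount):
    """amount[i, j]: total invoiced amount e_ij between enterprises i and j (symmetric).
    Returns an n x n x 2 array holding (a_i, a_j) for every edge."""
    totals = amount.sum(axis=1)                  # total transaction amount of each enterprise
    safe = np.where(totals > 0, totals, 1.0)     # isolated nodes: avoid division by zero
    a_i = amount / safe[:, None]                 # share of e_ij in enterprise i's total
    a_j = amount / safe[None, :]                 # share of e_ij in enterprise j's total
    return np.stack([a_i, a_j], axis=-1)
```

For enterprise 0 trading 30 with enterprise 1 and 10 with enterprise 2, the edge (0, 1) gets $a_i = 30/40 = 0.75$, while $a_j = 30/30 = 1$ because enterprise 1 trades only with enterprise 0.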
Purchase-sales tax burden features are extracted. They reflect the tax burden of the transactions between the two trading parties and are calculated as:
$$\tau_{ij} = \frac{t_{ij}}{e_{ij}}$$
where $t_{ij}$ is the total tax involved in the transactions between enterprise $i$ and enterprise $j$; the transaction tax burden of an edge is the ratio of its total tax to its total amount.
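The tax burden feature is an element-wise ratio. A minimal sketch, assuming symmetric $n \times n$ arrays `tax` ($t_{ij}$) and `amount` ($e_{ij}$) for one period, with the burden set to 0 wherever no transaction exists (an assumed convention); the function name is hypothetical.

```python
import numpy as np

def tax_burden(tax, amount):
    """Per-edge tax burden t_ij / e_ij; 0 where the two enterprises did not transact."""
    return np.divide(tax, amount,
                     out=np.zeros_like(tax, dtype=float),
                     where=amount > 0)
```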
The edge features of the Shaanxi province transaction network are combined into matrix form, denoted $X^{e}$: $X^{e}_{t} \in \mathbb{R}^{n \times n \times p}$ is the edge feature tensor at time $t$, where $p$ is the edge feature dimension; $x^{e}_{v,u,t} \in \mathbb{R}^{p}$ is the feature vector of transaction edge $(v, u)$ at time $t$; and $X_{p,t} \in \mathbb{R}^{n \times n}$ is one channel of the edge features at time $t$.
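The statistical features of S203 and the stacking into $X^{e}_{t}$ can be sketched as follows. This assumes the invoices of one period are given as `(i, j, amount)` triples with node indices already mapped, and that only the amount field is summarized (the text lists statistics of invoice record fields without naming the fields). Each edge's six statistics fill six channels; other $n \times n$ or $n \times n \times k$ edge features can be concatenated along the last axis so that channel $p$ is `Xe[..., p]`. Function names are hypothetical.

```python
import numpy as np

def edge_statistics(invoices, n):
    """invoices: list of (i, j, amount) for one period.
    Returns n x n x 6: mean, variance, max, min, sum, median of amounts per edge."""
    per_edge = {}
    for i, j, amt in invoices:
        key = (min(i, j), max(i, j))           # undirected edge
        per_edge.setdefault(key, []).append(amt)
    X = np.zeros((n, n, 6))
    for (i, j), amts in per_edge.items():
        a = np.asarray(amts, dtype=float)
        stats = [a.mean(), a.var(), a.max(), a.min(), a.sum(), np.median(a)]
        X[i, j] = X[j, i] = stats
    return X

def stack_channels(*features):
    """Concatenate edge features along the last axis to form X^e_t (n x n x p)."""
    return np.concatenate([f if f.ndim == 3 else f[..., None] for f in features], axis=-1)
```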
Step 2, extracting the transaction network features of each time period using the edge-feature-enhanced graph attention network (Edge-enhanced Graph Attention Network, EGAT for short)
The dynamic Shaanxi tax-paying enterprise transaction network constructed in Step 1 is the network from which features are extracted. Fig. 3 is a flow chart of extracting the transaction network features of each time period using EGAT. The specific steps are as follows:
S301, defining a transformation function, a node similarity function and an influence factor function
The transformation function converts the features of the enterprises and is defined as:
$$g^{l}(X^{l}_{t}) = X^{l}_{t} W^{l}$$
where $l$ is the index of the current EGAT layer, $X^{l}_{t}$ is the input of the $l$-th graph attention layer at time $t$, and $W^{l}$ is the parameter matrix learned by the $l$-th layer.
$f$ computes the similarity between connected nodes and produces an $n \times n$ tensor; the entry for each connected pair is computed from $a^{\top}\big[\,g^{l}(x_i) \,\|\, g^{l}(x_j)\,\big]$,
where $N_i$ is the set of neighbor nodes of enterprise node $i$ (i.e., $j \in N_i$); the attention mechanism $a$ is the weight matrix of a single-layer feedforward neural network; $\|$ denotes concatenation and $\top$ denotes transposition.
$\alpha$ is the influence factor function. For each channel it produces an $n \times n$ matrix giving the influence that the surrounding nodes exert on each enterprise node in that channel; it is computed from the similarity $f$ and
$X_{p,t}$, the $p$-th channel of the edge features.
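The three functions of S301 can be sketched in NumPy for one layer and one time step. The text does not spell out the exact formulas, so two assumptions are made and marked in the code: the pair score applies $\exp(\mathrm{LeakyReLU}(\cdot))$ to $a^{\top}[g(x_i)\,\|\,g(x_j)]$, following the usual graph attention network convention, and $\alpha$ for one channel multiplies these scores by the edge feature $X_{p,t}$ and normalizes each row to sum to 1. Splitting $a$ into two halves uses the identity $a^{\top}[h_i \,\|\, h_j] = a_{1}^{\top} h_i + a_{2}^{\top} h_j$. All names are hypothetical.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def transform(X, W):
    """g^l: linear transform of node features, X (n x d) @ W (d x h)."""
    return X @ W

def similarity(H, a, adj):
    """f: score a^T [h_i || h_j] for every pair, masked to connected pairs (j in N_i).
    ASSUMPTION: exp(LeakyReLU(.)) as in the standard graph attention network."""
    h = H.shape[1]
    left = H @ a[:h]                # a_1^T h_i
    right = H @ a[h:]               # a_2^T h_j
    scores = np.exp(leaky_relu(left[:, None] + right[None, :]))
    return scores * adj

def influence(f_scores, edge_channel):
    """alpha for one channel p: weight f by X_{p,t}, then normalize each row.
    ASSUMPTION: simple row normalization."""
    w = f_scores * edge_channel
    row = w.sum(axis=1, keepdims=True)
    return np.divide(w, row, out=np.zeros_like(w), where=row > 0)
```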
S302, propagating network features through the edge-feature-enhanced graph attention network
Each edge has a multidimensional feature, and each dimension forms a channel. Using the functions defined in S301, the output of the $p$-th channel of the $l$-th layer is:
$$Z^{l}_{p,t} = \alpha^{l}\big(X^{l}_{t}, X^{l}_{p,t}\big)\, g^{l}\big(X^{l}_{t}\big)$$
where $X^{l}_{t}$ is the input of the $l$-th graph attention layer at time $t$; $X^{l}_{p,t}$ is the input of the $p$-th edge-feature channel of the $l$-th layer at time $t$; $g$ is the transformation function, which converts the node features; and $\alpha$ is the influence factor function, which gives the influence of the surrounding nodes on each node in a given channel. The input is first transformed by $g^{l}$, and the output $Z$ is then obtained by aggregating the information of the enterprise nodes surrounding each node in that channel. After the output of every channel is obtained, a channel-based attention mechanism assigns each channel its own weight and aggregates all outputs, giving the output at time $t$:
$$X^{l+1}_{t} = \sum_{p} \beta_{p}\, Z^{l}_{p,t}$$
where $\beta_p$ is the weight of channel $p$: the features of channel $p$ are passed through multiple convolution layers to obtain a single value, and softmax is applied over the values of all channels:
$$\beta_{p} = \mathrm{softmax}_{p}\big(\mathrm{conv}(Z^{l}_{p,t})\big)$$
where softmax(·) is the softmax activation function, here normalizing the channel scores into weights, and conv(·) is a two-dimensional convolution.
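A sketch of the per-channel propagation and channel attention in S302, given per-channel influence matrices $\alpha_p$ and transformed features $g^{l}(X^{l}_{t})$. The multi-layer convolution that scores each channel is replaced by a single valid 2-D cross-correlation with a hypothetical kernel followed by averaging to one value; this is a simplification for illustration, not the embodiment's network. Names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_valid(x, kernel):
    """Valid 2-D cross-correlation (what deep-learning 'conv' layers compute)."""
    windows = sliding_window_view(x, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

def softmax(v):
    e = np.exp(v - v.max())         # subtract max for numerical stability
    return e / e.sum()

def egat_layer(H, alphas, kernel):
    """H: g^l(X^l_t), shape n x h; alphas: list of n x n matrices, one per channel p.
    Returns sum_p beta_p Z_p and the channel weights beta."""
    Z = [alpha @ H for alpha in alphas]                              # Z_p per channel
    scores = np.array([conv2d_valid(z, kernel).mean() for z in Z])   # one value per channel
    beta = softmax(scores)
    out = sum(b * z for b, z in zip(beta, Z))
    return out, beta
```

Because $\beta$ is a softmax over channels, channels whose aggregated features produce larger convolution responses receive more weight in the layer output.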
Step 3, obtaining temporal features through LSTM
The transaction network features of each time period extracted in Step 2 are fed into an LSTM to obtain temporal features. Fig. 4 shows the LSTM structure, in which the long short-term memory network has four layers, each with one LSTM neuron. The temporal feature output by the last layer of the LSTM is:
$$\mathrm{LSTM}(X_1, X_2, \dots, X_t)$$
where $X_t$ is the output of the EGAT network at time $t$.
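To spell out the recurrence, here is a single-layer LSTM forward pass in NumPy that consumes the per-period EGAT outputs $X_1, \dots, X_t$ (one row per enterprise) and returns the last hidden state as the temporal feature. The four-layer stack of Fig. 4 is not reproduced; the gate ordering, the concatenated weight layout `W` of shape $(d + k) \times 4k$ and the function name are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_last_hidden(seq, W, b):
    """seq: list of T arrays, each n x d (EGAT output per period).
    W: (d + k) x 4k weights, b: 4k bias, k = hidden size (hypothetical layout).
    Returns the hidden state after the last period, shape n x k."""
    n = seq[0].shape[0]
    k = b.shape[0] // 4
    h = np.zeros((n, k))
    c = np.zeros((n, k))
    for x in seq:
        z = np.concatenate([x, h], axis=1) @ W + b
        i, f, o, g = np.split(z, 4, axis=1)       # input, forget, output gates, candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h
```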
Step 4, detecting invoice virtual issuing with a deep neural network
Fig. 5 is a schematic diagram of the overall structure of the enterprise invoice virtual issuing detection model, which uses an invoice virtual issuing detection classifier to predict whether a tax-paying enterprise engages in invoice virtual issuing. The classifier is a fully connected deep neural network. It takes the temporal features of the enterprise transaction network obtained in Step 3 as input, its detection capability on the Shaanxi provincial tax data is optimized through training, and it is then used to detect whether Shaanxi tax-paying enterprises engage in invoice virtual issuing.
Detecting invoice virtual issuing with the deep neural network comprises the following steps:
1) Constructing the invoice virtual issuing detection classifier. The classifier detects invoice virtual issuing in tax data from the temporal features of tax-paying enterprises. In this embodiment it is a four-layer fully connected deep neural network, as shown in Fig. 6. The number of input units $L_1$ is determined by the dimension of the input temporal features, and $L_1 = 20$ in this embodiment. The second and third layers are hidden layers with $L_2$ and $L_3$ neurons respectively; in this embodiment $L_2 = 16$ and $L_3 = 8$. The fourth layer is the output layer; since the task is binary classification, the output layer has $L_4 = 1$ neuron. The output of the classifier, $p_i = \mathrm{FC}(\mathrm{LSTM}(X_1, X_2, \dots, X_t))$, is a probability in the interval $[0, 1]$: $p_i \ge 0.5$ indicates that invoice virtual issuing exists, and $p_i < 0.5$ indicates that it does not.
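The 20-16-8-1 classifier of this embodiment can be sketched as a NumPy forward pass with ReLU hidden layers and the 0.5 decision threshold. One deliberate substitution: the single output neuron uses a sigmoid here, because a softmax over one unit always returns 1. The small random initialization and the function names are illustrative assumptions.

```python
import numpy as np

LAYER_SIZES = [20, 16, 8, 1]  # L1..L4 from this embodiment

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(sizes, rng):
    """One (W, b) pair per layer transition; small random weights for illustration."""
    return [(rng.normal(0.0, 0.1, (m, k)), np.zeros(k)) for m, k in zip(sizes[:-1], sizes[1:])]

def classify(features, params, threshold=0.5):
    """features: n x 20 temporal features. Returns probabilities p_i and 0/1 flags."""
    x = features
    for W, b in params[:-1]:
        x = relu(x @ W + b)              # hidden layers use ReLU
    W, b = params[-1]
    p = sigmoid(x @ W + b).ravel()       # single output neuron -> probability in [0, 1]
    return p, (p >= threshold).astype(int)
```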
2) Training the invoice virtual issuing detection model to determine the neural network parameters. The detailed steps for determining the network parameters of the enterprise invoice virtual issuing detection model are shown in Fig. 7:
S701, initializing the neural network parameters
After the network structure is determined in Step 4, the network parameters must be determined. The neural network in this embodiment includes a convolutional neural network and fully connected layers, and all hidden layers use the rectified linear unit (ReLU) activation function, formally:
$$f(x) = \max(0, x)$$
where x is the input to the neuron.
The output layer of the invoice virtual issuing detection classifier uses the softmax activation function, formally:
$$S_i = \frac{e^{V_i}}{\sum_{j=1}^{D} e^{V_j}}$$
where $V_i$ is the output of the classifier's preceding output unit, $i$ is the class index and $D$ is the total number of classes. This example has two classes, enterprises with invoice virtual issuing and enterprises without it, so $D = 2$. $S_i$ is the ratio of the exponential of the current element to the sum of the exponentials of all elements.
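With $D = 2$ classes, this softmax reduces to a sigmoid of the difference between the two logits, since $S_1 = e^{V_1}/(e^{V_0} + e^{V_1}) = 1/(1 + e^{-(V_1 - V_0)})$. A two-unit softmax and a single sigmoid output therefore express the same probability. A quick numerical check with arbitrary example logits:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))   # subtract max for numerical stability
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

logits = np.array([0.3, 1.7])   # V_0 (no virtual issuing), V_1 (virtual issuing)
S = softmax(logits)
# For D = 2: S_1 = sigmoid(V_1 - V_0)
print(S, sigmoid(logits[1] - logits[0]))
```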
Initialization of the neural network parameters is important for training, and good initial parameters speed up convergence. This embodiment uses Xavier initialization, which helps reduce the vanishing-gradient problem so that signals can propagate deeper into the network. Its specific form is:
$$W_{i,j} \sim U\left[-\frac{\sqrt{6}}{\sqrt{n_{in} + n_{out}}},\ \frac{\sqrt{6}}{\sqrt{n_{in} + n_{out}}}\right]$$
where $n_{in}$ is the input dimension of the layer containing the parameters, $n_{out}$ is its output dimension, and $W_{i,j}$ is the weight between individual neurons.
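Xavier uniform initialization in NumPy, as a minimal sketch; the helper name is hypothetical, and the example shape is the first layer of the 20-16-8-1 classifier. The bound $\sqrt{6/(n_{in}+n_{out})}$ gives each weight variance $2/(n_{in}+n_{out})$, because a uniform distribution on $[-c, c]$ has variance $c^2/3$.

```python
import numpy as np

def xavier_uniform(n_in, n_out, rng):
    """W_ij ~ U[-sqrt(6/(n_in+n_out)), +sqrt(6/(n_in+n_out))]."""
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

rng = np.random.default_rng(42)
W1 = xavier_uniform(20, 16, rng)   # input layer -> first hidden layer
```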
S702, determining the optimization objective
The loss function of the invoice virtual issuing detection network is the cross-entropy:
$$L = -\frac{1}{N}\sum_{i=1}^{N}\big[y_i \log p_i + (1 - y_i)\log(1 - p_i)\big]$$
where $y_i$ is the label of tax-paying enterprise $i$ (1 for an enterprise with invoice virtual issuing, 0 for one without); $p_i$ is the probability that the enterprise engages in invoice virtual issuing; and $N$ is the number of labelled tax records in the region. In this embodiment these are the labelled Shaanxi province tax data, with 15876 samples. A smaller loss indicates better detection, so the optimization objective is to minimize the loss function.
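The loss can be computed as a mean binary cross-entropy. This sketch clips probabilities away from 0 and 1 to avoid $\log 0$; the clipping constant and the function name are assumptions.

```python
import numpy as np

def bce_loss(y, p, eps=1e-12):
    """Mean binary cross-entropy over N labelled enterprises.
    y: labels (1 = virtual issuing, 0 = none); p: predicted probabilities."""
    p = np.clip(p, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```

Confident correct predictions (0.9 for a positive, 0.1 for a negative) give a loss of $-\ln 0.9 \approx 0.105$, while less confident ones give a larger loss.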
S703, adjusting the invoice virtual issuing detection network parameters using the Shaanxi tax data
The network parameters of the model are adjusted with the backpropagation (BP) algorithm, which propagates the error from the output layer of the invoice virtual issuing detection classifier backward to its input layer.
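To illustrate the loop of forward propagation, error backpropagation and weight adjustment, the sketch below trains a reduced network (one ReLU hidden layer and a sigmoid output, not the full 20-16-8-1 classifier) with full-batch gradient descent on the cross-entropy loss. For a sigmoid output with cross-entropy, the error at the output logit is simply $p_i - y_i$. Training stops when the loss falls below a tolerance or after a preset number of epochs; the learning rate, tolerance, epoch limit and function name are hypothetical choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(X, y, hidden=8, lr=0.1, max_epochs=500, tol=0.05, seed=0):
    """Returns (initial loss, final loss, last epoch index)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1)); b2 = np.zeros(1)
    n = len(y)
    first_loss = None
    for epoch in range(max_epochs):
        # Forward propagation: input layer -> hidden layer -> output layer
        z1 = X @ W1 + b1
        a1 = np.maximum(0.0, z1)
        p = sigmoid(a1 @ W2 + b2).ravel()
        loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
        if first_loss is None:
            first_loss = loss
        if loss < tol:                      # error reduced to an acceptable level
            break
        # Error backpropagation: output layer -> hidden layer -> input layer
        d2 = (p - y)[:, None] / n           # dLoss/dlogit for sigmoid + cross-entropy
        dW2 = a1.T @ d2; db2 = d2.sum(axis=0)
        d1 = (d2 @ W2.T) * (z1 > 0)         # ReLU derivative
        dW1 = X.T @ d1; db1 = d1.sum(axis=0)
        # Weight adjustment
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return first_loss, loss, epoch
```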
Step 5, detecting enterprise invoice virtual issuing
Invoice virtual issuing detection is performed on the Shaanxi province tax data using the models obtained in Steps 1 to 4.
First, an enterprise transaction network is constructed from the unlabelled Shaanxi tax data to be examined; second, the transaction network features of each time period are extracted with the edge-feature-enhanced graph attention network; these features are then fed into the LSTM (long short-term memory network) to output temporal features; finally, the temporal features are fed into the invoice virtual issuing detection classifier, and the data exhibiting invoice virtual issuing are flagged. In practice, invoice virtual issuing is hard to detect because its patterns vary and the behavior is concealed; the method provides a workable solution for enterprise invoice virtual issuing detection.
It will be readily appreciated by those skilled in the art that the foregoing is merely illustrative of the present invention and is not intended to limit the invention, but any modifications, equivalents, improvements or the like which fall within the spirit and principles of the present invention are intended to be included within the scope of the present invention.

Claims (1)

1. An enterprise invoice virtual issuing detection method based on a dynamic edge-feature graph attention network, characterized in that: first, a dynamic enterprise transaction network is constructed from tax-related data, and node features and edge features are extracted; second, the transaction network features of each time period are extracted with an edge-feature-enhanced graph attention network; third, temporal features are extracted with an LSTM, and an invoice virtual issuing detection model is constructed with a deep neural network; then, the network parameters are adjusted by training the enterprise invoice virtual issuing detection model; finally, invoice virtual issuing detection is performed on tax data with the trained invoice virtual issuing detection model;
the method comprises the following specific implementation steps:
1) Constructing a dynamic enterprise transaction network and extracting node features and edge features
An enterprise transaction network represents the transaction relationships among enterprises as a network structure; a dynamic transaction network is one whose transaction relationships change over time, so that its network structure changes accordingly;
(1) Constructing the dynamic enterprise transaction network
Step 1, determining key fields: first, the invoice records are preprocessed to delete abnormal records; then the seller's taxpayer electronic file number, the buyer's taxpayer electronic file number and the invoicing time are extracted from each invoice record, where the two taxpayer electronic file numbers identify the nodes and the invoicing time is used to partition the transaction network;
Step 2, dividing the transaction records by time span: the maximum and minimum of the invoicing time field are found to determine the time range of the invoice transactions, and the whole time span is divided into T equal parts to obtain the transaction records of T time periods;
Step 3, constructing the enterprise transaction network: treating the network as an undirected graph, the seller and buyer taxpayer electronic file numbers in each time period are merged and deduplicated to obtain the full node set; $V$ denotes the node set of the transaction network, i.e., the set of tax-paying enterprises, and $v_i \in V$, $i = 1, \dots, n$, denotes one tax-paying enterprise, where $n = |V|$ is the number of tax-paying enterprises in the transaction network; the transaction relations of the tax-paying enterprises form the edges of the transaction network, denoted $E$; because transaction relations vary with time, $E_t \in \mathbb{R}^{n \times n}$ denotes the enterprise transaction relations at time $t$, and $e_{i,j,t} = (v_i, v_j, t) \in E$ denotes the edge created when enterprises $v_i$ and $v_j$ transact at time $t$; the adjacency matrix at time $t$ is $A_t \in \mathbb{R}^{n \times n}$, whose entries are only 0 or 1, where 1 indicates a transaction edge between two enterprises and 0 indicates none, i.e., $A_{i,j,t} = 1$ if $e_{i,j,t} \in E_t$ and $A_{i,j,t} = 0$ otherwise; thus, the transaction network is represented by $G = (V, E)$;
(2) Extracting node features of the enterprise transaction network
Step 1, selecting indexes: relevant indexes are screened from taxpayer attribute information and taxpayer financial statements: first, the basic information of the taxpayer is extracted; then, business scale information and business counterpart information are selected from the enterprise's business scope information; finally, general financial and tax indexes are selected;
Step 2, feature preprocessing: the indexes selected in Step 1 comprise three types, namely numerical qualitative, numerical quantitative and textual qualitative features; numerical qualitative features are first imputed, and each field is then one-hot encoded into vector form; numerical quantitative features are first imputed for missing values, and each field is then z-score normalized;
Step 3, merging node features
All features are combined into a matrix in which each row holds the features of one node; the node features of the transaction network are denoted $X \in \mathbb{R}^{n \times d}$, where $n$ is the number of nodes, $d$ is the node feature dimension, and $x_v \in \mathbb{R}^{d}$ is the feature vector of node $v$;
(3) Extracting edge features of the enterprise transaction network
Step 1, extracting statistical features: features extracted by statistical methods reflect the basic attributes of the transaction sequence; the mean, variance, maximum, minimum, sum and median of the invoice record fields are extracted;
Step 2, extracting transaction proportion features: the transaction proportion reflects the weight of a specific transaction in the business of each of the two trading parties, and is calculated as:
$$a_i = \frac{e_{ij}}{\sum_{k \in N_i} e_{ik}}, \qquad a_j = \frac{e_{ij}}{\sum_{k \in N_j} e_{jk}}$$
where $e_{ij}$ is the total amount of the transactions between the $i$-th node and the $j$-th node, $N_i$ and $N_j$ are the sets of trading partners of nodes $i$ and $j$, $a_j$ is the proportion of that amount in the total transaction amount of node $j$, and likewise $a_i$ is its proportion in the total transaction amount of node $i$; the calculation first computes the total amount associated with the edge, then the respective total amounts of the two parties, and finally the proportions of the edge amount in the seller's total and in the buyer's total, giving a 2-dimensional edge feature;
Step 3, extracting purchase-sales tax burden features: these reflect the tax burden of the transactions between the two trading parties and are calculated as:
$$\tau_{ij} = \frac{t_{ij}}{e_{ij}}$$
where $t_{ij}$ is the total tax involved in the transactions between the $i$-th node and the $j$-th node; the transaction tax burden of an edge is the ratio of its total tax to its total amount;
Step 4, merging edge features
The edge features extracted in Steps 1 to 3 are combined into matrix form, and the edge features of the transaction network are denoted $X^{e}$: $X^{e}_{t} \in \mathbb{R}^{n \times n \times p}$ is the edge feature tensor at time $t$, where $p$ is the edge feature dimension; $x^{e}_{v,u,t} \in \mathbb{R}^{p}$ is the feature vector of transaction edge $(v, u)$ at time $t$; and $X_{p,t} \in \mathbb{R}^{n \times n}$ is one channel of the edge features at time $t$;
2) Extracting the transaction network features of each time period with the edge-feature-enhanced graph attention network, hereinafter EGAT
(1) Defining a transformation function, a node similarity function and an influence factor function
Step 1, defining the transformation function: $g$ converts the features of the nodes as follows:
$$g^{l}(X^{l}_{t}) = X^{l}_{t} W^{l}$$
where $l$ is the index of the current EGAT layer, $X^{l}_{t}$ is the input of the $l$-th graph attention layer at time $t$, and $W^{l}$ is the parameter matrix learned by the $l$-th layer;
Step 2, defining the node similarity function: $f$ computes the similarity between connected nodes and produces an $n \times n$ tensor; the entry for each connected pair is computed from $a^{\top}\big[\,g^{l}(x_i) \,\|\, g^{l}(x_j)\,\big]$,
where $N_i$ is the set of neighbor nodes of node $i$ (i.e., $j \in N_i$); the attention mechanism $a$ is the weight matrix of a single-layer feedforward neural network; $\|$ denotes concatenation and $\top$ denotes transposition;
Step 3, defining the influence factor function: $\alpha$ produces, for each channel, an $n \times n$ matrix giving the influence that the surrounding nodes exert on each node in that channel; it is computed from the similarity $f$ and
$X_{p,t}$, the $p$-th channel of the edge features;
(2) Propagating network features through the edge-feature-enhanced graph attention network
Each edge has a multidimensional feature and each dimension forms a channel; the output of the $p$-th channel of the $l$-th layer is:
$$Z^{l}_{p,t} = \alpha^{l}\big(X^{l}_{t}, X^{l}_{p,t}\big)\, g^{l}\big(X^{l}_{t}\big)$$
where $X^{l}_{t}$ is the input of the $l$-th graph attention layer at time $t$; $X^{l}_{p,t}$ is the input of the $p$-th edge-feature channel of the $l$-th layer at time $t$; $g$ is the transformation function, which converts the node features; $\alpha$ is the influence factor function, which gives the influence of the surrounding nodes on each node in a given channel; the input is first transformed by $g^{l}$, and the final output $Z$ is then obtained by aggregating the information of the nodes surrounding each node in that channel; after the output of every channel is obtained, a channel-based attention mechanism assigns each channel its own weight and aggregates all outputs, giving the output at time $t$:
$$X^{l+1}_{t} = \sum_{p} \beta_{p}\, Z^{l}_{p,t}$$
where $\beta_p$ is the weight of channel $p$: the features of channel $p$ are passed through multiple convolution layers to obtain a single value, and softmax is applied over the values of all channels:
$$\beta_{p} = \mathrm{softmax}_{p}\big(\mathrm{conv}(Z^{l}_{p,t})\big)$$
where softmax(·) is the softmax activation function and conv(·) is a two-dimensional convolution;
3) Obtaining temporal features through LSTM
LSTM is a special type of recurrent neural network (RNN) that can learn long-term dependencies; after the output at each time is obtained in step 2), the outputs are fed into the LSTM to obtain the temporal features:
$$\mathrm{LSTM}(X_1, X_2, \dots, X_t)$$
where $X_t$ is the output of the EGAT network at time $t$;
4) Detecting invoice virtual issuing with a deep neural network
The temporal features obtained in step 3) are fed into the invoice virtual issuing detection classifier to detect whether a tax-paying enterprise engages in virtual issuing; the classifier is a fully connected deep neural network, and constructing it comprises the following steps:
(1) Constructing the invoice virtual issuing detection classifier
The invoice virtual issuing detection classifier is a neural network model, and constructing it comprises:
Step 1, determining the input layer of the classifier, whose number of neurons equals the dimension of the temporal features obtained through the LSTM;
Step 2, determining the output layer of the classifier, which has 1 neuron and uses the softmax activation function; its output is a probability in the interval $[0, 1]$, denoted $p_i$;
Step 3, determining the hidden layers of the classifier, which are fully connected;
the result of step 3) is fed into the input layer of the deep neural network, giving the final classification result $p_i = \mathrm{FC}(\mathrm{LSTM}(X_1, X_2, \dots, X_t))$; $p_i \ge 0.5$ indicates that invoice virtual issuing exists, and $p_i < 0.5$ indicates that it does not;
(2) Training the invoice virtual issuing detection model
Step 1, initializing the neural network parameters
Initialization of the neural network parameters should avoid vanishing gradients when the network is deep and should speed up training, which requires two conditions: the activations of each layer do not saturate, and the activations of each layer are not 0; Xavier initialization helps reduce the vanishing-gradient problem so that signals can propagate deeper into the network, so the network parameters are initialized with Xavier initialization, specifically:
$$W_{i,j} \sim U\left[-\frac{\sqrt{6}}{\sqrt{n_{in} + n_{out}}},\ \frac{\sqrt{6}}{\sqrt{n_{in} + n_{out}}}\right]$$
where $n_{in}$ is the input dimension of the layer containing the parameters, $n_{out}$ is its output dimension, and $W_{i,j}$ is the weight between individual neurons;
Step 2, determining the optimization objective
The classifier is trained to classify tax data correctly, and its classification performance is measured by a loss function: the smaller the loss, the better the classification; the output layer of the invoice virtual issuing detection classifier uses the softmax activation function, and the network is trained to minimize the cross-entropy function, so the optimization objective is:
$$\min\; L = -\frac{1}{N}\sum_{i=1}^{N}\big[y_i \log p_i + (1 - y_i)\log(1 - p_i)\big]$$
where $y_i$ is the label of tax-paying enterprise $i$ (1 for an enterprise with invoice virtual issuing, 0 for one without); $p_i$ is the output of the invoice virtual issuing detection classifier, i.e., the probability that tax-paying enterprise $i$ engages in invoice virtual issuing; and $N$ is the number of labelled samples;
Step 3. Adjusting the network parameters of the model with the back-propagation (BP) algorithm. The learning process consists of forward propagation of signals and back-propagation of errors, and comprises:
a) During forward propagation, the time-series features of the input tax payment data enter at the input layer of the invoice virtual issuing detection classifier, are processed by each hidden layer in turn, and reach the output layer. If the actual output of the output layer differs from the corresponding label value, training switches to the error back-propagation stage;
b) During error back-propagation, the output error of the classifier is propagated backward, layer by layer, through the hidden layers to the input layer and distributed to all units of each layer. This yields an error signal for each layer, which serves as the basis for correcting the weights of its units;
c) Forward propagation of signals and back-propagation of errors, each followed by adjustment of the weights of every layer, are repeated. This continual weight adjustment is the learning (training) process of the network. It continues until the output error of the network falls to an acceptable level or a preset number of training iterations is reached;
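The three-stage loop in a) to c) can be sketched as follows. This is a simplified stand-in, not the patent's training code. It back-propagates through a feed-forward network with one tanh hidden layer acting on a fixed-length feature vector. The patent's classifier contains an LSTM, whose gradients would instead be obtained by back-propagation through time. The names `train_bp` and `predict`, the learning rate, tolerance and epoch cap are hypothetical. Weights use Xavier uniform bounds, and the loss is binary cross-entropy.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(params, x):
    W1, b1, W2, b2 = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return sigmoid(sum(w * hk for w, hk in zip(W2, h)) + b2)


def train_bp(X, y, n_hidden=4, lr=0.5, tol=0.05, max_epochs=5000, seed=0):
    """Full-batch BP for a hypothetical 1-hidden-layer net (tanh hidden, sigmoid output)."""
    rng = random.Random(seed)
    n_in = len(X[0])
    a1 = math.sqrt(6.0 / (n_in + n_hidden))
    a2 = math.sqrt(6.0 / (n_hidden + 1))
    W1 = [[rng.uniform(-a1, a1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-a2, a2) for _ in range(n_hidden)]
    b2 = 0.0
    n = len(X)
    for epoch in range(1, max_epochs + 1):
        gW1 = [[0.0] * n_in for _ in range(n_hidden)]
        gb1, gW2, gb2, loss = [0.0] * n_hidden, [0.0] * n_hidden, 0.0, 0.0
        for x, t in zip(X, y):
            # a) forward propagation: input layer -> hidden layer -> output layer
            h = [math.tanh(sum(w * xi for w, xi in zip(W1[k], x)) + b1[k]) for k in range(n_hidden)]
            p = sigmoid(sum(w * hk for w, hk in zip(W2, h)) + b2)
            p_c = min(max(p, 1e-12), 1 - 1e-12)
            loss -= t * math.log(p_c) + (1 - t) * math.log(1 - p_c)
            # b) error back-propagation: output error signal, then hidden-layer error signals
            d_out = p - t  # gradient w.r.t. the output pre-activation (sigmoid + cross-entropy)
            for k in range(n_hidden):
                d_hid = d_out * W2[k] * (1 - h[k] ** 2)  # tanh derivative
                gW2[k] += d_out * h[k]
                gb1[k] += d_hid
                for j in range(n_in):
                    gW1[k][j] += d_hid * x[j]
            gb2 += d_out
        loss /= n
        if loss < tol:  # c) stop once the output error is acceptable
            break
        # c) adjust the weights of every layer using the averaged error signals
        for k in range(n_hidden):
            W2[k] -= lr * gW2[k] / n
            b1[k] -= lr * gb1[k] / n
            for j in range(n_in):
                W1[k][j] -= lr * gW1[k][j] / n
        b2 -= lr * gb2 / n
    # loss is the mean error measured in the last forward pass
    return (W1, b1, W2, b2), loss, epoch
```

The loop stops on whichever condition from c) is met first. Either the mean error drops below `tol`, or `max_epochs` passes have been made.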
5) Enterprise invoice virtual issuing detection
The tax payment data to be examined are processed according to steps 1), 2) and 3). The resulting time-series features are input into the trained invoice virtual issuing detection classifier, and the classifier's output determines whether the enterprise engages in invoice virtual issuing.
CN202010507242.5A 2020-06-05 2020-06-05 Enterprise invoice virtual issuing detection method based on dynamic edge feature graph annotation meaning network Active CN111724241B

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010507242.5A (CN111724241B) 2020-06-05 2020-06-05 Enterprise invoice virtual issuing detection method based on dynamic edge feature graph annotation meaning network

Publications (2)

Publication Number Publication Date
CN111724241A 2020-09-29
CN111724241B 2024-03-29

Family

ID=72566017

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112613928A * 2020-12-17 2021-04-06 Aisino Corporation Machine-learning-based method and system for preventing false issuing of value-added tax invoices
CN113642735B * 2021-07-28 2023-07-18 Inspur Software Technology Co., Ltd. Continual learning method for identifying false-invoicing taxpayers

Citations (5)

Publication number Priority date Publication date Assignee Title
CN108269180A (en) * 2016-12-29 2018-07-10 航天信息股份有限公司 Monitor the method and device that enterprise writes out falsely invoice
CN108595621A (en) * 2018-04-23 2018-09-28 泰华智慧产业集团股份有限公司 A kind of early warning analysis method and system write false value added tax invoice
CN109993641A (en) * 2017-12-28 2019-07-09 航天信息股份有限公司 A kind of invoice writes out falsely method for early warning and system
CN110852856A (en) * 2019-11-04 2020-02-28 西安交通大学 Invoice false invoice identification method based on dynamic network representation
WO2020082673A1 (en) * 2018-10-23 2020-04-30 深圳壹账通智能科技有限公司 Invoice inspection method and apparatus, computing device and storage medium

Non-Patent Citations (2)

Title
Zhang Guojun. Applications and Future Outlook of Blockchain Technology in Digital Tax Administration: Case Study of Blockchain Invoice System in Shenzhen Tax Service. STA, 2020, Vol. 1, No. 2, pp. 40-46. *
Liu Liping. Research on Directed-Graph-Based Methods for Detecting False VAT Invoice Issuing. 2017, pp. 36-47. *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant