CN111724241A - Enterprise invoice virtual invoice detection method based on dynamic edge feature enhanced graph attention network - Google Patents

Publication number
CN111724241A
Authority
CN
China
Prior art keywords
invoice
network
transaction
false
enterprise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010507242.5A
Other languages
Chinese (zh)
Other versions
CN111724241B (en)
Inventor
董博
王伊杨
郑庆华
高宇达
阮建飞
王嘉祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University
Priority to CN202010507242.5A
Publication of CN111724241A
Application granted
Publication of CN111724241B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/10 Tax strategies
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/12 Accounting
    • G06Q40/123 Tax preparation or submission
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Evolutionary Computation (AREA)
  • Accounting & Taxation (AREA)
  • Economics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Finance (AREA)
  • General Health & Medical Sciences (AREA)
  • Marketing (AREA)
  • Health & Medical Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Educational Administration (AREA)
  • Tourism & Hospitality (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Technology Law (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Primary Health Care (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for detecting false invoicing by enterprises based on a dynamic edge feature enhanced graph attention network. The method comprises the following steps: first, a dynamic enterprise transaction network is constructed from tax-related data, and node features and edge features are extracted; second, the transaction network features of each time period are extracted with the edge feature enhanced graph attention network; third, time-series features are extracted with an LSTM, and a false-invoicing detection model is built with a deep neural network; next, the network parameters are adjusted by training the detection model; finally, the trained model is used to detect false invoicing in tax payment data. The invention does not depend on expert experience, analyzes the enterprise transaction network dynamically using historical information, takes the multidimensional features of transaction edges into account, can detect different patterns of false invoicing, and improves detection accuracy.

Description

Enterprise invoice virtual invoice detection method based on dynamic edge feature enhanced graph attention network
Technical Field
The invention belongs to the technical field of tax inspection, and particularly relates to an enterprise false-invoicing detection method based on a dynamic edge feature enhanced graph attention network.
Background
With the development of the Golden Tax Phase III project and "Internet Plus Taxation", tax authorities have accumulated massive amounts of tax-related data, which provides favorable data support for false-invoicing detection. However, because false-invoicing behavior is diverse and well concealed, achieving efficient and accurate detection from tax-related data remains an urgent problem.
The following documents describe methods of detecting false invoicing from tax-related data that can serve as references:
Document 1. A false-invoicing early-warning method and system (201711457960.0);
Document 2. A method for detecting falsely issued special value-added tax invoices based on parallel loop detection (201710147850.8);
Document 3. A false-invoicing identification method and system based on positive and unlabeled learning (201910636175.4).
Document 1 proposes a false-invoicing early-warning method. It builds dimension tables from the invoice sales information and ticket information of a target enterprise, joins the invoice information in a sales invoice data set with those tables under preset conditions to form a result table, extracts from the result table the top-amount invoicing records whose invoice amount exceeds a threshold within a set time period, and issues a warning about taxpayers suspected of false invoicing based on those records.
Document 2 proposes a method for detecting falsely issued special value-added tax invoices based on parallel loop detection. It detects such invoices with a loop detection method, improves the loop detection, and distributes the computation across multiple computers in a distributed cluster, thereby improving computational efficiency.
Document 3 proposes a false-invoicing identification method and system based on positive and unlabeled learning, comprising the following steps: first, the basic information of taxpayers undergoes feature processing and encoding; second, the basic features and the network features are merged into a feature space, a binary classifier is trained with the proposed cyclic multi-spy negative-example labeling method, and the intersection of all preliminary negative sample sets obtained with that classifier gives the final negative sample set; then a false-invoicing prediction model is built with a k-nearest-neighbor regression co-training algorithm, using the mined reliable negative samples and the positive samples as the training set; finally, the features of an unlabeled enterprise sample are input into the prediction model to identify whether the enterprise has engaged in false invoicing.
The methods described in the above documents mainly have the following problems. The methods of Documents 1 and 2 depend on expert experience and each target one specific pattern of false invoicing, so they cannot cope with complex and varied false-invoicing behavior. The method of Document 3 is based on a static relationship network of enterprises and cannot dynamically analyze the false-invoicing risk of tax-paying enterprises using historical transaction information; ignoring this information reduces detection accuracy.
Disclosure of Invention
The invention aims to address the shortcomings of the existing false-invoicing detection methods in the documents above, and provides an enterprise false-invoicing detection method based on a dynamic edge feature enhanced graph attention network. First, a dynamic enterprise transaction network is constructed from tax-related data, and node features and edge features are extracted. Second, the transaction network features of each time period are extracted with the edge feature enhanced graph attention network. Third, the network features extracted by the edge feature enhanced graph attention network are used as input to a Long Short-Term Memory network (LSTM) to obtain time-series features. Then a false-invoicing detection model is constructed with a deep neural network, and its network parameters are adjusted by training. Finally, the trained model is used to detect false invoicing by the target enterprise. The invention builds a false-invoicing detection model applicable to all tax-paying enterprises from the tax payment data of a labeled subset of enterprises. It thereby solves the problems that existing detection methods depend on expert experience, target only one specific pattern of false invoicing and so cannot cope with complex and varied false-invoicing behavior, and cannot dynamically analyze the false-invoicing risk of tax-paying enterprises using historical transaction information.
The invention is realized by the following technical scheme:
The enterprise false-invoicing detection method based on a dynamic edge feature enhanced graph attention network comprises the following steps: first, a dynamic enterprise transaction network is constructed from tax-related data, and node features and edge features are extracted; second, the transaction network features of each time period are extracted with the edge feature enhanced graph attention network; third, time-series features are extracted with the LSTM, and a false-invoicing detection model is built with a deep neural network; next, the network parameters are adjusted by training the detection model; finally, the trained model is used to detect false invoicing in tax payment data.
The further improvement of the invention is that the method is realized as follows:
1) building dynamic enterprise transaction network and extracting node features and edge features
The enterprise transaction network is a network whose structure represents the transaction relationships among enterprises. The dynamic transaction network is a transaction network in which the transaction relationships, and therefore the network structure, change over time;
(1) building dynamic enterprise transaction networks
Step1. Determining the key fields: first, the invoice records are preprocessed and abnormal records are deleted; then the seller taxpayer electronic file number, the buyer taxpayer electronic file number and the invoicing time are extracted from each invoice record. The seller and buyer taxpayer electronic file numbers identify the nodes, and the invoicing time is used to partition the transaction network;
Step2. Determining the time span and partitioning the transaction records: the maximum and minimum values of the invoicing time field are found, the threshold of invoice transactions is determined, and the whole time span is divided into T equal parts, yielding the transaction records of T time periods;
Step3. Constructing the enterprise transaction network: based on undirected graph theory, the full set of nodes in each time period is obtained by merging and deduplicating the seller and buyer taxpayer electronic file numbers. The node set of the transaction network, i.e. the set of tax-paying enterprises, is denoted by $V$, and $v_i$ denotes a tax-paying enterprise, where $v_i \in V$, $i = 1, \ldots, N$, and $N = |V|$ is the number of tax-paying enterprises in the transaction network. The transaction relationships between tax-paying enterprises form the edges of the transaction network, denoted by $E$. Because the transaction relationships depend on time, $E_t$ denotes the transaction relationships at time $t$, where $E_t \in R^{n \times n}$. $e_{i,j,t}$ denotes that enterprise $v_i$ and enterprise $v_j$ transact at time $t$, producing an edge of the transaction network, where $e_{i,j,t} = (v_i, v_j, t) \in E$. $A_t$ denotes the adjacency matrix at time $t$, where $A_t \in R^{n \times n}$; it contains only the elements 0 and 1, where 1 indicates a transaction edge between two enterprises and 0 indicates none, i.e. $A_{i,j,t} = 1$ when $e_{i,j,t} \in E_t$ and $A_{i,j,t} = 0$ when $e_{i,j,t} \notin E_t$. The transaction network is therefore denoted by $G = (V, E)$;
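The partitioning and adjacency construction described in Step1 to Step3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the record layout `(seller_id, buyer_id, invoice_time)` and the function name `build_dynamic_network` are assumptions, and a single node set shared by all periods is assumed so that every $A_t$ has the same shape.

```python
from datetime import datetime

def build_dynamic_network(records, T):
    """Split invoice records into T equal time periods and build one
    symmetric 0/1 adjacency matrix per period.

    records: iterable of (seller_id, buyer_id, invoice_time) tuples
             (hypothetical layout; invoice_time is a datetime).
    """
    records = list(records)
    # Node set V: merge and deduplicate seller and buyer identifiers.
    nodes = sorted({r[0] for r in records} | {r[1] for r in records})
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)

    t_min = min(r[2] for r in records)
    t_max = max(r[2] for r in records)
    span = (t_max - t_min).total_seconds()

    adjacency = [[[0] * n for _ in range(n)] for _ in range(T)]
    for seller, buyer, when in records:
        if span == 0:
            t = 0
        else:
            # Map the timestamp to a period index in 0..T-1.
            t = min(int((when - t_min).total_seconds() / span * T), T - 1)
        i, j = index[seller], index[buyer]
        adjacency[t][i][j] = 1
        adjacency[t][j][i] = 1  # undirected graph, so A_t is symmetric
    return nodes, adjacency

records = [
    ("A", "B", datetime(2020, 1, 1)),
    ("B", "C", datetime(2020, 1, 20)),
    ("A", "C", datetime(2020, 3, 1)),
]
nodes, adjacency = build_dynamic_network(records, T=2)
```

With two periods, the first two invoices fall into period 0 and the last one into period 1, so the edge set changes between the two adjacency matrices.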
(2) extracting node characteristics of an enterprise transaction network
Step1. Index selection: relevant indicators are screened from the taxpayer attribute information and the taxpayer financial statements. First, the basic information of the taxpayer is extracted; then business scale information and business goods information are selected from the business scope information of the enterprise; finally, some general financial and tax indicators are selected;
Step2. Feature preprocessing: the indicators selected in Step1 fall into three parts: numerical qualitative features, numerical quantitative features and textual qualitative features. For numerical qualitative features, the data are first completed and each field is then OneHot encoded into vector form; for numerical quantitative features, missing values are first completed and each field is then z-score standardized;
Step3. Merging node features
All features are combined into a matrix in which each row holds the features of one node. The node features of the transaction network are denoted by $X$, $X \in R^{n \times d}$, where $n$ is the number of nodes, $d$ is the dimension of the node features, and $x_v \in R^d$ is the feature vector of node $v$;
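The preprocessing and merging steps can be illustrated with a small sketch. The field names (`industry`, `revenue`) and the completion rules (a placeholder category for qualitative fields, the observed mean for quantitative fields) are assumptions; the source does not say how missing values are completed.

```python
import statistics

def one_hot(values):
    """OneHot encode a qualitative field; categories are sorted for stability."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]

def z_score(values):
    """Standardize a quantitative field to zero mean and unit variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

def build_node_features(industry, revenue):
    # Assumed completion rule: missing categories become "unknown".
    industry = ["unknown" if v is None else v for v in industry]
    # Assumed completion rule: missing numbers become the observed mean.
    observed = [v for v in revenue if v is not None]
    fill = statistics.fmean(observed)
    revenue = [fill if v is None else v for v in revenue]
    encoded = one_hot(industry)
    scaled = z_score(revenue)
    # Each row of X is the feature vector x_v of one node.
    return [row + [z] for row, z in zip(encoded, scaled)]

X = build_node_features(["retail", None, "steel"], [100.0, 300.0, None])
```

Here `X` has one row per enterprise and $d = 4$ columns: three OneHot columns followed by one standardized column.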
(3) extracting edge features of an enterprise transaction network
Step1. Extracting features with statistical methods: these features reflect the basic attributes of the transaction sequence; the mean, variance, maximum, minimum, sum and median of the invoice record fields are extracted;
Step2. Extracting transaction proportion features: the transaction proportion features reflect the share that a specific transaction relationship represents for the seller and for the buyer, and are calculated as follows:
Figure BDA0002526995660000041
where $e_{ij}$ denotes the total transaction amount between the $i$-th and $j$-th nodes, $a_j$ denotes the proportion of the transaction amount between the $i$-th and $j$-th nodes in the total transaction amount of node $j$, and likewise $a_i$ denotes the proportion of the transaction amount between the $i$-th and $j$-th nodes in the total transaction amount of node $i$. The calculation first computes the total amount associated with an edge, then the respective total amounts of the associated seller and buyer, and finally the edge's share of the seller's total and of the buyer's total, giving a 2-dimensional edge feature;
Step3. Extracting the transaction tax burden features: the transaction tax burden features reflect the tax burden of the transactions between seller and buyer, and are calculated as follows:
Figure BDA0002526995660000051
where $t_{ij}$ denotes the total tax amount of the transactions between the $i$-th and $j$-th nodes; the transaction tax burden is obtained as the ratio of the total tax amount associated with an edge to its total transaction amount;
Step4. Merging edge features
The edge features extracted in Step1 to Step3 are combined into matrix form, and the edge features of the transaction network are denoted by $X^e$. $X^e_t \in R^{n \times n \times p}$ denotes the edge feature matrix at time $t$, where $p$ is the dimension of the edge features; $X^e_{v,u,\cdot,t} \in R^p$ denotes the feature vector of the transaction edge $(v, u)$ at time $t$, and $X^e_{\cdot,\cdot,p,t} \in R^{n \times n}$ denotes one channel of the edge features at time $t$;
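The three groups of edge features can be sketched for a single time period as follows. The formula images are not reproduced in this text, so the ratios below follow the verbal description only: $a_i$ and $a_j$ are taken as $e_{ij}$ divided by each node's total transaction amount, and the tax burden as $t_{ij} / e_{ij}$. The invoice layout `(seller, buyer, amount, tax)`, the seller-to-buyer edge key, and the definition of a node's total as the sum over every invoice it appears in are assumptions.

```python
import statistics
from collections import defaultdict

def ratio(numerator, denominator):
    """Safe division used for the proportion and tax burden features."""
    return numerator / denominator if denominator else 0.0

def edge_features(invoices):
    """invoices: list of (seller, buyer, amount, tax) for one period
    (hypothetical layout). Returns {edge: 9-dimensional feature vector}."""
    amounts = defaultdict(list)      # edge -> individual invoice amounts
    taxes = defaultdict(float)       # edge -> total tax t_ij
    node_total = defaultdict(float)  # node -> total transaction amount
    for seller, buyer, amount, tax in invoices:
        edge = (seller, buyer)
        amounts[edge].append(amount)
        taxes[edge] += tax
        node_total[seller] += amount
        node_total[buyer] += amount

    features = {}
    for edge, values in amounts.items():
        i, j = edge
        e_ij = sum(values)
        stats = [
            statistics.fmean(values),      # mean
            statistics.pvariance(values),  # variance
            max(values),
            min(values),
            e_ij,                          # sum
            statistics.median(values),
        ]
        a_i = ratio(e_ij, node_total[i])       # share of node i's total
        a_j = ratio(e_ij, node_total[j])       # share of node j's total
        tax_burden = ratio(taxes[edge], e_ij)  # t_ij / e_ij
        features[edge] = stats + [a_i, a_j, tax_burden]
    return features

features = edge_features([
    ("A", "B", 100.0, 13.0),
    ("A", "B", 300.0, 39.0),
    ("A", "C", 600.0, 78.0),
])
```

Each edge ends up with $p = 9$ channels in this sketch: six statistics, two proportions and one tax burden value.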
2) extracting transaction network characteristics of each time period by using graph attention network enhanced by edge characteristics, wherein the graph attention network is abbreviated as EGAT hereinafter
(1) Defining a transformation function, a node similarity measure function and an influence factor function
Step1. Defining the transformation function: $g$ is a transformation function that transforms the node features, as follows:
Figure BDA0002526995660000054
where $l$ is the index of the current layer of the EGAT network,
Figure BDA0002526995660000055
is the input to the $l$-th graph attention layer at time $t$, and $W^l$ is the parameter matrix learned by the $l$-th layer;
Step2. Defining the node similarity measure function: $f$ is a function that calculates the similarity between connected nodes and produces an $n \times n$ tensor of the following form:
Figure BDA0002526995660000056
where $N_i$ denotes the set of neighbor nodes of node $i$, i.e. $j \in N_i$; the attention mechanism $a$ is the weight matrix between the layers of a single-layer feedforward neural network; $\|$ denotes the concatenation operation and $^T$ denotes the transpose operation;
Step3. Defining the influence factor function: $\alpha$ is an influence factor function that produces an $N \times N$ matrix giving, for a given channel, the influence of the surrounding nodes on each node. Its form is as follows:
Figure BDA0002526995660000061
where
Figure BDA0002526995660000062
denotes the $p$-th channel of the edge features;
(2) network feature propagation through edge feature enhanced graph attention network
Each edge has multidimensional features, and each dimension forms a channel. The output of the $p$-th channel of the $l$-th layer is:
Figure BDA0002526995660000063
where
Figure BDA0002526995660000064
denotes the input of the $l$-th graph attention layer at time $t$;
Figure BDA0002526995660000065
denotes the input of the $p$-th edge-feature channel of the $l$-th graph attention layer at time $t$; $g$ is the transformation function that transforms the node features; and $\alpha$ is the influence factor function, which gives the influence of the surrounding nodes on each node for a given channel. The input is first transformed by $g^l$, and the final output $Z$ is then obtained by aggregating the information of the surrounding nodes on that channel. After the output of each channel is obtained, a channel-based attention mechanism assigns a different weight to each channel and aggregates all channel outputs, giving the output at time $t$ as follows:
Figure BDA0002526995660000066
where $\beta$ is the weight of each channel: the features of channel $p$ are passed through several convolution layers to obtain a single value, and softmax(·) is then taken over the values of all channels. It is calculated as follows:
Figure BDA0002526995660000067
where softmax(·) is the activation function used for classification and conv(·) is a two-dimensional convolution;
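Because the formula images are missing from this text, the following is only a rough sketch of how one edge-feature-weighted attention layer could be wired together, under several assumptions: $g(H) = HW$, the similarity is $\exp(\mathrm{LeakyReLU}(a^T [g_i \| g_j]))$ as in the standard graph attention formulation, the influence factor is the similarity multiplied by the edge feature and normalized over each node's neighbors, and the per-channel scores that the source obtains by convolution are simply passed in as `channel_scores`. The output nonlinearity is omitted, and all names are hypothetical.

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def egat_layer(H, W, a, E, channel_scores):
    """H: n x d node features; W: d x d' weights; a: vector of length 2*d';
    E: list of p channels, each an n x n matrix of non-negative edge
    features (0 where there is no edge); channel_scores: p numbers that
    stand in for the per-channel convolution outputs (assumption)."""
    n = len(H)
    G = matmul(H, W)  # transformed node features g(H)
    width = len(G[0])
    # Similarity f[i][j] from the concatenated transformed features.
    f = [[math.exp(leaky_relu(sum(w * x for w, x in zip(a, G[i] + G[j]))))
          for j in range(n)] for i in range(n)]
    beta = softmax(channel_scores)  # channel attention weights
    out = [[0.0] * width for _ in range(n)]
    for p, channel in enumerate(E):
        for i in range(n):
            weights = [f[i][j] * channel[i][j] for j in range(n)]
            total = sum(weights)
            if total == 0:
                continue  # node i has no edge on this channel
            for j in range(n):
                alpha = weights[j] / total  # influence of j on i, channel p
                for k in range(width):
                    out[i][k] += beta[p] * alpha * G[j][k]
    return out

H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.1, 0.2, 0.3, 0.4]
E = [
    [[0, 1, 0], [1, 0, 1], [0, 1, 0]],
    [[0, 0.5, 0], [0.5, 0, 2.0], [0, 2.0, 0]],
]
out = egat_layer(H, W, a, E, channel_scores=[0.2, 1.0])
```

In this toy graph node 0 has node 1 as its only neighbor on both channels, so its output row is just the transformed feature of node 1; node 1 receives a weighted mix of nodes 0 and 2.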
3) Extracting time-series features with the LSTM
LSTM is a special type of RNN that can learn long-term dependencies. After the output at each time is obtained in step 2), the outputs are fed into the LSTM to obtain the time-series features:
$LSTM(X_1, X_2, \ldots, X_t)$
where $X_t$ denotes the EGAT network output at time $t$;
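To make the recurrence concrete, here is a minimal single-layer LSTM written in plain Python that consumes a sequence of per-period feature vectors and returns the final hidden state as the time-series feature. It is a sketch with randomly initialized, untrained weights; the sizes, the helper names and the choice of the last hidden state as the output are assumptions. In the method described above, each sequence element would be one enterprise's row of the EGAT output at a given time period.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def random_params(input_size, hidden_size, seed=0):
    """Untrained weights for the input (i), forget (f), output (o) and
    candidate (g) gates; each gate sees the concatenation [x_t, h_{t-1}]."""
    rng = random.Random(seed)
    def layer():
        W = [[rng.uniform(-0.1, 0.1) for _ in range(input_size + hidden_size)]
             for _ in range(hidden_size)]
        b = [0.0] * hidden_size
        return W, b
    return {name: layer() for name in ("i", "f", "o", "g")}

def lstm_last_hidden(sequence, params, hidden_size):
    """Run the LSTM over the sequence and return the final hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        z = x + h  # concatenated [x_t, h_{t-1}]
        pre = {}
        for name in ("i", "f", "o", "g"):
            W, b = params[name]
            pre[name] = [sum(w * v for w, v in zip(row, z)) + bias
                         for row, bias in zip(W, b)]
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        # Cell state: keep part of the old memory, add part of the candidate.
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h

params = random_params(input_size=3, hidden_size=4)
sequence = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.5, 0.5]]
h_T = lstm_last_hidden(sequence, params, hidden_size=4)
```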
4) Implementing false-invoicing detection with a deep neural network
The time-series features obtained in step 3) are input into a false-invoicing detection classifier, which detects whether a tax-paying enterprise has engaged in false invoicing. The classifier has a fully connected deep neural network structure:
(1) Constructing the false-invoicing detection classifier
The false-invoicing detection classifier is a model with a neural network structure, and it is constructed as follows:
Step1. Determining the input layer of the classifier: the number of neurons in the input layer equals the dimension of the time-series features obtained from the LSTM;
Step2. Determining the output layer of the classifier: because false-invoicing detection is a binary classification problem, the output layer has 1 neuron and uses the softmax activation function; the output is a probability value in the interval [0, 1], denoted by $p_i$;
Step3. Determining the hidden layers of the classifier: the hidden layers are fully connected;
The result of step 3) is fed into the input layer of the deep neural network to obtain the final classification result $p_i = FC(LSTM(X_1, X_2, \ldots, X_t))$. $p_i \geq 0.5$ indicates that false invoicing exists, and $p_i < 0.5$ indicates that it does not;
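The classifier and its decision rule can be sketched as below. One caveat shapes the sketch: a softmax applied to a single output neuron always returns 1, so a sigmoid on the single logit is assumed here, which is equivalent to a two-class softmax with the other logit fixed at 0. The ReLU hidden activation, the layer sizes and the weights are illustrative assumptions.

```python
import math

def dense(x, W, b):
    """Fully connected layer: one output per row of W."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]

def relu(values):
    return [max(0.0, v) for v in values]

def classify(features, hidden, output, threshold=0.5):
    """Return (p_i, flagged) for one enterprise's time-series features."""
    W1, b1 = hidden
    W2, b2 = output
    h = relu(dense(features, W1, b1))
    logit = dense(h, W2, b2)[0]          # single output neuron
    p = 1.0 / (1.0 + math.exp(-logit))   # sigmoid gives a value in (0, 1)
    return p, p >= threshold

hidden = ([[0.5, -0.25], [0.1, 0.3]], [0.0, 0.0])
output = ([[1.0, 2.0]], [-1.0])
p, flagged = classify([1.0, 2.0], hidden, output)
```

For this input the hidden activations are roughly `[0.0, 0.7]`, the logit is about 0.4, and the enterprise is flagged because the resulting probability is above 0.5.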
(2) Training the false-invoicing detection model
Step1. Initializing the neural network parameters
Proper initialization of the neural network parameters avoids vanishing gradients when the network is deep and speeds up training. The parameter initialization must satisfy two conditions: the activation values of each layer do not saturate, and they are not 0. Xavier initialization helps to reduce the vanishing gradient problem, so that signals can propagate deeper into the neural network; the network parameters are therefore initialized with Xavier initialization, which takes the following form:
Figure BDA0002526995660000081
where $n_{in}$ is the input dimension of the layer in which the parameter lies, $n_{out}$ is the output dimension of that layer, and $W_{i,j}$ is the weight between individual neurons;
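As an illustration, the commonly used uniform variant of Xavier initialization draws each weight from $U\left[-\sqrt{6 / (n_{in} + n_{out})},\ \sqrt{6 / (n_{in} + n_{out})}\right]$. Whether the source uses this uniform variant or the normal variant cannot be confirmed from this text, so the sketch below is an assumption in that respect; the function name is hypothetical.

```python
import math
import random

def xavier_uniform(n_in, n_out, rng=random):
    """Return an n_out x n_in weight matrix drawn from the Xavier
    uniform range (assumed variant)."""
    limit = math.sqrt(6.0 / (n_in + n_out))
    return [[rng.uniform(-limit, limit) for _ in range(n_in)]
            for _ in range(n_out)]

rng = random.Random(42)
W = xavier_uniform(n_in=64, n_out=32, rng=rng)
```

With $n_{in} = 64$ and $n_{out} = 32$ the limit is $\sqrt{6/96} = 0.25$, so every weight lies in $[-0.25, 0.25]$.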
Step2. Determining the optimization objective
The classifier is trained to classify the tax payment data correctly. The classification performance is measured by a loss function: the smaller the loss, the better the classification. The output layer of the false-invoicing detection classifier uses the softmax activation function, and the network is trained to minimize the cross-entropy function, with the following optimization objective:
Figure BDA0002526995660000082
where $y_i$ is the label of tax-paying enterprise $i$: the label is 1 for an enterprise that has issued false invoices and 0 otherwise; $p_i$ is the output of the false-invoicing detection classifier, i.e. the probability that tax-paying enterprise $i$ has engaged in false invoicing;
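A binary cross-entropy consistent with these definitions of $y_i$ and $p_i$ can be sketched as follows. Averaging over the samples and clipping probabilities away from 0 and 1 are assumptions made for numerical convenience; the source formula may use a plain sum.

```python
import math

def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all enterprises."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

good = binary_cross_entropy([1, 0, 1], [0.9, 0.1, 0.8])
bad = binary_cross_entropy([1, 0, 1], [0.2, 0.7, 0.4])
```

Predictions that agree with the labels (`good`) give a smaller loss than predictions that contradict them (`bad`), which is the sense in which a smaller loss means better classification.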
Step3. Adjusting the network parameters of the model with the BP (backpropagation) algorithm. The learning process consists of forward propagation of signals and backward propagation of errors, as follows:
a) In forward propagation, the time-series features of the tax payment data enter at the input layer of the classifier, are processed layer by layer by the hidden layers, and reach the output layer. If the actual output of the output layer differs from the corresponding label value, the process switches to the error backpropagation stage;
b) In error backpropagation, the output error of the classifier is passed back layer by layer through the hidden layers to the input layer and is distributed to all units of each layer, which yields the error signal of each layer; this error signal is the basis for correcting the weights of the units;
c) The weight adjustment through forward propagation of signals and backward propagation of errors is repeated. This continual adjustment of the weights is the learning and training process of the network, and it continues until the error of the network output falls to an acceptable level or a preset number of learning iterations is reached;
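The loop in a) to c) can be illustrated with a toy network trained by backpropagation, stopping either when the loss is acceptable or when a preset number of epochs is reached. This is a deliberately small sketch: one tanh hidden layer, a sigmoid output, per-sample gradient descent, and made-up data, learning rate and tolerance, none of which come from the source.

```python
import math
import random

def train(samples, labels, hidden=3, lr=0.2, max_epochs=2000, tol=0.05, seed=0):
    """Train a one-hidden-layer binary classifier; return the loss history."""
    rng = random.Random(seed)
    d = len(samples[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    history = []
    for _ in range(max_epochs):
        loss = 0.0
        for x, y in zip(samples, labels):
            # a) forward propagation
            h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
                 for row, b in zip(W1, b1)]
            p = 1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(W2, h)) + b2)))
            p = min(max(p, 1e-12), 1.0 - 1e-12)
            loss += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            # b) error backpropagation: output error, then hidden-layer errors
            delta_out = p - y
            delta_hidden = [delta_out * W2[k] * (1.0 - h[k] ** 2)
                            for k in range(hidden)]
            # weight correction based on the error signals
            for k in range(hidden):
                W2[k] -= lr * delta_out * h[k]
                b1[k] -= lr * delta_hidden[k]
                for m in range(d):
                    W1[k][m] -= lr * delta_hidden[k] * x[m]
            b2 -= lr * delta_out
        loss /= len(samples)
        history.append(loss)
        # c) stop when the error is acceptable; otherwise repeat
        if loss < tol:
            break
    return history

samples = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
labels = [0, 0, 1, 1]
history = train(samples, labels)
```

The returned `history` records the mean loss per epoch, so the two stopping conditions can be checked by looking at its length and its last value.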
5) Detecting enterprise false invoicing
The tax payment data to be examined are processed through steps 1), 2) and 3), the resulting time-series features are input into the false-invoicing detection classifier, and whether the enterprise has engaged in false invoicing is judged from the classifier output.
The invention has at least the following beneficial technical effects:
The invention provides an enterprise false-invoicing detection method based on a dynamic edge feature enhanced graph attention network. It addresses the problems that existing detection methods depend on expert experience, target only one specific pattern of false invoicing and so cannot cope with complex and varied behavior, and cannot dynamically analyze the false-invoicing risk of tax-paying enterprises using historical transaction information. Compared with the prior art, the method has the following advantages:
1. The features of the transaction network among enterprises are extracted by machine learning and an enterprise false-invoicing detection model is built from them, so the method does not rely on expert experience and can cope with complex and varied false-invoicing behavior;
2. Based on the edge feature enhanced graph attention network, the method considers not only the basic information of tax-paying enterprises but also the multidimensional information of transaction edges, and can uncover more concealed patterns of false invoicing;
3. The method incorporates the historical transaction information of enterprises, captures how the enterprise transaction network evolves over time, and thereby analyzes the false-invoicing risk of tax-paying enterprises dynamically.
In summary, the invention presents an enterprise false-invoicing detection method that uses a dynamic edge feature enhanced graph attention network. First, edge feature enhancement fully exploits the multidimensional edge features of the transaction network; second, the graph attention mechanism lets the model learn the global feature dependencies among enterprises in the transaction network; and third, the long short-term memory network learns the abnormal features of false invoicing from historical transaction information.
Drawings
FIG. 1 is an overall framework flow diagram.
FIG. 2 is a flow chart of constructing the dynamic enterprise transaction network and extracting node features and edge features.
FIG. 3 is a flow chart of extracting the transaction network features of each time period using EGAT.
FIG. 4 is a schematic diagram of the time-series feature extraction network structure.
FIG. 5 is a schematic diagram of the overall structure of the enterprise false-invoicing detection model.
FIG. 6 is a schematic diagram of the network structure of the false-invoicing detection classifier.
FIG. 7 is a flow chart of determining the network parameters of the false-invoicing detection model.
Detailed Description
To illustrate the technical solution of the invention more clearly, the enterprise false-invoicing detection method based on a dynamic edge-feature-enhanced graph attention network is described in detail below with reference to the accompanying drawings and a specific embodiment.
In this embodiment, a false-invoicing detection model for Shaanxi province enterprises is established from Shaanxi province tax payment data consisting of partially labeled positive and negative samples. As shown in FIG. 1, the invention mainly comprises the following steps:
Step 1, constructing a dynamic enterprise transaction network and extracting node features and edge features
The tax payment data of Shaanxi province contain a large amount of text data and cannot be used directly for false-invoicing detection based on an enterprise transaction network, so the enterprise transaction network must be constructed first. FIG. 2 is a flow chart of constructing the dynamic enterprise transaction network and extracting node features and edge features. The specific steps of constructing the dynamic Shaanxi province tax-paying enterprise transaction network and extracting node features and edge features are as follows:
S201, constructing a dynamic enterprise transaction network
First, the Shaanxi province invoice records are preprocessed and abnormal records are deleted. The seller taxpayer's electronic file number, the purchaser taxpayer's electronic file number and the invoicing time are then extracted from each invoice record; the two electronic file numbers are used to represent nodes, and the invoicing time is used to divide the transaction network.
The maximum and minimum values of the invoicing time field are found to determine the time range covered by the invoice transactions. The whole time span is then divided into T equal parts, yielding the transaction records of T time periods.
Based on undirected graph theory, the complete node set is obtained by merging and de-duplicating the seller and purchaser taxpayer electronic file numbers of each time period. The node set of the Shaanxi province transaction network, i.e. the set of Shaanxi province tax-paying enterprises, is denoted by $V$, and $v_i$ denotes a tax-paying enterprise, where $v_i \in V$, $i = 1, \dots, N$, and $N = |V|$ is the number of tax-paying enterprises in the Shaanxi province transaction network. The transaction relationships among the tax-paying enterprises are the edges of the transaction network and are denoted by $E$. Because the transaction relationships depend on time, $E_t$ denotes the enterprise transaction relationships at time $t$, where $E_t \in R^{n \times n}$, and $e_{i,j,t}$ denotes that enterprise $v_i$ and enterprise $v_j$ transact at time $t$ and thereby create an edge of the transaction network, where $e_{i,j,t} = (v_i, v_j, t) \in E$. The adjacency matrix at time $t$ is denoted by $A_t$, where $A_t \in R^{n \times n}$. The matrix contains only the elements 0 and 1: 1 indicates that a transaction edge exists between two enterprises and 0 indicates that none exists, i.e. $A_{i,j,t} = 1$ when $e_{i,j,t} \in E_t$ and $A_{i,j,t} = 0$ when $e_{i,j,t} \notin E_t$. The transaction network can therefore be represented by $G = (V, E)$.
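The construction in S201 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the `Invoice` record layout, the function name `build_snapshots` and the numeric time values are assumptions introduced for the example.

```python
from collections import namedtuple

# Hypothetical record layout: seller ID, purchaser ID, invoicing time (any number).
Invoice = namedtuple("Invoice", ["seller", "buyer", "time"])

def build_snapshots(invoices, T):
    """Split invoices into T equal time periods and build one 0/1 adjacency matrix per period."""
    # Merge and de-duplicate seller and purchaser IDs to obtain the node set V.
    nodes = sorted({inv.seller for inv in invoices} | {inv.buyer for inv in invoices})
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)

    t_min = min(inv.time for inv in invoices)
    t_max = max(inv.time for inv in invoices)
    width = (t_max - t_min) / T or 1  # fall back to 1 when all times coincide

    snapshots = [[[0] * n for _ in range(n)] for _ in range(T)]
    for inv in invoices:
        # The latest invoice lies exactly on the upper bound, so clamp it into the last period.
        t = min(int((inv.time - t_min) / width), T - 1)
        i, j = index[inv.seller], index[inv.buyer]
        # Undirected graph: record the edge in both directions.
        snapshots[t][i][j] = 1
        snapshots[t][j][i] = 1
    return nodes, snapshots

invoices = [Invoice("A", "B", 0), Invoice("B", "C", 5), Invoice("A", "C", 10)]
nodes, A = build_snapshots(invoices, T=2)
```

Here `A[t]` plays the role of the adjacency matrix $A_t$; with real data the time field would first be converted to a numeric timestamp.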
S202, extracting node features of the enterprise transaction network
From the dynamic Shaanxi province tax-paying enterprise transaction network constructed in S201, relevant indicators are screened from the taxpayers' attribute information, financial statements and the like. The basic information of the taxpayer is extracted first; business scale information and business item information are then selected according to the enterprise's business scope information; finally, some general financial and tax indicators are selected. The selected indicators fall into three groups: numerical qualitative features, numerical quantitative features and textual qualitative features. The indicators selected in this embodiment include registration type, taxpayer status, business scope, number of employees, total investment, sales and so on. For numerical qualitative features, missing data are completed first and each field is then one-hot encoded into vector form; for numerical quantitative features, missing values are completed first and each field is then z-score standardized.
All features are combined into matrix form, each row of the matrix representing the features of one node. $X \in R^{n \times d}$ denotes the node features of the Shaanxi province transaction network, where $n$ is the number of Shaanxi province tax-paying enterprises and $d$ is the dimension of the enterprise features, and $x_v \in R^d$ denotes the feature vector of enterprise $v$.
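The preprocessing in S202 can be sketched as follows. The sketch assumes that missing quantitative values are completed with the field mean and that the population standard deviation is used; the text does not specify either choice, and the field names and values are invented for illustration.

```python
from statistics import mean, pstdev

def one_hot(values):
    """One-hot encode a qualitative field; categories are ordered alphabetically."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

def z_score(values):
    """Complete missing values (None) with the field mean, then standardize."""
    mu = mean(v for v in values if v is not None)
    filled = [mu if v is None else v for v in values]
    sigma = pstdev(filled) or 1.0  # a constant field maps to all zeros
    return [(v - mu) / sigma for v in filled]

# Hypothetical fields for three enterprises.
registration_type = ["limited", "private", "limited"]
employees = [10, None, 30]

encoded = one_hot(registration_type)
scaled = z_score(employees)
# Each row of X is the feature vector x_v of one enterprise.
X = [row + [z] for row, z in zip(encoded, scaled)]
```

Concatenating the encoded and standardized fields row by row gives the node feature matrix $X$, here with $n = 3$ and $d = 3$.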
S203, extracting edge features of the enterprise transaction network
Statistical features are extracted first: the mean, variance, maximum, minimum, sum, median and so on of the invoice record fields of Shaanxi province.
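As a small illustration of the statistical edge features, the sketch below computes the six listed statistics for the invoice amounts on one edge. The function name and the use of the population variance are assumptions; the text does not say which variance estimator is used.

```python
from statistics import mean, median, pvariance

def edge_statistics(amounts):
    """Six statistical features of the invoice amounts on one transaction edge."""
    return [
        mean(amounts),
        pvariance(amounts),  # population variance (assumption)
        max(amounts),
        min(amounts),
        sum(amounts),
        median(amounts),
    ]

# Hypothetical amounts of three invoices between the same two enterprises.
stats = edge_statistics([100.0, 200.0, 600.0])
```

The same function would be applied to every numeric invoice field of every edge in every time period.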
The transaction proportion features are then extracted. They reflect the share that a specific transaction takes up for the seller and for the purchaser, respectively, and are calculated as follows:
Figure BDA0002526995660000121
where $e_{ij}$ denotes the total amount of the transactions between enterprise $i$ and enterprise $j$; $a_j$ denotes the proportion of the transaction amount between enterprise $i$ and enterprise $j$ in the total transaction amount of enterprise $j$; likewise, $a_i$ denotes the proportion of that amount in the total transaction amount of enterprise $i$. The calculation first obtains the total amount associated with an edge, then the total amounts of the associated seller and purchaser, and finally the ratios of the edge total to the seller total and to the purchaser total, yielding a 2-dimensional edge feature.
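The three-stage calculation described above can be sketched as follows. The sketch assumes that the total transaction amount of a seller is the sum over its sales edges and that of a purchaser is the sum over its purchase edges; the enterprise IDs and amounts are invented.

```python
from collections import defaultdict

def proportion_features(edge_amounts):
    """edge_amounts maps (seller i, purchaser j) to the edge total e_ij.

    Returns, per edge, the pair (a_i, a_j): the share of e_ij in the
    seller's total and in the purchaser's total.
    """
    seller_total = defaultdict(float)
    buyer_total = defaultdict(float)
    for (i, j), e_ij in edge_amounts.items():
        seller_total[i] += e_ij
        buyer_total[j] += e_ij
    return {
        (i, j): (e_ij / seller_total[i], e_ij / buyer_total[j])
        for (i, j), e_ij in edge_amounts.items()
    }

ratios = proportion_features({("A", "B"): 60.0, ("A", "C"): 40.0, ("D", "B"): 20.0})
```

An edge whose share is close to 1 on one side and close to 0 on the other indicates a partner that matters a great deal to one party and very little to the other.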
The purchase-sale transaction tax burden feature is extracted next. It reflects the tax burden of the transactions between the seller and the purchaser and is calculated as follows:
Figure BDA0002526995660000122
where $t_{ij}$ denotes the total tax amount of the transactions between enterprise $i$ and enterprise $j$; the transaction tax burden is the ratio of the total tax amount on an edge to its total transaction amount.
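A minimal sketch of the tax burden feature follows. Returning 0 for an edge with zero total amount is an assumption made to keep the function total; the text does not cover that case.

```python
def tax_burden(total_tax, total_amount):
    """Ratio of the total tax on an edge (t_ij) to its total transaction amount (e_ij)."""
    if total_amount == 0:
        return 0.0  # assumption: an edge without amount carries no tax burden
    return total_tax / total_amount

# Hypothetical edge with 13.0 of tax on a total amount of 100.0.
burden = tax_burden(total_tax=13.0, total_amount=100.0)
```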
The edge features of the Shaanxi province transaction network are combined into matrix form and denoted by $X^e$, where
Figure BDA0002526995660000123
denotes the edge feature matrix at time $t$, $p$ denotes the dimension of the edge features,
Figure BDA0002526995660000124
denotes the feature vector of the transaction edge $(v, u)$ at time $t$, and $X_{\cdot,\cdot,p,t} \in R^{n \times n}$ denotes one channel of the edge features at time $t$.
Step 2, extracting the transaction network features of each time period using the edge-feature-enhanced graph attention network (Edge-enhanced Graph Attention Network, EGAT for short)
The dynamic Shaanxi province tax-paying enterprise transaction network constructed in step 1 is the network from which features are to be extracted. FIG. 3 is a flow chart of extracting the transaction network features of each time period using EGAT. The specific steps are as follows:
S301, defining a conversion function, a node similarity measurement function and an influence factor function
The conversion function transforms the features of an enterprise; the conversion function $g$ is defined as:
Figure BDA0002526995660000125
where $l$ denotes the current layer of the EGAT network,
Figure BDA0002526995660000126
is the input of the $l$-th layer graph attention network at time $t$, and $W^l$ is the parameter matrix learned by the $l$-th layer.
$f$ is a function that calculates the similarity between connected nodes; it produces an $N \times N$ tensor of the form:
Figure BDA0002526995660000131
where $N_i$ denotes the set of neighbor nodes of enterprise node $i$, i.e. $i, j \in N_i$; the attention mechanism $a$ is the weight matrix between the connection layers of a single-layer feedforward neural network; $\|$ denotes the concatenation operation; and $^T$ denotes the transpose operation.
$\alpha$ is the influence factor function. It generates an $N \times N$ matrix that gives, for a given channel, the influence exerted on each enterprise node by its surrounding nodes, and has the following form:
Figure BDA0002526995660000132
where
Figure BDA0002526995660000133
denotes the $p$-th channel of the edge features.
S302, network feature propagation is carried out through a graph attention network with edge feature enhancement
Each edge has multidimensional features, each dimensional feature constitutes one channel, and the output feature of the p-th channel of the l-th layer is calculated by using the function defined in S301 as:
Figure BDA0002526995660000134
where
Figure BDA0002526995660000135
denotes the input of the $l$-th layer graph attention network at time $t$;
Figure BDA0002526995660000136
denotes the input of the $p$-th channel of the edge features of the $l$-th layer graph attention network at time $t$; $g$ is the conversion function that transforms the node features; and $\alpha$ is the influence factor function, which gives the influence of the surrounding nodes on each node for a given channel. The input is first transformed by the conversion function $g^l$, and the channel output $Z$ is then obtained by aggregating the information of the surrounding enterprise nodes on that channel. After the output of each channel is obtained, a channel-based attention mechanism assigns each channel a different weight and aggregates all channel outputs, yielding the output at time $t$ as follows:
Figure BDA0002526995660000137
where $\beta$ denotes the weight of each channel; it is obtained by passing the features of channel $p$ through several convolution layers to obtain one value per channel and then applying softmax(·) over the values of all channels, and is calculated as follows:
Figure BDA0002526995660000141
where softmax(·) is the activation function used for classification and conv(·) is a two-dimensional convolution.
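Because the formulas of this step survive only as image placeholders, the following is a hedged sketch of one edge-enhanced attention layer rather than the formulas of the patent. It assumes a linear conversion function, a GAT-style similarity $\mathrm{LeakyReLU}(a^T[H_i \| H_j])$, an influence factor obtained by scaling the exponentiated similarity with the non-negative edge feature and normalizing over the neighbors, and channel weights `beta` supplied directly instead of being computed by convolution and softmax. All names are hypothetical, and no output nonlinearity is applied.

```python
import math

def egat_layer(X, E, W, a, beta, slope=0.2):
    """X: n x d node features; E: p edge-feature channels, each n x n and non-negative;
    W: d x h parameter matrix; a: attention vector of length 2h; beta: p channel weights."""
    # Conversion function g: linear transform of every node feature vector.
    H = [[sum(x * w for x, w in zip(row, col)) for col in zip(*W)] for row in X]
    n, h = len(H), len(H[0])
    Z = [[0.0] * h for _ in range(n)]
    for E_p, b in zip(E, beta):
        for i in range(n):
            neighbors = [j for j in range(n) if E_p[i][j] > 0]
            if not neighbors:
                continue
            # Similarity f(i, j): LeakyReLU(a^T [H_i || H_j]).
            scores = []
            for j in neighbors:
                s = sum(w * v for w, v in zip(a, H[i] + H[j]))
                scores.append(s if s > 0 else slope * s)
            top = max(scores)  # subtracted for numerical stability
            # Influence factor: exponentiated similarity scaled by the edge feature,
            # normalized over the neighbors of node i.
            raw = [math.exp(s - top) * E_p[i][j] for s, j in zip(scores, neighbors)]
            total = sum(raw)
            for r, j in zip(raw, neighbors):
                for k in range(h):
                    # Channel output weighted by beta and summed across channels.
                    Z[i][k] += b * (r / total) * H[j][k]
    return Z

X = [[1.0], [2.0], [3.0]]
E = [[[0, 1, 1], [1, 0, 0], [1, 0, 0]]]  # one channel, node 0 trades with nodes 1 and 2
Z = egat_layer(X, E, W=[[1.0]], a=[0.0, 0.0], beta=[1.0])
```

With a zero attention vector every neighbor is weighted by its edge feature alone, so node 0 receives the average of its two neighbors.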
Step 3, obtaining time-series features through LSTM
The transaction network features of each time period extracted in step 2 are used as the input of the LSTM to obtain the temporal features. FIG. 4 shows the LSTM architecture; in this embodiment the long short-term memory network has four layers, each with an LSTM neuron. The temporal features output by the last layer of the LSTM are:
$\mathrm{LSTM}(X_1, X_2, \dots, X_t)$
where $X_t$ is the output of the EGAT network at time $t$.
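The recurrence behind $\mathrm{LSTM}(X_1, X_2, \dots, X_t)$ can be sketched with a single standard LSTM layer. This is an illustration under assumptions: the embodiment uses four layers, whereas the sketch uses one, and the parameter layout (one weight matrix per gate applied to the concatenated input and hidden state) and all values are invented.

```python
import math

def affine(W, b, v):
    """Compute W v + b, with one row of W per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, params):
    """One step of a standard LSTM cell; params maps gate name to (W, b)."""
    v = x + h  # concatenate the input with the previous hidden state
    i = [sigmoid(z) for z in affine(*params["input"], v)]
    f = [sigmoid(z) for z in affine(*params["forget"], v)]
    o = [sigmoid(z) for z in affine(*params["output"], v)]
    g = [math.tanh(z) for z in affine(*params["cell"], v)]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

def lstm_features(sequence, params, hidden):
    """Run the cell over X_1..X_t and return the final hidden state."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h

hidden, inputs = 2, 3
params = {
    gate: ([[0.1] * (inputs + hidden) for _ in range(hidden)], [0.0] * hidden)
    for gate in ("input", "forget", "output", "cell")
}
sequence = [[0.2, 0.1, 0.4], [0.3, 0.0, 0.5], [0.1, 0.2, 0.6]]  # stand-ins for X_1..X_3
features = lstm_features(sequence, params, hidden)
```

The final hidden state serves as the time-series feature vector that is passed on to the classifier.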
Step 4, implementing false-invoicing detection with a deep neural network
FIG. 5 is a schematic diagram of the overall structure of the enterprise false-invoicing detection model, which predicts whether a tax-paying enterprise has engaged in false invoicing through the false-invoicing detection classifier. The classifier consists of a fully connected deep neural network. It takes the time-series features of the enterprise transaction network obtained in step 3 as input, and training continuously improves its detection capability on the Shaanxi province tax payment data, so that it can detect whether a Shaanxi province tax-paying enterprise has engaged in false invoicing.
Implementing false-invoicing detection with the deep neural network comprises the following steps:
1) Constructing the false-invoicing detection classifier. The classifier performs false-invoicing detection on tax payment data using the time-series features of tax-paying enterprises. As shown in FIG. 6, the classifier in this embodiment is a four-layer fully connected deep neural network. The number of input neurons $L_1$ is determined by the dimension of the input time-series features; in this embodiment $L_1 = 20$. The second and third layers are hidden layers with $L_2$ and $L_3$ neurons respectively; in this embodiment $L_2 = 16$ and $L_3 = 8$. The fourth layer is the output layer; because the task is a binary classification problem, the number of output neurons is $L_4 = 1$. The output of the classifier $C$, $p_i = FC(\mathrm{LSTM}(X_1, X_2, \dots, X_t))$, is a probability in the interval $[0, 1]$: $p_i \geq 0.5$ indicates that false invoicing exists, and $p_i < 0.5$ indicates that it does not.
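The forward pass of a 20-16-8-1 fully connected classifier can be sketched as follows. The sketch swaps in a sigmoid on the single output neuron, which is an assumption: a softmax over one value is always 1, so a sigmoid is the usual way to obtain a probability from one neuron. The weights and the input are random stand-ins, not trained values.

```python
import math
import random

def dense(v, W, b):
    """Fully connected layer: W has one row of weights per output neuron."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def classify(features, layers):
    """ReLU on the hidden layers, sigmoid on the single output neuron (assumption)."""
    v = features
    for W, b in layers[:-1]:
        v = [max(0.0, z) for z in dense(v, W, b)]
    W, b = layers[-1]
    logit = dense(v, W, b)[0]
    p = 1.0 / (1.0 + math.exp(-logit))
    return p, p >= 0.5

rng = random.Random(0)
sizes = [20, 16, 8, 1]  # L1, L2, L3, L4 of the embodiment
layers = [
    ([[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out)
    for n_in, n_out in zip(sizes, sizes[1:])
]
features = [rng.random() for _ in range(sizes[0])]  # stand-in for the LSTM output
p_i, is_false_invoicing = classify(features, layers)
```

The returned flag applies the 0.5 threshold stated above.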
2) Training the false-invoicing detection model to determine the neural network parameters. The detailed steps of determining the network parameters of the enterprise false-invoicing detection model are shown in FIG. 7:
S701, initializing neural network parameters
After the network structure has been determined in step 4, the network parameters need to be determined. The neural network in this embodiment includes a convolutional neural network and fully connected layers, and the hidden layers all use the rectified linear unit (ReLU) as activation function, formally expressed as:
$f(x) = \max(0, x)$
where $x$ is the input of the neuron.
The output layer of the false-invoicing detection classifier uses the softmax activation function, formally expressed as:
$S_i = \dfrac{e^{V_i}}{\sum_{j=1}^{D} e^{V_j}}$
where $V_i$ is the $i$-th element of the input to the softmax, $i$ is the category index and $D$ is the total number of categories. In this embodiment there are two categories, enterprises with false invoicing and enterprises without, so $D = 2$. $S_i$ is the ratio of the exponential of the current element to the sum of the exponentials of all elements.
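The two activation functions can be sketched as follows. The sketch subtracts the maximum before exponentiating, a common numerical-stability measure that leaves the softmax result unchanged; the input values are invented.

```python
import math

def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)

def softmax(v):
    """Exponential of each element divided by the sum of all exponentials."""
    top = max(v)  # subtracted for numerical stability; the ratios are unchanged
    exps = [math.exp(x - top) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for the D = 2 categories.
S = softmax([2.0, 0.0])
```

The entries of `S` are positive and sum to 1, so they can be read as the probabilities of the two categories.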
The initialization of the neural network parameters matters for training: good initial parameters accelerate convergence. This embodiment uses Xavier initialization, which helps mitigate the vanishing gradient problem so that signals can propagate deeper into the neural network. Its specific form is:
Figure BDA0002526995660000152
where $n_{in}$ is the input dimension of the layer in which the parameter lies, $n_{out}$ is the output dimension of that layer, and $W_{i,j}$ is the weight between individual neurons.
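A hedged sketch of Xavier initialization follows. It assumes the widely used uniform variant with bound $\sqrt{6 / (n_{in} + n_{out})}$; the text does not reproduce the exact form used in the embodiment, and the function name is hypothetical.

```python
import math
import random

def xavier_uniform(n_in, n_out, rng):
    """Draw an n_out x n_in weight matrix from U(-limit, limit), limit = sqrt(6 / (n_in + n_out))."""
    limit = math.sqrt(6.0 / (n_in + n_out))
    return [[rng.uniform(-limit, limit) for _ in range(n_in)] for _ in range(n_out)]

rng = random.Random(42)
W = xavier_uniform(n_in=20, n_out=16, rng=rng)  # first classifier layer of the embodiment
limit = math.sqrt(6.0 / (20 + 16))
```

Scaling the range by the layer's fan-in and fan-out keeps the variance of activations and gradients roughly constant from layer to layer.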
S702, determining an optimization goal
The loss function of the false-invoicing detection network is as follows:
Figure BDA0002526995660000153
where $y_i$ denotes the label of tax-paying enterprise $i$: the label of an enterprise with false invoicing is 1 and that of an enterprise without false invoicing is 0; $p_i$ denotes the probability that tax-paying enterprise $i$ has engaged in false invoicing; and $N$ denotes the number of labeled tax payment data samples of the region. In this embodiment the region is Shaanxi province and the number of labeled samples is 15876. The smaller the loss function of the false-invoicing detection network, the better the detection effect, so the optimization goal is to minimize the loss function.
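The loss can be sketched as the standard binary cross-entropy averaged over the $N$ labeled samples. This form is an assumption based on the symbols $y_i$, $p_i$ and $N$ defined above, since the formula itself is not reproduced in the text; the clipping constant `eps` is added only to avoid taking the logarithm of 0.

```python
import math

def cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy between labels y_i in {0, 1} and probabilities p_i."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)

# Hypothetical labels and predictions for four enterprises.
labels = [1, 0, 1, 0]
good = cross_entropy(labels, [0.9, 0.1, 0.8, 0.2])
bad = cross_entropy(labels, [0.4, 0.6, 0.3, 0.7])
```

Predictions closer to the labels give the smaller loss, which is why minimizing the loss improves detection.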
S703, adjusting the parameters of the false-invoicing detection network using the tax payment data of Shaanxi province
The network parameters of the model are adjusted with the back-propagation (BP) algorithm; the adjustment proceeds backward from the output layer of the false-invoicing detection classifier to its input layer.
Step 5, enterprise false-invoicing detection
False-invoicing detection is performed on the tax payment data of Shaanxi province with the model obtained in steps 1, 2, 3 and 4.
First, an enterprise transaction network is constructed from the unlabeled Shaanxi province tax payment data to be detected; second, the transaction network features of each time period are extracted with the edge-feature-enhanced graph attention network; the features of each time period are then input into the LSTM (long short-term memory network), which outputs the time-series features; finally, the time-series features are input into the false-invoicing detection classifier, which marks the data exhibiting false invoicing. In practice, false invoicing is hard to detect because its patterns vary and the behavior is concealed; the method provides a feasible solution for enterprise false-invoicing detection.
It will be understood by those skilled in the art that the foregoing is only exemplary of the method of the present invention and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (2)

1. An enterprise false-invoicing detection method based on a dynamic edge-feature-enhanced graph attention network, characterized by comprising the following steps: first, constructing a dynamic enterprise transaction network from tax-related data and extracting node features and edge features; second, extracting the transaction network features of each time period with the edge-feature-enhanced graph attention network; third, extracting time-series features with the LSTM and constructing a false-invoicing detection model with a deep neural network; then, adjusting the network parameters by training the enterprise false-invoicing detection model; and finally, performing false-invoicing detection on the tax payment data with the trained false-invoicing detection model.
2. The false-invoicing detection method according to claim 1, characterized in that the method is implemented by the following steps:
1) building dynamic enterprise transaction network and extracting node features and edge features
The enterprise transaction network refers to a transaction network in which a network structure is used for representing transaction relations among enterprises, and the dynamic transaction network refers to a transaction network in which the transaction relations change along with time changes, so that the network structure also changes;
(1) building dynamic enterprise transaction networks
Step1. determining the key fields: first, the invoice records are preprocessed and abnormal records are deleted; the seller taxpayer's electronic file number, the purchaser taxpayer's electronic file number and the invoicing time are then extracted from each invoice record, wherein the two electronic file numbers are used to represent nodes and the invoicing time is used to divide the transaction network;
Step2. determining the time span and dividing the transaction records: the maximum and minimum values of the invoicing time field are found to determine the time range covered by the invoice transactions, and the whole time span is divided into T equal parts, yielding the transaction records of T time periods;
Step3. constructing the enterprise transaction network: based on undirected graph theory, the complete node set is obtained by merging and de-duplicating the seller and purchaser taxpayer electronic file numbers of each time period; the node set of the transaction network, i.e. the set of tax-paying enterprises, is denoted by $V$, and $v_i$ denotes a tax-paying enterprise, where $v_i \in V$, $i = 1, \dots, N$, and $N = |V|$ is the number of tax-paying enterprises in the transaction network; the transaction relationships among the tax-paying enterprises are the edges of the transaction network and are denoted by $E$; because the transaction relationships depend on time, $E_t$ denotes the enterprise transaction relationships at time $t$, where $E_t \in R^{n \times n}$, and $e_{i,j,t}$ denotes that enterprise $v_i$ and enterprise $v_j$ transact at time $t$ and thereby create an edge of the transaction network, where $e_{i,j,t} = (v_i, v_j, t) \in E$; the adjacency matrix at time $t$ is denoted by $A_t$, where $A_t \in R^{n \times n}$; the matrix contains only the elements 0 and 1, 1 indicating that a transaction edge exists between two enterprises and 0 indicating that none exists, i.e. $A_{i,j,t} = 1$ when $e_{i,j,t} \in E_t$ and $A_{i,j,t} = 0$ when $e_{i,j,t} \notin E_t$; the transaction network is therefore denoted by $G = (V, E)$;
(2) extracting node characteristics of an enterprise transaction network
Step1. indicator selection: relevant indicators are screened from the taxpayers' attribute information and financial statements; the basic information of the taxpayer is extracted first; business scale information and business item information are then selected according to the enterprise's business scope information; finally, some general financial and tax indicators are selected;
Step2. feature preprocessing: the indicators selected in Step1 fall into three groups: numerical qualitative features, numerical quantitative features and textual qualitative features; for numerical qualitative features, missing data are completed first and each field is then one-hot encoded into vector form; for numerical quantitative features, missing values are completed first and each field is then z-score standardized;
Step3. merging node features
All features are combined into matrix form, each row of the matrix representing the features of one node; the node features of the transaction network are denoted by $X$, $X \in R^{n \times d}$, where $n$ is the number of nodes and $d$ is the dimension of the node features, and $x_v \in R^d$ denotes the feature vector of node $v$;
(3) extracting edge features of an enterprise transaction network
Step1. extracting statistical features: the statistical features reflect the basic attributes of the transaction sequence; the mean, variance, maximum, minimum, sum and median of the invoice record fields are extracted;
Step2. extracting transaction proportion features: the transaction proportion features reflect the share that a specific transaction takes up for the seller and for the purchaser, respectively, and are calculated as follows:
Figure FDA0002526995650000021
where $e_{ij}$ denotes the total amount of the transactions between the $i$-th node and the $j$-th node; $a_j$ denotes the proportion of the transaction amount between the $i$-th node and the $j$-th node in the total transaction amount of node $j$; likewise, $a_i$ denotes the proportion of that amount in the total transaction amount of node $i$; the calculation first obtains the total amount associated with an edge, then the total amounts of the associated seller and purchaser, and finally the ratios of the edge total to the seller total and to the purchaser total, yielding a 2-dimensional edge feature;
Step3. extracting the purchase-sale transaction tax burden feature: the tax burden feature reflects the tax burden of the transactions between the seller and the purchaser and is calculated as follows:
Figure FDA0002526995650000031
where $t_{ij}$ denotes the total tax amount of the transactions between the $i$-th node and the $j$-th node; the transaction tax burden is the ratio of the total tax amount on an edge to its total transaction amount;
step4. edge feature merging
The edge features extracted in Step1 to Step3 are combined into matrix form, and the edge features of the transaction network are denoted by $X^e$;
Figure FDA0002526995650000032
denotes the edge feature matrix at time $t$, $p$ denotes the dimension of the edge features,
Figure FDA0002526995650000033
denotes the feature vector of the transaction edge $(v, u)$ at time $t$, and $X_{\cdot,\cdot,p,t} \in R^{n \times n}$ denotes one channel of the edge features at time $t$;
2) extracting the transaction network features of each time period using the edge-feature-enhanced graph attention network, hereinafter abbreviated as EGAT
(1) Defining a transfer function, a node similarity measure function and an impact factor function
Step1. defining the conversion function: $g$ is the conversion function that transforms the features of a node, as follows:
Figure FDA0002526995650000034
where $l$ denotes the current layer of the EGAT network,
Figure FDA0002526995650000035
is the input of the $l$-th layer graph attention network at time $t$, and $W^l$ is the parameter matrix learned by the $l$-th layer;
Step2. defining the node similarity measurement function: $f$ is a function that calculates the similarity between connected nodes; it produces an $N \times N$ tensor of the form:
Figure FDA0002526995650000041
where $N_i$ denotes the set of neighbor nodes of node $i$, i.e. $i, j \in N_i$; the attention mechanism $a$ is the weight matrix between the connection layers of a single-layer feedforward neural network; $\|$ denotes the concatenation operation; and $^T$ denotes the transpose operation;
Step3. defining the influence factor function: $\alpha$ is the influence factor function; it generates an $N \times N$ matrix that gives, for a given channel, the influence exerted on each node by its surrounding nodes, and has the following form:
Figure FDA0002526995650000042
where
Figure FDA0002526995650000043
denotes the $p$-th channel of the edge features;
(2) network feature propagation through edge feature enhanced graph attention network
Each edge has multidimensional features and each dimension constitutes one channel, so the output of the $p$-th channel of the $l$-th layer is:
Figure FDA0002526995650000044
where
Figure FDA0002526995650000045
denotes the input of the $l$-th layer graph attention network at time $t$;
Figure FDA0002526995650000046
denotes the input of the $p$-th channel of the edge features of the $l$-th layer graph attention network at time $t$; $g$ is the conversion function that transforms the node features; and $\alpha$ is the influence factor function, which gives the influence of the surrounding nodes on each node for a given channel; the input is first transformed by the conversion function $g^l$, and the channel output $Z$ is then obtained by aggregating the information of the surrounding nodes on that channel; after the output of each channel is obtained, a channel-based attention mechanism assigns each channel a different weight and aggregates all channel outputs, yielding the output at time $t$ as follows:
Figure FDA0002526995650000047
where $\beta$ denotes the weight of each channel; it is obtained by passing the features of channel $p$ through several convolution layers to obtain one value per channel and then applying softmax(·) over the values of all channels, and is calculated as follows:
Figure FDA0002526995650000048
where softmax(·) is the activation function used for classification and conv(·) is a two-dimensional convolution;
3) time series characterization by LSTM
LSTM is a special type of RNN that can learn long-term dependencies; after the output of each time period has been obtained through step 2), it is input into the LSTM to obtain the temporal features:
$\mathrm{LSTM}(X_1, X_2, \dots, X_t)$
where $X_t$ denotes the output of the EGAT network at time $t$;
4) method for realizing invoice false-open detection by utilizing deep neural network
The time-series features obtained in step 3) are input into the false-invoicing detection classifier to detect whether the tax-paying enterprise has engaged in false invoicing; the false-invoicing detection classifier has a fully connected deep neural network structure;
(1) constructing the false-invoicing detection classifier
The false-invoicing detection classifier is a model with a neural network structure, and constructing it comprises the following steps:
Step1. determining the input layer of the false-invoicing detection classifier, wherein the number of neurons of the input layer equals the dimension of the temporal features obtained through the LSTM;
Step2. determining the output layer of the false-invoicing detection classifier, wherein the number of neurons of the output layer is 1 because false-invoicing detection is a binary classification problem, the output layer uses the softmax activation function, and the output is a probability in the interval $[0, 1]$, denoted by $p_i$;
Step3. determining the hidden layers of the false-invoicing detection classifier, wherein the hidden layers are fully connected;
Step4. inputting the result of step 3) into the input layer of the deep neural network to obtain the final classification result, expressed as $p_i = FC(\mathrm{LSTM}(X_1, X_2, \dots, X_t))$, wherein $p_i \geq 0.5$ indicates that false invoicing exists and $p_i < 0.5$ indicates that it does not;
(2) training of invoice false-open detection model
Step1. initializing neural network parameters
Proper initialization of the neural network parameters avoids vanishing gradients when the network has many layers and speeds up network training. The parameter initialization should satisfy two conditions: the activation values of each layer do not saturate, and the activation values of each layer are not 0. Xavier initialization mitigates the vanishing-gradient problem so that signals can propagate deeper into the neural network; the network parameters are therefore initialized with Xavier initialization, which takes the following form:
Figure FDA0002526995650000061
where $n_{in}$ is the input dimension of the layer in which the parameter lies, $n_{out}$ is the output dimension of that layer, and $W_{i,j}$ is the weight between individual neurons;
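The formula itself is not reproduced in this text. As a hedged illustration, the sketch below uses the commonly cited uniform form of Xavier (Glorot) initialization, in which each weight is drawn from $U\left[-\sqrt{6/(n_{in}+n_{out})},\ \sqrt{6/(n_{in}+n_{out})}\right]$; whether the figure shows exactly this variant is an assumption, and the function name `xavier_uniform` and the layer sizes are illustrative.

```python
import math
import random

def xavier_uniform(n_in, n_out, rng):
    """Draw an n_out x n_in weight matrix from U[-limit, limit] (assumed Xavier variant)."""
    limit = math.sqrt(6.0 / (n_in + n_out))
    return [[rng.uniform(-limit, limit) for _ in range(n_in)] for _ in range(n_out)]

rng = random.Random(42)
W = xavier_uniform(n_in=8, n_out=4, rng=rng)
limit = math.sqrt(6.0 / (8 + 4))  # bound on every weight for this layer
```

Scaling the range by both $n_{in}$ and $n_{out}$ keeps the variance of activations and of back-propagated gradients roughly constant across layers, which is the property the text relies on to avoid saturation.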
step2. determining an optimization goal
The classifier is trained to classify the tax payment data correctly. Its classification performance is measured by a loss function: the smaller the loss, the better the classification. The output layer of the invoice false-open detection classifier uses the softmax activation function, the network is trained to minimize the cross-entropy function, and the optimization objective is:
Figure FDA0002526995650000062
where $y_i$ is the label of taxpaying enterprise $i$, which is 1 for an enterprise that has falsely issued invoices and 0 for an enterprise that has not; $p_i$ is the output of the invoice false-open detection classifier, namely the probability that taxpaying enterprise $i$ has invoice false-open behavior;
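The objective formula is not reproduced in this text. Under the definitions of $y_i$ and $p_i$ given here, the standard binary cross-entropy is $-\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$ per enterprise; the sketch below assumes that form and averages it over enterprises. The averaging, the clipping constant `eps`, and the sample values are assumptions for illustration.

```python
import math

def cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy between labels y_i and predicted probabilities p_i."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

# Made-up example: one false-invoicing enterprise (label 1) and one normal enterprise (label 0).
loss_good = cross_entropy([1, 0], [0.9, 0.2])  # confident, correct predictions
loss_poor = cross_entropy([1, 0], [0.6, 0.5])  # less confident predictions
```

The better-calibrated predictions yield the smaller loss, which matches the statement that a smaller loss indicates a better classifier.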
step3, adjusting the network parameters of the model with the BP (back-propagation) algorithm, in which learning consists of forward propagation of signals and backward propagation of errors, as follows:
a) during forward propagation, the time-series features of the tax payment data are input at the input layer of the invoice false-open detection classifier, processed layer by layer by the hidden layers, and passed to the output layer of the classifier; if the actual output of the output layer differs from the corresponding label value, the process switches to the error back-propagation stage;
b) during error back-propagation, the output error of the invoice false-open detection classifier is passed back layer by layer through the hidden layers toward the input layer and apportioned to all units of each layer, yielding an error signal for each layer that serves as the basis for correcting the weights of its units;
c) forward propagation of signals and backward propagation of errors, together with the weight adjustment in each layer, are carried out repeatedly; this continual weight adjustment is the learning and training process of the network, and it continues until the error of the network output falls to an acceptable level or a preset number of learning iterations is reached;
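Steps a) to c) can be sketched as a small training loop. Everything concrete here is an assumption for illustration: a single tanh hidden layer, a sigmoid output unit, per-sample gradient updates, the learning rate, the stopping tolerance, and the toy data set `toy_data`; the function names `forward` and `train` are hypothetical.

```python
import math
import random

def forward(x, W1, b1, w2, b2):
    """Forward propagation: hidden activations h and output probability p."""
    h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(W1, b1)]
    z = sum(w * v for w, v in zip(w2, h)) + b2
    return h, 1.0 / (1.0 + math.exp(-z))

def train(data, hidden=3, lr=0.5, max_epochs=2000, tol=0.05, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    history = []  # mean cross-entropy per epoch
    for _ in range(max_epochs):  # stop after a preset number of learning iterations
        total = 0.0
        for x, y in data:
            h, p = forward(x, W1, b1, w2, b2)  # a) forward propagation
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            delta_out = p - y  # b) error signal at the output unit
            # Error signal distributed to each hidden unit (tanh derivative is 1 - h^2).
            delta_hid = [delta_out * w2[j] * (1.0 - h[j] ** 2) for j in range(hidden)]
            for j in range(hidden):  # c) weight correction from the error signals
                w2[j] -= lr * delta_out * h[j]
                b1[j] -= lr * delta_hid[j]
                for k in range(n_in):
                    W1[j][k] -= lr * delta_hid[j] * x[k]
            b2 -= lr * delta_out
        history.append(total / len(data))
        if history[-1] < tol:  # stop once the error is acceptably small
            break
    return (W1, b1, w2, b2), history

# Made-up, linearly separable toy data: (feature vector, label).
toy_data = [([0.0, 0.1], 0), ([0.2, 0.0], 0), ([0.9, 1.0], 1), ([1.0, 0.8], 1)]
weights, history = train(toy_data)
```

The two exit conditions of the loop correspond to the two stopping criteria in c): the error falling below a tolerance, or the preset number of iterations being reached.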
5) Enterprise invoice false-open detection
The tax payment data to be examined is processed through steps 1), 2) and 3); the resulting time-series features are input into the invoice false-open detection classifier, and the output of the classifier is used to judge whether the enterprise has invoice false-open behavior.
CN202010507242.5A 2020-06-05 2020-06-05 Enterprise invoice false-open detection method based on dynamic edge feature enhanced graph attention network Active CN111724241B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010507242.5A CN111724241B (en) 2020-06-05 2020-06-05 Enterprise invoice false-open detection method based on dynamic edge feature enhanced graph attention network

Publications (2)

Publication Number Publication Date
CN111724241A (en) 2020-09-29
CN111724241B (en) 2024-03-29

Family

ID=72566017

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010507242.5A Active CN111724241B (en) Enterprise invoice false-open detection method based on dynamic edge feature enhanced graph attention network 2020-06-05 2020-06-05

Country Status (1)

Country Link
CN (1) CN111724241B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108269180A (en) * 2016-12-29 2018-07-10 航天信息股份有限公司 Monitor the method and device that enterprise writes out falsely invoice
CN108595621A (en) * 2018-04-23 2018-09-28 泰华智慧产业集团股份有限公司 A kind of early warning analysis method and system write false value added tax invoice
CN109993641A (en) * 2017-12-28 2019-07-09 航天信息股份有限公司 A kind of invoice writes out falsely method for early warning and system
CN110852856A (en) * 2019-11-04 2020-02-28 西安交通大学 Invoice false invoice identification method based on dynamic network representation
WO2020082673A1 (en) * 2018-10-23 2020-04-30 深圳壹账通智能科技有限公司 Invoice inspection method and apparatus, computing device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG GUOJUN *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112613928A (en) * 2020-12-17 2021-04-06 航天信息股份有限公司 Method and system for preventing false opening of value-added tax based on machine learning
CN113642735A (en) * 2021-07-28 2021-11-12 浪潮软件科技有限公司 Continuous learning method for pseudo-tax payer identification
CN113642735B (en) * 2021-07-28 2023-07-18 浪潮软件科技有限公司 Continuous learning method for identifying virtual tax payers

Also Published As

Publication number Publication date
CN111724241B (en) 2024-03-29

Similar Documents

Publication Publication Date Title
CN110866536B (en) Cross-regional enterprise tax evasion identification method based on PU learning
CN107633265B (en) Data processing method and device for optimizing credit evaluation model
WO2021088499A1 (en) False invoice issuing identification method and system based on dynamic network representation
CN108647993B (en) Method for identifying relationship between bidders in bidding process
Munappy et al. Data management for production quality deep learning models: Challenges and solutions
CN110956273A (en) Credit scoring method and system integrating multiple machine learning models
CN110852881B (en) Risk account identification method and device, electronic equipment and medium
CN111724241B (en) Enterprise invoice false-open detection method based on dynamic edge feature enhanced graph attention network
Wu et al. Predication of futures market by using boosting algorithm
CN111754317A (en) Financial investment data evaluation method and system
CN108647714A (en) Acquisition methods, terminal device and the medium of negative label weight
Zhang Prediction of Purchase Volume of Cross‐Border e‐Commerce Platform Based on BP Neural Network
CN111626331B (en) Automatic industry classification device and working method thereof
CN117114705A (en) Continuous learning-based e-commerce fraud identification method and system
Wu et al. Customer churn prediction for commercial banks using customer-value-weighted machine learning models
CN116011623A (en) Enterprise marketing item tax risk prediction method based on mixed proportion estimation
CN115994684A (en) Enterprise risk assessment method, enterprise risk assessment device, computer equipment and medium
Khidmat et al. Machine learning in the boardroom: gender diversity prediction using boosting and undersampling methods
Wang et al. Risk assessment of customer churn in telco using FCLCNN-LSTM model
Yang et al. An algorithm for ordinal classification based on pairwise comparison
Nawaiseh et al. Financial Statement Audit Utilising Naive Bayes Networks, Decision Trees, Linear Discriminant Analysis and Logistic Regression
Zimal et al. Customer churn prediction using machine learning
Patil et al. Stock Trend Prediction Using KNN Algorithm
Belsare et al. A novel model for house price prediction with machine learning techniques
Pires Study of Market Influence on Tender Performance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant