CN112183948B - Method for assessing the risk of false value-added tax invoice issuance by trading enterprises based on input and output item comparison


Info

Publication number
CN112183948B
Authority
CN
China
Prior art keywords
commodity
enterprise
codes
sales
invoices
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010929732.4A
Other languages
Chinese (zh)
Other versions
CN112183948A (en)
Inventor
吴敬
周宏立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Digital China Information Systems Co ltd
Original Assignee
Digital China Information Systems Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Digital China Information Systems Co ltd
Priority to CN202010929732.4A
Publication of CN112183948A
Application granted
Publication of CN112183948B
Status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/04 Billing or invoicing

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Educational Administration (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A method for assessing the risk of false value-added tax (VAT) invoice issuance by trading enterprises, based on input and output item comparison, comprises the following steps: step 1, extracting all VAT invoice data of the region to be evaluated; step 2, calculating the similarity of every pair of commodity codes to form a commodity code similarity matrix SIM; step 3, screening the enterprises to be evaluated according to enterprise registration industry information, invoicing information, and VAT declaration data; step 4, comparing all commodity codes involved in the screened enterprises using the similarity matrix SIM of step 2, and finding the enterprises whose input and output items do not match, together with the corresponding commodity codes, to form a risk enterprise list; and step 5, removing enterprises that purchased certain commodities for their own use and therefore did not sell them, to form a final risk list. Compared with the prior art, the invention identifies false-issuance risk with high accuracy and can pinpoint the specific invoices and amounts at risk.

Description

Method for assessing the risk of false value-added tax invoice issuance by trading enterprises based on input and output item comparison
Technical Field
The invention relates to the technical field of tax risk assessment, and in particular to a method for assessing the risk of false value-added tax (VAT) invoice issuance by trading enterprises based on input and output item comparison.
Background
False invoice issuance is the illegal act of issuing an invoice that does not reflect the actual business conditions: in the course of a commodity transaction, a taxpaying unit or individual seeking to evade tax, or a purchasing unit acting for some other purpose, has an invoice issued with a falsified commodity name, quantity, unit price, or amount. It covers four situations: falsely issuing invoices for others, falsely issuing invoices for oneself, having others falsely issue invoices for oneself, and introducing others to falsely issue invoices.
To address false invoice issuance by taxpayers, tax authorities use data comparison and analysis to evaluate and verify the authenticity and accuracy of issued invoices, and to make qualitative and quantitative judgments about a taxpayer's false-issuance risk so that further investigation measures can be taken.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention aims to provide a method for assessing the risk of false VAT invoice issuance by trading enterprises based on input and output item comparison, which performs a comprehensive comparison at three levels (commodity abbreviation, commodity code, and goods information) and improves the accuracy with which false-issuance risk is identified.
The invention adopts the following technical scheme.
A method for assessing the risk of false VAT invoice issuance by trading enterprises based on input and output item comparison, characterized by comprising the following steps:
Step 1, extracting all VAT invoice data of the region to be evaluated within a set time interval, where the commodity code vector of all invoices is denoted $SP = (sp_1, sp_2, \dots, sp_j, \dots, sp_\beta)$, $sp_j$ is the $j$-th commodity code in $SP$, and $\beta$ is the number of commodity codes in $SP$, $j = 1, 2, \dots, \beta$;
Step 2, calculating the similarity $sim_{ab}$, $a, b = 1, 2, \dots, \beta$, of any two commodity codes $sp_a$ and $sp_b$, and forming the commodity code similarity matrix $SIM = (sim_{ab})_{\beta \times \beta}$ with $sim_{ab}$ as its elements;
Step 3, screening the enterprises to be evaluated according to enterprise registration industry information, invoicing information, and VAT declaration data, forming the enterprise vector $C = (c_1, c_2, \dots, c_k, \dots, c_\delta)$, where $c_k$ is the $k$-th enterprise to be compared and $\delta$ is the number of enterprises to be compared, $k = 1, 2, \dots, \delta$;
Step 4, comparing all commodity codes involved in the screened enterprises using the similarity matrix SIM of step 2, and finding the enterprises whose input and output items do not match, together with the corresponding commodity codes, to form a risk enterprise list;
Step 5, removing enterprises that have no corresponding sales invoices because they purchased certain commodities for their own use and did not sell them, to form a final risk list.
Preferably, the set time interval in step 1 is two years.
Preferably, step 2 specifically includes:
Step 2.1, extracting the goods names of all invoices and generating a word frequency vector $CP_j$, $j = 1, 2, \dots, \beta$, for each commodity code $sp_j$;
Step 2.2, extracting keywords from the word frequency vector $CP_j$ of each commodity code $sp_j$ to form the keyword word frequency vector $CP''_j$ of each commodity code $sp_j$;
Step 2.3, calculating the similarity $sim_{ab}$, $a, b = 1, 2, \dots, \beta$, of any two commodity codes $sp_a$ and $sp_b$ from $CP''_a$ and $CP''_b$, and forming the commodity code similarity matrix $SIM = (sim_{ab})_{\beta \times \beta}$ with $sim_{ab}$ as its elements.
Preferably, in step 2.1, the goods names of all VAT invoices extracted in step 1 are merged and segmented into words to form the all-invoice word vector; all invoices with commodity code $sp_j$ are extracted, their goods names are merged and segmented to form a word occurrence count vector of the same length as the all-invoice word vector, and the elements of this count vector are normalized to form the word frequency vector $CP_j$ of commodity code $sp_j$;
In step 2.2, a $\beta \times \alpha$ matrix $M$ is formed with the components $cp_{ji}$ of $CP_j$ as its elements; a TF-IDF transformation is applied to each element $cp_{ji}$ of $M$, and the results $cp'_{ji}$ form a $\beta \times \alpha$ matrix $M'$; in row $j$ of $M'$, every $cp'_{ji}$ whose value does not rank among the top $\gamma$ is set to zero, and the results $cp''_{ji}$ form a $\beta \times \alpha$ matrix $M''$;
In step 2.3, $sim_{ab}$ is calculated as follows,
$$sim_{ab} = \frac{CP''_a \cdot CP''_b}{\lVert CP''_a \rVert \, \lVert CP''_b \rVert}$$
Wherein:
$\cdot$ represents the scalar (dot) product of two vectors,
$\lVert\,\rVert$ represents the length of a vector.
Preferably, in step 2.1, the word frequency $cp_{ji}$ of $w_i$ in all invoice goods names with commodity code $sp_j$ is calculated as follows,
[formula not reproduced in the source text]
where $t_{ji}$ represents the number of occurrences of the $i$-th word $w_i$ of $FC_{all}$ in all invoice goods names with commodity code $sp_j$, and is 0 if the word does not occur,
forming the word frequency vector $CP_j = (cp_{j1}, cp_{j2}, \dots, cp_{ji}, \dots, cp_{j\alpha})$ of commodity code $sp_j$.
Preferably, step 3 specifically includes:
Step 3.1, selecting enterprises belonging to the wholesale industry according to enterprise registration industry information;
Step 3.2, removing enterprises whose proportion of service invoices is higher than a threshold according to enterprise invoicing information;
Step 3.3, selecting enterprises whose ratio of invoiced sales amount to total sales amount is greater than a screening threshold according to enterprise VAT declaration data, forming the enterprise vector $C = (c_1, c_2, \dots, c_k, \dots, c_\delta)$, where $c_k$ is the $k$-th enterprise to be compared and $\delta$ is the number of enterprises to be compared, $k = 1, 2, \dots, \delta$.
Preferably, step 4 specifically includes:
using the similarity matrix SIM of step 2 to perform a "purchases without sales" comparison and a "sales without purchases" comparison, and combining the two results to obtain the risk enterprise list and the commodity codes at risk.
Preferably, the "purchases without sales" comparison includes: the commodity code vector involved in the input invoices of enterprise $c_k$ is denoted $SP_k = (sp_{k1}, sp_{k2}, \dots, sp_{km}, \dots, sp_{k\theta_1})$, where $sp_{km}$ is the $m$-th commodity code in $SP_k$ and $\theta_1$ is the number of commodity codes involved; the total amount of all input invoices with commodity code $sp_{km}$ is denoted $amt1_{km}$;
Step 4.1.1, commodity abbreviation comparison: the commodity abbreviation of commodity code $sp_{km}$ is denoted $ti1_{km}$; all commodity codes in $SP_k$ whose abbreviation is $ti1_{km}$ are extracted and the corresponding input invoice amounts are summed, denoted $amt1'_{km}$; all commodity codes in the sales invoices of enterprise $c_k$ whose abbreviation is $ti1_{km}$ are extracted and the corresponding sales invoice amounts are summed, denoted $amt1''_{km}$; if $amt1''_{km} \ge amt1'_{km}$, the input and output items of the enterprise match for this commodity code and no further comparison is made; otherwise step 4.1.2 is executed;
Step 4.1.2, commodity code comparison: according to the commodity similarity matrix of step 2, if the sales invoices of the enterprise contain commodity codes whose similarity to $sp_{km}$ is greater than a given threshold, the sales invoices for those commodity codes are extracted and their amounts are summed, denoted $amt1'''_{km}$; if $amt1'''_{km} \ge amt1_{km}$, the input and output items of the enterprise match for this commodity code and no further comparison is made; otherwise step 4.1.3 is executed;
Step 4.1.3, goods information comparison: the goods names of all sales invoices of enterprise $c_k$ are compared with those of the input invoices whose commodity code is $sp_{km}$, to find sales invoices with consistent goods information. Goods information is consistent when the goods name on the sales invoice is identical to at least one of the goods names under $sp_{km}$, or one of the two names contains the other. If such sales invoices exist, they are extracted and their amounts are summed, denoted $amt1''''_{km}$; if $amt1'''_{km} + amt1''''_{km} \ge amt1_{km}$, the input and output items of the enterprise match for this commodity code; otherwise commodity code $sp_{km}$ of enterprise $c_k$ carries a "purchases without sales" risk.
Preferably, in step 4.1.3 the amount of the problematic commodity code, $amt1_{km}$, and the difference between input and output amounts, $amt1_{km} - amt1'''_{km} - amt1''''_{km}$, represent the actual size of the risk.
Preferably, the "sales without purchases" comparison includes: the commodity code vector involved in the sales invoices of enterprise $c_k$ is denoted $SP'_k = (sp'_{k1}, sp'_{k2}, \dots, sp'_{kn}, \dots, sp'_{k\theta_2})$, where $sp'_{kn}$ is the $n$-th commodity code in $SP'_k$ and $\theta_2$ is the number of commodity codes involved;
Step 4.2.1, commodity abbreviation comparison: the commodity abbreviation of commodity code $sp'_{kn}$ is denoted $ti2_{kn}$; all commodity codes in $SP'_k$ whose abbreviation is $ti2_{kn}$ are extracted, and all commodity codes in the input invoices of enterprise $c_k$ whose abbreviation is $ti2_{kn}$ are extracted; if such input invoices exist, the input and output items of the enterprise match for this commodity code and no further comparison is made; otherwise step 4.2.2 is executed;
Step 4.2.2, commodity code comparison: according to the commodity similarity matrix of step 2, if the input invoices of the enterprise contain commodity codes whose similarity to $sp'_{kn}$ is greater than a given threshold, the input and output items of the enterprise match for this commodity code and no further comparison is made; otherwise step 4.2.3 is executed;
Step 4.2.3, goods information comparison: the goods names of all input invoices of enterprise $c_k$ are compared with those of the sales invoices whose commodity code is $sp'_{kn}$, to find input invoices with consistent goods information. Goods information is consistent when the goods name on the input invoice is identical to at least one of the goods names under $sp'_{kn}$, or one of the two names contains the other. If such input invoices exist, the input and output items of the enterprise match for this commodity code; otherwise commodity code $sp'_{kn}$ of enterprise $c_k$ carries a "sales without purchases" risk, the size of which depends on the amount of the problematic commodity code.
Compared with the prior art, the invention has the following beneficial effects: the method for assessing the risk of false VAT invoice issuance by trading enterprises based on input and output item comparison is a data comparison and analysis method for addressing false invoice issuance by taxpayers. Its main features are that it is designed around the industry characteristics of trading enterprises and compares data at three levels (commodity abbreviation, commodity code, and goods information); it identifies false-issuance risk with high accuracy and can pinpoint the specific invoices and amounts at risk, which makes it easier for tax authorities to respond to those risks.
Drawings
FIG. 1 is a flow chart of the method of the invention for assessing the risk of false VAT invoice issuance by trading enterprises based on input and output item comparison;
FIG. 2 is a flow chart of the calculation of the similarity matrix between all commodity codes in the region to be evaluated;
FIG. 3 is a schematic diagram of the two input and output item comparisons of the invention, "purchases without sales" and "sales without purchases".
Detailed Description
The application is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present application, and are not intended to limit the scope of the present application.
As shown in FIG. 1, the invention provides a method for assessing the risk of false VAT invoice issuance by trading enterprises based on input and output item comparison, which comprises the following steps:
Step 1, extracting all VAT invoice data of the region to be evaluated. The commodity code vector of all invoices is denoted $SP = (sp_1, sp_2, \dots, sp_j, \dots, sp_\beta)$, where $sp_j$ is the $j$-th commodity code in $SP$ and $\beta$ is the number of commodity codes in $SP$.
All VAT invoices of all enterprises in the region to be evaluated are taken over a two-year interval, that is, the two years preceding the evaluation time point, and include both special VAT invoices and ordinary VAT invoices. The resulting data table contains nine main fields: invoice code, invoice number, seller enterprise ID, buyer enterprise ID, goods name, commodity code, invoice date, invoice amount, and invoice tax.
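The nine-field invoice table and the two-year window of step 1 can be pictured with a small record type. This is only an illustrative sketch: the class name `Invoice`, its field names, and the helper `within_window` are hypothetical, and the two-year interval is approximated as 730 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Invoice:
    """One row of the step 1 data table (field names are illustrative)."""
    invoice_code: str
    invoice_number: str
    seller_id: str
    buyer_id: str
    goods_name: str
    commodity_code: str
    issue_date: date
    amount: float
    tax: float


def within_window(invoices, as_of, years=2):
    """Keep invoices issued in the `years` before `as_of` (365-day years, an approximation)."""
    start = as_of - timedelta(days=365 * years)
    return [inv for inv in invoices if start <= inv.issue_date <= as_of]
```

For a given enterprise, rows where it appears as `buyer_id` are its input invoices and rows where it appears as `seller_id` are its sales invoices.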
Step 2, calculating the similarity matrix between all commodity codes in the region to be evaluated. As shown in FIG. 2, this specifically includes:
Step 2.1, extracting the goods names of all invoices and generating a word frequency vector for each commodity code.
The goods names of all VAT invoices extracted in step 1 are merged and segmented into words to form the all-invoice word vector $FC_{all} = (w_1, w_2, \dots, w_i, \dots, w_\alpha)$, where $w_i$ is the $i$-th word in $FC_{all}$ and $\alpha$ is the number of words in $FC_{all}$.
All invoices with commodity code $sp_j$ are extracted, and their goods names are merged and segmented to form a word occurrence count vector $T_j = (t_{j1}, t_{j2}, \dots, t_{ji}, \dots, t_{j\alpha})$ of the same length as $FC_{all}$, where $t_{ji}$ is the number of occurrences of the $i$-th word $w_i$ of $FC_{all}$ in all invoice goods names with commodity code $sp_j$, and is 0 if the word does not occur. The elements of $T_j$ are normalized to form the word frequency vector $CP_j = (cp_{j1}, cp_{j2}, \dots, cp_{ji}, \dots, cp_{j\alpha})$ of commodity code $sp_j$, where $cp_{ji}$ is the word frequency of $w_i$ in all invoice goods names with commodity code $sp_j$. In a preferred but non-limiting embodiment, $cp_{ji}$ is calculated by the following formula,
[formula not reproduced in the source text]
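Step 2.1 can be sketched as follows. Because the normalization formula is not reproduced in the text, dividing each count by the row total is an assumption made here for illustration. The function name, the `names_by_code` input shape, and whitespace tokenization are likewise hypothetical stand-ins; Chinese goods names would need a proper word segmenter.

```python
from collections import Counter


def term_frequency_vectors(names_by_code, tokenize=str.split):
    """Build the shared vocabulary FC_all and one word-frequency vector CP_j per code.

    names_by_code maps a commodity code to the list of goods names seen under it.
    Normalizing by the row total is an assumption, not the patent's stated formula.
    """
    vocab = sorted({w for names in names_by_code.values()
                    for name in names for w in tokenize(name)})
    vectors = {}
    for code, names in names_by_code.items():
        counts = Counter(w for name in names for w in tokenize(name))  # T_j
        total = sum(counts.values())
        vectors[code] = [counts[w] / total if total else 0.0 for w in vocab]  # CP_j
    return vocab, vectors
```

Every vector has the same length as the vocabulary, so the vectors can be stacked row by row into the matrix used in step 2.2.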
Step 2.2, extracting keywords from the word frequency vector of each commodity code obtained in step 2.1 to form the keyword word frequency vector of each commodity code.
A $\beta \times \alpha$ matrix $M$ is formed with $cp_{ji}$ as its elements,
Each element $cp_{ji}$ of matrix $M$ is TF-IDF transformed by the following formula,
[formula not reproduced in the source text]
A $\beta \times \alpha$ matrix $M'$ is formed with $cp'_{ji}$ as its elements,
The matrix $M'$ is processed by the following formula,
$$cp''_{ji} = \begin{cases} cp'_{ji}, & \text{if } cp'_{ji} \text{ ranks among the top } \gamma \text{ values in row } j \text{ of } M' \\ 0, & \text{otherwise} \end{cases}$$
One preferred, but non-limiting, embodiment is $\gamma = 500$.
A $\beta \times \alpha$ matrix $M''$ is formed with $cp''_{ji}$ as its elements,
The $j$-th row $CP''_j$ of matrix $M''$ is the keyword vector of all invoices with commodity code $sp_j$: the words corresponding to its non-zero components are the keywords, and the corresponding values are their word frequencies.
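The keyword extraction of step 2.2 can be sketched as a TF-IDF weighting followed by a per-row top-γ filter. The TF-IDF formula is not reproduced in the text, so the common variant tf × log(β / df) is assumed here; the function name and the list-of-lists matrix layout are also illustrative.

```python
import math


def tfidf_top_gamma(M, gamma):
    """Turn M (beta x alpha word frequencies) into M'' with only the top-gamma weights per row.

    Assumption: idf_i = log(beta / df_i), where df_i is the number of commodity
    codes whose row has a non-zero frequency for word i.
    """
    beta = len(M)
    alpha = len(M[0]) if M else 0
    df = [sum(1 for row in M if row[i] > 0) for i in range(alpha)]
    # M': TF-IDF weights
    M1 = [[row[i] * math.log(beta / df[i]) if df[i] else 0.0 for i in range(alpha)]
          for row in M]
    # M'': zero every entry that is not among the gamma largest of its row
    M2 = []
    for row in M1:
        keep = set(sorted(range(alpha), key=lambda i: row[i], reverse=True)[:gamma])
        M2.append([v if i in keep else 0.0 for i, v in enumerate(row)])
    return M2
```

Under this variant a word that occurs under every commodity code receives weight zero, which is why generic words drop out of the keyword vectors.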
Step 2.3, for all commodity codes, calculating the similarity $sim_{ab}$, $a, b = 1, 2, \dots, \beta$, of any two commodity codes $sp_a$ and $sp_b$ from $CP''_a$ and $CP''_b$, and forming the commodity code similarity matrix $SIM = (sim_{ab})_{\beta \times \beta}$ with $sim_{ab}$ as its elements, where $sim_{ab}$ is calculated by the following formula,
$$sim_{ab} = \frac{CP''_a \cdot CP''_b}{\lVert CP''_a \rVert \, \lVert CP''_b \rVert}$$
Wherein:
$\cdot$ represents the scalar (dot) product of two vectors.
$\lVert\,\rVert$ represents the length of a vector.
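A similarity defined as a dot product divided by the product of vector lengths is the cosine similarity. A minimal sketch of step 2.3 follows; the function names are illustrative, and returning 0.0 for an all-zero vector is an assumption, since the text does not cover that case.

```python
import math


def cosine(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    # Assumption: an all-zero keyword vector is treated as similar to nothing.
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_matrix(vectors):
    """Build SIM, where SIM[a][b] is the cosine similarity of rows a and b."""
    return [[cosine(a, b) for b in vectors] for a in vectors]
```

The resulting matrix is symmetric, so an implementation over many commodity codes could compute only the upper triangle.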
Step 3, screening the enterprises to be evaluated.
All enterprises in the region to be evaluated are screened according to the following conditions to form the list of enterprises to be evaluated:
Step 3.1, selecting enterprises belonging to the wholesale (trading) industry according to enterprise registration industry information.
Step 3.2, removing enterprises whose proportion of service invoices is higher than a set proportion according to enterprise invoicing information; in a preferred but non-limiting embodiment, the proportion is set to 40%.
Step 3.3, selecting enterprises whose ratio of invoiced sales amount to total sales amount is greater than a screening threshold according to enterprise VAT declaration data. One preferred, but non-limiting, screening threshold is 80%. This forms the enterprise vector $C = (c_1, c_2, \dots, c_k, \dots, c_\delta)$, where $c_k$ is the $k$-th enterprise to be compared and $\delta$ is the number of enterprises to be compared, $k = 1, 2, \dots, \delta$.
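The three screening conditions of step 3 can be sketched as a simple filter. The record type `EnterpriseProfile`, its field names, and the literal industry label `"wholesale"` are hypothetical; the default thresholds of 40% and 80% are the preferred values given above.

```python
from dataclasses import dataclass


@dataclass
class EnterpriseProfile:
    """Per-enterprise inputs to the screen (illustrative field names)."""
    enterprise_id: str
    industry: str                 # from registration data
    service_invoice_ratio: float  # share of service invoices among issued invoices
    invoiced_sales: float         # sales amount covered by issued invoices
    declared_total_sales: float   # total sales from the VAT declaration


def screen_enterprises(profiles, service_ratio_max=0.40, invoiced_ratio_min=0.80):
    """Return the ids of the enterprises that pass steps 3.1 to 3.3."""
    selected = []
    for p in profiles:
        if p.industry != "wholesale":                  # step 3.1
            continue
        if p.service_invoice_ratio > service_ratio_max:  # step 3.2
            continue
        if p.declared_total_sales <= 0:                # avoid division by zero
            continue
        if p.invoiced_sales / p.declared_total_sales > invoiced_ratio_min:  # step 3.3
            selected.append(p.enterprise_id)
    return selected
```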
Step 4, using the result of step 2, all commodity codes involved in the screened enterprises are compared from three angles (commodity abbreviation, commodity code, and goods name) to find the enterprises whose input and output items do not match, together with the corresponding commodity codes, forming a risk enterprise list, as shown in FIG. 3. The tax risks of enterprise VAT invoices fall into two types, false deduction and false issuance, which correspond to the two input and output item comparisons, "purchases without sales" and "sales without purchases". This specifically includes:
Step 4.1, "purchases without sales" comparison. The number of input invoices of enterprise $c_k$ is denoted $\varepsilon 1_k$, and the commodity code vector involved is denoted $SP_k = (sp_{k1}, sp_{k2}, \dots, sp_{km}, \dots, sp_{k\theta_1})$, where $sp_{km}$ is the $m$-th commodity code in $SP_k$ and $\theta_1$ is the number of commodity codes involved. The input invoice amount vector is denoted $AMT1_k = (amt1_{k1}, amt1_{k2}, \dots, amt1_{km}, \dots, amt1_{k\theta_1})$, where $amt1_{km}$ is the total amount of all input invoices with commodity code $sp_{km}$.
Step 4.1.1, commodity abbreviation comparison. The commodity abbreviation of commodity code $sp_{km}$ is denoted $ti1_{km}$; the commodity abbreviation is a field in the commodity code table issued by the tax bureau, and one abbreviation corresponds to many commodity codes. All commodity codes in $SP_k$ whose abbreviation is $ti1_{km}$ are extracted and the corresponding input invoice amounts are summed, denoted $amt1'_{km}$; all commodity codes in the sales invoices of enterprise $c_k$ whose abbreviation is $ti1_{km}$ are extracted and the corresponding sales invoice amounts are summed, denoted $amt1''_{km}$. If $amt1''_{km} \ge amt1'_{km}$, the input and output items of the enterprise match for this commodity code and no further comparison is made; otherwise step 4.1.2 is executed.
Step 4.1.2, commodity code comparison. According to the commodity similarity matrix of step 2, if the sales invoices of the enterprise contain commodity codes whose similarity to $sp_{km}$ is greater than a given threshold, the sales invoices for those commodity codes are extracted and their amounts are summed, denoted $amt1'''_{km}$. Those skilled in the art may set the threshold freely; in a preferred but non-limiting embodiment it is set between 0.4 and 0.6 and adjusted dynamically according to the accuracy required of the comparison. If $amt1'''_{km} \ge amt1_{km}$, the input and output items of the enterprise match for this commodity code and no further comparison is made; otherwise step 4.1.3 is executed.
Step 4.1.3, goods information comparison. The goods names of all sales invoices of enterprise $c_k$ are compared with those of the input invoices whose commodity code is $sp_{km}$, to find sales invoices with consistent goods information. Goods information is consistent when the goods name on the sales invoice is identical to at least one of the goods names under $sp_{km}$, or one of the two names contains the other. If such sales invoices exist, they are extracted and their amounts are summed, denoted $amt1''''_{km}$. If $amt1'''_{km} + amt1''''_{km} \ge amt1_{km}$, the input and output items of the enterprise match for this commodity code; otherwise commodity code $sp_{km}$ of enterprise $c_k$ carries a "purchases without sales" risk.
The amount of the problematic commodity code, $amt1_{km}$, and the difference between input and output amounts, $amt1_{km} - amt1'''_{km} - amt1''''_{km}$, represent the actual size of the risk.
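The three-layer cascade of step 4.1 can be sketched for a single commodity code as below. All names are hypothetical: invoices are reduced to `(commodity_code, goods_name, amount)` tuples, `abbr` maps a code to its commodity abbreviation, and `sim(a, b)` stands in for a lookup into SIM. Excluding sales lines already counted in layer 2 from layer 3 is an assumption made to avoid double counting; the text does not address it.

```python
def purchase_without_sale_gap(code, purchases, sales, abbr, sim, threshold=0.5):
    """Return 0.0 if purchases under `code` are matched by sales, else the unmatched amount."""
    own = [p for p in purchases if p[0] == code]
    amt1 = sum(a for _, _, a in own)

    # Step 4.1.1: compare totals across all codes sharing the same abbreviation.
    ti = abbr.get(code)
    if ti is not None:
        amt_in = sum(a for c, _, a in purchases if abbr.get(c) == ti)
        amt_out = sum(a for c, _, a in sales if abbr.get(c) == ti)
        if amt_out >= amt_in:
            return 0.0

    # Step 4.1.2: sales whose commodity code is similar enough to `code`.
    similar = {i for i, (c, _, _) in enumerate(sales) if sim(code, c) > threshold}
    amt3 = sum(sales[i][2] for i in similar)
    if amt3 >= amt1:
        return 0.0

    # Step 4.1.3: sales whose goods name equals, contains, or is contained in a purchased name.
    names = {n for _, n, _ in own}

    def consistent(name):
        return any(name == m or name in m or m in name for m in names)

    amt4 = sum(a for i, (_, n, a) in enumerate(sales)
               if i not in similar and consistent(n))  # assumption: no double counting
    if amt3 + amt4 >= amt1:
        return 0.0
    return amt1 - amt3 - amt4
```

A non-zero return value corresponds to the amount difference that the text treats as the size of the "purchases without sales" risk.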
Step 4.2, "sales without purchases" comparison. The number of sales invoices of enterprise $c_k$ is denoted $\varepsilon 2_k$, and the commodity code vector involved is denoted $SP_k = (sp_{k1}, sp_{k2}, \dots, sp_{km}, \dots, sp_{k\theta_2})$, where $sp_{km}$ is the $m$-th commodity code in $SP_k$ and $\theta_2$ is the number of commodity codes involved. The sales invoice amount vector is denoted $AMT2_k = (amt2_{k1}, amt2_{k2}, \dots, amt2_{km}, \dots, amt2_{k\theta_2})$, where $amt2_{km}$ is the total amount of all sales invoices with commodity code $sp_{km}$.
Step 4.2.1, commodity abbreviation comparison. The commodity abbreviation of commodity code $sp_{km}$ is denoted $ti2_{km}$; all commodity codes in $SP_k$ whose abbreviation is $ti2_{km}$ are extracted, and all commodity codes in the input invoices of enterprise $c_k$ whose abbreviation is $ti2_{km}$ are extracted. If such input invoices exist, the input and output items of the enterprise match for this commodity code and no further comparison is made; otherwise step 4.2.2 is executed.
Step 4.2.2, commodity code comparison. According to the commodity similarity matrix of step 2, if the input invoices of the enterprise contain commodity codes whose similarity to $sp_{km}$ is greater than a given threshold, the input and output items of the enterprise match for this commodity code and no further comparison is made; otherwise step 4.2.3 is executed.
Step 4.2.3, goods information comparison. The goods names of all input invoices of enterprise $c_k$ are compared with those of the sales invoices whose commodity code is $sp_{km}$, to find input invoices with consistent goods information. Goods information is consistent when the goods name on the input invoice is identical to at least one of the goods names under $sp_{km}$, or one of the two names contains the other. If such input invoices exist, the input and output items of the enterprise match for this commodity code; otherwise commodity code $sp_{km}$ of enterprise $c_k$ carries a "sales without purchases" risk, the size of which depends on the amount of the problematic commodity code.
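Step 4.2 is an existence check rather than an amount comparison: a sales code is cleared as soon as any layer finds a related input invoice. A minimal sketch under the same hypothetical conventions, with `(commodity_code, goods_name, amount)` tuples, an `abbr` mapping, and a `sim(a, b)` lookup:

```python
def sale_without_purchase_amount(code, sales, purchases, abbr, sim, threshold=0.5):
    """Return 0.0 if sales under `code` have some related purchase, else their total amount."""
    own = [s for s in sales if s[0] == code]

    # Step 4.2.1: any purchase sharing the commodity abbreviation.
    ti = abbr.get(code)
    if ti is not None and any(abbr.get(c) == ti for c, _, _ in purchases):
        return 0.0

    # Step 4.2.2: any purchase whose commodity code is similar enough.
    if any(sim(code, c) > threshold for c, _, _ in purchases):
        return 0.0

    # Step 4.2.3: any purchase whose goods name equals, contains, or is contained in a sold name.
    names = {n for _, n, _ in own}
    if any(n == m or n in m or m in n for _, n, _ in purchases for m in names):
        return 0.0

    # Flagged: the risk size is taken as the sales amount under this code.
    return sum(a for _, _, a in own)
```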
Step 4.3, combining the results of the "purchases without sales" and "sales without purchases" comparisons to obtain the risk enterprise list and the problematic commodity codes.
Step 5, further screening the risk list of step 4: some enterprises purchase certain commodities for their own use and do not sell them, so no corresponding sales invoices exist; such enterprises can be removed from the risk list to form the final risk list.
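Steps 4.3 and 5 can be sketched as merging the two sets of findings and dropping self-use purchases. The text does not say how self-use is recognized, so the `self_use_codes` set of commodity codes is purely an assumption for illustration, as are the function name and the dictionary shapes.

```python
def final_risk_list(purchase_gaps, sale_flags, self_use_codes):
    """Merge both comparisons into {enterprise: [(code, kind, amount), ...]}.

    purchase_gaps and sale_flags map (enterprise_id, commodity_code) to a risk amount.
    self_use_codes is a hypothetical set of codes treated as bought for own use.
    """
    risk = {}
    for (ent, code), amt in purchase_gaps.items():
        # Step 5: a purchase kept for own use legitimately has no matching sale.
        if amt > 0 and code not in self_use_codes:
            risk.setdefault(ent, []).append((code, "purchases_without_sales", amt))
    for (ent, code), amt in sale_flags.items():
        if amt > 0:
            risk.setdefault(ent, []).append((code, "sales_without_purchases", amt))
    return risk
```

Keeping the commodity code and amount with each finding preserves the ability to point at the specific invoices and amounts at risk.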
Compared with the prior art, the invention has the following beneficial effects: the method for assessing the risk of false VAT invoice issuance by trading enterprises based on input and output item comparison is a data comparison and analysis method for addressing false invoice issuance by taxpayers. Its main features are that it is designed around the industry characteristics of trading enterprises and compares data at three levels (commodity abbreviation, commodity code, and goods information); it identifies false-issuance risk with high accuracy and can pinpoint the specific invoices and amounts at risk, which makes it easier for tax authorities to respond to those risks. Although the applicant has described and illustrated the embodiments of the invention in detail with reference to the drawings, those skilled in the art should understand that the above embodiments are only preferred embodiments of the invention, and that the detailed description is intended to help the reader better understand the spirit of the invention rather than to limit its scope of protection; any improvement or modification based on the spirit of the invention falls within its scope of protection.

Claims (7)

1. A business enterprise value-added tax invoice virtual issuing risk assessment method based on entry and sales item comparison is characterized by comprising the following steps:
Step 1, extracting all value-added tax invoice data in a set time interval of an area to be evaluated, wherein SP is used for representing commodity coding vectors of all invoices, SP= (SP 1,sp2,…,spj,…,spβ),spj is used for representing the j-th commodity code in the SP, and beta is used for representing the commodity coding quantity included in the SP, j=1, 2, … and beta;
Step 2, calculating the similarity $sim_{ab}$ of any two commodity codes $sp_a$ and $sp_b$, $a, b = 1, 2, \ldots, \beta$, and forming a commodity code similarity matrix $SIM = (sim_{ab})_{\beta \times \beta}$ with $sim_{ab}$ as its elements;
Step 3, screening the enterprises to be evaluated according to enterprise registration industry information, billing information and value-added tax declaration data to form an enterprise vector $C = (c_1, c_2, \ldots, c_k, \ldots, c_\delta)$, wherein $c_k$ denotes the $k$-th enterprise to be compared and $\delta$ denotes the number of enterprises to be compared, $k = 1, 2, \ldots, \delta$;
Step 4, using the similarity matrix $SIM$ of step 2, carrying out an entry-without-sales comparison and a sales-without-entry comparison on the commodity abbreviation, the commodity code and/or the goods name, and combining the results of the two comparisons to find the enterprises whose entry and sales items do not match, together with the corresponding commodity codes, so as to form a list of risk enterprises and their at-risk commodity codes;
The entry-without-sales comparison includes: the commodity code vector involved in the entry invoices of enterprise $c_k$ is denoted by $SP_k$, $SP_k = (sp_{k1}, sp_{k2}, \ldots, sp_{km}, \ldots, sp_{k\theta_1})$, wherein $sp_{km}$ denotes the $m$-th commodity code in $SP_k$ and $\theta_1$ denotes the number of commodity codes involved; the total amount of all entry invoices with commodity code $sp_{km}$ is denoted by $amt1_{km}$,
Step 4.1.1, commodity abbreviation comparison: the commodity abbreviation of commodity code $sp_{km}$ is denoted by $ti1_{km}$; all commodity codes in $SP_k$ whose commodity abbreviation is $ti1_{km}$ are extracted, and the corresponding entry invoice amounts are summed and denoted by $amt1'_{km}$; all commodity codes whose commodity abbreviation is $ti1_{km}$ are extracted from the sales invoices of enterprise $c_k$, and the corresponding sales invoice amounts are summed and denoted by $amt1''_{km}$; if $amt1''_{km} \ge amt1'_{km}$, the entry and sales of this commodity code of the enterprise are consistent and the following comparisons are not carried out; otherwise, step 4.1.2 is executed;
Step 4.1.2, commodity code comparison: if the commodity code similarity matrix of step 2 contains commodity codes whose similarity to $sp_{km}$ is greater than a given threshold, the sales invoice amounts corresponding to all such commodity codes are extracted and summed, denoted by $amt1'''_{km}$; if $amt1'''_{km} \ge amt1_{km}$, the entry and sales of this commodity code of the enterprise match and the following comparison is not carried out; otherwise, step 4.1.3 is executed;
Step 4.1.3, goods information comparison: the goods names of all sales invoices of enterprise $c_k$ are compared with those of the entry invoices with commodity code $sp_{km}$ to search for sales invoices with consistent goods information, wherein consistent goods information means that the goods name of the sales invoice is identical to, or contains or is contained in, at least one of the goods names of $sp_{km}$; if sales invoices with consistent goods information exist, the commodity codes corresponding to these sales invoices are extracted and the sales invoice amounts are summed, denoted by $amt1''''_{km}$; if $amt1'''_{km} + amt1''''_{km} \ge amt1_{km}$, the entry and sales of this commodity code of the enterprise are consistent; otherwise, commodity code $sp_{km}$ of enterprise $c_k$ carries an "entry without sales" risk;
The sales-without-entry comparison includes: the commodity code vector involved in the sales invoices of enterprise $c_k$ is denoted by $SP'_k$, $SP'_k = (sp'_{k1}, sp'_{k2}, \ldots, sp'_{kn}, \ldots, sp'_{k\theta_2})$, wherein $sp'_{kn}$ denotes the $n$-th commodity code in $SP'_k$ and $\theta_2$ denotes the number of commodity codes involved,
Step 4.2.1, commodity abbreviation comparison: the commodity abbreviation of commodity code $sp'_{kn}$ is denoted by $ti2_{kn}$; all commodity codes in $SP'_k$ whose commodity abbreviation is $ti2_{kn}$ are extracted, and commodity codes whose commodity abbreviation is $ti2_{kn}$ are searched for in the entry invoices of enterprise $c_k$; if such entry invoices exist, the entry and sales of this commodity code of the enterprise are consistent and the following comparisons are not carried out; otherwise, step 4.2.2 is executed;
Step 4.2.2, commodity code comparison: if the commodity code similarity matrix of step 2 contains commodity codes whose similarity to $sp'_{kn}$ is greater than a given threshold, the entry and sales of this commodity code of the enterprise match and the following comparison is not carried out; otherwise, step 4.2.3 is executed;
Step 4.2.3, goods information comparison: the goods names of all entry invoices of enterprise $c_k$ are compared with those of the sales invoices with commodity code $sp'_{kn}$ to search for entry invoices with consistent goods information, wherein consistent goods information means that the goods name is identical to, or contains or is contained in, at least one of the goods names of $sp'_{kn}$; if entry invoices with consistent goods information exist, the entry and sales of this commodity code of the enterprise are consistent; otherwise, commodity code $sp'_{kn}$ of enterprise $c_k$ carries a "sales without entry" risk, and the magnitude of the risk depends on the amount of the problematic commodity code;
Step 5, removing the enterprises that have no corresponding sales invoices because they purchased certain commodities for their own use rather than for resale, and forming the final risk list.
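As a non-limiting illustration, the three-level entry-without-sales comparison of steps 4.1.1 to 4.1.3 can be sketched for a single enterprise as follows. The record type `Line`, the function names, the pair-keyed mapping `sim`, the sample data and the choice not to count a sales line twice across steps 4.1.2 and 4.1.3 are all assumptions made for this sketch and are not taken from the claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    """One invoice line (hypothetical record layout)."""
    code: str      # commodity code
    abbr: str      # commodity abbreviation
    name: str      # goods name
    amount: float  # invoice amount


def names_consistent(a, b):
    # "Consistent" goods names: identical, or one contains the other.
    return a == b or a in b or b in a


def entry_without_sales(entries, sales, sim, threshold):
    """Return {commodity code: unmatched amount} for entry codes that
    fail all three comparison levels. `sim` maps (code_a, code_b) to a
    similarity; missing pairs are treated as not similar."""
    risks = {}
    for code in sorted({e.code for e in entries}):
        own = [e for e in entries if e.code == code]
        amt1 = sum(e.amount for e in own)
        abbr = own[0].abbr  # assumes one abbreviation per commodity code

        # Step 4.1.1: compare entry and sales totals sharing the abbreviation.
        amt1_p = sum(e.amount for e in entries if e.abbr == abbr)
        amt1_pp = sum(s.amount for s in sales if s.abbr == abbr)
        if amt1_pp >= amt1_p:
            continue

        # Step 4.1.2: sales on codes similar to this code.
        similar = {b for (a, b), v in sim.items() if a == code and v > threshold}
        amt1_ppp = sum(s.amount for s in sales if s.code in similar)
        if amt1_ppp >= amt1:
            continue

        # Step 4.1.3: sales whose goods name is consistent with an entry name.
        # Assumption: lines already counted in step 4.1.2 are skipped here.
        names = {e.name for e in own}
        amt1_pppp = sum(
            s.amount for s in sales
            if s.code not in similar
            and any(names_consistent(s.name, n) for n in names)
        )
        if amt1_ppp + amt1_pppp >= amt1:
            continue

        risks[code] = amt1 - amt1_ppp - amt1_pppp
    return risks


# Hypothetical sample data for one enterprise.
entries = [Line("A", "steel", "steel pipe", 100.0),
           Line("B", "paper", "copy paper", 50.0)]
sales = [Line("A2", "metal", "steel pipe 20mm", 30.0),
         Line("A3", "metal", "tube", 40.0),
         Line("B", "paper", "copy paper", 60.0)]
sim = {("A", "A3"): 0.9}

risks = entry_without_sales(entries, sales, sim, threshold=0.8)
print(risks)
```

In this sketch the value stored for each at-risk code is the entry amount left unmatched after steps 4.1.2 and 4.1.3; the sales-without-entry comparison would mirror the same structure with the roles of entry and sales invoices exchanged.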
2. The commercial and trade enterprise value-added tax invoice virtual issuing risk assessment method based on entry and sales item comparison according to claim 1, characterized in that:
the set time interval in step 1 is two years.
3. The commercial and trade enterprise value-added tax invoice virtual issuing risk assessment method based on entry and sales item comparison according to claim 1, characterized in that:
step 2 specifically comprises the following steps:
Step 2.1, extracting the goods names of all invoices, and generating a word frequency vector $CP_j$ for each commodity code $sp_j$, $j = 1, 2, \ldots, \beta$;
Step 2.2, extracting keywords using the word frequency vector $CP_j$ of each commodity code $sp_j$ to form the keyword word frequency vector $CP''_j$ of each commodity code $sp_j$;
Step 2.3, calculating the similarity $sim_{ab}$ of any two commodity codes $sp_a$ and $sp_b$ using $CP''_a$ and $CP''_b$, $a, b = 1, 2, \ldots, \beta$, and forming the commodity code similarity matrix $SIM = (sim_{ab})_{\beta \times \beta}$ with $sim_{ab}$ as its elements.
4. The commercial and trade enterprise value-added tax invoice virtual issuing risk assessment method based on entry and sales item comparison according to claim 3, characterized in that:
In step 2.1, the goods names of all value-added tax invoices extracted in step 1 are merged and then segmented into words to form the all-invoice word segmentation vector; all invoices with commodity code $sp_j$ are extracted, their goods names are merged and then segmented into words, and a word occurrence count vector of the same length as the all-invoice word segmentation vector is formed; the elements of the word occurrence count vector are normalized to form the word frequency vector $CP_j$ of commodity code $sp_j$;
In step 2.2, the components $cp_{ji}$ of $CP_j$ are taken as elements to form a $\beta \times \alpha$ matrix $M$; a TF-IDF transformation is applied to each element $cp_{ji}$ of matrix $M$, and the transformation results $cp'_{ji}$ are taken as elements to form a $\beta \times \alpha$ matrix $M'$; if the value of $cp'_{ji}$ does not rank among the top $\gamma$ values in the $j$-th row of matrix $M'$, it is set to zero, and the results $cp''_{ji}$ are taken as elements to form a $\beta \times \alpha$ matrix $M''$;
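In step 2.3, $sim_{ab}$ is calculated as follows,
$$sim_{ab} = \frac{CP''_a \cdot CP''_b}{\lVert CP''_a \rVert \, \lVert CP''_b \rVert}$$
wherein:
$CP''_a \cdot CP''_b$ denotes the scalar product of the two vectors,
$\lVert CP''_a \rVert$ denotes the modulus of a vector.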
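As a non-limiting illustration, steps 2.1 to 2.3 can be sketched in plain Python as follows. Several details are assumptions made for this sketch rather than features stated in the claims: the counts are normalized by their row total, the TF-IDF transformation uses the inverse document frequency $\log(\beta / df_i)$ with each commodity code treated as one document, and $sim_{ab}$ is taken to be the cosine similarity of the keyword vectors. All function names are hypothetical.

```python
import math


def term_frequencies(counts):
    # Step 2.1 (assumed normalization): divide each count by the row total.
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def tfidf(M):
    # Step 2.2, first half: weight each frequency by log(beta / df_i),
    # where df_i counts the commodity codes in which word i occurs.
    beta, alpha = len(M), len(M[0])
    df = [sum(1 for row in M if row[i] > 0) for i in range(alpha)]
    return [[row[i] * math.log(beta / df[i]) if df[i] else 0.0
             for i in range(alpha)] for row in M]


def keep_top(row, gamma):
    # Step 2.2, second half: keep the gamma largest entries, zero the rest.
    top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:gamma]
    return [v if i in top else 0.0 for i, v in enumerate(row)]


def cosine(u, v):
    # Step 2.3 (assumed form): scalar product divided by product of moduli.
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_matrix(count_rows, gamma):
    M = [term_frequencies(r) for r in count_rows]   # matrix M
    M2 = [keep_top(r, gamma) for r in tfidf(M)]     # matrix M''
    return [[cosine(a, b) for b in M2] for a in M2]


# Hypothetical word counts: 3 commodity codes (rows) x 4 words (columns).
counts = [[2, 1, 0, 0],
          [1, 2, 0, 0],
          [0, 0, 3, 1]]
SIM = similarity_matrix(counts, gamma=2)
print(SIM[0][1], SIM[0][2])
```

With this weighting, a word that occurs under every commodity code receives zero weight, so codes that share only such words are not treated as similar; other TF-IDF variants would behave differently in that case.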
5. The commercial and trade enterprise value-added tax invoice virtual issuing risk assessment method based on entry and sales item comparison according to claim 4, characterized in that:
In step 2.1, the word frequency $cp_{ji}$ of word $w_i$ in all invoice goods names of commodity code $sp_j$ is calculated according to the following formula,
wherein $t_{ji}$ denotes the number of occurrences, in all invoice goods names of commodity code $sp_j$, of the $i$-th word $w_i$ of the all value-added tax invoice word segmentation vector $FC_{all}$, and is 0 if the word does not occur, thereby forming the word frequency vector $CP_j = (cp_{j1}, cp_{j2}, \ldots, cp_{ji}, \ldots, cp_{j\alpha})$ of commodity code $sp_j$.
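The formula referred to in claim 5 is not reproduced in this text. As a non-limiting illustration, and assuming that the normalization mentioned in claim 4 divides each occurrence count by the total count for the commodity code (an assumption, not a statement of the claimed formula), the word frequency could be written as:

```latex
cp_{ji} = \frac{t_{ji}}{\sum_{i'=1}^{\alpha} t_{ji'}}, \qquad i = 1, 2, \ldots, \alpha
```

Here $\alpha$ is taken to be the length of the word segmentation vector $FC_{all}$, which matches the $\beta \times \alpha$ matrix $M$ of claim 4.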
6. The commercial and trade enterprise value-added tax invoice virtual issuing risk assessment method based on entry and sales item comparison according to any one of claims 1 to 5, characterized in that:
step 3 specifically comprises the following steps:
Step 3.1, screening the enterprises belonging to the wholesale industry according to the enterprise registration industry information;
Step 3.2, removing the enterprises whose proportion of service invoices is higher than a threshold according to the enterprise billing information;
Step 3.3, screening, according to the enterprise value-added tax declaration data, the enterprises whose proportion of invoiced sales amount to total sales amount is greater than a screening threshold, to form the enterprise vector $C = (c_1, c_2, \ldots, c_k, \ldots, c_\delta)$, wherein $c_k$ denotes the $k$-th enterprise to be compared and $\delta$ denotes the number of enterprises to be compared, $k = 1, 2, \ldots, \delta$.
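As a non-limiting illustration, the screening of steps 3.1 to 3.3 can be sketched as a simple filter. The dictionary field names, the industry label `"wholesale"` and the two threshold values are hypothetical and chosen only for this sketch.

```python
def screen_enterprises(enterprises, service_ratio_max, invoiced_ratio_min):
    """Return the identifiers of enterprises that pass steps 3.1 to 3.3."""
    selected = []
    for e in enterprises:
        # Step 3.1: keep only enterprises registered in the wholesale industry.
        if e["industry"] != "wholesale":
            continue
        # Step 3.2: drop enterprises with too high a share of service invoices.
        if e["service_invoice_ratio"] > service_ratio_max:
            continue
        # Step 3.3: keep enterprises whose invoiced sales share exceeds the
        # screening threshold (enterprises with no declared sales are skipped).
        if e["total_sales"] <= 0:
            continue
        if e["invoiced_sales"] / e["total_sales"] <= invoiced_ratio_min:
            continue
        selected.append(e["id"])
    return selected


# Hypothetical sample data.
enterprises = [
    {"id": "c1", "industry": "wholesale", "service_invoice_ratio": 0.1,
     "invoiced_sales": 900.0, "total_sales": 1000.0},
    {"id": "c2", "industry": "manufacturing", "service_invoice_ratio": 0.0,
     "invoiced_sales": 900.0, "total_sales": 1000.0},
    {"id": "c3", "industry": "wholesale", "service_invoice_ratio": 0.6,
     "invoiced_sales": 900.0, "total_sales": 1000.0},
    {"id": "c4", "industry": "wholesale", "service_invoice_ratio": 0.1,
     "invoiced_sales": 300.0, "total_sales": 1000.0},
]

C = screen_enterprises(enterprises, service_ratio_max=0.5, invoiced_ratio_min=0.8)
print(C)
```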
7. The commercial and trade enterprise value-added tax invoice virtual issuing risk assessment method based on entry and sales item comparison according to claim 1, characterized in that:
in step 4.1.3, the amount $amt1_{km}$ of the problematic commodity code and the difference $amt1_{km} - amt1'''_{km} - amt1''''_{km}$ between that amount and the matched sales amounts represent the actual magnitude of the risk.
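As a non-limiting worked example with hypothetical amounts, suppose the entry invoices for a commodity code total 100, the sales matched through similar commodity codes total 40, and the sales matched through goods names total 30. The difference amount of claim 7 is then:

```latex
amt1_{km} = 100,\quad amt1'''_{km} = 40,\quad amt1''''_{km} = 30
\;\Rightarrow\;
amt1_{km} - amt1'''_{km} - amt1''''_{km} = 100 - 40 - 30 = 30
```

Since $40 + 30 < 100$, the condition of step 4.1.3 is not met, and 30 of the purchased amount has no corresponding sales.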
CN202010929732.4A 2020-09-07 2020-09-07 Commercial and trade enterprise value-added tax invoice virtual issuing risk assessment method based on entry and sales item comparison Active CN112183948B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010929732.4A CN112183948B (en) 2020-09-07 2020-09-07 Commercial and trade enterprise value-added tax invoice virtual issuing risk assessment method based on entry and sales item comparison

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010929732.4A CN112183948B (en) 2020-09-07 2020-09-07 Commercial and trade enterprise value-added tax invoice virtual issuing risk assessment method based on entry and sales item comparison

Publications (2)

Publication Number Publication Date
CN112183948A CN112183948A (en) 2021-01-05
CN112183948B CN112183948B (en) 2024-05-28

Family

ID=73925632

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010929732.4A Active CN112183948B (en) 2020-09-07 2020-09-07 Commercial and trade enterprise value-added tax invoice virtual issuing risk assessment method based on entry and sales item comparison

Country Status (1)

Country Link
CN (1) CN112183948B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104268758A (en) * 2014-09-15 2015-01-07 周刚 Merchandise anti-counterfeiting system based on invoice and third-party e-commerce platform
CN104424613A (en) * 2013-09-04 2015-03-18 航天信息股份有限公司 Value added tax invoice monitoring method and system thereof
CN104636973A (en) * 2013-11-06 2015-05-20 航天信息股份有限公司 Method of monitoring enterprise false invoice through commodity composition and system thereof
CN110659948A (en) * 2018-06-13 2020-01-07 中国软件与技术服务股份有限公司 Calculation method for matching degree of commodity sold and false invoice risk discovery method

Also Published As

Publication number Publication date
CN112183948A (en) 2021-01-05

Similar Documents

Publication Publication Date Title
KR101468764B1 (en) Methods and apparatus for implementing an ensemble merchant prediction system
US20160342999A1 (en) Method, system, and computer program product for linking customer information
WO2005101265A2 (en) Systems and methods for investigation of financial reporting information
US20080208780A1 (en) System and method for evaluating documents
US10509811B2 (en) System and method for improved analysis of travel-indicating unstructured electronic documents
CN110019324B (en) Method and system for generating taxpayer fund loop
US11138372B2 (en) System and method for reporting based on electronic documents
CN112131348B (en) Method for preventing repeated declaration of project based on similarity of text and image
Shome et al. Financial distress in Indian aviation industry: Investigation using bankruptcy prediction models
CN112182207B (en) Invoice virtual offset risk assessment method based on keyword extraction and rapid text classification
Branstetter et al. Does" Made in China 2025" Work for China? Evidence from Chinese Listed Firms
US20130006820A1 (en) System and Method of Determining the Quality of Enhanced Transaction Data
US8505811B2 (en) Anomalous billing event correlation engine
CN114187084A (en) Method for identifying certificate subjects according to classification of electronic invoice tax
CN112183948B (en) Commercial and trade enterprise value-added tax invoice virtual issuing risk assessment method based on entry and sales item comparison
TWI517072B (en) System and method for comparing account receivables data or other transaction data among sellers and buyers
CN110874745A (en) Bill return management system
CN115108222B (en) Sorting method of intelligent sorting system for cross-border cargoes
Feng et al. Export capacity constraints and distortions
US20070265886A1 (en) Warranty management system and method
CN111724093B (en) HS (high speed) coding management method and system for B2C commodity outlet
CN113869802A (en) Production enterprise invoice false invoice risk assessment method based on sales entry comparison
CN112232894A (en) Data analysis method based on value-added tax invoice
US20100257073A1 (en) Duplicate Payment Prevention
CN106204174A (en) The method that commodity in sales slip are classified

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant