CN110019404B - System and method for determining a recommended tax classification code of a commodity - Google Patents

Info

Publication number
CN110019404B
Authority
CN
China
Prior art keywords
commodity
utilization rate
value
invoice data
classification code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711450703.4A
Other languages
Chinese (zh)
Other versions
CN110019404A (en)
Inventor
刘丹
范钢
潘竞旭
田宜喜
谢宇
张玉魁
陈荣兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aisino Corp
Original Assignee
Aisino Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aisino Corp
Priority to CN201711450703.4A
Publication of CN110019404A
Application granted
Publication of CN110019404B
Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/12 Accounting
    • G06Q40/123 Tax preparation or submission

Abstract

The invention provides a system for determining a recommended tax classification code of a commodity. The system comprises: an invoice data acquisition unit that acquires taxpayer information and value-added tax invoice data; an invoice data cleaning unit that preprocesses the value-added tax invoice data acquired by the invoice data acquisition unit and removes redundant data of no use from the invoice data; an invoice data analysis unit that calculates, for each commodity in the invoice data, the usage rate of every classification code under which the commodity has been invoiced; an invoice model establishing unit that corrects the usage rate of each classification code of each commodity according to the taxpayer's weight on that usage rate and normalizes the corrected usage rates to establish a mathematical model; and a test unit that imports invoice data with known commodity classification codes into the established invoice model for testing, obtains the optimal weight values, and determines the recommended classification codes of each commodity.

Description

System and method for determining a recommended tax classification code of a commodity
Technical Field
The present invention relates to the field of tax control, and more particularly, to a system and method for determining a recommended tax classification code for a good.
Background
According to the Announcement on Matters Concerning Tax Collection and Administration for the Comprehensive Rollout of the Pilot Program for Replacing Business Tax with Value-Added Tax (State Administration of Taxation Announcement No. 23 of 2016), the State Administration of Taxation began in June 2016 to pilot functions for tax classification codes and code assignment in invoicing software. Under this regulation, a taxpayer must assign a code to every commodity on an invoice before the value-added tax invoice can be issued normally. Code assignment greatly increases taxpayers' invoicing workload and reduces invoicing efficiency, so the rollout met considerable resistance for nearly a year, and taxpayers in many regions even refused to upgrade.
Under the previous workflow for issuing value-added tax invoices, the invoicer could issue an invoice directly in the invoicing interface, whether or not the commodities on it were coded. After the tax classification coding function was added, the invoicer must first assign codes to the commodities on the commodity classification code setting interface, and only then can the invoice be issued. Assigning a code means choosing from among thousands of commodity classification codes. This increases the invoicer's workload, degrades the taxpayer's user experience, leads to inaccurate code assignment, and greatly reduces the accuracy of tax classification code data.
Disclosure of Invention
To solve the technical problems described in the background, namely the heavy workload and low accuracy of determining a commodity's classification code when a taxpayer issues an invoice, the invention provides a system for determining a recommended tax classification code of a commodity, the system comprising:
an invoice data acquisition unit for acquiring taxpayer information and value-added tax invoice data; in practice, because the State Administration of Taxation began piloting tax classification codes and code assignment in invoicing software in June 2016, all data acquired by the invention are value-added tax invoice data issued after June 2016;
an invoice data cleaning unit for preprocessing the value-added tax invoice data acquired by the invoice data acquisition unit and removing redundant data of no use from the invoice data, which improves the efficiency of subsequent invoice data analysis;
an invoice data analysis unit for calculating, for each commodity in the invoice data, the usage rate of every classification code under which the commodity has been invoiced;
an invoice model establishing unit for correcting the usage rate of each classification code of each commodity according to the taxpayer's weight on that usage rate, and normalizing the corrected usage rates to establish a mathematical model, wherein the weight is set to α when both the taxpayer's industry and the taxpayer's business scope match the commodity, to β when only one of them matches the commodity, and to γ when neither matches the commodity;
and a testing unit for importing invoice data with known commodity classification codes into the established invoice model, testing with different values of α, β and γ to find the optimal values of these industry and business-scope weights, and calculating the usage rate of each tax classification code of each commodity from the optimal weights to determine the recommended classification code of each commodity.
Further, the data collected by the invoice data acquisition unit comprise taxpayer information and value-added tax invoice data from the Golden Tax Phase III system, the invoicing software, and the invoice platform.
Further, the invoice data cleaning unit preprocesses the data by importing the invoice data collected by the invoice data acquisition unit into a Hadoop data platform and removing redundant data from the invoice data with a Spark program.
Further, the formula for the invoice data analysis unit to calculate the usage rate of each classification code of each commodity is as follows:
$$P_i = \frac{A_i}{B}$$
where $P_i$ is the usage rate of the i-th classification code of a commodity, $A_i$ is the total number of invoices issued by all taxpayers under the i-th classification code of that commodity, $B$ is the total number of invoices issued by all taxpayers under all classification codes of that commodity, $1 \le i \le n$, and n is a natural number.
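The usage rate can be sketched in a few lines of Python. This is a minimal illustration, assuming the rate is the share $A_i / B$ that the symbol definitions describe; the function name `usage_rates`, the input shape (one code per invoice line for a single commodity) and the codes `C1` to `C3` are hypothetical.

```python
from collections import Counter


def usage_rates(codes_issued):
    """Return P_i = A_i / B for every classification code of one commodity.

    codes_issued: iterable with one classification code per invoice on which
    the commodity appeared (hypothetical input shape).
    """
    counts = Counter(codes_issued)   # A_i for each code i
    total = sum(counts.values())     # B, the total over all codes
    if total == 0:
        return {}
    return {code: a_i / total for code, a_i in counts.items()}


# Hypothetical data: one commodity invoiced 6 times under C1, 3 under C2, 1 under C3.
rates = usage_rates(["C1"] * 6 + ["C2"] * 3 + ["C3"])
```

By construction the rates of one commodity sum to 1, so they can be read as the empirical share of each code.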
Further, the formula for the invoice model establishing unit to correct the usage rate of the classification code of each commodity according to the weight value of the taxpayer on the usage rate of the classification code of each commodity is as follows:
$$P_i' = \frac{\alpha X_i + \beta Y_i + \gamma Z_i}{B}$$
where $P_i'$ is the corrected usage rate of the i-th classification code of a commodity; $X_i$, $Y_i$ and $Z_i$ are the total numbers of invoices issued under the i-th classification code of that commodity by taxpayers whose weight is α, β and γ, respectively; and $B$ is the total number of invoices issued by all taxpayers under all classification codes of that commodity.
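A small sketch of the correction step follows. It assumes the corrected rate is the weighted count $(\alpha X_i + \beta Y_i + \gamma Z_i)/B$, which is how the symbol definitions read; the function name `corrected_usage` is hypothetical, and the default weights are the values 1, 0.5 and 0.2 that the document gives for α, β and γ.

```python
def corrected_usage(x_i, y_i, z_i, b, alpha=1.0, beta=0.5, gamma=0.2):
    """Weighted usage rate of one classification code (assumed form).

    x_i, y_i, z_i: invoice counts from taxpayers with weight alpha, beta, gamma.
    b: total invoice count of the commodity over all of its codes.
    """
    if b <= 0:
        raise ValueError("b must be a positive invoice count")
    return (alpha * x_i + beta * y_i + gamma * z_i) / b


# Hypothetical counts: 4 fully matching, 1 partly matching, 1 non-matching invoice out of 10.
p_corrected = corrected_usage(4, 1, 1, 10)
```

With all three weights equal to 1, the expression reduces to the uncorrected usage rate $A_i / B$, since $X_i + Y_i + Z_i = A_i$.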
Further, the invoice model building unit normalizes the corrected classification code usage to build a mathematical model according to the following formula:
$$P_i'' = \frac{P_i'}{\sum_{j=1}^{n} P_j'}$$
where $P_i''$ is the normalized usage rate of the i-th classification code of a commodity, $\sum_{j=1}^{n} P_j'$ is the sum of the corrected usage rates of all classification codes of that commodity, $1 \le i \le n$, and n is a natural number.
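Normalization can be illustrated as below. This is a sketch under the assumption that each corrected rate is divided by the sum of the corrected rates of the same commodity; the function name `normalize` and the sample values are hypothetical.

```python
def normalize(corrected):
    """Scale the corrected usage rates of one commodity so that they sum to 1.

    corrected: dict mapping classification code -> corrected usage rate P_i'.
    """
    total = sum(corrected.values())
    if total == 0:
        # No usable signal: return zeros rather than dividing by zero.
        return {code: 0.0 for code in corrected}
    return {code: p / total for code, p in corrected.items()}


# Hypothetical corrected rates for three codes of one commodity.
normalized = normalize({"C1": 0.47, "C2": 0.20, "C3": 0.02})
```

Because every weight below 1 shrinks the corrected rates, they no longer sum to 1; normalizing restores a scale on which the codes of different commodities are comparable.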
Further, the system also comprises a commodity tax classification code recommending unit for sorting the normalized usage rates of the classification codes of each commodity in descending order and returning the 1 to 3 tax classification codes with the highest usage rates to the invoicer's client as the recommended tax classification codes.
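The ranking step can be sketched as follows. The function name `recommend` and the sample rates are hypothetical; the limit of 1 to 3 returned codes comes from the text.

```python
def recommend(normalized, k=3):
    """Return the k codes with the highest normalized usage rate, best first."""
    if not 1 <= k <= 3:
        raise ValueError("the text describes returning 1 to 3 codes")
    ranked = sorted(normalized.items(), key=lambda item: item[1], reverse=True)
    return [code for code, _ in ranked[:k]]


# Hypothetical normalized rates for one commodity.
top_two = recommend({"C1": 0.2, "C2": 0.7, "C3": 0.1}, k=2)
```

Python's `sorted` is stable, so codes with equal rates keep their original relative order.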
In practice, a tax classification code recommendation interface is provided for the invoicing software. When the invoicer enters a commodity name at the client, the client sends a recommendation request to the system, and the commodity tax classification code recommending unit returns the tax classification code determined by the data model to the client. The back end thus automatically matches and recommends a tax classification code while the invoicer fills in the commodity name, which makes invoicing smoother and tax classification codes more accurate.
Further, in the invoice model establishing unit, α is 1, β is 0.5, and γ is 0.2.
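As a worked illustration of these weights, the following uses made-up counts. All numbers are hypothetical, and the corrected rate is assumed to take the weighted form $(\alpha X_i + \beta Y_i + \gamma Z_i)/B$ implied by the symbol definitions. Suppose a commodity has been invoiced $B = 10$ times under two codes, with $X_1 = 4$, $Y_1 = 2$, $Z_1 = 0$ and $X_2 = 1$, $Y_2 = 0$, $Z_2 = 3$:

$$
\begin{aligned}
P_1' &= \frac{1 \cdot 4 + 0.5 \cdot 2 + 0.2 \cdot 0}{10} = 0.50, & P_2' &= \frac{1 \cdot 1 + 0.5 \cdot 0 + 0.2 \cdot 3}{10} = 0.16,\\
P_1'' &= \frac{0.50}{0.66} \approx 0.758, & P_2'' &= \frac{0.16}{0.66} \approx 0.242.
\end{aligned}
$$

The uncorrected shares are 0.6 and 0.4, so down-weighting invoices from taxpayers whose industry and business scope do not match the commodity widens the lead of code 1.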
According to another aspect of the present invention, there is provided a method of determining a recommended tax classification code of a commodity, the method comprising:
collecting taxpayer information and value-added tax invoice data;
preprocessing the collected value-added tax invoice data and removing redundant data of no use from the invoice data;
calculating, for each commodity in the invoice data remaining after the redundant data are removed, the usage rate of every classification code under which the commodity has been invoiced;
correcting the usage rate of each classification code of each commodity according to the taxpayer's weight on that usage rate, and normalizing the corrected usage rates to establish a mathematical model, wherein the weight is set to α when both the taxpayer's industry and the taxpayer's business scope match the commodity, to β when only one of them matches the commodity, and to γ when neither matches the commodity;
and importing invoice data with known commodity classification codes into the established invoice model, testing with different values of α, β and γ to find the optimal values of these industry and business-scope weights, and calculating the usage rate of each tax classification code of each commodity from the optimal weights to determine the recommended classification code of each commodity.
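The text does not say how the optimal α, β and γ are solved. One simple possibility is an exhaustive grid search scored by accuracy on invoices with known codes, sketched below. The grid values, the constraint α ≥ β ≥ γ, the data layout and the function names are all assumptions made for illustration.

```python
from itertools import product


def predict(counts, weights):
    """Pick the code with the highest weighted count.

    counts: dict code -> (X_i, Y_i, Z_i). Dividing by B and normalizing do not
    change the ranking within one commodity, so both are skipped here.
    """
    alpha, beta, gamma = weights
    scores = {c: alpha * x + beta * y + gamma * z for c, (x, y, z) in counts.items()}
    return max(scores, key=scores.get)


def grid_search(labelled, grid=(0.2, 0.5, 0.8, 1.0)):
    """Return (weights, accuracy) with the best accuracy on labelled samples.

    labelled: list of (counts, true_code) pairs built from known invoice data.
    """
    best_weights, best_acc = None, -1.0
    for alpha, beta, gamma in product(grid, repeat=3):
        if not alpha >= beta >= gamma:  # assumption: better matches never weigh less
            continue
        hits = sum(predict(c, (alpha, beta, gamma)) == truth for c, truth in labelled)
        acc = hits / len(labelled)
        if acc > best_acc:
            best_weights, best_acc = (alpha, beta, gamma), acc
    return best_weights, best_acc


# Hypothetical labelled samples: per-code (X, Y, Z) counts and the known correct code.
samples = [
    ({"C1": (1, 0, 5), "C2": (3, 0, 0)}, "C2"),
    ({"C1": (0, 4, 0), "C2": (0, 0, 6)}, "C1"),
]
weights, accuracy = grid_search(samples)
```

A finer grid or a different objective could be substituted without changing the structure; the search only needs a way to score a candidate triple against known codes.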
Further, preprocessing the collected value-added tax invoice data comprises importing the collected invoice data into a Hadoop data platform and removing redundant data from the invoice data with a Spark program.
Further, the formula for calculating the usage rate of each classification code of each commodity is:
$$P_i = \frac{A_i}{B}$$
where $P_i$ is the usage rate of the i-th classification code of a commodity, $A_i$ is the total number of invoices issued by all taxpayers under the i-th classification code of that commodity, $B$ is the total number of invoices issued by all taxpayers under all classification codes of that commodity, $1 \le i \le n$, and n is a natural number.
Further, the formula for correcting the usage rate of the classification code of each commodity according to the weight value of the taxpayer on the usage rate of the classification code of each commodity is as follows:
$$P_i' = \frac{\alpha X_i + \beta Y_i + \gamma Z_i}{B}$$
where $P_i'$ is the corrected usage rate of the i-th classification code of a commodity; $X_i$, $Y_i$ and $Z_i$ are the total numbers of invoices issued under the i-th classification code of that commodity by taxpayers whose weight is α, β and γ, respectively; and $B$ is the total number of invoices issued by all taxpayers under all classification codes of that commodity.
Further, the formula for normalizing the corrected usage rate of the classified codes to establish the mathematical model is as follows:
$$P_i'' = \frac{P_i'}{\sum_{j=1}^{n} P_j'}$$
where $P_i''$ is the normalized usage rate of the i-th classification code of a commodity, $\sum_{j=1}^{n} P_j'$ is the sum of the corrected usage rates of all classification codes of that commodity, $1 \le i \le n$, and n is a natural number.
Further, the method also comprises sorting the normalized usage rates of the classification codes of each commodity in descending order and returning the 1 to 3 tax classification codes with the highest usage rates to the invoicer's client as the recommended tax classification codes.
Further, α is 1, β is 0.5, and γ is 0.2.
In summary, the invention provides a model for determining the recommended tax classification code of a commodity. The model is continuously improved by importing known invoice data for learning, determines the optimal values of the three weights applied to the usage rates of commodity tax classification codes, and recommends commodity tax classification codes automatically through the commodity tax classification code recommending unit, thereby improving the accuracy and efficiency of filling in tax classification codes.
Drawings
A more complete understanding of exemplary embodiments of the present invention may be had by reference to the following drawings in which:
FIG. 1 is a block diagram of a system for determining a recommended tax classification code for a good in accordance with an embodiment of the present invention;
FIG. 2 is a flowchart of a method for determining a recommended tax classification code for a good according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein, which are provided so that this disclosure is thorough and complete and fully conveys the scope of the invention to those skilled in the art. The terminology used in the exemplary embodiments illustrated in the accompanying drawings is not intended to limit the invention. In the drawings, the same units/elements are denoted by the same reference numerals.
Unless otherwise defined, terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Further, it will be understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense.
FIG. 1 is a block diagram of a system for determining a recommended tax classification code for a good according to an embodiment of the present invention. As shown in FIG. 1, the system 100 for determining a recommended tax classification code of a commodity according to the present invention comprises:
an invoice data acquisition unit 101 for acquiring taxpayer information and value-added tax invoice data;
an invoice data cleaning unit 102 for preprocessing the value-added tax invoice data acquired by the invoice data acquisition unit 101 and removing redundant data of no use from the invoice data, which improves the efficiency of subsequent invoice data analysis;
an invoice data analysis unit 103 for calculating, for each commodity in the invoice data, the usage rate of every classification code under which the commodity has been invoiced;
an invoice model establishing unit 104 for correcting the usage rate of each classification code of each commodity according to the taxpayer's weight on that usage rate, and normalizing the corrected usage rates to establish a mathematical model, wherein the weight is set to α when both the taxpayer's industry and the taxpayer's business scope match the commodity, to β when only one of them matches the commodity, and to γ when neither matches the commodity;
and a testing unit 105 for importing invoice data with known commodity classification codes into the established invoice model, testing with different values of α, β and γ to find the optimal values of these industry and business-scope weights, and calculating the usage rate of each tax classification code of each commodity from the optimal weights to determine the recommended classification code of each commodity.
The testing unit can further correct the values of alpha, beta and gamma by adding known invoice data for testing, thereby improving the accuracy of determining the recommended classification code of the commodity.
Preferably, the data collected by the invoice data acquisition unit 101 comprise taxpayer information and value-added tax invoice data from the Golden Tax Phase III system, the invoicing software, and the invoice platform.
Preferably, the invoice data cleaning unit 102 preprocesses the data by importing the invoice data collected by the invoice data acquisition unit 101 into a Hadoop data platform and removing redundant data from the invoice data with a Spark program.
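The text does not define which records count as redundant. The sketch below shows one possible set of cleaning rules in plain Python rather than Spark, purely to make the step concrete: it drops lines with no commodity name or no classification code, and drops exact repeats. The rules, the field names `invoice_id`, `commodity_name` and `tax_code`, and the function name are all assumptions.

```python
def clean_invoice_lines(lines):
    """Filter invoice lines under assumed redundancy rules.

    lines: list of dicts with the hypothetical keys 'invoice_id',
    'commodity_name' and 'tax_code'.
    """
    seen = set()
    cleaned = []
    for line in lines:
        name = (line.get("commodity_name") or "").strip()
        code = (line.get("tax_code") or "").strip()
        if not name or not code:
            continue  # assumption: unnamed or uncoded lines carry no signal
        key = (line.get("invoice_id"), name, code)
        if key in seen:
            continue  # assumption: exact repeats are redundant
        seen.add(key)
        cleaned.append({**line, "commodity_name": name, "tax_code": code})
    return cleaned


# Hypothetical raw lines: one valid, one exact repeat, one uncoded, one unnamed.
raw = [
    {"invoice_id": 1, "commodity_name": " mineral water ", "tax_code": "C1"},
    {"invoice_id": 1, "commodity_name": "mineral water", "tax_code": "C1"},
    {"invoice_id": 2, "commodity_name": "mineral water", "tax_code": ""},
    {"invoice_id": 3, "commodity_name": None, "tax_code": "C2"},
]
cleaned = clean_invoice_lines(raw)
```

On a Hadoop and Spark deployment the same rules would be expressed as distributed filter and de-duplication steps; the plain-Python form is used here only so the example is self-contained.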
Preferably, the formula for the invoice data analysis unit 103 to calculate the usage rate of each classification code of each commodity is:
$$P_i = \frac{A_i}{B}$$
where $P_i$ is the usage rate of the i-th classification code of a commodity, $A_i$ is the total number of invoices issued by all taxpayers under the i-th classification code of that commodity, $B$ is the total number of invoices issued by all taxpayers under all classification codes of that commodity, $1 \le i \le n$, and n is a natural number.
Preferably, the formula for the invoice model building unit 104 to correct the usage rate of the classification code of each commodity according to the weight value of the taxpayer on the usage rate of the classification code of each commodity is as follows:
$$P_i' = \frac{\alpha X_i + \beta Y_i + \gamma Z_i}{B}$$
where $P_i'$ is the corrected usage rate of the i-th classification code of a commodity; $X_i$, $Y_i$ and $Z_i$ are the total numbers of invoices issued under the i-th classification code of that commodity by taxpayers whose weight is α, β and γ, respectively; and $B$ is the total number of invoices issued by all taxpayers under all classification codes of that commodity.
Preferably, the invoice model building unit 104 normalizes the corrected classification code usage to build the mathematical model according to the formula:
$$P_i'' = \frac{P_i'}{\sum_{j=1}^{n} P_j'}$$
where $P_i''$ is the normalized usage rate of the i-th classification code of a commodity, $\sum_{j=1}^{n} P_j'$ is the sum of the corrected usage rates of all classification codes of that commodity, $1 \le i \le n$, and n is a natural number.
Preferably, the system further comprises a commodity tax classification code recommending unit 106 for sorting the normalized usage rates of the classification codes of each commodity in descending order and returning the 1 to 3 tax classification codes with the highest usage rates to the invoicer's client as the recommended tax classification codes.
Preferably, in the invoice model establishing unit 104, α is 1, β is 0.5, and γ is 0.2.
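The three-tier weighting can be expressed as a small lookup. This sketch assumes two boolean inputs that say whether the taxpayer's industry and business scope match the commodity; how that match is decided is not described in the text, and the function name is hypothetical.

```python
ALPHA, BETA, GAMMA = 1.0, 0.5, 0.2  # preferred values stated in the text


def taxpayer_weight(industry_matches, scope_matches):
    """Map the two match flags to alpha (both), beta (one) or gamma (neither)."""
    matches = int(bool(industry_matches)) + int(bool(scope_matches))
    return {2: ALPHA, 1: BETA, 0: GAMMA}[matches]


weights = [
    taxpayer_weight(True, True),
    taxpayer_weight(True, False),
    taxpayer_weight(False, False),
]
```

Counting matches rather than branching on each flag keeps the two single-match cases symmetric, which mirrors the wording "one of" in the text.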
FIG. 2 is a flowchart of a method for determining a recommended tax classification code for a good according to an embodiment of the present invention. As shown in FIG. 2, the method 200 for determining a recommended tax classification code for an item of merchandise according to the present invention begins at step 201.
In step 201, taxpayer information and value-added tax invoice data are collected;
in step 202, the collected value-added tax invoice data are preprocessed, and redundant data of no use are removed from the invoice data;
in step 203, for each commodity in the invoice data remaining after the redundant data are removed, the usage rate of every classification code under which the commodity has been invoiced is calculated;
in step 204, the usage rate of each classification code of each commodity is corrected according to the taxpayer's weight on that usage rate, and the corrected usage rates are normalized to establish a mathematical model, wherein the weight is set to α when both the taxpayer's industry and the taxpayer's business scope match the commodity, to β when only one of them matches the commodity, and to γ when neither matches the commodity;
in step 205, invoice data with known commodity classification codes are imported into the established invoice model, testing is performed with different values of α, β and γ to find the optimal values of these industry and business-scope weights, and the usage rate of each tax classification code of each commodity is calculated from the optimal weights to determine the recommended classification code of each commodity.
Preferably, preprocessing the collected value-added tax invoice data comprises importing the collected invoice data into a Hadoop data platform and removing redundant data from the invoice data with a Spark program.
Preferably, the formula for calculating the usage rate of each classification code of each commodity is:
$$P_i = \frac{A_i}{B}$$
where $P_i$ is the usage rate of the i-th classification code of a commodity, $A_i$ is the total number of invoices issued by all taxpayers under the i-th classification code of that commodity, $B$ is the total number of invoices issued by all taxpayers under all classification codes of that commodity, $1 \le i \le n$, and n is a natural number.
Preferably, the formula for correcting the usage rate of the classification code of each commodity according to the weight value of the taxpayer on the usage rate of the classification code of each commodity is as follows:
$$P_i' = \frac{\alpha X_i + \beta Y_i + \gamma Z_i}{B}$$
where $P_i'$ is the corrected usage rate of the i-th classification code of a commodity; $X_i$, $Y_i$ and $Z_i$ are the total numbers of invoices issued under the i-th classification code of that commodity by taxpayers whose weight is α, β and γ, respectively; and $B$ is the total number of invoices issued by all taxpayers under all classification codes of that commodity.
Preferably, the formula for normalizing the corrected usage of classified codes to create the mathematical model is:
$$P_i'' = \frac{P_i'}{\sum_{j=1}^{n} P_j'}$$
where $P_i''$ is the normalized usage rate of the i-th classification code of a commodity, $\sum_{j=1}^{n} P_j'$ is the sum of the corrected usage rates of all classification codes of that commodity, $1 \le i \le n$, and n is a natural number.
Preferably, the method further comprises step 206 of sorting the normalized usage rates of the classification codes of each commodity in descending order and returning the 1 to 3 tax classification codes with the highest usage rates to the invoicer's client as the recommended tax classification codes.
Preferably, the value of α is 1, the value of β is 0.5 and the value of γ is 0.2.
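Steps 203 to 206 can be strung together in one short sketch. It assumes the weighted form $(\alpha X_i + \beta Y_i + \gamma Z_i)/B$ for the corrected rate and takes already cleaned records as input; the record layout `(commodity, code, industry_ok, scope_ok)`, the function name and the sample data are hypothetical.

```python
from collections import defaultdict

# Weights keyed by the number of matching attributes (alpha, beta, gamma from the text).
WEIGHTS = {2: 1.0, 1: 0.5, 0: 0.2}


def recommend_codes(records, top_k=3):
    """Return {commodity: [codes]} ranked by normalized corrected usage rate.

    records: iterable of (commodity, code, industry_ok, scope_ok) tuples, one
    per invoice line (hypothetical layout).
    """
    weighted = defaultdict(lambda: defaultdict(float))  # alpha*X + beta*Y + gamma*Z per code
    totals = defaultdict(int)                           # B per commodity
    for commodity, code, industry_ok, scope_ok in records:
        weighted[commodity][code] += WEIGHTS[int(industry_ok) + int(scope_ok)]
        totals[commodity] += 1

    result = {}
    for commodity, per_code in weighted.items():
        corrected = {c: w / totals[commodity] for c, w in per_code.items()}
        norm = sum(corrected.values())  # positive: every record adds a positive weight
        normalized = {c: p / norm for c, p in corrected.items()}
        ranked = sorted(normalized, key=normalized.get, reverse=True)
        result[commodity] = ranked[:top_k]
    return result


# Hypothetical data: C2 has more invoices, but only from non-matching taxpayers.
records = [
    ("water", "C1", True, True),
    ("water", "C1", True, False),
    ("water", "C2", False, False),
    ("water", "C2", False, False),
    ("water", "C2", False, False),
]
top = recommend_codes(records, top_k=1)
```

In this sample the weighted count of C1 is 1.5 against 0.6 for C2, so C1 is recommended even though C2 appears on more invoices, which is the effect the weighting is meant to produce.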
Generally, all terms used in the claims are to be interpreted according to their ordinary meaning in the technical field, unless explicitly defined otherwise herein. All references to "a/an/the [means, component, etc.]" are to be interpreted openly as referring to at least one instance of said means, component, etc., unless explicitly stated otherwise. The steps of any method disclosed herein do not have to be performed in the exact order disclosed, unless explicitly stated.

Claims (9)

1. A system for determining a recommended tax classification code for an item, the system comprising:
an invoice data acquisition unit for acquiring taxpayer information and value-added tax invoice data;
an invoice data cleaning unit for preprocessing the value-added tax invoice data acquired by the invoice data acquisition unit and removing redundant data of no use from the invoice data;
an invoice data analysis unit for calculating, for each commodity in the invoice data, the usage rate of every classification code under which the commodity has been invoiced, according to the following formula:
$$P_i = \frac{A_i}{B}$$
where $P_i$ is the usage rate of the i-th classification code of a commodity, $A_i$ is the total number of invoices issued by all taxpayers under the i-th classification code of that commodity, $B$ is the total number of invoices issued by all taxpayers under all classification codes of that commodity, $1 \le i \le n$, and n is a natural number;
an invoice model establishing unit for correcting the usage rate of each classification code of each commodity according to the taxpayer's weight on that usage rate, and normalizing the corrected usage rates to establish a mathematical model, wherein the weight is set to α when both the taxpayer's industry and the taxpayer's business scope match the commodity, to β when only one of them matches the commodity, and to γ when neither matches the commodity, the formula for correcting the usage rate of each classification code of each commodity and the formula of the mathematical model being, respectively:
$$P_i' = \frac{\alpha X_i + \beta Y_i + \gamma Z_i}{B}$$
$$P_i'' = \frac{P_i'}{\sum_{j=1}^{n} P_j'}$$
where $P_i'$ is the corrected usage rate of the i-th classification code of a commodity; $X_i$, $Y_i$ and $Z_i$ are the total numbers of invoices issued under the i-th classification code of that commodity by taxpayers whose weight is α, β and γ, respectively; $B$ is the total number of invoices issued by all taxpayers under all classification codes of that commodity; $P_i''$ is the normalized usage rate of the i-th classification code of that commodity; $\sum_{j=1}^{n} P_j'$ is the sum of the corrected usage rates of all classification codes of that commodity; $1 \le i \le n$; and n is a natural number;
and a testing unit, configured to import invoice data with known commodity classification codes into the established invoice model, to test the model under different settings of α, β and γ, to solve for the optimal values of the weight values α, β and γ that the industry and the business scope contribute to the utilization rate in the invoice model, and to calculate the utilization rate of each tax classification code of each commodity based on the determined optimal weight values so as to determine the recommended classification code of each commodity.
2. The system according to claim 1, wherein the data collected by the invoice data collection unit comprises taxpayer information and value-added tax invoice data from the Golden Tax Phase III system, invoicing software and an invoice platform.
3. The system according to claim 1, wherein the invoice data cleaning unit preprocesses the invoice data collected by the invoice data collection unit by importing it into a Hadoop data platform and cleaning redundant data from the invoice data using a Spark program.
4. The system of claim 1, further comprising a commodity tax classification code recommending unit, configured to rank the normalized utilization rates of the different classification codes of each commodity and to feed back the tax classification code corresponding to the maximum value to the invoicing client as the recommended tax classification code.
5. The system of claim 1, wherein, in the invoice model establishing unit, α has a value of 1, β has a value of 0.5, and γ has a value of 0.2.
6. A method of determining a recommended tax classification code for a commodity, the method comprising:
collecting taxpayer information and value-added tax invoice data;
preprocessing the collected value-added tax invoice data, and cleaning out redundant data in the invoice data that has no value for the analysis;
for each commodity in the invoice data after the redundant data has been removed, calculating the utilization rate of each classification code under which the commodity has been invoiced, wherein the calculation formula is as follows:
$$P_i = \frac{A_i}{B}$$
wherein $P_i$ is the utilization rate of the i-th classification code of each commodity, $A_i$ is the total number of invoices issued by all taxpayers under the i-th classification code of each commodity, $B$ is the total number of invoices issued by all taxpayers under all classification codes of each commodity, $1 \le i \le n$, and $n$ is a natural number;
correcting the utilization rate of the classification codes of each commodity according to the weight value that each taxpayer contributes to that utilization rate, and normalizing the corrected utilization rates to establish a mathematical model, wherein the weight value is set to α when the industry to which the taxpayer belongs and the business scope of the taxpayer both match the commodity, to β when exactly one of the two matches the commodity, and to γ when neither matches the commodity, and wherein the formula for correcting the utilization rate of the classification codes of each commodity and the formula of the mathematical model are, respectively:
$$P_i' = \frac{\alpha X_i + \beta Y_i + \gamma Z_i}{B}$$
$$P_i'' = \frac{P_i'}{\sum_{j=1}^{n} P_j'}$$
wherein $P_i'$ is the corrected utilization rate of the i-th classification code of each commodity, $X_i$ is the total number of invoices issued under the i-th classification code by taxpayers whose weight value is α, $Y_i$ is the total number of invoices issued under the i-th classification code by taxpayers whose weight value is β, $Z_i$ is the total number of invoices issued under the i-th classification code by taxpayers whose weight value is γ, $B$ is the total number of invoices issued by all taxpayers under all classification codes of each commodity, $P_i''$ is the normalized utilization rate of the i-th classification code of each commodity, $\sum_{j=1}^{n} P_j'$ is the sum of the corrected utilization rates of all classification codes of each commodity, $1 \le i \le n$, and $n$ is a natural number;
and importing invoice data with known commodity classification codes into the established invoice model, testing the model under different settings of α, β and γ, solving for the optimal values of the weight values α, β and γ that the industry and the business scope contribute to the utilization rate in the invoice model, and calculating the utilization rate of each tax classification code of each commodity based on the determined optimal weight values so as to determine the recommended classification code of each commodity.
7. The method of claim 6, wherein preprocessing the collected value-added tax invoice data comprises importing the invoice data collected by the invoice data collection unit into a Hadoop data platform and cleaning redundant data from the invoice data using a Spark program.
8. The method of claim 6, wherein the normalized utilization rates of the different tax classification codes of each commodity are ranked, and the tax classification code corresponding to the maximum value is taken as the recommended tax classification code for the commodity.
9. The method of claim 6, wherein α has a value of 1, β has a value of 0.5, and γ has a value of 0.2.
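The weighting, normalization and weight-testing steps recited in claims 6 to 9 can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names, the `counts` layout (one `(X, Y, Z)` tuple of invoice counts per classification code), the example codes "A" and "B", and the use of an exhaustive grid search scored by accuracy on labeled data are all assumptions made for illustration. The default weights are the values given in claim 9.

```python
from itertools import product

# Hypothetical layout for ONE commodity: counts[code] = (X, Y, Z), where
#   X = invoices from taxpayers whose industry and business scope both match,
#   Y = invoices from taxpayers where exactly one of the two matches,
#   Z = invoices from taxpayers where neither matches.

def normalized_utilization(counts, alpha=1.0, beta=0.5, gamma=0.2):
    """Return {code: P_i''}, the normalized corrected utilization rates."""
    total = sum(x + y + z for x, y, z in counts.values())  # B
    if total == 0:
        return {}
    # Corrected rate P_i' = (alpha*X_i + beta*Y_i + gamma*Z_i) / B
    corrected = {
        code: (alpha * x + beta * y + gamma * z) / total
        for code, (x, y, z) in counts.items()
    }
    norm = sum(corrected.values())  # sum of all P_j'
    if norm == 0:
        return {code: 0.0 for code in counts}
    return {code: p / norm for code, p in corrected.items()}

def recommend(counts, alpha=1.0, beta=0.5, gamma=0.2):
    """Return the code with the largest normalized utilization rate."""
    usage = normalized_utilization(counts, alpha, beta, gamma)
    return max(usage, key=usage.get) if usage else None

def grid_search(labeled, grid):
    """Try every (alpha, beta, gamma) from `grid` on labeled data.

    `labeled` is a list of (counts, known_code) pairs. Scoring by accuracy
    is an assumption; the claims only say that different weights are tested.
    """
    best = None
    for a, b, g in product(grid, repeat=3):
        hits = sum(recommend(c, a, b, g) == known for c, known in labeled)
        accuracy = hits / len(labeled)
        if best is None or accuracy > best[1]:
            best = ((a, b, g), accuracy)
    return best

# Example: code "B" has more raw invoices, but all from non-matching taxpayers.
counts = {"A": (10, 0, 0), "B": (0, 0, 30)}
print(recommend(counts))
best = grid_search([(counts, "A")], grid=[0.2, 0.5, 1.0])
print(best)
```

With the default weights, code "A" should be recommended even though "B" has three times as many raw invoices, because the 30 invoices for "B" come from taxpayers whose industry and business scope do not match and are therefore weighted by γ = 0.2.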
CN201711450703.4A 2017-12-27 2017-12-27 System and method for determining tax-recommending classification code of commodity Active CN110019404B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711450703.4A CN110019404B (en) 2017-12-27 2017-12-27 System and method for determining tax-recommending classification code of commodity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711450703.4A CN110019404B (en) 2017-12-27 2017-12-27 System and method for determining tax-recommending classification code of commodity

Publications (2)

Publication Number Publication Date
CN110019404A CN110019404A (en) 2019-07-16
CN110019404B true CN110019404B (en) 2022-01-07

Family

ID=67187046

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711450703.4A Active CN110019404B (en) 2017-12-27 2017-12-27 System and method for determining tax-recommending classification code of commodity

Country Status (1)

Country Link
CN (1) CN110019404B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110377801A (en) * 2019-07-24 2019-10-25 浙江诺诺网络科技有限公司 A kind of product name bearing calibration, device and computer readable storage medium
CN110597995B (en) * 2019-09-20 2022-03-11 税友软件集团股份有限公司 Commodity name classification method, commodity name classification device, commodity name classification equipment and readable storage medium
CN113052616A (en) * 2021-03-15 2021-06-29 北京金和网络股份有限公司 Cold chain product tracing method, device and system
CN115809887B (en) * 2022-12-09 2023-10-10 蔷薇大树科技有限公司 Method and device for determining main business scope of enterprise based on invoice data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102004979A (en) * 2009-09-03 2011-04-06 叶克 System and method for providing commodity matching and promoting services
CN104102833A (en) * 2014-07-10 2014-10-15 西安交通大学 Intensive interval discovery based tax index normalization and fusion calculation method
CN105117426A (en) * 2015-07-31 2015-12-02 重庆龙工场跨境电子商务投资有限公司 Intelligent search system for HSCODE
CN105631742A (en) * 2015-12-24 2016-06-01 安徽融信金模信息技术有限公司 Small and medium enterprise credit evaluation method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW200411457A (en) * 2002-12-20 2004-07-01 Hon Hai Prec Ind Co Ltd Notes receivable management system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
《开票不难!商品及税收分类编码选择技巧》;吴海燕;《https://www.dongao.com/c/2017-12-13/829765.shtml》;20171213;第1-7页 *

Also Published As

Publication number Publication date
CN110019404A (en) 2019-07-16

Similar Documents

Publication Publication Date Title
CN110019404B (en) System and method for determining tax-recommending classification code of commodity
CN104572449A (en) Automatic test method based on case library
CN104036420A (en) Method for batch checking, downloading and utilizing invoices based on national network invoice platform
CN109766384A (en) The method and apparatus of automatic conversion data metering unit in a kind of visualization system
CN106251178A (en) Data digging method and device
CN105975486A (en) Information recommendation method and apparatus
CN110019798B (en) Method and system for measuring commodity type difference of sale and sale items
CN114398560B (en) Marketing interface setting method, device, equipment and medium based on WEB platform
CN114372731B (en) Post target making method, device, equipment and storage medium based on big data
CN112307098A (en) Cost consultation management method, system, electronic equipment and computer readable storage medium
CN110009796B (en) Invoice category identification method and device, electronic equipment and readable storage medium
CN110032513B (en) Data verification method and device and electronic equipment
CN112861500A (en) Engineering pricing table generation method and device based on engineering quantity list
CN112052310A (en) Information acquisition method, device, equipment and storage medium based on big data
CN114676931B (en) Electric quantity prediction system based on data center technology
CN111460293B (en) Information pushing method and device and computer readable storage medium
CN114781855A (en) DEA model-based logistics transmission efficiency analysis method, device, equipment and medium
CN113487256A (en) Purchase, sale and storage management method, device and equipment and storage medium
CN113592479A (en) Charging method and device based on multi-stage increasing rate
CN113988800A (en) Method and device for checking abnormal electric quantity user, computer equipment and storage medium
CN113743894A (en) Method and system for establishing rechecking rule model for rechecking electric bill
CN111179046A (en) Method and system for realizing automatic payment and posting of sales cost based on invoice data
CN109584029B (en) Method, device, medium and electronic equipment for auditing electronic invoices
CN115578007A (en) Method and system for integrating calculation of points and task in tax industry
CN113673891A (en) Planning method and device for iterative delivery mode

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant