CN110019798B - Method and system for measuring commodity type difference of sale and sale items - Google Patents

Info

Publication number
CN110019798B
Authority
CN
China
Prior art keywords
commodity
goods
service classification
service
classification codes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711157256.3A
Other languages
Chinese (zh)
Other versions
CN110019798A (en)
Inventor
舒南飞
林文辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aisino Corp
Original Assignee
Aisino Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aisino Corp filed Critical Aisino Corp
Priority to CN201711157256.3A priority Critical patent/CN110019798B/en
Publication of CN110019798A publication Critical patent/CN110019798A/en
Application granted granted Critical
Publication of CN110019798B publication Critical patent/CN110019798B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/10 Tax strategies

Abstract

The invention discloses a method for measuring the commodity category difference between an enterprise's purchase and sales items, comprising the following steps: establishing an analysis and identification model for commodity and service classification codes from historical invoice data containing commodity and service details and from rule set data; using that model, determining from the attribute information of the commodities' invoice data within a preset period an updated list of commodity and service classification codes arranged in descending order of probability; and calculating the category difference of the enterprise's purchase and sales commodities with the purchase and sales item difference metric formula, according to the granularity (coarse or fine) of the updated commodity and service classification codes. The invention can accurately reflect the categories of goods and services that an enterprise purchases and sells, and thereby identify enterprises with abnormal operating behavior. In addition, because the enterprise's purchase and sales items are analyzed through the corrected commodity and service classification codes, classification accuracy is improved and the computational workload is reduced.

Description

Method and system for measuring the commodity category difference between purchase and sales items
Technical Field
The invention relates to the technical field of tax risk management, and in particular to a method and a system for measuring the commodity category difference between purchase (input) and sales (output) items.
Background
On February 15, 2016, the national tax administration issued a notice on carrying out trial work for goods and services tax classification and coding, together with the classification and coding standard document "Goods and Services Tax Classification and Coding (Trial)", and required that coding-related functions be added to the upgraded version of the value-added tax invoice system. In the year or so that the goods and services tax classification and coding has been in use, an enterprise invoicing terminal has had to select a corresponding code from more than 4000 classification codes whenever it invoices sold goods. Because different invoicing enterprises understand the goods and services classification codes differently, the codes they select for goods and services of the same name may not be consistent. The codes are also divided into broad classes and narrow classes, which is a further source of inconsistency: for example, an invoice issuer who cannot accurately identify the finest-grained classification code may instead choose a broader upper-level code that is more certain to be correct. In addition, an invoice issuer may select a classification code arbitrarily, because the operation is inconvenient or because invoicing is done casually. Such inaccurate classification code data runs counter to the original intention of the national tax administration in carrying out the trial classification and coding work, and hinders the use of these data for tax data analysis.
Because the goods and service names on the invoices that an enterprise receives when purchasing and issues when selling are not completely consistent, measuring the difference by exact matching of name text gives an inaccurate measure of the difference between goods names, for example when measuring the consistency of two variant spellings of "apple notebook". Such matching is also time-consuming, and its computational cost is high given massive invoice data and tens of millions of enterprises.
In view of the current state of issued invoice data containing classification codes, and the fact that the textual descriptions of goods and service names cannot be fully consistent at invoicing time, existing approaches cannot accurately reflect the categories of goods and services that an enterprise purchases and the categories that it sells, and so cannot identify enterprises with abnormal operating behavior. A method for measuring the commodity category difference between purchase and sales items is therefore needed.
Disclosure of Invention
The invention provides a method and a system for measuring the commodity category difference between purchase and sales items, which aim to solve the problem that the categories of goods and services that an enterprise purchases and sells cannot be accurately determined, so that trading enterprises with abnormal operating behavior can be identified.
In order to solve the above problem, according to one aspect of the present invention, there is provided a method for measuring the commodity category difference between purchase and sales items, the method comprising:
establishing an analysis and identification model for commodity and service classification codes from historical invoice data containing commodity and service details and from rule set data; wherein the rule set data relates commodity names and description keywords to commodity and service classification codes;
using the analysis and identification model for commodity and service classification codes, determining from the attribute information of the commodities' invoice data within a preset period an updated list of commodity and service classification codes arranged in descending order of probability;
and calculating the category difference of the enterprise's purchase and sales commodities with the purchase and sales item difference metric formula, according to the granularity (coarse or fine) of the updated commodity and service classification codes.
Preferably, the method further comprises:
before the analysis and identification model for commodity and service classification codes is established from the historical invoice data containing commodity and service details and from the rule set data,
processing the historical invoice data to remove records whose commodity and service names contain no descriptive text, as well as records for disabled commodities.
Preferably, establishing the analysis and identification model for commodity and service classification codes from the historical invoice data containing commodity and service details and from the rule set data comprises:
replacing commodity and service classification codes of the old standard with those of the new standard, thereby updating the commodity and service classification codes;
determining, from the attribute information of the commodities and services in the historical invoice data, the frequency relation from attribute information to commodity and service classification codes;
obtaining, from the provisions of the goods and services tax classification and coding standard, the relation from commodity names and description keywords to commodity and service classification codes, and thereby determining the rule set data;
and establishing the analysis and identification model for commodity and service classification codes from the frequency relation from attribute information to commodity and service classification codes and from the rule set data.
Preferably, the attribute information includes: the word groups obtained by word segmentation of the commodity and service names, the commodity and service names themselves, specification and model, unit price, unit, the enterprise's business scope, and the enterprise's industry information.
Preferably, the purchase and sales item difference metric formula is:
[Formula image: Figure BDA0001474630460000031]
where Set(Buy) and Set(Sell) are the sets of classification code categories of the purchased and the sold commodities respectively; |Set(Buy) ∩ Set(Sell)| is the number of commodity and service classification codes that appear in both Set(Buy) and Set(Sell); |Set(Buy) ∪ Set(Sell)| is the number of commodity and service classification codes that appear in Set(Buy) or Set(Sell); and δ is a small floating-point number.
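One plausible reading of these definitions can be written out explicitly. This is only a sketch: the formula image is not reproduced in this text, so the "one minus overlap ratio" form and the symbol D are assumptions. The text itself states only that the intersection count, the union count and δ enter the formula, and (in the detailed description) that δ keeps the denominator from being zero.

```latex
% Assumed form, not taken from the formula image:
% D is a hypothetical symbol for the purchase/sales category difference.
D \;=\; 1 \;-\; \frac{\lvert \mathrm{Set}(Buy) \cap \mathrm{Set}(Sell) \rvert}
                     {\lvert \mathrm{Set}(Buy) \cup \mathrm{Set}(Sell) \rvert + \delta}
```

Under this reading, D is close to 0 when the purchased and sold code sets coincide and equals 1 when they share no code.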
Preferably, calculating the category difference of the enterprise's purchase and sales commodities according to the granularity of the updated commodity and service classification codes comprises:
if fine-grained commodity and service classification codes are used, calculating the category difference of the enterprise's purchase and sales commodities from the commodity and service classification code with the maximum probability value in each updated list;
and if fine-grained commodity and service classification codes are not used, calculating the category difference of the enterprise's purchase and sales commodities from the commodity and service classification code with the maximum probability value in each updated list together with its upper-level commodity and service classification code.
According to another aspect of the present invention, there is provided a system for measuring the commodity category difference between purchase and sales items, the system comprising:
an analysis and identification model establishing unit, configured to establish an analysis and identification model for commodity and service classification codes from historical invoice data containing commodity and service details and from rule set data; wherein the rule set data relates commodity names and description keywords to commodity and service classification codes;
a commodity and service classification code updating unit, configured to use the analysis and identification model for commodity and service classification codes to determine, from the attribute information of the commodities' invoice data within a preset period, an updated list of commodity and service classification codes arranged in descending order of probability;
and a purchase and sales commodity category difference calculation unit, configured to calculate the category difference of the enterprise's purchase and sales commodities with the purchase and sales item difference metric formula, according to the granularity (coarse or fine) of the updated commodity and service classification codes.
Preferably, the system is further configured to:
before the analysis and identification model for commodity and service classification codes is established from the historical invoice data containing commodity and service details and from the rule set data,
process the historical invoice data to remove records whose commodity and service names contain no descriptive text, as well as records for disabled commodities.
Preferably, the analysis and identification model establishing unit establishing the analysis and identification model for commodity and service classification codes from the historical invoice data containing commodity and service details and from the rule set data includes:
replacing commodity and service classification codes of the old standard with those of the new standard, thereby updating the commodity and service classification codes;
determining, from the attribute information of the commodities and services in the historical invoice data, the frequency relation from attribute information to commodity and service classification codes;
obtaining, from the provisions of the goods and services tax classification and coding standard, the relation from commodity names and description keywords to commodity and service classification codes, and thereby determining the rule set data;
and establishing the analysis and identification model for commodity and service classification codes from the frequency relation from attribute information to commodity and service classification codes and from the rule set data.
Preferably, the attribute information includes: the word groups obtained by word segmentation of the commodity and service names, the commodity and service names themselves, specification and model, unit price, unit, the enterprise's business scope, and the enterprise's industry information.
Preferably, the purchase and sales item difference metric formula is:
[Formula image: Figure BDA0001474630460000051]
where Set(Buy) and Set(Sell) are the sets of classification code categories of the purchased and the sold commodities respectively; |Set(Buy) ∩ Set(Sell)| is the number of commodity and service classification codes that appear in both Set(Buy) and Set(Sell); |Set(Buy) ∪ Set(Sell)| is the number of commodity and service classification codes that appear in Set(Buy) or Set(Sell); and δ is a small floating-point number.
Preferably, the purchase and sales commodity category difference calculation unit calculating the category difference of the enterprise's purchase and sales commodities according to the granularity of the updated commodity and service classification codes includes:
if fine-grained commodity and service classification codes are used, calculating the category difference of the enterprise's purchase and sales commodities from the commodity and service classification code with the maximum probability value in each updated list;
and if fine-grained commodity and service classification codes are not used, calculating the category difference of the enterprise's purchase and sales commodities from the commodity and service classification code with the maximum probability value in each updated list together with its upper-level commodity and service classification code.
The invention provides a method and a system for measuring the commodity category difference between purchase and sales items. Historical invoice data containing commodity and service classification codes is used to establish an analysis and identification model for those codes, based on the frequency with which specific information maps to each commodity and service classification code. The established model is then used to update the commodity and service classification codes in already issued invoice data, and the category difference of the enterprise's purchase and sales commodities is calculated according to the granularity of the updated codes. Because the invention corrects the classification codes in historical invoicing data with the established model and measures the purchase and sales category difference with the updated codes, it can accurately reflect the categories of goods and services that trading enterprises purchase and sell, and thereby identify enterprises with abnormal operating behavior. In addition, analyzing an enterprise's purchase and sales items through the corrected commodity and service classification codes, rather than directly comparing the names of purchased and sold items, improves classification accuracy and reduces the computational workload.
Drawings
A more complete understanding of exemplary embodiments of the present invention may be had by reference to the following drawings in which:
FIG. 1 is a flow diagram of a method 100 for measuring the commodity category difference between purchase and sales items according to an embodiment of the present invention;
FIG. 2 is a flow diagram of a method 200 for measuring the commodity category difference between purchase and sales items according to an embodiment of the present invention; and
FIG. 3 is a schematic diagram of a system 300 for measuring the commodity category difference between purchase and sales items according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein, which are provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art. The terminology used in the exemplary embodiments illustrated in the accompanying drawings is not intended to limit the invention. In the drawings, the same units/elements are denoted by the same reference numerals.
Unless otherwise defined, terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Further, it will be understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense.
FIG. 1 is a flow chart of a method 100 for measuring the commodity category difference between purchase and sales items according to an embodiment of the present invention. The method of this embodiment uses historical invoice data containing commodity and service classification codes to establish an analysis and identification model for those codes, based on the frequency with which specific information maps to each commodity and service classification code. The established model is then used to update the commodity and service classification codes in already issued invoice data, and the category difference of the enterprise's purchase and sales commodities is calculated according to the granularity of the updated codes. The invention is aimed mainly at the degree of difference between the goods that a trading enterprise purchases and the goods that it sells, so as to identify the abnormal operating behavior of trading enterprises whose purchased and sold goods differ in category. Enterprises whose purchases and sales are inconsistent in this way are often also engaged in tax evasion. The invention can provide decision support to tax enforcement departments in catching tax-evading enterprises. In addition, analyzing an enterprise's purchase and sales items through classified commodity and service classification codes, rather than directly comparing the names of purchased and sold items, improves classification accuracy and reduces the computational workload.
The method 100 for measuring the commodity category difference between purchase and sales items starts at step 101, in which an analysis and identification model for commodity and service classification codes is established from historical invoice data containing commodity and service details and from rule set data; the rule set data relates commodity names and description keywords to commodity and service classification codes.
Preferably, the method further comprises:
before the analysis and identification model for commodity and service classification codes is established from the historical invoice data containing commodity and service details and from the rule set data,
processing the historical invoice data to remove records whose commodity and service names contain no descriptive text, as well as records for disabled commodities.
Preferably, establishing the analysis and identification model for commodity and service classification codes from the historical invoice data containing commodity and service details and from the rule set data comprises:
replacing commodity and service classification codes of the old standard with those of the new standard, thereby updating the commodity and service classification codes;
determining, from the attribute information of the commodities and services in the historical invoice data, the frequency relation from attribute information to commodity and service classification codes;
obtaining, from the provisions of the goods and services tax classification and coding standard, the relation from commodity names and description keywords to commodity and service classification codes, and thereby determining the rule set data;
and establishing the analysis and identification model for commodity and service classification codes from the frequency relation from attribute information to commodity and service classification codes and from the rule set data.
Preferably, the attribute information includes: the word groups obtained by word segmentation of the commodity and service names, the commodity and service names themselves, specification and model, unit price, unit, the enterprise's business scope, and the enterprise's industry information.
In this embodiment of the invention, the accumulated massive detail data of value-added tax invoices for goods and services is obtained through the information system of the national tax bureau. The main fields are the names of the goods and services, specification and model, unit price, the enterprise's business scope, the enterprise's industry information, the commodity and service classification code selected by the invoice issuer at invoicing time, the invoicing date, and so on. Analysis of the historical invoice data containing goods and service names and classification codes found that, across the full data set, the mapping from name to classification code in a single goods or service detail line can be considered correct in about 50% of cases; that is, for about half of the invoice data, the invoice issuer selected a relatively correct classification code. In addition, data cleaning removes from the historical invoice data the records that should not enter model training, according to rules on goods and service names set from the experience of tax data analysts. Examples of irregular goods and service names are: a name consisting only of digits and letters; and a name with no practical meaning, such as "detailed in sales list". Adding features such as the specification and model of the goods and services, their unit price, and the enterprise's business scope to the model improves the accuracy with which a goods or service name is mapped to a specific classification code.
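The cleaning rules described above can be sketched as a simple name filter. This is a minimal illustration under stated assumptions, not the patented procedure: the function names, the record layout (a dict with a "name" key) and the placeholder phrase list are hypothetical, and the "only digits and letters" rule is read as "only ASCII digits and Latin letters", which is the natural reading for Chinese-language invoice names.

```python
import re

# Hypothetical list of meaningless names; the source gives only one example.
PLACEHOLDER_NAMES = {"detailed in sales list"}

# A name made up solely of ASCII digits and Latin letters (assumed reading of the rule).
_ONLY_DIGITS_AND_LETTERS = re.compile(r"[0-9A-Za-z]+")


def is_trainable_name(name: str) -> bool:
    """Return True if a goods/service name looks usable for model training."""
    text = name.strip()
    if not text:
        return False
    if _ONLY_DIGITS_AND_LETTERS.fullmatch(text):
        return False  # e.g. a bare part number carries no description
    if text.lower() in PLACEHOLDER_NAMES:
        return False  # a name with no practical meaning
    return True


def clean_records(records):
    """Keep only the invoice detail records whose name passes the filter."""
    return [r for r in records if is_trainable_name(r["name"])]
```

For instance, `clean_records([{"name": "A1234"}, {"name": "steel pipe 20mm"}])` keeps only the second record, because the first name is purely alphanumeric.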
After the trial goods and services tax classification and coding was issued, the national tax administration made slight adjustments to the classification codes and standards, so the classification codes in the historical data also need to be corrected and supplemented. The classification identification model from goods and service names to commodity and service classification codes, established from the historical detail data of value-added tax invoices for goods and services, rests on a basic big-data premise: the invoiced goods and services cover almost all items, and most invoice issuers try to select a reasonably accurate commodity and service classification code for the item being invoiced. In addition, the classification rules are an important input for model training and are used to supplement the classification of goods that do not appear in the historical data.
The classification identification model from goods and service names to commodity and service classification codes provided by this embodiment, established from the historical detail data of value-added tax invoices for goods and services, can keep improving its identification accuracy as invoice data continues to grow. To make the classification code identification model more robust, several commodity and service classification codes are returned for a given goods or service name, ordered from highest to lowest probability, which reduces the risk of misclassification that comes with returning a single code.
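A frequency-plus-rules identifier of the kind described here might look like the sketch below. It is an illustration only: the class name `CodeIdentifier`, the scoring scheme (summing per-token relative frequencies and normalizing), the decision to consult the rule set only for tokens unseen in the history, and the code strings such as "C1" are all assumptions rather than details given in the source. Tokens stand for already segmented name words or other attribute values.

```python
from collections import Counter, defaultdict


class CodeIdentifier:
    """Hypothetical frequency model with a rule-set fallback."""

    def __init__(self, rules=None):
        # token -> Counter of classification codes observed with that token
        self.freq = defaultdict(Counter)
        # keyword -> classification code, taken from the coding standard (rule set)
        self.rules = dict(rules or {})

    def fit(self, records):
        """records: iterable of (tokens, code) pairs from cleaned historical invoices."""
        for tokens, code in records:
            for tok in tokens:
                self.freq[tok][code] += 1
        return self

    def rank(self, tokens, top_k=3):
        """Return up to top_k (code, probability) pairs in descending probability."""
        scores = Counter()
        for tok in tokens:
            counts = self.freq.get(tok)
            if counts:
                total = sum(counts.values())
                for code, n in counts.items():
                    scores[code] += n / total
            elif tok in self.rules:
                # Rules supplement goods that never appear in the history.
                scores[self.rules[tok]] += 1.0
        norm = sum(scores.values())
        if norm == 0:
            return []
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [(code, s / norm) for code, s in ranked[:top_k]]
```

Returning a ranked list rather than a single code mirrors the robustness argument above: a downstream step can still fall back to the second candidate when the first one is wrong.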
In step 102, the analysis and identification model for commodity and service classification codes is used to determine, from the attribute information of the commodities' invoice data within a preset period, an updated list of commodity and service classification codes arranged in descending order of probability.
In this embodiment of the invention, the commodity and service classification codes in the invoice data of the period to be analyzed are updated with the established classification identification model from goods and service names to commodity and service classification codes. The preset period is chosen on the premise that goods are normally sold within one year. For the update, only the goods and service name, the specification and model, the unit price, the invoicing enterprise's business scope, the invoicing enterprise's industry information and the like need to be passed as input to the classification identification model, which returns a list of commodity and service classification codes with their probabilities.
In step 103, the category difference of the enterprise's purchase and sales commodities is calculated with the purchase and sales item difference metric formula, according to the granularity (coarse or fine) of the updated commodity and service classification codes.
Preferably, the purchase and sales item difference metric formula is:
[Formula image: Figure BDA0001474630460000091]
where Set(Buy) and Set(Sell) are the sets of classification code categories of the purchased and the sold commodities respectively; |Set(Buy) ∩ Set(Sell)| is the number of commodity and service classification codes that appear in both Set(Buy) and Set(Sell); |Set(Buy) ∪ Set(Sell)| is the number of commodity and service classification codes that appear in Set(Buy) or Set(Sell); and δ is a small floating-point number.
Preferably, calculating the category difference of the enterprise's purchase and sales commodities according to the granularity of the updated commodity and service classification codes comprises:
if fine-grained commodity and service classification codes are used, calculating the category difference of the enterprise's purchase and sales commodities from the commodity and service classification code with the maximum probability value in each updated list;
and if fine-grained commodity and service classification codes are not used, calculating the category difference of the enterprise's purchase and sales commodities from the commodity and service classification code with the maximum probability value in each updated list together with its upper-level commodity and service classification code.
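The fine-grained and coarse-grained branches above can be sketched as one function that builds a code set from the ranked lists. This is a hedged illustration: the names `top_code`, `code_set` and `parent_of` are hypothetical, the code strings in the usage note are made up, and reading the coarse branch as "keep the top code and add its immediate upper-level code" is an assumption about wording that the source leaves open.

```python
def top_code(ranked):
    """Pick the code with the highest probability from a (code, probability) list."""
    return max(ranked, key=lambda cp: cp[1])[0]


def code_set(ranked_lists, fine_grained, parent_of=None):
    """Build the set of classification codes for one side (purchases or sales).

    ranked_lists: one ranked (code, probability) list per invoice detail line.
    fine_grained: True to use only the most probable code of each line.
    parent_of:    hypothetical mapping from a code to its upper-level code.
    """
    parent_of = parent_of or {}
    codes = set()
    for ranked in ranked_lists:
        if not ranked:
            continue  # the model returned no candidate for this line
        best = top_code(ranked)
        codes.add(best)
        if not fine_grained and best in parent_of:
            # Coarse mode (assumed reading): also count the upper-level code.
            codes.add(parent_of[best])
    return codes
```

With `parent_of = {"1010101": "10101"}`, a line ranked as `[("1010101", 0.7), ("1010102", 0.3)]` contributes `{"1010101"}` in fine-grained mode and `{"1010101", "10101"}` otherwise, so two enterprises' sides can still overlap at the upper level when their finest codes differ.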
Fig. 2 is a flow chart of a method 200 for measuring the category difference of purchase and sales items according to an embodiment of the present invention. As shown in Fig. 2, in step 201, invoice data within a specific business period is extracted by invoicing date.
In step 202, the analysis and identification model of the goods and services classification codes is applied to obtain an updated list of goods and services classification codes.
In step 203, it is determined whether fine-grained goods and services classification codes are to be used; wherein
if fine-grained goods and services classification codes are used, the method proceeds to step 204, in which the code sets of the enterprise's purchased goods and sold goods are each calculated from the goods and services classification codes that have the maximum probability value in the updated list, and then proceeds to step 206;
if fine-grained goods and services classification codes are not used, the method proceeds to step 205, in which the code sets of the enterprise's purchased goods and sold goods are each calculated from the goods and services classification codes that have the maximum probability value in the updated list together with their parent (higher-level) goods and services classification codes, and then proceeds to step 206.
In step 206, the category difference of the enterprise's purchase and sales item goods is calculated by applying the purchase and sales item difference metric formula to the calculated code sets.
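Steps 203 to 205 can be sketched roughly as follows in Python. This is a minimal illustration, not the patented implementation: the function name `build_code_set`, the representation of each invoice line as a list of `(code, probability)` pairs, the sample codes, and the idea of deriving a parent code by keeping the first `parent_len` characters are all assumptions made for the example.

```python
def build_code_set(ranked_lists, fine_grained=True, parent_len=7):
    """Build the classification code set for one side (purchases or sales).

    ranked_lists: one list of (code, probability) pairs per invoice line,
                  as returned by the identification model (step 202).
    fine_grained: step 203 decision; True -> step 204, False -> step 205.
    parent_len:   hypothetical length of the parent-code prefix.
    """
    codes = set()
    for ranked in ranked_lists:
        if not ranked:
            continue  # no candidate code for this invoice line
        top_code, _prob = max(ranked, key=lambda pair: pair[1])
        if fine_grained:
            codes.add(top_code)               # step 204: keep the full code
        else:
            codes.add(top_code[:parent_len])  # step 205: use the parent code
    return codes


# Made-up 19-digit codes, for illustration only
purchases = [
    [("1010101010000000000", 0.7), ("1010102000000000000", 0.3)],
    [("1010101020000000000", 0.9)],
]
fine_set = build_code_set(purchases, fine_grained=True)
coarse_set = build_code_set(purchases, fine_grained=False)
```

The same function would be called once for the purchase invoices and once for the sales invoices; step 206 then applies the difference metric formula to the two resulting sets.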
In the embodiment of the invention, the corrected classification code is returned as a list of goods and services classification codes, which makes it convenient to measure the category difference of an enterprise's purchase and sales items at different degrees of strictness. The difference degree of an enterprise's purchase and sales items is measured as the difference between the classification code types of the purchased goods and the classification code types of the sold goods within a given business period (the time span is usually more than one year, and the period is counted by month). The purchase and sales item difference metric formula is:
Figure BDA0001474630460000101
where Set(Buy) and Set(Sell) are the sets of goods and services classification code types of the purchased and sold goods, respectively; |Set(Buy) ∩ Set(Sell)| is the number of goods and services classification codes that appear in both Set(Buy) and Set(Sell); |Set(Buy) ∪ Set(Sell)| is the number of goods and services classification codes that appear in Set(Buy) or Set(Sell); and δ is a small floating-point number used to avoid division by zero when |Set(Buy) ∪ Set(Sell)| is 0. The goods and services classification code table is organized hierarchically, from coarse categories to fine categories, according to the characteristics of the classification codes. Table 1 shows the goods and services classification code levels from coarse to fine.
Table 1. Goods and services classification code list
Figure BDA0001474630460000111
When the difference between an enterprise's purchase and sales items is to be judged strictly, the code with the highest probability in the corrected classification code list is used as the item's classification code. When the difference is to be judged loosely, the "part + class + chapter + section" portion of the coding rule is used as the coarse-grained goods and services classification coding rule: in the returned goods and services classification code list, the recommended probabilities of codes that share the same coarse-grained code are summed, and the coarse-grained code with the largest probability sum is taken as the goods and services classification code used to measure the category difference. The fine-grained goods and services classification code is the smallest, lowest-level category.
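The coarse-grained selection described above (sum the recommended probabilities per coarse code, then take the largest sum) might look like the following Python sketch. The function name `coarse_code`, the prefix length of 7 characters standing in for "part + class + chapter + section", and the sample codes and probabilities are assumptions for illustration only.

```python
from collections import defaultdict


def coarse_code(ranked, prefix_len=7):
    """Pick the coarse-grained code with the largest summed probability.

    ranked:     list of (code, probability) pairs returned by the model.
    prefix_len: hypothetical number of leading characters that make up
                the coarse-grained code.
    Returns None when the list is empty.
    """
    if not ranked:
        return None
    sums = defaultdict(float)
    for code, prob in ranked:
        sums[code[:prefix_len]] += prob  # pool codes sharing a coarse prefix
    return max(sums, key=sums.get)


# Made-up recommendation list for one invoice line
ranked = [
    ("1010101010000000000", 0.40),
    ("1020101010000000000", 0.35),
    ("1020101020000000000", 0.25),
]
fine_choice = max(ranked, key=lambda pair: pair[1])[0]
coarse_choice = coarse_code(ranked)
```

In this made-up example the strict rule picks the single most probable fine-grained code, while the loose rule picks the prefix "1020101" because its two candidates together outweigh it, which shows why the two granularities can lead to different code sets.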
In practice, when the difference is judged loosely with the coarse-grained goods and services classification coding rule, an enterprise with a large calculated difference degree exhibits more serious abnormal behavior than an enterprise with the same numerical difference degree measured under the fine-grained coding rule. The two difference measurement methods provided by the invention can be used together, or one of them can be emphasized according to business requirements, in order to identify abnormal business behaviors such as a mismatch between the types of purchase and sales items of commercial enterprises.
Fig. 3 is a schematic structural diagram of a system 300 for measuring the category difference of purchase and sales items according to an embodiment of the present invention. As shown in Fig. 3, the system 300 includes: an analysis and identification model establishing unit 301 for goods and services classification codes, a goods and services classification code updating unit 302, and a purchase and sales item category difference calculating unit 303. Preferably, the analysis and identification model establishing unit 301 establishes an analysis and identification model of the goods and services classification codes according to historical invoice data containing goods and services details and rule set data; wherein the rule set data is: commodity name, description keyword, and goods and services classification code. Preferably, the system further comprises:
before the analysis and identification model of the goods and services classification codes is established according to the historical invoice data containing goods and services details and the rule set data,
processing the historical invoice data, and removing historical invoice data that lacks a textual description of the goods and services name as well as historical invoice data of disabled commodities.
Preferably, the establishing, by the analysis and identification model establishing unit 301, of the analysis and identification model of the goods and services classification codes according to the historical invoice data containing goods and services details and the rule set data includes:
replacing old-standard goods and services classification codes with new-standard goods and services classification codes, thereby updating the goods and services classification codes;
determining the frequency relation from attribute information to goods and services classification codes by using the attribute information of the goods and services in the historical invoice data;
obtaining the relation from commodity names and description keywords to goods and services classification codes according to the provisions on the tax classification and coding of goods and services, and determining the rule set data;
and establishing the analysis and identification model of the goods and services classification codes by using the frequency relation from attribute information to goods and services classification codes and the rule set data.
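The text does not specify how the frequency relation and the rule set are combined, so the following Python sketch is only one plausible reading. It assumes that attribute information has already been reduced to tokens (for example, by word segmentation of the goods name), that the frequency relation is a token-to-code co-occurrence count, and that a rule-set keyword hit counts as one extra observation. The names `build_frequency_model` and `rank_codes`, the sample data, and the weighting are all hypothetical.

```python
from collections import Counter, defaultdict


def build_frequency_model(history):
    """Count how often each attribute token co-occurs with each code.

    history: iterable of (tokens, code) pairs from cleaned historical invoices.
    """
    freq = defaultdict(Counter)
    for tokens, code in history:
        for tok in tokens:
            freq[tok][code] += 1
    return freq


def rank_codes(tokens, freq, rules):
    """Return (code, probability) pairs in descending probability order.

    rules: mapping from keyword to classification code (the rule set data).
    """
    scores = Counter()
    for tok in tokens:
        scores.update(freq.get(tok, {}))  # evidence from historical frequency
        if tok in rules:
            scores[rules[tok]] += 1       # assumed weight of one rule hit
    total = sum(scores.values())
    if total == 0:
        return []
    return [(code, count / total) for code, count in scores.most_common()]


# Made-up history and rule set, with placeholder codes "A" and "B"
history = [(["rice", "bag"], "A"), (["rice"], "A"), (["rice", "cooker"], "B")]
model = build_frequency_model(history)
ranked = rank_codes(["rice", "cooker"], model, {"cooker": "B"})
```

A real system would likely weight attributes such as specification, unit price, and the enterprise's industry differently; the uniform counting here is only meant to show how a descending-probability code list can arise from frequency counts plus rules.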
Preferably, the attribute information includes: the phrases obtained by word segmentation of the goods and services names, the goods and services names, specification and model, unit price, unit, the business scope of the enterprise, and the industry information of the enterprise.
Preferably, the goods and services classification code updating unit 302 uses the analysis and identification model of the goods and services classification codes to determine, according to the attribute information of the invoice data of goods within a preset period, an updated goods and services classification code list arranged in descending order of probability.
Preferably, the purchase and sales item category difference calculating unit 303 calculates the category difference of the enterprise's purchase and sales item goods with the purchase and sales item difference metric formula, according to the granularity (coarse or fine) of the updated goods and services classification codes.
Preferably, the purchase and sales item difference metric formula is:
Figure BDA0001474630460000121
where Set(Buy) and Set(Sell) are the sets of goods and services classification code types of the purchased and sold goods, respectively; |Set(Buy) ∩ Set(Sell)| is the number of goods and services classification codes that appear in both Set(Buy) and Set(Sell); |Set(Buy) ∪ Set(Sell)| is the number of goods and services classification codes that appear in Set(Buy) or Set(Sell); and δ is a small floating-point number.
Preferably, the calculating, by the purchase and sales item category difference calculating unit 303, of the category difference of the enterprise's purchase and sales item goods according to the granularity of the updated goods and services classification codes comprises:
if fine-grained goods and services classification codes are used, calculating the category difference of the enterprise's purchase and sales item goods from the goods and services classification codes that have the maximum probability value in the updated goods and services classification code list;
and if fine-grained goods and services classification codes are not used, calculating the category difference of the enterprise's purchase and sales item goods from the goods and services classification codes that have the maximum probability value in the updated list, together with their parent (higher-level) goods and services classification codes.
The system 300 for measuring the category difference of purchase and sales items in this embodiment corresponds to the method 100 for measuring the category difference of purchase and sales items in another embodiment of the present invention, and the details are not repeated here.
The invention has been described with reference to a few embodiments. However, other embodiments of the invention than the one disclosed above are equally possible within the scope of the invention, as would be apparent to a person skilled in the art from the appended patent claims.
Generally, all terms used in the claims are to be interpreted according to their ordinary meaning in the technical field, unless explicitly defined otherwise herein. All references to "a/an/the [ device, component, etc ]" are to be interpreted openly as referring to at least one instance of said device, component, etc., unless explicitly stated otherwise. The steps of any method disclosed herein do not have to be performed in the exact order disclosed, unless explicitly stated.

Claims (10)

1. A method for measuring the category difference of purchase and sales item goods, the method comprising:
establishing an analysis and identification model of goods and services classification codes according to historical invoice data containing goods and services details and rule set data; wherein the rule set data is: commodity name, description keyword, and goods and services classification code;
determining, by using the analysis and identification model of the goods and services classification codes and according to attribute information of the invoice data of goods within a preset period, an updated goods and services classification code list arranged in descending order of probability;
calculating, with a purchase and sales item difference metric formula, the category difference of the enterprise's purchase and sales item goods according to the granularity (coarse or fine) of the updated goods and services classification codes;
wherein establishing the analysis and identification model of the goods and services classification codes according to the historical invoice data containing goods and services details and the rule set data comprises:
replacing old-standard goods and services classification codes with new-standard goods and services classification codes, thereby updating the goods and services classification codes;
determining the frequency relation from attribute information to goods and services classification codes by using the attribute information of the goods and services in the historical invoice data;
obtaining the relation from commodity names and description keywords to goods and services classification codes according to the provisions on the tax classification and coding of goods and services, and determining the rule set data;
and establishing the analysis and identification model of the goods and services classification codes by using the frequency relation from attribute information to goods and services classification codes and the rule set data.
2. The method of claim 1, further comprising:
before the analysis and identification model of the goods and services classification codes is established according to the historical invoice data containing goods and services details and the rule set data,
processing the historical invoice data, and removing historical invoice data that lacks a textual description of the goods and services name as well as historical invoice data of disabled commodities.
3. The method of claim 1, wherein the attribute information comprises: the phrases obtained by word segmentation of the goods and services names, the goods and services names, specification and model, unit price, unit, the business scope of the enterprise, and the industry information of the enterprise.
4. The method of claim 1, wherein the purchase and sales item difference metric formula is:
Figure FDA0002715391070000021
where Set(Buy) and Set(Sell) are the sets of goods and services classification code types of the purchased and sold goods, respectively; |Set(Buy) ∩ Set(Sell)| is the number of goods and services classification codes that appear in both Set(Buy) and Set(Sell); |Set(Buy) ∪ Set(Sell)| is the number of goods and services classification codes that appear in Set(Buy) or Set(Sell); and δ is a small floating-point number.
5. The method of claim 1, wherein calculating the category difference of the enterprise's purchase and sales item goods according to the granularity of the updated goods and services classification codes comprises:
if fine-grained goods and services classification codes are used, calculating the category difference of the enterprise's purchase and sales item goods from the goods and services classification codes that have the maximum probability value in the updated goods and services classification code list;
and if fine-grained goods and services classification codes are not used, calculating the category difference of the enterprise's purchase and sales item goods from the goods and services classification codes that have the maximum probability value in the updated list, together with their parent (higher-level) goods and services classification codes.
6. A system for measuring the category difference of purchase and sales item goods, the system comprising:
an analysis and identification model establishing unit for goods and services classification codes, configured to establish an analysis and identification model of the goods and services classification codes according to historical invoice data containing goods and services details and rule set data; wherein the rule set data is: commodity name, description keyword, and goods and services classification code;
a goods and services classification code updating unit, configured to determine, by using the analysis and identification model of the goods and services classification codes and according to attribute information of the invoice data of goods within a preset period, an updated goods and services classification code list arranged in descending order of probability;
a purchase and sales item category difference calculating unit, configured to calculate, with a purchase and sales item difference metric formula, the category difference of the enterprise's purchase and sales item goods according to the granularity (coarse or fine) of the updated goods and services classification codes;
wherein the establishing, by the analysis and identification model establishing unit, of the analysis and identification model of the goods and services classification codes according to the historical invoice data containing goods and services details and the rule set data comprises:
replacing old-standard goods and services classification codes with new-standard goods and services classification codes, thereby updating the goods and services classification codes;
determining the frequency relation from attribute information to goods and services classification codes by using the attribute information of the goods and services in the historical invoice data;
obtaining the relation from commodity names and description keywords to goods and services classification codes according to the provisions on the tax classification and coding of goods and services, and determining the rule set data;
and establishing the analysis and identification model of the goods and services classification codes by using the frequency relation from attribute information to goods and services classification codes and the rule set data.
7. The system of claim 6, further comprising:
before the analysis and identification model of the goods and services classification codes is established according to the historical invoice data containing goods and services details and the rule set data,
processing the historical invoice data, and removing historical invoice data that lacks a textual description of the goods and services name as well as historical invoice data of disabled commodities.
8. The system of claim 6, wherein the attribute information comprises: the phrases obtained by word segmentation of the goods and services names, the goods and services names, specification and model, unit price, unit, the business scope of the enterprise, and the industry information of the enterprise.
9. The system of claim 6, wherein the purchase and sales item difference metric formula is:
Figure FDA0002715391070000031
where Set(Buy) and Set(Sell) are the sets of goods and services classification code types of the purchased and sold goods, respectively; |Set(Buy) ∩ Set(Sell)| is the number of goods and services classification codes that appear in both Set(Buy) and Set(Sell); |Set(Buy) ∪ Set(Sell)| is the number of goods and services classification codes that appear in Set(Buy) or Set(Sell); and δ is a small floating-point number.
10. The system according to claim 6, wherein the calculating, by the purchase and sales item category difference calculating unit, of the category difference of the enterprise's purchase and sales item goods according to the granularity of the updated goods and services classification codes comprises:
if fine-grained goods and services classification codes are used, calculating the category difference of the enterprise's purchase and sales item goods from the goods and services classification codes that have the maximum probability value in the updated goods and services classification code list;
and if fine-grained goods and services classification codes are not used, calculating the category difference of the enterprise's purchase and sales item goods from the goods and services classification codes that have the maximum probability value in the updated list, together with their parent (higher-level) goods and services classification codes.
CN201711157256.3A 2017-11-20 2017-11-20 Method and system for measuring commodity type difference of sale and sale items Active CN110019798B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711157256.3A CN110019798B (en) 2017-11-20 2017-11-20 Method and system for measuring commodity type difference of sale and sale items

Publications (2)

Publication Number Publication Date
CN110019798A CN110019798A (en) 2019-07-16
CN110019798B CN110019798B (en) 2021-02-05

Family

ID=67185976

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711157256.3A Active CN110019798B (en) 2017-11-20 2017-11-20 Method and system for measuring commodity type difference of sale and sale items

Country Status (1)

Country Link
CN (1) CN110019798B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111179044B (en) * 2019-12-23 2023-08-29 望海康信(北京)科技股份公司 Bill reimbursement method and device
CN111192122A (en) * 2019-12-25 2020-05-22 航天信息股份有限公司 Method and system for calculating difference degree of sales items based on collaborative filtering
CN112529664A (en) * 2020-12-15 2021-03-19 航天信息股份有限公司 Method and device for comparing commodities sold in advance, storage medium and electronic equipment
CN115809887B (en) * 2022-12-09 2023-10-10 蔷薇大树科技有限公司 Method and device for determining main business scope of enterprise based on invoice data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103605815A (en) * 2013-12-11 2014-02-26 焦点科技股份有限公司 Automatic commodity information classifying and recommending method applicable to B2B (Business to Business) e-commerce platform
CN103839172A (en) * 2012-11-23 2014-06-04 阿里巴巴集团控股有限公司 Goods recommendation method and system
CN103902545A (en) * 2012-12-25 2014-07-02 北京京东尚科信息技术有限公司 Category path recognition method and system
CN104424613A (en) * 2013-09-04 2015-03-18 航天信息股份有限公司 Value added tax invoice monitoring method and system thereof

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9818093B1 (en) * 2012-06-14 2017-11-14 Amazon Technologies, Inc. Third party check-in associations with cloud wallet
CN104537561A (en) * 2015-01-20 2015-04-22 全国组织机构代码管理中心 Automatic economic activities classification device in organizing institution bar codes

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
深化数据应用筑起税收风险新防线;孙正密 等;《中国税务》;20160801(第08期);第64页 *

Also Published As

Publication number Publication date
CN110019798A (en) 2019-07-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant