CN104636341A - Data cleaning storage method for added value tax one-number multi-name monitoring - Google Patents



Publication number
CN104636341A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310547671.5A
Other languages
Chinese (zh)
Other versions
CN104636341B (en
Inventor
陈勇
范钢
谢宇
陈博
潘竞旭
孟祥宽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aisino Corp
Original Assignee
Aisino Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aisino Corp filed Critical Aisino Corp
Priority to CN201310547671.5A priority Critical patent/CN104636341B/en
Publication of CN104636341A publication Critical patent/CN104636341A/en
Application granted granted Critical
Publication of CN104636341B publication Critical patent/CN104636341B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/10 Tax strategies

Abstract

The invention discloses a data cleaning and storage method for value-added tax (VAT) one-number multi-name monitoring. The method comprises the following steps: performing business analysis on the tax data items required for one-number multi-name detection; designing a database table structure according to the business analysis result; and cleaning the data, processing data with special meanings, and storing the final result according to the designed database table structure. With this method, the data required for one-number multi-name detection can be quickly cleaned and extracted from a huge volume of data and stored in a well-organized way in preparation for subsequent judgment, which improves the efficiency of the system's one-number multi-name detection.

Description

A data cleaning and storage method for value-added tax one-number multi-name monitoring
Technical field
The present invention relates to the field of tax control technology, and in particular to a data cleaning and storage method for value-added tax (VAT) one-number multi-name monitoring.
Background art
The VAT anti-counterfeiting tax control system is an important component of the national Golden Tax Project. By using digital ciphers and electronic storage technology, it strengthens the anti-counterfeiting function of special invoices and has successfully curbed the use of VAT invoices for tax evasion and tax fraud.
The enterprise-side invoicing software uses digital ciphers and electronic information storage technology to strengthen the anti-counterfeiting function of special invoices and to control the tax sources of general VAT taxpayers. It is the system that enterprises use to issue VAT invoices.
At present, the tax data of general taxpayers comes mainly from three systems: the enterprise-side invoicing software, the network edition of the anti-counterfeiting tax control system, and the core collection and administration system CTAIS (China Taxation Administration Information System).
CTAIS was developed on the basis of the "Tax Collection and Administration Business Regulations", the "Municipal-Level Tax Collection and Administration Business Requirements" and the "Summary of State Administration of Taxation CTAIS Development Requirements", all formulated by the State Administration of Taxation. It is a unified, large-scale application software system for tax authorities at all levels nationwide.
These three systems record a massive amount of tax data on general taxpayers and provide the raw data source for the development of all kinds of tax applications.
Therefore, how to design a method that efficiently cleans and filters this huge volume of data and then stores it, so as to improve the efficiency of one-number multi-name suspicious-point detection, is a research direction for those skilled in the art. A one-number multi-name suspicious point is defined as follows: when enterprises issue VAT invoices, a seller tax number that corresponds to multiple seller names is flagged as a one-number multi-name suspicious point.
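As an illustrative sketch only, and not part of the disclosed method, the definition of a one-number multi-name suspicious point can be expressed as grouping invoices by seller tax number and flagging any number that appears with more than one seller name. The function name, the tuple layout and the sample values are assumptions made for this example.

```python
from collections import defaultdict


def find_multi_name_tax_ids(invoices):
    """Return {seller_tax_id: sorted names} for tax numbers seen with more than one name."""
    names_by_tax_id = defaultdict(set)
    for seller_tax_id, seller_name in invoices:
        names_by_tax_id[seller_tax_id].add(seller_name)
    return {
        tax_id: sorted(names)
        for tax_id, names in names_by_tax_id.items()
        if len(names) > 1
    }


# Hypothetical sample data: (seller tax number, seller name) pairs.
sample = [
    ("110000000000001", "Alpha Trading Co."),
    ("110000000000001", "Alpha Trade Company"),
    ("110000000000002", "Beta Steel Works"),
    ("110000000000002", "Beta Steel Works"),
]
suspects = find_multi_name_tax_ids(sample)
print(suspects)
```

An exact-match grouping like this would flag every spelling variant, which is why the method described later compares names by similarity instead of strict equality.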
Summary of the invention
The object of the present invention is to provide a data cleaning and storage method for VAT one-number multi-name monitoring that cleans and filters messy, redundant raw data and then stores it in a designed structure, preparing the data for efficient subsequent one-number multi-name judgment.
To achieve the above object, the invention provides a data cleaning and storage method for VAT one-number multi-name monitoring, characterized in that it comprises the following steps:
performing business analysis on the tax data items required for one-number multi-name detection;
designing the database table structure according to the business analysis result;
cleaning the data, processing data with special meanings, and storing the final result according to the designed database table structure.
Wherein, the step of performing business analysis on the tax data items required for one-number multi-name detection uses object-oriented analysis to analyze the required data items and the associations between them. The analysis targets VAT one-number multi-name detection and covers VAT input and output invoice data and tax collection and administration data.
Wherein, designing the database table structure according to the business analysis result means modeling the data provided by the data sources using object-oriented design, producing an ETL database structure model that comprises an input invoice data table, an input invoice itemized list data table, an output invoice table, an output invoice itemized list data table, an input one-number multi-name result table, a tax authority data table, an operator data table and a taxpayer data table.
Wherein, the ETL process requires exception handling: a series of exception tables is defined, and certain special objects are given special handling.
Wherein, input and output invoice data and itemized list data use the authentication time and the tax declaration time respectively as the partition key, are stored with one partition per month, and have indexes built on the relevant fields.
Wherein, the design of the database table structure uses aggregate tables: when reports are designed, the summary data is precomputed and saved into data tables in advance, so that it can be served directly to user queries.
Wherein, cleaning the data, processing data with special meanings, and storing the final result according to the designed database structure means cleaning out irrelevant, redundant and messy data, processing data with special meanings, and storing the final result according to the database design.
Wherein, in the ETL process, the required data is extracted from the source databases, its format is normalized, and it is stored in the ETL data model, with attention paid to data accuracy, processing performance and business extensibility.
Wherein, when raw data is extracted, abnormal characters are removed or replaced according to the exception tables. After extraction, individual records may still be found to contain incorrect content; after these records are corrected by a back-office management program, they are marked so that they are no longer updated by synchronization, which normalizes the raw data.
Wherein, the ETL process for the source data specifically comprises: extraction, transformation and loading of input invoice data; extraction, transformation and loading of input invoice itemized list data; extraction, transformation and loading of output invoice data; extraction, transformation and loading of output invoice itemized list data; extraction, transformation and loading of tax authority data; extraction, transformation and loading of operator data; extraction, transformation and loading of taxpayer data; and extraction, transformation and loading of input one-number multi-name data.
Wherein, the extraction, transformation and loading of input one-number multi-name data comprises the following steps:
scanning the VAT input invoice table for the current day and excluding records whose seller taxpayer identification number or name is empty, whose seller is a taxpayer from another province, whose input invoice was issued on behalf of the seller, or whose data is incomplete;
comparing, one by one, the seller enterprise name on each input invoice with the taxpayer name in the taxpayer registration table for similarity according to the rules given in the detailed description, and inserting the result into the input one-number multi-name result table.
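The exclusion step can be sketched as a simple predicate. This is a minimal illustration under stated assumptions: the dictionary keys (`seller_tax_id`, `seller_name`, `issued_on_behalf`, `data_complete`) are hypothetical, and deciding "other province" by a tax-number prefix is an assumption of this example, not something the text specifies.

```python
def is_candidate(invoice, home_province_prefix):
    """Return True if an input invoice should take part in the name comparison."""
    tax_id = (invoice.get("seller_tax_id") or "").strip()
    name = (invoice.get("seller_name") or "").strip()
    if not tax_id or not name:
        return False  # seller tax number or name is empty
    if not tax_id.startswith(home_province_prefix):
        return False  # seller treated as out-of-province (assumed prefix rule)
    if invoice.get("issued_on_behalf"):
        return False  # invoice issued on behalf of the seller
    if invoice.get("data_complete") != 1:
        return False  # incomplete data
    return True


# Hypothetical record that passes every check.
example = {
    "seller_tax_id": "110101000000001",
    "seller_name": "Alpha Trading Co.",
    "issued_on_behalf": False,
    "data_complete": 1,
}
print(is_candidate(example, "11"))
```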
Compared with the prior art, the beneficial effect of the present invention is that the data required for one-number multi-name detection can be quickly cleaned and extracted from a huge volume of data and stored in a well-organized way in preparation for subsequent judgment, which improves the efficiency of the system's one-number multi-name detection.
Brief description of the drawings
Fig. 1 is a flow chart of the data cleaning and storage method for VAT one-number multi-name monitoring provided by an embodiment of the present invention.
Detailed description of the embodiments
The above and other technical features and advantages of the present invention are described in more detail below with reference to the accompanying drawing.
Fig. 1 is a flow chart of the data cleaning and storage method for VAT one-number multi-name monitoring provided by an embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps:
Step S1: perform business analysis on the tax data items required for one-number multi-name detection.
In this step, object-oriented analysis (OOA) is used to analyze the required data items and the associations between them, which provides the basis for the database table structure. On the business side, VAT one-number multi-name detection (within a region, finding in all input invoices the cases where one seller tax number has multiple names) is analyzed, and the main operands involved are found to be the following. The first is VAT input and output invoice data, which comes mainly from the reporting and authentication snapshot data provided by the network edition of the anti-counterfeiting tax control system (including reported invoice stub-copy details, reported non-deductible stub-copy details and authenticated invoice deduction-copy details) and from the data collected by the enterprise-side remote reporting and authentication software (the seven elements of input and output invoices, invoice itemized lists and scanned invoice images). The second is tax collection and administration data, which comes mainly from CTAIS and includes tax authority, operator, taxpayer and industry data.
Step S2: design the database table structure according to the analysis result of step S1.
This step designs the database according to the analysis result of step S1 (mainly the data items required for one-number multi-name detection and the relationships between them), while taking full account of factors such as query efficiency and data scale. According to business needs, the data provided by the data sources is modeled using object-oriented design (OOD), producing an ETL database structure model that mainly comprises an input invoice data table, an input invoice itemized list data table, an output invoice table, an output invoice itemized list table, an input one-number multi-name result table, a tax authority data table, an operator data table and a taxpayer data table.
For exception handling: the data in third-party information management systems may be inaccurate. To prevent such inaccurate data from corrupting the analysis results of this system, the ETL process performs exception handling, defines a series of exception tables, and gives certain special objects special handling.
For performance optimization: first, to improve access efficiency, input and output invoice data and itemized list data use the authentication time and the tax declaration time respectively as the partition key, are stored with one partition per month, and have indexes built on the relevant fields. Second, to improve the user experience, the system uses aggregate tables: when reports are designed, the summary data is precomputed and saved into data tables in advance, so that it can be served directly to user queries, which greatly shortens query response time.
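The aggregate-table idea can be illustrated with a small precomputation. This is a sketch under assumptions: the grouping by month and tax authority, the field names and the use of integer amounts are all chosen for the example, since the text does not say which summaries are stored.

```python
from collections import defaultdict


def build_monthly_summary(invoices):
    """Precompute invoice count and total amount per (month, tax authority)."""
    summary = defaultdict(lambda: {"invoice_count": 0, "total_amount": 0})
    for inv in invoices:
        key = (inv["month"], inv["tax_authority"])
        summary[key]["invoice_count"] += 1
        summary[key]["total_amount"] += inv["amount"]
    return dict(summary)


# Hypothetical detail rows; amounts are integers to keep the arithmetic exact.
detail_rows = [
    {"month": "201311", "tax_authority": "A01", "amount": 100},
    {"month": "201311", "tax_authority": "A01", "amount": 200},
    {"month": "201312", "tax_authority": "A01", "amount": 50},
]
monthly_summary = build_monthly_summary(detail_rows)
print(monthly_summary[("201311", "A01")])
```

A report query then reads one precomputed row per key instead of scanning the detail rows, which is the response-time benefit the text describes.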
Step S3: clean and store the source data.
This step cleans out irrelevant, redundant and messy data, processes data with special meanings, and stores the final result according to the database design of step S2. In the ETL process, the required data is extracted from the source databases, its format is normalized, and it is stored in the ETL data model. The data sources are as follows. Input invoice data comes mainly from the network-edition authenticated invoice deduction-copy detail data and the enterprise-side deduction-copy detail data (associated on the invoice code and invoice number fields). Input invoice itemized list data comes mainly from the enterprise-side deduction-copy itemized list data and is associated with the input invoice data. Output invoice data comes mainly from the network-edition reported invoice stub-copy detail data and the enterprise-side stub-copy detail data (associated on the invoice code and invoice number fields). Output invoice itemized list data comes mainly from the enterprise-side stub-copy itemized list data and is associated with the output invoice data. The input one-number multi-name result table can be built from the input invoice data and the registered taxpayer information already present in the ETL model. Tax authority data is extracted mainly from the CTAIS tax authority code table. Operator data is extracted mainly from the CTAIS operator code table. Taxpayer data comes mainly from the taxpayer information registered or recognized in CTAIS, together with extension information (associated on the taxpayer electronic file field).
In the ETL process, attention is paid to data accuracy, processing performance and business extensibility. For data accuracy: in the raw data, Chinese-text fields such as names and abbreviations often contain abnormal characters, such as "." and spaces, because of data entry errors. When extracting raw data, the invention removes or replaces abnormal characters according to the exception tables. After extraction, individual records may still be found to contain incorrect content; after these records are corrected by a back-office management program, they are marked so that they are no longer updated by synchronization, which normalizes the raw data and safeguards the quality of data analysis.
For processing efficiency: for massive data, incremental processing is performed through a temporary-table mechanism, loading only the taxpayer or tax authority information that changed within a given period. To retain the historical process data of the business, a validity flag is set on key data tables to prevent historical data from being overwritten.
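The character cleaning and the "no longer synchronized" mark can be sketched as follows. This is a minimal illustration, not the patented implementation: the exception table contents, the `locked` flag and the in-memory dictionary standing in for a target table are all assumptions of this example.

```python
# Hypothetical exception table: abnormal character -> replacement.
ABNORMAL_CHARS = {".": "", " ": "", "\u3000": ""}


def clean_name(raw):
    """Remove or replace abnormal characters listed in the exception table."""
    for bad, replacement in ABNORMAL_CHARS.items():
        raw = raw.replace(bad, replacement)
    return raw


def sync_record(target, key, raw_name):
    """Load one extracted name unless the stored record was corrected by hand."""
    existing = target.get(key)
    if existing is not None and existing["locked"]:
        return existing  # manually corrected: skip synchronized updates
    target[key] = {"name": clean_name(raw_name), "locked": False}
    return target[key]


def correct_record(target, key, corrected_name):
    """Back-office correction: store the fixed value and lock it against sync."""
    target[key] = {"name": corrected_name, "locked": True}


store = {}
sync_record(store, "T1", "ACME .Ltd ")     # cleaned on first load
correct_record(store, "T1", "Acme Ltd")    # operator fixes the record
sync_record(store, "T1", "ACME .Ltd ")     # later sync leaves the fix alone
print(store["T1"])
```

Removing spaces is appropriate for Chinese-text name fields, which is the case the text describes; the ASCII sample is used only to keep the example readable.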
The specific implementation of the above steps is described in detail below.
Step S1: perform business analysis on the tax data items required for one-number multi-name detection.
1.1 Raw data items:
For the one-number multi-name detection business, the required raw data tables are as follows:
(1) Input invoice seven-element table
(2) Input invoice scanned image table
(3) Input invoice goods detail table
Name              Code   Comment                      Data type      Not null  Primary key
Vendor code       CSDM   Data collection vendor code  VARCHAR2(20)   TRUE      FALSE
Invoice code      FPDM   Invoice code                 CHAR(10)       TRUE      TRUE
Invoice number    FPHM   Invoice number               CHAR(8)        TRUE      TRUE
Item line number  HH     Item line number             NUMBER         TRUE      TRUE
Item name         WP_MC  Item name                    VARCHAR2(100)  TRUE      FALSE
Item model        WP_XH  Item model                   VARCHAR2(40)   FALSE     FALSE
Item unit         WP_DW  Item unit                    VARCHAR2(32)   FALSE     FALSE
Quantity          SL     Quantity                     NUMBER(16,2)   TRUE      FALSE
Unit price        DJ     Unit price                   NUMBER(16,2)   FALSE     FALSE
Amount            JE     Amount                       NUMBER(16,2)   TRUE      FALSE
Tax amount        SE     Tax amount                   NUMBER(16,2)   TRUE      FALSE
Tax rate          WP_SL  Tax rate                     NUMBER(10,6)   TRUE      FALSE
(4) Output invoice seven-element table
(5) Output invoice goods detail table
(6) Authenticated deduction-copy invoice detail table
(7) Reported stub-copy invoice detail table
(8) Reported non-deductible stub-copy detail table
(9) Registered taxpayer information table
(10) Registered taxpayer extension table
(11) Recognized taxpayer qualification history table
(12) Industry detail code table
(13) Taxpayer qualification code table
(14) Tax authority code table
(15) Operator code table
1.2 Data items required for one-number multi-name detection. The fields of each table in 1.1 are analyzed to derive the data items required for one-number multi-name detection, and the storage table structure for these new data items is then designed; the specific method is given in step S2.
Step S2: design the database table structure according to the analysis result of step S1. After the system cleans and filters the raw data, what remains is the data required for one-number multi-name detection. This data must be stored in an organized way to facilitate subsequent judgment. The method organizes the data into the following tables:
(1) VAT input invoice table
(2) VAT input invoice itemized list table
(3) VAT output invoice table
(4) VAT output invoice itemized list table
(5) Tax authority information table
(6) Tax authority information temporary table
(7) Taxpayer information table
(8) Operator table
(9) Operator temporary table
Name                Code     Comment             Data type     Not null  Primary key
Operator code       CZRY_DM  Operator code       CHAR(11)      TRUE      FALSE
Tax authority code  SWJG_DM  Tax authority code  CHAR(11)      TRUE      FALSE
Operator name       CZRY_MC  Operator name       VARCHAR2(60)  TRUE      FALSE
(10) Input one-number multi-name result table
As can be seen, the cleaned data is stored in only the tables listed above, and this data is sufficient for one-number multi-name detection. In addition, because VAT invoice business is inherently organized by month, the data in all database tables is also partitioned by month, so that the relevant data can be located quickly at retrieval time.
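The effect of monthly partitioning can be illustrated in memory. This is only a sketch of the idea, assuming a YYYYMM string as the partition key and hypothetical field names; a real deployment would rely on the database's own partitioning feature rather than application code.

```python
from collections import defaultdict
from datetime import date


def month_key(d):
    """Format a date as a YYYYMM partition key."""
    return d.strftime("%Y%m")


def partition_by_month(rows, date_field):
    """Group rows into one bucket per month of the given date field."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[month_key(row[date_field])].append(row)
    return partitions


# Hypothetical input invoice rows keyed by authentication date.
invoice_rows = [
    {"invoice_number": "00000001", "auth_date": date(2013, 11, 7)},
    {"invoice_number": "00000002", "auth_date": date(2013, 11, 30)},
    {"invoice_number": "00000003", "auth_date": date(2013, 12, 1)},
]
partitions = partition_by_month(invoice_rows, "auth_date")
print(sorted(partitions))
```

A lookup for one month then touches a single bucket, which mirrors how a partitioned table lets a query skip all other months.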
Step S3: clean and store the source data. The ETL process for the source data specifically comprises the following.
Step S31: extraction, transformation and loading of input invoice data:
1) Extract the authenticated deduction-copy invoice details (RZ_FPDKL_MX) from the data imported from the network edition of the anti-counterfeiting tax control system and store them in the VAT input invoice table (ETL_JXFP). The authentication month must be converted to YYYYMM format;
2) Using the invoice code and invoice number, obtain the seller enterprise name (XF_QYMC) and buyer enterprise name (GF_QYMC) from the input invoice seven-element table (DKLMX) imported from the enterprise-side software, and update the seller taxpayer name (XF_NSRMC) and buyer taxpayer name (GF_NSRMC) in the VAT input invoice table (ETL_JXFP) respectively;
3) Obtain the amount completeness (JEWZX) from the input invoice seven-element table (DKLMX): a value of 1 means the data is complete, any other value means it is incomplete. Write the result to the "data complete" (SFSJWZ) field.
The above operations are run daily according to the authentication time.
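The three operations of step S31 can be sketched together as one transformation. This is a hedged illustration: the table and field names XF_QYMC, GF_QYMC, XF_NSRMC, GF_NSRMC, JEWZX and SFSJWZ come from the text, and FPDM and FPHM are borrowed from the goods detail table on the assumption that the other tables use the same column names. The columns `RZ_RQ` (authentication date) and `RZ_YF` (authentication month) and the 1/0 encoding of SFSJWZ are assumptions of this example.

```python
from datetime import date


def build_input_invoice_rows(auth_details, seven_elements):
    """Combine RZ_FPDKL_MX rows with DKLMX rows into ETL_JXFP-style rows."""
    # Index the seven-element rows by (invoice code, invoice number).
    by_invoice = {(r["FPDM"], r["FPHM"]): r for r in seven_elements}
    rows = []
    for detail in auth_details:
        match = by_invoice.get((detail["FPDM"], detail["FPHM"]), {})
        rows.append({
            "FPDM": detail["FPDM"],
            "FPHM": detail["FPHM"],
            "RZ_YF": detail["RZ_RQ"].strftime("%Y%m"),  # hypothetical column names
            "XF_NSRMC": match.get("XF_QYMC"),            # seller name
            "GF_NSRMC": match.get("GF_QYMC"),            # buyer name
            "SFSJWZ": 1 if match.get("JEWZX") == 1 else 0,
        })
    return rows


# Hypothetical sample rows.
auth_details = [
    {"FPDM": "1100133140", "FPHM": "00000001", "RZ_RQ": date(2013, 11, 7)},
    {"FPDM": "1100133140", "FPHM": "00000002", "RZ_RQ": date(2013, 12, 1)},
]
seven_elements = [
    {"FPDM": "1100133140", "FPHM": "00000001",
     "XF_QYMC": "Seller A", "GF_QYMC": "Buyer B", "JEWZX": 1},
]
etl_rows = build_input_invoice_rows(auth_details, seven_elements)
print(etl_rows[0])
```

In this sketch an invoice with no matching seven-element row keeps empty names and is marked incomplete, so a later filter on SFSJWZ would exclude it.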
Step S32: extraction, transformation and loading of input invoice itemized list data:
1) Extract the input invoice goods detail table (DKLMX_QD) from the data imported from the enterprise-side online authentication software;
2) Associate it with the input invoice seven-element table (DKLMX).
The above operations are run daily according to the authentication time.
Step S33: extraction, transformation and loading of output invoice data:
1) Extract the reported invoice stub-copy detail table (CB_FPCGL_MX) from the data imported from the network edition of the anti-counterfeiting tax control system to obtain VAT invoices, and store them in the VAT output invoice table (ETL_XXFP). The tax declaration month must be converted to YYYYMM format;
2) Extract the reported non-deductible stub-copy detail table (CB_FDKFPCGL_MX) from the data imported from the network edition of the anti-counterfeiting tax control system to obtain ordinary invoices, and store them in the VAT output invoice table (ETL_XXFP). The tax declaration month must be converted to YYYYMM format;
3) Using the invoice code and invoice number, obtain the seller enterprise name (XF_QYMC) and buyer enterprise name (GF_QYMC) from the output invoice seven-element table (CGLMX) imported from the enterprise-side software, and update the seller taxpayer name (XF_NSRMC) and buyer taxpayer name (GF_NSRMC) in the VAT output invoice table (ETL_XXFP) respectively;
4) Obtain the amount completeness (JEWZX) from the input invoice seven-element table (DKLMX): a value of 1 means the data is complete, any other value means it is incomplete. Write the result to the "data complete" (SFSJWZ) field.
The above operations are run daily according to the tax declaration time.
Step S34: extraction, transformation and loading of output invoice itemized list data:
1) Extract the output invoice goods detail table (CGLMX_QD) from the data imported from the enterprise-side online authentication software;
2) Associate it with the output invoice seven-element table (CGLMX).
The above operations are run daily according to the tax declaration time.
Step S35: extraction, transformation and loading of tax authority data:
1) Scan the tax authority code table (DM_SWJG), comparing by tax authority code, name, abbreviation and superior tax authority, to find the records that have been updated, and apply exception handling to unreasonable fields or records;
2) Load the records into the tax authority dimension table (DW_DIM_SWJG) level by level;
3) For each tax authority that has been updated, update the original record:
a. set ROW_IS_CURRENT in the original record to N;
b. set ROW_END_DATE in the original record to the current time.
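Closing the original record while keeping it in the table resembles what is commonly called a type 2 slowly changing dimension. The following sketch illustrates that pattern with an in-memory list standing in for DW_DIM_SWJG. ROW_IS_CURRENT, ROW_END_DATE and SWJG_DM come from the text; `SWJG_MC`, `ROW_START_DATE` and the value "Y" for the current row are assumptions of this example.

```python
from datetime import datetime


def apply_update(dimension_rows, updated, now):
    """Close the current row for the same tax authority code and append the new version."""
    for row in dimension_rows:
        if row["SWJG_DM"] == updated["SWJG_DM"] and row["ROW_IS_CURRENT"] == "Y":
            row["ROW_IS_CURRENT"] = "N"   # step a
            row["ROW_END_DATE"] = now     # step b
    dimension_rows.append({
        **updated,
        "ROW_IS_CURRENT": "Y",
        "ROW_START_DATE": now,            # hypothetical column
        "ROW_END_DATE": None,
    })


# Hypothetical dimension content before the update.
dimension = [{
    "SWJG_DM": "A01", "SWJG_MC": "Old name",
    "ROW_IS_CURRENT": "Y",
    "ROW_START_DATE": datetime(2013, 1, 1), "ROW_END_DATE": None,
}]
update_time = datetime(2013, 11, 7, 8, 0, 0)
apply_update(dimension, {"SWJG_DM": "A01", "SWJG_MC": "New name"}, update_time)
print(len(dimension), dimension[0]["ROW_IS_CURRENT"], dimension[1]["SWJG_MC"])
```

Because the old row is flagged rather than overwritten, historical invoices can still be reported against the tax authority attributes that were valid at the time.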
Step S36: extraction, transformation and loading of operator data:
Scan the operator code table (DM_CZRY), comparing by operator code and name, to find the records that have been updated, and store them in the operator temporary table (ETL_CZRY_TMP);
1) According to the temporary table, load the updated records into the operator dimension table;
2) For each operator record that has been updated, update the original record:
a. set ROW_IS_CURRENT in the original record to N;
b. set ROW_END_DATE in the original record to the current time.
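The scan that fills the temporary table can be sketched as a comparison between the source code table and the rows currently loaded. This is an illustrative sketch only: CZRY_DM and CZRY_MC are the field codes given for the operator temporary table, while the function name and the in-memory lists are assumptions of this example.

```python
def find_changed(source_rows, current_rows, key, compare_fields):
    """Return source rows that are new or differ from the current row on any compared field."""
    current = {row[key]: row for row in current_rows}
    changed = []
    for row in source_rows:
        old = current.get(row[key])
        if old is None or any(row[f] != old[f] for f in compare_fields):
            changed.append(row)
    return changed


# Hypothetical contents of DM_CZRY (source) and the rows already loaded.
source = [
    {"CZRY_DM": "00000000001", "CZRY_MC": "Operator A"},
    {"CZRY_DM": "00000000002", "CZRY_MC": "Operator B renamed"},
    {"CZRY_DM": "00000000003", "CZRY_MC": "Operator C"},
]
loaded = [
    {"CZRY_DM": "00000000001", "CZRY_MC": "Operator A"},
    {"CZRY_DM": "00000000002", "CZRY_MC": "Operator B"},
]
temp_table = find_changed(source, loaded, "CZRY_DM", ["CZRY_MC"])
print([row["CZRY_DM"] for row in temp_table])
```

Only the changed and new rows reach the later load step, which is the incremental processing the description attributes to the temporary-table mechanism.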
Step S37: extraction, transformation and loading of taxpayer data:
1) Extract the registered taxpayer information table (DJ_NSRXX) from the data imported from the core collection and administration system. According to the modification date, update the taxpayers that already exist in the taxpayer information table (ETL_NSR) and insert the taxpayers that do not yet exist;
2) Scan the registered taxpayer extension table (DJ_NSRXX_KZ) in the data imported from the core collection and administration system, associate on the taxpayer identification number (NSRSBH), and according to the modification date update the corresponding taxpayer information in the taxpayer information table (ETL_NSR);
3) Scan the recognized taxpayer qualification history table (RD_NSRZG_LSXX) in the data imported from the core collection and administration system, associate on the taxpayer identification number (NSRSBH), and according to the modification date update the corresponding taxpayer information in the taxpayer information table (ETL_NSR).
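The update-or-insert behavior of the first operation can be sketched as follows. This is a minimal illustration under assumptions: NSRSBH is the taxpayer identification number named in the text, while `XG_RQ` (modification date), `NSRMC` (taxpayer name) and the rule "update only when the source row is newer" are assumptions of this example, since the text only says the update is made according to the modification date.

```python
from datetime import date


def upsert_taxpayers(target, source_rows):
    """Insert unknown taxpayers and update known ones when the source row is newer.

    target maps NSRSBH to the stored row and stands in for ETL_NSR.
    Returns (inserted, updated) counts.
    """
    inserted = 0
    updated = 0
    for row in source_rows:
        existing = target.get(row["NSRSBH"])
        if existing is None:
            target[row["NSRSBH"]] = dict(row)
            inserted += 1
        elif row["XG_RQ"] > existing["XG_RQ"]:   # hypothetical modification date column
            existing.update(row)
            updated += 1
    return inserted, updated


# Hypothetical state of ETL_NSR and a batch from DJ_NSRXX.
etl_nsr = {
    "110101000000001": {"NSRSBH": "110101000000001", "NSRMC": "Alpha",
                        "XG_RQ": date(2013, 10, 1)},
}
batch = [
    {"NSRSBH": "110101000000001", "NSRMC": "Alpha Trading", "XG_RQ": date(2013, 11, 1)},
    {"NSRSBH": "110101000000002", "NSRMC": "Beta", "XG_RQ": date(2013, 11, 2)},
]
counts = upsert_taxpayers(etl_nsr, batch)
print(counts)
```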
Step S38: extraction, transformation and loading of input one-number multi-name data:
1) Scan the VAT input invoice table (ETL_JXFP) for the current day and exclude records whose seller taxpayer identification number or name is empty, whose seller is a taxpayer from another province, whose input invoice was issued on behalf of the seller, or whose data is incomplete.
2) Compare, one by one, the seller enterprise name on each input invoice with the taxpayer name in the taxpayer registration table for similarity according to the following rules, and insert the result into the input one-number multi-name result table (ETL_JX_YHDM_JG).
a. Compare the two enterprise names; if they are equal, the similarity is 100%;
b. Remove special characters (such as spaces and brackets) from each of the two enterprise names;
c. Remove the characters that denote a region (such as province, autonomous region, city, county and district) from each of the two enterprise names;
d. Remove the characters that denote the type of enterprise (such as joint-stock limited company, limited liability company, company, factory and group) from each of the two enterprise names;
e. After the keywords have been removed, calculate the similarity of the two enterprise names; if the similarity after removal is 100%, set it to 99%.
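Rules a to e can be sketched as below. The text does not say how the similarity of the stripped names is calculated, so this example substitutes the ratio from Python's standard `difflib.SequenceMatcher` as a stand-in; the keyword lists and the set of special characters are also assumptions chosen for illustration, written in Chinese because the names being compared are Chinese enterprise names.

```python
import re
from difflib import SequenceMatcher

# Hypothetical keyword lists; longer keywords come first so they are removed whole.
SPECIAL_CHARS = re.compile(r"[\s()（）\[\]【】.,，。·]")
REGION_WORDS = ["自治区", "省", "市", "县", "区"]  # autonomous region, province, city, county, district
NATURE_WORDS = ["股份有限公司", "有限责任公司", "有限公司", "公司", "集团", "厂"]  # company types


def strip_keywords(name):
    """Apply rules b, c and d: drop special characters, region words and company-type words."""
    name = SPECIAL_CHARS.sub("", name)
    for word in REGION_WORDS + NATURE_WORDS:
        name = name.replace(word, "")
    return name


def name_similarity(a, b):
    """Return a similarity percentage following rules a to e."""
    if a == b:
        return 100                      # rule a: identical names
    core_a, core_b = strip_keywords(a), strip_keywords(b)
    if core_a == core_b:
        return 99                       # rule e: equal only after stripping
    # Assumed metric: difflib ratio, floored so that it stays below 100.
    return int(SequenceMatcher(None, core_a, core_b).ratio() * 100)


print(name_similarity("北京市阿尔法有限公司", "北京阿尔法公司"))
```

Capping stripped-equal names at 99% keeps them distinguishable from exact matches, so a reviewer can still see that the two names differ in wording even though their core parts agree. Blind substring removal can also delete characters that belong to the trade name itself, which a production rule set would need to guard against.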
In summary, the present invention can quickly clean and extract the data required for one-number multi-name detection from a huge volume of data and store it in a well-organized way in preparation for subsequent judgment, which improves the efficiency of the system's one-number multi-name detection.
The above description is merely illustrative of the present invention and is not restrictive. Those of ordinary skill in the art will understand that many modifications, changes or equivalents may be made without departing from the spirit and scope defined by the claims, and all of them fall within the scope of protection of the present invention.

Claims (11)

1. A data cleaning and storage method for VAT one-number multi-name monitoring, characterized in that it comprises the following steps:
performing business analysis on the tax data items required for one-number multi-name detection;
designing the database table structure according to the business analysis result;
cleaning the data, processing data with special meanings, and storing the final result according to the designed database table structure.
2. a kind of data cleansing storage means for a value added tax several monitoring according to claim 1, it is characterized in that, the step of carrying out business diagnosis for tax data item needed for a several detection is with the incidence relation between the data item needed for object oriented analysis technical Analysis and data item, be that value added tax several detection is analyzed, comprise value added tax and enter sales invoice data and tax collection and administration data.
3. a kind of data cleansing storage means for a value added tax several monitoring according to claim 1, it is characterized in that, the design carrying out database table structure for business diagnosis result carries out modeling process with Object-Oriented Design thought to the data provided from data source, produce ETL database structure model, comprise income invoice tables of data, income invoice listings data table, sales invoice table, sales invoice listings data table, the several result table of No. one, income, tax authority's tables of data, operating personnel's data table and taxpayer's tables of data.
4. a kind of data cleansing storage means for a value added tax several monitoring according to claim 3, it is characterized in that, ETL process need carries out abnormality processing, and defines a series of tables, exception, carries out special processing to some special processing objects.
5. a kind of data cleansing storage means for a value added tax several monitoring according to claim 3, it is characterized in that, enter sales invoice data and listings data respectively with authenticated time with declare dutiable goods the time for subregion key, monthly a subregion carries out partition zone optimizing storage, and sets up index on relevant field.
6. a kind of data cleansing storage means for a value added tax several monitoring according to claim 3, it is characterized in that, design business diagnosis result being carried out to database table structure adopts relevant polymerization table technology, when considering Report Form Design, in advance the combined data calculated is saved in tables of data by pre-service, is directly supplied to user's inquiry.
7. a kind of data cleansing storage means for a value added tax several monitoring according to claim 1, it is characterized in that, carry out data cleansing, and process the data of particular meaning, being undertaken storing by the design of described database structure by net result is wash irrelevant, redundancy, mixed and disorderly data, the data of process particular meaning, store net result by database design.
8. a kind of data cleansing storage means for a value added tax several monitoring according to claim 1, it is characterized in that, for ETL process, need from source database, extract required data, and authority data form, be stored in ETL data model, and pay close attention on the accuracy of data, the performance of process and operation expanding.
9. The data cleaning and storage method for value-added tax one-number multi-name monitoring according to claim 8, characterized in that, in performing data cleaning, processing data with special meaning, and storing the final result according to the database structure design, unusual characters are removed or replaced according to the exception table when raw data is extracted; where the content of individual records is found to be incorrect after extraction, the records are corrected through the back-end management program and then marked so that they are no longer updated by synchronization, thereby normalizing the raw data.
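A minimal sketch of the cleaning step in claim 9 follows. The contents of the exception table, the record layout, and the `locked` flag are all assumptions for illustration: the claim only states that unusual characters are removed or replaced according to an exception table and that manually corrected records are marked so that later synchronization does not overwrite them.

```python
# Hypothetical exception table: unusual character -> replacement ("" means remove).
EXCEPTION_TABLE = {"\u3000": " ", "?": "", "*": ""}

def clean_name(raw: str) -> str:
    """Remove or replace unusual characters, then collapse runs of whitespace."""
    for bad, replacement in EXCEPTION_TABLE.items():
        raw = raw.replace(bad, replacement)
    return " ".join(raw.split())

def sync_record(store: dict, key: str, raw_name: str) -> None:
    """Load a record from the source unless it was manually corrected earlier."""
    existing = store.get(key)
    if existing is not None and existing["locked"]:
        return  # corrected records are no longer updated by synchronization
    store[key] = {"name": clean_name(raw_name), "locked": False}

def correct_record(store: dict, key: str, fixed_name: str) -> None:
    """Back-end correction: store the fixed value and mark it as locked."""
    store[key] = {"name": fixed_name, "locked": True}

store: dict = {}
sync_record(store, "T1", "Acme\u3000 Trading?*")  # cleaned on extraction
correct_record(store, "T1", "Acme Trading Co")    # operator fixes the record
sync_record(store, "T1", "Acme Tradng")           # ignored because T1 is locked
```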
10. The data cleaning and storage method for value-added tax one-number multi-name monitoring according to claim 8, characterized in that, in performing data cleaning, processing data with special meaning, and storing the final result according to the database structure design, the ETL processing of the source data specifically comprises: extraction, transformation, and loading of input invoice data; extraction, transformation, and loading of input invoice detail-list data; extraction, transformation, and loading of output invoice data; extraction, transformation, and loading of output invoice detail-list data; extraction, transformation, and loading of tax authority data; extraction, transformation, and loading of operator data; extraction, transformation, and loading of taxpayer data; and extraction, transformation, and loading of input one-number multi-name data.
11. The data cleaning and storage method for value-added tax one-number multi-name monitoring according to claim 10, characterized in that the extraction, transformation, and loading of input one-number multi-name data comprises the following steps:
Scanning the current day's value-added tax input invoice table and excluding records in which the seller's taxpayer identification number or name is empty, the seller is an out-of-province taxpayer, the input invoice is an agent-issued invoice, or the data is incomplete;
Comparing, one by one, the seller's enterprise name on each input invoice with the taxpayer name in the taxpayer registration table for similarity according to the specified rules, and inserting the results into the input one-number multi-name result table.
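The two steps of claim 11 can be sketched as below. The claim text here does not spell out the similarity rules, so this sketch substitutes the standard-library `difflib.SequenceMatcher` ratio as a stand-in; the record fields (`seller_tax_id`, `seller_name`, `out_of_province`, `agent_issued`, `incomplete`) and the sample data are likewise hypothetical.

```python
from difflib import SequenceMatcher

def is_excluded(inv: dict) -> bool:
    """Step 1: drop records the claim says to exclude (field names are assumptions)."""
    return (
        not inv.get("seller_tax_id")
        or not inv.get("seller_name")
        or inv.get("out_of_province", False)
        or inv.get("agent_issued", False)
        or inv.get("incomplete", False)
    )

def name_similarity(a: str, b: str) -> float:
    # Stand-in metric, not the patent's rule: ratio in [0, 1], 1.0 only for equal strings.
    return SequenceMatcher(None, a, b).ratio()

def find_mismatches(invoices: list, registry: dict) -> list:
    """Step 2: compare each seller name with the registered name for the same tax ID."""
    results = []
    for inv in invoices:
        if is_excluded(inv):
            continue
        registered = registry.get(inv["seller_tax_id"])
        if registered is None or registered == inv["seller_name"]:
            continue  # unknown taxpayer or exact match: nothing to record
        results.append(
            {
                "seller_tax_id": inv["seller_tax_id"],
                "invoice_name": inv["seller_name"],
                "registered_name": registered,
                "similarity": name_similarity(inv["seller_name"], registered),
            }
        )
    return results

# Hypothetical taxpayer registration table and one day's input invoices.
registry = {"T1": "Acme Trading Co", "T2": "Beta Steel Ltd"}
invoices = [
    {"seller_tax_id": "T1", "seller_name": "Acme Trading Co"},
    {"seller_tax_id": "T2", "seller_name": "Beta Steel Limited"},
    {"seller_tax_id": "T2", "seller_name": ""},
    {"seller_tax_id": "T1", "seller_name": "Acme Trade", "out_of_province": True},
]
results = find_mismatches(invoices, registry)
```

Rows returned by `find_mismatches` correspond to what would be inserted into the input one-number multi-name result table; storing the similarity score alongside both names lets reviewers separate near-miss spellings from genuinely different names.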
CN201310547671.5A 2013-11-06 2013-11-06 Data cleaning and storage method for value-added tax one-number multi-name monitoring Active CN104636341B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310547671.5A CN104636341B (en) 2013-11-06 2013-11-06 Data cleaning and storage method for value-added tax one-number multi-name monitoring

Publications (2)

Publication Number Publication Date
CN104636341A true CN104636341A (en) 2015-05-20
CN104636341B CN104636341B (en) 2018-02-27

Family

ID=53215113

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310547671.5A Active CN104636341B (en) 2013-11-06 2013-11-06 Data cleaning and storage method for value-added tax one-number multi-name monitoring

Country Status (1)

Country Link
CN (1) CN104636341B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101136101A (en) * 2007-04-02 2008-03-05 四川亚元防伪科技有限公司 'Amount-checking invoice-control, invoice-checking tax-controlling' 'data greatly tracking' tax controlling method, system constructing and operation method
CN101452450A (en) * 2007-11-30 2009-06-10 上海市电力公司 Multiple source data conversion service method and apparatus thereof
US7716093B2 (en) * 2000-06-14 2010-05-11 Tax Matrix Technologies, Llc Sales tax assessment, remittance and collection system
CN102495885A (en) * 2011-12-08 2012-06-13 中国信息安全测评中心 Method for integrating information safety data based on base-networking engine

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
付荣: ""陕西省煤炭生产企业增值税监控管理系统的设计与实现"", 《中国优秀硕士学位论文全文数据库-信息科技辑》 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107515912A (en) * 2017-08-15 2017-12-26 上海数聚软件系统股份有限公司 A kind of report data acquisition methods based on trade management model
CN109978675A (en) * 2017-12-22 2019-07-05 航天信息股份有限公司 A kind of tax monitoring method and device
CN109978675B (en) * 2017-12-22 2022-06-07 航天信息股份有限公司 Tax monitoring method and device
CN109800220A (en) * 2019-01-29 2019-05-24 浙江国贸云商企业服务有限公司 A kind of big data cleaning method, system and relevant apparatus
CN109800220B (en) * 2019-01-29 2020-12-15 浙江国贸云商企业服务有限公司 Big data cleaning method, system and related device
CN111222766A (en) * 2019-12-29 2020-06-02 航天信息股份有限公司 Method and system for early warning of enterprise false invoicing
CN111192128A (en) * 2019-12-30 2020-05-22 航天信息股份有限公司 Method for identifying abnormal tax payment behaviors
CN111192128B (en) * 2019-12-30 2023-06-02 航天信息股份有限公司 Method for identifying abnormal tax payment behavior

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant