CN110362605A - Electronic account book data verification method based on big data - Google Patents
Electronic account book data verification method based on big data
- Publication number
- CN110362605A (publication number); CN201910481957.5A (application number)
- Authority
- CN
- China
- Prior art keywords
- data
- enterprise
- industry
- information
- big
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/12—Accounting
- G06Q40/125—Finance or payroll
Abstract
The invention discloses an electronic account book data verification method based on big data, in the field of electronic account book technology. It comprises enterprise data import and big-data processing: the imported enterprise data are calculated and analyzed along multiple dimensions to obtain the range of values the enterprise normally imports; then, according to the enterprise's industry, the various data within that industry are organized, categorized, analyzed, and summarized to obtain the industry's common standard values. Because the enterprise's normal value range and the industry standard value range come from big-data processing, reasonableness checks can be performed during data import, preventing out-of-range values from being entered through operator error, reducing the legal risk of human error during customs declaration, lowering the risk of rejected declarations, saving labor cost, and improving the enterprise's operating efficiency.
Description
Technical field
The present invention relates to the field of electronic account book technology, and in particular to an electronic account book data verification method based on big data.
Background art
" E book " is exactly the papery handbook replaced in current processing trade management with " Electronic Account ".Customs is with business circles
It is the electronics bottom account that networking enterprise establishes for unit, implements electronic account book management, networking enterprise only sets up an electronic account book.
Customs should be according to the condition of production of networking enterprise and the supervision of customs it needs to be determined that check and write off the period, according to checking and writing off the period to implementation
The networking enterprise of electronic account book management carries out checking and writing off management.Checking in the middle period system is carried out from September 1st, 2006, enterprise is monthly
Bottom is both needed to the inventory data to customs declaration item number and material.
At present, the system performs only simple business-rule validation on data entered or imported by enterprises, returning the validation result once the check completes. No intelligent verification is applied during execution, so the checking capability is weak, and there is no organizing, categorizing, or analysis of the data. As a result, the rejection rate of declared data is high and the clearance cycle is long, hurting enterprise efficiency. On this basis, the invention designs an electronic account book data verification method based on big data to solve the above problems.
Summary of the invention
The purpose of the present invention is to provide an electronic account book data verification method based on big data, solving the problem that the existing system performs only simple business-rule validation when handling entered or imported enterprise data, with no intelligent verification and no organizing, categorizing, or analysis of the data. Using big-data processing techniques, the method analyzes data from different angles such as user, behavior, conversion, and activity dimensions, and finds valuable data for enterprises in different industries, so as to raise the customs approval pass rate, shorten approval and clearance times, and improve enterprise efficiency.
To achieve the above object, the invention provides the following technical scheme: an electronic account book data verification method based on big data, comprising enterprise data import and big-data processing. The imported enterprise data are calculated and analyzed along multiple dimensions to obtain the range of values the enterprise normally imports; then, according to the enterprise's industry, the various data within the industry are organized, categorized, analyzed, and summarized to obtain the industry's common standard values. The big-data processing comprises the following steps:
Step 1: data warehouse construction

The data warehouse construction covers the data sources of each declaration system, ETL data exchange, data storage and processing, and applications.

The declaration-system data sources are obtained by combing through internal company data sources, external data sources, structured data sources, and unstructured data, and integrating them.

The ETL data exchange uses Kafka to ingest real-time production data online, Kettle to integrate external and offline data, and Filebeat to collect production log files online in real time.

Data storage and processing use the HDFS distributed file system to store data. For massive data, online computation uses HBase SQL or Hive SQL: HBase SQL for operating on result sets, Hive SQL for efficient queries. Intermediate computation results are kept in the column-family store HBase. Offline computation over massive data uses Spark, with results stored in HBase for use by each application system.

The application layer presents calculation results to each business system, or calls Hive SQL through a WebApi for real-time statistical calculation.
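The three ETL channels above (Kafka for online real-time production data, Kettle for external and offline data, Filebeat for production log files) can be modeled as a routing table in the ingestion layer. The sketch below is only a simplified illustration; the origin tags and record fields are assumptions for demonstration, not interfaces defined by the patent.

```python
# Simplified model of the ETL routing described above: each incoming
# record is tagged with its origin and dispatched to the channel the
# text assigns to that origin (Kafka / Kettle / Filebeat).
def route_record(record):
    channels = {
        "realtime_production": "Kafka",  # real-time production data online
        "external": "Kettle",            # external data
        "offline": "Kettle",             # offline data
        "log_file": "Filebeat",          # production log files
    }
    try:
        return channels[record["origin"]]
    except KeyError:
        raise ValueError("unknown origin: %r" % record.get("origin"))

batch = [
    {"origin": "realtime_production", "payload": "..."},
    {"origin": "log_file", "payload": "..."},
]
routes = [route_record(r) for r in batch]
```

A real deployment would replace the returned channel names with producer clients; the dictionary only captures the source-to-channel assignment stated in the text.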
Step 2: in-industry data analysis

Data source description

For in-industry data analysis, the data come from each business system; database types and business data formats all differ, so in an early phase the data of each business system must be integrated and synchronized into the big-data distributed file system HDFS. For example, from the customs declaration system: business scope (materials, finished products), consolidated catalogue (materials, finished products, unit consumption), import/export inventory, account book write-off, log management, and so on; from the in-zone logistics system: filing information (materials, finished products, unit consumption), application forms, warehouse information, simple-processing inventories, inbound/outbound receipt information, and so on.
Analysis scheme

Step one: use Spark Streaming to obtain data from Kafka and perform big-data processing and calculation. Intermediate calculation results are stored in HBase using column families as <K, V> pairs; for example, first-level Key designs include electronics, apparel, food, chemicals, and so on. A secondary classification is then stored, likewise as <K, V>, where the secondary-classification Key is designed from the value of the first-level classification; intermediate results are calculated level by level.
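The two-level <K, V> design described here, a first-level industry key whose value feeds the secondary-classification key, can be sketched with plain dictionaries. This is a minimal model of the key layout only; the metric names and sample values are illustrative assumptions, and a real system would issue HBase Puts instead.

```python
# Two-level <K, V> storage as described: the secondary key is built
# from the first-level industry key's value.
store = {}

def put(industry, sub_class, metric, value):
    # first level: industry; second level: "<industry>:<sub_class>"
    secondary_key = industry + ":" + sub_class
    store.setdefault(industry, {}).setdefault(secondary_key, {})[metric] = value

put("electronics", "connectors", "avg_declared_price", 0.42)
put("electronics", "pcb", "avg_declared_price", 1.87)
put("apparel", "knitwear", "avg_declared_price", 3.10)
```

Keeping the first-level key as a prefix of the secondary key mirrors how HBase row keys are commonly designed so that scans over one industry stay contiguous.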
Step two: using the <K, V> storage layout, pour the data into HDFS and combine it with the Elasticsearch framework for fast search-engine lookups.
Step three: for calculation efficiency, use the JDBC interface provided by Spark SQL to extract and compute data from the intermediate result set (HBase), and finally present the results to the page.
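The final extraction step pulls intermediate results and computes page-ready figures. As a stand-in for the Spark SQL / JDBC path, the sketch below aggregates a plain in-memory result set; the field names and the min/max/average summary are assumptions chosen to match the reference figures (such as declared prices) mentioned later in the text.

```python
from statistics import mean

# Simplified stand-in for step three: pull intermediate results
# (a plain list here instead of HBase via Spark SQL JDBC) and
# compute the figures that would be presented to the page.
intermediate = [
    {"industry": "electronics", "declared_price": 0.40},
    {"industry": "electronics", "declared_price": 0.44},
    {"industry": "apparel", "declared_price": 3.00},
]

def page_summary(rows, industry):
    prices = [r["declared_price"] for r in rows if r["industry"] == industry]
    return {"min": min(prices), "max": max(prices), "avg": round(mean(prices), 4)}

summary = page_summary(intermediate, "electronics")
```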
Step 3: intelligent loss-standard analysis and reminders

Data source description

The data come from internal data, externally purchased data, enterprise ERP production data, and so on; database types and business data formats all differ, so in an early phase the data of each business system must be integrated and synchronized into the big-data distributed file system HDFS. Examples include: unit-consumption data from the customs declaration system, processing-trade manual unit-consumption information, processing-trade account book BOM information, loss-standard information from third-party information networks, loss information provided by third-party data suppliers, and production-loss information from the enterprise's internal ERP.
Analysis scheme:

Step one: use Spark Streaming to obtain offline data from HDFS and perform big-data processing and calculation. Combined with the customs commodity code table, apply complex calculations such as Map, Reduce, and join, and store the results as intermediate results in MySQL. The main table information is compiled from the customs commodity code table, which has more than 10,000 entries in total; as the data volume is small, it is stored directly in MySQL.
Step two: combined with practical business scenarios, use a Spark recommendation algorithm to complete intelligent loss-standard analysis and recommendation.
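The patent names a "Spark recommendation algorithm" without specifying it. As a hedged stand-in, the sketch below recommends the loss (unit-consumption) standard whose rate is closest to the enterprise's reported figure, drawn from the candidate sources gathered in step one; the nearest-value scoring rule, the 0.05 alarm threshold, and the field names are all assumptions.

```python
def recommend_standard(reported_loss, candidates):
    # Pick the candidate loss standard closest to the enterprise's
    # reported unit consumption; flag an alarm if the gap is large.
    best = min(candidates, key=lambda c: abs(c["loss_rate"] - reported_loss))
    gap = abs(best["loss_rate"] - reported_loss)
    return {"standard": best["source"], "loss_rate": best["loss_rate"],
            "alarm": gap > 0.05}

candidates = [
    {"source": "customs_manual", "loss_rate": 0.030},
    {"source": "third_party_network", "loss_rate": 0.025},
    {"source": "erp_history", "loss_rate": 0.032},
]
rec = recommend_standard(0.033, candidates)
```

A production version would score candidates with the actual Spark job over the MySQL intermediate results rather than a single nearest-value rule.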
Preferably, the enterprise data import proceeds as follows:

The enterprise logs into the system and clicks the import button in the corresponding module; an import-data page pops up. The user clicks the browse button on the page, selects the file to upload, and then clicks the upload button. The system uploads the import file to the cloud file server and adds an execution task marked "uploaded successfully, pending" to the file import records below.
The back end executes the validation task through the task scheduler, applying business-rule validation and intelligent data verification to the data. When validation completes, the data are classified as passed, failed, or alarm, and displayed in the shared import-data interface; the user can download the data for confirmation, and for data that passed validation or raised alarms, click the confirm button to import them into the system.
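The back-end classification just described, business-rule validation plus an intelligent range check yielding passed, failed, or alarm, can be sketched as follows. The specific rules (required fields, positive quantity, and combining the enterprise and industry ranges into one envelope) are illustrative assumptions, not the patent's actual rule set.

```python
def check_row(row, normal_range, industry_range):
    # 1) business-rule validation: required fields present, quantity positive
    if not row.get("item_no") or row.get("qty", 0) <= 0:
        return "failed"
    # 2) intelligent check: value inside the envelope of the enterprise's
    #    normal range and the industry standard range, otherwise alarm
    lo = min(normal_range[0], industry_range[0])
    hi = max(normal_range[1], industry_range[1])
    return "passed" if lo <= row["value"] <= hi else "alarm"

rows = [
    {"item_no": "A1", "qty": 10, "value": 5.0},
    {"item_no": "A2", "qty": 3, "value": 99.0},
    {"item_no": "", "qty": 7, "value": 4.0},
]
results = [check_row(r, (1.0, 10.0), (2.0, 12.0)) for r in rows]
```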
Preferably, the data analyzed within the industry have the following characteristics:

(1) The data volume is measured in TB; there are currently about 4,000 client enterprises.

(2) Data storage is scattered: some systems run in SaaS mode, others in C/S mode, and the C/S deployments use sharded databases, giving roughly 400 data sources in total across all systems, which makes the ETL design complex.

(3) The required calculation efficiency is high: when a user imports data in a business system, once the page import completes, the big-data WebApi is called to run the in-industry data analysis, and the industry analysis results are presented on the business-system interface for the user's reference, for example the industry's customs port prices for goods of the same specification and model, the declared units used for the same product name in the industry, or purchasing sources for the same commodity in the industry.
Preferably, the intelligent loss-standard analysis reminders have the following data characteristics:

(1) The data volume is large, measured in TB.

(2) There are many data sources with inconsistent formats: some are formatted records, some are files.

(3) The required calculation efficiency is high: when a user enters or imports data in a business system, the reminder feedback calls the big-data WebApi to run the in-industry data analysis and presents the unit-consumption standards on the business-system interface for the user's reference.
Compared with the prior art, the beneficial effects of the present invention are: through big-data processing algorithms, the invention applies intelligent verification to the data each enterprise imports against industry standards, and organizes, categorizes, and analyzes the data. Selected fields of the imported data are compared with the enterprise normal values and industry standard values obtained from big-data processing; when the difference is large, the record is marked as alarm data and shown in the interface. Because the enterprise's normal value range and the industry standard value range come from big-data processing, reasonableness checks can be performed during data import, preventing out-of-range values from being entered through operator error, reducing the legal risk of human error during declaration, lowering the risk of rejected approvals, saving the enterprise's labor cost, and improving its operating efficiency.
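The "normal value range" that big-data processing derives from an enterprise's historical imports can be illustrated with a simple mean plus or minus k standard deviations rule. The patent does not specify the multi-dimensional calculation, so the statistic and the factor k = 2 are assumptions used only to make the reasonableness check concrete.

```python
from statistics import mean, stdev

def normal_range(history, k=2.0):
    # Derive the enterprise's normal value range from historical imports
    # (assumed rule: mean plus/minus k sample standard deviations).
    m, s = mean(history), stdev(history)
    return (m - k * s, m + k * s)

def judge(value, history):
    # Values inside the derived range are accepted; others raise an alarm.
    lo, hi = normal_range(history)
    return "ok" if lo <= value <= hi else "alarm"

history = [10.0, 10.5, 9.8, 10.2, 10.1]
verdict = judge(10.3, history)
```

An out-of-range entry such as 20.0 against this history would be flagged before the declaration is submitted, which is the error-prevention effect the paragraph above claims.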
Brief description of the drawings
To explain the technical solutions of the embodiments more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is the data warehouse construction architecture diagram of the invention.

Fig. 2 is the design scheme flowchart of the invention.

Fig. 3 is the Spark recommendation algorithm flowchart of the invention.

Fig. 4 is the flowchart of the embodiment of the invention.
Specific embodiment
The technical solutions in the embodiments of the invention are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.
Referring to Figs. 1-4, the invention provides a technical solution: an electronic account book data verification method based on big data, comprising enterprise data import and big-data processing. The imported enterprise data are calculated and analyzed along multiple dimensions to obtain the range of values the enterprise normally imports; then, according to the enterprise's industry, the various data within the industry are organized, categorized, analyzed, and summarized to obtain the industry's common standard values. The big-data processing comprises the following steps:
Step 1: data warehouse construction

The data warehouse construction covers the data sources of each declaration system, ETL data exchange, data storage and processing, and applications.

The declaration-system data sources are obtained by combing through internal company data sources, external data sources, structured data sources, and unstructured data, and integrating them.

The ETL data exchange uses Kafka to ingest real-time production data online, Kettle to integrate external and offline data, and Filebeat to collect production log files online in real time.

Data storage and processing use the HDFS distributed file system to store data. For massive data, online computation uses HBase SQL or Hive SQL: HBase SQL for operating on result sets, Hive SQL for efficient queries. Intermediate computation results are kept in the column-family store HBase. Offline computation over massive data uses Spark, with results stored in HBase for use by each application system.

The application layer presents calculation results to each business system, or calls Hive SQL through a WebApi for real-time statistical calculation.
Step 2: in-industry data analysis

Data source description

For in-industry data analysis, the data come from each business system; database types and business data formats all differ, so in an early phase the data of each business system must be integrated and synchronized into the big-data distributed file system HDFS. For example, from the customs declaration system: business scope (materials, finished products), consolidated catalogue (materials, finished products, unit consumption), import/export inventory, account book write-off, log management, and so on; from the in-zone logistics system: filing information (materials, finished products, unit consumption), application forms, warehouse information, simple-processing inventories, inbound/outbound receipt information, and so on.
Analysis scheme

Step one: use Spark Streaming to obtain data from Kafka and perform big-data processing and calculation. Intermediate calculation results are stored in HBase using column families as <K, V> pairs; for example, first-level Key designs include electronics, apparel, food, chemicals, and so on. A secondary classification is then stored, likewise as <K, V>, where the secondary-classification Key is designed from the value of the first-level classification; intermediate results are calculated level by level.

Step two: using the <K, V> storage layout, pour the data into HDFS and combine it with the Elasticsearch framework for fast search-engine lookups.

Step three: for calculation efficiency, use the JDBC interface provided by Spark SQL to extract and compute data from the intermediate result set (HBase), and finally present the results to the page.
Step 3: intelligent loss-standard analysis and reminders

Data source description

The data come from internal data, externally purchased data, enterprise ERP production data, and so on; database types and business data formats all differ, so in an early phase the data of each business system must be integrated and synchronized into the big-data distributed file system HDFS. Examples include: unit-consumption data from the customs declaration system, processing-trade manual unit-consumption information, processing-trade account book BOM information, loss-standard information from third-party information networks, loss information provided by third-party data suppliers, and production-loss information from the enterprise's internal ERP.
Analysis scheme:

Step one: use Spark Streaming to obtain offline data from HDFS and perform big-data processing and calculation. Combined with the customs commodity code table, apply complex calculations such as Map, Reduce, and join, and store the results as intermediate results in MySQL. The main table information is compiled from the customs commodity code table, which has more than 10,000 entries in total; as the data volume is small, it is stored directly in MySQL.

Step two: combined with practical business scenarios, use a Spark recommendation algorithm to complete intelligent loss-standard analysis and recommendation.
The enterprise data import proceeds as follows:

The enterprise logs into the system and clicks the import button in the corresponding module; an import-data page pops up. The user clicks the browse button on the page, selects the file to upload, and then clicks the upload button. The system uploads the import file to the cloud file server and adds an execution task marked "uploaded successfully, pending" to the file import records below.

The back end executes the validation task through the task scheduler, applying business-rule validation and intelligent data verification to the data. When validation completes, the data are classified as passed, failed, or alarm, and displayed in the shared import-data interface; the user can download the data for confirmation, and for data that passed validation or raised alarms, click the confirm button to import them into the system.
The data analyzed within the industry have the following characteristics:

(1) The data volume is measured in TB; there are currently about 4,000 client enterprises.

(2) Data storage is scattered: some systems run in SaaS mode, others in C/S mode, and the C/S deployments use sharded databases, giving roughly 400 data sources in total across all systems, which makes the ETL design complex.

(3) The required calculation efficiency is high: when a user imports data in a business system, once the page import completes, the big-data WebApi is called to run the in-industry data analysis, and the industry analysis results are presented on the business-system interface for the user's reference, for example the industry's customs port prices for goods of the same specification and model, the declared units used for the same product name in the industry, or purchasing sources for the same commodity in the industry.
The intelligent loss-standard analysis reminders have the following data characteristics:

(1) The data volume is large, measured in TB.

(2) There are many data sources with inconsistent formats: some are formatted records, some are files.

(3) The required calculation efficiency is high: when a user enters or imports data in a business system, the reminder feedback calls the big-data WebApi to run the in-industry data analysis and presents the unit-consumption standards on the business-system interface for the user's reference.
A concrete application of this embodiment: the enterprise first logs into the system and enters the import interface, selects the file to upload, and performs the file upload; the system uploads the import file to the cloud file server, completing the upload. When needed, the file is downloaded from the cloud file server, completing the download. In the background, big-data processing and analysis carry out the data verification; when verification completes, the returned results are classified as passed, failed, or alarm, and displayed in the import interface, where the user can download the data for confirmation. After the simple business-rule validation of the data entered or imported into the electronic account book, intelligent verification, data organizing, categorizing, and analysis based on big data are performed, yielding the reference data needed in the field of customs logistics informatization, such as the customs port prices of goods of the same specification and model. Because the enterprise's normal value range and the industry standard value range come from big-data processing, reasonableness checks can be performed during data import, preventing out-of-range values from being entered through operator error, reducing the legal risk of human error during declaration, lowering the risk of rejected approvals, saving the enterprise's labor cost, and improving its operating efficiency.
In the description of this specification, reference terms such as "one embodiment", "example", and "specific example" mean that particular features, structures, materials, or characteristics described in connection with the embodiment or example are included in at least one embodiment or example of the invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended only to help illustrate the invention. The preferred embodiments neither describe all details nor limit the invention to the specific embodiments described. Obviously, many modifications and variations can be made in light of the content of this specification. These embodiments were chosen and specifically described to better explain the principles and practical applications of the invention, so that those skilled in the art can better understand and use it. The invention is limited only by the claims, their full scope, and equivalents.
Claims (2)
1. An electronic account book data verification method based on big data, comprising enterprise data import and big-data processing, characterized in that: the imported enterprise data are calculated and analyzed along multiple dimensions to obtain the range of values the enterprise normally imports; then, according to the enterprise's industry, the various data within the industry are organized, categorized, analyzed, and summarized to obtain the industry's common standard values; the big-data processing comprises the following steps:
Step 1: data warehouse construction

the data warehouse construction covers the data sources of each declaration system, ETL data exchange, data storage and processing, and applications;

the declaration-system data sources are obtained by combing through internal company data sources, external data sources, structured data sources, and unstructured data, and integrating them;

the ETL data exchange uses Kafka to ingest real-time production data online, Kettle to integrate external and offline data, and Filebeat to collect production log files online in real time;

data storage and processing use the HDFS distributed file system to store data; for massive data, online computation uses HBase SQL or Hive SQL, with HBase SQL for operating on result sets and Hive SQL for efficient queries; intermediate computation results are kept in the column-family store HBase; offline computation over massive data uses Spark, with results stored in HBase for use by each application system;

the application layer presents calculation results to each business system, or calls Hive SQL through a WebApi for real-time statistical calculation;
Step2, data analysis in industry
Data source explanation
Data are analyzed in industry, and data source comes from each operation system, and type of database, business datum format, different,
The Data Integration of each operation system need to be synchronized in big data distributed file storage system HDFS by early period, such as: victory is closed logical
System " business scope (materials and parts, finished product), Merger (materials and parts, finished product, partial loss consumption), inlet and outlet inventory, book check and write off, day
Will management " etc.;" record information (materials and parts, finished product, partial loss consumption), warehouse information, simply adds clearly application form logistics system in area
List, out storage bill information etc. ";
Data characteristics
(1) the data volume is measured in TB, and the existing customers number around 4,000 companies;
(2) data storage is dispersed: some systems use the SaaS mode while other business systems use the C/S mode, and C/S systems are deployed with sharded databases, around 400 in total, so the ETL design work to consolidate every data source is complex;
(3) the required calculation efficiency is high: when a user imports data in a business system, once the page-level data import completes, the big-data WebApi is called to run the in-industry data analysis, and the in-industry analysis results are presented on the business system interface for the user's reference, e.g. the declared prices of goods of the same specification and model within the industry, the declared units for the same product name within the industry, and the purchasing sources of the same commodity within the industry;
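A minimal sketch of the in-industry reference lookups just listed (declared prices, units, purchasing sources), using illustrative record fields and values rather than the patent's actual schema:

```python
# Group declaration records by specification/model and summarize the
# in-industry declared prices, units, and purchasing sources that the
# business system would show the user. All field names and values are
# illustrative assumptions.

from statistics import median

records = [
    {"spec_model": "X100-A", "price": 12.0, "unit": "kg", "source": "Supplier1"},
    {"spec_model": "X100-A", "price": 14.0, "unit": "kg", "source": "Supplier2"},
    {"spec_model": "X100-A", "price": 13.0, "unit": "kg", "source": "Supplier1"},
]

def industry_reference(records, spec_model):
    """Summarize in-industry declarations for one specification/model."""
    hits = [r for r in records if r["spec_model"] == spec_model]
    return {
        "declared_price_median": median(r["price"] for r in hits),
        "declared_units": sorted({r["unit"] for r in hits}),
        "purchase_sources": sorted({r["source"] for r in hits}),
    }

ref = industry_reference(records, "X100-A")
print(ref)
```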
Analysis scheme
In the first step, data are obtained from Kafka using Spark Streaming and big-data processing and calculation are carried out; the intermediate calculation results are stored in HBase as column-family <K, V> pairs. For example, the first-level Key design includes: electronics, clothing, food, chemical industry, etc.; the data are then stored again under a second-level classification, likewise as <K, V>, where the second-level Key is designed from the V of the first-level class; the intermediate results are calculated level by level;
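The two-level <K, V> key design can be illustrated with a plain dict standing in for the HBase column-family store; the categories, key format, and counts below are assumptions made for illustration:

```python
# Two-level <K, V> key design for intermediate results: first-level keys
# are top categories; each second-level key is derived from the
# first-level class (here by prefixing, e.g. "electronics:phone").
# A dict stands in for the HBase column-family store.

store = {}

def put(key, value):
    store[key] = value

# First-level classification: category -> aggregated value.
put("electronics", {"count": 1200})
put("clothing", {"count": 800})

# Second-level classification under "electronics".
put("electronics:phone", {"count": 700})
put("electronics:laptop", {"count": 500})

def second_level(first_key):
    """Fetch all second-level rows under a first-level class by key prefix."""
    prefix = first_key + ":"
    return {k: v for k, v in store.items() if k.startswith(prefix)}

print(second_level("electronics"))
```

In HBase proper, this prefix lookup would be a row-key range scan, which is why composite keys of this shape are a common row-key design.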
In the second step, using the <K, V> storage mode, the data are loaded into HDFS and combined with the Elasticsearch framework, enabling fast lookup by the search engine;
In the third step, for calculation efficiency, the JDBC interface provided by Spark SQL is used to extract and compute data from the intermediate result set (HBase), and the result is finally presented to the page;
Step3, loss standard intellectual analysis are reminded
Data source explanation
Data source comes from internal data, outside purchase data, enterprise's ERP creation data etc., type of database, business datum lattice
Formula, different, the Data Integration of each operation system need to be synchronized in big data distributed file storage system HDFS by early period,
Such as: " partial loss consumption " data, manual for processing trade " partial loss consumption " information, processing trade book " BOM " information of way system are closed in victory,
Loss standard information in third party's Information Network, the loss information that third party's data supplier provides, enterprises ERP production damage
Consume information;
Data characteristics
(1) the data volume is large, measured in TB;
(2) the data sources are numerous and the data formats are not unified: some are structured and some are file-based;
(3) the required calculation efficiency is high: when a user enters and imports data in a business system, reminder feedback is produced by calling the big-data WebApi to run the in-industry data analysis, and the process-loss standard is presented on the business system interface for the user's reference;
Analysis scheme:
In the first step, offline data are obtained from HDFS using Spark Streaming and big-data processing and calculation are carried out; combined with the customs commodity code table, complex operations such as Map, Reduce and join are applied, and the results are stored in MySQL as intermediate results. The main-table information is compiled from the customs commodity code table, slightly more than 10,000 commodities in total; since the data volume is small, it is stored directly in MySQL;
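The first step's join-and-reduce against the customs commodity code table might look like the following plain-Python stand-in for the Map/Reduce/join operations run in Spark; all codes, names, and rates are illustrative:

```python
# Join loss records against a customs commodity code table and reduce to
# per-commodity average loss rates, producing the rows that would be
# stored in MySQL as intermediate results. Data are illustrative.

from collections import defaultdict

commodity_codes = {               # main table compiled from the customs code table
    "8471300000": "Portable computers",
    "6109100021": "Cotton T-shirts",
}

loss_records = [                  # ERP / manual / account-book loss data
    {"hs_code": "8471300000", "loss_rate": 0.020},
    {"hs_code": "8471300000", "loss_rate": 0.030},
    {"hs_code": "6109100021", "loss_rate": 0.050},
]

# map + join + reduce: accumulate loss rates per commodity code
acc = defaultdict(list)
for rec in loss_records:
    if rec["hs_code"] in commodity_codes:      # join on the code table
        acc[rec["hs_code"]].append(rec["loss_rate"])

intermediate = {                               # rows destined for MySQL
    code: {"name": commodity_codes[code],
           "avg_loss_rate": sum(rates) / len(rates)}
    for code, rates in acc.items()
}
print(intermediate["8471300000"]["avg_loss_rate"])
```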
In the second step, combined with practical business-demand application scenarios, intelligent loss-standard analysis and recommendation are completed using the "Spark recommendation algorithm".
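The claim names a "Spark recommendation algorithm" without specifying it; ALS collaborative filtering in Spark MLlib is a common choice for such tasks. Below is a deliberately simplified, non-Spark sketch of the underlying idea: recommending a loss rate for an (enterprise, commodity) pair from enterprises with overlapping commodities. All names and rates are illustrative:

```python
# Crude collaborative-filtering stand-in: to recommend a loss rate for
# (enterprise, hs_code), average the rates reported for hs_code by
# enterprises that share at least one commodity with the target
# enterprise. Not the patent's algorithm; an illustrative sketch only.

rates = {  # (enterprise, hs_code) -> observed loss rate
    ("E1", "A"): 0.02, ("E1", "B"): 0.04,
    ("E2", "A"): 0.02, ("E2", "B"): 0.05, ("E2", "C"): 0.03,
    ("E3", "C"): 0.06,
}

def recommend(enterprise, hs_code):
    """Recommend a loss rate from peer enterprises' observations."""
    mine = {c for (e, c) in rates if e == enterprise}
    peers = {e for (e, c) in rates if c in mine and e != enterprise}
    peer_rates = [v for (e, c), v in rates.items()
                  if e in peers and c == hs_code]
    return sum(peer_rates) / len(peer_rates) if peer_rates else None

print(recommend("E1", "C"))  # only peer E2 reports C -> 0.03
```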
2. The E book data verification method based on big data according to claim 1, characterized in that the specific steps by which the enterprise imports data are as follows:
the enterprise logs into the system and clicks the import button in the corresponding module, and an import-data page pops up; the user clicks the "Browse" button on the page, selects the file to upload, and then clicks the "Upload" button; the system then uploads the import file to the cloud file server and adds an execution task marked "uploaded successfully, pending" to the file import record below;
The back end executes the verification tasks through task scheduling, performing business-logic verification and intelligent data verification on the data; after verification completes, the data are classified into verification success, verification failure, and alarm data, and are shown in the common import-data interface; the user can download the data for confirmation, and for verification-success and alarm data can click the confirm button to import them into the system.
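Claim 2's back-end classification into success, failure, and alarm data could be sketched as follows; the specific rules (required fields, a price-deviation threshold) are assumptions made for illustration, not the patent's actual checks:

```python
# Classify each imported row: business-logic verification (required
# fields, positive quantity) can fail a row outright; an "intelligent"
# plausibility check against the in-industry average price can raise an
# alarm. Only success and alarm rows may be confirmed into the system.

def verify_row(row, industry_avg_price):
    """Classify one imported row as 'success', 'alarm', or 'failure'."""
    # business-logic verification
    if not row.get("hs_code") or row.get("qty", 0) <= 0:
        return "failure"
    # intelligent verification: flag prices far from the industry average
    if abs(row["price"] - industry_avg_price) / industry_avg_price > 0.5:
        return "alarm"
    return "success"

rows = [
    {"hs_code": "8471300000", "qty": 10, "price": 100.0},
    {"hs_code": "8471300000", "qty": 5, "price": 300.0},   # price outlier
    {"hs_code": "", "qty": 3, "price": 90.0},              # missing code
]
results = [verify_row(r, industry_avg_price=110.0) for r in rows]
print(results)  # ['success', 'alarm', 'failure']

# rows the user may confirm and import
importable = [r for r, s in zip(rows, results) if s in ("success", "alarm")]
```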
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310713489.6A CN116775726A (en) | 2019-06-04 | 2019-06-04 | E account book data verification method based on big data |
CN201910481957.5A CN110362605A (en) | 2019-06-04 | 2019-06-04 | A kind of E book data verification method based on big data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910481957.5A CN110362605A (en) | 2019-06-04 | 2019-06-04 | A kind of E book data verification method based on big data |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310713489.6A Division CN116775726A (en) | 2019-06-04 | 2019-06-04 | E account book data verification method based on big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110362605A true CN110362605A (en) | 2019-10-22 |
Family
ID=68215153
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910481957.5A Pending CN110362605A (en) | 2019-06-04 | 2019-06-04 | A kind of E book data verification method based on big data |
CN202310713489.6A Pending CN116775726A (en) | 2019-06-04 | 2019-06-04 | E account book data verification method based on big data |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310713489.6A Pending CN116775726A (en) | 2019-06-04 | 2019-06-04 | E account book data verification method based on big data |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN110362605A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101075304A (en) * | 2006-05-18 | 2007-11-21 | 河北全通通信有限公司 | Method for constructing decision supporting system of telecommunication industry based on database |
CN106919685A (en) * | 2017-03-02 | 2017-07-04 | 浪潮软件集团有限公司 | Mass data file processing method |
CN107103050A (en) * | 2017-03-31 | 2017-08-29 | 海通安恒(大连)大数据科技有限公司 | A kind of big data Modeling Platform and method |
CN107945086A (en) * | 2017-11-17 | 2018-04-20 | 广州葵翼信息科技有限公司 | A kind of big data resource management system applied to smart city |
CN109272155A (en) * | 2018-09-11 | 2019-01-25 | 郑州向心力通信技术股份有限公司 | A kind of corporate behavior analysis system based on big data |
Non-Patent Citations (1)
Title |
---|
兰见春: "基于Spark的犯罪预警分析系统的设计与实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110928232A (en) * | 2019-12-31 | 2020-03-27 | 大连华锐重工焦炉车辆设备有限公司 | Mechanical digital twin control system of coke oven |
CN110928232B (en) * | 2019-12-31 | 2023-11-14 | 大连华锐重工焦炉车辆设备有限公司 | Mechanical digital twin control system of coke oven |
CN116823464A (en) * | 2023-06-06 | 2023-09-29 | 海通期货股份有限公司 | Data asset management platform, electronic device, and computer-readable storage medium |
CN116823464B (en) * | 2023-06-06 | 2024-03-26 | 海通期货股份有限公司 | Data asset management platform, electronic device, and computer-readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN116775726A (en) | 2023-09-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Baruffaldi et al. | Warehouse management system customization and information availability in 3pl companies: A decision-support tool | |
US8340995B2 (en) | Method and system of using artifacts to identify elements of a component business model | |
Choy et al. | A knowledge-based supplier intelligence retrieval system for outsource manufacturing | |
US20210118054A1 (en) | Resource exchange system | |
Al-Sabri et al. | A comparative study and evaluation of ERP reference models in the context of ERP IT-driven implementation: SAP ERP as a case study | |
Peng et al. | Transportation planning for sustainable supply chain network using big data technology | |
Krmac | Intelligent value chain networks: business intelligence and other ICT tools and technologies in supply/demand chains | |
Mohsen | Developments of digital technologies related to supply chain management | |
Ziari et al. | A review on competitive pricing in supply chain management problems: models, classification, and applications | |
CN110362605A (en) | A kind of E book data verification method based on big data | |
US20140129269A1 (en) | Forecasting Business Entity Characteristics Based on Planning Infrastructure | |
CN115375149A (en) | Supply chain strategy determination method, medium, device and computing equipment | |
Wei | [Retracted] A Machine Learning Algorithm for Supplier Credit Risk Assessment Based on Supply Chain Management | |
CN112990886A (en) | Aviation industry data management display system based on mobile phone terminal | |
CN117057686A (en) | Intelligent management method, device, equipment and storage medium for material purchase | |
US8417594B2 (en) | Dimension-based financial reporting using multiple combinations of dimensions | |
Schnellbächer et al. | Jumpstart to Digital Procurement | |
CN116452340A (en) | Investment management method, device and storage medium | |
Hejazi et al. | Robust optimization of sustainable closed-loop supply chain network considering product family | |
US20140149186A1 (en) | Method and system of using artifacts to identify elements of a component business model | |
CN107229996B (en) | Integrated supply chain management platform | |
Dolz Ausina | Evaluation of different AI applications for operational logistic systems | |
CN113127498A (en) | Method and system for efficiently calculating pre-actual cost of shipbuilding product | |
Baruti | Analysis and Implementation of a Business Intelligence QlikView application for logistic and procurement management. Sews Cabind case for the shortage problem. | |
Ivanov et al. | Processes, systems, and models |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | ||
Address after: No. 209 Bamboo Garden Road, Suzhou High-tech Zone, Jiangsu Province, 215000 Applicant after: Suzhou Zhimao Jietong Technology Co.,Ltd. Address before: No. 209 Bamboo Garden Road, Suzhou High-tech Zone, Jiangsu Province, 215000 Applicant before: SUZHOU DIGITAL CHINA JET TECHNOLOGY Co.,Ltd. |
|
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20191022 |