CN110362605A - Electronic account book data verification method based on big data - Google Patents
Electronic account book data verification method based on big data
- Publication number
- CN110362605A (publication number); CN201910481957.5A (application number)
- Authority
- CN
- China
- Prior art keywords
- data
- enterprise
- industry
- information
- big
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/12—Accounting
- G06Q40/125—Finance or payroll
Abstract
The invention discloses an electronic account book data verification method based on big data, in the field of electronic account book technology. It comprises enterprise data import and big-data processing: the imported enterprise data are calculated and analyzed along multiple dimensions to obtain the range of values the enterprise normally imports; then, according to the enterprise's industry, the various data within that industry are organized, categorized, analyzed, and summarized to obtain the industry's common standard values. Because the enterprise's normal value range and the industry standard value range come from big-data processing, reasonableness checks can be performed during data import, preventing out-of-range values from being entered through operator error, reducing the legal risk of human error during customs declaration, lowering the risk of rejected declarations, saving labor cost, and improving the enterprise's operating efficiency.
Description
Technical field
The present invention relates to the field of electronic account book technology, and in particular to an electronic account book data verification method based on big data.
Background art
" E book " is exactly the papery handbook replaced in current processing trade management with " Electronic Account ".Customs is with business circles
It is the electronics bottom account that networking enterprise establishes for unit, implements electronic account book management, networking enterprise only sets up an electronic account book.
Customs should be according to the condition of production of networking enterprise and the supervision of customs it needs to be determined that check and write off the period, according to checking and writing off the period to implementation
The networking enterprise of electronic account book management carries out checking and writing off management.Checking in the middle period system is carried out from September 1st, 2006, enterprise is monthly
Bottom is both needed to the inventory data to customs declaration item number and material.
At present, the system performs only simple business-rule validation on data entered or imported by enterprises, returning the validation result once the check completes. No intelligent verification is applied during execution, so the checking capability is weak, and there is no organizing, categorizing, or analysis of the data. As a result, the rejection rate of declared data is high and the clearance cycle is long, hurting enterprise efficiency. On this basis, the invention designs an electronic account book data verification method based on big data to solve the above problems.
Summary of the invention
The purpose of the present invention is to provide an electronic account book data verification method based on big data, solving the problem that the existing system performs only simple business-rule validation when handling entered or imported enterprise data, with no intelligent verification and no organizing, categorizing, or analysis of the data. Using big-data processing techniques, the method analyzes data from different angles such as user, behavior, conversion, and activity dimensions, and finds valuable data for enterprises in different industries, so as to raise the customs approval pass rate, shorten approval and clearance times, and improve enterprise efficiency.
To achieve the above object, the invention provides the following technical scheme: an electronic account book data verification method based on big data, comprising enterprise data import and big-data processing. The imported enterprise data are calculated and analyzed along multiple dimensions to obtain the range of values the enterprise normally imports; then, according to the enterprise's industry, the various data within the industry are organized, categorized, analyzed, and summarized to obtain the industry's common standard values. The big-data processing comprises the following steps:
Step 1: data warehouse construction

The data warehouse construction covers the data sources of each declaration system, ETL data exchange, data storage and processing, and applications.

The declaration-system data sources are obtained by combing through internal company data sources, external data sources, structured data sources, and unstructured data, and integrating them.

The ETL data exchange uses Kafka to ingest real-time production data online, Kettle to integrate external and offline data, and Filebeat to collect production log files online in real time.

Data storage and processing use the HDFS distributed file system to store data. For massive data, online computation uses HBase SQL or Hive SQL: HBase SQL for operating on result sets, Hive SQL for efficient queries. Intermediate computation results are kept in the column-family store HBase. Offline computation over massive data uses Spark, with results stored in HBase for use by each application system.

The application layer presents calculation results to each business system, or calls Hive SQL through a WebApi for real-time statistical calculation.
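The three ETL channels above (Kafka for online real-time production data, Kettle for external and offline data, Filebeat for production log files) can be modeled as a routing table in the ingestion layer. The sketch below is only a simplified illustration; the origin tags and record fields are assumptions for demonstration, not interfaces defined by the patent.

```python
# Simplified model of the ETL routing described above: each incoming
# record is tagged with its origin and dispatched to the channel the
# text assigns to that origin (Kafka / Kettle / Filebeat).
def route_record(record):
    channels = {
        "realtime_production": "Kafka",  # real-time production data online
        "external": "Kettle",            # external data
        "offline": "Kettle",             # offline data
        "log_file": "Filebeat",          # production log files
    }
    try:
        return channels[record["origin"]]
    except KeyError:
        raise ValueError("unknown origin: %r" % record.get("origin"))

batch = [
    {"origin": "realtime_production", "payload": "..."},
    {"origin": "log_file", "payload": "..."},
]
routes = [route_record(r) for r in batch]
```

A real deployment would replace the returned channel names with producer clients; the dictionary only captures the source-to-channel assignment stated in the text.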
Step 2: in-industry data analysis

Data source description

For in-industry data analysis, the data come from each business system; database types and business data formats all differ, so in an early phase the data of each business system must be integrated and synchronized into the big-data distributed file system HDFS. For example, from the customs declaration system: business scope (materials, finished products), consolidated catalogue (materials, finished products, unit consumption), import/export inventory, account book write-off, log management, and so on; from the in-zone logistics system: filing information (materials, finished products, unit consumption), application forms, warehouse information, simple-processing inventories, inbound/outbound receipt information, and so on.
Analysis scheme

Step one: use Spark Streaming to obtain data from Kafka and perform big-data processing and calculation. Intermediate calculation results are stored in HBase using column families as <K, V> pairs; for example, first-level Key designs include electronics, apparel, food, chemicals, and so on. A secondary classification is then stored, likewise as <K, V>, where the secondary-classification Key is designed from the value of the first-level classification; intermediate results are calculated level by level.
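The two-level <K, V> design described here, a first-level industry key whose value feeds the secondary-classification key, can be sketched with plain dictionaries. This is a minimal model of the key layout only; the metric names and sample values are illustrative assumptions, and a real system would issue HBase Puts instead.

```python
# Two-level <K, V> storage as described: the secondary key is built
# from the first-level industry key's value.
store = {}

def put(industry, sub_class, metric, value):
    # first level: industry; second level: "<industry>:<sub_class>"
    secondary_key = industry + ":" + sub_class
    store.setdefault(industry, {}).setdefault(secondary_key, {})[metric] = value

put("electronics", "connectors", "avg_declared_price", 0.42)
put("electronics", "pcb", "avg_declared_price", 1.87)
put("apparel", "knitwear", "avg_declared_price", 3.10)
```

Keeping the first-level key as a prefix of the secondary key mirrors how HBase row keys are commonly designed so that scans over one industry stay contiguous.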
Step two: using the <K, V> storage layout, pour the data into HDFS and combine it with the Elasticsearch framework for fast search-engine lookups.
Step three: for calculation efficiency, use the JDBC interface provided by Spark SQL to extract and compute data from the intermediate result set (HBase), and finally present the results to the page.
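The final extraction step pulls intermediate results and computes page-ready figures. As a stand-in for the Spark SQL / JDBC path, the sketch below aggregates a plain in-memory result set; the field names and the min/max/average summary are assumptions chosen to match the reference figures (such as declared prices) mentioned later in the text.

```python
from statistics import mean

# Simplified stand-in for step three: pull intermediate results
# (a plain list here instead of HBase via Spark SQL JDBC) and
# compute the figures that would be presented to the page.
intermediate = [
    {"industry": "electronics", "declared_price": 0.40},
    {"industry": "electronics", "declared_price": 0.44},
    {"industry": "apparel", "declared_price": 3.00},
]

def page_summary(rows, industry):
    prices = [r["declared_price"] for r in rows if r["industry"] == industry]
    return {"min": min(prices), "max": max(prices), "avg": round(mean(prices), 4)}

summary = page_summary(intermediate, "electronics")
```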
Step 3: intelligent loss-standard analysis and reminders

Data source description

The data come from internal data, externally purchased data, enterprise ERP production data, and so on; database types and business data formats all differ, so in an early phase the data of each business system must be integrated and synchronized into the big-data distributed file system HDFS. Examples include: unit-consumption data from the customs declaration system, processing-trade manual unit-consumption information, processing-trade account book BOM information, loss-standard information from third-party information networks, loss information provided by third-party data suppliers, and production-loss information from the enterprise's internal ERP.
Analysis scheme:

Step one: use Spark Streaming to obtain offline data from HDFS and perform big-data processing and calculation. Combined with the customs commodity code table, apply complex calculations such as Map, Reduce, and join, and store the results as intermediate results in MySQL. The main table information is compiled from the customs commodity code table, which has more than 10,000 entries in total; as the data volume is small, it is stored directly in MySQL.
Step two: combined with practical business scenarios, use a Spark recommendation algorithm to complete intelligent loss-standard analysis and recommendation.
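The patent names a "Spark recommendation algorithm" without specifying it. As a hedged stand-in, the sketch below recommends the loss (unit-consumption) standard whose rate is closest to the enterprise's reported figure, drawn from the candidate sources gathered in step one; the nearest-value scoring rule, the 0.05 alarm threshold, and the field names are all assumptions.

```python
def recommend_standard(reported_loss, candidates):
    # Pick the candidate loss standard closest to the enterprise's
    # reported unit consumption; flag an alarm if the gap is large.
    best = min(candidates, key=lambda c: abs(c["loss_rate"] - reported_loss))
    gap = abs(best["loss_rate"] - reported_loss)
    return {"standard": best["source"], "loss_rate": best["loss_rate"],
            "alarm": gap > 0.05}

candidates = [
    {"source": "customs_manual", "loss_rate": 0.030},
    {"source": "third_party_network", "loss_rate": 0.025},
    {"source": "erp_history", "loss_rate": 0.032},
]
rec = recommend_standard(0.033, candidates)
```

A production version would score candidates with the actual Spark job over the MySQL intermediate results rather than a single nearest-value rule.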
Preferably, the enterprise data import proceeds as follows:

The enterprise logs into the system and clicks the import button in the corresponding module; an import-data page pops up. The user clicks the browse button on the page, selects the file to upload, and then clicks the upload button. The system uploads the import file to the cloud file server and adds an execution task marked "uploaded successfully, pending" to the file import records below.
The back end executes the validation task through the task scheduler, applying business-rule validation and intelligent data verification to the data. When validation completes, the data are classified as passed, failed, or alarm, and displayed in the shared import-data interface; the user can download the data for confirmation, and for data that passed validation or raised alarms, click the confirm button to import them into the system.
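The back-end classification just described, business-rule validation plus an intelligent range check yielding passed, failed, or alarm, can be sketched as follows. The specific rules (required fields, positive quantity, and combining the enterprise and industry ranges into one envelope) are illustrative assumptions, not the patent's actual rule set.

```python
def check_row(row, normal_range, industry_range):
    # 1) business-rule validation: required fields present, quantity positive
    if not row.get("item_no") or row.get("qty", 0) <= 0:
        return "failed"
    # 2) intelligent check: value inside the envelope of the enterprise's
    #    normal range and the industry standard range, otherwise alarm
    lo = min(normal_range[0], industry_range[0])
    hi = max(normal_range[1], industry_range[1])
    return "passed" if lo <= row["value"] <= hi else "alarm"

rows = [
    {"item_no": "A1", "qty": 10, "value": 5.0},
    {"item_no": "A2", "qty": 3, "value": 99.0},
    {"item_no": "", "qty": 7, "value": 4.0},
]
results = [check_row(r, (1.0, 10.0), (2.0, 12.0)) for r in rows]
```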
Preferably, the data analyzed within the industry have the following characteristics:

(1) The data volume is measured in TB; there are currently about 4,000 client enterprises.

(2) Data storage is scattered: some systems run in SaaS mode, others in C/S mode, and the C/S deployments use sharded databases, giving roughly 400 data sources in total across all systems, which makes the ETL design complex.

(3) The required calculation efficiency is high: when a user imports data in a business system, once the page import completes, the big-data WebApi is called to run the in-industry data analysis, and the industry analysis results are presented on the business-system interface for the user's reference, for example the industry's customs port prices for goods of the same specification and model, the declared units used for the same product name in the industry, or purchasing sources for the same commodity in the industry.
Preferably, the intelligent loss-standard analysis reminders have the following data characteristics:

(1) The data volume is large, measured in TB.

(2) There are many data sources with inconsistent formats: some are formatted records, some are files.

(3) The required calculation efficiency is high: when a user enters or imports data in a business system, the reminder feedback calls the big-data WebApi to run the in-industry data analysis and presents the unit-consumption standards on the business-system interface for the user's reference.
Compared with the prior art, the beneficial effects of the present invention are: through big-data processing algorithms, the invention applies intelligent verification to the data each enterprise imports against industry standards, and organizes, categorizes, and analyzes the data. Selected fields of the imported data are compared with the enterprise normal values and industry standard values obtained from big-data processing; when the difference is large, the record is marked as alarm data and shown in the interface. Because the enterprise's normal value range and the industry standard value range come from big-data processing, reasonableness checks can be performed during data import, preventing out-of-range values from being entered through operator error, reducing the legal risk of human error during declaration, lowering the risk of rejected approvals, saving the enterprise's labor cost, and improving its operating efficiency.
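The "normal value range" that big-data processing derives from an enterprise's historical imports can be illustrated with a simple mean plus or minus k standard deviations rule. The patent does not specify the multi-dimensional calculation, so the statistic and the factor k = 2 are assumptions used only to make the reasonableness check concrete.

```python
from statistics import mean, stdev

def normal_range(history, k=2.0):
    # Derive the enterprise's normal value range from historical imports
    # (assumed rule: mean plus/minus k sample standard deviations).
    m, s = mean(history), stdev(history)
    return (m - k * s, m + k * s)

def judge(value, history):
    # Values inside the derived range are accepted; others raise an alarm.
    lo, hi = normal_range(history)
    return "ok" if lo <= value <= hi else "alarm"

history = [10.0, 10.5, 9.8, 10.2, 10.1]
verdict = judge(10.3, history)
```

An out-of-range entry such as 20.0 against this history would be flagged before the declaration is submitted, which is the error-prevention effect the paragraph above claims.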
Brief description of the drawings
To explain the technical solutions of the embodiments more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is the data warehouse construction architecture diagram of the invention.

Fig. 2 is the design scheme flowchart of the invention.

Fig. 3 is the Spark recommendation algorithm flowchart of the invention.

Fig. 4 is the flowchart of the embodiment of the invention.
Specific embodiment
The technical solutions in the embodiments of the invention are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.
Referring to Figs. 1-4, the invention provides a technical solution: an electronic account book data verification method based on big data, comprising enterprise data import and big-data processing. The imported enterprise data are calculated and analyzed along multiple dimensions to obtain the range of values the enterprise normally imports; then, according to the enterprise's industry, the various data within the industry are organized, categorized, analyzed, and summarized to obtain the industry's common standard values. The big-data processing comprises the following steps:
Step 1: data warehouse construction

The data warehouse construction covers the data sources of each declaration system, ETL data exchange, data storage and processing, and applications.

The declaration-system data sources are obtained by combing through internal company data sources, external data sources, structured data sources, and unstructured data, and integrating them.

The ETL data exchange uses Kafka to ingest real-time production data online, Kettle to integrate external and offline data, and Filebeat to collect production log files online in real time.

Data storage and processing use the HDFS distributed file system to store data. For massive data, online computation uses HBase SQL or Hive SQL: HBase SQL for operating on result sets, Hive SQL for efficient queries. Intermediate computation results are kept in the column-family store HBase. Offline computation over massive data uses Spark, with results stored in HBase for use by each application system.

The application layer presents calculation results to each business system, or calls Hive SQL through a WebApi for real-time statistical calculation.
Step 2: in-industry data analysis

Data source description

For in-industry data analysis, the data come from each business system; database types and business data formats all differ, so in an early phase the data of each business system must be integrated and synchronized into the big-data distributed file system HDFS. For example, from the customs declaration system: business scope (materials, finished products), consolidated catalogue (materials, finished products, unit consumption), import/export inventory, account book write-off, log management, and so on; from the in-zone logistics system: filing information (materials, finished products, unit consumption), application forms, warehouse information, simple-processing inventories, inbound/outbound receipt information, and so on.
Analysis scheme

Step one: use Spark Streaming to obtain data from Kafka and perform big-data processing and calculation. Intermediate calculation results are stored in HBase using column families as <K, V> pairs; for example, first-level Key designs include electronics, apparel, food, chemicals, and so on. A secondary classification is then stored, likewise as <K, V>, where the secondary-classification Key is designed from the value of the first-level classification; intermediate results are calculated level by level.

Step two: using the <K, V> storage layout, pour the data into HDFS and combine it with the Elasticsearch framework for fast search-engine lookups.

Step three: for calculation efficiency, use the JDBC interface provided by Spark SQL to extract and compute data from the intermediate result set (HBase), and finally present the results to the page.
Step 3: intelligent loss-standard analysis and reminders

Data source description

The data come from internal data, externally purchased data, enterprise ERP production data, and so on; database types and business data formats all differ, so in an early phase the data of each business system must be integrated and synchronized into the big-data distributed file system HDFS. Examples include: unit-consumption data from the customs declaration system, processing-trade manual unit-consumption information, processing-trade account book BOM information, loss-standard information from third-party information networks, loss information provided by third-party data suppliers, and production-loss information from the enterprise's internal ERP.
Analysis scheme:

Step one: use Spark Streaming to obtain offline data from HDFS and perform big-data processing and calculation. Combined with the customs commodity code table, apply complex calculations such as Map, Reduce, and join, and store the results as intermediate results in MySQL. The main table information is compiled from the customs commodity code table, which has more than 10,000 entries in total; as the data volume is small, it is stored directly in MySQL.

Step two: combined with practical business scenarios, use a Spark recommendation algorithm to complete intelligent loss-standard analysis and recommendation.
The enterprise data import proceeds as follows:

The enterprise logs into the system and clicks the import button in the corresponding module; an import-data page pops up. The user clicks the browse button on the page, selects the file to upload, and then clicks the upload button. The system uploads the import file to the cloud file server and adds an execution task marked "uploaded successfully, pending" to the file import records below.

The back end executes the validation task through the task scheduler, applying business-rule validation and intelligent data verification to the data. When validation completes, the data are classified as passed, failed, or alarm, and displayed in the shared import-data interface; the user can download the data for confirmation, and for data that passed validation or raised alarms, click the confirm button to import them into the system.
The data analyzed within the industry have the following characteristics:

(1) The data volume is measured in TB; there are currently about 4,000 client enterprises.

(2) Data storage is scattered: some systems run in SaaS mode, others in C/S mode, and the C/S deployments use sharded databases, giving roughly 400 data sources in total across all systems, which makes the ETL design complex.

(3) The required calculation efficiency is high: when a user imports data in a business system, once the page import completes, the big-data WebApi is called to run the in-industry data analysis, and the industry analysis results are presented on the business-system interface for the user's reference, for example the industry's customs port prices for goods of the same specification and model, the declared units used for the same product name in the industry, or purchasing sources for the same commodity in the industry.
The intelligent loss-standard analysis reminders have the following data characteristics:

(1) The data volume is large, measured in TB.

(2) There are many data sources with inconsistent formats: some are formatted records, some are files.

(3) The required calculation efficiency is high: when a user enters or imports data in a business system, the reminder feedback calls the big-data WebApi to run the in-industry data analysis and presents the unit-consumption standards on the business-system interface for the user's reference.
A concrete application of this embodiment: the enterprise first logs into the system and enters the import interface, selects the file to upload, and performs the file upload; the system uploads the import file to the cloud file server, completing the upload. When needed, the file is downloaded from the cloud file server, completing the download. In the background, big-data processing and analysis carry out the data verification; when verification completes, the returned results are classified as passed, failed, or alarm, and displayed in the import interface, where the user can download the data for confirmation. After the simple business-rule validation of the data entered or imported into the electronic account book, intelligent verification, data organizing, categorizing, and analysis based on big data are performed, yielding the reference data needed in the field of customs logistics informatization, such as the customs port prices of goods of the same specification and model. Because the enterprise's normal value range and the industry standard value range come from big-data processing, reasonableness checks can be performed during data import, preventing out-of-range values from being entered through operator error, reducing the legal risk of human error during declaration, lowering the risk of rejected approvals, saving the enterprise's labor cost, and improving its operating efficiency.
In the description of this specification, reference terms such as "one embodiment", "example", and "specific example" mean that particular features, structures, materials, or characteristics described in connection with the embodiment or example are included in at least one embodiment or example of the invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended only to help illustrate the invention. The preferred embodiments neither describe all details nor limit the invention to the specific embodiments described. Obviously, many modifications and variations can be made in light of the content of this specification. These embodiments were chosen and specifically described to better explain the principles and practical applications of the invention, so that those skilled in the art can better understand and use it. The invention is limited only by the claims, their full scope, and equivalents.
Claims (2)
1. An electronic account book data verification method based on big data, comprising enterprise data import and big-data processing, characterized in that: the imported enterprise data are calculated and analyzed along multiple dimensions to obtain the range of values the enterprise normally imports; then, according to the enterprise's industry, the various data within the industry are organized, categorized, analyzed, and summarized to obtain the industry's common standard values; the big-data processing comprises the following steps:
Step 1: data warehouse construction

the data warehouse construction covers the data sources of each declaration system, ETL data exchange, data storage and processing, and applications;

the declaration-system data sources are obtained by combing through internal company data sources, external data sources, structured data sources, and unstructured data, and integrating them;

the ETL data exchange uses Kafka to ingest real-time production data online, Kettle to integrate external and offline data, and Filebeat to collect production log files online in real time;

data storage and processing use the HDFS distributed file system to store data; for massive data, online computation uses HBase SQL or Hive SQL, with HBase SQL for operating on result sets and Hive SQL for efficient queries; intermediate computation results are kept in the column-family store HBase; offline computation over massive data uses Spark, with results stored in HBase for use by each application system;

the application layer presents calculation results to each business system, or calls Hive SQL through a WebApi for real-time statistical calculation;
Step2, data analysis in industry
Data source explanation
Data are analyzed in industry, and data source comes from each operation system, and type of database, business datum format, different,
The Data Integration of each operation system need to be synchronized in big data distributed file storage system HDFS by early period, such as: victory is closed logical
System " business scope (materials and parts, finished product), Merger (materials and parts, finished product, partial loss consumption), inlet and outlet inventory, book check and write off, day
Will management " etc.;" record information (materials and parts, finished product, partial loss consumption), warehouse information, simply adds clearly application form logistics system in area
List, out storage bill information etc. ";
Data characteristics
(1) the data volume is measured in TB, and the existing customers number around 4,000 companies;
(2) data storage is dispersed: some systems use the SaaS mode while other business systems use the C/S mode, and C/S systems are deployed with sharded databases, around 400 in total, so the ETL design work to consolidate every data source is complex;
(3) the required calculation efficiency is high: when a user imports data in a business system, once the page-level data import completes, the big-data WebApi is called to run the in-industry data analysis, and the in-industry analysis results are presented on the business system interface for the user's reference, e.g. the declared prices of goods of the same specification and model within the industry, the declared units for the same product name within the industry, and the purchasing sources of the same commodity within the industry;
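A minimal sketch of the in-industry reference lookups just listed (declared prices, units, purchasing sources), using illustrative record fields and values rather than the patent's actual schema:

```python
# Group declaration records by specification/model and summarize the
# in-industry declared prices, units, and purchasing sources that the
# business system would show the user. All field names and values are
# illustrative assumptions.

from statistics import median

records = [
    {"spec_model": "X100-A", "price": 12.0, "unit": "kg", "source": "Supplier1"},
    {"spec_model": "X100-A", "price": 14.0, "unit": "kg", "source": "Supplier2"},
    {"spec_model": "X100-A", "price": 13.0, "unit": "kg", "source": "Supplier1"},
]

def industry_reference(records, spec_model):
    """Summarize in-industry declarations for one specification/model."""
    hits = [r for r in records if r["spec_model"] == spec_model]
    return {
        "declared_price_median": median(r["price"] for r in hits),
        "declared_units": sorted({r["unit"] for r in hits}),
        "purchase_sources": sorted({r["source"] for r in hits}),
    }

ref = industry_reference(records, "X100-A")
print(ref)
```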
Analysis scheme
In the first step, data are obtained from Kafka using Spark Streaming and big-data processing and calculation are carried out; the intermediate calculation results are stored in HBase as column-family <K, V> pairs. For example, the first-level Key design includes: electronics, clothing, food, chemical industry, etc.; the data are then stored again under a second-level classification, likewise as <K, V>, where the second-level Key is designed from the V of the first-level class; the intermediate results are calculated level by level;
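The two-level <K, V> key design can be illustrated with a plain dict standing in for the HBase column-family store; the categories, key format, and counts below are assumptions made for illustration:

```python
# Two-level <K, V> key design for intermediate results: first-level keys
# are top categories; each second-level key is derived from the
# first-level class (here by prefixing, e.g. "electronics:phone").
# A dict stands in for the HBase column-family store.

store = {}

def put(key, value):
    store[key] = value

# First-level classification: category -> aggregated value.
put("electronics", {"count": 1200})
put("clothing", {"count": 800})

# Second-level classification under "electronics".
put("electronics:phone", {"count": 700})
put("electronics:laptop", {"count": 500})

def second_level(first_key):
    """Fetch all second-level rows under a first-level class by key prefix."""
    prefix = first_key + ":"
    return {k: v for k, v in store.items() if k.startswith(prefix)}

print(second_level("electronics"))
```

In HBase proper, this prefix lookup would be a row-key range scan, which is why composite keys of this shape are a common row-key design.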
In the second step, using the <K, V> storage mode, the data are loaded into HDFS and combined with the Elasticsearch framework, enabling fast lookup by the search engine;
In the third step, for calculation efficiency, the JDBC interface provided by Spark SQL is used to extract and compute data from the intermediate result set (HBase), and the result is finally presented to the page;
Step3, loss standard intellectual analysis are reminded
Data source explanation
Data source comes from internal data, outside purchase data, enterprise's ERP creation data etc., type of database, business datum lattice
Formula, different, the Data Integration of each operation system need to be synchronized in big data distributed file storage system HDFS by early period,
Such as: " partial loss consumption " data, manual for processing trade " partial loss consumption " information, processing trade book " BOM " information of way system are closed in victory,
Loss standard information in third party's Information Network, the loss information that third party's data supplier provides, enterprises ERP production damage
Consume information;
Data characteristics
(1) the data volume is large, measured in TB;
(2) the data sources are numerous and the data formats are not unified: some are structured and some are file-based;
(3) the required calculation efficiency is high: when a user enters and imports data in a business system, reminder feedback is produced by calling the big-data WebApi to run the in-industry data analysis, and the process-loss standard is presented on the business system interface for the user's reference;
Analysis scheme:
In the first step, offline data are obtained from HDFS using Spark Streaming and big-data processing and calculation are carried out; combined with the customs commodity code table, complex operations such as Map, Reduce and join are applied, and the results are stored in MySQL as intermediate results. The main-table information is compiled from the customs commodity code table, slightly more than 10,000 commodities in total; since the data volume is small, it is stored directly in MySQL;
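The first step's join-and-reduce against the customs commodity code table might look like the following plain-Python stand-in for the Map/Reduce/join operations run in Spark; all codes, names, and rates are illustrative:

```python
# Join loss records against a customs commodity code table and reduce to
# per-commodity average loss rates, producing the rows that would be
# stored in MySQL as intermediate results. Data are illustrative.

from collections import defaultdict

commodity_codes = {               # main table compiled from the customs code table
    "8471300000": "Portable computers",
    "6109100021": "Cotton T-shirts",
}

loss_records = [                  # ERP / manual / account-book loss data
    {"hs_code": "8471300000", "loss_rate": 0.020},
    {"hs_code": "8471300000", "loss_rate": 0.030},
    {"hs_code": "6109100021", "loss_rate": 0.050},
]

# map + join + reduce: accumulate loss rates per commodity code
acc = defaultdict(list)
for rec in loss_records:
    if rec["hs_code"] in commodity_codes:      # join on the code table
        acc[rec["hs_code"]].append(rec["loss_rate"])

intermediate = {                               # rows destined for MySQL
    code: {"name": commodity_codes[code],
           "avg_loss_rate": sum(rates) / len(rates)}
    for code, rates in acc.items()
}
print(intermediate["8471300000"]["avg_loss_rate"])
```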
In the second step, combined with practical business-demand application scenarios, intelligent loss-standard analysis and recommendation are completed using the "Spark recommendation algorithm".
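The claim names a "Spark recommendation algorithm" without specifying it; ALS collaborative filtering in Spark MLlib is a common choice for such tasks. Below is a deliberately simplified, non-Spark sketch of the underlying idea: recommending a loss rate for an (enterprise, commodity) pair from enterprises with overlapping commodities. All names and rates are illustrative:

```python
# Crude collaborative-filtering stand-in: to recommend a loss rate for
# (enterprise, hs_code), average the rates reported for hs_code by
# enterprises that share at least one commodity with the target
# enterprise. Not the patent's algorithm; an illustrative sketch only.

rates = {  # (enterprise, hs_code) -> observed loss rate
    ("E1", "A"): 0.02, ("E1", "B"): 0.04,
    ("E2", "A"): 0.02, ("E2", "B"): 0.05, ("E2", "C"): 0.03,
    ("E3", "C"): 0.06,
}

def recommend(enterprise, hs_code):
    """Recommend a loss rate from peer enterprises' observations."""
    mine = {c for (e, c) in rates if e == enterprise}
    peers = {e for (e, c) in rates if c in mine and e != enterprise}
    peer_rates = [v for (e, c), v in rates.items()
                  if e in peers and c == hs_code]
    return sum(peer_rates) / len(peer_rates) if peer_rates else None

print(recommend("E1", "C"))  # only peer E2 reports C -> 0.03
```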
2. The E book data verification method based on big data according to claim 1, characterized in that the specific steps by which the enterprise imports data are as follows:
the enterprise logs into the system and clicks the import button in the corresponding module, and an import-data page pops up; the user clicks the "Browse" button on the page, selects the file to upload, and then clicks the "Upload" button; the system then uploads the import file to the cloud file server and adds an execution task marked "uploaded successfully, pending" to the file import record below;
The back end executes the verification tasks through task scheduling, performing business-logic verification and intelligent data verification on the data; after verification completes, the data are classified into verification success, verification failure, and alarm data, and are shown in the common import-data interface; the user can download the data for confirmation, and for verification-success and alarm data can click the confirm button to import them into the system.
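Claim 2's back-end classification into success, failure, and alarm data could be sketched as follows; the specific rules (required fields, a price-deviation threshold) are assumptions made for illustration, not the patent's actual checks:

```python
# Classify each imported row: business-logic verification (required
# fields, positive quantity) can fail a row outright; an "intelligent"
# plausibility check against the in-industry average price can raise an
# alarm. Only success and alarm rows may be confirmed into the system.

def verify_row(row, industry_avg_price):
    """Classify one imported row as 'success', 'alarm', or 'failure'."""
    # business-logic verification
    if not row.get("hs_code") or row.get("qty", 0) <= 0:
        return "failure"
    # intelligent verification: flag prices far from the industry average
    if abs(row["price"] - industry_avg_price) / industry_avg_price > 0.5:
        return "alarm"
    return "success"

rows = [
    {"hs_code": "8471300000", "qty": 10, "price": 100.0},
    {"hs_code": "8471300000", "qty": 5, "price": 300.0},   # price outlier
    {"hs_code": "", "qty": 3, "price": 90.0},              # missing code
]
results = [verify_row(r, industry_avg_price=110.0) for r in rows]
print(results)  # ['success', 'alarm', 'failure']

# rows the user may confirm and import
importable = [r for r, s in zip(rows, results) if s in ("success", "alarm")]
```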
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310713489.6A CN116775726A (en) | 2019-06-04 | 2019-06-04 | E account book data verification method based on big data |
CN201910481957.5A CN110362605A (en) | 2019-06-04 | 2019-06-04 | A kind of E book data verification method based on big data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910481957.5A CN110362605A (en) | 2019-06-04 | 2019-06-04 | A kind of E book data verification method based on big data |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310713489.6A Division CN116775726A (en) | 2019-06-04 | 2019-06-04 | E account book data verification method based on big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110362605A true CN110362605A (en) | 2019-10-22 |
Family
ID=68215153
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910481957.5A Pending CN110362605A (en) | 2019-06-04 | 2019-06-04 | A kind of E book data verification method based on big data |
CN202310713489.6A Pending CN116775726A (en) | 2019-06-04 | 2019-06-04 | E account book data verification method based on big data |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310713489.6A Pending CN116775726A (en) | 2019-06-04 | 2019-06-04 | E account book data verification method based on big data |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN110362605A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101075304A (en) * | 2006-05-18 | 2007-11-21 | 河北全通通信有限公司 | Method for constructing decision supporting system of telecommunication industry based on database |
CN106919685A (en) * | 2017-03-02 | 2017-07-04 | 浪潮软件集团有限公司 | Mass data file processing method |
CN107103050A (en) * | 2017-03-31 | 2017-08-29 | 海通安恒(大连)大数据科技有限公司 | A kind of big data Modeling Platform and method |
CN107945086A (en) * | 2017-11-17 | 2018-04-20 | 广州葵翼信息科技有限公司 | A kind of big data resource management system applied to smart city |
CN109272155A (en) * | 2018-09-11 | 2019-01-25 | 郑州向心力通信技术股份有限公司 | A kind of corporate behavior analysis system based on big data |
Non-Patent Citations (1)
Title |
---|
兰见春: "基于Spark的犯罪预警分析系统的设计与实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110928232A (en) * | 2019-12-31 | 2020-03-27 | 大连华锐重工焦炉车辆设备有限公司 | Mechanical digital twin control system of coke oven |
CN110928232B (en) * | 2019-12-31 | 2023-11-14 | 大连华锐重工焦炉车辆设备有限公司 | Mechanical digital twin control system of coke oven |
CN116823464A (en) * | 2023-06-06 | 2023-09-29 | 海通期货股份有限公司 | Data asset management platform, electronic device, and computer-readable storage medium |
CN116823464B (en) * | 2023-06-06 | 2024-03-26 | 海通期货股份有限公司 | Data asset management platform, electronic device, and computer-readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN116775726A (en) | 2023-09-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Baruffaldi et al. | Warehouse management system customization and information availability in 3pl companies: A decision-support tool | |
US8340995B2 (en) | Method and system of using artifacts to identify elements of a component business model | |
Choy et al. | A knowledge-based supplier intelligence retrieval system for outsource manufacturing | |
US20210118054A1 (en) | Resource exchange system | |
Al-Sabri et al. | A comparative study and evaluation of ERP reference models in the context of ERP IT-driven implementation: SAP ERP as a case study | |
Peng et al. | Transportation planning for sustainable supply chain network using big data technology | |
Krmac | Intelligent value chain networks: business intelligence and other ICT tools and technologies in supply/demand chains | |
Mohsen | Developments of digital technologies related to supply chain management | |
Ziari et al. | A review on competitive pricing in supply chain management problems: models, classification, and applications | |
CN110362605A (en) | A kind of E book data verification method based on big data | |
US20140129269A1 (en) | Forecasting Business Entity Characteristics Based on Planning Infrastructure | |
CN115375149A (en) | Supply chain strategy determination method, medium, device and computing equipment | |
Wei | [Retracted] A Machine Learning Algorithm for Supplier Credit Risk Assessment Based on Supply Chain Management | |
CN112990886A (en) | Aviation industry data management display system based on mobile phone terminal | |
CN117057686A (en) | Intelligent management method, device, equipment and storage medium for material purchase | |
US8417594B2 (en) | Dimension-based financial reporting using multiple combinations of dimensions | |
Schnellbächer et al. | Jumpstart to Digital Procurement | |
CN116452340A (en) | Investment management method, device and storage medium | |
Hejazi et al. | Robust optimization of sustainable closed-loop supply chain network considering product family | |
US20140149186A1 (en) | Method and system of using artifacts to identify elements of a component business model | |
CN107229996B (en) | Integrated supply chain management platform | |
Dolz Ausina | Evaluation of different AI applications for operational logistic systems | |
CN113127498A (en) | Method and system for efficiently calculating pre-actual cost of shipbuilding product | |
Baruti | Analysis and Implementation of a Business Intelligence QlikView application for logistic and procurement management. Sews Cabind case for the shortage problem. | |
Ivanov et al. | Processes, systems, and models |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | ||
Address after: No. 209 Bamboo Garden Road, Suzhou High-tech Zone, Jiangsu Province, 215000 Applicant after: Suzhou Zhimao Jietong Technology Co.,Ltd. Address before: No. 209 Bamboo Garden Road, Suzhou High-tech Zone, Jiangsu Province, 215000 Applicant before: SUZHOU DIGITAL CHINA JET TECHNOLOGY Co.,Ltd. |
|
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20191022 |