CN108449216A - A kind of logistics sorting data statistical approach based on Spark technologies - Google Patents
- Publication number
- CN108449216A
- Authority
- CN
- China
- Prior art keywords
- data
- logistics
- sorting
- log file
- client
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/069—Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/08—Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
- G06Q10/083—Shipping
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0604—Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/142—Network analysis or design using statistical or mathematical methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/06—Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Business, Economics & Management (AREA)
- Probability & Statistics with Applications (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Economics (AREA)
- General Business, Economics & Management (AREA)
- Development Economics (AREA)
- Tourism & Hospitality (AREA)
- Quality & Reliability (AREA)
- Operations Research (AREA)
- Marketing (AREA)
- Human Resources & Organizations (AREA)
- Entrepreneurship & Innovation (AREA)
- Fuzzy Systems (AREA)
- Strategic Management (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Algebra (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention belongs to the field of logistics and transportation technology and relates to a Spark-based statistical method for logistics sorting data. A server first remotely obtains the logistics sorting log files of a client. It then uses Spark to analyze the sorting data in those log files and compute statistics. The method makes it possible to compute statistics on express-parcel sorting data and improves statistical efficiency.
Description
Technical field
The present invention relates to a data statistics method, in particular a Spark-based statistical method for logistics sorting data, and belongs to the field of logistics and transportation technology.
Background
Traditional statistical methods for automatic logistics sorting data store the sorting data in database tables, so statistics are usually computed by writing SQL statements. Automatic logistics sorting, however, produces a very large volume of data, and database queries cannot cope with queries over data at that scale. For statistics over massive data, database queries are very inefficient and can even stall. As data volumes have grown, big data technology has emerged. However, Hadoop MapReduce, the traditional big data statistics technique, is costly and its programming model is inflexible. Statistics for scenarios that need parallel or repeated iterative computation are cumbersome to implement, and MapReduce suffers from high latency and lacks support for iterative computation. A new statistical method for automatic logistics sorting data is therefore needed.
Summary of the invention
The purpose of the present invention is to address the problems of the prior art by providing a Spark-based statistical method for logistics sorting data. The method computes statistics on express-parcel sorting data with improved efficiency. It also lets the number of sorted parcels be viewed along different dimensions, so that the sorting efficiency of each sorting line can be assessed.
To achieve the above technical purpose, the technical solution of the present invention is a Spark-based statistical method for logistics sorting data, characterized by comprising the following steps:
Step 1: the server remotely obtains the logistics sorting log files of the client;
Step 2: the server uses Spark to analyze the sorting data in the logistics sorting log files and compute statistics.
Further, the logistics sorting log files of the client are obtained in step 1 as follows:
First step: on each automatic logistics sorting line, the client is pre-configured with the sorting line number, the time at which logs are uploaded to the server, and the conditions that identify the logs to be analyzed;
Second step: the client compares the current time with the pre-configured upload time. If they are equal, the third step is executed; otherwise the second step is repeated;
Third step: the client retrieves the logistics sorting log files that meet the configured conditions, copies them into a new folder, and compresses that folder;
Fourth step: the client uploads the sorting line number, the log upload time and the compressed folder to the server.
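The client-side steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patent's implementation. The `ClientConfig` fields are hypothetical. The "log conditions" are modeled as a filename glob, and the upload time is matched to the minute.

```python
import fnmatch
import shutil
import zipfile
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path


@dataclass
class ClientConfig:
    """Hypothetical client configuration (first step)."""
    line_number: str   # sorting line number, e.g. "L01"
    upload_time: str   # daily upload time as "HH:MM"
    log_pattern: str   # log condition, modeled here as a filename glob


def is_upload_time(cfg: ClientConfig, now: datetime) -> bool:
    # Second step: compare the current time, to the minute, with the configured time.
    return now.strftime("%H:%M") == cfg.upload_time


def package_logs(cfg: ClientConfig, log_dir: Path, work_dir: Path) -> Path:
    # Third step: copy every log file matching the condition into a new folder...
    bundle = work_dir / f"{cfg.line_number}_logs"
    bundle.mkdir(parents=True, exist_ok=True)
    for path in sorted(log_dir.iterdir()):
        if path.is_file() and fnmatch.fnmatch(path.name, cfg.log_pattern):
            shutil.copy2(path, bundle / path.name)
    # ...then compress that folder into a single ZIP archive for upload.
    archive = work_dir / f"{cfg.line_number}_logs.zip"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(bundle.iterdir()):
            zf.write(path, arcname=path.name)
    return archive
```

A real client would call `is_upload_time` periodically, which is the "otherwise repeat the second step" loop. Alternatively, it could leave the timing to an OS scheduler. The fourth step then sends `cfg.line_number`, the upload time and the returned archive to the server.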
Further, the client is connected to the server over the Internet. The server provides a service interface for the client to access, and the client calls the server's Web Service to upload the log files.
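The patent states only that the client calls a Web Service exposed by the server. It does not name the protocol, endpoint or message format. The sketch below substitutes a plain HTTP POST built with the Python standard library. The `/upload` path and the `X-Line-Number` and `X-Upload-Time` headers are hypothetical choices. A small in-process server stands in for the server end so that the example runs on one machine.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

RECEIVED = []  # uploads captured by the demo server: (line number, upload time, bytes)


class UploadHandler(BaseHTTPRequestHandler):
    """Stand-in for the server's Web Service; endpoint and headers are hypothetical."""

    def do_POST(self):
        if self.path != "/upload":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = self.rfile.read(length)
        RECEIVED.append((self.headers["X-Line-Number"],
                         self.headers["X-Upload-Time"], payload))
        body = b"OK"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in the demo


def start_server() -> ThreadingHTTPServer:
    # Bind to a free port on localhost and serve requests on a background thread.
    server = ThreadingHTTPServer(("127.0.0.1", 0), UploadHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def upload(base_url: str, line_number: str, upload_time: str, archive: bytes) -> str:
    # Client side: one POST carrying the sorting line number, the upload
    # time and the compressed log archive.
    request = urllib.request.Request(
        f"{base_url}/upload", data=archive, method="POST",
        headers={"Content-Type": "application/zip",
                 "X-Line-Number": line_number,
                 "X-Upload-Time": upload_time})
    # Ignore proxy environment variables so the local demo connects directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=10) as response:
        return response.read().decode()
```

Sending the metadata as headers keeps the request body a raw ZIP archive. A multipart form or a SOAP envelope would carry the same three items equally well.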
Further, after receiving the logistics sorting log files transmitted by the client, the server creates local folders for storing the logs according to the sorting line number and log date. It stores the received log files separately so that they can be viewed and managed, and decompresses the compressed log files.
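The server-side storage rule above can be sketched as follows. The patent does not give the folder layout, so the `root/<line number>/<log date>` structure is an assumption.

```python
import zipfile
from pathlib import Path


def store_upload(root: Path, line_number: str, log_date: str, archive: Path) -> Path:
    # One folder per sorting line and per log date, e.g. root/L01/2018-01-18,
    # so that each line's logs are kept apart and easy to browse.
    target = root / line_number / log_date
    target.mkdir(parents=True, exist_ok=True)
    # Decompress the uploaded archive into that folder. zipfile's extractall
    # attempts to keep members inside `target` by dropping absolute-path and
    # ".." components from member names.
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return target
```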
Further, the statistics on the automatic logistics sorting data are computed as follows:
First step: read the logistics sorting log information from the server;
Second step: upload the logistics sorting log information to the Hadoop Distributed File System (HDFS) as raw data, so that the log files are stored in a distributed manner;
Third step: load the log files from HDFS into the Spark computing platform. Spark cannot process the raw data directly, so the raw data must be converted into an initial Resilient Distributed Dataset (RDD) as it is read in;
Fourth step: use the filter operator to remove log information that is irrelevant and keep the information needed for the statistics;
Fifth step: wrap the useful data items retained from the logs into an RDD<Row>;
Sixth step: convert the RDD<Row> into a Dataset<Row>, on which the statistics can then be computed;
Seventh step: after the statistics are computed, write the results to HDFS. This completes the statistics.
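The filter, row-wrapping and aggregation steps above can be illustrated without a cluster. The sketch below runs the same logic on an in-memory list in plain Python. The log layout, the `SORT` event tag and the `NORMAL` and `MANUAL` read modes are assumptions, because the patent does not specify the log format.

In PySpark, the corresponding calls would be roughly as follows:
- `spark.sparkContext.textFile(path)` creates the initial RDD.
- `rdd.filter(is_sort_event).map(to_record)` filters and wraps the lines, after which lines that fail to parse are dropped.
- `spark.createDataFrame(...)` builds the table. In Python, a `Dataset<Row>` is a `DataFrame`.
- `df.groupBy("date", "read_mode").count()` computes the counts.
- `df.write.csv(path)` writes the results back to HDFS.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class SortRecord(NamedTuple):
    """Plays the role of a Spark Row: only the fields the statistics need."""
    date: str
    read_mode: str  # "NORMAL" (barcode read by scanner) or "MANUAL" (manual entry)


def is_sort_event(line: str) -> bool:
    # Fourth step (filter): keep parcel-sorting events, drop everything else.
    return "|SORT|" in line


def to_record(line: str) -> Optional[SortRecord]:
    # Fifth step: wrap the useful fields of one line into a row object.
    # Hypothetical layout: "<date> <time>|<event>|<parcel id>|<read mode>".
    parts = line.strip().split("|")
    if len(parts) != 4:
        return None
    timestamp, _event, _parcel_id, read_mode = parts
    return SortRecord(date=timestamp.split(" ")[0], read_mode=read_mode)


def count_by_date_and_mode(lines: Iterable[str]) -> Counter:
    # Sixth step: aggregate the rows, like groupBy("date", "read_mode").count().
    records = (to_record(line) for line in lines if is_sort_event(line))
    return Counter((r.date, r.read_mode) for r in records if r is not None)
```

Filtering before parsing mirrors the patent's rationale for the fourth step: irrelevant lines are discarded before any per-record work is done.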
The statistical method for logistics sorting data of the present invention has the following advantages:
1) The invention uses Spark to compute statistics on logistics sorting data following a divide-and-conquer approach. The data are first distributed across partitions, and the partitions are then counted in parallel. Processing data this way markedly speeds up analysis and improves statistical efficiency;
2) The invention makes it possible to compute statistics on express-parcel sorting data and to view the number of sorted parcels along different dimensions, so that the sorting efficiency of each sorting line can be assessed.
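The divide-and-conquer idea in advantage 1) can be shown in a few lines. The records are split into partitions, each partition is counted independently, and the partial counts are merged. This is a plain-Python sketch, not Spark. Threads are used only to keep it short, since CPython threads do not speed up CPU-bound counting. Spark would instead run each partition as a task on a cluster executor.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split(items: Sequence[str], parts: int) -> List[Sequence[str]]:
    # "Divide": cut the data into at most `parts` roughly equal partitions.
    size = max(1, -(-len(items) // parts))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_partition(part: Sequence[str]) -> Counter:
    # Each partition is counted independently, like one Spark task.
    return Counter(part)


def divide_and_count(items: Sequence[str], parts: int = 4) -> Counter:
    # "Conquer": count partitions concurrently, then merge the partial counts.
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partials = pool.map(count_partition, split(items, parts))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total
```

The merge step is cheap because each partial result holds only one counter per distinct key, not the raw records.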
Description of the drawings
Fig. 1 is a flow chart of obtaining the logistics sorting log files of the client according to the present invention.
Fig. 2 is a flow chart of the statistical method for automatic logistics sorting data according to the present invention.
Fig. 3 compares the statistical efficiency of the present invention with that of a traditional statistical method.
Detailed description of embodiments
The invention is further described below with reference to specific drawings and embodiments.
A Spark-based statistical method for logistics sorting data, characterized by comprising the following steps:
As shown in Fig. 1, in step 1 the server remotely obtains the logistics sorting log files of the client;
The specific method for obtaining the logistics sorting log files of the client is as follows:
First step: on each automatic logistics sorting line, the client is pre-configured with the sorting line number, the time at which logs are uploaded to the server, and the conditions that identify the logs to be analyzed;
Second step: the client compares the current time with the pre-configured upload time. If they are equal, the third step is executed; otherwise the second step is repeated;
Third step: the client retrieves the logistics sorting log files that meet the configured conditions, copies them into a new folder, and compresses that folder, which improves file transfer efficiency;
Fourth step: the client uploads the sorting line number, the log upload time and the compressed folder to the server;
In this embodiment of the present invention, the client is connected to the server over the Internet. The server provides a service interface for the client to access, and the client calls the server's Web Service to upload the log files;
After receiving the logistics sorting log files transmitted by the client, the server creates local folders for storing the logs according to the sorting line number and log date. It stores the received log files separately so that they can be viewed and managed, and decompresses the compressed log files.
As shown in Fig. 2, in step 2 the server uses Spark to analyze the sorting data in the logistics sorting log files and compute statistics.
The specific statistical method for the automatic logistics sorting data is as follows:
First step: read the logistics sorting log information from the server;
Second step: upload the logistics sorting log information to HDFS as raw data, so that the log files are stored in a distributed manner as the basis for the subsequent statistics;
Third step: load the log files from HDFS into the Spark computing platform. Spark cannot process the raw data directly, so the raw data must be converted into an initial Resilient Distributed Dataset (RDD) as it is read in;
Fourth step: the log files contain a great deal of information, and not all of it is needed for the statistics. The filter operator therefore removes the irrelevant information and keeps only what the statistics need, which speeds up the computation;
Fifth step: wrap the useful data items retained from the logs into an RDD<Row>;
Sixth step: convert the RDD<Row> into a Dataset<Row>, on which the statistics can then be computed;
Seventh step: after the statistics are computed, write the results to HDFS. This completes the statistics;
After the statistics are computed, the results are downloaded from HDFS to the local machine and finally displayed as reports and charts.
Taking one sorting line as an example, part of the statistical results obtained with Spark for this line are shown in the following table:
DataTime | Normal_Read_Num | Manual_Read_Num | Total_Num |
---|---|---|---|
2018/1/18 | 82117 | 9677 | 91794 |
2018/1/19 | 86735 | 9910 | 96645 |
2018/1/20 | 82452 | 9370 | 91822 |
2018/1/21 | 80201 | 8727 | 88928 |
2018/1/22 | 71436 | 7825 | 79261 |
The table clearly shows, for each day, three figures. Total_Num is the total number of parcels sorted by the line. Normal_Read_Num is the number sorted after a normal barcode read. Manual_Read_Num is the number sorted after manual code entry.
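The table's columns satisfy Total_Num = Normal_Read_Num + Manual_Read_Num on every row shown. For example, for 2018/1/18, 82117 + 9677 = 91794. The sketch below checks that relation on the published rows. It also shows one way to pivot per-day counts keyed by (date, read mode) into the table's layout. The `NORMAL` and `MANUAL` keys and the ISO date format are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# The five rows of the table above:
# (DataTime, Normal_Read_Num, Manual_Read_Num, Total_Num)
TABLE = [
    ("2018/1/18", 82117, 9677, 91794),
    ("2018/1/19", 86735, 9910, 96645),
    ("2018/1/20", 82452, 9370, 91822),
    ("2018/1/21", 80201, 8727, 88928),
    ("2018/1/22", 71436, 7825, 79261),
]


def totals_consistent(rows: List[Tuple[str, int, int, int]]) -> bool:
    # Total_Num should equal normal reads plus manual code entries on every day.
    return all(normal + manual == total for _, normal, manual, total in rows)


def to_report_rows(counts: Dict[Tuple[str, str], int]) -> List[Tuple[str, int, int, int]]:
    # Pivot (date, read mode) counts into one row per day in the table's layout.
    # Dates are sorted as strings, so ISO dates (YYYY-MM-DD) are assumed here.
    rows = []
    for date in sorted({date for date, _ in counts}):
        normal = counts.get((date, "NORMAL"), 0)
        manual = counts.get((date, "MANUAL"), 0)
        rows.append((date, normal, manual, normal + manual))
    return rows
```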
Fig. 3 compares the statistical efficiency of the present invention with that of a traditional statistical method. As the figure shows, when the data volume is small, the efficiency gap between a traditional database query and Spark analysis of the logistics data is very small. As the data volume grows, the efficiency of computing statistics with Spark increases and clearly exceeds that of a traditional database query. A traditional database stores all the data in tables, so when the data volume is very large it must scan the database from beginning to end and then count the records that match the query. Spark analysis instead follows a divide-and-conquer approach: the data are first distributed, then filtered, and the partitions are then counted in parallel. Processing data this way significantly improves statistical efficiency and speeds up analysis. The Spark-based statistical method of the present invention is therefore a very effective way to compute statistics on automatic logistics sorting data.
The present invention and its embodiments have been described above. The description is not limiting: what is shown in the drawings is only one embodiment of the invention, and the actual structure is not limited thereto. A person skilled in the art may be inspired by this disclosure and, without departing from the spirit of the invention or making an inventive effort, design structures and embodiments similar to this technical solution. Such structures and embodiments shall fall within the protection scope of the invention.
Claims (5)
1. A Spark-based statistical method for logistics sorting data, characterized by comprising the following steps:
Step 1: the server remotely obtains the logistics sorting log files of the client;
Step 2: the server uses Spark to analyze the sorting data in the logistics sorting log files and compute statistics.
2. The Spark-based statistical method for logistics sorting data according to claim 1, characterized in that the logistics sorting log files of the client are obtained in step 1 as follows:
First step: on each automatic logistics sorting line, the client is pre-configured with the sorting line number, the time at which logs are uploaded to the server, and the conditions that identify the logs to be analyzed;
Second step: the client compares the current time with the pre-configured upload time. If they are equal, the third step is executed; otherwise the second step is repeated;
Third step: the client retrieves the logistics sorting log files that meet the configured conditions, copies them into a new folder, and compresses that folder;
Fourth step: the client uploads the sorting line number, the log upload time and the compressed folder to the server.
3. The Spark-based statistical method for logistics sorting data according to claim 2, characterized in that the client is connected to the server over the Internet. The server provides a service interface for the client to access, and the client calls the server's Web Service to upload the log files.
4. The Spark-based statistical method for logistics sorting data according to claim 2, characterized in that, after receiving the logistics sorting log files transmitted by the client, the server creates local folders for storing the logs according to the sorting line number and log date. It stores the received log files separately so that they can be viewed and managed, and decompresses the compressed log files.
5. The automatic logistics sorting remote diagnosis method according to claim 1, characterized in that, in step 2, the statistics on the automatic logistics sorting data are computed as follows:
First step: read the logistics sorting log information from the server;
Second step: upload the logistics sorting log information to HDFS as raw data, so that the log files are stored in a distributed manner;
Third step: load the log files from HDFS into the Spark computing platform. Spark cannot process the raw data directly, so the raw data must be converted into an initial Resilient Distributed Dataset (RDD) as it is read in;
Fourth step: use the filter operator to remove log information that is irrelevant and keep the information needed for the statistics;
Fifth step: wrap the useful data items retained from the logs into an RDD<Row>;
Sixth step: convert the RDD<Row> into a Dataset<Row>, on which the statistics can then be computed;
Seventh step: after the statistics are computed, write the results to HDFS. This completes the statistics.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810312294.XA CN108449216A (en) | 2018-04-09 | 2018-04-09 | A kind of logistics sorting data statistical approach based on Spark technologies |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810312294.XA CN108449216A (en) | 2018-04-09 | 2018-04-09 | A kind of logistics sorting data statistical approach based on Spark technologies |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108449216A | 2018-08-24 |
Family
ID=63199399
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810312294.XA Pending CN108449216A (en) | 2018-04-09 | 2018-04-09 | A kind of logistics sorting data statistical approach based on Spark technologies |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108449216A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104735139A (en) * | 2015-03-11 | 2015-06-24 | 小米科技有限责任公司 | Terminal information statistical method, device, terminal and server |
CN105704566A (en) * | 2016-04-25 | 2016-06-22 | 浪潮软件集团有限公司 | Video recommendation system based on television set top box |
CN105893628A (en) * | 2016-05-17 | 2016-08-24 | 中国农业银行股份有限公司 | Real-time data collection system and method |
CN106714099A (en) * | 2015-11-16 | 2017-05-24 | 广州优视网络科技有限公司 | Photograph information processing and scenic spot identification method, client and server |
CN106791983A (en) * | 2016-12-23 | 2017-05-31 | Tcl集团股份有限公司 | A kind of intelligent television user behavior analysis method and system |
CN106874114A (en) * | 2017-01-20 | 2017-06-20 | 上海丞风智能科技有限公司 | Express delivery management software system |
CN107682432A (en) * | 2017-09-28 | 2018-02-09 | 北京京东尚科信息技术有限公司 | Data handling system and method based on Spark |
- 2018-04-09 CN CN201810312294.XA (CN108449216A), active, pending
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180824 |