CN108449216A - A logistics sorting data statistics method based on Spark technology - Google Patents


Info

Publication number
CN108449216A
Authority
CN
China
Prior art keywords
data
logistics
sorting
log file
client
Legal status
Pending
Application number
CN201810312294.XA
Other languages
Chinese (zh)
Inventor
李倩玉
李功燕
Current Assignee
Jiangsu Intelligent Manufacturing Technology Co Ltd
Original Assignee
Jiangsu Intelligent Manufacturing Technology Co Ltd
Priority date
2018-04-09
Filing date
2018-04-09
Publication date
2018-08-24
Application filed by Jiangsu Intelligent Manufacturing Technology Co Ltd
Priority to CN201810312294.XA
Publication of CN108449216A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/083 Shipping
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0604 Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/142 Network analysis or design using statistical or mathematical methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • Probability & Statistics with Applications (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Fuzzy Systems (AREA)
  • Strategic Management (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Algebra (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the technical field of logistics and transportation and relates to a logistics sorting data statistics method based on Spark technology. A server first remotely obtains the logistics sorting log files of a client, then uses Spark technology to analyze the sorting data in those log files and compute statistics. The method enables statistics on express parcel sorting data and improves statistical efficiency.

Description

A logistics sorting data statistics method based on Spark technology
Technical field
The present invention relates to a data statistics method, in particular to a logistics sorting data statistics method based on Spark technology, and belongs to the technical field of logistics and transportation.
Background art
In traditional statistical methods for automatic logistics sorting data, the sorting data are stored in database tables, so statistics are usually computed by writing SQL statements. Automatic logistics sorting, however, produces a large volume of data, and database queries cannot keep up with queries over massive data: for large-scale statistics, querying the database is very inefficient and queries may even stall. As data volumes have grown, big data technologies have emerged. Hadoop MapReduce, the conventional big data statistics technique, is costly and its programming model is inflexible: implementing statistics for parallel or multi-iteration scenarios is cumbersome, latency is high, and iterative computation is not supported efficiently. After a comprehensive analysis, it is therefore important to devise a new statistical method for automatic logistics sorting data.
Summary of the invention
The purpose of the present invention is to address the problems of the prior art by providing a logistics sorting data statistics method based on Spark technology that enables statistics on express parcel sorting data, improves statistical efficiency, and allows the number of sorted parcels to be examined from different dimensions in order to assess the sorting efficiency of each sorting line.
To achieve the above technical purpose, the technical solution of the present invention is a logistics sorting data statistics method based on Spark technology, characterized by comprising the following steps:
Step 1: the server remotely obtains the logistics sorting log files of the client;
Step 2: using Spark technology, the server analyzes the sorting data in the logistics sorting log files and computes statistics.
Further, in Step 1 the logistics sorting log files of the client are obtained as follows:
First step: on each automatic logistics sorting line, the client is pre-configured with the sorting line number, the time at which logs are uploaded to the server, and the log conditions to be analyzed;
Second step: the client compares its current time with the pre-configured upload time; if they are equal, the third step is executed, otherwise the second step is repeated;
Third step: the client searches the logistics sorting log files for those that meet the configured conditions, places the retrieved log files in a new folder, and compresses the folder;
Fourth step: the client uploads the sorting line number, the log upload time, and the compressed folder file to the server.
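The time comparison in the second step can be sketched as a small polling loop. This is a minimal Python sketch under assumptions not stated in the original: the upload time is configured as a time of day, times are compared at minute resolution, and the client polls every 30 seconds so a one-minute slot cannot be skipped. `should_upload` and `wait_for_upload_slot` are hypothetical names; the clock and sleep functions are injectable so the loop can be tested without waiting.

```python
import time as _time
from datetime import datetime, time
from typing import Callable


def should_upload(now: datetime, upload_time: time) -> bool:
    """Return True when the client clock matches the configured upload time.

    Assumption: compare hour and minute only, ignoring seconds.
    """
    return (now.hour, now.minute) == (upload_time.hour, upload_time.minute)


def wait_for_upload_slot(upload_time: time,
                         clock: Callable[[], datetime] = datetime.now,
                         sleep: Callable[[float], None] = _time.sleep,
                         poll_seconds: float = 30.0) -> datetime:
    """Second step: keep comparing until the times are equal, then return."""
    while True:
        now = clock()
        if should_upload(now, upload_time):
            return now  # proceed to the third step (retrieve and compress logs)
        sleep(poll_seconds)  # otherwise repeat the second step
```

Polling at half the comparison resolution is a simple way to guarantee at least one check falls inside the configured minute; a production client might use a scheduler such as cron instead.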
Further, the client is connected to the server over the Internet, and the server provides a service interface for client access: the client calls the server's Web Service to upload the log files.
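The original does not say which protocol the Web Service uses. As a hedged sketch, assuming a plain HTTP POST endpoint, the upload in the fourth step could be built with Python's standard `urllib.request`; the endpoint URL and the `lineNo`/`uploadTime` parameter names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_upload_request(base_url: str, line_no: str, upload_time: str,
                         archive: bytes) -> Request:
    """Build (but do not send) a POST carrying the compressed log archive.

    The sorting line number and upload time travel as query parameters;
    the ZIP bytes form the request body. All names here are assumptions.
    """
    query = urlencode({"lineNo": line_no, "uploadTime": upload_time})
    return Request(f"{base_url}?{query}", data=archive,
                   headers={"Content-Type": "application/zip"},
                   method="POST")
```

The request would then be sent with `urllib.request.urlopen(req, timeout=60)`; a SOAP-style Web Service would instead wrap the same three fields in an XML envelope.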
Further, after receiving the logistics sorting log files sent by the client, the server creates a local folder for storing logs according to the sorting line number and the log date, stores the received log files separately in this way to facilitate viewing and management, and decompresses the compressed log files.
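A minimal sketch of the server-side storage and decompression, assuming the uploaded archive is a ZIP file and a `<root>/<line number>/<log date>/` directory layout. Both are assumptions: the original only says folders are created per sorting line number and log date. `store_uploaded_logs` is a hypothetical name.

```python
import zipfile
from pathlib import Path


def store_uploaded_logs(root: Path, line_no: str, log_date: str,
                        archive_path: Path) -> Path:
    """Create <root>/<line_no>/<log_date>/ and extract the archive into it."""
    target = root / line_no / log_date
    target.mkdir(parents=True, exist_ok=True)  # one folder per line and date
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target)  # decompression step
    return target
```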
Further, the statistics on automatic logistics sorting data are computed as follows:
First step: read the logistics sorting log information from the server;
Second step: upload the logistics sorting log information to the HDFS distributed file system as raw data, giving distributed storage of the log files;
Third step: load the log files from HDFS into the Spark computing platform. Because the raw data cannot be processed by Spark directly, they are converted into an initial resilient distributed dataset (RDD) during input;
Fourth step: use the filter operator to remove information in the log that is useless for the statistics, keeping only the useful information;
Fifth step: package the useful data items filtered from the log information into an RDD<Row>;
Sixth step: convert the RDD<Row> into a Dataset<Row>, on which statistical processing can then be performed;
Seventh step: after the statistics are computed, write the results to HDFS; the statistics process ends.
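The fourth and fifth steps can be illustrated without a Spark cluster. The following plain-Python sketch mirrors what `filter` followed by `map` would do on an RDD of log lines. The pipe-separated log format (`timestamp|line|event|parcel|read mode`), the `SORTED` event name, and the `Row` fields are all hypothetical, since the original does not describe the log layout. In Spark's Java API the same steps would typically use `JavaRDD.filter`, `JavaRDD.map` with `RowFactory.create`, and `SparkSession.createDataFrame(rowRdd, schema)` to obtain the `Dataset<Row>` of the sixth step.

```python
from typing import Iterable, Iterator, NamedTuple


class Row(NamedTuple):
    """Plain-Python stand-in for a Spark Row with an assumed schema."""
    date: str       # e.g. "2018-01-18"
    line_no: str    # sorting line number
    parcel_id: str
    read_mode: str  # assumed: "NORMAL" (barcode read) or "MANUAL" (manual code entry)


def is_useful(line: str) -> bool:
    """Fourth step: the filter predicate, keeping only parcel-sorting records."""
    parts = line.rstrip("\n").split("|")
    return len(parts) == 5 and parts[2] == "SORTED"


def to_row(line: str) -> Row:
    """Fifth step: package one useful record into a Row."""
    ts, line_no, _event, parcel_id, read_mode = line.rstrip("\n").split("|")
    return Row(ts.split(" ")[0], line_no, parcel_id, read_mode)


def parse_log(lines: Iterable[str]) -> Iterator[Row]:
    # Same shape as rdd.filter(is_useful).map(to_row) in Spark.
    return (to_row(line) for line in lines if is_useful(line))
```

Filtering before packaging keeps the later aggregation small, which is the speed-up the fourth step describes.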
The logistics sorting data statistics method of the present invention has the following advantages:
1) The invention uses Spark technology to compute statistics on logistics sorting data following a divide-and-conquer approach: the data are first distributed, and then each part is counted in parallel. Processing data this way markedly speeds up analysis and improves statistical efficiency;
2) The invention enables statistics on express parcel sorting data, so that the number of sorted parcels can be examined from different dimensions to assess the sorting efficiency of each sorting line.
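The divide-and-conquer idea in advantage 1) can be shown in miniature: split the records into partitions, count each partition independently, then merge the partial counts. This is a plain-Python illustration of the pattern, not Spark itself. The thread pool only demonstrates the structure (Python threads give little CPU speed-up because of the GIL), whereas Spark runs partitions on separate executors.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split_into_partitions(items: Sequence[str], n: int) -> List[Sequence[str]]:
    """Divide: cut the records into at most n contiguous partitions."""
    size = max(1, -(-len(items) // n))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_partition(part: Sequence[str]) -> Counter:
    """Count one partition on its own (e.g. parcels per read mode)."""
    return Counter(part)


def partitioned_count(items: Sequence[str], n_partitions: int = 4) -> Counter:
    """Conquer: count partitions concurrently, then merge partial results."""
    parts = split_into_partitions(items, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(count_partition, parts))
    total = Counter()
    for partial in partials:
        total.update(partial)  # merge step, like Spark's reduce phase
    return total
```

Because counting is associative, the merged result equals a single sequential count regardless of how the records are partitioned.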
Brief description of the drawings
Fig. 1 is a flow chart of how the present invention obtains the logistics sorting log files of the client.
Fig. 2 is a flow chart of the statistical method for automatic logistics sorting data of the present invention.
Fig. 3 compares the statistical efficiency of the present invention with that of the traditional statistical method.
Detailed description of the embodiments
The invention is further described below with reference to specific drawings and embodiments.
A logistics sorting data statistics method based on Spark technology, characterized by comprising the following steps:
As shown in Fig. 1, Step 1: the server remotely obtains the logistics sorting log files of the client;
Specifically, the logistics sorting log files of the client are obtained as follows:
First step: on each automatic logistics sorting line, the client is pre-configured with the sorting line number, the time at which logs are uploaded to the server, and the log conditions to be analyzed;
Second step: the client compares its current time with the pre-configured upload time; if they are equal, the third step is executed, otherwise the second step is repeated;
Third step: the client searches the logistics sorting log files for those that meet the configured conditions, places the retrieved log files in a new folder, and compresses the folder, which improves file transfer efficiency;
Fourth step: the client uploads the sorting line number, the log upload time, and the compressed folder file to the server;
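A minimal sketch of the third step on the client, assuming log files are named `sort_YYYYMMDD*.log` and that the configured condition is simply the log date (the original leaves both the file naming and the condition unspecified). `collect_and_compress` is a hypothetical name; `shutil.make_archive` produces the ZIP of the new folder that is uploaded in the fourth step.

```python
import shutil
from datetime import date
from pathlib import Path


def collect_and_compress(log_dir: Path, work_dir: Path, line_no: str,
                         log_date: date) -> Path:
    """Copy the logs matching the (assumed) condition into a new folder,
    then compress that folder into <work_dir>/<line_no>_<YYYYMMDD>.zip."""
    stamp = log_date.strftime("%Y%m%d")
    folder = work_dir / f"{line_no}_{stamp}"
    folder.mkdir(parents=True, exist_ok=True)
    for log in sorted(log_dir.glob(f"sort_{stamp}*.log")):
        shutil.copy2(log, folder / log.name)  # keep timestamps for auditing
    # The archive is written next to the folder, not inside it.
    archive = shutil.make_archive(str(folder), "zip", root_dir=folder)
    return Path(archive)
```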
In this embodiment, the client is connected to the server over the Internet, and the server provides a service interface for client access; the client calls the server's Web Service to upload the log files;
After receiving the logistics sorting log files sent by the client, the server creates a local folder for storing logs according to the sorting line number and the log date, stores the received log files separately in this way to facilitate viewing and management, and decompresses the compressed log files.
As shown in Fig. 2, Step 2: using Spark technology, the sorting data in the logistics sorting log files are analyzed and statistics are computed.
Specifically, the statistics on automatic logistics sorting data are computed as follows:
First step: read the logistics sorting log information from the server;
Second step: upload the logistics sorting log information to the HDFS distributed file system as raw data, giving distributed storage of the log files as the basis for the subsequent statistics;
Third step: load the log files from HDFS into the Spark computing platform. Because the raw data cannot be processed by Spark directly, they are converted into an initial resilient distributed dataset (RDD) during input;
Fourth step: because the log files contain a great deal of information and not all of it is needed for the statistics, the filter operator is used to remove useless information and keep only the information useful for the statistics, which speeds up the computation;
Fifth step: package the useful data items filtered from the log information into an RDD<Row>;
Sixth step: convert the RDD<Row> into a Dataset<Row>, on which statistical processing can then be performed;
Seventh step: after the statistics are computed, write the results to HDFS; the statistics process ends;
After the statistics are computed, the results are downloaded from HDFS to the local machine and finally displayed as reports and charts.
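The sixth and seventh steps amount to a group-by-date count followed by writing the result. The sketch below does this in plain Python on `(date, read_mode)` pairs and writes CSV using the column names from the example table; the `NORMAL`/`MANUAL` labels and ISO date strings are assumptions, and a local text stream stands in for HDFS. On a Spark `Dataset<Row>` the equivalent would typically be `groupBy` with `agg`, followed by `write().csv(...)` to an HDFS path.

```python
import csv
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, TextIO, Tuple

# Column names taken from the example table; "DataTime" is kept as spelled there.
HEADER = ["DataTime", "Normal_Read_Num", "Manual_Read_Num", "Total_Num"]


def daily_statistics(records: Iterable[Tuple[str, str]]) -> List[list]:
    """Sixth step (sketch): count parcels per day by read mode.

    `records` are (date, read_mode) pairs; read_mode is assumed to be
    "NORMAL" (barcode read) or "MANUAL" (manual code entry).
    """
    per_day: Dict[str, Counter] = defaultdict(Counter)
    for day, mode in records:
        per_day[day][mode] += 1
    rows = []
    for day in sorted(per_day):  # ISO dates sort chronologically
        normal, manual = per_day[day]["NORMAL"], per_day[day]["MANUAL"]
        rows.append([day, normal, manual, normal + manual])
    return rows


def write_statistics(rows: List[list], out: TextIO) -> None:
    """Seventh step (sketch): write the result as CSV to any text stream."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
```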
Taking one sorting line as an example, part of the statistical results obtained for this line using Spark technology are shown in the following table:
DataTime     Normal_Read_Num   Manual_Read_Num   Total_Num
2018/1/18    82117             9677              91794
2018/1/19    86735             9910              96645
2018/1/20    82452             9370              91822
2018/1/21    80201             8727              88928
2018/1/22    71436             7825              79261
The table clearly shows, for each day, the total number of parcels sorted by the line (Total_Num), the number sorted via normal barcode reading (Normal_Read_Num), and the number that required manual code entry (Manual_Read_Num).
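The three count columns satisfy a simple identity that holds for every row of the table; for 2018/1/18, for example, it reads as follows. The manual-entry share in the last line is an illustrative derived figure, not one reported in the original.

```latex
\begin{aligned}
\text{Total\_Num} &= \text{Normal\_Read\_Num} + \text{Manual\_Read\_Num} \\
91794 &= 82117 + 9677 \\
\frac{\text{Manual\_Read\_Num}}{\text{Total\_Num}} &= \frac{9677}{91794} \approx 10.5\%
\end{aligned}
```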
Fig. 3 compares the statistical efficiency of the present invention with that of the traditional statistical method. As the figure shows, when the data volume is small, the difference in efficiency between traditional database queries and Spark-based analysis of logistics data is very small; as the data volume grows, however, Spark-based statistics become increasingly efficient and clearly outperform traditional database queries. Because a traditional database stores all data in tables, when the data volume is very large the database must be scanned from beginning to end and the records meeting the query conditions then counted. Spark-based analysis instead follows a divide-and-conquer approach: the data are first distributed and filtered, and then each part is counted in parallel. Processing data this way significantly improves statistical efficiency and speeds up analysis, so the Spark-based statistics method for automatic logistics sorting data of the present invention is a very effective way to compute statistics on such data.
The present invention and its embodiments have been described above. The description is not limiting, and what is shown in the drawings is only one embodiment of the invention; the actual structure is not limited to it. In short, if a person skilled in the art, inspired by this disclosure and without departing from the spirit of the invention, designs structures and embodiments similar to this technical solution without inventive effort, these shall fall within the protection scope of the invention.

Claims (5)

1. A logistics sorting data statistics method based on Spark technology, characterized by comprising the following steps:
Step 1: the server remotely obtains the logistics sorting log files of the client;
Step 2: using Spark technology, the server analyzes the sorting data in the logistics sorting log files and computes statistics.
2. The logistics sorting data statistics method based on Spark technology according to claim 1, characterized in that the logistics sorting log files of the client are obtained in Step 1 as follows:
First step: on each automatic logistics sorting line, the client is pre-configured with the sorting line number, the time at which logs are uploaded to the server, and the log conditions to be analyzed;
Second step: the client compares its current time with the pre-configured upload time; if they are equal, the third step is executed, otherwise the second step is repeated;
Third step: the client searches the logistics sorting log files for those that meet the configured conditions, places the retrieved log files in a new folder, and compresses the folder;
Fourth step: the client uploads the sorting line number, the log upload time, and the compressed folder file to the server.
3. The logistics sorting data statistics method based on Spark technology according to claim 2, characterized in that the client is connected to the server over the Internet, the server provides a service interface for client access, and the client calls the server's Web Service to upload the log files.
4. The logistics sorting data statistics method based on Spark technology according to claim 2, characterized in that, after receiving the logistics sorting log files sent by the client, the server creates a local folder for storing logs according to the sorting line number and the log date, stores the received log files separately in this way to facilitate viewing and management, and decompresses the compressed log files.
5. The automatic logistics sorting remote diagnosis method according to claim 1, characterized in that, in Step 2, the statistics on automatic logistics sorting data are computed as follows:
First step: read the logistics sorting log information from the server;
Second step: upload the logistics sorting log information to the HDFS distributed file system as raw data, giving distributed storage of the log files;
Third step: load the log files from HDFS into the Spark computing platform. Because the raw data cannot be processed by Spark directly, they are converted into an initial resilient distributed dataset (RDD) during input;
Fourth step: use the filter operator to remove information in the log that is useless for the statistics, keeping only the useful information;
Fifth step: package the useful data items filtered from the log information into an RDD<Row>;
Sixth step: convert the RDD<Row> into a Dataset<Row>, on which statistical processing can then be performed;
Seventh step: after the statistics are computed, write the results to HDFS; the statistics process ends.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810312294.XA CN108449216A (en) 2018-04-09 2018-04-09 A logistics sorting data statistics method based on Spark technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810312294.XA CN108449216A (en) 2018-04-09 2018-04-09 A logistics sorting data statistics method based on Spark technology

Publications (1)

Publication Number Publication Date
CN108449216A (en) 2018-08-24

Family

ID=63199399

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810312294.XA Pending CN108449216A (en) 2018-04-09 2018-04-09 A logistics sorting data statistics method based on Spark technology

Country Status (1)

Country Link
CN (1) CN108449216A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104735139A (en) * 2015-03-11 2015-06-24 小米科技有限责任公司 Terminal information statistical method, device, terminal and server
CN105704566A (en) * 2016-04-25 2016-06-22 浪潮软件集团有限公司 Video recommendation system based on television set top box
CN105893628A (en) * 2016-05-17 2016-08-24 中国农业银行股份有限公司 Real-time data collection system and method
CN106714099A (en) * 2015-11-16 2017-05-24 广州优视网络科技有限公司 Photograph information processing and scenic spot identification method, client and server
CN106791983A (en) * 2016-12-23 2017-05-31 Tcl集团股份有限公司 A kind of intelligent television user behavior analysis method and system
CN106874114A (en) * 2017-01-20 2017-06-20 上海丞风智能科技有限公司 Express delivery management software system
CN107682432A (en) * 2017-09-28 2018-02-09 北京京东尚科信息技术有限公司 Data handling system and method based on Spark



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180824