CN108449216A

CN108449216A - A logistics sorting data statistics method based on Spark technology

Info

Publication number: CN108449216A
Application number: CN201810312294.XA
Authority: CN
Inventors: 李倩玉; 李功燕
Original assignee: Jiangsu Intelligent Manufacturing Technology Co Ltd
Current assignee: Jiangsu Intelligent Manufacturing Technology Co Ltd
Priority date: 2018-04-09
Filing date: 2018-04-09
Publication date: 2018-08-24

Abstract

The invention belongs to the technical field of logistics and transportation and relates to a logistics sorting data statistics method based on Spark technology. The server first remotely obtains the logistics sorting log files from the client, then uses Spark technology to analyze the sorting data in the log files and compute statistics. The method enables statistics on express parcel sorting data and improves statistical efficiency.

Description

A logistics sorting data statistics method based on Spark technology

Technical field

The present invention relates to a data statistics method, in particular to a logistics sorting data statistics method based on Spark technology, and belongs to the technical field of logistics and transportation.

Background art

In traditional statistics methods for automatic logistics sorting data, the sorting data are stored in database tables, so statistics are usually computed by writing SQL statements. Automatic logistics sorting, however, produces a large amount of data, and database queries cannot keep up with queries over massive data: for statistics on massive data, database queries are very inefficient and may even stall. As data volumes have grown, big data technologies have emerged. The MapReduce technology of Hadoop, a traditional big data statistics technology, is costly, and its programming model is not very flexible; implementing statistics for parallel or multi-iteration scenarios is cumbersome, and it suffers from high latency and an inability to perform iterative computation. Based on this comprehensive analysis, it is important to devise a new statistics method for automatic logistics sorting data.

Summary of the invention

The purpose of the present invention is to address the problems encountered in the prior art by providing a logistics sorting data statistics method based on Spark technology. The method enables statistics on express parcel sorting data, improves statistical efficiency, and allows the number of sorted parcels to be viewed from different dimensions in order to assess the sorting efficiency of each sorting line.

To achieve the above technical purpose, the technical solution of the present invention is a logistics sorting data statistics method based on Spark technology, characterized by comprising the following steps:

Step 1: the server remotely obtains the logistics sorting log files from the client;

Step 2: using Spark technology, the sorting data in the logistics sorting log files are analyzed and statistics are computed.

Further, the method for obtaining the client's logistics sorting log files in step 1 is as follows:

First step: on each automatic logistics sorting line, the client must be preconfigured with the sorting line number, the time at which logs are uploaded to the server, and the log conditions to be analyzed;

Second step: the client compares the current time with the preconfigured log upload time; if they are equal, the third step is executed, otherwise the second step is repeated;

Third step: the logistics sorting log files are searched for files that meet the configured conditions; the matching log files are then placed into a new folder, and that folder is compressed;

Fourth step: the client uploads the sorting line number, the log upload time, and the compressed folder to the server.
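The client-side steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the implementation described in the patent: the function names `upload_due` and `package_logs`, the glob-style `pattern` used as the "log condition", and the minute-level time comparison are all assumptions made for the example. The fourth step (the actual upload) is left out because the source does not specify the Web Service interface.

```python
import shutil
from datetime import datetime, time
from pathlib import Path


def upload_due(now: datetime, upload_time: time) -> bool:
    """Second step: compare the current time with the configured upload time.

    Assumption: the comparison is made at minute precision.
    """
    return (now.hour, now.minute) == (upload_time.hour, upload_time.minute)


def package_logs(log_dir: Path, pattern: str, staging_root: Path, line_no: str) -> Path:
    """Third step: collect matching log files into a new folder and compress it.

    `pattern` is a hypothetical glob expression standing in for the
    configured log condition; the folder naming scheme is also hypothetical.
    """
    staging = staging_root / f"{line_no}_logs"
    staging.mkdir(parents=True, exist_ok=True)
    for log_file in sorted(log_dir.glob(pattern)):
        shutil.copy2(log_file, staging / log_file.name)
    # Creates <staging>.zip next to the staging folder and returns its path.
    archive = shutil.make_archive(str(staging), "zip", root_dir=staging)
    return Path(archive)
```

In this sketch the client would poll `upload_due` and, once it returns true, call `package_logs` and hand the resulting archive to whatever upload call the server exposes.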

Further, the client communicates with the server over the Internet; the server must provide a service interface for the client to access, and the client calls the server's Web Service to perform the log file upload.

Further, after receiving the logistics sorting log files sent by the client, the server creates a local folder for storing the logs according to the sorting line number and log date information, stores the received logistics sorting log files separately to facilitate viewing and managing the logs, and decompresses the compressed log files.
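As a rough sketch of the server-side handling just described, the following Python function creates a folder keyed by sorting line number and log date and decompresses the received archive into it. The function name `store_upload`, the `<line>/<date>` folder layout, and the assumption that the archive is a zip file are illustrative choices, not details given in the source.

```python
import zipfile
from pathlib import Path


def store_upload(archive: Path, storage_root: Path, line_no: str, log_date: str) -> Path:
    """Store an uploaded archive under <storage_root>/<line_no>/<log_date>.

    The folder layout is a hypothetical example of storing logs
    separately by sorting line number and log date.
    """
    target = storage_root / line_no / log_date
    target.mkdir(parents=True, exist_ok=True)
    # Decompress the uploaded log files into the per-line, per-date folder.
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return target
```

Keeping one folder per line and date means later statistics jobs can select their input by path alone.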

Further, the statistics method for the automatic logistics sorting data is as follows:

First step: the logistics sorting log information is read from the server;

Second step: the logistics sorting log information is uploaded to the distributed file system HDFS as raw data, achieving distributed storage of the log files;

Third step: the log files in HDFS are fed into the Spark computing platform; because the raw data cannot be processed by Spark directly, they must be converted into an initial Resilient Distributed Dataset (RDD) during input;

Fourth step: the Filter operator is used to filter out useless information in the log information, retaining the information useful for the statistics;

Fifth step: the useful data items filtered from the log information are wrapped into an RDD<Row>;

Sixth step: the RDD<Row> is converted into a Dataset<Row>, at which point statistical processing can be performed;

Seventh step: after the statistics are computed, the statistical results are written to HDFS, and the data statistics end.
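The filter, row-wrapping, and counting steps above can be illustrated with a small sketch. It is written in plain Python rather than Spark so that it runs without a cluster; `filter` and `map` here only stand in for the corresponding Spark operators, and `SortRow` stands in for a `Row`. The log line format (`timestamp,line_no,barcode,read_mode`) is entirely hypothetical, since the source does not describe the log layout; the output column names follow the result table in the description.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class SortRow(NamedTuple):
    """One useful record extracted from a log line (stand-in for a Row)."""
    date: str
    read_mode: str  # "NORMAL" or "MANUAL" (hypothetical values)


def is_sort_record(line: str) -> bool:
    """Fourth step: keep only lines that look like sorting records."""
    parts = line.strip().split(",")
    return len(parts) == 4 and parts[3] in ("NORMAL", "MANUAL")


def to_row(line: str) -> SortRow:
    """Fifth step: wrap the useful fields of a record into a row."""
    timestamp, _line_no, _barcode, read_mode = line.strip().split(",")
    return SortRow(date=timestamp.split(" ")[0], read_mode=read_mode)


def daily_counts(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Sixth step onward: count parcels per day and per read mode."""
    stats = defaultdict(
        lambda: {"Normal_Read_Num": 0, "Manual_Read_Num": 0, "Total_Num": 0}
    )
    for row in map(to_row, filter(is_sort_record, lines)):
        key = "Normal_Read_Num" if row.read_mode == "NORMAL" else "Manual_Read_Num"
        stats[row.date][key] += 1
        stats[row.date]["Total_Num"] += 1
    return dict(stats)
```

In an actual Spark job the same logic would presumably be expressed as a filter on the RDD followed by a grouped count on the Dataset, with the input read from and the result written to HDFS.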

The logistics sorting data statistics method of the present invention has the following advantages:

1) The present invention uses Spark technology to compute statistics on logistics sorting data, following a divide-and-conquer approach: the data are first distributed, and the individual parts are then counted in parallel. Processing the data in this way noticeably speeds up data analysis and improves statistical efficiency;

2) The present invention enables statistics on express parcel sorting data and allows the number of sorted parcels to be viewed from different dimensions in order to assess the sorting efficiency of each sorting line.

Description of the drawings

Fig. 1 is a flow chart of obtaining the client's logistics sorting log files according to the present invention.

Fig. 2 is a flow chart of the statistics method for automatic logistics sorting data according to the present invention.

Fig. 3 is a comparison of the statistical efficiency of the present invention and a traditional statistics method.

Detailed description

The present invention is further described below with reference to the drawings and embodiments.

A logistics sorting data statistics method based on Spark technology, characterized by comprising the following steps:

As shown in Fig. 1, step 1: the server remotely obtains the logistics sorting log files from the client;

The specific method for obtaining the client's logistics sorting log files is as follows:

Third step: the logistics sorting log files are searched for files that meet the configured conditions; the matching log files are then placed into a new folder, and that folder is compressed, which improves file transfer efficiency;

Fourth step: the client uploads the sorting line number, the log upload time, and the compressed folder to the server;

In this embodiment of the present invention, the client communicates with the server over the Internet; the server must provide a service interface for the client to access, and the client calls the server's Web Service to perform the log file upload;

After receiving the logistics sorting log files sent by the client, the server creates a local folder for storing the logs according to the sorting line number and log date information, stores the received logistics sorting log files separately to facilitate viewing and managing the logs, and decompresses the compressed log files.

As shown in Fig. 2, step 2: using Spark technology, the sorting data in the logistics sorting log files are analyzed and statistics are computed.

The specific statistics method for the automatic logistics sorting data is as follows:

First step: the logistics sorting log information is read from the server;

Second step: the logistics sorting log information is uploaded to the distributed file system HDFS as raw data, achieving distributed storage of the log files and serving as the basis for the subsequent data statistics;

Fourth step: because the log files contain a great deal of information, not all of which is needed for the statistics, the Filter operator is used to filter out useless information in the log information, retaining the information useful for the statistics; this speeds up the data statistics;

Seventh step: after the statistics are computed, the statistical results are written to HDFS, and the data statistics end;

After the data statistics are complete, the statistical results are downloaded locally from HDFS and finally displayed as reports and charts.

Taking one sorting line as an example, part of the statistical results obtained for this sorting line using Spark technology are shown in the following table:

DataTime	Normal_Read_Num	Manual_Read_Num	Total_Num
2018/1/18	82117	9677	91794
2018/1/19	86735	9910	96645
2018/1/20	82452	9370	91822
2018/1/21	80201	8727	88928
2018/1/22	71436	7825	79261

From the table above, we can clearly see the total number of parcels sorted by the sorting line each day, the number of parcels sorted by normal code reading, and the number of parcels handled by manual code entry.
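As a small worked check of the table, the snippet below confirms that each day's total equals the sum of the normally read and manually entered parcels, and derives the share of manual entries per day. The helper name `manual_share` is introduced here for illustration only; the figures are copied from the table.

```python
# Rows copied from the table: (date, Normal_Read_Num, Manual_Read_Num, Total_Num)
rows = [
    ("2018/1/18", 82117, 9677, 91794),
    ("2018/1/19", 86735, 9910, 96645),
    ("2018/1/20", 82452, 9370, 91822),
    ("2018/1/21", 80201, 8727, 88928),
    ("2018/1/22", 71436, 7825, 79261),
]


def manual_share(normal: int, manual: int) -> float:
    """Fraction of the day's parcels that needed manual code entry."""
    return manual / (normal + manual)


for date, normal, manual, total in rows:
    # Total_Num should be the sum of the two read modes.
    assert normal + manual == total
    print(f"{date}: manual share {manual_share(normal, manual):.1%}")
```

A per-day manual share of this kind is one example of viewing the sorted parcel counts from a different dimension.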

Fig. 3 compares the statistical efficiency of the present invention with that of a traditional statistics method. As the figure shows, when the data volume is small, the efficiency gap between traditional database queries and Spark-based analysis of logistics data is small; as the data volume grows, however, the efficiency of computing statistics with Spark technology becomes increasingly higher than that of traditional database queries. In a traditional database all data are stored in database tables, so when the data volume is very large the database must be scanned from beginning to end before the data that satisfy the query can be counted. Spark-based analysis instead follows a divide-and-conquer approach: the data are first distributed and then filtered, and the individual parts are counted in parallel. Processing the data in this way markedly improves statistical efficiency and speeds up data analysis, so the automatic logistics sorting data statistics method based on Spark technology of the present invention is a very effective method for computing statistics on automatic logistics sorting data.
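The divide-and-conquer idea described above (distribute the data, count each part independently, then merge the partial results) can be sketched as follows. This is a single-machine Python illustration, not Spark itself: the record shape `(date, read_mode)`, the round-robin partitioning, and the function names are assumptions. Threads are used only to show the structure; in CPython they do not provide real CPU parallelism for this kind of work.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Record = Tuple[str, str]  # (date, read_mode), a hypothetical record shape


def partition(records: List[Record], n_parts: int) -> List[List[Record]]:
    """Split the records into n_parts roughly equal partitions (round-robin)."""
    return [records[i::n_parts] for i in range(n_parts)]


def count_partition(records: List[Record]) -> Counter:
    """Count parcels per date within a single partition."""
    return Counter(date for date, _mode in records)


def parallel_daily_totals(records: List[Record], n_parts: int = 4) -> Counter:
    """Count each partition independently, then merge the partial counts."""
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        for partial in pool.map(count_partition, partition(records, n_parts)):
            total.update(partial)
    return total
```

Because each partition is counted without reference to the others, the merge step only has to add up small per-partition summaries, which is the property that lets a distributed engine spread the work across machines.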

The present invention and its embodiments have been described above. The description is not limiting; what is shown in the drawings is only one embodiment of the present invention, and the actual structure is not limited to it. In short, if a person of ordinary skill in the art, inspired by it and without departing from the spirit of the invention, devises structures and embodiments similar to this technical solution without inventive effort, these shall fall within the scope of protection of the present invention.

Claims

1. A logistics sorting data statistics method based on Spark technology, characterized by comprising the following steps:

2. The logistics sorting data statistics method based on Spark technology according to claim 1, characterized in that the method for obtaining the client's logistics sorting log files in step 1 is as follows:

First step: on each automatic logistics sorting line, the client must be preconfigured with the sorting line number, the time at which logs are uploaded to the server, and the log conditions to be analyzed;

Second step: the client compares the current time with the preconfigured log upload time; if they are equal, the third step is executed, otherwise the second step is repeated;

Third step: the logistics sorting log files are searched for files that meet the configured conditions; the matching log files are then placed into a new folder, and that folder is compressed;

3. The logistics sorting data statistics method based on Spark technology according to claim 2, characterized in that the client communicates with the server over the Internet, the server must provide a service interface for the client to access, and the client calls the server's Web Service to perform the log file upload.

4. The logistics sorting data statistics method based on Spark technology according to claim 2, characterized in that, after receiving the logistics sorting log files sent by the client, the server creates a local folder for storing the logs according to the sorting line number and log date information, stores the received logistics sorting log files separately to facilitate viewing and managing the logs, and decompresses the compressed log files.

5. The automatic logistics sorting remote diagnosis method according to claim 1, characterized in that, in step 2, the statistics method for the automatic logistics sorting data is as follows:

First step: the logistics sorting log information is read from the server;

Second step: the logistics sorting log information is uploaded to the distributed file system HDFS as raw data, achieving distributed storage of the log files;

Third step: the log files in HDFS are fed into the Spark computing platform; because the raw data cannot be processed by Spark directly, they must be converted into an initial Resilient Distributed Dataset (RDD) during input;

Fourth step: the Filter operator is used to filter out useless information in the log information, retaining the information useful for the statistics;

Seventh step: after the statistics are computed, the statistical results are written to HDFS, and the data statistics end.