CN115423841A

CN115423841A - Transportation terminal calibration method and system for bulk logistics

Info

Publication number: CN115423841A
Application number: CN202210943420.8A
Authority: CN
Inventors: 吴涛; 毛嘉莉; 朱开旋; 沈文怡; 周傲英
Original assignee: East China Normal University
Current assignee: East China Normal University
Priority date: 2022-08-08
Filing date: 2022-08-08
Publication date: 2022-12-02
Anticipated expiration: 2042-08-08
Also published as: CN115423841B

Abstract

The invention discloses a transportation terminal calibration method for bulk logistics, comprising the following steps: extracting the track of each freight truck waybill based on the waybill time information; eliminating track points with abnormal speed and obtaining the matched road of each remaining track point; taking stay points from the zero-speed subsequences of the off-road part of the track point sequence and clustering them to generate stay areas; clustering the road roll-out points corresponding to the stay points in each stay area and taking a cluster center as the road roll-out position of that stay area; merging stay areas according to their road roll-out positions to obtain stay hotspots; extracting multiple features of each stay hotspot and concatenating them into a multi-dimensional feature vector; training a classification model on the feature vectors and labels of the stay hotspots to serve as the transportation terminal identification model; and feeding the feature vectors of the stay hotspots to be identified into the model to obtain the transportation terminal matched with each waybill and update the terminal library.

Description

Transportation terminal calibration method and system for bulk logistics

Technical Field

The invention belongs to the technical field of data mining, and relates to a transportation terminal calibration method and system for bulk logistics.

Background

In the bulk freight field, the customer receiving sites of large manufacturing enterprises are stored in an address library as common transportation terminals and serve logistics applications such as vehicle scheduling, route planning and freight settlement. However, because of manual input errors, changes of delivery sites and similar causes, the address library contains transportation terminals whose addresses are vague or even wrong, which poses great challenges to the transportation link of bulk logistics. With the spread of positioning devices, the historical driving tracks, waybills and other data continuously generated by transport vehicles provide a data basis for calibrating transportation terminals. Therefore, to keep the freight link running efficiently, a transportation terminal calibration method based on waybill tracks is urgently needed to correct wrong transportation terminals in the address library in time.

With the rapid development of express delivery services, the calibration of express delivery sites has attracted wide attention; it is mainly done by identifying the real delivery site from the positions marked by couriers when deliveries are completed. However, bulk freight drivers sometimes return orders late, that is, the position at which a driver confirms an order can deviate greatly from the actual transportation terminal, so these methods cannot be applied directly to calibrating bulk logistics transportation terminals. Moreover, because bulk shipments are typically long-haul, freight drivers often stop at gas stations, temporary rest areas and similar places during a trip. These stay hotspots are close to the transportation terminal, vary in size and may even consist of several stay areas, which makes it difficult to locate and identify bulk transportation terminals accurately.

Disclosure of Invention

To overcome these shortcomings of the prior art, the invention provides a transportation terminal calibration method for bulk logistics. Based on the observation that trucks heading to the same stay hotspot usually turn off the road at nearby positions, the invention identifies stay hotspots by their road roll-out positions. First, stay areas are mined by clustering stay points with DBSCAN; next, the MeanShift method locates the densest group of road roll-out points, which is taken as the road roll-out position of the corresponding stay area; then, hierarchical clustering groups nearby road roll-out positions and merges their stay areas into stay hotspots; to identify transportation terminals accurately, an XGBoost transportation terminal identification model is built on the behavior features and region features of the stay hotspots; finally, the model is used to update the terminal library.

Specifically, the first stage of the method preprocesses the existing transportation data: waybills are matched with the raw tracks to obtain a set of waybill tracks, which then undergo preprocessing such as denoising and map matching. The second stage mines stay hotspots: DBSCAN clusters the stay points in the waybill tracks into stay areas; to identify stay hotspots that contain several stay areas, a stay-area merging strategy based on road roll-out positions is proposed, in which the road roll-out position of each stay area is extracted by MeanShift clustering of its road roll-out points, and the road roll-out positions are then grouped by hierarchical clustering so that the corresponding stay areas can be merged into stay hotspots. The third stage extracts the behavior feature set and region feature set of each stay hotspot, builds a transportation terminal identification model with XGBoost, and finally uses the model to identify transportation terminals, completing position calibration and address updating.

In order to achieve the technical purpose, the technical scheme adopted by the invention is as follows:

The invention provides a transportation terminal calibration method for bulk logistics, comprising the following steps:

S1: Waybill track extraction. The waybills executed by each truck are sorted by start time. For a preceding waybill whose time period overlaps that of the following transportation task, its completion time is adjusted so that it is no later than the start time of the following waybill. Then, based on each waybill's start and completion times, the time-sorted sequence of the vehicle's track points within that interval is extracted as the waybill track.

S2: Waybill track preprocessing. The speed of each track point is computed from the distance and time interval to the adjacent track point, and track points whose speed exceeds a given speed threshold are removed as abnormal track points. Meanwhile, candidate road segments within a distance threshold thr_r of each track point are taken as the hidden states of a hidden Markov model (since the sampling error of positioning devices is typically within 50 meters, thr_r is set to 50 meters), the distance between the track point and its perpendicular projection onto a candidate road is used as the state measurement, and the Viterbi algorithm finds the matched road segment of each track point. Track point sequences in the waybill track that cannot be matched to any road are regarded as the off-road part of the track.

S3: Stay area mining. For each waybill track, all zero-speed subsequences are extracted from the off-road track point sequence obtained in S2, and the first track point of each zero-speed subsequence is taken as a stay point, i.e., its position is taken as the stopping position of the transport vehicle. Because DBSCAN can find clusters of arbitrary shape in noisy spatial data, which suits irregularly shaped parking areas, all stay points are clustered with DBSCAN into stay point clusters of high density, and each stay point cluster is regarded as a stay area.

S4: Road roll-out position extraction. Trucks entering the same stay hotspot usually leave the road at nearby positions, i.e., their road roll-out points are close to each other, so the several stay areas that belong to one stay hotspot can be identified through their road roll-out positions. For each stay area obtained in S3, the last track point matched to the nearest road in the waybill track containing each stay point is taken as the road roll-out point of that stay point, giving a set of road roll-out points for each stay area. Since the MeanShift clustering method can accurately locate where data is densest, these road roll-out points are grouped with MeanShift into several clusters, and the center of the cluster with the most road roll-out points is taken as the road roll-out position of the stay area.

S5: Stay area merging, i.e., merging all stay areas that share a road roll-out position. The road roll-out positions extracted in S4 are clustered hierarchically into groups, and the stay areas (stay point clusters) belonging to each group are merged into a stay hotspot. For each stay hotspot, a convex hull algorithm gives the minimum convex polygon containing all of its stay points, which is taken as the area range of the stay hotspot, and the center of its stay areas is taken as the position of the stay hotspot. Then, for each stay hotspot, the stay duration list, stay start time list, waybill number list and similar information of the stay areas it contains are collected.

S6: Feature extraction. For the stay hotspots obtained in S5 and their stay duration lists, stay start time lists and waybill number lists, access behavior features (stay duration, access time period, stay frequency, cargo type, etc.) form the behavior feature set, and region features (adjacent road, nearby POIs (Points of Interest, i.e., places with a specific function such as shopping malls or gas stations), stay area size, etc.) form the region feature set. The corresponding feature values are concatenated in order into a multi-dimensional vector representing each stay hotspot.

S7: Transportation terminal modeling. The transportation terminal set is extracted as positive samples, and places where freight vehicles frequently stop, such as temporary rest areas, gas stations and repair points, are manually labeled as negative samples. All samples are represented as feature vectors by the method of S6, and an XGBoost classification model, i.e., a binary classifier consisting of multiple regression trees, is trained on the feature vectors and the positive/negative labels of the stay hotspot samples.

S8: Transportation terminal library update. Based on the waybill tracks, the set of candidate stay hotspots for the transportation terminal of each waybill is extracted and their feature vectors are fed into the transportation terminal identification model built in S7; the stay hotspot with the highest output probability is taken as the transportation terminal matched with the waybill. The address text of that position is obtained through the reverse geocoding service of Amap (Gaode Map), and the position and address of the corresponding transportation terminal in the terminal library are updated.

In order to optimize the technical scheme, the specific measures adopted in each step further comprise:

Step S1 specifically includes:

A freight driver's delayed order return (here, order return means confirming that the transportation task is complete) can make the time periods of two consecutive waybill tasks overlap. Therefore, the invention first sorts the waybills executed by each truck by start time; second, it finds the waybill sequences whose task time periods overlap and adjusts the end time of the preceding waybill so that it is no later than the start time of the following waybill; finally, it extracts the track of each waybill, i.e., the waybill track, according to the start and end times of each of the truck's waybills.
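As an illustration only, the overlap adjustment and track interception of S1 can be sketched in Python as below. The `Waybill` record, its field names and the `(timestamp, lon, lat)` point layout are hypothetical choices for this sketch, not data structures defined by the invention.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Waybill:  # hypothetical record layout
    waybill_id: str
    truck_id: str
    start: float  # task start time, Unix seconds
    end: float    # task completion (order return) time, Unix seconds

def adjust_overlaps(waybills: List[Waybill]) -> List[Waybill]:
    """Sort each truck's waybills by start time and clip the end time of a
    preceding waybill so it is no later than the start of the next one."""
    by_truck: Dict[str, List[Waybill]] = {}
    for wb in waybills:
        by_truck.setdefault(wb.truck_id, []).append(wb)
    adjusted: List[Waybill] = []
    for truck_waybills in by_truck.values():
        ordered = sorted(truck_waybills, key=lambda w: w.start)
        successors: List[Optional[Waybill]] = ordered[1:] + [None]
        for cur, nxt in zip(ordered, successors):
            end = cur.end
            if nxt is not None and end > nxt.start:  # overlap caused by a delayed return
                end = nxt.start
            adjusted.append(replace(cur, end=end))  # copy, input is left unchanged
    return adjusted

def waybill_track(wb: Waybill, points: List[Tuple[float, float, float]]) -> List[Tuple[float, float, float]]:
    """Return the truck's (timestamp, lon, lat) points inside [start, end], sorted by time."""
    return sorted(p for p in points if wb.start <= p[0] <= wb.end)
```

Returning adjusted copies rather than mutating the input keeps the raw waybill records available for auditing.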

Step S2 specifically includes:

Because of signal interruptions of positioning devices and human factors, the sampled track data may contain abnormal points, i.e., track points whose time interval from the previous point is normal but whose distance from it is abnormally large. Such points are called noise points; noise points with abnormal speed mainly hinder the accurate identification of subsequent stay points and thus the accurate localization of stay areas. The invention therefore computes the speed of each track point from its distance and time interval to the previous track point and removes track points whose speed exceeds the threshold thr_sp as noise points. The speed threshold is set to the maximum speed limit of the transport vehicle; for example, with a truck speed limit of 100 km/h, thr_sp is set to 27.7 m/s. Meanwhile, candidate roads within a distance threshold thr_r (50 meters here) of each track point are taken as the hidden states of a hidden Markov model, the distance between the track point and its perpendicular projection onto each candidate road is taken as the state measurement, and the Viterbi algorithm searches for the best-matching road segment of each track point as its driving road segment. Track point sequences in the waybill track that cannot be matched to any road are regarded as off-road track point sequences.
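The speed-based denoising can be sketched as follows, assuming points are `(timestamp_s, lon_deg, lat_deg)` tuples and using the haversine great-circle distance; the Viterbi map matching is not shown. Comparing each point against the previously *kept* point, and dropping points with non-increasing timestamps, are assumptions of this sketch, chosen so that a single noise point does not also cause the valid point after it to be discarded.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # (timestamp_s, lon_deg, lat_deg), assumed layout

def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters on a spherical Earth (radius 6371 km)."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def remove_speed_outliers(track: List[Point], thr_sp: float = 27.7) -> List[Point]:
    """Drop points whose speed from the previous kept point exceeds thr_sp (m/s)."""
    if not track:
        return []
    kept = [track[0]]
    for p in track[1:]:
        prev = kept[-1]
        dt = p[0] - prev[0]
        if dt <= 0:
            continue  # duplicate or out-of-order timestamp: treated as noise here
        speed = haversine_m(prev[1], prev[2], p[1], p[2]) / dt
        if speed <= thr_sp:
            kept.append(p)
    return kept
```

For example, a jump of about 11 km within 10 s (about 1100 m/s) is removed, while the following point is checked against the last valid point rather than against the outlier.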

Step S3 specifically includes:

To prevent stops caused by waiting at traffic lights, traffic jams and similar situations from affecting stay area identification, this step considers only the track point sequences that are not matched to a road. For each waybill track, in the off-road track point sequence obtained in S2, the first track point of each run of consecutive zero-speed track points is extracted as a stay point; its timestamp is taken as the stay start time, and the time interval between the first and last points of that run is taken as the stay duration. All stay points are then clustered with DBSCAN. DBSCAN picks an unvisited stay point and finds all stay points within distance eps of it; if their number reaches min_samples, a new cluster is created for the point, otherwise the point is marked as noise. The cluster is then expanded by repeatedly adding stay points that are directly density-reachable (i.e., within eps of a core point already in the cluster). These steps are repeated until every stay point has been assigned to a cluster or marked as noise. Each stay point cluster, i.e., a maximal set of density-connected stay points, is regarded as a stay area. To avoid clustering stay points on the two sides of a road into the same cluster, eps is set to 5 meters according to the minimum road width (i.e., the maximum distance for density reachability is 5 meters), and min_samples is set to 5 (i.e., at least 5 stay points are needed to form a cluster).

The clustering parameters are set mainly according to the purpose of the clustering and the actual scenario. The clustering radius eps is set to 5 m according to the minimum road-section width in the application scenario: stay points on opposite sides of a road are more than 5 m apart and therefore are not clustered into one stay point cluster (i.e., are not identified as one stay area). min_samples is set empirically.
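A minimal, dependency-free sketch of stay point extraction and DBSCAN follows. It assumes the off-road track has already been projected to planar coordinates in meters as `(t, x, y, speed)` tuples (a hypothetical layout); the neighbour search is a plain O(n²) scan, adequate for illustration but not for large datasets.

```python
import math
from collections import deque
from typing import List, Optional, Tuple

def extract_stay_points(track: List[Tuple[float, float, float, float]]):
    """track: time-sorted (t, x, y, speed) points of the off-road part.
    Returns (x, y, stay_start_time, stay_duration) for each maximal zero-speed run."""
    stays, i = [], 0
    while i < len(track):
        if track[i][3] == 0:
            j = i
            while j + 1 < len(track) and track[j + 1][3] == 0:
                j += 1
            t0, x0, y0, _ = track[i]  # first point of the run is the stay point
            stays.append((x0, y0, t0, track[j][0] - t0))
            i = j + 1
        else:
            i += 1
    return stays

def dbscan(points: List[Tuple[float, float]], eps: float = 5.0, min_samples: int = 5) -> List[int]:
    """Plain DBSCAN on planar points; returns a cluster index per point, -1 for noise."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n

    def neighbors(i: int) -> List[int]:
        xi, yi = points[i]
        return [j for j in range(n) if math.hypot(points[j][0] - xi, points[j][1] - yi) < eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_samples:  # neighbourhood count includes the point itself
            labels[i] = -1            # provisional noise, may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(nbrs)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster   # former noise becomes a border point, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_samples:  # j is a core point: keep expanding
                queue.extend(j_nbrs)
    return [lab if lab is not None else -1 for lab in labels]
```

In practice a library implementation such as scikit-learn's `DBSCAN(eps=5, min_samples=5)` on projected coordinates would normally be used instead; note that it treats the radius as inclusive, whereas this sketch uses the strict "less than eps" wording of the description.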

Step S4 specifically includes:

For each stay area obtained in S3, the last track point matched to the nearest road segment in the waybill track containing a stay point is extracted as the road roll-out point of that stay point, and the direction from the road roll-out point to the next track point is taken as its direction. This yields a set of road roll-out points for the stay points of each stay area, which are then clustered with MeanShift. MeanShift randomly selects a road roll-out point, takes the mean of the vectors from it to the other road roll-out points within a given radius R as the direction and distance of the next shift, and repeats the shift until the shift distance is smaller than a given parameter D; the road roll-out points traversed along the way are assigned to one cluster. The parameters R and D are set empirically to 30 meters and 10 meters, respectively, based on the distribution of road roll-out points in the application scenario: road roll-out points less than 30 meters apart are regarded as one road roll-out position, and 10 meters serves as the convergence condition for updating the cluster centroid. Finally, the longitude and latitude of the center of the cluster with the most road roll-out points are taken as the road roll-out position, and the average direction of all road roll-out points in that cluster is taken as the direction of the road roll-out position.
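The roll-out position extraction can be sketched as a flat-kernel mean shift, assuming planar `(x, y)` coordinates in meters and bearings in degrees. This is one reading of the description rather than the inventors' exact procedure: every point is used as a seed, seeds whose modes end up within D of each other are treated as one cluster, and the average direction is computed as a circular mean so that 350° and 10° average to 0° rather than 180°.

```python
import math
from typing import List, Tuple

def mean_shift_mode(start, points, R=30.0, D=10.0, max_iter=100):
    """Shift a window of radius R to the mean of the points inside it until the shift is < D."""
    cx, cy = start
    for _ in range(max_iter):
        window = [(x, y) for x, y in points if math.hypot(x - cx, y - cy) < R]
        if not window:
            break
        nx = sum(x for x, _ in window) / len(window)
        ny = sum(y for _, y in window) / len(window)
        shift = math.hypot(nx - cx, ny - cy)
        cx, cy = nx, ny
        if shift < D:
            break
    return cx, cy

def road_rollout_position(points: List[Tuple[float, float]], directions: List[float], R=30.0, D=10.0):
    """Return ((x, y), direction_deg) of the largest mean-shift cluster (needs >= 1 point)."""
    modes = [mean_shift_mode(p, points, R, D) for p in points]
    centers, members = [], []
    for idx, m in enumerate(modes):
        for k, c in enumerate(centers):
            if math.hypot(m[0] - c[0], m[1] - c[1]) < D:  # same mode
                members[k].append(idx)
                break
        else:
            centers.append(m)
            members.append([idx])
    idxs = max(members, key=len)  # cluster with the most road roll-out points
    cx = sum(points[i][0] for i in idxs) / len(idxs)
    cy = sum(points[i][1] for i in idxs) / len(idxs)
    s = sum(math.sin(math.radians(directions[i])) for i in idxs)
    c = sum(math.cos(math.radians(directions[i])) for i in idxs)
    return (cx, cy), math.degrees(math.atan2(s, c)) % 360
```

With five roll-out points around (1, 1) and two around (200, 0), the sketch returns the position (1, 1) and a direction of about 0°.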

Step S5 specifically includes:

For the set of road roll-out positions extracted in S4, hierarchical clustering generates groups of roll-out positions. Specifically, each road roll-out position first forms its own cluster, and the two clusters with the smallest distance whose direction difference is below the threshold thr_dir (set to 15 degrees in the invention) are merged; second, the average position and average direction of the merged cluster are recomputed. These steps are repeated until no two clusters remain whose distance is below the distance threshold thr_dis (set to 10 meters in the invention) and whose direction difference is below thr_dir. Then, the stay areas corresponding to each group of road roll-out positions are merged into a stay hotspot. For each stay hotspot, a convex hull algorithm extracts the minimum convex polygon covering all of its stay points as the area range of the stay hotspot, and the center of this polygon is computed as the position of the stay hotspot. Finally, the waybill number list, stay duration list, stay start time list and similar information of the stay points in each stay hotspot are collected and stored.
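The agglomerative grouping of road roll-out positions can be sketched as below, assuming each position is given as a hypothetical `(x, y, direction_deg)` tuple in planar meters. The merged centroid is weighted by the number of members, so it equals the average position of all positions in the cluster, and directions are averaged circularly.

```python
import math
from typing import List, Tuple

def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two bearings in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def circular_mean(angles, weights) -> float:
    s = sum(w * math.sin(math.radians(a)) for a, w in zip(angles, weights))
    c = sum(w * math.cos(math.radians(a)) for a, w in zip(angles, weights))
    return math.degrees(math.atan2(s, c)) % 360

def group_rollout_positions(positions: List[Tuple[float, float, float]],
                            thr_dis: float = 10.0, thr_dir: float = 15.0) -> List[List[int]]:
    """Repeatedly merge the closest pair with distance < thr_dis and direction
    difference < thr_dir. Returns groups as lists of indices into positions."""
    clusters = [{"x": x, "y": y, "dir": d, "members": [i]} for i, (x, y, d) in enumerate(positions)]
    while True:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = clusters[a], clusters[b]
                dist = math.hypot(ca["x"] - cb["x"], ca["y"] - cb["y"])
                if dist < thr_dis and angle_diff(ca["dir"], cb["dir"]) < thr_dir:
                    if best is None or dist < best[0]:
                        best = (dist, a, b)
        if best is None:
            break  # no mergeable pair left
        _, a, b = best
        ca, cb = clusters[a], clusters[b]
        na, nb = len(ca["members"]), len(cb["members"])
        merged = {
            "x": (ca["x"] * na + cb["x"] * nb) / (na + nb),
            "y": (ca["y"] * na + cb["y"] * nb) / (na + nb),
            "dir": circular_mean([ca["dir"], cb["dir"]], [na, nb]),
            "members": ca["members"] + cb["members"],
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return [sorted(c["members"]) for c in clusters]
```

Two positions 4 m apart but facing opposite directions (for example, the two sides of a divided road) stay in separate groups, which is the purpose of the direction threshold.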

Step S6 specifically includes:

Step 6.1) Stay duration distribution feature extraction: it is observed that trucks usually stay 30 to 60 minutes at a transportation terminal and 10 to 15 minutes at a gas station, while stays at hotspots such as rest areas and repair points can last several hours. The invention therefore divides stay duration into the intervals (0, 15 min], (15 min, 30 min], (30 min, 60 min], (60 min, 120 min] and (120 min, +∞), counts the proportion of stay points falling into each interval from the stay duration list, and concatenates the proportions into a 5-dimensional vector as the feature representation of the stay hotspot.
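A short sketch of the 5-dimensional stay duration feature, assuming durations are given in minutes. `bisect_left` places a value equal to a bound into the lower bin, which matches the right-closed intervals; a zero duration also falls into the first bin, which is an assumption of this sketch.

```python
import bisect
from typing import List

# Upper bounds (minutes) of (0,15], (15,30], (30,60], (60,120]; the fifth bin is (120, +inf)
DURATION_BOUNDS_MIN = [15, 30, 60, 120]

def duration_feature(durations_min: List[float]) -> List[float]:
    counts = [0] * (len(DURATION_BOUNDS_MIN) + 1)
    for d in durations_min:
        counts[bisect.bisect_left(DURATION_BOUNDS_MIN, d)] += 1
    total = len(durations_min)
    return [c / total for c in counts] if total else [0.0] * len(counts)
```

For example, `duration_feature([10, 15, 20, 45, 90, 200, 300, 5])` yields `[0.375, 0.125, 0.125, 0.125, 0.25]`.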

Step 6.2) Access time period distribution feature extraction: from the stay start time list, the proportion of stay points in each one-hour period of the day is counted, and the proportions for the 24 hours are concatenated into a 24-dimensional feature vector.
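The 24-dimensional hour-of-day feature might look like this, assuming stay start times are Unix timestamps and that local time is UTC+8 (an assumption for data collected in China; the description does not state a time zone).

```python
from datetime import datetime, timedelta, timezone
from typing import List

LOCAL_TZ = timezone(timedelta(hours=8))  # assumed local time zone (UTC+8)

def hour_feature(start_times: List[float]) -> List[float]:
    """Proportion of stay points whose stay starts in each local hour 0..23."""
    counts = [0] * 24
    for ts in start_times:
        counts[datetime.fromtimestamp(ts, tz=LOCAL_TZ).hour] += 1
    total = len(start_times)
    return [c / total for c in counts] if total else [0.0] * 24
```

Timestamp 0 corresponds to 08:00 local time under this assumption, so it is counted in hour 8.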

Step 6.3) Stay frequency distribution feature extraction: the stay frequency distribution is represented as a vector [fre_1, fre_2, ..., fre_n], where n is the highest number of times any transport vehicle has historically stayed at a stay hotspot, and fre_i (0 < i ≤ n) is the proportion of transportation tasks that stayed i times at the stay hotspot, obtained from the corresponding waybill list.
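A sketch of the stay frequency vector, assuming the hotspot's waybill number list holds one entry per stay point, so that repeated waybill numbers indicate repeated stays by the same transportation task; `n` is the global maximum described above.

```python
from collections import Counter
from typing import List

def stay_frequency_feature(waybill_ids: List[str], n: int) -> List[float]:
    """fre_i = share of transportation tasks that stayed exactly i times at this hotspot."""
    stays_per_task = Counter(waybill_ids)          # waybill -> number of stays
    hist = Counter(stays_per_task.values())        # number of stays -> number of tasks
    tasks = len(stays_per_task)
    return [hist.get(i, 0) / tasks if tasks else 0.0 for i in range(1, n + 1)]
```

With waybills W1 (2 stays), W2 (1 stay) and W3 (3 stays) and n = 4, each of fre_1, fre_2 and fre_3 is 1/3 and fre_4 is 0.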

Step 6.4) Cargo type feature extraction: the number of distinct cargo types associated with the stay hotspot is counted from its waybill list and used as the feature.

Step 6.5) Adjacent road grade feature extraction: the road segment closest to the stay hotspot is retrieved from the OSM (OpenStreetMap) open-source map to obtain its road grade, which is represented by one-hot encoding (i.e., N road grade states are encoded with an N-bit state register).
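Steps 6.4 and 6.5 reduce to a distinct count and a one-hot vector. The grade list below uses common OSM `highway` tag values, but which N grades the invention actually encodes is not stated, so the list is an assumption; an unknown tag maps to an all-zero vector in this sketch.

```python
from typing import List

# Assumed road grades, taken from common OSM "highway" tag values
OSM_ROAD_GRADES = ["motorway", "trunk", "primary", "secondary", "tertiary",
                   "unclassified", "residential", "service"]

def cargo_type_feature(cargo_types: List[str]) -> List[int]:
    """Number of distinct cargo types across the hotspot's waybills."""
    return [len(set(cargo_types))]

def road_grade_feature(highway_tag: str) -> List[int]:
    """One-hot encoding of the grade of the nearest road segment."""
    return [1 if highway_tag == grade else 0 for grade in OSM_ROAD_GRADES]
```

For example, `road_grade_feature("primary")` gives `[0, 0, 1, 0, 0, 0, 0, 0]`.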

Step 6.6) Nearby POI category feature extraction: it is observed that in bulk logistics, many factories and companies are usually located near a transportation terminal; transport drivers tend to rest and eat in open spaces near restaurants; and stay hotspots at gas stations, repair points and the like are usually located in areas where car maintenance POIs cluster. Therefore, for each stay hotspot, the numbers of POIs of types such as factory, company, catering, car maintenance and gas station within 3000 meters are obtained through the POI query interface of Amap (Gaode Map), the proportion of each type is computed, and the proportions are concatenated into the feature representation of the stay hotspot.
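Once POIs have been retrieved, the category proportions are simple to compute. The map service query itself is not shown; the sketch assumes the results are already available as `(category, distance_m)` pairs, and the category labels are hypothetical placeholders for the types named above.

```python
from typing import List, Tuple

# Hypothetical category labels for the POI types named in the description
POI_CATEGORIES = ["factory", "company", "catering", "car_maintenance", "gas_station"]

def poi_feature(pois: List[Tuple[str, float]], radius_m: float = 3000.0) -> List[float]:
    """Share of each POI category among the POIs within radius_m of the stay hotspot."""
    counts = {c: 0 for c in POI_CATEGORIES}
    for category, dist in pois:
        if dist <= radius_m and category in counts:
            counts[category] += 1
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in POI_CATEGORIES]
```

POIs outside the radius or outside the listed categories are ignored, so the proportions are taken over the listed categories only.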

Step 6.7) Stay area size feature extraction: based on the area range of the stay hotspot obtained in S5, the corresponding convex polygon is expressed as a vertex sequence {(x_1, y_1), (x_2, y_2), ..., (x_h, y_h)}, and the area of the stay hotspot is obtained by the formula

$$S = \frac{1}{2}\left|\sum_{k=1}^{h}\left(x_k y_{k+1} - x_{k+1} y_k\right)\right|, \qquad (x_{h+1}, y_{h+1}) = (x_1, y_1)$$
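The convex hull of S5 and the area formula above can be sketched together. Andrew's monotone chain is used here as one standard convex hull algorithm (the description does not say which one is used), and coordinates are assumed to be planar meters so that the area comes out in square meters.

```python
from typing import List, Tuple

Pt = Tuple[float, float]

def convex_hull(points: List[Pt]) -> List[Pt]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Pt, a: Pt, b: Pt) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Pt] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Pt] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # last point of each chain repeats the other's first

def polygon_area(vertices: List[Pt]) -> float:
    """Shoelace formula S = |sum(x_k*y_{k+1} - x_{k+1}*y_k)| / 2, indices wrapping around."""
    h = len(vertices)
    s = 0.0
    for k in range(h):
        x1, y1 = vertices[k]
        x2, y2 = vertices[(k + 1) % h]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2
```

For the stay points (0, 0), (10, 0), (10, 10), (0, 10) and (5, 5), the hull keeps the four corners and the area is 100.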

Step 6.8) Finally, the feature vectors above are concatenated in order into a multi-dimensional vector as the feature representation of the stay hotspot.

Step S7 specifically includes:

A manually labeled set of stay hotspot samples {h_1, h_2, ..., h_q} is used to train the transportation terminal identification model. All labeled stay hotspot samples are represented as feature vectors by the method of S6, and M regression trees, denoted f_k(·) (1 ≤ k ≤ M), are learned iteratively by the XGBoost model from the feature vectors and labels of the samples. The construction process is as follows:

$$L^{(t)} = \sum_{i=1}^{q} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(h_i)\right) + \Gamma(f_t), \qquad \Gamma(f_t) = \eta T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2$$

where L^(t) is the objective function at the t-th iteration, l(·,·) is the training loss, y_i is the true label of the i-th (1 ≤ i ≤ q) stay hotspot sample h_i, ŷ_i^(t-1) is the model's prediction for sample h_i after t-1 iterations, f_t(h_i) is the prediction for h_i of the tree learned in the t-th iteration, Γ(f_t) is the regularization term, η and λ are regularization coefficients, T is the number of leaf nodes, and w_j is the output value of the j-th leaf node.

The score of stay hotspot h_i is then obtained by summing the output scores of the M regression trees, Score_i = f_1(h_i) + f_2(h_i) + ... + f_M(h_i), and is mapped to a probability by the logistic function

$$P_i = \frac{1}{1 + e^{-\mathrm{Score}_i}}$$

The trained model, consisting of M regression trees that output the probability that a stay hotspot is a transportation terminal, is taken as the final transportation terminal identification model.
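The inference side of the model and the regularization term can be illustrated without a training library. In this sketch a regression tree is modelled abstractly as any function from a feature vector to a leaf score (an assumption for illustration); the logistic function is written in a numerically stable form so that very negative scores do not overflow `math.exp`.

```python
import math
from typing import Callable, List, Sequence

Tree = Callable[[Sequence[float]], float]  # abstract stand-in for one regression tree

def hotspot_score(trees: List[Tree], x: Sequence[float]) -> float:
    """Score_i = f_1(h_i) + ... + f_M(h_i)."""
    return sum(f(x) for f in trees)

def terminal_probability(trees: List[Tree], x: Sequence[float]) -> float:
    """Logistic mapping P_i = 1 / (1 + exp(-Score_i)), computed stably."""
    s = hotspot_score(trees, x)
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    z = math.exp(s)
    return z / (1.0 + z)

def regularization(leaf_weights: List[float], eta: float, lam: float) -> float:
    """Gamma(f_t) = eta * T + 0.5 * lam * sum(w_j^2) for one tree with T leaves."""
    return eta * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)
```

For training, the xgboost Python package offers `xgboost.XGBClassifier` with `objective="binary:logistic"`; there `n_estimators` corresponds to M, `gamma` to the leaf-count coefficient η and `reg_lambda` to λ, and `predict_proba` returns the logistic output described above.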

Step S8 specifically includes:

For each transportation terminal to be calibrated, the corresponding set of waybill tracks is extracted, and the stay hotspots whose waybill lists contain at least one waybill of this set form the candidate set. Their feature vectors are fed into the transportation terminal identification model built in S7, and the stay hotspot with the highest output probability is taken as the transportation terminal matched with the waybills. The address of this transportation terminal is then obtained through the reverse geocoding API of Amap (Gaode Map), completing the position calibration and address update of the original terminal.
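The candidate selection in S8 can be sketched as below. The hotspot dictionaries and the `predict_proba` callable are hypothetical stand-ins for the stored stay hotspot records and the trained model; the reverse geocoding call is left out rather than guessing at the map service's API.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set

def calibrate_terminal(terminal_waybills: Set[str],
                       hotspots: List[Dict],
                       predict_proba: Callable[[Sequence[float]], float]) -> Optional[Dict]:
    """Pick the candidate hotspot (sharing at least one waybill with the terminal)
    with the highest predicted terminal probability; None if there is no candidate."""
    candidates = [h for h in hotspots if h["waybills"] & terminal_waybills]
    if not candidates:
        return None
    return max(candidates, key=lambda h: predict_proba(h["features"]))
```

Restricting the search to hotspots visited on the terminal's own waybills keeps a high-probability but unrelated hotspot, such as hotspot C in the test below, from being selected.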

The invention also provides a transportation terminal calibration system for implementing the above transportation terminal calibration method, comprising a data preprocessing module, a stay hotspot mining module and a terminal library updating module;

the data preprocessing module is used for extracting waybill tracks based on task time intervals, denoising track points based on abnormal speed, and map matching based on the Viterbi algorithm;

the stay hotspot mining module is used for DBSCAN-based stay point clustering, MeanShift-based road roll-out position identification, and hierarchical-clustering-based merging of stay point clusters;

the terminal library updating module is used for behavior- and region-based feature set extraction, XGBoost-based transportation terminal modeling, and transportation terminal identification and calibration.

The invention has the following beneficial effects:

1. Based on the observation that drivers heading to the same stay hotspot turn off the road at similar positions, the method uses map matching to take the last road-matched track point of each waybill track as a road roll-out point, clusters the road roll-out points with MeanShift to locate the road roll-out position of each stay area, and finally applies hierarchical clustering to the road roll-out positions to identify and merge the different stay areas belonging to the same stay hotspot, so that transportation terminals of different sizes and with multiple stay areas can be identified in bulk logistics.

2. An XGBoost binary classification model is built to identify transportation terminals for calibration by combining a behavior feature set (stay duration, access time period, stay frequency, cargo type, etc.) with a region feature set (adjacent road segment, nearby POIs, stay area size, etc.).

3. On a real bulk logistics dataset, compared with existing express delivery site calibration methods, the proposed method achieves the lowest mean absolute calibration error (MAE), about 90.86% better than the second-best method, and the most transportation terminals calibrated with an error within 3000 m, about 72.92% more than the second-best method.

Drawings

FIG. 1 is a block diagram of the technical framework of transportation terminal calibration for bulk logistics.

FIG. 2 is a schematic structural diagram of the transportation terminal identification model constructed by the invention. The model consists of M regression trees; for a stay hotspot h_i to be predicted, each regression tree outputs a score, the scores are summed into a final score, and the final score is mapped by the logistic function to a probability, which serves as the probability that the stay hotspot is a transportation terminal.

FIG. 3 compares the stay hotspot identification results of different methods.

Detailed Description

The present invention is described in further detail below with reference to specific embodiments and the accompanying drawings. Except for the content specifically mentioned below, the procedures, conditions and experimental methods for carrying out the invention are general and common knowledge in the art, and the invention is not particularly limited in these respects.

Existing calibration methods for delivery sites only determine the delivery position of goods from couriers' marks and are not suitable for bulk freight scenarios with many delayed order returns. The invention therefore provides a transportation terminal calibration strategy based on road roll-out positions to accurately identify transportation terminals that differ in size and lie close to one another. First, stay areas are mined by clustering stay points with DBSCAN; meanwhile, the road roll-out points corresponding to the stay points, extracted through map matching, are clustered with MeanShift to identify the road roll-out positions of the different stay areas; on this basis, stay areas sharing the same road roll-out position are merged into a stay hotspot. Then, access behavior features (stay duration, access time period, stay frequency, cargo type, etc.) and region features (adjacent road segment, nearby POIs, stay area size, etc.) are extracted for each stay hotspot, an XGBoost binary classification model is built to identify transportation terminals, and the model is used to update the transportation terminal library.

Specifically, the invention discloses a transportation terminal calibration method for bulk logistics. As shown in FIG. 1, the method comprises three stages. In the data preprocessing stage, a waybill track set is obtained from historical tracks and waybills, abnormal track points are identified from their speeds, and the matching road segment of each track point is obtained by Viterbi map matching. In the stay hotspot mining stage, stop points are clustered with DBSCAN to detect stay areas, road roll-out points are clustered with Meanshift to identify the road roll-out position of each stay area, the road roll-out positions are then grouped by hierarchical clustering, and the stay areas belonging to each group are merged into a stay hotspot. In the terminal library updating stage, a behavior feature set and an area feature set are extracted for each stay hotspot, a transportation terminal identification model is built with XGBoost, transportation terminals are identified among the stay hotspots with this model, and the positions of the original terminals are calibrated and their addresses updated.

As shown in FIG. 1, the three-stage transportation terminal calibration framework of the invention comprises the following eight steps:

S1: Waybill track extraction. The waybills of each truck are sorted by their start times. When the period of a waybill overlaps that of the next transportation task, its task completion time is adjusted so that it is no later than the start time of the next waybill. The track corresponding to each waybill, i.e. the waybill track, is then extracted using the start and completion times of each of the truck's waybills.

In an embodiment, step S1 specifically includes:

Because freight drivers often return orders late (here, returning an order means confirming that the transportation task is complete), the periods of two consecutive waybill tasks may overlap. The invention therefore first sorts each truck's waybills by their start times; it then finds the waybill sequences whose task periods overlap and moves the end time of the preceding waybill to no later than the start time of the following waybill; finally, it extracts the track of each waybill, i.e. the waybill track, from the start and end times of each of the truck's waybills.
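A minimal sketch of S1, assuming each waybill carries a truck ID and Unix-second start and end times, and that a truck's raw trajectory is a list of (timestamp, lon, lat) tuples. The `Waybill` class and both function names are hypothetical illustrations, not part of the original disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

TrackPoint = Tuple[float, float, float]  # (timestamp in s, lon, lat)


@dataclass
class Waybill:
    waybill_id: str
    truck_id: str
    start: float  # task start time (Unix seconds)
    end: float    # confirmed completion time; may be late due to delayed order return


def adjust_overlaps(waybills: List[Waybill]) -> List[Waybill]:
    """Sort each truck's waybills by start time and clip any end time that runs
    past the start of the truck's next waybill (objects are updated in place)."""
    by_truck: Dict[str, List[Waybill]] = {}
    for wb in waybills:
        by_truck.setdefault(wb.truck_id, []).append(wb)
    adjusted: List[Waybill] = []
    for truck_waybills in by_truck.values():
        truck_waybills.sort(key=lambda wb: wb.start)
        for cur, nxt in zip(truck_waybills, truck_waybills[1:]):
            if cur.end > nxt.start:  # overlapping task periods
                cur.end = nxt.start
        adjusted.extend(truck_waybills)
    return adjusted


def extract_waybill_track(wb: Waybill, truck_track: List[TrackPoint]) -> List[TrackPoint]:
    """Waybill track: the truck's points whose timestamps fall in [start, end]."""
    return [p for p in truck_track if wb.start <= p[0] <= wb.end]
```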

S2: Waybill track preprocessing. The speed of each track point of each waybill track is calculated, and track points whose sampling interval from the preceding track point is normal but whose distance from it is abnormally large are eliminated as abnormal-speed track points, i.e. track points whose speed exceeds a speed threshold are removed. Meanwhile, each waybill track is map-matched with a Viterbi-based method to obtain the set of track points matched to a road segment and the set of track points not matched to any road segment.
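The speed filter can be sketched as follows. This is an illustration under stated assumptions: points are (timestamp, lon, lat) tuples, distances use the haversine formula on a spherical Earth, the first point is assumed valid, and each point's speed is measured against the last point that was kept (so that an outlier does not cause its valid successor to be dropped). The function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # (timestamp in s, lon, lat)

THR_SP = 27.7  # m/s, i.e. the 100 km/h truck speed limit given in the text


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters on a spherical Earth (R = 6371 km)."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def remove_speed_outliers(track: List[Point], thr_sp: float = THR_SP) -> List[Point]:
    """Drop points whose speed relative to the last kept point exceeds thr_sp."""
    if not track:
        return []
    kept = [track[0]]
    for p in track[1:]:
        prev = kept[-1]
        dt = p[0] - prev[0]
        if dt <= 0:
            continue  # duplicate or out-of-order timestamp: dropped (assumption)
        if haversine_m(prev[1], prev[2], p[1], p[2]) / dt <= thr_sp:
            kept.append(p)
    return kept
```

For example, in a track sampled every 10 s, a point that jumps about 47 km away is removed, while the next point, about 95 m from the last kept point, is retained.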

In an embodiment, step S2 specifically includes:

Because of positioning signal interruptions and human factors, the raw tracks contain abnormal points whose sampling interval from the preceding point is normal but which lie far from it. To prevent these noise points from affecting stop point identification, the invention calculates the speed of each track point from its distance and time interval to the preceding track point, and treats track points whose speed exceeds the threshold thr_sp as noise and removes them (since the maximum speed limit for trucks is 100 km/h, thr_sp is set to 27.7 m/s). Meanwhile, the waybill track is map-matched with a Viterbi-based method to obtain the matched road segment of each track point, and the unmatched track segments are marked.
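Claim 3 describes the map matching as a hidden Markov model whose hidden states are the candidate roads within thr_r of each track point, scored by the perpendicular distance to each road and decoded with the Viterbi algorithm. The sketch below assumes the candidate search has already been done, so each point arrives with a list of (road_id, perpendicular distance) pairs. It uses a Gaussian emission on that distance with a hypothetical `sigma`, and replaces the usual route-distance transition model with a hypothetical constant `switch_penalty` for changing roads. Points with no candidate are left unmatched (None), which is what S3 later treats as off-road.

```python
from typing import List, Optional, Sequence, Tuple

Candidate = Tuple[str, float]  # (road_id, perpendicular distance to that road in meters)


def _viterbi_segment(seg: Sequence[Sequence[Candidate]], sigma: float,
                     switch_penalty: float) -> List[str]:
    def emit(d: float) -> float:
        return -0.5 * (d / sigma) ** 2  # log of a Gaussian on distance, constant dropped

    def trans(r1: str, r2: str) -> float:
        return 0.0 if r1 == r2 else -switch_penalty  # hypothetical transition model

    score = [emit(d) for _, d in seg[0]]
    back: List[List[int]] = []
    for t in range(1, len(seg)):
        new_score, ptr = [], []
        for road, d in seg[t]:
            k = max(range(len(seg[t - 1])),
                    key=lambda j: score[j] + trans(seg[t - 1][j][0], road))
            new_score.append(score[k] + trans(seg[t - 1][k][0], road) + emit(d))
            ptr.append(k)
        score = new_score
        back.append(ptr)
    k = max(range(len(score)), key=score.__getitem__)
    path = [k]
    for ptr in reversed(back):  # backtrack from the last point to the first
        k = ptr[k]
        path.append(k)
    path.reverse()
    return [seg[t][k][0] for t, k in enumerate(path)]


def viterbi_match(candidates: Sequence[Sequence[Candidate]], sigma: float = 10.0,
                  switch_penalty: float = 2.0) -> List[Optional[str]]:
    """Best road per track point; None where no candidate road lies within thr_r."""
    result: List[Optional[str]] = [None] * len(candidates)
    i = 0
    while i < len(candidates):
        if not candidates[i]:
            i += 1
            continue
        j = i
        while j + 1 < len(candidates) and candidates[j + 1]:
            j += 1
        result[i:j + 1] = _viterbi_segment(candidates[i:j + 1], sigma, switch_penalty)
        i = j + 1
    return result
```

With these settings, a point slightly closer to road B than to road A is still matched to A when its neighbors are clearly on A, which is the smoothing effect that motivates Viterbi decoding over per-point nearest-road matching.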

S3: Stay area mining. For the unmatched track point sequences of each waybill track extracted in S2, the first point of each zero-speed track point sequence is taken as a stop point, and all stop points are clustered with DBSCAN to mine stay areas (i.e. stop point clusters).
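Stop point extraction can be sketched as below, assuming each unmatched sequence is a list of points that already carry a speed value. Following claim 4, the stay duration is the time between the first and last point of the zero-speed run. The `eps_speed` tolerance is a hypothetical parameter; its default of 0 reproduces the exact zero-speed rule in the text.

```python
from typing import List, NamedTuple


class TrackPoint(NamedTuple):
    t: float      # timestamp (s)
    lon: float
    lat: float
    speed: float  # m/s


class StopPoint(NamedTuple):
    lon: float
    lat: float
    start: float     # stay start time: timestamp of the first zero-speed point
    duration: float  # seconds between the first and last zero-speed point


def extract_stop_points(segment: List[TrackPoint], eps_speed: float = 0.0) -> List[StopPoint]:
    """One stop point per run of consecutive zero-speed points in an unmatched segment."""
    stops: List[StopPoint] = []
    i, n = 0, len(segment)
    while i < n:
        if segment[i].speed <= eps_speed:
            j = i
            while j + 1 < n and segment[j + 1].speed <= eps_speed:
                j += 1  # extend the zero-speed run
            first, last = segment[i], segment[j]
            stops.append(StopPoint(first.lon, first.lat, first.t, last.t - first.t))
            i = j + 1
        else:
            i += 1
    return stops
```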

In an embodiment, step S3 specifically includes:

To prevent stop points caused by waiting at traffic lights, traffic congestion and similar situations from affecting stay area identification, this step considers only the unmatched track point sequences. For each unmatched track point sequence of a waybill track from S2, the first point of every run of consecutive zero-speed track points is extracted as a stop point; its timestamp is used as the stay start time, and the time span of the run is used as the stay duration. Finally, the stop points are clustered with DBSCAN to mine stay areas (i.e. stop point clusters). To avoid merging stop points on opposite sides of a road into one cluster, the clustering parameter eps is set to 5 meters according to the width of the narrowest road, and min_samples is set to 5.
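A plain-Python sketch of standard DBSCAN with the parameters given above (eps = 5 m, min_samples = 5). It assumes stop points have been projected to planar (x, y) coordinates in meters, counts a point as its own neighbor, and uses a distance of at most eps; these boundary conventions are common but are assumptions here. In practice a library implementation such as scikit-learn's `DBSCAN` would normally be used.

```python
import math
from typing import List, Optional, Tuple

NOISE = -1


def dbscan(points: List[Tuple[float, float]], eps: float = 5.0, min_samples: int = 5) -> List[int]:
    """Label each (x, y) point (meters) with a cluster id, or NOISE (-1)."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n

    def neighbors(i: int) -> List[int]:
        xi, yi = points[i]
        return [j for j in range(n)
                if math.hypot(points[j][0] - xi, points[j][1] - yi) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_samples:      # not a core point (may later become a border point)
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_samples:
                queue.extend(nb_j)     # j is a core point: its neighbors are density-reachable
        cluster += 1
    return [NOISE if lab is None else lab for lab in labels]
```

Each resulting cluster id corresponds to one stay area; the pairwise neighbor search is O(n²), which is acceptable for a sketch but not for large stop point sets.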

S4: Road roll-out position extraction. Trucks usually leave the road (roll out) and enter the same stay hotspot at similar positions on the road, i.e. the roll-out points at which they leave the road lie close to one another. Based on this observation, the invention identifies the stay areas that belong to the same stay hotspot by their road roll-out positions. For each stay area obtained in S3, the last track point matched to the nearest road in the waybill track containing each stop point is taken as the road roll-out point of that stop point; the set of road roll-out points of each stay area is then collected; finally, these road roll-out points are clustered with Meanshift, and the position where they are densest is taken as the road roll-out position of the stay area.
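Locating the road roll-out point of one stop point can be sketched as a backward scan, assuming the waybill track is a list of points flagged by whether map matching assigned them a road, and that the index of the stop point in that track is known. Following claim 6, the direction is the bearing from the roll-out point toward the next track point. The class and function names are hypothetical.

```python
import math
from typing import List, NamedTuple, Optional, Tuple


class MatchedPoint(NamedTuple):
    t: float
    lon: float
    lat: float
    matched: bool  # True if map matching assigned this point a road segment


def bearing_deg(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Initial great-circle bearing from point 1 to point 2, in degrees [0, 360)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    x = math.sin(dl) * math.cos(p2)
    y = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl)
    return math.degrees(math.atan2(x, y)) % 360.0


def road_rollout_point(track: List[MatchedPoint],
                       stop_idx: int) -> Optional[Tuple[MatchedPoint, float]]:
    """Last road-matched point before the stop, plus its direction toward the next point."""
    for k in range(stop_idx - 1, -1, -1):
        if track[k].matched:
            nxt = track[k + 1]  # exists because k < stop_idx
            return track[k], bearing_deg(track[k].lon, track[k].lat, nxt.lon, nxt.lat)
    return None  # the truck never appeared on a road before this stop
```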

In an embodiment, step S4 specifically includes:

For each stay area obtained in S3, the last track point matched to the nearest road segment in the waybill track containing a stop point is extracted as the road roll-out point of that stop point. The set of road roll-out points corresponding to the stop points of each stay area is then obtained and clustered with Meanshift; the longitude and latitude of the center of the cluster containing the most road roll-out points are taken as the road roll-out position, and the average direction of the road roll-out points in that cluster is taken as the direction of the road roll-out position.
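A flat-kernel mean shift sketch of this step, assuming roll-out points are projected to planar meters. `radius` plays the role of R and `tol` of D in claim 7; their default values, and the rule that modes closer than radius/2 belong to the same cluster, are hypothetical choices. The average direction is computed as a circular mean so that, for example, 350° and 10° average to 0° rather than 180°.

```python
import math
from typing import List, Tuple

Pt = Tuple[float, float]  # (x, y) in meters, local projection


def shift_to_mode(start: Pt, points: List[Pt], radius: float, tol: float,
                  max_iter: int = 100) -> Pt:
    """Move to the mean of the points within `radius` until the shift is below `tol`."""
    x, y = start
    for _ in range(max_iter):
        nb = [q for q in points if math.hypot(q[0] - x, q[1] - y) <= radius]
        nx = sum(q[0] for q in nb) / len(nb)
        ny = sum(q[1] for q in nb) / len(nb)
        shift = math.hypot(nx - x, ny - y)
        x, y = nx, ny
        if shift < tol:
            break
    return x, y


def rollout_position(points: List[Pt], directions_deg: List[float],
                     radius: float = 20.0, tol: float = 0.5):
    """Return (center of the largest cluster, its mean direction, member indices)."""
    centers: List[Pt] = []
    labels: List[int] = []
    for p in points:
        m = shift_to_mode(p, points, radius, tol)
        for k, c in enumerate(centers):
            if math.hypot(m[0] - c[0], m[1] - c[1]) < radius / 2:  # same mode
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    best = max(range(len(centers)), key=labels.count)
    members = [i for i, lab in enumerate(labels) if lab == best]
    s = sum(math.sin(math.radians(directions_deg[i])) for i in members)
    c = sum(math.cos(math.radians(directions_deg[i])) for i in members)
    return centers[best], math.degrees(math.atan2(s, c)) % 360.0, members
```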

S5: Stay area merging. All stay areas that correspond to the same road roll-out position are merged. The set of road roll-out positions extracted in S4 is clustered hierarchically into groups, and the stay areas corresponding to the road roll-out positions of each group are merged into a stay hotspot. For each stay hotspot, the minimum convex polygon covering all of its stay areas is obtained with a convex hull algorithm and used as the area covered by the stay hotspot; meanwhile, the center of all its stay areas is calculated and used as the position of the stay hotspot. Then, for each stay hotspot, the stay duration list, stay start time list and waybill list of the stay areas it contains are compiled.
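The text only says "a convex hull algorithm"; the sketch below uses Andrew's monotone chain as one concrete choice, and takes the hotspot position as the area centroid of the hull, which is an assumption about what "center" means. Coordinates are assumed to be planar meters.

```python
from typing import List, Tuple

Pt = Tuple[float, float]


def convex_hull(points: List[Pt]) -> List[Pt]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Pt, a: Pt, b: Pt) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Pt] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Pt] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # drop duplicated endpoints


def polygon_centroid(poly: List[Pt]) -> Pt:
    """Area centroid of a non-empty polygon; vertex mean if the polygon is degenerate."""
    a = cx = cy = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        w = x1 * y2 - x2 * y1
        a += w
        cx += (x1 + x2) * w
        cy += (y1 + y2) * w
    if a == 0:
        return (sum(p[0] for p in poly) / len(poly), sum(p[1] for p in poly) / len(poly))
    a *= 0.5  # signed area
    return (cx / (6 * a), cy / (6 * a))
```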

In an embodiment, step S5 specifically includes:

The road roll-out positions extracted in S4 are grouped by hierarchical clustering. Specifically, each road roll-out position initially forms its own cluster, and the two clusters with the smallest distance whose direction difference is below the threshold thr_dir are merged (thr_dir is set to 15 degrees in the invention); the average position and average direction of the merged cluster are then recalculated. These steps are repeated until no pair of clusters remains whose distance is below the distance threshold thr_dis (set empirically to 10 meters) and whose direction difference is below thr_dir. The stay areas corresponding to each group of road roll-out positions are then merged into a stay hotspot. For each stay hotspot, the minimum convex polygon covering all of its stop points is extracted with a convex hull algorithm and used as the area of the stay hotspot, and the center of this polygon is used as the position of the stay hotspot. Finally, the waybill number list, stay duration list and stay start time list of the stop points in each stay hotspot are compiled and stored.
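A brute-force sketch of this agglomerative grouping, assuming roll-out positions are given as (x, y, direction) with x and y in a local metric projection. The thresholds default to the values in the text (10 m, 15°); the recomputed position is the mean of the member positions and the direction is a circular mean. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two bearings, in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def circular_mean(degs: List[float]) -> float:
    s = sum(math.sin(math.radians(d)) for d in degs)
    c = sum(math.cos(math.radians(d)) for d in degs)
    return math.degrees(math.atan2(s, c)) % 360.0


def group_rollout_positions(positions: List[Tuple[float, float, float]],
                            thr_dis: float = 10.0, thr_dir: float = 15.0) -> List[List[int]]:
    """Return groups of indices into `positions`; each group yields one stay hotspot."""
    clusters = [[i] for i in range(len(positions))]

    def summary(members: List[int]) -> Tuple[float, float, float]:
        xs = [positions[i][0] for i in members]
        ys = [positions[i][1] for i in members]
        return (sum(xs) / len(xs), sum(ys) / len(ys),
                circular_mean([positions[i][2] for i in members]))

    summaries = [summary(c) for c in clusters]
    while True:
        best = None  # (distance, a, b) of the closest mergeable pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                xa, ya, da = summaries[a]
                xb, yb, db = summaries[b]
                dist = math.hypot(xa - xb, ya - yb)
                if dist < thr_dis and angle_diff(da, db) < thr_dir:
                    if best is None or dist < best[0]:
                        best = (dist, a, b)
        if best is None:
            return clusters
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b], summaries[b]  # b > a, so index a is unaffected
        summaries[a] = summary(clusters[a])
```

The direction check keeps apart roll-out positions on opposite carriageways, which may lie within a few meters of each other but point in opposite directions.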

S6: Feature extraction. Based on the stay hotspot information obtained in S5, access behavior features (stay duration, access time period, stay frequency and cargo type) and area features (adjacent road grade, nearby POI types and stay area size) are extracted, and each stay hotspot is represented as a feature vector.

In an embodiment, step S6 specifically includes:

Step 6.1) Stay duration distribution feature extraction: trucks are observed to stay at transportation terminals mainly for 30 to 60 minutes and at gas stations for 10 to 15 minutes, while stays at hotspots such as rest areas and service points usually last several hours. The invention therefore divides stay durations into the intervals (0, 15 min], (15 min, 30 min], (30 min, 60 min], (60 min, 120 min] and (120 min, +∞), counts the proportion of stop points in each interval from the stay duration list, and concatenates these proportions to form the feature representation of the stay hotspot.

Step 6.2) Access time period distribution feature extraction: the proportion of stop points whose stay starts in each one-hour period of the day is counted from the stay start time list, and these proportions are concatenated to form the feature representation of the stay hotspot.
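Steps 6.1 and 6.2 are both normalized histograms. The sketch below assumes durations in seconds and start times as Unix timestamps; converting timestamps to local hours with a fixed UTC+8 offset (China Standard Time) is an assumption, as are the function names.

```python
import datetime as dt
from bisect import bisect_left
from typing import List

# Upper edges (minutes) of the right-closed intervals of step 6.1:
# (0, 15], (15, 30], (30, 60], (60, 120], (120, +inf)
DURATION_EDGES_MIN = [15, 30, 60, 120]
CHINA_TZ = dt.timezone(dt.timedelta(hours=8))  # assumed local time zone


def duration_distribution(durations_s: List[float]) -> List[float]:
    """5-dimensional vector: share of stop points in each duration interval."""
    counts = [0] * (len(DURATION_EDGES_MIN) + 1)
    for d in durations_s:
        # bisect_left puts a value equal to an edge into the lower bin,
        # matching right-closed intervals such as (15, 30].
        counts[bisect_left(DURATION_EDGES_MIN, d / 60.0)] += 1
    n = len(durations_s)
    return [c / n for c in counts] if n else [0.0] * len(counts)


def hour_distribution(start_ts: List[float], tz: dt.timezone = CHINA_TZ) -> List[float]:
    """24-dimensional vector: share of stays starting in each hour of the day."""
    counts = [0] * 24
    for ts in start_ts:
        counts[dt.datetime.fromtimestamp(ts, tz).hour] += 1
    n = len(start_ts)
    return [c / n for c in counts] if n else [0.0] * 24
```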

Step 6.3) Stay frequency distribution feature extraction: the stay frequency distribution is represented as a vector [fre_1, fre_2, ..., fre_n], where n is the highest historical number of stays made at one stay hotspot by any transport vehicle (n is set to 8 in the invention based on analysis of historical data), and fre_i (0 < i ≤ n) is the proportion of transportation tasks that stay i times at the stay hotspot, computed from the corresponding waybill list.

Step 6.4) Cargo type feature extraction: the number of cargo types associated with the stay hotspot is counted from its waybill list and used as the feature.
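Steps 6.3 and 6.4 both derive from the hotspot's waybill list. The sketch assumes that list holds one waybill ID per stop point (so repeated IDs mean repeated stays by the same task) and that a separate, hypothetical mapping gives each waybill's cargo type. Counts above n are clipped into the last bin as a safeguard; this is an assumption, since the text defines n as the historical maximum.

```python
from collections import Counter
from typing import Dict, List


def stay_frequency_distribution(waybill_list: List[str], n: int = 8) -> List[float]:
    """[fre_1, ..., fre_n]: share of transport tasks that stayed i times at this hotspot."""
    stays_per_task = Counter(waybill_list)  # waybill id -> number of stays here
    freq = Counter(min(k, n) for k in stays_per_task.values())
    tasks = len(stays_per_task)
    return [freq[i] / tasks if tasks else 0.0 for i in range(1, n + 1)]


def cargo_type_count(waybill_list: List[str], cargo_of: Dict[str, str]) -> int:
    """Number of distinct cargo types among the hotspot's waybills."""
    return len({cargo_of[w] for w in set(waybill_list) if w in cargo_of})
```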

Step 6.5) Adjacent road grade feature extraction: the road segment closest to the stay hotspot is extracted from the OSM map, and its road grade is represented with one-hot encoding.

Step 6.6) Nearby POI category feature extraction: business research shows that, in bulk logistics, transportation terminals are usually surrounded by many factories and companies; transport drivers usually rest in open spaces near restaurants so that they can eat conveniently; and stay hotspots such as gas stations and service points are usually located in areas where vehicle maintenance POIs cluster. The invention therefore obtains, through a POI query interface, the numbers of POIs of types such as factories, companies, catering, vehicle maintenance and gas stations within 3000 meters, calculates their proportions, and concatenates them to form the feature representation of the stay hotspot.
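Steps 6.5 and 6.6 turn categorical information into fixed-length vectors. The category vocabularies below are hypothetical: the road grades are common OSM `highway` values and the POI types follow the examples in the text. The POI query itself is not shown; counts are assumed to be already retrieved, and proportions are taken over the listed types only, which is one reading of "occupation ratio".

```python
from typing import Dict, List

# Hypothetical vocabularies; the text does not fix the exact category lists.
ROAD_GRADES = ["motorway", "trunk", "primary", "secondary", "tertiary",
               "unclassified", "residential", "service"]
POI_TYPES = ["factory", "company", "catering", "car_maintenance", "gas_station"]


def one_hot_road_grade(grade: str) -> List[int]:
    """One-hot vector over ROAD_GRADES; an unknown grade gives all zeros."""
    return [1 if grade == g else 0 for g in ROAD_GRADES]


def poi_ratio_vector(poi_counts: Dict[str, int]) -> List[float]:
    """Share of each POI type among the listed types found within the search radius."""
    total = sum(poi_counts.get(t, 0) for t in POI_TYPES)
    return [poi_counts.get(t, 0) / total if total else 0.0 for t in POI_TYPES]
```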

Step 6.7) Stay area size feature extraction: based on the area range of the stay hotspot obtained in S5, the corresponding convex polygon is expressed as a vertex sequence {(x_1, y_1), (x_2, y_2), ..., (x_h, y_h)}, and the area S of the stay hotspot is obtained by the formula

$$S = \frac{1}{2}\left|\sum_{i=1}^{h}\left(x_i y_{i+1} - x_{i+1} y_i\right)\right|, \qquad (x_{h+1}, y_{h+1}) = (x_1, y_1)$$
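The polygon area in step 6.7 can be computed with the shoelace formula. Because hull vertices are usually stored as longitude and latitude, the sketch also includes an equirectangular projection to meters around the polygon's mean latitude; this projection is an assumption that is adequate for hotspot-sized polygons, not something the text specifies.

```python
import math
from typing import List, Tuple


def to_local_xy(lonlat: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Equirectangular projection (meters) around the polygon's mean latitude."""
    r = 6371000.0
    lat0 = math.radians(sum(lat for _, lat in lonlat) / len(lonlat))
    return [(math.radians(lon) * r * math.cos(lat0), math.radians(lat) * r)
            for lon, lat in lonlat]


def shoelace_area(vertices: List[Tuple[float, float]]) -> float:
    """Area of a simple polygon from its ordered vertices (shoelace formula)."""
    h = len(vertices)
    s = 0.0
    for i in range(h):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % h]  # wrap around: vertex h+1 is vertex 1
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0
```

For example, a square of 0.001° by 0.001° at the equator comes out at roughly 12,364 m².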

Finally, the feature vectors above are concatenated in order to form the feature representation of the stay hotspot.

S7: Transportation terminal modeling. Based on the stay hotspot feature vectors obtained in S6, the set of known transportation terminals is taken as positive samples, and stay places visited by freight vehicles, such as temporary rest areas, gas stations and maintenance points, are manually labeled as negative samples; a binary classification model that decides whether a stay hotspot is a transportation terminal is then built with XGBoost.

In an embodiment, step S7 specifically includes:

Based on the labeled training sample set, M regression trees, denoted f_k(·) (1 ≤ k ≤ M, with M set to 10), are learned iteratively with the XGBoost model, as shown in FIG. 2. The construction process is as follows:

$$L^{(t)} = \sum_{i} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(h_i)\right) + \Gamma(f_t), \qquad \Gamma(f_t) = \eta T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2$$

where $L^{(t)}$ is the objective function at the t-th iteration; $y_i$ is the label of the i-th stay hotspot sample $h_i$ (1: transportation terminal; 0: other stay hotspot); $l(\cdot,\cdot)$ is the training loss; $\hat{y}_i^{(t-1)}$ is the model's prediction for training sample $h_i$ after iteration t-1; $f_t(h_i)$ is the prediction for stay hotspot sample $h_i$ at iteration t; $\Gamma(f_t)$ is the regularization term, $\eta$ and $\lambda$ are its coefficients, $T$ is the number of leaf nodes, and $w_j$ is the output value of the j-th leaf node.

The score of stay hotspot h_i is then obtained by summing the output scores of the M regression trees, i.e. Score_i = f_1(h_i) + f_2(h_i) + ... + f_M(h_i), and is mapped to a probability by the logistic function for output. Finally, the trained transportation terminal identification model (the M regression trees) is saved for long-term use.
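The sketch below shows the standard XGBoost quantities behind this objective, assuming the logistic loss for the binary labels: the per-sample gradient and Hessian, the optimal leaf weight w_j = -G_j / (H_j + λ) from the second-order expansion, the resulting structure score with penalty η per leaf, and the final probability as the logistic function of the summed tree outputs. Trees are represented by hypothetical callables; a real implementation would typically use the xgboost package, where M = 10 would presumably correspond to `n_estimators=10`, λ to `reg_lambda` and η to `gamma`. Library base-score offsets are ignored here.

```python
import math
from typing import Callable, List, Sequence, Tuple


def logistic_grad_hess(y: float, score: float) -> Tuple[float, float]:
    """First and second derivatives of the logistic loss w.r.t. the raw score."""
    p = 1.0 / (1.0 + math.exp(-score))
    return p - y, p * (1.0 - p)


def optimal_leaf_weight(grads: Sequence[float], hess: Sequence[float], lam: float) -> float:
    """w_j = -G_j / (H_j + lambda) for the samples falling in leaf j."""
    return -sum(grads) / (sum(hess) + lam)


def structure_score(leaves: List[Tuple[Sequence[float], Sequence[float]]],
                    lam: float, eta: float) -> float:
    """Objective of a fixed tree: sum_j -G_j^2 / (2 (H_j + lambda)) + eta * T."""
    obj = sum(-0.5 * sum(g) ** 2 / (sum(h) + lam) for g, h in leaves)
    return obj + eta * len(leaves)


def predict_proba(trees: List[Callable[[Sequence[float]], float]],
                  features: Sequence[float]) -> float:
    score = sum(f(features) for f in trees)  # Score_i = f_1(h_i) + ... + f_M(h_i)
    return 1.0 / (1.0 + math.exp(-score))    # logistic mapping to a probability
```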

S8: Transportation terminal library update. Based on the waybill tracks, the set of candidate stay hotspots that may match the transportation terminal of each waybill is extracted, and their feature vectors are input into the transportation terminal identification model built in S7; the stay hotspot with the highest output probability is taken as the transportation terminal matched with the waybill. The address of that position is then obtained with the reverse geocoding API of Gaode Map (Amap), and the position and address of the corresponding transportation terminal in the terminal library are updated accordingly.

In an embodiment, step S8 specifically includes:

For each original terminal to be calibrated, the corresponding set of waybill tracks is extracted, and the stay hotspots whose waybill lists contain at least one waybill of this set form the candidate set. Their feature vectors are input into the transportation terminal identification model built in S7, and the stay hotspot with the highest output probability is taken as the matched transportation terminal. The address of this transportation terminal is then obtained with the reverse geocoding service API of Gaode Map, completing the position calibration and address update of the original terminal.
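Candidate selection and the arg-max decision can be sketched as follows. The hotspot dictionaries and the `predict_proba` callable are hypothetical stand-ins for the stored hotspot information and the trained model; the reverse geocoding call is deliberately omitted, since it depends on an external service and API key.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set


def calibrate_terminal(terminal_waybills: Set[str],
                       hotspot_waybills: Dict[str, List[str]],
                       hotspot_features: Dict[str, Sequence[float]],
                       predict_proba: Callable[[Sequence[float]], float]) -> Optional[str]:
    """Return the id of the most probable terminal hotspot, or None if no candidate."""
    # Candidates: hotspots whose waybill list shares at least one waybill with the terminal.
    candidates = [hid for hid, wl in hotspot_waybills.items()
                  if terminal_waybills.intersection(wl)]
    if not candidates:
        return None
    return max(candidates, key=lambda hid: predict_proba(hotspot_features[hid]))
```

Restricting the arg-max to hotspots actually visited on the terminal's own waybills keeps a high-scoring but unrelated terminal elsewhere from being selected.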

To verify the effectiveness of the invention, stay hotspots (output as convex polygons) were identified on real bulk logistics data, and the actual stay hotspot regions (represented as polygons) were labeled manually. The identification results for transportation terminals and other stay hotspots were compared with those of traditional clustering methods (K-Means, hierarchical clustering, OPTICS and DBSCAN), using the intersection over union (IoU) between the detected region and the labeled region as the evaluation metric. As shown in FIG. 3, the stay hotspot identification method of the invention achieves higher IoU values, and the improvement is most pronounced for transportation terminals.
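IoU between a detected convex hull and a labeled polygon can be sketched with Sutherland-Hodgman clipping, which is valid here because the detected region is convex. The labeled polygon may be non-convex; clipping can then leave zero-width slivers in the output, but these do not change the computed area. Planar coordinates are assumed, and the function names are hypothetical.

```python
from typing import List, Tuple

Pt = Tuple[float, float]


def signed_area(poly: List[Pt]) -> float:
    """Shoelace signed area; positive for counter-clockwise vertex order."""
    s = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return s / 2.0


def _cross(a: Pt, b: Pt, p: Pt) -> float:
    """Positive if p lies to the left of the directed line a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def _intersect(p: Pt, q: Pt, a: Pt, b: Pt) -> Pt:
    """Intersection of segment p-q with the infinite line through a and b."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    ex, ey = b[0] - a[0], b[1] - a[1]
    t = (ex * (a[1] - p[1]) - ey * (a[0] - p[0])) / (ex * dy - ey * dx)
    return (p[0] + t * dx, p[1] + t * dy)


def clip_by_convex(subject: List[Pt], clip: List[Pt]) -> List[Pt]:
    """Sutherland-Hodgman clipping of `subject` by the convex polygon `clip`."""
    if signed_area(clip) < 0:
        clip = clip[::-1]  # the inside test assumes counter-clockwise order
    out = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        inp, out = out, []
        if not inp:
            break
        s = inp[-1]
        for e in inp:
            if _cross(a, b, e) >= 0:
                if _cross(a, b, s) < 0:
                    out.append(_intersect(s, e, a, b))
                out.append(e)
            elif _cross(a, b, s) >= 0:
                out.append(_intersect(s, e, a, b))
            s = e
    return out


def iou(detected: List[Pt], labeled: List[Pt]) -> float:
    """IoU of a convex detected region and a labeled simple polygon."""
    inter_poly = clip_by_convex(labeled, detected)
    inter = abs(signed_area(inter_poly)) if len(inter_poly) >= 3 else 0.0
    union = abs(signed_area(detected)) + abs(signed_area(labeled)) - inter
    return inter / union if union > 0 else 0.0
```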

Transportation terminal calibration was then performed on part of the data, and the results were compared with those of existing delivery location calibration methods, namely DTInf, geoCloud and U-Net. Because all of these methods rely on delivery positions marked by couriers, they were run with the position of the freight driver at the time of order return used as the marked delivery position. The evaluation metrics were MAE (mean absolute error), P85 (the 85th-percentile calibration error) and β_K (the percentage of calibration samples whose distance error is within K). As shown in Table 1, terminal calibration with the invention achieves the smallest MAE (about 90.86% better than the second-best method, U-Net) and the smallest P85 (about 72.92% better than U-Net), and also the highest values of β_500, β_1000 and β_3000.
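The three metrics can be computed from per-sample calibration errors (distances between the calibrated and true terminal positions). Interpreting P85 as the nearest-rank 85th percentile and β_K as "error at most K" are assumptions; the text does not specify the percentile method or whether the bound is inclusive.

```python
from typing import Dict, List, Sequence


def calibration_metrics(errors_m: List[float],
                        ks: Sequence[int] = (500, 1000, 3000)) -> Dict[str, float]:
    """MAE, nearest-rank P85 and beta_K for a non-empty list of distance errors."""
    n = len(errors_m)
    srt = sorted(errors_m)
    rank = -(-85 * n // 100)  # ceil(0.85 * n) in exact integer arithmetic
    res = {"MAE": sum(srt) / n, "P85": srt[max(rank, 1) - 1]}
    for k in ks:
        res[f"beta_{k}"] = sum(e <= k for e in srt) / n
    return res
```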

Table 1 Comparison of the calibration performance of different methods

In summary, the stay hotspot identification strategy of the invention, which is based on road roll-out positions, identifies stay hotspots consisting of multiple stay areas more accurately and is better suited to bulk logistics transportation terminals with multiple unloading places. Meanwhile, the XGBoost binary classification model built on the behavior and area feature sets of stay hotspots identifies transportation terminals for calibration without relying on the places marked by users when returning orders, which makes the method more suitable for bulk freight than conventional express delivery location calibration methods.

The scope of protection of the invention is not limited to the above embodiments. Variations and advantages that may occur to those skilled in the art without departing from the spirit and scope of the inventive concept are included in the invention, and the scope of protection is defined by the appended claims.

Claims

1. A transportation terminal calibration method for bulk logistics, characterized by comprising the following steps:

S1, based on the start time and completion time of each truck waybill, extracting the track point sequence of the truck within the waybill period as the waybill track;

S2, eliminating track points whose speed exceeds a speed threshold from the track, and searching for the matched road segment of each track point, the segments that cannot be matched being regarded as off-road segments;

S3, extracting subsequences with a speed of 0 from the track point sequences of the off-road segments of step S2, taking the first track point of each subsequence as a stop point, and clustering all stop points to generate stop point clusters as stay areas;

S4, for each stay area of step S3, obtaining the road roll-out points corresponding to its stop points, clustering all these road roll-out points to generate road roll-out point clusters, and selecting the center of the cluster containing the most road roll-out points as the road roll-out position of the stay area;

S5, merging all stay areas that correspond to the same road roll-out position of step S4 to obtain stay hotspots;

S6, extracting access behavior features and area features from the stay hotspots and their associated information, and concatenating the feature values into a multi-dimensional vector that represents each stay hotspot;

S7, training an XGBoost classification model with the feature vectors and labels of the stay hotspots as the transportation terminal identification model;

S8, based on the waybill tracks, extracting a set of candidate stay hotspots matching the transportation terminal of each waybill, inputting their feature vectors into the transportation terminal identification model built in S7 to obtain the transportation terminal matched with each waybill, and updating the position and address information of the corresponding transportation terminal in the terminal library.

2. The transportation terminal calibration method according to claim 1, wherein in step S1, the waybills of each truck are sorted by their start times; if a preceding waybill and the following waybill overlap in time, the task completion time of the preceding waybill is adjusted to be no later than the start of the following waybill's task.

3. The transportation terminal calibration method according to claim 1, wherein in step S2, the speed of a track point is calculated from its distance and time interval to the preceding track point; the speed threshold is set to the highest speed limit of the transport vehicle; the candidate roads within the distance threshold thr_r of a track point are taken as the hidden states of a hidden Markov model, the distance between the track point and its perpendicular projection onto each nearby candidate road is used as the state measurement, and the best-matching road segment of each track point is found with the Viterbi algorithm.

4. The transportation terminal calibration method according to claim 1, wherein in step S3, the timestamp of the stop point is used as the stay start time, and the time interval between the first and last points of the track point subsequence is calculated as the stay duration; the stop points are clustered with the DBSCAN clustering method, and each stop point cluster is a maximal set of density-connected stop points.

5. The transportation terminal calibration method according to claim 4, wherein the DBSCAN clustering method first randomly selects a stop point and searches for the other stop points whose distance to it is less than the neighborhood radius eps; if the number of these stop points is greater than the minimum number of stop points min_samples required to form a cluster, a cluster is established for the stop point, otherwise the stop point is marked as noise; the other stop points are then traversed, and once a cluster is established, the stop points that are directly density-reachable from it are merged into the cluster; these steps are iterated until every stop point has been assigned to a cluster or marked as noise, and each stop point cluster is then regarded as a stay area; a stop point is directly density-reachable when its distance to a stop point of the cluster is less than eps.

6. The transportation terminal calibration method according to claim 1, wherein in step S4, the last track point matched to the nearest road segment in the waybill track containing a stop point of the stay area is extracted as the road roll-out point of that stop point, and the direction from the road roll-out point to the next track point is taken as the direction of the road roll-out point; the set of road roll-out points corresponding to the stop points of each stay area is obtained, and the road roll-out points are then clustered with the Meanshift clustering method.

7. The transportation terminal calibration method according to claim 6, wherein the Meanshift method randomly selects a road roll-out point, calculates the mean of the vectors from it to the other road roll-out points within radius R as the direction and distance of the next drift, repeats the drift step until the drift distance is less than a parameter D, and then assigns the traversed road roll-out points to one cluster; the longitude and latitude of the center of the cluster containing the most road roll-out points are extracted as the road roll-out position, and the average direction of all road roll-out points in that cluster is calculated as the direction of the road roll-out position.

8. The transportation terminal calibration method according to claim 1, wherein in step S5, the set of road roll-out positions extracted in S4 is clustered hierarchically into groups, and the stay areas corresponding to each group of road roll-out positions are merged into a stay hotspot; the area range of the stay hotspot is obtained with a convex hull algorithm, and its center is calculated as the position of the stay hotspot; and the waybill number list, stay duration list and stay start time list of the stop points in each stay hotspot are compiled and stored.

9. The transportation terminal calibration method according to claim 1, wherein step S6 further comprises the following steps:

Step 6.1, stay duration distribution feature extraction: the stay durations are divided into the intervals (0, 15 min], (15 min, 30 min], (30 min, 60 min], (60 min, 120 min] and (120 min, +∞); the proportion of stop points in each interval is counted from the stay duration list, and the resulting 5-dimensional vector of interval proportions is used as the feature representation of the stay hotspot;

Step 6.2, access time period distribution feature extraction: the proportion of stop points in each one-hour period is counted from the stay start time list, and the proportions for the 24 hours of the day are concatenated into a 24-dimensional vector as the feature representation of the stay hotspot;

Step 6.3, stay frequency distribution feature extraction: the stay frequency distribution is represented as a vector [fre_1, fre_2, ..., fre_n], where n is the highest historical number of stays at one stay hotspot among all transport vehicles, and fre_i is the proportion of transportation tasks that stay i times at the stay hotspot, computed from the corresponding waybill list, where 0 < i ≤ n;

Step 6.4, cargo type feature extraction: the number of cargo types associated with the stay hotspot is counted from the waybill list to obtain the cargo type feature;

Step 6.5, adjacent road grade feature extraction: the road segment closest to the stay hotspot is extracted from the OSM open-source map to obtain its road grade, and the adjacent road grade feature is represented with one-hot encoding;

Step 6.6, nearby POI category feature extraction: for each stay hotspot, the numbers of POIs of different types are obtained through the POI query interface provided by Gaode Map, their proportions are calculated, and the proportions are concatenated to form the POI category feature of the stay hotspot;

Step 6.7, stay area size feature extraction: based on the area range of the stay hotspot obtained in step S5, the corresponding convex polygon is expressed as a vertex sequence {(x_1, y_1), (x_2, y_2), ..., (x_h, y_h)}, and the area S of the stay hotspot is obtained by the formula

$$S = \frac{1}{2}\left|\sum_{i=1}^{h}\left(x_i y_{i+1} - x_{i+1} y_i\right)\right|$$

where (x_{h+1}, y_{h+1}) = (x_1, y_1);

Step 6.8, the feature vectors are concatenated in order into a multi-dimensional vector as the feature representation of the stay hotspot.

10. The transportation terminal calibration method according to claim 1, wherein, based on a manually labeled set of stay hotspot samples {h_1, h_2, ..., h_q}, all labeled stay hotspot samples are represented as feature vectors by the method of S6, and M regression trees, denoted f_k(·) (1 ≤ k ≤ M), are learned iteratively by the XGBoost model from the feature vectors and the positive and negative labels of the stay hotspot samples; the regression trees are constructed as follows:

$$L^{(t)} = \sum_{i=1}^{q} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(h_i)\right) + \Gamma(f_t), \qquad \Gamma(f_t) = \eta T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2$$

where $L^{(t)}$ is the objective function at the t-th iteration, $y_i$ is the true label of the i-th ($1 \le i \le q$) stay hotspot sample $h_i$, $l(\cdot,\cdot)$ is the training loss, $\hat{y}_i^{(t-1)}$ is the model's prediction for training sample $h_i$ after iteration t-1, $f_t(h_i)$ is the prediction for stay hotspot sample $h_i$ at iteration t, $\Gamma(f_t)$ is the regularization term, $\eta$ and $\lambda$ are regularization coefficients, $T$ is the number of leaf nodes, and $w_j$ is the output value of the j-th leaf node;

the score of stay hotspot h_i is obtained by summing the output scores of the M regression trees, i.e. Score_i = f_1(h_i) + f_2(h_i) + ... + f_M(h_i), and is mapped to a probability for output by the logistic function

$$p_i = \frac{1}{1 + e^{-\mathrm{Score}_i}}$$

and the trained model is used as the final transportation terminal identification model.

11. The transportation terminal calibration method according to claim 1, wherein, for each transportation terminal to be calibrated, the corresponding set of waybill tracks is extracted, the stay hotspots whose waybill lists contain at least one waybill track of this set form a candidate set, their feature vectors are input into the transportation terminal identification model built in S7, and the stay hotspot with the highest output probability is taken as the transportation terminal matched with the waybills; the address of the transportation terminal is then obtained with the reverse geocoding service API of Gaode Map, completing the position calibration and address update of the original terminal.

12. A transportation terminal calibration system for implementing the transportation terminal calibration method according to any one of claims 1 to 11, wherein the transportation terminal calibration system comprises a data preprocessing module, a stay hotspot mining module and a terminal library updating module;