CN115423841B

CN115423841B - Transportation end point calibration method and system for bulk logistics

Info

Publication number: CN115423841B
Application number: CN202210943420.8A
Authority: CN
Inventors: 吴涛; 毛嘉莉; 朱开旋; 沈文怡; 周傲英
Original assignee: East China Normal University
Current assignee: East China Normal University
Priority date: 2022-08-08
Filing date: 2022-08-08
Publication date: 2023-06-23
Anticipated expiration: 2042-08-08
Also published as: CN115423841A

Abstract

The invention discloses a transportation end point calibration method for bulk logistics, comprising the following steps:

- extracting the trajectory of each truck waybill based on the waybill time information;
- eliminating track points with abnormal speed and obtaining the matched road of each track point;
- taking stay points from the zero-speed subsequences of the off-road part of the track point sequence and clustering them into stay regions;
- clustering the road exit points that correspond to the stay points of each stay region, and taking the center of the cluster as the road exit position of that stay region;
- merging stay regions according to their road exit positions to obtain stay hot spots;
- extracting multiple features and concatenating them into a multidimensional feature vector that characterizes each stay hot spot;
- training a classification model on the feature vectors and labels of stay hot spots to serve as the transportation end point recognition model;
- inputting the feature vectors of the stay hot spots to be identified into the model, obtaining the transportation end point matched with each waybill, and updating the end points.

Description

Transportation end point calibration method and system for bulk logistics

Technical Field

The invention belongs to the technical field of data mining, and relates to a transportation end point calibration method and system for bulk logistics.

Background

In bulk freight, large manufacturing enterprises store their customers' delivery sites in address libraries as common transportation end points. Logistics applications such as vehicle dispatching, route planning and freight settlement rely on these end points. However, manual address entry errors, relocation of receiving sites and similar causes leave many vague or even wrong transportation end points in the address library, which poses great challenges to the transportation link of bulk logistics. With the spread of positioning devices, transport vehicles continuously generate historical driving trajectories and waybills, and these data provide a basis for calibrating transportation end points. Therefore, to keep the freight link operating efficiently, a transportation end point calibration method based on waybill trajectories is needed to update the end points in the address library in time.

With the rapid development of express delivery services, the calibration of delivery sites has received wide attention. It is mainly done by identifying the actual delivery site from the position that the courier marks when a delivery is completed. In bulk freight, however, drivers often return (confirm) waybills late, so the position at which a driver returns a waybill can deviate greatly from the actual end point. Such methods therefore cannot be applied directly to calibrating bulk logistics end points.

Bulk shipments also involve long-distance transport, so drivers often stop at gas stations, temporary rest areas and similar places during the journey. These stay hot spots can lie immediately next to transportation end points, vary in size, and may even consist of several stay areas. All of this makes bulk logistics end points hard to locate and identify accurately.

Disclosure of Invention

To overcome these shortcomings of the prior art, the invention provides a transportation end point calibration method for bulk logistics. Trucks usually leave the road at similar positions before entering the same stay hot spot. Based on this observation, the invention proposes a stay hot spot identification method based on road exit positions, which proceeds as follows:

1. Stay points are clustered with DBSCAN to mine stay areas.
2. The dense positions of road exit points are located with Meanshift and taken as the road exit positions of the corresponding stay areas.
3. Road exit positions that lie close to each other are identified by hierarchical clustering, and the corresponding stay areas are merged into stay hot spots.
4. To identify transportation end points accurately, an XGBoost end point identification model is built from the behavior features and regional features of the stay hot spots.
5. The model is used to update the end point library.

The method has three stages.

The first stage preprocesses the existing transportation data. Waybills are matched with raw trajectories to obtain the waybill trajectory set, and the waybill trajectories are then denoised and map-matched.

The second stage mines stay hot spots. Stay points in the waybill trajectories are clustered with DBSCAN to identify stay areas. Some stay hot spots contain several stay areas, so the invention proposes a stay area merging strategy based on road exit points to identify them accurately. Road exit points are clustered with Meanshift to extract the road exit position of each stay area. The road exit positions are then grouped by hierarchical clustering, and the corresponding stay areas are merged into stay hot spots.

The third stage extracts a behavior feature set and a regional feature set for each stay hot spot and builds a transportation end point identification model based on XGBoost. The model is then used to identify transportation end points, completing position calibration and address updating.

In order to achieve the technical purpose, the invention adopts the following technical scheme:

The invention provides a transportation end point calibration method for bulk logistics, comprising the following steps:

S1: Waybill trajectory extraction. The waybills executed by each truck are sorted by start time. When a preceding waybill overlaps in time with the subsequent transport task, its completion time is adjusted so that it is no later than the start time of the subsequent waybill. Then, based on the start and completion times of each waybill, the time-ordered sequence of the vehicle's track points within that period is extracted as the waybill trajectory.

S2: Waybill trajectory preprocessing. The speed of each track point is computed from the distance and time interval to the adjacent track point. Track points whose speed exceeds a given speed threshold are removed as abnormal track points. Map matching then works as follows:

- The candidate roads within a distance threshold thr_r of each track point are taken as the hidden states of a hidden Markov model. thr_r is set to 50 meters because the sampling error of the positioning device is considered to be within 50 meters.
- The distance between a track point and its perpendicular projection onto each nearby road is used as the observation measure.
- The Viterbi algorithm finds the matching road segment of each track point.

On this basis, the sequences of track points in the waybill trajectory that cannot be matched to any road are regarded as the off-road part.
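The HMM map matching in S2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The text does not specify the probability models, so the following are assumptions:

- a Gaussian emission on the perpendicular distance, with a hypothetical SIGMA of 10 m;
- fixed same-road and other-road transition probabilities;
- candidate roads and their perpendicular distances that are already computed.

Points with no candidate road within thr_r are returned as None, i.e. off-road.

```python
import math
from typing import Dict, List, Optional

THR_R = 50.0   # candidate search radius thr_r in meters
SIGMA = 10.0   # hypothetical GPS error std. dev. for the emission model


def emission_logp(dist: float, sigma: float = SIGMA) -> float:
    # Log Gaussian density of the perpendicular point-to-road distance
    return -0.5 * (dist / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def transition_logp(prev_road: str, road: str) -> float:
    # Hypothetical transition model: staying on the same road is more likely
    return math.log(0.8) if prev_road == road else math.log(0.2)


def viterbi_match(candidates: List[Dict[str, float]]) -> List[Optional[str]]:
    """candidates[k] maps road id -> perpendicular distance (m) of track point k.
    Returns the most likely road per point; None marks an off-road point."""
    result: List[Optional[str]] = [None] * len(candidates)
    k = 0
    while k < len(candidates):
        first = {r: d for r, d in candidates[k].items() if d <= THR_R}
        if not first:  # no road within thr_r: off-road point
            k += 1
            continue
        # Run Viterbi over the maximal run of points that have candidate roads
        start = k
        scores = {r: emission_logp(d) for r, d in first.items()}
        back: List[Dict[str, str]] = []
        k += 1
        while k < len(candidates):
            cur = {r: d for r, d in candidates[k].items() if d <= THR_R}
            if not cur:
                break
            new_scores: Dict[str, float] = {}
            ptr: Dict[str, str] = {}
            for r, d in cur.items():
                prev = max(scores, key=lambda p: scores[p] + transition_logp(p, r))
                new_scores[r] = scores[prev] + transition_logp(prev, r) + emission_logp(d)
                ptr[r] = prev
            scores = new_scores
            back.append(ptr)
            k += 1
        # Backtrack from the best final state of this run
        road = max(scores, key=lambda r: scores[r])
        for j in range(k - 1, start - 1, -1):
            result[j] = road
            if j > start:
                road = back[j - start - 1][road]
    return result
```

Each maximal run of road-matchable points is decoded independently, so runs separated by off-road points do not constrain each other. This simplifies full trajectory-level decoding.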

S3: Stay area mining. For each waybill trajectory, take the off-road track point sequences obtained in S2 and extract all subsequences of track points with zero speed. The first track point of each zero-speed subsequence is taken as a stay point, i.e. a position where the transport vehicle stayed. DBSCAN can find clusters of arbitrary shape in noisy spatial databases, which suits the irregular shape of stay areas. DBSCAN is therefore used to cluster all stay points into several clusters of high stay point density, and each stay point cluster is regarded as a stay area.

S4: Road exit position extraction. The invention proposes a road exit position extraction method. It rests on the observation that trucks usually leave the road at similar positions before entering the same stay hot spot, so their road exit points lie close to each other. For each stay area obtained in S3, the road exit point of each stay point is the last track point matched to the nearest road in the waybill trajectory that contains the stay point. This gives a set of road exit points for each stay area. Meanshift can accurately locate regions of high data density, so it is used to group the road exit points into several clusters. The center of the cluster with the most road exit points is taken as the road exit position of the stay area.

S5: Stay area merging. Stay areas are merged according to their road exit positions, as follows:

1. The extracted set of road exit positions is clustered hierarchically into groups.
2. The stay areas (stay point clusters) corresponding to the road exit positions in each group are merged into a stay hot spot.
3. For each stay hot spot, a convex hull algorithm gives the minimum convex polygon that contains all of its stay points. This polygon is taken as the area range of the stay hot spot.
4. The center position of the stay areas it contains is computed and taken as the position of the stay hot spot.
5. For each stay hot spot, the stay duration list, stay start time list, waybill number list and similar information of its stay areas are collected.
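The convex hull step in S5 can be illustrated with Andrew's monotone chain algorithm. This is one standard convex hull algorithm; the text does not say which one is used. The sketch assumes stay points have already been projected to planar coordinates in meters. It takes the average of the hull vertices as the hot spot position, which is an assumption; an area-weighted centroid would be another reasonable choice.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    # z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other
    return lower[:-1] + upper[:-1]


def hull_center(hull: List[Point]) -> Point:
    # Vertex average used as the hot spot position (an assumption)
    n = len(hull)
    return (sum(p[0] for p in hull) / n, sum(p[1] for p in hull) / n)
```

For example, the stay points (0, 0), (2, 0), (2, 2), (0, 2) and the interior point (1, 1) yield the square hull and a center of (1.0, 1.0).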

S6: Feature extraction. Two feature sets are extracted from the stay duration list, stay start time list, waybill number list and other information obtained in S5:

- a behavior feature set of access behavior features, such as stay duration, access period, stay frequency and cargo type;
- a regional feature set of regional features, such as the adjacent road, nearby POIs (Points of Interest, i.e. places with a specific function such as shopping malls and gas stations) and stay area.

The corresponding feature values are concatenated in order into a multidimensional vector, which serves as the feature vector of each stay hot spot.

S7: Transportation end point modeling. The set of known transportation end points is taken as positive samples. Stay sites that freight vehicles visit frequently, such as temporary rest areas, gas stations and repair points, are manually labeled as negative samples. All samples are represented as feature vectors by the method of S6. An XGBoost classification model, i.e. a classification model consisting of several regression trees, is then trained on the feature vectors and labels of the stay hot spot samples. The labels indicate whether a sample is positive or negative.

S8: Transportation end point library update. Based on the waybill trajectories, the set of candidate stay hot spots for the transportation end point of each waybill is extracted. The feature vector of each candidate is input into the trained transportation end point identification model, and the stay hot spot with the highest output probability is taken as the transportation end point matched with the waybill. The address text of that position is then obtained with the reverse geocoding service of Amap (Gaode Map), and the position and address of the corresponding end point in the library are updated.

In order to optimize the technical scheme, the specific measures adopted in each step further comprise:

The step S1 specifically includes:

A freight driver may return a waybill late; returning the waybill confirms that the transport task is complete. As a result, the periods of two consecutive waybills may overlap. The invention handles this in three steps:

1. The waybills executed by each truck are sorted by start time.
2. Waybill sequences whose task periods overlap are extracted, and the end time of each preceding waybill is moved so that it is no later than the start time of the subsequent waybill.
3. The trajectory of each waybill, i.e. the waybill trajectory, is extracted according to the start and completion times of each truck's waybills.
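A minimal sketch of the S1 adjustment for a single truck follows. The Waybill record, its field names and the use of plain second timestamps are hypothetical. The logic follows only the text: sort by start time, clip a preceding waybill's completion time to the next start time, and cut the track by each waybill's time window.

```python
from dataclasses import dataclass
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (timestamp_s, lon, lat)


@dataclass(frozen=True)
class Waybill:
    waybill_id: str
    start: float  # task start time (seconds)
    end: float    # task completion (waybill return) time (seconds)


def adjust_waybills(bills: List[Waybill]) -> List[Waybill]:
    """Sort one truck's waybills by start time and clip each preceding
    waybill so that it ends no later than the next one starts."""
    ordered = sorted(bills, key=lambda b: b.start)
    adjusted: List[Waybill] = []
    for i, cur in enumerate(ordered):
        end = cur.end
        if i + 1 < len(ordered):
            end = min(end, ordered[i + 1].start)
        adjusted.append(Waybill(cur.waybill_id, cur.start, end))
    return adjusted


def waybill_track(points: List[Sample], bill: Waybill) -> List[Sample]:
    """Time-ordered track points of the truck inside [start, end] of the waybill."""
    return sorted((p for p in points if bill.start <= p[0] <= bill.end),
                  key=lambda p: p[0])
```

For instance, a waybill "a" spanning 0 to 150 s followed by "b" starting at 100 s is clipped to end at 100 s.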

The step S2 specifically includes:

Positioning signal interruptions and human factors can leave anomalous points in the sampled trajectory data. These are track points whose sampling interval from the preceding track point is normal but which lie far away from it. In the present invention these anomalous points, also called noise points, are mainly speed-anomaly noise points. They disturb the identification of subsequent stay points and, in turn, the positioning of stay areas.

The invention therefore computes the speed of each track point from its distance and time interval to the previous track point. Track points whose speed exceeds the threshold thr_sp are removed as noise. The speed threshold is set to the maximum speed limit of the transport vehicle; for example, with a truck speed limit of 100 km/h, thr_sp is set to 27.7 meters/second.

At the same time, the candidate roads within a distance threshold thr_r of each track point (set here to 50 meters) are taken as the hidden states of a hidden Markov model. The distance between the track point and its perpendicular projection onto each candidate road is used as the observation measure. The Viterbi algorithm finds the best-matching road segment of each track point as its driving road segment. On this basis, the track point sequences in the waybill trajectory that cannot be matched to any road are regarded as off-road track point sequences.
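The speed-based denoising can be sketched as below. It assumes (timestamp, longitude, latitude) samples and great-circle (haversine) distances. One design choice is an assumption: speed is measured against the last point that was kept rather than the raw previous point, so a single outlier does not also cause the point after it to be dropped.

```python
import math
from typing import List, Tuple

THR_SP = 27.7  # m/s, i.e. a 100 km/h truck speed limit (thr_sp in the text)


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters on a spherical Earth."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def remove_speed_outliers(track: List[Tuple[float, float, float]],
                          thr_sp: float = THR_SP) -> List[Tuple[float, float, float]]:
    """track: time-ordered (timestamp_s, lon, lat). A point whose speed relative
    to the last kept point exceeds thr_sp is dropped as noise."""
    if not track:
        return []
    kept = [track[0]]
    for t, lon, lat in track[1:]:
        t0, lon0, lat0 = kept[-1]
        dt = t - t0
        if dt <= 0:
            continue  # duplicate or out-of-order timestamp
        if haversine_m(lon0, lat0, lon, lat) / dt <= thr_sp:
            kept.append((t, lon, lat))
    return kept
```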

The step S3 specifically includes:

Stay points caused by waiting at traffic lights, traffic jams and similar situations would distort stay area identification. This step therefore considers only unmatched (off-road) track point sequences. Consider each off-road track point sequence of each waybill trajectory obtained in S2, and each continuous zero-speed track point subsequence in it:

- the first track point of the subsequence is extracted as a stay point;
- its timestamp is taken as the stay start time;
- the time interval between the first and last points of the subsequence is taken as the stay duration.

All stay points are then clustered with DBSCAN:

1. DBSCAN selects an arbitrary unvisited stay point and finds the stay points whose distance to it is less than eps. If this neighborhood contains at least minsample stay points, a new cluster is created for the point; otherwise the point is marked as noise.
2. It traverses the other stay points of the cluster and adds every directly density-reachable stay point, i.e. every point closer than eps to a core stay point of the cluster.
3. These steps are repeated until every stay point has been assigned to a cluster or marked as noise.

Each resulting stay point cluster is a maximal set of density-connected stay points and is regarded as a stay area. eps is set to 5 meters according to the width of the narrowest road, so that stay points on opposite sides of a road are not grouped into the same cluster (i.e. the minimum density-connection distance is 5 meters). minsample is set to 5, i.e. at least 5 stay points are needed to form a cluster.
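The stay point extraction rule of S3 can be sketched as below. The TrackPoint fields are hypothetical: planar x/y in meters, a speed in m/s, and an on_road flag from map matching. Zero speed is tested exactly, as in the text, although a small tolerance may be preferable for real GPS data.

```python
from typing import List, NamedTuple


class TrackPoint(NamedTuple):
    t: float        # timestamp in seconds
    x: float        # planar coordinates in meters (assumed already projected)
    y: float
    speed: float    # m/s
    on_road: bool   # True if map matching assigned a road


class StayPoint(NamedTuple):
    x: float
    y: float
    start: float     # stay start time = timestamp of the first zero-speed point
    duration: float  # last minus first timestamp of the zero-speed run


def extract_stay_points(track: List[TrackPoint]) -> List[StayPoint]:
    stays: List[StayPoint] = []
    run: List[TrackPoint] = []

    def flush() -> None:
        # Close the current zero-speed run and emit its first point as a stay point
        if run:
            first, last = run[0], run[-1]
            stays.append(StayPoint(first.x, first.y, first.t, last.t - first.t))
            run.clear()

    for p in track:
        if not p.on_road and p.speed == 0:
            run.append(p)
        else:
            flush()  # a moving or road-matched point ends the run
    flush()
    return stays
```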
The clustering parameters are chosen mainly from the clustering purpose and the actual scenario. The clustering radius eps is set to 5 meters according to the minimum road width in the application scenario. Stay points on opposite sides of a road are therefore more than 5 meters apart and cannot be clustered into one stay point cluster, i.e. they cannot be identified as one stay area. minsample is set mainly empirically.
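Below is a compact, dependency-free DBSCAN sketch that uses the parameters from the text (eps = 5 m, minsample = 5). It assumes stay points in planar meter coordinates. Neighbors are the points strictly closer than eps, as in the text. As an assumption, it follows the common convention that a point's neighborhood includes the point itself and that a core point needs at least min_samples neighbors; the text's exact counting rule may differ slightly. The O(n²) neighbor search is fine for illustration, but a spatial index would normally be used.

```python
import math
from typing import List, Optional, Tuple

NOISE = -1


def dbscan(points: List[Tuple[float, float]], eps: float = 5.0,
           min_samples: int = 5) -> List[int]:
    """Return one cluster label per stay point; NOISE (-1) marks noise."""
    n = len(points)

    def neighbors(i: int) -> List[int]:
        # Points strictly closer than eps, including point i itself
        return [j for j in range(n) if math.dist(points[i], points[j]) < eps]

    labels: List[Optional[int]] = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_samples:  # j is a core point: keep expanding
                queue.extend(nb_j)
        cluster += 1
    return [lab if lab is not None else NOISE for lab in labels]
```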

The step S4 specifically includes:

For each stay area obtained in S3, the road exit point of each stay point is the last track point matched to the nearest road segment in the trajectory that contains the stay point. The direction from the road exit point to the next track point is taken as its direction. This yields the set of road exit points for the stay points of each stay area.

The road exit points are then clustered with Meanshift:

1. Meanshift selects a road exit point and computes the mean of the vectors from it to the other road exit points within a given radius R.
2. This mean gives the direction and distance of the next shift.
3. The shift is recomputed repeatedly until the shift distance is less than a given parameter D, and the road exit points traversed along the way are assigned to one cluster.

R and D are set empirically to 30 meters and 10 meters, respectively. Road exit points less than 30 meters apart are regarded as one road exit position; this value was chosen from the distribution of road exit points in the application scenario. The 10-meter value is the convergence condition for updating the cluster centroid.

Finally, the longitude and latitude of the center of the cluster with the most road exit points are taken as the road exit position. The average direction of all road exit points in that cluster is taken as the direction of the road exit position.
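The Meanshift step can be sketched with a flat kernel of radius R = 30 m and a stop threshold D = 10 m on the shift length, in planar meter coordinates. Two details are assumptions that the text does not fix:

- modes that converge within R of each other are treated as one cluster;
- the cluster center is the mean of its member points.

Directions are averaged as circular means, so that headings such as 350° and 10° average to 0° rather than 180°.

```python
import math
from typing import List, Tuple

R = 30.0  # flat-kernel radius in meters
D = 10.0  # stop when the shift is shorter than D meters

Point = Tuple[float, float]


def shift_to_mode(start: Point, pts: List[Point], r: float = R, d: float = D,
                  max_iter: int = 100) -> Point:
    cx, cy = start
    for _ in range(max_iter):
        inside = [p for p in pts if math.dist((cx, cy), p) <= r]
        if not inside:
            break
        mx = sum(p[0] for p in inside) / len(inside)
        my = sum(p[1] for p in inside) / len(inside)
        step = math.hypot(mx - cx, my - cy)
        cx, cy = mx, my
        if step < d:  # converged: the shift is shorter than D
            break
    return cx, cy


def road_exit_position(exits: List[Tuple[float, float, float]]) -> Tuple[float, float, float]:
    """exits: (x, y, heading_deg) of the road exit points of one stay area.
    Returns the mean position and circular-mean heading of the largest cluster."""
    pts = [(x, y) for x, y, _ in exits]
    modes: List[Point] = []
    labels: List[int] = []
    for p in pts:
        m = shift_to_mode(p, pts)
        for k, existing in enumerate(modes):
            if math.dist(m, existing) <= R:  # same mode (assumed merge rule)
                labels.append(k)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    best = max(range(len(modes)), key=labels.count)  # cluster with most exit points
    members = [e for e, lab in zip(exits, labels) if lab == best]
    cx = sum(e[0] for e in members) / len(members)
    cy = sum(e[1] for e in members) / len(members)
    s = sum(math.sin(math.radians(e[2])) for e in members)
    c = sum(math.cos(math.radians(e[2])) for e in members)
    return cx, cy, math.degrees(math.atan2(s, c)) % 360
```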

The step S5 specifically includes:

The road exit positions extracted in S4 are grouped with hierarchical clustering, as follows:

1. Each road exit position is first treated as its own cluster. The two clusters with the smallest distance whose direction difference is less than the threshold thr_dir (15 degrees in the invention) are merged.
2. The average position and average direction of the merged cluster are recomputed.
3. These steps are repeated until no two clusters lie closer than the distance threshold thr_dis (10 meters in the invention) with a direction difference less than thr_dir.
4. The stay areas corresponding to each group of road exit positions are merged into a stay hot spot.
5. For each stay hot spot, a convex hull algorithm extracts the minimum convex polygon that covers all its stay points. This polygon is taken as the area range of the stay hot spot, and its center is taken as the position of the stay hot spot.
6. The waybill list, stay duration list, stay start time list and similar information of the stay points in each stay hot spot are collected and stored.
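The grouping of road exit positions can be sketched as greedy agglomerative clustering that applies both constraints from the text (thr_dis = 10 m, thr_dir = 15°). Positions are assumed to be planar (x, y, heading in degrees). Each merged cluster is summarized by its mean position and circular-mean heading. The function returns groups of indices; the stay areas whose exit positions share a group would then be merged into one stay hot spot.

```python
import math
from typing import List, Tuple

THR_DIS = 10.0  # meters
THR_DIR = 15.0  # degrees


def angle_diff(a: float, b: float) -> float:
    # Smallest absolute difference between two headings, in [0, 180]
    d = abs(a - b) % 360
    return min(d, 360 - d)


def circular_mean(degs: List[float]) -> float:
    s = sum(math.sin(math.radians(a)) for a in degs)
    c = sum(math.cos(math.radians(a)) for a in degs)
    return math.degrees(math.atan2(s, c)) % 360


def group_exit_positions(exits: List[Tuple[float, float, float]],
                         thr_dis: float = THR_DIS,
                         thr_dir: float = THR_DIR) -> List[List[int]]:
    clusters: List[List[int]] = [[i] for i in range(len(exits))]

    def summary(members: List[int]) -> Tuple[float, float, float]:
        # Mean position and circular-mean heading of a cluster
        n = len(members)
        return (sum(exits[i][0] for i in members) / n,
                sum(exits[i][1] for i in members) / n,
                circular_mean([exits[i][2] for i in members]))

    while True:
        stats = [summary(c) for c in clusters]
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (xa, ya, ha), (xb, yb, hb) = stats[a], stats[b]
                dist = math.hypot(xa - xb, ya - yb)
                if dist < thr_dis and angle_diff(ha, hb) < thr_dir:
                    if best is None or dist < best[0]:
                        best = (dist, a, b)
        if best is None:  # no pair satisfies both thresholds
            return sorted(sorted(c) for c in clusters)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the closest admissible pair
        del clusters[b]
```

Two exits 4 m apart with headings of 90° and 95° merge. An exit 8 m away but heading the opposite way (250°) stays separate, which keeps the two sides of a road apart.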

The step S6 specifically includes:

Step 6.1) Stay duration distribution feature. Trucks have been observed to stay 30 to 60 minutes at a transportation end point, 10 to 15 minutes at a gas station, and several hours at stay hot spots such as rest areas and repair points. The invention therefore divides stay durations into the intervals (0, 15 min], (15 min, 30 min], (30 min, 60 min], (60 min, 120 min] and (120 min, +∞). It computes the proportion of the hot spot's stay points that fall into each interval and concatenates these proportions into a 5-dimensional vector, which serves as the feature of the stay hot spot.
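The 5-bin duration feature can be computed with right-closed binning, as sketched below with durations in minutes. A zero-length stay (a single zero-speed point) falls into the first bin here. This is an assumption, since the text's first interval excludes 0.

```python
import bisect
from typing import List

# Right-closed upper bounds in minutes: (0,15], (15,30], (30,60], (60,120]; last bin is (120, +inf)
BOUNDS = [15, 30, 60, 120]


def duration_feature(durations_min: List[float]) -> List[float]:
    """Share of stay points whose stay duration falls in each of the 5 intervals."""
    counts = [0] * (len(BOUNDS) + 1)
    for d in durations_min:
        # bisect_left puts d == bound into the lower bin, matching right-closed intervals
        counts[bisect.bisect_left(BOUNDS, d)] += 1
    n = len(durations_min)
    return [c / n for c in counts] if n else [0.0] * len(counts)
```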

Step 6.2) Access period distribution feature. Using the stay start time list and 1-hour intervals, the proportion of stay points whose stay starts in each hour is computed. These proportions are concatenated into a 24-dimensional vector that describes how the stay hot spot's stays are distributed over the 24 hours of the day.
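Below is a minimal sketch of the 24-dimensional access period feature. It assumes stay start times are given as datetime objects in the local time zone of the operation area.

```python
from datetime import datetime
from typing import List


def hourly_feature(start_times: List[datetime]) -> List[float]:
    """Share of stays that start in each hour of the day (index 0..23)."""
    counts = [0] * 24
    for t in start_times:
        counts[t.hour] += 1
    n = len(start_times)
    return [c / n for c in counts] if n else [0.0] * 24
```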

Step 6.3) Stay frequency distribution feature. The stay frequency distribution is represented as the vector $[fre_1, fre_2, \ldots, fre_n]$, where:

- $n$ is the historical maximum number of times any transport vehicle stayed at a stay hot spot;
- $fre_i$ ($0 < i \le n$) is the number of transport tasks that stayed $i$ times at the stay hot spot, obtained by counting over the corresponding waybill list.
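The stay frequency vector can be derived from the waybill number attached to each stay point of the hot spot. First count the stays per waybill, then count how many waybills have exactly i stays. The input layout is a hypothetical simplification of the waybill number list kept in S5.

```python
from collections import Counter
from typing import List


def stay_frequency_feature(stay_waybill_ids: List[str], n: int) -> List[int]:
    """stay_waybill_ids: waybill number of each stay point in the hot spot.
    n: historical maximum stay count, i.e. the vector length.
    Returns [fre_1, ..., fre_n], fre_i = number of waybills with exactly i stays."""
    stays_per_waybill = Counter(stay_waybill_ids)
    waybills_per_count = Counter(stays_per_waybill.values())
    return [waybills_per_count[i] for i in range(1, n + 1)]
```

For example, stays from waybills w1, w1, w2, w3, w3, w3 with n = 4 give [1, 1, 1, 0].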

Step 6.4) Cargo type feature. The number of distinct cargo types associated with the stay hot spot is counted from its waybill list and used as the feature.

Step 6.5) Adjacent road level feature. The road level of the road segment closest to the stay hot spot is obtained from the OSM (OpenStreetMap) open-source map. It is represented with one-hot encoding, i.e. an N-bit state register encodes N road level states.
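One-hot encoding of the nearest road's level can be sketched as follows. The level list is a hypothetical choice built from common values of the OpenStreetMap highway tag. An unknown level maps to an all-zero vector, which is also an assumption.

```python
from typing import List

# Hypothetical road level list built from OpenStreetMap "highway" tag values
ROAD_LEVELS = ["motorway", "trunk", "primary", "secondary",
               "tertiary", "unclassified", "residential"]


def road_level_one_hot(level: str, levels: List[str] = ROAD_LEVELS) -> List[int]:
    """N-bit one-hot vector with a single 1 at the position of the road level."""
    vec = [0] * len(levels)
    if level in levels:  # unknown levels stay all-zero (an assumption)
        vec[levels.index(level)] = 1
    return vec
```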

Step 6.6) Nearby POI category feature. Investigation of bulk logistics shows three patterns:

- many factories and companies are usually located near a transportation end point;
- drivers tend to rest near restaurants for the convenience of dining;
- stay hot spots such as gas stations and repair points usually lie in areas where automotive maintenance POIs cluster.

Therefore, for each stay hot spot, the POI query interface provided by Amap (Gaode Map) is used to count the POIs within 3000 meters by type, such as factories, companies, catering, automotive maintenance and gas stations. The proportion of each type is computed, and the proportions are concatenated into the feature of the stay hot spot.
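This sketch assumes the categories of POIs within 3000 m have already been retrieved, so the proportion feature reduces to counting. The map service query itself is not shown, and the category names below are hypothetical labels. Here the proportions are taken over the five listed categories only; normalizing by all nearby POIs would be an equally plausible reading of the text.

```python
from collections import Counter
from typing import List

# Hypothetical labels for the five POI types named in the text
POI_TYPES = ["factory", "company", "catering", "auto_repair", "gas_station"]


def poi_feature(nearby_poi_types: List[str], types: List[str] = POI_TYPES) -> List[float]:
    """Proportion of each listed POI type among the listed-type POIs within 3000 m."""
    counts = Counter(t for t in nearby_poi_types if t in types)
    total = sum(counts.values())
    return [counts[t] / total if total else 0.0 for t in types]
```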

Step 6.7) Stay area feature. Take the area range of the stay hot spot obtained in S5. Its convex polygon is represented as the vertex sequence $\{(a_1, b_1), (a_2, b_2), \ldots, (a_h, b_h)\}$, and the area of the stay hot spot is obtained with the formula

$$S = \frac{1}{2}\left|\sum_{i=1}^{h}\left(a_i b_{i+1} - a_{i+1} b_i\right)\right|, \qquad (a_{h+1}, b_{h+1}) = (a_1, b_1),$$

where $a_i$ and $b_i$ are the coordinates of the $i$-th vertex of the convex polygon.
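The polygon area in step 6.7 can be computed from the ordered vertex coordinates with the shoelace formula (presumably the formula intended here), as sketched below. The sketch assumes vertex coordinates in a planar projection measured in meters; applying it to raw longitude/latitude values would give an area in squared degrees.

```python
from typing import List, Tuple


def polygon_area(vertices: List[Tuple[float, float]]) -> float:
    """Shoelace formula; vertices (a_i, b_i) in order, planar coordinates (e.g. meters)."""
    h = len(vertices)
    s = 0.0
    for i in range(h):
        a_i, b_i = vertices[i]
        a_next, b_next = vertices[(i + 1) % h]  # wraps to the first vertex
        s += a_i * b_next - a_next * b_i
    return abs(s) / 2.0
```

A 4 m by 3 m rectangle, for example, gives 12.0 whichever direction its vertices are listed in.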

Step 6.8) Finally, the above feature vectors are concatenated in order into a multidimensional vector, which serves as the feature representation of the stay hot spot.

The step S7 specifically includes:

A transportation end point identification model is trained on the manually labeled stay hot spot sample set $\{h_1, h_2, \ldots, h_q\}$. All labeled stay hot spot samples are represented as feature vectors by the method of S6. From these feature vectors and their labels, the XGBoost model iteratively learns $M$ regression trees, denoted $f_k(\cdot)$ ($1 \le k \le M$). The construction process is:

$$L^{(t)} = \sum_{i=1}^{q} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(h_i)\right) + \Gamma(f_t), \qquad \Gamma(f_t) = \eta T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2,$$

where:

- $L^{(t)}$ is the objective function at the $t$-th iteration;
- $l(\cdot,\cdot)$ is the loss function of XGBoost;
- $y_i$ is the true label of stay hot spot sample $h_i$ ($1 \le i \le q$);
- $\hat{y}_i^{(t-1)}$ is the model's prediction for training sample $h_i$ after the $(t-1)$-th iteration;
- $f_t(h_i)$ is the prediction of the tree learned at the $t$-th iteration for sample $h_i$;
- $\Gamma(f_t)$ is the regularization term, and $\eta$ and $\lambda$ are its coefficients;
- $T$ is the number of leaf nodes, and $w_j$ is the output value of the $j$-th leaf node.

The score of stay hot spot $h_i$ is then obtained by summing the outputs of the $M$ regression trees, $Score_i = f_1(h_i) + f_2(h_i) + \cdots + f_M(h_i)$. The logistic function maps this score to a probability:

$$P(h_i) = \frac{1}{1 + e^{-Score_i}}.$$

The trained model is taken as the final transportation end point identification model. It consists of $M$ regression trees and outputs the probability that a stay hot spot is a transportation end point.
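The following library-free sketch shows how the trained model turns a feature vector into a probability. Each tree is represented as a function returning its leaf output. The M outputs are summed into Score_i, and the logistic function maps the sum into (0, 1). The regularization helper assumes the standard XGBoost form ηT + ½λΣw_j² for Γ, using the coefficients η and λ named in the text. The hand-written stumps in the example are hypothetical; in practice an XGBoost implementation would learn the trees.

```python
import math
from typing import Callable, List, Sequence

Tree = Callable[[Sequence[float]], float]  # maps a feature vector to its leaf output


def regularizer(leaf_weights: List[float], eta: float, lam: float) -> float:
    """Gamma(f) = eta * T + 0.5 * lam * sum_j w_j^2 for one tree with T leaves."""
    return eta * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)


def endpoint_probability(trees: List[Tree], x: Sequence[float]) -> float:
    """Sum the M tree outputs into Score_i and map it to (0, 1) with the logistic function."""
    score = sum(f(x) for f in trees)
    return 1.0 / (1.0 + math.exp(-score))


# Two hypothetical decision stumps standing in for learned regression trees
stumps: List[Tree] = [
    lambda x: 0.5 if x[0] > 0.3 else -0.5,
    lambda x: 0.5 if x[1] > 0.1 else -0.5,
]
```

With the feature vector [0.6, 0.2], both stumps output 0.5, so Score_i = 1.0 and the probability is 1/(1 + e^-1), about 0.731.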

The step S8 specifically includes:

For each transportation end point to be calibrated, the corresponding set of waybill trajectories is extracted. Every stay hot spot that contains at least one waybill trajectory from the end point's waybill list joins the candidate set. The feature vectors of the candidates are input into the transportation end point identification model built in S7, and the stay hot spot with the highest output probability is taken as the transportation end point matched with the waybills. The address of this end point is then obtained through the reverse geocoding API of Amap (Gaode Map), which completes the position calibration and address update of the original end point.
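The candidate selection in S8 can be sketched as below. The dictionaries mapping hot spot ids to waybill sets and feature vectors are hypothetical, as is the predict callable standing in for the trained model. The reverse geocoding call that follows is not shown.

```python
from typing import Callable, Dict, Optional, Sequence, Set


def calibrate_endpoint(endpoint_waybills: Set[str],
                       hotspot_waybills: Dict[str, Set[str]],
                       features: Dict[str, Sequence[float]],
                       predict: Callable[[Sequence[float]], float]) -> Optional[str]:
    """Return the id of the candidate hot spot with the highest end point probability.
    Candidates are hot spots visited by at least one of the end point's waybills."""
    candidates = [h for h, bills in hotspot_waybills.items() if bills & endpoint_waybills]
    if not candidates:
        return None
    return max(candidates, key=lambda h: predict(features[h]))
```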

The invention also provides a transportation end point calibration system that implements the above method. It comprises a data preprocessing module, a stay hot spot mining module and an end point library update module:

- The data preprocessing module intercepts waybill trajectories based on task periods, removes noisy track points based on speed anomalies, and performs map matching based on the Viterbi algorithm.
- The stay hot spot mining module performs DBSCAN-based stay point clustering, Meanshift-based road exit position identification, and stay point cluster merging based on hierarchical clustering.
- The end point library update module extracts the behavior and region feature sets, models transportation end points with XGBoost, and identifies and calibrates transportation end points.

The invention has the following beneficial effects:

1. The invention starts from the observation that drivers leave the road at similar positions before entering a stay hot spot. It uses map matching to take the last road-matched track point in a waybill trajectory as the road exit point, and clusters road exit points with Meanshift to locate the road exit positions of the corresponding stay areas. Hierarchical clustering of the road exit positions then identifies and merges the stay areas that belong to the same stay hot spot. As a result, transportation end points of different sizes, including those with multiple stay areas, can be identified in bulk logistics.

2. The invention builds an XGBoost binary classification model to identify transportation end points for calibration. The model combines the behavior feature set of the stay areas (stay duration, access period, stay frequency, cargo type, etc.) with the regional feature set (adjacent road segments, nearby POIs and stay area).

3. On a real bulk logistics dataset, the method was compared with existing delivery site calibration methods for express logistics. It achieves the lowest mean absolute error (MAE), about 90.86% better than the second-best method. It also achieves the highest proportion of transportation end points calibrated within an error distance of 3000 m, about 72.92% higher than the second-best method.

Drawings

FIG. 1 shows the framework of the transportation end point calibration technique for bulk logistics.

FIG. 2 is a schematic structural diagram of the transportation end point identification model constructed by the invention. The model consists of M regression trees. For a stay hot spot h_i to be predicted, each regression tree outputs a score, and the scores are summed into a final score. The logistic function maps the final score to a probability, which serves as the probability that the stay hot spot is a transportation end point.

FIG. 3 is a graph comparing the stay hot spot recognition results of different methods.

Detailed Description

The present invention is described in further detail below with reference to specific embodiments and drawings. Except where specifically noted below, the procedures, conditions and experimental methods for carrying out the invention are common knowledge in the art, and the invention is not particularly limited in these respects.

Existing calibration methods for delivery sites judge the delivery position only from the courier's confirmation. This does not suit bulk freight scenarios, where many waybills are returned late. To accurately identify transportation end points that differ in size and lie close to one another, the invention provides a transportation end point calibration strategy based on road exit positions:

1. Stay points are clustered with DBSCAN to mine stay areas.
2. The road exit points of the stay points, extracted through map matching, are clustered with Meanshift to identify the road exit positions of the different stay areas.
3. Stay areas that share the same road exit position are merged into stay hot spots.
4. Access behavior features (stay duration, access period, stay frequency, cargo type, etc.) and regional features (adjacent road segments, nearby POIs, stay area, etc.) are extracted for each stay hot spot.
5. A binary classification model is built with XGBoost to identify transportation end points, and the model is used to update the transportation end point library.

Specifically, the invention discloses a transportation end point calibration method for bulk logistics. As shown in FIG. 1, the method has three stages. In the data preprocessing stage, a set of waybill tracks is obtained from historical tracks and waybills, abnormal track points are identified from their speeds, and the matched road section of each track point is obtained by Viterbi map matching. In the stay hot spot mining stage, stay points are clustered with DBSCAN to detect stay areas, road exit points are clustered with Meanshift to identify the road exit position of each stay area, these road exit positions are grouped by hierarchical clustering, and the stay areas belonging to each group are merged into a stay hot spot. In the end point library updating stage, a behavior feature set and a regional feature set are extracted for each stay hot spot, an XGBoost-based transportation end point recognition model is built, and the model is used to identify transportation end points among the stay hot spots and to calibrate the position and update the address of each original end point.

As shown in FIG. 1, the invention uses a three-stage transportation end point calibration framework comprising the following eight steps:

S1: Waybill track extraction. For each truck, the waybills it executes are sorted by start time. If a waybill overlaps in time with the following transportation task, its completion time is adjusted so that it is no later than the start time of the following waybill. The track of each waybill (the waybill track) is then extracted from the truck's trajectory using the start and completion times of that waybill.
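The waybill time adjustment in S1 can be sketched in Python as follows. This is a minimal illustration, assuming each waybill carries numeric start and completion timestamps and each track point is a (timestamp, latitude, longitude) tuple; the `Waybill` class and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Waybill:
    waybill_id: str
    start: float  # task start time, e.g. Unix seconds
    end: float    # task completion time

def adjust_waybills(waybills: List[Waybill]) -> List[Waybill]:
    """Sort one truck's waybills by start time and clip any completion
    time that runs past the start of the following waybill."""
    ordered = sorted(waybills, key=lambda w: w.start)
    adjusted = []
    for i, w in enumerate(ordered):
        end = w.end
        if i + 1 < len(ordered):
            end = min(end, ordered[i + 1].start)
        adjusted.append(Waybill(w.waybill_id, w.start, end))
    return adjusted

def waybill_track(points: List[Tuple[float, float, float]], w: Waybill):
    """Return the (t, lat, lon) points recorded within [start, end]."""
    return [p for p in points if w.start <= p[0] <= w.end]
```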

In an embodiment, step S1 specifically includes:

S2: Waybill track preprocessing. For each waybill track, the speed of each track point is computed, and points whose sampling interval from the preceding point is normal but which lie far from it, i.e., points whose speed exceeds a speed threshold, are removed as speed anomalies. The waybill track is also map-matched with the Viterbi method to obtain the set of track points matched to road sections and the set of track points that cannot be matched to any road section.

In an embodiment, step S2 specifically includes:

Because of positioning signal interruptions and human factors, the raw tracks contain abnormal points whose sampling interval from the preceding point is normal but which lie far away from it. To keep this noise from affecting stay point recognition, the invention computes the speed of each track point from its distance and time interval to the preceding track point and removes points whose speed exceeds the threshold thr_sp as noise (since the maximum truck speed limit is 100 km/h, thr_sp is set to 27.7 m/s). The waybill track is then map-matched with the Viterbi method to obtain the matched road section of each track point, and the segments that cannot be matched are marked.
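A minimal sketch of the S2 preprocessing, assuming tracks are lists of (timestamp_s, lat, lon) tuples sorted by time. Two choices here are assumptions rather than details from the text: each point's speed is measured against the last point that was kept (so one outlier does not also remove the valid point after it), and the Viterbi emission score is a Gaussian in the perpendicular distance with a hypothetical `sigma` of 10 m, while transition scores come from a caller-supplied function. Candidate roads within thr_r of each point (claim 3) are assumed to be found beforehand, and points with no candidate (the unmatched ones) are assumed to be split out before matching. All function names are hypothetical.

```python
import math
from typing import Callable, Dict, Hashable, List, Tuple

EARTH_RADIUS_M = 6_371_000.0
THR_SP = 27.7  # m/s, about the 100 km/h truck speed limit

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two WGS84 points, in meters."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def remove_speed_outliers(track: List[Tuple[float, float, float]],
                          thr_sp: float = THR_SP) -> List[Tuple[float, float, float]]:
    """Drop points whose speed from the last kept point exceeds thr_sp."""
    if not track:
        return []
    kept = [track[0]]
    for t, lat, lon in track[1:]:
        t0, lat0, lon0 = kept[-1]
        dt = t - t0
        if dt <= 0:
            continue  # duplicate or out-of-order timestamp
        if haversine_m(lat0, lon0, lat, lon) / dt <= thr_sp:
            kept.append((t, lat, lon))
    return kept

def viterbi_match(candidates: List[Dict[Hashable, float]],
                  log_trans: Callable[[Hashable, Hashable], float],
                  sigma: float = 10.0) -> List[Hashable]:
    """candidates[t] maps each candidate road id near point t to the
    perpendicular distance (m) from the point to that road."""
    def log_emit(d: float) -> float:
        return -0.5 * (d / sigma) ** 2  # Gaussian log-likelihood up to a constant

    score = {r: log_emit(d) for r, d in candidates[0].items()}
    back: List[Dict[Hashable, Hashable]] = []
    for cands in candidates[1:]:
        new_score, ptr = {}, {}
        for r, d in cands.items():
            prev, best = max(((p, s + log_trans(p, r)) for p, s in score.items()),
                             key=lambda item: item[1])
            new_score[r] = best + log_emit(d)
            ptr[r] = prev
        score = new_score
        back.append(ptr)
    path = [max(score, key=score.get)]
    for ptr in reversed(back):  # follow back-pointers from the last point
        path.append(ptr[path[-1]])
    return path[::-1]
```

With `log_trans` penalizing road switches, the matcher prefers a consistent road even when one noisy point lies closer to a neighboring road.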

S3: Stay area mining. From the unmatched track point sequences extracted for each waybill track in step S2, the first point of each zero-speed subsequence is taken as a stay point, and all stay points are clustered with DBSCAN to mine stay areas (i.e., stay point clusters).

In an embodiment, step S3 specifically includes:

To avoid interference from stay points caused by waiting at traffic lights, traffic jams, and the like, this step considers only the unmatched track point sequences. For the unmatched sequence of each waybill track from step S2, the first point of each run of consecutive zero-speed points is extracted as a stay point; its timestamp is taken as the stay start time, and the time span of the run is taken as the stay duration. Finally, the stay points are clustered with DBSCAN to mine stay areas (stay point clusters). To avoid merging stay points on opposite sides of a road into one cluster, the clustering parameter eps is set to 5 meters according to the minimum road width, and min_samples is set to 5.
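The stay point extraction and clustering in S3 can be sketched as follows. This is a minimal sketch under stated assumptions: the off-road track points are assumed to be already projected to a local planar frame in meters as (timestamp_s, x_m, y_m, speed) tuples, and DBSCAN is written out in plain Python (counting a point in its own neighborhood, as scikit-learn does) rather than taken from a library. Function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float, float]  # (t_seconds, x_m, y_m, speed_mps)

def extract_stay_points(off_road: List[Point]):
    """Return (start_time, x, y, duration) for each maximal zero-speed run."""
    stays, i = [], 0
    while i < len(off_road):
        if off_road[i][3] == 0:
            j = i
            while j + 1 < len(off_road) and off_road[j + 1][3] == 0:
                j += 1
            t0, x, y, _ = off_road[i]
            stays.append((t0, x, y, off_road[j][0] - t0))  # duration = last - first
            i = j + 1
        else:
            i += 1
    return stays

def dbscan(xy: List[Tuple[float, float]], eps: float = 5.0, min_samples: int = 5):
    """Plain DBSCAN on planar coordinates in meters; label -1 means noise."""
    n = len(xy)
    labels = [None] * n

    def neighbors(i):
        return [j for j in range(n) if math.dist(xy[i], xy[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_samples:
            labels[i] = -1  # may later become a border point of a cluster
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_samples:  # only core points expand the cluster
                queue.extend(nb_j)
    return labels
```

Each distinct non-negative label then corresponds to one stay area.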

S4: Road exit position extraction. This step rests on the observation that trucks entering the same stay hot spot usually leave the road at similar positions, i.e., their road exit points lie close to one another. Therefore, for each stay area obtained in step S3, the last track point matched to the nearest road in the waybill track containing each stay point is taken as the road exit point of that stay point, which yields a set of road exit points for each stay area. Finally, the road exit points are clustered with Meanshift, and the position where road exit points are densest is taken as the road exit position of the stay area.

In an embodiment, step S4 specifically includes:

For each stay area obtained in step S3, the last track point matched to the nearest road section in the waybill track containing each stay point is extracted as the road exit point of that stay point. This yields the set of road exit points for the stay points of each stay area. These road exit points are clustered with Meanshift; the longitude and latitude of the center of the cluster containing the most road exit points are taken as the road exit position, and the average direction of the road exit points in that cluster is taken as the direction of the road exit position.
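A plain-Python sketch of the Meanshift step follows, assuming road exit points in a local planar frame (meters) with headings in degrees. The bandwidth `radius` (R in claim 7), the stopping threshold `tol` (D in claim 7), and the rule that modes closer than R/2 belong to the same cluster are hypothetical choices, as are the function names. The average direction is taken as a circular mean so that headings such as 350° and 10° average to 0° rather than 180°.

```python
import math
from typing import List, Tuple

XY = Tuple[float, float]

def shift_to_mode(start: XY, pts: List[XY], radius: float, tol: float,
                  max_iter: int = 100) -> XY:
    """Flat-kernel mean shift: move to the mean of the points within radius
    until the shift is shorter than tol."""
    cx, cy = start
    for _ in range(max_iter):
        nb = [q for q in pts if math.dist((cx, cy), q) <= radius]
        nx = sum(q[0] for q in nb) / len(nb)
        ny = sum(q[1] for q in nb) / len(nb)
        moved = math.dist((cx, cy), (nx, ny))
        cx, cy = nx, ny
        if moved < tol:
            break
    return cx, cy

def circular_mean_deg(angles: List[float]) -> float:
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c)) % 360

def road_exit_position(pts: List[XY], headings: List[float],
                       radius: float = 20.0, tol: float = 0.1):
    """Return (center, direction) of the cluster with the most exit points."""
    modes = [shift_to_mode(p, pts, radius, tol) for p in pts]
    groups: List[List[int]] = []
    for i, m in enumerate(modes):
        for g in groups:
            if math.dist(m, modes[g[0]]) <= radius / 2:  # treated as the same mode
                g.append(i)
                break
        else:
            groups.append([i])
    best = max(groups, key=len)
    cx = sum(pts[i][0] for i in best) / len(best)
    cy = sum(pts[i][1] for i in best) / len(best)
    return (cx, cy), circular_mean_deg([headings[i] for i in best])
```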

S5: Stay area merging. Stay areas are merged according to their road exit positions. The road exit positions extracted in step S4 are grouped by hierarchical clustering, and the stay areas corresponding to the road exit positions in each group are merged into a stay hot spot. For each stay hot spot, a convex hull algorithm gives the minimum convex polygon covering all of its stay areas, which is taken as the region covered by the hot spot, and the center of its stay areas is taken as the hot spot's position. Finally, the stay duration list, stay start time list, waybill list, and similar information are collected for the stay areas in each stay hot spot.

In an embodiment, step S5 specifically includes:

The road exit positions extracted in step S4 are grouped by hierarchical clustering. Initially, each road exit position forms its own cluster. At each iteration, the two closest clusters whose direction difference is below the threshold thr_dir (set to 15 degrees in the invention) are merged, and the average position and average direction of the merged cluster are recomputed. This is repeated until no pair of clusters is closer than the distance threshold thr_dis (empirically set to 10 meters) with a direction difference below thr_dir. The stay areas corresponding to the road exit positions of each group are then merged into a stay hot spot. For each stay hot spot, a convex hull algorithm gives the minimum convex polygon covering all of its stay points, which is taken as the hot spot region, and the center of this polygon is taken as the hot spot position. Finally, the waybill list, stay duration list, and stay start time list of the stay points in each hot spot are collected and stored.
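The hierarchical grouping of road exit positions can be sketched as below. It assumes each exit position is an (x_m, y_m, direction_deg) tuple in a local planar frame and uses the thresholds from the text (thr_dis = 10 m, thr_dir = 15°); the function names are hypothetical, and directions are averaged as circular means.

```python
import math
from typing import List, Tuple

Exit = Tuple[float, float, float]  # (x_m, y_m, direction_deg)

def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two headings, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def circular_mean(degs: List[float]) -> float:
    s = sum(math.sin(math.radians(d)) for d in degs)
    c = sum(math.cos(math.radians(d)) for d in degs)
    return math.degrees(math.atan2(s, c)) % 360

def group_exit_positions(exits: List[Exit], thr_dis: float = 10.0,
                         thr_dir: float = 15.0) -> List[List[int]]:
    """Agglomerative merging; returns groups of indices into exits."""
    clusters = [[i] for i in range(len(exits))]

    def summary(members):
        x = sum(exits[i][0] for i in members) / len(members)
        y = sum(exits[i][1] for i in members) / len(members)
        return x, y, circular_mean([exits[i][2] for i in members])

    while True:
        summ = [summary(m) for m in clusters]
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(summ[a][:2], summ[b][:2])
                if d < thr_dis and angle_diff(summ[a][2], summ[b][2]) < thr_dir:
                    if best is None or d < best[0]:
                        best = (d, a, b)
        if best is None:  # no pair satisfies both thresholds
            return clusters
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
```

The direction check keeps exits on opposite carriageways apart even when they are only a few meters from each other.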

S6: Feature extraction. From the stay hot spot information obtained in step S5, access behavior features (stay duration, access period, stay frequency, cargo type, etc.) and regional features (adjacent road level, nearby POI types, stay area size, etc.) are extracted, and each stay hot spot is represented as a feature vector.

In an embodiment, step S6 specifically includes:

Step 6.1) Stay duration distribution features: trucks are observed to stay mainly 30 to 60 minutes at transportation end points and 10 to 15 minutes at gas stations, whereas stays at rest areas, repair points, and similar hot spots usually last several hours. The invention therefore uses the intervals (0, 15 min], (15 min, 30 min], (30 min, 60 min], (60 min, 120 min], and (120 min, +∞), computes from the stay duration list the proportion of stay points falling into each interval, and concatenates these proportions into the feature representation of the stay hot spot.
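A minimal sketch of the step 6.1 histogram, assuming stay durations in minutes; the constant and function names are hypothetical. `bisect_left` places a value equal to an upper bound (for example exactly 15 min) in the lower interval, matching the right-closed intervals above.

```python
import bisect
from typing import List

# Upper bounds (minutes) of (0,15], (15,30], (30,60], (60,120]; the last bin is (120, inf)
DURATION_EDGES_MIN = [15, 30, 60, 120]

def duration_feature(durations_min: List[float]) -> List[float]:
    """Proportion of stay points in each duration interval (5 values)."""
    counts = [0] * (len(DURATION_EDGES_MIN) + 1)
    for d in durations_min:
        counts[bisect.bisect_left(DURATION_EDGES_MIN, d)] += 1
    total = len(durations_min)
    return [c / total if total else 0.0 for c in counts]
```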

Step 6.2) Access period distribution features: from the stay start time list, the proportion of stay points starting in each one-hour period is computed, and these proportions are concatenated into the feature representation of the stay hot spot.
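Step 6.2 reduces to a 24-bin histogram over start hours. A minimal sketch, assuming the stay start times are Python `datetime` objects in local time (the function name is hypothetical):

```python
from datetime import datetime
from typing import List

def access_period_feature(start_times: List[datetime]) -> List[float]:
    """Proportion of stay points whose stay starts in each hour of the day."""
    counts = [0] * 24
    for ts in start_times:
        counts[ts.hour] += 1
    total = len(start_times)
    return [c / total if total else 0.0 for c in counts]
```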

Step 6.3) Stay frequency distribution features: the stay frequency distribution is expressed as a vector [fre_1, fre_2, …, fre_n], where n is the historical maximum number of stays at one stay hot spot across all transport vehicles (set to 8 in the invention based on analysis of historical data), and fre_i (0 < i ≤ n) is the number of transportation tasks that stay i times at the stay hot spot, computed from the corresponding lists.
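A sketch of step 6.3, assuming the hot spot's waybill list holds one waybill ID per stay point, so that repeated IDs mean repeated stays by the same transportation task. Clipping tasks that stay more than n times into the last bin is an assumption; `normalize=True` gives the proportion form used in claim 9. The function name is hypothetical.

```python
from collections import Counter
from typing import List

def stay_frequency_feature(waybill_ids: List[str], n: int = 8,
                           normalize: bool = False) -> List[float]:
    """fre[i-1] = number (or share) of tasks that stay i times at the hot spot."""
    per_task = Counter(waybill_ids)  # stays per transportation task
    fre = [0] * n
    for times in per_task.values():
        fre[min(times, n) - 1] += 1  # clipping at n is an assumption
    if normalize and per_task:
        return [f / len(per_task) for f in fre]
    return fre
```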

Step 6.4) Cargo type features: from the waybill list, the number of distinct cargo types handled at the stay hot spot is counted and used as the feature.

Step 6.5) Adjacent road level features: the road section nearest to the stay hot spot is found in the OSM map, and its road grade is represented with one-hot encoding.

Step 6.6) Nearby POI category features: business investigation shows that many factories and companies are usually located near transportation end points in bulk logistics; drivers usually stop to rest near restaurants, which makes meals convenient; and stay hot spots such as gas stations and repair points are usually located in areas where vehicle-service POIs cluster. The invention therefore uses a POI query interface to count the POIs of each type (factories, companies, catering, vehicle repair, gas stations, etc.) within 3000 meters, computes the proportion of each type, and concatenates these proportions into the feature representation of the stay hot spot.
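Steps 6.5 and 6.6 can be sketched as follows. The road grades listed are common values of the OpenStreetMap `highway` tag, but which grades the method actually encodes, their order, and the POI category labels are assumptions; the POI counts are assumed to come from whatever POI query service is used. Names are hypothetical.

```python
from typing import Dict, List

# Hypothetical grade list (OSM highway tag values) and POI category labels
ROAD_LEVELS = ["motorway", "trunk", "primary", "secondary", "tertiary",
               "unclassified", "residential", "service"]
POI_TYPES = ["factory", "company", "catering", "auto_repair", "gas_station"]

def road_level_one_hot(highway_tag: str) -> List[int]:
    """One-hot vector for the nearest road's grade; unknown tags give all zeros."""
    return [1 if highway_tag == level else 0 for level in ROAD_LEVELS]

def poi_feature(poi_counts: Dict[str, int]) -> List[float]:
    """Share of each POI type among the POIs found within the search radius."""
    counts = [poi_counts.get(t, 0) for t in POI_TYPES]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]
```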

Step 6.7) Stay area features: the area of the stay hot spot region is computed.
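The convex hull region from S5 and its area for step 6.7 can be sketched as follows, assuming stay point coordinates projected to a local planar frame in meters (raw longitude/latitude degrees would give an area in squared degrees). The hull uses Andrew's monotone chain algorithm, one standard choice since the text names no specific convex hull algorithm; the area uses the shoelace formula over the hull's vertex sequence. Function names are hypothetical.

```python
from typing import List, Tuple

Pt = Tuple[float, float]

def convex_hull(points: List[Pt]) -> List[Pt]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Pt, a: Pt, b: Pt) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Pt] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Pt] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # drop duplicated endpoints

def polygon_area(vertices: List[Pt]) -> float:
    """Shoelace formula: 0.5 * |sum(a_k * b_{k+1} - a_{k+1} * b_k)|."""
    h = len(vertices)
    s = 0.0
    for k in range(h):
        a1, b1 = vertices[k]
        a2, b2 = vertices[(k + 1) % h]  # wraps back to the first vertex
        s += a1 * b2 - a2 * b1
    return abs(s) / 2
```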

Step 6.8) Finally, the feature vectors above are concatenated in order to form the feature representation of the stay hot spot.

S7: Transportation end point modeling. Using the feature vectors of stay hot spots from step S6, known transportation end points are taken as positive samples, and stay places visited by freight vehicles, such as temporary rest areas, gas stations, and repair points, are manually labeled as negative samples; a binary classification model for recognizing transportation end points is then built with XGBoost.

In an embodiment, step S7 specifically includes:

Based on the labeled training sample set, the XGBoost model iteratively learns M regression trees, denoted f_k(·), 1 ≤ k ≤ M (M is set to 10), as shown in FIG. 2. The construction process is as follows:

where L^(t) is the objective function at the t-th iteration and y_i is the true label of the i-th stay hot spot sample h_i (1: transportation end point; 0: other stay hot spot).
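Assuming the standard XGBoost formulation, with q labeled samples and with the regularization term Γ(f_t), coefficients η and λ, leaf count T, and leaf outputs w_j as defined in claim 1, the objective at iteration t can be written as below. The per-sample loss l is not named in the text; the logistic loss shown is the usual choice for 0/1 labels and is an assumption.

```latex
\mathcal{L}^{(t)} = \sum_{i=1}^{q} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(h_i)\right) + \Gamma(f_t),
\qquad
\Gamma(f_t) = \eta T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2}

l(y, \hat{y}) = -\left[y \ln \sigma(\hat{y}) + (1 - y) \ln\left(1 - \sigma(\hat{y})\right)\right],
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}
```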

The score of stay hot spot h_i is then obtained by summing the outputs of the M regression trees, Score_i = f_1(h_i) + f_2(h_i) + … + f_M(h_i), and is mapped through the logistic function to a probability value for output. Finally, the trained transportation end point recognition model (the M regression trees) is stored for long-term use.
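Inference with the trained trees can be sketched as below, treating each regression tree as a Python callable from a feature vector to a score. The `Tree` alias and the function name are hypothetical, and the logistic mapping is written in a numerically stable form.

```python
import math
from typing import Callable, Sequence

Tree = Callable[[Sequence[float]], float]  # one regression tree f_k

def hotspot_probability(trees: Sequence[Tree], features: Sequence[float]) -> float:
    """Sum the tree outputs into Score_i and map it with the logistic function."""
    score = sum(f(features) for f in trees)
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)  # stable branch: avoids overflow for large negative scores
    return e / (1.0 + e)
```

In the open-source `xgboost` Python package, a comparable model could be trained with `XGBClassifier(n_estimators=10, objective="binary:logistic")`, whose `gamma` and `reg_lambda` parameters play the roles of η and λ; the text does not state which implementation the invention used.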

S8: Transportation end point library update. Based on the waybill tracks, a set of candidate stay hot spots for the transportation end point of each waybill is extracted, their feature vectors are input into the recognition model built in step S7, and the stay hot spot with the highest output probability is taken as the transportation end point matched to that waybill. The address of that position is then obtained through the reverse geocoding API of AMap (Gaode Map), and the position and address of the corresponding end point in the end point library are updated.

In an embodiment, step S8 specifically includes:

For each original end point to be calibrated, the corresponding set of waybill tracks is extracted, and the stay hot spots whose waybill lists contain at least one of these waybill tracks form the candidate set. The feature vectors of the candidates are input into the recognition model built in step S7, and the stay hot spot with the highest output probability is taken as the matched transportation end point. The address of this end point is then obtained through the AMap reverse geocoding service API, completing the position calibration and address update of the original end point.
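The candidate selection can be sketched as follows. It assumes each stay hot spot is a dict with hypothetical keys `waybills` (a set of waybill IDs), `features`, and `position`, and that `predict_proba` wraps the S7 model; the reverse geocoding call is omitted because it depends on the map provider's API. The function name is hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set

def calibrate_end_point(end_point_waybills: Set[str],
                        hotspots: List[Dict],
                        predict_proba: Callable[[Sequence[float]], float]) -> Optional[Dict]:
    """Pick the candidate hot spot with the highest end point probability."""
    # Candidates: hot spots whose waybill list shares at least one waybill
    candidates = [h for h in hotspots if h["waybills"] & end_point_waybills]
    if not candidates:
        return None  # nothing to calibrate against
    return max(candidates, key=lambda h: predict_proba(h["features"]))
```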

To verify the effectiveness of the method, stay hot spots were identified on real bulk logistics data (each output is a convex polygon), and the actual stay hot spot regions were manually annotated as polygons. The recognition results for transportation end points and for other stay hot spots were compared with those of traditional clustering methods (K-Means, hierarchical clustering, OPTICS, and DBSCAN), using the intersection over union (IoU) between each detected region and the annotated region as the metric. As shown in FIG. 3, the proposed stay hot spot identification method achieves higher IoU values, and its advantage is more pronounced for transportation end points.
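The IoU metric can be computed as sketched below, assuming both regions are vertex lists in a local planar frame. The detected region is a convex hull and is assumed to be in counter-clockwise order, which lets Sutherland-Hodgman clipping intersect it with a possibly non-convex annotated polygon. Function names are hypothetical.

```python
from typing import List, Tuple

Poly = List[Tuple[float, float]]

def polygon_area(poly: Poly) -> float:
    """Shoelace formula."""
    n = len(poly)
    s = sum(poly[k][0] * poly[(k + 1) % n][1] - poly[(k + 1) % n][0] * poly[k][1]
            for k in range(n))
    return abs(s) / 2

def clip_polygon(subject: Poly, clip: Poly) -> Poly:
    """Sutherland-Hodgman clipping; `clip` must be convex and counter-clockwise."""
    def inside(p, a, b):
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p, q, a, b):
        # Intersection of segment pq with the infinite line through a and b
        (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, a, b
        den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
        t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        points, output = output, []
        if not points:
            break  # intersection is already empty
        prev = points[-1]
        for cur in points:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersect(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersect(prev, cur, a, b))
            prev = cur
    return output

def iou(detected: Poly, labeled: Poly) -> float:
    """IoU of a convex CCW detected region and an annotated region."""
    inter = polygon_area(clip_polygon(labeled, detected))
    union = polygon_area(detected) + polygon_area(labeled) - inter
    return inter / union if union > 0 else 0.0
```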

Transportation end point calibration was then performed on part of the data and compared with existing delivery location calibration methods: DTInf, GeoCloud, and U-Net. These methods rely on courier-labeled delivery locations; here, the driver's location when confirming the waybill receipt was used as the labeled delivery location. The evaluation metrics are MAE (mean absolute error), P85 (the 85th-percentile calibration error), and β_K (the percentage of calibrated samples whose distance error is within K). As shown in Table 1, the end point calibration of the invention achieves the smallest MAE (about 90.86% better than the second-best method, U-Net) and the smallest P85 (about 72.92% better than U-Net), as well as the largest percentages for β_500, β_1000, and β_3000.
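A sketch of the three metrics, assuming each calibration error is the distance in meters between a calibrated position and its ground-truth position. P85 uses the nearest-rank percentile definition, which is an assumption since the text does not specify one; the function name is hypothetical.

```python
import math
from typing import Dict, Iterable, List, Tuple

def calibration_metrics(errors_m: List[float],
                        ks: Iterable[int] = (500, 1000, 3000)) -> Tuple[float, float, Dict[int, float]]:
    """Return (MAE, P85, {K: beta_K in percent})."""
    errs = sorted(errors_m)
    n = len(errs)
    mae = sum(errs) / n
    p85 = errs[math.ceil(0.85 * n) - 1]  # nearest-rank 85th percentile
    beta = {k: 100.0 * sum(e <= k for e in errs) / n for k in ks}
    return mae, p85, beta
```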

Table 1. Comparison of calibration performance of different methods

In summary, by identifying stay hot spots from the positions at which trucks leave the road, the method and system identify stay hot spots that contain multiple stay areas more accurately, which suits bulk logistics end points with multiple unloading places. In addition, the XGBoost binary classification model built on the behavior and regional feature sets of stay hot spots identifies transportation end points for calibration without relying on the locations where users confirm waybills, making it better suited to bulk freight than existing express delivery location calibration methods.

The protection of the present invention is not limited to the above embodiments. Variations and advantages that would occur to one skilled in the art are included within the invention without departing from the spirit and scope of the inventive concept, and the scope of the invention is defined by the appended claims.

Claims

1. A transportation end point calibration method for bulk logistics, comprising the following steps:

step S1, based on the start and completion times of each waybill of a truck, extracting the sequence of track points recorded by the truck during the waybill period as the waybill track;

step S2, removing track points whose speed exceeds a speed threshold from the track, and map-matching the track to obtain the matched road section of each track point, the portions that cannot be matched being regarded as portions where the truck has left the road;

step S3, extracting subsequences with a speed of 0 from the track point sequences of the off-road portions obtained in step S2, taking the first track point of each subsequence as a stay point, clustering all stay points to generate stay point clusters, and taking each stay point cluster as a stay area;

step S4, for each stay area from step S3, obtaining the road exit points corresponding to its stay points, clustering these road exit points to generate road exit point clusters, and selecting the center of the cluster with the most road exit points as the road exit position of the stay area;

step S5, merging the stay areas according to their road exit positions from step S4 to obtain stay hot spots;

step S6, extracting access behavior features and regional features from each stay hot spot and its associated information, and concatenating the feature values into a multidimensional vector that characterizes the stay hot spot, i.e., the feature vector of the stay hot spot;

step S7, training an XGBoost classification model with the feature vectors and labels of the stay hot spots as the transportation end point recognition model; based on a manually labeled stay hot spot sample set {h_1, h_2, …, h_q}, all labeled samples are represented as feature vectors by the method of step S6, and M regression trees, denoted f_k(·), 1 ≤ k ≤ M, are iteratively learned by the XGBoost model from these feature vectors and the positive and negative sample labels; the regression trees are constructed as follows:

where L^(t) is the objective function at the t-th iteration, y_i is the true label of the i-th stay hot spot sample h_i (1 ≤ i ≤ q),

ŷ_i^(t−1) represents the model's prediction for training sample h_i after the (t−1)-th iteration, and f_t(h_i) represents the output for stay hot spot sample h_i of the tree learned at the t-th iteration; Γ(f_t) is a regularization term, η and λ are regularization coefficients, T is the number of leaf nodes, and w_j is the output value of the j-th leaf node;

the score of stay hot spot h_i is obtained by summing the output scores of the M regression trees, Score_i = f_1(h_i) + f_2(h_i) + … + f_M(h_i), and is mapped to a probability value by the logistic function P_i = 1 / (1 + e^(−Score_i)) for output; the trained model is taken as the final transportation end point recognition model;

step S8, based on the waybill tracks, extracting the set of candidate stay hot spots matching the transportation end point of each waybill, inputting the feature vector of each stay hot spot in the set into the transportation end point recognition model built in step S7 to obtain the transportation end point matched to each waybill, and updating the position and address of the corresponding transportation end point in the end point library.

2. The transportation end point calibration method according to claim 1, wherein in step S1, the waybills of each truck are sorted by the start times at which the truck executes them; and if the time periods of a waybill and the following waybill overlap, the task completion time of the earlier waybill is adjusted to precede the start of the following waybill's task.

3. The transportation end point calibration method according to claim 1, wherein in step S2, the speed of a track point is computed from its distance and time interval to the preceding track point; the speed threshold is set to the maximum speed limit of the transport vehicle; candidate roads within a distance threshold thr_r of the track point are taken as the hidden states of a hidden Markov model, the distance between the track point and its perpendicular projection onto an adjacent candidate road is taken as the state measurement, and the best matching road section of the track point is found with the Viterbi algorithm.

4. The transportation end point calibration method according to claim 1, wherein in step S3, the timestamp of the stay point is taken as the stay start time, and the time interval between the first and last points of the sequence containing the stay point is taken as the stay duration; the stay points are clustered with the DBSCAN clustering method; a stay point cluster is a maximal set of density-connected stay points.

5. The transportation end point calibration method according to claim 4, wherein the DBSCAN clustering method first randomly selects a stay point and finds the set of other stay points whose distance to it is less than eps; if the number of these stay points is greater than the minimum number min_samples required to form a cluster, a cluster is created for the stay point, otherwise the stay point is marked as noise; the other stay points are then traversed, and stay points that are directly density-reachable from the cluster are merged into it; these steps are iterated until every stay point has been assigned to a cluster or marked as noise, and each stay point cluster is then regarded as a stay area; a directly density-reachable stay point is one whose distance to some stay point in the cluster is less than eps.

6. The transportation end point calibration method according to claim 1, wherein in step S4, the last track point matched to the nearest road section in the waybill track containing the stay point is extracted as the road exit point of that stay point, and the direction from the road exit point to the next track point is taken as the direction of the road exit point; the set of road exit points for the stay points of each stay area is obtained, and these road exit points are then clustered with the Meanshift clustering method.

7. The transportation end point calibration method according to claim 6, wherein the Meanshift clustering method randomly selects a road exit point, computes the mean of the offset vectors from that point to the other road exit points within radius R as the direction and distance of its next shift, and repeats this shift computation until the shift distance is less than a parameter D; the road exit points traversed up to that point are grouped into one cluster; the longitude and latitude of the center of the cluster containing the most road exit points are extracted as the road exit position, and the average direction of all road exit points in that cluster is computed as the direction of the road exit position.

8. The transportation end point calibration method according to claim 1, wherein in step S5, the set of road exit positions extracted in step S4 is clustered hierarchically into groups, and the stay areas corresponding to the road exit positions of each group are merged to obtain stay hot spots; the region of each stay hot spot is obtained with a convex hull algorithm, and the center of the region is computed as the position of the stay hot spot; the waybill number list, stay duration list, and stay start time list of the stay points in each stay hot spot are collected and stored.

9. The transportation end point calibration method according to claim 1, wherein step S6 further comprises the following steps:

step 6.1, stay duration distribution features: the time intervals (0, 15 min], (15 min, 30 min], (30 min, 60 min], (60 min, 120 min], and (120 min, +∞) are defined; the proportion of stay points in each interval is computed from the stay duration list, and the proportions are concatenated into a 5-dimensional vector that serves as the feature representation of the corresponding stay hot spot;

step 6.2, access period distribution features: the proportion of stay points in each one-hour interval is computed from the stay start time list, and the proportions are concatenated into a 24-dimensional vector covering the 24 hours of the day as the feature representation of the corresponding stay hot spot;

step 6.3, stay frequency distribution features: the stay frequency distribution is expressed as a vector [fre_1, fre_2, …, fre_n], where n is the historical maximum number of stays at one stay hot spot across all transport vehicles, and fre_i (0 < i ≤ n) is the proportion of transportation tasks that stay i times at the stay hot spot, obtained from the corresponding waybill list statistics;

step 6.4, cargo type features: the number of cargo types associated with the stay hot spot is counted from the waybill list as the cargo type feature;

step 6.5, adjacent road level features: the road section nearest to the stay hot spot is extracted from the OSM open-source map to obtain its road grade, and the adjacent road level feature is represented with one-hot encoding;

step 6.6, nearby POI category features: for each stay hot spot, the numbers of POIs of different types are obtained through the POI query interface provided by AMap, the proportion of each type is computed, and the proportions are concatenated into the POI category feature representation of the stay hot spot;

step 6.7, stay area features: based on the stay hot spot region obtained in step S5, the corresponding convex polygon is expressed as the vertex sequence {(a_1, b_1), (a_2, b_2), …, (a_h, b_h)}, and the area of the stay hot spot region is obtained by the formula S = ½ |Σ_{k=1}^{h} (a_k b_{k+1} − a_{k+1} b_k)|, where (a_{h+1}, b_{h+1}) = (a_1, b_1);

step 6.8, the feature vectors obtained in steps 6.1 to 6.7 are concatenated in order into a multidimensional vector serving as the feature representation of the stay hot spot.

10. The transportation end point calibration method according to claim 1, wherein for each transportation end point to be calibrated, the corresponding set of waybill tracks is extracted, the stay hot spots whose waybill lists contain at least one of these waybill tracks form a candidate set, the feature vector of each stay hot spot in the candidate set is input into the transportation end point recognition model built in step S7, and the stay hot spot with the highest output probability is taken as the transportation end point matched to the waybills; the address of this end point is then obtained through the AMap reverse geocoding service API, completing the position calibration and address update of the corresponding original end point.

11. A transportation end point calibration system for implementing the transportation end point calibration method of any one of claims 1 to 10, wherein the system comprises a data preprocessing module, a stay hot spot mining module, and an end point library updating module;