CN110555448B - Method and system for subdividing dispatch area - Google Patents

Info

Publication number
CN110555448B
Authority
CN
China
Prior art keywords
cluster
address
addresses
clusters
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810538551.1A
Other languages
Chinese (zh)
Other versions
CN110555448A (en)
Inventor
白文勇
杜堃
雷紫霖
王晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SF Technology Co Ltd
Original Assignee
SF Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SF Technology Co Ltd filed Critical SF Technology Co Ltd
Priority to CN201810538551.1A priority Critical patent/CN110555448B/en
Publication of CN110555448A publication Critical patent/CN110555448A/en
Application granted granted Critical
Publication of CN110555448B publication Critical patent/CN110555448B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/083 Shipping
    • G06Q10/0835 Relationships between shipper or supplier and carriers
    • G06Q10/08355 Routing methods

Abstract

The invention discloses a method for subdividing dispatch areas, comprising the following steps: S1, obtaining multiple pairs of corresponding delivery times and delivery addresses (the time and address at which each parcel was successfully delivered) from the historical dispatch data of a courier in a given unit area; S2, for the delivery times and delivery addresses of a specific day, clustering the delivery addresses by delivery time to obtain several address clusters; S3, re-dividing the address clusters from step S2 according to the text similarity between addresses to obtain several small address clusters; S4, repeating steps S2-S3 to obtain the small address clusters of different days, and merging them to obtain the operating points that the courier can reach consecutively within the courier's dispatch area. The clustering method and system for subdividing courier dispatch areas can refine a courier's dispatch area and provide small unit areas within which each courier can work continuously inside their own dispatch area.

Description

Method and system for subdividing dispatch area
Technical Field
The invention relates to the technical field of logistics data processing, in particular to a method and a system for subdividing dispatch areas.
Background
In the existing logistics industry model, each logistics outlet is divided into several unit areas, couriers' dispatch areas are further divided according to the geographic positions of the unit areas, and each dispatch area is relatively fixed. However, as society develops and the express delivery market grows, the dispatch workload of each courier within a relatively fixed area increases rapidly. In addition, customers' demands for timeliness and a better customer experience require the courier's dispatch area to be subdivided further.
In actual operation, however, senders often fill in mailing addresses in a non-standard way (for example, with wrongly written characters), so the longitude and latitude of an address frequently cannot be obtained accurately through map positioning. The dispatch area therefore cannot be subdivided simply by clustering mailing addresses on geographic distance.
An effective solution to the problems in the related art has not been proposed yet.
Disclosure of Invention
Aiming at the technical problems in the related art, the invention provides a method and a system for subdividing dispatch areas, which can refine the dispatch areas of couriers and provide small unit areas in which each courier can continuously operate in the respective dispatch area.
In order to achieve the technical purpose, the technical scheme of the invention is realized as follows:
A method of subdividing dispatch areas, comprising the following steps:
S1, obtaining multiple pairs of corresponding delivery times and delivery addresses from the historical dispatch data of a courier in a given unit area;
S2, for the delivery times and delivery addresses of a specific day, clustering the delivery addresses by delivery time to obtain several address clusters;
S3, re-dividing the address clusters from step S2 according to the text similarity between addresses to obtain several small address clusters;
S4, repeating steps S2-S3 to obtain the small address clusters of different days, and merging the small address clusters of different days to obtain the operating points that the courier can reach consecutively within the courier's dispatch area.
Preferably, in step S2, clustering the delivery addresses by delivery time includes the following steps:
S2.1, setting a time-interval radius and clustering the delivery times;
S2.2, evaluating the clustering effect on the delivery times to obtain the time interval that gives the best clustering effect;
S2.3, dividing one day into several time intervals according to the time interval obtained in step S2.2, and clustering the delivery addresses accordingly.
Preferably, the delivery times are clustered with the DBSCAN algorithm.
Preferably, step S2 further includes the following step: adjusting the time intervals obtained in step S2.3 so that delivery times that are close together fall in the same time interval, and clustering the delivery addresses again.
Preferably, step S3 specifically includes the following steps:
S3.1, for all the address clusters obtained in step S2, obtaining the address text similarity of each single address cluster from the text similarities between the addresses in that cluster, and then determining the threshold condition for re-dividing the address clusters of step S2 from the address text similarities of all the address clusters;
S3.2, for all the address clusters obtained in step S2, if the address text similarity of an address cluster is smaller than the threshold condition, putting all the addresses of that cluster together and recording them as cluster A; if the address text similarity of an address cluster is larger than the threshold condition, recording such clusters respectively as cluster B1, cluster B2, …, cluster Bn.
Preferably, the method for determining the threshold condition includes the following steps:
S3.1.1, for any one of the address clusters obtained in step S2, obtaining the text similarity of every pair of addresses in the cluster according to the word-segmentation text vectors of the unit area, and taking the average of the text similarities of all address pairs in the cluster as the address text similarity of the cluster;
S3.1.2, repeating step S3.1.1 to obtain the address text similarity of every address cluster obtained in step S2;
S3.1.3, taking the average of the address text similarities of all the address clusters as the threshold condition.
Preferably, step S3 further includes the following steps:
S3.3, removing duplicate addresses within cluster A and within clusters B1, B2, …, Bn; calculating the text similarity between every address in cluster A and each of clusters B1, B2, …, Bn, and taking the maximum of these text similarities; if the maximum is larger than the threshold condition, putting the address corresponding to the maximum into the cluster among B1, B2, …, Bn that corresponds to the maximum; if the maximum is smaller than the threshold condition, leaving the address corresponding to the maximum in cluster A; here the text similarity between an address and a cluster is the average of the text similarities between that address and all the addresses in the cluster;
S3.4, repeating step S3.3 until no more addresses can be placed in clusters B1, B2, …, Bn; the addresses remaining in cluster A are recorded as cluster C.
Preferably, in step S4, merging the small address clusters of different days includes the following steps:
S4.1, among the clusters B1, B2, …, Bn generated on different days, merging the clusters that relate to the same small area to obtain new clusters B1, B2, …, Bn;
S4.2, assigning the addresses in the clusters C generated on different days to the new clusters B1, B2, …, Bn according to the text similarity between addresses.
Preferably, step S4.2 specifically includes the following steps:
S4.2.1, merging the clusters C generated on different days to obtain a new cluster C;
S4.2.2, calculating the text similarity between every address in the new cluster C and each of the new clusters B1, B2, …, Bn, and taking the maximum of these text similarities; if the maximum is larger than the threshold condition, putting the address corresponding to the maximum into the new cluster among B1, B2, …, Bn that corresponds to the maximum; if the maximum is smaller than the threshold condition, leaving the address corresponding to the maximum in the new cluster C;
S4.2.3, repeating step S4.2.2 until no more addresses in the new cluster C can be placed in the new clusters B1, B2, …, Bn.
In another aspect of the present invention, a clustering system for subdividing courier dispatch areas is provided, including:
a data input unit, configured to obtain multiple pairs of corresponding delivery times and delivery addresses from the historical dispatch data of a courier in a given unit area;
a data processing unit, configured to cluster the delivery addresses, one day at a time, according to the delivery times and the text similarity between addresses, to obtain several small address clusters;
and a data output unit, configured to merge the small address clusters of different days to obtain the operating points that the courier can reach consecutively within the fixed dispatch area.
In another aspect of the invention, an apparatus is provided that includes a processor and a memory; the memory includes instructions executable by the processor to cause the processor to perform one of the above-described methods of subdividing dispatch areas, or a preferred version thereof.
In another aspect of the present invention, there is provided a computer readable storage medium having stored thereon a computer program for implementing one of the above-described methods of subdividing a dispatch area or a preferred version thereof.
The beneficial effects of the invention are as follows: the method and system for subdividing dispatch areas can refine couriers' dispatch areas and provide small unit areas within which each courier can work continuously inside their own dispatch area. Delivery addresses are clustered using the delivery times and the text similarity between addresses, which avoids subdividing addresses by geographic distance and eliminates the influence of non-standard delivery addresses on address clustering. Once the operating points that a courier can reach consecutively within the dispatch area have been obtained, they mainly help newly hired couriers who are unfamiliar with the unit area to plan their dispatch: parcels belonging to the same operating point can be grouped and dispatched together, improving the new courier's working efficiency. In another aspect of the present invention, a device and a computer-readable storage medium are also provided for performing the above method of subdividing dispatch areas or a preferred version thereof.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 shows the flow of a clustering method for subdividing courier dispatch areas according to an embodiment of the present invention and the components of a clustering system for subdividing courier dispatch areas according to an embodiment of the present invention;
Fig. 2 shows the flow of the method for preliminarily clustering the delivery addresses by delivery time;
Fig. 3 shows the detailed flow of step S3;
Fig. 4 shows the specific flow of determining the threshold condition;
Fig. 5 shows the process of merging the small address clusters of different days.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present invention.
Fig. 1 is a flow chart showing a method for subdividing dispatch areas according to an embodiment of the present invention.
The clustering method for subdividing the courier dispatch area comprises the following steps:
S1, obtaining multiple pairs of corresponding delivery times and delivery addresses from the historical dispatch data of a courier in a given unit area;
S2, for the delivery times and delivery addresses of a specific day, clustering the delivery addresses by delivery time to obtain several address clusters;
S3, re-dividing the address clusters from step S2 according to the text similarity between addresses to obtain several small address clusters;
S4, repeating steps S2-S3 to obtain the small address clusters of different days, and merging the small address clusters of different days to obtain the operating points that the courier can reach consecutively within the courier's fixed dispatch area.
In modern logistics, each time a courier completes a dispatch task it is recorded with a handheld barcode scanner (a "bargun"), and the delivery time and delivery address can be obtained by retrieving the scanner's records. While dispatch tasks are being carried out, deliveries whose delivery times are close together are usually not far apart, so the delivery addresses can be preliminarily clustered by delivery time. However, a preliminary clustering based on delivery time alone is not very reliable, so it is re-divided by introducing the text similarity between addresses. The small address clusters obtained in this way are more reasonable than those from clustering on geographic distance alone.
As shown in Fig. 2, a specific method for clustering the delivery addresses by delivery time is provided, which comprises the following steps:
S2.1, setting a time-interval radius and clustering the delivery times;
S2.2, evaluating the clustering effect on the delivery times to obtain the time interval that gives the best clustering effect;
S2.3, dividing one day into several time intervals according to the time interval obtained in step S2.2, and clustering the delivery addresses accordingly.
The delivery times are clustered with the DBSCAN algorithm, with the time-interval radius set to each value in the range [10 minutes, 15 minutes] in turn. If the delivery times were clustered directly without setting a time-interval radius, then on some dispatch dates, when the parcels for several consecutive areas are not concentrated, addresses that do not belong together according to business knowledge would be grouped together. This would interfere with the subsequent finer address division and easily produce large errors. Setting the time-interval radius in advance makes the address clusters obtained from address clustering more accurate. The range [10 minutes, 15 minutes] is a preferred choice: it was determined from business experience, based on survey statistics of couriers' historical dispatch times across several consecutive unit areas. Other radii can be set according to the specific conditions of the unit area each courier is responsible for. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a representative density-based clustering algorithm. Unlike partitioning and hierarchical clustering methods, it defines a cluster as the maximal set of density-connected points, so it can partition regions of sufficiently high density into clusters and find clusters of arbitrary shape in a spatial database with noise.
First, the scanner-recorded delivery times of each courier for one day are collected into a set T = {t(1), t(2), ..., t(n)}. The clustering objective is to divide the time data set T into K clusters and a set of noise points. A cluster label array is generated: m(i) = j (j > 0) when t(i) belongs to the j-th cluster, and m(i) = -1 when t(i) is a noise point. The objective function thus generates the label array m(i), i = 1, 2, 3, ..., n, where K is the number of distinct non-negative values in {m(i)}. The variables are all of the courier's scanner-recorded delivery times for the day.
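The label array described above can be illustrated with a minimal one-dimensional DBSCAN sketch. This is not the patent's implementation: the function name `dbscan_1d`, the representation of delivery times as minutes since midnight, and the `min_pts` parameter (a standard DBSCAN parameter that the text does not specify) are assumptions made for illustration only.

```python
from typing import List


def dbscan_1d(times: List[float], eps: float, min_pts: int) -> List[int]:
    """Return m(i) for each time: a cluster id 1, 2, ... or -1 for noise.

    eps plays the role of the time-interval radius (in minutes);
    min_pts is an assumed density parameter, not given in the text.
    """
    n = len(times)
    labels = [0] * n  # 0 means "not visited yet"

    def neighbors(i: int) -> List[int]:
        # All points (including i itself) within eps of times[i].
        return [j for j in range(n) if abs(times[j] - times[i]) <= eps]

    cluster_id = 0
    for i in range(n):
        if labels[i] != 0:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # noise reached from a core point: border point
            if labels[j] != 0:
                continue  # already assigned, do not expand again
            labels[j] = cluster_id
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, keep expanding
    return labels


# Delivery times in minutes since midnight (8:36 -> 516, and so on).
example_labels = dbscan_1d([516, 518, 522, 528, 597, 600, 700], eps=10, min_pts=2)
K = len({m for m in example_labels if m >= 0})  # number of clusters
```

With a radius of 10 minutes the first four times chain into one cluster, 9:57 and 10:00 form a second, and the isolated time is marked as noise.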
After the delivery times have been clustered, the clustering effect needs to be evaluated to obtain the time interval that gives the best clustering effect.
One day is then divided into equal parts according to the optimal time interval obtained. When the day cannot be divided evenly, the earlier time periods are given the full optimal time interval first.
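As a rough illustration of the equal division described above, the following sketch splits a working window into consecutive intervals of the chosen width and groups delivery times by interval. The function names `split_day` and `bucket_times`, the half-open interval convention, and the use of minutes since midnight are assumptions for this example; `width` is assumed to be positive.

```python
from typing import Dict, List, Tuple

Window = Tuple[int, int]  # half-open interval [lo, hi) in minutes since midnight


def split_day(start: int, end: int, width: int) -> List[Window]:
    """Split [start, end) into windows of `width` minutes.

    Earlier windows get the full width; only the last one may be shorter.
    """
    windows = []
    lo = start
    while lo < end:
        hi = min(lo + width, end)
        windows.append((lo, hi))
        lo = hi
    return windows


def bucket_times(times: List[int], windows: List[Window]) -> Dict[Window, List[int]]:
    """Group delivery times by the window that contains them (empty windows omitted)."""
    buckets: Dict[Window, List[int]] = {}
    for t in sorted(times):
        for lo, hi in windows:
            if lo <= t < hi:
                buckets.setdefault((lo, hi), []).append(t)
                break
    return buckets


windows = split_day(510, 545, 10)  # 8:30 to 9:05 in 10-minute windows
buckets = bucket_times([522, 516, 541, 518], windows)
```

Here the last window, (540, 545), is shorter than the others because 35 minutes does not divide evenly by 10.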
Because the delivery time points are discrete, dividing them rigidly at fixed time intervals inevitably produces unreasonable splits, in which two time points that are very close together fall into two different time intervals. The time intervals obtained in step S2.3 therefore need to be adjusted so that delivery times that are close together fall in the same time interval, and the delivery addresses are then clustered again.
For example, suppose there are 5 time periods, two of which are [8:30, 9:00] and [9:00, 9:30]. In the period [8:30, 9:00] the delivery times are 8:31, 8:32, 8:50, 8:58 and 8:59; in the period [9:00, 9:30] the delivery times are 9:00, 9:01, .... The 8:59 of the earlier interval differs from the 9:00 of the later interval by only one minute, so it is clearly more reasonable to place 8:59 in the interval [9:00, 9:30]; the intervals are then updated to [8:30, 8:58] and [8:59, 9:30]. However, 8:58 and 8:59 also differ by only 1 minute, so the fine adjustment is repeated and 8:58 is also placed in the later interval. A certain time difference (e.g. 3 minutes or 5 minutes) can be specified manually as the tolerance at a break point; the fine adjustment is applied while the difference is within this tolerance, and no adjustment is made once the difference exceeds it. The delivery addresses are then re-clustered according to the adjusted time intervals to obtain several address clusters.
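The fine adjustment in this example can be sketched as follows. This is one possible reading, in which trailing times of an earlier interval are moved into the next interval while the gap at the break point is within the tolerance; the text does not fix the direction of the move or the tie-breaking, and the names `to_min` and `adjust_boundaries` are hypothetical.

```python
from typing import List


def to_min(hhmm: str) -> int:
    """Convert 'H:MM' to minutes since midnight."""
    h, m = hhmm.split(":")
    return int(h) * 60 + int(m)


def adjust_boundaries(buckets: List[List[int]], tolerance: int) -> List[List[int]]:
    """Move trailing times of a bucket into the next bucket while the gap
    across the break point is at most `tolerance` minutes.

    `buckets` must be in time order, each holding sorted delivery times.
    """
    result = [list(b) for b in buckets if b]
    for k in range(len(result) - 1):
        prev, nxt = result[k], result[k + 1]
        # Compare the last time before the break with the first time after it.
        while prev and nxt[0] - prev[-1] <= tolerance:
            nxt.insert(0, prev.pop())
    return [b for b in result if b]


earlier = [to_min(t) for t in ["8:31", "8:32", "8:50", "8:58", "8:59"]]
later = [to_min(t) for t in ["9:00", "9:01"]]
adjusted = adjust_boundaries([earlier, later], tolerance=3)
```

With a 3-minute tolerance, 8:59 and then 8:58 move into the later bucket, and the adjustment stops at 8:50 because the gap to 8:58 is 8 minutes.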
As shown in Fig. 3, step S3 specifically includes the following steps:
S3.1, for all the address clusters obtained in step S2, obtaining the address text similarity of each single address cluster from the text similarities between the addresses in that cluster, and then determining the threshold condition for re-dividing the address clusters of step S2 from the address text similarities of all the address clusters;
S3.2, for all the address clusters obtained in step S2, if the address text similarity of an address cluster is smaller than the threshold condition, putting all the addresses of that cluster together and recording them as cluster A; if the address text similarity of an address cluster is larger than the threshold condition, recording such clusters respectively as cluster B1, cluster B2, …, cluster Bn.
To avoid leaving unreasonably assigned addresses in cluster A, as shown in Fig. 3, step S3 further includes the following steps:
S3.3, removing duplicate addresses within cluster A and within clusters B1, B2, …, Bn; calculating the text similarity between every address in cluster A and each of clusters B1, B2, …, Bn, and taking the maximum of these text similarities; if the maximum is larger than the threshold condition, putting the address corresponding to the maximum into the cluster among B1, B2, …, Bn that corresponds to the maximum; if the maximum is smaller than the threshold condition, leaving the address corresponding to the maximum in cluster A; here the text similarity between an address and a cluster is the average of the text similarities between that address and all the addresses in the cluster;
S3.4, repeating step S3.3 until no more addresses can be placed in clusters B1, B2, …, Bn; the addresses remaining in cluster A are recorded as cluster C.
The cluster C and clusters B1, B2, …, Bn obtained by these steps are the small address clusters for one day.
As shown in Fig. 4, a specific method of determining the threshold condition is provided, comprising the following steps:
S3.1.1, for any one of the address clusters obtained in step S2, obtaining the text similarity of every pair of addresses in the cluster according to the word-segmentation text vectors of the unit area, and taking the average of the text similarities of all address pairs in the cluster as the address text similarity of the cluster;
S3.1.2, repeating step S3.1.1 to obtain the address text similarity of every address cluster obtained in step S2;
S3.1.3, taking the average of the address text similarities of all the address clusters as the threshold condition.
Assume that step S2 yields 2 address clusters:
Cluster 1: [address 1, address 2]
Cluster 2: [address 3, address 4, address 5]
The word-segmentation text vector is based on a word bank built by extracting key address-information words from the historical dispatch addresses of the outlet's unit area. Each delivery address is converted into a word vector according to this word bank, and the cosine similarity between the word vectors of any two addresses is then calculated; this is the text similarity of the two addresses.
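The conversion and cosine similarity described above can be sketched as follows, assuming the addresses have already been segmented into tokens and that the word bank is a plain list of words; the word-segmentation step itself is not shown, and the token values and the names `vectorize` and `cosine` are illustrative only.

```python
import math
from collections import Counter
from typing import List, Sequence


def vectorize(tokens: Sequence[str], word_bank: Sequence[str]) -> List[int]:
    """Count how often each word of the word bank occurs in the address tokens."""
    counts = Counter(tokens)
    return [counts[w] for w in word_bank]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical word bank and pre-segmented addresses.
word_bank = ["nanwan", "round", "mound", "road", "47", "51"]
addr_a = ["nanwan", "round", "mound", "road", "47"]
addr_b = ["nanwan", "round", "mound", "road", "51"]
similarity = cosine(vectorize(addr_a, word_bank), vectorize(addr_b, word_bank))
```

The two example addresses share four of their five tokens, so their similarity is 4/5.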
For cluster 1, only 1 text similarity can be calculated, namely the text similarity x between address 1 and address 2, so the address text similarity of cluster 1 is x.
For cluster 2, 3 text similarities can be calculated: the text similarity y1 between address 3 and address 4, the text similarity y2 between address 3 and address 5, and the text similarity y3 between address 4 and address 5. The address text similarity of cluster 2 is (y1 + y2 + y3)/3.
The threshold condition is the average of the address text similarities of cluster 1 and cluster 2, namely [(y1 + y2 + y3)/3 + x]/2.
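The calculation in this example can be written as a short sketch. The pairwise similarity is passed in as a function so that any text-similarity measure can be used; the function names, the lookup table and its numeric values are hypothetical, and clusters with a single address are rejected because the text does not define their similarity.

```python
from itertools import combinations
from statistics import mean
from typing import Callable, List, Sequence

Similarity = Callable[[str, str], float]


def cluster_similarity(cluster: Sequence[str], sim: Similarity) -> float:
    """Average text similarity over all address pairs in one cluster (S3.1.1)."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        # The single-address case is not defined in the text.
        raise ValueError("a cluster needs at least two addresses")
    return mean(sim(a, b) for a, b in pairs)


def threshold_condition(clusters: Sequence[Sequence[str]], sim: Similarity) -> float:
    """Average of the per-cluster similarities (S3.1.2 and S3.1.3)."""
    return mean(cluster_similarity(c, sim) for c in clusters)


# Hypothetical similarities: x = 0.4, y1 = 0.9, y2 = 0.8, y3 = 0.7.
table = {
    frozenset(("a1", "a2")): 0.4,
    frozenset(("a3", "a4")): 0.9,
    frozenset(("a3", "a5")): 0.8,
    frozenset(("a4", "a5")): 0.7,
}


def lookup(a: str, b: str) -> float:
    return table[frozenset((a, b))]


clusters: List[List[str]] = [["a1", "a2"], ["a3", "a4", "a5"]]
threshold = threshold_condition(clusters, lookup)
```

With these assumed values the cluster similarities are 0.4 and 0.8, so the threshold condition is their average, 0.6.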
Assume that x is less than the threshold condition and (y1 + y2 + y3)/3 is greater than the threshold condition. Then all addresses in cluster 1 are recorded as cluster A and cluster 2 is recorded as cluster B1. The text similarity between each address in cluster A and B1 is calculated, that is, the text similarity of address 1 with B1 and of address 2 with B1. B1 contains address 3, address 4 and address 5. The text similarity of address 1 with B1 is obtained by calculating the text similarities of address 1 and address 3, address 1 and address 4, and address 1 and address 5, and then taking their average, assumed to be m1. The text similarity of address 2 with B1 is calculated in the same way and assumed to be m2. The larger of m1 and m2, say m2, is compared with the threshold condition. If m2 is greater than the threshold condition, address 2 is put into B1 and address 1 remains in cluster A. m1 is then compared with the threshold condition; if m1 is smaller than the threshold condition, address 1 stays in cluster A and is recorded as cluster C.
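The split into cluster A and clusters B (S3.2) and the repeated reassignment (S3.3 and S3.4) can be sketched as below. This is a sketch under stated assumptions, not the patent's implementation: the function names and similarity values are hypothetical, a cluster whose similarity equals the threshold is placed in B (the text leaves that case open), and address-to-cluster similarities are recomputed on every pass.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Similarity = Callable[[str, str], float]


def split_clusters(clusters: Sequence[Sequence[str]], sims: Sequence[float],
                   threshold: float) -> Tuple[List[str], List[List[str]]]:
    """S3.2: pool low-similarity clusters into A, keep the others as B1..Bn."""
    a: List[str] = []
    bs: List[List[str]] = []
    for cluster, s in zip(clusters, sims):
        if s < threshold:
            a.extend(cluster)
        else:  # equality goes to B here; this is an assumption
            bs.append(list(cluster))
    return a, bs


def reassign(a: Sequence[str], bs: Sequence[Sequence[str]], sim: Similarity,
             threshold: float) -> Tuple[List[str], List[List[str]]]:
    """S3.3 and S3.4: move the best-matching address of A into a B cluster
    while the best address-to-cluster similarity exceeds the threshold."""
    rest = list(dict.fromkeys(a))                   # de-duplicate, keep order
    groups = [list(dict.fromkeys(b)) for b in bs]
    while rest and groups:
        score, addr, k = max(
            ((mean(sim(x, y) for y in b), x, k)
             for x in rest for k, b in enumerate(groups)),
            key=lambda item: item[0],
        )
        if score <= threshold:
            break  # nothing else can be placed
        groups[k].append(addr)
        rest.remove(addr)
    return rest, groups  # `rest` is cluster C


pairs = {("a1", "a2"): 0.4, ("a1", "a3"): 0.5, ("a1", "a4"): 0.5, ("a1", "a5"): 0.5,
         ("a2", "a3"): 0.9, ("a2", "a4"): 0.8, ("a2", "a5"): 0.7}


def lookup(x: str, y: str) -> float:
    return pairs[tuple(sorted((x, y)))]


cluster_a, clusters_b = split_clusters([["a1", "a2"], ["a3", "a4", "a5"]], [0.4, 0.8], 0.6)
cluster_c, clusters_b = reassign(cluster_a, clusters_b, lookup, 0.6)
```

With these assumed values, address 2 (m2 = 0.8) moves into B1, while address 1 stays below the threshold and ends up as cluster C, matching the outcome described in the text.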
As shown in Fig. 5, a method for merging the small address clusters of different days is provided, which includes the following steps:
S4.1, among the clusters B1, B2, …, Bn generated on different days, merging the clusters that relate to the same small area to obtain new clusters B1, B2, …, Bn;
S4.2, assigning the addresses in the clusters C generated on different days to the new clusters B1, B2, …, Bn according to the text similarity between addresses.
Specifically, step S4.2 includes the following steps:
S4.2.1, merging the clusters C generated on different days to obtain a new cluster C;
S4.2.2, calculating the text similarity between every address in the new cluster C and each of the new clusters B1, B2, …, Bn, and taking the maximum of these text similarities; if the maximum is larger than the threshold condition, putting the address corresponding to the maximum into the new cluster among B1, B2, …, Bn that corresponds to the maximum; if the maximum is smaller than the threshold condition, leaving the address corresponding to the maximum in the new cluster C;
S4.2.3, repeating step S4.2.2 until no more addresses in the new cluster C can be placed in the new clusters B1, B2, …, Bn.
When small address clusters from enough different days are merged, no addresses remain in the final cluster C; that is, all addresses are assigned to the new clusters B1, B2, …, Bn.
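Steps S4.1 and S4.2.1 can be sketched as follows. The text does not say how clusters "related to the same small area" are recognised; this sketch assumes, purely for illustration, that two B clusters belong to the same small area when they share at least one address. The function names and sample addresses are hypothetical, and step S4.2.2 would then apply the same similarity test as S3.3 to the pooled cluster C.

```python
from typing import Iterable, List, Set


def merge_b_clusters(days: Iterable[Iterable[Set[str]]]) -> List[Set[str]]:
    """S4.1 (assumed criterion): merge B clusters from different days
    whenever they have at least one address in common."""
    merged: List[Set[str]] = []
    for day in days:
        for cluster in day:
            cluster = set(cluster)
            overlapping = [m for m in merged if m & cluster]
            merged = [m for m in merged if not (m & cluster)]
            merged.append(cluster.union(*overlapping))
    return merged


def pool_c_clusters(day_cs: Iterable[Set[str]]) -> Set[str]:
    """S4.2.1: combine the clusters C of all days into one new cluster C."""
    pooled: Set[str] = set()
    for c in day_cs:
        pooled |= c
    return pooled


day1_b = [{"addr a", "addr b"}, {"addr c"}]
day2_b = [{"addr b", "addr d"}, {"addr e"}]
new_b = merge_b_clusters([day1_b, day2_b])
new_c = pool_c_clusters([{"addr x"}, {"addr x", "addr y"}])
```

In this example the day-1 cluster containing "addr b" absorbs the day-2 cluster that also contains it, while the other clusters stay separate.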
The clustering method for subdividing courier dispatch areas according to the present invention is described below with a specific example.
First, several pairs of corresponding delivery times and delivery addresses are obtained from the historical dispatch data of a courier in a given unit area; the delivery times and delivery addresses for one day are shown in Table 1.
TABLE 1 Delivery times and addresses for one day
[Table 1 is reproduced as an image in the original publication.]
Next, using the DBSCAN algorithm, the delivery times in the data are clustered with the time-interval radius in the range [10 minutes, 15 minutes], and the clustering effect is evaluated to find the most suitable time interval, assumed here to be 10 minutes.
Time period [8:30, 8:40) contains delivery times 8:36, 8:38;
time period [8:40, 8:50) contains delivery times 8:42, 8:42, 8:42, 8:48;
time period [8:50, 9:00) contains delivery times 8:53, 8:56, 8:58;
time period [9:00, 9:10) contains delivery times 9:00, 9:01, 9:06;
time period [9:10, 9:20) contains delivery time 9:11;
time period [9:50, 10:00) contains delivery times 9:57, 10:00.
The break-point tolerance is set manually to 3 minutes, and the adjusted time intervals become:
time period [8:36, 8:38]: 8:36, 8:38;
time period [8:42, 8:48]: 8:42, 8:42, 8:42, 8:48;
time period [8:53, 9:01]: 8:53, 8:56, 8:58, 9:00, 9:01;
time period [9:06, 9:11]: 9:06, 9:11;
time period [9:57, 10:00]: 9:57, 10:00.
The corresponding address clusters are:
[Round mound road round mound district of Nanwan street in Longsentry region of Shenzhen, Guangdong province,
Guangdong Shenzhen city Longgang region Nanwan street Dan head round mound No. 1]
[Round mound road round mound district of Nanwan street in Longsentry region of Shenzhen, Guangdong province,
a round mound road and round mound district of the Nanwan street in the Dragon sentry region of Shenzhen, Guangdong province,
number 1 of the Guangdong Shenzhen city Longsentan region south bay street garden pier west 1,
Guangdong Shenzhen city Longgang region Nanwan street Dan head round mound No. 5]
[Round pier No. 0 of Nanwan street in Longsentry region of Shenzhen, Guangdong province,
Guangdong Shenzhen, Guangzhong, Nanwan street garden No. 43,
Guangdong Shenzhen city dragon sentry region south bay street road No. 47,
Guangdong Shenzhen city dragon sentry region south bay round road No. 51,
Guangdong Shenzhen city Longsentry region south bay street round mound road No. 55]
[Round mound district of south bay street in dragon sentry region of Shenzhen, Guangdong province,
Guangdong Shenzhen city dragon hillock Dan Zhu Yuannan No. 1]
[No. 110 of the Nanwan street Shapinglu in the Dragon sentry region of Shenzhen, Guangdong province,
Shenzhen, Guangdong province, Nanwan, Shengang district, Shapinglu No. 110]
The address clusters in the step S2 are re-divided according to the text similarity between addresses and the addresses in the clusters are de-duplicated, and then the small cluster address clusters are changed into the following form:
cluster B1
[ No. 1 of round mound of bamboo head of Nanwan street Dan of Longsentry region of Shenzhen, Guangdong province,
guangdong Shenzhen, Guangdong province, Nanwan street round pier east No. 0,
guangdong Shenzhen city Longgang region Nanwan street Dan head round mound No. 5)
Cluster B2
[ round mound road round mound district of Nanwan street in Longsentry region of Shenzhen, Guangdong province,
guangdong Shenzhen, Guangzhong, Nanwan street garden No. 43,
guangdong Shenzhen city dragon sentry region south bay street road No. 47,
guangdong Shenzhen city dragon sentry region south bay round road No. 51,
guangdong Shenzhen city Longsentry region south bay street round mound road No. 55)
Cluster C
[ No. 110 of the Nanwan street Shapinglu in the Dragon sentry region of Shenzhen, Guandong province,
number 1 of the Guangdong Shenzhen city Longsentan region south bay street garden pier west 1,
guangdong Shenzhen city dragon hillock Dan Zhu Yuannan No. 1)
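The re-division by text similarity can be sketched as below, following the threshold rule stated in the claims: the similarity of a cluster is the average pairwise similarity of its addresses, and the threshold is the average over all clusters. Several points are assumptions made for illustration: addresses are tokenised on whitespace rather than by Chinese word segmentation, cosine similarity over token counts stands in for the word-segmentation text vectors, a single-address cluster is scored 1.0, and a cluster exactly at the threshold is kept as a B cluster. All function names and sample addresses are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two addresses over whitespace token counts."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[token] * vb[token] for token in va)
    norm_a = math.sqrt(sum(count * count for count in va.values()))
    norm_b = math.sqrt(sum(count * count for count in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_similarity(addresses):
    """Average similarity over all pairs of addresses in one cluster."""
    pairs = list(combinations(addresses, 2))
    if not pairs:
        return 1.0  # assumption: a single address is fully self-similar
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


def split_by_threshold(clusters):
    """Return (threshold, cluster A, list of B clusters), de-duplicated."""
    scores = [cluster_similarity(cluster) for cluster in clusters]
    threshold = sum(scores) / len(scores)
    cluster_a, b_clusters = [], []
    for cluster, score in zip(clusters, scores):
        if score < threshold:
            cluster_a.extend(cluster)  # weakly related addresses pooled in A
        else:
            b_clusters.append(list(dict.fromkeys(cluster)))
    return threshold, list(dict.fromkeys(cluster_a)), b_clusters


# Hypothetical romanised addresses for illustration only.
day_clusters = [
    ["yuandun road 43", "yuandun road 47"],
    ["shaping road 110", "danzhutou lane 1"],
]
threshold, cluster_a, b_clusters = split_by_threshold(day_clusters)
print(threshold, cluster_a, b_clusters)
```

`dict.fromkeys` is used for de-duplication because it preserves the order in which addresses first appear.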
The delivery times and delivery addresses of the next day are then processed; they are shown in Table 2.
Table 2. Delivery addresses and delivery times of the next day
Delivery address Delivery time
Round pier district of southern bay street garden road in dragon sentry region of Guangdong Shenzhen city 2017/3/3 8:48
Road pier district of southern bay street garden in dragon sentry region of Shenzhen, Guangdong province 2017/3/3 8:50
Round pier district of southern bay street garden road in dragon sentry region of Guangdong Shenzhen city 2017/3/3 8:51
Guangdong Shenzhen city Longsentry south bay street garden pier No. 1 lane 2017/3/3 8:53
Round pier community of Nanwan street Danzhu in Longsentregion of Guangdong Shenzhen city 2017/3/3 8:54
Guangdong Shenzhen City Dragon sentry region round pier No. 25 2017/3/3 9:02
Round pier district of southern bay street garden road in dragon sentry region of Guangdong Shenzhen city 2017/3/3 9:05
Guangdong Shenzhen city Longsentry south bay street road No. 47 2017/3/3 9:08
Guangdong Shenzhen city Longsentry region south bay street garden mound No. 6 2017/3/3 9:12
Guangdong Shenzhen city dragon sentry region garden pier cell 2017/3/3 9:16
Guangdong Shenzhen city Longsentry region south bay street road No. 11 2017/3/3 9:18
Road pier district of southern bay street garden in dragon sentry region of Shenzhen, Guangdong province 2017/3/3 9:26
Shenzhen, Guandong province, southern Shaping, New City, south Bay street, Shapinglu and south road landscape 2017/3/3 9:58
After processing, the small cluster address clusters take the following form:
cluster B1
[ round mound district of south bay street garden road in the dragon sentry region of Shenzhen, Guangdong province,
a round mound community of the Danzhu community in the Nanwan street of the Dragon sentry region of Shenzhen, Guangdong province,
guangdong Shenzhen, Guangzheng region round mound road No. 25,
guangdong Shenzhen city dragon sentry region south bay street road No. 47,
a park region garden mound region of Guangdong Shenzhen city,
guangdong Shenzhen city dragon Bay street garden road No. 11)
Cluster C
[ No. 1 number of the Guangdong Shenzhen city Longsentry south bay street garden pier west 1 lane,
guangdong Shenzhen, Guangzhong, Nanwan street garden mound No. 6,
shenzhen, Guangdong province, city, Nanwan, street, Shaping, south road, landscape, New City
Clusters of the first and second days that relate to the same small area are merged. Cluster B2 of the first day and cluster B1 of the second day relate to the same small area, so the addresses in the two clusters are merged and recorded as new cluster B2. The second day has no cluster that can be merged with cluster B1 of the first day, so cluster B1 of the first day is recorded as new cluster B1. The clusters C of the two days are merged and recorded as new cluster C. After duplicate addresses are removed, the clusters take the following form:
new cluster B1
[ No. 1 of round mound of bamboo head of Nanwan street Dan of Longsentry region of Shenzhen, Guangdong province,
guangdong Shenzhen, Guangdong province, Nanwan street round pier east No. 0,
guangdong Shenzhen city Longgang region Nanwan street Dan head round mound No. 5)
New cluster B2
[ round mound road round mound district of Nanwan street in Longsentry region of Shenzhen, Guangdong province,
guangdong Shenzhen, Guangzhong, Nanwan street garden No. 43,
guangdong Shenzhen city dragon sentry region south bay street road No. 47,
guangdong Shenzhen city dragon sentry region south bay round road No. 51,
guangdong Shenzhen, Guangzhong, Nanwan street round mound road No. 55,
a round mound community of the Danzhu community in the Nanwan street of the Dragon sentry region of Shenzhen, Guangdong province,
guangdong Shenzhen, Guangzheng region round mound road No. 25,
guangdong Shenzhen city dragon sentry region south bay street road No. 47,
a park region garden mound region of Guangdong Shenzhen city,
guangdong Shenzhen city dragon Bay street garden road No. 11)
New cluster C
[ No. 110 of the Nanwan street Shapinglu in the Dragon sentry region of Shenzhen, Guandong province,
the Danhu Yuannan No. 1 of the Dragon hillock region of Shenzhen, Guangdong province,
number 1 of the Guangdong Shenzhen city Longsentan region south bay street garden pier west 1,
guangdong Shenzhen, Guangzhong, Nanwan street garden mound No. 6,
shenzhen, Guangdong province, city, Nanwan, street, Shaping, south road, landscape, New City
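The merging of clusters from different days can be sketched as below. The text does not state here how two clusters are judged to relate to the same small area, so this sketch assumes a simple rule: two clusters are merged when they share at least one identical address. That rule, the function name `merge_days` and the sample addresses are hypothetical.

```python
def merge_days(day1_clusters, day2_clusters):
    """Merge B clusters of two days, removing repeated addresses.

    Assumed rule: a second-day cluster joins the first already-known
    cluster with which it shares an address; otherwise it becomes new.
    """
    merged = [list(dict.fromkeys(cluster)) for cluster in day1_clusters]
    for cluster in day2_clusters:
        target = next((m for m in merged if set(m) & set(cluster)), None)
        if target is None:
            merged.append(list(dict.fromkeys(cluster)))
            continue
        for address in cluster:
            if address not in target:  # skip addresses already present
                target.append(address)
    return merged


# Hypothetical addresses for illustration only.
day1 = [["east 1", "east 5"], ["road 43", "road 47"]]
day2 = [["road 47", "road 25"]]
print(merge_days(day1, day2))
```

A similarity-based rule, for example comparing the average cross-cluster text similarity with the threshold condition, would be an equally plausible reading.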
The text similarity between each address in new cluster C and new clusters B1 and B2 is calculated and compared with the threshold condition. The address 'Guangdong Shenzhen Nanwan street garden Tunton No. 6' is found to have a text similarity with new cluster B1 greater than the threshold, so it is placed in new cluster B1, and the clusters take the following form:
new cluster B1
[ No. 1 of round mound of bamboo head of Nanwan street Dan of Longsentry region of Shenzhen, Guangdong province,
guangdong Shenzhen, Guangdong province, Nanwan street round pier east No. 0,
the round mound of the head of the Dan bamboo in the Nanwan of the Dragon sentry region of Guangdong Shenzhen, Guangdong province No. 5,
guangdong Shenzhen city Longsentry region south bay street garden mound No. 6)
New cluster B2
[ round mound road round mound district of Nanwan street in Longsentry region of Shenzhen, Guangdong province,
guangdong Shenzhen, Guangzhong, Nanwan street garden No. 43,
guangdong Shenzhen city dragon sentry region south bay street road No. 47,
guangdong Shenzhen city dragon sentry region south bay round road No. 51,
guangdong Shenzhen, Guangzhong, Nanwan street round mound road No. 55,
a round mound community of the Danzhu community in the Nanwan street of the Dragon sentry region of Shenzhen, Guangdong province,
guangdong Shenzhen, Guangzheng region round mound road No. 25,
guangdong Shenzhen city dragon sentry region south bay street road No. 47,
a park region garden mound region of Guangdong Shenzhen city,
guangdong Shenzhen city dragon Bay street garden road No. 11)
New cluster C
[ No. 110 of the Nanwan street Shapinglu in the Dragon sentry region of Shenzhen, Guandong province,
the Danhu Yuannan No. 1 of the Dragon hillock region of Shenzhen, Guangdong province,
number 1 of the Guangdong Shenzhen city Longsentan region south bay street garden pier west 1,
shenzhen, Guangdong province, city, Nanwan, street, Shaping, south road, landscape, New City
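The reassignment of addresses from cluster C can be sketched as below. It follows the rule given in the claims: the similarity between an address and a cluster is the average similarity to all addresses in that cluster, the address goes to the cluster with the maximum similarity if that maximum exceeds the threshold, and the pass is repeated until nothing moves. The Jaccard token overlap used as the similarity measure, the function names and the sample data are assumptions for illustration.

```python
def jaccard(a, b):
    """Token-overlap similarity; a stand-in for the real text similarity."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def address_to_cluster(address, cluster, similarity):
    """Average similarity between one address and every cluster member."""
    return sum(similarity(address, other) for other in cluster) / len(cluster)


def reassign(cluster_c, b_clusters, threshold, similarity=jaccard):
    """Move addresses from cluster C into the best-matching B cluster."""
    remaining = list(cluster_c)
    b_clusters = [list(cluster) for cluster in b_clusters]  # work on copies
    if not b_clusters:
        return remaining, b_clusters
    moved = True
    while moved:  # repeat until no address can be placed any more
        moved = False
        for address in list(remaining):
            scores = [address_to_cluster(address, cluster, similarity)
                      for cluster in b_clusters]
            best = max(range(len(scores)), key=scores.__getitem__)
            if scores[best] > threshold:
                b_clusters[best].append(address)
                remaining.remove(address)
                moved = True
    return remaining, b_clusters


# Hypothetical addresses and threshold for illustration only.
b = [["yuandun east 1", "yuandun east 5"], ["yuandun road 43", "yuandun road 47"]]
c = ["yuandun east 6", "shaping road 110"]
remaining, updated = reassign(c, b, threshold=0.4)
print(remaining, updated)
```

The loop is repeated because each placement changes a cluster's membership, which can raise the average similarity of an address that previously fell below the threshold.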
It is expected that, as the delivery times and delivery addresses of the third day, the fourth day, …, the Nth day continue to be processed, clusters B3, B4, …, Bn will continue to appear and the addresses in cluster C will gradually be assigned to them. After all delivery times and delivery addresses have been processed, the resulting clusters B1, B2, …, Bn are the small unit areas in which the courier can work continuously within the dispatch area.
Fig. 1 shows a clustering system for subdividing a courier's dispatch area according to an embodiment of the present invention, including:
a data input unit for acquiring a plurality of groups of corresponding delivery times and delivery addresses from the historical delivery data of a courier in a certain unit area;
a data processing unit for clustering the delivery addresses, day by day, according to the delivery times and the text similarity between addresses, to obtain a plurality of small cluster address clusters;
and a data output unit for merging the small cluster address clusters of different days to obtain operating points that the courier can reach continuously within the dispatch area.
The data input unit can read the delivery times and delivery addresses from the courier's handheld scanner, or pull them from the logistics company's back-end management database, and send them to the data processing unit. The data processing unit clusters the delivery addresses according to the delivery times and the text similarity between addresses. Finally, the data output unit merges the small cluster address clusters of different days to obtain the operating points that the courier can reach continuously within the dispatch area.
The invention also provides an apparatus comprising a processor and a memory; the memory contains instructions executable by the processor to cause the processor to perform one of the above-described methods of subdividing dispatch areas, or its preferred versions, to obtain work points that are consecutively reachable by couriers in their dispatch areas.
The present invention also provides a computer-readable storage medium having stored thereon a computer program for implementing one of the above-described methods of subdividing dispatch areas, or a preferred version thereof, to obtain work sites that couriers can continuously reach in their dispatch areas. The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (11)

1. A method of subdividing dispatch areas comprising the steps of:
S1, obtaining a plurality of groups of corresponding delivery times and delivery addresses from the historical delivery data of a courier in a certain unit area;
S2, for the delivery times and delivery addresses of a specific day, clustering the delivery addresses according to the delivery times to obtain a plurality of address clusters;
S2.1, setting a time-interval radius and clustering the delivery times;
S2.2, evaluating the clustering effect on the delivery times to obtain the time interval with the best clustering effect;
S2.3, dividing the day into a plurality of time intervals according to the time interval obtained in step S2.2, and clustering the delivery addresses;
S3, re-dividing the address clusters from step S2 according to the text similarity between addresses to obtain a plurality of small cluster address clusters;
S4, repeating steps S2-S3 to obtain small cluster address clusters for different days, and merging the small cluster address clusters of different days to obtain operating points that the courier can reach continuously within the courier's dispatch area.
2. The method of subdividing dispatch areas of claim 1, wherein the delivery times are clustered using the DBSCAN algorithm.
3. The method for subdividing a dispatch area as in claim 1, wherein said step S2 further comprises the step of: adjusting the plurality of time intervals obtained in step S2.3 so that the delivery times fall within the same time interval, and clustering the delivery addresses again.
4. The method for subdividing dispatch areas of claim 1, wherein said step S3 specifically comprises the steps of:
S3.1, for all the address clusters obtained in step S2, obtaining the address text similarity of each single address cluster from the text similarity between the addresses in that cluster, and then determining, from the address text similarities of all the address clusters, the threshold condition for re-dividing the address clusters of step S2;
S3.2, for all the address clusters obtained in step S2, if the address text similarity of an address cluster is smaller than the threshold condition, putting all the addresses of that address cluster together and recording them as cluster A; if the address text similarity of an address cluster is larger than the threshold condition, recording the address clusters respectively as cluster B1, cluster B2, …, cluster Bn.
5. The method of subdividing dispatch areas as in claim 4, wherein said threshold condition determination method comprises the steps of:
S3.1.1, for any one of the address clusters obtained in step S2, obtaining the text similarity of any two addresses in the address cluster from the word-segmentation text vectors of the unit area, and taking the average of the text similarities of all pairs of addresses in the address cluster as the address text similarity of that address cluster;
S3.1.2, repeating step S3.1.1 to obtain the address text similarities of all the address clusters obtained in step S2;
S3.1.3, taking the average of the address text similarities of all the address clusters as the threshold condition.
6. The method for subdividing a dispatch area as in claim 4, wherein said step S3 further comprises the steps of:
S3.3, removing duplicate addresses within cluster A and within clusters B1, B2, …, Bn; calculating the text similarity between any address in cluster A and each of clusters B1, B2, …, Bn, and taking the maximum of these text similarities; if the maximum is larger than the threshold condition, placing the address into the cluster among B1, B2, …, Bn that corresponds to the maximum; if the maximum is smaller than the threshold condition, leaving the address in cluster A; wherein the text similarity between an address and a cluster is the average of the text similarities between the address and all the addresses in the cluster;
S3.4, repeating step S3.3 until no more addresses can be placed into clusters B1, B2, …, Bn, and recording the remaining addresses in cluster A as cluster C.
7. The method for subdividing dispatch areas of claim 6, wherein, in said step S4, merging the small cluster address clusters of different days comprises the steps of:
S4.1, for the clusters B1, B2, …, Bn generated on different days, merging the clusters that relate to the same small area to obtain new clusters B1, B2, …, Bn;
S4.2, assigning the addresses in the clusters C generated on different days to the new clusters B1, B2, …, Bn according to the text similarity between addresses.
8. The method for subdividing dispatch areas according to claim 7, wherein said step S4.2 specifically comprises the steps of:
S4.2.1, merging the clusters C generated on different days to obtain a new cluster C;
S4.2.2, calculating the text similarity between any address in the new cluster C and each of the new clusters B1, B2, …, Bn, and taking the maximum of these text similarities; if the maximum is larger than the threshold condition, placing the address into the new cluster among B1, B2, …, Bn that corresponds to the maximum; if the maximum is smaller than the threshold condition, leaving the address in the new cluster C;
S4.2.3, repeating step S4.2.2 until no more addresses in the new cluster C can be placed into the new clusters B1, B2, …, Bn.
9. A system for subdividing dispatch areas, comprising:
a data input unit for acquiring a plurality of groups of corresponding delivery times and delivery addresses from the historical delivery data of a courier in a certain unit area;
a data processing unit for clustering the delivery addresses, day by day, according to the delivery times and the text similarity between addresses, to obtain a plurality of small cluster address clusters, including: setting a time-interval radius and clustering the delivery times; evaluating the clustering effect on the delivery times to obtain the time interval with the best clustering effect; dividing the day into a plurality of time intervals according to that time interval, and clustering the delivery addresses; and re-dividing the address clusters according to the text similarity between addresses to obtain a plurality of small cluster address clusters;
and a data output unit for merging the small cluster address clusters of different days to obtain operating points that the courier can reach continuously within the courier's dispatch area.
10. An apparatus for subdividing dispatch areas, comprising a processor and a memory, characterized in that:
the memory contains instructions executable by the processor to cause the processor to perform the method of any of claims 1-8.
11. A computer-readable storage medium, having stored thereon a computer program for implementing the method according to any one of claims 1-8.
CN201810538551.1A 2018-05-30 2018-05-30 Method and system for subdividing dispatch area Active CN110555448B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810538551.1A CN110555448B (en) 2018-05-30 2018-05-30 Method and system for subdividing dispatch area

Publications (2)

Publication Number Publication Date
CN110555448A CN110555448A (en) 2019-12-10
CN110555448B true CN110555448B (en) 2022-03-29

Family

ID=68734148

Country Status (1)

Country Link
CN (1) CN110555448B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant